Data Replication and Backups

Purpose

This guide highlights the importance of maintaining trusted copies of your data according to best practices. It also offers criteria to consider when implementing a data recovery strategy that minimizes the risk of data loss, particularly for research data.

Some of our data (e.g., email, iCloud, Google Drive, OneDrive) is already replicated to the cloud. These replicas are insurance against risks like theft or hardware failure, and they can also provide access to important data when your primary system (laptop or desktop) isn't available. However, replicas are not necessarily protected from severe threats such as ransomware, because changes made on your primary system, including malicious ones, can be synchronized to the replica. To keep the distinction clear, this guide refers to these synchronized cloud copies as replicas and to the separately maintained copies described next as backups.

Backups are separate copies of your data, typically updated at regular intervals, that are designed to ensure data recovery after accidental and malicious data loss events, such as:

  • deletion;
  • corruption;
  • hardware failures;
  • malware;
  • and ransomware attacks.

Software and services that provide the right level of protection for backups will have features that make it harder for older copies to be deleted or modified. This prevents an attacker from removing all the old copies and then encrypting the current data. Both replica and backup services often create multiple copies of the data on their systems to reduce the odds of a customer's data being lost from a single failure.

Guidance: Requirements and Best Practices

What to Back Up

Choosing what to back up is often a question of how much data you have and whether there is a cost. For modest amounts of data, e.g., on a laptop, a departmental or other provided solution may be available. If you have several servers in a rack somewhere or a local storage device, keeping multiple copies of even a few backups will likely require a larger-scale solution. Include laptops, desktops, and servers when surveying the scope of data.

Here are some questions to help determine what data to prioritize:

  • What is the highest priority data?
  • Do you need to back up raw data, the results of analysis, or both?
  • Can any of the data be retrieved elsewhere or recreated?
  • What would be the cost and time involved to recreate the data?
  • What is the cost of backing up the data?

Backup Solution Criteria

After you've prioritized the data to be backed up, apply these recommended criteria to both the solution you choose and the policies you implement:

  • Backups should occur at least daily.
  • Include any data you cannot replace (e.g., recently collected instrument or survey data).
  • Store backups in a physically secure, off-site location.
  • Protect backups from being overwritten or destroyed (immutable).
  • Include several points-in-time or versions of your data.
  • Do not automatically expire (and delete) any data.
  • Support all the file types and sizes that comprise your data.
  • Encrypt backups, both in transit over a network and while stored (at rest), to protect their integrity and privacy. This is highly recommended.
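
As an illustration of two of the criteria above (several points-in-time and protection from being overwritten), here is a minimal Python sketch. The function name create_snapshot and the folder layout are assumptions made for this example, not part of any recommended product. Removing write permission only guards against accidental changes; it is not true immutability, since anyone with access to the account can restore the permission.

```python
import shutil
import stat
from datetime import datetime, timezone
from pathlib import Path


def create_snapshot(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`.

    Hypothetical helper: each call creates a separate point-in-time copy
    instead of overwriting the previous one.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    snapshot = backup_root / stamp
    # copytree raises FileExistsError rather than overwriting an existing copy.
    shutil.copytree(source, snapshot)

    # Drop write permission on every copied file. This guards against
    # accidental edits only; it does not stop a determined attacker.
    for path in snapshot.rglob("*"):
        if path.is_file():
            path.chmod(stat.S_IRUSR | stat.S_IRGRP)
    return snapshot
```

Calling create_snapshot(Path("project"), Path("/mnt/backup")) once a day would leave one dated folder per day, so an earlier version remains available even if the current data is later damaged. Both paths here are placeholders.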

Deciding Where to Back Up Your Data

On-Premises Backups

Establishing a local backup solution requires considerable management. Whether you use a dedicated backup hardware/software solution or a straightforward and inexpensive strategy of simply copying your data to an external hard drive or memory stick, these methods have a number of risks:

  • To retain offline copies of your data, someone must remember to swap out the storage device.
  • Local storage devices can become unavailable or inaccessible after a catastrophic event in the building, such as a flood or fire.
  • It will be difficult to protect local backups from being overwritten by malware or ransomware.
  • Storage devices must be maintained and upgraded regularly and can break or be stolen.

Even copying your data to a remote server, whether through a mounted drive or a file transfer tool, can carry some of these risks.
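
One way to reduce the risk of a silently incomplete or corrupted copy is to compare checksums after each copy. The following Python sketch assumes both the source and the copy are reachable as local or mounted folders; the function names are hypothetical and chosen for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_copy(source: Path, copy: Path) -> dict[str, list[str]]:
    """Report files that are missing from, changed in, or extra in `copy`."""
    src, dst = tree_digests(source), tree_digests(copy)
    return {
        "missing": sorted(set(src) - set(dst)),
        "changed": sorted(n for n in src.keys() & dst.keys() if src[n] != dst[n]),
        "extra": sorted(set(dst) - set(src)),
    }
```

An empty "missing" and "changed" list suggests the copy matched the source at the time of the check. It says nothing about whether the copy is protected from later modification.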

Third-Party Backup Services

  • Backup services will typically include important features such as replicating your data across multiple locations, encryption, versioning, and ease of management.
  • A full-service backup solution may include the client software and provide the storage for your data. Consider the following:
    • How frequently are backups updated?
    • How many versions of file changes are retained? And where are they stored?
    • How long are previous versions retained?
    • Are certain file types or sizes excluded from backups?
    • If the source data is deleted, how long will it be retained in the backups?
    • What is the total cost for a managed backup solution?
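
If a service lets you list the dates of the versions it holds, the frequency and retention questions above can be checked directly. The Python sketch below assumes you have already obtained that list of timestamps; how to obtain it depends on the service and is not shown. The function name audit_versions and the one-day default limit (taken from the "at least daily" criterion) are choices made for this example.

```python
from datetime import datetime, timedelta


def audit_versions(
    timestamps: list[datetime],
    max_gap: timedelta = timedelta(days=1),
) -> dict:
    """Summarize retained versions and flag gaps longer than `max_gap`."""
    ordered = sorted(timestamps)
    if not ordered:
        return {"versions": 0, "oldest": None, "newest": None, "gaps_over_limit": []}
    # Pair each version with the next one and keep pairs that are too far apart.
    gaps = [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]
    return {
        "versions": len(ordered),
        "oldest": ordered[0],
        "newest": ordered[-1],
        "gaps_over_limit": gaps,
    }
```

The "oldest" value shows how far back you could actually restore, and any entry in "gaps_over_limit" marks a period in which backups ran less often than the chosen limit.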

Backups to Cloud Storage

You can write your own backup scripts or use backup software with cloud storage (e.g., S3 in Amazon Web Services). This can be a complex process and is better suited to projects that are planning to use, or already use, cloud computing or storage, or that have extremely large archival storage needs not addressed by other systems.
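
To give a sense of what such a script involves, here is a minimal Python sketch that packs a folder into a dated archive and hands it to an S3 client. It assumes the third-party boto3 library and an existing bucket; the bucket name, the "backups" key prefix, and both function names are hypothetical. A real script would also need credentials, error handling, and bucket-side settings such as versioning, none of which are shown.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def make_archive(source: Path, work_dir: Path) -> Path:
    """Pack `source` into a timestamped .tar.gz file inside `work_dir`."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = work_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive


def upload_archive(client, archive: Path, bucket: str, prefix: str = "backups") -> str:
    """Upload `archive` with an S3-style client and return the object key.

    `client` is expected to behave like boto3.client("s3"). The ExtraArgs
    value requests server-side encryption at rest for the stored object.
    """
    key = f"{prefix}/{archive.name}"
    client.upload_file(
        str(archive),
        bucket,
        key,
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
    return key


# Example use (assumes boto3 is installed and credentials are configured):
#   import boto3
#   archive = make_archive(Path("project"), Path("/tmp"))
#   upload_archive(boto3.client("s3"), archive, "example-backup-bucket")
```

Because each archive name carries a timestamp, repeated runs add new objects instead of replacing earlier ones. Whether older objects can still be deleted or overwritten depends on how the bucket itself is configured.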

Campus-Recommended Backup Solutions

  • Scripps Institution of Oceanography (SIO) offers Code42 CrashPlan
  • Institute of Geophysics and Planetary Physics (IGPP) offers Code42 CrashPlan
  • SDSC offers CommVault
  • UCSD Health: workstation solution to be confirmed. (Open question: whether to mention a Health-provided backup solution for Health-supported servers and data drives.)
  • Placeholder for a potential campus solution, or at least a recommendation (i.e., Druva)

Resources and Definitions

  • For help with this and any other research computing and data matters, please contact Research IT Services (research-it@ucsd.edu).
  • Disaster Recovery (DR) Plan: broader than backups; it includes the policies and procedures that enable a complete return to productivity after a catastrophic event.
  • Digital preservation: the effort to ensure access to data over a long period, preserve its integrity, and protect against technological obsolescence, such as changes in format, hardware, or software.


Please Note: The main copy of this page may be found on the secure.assure.ucsd.edu website.