No one wants to lose data, but it’s hard to avoid: accidental deletion of files, unexpected power outages, hard drive damage, hard drive loss, natural disasters… All of the above will lead to data loss, which is likely to be impossible to recover directly. I believe that many people have had such troubles. This article will briefly introduce the 321 principles of backup, as well as best practices for cloud backup and local backup.

Typical Data Loss Cases

  • For example, the operation and maintenance personnel of GitLab.com once deleted the data by mistake: on February 1, 2017, the operation and maintenance personnel used the root account to log in to the main server by mistake and deleted the core service data. More seriously, five of the backup methods failed. But fortunately there is still a backup from six hours ago. Although the website was inaccessible for a few hours, and a lot of data generated after the backup was lost, most of the data was eventually restored. Event Details.
  • Data loss due to WannaCry virus: WannaCry is a ransomware virus that targets a vulnerability in Windows systems to encrypt user files on the system, making them inaccessible unless $300 ~ $600 (equivalent to several thousand RMB). More than 300,000 computers were reported to be infected with this virus. Numerous businesses, including banks, hospitals, and rail systems, were unable to function properly due to the virus. In China, the computers of many college students who use the Education Network were attacked by this virus, resulting in the loss of files.

The above data loss has brought huge losses to people, but if we can make data backups in advance, we can reduce the loss of data later.

Best Backup Principles: The 321 Principle

In the process of backup, we should implement the 321 principle, so as to ensure the reliability and effectiveness of the backup.

  • Three data copies: In addition to the original data, save two copies of the data. If the probability of loss of these three copies is independent of each other (all 1%), then the probability of loss of three copies at the same time is only 0.0001%, which is lower than the probability of loss of two copies at the same time.
    Two types of storage media: Data on the same type of storage media is more likely to be lost at the same time. For example, you have three copies of data stored on the computer’s internal memory, but if the computer’s disk is completely damaged, the disk is formatted by mistake, or the computer is lost, the data is lost along with it. In the above case, another type of storage medium can be a removable hard disk, SD card, USB flash drive, CD, DVD, etc.
    One offsite backup: Physical isolation between multiple backups is important. If those backups were all in one room, a single fire would be enough to destroy all of them. If conditions permit, it is safe to store backups across cities (at intervals of more than 100km). Storing backups at home and at work also counts as offsite backups.

In addition, you should also pay attention to the timeliness of backups, and if possible, try to shorten the backup cycle. For example, a minute-by-minute backup is more time-sensitive than an hourly backup. When data is lost, the former will only lose the last 1 minute of work, while the latter will lose the last 1 hour of work.

Cloud Backup

Generally, cloud backups are very reliable. The cloud server will help you do the 321 principle. You only need to choose a cloud storage service provider and upload the files to be backed up.

It is recommended to choose a service provider that provides an automatic backup function, which can save you the step of manually selecting files to upload. Usually automatic backup also performs incremental backup of files, that is, each backup uploads only the files that have changed since the last backup, which can greatly save upload time.

A typical example of cloud backup is the iCloud backup function in iOS. After this function is enabled, the iOS device will automatically upload personal data such as pictures, contacts, documents, chat records, and software archives to the cloud. After purchasing a new iOS device, this data can be automatically restored to the new device from the cloud.

Simple Backup Using Object Storage

Regularly package and upload important files on the server to object storage for simple backup. You can directly use the object storage of Amazon S3, Google Cloud Storage, Alibaba Cloud OSS, and Tencent Cloud COS. The above services all provide 99.999999999% durability, that is, once the file is uploaded, it is almost impossible to accidentally lose it. Object storage in cloud services is usually stored in multiple Availability Zones (usually at least three) within a region, and each Availability Zone also contains multiple copies of files. There is a certain distance between the various Availability Zones, so this achieves off-site. About regions and availability zones, you can refer to this AWS article for details.

Object storage for cloud services generally has a choice of regions. The geographically closest region is usually chosen for the lowest latency. These services are usually billed on a per-use basis, which mainly includes storage space occupied within a certain period of time and traffic charges for transferring data. For example, if you want to back up 1GB of data, it may only cost a few dollars or cents per month, or even free. (different service providers charge different)

The software on many servers has integrated the function of using object storage for backup: after the service provider has activated the object storage, you only need to configure the authorization key in the software to use the object storage for backup. If the software does not integrate this backup function, a simple backup can also be implemented manually. For example, use mysqldump to export database files, and use gzip and tar commands to compress and package files for backup. Usually object storage providers also provide command-line tools that can be used to simply upload files to object storage. For example, AWS has aws, which supports S3 operations; Google Cloud Storage has gsutil; Alibaba Cloud OSS has ossutil; Tencent Cloud has tccli, which supports COS operations.

Take a Full Disk Backup of the Server Using a Snapshot

Usually a snapshot backs up all data on the server’s disk. Restoring from snapshots is also very convenient. This doesn’t even require software on the server to support backup functionality. Whether the server disk is damaged, system files are lost, or files are deleted, you can recover from a snapshot.

If possible, it is recommended to use both object storage backup and snapshot backup.

Local Backup

Although cloud backup is simple and durable, the speed of backing up large volumes of data is proportional to bandwidth. What’s more, what the backup needs is the uplink bandwidth, which is usually a fraction of the downlink bandwidth that the operator claims. And just use an ordinary mechanical hard disk for backup, and its speed has reached the speed of a gigabit network, and the gigabit network penetration rate is low and the price is extremely expensive. Therefore, local backup may be more appropriate when encountering one or more situations such as large data volume, limited bandwidth, and limited backup/restore time.

For local backup, you need to do the 321 principle yourself. You’ll need to back up your data to two hard drives (either via a LAN or wired connection) and store one of the hard drives offsite. Many desktop operating systems support backup, you can find the backup function in the control panel of the latest Windows system, and use Time Machine to back up on macOS. It is recommended to configure automatic backup.

If conditions permit, it is recommended to back up more important files to the cloud while backing up locally.

Retention of historical versions

We should keep earlier versions of files in case we need them. Keeping multiple historical versions of files Using incremental backups can help save storage space. If possible, keep as many historical versions as possible.

Earlier versions can have relatively longer intervals in order to save space: like Time Machine in macOS, it keeps hourly backups for the past 24 hours, daily backups for the past month, and Take as many weekly backups as possible over a month, until the disk space fills up.

Some network storages automatically keep historical versions, such as Dropbox, Google Drive, iCloud, etc. Some software also keep historical versions on the local disk. For example, Git keeps a history of every commit.

It is recommended to keep historical versions for important files first, and if possible for all files.

Other

Object Storage Storage Class

Generally, object storage provides multiple storage categories, and different storage categories have different pricing and usage scenarios. Reasonable use of multiple storage categories can save expenses.

Primary Storage Types for Amazon S3

Sorted by storage price from high to low, the durability is 99.999999999%, and they are all available in multiple availability zones.

  • STANDARD: Default, suitable for frequently accessed files
  • STANDARD_IA: The storage unit price is lower (54% by default), but there is an additional retrieval fee. Also, this type is stored for at least 30 days and at least 128kb.
  • GLACIER: The lowest unit price for storage (17% by default), no real-time access, and additional retrieval fees

Backups larger than 128kb and infrequently accessed are recommended to be stored in STANDARD_IA, and early historical versions that are rarely accessed can be stored in GLACIER.

Main Storage Types of Google Cloud Storage

Sorted by storage price from high to low, the durability is 99.999999999%, and they are all available in multiple availability zones.

  • Multi-Regional: Multi-Regional Storage (stronger than Multi-AZ), this storage type will store files in multiple cities/countries within a continent. According to the official website, the uploaded files will be stored in at least two data centers separated by at least 160 kilometers. Suitable for storing frequently accessed files around the world
  • Regional: STANDARD corresponding to S3
  • Nearline: STANDARD_IA for S3, 50% of the regional price, no minimum file size limit
  • Coldline: GLACIER corresponding to S3, with at least 90 days of storage, but supports real-time access, 35% of the regional price, and the retrieval cost is higher than Nearline.

Likewise, infrequently accessed backups are recommended to be stored in Nearline, and earlier historical versions that are rarely accessed can be stored in Coldline.

Main storage types of Alibaba Cloud OSS

Sorted by storage price from high to low, the durability is 99.999999999%, and they are all available in multiple availability zones.

  • Standard Type: STANDARD for S3
  • Low frequency access type: Corresponding to STANDARD_IA of S3, it is 67% of the price of standard type, at least 30 days storage, at least 64kb.
  • Archive: GLACIER for S3, 28% of the standard price, at least 64kb. Storage for at least 60 days, retrieval costs higher than low frequency access.

Similarly, it is recommended to store infrequently accessed backups in the infrequent access type, and the early historical versions that are rarely accessed again can be stored in the archive type.