Archiving data is sometimes confused with backing it up. While both processes are used to store data, the reasons for performing them are very different. Backups are created primarily for recovering systems and data in the aftermath of unforeseen events. Backups make copies of existing data that can be used to easily replace lost or corrupt files.
Archives are used for the long-term storage of specific data elements that are needed to satisfy legal or regulatory requirements. An archive may be the only copy of the given data, making it critically important that it is stored safely and protected against possible loss. Archived data often contains sensitive or personally identifying information, making it imperative that it is stored securely to meet privacy regulations.
Differences in Data Availability
The difference in why backups and archives are created affects issues such as how they are stored and the speed at which they need to be accessed. Backups need to be readily available to address unexpected outages or data loss scenarios. Mission-critical systems are often configured to immediately fail over to backups to avoid or minimize downtime.
Archived data normally does not need to be accessed as quickly as backup data. Information needed to provide evidence to auditors or furnish documentation for corporate lawyers usually has less demanding time requirements, so retrieval can be scheduled in advance and the data made available by the time it is needed.
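Scheduling a retrieval comes down to working backward from the date the data is needed. The following is a minimal Python sketch of that calculation. The function name, the 48-hour retrieval time, the 6-hour safety buffer, and the audit date are all illustrative assumptions, not values taken from any provider.

```python
from datetime import datetime, timedelta


def latest_request_time(needed_by: datetime,
                        retrieval_hours: float,
                        buffer_hours: float = 0.0) -> datetime:
    """Return the latest moment a retrieval can be requested and still finish on time.

    retrieval_hours is the assumed worst-case time the archive tier takes to
    return data; buffer_hours is extra slack for verification or retries.
    """
    return needed_by - timedelta(hours=retrieval_hours + buffer_hours)


# Hypothetical example: auditors arrive at 09:00 on 15 March 2030.
audit_start = datetime(2030, 3, 15, 9, 0)
request_by = latest_request_time(audit_start, retrieval_hours=48, buffer_hours=6)
print(f"Request the archive no later than {request_by:%Y-%m-%d %H:%M}")
```

The same calculation can be rerun with the retrieval time that the chosen storage tier actually documents.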
Choosing a Cloud Archiving Solution
The following factors need to be considered when selecting a cloud archiving solution.
- Data security and durability - Since a single copy of important data is often archived, the provider needs to ensure the data won’t be lost, corrupted, or accessed by unauthorized personnel.
- Integrated compliance management - Archival data subject to regulatory guidelines needs to be appropriately managed by the cloud provider.
- Storage location - In some cases, data needs to be stored in specific geographical regions to satisfy compliance requirements.
- Data access time - Cloud vendors typically provide slower access to archived data than to other types of storage, and retrieval times vary from one vendor and storage tier to another.
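The factors above can be treated as a checklist for screening candidate offerings. The following Python sketch shows one way to do that. Every name in it (`ArchiveOption`, `meets_requirements`, the option names, regions, and certification labels) is hypothetical and invented for illustration. None of the values describe a real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveOption:
    """One candidate archive offering, described by the selection factors."""
    name: str
    regions: frozenset            # storage locations offered
    certifications: frozenset     # compliance attestations offered
    encrypted_at_rest: bool       # stand-in for the security requirement
    max_retrieval_hours: float    # worst-case data access time


def meets_requirements(option: ArchiveOption, *, required_region: str,
                       required_certifications: frozenset,
                       max_wait_hours: float) -> bool:
    """Return True only if the option satisfies every requirement."""
    return (
        option.encrypted_at_rest
        and required_region in option.regions
        and required_certifications <= option.certifications
        and option.max_retrieval_hours <= max_wait_hours
    )


# Hypothetical candidates; all values are made up for the example.
candidates = [
    ArchiveOption("option-a", frozenset({"eu-west", "us-east"}),
                  frozenset({"cert-x", "cert-y"}), True, 12),
    ArchiveOption("option-b", frozenset({"us-east"}),
                  frozenset({"cert-x"}), True, 1),
    ArchiveOption("option-c", frozenset({"eu-west"}),
                  frozenset({"cert-x", "cert-y"}), True, 48),
]

shortlist = [
    c.name for c in candidates
    if meets_requirements(c, required_region="eu-west",
                          required_certifications=frozenset({"cert-x"}),
                          max_wait_hours=24)
]
print(shortlist)
```

In this made-up data set only `option-a` passes, because `option-b` is not offered in the required region and `option-c` takes too long to return data.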
The following are two examples of archive offerings from major cloud providers.
Amazon S3 Glacier and S3 Glacier Deep Archive
This offering by Amazon Web Services (AWS) promises 99.999999999% data durability with comprehensive security and compliance capabilities. The cost of storage can be as low as $1 per terabyte per month. Glacier has three access tiers that range from a few minutes to several hours. Deep Archive has two options that return data in 12 or 48 hours.
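To put the durability and price figures quoted above into perspective, the following Python sketch works through the arithmetic. It assumes the durability figure is an annual, per-object probability and that objects are lost independently. It also counts only the storage price of $1 per terabyte per month and ignores retrieval, request, and transfer fees. The object count, the 50 TB volume, and the seven-year retention period are hypothetical.

```python
from fractions import Fraction

# Figures quoted in the text; Fraction avoids floating-point rounding
# when subtracting a number this close to 1.
DURABILITY_PERCENT = Fraction("99.999999999")
USD_PER_TB_MONTH = Fraction(1)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year (assumes independent losses)."""
    annual_loss_probability = 1 - DURABILITY_PERCENT / 100
    return object_count * annual_loss_probability


def storage_cost_usd(terabytes: int, months: int) -> Fraction:
    """Storage-only cost; retrieval and request fees are not modeled."""
    return terabytes * months * USD_PER_TB_MONTH


# Hypothetical workload: 10 million objects, 50 TB kept for seven years.
losses = expected_annual_losses(10_000_000)
cost = storage_cost_usd(terabytes=50, months=7 * 12)

print(f"Expected objects lost per year: {float(losses)}")
print(f"Years per expected single loss: {1 / losses}")
print(f"Storage cost over seven years: ${cost}")
```

Under these assumptions, an archive of 10 million objects would expect to lose a single object about once every 10,000 years, and storing 50 TB for seven years would cost $4,200 before any retrieval charges.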
Google Cloud Nearline, Coldline, and Archive
Google offers three archival solutions to address the varying needs of accessing archived data. The company promises low latency and high availability for archived data, with a low cost per gigabyte and 99.999999999% durability of objects over a given year. Data is protected by Google-grade security and redundant storage.
The cloud offers a viable method of archiving sensitive data without incurring the capital costs of procuring additional on-premises storage. Organizations with archiving needs should investigate how the cloud can satisfy them.