What is Object Storage?

Sarojini Devi Nagappan
17 March 2020

Object storage is a way of storing data as objects. Each object consists of the data itself, associated metadata, and a unique identifier. The data can be in any format and of any size. The metadata describes the type of data, its owner, security privileges, and any other contextual information about the data. The unique identifier is the key used to identify the object in a distributed system. It also makes it possible to locate an object without knowing its physical location, which allows you to store data in different locations, so you don’t have to provision one large storage space in a single place.
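To make the three parts of an object concrete, here is a minimal in-memory sketch in Python. The class names (StoredObject, FlatObjectStore) and the metadata fields are hypothetical and chosen only for illustration; a real object store distributes this across many machines.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the payload, its metadata, and a unique identifier."""
    data: bytes
    metadata: dict
    # Hypothetical choice: a random UUID stands in for the unique identifier.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FlatObjectStore:
    """A flat namespace: objects are looked up by ID, not by a path."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]


store = FlatObjectStore()
photo_id = store.put(b"\x89PNG...", content_type="image/png", owner="alice")
photo = store.get(photo_id)
print(photo.metadata["owner"])
```

The caller only ever holds the identifier; where the bytes physically live is the store's concern.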

Benefits of Using Object Storage

The metadata stored with each object carries classification information, which makes data analytics on object-based storage easier. Objects live in a flat address space rather than a hierarchy, so an object can be retrieved directly by its identifier without traversing a directory tree.

With object-based storage, you also don’t have to worry about the growing volume of data. You can add as much data as you want: there is no size limit, and the cost of storing data is lower because it can be spread across multiple locations.

Object storage architecture is highly scalable: capacity grows simply by adding nodes. Having no size limit and being scalable both matter for cost reduction, but how exactly is this possible? With object-based storage, copies of objects can be stored in multiple locations, so if a node fails, the data can be retrieved from another location and the risk of downtime is very low.
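The failover behaviour described above can be sketched as follows. This is a simplified illustration, not how any particular product places replicas: the Node and ReplicatedStore classes, the node names, and the placement rule are all assumptions.

```python
class Node:
    """A hypothetical storage node that can be online or offline."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}


class ReplicatedStore:
    """Writes each object to several nodes; reads from any online copy."""

    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    def put(self, object_id, data):
        # Assumed placement rule: derive a starting node from the ID,
        # then write to the next `replicas` nodes in order.
        start = sum(object_id.encode()) % len(self.nodes)
        for i in range(self.replicas):
            node = self.nodes[(start + i) % len(self.nodes)]
            node.objects[object_id] = data

    def get(self, object_id):
        # Any online node holding a copy can serve the read.
        for node in self.nodes:
            if node.online and object_id in node.objects:
                return node.objects[object_id]
        raise KeyError(object_id)


nodes = [Node("eu-1"), Node("us-1"), Node("ap-1")]
store = ReplicatedStore(nodes, replicas=2)
store.put("report-2020", b"quarterly figures")

# Take one node that holds a copy offline; the read still succeeds.
holder = next(n for n in nodes if "report-2020" in n.objects)
holder.online = False
print(store.get("report-2020"))
```

Scaling out in this sketch is just appending another Node to the list; later writes can then land on it.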

Next is the security aspect. The unique identifier on the object can be used to check whether copies are corrupted (for example, when the identifier is derived from a hash of the content), meaning that data protection is already part of the object architecture.
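One way such a check can work is to derive the identifier from the content itself. The sketch below assumes a SHA-256 content hash is used as the identifier; that is an assumption for illustration, and real services may instead store a separate checksum next to the object.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the object's bytes (assumed scheme)."""
    return hashlib.sha256(data).hexdigest()


def is_intact(object_id: str, data: bytes) -> bool:
    """A copy is intact if hashing it reproduces the identifier."""
    return content_id(data) == object_id


original = b"backup-2020-03-17"
object_id = content_id(original)
corrupted_copy = b"backup-2020-03-18"  # one byte differs

print(is_intact(object_id, original))        # True
print(is_intact(object_id, corrupted_copy))  # False
```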

Disadvantages of Using Object Storage

Object storage tends to use the eventual consistency model to achieve high availability. Eventual consistency means that an update becomes consistent across all nodes eventually rather than immediately. To explain this further, the latest version of an object is first stored on one node, and the update is then replicated to the rest of the locations.

This promotes high availability in object-based storage and is good for relatively static data such as videos, images, or unstructured data (which does not change much). 

Consistency is an issue because there is no guarantee that the version of data retrieved is the latest one; hence, object storage is not recommended for transactional data.
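The stale-read risk can be seen in a small simulation. This is a toy model under stated assumptions: one object, three replicas, writes always landing on replica 0, and replication triggered by hand through a hypothetical replicate() method.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0


class EventuallyConsistentObject:
    """One object whose updates reach the other replicas later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, version, value) updates

    def write(self, value):
        # The new version is stored on one node first.
        primary = self.replicas[0]
        primary.version += 1
        primary.value = value
        for idx in range(1, len(self.replicas)):
            self.pending.append((idx, primary.version, value))

    def read(self, replica_index):
        return self.replicas[replica_index].value

    def replicate(self):
        # Deliver queued updates; newer versions overwrite older ones.
        for idx, version, value in self.pending:
            replica = self.replicas[idx]
            if version > replica.version:
                replica.version, replica.value = version, value
        self.pending.clear()


obj = EventuallyConsistentObject()
obj.write("v1")
obj.replicate()
obj.write("v2")

fresh = obj.read(0)   # the node that took the write
stale = obj.read(2)   # a node the update has not reached yet
obj.replicate()
converged = obj.read(2)
print(fresh, stale, converged)  # v2 v1 v2
```

Between the second write and the second replicate() call, a reader hitting replica 2 sees the old version, which is why this model suits rarely changing data better than transactions.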

Best Fit for Object Storage

Object storage is best for unstructured data that does not see high levels of write transactions. Multimedia files, image archives, and backups are the best kinds of data for object storage.

Besides this, if the data storage needs to be geographically distributed, object storage is the best solution. Its metadata can hold the information needed to support parallel access to data distributed across different data centers. With object storage, there is no size limitation, so it is well suited to storing huge amounts of data.

Cloud Object Storage Providers

Amazon Simple Storage Service (Amazon S3)

Amazon S3 is a web-based object storage service provided by Amazon Web Services. It allows you to store and retrieve data as objects through a REST API over HTTPS. The service offers a huge amount of storage coupled with high security, making it suitable for almost any application of any size. It scores well for big data analytics because it can store millions of objects with high durability and provides end-to-end management of the object storage. Amazon S3 has features to organize and manage your data, which is stored as objects in S3 buckets; each object can be up to 5 terabytes in size. Objects in S3 buckets can be organized and shared across the S3 storage classes. It also has features to prevent unauthorised access, secure data, and monitor data at the object and bucket level.

Google Cloud Storage (GCS)

Google Cloud Storage is an online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. GCS is a good choice for small and medium enterprises and is popular among developers because it has a simple programming interface, high capacity, and scalability. Like Amazon S3, GCS has features such as storage management, audit logs, security, and object holds. GCS is priced on the amount of data stored and network egress.

Azure Blob Storage

Azure Blob Storage is the cloud object storage service provided by Microsoft. It stores large amounts of unstructured object data, such as text or binary data, that can be accessed via HTTP or HTTPS, like Amazon S3 and GCS. In Azure, binary objects are called blobs, and the unit of storage is referred to as a container, identified by a unique key. Unlike GCS and Amazon S3, data stored in Azure can be block blobs, append blobs, or page blobs. Block blobs hold text or binary files up to ~5 TB, such as media or image files. Any file larger than 100 MB must be uploaded as smaller blocks, which are then consolidated into a final blob. Page blobs are used for random-access files and provide random read/write access to 512-byte pages. Append blobs are like block blobs except that they are used for append operations and are frequently used to append log information to the same blob. Object versioning is done by taking read-only snapshots of the blobs. To retrieve a read-only snapshot, you can append the date and time value of the snapshot returned by Azure to the blob URL. Azure Blob Storage is priced by the amount of data stored per month, storage account type, replication type, and network egress.
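As a rough illustration of the snapshot addressing described above, the helper below appends a snapshot timestamp to a blob URL as a `snapshot` query parameter. The account name, container, blob name, and timestamp are made up for the example, and the function snapshot_url is a hypothetical helper, not part of any Azure SDK.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def snapshot_url(blob_url: str, snapshot_time: str) -> str:
    """Return the blob URL with the snapshot timestamp appended."""
    parts = urlsplit(blob_url)
    query = parse_qsl(parts.query)
    query.append(("snapshot", snapshot_time))
    # Keep ':' readable in the timestamp instead of percent-encoding it.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


# Hypothetical account, container, blob, and timestamp.
base = "https://exampleaccount.blob.core.windows.net/photos/cat.png"
url = snapshot_url(base, "2020-03-17T08:30:00.0000000Z")
print(url)
```

A request to the resulting URL would still need whatever authorization the storage account requires; this sketch covers only the addressing.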

IBM Cloud Object Storage (COS)

Like the rest, IBM Cloud Object Storage is a highly scalable cloud storage web service that allows the user to manage data via its portal. It is designed for durability and security, which are common features of all cloud storage services. COS uses Information Dispersal Algorithms (IDAs), which separate data into slices and distribute them across data centers. This gives high data security because the complete data is never stored on a single node. The service also allows you to configure data access policies using the CLI or the available UI. There are four storage classes available for storing data according to the workload. The standard class is for active workloads that need frequent access. The vault class is for less frequently accessed data, the cold vault class is ideal for long-term retention, and the flex class is for data with unpredictable usage patterns. COS uses Aspera high-speed transfer to improve data transfer performance, especially on networks with high latency and packet loss.
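The idea of slicing data so that no single node holds the whole object can be sketched with simple XOR parity. This is a deliberately simplified stand-in and not IBM's actual algorithm: it tolerates the loss of only one slice, and the function names disperse and reconstruct are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int = 3):
    """Split data into k equal slices plus one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity], len(data)


def reconstruct(slices, original_len: int) -> bytes:
    """Rebuild the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only recover one missing slice")
    slices = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all remaining slices equals the missing one.
        slices[missing[0]] = reduce(xor_bytes, present)
    return b"".join(slices[:-1])[:original_len]


payload = b"object storage payload"
slices, length = disperse(payload, k=3)
slices[1] = None  # pretend one data center is unreachable
recovered = reconstruct(slices, length)
print(recovered == payload)  # True
```

Each slice would be placed in a different data center, so a reader needs several locations to rebuild the object and losing one location does not lose the data.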

Conclusion

Object storage is important if you have a massive amount of data to store and do not want to worry about its underlying structure.

With current advances in analytics, this type of storage could turn out to be the best solution not only for analytics but for data retention and backup as well. It supports scalability and can be geo-distributed, so organisations can worry less about data recovery and security. 

Many cloud vendors provide private or public cloud services; however, Amazon S3, Google Cloud Storage, Azure Blob Storage, and IBM Cloud Object Storage seem to lead the market at the moment. All of these services support storing large amounts of data, manage the object life cycle with slight variations, and provide encryption on their individual architectures.

The best differentiator is, of course, the cost, but this depends on the configuration the user requires. At a glance, one provider might seem cheaper, but once the selected plan is broken down by service, it may turn out to cost more.