Object storage manages data in the form of objects. Each object has its own metadata and unique identifier. The unique identifier can be used to access the object from different machines or devices.
Why is object-based storage popular? The answer lies largely in its ability to scale to virtually unlimited amounts of data. This characteristic is important for data analytics, which involves huge volumes of unstructured data.
The object-based storage architecture gained popularity when mobile and web applications started to store their binary data in cloud storage. AWS S3, Dropbox, and Google Cloud Storage are some popular examples with object storage as their back end. AWS S3 grew to massive scale and popularised object storage when it announced in 2013 that it was storing over two trillion objects. This blog gives an insight into S3 object storage.
Object Storage in Amazon Web Services (AWS)
AWS leveraged the object storage architecture to introduce cloud-based object storage as a service. As mentioned earlier, object-based storage can hold large volumes of data in raw format, and AWS built on this to offer scalable, cost-effective storage services. Amazon Simple Storage Service (Amazon S3) and Amazon Glacier are the two cloud object storage services offered by AWS.
Key benefits of AWS Cloud Object Storage include the following:
- 99.999999999% (eleven nines) data durability (availability may differ according to storage tier)
- Data distributed across a minimum of three geographically separated physical facilities
- Support for different forms of encryption
- Compliance with security standards including PCI-DSS, HIPAA/HITECH, FedRAMP, SEC Rule 17a-4, the EU Data Protection Directive, and many others
- Flexible storage management and administration capabilities
S3 Object Storage
S3 object storage was introduced by Amazon in 2006 via the Amazon S3 service. It is object-level storage, so it cannot host an operating system or serve dynamic sites. A single object can store up to 5 TB of data. Objects are arranged in collections called buckets and are stored and retrieved using simple operations such as PUT, GET, and DELETE. Each S3 object is referred to by a key assigned by the user, which must be unique within its bucket.
An S3 bucket is a container for objects, created and owned by an AWS account. S3 is a global service, but each bucket is created in a specific region and has a globally unique name. Every object must reside in a bucket, and there is no limit to the number of objects per bucket. Each AWS account can create up to 100 buckets by default. Through the API, a bucket must be emptied before it can be deleted, although higher-level tools can empty and delete a bucket in one step. The data model for S3 is flat, so there is no directory structure or hierarchy within a bucket.
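Although the namespace is flat, a folder-like view can be simulated by grouping keys on a delimiter, which is conceptually how S3's list operation treats its prefix and delimiter parameters. Below is a minimal pure-Python sketch of that grouping (the function and sample keys are illustrative, not part of any AWS API):

```python
def list_common_prefixes(keys, prefix="", delimiter="/"):
    """Group flat object keys into the direct 'children' of a prefix,
    mimicking how a delimiter-based listing presents a flat namespace."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Key sits 'deeper' than the prefix: report only the next level.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)

keys = ["logs/2023/jan.txt", "logs/2023/feb.txt", "logs/readme.txt", "index.html"]
print(list_common_prefixes(keys, prefix="logs/"))
# (['logs/readme.txt'], ['logs/2023/'])
```

The "folders" here are purely a presentation of key prefixes; no directory objects exist in the store.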
Each object in the S3 bucket has the following properties:
- Key: the object name, unique within the bucket
- Value: the data itself, which is opaque to S3
- Metadata: information about the data, such as content type, size, and last-modified date; user-defined metadata can also be added
- Version ID: a unique version identifier for the object, assigned when versioning is enabled
- Access Control Information: access control settings for the object
Objects are uniquely identified by their key and version ID. An object's metadata cannot be modified in place after upload; to update it, perform a COPY operation on the object and set the new metadata during the copy. Objects can be retrieved in whole or in part, but an object never leaves its bucket's AWS region unless it is explicitly copied, for example using Cross-Region Replication.
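Because an object's identity is the pair (key, version ID), a versioning-enabled bucket can be pictured as a map from each key to a list of versions, with GET returning the latest version unless a specific version ID is requested. The toy model below illustrates only these identity semantics (class and method names are invented for illustration, not the AWS API):

```python
import itertools

class VersionedBucket:
    """Toy model of S3 versioning semantics: each PUT appends a new
    version; GET returns the latest unless a version ID is given."""
    _ids = itertools.count(1)

    def __init__(self):
        self._store = {}  # key -> list of (version_id, value)

    def put(self, key, value):
        version_id = f"v{next(self._ids)}"
        self._store.setdefault(key, []).append((version_id, value))
        return version_id

    def get(self, key, version_id=None):
        versions = self._store[key]
        if version_id is None:
            return versions[-1][1]          # latest version wins by default
        return dict(versions)[version_id]   # or fetch a specific version

b = VersionedBucket()
v1 = b.put("report.txt", b"draft")
v2 = b.put("report.txt", b"final")
print(b.get("report.txt"))       # b'final'
print(b.get("report.txt", v1))   # b'draft'
```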
S3 Bucket and Object Operations
Several operations can be performed on buckets and objects. The following are some of these operations, with a simple command example for each.
S3 allows listing of all the keys within a bucket.
Command example: list-buckets command to display the names of all your Amazon S3 buckets (across all regions)
$ aws s3api list-buckets --query "Buckets[].Name"
Objects can be retrieved as a whole or partially via Range HTTP header.
Command example: get-object command to download an object from Amazon S3
$ aws s3api get-object --bucket text-content --key dir/my_images.tar.bz2 my_images.tar.bz2
Objects up to 5 GB in size can be uploaded in a single PUT operation; multipart upload must be used for objects larger than 5 GB.
Command example: put-object command to upload an object to Amazon S3
$ aws s3api put-object --bucket text-content --key dir-1/my_images.tar.bz2 --body my_images.tar.bz2
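The 5 GB single-PUT limit and the 5 TB object cap fit together through multipart upload's documented constraints: each part except the last must be between 5 MB and 5 GB, and an upload may have at most 10,000 parts. The sketch below (illustrative helper, names my own) computes how many parts a given part size yields:

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB    # minimum part size (except the final part)
MAX_PART = 5 * GIB    # maximum part size (also the single-PUT cap)
MAX_PARTS = 10_000    # maximum parts per multipart upload

def part_count(object_size: int, part_size: int) -> int:
    """Number of parts needed to upload object_size bytes in part_size chunks."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size outside the allowed 5 MiB - 5 GiB range")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return parts

# A 50 GiB object uploaded in 100 MiB parts:
print(part_count(50 * GIB, 100 * MIB))  # 512
```

Larger objects therefore force larger part sizes: the part size must be at least the object size divided by 10,000.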
An object of up to 5 GB can be copied in a single operation; multipart upload can be used to copy objects up to 5 TB.
Command example: cp command copies a single file to a specified bucket and key
$ aws s3 cp test.txt s3://mybucket/test2.txt
S3 allows the deletion of a single object or multiple objects in a single call.
Command example: delete-object command deletes an object named test.txt from a bucket named my-bucket
$ aws s3api delete-object --bucket my-bucket --key test.txt
S3 Access Points
Access points are named network endpoints attached to buckets, used to perform S3 object operations. Access points can only be used to operate on objects, not on buckets; examples of object operations are GetObject and PutObject. For every request made through an access point, S3 applies that access point's permission and network controls.
Creating Access Points
Example Command using AWS CLI: Create an access point named example-ap for bucket example-bucket in account 123456789012.
$ aws s3control create-access-point --name example-ap --account-id 123456789012 --bucket example-bucket
Request Object via Access Points
Example command using the AWS CLI: request the object my-image.jpg through the access point prod owned by account ID 123456789012 in Region us-west-2
$ aws s3api get-object --key my-image.jpg --bucket arn:aws:s3:us-west-2:123456789012:accesspoint/prod download.jpg
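Note that the bucket argument in the request above is not a bucket name but the access point's ARN, which follows a fixed format of region, account ID, and access point name. A small illustrative helper for building it (the function name is my own):

```python
def access_point_arn(region: str, account_id: str, name: str) -> str:
    """Build an S3 access point ARN of the form
    arn:aws:s3:<region>:<account-id>:accesspoint/<name>"""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"

print(access_point_arn("us-west-2", "123456789012", "prod"))
# arn:aws:s3:us-west-2:123456789012:accesspoint/prod
```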
S3 object storage's capability of handling virtually unlimited data removes the need for costly on-premises storage infrastructure. Organisations can now use object storage as the back end for their applications, and applications built on S3 benefit from its reliability, scalability, and security. While there are many positives to S3 object storage, a number of challenges should also be kept in mind when adopting this storage type. The following are some of them:
- S3 has multiple layers of security, including customer-managed encryption keys. Users are responsible for the safekeeping of these keys; if a key is lost, the data it protects cannot be accessed.
- S3 does not provide locking for synchronized access, so additional application-level coordination is needed to avoid conflicts between concurrent PUT operations on the same object (otherwise the last write wins).
- Accessing and retrieving data from S3 object storage may incur additional request and data-transfer costs.
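One common workaround for the lack of locking is optimistic concurrency: read the object's current version (or ETag), then write only if it has not changed in the meantime. Plain S3 PUTs do not enforce this check for you, so the compare step must live in application code. The toy model below sketches the pattern over an in-memory store (all names are illustrative):

```python
class ConflictError(Exception):
    pass

class OptimisticStore:
    """Toy compare-and-swap over a key/value store, modeling the
    coordination an application must add on top of plain object PUTs."""
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        return self._data.get(key, (0, None))

    def put_if_unchanged(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            # Someone else wrote since we read: refuse instead of clobbering.
            raise ConflictError(f"{key}: version {current_version} != {expected_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1

store = OptimisticStore()
v, _ = store.get("config.json")
store.put_if_unchanged("config.json", b"a", v)       # succeeds, version becomes 1
try:
    store.put_if_unchanged("config.json", b"b", v)   # stale version -> conflict
except ConflictError as e:
    print("conflict:", e)
```

On a conflict the caller re-reads, reapplies its change, and retries, rather than silently overwriting the other writer's data.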