Monday, December 23, 2019

AWS Simple Storage Service (S3)

Before jumping into AWS S3 (Simple Storage Service), we need to understand storage types, object storage, and distributed architecture.


A.      Storage Types

 Storage is typically divided into three major types.

1.       Block Storage

Data is stored in evenly sized blocks. When you update data, only the required blocks are updated, which makes updates fast.
Block storage provides the lowest latency, the highest performance, and high redundancy.
Block storage is suitable for transactional databases, random read/write workloads, and structured databases.


2.       File Storage

Data is stored as a single file of information in a folder, so if you want to update the information you need to overwrite the whole file. Files are stored in folders, and folders can be nested under other folders, so we organize files in directories and sub-directories. Each file has a limited set of metadata, such as name, creation date, modified date, and created by.
File storage works fine when you have a limited number of files; as the number of files grows, performance drops drastically, and you will face issues when searching for and updating files.

3.       Object Storage


Object storage comes into the picture to overcome the limitations of file storage.
In object storage, each file is a separate object. Objects are stored in a flat address space, which means there is no hierarchy concept in object storage: all objects are stored at the same level.
In object storage, a file is bundled into an object along with metadata tags and a unique identifier.
Because object storage keeps the object (file), its metadata, and its unique identifier together, it is ideally suited to a distributed storage architecture and can be scaled easily with cheap hardware compared to block and file storage.
Object storage cannot be mounted as a drive on a virtual server; you can access the objects (files) only through an API or the command line.



B.      Distributed architecture


Data processing and data storage are not kept on a single machine; rather, they are distributed over several independent machines. These machines can be located in different locations and are connected via a network.

C.      Simple Storage Service (S3)


S3 is the object-based storage service offering of AWS. S3 has a distributed architecture where objects are stored in multiple locations on AWS infrastructure.
Using the API, SDKs, and the AWS console, you can store and retrieve any amount of data at any time, from anywhere, over the internet.
You can store an unlimited number of objects in S3, and the size of an object can be from 0 bytes to 5 TB.

There are two main building blocks/components of Simple Storage Service (S3).

1.       Bucket


A Bucket is a flat container of objects; it does not provide any hierarchical structure. If you want to store objects in S3, you first need to create a Bucket. A Bucket is region specific in AWS.

a.       A Bucket is a flat container of objects, but you can create logical folders inside a Bucket.

b.      You can store an unlimited number of objects in a Bucket.

c.       You cannot create nested Buckets.

d.      By default, you can create a maximum of 100 Buckets per account, but this is a soft limit and can be increased.

e.       Ownership of a Bucket cannot be transferred.

f.        A Bucket name is globally unique across all regions and accounts: once someone chooses a Bucket name, you cannot use that name. However, if that Bucket is deleted, the same name becomes available for use again.

g.       A Bucket cannot be renamed.

h.      Bucket names have a specific naming convention that you have to follow while creating a Bucket.
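
As a minimal sketch, here is how a region-specific Bucket could be created with the boto3 Python SDK (the SDK choice, bucket name, and region below are illustrative assumptions, not part of the original post):

    import boto3

    # Hypothetical name and region; bucket names must be globally
    # unique and follow the S3 naming convention.
    BUCKET = "my-example-bucket-2019"
    REGION = "eu-west-3"

    s3 = boto3.client("s3", region_name=REGION)

    # A Bucket is region specific: outside us-east-1 you must pass a
    # LocationConstraint that pins the Bucket to the chosen region.
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )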


2.       Object


a.       The size of an object can be from 0 bytes to 5 TB.

b.      Each object is accessible by a unique ID (name or key).

c.       Using a combination of the properties below, you can uniquely identify an object in S3.
          i.   Service endpoint
         ii.   Bucket name
        iii.   Object key (name)
         iv.   Object version (optional)

d.      A Bucket is region specific and an object is stored in a Bucket, so an object is also region specific. Objects never leave that region unless you intentionally move them to another region or enable the CROSS REGION REPLICATION property. We will discuss CROSS REGION REPLICATION in a later section of this blog.

e.       S3 provides high data durability: objects are redundantly stored in multiple facilities within the region where the Bucket was created.
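
A short boto3 sketch (the SDK, bucket, and key names are assumptions for illustration) showing how those properties combine to address one object:

    import boto3

    # Region selects the service endpoint.
    s3 = boto3.client("s3", region_name="eu-west-3")

    # Bucket name + object key (+ optional version) uniquely identify the object.
    response = s3.get_object(
        Bucket="my-example-bucket-2019",
        Key="reports/2019/summary.txt",
        # VersionId="...",  # optional: fetch a specific version
    )
    body = response["Body"].read()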


3.       Storage Classes


Now we have an idea of what Buckets and objects are. We store our files in the form of objects in a Bucket. While we upload objects to a Bucket, AWS asks for a storage class.
We have different types of data in our environments, and each dataset serves a different purpose, so each has different requirements in terms of durability and availability. AWS therefore provides different storage classes, each with different durability, availability, and cost. If you go for high availability and durability, you have to pay a higher cost.

Below are the different storage classes provided by AWS for S3 Buckets.

a.       S3-Standard
b.      S3-Standard Infrequent Access
c.       S3 One Zone Infrequent Access
d.      S3 Reduced Redundancy
e.       S3 Intelligent Tiering
f.        S3 Glacier
g.       S3 Glacier Deep Archive
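
You choose the class per object at upload time. A minimal boto3 sketch (SDK and names are illustrative assumptions):

    import boto3

    s3 = boto3.client("s3")

    # StorageClass selects the class per object, e.g. STANDARD,
    # STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER,
    # DEEP_ARCHIVE, REDUCED_REDUNDANCY.
    s3.put_object(
        Bucket="my-example-bucket-2019",
        Key="archive/old-report.txt",
        Body=b"infrequently accessed data",
        StorageClass="STANDARD_IA",
    )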

1.       S3-Standard


If your data is frequently accessed and you need high durability along with high availability, then S3-Standard is your answer.

1.       It provides eleven nines durability, i.e. 99.999999999%.

2.       It provides four nines availability, i.e. 99.99% over a given year.

3.       Three copies of each object are created in different Availability Zones.

2.       S3-Standard IA (Infrequent Access)


If you don't access your data frequently and high availability is not a hard requirement, but you still need high durability and reasonably good availability, then the S3-Standard IA storage class is a good option for storing the data.

1.       It provides eleven nines durability, i.e. 99.999999999%.

2.       It provides three nines availability, i.e. 99.9% over a given year.

3.       The minimum storage duration is 30 days: if you put files in a Bucket with this storage class, you pay for at least 30 days of storage, even if you delete an object after one day. This is one reason it suits long-lived objects.

4.       The minimum billable object size is 128 KB: if you upload an object smaller than 128 KB, AWS charges you for 128 KB.

5.       Per-GB retrieval charges apply.

6.       Three copies of each object are created in different Availability Zones.
 

3.       S3 One Zone IA


This class is your option if your data is long lived, infrequently accessed, and non-critical (can be reproduced in case of data loss).

1.       It provides eleven nines durability, i.e. 99.999999999%.

2.       It provides 99.5% availability over a given year.

3.       The minimum storage duration is 30 days.

4.       The minimum billable object size is 128 KB: if you upload an object smaller than 128 KB, AWS charges you for 128 KB.

5.       Per-GB retrieval charges apply.

6.     Only one copy of each object is created, so in case of a Zone failure your data is completely lost. That is why AWS suggests using this class for non-critical data that can be reproduced if lost.


4.       S3 Reduced Redundancy (RRS)


This is not an AWS-recommended storage class. It was created for frequently accessed, non-critical data.

1.       It provides four nines durability, i.e. 99.99%.

2.       It provides four nines availability, i.e. 99.99% over a given year.

3.       Three copies of each object are created in different Availability Zones.

S3-Standard is a more cost-effective storage class than Reduced Redundancy. This is the reason this class is not recommended for use; it is expected to be removed from the storage class line-up soon.

5.       S3 Intelligent Tiering 


If the data access pattern is not predictable and the data is long lived, then S3 Intelligent Tiering is a good option. Data resides between two storage classes: AWS moves objects between the S3 Standard and S3 Standard IA tiers according to how each object is accessed.
If an object is infrequently accessed, AWS moves it from the S3 Standard tier to the S3 Standard IA tier, and vice versa. An object smaller than 128 KB is never moved to the S3 Standard IA tier.

Because AWS applies this logic to change the tier automatically, AWS charges for monitoring and automation.

1.       It provides eleven nines durability, i.e. 99.999999999%.

2.       It provides three nines availability, i.e. 99.9% over a given year.

3.       The minimum storage duration is 30 days.

4.       An extra fee applies for monitoring and automation per object.

5.       Three copies of each object are created in different Availability Zones.

6.       S3 Glacier


It is a solution for long-term backup/archival of data, with a retrieval time ranging from minutes to hours.

1.       It provides eleven nines durability, i.e. 99.999999999%.

2.       It provides four nines availability, i.e. 99.99% over a given year.

3.       The minimum storage duration is 90 days.

4.       Per-GB retrieval charges apply, but 10 GB of retrieval per month is free with an account.

5.       Three copies of each object are created in different Availability Zones.

7.       S3 Glacier Deep Archive


It is a solution for archiving rarely accessed data, with a retrieval time between 12 and 48 hours.

1.       It provides eleven nines durability, i.e. 99.999999999%.

2.       It provides four nines availability, i.e. 99.99% over a given year.

3.       The minimum storage duration is 180 days.

4.       Per-GB retrieval charges apply.

5.       Three copies of each object are created in different Availability Zones.

6.       Cost is approximately 75% less than the S3 Glacier storage class.

4.       S3 Bucket Versioning

Bucket versioning helps you keep multiple versions of an object in the same Bucket. It helps keep your objects safe from accidental deletion or overwrite. It is also used for data retention and archiving older data.

1.       When versioning is enabled and you overwrite an existing object, S3 automatically creates a new version of the object. You can also access the older versions whenever required.

2.       When versioning is enabled and you try to delete an object, a delete marker is placed on the object. You can still view the object and the delete marker. If you want to recover the deleted object, you just need to delete the delete marker and the object will be available again.

3.       Versioning applies to all the objects in the Bucket.

4.       If we enable versioning on an existing Bucket that already has objects, versioning will protect the existing and new objects and maintain their versions as they are updated.

5.       Only the S3 Bucket owner can permanently delete an object once versioning is enabled.

6.       Once you enable versioning on a Bucket, you cannot disable it; however, you can suspend it. Below are the states of Bucket versioning.
a.       Un-versioned
b.      Enabled
c.       Suspended

7.       If you suspend versioning, existing object versions remain as they are; however, objects will not be versioned in future updates.

8.       If you GET an object and do not pass a version, by default S3 returns the most recent version of the object.
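
A minimal boto3 sketch (SDK and bucket name are illustrative assumptions) that enables versioning and lists the versions of the objects:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-example-bucket-2019"

    # Enable versioning on the Bucket (states: Enabled / Suspended).
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # List all versions (and delete markers) of objects in the Bucket.
    versions = s3.list_object_versions(Bucket=BUCKET)
    for v in versions.get("Versions", []):
        print(v["Key"], v["VersionId"], v["IsLatest"])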

5.       S3 Bucket MFA (Multi-Factor Authentication) Delete


MFA Delete is another level of security to protect an S3 Bucket. You cannot enable MFA Delete via the console; it can only be enabled via the command line or API.
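
A hedged boto3 sketch of how MFA Delete is typically turned on together with versioning (the bucket name, MFA device ARN, and token below are placeholders, and the call must come from the Bucket owner):

    import boto3

    s3 = boto3.client("s3")

    # MFA Delete rides on the versioning configuration; the MFA argument
    # is "<mfa-device-arn> <current-token>" (placeholders below).
    s3.put_bucket_versioning(
        Bucket="my-example-bucket-2019",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
        MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
    )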


6.       S3 Consistency Level


Before discussing the S3 consistency level, we need to understand what data consistency is and the types of data consistency.


Data Consistency


In a distributed architecture, the same data is stored on multiple nodes and may be read from different nodes at the same time. The consistency level refers to how consistent the returned data is.
There are two types of consistency.

1.       Strong/Immediate Consistency:

If we update an object on any node, it is updated on all other nodes before the object is available to read. This means that for some period the object is not available. It requires a blocking mechanism: the object is blocked for reads until the data has been updated on all nodes, so every consumer sees the same object.

2.       Eventual Consistency:

No blocking mechanism is required for eventual consistency. If an object is updated on one node, an immediate read from a different node may not return the updated object.

1.       S3 provides strong/immediate consistency for new objects (PUT): if you upload a new object to S3, S3 provides a strong (read-after-write) consistency level.

2.       S3 provides eventual consistency for overwrites of existing objects.

3.       S3 provides eventual consistency for deletes of existing objects.


7.       S3 Encryption


You can encrypt S3 data at rest. There are two ways to encrypt data in S3.

1.       Server side Encryption (SSE)

Data is encrypted by the S3 service before being stored on disk. There is no extra cost to use this feature. There are three ways to achieve server-side encryption.

a.       SSE-S3

Data is encrypted by the S3 service using S3-managed encryption keys. S3 regularly rotates the master key and uses AES-256 encryption.

b.      SSE-KMS

Data is encrypted by the S3 service using AWS KMS encryption keys.

c.       SSE-C

Data is encrypted by the S3 service using a client-provided encryption key. AWS never stores the key, so if the client loses the key, the object can never be accessed again.


2.       Client side Encryption:

             The client encrypts the data on their side and then uploads/transfers the data into S3.
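
A small boto3 sketch (SDK, bucket, keys, and the KMS key alias are illustrative assumptions) showing how the server-side options are requested per object:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-example-bucket-2019"

    # SSE-S3: S3-managed keys, AES-256.
    s3.put_object(
        Bucket=BUCKET, Key="sse-s3.txt", Body=b"data",
        ServerSideEncryption="AES256",
    )

    # SSE-KMS: keys managed in AWS KMS (the key alias is a placeholder).
    s3.put_object(
        Bucket=BUCKET, Key="sse-kms.txt", Body=b"data",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/my-example-key",
    )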

8.       S3 Static Website


With the help of this feature, you can host your static content websites on S3.

1.       S3-hosted static websites scale automatically to meet demand; you don't need any load balancer to scale.

2.       There are no extra charges to host static websites on S3.

3.       You can use your own domain name with an S3-hosted static website.

4.       An S3-hosted website does not support HTTPS; it works only over HTTP.

5.       Below are the two formats of S3-hosted static website URLs.

a.       Format-1

http://<bucket-name>.s3-website-<AWS-region>.amazonaws.com

Example: http://mybucket.s3-website-eu-west-3.amazonaws.com

b.      Format-2

http://<bucket-name>.s3-website.<AWS-region>.amazonaws.com

Example: http://mybucket.s3-website.eu-west-3.amazonaws.com
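
A minimal boto3 sketch (SDK and names are illustrative assumptions) enabling website hosting on a Bucket:

    import boto3

    s3 = boto3.client("s3")

    # Turn the Bucket into a static website endpoint; index.html and
    # error.html are assumed to already exist as objects in the Bucket.
    s3.put_bucket_website(
        Bucket="mybucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )

The objects themselves must also be publicly readable for the site to serve them.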


9.       Pre-Signed URLs


Pre-signed URLs are used to provide temporary access to specific objects to people who don't have AWS credentials.
An expiry date and time is associated with each pre-signed URL.
Pre-signed URLs can be used for downloading or uploading objects.
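
A boto3 sketch (SDK, bucket, and key are illustrative assumptions) that generates a download URL valid for one hour:

    import boto3

    s3 = boto3.client("s3")

    # Anyone holding this URL can GET the object until it expires.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-example-bucket-2019", "Key": "reports/summary.txt"},
        ExpiresIn=3600,  # seconds
    )
    print(url)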

10.   CRR (CROSS REGION REPLICATION)


This is a Bucket-level replication feature that enables automatic, asynchronous copying of objects to a Bucket in a different region, in the same or a different account.

Use cases:
a.       Low-latency access to objects
b.       Compliance requirements (keep the data in at least two regions)
                       
1.       To apply CROSS REGION REPLICATION, versioning must be enabled on both the source and destination Buckets.

2.       Replication can happen to only one destination Bucket.

3.       If you are setting up CROSS REGION REPLICATION across accounts, then the source Bucket owner must have permission to replicate objects into the destination Bucket.

4.       Tags are replicated along with the object, if any.

5.       Objects encrypted with SSE-C (server-side encryption with a client key) cannot be replicated, and objects encrypted with SSE-KMS (server-side encryption with KMS keys) are not replicated by default.

6.       Objects in the source Bucket that are themselves replicas created by another CROSS REGION REPLICATION process are not replicated to the destination Bucket.
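
A hedged boto3 sketch of a minimal replication rule (the bucket names and the IAM role ARN are placeholders; both Buckets must already exist with versioning enabled):

    import boto3

    s3 = boto3.client("s3")

    # The role grants S3 permission to replicate on your behalf.
    s3.put_bucket_replication(
        Bucket="source-bucket-eu-west-3",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111122223333:role/s3-crr-role",
            "Rules": [{
                "Status": "Enabled",
                "Prefix": "",  # empty prefix: replicate every object
                "Destination": {"Bucket": "arn:aws:s3:::dest-bucket-us-east-1"},
            }],
        },
    )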


       

11.   S3 Transfer Acceleration


This is used to accelerate object uploads into an S3 Bucket from users over long distances. If you have a Bucket in a US region and you are trying to upload an object from India, it will take a long time; S3 Transfer Acceleration reduces this upload time.

1.       Once S3 Transfer Acceleration is enabled, you cannot disable it; however, you can suspend it.

2.       S3 Transfer Acceleration uses CloudFront edge locations. Once data arrives at the CloudFront edge location nearest to the user, it is copied to the destination Bucket over an optimized network path.

3.       No data is stored at the CloudFront edge location.

4.       It is not HIPAA compliant.
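
A boto3 sketch (SDK, bucket, and file names are illustrative assumptions) enabling acceleration and then uploading through the accelerate endpoint:

    import boto3
    from botocore.config import Config

    BUCKET = "my-example-bucket-2019"

    # Enable acceleration on the Bucket (states: Enabled / Suspended).
    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=BUCKET,
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # Clients must opt in to the accelerate endpoint for uploads to use it.
    fast_s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    fast_s3.upload_file("big-file.bin", BUCKET, "uploads/big-file.bin")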

12.   S3 Access

You can grant S3 Bucket/object permissions to:

1.       Individual users   

2.       AWS accounts

3.       Everyone (make the resource public)

4.       All authenticated users (users with AWS credentials)


13.   Bucket/Object Access path


To access Buckets/objects via the SDK/API, AWS provides a URL for each Bucket/object. AWS S3 provides two styles of paths for accessing S3 Buckets/objects.

1.       Virtual-Hosted-Style URLs

Below is the format:

https://<bucket-name>.s3.<region-name>.amazonaws.com/<object-name>

Example:
         https://mybucket.s3.us-east-2.amazonaws.com/object.txt

2.       Path-style URLs

Below is the format:

https://s3-<region-name>.amazonaws.com/<bucket-name>/<object-name>

Example:
         https://s3-us-east-2.amazonaws.com/mybucket/object.txt

Note:
1.       If your region is us-east-1 (N. Virginia), then you don't need to pass the region name in the URL.
2.       Support for the path-style model ends on September 30, 2020.



14.   S3 Object Multipart Upload



Multipart upload is used to upload an S3 object in parts, and these parts are uploaded in parallel. You can upload a maximum object size of 5 GB in one PUT request; if the object is larger than 5 GB, multipart upload is required to upload it.

1.       The recommended object size for multipart upload is larger than 100 MB; however, you can use it for objects starting from 5 MB.


2.       Use cases:

Higher throughput – we can upload parts in parallel.

Easier error recovery – we need to re-upload only the failed parts.
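
A boto3 sketch (SDK, bucket, and file names are illustrative assumptions) where the transfer manager performs the multipart upload automatically above a threshold:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Files above multipart_threshold are split into parts of
    # multipart_chunksize and uploaded in parallel.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,  # 100 MB
        multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
        max_concurrency=8,
    )
    s3.upload_file("big-video.mp4", "my-example-bucket-2019",
                   "videos/big-video.mp4", Config=config)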

15.   S3 Server Access Logging

This S3 feature lets you record the requests made to access a Bucket and save those log records into an S3 Bucket. The Bucket on which you enable Server Access Logging is called the Source Bucket, and the Bucket where you store the logs is called the Destination Bucket. The Source Bucket and Destination Bucket can be the same.

1.       These detailed logs provide information such as the requester, Bucket name, request time, request action, response code, and error code if applicable.

2.       By Default, Server Access Logging is disabled.

3.       The Destination Bucket (the Bucket where you store the logs) must exist in the same region as the Source Bucket (the Bucket on which you enable Server Access Logging). The recommendation is to use a dedicated Destination Bucket rather than logging a Bucket into itself.

4.       There can be a delay before the access logs are delivered to the Destination Bucket.

5.       Once enabled, you can disable it at any time when required.
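
A boto3 sketch (SDK and bucket names are illustrative assumptions; the Destination Bucket must already grant the S3 log delivery group write access):

    import boto3

    s3 = boto3.client("s3")

    # Deliver access logs for the Source Bucket under the logs/ prefix
    # of the Destination Bucket (same region as the source).
    s3.put_bucket_logging(
        Bucket="my-example-bucket-2019",
        BucketLoggingStatus={
            "LoggingEnabled": {
                "TargetBucket": "my-example-log-bucket",
                "TargetPrefix": "logs/",
            }
        },
    )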
                                                            

16.   S3 Bucket Life Cycle Policy


With the help of a Life Cycle Policy, you can define actions on objects during their lifetime, e.g.:

a.       Move objects to another storage class.

b.       Archive objects.

c.       Delete objects after specific time periods.

1.       You can apply a life cycle policy to all the objects in a Bucket or to a subset of objects, based on versions or a name prefix, as in the sketch below.
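
A boto3 sketch (SDK, bucket name, and prefix are illustrative assumptions) of a rule that tiers objects down and finally expires them:

    import boto3

    s3 = boto3.client("s3")

    # Objects under logs/ move to Standard-IA after 30 days, to Glacier
    # after 90 days, and are deleted after 365 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket-2019",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "tier-then-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }]
        },
    )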

17.   S3 Event Notifications


When certain events occur on a Bucket, you can configure automatic notifications to the AWS services below.

a.       SNS (Amazon Simple Notification Service)

b.      SQS (Amazon Simple Queue Service)

c.       AWS Lambda Function

1.       This is a Bucket-level configuration, and you can configure multiple events as required.

2.       No extra charge applies to configure Event Notifications on S3.

3.       You can configure notifications for events such as object created, object removed, etc.
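
A boto3 sketch (SDK, bucket name, and Lambda ARN are placeholders) wiring object-created events to a Lambda function:

    import boto3

    s3 = boto3.client("s3")

    # The Lambda function must already permit S3 to invoke it.
    s3.put_bucket_notification_configuration(
        Bucket="my-example-bucket-2019",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:eu-west-3:111122223333:function:on-upload",
                "Events": ["s3:ObjectCreated:*"],
            }]
        },
    )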



