AWS CLI S3 Multithreaded Configuration

I am going to explain how to configure the AWS CLI to use multiple threads when uploading data to and downloading data from Amazon S3. Using multiple threads can improve data transfer performance.

For the demo, as shown in the image below, we are going to use an EC2 instance that reaches S3 through an S3 VPC endpoint. The instance has a 100 GB io1 volume and an IAM role that provides access to the S3 bucket.

Open a terminal session connected to the Amazon EC2 instance. The instance already has the AWS CLI installed.

Generate a 5 GB file on the EC2 instance:

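dd if=/path/to/input of=5GB.file bs=1 count=0 seek=5G
# or
dd if=/dev/zero of=YOUR-FILE-NAME-HERE bs=1 count=0 seek=5G

Either command creates a 5 GB file on the file system. Because count=0 copies no data and seek=5G only sets the file length, the file is sparse: it reports a size of 5 GB but takes up almost no disk space, and it reads back as zeros.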
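To see what dd with count=0 and seek=5G actually produces, the apparent size of the file can be compared with the space it occupies on disk. This is a minimal sketch, assuming GNU coreutils (dd, stat -c, du) on a Linux instance and a file system that supports sparse files; the scratch directory and variable names are hypothetical and not part of the original demo.

```shell
# Create the sparse test file in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/5GB.file" bs=1 count=0 seek=5G 2>/dev/null

# Apparent size in bytes, as reported by the file system (GNU stat syntax).
apparent_bytes=$(stat -c %s "$workdir/5GB.file")

# Space actually allocated on disk, in KiB; near zero when the file is sparse.
allocated_kib=$(du -k "$workdir/5GB.file" | cut -f1)

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_kib KiB"

# Remove the scratch directory again.
rm -rf "$workdir"
```

The upload test is not affected by this: the CLI reads the file like any other and sends the full 5 GB of zero bytes.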

Next, we configure a few settings. The settings shown in this demo apply to the aws s3 commands and not to the aws s3api commands.

Set the maximum number of concurrent requests. We set it to 1 first to measure the transfer time with a single thread:

aws configure set default.s3.max_concurrent_requests 1

Set the size threshold at which the CLI splits an object into a multipart transfer:

aws configure set default.s3.multipart_threshold 64MB
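The effect of the threshold on the test file can be worked out with plain shell arithmetic. This is only an illustrative sketch: it assumes the CLI reads MB as 1024 * 1024 bytes and switches to multipart at or above the threshold, and the variable names are made up for this example.

```shell
# Sizes in bytes; MB is assumed to mean 1024 * 1024 bytes here.
threshold_bytes=$((64 * 1024 * 1024))     # multipart_threshold = 64MB
file_bytes=$((5 * 1024 * 1024 * 1024))    # the 5 GB test file

# Assumption: files at or above the threshold are sent as multipart transfers.
if [ "$file_bytes" -ge "$threshold_bytes" ]; then
  transfer_mode="multipart"
else
  transfer_mode="single request"
fi

echo "5GB.file would be sent as: $transfer_mode"
```

A file smaller than 64 MB would take the other branch and go up in one request, so the concurrency setting would make little difference for it.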

Set the size of each chunk (part) that is uploaded or downloaded in a multipart transfer:

aws configure set default.s3.multipart_chunksize 16MB
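With a 16 MB chunk size, the number of chunks the 5 GB file is split into follows from a ceiling division. The sketch below assumes MB means 1024 * 1024 bytes; the variable names are hypothetical. It also compares the result with the S3 limit of 10,000 parts per multipart upload.

```shell
chunk_bytes=$((16 * 1024 * 1024))         # multipart_chunksize = 16MB
file_bytes=$((5 * 1024 * 1024 * 1024))    # the 5 GB test file
max_parts=10000                           # S3 limit per multipart upload

# Ceiling division: a final partial chunk still counts as one part.
parts=$(( (file_bytes + chunk_bytes - 1) / chunk_bytes ))

echo "parts: $parts"
if [ "$parts" -le "$max_parts" ]; then
  echo "within the $max_parts part limit"
fi
```

These chunks are the units of work that the concurrent requests pick up, so there are plenty of them to keep 10 threads busy.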

Now upload the file from the EC2 instance to S3:

time aws s3 cp 5GB.file s3://s3name/upload2.test

Note the elapsed time that the time command reports for the upload.

Now set the maximum number of concurrent requests to 10:

aws configure set default.s3.max_concurrent_requests 10
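The aws configure set commands store their values in the CLI config file (by default ~/.aws/config). The sketch below writes an equivalent s3 section to a scratch file instead of touching the real config, so the layout can be inspected safely. The scratch file and variable name are hypothetical; AWS_CONFIG_FILE is the environment variable the CLI reads to locate an alternative config file.

```shell
# Write the settings from this demo to a throwaway config file.
scratch_config=$(mktemp)
cat > "$scratch_config" <<'EOF'
[default]
s3 =
  max_concurrent_requests = 10
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
EOF

cat "$scratch_config"

# To try it without changing ~/.aws/config (not run here):
#   AWS_CONFIG_FILE="$scratch_config" aws s3 cp 5GB.file s3://s3name/upload2.test
```

Editing the file directly and running aws configure set should give the same result; the commands are simply easier to script.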

Upload the file from the EC2 instance to S3 again:

time aws s3 cp 5GB.file s3://s3name/upload2.test

The same 5 GB file should now upload much faster, up to roughly 10 times in the best case, because 10 threads upload its chunks in parallel. The actual speedup depends on the network, the volume, and the instance.
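To put a number on the difference between the two runs, the elapsed times can be captured and divided. This is a rough sketch: run_timed is a hypothetical helper, and the two durations below are made-up placeholder values, not measurements from this demo. Replace them with the times from your own uploads.

```shell
# Run a command and store its wall-clock duration (whole seconds) in $elapsed.
run_timed() {
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  elapsed=$((end - start))
}

# Placeholder values, NOT real results. To measure for real, something like:
#   run_timed aws s3 cp 5GB.file s3://s3name/upload2.test; single_s=$elapsed
single_s=200      # hypothetical time with max_concurrent_requests = 1
parallel_s=25     # hypothetical time with max_concurrent_requests = 10

# Integer arithmetic with one decimal place of precision.
speedup_x10=$((single_s * 10 / parallel_s))
speedup="$((speedup_x10 / 10)).$((speedup_x10 % 10))"

echo "speedup: ${speedup}x"
```

Whole-second resolution is coarse, but it is enough for uploads that take tens of seconds or more.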
