AWS CLI S3 Multithreaded Configuration
How to speed up uploads to and downloads from S3 with a few AWS CLI configuration settings.
I am going to explain how to configure the AWS CLI to use multiple threads when uploading data to and downloading data from Amazon S3. Transferring with multiple threads can improve data transfer performance.
For the demo, as shown in the image below, we are going to use an EC2 instance that reaches S3 through an S3 VPC endpoint. The instance has a 100 GB io1 volume and an IAM role that provides access to the S3 bucket.
Open a terminal session connected to the Amazon EC2 instance. The instance already has the AWS CLI installed.
Generate a 5 GB file on the EC2 instance.
dd if=/path/to/input of=5GB.file bs=1 count=0 seek=5G
#or#
dd if=/dev/zero of=YOUR-FILE-NAME-HERE bs=1 count=0 seek=5G
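Either command creates a 5 GB file on the file system. Because count=0 copies no data and seek=5G only sets the file length, the file is sparse: it reports a size of 5 GB but occupies almost no disk space. The upload still transfers the full 5 GB.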
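To confirm the file's size before uploading, a small sketch like the following may help. The helper names `make_sparse_file` and `apparent_bytes` are hypothetical and not part of the original walkthrough; the sketch assumes a POSIX shell and a `dd` that accepts a plain byte count for `seek`.

```shell
#!/bin/sh
# Hypothetical helper: create a file of a given length using the same dd
# technique as above (no data is copied, only the length is set).
make_sparse_file() {
    # $1 = output path, $2 = size in bytes
    dd if=/dev/zero of="$1" bs=1 count=0 seek="$2" 2>/dev/null
}

# Hypothetical helper: print the file's length in bytes, which is the
# amount of data an upload would send.
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Example (commented out so nothing is created by accident):
#   make_sparse_file 5GB.file 5368709120   # 5 * 1024^3 bytes
#   apparent_bytes 5GB.file                # length of the file in bytes
#   du -h 5GB.file                         # disk space actually allocated
```

Comparing the output of `apparent_bytes` with `du -h` shows the difference between the file's length and the disk space it really uses.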
Next, we are going to configure a few settings. The settings shown in this demo apply to the high-level aws s3 commands (such as cp and sync) and not to the aws s3api commands.
Command to set the maximum number of concurrent requests (we set it to 1 first to measure the actual time taken with a single thread).
aws configure set default.s3.max_concurrent_requests 1
Command to set the file size threshold above which the CLI transfers an object in multiple parts.
aws configure set default.s3.multipart_threshold 64MB
Command to set the size of each chunk (part) that the CLI uploads or downloads in a multipart transfer.
aws configure set default.s3.multipart_chunksize 16MB
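The settings above determine how the 5 GB file is split. As a rough sketch of the arithmetic, the hypothetical helpers below (`part_count` and `upload_mode` are illustrative names, not AWS CLI commands) estimate the number of parts and whether a file crosses the threshold. Two assumptions are made: "MB" means 1024 * 1024 bytes, and a file exactly at the threshold counts as multipart.

```shell
#!/bin/sh
# Hypothetical helpers for estimating how an upload will be split.
# All sizes are in bytes. Assumption: 1 MB = 1024 * 1024 bytes.

MB=$((1024 * 1024))
GB=$((1024 * MB))

# Number of parts = ceiling(file_size / chunk_size).
part_count() {
    # $1 = file size in bytes, $2 = chunk size in bytes
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Prints "multipart" when the file size reaches the threshold, else "single".
# Assumption: a file exactly at the threshold is treated as multipart.
upload_mode() {
    # $1 = file size in bytes, $2 = multipart threshold in bytes
    if [ "$1" -ge "$2" ]; then
        echo "multipart"
    else
        echo "single"
    fi
}

part_count $((5 * GB)) $((16 * MB))    # prints 320
upload_mode $((5 * GB)) $((64 * MB))   # prints multipart
```

With a 16 MB chunk size the 5 GB file becomes 320 parts, comfortably below the S3 limit of 10,000 parts per multipart upload, so there is plenty of work to spread across concurrent requests.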
Now upload the file to S3 from the EC2 instance.
time aws s3 cp 5GB.file s3://s3name/upload2.test
Note the time the upload took, as reported by the time command.
Now set the number of concurrent requests allowed to 10, which is also the CLI's default value.
aws configure set default.s3.max_concurrent_requests 10
Now upload the same file to S3 from the EC2 instance again.
time aws s3 cp 5GB.file s3://s3name/upload2.test
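The same 5 GB file should now upload considerably faster, because 10 threads are uploading its chunks in parallel. In the ideal case the upload takes about one tenth of the time, but the actual gain depends on network bandwidth, the instance type, and disk throughput.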
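To check the speedup on your own setup instead of assuming it, the two runs can be timed and compared in a script. The sketch below is only an illustration: `elapsed_seconds` and `speedup` are hypothetical helper names, timing is in whole seconds, and the example at the bottom reuses the bucket name from the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the two upload runs.

# Run a command and print its wall-clock duration in whole seconds.
# The command's own output is sent to stderr so that only the number
# is captured by $(...). The command's exit status is not checked.
elapsed_seconds() {
    start=$(date +%s)
    "$@" 1>&2
    end=$(date +%s)
    echo $((end - start))
}

# Ratio of two durations, to one decimal place ("n/a" if the second is 0).
speedup() {
    # $1 = seconds with 1 concurrent request, $2 = seconds with 10
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b == 0) { print "n/a" } else { printf "%.1f\n", a / b }
    }'
}

# Example (commented out; requires the AWS CLI, credentials, and the bucket):
#   aws configure set default.s3.max_concurrent_requests 1
#   t1=$(elapsed_seconds aws s3 cp 5GB.file s3://s3name/upload2.test)
#   aws configure set default.s3.max_concurrent_requests 10
#   t10=$(elapsed_seconds aws s3 cp 5GB.file s3://s3name/upload2.test)
#   echo "speedup: $(speedup "$t1" "$t10")x"
```

Running each configuration more than once and comparing the results gives a more reliable number than a single pair of runs.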