S3 multipart upload

Multipart upload allows uploading large objects in parts. This post shows how the feature works on ECS (Dell EMC Elastic Cloud Storage). The simple s3curl tool is used to test the S3 APIs.

An application uploads each object in three steps:

  • Initiate
  • Upload parts
  • Complete

When the multipart upload is completed, ECS assembles the original object from the uploaded parts.
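
The three steps map onto three kinds of HTTP request against the object's URL. The Python sketch below only lists the method and URL of each request, mirroring the s3curl commands used in this post. The helper name multipart_requests is hypothetical, the Upload ID is passed in for illustration (in practice it comes back from the Initiate response), and request signing, which s3curl handles, is left out.

```python
from urllib.parse import quote, urlencode


def multipart_requests(endpoint, bucket, key, upload_id, part_count):
    """Return (method, url) pairs for Initiate, Upload Part x N and Complete.

    Hypothetical helper: request signing is omitted (s3curl adds the
    Authorization header itself).
    """
    base = f"{endpoint}/{bucket}/{quote(key)}"
    requests = [("POST", f"{base}?uploads")]  # 1. Initiate: returns the Upload ID
    for number in range(1, part_count + 1):  # 2. Upload parts, numbered from 1
        query = urlencode({"uploadId": upload_id, "partNumber": number})
        requests.append(("PUT", f"{base}?{query}"))
    # 3. Complete: the request body lists every part number with its ETag
    requests.append(("POST", f"{base}?{urlencode({'uploadId': upload_id})}"))
    return requests


for method, url in multipart_requests("http://10.0.0.1:9020", "bucket", "multipartobject",
                                      "96cd4b1fc64f41f182698d6939be7204", 3):
    print(method, url)
```

All three steps share the same object URL; only the HTTP method and the query parameters (uploads, uploadId, partNumber) change.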

Initiate

This operation initiates a multipart upload and returns an Upload ID.

The Upload ID associates all parts with a specific multipart upload. You must specify it in every subsequent Upload Part request, and it must also be included in the final request that completes the multipart upload.

  • Create a multipart upload for the key multipartobject.
  • Get the Upload ID: 96cd4b1fc64f41f182698d6939be7204
# ./s3curl.pl --id=ecsid -- -s -X POST "http://10.0.0.1:9020/bucket/multipartobject?uploads" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
</InitiateMultipartUploadResult>

See the ECS REST API reference for details on the HTTP methods and URL parameters used here.
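
Scripts that drive the upload need the Upload ID from the Initiate response instead of copying it by hand. A minimal Python sketch, assuming the response body is available as bytes and uses the S3 namespace shown above; the function name upload_id_from_initiate is hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace used by the S3 responses shown in this post
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def upload_id_from_initiate(body):
    """Extract the UploadId from an InitiateMultipartUploadResult body (bytes)."""
    root = ET.fromstring(body)
    upload_id = root.findtext("s3:UploadId", namespaces=S3_NS)
    if not upload_id:
        raise ValueError("response contains no UploadId")
    return upload_id


response = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
</InitiateMultipartUploadResult>"""

print(upload_id_from_initiate(response))  # 96cd4b1fc64f41f182698d6939be7204
```

The namespace prefix matters: because the root element declares a default namespace, a plain findtext("UploadId") would find nothing.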

  • List all in-progress multipart uploads in the bucket.
  • Only one exists so far.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/?uploads" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListMultipartUploadsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <MaxUploads>1000</MaxUploads>
  <IsTruncated>false</IsTruncated>
  <Upload>
    <Key>multipartobject</Key>
    <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
    <RequestInitiator>
      <ID>objuser1</ID>
      <DisplayName>objuser1</DisplayName>
    </RequestInitiator>
    <Owner>
      <ID>objuser1</ID>
      <DisplayName>objuser1</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
    <Initiated>2017-05-05T15:14:57.791Z</Initiated>
  </Upload>
</ListMultipartUploadsResult>
  • The object has not been created yet; the bucket is still empty.
# ./s3curl.pl --id=ecsid -- -s http://10.0.0.1:9020/bucket/ | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>bucket</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <ServerSideEncryptionEnabled>false</ServerSideEncryptionEnabled>
</ListBucketResult>

Upload parts

This operation uploads a part in a multipart upload.

A part number uniquely identifies a part and also defines its position within the object. Part numbers can be any number from 1 to 10,000.

Note: If you upload a new part using the same part number that was used with a previous part, the previously uploaded part is overwritten.
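
The part-number rules above can be modelled as a dictionary keyed by part number. This is a toy Python sketch of the semantics only, not how ECS stores parts; the class PartStore is hypothetical:

```python
class PartStore:
    """Toy in-memory model of the parts of one multipart upload (not ECS internals)."""

    FIRST_PART, LAST_PART = 1, 10_000  # allowed part-number range

    def __init__(self):
        self.parts = {}  # part number -> part data

    def upload_part(self, number, data):
        if not self.FIRST_PART <= number <= self.LAST_PART:
            raise ValueError(f"part number {number} is outside 1..10000")
        # Reusing a part number silently replaces the earlier part
        self.parts[number] = data


store = PartStore()
store.upload_part(1, b"part1")
store.upload_part(1, b"PART1")  # same number again: overwrites
print(store.parts)  # {1: b'PART1'}
```

Because the key is the part number rather than the arrival order, parts can be uploaded in any order, and a failed part can simply be re-sent under the same number.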

  • Create the object to upload.
  • Split it into three 5-byte parts.
# vi object
part1part2part3

# split -b 5 object part_
# ls -l
-rw-r--r-- 1 root 720748206       18 May 5 18:18 object
-rw-r--r-- 1 root 720748206         6 May 5 18:19 part_aa
-rw-r--r-- 1 root 720748206         6 May 5 18:19 part_ab
-rw-r--r-- 1 root 720748206         6 May 5 18:19 part_ac
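
The same split can be done in Python when the data is already in memory. A minimal sketch that mimics the default naming of split -b (two-letter suffixes aa, ab, ac, ...); the function split_bytes is hypothetical and only handles two-letter suffixes:

```python
from itertools import product
from string import ascii_lowercase


def split_bytes(data, size, prefix="part_"):
    """Mimic `split -b SIZE FILE PREFIX`: fixed-size chunks named PREFIXaa, PREFIXab, ..."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    if len(chunks) > 26 * 26:
        raise ValueError("this sketch only supports two-letter suffixes (676 chunks)")
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    return {prefix + suffix: chunk for suffix, chunk in zip(suffixes, chunks)}


print(split_bytes(b"part1part2part3", 5))
# {'part_aa': b'part1', 'part_ab': b'part2', 'part_ac': b'part3'}
```

If the file ends with a newline, as files saved from vi normally do, the extra byte ends up in a fourth one-byte chunk, part_ad.
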
  • No parts have been uploaded yet.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
  <Initiator>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Initiator>
  <Owner>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Owner>
  <StorageClass>STANDARD</StorageClass>
  <PartNumberMarker>0</PartNumberMarker>
  <NextPartNumberMarker>0</NextPartNumberMarker>
  <MaxParts>1000</MaxParts>
  <IsTruncated>false</IsTruncated>
</ListPartsResult>
  • Upload the 1st part of the object.
  • The Content-Length header is mandatory; it specifies the size of the part in bytes.
  • Both the uploadId and partNumber query parameters are required.
# ./s3curl.pl --id=ecsid -- -X PUT -T part_aa -H "Content-Length: 5" -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204&partNumber=1"

Note: the URL must be enclosed in double quotes. Without them, the shell treats & as the background operator, so the command is cut off before partNumber and s3curl never receives that parameter.
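
The effect of the quotes can be made visible with Python's shlex module, which (with punctuation_chars=True, Python 3.8 or later) splits a command line in a way that approximates a POSIX shell. This is an illustration under that approximation, not a reimplementation of bash; the helper shell_tokens is hypothetical:

```python
import shlex


def shell_tokens(command):
    """Approximate POSIX shell word splitting, keeping operators such as & as separate tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


url = ("http://10.0.0.1:9020/bucket/multipartobject"
       "?uploadId=96cd4b1fc64f41f182698d6939be7204&partNumber=1")

# Unquoted: the URL is cut at '&', which becomes a separate shell operator
print(shell_tokens(f"./s3curl.pl --id=ecsid -- -X PUT {url}"))
# Quoted: the whole URL stays a single argument
print(shell_tokens(f'./s3curl.pl --id=ecsid -- -X PUT "{url}"'))
```

In the unquoted case the last three tokens are the URL up to uploadId, a bare &, and partNumber=1, which the shell would treat as a separate command.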

  • Check that the part has been uploaded.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
  <Initiator>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Initiator>
  <Owner>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Owner>
  <StorageClass>STANDARD</StorageClass>
  <PartNumberMarker>0</PartNumberMarker>
  <NextPartNumberMarker>0</NextPartNumberMarker>
  <MaxParts>1000</MaxParts>
  <IsTruncated>false</IsTruncated>
  <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-05-05T15:34:37.423Z</LastModified>
    <ETag>"ffc88b4ca90a355f8ddba6b2c3b2af5c"</ETag>
    <Size>5</Size>
  </Part>
</ListPartsResult>
  • Let’s test an out-of-order upload.
  • Instead of the 2nd part, upload the 3rd.
# ./s3curl.pl --id=ecsid -- -X PUT -T part_ac -H "Content-Length: 5" -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204&partNumber=3"

# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
  <Initiator>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Initiator>
  <Owner>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Owner>
  <StorageClass>STANDARD</StorageClass>
  <PartNumberMarker>0</PartNumberMarker>
  <NextPartNumberMarker>0</NextPartNumberMarker>
  <MaxParts>1000</MaxParts>
  <IsTruncated>false</IsTruncated>
  <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-05-05T15:34:37.423Z</LastModified>
    <ETag>"ffc88b4ca90a355f8ddba6b2c3b2af5c"</ETag>
    <Size>5</Size>
  </Part>
  <Part>
    <PartNumber>3</PartNumber>
    <LastModified>2017-05-05T15:38:01.264Z</LastModified>
    <ETag>"49dcd91231f801159e893fb5c6674985"</ETag>
    <Size>5</Size>
  </Part>
</ListPartsResult>

  • Upload the remaining part (part 2).
# ./s3curl.pl --id=ecsid -- -X PUT -T part_ab -H "Content-Length: 5" -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204&partNumber=2"
  • All parts are uploaded.
  • Save the ETags of all parts; they are needed for the Complete request.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <UploadId>96cd4b1fc64f41f182698d6939be7204</UploadId>
  <Initiator>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Initiator>
  <Owner>
    <ID>objuser1</ID>
    <DisplayName>objuser1</DisplayName>
  </Owner>
  <StorageClass>STANDARD</StorageClass>
  <PartNumberMarker>0</PartNumberMarker>
  <NextPartNumberMarker>0</NextPartNumberMarker>
  <MaxParts>1000</MaxParts>
  <IsTruncated>false</IsTruncated>
  <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-05-05T15:34:37.423Z</LastModified>
    <ETag>"ffc88b4ca90a355f8ddba6b2c3b2af5c"</ETag>
    <Size>5</Size>
  </Part>
  <Part>
    <PartNumber>2</PartNumber>
    <LastModified>2017-05-05T15:42:01.128Z</LastModified>
    <ETag>"d067a0fa9dc61a6e7195ca99696b5a89"</ETag>
    <Size>5</Size>
  </Part>
  <Part>
    <PartNumber>3</PartNumber>
    <LastModified>2017-05-05T15:38:01.264Z</LastModified>
    <ETag>"49dcd91231f801159e893fb5c6674985"</ETag>
    <Size>5</Size>
  </Part>
</ListPartsResult>
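
Instead of copying the ETags by hand, they can be read from the ListParts response. A minimal Python sketch using a trimmed copy of the listing above; the function name parts_from_list_parts is hypothetical, and the sketch ignores pagination (IsTruncated and NextPartNumberMarker), which only matters once an upload has more parts than MaxParts:

```python
import xml.etree.ElementTree as ET

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parts_from_list_parts(body):
    """Return [(part_number, etag), ...] sorted by part number from a ListPartsResult body."""
    root = ET.fromstring(body)
    parts = []
    for part in root.findall("s3:Part", S3_NS):
        number = int(part.findtext("s3:PartNumber", namespaces=S3_NS))
        # The listing wraps each ETag in double quotes; drop them
        etag = part.findtext("s3:ETag", namespaces=S3_NS).strip('"')
        parts.append((number, etag))
    return sorted(parts)


listing = b"""<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <IsTruncated>false</IsTruncated>
  <Part><PartNumber>1</PartNumber><ETag>"ffc88b4ca90a355f8ddba6b2c3b2af5c"</ETag><Size>5</Size></Part>
  <Part><PartNumber>2</PartNumber><ETag>"d067a0fa9dc61a6e7195ca99696b5a89"</ETag><Size>5</Size></Part>
  <Part><PartNumber>3</PartNumber><ETag>"49dcd91231f801159e893fb5c6674985"</ETag><Size>5</Size></Part>
</ListPartsResult>"""

for number, etag in parts_from_list_parts(listing):
    print(number, etag)
```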

Complete

This operation completes a multipart upload by assembling previously uploaded parts.

Upon receiving this request, ECS concatenates all parts in ascending order of part number to create the new object.

In the Complete Multipart Upload request, you must provide a list of all parts. For each part in the list, specify the part number and the ETag value returned when that part was uploaded.

Note: Processing a Complete Multipart Upload request can take several minutes.
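
The assembly rule can be sketched as a toy Python model: check each listed part against what was uploaded, then join the parts by ascending part number, whatever order they arrived in. The function complete_upload is hypothetical, not ECS code; it assumes a part's ETag is the hex MD5 of its data (as is usual for unencrypted parts on AWS S3) and, as AWS S3 does, rejects part lists that are not in ascending order:

```python
import hashlib


def complete_upload(uploaded, requested):
    """Toy model of Complete Multipart Upload (not ECS code).

    uploaded:  {part_number: data} as stored by Upload Part calls, in any order
    requested: [(part_number, etag), ...] taken from the Complete request body
    Assumption: a part's ETag is the hex MD5 of its data.
    """
    numbers = [number for number, _ in requested]
    if numbers != sorted(set(numbers)):
        raise ValueError("parts must be listed in ascending order, without duplicates")
    for number, etag in requested:
        if number not in uploaded:
            raise ValueError(f"part {number} was never uploaded")
        if hashlib.md5(uploaded[number]).hexdigest() != etag.strip('"'):
            raise ValueError(f"ETag mismatch for part {number}")
    # Concatenate in ascending part-number order, regardless of upload order
    return b"".join(uploaded[number] for number in numbers)


uploaded = {}
for number, data in [(1, b"part1"), (3, b"part3"), (2, b"part2")]:  # out of order, as in this post
    uploaded[number] = data
requested = [(n, hashlib.md5(uploaded[n]).hexdigest()) for n in (1, 2, 3)]
print(complete_upload(uploaded, requested))  # b'part1part2part3'
```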

  • The multipart upload is not completed yet, so the object does not appear in the bucket.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>bucket</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <ServerSideEncryptionEnabled>false</ServerSideEncryptionEnabled>
</ListBucketResult>
  • Create an XML file to use as the body of the Complete request.
  • Fill in the ETag of each part.
# vi CompleteMultipartUpload.txt
<CompleteMultipartUpload>
  <Part>
    <PartNumber>1</PartNumber>
    <ETag>ffc88b4ca90a355f8ddba6b2c3b2af5c</ETag>
  </Part>
  <Part>
    <PartNumber>2</PartNumber>
    <ETag>d067a0fa9dc61a6e7195ca99696b5a89</ETag>
  </Part>
  <Part>
    <PartNumber>3</PartNumber>
    <ETag>49dcd91231f801159e893fb5c6674985</ETag>
  </Part>
</CompleteMultipartUpload>
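
The same request body can be generated from the (part number, ETag) pairs rather than typed in vi. A minimal Python sketch using ElementTree; the function complete_body is hypothetical, and it sorts the parts so they are always listed in ascending part-number order:

```python
import xml.etree.ElementTree as ET


def complete_body(parts):
    """Build a CompleteMultipartUpload request body from [(part_number, etag), ...]."""
    root = ET.Element("CompleteMultipartUpload")
    for number, etag in sorted(parts):  # list parts in ascending part-number order
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etag
    return ET.tostring(root, encoding="unicode")


etags = [(3, "49dcd91231f801159e893fb5c6674985"),
         (1, "ffc88b4ca90a355f8ddba6b2c3b2af5c"),
         (2, "d067a0fa9dc61a6e7195ca99696b5a89")]
print(complete_body(etags))
```

The printed body, saved as CompleteMultipartUpload.txt, can be passed to s3curl with -d @CompleteMultipartUpload.txt as in the next step.
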
  • Execute a complete request.
# ./s3curl.pl --id=ecsid -- -X POST -d @CompleteMultipartUpload.txt -s "http://10.0.0.1:9020/bucket/multipartobject?uploadId=96cd4b1fc64f41f182698d6939be7204" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Location>http://10.0.0.1:9020/bucket/multipartobject</Location>
  <Bucket>bucket</Bucket>
  <Key>multipartobject</Key>
  <ETag>"5e9583dc343a55b829d3b99070a10015-3"</ETag>
</CompleteMultipartUploadResult>
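
The ETag of the completed object is not a plain MD5 of its 15 bytes; the -3 suffix is the number of parts. For AWS S3 it is widely observed (though not formally documented) that such an ETag is the MD5 of the concatenated binary MD5 digests of the parts, followed by -N. The Python sketch below computes that value from the part ETags; whether ECS uses the same formula is an assumption, which can be checked by comparing the result with the ETag in the response above. The function name multipart_etag is hypothetical.

```python
import hashlib


def multipart_etag(part_etags):
    """S3-style composite ETag (assumed formula): MD5 over the parts' binary MD5 digests, plus "-<part count>"."""
    digests = b"".join(bytes.fromhex(etag.strip('"')) for etag in part_etags)
    return f"{hashlib.md5(digests).hexdigest()}-{len(part_etags)}"


print(multipart_etag([
    "ffc88b4ca90a355f8ddba6b2c3b2af5c",
    "d067a0fa9dc61a6e7195ca99696b5a89",
    "49dcd91231f801159e893fb5c6674985",
]))
```

A practical consequence: comparing a local MD5 of the whole file with the ETag of a multipart object does not work; the check has to be done per part or with this composite value.
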
  • The object is in the bucket now.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/" | xmllint --format -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>bucket</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <ServerSideEncryptionEnabled>false</ServerSideEncryptionEnabled>
  <Contents>
    <Key>multipartobject</Key>
    <LastModified>2017-05-05T16:05:11Z</LastModified>
    <ETag>"5e9583dc343a55b829d3b99070a10015-3"</ETag>
    <Size>15</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>objuser1</ID>
      <DisplayName>objuser1</DisplayName>
    </Owner>
  </Contents>
</ListBucketResult>
  • All three parts are merged together in the right order.
# ./s3curl.pl --id=ecsid -- -s "http://10.0.0.1:9020/bucket/multipartobject"
part1part2part3

Note: Amazon recommends considering multipart upload once objects reach 100 MB.
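
On AWS S3 the part size is bounded too: every part except the last must be at least 5 MiB, a part can be at most 5 GiB, and an upload can have at most 10,000 parts. AWS S3 would therefore reject the 5-byte parts used in this post when the upload is completed (EntityTooSmall), while this ECS system accepted them, so the limits on ECS may differ. The Python sketch below plans an upload against the AWS limits; the 100 MiB threshold (approximating the 100 MB guidance), the 8 MiB preferred part size and the function plan_upload are illustrative choices, not values taken from ECS.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB          # AWS S3: minimum size of every part except the last
MAX_PART_SIZE = 5 * 1024 * MIB   # AWS S3: maximum part size (5 GiB)
MAX_PARTS = 10_000               # AWS S3: maximum number of parts per upload
THRESHOLD = 100 * MIB            # illustrative cut-over point based on the 100 MB guidance


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding for large sizes."""
    return -(-a // b)


def plan_upload(object_size, preferred_part_size=8 * MIB):
    """Return None for a single PUT, or (part_size, part_count) for a multipart upload."""
    if object_size < THRESHOLD:
        return None
    # Grow the part size if the preferred size would need more than MAX_PARTS parts
    part_size = max(preferred_part_size, MIN_PART_SIZE, ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    return part_size, ceil_div(object_size, part_size)


print(plan_upload(50 * MIB))     # None: upload with a single PUT
print(plan_upload(1000 * MIB))   # (8388608, 125)
```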
