Had to dig around to figure this out, and nobody else seems to have done so:

The algorithm to calculate the Amazon S3 multipart MD5 ETag from the original 
file is as follows:

Calculate the MD5 hash for each uploaded part of the file, concatenate the 
hashes into a single binary string, and calculate the MD5 hash of that result.  
Be sure to use the *binary* value of each hash rather than its hex-ASCII 
representation in the final step.
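To see why the *binary* form matters, compare hashing a 32-character hex 
digest as ASCII text against hashing the 16 raw bytes it encodes. This toy 
demonstration (using the well-known MD5 of empty input as the sample value) 
shows the two give different results:

```shell
# MD5 of empty input, used here only as a sample hex digest
H=d41d8cd98f00b204e9800998ecf8427e
printf '%s' "$H" | md5sum               # hashes the 32 ASCII characters
printf '%s' "$H" | xxd -r -p | md5sum   # hashes the 16 raw bytes -- different!
```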

A rough command sequence for this is:

1) run 'dd bs=1024k count=15 skip=0 if=uploadfile | md5sum >>md5sumlist'

2) repeat (1), incrementing skip= by 15 for each file sub-part; you are done 
when dd says "cannot skip to specified offset"

3) delete the last line from 'md5sumlist' if (2) led to appending 
'd41d8cd98f00b204e9800998ecf8427e' from a final empty-data iteration

4) convert the list of MD5 checksums to binary with a command similar to

awk '{print $1}' md5sumlist | while read MD5; do echo $MD5 | xxd -r -p 
>>md5bincat; done

5) run 'md5sum md5bincat'

The 'xxd' utility appears to be part of the 'vim' utility package rather than a 
stand-alone project.

If you set 'multipart_chunk_size_mb' in .s3cfg to something other than 15, be 
sure to use that as the count= value and skip= increment when running 'dd'.
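The steps above can be sketched as a single shell function. This is my own 
consolidation, not part of s3cmd; the function name is made up, and it assumes 
'xxd' is available. Pass the file and the chunk size in MB (defaulting to 15, 
to match the multipart_chunk_size_mb discussion above):

```shell
#!/bin/sh
# Steps 1-5 above as one function: hash each chunk, convert the hex digests
# to binary, concatenate, and hash the concatenation.
s3_local_etag() {
    FILE=$1
    CHUNK_MB=${2:-15}
    SKIP=0
    BINCAT=$(mktemp)
    while :; do
        PART=$(dd bs=1024k count="$CHUNK_MB" skip="$SKIP" if="$FILE" 2>/dev/null \
               | md5sum | awk '{print $1}')
        # an empty read hashes to the well-known empty-input MD5, meaning we
        # are past the end of the file (this is step 3's cleanup, done up front)
        [ "$PART" = "d41d8cd98f00b204e9800998ecf8427e" ] && break
        printf '%s' "$PART" | xxd -r -p >> "$BINCAT"    # hex -> binary (step 4)
        SKIP=$((SKIP + CHUNK_MB))
    done
    md5sum "$BINCAT" | awk '{print $1}'                 # step 5
    rm -f "$BINCAT"
}
```

Usage would look like 's3_local_etag uploadfile 15'. Note that S3 itself 
reports the multipart ETag with a "-N" suffix giving the part count; the 
value computed here is the hex portion before that suffix.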


_______________________________________________
S3tools-general mailing list
S3tools-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/s3tools-general
