
Had to dig around to figure this out, and nobody
else seems to have done so:

The algorithm to calculate the Amazon S3 multipart
MD5 ETag from the original file is as follows:

Calculate the MD5 hash for each uploaded part of
the file, concatenate the hashes into a single
binary string, and calculate the MD5 hash of that
result.  Be sure to use the *binary* value of the
hashes rather than the hex-ASCII representation in
the final step.
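The algorithm above can also be sketched with
'split' instead of dd, which avoids the skip=
bookkeeping; the file name, work directory, and
15 MB part size below are all placeholders:

```shell
# Demo input -- substitute your real upload file here.
FILE=uploadfile
[ -f "$FILE" ] || head -c $((20*1024*1024)) /dev/zero > "$FILE"

# Split into 15 MB parts, hash each part, then hash the
# concatenation of the *binary* (xxd -r -p) digests.
WORK=$(mktemp -d)
split -b 15m "$FILE" "$WORK/part."
for P in "$WORK"/part.*; do
    md5sum "$P" | awk '{print $1}'
done | xxd -r -p | md5sum
```

The for loop relies on the shell sorting the glob,
which matches split's lexicographic suffixes, so the
part digests stay in upload order.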

A rough command sequence for this is:

1) run 'dd bs=1024k count=15 skip=0 if=uploadfile | md5sum >>md5sumlist'

2) repeat (1) while incrementing skip= by 15 for
each file sub-part and collect the values; you are
done when dd says "cannot skip to specified
offset"

3) delete the last line from 'md5sumlist' if (2)
led to appending 'd41d8cd98f00b204e9800998ecf8427e'
from a final empty-data iteration

4) convert the list of MD5 checksums to binary
with a command similar to

cat md5sumlist | awk '{print $1}' | while read MD5; do echo $MD5 | xxd -r -p >>md5bincat; done

5) run 'md5sum md5bincat'
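Steps 1-5 can be rolled into one function so the
step-3 cleanup happens automatically: when dd runs
past the end of the file it reads nothing, and the
empty-input hash is the stop signal.  A sketch
follows (the function name is mine, not an s3cmd
interface); note that the ETag S3 actually reports
for a multipart upload is this value with
"-<number of parts>" appended:

```shell
# Compute the multipart ETag body of $1 using $2 MB
# parts (default 15).  s3_multipart_etag is an
# illustrative name, not part of any tool.
s3_multipart_etag() {
    FILE=$1 PARTSIZE_MB=${2:-15}
    BINCAT=$(mktemp) PART=0
    while :; do
        CHUNK=$(dd bs=1024k count="$PARTSIZE_MB" \
                   skip=$((PART * PARTSIZE_MB)) \
                   if="$FILE" 2>/dev/null | md5sum | awk '{print $1}')
        # md5 of empty input means dd read past EOF:
        # stop instead of appending it (step 3 above).
        [ "$CHUNK" = d41d8cd98f00b204e9800998ecf8427e ] && break
        echo "$CHUNK" | xxd -r -p >> "$BINCAT"
        PART=$((PART + 1))
    done
    md5sum "$BINCAT" | awk '{print $1}'
    rm -f "$BINCAT"
}

# Demo: a 20 MB file of zeros, hashed in 15 MB parts.
head -c $((20*1024*1024)) /dev/zero > /tmp/demo.bin
s3_multipart_etag /tmp/demo.bin 15
```

For a file uploaded with 15 MB parts, the output
should match the S3 ETag minus its "-<parts>" suffix.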

The 'xxd' utility appears to be part of the 'vim'
utility package rather than a stand-alone project.

If you set 'multipart_chunk_size_mb' in .s3cfg to
something other than 15, be sure to use that as
the count= value and skip= increment when running
'dd'.
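Rather than remembering to keep the two in sync,
the part size can be read out of .s3cfg directly (a
sketch; this assumes the default 'key = value'
layout of the config file):

```shell
# Use the s3cmd-configured part size (default 15 MB)
# for both count= and the skip= increment.
CFG=${CFG:-$HOME/.s3cfg}
CHUNK_MB=$(awk -F' *= *' '$1 == "multipart_chunk_size_mb" {print $2}' "$CFG" 2>/dev/null)
CHUNK_MB=${CHUNK_MB:-15}
echo "part size: ${CHUNK_MB} MB"
```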

