Amazon calculates its own MD5 and puts it in the <ETag> field of a
bucket listing.  s3cmd does not send a Content-MD5 header, which AWS
could use to validate that the received object matches the sent object;
instead, the newer AWS Signature v4 method calculates the SHA-256 of the
content and sends that, which is effectively the same check.
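
If you want to check an object yourself, here is a rough sketch in
Python of comparing a local file's MD5 against the ETag from a HEAD
request (using boto3; the bucket, key, and path are made up, and per the
caveat below this only holds for plain single-part, non-KMS uploads):

    # Compare a local file's MD5 with the ETag S3 reports.  Only valid
    # when the ETag is a plain MD5 (single-part, non-SSE-KMS upload).
    import hashlib
    import boto3

    def etag_matches_local(bucket, key, local_path):
        s3 = boto3.client("s3")
        # S3 returns the ETag wrapped in double quotes; strip them.
        etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
        md5 = hashlib.md5()
        with open(local_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        return md5.hexdigest() == etag

    print(etag_matches_local("my-bucket", "backups/file.tar", "file.tar"))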

However, for files sent via multipart upload, and for
server-side-encrypted files (whether with a KMS key or an
Amazon-managed key), the resulting ETag does not match the MD5 of the
original content.
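
For what it's worth, the multipart ETag can still be reproduced locally
if you know the part size that was used: it appears to be the MD5 of the
concatenated per-part MD5 digests, with "-<part count>" appended.  A
sketch follows; this is observed behavior rather than documented API,
and the 15 MB part size is an assumption you'd need to match to your
upload settings:

    # Reproduce S3's multipart ETag: MD5 each part, MD5 the concatenated
    # raw digests, append "-<part count>".  Observed behavior, not a
    # documented guarantee; part_size must match the upload's part size.
    import hashlib

    def multipart_etag(path, part_size=15 * 1024 * 1024):
        digests = []
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(part_size), b""):
                digests.append(hashlib.md5(chunk).digest())
        if len(digests) == 1:
            return digests[0].hex()  # single part: plain MD5, no suffix
        combined = hashlib.md5(b"".join(digests))
        return "%s-%d" % (combined.hexdigest(), len(digests))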

For this reason, s3cmd sends an x-amz-meta-s3cmd-attribs header which
includes the original object's MD5.  This metadata is stored in S3 and is
returned by an object HEAD or GET call.
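
So a backup verifier can HEAD the object and compare against that
stored value instead of the ETag.  A rough sketch (boto3 exposes
x-amz-meta-* headers in its Metadata dict with the prefix stripped; the
metadata key follows the header name above, and the slash-separated
"name:value" encoding is my assumption from s3cmd's output, so treat
the parsing as illustrative):

    # Pull the original MD5 back out of the s3cmd metadata via HEAD.
    import boto3

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket="my-bucket", Key="backups/file.tar")
    # boto3 strips the "x-amz-meta-" prefix from metadata keys.
    attribs = head.get("Metadata", {}).get("s3cmd-attribs", "")
    # Assumed format: "uid:1000/.../md5:<hex>", slash-separated pairs.
    pairs = dict(kv.split(":", 1) for kv in attribs.split("/") if ":" in kv)
    print(pairs.get("md5"))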


On Mon, Mar 23, 2015 at 4:50 AM, Russell Gadd <rust...@gmail.com> wrote:

> Thanks for the comments Will. I had a look at Duplicity and as you say it
> looks like a decent backup tool but isn't what I'm looking for. I will have
> a better look at Python but at present my inclination is to stick to
> bash/sed/awk.
>
> I wonder if Matt could answer the question regarding MD5s returned by a
> list operation, i.e. does Amazon calculate them based on its own file copy
> or does it expect to be given the MD5 by the upload software? I seem to
> remember reading somewhere that Amazon only uses the MD5 for verification
> of transfer, so that if a file is uploaded in multiple parts it only
> calculates its own MD5s for each part. Maybe that's outdated information.
>
> I tried to verify this by uploading a 35MB file using the s3 console (so
> s3cmd wouldn't know anything about it) and checking how long it took to
> download vs how long to list with the --list-md5 option (doing the list
> operation first). The download was about 15 seconds on my system but the
> MD5 listing was almost instant, so Amazon had the MD5. However I don't
> think the upload was multipart: it restarted from the beginning 3 times,
> sometimes after getting over half way, before the upload finally
> succeeded. So I'm still none the wiser.
>
> I do believe in verifying backups which is why I'm keen on the MD5 check
> based on the actual file at s3. I haven't seen any cloud service offer to
> do hashes on their data - I think one which did would have an extra selling
> point. As far as I'm concerned I'd be happy to pay a fee for such a
> service; they wouldn't have to charge much to make it viable. Of course
> you'd have to make sure their client software didn't cheat by doing the
> hash on your own PC and you'd want to use independent software locally to
> verify their hash.
>
> Regards
> Russell
>
> On 21 March 2015 at 21:10, Will McCown <w...@ross-mccown.com> wrote:
>
>> On 3/21/2015 11:51 AM, Russell Gadd wrote:
>> > My questions are:
>> >
>> >  1. Where does Amazon get its MD5 from? Is it calculated locally in my
>> >     PC and sent in some headers? If Amazon calculates it at their end
>> >     from the file it has on its servers then the verification is ok but
>> >     otherwise how do I know their copy of the file is valid?
>>
>> I believe that Amazon calculates it on their end, or at least I hope so
>> as I use it as an integrity check for my own backups. If you learn
>> otherwise please let us know.
>>
>> >  2. How easy is it to find out how to use Amazon's AWS CLI in Linux? I
>> >     have tried out s3cmd and it seems easy to use, but at first glance
>> >     the AWS CLI looks pretty complex.
>> >  3. I plan to use Bash and a little sed / awk in Linux. I've already
>> >     done some code to create and manipulate this index as a trial. I
>> >     don't particularly like Bash as such but it does a job.
>> >     Alternatively I could perhaps use this project to learn some other
>> >     language such as Python, but I'm not particularly keen to do this
>> >     unless it confers particular advantages. Any opinions would be
>> >     welcome (leaning perhaps to a C-like language if possible).
>>
>> I would certainly borrow heavily from s3cmd as an example.  I've looked
>> at the CLI as well and find it pretty complex (but I'm not really
>> a programmer).  You might also want to check out the package called
>> "duplicity".  I've been using it with s3 as the back end for a while
>> and it seems to work pretty well (but works in the classical
>> full/incremental backup mode, which isn't quite what you are
>> thinking of).  But duplicity is written in Python and will
>> be another example of an implementation of an s3 back end.
>>
>> I used to write lots of complicated bash/sed/awk scripts to do stuff,
>> but these days I think Perl or Python is a much better choice for
>> such things.  Both languages have tremendous open-source library
>> bases to draw upon, so you can do a lot with very little actual
>> coding.
>>
>> --
>> Will McCown, Rolling Hills Estates, CA
>> w...@ross-mccown.com