Yes Jeremy, that comment is worth a lot. The combination of "never" and "a
hundred thousand files a day" is quite powerful - thanks. So maybe I'm
being a little paranoid, possibly because of problems I've experienced in
the past with other cloud services. I do appreciate that S3 has a great
reputation, which is why I am here.

So thanks again.

Regards
Russell

On 23 March 2015 at 20:59, Jeremy Wadsack <jeremy.wads...@gmail.com> wrote:

> Q: What checksums does Amazon S3 employ to detect data corruption?
>> Amazon S3 uses a combination of Content-MD5 checksums and cyclic
>> redundancy checks (CRCs) to detect data corruption. Amazon S3 performs
>> these checksums on data at rest and repairs any corruption using redundant
>> data. In addition, the service calculates checksums on all network traffic
>> to detect corruption of data packets when storing or retrieving data.
>
>  – http://aws.amazon.com/s3/faqs/
>
>
> FWIW, I've never had a corrupt file on S3 and we move a hundred thousand
> files a day.
>
>
>
> Jeremy Wadsack
>
> On Mon, Mar 23, 2015 at 1:44 PM, Russell Gadd <rust...@gmail.com> wrote:
>
>> So if I want to verify that Amazon holds a valid copy of my files, then I
>> could get s3cmd to list those of size less than 15MB (assuming an unchanged
>> default chunk size) with --list-md5, but for larger files I'd need to
>> download them and calculate the MD5 to compare with the original. I think
>> I'm likely to leave this to the end of the project as I've got some local
>> backup anyway.
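For a quick local check along those lines, something like this might do. It is only a sketch (the function name and chunk size are mine), and it deliberately refuses to judge multipart ETags, which aren't plain MD5s:

```python
import hashlib

def md5_matches_etag(path, etag):
    """Compare a local file's MD5 with an S3 ETag from a listing.

    Returns True/False for single-part ETags. Returns None for
    multipart ETags (those containing '-'), which are not plain MD5s
    and can't be checked this way.
    """
    etag = etag.strip('"')          # listings often quote the ETag
    if '-' in etag:
        return None                 # multipart upload: not an MD5
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest() == etag
```

You'd feed it the MD5 column from s3cmd's --list-md5 output for each file under the 15MB cutoff.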
>>
>> Many thanks for the information.
>>
>> Russell
>>
>> On 23 March 2015 at 17:14, Matt Domsch <m...@domsch.com> wrote:
>>
>>> Amazon calculates its own MD5 and puts it in the <ETag> field of a
>>> BUCKET_LIST response.  s3cmd does not send a Content-MD5 header, which
>>> AWS could use to validate that the received object matches the sent
>>> object; however, the new AWS signature v4 method calculates the SHA-256
>>> of the content and sends that, which is effectively the same check.
>>>
>>> However, for multipart-sent files, and for server-side-encrypted files
>>> (either with a KMS key or an Amazon-managed key), the resulting ETag
>>> doesn't match the original content's MD5.
>>>
>>> For this reason, s3cmd sends an x-amz-meta-s3cmd-attribs header which
>>> includes the original object's MD5.  This metadata is stored in S3 and is
>>> returned by an object HEAD or GET call.
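For multipart uploads there is a workaround worth knowing: the multipart ETag, while not the file's MD5, is widely observed (though not, as far as I know, officially documented by AWS) to be the MD5 of the concatenated per-part MD5 digests, suffixed with the part count. So if you know the part size the uploader used, you can reproduce it locally without downloading anything. A sketch (function name mine, default matching s3cmd's 15MB chunk size):

```python
import hashlib

def multipart_etag(path, part_size=15 * 1024 * 1024):
    """Reproduce the ETag S3 is observed to assign to a multipart upload.

    The multipart ETag appears to be the MD5 of the concatenated
    per-part MD5 digests, suffixed with '-<number of parts>'.
    part_size must match the chunk size the uploader used
    (s3cmd defaults to 15 MB). A single-part file gets its plain MD5.
    """
    parts = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            parts.append(hashlib.md5(chunk))
    if len(parts) == 1:
        return parts[0].hexdigest()
    combined = hashlib.md5(b''.join(p.digest() for p in parts))
    return '%s-%d' % (combined.hexdigest(), len(parts))
```

If the part size guess is wrong the ETags simply won't match, so treat a mismatch as "unknown" rather than "corrupt".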
>>>
>>>
>>> On Mon, Mar 23, 2015 at 4:50 AM, Russell Gadd <rust...@gmail.com> wrote:
>>>
>>>> Thanks for the comments Will. I had a look at Duplicity and as you say
>>>> it looks like a decent backup tool but isn't what I'm looking for. I will
>>>> have a better look at Python but at present my inclination is to stick to
>>>> bash/sed/awk.
>>>>
>>>> I wonder if Matt could answer the question regarding MD5's returned by
>>>> a list operation, i.e. does Amazon calculate them based on its own file
>>>> copy or does it expect to be given the MD5 by the upload software? I seem
>>>> to remember reading somewhere that Amazon only uses the MD5 for
>>>> verification of transfer, so that if a file is uploaded in multiple parts
>>>> it only calculates its own MD5s for each part. Maybe that's outdated
>>>> information.
>>>>
>>>> I tried to verify this by uploading a 35MB file using the s3 console
>>>> (so s3cmd wouldn't know anything about it) and checking how long it took to
>>>> download vs how long to list with the --list-md5 option (doing the list
>>>> operation first). The download was about 15 seconds on my system but the
>>>> MD5 listing was almost instant, so Amazon had the MD5. However I don't
>>>> think the upload was multipart, because it restarted 3 times, sometimes
>>>> getting over half way and starting again from the beginning, before it
>>>> managed the upload. So I'm still none the wiser.
>>>>
>>>> I do believe in verifying backups which is why I'm keen on the MD5
>>>> check based on the actual file at s3. I haven't seen any cloud service
>>>> offer to do hashes on their data - I think one which did would have an
>>>> extra selling point. As far as I'm concerned I'd be happy to pay a fee for
>>>> such a service, they wouldn't have to charge much to make it viable. Of
>>>> course you'd have to make sure their client software didn't cheat by doing
>>>> the hash on your own PC and you'd want to use independent software locally
>>>> to verify their hash.
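Matt's note about the x-amz-meta-s3cmd-attribs header suggests one such independent check: do a HEAD on the object and read the original MD5 that s3cmd recorded back out. A minimal parser, assuming the header value is a '/'-separated list of key:value pairs containing an md5 entry (the example value below is made up - check the actual format on one of your own objects first):

```python
def md5_from_s3cmd_attribs(attribs):
    """Extract the original MD5 from an x-amz-meta-s3cmd-attribs value.

    Assumes the header is a '/'-separated list of key:value pairs,
    e.g. 'uid:1000/md5:d41d8cd98f00b204e9800998ecf8427e/mtime:1'
    (hypothetical example). Returns None if no md5 entry is present.
    """
    for pair in attribs.split('/'):
        key, _, value = pair.partition(':')
        if key == 'md5':
            return value
    return None
```

Of course this only tells you what s3cmd computed at upload time, not what S3 holds now, so it complements rather than replaces an ETag comparison.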
>>>>
>>>> Regards
>>>> Russell
>>>>
>>>> On 21 March 2015 at 21:10, Will McCown <w...@ross-mccown.com> wrote:
>>>>
>>>>> On 3/21/2015 11:51 AM, Russell Gadd wrote:
>>>>> > My questions are:
>>>>> >
>>>>> >  1. Where does Amazon get its MD5 from? Is it calculated locally in my
>>>>> >     PC and sent in some headers? If Amazon calculates it at their end
>>>>> >     from the file it has on its servers then the verification is ok but
>>>>> >     otherwise how do I know their copy of the file is valid?
>>>>>
>>>>> I believe that Amazon calculates it on their end, or at least I hope so
>>>>> as I use it as an integrity check for my own backups. If you learn
>>>>> otherwise please let us know.
>>>>>
>>>>> >  2. How easy is it to find out how to use Amazon's AWS CLI in Linux? I
>>>>> >     have tried out s3cmd and it seems easy to use, but at first glance
>>>>> >     the AWS CLI looks pretty complex.
>>>>> >  3. I plan to use Bash and a little sed / awk in Linux. I've already
>>>>> >     done some code to create and manipulate this index as a trial. I
>>>>> >     don't particularly like Bash as such but it does a job.
>>>>> >     Alternatively I could perhaps use this project to learn some other
>>>>> >     language such as Python, but I'm not particularly keen to do this
>>>>> >     unless it confers particular advantages. Any opinions would be
>>>>> >     welcome (leaning perhaps to a C-like language if possible).
>>>>>
>>>>> I would certainly borrow heavily from s3cmd as an example.  I've looked
>>>>> at the CLI as well and find it pretty complex (but I'm not really
>>>>> a programmer).  You might also want to check out the package called
>>>>> "duplicity".  I've been using it with s3 as the back end for a while
>>>>> and it seems to work pretty well (but it works in the classical
>>>>> full/incremental backup mode, which isn't quite what you are
>>>>> thinking of).  But duplicity is written in python and will
>>>>> be another example of an implementation of an s3 back end.
>>>>>
>>>>> I used to write lots of complicated bash/sed/awk scripts to do stuff,
>>>>> but these days I think Perl or Python is a much better choice for
>>>>> such things.  Both languages have a tremendous open-source library
>>>>> base to draw upon, so you can do a lot with very little actual
>>>>> coding.
>>>>>
>>>>> --
>>>>> Will McCown, Rolling Hills Estates, CA
>>>>> w...@ross-mccown.com
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Dive into the World of Parallel Programming The Go Parallel Website,
>>>>> sponsored by Intel and developed in partnership with Slashdot Media,
>>>>> is your hub for all things parallel software development, from weekly
>>>>> thought leadership blogs to news, videos, case studies, tutorials and
>>>>> more. Take a look and join the conversation now.
>>>>> http://goparallel.sourceforge.net/
>>>>> _______________________________________________
>>>>> S3tools-general mailing list
>>>>> S3tools-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/s3tools-general
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>