IMO, retrieving the original MD5 calculated by
's3cmd put' and attached to the file as metadata
defeats the purpose of the S3 MD5 hash scheme.
The point is to be able to verify that the cloud
copy of the file is the same as the original, so
only the hash calculated by Amazon is relevant.
That said, the metadata MD5 value is certainly
useful for the 'sync' command and for validating
restored files.

Here is a patch that restores the original 's3cmd
ls --list-md5' behavior.  It's straightforward to
calculate the Amazon S3 MD5 value as described in
my recent post covering the algorithm.

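For reference, here's a rough sketch of that
calculation in Python (the 'az3md5' name and the
single-part fallback are my own; the part size
must match whatever the uploader used):

    import hashlib

    def az3md5(path, part_size):
        # S3's multipart ETag: the hex MD5 of the concatenated
        # binary MD5 digests of the parts, plus '-<part count>'.
        # A file that fits in a single part keeps its plain MD5.
        part_digests = []
        whole = hashlib.md5()
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(part_size)  # one full part (may be large)
                if not chunk:
                    break
                whole.update(chunk)
                part_digests.append(hashlib.md5(chunk).digest())
        if len(part_digests) <= 1:
            return whole.hexdigest()
        combined = hashlib.md5(b''.join(part_digests))
        return "%s-%d" % (combined.hexdigest(), len(part_digests))
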
I've also enhanced the 'info' subcommand to show
both values:

$ s3cmd info s3://offsite-backup-wc/OFFSITE_20130707170754
s3://offsite-backup-wc/OFFSITE_20130707170754 (object):
   File size: 5675190650
   Last mod:  Fri, 02 Aug 2013 17:05:24 GMT
   MIME type: application/octet-stream; charset=binary
   AZ3MD5 sum:   9f85d36e45625b0d50a46f8628dd80bb-6
      MD5 sum:   18ed4d9c695c72329001d34939d74a88
   policy: none
   ACL:       amazon_s3: FULL_CONTROL

One interesting discovery is that S3 prevents
copying, moving, or even editing the metadata of
files larger than 5GB (5368709120 bytes).

If a multipart file is smaller than 5GB it can be
copied, and the AZ3MD5 sum will be recalculated
over a single chunk, so it will match the normal
MD5 calculation.  I suspect that Amazon may
migrate >5GB files to different arrays even
though customers are prohibited from similar
operations.  In that case the file may be divided
into some standard chunk size determined by
Amazon and the AZ3MD5 value recalculated
accordingly.  If anyone knows what that size
might be, please advise.  Knowing the value in
advance would let one calculate what the
post-migration AZ3MD5 value would be before the
original local file is deleted.

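Using the sketch above, candidate values are
cheap to pre-compute.  For example (these sizes
are pure guesses, not anything confirmed):

    # Try a few hypothetical migration chunk sizes:
    for size in (16 * 1024**2, 64 * 1024**2, 1024**3):
        print("%11d  %s" % (size, az3md5('OFFSITE_20130707170754', size)))
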
Sometime in the next week I'll post a proper
'az3md5' script that performs the S3 hash
calculation for a specified segment size.

--- s3cmd.orig  2013-06-05 17:43:31.000000000 -0400
+++ s3cmd       2013-08-03 14:09:44.000000000 -0400
@@ -154,14 +154,6 @@
 
     for object in response["list"]:
         md5 = object['ETag'].strip('"')
-        if cfg.list_md5:
-            if md5.find('-') >= 0: # need to get md5 from the object
-                object_uri = uri.compose_uri(bucket, object["Key"])
-                info_response = s3.object_info(S3Uri(object_uri))
-                try:
-                    md5 = info_response['s3cmd-attrs']['md5']
-                except KeyError:
-                    pass
 
         size, size_coeff = formatSize(object["Size"], Config().human_readable_sizes)
         output(format_string % {
@@ -619,12 +611,13 @@
                 output(u"   File size: %s" % info['headers']['content-length'])
                 output(u"   Last mod:  %s" % info['headers']['last-modified'])
                 output(u"   MIME type: %s" % info['headers']['content-type'])
-                md5 = info['headers']['etag'].strip('"')
+                az3md5 = info['headers']['etag'].strip('"')
+                output(u"   AZ3MD5 sum:   %s" % az3md5)
                 try:
                     md5 = info['s3cmd-attrs']['md5']
+                    output(u"      MD5 sum:   %s" % md5)
                 except KeyError:
                     pass
-                output(u"   MD5 sum:   %s" % md5)
             else:
                 info = s3.bucket_info(uri)
                 output(u"%s (bucket):" % uri.uri())