Hi John,
Although the AWS API states that the ETag is not guaranteed to be an integrity
hash, the SDK's internal code assumes a special ETag format for objects
uploaded through multipart upload (e.g., via TransferManager). This is
reflected in AmazonS3Client's implementation of "getObject" below:

    /* (non-Javadoc)
     * @see com.amazonaws.services.s3.AmazonS3#getObject(com.amazonaws.services.s3.model.GetObjectRequest, java.io.File)
     */
    public ObjectMetadata getObject(GetObjectRequest getObjectRequest, File destinationFile)
            throws AmazonClientException, AmazonServiceException {
        assertParameterNotNull(destinationFile,
                "The destination file parameter must be specified when downloading an object directly to a file");

        S3Object s3Object = getObject(getObjectRequest);
        // getObject can return null if constraints were specified but not met
        if (s3Object == null) return null;

        ServiceUtils.downloadObjectToFile(s3Object, destinationFile, (getObjectRequest.getRange() == null));

        return s3Object.getObjectMetadata();
    }

In ServiceUtils.downloadObjectToFile, the SDK determines whether an ETag was
generated by a multipart upload using the following routine:


    /**
     * Returns true if the specified ETag was from a multipart upload.
     *
     * @param eTag
     *            The ETag to test.
     *
     * @return True if the specified ETag was from a multipart upload; otherwise
     *         false if it belongs to an object that was uploaded in a single
     *         part.
     */
    public static boolean isMultipartUploadETag(String eTag) {
        return eTag.contains("-");
    }
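
To make the mismatch concrete, here is a tiny sketch (my own illustration, not
SDK code; it assumes the SDK's com.amazonaws.services.s3.internal.ServiceUtils
is on the classpath) showing how that check evaluates the two ETag formats
reported in this thread:

import com.amazonaws.services.s3.internal.ServiceUtils;

public class ETagCheckDemo {
    public static void main(String[] args) {
        // Amazon S3 multipart ETag: hex digest plus "-<part count>" -> detected
        System.out.println(ServiceUtils.isMultipartUploadETag("70e1860be687d43c039873adef4280f2-3")); // true
        // Riak CS multipart ETag: URL-safe Base64 with "_" and no "-" -> missed
        System.out.println(ServiceUtils.isMultipartUploadETag("WxEUkiQzTWm_2C8A92fLQg=="));           // false
    }
}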

As you can see, it assumes that a multipart-upload ETag contains a dash "-",
not an underscore "_". For RIAK CS, the ETag generated for my S3 object
uploaded through TransferManager does not follow this convention, so that
check fails, and the subsequent integrity check then fails because the ETag is
not an actual MD5 sum. Specifically, this shows up in the following snippet
from ServiceUtils.downloadObjectToFile:


        try {
            // Multipart uploads don't have an MD5 calculated on the service side
            if (ServiceUtils.isMultipartUploadETag(s3Object.getObjectMetadata().getETag()) == false) {
                clientSideHash = Md5Utils.computeMD5Hash(new FileInputStream(destinationFile));
                serverSideHash = BinaryUtils.fromHex(s3Object.getObjectMetadata().getETag());
            }
        } catch (Exception e) {
            log.warn("Unable to calculate MD5 hash to validate download: " + e.getMessage(), e);
        }

        if (performIntegrityCheck && clientSideHash != null && serverSideHash != null
                && !Arrays.equals(clientSideHash, serverSideHash)) {
            throw new AmazonClientException("Unable to verify integrity of data download.  " +
                    "Client calculated content hash didn't match hash calculated by Amazon S3.  " +
                    "The data stored in '" + destinationFile.getAbsolutePath() + "' may be corrupt.");
        }
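
To trace the failure path through the snippet above (my own annotation, not
SDK code), using the Riak CS ETag from my test:

        // Sketch of the failure path for a Riak CS multipart ETag:
        String eTag = "WxEUkiQzTWm_2C8A92fLQg==";
        // The ETag contains no "-", so the SDK takes the single-part branch:
        boolean multipart = eTag.contains("-");   // false
        // clientSideHash then becomes the real MD5 of the downloaded file,
        // while serverSideHash = BinaryUtils.fromHex(eTag) tries to read a
        // Base64 string as hex. The MD5-vs-ETag comparison cannot succeed,
        // and the download is reported as corrupt.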

If you want to see how we upload the file to RIAK CS using multipart upload,
the code is in the Git repo:
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=blob;f=core/src/com/cloud/storage/template/S3TemplateDownloader.java;h=ca0df5d515e900c5313ccb14e962aa72c0785b84;hb=refs/heads/object_store

Thanks
-min


On 6/7/13 7:53 AM, "John Burwell" <jburw...@basho.com> wrote:

Thomas,

The AWS API explicitly states the ETag is not guaranteed to be an integrity
hash [1].  According to RFC 2616 [2], clients should not infer any meaning
from the content of an ETag.  Essentially, it is an opaque version identifier
that should only be compared for equality with another ETag value to detect a
resource change.  As such, I agree with your assessment that s3cmd is making
an invalid assumption regarding the value of the ETag.

Min, could you please send the stack trace you are receiving from
TransferManager?  Also, could you send a reference to the code in the Git
repo?  With that information, we can start running down the source of the
problem.

Thanks,
-John

[1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
[2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

On Jun 7, 2013, at 1:08 AM, Thomas O'Dowd <tpod...@cloudian.com> wrote:

Min,
This looks like an s3cmd problem. I just downloaded the latest s3cmd to
check the source code.
In S3/FileLists.py:

        compare_md5 = 'md5' in cfg.sync_checks
        # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
        if compare_md5:
            if (src_remote == True and src_list[file]['md5'].find("-") >= 0) \
                    or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):
Basically, s3cmd is trying to verify that the checksum of the data it
downloads matches the ETag, unless the ETag ends with "-YYY". That is an AWS
convention (as I mentioned in an earlier mail), so it works against AWS.
RiakCS, however, has a different ETag format that doesn't match "-YYY", so
s3cmd assumes the other type of ETag, which for AWS is the same as the MD5
checksum. For RiakCS this is not the case, which is why you get the checksum
error.
Chances are that Riak is doing the right thing here and the data file will be
the same as what you uploaded. You could change the s3cmd code to be more
lenient for Riak. The Basho guys might either like to change their format or
talk to the different tool vendors about changing their tools to work with
Riak. For Cloudian, we chose to keep our format similar to AWS so we could
avoid issues like this.
Tom.
On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
John,
We are not able to successfully download a file that was uploaded to Riak CS
with TransferManager using s3cmd. We get the same error we encountered with
the Amazon S3 Java client, due to the incompatible ETag format (the "-" vs.
"_" difference).
Thanks
-min
On Jun 6, 2013, at 5:40 PM, "John Burwell" <jburw...@basho.com> wrote:
Edison,
Riak CS and S3 seed their hashes differently, causing the form to appear
slightly different.  In particular, Riak CS uses URL-safe Base64 encoding,
which is why its ETag values contain "_"s where Amazon's contain "-"s.  From a
client perspective, ETags are treated as opaque strings that are passed
through to the server for processing and compared strictly for equality.
Therefore, the form of the hash should not cause the client to choke, and the
Riak CS behavior you are seeing is S3 API compatible (see
http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for more
details).
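
For reference, a quick self-contained sketch of the alphabet difference (my
illustration using Apache Commons Codec; the class name and input string are
made up):

import java.security.MessageDigest;
import org.apache.commons.codec.binary.Base64;

public class ETagAlphabetDemo {
    public static void main(String[] args) throws Exception {
        // URL-safe Base64 (RFC 4648) substitutes "-" for "+" and "_" for "/",
        // so a Base64-encoded MD5 digest can contain "_", as in the Riak CS
        // ETags in this thread, while Amazon's multipart ETags remain hex
        // plus a "-<part count>" suffix.
        byte[] md5 = MessageDigest.getInstance("MD5").digest("example".getBytes("UTF-8"));
        System.out.println(Base64.encodeBase64String(md5));        // standard: may contain "+" and "/"
        System.out.println(Base64.encodeBase64URLSafeString(md5)); // URL-safe: "-" and "_", no padding
    }
}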
Were you able to successfully download the file from Riak CS using s3cmd?
Thanks,
-John
On Jun 6, 2013, at 6:57 PM, Edison Su <edison...@citrix.com> wrote:
The ETags created by RIAK CS and Amazon S3 seem slightly different in the case
of multipart upload.
Here is the result I tested on both RIAK CS and Amazon S3, with s3cmd.
Test environment:
S3cmd: version: version 1.5.0-alpha1
Riak cs:
Name        : riak
Arch        : x86_64
Version     : 1.3.1
Release     : 1.el6
Size        : 40 M
Repo        : installed
From repo   : basho-products
The command I used to put:
s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
The ETag created for the file when using Riak CS is WxEUkiQzTWm_2C8A92fLQg==
DEBUG: Sending request method_string='POST', uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==', headers={'content-length': '309', 'Authorization': 'AWS OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date': 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun 2013 22:40:09 GMT', 'content-length': '326', 'content-type': 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
While the ETag created by Amazon S3 is: "70e1860be687d43c039873adef4280f2-3"
DEBUG: Sending request method_string='POST', uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--',
DEBUG: Response: {'status': 200, 'headers': {'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99', 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location><Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
So the ETag created on Amazon S3 has a "-" (dash) in it, but there is only a
"_" (underscore) on Riak CS.
Do you know the reason? What do we need to do to make it compatible with the
Amazon S3 SDK?
-----Original Message-----
From: John Burwell [mailto:jburw...@basho.com]
Sent: Thursday, June 06, 2013 2:03 PM
To: dev@cloudstack.apache.org
Subject: Re: Object based Secondary storage.
Min,
Are you calculating the MD5 or letting the Amazon client do it?
Thanks,
-John
On Jun 6, 2013, at 4:54 PM, Min Chen <min.c...@citrix.com> wrote:
Thanks Tom. Indeed I have an S3 question that needs some advice from
some S3 experts. To support uploading objects > 5G, I have used
TransferManager.upload to upload objects to S3; the upload went fine and
the object was successfully put to S3. However, later on, when I use
"s3cmd get <object key>" to retrieve this object, I always get this exception:
"MD5 signatures do not match: computed=Y, received=X"
It seems that Amazon S3 keeps a different MD5 sum for a multipart-uploaded
object. We have been using Riak CS for our S3 testing. If I
change to not use multipart upload and directly invoke S3
putObject, I do not run into this issue. Have you seen this
before?
-min
On 6/6/13 1:56 AM, "Thomas O'Dowd" <tpod...@cloudian.com> wrote:
Thanks Min. I've printed out the material and am reading new threads.
Can't comment much yet until I understand things a bit more.
Meanwhile, feel free to hit me up with any S3 questions you have. I'm
looking forward to playing with the object_store branch and testing
it out.
Tom.
On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
Welcome Tom. You can check out this FS
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
for the secondary storage architectural work done in the
object_store branch. You may also check out the following recent
threads regarding 3 major technical questions raised by the community, as
well as our answers and clarification.
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
That branch is mainly worked on by Edison and me, and we are in the PST
timezone.
Thanks
-min
--
Cloudian KK - http://www.cloudian.com/get-started.html
Fancy 100TB of full featured S3 Storage?
Checkout the Cloudian® Community Edition!

