Edison,

It appears that the S3 clients have a quirk in how they handle multi-part 
uploads.  I have filed a defect against Riak CS 
(https://github.com/basho/riak_cs/issues/585).  Once a patch has been 
merged into master, I will provide instructions for building from source (it is 
very easy), and we can move forward.  Until the patch is available, I recommend 
configuring TransferManager with a high multi-part upload threshold (4.5 GB 
should do the trick) and using files smaller than that threshold.
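
For reference, the workaround above amounts to a size check before each upload. The sketch below is in Python rather than the Java SDK, purely for illustration. The names MULTIPART_THRESHOLD_BYTES and fits_single_put are hypothetical, and it assumes "4.5 GB" is read in binary units (4.5 * 1024**3 bytes) and that the single-PUT limit S3 documents as 5 GB is likewise 5 * 1024**3 bytes.

```python
# Hypothetical helper names for illustration; not part of any SDK.
GIB = 1024 ** 3

# The 4.5 GB threshold suggested above, as a byte count (binary units assumed).
MULTIPART_THRESHOLD_BYTES = int(4.5 * GIB)

# S3 documents 5 GB as the largest object a single PUT can carry
# (taken here as 5 * 1024**3 bytes, which is an assumption).
SINGLE_PUT_LIMIT_BYTES = 5 * GIB


def fits_single_put(size_bytes, threshold=MULTIPART_THRESHOLD_BYTES):
    """Return True if a file of this size is strictly below the threshold,
    i.e. it would be sent as one PUT and avoid the multi-part ETag path."""
    return 0 <= size_bytes < threshold
```

A 4 GiB template passes this check; anything at or above the threshold would have to wait for the Riak CS patch.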

Thanks for running down this issue.  As I said, it is unexpected behavior, but 
after discussing it, the quickest remedy seems to be to have Riak CS emulate 
the quirk.
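
To make the quirk concrete: S3 is widely observed (this is not a documented contract) to return multi-part ETags of the form <32 hex characters>-<part count>, where the hex portion is the MD5 of the concatenated binary MD5 digests of the parts. Clients such as s3cmd skip their MD5 comparison when they see the dash. Below is a minimal Python sketch of both sides, assuming that observed convention; the function names are hypothetical.

```python
import hashlib


def s3_style_multipart_etag(parts):
    """Build an ETag in the observed S3 multi-part form:
    md5(md5(part1) + md5(part2) + ...) as hex, followed by "-<part count>".

    Assumption: this mirrors commonly observed S3 behaviour, not a documented API.
    """
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return "%s-%d" % (hashlib.md5(digests).hexdigest(), len(parts))


def should_verify_md5(etag):
    """Client-side check in the spirit of the s3cmd code quoted in this thread:
    only an ETag without a dash is treated as a plain MD5 checksum."""
    return "-" not in etag.strip('"')
```

With the Riak CS ETag reported in this thread, WxEUkiQzTWm_2C8A92fLQg==, should_verify_md5 returns True. That is why the clients attempt an MD5 comparison and fail.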
-John

On Jun 7, 2013, at 1:23 PM, Edison Su <edison...@citrix.com> wrote:

> 
> 
>> -----Original Message-----
>> From: John Burwell [mailto:jburw...@basho.com]
>> Sent: Friday, June 07, 2013 7:54 AM
>> To: dev@cloudstack.apache.org
>> Cc: Kelly McLaughlin
>> Subject: Re: Object based Secondary storage.
>> 
>> Thomas,
>> 
>> The AWS API explicitly states the ETag is not guaranteed to be an integrity
>> hash [1].  According to RFC 2616 [2], clients should not infer any meaning
>> from the content of an ETag.  Essentially, it is an opaque version identifier
>> which should only be compared for equality to another ETag value to detect a
>> resource change.  As such, I agree with your assessment that s3cmd is
>> making an invalid assumption regarding the value of the ETag.
> 
> 
> Not only s3cmd; the Amazon S3 Java SDK also makes the "invalid" assumption.
> How would you suggest solving the SDK incompatibility issue?
> 
>> 
>> Min, could you please send the stack trace you are receiving from
>> TransferManager?  Also, could you send a reference to the code in the Git repo?
>> With that information, we can start to run down the source of the problem.
>> 
>> Thanks,
>> -John
>> 
>> [1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
>> [2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
>> 
>> On Jun 7, 2013, at 1:08 AM, Thomas O'Dowd <tpod...@cloudian.com>
>> wrote:
>> 
>>> Min,
>>> 
>>> This looks like an s3cmd problem. I just downloaded the latest s3cmd
>>> to check the source code.
>>> 
>>> In S3/FileLists.py:
>>> 
>>>       compare_md5 = 'md5' in cfg.sync_checks
>>>       # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn"
>>>       if compare_md5:
>>>           if (src_remote == True and src_list[file]['md5'].find("-") >= 0) or (dst_remote == True and dst_list[file]['md5'].find("-") >= 0):
>>> 
>>> Basically, s3cmd is trying to verify that the checksum of the data
>>> that it downloads is the same as the etag unless the etag ends with "-YYY".
>>> This is an AWS convention (as I mentioned in an earlier mail) so it
>>> works but it seems that RiakCS has a different ETAG format which
>>> doesn't match -YYY so s3cmd assumes the other type of ETAG which is
>>> the same as the MD5 checksum. For RiakCS however, this is not the
>>> case. This is why you get the checksum error.
>>> 
>>> Chances are that Riak is doing the right thing here and the data file
>>> will be the same as what you uploaded. You could change the s3cmd code
>>> to be more lenient for Riak. The Basho guys might either like to
>>> change their format or talk to the different tool vendors about
>>> changing the tools to work with Riak. For Cloudian, we choose to try
>>> to keep it similar to AWS so we could avoid stuff like this.
>>> 
>>> Tom.
>>> 
>>> On Fri, 2013-06-07 at 04:02 +0000, Min Chen wrote:
>>>> John,
>>>> We are not able to successfully download, using s3cmd, a file that was
>>>> uploaded to Riak CS with TransferManager. We get the same error as we
>>>> encountered using the Amazon S3 Java client, due to the incompatible
>>>> ETag format (the - and _ difference).
>>>> 
>>>> Thanks
>>>> -min
>>>> 
>>>> 
>>>> 
>>>> On Jun 6, 2013, at 5:40 PM, "John Burwell" <jburw...@basho.com> wrote:
>>>> 
>>>>> Edison,
>>>>> 
>>>>> Riak CS and S3 seed their hashes differently, causing the form to
>>>>> appear slightly different.  In particular, Riak CS uses URI-safe base64
>>>>> encoding, which explains why its ETag values contain "_"s rather than
>>>>> "-"s.  From a client perspective, the ETags are treated as opaque
>>>>> strings that are passed through to the server for processing and
>>>>> compared strictly for equality.  Therefore, the form of the hash will
>>>>> not cause the client to choke, and the Riak CS behavior you are seeing
>>>>> is S3 API compatible (see
>>>>> http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html for
>>>>> more details).
>>>>> 
>>>>> Were you able to successfully download the file from Riak CS using
>> s3cmd?
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> 
>>>>> On Jun 6, 2013, at 6:57 PM, Edison Su <edison...@citrix.com> wrote:
>>>>> 
>>>>>> The ETags created by Riak CS and Amazon S3 seem a little bit
>>>>>> different in the case of multi-part upload.
>>>>>> 
>>>>>> Here are the results of my tests on both Riak CS and Amazon S3, with s3cmd.
>>>>>> Test environment:
>>>>>> s3cmd version: 1.5.0-alpha1
>>>>>> Riak CS:
>>>>>> Name        : riak
>>>>>> Arch        : x86_64
>>>>>> Version     : 1.3.1
>>>>>> Release     : 1.el6
>>>>>> Size        : 40 M
>>>>>> Repo        : installed
>>>>>> From repo   : basho-products
>>>>>> 
>>>>>> The command I used to put:
>>>>>> s3cmd put some-file s3://some-path --multipart-chunk-size-mb=100 -v -d
>>>>>> 
>>>>>> The etag created for the file, when using Riak CS is
>>>>>> WxEUkiQzTWm_2C8A92fLQg==
>>>>>> 
>>>>>> DEBUG: Sending request method_string='POST',
>>>>>> uri='http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test?uploadId=kfDkh7Q_QCWN7r0ZTqNq4Q==',
>>>>>> headers={'content-length': '309', 'Authorization': 'AWS
>>>>>> OYAZXCAFUC1DAFOXNJWI:xlkHI9tUfUV/N+Ekqpi7Jz/pbOI=', 'x-amz-date':
>>>>>> 'Thu, 06 Jun 2013 22:54:28 +0000'}, body=(309 bytes)
>>>>>> DEBUG: Response: {'status': 200, 'headers': {'date': 'Thu, 06 Jun
>>>>>> 2013 22:40:09 GMT', 'content-length': '326', 'content-type':
>>>>>> 'application/xml', 'server': 'Riak CS'}, 'reason': 'OK', 'data':
>>>>>> '<?xml version="1.0"
>>>>>> encoding="UTF-8"?><CompleteMultipartUploadResult
>>>>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://imagestore.s3.amazonaws.com/tmpl/1/1/routing-1/test</Location><Bucket>imagestore</Bucket><Key>tmpl/1/1/routing-1/test</Key><ETag>kfDkh7Q_QCWN7r0ZTqNq4Q==</ETag></CompleteMultipartUploadResult>'}
>>>>>> 
>>>>>> While the etag created by Amazon S3 is:
>>>>>> &quot;70e1860be687d43c039873adef4280f2-3&quot;
>>>>>> 
>>>>>> DEBUG: Sending request method_string='POST',
>>>>>> uri='/fixes/icecake/systdfdfdfemvm.iso1?uploadId=vdkPSAtaA7g.fdfdfdfdf..iaKRNW_8QGz.bXdfdfdfdfdfkFXwUwLzRcG5obVvJFDvnhYUFdT6fYr1rig--',
>>>>>> DEBUG: Response: {'status': 200, 'headers': {, 'server':
>>>>>> 'AmazonS3', 'transfer-encoding': 'chunked', 'connection':
>>>>>> 'Keep-Alive', 'x-amz-request-id': '8DFF5D8025E58E99',
>>>>>> 'cache-control': 'proxy-revalidate', 'date': 'Thu, 06 Jun 2013
>>>>>> 22:39:47 GMT', 'content-type': 'application/xml'}, 'reason': 'OK',
>>>>>> 'data': '<?xml version="1.0"
>>>>>> encoding="UTF-8"?>\n\n<CompleteMultipartUploadResult
>>>>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>http://fdfdfdfdfdfdf</Location>Key>fixes/icecake/systemvm.iso1</Key><ETag>&quot;70e1860be687d43c039873adef4280f2-3&quot;</ETag></CompleteMultipartUploadResult>'}
>>>>>> 
>>>>>> So the ETag created by Amazon S3 has a "-" (dash) in it, but the one from
>>>>>> Riak CS has only a "_" (underscore).
>>>>>> 
>>>>>> Do you know the reason? What do we need to do to make it
>>>>>> compatible with the Amazon S3 SDK?
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: John Burwell [mailto:jburw...@basho.com]
>>>>>>> Sent: Thursday, June 06, 2013 2:03 PM
>>>>>>> To: dev@cloudstack.apache.org
>>>>>>> Subject: Re: Object based Secondary storage.
>>>>>>> 
>>>>>>> Min,
>>>>>>> 
>>>>>>> Are you calculating the MD5 or letting the Amazon client do it?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Jun 6, 2013, at 4:54 PM, Min Chen <min.c...@citrix.com> wrote:
>>>>>>> 
>>>>>>>> Thanks Tom. Indeed I have an S3 question that needs some advice
>>>>>>>> from S3 experts. To support uploading objects > 5G, I have used
>>>>>>>> TransferManager.upload to upload the object to S3; the upload went
>>>>>>>> fine and the object was successfully put to S3. However, later on,
>>>>>>>> when I use "s3cmd get <object key>" to retrieve this object, I always
>>>>>>>> get this exception:
>>>>>>>> 
>>>>>>>> "MD5 signatures do not match: computed=Y, received="X"
>>>>>>>> 
>>>>>>>> It seems that Amazon S3 keeps a different MD5 sum for the
>>>>>>>> multi-part uploaded object. We have been using Riak CS for our S3
>>>>>>>> testing. If I switch to not using multi-part upload and directly
>>>>>>>> invoke S3 putObject, I do not run into this issue. Have you
>>>>>>>> seen this before?
>>>>>>>> 
>>>>>>>> -min
>>>>>>>> 
>>>>>>>> On 6/6/13 1:56 AM, "Thomas O'Dowd" <tpod...@cloudian.com>
>> wrote:
>>>>>>>> 
>>>>>>>>> Thanks Min. I've printed out the material and am reading new
>>>>>>>>> threads. Can't comment much yet until I understand things a bit more.
>>>>>>>>> 
>>>>>>>>> Meanwhile, feel free to hit me up with any S3 questions you
>>>>>>>>> have. I'm looking forward to playing with the object_store
>>>>>>>>> branch and testing it out.
>>>>>>>>> 
>>>>>>>>> Tom.
>>>>>>>>> 
>>>>>>>>> On Wed, 2013-06-05 at 16:14 +0000, Min Chen wrote:
>>>>>>>>>> Welcome Tom. You can check out this FS
>>>>>>>>>> 
>>>>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+Backup+Object+Store+Plugin+Framework
>>>>>>>>>> 
>>>>>>>>>> for the secondary storage architectural work done in the
>>>>>>>>>> object_store branch. You may also check out the following recent
>>>>>>>>>> threads regarding 3 major technical questions raised by the
>>>>>>>>>> community, as well as our answers and clarifications.
>>>>>>>>>> 
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3C77B337AF224FD84CBF8401947098DD87036A76%40SJCPEX01CL01.citrite.net%3E
>>>>>>>>>> 
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD22955.3DDDC%25min.chen%40citrix.com%3E
>>>>>>>>>> 
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201306.mbox/%3CCDD2300D.3DE0C%25min.chen%40citrix.com%3E
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> That branch is mainly worked on by Edison and me, and we are in
>>>>>>>>>> the PST time zone.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> -min
>>>>>>>>> --
>>>>>>>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>>>>>>>> Fancy 100TB of full featured S3 Storage?
>>>>>>>>> Checkout the Cloudian(r) Community Edition!
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Cloudian KK - http://www.cloudian.com/get-started.html
>>> Fancy 100TB of full featured S3 Storage?
>>> Checkout the Cloudian(r) Community Edition!
>>> 
> 
