Hi,

I see from your code sample that you are using the HTTP mode. Do you see the 
same issue if you switch to using Protocol Buffers?
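
For reference, a minimal sketch of switching the Python client between the two transports. It assumes the `riak` package's `RiakClient` accepts `protocol`, `host`, `pb_port` and `http_port` keyword arguments, and that the node listens on the stock ports 8087 (Protocol Buffers) and 8098 (HTTP); a devcluster normally uses different ports. The helper names `client_kwargs` and `make_client` are made up for illustration.

```python
# Assumed stock ports; adjust to match your own cluster configuration.
DEFAULT_PORTS = {"pbc": ("pb_port", 8087), "http": ("http_port", 8098)}


def client_kwargs(protocol, host="127.0.0.1"):
    """Build keyword arguments for riak.RiakClient for the chosen transport."""
    try:
        port_name, port = DEFAULT_PORTS[protocol]
    except KeyError:
        raise ValueError("protocol must be 'pbc' or 'http'")
    return {"protocol": protocol, "host": host, port_name: port}


def make_client(protocol, host="127.0.0.1"):
    """Create a client; riak is imported lazily because it is third-party."""
    import riak
    return riak.RiakClient(**client_kwargs(protocol, host))
```

Running the same upload/download test once with `make_client("http")` and once with `make_client("pbc")` would show whether the corruption follows the transport.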

Best regards,

Christian




On 9 Nov 2013, at 09:53, finkle mcgraw <finklemcg...@gmail.com> wrote:

> Hi John and Engel,
> 
> Here's a link to a Dropbox folder with a set of file pairs (each source file 
> and its corrupted version after a round trip through Riak):
> https://www.dropbox.com/sh/snfbiqm0jys9u2a/AZPF7_RcBT
> 
> John, to answer your questions:
> 
> Windows-->Riak-->Ubuntu VM
> When files are uploaded from Windows to Riak and then downloaded to the 
> Ubuntu VM, inconsistencies also appear, but always for the same subset of 
> files (if I repeatedly download the same set of files from Riak and verify 
> them against the source files). This indicates to me that these files were 
> corrupted on the upload from Windows to Riak.
> 
> Ubuntu VM-->Riak-->Windows
> When I upload the source files from the Ubuntu VM (after having verified 
> that they can be downloaded into the Ubuntu VM again without any problems) 
> and then download them to Windows, inconsistencies appear. However, these 
> inconsistencies vary from file to file between download rounds, i.e., 
> by downloading a file a few times I eventually get a non-corrupted version. 
> This indicates to me that the files were correctly uploaded to Riak from the 
> Ubuntu VM, but are corrupted somewhere in the download flow on the Windows 
> machine.
> 
> Ergo: Data appears to be corrupted both when going upstream and when going 
> downstream, somewhere inside the stack used by the Riak Python client on 
> Windows 7 64-bit.
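
The reasoning above (a file that fails in every download round was probably stored corrupted, while one that fails only in some rounds was probably damaged in transit on the way down) can be sketched as follows. The function name `classify_corruption` and the two labels are illustrative, not part of any Riak tooling, and the heuristic needs at least two rounds to say anything useful.

```python
def classify_corruption(rounds):
    """Classify files by how consistently they fail verification.

    rounds: a list with one entry per download round, each entry being the
    collection of file names that failed verification in that round.
    Returns a dict mapping each failing file name to a label.
    """
    rounds = [set(r) for r in rounds]
    if not rounds:
        return {}
    seen = set().union(*rounds)          # failed in at least one round
    always = set.intersection(*rounds)   # failed in every round
    return {
        name: "persistent (likely upload)" if name in always
        else "transient (likely download)"
        for name in sorted(seen)
    }
```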
> 
> One more observation: I've done some byte-for-byte comparisons when 
> uploading/downloading, and the error rate appears to be on the order of 0.4 
> ppm.
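
A byte-for-byte comparison of this kind might look like the sketch below. It assumes the rate is counted per byte (the message does not say whether bytes or bits were counted), and `diff_bytes` is a made-up name.

```python
def diff_bytes(source, returned):
    """Return (offsets, ppm) for two equal-length byte strings.

    offsets: positions at which the two byte strings differ.
    ppm: differing bytes per million bytes compared.
    """
    if len(source) != len(returned):
        raise ValueError("lengths differ; a per-offset comparison is not meaningful")
    offsets = [i for i, (a, b) in enumerate(zip(source, returned)) if a != b]
    ppm = len(offsets) * 1e6 / len(source) if source else 0.0
    return offsets, ppm
```

Logging the offsets as well as the rate makes it easier to spot patterns, such as corruption clustering at buffer boundaries.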
> 
> Finkle
> 
> 
> 
> 
> 
> 
> 
> 2013/11/9 John Daily <jda...@basho.com>
> (And the inverse would also be interesting to know.)
> 
> -John
> 
> On Nov 8, 2013, at 6:41 PM, John Daily <jda...@basho.com> wrote:
> 
>> If you upload the files from Windows, and download them to the Ubuntu VM, do 
>> inconsistencies ever appear?
>> 
>> -John
>> 
>> On Nov 8, 2013, at 4:58 PM, Engel Sanchez <en...@basho.com> wrote:
>> 
>>> Hello there,
>>> 
>>> This looks puzzling. Just from looking at the code we haven't found 
>>> anything suspicious. Would you mind posting a pair of those files that 
>>> failed to match somewhere so we can look at the differences?
>>> 
>>> Thanks for reporting this.
>>> 
>>> Engel@Basho
>>> 
>>> 
>>> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <finklemcg...@gmail.com> 
>>> wrote:
>>> Fellow Riak users,
>>> 
>>> I've noticed that when I upload binary files larger than about 1 MB to Riak 
>>> from my Windows 7 (64 bit) machine and then read the same data back, 
>>> it often has a few corrupted bytes while maintaining the correct total data 
>>> length.
>>> 
>>> Here's the Python script I use to provoke and detect the situation:
>>> https://gist.github.com/anonymous/7376084
>>> 
>>> Notice that I included the typical output when running the script at the 
>>> bottom of the gist. As you can see, for that particular run, half of the 
>>> dummy-data files were corrupted. The returned data from Riak has the exact 
>>> same length as the source, but not the exact same content. I've only done 
>>> a brief analysis of how the corruptions appear within the files that are 
>>> detected as corrupted, but it looks like typically between 1 and 5 
>>> bytes are altered, evenly distributed within the file.
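
The gist itself is not reproduced in this thread, so the following is only a sketch of the kind of round-trip check described: store each payload, read it back, and compare digests and lengths. The `store` and `fetch` callables are hypothetical stand-ins for whatever client calls the real script makes.

```python
import hashlib


def verify_round_trip(store, fetch, files):
    """Upload every payload, download it again, and report mismatches.

    store(name, data) and fetch(name) -> bytes are caller-supplied callables.
    files: dict mapping a name to its source bytes.
    Returns a list of (name, same_length) tuples for corrupted files.
    """
    for name, data in files.items():
        store(name, data)

    corrupted = []
    for name, data in files.items():
        returned = fetch(name)
        if hashlib.sha256(returned).digest() != hashlib.sha256(data).digest():
            # Record whether the length survived, as in the report above.
            corrupted.append((name, len(returned) == len(data)))
    return corrupted
```

Passing the transport in as callables keeps the check independent of the client library, so the same harness can be pointed at HTTP, Protocol Buffers, or an in-memory fake.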
>>> 
>>> I get no exceptions or warnings from the Riak Python client. Everything 
>>> appears to be in order.
>>> 
>>> So far I've tested this on two different Windows machines against two 
>>> different Riak clusters (a five-node Amazon cluster with a load balancer in 
>>> front, and a local devcluster running inside an Ubuntu 12.04 virtual 
>>> machine). The problems appear in all four possible combinations.
>>> 
>>> However, if I run the script from within an Ubuntu VM, on one of those 
>>> Windows machines, against either of the two Riak clusters, the problems do 
>>> NOT appear.
>>> 
>>> Another observation: If I generate 50 sample files, upload them, then 
>>> repeatedly download them, the script will detect 
>>> corruptions in different files on each repetition. E.g., on 
>>> round one it might say that files 1, 5 and 19 were corrupted, but on round 
>>> two it might say 3, 8 and 19.
>>> 
>>> Here is the Riak stats view from the Amazon cluster we're running (that I 
>>> tested the script against):
>>> https://gist.github.com/anonymous/7376379
>>> 
>>> But as I said, the corruptions also appear when working locally between a 
>>> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>>> 
>>> Here are my local package versions, running on Python 2.7.5 64 bit on 
>>> Windows 7 64 bit:
>>> protobuf==2.4.1
>>> riak==2.0.1
>>> riak-pb==1.4.1.1
>>> 
>>> Any ideas? This seems relatively serious, unless it's some kind of brutal 
>>> oversight on my part.
>>> 
>>> Finkle
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> 
>> 
> 
