Hi, I see from your code sample that you are using the HTTP mode. Do you see the same issue if you switch to using Protocol Buffers?
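To make the HTTP and Protocol Buffers runs directly comparable, a transport-independent byte-for-byte check could look roughly like the sketch below. It is only an illustration and is not taken from the gist: the helper names `diff_offsets` and `error_rate_ppm` are made up, it uses nothing beyond the Python standard library, and it assumes the returned data has the same length as the source, as described in the report.

```python
def diff_offsets(original, returned):
    """Return the offsets at which two equal-length byte strings differ."""
    if len(original) != len(returned):
        raise ValueError("lengths differ: %d vs %d"
                         % (len(original), len(returned)))
    # bytearray yields integers on both Python 2 and Python 3.
    a, b = bytearray(original), bytearray(returned)
    return [i for i in range(len(a)) if a[i] != b[i]]


def error_rate_ppm(original, returned):
    """Number of differing bytes per million bytes compared."""
    if not original:
        return 0.0
    return 1e6 * len(diff_offsets(original, returned)) / len(original)


if __name__ == "__main__":
    # Hypothetical data: 1,000,000 zero bytes with a single byte altered.
    source = b"\x00" * 1000000
    damaged = bytearray(source)
    damaged[12345] = 0xFF
    print(diff_offsets(source, bytes(damaged)))    # [12345]
    print(error_rate_ppm(source, bytes(damaged)))  # 1.0
```

Running the same check on data fetched over each transport, and logging the offsets, would show whether the altered bytes cluster at particular positions or are spread evenly through the file.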
Best regards,

Christian

On 9 Nov 2013, at 09:53, finkle mcgraw <finklemcg...@gmail.com> wrote:

> Hi John and Engel,
>
> Here's a link to a Dropbox folder with a set of file pairs (the source file
> and the corrupted version that has taken a round trip via Riak):
> https://www.dropbox.com/sh/snfbiqm0jys9u2a/AZPF7_RcBT
>
> John, to answer your questions:
>
> Windows-->Riak-->Ubuntu VM
> When uploading files from Windows to Riak, then downloading them to the
> Ubuntu VM, inconsistencies also appear, but always for the same subset of
> files (if I repeatedly download the same set of files from Riak and verify
> against the source files). This to me indicates that these files were
> corrupted on the upload from Windows to Riak.
>
> Ubuntu VM-->Riak-->Windows
> When uploading the source files from the Ubuntu VM (and after having verified
> that they can be downloaded into the Ubuntu VM again without any problems)
> and then downloading them to Windows, inconsistencies appear. However, these
> inconsistencies vary from file to file on each download round. I.e.,
> by downloading a file a few times I eventually get a non-corrupted version.
> This to me indicates that the files were correctly uploaded to Riak from the
> Ubuntu VM, but are corrupted somewhere in the download flow on the Windows
> machine.
>
> Ergo: Data appears to be corrupted both when going upstream and when going
> downstream somewhere inside the stack used by the Riak Python client on
> Windows 7 64 bit.
>
> One more observation: I've done some byte-for-byte comparisons when
> uploading/downloading, and the error rate appears to be on the order of 0.4
> ppm.
>
> Finkle
>
>
> 2013/11/9 John Daily <jda...@basho.com>
> (And the inverse would also be interesting to know.)
>
> -John
>
> On Nov 8, 2013, at 6:41 PM, John Daily <jda...@basho.com> wrote:
>
>> If you upload the files from Windows, and download them to the Ubuntu VM, do
>> inconsistencies ever appear?
>>
>> -John
>>
>> On Nov 8, 2013, at 4:58 PM, Engel Sanchez <en...@basho.com> wrote:
>>
>>> Hello there,
>>>
>>> This looks puzzling. Just from looking at the code we haven't found
>>> anything suspicious. Would you mind posting a pair of those files that
>>> failed to match somewhere so we can look at the differences?
>>>
>>> Thanks for reporting this.
>>>
>>> Engel@Basho
>>>
>>>
>>> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <finklemcg...@gmail.com>
>>> wrote:
>>> Fellow Riak users,
>>>
>>> I've noticed that when I upload binary files with sizes of >~1 MB to Riak
>>> from my Windows 7 (64 bit) machine, then read the same data back again,
>>> often it has a few corrupted bytes, while maintaining the correct total data
>>> length.
>>>
>>> Here's the Python script I use to provoke and detect the situation:
>>> https://gist.github.com/anonymous/7376084
>>>
>>> Notice that I included the typical output when running the script at the
>>> bottom of the gist. As you can see, for that particular run, half of the
>>> dummy-data files were corrupted. The returned data from Riak has the exact
>>> same length as the source, but not the exact same content. I've only done
>>> brief analysis of how the corruptions appear within the files that are
>>> detected as corrupted, but it looks like it's typically between 1 and 5
>>> bytes that are altered, evenly distributed within the file.
>>>
>>> I get no exceptions or warnings from the Riak Python client. Everything
>>> appears to be in order.
>>>
>>> So far I've tested this on two different Windows machines against two
>>> different Riak clusters (a five node Amazon cluster with a load balancer in
>>> front, and a local devcluster running inside an Ubuntu 12.04 Virtual
>>> Machine). The problems appear in all four possible combinations.
>>>
>>> However, if I run the script from within an Ubuntu VM, on one of the said
>>> Windows machines, against either of the two Riak clusters, the problems do
>>> NOT appear.
>>>
>>> Another observation: If I generate 50 sample files, upload them, then
>>> repeatedly try to download them over and over again, the script will detect
>>> corruptions in different files on each repetition of downloading. E.g., on
>>> round one it might say that files 1, 5, and 19 were corrupted, but on round
>>> two it might say 3, 8 and 19.
>>>
>>> Here is the riak stats-view from the Amazon cluster we're running (that I
>>> tested the script against):
>>> https://gist.github.com/anonymous/7376379
>>>
>>> But as I said, the corruptions also appear when working locally between a
>>> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>>>
>>> Here are my local package versions, running on Python 2.7.5 64 bit on
>>> Windows 7 64 bit:
>>> protobuf==2.4.1
>>> riak==2.0.1
>>> riak-pb==1.4.1.1
>>>
>>> Any ideas? This seems relatively serious, unless it's some kind of brutal
>>> oversight on my part.
>>>
>>> Finkle
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com