Hi Elias,

Thanks for your excellent description of the problem. To my knowledge we haven't seen this before, and it isn't expected behavior.

Can you answer a few questions to help us troubleshoot?

- Can you send me the exact error message you saw when the client failed to deserialize the object?
- What OS are you running? Virtualized or bare-metal?
- What version of the Ruby library does your app use?
- Anything special about the disks? (Super slow and old? Shiny new SSDs? RAID?)
- Can you reproduce the results without using multi_backend?
- Does the data load successfully when you use Bitcask instead of LevelDB?

Also, if you can share your code, or if you have a small script that can reproduce the failure, that would be extremely helpful.

Best,
Rusty

On Sun, Oct 30, 2011 at 7:58 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:

> I am finding that there appears to be some sort of race condition when
> reading recently written objects (as in concurrently). I am using Riak
> 1.0.0 with the leveldb backend through the multi backend in a 3 node
> cluster. Writes are done with W=2 and reads with R=2. The client is using
> the riak client Ruby gem.
>
> The issue cropped up while working on a data loading script. The script
> loads data from a file and inserts it into the cluster. It attempts to do so
> in parallel, with configurable concurrency. This data is largely
> non-repetitive. Usually an object is written once, and this has worked
> without major issue. I recently changed the script to collect statistics on
> some of the data being inserted, and insert the stats into a different
> bucket. The stats are written in JSON and keyed by a value in the data being
> loaded. The script will attempt to fetch the stats object for the key
> currently under consideration, merge in the new stats if it finds one, and
> store the new or updated object.
>
> Once some of the stats objects started to grow into the KB range, the
> reading of some existing stat objects started to fail. Upon examination it
> seems the data in the object was being truncated and thus riak client
> failed to deserialize the object as it was no longer valid. But if I
> fetched the object manually it was returned complete. I added a loop to the
> script to retry such truncated fetches, and I found that they would succeed
> after a few tries.
>
> It would thus appear that Riak is making the new object available to be
> fetched before its data is fully stored, leading to the apparently
> truncated return. The issue only becomes visible once the object is large
> enough to introduce enough delay in processing for store and fetch
> operations to overlap. Using W=2 and R=2 probably has no effect as only
> the vclocks are compared, not the actual data stored. Not sure if this is
> an issue with the new leveldb backend or the KV code.
>
> Anyone seen this? Is it expected behavior? Shouldn't the new object only
> be exposed after it has been completely received and stored?
>
> Elias
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Rusty Klophaus (@rustyio)
*Basho Technologies, Inc.*
www.basho.com