Re: Race condition reading objects

Rusty Klophaus Mon, 31 Oct 2011 13:02:00 -0700

Hi Elias,

Thanks for your excellent description of the problem. We haven't seen this
before to my knowledge, and this isn't expected behavior.


Can you answer a few questions to help us troubleshoot?

   - Can you send me the exact error message you saw when the client failed
   to deserialize the object?
   - What OS are you running? Virtualized or bare-metal?
   - What version of the Ruby library does your app use?
   - Anything special about the disks? (Super slow and old? Shiny new
   SSDs? RAID?)
   - Can you reproduce the results without using multi_backend?
   - Does the data load successfully when you use Bitcask instead of
   LevelDB?

Also, if you can share your code, or if you have a small script that can
reproduce the failure, that would be extremely helpful.

Best,
Rusty

On Sun, Oct 30, 2011 at 7:58 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:

> I am finding that there appears to be some sort of race condition when
> reading recently written objects (as in concurrently).  I am using Riak
> 1.0.0 with the leveldb backend through the multi backend in a 3 node
> cluster.  Writes are done with W=2 and reads with R=2.  The client is using
> the riak client Ruby gem.
>
> The issue cropped up while working on a data loading script.  The script
> loads data from a file and inserts it into the cluster.  It attempts to do
> so in parallel, with configurable concurrency.  This data is largely
> non-repetitive.  Usually an object is written only once, and this has
> worked without major issue.  I recently changed the script to collect
> statistics on some of the data being inserted, and to insert the stats into
> a different bucket.  The stats are written in JSON and keyed by a value in
> the data being loaded.  The script attempts to fetch the stats object for
> the key currently under consideration, merges in the new stats if it finds
> one, and stores the new or updated object.
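
The fetch, merge, and store cycle described above can be sketched roughly as follows. This is not the original loading script: a plain dict stands in for the stats bucket, and the merge rule (summing counters per field) and the names `merge_stats`, `update_stats`, and `host-a` are assumptions made only for illustration.

```python
import json

def merge_stats(existing, new):
    """Hypothetical merge rule: add per-field counters together."""
    merged = dict(existing)
    for field, count in new.items():
        merged[field] = merged.get(field, 0) + count
    return merged

def update_stats(bucket, key, new_stats):
    """Fetch the stats object for key, merge in new_stats, and store it."""
    raw = bucket.get(key)                      # fetch; None if no object yet
    existing = json.loads(raw) if raw is not None else {}
    merged = merge_stats(existing, new_stats)
    bucket[key] = json.dumps(merged)           # store the new or updated object
    return merged

bucket = {}  # stand-in for the stats bucket; the real script talks to Riak
update_stats(bucket, "host-a", {"requests": 3})
update_stats(bucket, "host-a", {"requests": 2, "errors": 1})
```

With concurrent loaders, the fetch and the store for the same key can interleave, which is the window in which the reported failures appeared.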
>
> Once some of the stats objects started to grow into the KB range, the
> reading of some existing stat objects started to fail.  Upon examination it
> seems the data in the object was being truncated, and thus the riak client
> failed to deserialize the object as it was no longer valid.  But if I
> fetched the object manually, it was returned complete.  I added a loop to the
> script to retry such truncated fetches, and I found that they would succeed
> after a few tries.
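
A retry loop of the kind described above might look like the sketch below. It is an illustration under assumptions, not the code from the original script: `fetch_with_retry` and `flaky_fetch` are hypothetical names, the fetch is simulated rather than sent to Riak, and a truncated object is detected simply by the JSON parser rejecting it.

```python
import json
import time

def fetch_with_retry(fetch, key, attempts=5, delay=0.05):
    """Call fetch(key) until its result parses as JSON, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        raw = fetch(key)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:    # a truncated body is invalid JSON
            last_error = err
            time.sleep(delay)
    raise last_error

# Simulated fetch: returns a truncated body twice, then the full object.
full = json.dumps({"requests": 5, "errors": 1})
responses = iter([full[:10], full[:-1], full])

def flaky_fetch(key):
    return next(responses)

stats = fetch_with_retry(flaky_fetch, "host-a", delay=0)
```

A loop like this only masks the symptom; it does not explain why a partially written object would be visible in the first place.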
>
> It would thus appear that Riak is making the new object available to be
> fetched before its data is fully stored, leading to the apparently
> truncated return.  The issue only becomes visible once the object is large
> enough to introduce enough delay in processing for store and fetch
> operations to overlap.  Using W=2 and R=2 probably has no effect as only
> the vclocks are compared, not the actual data stored.  Not sure if this is
> an issue with the new leveldb backend or the KV code.
>
> Anyone seen this?  Is it expected behavior?  Shouldn't the new object only
> be exposed after it has been completely received and stored?
>
> Elias


-- 
Rusty Klophaus (@rustyio)
*Basho Technologies, Inc.*
www.basho.com

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
