Hi all,

I've come across some issues while testing how our system behaves under failure, for example when a machine fails. One of the (slightly scary) issues is that, for a short while after a Riak node goes down, data read from another node isn't always consistent. I have written a small test script to demonstrate this issue:
https://gist.github.com/847749

Halfway through I switch off a node; here are the results:

    Deleted 0
    Wrote 100 454551
    1298916758: 100 454551
    1298916759: 100 454551
    1298916760: 100 454551
    1298916761: 100 454551
    1298916762: 100 454551
    1298916762: 100 454551
    1298916763: 100 454551
    1298916764: 100 454551
    (Shutdown around here)
    1298916765: 100 454551
    1298916766: 99 460532
    1298916767: 91 412241
    1298916768: 100 454551
    1298916769: 100 454551
    1298916770: 100 454551
    1298916771: 100 454551
    1298916772: 100 454551
    1298916773: 100 454551
    1298916774: 100 454551
    1298916775: 100 454551
    1298916776: 100 454551
    1298916777: 100 454551
    ^C1298916777: 100 454551
    Deleted 100

Slightly more scary is that it appears to sometimes read old (deleted) data:

    Deleted 0
    Wrote 100 495792
    1298916784: 100 495792
    1298916785: 100 495792
    1298916786: 100 495792
    1298916786: 100 495792
    (Shutdown around here)
    1298916787: 100 495792
    1298916788: 100 487322
    1298916789: 100 495792
    1298916790: 100 495792
    1298916791: 100 495792
    1298916792: 100 495792
    1298916793: 100 495792
    1298916794: 100 495792
    1298916795: 100 495792
    ^C1298916796: 100 495792
    1298916797: 100 495792
    Deleted 100

This is using the Ripple library (0.8.3) talking directly to the local node; however, I believe the same problem happens when using the Erlang PBC library. The problem seems to be exacerbated when larger amounts of data are stored in Riak, and eventual consistency then takes longer to occur.

I am quite puzzled as to why this is happening. I could kind of understand if data went missing, but the eventual consistency is what puzzles me: I only have two nodes, so why does the data eventually sort itself out? Secondly, why does this still happen even with W and DW set to 3 (I originally had the script using the default values, but thought I would try this)?
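For anyone reading without the gist to hand, here is roughly what each output line above measures. This is not the script from the gist (that one uses Ripple); it is a minimal Python sketch, assuming the first column is the number of keys that could be read back and the second is the sum of their values. The names `snapshot` and `format_line`, and the dict standing in for the cluster, are made up for illustration.

```python
import time


def snapshot(fetch, keys):
    """Return (count, total) over the keys that are currently readable.

    fetch(key) is a hypothetical reader: it returns an int, or None when
    the key is not found.
    """
    values = [fetch(key) for key in keys]
    found = [value for value in values if value is not None]
    return len(found), sum(found)


def format_line(now, count, total):
    # Same shape as the output above: "<unix time>: <count> <sum>"
    return f"{int(now)}: {count} {total}"


# In-memory stand-in for the cluster; a real run would read from Riak
# once per second and print one line per poll.
store = {f"key{i}": i * 10 for i in range(100)}
count, total = snapshot(store.get, list(store))
print(format_line(time.time(), count, total))
```

With a check like this, a missing key lowers the count, while a stale or resurrected value changes the sum without necessarily changing the count, which would match the two kinds of anomaly in the runs above.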
Both nodes are running Riak 0.14.0; here are the relevant configs:
https://gist.github.com/847756

Apologies if I am just doing something stupid, it has been a rather long day :)

Regards,
Luca Spiller
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com