If ElasticSearch is a better fit, then using ElasticSearch is the right thing to do. The whole "NoSQL movement" is really about choice. At scale there will never be a single solution that is best for everyone. Riak is intentionally focused on high availability, reliability, and fault-tolerance. If you have critical data and/or care most about optimizing for "it just works" (even during a server failure, even at 2 AM), then Riak is likely the best choice. If other priorities dominate, then you may want to look elsewhere.
Riak is based on Dynamo. Dynamo is an eventually consistent system that embraces sloppy quorums and hinted handoff. In terms of CAP, it is an AP system rather than a CP system. By embracing sloppy quorums, Riak/Dynamo takes AP to its limit, optimizing for "always writable" over "read your own writes" consistency. In comparison, Project Voldemort is another Dynamo-inspired system that chooses to go with strict quorums. That provides read-your-own-writes (RYOW) eventual consistency, but leads to lower availability guarantees during failures/partitions.

For Riak 1.0, we introduced PW/PR. The intention behind PW/PR was to add support for strict quorums: requests with R/W settings would use the standard sloppy quorum logic, while requests with PR/PW would use strict quorum logic. As Jon mentioned in his earlier email, the current implementation that went into 1.0 turns out to have a corner case where strict quorums aren't enforced. Addressing this corner case and strengthening the guarantees of PW/PR are "on the list".

To illustrate sloppy vs strict, consider a 5-node cluster: A/B/C/D/E. A W=Quorum/N=3 write request is issued for a key owned by nodes A/B/C, but nodes B and C are currently offline. With sloppy quorums/hinted handoff, Riak will re-route the requests meant for B/C to nodes D/E. This allows the write to succeed with the guarantee of having 3 replicas in the cluster. Later, D will hand off its replica back to B whenever B comes back online, and E will do the same for C. That is hinted handoff. With strict quorums (i.e. PW=Quorum), the desired behavior is to only send requests to the primary nodes. Since B/C are down, only A would respond, and the write would fail because it did not fulfill the requested replica requirement. Again, in the current implementation there are cases where this is not guaranteed and a request may be sent to a fallback node, which leads to the behavior discussed earlier. (A short sketch below illustrates both modes.)

Of course, even if we had perfect PW/PR semantics, Riak still only gives you a limited form of "read your own writes" consistency. The labels "absolute consistency", "strong consistency", or even "atomic operation" are vague when discussing distributed systems with multiple clients. Some thoughts to ponder:

1. Do you allow multiple clients to write to Riak at the same time? With concurrent writers, "atomic" can mean multiple things. Do you want linearizability? Do you want one writer to fail? Is optimistic concurrency control or MVCC your solution? You could route everything through a single writer, but then you introduce a single point of failure (SPOF), and adding an SPOF in front of a highly available system is non-ideal. Or perhaps a single writer with leader election / failover?

2. What about write failure? In Riak, a write failure does not mean the value won't later show up in a read. If you issue a PW=3 write, it may fail because it only succeeded in writing to 1 replica, not the other 2. However, the one replica that does have the value will eventually propagate it to the other 2 replicas through read repair. Thus, you may eventually read the failed-to-write value. Of course, the write could have failed on all 3 replicas, in which case you won't ever read it. Which type of failure was it? You don't know, because actual failure and "didn't reply before I responded to the client" are indistinguishable without something similar to a 2-phase commit. Does your client handle this case? Perhaps it just re-issues writes until they succeed?
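To make the A/B/C/D/E example above concrete, here is a minimal Python sketch of the decision a write coordinator makes under each mode. The function name and structure are purely illustrative assumptions on my part, not the riak_kv implementation:

    # Minimal sketch (not riak_kv code): how a write coordinator could
    # decide success under sloppy vs. strict quorums.
    def write_succeeds(primaries, fallbacks, up, w, strict):
        """primaries/fallbacks: preferred and substitute nodes for the key.
        up: nodes currently reachable. w: required acks.
        strict: model PW (primaries only); otherwise use a sloppy quorum."""
        acked = [n for n in primaries if n in up]
        if not strict:
            # Sloppy quorum: offline primaries are stood in for by fallback
            # nodes, which hold the replica until hinted handoff returns it.
            missing = len(primaries) - len(acked)
            acked += [n for n in fallbacks if n in up][:missing]
        return len(acked) >= w

    up = {"A", "D", "E"}                            # B and C are offline
    primaries, fallbacks = ["A", "B", "C"], ["D", "E"]
    print(write_succeeds(primaries, fallbacks, up, w=2, strict=False))  # True
    print(write_succeeds(primaries, fallbacks, up, w=2, strict=True))   # False

The 1.0 corner case mentioned above is essentially the strict path occasionally behaving like the sloppy path: a fallback node can end up being counted toward PW.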
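And on the "just re-issue writes" question, a typical client-side pattern looks something like the following. The put() helper is an assumed stand-in for whatever client library you use, not a specific Riak API:

    # Hypothetical retry loop; put() is an assumed stand-in, not a real
    # Riak client call. Note that even a write that exhausts its retries
    # may still surface later through read repair, as described above.
    import time

    def put_with_retry(put, key, value, attempts=5, backoff=0.1):
        for i in range(attempts):
            try:
                put(key, value)            # e.g. a PW=quorum write
                return True
            except IOError:                # not enough replicas acknowledged
                time.sleep(backoff * (2 ** i))
        return False                       # "failed" -- but possibly partially written

This answers "retry until success", but not what happens if the client itself dies mid-retry, which is the next question.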
What if your client dies while re-issuing requests? Is the value lost? What consistency guarantees do you want to provide in this scenario?

In general, distributed consistency is non-trivial. Even master/slave systems have choices to make: synchronous vs asynchronous replication. ElasticSearch is synchronous (at least to secondary RAM), while MongoDB is asynchronous. If you have multiple slaves, what consistency guarantees are there between all of them? If the master crashes during a write that was replicated to some but not all slaves, is it possible to get different values on a read if a different subset of slaves crashes as well? For the strongest replication guarantees, you end up with protocols with higher latency and lower availability guarantees. It's always a tradeoff game.

As an aside, around March 2010 I started to investigate strong consistency in Riak. Part of that work led to an implementation of riak_zab (http://github.com/jtuple/riak_zab), an atomic, 2-phase commit protocol built on riak_core that I released last year. I have unreleased code that provided a strongly consistent riak_kv layer (riakual) on top of riak_zab. This was one of many possible ways to add stronger guarantees to Riak. As Jon mentioned, stronger consistency is a research area for 2012. While CAP dictates that you can't have C/A/P at once, there's no reason you can't have a product that provides both AP requests and CP requests. Perhaps there will be more to discuss on that point later this year.

The main take-away of this long email is that providing different guarantees in the presence of node failures and network partitions is a non-trivial problem. If the goal is high availability and no SPOF, the problem is even more challenging. I would recommend against anyone trying to implement client-side strong consistency on top of Riak, unless you understand the scope of the problem and are intentionally limiting yourself to a subset of strong consistency (e.g. "assume writes always succeed", "assume there is only ever one writer", etc). Or, if you understand how you would do so, leverage something like Zookeeper to provide consistent replicated state that is backed by Riak (that's essentially what riakual/riak_zab did). Or, better yet, look at how you can reformulate your problem to work in an eventually consistent system. Less coordination will always provide faster, more predictable performance. Options like statebox, knockbox, and meangirls can help with certain problem domains (a tiny sketch of the general idea follows at the end of this mail):

https://github.com/mochi/statebox
http://reiddraper.com/introducing-knockbox/
https://github.com/aphyr/meangirls

Finally, if you truly need atomic, strong consistency today and not tomorrow, then consider other options. Seriously, if Riak isn't the right fit, "Don't use my database":

http://www.slideshare.net/BashoTechnologies/basho-and-riak-at-goto-stockholm-dont-use-my-database

If you do look at other options, take the time to truly understand the guarantees provided. What does atomic, multi-master replication mean for a given product? What failure conditions can it tolerate? Can you ever lose data under rare but realistic scenarios? If you're unsure, ask questions. It's not about good or bad. Different products have different guarantees. The goal is to ensure that whatever features/guarantees your project needs are provided by the choice you decide upon.
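The common thread behind statebox, knockbox, and meangirls is to store state whose merge function is commutative, associative, and idempotent, so that siblings produced by concurrent writes (allow_mult=true) can be resolved deterministically on read, with no coordination between writers. A tiny Python illustration of the idea (not the API of any of those libraries):

    # Illustration of the statebox/CRDT idea: keep state whose merge is
    # commutative, associative, and idempotent, so siblings from
    # concurrent writes converge no matter the merge order.
    def merge_gset(siblings):
        """Grow-only set: merging is just set union."""
        merged = set()
        for s in siblings:
            merged |= s
        return merged

    # Two clients write concurrently without seeing each other's value;
    # Riak keeps both siblings, and the reader merges them.
    sibling_a = {"item-1", "item-2"}
    sibling_b = {"item-2", "item-3"}
    print(sorted(merge_gset([sibling_a, sibling_b])))   # item-1, item-2, item-3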
-Joe
@jtuple

On Wed, Jan 11, 2012 at 11:07 AM, Les Mikesell <lesmikes...@gmail.com> wrote:
> On Wed, Jan 11, 2012 at 10:56 AM, Vishal Shah <gold...@gmail.com> wrote:
>> To add to Ian's comment, for me personally, this specific characteristic is
>> in fact a very important distinguishing feature of Riak vs other scalable KV
>> systems. To me, this is what separates
>
> Yes, but it makes it unusable for anything that requires atomic
> operations. A group here just went with elasticsearch - partly
> because of the full lucene indexer and range queries, but mostly
> because they wanted redundant data feeds and an atomic operation to
> reject duplicates. And as a nice side effect, you can run a
> client-only node that doesn't store data on the same box with the
> application so you don't need to go through a separate load balancer
> to deal with failures of the node the app is configured to use.
>
> --
> Les Mikesell
> lesmikes...@gmail.com

--
Joseph Blomstedt <j...@basho.com>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/