On Mon, Nov 22, 2010 at 5:14 PM, Todd Lipcon <t...@lipcon.org> wrote:
> On Mon, Nov 22, 2010 at 1:58 PM, David Jeske <dav...@gmail.com> wrote:
>>
>> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon <t...@lipcon.org> wrote:
>>>
>>> Not quite. The replica synchronization code is pretty messy, but
>>> basically it will take the longest replica that may have been synced,
>>> not a quorum.
>>> i.e. the guarantee is that "if you successfully sync() data, it will be
>>> present after replica synchronization". Unsynced data *may* be present
>>> after replica synchronization.
>>> But keep in mind that recovery is blocking in most cases - i.e. if the
>>> RS is writing to a pipeline and waiting on acks, and one of the nodes in
>>> the pipeline dies, then it will recover the pipeline (without the dead
>>> node) and continue syncing to the remaining two nodes. The client is
>>> still blocked at this point.
>>
>> I see. So it sounds like my statement #1 was wrong. Will the RS ever
>> time out the write and fail if it cannot push it to HDFS?
>> Is it correct to say:
>> Once a write is issued to HBase, it will either catastrophically fail
>> (i.e. disconnect), in which case the write will either have failed or
>> succeeded, and if it succeeded, future reads will always show that
>> write? As opposed to Cassandra, which in all configurations where reads
>> allow a subset of all nodes, can "fail" a write yet show a temporary
>> period of inconsistency (depending on which node you talk to), followed
>> by the write either applying or not, depending on whether it actually
>> reached a single node during the "failure to meet the write consistency
>> request"?
>
> Yes, this seems accurate to me.
>
>> Does Cassandra have any return result which distinguishes between these
>> two states:
>> 1 - your data was not written to any nodes (true failure)
>> 2 - your data was written to at least 1 node, but not enough to meet
>> your write-consistency count
>> ?
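[For reference, Cassandra's Thrift API of that era surfaces roughly this
distinction through its exception types. The sketch below assumes the
0.7-style insert() signature, and the tryWrite() helper and its return
strings are illustrative only, not a definitive answer:]

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.TimedOutException;
    import org.apache.cassandra.thrift.UnavailableException;
    import org.apache.thrift.TException;

    class WriteStates {
        // Classifies a write into the two "failure" states asked about
        // above. tryWrite() is a hypothetical helper; insert()'s signature
        // is assumed to be the 0.7-era Thrift form.
        static String tryWrite(Cassandra.Client client, ByteBuffer key,
                               ColumnParent parent, Column col)
                throws TException {
            try {
                client.insert(key, parent, col, ConsistencyLevel.QUORUM);
                return "acked by a quorum";
            } catch (UnavailableException e) {
                // State 1: too few replicas alive, so the coordinator
                // rejects the write up front -- it reached zero nodes.
                return "not written to any node";
            } catch (TimedOutException e) {
                // State 2: the write was sent but too few replicas acked
                // in time. It may exist on >= 1 node and can still
                // propagate later (hinted handoff / read repair) despite
                // the reported failure.
                return "possibly written to some nodes";
            }
        }
    }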
David,

Return messages such as "your data was written to at least 1 node but not
enough to meet your write-consistency count" do not help the situation: the
client that wrote the data would be aware of the inconsistency, but other
clients would not. Thus it only makes sense to pass or fail entirely.
(Though it could be an interesting error message.)

Right, CASSANDRA-1314 only solves the memory overhead issue.

Another twist to throw into the "losing writes" conversation is that file
systems can lose writes as well :) unless you choose synchronous options
that most people do not use (IMHO). See the fsync sketch below.

@Todd: good catch about caching HFile blocks, but my point still applies.
Caching HFile blocks on a single node vs. individual "datums" on N nodes
may not be more efficient, so terms like "slower" and "less efficient" can
be very misleading. Isn't caching only the item you asked for more
efficient? Under heavy random reads, is evicting single keys cheaper than
evicting whole blocks in terms of memory churn? (The back-of-envelope
sketch at the end illustrates the trade-off.) These are difficult questions
to answer absolutely, so bullet points such as "Cassandra has slower X" are
oversimplifications of complex problems.
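[To make the "synchronous options" point concrete, here is a minimal Java
sketch; the path and payload are made up. Without the force() call, a crash
or power loss can drop bytes that write() already reported as written:]

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    class DurableAppend {
        static void append(String path, byte[] data) throws IOException {
            try (FileOutputStream out = new FileOutputStream(path, true)) {
                FileChannel ch = out.getChannel();
                ch.write(ByteBuffer.wrap(data));
                // Without this, the bytes may sit only in the OS page
                // cache and vanish on power loss -- the "file systems lose
                // writes" case. force(true) also flushes file metadata,
                // roughly fsync() vs. fdatasync() semantics. Most
                // applications skip it for speed, which is the point above.
                ch.force(true);
            }
        }
    }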
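[And a back-of-envelope sketch of the caching question; every number here
(block size, row size, replication factor) is an illustrative assumption,
not a measurement:]

    class CacheGranularity {
        public static void main(String[] args) {
            int blockSize = 64 * 1024; // typical HFile block (assumption)
            int rowSize = 200;         // hypothetical hot row
            int replicas = 3;          // hypothetical RF=3 cluster

            // HBase-style block cache: one node caches the whole 64 KB
            // block, i.e. the hot row plus ~300 neighbors it may never
            // serve again.
            long blockCacheBytes = blockSize;

            // Cassandra-style per-row caching: exactly the hot row, but
            // potentially duplicated on every replica.
            long rowCacheBytes = (long) rowSize * replicas;

            System.out.printf("block cache: %d bytes on one node%n",
                              blockCacheBytes);
            System.out.printf("row cache:   %d bytes cluster-wide%n",
                              rowCacheBytes);
            // Random point reads favor row-granularity caching; scans and
            // key locality favor block caching -- which is why a bare
            // "slower / less efficient" bullet oversimplifies.
        }
    }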