On Fri, Aug 13, 2010 at 1:11 AM, Benjamin Black <b...@b3k.us> wrote:
> On Thu, Aug 12, 2010 at 8:54 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> There are two concerns that give me pause.
>>
>> The first is that 1072 is tackling a use case that Cassandra already
>> handles well: high volume of writes to a counter, with low volume
>> reads.  (This can be done by inserting uuids into a counter row, and
>> aggregating them either in the background or at read time or with some
>> combination of these.  The counter rows can be sharded if necessary.)
>>
>
> This is simply not an acceptable alternative and just can't be called
> handling it "well".

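For concreteness, the uuid-insert approach quoted above can be sketched roughly as follows. This is a toy in-memory model, not Cassandra client code: the `rows` dict stands in for a column family, and `NUM_SHARDS`, `shard_key`, `increment`, `read`, and `compact` are hypothetical names chosen for illustration. A real background aggregation would also have to cope with concurrent writers and replication, which this sketch ignores.

```python
import uuid
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration

# Toy stand-in for a column family: row key -> {column name: value}.
rows = defaultdict(dict)

def shard_key(counter, shard):
    """Row key for one shard of a logical counter."""
    return f"{counter}:{shard}"

def increment(counter, delta=1):
    """Record an increment as a new, uniquely named column (no read needed)."""
    col = uuid.uuid4()
    shard = col.int % NUM_SHARDS  # spread writes across the sharded rows
    rows[shard_key(counter, shard)][col] = delta

def read(counter):
    """Aggregate at read time: sum every column in every shard."""
    return sum(
        sum(rows[shard_key(counter, s)].values())
        for s in range(NUM_SHARDS)
    )

def compact(counter):
    """Background aggregation: fold each shard's columns into one column."""
    for s in range(NUM_SHARDS):
        row = rows[shard_key(counter, s)]
        total = sum(row.values())
        row.clear()
        if total:
            row[uuid.uuid4()] = total
```

Writes never read existing state, which is why this suits the high-write, low-read case; the cost is pushed to `read` or to the background `compact` pass.
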
What part is it handling poorly, at a technical level?  This is almost
exactly what 1072 does internally -- we are concerned here with the
high write, low read volume case.

> It is equivalent to "make the users do it", which
> is the case for almost anything.

I strongly feel we should be in the business of providing building
blocks, not special cases on top of that.  (But see below, I *do*
think the 580 version vectors is the kind of building block we want!)

> The reasons #1072 is so valuable:
>
> 1) Does not require _any_ user action.

This can be addressed at the library level.  Just as our first stab at
ZK integration was a rather clunky patch; "cages" is better.

> 2) Does not change the EC-centric model of Cassandra.

It does, though.  1072 is *not* a version vector-based approach --
that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
to Kelvin for writing that up!)

I'm referring in particular to reads requiring CL.ALL.  (My
understanding is that in the previous design, a "master" replica was
chosen and was always written to first.)  Both of these break "the
EC-centric model" and that is precisely the objection I made when I
said "ConsistencyLevel is not respected."  I don't think this is
fixable in the 1072 approach.  I would be thrilled to be wrong.

>> The second is that the approach in 1072 resembles an entirely separate
>> system that happens to use part of Cassandra infrastructure -- the
>> thrift API, the MessagingService, the sstable format -- but isn't
>> really part of it.  ConsistencyLevel is not respected, and special
>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>> business.
>
> Then let's find ways to make it as elegant as it can be.  Ultimately,
> this functionality needs to be in Cassandra or users will simply
> migrate someplace else for this extremely common use case.

This is what I've been pushing for.  The version vector approach to
counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
EC-centric approach that addresses a case that we *don't* currently
handle well (counters with a higher read volume than 1072).

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com