On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> This is simply not an acceptable alternative and just can't be called
>> handling it "well".
>
> What part is it handling poorly, at a technical level? This is almost
> exactly what 1072 does internally -- we are concerned here with the
> high write, low read volume case.
>
Requiring clients to directly manage the counter rows in order to
periodically compress or segment them. Yes, you can emulate the
behavior. No, that is not handling it well.

>> It is equivalent to "make the users do it", which
>> is the case for almost anything.
>
> I strongly feel we should be in the business of providing building
> blocks, not special cases on top of that. (But see below, I *do*
> think the 580 version vectors is the kind of building block we want!)
>

I agree, 580 is really valuable and should be in. The problem for
high write rate, distributed counters is the requirement of read before
write inherent in such vector-based approaches. Am I missing some
aspect of 580 that avoids that?

>> The reasons #1072 is so valuable:
>>
>> 1) Does not require _any_ user action.
>
> This can be addressed at the library level. Just as our first stab at
> ZK integration was a rather clunky patch; "cages" is better.
>

Certainly, but it would be hard to argue (and I am not arguing) that
the tightly synchronized behavior of ZK is a good match for Cassandra
(mixing in Paxos could make for some neat options, but that's another
debate...).

>> 2) Does not change the EC-centric model of Cassandra.
>
> It does, though. 1072 is *not* a version vector-based approach --
> that would be 580. Read the 1072 design doc, if you haven't. (Thanks
> to Kelvin for writing that up!)
>

Nor is Cassandra right now. I know 1072 isn't vector based, and I
think that is in its favor _for this application_.

> I'm referring in particular to reads requiring CL.ALL. (My
> understanding is that in the previous design, a "master" replica was
> chosen and was always written to first.) Both of these break "the
> EC-centric model" and that is precisely the objection I made when I
> said "ConsistencyLevel is not respected." I don't think this is
> fixable in the 1072 approach. I would be thrilled to be wrong.
>

It is EC in that the total for a counter is unknown until resolved on
read.
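For anyone following along, here is a toy single-process sketch in Python of that "resolved on read" shape. Increments are blind appends of (replica, delta) records, a periodic step compacts them, and the total only exists once a read sums what is there. All names (DeltaCounter, add, compact, read) are hypothetical. This is not the 1072 patch or the 580 design, just an illustration of why such writes need no read before write.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class DeltaCounter:
    """Hypothetical toy counter; not Cassandra code, names are made up."""

    def __init__(self) -> None:
        # Append-only list of (replica_id, delta) records.
        self.log: List[Tuple[str, int]] = []

    def add(self, replica_id: str, delta: int) -> None:
        # Blind write: existing records are never read first.
        self.log.append((replica_id, delta))

    def compact(self) -> None:
        # Periodic "compress" step: fold each replica's deltas into a
        # single record. The total is unchanged.
        per_replica: Dict[str, int] = defaultdict(int)
        for replica_id, delta in self.log:
            per_replica[replica_id] += delta
        self.log = list(per_replica.items())

    def read(self) -> int:
        # The counter's value only exists once it is resolved here.
        return sum(delta for _, delta in self.log)


counter = DeltaCounter()
counter.add("replica-a", 1)
counter.add("replica-b", 2)
counter.add("replica-a", 3)
print(counter.read())  # 6
counter.compact()
print(counter.read(), counter.log)  # 6 [('replica-a', 4), ('replica-b', 2)]
```

add() never inspects existing state, so writes stay cheap. read() pays for every uncompacted delta, which is why this shape fits the high write, low read volume case. compact() stands in for the periodic compress step that clients would otherwise have to manage themselves.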
Yes, it does not respect CL, but since it can only be used in one
way, I don't see that as a disadvantage.

>>> The second is that the approach in 1072 resembles an entirely separate
>>> system that happens to use part of Cassandra infrastructure -- the
>>> thrift API, the MessagingService, the sstable format -- but isn't
>>> really part of it. ConsistencyLevel is not respected, and special
>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>> business.
>>
>> Then let's find ways to make it as elegant as it can be. Ultimately,
>> this functionality needs to be in Cassandra or users will simply
>> migrate someplace else for this extremely common use case.
>
> This is what I've been pushing for. The version vector approach to
> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
> EC-centric approach that addresses a case that we *don't* currently
> handle well (counters with a higher read volume than 1072).
>

Perhaps I missed something: does counting with 580 require a read
before each counter update (local to the node, not a client read)?


b