Re: [DISCUSSION] High-volume counters in Cassandra

2010-10-01 Thread Sylvain Lebresne
On Fri, Oct 1, 2010 at 5:12 PM, Zhu Han wrote: >> They have however at least one advantage: >>  - your super columns are indexed, you don't have to deserialize them >>    entirely each time. >> > > The size of counter super column is limited to how many replicas propagated > values as the lead re

Re: [DISCUSSION] High-volume counters in Cassandra

2010-10-01 Thread Zhu Han
> They have however at least one advantage: > - your super columns are indexed, you don't have to deserialize them > entirely each time. > The size of counter super column is limited to how many replicas propagated values as the lead replica. Its size is upper-bounded by the number of repli
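Zhu Han's bound on the counter super column can be made concrete with a minimal in-memory sketch. It assumes, for illustration only, that a counter is kept as one sub-count per replica that has acted as lead replica for some increment, keyed by a replica id. `ReplicaPartitionedCounter`, `increment` and `value` are hypothetical names, not code from CASSANDRA-1072 or CASSANDRA-1546.

```python
class ReplicaPartitionedCounter:
    """Hypothetical counter kept as one sub-count per lead replica."""

    def __init__(self):
        # replica id -> total of the increments that replica has led
        self.subcolumns = {}

    def increment(self, lead_replica, delta):
        # Only the lead replica's own subcolumn changes, so the number of
        # subcolumns can never exceed the number of replicas that have led.
        self.subcolumns[lead_replica] = self.subcolumns.get(lead_replica, 0) + delta

    def value(self):
        # The logical counter value is the sum of all subcolumns.
        return sum(self.subcolumns.values())


counter = ReplicaPartitionedCounter()
for i in range(1000):
    # 1000 increments spread over three hypothetical replicas
    counter.increment(f"replica-{i % 3}", 1)

print(counter.value(), len(counter.subcolumns))  # 1000 3
```

However many increments arrive, the structure grows with the number of replicas, not with the number of updates, which is why reading it whole stays cheap.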

Re: [DISCUSSION] High-volume counters in Cassandra

2010-10-01 Thread Sylvain Lebresne
On Thu, Sep 30, 2010 at 6:29 PM, Ryan King wrote: > On Tue, Sep 28, 2010 at 10:14 PM, Jonathan Ellis wrote: >> On Tue, Sep 28, 2010 at 4:00 PM, Sylvain Lebresne wrote: >>> I agree that it is worth adding support for counters as supercolumns >>> in 1546 and that's fairly trivial, so I will add t

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-30 Thread Ryan King
On Tue, Sep 28, 2010 at 10:14 PM, Jonathan Ellis wrote: > On Tue, Sep 28, 2010 at 4:00 PM, Sylvain Lebresne wrote: >> I agree that it is worth adding support for counters as supercolumns >> in 1546 and that's fairly trivial, so I will add that as soon as possible >> (but please understand that I

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Jonathan Ellis
On Tue, Sep 28, 2010 at 4:00 PM, Sylvain Lebresne wrote: > I agree that it is worth adding support for counters as supercolumns > in 1546 and that's fairly trivial, so I will add that as soon as possible > (but please understand that I'm working on this mostly in > my free time). > >

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Sylvain Lebresne
I agree that it is worth adding support for counters as supercolumns in 1546 and that's fairly trivial, so I will add that as soon as possible (but please understand that I'm working on this mostly in my free time). As for supercolumns of counters, there is what Jonathan proposes, bu

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Jonathan Ellis
On Tue, Sep 28, 2010 at 3:35 PM, Ryan King wrote: > On Tue, Sep 28, 2010 at 12:48 PM, Jonathan Ellis wrote: >> On Tue, Sep 28, 2010 at 2:25 PM, Ryan King wrote: >>> Sorry, been catching up on this. >>> >>> From Twitter's perspective, 1546 is probably insufficient because it >>> doesn't allow one

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Ryan King
On Tue, Sep 28, 2010 at 12:48 PM, Jonathan Ellis wrote: > On Tue, Sep 28, 2010 at 2:25 PM, Ryan King wrote: >> Sorry, been catching up on this. >> >> From Twitter's perspective, 1546 is probably insufficient because it >> doesn't allow one to do time-series data without supercolumns (which >> mig

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Jonathan Ellis
On Tue, Sep 28, 2010 at 2:25 PM, Ryan King wrote: > Sorry, been catching up on this. > > From Twitter's perspective, 1546 is probably insufficient because it > doesn't allow one to do time-series data without supercolumns (which > might work ok, but require a good deal of work). Additionally, one

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Ryan King
Sorry, been catching up on this. From Twitter's perspective, 1546 is probably insufficient because it doesn't allow one to do time-series data without supercolumns (which might work ok, but require a good deal of work). Additionally, one of our deployed systems already does supercolumns of counte
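To make the time-series use case concrete, here is a minimal in-memory sketch of one common layout, assumed for illustration and not taken from either patch: a row per metric, one counter column per time bucket, so a time range becomes a column slice. `BUCKET_SECONDS`, `TimeSeriesCounters` and `slice` are hypothetical names; supercolumns would add one more nesting level so a single row could hold several such series.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical granularity: one counter per hour


def bucket_start(ts):
    """Round a Unix timestamp (seconds) down to the start of its bucket."""
    return ts - ts % BUCKET_SECONDS


class TimeSeriesCounters:
    def __init__(self):
        # row key (metric name) -> column name (bucket start) -> count
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment(self, metric, ts, delta=1):
        # Each event bumps the counter column for its time bucket.
        self.rows[metric][bucket_start(ts)] += delta

    def slice(self, metric, start_ts, end_ts):
        """Return sorted (bucket, count) pairs for buckets in [start_ts, end_ts)."""
        row = self.rows.get(metric, {})
        return sorted((b, c) for b, c in row.items() if start_ts <= b < end_ts)


series = TimeSeriesCounters()
for ts in (0, 1800, 3600):
    series.increment("tweets", ts)
print(series.slice("tweets", 0, 7200))  # [(0, 2), (3600, 1)]
```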

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-28 Thread Jeremy Hanna
Is there any feedback from Twitter and Digg and perhaps SimpleGeo people about CASSANDRA-1546? Would that work so that you wouldn't have to maintain a fork? On Sep 27, 2010, at 5:25 AM, Sylvain Lebresne wrote: > In CASSANDRA-1546, I propose an alternative to #1072. At its core, > it rewrites #

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-27 Thread Sylvain Lebresne
In CASSANDRA-1546, I propose an alternative to #1072. At its core, it rewrites #1072 without the clocks structure (by splitting the clock into individual columns, not unlike what Zhu Han proposed in his preceding mail, but in a row instead of a super column, for reasons explained in the issue). Bu

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-26 Thread Zhu Han
I propose a new way to solve the counter problem in cassandra-1502[1]. Since I do not follow the JIRA updates very carefully, I am pasting it here so that more people can comment on it and we can see whether it's feasible. "Seems like we have not found a solution acceptable to everybody. I try to pr

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-25 Thread Jonathan Ellis
On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han wrote: > Can we just let the patch be committed but mark it as "alpha" or > "experimental"? I explained exactly why that is not a good approach here: http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html -- Jonathan Ellis Project Chair, Apache

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-25 Thread Zhu Han
> > > On the other hand, if the patch authors never bring it up to the > standards of the rest of the project, well, then it's a good thing we > didn't commit it under a "commit now, fix later" process. > > > Maybe this fork could be prevented if committers could give guidance? > > While it's t

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Torsten Curdt
> 2-4 are bad enough risks that it's worth taking the time to get it > right before committing it. So what is the plan for "getting it right" then? Is it a plan that satisfies all parties? Can people work together on that plan? Would be great to discuss the details on the dev list (not in JIRAs)

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Jonathan Ellis
On Fri, Sep 24, 2010 at 8:45 AM, Chris Goffinet wrote: > My two cents on the issue is that, it's an important feature that the > community wants, and the work needing to be done to let 1072, isn't worth the > amount of effort to delay at this stage. I would much rather get the code in, > suppor

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Chris Goffinet
I can chime in on that part, Joe. The counters weren't the reason for the launch issues. It was actually just the volume of messages we were trying to handle in the cluster, so it didn't matter whether it was reads for counts or reads to normal CFs; we just had more on the count side. It is very unfortu

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Joe Stump
On Sep 24, 2010, at 8:01 AM, Jeremy Hanna wrote: > Hmm... would there be any way that others in the project who are familiar > with the design could help the authors redo some of the elements to remove > the internal clock structure and get it to work properly before 0.7.0 is > finalize

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Jeremy Hanna
Hmm... would there be any way that others in the project who are familiar with the design could help the authors redo some of the elements to remove the internal clock structure and get it to work properly before 0.7.0 is finalized? Not sure if that's feasible, but I would just hate to s

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Jonathan Ellis
On Fri, Sep 24, 2010 at 5:39 AM, Torsten Curdt wrote: > Cassandra is out of incubation and I am no longer on the PMC ...but this > is a little concerning. > > I know this discussion is all about good internal design but - for the sake > of the community - isn't there a way this fork could be avoided?

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Courtney Robinson
Apologies for my last e-mail with the misleading subject; I was reading this thread and mistakenly replied to it with unrelated content.

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Courtney Robinson
I've been using Cassandra for a while now with no problems. I have a new project coming up, and we're penciling out the data structure for it. The best we've come up with has turned into a graph structure. I just want to know what people think, because I know there are graph DBs out there

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Jeremy Hanna
I'm all for some kind of compromise. It doesn't appear to be a niche use case for just one company. Twitter, Digg, SimpleGeo, and several others have said they will be using high-volume counters. They have already cleaned things up and separated it out. I don't know all the details

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Torsten Curdt
Cassandra is out of incubation and I am no longer on the PMC ...but this is a little concerning. I know this discussion is all about good internal design but - for the sake of the community - isn't there a way this fork could be avoided? I don't have the feeling this is about the work involved implem

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-24 Thread Johan Oskarsson
Here is an update to where we are at with the counters. As promised we have published a new patch that adds a separate api method, marked experimental, for the counters in CASSANDRA-1072. The discussion has now moved to CASSANDRA-1502, a ticket that suggests the internal IClock, reconciler and

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-06 Thread Jeff Hodges
Sorry, I hit reply thinking that was going to just Johan. My bad. It is true, though. On Mon, Sep 6, 2010 at 12:46 PM, Jeff Hodges wrote: > You're a fucking hero. > > On Sep 6, 2010 12:12 PM, "Johan Oskarsson" wrote: > The consensus in this thread seems to be moving towards the following todos >

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-06 Thread Jeff Hodges
You're a fucking hero. On Sep 6, 2010 12:12 PM, "Johan Oskarsson" wrote: The consensus in this thread seems to be moving towards the following todos in order to get 1072 into trunk. * create separate api methods for increments * mark functionality as experimental * further code cleanup (please c

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-06 Thread Jonathan Ellis
Yes. On Mon, Sep 6, 2010 at 12:11 PM, Johan Oskarsson wrote: > The consensus in this thread seems to be moving towards the following todos > in order to get 1072 into trunk. > > * create separate api methods for increments > * mark functionality as experimental > * further code cleanup (please c

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-06 Thread Johan Oskarsson
The consensus in this thread seems to be moving towards the following todos in order to get 1072 into trunk. * create separate api methods for increments * mark functionality as experimental * further code cleanup (please comment on jira with specific suggestions) Is this a reasonable summary? W

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-06 Thread Jonathan Ellis
On Thu, Sep 2, 2010 at 4:10 PM, Torsten Curdt wrote: > The feature could still be marked experimental. > That should loosen the contract a little. But at least it would be > something to work with. Maybe this is the best approach, after code cleanup. Although I'm reluctant to add code with known

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-05 Thread Zhu Han
I thought about it again for a while. It might be a good trade-off to just implement "CASSANDRA-1421" as a new API, limit the new code to the StorageProxy level, and never depend on internal mechanisms of Cassandra, e.g. compaction, membership management and other complicated

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-05 Thread Zhu Han
+1 for Jonathan Ellis. I might not be on the same page as you active community members, but I'm wondering: why not put this feature in a popular client library or a contrib package? In CASSANDRA-1072 + CASSANDRA-1397, the counter increment is not idempotent, so it's difficult to align with
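The idempotence problem Zhu Han raises can be shown in a few lines. The sketch below contrasts a plain delta, which double-counts when a message is re-applied, with a state-based merge that keeps the larger per-replica total (the grow-only counter merge, named here as a swapped-in technique, not what 1072 or 1397 are confirmed to do). `apply_delta` and `merge_state` are hypothetical helpers, and the max rule assumes increments are non-negative.

```python
def apply_delta(total, delta):
    # Plain increment: re-applying the same message counts it again.
    return total + delta


def merge_state(local, remote):
    """Merge per-replica totals by keeping the larger total for each replica.

    Assumes every replica's total only grows (non-negative increments),
    so receiving the same state twice changes nothing.
    """
    merged = dict(local)
    for replica, count in remote.items():
        merged[replica] = max(merged.get(replica, 0), count)
    return merged


# A client retries an increment of 5 after a timeout: it is applied twice.
retried_total = apply_delta(apply_delta(0, 5), 5)
print(retried_total)  # 10, although the client meant to add 5

# Shipping replica A's total instead of the delta is safe to redeliver.
state_from_a = {"A": 5}
once = merge_state({}, state_from_a)
twice = merge_state(once, state_from_a)
print(once == twice, sum(twice.values()))  # True 5
```

The merge only makes replica-to-replica propagation safe to repeat; a client retry that reaches the lead replica as a fresh increment would still be counted twice, and decrements would break the max rule.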

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-03 Thread Jeremy Hanna
So ditch Clocks and refactor to be more cleanly separated and it could go in? On Sep 2, 2010, at 3:55 PM, Jonathan Ellis wrote: > I still have not seen any response to my other misgivings about 1072 > that I have raised on the ticket. Specifically, the existing patch is > based around a Clock st

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-02 Thread Torsten Curdt
I cannot say anything about the implementation details of the patch or even the two different approaches. Not sure that even matters that much at this stage. What I can say, though, is that I got the feeling that there is a lot of desire and drive in the community to get at least something in. Ignoring

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-02 Thread Adam Samet
If a new API method is added for counters, the Thrift interface's Clock structure wouldn't be needed, but that's getting to be an implementation detail. Whether 1072 is an appropriate step forward is tangential to that issue. The patch has been refactored several times based on JIRA feedback. If th

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-02 Thread Jonathan Ellis
I still have not seen any response to my other misgivings about 1072 that I have raised on the ticket. Specifically, the existing patch is based around a Clock structure that, since 580 is a dead end, is no longer necessary. I'm also uneasy about adding 200k of code that meshes as poorly with the

Re: [DISCUSSION] High-volume counters in Cassandra

2010-09-02 Thread Ben Standefer
At SimpleGeo, we're close to just merging 1072 internally. I've talked with several members of the community who have already done this and are running 1072 in production or quasi-production. It seems like if this isn't merged, people are going to merge it internally anyways. I think such a wide

[DISCUSSION] High-volume counters in Cassandra

2010-09-02 Thread Johan Oskarsson
In the last few months Digg and Twitter have been using a counter patch that lets Cassandra act as a high-volume realtime counting system. Atomic counters enable new applications that were previously difficult to implement at scale, including realtime analytics and large-scale systems monitoring