Hi, I think in this case (logging heavy traffic) neither of the two ideas can scale write operations in current Cassandra, so it may be better to wait for secondary index support.
2010/6/3 Jonathan Shook <jsh...@gmail.com>:
> Insert "if you want to use long values for keys and column names" above
> paragraph 2. I forgot that part.
>
> On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook <jsh...@gmail.com> wrote:
> > If you want to do range queries on the keys, you can use OPP to do this
> > (example using UTF-8 lexicographic keys, with bursts split across rows
> > according to row size limits):
> >
> > Events: {
> >   "20100601.05.30.003": {
> >     "20100601.05.30.003": <value>,
> >     "20100601.05.30.007": <value>,
> >     ...
> >   }
> > }
> >
> > With a future version of Cassandra, you may be able to use the same
> > basic datatype for both key and column name, as keys will be binary
> > like the rest, I believe.
> >
> > I'm not aware of specific performance improvements when using OPP
> > range queries on keys vs. iterating over known keys. I suspect (hope)
> > that round-tripping to the server should be reduced, which may be
> > significant. Does anybody have decent benchmarks that tell the
> > difference?
> >
> > On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning <ben...@gmail.com> wrote:
> >> With a traffic pattern like that, you may be better off storing the
> >> events of each burst (I'll call them a group) in one or more keys and
> >> then storing these keys in the day key.
> >>
> >> EventGroupsPerDay: {
> >>   "20100601": {
> >>     123456789: "group123", // column name is the timestamp the group
> >>                            // was received, column value is its key
> >>     123456790: "group124"
> >>   }
> >> }
> >>
> >> EventGroups: {
> >>   "group123": {
> >>     123456789: "value1",
> >>     123456799: "value2"
> >>   }
> >> }
> >>
> >> If you think of Cassandra as a toolkit for building scalable indexes,
> >> it seems to make the modeling a bit easier. In this case, you're
> >> building an index by day to look up events that come in as groups. So,
> >> first you'd fetch the slice of columns for the day you're interested
> >> in to figure out which groups to look at, then you'd fetch the events
> >> in those groups.
> >>
> >> There are plenty of alternate ways to divide up the data among rows,
> >> too - you could use hour keys instead of day keys, for example.
> >>
> >> On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn <da...@lookin2.com> wrote:
> >>> Let's say you're logging events, and you have billions of events. What
> >>> if the events come in bursts, so within a day there are millions of
> >>> events, but they all come within microseconds of each other a few
> >>> times a day? How do you find the events that happened on a particular
> >>> day if you can't store them all in one row?
> >>>
> >>> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook <jsh...@gmail.com> wrote:
> >>>> Either OPP by key, or within a row by column name. I'd suggest the
> >>>> latter. If you have structured data to stick under a column (named by
> >>>> the timestamp), then you can serialize and deserialize it yourself,
> >>>> or you can use a supercolumn. It's effectively the same thing.
> >>>> Cassandra only provides supercolumn support as a convenience layer as
> >>>> it is currently implemented. That may change in the future.
> >>>>
> >>>> You didn't make clear in your question why a standard column would be
> >>>> less suitable. I presumed you had layered structure within the
> >>>> timestamp, hence my response. How would you logically partition your
> >>>> dataset according to natural application boundaries? This will answer
> >>>> most of your question.
> >>>>
> >>>> If you have a dataset which can't be partitioned into a
> >>>> reasonable-size row, then you may want to use OPP and key
> >>>> concatenation.
> >>>>
> >>>> What do you mean by giant?
> >>>>
> >>>> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn <da...@lookin2.com> wrote:
> >>>> > How do I handle giant sets of ordered data, e.g. by timestamps,
> >>>> > which I want to access by range?
> >>>> >
> >>>> > I can't put all the data into a supercolumn, because it's loaded
> >>>> > into memory at once, and it's too much data.
> >>>> >
> >>>> > Am I forced to use an order-preserving partitioner? I don't want
> >>>> > the headache. Is there any other way?
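For anyone reading along, here is a rough sketch of the two patterns discussed above - Ben's day-index-of-groups layout and Jonathan's OPP key scan - assuming a pycassa-style Python client. The keyspace, column family, and function names are made up for illustration, and the exact client API may differ between versions, so treat this as a sketch rather than a drop-in implementation.

    import time
    import uuid

    import pycassa

    # Assumed keyspace and column families; the names are illustrative only.
    pool = pycassa.ConnectionPool('EventLog', ['localhost:9160'])
    event_groups = pycassa.ColumnFamily(pool, 'EventGroups')          # one row per burst
    groups_per_day = pycassa.ColumnFamily(pool, 'EventGroupsPerDay')  # day key -> group keys


    def write_burst(day, events):
        """Store one burst in its own row and index it under the day key.

        `day` is e.g. '20100601'; `events` is a list of (timestamp, value) pairs.
        """
        group_key = 'group-' + uuid.uuid1().hex
        # Zero-pad timestamps so lexicographic column order matches time order.
        event_groups.insert(group_key, dict(('%020d' % ts, val) for ts, val in events))
        # Day index: column name is the burst's arrival time, value is the group key.
        groups_per_day.insert(day, {'%020d' % int(time.time()): group_key})
        return group_key


    def read_day(day):
        """Two-step read: slice the day index, then fetch each group row."""
        index = groups_per_day.get(day, column_count=10000)   # {arrival_ts: group_key}
        groups = event_groups.multiget(index.values())        # {group_key: {ts: value}}
        events = []
        for columns in groups.values():
            events.extend(columns.items())
        return sorted(events, key=lambda kv: kv[0])


    def scan_keys(start_key, end_key):
        """The OPP alternative: range-scan row keys directly.

        Only meaningful with an order-preserving partitioner and keys such as
        '20100601.05.30.003' that sort lexicographically by time.
        """
        return list(event_groups.get_range(start=start_key, finish=end_key))

The point of the extra index row is that each burst lands in its own bounded row, while the day row stays small (one column per burst) and can be sliced cheaply, regardless of how many events arrived in each burst.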