Re: Giant sets of ordered data

2010-06-04 Thread Benjamin Black
Use index rows named for time intervals that contain columns named for the row keys of the base data rows from each interval. b On Wed, Jun 2, 2010 at 8:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I want > to access by range? > > I can't
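A minimal sketch of this interval-index layout, with plain Python dicts standing in for the two column families; the names Events and EventsByHour, the hourly bucket, and the helper functions are illustrative, not from the original message:

from datetime import datetime, timezone

# Two column families, modeled here as plain dicts:
#   events        - base data rows, keyed by an event id
#   events_by_hour - index rows named for time intervals (one row per hour),
#                    whose column names are the row keys of the base data rows
events = {}
events_by_hour = {}

def insert_event(event_id, payload, ts):
    # Write the base data row.
    events[event_id] = payload
    # Name the index row for the interval that contains the event,
    # e.g. "2010060105" for 2010-06-01, hour 05.
    interval = ts.strftime("%Y%m%d%H")
    # Column name is the base row key; the value can stay empty.
    events_by_hour.setdefault(interval, {})[event_id] = ""

def events_in_interval(interval):
    # Read one index row, then fetch the base rows it points at (a multiget).
    return {key: events[key] for key in events_by_hour.get(interval, {})}

insert_event("evt-001", {"msg": "..."}, datetime(2010, 6, 1, 5, 30, tzinfo=timezone.utc))
print(events_in_interval("2010060105"))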

Re: Giant sets of ordered data

2010-06-03 Thread yoshiyuki kanno
Hi, I think in this case (logging heavy traffic) neither of the two ideas can scale write operations in current Cassandra, so wait for secondary index support. 2010/6/3 Jonathan Shook > Insert "if you want to use long values for keys and column names" > above paragraph 2. I forgot that part. > > On Wed,

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
Insert "if you want to use long values for keys and column names" above paragraph 2. I forgot that part. On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote: > If you want to do range queries on the keys, you can use OPP to do this: > (example using UTF-8 lexicographic keys, with bursts split ac

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
If you want to do range queries on the keys, you can use OPP to do this (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits):
Events: {
  "20100601.05.30.003": {
    "20100601.05.30.003": ...
    "20100601.05.30.007": ...
  }
}
With a future version
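A minimal sketch of range-scanning such keys, assuming the "YYYYMMDD.HH.MM.SSS" format from the example; key_range is a local stand-in for a cluster-side scan, and the function names are illustrative:

import bisect
from datetime import datetime

# With an order-preserving partitioner (OPP), row keys sort lexicographically,
# so zero-padded "YYYYMMDD.HH.MM.SSS" keys can be scanned by range directly.
def burst_key(ts):
    # millisecond precision, zero-padded so string order matches time order
    return ts.strftime("%Y%m%d.%H.%M.") + f"{ts.microsecond // 1000:03d}"

# Local stand-in for a key-range scan against an OPP cluster
# (get_range_slices in the 0.6-era Thrift API):
def key_range(sorted_keys, start, end):
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_right(sorted_keys, end)
    return sorted_keys[lo:hi]

keys = sorted([
    burst_key(datetime(2010, 6, 1, 5, 30, 0, 3000)),    # 20100601.05.30.003
    burst_key(datetime(2010, 6, 1, 5, 30, 0, 7000)),    # 20100601.05.30.007
    burst_key(datetime(2010, 6, 2, 11, 0, 0, 120000)),  # 20100602.11.00.120
])
print(key_range(keys, "20100601", "20100602"))  # all keys on 2010-06-01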

Re: Giant sets of ordered data

2010-06-02 Thread Ben Browning
With a traffic pattern like that, you may be better off storing the events of each burst (I'll call them groups) in one or more keys and then storing these keys in the day key.
EventGroupsPerDay: {
  "20100601": {
    123456789: "group123", // column name is timestamp group was received, column val
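A minimal sketch of this two-level layout, with plain dicts standing in for the column families; the EventGroups name, the sample values, and the helper function are illustrative additions:

# EventGroupsPerDay: one row per day, columns named by the timestamp a group
# (burst) was received, valued with the row key of that group's row.
# EventGroups: one row per group, holding the events of that burst.
event_groups_per_day = {
    "20100601": {
        123456789: "group123",   # column name: timestamp; value: group row key
        123456999: "group124",
    },
}
event_groups = {
    "group123": {123456789001: "event payload", 123456789002: "..."},
    "group124": {123456999001: "..."},
}

def events_for_day(day):
    # One slice on the day row, then one multiget on the group rows.
    out = {}
    for _, group_key in sorted(event_groups_per_day.get(day, {}).items()):
        out.update(event_groups.get(group_key, {}))
    return out

print(len(events_for_day("20100601")))  # 3 events across both bursts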

Re: Giant sets of ordered data

2010-06-02 Thread David Boxenhorn
Let's say you're logging events, and you have billions of events. What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? How do you find the events that happened on a particular day if you can't store t

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the su
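A minimal sketch of the serialize-it-yourself option, with one dict standing in for a row whose column names are timestamps; JSON is used here only as an example encoding, and the helper names are illustrative:

import json

row = {}  # one row; column names are timestamps, values are serialized blobs

def insert(event, ts_micros):
    # Serialize the structured event yourself instead of using a supercolumn.
    row[ts_micros] = json.dumps(event)

def read_all():
    # Columns come back sorted by name, i.e. in time order.
    return [(ts, json.loads(blob)) for ts, blob in sorted(row.items())]

insert({"user": "alice", "action": "login"}, 1275370200000000)
insert({"user": "bob", "action": "logout"}, 1275370200000500)
print(read_all())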

Re: Giant sets of ordered data

2010-06-02 Thread Ben Browning
I like to model this kind of data as columns, where the timestamps are the column names (longs, TimeUUIDs, or strings, depending on your usage). If you have too much data for a single row, you'd need to have multiple rows of these. For time-series data, it makes sense to use one row per minute/
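A minimal sketch of time-bucketed rows with long timestamp column names, assuming one row per minute; the bucket format and helper names are illustrative:

from datetime import datetime, timezone

rows = {}  # row key: "YYYYMMDDHHMM" minute bucket; columns named by timestamp

def insert(ts, value):
    bucket = ts.strftime("%Y%m%d%H%M")           # one row per minute
    column = int(ts.timestamp() * 1_000_000)     # long column name, microseconds
    rows.setdefault(bucket, {})[column] = value

def read_minute(bucket):
    # A column slice on one bucket row returns that minute's events in time order.
    return sorted(rows.get(bucket, {}).items())

t = datetime(2010, 6, 1, 5, 30, 0, 3000, tzinfo=timezone.utc)
insert(t, "event payload")
print(read_minute("201006010530"))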