Use index rows named for time intervals that contain columns named for
the row keys of the base data rows from each interval.
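A minimal sketch of this interval-index idea, using plain Python dicts to stand in for the two column families (the CF names, key formats, and "hour" bucket below are assumptions made for the illustration, not anything from the thread):

```python
from datetime import datetime, timezone

# Plain dicts stand in for the two column families; the names and key
# formats here are invented for the illustration.
events = {}       # base CF: event row key -> payload
event_index = {}  # index CF: interval row key -> {base row key: ""}

def interval_key(ts, bucket="hour"):
    # Name an index row after the time interval containing ts.
    fmt = "%Y%m%d.%H" if bucket == "hour" else "%Y%m%d"
    return ts.strftime(fmt)

def insert_event(ts, payload):
    row_key = ts.strftime("%Y%m%d.%H.%M.%S.%f")
    events[row_key] = payload
    # Index row named for the interval, column named for the base row key.
    event_index.setdefault(interval_key(ts), {})[row_key] = ""
    return row_key

def events_in_interval(ts, bucket="hour"):
    # Read the index row, then fetch the base rows it points at.
    keys = sorted(event_index.get(interval_key(ts, bucket), {}))
    return [(k, events[k]) for k in keys]
```

Reading a day or an hour is then one index-row read plus point lookups, instead of a scan over the base rows.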
On Wed, Jun 2, 2010 at 8:32 AM, David Boxenhorn wrote:
> How do I handle giant sets of ordered data, e.g. by timestamps, which I want
> to access by range?
>
> I can't ...
Hi,
I think that in this case (logging heavy traffic), neither idea can scale
write operations in the current Cassandra.
So wait for secondary index support.
2010/6/3 Jonathan Shook
> Insert "if you want to use long values for keys and column names"
> above paragraph 2. I forgot that part.
>
Insert "if you want to use long values for keys and column names"
above paragraph 2. I forgot that part.
On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote:
> If you want to do range queries on the keys, you can use OPP to do this:
> (example using UTF-8 lexicographic keys, with bursts split across rows
> according to row size limits)
If you want to do range queries on the keys, you can use OPP to do this:
(example using UTF-8 lexicographic keys, with bursts split across rows
according to row size limits)
Events: {
  "20100601.05.30.003": {
    "20100601.05.30.003": ...,
    "20100601.05.30.007": ...,
    ...
  }
}
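A sketch of how such OPP-friendly row keys might be built: UTF-8 keys that sort lexicographically in time order, with a burst split across several rows by a numeric suffix once a row fills (the `.003` suffix in the example above is of this kind). The names and the tiny row limit are assumptions for the illustration:

```python
# MAX_COLUMNS_PER_ROW is artificially small here; a real limit would
# come from row-size tuning, not a constant like this.
MAX_COLUMNS_PER_ROW = 2

def make_row_key(day, hour, minute, seq):
    # Zero-padding keeps lexicographic order equal to numeric order,
    # which is what an OPP range scan over keys relies on.
    return "%s.%02d.%02d.%03d" % (day, hour, minute, seq)

def burst_row_keys(day, hour, minute, event_count):
    # One row key per MAX_COLUMNS_PER_ROW events in the burst.
    rows = (event_count + MAX_COLUMNS_PER_ROW - 1) // MAX_COLUMNS_PER_ROW
    return [make_row_key(day, hour, minute, seq) for seq in range(rows)]
```

A range scan over ["20100601", "20100602") would then return the whole day's rows in order.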
With a future version ...
With a traffic pattern like that, you may be better off storing the
events of each burst (I'll call them groups) in one or more rows, and
then storing those row keys under the day key.
EventGroupsPerDay: {
  "20100601": {
    123456789: "group123", // column name is the timestamp the group was
                           // received, column value is the group row key
    ...
  }
}
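A sketch of this group-per-burst layout, with dicts standing in for the two column families (the names `EventGroupsPerDay` and `group123` follow the example above; everything else is invented for the illustration):

```python
event_groups = {}    # group row key -> {event timestamp: payload}
groups_per_day = {}  # day key -> {received timestamp: group row key}

def store_burst(day, received_ts, group_key, burst_events):
    # One write per burst into its group row, plus one column in the
    # day row pointing at that group.
    event_groups[group_key] = dict(burst_events)
    groups_per_day.setdefault(day, {})[received_ts] = group_key

def events_for_day(day):
    # Walk the day row in timestamp order, then expand each group.
    out = []
    for ts in sorted(groups_per_day.get(day, {})):
        group_key = groups_per_day[day][ts]
        out.extend(sorted(event_groups[group_key].items()))
    return out
```

The day row stays small (one column per burst rather than per event), which is the point of the indirection.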
Let's say you're logging events, and you have billions of events. What if
the events come in bursts, so within a day there are millions of events, but
they all come within microseconds of each other a few times a day? How do
you find the events that happened on a particular day if you can't store
them all in a single row?
Either OPP by key, or within a row by column name. I'd suggest the latter.
If you have structured data to stick under a column (named by the
timestamp), then you can serialize and unserialize it yourself, or you
can use a supercolumn. It's effectively the same thing. Cassandra
only provides the supercolumn ...
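A sketch of the "serialize it yourself" option: pack a structured event into a single column value, with a long timestamp as the column name. The JSON encoding and the field names are choices made for this illustration, not anything prescribed by the thread:

```python
import json
import struct

def serialize(event):
    # Any stable encoding works; JSON is just easy to inspect.
    return json.dumps(event, sort_keys=True).encode("utf-8")

def unserialize(blob):
    return json.loads(blob.decode("utf-8"))

def long_column_name(ts_micros):
    # LongType column names are 8-byte big-endian integers; packing this
    # way keeps (non-negative) timestamps in byte-comparable order.
    return struct.pack(">q", ts_micros)
```

The trade-off versus a supercolumn is that the value is opaque to Cassandra: you always read and write the whole event.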
I like to model this kind of data as columns, where the timestamps are
the column names (longs, TimeUUIDs, or strings, depending on your
usage). If you have too much data for a single row, you'd need to have
multiple rows of these. For time-series data, it makes sense to use
one row per minute/hour/day, depending on volume.
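A sketch of that one-row-per-time-bucket layout, with a dict standing in for the column family and millisecond longs as column names (the "YYYYMMDD.HHMM" row-key format and minute-sized bucket are assumptions for the illustration):

```python
from datetime import datetime, timezone

series = {}  # row key "YYYYMMDD.HHMM" -> {timestamp_ms: value}

def row_key_for(ts_ms):
    # Bucket a timestamp into the row holding its minute.
    dt = datetime.fromtimestamp(ts_ms / 1000.0, tz=timezone.utc)
    return dt.strftime("%Y%m%d.%H%M")

def record(ts_ms, value):
    series.setdefault(row_key_for(ts_ms), {})[ts_ms] = value

def read_minute(ts_ms):
    # All columns for the minute containing ts_ms, in timestamp order.
    return sorted(series.get(row_key_for(ts_ms), {}).items())
```

Picking the bucket size is the main tuning knob: small enough that a row never exceeds the row-size limit during a burst, large enough that a day's read doesn't touch thousands of rows.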