Re: Best way to store millisecond-accurate data

Miguel Verde Tue, 04 May 2010 07:36:56 -0700

One would use batch processes (e.g. through Hadoop) or client-side aggregation, yes. In theory it would be possible to introduce runtime sharding across rows into the Cassandra server side, but it's not part of its design.

In practice, one would want to model their data such that the 'row has too many columns' scenario is prevented.
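A rough sketch of what client-side aggregation over sharded rows could look like, assuming the day-bucketed key format quoted further down in this thread (Bob-20100423-bloodpressure). The names row_keys_for_range, aggregate_mean and fetch_row are hypothetical; fetch_row stands in for whatever client call returns one row's columns as a mapping and is not a real Cassandra API.

```python
from datetime import date, timedelta


def row_keys_for_range(patient, parameter, first_day, last_day):
    """List the day-bucketed row keys covering first_day..last_day inclusive."""
    keys = []
    day = first_day
    while day <= last_day:
        keys.append(f"{patient}-{day.strftime('%Y%m%d')}-{parameter}")
        day += timedelta(days=1)
    return keys


def aggregate_mean(fetch_row, keys):
    """Average every column value across the given rows on the client.

    fetch_row is a hypothetical callable: row key -> {column name: value}.
    Returns None when no samples were found.
    """
    total, count = 0.0, 0
    for key in keys:
        for value in fetch_row(key).values():
            total += value
            count += 1
    return total / count if count else None


# Usage with an in-memory stand-in for the data store.
fake_store = {
    "Bob-20100423-bloodpressure": {"t1": 120.0, "t2": 122.0},
    "Bob-20100424-bloodpressure": {"t3": 118.0},
}
keys = row_keys_for_range("Bob", "bloodpressure", date(2010, 4, 23), date(2010, 4, 25))
mean = aggregate_mean(lambda k: fake_store.get(k, {}), keys)
```

The reader pays one lookup per bucket, so the bucket size trades row length against the number of rows a range query has to touch.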

On May 4, 2010, at 8:06 AM, Даниел Симеонов <dsimeo...@gmail.com> wrote:

Hi Miguel,
I'd like to ask whether it is possible to have runtime sharding of rows in Cassandra, i.e. if the row has too many new columns inserted, then create another row (let's say if the original time sharding is one day per row, then we would have two rows for that day). Maybe batch processes could do that.
Best regards, Daniel.

2010/4/24 Miguel Verde <miguelitov...@gmail.com>
TimeUUID's time component is measured in 100-nanosecond intervals. The library you use might calculate it with poorer accuracy or precision, but from a storage/comparison standpoint in Cassandra, millisecond data is easily captured by it.
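To illustrate the 100-nanosecond resolution, here is a minimal sketch using Python's standard uuid module to build a version-1 (time-based) UUID from a millisecond timestamp and read it back. The helper names are hypothetical, and the fixed clock_seq and node defaults are a simplification: with them, two samples in the same millisecond would produce the same UUID, which a real TimeUUID library avoids.

```python
import uuid

# 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET_TICKS = 0x01B21DD213814000
TICKS_PER_MILLI = 10_000  # 1 ms = 10,000 intervals of 100 ns


def timeuuid_from_millis(unix_millis, clock_seq=0, node=0):
    """Build a version-1 UUID whose time field encodes unix_millis."""
    ticks = unix_millis * TICKS_PER_MILLI + UUID_EPOCH_OFFSET_TICKS
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi = (ticks >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi = (clock_seq >> 8) & 0x3F
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi, clock_seq_hi, clock_seq_low, node),
        version=1,  # sets the version and variant bits
    )


def millis_from_timeuuid(u):
    """Recover the Unix timestamp in milliseconds from a version-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET_TICKS) // TICKS_PER_MILLI


a = timeuuid_from_millis(1272018225016)
b = timeuuid_from_millis(1272018225017)  # one millisecond later
```

Adjacent milliseconds differ by 10,000 ticks in the time field, so millisecond samples fit with room to spare.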
One typical way of dealing with the data explosion of sampled time series data is to bucket/shard rows (e.g. Bob-20100423-bloodpressure) so that you put an upper bound on the row length.
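The bucketing idea could be sketched like this: derive the row key from the patient, the sample's calendar day, and the parameter, so all of one day's samples land in the same row. The function name bucketed_row_key and the choice of UTC for the day boundary are assumptions for illustration only.

```python
from datetime import datetime, timezone


def bucketed_row_key(patient, parameter, sample_time):
    """Return a day-bucketed row key such as 'Bob-20100423-bloodpressure'.

    sample_time must be a timezone-aware datetime; the bucket boundary is
    taken at midnight UTC (an assumption, any fixed zone would do).
    """
    if sample_time.tzinfo is None:
        raise ValueError("sample_time must be timezone-aware")
    day = sample_time.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{patient}-{day}-{parameter}"


sample = datetime(2010, 4, 23, 10, 23, 45, 16000, tzinfo=timezone.utc)
key = bucketed_row_key("Bob", "bloodpressure", sample)
```

A finer bucket (for example, per hour) follows the same pattern with a longer date format string.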
On Apr 23, 2010, at 7:01 PM, Andrew Nguyen <andrew-lists-cassan...@ucsfcti.org> wrote:
Hello,
I am looking to store patient physiologic data in Cassandra - it's being collected at rates of 1 to 125 Hz. I'm thinking of storing the timestamps as the column names and the patient/parameter combo as the row key. For example, Bob is in the ICU and is currently having his blood pressure, intracranial pressure, and heart rate monitored. I'd like to collect this with the following row keys:
Bob-bloodpressure
Bob-intracranialpressure
Bob-heartrate
The column names would be timestamps but that's where my questions start:
I'm not sure what the best data type and CompareWith would be. From my searching, it sounds like the TimeUUID may be suitable but isn't really designed for millisecond accuracy. My other thought is just to store them as strings (2010-04-23 10:23:45.016). While space isn't the foremost concern, we will be collecting this data 24/7 so we'll be creating many columns over the long-term.
I found https://issues.apache.org/jira/browse/CASSANDRA-16 which states that the entire row must fit in memory. Does this include the values as well as the column names?
In considering the limits of Cassandra and the best way to model this, we would be adding 3.9 billion rows per year (assuming 125 Hz @ 24/7). However, I can't really think of a better way to model this... So, am I thinking about this all wrong or am I on the right track?
Thanks,
Andrew
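The 3.9 billion figure in Andrew's message can be checked with a few lines of arithmetic. It is a count of samples per monitored parameter per year, assuming a 365-day year at a constant 125 Hz; in the model he describes, each sample would be one column under a single row key.

```python
SAMPLE_RATE_HZ = 125
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365             # assumption: non-leap year

# Samples per parameter if all data for one day shares a row key.
samples_per_day = SAMPLE_RATE_HZ * SECONDS_PER_DAY

# Samples per parameter accumulated over a full year of 24/7 monitoring.
samples_per_year = samples_per_day * DAYS_PER_YEAR

print(f"{samples_per_day:,} samples per day")
print(f"{samples_per_year:,} samples per year")
```

With one row per patient, parameter, and day, a row would hold about 10.8 million columns at most, rather than growing by roughly 3.9 billion per year.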
