TimeUUID's time component is measured in 100-nanosecond intervals. The
library you use might calculate it with poorer accuracy or precision,
but from a storage/comparison standpoint in Cassandra, millisecond data
is easily captured by it.
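As a rough illustration of that resolution, here is a sketch in Python
using only the standard uuid module. The helper names
timeuuid_from_unix_ms and unix_ms_from_timeuuid are made up for this
example, and the fixed clock_seq/node defaults are placeholders, not
what a real client library would generate:

```python
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_unix_ms(ms, clock_seq=0, node=0):
    """Build a version-1 UUID whose time field encodes `ms` (Unix milliseconds)."""
    ticks = ms * 10_000 + UUID_EPOCH_OFFSET  # 1 ms = 10,000 ticks of 100 ns
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def unix_ms_from_timeuuid(u):
    """Recover Unix milliseconds from a version-1 UUID's 60-bit time field."""
    return (u.time - UUID_EPOCH_OFFSET) // 10_000
```

Each millisecond spans 10,000 ticks of the time field, so two samples
taken 8 ms apart (125 Hz) land on clearly distinct values.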
One typical way of dealing with the data explosion of sampled time
series data is to bucket/shard rows (e.g. Bob-20100423-bloodpressure)
so that you put an upper bound on the row length.
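A minimal sketch of building such a bucketed row key in Python,
assuming one row per patient, per parameter, per UTC day. The function
name bucketed_row_key and the daily granularity are illustrative
choices, not anything Cassandra prescribes:

```python
from datetime import datetime, timezone


def bucketed_row_key(patient, parameter, ts):
    """Return a row key like 'Bob-20100423-bloodpressure' for the sample time `ts`."""
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{patient}-{day}-{parameter}"


# Upper bound on columns in one daily row at the highest sampling rate (125 Hz).
MAX_COLUMNS_PER_DAILY_ROW = 125 * 60 * 60 * 24  # 10,800,000

sample_time = datetime(2010, 4, 23, 10, 23, 45, 16000, tzinfo=timezone.utc)
print(bucketed_row_key("Bob", "bloodpressure", sample_time))
```

With daily buckets, a 125 Hz signal tops out at 10.8 million columns
per row instead of growing without bound; a smaller bucket (hourly,
say) shrinks that further.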
On Apr 23, 2010, at 7:01 PM, Andrew Nguyen <andrew-lists-cassan...@ucsfcti.org> wrote:
Hello,
I am looking to store patient physiologic data in Cassandra - it's
being collected at rates of 1 to 125 Hz. I'm thinking of storing
the timestamps as the column names and the patient/parameter combo
as the row key. For example, Bob is in the ICU and is currently
having his blood pressure, intracranial pressure, and heart rate
monitored. I'd like to collect this with the following row keys:
Bob-bloodpressure
Bob-intracranialpressure
Bob-heartrate
The column names would be timestamps but that's where my questions
start:
I'm not sure what the best data type and CompareWith would be. From
my searching, it sounds like the TimeUUID may be suitable but isn't
really designed for millisecond accuracy. My other thought is just
to store them as strings (2010-04-23 10:23:45.016). While space
isn't the foremost concern, we will be collecting this data 24/7 so
we'll be creating many columns over the long-term.
I found https://issues.apache.org/jira/browse/CASSANDRA-16 which
states that the entire row must fit in memory. Does this include
the values as well as the column names?
In considering the limits of Cassandra and the best way to model
this, we would be adding 3.9 billion columns per row per year (assuming 125 Hz
@ 24/7). However, I can't really think of a better way to model
this... So, am I thinking about this all wrong or am I on the right
track?
Thanks,
Andrew