On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin <ja...@ractive.ch> wrote:

JB> On 14.04.2010 15:22, Ted Zlatanov wrote:
>> On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin"<ja...@ractive.ch>
>> wrote:
>>
JB> The metrics are stored together with a timestamp. The queries we want to
JB> perform are:
JB> * The last value of a specific metric of a device
JB> * The values of a specific metric of a device between two timestamps t1 and
JB> t2
>>
>> Make your key "devicename-metricname-YYYYMMDD-HHMM" (with whatever time
>> sharding makes sense to you; I use UTC by-hours and by-day in my
>> environment). Then your supercolumn is the collection time as a
>> LongType and your columns inside the supercolumn can express the metric
>> in detail (collector agent, detailed breakdown, etc.).
>>

JB> Just for my understanding. What is "time sharding"? I couldn't find an
JB> explanation anywhere. Do you mean that the time-series data is rolled
JB> up in 5 minutes, 1 hour, 1 day etc. slices?

Yes. The usual meaning of "shard" in the RDBMS world is to segment your
database by some criterion, e.g. US vs. Europe in Amazon AWS because
their data centers are laid out so. I was taking a linguistic shortcut
to mean "break down your rows by some convenient criterion." You can
actually set up your Partitioner in Cassandra to literally shard your
keyspace rows based on the key, but I just meant "slice" in my note.

JB> So this would be defined as:
JB> <ColumnFamily Name="measurements" ColumnType="Super"
JB> CompareWith="UTF8Type" CompareSubcolumnsWith="LongType" />

JB> So when I want to read all values of one metric between two timestamps
JB> t0 and t1, I'd have to read the supercolumns that match a key range
JB> (device1:metric1:t0 - device1:metric1:t1) and then all the
JB> supercolumns for this key?

Yes. This is a single multiget if you can construct the key range
explicitly.
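
To make "construct the key range explicitly" concrete, here is a minimal
sketch in Python, assuming by-hour slicing in UTC and the
"devicename-metricname-YYYYMMDD-HHMM" key layout from the earlier note.
The function name hourly_row_keys is made up for illustration and is not
part of any Cassandra client; the resulting list is what you would hand
to a multiget call.

```python
from datetime import datetime, timedelta, timezone


def hourly_row_keys(device, metric, t0, t1):
    """Return the row key of every hourly slice touched by [t0, t1].

    t0 and t1 must be timezone-aware datetimes; they are converted to UTC.
    """
    if t1 < t0:
        raise ValueError("t1 must not be earlier than t0")
    # Truncate the start to the top of its hour so the first slice is included.
    bucket = t0.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = t1.astimezone(timezone.utc)
    keys = []
    while bucket <= end:
        # Assumed key layout: devicename-metricname-YYYYMMDD-HHMM
        keys.append(f"{device}-{metric}-{bucket:%Y%m%d-%H%M}")
        bucket += timedelta(hours=1)
    return keys
```

For a range from 22:30 UTC on 14 April to 01:05 UTC on 15 April this
yields four keys, one per hour touched, including the slice that crosses
the day boundary.
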
Cassandra already loads a lot of this data into memory and filters it
after the fact, which is why it pays to slice your keys and stitch the
results together on the client side if you have to go across a time
boundary. You'll also get better key load balancing with deeper slicing
if you use the randomizing partitioner (RandomPartitioner).

In the result set, you'll get each matching supercolumn with all the
columns inside it. You may have to page through supercolumns.

Ted
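
A rough sketch of the client-side stitching, in Python. It assumes the
multiget result has already been turned into a plain dict that maps each
row key to {collection_time: {column_name: value}}; that shape and the
name stitch_rows are assumptions for illustration, since the actual
result structure depends on the client library you use.

```python
def stitch_rows(rows, t0, t1):
    """Merge supercolumns from several row slices into one ordered list.

    rows maps row key -> {collection_time (int): {column name: value}}.
    Only entries with t0 <= collection_time <= t1 are kept.
    """
    merged = []
    for supercolumns in rows.values():
        for collected_at, columns in supercolumns.items():
            # Slices cover whole hours or days, so trim to the exact range here.
            if t0 <= collected_at <= t1:
                merged.append((collected_at, columns))
    # Row order is not guaranteed under a randomizing partitioner, so sort by time.
    merged.sort(key=lambda item: item[0])
    return merged
```

The last element of the returned list is the most recent value in the
range, which covers the "last value of a metric" query as long as the
newest slice is among the keys you asked for.
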