Re: 1000's of column families

Hiller, Dean Thu, 27 Sep 2012 08:37:36 -0700

PlayOrm DOES support inheritance mapping, but only single-table inheritance 
right now.  In fact, DboColumnMeta.java has 4 subclasses that all map to the 
same ColumnFamily, so we already support and heavily use the inheritance feature.


That said, I am more concerned with scalability.  The more you stuff into a 
table, the more partitions you need.  As an example, I really have a choice:

Have this in a partition
device1 datapoint1
device2 datapoint1
device1 datapoint2
device2 datapoint2
device1 datapoint3

OR have just this in a partition
device1 datapoint1
device1 datapoint2
device1 datapoint3

If I use the latter approach, I can have more points for device1 in one 
partition.  I could use inheritance to put several devices in one table, but 
then I can't fit as many data points for device1 in a partition.
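
To make the trade-off concrete, here is a minimal Python sketch. It assumes a 
fixed row budget per partition and that devices sharing a partition write at 
the same rate. The function name and the budget figure are hypothetical and 
are not part of PlayOrm or Cassandra.

```python
def points_per_device(partition_capacity: int, devices_in_partition: int) -> int:
    """Rows each device gets when devices share a partition equally."""
    if devices_in_partition < 1:
        raise ValueError("a partition must hold at least one device")
    return partition_capacity // devices_in_partition


# Hypothetical row budget for one partition (an assumption for illustration).
CAPACITY = 1_000_000

mixed = points_per_device(CAPACITY, 2)      # device1 and device2 interleaved
dedicated = points_per_device(CAPACITY, 1)  # device1 only

print(f"mixed partition:     {mixed} points for device1")
print(f"dedicated partition: {dedicated} points for device1")
```

With two devices interleaved, device1 gets half of the budget; with a 
dedicated partition it gets all of it.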

Does that make more sense?

Later,
Dean


From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, September 27, 2012 8:45 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: 1000's of column families

Dean,

     In the relational world, I was used to Hibernate and O/R mapping. 
There were times when I used 3 classes (2 inheriting from the third) and mapped 
all of them to 1 table. The common part was in the superclass and each subclass 
had its own columns. The table, however, had to hold all the columns, and 
this design was hard because of that, as creating more subclasses required 
changes to the table.
     However, if you use PlayOrm and if PlayOrm has/had a feature to allow 
inheritance mapping to a CF, it would solve your problem, wouldn't it? Of 
course, it is probably much harder than it appears... :D

Best regards,
Marcelo Valle.

2012/9/27 Hiller, Dean <dean.hil...@nrel.gov>
We have 1000's of different building devices and we stream data from these 
devices.  The format and data from each one varies, so one device has 
temperature at timeX with some other variables, while another device has CO2 
percentage and other variables.  Every device is unique and streams its own 
data.  We dynamically discover devices and register them.  Basically, one CF or 
table per thing really makes sense in this environment.  While we could try to 
find out which devices "are" similar, this would really be a pain, and some 
devices add a new variable into the equation.  NOT only that, but researchers 
can register new datasets and upload them as well, and they do NOT necessarily 
want to share each dataset with other researchers, so we have security 
groups and each CF belongs to security groups.  We dynamically create CF's on 
the fly as people register new datasets.

On top of that, when the data sets get too large, we probably want to partition 
a single CF into time partitions.  We could create one CF, put all the data 
in it, and have a partition per device, but then a time partition would contain 
data from "multiple" devices, meaning we would need to shrink our time 
partition size, whereas if we have a CF per device, the time partition can be 
larger as it is only for that one device.
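
A rough Python sketch of that effect, assuming timestamps are bucketed into 
fixed-length time partitions. The helper names, the 10-second span, and the 
sample points are hypothetical; this only counts rows per partition and does 
not talk to Cassandra.

```python
from collections import defaultdict


def bucket_start(timestamp: int, span: int) -> int:
    """Start of the time partition that contains the timestamp."""
    return timestamp - timestamp % span


def partition_sizes(points, span, per_device):
    """Count rows per partition for (device, timestamp) points.

    per_device=False models one shared CF: the key is the time bucket only.
    per_device=True models one CF per device: the key is (device, bucket).
    """
    sizes = defaultdict(int)
    for device, timestamp in points:
        bucket = bucket_start(timestamp, span)
        key = (device, bucket) if per_device else bucket
        sizes[key] += 1
    return dict(sizes)


# Two devices, one point per second for ten seconds (made-up data).
points = [(d, t) for d in ("device1", "device2") for t in range(10)]

shared = partition_sizes(points, span=10, per_device=False)
dedicated = partition_sizes(points, span=10, per_device=True)
print(max(shared.values()), max(dedicated.values()))
```

For the same time span, the shared layout puts every device's rows into one 
partition, so the span has to shrink to stay under a given row budget.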

THEN, on top of that, we have a meta CF for these devices, so some people want 
to query for streams that match criteria, which returns a CF name, and then 
they query that CF name.  So we almost need a query with variables, like 
select cfName from Meta where x = y and then select * from cfName where xxxxx, 
which we can do today.
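
The two-step lookup can be sketched in Python with in-memory stand-ins. The 
dictionaries, CF names, column names, and both functions below are 
hypothetical illustrations and not the PlayOrm API.

```python
# Hypothetical in-memory stand-ins for the meta CF and the per-device CFs.
META = [
    {"cfName": "device1Stream", "building": "A", "kind": "temperature"},
    {"cfName": "device2Stream", "building": "A", "kind": "co2"},
    {"cfName": "device3Stream", "building": "B", "kind": "temperature"},
]
COLUMN_FAMILIES = {
    "device1Stream": [{"time": 1, "value": 20.5}, {"time": 2, "value": 21.0}],
    "device2Stream": [{"time": 1, "value": 0.04}],
    "device3Stream": [{"time": 1, "value": 18.0}, {"time": 3, "value": 19.5}],
}


def find_cf_names(meta_rows, **criteria):
    """Step 1: select cfName from Meta where every criterion matches."""
    return [
        row["cfName"]
        for row in meta_rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


def query_streams(meta_rows, column_families, criteria, row_filter):
    """Step 2: select * from each matching CF where row_filter(row) is true."""
    return {
        name: [row for row in column_families.get(name, []) if row_filter(row)]
        for name in find_cf_names(meta_rows, **criteria)
    }


result = query_streams(
    META, COLUMN_FAMILIES, {"kind": "temperature"}, lambda row: row["time"] >= 2
)
print(result)
```

The first query yields CF names, and those names drive the second query, 
which is the "query with variables" shape described above.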

Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, September 27, 2012 8:01 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: 1000's of column families

Out of curiosity, is it really necessary to have that many CFs?
I am probably still used to relational databases, where you would use a new 
table only when you need to store a different kind of data. As Cassandra 
stores anything in each CF, it might make sense to have a lot of CFs 
to store your data...
But why wouldn't you use a single CF with partitions in this case? Wouldn't it 
be the same thing? I am asking because I might learn a new modeling technique 
from the answer.

[]s

2012/9/26 Hiller, Dean <dean.hil...@nrel.gov>
We are streaming data with 1 stream per CF and we have 1000's of CFs.  The 
tools are all geared to analyzing ONE column family at a time 
:(.  If I remember correctly, Cassandra supports as many CF's as you want, 
correct?  Even though I am going to have tons of fun with limitations in the 
tools, correct?

(I may end up wrapping nodetool with my own aggregate calls if needed to 
sum up multiple column families and such).
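
A minimal Python sketch of such an aggregate step. It assumes the per-CF 
numbers have already been collected into dictionaries by some other means; it 
does not call nodetool or parse its output, and the metric names are 
hypothetical.

```python
from collections import Counter


def aggregate_stats(per_cf_stats):
    """Sum numeric metrics across column families.

    per_cf_stats maps a CF name to a dict of metric name -> number.
    """
    totals = Counter()
    for stats in per_cf_stats.values():
        totals.update(stats)  # adds each metric to the running total
    return dict(totals)


# Made-up numbers for two column families.
sample = {
    "device1Stream": {"read_count": 10, "write_count": 200},
    "device2Stream": {"read_count": 5, "write_count": 150},
}
print(aggregate_stats(sample))
```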

Thanks,
Dean



--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr



