Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
> mailto:user@cassandra.apache.org>> > Date: Tuesday, October 2, 2012 1:01 PM > To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" > mailto:user@cassandra.apache.org>> > Subject: Re: 1000's of column families > > Dean, > > On Tuesday, O

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
2, 2012 1:01 PM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Re: 1000's of column families Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: Because the data for an index is not all together(ie. Need a m

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: > Because the data for an index is not all together(ie. Need a multi get to get > the data). It is not contiguous. > > The prefix in a partition they keep the data so all data for a prefix from > what I understand is contiguous.

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Date: Tuesday, October 2, 2012 11:18 AM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Re: 1000's of column fam

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Jeremy, On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote: > Another option that may or may not work for you is the support in Cassandra > 1.1+ to use a secondary index as an input to your mapreduce job. What you > might do is add a field to the column family that represents which virt

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family that it is part of. Then when doing mapreduce jobs, you

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, Brian, By the way, PlayOrm offers a NoSqlTypedSession that is different than the ORM half of PlayOrm dealing in raw stuff that does indexing(so you can do Scalable SQL on data that has no ORM on top of it). That is what we use for our 1000's of CF's as we don't know the format of any of t

Re: 1000's of column families

2012-10-02 Thread Ben Hood
On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill wrote: > Exactly. So you're back to the deliberation between using multiple CFs (potentially with some known working upper bound*) or feeding your map reduce in some other way (as you decided to do with Storm). In my particular scenario I'd like to be

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Exactly. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 • healthmarketscience.com This information transmitted in this em

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Brian, On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill wrote: > > Without putting too much thought into it... > > Given the underlying architecture, I think you could/would have to write > your own partitioner, which would partition based on the prefix/virtual > keyspace. I might be barking up the

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Agreed. Do we know yet what the overhead is for each column family? What is the limit? If you have a SINGLE keyspace w/ 2+ CF's, what happens? Anyone know? -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Dr

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Thanks for the idea but…(but please keep thinking on it)... 100% what we don't want since partitioned data resides on the same node. I want to map/reduce the column families and leverage the parallel disks :( :( I am sure others would want to do the same…..We almost need a feature of virtual Col

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Without putting too much thought into it... Given the underlying architecture, I think you could/would have to write your own partitioner, which would partition based on the prefix/virtual keyspace. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Scie
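
A minimal sketch of the partition-by-prefix idea, not Cassandra's actual partitioner interface. The `:` separator, the helper names, and the modulo placement are all assumptions made for illustration (a real ring assigns token ranges to nodes). It shows that hashing only the prefix keeps every row of one virtual column family on a single node, while hashing the full key spreads the rows out.

```python
import hashlib


def token(value):
    """Hash a string to a large integer, loosely in the spirit of an MD5-based token."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def node_for_full_key(key, node_count):
    """Place a row using its whole key (simplified: modulo instead of token ranges)."""
    return token(key) % node_count


def node_for_prefix(key, node_count):
    """Place a row using only the part of the key before the first ':' (hypothetical scheme)."""
    prefix = key.split(":", 1)[0]
    return token(prefix) % node_count


keys = ["deviceA:row%d" % i for i in range(100)]
nodes_by_full_key = {node_for_full_key(k, 4) for k in keys}
nodes_by_prefix = {node_for_prefix(k, 4) for k in keys}

# Rows sharing a prefix always map to exactly one node under prefix placement.
print(len(nodes_by_prefix), len(nodes_by_full_key))
```

Whether co-locating a virtual column family is desirable depends on the workload: it makes a prefix scan local, but it also removes the cross-node parallelism for that prefix.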

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean, On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean wrote: > Ben, > to address your question, read my last post but to summarize, yes, there > is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT > when doing map/reduce. Doing map/reduce, you will now have HUGE overhead > i

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, to address your question, read my last post but to summarize, yes, there is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT when doing map/reduce. Doing map/reduce, you will now have HUGE overhead in reading a whole slew of rows you don't care about as you can't map/
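
To make the map/reduce overhead concrete, here is a small sketch under assumed conditions: 1000 virtual column families of 10 rows each stored in one physical column family, with a hypothetical `prefix:rowkey` layout. A job that cannot restrict its input has to read every row and discard the ones it does not want.

```python
def scan_for_prefix(rows, prefix):
    """Read every row and keep only those whose key starts with 'prefix:'.

    Returns the matching rows and the number of rows that had to be read.
    """
    rows_read = 0
    matches = {}
    for key, value in rows.items():
        rows_read += 1
        if key.startswith(prefix + ":"):
            matches[key] = value
    return matches, rows_read


# 1000 hypothetical streams, 10 rows each, all in one physical column family.
rows = {"stream%d:row%d" % (i, j): j for i in range(1000) for j in range(10)}

matches, rows_read = scan_for_prefix(rows, "stream7")
print(len(matches), rows_read)  # prints "10 10000"
```

In this toy case the job reads 10000 rows to use 10 of them. With one column family per stream, the same job would read only the 10 rows it needs.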

Re: 1000's of column families

2012-10-01 Thread Ben Hood
On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill wrote: > It's just a convenient way of prefixing: > http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html So given that it is possible to use a CF per tenant, should we assume that there at sufficient scale that there is less

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
It's just a convenient way of prefixing: http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html -brian On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood <0x6e6...@gmail.com> wrote: > Brian, > > On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill wrote: >> We haven't committed either wa
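
The prefixing idea can be sketched in a few lines. This is not Hector's implementation; the `:` separator and the helper names `make_key` and `split_key` are assumptions chosen for illustration. Each logical (virtual) column family name is prepended to the row key, so many logical column families can share one physical one.

```python
SEPARATOR = ":"  # hypothetical separator; must not appear in virtual CF names


def make_key(virtual_cf, row_key):
    """Build the physical row key by prefixing it with the virtual CF name."""
    if SEPARATOR in virtual_cf:
        raise ValueError("virtual CF name must not contain the separator")
    return virtual_cf + SEPARATOR + row_key


def split_key(physical_key):
    """Recover (virtual_cf, row_key) from a physical row key."""
    virtual_cf, _, row_key = physical_key.partition(SEPARATOR)
    return virtual_cf, row_key


# One physical store (a dict stands in for a column family) holding two virtual CFs.
store = {}
store[make_key("deviceA", "2012-10-01T00:00")] = {"temperature": 21.5}
store[make_key("deviceB", "2012-10-01T00:00")] = {"co2_percent": 0.04}

print(sorted(store))
```

Splitting on the first separator only means row keys may themselves contain `:` (as the timestamps above do) without breaking the round trip.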

Re: 1000's of column families

2012-10-01 Thread Ben Hood
Brian, On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill wrote: > We haven't committed either way yet, but given Ed Anuff's presentation > on virtual keyspaces, we were leaning towards a single column family > approach: > http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_

Re: 1000's of column families

2012-10-01 Thread Hiller, Dean
F for these devices so some >>>people want >> to query for streams that match criteria AND which returns a CF name >>and they >> query that CF name so we almost need a query with variables like select >>cfName >> from Meta where x = y and then select * from

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
query with variables like select cfName > from Meta where x = y and then select * from cfName where x. Which we can > do > today. >> >> Dean >> >> From: Marcelo Elias Del Valle mailto:mvall...@gmail.com>> >> Reply-To: "user@cassandra.apache.org<
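
The two-step lookup described here (find the column family names in a metadata table, then query each named column family) might look roughly like the following. This is a sketch using plain dictionaries as stand-ins for the `Meta` table and the per-stream column families; the field names and sample data are invented for illustration.

```python
# Stand-in for the Meta table: one entry per stream, keyed by column family name.
meta = {
    "stream_a": {"building": "HQ", "kind": "temperature"},
    "stream_b": {"building": "HQ", "kind": "co2"},
    "stream_c": {"building": "Lab", "kind": "temperature"},
}

# Stand-in for the column families themselves.
column_families = {
    "stream_a": [{"time": 1, "temperature": 21.5}],
    "stream_b": [{"time": 1, "co2_percent": 0.04}],
    "stream_c": [{"time": 1, "temperature": 19.0}],
}


def find_cf_names(meta, field, value):
    """Step 1: roughly 'select cfName from Meta where field = value'."""
    return sorted(name for name, attrs in meta.items() if attrs.get(field) == value)


def query_streams(meta, column_families, field, value):
    """Step 2: roughly 'select * from cfName' for each name found in step 1."""
    return {name: column_families[name] for name in find_cf_names(meta, field, value)}


result = query_streams(meta, column_families, "kind", "temperature")
print(sorted(result))  # prints "['stream_a', 'stream_c']"
```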

Re: 1000's of column families

2012-09-28 Thread Flavio Baronti
1 AM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Re: 1000's of column families Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you woul

Re: 1000's of column families

2012-09-28 Thread Aaron Turner
ssandra.apache.org>" > mailto:user@cassandra.apache.org>> > Date: Thursday, September 27, 2012 11:52 PM > To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" > mailto:user@cassandra.apache.org>> > Subject: Re: 1000's of column fa

Re: 1000's of column families

2012-09-28 Thread Robin Verlangen
ra.apache.org<mailto:user@cassandra.apache.org>" < > user@cassandra.apache.org<mailto:user@cassandra.apache.org>> > Date: Thursday, September 27, 2012 11:52 PM > To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" < > user@cassandra.apache.org<mail

Re: 1000's of column families

2012-09-28 Thread Hiller, Dean
mailto:user@cassandra.apache.org>> Subject: Re: 1000's of column families "so if you add up all the applications which would be huge and then all the tables which is large, it just keeps growing. It is a very nice concept(all data in one location), though we will see how implementi

Re: 1000's of column families

2012-09-27 Thread Robin Verlangen
"so if you add up all the applications which would be huge and then all the tables which is large, it just keeps growing. It is a very nice concept(all data in one location), though we will see how implementing it goes." This shouldn't be a real problem for Cassandra. Just add more nodes and ever

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
Unfortunately, the security aspect is very strict. Some make their data public but there are many projects where due to client contracts, they cannot make their data public within our company(ie. Other groups in our company are not allowed to see the data). Also, currently, we have researchers up

Re: 1000's of column families

2012-09-27 Thread Aaron Turner
On Thu, Sep 27, 2012 at 7:35 PM, Marcelo Elias Del Valle wrote: > > > 2012/9/27 Aaron Turner >> >> How strict are your security requirements? If it wasn't for that, >> you'd be much better off storing data on a per-statistic basis then >> per-device. Hell, you could store everything in a single

Re: 1000's of column families

2012-09-27 Thread Edward Capriolo
Hector also offers support for 'Virtual Keyspaces' which you might want to look at. On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner wrote: > On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean wrote: >> We have 1000's of different building devices and we stream data from these >> devices. The format

Re: 1000's of column families

2012-09-27 Thread Aaron Turner
On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean wrote: > We have 1000's of different building devices and we stream data from these > devices. The format and data from each one varies so one device has > temperature at timeX with some other variables, another device has CO2 > percentage and othe

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
y-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Date: Thursday, September 27, 2012 8:45 AM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject

Re: 1000's of column families

2012-09-27 Thread Marcelo Elias Del Valle
pache.org<mailto:user@cassandra.apache.org>> > Date: Thursday, September 27, 2012 8:01 AM > To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" < > user@cassandra.apache.org<mailto:user@cassandra.apache.org>> > Subject: Re: 1000's of column famili

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Date: Thursday, September 27, 2012 8:01 AM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Re: 1000's of column families Out of

Re: 1000's of column families

2012-09-27 Thread Marcelo Elias Del Valle
Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need to store different kinds of data. As Cassandra stores anything in each CF, it might probably make sense to have a lot of CFs t

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
Is there a non rhetorical question in there? Maybe is that a feature request in disguise? The question was basically, Is Cassandra ok with as many CF's as you want? It sounds like it is not based on the email that every CF causes a bit more RAM to be used though. So if cassandra is not ok with

Re: 1000's of column families

2012-09-27 Thread Robin Verlangen
Every CF adds some overhead (in memory) to each node. This is something you should really keep in mind. Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl Disclaimer: The information contained in this message and attachments

Re: 1000's of column families

2012-09-27 Thread Sylvain Lebresne
On Thu, Sep 27, 2012 at 12:13 AM, Hiller, Dean wrote: > We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When > using the tools they are all geared to analyzing ONE column family at a time > :(. If I remember correctly, Cassandra supports as many CF's as you want, > corr

1000's of column families

2012-09-26 Thread Hiller, Dean
We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When using the tools they are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CF's as you want, correct? Even though I am going to have tons of fun with limitati