Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-03 Thread Aditya Narayan
Thanks Tyler! On Thu, Feb 3, 2011 at 12:06 PM, Tyler Hobbs wrote: > On Wed, Feb 2, 2011 at 3:27 PM, Aditya Narayan wrote: >> >> Can I have some more feedback about my schema, perhaps somewhat more >> critical/harsh? > > It sounds reasonable to me. > > Since you're writing/reading all of the

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Tyler Hobbs
On Wed, Feb 2, 2011 at 3:27 PM, Aditya Narayan wrote: > Can I have some more feedback about my schema, perhaps somewhat more > critical/harsh? > It sounds reasonable to me. Since you're writing/reading all of the subcolumns at the same time, I would opt for a standard column with the tags se
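The trade-off in this thread's subject line (subcolumns versus one standard column holding serialized aggregate data) can be sketched roughly as below. This is only an illustration: JSON as the serialization format, the `pack_tags`/`unpack_tags` helpers, and the reminder id are assumptions, and the row is modelled as a plain dict rather than through any Cassandra client.

```python
import json

def pack_tags(tags):
    """Serialize a reminder's tags into a single column value.
    JSON is an assumed format; any compact encoding would do."""
    return json.dumps(sorted(tags)).encode("utf-8")

def unpack_tags(value):
    """Decode a column value produced by pack_tags back into a list."""
    return json.loads(value.decode("utf-8"))

# Hypothetical user row in a standard column family:
# column name = reminder id, column value = serialized tags.
row = {}
row["reminder-001"] = pack_tags(["work", "urgent"])

# All tags are read and written together, in one column.
tags = unpack_tags(row["reminder-001"])
```

Because the tags are always read and written as a unit, one column per reminder is enough; the cost is that a single tag cannot be updated without rewriting the whole value.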

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Can I have some more feedback about my schema, perhaps somewhat more critical/harsh? Thanks again, Aditya Narayan On Wed, Feb 2, 2011 at 10:27 PM, Aditya Narayan wrote: > @Bill > Thank you Bill! > > @Cassandra users > Can others also leave their suggestions and comments about my schema, plea

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
@Bill Thank you Bill! @Cassandra users Can others also leave their suggestions and comments about my schema, please? Also my question about whether to use a superColumn or, alternatively, just store the data (that would otherwise be stored in subcolumns) serialized into a single column in standa

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread William R Speirs
I did not understand before... sorry. Again, depending upon how many reminders you have for a single user, this could be a long/wide row. Again, it really comes down to how many reminders we are talking about and how often they will be read/written. While a single row can contain millions (may

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Perhaps you misunderstood me. I am already splitting the rows on a per-user basis, of course; otherwise the schema won't make sense for my usage. The row contains only *reminders of a single user* sorted in chronological order. The reminder IDs are stored as supercolumn names and the subcolumns contain tags for t

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread William R Speirs
Any time I see/hear "a single row containing all ..." I get nervous. That single row is going to reside on a single node. That is potentially a lot of load (don't know the system) for that single node. Why wouldn't you split it by at least user? If it won't be a lot of load, then why are you usi

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
I think you got exactly what I wanted to convey, except for a few things I want to clarify: I was thinking of a single row containing all reminders (& not split by day). History of the reminders needs to be maintained for some time. After a certain time (say 3 or 6 months) they may be deleted by TTL

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread William R Speirs
To reiterate, so I know we're both on the same page, your schema would be something like this: - A column family (as you describe) to store the details of a reminder. One reminder per row. The row key would be a TimeUUID. - A super column family to store the reminders for each user, for each

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Actually, I am trying to use Cassandra to display to the users of my application the list of all reminders they have set for themselves. I need to store rows containing the timeline of daily reminders that users set for themselves in the application. The reminders need to be p

Re: Schema Design

2011-01-30 Thread Jonathan Ellis
r a host in a single row is not a good choice. 2 >> reasons: >> 1. too few keys, so your data will not distribute well. >> 2. data under a key will always increase, so Cassandra has to do more >> SSTable compaction. >> >> -Original Message- >> From: Wil

Re: Schema Design

2011-01-29 Thread aaron morton
> -Original Message- > From: William R Speirs [mailto:bill.spe...@gmail.com] > Sent: January 27, 2011 9:15 > To: user@cassandra.apache.org > Subject: Re: Schema Design > > It makes sense that the single row for a system (with a growing number of > columns) will reside on a single mac

Re: Schema Design

2011-01-26 Thread Wangpei (Peter)
.@gmail.com] Sent: January 27, 2011 9:15 To: user@cassandra.apache.org Subject: Re: Schema Design It makes sense that the single row for a system (with a growing number of columns) will reside on a single machine. With that in mind, here is my updated schema: - A single column family for all the mess

Re: Schema Design

2011-01-26 Thread William R Speirs
Ah, sweet... thanks for the link! Bill- On 01/26/2011 08:20 PM, buddhasystem wrote: Bill, it's all explained here: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size,the Watch the number of CFs and the memtable sizes. In my experience, this all matters.

Re: Schema Design

2011-01-26 Thread buddhasystem
Bill, it's all explained here: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size,the Watch the number of CFs and the memtable sizes. In my experience, this all matters. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Des

Re: Schema Design

2011-01-26 Thread William R Speirs
ich means you'll definitely not be able to distribute your load very well. From: Bill Speirs [bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 1:23 PM To: user@cassandra.apache.org Subject: Re: Schema Design I like this approach, but I have 2 q

RE: Schema Design

2011-01-26 Thread Shu Zhang
very well. From: Bill Speirs [bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 1:23 PM To: user@cassandra.apache.org Subject: Re: Schema Design I like this approach, but I have 2 questions: 1) what are the implications of continually adding columns t

Re: Schema Design

2011-01-26 Thread buddhasystem
I used the term "sharding" a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Sche

Re: Schema Design

2011-01-26 Thread Nick Santini
One thing you can do is create one CF, then use the application name + timestamp as the row key; with that you can do your range query using OPP. Then store whatever you want in the row. The problem would be if one app generates far more logs than the others. Nicolas Santini On Thu, Jan 27, 2011 at 1
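The "application name + timestamp" row key can be sketched as follows. This is a minimal model, not Cassandra client code: a sorted Python list stands in for the key order that an order-preserving partitioner would maintain, and the `make_key` helper, the `:` separator, the zero-padding width, and the sample application names are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def make_key(app, ts):
    """Build a composite row key. The timestamp is zero-padded so that
    string ordering matches numeric ordering (assumed key layout)."""
    return f"{app}:{ts:010d}"

# Hypothetical log rows from two applications, kept in key order the
# way an order-preserving partitioner would arrange them.
keys = sorted(
    make_key(app, ts)
    for app, ts in [
        ("billing", 1296000000),
        ("billing", 1296003600),
        ("web", 1296000100),
        ("billing", 1296007200),
    ]
)

def key_range(sorted_keys, app, start_ts, end_ts):
    """Return every key for `app` with start_ts <= timestamp <= end_ts."""
    lo = bisect_left(sorted_keys, make_key(app, start_ts))
    hi = bisect_right(sorted_keys, make_key(app, end_ts))
    return sorted_keys[lo:hi]

first_hour = key_range(keys, "billing", 1296000000, 1296003600)
```

Since keys for one application sit next to each other, a busy application concentrates its rows on the same part of the ring, which is the imbalance the post warns about.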

Re: Schema Design

2011-01-26 Thread David McNelis
My cli knowledge sucks so far, so I'll leave that to others... I'm doing most of my reading/writing through a thrift client (Hector/Java based). As for the implications, as of the latest version of Cassandra there is no theoretical limit to the number of columns that a particular row can hold. O

Re: Schema Design

2011-01-26 Thread Bill Speirs
I have a basic understanding of OPP... if most of my messages come within a single hour then a few nodes could be storing all of my values, right? You totally lost me on, "whether to shard data as per system..." Is my schema (one column family per system, and row keys as TimeUUIDType) sharding by

Re: Schema Design

2011-01-26 Thread Bill Speirs
I like this approach, but I have 2 questions: 1) what are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time

Re: Schema Design

2011-01-26 Thread buddhasystem
Having separate columns for Year, Month, etc. seems redundant. It's tons more efficient to keep, say, UTC time in POSIX format (basically an integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use the Order Preserving Partitioner, and sort out which sys
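Converting between a UTC date and a POSIX integer, as suggested above, takes only a couple of lines. The sketch below uses Python's standard `datetime` module; the helper names and the sample date are illustrative assumptions.

```python
from datetime import datetime, timezone

def to_posix(dt):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    return int(dt.timestamp())

def from_posix(ts):
    """Convert POSIX seconds back to a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

# One integer replaces separate year/month/day/hour columns.
sample = datetime(2011, 1, 26, 20, 20, tzinfo=timezone.utc)
stamp = to_posix(sample)
restored = from_posix(stamp)

# A date range becomes a pair of integers that can be compared directly.
day_start = to_posix(datetime(2011, 1, 26, tzinfo=timezone.utc))
day_end = day_start + 86400 - 1
in_range = day_start <= stamp <= day_end
```

Integer timestamps also sort in chronological order, which is what makes them usable as range boundaries.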

Re: Schema Design

2011-01-26 Thread David McNelis
I would say in that case you might want to try a single column family where the key to the column is the system name. Then, you could name your columns as the timestamp. Then when retrieving information from the data store you can, in your slice request, specify your start column as X and
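The layout described here (one row per system, columns named by timestamp, read back with a slice) can be modelled in a few lines. This is an in-memory sketch only: the `WideRow` class and its methods are hypothetical stand-ins rather than a Cassandra client API, and the slice is assumed to be inclusive at both ends.

```python
from bisect import bisect_left, bisect_right

class WideRow:
    """Toy model of one row whose column names are kept sorted."""

    def __init__(self):
        self._names = []    # sorted column names (timestamps)
        self._values = {}   # column name -> value

    def insert(self, name, value):
        # Keep names sorted; overwriting an existing column keeps its slot.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

# Hypothetical column family: row key = system name.
cf = {"mail-server": WideRow()}
cf["mail-server"].insert(1296003600, "disk at 91%")
cf["mail-server"].insert(1296000000, "service started")
cf["mail-server"].insert(1296007200, "disk at 95%")

recent = cf["mail-server"].slice(1296003600, 1296007200)
```

Columns are returned in timestamp order regardless of insertion order, so a time window is just a start and a finish column name.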