@Bill
Thank you, Bill!

@Cassandra users
Could others also leave their suggestions and comments about my schema, please? I would also still like an answer to my earlier question: should I use a supercolumn, or alternatively just store the data (that would otherwise be stored in subcolumns) serialized into a single column in a standard column family?

Thanks
-Aditya Narayan

On Wed, Feb 2, 2011 at 10:11 PM, William R Speirs <bill.spe...@gmail.com> wrote:
> I did not understand before... sorry.
>
> Again, depending upon how many reminders you have for a single user, this
> could be a long/wide row. Again, it really comes down to how many reminders
> we are talking about and how often they will be read/written. While a single
> row can contain millions (maybe more) columns, that doesn't mean it's a good
> idea.
>
> I'm working on a logging system with Cassandra and ran into this same type
> of problem. Do I put all of the messages for a single system into a single
> row keyed off that system's name? I quickly came to the answer of "no" and
> now I break my row keys into POSIX_timestamp:system where my timestamps are
> buckets for every 5 minutes. This nicely distributes the load across the
> nodes in my system.
>
> Bill-
>
> On 02/02/2011 11:18 AM, Aditya Narayan wrote:
>>
>> You got me wrong perhaps..
>>
>> I am already splitting the row on a per-user basis, of course; otherwise
>> the schema won't make sense for my usage. The row contains only
>> *reminders of a single user* sorted in chronological order. The
>> reminder IDs are stored as supercolumn names and the subcolumns contain
>> tags for that reminder.
>>
>> On Wed, Feb 2, 2011 at 9:19 PM, William R Speirs<bill.spe...@gmail.com>
>> wrote:
>>>
>>> Any time I see/hear "a single row containing all ..." I get nervous. That
>>> single row is going to reside on a single node. That is potentially a lot
>>> of load (don't know the system) for that single node. Why wouldn't you
>>> split it by at least user? If it won't be a lot of load, then why are you
>>> using Cassandra? This seems like something that could easily fit into an
>>> SQL/relational style DB. If it's too much data (millions of users, 100s
>>> of millions of reminders) for a standard SQL/relational model, then it's
>>> probably too much for a single row.
>>>
>>> I'm not familiar with the TTL functionality of Cassandra... sorry cannot
>>> help/comment there, still learning :-)
>>>
>>> Yea, my $0.02 is that this is an effective way to leverage super columns.
>>>
>>> Bill-
>>>
>>> On 02/02/2011 10:43 AM, Aditya Narayan wrote:
>>>>
>>>> I think you got exactly what I wanted to convey, except for a few
>>>> things I want to clarify:
>>>>
>>>> I was thinking of a single row containing all reminders (& not split
>>>> by day). History of the reminders needs to be maintained for some time.
>>>> After a certain time (say 3 or 6 months) they may be deleted by the ttl
>>>> facility.
>>>>
>>>> "While presenting the reminders timeline to the user, latest
>>>> supercolumns like around 50 from the start_end will be picked up and
>>>> their subcolumns values will be compared to the Tags user has chosen
>>>> to see and, corresponding to the filtered subcolumn values(tags), the
>>>> rows of the reminder details would be picked up.."
>>>>
>>>> Is a supercolumn a preferable choice for this? Can there be a better
>>>> schema than this?
>>>>
>>>> -Aditya Narayan
>>>>
>>>> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs<bill.spe...@gmail.com>
>>>> wrote:
>>>>>
>>>>> To reiterate, so I know we're both on the same page, your schema would
>>>>> be something like this:
>>>>>
>>>>> - A column family (as you describe) to store the details of a reminder.
>>>>> One reminder per row. The row key would be a TimeUUID.
>>>>>
>>>>> - A super column family to store the reminders for each user, for each
>>>>> day. The row key would be something like: YYYYMMDD:user_id. The column
>>>>> names would simply be the TimeUUID of the messages. The sub column
>>>>> names would be the tag names of the various reminders.
>>>>>
>>>>> The idea is that you would then get a slice of each row for a user, for
>>>>> a day, that would only contain sub column names with the tags you're
>>>>> looking for? Then based upon the column names returned, you'd look-up
>>>>> the reminders.
>>>>>
>>>>> That seems like a solid schema to me.
>>>>>
>>>>> Bill-
>>>>>
>>>>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>>>>
>>>>>> Actually, I am trying to use Cassandra to display to users on my
>>>>>> application, the list of all Reminders set by themselves for
>>>>>> themselves, on the application.
>>>>>>
>>>>>> I need to store rows containing the timeline of daily Reminders put by
>>>>>> the users, for themselves, on the application. The reminders need to be
>>>>>> presented to the user in a chronological order like a news feed.
>>>>>> Each reminder has got certain tags associated with it (so that, at
>>>>>> times, the user may also choose to see the reminders filtered by tags
>>>>>> in chronological order).
>>>>>>
>>>>>> So I thought of a schema something like this:-
>>>>>>
>>>>>> -Each Reminder's details may be stored as a separate row in a column
>>>>>> family.
>>>>>> -For presenting the timeline of reminders set by the user, the
>>>>>> timeline row of each user would contain the Id/Key(s) (of the
>>>>>> Reminder rows) as the supercolumn names, and the subcolumns inside
>>>>>> those supercolumns could contain the list of tags associated with
>>>>>> the particular reminder. All tags are set at once during the first
>>>>>> write. The number of tags (subcolumns) will be around 8 maximum.
>>>>>>
>>>>>> Any comments, suggestions and feedback on the schema design are
>>>>>> requested..
>>>>>>
>>>>>> Thanks
>>>>>> Aditya Narayan
>>>>>>
>>>>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan<ady...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hey all,
>>>>>>>
>>>>>>> I need to store supercolumns each with around 8 subcolumns;
>>>>>>> All the data for a supercolumn is written at once and all subcolumns
>>>>>>> need to be retrieved together. The data in each subcolumn is not big,
>>>>>>> it just contains keys to other rows.
>>>>>>>
>>>>>>> Would it be preferred to have a supercolumn family or just a standard
>>>>>>> column family containing "all the subcolumns data serialized in
>>>>>>> single column(s)" ?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aditya Narayan
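
For readers following the thread, here is a minimal sketch of the two ideas discussed above: the bucketed row keys Bill describes (POSIX_timestamp:system with 5-minute buckets, and YYYYMMDD:user_id for a per-day timeline) and the alternative of serializing a reminder's tags into a single column value instead of using subcolumns. It is plain Python with no Cassandra client involved. The function names, the use of UTC for the day boundary, and the choice of JSON as the serialization format are all assumptions made for illustration; nothing in the thread specifies them.

```python
import json
import time
from datetime import datetime, timezone

# Five-minute buckets, as in Bill's logging example.
BUCKET_SECONDS = 5 * 60


def bucketed_row_key(system, ts=None):
    """Build a POSIX_timestamp:system row key (hypothetical helper).

    The timestamp is rounded down to the start of its 5-minute bucket,
    so all messages from one system in the same bucket share a row.
    """
    if ts is None:
        ts = time.time()
    bucket_start = int(ts) - int(ts) % BUCKET_SECONDS
    return f"{bucket_start}:{system}"


def daily_timeline_key(user_id, ts):
    """Build a YYYYMMDD:user_id row key (hypothetical helper).

    Assumption: the day boundary is taken in UTC.
    """
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"{day}:{user_id}"


def serialize_tags(tags):
    """Pack a reminder's tags into one column value.

    Assumption: JSON is used here only because it is simple; tags are
    sorted so the stored value is deterministic.
    """
    return json.dumps(sorted(tags)).encode("utf-8")


def deserialize_tags(value):
    """Unpack a column value written by serialize_tags."""
    return json.loads(value.decode("utf-8"))


if __name__ == "__main__":
    ts = 1296664260  # a moment on 2 Feb 2011 (UTC)
    print(bucketed_row_key("web01", ts))
    print(daily_timeline_key("user42", ts))
    packed = serialize_tags(["work", "home"])
    print(packed, deserialize_tags(packed))
```

With the serialized form, the client has to read the whole value and filter on tags itself, which matches the access pattern described in the thread (all tags written once and read together). Whether it beats a supercolumn family in practice is not something this sketch settles.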