Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

Aditya Narayan Wed, 02 Feb 2011 08:58:22 -0800

@Bill
Thank you, Bill!

@Cassandra users
Could others also share their suggestions and comments on my schema, please?
I would also appreciate input on my original question: should I use a
supercolumn, or just serialize the data (that would otherwise be stored in
subcolumns) into a single column in a standard column family?


Thanks

-Aditya Narayan
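
To make the second option concrete, here is a minimal sketch of the "serialize the tags into a single column" layout, assuming JSON as the serialization format. It does not use a Cassandra client: a plain dict stands in for one user's timeline row, the function names are hypothetical, and integer reminder ids stand in for TimeUUIDs on the assumption that column names sort chronologically.

```python
import json

# Hypothetical stand-in for ONE user's timeline row in a standard column
# family: {column name (reminder id) -> column value (serialized tags)}.
# Assumption: reminder ids sort chronologically, as time-ordered column
# names would in the schema discussed in this thread.

def pack_tags(tags):
    """Serialize all tags of a reminder into a single column value."""
    return json.dumps(sorted(tags))

def unpack_tags(value):
    """Deserialize a column value back into a set of tags."""
    return set(json.loads(value))

def add_reminder(timeline_row, reminder_id, tags):
    """Write the reminder's tags once, as one column."""
    timeline_row[reminder_id] = pack_tags(tags)

def latest_matching(timeline_row, wanted_tags, limit=50):
    """Take the newest `limit` columns, then keep the reminder ids whose
    tags overlap the tags the user chose to see (client-side filter)."""
    wanted = set(wanted_tags)
    newest = sorted(timeline_row, reverse=True)[:limit]
    return [rid for rid in newest if unpack_tags(timeline_row[rid]) & wanted]
```

The ids returned by `latest_matching` would then be used to look up the reminder detail rows. Since all tags are written at once and read together, packing them into one value loses nothing in this access pattern; the trade-off is that a single tag can no longer be read or updated on its own.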



On Wed, Feb 2, 2011 at 10:11 PM, William R Speirs <bill.spe...@gmail.com> wrote:
> I did not understand before... sorry.
>
> Again, depending upon how many reminders you have for a single user, this
> could be a long/wide row. It really comes down to how many reminders we are
> talking about and how often they will be read/written. While a single row
> can contain millions (maybe more) of columns, that doesn't mean it's a good
> idea.
>
> I'm working on a logging system with Cassandra and ran into this same type
> of problem. Do I put all of the messages for a single system into a single
> row keyed off that system's name? I quickly came to the answer of "no" and
> now I break my row keys into POSIX_timestamp:system where my timestamps are
> buckets for every 5 minutes. This nicely distributes the load across the
> nodes in my system.
>
> Bill-
>
> On 02/02/2011 11:18 AM, Aditya Narayan wrote:
>>
>> Perhaps you misunderstood me.
>>
>> I am already splitting the rows on a per-user basis, of course; otherwise
>> the schema won't make sense for my usage. Each row contains only the
>> *reminders of a single user*, sorted in chronological order. The
>> reminder IDs are stored as supercolumn names, and the subcolumns contain
>> the tags for that reminder.
>>
>>
>>
>> On Wed, Feb 2, 2011 at 9:19 PM, William R Speirs <bill.spe...@gmail.com> wrote:
>>>
>>> Any time I see/hear "a single row containing all ..." I get nervous. That
>>> single row is going to reside on a single node. That is potentially a lot
>>> of load (don't know the system) for that single node. Why wouldn't you
>>> split it by at least user? If it won't be a lot of load, then why are you
>>> using Cassandra? This seems like something that could easily fit into an
>>> SQL/relational style DB. If it's too much data (millions of users, 100s of
>>> millions of reminders) for a standard SQL/relational model, then it's
>>> probably too much for a single row.
>>>
>>> I'm not familiar with the TTL functionality of Cassandra... sorry cannot
>>> help/comment there, still learning :-)
>>>
>>> Yea, my $0.02 is that this is an effective way to leverage super columns.
>>>
>>> Bill-
>>>
>>> On 02/02/2011 10:43 AM, Aditya Narayan wrote:
>>>>
>>>> I think you understood exactly what I wanted to convey, except for a
>>>> few things I want to clarify:
>>>>
>>>> I was thinking of a single row containing all reminders (and not split
>>>> by day). The history of the reminders needs to be maintained for some
>>>> time. After a certain time (say 3 or 6 months) they may be deleted by
>>>> the TTL facility.
>>>>
>>>> "While presenting the reminders timeline to the user, latest
>>>> supercolumns like around 50 from the start_end will be picked up and
>>>> their subcolumns values will be compared to the Tags user has chosen
>>>> to see and, corresponding to the filtered subcolumn values(tags), the
>>>> rows of the reminder details would be picked up.."
>>>>
>>>> Is a supercolumn the preferable choice for this? Could there be a
>>>> better schema than this?
>>>>
>>>>
>>>> -Aditya Narayan
>>>>
>>>>
>>>>
>>>> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs <bill.spe...@gmail.com> wrote:
>>>>>
>>>>> To reiterate, so I know we're both on the same page, your schema would
>>>>> be something like this:
>>>>>
>>>>> - A column family (as you describe) to store the details of a reminder.
>>>>> One reminder per row. The row key would be a TimeUUID.
>>>>>
>>>>> - A super column family to store the reminders for each user, for each
>>>>> day. The row key would be something like: YYYYMMDD:user_id. The column
>>>>> names would simply be the TimeUUID of the messages. The sub column
>>>>> names would be the tag names of the various reminders.
>>>>>
>>>>> The idea is that you would then get a slice of each row for a user, for
>>>>> a day, that would only contain sub column names with the tags you're
>>>>> looking for? Then based upon the column names returned, you'd look-up
>>>>> the reminders.
>>>>>
>>>>> That seems like a solid schema to me.
>>>>>
>>>>> Bill-
>>>>>
>>>>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>>>>
>>>>>> Actually, I am trying to use Cassandra to display to the users of my
>>>>>> application the list of all reminders they have set for themselves.
>>>>>>
>>>>>> I need to store rows containing the timeline of daily reminders that
>>>>>> users set for themselves in the application. The reminders need to be
>>>>>> presented to the user in chronological order, like a news feed.
>>>>>> Each reminder has certain tags associated with it (so that, at
>>>>>> times, the user may also choose to see the reminders filtered by tag,
>>>>>> in chronological order).
>>>>>>
>>>>>> So I thought of a schema something like this:
>>>>>>
>>>>>> - The details of each reminder are stored as a separate row in a
>>>>>> column family.
>>>>>> - To present the timeline of reminders to the user, each user has a
>>>>>> timeline row whose supercolumn names are the IDs/keys of the reminder
>>>>>> rows, and the subcolumns inside each supercolumn contain the list of
>>>>>> tags associated with that particular reminder. All tags are set at
>>>>>> once during the first write. The number of tags (subcolumns) will be
>>>>>> around 8 at most.
>>>>>>
>>>>>> Any comments, suggestions and feedback on the schema design are
>>>>>> welcome.
>>>>>>
>>>>>> Thanks
>>>>>> Aditya Narayan
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan <ady...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hey all,
>>>>>>>
>>>>>>> I need to store supercolumns, each with around 8 subcolumns.
>>>>>>> All the data for a supercolumn is written at once, and all of its
>>>>>>> subcolumns need to be retrieved together. The data in each subcolumn
>>>>>>> is not big; it just contains keys to other rows.
>>>>>>>
>>>>>>> Would it be preferable to have a supercolumn family, or just a
>>>>>>> standard column family containing "all the subcolumn data serialized
>>>>>>> into single column(s)"?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aditya Narayan
>>>>>>>
>>>>>
>>>
>
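
The time-bucketed row keys Bill describes for his logging system (POSIX_timestamp:system, with 5-minute buckets) could be built roughly as follows. This is only a sketch under assumptions: the exact key format, the function names, and the choice to round each timestamp down to the start of its bucket are illustrative and not taken from his code.

```python
BUCKET_SECONDS = 5 * 60  # 5-minute buckets, as mentioned in the thread

def bucket_start(posix_ts, bucket_seconds=BUCKET_SECONDS):
    """Round a POSIX timestamp down to the start of its bucket."""
    ts = int(posix_ts)
    return ts - ts % bucket_seconds

def row_key(posix_ts, system):
    """Build a row key of the (assumed) form '<bucket start>:<system>'."""
    return f"{bucket_start(posix_ts)}:{system}"

def row_keys_for_range(start_ts, end_ts, system):
    """List every row key a reader must fetch to cover [start_ts, end_ts]."""
    first = bucket_start(start_ts)
    last = bucket_start(end_ts)
    return [f"{b}:{system}" for b in range(first, last + 1, BUCKET_SECONDS)]
```

Because the bucket is part of the row key, consecutive buckets for the same system become different rows, so writes are spread across rows (and, depending on the partitioner, across nodes) instead of growing one ever-wider row. The cost is that a reader has to enumerate the keys for the time range it wants, as `row_keys_for_range` does.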
