Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

Aditya Narayan Wed, 02 Feb 2011 08:18:53 -0800

Perhaps you misunderstood me.

I am, of course, already splitting the rows on a per-user basis;
otherwise the schema wouldn't make sense for my use case. Each row
contains only the *reminders of a single user*, sorted in
chronological order. The reminder IDs are stored as supercolumn
names, and the subcolumns contain the tags for that reminder.
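A minimal sketch of the per-user timeline row described above, modeled in memory with plain Python rather than a Cassandra client. The names `TimelineRow`, `add_reminder`, and `newest_first` are hypothetical, and each supercolumn name is assumed to be a `(timestamp, reminder_id)` pair so that sorting reproduces the chronological order a TimeUUID comparator would give.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class TimelineRow:
    """Hypothetical in-memory stand-in for one user's timeline row."""

    user_id: str
    # Supercolumn name -> subcolumns. The name is assumed to be
    # (timestamp, reminder_id); the subcolumns are the reminder's tags.
    supercolumns: Dict[Tuple[int, str], Set[str]] = field(default_factory=dict)

    def add_reminder(self, ts: int, reminder_id: str, tags: Set[str]) -> None:
        # All tags are written at once, on the first write.
        self.supercolumns[(ts, reminder_id)] = set(tags)

    def newest_first(self) -> List[Tuple[str, Set[str]]]:
        # Reverse chronological order, like a news feed.
        ordered = sorted(self.supercolumns.items(), key=lambda kv: kv[0], reverse=True)
        return [(reminder_id, tags) for (_, reminder_id), tags in ordered]


row = TimelineRow("user42")
row.add_reminder(100, "r1", {"work"})
row.add_reminder(200, "r2", {"home", "bills"})
print([rid for rid, _ in row.newest_first()])  # ['r2', 'r1']
```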




On Wed, Feb 2, 2011 at 9:19 PM, William R Speirs <bill.spe...@gmail.com> wrote:
> Any time I see/hear "a single row containing all ..." I get nervous. That
> single row is going to reside on a single node. That is potentially a lot of
> load (I don't know the system) for that single node. Why wouldn't you split
> it by user, at least? If it won't be a lot of load, then why are you using
> Cassandra? This seems like something that could easily fit into an
> SQL/relational-style DB. If it's too much data (millions of users, 100s of
> millions of reminders) for a standard SQL/relational model, then it's
> probably too much for a single row.
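The point that a single row lives on a single node can be illustrated with a toy model of hash partitioning. This sketch assumes a hypothetical four-node ring with evenly spaced tokens and hashes row keys with MD5, loosely in the spirit of Cassandra's RandomPartitioner (the arithmetic is simplified and replication is ignored). `token_for`, `node_for`, and the node names are made up for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple

# Hypothetical 4-node ring: each node owns the token range that ends at
# its token. Replication is ignored to keep the picture simple.
RING_SIZE = 2**127
NODE_TOKENS: List[Tuple[int, str]] = [
    (RING_SIZE // 4, "node-a"),
    (RING_SIZE // 2, "node-b"),
    (3 * RING_SIZE // 4, "node-c"),
    (RING_SIZE, "node-d"),
]


def token_for(row_key: str) -> int:
    # MD5 of the row key mapped onto the ring (a simplification of
    # RandomPartitioner, not its exact arithmetic).
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def node_for(row_key: str) -> str:
    # The first node whose token is >= the key's token owns the whole row.
    tokens = [token for token, _ in NODE_TOKENS]
    index = bisect.bisect_left(tokens, token_for(row_key))
    return NODE_TOKENS[index % len(NODE_TOKENS)][1]


# One giant row: every column lands on the same node.
print(node_for("all_reminders"))
# One row per user: rows spread across the ring.
print(sorted({node_for(f"user:{i}") for i in range(100)}))
```

Because placement depends only on the row key, adding more columns to one row never spreads it out; splitting rows by user is what distributes the load.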
>
> I'm not familiar with Cassandra's TTL functionality... sorry, I can't
> help or comment there; still learning :-)
>
> Yeah, my $0.02 is that this is an effective way to leverage super columns.
>
> Bill-
>
> On 02/02/2011 10:43 AM, Aditya Narayan wrote:
>>
>> I think you got exactly what I wanted to convey, except for a few
>> things I want to clarify:
>>
>> I was thinking of a single row containing all reminders (and not
>> split by day). The history of reminders needs to be kept for some
>> time; after a certain period (say 3 or 6 months) they may be deleted
>> via the TTL facility.
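For the 3-to-6-month retention mentioned above: Cassandra's TTL is given in seconds when a column is written, so the main arithmetic is converting the retention period. A minimal sketch, assuming a month is approximated as 30 days; `ttl_seconds` and `is_expired` are hypothetical helpers that model expiry in memory and do not call Cassandra.

```python
from datetime import timedelta


def ttl_seconds(days: int) -> int:
    # TTLs are given in seconds; "3 months" is approximated as 90 days here.
    return int(timedelta(days=days).total_seconds())


THREE_MONTHS = ttl_seconds(90)
SIX_MONTHS = ttl_seconds(180)


def is_expired(written_at: int, ttl: int, now: int) -> bool:
    # Simplified check: a column counts as gone once `ttl` seconds have
    # elapsed since it was written (all times in epoch seconds).
    return now >= written_at + ttl


print(THREE_MONTHS, SIX_MONTHS)  # 7776000 15552000
```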
>>
>> "When presenting the reminders timeline to the user, the latest
>> supercolumns (around 50) from the start end of the row will be picked
>> up, and their subcolumn values will be compared to the tags the user
>> has chosen to see. For the supercolumns whose subcolumn values (tags)
>> match, the corresponding reminder-detail rows will be fetched."
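The read path in the quoted paragraph (slice the newest ~50 supercolumns, filter on the tags, then fetch the matching detail rows) could be sketched as follows. This is an in-memory model, not Cassandra client code; `reminders_to_fetch` is a hypothetical helper, and "match" is assumed to mean the reminder shares at least one tag with the user's selection.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory timeline row:
# (timestamp, reminder_id) -> set of tag subcolumns.
TimelineRow = Dict[Tuple[int, str], Set[str]]


def reminders_to_fetch(row: TimelineRow, wanted_tags: Set[str], limit: int = 50) -> List[str]:
    # 1. Take the `limit` newest supercolumns (a reversed slice from the
    #    latest end of the row).
    newest = sorted(row.items(), key=lambda kv: kv[0], reverse=True)[:limit]
    # 2. Keep those sharing at least one tag with the user's selection.
    # 3. Return the reminder ids, newest first, for the detail-row lookup.
    return [reminder_id for (_, reminder_id), tags in newest if tags & wanted_tags]


row: TimelineRow = {
    (1, "r1"): {"work"},
    (2, "r2"): {"home"},
    (3, "r3"): {"work", "urgent"},
}
print(reminders_to_fetch(row, {"work"}))           # ['r3', 'r1']
print(reminders_to_fetch(row, {"work"}, limit=2))  # ['r3']
```

Because the filter runs after the slice, a page can come back with fewer matches than the slice size (the second call returns one id, not two), so a client would likely keep slicing further back until it has enough reminders to show.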
>>
>> Is a supercolumn family a good choice for this? Could there be a
>> better schema?
>>
>>
>> -Aditya Narayan
>>
>>
>>
>> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs <bill.spe...@gmail.com> wrote:
>>>
>>> To reiterate, so I know we're both on the same page, your schema would be
>>> something like this:
>>>
>>> - A column family (as you describe) to store the details of a reminder.
>>> One reminder per row. The row key would be a TimeUUID.
>>>
>>> - A super column family to store the reminders for each user, for each
>>> day. The row key would be something like: YYYYMMDD:user_id. The column
>>> names would simply be the TimeUUIDs of the reminders. The sub column
>>> names would be the tag names of the various reminders.
>>>
>>> The idea is that you would then get a slice of each row for a user, for
>>> a day, that would only contain sub column names with the tags you're
>>> looking for? Then, based upon the column names returned, you'd look up
>>> the reminders.
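The day-bucketed variant needs one row key per user per day. A minimal sketch of building those keys, assuming the YYYYMMDD:user_id format from the message; `day_row_key` and `row_keys_newest_first` are hypothetical names.

```python
from datetime import date, timedelta
from typing import List


def day_row_key(day: date, user_id: str) -> str:
    # Row key in the YYYYMMDD:user_id form suggested above.
    return f"{day:%Y%m%d}:{user_id}"


def row_keys_newest_first(start: date, end: date, user_id: str) -> List[str]:
    # One row per day, so a multi-day timeline touches several rows;
    # walk them from `end` back to `start` to read newest first.
    span = (end - start).days
    return [day_row_key(end - timedelta(days=i), user_id) for i in range(span + 1)]


print(row_keys_newest_first(date(2011, 1, 31), date(2011, 2, 2), "u1"))
# ['20110202:u1', '20110201:u1', '20110131:u1']
```

The trade-off against one row per user is that a timeline covering many days needs one slice per day row, while each individual row stays small.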
>>>
>>> That seems like a solid schema to me.
>>>
>>> Bill-
>>>
>>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>>
>>>> Actually, I am trying to use Cassandra to display to users of my
>>>> application the list of all reminders they have set for themselves.
>>>>
>>>> I need to store rows containing the timeline of daily reminders that
>>>> users set for themselves in the application. The reminders need to be
>>>> presented to the user in chronological order, like a news feed.
>>>> Each reminder has certain tags associated with it (so that, at
>>>> times, the user may also choose to see the reminders filtered by tags,
>>>> in chronological order).
>>>>
>>>> So I thought of a schema something like this:
>>>>
>>>> - The details of each reminder would be stored as a separate row in a
>>>>   column family.
>>>> - To present a user's timeline of reminders, each user's timeline row
>>>>   would contain the IDs/keys (of the reminder rows) as the supercolumn
>>>>   names, and the subcolumns inside each supercolumn would contain the
>>>>   list of tags associated with that reminder. All tags are set at once,
>>>>   during the first write. The number of tags (subcolumns) will be
>>>>   around 8 at most.
>>>>
>>>> Any comments, suggestions, or feedback on the schema design would be
>>>> appreciated.
>>>>
>>>> Thanks
>>>> Aditya Narayan
>>>>
>>>>
>>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan <ady...@gmail.com> wrote:
>>>>>
>>>>> Hey all,
>>>>>
>>>>> I need to store supercolumns, each with around 8 subcolumns.
>>>>> All the data for a supercolumn is written at once, and all subcolumns
>>>>> need to be retrieved together. The data in each subcolumn is not big;
>>>>> it just contains keys to other rows.
>>>>>
>>>>> Would it be preferable to have a supercolumn family, or just a standard
>>>>> column family containing "all the subcolumn data serialized in a single
>>>>> column (or columns)"?
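For the standard-column-family option, the ~8 subcolumns could be packed into one column value. A minimal sketch using JSON as the (assumed) serialization format; `pack_tags` and `unpack_tags` are hypothetical helpers, and any compact encoding would serve equally well.

```python
import json
from typing import Dict, List


def pack_tags(tags: List[str]) -> bytes:
    # Sorted, compact JSON so the same tag set always yields the same bytes.
    return json.dumps(sorted(tags), separators=(",", ":")).encode("utf-8")


def unpack_tags(value: bytes) -> List[str]:
    return json.loads(value.decode("utf-8"))


# Standard column family alternative (hypothetical layout):
# column name = reminder id, column value = all of its tags, packed.
row: Dict[str, bytes] = {"r1": pack_tags(["work", "urgent"])}
print(row["r1"])               # b'["urgent","work"]'
print(unpack_tags(row["r1"]))  # ['urgent', 'work']
```

Since the data is written once and always read together, packing loses little here; the costs are that changing a single tag means rewriting the whole value, and Cassandra cannot look inside it.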
>>>>>
>>>>> Thanks
>>>>> Aditya Narayan
>>>>>
>>>
>
