Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

Drew Kutcharian Thu, 31 Mar 2011 10:52:26 -0700

Thanks Aaron,

I have already checked out Twissandra. I was mainly looking to see how 
Secondary Indexes can be used and how they affect Data Modeling. There doesn't 
seem to be a lot of coverage on them.


In addition, I couldn't tell what kind of Partitioner Twissandra is using and 
why.

cheers,

Drew


On Mar 31, 2011, at 5:53 AM, aaron morton wrote:

> Drew, 
>       The Twissandra project is a Twitter clone in Cassandra; it may give you 
> some insight into how things can be modelled:
> https://github.com/thobbs/twissandra
> 
>       If you are just starting then consider something like...
> 
>       - CF to hold the user, their data and their network links  
>       - standard CF to hold a blog entry, key is a timestamp 
>       - standard CF to hold blog comments, each comment as a single column 
> where the name is a long timestamp 
>       - standard CF to hold the blogs for a user, key is the user id and each 
> column is the blog key 
> 
> That's not a great schema but it's a simple starting point you can build on 
> and refine using things like secondary indexes and doing more/less in the 
> same CF. 
> 
> Good luck. 
> Aaron
> 
> On 30 Mar 2011, at 15:13, Drew Kutcharian wrote:
> 
>> I'm pretty new to Cassandra and I would like to get your advice on modeling. 
>> The object model of the project that I'm working on will be pretty close to 
>> Blogger, Tumblr, etc. (or any other blogging website). There are Users, 
>> each of whom can have many Blogs, and each Blog can have many Comments. 
>> How would you model this efficiently considering:
>> 
>> 1) Be able to directly link to a User
>> 2) Be able to directly link to a Blog
>> 3) Be able to query and get all the Blogs for a User ordered by time created 
>> descending (new blogs first)
>> 4) Be able to query and get all the Comments for each Blog ordered by time 
>> created ascending (old comments first)
>> 5) Be able to link different Users to each other, as a network.
>> 6) Have a well distributed hash so we don't end up with "hot" nodes, while 
>> the rest of the nodes are idle
>> 7) It would be nice to show a User how many Blogs they have or how many 
>> comments are on a Blog, without iterating through the whole dataset.
>> NEW: 8) Be able to query for the most recently added Blogs. For example, 
>> Blogs added today, this week, this month, etc.
>> 
>> The target Cassandra version is 0.8 to use the Secondary Indexes. The goal 
>> is to be very efficient, so no Text keys. We were thinking of using 
>> time-based 64-bit ids, using Snowflake.
>> 
>> Thanks,
>> 
>> Drew
>
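
For reference, the four-column-family starting point Aaron describes can be 
sketched with plain Python dictionaries. This is only an in-memory 
illustration of the layout, not Cassandra client code: every name below 
(users, blogs, comments, user_blogs and the helper functions) is hypothetical, 
ids are assumed to be time-ordered integers, and user links are assumed to be 
symmetric.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the four column families.
users = {}                      # user id -> {"name": ..., "links": set of user ids}
blogs = {}                      # blog key (time-based id) -> blog columns
comments = defaultdict(dict)    # blog key -> {comment timestamp: comment text}
user_blogs = defaultdict(dict)  # user id -> {blog key: ""} (column name is the blog key)


def add_user(user_id, name):
    users[user_id] = {"name": name, "links": set()}


def link_users(a, b):
    # Assumption: network links are symmetric.
    users[a]["links"].add(b)
    users[b]["links"].add(a)


def add_blog(user_id, blog_key, title):
    # Write the entry itself, then the per-user row that points at it.
    blogs[blog_key] = {"user": user_id, "title": title}
    user_blogs[user_id][blog_key] = ""


def add_comment(blog_key, ts, text):
    # One column per comment, named by its timestamp.
    comments[blog_key][ts] = text


def blogs_for_user(user_id):
    # Requirement 3: newest first. Keys are time-based, so sort descending.
    return [blogs[k]["title"] for k in sorted(user_blogs[user_id], reverse=True)]


def comments_for_blog(blog_key):
    # Requirement 4: oldest first. Column names are timestamps, so sort ascending.
    return [text for _, text in sorted(comments[blog_key].items())]
```

With two entries added for one user under keys 1000 and 2000, 
blogs_for_user returns the entry keyed 2000 first, and comments_for_blog 
returns comments in timestamp order. The point of the extra user_blogs row is 
that "all blogs for a user" becomes a single-row read instead of a scan.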
