Thanks Aaron, I have already checked out Twissandra. I was mainly looking to see how Secondary Indexes can be used and how they affect Data Modeling. There doesn't seem to be much coverage of them.
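To make the secondary-index question concrete, here is a toy in-memory sketch in plain Python (no Cassandra client; the `Blogs` data and the `user_id` column are hypothetical). It contrasts what a built-in secondary index answers, "which rows have `user_id == X`", with a manually maintained per-user wide row like the "blogs for a user" CF in Aaron's reply. As far as I understand 0.7/0.8, an indexed-slices query returns matches in partitioner (token) order rather than by creation time, so requirement 3 would still seem to need the wide-row approach.

```python
from collections import defaultdict

# Toy in-memory stand-in for column families (no Cassandra client involved).
# Hypothetical "Blogs" CF: row key is a time-based blog id -> columns.
blogs = {
    101: {"user_id": "drew", "title": "First post"},
    150: {"user_id": "aaron", "title": "Hello"},
    205: {"user_id": "drew", "title": "Second post"},
}

def secondary_index_lookup(rows, column, value):
    """Roughly what a built-in secondary index answers: which row keys have
    column == value. The result carries no creation-time ordering."""
    return {key for key, cols in rows.items() if cols.get(column) == value}

# Manual index: one wide row per user whose column names are blog ids.
# A LongType comparator keeps names sorted, so a reversed slice is newest-first.
user_blogs = defaultdict(dict)
for blog_id, cols in blogs.items():
    user_blogs[cols["user_id"]][blog_id] = ""  # the column name carries the data

def newest_blogs(user_id, count):
    """Emulates a reversed column slice of at most `count` columns."""
    return sorted(user_blogs[user_id], reverse=True)[:count]

print(sorted(secondary_index_lookup(blogs, "user_id", "drew")))  # [101, 205]
print(newest_blogs("drew", 10))  # [205, 101]
```

The index saves you from maintaining the lookup row yourself, while the wide row gives you ordering and paging by time for free; in practice the two are often combined.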
In addition, I couldn't tell which Partitioner Twissandra is using, or why.

cheers,
Drew

On Mar 31, 2011, at 5:53 AM, aaron morton wrote:

> Drew,
> The Twissandra project is a twitter clone in cassandra, it may give you
> some insight into how things can be modelled
> https://github.com/thobbs/twissandra
>
> If you are just starting then consider something like...
>
> - CF to hold the user, their data and their network links
> - standard CF to hold a blog entry, key is a timestamp
> - standard CF to hold blog comments, each comment as a single column
> where the name is a long timestamp
> - standard CF to hold the blogs for a user, key is the user id and each
> column is the blog key
>
> That's not a great schema but it's a simple starting point you can build on
> and refine using things like secondary indexes and doing more/less in the
> same CF.
>
> Good luck.
> Aaron
>
> On 30 Mar 2011, at 15:13, Drew Kutcharian wrote:
>
>> I'm pretty new to Cassandra and I would like to get your advice on modeling.
>> The object model of the project that I'm working on will be pretty close to
>> Blogger, Tumblr, etc. (or any other blogging website).
>> Where you have Users, that each can have many Blogs and each Blog can have
>> many comments. How would you model this efficiently considering:
>>
>> 1) Be able to directly link to a User
>> 2) Be able to directly link to a Blog
>> 3) Be able to query and get all the Blogs for a User ordered by time created
>> descending (new blogs first)
>> 4) Be able to query and get all the Comments for each Blog ordered by time
>> created ascending (old comments first)
>> 5) Be able to link different Users to each other, as a network.
>> 6) Have a well distributed hash so we don't end up with "hot" nodes, while
>> the rest of the nodes are idle
>> 7) It would be nice to show a User how many Blogs they have or how many
>> comments are on a Blog, without iterating through the whole dataset.
>> NEW: 8) Be able to query for the most recently added Blogs. For example,
>> Blogs added today, this week, this month, etc.
>>
>> The target Cassandra version is 0.8 to use the Secondary Indexes. The goal
>> is to be very efficient, so no Text keys. We were thinking of using Time
>> Based 64bit ids, using Snowflake.
>>
>> Thanks,
>>
>> Drew
>
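On the time-based 64-bit ids: the sketch below is not Twitter's Snowflake code. It only illustrates the bit layout Snowflake is documented to use (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence). The epoch constant, the `SnowflakeIds` class and `timestamp_ms` helper are assumptions for illustration.

```python
import threading
import time

# Assumed layout, following Snowflake's published design:
# 41 bits of ms since a custom epoch | 10 bits worker id | 12 bits sequence.
EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch; any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeIds:
    """Minimal, hypothetical Snowflake-style generator for one worker."""

    def __init__(self, worker_id):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:  # 4096 ids used this ms: wait for the next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)

def timestamp_ms(snowflake_id):
    """Recover the creation time (ms since the Unix epoch) from an id."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS

gen = SnowflakeIds(worker_id=3)
first, second = gen.next_id(), gen.next_id()
print(first < second, timestamp_ms(first))
```

Because the timestamp occupies the high bits, ids from one worker strictly increase, and ids from different workers sort by millisecond (subject to clock skew between machines). That makes them usable as LongType column names for requirements 3 and 4 and as compact, non-text row keys.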
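On the partitioner question: the thread doesn't say which partitioner Twissandra is configured with, but Cassandra's default in 0.7/0.8 is RandomPartitioner, which places each row by an MD5 hash of its key. The sketch below is my reading of that placement, not Cassandra code. It assumes a four-node ring with evenly spaced initial tokens and row keys that are 8-byte big-endian longs, and it illustrates why sequential Snowflake-style ids still spread evenly (requirement 6).

```python
import bisect
import hashlib
import struct
from collections import Counter

RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in 0..2**127

def random_partitioner_token(key: bytes) -> int:
    """MD5 of the raw key bytes, read as a signed big-endian integer,
    then made non-negative (mirrors Java's new BigInteger(md5).abs())."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

def balanced_tokens(node_count: int) -> list:
    """Evenly spaced initial tokens: i * 2**127 / N."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def owning_node(token: int, tokens: list) -> int:
    """A node owns the range (previous token, its token]; wrap around the ring."""
    idx = bisect.bisect_left(tokens, token)
    return idx if idx < len(tokens) else 0

tokens = balanced_tokens(4)
# Sequential 64-bit ids (e.g. time-based blog ids) packed as 8-byte big-endian keys.
keys = [struct.pack(">q", 1_000_000 + n) for n in range(10_000)]
load = Counter(owning_node(random_partitioner_token(k), tokens) for k in keys)
print(sorted(load.items()))  # roughly 2,500 keys per node
```

With an order-preserving partitioner (OrderPreservingPartitioner or ByteOrderedPartitioner), the same sequential keys would tend to fall into one contiguous token range, i.e. onto one node at a time, which is the hot-spot pattern requirement 6 wants to avoid. The trade-off is that RandomPartitioner gives no useful range scans over row keys, so time ordering has to come from column names within a row.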
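For requirement 8, one common pattern (an assumption here, not something proposed in the thread) is a hypothetical `RecentBlogs` CF with one row per UTC day, whose column names are the time-based blog ids. "This week" then means reading seven rows. The toy in-memory version below uses hypothetical helpers `bucket_key`, `add_blog` and `blogs_since`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical "RecentBlogs" CF: one row per UTC day -> set of blog ids (column names).
recent_blogs = defaultdict(set)

def bucket_key(ts: datetime) -> str:
    """Row key for the day bucket containing ts, e.g. '20110331'."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")

def add_blog(blog_id: int, created: datetime) -> None:
    recent_blogs[bucket_key(created)].add(blog_id)

def blogs_since(days: int, now: datetime) -> list:
    """Read one row per day for the last `days` days; with time-based ids,
    sorting descending lists the newest blogs first."""
    ids = []
    for d in range(days):
        ids.extend(recent_blogs.get(bucket_key(now - timedelta(days=d)), ()))
    return sorted(ids, reverse=True)

now = datetime(2011, 3, 31, 12, 0, tzinfo=timezone.utc)
add_blog(3, now)
add_blog(2, now - timedelta(days=1))
add_blog(1, now - timedelta(days=10))
print(blogs_since(7, now))  # [3, 2]
```

Day buckets keep any single row from growing without bound, but all of a day's writes go to the same row and therefore the same replicas. That is worth weighing against requirement 6; finer buckets, or several rows per day, spread that load at the cost of more reads per query.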