Re: Revised: Data Modeling advice for Cassandra 0.8 (added #8)

aaron morton Thu, 31 Mar 2011 15:42:30 -0700

It does not have a yaml file, so I am assuming it uses the default
RandomPartitioner.


Aaron
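Aaron's point bears on requirement 6 further down the thread. Under RandomPartitioner, a row's placement depends on an MD5 hash of its key rather than on the key's order, so even sequential time-based keys spread across the ring. The Python sketch below assumes the 0.7/0.8 behaviour of taking the absolute value of the key's MD5 digest read as a signed 128-bit integer. `random_partitioner_token`, `owning_node`, `distribute` and the evenly spaced four-node ring are hypothetical illustrations, not Cassandra's API.

```python
import bisect
import hashlib
from collections import Counter

# RandomPartitioner tokens fall in the range [0, 2**127].
MAX_TOKEN = 2 ** 127


def random_partitioner_token(key: bytes) -> int:
    """Sketch of a RandomPartitioner token: |MD5(key) read as a signed 128-bit int|."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owning_node(token: int, ring: list[int]) -> int:
    """Index (in sorted order) of the node whose token range contains `token`.

    A node owns (previous node's token, its own token]; tokens above the
    largest node token wrap around to the first node.
    """
    ordered = sorted(ring)
    return bisect.bisect_left(ordered, token) % len(ordered)


# Hypothetical 4-node ring with evenly spaced initial tokens.
RING = [i * MAX_TOKEN // 4 for i in range(4)]


def distribute(num_keys: int) -> Counter:
    """Count how many sequential 64-bit keys each node would own."""
    counts = Counter()
    for i in range(num_keys):
        key = i.to_bytes(8, "big")  # stands in for a time-based 64-bit id
        counts[owning_node(random_partitioner_token(key), RING)] += 1
    return counts


if __name__ == "__main__":
    print(distribute(10_000))
```

With evenly spaced tokens, the sequential keys land in roughly equal shares on each node, which is the "no hot nodes" property asked for in requirement 6. An order-preserving partitioner would instead place adjacent keys next to each other on the ring.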

On 1 Apr 2011, at 04:51, Drew Kutcharian wrote:

> Thanks Aaron,
> 
> I have already checked out Twissandra. I was mainly looking to see how
> Secondary Indexes can be used and how they affect Data Modeling. There
> doesn't seem to be a lot of coverage of them.
> 
> In addition, I couldn't tell which Partitioner Twissandra is using and
> why.
> 
> cheers,
> 
> Drew
> 
> 
> On Mar 31, 2011, at 5:53 AM, aaron morton wrote:
> 
>> Drew, 
>>      The Twissandra project is a Twitter clone built on Cassandra; it may
>> give you some insight into how things can be modelled:
>> https://github.com/thobbs/twissandra
>> 
>>      If you are just starting then consider something like...
>> 
>>      - CF to hold the user, their data and their network links
>>      - standard CF to hold a blog entry, where the key is a timestamp
>>      - standard CF to hold blog comments, one column per comment, where
>> the column name is a long timestamp
>>      - standard CF to hold the blogs for a user, where the key is the user
>> id and each column is a blog key
>> 
>> That's not a great schema, but it's a simple starting point you can build
>> on and refine using things like secondary indexes and doing more or less in
>> the same CF.
>> 
>> Good luck. 
>> Aaron
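Aaron's four column families can be mocked up in memory to check the access patterns against requirements 3 and 4 before touching a cluster. This Python sketch is hypothetical: plain dicts stand in for column families, sorting on read stands in for Cassandra's column comparator, and blog keys are assumed to be time-ordered integers (timestamps as Aaron suggests, or Snowflake IDs as Drew plans). None of the function names belong to a Cassandra client library.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins: each "column family" maps a row key to a
# dict of {column name: value}. Cassandra keeps columns sorted by the CF's
# comparator; here we sort on read to mimic that.
users = defaultdict(dict)          # row key: user id; columns: profile data + network links
blogs = defaultdict(dict)          # row key: time-ordered blog key; columns: blog fields
blog_comments = defaultdict(dict)  # row key: blog key; column name: long timestamp
user_blogs = defaultdict(dict)     # row key: user id; column name: blog key


def link_users(user_id, other_id):
    # Network links stored as extra "link:"-prefixed columns on the user's row.
    users[user_id]["link:" + other_id] = b""


def linked_users(user_id):
    return [name[len("link:"):] for name in sorted(users[user_id]) if name.startswith("link:")]


def add_blog(user_id, blog_key, title, body):
    blogs[blog_key] = {"author": user_id, "title": title, "body": body}
    user_blogs[user_id][blog_key] = b""  # value unused; the column name is the index


def add_comment(blog_key, timestamp, text):
    blog_comments[blog_key][timestamp] = text


def blogs_for_user(user_id):
    # Requirement 3: newest first, like a reversed column slice.
    return sorted(user_blogs[user_id], reverse=True)


def comments_for_blog(blog_key):
    # Requirement 4: oldest first, like a forward column slice.
    row = blog_comments[blog_key]
    return [row[ts] for ts in sorted(row)]
```

Storing the blog key as the column *name* rather than the value is what lets the comparator do the ordering; the column value can stay empty.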
>> 
>> On 30 Mar 2011, at 15:13, Drew Kutcharian wrote:
>> 
>>> I'm pretty new to Cassandra and I would like to get your advice on
>>> modeling. The object model of the project that I'm working on will be
>>> pretty close to Blogger, Tumblr, etc. (or any other blogging website):
>>> Users can each have many Blogs, and each Blog can have many Comments.
>>> How would you model this efficiently, considering the following:
>>> 
>>> 1) Be able to directly link to a User
>>> 2) Be able to directly link to a Blog
>>> 3) Be able to query and get all the Blogs for a User ordered by time 
>>> created descending (new blogs first)
>>> 4) Be able to query and get all the Comments for each Blog ordered by time 
>>> created ascending (old comments first)
>>> 5) Be able to link different Users to each other, as a network.
>>> 6) Have a well distributed hash so we don't end up with "hot" nodes, while 
>>> the rest of the nodes are idle
>>> 7) It would be nice to show a User how many Blogs they have or how many
>>> comments are on a Blog, without iterating through the whole dataset.
>>> NEW: 8) Be able to query for the most recently added Blogs. For example, 
>>> Blogs added today, this week, this month, etc.
>>> 
>>> The target Cassandra version is 0.8, so that we can use Secondary Indexes.
>>> The goal is to be very efficient, so no text keys. We were thinking of
>>> using time-based 64-bit IDs generated with Snowflake.
>>> 
>>> Thanks,
>>> 
>>> Drew
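Snowflake packs a millisecond timestamp into the high bits of a 64-bit ID, so IDs sort by creation time; that property is what would let them serve as time-ordered row keys and column names for requirements 3 and 4. The Python generator below is a simplified, single-process sketch assuming the layout Twitter published for Snowflake (41-bit timestamp since a custom epoch, 10-bit machine ID, 12-bit per-millisecond sequence). `SnowflakeLikeGenerator` and `timestamp_ms` are hypothetical names, not Snowflake's own code.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # custom epoch used by Twitter's Snowflake
WORKER_BITS = 10                  # simplified: one 10-bit machine id
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    """Hypothetical Snowflake-style id generator (timestamp | worker | sequence)."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - TWITTER_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


def timestamp_ms(snowflake_id):
    """Recover the creation time (Unix ms) encoded in the high bits."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + TWITTER_EPOCH_MS
```

Because the timestamp occupies the high bits, sorting these IDs numerically sorts them by creation time to the millisecond. Within a single millisecond the worker and sequence bits break ties, so ordering between IDs from different machines in the same millisecond is arbitrary.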
>> 
>
