Thoughts?

On Tue, Nov 6, 2012 at 3:58 AM, Ertio Lew <ertio...@gmail.com> wrote:
> I need to store (1) posts written by users, (2) activity data by other
> users on these posts, and (3) some counters for each post, such as view
> counts, like counts, etc. So for each post there are 3 categories of
> data: the original post data, which is stored in one CF using a single
> row per post; the counters data, using 1 row per post in a counters-type
> CF; and the activity data, where each user stores his own activity
> column for each post he reacted to and also stores the activity data of
> all his friends in a dedicated row for every user.
>
> So here is my current schema plan:
>
> For Posts:
> -------------
> 1 CF with a single row for each post
>
> For Counters:
> ------------------
> 1 CF with a single row for each post
>
> For Activities Data:
> ---------------------------
> 1 CF with a single row for each user
>
> Now, to show a post at any time I need all 3 categories of data, so I'm
> forced to read 3 CFs. So I have been wondering whether I should merge
> this data into a single CF, as a materialized view in a single row, so
> that read queries can be made more efficiently.
>
> Here is the idea I have:
>
> For each post I would store the post data (written once, never updated)
> plus the activity data of all users on that post (written for each user
> at different times, and possibly edited many times) in a single row.
> Using the activity data of all users I can calculate all the counters
> (by iterating over the activity columns), so I don't need to store them
> explicitly. So to read some 10 posts at a time, I just need to read 10
> rows. I would also set a reasonable limit on the number of columns to
> read, so that if a post has too many activity columns I don't have to
> read them all; in those (less frequent) cases I would perform a second
> query to read the counters from another CF. So most of the time I would
> be reading from a single CF and a single row for each post.
>
> But another issue is that this single row will contain the activity of
> several users (each column added to the row at a different time), so the
> row might end up spread across many SSTables.
>
> So which schema is better for me with respect to performance, the 1st
> one or the 2nd?
>
> Thanks.
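
To make the second schema concrete, here is a minimal sketch of the read
path described above, using plain Python dicts as stand-ins for the two CFs.
It is not Cassandra client code. The column prefixes ("meta:" for post data,
"user:" for activity), the function name read_post and the default column
limit are all assumptions made for illustration. The sketch only counts
logical reads per post; it says nothing about how many SSTables a read
touches.

```python
from collections import Counter

# Hypothetical column-name prefixes. "meta:" sorts before "user:", so the
# post data comes first in a sorted column slice.
META = "meta:"
USER = "user:"


def read_post(posts_cf, counters_cf, post_id, column_limit=100):
    """Return (post_data, counters, reads) for one post.

    posts_cf:    {post_id: {column_name: value}}, one wide row per post
    counters_cf: {post_id: {counter_name: count}}, the fallback counters CF
    """
    row = posts_cf[post_id]
    # Emulate a column slice with a limit over columns sorted by name.
    sliced = sorted(row.items())[:column_limit]
    post_data = {name[len(META):]: value
                 for name, value in sliced if name.startswith(META)}

    if len(sliced) < column_limit:
        # The slice was not cut off, so every activity column was read and
        # the counters can be derived by iterating over them.
        counters = Counter(value for name, value in sliced
                           if name.startswith(USER))
        return post_data, dict(counters), 1

    # The slice may have been truncated: fall back to a second read from
    # the counters CF, as in the "less often" case described above.
    return post_data, dict(counters_cf[post_id]), 2
```

Note that a slice returning exactly column_limit columns is treated as
possibly truncated, since the reader cannot tell the difference without
asking for one extra column.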