The fact that subcolumns inside a supercolumn aren't currently indexed may hurt here: whenever a small number (10-20) of subcolumns needs to be retrieved from a supercolumn holding a large list of subcolumns, like MyPostsIdKeysList, the whole supercolumn has to be read and deserialized to get at them.
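To make the cost concrete, here is a rough toy model in Python. It is not Cassandra code and does not use any real client API; the function names, the list-of-pairs layout, and the dict used for the "indexed" case are all assumptions made up for illustration. It only counts how many entries get touched when 15 post ids are fetched out of 10,000, once by scanning an unindexed list (roughly what a big supercolumn read looks like) and once through an idealized per-name lookup.

```python
# Toy cost model only. Nothing here is a real Cassandra API; names and
# data layout are hypothetical and chosen purely for illustration.
from typing import Dict, List, Tuple


def read_unindexed(subcolumns: List[Tuple[str, str]],
                   wanted: List[str]) -> Tuple[Dict[str, str], int]:
    """Scan every subcolumn to find the wanted ones (no index available)."""
    wanted_set = set(wanted)
    found: Dict[str, str] = {}
    touched = 0
    for name, value in subcolumns:
        touched += 1  # every entry is visited, wanted or not
        if name in wanted_set:
            found[name] = value
    return found, touched


def read_indexed(columns: Dict[str, str],
                 wanted: List[str]) -> Tuple[Dict[str, str], int]:
    """Idealized lookup by name (a plain dict stands in for an index)."""
    found: Dict[str, str] = {}
    touched = 0
    for name in wanted:
        touched += 1  # only the requested names are visited
        if name in columns:
            found[name] = columns[name]
    return found, touched


# Hypothetical data: 10,000 post ids stored under one user.
posts = [(f"post{i:05d}", f"PostIdKey-{i}") for i in range(10_000)]
wanted = [f"post{i:05d}" for i in range(100, 115)]  # 15 subcolumns

found_scan, touched_scan = read_unindexed(posts, wanted)
found_idx, touched_idx = read_indexed(dict(posts), wanted)

print("unindexed scan touched:", touched_scan)
print("indexed lookup touched:", touched_idx)
```

Both reads return the same 15 entries; the difference is only in how much data has to be walked to get them, which is the part that grows with the size of the list.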
On Fri, Jan 7, 2011 at 9:58 PM, Raj <rajkumar....@gmail.com> wrote:
> My question is in context of a social network schema design
>
> I am thinking of following schema for storing a user's data that is
> required as he logs in & is led to his homepage:-
> (I aimed at a schema design such that through a single row read query
> all the data that would be required to put up the homepage of that
> user, is retreived.)
>
> UserSuperColumnFamily: { // Column Family
>
> UserIDKey:
> {columns: MyName, MyEmail, MyCity,...etc
> supercolumns: MyFollowersList, MyFollowiesList, MyPostsIdKeysList,
> MyInterestsList, MyAlbumsIdKeysList, MyVideoIdKeysList,
> RecentNotificationsForUserList, MessagesReceivedList,
> MessagesSentList, AccountSettingsList, RecentSelfActivityList,
> UpdatesFromFollowiesList
> }
> }
>
> Thus user's newfeed would be generated using superColumn:
> UpdatesFromFollowiesList. But the UpdatesFromFollowiesList, would
> obviously contain only Id of the posts and not the entire post data.
>
> Questions:
>
> 1.) What could be the problems with this design, any improvements ?
>
> 2.) Would frequent & heavy overwrite operations/ row mutations (for
> example; when propagating the post updates for news-feed from some
> user to all his followies) which leads to rows ultimately being in
> several SSTables, will lead to degraded read performance ?? Is it
> suitable to use row Cache(too big row but all data required uptil user
> is logged in) If I do not use cache, it may be very expensive to pull
> the row each time a data is required for the given user since row
> would be in several sstables. How can I improve the
> read performance here
>
> The actual data of the posts from network would be retrieved using
> PostIdKey through subsequent read queries from columnFamily
> PostsSuperColumnFamily which would be like follows:
>
> PostsSuperColumnFamily:{
>
> PostIdKey:
> {
> columns: PostOwnerId, PostBody
> supercolumns: TagsForPost {list of columns of all tags for the
> post}, PeopleWhoLikedThisPost {list of columns of UserIdKey of all the
> likers}
> }
> }
>
> Is this the best design to go with or are there any issues to consider
> here ? Thanks in anticipation of your valuable comments.!
>