A couple of days ago I came across Countandra ( http://countandra.org/ ). It seems that it might be a solution for you.
Gr. Robin

2012/1/20 Tamar Fraenkel <ta...@tok-media.com>

> Hi!
>
> I am a newbie to Cassandra and am seeking advice on the data model that would best address my needs.
>
> For simplicity, what I want to accomplish is:
>
> I have a system that has users (potentially ~10,000 per day), and they perform actions in the system (~50,000 a day in total).
>
> Each user’s action takes place at a certain point in time, and is also classified into categories (1 to 5) and tagged with 1-30 tags. Each of an action’s categories and tags has a score associated with it; the score is between 0 and 1 (let’s assume a precision of 0.0001).
>
> I want to be able to identify similar actions in the system (usually performed by more than one user). Similarity of actions is calculated from their common categories and tags, taking scores into account.
>
> I need the system to store:
>
> - The list of my users, with attributes like name, age, etc.
> - For each action: the categories and tags associated with it and their scores, the time of the action, and the user who performed it.
> - Groups of similar actions (ActionGroups): the ids of the actions in the group, and the categories and tags describing the group, with their scores. Those are calculated using an algorithm that takes into account the categories and tags of the actions in the group.
>
> When a user performs a new action in the system, I want to add it to a fitting ActionGroup (one with similar categories and tags).
>
> For this I need to be able to perform the following:
>
> Find all the recent ActionGroups (those that were updated with actions performed during the last T minutes) that have at least one of the new action’s categories AND at least one of the new action’s tags.
>
> I thought of two ways to address the issue, and I would appreciate your insights.
>
> First solution: using secondary indexes
>
> Column Family: *Users*
> Key: userId
> Compare with BytesType
> Columns: name: <>, age: <>, etc.
>
> Column Family: *Actions*
> Key: actionId
> Compare with BytesType
> Columns: Category1: <Score>, …, CategoryN: <Score>,
>          Tag1: <Score>, …, TagK: <Score>,
>          Time: timestamp,
>          user: userId
>
> Column Family: *ActionGroups*
> Key: actionGroupId
> Compare with BytesType
> Columns: Category1: <Score>, …, CategoryN: <Score>,
>          Tag1: <Score>, …, TagK: <Score>,
>          lastUpdateTime: timestamp,
>          actionId1: null, …, actionIdM: null
>
> I will then define a secondary index on each tag column, each category column, and the update time column.
>
> Let’s assume the new action I want to add to an ActionGroup has NewActionCategory1 - NewActionCategoryK and NewActionTag1 - NewActionTagN. I will perform the following query:
>
> Select * From ActionGroups where
> (NewActionCategory1 > 0 … or NewActionCategoryK > 0) and
> (NewActionTag1 > 0 … or NewActionTagN > 0) and
> lastUpdateTime > T;
>
> Second solution
>
> Have the same CFs as in the first solution *without the secondary index*, and have two additional CFs:
>
> Column Family: *CategoriesToActionGroupId*
> Key: categoryId
> Compare with BytesType
> Columns: {Timestamp, ActionGroupsId1}: null,
>          {Timestamp, ActionGroupsId2}: null,
>          ...
> * Timestamp is the update time of the ActionGroup.
>
> A similar CF (TagsToActionGroupId) will be defined for tags.
>
> I will then be able to run several queries on CategoriesToActionGroupId (one for each of the new action’s categories), with a column slice covering the relevant ActionGroup update times.
>
> I will do the same for TagsToActionGroupId.
>
> I will then use my client code to remove duplicates (ActionGroups that are associated with more than one tag or category).
>
> My questions are:
>
> 1. Are the two solutions viable? If yes, which is better?
> 2. Is there any better way of doing this?
> 3. Can I use JDBC and CQL with both methods, or do I have to use Hector? (I am using Java.)
>
> Thanks
>
> Tamar
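
For readers following the thread, the client-side logic of the second solution in the quoted message can be sketched roughly as below. This is only an in-memory illustration in Python, not Cassandra or Hector code: the class TimeSlicedIndex and the function find_candidate_groups are hypothetical names made up for the sketch, a sorted list stands in for a row whose column names are {Timestamp, ActionGroupId} pairs, and the time slice is assumed to be inclusive of the cutoff. Stale entries left behind when a group is updated again are not cleaned up here.

```python
import bisect
from collections import defaultdict


class TimeSlicedIndex:
    """Hypothetical in-memory stand-in for a CF such as CategoriesToActionGroupId.

    Each row key (a category id or tag id) holds column names of the form
    (timestamp, action_group_id), kept in sorted order.
    """

    def __init__(self):
        self._rows = defaultdict(list)

    def add(self, row_key, timestamp, group_id):
        # Insert the column name at its sorted position; the column value is unused.
        bisect.insort(self._rows[row_key], (timestamp, group_id))

    def groups_since(self, row_key, since):
        # Column slice: every column whose timestamp is >= since (assumed inclusive).
        columns = self._rows.get(row_key, [])
        start = bisect.bisect_left(columns, (since,))
        # Building a set removes duplicate group ids within the row.
        return {group_id for _, group_id in columns[start:]}


def find_candidate_groups(category_index, tag_index, categories, tags, since):
    """Return ids of groups updated at or after `since` that share at least one
    category AND at least one tag with the new action."""
    by_category = set()
    for category in categories:  # one slice query per category
        by_category |= category_index.groups_since(category, since)

    by_tag = set()
    for tag in tags:  # one slice query per tag
        by_tag |= tag_index.groups_since(tag, since)

    # The AND between the category condition and the tag condition.
    return by_category & by_tag
```

In this sketch the sets do the duplicate removal that the quoted message assigns to client code, and the final intersection expresses the "at least one category AND at least one tag" condition. Scores are ignored; they would only matter in the later similarity step.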