But What about: Rainbird?
On Sat, Jan 21, 2012 at 10:52 AM, R. Verlangen <ro...@us2.nl> wrote: > > A couple of days ago I came across Countandra ( http://countandra.org/ ). It > seems that it might be a solution for you. > > Gr. Robin > > > 2012/1/20 Tamar Fraenkel <ta...@tok-media.com> >> >> Hi! >> >> I am a newbie to Cassandra and seeking some advice regarding the data model >> I should use to best address my needs. >> >> For simplicity, what I want to accomplish is: >> >> I have a system that has users (potentially ~10,000 per day) and they >> perform actions in the system (total of ~50,000 a day). >> >> Each User’s action is taking place in a certain point in time, and is also >> classified into categories (1 to 5) and tagged by 1-30 tags. Each action’s >> Categories and Tags has a score associated with it, the score is between 0 >> to 1 (let’s assume precision of 0.0001). >> >> I want to be able to identify similar actions in the system (performed >> usually by more than one user). Similarity of actions is calculated based on >> their common Categories and Tags taking scores into account. >> >> I need the system to store: >> >> The list of my users with attributes like name, age etc >> For each action – the categories and tags associated with it and their >> score, the time of the action, and the user who performed it. >> Groups of similar actions (ActionGroups) – the id’s of actions in the group, >> the categories and tags describing the group, with their scores. Those are >> calculated using an algorithm that takes into account the categories and >> tags of the actions in the group. >> >> When a user performs a new action in the system, I want to add it to a >> fitting ActionGroups (with similar categories and tags). >> >> For this I need to be able to perform the following: >> >> Find all the recent ActionGroups (those who were updated with actions >> performed during the last T minutes), who has at list one of the new >> action’s categories AND at list one of the new action’s tags. >> >> >> >> I thought of two ways to address the issue and I would appreciate your >> insights. >> >> >> >> First one using secondary indexes >> >> Column Family: Users >> >> Key: userId >> >> Compare with Bytes Type >> >> Columns: name: <>, age: <> etc… >> >> >> >> Column Family: Actions >> >> Key: actionId >> >> Compare with Bytes Type >> >> Columns: Category1 : <Score> …. >> >> CategoriN: <Score>, >> >> Tag1 : <Score>, …. >> >> TagK:<Score> >> >> Time: timestamp >> >> user: userId >> >> >> >> Column Family: ActionGroups >> >> Key: actionGroupId >> >> Compare with Bytes Type >> >> Columns: Category1 : <Score> …. >> >> CategoriN: <Score>, >> >> Tag1 : <Score> …. >> >> TagK:<Score> >> >> lastUpdateTime: timestamp >> >> actionId1: null, … , >> >> actionIdM: null >> >> >> >> I will then define secondary index on each tag columns, category columns, >> and the update time column. >> >> Let’s assume the new action I want to add to ActionGroup has >> NewActionCategory1 - NewActionCategoryK, and has NewActionTag1 – >> NewActionTagN. I will perform the following query: >> >> Select * From ActionGroups where >> >> (NewActionCategory1 > 0 … or NewActionCategoryK > 0) and >> >> (NewActionTag1 > 0 … or NewActionTagN > 0) and >> >> lastUpdateTime > T; >> >> >> >> Second solution >> >> Have the same CF as in the first solution without the secondary index , and >> have two additional CF-ies: >> >> Column Family: CategoriesToActionGroupId >> >> Key: categoryId >> >> Compare with ByteType >> >> Columns: {Timestamp, ActionGroupsId1 } : null >> >> {Timestamp, ActionGroupsId2} : null >> >> ... >> >> *timestamp is the update time for the ActionGroup >> >> >> >> A similar CF will be defined for tags. >> >> >> >> I will then be able to run several queries on CategoriesToActionGroupId (one >> for each of the new story Categories), with column slice for the right >> update time of the ActionGroup. >> >> I will do the same for the TagsToActionGroupId. >> >> I will then use my client code to remove duplicates (ActionGroups who are >> associated with more than one Tag or Category). >> >> >> >> My questions are: >> >> Are the two solutions viable? If yes, which is better >> Is there any better way of doing this? >> Can I use jdbc and CQL with both method, or do I have to use Hector (I am >> using Java). >> >> Thanks >> >> Tamar >> >> >> >> > >