I will frame my question in a different way.

Each user in my system subscribes to updates from selected other users (updates are aggregated from outside) and tags the users to whom he/she is subscribed.

In my current design, I have a column family called "Followers", keyed by userid, in which each column name is the userid of another user following the first user. Another super column family, "Subscriptions", is also keyed by userid; each super column name is the userid of a user to whose updates the "key" user is subscribed, and the columns hold the tags.

Obviously I use the tags in lots of places and need a reverse index on tags (the list of subscriptions that have a given tag). This is done by maintaining another column family, "SubscriptionsByTag".
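The design described above can be sketched roughly as follows. This is only an in-memory model using plain Python dicts in place of column families, not Cassandra client code. The `subscribe` helper and the choice to key the reverse index by (subscriber, tag) are assumptions made for illustration; the actual row key of "SubscriptionsByTag" is not stated here.

```python
from collections import defaultdict

# Plain dicts standing in for column families (this is not a Cassandra client).
followers = defaultdict(set)              # followed userid -> follower userids
subscriptions = defaultdict(dict)         # subscriber userid -> {followed userid: set of tags}
subscriptions_by_tag = defaultdict(set)   # (subscriber userid, tag) -> followed userids (assumed keying)


def subscribe(subscriber, followed, tags):
    """Record a subscription and keep both derived views in sync (hypothetical helper)."""
    old_tags = subscriptions[subscriber].get(followed, set())
    new_tags = set(tags)
    followers[followed].add(subscriber)
    subscriptions[subscriber][followed] = new_tags
    # Drop reverse-index entries for tags that were removed from this subscription.
    for tag in old_tags - new_tags:
        subscriptions_by_tag[(subscriber, tag)].discard(followed)
    # Add reverse-index entries for the current tags.
    for tag in new_tags:
        subscriptions_by_tag[(subscriber, tag)].add(followed)
```

Every change to a subscription's tags touches the reverse index as well, which is the maintenance burden I would like to avoid.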

Now, with the advent of secondary indexes in 0.7 can I redesign it to make it a little simpler? Maybe avoid having to maintain the reverse index for tags?

I do understand that secondary indexes are not supported for super columns. So, can I make "Subscriptions" a column family where userid maps to a comma-separated list of tags? Is it possible, out of the box or by implementing some interface, to have a secondary index over such multi-valued columns?
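To make the concern concrete, here is a small sketch of what I would expect if an index matches on the whole column value, which is my understanding of how the built-in indexes behave. The index builder below is a hypothetical stand-in written with plain dicts, not Cassandra's implementation, and the row data is made up.

```python
from collections import defaultdict


def build_exact_match_index(rows, column):
    """Map each full column value to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(key)
    return index


# Made-up rows: each key holds one comma-separated "tags" column.
rows = {
    "alice": {"tags": "news,tech"},
    "bob": {"tags": "tech"},
}

index = build_exact_match_index(rows, "tags")

# An equality lookup only finds rows whose entire value is "tech".
tech_exact = index.get("tech", set())

# What I actually want: every row that carries the tag "tech".
tech_wanted = {key for key, cols in rows.items() if "tech" in cols["tags"].split(",")}
```

In this model the equality lookup misses "alice", whose value is "news,tech", which is why a comma-separated column does not look indexable by tag as it stands.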

What, in general, would be the best practices for such multi-valued fields on which I also need a secondary index? (Joss's reply confused me. Am I right in thinking that range slices are only for retrieving values for a contiguous set of keys and not really for secondary indexes?)

[Sorry if I seem too naive]

Thanks,
Prasad

On 12/22/2010 09:47 PM, Anand Somani wrote:


One approach is to ask yourself questions as to how you would use this information, for example

  * how often do you go from user to tags
  * how often would you want to go from tag->users.
  * What kind of reporting would you want to do on tags and how often
  * Can multiple people add the same tag to the same user, are they
    maintained separately
  * Given your business, how many users do you expect
  * etc.

Depending on the answers, one approach might work better than the other. I have not used indexes or non-id-based searches in Cassandra yet (I do not have that use case), so this is just based on the time I have spent reading about it.

One approach, using indexes, was given by Jool; the other approach is to use reverse indexes:

  * 2 CFs - one for users and one for tags (reverse index)
  * User - might need to have a SC - with tags and some information
    like who tagged it
  * Tag - tag to column of users
  * Advantages:
      o 1 query to find user->tags on user CF
      o tag->users - on tag CF (I would think this would be more
        efficient than user->tags since that will potentially hit
        multiple rows/nodes, unless I have misunderstood secondary
        indexes)
  * Disadvantages:
      o Need to write to a couple of CFs, but writes are relatively
        cheap compared to reads in Cassandra
      o Since you update 2 CFs and there are no transactions, one
        write might succeed and the other might fail

Even with the other suggestion of indexes you can still add the tag->users.



On Wed, Dec 22, 2010 at 4:54 AM, Prasad Sunkari <s.pra...@gmail.com> wrote:


    Hi all,

    I have a column family for users of my system and I need to have
    tags set on these users.  My current plan is to have a column that
    holds a string (comma-separated tags).

    I am not clear if this is the best way to do it, especially because
    it may lead to complications when more than one administrator
    is trying to tag the same user (lost updates), as well as with the
    secondary indexes (if I wanted to use the built-in secondary
    indexes).  I also am not sure if it is possible to have a
    secondary index on a multi-valued column!

    Another alternative is to have it in a super column with each tag
    being a column by itself and let my application take care of the
    secondary indexes.

    I am currently of the opinion that the second solution is the only
    thing that I could do.
    Any suggestions?  Since this is my first app on Cassandra I am
    trying to see if my opinion is correct.

    Thanks,
    Prasad


