Re: Secondary indexes for multi-value fields

Prasad Sunkari Wed, 22 Dec 2010 11:05:33 -0800


I will frame my question in a different way.

Each user in my system subscribes to updates from selected other users (updates are aggregated from outside) and tags the users to whom he/she is subscribed.

In my current design, I have a column family called "Followers", keyed by userid, in which each column name is the userid of another user following the first user. There is also a super column family called "Subscriptions", again keyed by userid, in which each super column name is the userid of the user to whose updates the "key" is subscribed; the columns contain the tags.

Obviously I use the tags in lots of places and need a reverse index on tags (the list of subscriptions that have a given tag). This is done by maintaining another column family, "SubscriptionsByTag".
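
To make the layout concrete, here is a minimal in-memory sketch in Python that uses plain dicts in place of Cassandra column families. The user ids and tag names are made up, and the shape of "SubscriptionsByTag" (tag -> set of (subscriber, target) pairs) is an assumption, since the message does not spell it out.

```python
from collections import defaultdict

# Followers: row key = userid, column name = userid of a follower.
followers = defaultdict(set)
# Subscriptions: row key = userid, super column name = userid being followed,
# columns inside the super column = tags.
subscriptions = defaultdict(lambda: defaultdict(set))
# SubscriptionsByTag: hand-maintained reverse index (layout assumed here).
subscriptions_by_tag = defaultdict(set)


def subscribe(subscriber, target, tags):
    """Record a subscription and keep all three structures in step."""
    followers[target].add(subscriber)
    subscriptions[subscriber][target].update(tags)
    for tag in tags:
        subscriptions_by_tag[tag].add((subscriber, target))


subscribe("alice", "bob", {"work", "friends"})
subscribe("alice", "carol", {"work"})
subscribe("dave", "bob", {"friends"})
```

Every subscription touches three structures, which is the maintenance cost the question is about.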

Now, with the advent of secondary indexes in 0.7, can I redesign it to make it a little simpler? Maybe avoid having to maintain the reverse index for tags?

I do understand that secondary indexes are not supported for super columns. So, can I make "Subscriptions" a column family where userid maps to a comma separated list of tags? Is it possible, out of the box or by implementing some interface, to have a secondary index over such multi-valued columns?
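
As a rough illustration of why a comma separated value is awkward to index, the sketch below models an index that only supports exact-match lookups on the whole column value, which is my understanding of how the built-in 0.7 indexes behave. It is plain Python, not Cassandra client code, and the row keys and tags are hypothetical.

```python
from collections import defaultdict


def build_equality_index(rows):
    """Map each complete column value to the set of row keys holding it."""
    index = defaultdict(set)
    for key, value in rows.items():
        index[value].add(key)
    return index


# Hypothetical rows: subscription key -> comma separated tags.
rows = {
    "alice:bob": "work,friends",
    "alice:carol": "work",
}
index = build_equality_index(rows)

# An exact-match lookup only finds rows whose entire value is "work".
whole_value_hits = index.get("work", set())
# What the application actually wants: every row that carries the tag.
per_tag_hits = {k for k, v in rows.items() if "work" in v.split(",")}
```

Under that assumption, the lookup for "work" misses the row whose value is "work,friends", so the multi-valued column would still need one indexed entry per tag.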

What, in general, would be the best practice for such multi-valued fields on which I need a secondary index too? (Joss's reply confused me. Am I right in thinking that range slices are only for retrieving values for a contiguous set of keys and not really for secondary indexes?)


[Sorry if I seem too naive]

Thanks,
Prasad

On 12/22/2010 09:47 PM, Anand Somani wrote:

One approach is to ask yourself questions as to how you would use this information, for example:


  * how often do you go from user to tags
  * how often would you want to go from tag->users.
  * What kind of reporting would you want to do on tags and how often
  * Can multiple people add the same tag to the same user, are they
    maintained separately
  * Given your business, how many users do you expect
  * etc.

Depending on the answers, one approach might work better than the other. I have not used indexes or non-id-based searches in Cassandra yet (I do not have that use case), so this is just based on the time I have spent reading about it.

One approach using indexes was given by Jool; the other approach is using reverse indexes:


  * 2 CF - one for user and one for tags (reverse index)
  * User - might need to have a SC - with tags and some information
    like who tagged it
  * Tag - tag to column of users
  * Advantages
      o 1 query to find user->tags on user CF
      o tag->users - on tag CF (I would think this would be more
        efficient than user->tags since that will potentially hit
        multiple rows/nodes, unless I have misunderstood secondary
        indexes)
  * Disadvantages
      o Need to write to a couple of CFs, but writes are relatively
        cheaper than reads in Cassandra
      o Since you update 2 CFs and there are no transactions, one write
        might succeed and the other might fail

Even with the other suggestion of indexes, you can still add the tag->users CF.
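
The two-CF approach and its main disadvantage can be sketched roughly as follows. This is an in-memory Python model, not Cassandra client code; the names `user_cf`, `tag_cf`, `tag_user` and `find_inconsistencies`, and the `fail_second_write` flag used to simulate a failed write, are all hypothetical.

```python
from collections import defaultdict

user_cf = defaultdict(lambda: defaultdict(dict))  # user -> tag -> info (who tagged it)
tag_cf = defaultdict(set)                         # tag -> users carrying that tag


def tag_user(user, tag, tagged_by, fail_second_write=False):
    """Apply a tag with two independent writes, one per column family."""
    user_cf[user][tag] = {"tagged_by": tagged_by}
    if fail_second_write:
        # Simulates the second write failing; nothing rolls back the first.
        raise IOError("write to tag CF failed")
    tag_cf[tag].add(user)


def find_inconsistencies():
    """Return (user, tag) pairs present in user_cf but missing from tag_cf."""
    return sorted(
        (user, tag)
        for user, tags in user_cf.items()
        for tag in tags
        if user not in tag_cf.get(tag, set())
    )


tag_user("bob", "admin", tagged_by="alice")
try:
    tag_user("carol", "admin", tagged_by="alice", fail_second_write=True)
except IOError:
    pass

missing = find_inconsistencies()
```

Because there is no transaction across the two writes, the application would need something like a retry or a periodic repair pass over pairs such as those in `missing`.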

On Wed, Dec 22, 2010 at 4:54 AM, Prasad Sunkari <s.pra...@gmail.com> wrote:



    Hi all,

    I have a column family for users of my system and I need to have
    tags set to these users.  My current plan is to have a column that
    holds a string (comma separated tags).

    I am not clear if this is the best way to do it, especially because
    it may lead to complications when more than one administrator
    is trying to tag the same user (lost updates), as well as with the
    secondary indexes (if I wanted to use the built-in secondary
    indexes).  I am also not sure if it is possible to have a
    secondary index on a multi-valued column!

    Another alternative is to have it in a super column with each tag
    being a column by itself and let my application take care of the
    secondary indexes.

    I am currently of the opinion that the second solution is the only
    thing that I could do.
    Any suggestions?  Since this is my first app on Cassandra I am
    trying to see if my opinion is correct.

    Thanks,
    Prasad
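
The lost-update concern with a comma separated string can be illustrated with a small Python sketch. It is only a model of two administrators doing read-modify-write against the same value; the helper `add_tag_csv` and the tag names are hypothetical.

```python
def add_tag_csv(read_value, tag):
    """Read-modify-write on a comma separated string of tags."""
    tags = [t for t in read_value.split(",") if t]
    if tag not in tags:
        tags.append(tag)
    return ",".join(tags)


# Two administrators read the same starting value, then both write back.
stored = "vip"
seen_by_admin1 = stored
seen_by_admin2 = stored
stored = add_tag_csv(seen_by_admin1, "beta")   # admin 1 writes "vip,beta"
stored = add_tag_csv(seen_by_admin2, "staff")  # admin 2 overwrites it
csv_result = stored

# One column per tag: each administrator writes only the column being added.
columns = {"vip": ""}
columns["beta"] = ""    # admin 1
columns["staff"] = ""   # admin 2
column_result = sorted(columns)
```

With the single string, admin 1's "beta" tag is lost; with one column per tag, both additions survive because neither write replaces the other's column.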
