I will frame my question in a different way.
Each user in my system subscribes to updates from selected other users
(updates are aggregated from outside) and tags the users he/she
is subscribed to.
In my current design, I have a column family called "Followers" keyed by
userid in which each column name is the userid of another user following
the first user. I also have a super column family called "Subscriptions",
again keyed by userid, in which each super column name is the userid of
a user whose updates the row-key user is subscribed to; the columns
inside each super column hold the tags.
Obviously I use the tags in lots of places and need a reverse index
on tags (the list of subscriptions that have a given tag). I maintain
it in another column family, "SubscriptionsByTag".
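To make the current design concrete, here is a minimal sketch that models the three column families as plain Python dictionaries (this is not Cassandra client code). The function names are hypothetical, and so is the row key of "SubscriptionsByTag": since each user tags their own subscriptions, the sketch assumes it is "subscriber:tag". It shows the extra write the application must make to keep the reverse index in step with "Subscriptions".

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the column families:
# row key -> {column name -> value}. "Subscriptions" is a super CF,
# so it has one more level: row key -> {super column -> {column -> value}}.
followers = defaultdict(dict)             # followed userid -> {follower userid: ""}
subscriptions = defaultdict(dict)         # subscriber -> {target userid: {tag: ""}}
subscriptions_by_tag = defaultdict(dict)  # "subscriber:tag" (assumed key) -> {target: ""}


def subscribe(subscriber, target):
    followers[target][subscriber] = ""
    subscriptions[subscriber].setdefault(target, {})


def tag(subscriber, target, tag_name):
    # Forward entry: the tag becomes a column under the target's super column.
    subscriptions[subscriber].setdefault(target, {})[tag_name] = ""
    # Reverse entry: the application has to make this second write itself.
    subscriptions_by_tag[f"{subscriber}:{tag_name}"][target] = ""


def subscriptions_with_tag(subscriber, tag_name):
    # One row read instead of scanning every super column in "Subscriptions".
    return sorted(subscriptions_by_tag.get(f"{subscriber}:{tag_name}", {}))


subscribe("alice", "bob")
subscribe("alice", "carol")
tag("alice", "bob", "friends")
tag("alice", "carol", "friends")
tag("alice", "bob", "work")
print(subscriptions_with_tag("alice", "friends"))  # ['bob', 'carol']
```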
Now, with the advent of secondary indexes in 0.7, can I redesign it to
make it a little simpler? Maybe avoid having to maintain the reverse
index for tags?
I do understand that secondary indexes are not supported for super
columns. So, can I make "Subscriptions" a standard column family in which
each userid maps to a comma-separated list of tags? Is it possible, out of
the box or by implementing some interface, to have a secondary index over
such multi-valued columns?
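As far as I understand the 0.7 indexes, an index query matches on equality with the entire column value, so a comma-separated string would be indexed as one opaque value. The toy model below (plain Python, not the Cassandra API; the class name and row keys such as "alice:bob" are hypothetical) shows why a lookup for a single tag then misses rows carrying several tags, and that splitting the value so each tag gets its own entry amounts to maintaining the reverse index by hand anyway.

```python
from collections import defaultdict


class EqualityIndex:
    """Toy equality-only index: whole column value -> set of row keys.
    It does not remove stale entries when a value is updated."""

    def __init__(self):
        self._entries = defaultdict(set)

    def insert(self, row_key, value):
        self._entries[value].add(row_key)

    def lookup(self, value):
        return sorted(self._entries.get(value, set()))


# Indexing the comma-separated string as one opaque value.
whole_value = EqualityIndex()
whole_value.insert("alice:bob", "friends,work")
whole_value.insert("alice:carol", "friends")
print(whole_value.lookup("friends"))  # ['alice:carol']; alice:bob is missed
print(whole_value.lookup("work"))     # []

# Splitting the list so every tag is its own entry: a hand-built reverse index.
per_tag = EqualityIndex()
for row_key, tags in [("alice:bob", "friends,work"), ("alice:carol", "friends")]:
    for t in tags.split(","):
        per_tag.insert(row_key, t)
print(per_tag.lookup("friends"))  # ['alice:bob', 'alice:carol']
```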
What, in general, are the best practices for such multi-valued fields
on which I also need a secondary index? (Joss's reply confused me; am I
right in thinking that range slices are only for retrieving values for a
contiguous set of keys and not really for secondary indexes?)
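The distinction asked about here can be illustrated with a toy model (plain Python with hypothetical rows and function names, not the Cassandra API). A range slice returns whatever rows fall in a contiguous run of keys in partitioner order, which matches key order only with an order-preserving partitioner; an index query selects rows by a column value regardless of where their keys sort. In this sketch both ends of the range are inclusive.

```python
import bisect

# Hypothetical rows, kept in key order as an order-preserving partitioner would.
rows = {
    "alice": {"city": "Pune"},
    "bob": {"city": "Austin"},
    "carol": {"city": "Pune"},
    "dave": {"city": "Oslo"},
}
keys = sorted(rows)


def range_slice(start, end):
    """Contiguous run of row keys from start to end, both inclusive."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_right(keys, end)
    return keys[lo:hi]


def indexed_slice(column, value):
    """Rows whose column equals value, wherever they fall in key order.
    A real index would look the value up instead of scanning; the scan
    here only shows which rows match."""
    return [k for k in keys if rows[k].get(column) == value]


print(range_slice("bob", "carol"))    # ['bob', 'carol']
print(indexed_slice("city", "Pune"))  # ['alice', 'carol']
```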
[Sorry if I seem too naive]
Thanks,
Prasad
On 12/22/2010 09:47 PM, Anand Somani wrote:
One approach is to ask yourself questions as to how you would use this
information, for example
* how often do you go from user to tags
* how often would you want to go from tag->users.
* What kind of reporting would you want to do on tags and how often
* Can multiple people add the same tag to the same user, and if so, are
they maintained separately?
* Given your business, how many users do you expect
* etc.
Depending on that, one approach might work better than the other. I have
not used indexes/non-id-based searches (I do not have that use case) in
Cassandra yet, so this is just based on the time I have spent reading
about it.
One approach using indexes was given by Jool; the other approach is
using reverse indexes:
* 2 CF - one for user and one for tags (reverse index)
* User - might need a super column (SC) per tag, holding information
like who tagged it
* Tag - each tag row has one column per tagged user
* Advantages:
o 1 query to find user->tags on the user CF
o tag->users is 1 query on the tag CF (I would think this would be more
efficient than a secondary-index lookup, since that will potentially hit
multiple rows/nodes, unless I have misunderstood secondary
indexes)
* Disadvantages:
o Need to write to a couple of CFs, but writes are relatively
cheap compared to reads in Cassandra
o Since you update 2 CFs and there are no transactions, one write might
succeed and the other might fail
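A minimal sketch of the two-CF layout and its non-transactional double write, again using plain Python dictionaries as stand-ins for column families. The names user_cf, tag_cf, add_tag and the simulated failure flag are hypothetical. Because each write only sets columns, replaying the whole operation after a partial failure is harmless, which is one way to repair the inconsistency described in the list above.

```python
from collections import defaultdict

user_cf = defaultdict(dict)  # userid -> {tag (super column): {tagger: ""}}
tag_cf = defaultdict(dict)   # tag -> {userid: ""}


class WriteFailed(Exception):
    pass


def add_tag(user, tag, tagger, fail_reverse=False):
    # Write 1: forward entry, one super column per tag recording who tagged.
    user_cf[user].setdefault(tag, {})[tagger] = ""
    # Write 2: reverse index. Nothing makes the two writes atomic, so a
    # failure here leaves write 1 in place.
    if fail_reverse:
        raise WriteFailed("simulated failure writing the tag CF")
    tag_cf[tag][user] = ""


try:
    add_tag("carol", "cassandra", "admin1", fail_reverse=True)
except WriteFailed:
    pass
inconsistent = "carol" not in tag_cf.get("cassandra", {})
print(inconsistent)  # True: the user row has the tag, the tag row does not

add_tag("carol", "cassandra", "admin1")  # retry the whole operation
print(sorted(tag_cf["cassandra"]))       # ['carol']
```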
Even with the other suggestion of indexes, you can still add the
tag->users CF.
On Wed, Dec 22, 2010 at 4:54 AM, Prasad Sunkari <s.pra...@gmail.com> wrote:
Hi all,
I have a column family for users of my system and I need to assign
tags to these users. My current plan is to have a column that
holds a string (comma-separated tags).
I am not sure if this is the best way to do it, especially because
it may lead to complications when more than one administrator
is trying to tag the same user (lost updates), as well as with
secondary indexes (if I wanted to use the built-in secondary
indexes). I am also not sure if it is possible to have a
secondary index on a multi-valued column!
Another alternative is to have it in a super column with each tag
being a column by itself and let my application take care of the
secondary indexes.
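A sketch of the lost-update problem described above, with one user row modeled as a Python dict (not Cassandra code; the tag values are made up). With a single comma-separated column, each administrator must read, modify and write back the whole string, so the later write silently discards the earlier one. With one column per tag, each administrator only inserts its own column and nothing has to be read first.

```python
def concurrent_string_tags():
    row = {"tags": "cassandra"}
    snapshot1 = row["tags"]            # admin 1 reads
    snapshot2 = row["tags"]            # admin 2 reads before admin 1 writes
    row["tags"] = snapshot1 + ",nosql" # admin 1 writes back the whole string
    row["tags"] = snapshot2 + ",java"  # admin 2 overwrites it: "nosql" is lost
    return row["tags"]


def concurrent_column_tags():
    row = {"cassandra": ""}
    row["nosql"] = ""  # admin 1 inserts its own column; no read needed
    row["java"] = ""   # admin 2 inserts a different column; nothing is overwritten
    return sorted(row)


print(concurrent_string_tags())  # cassandra,java
print(concurrent_column_tags())  # ['cassandra', 'java', 'nosql']
```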
I am currently of the opinion that the second solution is the only
thing that I could do.
Any suggestions? Since this is my first app on Cassandra, I am
trying to see if my opinion is correct.
Thanks,
Prasad