*"your primary goal is to fetch a user by dept_id and user_id and additionally keep versions of the user data?"* My primary goal was to just fetch users for a dept, sorted by modified date. Now the limitation from cassandra that mod_date can't be a clustering key if it can be updated forces me to have all versions. I was looking for any standard design practices around this , because it seems very common use case to me.
Thanks for the link, I do understand the purpose of different keys :) Regards Monmohan On Fri, 7 Apr 2017 at 13:57 <j.kes...@enercast.de> wrote: > Hi, > > > > your primary goal is to fetch a user by dept_id and user_id and > additionally keep versions of the user data? > > > > { > > dept_id text, > > user_id text, > > mod_date timestamp, > > user_name text, > > PRIMARY KEY ((dept_id,user_id), mod_date) > > WITH CLUSTERING ORDER BY (mod_date DESC); > > } > > > > There is a difference between partition key and cluster keys. My suggestion > will end up with all versions of a particular (dept_id,user_id) on a > partition (say node) and all versions of your data on that portion in > descending order by mod_date. > > > > For a normal loopkup you do not need to know mod_date, a simple SELECT * > FROM users WHERE dept_id=foo and user_id=bar LIMIT 1 will do. > > > > http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/ > > > > > > > > Gesendet von meinem Windows 10 Phone > > > > *Von: *Monmohan Singh <monmo...@gmail.com> > *Gesendet: *Donnerstag, 6. April 2017 13:54 > *An: *user@cassandra.apache.org > *Betreff: *The changing clustering key > > > > Dear Cassandra experts, > > I have a data modeling question for cases where data needs to be sorted by > keys which can be modified. > > So , say we have a user table > > { > > dept_id text, > > user_id text, > > user_name text, > > mod_date timestamp > > PRIMARY KEY (dept_id,user_id) > > } > > Now I can query cassandra to get all users by a dept_id > > What if I wanted to query to get all users in a dept, sorted by mod_date. > > So, one way would be to > > { > > dept_id text, > > user_id text, > > mod_date timestamp, > > user_name text, > > PRIMARY KEY (dept_id,user_id, mod_date) > > } > > But, mod_date changes every time user name is updated. So it can't be part > of clustering key. > > > > Attempt 1: Don't update the row but instead create new record for every > update. So, say the record for user foo is like below > > {'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to > 'bar' and then to 'baz' . In that case we add another row to table, so the > table data would look like > > > > {'dept_id1','user_id1',TimeStamp3','baz'} > > {'dept_id1','user_id1',TimeStamp2','bar'} > > {'dept_id1','user_id1',TimeStamp1','foo'} > > > > Now we can get all users in a dept, sorted by mod_date but it presents a > different problem. The data returned is duplicated. > > > > Attempt 2 : Add another column to identify the head record much like a > linked list > > { > > dept_id text, > > user_id text, > > mod_date timestamp, > > user_name text, > > next_record text > > PRIMARY KEY (user_id,user_id, mod_date) > > } > > Every time an update happens it adds a row and also adds the PK of new > record except in the latest record. > > > > {'dept_id1','user_id1',TimeStamp3','baz','HEAD'} > > {'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'} > > {'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'} > > and also add a secondary index to 'next_record' column. > > > > Now I can support get all users in a dept, sorted by mod_date by > > SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by > mod_date. > > > > But it looks fairly involved solution and perhaps I am missing something , > a simpler solution .. > > > > The other option is delete and insert but for high frequency changes I > think Cassandra has issues with tombstones. > > > > Thanks for helping on this. > > Regards > > Monmohan >