*"your primary goal is to fetch a user by dept_id and user_id and
additionally keep versions of the user data?"*
My primary goal was to just fetch users for a dept, sorted by modified
date. Now the limitation from cassandra that mod_date can't be a clustering
key if it can be updated forces me to have all versions.  I was looking for
any standard design practices around this , because it seems very common
use case to me.

Thanks for the link, I do understand the purpose of different keys :)
Regards
Monmohan


On Fri, 7 Apr 2017 at 13:57 <j.kes...@enercast.de> wrote:

> Hi,
>
>
>
> your primary goal is to fetch a user by dept_id and user_id and
> additionally keep versions of the user data?
>
>
>
> {
>
>    dept_id text,
>
>    user_id text,
>
>    mod_date timestamp,
>
>    user_name text,
>
>    PRIMARY KEY ((dept_id,user_id), mod_date)
>
>    WITH CLUSTERING ORDER BY (mod_date DESC);
>
> }
>
>
>
> There is a difference between partition key and cluster keys. My suggestion
> will end up with all versions of a particular (dept_id,user_id) on a
> partition (say node) and all versions of your data on that portion in
> descending order by mod_date.
>
>
>
> For a normal loopkup you do not need to know mod_date, a simple SELECT *
> FROM users WHERE dept_id=foo and user_id=bar LIMIT 1 will do.
>
>
>
> http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/
>
>
>
>
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *Monmohan Singh <monmo...@gmail.com>
> *Gesendet: *Donnerstag, 6. April 2017 13:54
> *An: *user@cassandra.apache.org
> *Betreff: *The changing clustering key
>
>
>
> Dear Cassandra experts,
>
> I have a data modeling question for cases where data needs to be sorted by
> keys which can be modified.
>
> So , say we have a user table
>
> {
>
>    dept_id text,
>
>    user_id text,
>
>    user_name text,
>
>    mod_date timestamp
>
>    PRIMARY KEY (dept_id,user_id)
>
> }
>
> Now I can query cassandra to get all users by a dept_id
>
> What if I wanted to query to get all users in a dept, sorted by mod_date.
>
> So, one way would be to
>
> {
>
>    dept_id text,
>
>    user_id text,
>
>    mod_date timestamp,
>
>    user_name text,
>
>    PRIMARY KEY (dept_id,user_id, mod_date)
>
> }
>
> But, mod_date changes every time user name is updated. So it can't be part
> of clustering key.
>
>
>
> Attempt 1:  Don't update the row but instead create new record for every
> update. So, say the record for user foo is like below
>
> {'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to
> 'bar' and then to 'baz' . In that case we add another row to table, so the
> table data would look like
>
>
>
> {'dept_id1','user_id1',TimeStamp3','baz'}
>
> {'dept_id1','user_id1',TimeStamp2','bar'}
>
> {'dept_id1','user_id1',TimeStamp1','foo'}
>
>
>
> Now we can get all users in a dept, sorted by mod_date but it presents a
> different problem. The data returned is duplicated.
>
>
>
> Attempt 2 : Add another column to identify the head record much like a
> linked list
>
> {
>
>    dept_id text,
>
>    user_id text,
>
>    mod_date timestamp,
>
>    user_name text,
>
>    next_record text
>
>    PRIMARY KEY (user_id,user_id, mod_date)
>
> }
>
> Every time an update happens it adds a row and also adds the PK of new
> record except in the latest record.
>
>
>
> {'dept_id1','user_id1',TimeStamp3','baz','HEAD'}
>
> {'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'}
>
> {'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'}
>
> and also add a secondary index to 'next_record' column.
>
>
>
> Now I can support get all users in a dept, sorted by mod_date by
>
> SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by
> mod_date.
>
>
>
> But it looks fairly involved solution and perhaps I am missing something ,
> a simpler solution ..
>
>
>
> The other option is delete and insert but for high frequency changes I
> think Cassandra has issues with tombstones.
>
>
>
> Thanks for helping on this.
>
> Regards
>
> Monmohan
>

Reply via email to