Stratio (or DSE Search) should be good for ad hoc or complex queries, but if there are some fixed/common query patterns you might be better off implementing query tables or using materialized views. The latter allows you to include a non-PK data column in the PK of the MV so that you can directly access the indexed row without the complexity of Lucene/DSE. This also lets you effectively cluster data that will be commonly accessed together on a single node/partition, and to do it automatically without any application logic to manually duplicate/update data.
(3.x still has the restriction that an MV PK can only include one non-PK data column - CASSANDRA-9928 <https://issues.apache.org/jira/browse/CASSANDRA-9928>.) -- Jack Krupansky On Wed, Mar 16, 2016 at 4:40 PM, I PVP <i...@hotmail.com> wrote: > Jack/Tom > Thanks for answering. > > Here is the table definition so far: > > CREATE TABLE audit_trail ( > auditid timeuuid, > actiontype text, > objecttype text, > executedby uuid ( or timeuuid?), > executedat timestamp, > objectbefore text, > objectafter text, > clientipaddr text, > serveripaddr text, > servername text, > channel text, > PRIMARY KEY (auditid) > ); > > objectbefore/after are the only ones that will have JSON content. quering > based on the contents of these two columns are not a requirement. > > At this moment the queries are going to be mainly on executedby ( the > employee id). > Stratio’s Cassandra Lucene Index will be used to allow querying/filtering > on executedat (timestamp) ,objecttype(order, customer, ticket, > message,account, paymenttransaction,refund etc.) and actiontype(create, > retrieve, update, delete, approve, activate, unlock, lock etc.) . > > I am considering to count exclusively on Stratio’s Cassandra Lucene > filtering and avoid to add “period” columns like month(int), year(int), > day (int). > > Thanks > > -- > IPVP > > > From: Jack Krupansky <jack.krupan...@gmail.com> <jack.krupan...@gmail.com> > Reply: user@cassandra.apache.org <user@cassandra.apache.org>> > <user@cassandra.apache.org> > Date: March 16, 2016 at 5:22:36 PM > To: user@cassandra.apache.org <user@cassandra.apache.org>> > <user@cassandra.apache.org> > Subject: Re: Modeling Audit Trail on Cassandra > > executedby is the ID assigned to an employee. > > I'm presuming that JSON is to be used for objectbefore/after. This > suggests no ability to query by individual object fields. I didn't sense > any other columns that would be JSON. > > > > -- Jack Krupansky > > On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge <t...@drillster.com> > wrote: > >> Is text the most appropriate data type to store JSON that contain couple >>> of dozen lines ? >>> >> >> It sure is the simplest way to store JSON. >> >> The query requirement is "where executedby = ?”. >>> >> >> Since executedby is a timeuuid, I guess you don't want to query a single >> record, since that would require you to know the exact timeuuid. Do you >> mean that you would like to query all changes in a certain time frame, e.g. >> today? In that case, you would have to group your rows in time buckets, >> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any >> other period that suits your situation. Retrieving all changes in a >> specific time frame is done by retrieving all relevant periods. >> >> Tom >> > >