I have found that range_key communicates better what you can actually do with them, whereas clustering is more passive.
ml On Thu, Mar 13, 2014 at 2:08 PM, Jack Krupansky <j...@basetechnology.com>wrote: > “range key” is formally known as “clustering column”. One or more > clustering columns can be specified to identify individual rows in a > partition. Without clustering columns, one partition is one row. So, it’s a > matter of whether you want your rows to be in the same partition or > distributed. > > -- Jack Krupansky > > *From:* Laing, Michael <michael.la...@nytimes.com> > *Sent:* Thursday, March 13, 2014 1:39 PM > *To:* user@cassandra.apache.org > *Subject:* Re: CQL Select Map using an IN relationship > > Think of them as: > > > PRIMARY KEY (partition_key[, range_key]) > > > where the partition_key can be compounded as: > > > (partition_key0 [, partition_key1, ...]) > > > and the optional range_key can be compounded as: > > > range_key0 [, range_key1 ...] > > > If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key > and key2 is the range_key and queries will work that hash to key1 (the > partition) using = or IN and specify a range on key2. > > But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the > compound partition key - there is no range key - and you can specify = on > key1 and = or IN on key2 (but not a range). > > Anyway that's what I remember! Hope it helps. > > ml > > > On Thu, Mar 13, 2014 at 11:27 AM, David Savage <davemssav...@gmail.com>wrote: > >> Great that works, thx! I probably would have never found that... >> >> It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or >> PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the >> time. >> >> Kind regards, >> >> Dave >> >> >> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael < >> michael.la...@nytimes.com> wrote: >> >>> Create your table like this and it will work: >>> >>> CREATE TABLE test.documents (group text,id bigint,data >>> map<text,text>,PRIMARY KEY ((group, id))); >>> >>> The extra parens catenate 'group' and 'id' into the partition key - IN >>> will work on the last component of a partition key. >>> >>> ml >>> >>> >>> On Thu, Mar 13, 2014 at 10:40 AM, David Savage >>> <davemssav...@gmail.com>wrote: >>> >>>> Nope, upgraded to 2.0.5 and still get the same problem, I actually >>>> simplified the problem a little in my first post, there's a composite >>>> primary key involved as I need to partition ids into groups >>>> >>>> So the full CQL statements are: >>>> >>>> >>>> CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', >>>> 'replication_factor':3}; >>>> >>>> >>>> >>>> CREATE TABLE test.documents (group text,id bigint,data >>>> map<text,text>,PRIMARY KEY (group, id)); >>>> >>>> >>>> >>>> INSERT INTO test.documents(id,group,data) VALUES >>>> (0,'test',{'count':'0'}); >>>> >>>> INSERT INTO test.documents(id,group,data) VALUES >>>> (1,'test',{'count':'1'}); >>>> >>>> INSERT INTO test.documents(id,group,data) VALUES >>>> (2,'test',{'count':'2'}); >>>> >>>> >>>> >>>> SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2); >>>> >>>> >>>> >>>> Thanks for your help. >>>> >>>> >>>> >>>> Kind regards, >>>> >>>> >>>> >>>> /Dave >>>> >>>> >>>> On Thu, Mar 13, 2014 at 2:00 PM, David Savage >>>> <davemssav...@gmail.com>wrote: >>>> >>>>> Hmmm that maybe the problem, I'm currently testing with 2.0.2 which >>>>> got dragged in by the cassandra unit library I'm using for testing [1] I >>>>> will try to fix my build dependencies and retry, thx. >>>>> >>>>> /Dave >>>>> >>>>> [1] https://github.com/jsevellec/cassandra-unit >>>>> >>>>> >>>>> On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael < >>>>> michael.la...@nytimes.com> wrote: >>>>> >>>>>> I have no problem doing this w 2.0.5 - what version of C* are you >>>>>> using? Or maybe I don't understand your data model... attach 'creates' if >>>>>> you don't mind. >>>>>> >>>>>> ml >>>>>> >>>>>> >>>>>> On Thu, Mar 13, 2014 at 9:24 AM, David Savage <davemssav...@gmail.com >>>>>> > wrote: >>>>>> >>>>>>> Hi Peter, >>>>>>> >>>>>>> Thanks for the help, unfortunately I'm not sure that's the problem, >>>>>>> the id is the primary key on the documents table and the timestamp >>>>>>> is the primary key on the eventlog table >>>>>>> >>>>>>> >>>>>>> Kind regards, >>>>>>> >>>>>>> >>>>>>> >>>>>>> Dave >>>>>>> >>>>>>> On Thursday, 13 March 2014, Peter Lin <wool...@gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> it's not clear to me if your "id" column is the KEY or just a >>>>>>>> regular column with secondary index. >>>>>>>> >>>>>>>> queries that have IN on non primary key columns isn't supported >>>>>>>> yet. not sure if that answers your question. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage < >>>>>>>> davemssav...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi there, >>>>>>>>> >>>>>>>>> I'm experimenting using cassandra and have run across an error >>>>>>>>> message which I need a little more information on. >>>>>>>>> >>>>>>>>> The use case I'm experimenting with is a series of document >>>>>>>>> updates (documents being an arbitrary map of key value pairs), I >>>>>>>>> would like >>>>>>>>> to find the latest document updates after a specified time period. I >>>>>>>>> don't >>>>>>>>> want to store many copies of the documents (one per update) as the >>>>>>>>> updates >>>>>>>>> are often only to single keys in the map so that would involve a lot >>>>>>>>> of >>>>>>>>> duplicated data. >>>>>>>>> >>>>>>>>> The solution I've found that seems to fit best in terms of >>>>>>>>> performance is to have two tables. >>>>>>>>> >>>>>>>>> One that has an event log of timeuuid -> docid and a second that >>>>>>>>> stores the documents themselves stored by docid -> map<string, >>>>>>>>> string>. I >>>>>>>>> then run two queries, one to select ids that have changed after a >>>>>>>>> certain >>>>>>>>> time: >>>>>>>>> >>>>>>>>> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime) >>>>>>>>> >>>>>>>>> and then a second to select the actual documents themselves >>>>>>>>> >>>>>>>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, >>>>>>>>> 7…) >>>>>>>>> >>>>>>>>> However this then explodes on query with the error message: >>>>>>>>> >>>>>>>>> "Cannot restrict PRIMARY KEY part id by IN relation as a >>>>>>>>> collection is selected by the query" >>>>>>>>> >>>>>>>>> Detective work lead me to these lines in >>>>>>>>> org.apache.cassandra.cql3.statementsSelectStatement: >>>>>>>>> >>>>>>>>> // We only support IN for the last name and >>>>>>>>> for compact storage so far >>>>>>>>> // TODO: #3885 allows us to extend to non >>>>>>>>> compact as well, but that remains to be done >>>>>>>>> if (i != stmt.columnRestrictions.length - 1) >>>>>>>>> throw new >>>>>>>>> InvalidRequestException(String.format("PRIMARY KEY part %s cannot be >>>>>>>>> restricted by IN relation", cname)); >>>>>>>>> else if (stmt.selectACollection()) >>>>>>>>> throw new >>>>>>>>> InvalidRequestException(String.format("Cannot restrict PRIMARY KEY >>>>>>>>> part %s >>>>>>>>> by IN relation as a collection is selected by the query", cname)); >>>>>>>>> >>>>>>>>> It seems like #3885 will allow support for the first IF block >>>>>>>>> above, but I don't think it will allow the second, am I correct? >>>>>>>>> >>>>>>>>> Any pointers on how I can work around this would be greatly >>>>>>>>> appreciated. >>>>>>>>> >>>>>>>>> Kind regards, >>>>>>>>> >>>>>>>>> Dave >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > >