I have found that range_key better communicates what you can actually do
with them, whereas "clustering" sounds more passive.

ml


On Thu, Mar 13, 2014 at 2:08 PM, Jack Krupansky <j...@basetechnology.com> wrote:

>   “range key” is formally known as “clustering column”. One or more
> clustering columns can be specified to identify individual rows in a
> partition. Without clustering columns, one partition is one row. So, it’s a
> matter of whether you want your rows to be in the same partition or
> distributed.
>
> -- Jack Krupansky
>
>  *From:* Laing, Michael <michael.la...@nytimes.com>
> *Sent:* Thursday, March 13, 2014 1:39 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: CQL Select Map using an IN relationship
>
>  Think of them as:
>
>
> PRIMARY KEY (partition_key[, range_key])
>
>
> where the partition_key can be compounded as:
>
>
> (partition_key0 [, partition_key1, ...])
>
>
> and the optional range_key can be compounded as:
>
>
> range_key0 [, range_key1 ...]
>
>
> If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
> and key2 is the range_key, and queries will work that hash key1 to find
> the partition (using = or IN on key1) and specify a range on key2.
>
> But if you do this: PRIMARY KEY ((key1, key2)) - then (key1, key2) is the
> compound partition key, there is no range key, and you can specify = on
> key1 and = or IN on key2 (but not a range).
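To make the PRIMARY KEY (key1, key2) case concrete, here is a toy in-memory model in Java. It is a sketch, not Cassandra's storage engine, and the class and method names are hypothetical: each key1 value maps to one partition, and the rows inside a partition are kept sorted by key2, which is why = or IN on key1 combined with a range on key2 can be answered by reading a contiguous slice of each partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model (not Cassandra code) of PRIMARY KEY (key1, key2):
// key1 selects the partition, key2 orders the rows inside that partition.
final class ClusteredTable {
    // partition key -> (range/clustering key -> row value), range keys kept sorted
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    void insert(String key1, long key2, String value) {
        partitions.computeIfAbsent(key1, k -> new TreeMap<>()).put(key2, value);
    }

    // Roughly: SELECT ... WHERE key1 = ? AND key2 >= ? AND key2 < ?
    // One partition is located by key1, then a sorted slice of it is read.
    List<String> sliceQuery(String key1, long fromInclusive, long toExclusive) {
        TreeMap<Long, String> partition = partitions.get(key1);
        if (partition == null) {
            return List.of();
        }
        NavigableMap<Long, String> slice =
            partition.subMap(fromInclusive, true, toExclusive, false);
        return new ArrayList<>(slice.values());
    }

    // Roughly: SELECT ... WHERE key1 IN (...) AND key2 >= ? AND key2 < ?
    // Each IN value is its own partition; the same slice is read from each.
    List<String> sliceQueryIn(List<String> key1s, long fromInclusive, long toExclusive) {
        List<String> result = new ArrayList<>();
        for (String key1 : key1s) {
            result.addAll(sliceQuery(key1, fromInclusive, toExclusive));
        }
        return result;
    }

    public static void main(String[] args) {
        ClusteredTable t = new ClusteredTable();
        t.insert("a", 3, "a3");
        t.insert("a", 1, "a1");
        t.insert("a", 2, "a2");
        t.insert("b", 1, "b1");
        System.out.println(t.sliceQuery("a", 1, 3));                 // [a1, a2]
        System.out.println(t.sliceQueryIn(List.of("a", "b"), 1, 2)); // [a1, b1]
    }
}
```

Insertion order does not matter: the TreeMap keeps each partition sorted by key2, which is the property that makes a range on the range key cheap.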
>
> Anyway that's what I remember! Hope it helps.
>
> ml
>
>
> On Thu, Mar 13, 2014 at 11:27 AM, David Savage <davemssav...@gmail.com> wrote:
>
>> Great, that works, thx! I probably would never have found that...
>>
>> It now makes me wonder, in general, when to use PRIMARY KEY (key1, key2)
>> versus PRIMARY KEY ((key1, key2)). Any examples would be welcome if you
>> have the time.
>>
>> Kind regards,
>>
>> Dave
>>
>>
>> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael <michael.la...@nytimes.com> wrote:
>>
>>> Create your table like this and it will work:
>>>
>>> CREATE TABLE test.documents (
>>>     group text,
>>>     id bigint,
>>>     data map<text, text>,
>>>     PRIMARY KEY ((group, id))
>>> );
>>>
>>> The extra parens combine 'group' and 'id' into a composite partition key -
>>> IN will work on the last component of a partition key.
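For the PRIMARY KEY ((group, id)) case, a similarly hypothetical toy model shows why IN on the last component works: once group is fixed with =, each IN value completes a full partition key, so the query amounts to a handful of independent point lookups. This is a sketch under that simplification (a HashMap standing in for hashed partitions), not a description of how Cassandra actually executes the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model (not Cassandra code) of PRIMARY KEY ((group, id)):
// the (group, id) pair as a whole is the partition key.
final class CompositePartitionTable {
    // Hypothetical stand-in for the composite partition key.
    record PartitionKey(String group, long id) {}

    // A HashMap keeps no order between keys, loosely mirroring hashed partitions.
    private final Map<PartitionKey, Map<String, String>> partitions = new HashMap<>();

    void insert(String group, long id, Map<String, String> data) {
        partitions.put(new PartitionKey(group, id), data);
    }

    // Roughly: SELECT id, data FROM documents WHERE group = ? AND id IN (...)
    // group = ? plus one IN value gives a complete key, i.e. one point lookup per value.
    // With no ordering between keys, a range on id has no slice to read in this model.
    List<Map<String, String>> selectIn(String group, List<Long> ids) {
        List<Map<String, String>> result = new ArrayList<>();
        for (long id : ids) {
            Map<String, String> data = partitions.get(new PartitionKey(group, id));
            if (data != null) {
                result.add(data);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CompositePartitionTable t = new CompositePartitionTable();
        for (long i = 0; i < 3; i++) {
            t.insert("test", i, Map.of("count", Long.toString(i)));
        }
        // id 5 was never inserted, so it simply yields no row
        System.out.println(t.selectIn("test", List.of(0L, 2L, 5L))); // [{count=0}, {count=2}]
    }
}
```

The trade-off mirrors the one discussed in this thread: every (group, id) becomes its own partition, so rows in a group are no longer stored together or sorted by id.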
>>>
>>> ml
>>>
>>>
>>> On Thu, Mar 13, 2014 at 10:40 AM, David Savage <davemssav...@gmail.com> wrote:
>>>
>>>> Nope, I upgraded to 2.0.5 and still get the same problem. I actually
>>>> simplified the problem a little in my first post: there's a composite
>>>> primary key involved, as I need to partition ids into groups.
>>>>
>>>> So the full CQL statements are:
>>>>
>>>> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
>>>> 'replication_factor': 3};
>>>>
>>>> CREATE TABLE test.documents (
>>>>     group text,
>>>>     id bigint,
>>>>     data map<text, text>,
>>>>     PRIMARY KEY (group, id)
>>>> );
>>>>
>>>> INSERT INTO test.documents (id, group, data) VALUES (0, 'test', {'count': '0'});
>>>> INSERT INTO test.documents (id, group, data) VALUES (1, 'test', {'count': '1'});
>>>> INSERT INTO test.documents (id, group, data) VALUES (2, 'test', {'count': '2'});
>>>>
>>>> SELECT id, data FROM test.documents WHERE group = 'test' AND id IN (0, 1, 2);
>>>>
>>>> Thanks for your help.
>>>>
>>>> Kind regards,
>>>>
>>>> /Dave
>>>>
>>>>
>>>> On Thu, Mar 13, 2014 at 2:00 PM, David Savage <davemssav...@gmail.com> wrote:
>>>>
>>>>> Hmmm, that may be the problem. I'm currently testing with 2.0.2, which
>>>>> got dragged in by the cassandra-unit library I'm using for testing [1]. I
>>>>> will try to fix my build dependencies and retry, thx.
>>>>>
>>>>> /Dave
>>>>>
>>>>> [1] https://github.com/jsevellec/cassandra-unit
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael <michael.la...@nytimes.com> wrote:
>>>>>
>>>>>> I have no problem doing this with 2.0.5 - what version of C* are you
>>>>>> using? Or maybe I don't understand your data model... attach your
>>>>>> CREATE statements if you don't mind.
>>>>>>
>>>>>> ml
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 13, 2014 at 9:24 AM, David Savage <davemssav...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> Thanks for the help. Unfortunately I'm not sure that's the problem:
>>>>>>> the id is the primary key on the documents table, and the timestamp
>>>>>>> is the primary key on the eventlog table.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> On Thursday, 13 March 2014, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> It's not clear to me if your "id" column is the KEY or just a
>>>>>>>> regular column with a secondary index.
>>>>>>>>
>>>>>>>> Queries that use IN on non-primary-key columns aren't supported
>>>>>>>> yet. Not sure if that answers your question.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage <davemssav...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> I'm experimenting using cassandra and have run across an error
>>>>>>>>> message which I need a little more information on.
>>>>>>>>>
>>>>>>>>> The use case I'm experimenting with is a series of document
>>>>>>>>> updates (documents being an arbitrary map of key-value pairs). I would
>>>>>>>>> like to find the latest document updates after a specified time. I
>>>>>>>>> don't want to store many copies of the documents (one per update), as
>>>>>>>>> the updates are often only to single keys in the map, so that would
>>>>>>>>> involve a lot of duplicated data.
>>>>>>>>>
>>>>>>>>> The solution I've found that seems to fit best in terms of
>>>>>>>>> performance is to have two tables.
>>>>>>>>>
>>>>>>>>> One holds an event log of timeuuid -> docid, and the second stores
>>>>>>>>> the documents themselves, keyed as docid -> map<string, string>. I
>>>>>>>>> then run two queries, one to select the ids that have changed after a
>>>>>>>>> certain time:
>>>>>>>>>
>>>>>>>>> SELECT id FROM eventlog WHERE timestamp >= minTimeuuid($minimumTime)
>>>>>>>>>
>>>>>>>>> and then a second to select the actual documents themselves
>>>>>>>>>
>>>>>>>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)
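The two-query approach can be sketched in memory as follows. This is a toy model with hypothetical names, not Cassandra code: a long timestamp stands in for the timeuuid (assumed unique per event), and plain maps stand in for the eventlog and documents tables. It only illustrates the data flow of step 1 (ids changed since a given time) feeding step 2 (fetch those documents), with each update writing just the changed key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy in-memory model (not Cassandra code) of the two-table design:
// an event log of time -> docid, plus a documents table of docid -> map.
final class EventLogLookup {
    // Assumption: a plain long timestamp stands in for the timeuuid and is unique.
    private final TreeMap<Long, Long> eventlog = new TreeMap<>();
    private final Map<Long, Map<String, String>> documents = new HashMap<>();

    void update(long timestamp, long docId, String key, String value) {
        eventlog.put(timestamp, docId);
        // Only the changed key is written, so the document is not copied per update.
        documents.computeIfAbsent(docId, id -> new HashMap<>()).put(key, value);
    }

    // Step 1, roughly: SELECT id FROM eventlog WHERE timestamp >= minTimeuuid(...)
    // Repeated ids (several updates to one document) are collapsed, keeping first-seen order.
    Set<Long> changedSince(long minTime) {
        return new LinkedHashSet<>(eventlog.tailMap(minTime, true).values());
    }

    // Step 2, roughly: SELECT id, data FROM documents WHERE id IN (...)
    List<Map<String, String>> fetch(Set<Long> ids) {
        List<Map<String, String>> result = new ArrayList<>();
        for (long id : ids) {
            Map<String, String> doc = documents.get(id);
            if (doc != null) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        EventLogLookup db = new EventLogLookup();
        db.update(100, 1, "count", "0");
        db.update(200, 2, "count", "0");
        db.update(300, 1, "count", "1");
        Set<Long> ids = db.changedSince(150);
        System.out.println(ids);           // [2, 1]
        System.out.println(db.fetch(ids)); // [{count=0}, {count=1}]
    }
}
```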
>>>>>>>>>
>>>>>>>>> However, the second query then explodes with the error message:
>>>>>>>>>
>>>>>>>>> "Cannot restrict PRIMARY KEY part id by IN relation as a
>>>>>>>>> collection is selected by the query"
>>>>>>>>>
>>>>>>>>> Detective work led me to these lines in
>>>>>>>>> org.apache.cassandra.cql3.statements.SelectStatement:
>>>>>>>>>
>>>>>>>>>     // We only support IN for the last name and for compact storage so far
>>>>>>>>>     // TODO: #3885 allows us to extend to non compact as well, but that remains to be done
>>>>>>>>>     if (i != stmt.columnRestrictions.length - 1)
>>>>>>>>>         throw new InvalidRequestException(String.format("PRIMARY KEY part %s cannot be restricted by IN relation", cname));
>>>>>>>>>     else if (stmt.selectACollection())
>>>>>>>>>         throw new InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));
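Read as plain logic, the excerpt applies two checks, in order, to an IN-restricted PRIMARY KEY part. The sketch below restates them as a standalone Java function with hypothetical names; it is not the Cassandra source. Assuming the restricted parts counted here are the clustering columns, the SELECT on test.documents with PRIMARY KEY (group, id) has id as part 0 of 1, so it passes the first check and is rejected by the second because the data map is selected.

```java
import java.util.Optional;

// Standalone restatement of the two checks in the SelectStatement excerpt.
// Hypothetical names; this is not the Cassandra source itself.
final class InRestrictionCheck {
    // position:          index of the IN-restricted PRIMARY KEY part (i in the excerpt)
    // restrictedParts:   plays the role of stmt.columnRestrictions.length
    // selectsCollection: plays the role of stmt.selectACollection()
    // Returns the message the excerpt would throw, or empty if both checks pass.
    static Optional<String> check(String cname, int position, int restrictedParts,
                                  boolean selectsCollection) {
        // Check 1: IN is only accepted on the last restricted part.
        if (position != restrictedParts - 1) {
            return Optional.of(String.format(
                "PRIMARY KEY part %s cannot be restricted by IN relation", cname));
        }
        // Check 2: only reached when check 1 passes; IN is refused if a collection is selected.
        if (selectsCollection) {
            return Optional.of(String.format(
                "Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query",
                cname));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // id IN (...) on the only restricted part, with the data map selected
        System.out.println(check("id", 0, 1, true).orElse("ok"));
        // Same restriction, but without selecting the map column
        System.out.println(check("id", 0, 1, false).orElse("ok")); // ok
    }
}
```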
>>>>>>>>>
>>>>>>>>> It seems like #3885 will allow support for the first if block
>>>>>>>>> above, but I don't think it will allow the second. Am I correct?
>>>>>>>>>
>>>>>>>>> Any pointers on how I can work around this would be greatly
>>>>>>>>> appreciated.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
