Re: Multiple Primary Keys on an IN clause or 2i?

aaron morton Wed, 27 Mar 2013 13:03:49 -0700

> CREATE TABLE msg_archive(
> thread_id varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
This with reversed clustering so the most recent columns are at the start 
(makes it quicker to get the last X messages) see 
http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_TABLE#cql-create-columnfamily


>  SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) 
> AND ts < 1234567890;
> 

This but do not request 1000 rows per call. Each row becomes a request on a 
node, and there is a max of 32 threads processing those reads. Requests of 50 
to 100 are reasonable depending on the number of nodes you have. If you only 
have 3 nodes I would start smaller. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 7:17 PM, Byron Wang <byron.w...@woowteam.com> wrote:

> Hi,  
> 
> I'm currently trying to implement an offline message retrieval solution 
> wherein I retrieve messages after a particular timestamp for specific users. 
> My question is will what route should I go for…multple primary keys on an IN 
> clause or using 2i
> 
> 
> The current model of the messages table looks something like this
> 
> CREATE TABLE msg_archive(
> thread_id varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
> 
> 
> 
> where thread_id is an alphabetized order of sender and recipient such as 
> "brian|john"
> 
> Now, in order to retrieve the messages, I will have to retrieve them based on 
> the number of contacts you have and as such the query will look something 
> like this
> 
> SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) 
> AND ts < 1234567890;
> 
> Ofcourse the list of friends a user can have can potentially reach around 500 
> or even worse 1000 so the IN clause can potentially have these large amount 
> of primary keys.
> 
> 
> 
> 
> The question is will this work well or do I have to modify the schema such 
> that we should incorporate secondary indexes And look something like this 
> instead?
> 
> CREATE TABLE msg_archive(
> thread_id varchar,
> recipient varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
> 
> 
> 
> CREATE INDEX ON msg_archive (recipient);
> 
> For the select statement, ofcourse it will be as simple as
> 
> SELECT * FROM msg_archive WHERE recipient = 'brian' AND ts < 1234567890;
> 
> 
> 
> Which is actually better in terms of performance? Or are there other 
> suggestions to this kind of model?
> 
> Thanks!
> Byron
> 
> 
> 
> --  
> Byron Wang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
>

Re: Multiple Primary Keys on an IN clause or 2i?

Reply via email to