> CREATE TABLE msg_archive( > thread_id varchar, > ts timestamp, > msg blob, > PRIMARY KEY (thread_id, ts)) This with reversed clustering so the most recent columns are at the start (makes it quicker to get the last X messages) see http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_TABLE#cql-create-columnfamily
> SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) > AND ts < 1234567890; > This but do not request 1000 rows per call. Each row becomes a request on a node, and there is a max of 32 threads processing those reads. Requests of 50 to 100 are reasonable depending on the number of nodes you have. If you only have 3 nodes I would start smaller. Hope that helps. ----------------- Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 26/03/2013, at 7:17 PM, Byron Wang <byron.w...@woowteam.com> wrote: > Hi, > > I'm currently trying to implement an offline message retrieval solution > wherein I retrieve messages after a particular timestamp for specific users. > My question is will what route should I go for…multple primary keys on an IN > clause or using 2i > > > The current model of the messages table looks something like this > > CREATE TABLE msg_archive( > thread_id varchar, > ts timestamp, > msg blob, > PRIMARY KEY (thread_id, ts)) > > > > where thread_id is an alphabetized order of sender and recipient such as > "brian|john" > > Now, in order to retrieve the messages, I will have to retrieve them based on > the number of contacts you have and as such the query will look something > like this > > SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) > AND ts < 1234567890; > > Ofcourse the list of friends a user can have can potentially reach around 500 > or even worse 1000 so the IN clause can potentially have these large amount > of primary keys. > > > > > The question is will this work well or do I have to modify the schema such > that we should incorporate secondary indexes And look something like this > instead? > > CREATE TABLE msg_archive( > thread_id varchar, > recipient varchar, > ts timestamp, > msg blob, > PRIMARY KEY (thread_id, ts)) > > > > CREATE INDEX ON msg_archive (recipient); > > For the select statement, ofcourse it will be as simple as > > SELECT * FROM msg_archive WHERE recipient = 'brian' AND ts < 1234567890; > > > > Which is actually better in terms of performance? Or are there other > suggestions to this kind of model? > > Thanks! > Byron > > > > -- > Byron Wang > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > >