I think you'd just be better served with just a little different primary
key.

If your primary key was (user_id, created_at)  or (user_id, created_at,
question_id), then you'd be able to run the above query without a problem.

This will mean that the entire pantheon of a specific user_id will be
stored as a 'row' (in the old style C* vernacular), and then the
information would be ordered by the 2nd piece of the primary key (or 2nd,
then 3rd if you included question_id).

You would certainly want to include any field that makes a record unique in
the primary key.  Another thing to note is that if a field is part of the
primary key you can not create a secondary index on that field.  You can
work around that by storing the field twice, but you might want to rethink
your structure if you find yourself doing that often.


On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. Bottaro <
cjbott...@academicworks.com> wrote:

> Hello,
>
> We are considering using Cassandra and I want to make sure our use case
> fits Cassandra's strengths.  We have the table like:
>
> answers
> -------
> user_id | question_id | result | created_at
>
> Where our most common query will be something like:
>
> SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012'
> AND created_at < '01/01/2013'
>
> Sometimes we will also limit by a question_id or a list of question_ids.
>
> Secondary indexes will be created on user_id and question_id.  We expect
> the upper bound of number of answers for a given user to be around 10,000.
>
> Now my understanding of how Cassandra will run the aforementioned query is
> that it will load all the answers for a given user into memory using the
> secondary index, then scan over that set filtering based on the dates.
>
> Considering that that will be our most used query and it will happen very
> often, is this a bad use case for Cassandra?
>
> Thanks for the help.
>

Reply via email to