Secondary index and/or row key in the read path ?

Jean-Armel Luce Sun, 19 Aug 2012 23:33:33 -0700

Hello,

I am using Cassandra 1.1.1 and CQL3.


Could you tell me what is the best strategy for retrieving a row using a
condition on a row key (operator =) and also filter on a 2nd column?

For example, I create a  table named "testwhere" with a row key on column
"mykey" and 2 other columns "col1" and "col2".

I would like to retrieve the row with the key 'key1' only if col1 = 'abcd'
I send the request  SELECT mykey, col1 from testwhere where mykey = 'key1'
and col1 = 'abcd';
As you can see, the 1st condition in the WHERE clause is based on the row
key.
However the request doesn't work if no secondary index is created on the
column used in the 2nd condition of the WHERE clause. It works only if a
secondary indexed is created on this 2nd column (see below).
Does that mean that the secondary index is used in the read path instead of
the row key, even if there is a condition on the row key in the WHERE
clause ?

Here is an example :

jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh> use test1;
cqlsh:test1> CREATE TABLE testwhere (mykey varchar PRIMARY KEY,
         ...  col1 varchar,
         ...  col2 varchar);
cqlsh:test1> INSERT INTO testwhere (mykey, col1, col2) VALUES ('key1',
'abcd', 'efgh');

cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1';
 mykey | col1
-------+------
  key1 | abcd

cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and
col1 = 'abcd';
Bad Request: No indexed columns present in by-columns clause with Equal
operator

cqlsh:test1> CREATE INDEX col1_idx ON testwhere (col1);
cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and
col1 = 'abcd';
 mykey | col1
-------+------
  key1 | abcd

cqlsh:test1>


My understanding is :
The 1st SELECT is working because there is only the row key in the WHERE
clause
The 2nd SELECT is not working because the row key is in the WHERE clause,
but there is no index on col1
The 3rd SELECT (which is the same as the 2nd SELECT) is working because the
row key is in the WHERE clause, and a secondary index is created on col1


For this use case, what are the recommendations of the Cassandra community ?
- do we need to create a secondary index for each column we want to filter ?
- do we need to post-process (filter) the result of the query in our
application ?
- or is there another solution ?


Thanks.

Jean-Armel

Secondary index and/or row key in the read path ?

Reply via email to