RE: Secondary index issue, unable to query for records that should be there

Nate Sammons Tue, 08 Nov 2011 07:46:09 -0800

This is against a single server, not a cluster.  Replication factor for the 
keyspace is set to 1, CL is the default for Hector, which I think is QUORUM.


I'm trying to get a simple test together that shows this.  Does anyone know if 
multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rka...@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring 
setup? If you do have a ring setup, what is your replicationfactor set to? Also 
what ConsistencyLevel are you writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <nsamm...@ften.com> wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF 
with several secondary indexes to try out some options.  Right now I have the 
following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these 
values on a Hector ColumnFamilyUpdater instance and update that way.  Then 
later I can query from the command-line client (CLI) with a statement such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should 
return data no longer return any rows.
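
The year/month/day/hour/minute calculation described above might look roughly 
like the following sketch. It is an illustration only, not the code used in 
this test: the class name MessageTimeFields is hypothetical, and it assumes 
that messageTimestamp holds milliseconds since the Unix epoch and that the 
fields are computed in UTC with a 1-based month (so June is 6, as in the 
query). Neither assumption is stated in the thread. The Hector update call is 
left out on purpose.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/** Calendar fields derived from one message timestamp (hypothetical helper, not part of Hector). */
final class MessageTimeFields {
    final int year;
    final int month;   // 1-12, so June is 6 as in the queries in this thread
    final int day;     // day of month, 1-31
    final int hour;    // 0-23
    final int minute;  // 0-59

    private MessageTimeFields(int year, int month, int day, int hour, int minute) {
        this.year = year;
        this.month = month;
        this.day = day;
        this.hour = hour;
        this.minute = minute;
    }

    /**
     * Assumption: epochMillis is milliseconds since the Unix epoch and the
     * indexed fields are wanted in UTC. Writers and readers must agree on the
     * time zone, otherwise the indexed values will not match the query values.
     */
    static MessageTimeFields fromEpochMillis(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return new MessageTimeFields(
                t.getYear(),
                t.getMonthValue(),
                t.getDayOfMonth(),
                t.getHour(),
                t.getMinute());
    }
}
```

Under these assumptions, MessageTimeFields.fromEpochMillis(1306935840000L) 
yields year 2011, month 6, day 1, hour 13 and minute 44, which are the values 
used in the query above.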

So for instance, part way through my test (inserting 250K rows), I can run a 
query such as the one above and get back the data that should be there, but 
later that same query returns 0 rows.  Similarly, a query with fewer clauses 
in the expression also returns 0 rows:

                get MyTest where messageYear=2011 and messageMonth=6;


Any idea what could be going wrong?  I'm not getting any exceptions in my 
client during the write, and I don't see anything in the logs (no errors 
anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on 
CQL queries with multiple indexed columns is good (does Cassandra intelligently 
use all available indexes on these queries?)



Thanks,

-nate
