Interesting... if I switch the columns to be UTF8 instead of integers, like this:
create column family IndexTest with key_validation_class = UTF8Type and comparator = UTF8Type and column_metadata = [ {column_name:year, validation_class:UTF8Type, index_type: KEYS}, {column_name:month, validation_class:UTF8Type, index_type: KEYS}, {column_name:day, validation_class:UTF8Type, index_type: KEYS}, {column_name:hour, validation_class:UTF8Type, index_type: KEYS}, {column_name:minute, validation_class:UTF8Type, index_type: KEYS}, {column_name:data, validation_class:UTF8Type} ]; And change the hector code to use setString(...) instead of setInteger(...). Then everything works fine. Is there a CQL bug with respect to non-string columns? Thanks, -nate From: Nate Sammons [mailto:nsamm...@ften.com] Sent: Tuesday, November 08, 2011 11:14 AM To: user@cassandra.apache.org Subject: RE: Secondary index issue, unable to query for records that should be there Note that I had identical behavior using a fresh download of Cassandra 1.0.2 as of today. Thanks, -nate From: Nate Sammons [mailto:nsamm...@ften.com]<mailto:[mailto:nsamm...@ften.com]> Sent: Tuesday, November 08, 2011 10:20 AM To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subject: RE: Secondary index issue, unable to query for records that should be there I restarted with logging turned up to DEBUG, and after quite a bit of logging during startup, I re-ran a query: get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49; produced the following in the following: DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) scan DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]] DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1] DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1 DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@7bc203c<mailto:org.apache.cassandra.db.IndexScanCommand@7bc203c> from natebookpro/127.0.1.1 DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) Primary scan clause is minute DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 189) collectAllData DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) fetched null DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1<mailto:808@natebookpro/127.0.1.1> DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 ResponseVerbHandler.java (line 44) Processing response on a callback from 808@natebookpro/127.0.1.1<mailto:808@natebookpro/127.0.1.1> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1 DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@6a25a21d<mailto:org.apache.cassandra.db.IndexScanCommand@6a25a21d> from natebookpro/127.0.1.1 DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) Primary scan clause is minute DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 189) collectAllData DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) fetched null DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 809@natebookpro/127.0.1.1<mailto:809@natebookpro/127.0.1.1> DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 ResponseVerbHandler.java (line 44) Processing response on a callback from 809@natebookpro/127.0.1.1<mailto:809@natebookpro/127.0.1.1> Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" produced a result, and the following in the logs: DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) get_slice DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) Command/ConsistencyLevel is SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1 DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) reading data locally DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) LocalReadRunnable reading SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000) DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 189) collectAllData DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) collecting 0 of 1000000: data:false:512@1320769510502017 DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 1 of 1000000: day:false:4@1320769510502014 DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 2 of 1000000: hour:false:4@1320769510502015 DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 3 of 1000000: minute:false:4@1320769510502016 DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 4 of 1000000: month:false:4@1320769510502013 DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 5 of 1000000: year:false:4@1320769510502012 DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,170 StorageProxy.java (line 689) Read: 10 ms. Note that a query for "get IndexTest where minute=49" (which also returns no records) results in the following logs: DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,210 CassandraServer.java (line 728) scan DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]] DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1] DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1 DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@547d6c11<mailto:org.apache.cassandra.db.IndexScanCommand@547d6c11> from natebookpro/127.0.1.1 DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 96) Primary scan clause is minute DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 CollationController.java (line 189) collectAllData DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 163) fetched null DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 462@natebookpro/127.0.1.1<mailto:462@natebookpro/127.0.1.1> DEBUG [RequestResponseStage:17] 2011-11-08 10:13:40,215 ResponseVerbHandler.java (line 44) Processing response on a callback from 462@natebookpro/127.0.1.1<mailto:462@natebookpro/127.0.1.1> DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1 DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@62132898<mailto:org.apache.cassandra.db.IndexScanCommand@62132898> from natebookpro/127.0.1.1 DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 96) Primary scan clause is minute DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 CollationController.java (line 189) collectAllData DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 163) fetched null DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 463@natebookpro/127.0.1.1<mailto:463@natebookpro/127.0.1.1> DEBUG [RequestResponseStage:18] 2011-11-08 10:13:40,219 ResponseVerbHandler.java (line 44) Processing response on a callback from 463@natebookpro/127.0.1.1<mailto:463@natebookpro/127.0.1.1> Thanks, -nate From: Jake Luciani [mailto:jak...@gmail.com]<mailto:[mailto:jak...@gmail.com]> Sent: Tuesday, November 08, 2011 8:56 AM To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subject: Re: Secondary index issue, unable to query for records that should be there Hi Nate, Could you try running it with debug enabled on the logs? it will give more insite into what's going on. -Jake On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <nsamm...@ften.com<mailto:nsamm...@ften.com>> wrote: This is against a single server, not a cluster. Replication factor for the keyspace is set to 1, CL is the default for Hector, which I think is QUORUM. I'm trying to get a simple test together that shows this. Does anyone know if multiple indexes like this are efficient? Thanks, -nate From: Riyad Kalla [mailto:rka...@gmail.com<mailto:rka...@gmail.com>] Sent: Monday, November 07, 2011 4:31 PM To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subject: Re: Secondary index issue, unable to query for records that should be there Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do have a ring setup, what is your replicationfactor set to? Also what ConsistencyLevel are you writing with when storing the values? -R On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <nsamm...@ften.com<mailto:nsamm...@ften.com>> wrote: Hello, I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options. Right now I have the following to create my CF using the CLI: create column family MyTest with key_validation_class = UTF8Type and comparator = UTF8Type and column_metadata = [ -- absolute timestamp for this message, also indexed year/month/day/hour/minute -- index these as they are low cardinality {column_name:messageTimestamp, validation_class:LongType}, {column_name:messageYear, validation_class:IntegerType, index_type: KEYS}, {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS}, {column_name:messageDay, validation_class:IntegerType, index_type: KEYS}, {column_name:messageHour, validation_class:IntegerType, index_type: KEYS}, {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS}, ... other non-indexed columns defined ]; So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way. Then later I can query from the command line with CQL such as: get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44; etc. This generally works, however at some point queries that I know should return data no longer return any rows. So for instance, part way through my test (inserting 250K rows), I can query for what should be there and get data back such as the above query, but later that same query returns 0 rows. Similarly, with fewer clauses in the expression, like this: get MyTest where messageYear=2011 and messageMonth=6; Will also return 0 rows. ??????? Any idea what could be going wrong? I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway). A second question - is what I'm doing insane? I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?) Thanks, -nate -- http://twitter.com/tjake