I missed a ! in the code :) The query will break the token ring into ranges based on the node tokens and then find the UP nodes for each range.
I've taken another walk through the code, the logs helped. In short, you do not have enough UP nodes to support an indexed get at CL ONE. It is working by design and you *should* have gotten an UnavailableException returned. There must be CL up replicas for each token range. In your test node 200.190 is down and so the next node, with RF 2 this means there are no replicas for the range. The log line below is logged just before the UnavalableException is raised > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) > Live nodes do not satisfy ConsistencyLevel (1 required) You will need at least every RF'th node UP. Another way to look at is if you have RF contiguous nodes DOWN you cannot perform an indexed get. If you are interested this is what the logs are saying… > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) > scan ranges are > [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1] There are 4 token ranges to query, i.e. we have to make 4 reads to query over the whole cluster. > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) > Blockfor/repair is 1/false; setting up requests to /172.16.200.130 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) > reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) > reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118 Starting to read for the first token range. A bug in 0.8.6 makes it read from 202.118 when it does not need to. > DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line > 1550) Primary scan clause is member > DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line > 1563) Expanding slice filter to entire row to cover additional expressions > DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line > 1605) Scanning index 'Audit_Log.member EQ kamal' starting with > DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) > collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009 > DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line > 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx > [7a7a32323636373030303438303031:false:0@1325704860925009,]) Scanned the secondary index on 200.130 and found an entry for the row key 7a7a32323636373030303438303031 matched the index expression. > DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line > 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130 Returning ZERO rows for the query result. Because the row key we read above has the token 111413491371349413596553235966977111575L which is not in the first token range from above (0,42535295865117307932921825928971026432] and this is the range we are interested in now. > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) > Blockfor/repair is 1/false; setting up requests to /172.16.202.118 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) > reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118 Processing the second range now. There is only one node up for this range, 202.118 > DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 > ResponseVerbHandler.java (line 48) Processing response on a callback from > 172@/172.16.202.118 > DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 > ResponseVerbHandler.java (line 48) Processing response on a callback from > 173@/172.16.202.118 Got the callback from 202.118 for both the query ranges. The logs on 202.118 show the same local local query. But I'm a little confused as to why the row exists on node 2 at all. > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) > Blockfor/repair is 1/false; setting up requests to Moving on, time to process the third token range (85070591730234615865843651857942052864,127605887595351923798765477786913079296] > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) > Live nodes do not satisfy ConsistencyLevel (1 required) Oh noes there are no nodes available for that token range. Throw UnavailableException Hope that helps. ----------------- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote: > Hi Aaron, > > Thanks for your response! > > I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1 > contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran > the test. I still did not get any data. See below logs: > > ----------------------------------------------------------- > nodetool ring: > ----------------------------------------------------------- > > Address DC Rack Status State Load Owns > Token > > 127605887595351923798765477786913079296 > xxx.16.200.130 datacenter1 rack1 Up Normal 584.97 KB 25.00% > 0 > xxx.16.202.118 datacenter1 rack1 Up Normal 570.91 KB 25.00% > 42535295865117307932921825928971026432 > xxx.16.200.190 datacenter1 rack1 Down Normal 593.51 KB 25.00% > 85070591730234615865843651857942052864 > xxx.16.200.217 datacenter1 rack1 Down Normal 617.23 KB 25.00% > 127605887595351923798765477786913079296 > > ----------------------------------------------------- > Logs on Node 1 > ----------------------------------------------------- > DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line > 148) Version is now 2 > INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java > (line 314) Endpoint /172.16.200.190 died before hint delivery, aborting > DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java > (line 303) Checking remote schema before delivering hints > DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java > (line 295) schema for /172.16.200.217 matches local schema > DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java > (line 309) Sleeping 17296ms to stagger hint delivery > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java (line > 670) scan > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 889) > restricted ranges for query [-1,-1] are [[-1,0], > (0,42535295865117307932921825928971026432], > (42535295865117307932921825928971026432,85070591730234615865843651857942052864], > > (85070591730234615865843651857942052864,127605887595351923798765477786913079296], > (127605887595351923798765477786913079296,-1]] > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) > scan ranges are > [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1] > DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line > 148) Version is now 2 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) > Blockfor/repair is 1/false; setting up requests to /172.16.200.130 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) > reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) > reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118 > DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884 > OutboundTcpConnection.java (line 162) attempting to connect to /172.16.202.118 > DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line > 1550) Primary scan clause is member > DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line > 1563) Expanding slice filter to entire row to cover additional expressions > DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line > 1605) Scanning index 'Audit_Log.member EQ kamal' starting with > DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) > collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009 > DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line > 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx > [7a7a32323636373030303438303031:false:0@1325704860925009,]) > DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line > 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130 > DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895 > ResponseVerbHandler.java (line 48) Processing response on a callback from > 171@/172.16.200.130 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) > Blockfor/repair is 1/false; setting up requests to /172.16.202.118 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) > reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118 > DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line > 148) Version is now 2 > DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line > 148) Version is now 2 > DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 > ResponseVerbHandler.java (line 48) Processing response on a callback from > 172@/172.16.202.118 > DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 > ResponseVerbHandler.java (line 48) Processing response on a callback from > 173@/172.16.202.118 > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) > Blockfor/repair is 1/false; setting up requests to > DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) > Live nodes do not satisfy ConsistencyLevel (1 required) > ----------------------------------------------------- > Logs on Node 2 > ----------------------------------------------------- > DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting > version 2 for /172.16.200.130 > DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line > 105) set version for /172.16.200.130 to 2 > DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line > 148) Version is now 2 > DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line > 1550) Primary scan clause is member > DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line > 1550) Primary scan clause is member > DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line > 1563) Expanding slice filter to entire row to cover additional expressions > DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line > 1563) Expanding slice filter to entire row to cover additional expressions > DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line > 1605) Scanning index 'Audit_Log.member EQ kamal' starting with > DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line > 1605) Scanning index 'Audit_Log.member EQ kamal' starting with > DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line 123) > collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011 > DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line > 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx > [7a7a32323636373030303438303031:false:0@1325615757889011,]) > DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line 123) > collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011 > DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line > 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx > [7a7a32323636373030303438303031:false:0@1325615757889011,]) > DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line > 46) Sending RangeSliceReply{rows=} to 172@/172.16.200.130 > DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line > 46) Sending RangeSliceReply{rows=} to 173@/172.16.200.130 > DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line > 148) Version is now 2 > > > > On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aa...@thelastpickle.com> wrote: > I've not spent much time with the secondary indexes, so a couple of > questions. > > Whats is the output of nodetool ring ? > Which node were you connected to when you did the get ? > If you enable DEBUG logging what do the log messages from StorageProxy say > that contain the string "scan ranges are" and "reading .. from ..." > > Now for the wild guessing….It's working as designed for a CL ONE request. > Looking at test case 5 and *assuming* you were connected to node 2 this is > what I think is happening: A get indexed slice query without a start key does > not know which nodes will contain the data. Reading the code it will consider > the replicas for the minimum token as nodes to send the query to, for a CL > ONE query it will only use one. If you were connected to node 2 the query > would have executed only on node 2. > > This is where I get confused. What happens if you have 50 nodes, with RF 3, > and you execute a get_indexed_slice at QUOURM with no start_key and the only > rows that satisfy the query exist on nodes 47, 48 and 49. So they are a long > way away from the minimum token, assuming SimpleStrategy and well ordered > token ring. > > I think I've missed something, anyone ? > Cheers > > ----------------- > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote: > >> Hi Peter, >> >> To test, I wiped out all the data from Cassandra and inserted just one >> record. The row key is "7a7a32323636373030303438303031". I used getendpoints >> to see where my data is and double checked the same using sstable2json >> command. >> >> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4 nodes >> cluster. I used cassandra-cli to query the data by using one of the >> secondary index but following are my results: >> >> Test Node 1 Node 2 Node 3 Node 4 Got data back? >> 1 Up Up Up Up Yes >> 2 Up Up Up >> Yes >> 3 Up Up >> Up Yes >> 4 >> Up Up Up Yes >> 5 Up Up >> >> No >> 6 >> >> Up Up No >> 7 Up >> >> Up No >> >> >> It turns out that even though my consistency level is ONE, since I am using >> secondary index to query the data, at least 3 nodes has to be running. And >> out of these 3 running nodes, it works even if one nodes contains the data. >> >> Somewhere in the mailing I read that "Iterating through all of the rows >> matching an index clause on your cluster is guaranteed to touch N/RF of the >> nodes in your cluster, because each node only knows about data that is >> indexed locally." >> >> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is >> the number of nodes and 2 is the RF. If it is 2, why is it not returning any >> data when the two nodes that contains the data is running (test #7)? >> >> For my use case, I have to have a RF of 2 and should be able to query using >> secondary index with a CL of ONE. Is this possible when 2 nodes are down in >> a 4 nodes cluster? Is there any limitations on using secondary index? >> >> Thanks in advance. >> >> Thanks, >> Kamal >> >> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller >> <peter.schul...@infidyne.com> wrote: >> > Thanks for the response Peter! I checked everything and it look good to me. >> > >> > I am stuck with this for almost 2 days now. Has anyone had this issue? >> >> While it is certainly possible that you're running into a bug, it >> seems unlikely to me since it is the kind of bug that would affect >> almost anyone if it is failing with Unavailable due to unrelated (not >> in replica sets) nodes being down. >> >> Can you please post back with (1) the ring layout ('nodetool ring'), >> and (2) the exact row key that you're testing with? >> >> You might also want to run with DEBUG level (modify >> log4j-server.properties at the top) and the strategy (assuming you are >> using NetworkTopologyStrategy) will log selected endpoints, and >> confirm that it's indeed picking endpoints that you think it should >> based on getendpoints. >> >> -- >> / Peter Schuller (@scode, http://worldmodscode.wordpress.com) >> > >