Thanks for all the help, everyone. The values were meant to be binary. I ended
up making the possible values range from 0 to 50 instead of just 0 or 1. That
way no single index row gets that wide. I now run queries for every value from
1 to 50 to get 'queued' items and set the value to 0 when I'm done (I will
never query for row_loaded = 0). It's unfortunate that Cassandra doesn't
delegate the query execution to a node that has the index row on it, but rather
tries to move the entire index row to the node that is queried.
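
A minimal sketch of the scheme described above, for anyone trying the same
thing. The helper names and the CRC32-based choice of value are assumptions
made for illustration (any stable or random pick in 1..50 would do). Only the
table name, the column name and the 0/1..50 convention come from this thread.

```python
import zlib

NUM_BUCKETS = 50  # values 1..50 mean "queued"; 0 means "done"


def queued_bucket(row_key):
    """Map a row key to a stable value in 1..NUM_BUCKETS (hypothetical scheme)."""
    return zlib.crc32(row_key.encode("utf-8")) % NUM_BUCKETS + 1


def queued_queries(limit=100):
    """Build one CQL query per queued value; row_loaded = 0 is never queried."""
    return [
        "select * from Access_Log where row_loaded = %d limit %d;" % (b, limit)
        for b in range(1, NUM_BUCKETS + 1)
    ]
```

Each new row would be written with row_loaded = queued_bucket(key), and a
worker would run the 50 queries in turn, setting row_loaded back to 0 for every
row it finishes.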

-Chris

----- Original Message -----
From: "David Leimbach" <leim...@gmail.com>
To: user@cassandra.apache.org
Sent: Monday, April 2, 2012 8:51:46 AM
Subject: Re: really bad select performance


This is all very hypothetical, but I've been bitten by this before. 

Does row_loaded happen to be a binary or boolean value? If so, the secondary
index generated by Cassandra will have at most 2 rows, and they'll be REALLY
wide if you have a lot of entries. Since Cassandra doesn't split a single row's
columns across nodes, those potentially very wide index rows must live in
SSTables in their entirety on the nodes that own them (and on their replicas).
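
To make the width concern concrete, here is a toy model in Python. It is not
how Cassandra stores its indexes; it only counts how many entries would sit
under each indexed value. The 100,000 rows and the choice of 50 values are
arbitrary numbers picked for illustration.

```python
from collections import Counter


def index_row_widths(row_ids, value_of):
    """Count how many rows fall under each indexed value (toy model only)."""
    return Counter(value_of(r) for r in row_ids)


rows = range(100000)

# Binary flag with every row still unloaded: one index row holds everything.
binary = index_row_widths(rows, lambda r: 0)

# Same rows spread evenly over 50 values: each index row is 50 times narrower.
spread = index_row_widths(rows, lambda r: r % 50 + 1)

print(max(binary.values()), max(spread.values()))
```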


Even though you specify limit 1, I'm not sure what "behind the scenes" work
Cassandra does. I've received advice to avoid the built-in secondary indexes in
Cassandra for some of these reasons. Also, if row_loaded is meant to implement
some kind of queuing behavior, that could be the wrong problem space for
Cassandra as a result of all of the above.

On Sat, Mar 31, 2012 at 12:22 PM, aaron morton <aa...@thelastpickle.com> wrote:

Is there anything in the logs when you run the queries?

Try turning the logging up to DEBUG on the node that fails to return and see 
what happens. You will see it send messages to other nodes and do work itself. 

One thing to note: a query that uses secondary indexes runs on a node for each
token range, so it will involve more nodes than the consistency level (CL)
requires.


Cheers

----------------- 
Aaron Morton 
Freelance Developer 
@aaronmorton 
http://www.thelastpickle.com 


On 30/03/2012, at 11:52 AM, Chris Hart wrote:

Hi, 

I have the following cluster: 

Address       DC            Rack  Status  State   Load     Owns    Token
                                                                   136112946768375385385349842972707284580
<ip address>  MountainView  RAC1  Up      Normal  1.86 GB  20.00%  0
<ip address>  MountainView  RAC1  Up      Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
<ip address>  MountainView  RAC1  Up      Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
<ip address>  Rackspace     RAC1  Up      Normal  3.9 GB   13.33%  136112946768375385385349842972707284580

The following query runs quickly on all nodes except 1 MountainView node: 

select * from Access_Log where row_loaded = 0 limit 1; 

There is a secondary index on row_loaded. The query usually doesn't complete
(but sometimes does) on the bad node and returns very quickly on all other
nodes. I've tried upping the rpc timeout to a full minute
(rpc_timeout_in_ms: 60000) in the yaml, but it still often doesn't complete in
a minute. It seems just as likely to complete, and takes about the same amount
of time, whether the limit is 1, 100 or 1000.


Thanks for any help, 
Chris