That's only a small part of the problem. A larger issue is that what follows the index read is basically a multiget within that node, which generates a lot of non-sequential I/O. The second issue is that nodes are queried sequentially to get more result rows until the requested row count has been reached. So, if each node only has a handful of matching rows, you'll end up querying every node in the cluster (more or less) to answer the query.
On Tue, Jul 31, 2012 at 8:44 PM, Arya Goudarzi <gouda...@gmail.com> wrote: > I know that secondary indexes are stored > to local column families on each node. Previously where the default > cache key value was 200,000 rows, and you couldn't really tune the > local index column family, that posed a limitation on low cardinality > of the possible values of the secondary index column. However, in > Cassandra 1.1.2, I don't see the option of tuning cache per row count > any more and it is solely memory based. I wonder if this eliminates he > previous limitations with secondary indexes. > -- Tyler Hobbs DataStax <http://datastax.com/>