On Mon, Dec 29, 2014 at 5:20 PM, Sam Klock <skl...@akamai.com> wrote:

> Our investigation led us to logic in Cassandra used to paginate scans
> of rows in indexes on composites. The issue seems to be the short
> algorithm Cassandra uses to select the size of the pages for the scan,
> partially given on the following two lines (from
> o.a.c.db.index.composites.CompositesSearcher):
>
>     private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
>     private int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
>
> The value computed for rowsPerQuery appears to be the page size.
>
> Based on our reading of the code, unless the value obtained for
> meanColumns is very small, a large query-level page size is used, or
> the DISTINCT keyword is used, the value for (filter.maxColumns() /
> meanColumns) always ends up being small enough that the page size is
> 2. This seems to be the case both for very low-cardinality indexes
> (two different indexed values) and for indexes with higher
> cardinalities, as long as the number of entries per index row is more
> than a few thousand.
>
> Does anyone here have relevant experience with secondary indexes that
> might shed light on the design choice here? In particular, can anyone
> (perhaps the developers?) explain what this algorithm is intended to do
> and what we might do to safely get around this limitation?
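To make the arithmetic in the two quoted lines concrete, here is a minimal standalone sketch that reproduces only that expression. The class name `PageSizeDemo`, the method `rowsPerQuery(...)` and the input values are hypothetical: the parameters merely stand in for `filter.maxRows()`, `filter.maxColumns()` and `index.getIndexCfs().getMeanColumns()`, and the page sizes used are assumed examples, not Cassandra defaults.

```java
public class PageSizeDemo {
    // Hypothetical helper mirroring the two quoted lines from CompositesSearcher.
    // maxRows, maxColumns and indexMeanColumns stand in for filter.maxRows(),
    // filter.maxColumns() and index.getIndexCfs().getMeanColumns().
    static int rowsPerQuery(int maxRows, int maxColumns, int indexMeanColumns) {
        int meanColumns = Math.max(indexMeanColumns, 1);
        // Integer division: once meanColumns exceeds maxColumns / 2, the quotient
        // is at most 1 and the outer Math.max clamps the page size to 2.
        return Math.max(Math.min(maxRows, maxColumns / meanColumns), 2);
    }

    public static void main(String[] args) {
        // Assumed query-level page of 10,000; index rows averaging 5,000 entries.
        System.out.println(rowsPerQuery(10_000, 10_000, 5_000));  // 10000 / 5000 = 2 -> 2
        // Index rows averaging 50,000 entries: the quotient is 0, clamped up to 2.
        System.out.println(rowsPerQuery(10_000, 10_000, 50_000)); // 0 -> 2
        // Only a small mean column count gives a useful page size.
        System.out.println(rowsPerQuery(10_000, 10_000, 10));     // min(10000, 1000) -> 1000
    }
}
```

Under these assumed inputs, any index whose rows average more than half the query's column limit yields a page size of 2, which matches the behavior described above.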
Hmm, this does seem suspect. I'm not sure off the top of my head why the mean columns are used at all. Each index entry (in other words, each cell in the index table) should correspond to one result row, so it seems like the slice limit for the index table should only be based on maxRows/maxColumns (or perhaps better, filter.maxResults()).

Can you go ahead and open a JIRA ticket to look into this?

> Also (to the developers watching this list): is this the sort of
> question we should be addressing to the dev list directly?

Yes, you can either send a message to the dev list or open a JIRA ticket when you're pretty sure you've found a bug. We don't mind confirming and closing a ticket if it's not a bug.

Thanks!

--
Tyler Hobbs
DataStax <http://datastax.com/>