Re: Optimizing queries for partition keys

Benjamin Lerer Thu, 22 Mar 2018 15:17:18 -0700

You should check the 3.x release. CASSANDRA-10657 could have fixed your
problem.



On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <[email protected]> wrote:

> Sylvain explained the problem in CASSANDRA-4536:
> " Let me note that in CQL3 a row that have no live column don't exist, so
> we can't really implement this with a range slice having an empty columns
> list. Instead we should do a range slice with a full-row slice predicate
> with a count of 1, to make sure we do have a live column before including
> the partition key. "
>
> By using ColumnFilter.selectionBuilder(), you do not select all the
> columns, so the read can no longer check that a partition has a live
> column. As a consequence, some partitions might be returned when they
> should not be.
>
> On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock <[email protected]> wrote:
>
>> Cassandra devs,
>>
>> We use workflows in some of our clusters (running 3.0.15) that involve
>> "SELECT DISTINCT key FROM..."-style queries.  For some tables, we
>> observed extremely poor performance under light load (i.e., a small
>> number of rows per second and frequent timeouts), which we eventually
>> traced to replicas shipping entire rows (which in some cases could store
>> on the order of MBs of data) to service the query.  That surprised us
>> (partly because 2.1 doesn't seem to behave this way), so we did some
>> digging, and we eventually came up with a patch that modifies
>> SelectStatement.java in the following way: if the selection in the query
>> only includes the partition key, then when building a ColumnFilter for
>> the query, use:
>>
>>     builder = ColumnFilter.selectionBuilder();
>>
>> instead of:
>>
>>     builder = ColumnFilter.allColumnsBuilder();
>>
>> to initialize the ColumnFilter.Builder in gatherQueriedColumns().  That
>> seems to repair the performance regression, and it doesn't appear to
>> break any functionality (based on the unit tests and some smoke tests we
>> ran involving insertions and deletions).
>>
>> We'd like to contribute this patch back to the project, but we're not
>> convinced that there aren't subtle correctness issues we're missing,
>> judging both from comments in the code and the existence of
>> CASSANDRA-5912, which suggests optimizing this kind of query is
>> nontrivial.
>>
>> So: does this change sound safe to make, or are there corner cases we
>> need to account for?  If there are corner cases, are there plausibly
>> ways of addressing them at the SelectStatement level, or will we need to
>> look deeper?
>>
>> Thanks,
>> SK
>>
>
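
The correctness concern raised in the reply (a partition with no live column must not be returned) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class DistinctKeysSketch, its Cell type, and both methods are hypothetical and are not Cassandra classes or APIs, and the model ignores details such as row markers and static columns. It only shows why a read that fetches no columns cannot tell a fully deleted partition from a live one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Cassandra classes.
class DistinctKeysSketch {

    // A cell is "live" unless it has been deleted (tombstoned) or has expired.
    static final class Cell {
        final String column;
        final boolean live;

        Cell(String column, boolean live) {
            this.column = column;
            this.live = live;
        }
    }

    // Models a read that fetches no regular columns: every partition that
    // still has anything stored is reported, including fully deleted ones.
    static List<String> keysWithoutLivenessCheck(Map<String, List<Cell>> partitions) {
        return new ArrayList<>(partitions.keySet());
    }

    // Models the approach quoted from CASSANDRA-4536: read the row's columns
    // and stop at the first live one before reporting the partition key.
    static List<String> keysWithLivenessCheck(Map<String, List<Cell>> partitions) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, List<Cell>> entry : partitions.entrySet()) {
            for (Cell cell : entry.getValue()) {
                if (cell.live) {
                    keys.add(entry.getKey());
                    break; // one live column is enough (the "count of 1")
                }
            }
        }
        return keys;
    }

    // Two partitions: k1 has a live column, k2 only has a deleted one.
    static Map<String, List<Cell>> sampleData() {
        Map<String, List<Cell>> partitions = new LinkedHashMap<>();
        partitions.put("k1", Arrays.asList(new Cell("v", true)));
        partitions.put("k2", Arrays.asList(new Cell("v", false)));
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, List<Cell>> data = sampleData();
        System.out.println(keysWithoutLivenessCheck(data)); // prints [k1, k2]
        System.out.println(keysWithLivenessCheck(data));    // prints [k1]
    }
}
```

In this model the unchecked read reports k2 even though all of its data has been deleted, which is the kind of spurious result the reply warns about.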
