Re: Optimizing queries for partition keys

Sam Klock Tue, 24 Apr 2018 11:17:23 -0700

Thanks.  For those interested: I opened CASSANDRA-14415.

SK


On 2018-04-19 06:04, Benjamin Lerer wrote:
> Hi Sam,
> 
> Your finding is interesting. Indeed, if the number of bytes to skip is
> larger than the number of bytes remaining in the buffer plus the buffer
> size, it could be faster to use seek().
> Feel free to open a JIRA ticket and attach your patch. It would be great if
> you could also add your table schema to the ticket, as well as some
> information about your environment (e.g., disk type).
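A minimal sketch of the threshold described above, written as a standalone Java decision function. The class, enum, and method names (SkipHeuristic, Strategy, choose) are hypothetical and not part of Cassandra; the sketch only encodes the stated rule "seek if the skip extends past the remaining buffered bytes plus one full buffer", not any measured cost model.

```java
// Hypothetical sketch of the skip heuristic discussed above; not Cassandra code.
final class SkipHeuristic {

    /** Possible ways to satisfy a request to skip bytes. */
    enum Strategy { IN_BUFFER, READ_THROUGH, SEEK }

    /**
     * @param bytesToSkip    number of bytes the caller wants to skip
     * @param remainingInBuf bytes still unread in the current buffer
     * @param bufferSize     capacity of the read buffer
     */
    static Strategy choose(long bytesToSkip, int remainingInBuf, int bufferSize) {
        if (bytesToSkip <= remainingInBuf) {
            // Target is already buffered: just advance the buffer position.
            return Strategy.IN_BUFFER;
        }
        if (bytesToSkip > (long) remainingInBuf + bufferSize) {
            // Reading through would fill at least one whole buffer only to
            // discard it, so repositioning the file pointer is likely cheaper.
            return Strategy.SEEK;
        }
        // Otherwise the skip ends within (or at the end of) the next buffer,
        // so reading through costs at most one rebuffer.
        return Strategy.READ_THROUGH;
    }

    public static void main(String[] args) {
        System.out.println(choose(100, 4096, 65536));       // IN_BUFFER
        System.out.println(choose(10_000, 4096, 65536));    // READ_THROUGH
        System.out.println(choose(1_000_000, 4096, 65536)); // SEEK
    }
}
```

With a 64 KiB buffer and 4 KiB left in it, skipping 100 bytes stays in the buffer, skipping 10,000 bytes needs one rebuffer anyway, and skipping 1,000,000 bytes would discard whole buffers, so seeking is chosen.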
> 
> On Tue, Apr 17, 2018 at 8:53 PM, Sam Klock <skl...@akamai.com> wrote:
> 
>> Thanks (and apologies for the delayed response); that was the kind of
>> feedback we were looking for.
>>
>> We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially
>> addresses our problem in the sense that it does limit the data sent on
>> the wire.  The performance is still extremely poor, however, due to the
>> fact that Cassandra continues to read large volumes of data from disk.
>> (We've also confirmed this behavior in 3.11.2.)
>>
>> With a bit more investigation, we now believe the problem (after
>> CASSANDRA-10657 is applied) is in RebufferingInputStream.skipBytes(),
>> which appears to read bytes in order to skip them.  The subclass used in
>> our case, RandomAccessReader, exposes a seek(), so we overrode
>> skipBytes() in it to make use of seek(), and that seems to resolve the
>> problem.
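To illustrate why reading in order to skip is expensive, here is a toy reader over an in-memory byte array that can skip either by pulling bytes through its buffer or by repositioning. All names (ToyBufferedReader, skipByReading, skipBySeeking, diskBytesRead) are hypothetical; this does not reproduce RebufferingInputStream or RandomAccessReader, and the "disk" is just a byte[] whose reads are counted.

```java
/**
 * Toy buffered reader over an in-memory "file", used only to contrast
 * skipping by reading with skipping by seeking. Hypothetical: it does not
 * mirror Cassandra's RebufferingInputStream or RandomAccessReader.
 */
final class ToyBufferedReader {
    private final byte[] file;    // stands in for the on-disk data
    private final byte[] buffer;  // read buffer
    private long bufferStart = 0; // file offset of buffer[0]
    private int bufferLength = 0; // number of valid bytes in buffer
    private int bufferPos = 0;    // next unread index in buffer
    long diskBytesRead = 0;       // total bytes copied from the "file"

    ToyBufferedReader(byte[] file, int bufferSize) {
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    long position() {
        return bufferStart + bufferPos;
    }

    /** Refills the buffer starting at the current position. */
    private void rebuffer() {
        long pos = position();
        int n = (int) Math.min(buffer.length, file.length - pos);
        System.arraycopy(file, (int) pos, buffer, 0, n);
        diskBytesRead += n;
        bufferStart = pos;
        bufferLength = n;
        bufferPos = 0;
    }

    /** Skips by pulling bytes through the buffer and discarding them. */
    int skipByReading(int n) {
        int skipped = 0;
        while (skipped < n) {
            if (bufferPos == bufferLength) {
                rebuffer();
                if (bufferLength == 0) {
                    break; // end of file
                }
            }
            int step = Math.min(n - skipped, bufferLength - bufferPos);
            bufferPos += step;
            skipped += step;
        }
        return skipped;
    }

    /** Skips by repositioning; nothing is read until the next rebuffer. */
    int skipBySeeking(int n) {
        long target = Math.min(position() + n, (long) file.length);
        int skipped = (int) (target - position());
        bufferStart = target; // drop the buffer; refill lazily at the new offset
        bufferLength = 0;
        bufferPos = 0;
        return skipped;
    }

    public static void main(String[] args) {
        byte[] file = new byte[1 << 20]; // 1 MiB of zeros
        ToyBufferedReader byReading = new ToyBufferedReader(file, 64 * 1024);
        byReading.skipByReading(1 << 19); // skip 512 KiB
        System.out.println("skip by reading, bytes read: " + byReading.diskBytesRead);

        ToyBufferedReader bySeeking = new ToyBufferedReader(file, 64 * 1024);
        bySeeking.skipBySeeking(1 << 19);
        System.out.println("skip by seeking, bytes read: " + bySeeking.diskBytesRead);
    }
}
```

Skipping 512 KiB with a 64 KiB buffer copies all 512 KiB through the buffer in the read-based version (eight rebuffers) but copies nothing in the seek-based version, which mirrors the effect reported above from overriding skipBytes() to use seek().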
>>
>> This change is intuitively much safer than the one we'd originally
>> identified, but we'd still like to confirm with you folks whether it's
>> likely safe and, if so, whether it's also potentially worth contributing.
>>
>> Thanks,
>> SK
>>
>>
>> On 2018-03-22 18:16, Benjamin Lerer wrote:
>>
>>> You should check the 3.x release. CASSANDRA-10657 could have fixed your
>>> problem.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <
>>> benjamin.le...@datastax.com> wrote:
>>>
>>>> Sylvain explained the problem in CASSANDRA-4536:
>>>> "Let me note that in CQL3 a row that have no live column don't exist, so
>>>> we can't really implement this with a range slice having an empty columns
>>>> list. Instead we should do a range slice with a full-row slice predicate
>>>> with a count of 1, to make sure we do have a live column before including
>>>> the partition key."
>>>>
>>>> By using ColumnFilter.selectionBuilder(), you do not select all the
>>>> columns. As a consequence, some partitions might be returned when they
>>>> should not be (for example, partitions that contain no live rows).
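The concern raised here can be sketched with a deliberately simplified model in which a row exists only if it has at least one live cell, as in the CASSANDRA-4536 comment quoted above. The class and method names are hypothetical, and the model ignores tombstones, TTL handling, and the rest of Cassandra's storage engine. It contrasts returning every stored partition key with a full-row check that stops at the first live row of each partition (the "count of 1" idea).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; not Cassandra's storage engine or API.
final class DistinctKeys {

    /** A cell is either live or dead (deleted or expired). */
    static final class Cell {
        final boolean live;
        Cell(boolean live) { this.live = live; }
    }

    /** Simplification: a row exists only if at least one of its cells is live. */
    static final class Row {
        final List<Cell> cells;
        Row(List<Cell> cells) { this.cells = cells; }

        boolean isLive() {
            for (Cell c : cells) {
                if (c.live) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Key-only shortcut: returns every stored partition key, live or not. */
    static List<String> keysWithoutCheckingCells(Map<String, List<Row>> partitions) {
        return new ArrayList<>(partitions.keySet());
    }

    /** Full-row check with a per-partition count of 1: stop at the first live row. */
    static List<String> keysWithLiveRow(Map<String, List<Row>> partitions) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Row>> e : partitions.entrySet()) {
            for (Row row : e.getValue()) {
                if (row.isLive()) {
                    result.add(e.getKey());
                    break; // one live row is enough to include the key
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Row>> partitions = new LinkedHashMap<>();
        partitions.put("a", List.of(new Row(List.of(new Cell(true)))));
        // "b" still has data on disk, but every cell in it is dead.
        partitions.put("b", List.of(new Row(List.of(new Cell(false), new Cell(false)))));

        System.out.println(keysWithoutCheckingCells(partitions)); // [a, b]
        System.out.println(keysWithLiveRow(partitions));          // [a]
    }
}
```

Partition "b" still holds data, but none of it is live, so only the full-row check correctly omits it.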
>>>>
>>>> On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock <skl...@akamai.com> wrote:
>>>>
>>>> Cassandra devs,
>>>>>
>>>>> We use workflows in some of our clusters (running 3.0.15) that involve
>>>>> "SELECT DISTINCT key FROM..."-style queries.  For some tables, we
>>>>> observed extremely poor performance (i.e., a small number of rows per
>>>>> second and frequent timeouts) even under light load.  We eventually
>>>>> traced this to replicas shipping entire rows (which in some cases could
>>>>> store on the order of MBs of data) to service the query.  That surprised us
>>>>> (partly because 2.1 doesn't seem to behave this way), so we did some
>>>>> digging, and we eventually came up with a patch that modifies
>>>>> SelectStatement.java in the following way: if the selection in the query
>>>>> only includes the partition key, then when building a ColumnFilter for
>>>>> the query, use:
>>>>>
>>>>>      builder = ColumnFilter.selectionBuilder();
>>>>>
>>>>> instead of:
>>>>>
>>>>>      builder = ColumnFilter.allColumnsBuilder();
>>>>>
>>>>> to initialize the ColumnFilter.Builder in gatherQueriedColumns().  That
>>>>> seems to repair the performance regression, and it doesn't appear to
>>>>> break any functionality (based on the unit tests and some smoke tests we
>>>>> ran involving insertions and deletions).
>>>>>
>>>>> We'd like to contribute this patch back to the project, but we're not
>>>>> convinced that there aren't subtle correctness issues we're missing,
>>>>> judging both from comments in the code and the existence of
>>>>> CASSANDRA-5912, which suggests optimizing this kind of query is
>>>>> nontrivial.
>>>>>
>>>>> So: does this change sound safe to make, or are there corner cases we
>>>>> need to account for?  If there are corner cases, are there plausibly
>>>>> ways of addressing them at the SelectStatement level, or will we need to
>>>>> look deeper?
>>>>>
>>>>> Thanks,
>>>>> SK
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 
