Re: Performance Of IN Queries On Wide Rows

Jeff Jirsa Wed, 21 Feb 2018 13:37:43 -0800

Slight nuance: we don't load the whole row into memory, but the column
index (and the result set, and the tombstones in the partition), which can
still spike your GC/heap (and potentially overflow the row cache, if you
have it on, which is atypical).


On Wed, Feb 21, 2018 at 1:35 PM, Carl Mueller <carl.muel...@smartthings.com>
wrote:

> Cassandra 2.1.14 is missing some wide row optimizations done in later
> Cassandra releases, IIRC.
>
> Speculation: IN won't matter; it will load the entire wide row into memory
> regardless, which might spike your GC/heap and overflow the row cache.
>
> On Wed, Feb 21, 2018 at 2:16 PM, Gareth Collins <
> gareth.o.coll...@gmail.com> wrote:
>
>> Thanks for the response!
>>
>> I could understand that being the case if the Cassandra cluster is not
>> loaded. Splitting the work across multiple nodes would obviously make
>> the query faster.
>>
>> But if this was just a single node, shouldn't one IN query be faster
>> than multiple due to the fact that, if I understand correctly,
>> Cassandra should need to do less work?
>>
>> thanks in advance,
>> Gareth
>>
>> On Wed, Feb 21, 2018 at 7:27 AM, Rahul Singh
>> <rahul.xavier.si...@gmail.com> wrote:
>> > That depends on the driver you use, but separate queries issued
>> > asynchronously across the cluster would be faster.
>> >
>> >
>> > --
>> > Rahul Singh
>> > rahul.si...@anant.us
>> >
>> > Anant Corporation
>> >
>> > On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com> wrote:
>> >
>> > Someone can correct me if I'm wrong, but I believe if you do a large
>> > IN() on a single partition's clustering keys, all the reads are going
>> > to be served from a single replica. With many concurrent individual
>> > equality statements, by comparison, you can get the performance gain
>> > of leaning on several replicas for parallelism.
>> >
>> > On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins
>> > <gareth.o.coll...@gmail.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> When querying large wide rows for multiple specific values, is it
>> >> better to do separate queries for each value, or to do it with one
>> >> query and an "IN"? I am using Cassandra 2.1.14.
>> >>
>> >> I am asking because I had changed my app to use 'IN' queries and it
>> >> **appears** to be slower rather than faster. I had assumed that the
>> >> "IN" query should be faster...as I assumed it only needs to go down
>> >> the read path once (i.e. row cache -> memtable -> key cache -> bloom
>> >> filter -> index summary -> index -> compression offset map -> sstable)
>> >> rather than
>> >> once for each entry? Or are there some additional caveats that I
>> >> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
>> >> query entries, closeness of 'IN' query values in the SSTable etc.)?
>> >>
>> >> thanks in advance,
>> >> Gareth Collins
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> >> For additional commands, e-mail: user-h...@cassandra.apache.org
>> >>
>> >
>>
>>
>>
>
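
For readers comparing the two approaches discussed in this thread, here is a
minimal Python sketch of both query shapes: one statement with IN, and one
equality statement per clustering key issued concurrently. Everything named
here is an assumption for illustration only: the table wide_rows, the columns
pk, ck and value, the helper names, and the execute callable, which stands in
for whatever call your driver provides. An in-memory fake replaces the cluster
so the sketch runs on its own. It shows the shape of the code, not which
approach is faster on your cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema: a table wide_rows with PRIMARY KEY ((pk), ck).
# The query strings are illustrative; the fake below ignores them.
IN_QUERY = "SELECT ck, value FROM wide_rows WHERE pk = ? AND ck IN ?"
EQ_QUERY = "SELECT ck, value FROM wide_rows WHERE pk = ? AND ck = ?"


def fetch_with_in(execute, pk, cks):
    """Issue a single statement that names every clustering key."""
    return list(execute(IN_QUERY, (pk, tuple(cks))))


def fetch_fan_out(execute, pk, cks, max_workers=8):
    """Issue one equality statement per clustering key, concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as cks.
        per_key = pool.map(lambda ck: list(execute(EQ_QUERY, (pk, ck))), cks)
        return [row for rows in per_key for row in rows]


# In-memory stand-in for a driver session (assumption: not a real driver API).
_DATA = {("sensor-1", ck): f"v{ck}" for ck in range(100)}


def fake_execute(query, params):
    pk, ck = params
    wanted = ck if isinstance(ck, tuple) else (ck,)
    return [(c, _DATA[(pk, c)]) for c in wanted if (pk, c) in _DATA]


if __name__ == "__main__":
    keys = [3, 7, 42]
    print(fetch_with_in(fake_execute, "sensor-1", keys))
    print(fetch_fan_out(fake_execute, "sensor-1", keys))
```

Both helpers return the same rows; the difference is only in how many
statements are sent and whether they can be served in parallel. With a real
driver you would normally use its own asynchronous API instead of a thread
pool, and measure both variants under realistic load before choosing one.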
