Slight nuance: we don't load the whole row into memory, only the column index (plus the result set and any tombstones in the partition). That can still spike your GC/heap, and potentially overflow the row cache if you have it enabled, which is atypical.
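
For anyone who wants to measure the two approaches discussed in this thread, here is a rough Python sketch, not a definitive implementation. The table `events` and the columns `pk` and `ck` are hypothetical, and `run_query` is an assumed stand-in for whatever your driver exposes for executing one statement (for example, something shaped like `session.execute(cql, params)`). The thread pool is only one way to get concurrency; a driver's native async API would normally be preferable.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_individually(run_query, partition_key, clustering_keys, max_workers=8):
    """Issue one equality query per clustering key, concurrently.

    run_query(cql, params) is a caller-supplied callable (an assumption here).
    Table and column names are hypothetical.
    """
    cql = "SELECT * FROM events WHERE pk = %s AND ck = %s"
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            ck: pool.submit(run_query, cql, (partition_key, ck))
            for ck in clustering_keys
        }
        # Collect each result under the clustering key that requested it.
        return {ck: future.result() for ck, future in futures.items()}


def fetch_with_in(run_query, partition_key, clustering_keys):
    """Issue a single query that uses IN on the clustering column."""
    placeholders = ", ".join(["%s"] * len(clustering_keys))
    cql = f"SELECT * FROM events WHERE pk = %s AND ck IN ({placeholders})"
    return run_query(cql, (partition_key, *clustering_keys))
```

Timing both functions against the same partition with the same key list should show whether the single IN or the fan-out wins on your cluster and version.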

On Wed, Feb 21, 2018 at 1:35 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:

> Cass 2.1.14 is missing some wide row optimizations done in later cass
> releases IIRC.
>
> Speculation: IN won't matter, it will load the entire wide row into memory
> regardless which might spike your GC/heap and overflow the rowcache
>
> On Wed, Feb 21, 2018 at 2:16 PM, Gareth Collins <gareth.o.coll...@gmail.com> wrote:
>
>> Thanks for the response!
>>
>> I could understand that being the case if the Cassandra cluster is not
>> loaded. Splitting the work across multiple nodes would obviously make
>> the query faster.
>>
>> But if this was just a single node, shouldn't one IN query be faster
>> than multiple due to the fact that, if I understand correctly,
>> Cassandra should need to do less work?
>>
>> thanks in advance,
>> Gareth
>>
>> On Wed, Feb 21, 2018 at 7:27 AM, Rahul Singh
>> <rahul.xavier.si...@gmail.com> wrote:
>> > That depends on the driver you use but separate queries asynchronously
>> > around the cluster would be faster.
>> >
>> > --
>> > Rahul Singh
>> > rahul.si...@anant.us
>> >
>> > Anant Corporation
>> >
>> > On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com>, wrote:
>> >
>> > Someone can correct me if I'm wrong, but I believe if you do a large IN() on
>> > a single partition's cluster keys, all the reads are going to be served from
>> > a single replica. Compared to many concurrent individual equal statements
>> > you can get the performance gain of leaning on several replicas for
>> > parallelism.
>> >
>> > On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins <gareth.o.coll...@gmail.com>
>> > wrote:
>> >>
>> >> Hello,
>> >>
>> >> When querying large wide rows for multiple specific values is it
>> >> better to do separate queries for each value...or do it with one query
>> >> and an "IN"? I am using Cassandra 2.1.14
>> >>
>> >> I am asking because I had changed my app to use 'IN' queries and it
>> >> **appears** to be slower rather than faster. I had assumed that the
>> >> "IN" query should be faster...as I assumed it only needs to go down
>> >> the read path once (i.e. row cache -> memtable -> key cache -> bloom
>> >> filter -> index summary -> index -> compaction -> sstable) rather than
>> >> once for each entry? Or are there some additional caveats that I
>> >> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
>> >> query entries, closeness of 'IN' query values in the SSTable etc.)?
>> >>
>> >> thanks in advance,
>> >> Gareth Collins
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> >> For additional commands, e-mail: user-h...@cassandra.apache.org