Unless you explicitly set a page size (I'm fairly sure the query is converted to a paging query automatically under the hood), you will get capped at the default of 10k, which might get a little weird semantically. That said, you should experiment with explicit page sizes and see where that gets you (I've not tried this yet with an IN clause - I'd be curious to hear how it works out).
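For example, with the DataStax Python driver you can set an explicit fetch_size on the statement and let the driver page through the result as you iterate. Untested sketch - the keyspace, table, and column names are placeholders, and `uuids` is assumed to be your list of keys:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement, ValueSequence

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')   # placeholder keyspace

    # Explicit page size: the driver fetches the next page transparently
    # as you iterate, instead of relying on the default cap.
    query = SimpleStatement(
        "SELECT fields FROM my_table WHERE id IN %s",
        fetch_size=500)

    # ValueSequence formats the whole list into the IN clause.
    for row in session.execute(query, [ValueSequence(uuids)]):
        handle(row)   # placeholder for whatever you do with each row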
Another thing to consider is that it's a pretty big statement to parse every time. You might want to go the (much) smaller batch route so these can be prepared statements? (Another thing I haven't tried with an IN clause - I don't see why it wouldn't work, though.) There's a rough sketch of that approach at the bottom of this mail.

On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould <d...@chill.com> wrote:

> I was wondering if anyone had a sense of performance/best practices
> around the 'IN' predicate.
>
> I have a list of up to potentially ~30k keys that I want to look up in a
> table (typically queries will have <500, but I worry about the long tail).
> Most of them will not exist in the table, but, say, about 10-20% will.
>
> Would it be best to do:
>
> 1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ...... uuid30000);
>
> 2) Split into smaller batches--
>    for group_of_100 in all_30000:
>        // ** Issue in parallel or block after each one??
>        SELECT fields FROM table WHERE id in (group_of_100 uuids);
>
> 3) Something else?
>
> My guess is that (1) is fine and that the only worry is too much data
> returned (which won't be a problem in this case), but I wanted to check
> that it's not a C* anti-pattern before.
>
> [Conversely, is a batch insert with up to 30k items ok?]
>
> Thanks,
> Dan

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
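For the batch route mentioned above (option 2 in Dan's mail), something like the following is what I had in mind - again an untested sketch with the DataStax Python driver; the table/column names and group size are placeholders, and binding the whole group to the single IN marker assumes native protocol v2 (Cassandra 2.0+):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')   # placeholder keyspace

    # Prepared once, so the statement is only parsed once; each execution
    # ships just the bound values.
    lookup = session.prepare("SELECT fields FROM my_table WHERE id IN ?")

    def chunks(ids, size):
        # Split the full key list into groups of `size`.
        for i in range(0, len(ids), size):
            yield ids[i:i + size]

    def lookup_keys(all_ids, group_size=100):
        # Fire the groups off asynchronously, then block collecting results.
        futures = [session.execute_async(lookup, [list(group)])
                   for group in chunks(all_ids, group_size)]
        rows = []
        for future in futures:
            rows.extend(future.result())
        return rows

Whether you fire all groups at once or throttle the concurrency is worth benchmarking against your cluster - with ~300 groups of 100 in the worst case, fully parallel may or may not be kind to it.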