Unless you explicitly set a page size (I'm fairly sure the query is converted to a paging query automatically under the hood), you will get capped at the default of 10k, which might get a little weird semantically. That said, you should experiment with explicit page sizes and see where that gets you (I've not tried this yet with an IN clause - I'd be curious to hear how it works out).
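For example, with the DataStax Python driver you can set an explicit fetch_size on the statement and let the driver page through the result as you iterate. Untested sketch - the keyspace, table, and column names are placeholders, and `uuids` is assumed to be your list of keys:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement, ValueSequence

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')   # placeholder keyspace

    # Explicit page size: the driver fetches the next page transparently
    # as you iterate, instead of relying on the default cap.
    query = SimpleStatement(
        "SELECT fields FROM my_table WHERE id IN %s",
        fetch_size=500)

    # ValueSequence formats the whole list into the IN clause.
    for row in session.execute(query, [ValueSequence(uuids)]):
        handle(row)   # placeholder for whatever you do with each row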
Another thing to consider is that it's a pretty big statement to parse every time. You might want to go the (much) smaller batch route so these can be prepared statements? (Another thing I haven't tried with an IN clause - I don't see why it wouldn't work, though.) There's a rough sketch of that approach at the bottom of this mail.

On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould <d...@chill.com> wrote:

> I was wondering if anyone had a sense of performance/best practices
> around the 'IN' predicate.
>
> I have a list of up to potentially ~30k keys that I want to look up in a
> table (typically queries will have <500, but I worry about the long tail).
> Most of them will not exist in the table, but, say, about 10-20% will.
>
> Would it be best to do:
>
> 1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ...... uuid30000);
>
> 2) Split into smaller batches--
>    for group_of_100 in all_30000:
>        // ** Issue in parallel or block after each one??
>        SELECT fields FROM table WHERE id in (group_of_100 uuids);
>
> 3) Something else?
>
> My guess is that (1) is fine and that the only worry is too much data
> returned (which won't be a problem in this case), but I wanted to check
> that it's not a C* anti-pattern before.
>
> [Conversely, is a batch insert with up to 30k items ok?]
>
> Thanks,
> Dan

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
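For the batch route mentioned above (option 2 in Dan's mail), something like the following is what I had in mind - again an untested sketch with the DataStax Python driver; the table/column names and group size are placeholders, and binding the whole group to the single IN marker assumes native protocol v2 (Cassandra 2.0+):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')   # placeholder keyspace

    # Prepared once, so the statement is only parsed once; each execution
    # ships just the bound values.
    lookup = session.prepare("SELECT fields FROM my_table WHERE id IN ?")

    def chunks(ids, size):
        # Split the full key list into groups of `size`.
        for i in range(0, len(ids), size):
            yield ids[i:i + size]

    def lookup_keys(all_ids, group_size=100):
        # Fire the groups off asynchronously, then block collecting results.
        futures = [session.execute_async(lookup, [list(group)])
                   for group in chunks(all_ids, group_size)]
        rows = []
        for future in futures:
            rows.extend(future.result())
        return rows

Whether you fire all groups at once or throttle the concurrency is worth benchmarking against your cluster - with ~300 groups of 100 in the worst case, fully parallel may or may not be kind to it.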