Thanks Nate,
I assume 10k is the return limit. I don't think I'll ever get close to
10k matches to the IN query. That said, you're right: to be safe I'll
increase the limit to match the number of items on the IN.
I didn't know CQL supported stored procedures, but I'll take a look. I
suppose my question was asking about parsing overhead, however. If one
big query doesn't cause problems--which I assume it wouldn't since there
can be multiple threads parsing and I assume C* is smart about memory
when accumulating results--I'd much rather do that.
Dan
On 11/6/13 3:05 PM, Nate McCall wrote:
Unless you explicitly set a page size (i'm pretty sure the query is
converted to a paging query automatically under the hood) you will get
capped at the default of 10k which might get a little weird
semantically. That said, you should experiment with explicit page
sizes and see where it gets you (i've not tried this yet with an IN
clause - would be real curious to hear how it worked).
Another thing to consider is that it's a pretty big statement to parse
every time. You might want to go the (much) smaller batch route so
these can be stored procedures? (another thing I havent tried with IN
clause - don't see why it would not work though).
On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould <d...@chill.com
<mailto:d...@chill.com>> wrote:
I was wondering if anyone had a sense of performance/best practices
around the 'IN' predicate.
I have a list of up to potentially ~30k keys that I want to look
up in a
table (typically queries will have <500, but I worry about the
long tail). Most
of them will not exist in the table, but, say, about 10-20% will.
Would it be best to do:
1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ......
uuid30000);
2) Split into smaller batches--
for group_of_100 in all_30000:
// ** Issue in parallel or block after each one??
SELECT fields FROM table WHERE id in (group_of_100 uuids);
3) Something else?
My guess is that (1) is fine and that the only worry is too much
data returned (which won't be a problem in this case), but I
wanted to check that it's not a C* anti-pattern before.
[Conversely, is a batch insert with up to 30k items ok?]
Thanks,
Dan
--
-----------------
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com