Hi Gopal,

Thanks for the detailed response.
It’s really a very simple query that I’m trying to run:

    select a.a_id, b.b_id, count(*) as c
    from table_a a, table_b b
    where bloom_contains(a_id, b_id_bloom)
    group by a.a_id, b.b_id;

Here “bloom_contains” is a custom UDF. The only changes I made were renaming the tables and columns.

The tables I’m running against are small (roughly 50-100 MB), but this query would need to be expanded to run on a table that is >100 GB (table_b would likely max out around 100 MB).

Any suggestions on how to approach this would be greatly appreciated.

Best,
Rory