Hi,

I need to run queries on a huge number of input records, arriving at about 100K records/second. A record looks like (key1, key2, value), and the application should report occurrences where key1 = something and key2 = somethingElse. The problem is that my query has too many filters: more than three thousand (key1, key2) pairs need to be matched.

So far I have simply been putting 1 million records into a temp table each time and running a Spark SQL query on it:

    select * from mytemptable
    where (key1 = something and key2 = somethingElse)
       or (key1 = someOtherThing and key2 = someAnotherThing)
       or ... (3 thousand ORs!)

With this I hit a StackOverflowError at ATNConfigSet.java line 178 (apparently the parser can't cope with such a huge predicate). So I see two options, IMHO:

1. Put all the (key1, key2) filter pairs into another temp table and do a join between the two temp tables.
2. Use Spark Streaming, which I'm not familiar with, and I don't know whether it can handle 3K filters.

Which way do you suggest? What is the best solution for my problem, performance-wise?

Thanks in advance
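In case it clarifies option 1, here is a rough sketch of what I have in mind using the DataFrame API (just a sketch; the table and column names come from my example above, and I'm assuming the ~3K pairs are small enough to broadcast):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder.appName("pairFilter").getOrCreate()
    import spark.implicits._

    // The 1M-record batch registered as a temp table
    val records = spark.table("mytemptable")

    // The ~3K (key1, key2) pairs to keep; built here from an
    // in-memory Seq, but it could also be a second temp table
    val filterPairs = Seq(("something", "somethingElse") /* , ... */)
      .toDF("key1", "key2")

    // Broadcast the small pair table so the join avoids a shuffle,
    // and use a left_semi join to keep only matching records
    val matched = records.join(broadcast(filterPairs), Seq("key1", "key2"), "left_semi")

My understanding is that this replaces the 3K-clause predicate with a hash lookup on each executor, so the parser never sees the giant OR expression. Is that the right approach?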
I need to run some queries on huge amount input records. Input rate for records are 100K/seconds. A record is like (key1,key2,value) and the application should report occurances of kye1 = something && key2 == somethingElse. The problem is there are too many filters in my query: more than 3 thousands pair of key1 and key2 should be filtered. I was simply puting 1 millions of records in a temptable each time and running a query sql using spark-sql on temp table: select * from mytemptable where (kye1 = something && key2 == somethingElse) or (kye1 = someOtherthing && key2 == someAnotherThing) or ...(3thousands or!!!) And i encounter StackOverFlow at ATNConfigSet.java line 178. So i have two options IMHO: 1. Either put all key1 and key2 filter pairs in another temp table and do a join between two temp table 2. Or use spark-stream that i'm not familiar with and i don't know if it could handle 3K of filters. Which way do you suggest? what is the best solution for my problem 'performance-wise'? Thanks in advance