Hi,

I need to run queries on a huge number of input records, arriving at about 100K records/second. A record looks like (key1, key2, value), and the application should report occurrences where key1 = something and key2 = somethingElse. The problem is that my query has too many filters: more than three thousand (key1, key2) pairs need to be matched.

So far I have simply been putting 1 million records into a temp table each time and running a Spark SQL query on it:

    select * from mytemptable
    where (key1 = something and key2 = somethingElse)
       or (key1 = someOtherThing and key2 = someAnotherThing)
       or ... (3 thousand ORs!)

With this I hit a StackOverflowError at ATNConfigSet.java line 178 (apparently the parser can't cope with such a huge predicate). So I see two options, IMHO:

1. Put all the (key1, key2) filter pairs into another temp table and do a join between the two temp tables.
2. Use Spark Streaming, which I'm not familiar with, and I don't know whether it can handle 3K filters.

Which way do you suggest? What is the best solution for my problem, performance-wise?

Thanks in advance
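In case it clarifies option 1, here is a rough sketch of what I have in mind using the DataFrame API (just a sketch; the table and column names come from my example above, and I'm assuming the ~3K pairs are small enough to broadcast):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder.appName("pairFilter").getOrCreate()
    import spark.implicits._

    // The 1M-record batch registered as a temp table
    val records = spark.table("mytemptable")

    // The ~3K (key1, key2) pairs to keep; built here from an
    // in-memory Seq, but it could also be a second temp table
    val filterPairs = Seq(("something", "somethingElse") /* , ... */)
      .toDF("key1", "key2")

    // Broadcast the small pair table so the join avoids a shuffle,
    // and use a left_semi join to keep only matching records
    val matched = records.join(broadcast(filterPairs), Seq("key1", "key2"), "left_semi")

My understanding is that this replaces the 3K-clause predicate with a hash lookup on each executor, so the parser never sees the giant OR expression. Is that the right approach?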
I need to run some queries on huge amount input records. Input rate for records are 100K/seconds. A record is like (key1,key2,value) and the application should report occurances of kye1 = something && key2 == somethingElse. The problem is there are too many filters in my query: more than 3 thousands pair of key1 and key2 should be filtered. I was simply puting 1 millions of records in a temptable each time and running a query sql using spark-sql on temp table: select * from mytemptable where (kye1 = something && key2 == somethingElse) or (kye1 = someOtherthing && key2 == someAnotherThing) or ...(3thousands or!!!) And i encounter StackOverFlow at ATNConfigSet.java line 178. So i have two options IMHO: 1. Either put all key1 and key2 filter pairs in another temp table and do a join between two temp table 2. Or use spark-stream that i'm not familiar with and i don't know if it could handle 3K of filters. Which way do you suggest? what is the best solution for my problem 'performance-wise'? Thanks in advance