Hi,
I am not familiar with ATNConfigSet, but here are some thoughts that might help.

How many distinct key1 (resp. key2) values do you have? Are these values
reasonably stable over time?

Are these records ingested in real-time or are they loaded from a datastore?

In the latter case, the DB might be able to perform the filtering
efficiently, especially if it has a proper index on key1 and key2 (or a
composite one over both).

In that case, filter push-down could be very effective (I didn't understand
whether you just need to count the matching records or do something more
with them).
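
As a rough sketch of what pushing the filter down could look like: here an
in-memory SQLite database stands in for your datastore (an assumption, I
don't know what you actually use), and the table names `records` and
`wanted` and the sample rows are made up for illustration. The pairs are
kept in their own table and joined, so the query text stays small however
many pairs there are.

```python
import sqlite3

# In-memory SQLite stands in for the real datastore (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (key1 TEXT, key2 TEXT, value INTEGER)")
# Hypothetical table holding the (key1, key2) pairs to look for.
conn.execute(
    "CREATE TABLE wanted (key1 TEXT, key2 TEXT, PRIMARY KEY (key1, key2))"
)
# Composite index so a (key1, key2) lookup does not need a full scan.
conn.execute("CREATE INDEX idx_records_keys ON records (key1, key2)")

# Made-up sample data.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("a", "x", 4), ("c", "z", 5)],
)
conn.executemany("INSERT INTO wanted VALUES (?, ?)", [("a", "x"), ("c", "z")])

# Join against the pairs table instead of writing thousands of ORs.
rows = conn.execute(
    """
    SELECT r.key1, r.key2, COUNT(*) AS occurrences
    FROM records r
    JOIN wanted w ON r.key1 = w.key1 AND r.key2 = w.key2
    GROUP BY r.key1, r.key2
    ORDER BY r.key1, r.key2
    """
).fetchall()
print(rows)  # [('a', 'x', 2), ('c', 'z', 1)]
```

Whether the real datastore would use the index this way depends on its
query planner, so it is worth checking the plan on your side.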

Alternatively, you could try to group by (key1,key2), and then filter (it
again depends on the kind of output you have in mind).
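
To make the "group first, filter afterwards" idea concrete, here is a small
plain-Python sketch (not Spark code; the sample records and the `wanted` set
are invented). After grouping, the filter is a set membership test per
distinct pair rather than one boolean expression with thousands of terms.

```python
from collections import Counter

# Made-up (key1, key2, value) records.
records = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("a", "x", 4), ("c", "z", 5)]
# Hypothetical set of pairs to report; in your case about 3K entries.
wanted = {("a", "x"), ("c", "z")}

# Step 1: group by (key1, key2) and count occurrences.
counts = Counter((key1, key2) for key1, key2, _value in records)

# Step 2: keep only the groups whose pair is in the wanted set.
matches = {pair: n for pair, n in counts.items() if pair in wanted}
print(matches)  # {('a', 'x'): 2, ('c', 'z'): 1}
```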

If the datastore/stream is distributed and supports partitioning, you could
partition your records by either key1 or key2 (or key1+key2), so they are
already "separated" and can be consumed more efficiently (e.g., the groupby
could then be local to a single partition).
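
A toy illustration of why partitioning by key1+key2 helps, again in plain
Python with invented data; `NUM_PARTITIONS` and `partition_of` are
hypothetical names, and a real system would use its own partitioner. Since
every occurrence of a pair lands in the same partition, each partition can
be grouped on its own and the results simply put together.

```python
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 4  # arbitrary value for the sketch

def partition_of(key1, key2):
    # Stable hash, so the same pair always maps to the same partition.
    return zlib.crc32(f"{key1}|{key2}".encode("utf-8")) % NUM_PARTITIONS

# Made-up (key1, key2, value) records.
records = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("a", "x", 4), ("c", "z", 5)]

# Route each record to its partition.
partitions = defaultdict(list)
for key1, key2, value in records:
    partitions[partition_of(key1, key2)].append((key1, key2, value))

# The group-by is local: each partition is counted independently.
local_counts = {
    p: Counter((key1, key2) for key1, key2, _value in recs)
    for p, recs in partitions.items()
}

# A pair lives in exactly one partition, so merging is a plain union.
merged = {}
for counter in local_counts.values():
    merged.update(counter)
print(merged)
```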

Best regards,
Alessandro

On 15 May 2018 at 08:32, onmstester onmstester <[email protected]> wrote:

> Hi,
>
> I need to run some queries on a huge number of input records. The input
> rate is 100K records/second.
> A record looks like (key1,key2,value), and the application should report
> occurrences of key1 = something && key2 == somethingElse.
> The problem is that there are too many filters in my query: more than 3
> thousand pairs of key1 and key2 should be filtered.
> I was simply putting 1 million records in a temp table each time and
> running a SQL query using spark-sql on the temp table:
> select * from mytemptable where (key1 = something && key2 ==
> somethingElse) or (key1 = someOtherthing && key2 == someAnotherThing) or
> ...(3 thousand ORs!!!)
> And I encounter a stack overflow at ATNConfigSet.java line 178.
>
> So i have two options IMHO:
> 1. Either put all key1 and key2 filter pairs in another temp table and do
> a join between the two temp tables
> 2. Or use spark-stream, which I'm not familiar with, and I don't know if it
> could handle 3K filters.
>
> Which way do you suggest? What is the best solution for my problem
> 'performance-wise'?
>
> Thanks in advance
>
>
