Re: Flink SQL Dates Between and Parallelism

2022-05-21 Thread Yuval Itzchakov
Hi Jason, The idea behind a more scalable solution is to first partition the data. For example, if we have a stream of orders with some order id contained in both streams, and event-time lookups happen within a specific order, we can introduce partitioning and thus increase parallelism.
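
A minimal Flink SQL sketch of what Yuval describes, assuming hypothetical tables orders_a and orders_b that both carry an order_id plus time attributes (none of these names come from the thread): adding an equality predicate on the shared key alongside the time bound gives Flink something to hash-partition on, so the join is no longer forced into a single parallel task.

    -- Sketch only: orders_a, orders_b, order_id, event_time, start_time and
    -- end_time are assumed names for illustration.
    SELECT a.id, b.name
    FROM orders_a AS a
    JOIN orders_b AS b
      -- Equality on a key present in both streams lets Flink partition the
      -- inputs, so the operator can run with parallelism > 1.
      ON a.order_id = b.order_id
      -- The time condition is kept, but is now evaluated per order_id.
      AND a.event_time BETWEEN b.start_time AND b.end_time;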

Re: Flink SQL Dates Between and Parallelism

2022-05-21 Thread Jason Politis
Thank you Yuval! Will look into it first thing Monday morning, much appreciated. In the case where we aren't able to filter by anything else, is there anything else we could potentially look into to help? Thank you Jason Politis Solutions Architect, Carrera Group carrera.io | jpoli...@car

Re: Flink SQL Dates Between and Parallelism

2022-05-20 Thread Yuval Itzchakov
Hi Jason, When using interval joins, Flink can't parallelize the execution as the join key (semantically) is event time, so all events must fall into the same partition for Flink to be able to look up events from the two streams. See the IntervalJoinOperator ( https://github.com/apache/flink/blob/

Flink SQL Dates Between and Parallelism

2022-05-20 Thread Jason Politis
Good evening all, We are working on a project with a few queries that join on a date from table A falling between two dates from table B. Something like: SELECT A.ID, B.NAME FROM A, B WHERE A.DATE BETWEEN B.START_DATE AND B.END_DATE; Both A and B are topics in Kafka with 5 partitions.
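
For readability, the query from the email as a formatted sketch (column names as written in the email; DATE is a reserved word in Flink SQL, so it is escaped with backticks here). Because the date range is the only join condition, there is no key to partition on, which is why, as Yuval explains above, the interval join cannot be parallelized.

    -- Query from the original email, reformatted. The date range is the only
    -- join condition, so Flink has no equi-key to hash-partition on and the
    -- join effectively runs with parallelism 1.
    SELECT A.ID, B.NAME
    FROM A, B
    WHERE A.`DATE` BETWEEN B.START_DATE AND B.END_DATE;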