Hi Jason,

When using interval joins, Flink can't parallelize the execution as the
join key (semantically) is even time, thus all events must fall into the
same partition for Flink to be able to lookup events from the two streams.
See the IntervalJoinOperator (
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/co/IntervalJoinOperator.java)
for the details.

If you have any other key(s) that could join both streams, and then filter
by the time of the event at a later phase, that could speed things up.

On Fri, May 20, 2022, 23:53 Jason Politis <jpoli...@carrera.io> wrote:

> Good evening all,
>
> We are working on a project where a few queries that are joining based on
> dates from table A are between dates from table B.  Something like:
>
> SELECT
> A.ID,
> B.NAME
> FROM
> A,
> B
> WHERE
> A.DATE BETWEEN B.START_DATE AND B.END_DATE;
>
> Both A and B are topics in Kafka with 5 partitions.  Doing a simple test
> of selecting * on each one will yield tasks with a parallelism of 5, which
> tells me we have parallelism working in flink.
>
> BUT, when we attempt the query I pasted above, the BETWEEN clause doesn't
> parallelize.
>
> I'd like to get your expert opinion on this and get your help on how to
> force this to parallelize.
>
> Thank you
>
>
> Jason Politis
> Solutions Architect, Carrera Group
> carrera.io
> | jpoli...@carrera.io <kpatter...@carrera.io>
> <http://us.linkedin.com/in/jasonpolitis>
>

Reply via email to