Hi Jason,
The idea behind a more scalable solution is to be able to first partition
the data. For example, if we have a stream of orders with some order id
contained in both streams, and event time lookups are within a specific
order, we can introduce partitioning and thus increase parallelism.
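To make the idea concrete, here is a plain-Python sketch (not Flink code; field names like order_id and t are illustrative) of hash-partitioning both streams by order id and then doing the event-time interval lookup independently inside each partition:

```python
# Plain-Python sketch of the partitioning idea: if both streams share an
# order_id, hash-partition by it, then run the event-time interval lookup
# independently per partition (each partition could be a separate task).
from collections import defaultdict

def partition(events, parallelism, key="order_id"):
    """Route each event to a partition by hashing its key."""
    partitions = defaultdict(list)
    for e in events:
        partitions[hash(e[key]) % parallelism].append(e)
    return partitions

def interval_join(left, right, lower, upper):
    """Within one partition: pair events of the same order whose
    timestamps fall within the [lower, upper] interval."""
    out = []
    for l in left:
        for r in right:
            if l["order_id"] == r["order_id"] and lower <= r["t"] - l["t"] <= upper:
                out.append((l["order_id"], l["t"], r["t"]))
    return out

# Two toy streams of order events.
orders = [{"order_id": "o1", "t": 10}, {"order_id": "o2", "t": 20}]
shipments = [{"order_id": "o1", "t": 12}, {"order_id": "o2", "t": 45}]

lp = partition(orders, parallelism=4)
rp = partition(shipments, parallelism=4)

# Because both events of an order hash to the same partition, each
# partition can be joined in isolation, in parallel.
results = []
for p in set(lp) | set(rp):
    results.extend(interval_join(lp.get(p, []), rp.get(p, []), lower=0, upper=5))

print(sorted(results))  # only o1's shipment lands within the 5-unit window
```

The key property is that all events of one order land in the same partition, so no partition ever needs to see another partition's data.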
Thank you Yuval! Will look into it first thing Monday morning, much
appreciated. In the case where we aren't able to filter by anything else,
is there anything else we could potentially look into to help?
Thank you
Jason Politis
Solutions Architect, Carrera Group
carrera.io
| jpoli...@car
Hi Jason,
When using interval joins, Flink can't parallelize the execution, as the
join key (semantically) is event time, so all events must fall into the
same partition for Flink to be able to look up events from the two streams.
See the IntervalJoinOperator (
https://github.com/apache/flink/blob/
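To illustrate why the time relation itself can't serve as a partition key, here is a plain-Python sketch (not Flink internals; the modulo partitioner and field names are illustrative): a single row of A can match B rows whose timestamps scatter across partitions, so no hash on time co-locates all potential matches.

```python
# Why a BETWEEN predicate can't be used as a hash key: one row of A may
# match B rows whose timestamps hash to different partitions, so
# co-locating matching rows by time alone is impossible.
def part(ts, n=4):
    return ts % n  # toy partitioner on a timestamp

a_row = {"date": 100}
b_rows = [{"start": 90, "end": 110}, {"start": 95, "end": 300}]

# Both B rows match a_row's date...
matches = [b for b in b_rows if b["start"] <= a_row["date"] <= b["end"]]

# ...yet partitioning by their start times scatters them, so no single
# partition is guaranteed to hold all rows that a_row could join with.
parts = {part(b["start"]) for b in matches}
print(len(matches), sorted(parts))
```

This is why the whole join collapses to a single partition unless an additional equi-join key (like an order id) is available.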
Good evening all,
We are working on a project with a few queries that join on dates: a date
from table A must fall between two dates from table B. Something like:
SELECT
A.ID,
B.NAME
FROM
A,
B
WHERE
A.DATE BETWEEN B.START_DATE AND B.END_DATE;
Both A and B are topics in Kafka with 5 partitions. Doi