Broadcast streaming join is a very interesting addition to streaming
SQL, I'm glad to see it's been brought up.
One of the major difference between streaming and batch is state.
Regular join uses "Keyed State" (the key is deduced from join
condition), so for a regular broadcast streaming join, we
+1 a FLIP to clarify the idea.
Please be careful to choose which type of state you use here. The doc[1]
says the broadcast state doesn't support RocksDB backend here.
Best,
Shengkai
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/broadcast_state/#impo
I've seen enough demand for a streaming broadcast join in the community to
justify a FLIP -- I think it's a good idea, and look forward to the
discussion.
David
On Fri, Feb 2, 2024 at 6:55 AM Feng Jin wrote:
> +1 a FLIP for this topic.
>
>
> Best,
> Feng
>
> On Fri, Feb 2, 2024 at 10:26 PM Mart
+1 a FLIP for this topic.
Best,
Feng
On Fri, Feb 2, 2024 at 10:26 PM Martijn Visser
wrote:
> Hi,
>
> I would definitely expect a FLIP on this topic before moving to
> implementation.
>
> Best regards,
>
> Martijn
>
> On Fri, Feb 2, 2024 at 12:47 PM Xuyang wrote:
>
>> Hi, Prabhjot.
>>
>> IIUC,
Hi,
I would definitely expect a FLIP on this topic before moving to
implementation.
Best regards,
Martijn
On Fri, Feb 2, 2024 at 12:47 PM Xuyang wrote:
> Hi, Prabhjot.
>
> IIUC, the main reasons why the community has not previously considered
> supporting join hints only in batch mode are as
Hi, Prabhjot.
IIUC, the main reasons why the community has not previously considered
supporting join hints only in batch mode are as follows:
1. In batch mode, multiple join type algorithms were implemented quite early
on, and
2. Stream processing represents a long-running scenario, and it i
Hi Feng,
Thanks for your prompt response.
If we were to solve this in Flink, my higher level viewpoint is:
1. First to implement Broadcast join in Flink Streaming SQL, that works
across Table api (e.g. via a `left.join(right, ,
join_type="broadcast")
2. Then, support a Broadcast hint that would u
Hi Prabhjot
I think this is a reasonable scenario. If there is a large table and a very
small table for regular join, without broadcasting the regular join, it can
easily cause data skew.
We have also encountered similar problems too. Currently, we can only copy
multiple copies of the small table
Hello folks,
We have a use case where we have a few stream-stream joins, requiring us to
join a very large table with a much smaller table, essentially enriching
the large table with a permutation on the smaller table (Consider deriving
all orders/sessions for a new location). Given the nature of