Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Benchao Li
Broadcast streaming join is a very interesting addition to streaming SQL, I'm glad to see it's been brought up. One of the major difference between streaming and batch is state. Regular join uses "Keyed State" (the key is deduced from join condition), so for a regular broadcast streaming join, we

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Shengkai Fang
+1 a FLIP to clarify the idea. Please be careful to choose which type of state you use here. The doc[1] says the broadcast state doesn't support RocksDB backend here. Best, Shengkai [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/broadcast_state/#impo

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread David Anderson
I've seen enough demand for a streaming broadcast join in the community to justify a FLIP -- I think it's a good idea, and look forward to the discussion. David On Fri, Feb 2, 2024 at 6:55 AM Feng Jin wrote: > +1 a FLIP for this topic. > > > Best, > Feng > > On Fri, Feb 2, 2024 at 10:26 PM Mart

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Feng Jin
+1 a FLIP for this topic. Best, Feng On Fri, Feb 2, 2024 at 10:26 PM Martijn Visser wrote: > Hi, > > I would definitely expect a FLIP on this topic before moving to > implementation. > > Best regards, > > Martijn > > On Fri, Feb 2, 2024 at 12:47 PM Xuyang wrote: > >> Hi, Prabhjot. >> >> IIUC,

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Martijn Visser
Hi, I would definitely expect a FLIP on this topic before moving to implementation. Best regards, Martijn On Fri, Feb 2, 2024 at 12:47 PM Xuyang wrote: > Hi, Prabhjot. > > IIUC, the main reasons why the community has not previously considered > supporting join hints only in batch mode are as

Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Prabhjot Bharaj via user
Hi Feng, Thanks for your prompt response. If we were to solve this in Flink, my higher level viewpoint is: 1. First to implement Broadcast join in Flink Streaming SQL, that works across Table api (e.g. via a `left.join(right, , join_type="broadcast") 2. Then, support a Broadcast hint that would u

Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Feng Jin
Hi Prabhjot I think this is a reasonable scenario. If there is a large table and a very small table for regular join, without broadcasting the regular join, it can easily cause data skew. We have also encountered similar problems too. Currently, we can only copy multiple copies of the small table