Re: Joining and Grouping Flink Tables with Java API

2021-02-15 Thread Timo Walther
Hi, yes, we can confirm that your program has the behavior you mentioned. Since we don't use any type of time operation or windowing, your query has updating semantics. State is used for keeping the LAST_VALUEs as well as the full input tables of the JOIN. You can achieve the same with a Key

Re: Joining and Grouping Flink Tables with Java API

2021-02-15 Thread Abdelilah CHOUKRI
Thank you guys for the interest, feedback and advies, Just to clarify further on the why we used tables with grouping, Form each DataStream we only interested in the last updated or new Event, Also, we need to have ALL the previous Events stored in order to identify if the incoming event is a new

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Arvid Heise
Hi Abdelilah, you are right that union does not work (well) in your case. I misunderstood the relation between the two streams. The ideal pattern would be a broadcast join imho. [1] I'm not sure how to do it in Table API/SQL though, but I hope Timo can help here as well. [1] https://ci.apache.or

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Timo Walther
After thinking about this topic again, I think UNION ALL will not solve the problem because you would need to group by brandId and perform the joining within the aggregate function which could also be quite expensive. Regards, Timo On 11.02.21 17:16, Timo Walther wrote: Hi Abdelilah, at a fi

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Timo Walther
Hi Abdelilah, at a first glance your logic seems to be correct. But Arvid is right that your pipeline might not have the optimal performance that Flink can offer due to the 3 groupBy operations. I'm wondering what the optimizer produces out of this plan. Maybe you can share it with us using `

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Arvid Heise
Hi Abdelilah, I think your approach is overly complicated (and probably slow) but I might have misunderstood things. Naively, I'd assume that you just want to union stream 1 and stream 2 instead of joining. Note that for union the events must have the same schema, so you most likely want to have a

Joining and Grouping Flink Tables with Java API

2021-02-08 Thread Abdelilah CHOUKRI
Hi, We're trying to use Flink 1.11 Java tables API to process a streaming use case: We have 2 streams, each one with different structures. Both events, coming from Kafka, can be: - A new event (not in the system already) - An updated event (updating an event that previously was inserted) so we on