Hello, Thanks for the reply, Any strategy you followed to avoid joins when you rewrite your pipeline?
On Tue, Aug 29, 2023 at 9:15 AM Sachin Mittal <sjmit...@gmail.com> wrote: > Yes even we faced the same issue when trying to run a pipeline involving > join of two collections. It was deployed using AWS KDA, which uses flink > runner. The source was kinesis streams. > > Looks like join operations are not very efficient in terms of size > management when run on flink. > > We had to rewrite our pipeline to avoid these joins. > > Thanks > Sachin > > > On Tue, 29 Aug 2023 at 7:00 PM, Ruben Vargas <ruben.var...@metova.com> > wrote: > >> Hello >> >> I experimenting an issue with my beam pipeline >> >> I have a pipeline in which I split the work into different branches, then >> I do a join using CoGroupByKey, each message has its own unique Key. >> >> For the Join, I used a Session Window, and discarding the messages after >> trigger. >> >> I'm using Flink Runner and deployed a KInesis application. But I'm >> experiencing an unbounded growth of the checkpoint data size. When I see >> in Flink console, the following task has the largest checkpoint >> >> join_results/GBK -> ToGBKResult -> >> join_results/ConstructCoGbkResultFn/ParMultiDo(ConstructCoGbkResult) -> V >> >> >> Any Advice ? >> >> Thank you very much! >> >>