Yes, we faced the same issue when trying to run a pipeline involving a join of two collections. It was deployed using AWS KDA, which uses the Flink runner. The source was Kinesis streams.
It looks like join operations are not very efficient in terms of state size management when run on Flink. We had to rewrite our pipeline to avoid these joins.

Thanks
Sachin

On Tue, 29 Aug 2023 at 7:00 PM, Ruben Vargas <ruben.var...@metova.com> wrote:
> Hello
>
> I am experiencing an issue with my Beam pipeline.
>
> I have a pipeline in which I split the work into different branches, then
> I do a join using CoGroupByKey; each message has its own unique key.
>
> For the join, I used a session window, discarding the panes after the
> trigger fires.
>
> I'm using the Flink runner, deployed as a Kinesis application, but I'm
> experiencing unbounded growth of the checkpoint data size. In the Flink
> console, the following task has the largest checkpoint:
>
> join_results/GBK -> ToGBKResult ->
> join_results/ConstructCoGbkResultFn/ParMultiDo(ConstructCoGbkResult) -> V
>
> Any advice?
>
> Thank you very much!
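
For reference, here is a minimal sketch (not the actual pipeline; the branch names, one-minute session gap, and per-element trigger are assumptions) of the kind of CoGroupByKey join described in the quoted message: two branches keyed by a unique id, session-windowed with discarding fired panes, then joined.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.joda.time.Duration;

public class CoGbkJoinSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Hypothetical branches; in the real pipeline these would come from the
    // Kinesis source and be keyed by each message's unique id.
    PCollection<KV<String, String>> branchA =
        p.apply("BranchA", Create.of(KV.of("id-1", "a")));
    PCollection<KV<String, String>> branchB =
        p.apply("BranchB", Create.of(KV.of("id-1", "b")));

    // Session window with a per-element trigger and discarding fired panes,
    // roughly matching the setup described in the thread (gap duration assumed).
    Window<KV<String, String>> windowing =
        Window.<KV<String, String>>into(
                Sessions.withGapDuration(Duration.standardMinutes(1)))
            .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes();

    TupleTag<String> tagA = new TupleTag<String>() {};
    TupleTag<String> tagB = new TupleTag<String>() {};

    // The CoGroupByKey join; this is the stage whose checkpoint grows in the
    // reported pipeline (join_results/GBK -> ... ConstructCoGbkResultFn).
    PCollection<KV<String, CoGbkResult>> joined =
        KeyedPCollectionTuple.of(tagA, branchA.apply("WindowA", windowing))
            .and(tagB, branchB.apply("WindowB", windowing))
            .apply("JoinResults", CoGroupByKey.create());

    joined.apply("Consume", ParDo.of(new DoFn<KV<String, CoGbkResult>, Void>() {
      @ProcessElement
      public void processElement(@Element KV<String, CoGbkResult> e) {
        // e.getValue().getAll(tagA) / getAll(tagB) hold the matched elements.
      }
    }));

    p.run().waitUntilFinish();
  }
}
```

Even with discarding fired panes, the grouping state behind the CoGroupByKey must be kept by the runner until the session window closes and its watermark-based garbage collection runs, which is one place where checkpoint size can keep growing if windows never expire.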