Hello,

Thanks for the reply. Is there any strategy you followed to avoid joins
when you rewrote your pipeline?



On Tue, Aug 29, 2023 at 9:15 AM Sachin Mittal <sjmit...@gmail.com> wrote:

> Yes, we faced the same issue when trying to run a pipeline involving a
> join of two collections. It was deployed using AWS KDA, which uses the
> Flink runner. The source was Kinesis streams.
>
> It looks like join operations are not very efficient in terms of size
> management when run on Flink.
>
> We had to rewrite our pipeline to avoid these joins.
>
> Thanks
> Sachin
>
>
> On Tue, 29 Aug 2023 at 7:00 PM, Ruben Vargas <ruben.var...@metova.com>
> wrote:
>
>> Hello
>>
>> I am experiencing an issue with my Beam pipeline.
>>
>> I have a pipeline in which I split the work into different branches and
>> then join them using CoGroupByKey. Each message has its own unique key.
>>
>> For the join, I use a session window and discard the messages after the
>> trigger fires.
>>
>> I'm using the Flink runner, deployed as a Kinesis application, but I'm
>> experiencing unbounded growth of the checkpoint data size. In the Flink
>> console, the following task has the largest checkpoint:
>>
>> join_results/GBK -> ToGBKResult ->
>> join_results/ConstructCoGbkResultFn/ParMultiDo(ConstructCoGbkResult) -> V
>>
>>
>> Any advice?
>>
>> Thank you very much!
>>
>>
