session window question

2022-04-27 Thread Sigalit Eliazov
Hi all, I have the following scenario: a. A pipeline that reads messages from Kafka and applies a session window with a 1-minute duration. b. A GroupByKey in order to aggregate the data. c. For each 'group' I do some calculation and build a new event to send to Kafka. The output of this cycle is key1 - value1
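
For readers unfamiliar with the scenario, here is a minimal, Beam-free sketch of what steps a and b amount to. It assumes the "1 minute duration" is the session gap, and it models events as hypothetical (key, timestamp_in_seconds, value) tuples; the function name `sessionize` is illustrative and is not part of any Beam API.

```python
from collections import defaultdict

GAP_SECONDS = 60  # assumption: "1 minute duration" means the session gap


def sessionize(events, gap=GAP_SECONDS):
    """Group (key, timestamp, value) events into per-key sessions.

    A new session starts for a key whenever the time since that key's
    previous event is at least `gap` seconds.
    """
    by_key = defaultdict(list)
    for key, ts, value in events:
        by_key[key].append((ts, value))

    sessions = []
    for key, items in by_key.items():
        items.sort(key=lambda item: item[0])  # order each key's events by time
        current = [items[0][1]]
        last_ts = items[0][0]
        for ts, value in items[1:]:
            if ts - last_ts >= gap:
                # Gap exceeded: close the current session and start a new one.
                sessions.append((key, current))
                current = []
            current.append(value)
            last_ts = ts
        sessions.append((key, current))
    return sessions


events = [("key1", 0, "a"), ("key1", 30, "b"), ("key1", 200, "c"), ("key2", 10, "x")]
sessions = sessionize(events)
```

Each (key, values) pair in the result corresponds to one 'group' on which the per-group calculation of step c would run. A real pipeline would rely on the runner's windowing and grouping rather than on code like this.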

Re: session window question

2022-04-27 Thread Jan Lukavský
Hi Sigalit, if I understand correctly, what you want is a deduplication of outputs coming from your GBK logic, is that correct? If so, you can use a stateful DoFn and a ValueState holding the last value seen per key in global window. There is an implementation of this approach in Deduplicate.
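
The "last value seen per key" idea can be illustrated without Beam. The sketch below is only a model of the logic, not the Deduplicate implementation: a plain dict stands in for the per-key ValueState, and the function name `emit_on_change` is hypothetical. It assumes the goal is to suppress an output when it equals the previous output for the same key.

```python
def emit_on_change(keyed_values):
    """Yield (key, value) only when value differs from the last one seen for key."""
    last_seen = {}  # stands in for per-key ValueState in the global window
    for key, value in keyed_values:
        if key in last_seen and last_seen[key] == value:
            continue  # same as the previous output for this key: drop it
        last_seen[key] = value  # remember the newest value for this key
        yield key, value


outputs = list(emit_on_change([
    ("key1", "value1"),
    ("key1", "value1"),  # repeat of the previous value for key1, dropped
    ("key2", "value2"),
    ("key1", "value3"),
    ("key1", "value1"),  # differs from the last seen value3, so it is emitted
]))
```

In an actual stateful DoFn the runner scopes the state to each key, so no explicit dict is needed; the comparison and write-back steps are the part this sketch is meant to show.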

Re: session window question

2022-04-27 Thread Sigalit Eliazov
Thanks a lot. It solved my issue.

Latest Keynotes and sessions confirmed

2022-04-27 Thread Carolina Escobar
Beam Summit program is ready! Hello Apache Beam community, We are very excited to share that the Beam Summit program is ready! Beam Summit will be a hybrid event, having hybrid talks on Monday and Tuesday,

Slow Beam pipeline gets Flink checkpoint timeouts upon Kafka messages

2022-04-27 Thread Deepak Nagaraj
Hi Beam team, We're seeing Apache Beam have checkpoint timeouts on Flink. They happen when the pipeline has a slow step and we send a bunch of messages on Kafka. I have set up a similar pipeline on my laptop that reproduces the problem. Pipeline details: - * Python, running a Be