Your approach with ProcessFunction should work in general.
Can you guarantee that no event can arrive for a transaction for which an aggregated event
was already emitted?

On 26.11.2017 18:22, Lothium wrote:
Hi, I have a question about a specific use case and hope you can help me with it. I have a streaming pipeline and receive events of different types. I parse the events into their internal representation and apply some transformations. Some of these events I want to collect internally (grouped by a transaction id), and as soon as a specific event arrives, I want to emit, for example, an aggregated event to the downstream operators (so this is not bound to time or a count).

I thought about keying the stream by some characteristic (so that all the needed events land in the same logical partition), collecting them in a stateful process function, and emitting the aggregate after the specific event has arrived. Besides the event types, I also have to key the stream by a transaction id, to which all of these events belong (the transaction id is present in all of the events). The transaction id is unique and will only occur once, so I will have a lot of unique, short-lived keys. I would clear the state of the process function after emitting the aggregated event downstream, which should hopefully release the state and clean it up.

Is that correct? Would there be a problem because of the many keys (each used only once), or would Flink release the resources (regarding memory usage etc.)? Is this the right way to handle this use case?

Thanks!

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
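To illustrate both the approach and the concern above, here is a minimal sketch of a keyed ProcessFunction that buffers events per transaction id in ListState, emits an aggregate when the final event arrives, and clears the state. The class and field names, the placeholder Event/AggregatedEvent types, and the cleanup delay are assumptions for illustration, not from the thread; the processing-time timer is one possible safety net for transactions whose final event never arrives (the case asked about above), not necessarily required if that case cannot happen.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder event types (assumptions; substitute your own classes).
class Event {
    public String transactionId;
    public boolean finalEvent; // marks the "specific event" that triggers emission
}

class AggregatedEvent {
    public String transactionId;
}

public class TxAggregateFunction extends ProcessFunction<Event, AggregatedEvent> {

    // Assumption: a transaction always completes well within this bound; the
    // timer below only cleans up state for transactions that never finish.
    private static final long CLEANUP_DELAY_MS = 60 * 60 * 1000L;

    private transient ListState<Event> buffered;

    @Override
    public void open(Configuration parameters) {
        buffered = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered-events", Event.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<AggregatedEvent> out)
            throws Exception {
        if (event.finalEvent) {
            // The specific event arrived: emit the aggregate for this
            // transaction and drop the buffered state for its key.
            out.collect(aggregate(buffered.get(), event));
            buffered.clear();
        } else {
            buffered.add(event);
            // Safety net: if the final event never arrives, clear the state
            // after a delay so abandoned transaction keys do not leak memory.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + CLEANUP_DELAY_MS);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<AggregatedEvent> out) {
        // clear() is idempotent, so a timer firing after a normal emission is a no-op.
        buffered.clear();
    }

    private AggregatedEvent aggregate(Iterable<Event> events, Event last) {
        AggregatedEvent result = new AggregatedEvent();
        result.transactionId = last.transactionId;
        // ... combine the buffered events (application-specific)
        return result;
    }
}

You would apply it on a stream keyed by the transaction id, e.g. events.keyBy(e -> e.transactionId).process(new TxAggregateFunction()). Since keyed state lives per key, the large number of unique, short-lived keys is fine as long as clear() actually runs for each transaction: once cleared, a key holds no state and does not keep occupying memory.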