Your approach with ProcessFunction should work in general.

Can you guarantee that no event can arrive for a transaction for which an aggregated event
was already emitted?
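
For illustration, here is a minimal sketch of the pattern you described. It assumes a hypothetical Event POJO with getTransactionId() and isFinalEvent(), plus an AggregatedEvent.from(...) helper; those names are placeholders, not something from your mail or from the Flink API:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class TransactionAggregator extends ProcessFunction<Event, AggregatedEvent> {

    // Per-key (i.e. per transaction id) buffer of the events seen so far.
    private transient ListState<Event> buffered;

    @Override
    public void open(Configuration parameters) {
        buffered = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered-events", Event.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<AggregatedEvent> out) throws Exception {
        if (event.isFinalEvent()) {
            // The terminal event for this transaction arrived: emit the aggregate ...
            out.collect(AggregatedEvent.from(buffered.get(), event));
            // ... and clear the per-key state so the backend can drop this key.
            buffered.clear();
        } else {
            buffered.add(event);
        }
    }
}

// applied to the keyed stream:
// stream.keyBy(Event::getTransactionId)
//       .process(new TransactionAggregator());

Clearing the ListState in the branch that emits the aggregate is what allows the state backend to drop the entry for that (now finished) transaction key. If an event could still arrive after the aggregate was emitted, the state would simply be re-created for that key and never cleaned up again, which is why I asked about the guarantee above.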

On 26.11.2017 18:22, Lothium wrote:
Hi,
I have a question about a specific use case and hope that you can help me with
it. I have a streaming pipeline and receive events of different types. I
parse the events into their internal representation and do some
transformations on them. Some of these events I want to collect internally
(grouped by a transaction id), and as soon as a specific event arrives, I
want to emit, for example, an aggregated event to the downstream operators (so
this is not bound to time or a count).
I thought about keying the stream by some characteristic (so that all the
needed events end up in the same logical partition), collecting them in a stateful
process function, and emitting the result after the specific event has arrived.

Besides the event types, I also have to key the stream by a transaction
id, which all of these events belong to (the transaction id is present in all of
the events). The transaction id is unique and will only occur once, so I
will have a lot of unique, short-lived keys.

I would clear the state of the process function after emitting the
aggregated event downstream, which should hopefully release the state and
clean it up. Is that correct? Would there be a problem because of the
many keys I would use (each of which occurs only once), or would Flink
release the resources (regarding memory usage etc.)? Is this the right
way to handle this use case?

Thanks!




