Hi Konstantin,
> could you replace the Kafka Source by a custom SourceFunction-implementation,
> which just produces the new events in a loop as fast as possible. This way we
> can rule out that the ingestion is responsible for the performance jump or
> the limit at 5000 events/s and can benchmark the Flink job.
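For reference, a minimal sketch of such a generator source, assuming a hypothetical Event POJO keyed by fields a, b and c (the real event type and values will differ):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class GeneratorSource implements SourceFunction<GeneratorSource.Event> {

    // Hypothetical stand-in for the real deserialized event POJO,
    // keyed by fields a, b and c.
    public static class Event {
        public String a;
        public String b;
        public String c;

        public Event() {}

        public Event(String a, String b, String c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }
    }

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Event> ctx) throws Exception {
        long i = 0;
        while (running) {
            Event event = new Event("a" + (i % 1000), "b" + (i % 100), "c" + i);
            // Emit under the checkpoint lock so emission does not interleave
            // with checkpointing.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(event);
            }
            i++;
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

Swapping the Kafka consumer for env.addSource(new GeneratorSource()), while keeping the rest of the job unchanged, would then show whether the ~5000 events/s ceiling comes from ingestion or from the job itself.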
> If you have a window larger than hours, then you need to rethink your
> architecture - this is not streaming anymore. Just because you receive events
> in a streamed fashion does not mean you have to do all the processing in a
> streamed fashion.
> Can you store the events in a file or a database and then do aft[...]

Thanks for the thoughts, I'll keep that in mind. However [...]
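For what it's worth, the "store the events in a file" part of that suggestion could be as simple as attaching a file sink to the raw stream and doing the long-window deduplication/aggregation in a separate job later. A rough sketch, assuming a Flink version that ships StreamingFileSink and using a placeholder source and output path:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class RawEventDump {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source; in the real job this would be the Kafka source
        // emitting the raw JSON strings before deserialization.
        DataStream<String> rawJson = env.fromElements("{\"a\":1}", "{\"a\":2}");

        // Append the raw events to files so the long-window work can be done
        // later in a separate job; the output path is a placeholder.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("/data/raw-events"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        rawJson.addSink(sink);
        env.execute("dump raw events");
    }
}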
The application consumes from a single Kafka topic, deserializes the
JSON payload into POJOs, uses a big keyed window (30+ days) for
deduplication, and then emits the result for every single event to four
other keyed windows for aggregation. It looks roughly like the
following.
Source->KeyBy(A,B,C)