Re: Low Performance in High Cardinality Big Window Application

2018-08-31 Thread Ning Shi
Hi Konstantin,

> could you replace the Kafka Source by a custom SourceFunction-implementation,
> which just produces the new events in a loop as fast as possible. This way we
> can rule out that the ingestion is responsible for the performance jump or
> the limit at 5000 events/s and can benchmark the Flink job …

Re: Low Performance in High Cardinality Big Window Application

2018-08-28 Thread Konstantin Knauf
Hi Ning, could you replace the Kafka Source by a custom SourceFunction-implementation, which just produces the new events in a loop as fast as possible. This way we can rule out that the ingestion is responsible for the performance jump or the limit at 5000 events/s and can benchmark the Flink job …
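The suggestion above is to swap the Kafka source for a generator that emits events in a tight loop, so ingestion is removed as a variable and only the job's processing rate is measured. In Flink this would be a custom Java `SourceFunction`; the following is just a minimal, self-contained Python sketch of the same benchmarking idea (the event schema and the 100k event count are made-up placeholders, not from the thread):

```python
import json
import time

def synthetic_source(n):
    """Stand-in for a custom SourceFunction: emit n JSON events in a loop,
    as fast as possible, with no external system (Kafka) involved."""
    for i in range(n):
        yield json.dumps({"id": i, "key": i % 1000, "value": i * 0.5})

def benchmark(process, n=100_000):
    """Feed n synthetic events through `process` and return events/second."""
    start = time.perf_counter()
    for raw in synthetic_source(n):
        process(json.loads(raw))  # deserialize + process, mirroring the job
    elapsed = time.perf_counter() - start
    return n / elapsed

# Example "job" body: deserialize and extract the key field.
rate = benchmark(lambda event: event["key"])
print(f"{rate:,.0f} events/s")
```

If the synthetic source sustains far more than 5000 events/s, the bottleneck is downstream of ingestion; if not, deserialization or the processing itself is the limit.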

Re: Low Performance in High Cardinality Big Window Application

2018-08-27 Thread Ning Shi
> If you have a window larger than hours then you need to rethink your
> architecture - this is not streaming anymore. Only because you receive events
> in a streamed fashion you don’t need to do all the processing in a streamed
> fashion.

Thanks for the thoughts, I’ll keep that in mind. However …

Re: Low Performance in High Cardinality Big Window Application

2018-08-26 Thread Jörn Franke
If you have a window larger than hours then you need to rethink your architecture - this is not streaming anymore. Only because you receive events in a streamed fashion you don’t need to do all the processing in a streamed fashion. Can you store the events in a file or a database and then do aft…

Low Performance in High Cardinality Big Window Application

2018-08-26 Thread Ning Shi
The application consumes from a single Kafka topic, deserializes the JSON payload into POJOs, and uses a big keyed window (30+ days) for deduplication, then emits the result for every single event to four other keyed windows for aggregation. It looks roughly like the following: Source -> KeyBy(A,B,C) …
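The topology described above keys events by (A, B, C), drops duplicates seen within the 30+ day retention, and fans surviving events out to downstream keyed aggregations. The actual job is a Flink DataStream program; the following is only a self-contained Python sketch of that dedup-then-aggregate logic (the tuple keys, timestamps, and single counter aggregation are illustrative placeholders, not the real schema):

```python
from collections import defaultdict

TTL = 30 * 24 * 3600  # 30-day retention, mirroring the 30+ day dedup window

class Deduplicator:
    """Keyed dedup: drop an event if its key was seen within the TTL."""
    def __init__(self, ttl=TTL):
        self.ttl = ttl
        self.seen = {}  # (A, B, C) key -> last-seen event time (seconds)

    def process(self, key, event_time):
        last = self.seen.get(key)
        self.seen[key] = event_time
        # Duplicate if the same key appeared within the TTL window.
        if last is not None and event_time - last < self.ttl:
            return False  # drop
        return True       # first occurrence: emit downstream

dedup = Deduplicator()
aggregates = defaultdict(int)  # stand-in for one of the four keyed aggregations

def pipeline(events):
    """events: iterable of ((A, B, C) key, event_time); unique events fan out."""
    for key, ts in events:
        if dedup.process(key, ts):
            aggregates[key[0]] += 1  # re-key and aggregate downstream

pipeline([(("a", "b", "c"), 0), (("a", "b", "c"), 10), (("x", "y", "z"), 5)])
print(dict(aggregates))  # -> {'a': 1, 'x': 1}; the repeat of ("a","b","c") is dropped
```

Note the state-size implication this sketch makes visible: the `seen` map must hold every distinct (A, B, C) key for 30+ days, which is exactly where high cardinality makes the job expensive.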