Re: Kafka and Flink's partitions

2016-08-29 Thread rss rss
Hello, thanks for the answer.
> 1. There is currently no way to avoid the repartitioning. When you do a keyBy(), Flink will shuffle the data through the network. What you would need is a way to tell Flink that the data is already partitioned. If you would use keyed state, you would also need t

Re: Kafka and Flink's partitions

2016-08-29 Thread Robert Metzger
Hi rss, Concerning your questions: 1. There is currently no way to avoid the repartitioning. When you do a keyBy(), Flink will shuffle the data through the network. What you would need is a way to tell Flink that the data is already partitioned. If you would use keyed state, you would also need to
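
To illustrate point 1, here is a minimal plain-Java sketch (no Flink dependency) of why a keyBy() generally has to move records between parallel subtasks. Everything in it is an assumption made for illustration: the class `KeyByShuffleSketch`, the helpers `targetSubtask` and `countMoved`, and the simple `hashCode`-modulo rule are hypothetical stand-ins, not Flink's actual partitioning logic. The idea is only that the subtask that happens to read a record from Kafka is usually not the subtask that the key maps to.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyByShuffleSketch {

    // Hypothetical partitioning rule: hash of the key modulo the parallelism.
    // This is a simplification and NOT how Flink really assigns keys.
    static int targetSubtask(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Counts records whose target subtask differs from the subtask that read
    // them, i.e. records that would have to travel over the network.
    static int countMoved(List<String> keys, int[] sourceSubtask, int parallelism) {
        int moved = 0;
        for (int i = 0; i < keys.size(); i++) {
            if (targetSubtask(keys.get(i), parallelism) != sourceSubtask[i]) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        List<String> keys = new ArrayList<>();
        int[] sourceSubtask = new int[8];
        for (int i = 0; i < 8; i++) {
            keys.add("user-" + i);
            // Assumed source layout: records arrive round-robin, unrelated to the key hash.
            sourceSubtask[i] = i % parallelism;
        }
        int moved = countMoved(keys, sourceSubtask, parallelism);
        System.out.println(moved + " of " + keys.size() + " records would be shuffled");
    }
}
```

If the source layout already matched the key-to-subtask mapping, `countMoved` would return 0, which is the situation where one would like to tell Flink that the data is already partitioned.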

Kafka and Flink's partitions

2016-08-25 Thread rss rss
Hello, I want to implement a processing scheme like the one shown in the following diagram. It calculates the number of unique users per specified time interval, assuming that we have > 100k events per second and > 100M unique users: I have one Kafka topic of events with a