Re: Efficient grouping and parallelism on skewed data

2017-08-18 Thread Jakes John
Thanks for your reply. I don't need any special aggregation. My only requirement is that every message in Kafka with a particular id is written into a corresponding index in Elasticsearch. (I might need to enrich each message before writing it into ES, but there are no aggregations on the incoming stream.)
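Because the requirement is per-message routing with optional enrichment and no aggregation, each record can be handled on its own. The following plain-Python sketch illustrates that idea; it is not Flink API code. The `index-<id>` naming scheme and the `enrich` step are hypothetical placeholders. In a Flink job the same logic would roughly correspond to a stateless map followed by an Elasticsearch sink that chooses the target index for each record when it builds the index request.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def enrich(message: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical per-message enrichment; here it only adds one field."""
    enriched = dict(message)  # copy so the input record is not mutated
    enriched["enriched"] = True
    return enriched


def index_for(message: Dict[str, Any]) -> str:
    """Hypothetical naming scheme: one Elasticsearch index per id value."""
    return f"index-{message['id']}"


def route(messages: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Stateless routing: each message is enriched and tagged with its target
    index independently, so no grouping by id is needed beforehand."""
    for message in messages:
        doc = enrich(message)
        yield index_for(doc), doc


if __name__ == "__main__":
    sample = [{"id": 1, "v": "a"}, {"id": 7, "v": "b"}, {"id": 1, "v": "c"}]
    for index, doc in route(sample):
        print(index, doc)
```

Since no record depends on any other, the routing step never has to bring all messages with the same id together.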

Re: Efficient grouping and parallelism on skewed data

2017-08-17 Thread Tzu-Li (Gordon) Tai
Hi John, do you need to do any sort of grouping on the keys and aggregation? Or are you simply using Flink to route the Kafka messages to different Elasticsearch indices? For the following, I'm assuming the latter: if there's no need for aggregate computation per key, what you can do is simply

Efficient grouping and parallelism on skewed data

2017-08-17 Thread Jakes John
Can someone help me figure out how to implement this in Flink? I have to create a pipeline Kafka -> Flink -> Elasticsearch. I have high-throughput data coming into Kafka. All messages in Kafka have a key called 'id' whose value is an integer ranging from 1 to N. N is dynamic, with a maximum value of 100. The n
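The concern in the subject line is skew. If the stream is partitioned by `id`, every record with the same id is sent to the same parallel subtask, so a dominant id overloads one subtask while the others sit mostly idle. The plain-Python simulation below (not Flink code) compares the per-subtask load for key-based partitioning against round-robin distribution on a hypothetical skewed stream. The `id % parallelism` assignment is a simplification of Flink's actual key-group hashing, and the stream contents are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def keyed_load(ids: List[int], parallelism: int) -> Dict[int, int]:
    """Records per subtask when partitioning by key.

    Simplified stand-in: subtask = id % parallelism. Flink's real key-group
    hashing differs, but the effect of skew is the same: every record with
    a given key lands on a single subtask.
    """
    return dict(Counter(i % parallelism for i in ids))


def round_robin_load(ids: List[int], parallelism: int) -> Dict[int, int]:
    """Records per subtask when records are dealt out in turn, ignoring the key."""
    return dict(Counter(pos % parallelism for pos in range(len(ids))))


if __name__ == "__main__":
    # Hypothetical skewed stream: id 1 dominates, ids 2..10 are rare.
    ids = [1] * 910 + list(range(2, 11)) * 10  # 1000 records total
    print("keyed:      ", keyed_load(ids, 4))
    print("round-robin:", round_robin_load(ids, 4))
```

Distributing records round-robin, which is what Flink's `rebalance()` does, is only an option when, as discussed in this thread, there is no per-key state or aggregation. The target index can still be chosen per record at the sink.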