Hello everybody, I'm using the Flink Kafka consumer 0.8.x with Kafka 0.8.2 and Flink 1.0.3 on YARN. In Kafka I have a topic which has 20 partitions, and my Flink topology reads from Kafka (source) and writes to HBase (sink).
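To make the setup concrete, here is a rough sketch of how I picture the 20 partitions being spread over the source subtasks for a given source parallelism. The modulo assignment and the function names `assign_partitions` and `idle_subtasks` are my own assumptions for illustration, not taken from the Flink connector code; the only point is that any subtask beyond the partition count ends up with nothing to read.

```python
def assign_partitions(num_partitions, parallelism):
    """Hypothetical mapping: partition p is read by subtask p % parallelism."""
    assignment = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


def idle_subtasks(assignment):
    """Return the subtasks that were given no partition at all."""
    return [subtask for subtask, parts in assignment.items() if not parts]


if __name__ == "__main__":
    for parallelism in (20, 40):
        assignment = assign_partitions(20, parallelism)
        idle = idle_subtasks(assignment)
        print(f"parallelism={parallelism}: {len(idle)} idle source subtasks")
```

Under this assumed mapping, a parallelism of 20 gives every subtask exactly one partition, while a parallelism of 40 leaves half of the subtasks without any partition.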
When:
1. the Flink source has parallelism set to 40 (20 of the tasks are idle), I see 10,000 requests/sec on HBase
2. the Flink source has parallelism set to 20 (the exact number of partitions), I see 100,000 requests/sec on HBase (so a 10x improvement)

It's clear that HBase is not the limiting factor in my topology.

Assumption: Flink's backpressure mechanism kicks in more aggressively in case 1 and limits the ingestion of tuples into the topology.

The question: in the first case, why do the 20 sources that are sitting idle contribute so much to the backpressure?

Thanks guys!