Hello everybody,
I'm using the Flink Kafka consumer 0.8.x with Kafka 0.8.2 and Flink 1.0.3 on YARN.
In Kafka I have a topic which has 20 partitions, and my Flink topology reads
from Kafka (source) and writes to HBase (sink).
When:
1. the Flink source has parallelism set to 40 (20 of the tasks are idle), I see
10,000 requests/sec on HBase
2. the Flink source has parallelism set to 20 (the exact number of partitions), I
see 100,000 requests/sec on HBase (so a 10x improvement)
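To make the two setups concrete, here is a small sketch of how I picture the partitions being spread over the source subtasks. It assumes a plain round-robin assignment (partition index modulo source parallelism); that is only my assumption for illustration, not something I have checked in the consumer's code, and `assign_partitions` and `idle_subtasks` are hypothetical helper names.

```python
def assign_partitions(num_partitions, parallelism):
    """Map each source subtask index to the partition ids it would read.

    Assumption: round-robin assignment (partition index modulo parallelism).
    """
    assignment = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


def idle_subtasks(assignment):
    """Return the subtask indices that received no partition."""
    return [subtask for subtask, parts in assignment.items() if not parts]


# Compare the two setups from the list above: 20 partitions, parallelism 40 vs 20.
for parallelism in (40, 20):
    idle = idle_subtasks(assign_partitions(20, parallelism))
    print(f"parallelism={parallelism}: {len(idle)} idle source subtasks")
```

Under that assumption, parallelism 40 leaves 20 subtasks without any partition, while parallelism 20 gives every subtask exactly one.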
It's clear that HBase is not the limiting factor in my topology, since it
handles the higher request rate in case 2.
Assumption: Flink's backpressure mechanism kicks in more aggressively in
case 1, and it limits the ingestion of tuples into the topology.
The question: in the first case, why do the 20 sources that are sitting
idle contribute so much to the backpressure?
Thanks guys!