GitHub user NicoK opened a pull request: https://github.com/apache/flink/pull/5661
[FLINK-8896][kafka08] fix Kafka08Fetcher trying to look up topic "n/a" on partiton "-1" ## What is the purpose of the change `Kafka08Fetcher` uses a `MARKER` to make sure the main thread wakes up when cancelling. While looking up partition leaders, this marker is removed only once from the list of partitions to look up and there is a code path that leads to two markers being present in which case the lookup will throw an exception: - `FlinkKafkaConsumerBase#cancel()` is called in one thread, stopped right after setting `running` to false, and then - `FlinkKafkaConsumerBase`'s partition discovery loop thread drops out before the first thread was able to call `Kafka08Fetcher#cancel`. ## Brief change log - remove all markers in the list, not just one - make `FlinkKafkaConsumerBase`'s partition discovery loop not call `cancel()` if already cancelled ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? **JavaDocs** You can merge this pull request into a Git repository by running: $ git pull https://github.com/NicoK/flink flink-8896 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5661.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5661 ---- commit e9711a61be0c260629c9f2f34c03af6e1aa9ac61 Author: Nico Kruber <nico@...> Date: 2018-03-08T11:22:32Z [FLINK-8896][kafka08] remove all cancel MARKERs before trying to find partition leaders This guards us against #cancel() being called multiple times and then trying to look up an invalid topic/partition pair. commit bffe96f5e3522824656ed55074ac09591a36e2ae Author: Nico Kruber <nico@...> Date: 2018-03-08T11:23:47Z [hotfix][kafka] do not run cancel() in the discovery loop if already cancelled ---- ---