GitHub user guozhangwang opened a pull request: https://github.com/apache/kafka/pull/3748
KAFKA-5797: Delay checking of partition existence in StoreChangelogReader 1. Remove timeout-based validatePartitionExists from StoreChangelogReader; instead only try to refresh metadata once after all tasks have been created and their topology initialized (hence all stores have been registered). 2. Add the logic to refresh partition metadata at the end of initialization if some restorers needing initialization cannot find their changelogs, hoping that in the next run loop these stores can find their changelogs. As a result, we would not ever call `consumer#partitionsFor` any more, but only `consumer#listTopics`; so the only blocking calls left are `listTopics` and `endOffsets, and we always capture timeout exceptions around these two calls, and delay to retry in the next run loop after refreshing the metadata. By doing this we can also reduce the number of request round trips between consumer and brokers. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guozhangwang/kafka K5797-handle-metadata-available Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3748.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3748 ---- commit a1cbd208007a0a5e73ff917987a457662554c04c Author: Guozhang Wang <wangg...@gmail.com> Date: 2017-08-27T03:47:01Z handlg timeout exception commit 2348799f54722c67f9837133a939e3f982b543d9 Author: Guozhang Wang <wangg...@gmail.com> Date: 2017-08-27T05:25:21Z fix unit tests ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---