Shuyi Chen created FLINK-37366:
----------------------------------

             Summary: Allow configurable retry for Kafka topic metadata fetch
                 Key: FLINK-37366
                 URL: https://issues.apache.org/jira/browse/FLINK-37366
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Kafka
            Reporter: Shuyi Chen

For high availability, we adopted a multi-primary Kafka cluster setup, so the data of a Kafka topic resides in multiple physical clusters. If one Kafka cluster fails, the Flink pipeline should continue to run without failing. Currently, the pipeline fails because SubscriberUtils.getTopicMetadata() throws a RuntimeException when a Kafka cluster is unavailable, which causes the pipeline to keep restarting.

We propose adding a configurable retry policy to SubscriberUtils.getTopicMetadata(), so that the Flink Kafka connector can be configured to tolerate a Kafka failure for a longer period without restarting.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
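
The retry policy proposed in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: the class MetadataRetry, the method callWithRetry, and the parameters maxAttempts, initialBackoff and backoffMultiplier are hypothetical names, not part of the Flink Kafka connector, and the metadata fetch is modelled as a plain Supplier rather than the connector's actual call.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of the Flink Kafka connector. */
final class MetadataRetry {

    private MetadataRetry() {}

    /**
     * Calls fetch up to maxAttempts times, sleeping between failed attempts.
     * The sleep starts at initialBackoff and is multiplied by backoffMultiplier
     * after every failure. If every attempt fails, the last failure is rethrown
     * wrapped in a RuntimeException.
     */
    static <T> T callWithRetry(
            Supplier<T> fetch,
            int maxAttempts,
            Duration initialBackoff,
            double backoffMultiplier) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoff.toMillis();
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt == maxAttempts) {
                    break; // no sleep after the final attempt
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted while waiting to retry", ie);
                }
                backoffMs = (long) (backoffMs * backoffMultiplier);
            }
        }
        throw new RuntimeException(
                "Metadata fetch failed after " + maxAttempts + " attempts", lastFailure);
    }
}
```

With something like this in place, the existing metadata lookup could be wrapped in callWithRetry, with the attempt count and backoff read from connector configuration. The names of those configuration options are left open here. A single attempt (maxAttempts = 1) would still fail on the first error, although the exception would arrive wrapped rather than as thrown by the lookup.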