Viktor Somogyi-Vass created KAFKA-15161:
-------------------------------------------
             Summary: InvalidReplicationFactorException at connect startup
                 Key: KAFKA-15161
                 URL: https://issues.apache.org/jira/browse/KAFKA-15161
             Project: Kafka
          Issue Type: Improvement
          Components: clients, KafkaConnect
    Affects Versions: 3.6.0
            Reporter: Viktor Somogyi-Vass


h2. Problem description

In our system test environment, Connect may in certain cases fail to start up due to a very specific timing issue in the start/restart of the Kafka cluster and Connect. If a consumer in Connect starts up and asks for topic metadata while the broker doesn't have metadata yet, the broker returns the following exception and Connect fails:
{noformat}
[2023-07-07 13:56:47,994] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
org.apache.kafka.common.KafkaException: Unexpected error fetching metadata for topic connect-offsets
	at org.apache.kafka.clients.consumer.internals.TopicMetadataFetcher.getTopicMetadata(TopicMetadataFetcher.java:130)
	at org.apache.kafka.clients.consumer.internals.TopicMetadataFetcher.getTopicMetadata(TopicMetadataFetcher.java:66)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:2001)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1969)
	at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:251)
	at org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:242)
	at org.apache.kafka.connect.runtime.Worker.start(Worker.java:230)
	at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:151)
	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:363)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor is below 1 or larger than the number of available brokers.
{noformat}
Due to this error the Connect node stops and has to be manually restarted (and of course it fails the test scenarios as well).

h2. Reproduction

In my test scenario I had:
- 1 broker
- 1 distributed Connect node
- a patch applied on the broker to make sure it doesn't have metadata

Steps to reproduce:
# start up a ZooKeeper-based broker without the patch
# put a breakpoint here: https://github.com/apache/kafka/blob/1d8b07ed6435568d3daf514c2d902107436d2ac8/clients/src/main/java/org/apache/kafka/clients/consumer/internals/TopicMetadataFetcher.java#L94
# start up a distributed Connect node
# restart the Kafka broker with the patch to make sure there is no metadata
# once the broker is started, release the debugger in Connect

Connect should run into the error cited above and shut down. This is not desirable: the Connect cluster should retry to ensure its continuous operation, or the broker should handle this case differently, for instance by returning a RetriableException. The earliest version I've tried this on is 2.8, but I think earlier (and later) versions are affected as well.
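For illustration only, below is a minimal sketch of the retry behaviour suggested above. It is not the actual Connect code; the class and method names ({{PartitionMetadataWaiter}}, {{waitForPartitions}}) and the timeout/backoff parameters are made up. The idea is that the caller polls {{Consumer#partitionsFor()}} and treats a {{KafkaException}} caused by {{InvalidReplicationFactorException}} as a transient "broker has no metadata yet" condition instead of letting it kill the herder thread.
{code:java}
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.errors.InvalidReplicationFactorException;

// Hypothetical helper, only to illustrate the retry idea from this ticket.
public final class PartitionMetadataWaiter {

    /**
     * Polls partitionsFor(topic) until the broker returns usable metadata or the
     * deadline expires. A KafkaException caused by InvalidReplicationFactorException
     * (the startup race described above) is swallowed and retried; everything else
     * is rethrown so real failures still surface.
     */
    public static List<PartitionInfo> waitForPartitions(Consumer<?, ?> consumer,
                                                        String topic,
                                                        Duration timeout,
                                                        Duration backoff) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                List<PartitionInfo> partitions = consumer.partitionsFor(topic);
                if (partitions != null && !partitions.isEmpty())
                    return partitions;
            } catch (KafkaException e) {
                if (!(e.getCause() instanceof InvalidReplicationFactorException))
                    throw e;
            }
            if (System.nanoTime() >= deadline)
                throw new KafkaException("Timed out waiting for metadata of topic " + topic);
            TimeUnit.MILLISECONDS.sleep(backoff.toMillis());
        }
    }

    private PartitionMetadataWaiter() {
    }
}
{code}
A KafkaBasedLog-style caller could then use something like {{waitForPartitions(consumer, "connect-offsets", Duration.ofMinutes(5), Duration.ofSeconds(1))}} at startup instead of a single {{partitionsFor()}} call.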