Travis Bischel created KAFKA-14312:
--------------------------------------

             Summary: Kraft + ProducerStateManager: produce requests to new partitions with a non-zero sequence number should be rejected
                 Key: KAFKA-14312
                 URL: https://issues.apache.org/jira/browse/KAFKA-14312
             Project: Kafka
          Issue Type: Bug
          Components: kraft, producer
            Reporter: Travis Bischel
h1. Background

In Kraft mode, if I create a topic, I occasionally see a MetadataResponse with a valid leader, yet if I immediately produce to that topic, I receive NOT_LEADER_FOR_PARTITION. There may be another bug causing Kraft to return a leader in metadata while rejecting requests to that leader, _but_ this exposes a bigger problem: Kafka currently accepts produce requests to new partitions with a non-zero sequence number. I have confirmed this locally by modifying my client to start producing with a sequence number of 10. Three records produced sequentially, back to back (seq 10, 11, 12), are all accepted. I _think_ this [comment|https://github.com/apache/kafka/blob/3e7eddecd6a63ea6a9793d3270bef6d0be5c9021/core/src/main/scala/kafka/log/ProducerStateManager.scala#L235-L236] in the Kafka source also indicates roughly the same thing.

h1. Problem

* Client initializes producer ID
* Client creates topic "foo" (for this problem we can ignore partitioning – there is just one partition)
* Client sends produce request A with 5 records (sequences 0–4)
* Client sends produce request B with 5 records (sequences 5–9) before receiving a response for A
* Broker returns NOT_LEADER_FOR_PARTITION to produce request A
* Broker finishes initializing and becomes leader before seeing request B
* Broker accepts request B as the first request for the partition
* Broker believes sequence 5 is OK and expects the next sequence to be 10
* Client retries requests A and B, because A failed
* Broker sees request A with sequence 0 and returns OutOfOrderSequenceException
* Client enters a fatal state, because OutOfOrderSequenceException is not retryable

h1. Reproducing

I can reliably reproduce this error using Kraft mode with 1 broker. I am using the following docker compose:

{{version: "3.7"}}
{{services:}}
{{  kafka:}}
{{    image: bitnami/kafka:latest}}
{{    network_mode: host}}
{{    environment:}}
{{      KAFKA_ENABLE_KRAFT: yes}}
{{      KAFKA_CFG_PROCESS_ROLES: controller,broker}}
{{      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER}}
{{      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093}}
{{      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT}}
{{      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:9093}}
{{      # Set this to "PLAINTEXT://127.0.0.1:9092" if you want to run this container on localhost via Docker}}
{{      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://127.0.0.1:9092}}
{{      KAFKA_CFG_BROKER_ID: 1}}
{{      ALLOW_PLAINTEXT_LISTENER: yes}}
{{      KAFKA_KRAFT_CLUSTER_ID: XkpGZQ27R3eTl3OdTm2LYA # 16 byte base64-encoded UUID}}
{{      BITNAMI_DEBUG: true # Enable this to get more info on startup failures}}

I am running the franz-go integration tests to trigger this (frequently, but not all of the time). However, the tests are not required: the sequence of events described above can occasionally reproduce it on its own.

I have never experienced this against the ZooKeeper version. It seems that the zk version always fully initializes a topic immediately and does not return NOT_LEADER_FOR_PARTITION on the first produce request. That is a separate problem – the main problem described above exists in all versions, and _can_ be experienced with zk in very strange circumstances.
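For illustration only, below is a rough Go sketch of the reproduction pattern using the franz-go client: create a topic and immediately pipeline several idempotent produce requests before any response comes back. This is an approximation of what the integration tests do, not the tests themselves; the broker address, topic name, and record count are arbitrary assumptions.

{code:go}
// Rough reproduction sketch (assumptions: a single Kraft broker on
// 127.0.0.1:9092 from the compose file above, topic name "foo").
// franz-go enables idempotent producing by default, so records carry a
// producer ID and sequence numbers.
package main

import (
	"context"
	"fmt"
	"sync"

	"github.com/twmb/franz-go/pkg/kadm"
	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	ctx := context.Background()

	cl, err := kgo.NewClient(kgo.SeedBrokers("127.0.0.1:9092"))
	if err != nil {
		panic(err)
	}
	defer cl.Close()

	// Create the topic and produce to it right away, before the broker has
	// necessarily finished becoming leader for the new partition.
	if _, err := kadm.NewClient(cl).CreateTopics(ctx, 1, 1, nil, "foo"); err != nil {
		panic(err)
	}

	// Fire several produce requests without waiting for responses, so a later
	// request (with higher sequences) can be the first one the broker accepts
	// if an earlier one was rejected with NOT_LEADER_FOR_PARTITION.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		cl.Produce(ctx, &kgo.Record{Topic: "foo", Value: []byte(fmt.Sprintf("msg-%d", i))},
			func(_ *kgo.Record, err error) {
				defer wg.Done()
				if err != nil {
					// When the race hits, the retried earlier request comes
					// back with OutOfOrderSequenceException, which is fatal.
					fmt.Println("produce error:", err)
				}
			})
	}
	wg.Wait()
}
{code}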