Yin Lei created KAFKA-20237:
-------------------------------
Summary: TransactionManager stuck in `INITIALIZING` state after
initial SSL handshake failure
Key: KAFKA-20237
URL: https://issues.apache.org/jira/browse/KAFKA-20237
Project: Kafka
Issue Type: Bug
Components: clients, producer
Affects Versions: 3.9.0
Environment: - Operating System: Linux aarch64;
- Kafka Version (Both Client and Server): 3.9.0;
- security.protocol: SSL;
- Some producer configurations: retries=2, reconnect.backoff.ms=30000,
transactional.id not set, enable.idempotence not set;
Reporter: Yin Lei
I encountered a scenario where the `KafkaProducer` fails to recover if the
initial SSL handshake with the broker fails, even after the underlying SSL
configuration is corrected.
**Steps to Reproduce:**
1. Configure a `KafkaProducer` with SSL enabled, but use an incorrect/untrusted
certificate on the server side to trigger an `SSLHandshakeException`.
2. Start the Producer and attempt to send a message.
3. The Producer logs show recurring SSL handshake errors. At this point,
`TransactionManager` enters the `INITIALIZING` state.
4. Correct the SSL certificate configuration on the **server side** so that the
broker is now reachable and the handshake can succeed.
5. Observe the Producer's behavior: messages still cannot be sent to the broker.
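For reference, the client configuration used in steps 1-2 can be sketched roughly as below. This is a minimal sketch, not the exact code from my application: the class name `ReproProducerConfig`, the broker address, the truststore path and password, and the choice of `StringSerializer` are placeholders and assumptions. Only `security.protocol`, `retries`, and `reconnect.backoff.ms` come from the environment above. `transactional.id` and `enable.idempotence` are deliberately left unset; as far as I understand, `enable.idempotence` defaults to `true` in Kafka 3.x, which would explain why a `TransactionManager` is involved at all.

```java
import java.util.Properties;

// Hypothetical helper: builds producer properties matching the reported environment.
final class ReproProducerConfig {

    static Properties build(String bootstrapServers,
                            String truststorePath,
                            String truststorePassword) {
        Properties props = new Properties();
        // Placeholder connection details (assumptions, not from the report).
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Values taken from the Environment section of this report.
        props.setProperty("security.protocol", "SSL");
        props.setProperty("retries", "2");
        props.setProperty("reconnect.backoff.ms", "30000");
        // transactional.id and enable.idempotence are intentionally not set.
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker.example.com:9093",
                "/path/to/client.truststore.jks", "changeit");
        // Print only the keys so the placeholder password is not echoed.
        props.stringPropertyNames().stream().sorted().forEach(System.out::println);
    }
}
```

With `kafka-clients` 3.9.0 on the classpath, these properties can be passed to `new KafkaProducer<>(props)` followed by a `send(...)` call to trigger the handshake against the misconfigured broker.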
**Expected Behavior:**
The Producer should successfully complete the SSL handshake, and the `Sender`
thread should retry the `InitProducerId` request, allowing the
`TransactionManager` to transition from `INITIALIZING` to `READY`.
**Actual Behavior:**
Even though the network/SSL layer has recovered, the `KafkaProducer` remains
unable to send messages. The `TransactionManager` stays stuck in `INITIALIZING`.
Either the `InitProducerId` request is not retried after the initial failure to
obtain a `ProducerId`, or the state machine does not recover from the handshake
exception raised during the transition.
!image-2026-03-02-16-07-51-742.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)