C0urante commented on PR #16496:
URL: https://github.com/apache/kafka/pull/16496#issuecomment-2225896922

   Thanks for the ping @jolshan 🙂
   
   I could have sworn producers already logged `UNKNOWN_TOPIC_OR_PARTITION` 
errors during this scenario, so I ran the 
[ConnectWorkerIntegrationTest::testSourceTaskNotBlockedOnShutdownWithNonExistenTopic](https://github.com/apache/kafka/blob/0ada8fac6869cad8ac33a79032cf5d57bfa2a3ea/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ConnectWorkerIntegrationTest.java#L349)
 test case locally on the latest trunk to check.
   
   I see these `WARN`-level messages being logged:
   ```
   WARN [simple-connector|task-2] [Producer clientId=connector-producer-simple-connector-2] The metadata response from the cluster reported a recoverable issue with correlation id 4 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218)
   WARN [simple-connector|task-0] [Producer clientId=connector-producer-simple-connector-0] The metadata response from the cluster reported a recoverable issue with correlation id 4 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218)
   WARN [simple-connector|task-0] [Producer clientId=connector-producer-simple-connector-0] The metadata response from the cluster reported a recoverable issue with correlation id 5 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218)
   ```
   
   I do think we could add a new message with clearer wording to let users know 
that this may indicate that the topic doesn't exist and the producer will block 
in `send` until the timeout expires or the topic is created. But IMO this would 
best be accomplished with a few tweaks:
   - Only log this once per invocation of `send`
   - Log at `WARN` level
   - State that this message can be ignored if the topic has been recently 
created
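
   To make the three tweaks above concrete, here is a rough, self-contained sketch of the "warn once per `send`" idea. It is not the actual `KafkaProducer` code: the class `MetadataWaitSketch`, the method `awaitMetadata`, the `metadataAvailable` callback, and the message wording are all hypothetical stand-ins, and a plain list takes the place of a real logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; names and wording are assumptions, not Kafka APIs. */
class MetadataWaitSketch {
    /** Collected WARN lines, standing in for a real logger. */
    final List<String> warnings = new ArrayList<>();

    /**
     * Simulates the metadata wait of a single send() call: polls until the
     * (hypothetical) metadataAvailable callback returns true or maxBlockMs
     * elapses. Returns true if metadata became available in time.
     */
    boolean awaitMetadata(String topic, BooleanSupplier metadataAvailable,
                          long maxBlockMs, long pollIntervalMs) {
        long start = System.nanoTime();
        long timeoutNanos = maxBlockMs * 1_000_000L;
        boolean warned = false; // local flag, so it resets on every invocation

        while (!metadataAvailable.getAsBoolean()) {
            if (!warned) {
                // Logged at most once per invocation, at WARN level.
                warnings.add("WARN Metadata for topic '" + topic + "' is not available; "
                        + "the topic may not exist. send() will block until it is created "
                        + "or the timeout expires. This message can be ignored if the "
                        + "topic has been recently created.");
                warned = true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false; // timed out without finding the topic
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

   Keeping the flag local to the method is what scopes the message to one invocation: repeated failed metadata lookups inside the same call add no further lines, while a later call for the same topic warns again.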
   
   And as far as a long-term functional fix goes, yes, I think there's been 
some talk of a small KIP to limit the retry duration specifically for 
`UNKNOWN_TOPIC_OR_PARTITION` errors, both before `send` returns (which will 
happen if metadata for the topic partition hasn't been cached yet and cannot be 
found before the timeout expires) and after a record has been added to a batch 
(which may happen if cached metadata for the topic partition is used, but the 
topic is deleted between the last successful metadata fetch and when the record 
is sent to the broker).
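
   Until something like that exists, the two phases can only be bounded with the general-purpose producer timeouts, which apply to every cause of delay rather than to `UNKNOWN_TOPIC_OR_PARTITION` specifically. A small sketch, assuming the standard producer config names `max.block.ms` (the wait before `send` returns) and `delivery.timeout.ms` (the wait after a record has been added to a batch); the class name and the values are illustrative only, not recommendations:

```java
import java.util.Properties;

/** Hypothetical helper; the timeout values are arbitrary examples. */
class ProducerTimeoutSketch {
    static Properties boundedTimeouts() {
        Properties props = new Properties();
        // Phase 1: upper bound on how long send() may block, which includes
        // waiting for metadata on a topic that has not been cached yet.
        props.setProperty("max.block.ms", "10000");
        // Phase 2: upper bound on reporting success or failure once the
        // record has been added to a batch, retries included.
        props.setProperty("delivery.timeout.ms", "30000");
        // delivery.timeout.ms must be at least linger.ms + request.timeout.ms,
        // so both are set explicitly here to keep the combination valid.
        props.setProperty("request.timeout.ms", "10000");
        props.setProperty("linger.ms", "0");
        return props;
    }
}
```

   These properties would be merged into the producer's configuration alongside the usual bootstrap and serializer settings.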
   
   Obviously that's out of scope for this PR, so I don't think those plans 
should cause us to abandon this logging improvement in the meantime.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
