[
https://issues.apache.org/jira/browse/KAFKA-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503372#comment-17503372
]
Michael Hornung commented on KAFKA-13683:
-----------------------------------------
In the meantime we got this advice from Confluent Support:
{quote}Hello Michael,
My name is Nicolas, I am one of Eliot's colleagues. He brought this ticket to my
attention because it looks like you may be using a transactional producer when
you don't really need it
We are still reviewing what we can improve on this cluster to limit potential
timeout occurrences, because it should not happen with the 60000 ms timeout you
have configured
Is the shared code snippet the actual code that is going into production?
Code review is outside our usual scope, but I am concerned you are using
transactions as a way to "transactionally" send data to Kafka, as is usually
the case with a classic database
If I understand correctly, your code is starting an AkkaHttpRestServer and on
each received POST request, you are creating a new KafkaProducer and doing the
full transaction sequence to send a single record
Transactions, when used only with a producer (as in "not tied with a
consumer"), are beneficial when you are writing multiple records to Kafka on
multiple partition leaders (see
[https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/]
section {*}Transactions: Atomic writes across multiple partitions{*})
I believe that what you are looking for in your code is the guarantee that the
HTTP message was successfully delivered and replicated in the Kafka cluster, so
that you can synchronously answer the HTTP request. For this to work you
only need to configure your producer with exactly-once delivery semantics, for
which you do not need the transaction overhead, but "just"
enable.idempotence=true and acks=all. You will have the Replication Factor=3 and
min.insync.replicas=2 guarantees on Confluent Cloud
You may also consider using a long-lived KafkaProducer that would be reused
in your HTTP server, so that you do not have to pay the producer initialization
time on each HTTP request
Let me know what you think here. We are working on the cluster to see if we can
identify potential improvements, but considering this only happened once for
your application it may just have been a temporary spike
Have a good day
Nicolas
{quote}
We are implementing that solution proposal at the moment.
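
For illustration, the advice above could look roughly like the following. This is only a minimal sketch, not the code from AkkaHttpRestServer.scala: the object name, method names, String key/value types and the timeout parameter are assumptions, and the security settings needed for Confluent Cloud are left out. It builds a producer config with enable.idempotence=true and acks=all and no transactional.id, and sends a record synchronously through a producer that is meant to be created once and reused.

```scala
import java.util.Properties
import java.util.concurrent.TimeUnit

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord, RecordMetadata}
import org.apache.kafka.common.serialization.StringSerializer

// Hypothetical helper object; the names are illustrative only.
object IdempotentProducerSketch {

  // Non-transactional producer config: idempotence and acks=all,
  // and deliberately no transactional.id.
  def producerProps(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers)
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    props
  }

  // Sends one record and blocks until the broker acknowledges it.
  // Throws if the send fails or the timeout elapses, so the caller
  // can map that to an HTTP error response.
  def sendSync(producer: KafkaProducer[String, String],
               topic: String,
               key: String,
               value: String,
               timeoutMs: Long): RecordMetadata =
    producer
      .send(new ProducerRecord[String, String](topic, key, value))
      .get(timeoutMs, TimeUnit.MILLISECONDS)
}
```

The producer itself would be created once at server startup, for example with new KafkaProducer[String, String](IdempotentProducerSketch.producerProps(...)), shared by all request handlers (KafkaProducer is thread-safe) and closed on shutdown. Blocking on the returned future is the simplest way to answer the HTTP request synchronously; passing a callback to send would avoid blocking a request thread.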
> Transactional Producer - Transaction with key xyz went wrong with exception:
> Timeout expired after 60000milliseconds while awaiting InitProducerId
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-13683
> URL: https://issues.apache.org/jira/browse/KAFKA-13683
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 2.6.0, 2.7.0, 3.0.0
> Reporter: Michael Hornung
> Priority: Critical
> Labels: new-txn-protocol-should-fix
> Attachments: AkkaHttpRestServer.scala,
> image-2022-02-24-09-12-04-804.png, image-2022-02-24-09-13-01-383.png,
> timeoutException.png
>
>
> We have an urgent issue with our customer using a Kafka transactional producer
> with a Kafka cluster with 3 or more nodes. Our customer is using Confluent
> Cloud on Azure.
> We get this exception regularly: "Transaction with key XYZ went wrong with
> exception: Timeout expired after 60000milliseconds while awaiting
> InitProducerId" (see attachment)
> We assume that the cause is a node which is down and the producer still sends
> messages to the “down node”.
> We are using Kafka Streams 3.0.
> *We expect that if a node is down, the Kafka producer is intelligent enough to not
> send messages to this node any more.*
> *What’s the solution to this issue? Is there any config we have to set?*
> *This request is urgent because our customer will soon have production
> issues.*
> *Additional information*
> * send record --> see attachment “AkkaHttpRestServer.scala” – line 100
> * producer config --> see attachment “AkkaHttpRestServer.scala” – line 126