If KafkaStreams goes into ERROR state, you cannot do anything but
restart it.
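
That said, a "restart" does not have to mean bouncing the whole pod: you can
watch for the ERROR state and replace the KafkaStreams instance in-process.
A minimal, untested sketch (topic names and configs are placeholders; note
that an errored instance cannot be restarted in place, so a brand-new
KafkaStreams object is created, and close() is handed off to a separate
thread because the listener may fire on a stream thread):

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.Topology;

    public class SelfHealingStreams {

        private static volatile KafkaStreams streams;

        // Placeholder topology; substitute your real one.
        private static Topology topology() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("input-topic").to("output-topic");
            return builder.build();
        }

        // Placeholder config; substitute your real one.
        private static Properties config() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());
            return props;
        }

        public static synchronized void start() {
            streams = new KafkaStreams(topology(), config());
            // The listener must be registered before start().
            streams.setStateListener((newState, oldState) -> {
                if (newState == KafkaStreams.State.ERROR) {
                    new Thread(() -> {
                        streams.close(Duration.ofSeconds(30));
                        start(); // build and start a fresh instance
                    }).start();
                }
            });
            streams.start();
        }

        public static void main(String[] args) {
            start();
        }
    }

The caveat, of course, is that this restarts blindly: if the root cause is
permanent (bad config, broker-side problem), the application will just cycle
through ERROR again, which is why finding the root cause still matters.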

We constantly improve KafkaStreams to avoid getting into ERROR state, but
it's not always possible to auto-recover.

The point being: you should try to figure out the root cause of the
issue (not easy to do, unfortunately). Maybe you could change some
configs, but it could also be a bug in KafkaStreams or the broker.

There are more improvements in Apache Kafka 2.6.0 for EOS, though. So
maybe it's already fixed there.


-Matthias

On 9/9/20 10:07 PM, Pushkar Deole wrote:
> Matthias,
> 
> Is there any workaround after the stream goes into ERROR state because of
> the above issue, like attaching a StateListener to the KafkaStreams
> instance and restarting the stream in case of ERROR state?
> Right now, we need to restart the pod that hosts the application, which
> won't be feasible when the application goes into production.
> 
> On Thu, Sep 10, 2020 at 2:20 AM Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Well, it's for sure EOS related, but it seems to be a different root cause.
>>
>> I am not aware of any related bug.
>>
>> -Matthias
>>
>>
>> On 9/9/20 4:29 AM, Pushkar Deole wrote:
>>> Hi Matthias,
>>>
>>> We are using Confluent Kafka and upgraded to Confluent version 5.5.0,
>>> which I believe maps to Apache Kafka 2.5.0. We tested in a few labs by
>>> keeping the solution idle for a few days and didn't observe the issue.
>>>
>>> However, in one of the labs we observed the issue again recently; this is
>>> the exception (unfortunately, we don't have the complete stack trace).
>>> Anyway, do you think it is the same exception as above, or is it
>>> different? And is this also a Kafka server issue that has already been
>>> reported?
>>>
>>>
>>> {"@timestamp":"2020-08-12T09:17:35.270+00:00","@version":"1",
>>> "message":"Unexpected exception in stream processing.",
>>> "logger_name":"com.avaya.analytics.filter.ApplicationConfig",
>>> "thread_name":"analytics-event-filter-StreamThread-1",
>>> "level":"ERROR","level_value":40000,
>>> "stack_trace":"org.apache.kafka.common.errors.InvalidPidMappingException:
>>> The producer attempted to use a producer id which is not currently
>>> assigned to its transactional id.\n"}
>>>
>>> On Sat, May 9, 2020 at 12:49 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>>>>> So does this issue relate to transactions, which are used only when the
>>>>>> exactly_once guarantee is set?
>>>>
>>>> Correct.
>>>>
>>>> On 5/8/20 6:28 AM, Pushkar Deole wrote:
>>>>> Hello Matthias,
>>>>>
>>>>> By the way, this error seems to be occurring in only one of the services.
>>>>> There is another service which is also using Kafka Streams to consume
>>>>> from a source topic, through processors, and into a sink on the output
>>>>> topic; however, that service is running fine. The difference is that the
>>>>> other service uses the at_least_once guarantee, while the service in
>>>>> error uses exactly_once.
>>>>> So does this issue relate to transactions, which are used only when the
>>>>> exactly_once guarantee is set?
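>>>>>
>>>>> For reference, that difference is just one config; a minimal sketch using
>>>>> the StreamsConfig constants (assumed setup, not our actual code):
>>>>>
>>>>>     import java.util.Properties;
>>>>>     import org.apache.kafka.streams.StreamsConfig;
>>>>>
>>>>>     Properties props = new Properties();
>>>>>     // service in error: transactions are enabled only here
>>>>>     props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
>>>>>               StreamsConfig.EXACTLY_ONCE);
>>>>>     // healthy service: at_least_once (also the default)
>>>>>     // props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
>>>>>     //           StreamsConfig.AT_LEAST_ONCE);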
>>>>>
>>>>> On Mon, Apr 27, 2020 at 12:37 PM Pushkar Deole <pdeole2...@gmail.com> wrote:
>>>>>
>>>>>> came across this, which seems to be the one:
>>>>>> https://issues.apache.org/jira/browse/KAFKA-8710
>>>>>>
>>>>>> On Mon, Apr 27, 2020 at 12:17 PM Pushkar Deole <pdeole2...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks... can you point to those improvements/bugs that are fixed in 2.5?
>>>>>>>
>>>>>>> On Mon, Apr 27, 2020 at 1:03 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>>
>>>>>>>> Well, what you say is correct. However, it's a "bug" in the sense
>>>>>>>> that for some cases the producer does not need to fail, but can
>>>>>>>> re-initialize itself automatically. Of course, you can also see this
>>>>>>>> as an improvement and not a bug :)
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 4/25/20 7:48 AM, Pushkar Deole wrote:
>>>>>>>>> version used is 2.3
>>>>>>>>> however, not sure if this is a bug.. after doing some search, I came
>>>>>>>>> across the following as the reason for this:
>>>>>>>>>
>>>>>>>>> essentially, the transaction coordinator is cleaning up the producer
>>>>>>>>> and transaction ids after a certain time interval, controlled by
>>>>>>>>> transactional.id.expiration.ms
>>>>>>>>> <https://docs.confluent.io/current/installation/configuration/broker-configs.html#transactional-id-expiration-ms>,
>>>>>>>>> if the coordinator doesn't receive any updates/writes from the
>>>>>>>>> producer for that much time. The default of this parameter is 7 days,
>>>>>>>>> and our labs have been idle for more than that.
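>>>>>>>>>
>>>>>>>>> a possible mitigation for idle labs (assuming broker configs can be
>>>>>>>>> changed) would be to raise that timeout on the broker side, e.g. in
>>>>>>>>> server.properties; the value below is only illustrative:
>>>>>>>>>
>>>>>>>>>     # default is 604800000 ms (7 days); example: 14 days
>>>>>>>>>     transactional.id.expiration.ms=1209600000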
>>>>>>>>>
>>>>>>>>> On Fri, Apr 24, 2020 at 10:46 PM Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Which version are you using?
>>>>>>>>>>
>>>>>>>>>> A couple of broker- and client-side exactly-once related bugs got
>>>>>>>>>> fixed in the latest release, 2.5.0.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>> On 4/23/20 11:59 PM, Pushkar Deole wrote:
>>>>>>>>>>> Hello All,
>>>>>>>>>>>
>>>>>>>>>>> While using a Kafka Streams application, we are intermittently
>>>>>>>>>>> getting the following exception, and the stream is closed. We need
>>>>>>>>>>> to restart the application to get it working and processing again.
>>>>>>>>>>> This exception is observed in some of the labs which have been idle
>>>>>>>>>>> for some time, but it is not observed always. Any inputs
>>>>>>>>>>> appreciated here.
>>>>>>>>>>>
>>>>>>>>>>> {"@timestamp":"2020-04-15T13:53:52.698+00:00","@version":"1",
>>>>>>>>>>> "message":"stream-thread [analytics-event-filter-StreamThread-1]
>>>>>>>>>>> Failed to commit stream task 2_14 due to the following error:",
>>>>>>>>>>> "logger_name":"org.apache.kafka.streams.processor.internals.AssignedStreamsTasks",
>>>>>>>>>>> "thread_name":"analytics-event-filter-StreamThread-1",
>>>>>>>>>>> "level":"ERROR","level_value":40000,
>>>>>>>>>>> "stack_trace":"org.apache.kafka.common.KafkaException: Unexpected
>>>>>>>>>>> error in AddOffsetsToTxnResponse: The producer attempted to use a
>>>>>>>>>>> producer id which is not currently assigned to its transactional
>>>>>>>>>>> id.\n\tat
>>>>>>>>>>> org.apache.kafka.clients.producer.internals.TransactionManager$AddOffsetsToTxnHandler.handleResponse(TransactionManager.java:1406)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1069)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:561)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:425)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:311)\n\tat
>>>>>>>>>>> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)\n\tat
>>>>>>>>>>> java.base/java.lang.Thread.run(Unknown Source)\n"}
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 
