(Correction: we are using Samza 0.9.0.)

On Fri, Jul 29, 2016 at 12:09 PM, Gaurav Agarwal <gauravagarw...@gmail.com>
wrote:

> Hi All,
>
> We are using Samza (0.10.0) in our system and recently ran into a problem:
> a Kafka broker was unstable for a few moments, and our Samza tasks got
> exceptions while trying to write messages to Kafka. After that, they went
> into a very long retry loop (Integer.MAX_VALUE retries).
>
> The repeated warning lines we are getting in container logs are:
> *.*
> *.*
>
> *WARN [2016-05-23 06:41:36,645] [U:260,F:293,T:552,M:2,267]
> producer.internals.Sender:[Sender:completeBatch:257] -
> [kafka-producer-network-thread | samza_producer-job4-1-1463686278936-2] -
> Got error produce response with correlation id 5888322 on topic-partition
> Topic3-0, retrying (2144537752 attempts left). Error: CORRUPT_MESSAGE*
> *.*
> *.*
>
> We experimented with setting the Kafka producer 'retries' configuration to
> a smaller number, but it appears that Samza does not permit overriding this
> parameter. On top of that, there is additional Samza-level retry logic that
> re-sends the message if Kafka fails with a 'RetriableException'.
>
> May I know what is the reason for disallowing this override? Additionally,
> what is the recommended way to handle such situations?
>
> I would have thought that a possible policy would be: if, after K
> (configured by the user) Kafka retries, samza-kafka is still unable to send
> the message, it throws an exception out to user land and lets the user
> decide what is to be done. In our case we would have chosen to kill the
> container and have the Samza YARN app master request a new one from YARN.
>
> There seem to be at least a couple of bugs related to this already open:
>
>
>    1. https://issues.apache.org/jira/browse/SAMZA-610
>    2. https://issues.apache.org/jira/browse/SAMZA-911
>
>
> cheers,
> gaurav
>
>
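
To make the policy proposed in the quoted message concrete, here is a minimal
sketch in plain Java of "retry at most K times, then surface the failure to
the caller". It does not use Samza or Kafka APIs: BoundedRetry,
TransientSendException and RetriesExhaustedException are hypothetical names
made up for illustration, and the send operation is just a Supplier. There is
no backoff between attempts, which a real producer would likely want.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a transient send failure (not Kafka's own class). */
class TransientSendException extends RuntimeException {
    TransientSendException(String message) {
        super(message);
    }
}

/** Hypothetical exception surfaced to user code once the retry budget is spent. */
class RetriesExhaustedException extends RuntimeException {
    RetriesExhaustedException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs send once, then retries up to maxRetries more times while it
     * fails with TransientSendException. Any other exception propagates
     * immediately; exhausting the budget throws RetriesExhaustedException.
     */
    static <T> T call(Supplier<T> send, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        TransientSendException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return send.get();
            } catch (TransientSendException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new RetriesExhaustedException(
                "send failed after " + maxRetries + " retries", last);
    }
}
```

With something like this, user code could catch RetriesExhaustedException and
decide what to do, for example shut the container down so that a replacement
is requested, instead of looping for Integer.MAX_VALUE attempts.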
