(Correction: we are using Samza 0.9.0.)

On Fri, Jul 29, 2016 at 12:09 PM, Gaurav Agarwal <gauravagarw...@gmail.com> wrote:
> Hi All,
>
> We are using Samza (0.10.0) in our system and recently ran into a problem
> where, because a Kafka broker was unstable for a few moments, our Samza
> tasks got exceptions while trying to write messages to Kafka. After that
> moment, they went into a very long retry loop (Integer.MAX_VALUE times).
>
> The repeated warning lines we are getting in the container logs are:
> *.*
> *.*
>
> *WARN [2016-05-23 06:41:36,645] [U:260,F:293,T:552,M:2,267]
> producer.internals.Sender:[Sender:completeBatch:257] -
> [kafka-producer-network-thread | samza_producer-job4-1-1463686278936-2] -
> Got error produce response with correlation id 5888322 on topic-partition
> Topic3-0, retrying (2144537752 attempts left). Error: CORRUPT_MESSAGE*
> *.*
> *.*
>
> We experimented with setting the Kafka producer 'retries' configuration to
> a smaller number, but it appears that Samza does not permit overriding this
> parameter. On top of that, there is some additional Samza-level retry logic
> that re-sends the message if Kafka failed with a 'RetriableException'.
>
> May I know the reason for disallowing this override? Additionally, what is
> the recommended way to handle such situations?
>
> I would have thought that a possible policy would be: if, after K
> (configured by the user) Kafka retries, samza-kafka is still unable to send
> the message, it throws an exception out to user land and lets the user
> determine what is to be done. In our case we would have chosen to kill the
> container and have the Samza YARN application master request a new one
> from YARN.
>
> There seem to be at least a couple of bugs related to this already open:
>
> 1. https://issues.apache.org/jira/browse/SAMZA-610
> 2. https://issues.apache.org/jira/browse/SAMZA-911
>
> cheers,
> gaurav
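
To make the proposed policy concrete, here is a minimal sketch in plain Java of "retry at most K times, then throw to the caller". It is not Samza or Kafka code: `BoundedRetry`, `RetriesExhaustedException`, `maxRetries` and `backoffMs` are hypothetical names chosen for illustration, and the `Callable` stands in for whatever performs the actual send.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, NOT part of Samza or Kafka: runs a send action with a
// bounded number of retries and surfaces the last failure to the caller.
final class BoundedRetry {

    /** Thrown once the retry budget is used up (hypothetical name). */
    static final class RetriesExhaustedException extends RuntimeException {
        RetriesExhaustedException(int attempts, Throwable cause) {
            super("send failed after " + attempts + " attempt(s)", cause);
        }
    }

    /**
     * Calls {@code send} once, then up to {@code maxRetries} more times on
     * failure, sleeping {@code backoffMs} between attempts.
     */
    static <T> T call(Callable<T> send, int maxRetries, long backoffMs) {
        if (maxRetries < 0 || backoffMs < 0) {
            throw new IllegalArgumentException("maxRetries and backoffMs must be >= 0");
        }
        Exception last = null;
        int attempts = 0;
        while (attempts <= maxRetries) {
            attempts++;
            try {
                return send.call();
            } catch (Exception e) {
                last = e; // remember the most recent failure
            }
            if (attempts <= maxRetries) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // Budget exhausted (or interrupted): let the caller decide what to do.
        throw new RetriesExhaustedException(attempts, last);
    }
}
```

With something like this, a call such as `BoundedRetry.call(sendAction, k, 100L)` either returns the send result or throws after K retries. In the scenario described above, the caller would let that exception propagate so the container exits and a replacement is requested from YARN.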