-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45258/
-----------------------------------------------------------

(Updated May 11, 2016, 5:51 p.m.)


Review request for samza, Jake Maes, Navina Ramesh, and Yi Pan (Data Infrastructure).


Bugs: SAMZA-911
    https://issues.apache.org/jira/browse/SAMZA-911


Repository: samza


Description (updated)
-------

Currently, the KafkaSystemProducer's producer loop keeps retrying indefinitely when there is an exception in the retryBackOff loop. This is problematic because it completely stalls the Samza container (currently single-threaded). We've observed multiple jobs affected by this after transient Kafka broker-side errors.

If there are repeated exceptions, it makes sense to retry for a while and then fail the container.

Long-term fix: We should focus on getting rid of the retryBackOff loop and closing the producer object in the callback on failure. Closing the producer object in the callback-handler thread will guarantee in-order delivery. (SAMZA-934)

1. Modified the KafkaSystemProducer to take a maxRetries (currently, it's set to 30).
2. Added tests to verify retry in case of RetriableExceptions.


Diffs
-----

  samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala 9a44d46d29a1997958a9d2bbf7be0bde860fff64
  samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemProducer.scala 39426d8cf64516ec4fdc0cb4ff60b1df3a757470

Diff: https://reviews.apache.org/r/45258/diff/


Testing
-------

Added unit tests to verify functionality.


Thanks,

Jagadish Venkatraman
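
For readers without the diff open, the bounded-retry behaviour from the description (retry on a retriable exception up to maxRetries, then fail) could look roughly like the sketch below. This is not the code in the patch. `BoundedRetry`, `retryBounded`, `backoffMs`, and the local `RetriableException` class are hypothetical names used only for illustration, and counting maxRetries as retries after the first attempt is an assumption.

```scala
import scala.annotation.tailrec

object BoundedRetry {
  // Hypothetical stand-in for Kafka's RetriableException, so the sketch
  // compiles without a Kafka dependency.
  class RetriableException(msg: String) extends RuntimeException(msg)

  /**
   * Runs `op` until it succeeds. Only RetriableException triggers a retry;
   * any other exception propagates immediately. After `maxRetries` retries
   * the last exception is rethrown, which is the point where a caller
   * would fail the container instead of stalling forever.
   */
  def retryBounded[T](maxRetries: Int, backoffMs: Long)(op: () => T): T = {
    @tailrec
    def attempt(retriesSoFar: Int): T = {
      // Capture the outcome first so the recursive call stays in tail position.
      val outcome: Either[RetriableException, T] =
        try Right(op())
        catch { case e: RetriableException => Left(e) }

      outcome match {
        case Right(value) => value
        case Left(e) if retriesSoFar >= maxRetries => throw e
        case Left(_) =>
          Thread.sleep(backoffMs)
          attempt(retriesSoFar + 1)
      }
    }
    attempt(0)
  }
}
```

A caller would wrap the send in `BoundedRetry.retryBounded(maxRetries = 30, backoffMs = 100L) { () => ... }` and let the rethrown exception propagate. The 100 ms backoff here is an arbitrary illustrative value, not one taken from the patch.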