Re: Kafka Producer timeout causing data loss

2018-01-25 Thread Vishal Santoshi
The reorder issue can be resolved by setting MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION to 1, if we are talking pure Kafka producer configs (and I believe they port over to the Flink Kafka connector). This does limit the concurrency (at the TCP level) when Kafka is back up, an issue which is not very limiting…
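A minimal sketch of the two settings together as plain Kafka producer properties. The constant names come from the Kafka client's ProducerConfig class; the retry count chosen here is illustrative, and whether the Flink connector passes these through unchanged is exactly the assumption made above.

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class OrderedRetryProps {
    // Retries without reordering: with at most one in-flight request per
    // connection, a retried batch cannot be overtaken by a later batch.
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty(ProducerConfig.RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE));
        props.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
        return props;
    }
}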

Re: Kafka Producer timeout causing data loss

2018-01-25 Thread Elias Levy
Try setting the Kafka producer config option for the number of retries ("retries") to a large number, to avoid the timeout. It defaults to zero. Do note that retries may result in reordered records. On Wed, Jan 24, 2018 at 7:07 PM, Ashish Pokharel wrote: > Fabian, > > Thanks for your feedback - very helpful…
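A minimal sketch of what that suggestion looks like when the properties are handed to the Flink Kafka connector. FlinkKafkaProducer010, the topic name, the broker list, and the backoff value are assumptions for illustration, not taken from the thread.

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;

public class RetryingProducerSketch {
    public static FlinkKafkaProducer010<String> build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092"); // placeholder broker list
        // "retries" defaults to 0, so a send that times out is simply dropped.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("retry.backoff.ms", "500"); // pause between attempts while Kafka recovers
        return new FlinkKafkaProducer010<>("my-topic", new SimpleStringSchema(), props);
    }
}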

Re: Kafka Producer timeout causing data loss

2018-01-24 Thread Ashish Pokharel
Fabian, Thanks for your feedback - very helpful as usual! This is sort of becoming a huge problem for us right now because of our Kafka situation. For some reason I missed this detail going through the docs. We are definitely seeing a heavy dose of data loss when Kafka timeouts are happening.
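For context, a sketch of the sink-side settings that typically decide whether a failed send is only logged (and the record lost) or fails the job so it is replayed from the last checkpoint. This assumes a 1.3/1.4-era FlinkKafkaProducer010; the source, topic, broker, and checkpoint interval are placeholders.

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;

public class AtLeastOnceSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every minute

        DataStream<String> events = env.socketTextStream("localhost", 9999); // placeholder source

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092"); // placeholder broker list

        FlinkKafkaProducer010<String> producer =
                new FlinkKafkaProducer010<>("my-topic", new SimpleStringSchema(), props);
        producer.setLogFailuresOnly(false);  // a failed send fails the job instead of only being logged
        producer.setFlushOnCheckpoint(true); // outstanding records must be acked before a checkpoint completes

        events.addSink(producer);
        env.execute("at-least-once Kafka sink sketch");
    }
}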

Re: Kafka Producer timeout causing data loss

2018-01-23 Thread Fabian Hueske
Hi Ashish, Originally, Flink always performed full recovery in case of a failure, i.e., it restarted the complete application. There is some ongoing work to improve this and make recovery more fine-grained (FLIP-1 [1]). Some parts have been added for 1.3.0. I'm not familiar with the details, but…

Kafka Producer timeout causing data loss

2018-01-19 Thread ashish pok
Team, One more question to the community regarding hardening Flink Apps. Let me start off by saying we do have known Kafka bottlenecks, which we are in the midst of resolving. So during certain times of day, a lot of our Flink Apps are seeing Kafka Producer timeout issues. Most of the logs are…