The reorder issue can be resolved by setting
MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION to 1, if we're talking pure Kafka
producer configs (and I believe they port over to the Flink Kafka connector).
This does limit the concurrency (at the TCP level) when Kafka is back
up, an issue which is not very limiting in practice.
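For reference, a minimal sketch of wiring that setting into a Flink job
(the topic, broker address, and "stream" variable are placeholders, and the
connector class / import paths depend on your Flink and Kafka versions):

  import java.util.Properties;

  import org.apache.flink.api.common.serialization.SimpleStringSchema;
  import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
  import org.apache.kafka.clients.producer.ProducerConfig;

  Properties props = new Properties();
  props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
  // One unacknowledged request per connection: retried batches cannot
  // overtake in-flight ones, so record order is preserved even with retries.
  props.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");

  stream.addSink(
      new FlinkKafkaProducer011<>("events", new SimpleStringSchema(), props));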
Try setting the Kafka producer config option for number of retries
("retries") to a large number, to avoid the timeout. It defaults to zero.
Do note that retries may result in reordered records.
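Something along these lines in the producer properties (the retry count and
timeout values here are illustrative, not recommendations):

  import java.util.Properties;

  import org.apache.kafka.clients.producer.ProducerConfig;

  Properties props = new Properties();
  props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
  // Keep retrying instead of failing the send after the first timeout.
  props.setProperty(ProducerConfig.RETRIES_CONFIG,
      String.valueOf(Integer.MAX_VALUE));
  // Optionally give each individual request more time before it times out.
  props.setProperty(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");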
On Wed, Jan 24, 2018 at 7:07 PM, Ashish Pokharel wrote:
Fabian,
Thanks for your feedback - very helpful as usual!
This is sort of becoming a huge problem for us right now because of our Kafka
situation. For some reason I missed this detail going through the docs. We are
definitely seeing a heavy dose of data loss when Kafka timeouts are happening.
Hi Ashish,
Originally, Flink always performed full recovery in case of a failure,
i.e., it restarted the complete application.
There is some ongoing work to improve this and make recovery more
fine-grained (FLIP-1 [1]).
Some parts have been added for 1.3.0.
I'm not familiar with the details, but
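One related knob you can already use today: the restart strategy controls how
that full recovery is retried. A minimal sketch with a fixed-delay strategy
(the attempt count and delay are just examples):

  import java.util.concurrent.TimeUnit;

  import org.apache.flink.api.common.restartstrategy.RestartStrategies;
  import org.apache.flink.api.common.time.Time;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  StreamExecutionEnvironment env =
      StreamExecutionEnvironment.getExecutionEnvironment();
  // Restart the whole job up to 10 times, waiting 30s between attempts,
  // instead of failing it permanently on the first error.
  env.setRestartStrategy(
      RestartStrategies.fixedDelayRestart(10, Time.of(30, TimeUnit.SECONDS)));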
Team,
One more question to the community regarding hardening Flink Apps.
Let me start off by saying we do have known Kafka bottlenecks which we are in
the midst of resolving. So during certain times of day, a lot of our Flink Apps
are seeing Kafka Producer timeout issues. Most of the logs are som