Hi,

I was wondering how Flink's fault tolerance works, because this page
is short on the details:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/fault_tolerance.html

My environment has a backend service that may be down for a couple of
hours (sad, but we are working on fixing that). I have a sink that writes
to that service, and when the service is unavailable the sink throws an
exception. This brings the process down and I need to manually intervene
to get it up and running again.

I was thinking of rewriting the sink to loop until it is able to write
the data (with a multi-hour tolerance before it throws an exception).
I hope that this will create backpressure on the process, "suspend" the
processing, and "resume" it when the backend service comes up again.
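
To make the idea concrete, here is a rough sketch of the retry loop I have
in mind. It is plain Java rather than actual Flink sink code: the
RetryingWriter class, the BackendWrite interface and the tolerance/pause
parameters are all names I made up for illustration, and the real sink
would call something like writeWithRetry from its per-record write method.

```java
import java.time.Duration;

public class RetryingWriter {

    /** Hypothetical stand-in for the call that writes one record to the backend service. */
    public interface BackendWrite<T> {
        void write(T record) throws Exception;
    }

    /**
     * Blocks until the record is written or the tolerance window has elapsed.
     * Returns the number of attempts that were needed. While this method
     * blocks, the caller cannot accept further records, which is what I
     * hope will translate into backpressure upstream.
     */
    public static <T> int writeWithRetry(BackendWrite<T> backend, T record,
                                         Duration tolerance, Duration pause) throws Exception {
        long deadline = System.nanoTime() + tolerance.toNanos();
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                backend.write(record);
                return attempts;
            } catch (InterruptedException e) {
                // Do not swallow interruption: let a cancellation stop the loop.
                throw e;
            } catch (Exception e) {
                // Give up only after the tolerance window has passed.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                Thread.sleep(pause.toMillis());
            }
        }
    }
}
```

With a tolerance of a few hours and a pause of, say, 30 seconds, the write
would only fail once the backend has been unreachable for longer than the
tolerance window.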

Am I right in that assumption? Is there a better way to make
suspending and resuming automatic?

Thanks,
  Istvan
