Hi,

I am working with Apache Flink and would like to know how to estimate the
total time an application spends in recovery, including the input stream
"catch-up" after checkpoint recovery. Specifically, I am interested in the time
needed to restore the state plus the catch-up phase. Since the application's
source tasks are reset to an earlier input position after recovery, the
catch-up covers the data processed between the last completed checkpoint and
the failure, as well as the data that accumulated while the application was
down.
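
To make the quantity concrete, here is a rough back-of-the-envelope sketch in
Python of what I have in mind. Nothing in it comes from Flink itself: the
function name, its parameters, and the assumption that both the input rate and
the catch-up processing rate are constant are all my own simplifications.

```python
def estimate_recovery_seconds(input_rate, max_rate, checkpoint_age_s, downtime_s):
    """Rough estimate of total recovery time, in seconds.

    All parameters are hypothetical inputs, not Flink metrics or APIs:
      input_rate       -- events/s arriving at the source (assumed constant)
      max_rate         -- events/s the job can process while catching up
      checkpoint_age_s -- time between the last completed checkpoint and the failure
      downtime_s       -- failure detection + restart + state restore
    """
    if max_rate <= input_rate:
        raise ValueError("max_rate must exceed input_rate, or the job never catches up")

    # Events to reprocess: those since the last checkpoint plus those
    # that arrived while the job was down.
    backlog = input_rate * (checkpoint_age_s + downtime_s)

    # New events keep arriving during catch-up, so the backlog only
    # shrinks at the surplus rate (max_rate - input_rate).
    catch_up_s = backlog / (max_rate - input_rate)

    return downtime_s + catch_up_s


# Example: 1000 events/s in, 1500 events/s catch-up capacity,
# checkpoint 30 s old at failure, 60 s of downtime.
print(estimate_recovery_seconds(1000, 1500, 30, 60))  # 240.0
```

With these made-up numbers the backlog is 90,000 events, which drains at 500
events/s, so the catch-up alone takes 180 s on top of the 60 s of downtime.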

My question is: what should I take into account when estimating this time, and
which parts of the Apache Flink codebase would be most helpful to look at?

Thanks
