Hi there, 

I just watched the Flink Forward talk from Amazon on measuring uptime [1] 
(slides here [2]), which references a discussion on the developer mailing list 
[3]. 

It seems Amazon is already running with those metrics enabled in their 
production cluster. I'd really like to have those statuses available in our 
Flink deployment as well. 

The author at AWS links three different design docs, and the mailing list 
discussion came up with some kind of job status listener in the JobManager 
(call it jobstatus, incident or whatever) as the best way to implement those 
metrics. However, I was not able to find any JIRA issue for this story, nor 
could I find out whether it is now implemented, planned or rejected. Does 
anyone know more about it, and whether there is such a listener somewhere in 
the JobManager? 

Best regards 
Theo 


[1] 
https://www.youtube.com/watch?v=pIVmw1HyUqU
[2] 
https://de.slideshare.net/FlinkForward/virtual-flink-forward-2020-lessons-learned-on-apache-flink-application-availability-in-a-hosted-apache-flink-service-praveen-gattu-hwanju-kim-ryan-nienhuis
 
[3] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-for-Flink-job-execution-availability-metrics-impovement-td28882.html#a28962
 
