I presume you are referring to case #5? Joshua correctly pointed out the assumption I made (I should probably have documented it): that the pause/resume actions would issue the relevant stop/start calls to the monitoring service to either suspend or resume heartbeats. You may argue that this places unnecessary implementation restrictions on the monitoring service, and I tend to agree.

> I'm not sure what a monitoring/heartbeating service would do with its
> record of a paused job while it's paused. How would it know when to resume
> monitoring and sending heartbeats without some notification that the update
> itself has resumed? By the same token, if monitoring were to continue while
> an update was paused and an alert were to fire, what action would the
> monitoring service take with regard to the paused update?

Perhaps just sending OK (or a NOOP equivalent) in the case of a user-paused job update would make more sense, as there is nothing the monitoring service could do in that case. This should work fine with both pause/resume-aware and pause/resume-agnostic monitoring service implementations.

On Fri, Oct 10, 2014 at 1:43 PM, Joshua Cohen <jco...@twopensource.com> wrote:

> In theory when the user resumes they'd also resume monitoring (and thus
> resume heartbeats)? Maybe the resumeJobUpdate RPC needs to support
> pauseIfNoHeartbeatsAfterMs as well?
>
> I'm not sure what a monitoring/heartbeating service would do with its
> record of a paused job while it's paused. How would it know when to resume
> monitoring and sending heartbeats without some notification that the update
> itself has resumed? By the same token, if monitoring were to continue while
> an update was paused and an alert were to fire, what action would the
> monitoring service take with regard to the paused update?
>
> On Fri, Oct 10, 2014 at 1:28 PM, David McLaughlin <da...@dmclaughlin.com>
> wrote:
>
> > - A heartbeatJobUpdate RPC is called with the matching update ID.
> > Scheduler resets countdown and responds with STOP
> >
> > Paused is a tricky state because the user can resume at any time. I'd
> > propose we have a different response here. You really don't want to
> > "stop" monitoring the update while it is in a non-terminal state. You
> > might want to be aware that your heartbeat is a no-op, though.
> >
> > On Fri, Oct 10, 2014 at 12:47 PM, Maxim Khutornenko <ma...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > We are proposing a new feature for the scheduler updater, which you
> > > may find helpful.
> > >
> > > I have posted a brief feature summary here:
> > >
> > > https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
> > >
> > > Please reply with your feedback/concerns/comments.
> > >
> > > Thanks,
> > > Maxim
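
To make the OK/NOOP idea discussed in this thread concrete, here is a rough Python sketch of how the scheduler side might answer heartbeats. Everything in it is an assumption for illustration: the `HeartbeatTracker` class, the `UpdateState` values and the "OK"/"NOOP"/"STOP" strings are hypothetical names, not the actual scheduler API. Returning STOP only for terminal or unknown updates, and restarting the countdown on resume, are likewise guesses at sensible behavior rather than anything stated in the proposal.

```python
import enum
import time


class UpdateState(enum.Enum):
    # Hypothetical states, not the scheduler's real status enum.
    ROLLING = "ROLLING"          # update is actively progressing
    USER_PAUSED = "USER_PAUSED"  # user explicitly paused the update
    FINISHED = "FINISHED"        # terminal state


class HeartbeatTracker:
    """Toy stand-in for scheduler-side heartbeat bookkeeping (hypothetical)."""

    def __init__(self, pause_if_no_heartbeats_after_ms, clock=time.monotonic):
        self._timeout_s = pause_if_no_heartbeats_after_ms / 1000.0
        self._clock = clock
        self._updates = {}  # update_id -> [state, time of last heartbeat]

    def start(self, update_id):
        self._updates[update_id] = [UpdateState.ROLLING, self._clock()]

    def pause(self, update_id):
        self._updates[update_id][0] = UpdateState.USER_PAUSED

    def resume(self, update_id):
        # Assumption: resuming restarts the countdown, so a stale
        # timestamp from before the pause does not expire immediately.
        self._updates[update_id] = [UpdateState.ROLLING, self._clock()]

    def finish(self, update_id):
        self._updates[update_id][0] = UpdateState.FINISHED

    def heartbeat(self, update_id):
        """Return the response a monitoring service would see."""
        entry = self._updates.get(update_id)
        if entry is None or entry[0] is UpdateState.FINISHED:
            # Nothing left to monitor: tell the caller to stop.
            return "STOP"
        entry[1] = self._clock()  # reset the countdown
        if entry[0] is UpdateState.USER_PAUSED:
            # Heartbeat accepted but has no effect while user-paused.
            return "NOOP"
        return "OK"

    def is_expired(self, update_id):
        """True if a rolling update has gone too long without a heartbeat."""
        state, last = self._updates[update_id]
        return (state is UpdateState.ROLLING
                and self._clock() - last > self._timeout_s)
```

With this shape, a pause/resume-agnostic monitoring service can keep sending heartbeats throughout and simply ignore NOOP, while a pause/resume-aware one can use NOOP as a hint to back off until the update resumes.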