We are working on AURORA-690 to support external service coordinated job updates. The feature design was proposed in [1] and discussed in [2].
The one remaining question I would like to discuss here is how to expose the coordinated update configuration to the user. The approaches as I see them:

1. Expose "blockIfNoPulsesAfterMs" directly in UpdateConfig, requiring the user to supply its value to indicate a coordinated update:

    ...
    update_config = UpdateConfig(pulse_interval_secs=60)
    ...

   While this is the most straightforward to implement, it may not deliver on the user's expectations. The external service may be unable to match the requested job health refresh rate, and it may waste scheduler resources with unnecessary pulseJobUpdate RPC calls. We could enforce a sane lower bound on pulse_interval_secs to address the latter, but that would still not address the unmatched health refresh rate.

2. Expose a flag in UpdateConfig and hardcode a large enough interval (e.g. 1 minute) internally. The Aurora client would then set "blockIfNoPulsesAfterMs" to the default interval whenever the require_update_pulse flag is set:

    ...
    update_config = UpdateConfig(require_update_pulse=True)
    ...

   This is more user friendly but less flexible if requirements change, and it still does not protect against changes in the external service's health refresh rate.

3. Do not expose any coordinated update settings in the public schema, and require the external service to act as a job update request proxy that mutates the job update config on the fly before passing it to the scheduler. This is ideal in that the external service controls the health refresh rate, but it may require too much hacking: we don't have a private job config schema, and relaying the user's identity via an external service is no fun from a security perspective.

Any other options?

I am personally leaning towards #1 with hardcoded min value validation as the simplest solution. Users will need to know what refresh rate their health monitoring system is capable of in order to configure pulse_interval_secs accordingly.

Thoughts?

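To make options 1 and 2 a bit more concrete, here is a rough sketch of the client-side handling I have in mind. It is only an illustration: UpdateConfigSketch, MIN_PULSE_INTERVAL_SECS, DEFAULT_PULSE_INTERVAL_SECS and both helper functions are hypothetical names rather than existing Aurora code, and the 60 second values are placeholders for whatever bound we agree on.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; the real bound and default are still to be decided.
MIN_PULSE_INTERVAL_SECS = 60
DEFAULT_PULSE_INTERVAL_SECS = 60


@dataclass(frozen=True)
class UpdateConfigSketch:
    """Stand-in for the client-side UpdateConfig; not the real Aurora schema."""
    pulse_interval_secs: Optional[int] = None  # option 1: None means a regular update
    require_update_pulse: bool = False         # option 2: flag only


def block_ms_option1(config: UpdateConfigSketch) -> Optional[int]:
    """Option 1: validate the user-supplied interval and convert it to milliseconds."""
    if config.pulse_interval_secs is None:
        return None  # not a coordinated update
    if config.pulse_interval_secs < MIN_PULSE_INTERVAL_SECS:
        raise ValueError(
            "pulse_interval_secs must be at least %d, got %d"
            % (MIN_PULSE_INTERVAL_SECS, config.pulse_interval_secs)
        )
    return config.pulse_interval_secs * 1000


def block_ms_option2(config: UpdateConfigSketch) -> Optional[int]:
    """Option 2: the flag selects a hardcoded default interval."""
    if not config.require_update_pulse:
        return None  # not a coordinated update
    return DEFAULT_PULSE_INTERVAL_SECS * 1000
```

In both cases the returned value is what the client would put into "blockIfNoPulsesAfterMs"; a None result means the field is left unset and the update is not coordinated.
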
Thanks,
Maxim

[1] - https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
[2] - http://mail-archives.apache.org/mod_mbox/aurora-dev/201410.mbox/%3ccaotkfx7x2oipk4zfysos0uwzrizonkja3y15pvew5k4ynuh...@mail.gmail.com%3E