Possible pros to having the scheduler do the updates:

- The scheduler likely has the most direct information about job/task SLA-style metrics, and can use these to help keep jobs within SLA during an update.
- If updates are specified as a "rate of change", then if/when tasks fail in large jobs, the update rate may be adjusted automatically to stay within SLA, and the scheduler could possibly use an opportunistic method that brings up a failed task's replacement directly on the new version.
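
To make the "rate of change" idea a bit more concrete, here is a rough sketch of what I have in mind. It is not Aurora code: the `Task` class, `allowed_batch_size`, `replacement_version`, and the `min_healthy` / `max_batch` parameters are all hypothetical names made up for illustration, and the SLA is assumed to be expressed simply as a minimum number of healthy tasks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; not an Aurora data structure.
    task_id: int
    config_version: str
    healthy: bool = True


def allowed_batch_size(tasks, min_healthy, max_batch):
    """How many tasks may be taken down for update right now.

    Assumes the SLA is "at least min_healthy tasks are healthy".
    The requested rate (max_batch) is throttled by the current headroom,
    so unrelated task failures automatically slow the update down.
    """
    healthy = sum(1 for t in tasks if t.healthy)
    headroom = healthy - min_healthy
    return max(0, min(max_batch, headroom))


def replacement_version(failed_task, target_version, update_in_progress):
    """Pick the config version for the replacement of a failed task.

    Opportunistic upgrade: if an update is in progress, the replacement
    starts on the target version instead of the version that failed.
    """
    if update_in_progress:
        return target_version
    return failed_task.config_version
```

With 10 healthy tasks, `min_healthy=8` and `max_batch=3`, the sketch allows 2 tasks to be updated at once; if 2 tasks fail on their own, the allowance drops to 0 until replacements come up healthy.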

-Toby.

On Fri, Jul 25, 2014 at 11:41 AM, Bill Farner <wfar...@apache.org> wrote:

> Hi all,
>
> Rolling updates of services is a crucial feature in Aurora. As such, we
> want to take great care when changing its behavior. Today, Aurora operates
> by delegating this functionality to the client (or any API client, for that
> matter). While this has provided a nice abstraction, it turns out there are
> some shortcomings with this approach:
>
> 1. Visibility: since the scheduler does not know about updates, it cannot
>    display useful information about an in-progress update
> 2. Visibility: for two users to diagnose a failed update, they must be at
>    the same terminal, or copy/paste terminal output
> 3. Usability: the scheduler has no means to show information about how an
>    application's packages or configuration changed over time
> 4. Usability: update orchestration in the client means a lost connection
>    to the scheduler halts an update
>
> Some of the above issues can be addressed by moving update orchestration to
> a service external to the scheduler. At first glance, this approach is
> attractive, as there is a firm separation of concerns. However, there are a
> few pitfalls with this approach:
>
> 1. Usability: setup and maintenance of an Aurora cluster becomes even
>    more complicated (additional service + storage system)
> 2. Usability: the user interface becomes more complicated to stitch
>    together, as end-users really should only have to visit one website to view
>    job information.
> 3. Complexity: implementing a new production-ready service from scratch
>    will take a non-trivial amount of time
>
> With these issues in mind, I propose that the scheduler take over the
> responsibility of application update orchestration. This will allow us to
> solve the current design shortcomings, without the pitfalls of the separate
> service approach.
>
> I'm interested in thoughts others have on this. Does the reasoning seem
> sound? Are there things I'm missing?
>
> -=Bill