I agree with everything here.  A big pain point from the Docker integration
side was/is the observer, and rolling the observer functionality into the
executor would simplify things greatly.

On Sat, Jan 24, 2015 at 12:29 PM, Bill Farner <wfar...@apache.org> wrote:

> +1, thanks for the braindump, Brian!  This sounds great.
>
> -=Bill
>
> On Sat, Jan 24, 2015 at 8:43 AM, Joe Smith <yasumo...@gmail.com> wrote:
>
> > Thanks for the write up!
> >
> > > On Jan 22, 2015, at 13:27, Brian Wickman <wick...@apache.org> wrote:
> > >
> > > Thermos is a standalone task execution system that is not coupled to
> > > Aurora or Mesos.  This is why, by default, Thermos writes outside the
> > > sandbox (/var/run/thermos) and has a separate observability system
> > > (the Thermos observer) and CLI (thermos).
> > >
> > > Aurora built a Thermos executor as its default executor, but the
> > > scheduler is not architecturally tied to Thermos (or vice versa).  In
> > > order to make things work smoothly with this decoupling, a
> > > Thermos-specific GC executor is also necessary to clean up the state
> > > left over by the execution of Thermos tasks and to reconcile potential
> > > conflicts between the state of the Mesos master and the Aurora
> > > scheduler.
> > >
> > > Both the GC executor and Thermos observer violate some of the
> > > philosophical axioms of Mesos (e.g. out-of-sandbox access).  They also
> > > significantly increase the complexity of building, deploying and
> > > maintaining Aurora.  I'm proposing removing both of them as required
> > > Aurora components.
> > >
> > > In order to do this and make Thermos, Aurora and Mesos play together
> > > more nicely, several things are necessary.
> > >
> > > 1) Moving /var/run/thermos for each task into the Mesos sandbox
> > >
> > > Thermos is a state machine with all state transitions persisted to
> > > disk.  Right now this goes to /var/run/thermos, but it should instead
> > > be persisted some place relative to the Mesos sandbox so that the Mesos
> > > slave can garbage collect this state once a Thermos task has completed.
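
For illustration only, a minimal Python sketch of persisting state transitions under the sandbox instead of /var/run/thermos. The `.thermos/checkpoints/<task_id>.log` layout, the JSON-lines record format and the function names are hypothetical stand-ins; Thermos's actual checkpoint format differs. The point is only that every path is derived from the sandbox, so the slave's sandbox GC removes the state along with the task.

```python
import json
import os
import tempfile
import time

# Hypothetical layout: <sandbox>/.thermos/checkpoints/<task_id>.log
CHECKPOINT_SUBDIR = os.path.join(".thermos", "checkpoints")


def checkpoint_path(sandbox, task_id):
    """Return the sandbox-relative checkpoint file for a task."""
    return os.path.join(sandbox, CHECKPOINT_SUBDIR, task_id + ".log")


def record_transition(sandbox, task_id, state):
    """Append one state transition (JSON lines stand in for the real format)."""
    path = checkpoint_path(sandbox, task_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as fp:
        fp.write(json.dumps({"state": state, "timestamp": time.time()}) + "\n")
        fp.flush()
        os.fsync(fp.fileno())  # make the transition durable before moving on


def replay(sandbox, task_id):
    """Rebuild the ordered list of states by replaying the checkpoint file."""
    with open(checkpoint_path(sandbox, task_id)) as fp:
        return [json.loads(line)["state"] for line in fp if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        for state in ("ACTIVE", "SUCCESS"):
            record_transition(sandbox, "hello_world-0", state)
        print(replay(sandbox, "hello_world-0"))  # ['ACTIVE', 'SUCCESS']
```

Because the log is append-only and replayed from the start, a restarted executor can recover the task's state from the sandbox alone.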
> > >
> > > This poses a task detection problem -- the Thermos CLI and Thermos
> > > observer rely upon the existence of /var/run/thermos to know what tasks
> > > are running, so we will need to develop a plugin to detect alternate
> > > task roots (see AURORA-1024
> > > <https://issues.apache.org/jira/browse/AURORA-1024> AURORA-1025
> > > <https://issues.apache.org/jira/browse/AURORA-1025> AURORA-1026
> > > <https://issues.apache.org/jira/browse/AURORA-1026> AURORA-1027
> > > <https://issues.apache.org/jira/browse/AURORA-1027>).
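
One possible shape for such a detection plugin, as a Python sketch under stated assumptions: the `TaskRootDetector` interface and the detector classes are hypothetical names, not existing Thermos APIs, and the `.thermos/checkpoints` subdirectory is the same hypothetical layout as above. The glob follows the Mesos slave's sandbox layout under its work directory (`slaves/<id>/frameworks/<id>/executors/<id>/runs/<container>`); the `latest` symlink in each `runs/` directory is skipped so every run is reported once.

```python
import glob
import os
import tempfile

# Mesos slave sandbox layout under --work_dir; the checkpoint subdir is hypothetical.
SANDBOX_GLOB = os.path.join(
    "slaves", "*", "frameworks", "*", "executors", "*", "runs", "*")
CHECKPOINT_SUBDIR = os.path.join(".thermos", "checkpoints")
LEGACY_ROOT = "/var/run/thermos"


class TaskRootDetector:
    """Hypothetical plugin interface: yield directories that may hold Thermos state."""

    def find_roots(self):
        raise NotImplementedError


class LegacyDetector(TaskRootDetector):
    """Keeps finding tasks that still write to the old fixed root."""

    def __init__(self, root=LEGACY_ROOT):
        self._root = root

    def find_roots(self):
        if os.path.isdir(self._root):
            yield self._root


class MesosSandboxDetector(TaskRootDetector):
    """Scans every executor run under the slave work directory."""

    def __init__(self, work_dir):
        self._work_dir = work_dir

    def find_roots(self):
        for sandbox in sorted(glob.glob(os.path.join(self._work_dir, SANDBOX_GLOB))):
            if os.path.islink(sandbox):  # skip runs/latest
                continue
            candidate = os.path.join(sandbox, CHECKPOINT_SUBDIR)
            if os.path.isdir(candidate):
                yield candidate


def all_roots(detectors):
    """Union of roots from every detector, de-duplicated, in discovery order."""
    seen = []
    for detector in detectors:
        for root in detector.find_roots():
            if root not in seen:
                seen.append(root)
    return seen
```

The CLI or observer would call `all_roots([MesosSandboxDetector(work_dir), LegacyDetector()])`, which lets old and new task roots coexist during a migration.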
> > >
> > > 2) Making the Thermos executor responsible for the Thermos UI
> > >
> > > In order to make the Thermos observer an optional component, the
> > > Thermos executor will need to assume the Thermos observer's
> > > responsibilities.  Since the Mesos slave already provides a webserver
> > > to serve executor sandboxes, I am proposing that the Thermos executor
> > > generate static HTML content that can be served by the Mesos slave as a
> > > UI.  This means that the executor can remain lean (no embedded
> > > webserver).  See AURORA-725
> > > <https://issues.apache.org/jira/browse/AURORA-725> AURORA-777
> > > <https://issues.apache.org/jira/browse/AURORA-777>
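
A rough Python sketch of the "static HTML instead of an embedded webserver" idea, assuming the executor simply rewrites a page in its sandbox whenever process state changes. The file name `thermos.html`, the page layout and the function names are hypothetical. The page is written to a temporary file and renamed into place, so the slave never serves a half-written page.

```python
import html
import os
import tempfile


def render_status_page(task_id, processes):
    """Render a minimal static page; `processes` maps process name -> state."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(name), html.escape(state))
        for name, state in sorted(processes.items()))
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>{0}</title></head><body>\n<h1>{0}</h1>\n"
        "<table><tr><th>Process</th><th>State</th></tr>\n{1}\n</table>\n"
        "</body></html>\n").format(html.escape(task_id), rows)


def write_status_page(sandbox, task_id, processes, filename="thermos.html"):
    """Atomically (re)write the page inside the sandbox and return its path."""
    fd, tmp = tempfile.mkstemp(dir=sandbox, suffix=".tmp")
    with os.fdopen(fd, "w") as fp:
        fp.write(render_status_page(task_id, processes))
    target = os.path.join(sandbox, filename)
    os.replace(tmp, target)  # atomic rename on POSIX filesystems
    return target
```

Since the output is an ordinary sandbox file, it is served by the slave's existing sandbox browsing and garbage collected with the rest of the sandbox.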
> > >
> > > 3) Making the Aurora scheduler responsible for state reconciliation
> > >
> > > The last component that should be removed is the GC executor.  The GC
> > > executor performs the important task of state reconciliation, but this
> > > is now supported directly by the Mesos master.  See AURORA-715
> > > <https://issues.apache.org/jira/browse/AURORA-715> and specifically
> > > AURORA-1047 <https://issues.apache.org/jira/browse/AURORA-1047>.
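
To make the scheduler-side responsibility concrete, here is a simplified Python sketch of the decision logic only, assuming the scheduler has already collected a complete snapshot of task statuses from the master (real Mesos reconciliation delivers these as ordinary status updates and only approximates such a snapshot). The state names are Mesos task states; `reconcile` and `ReconcileActions` are hypothetical names, and no Mesos driver calls are shown.

```python
from dataclasses import dataclass, field

ACTIVE_STATES = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING"}
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


@dataclass
class ReconcileActions:
    # Running on Mesos, but unknown or already terminal to the scheduler.
    to_kill: list = field(default_factory=list)
    # Scheduler state is stale; adopt the master's state.
    to_update: dict = field(default_factory=dict)
    # Scheduler thinks the task is active, but the master never reported it.
    to_mark_lost: list = field(default_factory=list)


def reconcile(scheduler_view, master_view):
    """Compare stored task states (task_id -> state) with master-reported ones."""
    actions = ReconcileActions()
    for task_id, master_state in sorted(master_view.items()):
        ours = scheduler_view.get(task_id)
        if master_state in ACTIVE_STATES and (ours is None or ours in TERMINAL_STATES):
            actions.to_kill.append(task_id)
        elif ours is not None and ours != master_state:
            actions.to_update[task_id] = master_state
    for task_id, ours in sorted(scheduler_view.items()):
        if ours in ACTIVE_STATES and task_id not in master_view:
            actions.to_mark_lost.append(task_id)
    return actions
```

The kill branch covers the orphaned tasks that a GC executor would otherwise clean up on the host; the other two branches correct the scheduler's own records.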
> >
> > Although the trusty gc_executor has been solid for a long time, removing
> > it would definitely simplify things, so +10.
> >
> >
> > >
> > > Lastly, this work should make it much easier to support alternate
> > > executor implementations (including the Mesos default executor) from
> > > Aurora once a proper Aurora API (AURORA-987
> > > <https://issues.apache.org/jira/browse/AURORA-987>) is available.
> > >
> > > ~brian
> >
>
