Couldn't (yet-to-be-surfaced) information in the sandbox satisfy #2?

-=Bill
On Mon, Feb 23, 2015 at 3:58 PM, Zameer Manji <zma...@apache.org> wrote:

> I think the thermos CLI is a bit overloaded. It shows operators what is running on the machine via Thermos. This happens to serve two use cases:
>
> 1. Checking what Aurora has scheduled on the machine.
> 2. Checking the state of Thermos-run processes.
>
> #1 should be provided by Mesos, and we should be keeping the CLI because of #2.
>
> On Mon, Feb 23, 2015 at 3:54 PM, Bill Farner <wfar...@apache.org> wrote:
>
> > I'm saying that if Aurora launched Hadoop containers, we can no longer use the thermos CLI as a view into what Aurora has scheduled on the host.
> >
> > Happy to break the conversation down into one of features.
> >
> > -=Bill
> >
> > On Mon, Feb 23, 2015 at 3:39 PM, Kevin Sweeney <kevi...@apache.org> wrote:
> >
> > > I agree with Brian here - the thermos CLI is equivalent to a read-write observer. I wonder if we can change this into a discussion about features - what features of the CLI are necessary and which are superfluous?
> > >
> > > On Mon, Feb 23, 2015 at 3:33 PM, Brian Wickman <wick...@apache.org> wrote:
> > >
> > > > I don't see how that follows.
> > > >
> > > > Are you saying that if Aurora can start scheduling Hadoop containers, people can't count on their Hadoop tools for debugging?
> > > >
> > > > On Mon, Feb 23, 2015 at 2:02 PM, Bill Farner <wfar...@apache.org> wrote:
> > > >
> > > > > Perhaps I was unclear - I was saying that the thermos CLI cannot be counted on as a debugging tool when other executors are in play.
> > > > >
> > > > > -=Bill
> > > > >
> > > > > On Mon, Feb 23, 2015 at 1:12 PM, Joseph Smith <yasumo...@gmail.com> wrote:
> > > > >
> > > > > > I actually find that the observer fits this use case just as well, and is a better interface since the scheduler gives users a link to it on each host as well.
> > > > > >
> > > > > > I’m looking to decrease build complexity, and removing this would be a great victory for that.
> > > > > >
> > > > > > > On Feb 18, 2015, at 12:56 PM, Maxim Khutornenko <ma...@apache.org> wrote:
> > > > > > >
> > > > > > > Running "thermos status --verbosity=3" gives full thermos task history including the sandbox path and process table contents. This really saves time when trying to get to the failed task details or see what else is running on a host.
> > > > > > >
> > > > > > > On Wed, Feb 18, 2015 at 12:04 PM, Bill Farner <wfar...@apache.org> wrote:
> > > > > > > > Can either of you elaborate on the type of debugging you currently accomplish with this tool?
> > > > > > > >
> > > > > > > > On Wednesday, February 18, 2015, Brian Wickman <wick...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > I agree it is a valuable component. However, I think that until it has test coverage we should consider it an unsupported tool. Filed AURORA-1131 <https://issues.apache.org/jira/browse/AURORA-1131>. This is already on my radar as part of AURORA-1027 <https://issues.apache.org/jira/browse/AURORA-1027>.
> > > > > > > > >
> > > > > > > > > On Wed, Feb 18, 2015 at 9:19 AM, Maxim Khutornenko <ma...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > > Moving parts should either provide value or be obliterated from our source tree.
> > > > > > > > > >
> > > > > > > > > > I generally agree. In this particular case it's still unclear to me - in the absence of Thermos CLI and Observer, how do we conduct live site executor/thermos troubleshooting?
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 17, 2015 at 7:45 PM, Bill Farner <wfar...@apache.org> wrote:
> > > > > > > > > > > > I think we would be better served by advertising it as an optional component that provides operators and users with debugging ability.
> > > > > > > > > > >
> > > > > > > > > > > Slightly tangential discussion, but I think we should be very skeptical of fringe components. Moving parts should either provide value or be obliterated from our source tree.
> > > > > > > > > > >
> > > > > > > > > > > -=Bill
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 17, 2015 at 6:51 PM, Zameer Manji <zma...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > One thing I would like to point out is that the thermos CLI is not required for Aurora operation. I think we would be better served by advertising it as an optional component that provides operators and users with debugging ability.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 17, 2015 at 6:38 PM, Joseph Smith <yasumo...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I believe it absolutely is - ideally, as we deprecate the Observer, we can then lean on the Mesos Slave for this information instead. This will further decrease the number of moving pieces, simplifying the operation of an Aurora/Mesos cluster.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Feb 17, 2015, at 6:33 PM, Zameer Manji <zma...@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Joe,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If I understand Brian's proposal correctly <http://mail-archives.apache.org/mod_mbox/aurora-dev/201501.mbox/%3CCAFTdr0DZvH21tR=NLK0qP-Y9-oL9SyULy6GLah=capuw0sv...@mail.gmail.com%3E>, we are going to deprecate the Observer. This combined with your proposal will make the executor the only component that can read the thermos checkpoints and produce some output that is human readable. Is that something we want to do?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Feb 17, 2015 at 6:26 PM, Joseph Smith <yasumo...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > After reviewing the functionality offered by the Thermos command-line tool vs. what’s exported via the Thermos Observer, I was hoping to bring up a question I had:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can we deprecate the Thermos CLI?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Removing this would decrease the number of components required for a functional Aurora installation (a huge victory, in my opinion) and also enable the Observer to fully take over the duty of providing visibility into what’s running on a host. In addition, maintenance is performed via the HostMaintenance API <https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/admin/host_maintenance.py#L26> and should not be done using thermos kill, which would cause LOST tasks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That said, removing this tool makes it much more difficult for Thermos to be used as a monit <http://mmonit.com/monit/> replacement, which is actually rather feasible now. In addition, it also forces people to remember + learn the port the Observer is running on in order to get information about tasks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Any thoughts and opinions would be much appreciated!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > > > Joe
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Zameer Manji
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Zameer Manji
> > > > > > > >
> > > > > > > > --
> > > > > > > > -=Bill
> >
> > --
> > Zameer Manji