Re: Mesos and Aurora out of Sync

2014-09-19 Thread Bill Farner
Thanks! We'll get this fixed up. Are you sorted out for now? -=Bill On Fri, Sep 19, 2014 at 3:30 AM, Stephan Erb wrote: > I've filed the bug: https://issues.apache.org/jira/browse/AURORA-728 > > Regards, > Stephan > > On 18.09.2014 17:38, Bill Farner wrote: > > Answering my own question: the

Re: Mesos and Aurora out of Sync

2014-09-19 Thread Stephan Erb
I've filed the bug: https://issues.apache.org/jira/browse/AURORA-728 Regards, Stephan On 18.09.2014 17:38, Bill Farner wrote: Answering my own question: the GC executor log shows the task ended up in LOST, so I'd guess you saw PENDING -> ASSIGNED -> [STARTING ->] LOST, the final one being the s

Re: Mesos and Aurora out of Sync

2014-09-18 Thread Stephan Erb
Looks like you are on the right track. The task you mentioned (1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01) is still considered RUNNING by Mesos, even though the stderr contains the mentioned ZooKeeper timeout error. For Aurora the task is in the LOST state (P

Re: Mesos and Aurora out of Sync

2014-09-18 Thread Bill Farner
Answering my own question: the GC executor log shows the task ended up in LOST, so I'd guess you saw PENDING -> ASSIGNED -> [STARTING ->] LOST, the final one being the scheduler assuming the task was dead. Definitely bug-worthy. -=Bill On Thu, Sep 18, 2014 at 8:37 AM, Bill Farner wrote: > For

Re: Mesos and Aurora out of Sync

2014-09-18 Thread Bill Farner
For that thermos executor stderr, was its task (1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01) transitioned cleanly to FAILED? The error itself indicates that the executor timed out communicating with your ZooKeeper cluster, something you should look into. If the tas

Re: Mesos and Aurora out of Sync

2014-09-18 Thread Bill Farner
Just to rule out the obvious - are GC tasks in the master's 22 tasks? Their task IDs would start with 'system-gc-'. -=Bill On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb wrote: > Hi everyone, > > on my local test cluster Mesos and Aurora seem to be running out of sync: > > - Mesos status: 22
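A minimal sketch of the check Bill suggests, assuming the master's task IDs have already been collected into a plain list by some other means. Only the 'system-gc-' prefix comes from the thread; the function name and the sample IDs are hypothetical.

```python
GC_PREFIX = "system-gc-"  # prefix named in the thread


def split_gc_tasks(task_ids):
    """Partition task IDs into (gc_tasks, other_tasks) by prefix."""
    gc_tasks = [t for t in task_ids if t.startswith(GC_PREFIX)]
    other_tasks = [t for t in task_ids if not t.startswith(GC_PREFIX)]
    return gc_tasks, other_tasks


# Hypothetical IDs, for illustration only.
sample_ids = [
    "system-gc-aaaa",
    "1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01",
    "system-gc-bbbb",
]
gc_tasks, other_tasks = split_gc_tasks(sample_ids)
print(len(gc_tasks), "GC tasks,", len(other_tasks), "other tasks")
```

If most of the master's extra tasks fall into the first list, the gap between the two counts is probably GC bookkeeping rather than real job tasks.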

Mesos and Aurora out of Sync

2014-09-18 Thread Stephan Erb
Hi everyone, on my local test cluster Mesos and Aurora seem to be running out of sync:
* Mesos status: 22 active tasks by the twitter scheduler
* Aurora status: 4 active production tasks, 1 active test task
* Slave status: thermos reports 5 active tasks and 'ps aux' reports 5 active proce
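One way to narrow down a mismatch like this is to compare the task IDs each side reports instead of only the counts. The sketch below assumes both ID lists have already been exported by hand; the function name and the sample IDs are hypothetical, and nothing here calls a Mesos or Aurora API.

```python
def diff_task_ids(mesos_ids, aurora_ids):
    """Report which task IDs are known to only one side, or to both."""
    mesos, aurora = set(mesos_ids), set(aurora_ids)
    return {
        "only_in_mesos": sorted(mesos - aurora),
        "only_in_aurora": sorted(aurora - mesos),
        "in_both": sorted(mesos & aurora),
    }


# Hypothetical IDs, for illustration only.
report = diff_task_ids(
    mesos_ids=["task-a", "task-b", "task-c"],
    aurora_ids=["task-b", "task-c", "task-d"],
)
for side, ids in report.items():
    print(side, ids)
```

IDs that show up only on the Mesos side are the ones worth checking first, since those are tasks the master still counts as active but the scheduler no longer tracks.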