Regarding that thermos executor's stderr: did its task (1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01) transition cleanly to FAILED?
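(One way to check, sketched below: look the task up in the Mesos master's state endpoint and see whether the master still holds it in a non-terminal state. This is an illustrative sketch rather than tooling from this thread; the master address is a placeholder, and the JSON layout assumed is the frameworks/tasks structure served by the master's /state.json endpoint.)

    # Illustrative sketch only: query the Mesos master's state endpoint and print
    # the state it holds for the task in question. The master URL is a placeholder.
    import json
    from urllib.request import urlopen

    MASTER = "http://mesos-master.example.com:5050"   # placeholder address
    TASK_ID = ("1410972813312-www-data-test-ipython-15-"
               "1de938e1-5575-4510-985b-bdf7ea8a0f01")

    state = json.loads(urlopen(MASTER + "/state.json").read())
    frameworks = state.get("frameworks", []) + state.get("completed_frameworks", [])
    for fw in frameworks:
        for task in fw.get("tasks", []) + fw.get("completed_tasks", []):
            if task["id"] == TASK_ID:
                # A task that failed cleanly should show a terminal state such as
                # TASK_FAILED rather than TASK_RUNNING.
                print(task["id"], task["state"])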
The error itself indicates that the executor timed out communicating with
your ZooKeeper cluster, which is something you should look into. If the
task didn't go to FAILED almost immediately, that's a bug on our side, and
I encourage you to file a ticket for it.

-=Bill

On Thu, Sep 18, 2014 at 8:33 AM, Bill Farner <wfar...@apache.org> wrote:

> Just to rule out the obvious - are GC tasks among the master's 22 tasks?
> Their task IDs would start with 'system-gc-'.
>
> -=Bill
>
> On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb <stephan....@blue-yonder.com>
> wrote:
>
>> Hi everyone,
>>
>> On my local test cluster, Mesos and Aurora seem to have gotten out of sync:
>>
>> - Mesos status: 22 active tasks by the twitter scheduler
>> - Aurora status: 4 active production tasks, 1 active test task
>> - Slave status: thermos reports 5 active tasks and 'ps aux' reports 5
>>   active processes, i.e., Aurora and Thermos seem to be correct
>>
>> I thought the GC executor was supposed to reconcile this state? I have
>> attached the log file of a recent gc_executor run and the stderr of one
>> of the faulty executors. I am omitting the log files of the executors, as
>> they are large and don't seem to show anything of interest.
>>
>> Any idea what is wrong here?
>>
>> Thanks,
>> Stephan
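For the GC-task question raised above, a minimal sketch along the same lines (again assuming the master's /state.json layout; the address is a placeholder): count how many of the tasks the master reports as active have IDs starting with 'system-gc-'.

    # Illustrative sketch only: split the master's active tasks into GC tasks
    # (IDs prefixed 'system-gc-') and regular tasks. The master URL is a placeholder.
    import json
    from urllib.request import urlopen

    MASTER = "http://mesos-master.example.com:5050"   # placeholder address

    state = json.loads(urlopen(MASTER + "/state.json").read())
    active = [task for fw in state.get("frameworks", [])
                   for task in fw.get("tasks", [])]
    gc_tasks = [t for t in active if t["id"].startswith("system-gc-")]

    print("active tasks on master:", len(active))
    print("of which GC tasks:     ", len(gc_tasks))
    for t in gc_tasks:
        print(" ", t["id"], t["state"])

Comparing that count against what thermos and 'ps aux' report on the slave shows whether the discrepancy is explained by GC tasks or by tasks the master has genuinely lost track of.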