Looks like you are on the right track.
The task you mentioned
(1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f0)
is still considered RUNNING by Mesos, even though its stderr contains
the ZooKeeper timeout error you mentioned.
For Aurora, the task is in the LOST state (PENDING -> ASSIGNED ->
STARTING -> RUNNING -> PREEMPTING -> LOST).
I will try to reproduce the issue tomorrow and post a bug report.
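
A rough sketch of how the disagreement described above (RUNNING in Mesos, LOST in Aurora) could be spotted across all tasks. It assumes the task states of both systems have already been exported into plain dictionaries; the function name, the dictionary layout, the sample task IDs, and the set of terminal states are illustrative assumptions, not an Aurora or Mesos API.

```python
# Assumption: states that mean "the task is gone" from Aurora's point of view.
TERMINAL_STATES = {"FAILED", "FINISHED", "KILLED", "LOST"}


def find_state_mismatches(mesos_states, aurora_states):
    """Return tasks that Mesos reports as RUNNING while Aurora regards
    them as terminal or has no record of them.

    Both arguments are hypothetical snapshots mapping task ID -> state.
    The result maps task ID -> (mesos_state, aurora_state).
    """
    mismatches = {}
    for task_id, mesos_state in mesos_states.items():
        aurora_state = aurora_states.get(task_id)  # None if unknown to Aurora
        if mesos_state == "RUNNING" and (
            aurora_state is None or aurora_state in TERMINAL_STATES
        ):
            mismatches[task_id] = (mesos_state, aurora_state)
    return mismatches


if __name__ == "__main__":
    # Made-up task IDs, for illustration only.
    mesos = {"task-a": "RUNNING", "task-b": "RUNNING", "task-c": "RUNNING"}
    aurora = {"task-a": "RUNNING", "task-b": "LOST"}
    for task_id, states in sorted(find_state_mismatches(mesos, aurora).items()):
        print(task_id, states)
```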
Best,
Stephan
On Thu 18 Sep 2014 17:38:40 CEST, Bill Farner wrote:
Answering my own question: the GC executor log shows the task ended up in
LOST, so I'd guess you saw PENDING -> ASSIGNED -> [STARTING ->] LOST, the
final one being the scheduler assuming the task was dead. Definitely
bug-worthy.
-=Bill
On Thu, Sep 18, 2014 at 8:37 AM, Bill Farner <wfar...@apache.org> wrote:
For that thermos executor stderr, was its task
(1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01)
transitioned cleanly to FAILED?
The error itself indicates that the executor timed out communicating with
your ZooKeeper cluster, something you should look into. If the task didn't
~immediately go to FAILED, that's a bug on our side, and I encourage you
to file a report for it.
-=Bill
On Thu, Sep 18, 2014 at 8:33 AM, Bill Farner <wfar...@apache.org> wrote:
Just to rule out the obvious - are GC tasks in the master's 22 tasks?
Their task IDs would start with 'system-gc-'.
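
As a minimal sketch of that check, assuming the active task IDs have already been copied out of the Mesos master into a list (the function name and the sample IDs are made up for illustration; only the 'system-gc-' prefix comes from this thread):

```python
GC_PREFIX = "system-gc-"  # prefix of GC executor task IDs, as stated above


def split_gc_tasks(task_ids):
    """Split task IDs into (gc_tasks, regular_tasks) based on the GC prefix."""
    gc_tasks = [t for t in task_ids if t.startswith(GC_PREFIX)]
    regular_tasks = [t for t in task_ids if not t.startswith(GC_PREFIX)]
    return gc_tasks, regular_tasks


if __name__ == "__main__":
    # Hypothetical IDs; real ones would come from the Mesos master.
    active = ["system-gc-example-1", "example-job-0", "system-gc-example-2"]
    gc_tasks, regular_tasks = split_gc_tasks(active)
    print(len(gc_tasks), "GC tasks,", len(regular_tasks), "regular tasks")
```

If the number of regular tasks matches what Aurora reports, the difference is explained by GC tasks alone.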
-=Bill
On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb <stephan....@blue-yonder.com> wrote:
Hi everyone,
On my local test cluster, Mesos and Aurora seem to be out of sync:
- Mesos status: 22 active tasks by the twitter scheduler
- Aurora status: 4 active production tasks, 1 active test task
- Slave status: Thermos reports 5 active tasks and 'ps aux' reports
5 active processes, i.e., Aurora and Thermos seem to be correct
I thought the GC was supposed to reconcile this state? I have attached
the log file of a recent gc_executor run and the stderr of one of the
faulty executors. I am omitting the executor log files, as they are
large and don't seem to show anything of interest.
Any idea what is wrong here?
Thanks,
Stephan
--
Stephan Erb