Looks like you are on the right track.

The task you mentioned (1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f0) is still considered RUNNING by Mesos, even though its stderr contains the ZooKeeper timeout error you pointed out.

In Aurora, the task is in the LOST state (PENDING -> ASSIGNED -> STARTING -> RUNNING -> PREEMPTING -> LOST).

I will try to reproduce the issue tomorrow and post a bug report.

Best,
Stephan


On Thu, 18 Sep 2014 17:38:40 CEST, Bill Farner wrote:
Answering my own question: the GC executor log shows the task ended up in
LOST, so I'd guess you saw PENDING -> ASSIGNED -> [STARTING ->] LOST, the
final transition being the scheduler assuming the task was dead.  Definitely
bug-worthy.

-=Bill

On Thu, Sep 18, 2014 at 8:37 AM, Bill Farner <wfar...@apache.org> wrote:

For that thermos executor stderr, was its task
(1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01)
transitioned cleanly to FAILED?

The error itself indicates that the executor timed out communicating with
your ZooKeeper cluster, which is something you should look into.  If the
task didn't go to FAILED almost immediately, that's a bug on our side, and
I encourage you to file it.

-=Bill

On Thu, Sep 18, 2014 at 8:33 AM, Bill Farner <wfar...@apache.org> wrote:

Just to rule out the obvious - are GC tasks included in the master's 22
tasks?  Their task IDs would start with 'system-gc-'.
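
One quick way to check this is to split the master's task IDs by that prefix. The following is only a minimal sketch: the helper name split_gc_tasks is made up for illustration, the GC task ID in the sample is hypothetical, and how you obtain the list of IDs from the master is left open.

```python
GC_PREFIX = "system-gc-"  # prefix of GC task IDs, as stated above


def split_gc_tasks(task_ids):
    """Partition task IDs into (gc_tasks, other_tasks) by the GC prefix."""
    gc_tasks = [t for t in task_ids if t.startswith(GC_PREFIX)]
    other_tasks = [t for t in task_ids if not t.startswith(GC_PREFIX)]
    return gc_tasks, other_tasks


# Sample input: the first ID is a hypothetical GC task ID, the second is
# the task ID quoted in this thread.
sample = [
    "system-gc-hypothetical-example",
    "1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01",
]

gc_tasks, other_tasks = split_gc_tasks(sample)
print(len(gc_tasks), "GC tasks,", len(other_tasks), "other tasks")
```

If most of the 22 tasks land in the first list, the mismatch is probably just GC tasks being counted by the master.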

-=Bill

On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb <stephan....@blue-yonder.com> wrote:

Hi everyone,

On my local test cluster, Mesos and Aurora seem to be out of sync:

    - Mesos status: 22 active tasks for the twitter scheduler
    - Aurora status: 4 active production tasks, 1 active test task
    - Slave status: Thermos reports 5 active tasks and 'ps aux' reports
    5 active processes, i.e., Aurora and Thermos seem to be correct
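
To pin down which tasks differ, the two views can be compared as sets of task IDs. This is a rough sketch only: the function name out_of_sync and the IDs in the sample are hypothetical, and collecting the IDs from Mesos and Aurora is assumed to happen elsewhere.

```python
def out_of_sync(mesos_task_ids, aurora_task_ids):
    """Return (only_in_mesos, only_in_aurora) as sorted lists of task IDs."""
    mesos = set(mesos_task_ids)
    aurora = set(aurora_task_ids)
    return sorted(mesos - aurora), sorted(aurora - mesos)


# Hypothetical IDs, for illustration only.
mesos_ids = ["task-a", "task-b", "task-c"]
aurora_ids = ["task-a"]

only_in_mesos, only_in_aurora = out_of_sync(mesos_ids, aurora_ids)
print("active in Mesos but not in Aurora:", only_in_mesos)
print("active in Aurora but not in Mesos:", only_in_aurora)
```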


I thought the GC was supposed to reconcile this state. I have attached
the log file of a recent gc_executor run and the stderr of one of the
faulty executors. I am omitting the executors' log files, as they are
large and don't seem to show anything of interest.

Any idea what is wrong here?

Thanks,
Stephan

--
Stephan Erb
Software Engineer
*Blue Yonder GmbH*
Ohiostrasse 8
D-76149 Karlsruhe

Tel +49 (0)721 383 117 6243
Fax +49 (0)721 383 117 69

stephan....@blue-yonder.com <mailto:stephan....@blue-yonder.com>
www.blue-yonder.com <http://www.blue-yonder.com/>
Registergericht Mannheim, HRB 704547
USt-IdNr. DE DE 277 091 535
Geschäftsführer: Jochen Bossert, Uwe Weiss (CEO)
