You are right, the job's stderr complains with this ImportError:

ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by /home/weblab/.pex/install/mesos.native-0.20.1-py2.7-linux-x86_64.egg.3ad559b2b9ba2a363146049c27abff70d4860891/mesos.native-0.20.1-py2.7-linux-x86_64.egg/mesos/native/_mesos.so)

Installing libstdc++6-4.7-dev from ppa:ubuntu-toolchain-r/test resolved this. I have now also started ./thermos_observer.pex --root=/var/run/thermos manually, and the example runs fine. Many thanks!

On Wed, Feb 18, 2015 at 6:19 PM, Bill Farner <wfar...@apache.org> wrote:
> The executor is dying, and the scheduler is defensively backing off from
> the task. Key scheduler log content is:
>
>   I0218 14:26:57.759 THREAD156
>   org.apache.aurora.scheduler.mesos.MesosSchedulerImpl.statusUpdate:
>   Received status update for task
>   1424269615751-weblab-devel-hello_server-0-d3181d51-669d-4ed3-aea2-8a7a82711b92
>   in state TASK_LOST with core message Executor terminated
>
> You can get more info in the executor sandbox, which is named
> "thermos-$taskid", the task ID being the UUID in the log line above.
> The easiest way to find it is "find / -name thermos-$taskid". Please
> look at files in that directory and post anything that appears
> relevant.
>
> On Wednesday, February 18, 2015, Xasima <xas...@gmail.com> wrote:
>
> > Many thanks for the suggestion. I successfully ran the aurora.pex job
> > create command after restarting mesos-slave with the provided rack, host,
> > and ip values.
> >
> > Nevertheless, the UI shows the following error for this job:
> > 'THROTTLED : Rescheduled, penalized for 60000 ms for flapping'
> > Here is the aurora-scheduler log for it:
> > https://gist.github.com/xasima/12de906475d70523316a#comment-1396134
> >
> > I should mention that the thermos and executor pex files are built with
> > the appropriate mesos eggs (see below), and are also provided to
> > aurora-scheduler.
> >
> > ls /opt/apache-aurora-0.7.0-incubating/third_party/
> > mesos-0.20.1-py2.7-linux-x86_64.egg
> > mesos.interface-0.20.1-py2.7.egg
> > mesos.native-0.20.1-py2.7-linux-x86_64.egg
> > mesos-0.21.0-py2.7-linux-x86_64.egg
> > mesos.interface-0.21.1-py2.7.egg
> > mesos.native-0.21.1-py2.7-linux-x86_64.egg
> >
> > How can I overcome this?
> >
> > On Wed, Feb 18, 2015 at 12:18 AM, Steve Niemitz <st...@tellapart.com> wrote:
> >
> > > Is there a reason you set zk_in_proc=true? Setting it tells the scheduler
> > > to ignore the "real" ZK server and use an in-proc one instead.
> > >
> > >   -zk_in_proc=false
> > >       Launches an embedded zookeeper server for local testing causing
> > >       -zk_endpoints to be ignored if specified.
> > >       (com.twitter.common.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc)
> > >
> > > On Tue, Feb 17, 2015 at 4:09 PM, Xasima <xas...@gmail.com> wrote:
> > >
> > > > Hello. I'm running into the following problems when trying to run the
> > > > very first 'aurora.pex job create' command:
> > > > 1) 'Could not connect to scheduler: No schedulers detected in devcluster'
> > > > and
> > > > 2) 'Failed to connect to Zookeeper within 10 seconds.'
> > > >
> > > > I have tried to check everything in the configuration, but I can't find
> > > > the root of the problem so far. I have zookeeper, mesos-master,
> > > > mesos-slave, and aurora-scheduler running on the same server. One minor
> > > > difference from the default vagrant/example configuration is the use of
> > > > a non-default http_port for the aurora scheduler.
> > > >
> > > > Namely, the aurora scheduler is running with the following /vars property:
> > > >
> > > > jvm_prop_sun_java_command org.apache.aurora.scheduler.app.SchedulerMain
> > > > -thermos_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/thermos_executor.pex
> > > > -gc_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/gc_executor.pex
> > > > -http_port=8091 -zk_in_proc=true -zk_endpoints=localhost:2181
> > > > -zk_session_timeout=2secs -serverset_path=/aurora/scheduler
> > > > -mesos_master_address=zk://localhost:2181/mesos -cluster_name=devcluster
> > > > -native_log_quorum_size=1
> > > > -native_log_file_path=/usr/local/aurora-scheduler/db
> > > > -native_log_zk_group_path=/local/service/mesos-native-log
> > > > -backup_dir=/usr/local/aurora-scheduler/backups -logtostderr -vlog=INFO
> > > >
> > > > and here is the tail of the aurora-scheduler log, which looks successful:
> > > >
> > > > W0217 20:42:25.952 THREAD140
> > > > com.twitter.common.zookeeper.ServerSetImpl.join: Joining a ServerSet
> > > > without a shard ID is deprecated and will soon break.
> > > > com.twitter.common.zookeeper.Group$ActiveMembership.join: Set group member
> > > > ID to member_0000000001
> > > >
> > > > I0217 20:42:26.026 THREAD132
> > > > com.twitter.common.zookeeper.ServerSetImpl$ServerSetWatcher.logChange:
> > > > server set /aurora/scheduler change: from 0 members to 1
> > > > joined:
> > > > ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002, port:8091), additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002, port:8091)}, status:ALIVE)
> > > >
> > > > I0217 20:42:26.026 THREAD132
> > > > org.apache.aurora.scheduler.http.LeaderRedirect$SchedulerMonitor.onChange:
> > > > Found leader scheduler at
> > > > [ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002, port:8091), additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002, port:8091)}, status:ALIVE)]
> > > >
> > > > Not sure if this is suspicious, but in zookeeper I see the
> > > > /local/service/mesos-native-log/0000000010 node and the
> > > > /mesos/info_000000003 nodes, but there is no /aurora/scheduler node.
> > > >
> > > > The configuration file /etc/aurora/clusters.json points to zk with the
> > > > proper scheduler_zk_path. All *.pex files are built with pants against
> > > > the appropriate built or downloaded AURORA_DIST/third_party/mesos_*.egg.
> > > > This gist contains all the details of my configuration:
> > > > https://gist.github.com/xasima/12de906475d70523316a
> > > >
> > > > Nevertheless, the trivial hello_world service fails to run with these
> > > > errors:
> > > > WARN] Could not connect to scheduler: No schedulers detected in devcluster!
> > > > WARN] Could not connect to scheduler: Failed to connect to Zookeeper within 10 seconds.
> > > >
> > > > Could someone please help and examine the configuration above?
> > > >
> > > > --
> > > > Best regards,
> > > > ~ Xasima ~
> >
> > --
> > Best regards,
> > ~ Xasima ~
>
> --
> -=Bill

--
Best regards,
~ Xasima ~