Re: Aurora.pex client can't find scheduler

Bill Farner Wed, 18 Feb 2015 12:02:13 -0800

Great to hear!

On Wednesday, February 18, 2015, Xasima <xas...@gmail.com> wrote:


> You are right; the job's stderr complains about:
>
> ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version
> `GLIBCXX_3.4.18' not found (required by
> /home/weblab/.pex/install/mesos.native-0.20.1-py2.7-linux-x86_64.egg.3ad559b2b9ba2a363146049c27abff70d4860891/mesos.native-0.20.1-py2.7-linux-x86_64.egg/mesos/native/_mesos.so)
>
> so installing libstdc++6-4.7-dev from ppa:ubuntu-toolchain-r/test
> resolved this.
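The GLIBCXX error means the system libstdc++ does not export the symbol version that the prebuilt mesos.native egg was linked against. A common manual check is "strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX"; the Python sketch below does the same kind of byte scan. It is an approximation: it looks for the version tag strings anywhere in the file rather than parsing the ELF version-definition table, and the helper names glibcxx_versions and provides are hypothetical.

```python
import re
from typing import Set

# Version tags such as b"GLIBCXX_3.4.18" appear as plain strings inside the
# shared library, so a byte scan finds them (like `strings | grep GLIBCXX`).
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(lib_path: str) -> Set[str]:
    """Return the GLIBCXX version numbers (e.g. "3.4.18") found in lib_path."""
    with open(lib_path, "rb") as f:
        data = f.read()
    return {m.group(1).decode("ascii") for m in _GLIBCXX_RE.finditer(data)}


def provides(lib_path: str, required: str) -> bool:
    """True if lib_path advertises the GLIBCXX_<required> version tag."""
    return required in glibcxx_versions(lib_path)
```

For example, provides("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "3.4.18") should return False on a system that produces the ImportError above, and True once a newer libstdc++ is installed.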
>
> Now I started ./thermos_observer.pex --root=/var/run/thermos manually, and
> the example runs fine. Many thanks!
>
> On Wed, Feb 18, 2015 at 6:19 PM, Bill Farner <wfar...@apache.org> wrote:
>
> > The executor is dying, and the scheduler is defensively backing off from
> > the task. Key scheduler log content is:
> >
> > I0218 14:26:57.759 THREAD156
> > org.apache.aurora.scheduler.mesos.MesosSchedulerImpl.statusUpdate:
> > Received status update for task
> > 1424269615751-weblab-devel-hello_server-0-d3181d51-669d-4ed3-aea2-8a7a82711b92
> > in state TASK_LOST with core message Executor terminated
> >
> > You can get more info in the executor sandbox, which is named
> > "thermos-$taskid", the task ID being the UUID in the log line above.
> > The easiest way to find it is "find / -name thermos-$taskid".  Please
> > look at files in that directory and post anything that appears
> > relevant.
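The two steps described here, pulling the task ID out of the statusUpdate log entry and locating the thermos-<taskid> sandbox, can be sketched in Python as below. This assumes the sandbox directory is named "thermos-" followed by the full task ID string exactly as logged; parse_status_update and find_sandboxes are hypothetical helpers, and unlike find(1) the walk matches directories only and silently skips directories it cannot read.

```python
import os
import re
from typing import List, Optional, Tuple

# Matches the task ID and state in a MesosSchedulerImpl.statusUpdate entry;
# \s+ also spans the line breaks the log message wraps across.
_STATUS_RE = re.compile(r"Received status update for task\s+(\S+)\s+in state\s+(\w+)")


def parse_status_update(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (task_id, state) from a statusUpdate log entry, or None."""
    m = _STATUS_RE.search(log_text)
    return (m.group(1), m.group(2)) if m else None


def find_sandboxes(root: str, task_id: str) -> List[str]:
    """Return directories under root named thermos-<task_id>.

    Roughly 'find <root> -type d -name thermos-<task_id>'; os.walk ignores
    unreadable directories instead of raising.
    """
    target = "thermos-" + task_id
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if target in dirnames:
            hits.append(os.path.join(dirpath, target))
    return hits
```

Walking from / can be slow; starting at the Mesos slave's work directory narrows the search if its location is known.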
> >
> >
> > On Wednesday, February 18, 2015, Xasima <xas...@gmail.com> wrote:
> >
> > > Many thanks for the suggestion. I successfully ran the aurora.pex job
> > > create command after restarting mesos-slave with the provided rack,
> > > host, and ip values.
> > >
> > > However, the UI now shows the following status for this job:
> > > 'THROTTLED : Rescheduled, penalized for 60000 ms for flapping'
> > > Here is the aurora-scheduler log for it:
> > > https://gist.github.com/xasima/12de906475d70523316a#comment-1396134
> > >
> > > I should mention that the thermos and executor pex files are built with
> > > the appropriate mesos eggs (see below), and are also provided to
> > > aurora-scheduler.
> > >
> > > ls /opt/apache-aurora-0.7.0-incubating/third_party/
> > > mesos-0.20.1-py2.7-linux-x86_64.egg
> > > mesos.interface-0.20.1-py2.7.egg
> > > mesos.native-0.20.1-py2.7-linux-x86_64.egg
> > > mesos-0.21.0-py2.7-linux-x86_64.egg
> > > mesos.interface-0.21.1-py2.7.egg
> > > mesos.native-0.21.1-py2.7-linux-x86_64.egg
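The listing above holds two versions of each Mesos egg, and the mesos meta-package (0.21.0) does not match its interface and native eggs (0.21.1); the job's stderr later shows mesos.native-0.20.1 being loaded. A small Python check for such duplicates, assuming standard egg filenames of the form <name>-<version>-py<X.Y>[-<platform>].egg, might look like this; egg_versions and duplicated are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def egg_versions(filenames: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each distribution name to the set of versions present.

    setuptools replaces '-' inside project names with '_' in egg filenames,
    so the first two dash-separated fields are the name and the version.
    """
    found: Dict[str, Set[str]] = defaultdict(set)
    for fn in filenames:
        if not fn.endswith(".egg"):
            continue  # ignore anything that is not an egg
        name, version = fn.split("-")[:2]
        found[name].add(version)
    return dict(found)


def duplicated(filenames: Iterable[str]) -> Dict[str, Set[str]]:
    """Return only the distributions that appear with more than one version."""
    return {n: v for n, v in egg_versions(filenames).items() if len(v) > 1}
```

For example, duplicated(os.listdir("/opt/apache-aurora-0.7.0-incubating/third_party/")) would report mesos, mesos.interface and mesos.native for this directory. Whether the duplicates matter depends on which versions the pants build actually resolves.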
> > >
> > > How can I overcome this?
> > >
> > > On Wed, Feb 18, 2015 at 12:18 AM, Steve Niemitz <st...@tellapart.com> wrote:
> > >
> > > > Is there a reason you set zk_in_proc=true? Setting it tells the
> > > > scheduler to ignore the "real" ZK server and use an in-proc one instead.
> > > >
> > > > -zk_in_proc=false
> > > > Launches an embedded zookeeper server for local testing causing
> > > > -zk_endpoints to be ignored if specified.
> > > > (com.twitter.common.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc)
> > > >
> > > > On Tue, Feb 17, 2015 at 4:09 PM, Xasima <xas...@gmail.com> wrote:
> > > >
> > > > > Hello. I'm running into the following problems when trying to
> > > > > perform the very first 'aurora.pex job create' command:
> > > > > 1) 'Could not connect to scheduler: No schedulers detected in
> > > > > devcluster'
> > > > > and
> > > > > 2) 'Failed to connect to Zookeeper within 10 seconds.'
> > > > >
> > > > > I have tried checking everything in the configuration, but I can't
> > > > > find the root of the problem so far. I have zookeeper, mesos-master,
> > > > > mesos-slave, and aurora-scheduler running on the same server. The one
> > > > > small difference from the default vagrant/example configuration is
> > > > > the use of a non-default http_port for the aurora scheduler.
> > > > >
> > > > > Namely, the aurora scheduler is running with the following /vars
> > > > > property:
> > > > >
> > > > > jvm_prop_sun_java_command org.apache.aurora.scheduler.app.SchedulerMain
> > > > > -thermos_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/thermos_executor.pex
> > > > > -gc_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/gc_executor.pex
> > > > > -http_port=8091 -zk_in_proc=true -zk_endpoints=localhost:2181
> > > > > -zk_session_timeout=2secs -serverset_path=/aurora/scheduler
> > > > > -mesos_master_address=zk://localhost:2181/mesos -cluster_name=devcluster
> > > > > -native_log_quorum_size=1
> > > > > -native_log_file_path=/usr/local/aurora-scheduler/db
> > > > > -native_log_zk_group_path=/local/service/mesos-native-log
> > > > > -backup_dir=/usr/local/aurora-scheduler/backups -logtostderr -vlog=INFO
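The flag combination Steve points out can be caught mechanically. The sketch below parses a scheduler command line like the one above into a dict and warns when -zk_in_proc=true is set alongside -zk_endpoints, since the embedded server makes the scheduler ignore the endpoints. parse_flags and zk_warnings are hypothetical helpers, and treating a bare flag such as -logtostderr as "true" is an assumption about the flag syntax.

```python
import shlex
from typing import Dict, List


def parse_flags(cmdline: str) -> Dict[str, str]:
    """Parse '-name=value' and bare '-name' tokens into a dict.

    Assumption: a bare flag (e.g. -logtostderr) means "true". Tokens that do
    not start with '-' (such as the main class name) are skipped.
    """
    flags: Dict[str, str] = {}
    for tok in shlex.split(cmdline):
        if not tok.startswith("-"):
            continue
        name, sep, value = tok.lstrip("-").partition("=")
        flags[name] = value if sep else "true"
    return flags


def zk_warnings(flags: Dict[str, str]) -> List[str]:
    """Warn when an embedded ZooKeeper silently overrides -zk_endpoints."""
    warnings = []
    if flags.get("zk_in_proc") == "true" and "zk_endpoints" in flags:
        warnings.append(
            "zk_in_proc=true: the scheduler uses an embedded ZooKeeper and ignores "
            "-zk_endpoints=" + flags["zk_endpoints"]
            + ", so clients using that ZooKeeper will not find "
            + flags.get("serverset_path", "the scheduler serverset")
        )
    return warnings
```

Run against the command line above, zk_warnings(parse_flags(...)) returns one warning naming localhost:2181 and /aurora/scheduler, which matches the missing /aurora/scheduler node described below.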
> > > > >
> > > > > and here is the successful tail of the aurora-scheduler log:
> > > > >
> > > > > W0217 20:42:25.952 THREAD140
> > > > > com.twitter.common.zookeeper.ServerSetImpl.join: Joining a ServerSet
> > > > > without a shard ID is deprecated and will soon break.
> > > > > com.twitter.common.zookeeper.Group$ActiveMembership.join: Set group
> > > > > member ID to member_0000000001
> > > > >
> > > > > I0217 20:42:26.026 THREAD132
> > > > > com.twitter.common.zookeeper.ServerSetImpl$ServerSetWatcher.logChange:
> > > > > server set /aurora/scheduler change: from 0 members to 1
> > > > >         joined:
> > > > > ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002,
> > > > > port:8091),
> > > > > additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002,
> > > > > port:8091)}, status:ALIVE)
> > > > >
> > > > > I0217 20:42:26.026 THREAD132
> > > > > org.apache.aurora.scheduler.http.LeaderRedirect$SchedulerMonitor.onChange:
> > > > > Found leader scheduler at
> > > > > [ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002,
> > > > > port:8091),
> > > > > additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002,
> > > > > port:8091)}, status:ALIVE)]
> > > > >
> > > > > Not sure if this is suspicious, but in ZooKeeper I see the
> > > > > /local/service/mesos-native-log/0000000010 node and the
> > > > > /mesos/info_000000003 nodes, but there is no /aurora/scheduler node.
> > > > >
> > > > > The configuration file /etc/aurora/clusters.json points to ZK with the
> > > > > proper scheduler_zk_path. All *.pex files are built with pants against
> > > > > the appropriate built or downloaded AURORA_DIST/third_party/mesos_*.egg.
> > > > > This gist contains all the details of my configuration:
> > > > > https://gist.github.com/xasima/12de906475d70523316a
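The client locates the scheduler through the scheduler_zk_path of the named cluster in clusters.json, so that path has to agree with the scheduler's -serverset_path, and it is worth checking that the entry's name agrees with -cluster_name too. A sketch, assuming clusters.json is a JSON list of objects with "name" and "scheduler_zk_path" keys as in the Aurora example configuration; check_cluster is a hypothetical helper.

```python
import json
from typing import Dict, List


def check_cluster(clusters_json: str, scheduler_flags: Dict[str, str]) -> List[str]:
    """Compare a clusters.json document against scheduler flags.

    Assumes clusters.json is a JSON list of objects with "name" and
    "scheduler_zk_path" keys. Returns a list of human-readable problems.
    """
    cluster_name = scheduler_flags.get("cluster_name")
    entries = [c for c in json.loads(clusters_json) if c.get("name") == cluster_name]
    if not entries:
        return ["no clusters.json entry named %r" % cluster_name]
    problems = []
    path = entries[0].get("scheduler_zk_path")
    expected = scheduler_flags.get("serverset_path")
    if path != expected:
        problems.append("scheduler_zk_path %r != -serverset_path %r" % (path, expected))
    return problems
```

These values can all match and the lookup can still fail if the scheduler registered in a different ZooKeeper, which is what -zk_in_proc=true causes.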
> > > > >
> > > > > Nevertheless, even the trivial hello_world service fails to run,
> > > > > with these errors:
> > > > > WARN] Could not connect to scheduler: No schedulers detected in
> > > > > devcluster!
> > > > > WARN] Could not connect to scheduler: Failed to connect to Zookeeper
> > > > > within 10 seconds.
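To rule out the "Failed to connect to Zookeeper within 10 seconds" error being a plain connectivity problem, ZooKeeper's "ruok" four-letter command is a quick probe: a healthy server answers "imok" and closes the connection. The sketch below uses only the Python standard library; zk_is_ok is a hypothetical name, the 10-second default simply mirrors the client's error message, and newer ZooKeeper releases may require "ruok" to be enabled through the 4lw.commands.whitelist setting.

```python
import socket


def zk_is_ok(host: str = "localhost", port: int = 2181, timeout: float = 10.0) -> bool:
    """Send ZooKeeper's 'ruok' command and check that the reply is 'imok'.

    Connection errors and timeouts (both OSError subclasses) yield False
    instead of raising.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while True:
                chunk = sock.recv(16)
                if not chunk:  # server closes the connection after replying
                    break
                reply += chunk
    except OSError:
        return False
    return reply == b"imok"
```

A True result only shows that the server at localhost:2181 is up; with -zk_in_proc=true the scheduler registers in its embedded server instead, so the /aurora/scheduler node can still be missing there.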
> > > > >
> > > > > Could someone please help and examine the configuration above?
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >      ~ Xasima ~
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >      ~ Xasima ~
> > >
> >
> >
> > --
> > -=Bill
> >
>
>
>
> --
> Best regards,
>      ~ Xasima ~
>


-- 
-=Bill
