Re: Aurora.pex client can't find scheduler

Xasima Wed, 18 Feb 2015 06:57:24 -0800

Many thanks for the suggestion. I successfully ran the aurora.pex job
create command after restarting mesos-slave with the rack, host, and ip
values provided.


Nevertheless, the UI now reports the following error for this job:
'THROTTLED : Rescheduled, penalized for 60000 ms for flapping'
Here is the aurora-scheduler log for it:
https://gist.github.com/xasima/12de906475d70523316a#comment-1396134

I should mention that the thermos and executor pex files are built with the
appropriate mesos eggs (see below), which are also provided to aurora-scheduler.

ls /opt/apache-aurora-0.7.0-incubating/third_party/
mesos-0.20.1-py2.7-linux-x86_64.egg
mesos.interface-0.20.1-py2.7.egg
mesos.native-0.20.1-py2.7-linux-x86_64.egg
mesos-0.21.0-py2.7-linux-x86_64.egg
mesos.interface-0.21.1-py2.7.egg
mesos.native-0.21.1-py2.7-linux-x86_64.egg
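
To make the listing above easier to read at a glance, here is a minimal sketch that groups the egg filenames by version and reports which versions lack one of the three packages. The function names and the expectation that each version should ship all three of mesos, mesos.interface and mesos.native are assumptions made for illustration. The sketch does not claim that a mixed set is the cause of the THROTTLED error.

```python
import re
from collections import defaultdict

# Filenames copied verbatim from the third_party listing above.
EGGS = [
    "mesos-0.20.1-py2.7-linux-x86_64.egg",
    "mesos.interface-0.20.1-py2.7.egg",
    "mesos.native-0.20.1-py2.7-linux-x86_64.egg",
    "mesos-0.21.0-py2.7-linux-x86_64.egg",
    "mesos.interface-0.21.1-py2.7.egg",
    "mesos.native-0.21.1-py2.7-linux-x86_64.egg",
]

# Package name (mesos, mesos.interface, mesos.native), then x.y.z version.
EGG_RE = re.compile(r"^(?P<name>mesos(?:\.\w+)?)-(?P<version>\d+\.\d+\.\d+)-py")


def group_by_version(filenames):
    """Map each version string to the set of package names found for it."""
    groups = defaultdict(set)
    for filename in filenames:
        match = EGG_RE.match(filename)
        if match:
            groups[match.group("version")].add(match.group("name"))
    return dict(groups)


def incomplete_versions(groups,
                        expected=("mesos", "mesos.interface", "mesos.native")):
    """Return {version: missing package names} (assumed set of three)."""
    result = {}
    for version, names in groups.items():
        missing = sorted(set(expected) - names)
        if missing:
            result[version] = missing
    return result


if __name__ == "__main__":
    for version, missing in sorted(incomplete_versions(group_by_version(EGGS)).items()):
        print("%s is missing: %s" % (version, ", ".join(missing)))
```

For this listing, 0.20.1 is complete, while the mesos egg is at 0.21.0 and the mesos.interface and mesos.native eggs are at 0.21.1.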

How can I overcome this?

On Wed, Feb 18, 2015 at 12:18 AM, Steve Niemitz <st...@tellapart.com> wrote:

> Is there a reason you set zk_in_proc=true?  Setting it tells the scheduler
> to ignore the "real" ZK server and use an in-proc one instead.
>
> -zk_in_proc=false
> Launches an embedded zookeeper server for local testing causing
> -zk_endpoints to be ignored if specified.
>
> (com.twitter.common.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc)
>
> On Tue, Feb 17, 2015 at 4:09 PM, Xasima <xas...@gmail.com> wrote:
>
> > Hello. I'm running into the following problems when trying to run my very
> > first 'aurora.pex job create' command:
> > 1) 'Could not connect to scheduler: No schedulers detected in devcluster'
> > and
> > 2) 'Failed to connect to Zookeeper within 10 seconds.'
> >
> > I have tried to check everything in the configuration, but I can't find the
> > root of the problem so far. I have zookeeper, mesos-master, mesos-slave,
> > and aurora-scheduler running on the same server. The only small difference
> > from the default vagrant/example configuration is the use of a non-default
> > http_port for the aurora scheduler.
> >
> > Namely, the aurora scheduler is running with the following /vars property:
> >
> > *jvm_prop_sun_java_command* org.apache.aurora.scheduler.app.SchedulerMain
> > -thermos_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/thermos_executor.pex
> > -gc_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/gc_executor.pex
> > -http_port=8091 -zk_in_proc=true -zk_endpoints=localhost:2181
> > -zk_session_timeout=2secs -serverset_path=/aurora/scheduler
> > -mesos_master_address=zk://localhost:2181/mesos -cluster_name=devcluster
> > -native_log_quorum_size=1
> > -native_log_file_path=/usr/local/aurora-scheduler/db
> > -native_log_zk_group_path=/local/service/mesos-native-log
> > -backup_dir=/usr/local/aurora-scheduler/backups -logtostderr -vlog=INFO
> >
> > and here is the tail of the aurora-scheduler log, which shows a successful start:
> >
> > W0217 20:42:25.952 THREAD140
> > com.twitter.common.zookeeper.ServerSetImpl.join: Joining a ServerSet
> > without a shard ID is deprecated and will soon break.
> >  com.twitter.common.zookeeper.Group$ActiveMembership.join: Set group
> > member ID to member_0000000001
> >
> > I0217 20:42:26.026 THREAD132
> > com.twitter.common.zookeeper.ServerSetImpl$ServerSetWatcher.logChange:
> > server set /aurora/scheduler change: from 0 members to 1
> >         joined:
> > ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002,
> > port:8091), additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002,
> > port:8091)}, status:ALIVE)
> >
> > I0217 20:42:26.026 THREAD132
> > org.apache.aurora.scheduler.http.LeaderRedirect$SchedulerMonitor.onChange:
> > Found leader scheduler at
> > [ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002,
> > port:8091), additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002,
> > port:8091)}, status:ALIVE)]
> >
> > I'm not sure whether this is suspicious, but in zookeeper I see the
> > /local/service/mesos-native-log/0000000010 and /mesos/info_000000003
> > nodes, but there is no /aurora/scheduler node.
> >
> > The configuration file /etc/aurora/clusters.json points to zk with the
> > proper scheduler_zk_path. All *.pex files are built with pants against the
> > appropriate built or downloaded AURORA_DIST/third_party/mesos_*.egg. This
> > gist contains all the details of my configuration:
> > https://gist.github.com/xasima/12de906475d70523316a
> >
> > Nevertheless, the very trivial hello_world service fails to run, with
> > these errors:
> > WARN] Could not connect to scheduler: No schedulers detected in
> > devcluster!
> > WARN] Could not connect to scheduler: Failed to connect to Zookeeper
> > within 10 seconds.
> >
> > Could someone please help and examine the configuration above?
> >
> > --
> > Best regards,
> >      ~ Xasima ~
> >
>
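
The flag conflict discussed in the quoted exchange (zk_in_proc=true together with zk_endpoints) could be caught with a small check like the one below. This is only a sketch: the helper names are hypothetical, the command line is a shortened copy of the flags quoted above, and the default of false for zk_in_proc is taken from the quoted flag help text.

```python
import shlex

# A subset of the scheduler flags quoted above (hypothetical shortened copy).
COMMAND_LINE = (
    "-http_port=8091 -zk_in_proc=true -zk_endpoints=localhost:2181 "
    "-zk_session_timeout=2secs -serverset_path=/aurora/scheduler"
)


def parse_flags(command_line):
    """Parse '-name=value' tokens into a dict; other tokens are skipped."""
    flags = {}
    for token in shlex.split(command_line):
        if token.startswith("-") and "=" in token:
            name, value = token[1:].split("=", 1)
            flags[name] = value
    return flags


def zk_conflict(flags):
    """True when an in-process ZK is requested while zk_endpoints is also set.

    Per the quoted help text, zk_in_proc causes zk_endpoints to be ignored.
    """
    in_proc = flags.get("zk_in_proc", "false").lower() == "true"
    return in_proc and "zk_endpoints" in flags


if __name__ == "__main__":
    if zk_conflict(parse_flags(COMMAND_LINE)):
        print("zk_in_proc=true: zk_endpoints will be ignored")
```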



-- 
Best regards,
     ~ Xasima ~
