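Is there a reason you set -zk_in_proc=true? Setting it tells the scheduler to ignore the "real" ZK server given in -zk_endpoints and use an in-proc one instead. That would explain why the scheduler log shows it joining /aurora/scheduler while no such node appears in your ZooKeeper, and why the client cannot find a scheduler. The flag's help text: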
-zk_in_proc=false
    Launches an embedded zookeeper server for local testing causing -zk_endpoints to be ignored if specified.
    (com.twitter.common.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc)

On Tue, Feb 17, 2015 at 4:09 PM, Xasima <xas...@gmail.com> wrote:

> Hello. I'm bumping into the following problems when trying to perform the very
> first 'aurora.pex job create' command.
> 1) 'Could not connect to scheduler: No schedulers detected in devcluster'
> and
> 2) 'Failed to connect to Zookeeper within 10 seconds.'
>
> I have tried to check everything in the configuration, but I can't find the
> root of the problem so far. I have zookeeper, mesos-master, mesos-slave,
> and aurora-scheduler running on the same server. The one small difference from
> the default vagrant/example configuration is the use of a non-default
> http_port for the aurora scheduler.
>
> Namely, I have the aurora scheduler running with the following /vars prop
>
> *jvm_prop_sun_java_command* org.apache.aurora.scheduler.app.SchedulerMain
> -thermos_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/thermos_executor.pex
> -gc_executor_path=/opt/apache-aurora-0.7.0-incubating/dist/gc_executor.pex
> -http_port=8091 -zk_in_proc=true -zk_endpoints=localhost:2181
> -zk_session_timeout=2secs -serverset_path=/aurora/scheduler
> -mesos_master_address=zk://localhost:2181/mesos -cluster_name=devcluster
> -native_log_quorum_size=1
> -native_log_file_path=/usr/local/aurora-scheduler/db
> -native_log_zk_group_path=/local/service/mesos-native-log
> -backup_dir=/usr/local/aurora-scheduler/backups -logtostderr -vlog=INFO
>
> and here is the successful tail of the aurora-scheduler log
>
> W0217 20:42:25.952 THREAD140
> com.twitter.common.zookeeper.ServerSetImpl.join: Joining a ServerSet
> without a shard ID is deprecated and will soon break.
>
> com.twitter.common.zookeeper.Group$ActiveMembership.join: Set group member
> ID to member_0000000001
>
> I0217 20:42:26.026 THREAD132
> com.twitter.common.zookeeper.ServerSetImpl$ServerSetWatcher.logChange:
> server set /aurora/scheduler change: from 0 members to 1
> joined:
> ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002,
> port:8091), additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002,
> port:8091)}, status:ALIVE)
>
> I0217 20:42:26.026 THREAD132
> org.apache.aurora.scheduler.http.LeaderRedirect$SchedulerMonitor.onChange:
> Found leader scheduler at
> [ServiceInstance(serviceEndpoint:Endpoint(host:bymsq-bsu-hmetrics002,
> port:8091), additionalEndpoints:{http=Endpoint(host:bymsq-bsu-hmetrics002,
> port:8091)}, status:ALIVE)]
>
> Not sure if this is suspicious, but in zookeeper I see the
> /local/service/mesos-native-log/0000000010 node and the /mesos/info_000000003
> nodes, but there is no /aurora/scheduler node.
>
> The configuration file /etc/aurora/clusters.json points to zk with the proper
> scheduler_zk_path. All *.pex files are built with pants against appropriate
> build or downloaded AURORA_DIST/third_party/mesos_*.egg. This gist
> contains all the details of my configuration:
> https://gist.github.com/xasima/12de906475d70523316a
>
> Nevertheless, the very trivial hello_world service fails to run with
> these errors:
> WARN] Could not connect to scheduler: No schedulers detected in
> devcluster!
> WARN] Could not connect to scheduler: Failed to connect to Zookeeper within
> 10 seconds.
>
> Could someone please help and examine the configuration above?
>
> --
> Best regards,
> ~ Xasima ~
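
If it helps, the flag conflict discussed at the top of this reply can be spotted mechanically before starting the scheduler. Below is a minimal sketch in Python, not part of Aurora: the helper names `parse_flags` and `zk_flag_warnings` are made up for illustration, and it assumes flags are written as `-name=value`, with a bare `-name` treated as a boolean true.

```python
import shlex


def parse_flags(command_line):
    """Collect '-name=value' tokens into a dict (hypothetical helper).

    Assumption: a bare '-name' token is a boolean flag set to 'true'.
    Tokens that do not start with '-' (e.g. the main class) are skipped.
    """
    flags = {}
    for token in shlex.split(command_line):
        if not token.startswith("-"):
            continue
        name, sep, value = token.lstrip("-").partition("=")
        flags[name] = value if sep else "true"
    return flags


def zk_flag_warnings(command_line):
    """Return warnings about ZooKeeper flags that contradict each other."""
    flags = parse_flags(command_line)
    warnings = []
    in_proc = flags.get("zk_in_proc", "false").lower() == "true"
    if in_proc and "zk_endpoints" in flags:
        warnings.append(
            "-zk_in_proc=true launches an embedded ZooKeeper, so "
            "-zk_endpoints=%s is ignored" % flags["zk_endpoints"]
        )
    return warnings


if __name__ == "__main__":
    cmd = (
        "org.apache.aurora.scheduler.app.SchedulerMain -http_port=8091 "
        "-zk_in_proc=true -zk_endpoints=localhost:2181 "
        "-serverset_path=/aurora/scheduler -logtostderr"
    )
    for warning in zk_flag_warnings(cmd):
        print("WARNING:", warning)
```

Run against the command line quoted above, this reports that -zk_endpoints=localhost:2181 is ignored. With -zk_in_proc=false, or with the flag left out, it reports nothing.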