Re: Aurora stuck in LEADER_AWAITING_REGISTRATION

Bill Farner Wed, 15 Oct 2014 15:51:09 -0700

Glad you got that sorted out!  This is where the abstraction between the
mesos library and aurora begins to show.  Unfortunately the scheduler has no
way to know about this due to how the replicated log (provided by mesos)
works.  I've filed MESOS-1933 [1] to improve this message on the mesos side.
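
Until the message improves, one way to spot this state from the outside is
to watch the scheduler log for the recover request/response pair quoted
below. This is only a rough sketch, not part of Aurora or Mesos: the function
names and the threshold are hypothetical, and the message text is copied
from the log excerpt in this thread, so it may differ in other Mesos
versions.

```python
import re

# Message text taken from the log excerpt quoted in this thread.
# A real log file keeps each message on a single line.
RECOVER_REQUEST = re.compile(
    r"Replica in EMPTY status received a broadcasted recover request")
RECOVER_RESPONSE = re.compile(
    r"Received a recover response from a replica in EMPTY status")


def count_empty_recover_messages(lines):
    """Return (requests, responses) seen in an iterable of log lines."""
    requests = responses = 0
    for line in lines:
        if RECOVER_REQUEST.search(line):
            requests += 1
        elif RECOVER_RESPONSE.search(line):
            responses += 1
    return requests, responses


def looks_stuck_on_empty_replica(lines, threshold=3):
    """Guess whether the replicated log is looping in EMPTY status.

    threshold is an arbitrary, hypothetical cutoff, not a Mesos setting.
    """
    requests, responses = count_empty_recover_messages(lines)
    return requests >= threshold and responses >= threshold
```

Passing an open file object works, since the functions only iterate over
lines; for example `looks_stuck_on_empty_replica(open("scheduler.log"))`,
where the file name is a placeholder for wherever the scheduler's stderr
was captured.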


[1] https://issues.apache.org/jira/browse/MESOS-1933

-=Bill

On Wed, Oct 15, 2014 at 2:47 PM, Dobromir Montauk <dobro...@tellapart.com>
wrote:

> I found instructions that fixed the current issue here:
>
> http://wilderness.apache.org/channels/?f=aurora/2014-05-27
>
> Would be nice to have a more friendly error message :)
>
> On Wed, Oct 15, 2014 at 2:22 PM, Dobromir Montauk <dobro...@tellapart.com>
> wrote:
>
> > Hi,
> >
> > I've brought up Aurora on my Mesos master node with the following
> > command:
> >
> > ubuntu@ec2-54-82-17-37:~/$
> >
> > GLOG_v=2
> > LIBPROCESS_PORT=5050
> > LIBPROCESS_IP=127.0.0.1
> > AURORA_HOME=/usr/local/aurora-scheduler
> > DIST_DIR=/home/ubuntu/aurora-scheduler/dist
> > AURORA_HOME=/usr/local/aurora-scheduler
> >
> > sudo /usr/local/aurora-scheduler/bin/aurora-scheduler \
> >   -cluster_name=tellapart \
> >   -http_port=8081 \
> >   -native_log_quorum_size=1 \
> >   -zk_endpoints=localhost:2181 \
> >   -mesos_master_address=54.166.50.69:5050,54.160.61.169:5050,localhost:5050 \
> >   -serverset_path=/aurora/scheduler \
> >   -native_log_zk_group_path=/aurora/replicated-log \
> >   -native_log_file_path=$AURORA_HOME/scheduler/db \
> >   -backup_dir=$AURORA_HOME/scheduler/backups \
> >   -thermos_executor_path=/dev/null \
> >   -gc_executor_path=$DIST_DIR/gc_executor.pex \
> >   -enable_beta_updater=true \
> >   -vlog=INFO \
> >   -logtostderr
> >
> > Attached is the entire log, but basically I'm seeing this:
> >
> > I1015 21:18:05.315263 27634 group.cpp:313] Group process (group(1)@
> > 10.88.26.227:40393) connected to ZooKeeper
> > I1015 21:18:05.315322 27634 group.cpp:787] Syncing group operations: queue
> > size (joins, cancels, datas) = (0, 0, 0)
> > I1015 21:18:05.315348 27634 group.cpp:385] Trying to create path
> > '/aurora/replicated-log' in ZooKeeper
> > I1015 21:18:05.316 THREAD1
> > com.twitter.common.zookeeper.CandidateImpl$4.onGroupChange: Candidate
> > /aurora/scheduler/singleton_candidate_0000000008 is now leader of group:
> > [singleton_candidate_0000000008]
> > I1015 21:18:05.317 THREAD1
> > com.twitter.common.util.StateMachine$Builder$1.execute: SchedulerLifecycle
> > state machine transition STORAGE_PREPARED -> LEADER_AWAITING_REGISTRATION
> > I1015 21:18:05.317 THREAD1
> > org.apache.aurora.scheduler.SchedulerLifecycle$6.execute: Elected as
> > leading scheduler!
> > I1015 21:18:05.330394 27634 network.hpp:423] ZooKeeper group memberships
> > changed
> > I1015 21:18:05.330660 27639 group.cpp:658] Trying to get
> > '/aurora/replicated-log/0000000008' in ZooKeeper
> > I1015 21:18:05.331550 27635 network.hpp:461] ZooKeeper group PIDs: {
> > log-replica(1)@10.88.26.227:40393 }
> > I1015 21:18:06.027016 27634 replica.cpp:638] Replica in EMPTY status
> > received a broadcasted recover request
> > I1015 21:18:06.027216 27634 recover.cpp:188] Received a recover response
> > from a replica in EMPTY status
> > <repeat last 2 messages ad nauseam>
> >
> > How can I debug what's going on?
> >
> > Thanks,
> > Dobromir
> >
>
