Question about sprint in Aurora development

2014-10-16 Henry Saputra
Hi guys, I've started noticing that the Aurora JIRA has sprints tagged on issues. Kevin has also sent an email about a documentation sprint happening. However, I don't recall any explicit discussion about adopting a sprint-like mode to drive development for Aurora. Seemed like internal Twitter sprint is

Re: Docker on Aurora

2014-10-16 Jay Buffington
I'm hoping that AURORA-633 will be up for code review in November. I believe the docker containerizer is the best way to use docker with mesos (compared to other options like a docker executor or using the external containerizer). Right now you can write an Aurora job that has processes which exe

Re: Storage is not READY

2014-10-16 Oliver, James
Huzzah! Thanks Jie!!

Re: Proposal: External Update Coordination

2014-10-16 Bill Farner
+1 -=Bill

Re: Storage is not READY

2014-10-16 Bill Farner
You can all rejoice, Jie stepped up and posted a patch :-) https://reviews.apache.org/r/26815/ -=Bill

Re: Proposal: External Update Coordination

2014-10-16 Maxim Khutornenko
Correct. The presence of the SessionKey does indeed mean that heartbeats are going to be authenticated. Given that the external service has to solve the authentication story to use pauseJobUpdate anyway, having heartbeats authenticated seems like a natural progression. Also, given our current admin thrift

Re: Docker on Aurora

2014-10-16 Bill Farner
Some folks have trailblazed into this territory with success, but AFAIK it's not for the faint of heart. We have no timeline on first-class docker support, but would love guidance and contributions! The docker support in mesos is still very new, and there are some challenges to overcome, specific

Re: Proposal: External Update Coordination

2014-10-16 Kevin Sweeney
I inferred that authentication was required due to the presence of a SessionKey in the RPC. Of course, any authentication mechanism here could have serious scaling issues (barring something like HTTP basic auth in memory).

Re: Proposal: External Update Coordination

2014-10-16 Joshua Cohen
What are our thoughts about authentication with regard to heartbeats? It seems like they should be authenticated, since there does exist the potential for a malicious actor to send its own heartbeats even if the real monitoring service has detected a problem and ceased sending heartbeats. I'm not s
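The threat Joshua describes can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a missed heartbeat pauses the update, and that the monitoring service and the scheduler share a secret used to HMAC-sign each heartbeat together with a sequence number. `HeartbeatMonitor`, `sign_heartbeat`, and the HMAC scheme are illustrative only; the proposal itself relies on the Thrift SessionKey, whose mechanism is not modeled here. Forged heartbeats (wrong key) and replayed ones (stale sequence number) are both rejected, so neither can keep an update alive after the real service goes quiet.

```python
import hashlib
import hmac
import time


def sign_heartbeat(update_id: str, secret: bytes, seq: int) -> str:
    """Hypothetical: tag a heartbeat with an HMAC over the update ID and sequence number."""
    msg = f"{update_id}:{seq}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


class HeartbeatMonitor:
    """Hypothetical scheduler-side tracker for one externally coordinated job update."""

    def __init__(self, update_id: str, secret: bytes, timeout_secs: float,
                 clock=time.monotonic):
        self.update_id = update_id
        self._secret = secret
        self._timeout = timeout_secs
        self._clock = clock
        self._last_beat = clock()  # the update starts with a fresh grace period
        self._last_seq = -1

    def heartbeat(self, seq: int, tag: str) -> bool:
        """Record a heartbeat; return False if it is rejected."""
        if seq <= self._last_seq:
            return False  # replayed or out-of-order heartbeat
        expected = sign_heartbeat(self.update_id, self._secret, seq)
        if not hmac.compare_digest(tag, expected):
            return False  # sender does not hold the shared secret
        self._last_seq = seq
        self._last_beat = self._clock()
        return True

    def should_pause(self) -> bool:
        """True once no valid heartbeat has arrived within the timeout."""
        return self._clock() - self._last_beat > self._timeout
```

Injecting `clock` lets the timeout be exercised without sleeping. The per-heartbeat check is an in-memory hash comparison, in the spirit of Kevin's remark about keeping authentication cheap, though how it would behave at scale is not established here.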

Re: executor build issue

2014-10-16 Kevin Sweeney
Aurora master is currently pinned to Mesos 0.20.0 (a patch to upgrade to 0.20.1 should be fairly trivial; see https://reviews.apache.org/r/24616/ for prior art).

Re: Storage is not READY

2014-10-16 Oliver, James
+1 This bit me a while back.

Re: executor build issue

2014-10-16 Joe Stein
Hmm, I did that but I'm getting the same error... should I do it for the 0.20.0 egg? I downloaded and easy_installed the 0.20.1 egg (because that is the Mesos version I am running). Or can I change something in Aurora so it uses the 0.20.1 egg I just installed?

Re: executor build issue

2014-10-16 Bill Farner
The relevant bit from our vagrant provisioning script is here: https://github.com/apache/incubator-aurora/blob/599a7dcbe11a49f15d082882781e812a092e959d/examples/vagrant/provision-dev-cluster.sh#L37-L42 Unfortunately, the mesos RPM/debs don't include python libraries, so we need to install those o

Re: Storage is not READY

2014-10-16 Jay Buffington
We should fix this bug in Mesos: https://issues.apache.org/jira/browse/MESOS-1703 If the error message hadn't been terrible, Joe could have easily fixed it and moved on.

Re: Storage is not READY

2014-10-16 Joe Stein
mesos-log initialize --path="$AURORA_HOME/scheduler/db" worked :) thanks /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop ***

Re: executor build issue

2014-10-16 Zameer Manji
You need to grab and use the eggs listed here: https://mesosphere.com/downloads/

Re: Storage is not READY

2014-10-16 Bill Farner
Piggy-backing on Zameer's reply: https://github.com/apache/incubator-aurora/blob/master/docs/deploying-aurora-scheduler.md#initializing-the-replicated-log Relevant ticket for today's doc day: https://issues.apache.org/jira/browse/AURORA-840 -=Bill

Re: Storage is not READY

2014-10-16 Joshua Cohen
Hi Joe, it sounds like you haven't initialized the Mesos replicated log. See the documentation here: https://github.com/apache/incubator-aurora/blob/master/docs/deploying-aurora-scheduler.md#initializing-the-replicated-log Cheers, Joshua

Re: Storage is not READY

2014-10-16 Zameer Manji
Did you initialize the replicated log?

executor build issue

2014-10-16 Joe Stein
I am getting the following error trying to build the executor. I have Mesos 0.20.1 (the output below suggests it is looking for 0.20.0; maybe that is the problem, but I'm not sure, and if so, how do I fix it?) /opt/apache/incubator-aurora$aurorabuild executor Build operating on top level addresses: set([BuildFileAddress(/mnt/dat

Re: Storage is not READY

2014-10-16 Joe Stein
My build is from latest master. I only have one scheduler running. I also see this in the log over and over (and over and over) again. Not sure it is related. I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1016 16:12:27.234256 26084 re

Re: Storage is not READY

2014-10-16 Jay Buffington
This was fixed about three months ago: https://issues.apache.org/jira/browse/AURORA-584 Perhaps you're running an older version that doesn't have that commit? The tl;dr of that JIRA is that when you go to a scheduler that isn't the leader, it should do an HTTP redirect (302?) to the leader. Jay
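The behavior Jay summarizes from AURORA-584 (a non-leading scheduler redirects HTTP requests to the current leader) can be sketched with Python's standard `http.server`. This is not the scheduler's actual code. `leader_redirect`, `make_handler`, and the two leader-discovery callables are hypothetical stand-ins for however a scheduler learns who the leader is. The 302 status follows Jay's own tentative guess, and the 503 branch for an unknown leader is an added assumption.

```python
from http.server import BaseHTTPRequestHandler


def leader_redirect(is_leader, leader_base_url, path):
    """Decide how this scheduler instance answers a request for `path`.

    Returns (status, headers). `leader_base_url` is None when no leader is known.
    """
    if is_leader:
        return 200, {}
    if leader_base_url is None:
        return 503, {}  # assumption: no leader elected yet, nothing to redirect to
    return 302, {"Location": leader_base_url.rstrip("/") + path}


def make_handler(is_leader, leader_base_url):
    """Build a handler class; both arguments are hypothetical zero-argument callables."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, headers = leader_redirect(is_leader(), leader_base_url(), self.path)
            self.send_response(status)
            for name, value in headers.items():
                self.send_header(name, value)
            self.end_headers()
            if status == 200:
                self.wfile.write(b"served by the leading scheduler\n")

    return Handler
```

To try it, pass the handler to `http.server.HTTPServer(("", 8081), make_handler(lambda: False, lambda: "http://leader-host:8081"))` and call `serve_forever()`. A browser requesting `/scheduler` on that instance would be sent to the leader. The host and port are placeholders. Keeping the decision in a pure function separates the redirect rule from the HTTP plumbing, which makes it easy to check on its own.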

Storage is not READY

2014-10-16 Joe Stein
Hi, I am getting an error when going to /scheduler in the UI and I'm not sure what it is. Thanks in advance!!! W1016 15:13:14.419 THREAD133 org.apache.aurora.scheduler.thrift.aop.LoggingInterceptor.invoke: Uncaught transient exception while handling getRoleSummary() org.apache.aurora.scheduler.storag