aurora at Oscar Health

2014-07-23 Thread Isaac Councill
Hi, This is mostly for introduction and to say that we're using aurora in production at Oscar. So far so good, and thanks for open sourcing! I'm currently running 0.5.0 rc0 on mesos 0.18.0, about to upgrade to rc1 hoping that will fix cron. I don't have anything to give back yet, unless anyone wa

Re: aurora at Oscar Health

2014-07-23 Thread Isaac Councill
Here's the spec: https://github.com/isaac-councill/aurora-rpm-spec For expedience, I didn't try to generalize it and the spec depends on a custom python27 rpm, but that's easy to change. Not sure if the init scripts I'm using would be helpful as well - I'd be happy to pu

(AURORA-596) SQL constraint violation in DbAttributeStore

2014-08-05 Thread Isaac Councill
Hi, I noticed this issue after being hit with the "Unique index or primary key violation" error on one slave machine. Is there a recommended workaround to reset the db state safely? Currently aurora will not schedule jobs on that slave. Thanks, Isaac

Re: (AURORA-596) SQL constraint violation in DbAttributeStore

2014-08-05 Thread Isaac Councill
contain commit 2b78aff? That was the fix for AURORA-596, > which it sounds like what you are referring to. A build with that commit > should course-correct your state. > > -=Bill > > > On Tue, Aug 5, 2014 at 4:08 PM, Isaac Councill wrote: > > > Hi, > > >

one lock to lock them all

2014-09-10 Thread Isaac Councill
Hi, I just experienced an issue where a job update was failing and leaving a lock in place. Expected. What I didn't expect was that while that lock was in place, it was not possible to interact with any job (old or new) due to the lock. All the jobs I tested were in the same role and prod environm

Re: one lock to lock them all

2014-09-10 Thread Isaac Councill
> https://issues.apache.org/jira/browse/AURORA-640 > > > > I'm not sure though if the fix made it in in time for that release. > > > > > > - David > > > > > > On Wed, Sep 10, 2014 at 11:03 AM, Isaac Councill > > wrote: > > >

monitoring aurora scheduler

2014-09-30 Thread Isaac Councill
I've been having a bad time with the great AWS Xen reboot, and thought it would be a good time to revamp monitoring among other things. Do you have any recommendations for monitoring scheduler health? I've got my own ideas, but am more interested in learning about twitter prod monitoring. For co

Re: monitoring aurora scheduler

2014-10-01 Thread Isaac Councill
r errors responding to RPCs and > web UI loading. > > I'd love to know more about the specific issue you encountered. Do the > scheduler logs indicate anything unusual during the period of downtime? > > > -=Bill > > On Tue, Sep 30, 2014 at 1:59 PM, Isaac Councill wrote:

Re: monitoring aurora scheduler

2014-10-01 Thread Isaac Councill
Much appreciated. On Wed, Oct 1, 2014 at 2:11 PM, Bill Farner wrote: > Ok, when you have bandwidth to upgrade again feel free to let us know if > you would like somebody standing by in IRC to assist. > > -=Bill > > On Wed, Oct 1, 2014 at 11:04 AM, Isaac Councill wrote: >

api vs apibeta

2014-12-28 Thread Isaac Councill
tl;dr; apibeta seems way faster (and arguably better) than thrift api. What are the long term objectives for apibeta? Hi, I've been working on some aurora integrations, primarily a blackbox monitoring tool at present, and was looking for the best way to communicate with the scheduler. For a lar

Re: api vs apibeta

2014-12-29 Thread Isaac Councill
ift and also can you share your > source for the test clients > > -Jake > > On Mon, Dec 29, 2014 at 1:27 AM, Isaac Councill wrote: > > > tl;dr; > > apibeta seems way faster (and arguably better) than thrift api. What are > > the long term objectives for apibeta?

Re: api vs apibeta

2014-12-29 Thread Isaac Councill
bench dist/aurora2 -url="http://:8081/apibeta" On Mon, Dec 29, 2014 at 3:36 PM, Isaac Councill wrote: > Here's source from go thrift (warning: very ugly). I had to make a few > modifications to the ttypes and client libraries to get it working. It > requires git.apache.