To echo what Bill said, a pro tip:
Pystachio objects have .json_dump and .json_dumps methods that serialize to
file objects and strings respectively.  Analogously, they have classmethods
.json_load and .json_loads that deserialize this data (these behave
exactly like the Python json module methods).  So if you have an .aurora
config, you can serialize it to JSON and roll your own client the way
Bill mentions.  Similarly, for those who want to generate configs
programmatically: use Job.json_load in a boilerplate .aurora config and use
the Aurora client (at an invocation-cost penalty).  That said, a lot of
this effort will be obviated by a REST API.
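
For illustration, a minimal sketch of the round-trip pattern using the
stdlib json module (since the Pystachio methods mirror its
dump/dumps/load/loads behavior); the dict here is a made-up stand-in for a
real Job's JSON form, not what an actual .aurora config produces:

```python
import io
import json

# Hypothetical stand-in for a serialized Job; a real .aurora config
# would yield a much richer structure.
job = {"name": "hello_world", "role": "www-data", "instances": 1}

# Serialize: dumps -> string, dump -> file object, mirroring
# Job.json_dumps / Job.json_dump.
blob = json.dumps(job)
buf = io.StringIO()
json.dump(job, buf)

# Deserialize: loads / load mirror the Job.json_loads / Job.json_load
# classmethods and reconstruct the original data.
assert json.loads(blob) == job
buf.seek(0)
assert json.load(buf) == job
```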

On Wed, Feb 11, 2015 at 12:29 PM, Bill Farner <wfar...@apache.org> wrote:

> To reduce that time you will indeed want to talk directly to the
> scheduler.  This will definitely require you to roll up your sleeves a bit
> and set up a thrift client to our api (based on api.thrift [1]), since you
> will need to specify your tasks in a format that the thermos executor can
> understand.  Turns out this is JSON data, so it should not be *too*
> prohibitive.
>
> However, there is another technical limitation you will hit for the
> submission rate you are after.  The scheduler is backed by a durable store
> whose write latency is at minimum the amount of time required to fsync.
>
> [1]
>
> https://github.com/apache/incubator-aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift
>
> -=Bill
>
> On Wed, Feb 11, 2015 at 11:46 AM, Hussein Elgridly <
> huss...@broadinstitute.org> wrote:
>
> > Hi folks,
> >
> > I'm looking at a use case that involves submitting potentially hundreds
> > of jobs a second to our Mesos cluster. My tests show that the aurora
> > client is taking 1-2 seconds for each job submission, and that I can run
> > about four client processes in parallel before they peg the CPU at 100%.
> > I need more throughput than this!
> >
> > Squashing jobs down to the Process or Task level doesn't really make
> > sense for our use case. I'm aware that with some shenanigans I can batch
> > jobs together using job instances, but that's a lot of work on my current
> > timeframe (and of questionable utility given that the jobs certainly
> > won't have identical resource requirements).
> >
> > What I really need is (at least) an order of magnitude speedup in terms
> > of being able to submit jobs to the Aurora scheduler (via the client or
> > otherwise).
> >
> > Conceptually it doesn't seem like adding a job to a queue should be a
> > thing that takes a couple of seconds, so I'm baffled as to why it's
> > taking so long. As an experiment, I wrapped the call to client.execute()
> > in client.py:proxy_main in cProfile and called aurora job create with a
> > very simple test job.
> >
> > Results of the profile are in the Gist below:
> >
> > https://gist.github.com/helgridly/b37a0d27f04a37e72bb5
> >
> > Out of a 0.977s profile time, the two things that stick out to me are:
> >
> > 1. 0.526s spent in Pystachio for a job that doesn't use any templates
> > 2. 0.564s spent in create_job, presumably talking to the scheduler (and
> > setting up the machinery for doing so)
> >
> > I imagine I can sidestep #1 with a check for "{{" in the job file and
> > bypass Pystachio entirely. Can I also skip the Aurora client entirely
> > and talk directly to the scheduler? If so, what does that entail, and
> > are there any risks associated?
> >
> > Thanks,
> > -Hussein
> >
> > Hussein Elgridly
> > Senior Software Engineer, DSDE
> > The Broad Institute of MIT and Harvard
> >
>
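
As an aside, the cProfile experiment Hussein describes is easy to reproduce
around any callable using only the stdlib; a generic sketch, where
submit_job is a dummy stand-in for the client.execute() call being profiled:

```python
import cProfile
import io
import pstats

def submit_job():
    # Dummy stand-in for the aurora client's client.execute() call.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
submit_job()
profiler.disable()

# Print the top hotspots sorted by cumulative time, as in the Gist.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```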
