Just to make sure I get this correctly: you say you cannot use the existing Python client because it is Python 2.7 only, so you want to write a new one in Python 3?
Regards,
Stephan

________________________________________
From: Hussein Elgridly <huss...@broadinstitute.org>
Sent: Monday, March 16, 2015 11:44 PM
To: dev@aurora.incubator.apache.org
Subject: Re: Speeding up Aurora client job creation

So this has now bubbled back to the top of my TODO list and I'm actively
working on it. I am entirely new to Thrift, so please forgive the newbie
questions...

I would like to talk to the Aurora scheduler directly from my (Python)
application using Thrift. Since I'm on Python 3.4, I've had to use thriftpy:
https://github.com/eleme/thriftpy

As far as I can tell, the following should work (by default, thriftpy uses a
TBufferedTransport around a TSocket):

---
import thriftpy
import thriftpy.rpc

aurora_api = thriftpy.load("api.thrift")
client = thriftpy.rpc.make_client(
    aurora_api.AuroraSchedulerManager,
    host="localhost",
    port=8081,
    proto_factory=thriftpy.protocol.TJSONProtocolFactory()
)
print(client.getJobSummary())
---

Obviously I wouldn't be writing this email if it did work :) It hangs. I
jumped into pdb and found it was sending the following payload:

b'\x00\x00\x00\\{"metadata": {"name": "getJobSummary", "seqid": 0, "ttype": 1, "version": 1}, "payload": {}}'

to a socket that looked like this:

<socket.socket fd=3, family=AddressFamily.AF_INET, type=2049, proto=0, laddr=('<localhost's_private_ip>', 49167), raddr=('localhost's_private_ip', 8081)>

...but it was waiting forever to receive any data. Adding a timeout just
triggered the timeout. I'm stumped. Any clues?

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard

On 12 February 2015 at 04:15, Erb, Stephan <stephan....@blue-yonder.com> wrote:

> Hi Hussein,
>
> We also had slight performance problems when talking to Aurora. We ended
> up using the existing Python client directly in our code (see
> apache.aurora.client.api.__init__.py). This allowed us to reuse the api
> object and its scheduler connection, dropping a connection latency of
> about 0.3-0.4 seconds per request.
>
> Best Regards,
> Stephan
> ________________________________________
> From: Bill Farner <wfar...@apache.org>
> Sent: Wednesday, February 11, 2015 9:29 PM
> To: dev@aurora.incubator.apache.org
> Subject: Re: Speeding up Aurora client job creation
>
> To reduce that time you will indeed want to talk directly to the
> scheduler. This will definitely require you to roll up your sleeves a bit
> and set up a thrift client to our api (based on api.thrift [1]), since you
> will need to specify your tasks in a format that the thermos executor can
> understand. Turns out this is JSON data, so it should not be *too*
> prohibitive.
>
> However, there is another technical limitation you will hit for the
> submission rate you are after. The scheduler is backed by a durable store
> whose write latency is at minimum the amount of time required to fsync.
>
> [1]
> https://github.com/apache/incubator-aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift
>
> -=Bill
>
> On Wed, Feb 11, 2015 at 11:46 AM, Hussein Elgridly <
> huss...@broadinstitute.org> wrote:
>
> > Hi folks,
> >
> > I'm looking at a use case that involves submitting potentially hundreds
> > of jobs a second to our Mesos cluster. My tests show that the Aurora
> > client is taking 1-2 seconds for each job submission, and that I can run
> > about four client processes in parallel before they peg the CPU at 100%.
> > I need more throughput than this!
> >
> > Squashing jobs down to the Process or Task level doesn't really make
> > sense for our use case. I'm aware that with some shenanigans I can batch
> > jobs together using job instances, but that's a lot of work on my
> > current timeframe (and of questionable utility given that the jobs
> > certainly won't have identical resource requirements).
> >
> > What I really need is (at least) an order of magnitude speedup in terms
> > of being able to submit jobs to the Aurora scheduler (via the client or
> > otherwise).
> >
> > Conceptually it doesn't seem like adding a job to a queue should be a
> > thing that takes a couple of seconds, so I'm baffled as to why it's
> > taking so long. As an experiment, I wrapped the call to client.execute()
> > in client.py:proxy_main in cProfile and called aurora job create with a
> > very simple test job.
> >
> > Results of the profile are in the Gist below:
> >
> > https://gist.github.com/helgridly/b37a0d27f04a37e72bb5
> >
> > Out of a 0.977s profile time, the two things that stick out to me are:
> >
> > 1. 0.526s spent in Pystachio for a job that doesn't use any templates
> > 2. 0.564s spent in create_job, presumably talking to the scheduler (and
> >    setting up the machinery for doing so)
> >
> > I imagine I can sidestep #1 with a check for "{{" in the job file and
> > bypass Pystachio entirely. Can I also skip the Aurora client entirely
> > and talk directly to the scheduler? If so, what does that entail, and
> > are there any risks associated?
> >
> > Thanks,
> > -Hussein
> >
> > Hussein Elgridly
> > Senior Software Engineer, DSDE
> > The Broad Institute of MIT and Harvard
> >
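
A note on the thriftpy snippet earlier in this thread: the hang is consistent
with the scheduler not speaking raw buffered Thrift on port 8081. As far as I
know, the Aurora scheduler exposes its Thrift API over HTTP (the bundled
Python client POSTs TJSONProtocol-encoded requests to the scheduler's /api
path), so bytes written straight to a TSocket never get an answer. It is also
worth checking whether thriftpy's JSON wire format (the "metadata"/"payload"
dict in the captured payload) matches Apache Thrift's TJSONProtocol before
reusing it. The sketch below shows how the call might look with the upstream
Apache Thrift Python bindings instead; it assumes a Thrift release with
Python 3 support, that /api on port 8081 is the correct endpoint, and that
the import path gen.apache.aurora.api is where the code generated from
api.thrift lands in your tree. All of these are assumptions to verify, not a
tested recipe.

---
# Sketch only: talk to the scheduler's Thrift-over-HTTP endpoint instead of
# opening a raw socket. Endpoint path and import path are assumptions.
from thrift.transport import THttpClient
from thrift.protocol import TJSONProtocol

# Generated from api.thrift with the Thrift compiler; adjust the import to
# wherever the generated AuroraSchedulerManager module ends up for you.
from gen.apache.aurora.api import AuroraSchedulerManager

transport = THttpClient.THttpClient('http://localhost:8081/api')
protocol = TJSONProtocol.TJSONProtocol(transport)
client = AuroraSchedulerManager.Client(protocol)

transport.open()
try:
    # Recent revisions of api.thrift declare getJobSummary with a role
    # filter argument; check the IDL you actually generated from.
    print(client.getJobSummary(''))
finally:
    transport.close()
---

If sticking with thriftpy is a requirement, the same idea applies: the
request has to arrive as an HTTP POST in an encoding the scheduler
understands, so check whether your thriftpy version offers an HTTP transport
and a TJSONProtocol-compatible protocol before relying on it.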