As a general rule we're trying to stick to Python 3.4. I don't imagine
implementing a THTTPClient transport of my own will be too difficult,
especially given that I have the Aurora client's TRequestsTransport [1]
for reference.

[1]
https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/common/transport.py

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard
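For the curious, a minimal sketch of what such a transport might look
like, modeled on the TRequestsTransport referenced in [1]. The
TTransportBase import path and the exact set of methods thriftpy expects
(open/close/read/write/flush) are assumptions here, so check
thriftpy.transport before relying on this:

---
# Hedged sketch of a thriftpy THTTPClient, modeled on Aurora's
# TRequestsTransport [1]. The TTransportBase import path and method set
# are assumptions, not verified against thriftpy.
from io import BytesIO

import requests
from thriftpy.transport import TTransportBase  # assumed import path


class THttpClient(TTransportBase):
    """Buffer writes; POST them on flush(); serve the response via read()."""

    def __init__(self, uri):
        self._uri = uri
        # A persistent Session keeps the TCP connection alive between
        # calls -- the main latency win over reconnecting per request.
        self._session = requests.Session()
        self._wbuf = BytesIO()
        self._rbuf = BytesIO()

    def is_open(self):
        return self._session is not None

    def open(self):
        pass  # requests opens the connection lazily on first POST

    def close(self):
        self._session.close()
        self._session = None

    def read(self, sz):
        return self._rbuf.read(sz)

    def write(self, buf):
        self._wbuf.write(buf)

    def flush(self):
        # Ship the buffered request as one HTTP POST, then expose the
        # response body to the protocol layer through the read buffer.
        data, self._wbuf = self._wbuf.getvalue(), BytesIO()
        resp = self._session.post(
            self._uri,
            data=data,
            headers={"Content-Type": "application/x-thrift"},
            timeout=30)
        resp.raise_for_status()
        self._rbuf = BytesIO(resp.content)
---

Wiring this into thriftpy's client machinery (which builds its own
TSocket by default) would still take some care; the sketch only covers
the transport itself.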
On 16 March 2015 at 22:58, Bill Farner <wfar...@apache.org> wrote:

> Exploring the possibilities - can you use python 2.7? If so, you could
> leverage some of the private libraries within the client and lower the
> surface area of what you need to build. It won't be a stable
> programmatic API, but you might get moving faster. I assume this is
> what Stephan is suggesting.
>
> -=Bill
>
> On Mon, Mar 16, 2015 at 7:52 PM, Hussein Elgridly <
> huss...@broadinstitute.org> wrote:
>
> > I'm not quite sure I understand your question, so I'll be painfully
> > explicit instead.
> >
> > I don't want to use the existing Aurora client because it's slow
> > (Pystachio + repeated HTTP connection overheads, as detailed earlier
> > in this thread). Instead, I want to use the Thrift interface to talk
> > to the Aurora scheduler directly (I can skip Pystachio entirely and
> > keep the HTTP connection open).
> >
> > I cannot use the official Thrift bindings for Python as they do not
> > yet support Python 3 [1]. There is a third-party, pure Python
> > implementation of Thrift that does support Python 3 called thriftpy
> > [2]. However, thriftpy does not include a THTTPClient transport,
> > which is what the Aurora scheduler uses. I will therefore have to
> > write my own THTTPClient transport (and probably contribute it back
> > to thriftpy).
> >
> > [1] https://issues.apache.org/jira/browse/THRIFT-1857
> > [2] https://github.com/eleme/thriftpy
> >
> > Hussein Elgridly
> > Senior Software Engineer, DSDE
> > The Broad Institute of MIT and Harvard
> >
> > On 16 March 2015 at 19:11, Erb, Stephan <stephan....@blue-yonder.com>
> > wrote:
> >
> > > Just to make sure I get this correctly: you say you cannot use the
> > > existing python client because it is python 2.7 only, so you want
> > > to write a new one in python 3?
> > >
> > > Regards,
> > > Stephan
> > > ________________________________________
> > > From: Hussein Elgridly <huss...@broadinstitute.org>
> > > Sent: Monday, March 16, 2015 11:44 PM
> > > To: dev@aurora.incubator.apache.org
> > > Subject: Re: Speeding up Aurora client job creation
> > >
> > > So this has now bubbled back to the top of my TODO list and I'm
> > > actively working on it. I am entirely new to Thrift so please
> > > forgive the newbie questions...
> > >
> > > I would like to talk to the Aurora scheduler directly from my
> > > (Python) application using Thrift. Since I'm on Python 3.4 I've
> > > had to use thriftpy: https://github.com/eleme/thriftpy
> > >
> > > As far as I can tell, the following should work (by default,
> > > thriftpy uses a TBufferedTransport around a TSocket):
> > >
> > > ---
> > > import thriftpy
> > > import thriftpy.rpc
> > >
> > > aurora_api = thriftpy.load("api.thrift")
> > >
> > > client = thriftpy.rpc.make_client(
> > >     aurora_api.AuroraSchedulerManager,
> > >     host="localhost", port=8081,
> > >     proto_factory=thriftpy.protocol.TJSONProtocolFactory())
> > >
> > > print(client.getJobSummary())
> > > ---
> > >
> > > Obviously I wouldn't be writing this email if it did work :) It
> > > hangs.
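The hang is consistent with the transport mismatch called out above: the
scheduler serves its Thrift API over HTTP, while make_client's default
TSocket writes raw Thrift bytes that an HTTP server will never answer. A
hedged probe along the following lines can test this; the /api path and
the Content-Type header are assumptions inferred from how the Aurora
client connects, and the four-byte length prefix visible in the pdb
capture below is socket-level framing, omitted here:

---
# Hedged probe: POST the same JSON payload to the scheduler's HTTP
# endpoint instead of a raw socket. The /api path and the header are
# assumptions; the b'\x00\x00\x00\\' length prefix from the pdb capture
# is transport framing, not part of the message, and is left out.
import requests

body = (b'{"metadata": {"name": "getJobSummary", "seqid": 0,'
        b' "ttype": 1, "version": 1}, "payload": {}}')

resp = requests.post("http://localhost:8081/api",
                     data=body,
                     headers={"Content-Type": "application/x-thrift"},
                     timeout=10)
print(resp.status_code, resp.content[:200])
---

Note too that thriftpy's JSON envelope ({"metadata": ..., "payload":
...}) appears to differ from Apache Thrift's TJSONProtocol wire format,
so the protocol may need attention as well as the transport.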
> > > I jumped into pdb and found it was sending the following payload:
> > >
> > > b'\x00\x00\x00\\{"metadata": {"name": "getJobSummary", "seqid": 0,
> > > "ttype": 1, "version": 1}, "payload": {}}'
> > >
> > > to a socket that looked like this:
> > >
> > > <socket.socket fd=3, family=AddressFamily.AF_INET, type=2049,
> > > proto=0, laddr=('<localhost's_private_ip>', 49167),
> > > raddr=('<localhost's_private_ip>', 8081)>
> > >
> > > ...but was waiting forever to receive any data. Adding a timeout
> > > just triggered the timeout.
> > >
> > > I'm stumped. Any clues?
> > >
> > > Hussein Elgridly
> > > Senior Software Engineer, DSDE
> > > The Broad Institute of MIT and Harvard
> > >
> > > On 12 February 2015 at 04:15, Erb, Stephan <
> > > stephan....@blue-yonder.com> wrote:
> > >
> > > > Hi Hussein,
> > > >
> > > > we also had slight performance problems when talking to Aurora.
> > > > We ended up using the existing python client directly in our
> > > > code (see apache.aurora.client.api.__init__.py). This allowed us
> > > > to reuse the api object and its scheduler connection, dropping a
> > > > connection latency of about 0.3-0.4 seconds per request.
> > > >
> > > > Best Regards,
> > > > Stephan
> > > > ________________________________________
> > > > From: Bill Farner <wfar...@apache.org>
> > > > Sent: Wednesday, February 11, 2015 9:29 PM
> > > > To: dev@aurora.incubator.apache.org
> > > > Subject: Re: Speeding up Aurora client job creation
> > > >
> > > > To reduce that time you will indeed want to talk directly to the
> > > > scheduler. This will definitely require you to roll up your
> > > > sleeves a bit and set up a thrift client to our api (based on
> > > > api.thrift [1]), since you will need to specify your tasks in a
> > > > format that the thermos executor can understand. Turns out this
> > > > is JSON data, so it should not be *too* prohibitive.
> > > >
> > > > However, there is another technical limitation you will hit for
> > > > the submission rate you are after. The scheduler is backed by a
> > > > durable store whose write latency is at minimum the amount of
> > > > time required to fsync.
> > > >
> > > > [1]
> > > > https://github.com/apache/incubator-aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > > >
> > > > -=Bill
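For illustration, the client-reuse approach Stephan describes above
might look roughly like the sketch below. AuroraClientAPI and create_job
live in apache.aurora.client.api, but this is a private, unstable,
python 2.7-only API; the constructor arguments, the cluster definition,
and the pre-built job_configs list here are all assumptions:

---
# Hedged sketch: build one api object and reuse its scheduler connection
# across submissions, per Stephan's suggestion. Constructor arguments
# and config loading are illustrative, not verified against this era's
# source tree.
from apache.aurora.client.api import AuroraClientAPI
from apache.aurora.common.cluster import Cluster

cluster = Cluster(name="devcluster")  # assumed cluster definition
api = AuroraClientAPI(cluster)        # one connection, reused below

for config in job_configs:            # job_configs: preloaded AuroraConfig objects
    resp = api.create_job(config)     # no per-job connection setup
    print(resp.responseCode)
---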
> > > > On Wed, Feb 11, 2015 at 11:46 AM, Hussein Elgridly <
> > > > huss...@broadinstitute.org> wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > I'm looking at a use case that involves submitting potentially
> > > > > hundreds of jobs a second to our Mesos cluster. My tests show
> > > > > that the aurora client is taking 1-2 seconds for each job
> > > > > submission, and that I can run about four client processes in
> > > > > parallel before they peg the CPU at 100%. I need more
> > > > > throughput than this!
> > > > >
> > > > > Squashing jobs down to the Process or Task level doesn't
> > > > > really make sense for our use case. I'm aware that with some
> > > > > shenanigans I can batch jobs together using job instances, but
> > > > > that's a lot of work on my current timeframe (and of
> > > > > questionable utility given that the jobs certainly won't have
> > > > > identical resource requirements).
> > > > >
> > > > > What I really need is (at least) an order of magnitude speedup
> > > > > in terms of being able to submit jobs to the Aurora scheduler
> > > > > (via the client or otherwise).
> > > > >
> > > > > Conceptually it doesn't seem like adding a job to a queue
> > > > > should be a thing that takes a couple of seconds, so I'm
> > > > > baffled as to why it's taking so long. As an experiment, I
> > > > > wrapped the call to client.execute() in client.py:proxy_main
> > > > > in cProfile and called aurora job create with a very simple
> > > > > test job.
> > > > >
> > > > > Results of the profile are in the Gist below:
> > > > >
> > > > > https://gist.github.com/helgridly/b37a0d27f04a37e72bb5
> > > > >
> > > > > Out of a 0.977s profile time, the two things that stick out to
> > > > > me are:
> > > > >
> > > > > 1. 0.526s spent in Pystachio for a job that doesn't use any
> > > > > templates
> > > > > 2. 0.564s spent in create_job, presumably talking to the
> > > > > scheduler (and setting up the machinery for doing so)
> > > > >
> > > > > I imagine I can sidestep #1 with a check for "{{" in the job
> > > > > file and bypass Pystachio entirely. Can I also skip the Aurora
> > > > > client entirely and talk directly to the scheduler? If so what
> > > > > does that entail, and are there any risks associated?
> > > > >
> > > > > Thanks,
> > > > > -Hussein
> > > > >
> > > > > Hussein Elgridly
> > > > > Senior Software Engineer, DSDE
> > > > > The Broad Institute of MIT and Harvard
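For reference, the profiling experiment described above takes only a few
lines of stdlib cProfile. The wrapping site (the client.execute() call
inside client.py:proxy_main) is as described in the message; the output
path and the number of rows printed are arbitrary choices:

---
# Hedged sketch: wrap the client.execute() call inside proxy_main with
# cProfile, dump the stats to disk, and print the hottest entries.
import cProfile
import pstats

# In place of the plain client.execute() call inside proxy_main:
cProfile.runctx("client.execute()", globals(), locals(),
                "/tmp/aurora_job_create.prof")

stats = pstats.Stats("/tmp/aurora_job_create.prof")
stats.sort_stats("cumulative").print_stats(25)
---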