Another update: Thrift has a pull request open for Python 3 support [1], but it was out of date and needed rebasing onto master. I did this in my own fork [2] and managed to build a Py3-generating version of Thrift. This allowed me to generate Python 3 Thrift bindings for Aurora, which I'm including in my project along with a tarball of the Python 3 Thrift libraries. Success!
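For reference, a minimal sketch of what that setup looks like end to end with the official Thrift libraries and the generated Aurora bindings. The gen.apache.aurora.api package path and the /api endpoint are assumptions (check your generated code and scheduler URL), and it presumes the Python 3 build ships THttpClient:

---
from thrift.transport import THttpClient
from thrift.protocol import TJSONProtocol

# Generated bindings; the package path depends on the namespace declared in api.thrift.
from gen.apache.aurora.api import AuroraSchedulerManager

# The scheduler speaks Thrift-over-HTTP using the standard TJSONProtocol.
transport = THttpClient.THttpClient('http://localhost:8081/api')
protocol = TJSONProtocol.TJSONProtocol(transport)
client = AuroraSchedulerManager.Client(protocol)

transport.open()
try:
    print(client.getJobSummary())  # the same call that hangs with thriftpy below
finally:
    transport.close()
---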
[1] https://github.com/apache/thrift/pull/213
[2] https://github.com/broadinstitute/thrift/tree/eevee/python3

The changes make Thrift fail on Python 2, so I imagine it'll be a while before they make it into official Thrift. But it works for me, so I'm happy :)

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard


On 17 March 2015 at 15:18, Hussein Elgridly <huss...@broadinstitute.org> wrote:

> For anyone following along at home, I managed to make my own THTTPClient for thriftpy just fine. Unfortunately, thriftpy's TJSONProtocol seems to be *a* JSON protocol, not *the* JSON protocol:
>
> thrift:   [1,"getJobSummary",1,0,{}]
> thriftpy: {"metadata": {"ttype": 1, "name": "getJobSummary", "version": 1, "seqid": 0}, "payload": {}}
>
> Which is frustrating, to say the least. I am now debating whether to:
>
> 1. Stub out the subset of the API that I actually need (currently only createJob and getTasksWithoutConfigs);
> 2. Roll my own protocol, based on Thrift's code [1]; or
> 3. Backport my project to Python 2.7 and use official Thrift.
>
> [1] https://github.com/apache/thrift/blob/93fea15b51494a79992a5323c803325537134bd8/lib/py/src/protocol/TJSONProtocol.py
>
> Hussein Elgridly
> Senior Software Engineer, DSDE
> The Broad Institute of MIT and Harvard
>
>
> On 16 March 2015 at 23:37, Hussein Elgridly <huss...@broadinstitute.org> wrote:
>
>> As a general rule we're trying to stick to Python 3.4. I don't imagine implementing a THTTPClient of my own will be too difficult, especially given that I have the Aurora client's TRequestsTransport [1] for reference.
>>
>> [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/common/transport.py
>>
>> Hussein Elgridly
>> Senior Software Engineer, DSDE
>> The Broad Institute of MIT and Harvard
>>
>>
>> On 16 March 2015 at 22:58, Bill Farner <wfar...@apache.org> wrote:
>>
>>> Exploring the possibilities - can you use Python 2.7? If so, you could leverage some of the private libraries within the client and lower the surface area of what you need to build. It won't be a stable programmatic API, but you might get moving faster. I assume this is what Stephan is suggesting.
>>>
>>> -=Bill
>>>
>>> On Mon, Mar 16, 2015 at 7:52 PM, Hussein Elgridly <huss...@broadinstitute.org> wrote:
>>>
>>> > I'm not quite sure I understand your question, so I'll be painfully explicit instead.
>>> >
>>> > I don't want to use the existing Aurora client because it's slow (Pystachio + repeated HTTP connection overheads, as detailed earlier in this thread). Instead, I want to use the Thrift interface to talk to the Aurora scheduler directly - I can skip Pystachio entirely and keep the HTTP connection open.
>>> >
>>> > I cannot use the official Thrift bindings for Python as they do not yet support Python 3 [1]. There is a third-party, pure-Python implementation of Thrift that does support Python 3 called thriftpy [2]. However, thriftpy does not include a THTTPClient transport, which is what the Aurora scheduler uses. I will therefore have to write my own THTTPClient transport (and probably contribute it back to thriftpy).
>>> >
>>> > [1] https://issues.apache.org/jira/browse/THRIFT-1857
>>> > [2] https://github.com/eleme/thriftpy
>>> >
>>> > Hussein Elgridly
>>> > Senior Software Engineer, DSDE
>>> > The Broad Institute of MIT and Harvard
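A rough sketch of what such a THTTPClient-style transport for thriftpy could look like, loosely modeled on the Aurora client's TRequestsTransport. The class and attribute names here are made up for illustration, and it assumes the protocol layer only needs open/close/is_open/read/write/flush from a transport; as noted further up in the thread, this alone does not fix the TJSONProtocol mismatch:

---
from io import BytesIO
import requests

class THttpClientTransport(object):
    """Buffers writes, POSTs them to the scheduler on flush(), and exposes
    the response body for subsequent reads."""

    def __init__(self, uri):
        self._uri = uri
        self._session = None
        self._wbuf = BytesIO()   # outgoing request body
        self._rbuf = BytesIO()   # last response body

    def is_open(self):
        return self._session is not None

    def open(self):
        # requests.Session keeps the underlying TCP connection alive between calls.
        self._session = requests.Session()

    def close(self):
        self._session.close()
        self._session = None

    def read(self, sz):
        return self._rbuf.read(sz)

    def write(self, buf):
        self._wbuf.write(buf)

    def flush(self):
        body = self._wbuf.getvalue()
        self._wbuf = BytesIO()
        resp = self._session.post(self._uri, data=body,
                                  headers={'Content-Type': 'application/x-thrift'})
        resp.raise_for_status()
        self._rbuf = BytesIO(resp.content)
---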
>>> >
>>> >
>>> > On 16 March 2015 at 19:11, Erb, Stephan <stephan....@blue-yonder.com> wrote:
>>> >
>>> > > Just to make sure I get this correctly: you say you cannot use the existing Python client because it is Python 2.7 only, so you want to write a new one in Python 3?
>>> > >
>>> > > Regards,
>>> > > Stephan
>>> > > ________________________________________
>>> > > From: Hussein Elgridly <huss...@broadinstitute.org>
>>> > > Sent: Monday, March 16, 2015 11:44 PM
>>> > > To: dev@aurora.incubator.apache.org
>>> > > Subject: Re: Speeding up Aurora client job creation
>>> > >
>>> > > So this has now bubbled back to the top of my TODO list and I'm actively working on it. I am entirely new to Thrift, so please forgive the newbie questions...
>>> > >
>>> > > I would like to talk to the Aurora scheduler directly from my (Python) application using Thrift. Since I'm on Python 3.4 I've had to use thriftpy: https://github.com/eleme/thriftpy
>>> > >
>>> > > As far as I can tell, the following should work (by default, thriftpy uses a TBufferedTransport around a TSocket):
>>> > >
>>> > > ---
>>> > > import thriftpy
>>> > > import thriftpy.rpc
>>> > >
>>> > > aurora_api = thriftpy.load("api.thrift")
>>> > >
>>> > > client = thriftpy.rpc.make_client(
>>> > >     aurora_api.AuroraSchedulerManager,
>>> > >     host="localhost", port=8081,
>>> > >     proto_factory=thriftpy.protocol.TJSONProtocolFactory())
>>> > >
>>> > > print(client.getJobSummary())
>>> > > ---
>>> > >
>>> > > Obviously I wouldn't be writing this email if it did work :) It hangs.
>>> > >
>>> > > I jumped into pdb and found it was sending the following payload:
>>> > >
>>> > > b'\x00\x00\x00\\{"metadata": {"name": "getJobSummary", "seqid": 0, "ttype": 1, "version": 1}, "payload": {}}'
>>> > >
>>> > > to a socket that looked like this:
>>> > >
>>> > > <socket.socket fd=3, family=AddressFamily.AF_INET, type=2049, proto=0, laddr=('<localhost's_private_ip>', 49167), raddr=('localhost's_private_ip', 8081)>
>>> > >
>>> > > ...but was waiting forever to receive any data. Adding a timeout just triggered the timeout.
>>> > >
>>> > > I'm stumped. Any clues?
>>> > >
>>> > >
>>> > > Hussein Elgridly
>>> > > Senior Software Engineer, DSDE
>>> > > The Broad Institute of MIT and Harvard
>>> > >
>>> > >
>>> > > On 12 February 2015 at 04:15, Erb, Stephan <stephan....@blue-yonder.com> wrote:
>>> > >
>>> > > > Hi Hussein,
>>> > > >
>>> > > > We also had slight performance problems when talking to Aurora. We ended up using the existing Python client directly in our code (see apache.aurora.client.api.__init__.py). This allowed us to reuse the api object and its scheduler connection, dropping a connection latency of about 0.3-0.4 seconds per request.
>>> > > >
>>> > > > Best Regards,
>>> > > > Stephan
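A rough sketch of the reuse Stephan describes: build one api object and keep submitting through it. This is not a stable programmatic API, and the AuroraClientAPI constructor arguments, the CLUSTERS lookup, and the pre-built config objects are assumptions for illustration rather than checked against the client code:

---
# Build the client API object once and keep submitting through it, so the
# scheduler connection (and its ~0.3-0.4s setup cost) is paid only once.
from apache.aurora.client.api import AuroraClientAPI
from apache.aurora.common.clusters import CLUSTERS

api = AuroraClientAPI(CLUSTERS['devcluster'], user_agent='batch-submitter')  # assumed signature

for config in job_configs:         # job_configs: AuroraConfig objects built elsewhere
    resp = api.create_job(config)  # each call reuses the open scheduler connection
    print(resp.responseCode)
---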
>>> > > > ________________________________________
>>> > > > From: Bill Farner <wfar...@apache.org>
>>> > > > Sent: Wednesday, February 11, 2015 9:29 PM
>>> > > > To: dev@aurora.incubator.apache.org
>>> > > > Subject: Re: Speeding up Aurora client job creation
>>> > > >
>>> > > > To reduce that time you will indeed want to talk directly to the scheduler. This will definitely require you to roll up your sleeves a bit and set up a Thrift client to our API (based on api.thrift [1]), since you will need to specify your tasks in a format that the thermos executor can understand. Turns out this is JSON data, so it should not be *too* prohibitive.
>>> > > >
>>> > > > However, there is another technical limitation you will hit for the submission rate you are after. The scheduler is backed by a durable store whose write latency is at minimum the amount of time required to fsync.
>>> > > >
>>> > > > [1] https://github.com/apache/incubator-aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift
>>> > > >
>>> > > > -=Bill
>>> > > >
>>> > > > On Wed, Feb 11, 2015 at 11:46 AM, Hussein Elgridly <huss...@broadinstitute.org> wrote:
>>> > > >
>>> > > > > Hi folks,
>>> > > > >
>>> > > > > I'm looking at a use case that involves submitting potentially hundreds of jobs a second to our Mesos cluster. My tests show that the Aurora client is taking 1-2 seconds for each job submission, and that I can run about four client processes in parallel before they peg the CPU at 100%. I need more throughput than this!
>>> > > > >
>>> > > > > Squashing jobs down to the Process or Task level doesn't really make sense for our use case. I'm aware that with some shenanigans I can batch jobs together using job instances, but that's a lot of work on my current timeframe (and of questionable utility given that the jobs certainly won't have identical resource requirements).
>>> > > > >
>>> > > > > What I really need is (at least) an order of magnitude speedup in terms of being able to submit jobs to the Aurora scheduler (via the client or otherwise).
>>> > > > >
>>> > > > > Conceptually it doesn't seem like adding a job to a queue should be a thing that takes a couple of seconds, so I'm baffled as to why it's taking so long. As an experiment, I wrapped the call to client.execute() in client.py:proxy_main in cProfile and called aurora job create with a very simple test job.
>>> > > > >
>>> > > > > Results of the profile are in the Gist below:
>>> > > > >
>>> > > > > https://gist.github.com/helgridly/b37a0d27f04a37e72bb5
>>> > > > >
>>> > > > > Out of a 0.977s profile time, the two things that stick out to me are:
>>> > > > >
>>> > > > > 1. 0.526s spent in Pystachio for a job that doesn't use any templates
>>> > > > > 2. 0.564s spent in create_job, presumably talking to the scheduler (and setting up the machinery for doing so)
>>> > > > >
>>> > > > > I imagine I can sidestep #1 with a check for "{{" in the job file and bypass Pystachio entirely. Can I also skip the Aurora client entirely and talk directly to the scheduler? If so what does that entail, and are there any risks associated?
>>> > > > >
>>> > > > > Thanks,
>>> > > > > -Hussein
>>> > > > >
>>> > > > > Hussein Elgridly
>>> > > > > Senior Software Engineer, DSDE
>>> > > > > The Broad Institute of MIT and Harvard
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>