Hi Shannon, I'm happy to see some community engagement on our Python APIs!
On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <ches...@apache.org> wrote: > The subtaskIndex is not currently exposed to the python operator. > > Fortunately this can be changed very easily: > On the java side, within PythonStreamer.startPython() the python process > is started and several parameters are transferred (L.129++) using > stdin/-out. > These parameters are received on the python side in Environment.execute() > (L.168++). > > So the transfer is rather straight-forward, after that you only have to > modify the operator.configure() method to > also take a subtaskIndex argument, modify the RuntimeContext constructor, > add a getIndexOfThisSubtask() method and you're set. > > Feel free to open a JIRA for this. > > > On 11.03.2016 18:15, Shannon Quinn wrote: > >> Hi all, >> >> I'm interested in getting involved the Python API development. The first >> use-case I've encountered in my work is that of zipWithIndex, so I started >> looking into how to go about implementing that. It looks like the core of >> it involves being able to uniquely identify what worker you're currently in >> between distributed calls; the Scala end has >> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context >> is more or less limited to the broadcast variables. >> >> Happy to hear any hints as to how I should get started with this. Thanks. >> >> Regards, >> Shannon >> >> >