I'm a Python guru; if it doesn't have a Python API, I'll likely help make one :)

Work is bad this week but I'm planning to get started on this next week!

Shannon

On 3/14/16 5:37 AM, Robert Metzger wrote:
Hi Shannon,

I'm happy to see some community engagement on our Python APIs!

On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <ches...@apache.org>
wrote:

The subtaskIndex is not currently exposed to the python operator.

Fortunately this can be changed very easily:
On the Java side, within PythonStreamer.startPython() the Python process
is started and several parameters are transferred (L.129++) using
stdin/-out.
These parameters are received on the Python side in Environment.execute()
(L.168++).

So the transfer is rather straightforward. After that you only have to
modify the operator.configure() method to
also take a subtaskIndex argument, modify the RuntimeContext constructor,
add a getIndexOfThisSubtask() method and you're set.
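
A rough sketch of what those Python-side changes might look like. This is not the actual Flink source: the class bodies, the snake_case method names, and the read_subtask_index() helper are hypothetical stand-ins, and the sketch assumes the index arrives as a single line of text, which may not match the real stdin/-out protocol.

```python
import io


class RuntimeContext(object):
    """Hypothetical stand-in for the Python API's runtime context."""

    def __init__(self, broadcast_variables, subtask_index):
        # The constructor now also stores the index of the parallel subtask.
        self._broadcast_variables = broadcast_variables
        self._subtask_index = subtask_index

    def get_broadcast_variable(self, name):
        return self._broadcast_variables[name]

    def get_index_of_this_subtask(self):
        # Python counterpart of getIndexOfThisSubtask() on the Java/Scala side.
        return self._subtask_index


class Operator(object):
    """Hypothetical operator whose configure() also accepts the subtask index."""

    def __init__(self):
        self.context = None

    def configure(self, broadcast_variables, subtask_index):
        self.context = RuntimeContext(broadcast_variables, subtask_index)


def read_subtask_index(stream):
    # Assumption: the Java side sends the index as one line of text.
    return int(stream.readline().strip())


# Usage: simulate the parameter transfer with an in-memory stream.
operator = Operator()
operator.configure({}, read_subtask_index(io.StringIO(u"3\n")))
print(operator.context.get_index_of_this_subtask())
```

In the real code the index would be read alongside the other parameters in Environment.execute() rather than through a separate helper.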

Feel free to open a JIRA for this.


On 11.03.2016 18:15, Shannon Quinn wrote:

Hi all,

I'm interested in getting involved in the Python API development. The first
use-case I've encountered in my work is that of zipWithIndex, so I started
looking into how to go about implementing that. It looks like the core of
it involves being able to uniquely identify what worker you're currently in
between distributed calls; the Scala end has
getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context
is more or less limited to the broadcast variables.
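
To illustrate why the subtask index matters here, a minimal plain-Python sketch of one common way to build zipWithIndex: count the elements in each subtask, turn the counts into starting offsets, then number each element from its subtask's offset. The function names are hypothetical and the partitions are simulated with lists; nothing below uses the Flink API.

```python
def count_per_subtask(partitions):
    # First pass: each subtask reports how many elements it holds.
    return [len(partition) for partition in partitions]


def offsets_from_counts(counts):
    # The offset of subtask i is the total count of all subtasks before it.
    offsets = []
    running_total = 0
    for count in counts:
        offsets.append(running_total)
        running_total += count
    return offsets


def zip_with_index(partition, subtask_index, offsets):
    # Second pass: each subtask looks up its own offset by subtask index
    # and numbers its elements consecutively from there.
    start = offsets[subtask_index]
    return [(start + i, element) for i, element in enumerate(partition)]


# Usage: three simulated subtasks.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
offsets = offsets_from_counts(count_per_subtask(partitions))
for subtask_index, partition in enumerate(partitions):
    print(zip_with_index(partition, subtask_index, offsets))
```

In a distributed setting the offsets would presumably be shared with the second pass as a broadcast variable, which is why each worker needs to know its own index.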

Happy to hear any hints as to how I should get started with this. Thanks.

Regards,
Shannon


