The subtaskIndex is not currently exposed to the python operator.

Fortunately this can be changed very easily:
On the java side, within PythonStreamer.startPython() the python process is started and several parameters are transferred (L.129++) using stdin/-out. These parameters are received on the python side in Environment.execute() (L.168++).

So the transfer is rather straight-forward, after that you only have to modify the operator.configure() method to also take a subtaskIndex argument, modify the RuntimeContext constructor, add a getIndexOfThisSubtask() method and you're set.

Feel free to open a JIRA for this.

On 11.03.2016 18:15, Shannon Quinn wrote:
Hi all,

I'm interested in getting involved the Python API development. The first use-case I've encountered in my work is that of zipWithIndex, so I started looking into how to go about implementing that. It looks like the core of it involves being able to uniquely identify what worker you're currently in between distributed calls; the Scala end has getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context is more or less limited to the broadcast variables.

Happy to hear any hints as to how I should get started with this. Thanks.

Regards,
Shannon


Reply via email to