The subtaskIndex is not currently exposed to the Python operator.
Fortunately, this is easy to change:
On the Java side, PythonStreamer.startPython() starts the Python process
and transfers several parameters to it (L.129++) using
stdin/stdout.
These parameters are received on the Python side in
Environment.execute() (L.168++).
So the transfer itself is rather straightforward. After that, you only have to
modify the operator.configure() method to
also take a subtaskIndex argument, modify the RuntimeContext
constructor to store it, and add a getIndexOfThisSubtask() method, and you're set.
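As a rough sketch of what the Python-side changes could look like (this is not the actual Flink source; the class layout, the snake_case method name, and the one-integer-per-line transfer format are all assumptions made for illustration):

```python
import io


class RuntimeContext(object):
    """Hypothetical stand-in for the Python-side runtime context."""

    def __init__(self, broadcast_variables, subtask_index):
        self._broadcast_variables = broadcast_variables
        # New: remember which parallel subtask this process belongs to.
        self._subtask_index = subtask_index

    def get_broadcast_variable(self, name):
        return self._broadcast_variables[name]

    def get_index_of_this_subtask(self):
        # Assumed snake_case counterpart of getIndexOfThisSubtask().
        return self._subtask_index


class Operator(object):
    """Hypothetical operator base class; only configure() is sketched."""

    def __init__(self):
        self.context = None

    def configure(self, broadcast_variables, subtask_index):
        # configure() now also takes the subtask index and passes it on.
        self.context = RuntimeContext(broadcast_variables, subtask_index)


def read_subtask_index(stream):
    # Assumption: the Java side writes the index as one decimal integer
    # on its own line; the real wire format may differ.
    return int(stream.readline().strip())


if __name__ == "__main__":
    # Simulate the value arriving from the Java process via stdin.
    fake_stdin = io.StringIO("3\n")
    op = Operator()
    op.configure({}, read_subtask_index(fake_stdin))
    print(op.context.get_index_of_this_subtask())
```

A user function would then call the new getter on its runtime context, the same way it reads broadcast variables today.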
Feel free to open a JIRA for this.
On 11.03.2016 18:15, Shannon Quinn wrote:
Hi all,
I'm interested in getting involved in the Python API development. The
first use-case I've encountered in my work is that of zipWithIndex, so
I started looking into how to go about implementing that. It looks
like the core of it involves being able to uniquely identify what
worker you're currently in between distributed calls; the Scala end
has getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime
context is more or less limited to the broadcast variables.
Happy to hear any hints as to how I should get started with this. Thanks.
Regards,
Shannon