[jira] [Updated] (FLINK-22913) Support Python UDF chaining in Python DataStream API

Dian Fu (Jira) Mon, 13 Sep 2021 05:03:06 -0700


     [ https://issues.apache.org/jira/browse/FLINK-22913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dian Fu updated FLINK-22913:
----------------------------
    Release Note: The job graph of Python DataStream API jobs may differ from 
before, as Python functions will be chained as much as possible to optimize 
performance. You can disable Python functions chaining by explicitly setting 
'python.operator-chaining.enabled' to 'false'.  (was: The job graph of Python 
DataStream API jobs may differ from before, as Python functions will be 
chained as much as possible to optimize performance. You can disable Python 
function chaining by explicitly setting 'python.operator-chaining.enabled' to 
'false'.)

> Support Python UDF chaining in Python DataStream API
> ----------------------------------------------------
>
>                 Key: FLINK-22913
>                 URL: https://issues.apache.org/jira/browse/FLINK-22913
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Python
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently, for the following job:
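> {code}
> from pyflink.datastream import StreamExecutionEnvironment
> env = StreamExecutionEnvironment.get_execution_environment()
> ds = env.from_collection([1, 2, 3])
> map_func1 = lambda x: x + 1
> map_func2 = lambda x: x * 2
> ds.map(map_func1) \
>     .map(map_func2)
> {code}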
> The Python functions `map_func1` and `map_func2` will run in separate Python 
> workers: the result of `map_func1` is transferred to the JVM and then 
> transferred to `map_func2`, which may reside in another Python worker. This 
> introduces redundant communication and serialization/deserialization overhead.
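To make the overhead described above concrete, here is a toy model, not PyFlink's implementation. It assumes each hop between a Python worker and the JVM costs one pickle round trip; unchained, every function pays two hops per record, while chaining fuses consecutive functions so a record crosses the boundary only once in each direction. The names `run_unchained`, `chain`, and `run_chained` are hypothetical.

```python
import pickle
from typing import Any, Callable, List, Sequence, Tuple

Func = Callable[[Any], Any]


def _hop(value: Any) -> Any:
    # Stand-in for a JVM <-> Python worker transfer: serialize, then deserialize.
    return pickle.loads(pickle.dumps(value))


def run_unchained(records: Sequence[Any], funcs: List[Func]) -> Tuple[list, int]:
    """Each function runs in its own 'worker', so every function costs two hops."""
    hops, out = 0, []
    for value in records:
        for func in funcs:
            value = _hop(value)       # JVM -> worker
            value = func(value)
            value = _hop(value)       # worker -> JVM
            hops += 2
        out.append(value)
    return out, hops


def chain(funcs: List[Func]) -> Func:
    """Fuse consecutive functions into one callable run inside a single worker."""
    def fused(value: Any) -> Any:
        for func in funcs:
            value = func(value)
        return value
    return fused


def run_chained(records: Sequence[Any], funcs: List[Func]) -> Tuple[list, int]:
    """The fused function costs two hops per record, regardless of chain length."""
    fused, hops, out = chain(funcs), 0, []
    for value in records:
        value = _hop(fused(_hop(value)))
        hops += 2
        out.append(value)
    return out, hops
```

With `map_func1 = lambda x: x + 1` and `map_func2 = lambda x: x * 2` over `[1, 2, 3]`, both runners return `[4, 6, 8]`, but the unchained run performs 12 hops and the chained run 6.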



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
