[ https://issues.apache.org/jira/browse/FLINK-22913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dian Fu updated FLINK-22913:
----------------------------
    Release Note: The job graph of Python DataStream API jobs may be different from before as the Python functions will be chained as much as possible to optimize the performance. You could disable Python functions chaining by setting 'python.operator-chaining.enabled' as 'false' explicitly.  (was: The job graph of Python DataStream API jobs may be different from before as the Python functions will be chained as much as possible to optimize the performance. You could disable Python function chaining by setting 'python.operator-chaining.enabled' as 'false' explicitly.)

> Support Python UDF chaining in Python DataStream API
> ----------------------------------------------------
>
>                 Key: FLINK-22913
>                 URL: https://issues.apache.org/jira/browse/FLINK-22913
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Python
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently, for the following job:
> {code}
> ds = ..
> ds.map(map_func1)
>   .map(map_func2)
> {code}
> The Python functions `map_func1` and `map_func2` will run in separate Python
> workers, and the result of `map_func1` will be transferred to the JVM and then
> to `map_func2`, which may reside in another Python worker. This
> introduces redundant communication and serialization/deserialization overhead.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
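
The overhead described in the issue can be illustrated with a toy model in plain Python. This is only a sketch and not Flink's implementation: the helper names `run_unchained`, `chain`, and `run_chained` are hypothetical, and a `pickle` round trip stands in for the transfer of a result between a Python worker and the JVM. The function bodies given to `map_func1` and `map_func2` are also made up for illustration.

```python
import pickle


def run_unchained(funcs, records):
    """Run each function as its own stage.

    Every result is pickled and unpickled after each stage, which
    stands in for the Python worker -> JVM -> Python worker transfer.
    Returns the final records and the number of round trips made.
    """
    round_trips = 0
    for func in funcs:
        stage_output = []
        for record in records:
            result = func(record)
            # Simulated serialization/deserialization between stages.
            stage_output.append(pickle.loads(pickle.dumps(result)))
            round_trips += 1
        records = stage_output
    return records, round_trips


def chain(funcs):
    """Compose several functions into one that runs them back to back."""
    def chained(record):
        for func in funcs:
            record = func(record)
        return record
    return chained


def run_chained(funcs, records):
    """Run all functions as a single stage, so each record makes one round trip."""
    return run_unchained([chain(funcs)], records)


# Illustrative stand-ins for the functions named in the issue.
def map_func1(x):
    return x + 1


def map_func2(x):
    return x * 2


if __name__ == "__main__":
    data = [1, 2, 3]
    print(run_unchained([map_func1, map_func2], data))
    print(run_chained([map_func1, map_func2], data))
```

In this model both variants produce the same records, but the chained variant performs one simulated round trip per record instead of one per record per function, which is the saving the issue is after.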