[ 
https://issues.apache.org/jira/browse/FLINK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhong updated FLINK-16943:
------------------------------
    Description: 
Since Flink 1.10.0 was released, many users have complained that loading 
external jar packages in PyFlink is inconvenient. For local execution, users 
need to copy the jar files into the lib folder under the PyFlink installation 
directory, which is hard to locate. For job submission, users need to merge 
their jars into one, as `flink run` accepts only one jar file. That may be easy 
for Java users but is difficult for Python users who have never touched Java.

We intend to add an `add_jars` interface to the PyFlink TableEnvironment to 
solve this problem. It will add the jars to the context classloader of the Py4j 
gateway server and append them to `PipelineOptions.JARS` in the configuration 
of the StreamExecutionEnvironment/ExecutionEnvironment.

Via this interface, users can add jars in their Python job. The jars will be 
loaded immediately, so users can use them even on the next line of the Python 
code. Submitting a job with multiple external jars won't be a problem anymore, 
because all the jars in `PipelineOptions.JARS` will be added to the JobGraph 
and uploaded to the cluster.
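The intended semantics can be sketched with a minimal stand-in class. This is 
illustrative only, not the real PyFlink implementation: the class is a stub, and 
the real version would additionally register the jar URLs with the Py4j gateway 
server's context classloader. The key `pipeline.jars` is the string form of 
`PipelineOptions.JARS`, which Flink parses as a semicolon-separated list:

```python
# Illustrative sketch of the proposed add_jars semantics; NOT the real
# PyFlink API. Assumes jars are tracked under the 'pipeline.jars' key.

class TableEnvironment:
    """Minimal stand-in for pyflink.table.TableEnvironment."""

    def __init__(self):
        # Mirrors the environment configuration backing PipelineOptions.JARS.
        self._config = {}

    def add_jars(self, *jar_urls):
        # Append the jars to 'pipeline.jars' so they end up in the
        # JobGraph and are uploaded to the cluster on submission.
        existing = self._config.get("pipeline.jars", "")
        jars = [j for j in existing.split(";") if j] + list(jar_urls)
        self._config["pipeline.jars"] = ";".join(jars)

    def get_config(self):
        return self._config


t_env = TableEnvironment()
t_env.add_jars("file:///path/to/connector.jar")
t_env.add_jars("file:///path/to/udf-deps.jar")
print(t_env.get_config()["pipeline.jars"])
# file:///path/to/connector.jar;file:///path/to/udf-deps.jar
```

Because the jars accumulate in the configuration rather than on the `flink run` 
command line, calling `add_jars` multiple times with different jars is fine.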

As this is not a big change, I'm not sure whether it is necessary to create a 
FLIP to discuss it, so I created a JIRA first for flexibility. What do you 
think?



> Support adding jars in PyFlink
> ------------------------------
>
>                 Key: FLINK-16943
>                 URL: https://issues.apache.org/jira/browse/FLINK-16943
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Python
>            Reporter: Wei Zhong
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
