HuangXingBo commented on a change in pull request #13232:
URL: https://github.com/apache/flink/pull/13232#discussion_r480197110
##########
File path: docs/dev/python/user-guide/datastream/dependency_management.md
##########
@@ -0,0 +1,97 @@
+---
+title: "Dependency Management"
+nav-parent_id: python_datastream_api
+nav-pos: 40
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Python Dependency
+
+If third-party Python dependencies are used, Users can specify the dependencies with the following Python DataStream
+APIs or through [command line arguments]({% link ops/cli.md %}#usage) directly when submitting the job.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">APIs</th>
+      <th class="text-left">Description</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td><strong>add_python_file(file_path)</strong></td>
+      <td>
+        <p>Adds python file dependencies which could be python files, python packages or local directories. They will be added to the PYTHONPATH of the python UDF worker.</p>
+{% highlight python %}
+stream_execution_environment.add_python_file(file_path)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>set_python_requirements(requirements_file_path, requirements_cache_dir=None)</strong></td>
+      <td>
+        <p>Specifies a requirements.txt file which defines the third-party dependencies. These dependencies will be installed to a temporary directory and added to the PYTHONPATH of the python UDF worker. For the dependencies which could not be accessed in the cluster, a directory which contains the installation packages of these dependencies could be specified using the parameter "requirements_cached_dir". It will be uploaded to the cluster to support offline installation.</p>
+{% highlight python %}
+# commands executed in shell
+echo numpy==1.16.5 > requirements.txt
+pip download -d cached_dir -r requirements.txt --no-binary :all:
+
+# python code
+stream_execution_environment.set_python_requirements("requirements.txt", "cached_dir")
+{% endhighlight %}
+        <p>Please make sure the installation packages matches the platform of the cluster and the python version used. These packages will be installed using pip, so also make sure the version of Pip (version >= 7.1.0) and the version of SetupTools (version >= 37.0.0).</p>

Review comment:
       SetupTools -> `Setuptools`

##########
File path: docs/dev/python/user-guide/table/dependency_management.zh.md
##########
@@ -61,49 +61,49 @@ table_env.add_python_file(file_path)
     <tr>
       <td><strong>set_python_requirements(requirements_file_path, requirements_cache_dir=None)</strong></td>
       <td>
-        <p>Specifies a requirements.txt file which defines the third-party dependencies. These dependencies will be installed to a temporary directory and added to the PYTHONPATH of the python UDF worker. For the dependencies which could not be accessed in the cluster, a directory which contains the installation packages of these dependencies could be specified using the parameter "requirements_cached_dir". It will be uploaded to the cluster to support offline installation.</p>
+        <p>Configures a requirements.txt file that specifies the third-party Python dependencies. These dependencies are installed into a temporary directory and added to the PYTHONPATH of the Python Worker. For external dependencies that cannot be accessed in the cluster, users can specify a directory containing the installation packages of these dependencies via the "requirements_cached_dir" parameter; this directory is uploaded to the cluster to enable offline installation.</p>
 {% highlight python %}
-# commands executed in shell
+# shell commands

Review comment:
       Execute the following shell commands

##########
File path: docs/dev/python/user-guide/table/dependency_management.zh.md
##########
@@ -61,49 +61,49 @@ table_env.add_python_file(file_path)
     <tr>
       <td><strong>set_python_requirements(requirements_file_path, requirements_cache_dir=None)</strong></td>
       <td>
-        <p>Specifies a requirements.txt file which defines the third-party dependencies. These dependencies will be installed to a temporary directory and added to the PYTHONPATH of the python UDF worker. For the dependencies which could not be accessed in the cluster, a directory which contains the installation packages of these dependencies could be specified using the parameter "requirements_cached_dir". It will be uploaded to the cluster to support offline installation.</p>
+        <p>Configures a requirements.txt file that specifies the third-party Python dependencies. These dependencies are installed into a temporary directory and added to the PYTHONPATH of the Python Worker. For external dependencies that cannot be accessed in the cluster, users can specify a directory containing the installation packages of these dependencies via the "requirements_cached_dir" parameter; this directory is uploaded to the cluster to enable offline installation.</p>
 {% highlight python %}
-# commands executed in shell
+# shell commands
 echo numpy==1.16.5 > requirements.txt
 pip download -d cached_dir -r requirements.txt --no-binary :all:
 
-# python code
+# Python code

Review comment:
       Python code -> python code

##########
File path: docs/dev/python/user-guide/datastream/dependency_management.zh.md
##########
@@ -0,0 +1,96 @@
+---
+title: "Dependency Management"
+nav-parent_id: python_datastream_api
+nav-pos: 40
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Python Dependency Management
+If a Python DataStream program uses third-party Python dependencies, users can configure them with the following APIs, or directly through [command line arguments]({% link ops/cli.zh.md %}#usage) when submitting the job.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">APIs</th>
+      <th class="text-left">Description</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td><strong>add_python_file(file_path)</strong></td>
+      <td>
+        <p>Adds Python file dependencies, which can be Python files, Python packages, or local directories. They are added to the PYTHONPATH of the Python Worker so that Python functions can access them correctly.</p>
+{% highlight python %}
+stream_execution_environment.add_python_file(file_path)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>set_python_requirements(requirements_file_path, requirements_cache_dir=None)</strong></td>
+      <td>
+        <p>Configures a requirements.txt file that specifies the third-party Python dependencies. These dependencies are installed into a temporary directory and added to the PYTHONPATH of the Python Worker. For external dependencies that cannot be accessed in the cluster, users can specify a directory containing the installation packages of these dependencies via the "requirements_cached_dir" parameter; this directory is uploaded to the cluster to enable offline installation.</p>
+{% highlight python %}
+# shell commands
+echo numpy==1.16.5 > requirements.txt
+pip download -d cached_dir -r requirements.txt --no-binary :all:
+
+# Python code
+stream_execution_environment.set_python_requirements("requirements.txt", "cached_dir")
+{% endhighlight %}
+        <p>Please make sure these installation packages match the cluster's runtime environment and the Python version. In addition, these dependencies will be installed with Pip, so please make sure the version of Pip (version >= 7.1.0) and the version of Setup Tools (version >= 37.0.0) are compatible with the installation packages.</p>

Review comment:
       SetupTools -> `Setuptools`

##########
File path: docs/dev/python/user-guide/table/dependency_management.zh.md
##########
@@ -61,49 +61,49 @@ table_env.add_python_file(file_path)
     <tr>
       <td><strong>set_python_requirements(requirements_file_path, requirements_cache_dir=None)</strong></td>
       <td>
-        <p>Specifies a requirements.txt file which defines the third-party dependencies. These dependencies will be installed to a temporary directory and added to the PYTHONPATH of the python UDF worker. For the dependencies which could not be accessed in the cluster, a directory which contains the installation packages of these dependencies could be specified using the parameter "requirements_cached_dir". It will be uploaded to the cluster to support offline installation.</p>
+        <p>Configures a requirements.txt file that specifies the third-party Python dependencies. These dependencies are installed into a temporary directory and added to the PYTHONPATH of the Python Worker. For external dependencies that cannot be accessed in the cluster, users can specify a directory containing the installation packages of these dependencies via the "requirements_cached_dir" parameter; this directory is uploaded to the cluster to enable offline installation.</p>
 {% highlight python %}
-# commands executed in shell
+# shell commands
 echo numpy==1.16.5 > requirements.txt
 pip download -d cached_dir -r requirements.txt --no-binary :all:
 
-# python code
+# Python code
 table_env.set_python_requirements("requirements.txt", "cached_dir")
 {% endhighlight %}
-        <p>Please make sure the installation packages matches the platform of the cluster and the python version used. These packages will be installed using pip, so also make sure the version of Pip (version >= 7.1.0) and the version of SetupTools (version >= 37.0.0).</p>
+        <p>Please make sure these installation packages match the cluster's runtime environment and the Python version. In addition, these dependencies will be installed with Pip, so please make sure the version of Pip (version >= 7.1.0) and the version of Setup Tools (version >= 37.0.0) are compatible with the installation packages.</p>

Review comment:
       SetupTools -> Setuptools
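Related to the Pip/Setuptools requirement these comments touch on (Pip >= 7.1.0, Setuptools >= 37.0.0): a minimal sketch of checking an environment against those minimums before relying on `set_python_requirements`, assuming Python 3.8+ for `importlib.metadata` and that it is run with the same interpreter the Python workers use. The helper names (`parse_release`, `meets_minimum`, `check_build_tools`) are hypothetical and not part of PyFlink. The version parsing is deliberately simplified and ignores pre-release/dev suffixes; `packaging.version.Version` is the more precise choice where that library is available.

```python
import re
from importlib import metadata

# Minimum versions quoted in the dependency-management docs under review.
MINIMUM_VERSIONS = {"pip": "7.1.0", "setuptools": "37.0.0"}


def parse_release(version):
    """Return the leading numeric release segment, e.g. "20.2.3" -> (20, 2, 3).

    Simplified (assumption): any suffix such as "rc1" or ".dev0" is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Drop trailing zeros so that "7.1" and "7.1.0" compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed release is greater than or equal to the minimum."""
    return parse_release(installed) >= parse_release(minimum)


def check_build_tools(minimums=MINIMUM_VERSIONS):
    """Map each tool name to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this interpreter's environment.
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for tool, (version, ok) in check_build_tools().items():
        status = "OK" if ok else "too old or missing"
        print(f"{tool}: {version or 'not installed'} -> {status}")
```

Tuples compare element by element, which is why the trailing zeros are stripped first: otherwise `(7, 1)` would sort below `(7, 1, 0)`.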