Hi Sumeet,

For the Python dependencies, there are multiple ways to specify them, and you 
can use whichever one fits your setup.

Regarding requirements.txt, there are three ways to specify it, and any one of 
them is sufficient:
- API inside the code: set_python_requirements
- command line option: -pyreq [1]
- configuration: python.requirements

So you don't need to specify it both inside the code and via the command line 
option; one of the three is enough.
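
For example, the API approach looks roughly like the following. This is just a 
minimal sketch; the requirements path is illustrative, based on the layout in 
your mail:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    t_env = TableEnvironment.create(env_settings)
    # The packages listed in requirements.txt will be installed on the
    # workers before the Python UDFs are executed.
    t_env.set_python_requirements("deps/pip/requirements.txt")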

PS: The -pyreq option seems to be missing from the latest CLI documentation; 
however, the option itself still works, so please refer to the 1.11 
documentation [1] for now. I'll try to add it back to the docs ASAP.
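
With the command line option, the submission could look something like this 
(the paths and the combination of flags are illustrative, matching your layout):

    ./bin/flink run \
        --python job/main.py \
        --pyFiles job/,conf/job.yaml \
        --pyRequirements deps/pip/requirements.txt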

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html

Regards,
Dian

> On Apr 29, 2021, at 3:24 PM, Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
> 
> Hi Till,
> 
> There's no problem with the documented approach. I was wondering if there are 
> any standardized ways of organizing, packaging and deploying Python code on a 
> Flink cluster.
> 
> Thanks,
> Sumeet
> 
> 
> 
> On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Sumeet,
> 
> Is there a problem with the documented approaches for submitting the Python 
> program (i.e., they don't work), or are you asking in general? Given the 
> documentation, I would assume that you can configure the requirements.txt via 
> `set_python_requirements`.
> 
> I am also pulling in Dian who might be able to tell you more about the Python 
> deployment options.
> 
> If you are not running on a session cluster, then you can also create a K8s 
> image which contains your user code. That way you ship your job when 
> deploying the cluster.
> 
> Cheers,
> Till
> 
> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
> Hi,
> 
> I have a PyFlink job that consists of:
> - Multiple Python files.
> - Multiple third-party Python dependencies, specified in a `requirements.txt` file.
> - A few Java dependencies, mainly for external connectors.
> - An overall job config YAML file.
> 
> Here's a simplified structure of the code layout:
> 
> flink/
> ├── deps
> │   ├── jar
> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
> │   │   └── kafka-clients-2.4.1.jar
> │   └── pip
> │       └── requirements.txt
> ├── conf
> │   └── job.yaml
> └── job
>     ├── some_file_x.py
>     ├── some_file_y.py
>     └── main.py
> 
> I'm able to execute this job locally by invoking something like:
> 
> python main.py --config <path_to_job_yaml>
> 
> I'm loading the jars inside the Python code, using env.add_jars(...).
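> 
> Roughly, that looks like this (the file:// paths are illustrative):
> 
>     env.add_jars("file:///path/to/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
>                  "file:///path/to/deps/jar/kafka-clients-2.4.1.jar")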
> 
> Now, the next step is to submit this job to a Flink cluster running on K8S. 
> I'm looking for any best practices in packaging and specifying dependencies 
> that people tend to follow. As per the documentation here [1], various Python 
> files, including the conf YAML, can be specified using the --pyFiles option 
> and Java dependencies can be specified using the --jarfile option.
> 
> So, how can I specify third-party Python package dependencies? According to 
> another piece of documentation here [2], I should be able to specify the 
> requirements.txt directly inside the code and submit it via the --pyFiles 
> option. Is that right?
> 
> Are there any other best practices folks use to package/submit jobs?
> 
> Thanks,
> Sumeet
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
