Hi, I have a PyFlink job that consists of:
- Multiple Python files.
- Multiple 3rd-party Python dependencies, specified in a `requirements.txt` file.
- A few Java dependencies, mainly for external connectors.
- An overall job config YAML file.

Here's a simplified structure of the code layout:

flink/
├── deps
│   ├── jar
│   │   ├── flink-connector-kafka_2.11-1.12.2.jar
│   │   └── kafka-clients-2.4.1.jar
│   └── pip
│       └── requirements.txt
├── conf
│   └── job.yaml
└── job
    ├── some_file_x.py
    ├── some_file_y.py
    └── main.py

I'm able to execute this job locally, i.e. by invoking something like:

python main.py --config <path_to_job_yaml>

I'm loading the jars inside the Python code using env.add_jars(...) (see the sketch at the end of this mail).

Now, the next step is to submit this job to a Flink cluster running on K8s, and I'm looking for any best practices in packaging and specifying dependencies that people tend to follow.

As per the documentation here [1], the various Python files, including the config YAML, can be specified using the --pyFiles option, and Java dependencies can be specified using the --jarfile option. So, how can I specify 3rd-party Python package dependencies? According to another piece of documentation here [2], I should be able to specify the requirements.txt directly inside the code and submit it via the --pyFiles option. Is that right?

Are there any other best practices folks use to package/submit jobs?

Thanks,
Sumeet

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
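
P.S. For context, this is roughly how the jars are picked up in main.py when running locally. It's a minimal sketch; the exact jar paths below are illustrative (relative to the layout above), not the literal ones in my code.

from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Connector jars are referenced as file:// URLs pointing into deps/jar.
env.add_jars(
    "file:///path/to/flink/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
    "file:///path/to/flink/deps/jar/kafka-clients-2.4.1.jar",
)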