Hi Sumeet,

For the Python dependencies, multiple ways to specify them are provided, and you can use whichever one suits you.

Regarding requirements.txt, there are three ways to specify it, and any one of them is enough:

- API inside the code: set_python_requirements
- command line option: -pyreq [1]
- configuration: python.requirements

So you don't need to specify it both inside the code and via the command line option.
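As a minimal sketch of the first option, assuming PyFlink 1.12 and the deps/pip/requirements.txt path from your layout (the environment setup here is only illustrative):

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Illustrative environment setup; adapt to your actual job.
    env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    table_env = TableEnvironment.create(env_settings)

    # Ships the requirements file with the job; the listed packages are
    # installed on the cluster before any Python UDF runs. This replaces
    # the `-pyreq deps/pip/requirements.txt` command line option.
    table_env.set_python_requirements("deps/pip/requirements.txt")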
PS: It seems that -pyreq is missing from the latest CLI documentation; however, it is actually still supported, so you can refer to the 1.11 documentation [1] for now. I'll try to add it back ASAP.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html

Regards,
Dian

> On Apr 29, 2021, at 3:24 PM, Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>
> Hi Till,
>
> There's no problem with the documented approach. I was asking whether there are any standardized ways of organizing, packaging and deploying Python code on a Flink cluster.
>
> Thanks,
> Sumeet
>
>
> On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Sumeet,
>
> Is there a problem with the documented approaches on how to submit the Python program (not working), or are you asking in general? Given the documentation, I would assume that you can configure the requirements.txt via `set_python_requirements`.
>
> I am also pulling in Dian, who might be able to tell you more about the Python deployment options.
>
> If you are not running on a session cluster, then you can also create a K8s image which contains your user code. That way you ship your job when deploying the cluster.
>
> Cheers,
> Till
>
> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
> Hi,
>
> I have a PyFlink job that consists of:
>
> - Multiple Python files.
> - Multiple third-party Python dependencies, specified in a `requirements.txt` file.
> - A few Java dependencies, mainly for external connectors.
> - An overall job config YAML file.
>
> Here's a simplified structure of the code layout:
>
> flink/
> ├── deps
> │   ├── jar
> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
> │   │   └── kafka-clients-2.4.1.jar
> │   └── pip
> │       └── requirements.txt
> ├── conf
> │   └── job.yaml
> └── job
>     ├── some_file_x.py
>     ├── some_file_y.py
>     └── main.py
>
> I'm able to execute this job by running it locally, i.e. by invoking something like:
>
> python main.py --config <path_to_job_yaml>
>
> I'm loading the jars inside the Python code, using env.add_jars(...).
>
> Now, the next step is to submit this job to a Flink cluster running on K8s. I'm looking for any best practices in packaging and specifying dependencies that people tend to follow. As per the documentation here [1], various Python files, including the conf YAML, can be specified using the --pyFiles option, and Java dependencies can be specified using the --jarfile option.
>
> So, how can I specify third-party Python package dependencies? According to another piece of documentation here [2], I should be able to specify the requirements.txt directly inside the code and submit it via the --pyFiles option. Is that right?
>
> Are there any other best practices folks use to package/submit jobs?
>
> Thanks,
> Sumeet
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
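Putting the pieces from this thread together, a hypothetical main.py for the layout above could look like the sketch below. The file:// jar locations are assumptions (add_jars expects URLs), and the set_python_requirements call can be dropped if you pass -pyreq on the command line instead:

    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()

    # Java connector dependencies, as with the env.add_jars(...) call
    # mentioned above; the paths below are only illustrative.
    env.add_jars(
        "file:///opt/flink/usrlib/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
        "file:///opt/flink/usrlib/deps/jar/kafka-clients-2.4.1.jar",
    )

    table_env = StreamTableEnvironment.create(env)

    # Third-party Python dependencies; use either this call or the -pyreq
    # command line option, not both.
    table_env.set_python_requirements("deps/pip/requirements.txt")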