Alright, then let's see what Dian recommends.

Cheers,
Till
On Thu, Apr 29, 2021 at 9:25 AM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:

> Hi Till,
>
> There's no problem with the documented approach. I was asking whether
> there are any standardized ways of organizing, packaging and deploying
> Python code on a Flink cluster.
>
> Thanks,
> Sumeet
>
> On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Sumeet,
>>
>> Is there a problem with the documented approaches for submitting the
>> Python program (i.e. they are not working), or are you asking in
>> general? Given the documentation, I would assume that you can configure
>> the requirements.txt via `set_python_requirements`.
>>
>> I am also pulling in Dian, who might be able to tell you more about the
>> Python deployment options.
>>
>> If you are not running on a session cluster, then you can also create a
>> K8s image which contains your user code. That way you ship your job
>> when deploying the cluster.
>>
>> Cheers,
>> Till
>>
>> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a PyFlink job that consists of:
>>>
>>> - Multiple Python files.
>>> - Multiple 3rd-party Python dependencies, specified in a
>>>   `requirements.txt` file.
>>> - A few Java dependencies, mainly for external connectors.
>>> - An overall job config YAML file.
>>>
>>> Here's a simplified structure of the code layout:
>>>
>>> flink/
>>> ├── deps
>>> │   ├── jar
>>> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
>>> │   │   └── kafka-clients-2.4.1.jar
>>> │   └── pip
>>> │       └── requirements.txt
>>> ├── conf
>>> │   └── job.yaml
>>> └── job
>>>     ├── some_file_x.py
>>>     ├── some_file_y.py
>>>     └── main.py
>>>
>>> I'm able to execute this job by running it locally, i.e. invoking
>>> something like:
>>>
>>> python main.py --config <path_to_job_yaml>
>>>
>>> I'm loading the jars inside the Python code, using env.add_jars(...).
>>>
>>> Now, the next step is to submit this job to a Flink cluster running on
>>> K8s. I'm looking for any best practices in packaging and specifying
>>> dependencies that people tend to follow. As per the documentation here
>>> [1], the various Python files, including the config YAML, can be
>>> specified using the --pyFiles option, and Java dependencies can be
>>> specified using the --jarfile option.
>>>
>>> So, how can I specify 3rd-party Python package dependencies? According
>>> to another piece of documentation here [2], I should be able to specify
>>> the requirements.txt directly inside the code and submit it via the
>>> --pyFiles option. Is that right?
>>>
>>> Are there any other best practices folks use to package/submit jobs?
>>>
>>> Thanks,
>>> Sumeet
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>>> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
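To make the `set_python_requirements` suggestion concrete, here is a minimal sketch, assuming the Flink 1.12 Table API and the directory layout from the original mail (the absolute paths are illustrative). The `pipeline.jars` setting is the Table API counterpart of the `env.add_jars(...)` call mentioned above:

from pyflink.table import EnvironmentSettings, TableEnvironment

# Flink 1.12 Table API, blink planner in streaming mode.
env_settings = EnvironmentSettings.new_instance() \
    .in_streaming_mode().use_blink_planner().build()
t_env = TableEnvironment.create(env_settings)

# Ship the 3rd-party Python dependencies; Flink installs them on each
# worker before any Python UDF runs.
t_env.set_python_requirements(
    requirements_file_path="/path/to/deps/pip/requirements.txt")

# Jar dependencies, as ';'-separated file:// URLs.
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///path/to/deps/jar/flink-connector-kafka_2.11-1.12.2.jar;"
    "file:///path/to/deps/jar/kafka-clients-2.4.1.jar")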
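The submission against a native K8s session cluster might then look roughly like this, using the CLI options from [1] (the cluster id is a placeholder, and paths are relative to the project root). Note that --jarfile takes a single jar, which is one reason to prefer setting pipeline.jars in code as sketched above:

./bin/flink run \
    --target kubernetes-session \
    -Dkubernetes.cluster-id=<your-cluster-id> \
    --pyFiles job/,conf/job.yaml \
    --pyRequirements deps/pip/requirements.txt \
    --python job/main.py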
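For the non-session route Till mentions (baking the user code into a K8s image), a Dockerfile along the lines of the 1.12 Docker documentation could look like the sketch below. The base image tag, target paths and Python version are assumptions; the official Flink image ships without Python, so it has to be installed first:

FROM flink:1.12.2-scala_2.11

# Install Python 3 and PyFlink (not included in the official image).
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/bin/python3 /usr/bin/python
RUN pip3 install apache-flink==1.12.2

# Pre-install the 3rd-party Python deps instead of shipping them per job.
COPY deps/pip/requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt

# Bake the job code, config and connector jars into the image.
COPY job/ /opt/flink/usrlib/job/
COPY conf/ /opt/flink/usrlib/conf/
COPY deps/jar/*.jar /opt/flink/lib/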