Hi Till,

There's no problem with the documented approach. I was asking whether there are any standardized ways of organizing, packaging and deploying Python code on a Flink cluster.
Thanks,
Sumeet

On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Sumeet,
>
> Is there a problem with the documented approaches on how to submit the
> Python program (not working), or are you asking in general? Given the
> documentation, I would assume that you can configure the requirements.txt
> via `set_python_requirements`.
>
> I am also pulling in Dian, who might be able to tell you more about the
> Python deployment options.
>
> If you are not running on a session cluster, then you can also create a
> K8s image which contains your user code. That way you ship your job when
> deploying the cluster.
>
> Cheers,
> Till
>
> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a PyFlink job that consists of:
>>
>> - Multiple Python files.
>> - Multiple third-party Python dependencies, specified in a
>>   `requirements.txt` file.
>> - A few Java dependencies, mainly for external connectors.
>> - An overall job config YAML file.
>>
>> Here's a simplified structure of the code layout:
>>
>> flink/
>> ├── deps
>> │   ├── jar
>> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
>> │   │   └── kafka-clients-2.4.1.jar
>> │   └── pip
>> │       └── requirements.txt
>> ├── conf
>> │   └── job.yaml
>> └── job
>>     ├── some_file_x.py
>>     ├── some_file_y.py
>>     └── main.py
>>
>> I'm able to execute this job locally by invoking something like:
>>
>>     python main.py --config <path_to_job_yaml>
>>
>> I'm loading the jars inside the Python code using env.add_jars(...).
>>
>> Now, the next step is to submit this job to a Flink cluster running on
>> K8s. I'm looking for any best practices in packaging and specifying
>> dependencies that people tend to follow. As per the documentation here [1],
>> various Python files, including the conf YAML, can be specified using the
>> --pyFiles option, and Java dependencies can be specified using the
>> --jarfile option.
>>
>> So, how can I specify third-party Python package dependencies? According to
>> another piece of documentation here [2], I should be able to specify the
>> requirements.txt directly inside the code and submit it via the --pyFiles
>> option. Is that right?
>>
>> Are there any other best practices folks use to package/submit jobs?
>>
>> Thanks,
>> Sumeet
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
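For reference, here is a sketch of how the layout from the question could map onto the Flink 1.12 CLI flags that the thread's [1] documents (`--python`, `--pyFiles`, `--pyRequirements`, `--jarfile`). The snippet only assembles and prints the submission command; it does not require a running cluster. Shipping `job.yaml` via `--pyFiles` follows the email's own reading of [1], and loading the second jar via `env.add_jars(...)` inside `main.py` follows what the question already describes; treat the exact paths as assumptions for your own layout.

```python
# Sketch: building a `flink run` submission command for the layout described
# in the email, using CLI flags from the Flink 1.12 docs ([1] in the thread).
import shlex

root = "flink"  # assumed root of the layout shown in the email

# Extra Python files and resources, shipped via --pyFiles (comma-separated).
pyfiles = ",".join([
    f"{root}/job/some_file_x.py",
    f"{root}/job/some_file_y.py",
    f"{root}/conf/job.yaml",  # the conf YAML rides along per [1]
])

cmd = [
    "flink", "run",
    "--python", f"{root}/job/main.py",
    "--pyFiles", pyfiles,
    # Third-party pip dependencies, installed on the cluster before execution.
    "--pyRequirements", f"{root}/deps/pip/requirements.txt",
    # One jar via --jarfile; the kafka-clients jar is loaded from inside
    # main.py via env.add_jars(...), as the email already does.
    "--jarfile", f"{root}/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
    # Arguments after the Flink options go to main.py itself.
    "--config", f"{root}/conf/job.yaml",
]

print(" ".join(shlex.quote(c) for c in cmd))
```

The same settings can equivalently be made inside the program, as Till suggests, via `set_python_requirements` and `add_python_file` on the table environment, which is convenient when the job should stay submittable with a plain `flink run --python main.py`.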