Thanks for updating the documentation, Dian. Appreciate it.

..Sumeet
On Sun, May 2, 2021 at 10:53 AM Dian Fu <dian0511...@gmail.com> wrote:
> Hi Sumeet,
>
> FYI: the documentation about the CLI options of PyFlink has already been
> updated [1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>
> Regards,
> Dian
>
>
> On Thu, Apr 29, 2021 at 4:46 PM Dian Fu <dian0511...@gmail.com> wrote:
>
>> Hi Sumeet,
>>
>> Several ways are provided for specifying Python dependencies, and you can
>> use any one of them.
>>
>> For requirements.txt, there are three ways, and you can specify it via any
>> one of them:
>> - API inside the code: set_python_requirements
>> - command line option: -pyreq [1]
>> - configuration: python.requirements
>>
>> So you don’t need to specify it both inside the code and via the command
>> line options.
>>
>> PS: -pyreq seems to be missing from the latest CLI documentation, but the
>> option does exist; you can refer to the 1.11 documentation for now. I’ll
>> try to add it back ASAP.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html
>>
>> Regards,
>> Dian
>>
>> On Apr 29, 2021, at 3:24 PM, Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>>
>> Hi Till,
>>
>> There’s no problem with the documented approach. I was looking to see
>> whether there were any standardized ways of organizing, packaging and
>> deploying Python code on a Flink cluster.
>>
>> Thanks,
>> Sumeet
>>
>>
>>
>> On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Sumeet,
>>>
>>> Is there a problem with the documented approaches for submitting the
>>> Python program (i.e. they don't work), or are you asking in general?
>>> Given the documentation, I would assume that you can configure the
>>> requirements.txt via `set_python_requirements`.
>>>
>>> I am also pulling in Dian, who might be able to tell you more about the
>>> Python deployment options.
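Dian's three options for requirements.txt can be made concrete with a small Python sketch. It only builds the value each route expects and submits nothing. The option names are the ones quoted in the thread (-pyreq, python.requirements, set_python_requirements); the deps/pip/requirements.txt path is borrowed from the layout in Sumeet's original message, and the helper functions are hypothetical.

```python
# Sketch: the three equivalent ways Dian lists for shipping requirements.txt
# with a PyFlink job. Only ONE of them is needed per job.
# Assumption: the path below mirrors Sumeet's layout; helpers are hypothetical.
from typing import Dict, List

REQUIREMENTS = "deps/pip/requirements.txt"


def requirements_cli_args(path: str) -> List[str]:
    """Way 2: the command-line option appended to `flink run`."""
    return ["-pyreq", path]


def requirements_conf_entry(path: str) -> Dict[str, str]:
    """Way 3: the configuration key."""
    return {"python.requirements": path}


def as_flink_conf_yaml(entries: Dict[str, str]) -> str:
    """Render entries in flink-conf.yaml's flat `key: value` form."""
    return "\n".join(f"{key}: {value}" for key, value in entries.items())


# Way 1 is the in-code API. It needs PyFlink and a TableEnvironment, so it
# is only shown here, not executed:
#     t_env.set_python_requirements("deps/pip/requirements.txt")

if __name__ == "__main__":
    print(requirements_cli_args(REQUIREMENTS))
    print(as_flink_conf_yaml(requirements_conf_entry(REQUIREMENTS)))
```

Whichever form is chosen, Dian's point stands: the three routes are equivalent, so there is no need to combine them.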
>>>
>>> If you are not running on a session cluster, then you can also create a
>>> K8s image which contains your user code. That way you ship your job when
>>> deploying the cluster.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a PyFlink job that consists of:
>>>>
>>>> - Multiple Python files.
>>>> - Multiple third-party Python dependencies, specified in a
>>>>   `requirements.txt` file.
>>>> - A few Java dependencies, mainly for external connectors.
>>>> - An overall job config YAML file.
>>>>
>>>> Here's a simplified structure of the code layout.
>>>>
>>>> flink/
>>>> ├── deps
>>>> │   ├── jar
>>>> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
>>>> │   │   └── kafka-clients-2.4.1.jar
>>>> │   └── pip
>>>> │       └── requirements.txt
>>>> ├── conf
>>>> │   └── job.yaml
>>>> └── job
>>>>     ├── some_file_x.py
>>>>     ├── some_file_y.py
>>>>     └── main.py
>>>>
>>>> I'm able to run this job locally, i.e. by invoking something like:
>>>>
>>>>     python main.py --config <path_to_job_yaml>
>>>>
>>>> I'm loading the jars inside the Python code, using env.add_jars(...).
>>>>
>>>> Now, the next step is to submit this job to a Flink cluster running on
>>>> K8s. I'm looking for best practices that people tend to follow for
>>>> packaging and specifying dependencies. As per the documentation here [1],
>>>> the various Python files, including the conf YAML, can be specified using
>>>> the --pyFiles option, and Java dependencies can be specified using the
>>>> --jarfile option.
>>>>
>>>> So, how can I specify third-party Python package dependencies? According
>>>> to another piece of documentation here [2], I should be able to specify
>>>> the requirements.txt directly inside the code and submit it via the
>>>> --pyFiles option. Is that right?
>>>>
>>>> Are there any other best practices folks use to package/submit jobs?
>>>>
>>>> Thanks,
>>>> Sumeet
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>>>> [2]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
>>>>
>>>
>>
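Putting the answers in this thread together for Sumeet's layout, a submission might look like the following Python sketch, which assembles the `flink run` arguments and prints the shell command. Several things here are assumptions, not statements from the thread: the paths are relative to the flink/ directory, `flink` is on the PATH, the job/ and conf/ directories are shipped whole via --pyFiles, the JARs stay loaded in code via env.add_jars(...) so no --jarfile is passed, and arguments placed after the Flink options are forwarded to main.py. The helper flink_run_args is hypothetical.

```python
# Sketch: build a `flink run` command line for the layout in this thread.
# Assumptions (hypothetical, not from the thread): relative paths, `flink`
# on PATH, JARs still added in code via env.add_jars(...).
import shlex
from typing import List, Sequence


def flink_run_args(
    entry_script: str,
    py_files: Sequence[str],
    requirements: str,
    program_args: Sequence[str] = (),
) -> List[str]:
    """Return the argv list for submitting a PyFlink job with `flink run`."""
    args = ["flink", "run", "--python", entry_script]
    if py_files:
        # --pyFiles takes one comma-separated list of files or directories.
        args += ["--pyFiles", ",".join(py_files)]
    # -pyreq: third-party dependencies listed in requirements.txt.
    args += ["-pyreq", requirements]
    # Assumed to be passed through to main.py after the Flink options.
    args += list(program_args)
    return args


if __name__ == "__main__":
    argv = flink_run_args(
        entry_script="job/main.py",
        py_files=["job", "conf"],
        requirements="deps/pip/requirements.txt",
        program_args=["--config", "conf/job.yaml"],
    )
    print(shlex.join(argv))
```

With these inputs the printed command is `flink run --python job/main.py --pyFiles job,conf -pyreq deps/pip/requirements.txt --config conf/job.yaml`. For a non-session deployment, Till's suggestion of baking the same files into the K8s image is an alternative to shipping them at submission time.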