Thanks for updating the documentation, Dian. Appreciate it.

..Sumeet

On Sun, May 2, 2021 at 10:53 AM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Sumeet,
>
> FYI: the documentation about the CLI options of PyFlink has already been
> updated [1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>
> Regards,
> Dian
>
>
> On Thu, Apr 29, 2021 at 4:46 PM Dian Fu <dian0511...@gmail.com> wrote:
>
>> Hi Sumeet,
>>
>> For the Python dependencies, multiple ways have been provided to specify
>> them, and you can use any one of them.
>>
>> Regarding requirements.txt, there are 3 ways provided and you can
>> specify it via any one of them:
>> - API inside the code: set_python_requirements
>> - command line option: -pyreq [1]
>> - configuration: python.requirements
>>
>> So you don't need to specify it both inside the code and via the
>> command line option; any one of these is enough.
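>>
>> For example, here is a minimal sketch of the API approach (untested; it
>> assumes a Table API job and that requirements.txt sits next to the
>> script):
>>
>> from pyflink.datastream import StreamExecutionEnvironment
>> from pyflink.table import StreamTableEnvironment
>>
>> env = StreamExecutionEnvironment.get_execution_environment()
>> t_env = StreamTableEnvironment.create(env)
>>
>> # Ship the third-party Python dependencies listed in requirements.txt.
>> # This is equivalent to passing -pyreq on the command line or setting
>> # the python.requirements configuration option.
>> t_env.set_python_requirements("requirements.txt")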
>>
>> PS: It seems that -pyreq is missing from the latest CLI documentation.
>> However, the option itself still exists, so you can refer to the 1.11
>> documentation [1] for now. I'll try to add it back ASAP.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html
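>>
>> For reference, a submission command along these lines should work (only
>> a sketch; the paths follow the layout you posted, and any program
>> arguments such as --config are appended after the options):
>>
>> ./bin/flink run \
>>   -py job/main.py \
>>   -pyfs job/,conf/job.yaml \
>>   -pyreq deps/pip/requirements.txt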
>>
>> Regards,
>> Dian
>>
>> On Thu, Apr 29, 2021 at 3:24 PM Sumeet Malhotra <sumeet.malho...@gmail.com> wrote:
>>
>> Hi Till,
>>
>> There's no problem with the documented approach. I was asking whether
>> there are any standardized ways of organizing, packaging and deploying
>> Python code on a Flink cluster.
>>
>> Thanks,
>> Sumeet
>>
>>
>>
>> On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Sumeet,
>>>
>>> Is there a problem with the documented approaches for submitting the
>>> Python program (i.e. it is not working), or are you asking in general? Given the
>>> documentation, I would assume that you can configure the requirements.txt
>>> via `set_python_requirements`.
>>>
>>> I am also pulling in Dian who might be able to tell you more about the
>>> Python deployment options.
>>>
>>> If you are not running on a session cluster, then you can also create a
>>> K8s image which contains your user code. That way you ship your job when
>>> deploying the cluster.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <
>>> sumeet.malho...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a PyFlink job that consists of:
>>>>
>>>>    - Multiple Python files.
>>>>    - Multiple third-party Python dependencies, specified in a
>>>>    `requirements.txt` file.
>>>>    - A few Java dependencies, mainly for external connectors.
>>>>    - An overall job config YAML file.
>>>>
>>>> Here's a simplified structure of the code layout.
>>>>
>>>> flink/
>>>> ├── deps
>>>> │   ├── jar
>>>> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
>>>> │   │   └── kafka-clients-2.4.1.jar
>>>> │   └── pip
>>>> │       └── requirements.txt
>>>> ├── conf
>>>> │   └── job.yaml
>>>> └── job
>>>>     ├── some_file_x.py
>>>>     ├── some_file_y.py
>>>>     └── main.py
>>>>
>>>> I'm able to execute this job by running it locally, i.e. by invoking
>>>> something like:
>>>>
>>>> python main.py --config <path_to_job_yaml>
>>>>
>>>> I'm loading the jars inside the Python code, using env.add_jars(...).
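>>>>
>>>> Roughly like this (just an illustrative snippet; the file:/// paths are
>>>> placeholders for wherever the jars actually live):
>>>>
>>>> env.add_jars(
>>>>     # "/path/to/flink" is a placeholder for the absolute path to this project
>>>>     "file:///path/to/flink/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
>>>>     "file:///path/to/flink/deps/jar/kafka-clients-2.4.1.jar",
>>>> )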
>>>>
>>>> Now, the next step is to submit this job to a Flink cluster running on
>>>> K8S. I'm looking for any best practices in packaging and specifying
>>>> dependencies that people tend to follow. As per the documentation here [1],
>>>> various Python files, including the conf YAML, can be specified using the
>>>> --pyFiles option and Java dependencies can be specified using --jarfile
>>>> option.
>>>>
>>>> So, how can I specify third-party Python package dependencies? According
>>>> to another piece of documentation here [2], I should be able to specify the
>>>> requirements.txt directly inside the code and submit it via the --pyFiles
>>>> option. Is that right?
>>>>
>>>> Are there any other best practices folks use to package/submit jobs?
>>>>
>>>> Thanks,
>>>> Sumeet
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>>>> [2]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
>>>>
>>>
>>
