Hi Till,

There’s no problem with the documented approach. I was asking whether
there are any standardized ways of organizing, packaging, and deploying
Python code on a Flink cluster.

Thanks,
Sumeet



On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Sumeet,
>
> Is there a problem with the documented approaches for submitting the
> Python program (i.e. they are not working), or are you asking in
> general? Given the documentation, I would assume that you can configure
> the requirements.txt via `set_python_requirements`.
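>
> For example, something along these lines should work (an untested sketch
> against the 1.12 Table API; the requirements.txt path is illustrative):
>
> from pyflink.table import EnvironmentSettings, StreamTableEnvironment
>
> settings = EnvironmentSettings.new_instance() \
>     .in_streaming_mode().use_blink_planner().build()
> t_env = StreamTableEnvironment.create(environment_settings=settings)
>
> # ship the third-party deps listed in requirements.txt with the job;
> # they are installed on the cluster side before any Python UDF runs
> t_env.set_python_requirements("deps/pip/requirements.txt")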
>
> I am also pulling in Dian, who might be able to tell you more about the
> Python deployment options.
>
> If you are not running on a session cluster, then you can also create a
> K8s image that contains your user code. That way, you ship your job
> when deploying the cluster.
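>
> A rough sketch of such an image, loosely following the PyFlink Docker
> docs (base tag and target paths are placeholders):
>
> FROM flink:1.12.2-scala_2.11
> # the official image ships without Python, so install it plus PyFlink
> RUN apt-get update -y && apt-get install -y python3 python3-pip && \
>     ln -s /usr/bin/python3 /usr/bin/python
> RUN pip3 install apache-flink==1.12.2
> # bake the job code, config and third-party deps into the image
> COPY deps/pip/requirements.txt /opt/job/requirements.txt
> RUN pip3 install -r /opt/job/requirements.txt
> COPY job/ /opt/job/
> COPY conf/ /opt/job/conf/
> COPY deps/jar/*.jar /opt/flink/lib/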
>
> Cheers,
> Till
>
> On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <
> sumeet.malho...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a PyFlink job that consists of:
>>
>>    - Multiple Python files.
>>    - Multiple third-party Python dependencies, specified in a
>>    `requirements.txt` file.
>>    - A few Java dependencies, mainly for external connectors.
>>    - An overall job config YAML file.
>>
>> Here's a simplified structure of the code layout.
>>
>> flink/
>> ├── deps
>> │   ├── jar
>> │   │   ├── flink-connector-kafka_2.11-1.12.2.jar
>> │   │   └── kafka-clients-2.4.1.jar
>> │   └── pip
>> │       └── requirements.txt
>> ├── conf
>> │   └── job.yaml
>> └── job
>>     ├── some_file_x.py
>>     ├── some_file_y.py
>>     └── main.py
>>
>> I'm able to execute this job locally, i.e. by invoking something
>> like:
>>
>> python main.py --config <path_to_job_yaml>
>>
>> I'm loading the jars from inside the Python code using env.add_jars(...).
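>>
>> For reference, that part looks roughly like this (paths shortened):
>>
>> from pyflink.datastream import StreamExecutionEnvironment
>>
>> env = StreamExecutionEnvironment.get_execution_environment()
>> # add_jars() expects URLs rather than bare paths, hence file://
>> env.add_jars(
>>     "file:///.../deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
>>     "file:///.../deps/jar/kafka-clients-2.4.1.jar",
>> )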
>>
>> Now, the next step is to submit this job to a Flink cluster running on
>> K8s. I'm looking for any best practices in packaging and specifying
>> dependencies that people tend to follow. As per the documentation here [1],
>> various Python files, including the conf YAML, can be specified using the
>> --pyFiles option, and Java dependencies can be specified using the
>> --jarfile option.
>>
>> So, how can I specify third-party Python package dependencies? According
>> to another piece of documentation here [2], I should be able to specify
>> the requirements.txt directly inside the code and submit it via the
>> --pyFiles option. Is that right?
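>>
>> In other words, would a submission along these lines be the expected
>> way to do it (cluster id is made up)?
>>
>> ./bin/flink run \
>>     --target kubernetes-session \
>>     -Dkubernetes.cluster-id=my-flink-cluster \
>>     --python job/main.py \
>>     --pyFiles job/,conf/job.yaml \
>>     --pyRequirements deps/pip/requirements.txt \
>>     --jarfile deps/jar/flink-connector-kafka_2.11-1.12.2.jar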
>>
>> Are there any other best practices folks use to package/submit jobs?
>>
>> Thanks,
>> Sumeet
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program
>>
>
