Hey Jaehyeon,

Docker is the default environment type
<https://github.com/apache/beam/blob/ae8bbf86c9c5951b2685b8400d6ae3fefe678a9a/sdks/python/apache_beam/options/pipeline_options.py#L1481>
when using the PortableRunner. I included those options just for reference
because we found it useful to override the default SDK container with our own.
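
For reference, the override boils down to two pipeline options; the image
name below is just a placeholder, not the one we actually used:

    from apache_beam.options.pipeline_options import PipelineOptions

    # DOCKER is already the default environment_type for the PortableRunner;
    # environment_config is what points it at a custom SDK container image.
    options = PipelineOptions([
        "--environment_type=DOCKER",
        "--environment_config=my-registry/my-beam-python-sdk:latest",  # placeholder
    ])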

It is pretty complicated, and can be especially hard to debug, but we had
good success running some simple pipelines in production for around a year.
I was more wary of maintaining my own Flink cluster, so eventually we
decided to shed the technical debt and pay for Dataflow. Runners already
rely on docker to support the portability framework
<https://beam.apache.org/roadmap/portability/> so I don't think that is
much of a concern.
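
If it helps to picture the moving pieces, a minimal submission through the
portable path looks roughly like this (the job endpoint is a placeholder,
and it assumes a Flink job server is reachable and the task manager hosts
can start Docker containers for the SDK harness):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",  # placeholder Flink job server address
        "--environment_type=DOCKER",      # SDK harness runs in a container on the workers
    ])

    # The job server hands the translated pipeline to Flink; each task manager
    # then starts the SDK harness container to execute the Python transforms.
    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create(["hello", "portable", "flink"])
         | beam.Map(print))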

On Thu, Feb 22, 2024 at 7:49 PM Jaehyeon Kim <dott...@gmail.com> wrote:

> Hi Sam
>
> Thanks for the GitHub repo link. In your example, the environment type is
> set to DOCKER and it requires a docker container running together with the
> task manager. Do you think it is acceptable in a production environment?
>
> Cheers,
> Jaehyeon
>
> On Fri, 23 Feb 2024 at 13:57, Sam Bourne <samb...@gmail.com> wrote:
>
>> I made this a few years ago to help people like yourself.
>>
>> https://github.com/sambvfx/beam-flink-k8s
>>
>> Hopefully it's insightful and I'm happy to accept any MRs to update any
>> outdated information or to flesh it out more.
>>
>> On Thu, Feb 22, 2024 at 3:48 PM Jaehyeon Kim <dott...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm playing with the beam portable runner to read/write data from Kafka.
>>> I see a spark runner example on Kubernetes (
>>> https://beam.apache.org/documentation/runners/spark/#kubernetes) but
>>> the flink runner section doesn't include such an example.
>>>
> >>> Is there a resource that I can learn from? Ideally it would be good if
> >>> this were covered in the documentation.
>>>
>>> Cheers,
>>> Jaehyeon
>>>
>>
