Hello everybody,

does anyone have experience setting up a Flink cluster to execute Beam
pipelines via the Python SDK?

Because rootful containers are not allowed where I want to run the
cluster, I am trying to do this with podman/podman-compose.

I am trying to make the following test pipeline work:
https://github.com/matleicht/podman_beam_flink
Environment:
Debian 11 with podman, podman-compose, and Python 3.9.2 (with
apache-beam==2.38.0) installed.

The setup of the cluster is defined in:
https://github.com/matleicht/podman_beam_flink/blob/master/docker-compose.yml
1x flink-jobmanager (Flink version 1.14)
1x flink-taskmanager
1x Python SDK harness
I chose to create the SDK harness container manually because I don't have
Docker installed, and Flink obviously fails when it tries to spawn the
container via Docker itself.
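
For reference, the way I point the pipeline at this setup looks roughly
like the sketch below (simplified; the flink_master address and the
worker-pool port are assumptions based on my compose file, not verified
values):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Submit to the Flink jobmanager and use the manually started worker
# pool instead of letting Flink spawn a Docker container per worker.
options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",         # jobmanager REST endpoint
    "--flink_version=1.14",
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",  # worker pool in the SDK container
])

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)

The EXTERNAL environment is what should make Flink talk to the worker
pool from the log below instead of trying to start a container itself.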

When I try to run the pipeline (
https://github.com/matleicht/podman_beam_flink/blob/master/pipeline_test.py)
the SDK worker gets stuck during startup:

2022/12/04 16:13:02 Starting worker pool 1: python -m
apache_beam.runners.worker.worker_pool_main --service_port=50000
--container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1',
'--logging_endpoint=localhost:45087',
'--artifact_endpoint=localhost:35323',
'--provision_endpoint=localhost:36435',
'--control_endpoint=localhost:33237']
2022/12/04 16:16:31 Failed to obtain provisioning information: failed to
dial server at localhost:36435
        caused by:
context deadline exceeded

I suspect that there is an error in my network setup (the provision
endpoint is advertised as localhost:36435, which the harness container
may not be able to reach from its own network namespace) or that some
configuration is missing for the harness worker, but I could not figure
out the problem.

Has anyone tried this on podman themselves, or can anyone spot the error
in my container configuration?

My ultimate goal is to create a streaming pipeline that reads data from a
Kafka instance, processes it, and writes the results back to it.
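
For context, that eventual pipeline would look roughly like this (a
minimal sketch; the broker address and topic names are placeholders, and
ReadFromKafka/WriteToKafka are cross-language transforms that need a Java
expansion service on top, so they add yet another moving part to the
container setup):

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka, WriteToKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()  # plus the same Flink/EXTERNAL flags as above

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> ReadFromKafka(
           consumer_config={"bootstrap.servers": "kafka:9092"},
           topics=["input-topic"])
     # Keys and values arrive as bytes; this placeholder just
     # upper-cases the value before writing it back out.
     | "Process" >> beam.MapTuple(lambda key, value: (key, value.upper()))
     | "Write" >> WriteToKafka(
           producer_config={"bootstrap.servers": "kafka:9092"},
           topic="output-topic"))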

Thanks in advance for your help.
Best regards
Matthis
