Re: [Question] Processing chunks of data in batch based pipelines

2023-07-18 Thread Alexey Romanenko
Reading from RDBMS and processing the data downstream of your pipeline is not the same in terms of bundling. The main “issue" with a former is that it reads mostly in a single thread per SQL-query and JDBC client is not exception. So, Beam can’t split data, that are not yet read, into bundles

How to submit Beam Python job onto Kubernetes with Flink runner?

2023-07-18 Thread John Tipper
Hi all, I'm wanting to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following this tutorial here (https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb) but I'm not sure what the author is referring to when he

Growing checkpoint size with Python SDF for reading from Redis streams

2023-07-18 Thread Nimalan Mahendran
Hello, I am running a pipeline built in the Python SDK that reads from a Redis stream via an SDF, in the following environment: - Python 3.11 - Apache Beam 2.48.0 - Flink 1.16 - Checkpoint interval: 60s - state.backend (Flink): hashmap