Hi, thanks for reaching out. I'd be curious to see whether the memory consumption patterns you observe change if you switch the memory allocator library.
For example, you could try using a custom container with jemalloc installed and enabled via LD_PRELOAD. See:
https://beam.apache.org/documentation/runtime/environments
https://cloud.google.com/dataflow/docs/guides/using-custom-containers

Your Dockerfile might look like the following:

FROM apache/beam_python3.10_sdk:2.49.0

# Prebuilt other dependencies
RUN apt-get update \
    && apt-get install -y libjemalloc-dev
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so

# Set the entrypoint to the Apache Beam SDK launcher.
ENTRYPOINT ["/opt/apache/beam/boot"]

On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee <le...@allium.so> wrote:
> Hello!
>
> I'm an avid apache beam user (on Dataflow) and we use beam to stream
> blockchain data to various sinks. I recently noticed some memory issues
> across all our pipelines but have yet to be able to find the root cause and
> was hoping someone on your team might be able to help. If this isn't the
> right avenue for it, please let me know how I should reach out.
>
> The details are here in stackoverflow:
>
> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>
> Thanks,
> Chenghan
> CTO | Allium
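If you try the custom container with jemalloc preloaded, it may be worth confirming that jemalloc is actually loaded in the SDK worker process before comparing memory patterns. Below is a minimal sketch, assuming a Linux worker (the container image above is Linux-based): it checks whether a libjemalloc library appears in /proc/self/maps. The helper name jemalloc_loaded is hypothetical and not part of Beam; you could, for example, call it from a DoFn's setup method and log the result so it shows up in the worker logs.

```python
import os


def jemalloc_loaded(maps_path: str = "/proc/self/maps") -> bool:
    """Return True if a jemalloc shared library is mapped into this process.

    Hypothetical helper, Linux-only: it scans the process memory map for
    "libjemalloc". Returns False if the map file cannot be read (for
    example, on non-Linux systems).
    """
    try:
        with open(maps_path) as maps:
            return any("libjemalloc" in line for line in maps)
    except OSError:
        return False


if __name__ == "__main__":
    # LD_PRELOAD is set by the ENV line in the Dockerfile above.
    print("LD_PRELOAD =", os.environ.get("LD_PRELOAD"))
    print("jemalloc loaded:", jemalloc_loaded())
```

If this reports False inside the container, the library path in LD_PRELOAD may not match where the package installed it on that image's architecture.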