Re: Suspected memory leak in Python Pubsub ReadFromPubsub

Valentyn Tymofieiev via dev Tue, 22 Aug 2023 17:23:24 -0700

Hi, thanks for reaching out.

I'd be curious to see whether the memory consumption patterns you observe
change if you switch the memory allocator library.

For example, you could build a custom container image that installs jemalloc
and enables it through LD_PRELOAD. See:
https://beam.apache.org/documentation/runtime/environments and
https://cloud.google.com/dataflow/docs/guides/using-custom-containers

Your Dockerfile might look like the following:

FROM apache/beam_python3.10_sdk:2.49.0

# Install jemalloc; the -dev package provides the unversioned
# libjemalloc.so symlink that LD_PRELOAD points at below.
RUN apt-get update \
  && apt-get install -y libjemalloc-dev

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so

# Set the entrypoint to the Apache Beam SDK launcher.
ENTRYPOINT ["/opt/apache/beam/boot"]
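
To confirm that the preload actually took effect inside a worker, one option is to look for jemalloc in the process's memory map. The following is only a sketch under a few assumptions: it relies on the Linux-specific /proc/self/maps file, the helper names (find_preloaded_jemalloc, jemalloc_status) are made up for illustration, and you would need to call it from inside the SDK worker process (for example from a DoFn's setup method) and log the result for it to tell you anything about the pipeline.

```python
import os


def find_preloaded_jemalloc(maps_text):
    """Return sorted, unique paths of mapped files whose name contains 'jemalloc'.

    maps_text is the content of a /proc/<pid>/maps file (Linux only).
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and "jemalloc" in os.path.basename(fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def jemalloc_status():
    """Return (LD_PRELOAD value, list of mapped jemalloc libraries) for this process."""
    preload = os.environ.get("LD_PRELOAD", "")
    try:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable.
        return preload, []
    return preload, find_preloaded_jemalloc(maps_text)


if __name__ == "__main__":
    preload, libs = jemalloc_status()
    print("LD_PRELOAD =", preload or "(unset)")
    print("jemalloc mapped:", libs or "no")
```

If the list comes back empty while LD_PRELOAD is set, the path in the Dockerfile probably does not match where the library was installed in the image.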

On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee <le...@allium.so> wrote:

> Hello!
>
> I'm an avid Apache Beam user (on Dataflow) and we use Beam to stream
> blockchain data to various sinks. I recently noticed some memory issues
> across all our pipelines but have yet to be able to find the root cause and
> was hoping someone on your team might be able to help. If this isn't the
> right avenue for it, please let me know how I should reach out.
>
> The details are here in stackoverflow:
>
>
> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>
> Thanks,
> Chenghan
> CTO | Allium
>
