Hi Beam Team,

Bump on this. Does this question make sense?

Thanks,

Arwin

On Thu, Dec 8, 2022, 2:22 PM Arwin Tio <arwin....@getcruise.com> wrote:

> Hi Beam Team,
>
> Can somebody help me understand what factors drive SDK Harness memory
> usage? My first guess is that SDK Harness memory usage depends on:
>
> 1. User code (i.e. DoFns)
> 2. Bundle size
>
> Basically, the maximum memory an SDK Harness needs is however much memory
> it takes for the user DoFn to process the largest bundle. Since the bundle
> size is determined by the Runner, limiting SDK Harness memory usage means
> ensuring that our Runner selects small bundle sizes.
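>
> To make that model concrete, here is the back-of-envelope sketch I have in
> mind (just an illustration with made-up names and numbers, not real Beam
> APIs):
>
>     # hypothetical model: memory needed to process one bundle with my DoFn
>     def per_bundle_memory(bundle_size_elements, per_element_bytes,
>                           dofn_overhead_bytes):
>         # each element the DoFn buffers costs roughly per_element_bytes,
>         # plus a fixed overhead for the DoFn's own state
>         return dofn_overhead_bytes + bundle_size_elements * per_element_bytes
>
>     # e.g. a 10k-element bundle where my DoFn holds ~2 KB per element
>     # needs ~20 MB on top of a ~50 MB fixed overhead
>     print(per_bundle_memory(10_000, 2_000, 50 * 1024 * 1024))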
>
> However, looking through some design docs and the code, it seems like:
>
>    - sdk_worker.py
>      <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py#L385>
>      seems to have multiple active bundle processors at the same time
>    - The Fn API: How to send and receive data
>      <https://docs.google.com/document/d/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA/edit#heading=h.u78ozd9rrlsf>
>      design doc seems to describe multiplexing multiple logical streams
>      over a gRPC connection
>
> Does this mean that SDK Harnesses process multiple bundles at the same
> time? If so, how is the number of concurrent bundles limited?
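>
> If my reading is right, I picture the harness behaving roughly like the toy
> sketch below (not the real sdk_worker.py, just the concurrency model I am
> assuming, with made-up names):
>
>     import concurrent.futures
>
>     def do_something(element):
>         # stand-in for my DoFn's per-element work
>         return element * 2
>
>     def process_bundle(bundle):
>         # while a bundle is in flight, its whole working set (inputs plus
>         # outputs) is held in memory by that bundle processor
>         return [do_something(element) for element in bundle]
>
>     # toy input: 4 bundles of 10k elements each
>     incoming_bundles = [list(range(10_000)) for _ in range(4)]
>
>     # with several bundles in flight at once, peak memory is roughly
>     # (number of concurrent bundles) x (per-bundle memory), so the
>     # concurrency limit matters as much as the bundle size the Runner picks
>     with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
>         results = list(pool.map(process_bundle, incoming_bundles))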
>
> Or in general, what suggestions do you have to reduce memory usage of SDK
> Harnesses?
>
> Thanks,
>
> Arwin
>

