Hi Beam Team,

Bump on this. Does this question make sense?
Thanks,
Arwin

On Thu, Dec 8, 2022, 2:22 PM Arwin Tio <arwin....@getcruise.com> wrote:

> Hi Beam Team,
>
> Can somebody help me understand the factors behind SDK Harness memory
> usage? My first guess is that SDK Harness memory usage depends on:
>
> 1. User code (i.e. DoFns)
> 2. Bundle size
>
> Basically, the maximum memory an SDK Harness needs is however much memory
> it takes for the user DoFn to process the largest bundle, and the bundle
> size is determined by the Runner. So to limit SDK Harness memory usage, we
> would have to ensure that our Runner selects small bundle sizes.
>
> However, looking through some design docs and the code, it seems like:
>
> - sdk_worker.py
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py#L385>
> seems to have multiple active bundle processors at the same time
> - The "Fn API: How to send and receive data" design doc
> <https://docs.google.com/document/d/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA/edit#heading=h.u78ozd9rrlsf>
> seems to describe multiplexing multiple logical streams over a gRPC
> connection
>
> Does this mean that SDK Harnesses process multiple bundles at the same
> time? If so, how is the number of concurrent bundles limited?
>
> Or, in general, what suggestions do you have for reducing the memory usage
> of SDK Harnesses?
>
> Thanks,
>
> Arwin
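P.S. To make the question a bit more concrete, below is a sketch of what I have been experimenting with on my side to cap how much work a single harness takes on. It is only my own guess at the relevant knobs, not anything from the Beam docs: --sdk_worker_parallelism is the portable option I found for limiting the number of SDK worker processes per runner worker, and --direct_num_workers only matters for local runs with the Python direct runner.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# All values here are experiments on my side, not recommended defaults.
options = PipelineOptions([
    # Portable runners (e.g. Flink/Spark): number of SDK worker processes
    # started per runner worker. Fewer workers means fewer bundles in
    # flight per machine, at the cost of parallelism.
    '--sdk_worker_parallelism=1',
    # Local testing with the Python direct runner: number of workers,
    # which also caps how many bundles run at once.
    '--direct_num_workers=1',
])

with beam.Pipeline(options=options) as p:
    # Trivial pipeline, just to show where the options get plugged in.
    _ = p | beam.Create(range(3)) | beam.Map(lambda x: x * x)

If there is a better-supported way to bound the number of concurrent bundle processors in sdk_worker.py itself, that is really what I am after.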