Hi Brian,

Thanks for your response. I am reading genomic data. I am using a research tool (software) that was built to process these files; it is not built to work across multiple machines. I don't usually work with Splittable DoFn, so I hope I understand the concept properly. Please let me know if I am missing something.

Thanks,
Eila
On Mon, May 17, 2021 at 1:18 PM Brian Hulette <bhule...@google.com> wrote:
> What type of files are you reading? If they can be split and read by
> multiple workers, this might be a good candidate for a Splittable DoFn (SDF).
>
> Brian
>
> On Wed, May 12, 2021 at 6:18 AM Eila Oriel Research <
> e...@orielresearch.org> wrote:
>
>> Hi,
>> I am running out of resources on the worker machines.
>> The reasons are:
>> 1. Every PCollection is a reference to a LARGE file that is copied into
>> the worker.
>> 2. The worker makes calculations on the copied file using a software
>> library that consumes memory / storage / compute resources.
>>
>> I have increased the workers' CPUs and memory size. At some point, I
>> run out of resources with this method as well.
>> I am looking to limit the number of PCollections / elements that are being
>> processed in parallel on each worker at a time.
>>
>> Many thanks for any advice,
>> Best wishes,
>> --
>> Eila
>> <http://www.orielresearch.com>
>> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
>
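One possible way to limit how many large files a worker handles at once is a process-wide semaphore that each `DoFn` call must acquire before copying a file and running the tool on it. The sketch is plain Python so it runs without Beam installed. `MAX_CONCURRENT_FILES`, `process_file`, `run_genomics_tool`, and the `simulate` harness are hypothetical names, not part of Beam's API. It assumes the worker runs several bundle-processing threads inside one Python process; if the runner starts more than one SDK process per VM, the cap applies per process rather than per machine.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap: how many large files one worker process may copy and
# analyze at the same time. Choose it from the memory/disk one file needs.
MAX_CONCURRENT_FILES = 2

# One semaphore per Python process, shared by every thread in that process.
_FILE_SLOTS = threading.BoundedSemaphore(MAX_CONCURRENT_FILES)


def process_file(path, analyze):
    """Run analyze(path) while holding one of the shared slots.

    Threads beyond the cap block here until another thread frees a slot.
    """
    with _FILE_SLOTS:
        return analyze(path)


# In a pipeline, a DoFn would call the helper (assumes apache_beam is
# installed; run_genomics_tool is a hypothetical wrapper around the tool):
#
#   class AnalyzeFile(beam.DoFn):
#       def process(self, path):
#           yield process_file(path, run_genomics_tool)


def simulate(paths, worker_threads):
    """Run fake jobs on many threads; return results and peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_analyze(path):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.02)  # stand-in for the memory-hungry tool
        with lock:
            state["active"] -= 1
        return "done:" + path

    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        results = list(pool.map(lambda p: process_file(p, fake_analyze), paths))
    return results, state["peak"]


if __name__ == "__main__":
    files = ["sample_%d.bam" % i for i in range(8)]
    results, peak = simulate(files, worker_threads=6)
    print(len(results), "files processed; peak concurrency =", peak)
```

`simulate` only checks the limit with a fake workload; in a real pipeline only `process_file` and a `DoFn` like the commented one would be kept. Blocked threads trade throughput for memory headroom, so the cap is best set from the peak memory and disk a single file needs.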