Hi Brian,

Thanks for your response.
I am reading genomic data. I am using a research tool (software) that was
built to process the files; it is not built to work on multiple machines.
I don't usually work with Splittable DoFns, so I hope that I understand the
concept properly.
Please let me know if I am missing something.
Thanks,
Eila

On Mon, May 17, 2021 at 1:18 PM Brian Hulette <bhule...@google.com> wrote:

> What type of files are you reading? If they can be split and read by
> multiple workers this might be a good candidate for a Splittable DoFn (SDF).
>
> Brian
>
> On Wed, May 12, 2021 at 6:18 AM Eila Oriel Research <
> e...@orielresearch.org> wrote:
>
>> Hi,
>> I am running out of resources on the worker machines.
>> The reasons are:
>> 1. Every PCollection element is a reference to a LARGE file that is copied
>> onto the worker.
>> 2. The worker runs calculations on the copied file using a software
>> library that consumes memory / storage / compute resources.
>>
>> I have changed the workers' CPUs and memory size, but at some point I
>> run out of resources with this approach as well.
>> I am looking to limit the number of PCollection elements that are being
>> processed in parallel on each worker at a time.
>>
>> Many thanks for any advice,
>> Best wishes,
>> --
>> Eila
>> <http://www.orielresearch.com>
>> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
>>
>

-- 
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>