Is there a way (settings) to limit the number of elements per worker machine

2021-05-12 Thread Eila Oriel Research
Hi, I am running out of resources on the worker machines. The reasons are: 1. Every PCollection element is a reference to a LARGE file that is copied onto the worker. 2. The worker runs calculations on the copied file using a software library that consumes memory / storage / compute resources. I have chan…

Re: Is there a way (settings) to limit the number of elements per worker machine

2021-05-28 Thread Eila Oriel Research
…2, 2021 at 6:18 AM Eila Oriel Research <e...@orielresearch.org> wrote: >> Hi, >> I am running out of resources on the worker machines. >> The reasons are: >> 1. Every PCollection is a reference to a LARGE file that is copied into the worker…

Re: Is there a way (settings) to limit the number of elements per worker machine

2021-05-28 Thread Eila Oriel Research
…If they can be split and read by multiple workers this might be a good candidate for a Splittable DoFn (SDF). >> Brian >> On Wed, May 12, 2021 at 6:18 AM Eila Oriel Research <e...@orielresearch.org> wrote: >>> Hi, …
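A Splittable DoFn lets the runner divide the work for one element into restrictions (for files, typically byte ranges) that different workers can process. The plain-Python sketch below shows only the splitting logic such a DoFn would rely on, not Beam's SDF API: `split_ranges` and `read_lines_in_range` are hypothetical helpers, and each line is assigned to the range in which it starts, a common convention that makes every line get read exactly once. It assumes an uncompressed, seekable file; a gzip stream cannot be split this way.

```python
import io


def split_ranges(size, chunk):
    """Split [0, size) into half-open byte ranges of at most `chunk` bytes."""
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]


def read_lines_in_range(f, start, stop):
    """Yield the lines of binary file `f` that START inside [start, stop).

    A line that begins before `stop` is read to its end even if it crosses
    `stop`; a partial line at `start` is skipped because the previous
    range owns it.
    """
    if start == 0:
        f.seek(0)
        pos = 0
    else:
        # Step back one byte: if it is a newline, readline() consumes just
        # that byte and we land exactly on `start`.
        f.seek(start - 1)
        f.readline()
        pos = f.tell()
    while pos < stop:
        line = f.readline()
        if not line:
            break
        yield line
        pos += len(line)


data = b"alpha\nbeta\ngamma\ndelta\nepsilon\n"
f = io.BytesIO(data)
ranges = split_ranges(len(data), 7)
lines = []
for start, stop in ranges:  # each range could go to a different worker
    lines.extend(read_lines_in_range(f, start, stop))
print(ranges)
print(lines)
```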

Creating a dense matrix from a sparse matrix using Apache Beam

2021-12-22 Thread Eila Oriel Research
Hi all, I am working with the Market Exchange Format (MEX). A quick explanation: it is a method to save high-dimensional sparse matrix values into smaller files. It includes 3 files: - a row names file with indexes (name_r1,1) (name_r2,2) - a column names file with indexes (name_c1,1) (name_c2,2) - values…
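Assuming the values file follows the Matrix Market coordinate layout (header and `%` comment lines, a size line `n_rows n_cols n_entries`, then one 1-based `row col value` triple per line with numeric values), the dense rows can be built by keying every entry by its row index, grouping, and expanding each group into a full-length row. The sketch below does this in plain Python; `parse_mtx` and `to_dense_rows` are hypothetical names, and in a Beam pipeline the grouping step would correspond to a `GroupByKey` on the row index. Row and column names from the other two files could then be joined on the same indexes.

```python
from collections import defaultdict


def parse_mtx(lines):
    """Parse Matrix Market coordinate lines into (n_rows, n_cols, entries)."""
    # Skip the %%MatrixMarket header, % comments and blank lines.
    body = (ln for ln in lines if ln.strip() and not ln.startswith("%"))
    n_rows, n_cols, _n_entries = map(int, next(body).split())
    entries = []
    for ln in body:
        row, col, value = ln.split()
        entries.append((int(row), int(col), float(value)))
    return n_rows, n_cols, entries


def to_dense_rows(n_rows, n_cols, entries):
    """Group entries by row index and expand each row to n_cols values."""
    by_row = defaultdict(list)  # plays the role of GroupByKey on the row index
    for row, col, value in entries:
        by_row[row].append((col, value))
    dense = {}
    for row in range(1, n_rows + 1):  # rows without entries become all zeros
        values = [0.0] * n_cols
        for col, value in by_row.get(row, []):
            values[col - 1] = value  # Matrix Market indices are 1-based
        dense[row] = values
    return dense


mtx = [
    "%%MatrixMarket matrix coordinate real general",
    "3 4 3",
    "1 2 5",
    "3 1 7",
    "3 4 1.5",
]
n_rows, n_cols, entries = parse_mtx(mtx)
dense = to_dense_rows(n_rows, n_cols, entries)
for row, values in dense.items():
    print(row, values)
```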

/opt/userowned/ folder

2022-03-24 Thread Eila Oriel Research
Hello, We used the /opt/userowned/ drive on the workers to copy files and run installations via setup.py. We have noticed that /opt/userowned/ is not available anymore; the only folder under /opt is google. Please let me know if there is any folder "dedicated" to installations / external files. Thanks, --

Re: /opt/userowned/ folder

2022-03-25 Thread Eila Oriel Research
Thank you. I am trying to figure it out too. Most of my pipelines rely on setup.py work on the workers. Eila On Thu, Mar 24, 2022 at 10:31 PM Ahmet Altay wrote: > Adding people who might have knowledge: @Reza Rokni @Valentyn Tymofieiev > On Thu, Mar 24, 2022 at 7:0

Re: /opt/userowned/ folder

2022-03-25 Thread Eila Oriel Research
…ap to use custom containers instead of using /opt/userowned because eventually runner v1 will be unsupported. >>> 1: https://cloud.google.com/dataflow/docs/guides/using-custom-containers >>> On Fri, Mar 25, 2022 at 5:19 AM Ei

Re: Support for Conda environment

2022-04-19 Thread Eila Oriel Research
Hi Anand, I don't know if it is still relevant, but I am using Conda. Please let me know if there is anything I can help with. Best, Eila On Thu, Feb 17, 2022 at 10:57 AM Anand Inguva wrote: > Hi, > Is there anyone using Apache Beam in a Conda environment? > 1. Are you using the Conda

Unzip large file

2022-08-19 Thread Eila Oriel Research
Hi all, I am looking to unzip a large gz file. Can I restrict the job to one worker on the Dataflow runner and rely on the lines staying in the same order as in the original gz file? If not, what would be the easiest way to unzip the file? p = beam.Pipeline(options=options) (p | 'Step 1.4.1 read gz file '
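PCollections carry no ordering guarantee, so limiting a Dataflow job to one worker is not a reliable way to keep line order. If the goal is only to decompress the file, a sequential stream is the simpler option. The sketch below uses Python's standard `gzip` and `shutil` modules on local paths (an assumption; a GCS object would first have to be opened as a binary file-like object), and `gunzip_file` is a hypothetical helper name. Because it copies fixed-size chunks, memory use stays bounded regardless of file size, and the output keeps the original line order.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src_path, dst_path, chunk_size=1 << 20):
    """Stream-decompress src_path into dst_path in chunk_size-byte pieces."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


# Demo: compress some numbered lines, then decompress them again.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "input.txt.gz")
out_path = os.path.join(workdir, "input.txt")
original = "".join(f"line {i}\n" for i in range(1000))
with gzip.open(gz_path, "wt") as f:
    f.write(original)

gunzip_file(gz_path, out_path)
with open(out_path) as f:
    restored = f.read()
print(restored == original)  # True: same content, same line order
```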

setup.py gsutil cp command Error 98

2023-01-17 Thread Eila Oriel Research
Hello, I am using old code in setup.py to copy a file from GCS to the worker. I am receiving the following error message: ['mkdir','-p','/opt/userowned/'], ["chmod", "777", "/opt/userowned/"], ["gsutil","cp","gs://ort_tools/anaconda/anaconda.sh", "/opt/userowned/"] insertId: "2551510730791
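The command lists above appear to follow the custom-commands pattern from Beam's `juliaset` example `setup.py`, where each list is passed to `subprocess` during the package build on the worker. A minimal sketch of such a runner is below; `run_custom_commands` and `target_dir` are hypothetical names. It stops at the first failing command and reports its exit code and combined output, which helps pin down which of the three steps fails. Since /opt/userowned/ is no longer present on the workers, the target here is an assumed writable temporary directory, and the demo calls the Python interpreter instead of `mkdir`/`gsutil` so it runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_custom_commands(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        print("Running command:", command)
        proc = subprocess.run(command, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
        print(f"Command output: {proc.stdout!r}")
        if proc.returncode != 0:
            raise RuntimeError(
                f"Command {command} failed: exit code {proc.returncode}\n"
                f"{proc.stdout}")


# Hypothetical writable target used instead of /opt/userowned/.
target_dir = os.path.join(tempfile.gettempdir(), "userowned_demo")

run_custom_commands([
    # Stand-in for ['mkdir', '-p', target_dir].
    [sys.executable, "-c",
     "import os, sys; os.makedirs(sys.argv[1], exist_ok=True)", target_dir],
])
print(os.path.isdir(target_dir))
```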

Re: setup.py gsutil cp command Error 98

2023-01-22 Thread Eila Oriel Research
…7, 2023 at 1:22 PM Eila Oriel Research wrote: > Hello, > I am using an old code from setup.py to copy a file from gs to the worker. > I am receiving the following error message: > ['mkdir','-p','/opt/userowned/'], > ["chmod", …