Hi,
I am running out of resources on the worker machines.
The reasons are:
1. Every PCollection element is a reference to a LARGE file that is copied
into the worker
2. The worker makes calculations on the copied file using a software
library that consumes memory / storage / compute resources
I have chan
If they can be split and read by multiple workers this might be a good
candidate for a Splittable DoFn (SDF).

Brian

On Wed, May 12, 2021 at 6:18 AM Eila Oriel Research <
e...@orielresearch.org> wrote:

> Hi,
> I am running out of resources on the worker machines.
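To make the Splittable DoFn suggestion concrete, here is a toy pure-Python sketch of the underlying idea: a large input is described by a restriction (here a line-number range) that can be split, so several workers can each read a disjoint piece in parallel. This is only an illustration of the concept, not the actual apache_beam SDF API; the function names are made up.

```python
# Toy illustration of the splitting idea behind a Splittable DoFn.
# Not the apache_beam API; names here are invented for the sketch.

def split_restriction(start, stop, desired_chunks):
    """Split the range [start, stop) into roughly equal sub-ranges."""
    size = max(1, (stop - start + desired_chunks - 1) // desired_chunks)
    return [(s, min(s + size, stop)) for s in range(start, stop, size)]

def process_chunk(lines, start, stop):
    """Each 'worker' processes only its own sub-range of the input."""
    return [line.upper() for line in lines[start:stop]]

lines = ["a", "b", "c", "d", "e"]
chunks = split_restriction(0, len(lines), 2)
results = [process_chunk(lines, s, e) for s, e in chunks]
print(chunks)   # -> [(0, 3), (3, 5)]
print(results)  # -> [['A', 'B', 'C'], ['D', 'E']]
```

A real SDF additionally lets the runner split a restriction dynamically while work is in flight, which is what makes it suitable for large files.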
Hi all,
I am working with the Market Exchange Format (MEX).
A quick explanation:
it is a method to save high-dimensional sparse matrix values in smaller
files.
It includes 3 files:
- row names with indexes file: (name_r1,1) (name_r2,2)
- column names with indexes file: (name_c1,1) (name_c2,2)
- values file: one (row index, column index, value) entry per nonzero
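For illustration, a minimal stdlib-only sketch of how the three files fit together. The file contents are given inline as hypothetical strings (the names, separators, and 1-based indexing are assumptions for the sketch, not taken from the thread), and the sparse matrix is held as a plain dict keyed by (row name, column name).

```python
# Minimal sketch of assembling a sparse matrix from MEX-style data.
# The three strings stand in for the three files described above.

rows_file = "gene_a,1\ngene_b,2"    # row names with 1-based indexes
cols_file = "cell_1,1\ncell_2,2"    # column names with 1-based indexes
values_file = "1 2 5\n2 1 3"        # (row index, column index, value)

def parse_names(text):
    """Map a 1-based index to its name from 'name,index' lines."""
    out = {}
    for line in text.splitlines():
        name, idx = line.rsplit(",", 1)
        out[int(idx)] = name
    return out

row_names = parse_names(rows_file)
col_names = parse_names(cols_file)

# Sparse matrix as a dict keyed by (row name, column name);
# absent keys are implicit zeros.
matrix = {}
for line in values_file.splitlines():
    r, c, v = line.split()
    matrix[(row_names[int(r)], col_names[int(c)])] = float(v)

print(matrix[("gene_a", "cell_2")])  # -> 5.0
```

Only the nonzero entries are ever stored, which is what makes the format compact for high-dimensional matrices.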
Hello,
We used the /opt/userowned/ drive on the workers to copy files and run
installations via setup.py.
We have noticed that /opt/userowned/ is not available anymore. The only
folder under /opt is google.
Please let me know if there is any folder "dedicated" for installation /
external files.
Thanks,
--
Thank you. I am trying to figure it out too. Most of my pipelines rely on
setup.py doing work on the workers.
Eila
On Thu, Mar 24, 2022 at 10:31 PM Ahmet Altay wrote:
> Adding people who might have knowledge: @Reza Rokni @Valentyn Tymofieiev
>
> On Thu, Mar 24, 2022 at 7:0
ap to use custom containers instead of using
>>> /opt/userowned because eventually runner v1 will be unsupported.
>>>
>>> 1: https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>>>
>>> On Fri, Mar 25, 2022 at 5:19 AM Ei
Hi Anand,
I don't know if it is still relevant.
I am using conda. Please let me know if there is anything that I can help
with.
Best,
Eila
On Thu, Feb 17, 2022 at 10:57 AM Anand Inguva
wrote:
>
> Hi,
>
> Is there anyone using Apache Beam in a Conda environment?
>
>1. Are you using the Conda
Hi all,
I am looking to unzip a large gz file. Can I restrict the job to 1 worker
on the Dataflow runner and count on the order of the lines staying the same
as in the original gz file? If not, what is the easiest way to unzip the
file?
p = beam.Pipeline(options=options)
(p | 'Step 1.4.1 read gz file '
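As background to the ordering question, a stdlib-only sketch (not from the thread; the file path and contents are made up) showing that gzip is a sequential stream: a single reader sees lines in their original order, whereas anything that shards the read across workers may reorder them.

```python
import gzip
import os
import tempfile

# Write a small gzip file so the example is self-contained
# (in practice this would be the large .gz input).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.gz")
original = ["line 1", "line 2", "line 3"]
with gzip.open(path, "wt") as f:
    f.write("\n".join(original) + "\n")

# gzip is a sequential format: one reader decompressing the stream
# sees the lines in their original order.
with gzip.open(path, "rt") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)  # -> ['line 1', 'line 2', 'line 3']
```

Note that a PCollection is unordered by design, so even a single-worker job generally cannot rely on element order downstream without attaching an explicit index to each line.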
Hello,
I am using old code from setup.py to copy a file from gs to the worker.
I am receiving the following error message:
['mkdir','-p','/opt/userowned/'],
["chmod", "777", "/opt/userowned/"],
["gsutil","cp","gs://ort_tools/anaconda/anaconda.sh", "/opt/userowned/"]
insertId: "2551510730791
7, 2023 at 1:22 PM Eila Oriel Research wrote:

> Hello,
> I am using old code from setup.py to copy a file from gs to the worker.