Hi all,
If I remember correctly, when my pipeline writes a file to a GCP bucket and
the file already exists, the default behaviour is that the file is not
overwritten.
What is the pattern to follow if I want the existing file to be overwritten
every time the pipeline runs and the file is stored to the GCP bucket?
Yes, I am just thinking about how to modify/rewrite this piece of code if I
want to run my pipeline on the Dataflow runner (a sketch follows below).
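A minimal sketch of the usual switch to the Dataflow runner, assuming the Java
SDK; the project, region, and bucket names are hypothetical:

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation()
        .as(DataflowPipelineOptions.class);
options.setRunner(DataflowRunner.class);          // switch from DirectRunner
options.setProject("my-project");                 // hypothetical project id
options.setRegion("us-central1");                 // hypothetical region
options.setTempLocation("gs://my-bucket/temp");   // hypothetical staging bucket
Pipeline pipeline = Pipeline.create(options);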
Thanks & Regards
Rajnil Guha
On Wed, Jul 21, 2021 at 1:12 AM Robert Bradshaw wrote:
> On Tue, Jul 20, 2021 at 12:33 PM Rajnil Guha
> wrote:
> >
> > Hi,
> >
> > Thank you so much for …
OK, I replaced:
.apply(KafkaIO.<String, String>read()
    .withBootstrapServers(servers)
    .withTopic(readTopic)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withoutMetadata())  // yields PCollection<KV<String, String>>
Worker machines are n1-standard-2s (2 vCPUs and 7.5 GB of RAM).
The pipeline is simple, but it produces large numbers of end files: ~125K temp
files were written in at least one case.
1. Scan Bigtable (NoSQL DB)
2. Transform with business logic
3. Convert to GenericRecord
4. WriteDynamic to a Google bucket as Avro files (a sketch of this step follows below)
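A minimal sketch of step 4 with FileIO.writeDynamic, assuming the records are
already GenericRecords with a known Avro Schema; the routing field and bucket
are hypothetical:

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.io.FileIO;

records.apply(FileIO.<String, GenericRecord>writeDynamic()
    .by(r -> r.get("category").toString())  // hypothetical routing field
    .via(AvroIO.sink(schema))                // schema: the records' Avro Schema
    .to("gs://my-bucket/output/")            // hypothetical bucket
    .withNaming(key -> FileIO.Write.defaultNaming(key, ".avro"))
    .withDestinationCoder(StringUtf8Coder.of()));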
Let's say I have a PCollection<A> and I want to use the 'readAll' pattern to
enhance some data from an additional source such as Redis (which has a
readKeys PTransform). However, I don't want to 'lose'
the original A. There *are* a few ways to do this currently (side inputs,
joining two streams with CoGroupByKey).
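A minimal sketch of the CoGroupByKey option, assuming both the original
elements and the Redis results have been keyed by the same String key; A stands
for the original element type, and the tag and collection names are
hypothetical:

import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;

TupleTag<A> originalTag = new TupleTag<>();
TupleTag<String> redisTag = new TupleTag<>();
PCollection<KV<String, CoGbkResult>> joined =
    KeyedPCollectionTuple.of(originalTag, keyedOriginals)  // PCollection<KV<String, A>>
        .and(redisTag, redisValues)                        // PCollection<KV<String, String>>
        .apply(CoGroupByKey.create());
// Each CoGbkResult carries the original A alongside the Redis value.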
Windowing doesn't work with batch jobs. You could dump your BQ data to
Pub/Sub and then use a streaming job to window.
*~Vincent*
On Wed, Jul 21, 2021 at 10:13 AM Andrew Kettmann
wrote:
> Worker machines are n1-standard-2s (2 vCPUs and 7.5 GB of RAM).
>
> The pipeline is simple, but it produces large numbers of end files: ~125K
> temp files were written in at least one case.
Oof, none of the documentation around windowing that I have read says anything
about it not working in a batch job. Where did you find that info, if you
remember?
From: Vincent Marquez
Sent: Wednesday, July 21, 2021 12:21 PM
To: user
Subject: Re: [2.28.0] [Java]
Worth noting that you never "lose" a PCollection. You can use the same
PCollection in as many transforms as you like and every time you reference that
PCollection it will be in the same state it was when you first read it in.
So if you have:
PCollection<A> colA = ...;
PCollection<B> colB = colA.apply(ParDo.of(...));
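A minimal sketch of that reuse, assuming colA is a PCollection<String>; the
transform names are hypothetical. Both branches read the same colA, and
neither affects the other:

import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

PCollection<String> upper = colA.apply("ToUpper",
    MapElements.into(TypeDescriptors.strings()).via(s -> s.toUpperCase()));
PCollection<Integer> lengths = colA.apply("Lengths",
    MapElements.into(TypeDescriptors.integers()).via(String::length));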
On Wed, Jul 21, 2021 at 10:37 AM Andrew Kettmann
wrote:
> Worth noting that you never "lose" a PCollection. You can use the same
> PCollection in as many transforms as you like and every time you reference
> that PCollection it will be in the same state it was when you first read
> it in.
>
> So if you have:
Hi,
Thanks for your suggestions; I will surely check them out. My exact
use case is to check if the PCollection is empty and, if it is, publish a
message to a Pub/Sub topic. This message will then be used further downstream
by some other processes.
Thanks & Regards
Rajnil Guha
On Wed, Jul 21, 2021 a
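A minimal sketch of that check, assuming a bounded PCollection<T> named pcoll
and a runner that supports writing to Pub/Sub from a batch pipeline; the topic
and message text are hypothetical. Count.globally() emits exactly one element,
so the DoFn runs once and publishes only when the count is zero:

import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

pcoll
    .apply(Count.globally())
    .apply(ParDo.of(new DoFn<Long, String>() {
      @ProcessElement
      public void process(@Element Long count, OutputReceiver<String> out) {
        if (count == 0) {
          out.output("input was empty");  // hypothetical message payload
        }
      }
    }))
    .apply(PubsubIO.writeStrings()
        .to("projects/my-project/topics/my-topic"));  // hypothetical topic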
Hi Team,
We are working on an ETL pipeline. As part of some transformations, some
columns are deleted from the input PCollection (those that are not mapped to
appropriate transactional table columns).
Based on the input mapping, the attributes that get imported will be dynamic,
depending on the input file and the mapping.
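A minimal sketch of dropping unmapped columns, assuming the input is a
PCollection<Row> with a Beam schema; the column names are hypothetical:

import org.apache.beam.sdk.schemas.transforms.DropFields;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

PCollection<Row> trimmed = rows.apply(
    DropFields.fields("unmapped_col_1", "unmapped_col_2"));  // hypothetical names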
Hi Team,
We are designing an ETL pipeline.
The pipeline is invoked when there is a new input file (CSV), and the input
file schema (columns) can be dynamic.
We are using an Apache Beam Schema to validate the rows in the input file. We
are creating the Schema (after reading the header row from the CSV) at runtime
in a DoFn and s…
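A minimal sketch of building such a schema at runtime, assuming the header row
is available as a comma-separated String and every column is treated as a
string; the variable names are hypothetical:

import org.apache.beam.sdk.schemas.Schema;

Schema.Builder builder = Schema.builder();
for (String column : headerRow.split(",")) {
  builder.addStringField(column.trim());  // every CSV column as a string field
}
Schema schema = builder.build();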