Hi all
I have a Beam pipeline running on Cloud Dataflow that produces Avro files
on GCS. The window duration is 1 minute, and the job is currently running with
64 cores (16 * n1-standard-4). The data produced is around 2 GB per minute.
Is there any recommendation on the number of Avro files to specify?
Do you mean the value to specify for the number of shards to write [1]?
For this I think it's better not to specify any value, which gives the
runner the most flexibility.
Thanks,
Cham
[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#
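In case it helps, here is a rough sketch of where that knob sits in the Java
SDK. MyRecord and the GCS path are placeholders, not details from the
original pipeline:

import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

PCollection<MyRecord> records = ...; // the stream produced upstream

records
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply(
        AvroIO.write(MyRecord.class)
            .to("gs://my-bucket/output/")   // placeholder GCS prefix
            .withWindowedWrites()
            // Leaving withNumShards(...) unset lets the runner decide how
            // many files to produce per window; only set it explicitly
            // (e.g. .withNumShards(64)) if you need a fixed shard count.
            .withSuffix(".avro"));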
+Pablo Estrada who added this.
I don't think we have tested this specific option, but I believe the
additional BQ parameters option was added in a generic way to accept all
additional parameters.
Looking at the code, it seems like additional parameters do get passed
through to load jobs:
https://github.
Does anyone actually use Streaming Autoscaling with Cloud Dataflow? I have
seen scale-ups based on CPU but never on backlog. Now I do not see scale-up
events at all. If this works, can you please point me to a working example.
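For what it's worth, here is a rough sketch of the options that request
streaming autoscaling from the Java SDK. The class name and worker cap below
are placeholders, and whether backlog-based upscaling actually triggers is
decided by the Dataflow service, so treat this as a starting point rather
than a confirmed working example:

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StreamingAutoscalingExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

    // Run as a streaming job with throughput-based autoscaling,
    // capped at maxNumWorkers.
    options.setStreaming(true);
    options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
    options.setMaxNumWorkers(20); // placeholder cap

    Pipeline pipeline = Pipeline.create(options);
    // ... build the streaming pipeline here ...
    pipeline.run();
  }
}

The same settings can be passed on the command line as --streaming=true
--autoscalingAlgorithm=THROUGHPUT_BASED --maxNumWorkers=<n>.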
On 2019/01/09 20:09:46, Ken Barr wrote:
> Hello
>
> I have been
I have successfully been using the sequence file source located here:
https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
However, recently we started to do bl