AvroIO Windowed Writes - Number of files to specify

2019-09-04 Thread Ziyad Muhammed
Hi all, I have a Beam pipeline running on Cloud Dataflow that produces Avro files on GCS. The window duration is 1 minute, and the job currently runs with 64 cores (16 * n1-standard-4). The data produced per minute is around 2 GB. Is there any recommendation on the number of Avro files to specify

Re: AvroIO Windowed Writes - Number of files to specify

2019-09-04 Thread Chamikara Jayalath
Do you mean the value to specify for the number of shards to write [1]? For this, I think it's better not to specify any value, which gives the runner the most flexibility. Thanks, Cham [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#
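The default behavior Cham recommends (leaving the shard count unset, or `withNumShards(0)` in the Java SDK) lets the runner choose the sharding. If a fixed shard count is nonetheless required (for example, to keep output files near a target size), a rough sizing sketch is below. The heuristic, the 128 MB target, and the function name are purely illustrative assumptions, not Beam guidance; only the 2 GB/minute and 64-core figures come from the thread.

```python
# Illustrative heuristic only: estimate a fixed shard count from the data
# volume per window and a target file size, floored at worker parallelism
# so every core has at least one file to write.
def estimate_num_shards(window_bytes: int, target_file_bytes: int,
                        min_parallelism: int) -> int:
    by_size = -(-window_bytes // target_file_bytes)  # ceiling division
    return max(by_size, min_parallelism)

# The thread's figures: ~2 GB per 1-minute window, 64 cores (16 * n1-standard-4).
shards = estimate_num_shards(
    window_bytes=2 * 1024**3,         # ~2 GB per window
    target_file_bytes=128 * 1024**2,  # hypothetical 128 MB target file
    min_parallelism=64,               # 64 worker cores
)
print(shards)  # 64: size alone suggests 16 shards, but parallelism wins
```

Pinning the shard count trades runner flexibility for predictable file sizes, which is why the unset default is usually preferable.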

Re: Python WriteToBigQuery with FILE_LOAD & additional_bq_parameters not working

2019-09-04 Thread Chamikara Jayalath
+Pablo Estrada, who added this. I don't think we have tested this specific option, but I believe the additional BQ parameters option was added in a generic way to accept all additional parameters. Looking at the code, it seems that additional parameters do get passed through to load jobs: https://github.
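As a sketch of what "passed through to load jobs" means in practice: in the Python SDK, `additional_bq_parameters` takes a dict (or a callable returning one) whose keys mirror BigQuery's load job configuration. The particular fields and values below are an assumption for illustration, not a combination tested in this thread.

```python
# Sketch of the dict shape passed as additional_bq_parameters to
# WriteToBigQuery when method=FILE_LOADS. The keys mirror BigQuery's
# load job configuration; this exact combination is illustrative only.
additional_bq_parameters = {
    'timePartitioning': {'type': 'DAY', 'field': 'event_ts'},
    'clustering': {'fields': ['user_id']},
}

# The SDK also accepts a callable, useful with dynamic destinations.
def params_for(destination):
    # 'destination' is the table elements are routed to; here every table
    # gets the same parameters, purely for illustration.
    return additional_bq_parameters
```

Because load jobs only apply partitioning and clustering settings at table creation, these parameters take effect when the load job creates the destination table.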

Re: Autoscaling stuck at 1, never see getSplitBacklogBytes() execute

2019-09-04 Thread Ken Barr
Does anyone actually use streaming autoscaling with Cloud Dataflow? I have seen scale-ups based on CPU but never on backlog, and now I do not see scale-up events at all. If this works, can you please point me to a working example? On 2019/01/09 20:09:46, Ken Barr wrote: > Hello > > I have been
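For context, backlog-based scaling on Dataflow requires streaming autoscaling to be enabled on the job. A minimal sketch of the relevant pipeline options (Python SDK spelling; the script name and worker cap are placeholders, not from the thread):

```shell
# Hedged config sketch: flags commonly needed for Dataflow streaming
# autoscaling. 'my_pipeline.py' and the worker cap are placeholders.
python my_pipeline.py \
  --runner=DataflowRunner \
  --streaming \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --max_num_workers=32
```

Without `max_num_workers` set above the starting worker count, the service has no headroom to scale up regardless of backlog.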

[Java] Compressed SequenceFile

2019-09-04 Thread Shannon Duncan
I have been successfully using the SequenceFile source located here: https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java However, we recently started to do bl