Prefer not to specify the number of shards at all: this gives the runner the greatest flexibility in how it executes the write, and it is generally the most performant option as well.
Increasing the number of shards lets more workers write in parallel, but there is no guarantee that it will be significantly faster. If the job only has 5 workers, for example, the extra shards cannot all be written at once.

On Thu, Feb 13, 2020 at 10:24 PM vivek chaurasiya <vivek....@gmail.com> wrote:

> hi folks, I have this in code
>
> globalIndexJson.apply("GCSOutput",
>     TextIO.write().to(fullGCSPath).withSuffix(".txt").withNumShards(500));
>
> the same code is executed for 50GB, 3TB, 5TB of data. I want to know if
> changing numShards for larger datasize will write to GCS faster?
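
As a rough illustration of why more shards do not automatically mean a faster write, here is a small sketch in plain Java with no Beam dependency. It assumes, as a simplification, that each worker writes one shard at a time; real runners can run several threads per worker. The class and method names are made up for this example and are not part of the Beam API.

```java
public class ShardParallelism {
    // Hypothetical helper, not a Beam API: upper bound on the number of shards
    // being written at the same moment, assuming one shard per worker at a time.
    static int concurrentWrites(int numShards, int numWorkers) {
        if (numShards < 1 || numWorkers < 1) {
            throw new IllegalArgumentException("numShards and numWorkers must be >= 1");
        }
        return Math.min(numShards, numWorkers);
    }

    // Number of sequential rounds needed to write every shard under the same assumption.
    static int writeRounds(int numShards, int numWorkers) {
        int parallel = concurrentWrites(numShards, numWorkers);
        return (numShards + parallel - 1) / parallel; // ceiling division
    }

    public static void main(String[] args) {
        // 500 shards on 5 workers: the workers, not the shard count, are the limit.
        System.out.println(concurrentWrites(500, 5));
        System.out.println(writeRounds(500, 5));
        // Doubling the shard count does not raise the concurrency.
        System.out.println(concurrentWrites(1000, 5));
    }
}
```

Under this simplified model, raising the shard count past the number of workers changes how many files are produced but not how many are written concurrently, which is one reason leaving the choice to the runner tends to be the safer default.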