Prefer not to specify the number of shards at all. Leaving it unset gives
the runner the greatest flexibility in how it executes the write, and it is
generally the most performant option as well.

Increasing the number of shards lets more workers write in parallel, but
there is no guarantee that it will be significantly faster. If the job only
has 5 workers, for example, the extra shards cannot all be written at the
same time.
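
To make that last point concrete, here is a rough back-of-the-envelope
sketch in plain Java. It is not Beam API and not how any particular runner
schedules work. It only assumes a simplified model in which at most
workers * threadsPerWorker shards can be written at once. The class name,
the method names, and the threadsPerWorker parameter are all hypothetical.

```java
// Simplified model, an assumption for illustration only (not Beam API):
// at most workers * threadsPerWorker shards can be written concurrently.
final class ShardParallelism {

    // Number of shards that can be written at the same time under the model.
    static long concurrentWrites(long numShards, long workers, long threadsPerWorker) {
        if (numShards < 1 || workers < 1 || threadsPerWorker < 1) {
            throw new IllegalArgumentException("all arguments must be >= 1");
        }
        return Math.min(numShards, workers * threadsPerWorker);
    }

    // Number of sequential "waves" needed to write every shard (ceiling division).
    static long writeWaves(long numShards, long workers, long threadsPerWorker) {
        long concurrent = concurrentWrites(numShards, workers, threadsPerWorker);
        return (numShards + concurrent - 1) / concurrent;
    }

    public static void main(String[] args) {
        long workers = 5;          // the "5 workers" case from the reply
        long threadsPerWorker = 1; // hypothetical value
        for (long shards : new long[] {5, 50, 500}) {
            System.out.printf("shards=%d concurrent=%d waves=%d%n",
                shards,
                concurrentWrites(shards, workers, threadsPerWorker),
                writeWaves(shards, workers, threadsPerWorker));
        }
    }
}
```

Under this model, going from 5 to 500 shards on 5 single-threaded workers
leaves the number of concurrent writes at 5, so the extra shards only add
more rounds of writing. Real runners are more dynamic than this, which is
one more reason to let the runner pick the sharding.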

On Thu, Feb 13, 2020 at 10:24 PM vivek chaurasiya <vivek....@gmail.com>
wrote:

> Hi folks, I have this in my code:
>
>     globalIndexJson.apply("GCSOutput",
>         TextIO.write().to(fullGCSPath).withSuffix(".txt").withNumShards(500));
>
> The same code is executed for 50 GB, 3 TB, and 5 TB of data. I want to
> know whether changing numShards for larger data sizes will make the write
> to GCS faster.
>
