Hi Beam community, I wonder why a GroupBy operation is involved in WriteFiles: https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/io/WriteFiles.html
This doc mentioned “ The exact parallelism of the write stage can be controlled using withNumShards(int)<https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/io/WriteFiles.html#withNumShards-int->, typically used to control how many files are produced or to globally limit the number of workers connecting to an external service. However, this option can often hurt performance: it adds an additional GroupByKey<https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/transforms/GroupByKey.html> to the pipeline.” When we are saving the PCollection into multiple files, why can’t we simply split the PCollection without a key and save each split as a file? Thanks!