Hi,

I'm pretty new to beam, but have done a far bit of flink in the past. I'm
trying to solve the following, but I can't see how to do it in beam.
I have a stream of data coming in, I want to batch it up by key, and output
each batch to a file, with the name based on the key. I'm batching my input
using GroupIntoBatches, with a byte size of 512k and 10 second duration.
The output of this is a PCollection<KV<String, Iterator<Foo>>. For each
element of the PCollection, I want to write out to gs://dump/<String>/xxxxxx
I've tried this with FileIO.dynamicWrite. This appears to be creating
temporary files and shuffling, which I don't need here (if I shuffle I want
to do it before the batching).
Duplicates are ok for this.

Any help on this would be much appreciated.

Cheers,
Ivan

Reply via email to