On Wed, May 19, 2021 at 6:28 PM <tclem...@tutanota.com> wrote:

> I'm writing an app that processes an unbounded stream of filenames and then
> catalogs them.  What I'd like to do is to parse the files using AvroIO, but
> have each record entry paired with the original filename as a key.
>
> In the past I've used the combo FileIO.matchAll() -> FileIO.readMatches()
> -> AvroIO.readFilesGenericRecords(), but that loses the context of the
> original filename.  Is there a way to do this without reimplementing AvroIO?
>

The AvroIO transforms produce a PCollection<GenericRecord>, so information
about the original file is not preserved.
I don't think we have an AvroIO transform that preserves filenames yet, so
you'll probably have to use the FileIO transforms and do the reading yourself
in a DoFn.
We do have a new ContextualTextIO [1], but that is for text files.
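
A rough sketch of the DoFn approach, in case it helps. This is untested and
rests on some assumptions: the class name ReadAvroWithFilenameFn is made up
for illustration, the files are ordinary Avro container files, and the Beam
Java SDK and Avro are on the classpath. It takes the FileIO.ReadableFile
elements that FileIO.readMatches() produces and emits each record keyed by
the file it came from.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

/**
 * Hypothetical DoFn (not part of Beam): reads one matched Avro file and
 * emits a (filename, record) pair for every record in it.
 */
public class ReadAvroWithFilenameFn
    extends DoFn<FileIO.ReadableFile, KV<String, GenericRecord>> {

  @ProcessElement
  public void processElement(
      @Element FileIO.ReadableFile file,
      OutputReceiver<KV<String, GenericRecord>> out)
      throws IOException {
    // The resource id is the full path or URI of the matched file.
    String filename = file.getMetadata().resourceId().toString();

    // DataFileStream takes the writer schema from the Avro file header,
    // so no schema has to be supplied to the reader here.
    try (InputStream in = Channels.newInputStream(file.open());
        DataFileStream<GenericRecord> reader =
            new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord record : reader) {
        out.output(KV.of(filename, record));
      }
    }
  }
}
```

You would apply it with ParDo.of(new ReadAvroWithFilenameFn()) right after
FileIO.readMatches(). Two caveats: Beam cannot infer a coder for
GenericRecord, so you will likely need to set one explicitly on the output
(for example a KvCoder built from a string coder and an AvroCoder for your
schema), and each file is read sequentially inside a single processElement
call, so you lose the file splitting that AvroIO would otherwise do.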

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/master/sdks/java/io/contextualtextio/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIO.java


> Appreciate the help.
>
> -- Tim.
