On Wed, May 19, 2021 at 6:28 PM <tclem...@tutanota.com> wrote:
> I'm writing an app that processes an unbounded stream of filenames and then
> catalogs them. What I'd like to do is to parse the files using AvroIO, but
> have each record entry paired with the original filename as a key.
>
> In the past I've used the combo FileIO.matchAll() -> FileIO.readMatches()
> -> AvroIO.readFilesGenericRecords(), but that loses the context of the
> original filename. Is there a way to do this without reimplementing AvroIO?
>
If you use the AvroIO transforms, you end up with a PCollection<GenericRecord>,
so information about the original file is not preserved. I don't think we have
an AvroIO transform that preserves filenames yet, so you'll probably have to use
the FileIO transforms and perform the reading yourself in a DoFn. We do have a
new ContextualTextIO [1] that keeps this kind of per-record context, but it only
handles text files.

Thanks,
Cham

[1] https://github.com/apache/beam/blob/master/sdks/java/io/contextualtextio/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIO.java

> Appreciate the help.
>
> -- Tim.
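A minimal sketch of the "FileIO transforms plus a DoFn" approach, assuming the Beam Java SDK and the Avro library are on the classpath. `ReadAvroWithFilenameFn` is a hypothetical name, not an existing Beam transform: it opens each `FileIO.ReadableFile` produced by `FileIO.readMatches()`, reads it with Avro's `DataFileStream`, and emits every record as a `KV` keyed by the file's resource id.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

/**
 * Hypothetical DoFn (not part of Beam): reads every record from a matched Avro
 * container file and emits it keyed by the path of the file it came from.
 */
public class ReadAvroWithFilenameFn
    extends DoFn<FileIO.ReadableFile, KV<String, GenericRecord>> {

  @ProcessElement
  public void processElement(
      @Element FileIO.ReadableFile file,
      OutputReceiver<KV<String, GenericRecord>> out) throws IOException {
    // Full path/URI of the source file; this becomes the key of every record.
    String filename = file.getMetadata().resourceId().toString();

    // open() returns a ReadableByteChannel (decompressed according to the
    // readMatches() compression setting); DataFileStream wants an InputStream.
    try (InputStream in = Channels.newInputStream(file.open());
        // A GenericDatumReader created without a schema uses the writer schema
        // stored in the Avro file header.
        DataFileStream<GenericRecord> records =
            new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord record : records) {
        // Iterating allocates a fresh record each time, so emitting it is safe.
        out.output(KV.of(filename, record));
      }
    }
  }
}
```

Usage, assuming `filenames` is your PCollection<String> of file patterns and `schema` is the Avro schema you expect: `filenames.apply(FileIO.matchAll()).apply(FileIO.readMatches()).apply(ParDo.of(new ReadAvroWithFilenameFn()))`, followed by `.setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(schema)))`, because Beam cannot infer a coder for GenericRecord on its own. AvroCoder lives in `org.apache.beam.sdk.coders` in Beam releases of this era; newer releases move it to the separate Avro extension module. Note that this reads each file in a single DoFn call, so unlike AvroIO it won't split a large file across workers.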