Another option is to add a second DLQ that outputs just the original rows, i.e. the user has the option to fetch failed rows with or without metadata. It would take some work on our side to add this second DLQ to existing transforms, but that seems pretty straightforward.
On Sat, Oct 19, 2024 at 1:03 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote: > I came across an interesting user report at > https://github.com/apache/beam/issues/32866 which made me realize that > providing metadata about a bad element in the "bad records" output is > useful, we don't make it easy to extract the output into a PCollection > of the original elements. The output schema contains the original > element as well as metadata about what error occurred, and in an > ordinary Beam pipeline one could easily apply a Map(lambda error_row: > error_row.element) but YAML doesn't have Map, just MapToFields > (primarily to be more schema friendly). > > There are a couple of options: > > (0) Leave things as they are. One can write > > type: MapToFields > config: > fields: > fld1: element.fld1 > fld2: element.fld2 > ... > > > This is of course a bit ugly as one needs to enumerate (and know) the > set of original fields. > > (1a) Provide a special operation "Unnest" that takes a single field > and emits it as the top-level element. This can of course result in > unschema'd PCollections (which are supported, but generally don't play > as well with the other operations, including xlang ones). > > (1b) Just provide a Map. This is a generalization of 1a, but on the > other hand would be more prone to abuse. > > (1c) We could name this > > type: MapToFields > config: > fields: > *: element > > IIRC, we already have the special case of "*" in our join syntax, and > we could re-use a bunch of the MapToFields infrastructure. But maybe > it's too obscure? > > (2) Add an optional argument to error_handling to omit the metadata. > This would require a bit of a hack to support ubiquitously, and > wouldn't solve the more general problem. > > Maybe there are some other ideas as well? >