On Tue, Oct 22, 2024 at 12:36 PM Robert Bradshaw <rober...@google.com> wrote:
>
> On Tue, Oct 22, 2024 at 11:46 AM Danny McCormick
> <dannymccorm...@google.com> wrote:
> >
> > > (1a) Provide a special operation "Unnest" that takes a single field
> > > and emits it as the top-level element. This can of course result in
> > > unschema'd PCollections (which are supported, but generally don't play
> > > as well with the other operations, including xlang ones).
> >
> > I like this the most out of the options - why does it have to be unschema'd
> > though? Couldn't we retain that information from previous steps? If not, I
> > don't see a way around losing schema info.
>
> Yes, if the unnested element itself is schema'd, that is preserved. If
> it's, say, an int, it will be a bare PCollection of ints. (Which isn't
> the end of the world...)
>
> Naming is also still TBD. I just realized that unnest has the meaning
> of iteration/flatten in some SQL dialects. For our dynamic
> destinations we chose the keyword "only" to indicate that we want to
> only write a specified field (as a top level record) rather than the
> entire record.

Another alternative is to have a "Project" transform with keep/drop/only fields, which would parallel what we're doing for dynamic destinations and run inference. I'm still thinking StripErrorMetadata might be nice to really lower the bar for discoverability and readability for newcomers.
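
For concreteness, here is a rough sketch in plain Python of how keep/drop/only might behave on a single record. This is not an existing Beam API: the `project` function, its parameter names, and the `msg`/`stack` field names are all hypothetical, and records are modeled as plain dicts rather than schema'd rows.

```python
from typing import Any, Dict, Iterable, Optional


def project(
    record: Dict[str, Any],
    keep: Optional[Iterable[str]] = None,
    drop: Optional[Iterable[str]] = None,
    only: Optional[str] = None,
) -> Any:
    """Hypothetical semantics for a "Project" step (illustration only)."""
    # Require exactly one of the three options so the intent is unambiguous.
    options = (("keep", keep), ("drop", drop), ("only", only))
    given = [name for name, value in options if value is not None]
    if len(given) != 1:
        raise ValueError(f"expected exactly one of keep/drop/only, got {given}")

    if only is not None:
        # Emit the single field as the top-level element. A nested record
        # keeps its structure; a primitive comes out as a bare value.
        return record[only]
    if keep is not None:
        keep_set = set(keep)
        return {k: v for k, v in record.items() if k in keep_set}
    drop_set = set(drop)
    return {k: v for k, v in record.items() if k not in drop_set}


# An error-output style record; the field names are made up for this example.
row = {"element": {"id": 1, "name": "a"}, "msg": "boom", "stack": "..."}

print(project(row, only="element"))         # the nested record on its own
print(project(row, drop=["msg", "stack"]))  # still wrapped under "element"
print(project(row, keep=["msg"]))           # just the error message field
```

Note the difference between `only` and `drop`: dropping the metadata fields leaves the original element nested under its field name, while `only` lifts it to the top level, which is the case where a bare int would lose its schema.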