Definitely could be a top-level transform. Should it automatically unnest all arrays, or just the fields specified?
We do have to define the semantics for nested arrays as well.

On Wed, Jan 13, 2021 at 10:57 AM Robert Bradshaw <rober...@google.com> wrote:

> Ah, thanks for the clarification. UNNEST does sound like what you want
> here, and would likely make sense as a top-level relational transform as
> well as being supported by SQL.
>
> On Wed, Jan 13, 2021 at 10:53 AM Tao Li <t...@zillow.com> wrote:
>
>> @Kyle Weaver <kcwea...@google.com> sure thing! So the input/output
>> definition for the Flatten.Iterables
>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
>> is:
>>
>> Input: PCollection<Iterable<T>>
>>
>> Output: PCollection<T>
>>
>> The input/output for an explode transform would look like this:
>>
>> Input: PCollection<Row> The row schema has a field which is an array of
>> T
>>
>> Output: PCollection<Row> The array type field from input schema is
>> replaced with a new field of type T. The elements from the array type field
>> are flattened into multiple rows in the new table (other fields of input
>> table are just duplicated).
>>
>> Hope this clarification helps!
>>
>> *From: *Kyle Weaver <kcwea...@google.com>
>> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
>> *Date: *Tuesday, January 12, 2021 at 4:58 PM
>> *To: *"user@beam.apache.org" <user@beam.apache.org>
>> *Cc: *Reuven Lax <re...@google.com>
>> *Subject: *Re: Is there an array explode function/transform?
>>
>> @Reuven Lax <re...@google.com> yes I am aware of that transform, but
>> that’s different from the explode operation I was referring to:
>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>
>> How is it different? It'd help if you could provide the signature (input
>> and output PCollection types) of the transform you have in mind.
>>
>> On Tue, Jan 12, 2021 at 4:49 PM Tao Li <t...@zillow.com> wrote:
>>
>> @Reuven Lax <re...@google.com> yes I am aware of that transform, but
>> that’s different from the explode operation I was referring to:
>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>
>> *From: *Reuven Lax <re...@google.com>
>> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
>> *Date: *Tuesday, January 12, 2021 at 2:04 PM
>> *To: *user <user@beam.apache.org>
>> *Subject: *Re: Is there an array explode function/transform?
>>
>> Have you tried Flatten.iterables?
>>
>> On Tue, Jan 12, 2021, 2:02 PM Tao Li <t...@zillow.com> wrote:
>>
>> Hi community,
>>
>> Is there a Beam function to explode an array (similar to Spark SQL’s
>> explode())? I did some research but did not find anything.
>>
>> BTW I think we can potentially use FlatMap to implement the explode
>> functionality, but a Beam-provided function would be very handy.
>>
>> Thanks a lot!
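
For reference, the explode semantics described in this thread (one output row per array element, other fields duplicated) can be sketched in plain Python. This is only an illustration under assumptions, not a Beam API: rows are modelled as dicts rather than Beam Row objects, and the names `explode`, `rows`, `tags` and `m` are made up for the example. It unnests only the field that is named, unnests only one level of a nested array, and emits nothing for an empty array; those are choices of this sketch, and they are exactly the semantics the thread says still need to be defined.

```python
from typing import Any, Dict, Iterator

Row = Dict[str, Any]  # stand-in for a schema'd row; an assumption of this sketch


def explode(row: Row, field: str) -> Iterator[Row]:
    """Yield one copy of `row` per element of row[field].

    The array-valued field is replaced by a single element; all other
    fields are copied unchanged. An empty array yields no rows.
    """
    for element in row[field]:
        out = dict(row)       # shallow copy so the input row is not mutated
        out[field] = element  # array field replaced by one of its elements
        yield out


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},  # empty array: contributes no output rows here
]
exploded = [out for row in rows for out in explode(row, "tags")]

# Nested arrays: only the outer level is unnested in this sketch.
nested = list(explode({"id": 3, "m": [[1, 2], [3]]}, "m"))

# In a Beam Python pipeline a generator like this could be handed to
# beam.FlatMap, e.g. beam.FlatMap(explode, "tags"); not executed here.
```

Because the function yields zero or more outputs per input, it matches the FlatMap workaround suggested in the original question.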