I would also not unnest arrays nested in arrays, just the top-level array of
the specified fields.
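
To make those semantics concrete, here is a rough sketch in plain Python. It is not Beam API: the `explode` helper and the dict-based rows are made up for illustration, and it only shows what "top-level array of the specified field" would mean for one row.

```python
from typing import Any, Dict, Iterator


def explode(row: Dict[str, Any], field: str) -> Iterator[Dict[str, Any]]:
    """Yield one copy of `row` per element of the top-level array in `field`.

    Hypothetical helper for illustration only; not part of Beam.
    Arrays nested inside the elements are left untouched.
    """
    for element in row[field]:
        out = dict(row)       # other fields are duplicated as-is
        out[field] = element  # array field replaced by a single element
        yield out


# A row whose "tags" field is an array of arrays.
row = {"id": 1, "tags": [["a", "b"], ["c"]]}
exploded = list(explode(row, "tags"))
# Two output rows; the inner lists stay as lists (no recursive unnesting).
# An empty array would produce no output rows in this sketch.
```

In a pipeline, a function like this could presumably be applied with a FlatMap-style transform, as suggested at the start of the thread, but a real transform would also have to derive the output schema.
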
On Wed, 13 Jan 2021 at 20:58, Reuven Lax <re...@google.com> wrote:

> Nested fields are not part of standard SQL AFAIK. Beam goes further and
> supports array of array, etc.
>
> On Wed, Jan 13, 2021 at 11:42 AM Kenneth Knowles <k...@apache.org> wrote:
>
>> Just the fields specified, IMO. When in doubt, copy SQL. (and I mean SQL
>> generally, not just Beam SQL)
>>
>> Kenn
>>
>> On Wed, Jan 13, 2021 at 11:17 AM Reuven Lax <re...@google.com> wrote:
>>
>>> Definitely could be a top-level transform. Should it automatically
>>> unnest all arrays, or just the fields specified?
>>>
>>> We do have to define the semantics for nested arrays as well.
>>>
>>> On Wed, Jan 13, 2021 at 10:57 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> Ah, thanks for the clarification. UNNEST does sound like what you want
>>>> here, and would likely make sense as a top-level relational transform as
>>>> well as being supported by SQL.
>>>>
>>>> On Wed, Jan 13, 2021 at 10:53 AM Tao Li <t...@zillow.com> wrote:
>>>>
>>>>> @Kyle Weaver <kcwea...@google.com> sure thing! So the input/output
>>>>> definition for the Flatten.Iterables
>>>>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
>>>>> is:
>>>>>
>>>>>
>>>>>
>>>>> Input: PCollection<Iterable<T>>
>>>>>
>>>>> Output: PCollection<T>
>>>>>
>>>>>
>>>>>
>>>>> The input/output for an explode transform would look like this:
>>>>>
>>>>> Input:  PCollection<Row> The row schema has a field which is an array
>>>>> of T
>>>>>
>>>>> Output: PCollection<Row> The array type field from input schema is
>>>>> replaced with a new field of type T. The elements from the array type
>>>>> field are flattened into multiple rows in the new table (other fields of
>>>>> the input table are just duplicated).
>>>>>
>>>>>
>>>>>
>>>>> Hope this clarification helps!
>>>>>
>>>>>
>>>>>
>>>>> *From: *Kyle Weaver <kcwea...@google.com>
>>>>> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
>>>>> *Date: *Tuesday, January 12, 2021 at 4:58 PM
>>>>> *To: *"user@beam.apache.org" <user@beam.apache.org>
>>>>> *Cc: *Reuven Lax <re...@google.com>
>>>>> *Subject: *Re: Is there an array explode function/transform?
>>>>>
>>>>>
>>>>>
>>>>> @Reuven Lax <re...@google.com> yes I am aware of that transform, but
>>>>> that’s different from the explode operation I was referring to:
>>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>>
>>>>>
>>>>>
>>>>> How is it different? It'd help if you could provide the signature
>>>>> (input and output PCollection types) of the transform you have in mind.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 12, 2021 at 4:49 PM Tao Li <t...@zillow.com> wrote:
>>>>>
>>>>> @Reuven Lax <re...@google.com> yes I am aware of that transform, but
>>>>> that’s different from the explode operation I was referring to:
>>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>>
>>>>>
>>>>>
>>>>> *From: *Reuven Lax <re...@google.com>
>>>>> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
>>>>> *Date: *Tuesday, January 12, 2021 at 2:04 PM
>>>>> *To: *user <user@beam.apache.org>
>>>>> *Subject: *Re: Is there an array explode function/transform?
>>>>>
>>>>>
>>>>>
>>>>> Have you tried Flatten.iterables?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 12, 2021, 2:02 PM Tao Li <t...@zillow.com> wrote:
>>>>>
>>>>> Hi community,
>>>>>
>>>>>
>>>>>
>>>>> Is there a Beam function to explode an array (similar to Spark SQL’s
>>>>> explode())? I did some research but did not find anything.
>>>>>
>>>>>
>>>>>
>>>>> BTW I think we can potentially use FlatMap to implement the explode
>>>>> functionality, but a Beam provided function would be very handy.
>>>>>
>>>>>
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>
