I think it makes sense to allow specifying more than one, if desired. This is equivalent to just stacking multiple Unnests. (Possibly one could even have a special syntax like "*" for all array fields.)
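To make the "stacking" equivalence concrete, here is a plain-Python sketch (not Beam API — `unnest`, the dict-based rows, and the field names are all illustrative): unnesting several array fields at once produces, per input row, the cross product of the elements of those fields, which is exactly what applying single-field unnests one after another yields.

```python
from itertools import product

def unnest(rows, *fields):
    """Explode one or more array fields. Multiple fields behave like
    stacked single-field unnests, i.e. a cross product per input row."""
    for row in rows:
        # One output row per combination of elements drawn from each field;
        # all other fields are duplicated unchanged.
        for combo in product(*(row[f] for f in fields)):
            yield {**row, **dict(zip(fields, combo))}

rows = [{"id": 1, "a": [1, 2], "b": ["x", "y"]}]
list(unnest(rows, "a", "b"))
# → 4 rows: (a=1, b='x'), (a=1, b='y'), (a=2, b='x'), (a=2, b='y'),
# the same result (and order) as unnest(unnest(rows, "a"), "b")
```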
On Thu, Jan 14, 2021 at 10:05 AM Reuven Lax <re...@google.com> wrote:

> Should Unnest be allowed to specify multiple array fields, or just one?
>
> On Wed, Jan 13, 2021 at 11:59 PM Manninger, Matyas <matyas.mannin...@veolia.com> wrote:
>
>> I would also not unnest arrays nested in arrays, just the top-level array of the specified fields.
>>
>> On Wed, 13 Jan 2021 at 20:58, Reuven Lax <re...@google.com> wrote:
>>
>>> Nested fields are not part of standard SQL AFAIK. Beam goes further and supports array of array, etc.
>>>
>>> On Wed, Jan 13, 2021 at 11:42 AM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Just the fields specified, IMO. When in doubt, copy SQL (and I mean SQL generally, not just Beam SQL).
>>>>
>>>> Kenn
>>>>
>>>> On Wed, Jan 13, 2021 at 11:17 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Definitely could be a top-level transform. Should it automatically unnest all arrays, or just the fields specified?
>>>>>
>>>>> We do have to define the semantics for nested arrays as well.
>>>>>
>>>>> On Wed, Jan 13, 2021 at 10:57 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>
>>>>>> Ah, thanks for the clarification. UNNEST does sound like what you want here, and would likely make sense as a top-level relational transform as well as being supported by SQL.
>>>>>>
>>>>>> On Wed, Jan 13, 2021 at 10:53 AM Tao Li <t...@zillow.com> wrote:
>>>>>>
>>>>>>> @Kyle Weaver <kcwea...@google.com> sure thing!
>>>>>>> So the input/output definition for Flatten.Iterables
>>>>>>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html> is:
>>>>>>>
>>>>>>> Input: PCollection<Iterable<T>>
>>>>>>> Output: PCollection<T>
>>>>>>>
>>>>>>> The input/output for an explode transform would look like this:
>>>>>>>
>>>>>>> Input: PCollection<Row>, where the row schema has a field which is an array of T.
>>>>>>> Output: PCollection<Row>, where the array-type field from the input schema is replaced with a new field of type T. The elements from the array-type field are flattened into multiple rows in the output (other fields of the input are just duplicated).
>>>>>>>
>>>>>>> Hope this clarification helps!
>>>>>>>
>>>>>>> *From:* Kyle Weaver <kcwea...@google.com>
>>>>>>> *Reply-To:* "user@beam.apache.org" <user@beam.apache.org>
>>>>>>> *Date:* Tuesday, January 12, 2021 at 4:58 PM
>>>>>>> *To:* "user@beam.apache.org" <user@beam.apache.org>
>>>>>>> *Cc:* Reuven Lax <re...@google.com>
>>>>>>> *Subject:* Re: Is there an array explode function/transform?
>>>>>>>
>>>>>>> @Reuven Lax <re...@google.com> yes I am aware of that transform, but that's different from the explode operation I was referring to:
>>>>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>>>>
>>>>>>> How is it different?
>>>>>>> It'd help if you could provide the signature (input and output PCollection types) of the transform you have in mind.
>>>>>>>
>>>>>>> On Tue, Jan 12, 2021 at 4:49 PM Tao Li <t...@zillow.com> wrote:
>>>>>>>
>>>>>>> @Reuven Lax <re...@google.com> yes I am aware of that transform, but that's different from the explode operation I was referring to:
>>>>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>>>>
>>>>>>> *From:* Reuven Lax <re...@google.com>
>>>>>>> *Reply-To:* "user@beam.apache.org" <user@beam.apache.org>
>>>>>>> *Date:* Tuesday, January 12, 2021 at 2:04 PM
>>>>>>> *To:* user <user@beam.apache.org>
>>>>>>> *Subject:* Re: Is there an array explode function/transform?
>>>>>>>
>>>>>>> Have you tried Flatten.iterables?
>>>>>>>
>>>>>>> On Tue, Jan 12, 2021, 2:02 PM Tao Li <t...@zillow.com> wrote:
>>>>>>>
>>>>>>> Hi community,
>>>>>>>
>>>>>>> Is there a Beam function to explode an array (similarly to Spark SQL's explode())? I did some research but did not find anything.
>>>>>>>
>>>>>>> BTW I think we can potentially use FlatMap to implement the explode functionality, but a Beam-provided function would be very handy.
>>>>>>>
>>>>>>> Thanks a lot!
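As a footnote to the thread: the FlatMap idea from the original question can be sketched in plain Python, with dicts standing in for Beam Rows. `explode` here is a hypothetical helper (not an existing Beam function) of the kind one would pass to `beam.FlatMap` — it yields one copy of the row per array element, which matches the input/output signature Tao described.

```python
def explode(row, field):
    """Yield one copy of the row per element of the given array field,
    with the array replaced by the single element; all other fields
    are duplicated unchanged."""
    for element in row[field]:
        yield {**row, field: element}

# In a pipeline this would be applied as e.g. beam.FlatMap(explode, "tags").
row = {"user": "alice", "tags": ["a", "b", "c"]}
list(explode(row, "tags"))
# → three rows, with tags == "a", "b", and "c" respectively
```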