I'm late replying to this, but I'm also in agreement with 1 (adopting the Spark variant encoding), 3 (specifically only having a Variant type), and 4 (ensuring we are thinking through subcolumnarization upfront, since without it the Variant type may not be that useful).
I'd also support having the spec and reference implementation in Iceberg; as others have said, it centralizes improvements in a single, agnostic dependency for engines, rather than engines having to take dependencies on other engine modules. Thanks, Amogh Jahagirdar

On Tue, Jul 23, 2024 at 12:15 AM Péter Váry <peter.vary.apa...@gmail.com> wrote: > I have been looking around at how we can map the Variant type in Flink. I have > not found any existing type which we could use, but Flink already has some > JSON parsing capabilities [1] for string fields. > > So until we have native support in Flink for something similar to the Variant > type, I expect that we need to map it to JSON strings in RowData. > > Based on that, here are my preferences: > 1. I'm ok with adopting the Spark Variant type, if we build our own Iceberg > serializer/deserializer module for it > 2. I prefer to move the spec to Iceberg, so we own it and can extend it if > needed. This could be important in the first phase. Later, when it is more > stable, we might donate it to some other project, like Parquet > 3. I would prefer to support only a single type, and Variant is more > expressive, but having a standard way to convert between JSON and Variant > would be useful for Flink users. > 4. On subcolumnarization: I think Flink will only use this feature as much > as the Iceberg readers implement it, so I would like to see as much as > possible of it in the common Iceberg code > > Thanks, > Peter > > [1] - > https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/systemfunctions/#json-functions > > >
On Tue, Jul 23, 2024, 06:36 Micah Kornfield <emkornfi...@gmail.com> wrote: > >> Sorry for the late reply. I agree with the sentiments on 1 and 3 that >> have already been posted (adopt the Spark encoding, and only have the >> Variant type). As mentioned on the doc for 3, I think it would be good to >> specify how to map scalar types to a JSON representation so there can be >> consistency between engines that don't support variant. >> >> >>> Regarding point 2, I also feel Iceberg is more natural to host such a >>> subproject for variant spec and implementation. But let me reach out to the >>> Spark community to discuss. >> >> >> The only other place I can think of that might be a good home for the >> Variant spec is Apache Arrow, as a canonical extension type. There >> is an issue for this [1]. I think the main consideration for where this is housed >> is which types are intended to be supported. I believe Arrow is currently >> a superset of the Iceberg type system (UUID is supported as a canonical >> extension type [2]). >> >> For point 4, subcolumnarization, I think ideally this belongs in Iceberg >> (and if Iceberg and Delta Lake can agree on how to do it, that would be >> great) with consultation with the Parquet/ORC communities to >> potentially add better native support. >> >> Thanks, >> Micah >> >> >> >> [1] https://github.com/apache/arrow/issues/42069 >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html >> >>
On Sat, Jul 20, 2024 at 5:54 PM Aihua Xu <aihu...@gmail.com> wrote: >> >>> Thanks for the discussion and feedback. >>> >>> Do we have consensus on points 1 and 3 to move forward with the >>> Spark variant encoding and support the Variant type only? Or let me know how to >>> proceed from here. >>> >>> Regarding point 2, I also feel Iceberg is more natural to host such a >>> subproject for variant spec and implementation. But let me reach out to the >>> Spark community to discuss.
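To make Péter's suggested Flink mapping concrete, here is a minimal sketch of the kind of query his JSON-string approach would enable, using Flink's built-in JSON functions [1] via the Table API. The `events` table and `payload` column are hypothetical stand-ins for a variant column surfaced as a JSON string in RowData; the datagen connector only produces placeholder strings, so the functions return NULL/FALSE here, but the query shape is the point:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class VariantAsJsonString {
        public static void main(String[] args) {
            TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Hypothetical table whose STRING column stands in for a variant
            // value that an Iceberg-Flink reader has mapped to a JSON string.
            env.executeSql(
                "CREATE TEMPORARY TABLE events (payload STRING) "
                    + "WITH ('connector' = 'datagen', 'number-of-rows' = '3')");

            // Flink's JSON functions can then extract nested fields from the
            // string. (Random datagen payloads are not valid JSON, so these
            // evaluate to NULL and FALSE under the default error handling.)
            Table result = env.sqlQuery(
                "SELECT JSON_VALUE(payload, '$.user.id') AS user_id, "
                    + "JSON_EXISTS(payload, '$.tags') AS has_tags "
                    + "FROM events");
            result.execute().print();
        }
    }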
>>> >>> Thanks, >>> Aihua >>> >>> >>>
On Fri, Jul 19, 2024 at 9:35 AM Yufei Gu <flyrain...@gmail.com> wrote: >>> >>>> Agreed with point 1. >>>> >>>> For point 2, I also prefer to host the spec and reference >>>> implementation under Iceberg. Here are the reasons: >>>> 1. It is unconventional and impractical for one engine to depend on >>>> another for data types. For instance, it is not ideal for Trino to rely on >>>> data types defined by the Spark engine. >>>> 2. Iceberg serves as a bridge between engines and file formats. By >>>> centralizing the specification in Iceberg, any future optimizations or >>>> updates to file formats can be referred to within Iceberg, ensuring >>>> consistency and reducing dependencies. >>>> >>>> For point 3, I'd prefer to support the Variant type only at the moment. >>>> >>>> Yufei >>>> >>>> >>>>
On Thu, Jul 18, 2024 at 12:55 PM Ryan Blue <b...@databricks.com.invalid> >>>> wrote: >>>> >>>>> Similarly, I'm aligned with point 1 and I'd choose to support only >>>>> Variant for point 3. >>>>> >>>>> We'll need to work with the Spark community to find a good place for >>>>> the library and spec, since it touches many different projects. I'd also >>>>> prefer Iceberg as the home. >>>>> >>>>> I also think it's a good idea to get subcolumnarization into our spec >>>>> when we update. Without that, I think the feature will be fairly limited. >>>>>
On Thu, Jul 18, 2024 at 10:56 AM Russell Spitzer < >>>>> russell.spit...@gmail.com> wrote: >>>>>> >>>>>> I'm aligned with point 1. >>>>>> >>>>>> For point 2, I think we should choose quickly. I honestly do think >>>>>> this would be fine as part of the Iceberg spec directly, but understand it >>>>>> may be better for the broader community if it were a subproject. As a >>>>>> sub-project, I would still prefer it being an Iceberg subproject, since we >>>>>> are engine/file-format agnostic. >>>>>> >>>>>> 3. I support adding just Variant. >>>>>>
On Thu, Jul 18, 2024 at 12:54 AM Aihua Xu <aihu...@apache.org> wrote: >>>>>>> >>>>>>> Hello community, >>>>>>> >>>>>>> It's great to sync up with some of you on Variant and >>>>>>> Subcolumnarization support in Iceberg again. Apologies that I didn't >>>>>>> record >>>>>>> the meeting, but here are some key items that we want to follow up with >>>>>>> the >>>>>>> community. >>>>>>> >>>>>>> 1. Adopt Spark Variant encoding >>>>>>> Those present were in favor of adopting the Spark variant encoding >>>>>>> for Iceberg Variant, with extensions to support other Iceberg types. We >>>>>>> would like to know if anyone has an objection to reusing an open >>>>>>> source encoding. >>>>>>> >>>>>>> 2. Movement of the Spark Variant Spec to another project >>>>>>> To avoid introducing Apache Spark as a dependency for the engines >>>>>>> and file formats, we discussed separating the Spark Variant encoding spec >>>>>>> and >>>>>>> implementation from the Spark project to a neutral location. We thought >>>>>>> up >>>>>>> several solutions but didn't reach consensus on any of them. We are >>>>>>> looking >>>>>>> for more feedback on this topic from the community, either in terms of >>>>>>> support for one of these options or another idea on how to support the >>>>>>> spec.
>>>>>>> >>>>>>> Options proposed: >>>>>>> * Leave the spec in Spark (difficult for versioning and other >>>>>>> engines) >>>>>>> * Copy the spec into the Iceberg project directly (difficult for >>>>>>> other table formats) >>>>>>> * Create a sub-project of Apache Iceberg and move the spec and >>>>>>> reference implementation there (logistically complicated) >>>>>>> * Create a sub-project of Apache Spark and move the spec and >>>>>>> reference implementation there (logistically complicated) >>>>>>> >>>>>>> 3. Add Variant type vs. Variant and JSON types >>>>>>> Those who were present were in favor of adding only the Variant type >>>>>>> to Iceberg. We are looking for anyone who has an objection to going >>>>>>> forward >>>>>>> with just the Variant type and no Iceberg JSON type. We were favoring >>>>>>> adding the Variant type only because: >>>>>>> * Introducing a JSON type would require engines that only support >>>>>>> VARIANT to do write-time validation of their input to a JSON column. An >>>>>>> engine without a JSON type wouldn't be able to support this. >>>>>>> * Engines which don't support Variant will work most of the time and >>>>>>> can have fallback strings defined in the spec for reading unsupported >>>>>>> types. Writing JSON into a Variant will always work. >>>>>>> >>>>>>> 4. Support for Subcolumnarization spec (shredding in Spark) >>>>>>> We have no action items on this but would like to follow up on >>>>>>> discussions on Subcolumnarization in the future. >>>>>>> * We had general agreement that this should be included in Iceberg >>>>>>> V3, or else adding Variant may not be useful. >>>>>>> * We are interested in also adopting the shredding spec from Spark >>>>>>> and would like to move it to whatever place we decide the Variant spec >>>>>>> is >>>>>>> going to live. >>>>>>> >>>>>>> Let us know if we missed anything and if you have any additional >>>>>>> thoughts or suggestions. >>>>>>> >>>>>>> Thanks, >>>>>>> Aihua >>>>>>> >>>>>>> >>>>>>>
On 2024/07/15 18:32:22 Aihua Xu wrote: >>>>>>> > Thanks for the discussion. >>>>>>> > >>>>>>> > I will move forward to work on the spec PR. >>>>>>> > >>>>>>> > Regarding the implementation, we will have a module for Variant >>>>>>> support in Iceberg, so we will not have to bring in Spark libraries. >>>>>>> > >>>>>>> > I'm reposting the meeting invite in case it's not clear in my >>>>>>> original email, since I included it at the end. Looks like we don't have >>>>>>> major >>>>>>> objections/divergences, but let's sync up and reach consensus. >>>>>>> > >>>>>>> > Meeting invite: >>>>>>> > >>>>>>> > Wednesday, July 17 · 9:00 – 10:00am >>>>>>> > Time zone: America/Los_Angeles >>>>>>> > Google Meet joining info >>>>>>> > Video call link: https://meet.google.com/pbm-ovzn-aoq >>>>>>> > Or dial: (US) +1 650-449-9343 PIN: 170 576 525# >>>>>>> > More phone numbers: >>>>>>> https://tel.meet/pbm-ovzn-aoq?pin=4079632691790 >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Aihua >>>>>>> > >>>>>>>
On 2024/07/12 20:55:01 Micah Kornfield wrote: >>>>>>> > > I don't think this needs to hold up the PR, but I think coming to >>>>>>> a >>>>>>> > > consensus on the exact set of types supported is worthwhile (and >>>>>>> if the >>>>>>> > > goal is to maintain the same set as specified by the Spark >>>>>>> Variant type or >>>>>>> > > if divergence is expected/allowed). From a fragmentation >>>>>>> perspective, it >>>>>>> > > would be a shame if they diverge, so maybe a next step is also >>>>>>> suggesting >>>>>>> > > to the Spark community that they support the missing Iceberg >>>>>>> types?
>>>>>>> > > >>>>>>> > > Thanks, >>>>>>> > > Micah >>>>>>> > > >>>>>>> > >
On Fri, Jul 12, 2024 at 1:44 PM Russell Spitzer < >>>>>>> russell.spit...@gmail.com> >>>>>>> > > wrote: >>>>>>> > > >>>>>>> > > > Just talked with Aihua and he's working on the spec PR now. We >>>>>>> can get >>>>>>> > > > feedback there from everyone. >>>>>>> > > > >>>>>>> > > >
On Fri, Jul 12, 2024 at 3:41 PM Ryan Blue >>>>>>> <b...@databricks.com.invalid> >>>>>>> > > > wrote: >>>>>>> > > > >>>>>>> > > >> Good idea, but I'm hoping that we can continue to get their >>>>>>> feedback in >>>>>>> > > >> parallel to getting the spec changes started. Piotr didn't >>>>>>> seem to object >>>>>>> > > >> to the encoding from what I read of his comments. Hopefully >>>>>>> he (and others) >>>>>>> > > >> chime in here. >>>>>>> > > >> >>>>>>> > > >>
On Fri, Jul 12, 2024 at 1:32 PM Russell Spitzer < >>>>>>> > > >> russell.spit...@gmail.com> wrote: >>>>>>> > > >> >>>>>>> > > >>> I just want to make sure we get Piotr and Peter on board as >>>>>>> > > >>> representatives of the Flink and Trino engines. Also make sure >>>>>>> we have anyone >>>>>>> > > >>> else chime in who has experience with Ray if possible. >>>>>>> > > >>> >>>>>>> > > >>> Spec changes feel like the right next step. >>>>>>> > > >>> >>>>>>> > > >>>
On Fri, Jul 12, 2024 at 3:14 PM Ryan Blue >>>>>>> <b...@databricks.com.invalid> >>>>>>> > > >>> wrote: >>>>>>> > > >>> >>>>>>> > > >>>> Okay, what are the next steps here? This proposal has been >>>>>>> out for >>>>>>> > > >>>> quite a while and I don't see any major objections to using >>>>>>> the Spark >>>>>>> > > >>>> encoding. It's quite well designed and fits the need well. >>>>>>> It can also be >>>>>>> > > >>>> extended to support additional types that are missing if >>>>>>> that's a priority. >>>>>>> > > >>>> >>>>>>> > > >>>> Should we move forward by starting a draft of the changes >>>>>>> to the table >>>>>>> > > >>>> spec? Then we can vote on committing those changes and get >>>>>>> moving on an >>>>>>> > > >>>> implementation (or possibly do the implementation in >>>>>>> parallel). >>>>>>> > > >>>> >>>>>>> > > >>>>
On Fri, Jul 12, 2024 at 1:08 PM Russell Spitzer < >>>>>>> > > >>>> russell.spit...@gmail.com> wrote: >>>>>>> > > >>>> >>>>>>> > > >>>>> That's fair, I'm sold on an Iceberg module. >>>>>>> > > >>>>> >>>>>>> > > >>>>>
On Fri, Jul 12, 2024 at 2:53 PM Ryan Blue >>>>>>> <b...@databricks.com.invalid> >>>>>>> > > >>>>> wrote: >>>>>>> > > >>>>> >>>>>>> > > >>>>>> > Feels like eventually the encoding should land in >>>>>>> Parquet proper >>>>>>> > > >>>>>> right? >>>>>>> > > >>>>>> >>>>>>> > > >>>>>> What about using it in ORC? I don't know where it should >>>>>>> end up. >>>>>>> > > >>>>>> Maybe Iceberg should make a standalone module from it? >>>>>>> > > >>>>>> >>>>>>> > > >>>>>>
On Fri, Jul 12, 2024 at 12:38 PM Russell Spitzer < >>>>>>> > > >>>>>> russell.spit...@gmail.com> wrote: >>>>>>> > > >>>>>> >>>>>>> > > >>>>>>> Feels like eventually the encoding should land in >>>>>>> Parquet proper >>>>>>> > > >>>>>>> right? I'm fine with us just copying it into Iceberg though >>>>>>> for the time >>>>>>> > > >>>>>>> being.
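As a rough illustration of what such a standalone module's surface could look like, here is a hypothetical sketch; none of these names exist today, and the actual binary layout would be whatever the adopted Spark variant spec defines (a metadata buffer holding a key dictionary plus a value buffer):

    // Hypothetical, engine-agnostic API for a standalone Iceberg variant module.
    public interface VariantValue {

        enum Kind { NULL, BOOLEAN, LONG, DOUBLE, DECIMAL, STRING, BINARY, OBJECT, ARRAY }

        Kind kind();

        /** Field lookup by key for OBJECT values; null when the key is absent. */
        VariantValue field(String name);

        /** Element lookup for ARRAY values. */
        VariantValue element(int index);

        /** Fallback JSON rendering for engines without native variant support. */
        String toJson();

        /** Would decode a (metadata, value) buffer pair into a navigable value. */
        static VariantValue parse(byte[] metadata, byte[] value) {
            throw new UnsupportedOperationException("sketch only");
        }
    }

A self-contained surface like this would let Flink, Trino, and others consume variant data without pulling Spark (or Scala) onto the classpath, which is the dependency concern raised in the quoted messages below.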
>>>>>>> > > >>>>>>> >>>>>>> > > >>>>>>>
On Fri, Jul 12, 2024 at 2:31 PM Ryan Blue >>>>>>> > > >>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>> > > >>>>>>> >>>>>>> > > >>>>>>>> Oops, it looks like I missed where Aihua brought this >>>>>>> up in his >>>>>>> > > >>>>>>>> last email: >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>> > do we have an issue with directly using the Spark >>>>>>> implementation in >>>>>>> > > >>>>>>>> Iceberg? >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>> Yes, I think that we do have an issue using the Spark >>>>>>> library. What >>>>>>> > > >>>>>>>> do you think about a Java implementation in Iceberg? >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>> Ryan >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>>
On Fri, Jul 12, 2024 at 12:28 PM Ryan Blue < >>>>>>> b...@databricks.com> >>>>>>> > > >>>>>>>> wrote: >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>>> I raised the same point from Peter's email in a >>>>>>> comment on the doc >>>>>>> > > >>>>>>>>> as well. There is a spark-variant_2.13 artifact that >>>>>>> would be a much >>>>>>> > > >>>>>>>>> smaller scope than relying on large portions of Spark, >>>>>>> but even then I >>>>>>> > > >>>>>>>>> doubt that it is a good idea for Iceberg to depend on >>>>>>> that, because it is a >>>>>>> > > >>>>>>>>> Scala artifact and we would need to bring in a ton of >>>>>>> Scala libs. I think >>>>>>> > > >>>>>>>>> what makes the most sense is to have an independent >>>>>>> implementation of the >>>>>>> > > >>>>>>>>> spec in Iceberg. >>>>>>> > > >>>>>>>>> >>>>>>> > > >>>>>>>>>
On Fri, Jul 12, 2024 at 11:51 AM Péter Váry < >>>>>>> > > >>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>> > > >>>>>>>>> >>>>>>> > > >>>>>>>>>> Hi Aihua, >>>>>>> > > >>>>>>>>>> Long time no see :) >>>>>>> > > >>>>>>>>>> Would this mean that every engine which plans to >>>>>>> support the Variant >>>>>>> > > >>>>>>>>>> data type needs to add Spark as a dependency? Like >>>>>>> Flink/Trino/Hive etc? >>>>>>> > > >>>>>>>>>> Thanks, Peter >>>>>>> > > >>>>>>>>>> >>>>>>> > > >>>>>>>>>> >>>>>>> > > >>>>>>>>>>
On Fri, Jul 12, 2024, 19:10 Aihua Xu < >>>>>>> aihu...@apache.org> wrote: >>>>>>> > > >>>>>>>>>> >>>>>>> > > >>>>>>>>>>> Thanks Ryan. >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>> Yeah. That's another reason we want to pursue the Spark >>>>>>> encoding: to >>>>>>> > > >>>>>>>>>>> keep compatibility for the open source engines. >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>> One more question regarding the encoding >>>>>>> implementation: do we >>>>>>> > > >>>>>>>>>>> have an issue with directly using the Spark implementation >>>>>>> in Iceberg? Russell >>>>>>> > > >>>>>>>>>>> pointed out that Trino doesn't have a Spark dependency >>>>>>> and that could be a >>>>>>> > > >>>>>>>>>>> problem? >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>> Thanks, >>>>>>> > > >>>>>>>>>>> Aihua >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>>
On 2024/07/12 15:02:06 Ryan Blue wrote: >>>>>>> > > >>>>>>>>>>> > Thanks, Aihua! >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > I think that the encoding choice in the current >>>>>>> doc is a good >>>>>>> > > >>>>>>>>>>> one. I went >>>>>>> > > >>>>>>>>>>> > through the Spark encoding in detail and it looks >>>>>>> like a >>>>>>> > > >>>>>>>>>>> better choice than >>>>>>> > > >>>>>>>>>>> > the other candidate encodings for quickly >>>>>>> accessing nested >>>>>>> > > >>>>>>>>>>> fields.
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > Another reason to use the Spark type is that this >>>>>>> is what >>>>>>> > > >>>>>>>>>>> Delta's variant >>>>>>> > > >>>>>>>>>>> > type is based on, so Parquet files in tables >>>>>>> written by Delta >>>>>>> > > >>>>>>>>>>> could be >>>>>>> > > >>>>>>>>>>> > converted or used in Iceberg tables without >>>>>>> needing to rewrite >>>>>>> > > >>>>>>>>>>> variant >>>>>>> > > >>>>>>>>>>> > data. (Also, note that I work at Databricks and >>>>>>> have an >>>>>>> > > >>>>>>>>>>> interest in >>>>>>> > > >>>>>>>>>>> > increasing format compatibility.) >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > Ryan >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> >
On Thu, Jul 11, 2024 at 11:21 AM Aihua Xu < >>>>>>> > > >>>>>>>>>>> aihua...@snowflake.com.invalid> >>>>>>> > > >>>>>>>>>>> > wrote: >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > > [Discuss] Consensus for Variant Encoding >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > It was great to be able to present the Variant >>>>>>> type proposal >>>>>>> > > >>>>>>>>>>> in the >>>>>>> > > >>>>>>>>>>> > > community sync yesterday, and I'm looking to host >>>>>>> a meeting >>>>>>> > > >>>>>>>>>>> next week >>>>>>> > > >>>>>>>>>>> > > (targeting 9am, July 17th) to go over any >>>>>>> further >>>>>>> > > >>>>>>>>>>> concerns about the >>>>>>> > > >>>>>>>>>>> > > encoding of the Variant type and any other >>>>>>> questions on the >>>>>>> > > >>>>>>>>>>> first phase of >>>>>>> > > >>>>>>>>>>> > > the proposal >>>>>>> > > >>>>>>>>>>> > > < >>>>>>> > > >>>>>>>>>>> >>>>>>> https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit >>>>>>> > > >>>>>>>>>>> >. >>>>>>> > > >>>>>>>>>>> > > We are hoping that anyone who is interested in >>>>>>> the proposal >>>>>>> > > >>>>>>>>>>> can either join >>>>>>> > > >>>>>>>>>>> > > or reply with their comments so we can discuss >>>>>>> them. A summary >>>>>>> > > >>>>>>>>>>> of the >>>>>>> > > >>>>>>>>>>> > > discussion and notes will be sent to the mailing >>>>>>> list for >>>>>>> > > >>>>>>>>>>> further comment >>>>>>> > > >>>>>>>>>>> > > there. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > - >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > What should be the underlying binary >>>>>>> representation? >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > We have evaluated a few encodings in the doc, >>>>>>> including ION, >>>>>>> > > >>>>>>>>>>> JSONB, and the >>>>>>> > > >>>>>>>>>>> > > Spark encoding. Choosing the underlying encoding >>>>>>> is an >>>>>>> > > >>>>>>>>>>> important first step >>>>>>> > > >>>>>>>>>>> > > here, and we believe we have general support for >>>>>>> Spark's >>>>>>> > > >>>>>>>>>>> Variant encoding. >>>>>>> > > >>>>>>>>>>> > > We would like to hear if anyone else has strong >>>>>>> opinions in >>>>>>> > > >>>>>>>>>>> this space. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > - >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Should we support multiple logical types or >>>>>>> just Variant? >>>>>>> > > >>>>>>>>>>> Variant vs. >>>>>>> > > >>>>>>>>>>> > > Variant + JSON. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > This is to discuss which logical data type(s) should >>>>>>> be supported >>>>>>> > > >>>>>>>>>>> in Iceberg - >>>>>>> > > >>>>>>>>>>> > > Variant only vs. Variant + JSON.
Both types >>>>>>> would share the >>>>>>> > > >>>>>>>>>>> same underlying >>>>>>> > > >>>>>>>>>>> > > encoding but would imply different limitations >>>>>>> on engines >>>>>>> > > >>>>>>>>>>> working with >>>>>>> > > >>>>>>>>>>> > > those types. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > From the sync-up meeting, we are favoring >>>>>>> supporting Variant >>>>>>> > > >>>>>>>>>>> > > only, and we want to have consensus on the >>>>>>> supported >>>>>>> > > >>>>>>>>>>> type(s). >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > - >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > How should we move forward with >>>>>>> Subcolumnarization? >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Subcolumnarization is an optimization for the Variant >>>>>>> type that >>>>>>> > > >>>>>>>>>>> separates out >>>>>>> > > >>>>>>>>>>> > > subcolumns with their own metadata. This is not >>>>>>> critical for >>>>>>> > > >>>>>>>>>>> choosing the >>>>>>> > > >>>>>>>>>>> > > initial encoding of the Variant type, so we were >>>>>>> hoping to >>>>>>> > > >>>>>>>>>>> gain consensus on >>>>>>> > > >>>>>>>>>>> > > leaving that for a follow-up spec. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Thanks >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Aihua >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Meeting invite: >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Wednesday, July 17 · 9:00 – 10:00am >>>>>>> > > >>>>>>>>>>> > > Time zone: America/Los_Angeles >>>>>>> > > >>>>>>>>>>> > > Google Meet joining info >>>>>>> > > >>>>>>>>>>> > > Video call link: >>>>>>> https://meet.google.com/pbm-ovzn-aoq >>>>>>> > > >>>>>>>>>>> > > Or dial: (US) +1 650-449-9343 PIN: 170 576 >>>>>>> 525# >>>>>>> > > >>>>>>>>>>> > > More phone numbers: >>>>>>> > > >>>>>>>>>>> https://tel.meet/pbm-ovzn-aoq?pin=4079632691790 >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > >
On Tue, May 28, 2024 at 9:21 PM Aihua Xu < >>>>>>> > > >>>>>>>>>>> aihua...@snowflake.com> wrote: >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > >> Hello, >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >> We have drafted the proposal >>>>>>> > > >>>>>>>>>>> > >> < >>>>>>> > > >>>>>>>>>>> >>>>>>> https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > >> for the Variant data type. Please help review and >>>>>>> comment. >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >> Thanks, >>>>>>> > > >>>>>>>>>>> > >> Aihua >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >>
On Thu, May 16, 2024 at 12:45 PM Jack Ye < >>>>>>> > > >>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >>> +10000 for a JSON/BSON type. We also had the >>>>>>> same >>>>>>> > > >>>>>>>>>>> discussion internally, >>>>>>> > > >>>>>>>>>>> > >>> and a JSON type would really play well with, >>>>>>> for example, >>>>>>> > > >>>>>>>>>>> the SUPER type in >>>>>>> > > >>>>>>>>>>> > >>> Redshift: >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> >>>>>>> https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html, >>>>>>> > > >>>>>>>>>>> and >>>>>>> > > >>>>>>>>>>> > >>> can also provide better integration with the >>>>>>> Trino JSON >>>>>>> > > >>>>>>>>>>> type. >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>> Looking forward to the proposal!
>>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>> Best, >>>>>>> > > >>>>>>>>>>> > >>> Jack Ye >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>>
On Wed, May 15, 2024 at 9:37 AM Tyler Akidau >>>>>>> > > >>>>>>>>>>> > >>> <tyler.aki...@snowflake.com.invalid> wrote: >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>>>
On Tue, May 14, 2024 at 7:58 PM Gang Wu < >>>>>>> ust...@gmail.com> >>>>>>> > > >>>>>>>>>>> wrote: >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>>> > We may need some guidance on just how many >>>>>>> we need to >>>>>>> > > >>>>>>>>>>> look at; >>>>>>> > > >>>>>>>>>>> > >>>>> > we were planning on Spark and Trino, but >>>>>>> weren't sure >>>>>>> > > >>>>>>>>>>> how much >>>>>>> > > >>>>>>>>>>> > >>>>> > further down the rabbit hole we needed to >>>>>>> go. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>> There are some engines living outside the >>>>>>> Java world. It >>>>>>> > > >>>>>>>>>>> would be >>>>>>> > > >>>>>>>>>>> > >>>>> good if the proposal could cover the effort >>>>>>> it takes to >>>>>>> > > >>>>>>>>>>> integrate the >>>>>>> > > >>>>>>>>>>> > >>>>> variant type into them (e.g., Velox, >>>>>>> DataFusion, etc.). >>>>>>> > > >>>>>>>>>>> This is something >>>>>>> > > >>>>>>>>>>> > >>>>> that >>>>>>> > > >>>>>>>>>>> > >>>>> some proprietary Iceberg vendors also care >>>>>>> about. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> Ack, makes sense. We can make sure to share >>>>>>> some >>>>>>> > > >>>>>>>>>>> perspective on this. >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> > Not necessarily, no. As long as there's a >>>>>>> binary type >>>>>>> > > >>>>>>>>>>> and Iceberg and >>>>>>> > > >>>>>>>>>>> > >>>>> > the query engines are aware that the >>>>>>> binary column >>>>>>> > > >>>>>>>>>>> needs to be >>>>>>> > > >>>>>>>>>>> > >>>>> > interpreted as a variant, that should be >>>>>>> sufficient. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>> From the perspective of interoperability, it >>>>>>> would be >>>>>>> > > >>>>>>>>>>> good to support a >>>>>>> > > >>>>>>>>>>> > >>>>> native >>>>>>> > > >>>>>>>>>>> > >>>>> type in the file specs. Life will be easier >>>>>>> for projects >>>>>>> > > >>>>>>>>>>> like Apache >>>>>>> > > >>>>>>>>>>> > >>>>> XTable. >>>>>>> > > >>>>>>>>>>> > >>>>> File formats could also provide finer-grained >>>>>>> statistics >>>>>>> > > >>>>>>>>>>> for the variant >>>>>>> > > >>>>>>>>>>> > >>>>> type, which >>>>>>> > > >>>>>>>>>>> > >>>>> facilitates data skipping. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> Agreed, there can definitely be additional >>>>>>> value in >>>>>>> > > >>>>>>>>>>> native file format >>>>>>> > > >>>>>>>>>>> > >>>> integration. Just wanted to highlight that >>>>>>> it's not a >>>>>>> > > >>>>>>>>>>> strict requirement. >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> -Tyler >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>> Gang >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>
On Wed, May 15, 2024 at 6:49 AM Tyler Akidau >>>>>>> > > >>>>>>>>>>> > >>>>> <tyler.aki...@snowflake.com.invalid> wrote: >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>> Good to see you again as well, JB! Thanks!
>>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>> -Tyler >>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>
On Tue, May 14, 2024 at 1:04 PM >>>>>>> Jean-Baptiste Onofré < >>>>>>> > > >>>>>>>>>>> j...@nanthrax.net> >>>>>>> > > >>>>>>>>>>> > >>>>>> wrote: >>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Hi Tyler, >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Super happy to see you there :) It reminds >>>>>>> me of our >>>>>>> > > >>>>>>>>>>> discussions back at >>>>>>> > > >>>>>>>>>>> > >>>>>>> the start of Apache Beam :) >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Anyway, the thread is pretty interesting. >>>>>>> I remember >>>>>>> > > >>>>>>>>>>> some discussions >>>>>>> > > >>>>>>>>>>> > >>>>>>> about a JSON datatype for spec v3. The >>>>>>> binary data type >>>>>>> > > >>>>>>>>>>> is already >>>>>>> > > >>>>>>>>>>> > >>>>>>> supported in spec v2. >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> I'm looking forward to the proposal and >>>>>>> happy to help >>>>>>> > > >>>>>>>>>>> on this! >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Regards >>>>>>> > > >>>>>>>>>>> > >>>>>>> JB >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>>
On Sat, May 11, 2024 at 7:06 AM Tyler >>>>>>> Akidau >>>>>>> > > >>>>>>>>>>> > >>>>>>> <tyler.aki...@snowflake.com.invalid> >>>>>>> wrote: >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Hello, >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > We (Tyler, Nileema, Selcuk, Aihua) are >>>>>>> working on a >>>>>>> > > >>>>>>>>>>> proposal for >>>>>>> > > >>>>>>>>>>> > >>>>>>> which we'd like to get early feedback from >>>>>>> the >>>>>>> > > >>>>>>>>>>> community. As you may know, >>>>>>> > > >>>>>>>>>>> > >>>>>>> Snowflake has embraced Iceberg as its open >>>>>>> data lake >>>>>>> > > >>>>>>>>>>> format. Having made >>>>>>> > > >>>>>>>>>>> > >>>>>>> good progress on our own adoption of the >>>>>>> Iceberg >>>>>>> > > >>>>>>>>>>> standard, we're now in a >>>>>>> > > >>>>>>>>>>> > >>>>>>> position where there are features not yet >>>>>>> supported in >>>>>>> > > >>>>>>>>>>> Iceberg which we >>>>>>> > > >>>>>>>>>>> > >>>>>>> think would be valuable for our users, and >>>>>>> that we >>>>>>> > > >>>>>>>>>>> would like to discuss >>>>>>> > > >>>>>>>>>>> > >>>>>>> with and help contribute to the Iceberg >>>>>>> community. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > The first two such features we'd like to >>>>>>> discuss are >>>>>>> > > >>>>>>>>>>> in support of >>>>>>> > > >>>>>>>>>>> > >>>>>>> efficient querying of dynamically typed, >>>>>>> > > >>>>>>>>>>> semi-structured data: variant data >>>>>>> > > >>>>>>>>>>> > >>>>>>> types, and subcolumnarization of variant >>>>>>> columns. In >>>>>>> > > >>>>>>>>>>> more detail, for >>>>>>> > > >>>>>>>>>>> > >>>>>>> anyone who may not already be familiar: >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > 1. Variant data types >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Variant types allow for the efficient >>>>>>> binary >>>>>>> > > >>>>>>>>>>> encoding of dynamic >>>>>>> > > >>>>>>>>>>> > >>>>>>> semi-structured data such as JSON, Avro, >>>>>>> etc.
By >>>>>>> encoding semi-structured >>>>>>> > > >>>>>>>>>>> data as a variant column, we retain the >>>>>>> flexibility of >>>>>>> > > >>>>>>>>>>> the source data, >>>>>>> > > >>>>>>>>>>> > >>>>>>> while allowing query engines to more >>>>>>> efficiently >>>>>>> > > >>>>>>>>>>> operate on the data. >>>>>>> > > >>>>>>>>>>> > >>>>>>> Snowflake has supported the variant data >>>>>>> type on >>>>>>> > > >>>>>>>>>>> Snowflake tables for many >>>>>>> > > >>>>>>>>>>> > >>>>>>> years [1]. As more and more users utilize >>>>>>> Iceberg >>>>>>> > > >>>>>>>>>>> tables in Snowflake, >>>>>>> > > >>>>>>>>>>> > >>>>>>> we're hearing an increasing chorus of >>>>>>> requests for >>>>>>> > > >>>>>>>>>>> variant support. >>>>>>> > > >>>>>>>>>>> > >>>>>>> Additionally, other query engines such as >>>>>>> Apache Spark >>>>>>> > > >>>>>>>>>>> have begun adding >>>>>>> > > >>>>>>>>>>> > >>>>>>> variant support [2]. As such, we believe >>>>>>> it would be >>>>>>> > > >>>>>>>>>>> beneficial to the >>>>>>> > > >>>>>>>>>>> > >>>>>>> Iceberg community as a whole to >>>>>>> standardize on the >>>>>>> > > >>>>>>>>>>> variant data type >>>>>>> > > >>>>>>>>>>> > >>>>>>> encoding used across Iceberg tables. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > One specific point to make here is that, >>>>>>> since an >>>>>>> > > >>>>>>>>>>> Apache OSS >>>>>>> > > >>>>>>>>>>> > >>>>>>> version of the variant encoding already exists >>>>>>> in Spark, >>>>>>> > > >>>>>>>>>>> it likely makes sense >>>>>>> > > >>>>>>>>>>> > >>>>>>> to simply adopt the Spark encoding as the >>>>>>> Iceberg >>>>>>> > > >>>>>>>>>>> standard as well. The >>>>>>> > > >>>>>>>>>>> > >>>>>>> encoding we use internally today in >>>>>>> Snowflake is >>>>>>> > > >>>>>>>>>>> slightly different, but >>>>>>> > > >>>>>>>>>>> > >>>>>>> essentially equivalent, and we see no >>>>>>> particular value >>>>>>> > > >>>>>>>>>>> in trying to clutter >>>>>>> > > >>>>>>>>>>> > >>>>>>> the space with another >>>>>>> equivalent-but-incompatible >>>>>>> > > >>>>>>>>>>> encoding. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > 2. Subcolumnarization >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Subcolumnarization of variant columns >>>>>>> allows query >>>>>>> > > >>>>>>>>>>> engines to >>>>>>> > > >>>>>>>>>>> > >>>>>>> efficiently prune datasets when subcolumns >>>>>>> (i.e., >>>>>>> > > >>>>>>>>>>> nested fields) within a >>>>>>> > > >>>>>>>>>>> > >>>>>>> variant column are queried, and also >>>>>>> allows optionally >>>>>>> > > >>>>>>>>>>> materializing some >>>>>>> > > >>>>>>>>>>> > >>>>>>> of the nested fields as columns of their >>>>>>> own, >>>>>>> > > >>>>>>>>>>> affording queries on these >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumns the ability to read less data >>>>>>> and spend >>>>>>> > > >>>>>>>>>>> less CPU on extraction. >>>>>>> > > >>>>>>>>>>> > >>>>>>> When subcolumnarizing, the system managing >>>>>>> table >>>>>>> > > >>>>>>>>>>> metadata and data tracks >>>>>>> > > >>>>>>>>>>> > >>>>>>> individual pruning statistics (min, max, >>>>>>> null, etc.) >>>>>>> > > >>>>>>>>>>> for some subset of the >>>>>>> > > >>>>>>>>>>> > >>>>>>> nested fields within a variant, and also >>>>>>> manages any >>>>>>> > > >>>>>>>>>>> optional >>>>>>> > > >>>>>>>>>>> > >>>>>>> materialization.
Without >>>>>>> subcolumnarization, any query >>>>>>> > > >>>>>>>>>>> which touches a >>>>>>> > > >>>>>>>>>>> > >>>>>>> variant column must read, parse, extract, >>>>>>> and filter >>>>>>> > > >>>>>>>>>>> every row for which >>>>>>> > > >>>>>>>>>>> > >>>>>>> that column is non-null. Thus, by >>>>>>> providing a >>>>>>> > > >>>>>>>>>>> standardized way of tracking >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumn metadata and data for variant >>>>>>> columns, >>>>>>> > > >>>>>>>>>>> Iceberg can make >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumnar optimizations accessible >>>>>>> across various >>>>>>> > > >>>>>>>>>>> catalogs and query >>>>>>> > > >>>>>>>>>>> > >>>>>>> engines. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Subcolumnarization is a non-trivial >>>>>>> topic, so we >>>>>>> > > >>>>>>>>>>> expect any >>>>>>> > > >>>>>>>>>>> > >>>>>>> concrete proposal to include not only the >>>>>>> set of >>>>>>> > > >>>>>>>>>>> changes to Iceberg >>>>>>> > > >>>>>>>>>>> > >>>>>>> metadata that allow compatible query >>>>>>> engines to >>>>>>> > > >>>>>>>>>>> interoperate on >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumnarization data for variant >>>>>>> columns, but also >>>>>>> > > >>>>>>>>>>> reference >>>>>>> > > >>>>>>>>>>> > >>>>>>> documentation explaining >>>>>>> subcolumnarization principles >>>>>>> > > >>>>>>>>>>> and recommended best >>>>>>> > > >>>>>>>>>>> > >>>>>>> practices. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > It sounds like the recent Geo proposal >>>>>>> [3] may be a >>>>>>> > > >>>>>>>>>>> good starting >>>>>>> > > >>>>>>>>>>> > >>>>>>> point for how to approach this, so our >>>>>>> plan is to >>>>>>> > > >>>>>>>>>>> write something up in >>>>>>> > > >>>>>>>>>>> > >>>>>>> that vein that covers the proposed spec >>>>>>> changes, >>>>>>> > > >>>>>>>>>>> backwards compatibility, >>>>>>> > > >>>>>>>>>>> > >>>>>>> implementor burdens, etc. But we wanted to >>>>>>> first reach >>>>>>> > > >>>>>>>>>>> out to the community >>>>>>> > > >>>>>>>>>>> > >>>>>>> to introduce ourselves and the idea, and >>>>>>> see if >>>>>>> > > >>>>>>>>>>> there's any early feedback >>>>>>> > > >>>>>>>>>>> > >>>>>>> we should incorporate before we spend too >>>>>>> much time on >>>>>>> > > >>>>>>>>>>> a concrete proposal. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Thank you!
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > [1] https://docs.snowflake.com/en/sql-reference/data-types-semistructured
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > [2] https://github.com/apache/spark/blob/master/common/variant/README.md
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > [3] https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit
>>>>>>> > > >>>>>>>>>>> > >>>>>>> >
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > -Tyler, Nileema, Selcuk, Aihua

--
Ryan Blue
Databricks
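As a closing illustration of the subcolumnarization idea discussed in this thread: if table metadata tracked pruning statistics (min/max, null counts, etc.) per shredded subcolumn of a variant, a planner could skip files before reading any variant bytes. The stats layout and path syntax below are invented for this sketch and do not reflect any settled spec:

    import java.util.Map;

    // Toy model of metadata-level pruning for shredded variant subcolumns.
    public class SubcolumnPruning {

        record ColumnStats(long min, long max) {}

        /** Returns true if a file may contain rows where `path` equals `value`. */
        static boolean mayContain(Map<String, ColumnStats> fileStats, String path, long value) {
            ColumnStats stats = fileStats.get(path);
            if (stats == null) {
                return true; // path was not shredded: the file must be scanned
            }
            return value >= stats.min && value <= stats.max;
        }

        public static void main(String[] args) {
            // Hypothetical per-file stats for one shredded subcolumn.
            Map<String, ColumnStats> fileStats =
                Map.of("$.user.age", new ColumnStats(18, 35));

            System.out.println(mayContain(fileStats, "$.user.age", 50)); // false: skip file
            System.out.println(mayContain(fileStats, "$.user.age", 25)); // true: scan file
        }
    }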