I'm late replying to this, but I'm also in agreement with 1 (adopting the Spark variant encoding), 3 (specifically only having a Variant type), and 4 (ensuring we are thinking through subcolumnarization upfront, since without it the Variant type may not be that useful).
I'd also support having the spec and reference implementation in Iceberg; as others have said, it centralizes improvements in a single, agnostic dependency for engines, rather than engines having to take dependencies on other engine modules. Thanks, Amogh Jahagirdar

On Tue, Jul 23, 2024 at 12:15 AM Péter Váry <peter.vary.apa...@gmail.com> wrote: > I have been looking around at how we can map the Variant type in Flink. I have > not found any existing type which we could use, but Flink already has some > JSON parsing capabilities [1] for string fields. > > So until we have native support in Flink for something similar to the Variant > type, I expect that we need to map it to JSON strings in RowData. > > Based on that, here are my preferences: > 1. I'm ok with adopting the Spark Variant type, if we build our own Iceberg > serializer/deserializer module for it > 2. I prefer to move the spec to Iceberg, so we own it and can extend it if > needed. This could be important in the first phase. Later, when it is more > stable, we might donate it to some other project, like Parquet > 3. I would prefer to support only a single type, and Variant is more > expressive, but having a standard way to convert between JSON and Variant > would be useful for Flink users. > 4. On subcolumnarization: I think Flink will only use this feature as much > as the Iceberg readers implement it, so I would like to see as much as > possible of it in the common Iceberg code > > Thanks, > Peter > > [1] - > https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/systemfunctions/#json-functions > > >
On Tue, Jul 23, 2024, 06:36 Micah Kornfield <emkornfi...@gmail.com> wrote: > >> Sorry for the late reply. I agree with the sentiments on 1 and 3 that >> have already been posted (adopt the Spark encoding, and only have the >> Variant type). As mentioned on the doc for 3, I think it would be good to >> specify how to map scalar types to a JSON representation so there can be >> consistency between engines that don't support variant. >> >> >>> Regarding point 2, I also feel Iceberg is more natural to host such a >>> subproject for variant spec and implementation. But let me reach out to the >>> Spark community to discuss. >> >> >> The only other place I can think of that might be a good home for the >> Variant spec is Apache Arrow, as a canonical extension type. There >> is an issue for this [1]. I think the main consideration for where this is housed >> is which types are intended to be supported. I believe Arrow is currently >> a superset of the Iceberg type system (UUID is supported as a canonical >> extension type [2]). >> >> For point 4, subcolumnarization, I think ideally this belongs in Iceberg >> (and if Iceberg and Delta Lake can agree on how to do it, that would be >> great) with consultation with the Parquet/ORC communities to >> potentially add better native support. >> >> Thanks, >> Micah >> >> >> >> [1] https://github.com/apache/arrow/issues/42069 >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html >> >>
On Sat, Jul 20, 2024 at 5:54 PM Aihua Xu <aihu...@gmail.com> wrote: >> >>> Thanks for the discussion and feedback. >>> >>> Do we have consensus on points 1 and 3 to move forward with the >>> Spark variant encoding and support the Variant type only? Or let me know how to >>> proceed from here. >>> >>> Regarding point 2, I also feel Iceberg is more natural to host such a >>> subproject for variant spec and implementation. But let me reach out to the >>> Spark community to discuss.
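To make Péter's suggested Flink mapping concrete, here is a minimal sketch of the kind of query his JSON-string approach would enable, using Flink's built-in JSON functions [1] via the Table API. The `events` table and `payload` column are hypothetical stand-ins for a variant column surfaced as a JSON string in RowData; the datagen connector only produces placeholder strings, so the functions return NULL/FALSE here, but the query shape is the point:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class VariantAsJsonString {
        public static void main(String[] args) {
            TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Hypothetical table whose STRING column stands in for a variant
            // value that an Iceberg-Flink reader has mapped to a JSON string.
            env.executeSql(
                "CREATE TEMPORARY TABLE events (payload STRING) "
                    + "WITH ('connector' = 'datagen', 'number-of-rows' = '3')");

            // Flink's JSON functions can then extract nested fields from the
            // string. (Random datagen payloads are not valid JSON, so these
            // evaluate to NULL and FALSE under the default error handling.)
            Table result = env.sqlQuery(
                "SELECT JSON_VALUE(payload, '$.user.id') AS user_id, "
                    + "JSON_EXISTS(payload, '$.tags') AS has_tags "
                    + "FROM events");
            result.execute().print();
        }
    }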
>>> >>> Thanks, >>> Aihua >>> >>> >>>
On Fri, Jul 19, 2024 at 9:35 AM Yufei Gu <flyrain...@gmail.com> wrote: >>> >>>> Agreed with point 1. >>>> >>>> For point 2, I also prefer to host the spec and reference >>>> implementation under Iceberg. Here are the reasons: >>>> 1. It is unconventional and impractical for one engine to depend on >>>> another for data types. For instance, it is not ideal for Trino to rely on >>>> data types defined by the Spark engine. >>>> 2. Iceberg serves as a bridge between engines and file formats. By >>>> centralizing the specification in Iceberg, any future optimizations or >>>> updates to file formats can be referred to within Iceberg, ensuring >>>> consistency and reducing dependencies. >>>> >>>> For point 3, I'd prefer to support the Variant type only at the moment. >>>> >>>> Yufei >>>> >>>> >>>>
On Thu, Jul 18, 2024 at 12:55 PM Ryan Blue <b...@databricks.com.invalid> >>>> wrote: >>>> >>>>> Similarly, I'm aligned with point 1 and I'd choose to support only >>>>> Variant for point 3. >>>>> >>>>> We'll need to work with the Spark community to find a good place for >>>>> the library and spec, since it touches many different projects. I'd also >>>>> prefer Iceberg as the home. >>>>> >>>>> I also think it's a good idea to get subcolumnarization into our spec >>>>> when we update. Without that, I think the feature will be fairly limited. >>>>>
On Thu, Jul 18, 2024 at 10:56 AM Russell Spitzer < >>>>> russell.spit...@gmail.com> wrote: >>>>>> >>>>>> I'm aligned with point 1. >>>>>> >>>>>> For point 2, I think we should choose quickly. I honestly do think >>>>>> this would be fine as part of the Iceberg spec directly, but understand it >>>>>> may be better for the broader community if it were a subproject. As a >>>>>> sub-project, I would still prefer it being an Iceberg subproject, since we >>>>>> are engine/file-format agnostic. >>>>>> >>>>>> 3. I support adding just Variant. >>>>>>
On Thu, Jul 18, 2024 at 12:54 AM Aihua Xu <aihu...@apache.org> wrote: >>>>>>> >>>>>>> Hello community, >>>>>>> >>>>>>> It's great to sync up with some of you on Variant and >>>>>>> Subcolumnarization support in Iceberg again. Apologies that I didn't >>>>>>> record >>>>>>> the meeting, but here are some key items that we want to follow up with >>>>>>> the >>>>>>> community. >>>>>>> >>>>>>> 1. Adopt Spark Variant encoding >>>>>>> Those present were in favor of adopting the Spark variant encoding >>>>>>> for Iceberg Variant, with extensions to support other Iceberg types. We >>>>>>> would like to know if anyone has an objection to reusing an open >>>>>>> source encoding. >>>>>>> >>>>>>> 2. Movement of the Spark Variant Spec to another project >>>>>>> To avoid introducing Apache Spark as a dependency for the engines >>>>>>> and file formats, we discussed separating the Spark Variant encoding spec >>>>>>> and >>>>>>> implementation from the Spark project to a neutral location. We thought >>>>>>> up >>>>>>> several solutions but didn't reach consensus on any of them. We are >>>>>>> looking >>>>>>> for more feedback on this topic from the community, either in terms of >>>>>>> support for one of these options or another idea on how to support the >>>>>>> spec.
>>>>>>> >>>>>>> Options proposed: >>>>>>> * Leave the spec in Spark (difficult for versioning and other >>>>>>> engines) >>>>>>> * Copy the spec into the Iceberg project directly (difficult for >>>>>>> other table formats) >>>>>>> * Create a sub-project of Apache Iceberg and move the spec and >>>>>>> reference implementation there (logistically complicated) >>>>>>> * Create a sub-project of Apache Spark and move the spec and >>>>>>> reference implementation there (logistically complicated) >>>>>>> >>>>>>> 3. Add Variant type vs. Variant and JSON types >>>>>>> Those who were present were in favor of adding only the Variant type >>>>>>> to Iceberg. We are looking for anyone who has an objection to going >>>>>>> forward >>>>>>> with just the Variant type and no Iceberg JSON type. We were favoring >>>>>>> adding the Variant type only because: >>>>>>> * Introducing a JSON type would require engines that only support >>>>>>> VARIANT to do write-time validation of their input to a JSON column. An >>>>>>> engine without a JSON type wouldn't be able to support this. >>>>>>> * Engines which don't support Variant will work most of the time and >>>>>>> can have fallback strings defined in the spec for reading unsupported >>>>>>> types. Writing JSON into a Variant will always work. >>>>>>> >>>>>>> 4. Support for Subcolumnarization spec (shredding in Spark) >>>>>>> We have no action items on this but would like to follow up on >>>>>>> discussions on Subcolumnarization in the future. >>>>>>> * We had general agreement that this should be included in Iceberg >>>>>>> V3, or else adding Variant may not be useful. >>>>>>> * We are interested in also adopting the shredding spec from Spark >>>>>>> and would like to move it to whatever place we decide the Variant spec >>>>>>> is >>>>>>> going to live. >>>>>>> >>>>>>> Let us know if we missed anything and if you have any additional >>>>>>> thoughts or suggestions. >>>>>>> >>>>>>> Thanks, >>>>>>> Aihua >>>>>>> >>>>>>> >>>>>>>
On 2024/07/15 18:32:22 Aihua Xu wrote: >>>>>>> > Thanks for the discussion. >>>>>>> > >>>>>>> > I will move forward to work on the spec PR. >>>>>>> > >>>>>>> > Regarding the implementation, we will have a module for Variant >>>>>>> support in Iceberg, so we will not have to bring in Spark libraries. >>>>>>> > >>>>>>> > I'm reposting the meeting invite in case it's not clear in my >>>>>>> original email, since I included it at the end. Looks like we don't have >>>>>>> major >>>>>>> objections/divergences, but let's sync up and reach consensus. >>>>>>> > >>>>>>> > Meeting invite: >>>>>>> > >>>>>>> > Wednesday, July 17 · 9:00 – 10:00am >>>>>>> > Time zone: America/Los_Angeles >>>>>>> > Google Meet joining info >>>>>>> > Video call link: https://meet.google.com/pbm-ovzn-aoq >>>>>>> > Or dial: (US) +1 650-449-9343 PIN: 170 576 525# >>>>>>> > More phone numbers: >>>>>>> https://tel.meet/pbm-ovzn-aoq?pin=4079632691790 >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Aihua >>>>>>> > >>>>>>>
On 2024/07/12 20:55:01 Micah Kornfield wrote: >>>>>>> > > I don't think this needs to hold up the PR, but I think coming to >>>>>>> a >>>>>>> > > consensus on the exact set of types supported is worthwhile (and >>>>>>> if the >>>>>>> > > goal is to maintain the same set as specified by the Spark >>>>>>> Variant type or >>>>>>> > > if divergence is expected/allowed). From a fragmentation >>>>>>> perspective, it >>>>>>> > > would be a shame if they diverge, so maybe a next step is also >>>>>>> suggesting >>>>>>> > > to the Spark community that they support the missing Iceberg >>>>>>> types?
>>>>>>> > > >>>>>>> > > Thanks, >>>>>>> > > Micah >>>>>>> > > >>>>>>> > >
On Fri, Jul 12, 2024 at 1:44 PM Russell Spitzer < >>>>>>> russell.spit...@gmail.com> >>>>>>> > > wrote: >>>>>>> > > >>>>>>> > > > Just talked with Aihua and he's working on the spec PR now. We >>>>>>> can get >>>>>>> > > > feedback there from everyone. >>>>>>> > > > >>>>>>> > > >
On Fri, Jul 12, 2024 at 3:41 PM Ryan Blue >>>>>>> <b...@databricks.com.invalid> >>>>>>> > > > wrote: >>>>>>> > > > >>>>>>> > > >> Good idea, but I'm hoping that we can continue to get their >>>>>>> feedback in >>>>>>> > > >> parallel to getting the spec changes started. Piotr didn't >>>>>>> seem to object >>>>>>> > > >> to the encoding from what I read of his comments. Hopefully >>>>>>> he (and others) >>>>>>> > > >> chime in here. >>>>>>> > > >> >>>>>>> > > >>
On Fri, Jul 12, 2024 at 1:32 PM Russell Spitzer < >>>>>>> > > >> russell.spit...@gmail.com> wrote: >>>>>>> > > >> >>>>>>> > > >>> I just want to make sure we get Piotr and Peter on board as >>>>>>> > > >>> representatives of the Flink and Trino engines. Also make sure >>>>>>> we have anyone >>>>>>> > > >>> else chime in who has experience with Ray if possible. >>>>>>> > > >>> >>>>>>> > > >>> Spec changes feel like the right next step. >>>>>>> > > >>> >>>>>>> > > >>>
On Fri, Jul 12, 2024 at 3:14 PM Ryan Blue >>>>>>> <b...@databricks.com.invalid> >>>>>>> > > >>> wrote: >>>>>>> > > >>> >>>>>>> > > >>>> Okay, what are the next steps here? This proposal has been >>>>>>> out for >>>>>>> > > >>>> quite a while and I don't see any major objections to using >>>>>>> the Spark >>>>>>> > > >>>> encoding. It's quite well designed and fits the need well. >>>>>>> It can also be >>>>>>> > > >>>> extended to support additional types that are missing if >>>>>>> that's a priority. >>>>>>> > > >>>> >>>>>>> > > >>>> Should we move forward by starting a draft of the changes >>>>>>> to the table >>>>>>> > > >>>> spec? Then we can vote on committing those changes and get >>>>>>> moving on an >>>>>>> > > >>>> implementation (or possibly do the implementation in >>>>>>> parallel). >>>>>>> > > >>>> >>>>>>> > > >>>>
On Fri, Jul 12, 2024 at 1:08 PM Russell Spitzer < >>>>>>> > > >>>> russell.spit...@gmail.com> wrote: >>>>>>> > > >>>> >>>>>>> > > >>>>> That's fair, I'm sold on an Iceberg module. >>>>>>> > > >>>>> >>>>>>> > > >>>>>
On Fri, Jul 12, 2024 at 2:53 PM Ryan Blue >>>>>>> <b...@databricks.com.invalid> >>>>>>> > > >>>>> wrote: >>>>>>> > > >>>>> >>>>>>> > > >>>>>> > Feels like eventually the encoding should land in >>>>>>> Parquet proper >>>>>>> > > >>>>>> right? >>>>>>> > > >>>>>> >>>>>>> > > >>>>>> What about using it in ORC? I don't know where it should >>>>>>> end up. >>>>>>> > > >>>>>> Maybe Iceberg should make a standalone module from it? >>>>>>> > > >>>>>> >>>>>>> > > >>>>>>
On Fri, Jul 12, 2024 at 12:38 PM Russell Spitzer < >>>>>>> > > >>>>>> russell.spit...@gmail.com> wrote: >>>>>>> > > >>>>>> >>>>>>> > > >>>>>>> Feels like eventually the encoding should land in >>>>>>> Parquet proper >>>>>>> > > >>>>>>> right? I'm fine with us just copying it into Iceberg though >>>>>>> for the time >>>>>>> > > >>>>>>> being.
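As a rough illustration of what such a standalone module's surface could look like, here is a hypothetical sketch; none of these names exist today, and the actual binary layout would be whatever the adopted Spark variant spec defines (a metadata buffer holding a key dictionary plus a value buffer):

    // Hypothetical, engine-agnostic API for a standalone Iceberg variant module.
    public interface VariantValue {

        enum Kind { NULL, BOOLEAN, LONG, DOUBLE, DECIMAL, STRING, BINARY, OBJECT, ARRAY }

        Kind kind();

        /** Field lookup by key for OBJECT values; null when the key is absent. */
        VariantValue field(String name);

        /** Element lookup for ARRAY values. */
        VariantValue element(int index);

        /** Fallback JSON rendering for engines without native variant support. */
        String toJson();

        /** Would decode a (metadata, value) buffer pair into a navigable value. */
        static VariantValue parse(byte[] metadata, byte[] value) {
            throw new UnsupportedOperationException("sketch only");
        }
    }

A self-contained surface like this would let Flink, Trino, and others consume variant data without pulling Spark (or Scala) onto the classpath, which is the dependency concern raised in the quoted messages below.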
>>>>>>> > > >>>>>>> >>>>>>> > > >>>>>>>
On Fri, Jul 12, 2024 at 2:31 PM Ryan Blue >>>>>>> > > >>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>> > > >>>>>>> >>>>>>> > > >>>>>>>> Oops, it looks like I missed where Aihua brought this >>>>>>> up in his >>>>>>> > > >>>>>>>> last email: >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>> > do we have an issue with directly using the Spark >>>>>>> implementation in >>>>>>> > > >>>>>>>> Iceberg? >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>> Yes, I think that we do have an issue using the Spark >>>>>>> library. What >>>>>>> > > >>>>>>>> do you think about a Java implementation in Iceberg? >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>> Ryan >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>>
On Fri, Jul 12, 2024 at 12:28 PM Ryan Blue < >>>>>>> b...@databricks.com> >>>>>>> > > >>>>>>>> wrote: >>>>>>> > > >>>>>>>> >>>>>>> > > >>>>>>>>> I raised the same point from Peter's email in a >>>>>>> comment on the doc >>>>>>> > > >>>>>>>>> as well. There is a spark-variant_2.13 artifact that >>>>>>> would be a much >>>>>>> > > >>>>>>>>> smaller scope than relying on large portions of Spark, >>>>>>> but even then I >>>>>>> > > >>>>>>>>> doubt that it is a good idea for Iceberg to depend on >>>>>>> that, because it is a >>>>>>> > > >>>>>>>>> Scala artifact and we would need to bring in a ton of >>>>>>> Scala libs. I think >>>>>>> > > >>>>>>>>> what makes the most sense is to have an independent >>>>>>> implementation of the >>>>>>> > > >>>>>>>>> spec in Iceberg. >>>>>>> > > >>>>>>>>> >>>>>>> > > >>>>>>>>>
On Fri, Jul 12, 2024 at 11:51 AM Péter Váry < >>>>>>> > > >>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>> > > >>>>>>>>> >>>>>>> > > >>>>>>>>>> Hi Aihua, >>>>>>> > > >>>>>>>>>> Long time no see :) >>>>>>> > > >>>>>>>>>> Would this mean that every engine which plans to >>>>>>> support the Variant >>>>>>> > > >>>>>>>>>> data type needs to add Spark as a dependency? Like >>>>>>> Flink/Trino/Hive etc? >>>>>>> > > >>>>>>>>>> Thanks, Peter >>>>>>> > > >>>>>>>>>> >>>>>>> > > >>>>>>>>>> >>>>>>> > > >>>>>>>>>>
On Fri, Jul 12, 2024, 19:10 Aihua Xu < >>>>>>> aihu...@apache.org> wrote: >>>>>>> > > >>>>>>>>>> >>>>>>> > > >>>>>>>>>>> Thanks Ryan. >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>> Yeah. That's another reason we want to pursue the Spark >>>>>>> encoding: to >>>>>>> > > >>>>>>>>>>> keep compatibility for the open source engines. >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>> One more question regarding the encoding >>>>>>> implementation: do we >>>>>>> > > >>>>>>>>>>> have an issue with directly using the Spark implementation >>>>>>> in Iceberg? Russell >>>>>>> > > >>>>>>>>>>> pointed out that Trino doesn't have a Spark dependency >>>>>>> and that could be a >>>>>>> > > >>>>>>>>>>> problem? >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>> Thanks, >>>>>>> > > >>>>>>>>>>> Aihua >>>>>>> > > >>>>>>>>>>> >>>>>>> > > >>>>>>>>>>>
On 2024/07/12 15:02:06 Ryan Blue wrote: >>>>>>> > > >>>>>>>>>>> > Thanks, Aihua! >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > I think that the encoding choice in the current >>>>>>> doc is a good >>>>>>> > > >>>>>>>>>>> one. I went >>>>>>> > > >>>>>>>>>>> > through the Spark encoding in detail and it looks >>>>>>> like a >>>>>>> > > >>>>>>>>>>> better choice than >>>>>>> > > >>>>>>>>>>> > the other candidate encodings for quickly >>>>>>> accessing nested >>>>>>> > > >>>>>>>>>>> fields.
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > Another reason to use the Spark type is that this >>>>>>> is what >>>>>>> > > >>>>>>>>>>> Delta's variant >>>>>>> > > >>>>>>>>>>> > type is based on, so Parquet files in tables >>>>>>> written by Delta >>>>>>> > > >>>>>>>>>>> could be >>>>>>> > > >>>>>>>>>>> > converted or used in Iceberg tables without >>>>>>> needing to rewrite >>>>>>> > > >>>>>>>>>>> variant >>>>>>> > > >>>>>>>>>>> > data. (Also, note that I work at Databricks and >>>>>>> have an >>>>>>> > > >>>>>>>>>>> interest in >>>>>>> > > >>>>>>>>>>> > increasing format compatibility.) >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > Ryan >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> >
On Thu, Jul 11, 2024 at 11:21 AM Aihua Xu < >>>>>>> > > >>>>>>>>>>> aihua...@snowflake.com.invalid> >>>>>>> > > >>>>>>>>>>> > wrote: >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > > [Discuss] Consensus for Variant Encoding >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > It was great to be able to present the Variant >>>>>>> type proposal >>>>>>> > > >>>>>>>>>>> in the >>>>>>> > > >>>>>>>>>>> > > community sync yesterday, and I'm looking to host >>>>>>> a meeting >>>>>>> > > >>>>>>>>>>> next week >>>>>>> > > >>>>>>>>>>> > > (targeting 9am, July 17th) to go over any >>>>>>> further >>>>>>> > > >>>>>>>>>>> concerns about the >>>>>>> > > >>>>>>>>>>> > > encoding of the Variant type and any other >>>>>>> questions on the >>>>>>> > > >>>>>>>>>>> first phase of >>>>>>> > > >>>>>>>>>>> > > the proposal >>>>>>> > > >>>>>>>>>>> > > < >>>>>>> > > >>>>>>>>>>> >>>>>>> https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit >>>>>>> > > >>>>>>>>>>> >. >>>>>>> > > >>>>>>>>>>> > > We are hoping that anyone who is interested in >>>>>>> the proposal >>>>>>> > > >>>>>>>>>>> can either join >>>>>>> > > >>>>>>>>>>> > > or reply with their comments so we can discuss >>>>>>> them. A summary >>>>>>> > > >>>>>>>>>>> of the >>>>>>> > > >>>>>>>>>>> > > discussion and notes will be sent to the mailing >>>>>>> list for >>>>>>> > > >>>>>>>>>>> further comment >>>>>>> > > >>>>>>>>>>> > > there. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > - >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > What should be the underlying binary >>>>>>> representation? >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > We have evaluated a few encodings in the doc, >>>>>>> including ION, >>>>>>> > > >>>>>>>>>>> JSONB, and the >>>>>>> > > >>>>>>>>>>> > > Spark encoding. Choosing the underlying encoding >>>>>>> is an >>>>>>> > > >>>>>>>>>>> important first step >>>>>>> > > >>>>>>>>>>> > > here, and we believe we have general support for >>>>>>> Spark's >>>>>>> > > >>>>>>>>>>> Variant encoding. >>>>>>> > > >>>>>>>>>>> > > We would like to hear if anyone else has strong >>>>>>> opinions in >>>>>>> > > >>>>>>>>>>> this space. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > - >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Should we support multiple logical types or >>>>>>> just Variant? >>>>>>> > > >>>>>>>>>>> Variant vs. >>>>>>> > > >>>>>>>>>>> > > Variant + JSON. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > This is to discuss which logical data type(s) should >>>>>>> be supported >>>>>>> > > >>>>>>>>>>> in Iceberg - >>>>>>> > > >>>>>>>>>>> > > Variant only vs. Variant + JSON.
Both types >>>>>>> would share the >>>>>>> > > >>>>>>>>>>> same underlying >>>>>>> > > >>>>>>>>>>> > > encoding but would imply different limitations >>>>>>> on engines >>>>>>> > > >>>>>>>>>>> working with >>>>>>> > > >>>>>>>>>>> > > those types. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > From the sync-up meeting, we are favoring >>>>>>> supporting Variant >>>>>>> > > >>>>>>>>>>> > > only, and we want to have consensus on the >>>>>>> supported >>>>>>> > > >>>>>>>>>>> type(s). >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > - >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > How should we move forward with >>>>>>> Subcolumnarization? >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Subcolumnarization is an optimization for the Variant >>>>>>> type that >>>>>>> > > >>>>>>>>>>> separates out >>>>>>> > > >>>>>>>>>>> > > subcolumns with their own metadata. This is not >>>>>>> critical for >>>>>>> > > >>>>>>>>>>> choosing the >>>>>>> > > >>>>>>>>>>> > > initial encoding of the Variant type, so we were >>>>>>> hoping to >>>>>>> > > >>>>>>>>>>> gain consensus on >>>>>>> > > >>>>>>>>>>> > > leaving that for a follow-up spec. >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Thanks >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Aihua >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Meeting invite: >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > > Wednesday, July 17 · 9:00 – 10:00am >>>>>>> > > >>>>>>>>>>> > > Time zone: America/Los_Angeles >>>>>>> > > >>>>>>>>>>> > > Google Meet joining info >>>>>>> > > >>>>>>>>>>> > > Video call link: >>>>>>> https://meet.google.com/pbm-ovzn-aoq >>>>>>> > > >>>>>>>>>>> > > Or dial: (US) +1 650-449-9343 PIN: 170 576 >>>>>>> 525# >>>>>>> > > >>>>>>>>>>> > > More phone numbers: >>>>>>> > > >>>>>>>>>>> https://tel.meet/pbm-ovzn-aoq?pin=4079632691790 >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > >
On Tue, May 28, 2024 at 9:21 PM Aihua Xu < >>>>>>> > > >>>>>>>>>>> aihua...@snowflake.com> wrote: >>>>>>> > > >>>>>>>>>>> > > >>>>>>> > > >>>>>>>>>>> > >> Hello, >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >> We have drafted the proposal >>>>>>> > > >>>>>>>>>>> > >> < >>>>>>> > > >>>>>>>>>>> >>>>>>> https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit >>>>>>> > > >>>>>>>>>>> > >>>>>>> > > >>>>>>>>>>> > >> for the Variant data type. Please help review and >>>>>>> comment. >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >> Thanks, >>>>>>> > > >>>>>>>>>>> > >> Aihua >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >>
On Thu, May 16, 2024 at 12:45 PM Jack Ye < >>>>>>> > > >>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>> > > >>>>>>>>>>> > >> >>>>>>> > > >>>>>>>>>>> > >>> +10000 for a JSON/BSON type. We also had the >>>>>>> same >>>>>>> > > >>>>>>>>>>> discussion internally, >>>>>>> > > >>>>>>>>>>> > >>> and a JSON type would really play well with, >>>>>>> for example, >>>>>>> > > >>>>>>>>>>> the SUPER type in >>>>>>> > > >>>>>>>>>>> > >>> Redshift: >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> >>>>>>> https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html, >>>>>>> > > >>>>>>>>>>> and >>>>>>> > > >>>>>>>>>>> > >>> can also provide better integration with the >>>>>>> Trino JSON >>>>>>> > > >>>>>>>>>>> type. >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>> Looking forward to the proposal!
>>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>> Best, >>>>>>> > > >>>>>>>>>>> > >>> Jack Ye >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>>
On Wed, May 15, 2024 at 9:37 AM Tyler Akidau >>>>>>> > > >>>>>>>>>>> > >>> <tyler.aki...@snowflake.com.invalid> wrote: >>>>>>> > > >>>>>>>>>>> > >>> >>>>>>> > > >>>>>>>>>>> > >>>>
On Tue, May 14, 2024 at 7:58 PM Gang Wu < >>>>>>> ust...@gmail.com> >>>>>>> > > >>>>>>>>>>> wrote: >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>>> > We may need some guidance on just how many >>>>>>> we need to >>>>>>> > > >>>>>>>>>>> look at; >>>>>>> > > >>>>>>>>>>> > >>>>> > we were planning on Spark and Trino, but >>>>>>> weren't sure >>>>>>> > > >>>>>>>>>>> how much >>>>>>> > > >>>>>>>>>>> > >>>>> > further down the rabbit hole we needed to >>>>>>> go. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>> There are some engines living outside the >>>>>>> Java world. It >>>>>>> > > >>>>>>>>>>> would be >>>>>>> > > >>>>>>>>>>> > >>>>> good if the proposal could cover the effort >>>>>>> it takes to >>>>>>> > > >>>>>>>>>>> integrate the >>>>>>> > > >>>>>>>>>>> > >>>>> variant type into them (e.g., Velox, >>>>>>> DataFusion, etc.). >>>>>>> > > >>>>>>>>>>> This is something >>>>>>> > > >>>>>>>>>>> > >>>>> that >>>>>>> > > >>>>>>>>>>> > >>>>> some proprietary Iceberg vendors also care >>>>>>> about. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> Ack, makes sense. We can make sure to share >>>>>>> some >>>>>>> > > >>>>>>>>>>> perspective on this. >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> > Not necessarily, no. As long as there's a >>>>>>> binary type >>>>>>> > > >>>>>>>>>>> and Iceberg and >>>>>>> > > >>>>>>>>>>> > >>>>> > the query engines are aware that the >>>>>>> binary column >>>>>>> > > >>>>>>>>>>> needs to be >>>>>>> > > >>>>>>>>>>> > >>>>> > interpreted as a variant, that should be >>>>>>> sufficient. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>> From the perspective of interoperability, it >>>>>>> would be >>>>>>> > > >>>>>>>>>>> good to support a >>>>>>> > > >>>>>>>>>>> > >>>>> native >>>>>>> > > >>>>>>>>>>> > >>>>> type in the file specs. Life will be easier >>>>>>> for projects >>>>>>> > > >>>>>>>>>>> like Apache >>>>>>> > > >>>>>>>>>>> > >>>>> XTable. >>>>>>> > > >>>>>>>>>>> > >>>>> File formats could also provide finer-grained >>>>>>> statistics >>>>>>> > > >>>>>>>>>>> for the variant >>>>>>> > > >>>>>>>>>>> > >>>>> type, which >>>>>>> > > >>>>>>>>>>> > >>>>> facilitates data skipping. >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> Agreed, there can definitely be additional >>>>>>> value in >>>>>>> > > >>>>>>>>>>> native file format >>>>>>> > > >>>>>>>>>>> > >>>> integration. Just wanted to highlight that >>>>>>> it's not a >>>>>>> > > >>>>>>>>>>> strict requirement. >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> -Tyler >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>> >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>> Gang >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>
On Wed, May 15, 2024 at 6:49 AM Tyler Akidau >>>>>>> > > >>>>>>>>>>> > >>>>> <tyler.aki...@snowflake.com.invalid> wrote: >>>>>>> > > >>>>>>>>>>> > >>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>> Good to see you again as well, JB! Thanks!
>>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>> -Tyler >>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>
On Tue, May 14, 2024 at 1:04 PM >>>>>>> Jean-Baptiste Onofré < >>>>>>> > > >>>>>>>>>>> j...@nanthrax.net> >>>>>>> > > >>>>>>>>>>> > >>>>>> wrote: >>>>>>> > > >>>>>>>>>>> > >>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Hi Tyler, >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Super happy to see you there :) It reminds >>>>>>> me of our >>>>>>> > > >>>>>>>>>>> discussions back at >>>>>>> > > >>>>>>>>>>> > >>>>>>> the start of Apache Beam :) >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Anyway, the thread is pretty interesting. >>>>>>> I remember >>>>>>> > > >>>>>>>>>>> some discussions >>>>>>> > > >>>>>>>>>>> > >>>>>>> about a JSON datatype for spec v3. The >>>>>>> binary data type >>>>>>> > > >>>>>>>>>>> is already >>>>>>> > > >>>>>>>>>>> > >>>>>>> supported in spec v2. >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> I'm looking forward to the proposal and >>>>>>> happy to help >>>>>>> > > >>>>>>>>>>> on this! >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>> Regards >>>>>>> > > >>>>>>>>>>> > >>>>>>> JB >>>>>>> > > >>>>>>>>>>> > >>>>>>> >>>>>>> > > >>>>>>>>>>> > >>>>>>>
On Sat, May 11, 2024 at 7:06 AM Tyler >>>>>>> Akidau >>>>>>> > > >>>>>>>>>>> > >>>>>>> <tyler.aki...@snowflake.com.invalid> >>>>>>> wrote: >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Hello, >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > We (Tyler, Nileema, Selcuk, Aihua) are >>>>>>> working on a >>>>>>> > > >>>>>>>>>>> proposal for >>>>>>> > > >>>>>>>>>>> > >>>>>>> which we'd like to get early feedback from >>>>>>> the >>>>>>> > > >>>>>>>>>>> community. As you may know, >>>>>>> > > >>>>>>>>>>> > >>>>>>> Snowflake has embraced Iceberg as its open >>>>>>> data lake >>>>>>> > > >>>>>>>>>>> format. Having made >>>>>>> > > >>>>>>>>>>> > >>>>>>> good progress on our own adoption of the >>>>>>> Iceberg >>>>>>> > > >>>>>>>>>>> standard, we're now in a >>>>>>> > > >>>>>>>>>>> > >>>>>>> position where there are features not yet >>>>>>> supported in >>>>>>> > > >>>>>>>>>>> Iceberg which we >>>>>>> > > >>>>>>>>>>> > >>>>>>> think would be valuable for our users, and >>>>>>> that we >>>>>>> > > >>>>>>>>>>> would like to discuss >>>>>>> > > >>>>>>>>>>> > >>>>>>> with and help contribute to the Iceberg >>>>>>> community. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > The first two such features we'd like to >>>>>>> discuss are >>>>>>> > > >>>>>>>>>>> in support of >>>>>>> > > >>>>>>>>>>> > >>>>>>> efficient querying of dynamically typed, >>>>>>> > > >>>>>>>>>>> semi-structured data: variant data >>>>>>> > > >>>>>>>>>>> > >>>>>>> types, and subcolumnarization of variant >>>>>>> columns. In >>>>>>> > > >>>>>>>>>>> more detail, for >>>>>>> > > >>>>>>>>>>> > >>>>>>> anyone who may not already be familiar: >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > 1. Variant data types >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Variant types allow for the efficient >>>>>>> binary >>>>>>> > > >>>>>>>>>>> encoding of dynamic >>>>>>> > > >>>>>>>>>>> > >>>>>>> semi-structured data such as JSON, Avro, >>>>>>> etc.
By >>>>>>> encoding semi-structured >>>>>>> > > >>>>>>>>>>> data as a variant column, we retain the >>>>>>> flexibility of >>>>>>> > > >>>>>>>>>>> the source data, >>>>>>> > > >>>>>>>>>>> > >>>>>>> while allowing query engines to more >>>>>>> efficiently >>>>>>> > > >>>>>>>>>>> operate on the data. >>>>>>> > > >>>>>>>>>>> > >>>>>>> Snowflake has supported the variant data >>>>>>> type on >>>>>>> > > >>>>>>>>>>> Snowflake tables for many >>>>>>> > > >>>>>>>>>>> > >>>>>>> years [1]. As more and more users utilize >>>>>>> Iceberg >>>>>>> > > >>>>>>>>>>> tables in Snowflake, >>>>>>> > > >>>>>>>>>>> > >>>>>>> we're hearing an increasing chorus of >>>>>>> requests for >>>>>>> > > >>>>>>>>>>> variant support. >>>>>>> > > >>>>>>>>>>> > >>>>>>> Additionally, other query engines such as >>>>>>> Apache Spark >>>>>>> > > >>>>>>>>>>> have begun adding >>>>>>> > > >>>>>>>>>>> > >>>>>>> variant support [2]. As such, we believe >>>>>>> it would be >>>>>>> > > >>>>>>>>>>> beneficial to the >>>>>>> > > >>>>>>>>>>> > >>>>>>> Iceberg community as a whole to >>>>>>> standardize on the >>>>>>> > > >>>>>>>>>>> variant data type >>>>>>> > > >>>>>>>>>>> > >>>>>>> encoding used across Iceberg tables. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > One specific point to make here is that, >>>>>>> since an >>>>>>> > > >>>>>>>>>>> Apache OSS >>>>>>> > > >>>>>>>>>>> > >>>>>>> version of the variant encoding already exists >>>>>>> in Spark, >>>>>>> > > >>>>>>>>>>> it likely makes sense >>>>>>> > > >>>>>>>>>>> > >>>>>>> to simply adopt the Spark encoding as the >>>>>>> Iceberg >>>>>>> > > >>>>>>>>>>> standard as well. The >>>>>>> > > >>>>>>>>>>> > >>>>>>> encoding we use internally today in >>>>>>> Snowflake is >>>>>>> > > >>>>>>>>>>> slightly different, but >>>>>>> > > >>>>>>>>>>> > >>>>>>> essentially equivalent, and we see no >>>>>>> particular value >>>>>>> > > >>>>>>>>>>> in trying to clutter >>>>>>> > > >>>>>>>>>>> > >>>>>>> the space with another >>>>>>> equivalent-but-incompatible >>>>>>> > > >>>>>>>>>>> encoding. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > 2. Subcolumnarization >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Subcolumnarization of variant columns >>>>>>> allows query >>>>>>> > > >>>>>>>>>>> engines to >>>>>>> > > >>>>>>>>>>> > >>>>>>> efficiently prune datasets when subcolumns >>>>>>> (i.e., >>>>>>> > > >>>>>>>>>>> nested fields) within a >>>>>>> > > >>>>>>>>>>> > >>>>>>> variant column are queried, and also >>>>>>> allows optionally >>>>>>> > > >>>>>>>>>>> materializing some >>>>>>> > > >>>>>>>>>>> > >>>>>>> of the nested fields as columns of their >>>>>>> own, >>>>>>> > > >>>>>>>>>>> affording queries on these >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumns the ability to read less data >>>>>>> and spend >>>>>>> > > >>>>>>>>>>> less CPU on extraction. >>>>>>> > > >>>>>>>>>>> > >>>>>>> When subcolumnarizing, the system managing >>>>>>> table >>>>>>> > > >>>>>>>>>>> metadata and data tracks >>>>>>> > > >>>>>>>>>>> > >>>>>>> individual pruning statistics (min, max, >>>>>>> null, etc.) >>>>>>> > > >>>>>>>>>>> for some subset of the >>>>>>> > > >>>>>>>>>>> > >>>>>>> nested fields within a variant, and also >>>>>>> manages any >>>>>>> > > >>>>>>>>>>> optional >>>>>>> > > >>>>>>>>>>> > >>>>>>> materialization.
Without >>>>>>> subcolumnarization, any query >>>>>>> > > >>>>>>>>>>> which touches a >>>>>>> > > >>>>>>>>>>> > >>>>>>> variant column must read, parse, extract, >>>>>>> and filter >>>>>>> > > >>>>>>>>>>> every row for which >>>>>>> > > >>>>>>>>>>> > >>>>>>> that column is non-null. Thus, by >>>>>>> providing a >>>>>>> > > >>>>>>>>>>> standardized way of tracking >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumn metadata and data for variant >>>>>>> columns, >>>>>>> > > >>>>>>>>>>> Iceberg can make >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumnar optimizations accessible >>>>>>> across various >>>>>>> > > >>>>>>>>>>> catalogs and query >>>>>>> > > >>>>>>>>>>> > >>>>>>> engines. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Subcolumnarization is a non-trivial >>>>>>> topic, so we >>>>>>> > > >>>>>>>>>>> expect any >>>>>>> > > >>>>>>>>>>> > >>>>>>> concrete proposal to include not only the >>>>>>> set of >>>>>>> > > >>>>>>>>>>> changes to Iceberg >>>>>>> > > >>>>>>>>>>> > >>>>>>> metadata that allow compatible query >>>>>>> engines to >>>>>>> > > >>>>>>>>>>> interoperate on >>>>>>> > > >>>>>>>>>>> > >>>>>>> subcolumnarization data for variant >>>>>>> columns, but also >>>>>>> > > >>>>>>>>>>> reference >>>>>>> > > >>>>>>>>>>> > >>>>>>> documentation explaining >>>>>>> subcolumnarization principles >>>>>>> > > >>>>>>>>>>> and recommended best >>>>>>> > > >>>>>>>>>>> > >>>>>>> practices. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > It sounds like the recent Geo proposal >>>>>>> [3] may be a >>>>>>> > > >>>>>>>>>>> good starting >>>>>>> > > >>>>>>>>>>> > >>>>>>> point for how to approach this, so our >>>>>>> plan is to >>>>>>> > > >>>>>>>>>>> write something up in >>>>>>> > > >>>>>>>>>>> > >>>>>>> that vein that covers the proposed spec >>>>>>> changes, >>>>>>> > > >>>>>>>>>>> backwards compatibility, >>>>>>> > > >>>>>>>>>>> > >>>>>>> implementor burdens, etc. But we wanted to >>>>>>> first reach >>>>>>> > > >>>>>>>>>>> out to the community >>>>>>> > > >>>>>>>>>>> > >>>>>>> to introduce ourselves and the idea, and >>>>>>> see if >>>>>>> > > >>>>>>>>>>> there's any early feedback >>>>>>> > > >>>>>>>>>>> > >>>>>>> we should incorporate before we spend too >>>>>>> much time on >>>>>>> > > >>>>>>>>>>> a concrete proposal. >>>>>>> > > >>>>>>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>>>>>> > >>>>>>> > Thank you!
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > [1] https://docs.snowflake.com/en/sql-reference/data-types-semistructured
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > [2] https://github.com/apache/spark/blob/master/common/variant/README.md
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > [3] https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit
>>>>>>> > > >>>>>>>>>>> > >>>>>>> >
>>>>>>> > > >>>>>>>>>>> > >>>>>>> > -Tyler, Nileema, Selcuk, Aihua

--
Ryan Blue
Databricks
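As a closing illustration of the subcolumnarization idea discussed in this thread: if table metadata tracked pruning statistics (min/max, null counts, etc.) per shredded subcolumn of a variant, a planner could skip files before reading any variant bytes. The stats layout and path syntax below are invented for this sketch and do not reflect any settled spec:

    import java.util.Map;

    // Toy model of metadata-level pruning for shredded variant subcolumns.
    public class SubcolumnPruning {

        record ColumnStats(long min, long max) {}

        /** Returns true if a file may contain rows where `path` equals `value`. */
        static boolean mayContain(Map<String, ColumnStats> fileStats, String path, long value) {
            ColumnStats stats = fileStats.get(path);
            if (stats == null) {
                return true; // path was not shredded: the file must be scanned
            }
            return value >= stats.min && value <= stats.max;
        }

        public static void main(String[] args) {
            // Hypothetical per-file stats for one shredded subcolumn.
            Map<String, ColumnStats> fileStats =
                Map.of("$.user.age", new ColumnStats(18, 35));

            System.out.println(mayContain(fileStats, "$.user.age", 50)); // false: skip file
            System.out.println(mayContain(fileStats, "$.user.age", 25)); // true: scan file
        }
    }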