This seems to straddle that line, in that you can also view this as a way to represent semi-structured data in a manner that allows for more efficient querying and computation by breaking out some of its components into a more structured form.
(I also happen to want a canonical Arrow representation for variant data, as this type occurs in many databases but doesn't have a great representation today in ADBC results. That's why I filed "[Format] Consider adding an official variant type to Arrow" (apache/arrow issue #42069) <https://github.com/apache/arrow/issues/42069>. Of course, there's no specific reason why a canonical Arrow representation for variants must align with Spark and/or Iceberg.)

-Curt

On Thu, Aug 22, 2024 at 2:01 AM Antoine Pitrou <anto...@python.org> wrote:

> Ah, thanks. I've tried to find a rationale and ended up on https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34 . Is it a good description of what you're after?
>
> If so, then I don't think Arrow is a good match. This seems mostly to be a marshalling format for semi-structured data (like Avro?). Arrow data types are meant to be in a representation ideal for querying and computation, rather than transport and storage.
>
> This could be developed separately and then be represented in Arrow using an extension type (perhaps a canonical one as in https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html).
>
> What do other Arrow developers think?
>
> Regards
>
> Antoine.
>
> On 22/08/2024 at 10:45, Gang Wu wrote:
> > Sorry for the inconvenience.
> >
> > This is the permalink for the discussion:
> > https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
> >
> > On Thu, Aug 22, 2024 at 3:51 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >> Hi Gang,
> >>
> >> Sorry, but can you give a pointer to the start of this discussion thread in a readable format (for example a mailing-list archive)? It appears that dev@arrow wasn't cc'ed from the start and that can make it difficult to understand what this is about.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >> On 22/08/2024 at 08:32, Gang Wu wrote:
> >>> It seems that we have reached a consensus to some extent that there should be a new home for the variant spec. The pending question is whether Parquet or Arrow is a better choice. As a committer in the Arrow, Parquet and ORC communities, I am neutral about which to choose and happy to help with the move once a decision has been made.
> >>>
> >>> Should we start a vote to move forward?
> >>>
> >>> Best,
> >>> Gang
> >>>
> >>> On Sat, Aug 17, 2024 at 8:34 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>
> >>>>> That being said, I think the most important consideration for now is where are the current maintainers / contributors to the variant type. If most of them are already PMC members / committers on a project, it becomes a bit easier. Otherwise if there isn't much overlap with a project's existing governance, I worry there could be a bit of friction. How many active contributors are there from Iceberg? And how about from Arrow?
> >>>>
> >>>> I think this is the key question. What are the requirements around governance? I've seen some tangential messaging here but I'm not clear on what everyone expects.
> >>>>
> >>>> I think for a lot of the other concerns my view is that the exact project does not really matter (and choosing a project with mature cross language testing infrastructure or committing to building it is critical). IIUC we are talking about the following artifacts:
> >>>>
> >>>> 1. A stand alone specification document (this can be hosted anyplace)
> >>>> 2. A set of language bindings with minimal dependencies that can be consumed downstream (again, as long as dependencies are managed carefully any project can host these)
> >>>> 3. Potential integration where appropriate into file format libraries to support shredding (but as of now this is being bypassed by using conventions anyways). My impression is that at least for Parquet there has been a proliferation of vectorized readers across different projects, so I'm not clear how much standardization in parquet-java could help here.
> >>>>
> >>>> To respond to some other questions:
> >>>>
> >>>>> Arrow is not used as Spark's in-memory model, nor Trino and others so those existing relationships aren't there. I also worry that differences in approaches would make it difficult later on.
> >>>>
> >>>> While Arrow is not in the core memory model, for Spark I believe it is still used for IPC for things like Java<->Python. Trino also consumes Arrow libraries today to support things like Snowflake/Bigquery federation. But I think this is minor because as mentioned above I think the functional libraries would be relatively stand-alone.
> >>>>
> >>>>> Do we think it could be introduced as a canonical extension arrow type?
> >>>>
> >>>> I believe it can be, I think there are probably different layouts that can be supported:
> >>>>
> >>>> 1. A struct with two variable width bytes columns (metadata and value data are stored separately and each entry has a 1:1 relationship).
> >>>> 2. Shredded (shredded according to the same convention as parquet), I would need to double check but I don't think Arrow would have problems here but REE would likely be required to make this efficient (i.e. sparse value support is important).
> >>>>
> >>>> In both cases the main complexity is providing the necessary functions for manipulation.
> >>>>
> >>>> Thanks,
> >>>> Micah
> >>>>
> >>>> On Fri, Aug 16, 2024 at 3:58 PM Will Jones <will.jones...@gmail.com> wrote:
> >>>>
> >>>>> In being more engine and format agnostic, I agree the Arrow project might be a good host for such a specification. It seems like we want to move away from hosting in Spark to make it engine agnostic. But moving into Iceberg might make it less format agnostic, as I understand multiple formats might want to implement this. I'm not intimately familiar with the state of this, but I believe Delta Lake would like to be aligned with the same format as Iceberg. In addition, the Lance format (which I work on), will eventually be interesting as well. It seems equally bad to me to attach this specification to a particular table format as it does a particular query engine.
> >>>>>
> >>>>> That being said, I think the most important consideration for now is where are the current maintainers / contributors to the variant type. If most of them are already PMC members / committers on a project, it becomes a bit easier. Otherwise if there isn't much overlap with a project's existing governance, I worry there could be a bit of friction. How many active contributors are there from Iceberg? And how about from Arrow?
> >>>>>
> >>>>> BTW, I'd add I'm interested in helping develop an Arrow extension type for the binary variant type. I've been experimenting with a DataFusion extension that operates on this [1], and already have some ideas on how such an extension type might be defined. I'm not yet caught up on the shredded specification, but I think having just the binary format would be beneficial for in-memory analytics, which are most relevant to Arrow. I'll be creating a separate thread on the Arrow ML about this soon.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Will Jones
> >>>>>
> >>>>> [1] https://github.com/datafusion-contrib/datafusion-functions-variant/issues
> >>>>>
> >>>>> On Thu, Aug 15, 2024 at 7:39 PM Gang Wu <ust...@gmail.com> wrote:
> >>>>>
> >>>>>> + dev@arrow
> >>>>>>
> >>>>>> Thanks for all the valuable suggestions! I am inclined to Micah's idea that Arrow might be a better host compared to Parquet.
> >>>>>>
> >>>>>> To give more context, I am taking the initiative to add the geometry type to both Parquet and ORC. I'd like to do the same thing for variant type in that variant type is engine and file format agnostic. This does mean that Parquet might not be the neutral place to hold the variant spec.
> >>>>>>
> >>>>>> Best,
> >>>>>> Gang
> >>>>>>
> >>>>>> On Fri, Aug 16, 2024 at 10:00 AM Jingsong Li <jingsongl...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thanks all for your discussion.
> >>>>>>>
> >>>>>>> The Apache Paimon community is also considering support for this Variant type; without a doubt, we hope to maintain consistency with Iceberg.
> >>>>>>>
> >>>>>>> Not only the Paimon community, but also various computing engines need to adapt to this type, such as Flink and StarRocks. We also hope to promote them to adapt to this type.
> >>>>>>>
> >>>>>>> It is worth noting that we also need to standardize many functions related to it.
> >>>>>>>
> >>>>>>> A neutral place to maintain it is a great choice.
> >>>>>>>
> >>>>>>> - As Gang Wu said, a standalone project is good, just like RoaringBitmap [1].
> >>>>>>> - As Ryan said, Parquet community is a neutral option too.
> >>>>>>> - As Micah said, Arrow is also an option.
> >>>>>>>
> >>>>>>> [1] https://github.com/RoaringBitmap
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong
> >>>>>>>
> >>>>>>> On Fri, Aug 16, 2024 at 7:18 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> That's fair @Micah, so far all the discussions have been direct and off the dev list. Would you like to make the request on the public Spark Dev list? I would be glad to co-sign, I can also draft up a quick email if you don't have time.
> >>>>>>>>
> >>>>>>>> I think once we come to consensus, if you have bandwidth, I think the message might be better coming from you, as you have more context on some of the non-public conversations, the requirements from an Iceberg perspective on governance and the blockers that were encountered. If details on the conversations can't be shared, (i.e. we are starting from scratch) it seems like suggesting a new project via SPIP might be the way forward. I'm happy to help with that if it is useful but I would guess Aihua or Tyler might be in a better place to start as it seems they have done more serious thinking here.
> >>>>>>>>
> >>>>>>>> If we decide to try to standardize on Parquet or Arrow I'm happy to help support the effort in those communities.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Micah
> >>>>>>>>
> >>>>>>>> On Thu, Aug 15, 2024 at 8:09 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> That's fair @Micah, so far all the discussions have been direct and off the dev list. Would you like to make the request on the public Spark Dev list? I would be glad to co-sign, I can also draft up a quick email if you don't have time.
> >>>>>>>>>
> >>>>>>>>> On Thu, Aug 15, 2024 at 10:04 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I agree that it would be beneficial to make a sub-project, the main problem is political and not logistic. I've been asking for movement from other relative projects for a month and we simply haven't gotten anywhere.
> >>>>>>>>>>
> >>>>>>>>>> I just wanted to double check that these issues were brought directly to the Spark community (i.e. a discussion thread on the Spark developer mailing list) and not via backchannels.
> >>>>>>>>>>
> >>>>>>>>>> I'm not sure the outcome would be different and I don't think this should block forking the spec, but we should make sure that the decision is publicly documented within both communities.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Micah
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Aug 15, 2024 at 7:47 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> @Gang Wu
> >>>>>>>>>>>
> >>>>>>>>>>> I agree that it would be beneficial to make a sub-project, the main problem is political and not logistic. I've been asking for movement from other relative projects for a month and we simply haven't gotten anywhere. I don't think there is anything that would stop us from moving to a joint project in the future and if you know of some way of encouraging that movement from other relevant parties I would be glad to collaborate in doing that. One thing that I don't want to do is have the Iceberg project stay in a holding pattern without any clear roadmap as to how to proceed.
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Aug 14, 2024 at 11:12 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I’m on board with copying the spec into our repository. However, as we’ve talked about, it’s not just a straightforward copy; there are already some divergences. Some of them are under discussion. Iceberg is definitely the best place for these specs. Engines like Trino and Flink can then rely on the Iceberg specs as a solid foundation.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yufei
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Aug 14, 2024 at 7:51 PM Gang Wu <ust...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry for chiming in late.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> From the discussion in https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq, I don't quite understand why it is logistically complicated to create a sub-project to hold the variant spec and impl.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> IMHO, copying the variant type spec into Apache Iceberg has some deficiencies:
> >>>>>>>>>>>>> - It is a burden to update two repos if there is a variant type spec change and will likely result in deviation if some changes do not reach agreement from both parties.
> >>>>>>>>>>>>> - Implementers are required to keep an eye on both specs (considering proprietary engines where both Iceberg and Delta are supported).
> >>>>>>>>>>>>> - Putting the spec and impl of variant type in Iceberg repo does lose the opportunity for better native support from file formats like Parquet and ORC.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not sure if it is possible to create a separate project (e.g. apache/variant-type) to make it a single point of truth. We can learn from the experience of Apache Arrow. In this fashion, different engines, table formats and file formats can follow the same spec and are free to depend on the reference implementations from apache/variant-type or implement their own.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Gang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Aug 15, 2024 at 10:07 AM Jack Ye <yezhao...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> +1 for copying the spec into our repository, I think we need to own it fully as a part of the table spec, and we can build compatibility through tests.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jack
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:52 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm not really in favor of linking and annotating as that just makes things more complicated and still is essentially forking just with more steps. If we just track our annotations / modifications to a single commit/version then we have the same issue again but now you have to go to multiple sources to get the actual Spec. In addition, our very copy of the Spec is going to require new types which don't exist in the Spark Spec which necessarily means diverging. We will need to take up new primitive IDs (as noted in my first email).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The other issue I have is I don't think the Spark Spec is really going through a thorough review process from all members of the Spark community, I believe it probably should have gone through the SPIP but instead seems to have been merged without broad community involvement.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The only way to truly avoid diverging is to only have a single copy of the spec, in our previous discussions the vast majority of Apache Iceberg community want it to exist here.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Aug 14, 2024 at 2:19 PM Daniel Weeks <dwe...@apache.org> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm really excited about the introduction of variant type to Iceberg, but I want to raise concerns about forking the spec.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I feel like preemptively forking would create the situation where we end up diverging because there's little reason to work with both communities to evolve in a way that benefits everyone.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would much rather point to a specific version of the spec and annotate any variance in Iceberg's handling. This would allow us to continue without dividing the communities.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If at any point there are irreconcilable differences, I would support forking, but I don't feel like that should be the initial step.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> No one is excited about the possibility that the physical representations end up diverging, but it feels like we're setting ourselves up for that exact scenario.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Dan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Aug 14, 2024 at 6:54 AM Fokko Driesprong <fo...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +1 to what's already being said here. It is good to copy the spec to Iceberg and add context that's specific to Iceberg, but at the same time, we should maintain compatibility.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>> Fokko
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Aug 14, 2024 at 15:30, Manu Zhang <owenzhang1...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> +1 to copy the spec into our repository. I think the best way to keep compatibility is building integration tests.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> Manu
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Aug 14, 2024 at 8:27 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks Russell and Aihua for pushing Variant support!
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Given the differences between the supported types and the lack of interest from the other project, I think it is reasonable to duplicate the specification to our repository.
> >>>>>>>>>>>>>>>>>>> I would give very strong emphasis on sticking to the Spark spec as much as possible, to keep compatibility as much as possible. Maybe even revert to a shared specification if the situation changes.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks Russell for bringing this up.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> This is the main blocker to move forward with the Variant support in Iceberg and hopefully we can have a consensus. To me, it also makes more sense to move the spec into Iceberg rather than have the Spark engine own it while we try to keep it compatible with the Spark spec.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>> Aihua
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Y’all,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> We’ve hit a bit of a roadblock with the Variant Proposal, while we were hoping to move the Variant and Shredding specifications from Spark into Iceberg there doesn’t seem to be a lot of interest in that. Unfortunately, I think we have a number of issues with just linking to the Spark project directly from within Iceberg and I believe we need to copy the specifications into our repository.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> There are a few reasons why I think this is necessary:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> First, we have a divergence of types already. The Spark Specification already includes types which Iceberg has no definition for (19, 20 - Interval Types) and Iceberg already has a type which is not included within the Spark Specification (Time) and will soon have more with TimestampNS, and Geo.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Second, we would like to make sure that Spark is not a hard dependency for other engines. We are working with several implementers of the Iceberg spec and it has previously been agreed that it would be best if the source of truth for Variant existed in an engine and file format neutral location. The Iceberg project has a good open model of governance and, as we have seen so far discussing Variant, open and active collaboration. This would also help as we can strictly version our changes in-line with the rest of the Iceberg spec.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Third, the Shredding spec is not quite finished and requires some group analysis and discussion before we commit it. I think again the Iceberg community is probably the right place for this to happen as we have already started discussions here on these topics.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> For these reasons I think we should go with a direct copy of the existing specification from the Spark Project and move ahead with our discussions and modifications within Iceberg. That said, I do not want to diverge if possible from the Spark proposal. For example, although we do not use the Interval types above, I think we should not reuse those type ids within our spec. Iceberg's Variant Spec types 19 and 20 would remain unused along with any other types we think are not applicable. We should strive whenever possible to allow for compatibility.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In the interest of moving forward with this proposal I am hoping to see if anyone in the community objects to this plan going forward or has a better alternative.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> As always I am thankful for your time and am eager to hear back from everyone,
> >>>>>>>>>>>>>>>>>>>>> Russ