Yes, I agree with Jacques that in the end a UUID is just fixed binary. I think it is more about user experience: whether the conversion is done on the user side or on the Iceberg and engine side. Many people simply store a UUID as a 36-byte string instead of 16-byte binary, so with an explicit UUID type, Iceberg could optimize this common use case internally for users. There might be other benefits I overlooked, but maybe the complexity this type introduces does not really justify the slightly better user experience. I am also on the fence about it.
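For illustration, here is a minimal sketch of the user-side conversion mentioned above, using Python's standard `uuid` module. It assumes the UUID is laid out as 16 big-endian bytes (what a fixed[16] column would hold). The helper names are hypothetical and this is not an Iceberg API:

```python
import uuid


def uuid_str_to_fixed16(s: str) -> bytes:
    """Convert a canonical 36-character UUID string into 16 raw bytes.

    uuid.UUID.bytes returns the 128-bit value in big-endian order.
    (Hypothetical helper, for illustration only.)
    """
    return uuid.UUID(s).bytes


def fixed16_to_uuid_str(b: bytes) -> str:
    """Convert 16 raw bytes back into the canonical lowercase string form."""
    if len(b) != 16:
        raise ValueError(f"expected 16 bytes, got {len(b)}")
    return str(uuid.UUID(bytes=b))


s = "123e4567-e89b-12d3-a456-426614174000"
encoded = uuid_str_to_fixed16(s)
print(len(s), len(encoded))  # 36 16
assert fixed16_to_uuid_str(encoded) == s
```

The storage saving is 36 bytes versus 16 per value. The open question in this thread is whether this round trip should live in user code, as here, or be handled by Iceberg and the engines behind a first-class UUID type.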
-Jack Ye

On Tue, Jul 27, 2021 at 7:54 PM Jacques Nadeau <jacquesnad...@gmail.com> wrote:

> What specific arguments are there for it being a first class type besides
> it is elsewhere? Is there some kind of optimization iceberg or an engine
> could do if it was typed versus just a bucket of bits? Fixed width binary
> seems to cover the cases I see in terms of actual functionality in the
> iceberg libraries or engines…
>
> On Tue, Jul 27, 2021 at 6:54 PM Yan Yan <yyany...@gmail.com> wrote:
>
>> One conversation I used to come across regarding UUID deprecation was
>> from https://github.com/apache/iceberg/pull/1611
>>
>> Thanks,
>> Yan
>>
>> On Tue, Jul 27, 2021 at 1:07 PM Peter Vary <pv...@cloudera.com.invalid>
>> wrote:
>>
>>> Hi Joshua,
>>>
>>> I do not have a strong preference about the UUID type, but I would like
>>> the highlight, that the type is handled inconsistently in Iceberg with
>>> different file formats. (See:
>>> https://github.com/apache/iceberg/issues/1881)
>>>
>>> If we keep the type, it would be good to standardize the handling in
>>> every file format.
>>>
>>> Thanks, Peter
>>>
>>> On Tue, 27 Jul 2021, 17:08 Joshua Howard, <joshthow...@gmail.com> wrote:
>>>
>>>> Hi.
>>>>
>>>> UUID is a current data type according to the Iceberg spec (
>>>> https://iceberg.apache.org/spec/#primitive-types), but there seems to
>>>> have been some discussion about removing it? I could not find the original
>>>> discussion, but a reference to the discussion can be found here (
>>>> https://github.com/trinodb/trino/issues/6663).
>>>>
>>>> I generally agree with the consensus in the Trino issue to keep UUID in
>>>> Iceberg. To summarize…
>>>>
>>>> - It makes sense to keep the type now that row identifiers are supported
>>>> - Some engines (Trino) have support for the UUID type
>>>> - Engines w/o support for UUID type can determine how to map
>>>>
>>>> Does anyone want to remove the type? If so, why?