Yes, I agree with Jacques that fixed binary is what it is in the end. I
think it is more about user experience: whether the conversion is done on
the user side or on the Iceberg and engine side. Many people just store a
UUID as a 36-byte string instead of a 16-byte binary, so with an explicit
UUID type, Iceberg can optimize this common use case internally for users.
There might be some other benefits I overlooked, but maybe the complication
introduced by this type does not really justify the slightly better user
experience. I am also on the fence about it.
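To make the "conversion on the user side" concrete, here is a minimal sketch in Python of what users do today when they store UUIDs as fixed binary rather than as a string. It assumes the column is a 16-byte fixed-width binary and that bytes are in the big-endian (RFC 4122) order that Python's `uuid.UUID.bytes` produces. The function names are hypothetical and are not part of any Iceberg API.

```python
import uuid


def uuid_string_to_fixed16(text: str) -> bytes:
    """Hypothetical helper: canonical 36-char UUID string -> 16 raw bytes.

    Uses uuid.UUID.bytes, which is big-endian (RFC 4122 byte order).
    """
    return uuid.UUID(text).bytes


def fixed16_to_uuid_string(raw: bytes) -> str:
    """Hypothetical helper: 16 raw bytes -> canonical 36-char UUID string."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    return str(uuid.UUID(bytes=raw))


s = "123e4567-e89b-12d3-a456-426614174000"
raw = uuid_string_to_fixed16(s)
print(len(s), len(raw))  # 36 16
assert fixed16_to_uuid_string(raw) == s
```

With a first-class UUID type, this round trip (and the choice of byte order) would be handled by Iceberg and the engines instead of by every user.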

-Jack Ye

On Tue, Jul 27, 2021 at 7:54 PM Jacques Nadeau <jacquesnad...@gmail.com>
wrote:

> What specific arguments are there for it being a first-class type besides
> the fact that it exists elsewhere? Is there some kind of optimization Iceberg
> or an engine could do if it were typed versus just a bucket of bits?
> Fixed-width binary seems to cover the cases I see in terms of actual
> functionality in the Iceberg libraries or engines…
>
>
>
> On Tue, Jul 27, 2021 at 6:54 PM Yan Yan <yyany...@gmail.com> wrote:
>
>> One conversation I came across earlier regarding UUID deprecation was
>> in https://github.com/apache/iceberg/pull/1611
>>
>> Thanks,
>> Yan
>>
>> On Tue, Jul 27, 2021 at 1:07 PM Peter Vary <pv...@cloudera.com.invalid>
>> wrote:
>>
>>> Hi Joshua,
>>>
>>> I do not have a strong preference about the UUID type, but I would like
>>> to highlight that the type is handled inconsistently in Iceberg across
>>> different file formats. (See:
>>> https://github.com/apache/iceberg/issues/1881)
>>>
>>> If we keep the type, it would be good to standardize the handling in
>>> every file format.
>>>
>>> Thanks, Peter
>>>
>>> On Tue, 27 Jul 2021, 17:08 Joshua Howard, <joshthow...@gmail.com> wrote:
>>>
>>>> Hi.
>>>>
>>>> UUID is a current data type according to the Iceberg spec (
>>>> https://iceberg.apache.org/spec/#primitive-types), but there seems to
>>>> have been some discussion about removing it? I could not find the original
>>>> discussion, but a reference to the discussion can be found here (
>>>> https://github.com/trinodb/trino/issues/6663).
>>>>
>>>> I generally agree with the consensus in the Trino issue to keep UUID in
>>>> Iceberg. To summarize…
>>>>
>>>> - It makes sense to keep the type now that row identifiers are supported
>>>> - Some engines (Trino) have support for the UUID type
>>>> - Engines without support for the UUID type can determine how to map it
>>>>
>>>> Does anyone want to remove the type? If so, why?
>>>
>>>
