Re: [DISCUSS] UUID type

2021-09-17 Thread Jacques Nadeau
I already added it to Substrait because of Iceberg lazy consensus :D On Fri, Sep 17, 2021 at 2:05 PM Ryan Blue wrote: > Let's move forward with it. I'm not hearing much dissent after saying the > general trend is to keep UUID. So let's call it lazy consensus. > > Ryan > > On Fri, Sep 17, 2021 a

Re: [DISCUSS] UUID type

2021-09-17 Thread Ryan Blue
Let's move forward with it. I'm not hearing much dissent after saying the general trend is to keep UUID. So let's call it lazy consensus. Ryan On Fri, Sep 17, 2021 at 1:32 PM Piotr Findeisen wrote: > Hi Ryan, > > Please advise whatever feels more appropriate from your perspective. > From my per

Re: [DISCUSS] UUID type

2021-09-17 Thread Piotr Findeisen
Hi Ryan, Please advise whatever feels more appropriate from your perspective. From my perspective, we could just go ahead and merge Trino Iceberg support for UUID, since this is just fulfilling the spec as it is defined today. Best PF On Wed, Sep 15, 2021 at 10:17 PM Ryan Blue wrote: > I don

Re: [DISCUSS] UUID type

2021-09-15 Thread Ryan Blue
I don't think we necessarily reached consensus, but I think the general trend toward the end was to keep support for UUID. Should we start a vote to validate consensus? On Wed, Sep 15, 2021 at 1:15 PM Joshua Howard wrote: > Just following up on Piotr's message here. > > Have we converged? I thin

Re: [DISCUSS] UUID type

2021-09-15 Thread Joshua Howard
Just following up on Piotr's message here. Have we converged? I think most people would assume that silence is a vote for the status quo. On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen wrote: > Hi, > > It seems we converged here that UUID should remain included. > I read this as a consensus re

Re: [DISCUSS] UUID type

2021-09-13 Thread Piotr Findeisen
Hi, It seems we converged here that UUID should remain included. I read this as consensus reached, but that may be subjective. Did we objectively reach consensus on this? From the Iceberg project's perspective there isn't anything to do, as UUID already *is* part of the spec ( https://iceberg.apache

Re: [DISCUSS] UUID type

2021-08-01 Thread Kyle B
Hi Ryan and all, That sounds like a reasonable reason to leave IP address types out. In my experience, dedicated IP address types are mostly found in logging tools and other things for sysadmins / DevOps etc. When querying data with IP addresses, I’ve seen it done quite a lot (eg security reas

Re: [DISCUSS] UUID type

2021-07-30 Thread Ryan Blue
Jacques, you make some good points here. I think my argument about usability leading to performance issues is a stronger argument for engines than for Iceberg. Still, there are inefficiencies in Iceberg if someone chooses to use a string in an engine that doesn't have a UUID type. Another thing to

Re: [DISCUSS] UUID type

2021-07-30 Thread parth brahmbhatt
I am personally against a UUID type that does not guarantee, at the spec level, that its values are unique across something. Even if the spec could guarantee that, it feels like we are trying to define a type for what should be a constraint. I would rather remove support for UUID and let the engines do coercion w

Re: [DISCUSS] UUID type

2021-07-29 Thread Jacques Nadeau
It seems like Spark, Hive, Dremio and Impala all lack UUID as a native type. Which engines are you thinking of that have a native UUID type besides the Presto derivatives and support Iceberg? I agree that Trino should expose a UUID type on top of Iceberg tables. All the user experience things that

Re: [DISCUSS] UUID type

2021-07-29 Thread Ryan Blue
I don't think this is just a problem in Trino. If there is no UUID type, then a user must choose between a 36-byte string and a 16-byte binary. That's not a good choice to force people into. If someone chooses binary, then it's harder to work with rows and construct queries even though there is a
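To make the size trade-off mentioned above concrete, here is a minimal sketch using Python's standard `uuid` module. It only illustrates the two generic encodings (36-character text versus 16 raw bytes); it is not how Iceberg or any engine stores the type, and the sample value and variable names are arbitrary.

```python
import uuid

# An arbitrary sample value, chosen only for illustration.
value = uuid.UUID("12345678-1234-5678-1234-567812345678")

as_string = str(value)    # canonical hyphenated hex text
as_binary = value.bytes   # raw big-endian bytes

print(len(as_string.encode("ascii")))  # size of the string form in bytes
print(len(as_binary))                  # size of the binary form in bytes

# Both forms round-trip to the same value; only readability and size differ.
assert uuid.UUID(as_string) == value
assert uuid.UUID(bytes=as_binary) == value
```

The string form is easy to read and to type into a query but takes more than twice the space; the binary form is compact but opaque without a conversion step.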

Re: [DISCUSS] UUID type

2021-07-29 Thread Jacques Nadeau
I think points 1 and 2 don't really apply since a fixed-width binary already covers those properties. It seems like this isn't really a concern of Iceberg but rather a cosmetic layer that exists primarily (only?) in Trino. In that case I would be inclined to say that Trino should just use custom metad

Re: [DISCUSS] UUID type

2021-07-29 Thread Piotr Findeisen
Hi, I agree with Ryan that it takes some precautions before one can assume uniqueness of UUID values, and that this isn't anything special to UUIDs. After all, this is just a primitive type, which is commonly used for certain things, but "commonly" doesn't mean "always". The advantages

Re: [DISCUSS] UUID type

2021-07-28 Thread Ryan Blue
The original reason why I added UUID to the spec was that I thought there would be opportunities to take advantage of UUIDs as unique values and to optimize the use of UUIDs. I was thinking about auto-increment ID fields and how we might do something similar in Iceberg. The reason we have thought

Re: [DISCUSS] UUID type

2021-07-28 Thread Russell Spitzer
Without time-based UUIDs as a special type I think these aren't as useful, since the only comparator that works on a non-time UUID is equality. For TimeUUIDs you need another comparator (and type) since they are not lexicographically comparable but then you can actually benefit from range pred
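The ordering point above can be illustrated with a small sketch, assuming the standard version-1 field layout, in which the low 32 bits of the timestamp come first in the byte representation. The helper `make_v1` is hypothetical and exists only to build deterministic values; the clock sequence and node are fixed to zero for the example.

```python
import uuid

def make_v1(timestamp: int) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID from a 60-bit timestamp."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version nibble = 1
    # Clock sequence (with RFC 4122 variant bits) and node are fixed so the
    # example is deterministic.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))

earlier = make_v1(0x0FFFFFFFF)   # timestamp just before time_low wraps
later = make_v1(0x100000000)     # one tick later

# Byte-wise (lexicographic) order disagrees with timestamp order.
by_bytes = sorted([earlier, later], key=lambda u: u.bytes)
by_time = sorted([earlier, later], key=lambda u: u.time)

print(by_bytes == [later, earlier])
print(by_time == [earlier, later])
```

Sorting by raw bytes places the later value first, which is why a plain binary comparator cannot serve range predicates on time-based UUIDs and a dedicated comparator on the decoded timestamp is needed.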

Re: [DISCUSS] UUID type

2021-07-27 Thread Jack Ye
Yes, I agree with Jacques that fixed binary is what it is in the end. I think it is more about user experience, whether the conversion is done on the user side or on the Iceberg and engine side. Many people just store UUID as a 36-byte string instead of a 16-byte binary, so with an explicit UUID type, Iceb

Re: [DISCUSS] UUID type

2021-07-27 Thread Jacques Nadeau
What specific arguments are there for it being a first-class type besides that it exists elsewhere? Is there some kind of optimization Iceberg or an engine could do if it was typed versus just a bucket of bits? Fixed-width binary seems to cover the cases I see in terms of actual functionality in the iceberg

Re: [DISCUSS] UUID type

2021-07-27 Thread Yan Yan
One conversation I came across regarding UUID deprecation was in https://github.com/apache/iceberg/pull/1611 Thanks, Yan On Tue, Jul 27, 2021 at 1:07 PM Peter Vary wrote: > Hi Joshua, > > I do not have a strong preference about the UUID type, but I would like > to highlight that the

Re: [DISCUSS] UUID type

2021-07-27 Thread Peter Vary
Hi Joshua, I do not have a strong preference about the UUID type, but I would like to highlight that the type is handled inconsistently in Iceberg across different file formats. (See: https://github.com/apache/iceberg/issues/1881 ) If we keep the type, it would be good to standardize the handling

[DISCUSS] UUID type

2021-07-27 Thread Joshua Howard
Hi. UUID is a current data type according to the Iceberg spec (https://iceberg.apache.org/spec/#primitive-types), but there seems to have been some discussion about removing it? I could not find the original discussion, but a reference to the discussion can be found here (https://github.com/t