I already added it to Substrait because of Iceberg lazy consensus :D
On Fri, Sep 17, 2021 at 2:05 PM Ryan Blue wrote:
Let's move forward with it. I'm not hearing much dissent after saying the
general trend is to keep UUID. So let's call it lazy consensus.
Ryan
On Fri, Sep 17, 2021 at 1:32 PM Piotr Findeisen wrote:
Hi Ryan,
Please advise on whatever feels more appropriate from your perspective.
From my perspective, we could just go ahead and merge Trino Iceberg support
for UUID, since this is just fulfilling the spec as it is defined today.
Best
PF
On Wed, Sep 15, 2021 at 10:17 PM Ryan Blue wrote:
I don't think we necessarily reached consensus, but I think the general
trend toward the end was to keep support for UUID. Should we start a vote
to validate consensus?
On Wed, Sep 15, 2021 at 1:15 PM Joshua Howard wrote:
Just following up on Piotr's message here.
Have we converged? I think most people would assume that silence is a vote
for the status quo.
On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen wrote:
Hi,
It seems we converged here that UUID should remain included.
I read this as a consensus reached, but it may be subjective. Did we
objectively reach consensus on this?
From the Iceberg project's perspective there isn't anything to do, as UUID
already *is* part of the spec (
https://iceberg.apache
Hi Ryan and all,
That sounds like a good reason to leave IP address types out. In my
experience, dedicated IP address types are mostly found in logging tools and
other things for sysadmins / DevOps etc.
When querying data with IP addresses, I’ve seen it done quite a lot (e.g.
security reas
Jacques, you make some good points here. I think my argument about
usability leading to performance issues is a stronger argument for engines
than for Iceberg. Still, there are inefficiencies in Iceberg if someone
chooses to use a string in an engine that doesn't have a UUID type.
Another thing to
I am personally against a UUID type that does not guarantee at the spec level
that its values are unique across something. Even if the spec could guarantee that, it
feels like we are trying to define a type for what should be a constraint.
I would rather remove support for UUID and let the engines do coercion w
It seems like Spark, Hive, Dremio and Impala all lack UUID as a native
type. Which engines are you thinking of that have a native UUID type
besides the Presto derivatives and support Iceberg?
I agree that Trino should expose a UUID type on top of Iceberg tables. All
the user experience things that
I don't think this is just a problem in Trino.
If there is no UUID type, then a user must choose between a 36-byte string
and a 16-byte binary. That's not a good choice to force people into. If
someone chooses binary, then it's harder to work with rows and construct
queries even though there is a
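To make the 36-byte string vs. 16-byte binary choice concrete, here is a minimal Python sketch using only the standard library's `uuid` module. The sample value is arbitrary and chosen purely for illustration; it shows the two encodings a user would have to pick between without a dedicated UUID type.

```python
import uuid

# Arbitrary example value, used only for illustration.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_string = str(u)   # canonical text form: 8-4-4-4-12 hex digits with hyphens
as_binary = u.bytes  # raw 128-bit value in big-endian byte order

print(len(as_string))  # 36
print(len(as_binary))  # 16

# Both forms convert back to the same UUID, so neither loses information;
# they differ in storage size and in how convenient they are to read and query.
assert uuid.UUID(as_string) == u
assert uuid.UUID(bytes=as_binary) == u
```

The binary form is less than half the size, but a user writing a query against it has to supply raw bytes rather than the familiar hyphenated text.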
I think points 1&2 don't really apply since a fixed-width binary already
covers those properties.
It seems like this isn't really a concern of Iceberg but rather a cosmetic
layer that exists primarily (only?) in Trino. In that case I would be
inclined to say that Trino should just use custom metad
Hi,
I agree with Ryan that it takes some precautions before one can assume
uniqueness of UUID values, and that UUIDs shouldn't be treated any differently
in this respect.
After all, this is just a primitive type, which is commonly used for
certain things, but "commonly" doesn't mean "always".
The advantages
The original reason why I added UUID to the spec was that I thought there
would be opportunities to take advantage of UUIDs as unique values and to
optimize the use of UUIDs. I was thinking about auto-increment ID fields
and how we might do something similar in Iceberg.
The reason we have thought
Without time-based UUIDs as a special type, I think these aren't as useful,
since the only comparator that works on a non-time UUID is equality. For
TimeUUIDs you need another comparator (and type) since they are not
lexicographically comparable, but then you can actually benefit from range
pred
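As a minimal Python illustration of why TimeUUIDs are not lexicographically comparable, assuming the standard RFC 4122 version-1 layout (the low 32 bits of the timestamp are stored first, the high bits last): the helper `v1_uuid` below is hypothetical and written only for this example, with the clock sequence and node fixed at zero for simplicity.

```python
import uuid


def v1_uuid(timestamp: int) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID from a 60-bit timestamp.

    Clock sequence and node are fixed at zero purely for illustration.
    """
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi = (timestamp >> 48) & 0x0FFF
    # version=1 makes the constructor set the version nibble and RFC 4122 variant bits.
    return uuid.UUID(fields=(time_low, time_mid, time_hi, 0, 0, 0), version=1)


earlier = v1_uuid(0xFFFF_FFFF)    # timestamp just below 2**32
later = v1_uuid(0x1_0000_0000)    # timestamp exactly 2**32

print(earlier.time < later.time)    # True: the timestamps are in order
print(earlier.bytes < later.bytes)  # False: raw byte order disagrees
print(str(earlier) < str(later))    # False: text order disagrees too
```

So a range predicate on creation time cannot simply compare the stored bytes or strings; it needs a comparator that reassembles the timestamp fields first, which is the extra comparator and type the message refers to.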
Yes, I agree with Jacques that fixed binary is what it is in the end. I
think it is more about user experience, whether the conversion is done on
the user side or on the Iceberg and engine side. Many people just store UUID as a
36-byte string instead of a 16-byte binary, so with an explicit UUID type,
Iceb
What specific arguments are there for it being a first-class type besides
the fact that it exists elsewhere? Is there some kind of optimization Iceberg or an engine
could do if it was typed versus just a bucket of bits? Fixed-width binary
seems to cover the cases I see in terms of actual functionality in the
Iceberg
One conversation I used to come across regarding UUID deprecation was from
https://github.com/apache/iceberg/pull/1611
Thanks,
Yan
On Tue, Jul 27, 2021 at 1:07 PM Peter Vary wrote:
Hi Joshua,
I do not have a strong preference about the UUID type, but I would like to
highlight that the type is handled inconsistently in Iceberg across
different file formats (see: https://github.com/apache/iceberg/issues/1881).
If we keep the type, it would be good to standardize the handling
Hi.
UUID is a current data type according to the Iceberg spec
(https://iceberg.apache.org/spec/#primitive-types), but there seems to have
been some discussion about removing it? I could not find the original
discussion, but a reference to the discussion can be found here
(https://github.com/t