+1 (binding), pending agreement on the endianness, which I agree needs to be
specified in the docs. I lean towards big-endian, since most UUID
implementations appear to use a big-endian byte order, but I don't much mind
which endianness we use as long as the spec states it explicitly.
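
To make the difference concrete, here is a minimal sketch using only Python's
standard uuid module (not the Arrow implementation). It assumes "big-endian"
means the RFC 4122 network byte order that UUID.bytes returns; the variable
names are illustrative only.

```python
import uuid

u = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")

# UUID.bytes is the 16-byte big-endian (network order) representation.
big_endian = u.bytes

# UUID.bytes_le stores the first three fields (time_low, time_mid,
# time_hi_version) little-endian; the remaining 8 bytes are unchanged.
mixed_endian = u.bytes_le

print(big_endian.hex(" ").upper())
print(mixed_endian.hex(" ").upper())

# Reading the bytes back with the wrong convention yields a different UUID,
# which is why the spec should state the byte order explicitly.
assert uuid.UUID(bytes=big_endian) == u
assert uuid.UUID(bytes=mixed_endian) != u
```

The first line printed matches the byte array Fokko quotes below for Iceberg;
the second differs only in the first eight bytes.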

On Mon, Apr 29, 2024 at 3:30 PM Fokko Driesprong <fo...@apache.org> wrote:

> +1 (non-binding)
>
> First of all, thanks Rok for working on this 🙌 I raised the mentioned
> issue on GitHub back in December 2022 and I still believe it would be a
> good addition to the spec.
>
> In Iceberg, UUIDs are encoded big-endian. For example, the UUID
> f79c3e09-677c-4bbd-a479-3f349cb785e7 is encoded as the byte array F7 9C 3E
> 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7. Avro has long supported UUIDs as a
> logical type on top of a string, and now also supports them on top of
> fixed[16] <https://issues.apache.org/jira/browse/AVRO-3918>, which is the
> way to go
> <https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
> and is also in line with the PR by Rok.
>
> Kind regards,
> Fokko
>
>
>
> On Mon, Apr 29, 2024 at 20:37, Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > You are correct; it looks like the UUID version should be properly
> > encoded in the UUID data itself. I think another concern, around
> > endianness, was raised, which should probably be resolved before the vote
> > is finalized.
> >
> > Thanks,
> > Micah
> >
> > On Monday, April 29, 2024, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> >
> > > Isn't that easily decodable from the UUID data itself?
> > >
> > > If you allow the version to be specified as metadata, you now have to
> > > validate and make sure it's consistent with the version encoded in the
> > > contents of the UUID column. And UUID versions are more of a concern
> > > for UUID generation than consumption.
> > >
> > > --
> > > Felipe
> > >
> > > On Mon, Apr 29, 2024 at 2:31 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > Apologies for the late reply, but I think being able to specify the
> > > > UUID version as metadata might make sense in some cases?
> > > >
> > > > On Fri, Apr 19, 2024 at 1:22 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Following initial requests [1][2] and recent tangential ML
> > > > > discussion [3] I would like to propose a vote to add language for
> > > > > UUID canonical extension type to CanonicalExtensions.rst as in PR
> > > > > [4] and written below.
> > > > > A draft C++ and Python implementation PR can be seen here [5].
> > > > >
> > > > > [1] https://lists.apache.org/thread/k2zvgoq62dyqmw3mj2t6ozfzhzkjkc4j
> > > > > [2] https://github.com/apache/arrow/issues/15058
> > > > > [3] https://lists.apache.org/thread/8d5ldl5cb7mms21rd15lhpfrv4j9no4n
> > > > > [4] https://github.com/apache/arrow/pull/41299 <- proposed change
> > > > > [5] https://github.com/apache/arrow/pull/37298
> > > > >
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Accept this proposal
> > > > > [ ] +0
> > > > > [ ] -1 Do not accept this proposal because...
> > > > >
> > > > >
> > > > > UUID
> > > > > ====
> > > > >
> > > > > * Extension name: `arrow.uuid`.
> > > > >
> > > > > * The storage type of the extension is ``FixedSizeBinary`` with a
> > > > >   length of 16 bytes.
> > > > >
> > > > > .. note::
> > > > >    A specific UUID version is not required or guaranteed. This
> > > > >    extension represents UUIDs as FixedSizeBinary(16) and does not
> > > > >    interpret the bytes in any way.
> > > > >
> > > > >
> > > > >
> > > > > Rok
> > > > >
> > >
> >
>
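
On Felipe's point above that the version is decodable from the UUID data
itself, here is a minimal sketch using plain Python and the standard uuid
module. It assumes the 16 bytes are stored big-endian, as in the Iceberg
example; the variable names are illustrative only and are not part of the
proposed spec.

```python
import uuid

# 16 raw bytes as they might sit in a FixedSizeBinary(16) slot (big-endian).
raw = bytes.fromhex("f79c3e09677c4bbda4793f349cb785e7")

# In the big-endian layout the version is the high nibble of byte 6,
# and the variant is held in the top bits of byte 8.
version = raw[6] >> 4
variant_bits = raw[8] >> 6  # 0b10 marks the RFC 4122 variant

print(version, bin(variant_bits))

# Cross-check against the standard library's own decoding.
assert version == uuid.UUID(bytes=raw).version
```

Note that the position of the version nibble depends on the byte order: in
the mixed-endian layout that Python exposes as bytes_le it lands in byte 7
instead, which is another reason to pin the endianness down in the spec.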
