+1 (binding), pending agreement on the endianness, which I agree needs to be specified in the docs. I lean towards big-endian, since most UUID implementations appear to use a big-endian byte order, but I don't much mind which endianness we use as long as the spec states it explicitly.
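
To make the byte-order question concrete, here is a rough sketch using Python's standard `uuid` module and the UUID from Fokko's example quoted below. It only illustrates the two common layouts and is not taken from the proposed implementation; `hex_bytes` is a helper name made up for this example.

```python
import uuid

# The UUID from Fokko's example in the quoted message.
u = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")

# Big-endian (network order): every field is written most significant byte first.
big_endian = u.bytes

# Mixed-endian: time_low, time_mid and time_hi_version are byte-swapped,
# and the remaining 8 bytes are unchanged.
mixed_endian = u.bytes_le


def hex_bytes(b: bytes) -> str:
    """Format bytes as space-separated uppercase hex (hypothetical helper)."""
    return " ".join(f"{x:02X}" for x in b)


print(hex_bytes(big_endian))    # F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7
print(hex_bytes(mixed_endian))  # 09 3E 9C F7 7C 67 BD 4B A4 79 3F 34 9C B7 85 E7
```

The first line matches the byte array given for Iceberg. The second shows how the same UUID would look to a reader if a writer used the mixed-endian layout, which is why the spec should name one.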

On Mon, Apr 29, 2024 at 3:30 PM Fokko Driesprong <fo...@apache.org> wrote:

> +1 (non-binding)
>
> First of all, thanks Rok for working on this 🙌 I raised the mentioned
> issue on GitHub back in December 2022 and I still believe it would be a
> good addition to the spec.
>
> In Iceberg, UUIDs are encoded using big endian. For example, the UUID
> f79c3e09-677c-4bbd-a479-3f349cb785e7 is encoded as the byte array
> F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7. Avro has supported
> UUIDs for a long time as a logical type on top of a string, but now
> also supports them using fixed[16]
> <https://issues.apache.org/jira/browse/AVRO-3918>, which is the way to go
> <https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
> and is also in line with the PR by Rok.
>
> Kind regards,
> Fokko
>
> On Mon, Apr 29, 2024 at 20:37, Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > You are correct, it looks like the UUID version should be encoded
> > properly in the UUID data. I think another concern, around endianness,
> > was raised, which should probably be resolved before the vote is
> > finalized.
> >
> > Thanks,
> > Micah
> >
> > On Monday, April 29, 2024, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> >
> > > Isn't that easily decodable from the UUID data itself?
> > >
> > > If you allow the version to be specified as metadata, you now have to
> > > validate and make sure it's consistent with the version encoded in the
> > > contents of the UUID column. And UUID versions are more of a concern
> > > for UUID generation than consumption.
> > >
> > > --
> > > Felipe
> > >
> > > On Mon, Apr 29, 2024 at 2:31 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > Apologies for the late reply, but I think being able to specify the
> > > > UUID version as metadata might make sense in some cases?
> > > >
> > > > On Fri, Apr 19, 2024 at 1:22 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Following initial requests [1][2] and recent tangential ML
> > > > > discussion [3] I would like to propose a vote to add language for
> > > > > UUID canonical extension type to CanonicalExtensions.rst as in
> > > > > PR [4] and written below.
> > > > > A draft C++ and Python implementation PR can be seen here [5].
> > > > >
> > > > > [1] https://lists.apache.org/thread/k2zvgoq62dyqmw3mj2t6ozfzhzkjkc4j
> > > > > [2] https://github.com/apache/arrow/issues/15058
> > > > > [3] https://lists.apache.org/thread/8d5ldl5cb7mms21rd15lhpfrv4j9no4n
> > > > > [4] https://github.com/apache/arrow/pull/41299 <- proposed change
> > > > > [5] https://github.com/apache/arrow/pull/37298
> > > > >
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Accept this proposal
> > > > > [ ] +0
> > > > > [ ] -1 Do not accept this proposal because...
> > > > >
> > > > >
> > > > > UUID
> > > > > ====
> > > > >
> > > > > * Extension name: `arrow.uuid`.
> > > > >
> > > > > * The storage type of the extension is ``FixedSizeBinary`` with a
> > > > >   length of 16 bytes.
> > > > >
> > > > > .. note::
> > > > >    A specific UUID version is not required or guaranteed. This
> > > > >    extension represents UUIDs as FixedSizeBinary(16) and does not
> > > > >    interpret the bytes in any way.
> > > > >
> > > > >
> > > > > Rok
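
On Felipe's point above that the version can be decoded from the UUID data itself, a minimal sketch of what that could look like for a 16-byte value. It assumes the big-endian layout, which this thread has not yet settled, and `uuid_version` is a hypothetical helper, not part of any Arrow API.

```python
import uuid


def uuid_version(raw: bytes) -> int:
    """Return the version nibble of a 16-byte UUID stored big-endian.

    Assumption: big-endian layout, where the version is the high
    nibble of byte 6 (the first byte of time_hi_and_version).
    """
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    return raw[6] >> 4


# The byte array from Fokko's example.
raw = bytes.fromhex("f79c3e09677c4bbda4793f349cb785e7")

print(uuid_version(raw))             # 4
print(uuid.UUID(bytes=raw).version)  # 4, cross-check with the standard library
```

If the bytes were stored in a different order, the version nibble would sit at a different offset, so a consumer can only do this reliably once the byte order is specified.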