Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

Sergey Prokhorenko Mon, 27 Oct 2025 15:37:27 -0700

On Sat, Oct 25, 2025 at 11:07 AM Andrey Borodin <[email protected]> wrote:
>
>
>
> > On 25 Oct 2025, at 04:31, Masahiko Sawada <[email protected]> wrote:
> >
> > Or providing
> > 'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
> > text) -> uuid' might make sense too, but I'm not sure.
>
> I like the idea, so I drafted a prototype for discussion.
> Though I do not see what else methods should be provided along with added 
> one...


Thank you for drafting the patch! But I find it potentially confusing
to have different encoding methods for bytea and UUID types. I don't
see a compelling reason why the core should support base32hex
exclusively for the UUID data type, nor why base32hex should be the
only encoding method that the core provides for UUIDs (while we can
use it by default).

If we implement uuid_encode() and uuid_decode(), we might end up
creating similar encoding and decoding functions for other data types
as well, which doesn't seem like the best approach. I still believe
that extending the existing encode() and decode() functions is a
better starting point.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
________________________________________________

Masahiko,

I wanted to highlight an important discussion among the authors and 
contributors of RFC 9562 regarding UUID text encoding:
https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817

The RFC 9562 authors and contributors reached consensus that standardizing an 
alternate short text format for UUIDs is important. While the community debated 
between base32hex (RFC 4648) and Crockford's Base32, both were recognized for 
preserving lexicographical sort order, a critical property for database primary 
keys and URL-safe identifiers. Time constraints prevented inclusion in RFC 
9562, but the discussion established that base32hex is an existing standard, 
defined in RFC 4648, Section 7, with an alphabet chosen so that encoded data 
keeps the sort order of the underlying bytes.

This context is crucial because it underscores that the uuid type, as a 
first-class concept, deserves its own standardized text encoding.

Regarding the proposal to couple UUID encoding with the bytea type through 
encode()/decode() functions: I understand the appeal of reusing existing 
infrastructure, but this creates a conceptual mismatch. UUID is a distinct 
semantic type in PostgreSQL, not merely binary data. The bytea type has existed 
for decades without base32hex encoding, and that's worked fine, because bytea 
represents arbitrary binary data, not universally unique identifiers with 
specific structural properties and needs.

Consider PostgreSQL's own design philosophy. The documentation states:

"9.5. Binary String Functions and Operators

This section describes functions 
and operators for examining and manipulating binary strings, that is values of 
type bytea. Many of these are equivalent, in purpose and syntax, to the 
text-string functions described in the previous section."

PostgreSQL maintains parallel function sets for text strings and bytea 
precisely because they serve different purposes, despite the implementation 
overhead. The uuid type deserves the same treatment: it's not just another 
binary blob, but a type with specific semantics (uniqueness, version bits, 
variant encoding) and use cases (distributed identifiers, sortable keys, 
URL-safe representations).

Why should uuid be treated as a second-class citizen and forced through bytea 
conversion, when text and bytea each have their own dedicated function families?

You've been very careful in your previous arguments to separate data type 
conversion from encoding/decoding operations. I appreciate that rigor. However, 
the current proposal to route UUID encoding through bytea contradicts that 
principle. It merges two fundamentally different data types for convenience 
rather than correctness.

If someone wants to add base32hex encoding/decoding to bytea for general 
binary data operations, that's a worthwhile but separate discussion. The uuid 
type, however, needs native base32hex support to fulfill its role as a 
first-class PostgreSQL type with a standardized compact text representation, as 
recommended by the RFC 9562 community.

I would value your thoughts on these arguments.

Best regards,
Sergey Prokhorenko
