On Mon, Jun 29, 2026 at 2:55 PM Masahiko Sawada <[email protected]> wrote:
>
> On Sun, Jun 28, 2026 at 7:20 PM Haibo Yan <[email protected]> wrote:
> >
> > On Thu, Jun 25, 2026 at 3:16 PM Masahiko Sawada <[email protected]>
> > wrote:
> > >
> > > On Thu, Jun 25, 2026 at 2:31 PM Haibo Yan <[email protected]> wrote:
> > > >
> > > > On Thu, Jun 25, 2026 at 11:28 AM Masahiko Sawada
> > > > <[email protected]> wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> I'd like to propose the $subject.
> > > >>
> > > >> Since commit ec8719ccbfcd made hex_decode_safe() SIMD-aware, decoding
> > > >> a run of hex digits is now fast. The attached patch reuses
> > > >> hex_decode_safe() in the UUID input function to speed up parsing.
> > > >>
> > > >> We accept several textual forms of a UUID[1]. The fast path handles
> > > >> the common ones: 32 hex digits, the canonical 8x-4x-4x-4x-12x form
> > > >> (where "nx" means n hex digits), and either of those wrapped in
> > > >> braces. Otherwise, it falls back to the ordinary scalar UUID parse.
> > > >>
> > > >> I've benchmarked the parse speed using the following query:
> > > >>
> > > >> CREATE TEMP TABLE u AS SELECT gen_random_uuid()::text AS t FROM
> > > >> generate_series(1, 1000000);
> > > >> EXPLAIN (ANALYZE, TIMING OFF) SELECT t::uuid FROM u;
> > > >>
> > > >> I compared the execution time of the second query, which measures
> > > >> uuid_in() alone, with/without SIMD optimization. Here are results (the
> > > >> median of 5 runs):
> > > >>
> > > >> HEAD: 208.879 ms
> > > >> Patched: 40.983 ms
> > > >>
> > > >> The improvements look promising to me. But in a realistic pipeline the
> > > >> parse is a small fraction of the work, so end-to-end gains could be
> > > >> much smaller.
> > > >>
> > > >> Feedback is very welcome.
> > > >>
> > > >
> > > > I may be missing something, but I wonder whether the fast path is
> > > > relying on slightly different input semantics from the existing UUID
> > > > parser.
> > > >
> > > > In particular, hex_decode_safe() is not a strict “32 hex characters
> > > > only” decoder. It skips whitespace, which is fine for its existing
> > > > callers, but I don’t think UUID input should treat whitespace inside
> > > > the UUID body as ignorable.
> > >
> > > Good catch! hex_decode_safe() skips whitespaces so the patch accepts
> > > the following UUID value, which is bad:
> > >
> > > select '019f00b5-7f8a-722f-b707-59f0ed25cd '::uuid;
> > >                  uuid
> > > --------------------------------------
> > >  019f00b5-7f8a-722f-b707-59f0ed25cd00
> > > (1 row)
> > >
> > > > Also, since hex_decode_safe() returns void, the UUID fast path
> > > > cannot verify that exactly UUID_LEN bytes were produced.
> > >
> > > IIUC hex_decode_safe() does return the output length in bytes. So I
> > > think we can fallback to the scalar UUID parser if
> > > esctx.error_occurred is true or if the returned value is not 16.
> >
> > You’re right, I misread that part. Checking both esctx.error_occurred and
> > the returned length sounds good to me.
> >
> > > >
> > > > So I think it would be safer either to pre-validate that the 32 source
> > > > characters are all hex digits before calling hex_decode_safe(), or to
> > > > use a UUID-specific strict hex decoder for this path. After that, a
> > > > comment explaining why hex_decode_safe() is safe here would make the
> > > > invariant much clearer.
> > >
> > > IIUC hex_decode_simd_helper() accepts only hex digits so we could
> > > re-use it for UUID parsing. Let me check if the above idea of using
> > > the return value works for us first.
> >
> > That sounds reasonable. My main concern was to keep the fast path’s
> > accepted input set identical to the scalar UUID parser. Falling back when
> > the decoded length is not UUID_LEN, together with regression tests for
> > whitespace cases, should address that.
> >
> > > > Could you also add a few regression tests for invalid inputs that
> > > > contain whitespace inside otherwise fast-path-looking UUID strings?
> > > > For example:
> > > >
> > > > ---------------------------------------------------------------
> > > > SELECT 'a0eebc99 9c0b4ef8bb6d6bb9bd380a11'::uuid;
> > > > SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a1 '::uuid;
> > > > SELECT '{a0eebc999c0b4ef8bb6d6bb9bd380a1 }'::uuid;
> > > > SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1 '::uuid;
> > > > ---------------------------------------------------------------
> > > >
> > > > These should continue to be rejected in the same way as the scalar
> > > > parser.
> > > >
> > > > Regards,
> > >
> > > Agreed.
>
> I've attached the updated patch.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

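To make the invariant discussed above concrete, here is a minimal standalone sketch in C. The rule it illustrates is that the fast path accepts only a strict subset of what the scalar parser accepts and rejects whitespace instead of skipping it. This is not the code in the patch. uuid_fast_parse(), decode_strict() and hex_value() are hypothetical names. A plain scalar loop stands in for hex_decode_safe() and the SIMD helper so that the acceptance rule is easy to read. Any input the sketch rejects is assumed to go to the ordinary scalar parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical helper: value of one hex digit, or -1 if c is not one.
 * Deliberately strict: whitespace and every other character yield -1. */
static int
hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical helper: decode exactly 2 * nbytes hex digits from src into
 * dst. Returns false on the first non-hex character; dst may then be
 * partially written, which is fine because the caller falls back. */
static bool
decode_strict(const char *src, unsigned char *dst, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
	{
		int		hi = hex_value(src[2 * i]);
		int		lo = hex_value(src[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return false;
		dst[i] = (unsigned char) ((hi << 4) | lo);
	}
	return true;
}

/*
 * Hypothetical fast path: accepts only 32 hex digits or the canonical
 * 8-4-4-4-12 form, optionally wrapped in braces. Returns false for
 * anything else (including embedded or trailing whitespace) so that the
 * caller can fall back to the scalar parser.
 */
static bool
uuid_fast_parse(const char *src, unsigned char out[UUID_LEN])
{
	size_t		len = strlen(src);

	/* Strip one pair of surrounding braces, if present. */
	if (len >= 2 && src[0] == '{' && src[len - 1] == '}')
	{
		src++;
		len -= 2;
	}

	/* 32 hex digits with no separators. */
	if (len == 32)
		return decode_strict(src, out, UUID_LEN);

	/* Canonical form: hyphens at offsets 8, 13, 18 and 23. */
	if (len == 36)
	{
		if (src[8] != '-' || src[13] != '-' ||
			src[18] != '-' || src[23] != '-')
			return false;
		return decode_strict(src, out, 4) &&
			decode_strict(src + 9, out + 4, 2) &&
			decode_strict(src + 14, out + 6, 2) &&
			decode_strict(src + 19, out + 8, 2) &&
			decode_strict(src + 24, out + 10, 6);
	}

	return false;
}
```

Under this scheme, each whitespace case in the proposed regression tests either has the wrong length or fails the strict hex check. It therefore falls through to the scalar parser, which rejects it as before.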
I noticed a few typos in the comments in src/backend/utils/adt/uuid.c:

- line 56: “scalar implmentation” -> “scalar implementation”
- line 109: “swalled” -> “swallowed”
- line 110: “kepping” -> “keeping”
- line 118: “grammer” -> “grammar”
- line 119: “whitespaces” -> “whitespace”

Could you fix them? Thank you.

Regards,
Haibo
