On Mon, Sep 22, 2025 at 03:05:44PM -0500, Nathan Bossart wrote:
> I was able to improve the hex_decode() implementation a bit.
I took a closer look at how hex_decode() performs with smaller inputs.
There are some small regressions, so I tried fixing them by adding the
following to the beginning of the function:

    if (likely(tail_idx == 0))
        return hex_decode_safe_scalar(src, len, dst, escontext);

This helped a little, but it mostly just slowed things down for larger
inputs on AArch64:

arm

  buf  |  HEAD | patch |  fix
-------+-------+-------+-------
     2 |     4 |     6 |     4
     4 |     6 |     7 |     7
     8 |     8 |     8 |     8
    16 |    11 |    12 |    11
    32 |    18 |     5 |     6
    64 |    38 |     7 |     8
   256 |   134 |    18 |    24
  1024 |   514 |    67 |   100
  4096 |  2072 |   280 |   389
 16384 |  8409 |  1126 |  1537
 65536 | 34704 |  4498 |  6128

x86

  buf  |  HEAD | patch |  fix
-------+-------+-------+-------
     2 |     2 |     2 |     2
     4 |     3 |     3 |     3
     8 |     4 |     4 |     4
    16 |     8 |     9 |     8
    32 |    23 |     5 |     5
    64 |    37 |     7 |     7
   256 |   122 |    24 |    24
  1024 |   457 |    91 |    92
  4096 |  1798 |   357 |   358
 16384 |  7161 |  1411 |  1416
 65536 | 28621 |  5630 |  5653

I didn't run this test for hex_encode(), but I'd expect it to follow a
similar pattern.

I'm tempted to suggest that these regressions are within tolerable levels
and to forge on with v10.  In any case, IMHO this patch is approaching
committable quality, so I'd be grateful for any feedback.
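For anyone skimming the thread, here's a heavily simplified, standalone
sketch of the dispatch shape discussed above.  None of the names below
(SIMD_WIDTH, hex_decode_scalar, etc.) are the patch's actual code; it just
illustrates why tail_idx == 0 means "too short for the vector loop, take
the scalar path immediately":

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SIMD_WIDTH 16			/* input bytes consumed per vector step */

#ifndef likely
#define likely(x) __builtin_expect((x) != 0, 1)
#endif

/* Decode one hex digit; returns -1 for anything that isn't one. */
static int
hex_digit(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Byte-at-a-time decoder: used for short inputs and the tail. */
static bool
hex_decode_scalar(const char *src, size_t len, char *dst)
{
	size_t		i;

	if (len % 2 != 0)
		return false;			/* hex input must have even length */

	for (i = 0; i < len; i += 2)
	{
		int			hi = hex_digit((unsigned char) src[i]);
		int			lo = hex_digit((unsigned char) src[i + 1]);

		if (hi < 0 || lo < 0)
			return false;
		dst[i / 2] = (char) ((hi << 4) | lo);
	}
	return true;
}

static bool
hex_decode(const char *src, size_t len, char *dst)
{
	/* first input byte that the (hypothetical) vector loop won't cover */
	size_t		tail_idx = len - (len % (SIMD_WIDTH * 2));

	/* the check under discussion: skip all vector setup for short inputs */
	if (likely(tail_idx == 0))
		return hex_decode_scalar(src, len, dst);

	/*
	 * The real patch runs a SIMD loop over src[0 .. tail_idx) here; this
	 * sketch substitutes the scalar decoder so it stays self-contained.
	 */
	if (!hex_decode_scalar(src, tail_idx, dst))
		return false;

	/* scalar cleanup for the remainder */
	return hex_decode_scalar(src + tail_idx, len - tail_idx,
							 dst + tail_idx / 2);
}

int
main(void)
{
	const char *in = "48656c6c6f";	/* "Hello" */
	char		out[16] = {0};

	if (hex_decode(in, strlen(in), out))
		printf("%s\n", out);
	return 0;
}

-- 
nathan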