On Mon, Sep 22, 2025 at 03:05:44PM -0500, Nathan Bossart wrote:
> I was able to improve the hex_decode() implementation a bit.

I took a closer look at how hex_decode() performs with smaller inputs.
There are some small regressions, so I tried fixing them by adding the
following to the beginning of the function:

    /* input too short for a full SIMD chunk; take the scalar path directly */
    if (likely(tail_idx == 0))
        return hex_decode_safe_scalar(src, len, dst, escontext);
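
For context, here is a self-contained sketch of the dispatch shape this is
aiming at.  It is illustrative only: the names CHUNK, hex_nibble,
hex_decode_scalar, and hex_decode_dispatch are made up, and the chunked
loop is plain scalar code where the actual patch uses SIMD instructions.

    /*
     * Standalone sketch, not the patch itself: short inputs skip the
     * chunked path entirely and go straight to the scalar decoder.
     */
    #include <stddef.h>

    #define CHUNK 32            /* hypothetical hex chars per SIMD iteration */

    static int
    hex_nibble(char c)
    {
        if (c >= '0' && c <= '9')
            return c - '0';
        if (c >= 'a' && c <= 'f')
            return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
            return c - 'A' + 10;
        return -1;
    }

    /* plain scalar decoder: two hex chars -> one output byte */
    static size_t
    hex_decode_scalar(const char *src, size_t len, unsigned char *dst)
    {
        size_t      i;

        for (i = 0; i + 1 < len; i += 2)
        {
            int         hi = hex_nibble(src[i]);
            int         lo = hex_nibble(src[i + 1]);

            if (hi < 0 || lo < 0)
                return 0;       /* error handling elided for brevity */
            dst[i / 2] = (unsigned char) ((hi << 4) | lo);
        }
        return len / 2;
    }

    size_t
    hex_decode_dispatch(const char *src, size_t len, unsigned char *dst)
    {
        /* bytes the chunked loop will consume; 0 if no full chunk fits */
        size_t      tail_idx = len - (len % CHUNK);

        /* short input: skip the SIMD setup cost entirely */
        if (tail_idx == 0)
            return hex_decode_scalar(src, len, dst);

        /* chunked main loop: vectorized in the real patch */
        hex_decode_scalar(src, tail_idx, dst);

        /* scalar cleanup for the remaining tail */
        return tail_idx / 2 +
            hex_decode_scalar(src + tail_idx, len - tail_idx,
                              dst + tail_idx / 2);
    }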

This helped a little for the smallest inputs, but it mostly just slowed
things down for larger inputs on AArch64.  In the tables below, buf is the
input size and the other columns are timings, so smaller is better:

                arm
    buf  | HEAD  | patch |  fix 
  -------+-------+-------+-------
       2 |     4 |     6 |     4
       4 |     6 |     7 |     7
       8 |     8 |     8 |     8
      16 |    11 |    12 |    11
      32 |    18 |     5 |     6
      64 |    38 |     7 |     8
     256 |   134 |    18 |    24
    1024 |   514 |    67 |   100
    4096 |  2072 |   280 |   389
   16384 |  8409 |  1126 |  1537
   65536 | 34704 |  4498 |  6128

                x86
    buf  | HEAD  | patch |  fix
  -------+-------+-------+-------
       2 |     2 |     2 |     2
       4 |     3 |     3 |     3
       8 |     4 |     4 |     4
      16 |     8 |     9 |     8
      32 |    23 |     5 |     5
      64 |    37 |     7 |     7
     256 |   122 |    24 |    24
    1024 |   457 |    91 |    92
    4096 |  1798 |   357 |   358
   16384 |  7161 |  1411 |  1416
   65536 | 28621 |  5630 |  5653
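
For anyone who wants to reproduce this kind of measurement, a minimal
harness (my own illustration, not the actual driver behind the tables
above) could look something like this, reusing hex_decode_dispatch from
the earlier sketch:

    /*
     * Illustrative micro-benchmark: times hex_decode_dispatch() across
     * the same buffer sizes as the tables above.  Compile and link
     * together with the earlier sketch.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    extern size_t hex_decode_dispatch(const char *src, size_t len,
                                      unsigned char *dst);

    int
    main(void)
    {
        static const size_t sizes[] = {2, 4, 8, 16, 32, 64, 256,
                                       1024, 4096, 16384, 65536};
        const int   iters = 100000;
        size_t      s;

        for (s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++)
        {
            size_t      len = sizes[s];
            char       *src = malloc(len);
            unsigned char *dst = malloc(len / 2);
            struct timespec start, end;
            double      ns;
            int         i;

            memset(src, 'a', len);      /* all-'a' input is valid hex */

            clock_gettime(CLOCK_MONOTONIC, &start);
            for (i = 0; i < iters; i++)
                hex_decode_dispatch(src, len, dst);
            clock_gettime(CLOCK_MONOTONIC, &end);

            ns = (end.tv_sec - start.tv_sec) * 1e9 +
                (end.tv_nsec - start.tv_nsec);
            printf("%6zu | %8.0f ns/call\n", len, ns / iters);

            free(src);
            free(dst);
        }
        return 0;
    }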

I didn't do this test for hex_encode(), but I'd expect it to follow a
similar pattern.  I'm tempted to suggest that these regressions are within
tolerable levels and to forge on with v10.  In any case, IMHO this patch is
approaching committable quality, so I'd be grateful for any feedback.

-- 
nathan

