On Wed, 15 Jan 2025 at 23:57, John Naylor <johncnaylo...@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Compilers that inline memcpy() may arrive at the same machine code,
> > but why rely on the compiler to make that optimization? If the
> > compiler fails to do so, an out-of-line memcpy() call will surely
> > be a loser.
>
> See measurements at the end. As for compilers, gcc 3.4.6 and clang
> 3.0.0 can inline the memcpy. The manual copy above only gets combined
> to a single word starting with gcc 12 and clang 15, and latest MSVC
> still can't do it (4A in the godbolt link below). Are there any
> buildfarm animals around that may not inline memcpy for word-sized
> input?
>
> > A variant could be
> >
> > +               const char *hexptr = &hextbl[2 * usrc];
> > +               *dst++ = hexptr[0];
> > +               *dst++ = hexptr[1];

I'd personally much rather see us using memcpy() for this sort of
stuff. If the compiler is too braindead to inline tiny
constant-and-power-of-two-sized memcpy() calls then we'd probably also
have plenty of other performance issues with that compiler already. I
don't think contorting the code into something less human-readable,
and something the compiler may struggle even more to optimise, is a
good idea.

The naive way to implement the above requires two MOVs of single
bytes and two increments of dst. I imagine it's easier for the
compiler to inline a small constant-sized memcpy() than to figure out
that it's safe to implement the above with a single word-sized MOV
rather than two byte-sized MOVs, due to the "dst++" in between the
two.

I agree that the evidence you (John) gathered is enough reason to use
memcpy().

David
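
For reference, a minimal standalone sketch of the two variants being
compared. This is not the code from the patch: the function names, the
build_hextbl() helper, and the way the 512-byte table is filled at run
time are all assumptions made for illustration. Only the loop bodies
mirror what is discussed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: entry i holds the two hex digits of byte value i. */
static char hextbl[512];

/* Fill the table at run time (illustrative helper, not from the patch). */
static void
build_hextbl(void)
{
    static const char digits[] = "0123456789abcdef";

    for (int i = 0; i < 256; i++)
    {
        hextbl[2 * i] = digits[i >> 4];
        hextbl[2 * i + 1] = digits[i & 0xF];
    }
}

/* Variant 1: one constant 2-byte memcpy() per input byte. */
static size_t
hex_encode_memcpy(const unsigned char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        memcpy(dst, &hextbl[2 * src[i]], 2);
        dst += 2;
    }
    return len * 2;
}

/* Variant 2: two single-byte stores with "dst++" in between. */
static size_t
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        const char *hexptr = &hextbl[2 * src[i]];

        *dst++ = hexptr[0];
        *dst++ = hexptr[1];
    }
    return len * 2;
}
```

Both functions produce identical output; the only question is whether
a given compiler emits one 16-bit store per input byte or two 8-bit
stores, which is easiest to check by comparing the generated assembly.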