With commit e24d770 in place, I took a closer look at hex_decode(), and I
concluded that doing anything better without intrinsics would likely
require either a huge lookup table or something with complexity rivalling
the intrinsics approach (while also not rivalling its performance).  So, I
took a closer look at the intrinsics patches and had the following
thoughts:

* The approach looks generally reasonable to me, but IMHO the code needs
  much more commentary to explain how it works.

* The functions that test the length before potentially calling a function
  pointer should probably be inlined (see pg_popcount() in pg_bitutils.h).
  I wouldn't be surprised if some compilers are inlining this stuff
  already, but it's probably worth being explicit about it.

* Finally, I think we should ensure we've established a really strong case
  for this optimization.  IME these intrinsics patches require a ton of
  time and energy, and the code is often extremely complex.  I would be
  interested to see how your bytea test compares with the improvements
  added in commit e24d770 and with sending the data in binary.
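
To make the inlining suggestion in the second bullet concrete, here is a
rough sketch of the shape I have in mind, loosely modeled on how
pg_popcount() is structured.  Everything named here is hypothetical
(hex_decode_sketch, hex_decode_scalar, hex_decode_optimized, and the
threshold of 16 are placeholders, not what the patches use), and unlike the
real hex_decode() it does not skip whitespace:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the right value would have to come from benchmarks. */
#define HEX_DECODE_SIMD_THRESHOLD 16

/* Convert one hex digit to its value, or return -1 if it is not a digit. */
static inline int
hex_nibble(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Plain scalar decoder: returns bytes written, or -1 on invalid input. */
static long
hex_decode_scalar(const char *src, size_t len, uint8_t *dst)
{
	if (len % 2 != 0)
		return -1;
	for (size_t i = 0; i < len; i += 2)
	{
		int			hi = hex_nibble((unsigned char) src[i]);
		int			lo = hex_nibble((unsigned char) src[i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		dst[i / 2] = (uint8_t) ((hi << 4) | lo);
	}
	return (long) (len / 2);
}

/*
 * Stand-in for the pointer that would be set to an intrinsics
 * implementation after a runtime CPU check.  Here it simply points at the
 * scalar version so the sketch stays portable.
 */
static long (*hex_decode_optimized) (const char *src, size_t len,
									 uint8_t *dst) = hex_decode_scalar;

/*
 * The length test lives in an explicitly inlined wrapper, so short inputs
 * never pay for the indirect call.
 */
static inline long
hex_decode_sketch(const char *src, size_t len, uint8_t *dst)
{
	assert(src != NULL && dst != NULL);

	if (len < HEX_DECODE_SIMD_THRESHOLD)
		return hex_decode_scalar(src, len, dst);

	return hex_decode_optimized(src, len, dst);
}
```

The point is only that the length check and the short-input path sit in a
static inline function in a header, so every caller gets them without
relying on the compiler's own inlining decisions.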

-- 
nathan
