Nathan Bossart <nathandboss...@gmail.com> writes:
> The aforementioned other thread [0] aims to further optimize this code by
> using another instruction that requires additional configure and/or runtime
> checks.  $SUBJECT has been in the back of my mind for a while, but given
> proposals to add further complexity to this code, I figured it might be a
> good time to propose this simplification.  Specifically, I think we
> shouldn't worry about trying to compile only the special intrinsics
> versions, and instead always try to build both and choose the appropriate
> one at runtime.
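
For concreteness, the "build both and choose at runtime" scheme quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch. It uses a CRC-32C checksum as the example workload (an assumption; the quoted text only says "this code"). The names crc32c_impl, crc32c_choose, hw_crc32c_available and crc32c_hw are hypothetical, the probe is a stub that always reports no hardware support, and the "hardware" variant simply forwards to the portable one so the sketch builds on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *data, size_t len);

/* Counts how often the (hypothetical) CPU probe runs; should end up at 1. */
static int probe_count = 0;

/* Portable bitwise CRC-32C update (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(data != NULL || len == 0);
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in for an intrinsics-based variant; here it just forwards. */
static uint32_t
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    return crc32c_sw(crc, data, len);
}

/* Hypothetical probe: a real one would query CPUID, getauxval(), etc. */
static int
hw_crc32c_available(void)
{
    probe_count++;
    return 0;
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* All callers go through this pointer; it starts out at the chooser. */
static crc32c_fn crc32c_impl = crc32c_choose;

/* First call: probe once, rebind the pointer, then do the real work. */
static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    crc32c_impl = hw_crc32c_available() ? crc32c_hw : crc32c_sw;
    return crc32c_impl(crc, data, len);
}

/* Convenience wrapper: standard init value and final inversion. */
static uint32_t
crc32c(const void *data, size_t len)
{
    return ~crc32c_impl(0xFFFFFFFFu, data, len);
}
```

The probe cost is paid once per process. What remains on every use is the indirect call through crc32c_impl, and that is the overhead that would need to be measured.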

On the one hand, I agree that we need to keep the complexity from getting out of hand.  On the other hand, I wonder if this approach isn't optimizing for the wrong case.  How many machines that PG 17 will ever be run on in production will lack SSE 4.2 (for Intel) or ARMv8 instructions (on the ARM side)?  It seems like a shame to be burdening these instructions with a subroutine call for the benefit of long-obsolete hardware versions.  Maybe that overhead is negligible, but it doesn't sound like you tried to measure it.

I'm not quite sure what to propose instead, though.  I thought for a little bit about a configure switch to select "test first" or "pedal to the metal".  But in practice, package builders would probably have to select the conservative "test first" option; and we know that the vast majority of modern installations use prebuilt packages, so it's not clear that this answer would help many people.

Anyway, I agree that the cost of a one-time-per-process probe should be negligible; it's the per-use cost that I worry about.  If you can do some measurements proving that that worry is ill-founded, then I'm good with test-first.

			regards, tom lane