Alvaro Herrera <alvhe...@2ndquadrant.com> writes: > That leads me to the attached patch. It creates a new file > pg_popcount.c which is the only one compiled with -mpopcnt (if > available); if there's no compiler switch to enable POPCNT, we just > don't compile the file. I'm not sure that's kosher -- in particular I'm > not sure if it can fail when POPCNT is enabled by other flags and > -mpopcnt is not needed at all. I think our c-compiler.m4 stuff is a bit > too simplistic there: it just assumes that -mpopcnt is always required.
Yes, the configure test for this stuff is really pretty broken. It's conflating two nearly independent questions: (1) does the compiler have __builtin_popcount(), and (2) does the compiler accept -mpopcnt. It is certainly the case that (1) may hold without (2); in fact, every recent non-x86_64 gcc is a counterexample to how that's done in HEAD. I think we need a clean test for __builtin_popcount(), and to be willing to use it if available, independently of -mpopcnt. Then separately we should test to see if -mpopcnt works, probably with the same infrastructure we use for checking for other compiler flags, viz # Optimization flags for specific files that benefit from vectorization PGAC_PROG_CC_VAR_OPT(CFLAGS_VECTOR, [-funroll-loops]) PGAC_PROG_CC_VAR_OPT(CFLAGS_VECTOR, [-ftree-vectorize]) + # Optimization flags for bit-twiddling + PGAC_PROG_CC_VAR_OPT(CFLAGS_POPCNT, [-mpopcnt]) # We want to suppress clang's unhelpful unused-command-line-argument warnings Then the correct test to see if we want to build pg_popcount.c (BTW, please pick a less generic name for that) and the choose function is whether we have *both* HAVE__BUILTIN_POPCOUNT and nonempty CFLAGS_POPCNT. I don't think this'd be fooled by user-specified CFLAGS. The worst possible outcome is that it builds a function that we intended would use POPCNT but it's falling back to some other implementation, in case the compiler has a switch named -mpopcnt but it doesn't do what we think it does, or the user overrode things by adding -mno-popcnt. That would really be nearly cost-free, other than the overhead of the choose function the first time through: both of the execution functions would be reducing to __builtin_popcount(), for whatever version of that the compiler is giving us, so the choice wouldn't matter. regards, tom lane