On Tue, Aug 3, 2021 at 11:36 PM David Rowley <dgrowle...@gmail.com> wrote: > > On Tue, 3 Aug 2021 at 22:43, John Naylor <john.nay...@enterprisedb.com> wrote: > > 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment. > > Yeah, the names no longer fit so well after stuffing the MSVC > intrinsic into the asm function. The reason I did it that way is down > to what I read in the docs. Namely: > > "If you run code that uses these intrinsics on hardware that doesn't > support the popcnt instruction, the results are unpredictable." > > So, it has to go somewhere where we're sure the CPU supports POPCNT > and that seemed like the correct place. > > From what I understand of GCC and __builtin_popcountl(), the code > it'll output will depend on the -mpopcnt flag. So having > __builtin_popcountl() does not mean the popcnt instruction will be > used. The Microsoft documentation indicates that's not the case for > __popcnt().
Okay, "unpredictable" sounds bad. > > 2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here. > > The problem there is that we define HAVE_X86_64_POPCNTQ based on the > outcome of configure so it does not get set for MSVC. Maybe it's > worth coming up with some other constant that can be defined or we > could just do: > > #if defined(_MSC_VER) && defined(_WIN64) > #define HAVE_X86_64_POPCNTQ > #endif That seems fine. I don't know PG can build with Arm on Windows, but for the cpuid to work, it seems safer to also check for __x86_64? > I think the reason for the asm is that __builtin_popcountl does not > mean popcnt will be used. Maybe we could have done something like > compile pg_bitutils.c with -mpopcnt, and have kept the run-time check, > but the problem there is that means that the compiler might end up > using that instruction in some other function in that file that we > don't want it to. It looks like my patch in [1] did pass the -mpopcnt > flag which Tom fixed. Ah, that makes sense. (If we someday offer a configure option for x86-64-v2, we can use intrinsics in the asm functions, and call them directly. Yet another different thread.) -- John Naylor EDB: http://www.enterprisedb.com