On Wed, Feb 19, 2025 at 1:28 AM Devulapalli, Raghuveer <raghuveer.devulapa...@intel.com> wrote: > The runtime detection code could also be appended with function > __attribute__((constructor)) so that it gets executed before main.
Hmm! I didn't know about that, thanks! It works on old gcc/clang, so that's good. I can't verify online if Arm etc. executes properly since the execute button is greyed out, but I don't see why it wouldn't: https://godbolt.org/z/3as9MGr3G Then we could have: #ifdef FRONTEND pg_attribute_constructor() #endif void pg_cpucap_initialize(void) { ... } Now, there is no equivalent on MSVC and workarounds are fragile [1]. Maybe we could only assert initialization happened for the backend and for frontend either - add a couple manual initializations to to the frontend programs where we don't want to lose performance for non-gcc/clang. - require CRC on x86-64 MSVC since Windows 10 is EOL soon, going by Thomas M.'s earlier findings on popcount (also SSE4.2) [2] The first is less risky but less tidy. > > I think the runtime dispatch is fairly simple for this case. > + pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len) > + { > + .... > + #ifdef HAVE_PCLMUL_RUNTIME > + if (len >= PCLMUL_THRESHOLD && (pg_cpucap & PGCPUCAP_CLMUL)) > > IMO, the pclmul and sse4.2 versions should be dispatched independently and > the dispatch logic should be moved out of the pg_crc32c.h to it own > pg_crc32c_dispatch.c file. That makes sense, but it does close the door on future inlining. > Also, thank you for taking lead on developing this patch. Since the latest > patch seems to be in a good shape, I'm happy to provide feedback and review > your work, or can continue development work based on any feedback. Please let > me know which you'd prefer. Sure. Depending on any feedback on the above constructor technique, I'll work on the following, since I've prototyped most of the first and the second is mostly copy-and-paste from your earlier work: > > pg_cpucap_crc32c > > I like the idea of moving all CPU runtime detection to a single module > outside of implementations. This makes it easy to extend in the future and > keeps the functionalities separate. > > > - Rename the CRC choose*.c files to pg_cpucap_{x86,arm}.c and build them > > unconditionally for each platform > > +1. Sounds perfect. We should also move the avx512 runtime detection of > popcount here. [1] https://stackoverflow.com/questions/1113409/attribute-constructor-equivalent-in-vc [2] https://www.postgresql.org/message-id/ca+hukgks64zjezv9y9mpcb-j0i+flgiv3fadwsh_3scavdr...@mail.gmail.com -- John Naylor Amazon Web Services