https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108248
--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So a datapoint in this effort. For the Veyron V1, all the bitmanip instructions except clmul and cpop are single cycle and can be handled by any of the 4 standard ALUs. clmul and cpop are 4c and use the shared multi-cycle ALU. Obviously we may need to break things down further for other uarchs. But that's a start.
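
To make the split concrete, here is a rough sketch of the two cost classes described above as a lookup. It is only an illustration, not GCC's scheduler model: the unit names (`alu0`..`alu3`, `multi`), the `InsnCost` type, and the `bitmanip_cost` helper are all made up for this example, and it assumes the caller only passes bitmanip mnemonics.

```python
from dataclasses import dataclass

# Hypothetical unit names. The comment only says there are 4 standard ALUs
# and one shared multi-cycle ALU; it does not name them.
STANDARD_ALUS = ("alu0", "alu1", "alu2", "alu3")
MULTI_CYCLE_ALU = ("multi",)


@dataclass(frozen=True)
class InsnCost:
    latency: int   # cycles, as stated in the comment
    units: tuple   # units that may execute the instruction


# The two exceptions named in the comment. Variants of these mnemonics are
# not mentioned there, so this sketch does not try to classify them.
MULTI_CYCLE_BITMANIP = {"clmul", "cpop"}


def bitmanip_cost(mnemonic: str) -> InsnCost:
    """Return the assumed Veyron V1 cost class for a bitmanip mnemonic.

    Assumes `mnemonic` really is a bitmanip instruction; no validation.
    """
    if mnemonic in MULTI_CYCLE_BITMANIP:
        return InsnCost(latency=4, units=MULTI_CYCLE_ALU)
    return InsnCost(latency=1, units=STANDARD_ALUS)
```

For example, `bitmanip_cost("cpop")` falls into the 4-cycle, shared-unit class, while something like `bitmanip_cost("andn")` falls into the single-cycle class that any of the four ALUs can take. Other uarchs would likely need more than two classes, which is the "break things down further" point.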