https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
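--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
E.g.

unsigned
r3_128u_v3 (__uint128_t n)
{
  unsigned long a;
  /* 2^44 % 3 == 1, so n % 3 equals the sum of the three limbs % 3.  */
  a = (n >> 88);                     /* top 40 bits */
  a += (n >> 44) & 0xfffffffffffULL; /* middle 44 bits */
  a += (n & 0xfffffffffffULL);       /* low 44 bits */
  return a % 3;                      /* a < 2^46, so a 64-bit modulo suffices */
}

could work, but I haven't measured how fast it is on average compared with the libcall (__umodti3).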
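The shortcut relies on 2^44 = 4^22, which is 1 (mod 3). Writing n = hi * 2^88 + mid * 2^44 + lo therefore gives n mod 3 = (hi + mid + lo) mod 3, and the limb sum stays below 2^46. The following is a minimal sketch for checking correctness only (it is not a benchmark, so it says nothing about speed versus the libcall). It assumes GCC or Clang on a 64-bit target where __uint128_t exists. The accumulator is widened to uint64_t, an assumption made so the sketch also behaves on LLP64 targets where unsigned long is 32 bits. xorshift64 and count_mismatches are hypothetical helpers, not part of GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The function from the comment, with the accumulator widened to uint64_t
   (assumption: avoids truncation where unsigned long is only 32 bits;
   on LP64 targets it behaves identically to the original).  */
static unsigned
r3_128u_v3 (__uint128_t n)
{
  uint64_t a;
  a = (uint64_t) (n >> 88);                      /* bits 88..127 (40 bits) */
  a += (uint64_t) (n >> 44) & 0xfffffffffffULL;  /* bits 44..87 */
  a += (uint64_t) n & 0xfffffffffffULL;          /* bits 0..43 */
  return a % 3;                                  /* a < 2^46 */
}

/* Hypothetical helper: one step of Marsaglia's xorshift64 generator,
   used only to produce deterministic test inputs.  */
static uint64_t
xorshift64 (uint64_t *s)
{
  *s ^= *s << 13;
  *s ^= *s >> 7;
  *s ^= *s << 17;
  return *s;
}

/* Hypothetical helper: compares the shortcut with the generic 128-bit
   remainder for a few edge cases plus `count` pseudo-random inputs and
   returns the number of mismatches.  */
static unsigned long
count_mismatches (uint64_t seed, int count)
{
  assert (seed != 0);  /* xorshift64 never leaves the all-zero state */
  const __uint128_t edge[] = {
    0, 1, 2, 3,
    (__uint128_t) 1 << 44, (__uint128_t) 1 << 88, (__uint128_t) 1 << 127,
    ~(__uint128_t) 0
  };
  unsigned long bad = 0;

  for (size_t i = 0; i < sizeof edge / sizeof edge[0]; i++)
    bad += r3_128u_v3 (edge[i]) != (unsigned) (edge[i] % 3);

  for (int i = 0; i < count; i++)
    {
      uint64_t hi = xorshift64 (&seed);
      uint64_t lo = xorshift64 (&seed);
      __uint128_t n = ((__uint128_t) hi << 64) | lo;
      bad += r3_128u_v3 (n) != (unsigned) (n % 3);
    }
  return bad;
}
```

Calling count_mismatches (88172645463325252ULL, 1000000) should return 0 if the limb-sum identity holds. Answering the open question about average speed would still need a separate timing harness against the `n % 3` libcall path.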