https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120022
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jin Haobo from comment #4)
> Thank you, but the "correct" code still generate suboptimal assembly code
> for myDivMod1, while Clang is optimal.
> GCC: https://gcc.godbolt.org/z/cvadddsz3
> Clang: https://gcc.godbolt.org/z/fdvaWzGo5
>
> Is this actually a suboptimal assembly, or I miss some subtle detail?

Yes, it is suboptimal: GCC is not going to CSE an inline-asm with multiple
outputs. GCC's inline-asm is not designed to be optimized as much as normal
code. Don't use inline-asm unless you really need to. GCC provides many
builtins for things like checking for multiplication overflow, and intrinsics
for accessing many of the crypto instructions of the HW, and it optimizes a
lot of plain C code (not always, though). Those should be used before
inline-asm.
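
To illustrate the advice above, here is a minimal sketch of what "use plain C
and builtins before inline-asm" could look like. The source of myDivMod1 is
not quoted in this comment, so div_mod_u64 below is a hypothetical stand-in
that assumes the function computes an unsigned quotient and remainder;
mul_u64_checked is likewise an illustrative name. Only __builtin_mul_overflow
is an actual GCC (and Clang) builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; not taken from the bug report. */
typedef struct {
    uint64_t quot;
    uint64_t rem;
} divmod_result;

/* Plain C quotient and remainder (assumed purpose of myDivMod1).
 * The compiler sees ordinary / and % here, so it is free to CSE
 * repeated calls and, on targets such as x86, to typically emit a
 * single divide instruction for both results. The caller must
 * ensure d != 0. */
static inline divmod_result div_mod_u64(uint64_t n, uint64_t d)
{
    divmod_result r = { n / d, n % d };
    return r;
}

/* Multiplication with overflow checking via a GCC builtin instead
 * of inline-asm. Returns true and stores the product in *out when
 * the multiplication does not overflow; returns false otherwise. */
static inline bool mul_u64_checked(uint64_t a, uint64_t b, uint64_t *out)
{
    return !__builtin_mul_overflow(a, b, out);
}
```

Because these are ordinary expressions and builtins rather than opaque asm
statements, the optimizer can reason about them in the same way as any other
C code. Whether the generated assembly matches a hand-written asm version
should still be checked for the target and optimization level in use.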