https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105
--- Comment #5 from XChy <xxs_chy at outlook dot com> ---
(In reply to Jakub Jelinek from comment #4)
> So, e.g. on x86_64,
> unsigned int
> f1 (unsigned val)
> {
>   return val / 10 * 16 + val % 10;
> }
>
> unsigned int
> f2 (unsigned val)
> {
>   return val / 10 * 6 + val;
> }
>
> unsigned int
> f3 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * b + val % a;
> }
>
> unsigned int
> f4 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * (b - a) + val % a;
> }
>
> unsigned int
> f5 (unsigned val)
> {
>   return val / 93 * 127 + val % 93;
> }
>
> unsigned int
> f6 (unsigned val)
> {
>   return val / 93 * (127 - 93) + val;
> }
>
> f2, f3 and f5 are shorter compared to f1, f4 and f6 at -O2.
> With -Os, f3 is shorter than f4, while f1/f2 and f5/f6 are the same size
> (and also same number of insns there, perhaps f1 better than f2 as it uses
> shift rather than imul).
> So, this is really something that needs to take into account the machine
> specific expansion etc., isn't a clear winner all the time.

Thanks for the explanation! It's a good fold for targets where "v % a" is
expensive, but not for targets where it is cheap.
I'm not a GCC developer; do you think I should report this under
rtl-optimization?

Also, it seems that f6 is smaller than f5 at -O2 in your example:
https://godbolt.org/z/PEWKfj1je
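
For reference, the fold under discussion relies on val % a == val - val / a * a, which gives val / a * b + val % a == val / a * (b - a) + val in unsigned (wrapping) arithmetic. Below is a minimal sketch that spot-checks this on sampled inputs. The names with_mod, without_mod and count_mismatches are hypothetical helpers written for this illustration, not anything from GCC, and the sampling range is an arbitrary assumption rather than an exhaustive proof.

```c
#include <assert.h>

/* Form that keeps the remainder (shape of f1 and f5 above). */
static unsigned with_mod(unsigned val, unsigned a, unsigned b)
{
    return val / a * b + val % a;
}

/* Form without the remainder (shape of f2 and f6 above). */
static unsigned without_mod(unsigned val, unsigned a, unsigned b)
{
    return val / a * (b - a) + val;
}

/* Hypothetical helper: counts sampled inputs where the two forms disagree.
   Samples small values and values counting down from UINT_MAX; a must be
   nonzero. */
static unsigned count_mismatches(unsigned a, unsigned b)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < 100000u; i++) {
        unsigned lo = i;            /* 0, 1, 2, ... */
        unsigned hi = 0u - 1u - i;  /* UINT_MAX, UINT_MAX - 1, ... */
        if (with_mod(lo, a, b) != without_mod(lo, a, b))
            bad++;
        if (with_mod(hi, a, b) != without_mod(hi, a, b))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches(10, 16) and count_mismatches(93, 127) covers the f1/f2 and f5/f6 pairs. Since both forms compute the same value, which one is preferable comes down to the target's cost for the remainder versus the extra multiply, as discussed above.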