https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #5 from XChy <xxs_chy at outlook dot com> ---
(In reply to Jakub Jelinek from comment #4)
> So, e.g. on x86_64,
> unsigned int
> f1 (unsigned val)
> {
>   return val / 10 * 16 + val % 10;
> }
> 
> unsigned int
> f2 (unsigned val)
> {
>   return val / 10 * 6 + val;
> }
> 
> unsigned int
> f3 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * b + val % a;
> }
> 
> unsigned int
> f4 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * (b - a) + val % a;
> }
> 
> unsigned int
> f5 (unsigned val)
> {
>   return val / 93 * 127 + val % 93;
> }
> 
> unsigned int
> f6 (unsigned val)
> {
>   return val / 93 * (127 - 93) + val;
> }
> 
> f2, f3 and f5 are shorter compared to f1, f4 and f6 at -O2.
> With -Os, f3 is shorter than f4, while f1/f2 and f5/f6 are the same size
> (and also same number of insns there, perhaps f1 better than f2 as it uses
> shift rather than imul).
> So, this is really something that needs to take into account the machine
> specific expansion etc., isn't a clear winner all the time.

Thanks for your explanations! So the fold pays off on targets where "val % a"
is expensive, but not on targets where it is cheap. I'm not a GCC developer;
do you think I should report this against the rtl-optimization component?

Also, in your example f6 seems to be smaller than f5 at -O2:
https://godbolt.org/z/PEWKfj1je
