https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102860
--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jan 2022, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102860
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
>
> --- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Short testcase:
> function foo(a)
>   integer(kind=4) :: a(1024)
>   a(:) = modulo (a(:), 39)
> end function
> -O2 -mcpu=power10.
> vect_recog_divmod_pattern only handles TRUNC_{DIV,MOD}_EXPR and EXACT_DIV_EXPR
> (and isn't guaranteed to succeed anyway), but optab_for_tree_code returns the
> same smod_optab or sdiv_optab (if signed; FLOOR_* for unsigned is mapped to
> TRUNC_*).
> I guess the quickest way would be to punt on {CEIL,FLOOR,ROUND}_{DIV,MOD}_EXPR
> in the vectorizer and tree-vect-generic.cc

True.

> Further gradual improvements can be:
> 1) match.pd has:
> /* For unsigned integral types, FLOOR_DIV_EXPR is the same as
>    TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  */
> (simplify
>  (floor_div @0 @1)
>  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>       && TYPE_UNSIGNED (type))
>   (trunc_div @0 @1)))
> but expmed.cc has:
>   /* Promote floor rounding to trunc rounding for unsigned operations.  */
>   if (unsignedp)
>     {
>       if (code == FLOOR_DIV_EXPR)
>         code = TRUNC_DIV_EXPR;
>       if (code == FLOOR_MOD_EXPR)
>         code = TRUNC_MOD_EXPR;
>       if (code == EXACT_DIV_EXPR && op1_is_pow2)
>         code = TRUNC_DIV_EXPR;
>     }
> Shouldn't we make it
> (for floor_divmod (floor_div floor_mod)
>      trunc_divmod (trunc_div trunc_mod)
>  (simplify
>   (floor_divmod @0 @1)
>   (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>        && TYPE_UNSIGNED (type))
>    (trunc_divmod @0 @1))))
> ?

Yeah, if the simplification is incomplete we should amend it.

> 2) as the RTL optabs really do just trunc div/mod, perhaps
> tree-vect-patterns.cc
> could be changed to replace some or all of those operations with the trunc
> operation followed by some arith and cond_exprs so that the vectorizer knows
> actual cost of those operations.
> E.g. it seems expmed.cc expands
> r = x %[fl] y;
> as
> r = x % y; if (r && (x ^ y) < 0) r += y;
> and
> d = x /[fl] y;
> would be
> r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
> Looking at wide-int.h,
> r = x %[cl] y;
> as
> r = x % y; if (r && (x ^ y) >= 0) r -= y;
> and
> d = x /[cl] y;
> as
> r = x % y; d = x / y; if (r && (x ^ y) >= 0) ++d;
> All of the above for signed, as I said earlier, unsigned [fl] is the same as
> trunc and unsigned [cl] should replace (x ^ y) >= 0 with 1.
> [rd] is even more complex.

That sounds reasonable as well.  I think we can do 0) and 1) now and defer
2) to the next stage1, maybe tracking it with an enhancement bugreport.
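
For reference, the signed expansions quoted in 2) can be written out as
plain C.  This is only a sketch of the scalar identities, not the
expmed.cc or wide-int.h code: the function names are made up for
illustration, and it assumes C99 truncating / and %, y != 0, and no
INT_MIN / -1 overflow.

```c
/* Hypothetical helpers: floor/ceil division and modulo for signed int,
   built from the truncating / and % operators.  Assumes y != 0 and that
   x / y does not overflow (i.e. not INT_MIN / -1).  */

/* r = x %[fl] y: the result takes the sign of the divisor.  */
int floor_mod(int x, int y)
{
    int r = x % y;
    /* Nonzero remainder and operands of opposite sign: adjust.  */
    if (r && (x ^ y) < 0)
        r += y;
    return r;
}

/* d = x /[fl] y: round the quotient towards negative infinity.  */
int floor_div(int x, int y)
{
    int r = x % y;
    int d = x / y;
    if (r && (x ^ y) < 0)
        --d;
    return d;
}

/* r = x %[cl] y: remainder matching a quotient rounded up.  */
int ceil_mod(int x, int y)
{
    int r = x % y;
    /* Nonzero remainder and operands of the same sign: adjust.  */
    if (r && (x ^ y) >= 0)
        r -= y;
    return r;
}

/* d = x /[cl] y: round the quotient towards positive infinity.  */
int ceil_div(int x, int y)
{
    int r = x % y;
    int d = x / y;
    if (r && (x ^ y) >= 0)
        ++d;
    return d;
}
```

With the testcase's divisor, floor_mod (-1, 39) gives 38 where plain
-1 % 39 gives -1, which is the difference a trunc-only vector expansion
would lose.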