On Thu, Dec 8, 2022 at 2:56 AM Jose E. Marchesi via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > The expand_expr_divmod function in expr.cc attempts to optimize cases > where both arguments of a division/modulus are known to be positive > when interpreted as signed. In these cases, both signed division and > unsigned division will raise the same value, and therefore the > cheapest option can be used.
I suspect this issue is also the same as PR 48783 . Thanks, Andrew > > In order to determine what is the cheaper option in the current > target, expand_expr_divmod actually expands both a signed divmod and > an unsigned divmod using local "sequences": > > start_sequence (); > ... > expand_divmod (... signed ...); > ... > end_sequence (); > > start_sequence (); > ... > expand_divmod (... unsigned ...); > ... > end_sequence (); > > And then compares the cost of each generated sequence, choosing the > best one. Finally, it emits the selected expanded sequence and > returns the rtx with the result. > > This approach has a caveat. Some targets do not provide instructions > for division/modulus instructions. In the case of BPF, it provides > unsigned division/modulus, but not signed division/modulus. > > In these cases, the expand_divmod tries can contain calls to funcalls. > For example, in BPF: > > start_sequence (); > ... > expand_divmod (... signed ...); -> This generates funcall to __divdi3 > ... > end_sequence (); > > start_sequence (); > ... > expand_divmod (... unsigned ...); -> This generates direct `div' insn. > ... > end_sequence (); > > The problem is that when a funcall is expanded, an accompanying global > symbol definition is written in the output stream: > > .global __divdi3 > > And this symbol definition remains in the compiled assembly file, even > if the sequence using the direct `div' instruction above is used. > > This is particularly bad in BPF, because the kernel bpf loader chokes > on the spurious symbol __divdi3 and makes the resulting BPF object > unloadable (note that BPF objects are not linked before processed by > the kernel.) > > In order to fix this, this patch modifies expand_expr_divmod in the > following way: > > - When trying each sequence (signed, unsigned) the expand_divmod calls > are told to _not_ use libcalls if everything else fails. This is > done by passing OPTAB_WIDEN as the `methods' argument. (Before it > was using the default value OPTAB_LIB_WIDEN.) > > - If any of the tried expanded sequences contain a funcall, then the > optimization is not attempted. > > A couple of BPF tests are also added to make sure this doesn't break > at any point in the future. > > Tested in bpf-unknown-none and x86_64-linux-gnu. > Regtested in x86_64-linux-gnu. No regressions. > > gcc/ChangeLog > > * expr.cc (expand_expr_divmod): Avoid side-effects of trying > sequences involving funcalls in optimization. > > gcc/testsuite/ChangeLog: > > * gcc.target/bpf/divmod-funcall-1.c: New test. > * gcc.target/bpf/divmod-funcall-2.c: Likewise. > --- > gcc/expr.cc | 44 +++++++++++-------- > .../gcc.target/bpf/divmod-funcall-1.c | 8 ++++ > .../gcc.target/bpf/divmod-funcall-2.c | 8 ++++ > 3 files changed, 41 insertions(+), 19 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > > diff --git a/gcc/expr.cc b/gcc/expr.cc > index d9407432ea5..4d4be5d7bda 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -9168,32 +9168,38 @@ expand_expr_divmod (tree_code code, machine_mode > mode, tree treeop0, > do_pending_stack_adjust (); > start_sequence (); > rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, > - op0, op1, target, 1); > + op0, op1, target, 1, OPTAB_WIDEN); > rtx_insn *uns_insns = get_insns (); > end_sequence (); > start_sequence (); > rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, > - op0, op1, target, 0); > + op0, op1, target, 0, OPTAB_WIDEN); > rtx_insn *sgn_insns = get_insns (); > end_sequence (); > - unsigned uns_cost = seq_cost (uns_insns, speed_p); > - unsigned sgn_cost = seq_cost (sgn_insns, speed_p); > > - /* If costs are the same then use as tie breaker the other other > - factor. */ > - if (uns_cost == sgn_cost) > - { > - uns_cost = seq_cost (uns_insns, !speed_p); > - sgn_cost = seq_cost (sgn_insns, !speed_p); > - } > - > - if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) > - { > - emit_insn (uns_insns); > - return uns_ret; > - } > - emit_insn (sgn_insns); > - return sgn_ret; > + /* Do not try to optimize if any of the sequences tried above > + resulted in a funcall. */ > + if (uns_ret && sgn_ret) > + { > + unsigned uns_cost = seq_cost (uns_insns, speed_p); > + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); > + > + /* If costs are the same then use as tie breaker the other > + other factor. */ > + if (uns_cost == sgn_cost) > + { > + uns_cost = seq_cost (uns_insns, !speed_p); > + sgn_cost = seq_cost (sgn_insns, !speed_p); > + } > + > + if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) > + { > + emit_insn (uns_insns); > + return uns_ret; > + } > + emit_insn (sgn_insns); > + return sgn_ret; > + } > } > return expand_divmod (mod_p, code, mode, treeop0, treeop1, > op0, op1, target, unsignedp); > diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > new file mode 100644 > index 00000000000..dffb1506f06 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler-not "__divdi3" } } */ > + > +int > +foo (unsigned int len) > +{ > + return ((unsigned long)len) * 234 / 5; > +} > diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > new file mode 100644 > index 00000000000..41e8e40c35c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler-not "__moddi3" } } */ > + > +int > +foo (unsigned int len) > +{ > + return ((unsigned long)len) * 234 % 5; > +} > -- > 2.30.2 >