On Tue, Oct 15, 2019 at 2:18 AM Peter Bergner <berg...@linux.ibm.com> wrote:
>
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK.  If this is not wanted in some cases then there are
> >> options like using a noinline attribute.
> >
> > I agree, and that is what we already do afaik.
>
> I agree on making sure the caller's ISA supports the callee's ISA
> before allowing inlining... and I think that's what our code is "trying"
> to do.  It just has some bugs.
>
> > But in this case, the callee explicitly disables something (-mno-vsx),
> > while the caller has it enabled (all modern cpus have VSX).  If it ends
> > up being inlined, it will get VSX insns generated for it.  Which is what
> > Jiu Fu's patch aims to prevent.
> >
> > So you are saying the GCC policy is that you should use noinline on the
> > callee in such cases?
>
> I don't think sprinkling noinlines around will work, given that LTO can
> inline random functions from one object file into a function in another
> object file, and we have no idea what options were used for both files.
> I think that would just force us to end up putting noinlines on all
> functions that might be compiled with LTO.
There's a function attribute for this.

> I think we just need to fix the bug in the current logic when checking
> whether the caller's ISA flags support the callee's ISA flags.  ...and
> for that, I think we just need to add a test that enforces that the
> caller's ISA flags match exactly the callee's flags, for those flags
> that were explicitly set in the callee.  The patch below seems to fix
> the issue (regtesting now).  Does this look like what we want?

I believe this is going to bite you exactly in the case where you want
the opposite behavior.  If you have CUs compiled with defaults and a
specialized one with VSX that calls into generic compiled functions,
you _do_ want to allow inlining into the VSX-enabled routines.  Just
think of LTO, C++ and comdats - you'll get a random comdat entity at
link time for inlining - either from the VSX CU or the non-VSX one.

Richard.

> Peter
>
>
> gcc/
>         * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
>         options.
>
> gcc/testsuite/
>         * gcc.target/powerpc/pr70010.c: New test.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 276975)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -23976,13 +23976,18 @@ rs6000_can_inline_p (tree caller, tree callee)
>    else
>      {
>        struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
> +      HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
>        struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
> +      HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
> +      HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
>
> -      /* Callee's options should a subset of the caller's, i.e. a vsx function
> -        can inline an altivec function but a non-vsx function can't inline a
> -        vsx function.  */
> -      if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> -         == callee_opts->x_rs6000_isa_flags)
> +      /* The callee's options must be a subset of the caller's options, i.e.
> +        a vsx function may inline an altivec function, but a non-vsx function
> +        must not inline a vsx function.  However, for those options that the
> +        callee has explicitly set, then we must enforce that the callee's
> +        and caller's options match exactly; see PR70010.  */
> +      if (((caller_isa & callee_isa) == callee_isa)
> +         && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
>         ret = true;
>      }
>
> Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr70010.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c  (working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */
> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */
> +
> +typedef int vec_t __attribute__((vector_size(16)));
> +
> +static vec_t
> +__attribute__((__target__("no-vsx")))
> +vadd_no_vsx (vec_t a, vec_t b)
> +{
> +  return a + b;
> +}
> +
> +vec_t
> +__attribute__((__target__("vsx")))
> +call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> +{
> +  return vadd_no_vsx (x, y) - z;
> +}
>
>
>
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.i
> typedef int vec_t __attribute__((vector_size(16)));
>
> static vec_t
> __attribute__((__target__("no-vsx")))
> vadd_no_vsx (vec_t a, vec_t b)
> {
>   return a + b;
> }
>
> vec_t
> __attribute__((__target__("vsx")))
> call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> {
>   return vadd_no_vsx (x, y) - z;
> }
>
> bergner@pike:~/gcc/BUGS/CPU$ old-gcc -O2 -finline -S pr70010.i
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s
> call_vadd_no_vsx:
>         vadduwm 2,2,3
>         vsubuwm 2,2,4
>         blr
>
> bergner@pike:~/gcc/BUGS/CPU$ /home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc -O2 -finline -S pr70010.i
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s
> vadd_no_vsx:
>         vadduwm 2,2,3
>         blr
>         .long 0
>
> call_vadd_no_vsx:
> 0:      addis 2,12,.TOC.-.LCF1@ha
>         addi 2,2,.TOC.-.LCF1@l
>         mflr 0
>         std 0,16(1)
>         stdu 1,-48(1)
>         li 0,32
>         stvx 31,1,0
>         xxlor 63,36,36
>         bl vadd_no_vsx
>         addi 1,1,48
>         li 0,-16
>         vsubuwm 2,2,31
>         lvx 31,1,0
>         ld 0,16(1)
>         mtlr 0
>         blr