On 10/14/19 2:57 PM, Segher Boessenkool wrote: > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote: >> The general case should be that if the caller ISA supports the callee one >> then inlining is OK. If this is not wanted in some cases then there are >> options like using a noinline attribute. > > I agree, and that is what we already do afaik.
I agree on the making sure the caller's ISA supports the callee's ISA before allowing inlining...and I think that's what our code it "trying" to do. It just has some bugs. > But in this case, the callee explicitly disables something (-mno-vsx), > while the caller has it enabled (all modern cpus have VSX). If it ends > up being inlined, it will get VSX insns generated for it. Which is what > Jiu Fu's patch aims to prevent. > > So you are saying the GCC policy is that you should use noinline on the > callee in such cases? I don't think sprinkling noinline's around will work given LTO can inline random functions from one object file into a function in another object file and we have no idea what the options were used for both files. I think that would just force us to end up putting nolines on all fuctions that might be compiled with LTO. I think we just need to fix the bug in the current logic when checking whether the caller's ISA flags supports the callee's ISA flags. ...and for that, I think we just need to add a test that enforces that the caller's ISA flags match exactly the callee's flags, for those flags that were explicitly set in the callee. The patch below seems to fix the issue (regtesting now). Does this look like what we want? Peter gcc/ * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit options. gcc.testsuite/ * gcc.target/powerpc/pr70010.c: New test. Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 276975) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -23976,13 +23976,18 @@ rs6000_can_inline_p (tree caller, tree c else { struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); + HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags; struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); + HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags; + HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit; - /* Callee's options should a subset of the caller's, i.e. a vsx function - can inline an altivec function but a non-vsx function can't inline a - vsx function. */ - if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags) - == callee_opts->x_rs6000_isa_flags) + /* The callee's options must be a subset of the caller's options, i.e. + a vsx function may inline an altivec function, but a non-vsx function + must not inline a vsx function. However, for those options that the + callee has explicitly set, then we must enforce that the callee's + and caller's options match exactly; see PR70010. */ + if (((caller_isa & callee_isa) == callee_isa) + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa)) ret = true; } Index: gcc/testsuite/gcc.target/powerpc/pr70010.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr70010.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/pr70010.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -finline" } */ +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */ + +typedef int vec_t __attribute__((vector_size(16))); + +static vec_t +__attribute__((__target__("no-vsx"))) +vadd_no_vsx (vec_t a, vec_t b) +{ + return a + b; +} + +vec_t +__attribute__((__target__("vsx"))) +call_vadd_no_vsx (vec_t x, vec_t y, vec_t z) +{ + return vadd_no_vsx (x, y) - z; +} bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.i typedef int vec_t __attribute__((vector_size(16))); static vec_t __attribute__((__target__("no-vsx"))) vadd_no_vsx (vec_t a, vec_t b) { return a + b; } vec_t __attribute__((__target__("vsx"))) call_vadd_no_vsx (vec_t x, vec_t y, vec_t z) { return vadd_no_vsx (x, y) - z; } bergner@pike:~/gcc/BUGS/CPU$ old-gcc -O2 -finline -S pr70010.i bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s call_vadd_no_vsx: vadduwm 2,2,3 vsubuwm 2,2,4 blr bergner@pike:~/gcc/BUGS/CPU$ /home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc -O2 -finline -S pr70010.i bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s vadd_no_vsx: vadduwm 2,2,3 blr .long 0 call_vadd_no_vsx: 0: addis 2,12,.TOC.-.LCF1@ha addi 2,2,.TOC.-.LCF1@l mflr 0 std 0,16(1) stdu 1,-48(1) li 0,32 stvx 31,1,0 xxlor 63,36,36 bl vadd_no_vsx addi 1,1,48 li 0,-16 vsubuwm 2,2,31 lvx 31,1,0 ld 0,16(1) mtlr 0 blr