On Tue, Oct 15, 2019 at 2:18 AM Peter Bergner <berg...@linux.ibm.com> wrote:
>
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK. If this is not wanted in some cases then there are
> >> options like using a noinline attribute.
> >
> > I agree, and that is what we already do afaik.
>
> I agree on the making sure the caller's ISA supports the callee's ISA
> before allowing inlining...and I think that's what our code it "trying"
> to do.  It just has some bugs.
>
>
> > But in this case, the callee explicitly disables something (-mno-vsx),
> > while the caller has it enabled (all modern cpus have VSX).  If it ends
> > up being inlined, it will get VSX insns generated for it.  Which is what
> > Jiu Fu's patch aims to prevent.
> >
> > So you are saying the GCC policy is that you should use noinline on the
> > callee in such cases?
>
> I don't think sprinkling noinline's around will work given LTO can
> inline random functions from one object file into a function in another
> object file and we have no idea what the options were used for both files.
> I think that would just force us to end up putting nolines on all fuctions
> that might be compiled with LTO.

There's a function attribute for this.

> I think we just need to fix the bug in the current logic when checking
> whether the caller's ISA flags supports the callee's ISA flags. ...and
> for that, I think we just need to add a test that enforces that the
> caller's ISA flags match exactly the callee's flags, for those flags
> that were explicitly set in the callee.  The patch below seems to fix
> the issue (regtesting now).  Does this look like what we want?

I believe this is going to bite you exactly in the case you want the
opposite behavior.  If you have CUs compiled with defaults and
a specialized one with VSX that calls into generic compiled functions
you _do_ want to allow inlining into the VSX enabled routines.  Just
think of LTO, C++ and comdats - you'll get a random comdat entity
at link time for inlining - either from the VSX CU or the non-VSX one.

Richard.

> Peter
>
>
> gcc/
>         * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
>         options.
>
> gcc.testsuite/
>         * gcc.target/powerpc/pr70010.c: New test.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 276975)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -23976,13 +23976,18 @@ rs6000_can_inline_p (tree caller, tree c
>    else
>      {
>        struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> (caller_tree);
> +      HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
>        struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> (callee_tree);
> +      HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
> +      HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
>
> -      /* Callee's options should a subset of the caller's, i.e. a vsx 
> function
> -        can inline an altivec function but a non-vsx function can't inline a
> -        vsx function.  */
> -      if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> -         == callee_opts->x_rs6000_isa_flags)
> +      /* The callee's options must be a subset of the caller's options, i.e.
> +        a vsx function may inline an altivec function, but a non-vsx function
> +        must not inline a vsx function.  However, for those options that the
> +        callee has explicitly set, then we must enforce that the callee's
> +        and caller's options match exactly; see PR70010.  */
> +      if (((caller_isa & callee_isa) == callee_isa)
> +         && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
>         ret = true;
>      }
>
> Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr70010.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c  (working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */
> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */
> +
> +typedef int vec_t __attribute__((vector_size(16)));
> +
> +static vec_t
> +__attribute__((__target__("no-vsx")))
> +vadd_no_vsx (vec_t a, vec_t b)
> +{
> +  return a + b;
> +}
> +
> +vec_t
> +__attribute__((__target__("vsx")))
> +call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> +{
> +  return vadd_no_vsx (x, y) - z;
> +}
>
>
>
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.i
> typedef int vec_t __attribute__((vector_size(16)));
>
> static vec_t
> __attribute__((__target__("no-vsx")))
> vadd_no_vsx (vec_t a, vec_t b)
> {
>   return a + b;
> }
>
> vec_t
> __attribute__((__target__("vsx")))
> call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> {
>   return vadd_no_vsx (x, y) - z;
> }
>
> bergner@pike:~/gcc/BUGS/CPU$ old-gcc -O2 -finline -S pr70010.i
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s
> call_vadd_no_vsx:
>         vadduwm 2,2,3
>         vsubuwm 2,2,4
>         blr
>
> bergner@pike:~/gcc/BUGS/CPU$ 
> /home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc/xgcc 
> -B/home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc -O2 -finline 
> -S pr70010.i
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s
> vadd_no_vsx:
>         vadduwm 2,2,3
>         blr
>         .long 0
>
> call_vadd_no_vsx:
> 0:      addis 2,12,.TOC.-.LCF1@ha
>         addi 2,2,.TOC.-.LCF1@l
>         mflr 0
>         std 0,16(1)
>         stdu 1,-48(1)
>         li 0,32
>         stvx 31,1,0
>         xxlor 63,36,36
>         bl vadd_no_vsx
>         addi 1,1,48
>         li 0,-16
>         vsubuwm 2,2,31
>         lvx 31,1,0
>         ld 0,16(1)
>         mtlr 0
>         blr

Reply via email to