Re: [RFC,patch] Linker plugin - extend API for offloading corner case (aka: LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook [GCC PR109128])

2023-05-10 Thread Alan Modra via Gcc-patches
On Thu, May 04, 2023 at 11:02:25AM +, Richard Biener via Binutils wrote:
> So since we expect the linker to use the host-side table, is there a way
> for the plugin to query exactly that (the set of symbols the linker
> uses from the object passed to the plugin)?

That would be possible and relatively easy to implement, but might be
slow.

>  Because if the linker
> uses something from the file but _not_ the host-side offload table
> (-ffunction-sections -fdata-sections), then things would still go
> wrong, right?

> Is there a way to connect both in a way that the linker discards
> either if the other isn't present?

No, or at least I do not want to even think about implementing such a
linker "feature".  The problem is that after you have modified the
global linker symbol table after adding an object's symbols, it is
virtually impossible to restore the state of symbols to what they
would be without that object.  (Yes, we do that sort of thing for
as-needed shared libraries, but the restoration happens immediately
after adding the symbols.  I also regret implementing it the way I
did.)

The patch posted is OK from the linker side of things.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 3/8] [RS6000] rs6000_rtx_costs tidy AND

2021-01-21 Thread Alan Modra via Gcc-patches
Ping.

On Tue, Jan 12, 2021 at 02:01:57PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555754.html
> 
> On Thu, Oct 08, 2020 at 09:27:55AM +1030, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Tidy AND code.
> > Don't avoid recursion on const_int shift count.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index e870ba0039a..bc5e51aa5ce 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21253,6 +21253,7 @@ static bool
> >  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >   int opno ATTRIBUTE_UNUSED, int *total, bool speed)
> >  {
> > +  rtx right;
> >int code = GET_CODE (x);
> >  
> >switch (code)
> > @@ -21430,7 +21431,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >return false;
> >  
> >  case AND:
> > -  if (CONST_INT_P (XEXP (x, 1)))
> > +  *total = COSTS_N_INSNS (1);
> > +  right = XEXP (x, 1);
> > +  if (CONST_INT_P (right))
> > {
> >   rtx left = XEXP (x, 0);
> >   rtx_code left_code = GET_CODE (left);
> > @@ -21439,17 +21442,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >   if ((left_code == ROTATE
> >|| left_code == ASHIFT
> >|| left_code == LSHIFTRT)
> > - && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
> > + && rs6000_is_valid_shift_mask (right, left, mode))
> > {
> > - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > - if (!CONST_INT_P (XEXP (left, 1)))
> > -   *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> > speed);
> > - *total += COSTS_N_INSNS (1);
> > + *total += rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > + *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> >   return true;
> > }
> > }
> > -
> > -  *total = COSTS_N_INSNS (1);
> >return false;
> >  
> >  case IOR:

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 4/8] [RS6000] rs6000_rtx_costs tidy break/return

2021-01-21 Thread Alan Modra via Gcc-patches
Ping.

On Tue, Jan 12, 2021 at 02:02:09PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555755.html
> 
> On Thu, Oct 08, 2020 at 09:27:56AM +1030, Alan Modra wrote:
> > Most cases use "return false" rather than breaking out of the switch.
> > Do so in all cases.
> > 
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Tidy break/return.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index bc5e51aa5ce..383d2901c9f 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21371,7 +21371,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> > *total = rs6000_cost->fp;
> >else
> > *total = rs6000_cost->dmul;
> > -  break;
> > +  return false;
> >  
> >  case DIV:
> >  case MOD:
> > @@ -21539,7 +21539,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >   *total = rs6000_cost->fp;
> >   return false;
> > }
> > -  break;
> > +  return false;
> >  
> >  case NE:
> >  case EQ:
> > @@ -21577,13 +21577,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >   *total = 0;
> >   return true;
> > }
> > -  break;
> > +  return false;
> >  
> >  default:
> > -  break;
> > +  return false;
> >  }
> > -
> > -  return false;
> >  }
> >  
> >  /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 5/8] [RS6000] rs6000_rtx_costs cost IOR

2021-01-21 Thread Alan Modra via Gcc-patches
Ping.

On Tue, Jan 12, 2021 at 02:02:18PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555756.html
> 
> On Thu, Oct 08, 2020 at 09:27:57AM +1030, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rotate_insert_cost): New function.
> > (rs6000_rtx_costs): Cost IOR.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 383d2901c9f..15a806fe307 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21206,6 +21206,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
> >  && get_attr_cannot_copy (insn);
> >  }
> >  
> > +/* Handle rtx_costs for scalar integer rotate and insert insns.  */
> > +
> > +static bool
> > +rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
> > +   int *total)
> > +{
> > +  if (GET_CODE (right) == AND
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
> > +{
> > +  rtx leftop = XEXP (left, 0);
> > +  rtx rightop = XEXP (right, 0);
> > +
> > +  /* rotlsi3_insert_5.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == SImode
> > + && UINTVAL (XEXP (left, 1)) != 0
> > + && UINTVAL (XEXP (right, 1)) != 0
> > + && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
> > +   return true;
> > +  /* rotldi3_insert_6.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == DImode
> > + && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
> > +   return true;
> > +  /* rotldi3_insert_7.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == DImode
> > + && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
> > +   return true;
> > +
> > +  rtx mask = 0;
> > +  rtx shift = leftop;
> > +  rtx_code shift_code = GET_CODE (shift);
> > +  /* rotl3_insert.  */
> > +  if (shift_code == ROTATE
> > + || shift_code == ASHIFT
> > + || shift_code == LSHIFTRT)
> > +   mask = right;
> > +  else
> > +   {
> > + shift = rightop;
> > + shift_code = GET_CODE (shift);
> > + /* rotl3_insert_2.  */
> > + if (shift_code == ROTATE
> > + || shift_code == ASHIFT
> > + || shift_code == LSHIFTRT)
> > +   mask = left;
> > +   }
> > +  if (mask
> > + && CONST_INT_P (XEXP (shift, 1))
> > + && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
> > +   {
> > + *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed);
> > + *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed);
> > + return true;
> > +   }
> > +}
> > +  /* rotl3_insert_3.  */
> > +  if (GET_CODE (right) == ASHIFT
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && (INTVAL (XEXP (right, 1))
> > + == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
> > +{
> > +  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
> > +  *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed);
> > +  return true;
> > +}
> > +  /* rotl3_insert_4.  */
> > +  if (GET_CODE (right) == LSHIFTRT
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && mode == SImode
> > +  && (INTVAL (XEXP (right, 1))
> > + + exact_log2 (-UINTVAL (XEXP (left, 1))) == 32))
> > +{
> > +  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
> > +  *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed);
> > +  return true;
> > +}
> > +  return false;
> > +}
> > +
> >  /* Compute a (partial) cost for rtx X.  Return true if the complete
> > cost has been computed, and false if subexpressions should be
> > scanned.  In either case, *TOTAL contains the cost result.
> > @@ -21253,7 +21338,7 @@ static bool
> >  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >   int opno ATTRIBUTE_UNUSED, int *total, bool speed)
> >  {
> > -  rtx right;
> > +  rtx left, right;
> >int code = GET_CODE (x);
> >  
> >switch (code)
> > @@ -21435,7 +21520,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >right = XEXP (x, 1);
> >if (CONST_INT_P (right))
> > {
> > - rtx left = XEXP (x, 0);
> > + left = XEXP (x, 0);
> >   rtx_code left_code = GET_CODE (left);
> >  
> >   /* rotate-and-mask: 1 insn.  */
> > @@ -21452,9 +21537,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >return false;
> >  
> >  case IOR:
> > -  /* FIXME */
> >*total = COSTS_N_INSNS (1);
> > -  return true;
> > +  left = XEXP (x, 0);
> > +  if (GET_CODE (left) == AND
> > + && CONST_INT_P (XEXP (left, 1)))
> > +   {
> > + right = XEXP (x, 1);
> > + if (rotate_insert_cost (left, right, mode, speed, total))
> > +   return true;
> > +   }
> > +  return false;
> >  
> >  case CLZ:
> >  case XOR:

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 7/8] [RS6000] rs6000_rtx_costs reduce cost for SETs

2021-01-21 Thread Alan Modra via Gcc-patches
Ping.

On Tue, Jan 12, 2021 at 02:02:27PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555758.html
> 
> On Thu, Oct 08, 2020 at 09:27:59AM +1030, Alan Modra wrote:
> > The aim of this patch is to make rtx_costs for SETs closer to
> > insn_cost for SETs.  One visible effect on powerpc code is increased
> > if-conversion.
> > 
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
> > operands.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 76aedbfae6f..d455aa52427 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21684,6 +21684,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> > }
> >return false;
> >  
> > +case SET:
> > +  /* On entry the value in *TOTAL is the number of general purpose
> > +regs being set, multiplied by COSTS_N_INSNS (1).  Handle
> > +costing of set operands specially since in most cases we have
> > +an instruction rather than just a piece of RTL and should
> > +return a cost comparable to insn_cost.  That's a little
> > +complicated because in some cases the cost of SET operands is
> > +non-zero, see point 5 above and cost of PLUS for example, and
> > +in others it is zero, for example for (set (reg) (reg)).
> > +But (set (reg) (reg)) has the same insn_cost as
> > +(set (reg) (plus (reg) (reg))).  Hack around this by
> > +subtracting COSTS_N_INSNS (1) from the operand cost in cases
> > +were we add at least COSTS_N_INSNS (1) for some operation.
> > +However, don't do so for constants.  Constants might cost
> > +more than zero when they require more than one instruction,
> > +and we do want the cost of extra instructions.  */
> > +  {
> > +   rtx_code src_code = GET_CODE (SET_SRC (x));
> > +   if (src_code == CONST_INT
> > +   || src_code == CONST_DOUBLE
> > +   || src_code == CONST_WIDE_INT)
> > + return false;
> > +   int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > +   + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> > +   if (set_cost >= COSTS_N_INSNS (1))
> > + *total += set_cost - COSTS_N_INSNS (1);
> > +   return true;
> > +  }
> > +
> >  default:
> >return false;
> >  }

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 8/8] [RS6000] rs6000_rtx_costs for !speed

2021-01-21 Thread Alan Modra via Gcc-patches
Ping.

On Tue, Jan 12, 2021 at 02:02:36PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555759.html
> 
> On Thu, Oct 08, 2020 at 09:28:00AM +1030, Alan Modra wrote:
> > When optimizing for size we shouldn't be using metrics based on speed
> > or vice-versa.  rtlanal.c:get_full_rtx_cost wants both speed and size
> > metric from rs6000_rtx_costs independent of the global optimize_size.
> > 
> > Note that the patch changes param_simultaneous_prefetches,
> > param_l1_cache_size, param_l1_cache_line_size and param_l2_cache_size,
> > which were previously all set to zero for optimize_size.  I think that
> > was a bug.  Those params are a function of the processor.
> > 
> > * config/rs6000/rs6000.h (rs6000_cost): Don't declare.
> > (struct processor_costs): Move to..
> > * config/rs6000/rs6000.c: ..here.
> > (rs6000_cost): Make static.
> > (rs6000_option_override_internal): Ignore optimize_size when
> > setting up rs6000_cost.
> > (rs6000_insn_cost): Take into account optimize_size here
> > instead.
> > (rs6000_emit_parity): Likewise.
> > (rs6000_rtx_costs): Don't use rs6000_cost when !speed.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index d455aa52427..14ecbad5df4 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -497,7 +497,26 @@ rs6000_store_data_bypass_p (rtx_insn *out_insn, 
> > rtx_insn *in_insn)
> >  
> >  /* Processor costs (relative to an add) */
> >  
> > -const struct processor_costs *rs6000_cost;
> > +struct processor_costs {
> > +  const int mulsi;   /* cost of SImode multiplication.  */
> > +  const int mulsi_const;  /* cost of SImode multiplication by constant.  */
> > +  const int mulsi_const9; /* cost of SImode mult by short constant.  */
> > +  const int muldi;   /* cost of DImode multiplication.  */
> > +  const int divsi;   /* cost of SImode division.  */
> > +  const int divdi;   /* cost of DImode division.  */
> > +  const int fp;  /* cost of simple SFmode and DFmode insns.  */
> > +  const int dmul;/* cost of DFmode multiplication (and fmadd).  */
> > +  const int sdiv;/* cost of SFmode division (fdivs).  */
> > +  const int ddiv;/* cost of DFmode division (fdiv).  */
> > +  const int cache_line_size;/* cache line size in bytes. */
> > +  const int l1_cache_size; /* size of l1 cache, in kilobytes.  */
> > +  const int l2_cache_size; /* size of l2 cache, in kilobytes.  */
> > +  const int simultaneous_prefetches; /* number of parallel prefetch
> > +   operations.  */
> > +  const int sfdf_convert;  /* cost of SF->DF conversion.  */
> > +};
> > +
> > +static const struct processor_costs *rs6000_cost;
> >  
> >  /* Instruction size costs on 32bit processors.  */
> >  static const
> > @@ -4618,131 +4637,128 @@ rs6000_option_override_internal (bool 
> > global_init_p)
> >  }
> >  
> >/* Initialize rs6000_cost with the appropriate target costs.  */
> > -  if (optimize_size)
> > -rs6000_cost = TARGET_POWERPC64 ? &size64_cost : &size32_cost;
> > -  else
> > -switch (rs6000_tune)
> > -  {
> > -  case PROCESSOR_RS64A:
> > -   rs6000_cost = &rs64a_cost;
> > -   break;
> > +  switch (rs6000_tune)
> > +{
> > +case PROCESSOR_RS64A:
> > +  rs6000_cost = &rs64a_cost;
> > +  break;
> >  
> > -  case PROCESSOR_MPCCORE:
> > -   rs6000_cost = &mpccore_cost;
> > -   break;
> > +case PROCESSOR_MPCCORE:
> > +  rs6000_cost = &mpccore_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC403:
> > -   rs6000_cost = &ppc403_cost;
> > -   break;
> > +case PROCESSOR_PPC403:
> > +  rs6000_cost = &ppc403_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC405:
> > -   rs6000_cost = &ppc405_cost;
> > -   break;
> > +case PROCESSOR_PPC405:
> > +  rs6000_cost = &ppc405_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC440:
> > -   rs6000_cost = &ppc440_cost;
> > -   break;
> > +case PROCESSOR_PPC440:
> > +  rs6000_cost = &ppc440_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC476:
> > -   rs6000_cost = &ppc476_cost;
> > -   break;
> > +case PROCESSOR_PPC476:
> > +  rs6000_cost = &ppc476_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC601:
> > -   rs6000_cost = &ppc601_cost;
> > -   break;
> > +case PROCESSOR_PPC601:
> > +  rs6000_cost = &ppc601_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC603:
> > -   rs6000_cost = &ppc603_cost;
> > -   break;
> > +case PROCESSOR_PPC603:
> > +  rs6000_cost = &ppc603_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC604:
> > -   rs6000_cost = &ppc604_cost;
> > -   break;
> > +case PROCESSOR_PPC604:
> > +  rs6000_cost = &ppc604_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC604e:
> > -   rs6000_cost = &ppc604e_cost;
> > -   break;
> > +case PROCESSOR_PPC604e:
> > +  rs6000_cost = &ppc6

Re: [RS6000] Adjust testcases for power10 instructions V3

2021-01-21 Thread Alan Modra via Gcc-patches
Ping.

On Tue, Jan 12, 2021 at 02:03:18PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557587.html
> 
> On Fri, Oct 30, 2020 at 07:00:14PM +1030, Alan Modra wrote:
> > And now waking up to what you meant by the lvsl-lvsr.c \s comment,
> > plus a revised ppc-ne0-1.c scan-assembler.
> > 
> > I think this covers all previous review corrections.  Regression tested
> > powerpc64-linux power7 and powerpc64le-linux power10.  OK?
> > 
> > * lib/target-supports.exp (check_effective_target_has_arch_pwr10): New.
> > * gcc.dg/pr56727-2.c,
> > gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-char.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-double.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-float.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-int.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-short.c,
> > gcc.target/powerpc/fold-vec-splat-floatdouble.c,
> > gcc.target/powerpc/fold-vec-splat-longlong.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-char.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-double.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-float.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-int.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-short.c,
> > gcc.target/powerpc/lvsl-lvsr.c,
> > gcc.target/powerpc/ppc-eq0-1.c,
> > gcc.target/powerpc/ppc-ne0-1.c,
> > gcc.target/powerpc/pr86731-fwrapv-longlong.c: Match power10 insns.
> > * gcc.target/powerpc/lvsl-lvsr.c: Avoid file name match.
> > 
> > diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c 
> > b/gcc/testsuite/gcc.dg/pr56727-2.c
> > index c54369ed25e..f055116772a 100644
> > --- a/gcc/testsuite/gcc.dg/pr56727-2.c
> > +++ b/gcc/testsuite/gcc.dg/pr56727-2.c
> > @@ -18,4 +18,4 @@ void h ()
> >  
> >  /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } 
> > } } */
> >  /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* 
> > && ilp32 } } } } */
> > -/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { 
> > powerpc*-*-linux* && lp64 } } } } */
> > +/* { dg-final { scan-assembler {bl f(\n\s*nop|@notoc\n)} { target { 
> > powerpc*-*-linux* && lp64 } } } } */
> > diff --git 
> > a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c 
> > b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
> > index 246f38fa6d1..1cff4550f28 100644
> > --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
> > @@ -25,6 +25,6 @@ main1 (void)
> > with no word loads (lw, lwu, lwz, lwzu, or their indexed forms)
> > or word stores (stw, stwu, stwx, stwux, or their indexed forms).  */
> >  
> > -/* { dg-final { scan-assembler "\t(lvx|lxv|lvsr|stxv)" } } */
> > +/* { dg-final { scan-assembler "\t(lvx|p?lxv|lvsr|p?stxv)" } } */
> >  /* { dg-final { scan-assembler-not "\tlwz?u?x? " { xfail { 
> > powerpc-ibm-aix* } } } } */
> >  /* { dg-final { scan-assembler-not "\tstwu?x? " } } */
> > diff --git 
> > a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c 
> > b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
> > index 9b199c219bf..104710700c8 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-cha

Re: [PATCH 3/8] [RS6000] rs6000_rtx_costs tidy AND

2021-01-31 Thread Alan Modra via Gcc-patches
On Mon, Jan 25, 2021 at 04:37:21PM -0600, Segher Boessenkool wrote:
> I still do not see what this improves, I only see possible obvious
> regressions :-(

You asked me to break the patch series into small pieces, for ease of
review and to separate tidies from functional changes.  Well OK, fair
enough.  This is one of the tidies.  The idea is to make
rs6000_rtx_costs a little more self-consistent, so that someone looking
at this code in the future doesn't wonder why AND was treated
differently from other operations.

The only part of this patch that I can imagine you see as a possible
regression is the "Don't avoid recursion on const_int shift count"
part.  That is there only because you wanted it that way in new code.
I think you said something about premature optimisation when I made
the new code special case const_int and reg to stop recursion, like
AND.  So for consistency I made the change in old code too.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 5/8] [RS6000] rs6000_rtx_costs cost IOR

2021-01-31 Thread Alan Modra via Gcc-patches
On Mon, Jan 25, 2021 at 04:51:43PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Oct 08, 2020 at 09:27:57AM +1030, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rotate_insert_cost): New function.
> > (rs6000_rtx_costs): Cost IOR.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 383d2901c9f..15a806fe307 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21206,6 +21206,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
> >  && get_attr_cannot_copy (insn);
> >  }
> >  
> > +/* Handle rtx_costs for scalar integer rotate and insert insns.  */
> 
> You need to document here what the return value means, and what the
> preconditions for "left" (and "right") are.

Done, and I moved the preconditions on "left" into the new function.

> > +static bool
> > +rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
> > +   int *total)
> > +{
> > +  if (GET_CODE (right) == AND
> 
> ... because you never check the CODE of "left".
> 
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
> 
> HOST_WIDE_INT is always exactly 64 bits now, so you could do "== -1".

Yes, but this is exactly the way the expression occurs in rotl*_insert*
instruction patterns.  I think it's better to keep them the same.

> > +{
> > +  rtx leftop = XEXP (left, 0);
> > +  rtx rightop = XEXP (right, 0);
> > +
> > +  /* rotlsi3_insert_5.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == SImode
> > + && UINTVAL (XEXP (left, 1)) != 0
> > + && UINTVAL (XEXP (right, 1)) != 0
> > + && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
> > +   return true;
> 
> Empty line after return please.

Done, here and elsewhere.

> > +  /* rotldi3_insert_6.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == DImode
> > + && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
> > +   return true;
> > +  /* rotldi3_insert_7.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == DImode
> > + && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
> > +   return true;
> 
> Those could just use rs6000_is_valid_mask as well?

This is taken straight from rotldi3_insert_7, so it really ought to
stay that way.

> 
> Please wait this until stage 1.  Sorry.

OK, I'll leave all the rs6000_rtx_costs changes until then.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]

2021-02-02 Thread Alan Modra via Gcc-patches
On Fri, Jan 29, 2021 at 11:11:23AM +0800, HAO CHEN GUI via Gcc-patches wrote:
>    This patch tries to optimize PowerPC 64 bit constant generation when the
> constant can be transformed from a 32 bit or 16 bit constant by rotating,
> shifting and mask AND.

All and more of what you are doing here for rotated 16-bit constants
is covered by
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html

That patch is still waiting on review.  Hmm, I see my local copy of
that patch has one extra line in
gcc/testsuite/gcc.target/powerpc/rot_cst2.c
+/* { dg-additional-options "-mno-prefixed" { target { lp64 } } } */
in order to keep scan-assembler-times counts correct for power10.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 3/8] intl: always picify

2021-02-10 Thread Alan Modra via Gcc-patches
On Mon, Feb 08, 2021 at 11:16:30AM +, Nick Alcock via Binutils wrote:
> intl/ChangeLog
> 2021-02-02  Nick Alcock  
> 
>   * aclocal.m4: include picflag.m4.
>   * configure.ac (PICFLAG): Add and substitute.
>   * Makefile.in (PICFLAG): New.
>   (COMPILE): Use it.
>   * configure: Regenerate.

OK for binutils.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 4/8] intl: turn LIBINTL into -L / -l form

2021-02-10 Thread Alan Modra via Gcc-patches
On Mon, Feb 08, 2021 at 11:16:31AM +, Nick Alcock via Binutils wrote:
> intl/ChangeLog
> 2021-02-04  Nick Alcock  
> 
>   * configure.ac (LIBINTL): Transform into -L/-lintl form.
>   * configure: Regenerate.

OK for binutils.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Add -fgnu-retain/-fno-gnu-retain

2021-02-18 Thread Alan Modra via Gcc-patches
On Wed, Feb 17, 2021 at 11:23:12AM +, Jozef Lawrynowicz wrote:
> In previous discussions, it seemed to me that there was general support
> for the "used" attribute implying SHF_GNU_RETAIN and saving symbols from
> linker garbage collection[1]. Of course, the current implementation
> results in undesirable behavior - the thought that all linker scripts
> not supporting uniquely named sections would need to be updated is quite
> alarming.
> 
> It's a shame that all this extra complication is required, just because
> we cannot have a ".retain <symbol>" directive.

Is that true?  Isn't the problem really that retained sections are
named as if -ffunction-sections/-fdata-sections were in force?  And
that is due to wanting separate sections so that garbage collection
works, since SHF_GNU_RETAIN is all about linker garbage collection.

I don't see how having a ".retain <symbol>" would help much.

> My preferred vision for this functionality was:
>   - SHF_GNU_RETAIN section flag indicates a section should be saved
> from linker garbage collection.
>   - ".retain " assembler directive applies SHF_GNU_RETAIN
> to the section containing .
>   - GCC "used" attribute emits a ".retain " directive for
> the symbol declaration is is applied to.  Applying the "used"
> attribute to a symbol declaration does not change the structure of
> the object file, beyond applying SHF_GNU_RETAIN to the section
> containing that symbol.

That description seems to say that a ".retain foo" would mean
everything in foo's section is kept.  If foo's section was the usual
.data, you've kept virtually everything from garbage collection.
Surely you don't expect ".retain foo" to create a separate .data
section for foo?  If you do, I'm strongly against that idea.

Note that gas indeed supports multiple sections named .data that can
serve the same purpose as -fdata-sections.  See the gas doc for the
optional .section field "unique".  That might be the best way to avoid
an under-the-hood -ffunction-sections/-fdata-sections.
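
A minimal sketch of what that looks like (illustrative only: the
section names, symbols and values below are made up, and the "unique"
field needs a new enough gas):

    .section .data,"aw",@progbits,unique,1
keep_me:
    .long 1
    .section .data,"aw",@progbits,unique,2
collect_me:
    .long 2

Each of these ends up as a separate .data input section in the object
file, so the linker can garbage collect them individually.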

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] rtx_costs

2020-09-14 Thread Alan Modra via Gcc-patches
This patch series fixes a number of issues in rs6000_rtx_costs, the
aim being to provide costing somewhat closer to reality.  Probably the
most important patch of the series is patch 4, which just adds a
comment.  Without the analysis that went into that comment, I found
myself making what seemed to be good changes but which introduced
regressions.

So far these changes have not introduced any testsuite regressions
in --with-cpu=power8 and --with-cpu=power9 all-language bootstraps on
powerpc64le-linux.  Pat spec tested on power9 against a baseline
master from a few months ago, seeing a few small improvements and no
degradations above the noise.

Some notes:

Examination of varasm.o shows quite a number of cases where
if-conversion succeeds due to different seq_cost.  One example:

extern int foo ();
int
default_assemble_integer (unsigned size)
{
  extern unsigned long rs6000_isa_flags;

  if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8))
return 0;
  return foo ();
}

This rather horrible code turns the rs6000_isa_flags value into either
4 or 8:
rldicr 9,9,28,0
srdi 9,9,28
addic 9,9,-1
subfe 9,9,9
rldicr 9,9,0,61
addi 9,9,8
Better would be
rldicl 9,9,29,63
sldi 9,9,2
addi 9,9,4

There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60",
and "li r31,0x4000; rotldi r31,r31,17" vs.
"lis r31,0x8000; clrldi r31,r31,32".
Neither of these is a real change.  I saw one occurrence of a 5 insn
sequence being replaced with a load from memory in
default_function_rodata_section, for ".rodata", and others elsewhere.

Sometimes correct insn cost leads to unexpected results.  For
example:

extern unsigned bar (void);
unsigned
f1 (unsigned a)
{
  if ((a & 0x01000200) == 0x01000200)
return bar ();
  return 0;
}

emits for a & 0x01000200
 (set (reg) (and (reg) (const_int 0x01000200)))
at expand time (two rlwinm insns) rather than the older
 (set (reg) (const_int 0x01000200))
 (set (reg) (and (reg) (reg)))
which is three insns.  However, since 0x01000200 is needed later, the
older code is smaller after optimisation.


[RS6000] Count rldimi constant insns

2020-09-14 Thread Alan Modra via Gcc-patches
rldimi is generated by rs6000_emit_set_long_const when the high and
low 32 bits of a 64-bit constant are equal.
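
For illustration, take an assumed constant (not one from the patch)
whose high and low 32-bit halves are equal:

    lis    3,0x1234        # r3 = 0x12340000
    ori    3,3,0x5678      # r3 = 0x12345678
    rldimi 3,3,32,0        # copy the low 32 bits into the high 32 bits

Counting this as 3 insns, rather than the 5 the old low/high split
gives, matches what rs6000_emit_set_long_const actually emits.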

* config/rs6000/rs6000.c (num_insns_constant_gpr): Count rldimi
constants correctly.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 20a4ba382bc..6fc86f8185a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5731,7 +5731,7 @@ direct_return (void)
 
 /* Helper for num_insns_constant.  Calculate number of instructions to
load VALUE to a single gpr using combinations of addi, addis, ori,
-   oris and sldi instructions.  */
+   oris, sldi and rldimi instructions.  */
 
 static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
@@ -5759,7 +5759,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
 
   high >>= 1;
 
-  if (low == 0)
+  if (low == 0 || low == high)
return num_insns_constant_gpr (high) + 1;
   else if (high == 0)
return num_insns_constant_gpr (low) + 1;


[RS6000] rs6000_rtx_costs cost IOR

2020-09-14 Thread Alan Modra via Gcc-patches
* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8c300b82b11..fb5fe7969a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21177,6 +21177,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21367,16 +21368,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case AND:
-  if (CONST_INT_P (XEXP (x, 1)))
+  right = XEXP (x, 1);
+  if (CONST_INT_P (right))
{
- rtx left = XEXP (x, 0);
+ left = XEXP (x, 0);
  rtx_code left_code = GET_CODE (left);
 
  /* rotate-and-mask: 1 insn.  */
  if ((left_code == ROTATE
   || left_code == ASHIFT
   || left_code == LSHIFTRT)
- && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+ && rs6000_is_valid_shift_mask (right, left, mode))
{
  *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
  if (!CONST_INT_P (XEXP (left, 1)))
@@ -21390,9 +21392,106 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case IOR:
-  /* FIXME */
   *total = COSTS_N_INSNS (1);
-  return true;
+  left = XEXP (x, 0);
+  if (GET_CODE (left) == AND
+ && CONST_INT_P (XEXP (left, 1)))
+   {
+ right = XEXP (x, 1);
+ if (GET_CODE (right) == AND
+ && CONST_INT_P (XEXP (right, 1))
+ && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+   {
+ rtx leftop = XEXP (left, 0);
+ rtx rightop = XEXP (right, 0);
+
+ // rotlsi3_insert_5
+ if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == SImode
+ && UINTVAL (XEXP (left, 1)) != 0
+ && UINTVAL (XEXP (right, 1)) != 0
+ && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+   return true;
+ // rotldi3_insert_6
+ if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+   return true;
+ // rotldi3_insert_7
+ if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+   return true;
+
+ rtx mask = 0;
+ rtx shift = leftop;
+ rtx_code shift_code = GET_CODE (shift);
+ // rotl3_insert
+ if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = right;
+ else
+   {
+ shift = rightop;
+ shift_code = GET_CODE (shift);
+ // rotl3_insert_2
+ if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = left;
+   }
+ if (mask
+ && CONST_INT_P (XEXP (shift, 1))
+ && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+   {
+ /* Test both regs even though the one in the mask is
+constrained to be equal to the output.  Increasing
+cost may well result in rejecting an invalid insn
+earlier.  */
+ rtx reg_op = XEXP (shift, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, shift_code, 0, speed);
+ reg_op = XEXP (mask, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, AND, 0, speed);
+ return true;
+   }
+   }
+ // rotl3_insert_3
+ if (GET_CODE (right) == ASHIFT
+ && CONST_INT_P (XEXP (right, 1))
+ && (INTVAL (XEXP (right, 1))
+ == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+   {
+ rtx reg_op = XEXP (left, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, AND, 0, speed);
+ reg_op = XEXP (right, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, ASHIFT, 0, speed);
+ return true;
+   }
+ // rotl3_insert_4
+ if (GET_CODE (right) == LSHIFTRT
+ && CONST_INT_P (XEXP (right, 1))
+ && mode == SImode
+ && (INTVAL (XEXP (right, 1))
+ + exact_log2 (-UINTVAL (XEXP (left, 1))) == 32))
+   {
+ rtx reg_op 

[RS6000] rotate and mask constants

2020-09-14 Thread Alan Modra via Gcc-patches
Implement more two insn constants.
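
For example (an assumed constant, for illustration only), one of the
new forms is a 15-bit li value rotated into place:

    li     3,0x4000        # r3 = 0x0000000000004000
    rotldi 3,3,26          # r3 = 0x0000010000000000

rotldi is just rldicl with a mask begin of 0, ie. the li; rldicl form
detected below.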

PR 94393
* config/rs6000/rs6000.c (rotate_and_mask_constant): New function.
(num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 86c90c4d756..1848cb57ef8 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1112,6 +1112,8 @@ static tree rs6000_handle_altivec_attribute (tree *, 
tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static bool rotate_and_mask_constant (unsigned HOST_WIDE_INT, HOST_WIDE_INT *,
+ int *, unsigned HOST_WIDE_INT *);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
@@ -5789,7 +5791,8 @@ num_insns_constant_multi (HOST_WIDE_INT value, 
machine_mode mode)
  /* We won't get more than 2 from num_insns_constant_gpr
 except when TARGET_POWERPC64 and mode is DImode or
 wider, so the register mode must be DImode.  */
- && rs6000_is_valid_and_mask (GEN_INT (low), DImode))
+ && (rs6000_is_valid_and_mask (GEN_INT (low), DImode)
+ || rotate_and_mask_constant (low, NULL, NULL, NULL)))
insns = 2;
   total += insns;
   /* If BITS_PER_WORD is the number of bits in HOST_WIDE_INT, doing
@@ -9420,6 +9423,82 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Detect cases where a constant can be formed by li; rldicl, li; rldicr,
+   or lis; rldicl.  */
+
+static bool
+rotate_and_mask_constant (unsigned HOST_WIDE_INT c,
+ HOST_WIDE_INT *val, int *shift,
+ unsigned HOST_WIDE_INT *mask)
+{
+  /* We know C can't be formed by lis,addi so that puts constraints
+ on the max leading zeros.  lead_zeros won't be larger than
+ HOST_BITS_PER_WIDE_INT - 31.  */
+  int lead_zeros = wi::clz (c);
+  int non_zero = HOST_BITS_PER_WIDE_INT - lead_zeros;
+  /* 00...01xx0..00 (up to 14 x's, any number of leading
+ and trailing 0's) can be implemented as a li, rldicl.  */
+  if ((c & ~(HOST_WIDE_INT_UC (0x7fff) << (non_zero - 15))) == 0)
+{
+  /* eg. c = 1100   ... 
+-> val = 0x3000, shift = 49, mask = -1ull.  */
+  if (val)
+   {
+ *val = c >> (non_zero - 15);
+ *shift = non_zero - 15;
+ *mask = HOST_WIDE_INT_M1U;
+   }
+  return true;
+}
+  /* 00...01xxx1..11 (up to 15 x's, any number of leading
+ 0's and trailing 1's) can be implemented as a li, rldicl.  */
+  if ((c | (HOST_WIDE_INT_M1U << (non_zero - 16))) == HOST_WIDE_INT_M1U)
+{
+  /* eg. c =  1011   ... 
+-> val = sext(0xbfff), shift = 44, mask = 0x0fff.  */
+  if (val)
+   {
+ *val = (((c >> (non_zero - 16)) & 0x) ^ 0x8000) - 0x8000;
+ *shift = non_zero - 16;
+ *mask = HOST_WIDE_INT_M1U >> lead_zeros;
+   }
+  return true;
+}
+  /* 00...01xxx00..01..11 (up to 15 x's followed by 16 0's,
+ any number of leading 0's and trailing 1's) can be implemented as
+ lis, rldicl.  */
+  if (non_zero >= 32
+  && (c & ((HOST_WIDE_INT_1U << (non_zero - 16))
+  - (HOST_WIDE_INT_1U << (non_zero - 32)))) == 0
+  && (c | (HOST_WIDE_INT_M1U << (non_zero - 32))) == HOST_WIDE_INT_M1U)
+{
+  if (val)
+   {
+ *val = (((c >> (non_zero - 32)) & 0x)
+ ^ 0x8000) - 0x8000;
+ *shift = non_zero - 32;
+ *mask = HOST_WIDE_INT_M1U >> lead_zeros;
+   }
+  return true;
+}
+  /* 11..1xxx0..0 (up to 15 x's, any number of leading 1's
+ and trailing 0's) can be implemented as a li, rldicr.  */
+  int trail_zeros = wi::ctz (c);
+  if (trail_zeros >= 48
+  || ((c | ((HOST_WIDE_INT_1U << (trail_zeros + 15)) - 1))
+ == HOST_WIDE_INT_M1U))
+{
+  if (val)
+   {
+ *val = (((c >> trail_zeros) & 0x) ^ 0x8000) - 0x8000;
+ *shift = trail_zeros;
+ *mask = HOST_WIDE_INT_M1U << trail_zeros;
+   }
+  return true;
+}
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -9429,6 +9508,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
+  HOST_WIDE_INT val = c;
+  int shift;
+  unsigned HOST_WIDE_INT mask;
 
   ud1 = c & 0x;
   c = c >> 16;
@@ -9454,6 +9536,15 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_I

[RS6000] rs6000_rtx_costs for AND

2020-09-14 Thread Alan Modra via Gcc-patches
The existing "case AND" in this function is not sufficient for
optabs.c:avoid_expensive_constant usage, where the AND is passed in
outer_code.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for
AND to CONST_INT case.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 32044d33977..523d029800a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21150,16 +21150,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
|| outer_code == MINUS)
   && (satisfies_constraint_I (x)
   || satisfies_constraint_L (x)))
- || (outer_code == AND
- && (satisfies_constraint_K (x)
- || (mode == SImode
- ? satisfies_constraint_L (x)
- : satisfies_constraint_J (x
- || ((outer_code == IOR || outer_code == XOR)
+ || ((outer_code == AND || outer_code == IOR || outer_code == XOR)
  && (satisfies_constraint_K (x)
  || (mode == SImode
  ? satisfies_constraint_L (x)
  : satisfies_constraint_J (x
+ || (outer_code == AND
+ && rs6000_is_valid_and_mask (x, mode))
  || outer_code == ASHIFT
  || outer_code == ASHIFTRT
  || outer_code == LSHIFTRT
@@ -21196,7 +21193,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
|| outer_code == IOR
|| outer_code == XOR)
   && (INTVAL (x)
-  & ~ (unsigned HOST_WIDE_INT) 0x) == 0))
+  & ~ (unsigned HOST_WIDE_INT) 0x) == 0)
+  || (outer_code == AND
+  && rs6000_is_valid_2insn_and (x, mode)))
{
  *total = COSTS_N_INSNS (1);
  return true;
@@ -21334,26 +21333,6 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total += COSTS_N_INSNS (1);
  return true;
}
-
- /* rotate-and-mask (no rotate), andi., andis.: 1 insn.  */
- HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
- if (rs6000_is_valid_and_mask (XEXP (x, 1), mode)
- || (val & 0x) == val
- || (val & 0x) == val
- || ((val & 0x) == 0 && mode == SImode))
-   {
- *total = rtx_cost (left, mode, AND, 0, speed);
- *total += COSTS_N_INSNS (1);
- return true;
-   }
-
- /* 2 insns.  */
- if (rs6000_is_valid_2insn_and (XEXP (x, 1), mode))
-   {
- *total = rtx_cost (left, mode, AND, 0, speed);
- *total += COSTS_N_INSNS (2);
- return true;
-   }
}
 
   *total = COSTS_N_INSNS (1);


[RS6000] rs6000_rtx_costs multi-insn constants

2020-09-14 Thread Alan Modra via Gcc-patches
This small patch to rs6000_rtx_costs considerably improves code
generated for large constants in 64-bit code, teaching gcc that it is
better to load a constant from memory than to generate a sequence of
up to five dependent instructions.  Note that the rs6000 backend does
generate large constants as loads from memory at expand time but
optimisation passes replace them with SETs of the value due to not
having correct costs.
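
As a rough illustration (the constant and both sequences here are
assumed, not output from this patch), a general 64-bit constant such
as 0x123456789abcdef0 takes five dependent instructions to build:

    lis  9,0x1234
    ori  9,9,0x5678
    sldi 9,9,32
    oris 9,9,0x9abc
    ori  9,9,0xdef0

whereas a constant-pool/TOC load is typically two instructions under
the medium code model (addis; ld), which correct multi-insn costing
lets the optimisation passes prefer.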

PR 94393
* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn
constants.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5b3c0ee0e8c..8c300b82b11 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21181,7 +21181,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
   switch (code)
 {
-  /* On the RS/6000, if it is valid in the insn, it is free.  */
+  /* (reg) is costed at zero by rtlanal.c:rtx_cost.  That sets a
+baseline for rtx costs:  If a constant is valid in an insn,
+it is free.  */
 case CONST_INT:
   if (((outer_code == SET
|| outer_code == PLUS
@@ -21242,6 +21244,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case CONST_DOUBLE:
 case CONST_WIDE_INT:
+  /* Subtract one insn here for consistency with the above code
+that returns one less than the actual number of insns for
+SETs.  Don't subtract one for other than SETs, because other
+operations will require the constant to be loaded to a
+register before performing the operation.  All special cases
+for codes other than SET must be handled before we get
+here.  */
+  *total = COSTS_N_INSNS (num_insns_constant (x, mode)
+ - (outer_code == SET ? 1 : 0));
+  return true;
+
 case CONST:
 case HIGH:
 case SYMBOL_REF:


[RS6000] rs6000_rtx_costs comment

2020-09-14 Thread Alan Modra via Gcc-patches
Prior patches in this series were small bug fixes.  This lays out the
ground rules for following patches.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Expand comment.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 523d029800a..5b3c0ee0e8c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21133,7 +21133,45 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
-   scanned.  In either case, *TOTAL contains the cost result.  */
+   scanned.  In either case, *TOTAL contains the cost result.
+
+   1) Calls from places like optabs.c:avoid_expensive_constant will
+   come here with OUTER_CODE set to an operation such as AND with X
+   being a CONST_INT or other CONSTANT_P type.  This will be compared
+   against set_src_cost, where we'll come here with OUTER_CODE as SET
+   and X the same constant.
+
+   2) Calls from places like combine:distribute_and_simplify_rtx are
+   asking whether a possibly quite complex SET_SRC can be implemented
+   more cheaply than some other logically equivalent SET_SRC.
+
+   3) Calls from places like default_noce_conversion_profitable_p will
+   come here via seq_cost and pass the pattern of a SET insn in X.
+   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
+   will next see the SET_SRC.  The overall cost should be comparable
+   to rs6000_insn_cost since the code is comparing one insn sequence
+   (some of which may be costed by insn_cost) against another insn
+   sequence.
+
+   4) Calls from places like cprop.c:try_replace_reg will come here
+   with OUTER_CODE as INSN, and X either a valid pattern of a SET or
+   one where some registers have been replaced with constants.  The
+   replacements may make the SET invalid, for example if
+ (set (reg1) (and (reg2) (const_int 0xfff)))
+   replaces reg2 as
+ (set (reg1) (and (symbol_ref) (const_int 0xfff)))
+   then the replacement can't be implemented in one instruction and
+   really the cost should be higher by one instruction.  However,
+   the cost for invalid insns doesn't matter much except that a
+   higher cost may lead to their rejection earlier.
+
+   5) fwprop.c:should_replace_address puts yet another wrinkle on this
+   function, where we prefer an address calculation that is more
+   complex yet has the same address_cost.  In this case "more
+   complex" is determined by having a higher set_src_cost.  So for
+   example, if we want a plain (reg) address to be replaced with
+   (plus (reg) (const)) when possible then PLUS needs to cost more
+   than zero here.  */
 
 static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,


[RS6000] rs6000_rtx_costs reduce cost for SETs

2020-09-14 Thread Alan Modra via Gcc-patches
Also use rs6000_cost only for speed.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
when insn operation cost handled on recursive call.  Only use
rs6000_cost for speed.  Tidy break/return.  Tidy AND costing.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fb5fe7969a3..86c90c4d756 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21277,15 +21277,19 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case PLUS:
 case MINUS:
-  if (FLOAT_MODE_P (mode))
+  if (speed && FLOAT_MODE_P (mode))
*total = rs6000_cost->fp;
   else
*total = COSTS_N_INSNS (1);
   return false;
 
 case MULT:
-  if (CONST_INT_P (XEXP (x, 1))
- && satisfies_constraint_I (XEXP (x, 1)))
+  if (!speed)
+   /* A little more than one insn so that nothing is tempted to
+  turn a shift left into a multiply.  */
+   *total = COSTS_N_INSNS (1) + 1;
+  else if (CONST_INT_P (XEXP (x, 1))
+  && satisfies_constraint_I (XEXP (x, 1)))
{
  if (INTVAL (XEXP (x, 1)) >= -256
  && INTVAL (XEXP (x, 1)) <= 255)
@@ -21304,18 +21308,22 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case FMA:
-  if (mode == SFmode)
+  if (!speed)
+   *total = COSTS_N_INSNS (1) + 1;
+  else if (mode == SFmode)
*total = rs6000_cost->fp;
   else
*total = rs6000_cost->dmul;
-  break;
+  return false;
 
 case DIV:
 case MOD:
   if (FLOAT_MODE_P (mode))
{
- *total = mode == DFmode ? rs6000_cost->ddiv
- : rs6000_cost->sdiv;
+ if (!speed)
+   *total = COSTS_N_INSNS (1) + 2;
+ else
+   *total = mode == DFmode ? rs6000_cost->ddiv : rs6000_cost->sdiv;
  return false;
}
   /* FALLTHRU */
@@ -21334,7 +21342,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
}
   else
{
- if (GET_MODE (XEXP (x, 1)) == DImode)
+ if (!speed)
+   *total = COSTS_N_INSNS (1) + 2;
+ else if (GET_MODE (XEXP (x, 1)) == DImode)
*total = rs6000_cost->divdi;
  else
*total = rs6000_cost->divsi;
@@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case AND:
+  *total = COSTS_N_INSNS (1);
   right = XEXP (x, 1);
   if (CONST_INT_P (right))
{
@@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   || left_code == LSHIFTRT)
  && rs6000_is_valid_shift_mask (right, left, mode))
{
- *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
- if (!CONST_INT_P (XEXP (left, 1)))
-   *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
speed);
- *total += COSTS_N_INSNS (1);
+ rtx reg_op = XEXP (left, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, left_code, 0, speed);
+ reg_op = XEXP (left, 1);
+ if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+   *total += rtx_cost (reg_op, mode, left_code, 1, speed);
  return true;
}
}
-
-  *total = COSTS_N_INSNS (1);
   return false;
 
 case IOR:
@@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   if (outer_code == TRUNCATE
  && GET_CODE (XEXP (x, 0)) == MULT)
{
- if (mode == DImode)
+ if (!speed)
+   *total = COSTS_N_INSNS (1) + 1;
+ else if (mode == DImode)
*total = rs6000_cost->muldi;
  else
*total = rs6000_cost->mulsi;
@@ -21554,11 +21567,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 case FIX:
 case UNSIGNED_FIX:
 case FLOAT_TRUNCATE:
-  *total = rs6000_cost->fp;
+  if (!speed)
+   *total = COSTS_N_INSNS (1);
+  else
+   *total = rs6000_cost->fp;
   return false;
 
 case FLOAT_EXTEND:
-  if (mode == DFmode)
+  if (!speed)
+   *total = COSTS_N_INSNS (1);
+  else if (mode == DFmode)
*total = rs6000_cost->sfdf_convert;
   else
*total = rs6000_cost->fp;
@@ -21576,7 +21594,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total = rs6000_cost->fp;
  return false;
}
-  break;
+  return false;
 
 case NE:
 case EQ:
@@ -21614,13 +21632,32 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total = 0;
  return true;
}
-  break;
+  return false;
+
+case SET:
+  /* The default cost of a SET is the number of general purpose
+regs being set multiplied by COSTS_N_INSNS (1).  That only
+works where the increme

[RS6000] rs6000_rtx_costs for PLUS/MINUS constant

2020-09-14 Thread Alan Modra via Gcc-patches
These functions do behave a little differently for SImode, so the
mode should be passed.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Pass mode to
reg_or_add_cint_operand and reg_or_sub_cint_operand.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6fc86f8185a..32044d33977 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21189,9 +21189,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  return true;
}
   else if ((outer_code == PLUS
-   && reg_or_add_cint_operand (x, VOIDmode))
+   && reg_or_add_cint_operand (x, mode))
   || (outer_code == MINUS
-  && reg_or_sub_cint_operand (x, VOIDmode))
+  && reg_or_sub_cint_operand (x, mode))
   || ((outer_code == SET
|| outer_code == IOR
|| outer_code == XOR)


Re: [PATCH] rs6000: inefficient 64-bit constant generation for consecutive 1-bits

2020-09-14 Thread Alan Modra via Gcc-patches
On Thu, Sep 10, 2020 at 04:58:03PM -0500, Peter Bergner via Gcc-patches wrote:
> +unsigned long
> +test0 (void)
> +{
> +   return 0x0000UL;
> +}
> +
> +unsigned long
> +test1 (void)
> +{
> +   return 0x0000UL;
> +}
> +
> +unsigned long
> +test2 (void)
> +{
> +   return 0x0000UL;
> +}
> +
> +unsigned long
> +test3 (void)
> +{
> +   return 0xff00UL;
> +}
> +
> +unsigned long
> +test4 (void)
> +{
> +   return 0xff00UL;
> +}
> +
> +unsigned long
> +test5 (void)
> +{
> +   return 0x0100UL;
> +}
> +
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,8,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,24,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,48" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,23" } } */

Just a comment, I don't really see too much reason to change anything,
but scan-assembler tests can be a maintenance pain in cases like these
where there are multiple ways to generate a constant in two
instructions.  For example,

test3:
li 3,-1
rldicr 3,3,0,23
blr
and

test5:
li 3,16384
rotldi 3,3,26
blr

would be two valid possibilities for test3 and test5 that don't use
rldic.  Ideally the test would verify the actual values generated by
the test functions and count instructions.

And having said that, I probably ought to post such a testcase for
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553927.html
I'm sure I had one lying around somewhere...

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] rotate and mask constants

2020-09-15 Thread Alan Modra via Gcc-patches
On Tue, Sep 15, 2020 at 10:49:46AM +0930, Alan Modra wrote:
> Implement more two insn constants.

And tests.  rot_cst1 checks the values generated, rot_cst2 checks
instruction count.

* gcc.target/powerpc/rot_cst.h,
* gcc.target/powerpc/rot_cst1.c,
* gcc.target/powerpc/rot_cst2.c: New tests.

diff --git a/gcc/testsuite/gcc.target/powerpc/rot_cst.h 
b/gcc/testsuite/gcc.target/powerpc/rot_cst.h
new file mode 100644
index 000..0d100d61233
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rot_cst.h
@@ -0,0 +1,269 @@
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c1 (void)
+{
+  return 0xc000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c2 (void)
+{
+  return 0xc00ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c3 (void)
+{
+  return 0xc0ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c4 (void)
+{
+  return 0xcULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c5 (void)
+{
+  return 0xc000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c6 (void)
+{
+  return 0xc00ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c7 (void)
+{
+  return 0xc0ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c8 (void)
+{
+  return 0xcULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c9 (void)
+{
+  return 0xc000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c10 (void)
+{
+  return 0xc00ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c11 (void)
+{
+  return 0xc0ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c12 (void)
+{
+  return 0xcULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c13 (void)
+{
+  return 0xc000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c14 (void)
+{
+  return 0xc00ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c15 (void)
+{
+  return 0xc0ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c16 (void)
+{
+  return 0xcULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b1 (void)
+{
+  return 0xbfffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b2 (void)
+{
+  return 0xbffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b3 (void)
+{
+  return 0xbfULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b4 (void)
+{
+  return 0xbULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b5 (void)
+{
+  return 0xbfffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b6 (void)
+{
+  return 0xbffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b7 (void)
+{
+  return 0xbfULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b8 (void)
+{
+  return 0xbULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b9 (void)
+{
+  return 0xbfffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b10 (void)
+{
+  return 0xbffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b11 (void)
+{
+  return 0xbfULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b12 (void)
+{
+  return 0xbULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b13 (void)
+{
+  return 0xbfffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b14 (void)
+{
+  return 0xbffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b15 (void)
+{
+  return 0xbfULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b16 (void)
+{
+  return 0xbULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r1 (void)
+{
+  return -0x124ULL << 48;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r2 (void)
+{
+  return -0x124ULL << 44;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r3 (void)
+{
+  return -0x124ULL << 40;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r4 (void)
+{
+  return -0x124ULL << 32;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r5 (void)
+{
+  return -0x124ULL << 28;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r6 (void)
+{
+  return -0x124ULL << 24;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r7 (void)
+{
+  return -0x124ULL << 20;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r8 (void)
+{
+  return -0x124ULL << 16;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r9 (void)
+{
+ 

Re: [RS6000] rs6000_rtx_costs for AND

2020-09-16 Thread Alan Modra via Gcc-patches
On Tue, Sep 15, 2020 at 01:15:34PM -0500, will schmidt wrote:
> On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote:
> > The existing "case AND" in this function is not sufficient for
> > optabs.c:avoid_expensive_constant usage, where the AND is passed in
> > outer_code.
> > 
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for
> > AND to CONST_INT case.
[snip]

> It's not exactly 1x1..   I tentatively conclude that the /* rotate-and-
> mask */  lump of code here does go dead with the "case AND" changes
> above.

It's not so much dead as duplicate.  When rtx_costs returns false, as
it will now on sub-expressions matching the removed code, it will be
called recursively on the sub-expressions with outer_code set to the
operation, AND in this case.
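
Roughly what the generic rtx_cost does, from memory and heavily
trimmed, just to show where OUTER_CODE comes from on that second
visit:

int
rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
	  int opno, bool speed)
{
  enum rtx_code code = GET_CODE (x);

  if (code == REG)
    return 0;			/* registers are free */

  int total = COSTS_N_INSNS (1);	/* default, modulo size scaling etc.  */
  if (targetm.rtx_costs (x, mode, outer_code, opno, &total, speed))
    return total;

  /* Hook said "scan my operands": add the sub-expression costs, with
     OUTER_CODE now the code of X itself, AND in the case above.  */
  const char *fmt = GET_RTX_FORMAT (code);
  for (int i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
    if (fmt[i] == 'e')
      total += rtx_cost (XEXP (x, i), mode, code, i, speed);
  return total;
}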

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] rs6000_rtx_costs cost IOR

2020-09-16 Thread Alan Modra via Gcc-patches
On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.
> 
> >  case IOR:
> > -  /* FIXME */
> >*total = COSTS_N_INSNS (1);
> > -  return true;
> 
> Hey this was okay for over five years :-)
> 
> > +  left = XEXP (x, 0);
> > +  if (GET_CODE (left) == AND
> > + && CONST_INT_P (XEXP (left, 1)))
> 
> Add a comment that this is the integer insert insns?
> 
> > + // rotlsi3_insert_5
> 
> But use /* comments */.
> 
> > + /* Test both regs even though the one in the mask is
> > +constrained to be equal to the output.  Increasing
> > +cost may well result in rejecting an invalid insn
> > +earlier.  */
> 
> Is that ever actually useful?

Possibly not in this particular case, but I did see cases where
invalid insns were rejected early by costing non-reg sub-expressions.

Besides that, the default position on rtx_costs paths that return true
should be to cost any sub-expressions unless you know for sure they
are zero cost.  And yes, we fail to do that for some cases,
eg. mul_highpart.

> So this new block is pretty huge.  Can it easily be factored to a
> separate function?  Just the insert insns part, not all IOR.

Done in my local tree.

> Okay for trunk with the comments changed to the correct syntax, and
> factoring masked insert out to a separate function pre-approved if you
> want to do that.  Thanks!

I'll hold off committing until the whole rtx_costs patch series is
reviewed (not counting the rotate_and_mask_constant patch).

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

2020-09-17 Thread Alan Modra via Gcc-patches
On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> > Also use rs6000_cost only for speed.
> 
> More directly: use something completely different for !speed, namely,
> code size.

Yes, that might be better.

> > -  if (CONST_INT_P (XEXP (x, 1))
> > - && satisfies_constraint_I (XEXP (x, 1)))
> > +  if (!speed)
> > +   /* A little more than one insn so that nothing is tempted to
> > +  turn a shift left into a multiply.  */
> > +   *total = COSTS_N_INSNS (1) + 1;
> 
> Please don't.  We have a lot of code elsewhere to handle this directly,
> already.  Also, this is just wrong for size.  Five shifts is *not*
> better than four muls.  If that is the only way to get good results,
> than unfortunately we probably have to; but do not do this without any
> proof.

Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
result for !speed rs6000_rtx_costs is the entire basis for the !speed
changes.  I don't have any proof that this is correct.

> >  case FMA:
> > -  if (mode == SFmode)
> > +  if (!speed)
> > +   *total = COSTS_N_INSNS (1) + 1;
> 
> Not here, either.
> 
> >  case DIV:
> >  case MOD:
> >if (FLOAT_MODE_P (mode))
> > {
> > - *total = mode == DFmode ? rs6000_cost->ddiv
> > - : rs6000_cost->sdiv;
> > + if (!speed)
> > +   *total = COSTS_N_INSNS (1) + 2;
> 
> And why + 2 even?
> 
> > - if (GET_MODE (XEXP (x, 1)) == DImode)
> > + if (!speed)
> > +   *total = COSTS_N_INSNS (1) + 2;
> > + else if (GET_MODE (XEXP (x, 1)) == DImode)
> > *total = rs6000_cost->divdi;
> >   else
> > *total = rs6000_cost->divsi;
> 
> (more)

OK, I can remove all the !speed changes.  To be honest, I didn't look
anywhere near as much at code size changes as I worried about
performance.  And about not regressing any fiddly testcase we have.

> > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >return false;
> >  
> >  case AND:
> > +  *total = COSTS_N_INSNS (1);
> >right = XEXP (x, 1);
> >if (CONST_INT_P (right))
> > {
> > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >|| left_code == LSHIFTRT)
> >   && rs6000_is_valid_shift_mask (right, left, mode))
> > {
> > - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > - if (!CONST_INT_P (XEXP (left, 1)))
> > -   *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> > speed);
> > - *total += COSTS_N_INSNS (1);
> > + rtx reg_op = XEXP (left, 0);
> > + if (!REG_P (reg_op))
> > +   *total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > + reg_op = XEXP (left, 1);
> > + if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > +   *total += rtx_cost (reg_op, mode, left_code, 1, speed);
> >   return true;
> > }
> > }
> > -
> > -  *total = COSTS_N_INSNS (1);
> >return false;
> 
> This doesn't improve anything?  It just makes it different from all
> surrounding code?

So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
regs, like it doesn't for const_int.

> > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >if (outer_code == TRUNCATE
> >   && GET_CODE (XEXP (x, 0)) == MULT)
> > {
> > - if (mode == DImode)
> > + if (!speed)
> > +   *total = COSTS_N_INSNS (1) + 1;
> 
> (more)
> 
> > +case SET:
> > +  /* The default cost of a SET is the number of general purpose
> > +regs being set multiplied by COSTS_N_INSNS (1).  That only
> > +works where the incremental cost of the operation and
> > +operands is zero, when the operation performed can be done in
> > +one instruction.  For other cases where we add COSTS_N_INSNS
> > +for some operation (see point 5 above), COSTS_N_INSNS (1)
> > +should be subtracted from the total cost.  */
> 
> What does "incremental cost" mean there?  If what increases?
> 
> > +  {
> > +   rtx_code src_code = GET_CODE (SET_SRC (x));
> > +   if (src_code == CONST_INT
> > +   || src_code == CONST_DOUBLE
> > +   || src_code == CONST_WIDE_INT)
> > + return false;
> > +   int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > +   + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> 
> This should use set_src_cost, if anything.  But that will recurse then,
> currently?  Ugh.
> 
> Using rtx_cost for SET_DEST is problematic, too.
> 
> What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> that, hrm.
> 
> rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> (ab)used for that, sigh.
> 
> Many targets have something for it al

Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

2020-09-21 Thread Alan Modra via Gcc-patches
On Fri, Sep 18, 2020 at 01:13:18PM -0500, Segher Boessenkool wrote:
> Thanks (to both of you).  Interesting!  Which of these unrelated changes
> does this come from?

Most of the changes I saw in code generation (not in spec, I didn't
look there, but in gcc) came down to this change to the cost for SETs,
and "rs6000_rtx_costs multi-insn constants".  I expect they were the
changes that made most difference to spec results, with this patch
likely resulting in more if-conversion.

So here is the patch again, this time without any distracting other
changes.  With a further revised comment.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
operands.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8969baa4dcf..2d770afd8fe 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21599,6 +21599,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
}
   break;
 
+case SET:
+  /* On entry the value in *TOTAL is the number of general purpose
+regs being set, multiplied by COSTS_N_INSNS (1).  Handle
+costing of set operands specially since in most cases we have
+an instruction rather than just a piece of RTL and should
+return a cost comparable to insn_cost.  That's a little
+complicated because in some cases the cost of SET operands is
+non-zero, see point 5 above and cost of PLUS for example, and
+in others it is zero, for example for (set (reg) (reg)).
+But (set (reg) (reg)) has the same insn_cost as
+(set (reg) (plus (reg) (reg))).  Hack around this by
+subtracting COSTS_N_INSNS (1) from the operand cost in cases
+were we add at least COSTS_N_INSNS (1) for some operation.
+However, don't do so for constants.  Constants might cost
+more than zero when they require more than one instruction,
+and we do want the cost of extra instructions.  */
+  {
+   rtx_code src_code = GET_CODE (SET_SRC (x));
+   if (src_code == CONST_INT
+   || src_code == CONST_DOUBLE
+   || src_code == CONST_WIDE_INT)
+ return false;
+   int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+   + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+   if (set_cost >= COSTS_N_INSNS (1))
+ *total += set_cost - COSTS_N_INSNS (1);
+   return true;
+  }
+
 default:
   break;
 }
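
To make the adjustment concrete, a rough worked example assuming the
usual COSTS_N_INSNS (1) == 4 scaling and the existing PLUS handling:

  (set (reg) (reg))               src and dest cost 0,
                                  set_cost = 0 < 4, nothing added,
                                  *total stays at COSTS_N_INSNS (1)

  (set (reg) (plus (reg) (reg)))  src costs COSTS_N_INSNS (1), dest 0,
                                  set_cost = 4 >= 4, add 4 - 4 = 0,
                                  *total stays at COSTS_N_INSNS (1)

so both come out comparable to insn_cost, which is the point of
subtracting COSTS_N_INSNS (1).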

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] rs6000_rtx_costs cost IOR

2020-09-21 Thread Alan Modra via Gcc-patches
On Mon, Sep 21, 2020 at 10:49:17AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote:
> > On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> > > > + /* Test both regs even though the one in the mask is
> > > > +constrained to be equal to the output.  Increasing
> > > > +cost may well result in rejecting an invalid insn
> > > > +earlier.  */
> > > 
> > > Is that ever actually useful?
> > 
> > Possibly not in this particular case, but I did see cases where
> > invalid insns were rejected early by costing non-reg sub-expressions.
> 
> But does that ever change generated code?
> 
> This makes the compiler a lot harder to read and understand.  To the
> point that such micro-optimisations makes worthwhile optimisations hard
> or impossible to do.

Fair enough, here's a revised patch.

* config/rs6000/rs6000.c (rotate_insert_cost): New function.
(rs6000_rtx_costs): Cost IOR.  Tidy break/return.  Tidy AND.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5025e3c30c0..78c33cc8cba 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21118,6 +21118,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 && get_attr_cannot_copy (insn);
 }
 
+/* Handle rtx_costs for scalar integer rotate and insert insns.  */
+
+static bool
+rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
+   int *total)
+{
+  if (GET_CODE (right) == AND
+  && CONST_INT_P (XEXP (right, 1))
+  && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+{
+  rtx leftop = XEXP (left, 0);
+  rtx rightop = XEXP (right, 0);
+
+  /* rotlsi3_insert_5.  */
+  if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == SImode
+ && UINTVAL (XEXP (left, 1)) != 0
+ && UINTVAL (XEXP (right, 1)) != 0
+ && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+   return true;
+  /* rotldi3_insert_6.  */
+  if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+   return true;
+  /* rotldi3_insert_7.  */
+  if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+   return true;
+
+  rtx mask = 0;
+  rtx shift = leftop;
+  rtx_code shift_code = GET_CODE (shift);
+  /* rotl3_insert.  */
+  if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = right;
+  else
+   {
+ shift = rightop;
+ shift_code = GET_CODE (shift);
+ /* rotl3_insert_2.  */
+ if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = left;
+   }
+  if (mask
+ && CONST_INT_P (XEXP (shift, 1))
+ && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+   {
+ *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed);
+ *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed);
+ return true;
+   }
+}
+  /* rotl3_insert_3.  */
+  if (GET_CODE (right) == ASHIFT
+  && CONST_INT_P (XEXP (right, 1))
+  && (INTVAL (XEXP (right, 1))
+ == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+{
+  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+  *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed);
+  return true;
+}
+  /* rotl3_insert_4.  */
+  if (GET_CODE (right) == LSHIFTRT
+  && CONST_INT_P (XEXP (right, 1))
+  && mode == SImode
+  && (INTVAL (XEXP (right, 1))
+ + exact_log2 (-UINTVAL (XEXP (left, 1))) == 32))
+{
+  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+  *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed);
+  return true;
+}
+  return false;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.
@@ -21165,6 +21250,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21295,7 +21381,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = rs6000_cost->fp;
   else
*total = rs6000_cost->dmul;
-  break;
+  return false;
 
 case DIV:
 case MOD:
@@ -21355,32 +21441,37 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case AND:
-  if (CONST_INT_P (XEXP (x, 1)))
+  *total = COSTS_N_INSNS (1);
+  right = XEXP (x, 1);
+  if (CONST_INT

PR97107, libgo fails to build for power10

2020-09-21 Thread Alan Modra via Gcc-patches
Calls from split-stack code to non-split-stack code need to expand
mapped stack memory via __morestack.  Even tail calls.

__morestack is quite a surprising function on powerpc in that it calls
back to its caller, and a tail call will continue running in the
context of extra mapped stack.
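
For example (a sketch of the failing shape, compiled with
-fsplit-stack):

extern int big_stack_user (int);	/* imagine non-split-stack code */

int
thin_wrapper (int x)
{
  return big_stack_user (x);	/* sibling call, no frame of its own */
}

thin_wrapper needs no stack of its own, so info->push_p is false and
the split-stack prologue used to be skipped, but the callee will still
use stack that must be mapped via __morestack.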

Bootstrapped and regression tested on power8, and built for power10.

PR target/97107
* config/rs6000/rs6000-internal.h (struct rs6000_stack): Improve
calls_p comment.
* config/rs6000/rs6000-logue.c (rs6000_stack_info): Likewise.
(rs6000_expand_split_stack_prologue): Emit the prologue for
functions that make a sibling call.

diff --git a/gcc/config/rs6000/rs6000-internal.h 
b/gcc/config/rs6000/rs6000-internal.h
index 9caef013a71..32681b6cece 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -32,7 +32,7 @@ typedef struct rs6000_stack {
   int cr_save_p;   /* true if the CR reg needs to be saved */
   unsigned int vrsave_mask;/* mask of vec registers to save */
   int push_p;  /* true if we need to allocate stack space */
-  int calls_p; /* true if the function makes any calls */
+  int calls_p; /* true if there are non-sibling calls */
   int world_save_p;/* true if we're saving *everything*:
   r13-r31, cr, f14-f31, vrsave, v20-v31  */
   enum rs6000_abi abi; /* which ABI to use */
diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 0f88ec19929..cb08c25c795 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -714,7 +714,7 @@ rs6000_stack_info (void)
   info->altivec_size = 16 * (LAST_ALTIVEC_REGNO + 1
 - info->first_altivec_reg_save);
 
-  /* Does this function call anything?  */
+  /* Does this function call anything (apart from sibling calls)?  */
   info->calls_p = (!crtl->is_leaf || cfun->machine->ra_needs_full_frame);
 
   /* Determine if we need to save the condition code registers.  */
@@ -5479,7 +5479,15 @@ rs6000_expand_split_stack_prologue (void)
   gcc_assert (flag_split_stack && reload_completed);
 
   if (!info->push_p)
-return;
+{
+  /* We need the -fsplit-stack prologue for functions that make
+tail calls.  Tail calls don't count against crtl->is_leaf.  */
+  for (insn = get_topmost_sequence ()->first; insn; insn = NEXT_INSN 
(insn))
+   if (CALL_P (insn))
+ break;
+  if (!insn)
+   return;
+}
 
   if (global_regs[29])
 {

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Power10 libffi fixes

2020-09-21 Thread Alan Modra via Gcc-patches
Power10 pc-relative code doesn't use or preserve r2 as a TOC pointer.
That means calling between pc-relative and TOC-using code can't be
done without intervening linker stubs, and a call from TOC code to
pc-relative code must have a nop after the bl in order to restore r2.

Now the PowerPC libffi assembly code doesn't use r2 except for the
implicit use when making calls back to C, ffi_closure_helper_LINUX64
and ffi_prep_args64.  So changing the assembly to interoperate with
pc-relative code without stubs is easily done.  Controlling that is a
new built-in macro.
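
For anyone wanting to poke at the new macro from C, a trivial probe
(just a sketch; pc-relative addressing is the default for
-mcpu=power10 with the ELFv2 medium code model, and -mno-pcrel turns
it off):

#ifdef __PCREL__
const char addressing[] = "pc-relative, r2 is not a TOC pointer";
#else
const char addressing[] = "TOC-relative, r2 holds the TOC pointer";
#endif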

Upstream libffi currently has a different patch applied to work around
the power10 build failure.  I'll post a delta for upstream.
Bootstrapped and regression tested on power8, built for power10.

gcc/
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
Conditionally define __PCREL__.
libffi/
* src/powerpc/linux64.S (ffi_call_LINUX64): Don't emit global
entry when __PCREL__.  Call using @notoc.
(ffi_closure_LINUX64, ffi_go_closure_linux64): Likewise.

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f5982907e90..cc1e997524e 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -597,6 +597,9 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags,
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
+  /* Whether pc-relative code is being generated.  */
+  if ((flags & OPTION_MASK_PCREL) != 0)
+rs6000_define_or_undefine_macro (define_p, "__PCREL__");
 }
 
 void
diff --git a/libffi/src/powerpc/linux64.S b/libffi/src/powerpc/linux64.S
index b2ae60ead6e..bfb4d2957ae 100644
--- a/libffi/src/powerpc/linux64.S
+++ b/libffi/src/powerpc/linux64.S
@@ -36,8 +36,10 @@
.cfi_startproc
 # if _CALL_ELF == 2
 ffi_call_LINUX64:
+#  ifndef __PCREL__
addis   %r2, %r12, .TOC.-ffi_call_LINUX64@ha
addi%r2, %r2, .TOC.-ffi_call_LINUX64@l
+#  endif
.localentry ffi_call_LINUX64, . - ffi_call_LINUX64
 # else
.section".opd","aw"
@@ -89,7 +91,11 @@ ffi_call_LINUX64:
/* Call ffi_prep_args64.  */
mr  %r4, %r1
 # if defined _CALL_LINUX || _CALL_ELF == 2
+#  ifdef __PCREL__
+   bl  ffi_prep_args64@notoc
+#  else
bl  ffi_prep_args64
+#  endif
 # else
bl  .ffi_prep_args64
 # endif
diff --git a/libffi/src/powerpc/linux64_closure.S 
b/libffi/src/powerpc/linux64_closure.S
index 6487d2a2970..938e86034f1 100644
--- a/libffi/src/powerpc/linux64_closure.S
+++ b/libffi/src/powerpc/linux64_closure.S
@@ -37,8 +37,10 @@
.cfi_startproc
 # if _CALL_ELF == 2
 ffi_closure_LINUX64:
+#  ifndef __PCREL__
addis   %r2, %r12, .TOC.-ffi_closure_LINUX64@ha
addi%r2, %r2, .TOC.-ffi_closure_LINUX64@l
+#  endif
.localentry ffi_closure_LINUX64, . - ffi_closure_LINUX64
 # else
.section".opd","aw"
@@ -155,7 +157,11 @@ ffi_closure_LINUX64:
 
# make the call
 # if defined _CALL_LINUX || _CALL_ELF == 2
+#  ifdef __PCREL__
+   bl ffi_closure_helper_LINUX64@notoc
+#  else
bl ffi_closure_helper_LINUX64
+#  endif
 # else
bl .ffi_closure_helper_LINUX64
 # endif
@@ -396,8 +402,10 @@ ffi_closure_LINUX64:
.cfi_startproc
 # if _CALL_ELF == 2
 ffi_go_closure_linux64:
+#  ifndef __PCREL__
addis   %r2, %r12, .TOC.-ffi_go_closure_linux64@ha
addi%r2, %r2, .TOC.-ffi_go_closure_linux64@l
+#  endif
.localentry ffi_go_closure_linux64, . - ffi_go_closure_linux64
 # else
.section".opd","aw"

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR97107, libgo fails to build for power10

2020-09-22 Thread Alan Modra via Gcc-patches
Hi Segher,

On Tue, Sep 22, 2020 at 06:59:42PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 22, 2020 at 09:55:12AM +0930, Alan Modra wrote:
> >if (!info->push_p)
> > -return;
> > +{
> > +  /* We need the -fsplit-stack prologue for functions that make
> > +tail calls.  Tail calls don't count against crtl->is_leaf.  */
> > +  for (insn = get_topmost_sequence ()->first; insn; insn = NEXT_INSN 
> > (insn))
> > +   if (CALL_P (insn))
> > + break;
> > +  if (!insn)
> > +   return;
> > +}
> 
> I don't think that get_topmost_sequence is correct.

We get here from inside a sequence so we definitely need a way to look
at the function RTL rather than the empty sequence.  You want
push_topmost_sequence / pop_topmost_sequence around the rtl scan?

I wasn't aware there was a need for that when not emitting insns, and
I can't see any reason why when I look carefully, except that it seems
to be customary; grep only shows one other place using
get_topmost_sequence as I did.

> Other than that this is fine for trunk (and backports).  Thanks!
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Power10 libffi fixes

2020-09-23 Thread Alan Modra via Gcc-patches
On Tue, Sep 22, 2020 at 07:16:57PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 22, 2020 at 10:00:11AM +0930, Alan Modra wrote:
> > gcc/
> > * config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
> > Conditionally define __PCREL__.
> 
> Please do that as a separate (earlier) patch (because it *is*, and to
> simplify backports, etc).

Done.

> > libffi/
> > * src/powerpc/linux64.S (ffi_call_LINUX64): Don't emit global
> > entry when __PCREL__.  Call using @notoc.
> > (ffi_closure_LINUX64, ffi_go_closure_linux64): Likewise.
> 
> This is okay for trunk, and for backports (possibly expedited, talk
> with Peter for what is wanted/needed for AT).

I've fixed the changelog, a comment, and added a nop after bl for the
old calls without @notoc.  While there really isn't a need for the
nops in libffi.so since the callee is hidden visibility, there is a
miniscule chance that a static libffi.a user has a very large TOC and
somehow manages to have ffi_call_LINUX64 and ffi_prep_args64 using
different TOC pointers.  In that case the linker would arrange the
call to go via a toc-adjusting stub and want to replace the nop with a
toc restore.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR97107, libgo fails to build for power10

2020-09-23 Thread Alan Modra via Gcc-patches
On Tue, Sep 22, 2020 at 07:52:53PM -0500, Segher Boessenkool wrote:
> It isn't obvious that the outer sequence is always what we want.  If
> there is some nicer way to get at the info you want, that would help.
> 
> But, this is the expander only -- so I guess we are okay here, it is
> much limited when we can be called here.  Add a comment though?

Pushed with a comment.  Thanks!

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Power10 libffi fixes

2020-09-23 Thread Alan Modra via Gcc-patches
On Thu, Sep 24, 2020 at 01:11:02PM +0930, Alan Modra wrote:
> I've fixed the changelog, a comment, and added a nop after bl for the
> old calls without @notoc.  While there really isn't a need for the
> nops in libffi.so since the callee is hidden visibility, there is a
> miniscule chance that a static libffi.a user has a very large TOC and
> somehow manages to have ffi_call_LINUX64 and ffi_prep_args64 using
> different TOC pointers.  In that case the linker would arrange the
> call to go via a toc-adjusting stub and want to replace the nop with a
> toc restore.

Adding those nops broke libffi, and I pushed the patch before
bootstrap finished.  So this one committed as obvious to fix the
breakage.

* src/powerpc/linux64_closure.S (ffi_closure_LINUX64): Correct
location of .Lret.

diff --git a/libffi/src/powerpc/linux64_closure.S 
b/libffi/src/powerpc/linux64_closure.S
index 3e30db36190..5663bb40223 100644
--- a/libffi/src/powerpc/linux64_closure.S
+++ b/libffi/src/powerpc/linux64_closure.S
@@ -159,15 +159,17 @@ ffi_closure_LINUX64:
 # if defined _CALL_LINUX || _CALL_ELF == 2
 #  ifdef __PCREL__
bl ffi_closure_helper_LINUX64@notoc
+.Lret:
 #  else
bl ffi_closure_helper_LINUX64
+.Lret:
nop
 #  endif
 # else
bl .ffi_closure_helper_LINUX64
+.Lret:
nop
 # endif
-.Lret:
 
# now r3 contains the return type
# so use it to look up in a table

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Use parameterized names for tablejump

2020-09-29 Thread Alan Modra via Gcc-patches
On Wed, Sep 30, 2020 at 12:04:25AM +, Segher Boessenkool wrote:
>   * config/rs6000/rs6000.md (tablejump): Simplify.
>   (tablejumpsi): Merge this ...
>   (tablejumpdi): ... and this ...
>   (@tablejump_normal): ... into this.
>   (tablejumpsi_nospec): Merge this ...
>   (tablejumpdi_nospec): ... and this ...
>   (@tablejump_nospec): ... into this.
>   (*tablejump_internal1): Delete, rename to ...
>   (@tablejump_insn_normal): ... this.
>   (*tablejump_internal1_nospec): Delete, rename to ...
>   (@tablejump_insn_nospec): ... this.

decQuad.o] Error 1
*** stack smashing detected ***: terminated
during RTL pass: expand

I'll commit this as obvious after my regstraps finish.

* config/rs6000/rs6000.md (@tablejump_normal): Don't use
non-existent operands[].
(@tablejump_nospec): Likewise.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 24ad80993ad..779bfd11237 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12716,21 +12716,22 @@
(use (match_operand:P 1))]
   "rs6000_speculate_indirect_jumps"
 {
+  rtx off;
   operands[0] = force_reg (SImode, operands[0]);
   if (mode == SImode)
-operands[4] = operands[0];
+off = operands[0];
   else
 {
-  operands[4] = gen_reg_rtx (Pmode);
+  off = gen_reg_rtx (Pmode);
   rtx src = gen_rtx_fmt_e (SIGN_EXTEND, Pmode, operands[0]);
-  emit_move_insn (operands[4], src);
+  emit_move_insn (off, src);
 }
 
-  operands[2] = force_reg (Pmode, gen_rtx_LABEL_REF (Pmode, operands[1]));
-  operands[3] = gen_reg_rtx (Pmode);
+  rtx lab = force_reg (Pmode, gen_rtx_LABEL_REF (Pmode, operands[1]));
+  rtx addr = gen_reg_rtx (Pmode);
 
-  emit_insn (gen_add3 (operands[3], operands[4], operands[2]));
-  emit_jump_insn (gen_tablejump_insn_normal (Pmode, operands[3], operands[1]));
+  emit_insn (gen_add3 (addr, off, lab));
+  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
   DONE;
 })
 
@@ -12740,21 +12741,22 @@
(use (match_operand:CC 2))]
   "!rs6000_speculate_indirect_jumps"
 {
+  rtx off;
   operands[0] = force_reg (SImode, operands[0]);
   if (mode == SImode)
-operands[4] = operands[0];
+off = operands[0];
   else
 {
-  operands[4] = gen_reg_rtx (Pmode);
+  off = gen_reg_rtx (Pmode);
   rtx src = gen_rtx_fmt_e (SIGN_EXTEND, Pmode, operands[0]);
-  emit_move_insn (operands[4], src);
+  emit_move_insn (off, src);
 }
 
-  operands[5] = force_reg (Pmode, gen_rtx_LABEL_REF (Pmode, operands[1]));
-  operands[3] = gen_reg_rtx (Pmode);
+  rtx lab = force_reg (Pmode, gen_rtx_LABEL_REF (Pmode, operands[1]));
+  rtx addr = gen_reg_rtx (Pmode);
 
-  emit_insn (gen_add3 (operands[3], operands[4], operands[5]));
-  emit_jump_insn (gen_tablejump_insn_nospec (Pmode, operands[3], operands[1],
+  emit_insn (gen_add3 (addr, off, lab));
+  emit_jump_insn (gen_tablejump_insn_nospec (Pmode, addr, operands[1],
 operands[2]));
   DONE;
 })


-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] gcc/configure typo fix

2020-09-29 Thread Alan Modra via Gcc-patches
Committed as obvious.

* configure.ac (--with-long-double-format): Typo fix.
* configure: Regenerate.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index f5612161dcd..1ad5bbc6935 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6260,7 +6260,7 @@ supported if the default cpu is power7 or newer])
;;
   esac
   ;;
-  xpowerpc64*-*-linux*:*)
+  powerpc64*-*-linux*:*)
 AC_MSG_ERROR([--with-long-double-format argument should be ibm or ieee])
 with_long_double_format=""
 ;;

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] -mno-minimal-toc vs. power10 pcrelative

2020-09-30 Thread Alan Modra via Gcc-patches
We've had this hack in the libgcc config to build libgcc with
-mcmodel=small for powerpc64 for a long time.  It wouldn't be a bad
thing if someone who knows the multilib machinery well could arrange
for -mcmodel=small to be passed just for ppc64 when building for
earlier than power10.  But for now, make -mno-minimal-toc do nothing
when pcrel.  Which will do the right thing for any project that has
copied libgcc's trick.

We want this if configuring using --with-cpu=power10 to build a
power10 pcrel libgcc.  -mcmodel=small turns off pcrel.

Bootstrapped and regression tested powerpc64le-linux.  OK?

gcc/
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
set -mcmodel=small for -mno-minimal-toc when pcrel.
libgcc/
* config/rs6000/t-linux: Document purpose of -mno-minimal-toc.

diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 2ded3301282..2de1097d3ec 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -132,20 +132,25 @@ extern int dot_symbols;
  if ((rs6000_isa_flags & OPTION_MASK_POWERPC64) == 0)  \
{   \
  rs6000_isa_flags |= OPTION_MASK_POWERPC64;\
- error ("%<-m64%> requires a PowerPC64 cpu");  \
+ error ("%<-m64%> requires a PowerPC64 cpu");  \
}   \
+ if (!global_options_set.x_rs6000_current_cmodel)  \
+   SET_CMODEL (CMODEL_MEDIUM); \
  if ((rs6000_isa_flags_explicit\
   & OPTION_MASK_MINIMAL_TOC) != 0) \
{   \
  if (global_options_set.x_rs6000_current_cmodel\
  && rs6000_current_cmodel != CMODEL_SMALL) \
error ("%<-mcmodel incompatible with other toc options%>"); \
- SET_CMODEL (CMODEL_SMALL);\
+ if (TARGET_MINIMAL_TOC\
+ || !(TARGET_PCREL \
+  || (PCREL_SUPPORTED_BY_OS\
+  && (rs6000_isa_flags_explicit\
+  & OPTION_MASK_PCREL) == 0))) \
+   SET_CMODEL (CMODEL_SMALL);  \
}   \
  else  \
{   \
- if (!global_options_set.x_rs6000_current_cmodel)  \
-   SET_CMODEL (CMODEL_MEDIUM); \
  if (rs6000_current_cmodel != CMODEL_SMALL)\
{   \
  if (!global_options_set.x_TARGET_NO_FP_IN_TOC) \
diff --git a/libgcc/config/rs6000/t-linux b/libgcc/config/rs6000/t-linux
index 4f6d4c4a4d2..ed821947b66 100644
--- a/libgcc/config/rs6000/t-linux
+++ b/libgcc/config/rs6000/t-linux
@@ -1,3 +1,8 @@
 SHLIB_MAPFILES += $(srcdir)/config/rs6000/libgcc-glibc.ver
 
-HOST_LIBGCC2_CFLAGS += -mlong-double-128 -mno-minimal-toc
+HOST_LIBGCC2_CFLAGS += -mlong-double-128
+
+# This is a way of selecting -mcmodel=small for ppc64, which gives
+# smaller and faster libgcc code.  Directly specifying -mcmodel=small
+# would need to take into account targets for which -mcmodel is invalid.
+HOST_LIBGCC2_CFLAGS += -mno-minimal-toc

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Adjust gcc asm for power10

2020-09-30 Thread Alan Modra via Gcc-patches
Generate assembly that is .localentry 1 with @notoc calls to match.

Bootstrapped and regression tested powerpc64le-linux on power8, and
bootstrapped on power10.  (I lost the power10 machine to someone else
before I could build a baseline to compare against.)

gcc/
* config/rs6000/ppc-asm.h: Support __PCREL__ code.
libgcc/
* config/rs6000/morestack.S,
* config/rs6000/tramp.S,
* config/powerpc/sjlj.S: Support __PCREL__ code.

diff --git a/gcc/config/rs6000/ppc-asm.h b/gcc/config/rs6000/ppc-asm.h
index 48edc9945d7..e0bce9c5aec 100644
--- a/gcc/config/rs6000/ppc-asm.h
+++ b/gcc/config/rs6000/ppc-asm.h
@@ -262,6 +262,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #undef toc
 
 #define FUNC_NAME(name) GLUE(__USER_LABEL_PREFIX__,name)
+#ifdef __PCREL__
+#define JUMP_TARGET(name) GLUE(FUNC_NAME(name),@notoc)
+#define FUNC_START(name) \
+   .type FUNC_NAME(name),@function; \
+   .globl FUNC_NAME(name); \
+FUNC_NAME(name): \
+   .localentry FUNC_NAME(name),1
+#else
 #define JUMP_TARGET(name) FUNC_NAME(name)
 #define FUNC_START(name) \
.type FUNC_NAME(name),@function; \
@@ -270,6 +278,7 @@ FUNC_NAME(name): \
 0: addis 2,12,(.TOC.-0b)@ha; \
addi 2,2,(.TOC.-0b)@l; \
.localentry FUNC_NAME(name),.-FUNC_NAME(name)
+#endif /* !__PCREL__ */
 
 #define HIDDEN_FUNC(name) \
   FUNC_START(name) \
diff --git a/libgcc/config/rs6000/morestack.S b/libgcc/config/rs6000/morestack.S
index 1b8ebb5dc3b..ac33c882c30 100644
--- a/libgcc/config/rs6000/morestack.S
+++ b/libgcc/config/rs6000/morestack.S
@@ -55,11 +55,18 @@
.type name,@function;   \
 name##:
 
+#ifdef __PCREL__
+#define ENTRY(name)\
+   ENTRY0(name);   \
+   .localentry name, 1
+#define JUMP_TARGET(name) name##@notoc
+#else
 #define ENTRY(name)\
ENTRY0(name);   \
 0: addis %r2,%r12,.TOC.-0b@ha; \
 addi %r2,%r2,.TOC.-0b@l;   \
.localentry name, .-name
+#endif
 
 #else
 
@@ -81,6 +88,9 @@ BODY_LABEL(name)##:
 
 #define SIZE(name) .size name, .-BODY_LABEL(name)
 
+#ifndef JUMP_TARGET
+#define JUMP_TARGET(name) name
+#endif
 
.text
 # Just like __morestack, but with larger excess allocation
@@ -156,7 +166,7 @@ ENTRY0(__morestack)
stdu %r1,-MORESTACK_FRAMESIZE(%r1)
 
# void __morestack_block_signals (void)
-   bl __morestack_block_signals
+   bl JUMP_TARGET(__morestack_block_signals)
 
# void *__generic_morestack (size_t *pframe_size,
#void *old_stack,
@@ -164,7 +174,7 @@ ENTRY0(__morestack)
addi %r3,%r29,NEWSTACKSIZE_SAVE
mr %r4,%r29
li %r5,0# no copying from old stack
-   bl __generic_morestack
+   bl JUMP_TARGET(__generic_morestack)
 
 # Start using new stack
stdu %r29,-32(%r3)  # back-chain
@@ -183,7 +193,7 @@ ENTRY0(__morestack)
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
 
# void __morestack_unblock_signals (void)
-   bl __morestack_unblock_signals
+   bl JUMP_TARGET(__morestack_unblock_signals)
 
 # Set up for a call to the target function, located 3
 # instructions after __morestack's return address.
@@ -218,11 +228,11 @@ ENTRY0(__morestack)
std %r10,PARAMREG_SAVE+56(%r29)
 #endif
 
-   bl __morestack_block_signals
+   bl JUMP_TARGET(__morestack_block_signals)
 
# void *__generic_releasestack (size_t *pavailable)
addi %r3,%r29,NEWSTACKSIZE_SAVE
-   bl __generic_releasestack
+   bl JUMP_TARGET(__generic_releasestack)
 
 # Reset __private_ss stack guard to value for old stack
ld %r12,NEWSTACKSIZE_SAVE(%r29)
@@ -231,7 +241,7 @@ ENTRY0(__morestack)
 .LEHE0:
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
 
-   bl __morestack_unblock_signals
+   bl JUMP_TARGET(__morestack_unblock_signals)
 
 # Use old stack again.
mr %r1,%r29
@@ -260,13 +270,15 @@ cleanup:
std %r3,PARAMREG_SAVE(%r29) # Save exception header
# size_t __generic_findstack (void *stack)
mr %r3,%r29
-   bl __generic_findstack
+   bl JUMP_TARGET(__generic_findstack)
sub %r3,%r29,%r3
addi %r3,%r3,BACKOFF
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
ld %r3,PARAMREG_SAVE(%r29)
-   bl _Unwind_Resume
+   bl JUMP_TARGET(_Unwind_Resume)
+#ifndef __PCREL__
nop
+#endif
.cfi_endproc
SIZE (__morestack)
 
@@ -310,7 +322,7 @@ ENTRY(__stack_split_initialize)
# void __generic_morestack_set_initial_sp (void *sp, size_t len)
mr %r3,%r1
li %r4, 0x4000
-   b __generic_morestack_set_initial_sp
+   b JUMP_TARGET(__generic_morestack_set_initial_sp)
 # The lack of .cfi_endproc here is deliber

Re: [RS6000] Adjust gcc asm for power10

2020-09-30 Thread Alan Modra via Gcc-patches
On Wed, Sep 30, 2020 at 05:36:08PM -0500, Segher Boessenkool wrote:
> On Wed, Sep 30, 2020 at 05:06:57PM +0930, Alan Modra wrote:
> > Generate assembly that is .localentry 1 with @notoc calls to match.
> 
> What is the purpose of this?  Non-obvious patchexs without any
> explanation like that cost needless extra time to review :-/
> 
> "Support __PCREL__ code." suggests that it did not even build before?
> Or did not work?  Or is this just a perfomance improvement?

Sorry, I sometimes credit you with super-human powers.  It's a
performance improvement for libgcc.a.  Calling between functions that
advertise as using the TOC and those that don't will require linker
call stubs.

To recap, a function that uses a TOC pointer advertises that fact by a
value of 2 or larger in the symbol st_other localentry bits.  A call
advertises that it is from a function that needs to preserve r2 by
using an R_PPC64_REL24 reloc on the call, a function that doesn't have
a valid TOC pointer uses R_PPC64_REL24_NOTOC.
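
In ELF terms the localentry encoding sits in the top three bits of
st_other, so the test a tool makes is roughly (a sketch, using the
0xe0 mask from the ELFv2 ABI):

static inline unsigned int
ppc64_localentry_bits (unsigned char st_other)
{
  /* 0 means no separate local entry, 1 is the pcrel "doesn't use or
     preserve r2" marker set by .localentry name,1, and 2 or larger
     means the function has a local entry point and needs r2.  */
  return (st_other & 0xe0) >> 5;
}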

Note that the extra stubs I'm talking about are in statically linked
code.  Calls to shared library functions have no extra overhead due to
mis-matched toc/notoc code.  Those calls need a plt call stub anyway.
Also, indirect calls are not affected.

> > gcc/
> > * config/rs6000/ppc-asm.h: Support __PCREL__ code.
> > libgcc/
> > * config/rs6000/morestack.S,
> > * config/rs6000/tramp.S,
> > * config/powerpc/sjlj.S: Support __PCREL__ code.
> 
> The patch does look fine.  Okay for trunk (and backports if those are
> wanted; discuss with Bill I guess).  Thanks!
> 
> (But please explain the purpose of this, in the commit message if that
> makes sense.)
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] -mno-minimal-toc vs. power10 pcrelative

2020-10-01 Thread Alan Modra via Gcc-patches
On Wed, Sep 30, 2020 at 03:56:32PM -0500, Segher Boessenkool wrote:
> On Wed, Sep 30, 2020 at 05:01:45PM +0930, Alan Modra wrote:
> > * config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
> > set -mcmodel=small for -mno-minimal-toc when pcrel.
> 
> > - SET_CMODEL (CMODEL_SMALL);\
> > + if (TARGET_MINIMAL_TOC\
> > + || !(TARGET_PCREL \
> > +  || (PCREL_SUPPORTED_BY_OS\
> > +  && (rs6000_isa_flags_explicit\
> > +  & OPTION_MASK_PCREL) == 0))) \
> > +   SET_CMODEL (CMODEL_SMALL);  \
> 
> Please write this in a more readable way?  With some "else" statements,
> perhaps.
> 
> It is also fine to SET_CMODEL twice if that makes for simpler code.

Committed as per your suggestion.  I was looking at it again today
with the aim of converting this ugly macro to a function, and spotted
the duplication in freebsd64.h.  Which has some bit-rot.

Do you like the following?  rs6000_linux64_override_options is
functionally the same as what was in linux64.h.  I built various
configurations to test the change, powerpc64-linux, powerpc64le-linux
without any 32-bit targets enabled, powerpc64-freebsd12.0.

* config/rs6000/freebsd64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Use
rs6000_linux64_override_options.
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Break
out to..
* config/rs6000/rs6000.c (rs6000_linux64_override_options): ..this,
new function.  Tweak non-biarch test and clearing of
profile_kernel to work with freebsd64.h.

diff --git a/gcc/config/rs6000/freebsd64.h b/gcc/config/rs6000/freebsd64.h
index c9913638ffb..6984ca5a107 100644
--- a/gcc/config/rs6000/freebsd64.h
+++ b/gcc/config/rs6000/freebsd64.h
@@ -78,65 +78,7 @@ extern int dot_symbols;
 
 #undef  SUBSUBTARGET_OVERRIDE_OPTIONS
 #define SUBSUBTARGET_OVERRIDE_OPTIONS  \
-  do   \
-{  \
-  if (!global_options_set.x_rs6000_alignment_flags)\
-   rs6000_alignment_flags = MASK_ALIGN_NATURAL;\
-  if (TARGET_64BIT)\
-   {   \
- if (DEFAULT_ABI != ABI_AIX)   \
-   {   \
- rs6000_current_abi = ABI_AIX; \
- error (INVALID_64BIT, "call");\
-   }   \
- dot_symbols = !strcmp (rs6000_abi_name, "aixdesc");   \
- if (rs6000_isa_flags & OPTION_MASK_RELOCATABLE)   \
-   {   \
- rs6000_isa_flags &= ~OPTION_MASK_RELOCATABLE; \
- error (INVALID_64BIT, "relocatable"); \
-   }   \
- if (ELFv2_ABI_CHECK)  \
-   {   \
- rs6000_current_abi = ABI_ELFv2;   \
- if (dot_symbols)  \
-   error ("%<-mcall-aixdesc%> incompatible with %<-mabi=elfv2%>"); 
\
-   }   \
- if (rs6000_isa_flags & OPTION_MASK_EABI)  \
-   {   \
- rs6000_isa_flags &= ~OPTION_MASK_EABI;\
- error (INVALID_64BIT, "eabi");\
-   }   \
- if (TARGET_PROTOTYPE) \
-   {   \
- target_prototype = 0; \
- error (INVALID_64BIT, "prototype");   \
-   }   \
- if ((rs6000_isa_flags & OPTION_MASK_POWERPC64) == 0)  \
-   {   \
- rs6000_isa_flags |= OPTION_MASK_POWERPC64;\
- error ("%<-m64%> requires a PowerPC64 cpu");  \
-   }   \
-  if ((rs6000_isa_flags_explicit   \
-   & OPTION_MASK_MINIMAL_TOC) != 0)\
-   {   \
- if (global_options_set.x_rs6000_current_cmodel\
- && rs6000_current_cmodel != CMODEL_SMALL) \
-   error ("%<-mcmodel%> incompatible w

[RS6000] ICE in decompose, at rtl.h:2282

2020-10-01 Thread Alan Modra via Gcc-patches
during RTL pass: fwprop1
gcc.dg/pr82596.c: In function 'test_cststring':
gcc.dg/pr82596.c:27:1: internal compiler error: in decompose, at rtl.h:2282

-m32 gcc/testsuite/gcc.dg/pr82596.c fails along with other tests after
applying rtx_cost patches, which exposed a backend bug.

legitimize_address when presented with the following address
(plus (reg) (const_int 0x7))
attempts to rewrite it as a high/low sum.  The low part is 0x, or
-1, making the high part 0x8000.  But this is no longer canonical
for SImode.
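
Spelling out the arithmetic with an assumed offset of 0x7fffffff
(purely for illustration, the constant above isn't reproduced
exactly):

  low  = sign-extended low 16 bits = (HOST_WIDE_INT) (short) 0xffff = -1
  high = 0x7fffffff - (-1)         = 0x80000000

and 0x80000000 as a positive HOST_WIDE_INT is not a canonical SImode
CONST_INT, which is what gen_int_mode fixes by sign extending where
GEN_INT does not.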

Bootstrapped and regression tested powerpc64-linux biarch and
powerpc64le-linux.  OK?

* config/rs6000/rs6000.c (rs6000_legitimize_address): Properly
sign extend high part of address constant.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 375fff59928..d0924d59a65 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8364,7 +8364,7 @@ rs6000_legitimize_address (rtx x, rtx oldx 
ATTRIBUTE_UNUSED,
low_int = 0;
   high_int = INTVAL (XEXP (x, 1)) - low_int;
   sum = force_operand (gen_rtx_PLUS (Pmode, XEXP (x, 0),
-GEN_INT (high_int)), 0);
+gen_int_mode (high_int, Pmode)), 0);
   return plus_constant (Pmode, sum, low_int);
 }
   else if (GET_CODE (x) == PLUS

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] -mno-minimal-toc vs. power10 pcrelative

2020-10-01 Thread Alan Modra via Gcc-patches
Hi Segher,
On Thu, Oct 01, 2020 at 01:22:07PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Oct 01, 2020 at 10:57:48PM +0930, Alan Modra wrote:
> > On Wed, Sep 30, 2020 at 03:56:32PM -0500, Segher Boessenkool wrote:
> > > On Wed, Sep 30, 2020 at 05:01:45PM +0930, Alan Modra wrote:
> > > > * config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
> > > > set -mcmodel=small for -mno-minimal-toc when pcrel.
> > > 
> > > > - SET_CMODEL (CMODEL_SMALL);\
> > > > + if (TARGET_MINIMAL_TOC\
> > > > + || !(TARGET_PCREL \
> > > > +  || (PCREL_SUPPORTED_BY_OS\
> > > > +  && (rs6000_isa_flags_explicit\
> > > > +  & OPTION_MASK_PCREL) == 0))) \
> > > > +   SET_CMODEL (CMODEL_SMALL);  \
> > > 
> > > Please write this in a more readable way?  With some "else" statements,
> > > perhaps.
> > > 
> > > It is also fine to SET_CMODEL twice if that makes for simpler code.
> > 
> > Committed as per your suggestion.
> 
> Thanks.
> 
> > I was looking at it again today
> > with the aim of converting this ugly macro to a function, and spotted
> > the duplication in freebsd64.h.  Which has some bit-rot.
> > 
> > Do you like the following?  rs6000_linux64_override_options is
> > functionally the same as what was in linux64.h.  I built various
> > configurations to test the change, powerpc64-linux, powerpc64le-linux
> > without any 32-bit targets enabled, powerpc64-freebsd12.0.
> 
> Please do this as two patches?  One the refactoring without any
> functional changes (which is pre-approved -- the name "linux64" isn't
> great if you use it in other OSes as well, but I cannot think of a
> better name either),

The patch as posted has no functional changes.  I even avoided
formatting changes as much as possible.  The only changes were those
necessary to use the code from linux64.h in a powerpc64-freebsd
compiler, where

#define TARGET_PROFILE_KERNEL 0
..
TARGET_PROFILE_KERNEL = 0;
doesn't work, nor does

if (!RS6000_BI_ARCH_P)
  error (INVALID_32BIT, "32");
when RS6000_BI_ARCH_P is undefined.

> and the other the actual change (which probably is
> fine as well, but it is hard to see with the patch like this).

I do have a followup patch.  Commit c6be439b37 wrongly left a block
of code inside an "else" block, which changed the default for power10
TARGET_NO_FP_IN_TOC accidentally.  We don't want FP constants in the
TOC when -mcmodel=medium can address them just as efficiently outside
the TOC.

* config/rs6000/rs6000.c (rs6000_linux64_override_options):
Formatting.  Correct setting of TARGET_NO_FP_IN_TOC and
TARGET_NO_SUM_IN_TOC.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 48f3cdec440..a1651551ff2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3493,8 +3493,7 @@ rs6000_linux64_override_options ()
}
   if (!global_options_set.x_rs6000_current_cmodel)
SET_CMODEL (CMODEL_MEDIUM);
-  if ((rs6000_isa_flags_explicit
-  & OPTION_MASK_MINIMAL_TOC) != 0)
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_MINIMAL_TOC) != 0)
{
  if (global_options_set.x_rs6000_current_cmodel
  && rs6000_current_cmodel != CMODEL_SMALL)
@@ -3503,23 +3502,18 @@ rs6000_linux64_override_options ()
SET_CMODEL (CMODEL_SMALL);
  else if (TARGET_PCREL
   || (PCREL_SUPPORTED_BY_OS
-  && (rs6000_isa_flags_explicit
-  & OPTION_MASK_PCREL) == 0))
+  && (rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0))
/* Ignore -mno-minimal-toc.  */
;
  else
SET_CMODEL (CMODEL_SMALL);
}
-  else
+  if (rs6000_current_cmodel != CMODEL_SMALL)
{
- if (rs6000_current_cmodel != CMODEL_SMALL)
-   {
- if (!global_options_set.x_TARGET_NO_FP_IN_TOC)
-   TARGET_NO_FP_IN_TOC
- = rs6000_current_cmodel == CMODEL_MEDIUM;
- if (!global_options_set.x_TARGET_NO_SUM_IN_TOC)
-   TARGET_NO_SUM_IN_TOC = 0;
-   }
+ if (!global_options_set.x_TARGET_NO_FP_IN_TOC)
+   TARGET_NO_FP_IN_TOC = rs6000_current_cmodel == CMODEL_MEDIUM;
+ if (!global_options_set.x_TARGET_NO_SUM_IN_TOC)
+   TARGET_NO_SUM_IN_TOC = 0;
}
   if (TARGET_PLTSEQ && DEFAULT_ABI != ABI_ELFv2)
{


-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] calls.c:can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check

2020-10-02 Thread Alan Modra via Gcc-patches
This moves an #ifdef block of code from calls.c to
targetm.function_ok_for_sibcall.  Only two targets, x86 and rs6000,
define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
that might vary depending on the called function.  Macros like
UNITS_PER_WORD don't change over a function boundary, nor does the
MIPS ABI, nor does TARGET_64BIT on PA-RISC.  Other targets are even
more trivially seen to not need the calls.c code.

Besides cleaning up a small piece of #ifdef code, the motivation for
this patch is to allow tail calls on PowerPC for functions that
require less reg_parm_stack_space than their caller.  The original
code in calls.c only permitted tail calls when exactly equal.
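
A concrete shape that the old check knocked back, as a sketch (not
the new testcase itself):

extern int two_args (int a, int b);

int
nine_args (int a, int b, int c, int d, int e, int f, int g, int h, int i)
{
  /* nine_args has a parameter save area since its ninth arg is passed
     on the stack, two_args needs none, so the space required only
     shrinks and the call can now be a sibcall on ELFv2.  */
  return two_args (a, b);
}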

Bootstrapped and regression tested powerpc64le-linux and
x86_64-linux.  OK for master?

PR middle-end/97267
* calls.h (maybe_complain_about_tail_call): Declare.
* calls.c (maybe_complain_about_tail_call): Make global.
(can_implement_as_sibling_call_p): Move REG_PARM_STACK_SPACE
check to..
* config/i386/i386.c (ix86_function_ok_for_sibcall): ..here, and..
* config/rs6000/rs6000-logue.c (rs6000_function_ok_for_sibcall): ..
here.  Modify to allow reg_parm_stack_space less or equal to
caller, and delete redundant OUTGOING_REG_PARM_STACK_SPACE test.
* testsuite/gcc.target/powerpc/pr97267.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index ed4363811c8..df7324f9343 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1873,7 +1873,7 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 /* Issue an error if CALL_EXPR was flagged as requiring
tall-call optimization.  */
 
-static void
+void
 maybe_complain_about_tail_call (tree call_expr, const char *reason)
 {
   gcc_assert (TREE_CODE (call_expr) == CALL_EXPR);
@@ -3501,20 +3501,6 @@ can_implement_as_sibling_call_p (tree exp,
   return false;
 }
 
-#ifdef REG_PARM_STACK_SPACE
-  /* If outgoing reg parm stack space changes, we cannot do sibcall.  */
-  if (OUTGOING_REG_PARM_STACK_SPACE (funtype)
-  != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl))
-  || (reg_parm_stack_space != REG_PARM_STACK_SPACE 
(current_function_decl)))
-{
-  maybe_complain_about_tail_call (exp,
- "inconsistent size of stack space"
- " allocated for arguments which are"
- " passed in registers");
-  return false;
-}
-#endif
-
   /* Check whether the target is able to optimize the call
  into a sibcall.  */
   if (!targetm.function_ok_for_sibcall (fndecl, exp))
diff --git a/gcc/calls.h b/gcc/calls.h
index dfb951ca45b..6d4feb59dd0 100644
--- a/gcc/calls.h
+++ b/gcc/calls.h
@@ -133,6 +133,7 @@ extern bool reference_callee_copied (CUMULATIVE_ARGS *,
 extern void maybe_warn_alloc_args_overflow (tree, tree, tree[2], int[2]);
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 extern bool maybe_warn_nonstring_arg (tree, tree);
+extern void maybe_complain_about_tail_call (tree, const char *);
 extern bool get_size_range (tree, tree[2], bool = false);
 extern rtx rtx_for_static_chain (const_tree, bool);
 extern bool cxx17_empty_base_field_p (const_tree);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f684954af81..58fc5280935 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -939,6 +939,19 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   decl_or_type = type;
 }
 
+  /* If outgoing reg parm stack space changes, we cannot do sibcall.  */
+  if (OUTGOING_REG_PARM_STACK_SPACE (type)
+  != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl))
+  || (REG_PARM_STACK_SPACE (decl_or_type)
+ != REG_PARM_STACK_SPACE (current_function_decl)))
+{
+  maybe_complain_about_tail_call (exp,
+ "inconsistent size of stack space"
+ " allocated for arguments which are"
+ " passed in registers");
+  return false;
+}
+
   /* Check that the return value locations are the same.  Like
  if we are returning floats on the 80387 register stack, we cannot
  make a sibcall from a function that doesn't return a float to a
diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index d90cd5736e1..814b549e4ca 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -30,6 +30,7 @@
 #include "df.h"
 #include "tm_p.h"
 #include "ira.h"
+#include "calls.h"
 #include "print-tree.h"
 #include "varasm.h"
 #include "explow.h"
@@ -1133,6 +1134,17 @@ rs6000_function_ok_for_sibcall (tree decl, tree exp)
   else
 fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
 
+  /* If reg parm stack space increases, we cannot sibcall.  */
+  if (REG_PARM_STACK_SPACE (decl ? decl : fntype)
+  > REG_PARM_STACK_SPACE (current_function_decl))
+{
+   

Re: [PATCH] calls.c:can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check

2020-10-02 Thread Alan Modra via Gcc-patches
On Fri, Oct 02, 2020 at 04:41:05PM +0930, Alan Modra wrote:
> This moves an #ifdef block of code from calls.c to
> targetm.function_ok_for_sibcall.

Not only did I miss cc'ing the i386 maintainers, I missed seeing that
the ATTRIBUTE_UNUSED reg_parm_stack_space param of
can_implement_as_sibling_call_p is now always unused.  Please consider
that parameter removed.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] calls.c:can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check

2020-10-04 Thread Alan Modra via Gcc-patches
Hi Segher,

On Fri, Oct 02, 2020 at 01:50:24PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 02, 2020 at 04:41:05PM +0930, Alan Modra wrote:
> > This moves an #ifdef block of code from calls.c to
> > targetm.function_ok_for_sibcall.  Only two targets, x86 and rs6000,
> > define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
> > that might vary depending on the called function.  Macros like
> > UNITS_PER_WORD don't change over a function boundary, nor does the
> > MIPS ABI, nor does TARGET_64BIT on PA-RISC.  Other targets are even
> > more trivially seen to not need the calls.c code.
> > 
> > Besides cleaning up a small piece of #ifdef code, the motivation for
> > this patch is to allow tail calls on PowerPC for functions that
> > require less reg_parm_stack_space than their caller.  The original
> > code in calls.c only permitted tail calls when exactly equal.
> 
> > +  /* If reg parm stack space increases, we cannot sibcall.  */
> > +  if (REG_PARM_STACK_SPACE (decl ? decl : fntype)
> > +  > REG_PARM_STACK_SPACE (current_function_decl))
> > +{
> > +  maybe_complain_about_tail_call (exp,
> > + "inconsistent size of stack space"
> > + " allocated for arguments which are"
> > + " passed in registers");
> > +  return false;
> > +}
> 
> Maybe change the message?  You allow all sizes smaller or equal than
> the current size, "inconsistent" isn't very great for that.

We're talking about just two sizes here.  For 64-bit ELFv2 the reg
parm save size is either 0 or 64 bytes.  Yes, a better message would
be "caller lacks stack space allocated for aguments passed in
registers, required by callee".

Note that I'll likely be submitting a further patch that removes the
above code in rs6000-logue.c.  I thought is safer to only make a small
change at the same time as moving code around.

The reasoning behind a followup patch is:
a) The generic code checks that arg passing space in the called
   function is not greater than that in the current function, and,
b) ELFv2 only allocates reg_parm_stack_space when some parameter is
   passed on the stack.
Point (b) means that zero reg_parm_stack_space implies zero stack
arg space, and non-zero reg_parm_stack_space implies non-zero stack
arg space.  So the case of 0 reg_parm_stack_space in the caller and 64
in the callee will be caught by (a).
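
Enumerating the ELFv2 combinations (0 or 64 bytes each side):

  caller 64, callee 64 : passes both the old and new checks
  caller 64, callee  0 : what the relaxed rs6000 test now allows
  caller  0, callee  0 : passes both
  caller  0, callee 64 : by (b) the callee has stack args and the
                         caller has none, so (a) already rejects it

which is why the rs6000-logue.c comparison adds nothing over (a).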

Also, there's a bug in the code I moved from calls.c.  It should be
using INCOMING_REG_PARM_STACK_SPACE, to properly compare space known
to be allocated for the current function vs. space needed for the
called function.  For an explanation of INCOMING_REG_PARM_STACK_SPACE
see https://gcc.gnu.org/pipermail/gcc-patches/2014-May/389867.html
Of course that bug doesn't matter in this context because it's always
been covered up by (a).

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH 4/8] [RS6000] rs6000_rtx_costs tidy break/return

2020-10-07 Thread Alan Modra via Gcc-patches
Most cases use "return false" rather than breaking out of the switch.
Do so in all cases.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Tidy break/return.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bc5e51aa5ce..383d2901c9f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21371,7 +21371,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = rs6000_cost->fp;
   else
*total = rs6000_cost->dmul;
-  break;
+  return false;
 
 case DIV:
 case MOD:
@@ -21539,7 +21539,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total = rs6000_cost->fp;
  return false;
}
-  break;
+  return false;
 
 case NE:
 case EQ:
@@ -21577,13 +21577,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total = 0;
  return true;
}
-  break;
+  return false;
 
 default:
-  break;
+  return false;
 }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */


[PATCH 1/8] [RS6000] rs6000_rtx_costs comment

2020-10-07 Thread Alan Modra via Gcc-patches
This lays out the ground rules for following patches.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Expand comment.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b58eeae2b98..97180bb3819 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21208,7 +21208,46 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
-   scanned.  In either case, *TOTAL contains the cost result.  */
+   scanned.  In either case, *TOTAL contains the cost result.
+
+   1) Calls from places like optabs.c:avoid_expensive_constant will
+   come here with OUTER_CODE set to an operation such as AND with X
+   being a CONST_INT or other CONSTANT_P type.  This will be compared
+   against set_src_cost, where we'll come here with OUTER_CODE as SET
+   and X the same constant.
+
+   2) Calls from places like combine.c:distribute_and_simplify_rtx are
+   asking whether a possibly quite complex SET_SRC can be implemented
+   more cheaply than some other logically equivalent SET_SRC.
+
+   3) Calls from places like ifcvt.c:default_noce_conversion_profitable_p
+   will come here via seq_cost which calls set_rtx_cost on single set
+   insns.  set_rtx_cost passes the pattern of a SET insn in X with
+   OUTER_CODE as INSN.  The overall cost should be comparable to
+   rs6000_insn_cost since the code is comparing one insn sequence
+   (some of which may be costed by insn_cost) against another sequence.
+   Note the difference between set_rtx_cost, which costs the entire
+   SET, and set_src_cost, which costs just the SET_SRC.
+
+   4) Calls from places like cprop.c:try_replace_reg will also come
+   here via set_rtx_cost, with X either a valid pattern of a SET or
+   one where some registers have been replaced with constants.  The
+   replacements may make the SET invalid, for example if
+ (set (reg1) (and (reg2) (const_int 0xfff)))
+   has reg2 replaced as
+ (set (reg1) (and (symbol_ref) (const_int 0xfff)))
+   then the replacement can't be implemented in one instruction and
+   really the cost should be higher by one instruction.  However,
+   the cost for invalid insns doesn't matter much except that a
+   higher cost may lead to their rejection earlier.
+
+   5) fwprop.c:should_replace_address puts yet another wrinkle on this
+   function, where we prefer an address calculation that is more
+   complex yet has the same address_cost.  In this case "more
+   complex" is determined by having a higher set_src_cost.  So for
+   example, if we want a plain (reg) address to be replaced with
+   (plus (reg) (const)) when possible then PLUS needs to cost more
+   than zero here.  */
 
 static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,


[PATCH 5/8] [RS6000] rs6000_rtx_costs cost IOR

2020-10-07 Thread Alan Modra via Gcc-patches
* config/rs6000/rs6000.c (rotate_insert_cost): New function.
(rs6000_rtx_costs): Cost IOR.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 383d2901c9f..15a806fe307 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21206,6 +21206,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 && get_attr_cannot_copy (insn);
 }
 
+/* Handle rtx_costs for scalar integer rotate and insert insns.  */
+
+static bool
+rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
+   int *total)
+{
+  if (GET_CODE (right) == AND
+  && CONST_INT_P (XEXP (right, 1))
+  && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+{
+  rtx leftop = XEXP (left, 0);
+  rtx rightop = XEXP (right, 0);
+
+  /* rotlsi3_insert_5.  */
+  if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == SImode
+ && UINTVAL (XEXP (left, 1)) != 0
+ && UINTVAL (XEXP (right, 1)) != 0
+ && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+   return true;
+  /* rotldi3_insert_6.  */
+  if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+   return true;
+  /* rotldi3_insert_7.  */
+  if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+   return true;
+
+  rtx mask = 0;
+  rtx shift = leftop;
+  rtx_code shift_code = GET_CODE (shift);
+  /* rotl<mode>3_insert.  */
+  if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = right;
+  else
+   {
+ shift = rightop;
+ shift_code = GET_CODE (shift);
+ /* rotl<mode>3_insert_2.  */
+ if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = left;
+   }
+  if (mask
+ && CONST_INT_P (XEXP (shift, 1))
+ && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+   {
+ *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed);
+ *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed);
+ return true;
+   }
+}
+  /* rotl<mode>3_insert_3.  */
+  if (GET_CODE (right) == ASHIFT
+  && CONST_INT_P (XEXP (right, 1))
+  && (INTVAL (XEXP (right, 1))
+ == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+{
+  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+  *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed);
+  return true;
+}
+  /* rotl<mode>3_insert_4.  */
+  if (GET_CODE (right) == LSHIFTRT
+  && CONST_INT_P (XEXP (right, 1))
+  && mode == SImode
+  && (INTVAL (XEXP (right, 1))
+ + exact_log2 (-UINTVAL (XEXP (left, 1))) == 32))
+{
+  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+  *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed);
+  return true;
+}
+  return false;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.
@@ -21253,7 +21338,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
-  rtx right;
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21435,7 +21520,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   right = XEXP (x, 1);
   if (CONST_INT_P (right))
{
- rtx left = XEXP (x, 0);
+ left = XEXP (x, 0);
  rtx_code left_code = GET_CODE (left);
 
  /* rotate-and-mask: 1 insn.  */
@@ -21452,9 +21537,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case IOR:
-  /* FIXME */
   *total = COSTS_N_INSNS (1);
-  return true;
+  left = XEXP (x, 0);
+  if (GET_CODE (left) == AND
+ && CONST_INT_P (XEXP (left, 1)))
+   {
+ right = XEXP (x, 1);
+ if (rotate_insert_cost (left, right, mode, speed, total))
+   return true;
+   }
+  return false;
 
 case CLZ:
 case XOR:


[PATCH 3/8] [RS6000] rs6000_rtx_costs tidy AND

2020-10-07 Thread Alan Modra via Gcc-patches
* config/rs6000/rs6000.c (rs6000_rtx_costs): Tidy AND code.
Don't avoid recursion on const_int shift count.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e870ba0039a..bc5e51aa5ce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21253,6 +21253,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21430,7 +21431,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case AND:
-  if (CONST_INT_P (XEXP (x, 1)))
+  *total = COSTS_N_INSNS (1);
+  right = XEXP (x, 1);
+  if (CONST_INT_P (right))
{
  rtx left = XEXP (x, 0);
  rtx_code left_code = GET_CODE (left);
@@ -21439,17 +21442,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  if ((left_code == ROTATE
   || left_code == ASHIFT
   || left_code == LSHIFTRT)
- && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+ && rs6000_is_valid_shift_mask (right, left, mode))
{
- *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
- if (!CONST_INT_P (XEXP (left, 1)))
- *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
- *total += COSTS_N_INSNS (1);
+ *total += rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
+ *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
  return true;
}
}
-
-  *total = COSTS_N_INSNS (1);
   return false;
 
 case IOR:


[PATCH 7/8] [RS6000] rs6000_rtx_costs reduce cost for SETs

2020-10-07 Thread Alan Modra via Gcc-patches
The aim of this patch is to make rtx_costs for SETs closer to
insn_cost for SETs.  One visible effect on powerpc code is increased
if-conversion.
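
As a rough illustration (not a testcase from this series), this is
the shape of code affected: whether the branch below becomes a
compare plus conditional-move sequence is decided by seq_cost over a
few single-set insns, which is exactly where the SET costing is used.

  long
  sel (long a, long b, long x, long y)
  {
    return a < b ? x : y;      /* noce if-conversion candidate */
  }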

* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
operands.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 76aedbfae6f..d455aa52427 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21684,6 +21684,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
}
   return false;
 
+case SET:
+  /* On entry the value in *TOTAL is the number of general purpose
+regs being set, multiplied by COSTS_N_INSNS (1).  Handle
+costing of set operands specially since in most cases we have
+an instruction rather than just a piece of RTL and should
+return a cost comparable to insn_cost.  That's a little
+complicated because in some cases the cost of SET operands is
+non-zero, see point 5 above and cost of PLUS for example, and
+in others it is zero, for example for (set (reg) (reg)).
+But (set (reg) (reg)) has the same insn_cost as
+(set (reg) (plus (reg) (reg))).  Hack around this by
+subtracting COSTS_N_INSNS (1) from the operand cost in cases
+where we add at least COSTS_N_INSNS (1) for some operation.
+However, don't do so for constants.  Constants might cost
+more than zero when they require more than one instruction,
+and we do want the cost of extra instructions.  */
+  {
+   rtx_code src_code = GET_CODE (SET_SRC (x));
+   if (src_code == CONST_INT
+   || src_code == CONST_DOUBLE
+   || src_code == CONST_WIDE_INT)
+ return false;
+   int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+   + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+   if (set_cost >= COSTS_N_INSNS (1))
+ *total += set_cost - COSTS_N_INSNS (1);
+   return true;
+  }
+
 default:
   return false;
 }


[PATCH 2/8] [RS6000] rs6000_rtx_costs for AND

2020-10-07 Thread Alan Modra via Gcc-patches
The existing "case AND" in this function is not sufficient for
optabs.c:avoid_expensive_constant usage, where the AND is passed in
outer_code.  We'd like to cost AND of rs6000_is_valid_and_mask
or rs6000_is_valid_2insn_and variety there, so that those masks aren't
seen as expensive (ie. better to load to a reg then AND).
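
For instance (illustrative only), the mask below is a single
rotate-and-mask insn; if rtx_costs calls it expensive,
avoid_expensive_constant forces it into a register first and the AND
costs an extra instruction.

  unsigned long
  f (unsigned long x)
  {
    return x & 0x0ffffff0;     /* contiguous run of ones: one rlwinm */
  }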

* config/rs6000/rs6000.c (rs6000_rtx_costs): Combine CONST_INT
AND handling with IOR/XOR.  Move costing for AND with
rs6000_is_valid_and_mask or rs6000_is_valid_2insn_and to
CONST_INT.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 97180bb3819..e870ba0039a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21264,16 +21264,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
|| outer_code == MINUS)
   && (satisfies_constraint_I (x)
   || satisfies_constraint_L (x)))
- || (outer_code == AND
- && (satisfies_constraint_K (x)
- || (mode == SImode
- ? satisfies_constraint_L (x)
- : satisfies_constraint_J (x
- || ((outer_code == IOR || outer_code == XOR)
+ || ((outer_code == AND || outer_code == IOR || outer_code == XOR)
  && (satisfies_constraint_K (x)
  || (mode == SImode
  ? satisfies_constraint_L (x)
  : satisfies_constraint_J (x
+ || (outer_code == AND
+ && rs6000_is_valid_and_mask (x, mode))
  || outer_code == ASHIFT
  || outer_code == ASHIFTRT
  || outer_code == LSHIFTRT
@@ -21310,7 +21307,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
|| outer_code == IOR
|| outer_code == XOR)
   && (INTVAL (x)
-  & ~ (unsigned HOST_WIDE_INT) 0xffffffff) == 0))
+  & ~ (unsigned HOST_WIDE_INT) 0xffffffff) == 0)
+  || (outer_code == AND
+  && rs6000_is_valid_2insn_and (x, mode)))
{
  *total = COSTS_N_INSNS (1);
  return true;
@@ -21448,26 +21447,6 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total += COSTS_N_INSNS (1);
  return true;
}
-
- /* rotate-and-mask (no rotate), andi., andis.: 1 insn.  */
- HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
- if (rs6000_is_valid_and_mask (XEXP (x, 1), mode)
- || (val & 0xffff) == val
- || (val & 0xffff0000) == val
- || ((val & 0xffff) == 0 && mode == SImode))
-   {
- *total = rtx_cost (left, mode, AND, 0, speed);
- *total += COSTS_N_INSNS (1);
- return true;
-   }
-
- /* 2 insns.  */
- if (rs6000_is_valid_2insn_and (XEXP (x, 1), mode))
-   {
- *total = rtx_cost (left, mode, AND, 0, speed);
- *total += COSTS_N_INSNS (2);
- return true;
-   }
}
 
   *total = COSTS_N_INSNS (1);


[PATCH 0/8] [RS6000] rs6000_rtx_costs V2

2020-10-07 Thread Alan Modra via Gcc-patches
This is a revised version of the patch series posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553919.html

Patch 8/8 in particular changes rather a lot more than the original
!speed changes.  Bootstrapped and regression tested powerpc64le-linux
and powerpc64-linux biarch.

Alan Modra (8):
  [RS6000] rs6000_rtx_costs comment
  [RS6000] rs6000_rtx_costs for AND
  [RS6000] rs6000_rtx_costs tidy AND
  [RS6000] rs6000_rtx_costs tidy break/return
  [RS6000] rs6000_rtx_costs cost IOR
  [RS6000] rs6000_rtx_costs multi-insn constants
  [RS6000] rs6000_rtx_costs reduce cost for SETs
  [RS6000] rs6000_rtx_costs for !speed

 gcc/config/rs6000/rs6000.c | 516 +
 gcc/config/rs6000/rs6000.h |  23 --
 2 files changed, 350 insertions(+), 189 deletions(-)



[PATCH 6/8] [RS6000] rs6000_rtx_costs multi-insn constants

2020-10-07 Thread Alan Modra via Gcc-patches
This small patch to rs6000_rtx_costs considerably improves code
generated for large constants in 64-bit code, teaching gcc that it is
better to load a constant from memory than to generate a sequence of
up to five dependent instructions.  Note that the rs6000 backend does
generate large constants as loads from memory at expand time but
optimisation passes replace them with SETs of the value due to not
having correct costs.
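
As a rough sketch of the trade-off (not taken from a compiler dump):

  /* Illustrative only: built inline, a constant like this one needs a
     dependent sequence along the lines of
       lis; ori; sldi 32; oris; ori   (5 insns)
     whereas a TOC load is
       addis; ld                      (2 insns).  */
  unsigned long
  big (void)
  {
    return 0x123456789abcdef0UL;
  }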

PR 94393
* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn
constants.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 15a806fe307..76aedbfae6f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21343,7 +21343,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
   switch (code)
 {
-  /* On the RS/6000, if it is valid in the insn, it is free.  */
+  /* (reg) is costed at zero by rtlanal.c:rtx_cost.  That sets a
+baseline for rtx costs:  If a constant is valid in an insn,
+it is free.  */
 case CONST_INT:
   if (((outer_code == SET
|| outer_code == PLUS
@@ -21404,6 +21406,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case CONST_DOUBLE:
 case CONST_WIDE_INT:
+  /* Subtract one insn here for consistency with the above code
+that returns one less than the actual number of insns for
+SETs.  Don't subtract one for other than SETs, because other
+operations will require the constant to be loaded to a
+register before performing the operation.  All special cases
+for codes other than SET must be handled before we get
+here.  */
+  *total = COSTS_N_INSNS (num_insns_constant (x, mode)
+ - (outer_code == SET ? 1 : 0));
+  return true;
+
 case CONST:
 case HIGH:
 case SYMBOL_REF:


[PATCH 8/8] [RS6000] rs6000_rtx_costs for !speed

2020-10-07 Thread Alan Modra via Gcc-patches
When optimizing for size we shouldn't be using metrics based on speed
or vice-versa.  rtlanal.c:get_full_rtx_cost wants both speed and size
metric from rs6000_rtx_costs independent of the global optimize_size.

Note that the patch changes param_simultaneous_prefetches,
param_l1_cache_size, param_l1_cache_line_size and param_l2_cache_size,
which were previously all set to zero for optimize_size.  I think that
was a bug.  Those params are a function of the processor.
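
For reference, the generic code carries both metrics side by side
(the rtl.h structure, quoted from memory), which is why the backend
needs to answer for either one regardless of the global optimize_size:

  struct full_rtx_costs
  {
    int speed;
    int size;
  };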

* config/rs6000/rs6000.h (rs6000_cost): Don't declare.
(struct processor_costs): Move to..
* config/rs6000/rs6000.c: ..here.
(rs6000_cost): Make static.
(rs6000_option_override_internal): Ignore optimize_size when
setting up rs6000_cost.
(rs6000_insn_cost): Take into account optimize_size here
instead.
(rs6000_emit_parity): Likewise.
(rs6000_rtx_costs): Don't use rs6000_cost when !speed.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d455aa52427..14ecbad5df4 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -497,7 +497,26 @@ rs6000_store_data_bypass_p (rtx_insn *out_insn, rtx_insn 
*in_insn)
 
 /* Processor costs (relative to an add) */
 
-const struct processor_costs *rs6000_cost;
+struct processor_costs {
+  const int mulsi;   /* cost of SImode multiplication.  */
+  const int mulsi_const;  /* cost of SImode multiplication by constant.  */
+  const int mulsi_const9; /* cost of SImode mult by short constant.  */
+  const int muldi;   /* cost of DImode multiplication.  */
+  const int divsi;   /* cost of SImode division.  */
+  const int divdi;   /* cost of DImode division.  */
+  const int fp;  /* cost of simple SFmode and DFmode insns.  */
+  const int dmul;/* cost of DFmode multiplication (and fmadd).  */
+  const int sdiv;/* cost of SFmode division (fdivs).  */
+  const int ddiv;/* cost of DFmode division (fdiv).  */
+  const int cache_line_size;/* cache line size in bytes. */
+  const int l1_cache_size; /* size of l1 cache, in kilobytes.  */
+  const int l2_cache_size; /* size of l2 cache, in kilobytes.  */
+  const int simultaneous_prefetches; /* number of parallel prefetch
+   operations.  */
+  const int sfdf_convert;  /* cost of SF->DF conversion.  */
+};
+
+static const struct processor_costs *rs6000_cost;
 
 /* Instruction size costs on 32bit processors.  */
 static const
@@ -4618,131 +4637,128 @@ rs6000_option_override_internal (bool global_init_p)
 }
 
   /* Initialize rs6000_cost with the appropriate target costs.  */
-  if (optimize_size)
-rs6000_cost = TARGET_POWERPC64 ? &size64_cost : &size32_cost;
-  else
-switch (rs6000_tune)
-  {
-  case PROCESSOR_RS64A:
-   rs6000_cost = &rs64a_cost;
-   break;
+  switch (rs6000_tune)
+{
+case PROCESSOR_RS64A:
+  rs6000_cost = &rs64a_cost;
+  break;
 
-  case PROCESSOR_MPCCORE:
-   rs6000_cost = &mpccore_cost;
-   break;
+case PROCESSOR_MPCCORE:
+  rs6000_cost = &mpccore_cost;
+  break;
 
-  case PROCESSOR_PPC403:
-   rs6000_cost = &ppc403_cost;
-   break;
+case PROCESSOR_PPC403:
+  rs6000_cost = &ppc403_cost;
+  break;
 
-  case PROCESSOR_PPC405:
-   rs6000_cost = &ppc405_cost;
-   break;
+case PROCESSOR_PPC405:
+  rs6000_cost = &ppc405_cost;
+  break;
 
-  case PROCESSOR_PPC440:
-   rs6000_cost = &ppc440_cost;
-   break;
+case PROCESSOR_PPC440:
+  rs6000_cost = &ppc440_cost;
+  break;
 
-  case PROCESSOR_PPC476:
-   rs6000_cost = &ppc476_cost;
-   break;
+case PROCESSOR_PPC476:
+  rs6000_cost = &ppc476_cost;
+  break;
 
-  case PROCESSOR_PPC601:
-   rs6000_cost = &ppc601_cost;
-   break;
+case PROCESSOR_PPC601:
+  rs6000_cost = &ppc601_cost;
+  break;
 
-  case PROCESSOR_PPC603:
-   rs6000_cost = &ppc603_cost;
-   break;
+case PROCESSOR_PPC603:
+  rs6000_cost = &ppc603_cost;
+  break;
 
-  case PROCESSOR_PPC604:
-   rs6000_cost = &ppc604_cost;
-   break;
+case PROCESSOR_PPC604:
+  rs6000_cost = &ppc604_cost;
+  break;
 
-  case PROCESSOR_PPC604e:
-   rs6000_cost = &ppc604e_cost;
-   break;
+case PROCESSOR_PPC604e:
+  rs6000_cost = &ppc604e_cost;
+  break;
 
-  case PROCESSOR_PPC620:
-   rs6000_cost = &ppc620_cost;
-   break;
+case PROCESSOR_PPC620:
+  rs6000_cost = &ppc620_cost;
+  break;
 
-  case PROCESSOR_PPC630:
-   rs6000_cost = &ppc630_cost;
-   break;
+case PROCESSOR_PPC630:
+  rs6000_cost = &ppc630_cost;
+  break;
 
-  case PROCESSOR_CELL:
-   rs6000_cost = &ppccell_cost;
-   break;
+case PROCESSOR_CELL:
+  rs6000_cost = &ppccell_cost;
+  break;
 
-  case PROCESSOR_PPC750:
-  case PROCESSOR_PPC7400:
-   

Re: [PATCH] Remove non-ANSI C macros in ansidecl.h.

2022-05-10 Thread Alan Modra via Gcc-patches
On Tue, May 10, 2022 at 10:57:47AM +0200, Martin Liška wrote:
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> @Alan: Are you fine with the change from binutils/gdb perspective?

I'm fine if this isn't going into the binutils-gdb repo immediately.
There are occurrences of PTR in the cgen generated parts of opcodes,
sim, and even gdb.  I have a few patches I haven't yet committed.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] libiberty: stop using PTR macro.

2022-05-10 Thread Alan Modra via Gcc-patches
On Tue, May 10, 2022 at 10:56:22AM +0200, Martin Liška wrote:

> diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c

> @@ -457,15 +457,15 @@ htab_empty (htab_t htab)
>else if (htab->free_with_arg_f != NULL)
>   (*htab->free_with_arg_f) (htab->alloc_arg, htab->entries);
>if (htab->alloc_with_arg_f != NULL)
> - htab->entries = (PTR *) (*htab->alloc_with_arg_f) (htab->alloc_arg, 
> nsize,
> -sizeof (PTR *));
> + htab->entries = (void **) (*htab->alloc_with_arg_f) (htab->alloc_arg, 
> nsize,
> +sizeof (void **));

Here, and below, the code should really be using "sizeof (void *)".
You may as well fix that nit while you're at it.  Also, indentation
looks wrong.

>else
> - htab->entries = (PTR *) (*htab->alloc_f) (nsize, sizeof (PTR *));
> + htab->entries = (void **) (*htab->alloc_f) (nsize, sizeof (void **));
>   htab->size = nsize;
>   htab->size_prime_index = nindex;
>  }
>else
> -memset (entries, 0, size * sizeof (PTR));
> +memset (entries, 0, size * sizeof (void *));
>htab->n_deleted = 0;
>htab->n_elements = 0;
>  }

> @@ -543,10 +543,10 @@ htab_expand (htab_t htab)
>  }
>  
>if (htab->alloc_with_arg_f != NULL)
> -nentries = (PTR *) (*htab->alloc_with_arg_f) (htab->alloc_arg, nsize,
> -   sizeof (PTR *));
> +nentries = (void **) (*htab->alloc_with_arg_f) (htab->alloc_arg, nsize,
> +   sizeof (void **));
>else
> -nentries = (PTR *) (*htab->alloc_f) (nsize, sizeof (PTR *));
> +nentries = (void **) (*htab->alloc_f) (nsize, sizeof (void **));
>if (nentries == NULL)
>  return 0;
>htab->entries = nentries;

Here too.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] libiberty: remove FINAL and OVERRIDE from ansidecl.h

2022-05-24 Thread Alan Modra via Gcc-patches
On Mon, May 23, 2022 at 07:42:29PM -0400, David Malcolm via Binutils wrote:
> Any objections, or is there a reason to keep these macros that I'm
> not aware of?  (and did I send this to all the pertinent lists?)

No objection from me.  These macros are not used anywhere in
binutils-gdb.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH 1/2] can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check V2

2020-11-02 Thread Alan Modra via Gcc-patches
On Sat, Oct 31, 2020 at 12:10:35AM +1030, Alan Modra wrote:
> Would it be better if I post the patches again, restructuring them as
> 1) completely no functional change just moving the existing condition
>to the powerpc and i386 target hooks, and
> 2) twiddling the powerpc target hook?

The no function change patch.

This moves an #ifdef block of code from calls.c to
targetm.function_ok_for_sibcall.  Only two targets, x86 and rs6000,
define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
that might vary depending on the called function.  Macros like
UNITS_PER_WORD don't change over a function boundary, nor does the
MIPS ABI, nor does TARGET_64BIT on PA-RISC.  Other targets are even
more trivially proven to not need the calls.c code.

Besides cleaning up a small piece of #ifdef code, the motivation for
this patch is to allow tail calls on PowerPC for functions that
require less reg_parm_stack_space than their caller.  The original
code in calls.c only permitted tail calls when exactly equal.  That
will be done in a followup patch (removing the code added to
rs6000-logue.c in this patch).

Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux,
and x86_64-linux.  OK?

PR middle-end/97267
* calls.h (maybe_complain_about_tail_call): Declare.
* calls.c (maybe_complain_about_tail_call): Make global.
(can_implement_as_sibling_call_p): Delete reg_parm_stack_space
param.  Adjust caller.  Move REG_PARM_STACK_SPACE check to..
* config/i386/i386.c (ix86_function_ok_for_sibcall): ..here, and..
* config/rs6000/rs6000-logue.c (rs6000_function_ok_for_sibcall): ..
here.

diff --git a/gcc/calls.c b/gcc/calls.c
index a8f459632f2..1a7632d2d48 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1922,7 +1922,7 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 /* Issue an error if CALL_EXPR was flagged as requiring
tall-call optimization.  */
 
-static void
+void
 maybe_complain_about_tail_call (tree call_expr, const char *reason)
 {
   gcc_assert (TREE_CODE (call_expr) == CALL_EXPR);
@@ -3525,7 +3525,6 @@ static bool
 can_implement_as_sibling_call_p (tree exp,
 rtx structure_value_addr,
 tree funtype,
-int reg_parm_stack_space ATTRIBUTE_UNUSED,
 tree fndecl,
 int flags,
 tree addr,
@@ -3550,20 +3549,6 @@ can_implement_as_sibling_call_p (tree exp,
   return false;
 }
 
-#ifdef REG_PARM_STACK_SPACE
-  /* If outgoing reg parm stack space changes, we cannot do sibcall.  */
-  if (OUTGOING_REG_PARM_STACK_SPACE (funtype)
-  != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl))
-  || (reg_parm_stack_space != REG_PARM_STACK_SPACE (current_function_decl)))
-{
-  maybe_complain_about_tail_call (exp,
- "inconsistent size of stack space"
- " allocated for arguments which are"
- " passed in registers");
-  return false;
-}
-#endif
-
   /* Check whether the target is able to optimize the call
  into a sibcall.  */
   if (!targetm.function_ok_for_sibcall (fndecl, exp))
@@ -4088,7 +4073,6 @@ expand_call (tree exp, rtx target, int ignore)
 try_tail_call = can_implement_as_sibling_call_p (exp,
 structure_value_addr,
 funtype,
-reg_parm_stack_space,
 fndecl,
 flags, addr, args_size);
 
diff --git a/gcc/calls.h b/gcc/calls.h
index f32b6308b58..b20d24bb888 100644
--- a/gcc/calls.h
+++ b/gcc/calls.h
@@ -133,6 +133,7 @@ extern bool reference_callee_copied (CUMULATIVE_ARGS *,
 extern void maybe_warn_alloc_args_overflow (tree, tree, tree[2], int[2]);
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 extern bool maybe_warn_nonstring_arg (tree, tree);
+extern void maybe_complain_about_tail_call (tree, const char *);
 enum size_range_flags
   {
/* Set to consider zero a valid range.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 502d24057b5..809c145b638 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -939,6 +939,19 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   decl_or_type = type;
 }
 
+  /* If outgoing reg parm stack space changes, we cannot do sibcall.  */
+  if ((OUTGOING_REG_PARM_STACK_SPACE (type)
+   != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl)))
+  || (REG_PARM_STACK_SPACE (decl_or_type)
+ != REG_PARM_STACK_SPACE (current_function_decl)))
+{
+  maybe_complain_about_tail_call (exp,
+ "incons

[PATCH 2/2] [RS6000] REG_PARM_STACK_SPACE check V2

2020-11-02 Thread Alan Modra via Gcc-patches
On PowerPC we can tail call if the callee has less or equal
REG_PARM_STACK_SPACE than the caller, as demonstrated by the
testcase.  So we should use

  /* If reg parm stack space increases, we cannot sibcall.  */
  if (REG_PARM_STACK_SPACE (decl ? decl : fntype)
  > INCOMING_REG_PARM_STACK_SPACE (current_function_decl))

and note the change to use INCOMING_REG_PARM_STACK_SPACE.
REG_PARM_STACK_SPACE has always been wrong there for PowerPC.  See
https://gcc.gnu.org/pipermail/gcc-patches/2014-May/389867.html for why
if you're curious.  Not that it matters, because PowerPC can do
without this check entirely, relying on a stack slot test in generic
code.

a) The generic code checks that arg passing stack in the callee is not
   greater than that in the caller, and,
b) ELFv2 only allocates reg_parm_stack_space when some parameter is
   passed on the stack.
Point (b) means that zero reg_parm_stack_space implies zero stack
space, and non-zero reg_parm_stack_space implies non-zero stack
space.  So the case of 0 reg_parm_stack_space in the caller and 64 in
the callee will be caught by (a).

Bootstrapped and regression tested powerpc64le-linux and biarch
powerpc64-linux. OK?

PR middle-end/97267
gcc/
* config/rs6000/rs6000-logue.c (rs6000_function_ok_for_sibcall):
Remove code checking REG_PARM_STACK_SPACE.
testsuite/
* gcc.target/powerpc/pr97267.c: New test.

diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 61eb7ce7ade..d90cd5736e1 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -30,7 +30,6 @@
 #include "df.h"
 #include "tm_p.h"
 #include "ira.h"
-#include "calls.h"
 #include "print-tree.h"
 #include "varasm.h"
 #include "explow.h"
@@ -1134,19 +1133,6 @@ rs6000_function_ok_for_sibcall (tree decl, tree exp)
   else
 fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
 
-  /* If outgoing reg parm stack space changes, we cannot do sibcall.  */
-  if ((OUTGOING_REG_PARM_STACK_SPACE (fntype)
-   != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl)))
-  || (REG_PARM_STACK_SPACE (decl ? decl : fntype)
- != REG_PARM_STACK_SPACE (current_function_decl)))
-{
-  maybe_complain_about_tail_call (exp,
- "inconsistent size of stack space"
- " allocated for arguments which are"
- " passed in registers");
-  return false;
-}
-
   /* We can't do it if the called function has more vector parameters
  than the current function; there's nowhere to put the VRsave code.  */
   if (TARGET_ALTIVEC_ABI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97267.c 
b/gcc/testsuite/gcc.target/powerpc/pr97267.c
new file mode 100644
index 000..cab46245fc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97267.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+static int __attribute__ ((__noclone__, __noinline__))
+reg_args (int j1, int j2, int j3, int j4, int j5, int j6, int j7, int j8)
+{
+  return j1 + j2 + j3 + j4 + j5 + j6 + j7 + j8;
+}
+
+int __attribute__ ((__noclone__, __noinline__))
+stack_args (int j1, int j2, int j3, int j4, int j5, int j6, int j7, int j8,
+   int j9)
+{
+  if (j9 == 0)
+return 0;
+  return reg_args (j1, j2, j3, j4, j5, j6, j7, j8);
+}
+
+/* { dg-final { scan-assembler-not {(?n)^\s+bl\s} } } */

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 1/3] Refactor copying decl section names

2020-11-10 Thread Alan Modra via Gcc-patches
On Tue, Nov 10, 2020 at 11:45:37AM -0700, Jeff Law wrote:
> 
> On 11/12/19 11:28 PM, Strager Neds wrote:
> > * gcc/cgraph.h (symtab_node::get_section): Constify.
> > (symtab_node::set_section): Declare new overload.
> > * gcc/symtab.c (symtab_node::set_section): Define new overload.
> > (symtab_node::copy_visibility_from): Use new overload of
> > symtab_node::set_section.
> > (symtab_node::resolve_alias): Same.
> > * gcc/tree.h (set_decl_section_name): Declare new overload.
> > * gcc/tree.c (set_decl_section_name): Define new overload.
> > * gcc/c/c-decl.c (merge_decls): Use new overload of
> > set_decl_section_name.
> > * gcc/cp/decl.c (duplicate_decls): Same.
> > * gcc/cp/method.c (use_thunk): Same.
> > * gcc/cp/optimize.c (maybe_clone_body): Same.
> > * gcc/d/decl.cc (finish_thunk): Same.
> > * gcc/tree-emutls.c (get_emutls_init_templ_addr): Same.
> > * gcc/cgraphclones.c (cgraph_node::create_virtual_clone): Use new
> > overload of symtab_node::set_section.
> > (cgraph_node::create_version_clone_with_body): Same.
> > * gcc/trans-mem.c (ipa_tm_create_version): Same.
> 
> I adjusted the ChangeLog, added an entry for the coroutines.cc addition
> I made and pushed this to the trunk.

/home/alan/src/gcc/gcc/go/go-gcc.cc:2759:44: error: call of overloaded 
‘set_decl_section_name(tree_node*&, std::nullptr_t)’ is ambiguous
   set_decl_section_name (var_decl, NULL);
^
In file included from /home/alan/src/gcc/gcc/go/go-gcc.cc:27:0:
/home/alan/src/gcc/gcc/tree.h:4283:13: note: candidate: void 
set_decl_section_name(tree, const char*)
 extern void set_decl_section_name (tree, const char *);
 ^
/home/alan/src/gcc/gcc/tree.h:4284:13: note: candidate: void 
set_decl_section_name(tree, const_tree)
 extern void set_decl_section_name (tree, const_tree);
 ^

I'm using the obvious to me "(const char *) NULL" in the call to fix
this, but you might like a different style C++ fix instead.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 1/3] Refactor copying decl section names

2020-11-10 Thread Alan Modra via Gcc-patches
On Tue, Nov 10, 2020 at 09:19:37PM -0700, Jeff Law wrote:
> I think the const char * is fine.  That should force resolution to the
> same routine we were using earlier.  Do you want to commit that fix or
> shall I?

Commited 693a79a355e1.

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Use LIB2_SIDITI_CONV_FUNCS in place of ppc64-fp.c

2020-11-13 Thread Alan Modra via Gcc-patches
This patch retires ppc64-fp.c in favour of using
"LIB2_SIDITI_CONV_FUNCS = yes", which is a lot better solution than
having a copy of selected libgcc2.c functions.

So for powerpc64-linux we see these changes in libgcc files (plus
corresponding _s.o variants).
+_fixdfti.o
+_fixsfti.o
+_fixtfti.o
+_fixunsdfti.o
+_fixunssfti.o
+_fixunstfti.o
+_floattidf.o
+_floattisf.o
+_floattitf.o
+_floatuntidf.o
+_floatuntisf.o
+_floatuntitf.o
-ppc64-fp.o

with these empty objects also appearing (plus _s.o variants).
+_fixunsxfti.o
+_fixxfti.o
+_floattixf.o
+_floatuntixf.o

In reality we aren't getting new TI mode conversions as it might seem,
because the old *di*.o files corresponding to the above files
contained TI mode conversions, whereas now they contain DI mode
conversions.  Those match the functions provided in ppc64-fp.o, and
the set of dynamic libgcc_s.so.1 symbols is identical, apart from
symbol values, to before this patch.

For ppc32 we get a whole lot more empty objects replacing the empty
ppc64-fp.o.  Again the set of global symbol in libgcc.a and dynamic
symbols in libgcc_s.so.1 are unchanged.

Bootstrapped and regression tested powerpc64-linux, powerpc64le-linux,
powerpc-linux and powerpc-ibm-aix7.2.4.0.  OK?

* config/rs6000/t-ppc64-fp (LIB2ADD): Delete.
(LIB2_SIDITI_CONV_FUNCS): Define.
* config/rs6000/ppc64-fp.c: Delete file.

diff --git a/libgcc/config/rs6000/t-ppc64-fp b/libgcc/config/rs6000/t-ppc64-fp
index 26d1730bcdb..999679fc3cb 100644
--- a/libgcc/config/rs6000/t-ppc64-fp
+++ b/libgcc/config/rs6000/t-ppc64-fp
@@ -1,2 +1 @@
-# Can be used unconditionally, wrapped in __powerpc64__ || __64BIT__ __ppc64__.
-LIB2ADD += $(srcdir)/config/rs6000/ppc64-fp.c
+LIB2_SIDITI_CONV_FUNCS = yes

-- 
Alan Modra
Australia Development Lab, IBM


obstack.h __PTR_ALIGN vs. ubsan

2021-09-01 Thread Alan Modra via Gcc-patches
Current ubsan complains on every use of __PTR_ALIGN (when ptrdiff_t is
as large as a pointer), due to making calculations relative to a NULL
pointer.  This patch avoids the problem by extracting out and
simplifying __BPTR_ALIGN for the usual case.  I've continued to use
ptrdiff_t here, where it might be better to throw away __BPTR_ALIGN
entirely and just assume uintptr_t exists.
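
For context, the __BPTR_ALIGN that __PTR_ALIGN currently forwards to
is essentially:

  /* Align P (an offset from base B) up to a multiple of A + 1,
     where A is an alignment mask, ie. alignment - 1.  */
  #define __BPTR_ALIGN(B, P, A) ((B) + (((P) - (B) + (A)) & ~(A)))

With B == (char *) 0 that is arithmetic on a null pointer, which is
what ubsan flags; the patch simply avoids __BPTR_ALIGN in the usual
case.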

OK to apply for gcc?

* obstack.h (__PTR_ALIGN): Expand and simplify __BPTR_ALIGN
rather than calculating relative to a NULL pointer.

diff --git a/include/obstack.h b/include/obstack.h
index a6eb6c95587..0d8746f835b 100644
--- a/include/obstack.h
+++ b/include/obstack.h
@@ -137,9 +137,9 @@
relative to B.  Otherwise, use the faster strategy of computing the
alignment relative to 0.  */
 
-#define __PTR_ALIGN(B, P, A) \
-  __BPTR_ALIGN (sizeof (ptrdiff_t) < sizeof (void *) ? (B) : (char *) 0,  \
-P, A)
+#define __PTR_ALIGN(B, P, A)   \
+  (sizeof (ptrdiff_t) < sizeof (void *) ? __BPTR_ALIGN (B, P, A)   \
+   : (char *) (((ptrdiff_t) (P) + (A)) & ~(A)))
 
 #ifndef __attribute_pure__
 # if defined __GNUC_MINOR__ && __GNUC__ * 1000 + __GNUC_MINOR__ >= 2096

-- 
Alan Modra
Australia Development Lab, IBM


Make opcodes configure depend on bfd configure

2021-11-12 Thread Alan Modra via Gcc-patches
The idea is for opcodes to be able to see whether bfd is compiled
for 64-bit.  A lot of --enable-targets=all libopcodes is wasted space
if bfd can't load 64-bit target object files.

* Makefile.def (configure-opcodes): Depend on configure-bfd.
* Makefile.in: Regenerate.

Applied to gcc.  I'll be importing the gcc versions over to binutils.

diff --git a/Makefile.def b/Makefile.def
index 0abc42b1a1b..a504192e6d7 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -4,7 +4,7 @@ AutoGen definitions Makefile.tpl;
 // Makefile.in is generated from Makefile.tpl by 'autogen Makefile.def'.
 // This file was originally written by Nathanael Nerode.
 //
-//   Copyright 2002-2019 Free Software Foundation
+//   Copyright 2002-2021 Free Software Foundation
 //
 // This file is free software; you can redistribute it and/or modify
 // it under the terms of the GNU General Public License as published by
@@ -493,6 +493,7 @@ dependencies = { module=install-strip-ld; 
on=install-strip-bfd; };
 dependencies = { module=install-strip-ld; on=install-strip-libctf; };
 
 // libopcodes depends on libbfd
+dependencies = { module=configure-opcodes; on=configure-bfd; hard=true; };
 dependencies = { module=install-opcodes; on=install-bfd; };
 dependencies = { module=install-strip-opcodes; on=install-strip-bfd; };
 
diff --git a/Makefile.in b/Makefile.in
index d13f6c353ee..860cf8f067b 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -62717,6 +62717,16 @@ install-ld: maybe-install-libctf
 install-strip-libctf: maybe-install-strip-bfd
 install-strip-ld: maybe-install-strip-bfd
 install-strip-ld: maybe-install-strip-libctf
+configure-opcodes: configure-bfd
+configure-stage1-opcodes: configure-stage1-bfd
+configure-stage2-opcodes: configure-stage2-bfd
+configure-stage3-opcodes: configure-stage3-bfd
+configure-stage4-opcodes: configure-stage4-bfd
+configure-stageprofile-opcodes: configure-stageprofile-bfd
+configure-stagetrain-opcodes: configure-stagetrain-bfd
+configure-stagefeedback-opcodes: configure-stagefeedback-bfd
+configure-stageautoprofile-opcodes: configure-stageautoprofile-bfd
+configure-stageautofeedback-opcodes: configure-stageautofeedback-bfd
 install-opcodes: maybe-install-bfd
 install-strip-opcodes: maybe-install-strip-bfd
 configure-gas: maybe-configure-intl

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC64 ELFv2 -fpatchable-function-entry

2021-05-19 Thread Alan Modra via Gcc-patches
On Tue, May 18, 2021 at 05:48:40AM -0500, Segher Boessenkool wrote:
> I wrote a bit later:
> 
> > > Can you make this less hacky please?  Changing the generic code is just
> > > fine as well, it needs some love.
> 
> In effect making a callback / hook without making that explicit is bad
> for maintainability.  We are in stage 1, now is a good time for
> infrastructure changes.

Yes, perhaps I could do that, and possibly even write a patch that is
accepted.  I'm not going to though, because I made a decision a little
while ago that I'm not going to contribute new gcc work.  This one was
just flushing some patches that I wrote a while ago.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [POWER10] __morestack calls from pcrel code

2021-07-14 Thread Alan Modra via Gcc-patches
On Wed, Jun 30, 2021 at 05:06:30PM -0300, Tulio Magno Quites Machado Filho 
wrote:
> Alan Modra via Gcc-patches  writes:
> 
> > Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
> > and linking with a non-pcrel libgcc results in crashes due to the
> > power10 pcrel code not having r2 set for the generic-morestack.c
> > functions called from __morestack.  There is also a problem when
> > non-pcrel code calls a pcrel libgcc.  See the patch comments.
> >
> > A similar situation theoretically occurs with ELFv1 multi-toc
> > executables, when __morestack might be located in a different toc
> > group to its caller.  This patch makes no attempt to fix that, since
> > the gold linker does not support multi-toc (gold is needed for proper
> > support of -fsplit-stack code) nor does gcc emit __morestack calls
> > that support multi-toc.
> >
> > Bootstrapped and regression tested power64le-linux with both
> > -mcpu=power10 and -mcpu=power9.  OK for mainline and backporting to
> > gcc-11 and gcc-10?
> >
> > * config/rs6000/morestack.S (R2_SAVE): Define.
> > (__morestack): Save and restore r2.  Set up r2 for called
> > functions.
> 
> Thanks! This patch solved the issue I was seeing.
> 
> If it gets merged, can this patch be backported to GCC 10 and 11, please?
> 
> -- 
> Tulio Magno

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573978.html

This patch has now been unreviewed for over two weeks.  I expected a
rubber stamp style approval; this assembly file is all mine, I know
the ABI and how .eh_frame driven exception handling works on powerpc.
So I'm going to claim the patch is obvious enough to someone with a
good understanding of what is going on in morestack.S and commit under
the "obvious" rule after allowing a few more days for comment.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 2/4 REVIEW] libtool.m4: fix nm BSD flag detection

2021-07-21 Thread Alan Modra via Gcc-patches
On Wed, Jul 07, 2021 at 08:03:45PM +0100, Nick Alcock via Gcc-patches wrote:
> On 7 Jul 2021, Nick Clifton told this:
> 
> > Hi Nick,
> >
> >> Ping?
> >
> > Oops.
> 
> I sent a bunch of pings out at the same time, to a bunch of different
> projects. You are the only person to respond, so thank you!
> 
> >>>   PR libctf/27482
> >>>   * libtool.m4 (LT_PATH_NM): Try BSDization flags with a user-provided
> >
> > Changes to libtool need to be posted to the libtool project:
> >
> >   https://www.gnu.org/software/libtool/
> 
> I considered this, but there is *serious* divergence between the
> libtool.m4 in our tree and upstream. Fixing this divergence looks to be
> a fairly major project in and of itself :( the last real sync looked
> like being all the way back in 2008.

Yes, I looked at doing a sync myself a few years ago..
I'll OK the two libtool changes for binutils.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [POWER10] __morestack calls from pcrel code

2021-07-21 Thread Alan Modra via Gcc-patches
On Wed, Jul 21, 2021 at 08:59:04AM -0400, David Edelsohn wrote:
> On Wed, Jul 21, 2021 at 4:29 AM Alan Modra  wrote:
> >
> > On Wed, Jul 14, 2021 at 08:24:16PM -0400, David Edelsohn wrote:
> > > > > > * config/rs6000/morestack.S (R2_SAVE): Define.
> > > > > > (__morestack): Save and restore r2.  Set up r2 for called
> > > > > > functions.
> > >
> > > This patch is okay.
> >
> > Thanks David, the patch is needed on gcc-11 and gcc-10 too.
> > OK for the branches too?
> 
> Backports are fine, but I believe that Richi is planning to cut GCC 11
> RC today, so you really should check with him about a backport at the
> last minute.

Hi Richard,
Is this patch OK at this late stage for the gcc-11 branch?
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573978.html

The impacts of the bug are segfaults and other undesirable behaviour
with Go (or more generally -fsplit-stack) on power10 when libgcc is
not power10 pcrel.  A non-pcrel libgcc is very likely how distros
will ship gcc.

-- 
Alan Modra
Australia Development Lab, IBM


Re: RFC: Changing AC_PROG_CC to AC_PROG_CC_C99 in top level configure

2021-05-02 Thread Alan Modra via Gcc-patches
On Fri, Apr 30, 2021 at 03:48:00PM -0600, Jeff Law via Gcc-patches wrote:
> 
> On 4/30/2021 12:36 PM, Simon Marchi via Gcc-patches wrote:
> > On 2021-04-26 7:32 a.m., Nick Clifton via Gdb-patches wrote:
> > > Hi Guys,
> > >Given that gcc, gdb and now binutils are all now requiring C99 as a
> > >minimum version of C, are there any objections to updating
> > >configure.ac to reflect this ?
> > > 
> > > Cheers
> > >Nick
> > > 
> > > diff --git a/configure.ac b/configure.ac
> > > index a721316d07b..59b4194fb24 100644
> > > --- a/configure.ac
> > > +++ b/configure.ac
> > > @@ -1278,7 +1278,7 @@ else
> > > WINDMC_FOR_BUILD="\$(WINDMC)"
> > >   fi
> > > 
> > > -AC_PROG_CC
> > > +AC_PROG_CC_C99
> > >   AC_PROG_CXX
> > > 
> > >   # We must set the default linker to the linker used by gcc for the 
> > > correct
> > Hi Nick,
> > 
> > I think this fix is obvious enough, I encourage you to push it, that
> > will fix the build failure many people get in opcodes/ppc-dis.c.  We'll
> > just remove the line later when we upgrade to Autoconf 2.71, as simple
> > as that.  For now we use 2.69.  If that matters, you have my OK for the
> > GDB side of things.
> 
> That works for me.  I'd just sent Alan the trivial patch to make ppc-dis.c
> compile again with C89, but if we're going to update configure.ac
> appropriately, then it wouldn't be needed.

Yes, I prefer the configure fix too.  If we state we require C99 in
binutils then we ought to be able to use C99..

Nick, does the configure.ac change also need to go in all subdirs, to
support people running make in say ld/ rather than running make in the
top build dir?

-- 
Alan Modra
Australia Development Lab, IBM


Re: RFC: Changing AC_PROG_CC to AC_PROG_CC_C99 in top level configure

2021-05-03 Thread Alan Modra via Gcc-patches
On Mon, May 03, 2021 at 10:47:15AM -0400, Simon Marchi wrote:
> > Yes, I prefer the configure fix too.  If we state we require C99 in
> > binutils then we ought to be able to use C99..
> > 
> > Nick, does the configure.ac change also need to go in all subdirs, to
> > support people running make in say ld/ rather than running make in the
> > top build dir?
> 
> For GDB, it's not supported to run gdb/configure directly, you need to
> use the top-level configure.  Is it supported from some of the other
> projects in the repo?
> 
> I just tried with ld, it doesn't work since it depends on bfd also being
> built.  I tried with just bfd, it doesn't work (with the default
> configure options at least) because it requires zlib being built.

I wasn't talking about running configure, I was talking about running
make.  For example, you configure and make binutils as usual, then
after making a change to ld/ files, run make in the ld build dir.  I
don't tend to do that myself but I do run "make check" sometimes in a
subdir expecting to get the same results in that subdir as if "make
check" was run from the top level.

But I should have just tried it myself rather than asking.  CC, CPP
and others are inherited from the top level and appear with -std=gnu99
in the subdir Makefiles.  So it seems all the AC_PROG_CC in subdir
configure.ac can stay as they are.

> 
> So if all projects need to go through the top-level configure script
> anyway, and C99 is a baseline for all projects, then having the check
> only in the top-level makes sense to me.  Projects that have more
> specific requirements can have their own checks.  For example, sim/
> requires C11 now.  Unless the C99 check at top-level somehow does not
> play well with the C11 check in sim/?  Like if that would cause CC to be
> set to "gcc -std=gnu99 -std=gnu11" or something like that.
> 
> Simon

-- 
Alan Modra
Australia Development Lab, IBM


Re: RFC: Changing AC_PROG_CC to AC_PROG_CC_C99 in top level configure

2021-05-04 Thread Alan Modra via Gcc-patches
On 2021-05-04 8:42 a.m., Nick Clifton wrote:
> Hi Guys,
> 
> On 4/30/21 7:36 PM, Simon Marchi wrote:
>> I think this fix is obvious enough, I encourage you to push it,
> 
> OK - I have pushed the patch to the mainline branches of both
> the gcc and binutils-gdb repositories.

Thanks Nick!  Incidentally, I checked the AC_PROG_CC_C99 change on
both binutils and gcc mainline using gcc-4.9.

To build gcc on x86_64 I found the following patch necessary to avoid
lots of
error: uninitialized const member ‘stringop_algs::stringop_strategy::max’
error: uninitialized const member ‘stringop_algs::stringop_strategy::alg’
when compiling config/i386/i386-options.c.  These can't be cured by
configuring with --disable-stage1-checking.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 97d6f3863cb..cc3b1b6d666 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -73,8 +73,8 @@ struct stringop_algs
 {
   const enum stringop_alg unknown_size;
   const struct stringop_strategy {
-const int max;
-const enum stringop_alg alg;
+int max;
+enum stringop_alg alg;
 int noalign;
   } size [MAX_STRINGOP_ALGS];
 };


-- 
Alan Modra
Australia Development Lab, IBM


Re: RFC: Changing AC_PROG_CC to AC_PROG_CC_C99 in top level configure

2021-05-05 Thread Alan Modra via Gcc-patches
On Wed, May 05, 2021 at 08:05:29AM +0100, Iain Sandoe wrote:
> Alan Modra via Gcc-patches  wrote:
> 
> > On 2021-05-04 8:42 a.m., Nick Clifton wrote:
> > > Hi Guys,
> > > 
> > > On 4/30/21 7:36 PM, Simon Marchi wrote:
> > > > I think this fix is obvious enough, I encourage you to push it,
> > > 
> > > OK - I have pushed the patch to the mainline branches of both
> > > the gcc and binutils-gdb repositories.
> > 
> > Thanks Nick!  Incidentally, I checked the AC_PROG_CC_C99 change on
> > both binutils and gcc mainline using gcc-4.9.
> > 
> > To build gcc on x86_64 I found the following patch necessary to avoid
> > lots of
> > error: uninitialized const member ‘stringop_algs::stringop_strategy::max’
> > error: uninitialized const member ‘stringop_algs::stringop_strategy::alg’
> > when compiling config/i386/i386-options.c.  These can't be cured by
> > configuring with --disable-stage1-checking.
> > 
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 97d6f3863cb..cc3b1b6d666 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -73,8 +73,8 @@ struct stringop_algs
> > {
> >   const enum stringop_alg unknown_size;
> >   const struct stringop_strategy {
> > -const int max;
> > -const enum stringop_alg alg;
> > +int max;
> > +enum stringop_alg alg;
> > int noalign;
> >   } size [MAX_STRINGOP_ALGS];
> > };
> 
> does this relate to / fix PR 100246 (which seems to fire for some GCC
> versions as well
> as older clang)?

Yes, looks like the same issue.  I started making a similar fix to the
one you attached to the PR, then laziness kicked in after noticing the
errors were only given on the const elements.

-- 
Alan Modra
Australia Development Lab, IBM


PR98125, PowerPC64 -fpatchable-function-entry

2021-05-06 Thread Alan Modra via Gcc-patches
This series of patches fixes -fpatchable-function-entry on PowerPC64
ELFv1 so that SECTION_LINK_ORDER (.section 'o' arg) is now supported,
and on PowerPC64 ELFv2 to not break the global entry code.

Bootstrapped powerpc64le-linux and x86_64-linux all langs.  I did see
one regression on both targets, libgo runtime/pprof.  It's unclear to
me what that means.

Alan Modra (3):
  PowerPC64 ELFv1 -fpatchable-function-entry
  Revert "rs6000: Avoid -fpatchable-function-entry* regressions on
powerpc64 be [PR98125]"
  PowerPC64 ELFv2 -fpatchable-function-entry

 gcc/config/rs6000/rs6000-logue.c |  5 
 gcc/config/rs6000/rs6000.c   | 47 +---
 gcc/targhooks.c  | 38 --
 gcc/targhooks.h  |  3 --
 gcc/testsuite/g++.dg/pr93195a.C  |  1 -
 gcc/varasm.c | 37 ++---
 6 files changed, 69 insertions(+), 62 deletions(-)



PowerPC64 ELFv1 -fpatchable-function-entry

2021-05-06 Thread Alan Modra via Gcc-patches
On PowerPC64 ELFv1 function symbols are defined on function
descriptors in an .opd section rather than in the function code.
.opd is not split up by the PowerPC64 backend for comdat groups or
other situations where per-function sections are required.  Thus
SECTION_LINK_ORDER can't use the function name to reference a suitable
section for ordering:  The .opd section might contain many other
function descriptors and they may be in a different order to the final
function code placement.  This patch arranges to use a code label
instead of the function name symbol.
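
A sketch of the intended output (flags abbreviated, not actual
compiler output):

        .text
.LPFC1:
        .section __patchable_function_entries,"awo",@progbits,.LPFC1

rather than ",...,my_func": on ELFv1 "my_func" names the descriptor
in .opd, so ordering against it says nothing about the final code
layout.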

I chose to emit the label inside default_elf_asm_named_section,
immediately before the .section directive using the label, and in case
someone uses .previous or the like, need to save and restore the
current section when switching to the function code section to emit
the label.  That requires a tweak to switch_to_section in order to get
the current section.  I checked all the TARGET_ASM_NAMED_SECTION
functions and unnamed.callback functions and it appears none will be
affected by that tweak.

PR target/98125
* varasm.c (default_elf_asm_named_section): Use a function
code label rather than the function symbol as the "o" argument.
(switch_to_section): Don't set in_section until section
directive has been emitted.

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 97c1e6fff25..5f95f8cfa75 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6866,6 +6866,26 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
   *f = '\0';
 }
 
+  char func_label[256];
+  if (flags & SECTION_LINK_ORDER)
+{
+  static int recur;
+  if (recur)
+   gcc_unreachable ();
+  else
+   {
+ ++recur;
+ section *save_section = in_section;
+ static int func_code_labelno;
+ switch_to_section (function_section (decl));
+ ++func_code_labelno;
+ ASM_GENERATE_INTERNAL_LABEL (func_label, "LPFC", func_code_labelno);
+ ASM_OUTPUT_LABEL (asm_out_file, func_label);
+ switch_to_section (save_section);
+ --recur;
+   }
+}
+
   fprintf (asm_out_file, "\t.section\t%s,\"%s\"", name, flagchars);
 
   /* default_section_type_flags (above) knows which flags need special
@@ -6893,11 +6913,8 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
   if (flags & SECTION_LINK_ORDER)
{
- tree id = DECL_ASSEMBLER_NAME (decl);
- ultimate_transparent_alias_target (&id);
- const char *name = IDENTIFIER_POINTER (id);
- name = targetm.strip_name_encoding (name);
- fprintf (asm_out_file, ",%s", name);
+ fputc (',', asm_out_file);
+ assemble_name_raw (asm_out_file, func_label);
}
   if (HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE))
{
@@ -7821,11 +7838,6 @@ switch_to_section (section *new_section, tree decl)
   else if (in_section == new_section)
 return;
 
-  if (new_section->common.flags & SECTION_FORGET)
-in_section = NULL;
-  else
-in_section = new_section;
-
   switch (SECTION_STYLE (new_section))
 {
 case SECTION_NAMED:
@@ -7843,6 +7855,11 @@ switch_to_section (section *new_section, tree decl)
   break;
 }
 
+  if (new_section->common.flags & SECTION_FORGET)
+in_section = NULL;
+  else
+in_section = new_section;
+
   new_section->common.flags |= SECTION_DECLARED;
 }
 


Revert "rs6000: Avoid -fpatchable-function-entry* regressions on powerpc64 be [PR98125]"

2021-05-06 Thread Alan Modra via Gcc-patches
This reverts commit b680b9049737198d010e49cf434704c6a6ed2b3f now
that the PowerPC64 ELFv1 regression is fixed properly.

PR testsuite/98125
* targhooks.h (default_print_patchable_function_entry_1): Delete.
* targhooks.c (default_print_patchable_function_entry_1): Delete.
(default_print_patchable_function_entry): Expand above.
* config/rs6000/rs6000.c (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
Don't define.
(rs6000_print_patchable_function_entry): Delete.
* testsuite/g++.dg/pr93195a.C: Revert 2021-04-03 change.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 663eed4f055..d43c36e7f1a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1362,10 +1362,6 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #define TARGET_ASM_ASSEMBLE_VISIBILITY rs6000_assemble_visibility
 #endif
 
-#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
-#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
-  rs6000_print_patchable_function_entry
-
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue
 
@@ -14957,30 +14953,6 @@ rs6000_assemble_visibility (tree decl, int vis)
 }
 #endif
 
-/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
-   entry.  If RECORD_P is true and the target supports named sections,
-   the location of the NOPs will be recorded in a special object section
-   called "__patchable_function_entries".  This routine may be called
-   twice per function to put NOPs before and after the function
-   entry.  */
-
-void
-rs6000_print_patchable_function_entry (FILE *file,
-  unsigned HOST_WIDE_INT patch_area_size,
-  bool record_p)
-{
-  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
-  /* When .opd section is emitted, the function symbol
- default_print_patchable_function_entry_1 is emitted into the .opd section
- while the patchable area is emitted into the function section.
- Don't use SECTION_LINK_ORDER in that case.  */
-  if (!(TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
-  && HAVE_GAS_SECTION_LINK_ORDER)
-flags |= SECTION_LINK_ORDER;
-  default_print_patchable_function_entry_1 (file, patch_area_size, record_p,
-   flags);
-}
-
 enum rtx_code
 rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
 {
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 952fad422eb..d69c9a2d819 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1832,15 +1832,17 @@ default_compare_by_pieces_branch_ratio (machine_mode)
   return 1;
 }
 
-/* Helper for default_print_patchable_function_entry and other
-   print_patchable_function_entry hook implementations.  */
+/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
+   entry.  If RECORD_P is true and the target supports named sections,
+   the location of the NOPs will be recorded in a special object section
+   called "__patchable_function_entries".  This routine may be called
+   twice per function to put NOPs before and after the function
+   entry.  */
 
 void
-default_print_patchable_function_entry_1 (FILE *file,
- unsigned HOST_WIDE_INT
- patch_area_size,
- bool record_p,
- unsigned int flags)
+default_print_patchable_function_entry (FILE *file,
+   unsigned HOST_WIDE_INT patch_area_size,
+   bool record_p)
 {
   const char *nop_templ = 0;
   int code_num;
@@ -1862,6 +1864,9 @@ default_print_patchable_function_entry_1 (FILE *file,
   patch_area_number++;
   ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number);
 
+  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
+  if (HAVE_GAS_SECTION_LINK_ORDER)
+   flags |= SECTION_LINK_ORDER;
   switch_to_section (get_section ("__patchable_function_entries",
  flags, current_function_decl));
   assemble_align (POINTER_SIZE);
@@ -1878,25 +1883,6 @@ default_print_patchable_function_entry_1 (FILE *file,
 output_asm_insn (nop_templ, NULL);
 }
 
-/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
-   entry.  If RECORD_P is true and the target supports named sections,
-   the location of the NOPs will be recorded in a special object section
-   called "__patchable_function_entries".  This routine may be called
-   twice per function to put NOPs before and after the function
-   entry.  */
-
-void
-default_print_patchable_function_entry (FILE *file,
-   unsigned HOST_WIDE_INT patch_area_size,
-   bool record_p)
-{
-  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
-  if (HAVE_GAS_SECTI

PowerPC64 ELFv2 -fpatchable-function-entry

2021-05-06 Thread Alan Modra via Gcc-patches
PowerPC64 ELFv2 dual entry point functions have a couple of problems
with -fpatchable-function-entry.  One is that the nops added after the
global entry land in the global entry code which is constrained to be
a power of two number of instructions, and zero global entry code has
special meaning for linkage.  The other is that the global entry code
isn't always used on function entry, and some uses of
-fpatchable-function-entry might want to affect all entries to the
function.  So this patch arranges to put one batch of nops before the
global entry, and the other after the local entry point.
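
For illustration only (this example is not from the original mail): a
function asking for a patchable area via the GCC function attribute,
assuming the usual (total nops, nops before the entry) argument order
that mirrors -fpatchable-function-entry=N,M.

/* With the arrangement described above, the two "before" nops should
   land ahead of the global entry point and the remaining three after
   the local entry point, so both entry paths see part of the
   patchable area.  "traced" is an invented name.  */
__attribute__ ((patchable_function_entry (5, 2)))
void traced (void)
{
}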

PR target/98125
* config/rs6000/rs6000.c (rs6000_print_patchable_function_entry): New
function.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define.
* config/rs6000/rs6000-logue.c: Include targhooks.h.
(rs6000_output_function_prologue): Handle nops for
-fpatchable-function-entry after local entry point.

diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index b0ac183ceff..ffa3bb3dcf1 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -51,6 +51,7 @@
 #include "gstab.h"  /* for N_SLINE */
 #include "dbxout.h" /* dbxout_ */
 #endif
+#include "targhooks.h"
 
 static int rs6000_ra_ever_killed (void);
 static void is_altivec_return_reg (rtx, void *);
@@ -3991,6 +3992,10 @@ rs6000_output_function_prologue (FILE *file)
   fputs (",1\n", file);
 }
 
+  int nops_after_entry = crtl->patch_area_size - crtl->patch_area_entry;
+  if (nops_after_entry > 0)
+default_print_patchable_function_entry (file, nops_after_entry, false);
+
   /* Output -mprofile-kernel code.  This needs to be done here instead of
  in output_function_profile since it must go after the ELFv2 ABI
  local entry point.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d43c36e7f1a..97f1b3e0674 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1404,6 +1404,10 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE rs6000_output_function_epilogue
 
+#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
+#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
+  rs6000_print_patchable_function_entry
+
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA rs6000_output_addr_const_extra
 
@@ -14953,6 +14957,33 @@ rs6000_assemble_visibility (tree decl, int vis)
 }
 #endif
 
+/* Write NOPs into the asm outfile FILE around a function entry.  This
+   routine may be called twice per function to put NOPs before and after
+   the function entry.  If RECORD_P is true the location of the NOPs will
+   be recorded by default_print_patchable_function_entry in a special
+   object section called "__patchable_function_entries".  Disable output
+   of any NOPs for the second call.  Those, if any, are output by
+   rs6000_output_function_prologue.  This means that for ELFv2 any NOPs
+   after the function entry are placed after the local entry point, not
+   the global entry point.  NOPs after the entry may be found at
+   record_loc + nops_before * 4 + local_entry_offset.  This holds true
+   when nops_before is zero.  */
+
+static void
+rs6000_print_patchable_function_entry (FILE *file,
+  unsigned HOST_WIDE_INT patch_area_size ATTRIBUTE_UNUSED,
+  bool record_p)
+{
+  /* Always call default_print_patchable_function_entry when RECORD_P in
+ order to output the location of the NOPs, but use the size of the
+ area before the entry on both possible calls.  If RECORD_P is true
+ on the second call then the area before the entry was zero size and
+ thus no NOPs will be output.  */
+  if (record_p)
+default_print_patchable_function_entry (file, crtl->patch_area_entry,
+   record_p);
+}
+
 enum rtx_code
 rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
 {


Re: PowerPC64 ELFv1 -fpatchable-function-entry

2021-05-18 Thread Alan Modra via Gcc-patches
On Fri, May 07, 2021 at 07:24:02PM -0500, Segher Boessenkool wrote:
> Looks good with those things tweaked.

Except that the patch chose the wrong section to emit the label,
because the decl is wrong.  But of course I was using the same decl as
used in existing SHF_LINK_ORDER support, so it was already broken.
You can see this on x86_64 gcc/testsuite/g++.dg/pr93195a.C output for
bar().

.globl  _Z3barv
.type   _Z3barv, @function
_Z3barv:
.LFB1:
.cfi_startproc
.section    __patchable_function_entries,"awo",@progbits,_Z3foov
.align 8
.quad   .LPFE2
.text
.LPFE2:
nop
pushq   %rbp

Note the reference to _Z3foov!  Tracking down why this happens isn't
hard.  The first get_section ("__patchable_function_entries", ...)
saves the decl for foo, which is the one then used on all subsequent
calls.  Oops.  Jakub, commit 134afa38f0be didn't quite do the right
thing.

This problem can be fixed with a simplification of the patch I posted,
relying on the fact that, in the only place that currently wants to
emit SHF_LINK_ORDER sections, we have already switched to the correct
function code section.

Regression testing in progress.

PR target/98125
* varasm.c (default_elf_asm_named_section): Use a function
code label rather than the function symbol as the "o" argument.

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 97c1e6fff25..6f10ac7762e 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6866,6 +6866,15 @@ default_elf_asm_named_section (const char *name, unsigned int flags,
   *f = '\0';
 }
 
+  char func_label[256];
+  if (flags & SECTION_LINK_ORDER)
+{
+  static int func_code_labelno;
+  func_code_labelno++;
+  ASM_GENERATE_INTERNAL_LABEL (func_label, "LPFC", func_code_labelno);
+  ASM_OUTPUT_LABEL (asm_out_file, func_label);
+}
+
   fprintf (asm_out_file, "\t.section\t%s,\"%s\"", name, flagchars);
 
   /* default_section_type_flags (above) knows which flags need special
@@ -6893,11 +6902,8 @@ default_elf_asm_named_section (const char *name, unsigned int flags,
fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
   if (flags & SECTION_LINK_ORDER)
{
- tree id = DECL_ASSEMBLER_NAME (decl);
- ultimate_transparent_alias_target (&id);
- const char *name = IDENTIFIER_POINTER (id);
- name = targetm.strip_name_encoding (name);
- fprintf (asm_out_file, ",%s", name);
+ fprintf (asm_out_file, ",");
+ assemble_name_raw (asm_out_file, func_label);
}
   if (HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE))
{

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC64 ELFv2 -fpatchable-function-entry

2021-05-18 Thread Alan Modra via Gcc-patches
On Mon, May 10, 2021 at 04:39:55PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, May 07, 2021 at 12:19:52PM +0930, Alan Modra wrote:
> > PowerPC64 ELFv2 dual entry point functions have a couple of problems
> > with -fpatchable-function-entry.  One is that the nops added after the
> > global entry land in the global entry code which is constrained to be
> > a power of two number of instructions, and zero global entry code has
> > special meaning for linkage.  The other is that the global entry code
> > isn't always used on function entry, and some uses of
> > -fpatchable-function-entry might want to affect all entries to the
> > function.  So this patch arranges to put one batch of nops before the
> > global entry, and the other after the local entry point.
> 
> Wow ugly.  Not worse than patchable-function-entry itself I suppose :-)
> 
> > * config/rs6000/rs6000-logue.c: Include targhooks.h.
> 
> Huh, did it not already do that?!  Hrm, all the other hooks seem to be
> called via rs6000.c currently.  But you do the same, so why do you need
> to include the header in rs6000-logue.c?

For the new call to default_print_patchable_function_entry in
rs6000_output_function_prologue.

> 
> > +/* Write NOPs into the asm outfile FILE around a function entry.  This
> > +   routine may be called twice per function to put NOPs before and after
> > +   the function entry.  If RECORD_P is true the location of the NOPs will
> > +   be recorded by default_print_patchable_function_entry in a special
> > +   object section called "__patchable_function_entries".  Disable output
> > +   of any NOPs for the second call.  Those, if any, are output by
> > +   rs6000_output_function_prologue.  This means that for ELFv2 any NOPs
> > +   after the function entry are placed after the local entry point, not
> > +   the global entry point.  NOPs after the entry may be found at
> > +   record_loc + nops_before * 4 + local_entry_offset.  This holds true
> > +   when nops_before is zero.  */
> > +
> > +static void
> > +rs6000_print_patchable_function_entry (FILE *file,
> > +  unsigned HOST_WIDE_INT patch_area_size ATTRIBUTE_UNUSED,
> > +  bool record_p)
> 
> It is not a predicate.  Do not call it _p please.  Yes, I know the
> generic code calls is this.  That needs fixing.
> 
> A better name (from the viewpoint of callers, which is what matters!)
> might be first_time or similar?  Or something that really says what it
> *does*?
> 
> > +{
> > +  /* Always call default_print_patchable_function_entry when RECORD_P in
> > + order to output the location of the NOPs, but use the size of the
> > + area before the entry on both possible calls.  If RECORD_P is true
> > + on the second call then the area before the entry was zero size and
> > + thus no NOPs will be output.  */
> > +  if (record_p)
> > +default_print_patchable_function_entry (file, crtl->patch_area_entry,
> > +   record_p);
> > +}
> 
> That is trickiness that will break later.

There is a fine line between trickiness and elegance.  Given the
currently available crtl variables dealing with the patch area and the
current patchable_function_entry parameters, any supposedly "less
hacky" but correct expression will simplify down to what I wrote
here.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] PR96493, powerpc local call linkage failure

2020-08-11 Thread Alan Modra via Gcc-patches
This fixes a fail when power10 isn't supported by binutils, and
ensures the test isn't run without power10 hardware or simulation on
the off chance that power10 insns are emitted in the future for this
testcase.  Bootstrapped etc.  OK?

PR target/96525
* testsuite/gcc.target/powerpc/pr96493.c: Make it a link test
when no power10_hw.  Require power10_ok.

diff --git a/gcc/testsuite/gcc.target/powerpc/pr96493.c b/gcc/testsuite/gcc.target/powerpc/pr96493.c
index f0de0818813..1e5d43f199d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr96493.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr96493.c
@@ -1,6 +1,8 @@
-/* { dg-do run } */
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-do link { target { ! power10_hw } } } */
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 /* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target power10_ok } */
 
 /* Test local calls between pcrel and non-pcrel code.
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] PR96493, powerpc local call linkage failure

2020-08-11 Thread Alan Modra via Gcc-patches
On Tue, Aug 11, 2020 at 01:30:36PM -0500, Segher Boessenkool wrote:
> Either always running or what this patch does will work.  But please add
> comments what the test case wants to test,

That's already in the testcase.

/* Test local calls between pcrel and non-pcrel code.

The comment goes on to say

   Despite the cpu=power10 option, the code generated here should just
   be plain powerpc64, even the necessary linker stubs.  */

which was the justification for using "dg-do run" unqualified in the
current testcase.

> and for the tricky bits.

There aren't any.  And other tests use multiple dg-do lines, eg.
gcc/testsuite/g++.dg/ext/altivec-3.C

/* { dg-do run { target { powerpc*-*-* && vmx_hw } } } */
/* { dg-do compile { target { powerpc*-*-* && { ! vmx_hw } } } } */

Committed 2ba0674c657, and apologies for missing the power10_ok first
time around on this test.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Alan Modra via Gcc-patches
On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:
> Prior to P10, ELFv2 hasn't implemented nonlocal sibcalls.  Now that we do,
> we need to be sure that r12 is set up prior to such a call.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this okay for trunk?

FWIW, looks fine to me.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Alan Modra via Gcc-patches
On Thu, Aug 27, 2020 at 03:17:45PM -0500, Segher Boessenkool wrote:
> On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> It not the copy that is unnecessary: the preventing it *here*, manually,
> is what is unnecessary.

Blame me for the original !rtx_equal_p in rs6000_call_aix that Bill
copied.  So does emit_move_insn prevent the copy?  I can't spot where,
maybe I haven't looked hard enough.

If emit_move_insn doesn't prevent it, then why create useless RTL that
is only going to make work for optimisation passes that remove such
nops?

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-28 Thread Alan Modra via Gcc-patches
On Fri, Aug 28, 2020 at 01:17:27AM -0500, Segher Boessenkool wrote:
> 1) Very many unnecessary moves are generated during expand *anyway*, a
> few more will not hurt;
> 2) In practice we always generate a move from a pseudo into r12 here,
> never a copy from r12 into r12;
> 3) Why generate dead code optimising cases that do not happen, making
> the compiler bigger and slower, but more importantly, making the
> compiler code harder to read?

Uh, OK point 2 and followup 3 is persuasive.  I should have thought of
that myself, thanks.

-- 
Alan Modra
Australia Development Lab, IBM


ubsan: d-demangle.c:214 signed integer overflow

2020-09-03 Thread Alan Modra via Gcc-patches
Running the libiberty testsuite
./test-demangle < libiberty/testsuite/d-demangle-expected
libiberty/d-demangle.c:214:14: runtime error: signed integer overflow: 922337203 * 10 cannot be represented in type 'long int'

On looking at silencing ubsan, I found a real bug in dlang_number.
For a 32-bit long, some overflows won't be detected.  For example,
21474836480.  Why?  Well 214748364 * 10 is 0x7FFFFFF8 (no overflow so
far).  Adding 8 gives 0x80000000 (which does overflow but there is no
test for that overflow in the code).  Then multiplying 0x80000000 * 10
= 0x500000000 = 0 won't be caught by the multiplication overflow test.
The same holds for a 64-bit long using similarly crafted digit
sequences.
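
To make the failure mode concrete, here is a small standalone program
(not part of the patch) that mimics the old check, using uint32_t in
place of a 32-bit long so the wrap-around is well defined:

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  const char *p = "21474836480";
  uint32_t val = 0;  /* stands in for a 32-bit long */

  for (; *p; p++)
    {
      val *= 10;
      /* The old test: assume overflow leaves a non-multiple of ten.  */
      if (val % 10 != 0)
        {
          puts ("overflow detected");
          return 1;
        }
      val += *p - '0';
    }
  /* Prints "accepted, val = 0"; the overflow went unnoticed.  */
  printf ("accepted, val = %u\n", (unsigned) val);
  return 0;
}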

This patch replaces the mod 10 test with a simpler limit test, and
similarly the mod 26 test in dlang_decode_backref.

About the limit test:
  val * 10 + digit > ULONG_MAX is the condition for overflow
ie.
  val * 10 > ULONG_MAX - digit
or
  val > (ULONG_MAX - digit) / 10
or assuming the largest digit
  val > (ULONG_MAX - 9) / 10

I resisted the aesthetic appeal of simplifying this further to
  val > -10UL / 10
since -1UL for ULONG_MAX is only correct for 2's complement numbers.

Passes all the libiberty tests, on both 32-bit and 64-bit hosts.  OK
to apply?

* d-demangle.c: Include limits.h.
(ULONG_MAX): Provide fall-back definition.
(dlang_number): Simplify and correct overflow test.  Only
write *ret on returning non-NULL.
(dlang_decode_backref): Likewise.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index f2d6946eca..59e6ae007a 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -31,6 +31,9 @@ If not, see <http://www.gnu.org/licenses/>.  */
 #ifdef HAVE_CONFIG_H
 #include "config.h"
 #endif
+#ifdef HAVE_LIMITS_H
+#include <limits.h>
+#endif
 
 #include "safe-ctype.h"
 
@@ -45,6 +48,10 @@ If not, see <http://www.gnu.org/licenses/>.  */
 #include 
 #include "libiberty.h"
 
+#ifndef ULONG_MAX
+#defineULONG_MAX   (~0UL)
+#endif
+
 /* A mini string-handling package */
 
 typedef struct string  /* Beware: these aren't required to be */
@@ -207,24 +214,24 @@ dlang_number (const char *mangled, long *ret)
   if (mangled == NULL || !ISDIGIT (*mangled))
 return NULL;
 
-  (*ret) = 0;
+  unsigned long val = 0;
 
   while (ISDIGIT (*mangled))
 {
-  (*ret) *= 10;
-
-  /* If an overflow occured when multiplying by ten, the result
-will not be a multiple of ten.  */
-  if ((*ret % 10) != 0)
+  /* Check for overflow.  Yes, we return NULL here for some digits
+that don't overflow "val * 10 + digit", but that doesn't
+matter given the later "(long) val < 0" test.  */
+  if (val > (ULONG_MAX - 9) / 10)
return NULL;
 
-  (*ret) += mangled[0] - '0';
+  val = val * 10 + mangled[0] - '0';
   mangled++;
 }
 
-  if (*mangled == '\0' || *ret < 0)
+  if (*mangled == '\0' || (long) val < 0)
 return NULL;
 
+  *ret = val;
   return mangled;
 }
 
@@ -294,24 +301,24 @@ dlang_decode_backref (const char *mangled, long *ret)
[A-Z] NumberBackRef
^
*/
-  (*ret) = 0;
+  unsigned long val = 0;
 
   while (ISALPHA (*mangled))
 {
-  (*ret) *= 26;
+  /* Check for overflow.  */
+  if (val > (ULONG_MAX - 25) / 26)
+   break;
 
-  /* If an overflow occured when multiplying by 26, the result
-will not be a multiple of 26.  */
-  if ((*ret % 26) != 0)
-   return NULL;
+  val *= 26;
 
   if (mangled[0] >= 'a' && mangled[0] <= 'z')
{
- (*ret) += mangled[0] - 'a';
+ val += mangled[0] - 'a';
+ *ret = val;
  return mangled + 1;
}
 
-  (*ret) += mangled[0] - 'A';
+  val += mangled[0] - 'A';
   mangled++;
 }
 
-- 
Alan Modra
Australia Development Lab, IBM


Re: ubsan: d-demangle.c:214 signed integer overflow

2020-09-03 Thread Alan Modra via Gcc-patches
On Thu, Sep 03, 2020 at 11:02:50PM +0200, Iain Buclaw wrote:
> Excerpts from Alan Modra's message of September 3, 2020 3:01 pm:
> > Running the libiberty testsuite
> > ./test-demangle < libiberty/testsuite/d-demangle-expected
> > libiberty/d-demangle.c:214:14: runtime error: signed integer overflow: 
> > 922337203 * 10 cannot be represented in type 'long int'
> > 
> > On looking at silencing ubsan, I found a real bug in dlang_number.
> > For a 32-bit long, some overflows won't be detected.  For example,
> > 21474836480.  Why?  Well 214748364 * 10 is 0x7FFFFFF8 (no overflow so
> > far).  Adding 8 gives 0x80000000 (which does overflow but there is no
> > test for that overflow in the code).  Then multiplying 0x80000000 * 10
> > = 0x500000000 = 0 won't be caught by the multiplication overflow test.
> > The same holds for a 64-bit long using similarly crafted digit
> > sequences.
> > 
> > This patch replaces the mod 10 test with a simpler limit test, and
> > similarly the mod 26 test in dlang_decode_backref.
> > 
> > About the limit test:
> >   val * 10 + digit > ULONG_MAX is the condition for overflow
> > ie.
> >   val * 10 > ULONG_MAX - digit
> > or
> >   val > (ULONG_MAX - digit) / 10
> > or assuming the largest digit
> >   val > (ULONG_MAX - 9) / 10
> > 
> > I resisted the aesthetic appeal of simplifying this further to
> >   val > -10UL / 10
> > since -1UL for ULONG_MAX is only correct for 2's complement numbers.
> > 
> > Passes all the libiberty tests, on both 32-bit and 64-bit hosts.  OK
> > to apply?
> > 
> 
> Thanks Alan, change seems reasonable, however on giving it a mull over,
> I see that the largest number that dlang_number would need to be able to
> handle is UINT_MAX.  These two tests which decode a wchar value are
> representative of that (first is valid, second invalid).
> 
> #
> --format=dlang
> _D4test21__T3funVwi4294967295Z3funFNaNbNiNfZv
> test.fun!('\U').fun()
> #
> --format=dlang
> _D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv
> _D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv

Hmm, OK, on a 32-bit host those both won't be handled due to the
"(long) val < 0" test.

Do you really want the last one to fail on a 64-bit host, ie. not
produce "test.fun!('\U1').fun()"?  I'm guessing you do, but
I'd like that confirmed before posting a followup patch.

> I'm fine with creating a new PR and dealing with the above in a separate
> change though, as it will require a few more replacements to adjust the
> result parameter type to 'unsigned' or 'long long'.  

Ha!  When I wrote the original patch I changed most of the "long"
parameters and variables to "unsigned long", because it was clear from
the code (and my understanding of name mangling) that the values were
in fact always unsigned.  I also changed "int" to "size_t" where
appropriate, not that you need to handle strings larger than 2G in
length but simply that size_t is the correct type to use with string
functions, malloc, memcpy and so on.  I decided to remove all those
changes before posting as they were just tidies, I thought.

-- 
Alan Modra
Australia Development Lab, IBM


Re: ubsan: d-demangle.c:214 signed integer overflow

2020-09-04 Thread Alan Modra via Gcc-patches
So this one is on top of the previously posted patch.

* d-demangle.c (string_need): Take a size_t n arg, and use size_t tem.
(string_append): Use size_t n.
(string_appendn, string_prependn): Take a size_t n arg.
(TEMPLATE_LENGTH_UNKNOWN): Define as -1UL.
* d-demangle.c (dlang_number): Make "ret" an unsigned long*.
Only succeed for result of [0,4294967295UL].
(dlang_decode_backref): Only succeed for result [1,MAX_LONG].
(dlang_backref): Remove now unnecessary range check.
(dlang_symbol_name_p): Likewise.
(dlang_lname, dlang_parse_template): Take an unsigned long len
arg.
(dlang_symbol_backref, dlang_identifier, dlang_parse_integer),
(dlang_parse_integer, dlang_parse_string),
(dlang_parse_arrayliteral, dlang_parse_assocarray),
(dlang_parse_structlit, dlang_parse_tuple),
(dlang_template_symbol_param, dlang_template_args): Use
unsigned long variables.
* testsuite/d-demangle-expected: Add new tests.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 59e6ae007a..152f620abf 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -62,9 +62,9 @@ typedef struct string /* Beware: these aren't required to be */
 } string;
 
 static void
-string_need (string *s, int n)
+string_need (string *s, size_t n)
 {
-  int tem;
+  size_t tem;
 
   if (s->b == NULL)
 {
@@ -75,7 +75,7 @@ string_need (string *s, int n)
   s->p = s->b = XNEWVEC (char, n);
   s->e = s->b + n;
 }
-  else if (s->e - s->p < n)
+  else if ((size_t) (s->e - s->p) < n)
 {
   tem = s->p - s->b;
   n += tem;
@@ -124,14 +124,14 @@ string_setlength (string *s, int n)
 static void
 string_append (string *p, const char *s)
 {
-  int n = strlen (s);
+  size_t n = strlen (s);
   string_need (p, n);
   memcpy (p->p, s, n);
   p->p += n;
 }
 
 static void
-string_appendn (string *p, const char *s, int n)
+string_appendn (string *p, const char *s, size_t n)
 {
   if (n != 0)
 {
@@ -142,7 +142,7 @@ string_appendn (string *p, const char *s, int n)
 }
 
 static void
-string_prependn (string *p, const char *s, int n)
+string_prependn (string *p, const char *s, size_t n)
 {
   char *q;
 
@@ -177,7 +177,7 @@ struct dlang_info
 };
 
 /* Pass as the LEN to dlang_parse_template if symbol length is not known.  */
-enum { TEMPLATE_LENGTH_UNKNOWN = -1 };
+#define TEMPLATE_LENGTH_UNKNOWN (-1UL)
 
 /* Prototypes for forward referenced functions */
 static const char *dlang_function_type (string *, const char *,
@@ -200,15 +200,16 @@ static const char *dlang_parse_tuple (string *, const char *,
  struct dlang_info *);
 
 static const char *dlang_parse_template (string *, const char *,
-struct dlang_info *, long);
+struct dlang_info *, unsigned long);
 
-static const char *dlang_lname (string *, const char *, long);
+static const char *dlang_lname (string *, const char *, unsigned long);
 
 
 /* Extract the number from MANGLED, and assign the result to RET.
-   Return the remaining string on success or NULL on failure.  */
+   Return the remaining string on success or NULL on failure.
+   A result larger than 4294967295UL is considered a failure.  */
 static const char *
-dlang_number (const char *mangled, long *ret)
+dlang_number (const char *mangled, unsigned long *ret)
 {
   /* Return NULL if trying to extract something that isn't a digit.  */
   if (mangled == NULL || !ISDIGIT (*mangled))
@@ -218,17 +219,17 @@ dlang_number (const char *mangled, long *ret)
 
   while (ISDIGIT (*mangled))
 {
-  /* Check for overflow.  Yes, we return NULL here for some digits
-that don't overflow "val * 10 + digit", but that doesn't
-matter given the later "(long) val < 0" test.  */
-  if (val > (ULONG_MAX - 9) / 10)
+  unsigned long digit = mangled[0] - '0';
+
+  /* Check for overflow.  */
+  if (val > (4294967295UL - digit) / 10)
return NULL;
 
-  val = val * 10 + mangled[0] - '0';
+  val = val * 10 + digit;
   mangled++;
 }
 
-  if (*mangled == '\0' || (long) val < 0)
+  if (*mangled == '\0')
 return NULL;
 
   *ret = val;
@@ -280,7 +281,8 @@ dlang_call_convention_p (const char *mangled)
 }
 
 /* Extract the back reference position from MANGLED, and assign the result
-   to RET.  Return the remaining string on success or NULL on failure.  */
+   to RET.  Return the remaining string on success or NULL on failure.
+   A result <= 0 is a failure.  */
 static const char *
 dlang_decode_backref (const char *mangled, long *ret)
 {
@@ -314,6 +316,8 @@ dlang_decode_backref (const char *mangled, long *ret)
   if (mangled[0] >= 'a' && mangled[0] <= 'z')
{
  val += mangled[0] - 'a';
+ if ((long) val <= 0)
+   break;
  *ret = val;
  return mangled + 1;
}
@

Re: ubsan: d-demangle.c:214 signed integer overflow

2020-09-06 Thread Alan Modra via Gcc-patches
On Fri, Sep 04, 2020 at 06:23:10PM +0200, Iain Buclaw wrote:
> If we're already using limits.h, I guess it should be fine to also add
> 
> #define UINT_MAX ((unsigned) ~0U)

Yes, except that I'll use the simpler fall-back
#define UINT_MAX (~0U)

The habit of using a cast for unsigned constants dates back to K&R C
where a U suffix was not valid.  For example, from libiberty/strtol.c
#define ULONG_MAX   ((unsigned long)(~0L))

Since the code uses ISO/ANSI C features such as prototypes I think
we're OK with a U suffix.  And if there's something I'm missing then
#define UINT_MAX ((unsigned) ~0)
would be correct for K&R.

> I'll leave it to your judgement on that though.
> 
> Other than that, OK from me.

Do I need an OK from Ian too?

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] ppc64 check for incompatible setting of minimal-toc

2020-09-11 Thread Alan Modra via Gcc-patches
On Fri, Sep 11, 2020 at 04:43:50AM -0300, Alexandre Oliva wrote:
> Could you please shed any light as to the intent, so that we can sort
> out the logic that will implement it?

The history goes back to 2003 commit 9739c90c8d where a ppc64-linux
host built most of gcc with -mminimal-toc due to toc/got size
constraints of the small model, but built libgcc with
-mno-minimal-toc.  When the medium/large model was implemented, I
thought it good to continue compiling libgcc in small model.  We'd
like to keep doing that too.  That's why -mno-minimal-toc, an option
valid for both ppc32 and ppc64, chooses small model on ppc64 when the
compiler default is medium model.  If you change that then make sure
that libgcc continues to be small model.  And any other code that
might have copied the libgcc trick..

I also thought it reasonable to error on an explicit -mcmodel=medium
or -mcmodel=large with either of -mminimal-toc or -mno-minimal-toc,
since the toc options really are specific to small model code.  Why
change that?

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] make PLT loads volatile

2020-03-11 Thread Alan Modra via Gcc-patches
With lazy PLT resolution the first load of a PLT entry may be a value
pointing at a resolver stub.  gcc's loop processing can result in the
PLT load in inline PLT calls being hoisted out of a loop in the
mistaken idea that this is an optimisation.  It isn't.  If the value
hoisted was that for a resolver stub then every call to that function
in the loop will go via the resolver, slowing things down quite
dramatically.

The PLT really is volatile, so teach gcc about that.

Bootstrapped and regression tested on powerpc64le-linux and tested
with a spec build using -mlongcalls.  OK for mainline?

* config/rs6000/rs6000.c (rs6000_longcall_ref): Use unspec_volatile
for PLT16_LO and PLT_PCREL.
* config/rs6000/rs6000.md (UNSPEC_PLT16_LO, UNSPEC_PLT_PCREL): Remove.
(UNSPECV_PLT16_LO, UNSPECV_PLT_PCREL): Define.
(pltseq_plt16_lo_<mode>, pltseq_plt_pcrel): Use unspec_volatile.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 46b7dec2abd..2d6790877fa 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19264,8 +19264,9 @@ rs6000_longcall_ref (rtx call_ref, rtx arg)
   if (rs6000_pcrel_p (cfun))
{
  rtx reg = gen_rtx_REG (Pmode, regno);
- rtx u = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, base, call_ref, arg),
- UNSPEC_PLT_PCREL);
+ rtx u = gen_rtx_UNSPEC_VOLATILE (Pmode,
+  gen_rtvec (3, base, call_ref, arg),
+  UNSPECV_PLT_PCREL);
  emit_insn (gen_rtx_SET (reg, u));
  return reg;
}
@@ -19284,8 +19285,9 @@ rs6000_longcall_ref (rtx call_ref, rtx arg)
   rtx reg = gen_rtx_REG (Pmode, regno);
   rtx hi = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, base, call_ref, arg),
   UNSPEC_PLT16_HA);
-  rtx lo = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, reg, call_ref, arg),
-  UNSPEC_PLT16_LO);
+  rtx lo = gen_rtx_UNSPEC_VOLATILE (Pmode,
+   gen_rtvec (3, reg, call_ref, arg),
+   UNSPECV_PLT16_LO);
   emit_insn (gen_rtx_SET (reg, hi));
   emit_insn (gen_rtx_SET (reg, lo));
   return reg;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad88b6783af..5a8e9de670b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -148,8 +148,6 @@
UNSPEC_SI_FROM_SF
UNSPEC_PLTSEQ
UNSPEC_PLT16_HA
-   UNSPEC_PLT16_LO
-   UNSPEC_PLT_PCREL
   ])
 
 ;;
@@ -178,6 +176,8 @@
UNSPECV_MTFSB1  ; Set FPSCR Field bit to 1
UNSPECV_SPLIT_STACK_RETURN   ; A camouflaged return
UNSPECV_SPEC_BARRIER ; Speculation barrier
+   UNSPECV_PLT16_LO
+   UNSPECV_PLT_PCREL
   ])
 
 ; The three different kinds of epilogue.
@@ -10359,10 +10359,10 @@
 
 (define_insn "*pltseq_plt16_lo_"
   [(set (match_operand:P 0 "gpc_reg_operand" "=r")
-   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
-  (match_operand:P 2 "symbol_ref_operand" "s")
-  (match_operand:P 3 "" "")]
- UNSPEC_PLT16_LO))]
+   (unspec_volatile:P [(match_operand:P 1 "gpc_reg_operand" "b")
+   (match_operand:P 2 "symbol_ref_operand" "s")
+   (match_operand:P 3 "" "")]
+  UNSPECV_PLT16_LO))]
   "TARGET_PLTSEQ"
 {
   return rs6000_pltseq_template (operands, RS6000_PLTSEQ_PLT16_LO);
@@ -10382,10 +10382,10 @@
 
 (define_insn "*pltseq_plt_pcrel"
   [(set (match_operand:P 0 "gpc_reg_operand" "=r")
-   (unspec:P [(match_operand:P 1 "" "")
-  (match_operand:P 2 "symbol_ref_operand" "s")
-  (match_operand:P 3 "" "")]
- UNSPEC_PLT_PCREL))]
+   (unspec_volatile:P [(match_operand:P 1 "" "")
+   (match_operand:P 2 "symbol_ref_operand" "s")
+   (match_operand:P 3 "" "")]
+  UNSPECV_PLT_PCREL))]
   "HAVE_AS_PLTSEQ && TARGET_ELF
&& rs6000_pcrel_p (cfun)"
 {

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] make PLT loads volatile

2020-03-12 Thread Alan Modra via Gcc-patches
On Thu, Mar 12, 2020 at 11:57:17AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Mar 12, 2020 at 01:18:50PM +1030, Alan Modra wrote:
> > With lazy PLT resolution the first load of a PLT entry may be a value
> > pointing at a resolver stub.  gcc's loop processing can result in the
> > PLT load in inline PLT calls being hoisted out of a loop in the
> > mistaken idea that this is an optimisation.  It isn't.  If the value
> > hoisted was that for a resolver stub then every call to that function
> > in the loop will go via the resolver, slowing things down quite
> > dramatically.
> > 
> > The PLT really is volatile, so teach gcc about that.
> 
> It would be nice if we could keep it cached after it has been resolved
> once, this has potential for regressing performance if we don't?  And
> LD_BIND_NOW should keep working just as fast as it is now, too?

Using a call-saved register to cache a load out of the PLT looks
really silly when the inline PLT call is turned back into a direct
call by the linker.  You end up with an unnecessary save and restore
of the register, plus copies from the register to r12.  What's the
chance of someone reporting that as a gcc "bug"?  :-)  Then there's
the possibility that shortening the number of instructions between two
calls of a small function runs into stalls.

How can we teach gcc about these unknowns?  ie. How to weight use of a
call-saved register to cache PLT loads against other possible uses of
that register in a loop?  It's quite likely not a good use, even when
gcc knows the PLT entry has been resolved..  Which means some gcc
infrastructure would be needed to do this sensibly and without the
necessary infrastructure, I think gcc hoisting a PLT load out of a
loop should never be done.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] make PLT loads volatile

2020-03-13 Thread Alan Modra via Gcc-patches
On Fri, Mar 13, 2020 at 10:40:38AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Mar 13, 2020 at 10:06:01AM +1030, Alan Modra wrote:
> > On Thu, Mar 12, 2020 at 11:57:17AM -0500, Segher Boessenkool wrote:
> > > On Thu, Mar 12, 2020 at 01:18:50PM +1030, Alan Modra wrote:
> > > > With lazy PLT resolution the first load of a PLT entry may be a value
> > > > pointing at a resolver stub.  gcc's loop processing can result in the
> > > > PLT load in inline PLT calls being hoisted out of a loop in the
> > > > mistaken idea that this is an optimisation.  It isn't.  If the value
> > > > hoisted was that for a resolver stub then every call to that function
> > > > in the loop will go via the resolver, slowing things down quite
> > > > dramatically.
> > > > 
> > > > The PLT really is volatile, so teach gcc about that.
> > > 
> > > It would be nice if we could keep it cached after it has been resolved
> > > once, this has potential for regressing performance if we don't?  And
> > > LD_BIND_NOW should keep working just as fast as it is now, too?
> > 
> > Using a call-saved register to cache a load out of the PLT looks
> > really silly
> 
> Who said anything about using call-saved registers?  GCC will usually
> make a stack slot for this, and only use a non-volatile register when
> that is profitable.  (I know it is a bit too aggressive with it, but
> that is a generic problem).

Using a stack slot comes about due to hoisting then running out of
call-saved registers in the loop.  Score another reason not to hoist
PLT loads.

> > when the inline PLT call is turned back into a direct
> > call by the linker.
> 
> Ah, so yeah, for direct calls we do not want this.  I was thinking this
> was about indirect calls (via a bctrl that is), dunno how I got that
> misperception.  Sorry.
> 
> What is this like for indirect calls (at C level)?  Does your patch do
> anything to those?

No effect at all.  To put your mind at rest on this point you can
verify quite easily by noticing that UNSPECV_PLT* is only generated in
rs6000_longcall_ref, and calls to that function are conditional on
GET_CODE (func_desc) == SYMBOL_REF.

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] PR94145, make PLT loads volatile

2020-03-22 Thread Alan Modra via Gcc-patches
On Wed, Mar 18, 2020 at 04:53:59PM -0500, Segher Boessenkool wrote:
> Could you please send a new patch (could be the same patch even) that
> is easier to review for me?

The PLT is volatile.  On PowerPC it is a bss style section which the
dynamic loader initialises to point at resolver stubs (called glink on
PowerPC64) to support lazy resolution of function addresses.  The
first call to a given function goes via the dynamic loader symbol
resolver, which updates the PLT entry for that function and calls the
function.  The second call, if there is one and we don't have a
multi-threaded race, will use the updated PLT entry and thus avoid
the relatively slow symbol resolver path.

Calls via the PLT are like calls via a function pointer, except that
no initialised function pointer is volatile like the PLT.  All
initialised function pointers are resolved at program startup to point
at the function or are left as NULL.  There is no support for lazy
resolution of any user visible function pointer.

So why does any of this matter to gcc?  Well, normally the PLT call
mechanism happens entirely behind gcc's back, but since we implemented
inline PLT calls (effectively putting the PLT code stub that loads the
PLT entry inline and exposing that code sequence to scheduling), the load of
the PLT entry is visible to gcc.  That load then is subject to gcc
optimization, for example in

/* -S -mcpu=future -mpcrel -mlongcall -O2.  */
int foo (int);
void bar (void)
{
  while (foo(0))
foo (99);
}

we see the PLT load for foo being hoisted out of the loop and stashed
in a call-saved register.  If that happens to be the first call to
foo, then the stashed value is that for the resolver stub, and every
call to foo in the loop will then go via the slow resolver path.  Not
a good idea.  Also, if foo turns out to be a local function and the
linker replaces the PLT calls with direct calls to foo then gcc has
just wasted a call-saved register.

This patch teaches gcc that the PLT loads are volatile.  The change
doesn't affect other loads of function pointers and thus has no effect
on normal indirect function calls.  Note that because the
"optimization" this patch prevents can only occur over function calls,
the only place gcc can stash PLT loads is in call-saved registers or
in other memory.  I'm reasonably confident that this change will be
neutral or positive for the "ld -z now" case where the PLT is not
volatile, in code where there is any register pressure.  Even if gcc
could be taught to recognise cases where the PLT is resolved, you'd
need to discount use of registers to cache PLT loads by some factor
involving the chance that those calls would be converted to direct
calls..

PR target/94145
* config/rs6000/rs6000.c (rs6000_longcall_ref): Use unspec_volatile
for PLT16_LO and PLT_PCREL.
* config/rs6000/rs6000.md (UNSPEC_PLT16_LO, UNSPEC_PLT_PCREL): Remove.
(UNSPECV_PLT16_LO, UNSPECV_PLT_PCREL): Define.
(pltseq_plt16_lo_<mode>, pltseq_plt_pcrel): Use unspec_volatile.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 07f7cf516ba..68046fdb5ee 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19274,8 +19274,9 @@ rs6000_longcall_ref (rtx call_ref, rtx arg)
   if (rs6000_pcrel_p (cfun))
{
  rtx reg = gen_rtx_REG (Pmode, regno);
- rtx u = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, base, call_ref, arg),
- UNSPEC_PLT_PCREL);
+ rtx u = gen_rtx_UNSPEC_VOLATILE (Pmode,
+  gen_rtvec (3, base, call_ref, arg),
+  UNSPECV_PLT_PCREL);
  emit_insn (gen_rtx_SET (reg, u));
  return reg;
}
@@ -19294,8 +19295,9 @@ rs6000_longcall_ref (rtx call_ref, rtx arg)
   rtx reg = gen_rtx_REG (Pmode, regno);
   rtx hi = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, base, call_ref, arg),
   UNSPEC_PLT16_HA);
-  rtx lo = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, reg, call_ref, arg),
-  UNSPEC_PLT16_LO);
+  rtx lo = gen_rtx_UNSPEC_VOLATILE (Pmode,
+   gen_rtvec (3, reg, call_ref, arg),
+   UNSPECV_PLT16_LO);
   emit_insn (gen_rtx_SET (reg, hi));
   emit_insn (gen_rtx_SET (reg, lo));
   return reg;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad88b6783af..5a8e9de670b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -148,8 +148,6 @@
UNSPEC_SI_FROM_SF
UNSPEC_PLTSEQ
UNSPEC_PLT16_HA
-   UNSPEC_PLT16_LO
-   UNSPEC_PLT_PCREL
   ])
 
 ;;
@@ -178,6 +176,8 @@
UNSPECV_MTFSB1  ; Set FPSCR Field bit to 1
UNSPECV_SPLIT_STACK_RETURN   ; A camouflaged return
UNSPECV_SPEC_BARRIER ; Speculation barrier
+   UNSPECV_PLT16_LO
+   UNSPECV_PLT_PCREL
   ])
 
; The three different kinds of epilogue.

[RS6000] PR96493, powerpc local call linkage failure

2020-08-06 Thread Alan Modra via Gcc-patches
This corrects current_file_function_operand, an operand predicate used
to determine whether a symbol_ref is safe to use with the local_call
patterns.  Calls between pcrel and non-pcrel code need to go via
linker stubs.  In the case of non-pcrel code to pcrel the stub saves
r2 but there needs to be a nop after the branch for the r2 restore.
So the local_call patterns can't be used there.  For pcrel code to
non-pcrel the local_call patterns could still be used, but I thought
it better to not use them since the call isn't direct.  Code generated
by the corresponding call_nonlocal_aix for pcrel is identical anyway.

Incidentally, without the TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL test,
gcc.c-torture/compile/pr37433.c and pr37433-1.c ICE.  Also, if you
make the test more strict by disallowing an op without a
SYMBOL_REF_DECL then a bunch of go and split-stack tests fail.  That's
because a prologue call to __morestack can't have a following nop.
(__morestack calls its caller at a fixed offset from the __morestack
call!)

Bootstrapped and regression tested powerpc64le-linux.  OK to apply?

Should I rename current_file_function_operand to something more
meaningful before committing?  direct_local_call_operand perhaps?

gcc/
PR target/96493
* config/rs6000/predicates.md (current_file_function_operand): Don't
accept functions that differ in r2 usage.
gcc/testsuite/
* gcc.target/powerpc/pr96493.c: New file.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index afb7c02f129..2709e46f7e5 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1051,7 +1051,12 @@
&& !((DEFAULT_ABI == ABI_AIX
  || DEFAULT_ABI == ABI_ELFv2)
 && (SYMBOL_REF_EXTERNAL_P (op)
-|| SYMBOL_REF_WEAK (op)))")))
+|| SYMBOL_REF_WEAK (op)))
+   && !(DEFAULT_ABI == ABI_ELFv2
+&& SYMBOL_REF_DECL (op) != NULL
+&& TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL
+&& (rs6000_fndecl_pcrel_p (SYMBOL_REF_DECL (op))
+!= rs6000_pcrel_p (cfun)))")))
 
 ;; Return 1 if this operand is a valid input for a move insn.
 (define_predicate "input_operand"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96493.c b/gcc/testsuite/gcc.target/powerpc/pr96493.c
new file mode 100644
index 000..3ee0fc9fe45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96493.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-mdejagnu-cpu=powerpc64 -O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+
+/* Test local calls between pcrel and non-pcrel code.
+
+   Despite the cpu=power10 option, the code generated here should just
+   be plain powerpc64, even the necessary linker stubs.  */
+
+int one = 1;
+
+int __attribute__ ((target("cpu=power8"),noclone,noinline))
+p8_func (int x)
+{
+  return x - one;
+}
+
+int __attribute__ ((target("cpu=power10"),noclone,noinline))
+p10_func (int x)
+{
+  return p8_func (x);
+}
+
+int
+main (void)
+{
+  return p10_func (1);
+}

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] PR96493, powerpc local call linkage failure

2020-08-06 Thread Alan Modra via Gcc-patches
On Thu, Aug 06, 2020 at 05:34:03PM -0500, Segher Boessenkool wrote:
> > +/* { dg-do run } */
> > +/* { dg-options "-mdejagnu-cpu=powerpc64 -O2" } */
> 
> That is not a -mcpu= value you should ever use.  Please just pick a real
> existing CPU, maybe p7 or p8 since this requires ELFv2 anyway?  Or, what
> does it need here?  It isn't clear to me.  But you don't want a pseudo-
> POWER3 with ELFv2 :-)

What I was trying to say by using cpu=powerpc64 is that this test has
minimal requirements on the cpu level needed to run it.
That's all.  Changed to power8 and committed.

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Put call cookie back in AIX/ELFv2 call patterns

2020-03-27 Thread Alan Modra via Gcc-patches
-mlongcall -mno-pltseq is supposed to emit long calls by using
indirect calls.  It differs from -mlongcall -mpltseq in that the
function addresses are not placed in the PLT and thus lazy PLT
resolution is not available, affecting programs that dlopen shared
libraries.

In the case of -mcpu=future -mpcrel -mlongcall -mno-pltseq we see an
indirect call being generated, but combine merrily optimises the
sequence back to a direct call.  call_indirect_pcrel is enough like
call_nonlocal_aix that this can happen.
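
The kind of translation unit where this shows up is as small as the
following sketch (not the original reproducer; "ext_fn" and "caller"
are invented names):

/* -S -mcpu=future -mpcrel -mlongcall -mno-pltseq.  */
extern void ext_fn (void);

void caller (void)
{
  /* Expected to remain an indirect call with -mno-pltseq; combine was
     folding the inline sequence back into a direct call.  */
  ext_fn ();
}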

This patch puts the call cookie (removed by git commit f90f960ca8)
back in the call rtl in order to disable the optimisation for long
calls.  When that is done for call_local_aix the pattern becomes the
same as call_local32/64, so I merged them.  The only difference
besides mode between call_local32 and call_local64, dating back to
1998 commit a260abc996, is that call_local64 has TARGET_64BIT in the
predicate.  That alone doesn't seem reason enough to need separate
patterns; The P mode iterator selects DI on TARGET_64BIT anyway.

Bootstrapped and regression tested powerpc64le-linux and
powerpc64-linux biarch all langs.  OK for master?

* config/rs6000/rs6000.c (rs6000_call_aix): Emit cookie to pattern.
(rs6000_indirect_call_template_1): Adjust to suit.
* config/rs6000/rs6000.md (call_local): Merge call_local32,
call_local64, and call_local_aix.
(call_value_local): Similarly.
(call_nonlocal_aix, call_value_nonlocal_aix): Adjust rtl to suit,
and disable pattern when CALL_LONG.
(call_indirect_aix, call_value_indirect_aix): Adjust rtl.
(call_indirect_elfv2, call_indirect_pcrel): Likewise.
(call_value_indirect_elfv2, call_value_indirect_pcrel): Likewise.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 13851d12551..2b6613bcb7e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -13621,7 +13621,7 @@ rs6000_indirect_call_template_1 (rtx *operands, unsigned int funop,
   if (DEFAULT_ABI == ABI_AIX)
 s += sprintf (s,
  "l%s 2,%%%u\n\t",
- ptrload, funop + 2);
+ ptrload, funop + 3);
 
   /* We don't need the extra code to stop indirect call speculation if
  calling via LR.  */
@@ -13675,12 +13675,12 @@ rs6000_indirect_call_template_1 (rtx *operands, unsigned int funop,
sprintf (s,
 "b%%T%ul\n\t"
 "l%s 2,%%%u(1)",
-funop, ptrload, funop + 3);
+funop, ptrload, funop + 4);
   else
sprintf (s,
 "beq%%T%ul-\n\t"
 "l%s 2,%%%u(1)",
-funop, ptrload, funop + 3);
+funop, ptrload, funop + 4);
 }
   else if (DEFAULT_ABI == ABI_ELFv2)
 {
@@ -13688,12 +13688,12 @@ rs6000_indirect_call_template_1 (rtx *operands, unsigned int funop,
sprintf (s,
 "b%%T%ul\n\t"
 "l%s 2,%%%u(1)",
-funop, ptrload, funop + 2);
+funop, ptrload, funop + 3);
   else
sprintf (s,
 "beq%%T%ul-\n\t"
 "l%s 2,%%%u(1)",
-funop, ptrload, funop + 2);
+funop, ptrload, funop + 3);
 }
   else
 {
@@ -24304,7 +24304,7 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
   rtx toc_restore = NULL_RTX;
   rtx func_addr;
   rtx abi_reg = NULL_RTX;
-  rtx call[4];
+  rtx call[5];
   int n_call;
   rtx insn;
   bool is_pltseq_longcall;
@@ -24445,7 +24445,8 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
   call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);
   if (value != NULL_RTX)
 call[0] = gen_rtx_SET (value, call[0]);
-  n_call = 1;
+  call[1] = gen_rtx_USE (VOIDmode, cookie);
+  n_call = 2;
 
   if (toc_load)
 call[n_call++] = toc_load;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 5a8e9de670b..dcccb03f376 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10459,11 +10459,11 @@
 ;; variable argument function.  It is > 0 if FP registers were passed
 ;; and < 0 if they were not.
 
-(define_insn "*call_local32"
-  [(call (mem:SI (match_operand:SI 0 "current_file_function_operand" "s,s"))
+(define_insn "*call_local"
+  [(call (mem:SI (match_operand:P 0 "current_file_function_operand" "s,s"))
 (match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,n"))
-   (clobber (reg:SI LR_REGNO))]
+   (clobber (reg:P LR_REGNO))]
   "(INTVAL (operands[2]) & CALL_LONG) == 0"
 {
   if (INTVAL (operands[2]) & CALL_V4_SET_FP_ARGS)
@@ -10472,35 +10472,19 @@
   else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
 output_asm_insn ("creqv 6,6,6", operands);
 
+  if (rs6000_pcrel_p (cfun))
+return "bl %z0@notoc";
   return (DEFAULT_ABI == ABI_V4 && flag_pic) ? "bl %z0@local" : "bl %z0";
 }
   [(set_attr "type" "branch")
(set_attr "len

Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization

2020-04-03 Thread Alan Modra via Gcc-patches
On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu via Gcc-patches wrote:
> Seems PR94393?  Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const 
> for DImode.
> I tried unsigned long like 0xabcd87654321, 0xabcd87654321 and 
> 0xc000ULL, 
> All of them are outside of loop even without my patch.  No difference with or 
> without
> Alan's patch.

Segher probably meant the part I'm working on and haven't posted yet,
fixing lots of problems with rs6000_rtx_costs.  One of the improvements
I'm aiming for is that we should be able to emit code that loads
constants from memory without following optimisation passes converting
the loads from memory to those long sequences of dependent instructions.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization

2020-04-03 Thread Alan Modra via Gcc-patches
On Fri, Apr 03, 2020 at 04:11:32PM -0500, Segher Boessenkool wrote:
> On Fri, Apr 03, 2020 at 10:43:49PM +1030, Alan Modra wrote:
> > Segher probably meant the part I'm working on and haven't posted yet,
> > fixing lots of problems with rs6000_rtx_costs.
> 
> I meant PR94393 in fact, but yeah, this is connected *everywhere* :-)
> 
> > One of the improvements
> > I'm aiming for is that we should be able to emit code that loads
> > constants from memory without following optimisation passes converting
> > the loads from memory to those long sequences of dependent instructions.
> 
> Yeah.  Looking forward to it :-)

I have a series of small patches already, the most significant so far
being the one that adds the following comment to rs6000_rtx_costs:

  "Calls from places like optabs.c:avoid_expensive_constant will come
   here with OUTER_CODE set to an operation such as AND with X being a
   CONST_INT or other CONSTANT_P type.  This will be compared against
   set_src_cost, where we'll come here with OUTER_CODE as SET and X
   the same constant.

   Calls from places like default_noce_conversion_profitable_p will
   come here via seq_cost and pass the pattern of a SET insn in X.
   The SET isn't handled here so *TOTAL will remain as
   COSTS_N_INSNS(1) multiplied by the number of words being set.
   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
   will next see the set_src.  Continuing the example of an AND, this
   might be an AND of two other regs.  This AND should cost zero here
   since the insn cost has already been counted.  The overall cost
   value should be comparable to rs6000_insn_cost."

It pays to figure out what you need to do before doing anything.  :-)

Those two paragraphs will result in quite a few changes.  The first
one means that, for example, the rs6000_is_valid_and_mask test belongs
under case CONST_INT as
  || (outer_code == AND
  && rs6000_is_valid_and_mask (x, mode))
not under case AND.

The second paragraph says we are costing practically all operations
too highly.

I'm still in analysis mode, worried about whether gcc generates deep
rtl expressions to pass to rtx_cost.  I have a vague recollection of
seeing that years ago, but it looks like most places with anything
complex generate a sequence of insns and cost the sequence.  If we do
have expressions like (set (reg1) (and (plus (reg2) cst1) cst2)) then
rs6000_rtx_costs should recognise that AND as costing an extra insn.
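
As a sketch of that "generate a sequence and cost it" pattern (not
from this mail; "cheaper_than" is an invented name, and the calls
assume gcc's internal start_sequence/emit_move_insn/get_insns/
end_sequence/seq_cost interfaces):

static bool
cheaper_than (rtx target, rtx src, unsigned int limit, bool speed)
{
  /* Generate the candidate insns and cost the whole sequence rather
     than handing rtx_cost one deep expression.  */
  start_sequence ();
  emit_move_insn (target, src);
  rtx_insn *seq = get_insns ();
  end_sequence ();
  return seq_cost (seq, speed) < limit;
}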

-- 
Alan Modra
Australia Development Lab, IBM


set_rtx_cost used wrongly, should be set_src_cost

2020-04-13 Thread Alan Modra via Gcc-patches
I believe set_rtx_cost is meant to handle a SET, not a PLUS as is
passed in these two locations.  Using the proper function for a PLUS
doesn't make a huge difference: the only arg change to rtx_cost of any
consequence is outer_code of SET rather than INSN.  A mode of
word_mode rather than VOIDmode makes no difference at all since the
mode is taken from the PLUS.  An opno of 1 rather than 4 also doesn't
change anything since the only backend that does anything with opno
(besides pass it back to a recursive rtx_cost call) is nios2, and
there "opno == 0" is the only use of opno.

Bootstrapped and regression tested powerpc64le-linux and x86_64-linux.
OK for next stage1?

* tree-ssa-reassoc.c (optimize_range_tests_to_bit_test): Replace
set_rtx_cost with set_src_cost.
* tree-switch-conversion.c (bit_test_cluster::emit): Likewise.

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index ec1c033a2cf..af8faf2e6ea 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -3208,8 +3208,9 @@ optimize_range_tests_to_bit_test (enum tree_code opcode, int first, int length,
  HOST_WIDE_INT m = tree_to_uhwi (lowi);
  rtx reg = gen_raw_REG (word_mode, 1);
  bool speed_p = optimize_bb_for_speed_p (gimple_bb (stmt));
- cost_diff = set_rtx_cost (gen_rtx_PLUS (word_mode, reg,
- GEN_INT (-m)), speed_p);
+ cost_diff = set_src_cost (gen_rtx_PLUS (word_mode, reg,
+ GEN_INT (-m)),
+   word_mode, speed_p);
  rtx r = immed_wide_int_const (mask, word_mode);
  cost_diff += set_src_cost (gen_rtx_AND (word_mode, reg, r),
 word_mode, speed_p);
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index bf910dd62b5..4b435941d12 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1541,8 +1541,9 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
   HOST_WIDE_INT m = tree_to_uhwi (minval);
   rtx reg = gen_raw_REG (word_mode, 1);
   bool speed_p = optimize_insn_for_speed_p ();
-  cost_diff = set_rtx_cost (gen_rtx_PLUS (word_mode, reg,
- GEN_INT (-m)), speed_p);
+  cost_diff = set_src_cost (gen_rtx_PLUS (word_mode, reg,
+ GEN_INT (-m)),
+   word_mode, speed_p);
   for (i = 0; i < count; i++)
{
  rtx r = immed_wide_int_const (test[i].mask, word_mode);

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH toplevel] libctf: new testsuite

2021-01-05 Thread Alan Modra via Gcc-patches
On Tue, Jan 05, 2021 at 03:25:10PM +, Nick Alcock wrote:
> This enables 'make libctf-check', used by a new libctf testsuite in
> binutils.
> 
> 2021-01-05  Nick Alcock  
> 
>   * Makefile.def (libctf): No longer no_check.  Checking depends on
>   all-ld.
>   * Makefile.in: Regenerated.
> 
> ---
> 
>  Makefile.def  |   4 +-
>  Makefile.in   |  13 +
> 
> This is a stripped-down top-level-only subset of commit 
> c59e30ed1727135f8efb79890f2c458f73709757 in binutils-gdb.git.  (Because
> it is identical to what has already landed in binutils, it should apply
> without trouble in syncs back to there.)
> 
> I don't have permission to push this: Alan has offered to do so.

It doesn't apply due to gcc missing binutils 87279e3cef5b2c5 changes
too.  I could fix that easily enough but I'm going to ask that you
post a combined patch to bring the gcc repo up to date with any libctf
changes.

-- 
Alan Modra
Australia Development Lab, IBM


  1   2   >