[PR94092] Re: [RFC] test builtin ratio for loop distribution

2021-02-19 Thread Alexandre Oliva
Here's an improved version of the patch.  Regstrapped on
x86_64-linux-gnu, with and without a patchlet that moved multi-pieces
ahead of setmem, and also tested with riscv32-elf.

Is it ok to install?  Or should it wait for stage1?


[PR94092] introduce try store by multiple pieces

From: Alexandre Oliva 

The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length clearing memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch handles variable lengths with multiple conditional
power-of-2-constant-sized stores-by-pieces, so as to reduce the
overhead of length compares.

It also changes the last copy-prop pass into ccp, so that pointer
alignment and length's nonzero bits are detected and made available
for the expander, even for ldist-introduced SSA_NAMEs.


for  gcc/ChangeLog

PR tree-optimization/94092
* builtins.c (try_store_by_multiple_pieces): New.
(expand_builtin_memset_args): Use it.  If target_char_cast
fails, proceed as for non-constant val.  Pass len's ctz to...
* expr.c (clear_storage_hints): ... this.  Try store by
multiple pieces after setmem.
(clear_storage): Adjust.
* expr.h (clear_storage_hints): Likewise.
(try_store_by_multiple_pieces): Declare.
* passes.def: Replace the last copy_prop to ccp.
---
 gcc/builtins.c |  182 ++--
 gcc/expr.c |9 ++-
 gcc/expr.h |   13 
 gcc/passes.def |5 +-
 4 files changed, 197 insertions(+), 12 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 0aed008687ccc..05b14f2a3f6d3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6600,6 +6600,166 @@ expand_builtin_memset (tree exp, rtx target, 
machine_mode mode)
   return expand_builtin_memset_args (dest, val, len, target, mode, exp);
 }
 
+/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
+   Return TRUE if successful, FALSE otherwise.  TO is assumed to be
+   aligned at an ALIGN-bits boundary.  LEN must be a multiple of
+   1<= 0);
+
+  if (val)
+valc = 1;
+
+  /* Bits more significant than TST_BITS are part of the shared prefix
+ in the binary representation of both min_len and max_len.  Since
+ they're identical, we don't need to test them in the loop.  */
+  int tst_bits = (max_bits != min_bits ? max_bits
+ : floor_log2 (max_len ^ min_len));
+
+  /* Check whether it's profitable to start by storing a fixed BLKSIZE
+ bytes, to lower max_bits.  In the unlikely case of a constant LEN
+ (implied by identical MAX_LEN and MIN_LEN), we want to issue a
+ single store_by_pieces, but otherwise, select the minimum multiple
+ of the ALIGN (in bytes) and of the MCD of the possible LENs, that
+ brings MAX_LEN below TST_BITS, if that's lower than min_len.  */
+  unsigned HOST_WIDE_INT blksize;
+  if (max_len > min_len)
+{
+  unsigned HOST_WIDE_INT alrng = MAX (HOST_WIDE_INT_1U << ctz_len,
+ align / BITS_PER_UNIT);
+  blksize = max_len- (HOST_WIDE_INT_1U << tst_bits) + alrng;
+  blksize &= ~(alrng - 1);
+}
+  else if (max_len == min_len)
+blksize = max_len;
+  else
+gcc_unreachable ();
+  if (min_len >= blksize)
+{
+  min_len -= blksize;
+  min_bits = floor_log2 (min_len);
+  max_len -= blksize;
+  max_bits = floor_log2 (max_len);
+
+  tst_bits = (max_bits != min_bits ? max_bits
+: floor_log2 (max_len ^ min_len));
+}
+  else
+blksize = 0;
+
+  /* Check that we can use store by pieces for the maximum store count
+ we may issue (initial fixed-size block, plus conditional
+ power-of-two-sized from max_bits to ctz_len.  */
+  unsigned HOST_WIDE_INT xlenest = blksize;
+  if (max_bits >= 0)
+xlenest += ((HOST_WIDE_INT_1U << max_bits) * 2
+   - (HOST_WIDE_INT_1U << ctz_len));
+  if (!can_store_by_pieces (xlenest, builtin_memset_read_str,
+   &valc, align, true))
+return false;
+
+  rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode);
+  void *constfundata;
+  if (val)
+{
+  constfun = builtin_memset_gen_str;
+  constfundata = val = force_reg (TYPE_MODE (unsigned_char_type_node),
+ val);
+}
+  else
+{
+  constfun = builtin_memset_read_str;
+  constfundata = &valc;
+}
+
+  rtx ptr =

Re: [PATCH] PR fortran/99147 - Sanitizer detects heap-use-after-free in gfc_add_flavor

2021-02-19 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

It looks 'obvious' to me too and is certainly OK for master.

Thanks

Paul


On Thu, 18 Feb 2021 at 21:30, Harald Anlauf via Fortran 
wrote:

> Dear all,
>
> the PR reports an issue detected with an ASAN instrumented compiler,
> which can also be verified with valgrind.  It appears that the state
> of gfc_new_block could be such that it should not be dereferenced.
> Reversing the order of condition evaluation helped.
>
> I failed to find out why this should happen, but then other places
> in the code put dereferences of gfc_new_block behind other checks.
> Simple things like initializing gfc_new_block with NULL in decl.c
> did not help.
>
> Regtested on x86_64-pc-linux-gnu.  No testcase added since the issue
> can be found only with an instrumented compiler or valgrind.
>
> I consider the patch to be obvious and trivial, but post it here
> in case somebody wants to dig deeper.
>
> OK for master?
>
> Thanks,
> Harald
>
>
> PR fortran/99147 - Sanitizer detects heap-use-after-free in gfc_add_flavor
>
> Reverse order of conditions to avoid invalid read.
>
> gcc/fortran/ChangeLog:
>
> * symbol.c (gfc_add_flavor): Reverse order of conditions.
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [PATCH 0/2] IBM Z: Fix long double <-> DFP conversions

2021-02-19 Thread Andreas Krebbel via Gcc-patches
On 2/18/21 1:57 PM, Ilya Leoshkevich wrote:
> This series fixes PR99134.  Patch 1 is factored out from the pending
> [1], patch 2 is the actual fix.  Bootstrapped and regtested on
> s390x-redhat-linux.  Ok for master?
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
> 
> Ilya Leoshkevich (2):
>   IBM Z: Improve FPRX2 <-> TF conversions
>   IBM Z: Fix long double <-> DFP conversions

Ok. Thanks!

Andreas


[PATCH] aarch64: Introduce prefer_advsimd_autovec to GCC 10

2021-02-19 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch introduces the prefer_advsimd_autovec internal tune flag that's 
already available on the GCC 8 and 9 branches.
It allows a CPU tuning to specify that it prefers Advanced SIMD for 
autovectorisation rather than SVE.
In GCC 10 onwards this can be easily adjusted through the 
aarch64_autovec_preference param in the options override hook.
The neoversev1_tunings struct makes use of this tuning flag

Bootstrapped and tested on aarch64-none-linux-gnu.
Confirmed that an --param aarch64-autovec-preference can override the CPU 
setting if the user really wishes to.

Pushing to the GCC 10 branch.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (prefer_advsimd_autovec): 
Define.
* config/aarch64/aarch64.c (neoversev1_tunings): Use it.
(aarch64_override_options_internal): Adjust aarch64_autovec_preference
param when prefer_advsimd_autovec is enabled.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd_autovec_only_1.c: New test.


asimd-vec-10.patch
Description: asimd-vec-10.patch


*PING**2 Re: [Patch] Fortran: Fix coarray handling for gfc_dep_resolver [PR99010] (was: Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913]

2021-02-19 Thread Tobias Burnus



On 09.02.21 12:52, Tobias Burnus wrote:

Hi all, hi Thomas,

On 02.02.21 18:15, Tobias Burnus wrote:

I think I will do a combination: If 'identical' is true, I think I
cannot remove it. If it is false, it can be identical or nonoverlapping
– which makes sense.


Well, it turned out that coarray scalars did not work; for them,
the array ref consists only of the coindexed access: falling through
then triggered as fin_dep == GFC_DEP_ERROR.

To be more careful, I also removed the 'identical &&' check such that
the first loop is now always triggered if coarray != SINGLE not only
if identical – I am not sure whether it is really needed or whether
falling though is the better solution (with updating the comment).

I would be happy if someone could carefully check the logic –
in the details, it is rather confusing.

OK?

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [PATCH 0/2] RISC-V big endian support

2021-02-19 Thread Kito Cheng via Gcc-patches
Hi Marcus:

> I could also disable the -mlittle-endian and -mbig-endian options if
> the binutils is too old, but it's questionable if getting a "I don't
> understand -mbig-endian" from gcc is more useful than getting it from
> gas.  And it would be more work for the user to fix it since then they
> would have to also rebuild gcc after upgrading binutils.

Sounds reasonable to me, I am ok with keeping it as it is without
binutils change.

I tried to run with gcc testsuite on spike + pk, seems got a bunch
extra fail cases,
could you take a look for that?

   = Summary of gcc testsuite =
   | # of unexpected case / # of unique unexpected case
   |  gcc |  g++ | gfortran |
rv64gc/   lp64/ medlow |  476 /   109 |  131 /25 |  - |
Makefile:914: recipe for target 'report-gcc-newlib' failed

I've setup test env via riscv-gnu-toolchain, you can reproduce by
following command:

$ git clone g...@github.com:riscv/riscv-gnu-toolchain.git -b big-endian
riscv-gnu-toolchain-be
$ cd riscv-gnu-toolchain-be
$ mkdir build && cd build
$ ../configure --prefix=`pwd`/install --with-arch=rv64gc
--with-multilib-generator="rv64gc-lp64--"
$ make report SIM=spike -j`nproc`


On Sat, Jan 30, 2021 at 3:54 AM Marcus Comstedt  wrote:
>
>
> Hi Kito,
>
> Kito Cheng  writes:
>
> > You can add a check in configure.ac and config.in to detect whether
> > binutils is supported or not.
> > here is example, search HAVE_AS_MISA_SPEC in this patch:
> > https://github.com/gcc-mirror/gcc/commit/4b81528241ca682025d92558ff6aeec91dafdca8
>
> Ok, but I specifically _don't_ want to require a newer binutils for
> the little endian triplets (unless there is also some other RISC-V
> change meaning that the newer binutils is needed anyway, in which case
> this point becomes moot).
>
> I could check the binutils version in the case of big endian triplets
> only, but since older binutils can't be built for those triplets
> anyway (due to config.sub), I don't think it would do much.
>
> I could also disable the -mlittle-endian and -mbig-endian options if
> the binutils is too old, but it's questionable if getting a "I don't
> understand -mbig-endian" from gcc is more useful than getting it from
> gas.  And it would be more work for the user to fix it since then they
> would have to also rebuild gcc after upgrading binutils.
>
>
>   // Marcus
>
>


Re: [PATCH] run -Wnonnull later (PR 87489)

2021-02-19 Thread Franz Sirl

Am 2021-02-01 um 01:31 schrieb Martin Sebor via Gcc-patches:

The initial -Wnonnull implementation in the middle end took place
too late in the pipeline (just before expansion), and as a result
was prone to false positives (bug 78817).  In an attempt to avoid
the worst of those, the warning was moved to the ccp2 pass in
r243874.  However, as the test case in PR 87489 shows, this is
in turn too early and causes other false positives as a result.

A few experiments with running the warning later suggest that
just before the mergephi2 pass is a good point to avoid this class
of false positives without causing any regressions or introducing
any new warnings either in Glibc or in Binutils/GDB.

Since PR 87489 is a GCC 8-11 regression I propose to make this
change on the trunk as well as on the release branches.


Hi Martin,

I tested your patch and it showed also one more warning for this 
testcase with -O2 -Wnonnull:


class b {
public:
  long c();
};
class B {
public:
  B() : f() {}
  b *f;
};
long d, e;
class g : B {
public:
  void h() {
long a = f->c();
d = e = a;
  }
};
class j {
  j();
  g i;
};
j::j() { i.h(); }

I hope cvise didn't minimize too much, but at least in the original much 
larger code the warning looks reasonable too.


Franz


[PATCH] tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]

2021-02-19 Thread Jakub Jelinek via Gcc-patches
Hi!

The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR
labels are always the first label if there is more than one label.

When merging blocks, we don't honor that though.
On the following testcase, we try to merge blocks:
 [count: 0]:
:
S::~S (&s);

and
 [count: 0]:
:
resx 1

where  is landing pad and  is FORCED_LABEL.  And the code puts
the FORCED_LABEL before the landing pad label, violating the verification
requirements.

The following patch fixes it by moving the FORCED_LABEL after the
DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-19  Jakub Jelinek  

PR ipa/99034
* tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing
pad or non-local label, put FORCED_LABELs from bb b after that label
rather than before it.

* g++.dg/opt/pr99034.C: New test.

--- gcc/tree-cfg.c.jj   2021-02-18 16:20:57.053826485 +0100
+++ gcc/tree-cfg.c  2021-02-18 19:15:48.436526437 +0100
@@ -2124,7 +2124,17 @@ gimple_merge_blocks (basic_block a, basi
  if (FORCED_LABEL (label))
{
  gimple_stmt_iterator dest_gsi = gsi_start_bb (a);
- gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT);
+ tree first_label = NULL_TREE;
+ if (!gsi_end_p (dest_gsi))
+   if (glabel *first_label_stmt
+   = dyn_cast  (gsi_stmt (dest_gsi)))
+ first_label = gimple_label_label (first_label_stmt);
+ if (first_label
+ && (DECL_NONLOCAL (first_label)
+ || EH_LANDING_PAD_NR (first_label) != 0))
+   gsi_insert_after (&dest_gsi, stmt, GSI_NEW_STMT);
+ else
+   gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT);
}
  /* Other user labels keep around in a form of a debug stmt.  */
  else if (!DECL_ARTIFICIAL (label) && MAY_HAVE_DEBUG_BIND_STMTS)
--- gcc/testsuite/g++.dg/opt/pr99034.C.jj   2021-02-18 19:05:27.205347304 
+0100
+++ gcc/testsuite/g++.dg/opt/pr99034.C  2021-02-18 19:07:32.352973196 +0100
@@ -0,0 +1,23 @@
+// PR ipa/99034
+// { dg-do compile }
+// { dg-options "-O2" }
+
+void *b[5];
+void foo (void);
+struct S { ~S (); };
+
+static inline void
+__attribute__((always_inline))
+bar (int d)
+{
+  S s;
+  while (d)
+foo ();
+}
+
+void
+baz (void)
+{
+  bar (2);
+  __builtin_setjmp (b);
+}

Jakub



Re: [committed] jit: fix ICE on BUILT_IN_TRAP [PR99126]

2021-02-19 Thread Andrea Corallo via Gcc-patches
David Malcolm via Gcc-patches  writes:

> I tried several approaches to fixing this; this seemed the
> least invasive.

Hi Dave,

thanks for fixing this.

I do have a trivial question: are we guaranteed that the middle-end will
not try to add any build-in other than a trap?

Regards

  Andrea


Re: [committed] jit: fix ICE on BUILT_IN_TRAP [PR99126]

2021-02-19 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> David Malcolm via Gcc-patches  writes:
>
>> I tried several approaches to fixing this; this seemed the
>> least invasive.
>
> Hi Dave,
>
> thanks for fixing this.
>
> I do have a trivial question: are we guaranteed that the middle-end will
> not try to add any build-in other than a trap?

Ah also, shouldn't the fix be also back-ported?

Thanks

  Andrea


ARM patch ping

2021-02-19 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565225.html
patch - PR98998 P1 fix.

Thanks

Jakub



Re: [PATCH] tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]

2021-02-19 Thread Richard Biener
On Fri, 19 Feb 2021, Jakub Jelinek wrote:

> Hi!
> 
> The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR
> labels are always the first label if there is more than one label.
> 
> When merging blocks, we don't honor that though.
> On the following testcase, we try to merge blocks:
>  [count: 0]:
> :
> S::~S (&s);
> 
> and
>  [count: 0]:
> :
> resx 1
> 
> where  is landing pad and  is FORCED_LABEL.  And the code puts
> the FORCED_LABEL before the landing pad label, violating the verification
> requirements.
> 
> The following patch fixes it by moving the FORCED_LABEL after the
> DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Huh, wasn't aware of such.  OK.

Richard.

> 2021-02-19  Jakub Jelinek  
> 
>   PR ipa/99034
>   * tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing
>   pad or non-local label, put FORCED_LABELs from bb b after that label
>   rather than before it.
> 
>   * g++.dg/opt/pr99034.C: New test.
> 
> --- gcc/tree-cfg.c.jj 2021-02-18 16:20:57.053826485 +0100
> +++ gcc/tree-cfg.c2021-02-18 19:15:48.436526437 +0100
> @@ -2124,7 +2124,17 @@ gimple_merge_blocks (basic_block a, basi
> if (FORCED_LABEL (label))
>   {
> gimple_stmt_iterator dest_gsi = gsi_start_bb (a);
> -   gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT);
> +   tree first_label = NULL_TREE;
> +   if (!gsi_end_p (dest_gsi))
> + if (glabel *first_label_stmt
> + = dyn_cast  (gsi_stmt (dest_gsi)))
> +   first_label = gimple_label_label (first_label_stmt);
> +   if (first_label
> +   && (DECL_NONLOCAL (first_label)
> +   || EH_LANDING_PAD_NR (first_label) != 0))
> + gsi_insert_after (&dest_gsi, stmt, GSI_NEW_STMT);
> +   else
> + gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT);
>   }
> /* Other user labels keep around in a form of a debug stmt.  */
> else if (!DECL_ARTIFICIAL (label) && MAY_HAVE_DEBUG_BIND_STMTS)
> --- gcc/testsuite/g++.dg/opt/pr99034.C.jj 2021-02-18 19:05:27.205347304 
> +0100
> +++ gcc/testsuite/g++.dg/opt/pr99034.C2021-02-18 19:07:32.352973196 
> +0100
> @@ -0,0 +1,23 @@
> +// PR ipa/99034
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +void *b[5];
> +void foo (void);
> +struct S { ~S (); };
> +
> +static inline void
> +__attribute__((always_inline))
> +bar (int d)
> +{
> +  S s;
> +  while (d)
> +foo ();
> +}
> +
> +void
> +baz (void)
> +{
> +  bar (2);
> +  __builtin_setjmp (b);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-02-19 Thread Martin Jambor
On Thu, Feb 18 2021, Qing Zhao via Gcc-patches wrote:
> (CC’ing Kees Cook on this topic)
>
> Hi, 
>
> This is the first version of the complete patch for the new security feature 
> for GCC: 
>
> Initialize automatic variables with new first class option 
> -ftrivial-auto-var-init=[uninitialized|pattern|zero] 
> and  a new variable attribute “uninitialized” to exclude some variables from 
> automatical initialization to
> Control runtime overhead.
>
[...]
>
> **the complete patch:
>
> See the attachment.

I think you forgot to attach the patch (or at least I do not see any
attachment).

Martin


[PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-19 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch makes sure that allocno copies are not created for unordered 
modes. The testcases in the PR highlighted a case where an allocno copy 
was being created for:

(insn 121 120 123 11 (parallel [
    (set (reg:VNx2QI 217)
    (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 
0)))

    (clobber (scratch:VNx16BI))
    ]) 4750 {*vec_duplicatevnx2qi_reg}
 (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
    (nil)))

As the compiler detected that the vec_duplicate_reg pattern 
allowed the input and output operand to be of the same register class, 
it tried to create an allocno copy for these two operands, stripping 
subregs in the process. However, this meant that the copy was between 
VNx2QI and SI, which have unordered mode precisions.


So at compile time we do not know which of the two modes is smaller 
which is a requirement when updating allocno copy costs.


Regression tested on aarch64-linux-gnu.

Is this OK for trunk (and after a week backport to gcc-10) ?

Regards,
Andre


gcc/ChangeLog:
2021-02-19  Andre Vieira  

    PR rtl-optimization/98791
    * ira-conflicts.c (process_regs_for_copy): Don't create allocno 
copies for unordered modes.


gcc/testsuite/ChangeLog:
2021-02-19  Andre Vieira  

    PR rtl-optimization/98791
    * gcc.target/aarch64/sve/pr98791.c: New test.

diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 
2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0
 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool 
constraint_p,
   ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
   ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
 
-  if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2)
+  if (!allocnos_conflict_for_copy_p (a1, a2)
+ && offset1 == offset2
+ && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
+   GET_MODE_PRECISION (ALLOCNO_MODE (a2
{
  cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
 ira_curr_loop_tree_node);
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
new file mode 100644
index 
..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/98791  */
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */
+extern char a[], b[];
+short c, d;
+long *e;
+void f() {
+  for (int g; g < c; g += 1) {
+a[g] = d;
+b[g] = e[g];
+  }
+}


Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-02-19 Thread Florian Weimer via Gcc-patches
* Jeff Law via Gcc-patches:

> I'd lean towards deferring to gcc12 stage1 given the libstdc++ hack is
> in place.  That does mean that glibc will need to work around the
> instance they've stumbled over in ppc's rawmemchr.

We'll need to work around this in the glibc build, too.  I'll check if
the suggested alternative (have the alias covered by the pragma) works
there as well.

Thanks,
Florian



Re: [committed] libstdc++: Fix __thread_yield for non-gthreads targets

2021-02-19 Thread Jonathan Wakely via Gcc-patches

On 15/02/21 16:13 +, Jonathan Wakely wrote:

The __gthread_yield() function is only defined for gthreads targets, so
check _GLIBCXX_HAS_GTHREADS before using it.

Also reorder __thread_relax and __thread_yield so that the former can
use the latter instead of repeating the same preprocessor checks.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__thread_yield()): Check
_GLIBCXX_HAS_GTHREADS before using __gthread_yield.
(__thread_relax()): Use __thread_yield() instead of repeating
the preprocessor checks for __gthread_yield.





--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -213,24 +213,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  { _M_w._M_do_wait(_M_version); }
};

-inline void
-__thread_relax() noexcept
-{
-#if defined __i386__ || defined __x86_64__
-  __builtin_ia32_pause();
-#elif defined _GLIBCXX_USE_SCHED_YIELD
-  __gthread_yield();
-#endif
-}
-
inline void
__thread_yield() noexcept
{
-#if defined _GLIBCXX_USE_SCHED_YIELD
+#if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
 __gthread_yield();
#endif
}

+inline void
+__thread_relax() noexcept
+{
+#if defined __i386__ || defined __x86_64__
+  __builtin_ia32_pause();
+#else
+  __gthread_yield();


As I said in the ChangeLog, this was supposed to be changed from
__gthread_yield to __thread_yield. Fixed by this patch.

Tested powerpc64le-linux. Committed to trunk.

commit 9d449189ee4304ce4f250351c8aa393324421eef
Author: Jonathan Wakely 
Date:   Fri Feb 19 09:54:04 2021

libstdc++: Fix __thread_relax for non-gthreads non-x86 targets

My recent change to the preprocessor conditions in __thread_relax() was
supposed to also change the __gthread_yield() call to __thread_yield(),
which has the right preprocessor checks. Instead I just removed the
check for _GLIBCXX_USE_SCHED_YIELD which means the __gthread_yield()
call will be ill-formed for non-gthreads targets, and targets without
sched_yield(). This fixes it properly.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__thread_relax()): Call
__thread_yield() not __gthread_yield().

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 37085ae8e50..424fccbe4c5 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -217,7 +217,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __thread_yield() noexcept
 {
 #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
- __gthread_yield();
+  __gthread_yield();
 #endif
 }
 
@@ -227,7 +227,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if defined __i386__ || defined __x86_64__
   __builtin_ia32_pause();
 #else
-  __gthread_yield();
+  __thread_yield();
 #endif
 }
   } // namespace __detail


RE: ARM patch ping

2021-02-19 Thread Kyrylo Tkachov via Gcc-patches
Hi Jakub,

> -Original Message-
> From: Jakub Jelinek 
> Sent: 19 February 2021 10:45
> To: Richard Sandiford ; Richard Earnshaw
> ; Ramana Radhakrishnan
> ; Kyrylo Tkachov
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: ARM patch ping
> 
> Hi!
> 
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565225.html
> patch - PR98998 P1 fix.

The patch is okay, but I think we can add a testcase in 
gcc.target/arm/pure-code/ as Christophe says in the PR.
That should skip it on incompatible configurations.
Thanks,
Kyrill

> 
> Thanks
> 
>   Jakub



Re: [PATCH] gcov: use mmap pools for KVP.

2021-02-19 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 09, 2021 at 09:37:12AM +0100, Martin Liška wrote:
> PING^2
> 
> @Honza: ?

Just concerning Windows (though I don't have access to that OS and can't
verify), e.g.
https://github.com/m-labs/uclibc-lm32/blob/master/utils/mmap-windows.c
contains public domain code to emulate mmap on top of the Windows APIs.
Guess it could be simplified (we only need anonymous memory and don't need a
mmap emulation, can just call).
#include 
#include 

  HANDLE h = CreateFileMapping (INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
0, size, NULL);
  void *mem = NULL;
  if (h)
mem = MapViewOfFile (h, FILE_MAP_WRITE, 0, 0, size);

...
// When done:
  if (mem)
UnmapViewOfFile (mem);
  if (h)
CloseHandle (h);

Perhaps ask somebody with access to Windows to verify?

Jakub



[Patch] OpenACC: C/C++ - fix async parsing [PR99137]

2021-02-19 Thread Tobias Burnus

For:
  #pragma acc parallel async(1,2)

avoid with C an ICE for the original tree:
  #pragma acc parallel async(<<< Unknown tree: c_maybe_const_expr 1, 2 >>>)

It did not ICE with C++, but I think the tree does not make sense, either:
  #pragma acc parallel async(<<< Unknown tree: void_cst >>>, 2)

OK for mainline?
(Not a regression; I don't know whether it makes sense to apply to other 
branches).

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
OpenACC: C/C++ - fix async parsing [PR99137]

gcc/c/ChangeLog:

	PR c/99137
	* c-parser.c (c_parser_oacc_clause_async): Reject comma expressions.

gcc/cp/ChangeLog:

	PR c/99137
	* parser.c (cp_parser_oacc_clause_async): Reject comma expressions.

gcc/testsuite/ChangeLog:

	PR c/99137
	* c-c++-common/goacc/asyncwait-1.c: Update dg-error.
	* c-c++-common/goacc/async-1.c: New test.

 gcc/c/c-parser.c   |  2 +-
 gcc/cp/parser.c|  2 +-
 gcc/testsuite/c-c++-common/goacc/async-1.c |  7 +++
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c | 16 
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 2a49d07bab4..5cdeb21a458 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -14332,7 +14332,7 @@ c_parser_oacc_clause_async (c_parser *parser, tree list)
 {
   c_parser_consume_token (parser);
 
-  t = c_parser_expression (parser).value;
+  t = c_parser_expr_no_commas (parser, NULL).value;
   if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
 	c_parser_error (parser, "expected integer expression");
   else if (t == error_mark_node
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 70775792161..6a29b6dca10 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -37991,7 +37991,7 @@ cp_parser_oacc_clause_async (cp_parser *parser, tree list)
   matching_parens parens;
   parens.consume_open (parser);
 
-  t = cp_parser_expression (parser);
+  t = cp_parser_assignment_expression (parser);
   if (t == error_mark_node
 	  || !parens.require_close (parser))
 	cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
diff --git a/gcc/testsuite/c-c++-common/goacc/async-1.c b/gcc/testsuite/c-c++-common/goacc/async-1.c
new file mode 100644
index 000..a578dabce8c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/async-1.c
@@ -0,0 +1,7 @@
+/* PR c/99137 */
+
+void f ()
+{
+  #pragma acc parallel async(1,2)  /* { dg-error "expected '\\)' before ',' token" } */
+  ;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
index 2f5d4762b49..b5d789621ec 100644
--- a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
@@ -9,7 +9,7 @@ f (int N, float *a, float *b)
 b[ii] = a[ii];
 }
 
-#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,) /* { dg-error "expected (primary-|)expression before" } */
+#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,) /* { dg-error "expected '\\)' before ',' token" } */
 {
 for (ii = 0; ii < N; ii++)
 b[ii] = a[ii];
@@ -21,19 +21,19 @@ f (int N, float *a, float *b)
 b[ii] = a[ii];
 }
 
-#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,) /* { dg-error "expected (primary-|)expression before" } */
+#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,) /* { dg-error "expected '\\)' before ',' token" } */
 {
 for (ii = 0; ii < N; ii++)
 b[ii] = a[ii];
 }
 
-#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2 3) /* { dg-error "expected '\\)' before numeric constant" } */
+#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2 3) /* { dg-error "expected '\\)' before ',' token" } */
 {
 for (ii = 0; ii < N; ii++)
 b[ii] = a[ii];
 }
 
-#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,,) /* { dg-error "expected (primary-|)expression before" } */
+#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,,) /* { dg-error "expected '\\)' before ',' token" } */
 {
 for (ii = 0; ii < N; ii++)
 b[ii] = a[ii];
@@ -193,15 +193,15 @@ f (int N, float *a, float *b)
 
 #pragma acc wait async (1 2) /* { dg-error "expected '\\)' before numeric constant" } */
 
-#pragma acc wait async (1,) /* { dg-error "expected (primary-|)expression before" } */
+#pragma acc wait async (1,) /* { dg-error "expected '\\)' before ',' token" } */
 
 #pragma acc wait async (,1) /* { dg-error "expected (primary-|)expression before" } */
 
-#pragma acc wait async (1,2,) /* { dg-error "expected (primary-|)expression before" } */
+#pragma acc wait async (1,2,) /* { dg-error "expected '\\)' before ',' token" } */
 
-#pragma acc wait async (1,2 3) /* { dg-error "expected '\\)' before 

[PATCH] middle-end/99122 - more VLA inlining fixes

2021-02-19 Thread Richard Biener
This avoids declaring a function with VLA arguments or return values
as inlineable.  IPA CP still ICEs, so the testcase has that disabled.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-02-19  Richard Biener  

PR middle-end/99122
* tree-inline.c (inline_forbidden_p): Do not inline functions
with VLA arguments or return value.

* gcc.dg/pr99122-3.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr99122-3.c | 19 +++
 gcc/tree-inline.c| 10 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr99122-3.c

diff --git a/gcc/testsuite/gcc.dg/pr99122-3.c b/gcc/testsuite/gcc.dg/pr99122-3.c
new file mode 100644
index 000..6aa5b2912ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr99122-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -fno-ipa-cp -w" } */
+
+static int foo ();
+
+int
+bar (int n)
+{
+  return foo (n, 2.0);
+}
+
+static inline int
+foo (int n, struct T { char a[n]; } b)
+{
+  int r = 0, i;
+  for (i = 0; i < n; i++)
+r += b.a[i];
+  return r;
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 01a08cf27be..c993b1fee8a 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4022,6 +4022,16 @@ inline_forbidden_p (tree fndecl)
   wi.info = (void *) fndecl;
   wi.pset = &visited_nodes;
 
+  /* We cannot inline a function with a VLA typed argument or result since
+ we have no implementation materializing a variable of such type in
+ the caller.  */
+  if (COMPLETE_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
+  && !poly_int_tree_p (TYPE_SIZE (TREE_TYPE (TREE_TYPE (fndecl)
+return true;
+  for (tree parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm))
+if (!poly_int_tree_p (DECL_SIZE (parm)))
+  return true;
+
   FOR_EACH_BB_FN (bb, fun)
 {
   gimple *ret;
-- 
2.26.2


[PATCH] slp: fix sharing of SLP only patterns. (PR99149)

2021-02-19 Thread Tamar Christina via Gcc-patches
Hi Richi,

The attached testcase ICEs due to a couple of issues.
In the testcase you have two SLP instances that share the majority of their
definition with each other.  One tree defines a COMPLEX_MUL sequence and the
other tree a COMPLEX_FMA.

The ice happens because:

1. the refcounts are wrong, in particular the FMA case doesn't correctly count
the references for the COMPLEX_MUL that it consumes.

2. when the FMA is created it incorrectly assumes it can just tear apart the MUL
node that it's consuming.  This is wrong and should only be done when there is
no more uses of the node, in which case the vector only pattern is no longer
relevant.

To fix the last part the SLP only pattern reset code was moved into
vect_free_slp_tree which results in cleaner code.  I also think it does belong
there since that function knows when there are no more uses of the node and so
the pattern should be unmarked, so when the the vectorizer is inspecting the BB
it doesn't find the now invalid vector only patterns.

This has the obvious problem in that, eventually after analysis is done, the
entire SLP tree is dissolved before codegen.  Where we get into trouble as we
have now dissolved the patterns too...

My initial thought was to add a parameter to vect_free_slp_tree, but I know you
wouldn't like that.  So I am sending this patch up as an RFC.

PS. This testcase actually shows that the codegen we get in these cases is not
optimal.  Currently this won't vectorize as the compiler thinks the vector
version is too expensive.

My guess here is because the patterns now unshare the tree and it's likely
costing the setup for the vector code twice?

Even with the shared code (without patterns, so same as GCC 10, or turning off
the patterns) it won't vectorize it.

The scalar code is

mov w0, 0
ldr x4, [x1, #:lo12:.LANCHOR0]
ldrsw   x2, [x3, 20]
ldr x1, [x3, 8]
lsl x2, x2, 3
ldrsw   x3, [x3, 16]
ldp s3, s2, [x4]
add x5, x1, x2
ldr s0, [x1, x2]
lsl x3, x3, 3
add x4, x1, x3
fmuls1, s2, s2
fnmsub  s1, s3, s3, s1
fmuls0, s2, s0
fmadd   s0, s3, s2, s0
ldr s3, [x1, x3]
ldr s2, [x4, 4]
fadds3, s3, s1
fadds2, s0, s2
str s3, [x1, x3]
str s2, [x4, 4]
str s1, [x1, x2]
str s0, [x5, 4]
ret

and turning off the cost model we get

moviv1.2s, 0
mov w0, 0
ldr x4, [x1, #:lo12:.LANCHOR0]
ldrsw   x3, [x2, 16]
ldr x1, [x2, 8]
ldrsw   x2, [x2, 20]
ldr d0, [x4]
fcmla   v1.2s, v0.2s, v0.2s, #0
ldr d2, [x1, x3, lsl 3]
fcmla   v2.2s, v0.2s, v0.2s, #0
fcmla   v2.2s, v0.2s, v0.2s, #90
str d2, [x1, x3, lsl 3]
fcmla   v1.2s, v0.2s, v0.2s, #90
str d1, [x1, x2, lsl 3]

however, if the pattern matcher doesn't create the FMA node because it would
unshare the tree (which I think is a general heuristic that would work out to
better code wouldn't it?) then we would get (with the cost model enabled even)

moviv0.2s, 0
mov w0, 0
ldr x4, [x1, #:lo12:.LANCHOR0]
ldrsw   x3, [x2, 16]
ldr x1, [x2, 8]
ldr d1, [x4]
fcmla   v0.2s, v1.2s, v1.2s, #0
fcmla   v0.2s, v1.2s, v1.2s, #90
ldrsw   x2, [x2, 20]
ldr d1, [x1, x3, lsl 3]
faddv1.2s, v1.2s, v0.2s
str d1, [x1, x3, lsl 3]
str d0, [x1, x2, lsl 3]
ret

Which is the most optimal form.  So I think this should perhaps be handled in
GCC 12 if there's a way to detect when you're going to unshare a sub-tree.

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/99149
* tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the
buffer.
(vect_slp_reset_pattern): Remove.
(complex_fma_pattern::matches): Remove call to vect_slp_reset_pattern.
(complex_mul_pattern::build, complex_fma_pattern::build,
complex_fms_pattern::build): Fix ref counts.
* tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern relevancy
when node is being deleted.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value.

gcc/testsuite/ChangeLog:

PR tree-optimization/99149
* gcc.dg/vect/pr99149.C: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/pr99149.C 
b/gcc/testsuite/gcc.dg/vect/pr99149.C
new file mode 100755
index 
..b12fe17e4ded148ce2bf67486e425dd65461a148
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr99149.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } 
} } */
+
+class a {
+  float b;
+  float c;
+
+public:
+  a(float d, float e) : b(d), c(e) {}
+  

RE: [PATCH] slp: fix sharing of SLP only patterns. (PR99149)

2021-02-19 Thread Tamar Christina via Gcc-patches
Ps. The code in comment is needed, but was commented out due to the early free 
issue described in the patch.

Thanks,
Tamar

> -Original Message-
> From: Tamar Christina 
> Sent: Friday, February 19, 2021 2:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de
> Subject: [PATCH] slp: fix sharing of SLP only patterns. (PR99149)
> 
> Hi Richi,
> 
> The attached testcase ICEs due to a couple of issues.
> In the testcase you have two SLP instances that share the majority of their
> definition with each other.  One tree defines a COMPLEX_MUL sequence
> and the other tree a COMPLEX_FMA.
> 
> The ice happens because:
> 
> 1. the refcounts are wrong, in particular the FMA case doesn't correctly count
> the references for the COMPLEX_MUL that it consumes.
> 
> 2. when the FMA is created it incorrectly assumes it can just tear apart the
> MUL node that it's consuming.  This is wrong and should only be done when
> there is no more uses of the node, in which case the vector only pattern is no
> longer relevant.
> 
> To fix the last part the SLP only pattern reset code was moved into
> vect_free_slp_tree which results in cleaner code.  I also think it does belong
> there since that function knows when there are no more uses of the node
> and so the pattern should be unmarked, so when the the vectorizer is
> inspecting the BB it doesn't find the now invalid vector only patterns.
> 
> This has the obvious problem in that, eventually after analysis is done, the
> entire SLP tree is dissolved before codegen.  Where we get into trouble as
> we have now dissolved the patterns too...
> 
> My initial thought was to add a parameter to vect_free_slp_tree, but I know
> you wouldn't like that.  So I am sending this patch up as an RFC.
> 
> PS. This testcase actually shows that the codegen we get in these cases is not
> optimal.  Currently this won't vectorize as the compiler thinks the vector
> version is too expensive.
> 
> My guess here is because the patterns now unshare the tree and it's likely
> costing the setup for the vector code twice?
> 
> Even with the shared code (without patterns, so same as GCC 10, or turning
> off the patterns) it won't vectorize it.
> 
> The scalar code is
> 
> mov w0, 0
> ldr x4, [x1, #:lo12:.LANCHOR0]
> ldrsw   x2, [x3, 20]
> ldr x1, [x3, 8]
> lsl x2, x2, 3
> ldrsw   x3, [x3, 16]
> ldp s3, s2, [x4]
> add x5, x1, x2
> ldr s0, [x1, x2]
> lsl x3, x3, 3
> add x4, x1, x3
> fmuls1, s2, s2
> fnmsub  s1, s3, s3, s1
> fmuls0, s2, s0
> fmadd   s0, s3, s2, s0
> ldr s3, [x1, x3]
> ldr s2, [x4, 4]
> fadds3, s3, s1
> fadds2, s0, s2
> str s3, [x1, x3]
> str s2, [x4, 4]
> str s1, [x1, x2]
> str s0, [x5, 4]
> ret
> 
> and turning off the cost model we get
> 
> moviv1.2s, 0
> mov w0, 0
> ldr x4, [x1, #:lo12:.LANCHOR0]
> ldrsw   x3, [x2, 16]
> ldr x1, [x2, 8]
> ldrsw   x2, [x2, 20]
> ldr d0, [x4]
> fcmla   v1.2s, v0.2s, v0.2s, #0
> ldr d2, [x1, x3, lsl 3]
> fcmla   v2.2s, v0.2s, v0.2s, #0
> fcmla   v2.2s, v0.2s, v0.2s, #90
> str d2, [x1, x3, lsl 3]
> fcmla   v1.2s, v0.2s, v0.2s, #90
> str d1, [x1, x2, lsl 3]
> 
> however, if the pattern matcher doesn't create the FMA node because it
> would unshare the tree (which I think is a general heuristic that would work
> out to better code wouldn't it?) then we would get (with the cost model
> enabled even)
> 
> moviv0.2s, 0
> mov w0, 0
> ldr x4, [x1, #:lo12:.LANCHOR0]
> ldrsw   x3, [x2, 16]
> ldr x1, [x2, 8]
> ldr d1, [x4]
> fcmla   v0.2s, v1.2s, v1.2s, #0
> fcmla   v0.2s, v1.2s, v1.2s, #90
> ldrsw   x2, [x2, 20]
> ldr d1, [x1, x3, lsl 3]
> faddv1.2s, v1.2s, v0.2s
> str d1, [x1, x3, lsl 3]
> str d0, [x1, x2, lsl 3]
> ret
> 
> Which is the most optimal form.  So I think this should perhaps be handled in
> GCC 12 if there's a way to detect when you're going to unshare a sub-tree.
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/99149
>   * tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the
>   buffer.
>   (vect_slp_reset_pattern): Remove.
>   (complex_fma_pattern::matches): Remove call to
> vect_slp_reset_pattern.
>   (complex_mul_pattern::build, complex_fma_pattern::build,
>   complex_fms_pattern::build): Fix ref counts.
>   * tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern
> relevancy
>   when node is being deleted.
>   * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value.
> 
> gcc/testsuite/ChangeLog:

Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-19 Thread Vladimir Makarov via Gcc-patches



On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote:

Hi,

This patch makes sure that allocno copies are not created for 
unordered modes. The testcases in the PR highlighted a case where an 
allocno copy was being created for:

(insn 121 120 123 11 (parallel [
    (set (reg:VNx2QI 217)
    (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 
]) 0)))

    (clobber (scratch:VNx16BI))
    ]) 4750 {*vec_duplicatevnx2qi_reg}
 (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
    (nil)))

As the compiler detected that the vec_duplicate_reg pattern 
allowed the input and output operand to be of the same register class, 
it tried to create an allocno copy for these two operands, stripping 
subregs in the process. However, this meant that the copy was between 
VNx2QI and SI, which have unordered mode precisions.


So at compile time we do not know which of the two modes is smaller 
which is a requirement when updating allocno copy costs.


Regression tested on aarch64-linux-gnu.

Is this OK for trunk (and after a week backport to gcc-10) ?

OK.  Yes, it is wise to wait a bit and see how the patch behaves on the 
trunk before submitting it to gcc-10 branch.  Sometimes such changes can 
have quite unexpected consequences.  But I guess not in this is case.


Thank you for working on the issue.


gcc/ChangeLog:
2021-02-19  Andre Vieira  

    PR rtl-optimization/98791
    * ira-conflicts.c (process_regs_for_copy): Don't create 
allocno copies for unordered modes.


gcc/testsuite/ChangeLog:
2021-02-19  Andre Vieira  

    PR rtl-optimization/98791
    * gcc.target/aarch64/sve/pr98791.c: New test.





[Patch] Fortran: Fix DTIO with type ICE [PR99146]

2021-02-19 Thread Tobias Burnus

In this example, the formal argument is a derived type
and not a class – hence, there is an ICE.

OK for the trunk?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Fortran: Fix DTIO with type ICE [PR99146]

gcc/fortran/ChangeLog:

	PR fortran/99146
	* interface.c:

gcc/testsuite/ChangeLog:

	PR fortran/99146
	* gfortran.dg/dtio_36.f90: New test.

 gcc/fortran/interface.c   |  4 +++-
 gcc/testsuite/gfortran.dg/dtio_36.f90 | 33 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 87fe14280e6..f7ca52e6550 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -5305,7 +5305,9 @@ gfc_find_specific_dtio_proc (gfc_symbol *derived, bool write, bool formatted)
 }
 
 finish:
-  if (dtio_sub && derived != CLASS_DATA (dtio_sub->formal->sym)->ts.u.derived)
+  if (dtio_sub
+  && dtio_sub->formal->sym->ts.type == BT_CLASS
+  && derived != CLASS_DATA (dtio_sub->formal->sym)->ts.u.derived)
 gfc_find_derived_vtab (derived);
 
   return dtio_sub;
diff --git a/gcc/testsuite/gfortran.dg/dtio_36.f90 b/gcc/testsuite/gfortran.dg/dtio_36.f90
new file mode 100644
index 000..4e53581b86a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dtio_36.f90
@@ -0,0 +1,33 @@
+! { dg-do compile }
+!
+! PR fortran/99146
+!
+  MODULE p
+  TYPE :: person
+  sequence
+  END TYPE person
+  INTERFACE READ(UNFORMATTED)
+   MODULE PROCEDURE pruf
+  END INTERFACE
+
+  CONTAINS
+
+  SUBROUTINE pruf (dtv,unit,iostat,iomsg)
+   type(person), INTENT(INOUT) :: dtv
+   INTEGER, INTENT(IN) :: unit
+   INTEGER, INTENT(OUT) :: iostat
+   CHARACTER (LEN=*), INTENT(INOUT) :: iomsg
+   iostat = 1
+  END SUBROUTINE pruf
+
+  END MODULE p
+
+  PROGRAM test
+  USE p
+  TYPE (person) :: chairman
+
+  OPEN (UNIT=71, status = 'scratch', FORM='UNFORMATTED')
+
+  read(71) chairman
+
+  END PROGRAM test


Re: [Patch] Fortran: Fix DTIO with type ICE [PR99146]

2021-02-19 Thread Jerry DeLisle




On 2/19/21 8:00 AM, Tobias Burnus wrote:

In this example, the formal argument is a derived type
and not a class – hence, there is an ICE.

OK for the trunk?

This is OK, could you also check 89219 and 81499 and see if these are 
the same or similar.


Much appreciated.

Jerry


Re: *PING**2 Re: [Patch] Fortran: Fix coarray handling for gfc_dep_resolver [PR99010] (was: Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913]

2021-02-19 Thread Jerry DeLisle

On 2/19/21 1:33 AM, Tobias Burnus wrote:


On 09.02.21 12:52, Tobias Burnus wrote:

Hi all, hi Thomas,

On 02.02.21 18:15, Tobias Burnus wrote:

I think I will do a combination: If 'identical' is true, I think I
cannot remove it. If it is false, it can be identical or nonoverlapping
– which makes sense.


Well, it turned out that coarray scalars did not work; for them,
the array ref consists only of the coindexed access: falling through
then triggered as fin_dep == GFC_DEP_ERROR.

To be more careful, I also removed the 'identical &&' check such that
the first loop is now always triggered if coarray != SINGLE not only
if identical – I am not sure whether it is really needed or whether
falling though is the better solution (with updating the comment).

I would be happy if someone could carefully check the logic –
in the details, it is rather confusing.

OK?


Yes OK, thanks for patch.

Jerry



[PATCH][obvious] Fix typo in param description.

2021-02-19 Thread Martin Liška

Pushed to master as obvious.

Martin

gcc/ChangeLog:

PR translation/99167
* params.opt: Fix typo.
---
 gcc/params.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index a4e1ac0e88e..0dd9ac406eb 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -957,7 +957,7 @@ Maximum number of references stored in each modref base.
 
 -param=modref-max-accesses=

 Common Joined UInteger Var(param_modref_max_accesses) Init(16) Param 
Optimization
-Maximum number of accesse stored in each modref reference.
+Maximum number of accesses stored in each modref reference.
 
 -param=modref-max-tests=

 Common Joined UInteger Var(param_modref_max_tests) Init(64) Param Optimization
--
2.30.1



Re: [patch, fortran] PR96686 Namelist group objects shall be defined before appearing in namelist

2021-02-19 Thread Tobias Burnus

Hi Jerry,

On 17.02.21 04:02, Jerry DeLisle wrote:

Attached patch adds to checks.  In the case of IMPLICIT typing it
checks to see if the objects listed in the NAMELIST have defined types
andf if not, sets them to the default implicit types.

In the case of IMPLICIT NONE, the types are required be declared
before the NAMELIST.   If an object type is found to not be declared
already, an error is issued.
Regression tested.

OK for trunk?


After taking a look while being less groggy,
it looks good to me, but I have a few remarks:

I think it looks cleaner to swap inner/outer the conditions, i.e.

if (sym->ts.type == BT_UNKNOWN)
  {
if (!gfc_current_ns->seen_implicit_none)
  ...
else
  ...
  }


+++ b/gcc/testsuite/gfortran.dg/namelist_4.f90
@@ -27,14 +27,14 @@ END module M1
  program P1
  CONTAINS
  ! This has the additional wrinkle of a reference to the object.
+  INTEGER FUNCTION F2()
+F2=1
+  END FUNCTION
INTEGER FUNCTION F1()
  NAMELIST /NML3/ F2 ! { dg-error "PROCEDURE attribute conflicts" }
  ! Used to ICE here
-f2 = 1 ! { dg-error "is not a VALUE" }
+f2 = 1 ! { dg-error "is not a variable" }
  F1=1
END FUNCTION
-  INTEGER FUNCTION F2()
-F2=1
-  END FUNCTION
  END


Unless I made a mistake, there is no need to modify this testcase – even
with the patch, the error remains the same.

However, if the "f2 = 1" line is removed, the previously hidden error in
the NAMELIST line is shown: 'PROCEDURE attribute conflicts with NAMELIST
attribute in ‘f2’ at (1)' is shown.

I think you should retain this testcase – and either add between the two
functions a new one, copying 'f1' but with 'f2 = 1' removed. You then
get the expected NAMELIST error. — Or to add a new testcase for this check.

PLUS: I think it would be useful to add a test of the form:

  namelist /nml/ f
  integer :: f

which then shows the new error (declared before the namelist) which
is currently not checked for.

 * * *

On 19.02.21 16:58, Jerry DeLisle wrote:

On 2/17/21 1:19 AM, Tobias Burnus wrote:

f2 looks like a local and implicitly typed real variable. At least ifort
compiles this program successfully.


I have to admit that I am not sure about this – implicit typing is odd,
if combined with host association. But I think you are right. I also got
confused due to the reordering, which is not needed.

Regarding:


contains
  integer function f1()
!f2 = 1   !This gives an error trying to assign a value to a
   function.

Okay. Also matches ifort: "This name has already been used as an
external function name."

j = f2! This is OK


This one is odd – you assign a function address to an integer (the
compiler does the casting). ifort rejects it with: "This name has
already been used as an external function name."

That looks like a variant of https://gcc.gnu.org/PR98890 – I added it as
additional example.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


[PATCH] Document the GCC11 change to DWARF5 default.

2021-02-19 Thread Mark Wielaard
* gcc-11/changes.html (General Improvements): Add a section on
the DWARF version 5 default.
---
 htdocs/gcc-11/changes.html | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index de75b8d6..04c3c449 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -139,6 +139,36 @@ a work-in-progress.
   -fsanitize=kernel-hwaddress to instrument kernel 
code.
 
   
+  
+
+  For targets that produce DWARF debugging information GCC now
+  defaults to http://dwarfstd.org/doc/DWARF5.pdf";>DWARF
+  version 5 (with the exception of VxWorks and Darwin/Mac OS X
+  which default to version 2 and AIX which defaults to version 4).
+  This can produce up to 25% more compact debug information
+  compared to earlier versions.
+
+  To take full advantage of DWARF version 5 GCC needs to be build
+  against binutils version 2.35.2 or higher.  When GCC is build
+  against earlier versions of binutils GCC will still emit DWARF
+  version 5 for most debuginfo data, but will generate version 4
+  debug line tables (even when explicitly given -gdwarf-5).
+
+  The following debug information consumers can process DWARF version 5:
+  
+   GDB 8.0, or higher
+   valgrind 3.17.0
+   elfutils 0.172, or higher (for use with systemtap,
+ dwarves/pahole, perf and libabigail).
+  
+
+  Programs embedding libbacktrace are urged to upgrade to the version
+  shipping with GCC 11.
+
+  To make GCC 11 generate an older DWARF version
+  use -g together with -gdwarf-2,
+  -gdwarf-3 or -gdwarf-4.
+  
 
 
 
-- 
2.18.4



c++: Inform of CMI reads [PR 99166]

2021-02-19 Thread Nathan Sidwell
When successfully reading a module CMI,	the user gets no indication of 
where that CMI was located.  I originally didn't consider this a problem 
-- the read was successful after all.  But it can make it difficult to 
interact with build systems, particularly when caching can be involved. 
Grovelling over internal dump files is not really useful to the user. 
Hence this option, which is similar to the -flang-info-include-translate 
variants, and allows the user to ask for all, or specific module read 
notification.


gcc/c-family/
* c.opt (flang-info-module-read, flang-info-module-read=): New. 
gcc/
* doc/invoke.texi (flang-info-module-read): Document.
gcc/cp/
* module.cc (note_cmis): New.
(struct module_state): Add inform_read_p bit.
(module_state::do_import): Inform of CMI location, if enabled.
(init_modules): Canonicalize note_cmis entries.
(handle_module_option): Handle -flang-info-module-read=FOO.
gcc/testsuite/
* g++.dg/modules/pr99166_a.X: New.
* g++.dg/modules/pr99166_b.C: New.
* g++.dg/modules/pr99166_c.C: New.
* g++.dg/modules/pr99166_d.C: New.

--
Nathan Sidwell
diff --git c/gcc/c-family/c.opt w/gcc/c-family/c.opt
index b209d46d32b..3264c646ad3 100644
--- c/gcc/c-family/c.opt
+++ w/gcc/c-family/c.opt
@@ -1752,6 +1752,14 @@ flang-info-include-translate=
 C++ Joined RejectNegative MissingArgError(missing header name)
 Note a #include translation of a specific header.
 
+flang-info-module-read
+C++ Var(note_module_read_yes)
+Note Compiled Module Interface pathnames.
+
+flang-info-module-read=
+C++ Joined RejectNegative MissingArgError(missing module name)
+Note Compiled Module Interface pathname of a specific module or header-unit.
+
 fmax-include-depth=
 C ObjC C++ ObjC++ Joined RejectNegative UInteger
 fmax-include-depth= Set the maximum depth of the nested #include.
diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc
index e801c52069e..691381bf995 100644
--- c/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -317,6 +317,9 @@ version2string (unsigned version, verstr_t &out)
 /* Include files to note translation for.  */
 static vec *note_includes;
 
+/* Modules to note CMI pathames.  */
+static vec *note_cmis;
+
 /* Traits to hash an arbitrary pointer.  Entries are not deletable,
and removal is a noop (removal needed upon destruction).  */
 template 
@@ -3547,9 +3550,10 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state {
 			   do it again  */
   bool call_init_p : 1; /* This module's global initializer needs
 			   calling.  */
+  bool inform_read_p : 1; /* Inform of a read.  */
   /* Record extensions emitted or permitted.  */
   unsigned extensions : SE_BITS;
-  /* 12 bits used, 4 bits remain  */
+  /* 13 bits used, 3 bits remain  */
 
  public:
   module_state (tree name, module_state *, bool);
@@ -3782,6 +3786,8 @@ module_state::module_state (tree name, module_state *parent, bool partition)
 
   partition_p = partition;
 
+  inform_read_p = false;
+
   extensions = 0;
   if (name && TREE_CODE (name) == STRING_CST)
 {
@@ -18634,6 +18640,8 @@ module_state::do_import (cpp_reader *reader, bool outermost)
 {
   const char *file = maybe_add_cmi_prefix (filename);
   dump () && dump ("CMI is %s", file);
+  if (note_module_read_yes || inform_read_p)
+	inform (loc, "reading CMI %qs", file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
@@ -19545,6 +19553,7 @@ init_modules (cpp_reader *reader)
   headers = BITMAP_GGC_ALLOC ();
 
   if (note_includes)
+/* Canonicalize header names.  */
 for (unsigned ix = 0; ix != note_includes->length (); ix++)
   {
 	const char *hdr = (*note_includes)[ix];
@@ -19567,6 +19576,37 @@ init_modules (cpp_reader *reader)
 	(*note_includes)[ix] = path;
   }
 
+  if (note_cmis)
+/* Canonicalize & mark module names.  */
+for (unsigned ix = 0; ix != note_cmis->length (); ix++)
+  {
+	const char *name = (*note_cmis)[ix];
+	size_t len = strlen (name);
+
+	bool is_system = name[0] == '<';
+	bool is_user = name[0] == '"';
+	bool is_pathname = false;
+	if (!(is_system || is_user))
+	  for (unsigned ix = len; !is_pathname && ix--;)
+	is_pathname = IS_DIR_SEPARATOR (name[ix]);
+	if (is_system || is_user || is_pathname)
+	  {
+	if (len <= (is_pathname ? 0 : 2)
+		|| (!is_pathname && name[len-1] != (is_system ? '>' : '"')))
+	  {
+		error ("invalid header name %qs", name);
+		continue;
+	  }
+	else
+	  name = canonicalize_header_name (is_pathname ? nullptr : reader,
+	   0, is_pathname, name, len);
+	  }
+	if (auto module = get_module (name))
+	  module->inform_read_p = 1;
+	else
+	  error ("invalid module name %qs", name);
+  }
+
   dump.push (NULL);
 
   /* Determine lazy handle bound.  */
@@ -19952,6 +19992,10 @@ handle_module_option (unsigned code, const char *str, int)
   vec_safe_push (note_includes, str);
   return true;
 
+  

Re: [PATCH] clear VLA bounds from all arguments (PR 97172)

2021-02-19 Thread Martin Sebor via Gcc-patches

On 2/18/21 8:51 PM, Jeff Law wrote:



On 2/18/21 7:23 PM, Martin Sebor wrote:

The fix for PR 97172 that removes non-constant VLA bounds from
attribute access is incomplete: it inadvertently removes the bounds
corresponding to just the first VLA argument, and not from subsequent
arguments.

The attached change removes the vestigial condition that causes this
bug.  Since it's behind a build failure in a Fedora package (cava?)
I'll commit it under the "obvious" clause tomorrow if there are no
concerns.

Martin


gcc-97172-2.diff

PR c/97172 - ICE: tree code 'ssa_name' is not supported in LTO streams

gcc/ChangeLog:

* attribs.c (init_attr_rdwr_indices): Guard vblist use.
(attr_access::free_lang_data): Remove a spurious test.

gcc/testsuite/ChangeLog:

* gcc.dg/pr97172.c: Add test cases.

OK. Thanks.


I just committed this change in g:3599ecb6a01.

Martin


Re: [PATCH] run -Wnonnull later (PR 87489)

2021-02-19 Thread Martin Sebor via Gcc-patches

On 2/19/21 2:48 AM, Franz Sirl wrote:

Am 2021-02-01 um 01:31 schrieb Martin Sebor via Gcc-patches:

The initial -Wnonnull implementation in the middle end took place
too late in the pipeline (just before expansion), and as a result
was prone to false positives (bug 78817).  In an attempt to avoid
the worst of those, the warning was moved to the ccp2 pass in
r243874.  However, as the test case in PR 87489 shows, this is
in turn too early and causes other false positives as a result.

A few experiments with running the warning later suggest that
just before the mergephi2 pass is a good point to avoid this class
of false positives without causing any regressions or introducing
any new warnings either in Glibc or in Binutils/GDB.

Since PR 87489 is a GCC 8-11 regression I propose to make this
change on the trunk as well as on the release branches.


Hi Martin,

I tested your patch and it showed also one more warning for this 
testcase with -O2 -Wnonnull:


class b {
public:
   long c();
};
class B {
public:
   B() : f() {}
   b *f;
};
long d, e;
class g : B {
public:
   void h() {
     long a = f->c();
     d = e = a;
   }
};
class j {
   j();
   g i;
};
j::j() { i.h(); }

I hope cvise didn't minimize too much, but at least in the original much 
larger code the warning looks reasonable too.


Thanks.  Yes, the warning would be useful here since the f pointer
in the call to f->c() is null when i.h() is called from j's ctor.
The FRE3 pass clearly exposes this :

void j::j (struct j * const this)
{
  long int _9;

   [local count: 1073741824]:
  MEM[(struct B *)this_3(D)] ={v} {CLOBBER};
  MEM[(struct B *)this_3(D)].f = 0B;
  _9 = b::c (0B);
  ...

Because the warning runs early (after CCP2), the null pointer is
not detected.  To see it the warning code would have to work quite
hard to figure out that the two assignments below refer to the same
location (it would essentially have to do what FRE does):

  MEM[(struct B *)this_3(D)].f = 0B;
  _7 = MEM[(struct b * *)_1];
  _9 = b::c (_7);

It's probably too late to make this change for GCC 11 as Jeff has
already decided that it should be deferred until GCC 12, and even
then there's a concern that doing so might cause false positives.
I think it's worth revisiting the idea in GCC 12 to see if
the concern is founded.  Let me make a note of it in the bug.

Martin


[WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-02-19 Thread Kwok Cheung Yeung

Hello

Sorry for taking so long in replying.

On 29/01/2021 3:03 pm, Jakub Jelinek wrote:

It can also crash if team is NULL, which will happen any time
this is called outside of a parallel.  Just try (should go into testsuite
too):
#include 

int
main ()
{
   omp_event_handle_t ev;
   #pragma omp task detach (ev)
   omp_fulfill_event (ev);
   return 0;
}



I have included this as task-detach-11.{c|f90}.


Additionally, there is an important difference between fulfill for
included tasks and for non-included tasks, for the former there is no team
or anything to care about, for the latter there is a team and one needs to
take the task_lock, but at that point it can do pretty much everything in
omp_fulfill_event rather than handling it elsewhere.

So, what I'm suggesting is:

Replace
   bool detach;
   gomp_sem_t completion_sem;
with
   struct gomp_task_detach *detach;
and add struct gomp_task_detach that would contain everything that will be
needed (indirect so that we don't waste space for it in every task, but only
for those that have detach clause).
We need:
1) some way to tell if it is an included task or not
2) for included tasks the gomp_sem_t completion_sem
(and nothing but 1) and 2) for those),
3) struct gomp_team * for non-included tasks
4) some way to find out if the task has finished and is just waiting for
fulfill event (perhaps your GOMP_TASK_DETACHED is ok for that)
5) some way to find out if the task has been fulfilled already
(gomp_sem_t for that seems an overkill though)

1) could be done through the struct gomp_team *team; member,
set it to NULL in included tasks (no matter if they are in some team or not)
and to non-NULL team of the task (non-included tasks must have a team).



I have opted for a union of completion_sem (for tasks that are undeferred) and a 
struct gomp_team *detach_team (for deferred tasks) that holds the team if the 
completion event has not yet fulfilled, or NULL if is it. I don't see the point 
of having an indirection to the union here since the union is just the size of a 
pointer, so it might as well be inlined.



And I don't see the point of task_detach_queue if we can handle the
dependers etc. all in omp_fulfill_event, which I think we can if we take the
task_lock.


I have removed the task_detach_queue. The team barrier, taskwait and 
taskgroup_end now just set the task kind to GOMP_TASK_DETACHED without 
decrementing the task_count if a task finishes with detach_team non-NULL.



So, I think omp_fulfill_event should look at the task->detach it got,
if task->detach->team is NULL, it is included task, GOMP_task should have
initialized task->detach->completion_sem and omp_fulfill_event should just
gomp_sem_post it and that is all, GOMP_task for included task needs to
gomp_sem_wait after it finishes before it returns.


omp_fulfill_event now posts completion_sem if the task kind is 
OMP_TASK_UNDEFERRED, and GOMP_task waits for it. Since the task is executed 
within GOMP_task, it already knows if the task has a detach clause or not, so we 
do not need to store that information in gomp_task.



Otherwise, take the team's task_lock, and look at whether the task is still
running, in that case just set the bool that it has been fulfilled (or
whatever way of signalling 5), perhaps it can be say clearing task->detach
pointer).


detach_team is now set to NULL when the event is fulfilled if the task has not 
started yet or is still executing (checked by the kind). In that case, when the 
task finishes executing, it behaves just like a task without detach would and 
finishes normally.


  When creating non-included tasks in GOMP_task with detach clause

through gomp_malloc, it would add the size needed for struct
gomp_task_detach.


Not necessary with the inlined union.


But if the task is already in GOMP_TASK_DETACHED state, instead we need
while holding the task_lock do everything that would have been done normally
on task finish, but we've skipped it because it hasn't been fulfilled.
Including the waking/sem_posts when something could be waiting on that task.

Do you agree with this, or see some reason why this can't work?


The main problem I see is this code in gomp_barrier_handle_tasks:

  if (--team->task_count == 0
  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
{
  gomp_team_barrier_done (&team->barrier, state);

We do not have access to state from within omp_fulfill_event, so how should this 
be handled?



And testsuite should include also cases where we wait for the tasks with
detach clause to be fulfilled at the end of taskgroup (i.e. need to cover
all of taskwait, taskgroup end and barrier).


I have changed task-detach-[56].* to test the barrier, task-detach-[78].* to 
test taskwait, and task-detach-(9|10) to test taskgroup (with the first one 
without a target construct, the second with).


I have included the current state of my patch. All task-detach-* tests pass when 
executed without offloading o

New French PO file for 'gcc' (version 11.1-b20210207)

2021-02-19 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-11.1-b20210207.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




libgo patch committed: Update to Go1.16 release

2021-02-19 Thread Ian Lance Taylor via Gcc-patches
This patch updates libgo to the final Go 1.16 release.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
c89b42d3b9d76dedb35e8478913ddf5367f3b5e6
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index eed9ce01905..217bdd55f1d 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-c406de0594782b1d6782a732a50f5b76387852dc
+78a840e4940159a66072237f6b002ab79f441b79
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/MERGE b/libgo/MERGE
index b2fc633f06c..183b0245ee2 100644
--- a/libgo/MERGE
+++ b/libgo/MERGE
@@ -1,4 +1,4 @@
-3e06467282c6d5678a6273747658c04314e013ef
+f21be2fdc6f1becdbed1592ea0b245cdeedc5ac8
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
diff --git a/libgo/VERSION b/libgo/VERSION
index a19294250b3..4befab24bc9 100644
--- a/libgo/VERSION
+++ b/libgo/VERSION
@@ -1 +1 @@
-go1.16rc1
+go1.16
diff --git a/libgo/go/archive/tar/strconv.go b/libgo/go/archive/tar/strconv.go
index 6d0a4038082..f0b61e6dba6 100644
--- a/libgo/go/archive/tar/strconv.go
+++ b/libgo/go/archive/tar/strconv.go
@@ -265,8 +265,27 @@ func parsePAXRecord(s string) (k, v, r string, err error) {
return "", "", s, ErrHeader
}
 
+   afterSpace := int64(sp + 1)
+   beforeLastNewLine := n - 1
+   // In some cases, "length" was perhaps padded/malformed, and
+   // trying to index past where the space supposedly is goes past
+   // the end of the actual record.
+   // For example:
+   //"30 
mtime=1432668921.098285006\n30 ctime=2147483649.15163319"
+   //  ^ ^
+   //  | |
+   //  |  afterSpace=35
+   //  |
+   //  beforeLastNewLine=29
+   // yet indexOf(firstSpace) MUST BE before endOfRecord.
+   //
+   // See https://golang.org/issues/40196.
+   if afterSpace >= beforeLastNewLine {
+   return "", "", s, ErrHeader
+   }
+
// Extract everything between the space and the final newline.
-   rec, nl, rem := s[sp+1:n-1], s[n-1:n], s[n:]
+   rec, nl, rem := s[afterSpace:beforeLastNewLine], 
s[beforeLastNewLine:n], s[n:]
if nl != "\n" {
return "", "", s, ErrHeader
}
diff --git a/libgo/go/archive/tar/strconv_test.go 
b/libgo/go/archive/tar/strconv_test.go
index dd3505a758a..add65e272ae 100644
--- a/libgo/go/archive/tar/strconv_test.go
+++ b/libgo/go/archive/tar/strconv_test.go
@@ -368,6 +368,13 @@ func TestParsePAXRecord(t *testing.T) {
{"16 longkeyname=hahaha\n", "16 longkeyname=hahaha\n", "", "", 
false},
{"3 somelongkey=\n", "3 somelongkey=\n", "", "", false},
{"50 tooshort=\n", "50 tooshort=\n", "", "", false},
+   {"30 
mtime=1432668921.098285006\n30 ctime=2147483649.15163319", 
"30 mtime=1432668921.098285006\n30 
ctime=2147483649.15163319", "mtime", "1432668921.098285006", false},
+   {"06 k=v\n", "06 k=v\n", "", "", false},
+   {"6 k=v\n", "6 k=v\n", "", "", false},
+   {"06 k=v\n", "06 k=v\n", "", "", false},
+   {"00 k=v\n", "00 k=v\n", "", "", false},
+   {"0 k=v\n", "0 k=v\n", "", "", false},
+   {"+005 x=\n", "+005 x=\n", "", "", false},
}
 
for _, v := range vectors {
diff --git a/libgo/go/cmd/go/alldocs.go b/libgo/go/cmd/go/alldocs.go
index 49d390297cd..e7c63f0749d 100644
--- a/libgo/go/cmd/go/alldocs.go
+++ b/libgo/go/cmd/go/alldocs.go
@@ -1808,7 +1808,7 @@
 // The directory where the go command will write
 // temporary source files, packages, and binaries.
 // GOVCS
-//   Lists version control commands that may be used with matching servers.
+// Lists version control commands that may be used with matching 
servers.
 // See 'go help vcs'.
 //
 // Environment variables for use with cgo:
@@ -2410,6 +2410,17 @@
 //
 // For a detailed reference on modules, see https://golang.org/ref/mod.
 //
+// By default, the go command may download modules from 
https://proxy.golang.org.
+// It may authenticate modules using the checksum database at
+// https://sum.golang.org. Both services are operated by the Go team at Google.
+// The privacy policies for these services are available at
+// https://proxy.golang.org/privacy and https://sum.golang.org/privacy,
+// respectively.
+//
+// The go command's download behavior may be configured using GOPROXY, GOSUMDB,
+// GOPRIVATE, and other environment variables. See 'go help environment'
+// and https://golang.org/ref/mod#private-module-p

Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior

2021-02-19 Thread Segher Boessenkool
Hi!

On Fri, Feb 19, 2021 at 11:06:16AM +0800, Kewen.Lin wrote:
> on 2021/2/19 上午2:33, Segher Boessenkool wrote:
> > Is there a PR you should mention here?
> 
> I thought this is trivial so didn't file one upstream PR.  I did a
> searching in upstream bugzilla, PR93453 looks similar but different.
> I confirmed that the current patch can't make it pass (just two insns
> instead of three insns).
> 
> Do you happen to have one related in mind?  If no, I will commit it
> without PR.

That is fine.  I just asked if perhaps you forgot to mention a PR; I do
that all the time myself!  Filing a new PR if a patch is ready and
approved is just bureaucracy, so let's not (it is useful if it needs
backporting though, to have a place to track that).

Thanks,


Segher


Re: [committed] jit: fix ICE on BUILT_IN_TRAP [PR99126]

2021-02-19 Thread David Malcolm via Gcc-patches
On Fri, 2021-02-19 at 11:16 +0100, Andrea Corallo wrote:
> David Malcolm via Gcc-patches  writes:
> 
> > I tried several approaches to fixing this; this seemed the
> > least invasive.
> 
> Hi Dave,
> 
> thanks for fixing this.
> 
> I do have a trivial question: are we guaranteed that the middle-end
> will
> not try to add any build-in other than a trap?

No; the middle-end can add various other builtins.

The C/C++ FE have c_define_builtins, which calls def_builtin_1 on
everything with DEF_BUILTIN, then calls:
  targetm.init_builtins ();
  build_common_builtin_nodes ();

jit_langhook_init calls build_common_builtin_nodes.

The jit builtins_manager creates recording::functions (and their
supporting types) for any builtins that are explicitly used, and these
recording::functions get turned into trees during playback.

I looked at the doing the equivalent of DEF_BUILTIN for any builtins
that haven't been created yet, but when to do it?

(a) in the builtin manager... but not all builtin-types are supported
yet (e.g. BT_FLOAT16), so it would be a big patch to do it that way

(b) in jit_langhook_init... but then if the user creates a builtin that
wasn't referenced before in a 2nd in-memory compile, I think the types
can get out-of-sync.

So I went with the patch I did as it seems to be the least invasive way
of fixing the specific problem.

Dave




Re: [PATCH] PR 99133, Mark xxspltiw, xxspltidp, and xxsplti32x as being prefixed

2021-02-19 Thread Segher Boessenkool
On Thu, Feb 18, 2021 at 10:44:14PM -0500, Michael Meissner wrote:
> On Wed, Feb 17, 2021 at 06:09:39PM -0600, Segher Boessenkool wrote:
> > Why test a p10 patch on a p8?
> 
> Well it is just a code gen patch, so I can test it on any system, even an
> x86_64 with a cross compiler.  Given that right now xxsplatiw is only created
> via a built-in,

That in itself should be fixed btw (don't use unspecs).

> > > +   Some prefixed instructions (xxspltiw, xxspltidp, xxsplti32dp, etc.) 
> > > do not
> > > +   have a leading 'p'.  Setting the prefix attribute to special does not 
> > > the 'p'
> > > +   prefix.  */
> > 
> > (grammar)
> > 
> > "special" is the *normal* case.  *Most* prefixed insns will be like
> > this.  They aren't right now, but they will be.
> > 
> > It sounds like you should make a new attribute, "always_prefixed" or
> > something, that then the code that sets "prefixed" can use.
> 
> Yes agreed.

Or better yet, "maybe_prefixed", which you set when something else will
make the decision.  Mmostly the default value of the prefixed attribute
will then do that. See "maybe_var_shift" / "var_shift" for example.

> > >  void
> > >  rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int)
> > >  {
> > > -  next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO);
> > > +  next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO
> > > +   && get_attr_prefixed (insn) != PREFIXED_SPECIAL);
> > >return;
> > >  }
> > 
> > So you set next_insn_prefixed_p when exactly?  The original code was
> > correct, as far as I can see?
> 
> rs6000_final_prescan_insn is called in the FINAL_PRESCAN_INSN target hook, and
> it is the only place we can set the flag for ASM_OUTPUT_OPCODE to print the
> initial 'p'.  As you know, during the development of prefixed support, I went
> through various different methods of generating the prefixed instruction
> (i.e. pld instead of ld).  The current method works for normal load, store and
> add instructions that have a normal form and a prefixed form.  But it doesn't
> work for other instructions that we need to start dealing with. 

Ah, the flag doesn't mean that the next insn is prefixed.  It means we
want to prefix the mnemonic with a "p", instead.  So some name change is
in order (as a separate, trivial, patch at the start of the series) --
this helps bisection if needed later, but it also helps reviewing
enormously).

> > So, for insns like xxspltiw you should just set "prefixed" to "yes",
> > because that makes sense, is not confusing.  The code that prefixes "p"
> > to the mnemonic should change, instead.  It can look at some new
> > attribute, but it could also just use
> >   prefixed_load_p (insn) || prefixed_store_p (insn)
> >   || prefixed_paddi_p (insn)
> > or similar (perhaps make a helper function for that then?)
> 
> Yes.  Pat Haugen will be taking over the PR.

Thanks,


Segher


[PATCH] PR fortran/99169 - [9/10/11 Regression] Segfault when passing allocatable scalar into intent(out) dummy argument

2021-02-19 Thread Harald Anlauf via Gcc-patches
Dear all,

we should not clobber the pointer in case of an allocatable scalar being
an intent(out) dummy argument to a procedure.

Regtested on x86_64-pc-linux-gnu.

OK for master?  Since this is a regression, also for backports to 10/9?

Thanks,
Harald


PR fortran/99169 - Do not clobber allocatable intent(out) dummy argument

gcc/fortran/ChangeLog:

* trans-expr.c (gfc_conv_procedure_call): Do not add clobber to
allocatable intent(out) argument.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_3.f90: New test.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 103cb31c664..cab58cd1bba 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6077,6 +6079,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			&& !fsym->attr.allocatable && !fsym->attr.pointer
 			&& !e->symtree->n.sym->attr.dimension
 			&& !e->symtree->n.sym->attr.pointer
+			&& !e->symtree->n.sym->attr.allocatable
 			/* See PR 41453.  */
 			&& !e->symtree->n.sym->attr.dummy
 			/* FIXME - PR 87395 and PR 41453  */
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_3.f90 b/gcc/testsuite/gfortran.dg/intent_optimize_3.f90
new file mode 100644
index 000..6ecd722da76
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_3.f90
@@ -0,0 +1,16 @@
+! { dg-do run }
+! { dg-options "-O2" }
+! PR99169 - Segfault passing allocatable scalar into intent(out) dummy argument
+
+program p
+  implicit none
+  integer, allocatable :: i
+  allocate (i)
+  call set (i)
+  if (i /= 5) stop 1
+contains
+  subroutine set (i)
+integer, intent(out) :: i
+i = 5
+  end subroutine set
+end program p


Re: [PATCH v7] Practical improvement to libgcc complex divide

2021-02-19 Thread Joseph Myers
On Tue, 2 Feb 2021, Patrick McGehearty via Gcc-patches wrote:

> if (mode == TYPE_MODE (double_type_node))
> - ; /* Empty suffix correct.  */
> + {
> +   ; /* Empty suffix correct.  */
> +   memcpy (modename, "DBL", 4);
> + }
> else if (mode == TYPE_MODE (float_type_node))
> - suffix[0] = 'f';
> + {
> +   suffix[0] = 'f';
> +   memcpy (modename, "FLT", 4);
> + }
> else if (mode == TYPE_MODE (long_double_type_node))
> - suffix[0] = 'l';
> + {
> +   suffix[0] = 'l';
> +   memcpy (modename, "LDBL", 4);
> + }
> else
>   {
> bool found_suffix = false;
> @@ -1305,6 +1316,8 @@ c_cpp_builtins (cpp_reader *pfile)
>   sprintf (suffix, "f%d%s", floatn_nx_types[i].n,
>floatn_nx_types[i].extended ? "x" : "");
>   found_suffix = true;
> + sprintf (modename, "FLT%d%s", floatn_nx_types[i].n,
> +  floatn_nx_types[i].extended ? "X" : "");
>   break;
> }
> gcc_assert (found_suffix);

Here you're properly computing the mapping from mode to float.h macro 
prefix (though I think "modename" is a confusing name for the variable 
used for float.h macro prefixes; to me, "modename" suggests the names such 
as SF or DF, which are actually "name" here, and something like 
"float_h_prefix" would be better than "modename").

> +   if ((mode == TYPE_MODE (float_type_node))
> +   || (mode == TYPE_MODE (double_type_node))
> +   || (mode == TYPE_MODE (long_double_type_node)))

But then here you still have the check for the mode matching one of those 
three types, which defeats the point of computing the prefix in a way that 
works for *all* floating-point modes supported in libgcc.  (The above 
"gcc_assert (found_suffix)" asserts that the relevant mapping did get 
found.)

To be able to use the __LIBGCC_* macros in libgcc in all cases, they 
should be defined, using the modename (or float_h_prefix) found above, 
whether or not the mode matches one of float, double and long double.

>  #elif defined(L_multc3) || defined(L_divtc3)
>  # define MTYPE   TFtype
>  # define CTYPE   TCtype
>  # define MODEtc
>  # define CEXT__LIBGCC_TF_FUNC_EXT__
>  # define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__)
> +#if defined(__LIBGCC_TF_MIN__)
> +# define RBIG((__LIBGCC_TF_MAX__)/2)
> +# define RMIN(__LIBGCC_TF_MIN__)
> +# define RMIN2   (__LIBGCC_TF_EPSILON__)
> +# define RMINSCAL (1/__LIBGCC_TF_EPSILON__)
> +#else
> +# define RBIG((__LIBGCC_XF_MAX__)/2)
> +# define RMIN(__LIBGCC_XF_MIN__)
> +# define RMIN2   (__LIBGCC_XF_EPSILON__)
> +# define RMINSCAL (1/__LIBGCC_XF_EPSILON__)
> +#endif

Once you've removed the unnecessary conditional above, you don't need this 
conditional on __LIBGCC_TF_MIN__ being defined either; the *TF* macros 
will always be defined when TFmode is supported in libgcc, so you never 
need to use *XF* macros instead when building TFmode code.

-- 
Joseph S. Myers
jos...@codesourcery.com


c++: Incorrect module-number ordering [PR 98741]

2021-02-19 Thread Nathan Sidwell
One of the very strong invariants in modules is that module numbers are 
allocated such that (other than the current TU), all imports have lesser 
module numbers, and also that the binding vector is only appended to 
with increasing module numbers.   This broke down when module-directives 
became a thing and the preprocessing became entirely decoupled from 
parsing.  We'd load header units and their macros (but not symbols of 
course) during preprocessing.  Then we'd load named modules during 
parsing.  This could lead to the situation where a header unit appearing 
after a named import had a lower module number than the import. 
Consequently, if they both bound the same identifier, the binding vector 
would be misorderd and bad things happen.


This patch restores a pending import queue I previously had, but in 
simpler form (hurrah).  During preprocessing we queue all 
module-directives and when we meet one for a header unit we do the 
minimal loading for all of the queue, so they get appropriate numbering. 
 Then we load the preprocessor state for the header unit.


PR c++/98741
gcc/cp/
* module.cc (pending_imports): New.
(declare_module): Adjust test condition.
(name_pending_imports): New.
(preprocess_module): Reimplement using pending_imports.
(preprocessed_module): Move name-getting to name_pending_imports.
* name-lookup.c (append_imported_binding_slot): Assert module
ordering is increasing.
gcc/testsuite/
* g++.dg/modules/pr98741_a.H: New.
* g++.dg/modules/pr98741_b.H: New.
* g++.dg/modules/pr98741_c.C: New.
* g++.dg/modules/pr98741_d.C: New.

--
Nathan Sidwell
diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc
index 691381bf995..3d17b8ddcdb 100644
--- c/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -3873,6 +3873,9 @@ module_state_hash::equal (const value_type existing,
 /* Mapper name.  */
 static const char *module_mapper_name;
 
+/* Deferred import queue (FIFO).  */
+static vec *pending_imports;
+
 /* CMI repository path and workspace.  */
 static char *cmi_repo;
 static size_t cmi_repo_length;
@@ -18944,7 +18947,7 @@ declare_module (module_state *module, location_t from_loc, bool exporting_p,
   gcc_assert (global_namespace == current_scope ());
 
   module_state *current = (*modules)[0];
-  if (module_purview_p () || module->loadedness != ML_NONE)
+  if (module_purview_p () || module->loadedness > ML_CONFIG)
 {
   error_at (from_loc, module_purview_p ()
 		? G_("module already declared")
@@ -19275,6 +19278,70 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps,
 }
 }
 
+/* Process the pending_import queue, making sure we know the
+   filenames.   */
+
+static void
+name_pending_imports (cpp_reader *reader, bool at_end)
+{
+  auto *mapper = get_mapper (cpp_main_loc (reader));
+
+  bool only_headers = (flag_preprocess_only
+		   && !bool (mapper->get_flags () & Cody::Flags::NameOnly)
+		   && !cpp_get_deps (reader));
+  if (at_end
+  && (!vec_safe_length (pending_imports) || only_headers))
+/* Not doing anything.  */
+return;
+
+  timevar_start (TV_MODULE_MAPPER);
+
+  dump.push (NULL);
+  dump () && dump ("Resolving direct import names");
+
+  mapper->Cork ();
+  for (unsigned ix = 0; ix != pending_imports->length (); ix++)
+{
+  module_state *module = (*pending_imports)[ix];
+  gcc_checking_assert (module->is_direct ());
+  if (!module->filename)
+	{
+	  Cody::Flags flags
+	= (flag_preprocess_only ? Cody::Flags::None
+	   : Cody::Flags::NameOnly);
+
+	  if (only_headers && !module->is_header ())
+	;
+	  else if (module->module_p
+		   && (module->is_partition () || module->exported_p))
+	mapper->ModuleExport (module->get_flatname (), flags);
+	  else
+	mapper->ModuleImport (module->get_flatname (), flags);
+	}
+}
+  
+  auto response = mapper->Uncork ();
+  auto r_iter = response.begin ();
+  for (unsigned ix = 0; ix != pending_imports->length (); ix++)
+{
+  module_state *module = (*pending_imports)[ix];
+  gcc_checking_assert (module->is_direct ());
+  if (only_headers && !module->is_header ())
+	;
+  else if (!module->filename)
+	{
+	  Cody::Packet const &p = *r_iter;
+	  ++r_iter;
+
+	  module->set_filename (p);
+	}
+}
+
+  dump.pop (0);
+
+  timevar_stop (TV_MODULE_MAPPER);
+}
+
 /* We've just lexed a module-specific control line for MODULE.  Mark
the module as a direct import, and possibly load up its macro
state.  Returns the primary module, if this is a module
@@ -19322,17 +19389,22 @@ preprocess_module (module_state *module, location_t from_loc,
 	}
 }
 
+  auto desired = ML_CONFIG;
   if (is_import
-  && !module->is_module () && module->is_header ()
-  && module->loadedness < ML_PREPROCESSOR
+  && module->is_header ()
   && (!cpp_get_options (reader)->preprocessed
 	  || cpp_get_options (reader)->directives_only))
+/* We need preprocessor sta

Re: [PATCH v7] Practical improvement to libgcc complex divide

2021-02-19 Thread Patrick McGehearty via Gcc-patches

Comments inline

On 2/19/2021 3:27 PM, Joseph Myers wrote:

On Tue, 2 Feb 2021, Patrick McGehearty via Gcc-patches wrote:


  if (mode == TYPE_MODE (double_type_node))
-   ; /* Empty suffix correct.  */
+   {
+ ; /* Empty suffix correct.  */
+ memcpy (modename, "DBL", 4);
+   }
  else if (mode == TYPE_MODE (float_type_node))
-   suffix[0] = 'f';
+   {
+ suffix[0] = 'f';
+ memcpy (modename, "FLT", 4);
+   }
  else if (mode == TYPE_MODE (long_double_type_node))
-   suffix[0] = 'l';
+   {
+ suffix[0] = 'l';
+ memcpy (modename, "LDBL", 4);
+   }
  else
{
  bool found_suffix = false;
@@ -1305,6 +1316,8 @@ c_cpp_builtins (cpp_reader *pfile)
sprintf (suffix, "f%d%s", floatn_nx_types[i].n,
 floatn_nx_types[i].extended ? "x" : "");
found_suffix = true;
+   sprintf (modename, "FLT%d%s", floatn_nx_types[i].n,
+floatn_nx_types[i].extended ? "X" : "");
break;
  }
  gcc_assert (found_suffix);

Here you're properly computing the mapping from mode to float.h macro
prefix (though I think "modename" is a confusing name for the variable
used for float.h macro prefixes; to me, "modename" suggests the names such
as SF or DF, which are actually "name" here, and something like
"float_h_prefix" would be better than "modename").


float_h_prefix puts me in mind of half-precision floating point rather 
than float.h.

How would modeprefix do instead of modename?




+ if ((mode == TYPE_MODE (float_type_node))
+ || (mode == TYPE_MODE (double_type_node))
+ || (mode == TYPE_MODE (long_double_type_node)))

But then here you still have the check for the mode matching one of those
three types, which defeats the point of computing the prefix in a way that
works for *all* floating-point modes supported in libgcc.  (The above
"gcc_assert (found_suffix)" asserts that the relevant mapping did get
found.)

To be able to use the __LIBGCC_* macros in libgcc in all cases, they
should be defined, using the modename (or float_h_prefix) found above,
whether or not the mode matches one of float, double and long double.


My apologies. You are completely correct. I intended to remove the "mode 
== TYPE_MODE" code

when making the other changes but overlooked it.




  #elif defined(L_multc3) || defined(L_divtc3)
  # define MTYPETFtype
  # define CTYPETCtype
  # define MODE tc
  # define CEXT __LIBGCC_TF_FUNC_EXT__
  # define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__)
+#if defined(__LIBGCC_TF_MIN__)
+# define RBIG  ((__LIBGCC_TF_MAX__)/2)
+# define RMIN  (__LIBGCC_TF_MIN__)
+# define RMIN2 (__LIBGCC_TF_EPSILON__)
+# define RMINSCAL (1/__LIBGCC_TF_EPSILON__)
+#else
+# define RBIG  ((__LIBGCC_XF_MAX__)/2)
+# define RMIN  (__LIBGCC_XF_MIN__)
+# define RMIN2 (__LIBGCC_XF_EPSILON__)
+# define RMINSCAL (1/__LIBGCC_XF_EPSILON__)
+#endif

Once you've removed the unnecessary conditional above, you don't need this
conditional on __LIBGCC_TF_MIN__ being defined either; the *TF* macros
will always be defined when TFmode is supported in libgcc, so you never
need to use *XF* macros instead when building TFmode code.
And as you note, the test for LIBGCC_TF_MIN being defined is not needed 
either.


I'll resubmit the patch after I rebuild and complete testings on the 
revised code.


- patrick



Re: [PATCH v7] Practical improvement to libgcc complex divide

2021-02-19 Thread Joseph Myers
On Fri, 19 Feb 2021, Patrick McGehearty via Gcc-patches wrote:

> > Here you're properly computing the mapping from mode to float.h macro
> > prefix (though I think "modename" is a confusing name for the variable
> > used for float.h macro prefixes; to me, "modename" suggests the names such
> > as SF or DF, which are actually "name" here, and something like
> > "float_h_prefix" would be better than "modename").
> 
> float_h_prefix puts me in mind of half-precision floating point rather than
> float.h.
> How would modeprefix do instead of modename?

I'm not sure modeprefix is better; you could say SF is the prefix of 
SFmode, for example.  Hence my suggestion of a name that makes clear it's 
the prefix *in float.h*.  typeprefix might be a possibility, as it's 
really a prefix associated with a corresponding type rather than directly 
with the mode.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v3 1/2] MIPS: Not trigger error for pre-R6 and -mcompact-branches=always

2021-02-19 Thread YunQiang Su
For MIPSr6, we may wish to use compact-branches only.
Currently, we have to use `always' option, while it is mark as conflict
with pre-R6.
  cc1: error: unsupported combination: ‘mips32r2’ -mcompact-branches=always
Just ignore -mcompact-branches=always for pre-R6.

This patch also defines
__mips_compact_branches_never
__mips_compact_branches_always
__mips_compact_branches_optimal
predefined macros

gcc/ChangeLog:
* config/mips/mips.c (mips_option_override):
* config/mips/mips.h (TARGET_RTP_PIC): not trigger error for
compact-branches=always for pre-R6.
(TARGET_CB_NEVER): Likewise.
(TARGET_CB_ALWAYS): Likewise.
(struct mips_cpu_info): define macros for compact branch policy.
* doc/invoke.texi: Document "always" with pre-R6.

gcc/testsuite/ChangeLog:
* gcc.target/mips/compact-branches-1.c: add isa_rev>=6.
* gcc.target/mips/mips.exp: don't add -mipsXXr6 option for
-mcompact-branches=always. It is usable for pre-R6 now.
* gcc.target/mips/compact-branches-8.c: New test.
* gcc.target/mips/compact-branches-9.c: New test.
---
 gcc/ChangeLog | 10 +
 gcc/config/mips/mips.c|  8 +--
 gcc/config/mips/mips.h| 22 ---
 gcc/doc/invoke.texi   |  1 +
 gcc/testsuite/ChangeLog   |  8 +++
 .../gcc.target/mips/compact-branches-1.c  |  2 +-
 .../gcc.target/mips/compact-branches-8.c  | 10 +
 .../gcc.target/mips/compact-branches-9.c  | 10 +
 gcc/testsuite/gcc.target/mips/mips.exp|  4 +---
 9 files changed, 56 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-8.c
 create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-9.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index eb1a44ae676..caa3fda48ce 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2021-02-20  YunQiang Su  
+
+   * config/mips/mips.c (mips_option_override):
+   * config/mips/mips.h (TARGET_RTP_PIC): not trigger error for
+   compact-branches=always for pre-R6.
+   (TARGET_CB_NEVER): Likewise.
+   (TARGET_CB_ALWAYS): Likewise.
+   (struct mips_cpu_info): define macros for compact branch policy.
+   * doc/invoke.texi: Document "always" with pre-R6.
+
 2021-02-19  Martin Sebor  
 
PR c/97172
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 8bd2d29552e..9a75dd61031 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -20107,13 +20107,7 @@ mips_option_override (void)
   target_flags |= MASK_ODD_SPREG;
 }
 
-  if (!ISA_HAS_COMPACT_BRANCHES && mips_cb == MIPS_CB_ALWAYS)
-{
-  error ("unsupported combination: %qs%s %s",
- mips_arch_info->name, TARGET_MICROMIPS ? " -mmicromips" : "",
- "-mcompact-branches=always");
-}
-  else if (!ISA_HAS_DELAY_SLOTS && mips_cb == MIPS_CB_NEVER)
+  if (!ISA_HAS_DELAY_SLOTS && mips_cb == MIPS_CB_NEVER)
 {
   error ("unsupported combination: %qs%s %s",
  mips_arch_info->name, TARGET_MICROMIPS ? " -mmicromips" : "",
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index b4a60a55d80..b8399fe1b0d 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -103,11 +103,9 @@ struct mips_cpu_info {
 #define TARGET_RTP_PIC (TARGET_VXWORKS_RTP && flag_pic)
 
 /* Compact branches must not be used if the user either selects the
-   'never' policy or the 'optimal' policy on a core that lacks
+   'never' policy or the 'optimal' / 'always' policy on a core that lacks
compact branch instructions.  */
-#define TARGET_CB_NEVER (mips_cb == MIPS_CB_NEVER  \
-|| (mips_cb == MIPS_CB_OPTIMAL \
-&& !ISA_HAS_COMPACT_BRANCHES))
+#define TARGET_CB_NEVER (mips_cb == MIPS_CB_NEVER || !ISA_HAS_COMPACT_BRANCHES)
 
 /* Compact branches may be used if the user either selects the
'always' policy or the 'optimal' policy on a core that supports
@@ -117,10 +115,11 @@ struct mips_cpu_info {
 && ISA_HAS_COMPACT_BRANCHES))
 
 /* Compact branches must always be generated if the user selects
-   the 'always' policy or the 'optimal' policy om a core that
-   lacks delay slot branch instructions.  */
-#define TARGET_CB_ALWAYS (mips_cb == MIPS_CB_ALWAYS\
-|| (mips_cb == MIPS_CB_OPTIMAL \
+   the 'always' policy on a core support compact branches,
+   or the 'optimal' policy on a core that lacks delay slot branch 
instructions.  */
+#define TARGET_CB_ALWAYS ((mips_cb == MIPS_CB_ALWAYS \
+&& ISA_HAS_COMPACT_BRANCHES) \
+|| (mips_cb == MIPS_CB_OPTIMAL   \
 && !ISA_HAS_DELAY_SLOTS))
 
 /* Special handling for JRC that exists in microMIPSR3 as well as R6
@@ -655,6 +654,13 @@ st

[PATCH v3 2/2] MIPS: add builtime option for -mcompact-branches

2021-02-19 Thread YunQiang Su
For R6+ target, it allows to configure gcc to use compact branches only.

gcc/ChangeLog:
* config.gcc: add -with-compact-branches=policy build option.
* doc/install.texi: Likewise.
---
 gcc/ChangeLog|  5 +
 gcc/config.gcc   | 12 +++-
 gcc/doc/install.texi | 19 +++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index caa3fda48ce..c5fae50e782 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2021-02-20  YunQiang Su  
+
+   * config.gcc: add -with-compact-branches=policy build option.
+   * doc/install.texi: Likewise.
+
 2021-02-20  YunQiang Su  
 
* config/mips/mips.c (mips_option_override):
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 17fea83b2e4..047f5631067 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4743,7 +4743,7 @@ case "${target}" in
;;
 
mips*-*-*)
-   supported_defaults="abi arch arch_32 arch_64 float fpu nan 
fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
madd4"
+   supported_defaults="abi arch arch_32 arch_64 float fpu nan 
fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
madd4 compact-branches"
 
case ${with_float} in
"" | soft | hard)
@@ -4896,6 +4896,16 @@ case "${target}" in
exit 1
;;
esac
+
+   case ${with_compact_branches} in
+   never | always | optimal)
+   with_compact_branches=${with_compact_branches}
+   ;;
+   *)
+   echo "Unknown compact-branches policy used in 
--with-compact-branches" 1>&2
+   exit 1
+   ;;
+   esac
;;
 
nds32*-*-*)
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 4c38244ae58..865630826c6 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1464,6 +1464,25 @@ systems that support conditional traps).
 Division by zero checks use the break instruction.
 @end table
 
+@item --with-compact-branches=@var{policy}
+Specify how the compiler should generate code for checking for
+division by zero.  This option is only supported on the MIPS target.
+The possibilities for @var{type} are:
+@table @code
+@item optimal
+Cause a delay slot branch to be used if one is available in the
+current ISA and the delay slot is successfully filled. If the delay slot
+is not filled, a compact branch will be chosen if one is available.
+@item never
+Ensures that compact branch instructions will never be generated.
+@item always
+Ensures that a compact branch instruction will be generated if available.
+If a compact branch instruction is not available,
+a delay slot form of the branch will be used instead.
+This option is supported from MIPS Release 6 onwards.
+For pre-R6, this option is just same as never/optimal.
+@end table
+
 @c If you make --with-llsc the default for additional targets,
 @c update the --with-llsc description in the MIPS section below.
 
-- 
2.20.1