[PR94092] Re: [RFC] test builtin ratio for loop distribution
Here's an improved version of the patch. Regstrapped on x86_64-linux-gnu, with and without a patchlet that moved multi-pieces ahead of setmem, and also tested with riscv32-elf. Is it ok to install? Or should it wait for stage1? [PR94092] introduce try store by multiple pieces From: Alexandre Oliva The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length clearing memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch handles variable lengths with multiple conditional power-of-2-constant-sized stores-by-pieces, so as to reduce the overhead of length compares. It also changes the last copy-prop pass into ccp, so that pointer alignment and length's nonzero bits are detected and made available for the expander, even for ldist-introduced SSA_NAMEs. for gcc/ChangeLog PR tree-optimization/94092 * builtins.c (try_store_by_multiple_pieces): New. (expand_builtin_memset_args): Use it. If target_char_cast fails, proceed as for non-constant val. Pass len's ctz to... * expr.c (clear_storage_hints): ... this. Try store by multiple pieces after setmem. (clear_storage): Adjust. * expr.h (clear_storage_hints): Likewise. (try_store_by_multiple_pieces): Declare. * passes.def: Replace the last copy_prop to ccp. --- gcc/builtins.c | 182 ++-- gcc/expr.c |9 ++- gcc/expr.h | 13 gcc/passes.def |5 +- 4 files changed, 197 insertions(+), 12 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 0aed008687ccc..05b14f2a3f6d3 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -6600,6 +6600,166 @@ expand_builtin_memset (tree exp, rtx target, machine_mode mode) return expand_builtin_memset_args (dest, val, len, target, mode, exp); } +/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO. + Return TRUE if successful, FALSE otherwise. TO is assumed to be + aligned at an ALIGN-bits boundary. LEN must be a multiple of + 1<= 0); + + if (val) +valc = 1; + + /* Bits more significant than TST_BITS are part of the shared prefix + in the binary representation of both min_len and max_len. Since + they're identical, we don't need to test them in the loop. */ + int tst_bits = (max_bits != min_bits ? max_bits + : floor_log2 (max_len ^ min_len)); + + /* Check whether it's profitable to start by storing a fixed BLKSIZE + bytes, to lower max_bits. In the unlikely case of a constant LEN + (implied by identical MAX_LEN and MIN_LEN), we want to issue a + single store_by_pieces, but otherwise, select the minimum multiple + of the ALIGN (in bytes) and of the MCD of the possible LENs, that + brings MAX_LEN below TST_BITS, if that's lower than min_len. */ + unsigned HOST_WIDE_INT blksize; + if (max_len > min_len) +{ + unsigned HOST_WIDE_INT alrng = MAX (HOST_WIDE_INT_1U << ctz_len, + align / BITS_PER_UNIT); + blksize = max_len- (HOST_WIDE_INT_1U << tst_bits) + alrng; + blksize &= ~(alrng - 1); +} + else if (max_len == min_len) +blksize = max_len; + else +gcc_unreachable (); + if (min_len >= blksize) +{ + min_len -= blksize; + min_bits = floor_log2 (min_len); + max_len -= blksize; + max_bits = floor_log2 (max_len); + + tst_bits = (max_bits != min_bits ? max_bits +: floor_log2 (max_len ^ min_len)); +} + else +blksize = 0; + + /* Check that we can use store by pieces for the maximum store count + we may issue (initial fixed-size block, plus conditional + power-of-two-sized from max_bits to ctz_len. */ + unsigned HOST_WIDE_INT xlenest = blksize; + if (max_bits >= 0) +xlenest += ((HOST_WIDE_INT_1U << max_bits) * 2 + - (HOST_WIDE_INT_1U << ctz_len)); + if (!can_store_by_pieces (xlenest, builtin_memset_read_str, + &valc, align, true)) +return false; + + rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode); + void *constfundata; + if (val) +{ + constfun = builtin_memset_gen_str; + constfundata = val = force_reg (TYPE_MODE (unsigned_char_type_node), + val); +} + else +{ + constfun = builtin_memset_read_str; + constfundata = &valc; +} + + rtx ptr =
Re: [PATCH] PR fortran/99147 - Sanitizer detects heap-use-after-free in gfc_add_flavor
Hi Harald, It looks 'obvious' to me too and is certainly OK for master. Thanks Paul On Thu, 18 Feb 2021 at 21:30, Harald Anlauf via Fortran wrote: > Dear all, > > the PR reports an issue detected with an ASAN instrumented compiler, > which can also be verified with valgrind. It appears that the state > of gfc_new_block could be such that it should not be dereferenced. > Reversing the order of condition evaluation helped. > > I failed to find out why this should happen, but then other places > in the code put dereferences of gfc_new_block behind other checks. > Simple things like initializing gfc_new_block with NULL in decl.c > did not help. > > Regtested on x86_64-pc-linux-gnu. No testcase added since the issue > can be found only with an instrumented compiler or valgrind. > > I consider the patch to be obvious and trivial, but post it here > in case somebody wants to dig deeper. > > OK for master? > > Thanks, > Harald > > > PR fortran/99147 - Sanitizer detects heap-use-after-free in gfc_add_flavor > > Reverse order of conditions to avoid invalid read. > > gcc/fortran/ChangeLog: > > * symbol.c (gfc_add_flavor): Reverse order of conditions. > > -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [PATCH 0/2] IBM Z: Fix long double <-> DFP conversions
On 2/18/21 1:57 PM, Ilya Leoshkevich wrote: > This series fixes PR99134. Patch 1 is factored out from the pending > [1], patch 2 is the actual fix. Bootstrapped and regtested on > s390x-redhat-linux. Ok for master? > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html > > Ilya Leoshkevich (2): > IBM Z: Improve FPRX2 <-> TF conversions > IBM Z: Fix long double <-> DFP conversions Ok. Thanks! Andreas
[PATCH] aarch64: Introduce prefer_advsimd_autovec to GCC 10
Hi all, This patch introduces the prefer_advsimd_autovec internal tune flag that's already available on the GCC 8 and 9 branches. It allows a CPU tuning to specify that it prefers Advanced SIMD for autovectorisation rather than SVE. In GCC 10 onwards this can be easily adjusted through the aarch64_autovec_preference param in the options override hook. The neoversev1_tunings struct makes use of this tuning flag Bootstrapped and tested on aarch64-none-linux-gnu. Confirmed that an --param aarch64-autovec-preference can override the CPU setting if the user really wishes to. Pushing to the GCC 10 branch. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-tuning-flags.def (prefer_advsimd_autovec): Define. * config/aarch64/aarch64.c (neoversev1_tunings): Use it. (aarch64_override_options_internal): Adjust aarch64_autovec_preference param when prefer_advsimd_autovec is enabled. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd_autovec_only_1.c: New test. asimd-vec-10.patch Description: asimd-vec-10.patch
*PING**2 Re: [Patch] Fortran: Fix coarray handling for gfc_dep_resolver [PR99010] (was: Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913]
On 09.02.21 12:52, Tobias Burnus wrote: Hi all, hi Thomas, On 02.02.21 18:15, Tobias Burnus wrote: I think I will do a combination: If 'identical' is true, I think I cannot remove it. If it is false, it can be identical or nonoverlapping – which makes sense. Well, it turned out that coarray scalars did not work; for them, the array ref consists only of the coindexed access: falling through then triggered as fin_dep == GFC_DEP_ERROR. To be more careful, I also removed the 'identical &&' check such that the first loop is now always triggered if coarray != SINGLE not only if identical – I am not sure whether it is really needed or whether falling though is the better solution (with updating the comment). I would be happy if someone could carefully check the logic – in the details, it is rather confusing. OK? Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
Re: [PATCH 0/2] RISC-V big endian support
Hi Marcus: > I could also disable the -mlittle-endian and -mbig-endian options if > the binutils is too old, but it's questionable if getting a "I don't > understand -mbig-endian" from gcc is more useful than getting it from > gas. And it would be more work for the user to fix it since then they > would have to also rebuild gcc after upgrading binutils. Sounds reasonable to me, I am ok with keeping it as it is without binutils change. I tried to run with gcc testsuite on spike + pk, seems got a bunch extra fail cases, could you take a look for that? = Summary of gcc testsuite = | # of unexpected case / # of unique unexpected case | gcc | g++ | gfortran | rv64gc/ lp64/ medlow | 476 / 109 | 131 /25 | - | Makefile:914: recipe for target 'report-gcc-newlib' failed I've setup test env via riscv-gnu-toolchain, you can reproduce by following command: $ git clone g...@github.com:riscv/riscv-gnu-toolchain.git -b big-endian riscv-gnu-toolchain-be $ cd riscv-gnu-toolchain-be $ mkdir build && cd build $ ../configure --prefix=`pwd`/install --with-arch=rv64gc --with-multilib-generator="rv64gc-lp64--" $ make report SIM=spike -j`nproc` On Sat, Jan 30, 2021 at 3:54 AM Marcus Comstedt wrote: > > > Hi Kito, > > Kito Cheng writes: > > > You can add a check in configure.ac and config.in to detect whether > > binutils is supported or not. > > here is example, search HAVE_AS_MISA_SPEC in this patch: > > https://github.com/gcc-mirror/gcc/commit/4b81528241ca682025d92558ff6aeec91dafdca8 > > Ok, but I specifically _don't_ want to require a newer binutils for > the little endian triplets (unless there is also some other RISC-V > change meaning that the newer binutils is needed anyway, in which case > this point becomes moot). > > I could check the binutils version in the case of big endian triplets > only, but since older binutils can't be built for those triplets > anyway (due to config.sub), I don't think it would do much. > > I could also disable the -mlittle-endian and -mbig-endian options if > the binutils is too old, but it's questionable if getting a "I don't > understand -mbig-endian" from gcc is more useful than getting it from > gas. And it would be more work for the user to fix it since then they > would have to also rebuild gcc after upgrading binutils. > > > // Marcus > >
Re: [PATCH] run -Wnonnull later (PR 87489)
Am 2021-02-01 um 01:31 schrieb Martin Sebor via Gcc-patches: The initial -Wnonnull implementation in the middle end took place too late in the pipeline (just before expansion), and as a result was prone to false positives (bug 78817). In an attempt to avoid the worst of those, the warning was moved to the ccp2 pass in r243874. However, as the test case in PR 87489 shows, this is in turn too early and causes other false positives as a result. A few experiments with running the warning later suggest that just before the mergephi2 pass is a good point to avoid this class of false positives without causing any regressions or introducing any new warnings either in Glibc or in Binutils/GDB. Since PR 87489 is a GCC 8-11 regression I propose to make this change on the trunk as well as on the release branches. Hi Martin, I tested your patch and it showed also one more warning for this testcase with -O2 -Wnonnull: class b { public: long c(); }; class B { public: B() : f() {} b *f; }; long d, e; class g : B { public: void h() { long a = f->c(); d = e = a; } }; class j { j(); g i; }; j::j() { i.h(); } I hope cvise didn't minimize too much, but at least in the original much larger code the warning looks reasonable too. Franz
[PATCH] tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]
Hi! The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR labels are always the first label if there is more than one label. When merging blocks, we don't honor that though. On the following testcase, we try to merge blocks: [count: 0]: : S::~S (&s); and [count: 0]: : resx 1 where is landing pad and is FORCED_LABEL. And the code puts the FORCED_LABEL before the landing pad label, violating the verification requirements. The following patch fixes it by moving the FORCED_LABEL after the DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-02-19 Jakub Jelinek PR ipa/99034 * tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing pad or non-local label, put FORCED_LABELs from bb b after that label rather than before it. * g++.dg/opt/pr99034.C: New test. --- gcc/tree-cfg.c.jj 2021-02-18 16:20:57.053826485 +0100 +++ gcc/tree-cfg.c 2021-02-18 19:15:48.436526437 +0100 @@ -2124,7 +2124,17 @@ gimple_merge_blocks (basic_block a, basi if (FORCED_LABEL (label)) { gimple_stmt_iterator dest_gsi = gsi_start_bb (a); - gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT); + tree first_label = NULL_TREE; + if (!gsi_end_p (dest_gsi)) + if (glabel *first_label_stmt + = dyn_cast (gsi_stmt (dest_gsi))) + first_label = gimple_label_label (first_label_stmt); + if (first_label + && (DECL_NONLOCAL (first_label) + || EH_LANDING_PAD_NR (first_label) != 0)) + gsi_insert_after (&dest_gsi, stmt, GSI_NEW_STMT); + else + gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT); } /* Other user labels keep around in a form of a debug stmt. */ else if (!DECL_ARTIFICIAL (label) && MAY_HAVE_DEBUG_BIND_STMTS) --- gcc/testsuite/g++.dg/opt/pr99034.C.jj 2021-02-18 19:05:27.205347304 +0100 +++ gcc/testsuite/g++.dg/opt/pr99034.C 2021-02-18 19:07:32.352973196 +0100 @@ -0,0 +1,23 @@ +// PR ipa/99034 +// { dg-do compile } +// { dg-options "-O2" } + +void *b[5]; +void foo (void); +struct S { ~S (); }; + +static inline void +__attribute__((always_inline)) +bar (int d) +{ + S s; + while (d) +foo (); +} + +void +baz (void) +{ + bar (2); + __builtin_setjmp (b); +} Jakub
Re: [committed] jit: fix ICE on BUILT_IN_TRAP [PR99126]
David Malcolm via Gcc-patches writes: > I tried several approaches to fixing this; this seemed the > least invasive. Hi Dave, thanks for fixing this. I do have a trivial question: are we guaranteed that the middle-end will not try to add any build-in other than a trap? Regards Andrea
Re: [committed] jit: fix ICE on BUILT_IN_TRAP [PR99126]
Andrea Corallo via Gcc-patches writes: > David Malcolm via Gcc-patches writes: > >> I tried several approaches to fixing this; this seemed the >> least invasive. > > Hi Dave, > > thanks for fixing this. > > I do have a trivial question: are we guaranteed that the middle-end will > not try to add any build-in other than a trap? Ah also, shouldn't the fix be also back-ported? Thanks Andrea
ARM patch ping
Hi! I'd like to ping the https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565225.html patch - PR98998 P1 fix. Thanks Jakub
Re: [PATCH] tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]
On Fri, 19 Feb 2021, Jakub Jelinek wrote: > Hi! > > The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR > labels are always the first label if there is more than one label. > > When merging blocks, we don't honor that though. > On the following testcase, we try to merge blocks: > [count: 0]: > : > S::~S (&s); > > and > [count: 0]: > : > resx 1 > > where is landing pad and is FORCED_LABEL. And the code puts > the FORCED_LABEL before the landing pad label, violating the verification > requirements. > > The following patch fixes it by moving the FORCED_LABEL after the > DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Huh, wasn't aware of such. OK. Richard. > 2021-02-19 Jakub Jelinek > > PR ipa/99034 > * tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing > pad or non-local label, put FORCED_LABELs from bb b after that label > rather than before it. > > * g++.dg/opt/pr99034.C: New test. > > --- gcc/tree-cfg.c.jj 2021-02-18 16:20:57.053826485 +0100 > +++ gcc/tree-cfg.c2021-02-18 19:15:48.436526437 +0100 > @@ -2124,7 +2124,17 @@ gimple_merge_blocks (basic_block a, basi > if (FORCED_LABEL (label)) > { > gimple_stmt_iterator dest_gsi = gsi_start_bb (a); > - gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT); > + tree first_label = NULL_TREE; > + if (!gsi_end_p (dest_gsi)) > + if (glabel *first_label_stmt > + = dyn_cast (gsi_stmt (dest_gsi))) > + first_label = gimple_label_label (first_label_stmt); > + if (first_label > + && (DECL_NONLOCAL (first_label) > + || EH_LANDING_PAD_NR (first_label) != 0)) > + gsi_insert_after (&dest_gsi, stmt, GSI_NEW_STMT); > + else > + gsi_insert_before (&dest_gsi, stmt, GSI_NEW_STMT); > } > /* Other user labels keep around in a form of a debug stmt. */ > else if (!DECL_ARTIFICIAL (label) && MAY_HAVE_DEBUG_BIND_STMTS) > --- gcc/testsuite/g++.dg/opt/pr99034.C.jj 2021-02-18 19:05:27.205347304 > +0100 > +++ gcc/testsuite/g++.dg/opt/pr99034.C2021-02-18 19:07:32.352973196 > +0100 > @@ -0,0 +1,23 @@ > +// PR ipa/99034 > +// { dg-do compile } > +// { dg-options "-O2" } > + > +void *b[5]; > +void foo (void); > +struct S { ~S (); }; > + > +static inline void > +__attribute__((always_inline)) > +bar (int d) > +{ > + S s; > + while (d) > +foo (); > +} > + > +void > +baz (void) > +{ > + bar (2); > + __builtin_setjmp (b); > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
On Thu, Feb 18 2021, Qing Zhao via Gcc-patches wrote: > (CC’ing Kees Cook on this topic) > > Hi, > > This is the first version of the complete patch for the new security feature > for GCC: > > Initialize automatic variables with new first class option > -ftrivial-auto-var-init=[uninitialized|pattern|zero] > and a new variable attribute “uninitialized” to exclude some variables from > automatical initialization to > Control runtime overhead. > [...] > > **the complete patch: > > See the attachment. I think you forgot to attach the patch (or at least I do not see any attachment). Martin
[PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered
Hi, This patch makes sure that allocno copies are not created for unordered modes. The testcases in the PR highlighted a case where an allocno copy was being created for: (insn 121 120 123 11 (parallel [ (set (reg:VNx2QI 217) (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0))) (clobber (scratch:VNx16BI)) ]) 4750 {*vec_duplicatevnx2qi_reg} (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) (nil))) As the compiler detected that the vec_duplicate_reg pattern allowed the input and output operand to be of the same register class, it tried to create an allocno copy for these two operands, stripping subregs in the process. However, this meant that the copy was between VNx2QI and SI, which have unordered mode precisions. So at compile time we do not know which of the two modes is smaller which is a requirement when updating allocno copy costs. Regression tested on aarch64-linux-gnu. Is this OK for trunk (and after a week backport to gcc-10) ? Regards, Andre gcc/ChangeLog: 2021-02-19 Andre Vieira PR rtl-optimization/98791 * ira-conflicts.c (process_regs_for_copy): Don't create allocno copies for unordered modes. gcc/testsuite/ChangeLog: 2021-02-19 Andre Vieira PR rtl-optimization/98791 * gcc.target/aarch64/sve/pr98791.c: New test. diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644 --- a/gcc/ira-conflicts.c +++ b/gcc/ira-conflicts.c @@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)]; ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)]; - if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2) + if (!allocnos_conflict_for_copy_p (a1, a2) + && offset1 == offset2 + && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)), + GET_MODE_PRECISION (ALLOCNO_MODE (a2 { cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn, ira_curr_loop_tree_node); diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c new file mode 100644 index ..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c @@ -0,0 +1,12 @@ +/* PR rtl-optimization/98791 */ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */ +extern char a[], b[]; +short c, d; +long *e; +void f() { + for (int g; g < c; g += 1) { +a[g] = d; +b[g] = e[g]; + } +}
Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)
* Jeff Law via Gcc-patches: > I'd lean towards deferring to gcc12 stage1 given the libstdc++ hack is > in place. That does mean that glibc will need to work around the > instance they've stumbled over in ppc's rawmemchr. We'll need to work around this in the glibc build, too. I'll check if the suggested alternative (have the alias covered by the pragma) works there as well. Thanks, Florian
Re: [committed] libstdc++: Fix __thread_yield for non-gthreads targets
On 15/02/21 16:13 +, Jonathan Wakely wrote: The __gthread_yield() function is only defined for gthreads targets, so check _GLIBCXX_HAS_GTHREADS before using it. Also reorder __thread_relax and __thread_yield so that the former can use the latter instead of repeating the same preprocessor checks. libstdc++-v3/ChangeLog: * include/bits/atomic_wait.h (__thread_yield()): Check _GLIBCXX_HAS_GTHREADS before using __gthread_yield. (__thread_relax()): Use __thread_yield() instead of repeating the preprocessor checks for __gthread_yield. --- a/libstdc++-v3/include/bits/atomic_wait.h +++ b/libstdc++-v3/include/bits/atomic_wait.h @@ -213,24 +213,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { _M_w._M_do_wait(_M_version); } }; -inline void -__thread_relax() noexcept -{ -#if defined __i386__ || defined __x86_64__ - __builtin_ia32_pause(); -#elif defined _GLIBCXX_USE_SCHED_YIELD - __gthread_yield(); -#endif -} - inline void __thread_yield() noexcept { -#if defined _GLIBCXX_USE_SCHED_YIELD +#if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD __gthread_yield(); #endif } +inline void +__thread_relax() noexcept +{ +#if defined __i386__ || defined __x86_64__ + __builtin_ia32_pause(); +#else + __gthread_yield(); As I said in the ChangeLog, this was supposed to be changed from __gthread_yield to __thread_yield. Fixed by this patch. Tested powerpc64le-linux. Committed to trunk. commit 9d449189ee4304ce4f250351c8aa393324421eef Author: Jonathan Wakely Date: Fri Feb 19 09:54:04 2021 libstdc++: Fix __thread_relax for non-gthreads non-x86 targets My recent change to the preprocessor conditions in __thread_relax() was supposed to also change the __gthread_yield() call to __thread_yield(), which has the right preprocessor checks. Instead I just removed the check for _GLIBCXX_USE_SCHED_YIELD which means the __gthread_yield() call will be ill-formed for non-gthreads targets, and targets without sched_yield(). This fixes it properly. libstdc++-v3/ChangeLog: * include/bits/atomic_wait.h (__thread_relax()): Call __thread_yield() not __gthread_yield(). diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h index 37085ae8e50..424fccbe4c5 100644 --- a/libstdc++-v3/include/bits/atomic_wait.h +++ b/libstdc++-v3/include/bits/atomic_wait.h @@ -217,7 +217,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __thread_yield() noexcept { #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD - __gthread_yield(); + __gthread_yield(); #endif } @@ -227,7 +227,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if defined __i386__ || defined __x86_64__ __builtin_ia32_pause(); #else - __gthread_yield(); + __thread_yield(); #endif } } // namespace __detail
RE: ARM patch ping
Hi Jakub, > -Original Message- > From: Jakub Jelinek > Sent: 19 February 2021 10:45 > To: Richard Sandiford ; Richard Earnshaw > ; Ramana Radhakrishnan > ; Kyrylo Tkachov > > Cc: gcc-patches@gcc.gnu.org > Subject: ARM patch ping > > Hi! > > I'd like to ping the > https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565225.html > patch - PR98998 P1 fix. The patch is okay, but I think we can add a testcase in gcc.target/arm/pure-code/ as Christophe says in the PR. That should skip it on incompatible configurations. Thanks, Kyrill > > Thanks > > Jakub
Re: [PATCH] gcov: use mmap pools for KVP.
On Tue, Feb 09, 2021 at 09:37:12AM +0100, Martin Liška wrote: > PING^2 > > @Honza: ? Just concerning Windows (though I don't have access to that OS and can't verify), e.g. https://github.com/m-labs/uclibc-lm32/blob/master/utils/mmap-windows.c contains public domain code to emulate mmap on top of the Windows APIs. Guess it could be simplified (we only need anonymous memory and don't need a mmap emulation, can just call). #include #include HANDLE h = CreateFileMapping (INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, size, NULL); void *mem = NULL; if (h) mem = MapViewOfFile (h, FILE_MAP_WRITE, 0, 0, size); ... // When done: if (mem) UnmapViewOfFile (mem); if (h) CloseHandle (h); Perhaps ask somebody with access to Windows to verify? Jakub
[Patch] OpenACC: C/C++ - fix async parsing [PR99137]
For: #pragma acc parallel async(1,2) avoid with C an ICE for the original tree: #pragma acc parallel async(<<< Unknown tree: c_maybe_const_expr 1, 2 >>>) It did not ICE with C++, but I think the tree does not make sense, either: #pragma acc parallel async(<<< Unknown tree: void_cst >>>, 2) OK for mainline? (Not a regression; I don't know whether it makes sense to apply to other branches). Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf OpenACC: C/C++ - fix async parsing [PR99137] gcc/c/ChangeLog: PR c/99137 * c-parser.c (c_parser_oacc_clause_async): Reject comma expressions. gcc/cp/ChangeLog: PR c/99137 * parser.c (cp_parser_oacc_clause_async): Reject comma expressions. gcc/testsuite/ChangeLog: PR c/99137 * c-c++-common/goacc/asyncwait-1.c: Update dg-error. * c-c++-common/goacc/async-1.c: New test. gcc/c/c-parser.c | 2 +- gcc/cp/parser.c| 2 +- gcc/testsuite/c-c++-common/goacc/async-1.c | 7 +++ gcc/testsuite/c-c++-common/goacc/asyncwait-1.c | 16 4 files changed, 17 insertions(+), 10 deletions(-) diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 2a49d07bab4..5cdeb21a458 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -14332,7 +14332,7 @@ c_parser_oacc_clause_async (c_parser *parser, tree list) { c_parser_consume_token (parser); - t = c_parser_expression (parser).value; + t = c_parser_expr_no_commas (parser, NULL).value; if (!INTEGRAL_TYPE_P (TREE_TYPE (t))) c_parser_error (parser, "expected integer expression"); else if (t == error_mark_node diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 70775792161..6a29b6dca10 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -37991,7 +37991,7 @@ cp_parser_oacc_clause_async (cp_parser *parser, tree list) matching_parens parens; parens.consume_open (parser); - t = cp_parser_expression (parser); + t = cp_parser_assignment_expression (parser); if (t == error_mark_node || !parens.require_close (parser)) cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true, diff --git a/gcc/testsuite/c-c++-common/goacc/async-1.c b/gcc/testsuite/c-c++-common/goacc/async-1.c new file mode 100644 index 000..a578dabce8c --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/async-1.c @@ -0,0 +1,7 @@ +/* PR c/99137 */ + +void f () +{ + #pragma acc parallel async(1,2) /* { dg-error "expected '\\)' before ',' token" } */ + ; +} diff --git a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c index 2f5d4762b49..b5d789621ec 100644 --- a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c +++ b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c @@ -9,7 +9,7 @@ f (int N, float *a, float *b) b[ii] = a[ii]; } -#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,) /* { dg-error "expected (primary-|)expression before" } */ +#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,) /* { dg-error "expected '\\)' before ',' token" } */ { for (ii = 0; ii < N; ii++) b[ii] = a[ii]; @@ -21,19 +21,19 @@ f (int N, float *a, float *b) b[ii] = a[ii]; } -#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,) /* { dg-error "expected (primary-|)expression before" } */ +#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,) /* { dg-error "expected '\\)' before ',' token" } */ { for (ii = 0; ii < N; ii++) b[ii] = a[ii]; } -#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2 3) /* { dg-error "expected '\\)' before numeric constant" } */ +#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2 3) /* { dg-error "expected '\\)' before ',' token" } */ { for (ii = 0; ii < N; ii++) b[ii] = a[ii]; } -#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,,) /* { dg-error "expected (primary-|)expression before" } */ +#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) async (1,2,,) /* { dg-error "expected '\\)' before ',' token" } */ { for (ii = 0; ii < N; ii++) b[ii] = a[ii]; @@ -193,15 +193,15 @@ f (int N, float *a, float *b) #pragma acc wait async (1 2) /* { dg-error "expected '\\)' before numeric constant" } */ -#pragma acc wait async (1,) /* { dg-error "expected (primary-|)expression before" } */ +#pragma acc wait async (1,) /* { dg-error "expected '\\)' before ',' token" } */ #pragma acc wait async (,1) /* { dg-error "expected (primary-|)expression before" } */ -#pragma acc wait async (1,2,) /* { dg-error "expected (primary-|)expression before" } */ +#pragma acc wait async (1,2,) /* { dg-error "expected '\\)' before ',' token" } */ -#pragma acc wait async (1,2 3) /* { dg-error "expected '\\)' before
[PATCH] middle-end/99122 - more VLA inlining fixes
This avoids declaring a function with VLA arguments or return values as inlineable. IPA CP still ICEs, so the testcase has that disabled. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-02-19 Richard Biener PR middle-end/99122 * tree-inline.c (inline_forbidden_p): Do not inline functions with VLA arguments or return value. * gcc.dg/pr99122-3.c: New testcase. --- gcc/testsuite/gcc.dg/pr99122-3.c | 19 +++ gcc/tree-inline.c| 10 ++ 2 files changed, 29 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr99122-3.c diff --git a/gcc/testsuite/gcc.dg/pr99122-3.c b/gcc/testsuite/gcc.dg/pr99122-3.c new file mode 100644 index 000..6aa5b2912ca --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr99122-3.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -g -fno-ipa-cp -w" } */ + +static int foo (); + +int +bar (int n) +{ + return foo (n, 2.0); +} + +static inline int +foo (int n, struct T { char a[n]; } b) +{ + int r = 0, i; + for (i = 0; i < n; i++) +r += b.a[i]; + return r; +} diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 01a08cf27be..c993b1fee8a 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -4022,6 +4022,16 @@ inline_forbidden_p (tree fndecl) wi.info = (void *) fndecl; wi.pset = &visited_nodes; + /* We cannot inline a function with a VLA typed argument or result since + we have no implementation materializing a variable of such type in + the caller. */ + if (COMPLETE_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))) + && !poly_int_tree_p (TYPE_SIZE (TREE_TYPE (TREE_TYPE (fndecl) +return true; + for (tree parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm)) +if (!poly_int_tree_p (DECL_SIZE (parm))) + return true; + FOR_EACH_BB_FN (bb, fun) { gimple *ret; -- 2.26.2
[PATCH] slp: fix sharing of SLP only patterns. (PR99149)
Hi Richi, The attached testcase ICEs due to a couple of issues. In the testcase you have two SLP instances that share the majority of their definition with each other. One tree defines a COMPLEX_MUL sequence and the other tree a COMPLEX_FMA. The ice happens because: 1. the refcounts are wrong, in particular the FMA case doesn't correctly count the references for the COMPLEX_MUL that it consumes. 2. when the FMA is created it incorrectly assumes it can just tear apart the MUL node that it's consuming. This is wrong and should only be done when there is no more uses of the node, in which case the vector only pattern is no longer relevant. To fix the last part the SLP only pattern reset code was moved into vect_free_slp_tree which results in cleaner code. I also think it does belong there since that function knows when there are no more uses of the node and so the pattern should be unmarked, so when the the vectorizer is inspecting the BB it doesn't find the now invalid vector only patterns. This has the obvious problem in that, eventually after analysis is done, the entire SLP tree is dissolved before codegen. Where we get into trouble as we have now dissolved the patterns too... My initial thought was to add a parameter to vect_free_slp_tree, but I know you wouldn't like that. So I am sending this patch up as an RFC. PS. This testcase actually shows that the codegen we get in these cases is not optimal. Currently this won't vectorize as the compiler thinks the vector version is too expensive. My guess here is because the patterns now unshare the tree and it's likely costing the setup for the vector code twice? Even with the shared code (without patterns, so same as GCC 10, or turning off the patterns) it won't vectorize it. The scalar code is mov w0, 0 ldr x4, [x1, #:lo12:.LANCHOR0] ldrsw x2, [x3, 20] ldr x1, [x3, 8] lsl x2, x2, 3 ldrsw x3, [x3, 16] ldp s3, s2, [x4] add x5, x1, x2 ldr s0, [x1, x2] lsl x3, x3, 3 add x4, x1, x3 fmuls1, s2, s2 fnmsub s1, s3, s3, s1 fmuls0, s2, s0 fmadd s0, s3, s2, s0 ldr s3, [x1, x3] ldr s2, [x4, 4] fadds3, s3, s1 fadds2, s0, s2 str s3, [x1, x3] str s2, [x4, 4] str s1, [x1, x2] str s0, [x5, 4] ret and turning off the cost model we get moviv1.2s, 0 mov w0, 0 ldr x4, [x1, #:lo12:.LANCHOR0] ldrsw x3, [x2, 16] ldr x1, [x2, 8] ldrsw x2, [x2, 20] ldr d0, [x4] fcmla v1.2s, v0.2s, v0.2s, #0 ldr d2, [x1, x3, lsl 3] fcmla v2.2s, v0.2s, v0.2s, #0 fcmla v2.2s, v0.2s, v0.2s, #90 str d2, [x1, x3, lsl 3] fcmla v1.2s, v0.2s, v0.2s, #90 str d1, [x1, x2, lsl 3] however, if the pattern matcher doesn't create the FMA node because it would unshare the tree (which I think is a general heuristic that would work out to better code wouldn't it?) then we would get (with the cost model enabled even) moviv0.2s, 0 mov w0, 0 ldr x4, [x1, #:lo12:.LANCHOR0] ldrsw x3, [x2, 16] ldr x1, [x2, 8] ldr d1, [x4] fcmla v0.2s, v1.2s, v1.2s, #0 fcmla v0.2s, v1.2s, v1.2s, #90 ldrsw x2, [x2, 20] ldr d1, [x1, x3, lsl 3] faddv1.2s, v1.2s, v0.2s str d1, [x1, x3, lsl 3] str d0, [x1, x2, lsl 3] ret Which is the most optimal form. So I think this should perhaps be handled in GCC 12 if there's a way to detect when you're going to unshare a sub-tree. Thanks, Tamar gcc/ChangeLog: PR tree-optimization/99149 * tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the buffer. (vect_slp_reset_pattern): Remove. (complex_fma_pattern::matches): Remove call to vect_slp_reset_pattern. (complex_mul_pattern::build, complex_fma_pattern::build, complex_fms_pattern::build): Fix ref counts. * tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern relevancy when node is being deleted. * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value. gcc/testsuite/ChangeLog: PR tree-optimization/99149 * gcc.dg/vect/pr99149.C: New test. --- inline copy of patch -- diff --git a/gcc/testsuite/gcc.dg/vect/pr99149.C b/gcc/testsuite/gcc.dg/vect/pr99149.C new file mode 100755 index ..b12fe17e4ded148ce2bf67486e425dd65461a148 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr99149.C @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */ + +class a { + float b; + float c; + +public: + a(float d, float e) : b(d), c(e) {} +
RE: [PATCH] slp: fix sharing of SLP only patterns. (PR99149)
Ps. The code in comment is needed, but was commented out due to the early free issue described in the patch. Thanks, Tamar > -Original Message- > From: Tamar Christina > Sent: Friday, February 19, 2021 2:42 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; rguent...@suse.de > Subject: [PATCH] slp: fix sharing of SLP only patterns. (PR99149) > > Hi Richi, > > The attached testcase ICEs due to a couple of issues. > In the testcase you have two SLP instances that share the majority of their > definition with each other. One tree defines a COMPLEX_MUL sequence > and the other tree a COMPLEX_FMA. > > The ice happens because: > > 1. the refcounts are wrong, in particular the FMA case doesn't correctly count > the references for the COMPLEX_MUL that it consumes. > > 2. when the FMA is created it incorrectly assumes it can just tear apart the > MUL node that it's consuming. This is wrong and should only be done when > there is no more uses of the node, in which case the vector only pattern is no > longer relevant. > > To fix the last part the SLP only pattern reset code was moved into > vect_free_slp_tree which results in cleaner code. I also think it does belong > there since that function knows when there are no more uses of the node > and so the pattern should be unmarked, so when the the vectorizer is > inspecting the BB it doesn't find the now invalid vector only patterns. > > This has the obvious problem in that, eventually after analysis is done, the > entire SLP tree is dissolved before codegen. Where we get into trouble as > we have now dissolved the patterns too... > > My initial thought was to add a parameter to vect_free_slp_tree, but I know > you wouldn't like that. So I am sending this patch up as an RFC. > > PS. This testcase actually shows that the codegen we get in these cases is not > optimal. Currently this won't vectorize as the compiler thinks the vector > version is too expensive. > > My guess here is because the patterns now unshare the tree and it's likely > costing the setup for the vector code twice? > > Even with the shared code (without patterns, so same as GCC 10, or turning > off the patterns) it won't vectorize it. > > The scalar code is > > mov w0, 0 > ldr x4, [x1, #:lo12:.LANCHOR0] > ldrsw x2, [x3, 20] > ldr x1, [x3, 8] > lsl x2, x2, 3 > ldrsw x3, [x3, 16] > ldp s3, s2, [x4] > add x5, x1, x2 > ldr s0, [x1, x2] > lsl x3, x3, 3 > add x4, x1, x3 > fmuls1, s2, s2 > fnmsub s1, s3, s3, s1 > fmuls0, s2, s0 > fmadd s0, s3, s2, s0 > ldr s3, [x1, x3] > ldr s2, [x4, 4] > fadds3, s3, s1 > fadds2, s0, s2 > str s3, [x1, x3] > str s2, [x4, 4] > str s1, [x1, x2] > str s0, [x5, 4] > ret > > and turning off the cost model we get > > moviv1.2s, 0 > mov w0, 0 > ldr x4, [x1, #:lo12:.LANCHOR0] > ldrsw x3, [x2, 16] > ldr x1, [x2, 8] > ldrsw x2, [x2, 20] > ldr d0, [x4] > fcmla v1.2s, v0.2s, v0.2s, #0 > ldr d2, [x1, x3, lsl 3] > fcmla v2.2s, v0.2s, v0.2s, #0 > fcmla v2.2s, v0.2s, v0.2s, #90 > str d2, [x1, x3, lsl 3] > fcmla v1.2s, v0.2s, v0.2s, #90 > str d1, [x1, x2, lsl 3] > > however, if the pattern matcher doesn't create the FMA node because it > would unshare the tree (which I think is a general heuristic that would work > out to better code wouldn't it?) then we would get (with the cost model > enabled even) > > moviv0.2s, 0 > mov w0, 0 > ldr x4, [x1, #:lo12:.LANCHOR0] > ldrsw x3, [x2, 16] > ldr x1, [x2, 8] > ldr d1, [x4] > fcmla v0.2s, v1.2s, v1.2s, #0 > fcmla v0.2s, v1.2s, v1.2s, #90 > ldrsw x2, [x2, 20] > ldr d1, [x1, x3, lsl 3] > faddv1.2s, v1.2s, v0.2s > str d1, [x1, x3, lsl 3] > str d0, [x1, x2, lsl 3] > ret > > Which is the most optimal form. So I think this should perhaps be handled in > GCC 12 if there's a way to detect when you're going to unshare a sub-tree. > > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/99149 > * tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the > buffer. > (vect_slp_reset_pattern): Remove. > (complex_fma_pattern::matches): Remove call to > vect_slp_reset_pattern. > (complex_mul_pattern::build, complex_fma_pattern::build, > complex_fms_pattern::build): Fix ref counts. > * tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern > relevancy > when node is being deleted. > * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value. > > gcc/testsuite/ChangeLog:
Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered
On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote: Hi, This patch makes sure that allocno copies are not created for unordered modes. The testcases in the PR highlighted a case where an allocno copy was being created for: (insn 121 120 123 11 (parallel [ (set (reg:VNx2QI 217) (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0))) (clobber (scratch:VNx16BI)) ]) 4750 {*vec_duplicatevnx2qi_reg} (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) (nil))) As the compiler detected that the vec_duplicate_reg pattern allowed the input and output operand to be of the same register class, it tried to create an allocno copy for these two operands, stripping subregs in the process. However, this meant that the copy was between VNx2QI and SI, which have unordered mode precisions. So at compile time we do not know which of the two modes is smaller which is a requirement when updating allocno copy costs. Regression tested on aarch64-linux-gnu. Is this OK for trunk (and after a week backport to gcc-10) ? OK. Yes, it is wise to wait a bit and see how the patch behaves on the trunk before submitting it to gcc-10 branch. Sometimes such changes can have quite unexpected consequences. But I guess not in this is case. Thank you for working on the issue. gcc/ChangeLog: 2021-02-19 Andre Vieira PR rtl-optimization/98791 * ira-conflicts.c (process_regs_for_copy): Don't create allocno copies for unordered modes. gcc/testsuite/ChangeLog: 2021-02-19 Andre Vieira PR rtl-optimization/98791 * gcc.target/aarch64/sve/pr98791.c: New test.
[Patch] Fortran: Fix DTIO with type ICE [PR99146]
In this example, the formal argument is a derived type and not a class – hence, there is an ICE. OK for the trunk? Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf Fortran: Fix DTIO with type ICE [PR99146] gcc/fortran/ChangeLog: PR fortran/99146 * interface.c: gcc/testsuite/ChangeLog: PR fortran/99146 * gfortran.dg/dtio_36.f90: New test. gcc/fortran/interface.c | 4 +++- gcc/testsuite/gfortran.dg/dtio_36.f90 | 33 + 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c index 87fe14280e6..f7ca52e6550 100644 --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -5305,7 +5305,9 @@ gfc_find_specific_dtio_proc (gfc_symbol *derived, bool write, bool formatted) } finish: - if (dtio_sub && derived != CLASS_DATA (dtio_sub->formal->sym)->ts.u.derived) + if (dtio_sub + && dtio_sub->formal->sym->ts.type == BT_CLASS + && derived != CLASS_DATA (dtio_sub->formal->sym)->ts.u.derived) gfc_find_derived_vtab (derived); return dtio_sub; diff --git a/gcc/testsuite/gfortran.dg/dtio_36.f90 b/gcc/testsuite/gfortran.dg/dtio_36.f90 new file mode 100644 index 000..4e53581b86a --- /dev/null +++ b/gcc/testsuite/gfortran.dg/dtio_36.f90 @@ -0,0 +1,33 @@ +! { dg-do compile } +! +! PR fortran/99146 +! + MODULE p + TYPE :: person + sequence + END TYPE person + INTERFACE READ(UNFORMATTED) + MODULE PROCEDURE pruf + END INTERFACE + + CONTAINS + + SUBROUTINE pruf (dtv,unit,iostat,iomsg) + type(person), INTENT(INOUT) :: dtv + INTEGER, INTENT(IN) :: unit + INTEGER, INTENT(OUT) :: iostat + CHARACTER (LEN=*), INTENT(INOUT) :: iomsg + iostat = 1 + END SUBROUTINE pruf + + END MODULE p + + PROGRAM test + USE p + TYPE (person) :: chairman + + OPEN (UNIT=71, status = 'scratch', FORM='UNFORMATTED') + + read(71) chairman + + END PROGRAM test
Re: [Patch] Fortran: Fix DTIO with type ICE [PR99146]
On 2/19/21 8:00 AM, Tobias Burnus wrote: In this example, the formal argument is a derived type and not a class – hence, there is an ICE. OK for the trunk? This is OK, could you also check 89219 and 81499 and see if these are the same or similar. Much appreciated. Jerry
Re: *PING**2 Re: [Patch] Fortran: Fix coarray handling for gfc_dep_resolver [PR99010] (was: Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913]
On 2/19/21 1:33 AM, Tobias Burnus wrote: On 09.02.21 12:52, Tobias Burnus wrote: Hi all, hi Thomas, On 02.02.21 18:15, Tobias Burnus wrote: I think I will do a combination: If 'identical' is true, I think I cannot remove it. If it is false, it can be identical or nonoverlapping – which makes sense. Well, it turned out that coarray scalars did not work; for them, the array ref consists only of the coindexed access: falling through then triggered as fin_dep == GFC_DEP_ERROR. To be more careful, I also removed the 'identical &&' check such that the first loop is now always triggered if coarray != SINGLE not only if identical – I am not sure whether it is really needed or whether falling though is the better solution (with updating the comment). I would be happy if someone could carefully check the logic – in the details, it is rather confusing. OK? Yes OK, thanks for patch. Jerry
[PATCH][obvious] Fix typo in param description.
Pushed to master as obvious. Martin gcc/ChangeLog: PR translation/99167 * params.opt: Fix typo. --- gcc/params.opt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/params.opt b/gcc/params.opt index a4e1ac0e88e..0dd9ac406eb 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -957,7 +957,7 @@ Maximum number of references stored in each modref base. -param=modref-max-accesses= Common Joined UInteger Var(param_modref_max_accesses) Init(16) Param Optimization -Maximum number of accesse stored in each modref reference. +Maximum number of accesses stored in each modref reference. -param=modref-max-tests= Common Joined UInteger Var(param_modref_max_tests) Init(64) Param Optimization -- 2.30.1
Re: [patch, fortran] PR96686 Namelist group objects shall be defined before appearing in namelist
Hi Jerry, On 17.02.21 04:02, Jerry DeLisle wrote: Attached patch adds to checks. In the case of IMPLICIT typing it checks to see if the objects listed in the NAMELIST have defined types andf if not, sets them to the default implicit types. In the case of IMPLICIT NONE, the types are required be declared before the NAMELIST. If an object type is found to not be declared already, an error is issued. Regression tested. OK for trunk? After taking a look while being less groggy, it looks good to me, but I have a few remarks: I think it looks cleaner to swap inner/outer the conditions, i.e. if (sym->ts.type == BT_UNKNOWN) { if (!gfc_current_ns->seen_implicit_none) ... else ... } +++ b/gcc/testsuite/gfortran.dg/namelist_4.f90 @@ -27,14 +27,14 @@ END module M1 program P1 CONTAINS ! This has the additional wrinkle of a reference to the object. + INTEGER FUNCTION F2() +F2=1 + END FUNCTION INTEGER FUNCTION F1() NAMELIST /NML3/ F2 ! { dg-error "PROCEDURE attribute conflicts" } ! Used to ICE here -f2 = 1 ! { dg-error "is not a VALUE" } +f2 = 1 ! { dg-error "is not a variable" } F1=1 END FUNCTION - INTEGER FUNCTION F2() -F2=1 - END FUNCTION END Unless I made a mistake, there is no need to modify this testcase – even with the patch, the error remains the same. However, if the "f2 = 1" line is removed, the previously hidden error in the NAMELIST line is shown: 'PROCEDURE attribute conflicts with NAMELIST attribute in ‘f2’ at (1)' is shown. I think you should retain this testcase – and either add between the two functions a new one, copying 'f1' but with 'f2 = 1' removed. You then get the expected NAMELIST error. — Or to add a new testcase for this check. PLUS: I think it would be useful to add a test of the form: namelist /nml/ f integer :: f which then shows the new error (declared before the namelist) which is currently not checked for. * * * On 19.02.21 16:58, Jerry DeLisle wrote: On 2/17/21 1:19 AM, Tobias Burnus wrote: f2 looks like a local and implicitly typed real variable. At least ifort compiles this program successfully. I have to admit that I am not sure about this – implicit typing is odd, if combined with host association. But I think you are right. I also got confused due to the reordering, which is not needed. Regarding: contains integer function f1() !f2 = 1 !This gives an error trying to assign a value to a function. Okay. Also matches ifort: "This name has already been used as an external function name." j = f2! This is OK This one is odd – you assign a function address to an integer (the compiler does the casting). ifort rejects it with: "This name has already been used as an external function name." That looks like a variant of https://gcc.gnu.org/PR98890 – I added it as additional example. Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
[PATCH] Document the GCC11 change to DWARF5 default.
* gcc-11/changes.html (General Improvements): Add a section on the DWARF version 5 default. --- htdocs/gcc-11/changes.html | 30 ++ 1 file changed, 30 insertions(+) diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html index de75b8d6..04c3c449 100644 --- a/htdocs/gcc-11/changes.html +++ b/htdocs/gcc-11/changes.html @@ -139,6 +139,36 @@ a work-in-progress. -fsanitize=kernel-hwaddress to instrument kernel code. + + + For targets that produce DWARF debugging information GCC now + defaults to http://dwarfstd.org/doc/DWARF5.pdf";>DWARF + version 5 (with the exception of VxWorks and Darwin/Mac OS X + which default to version 2 and AIX which defaults to version 4). + This can produce up to 25% more compact debug information + compared to earlier versions. + + To take full advantage of DWARF version 5 GCC needs to be build + against binutils version 2.35.2 or higher. When GCC is build + against earlier versions of binutils GCC will still emit DWARF + version 5 for most debuginfo data, but will generate version 4 + debug line tables (even when explicitly given -gdwarf-5). + + The following debug information consumers can process DWARF version 5: + + GDB 8.0, or higher + valgrind 3.17.0 + elfutils 0.172, or higher (for use with systemtap, + dwarves/pahole, perf and libabigail). + + + Programs embedding libbacktrace are urged to upgrade to the version + shipping with GCC 11. + + To make GCC 11 generate an older DWARF version + use -g together with -gdwarf-2, + -gdwarf-3 or -gdwarf-4. + -- 2.18.4
c++: Inform of CMI reads [PR 99166]
When successfully reading a module CMI, the user gets no indication of where that CMI was located. I originally didn't consider this a problem -- the read was successful after all. But it can make it difficult to interact with build systems, particularly when caching can be involved. Grovelling over internal dump files is not really useful to the user. Hence this option, which is similar to the -flang-info-include-translate variants, and allows the user to ask for all, or specific module read notification. gcc/c-family/ * c.opt (flang-info-module-read, flang-info-module-read=): New. gcc/ * doc/invoke.texi (flang-info-module-read): Document. gcc/cp/ * module.cc (note_cmis): New. (struct module_state): Add inform_read_p bit. (module_state::do_import): Inform of CMI location, if enabled. (init_modules): Canonicalize note_cmis entries. (handle_module_option): Handle -flang-info-module-read=FOO. gcc/testsuite/ * g++.dg/modules/pr99166_a.X: New. * g++.dg/modules/pr99166_b.C: New. * g++.dg/modules/pr99166_c.C: New. * g++.dg/modules/pr99166_d.C: New. -- Nathan Sidwell diff --git c/gcc/c-family/c.opt w/gcc/c-family/c.opt index b209d46d32b..3264c646ad3 100644 --- c/gcc/c-family/c.opt +++ w/gcc/c-family/c.opt @@ -1752,6 +1752,14 @@ flang-info-include-translate= C++ Joined RejectNegative MissingArgError(missing header name) Note a #include translation of a specific header. +flang-info-module-read +C++ Var(note_module_read_yes) +Note Compiled Module Interface pathnames. + +flang-info-module-read= +C++ Joined RejectNegative MissingArgError(missing module name) +Note Compiled Module Interface pathname of a specific module or header-unit. + fmax-include-depth= C ObjC C++ ObjC++ Joined RejectNegative UInteger fmax-include-depth= Set the maximum depth of the nested #include. diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc index e801c52069e..691381bf995 100644 --- c/gcc/cp/module.cc +++ w/gcc/cp/module.cc @@ -317,6 +317,9 @@ version2string (unsigned version, verstr_t &out) /* Include files to note translation for. */ static vec *note_includes; +/* Modules to note CMI pathames. */ +static vec *note_cmis; + /* Traits to hash an arbitrary pointer. Entries are not deletable, and removal is a noop (removal needed upon destruction). */ template @@ -3547,9 +3550,10 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state { do it again */ bool call_init_p : 1; /* This module's global initializer needs calling. */ + bool inform_read_p : 1; /* Inform of a read. */ /* Record extensions emitted or permitted. */ unsigned extensions : SE_BITS; - /* 12 bits used, 4 bits remain */ + /* 13 bits used, 3 bits remain */ public: module_state (tree name, module_state *, bool); @@ -3782,6 +3786,8 @@ module_state::module_state (tree name, module_state *parent, bool partition) partition_p = partition; + inform_read_p = false; + extensions = 0; if (name && TREE_CODE (name) == STRING_CST) { @@ -18634,6 +18640,8 @@ module_state::do_import (cpp_reader *reader, bool outermost) { const char *file = maybe_add_cmi_prefix (filename); dump () && dump ("CMI is %s", file); + if (note_module_read_yes || inform_read_p) + inform (loc, "reading CMI %qs", file); fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY); e = errno; } @@ -19545,6 +19553,7 @@ init_modules (cpp_reader *reader) headers = BITMAP_GGC_ALLOC (); if (note_includes) +/* Canonicalize header names. */ for (unsigned ix = 0; ix != note_includes->length (); ix++) { const char *hdr = (*note_includes)[ix]; @@ -19567,6 +19576,37 @@ init_modules (cpp_reader *reader) (*note_includes)[ix] = path; } + if (note_cmis) +/* Canonicalize & mark module names. */ +for (unsigned ix = 0; ix != note_cmis->length (); ix++) + { + const char *name = (*note_cmis)[ix]; + size_t len = strlen (name); + + bool is_system = name[0] == '<'; + bool is_user = name[0] == '"'; + bool is_pathname = false; + if (!(is_system || is_user)) + for (unsigned ix = len; !is_pathname && ix--;) + is_pathname = IS_DIR_SEPARATOR (name[ix]); + if (is_system || is_user || is_pathname) + { + if (len <= (is_pathname ? 0 : 2) + || (!is_pathname && name[len-1] != (is_system ? '>' : '"'))) + { + error ("invalid header name %qs", name); + continue; + } + else + name = canonicalize_header_name (is_pathname ? nullptr : reader, + 0, is_pathname, name, len); + } + if (auto module = get_module (name)) + module->inform_read_p = 1; + else + error ("invalid module name %qs", name); + } + dump.push (NULL); /* Determine lazy handle bound. */ @@ -19952,6 +19992,10 @@ handle_module_option (unsigned code, const char *str, int) vec_safe_push (note_includes, str); return true; +
Re: [PATCH] clear VLA bounds from all arguments (PR 97172)
On 2/18/21 8:51 PM, Jeff Law wrote: On 2/18/21 7:23 PM, Martin Sebor wrote: The fix for PR 97172 that removes non-constant VLA bounds from attribute access is incomplete: it inadvertently removes the bounds corresponding to just the first VLA argument, and not from subsequent arguments. The attached change removes the vestigial condition that causes this bug. Since it's behind a build failure in a Fedora package (cava?) I'll commit it under the "obvious" clause tomorrow if there are no concerns. Martin gcc-97172-2.diff PR c/97172 - ICE: tree code 'ssa_name' is not supported in LTO streams gcc/ChangeLog: * attribs.c (init_attr_rdwr_indices): Guard vblist use. (attr_access::free_lang_data): Remove a spurious test. gcc/testsuite/ChangeLog: * gcc.dg/pr97172.c: Add test cases. OK. Thanks. I just committed this change in g:3599ecb6a01. Martin
Re: [PATCH] run -Wnonnull later (PR 87489)
On 2/19/21 2:48 AM, Franz Sirl wrote: Am 2021-02-01 um 01:31 schrieb Martin Sebor via Gcc-patches: The initial -Wnonnull implementation in the middle end took place too late in the pipeline (just before expansion), and as a result was prone to false positives (bug 78817). In an attempt to avoid the worst of those, the warning was moved to the ccp2 pass in r243874. However, as the test case in PR 87489 shows, this is in turn too early and causes other false positives as a result. A few experiments with running the warning later suggest that just before the mergephi2 pass is a good point to avoid this class of false positives without causing any regressions or introducing any new warnings either in Glibc or in Binutils/GDB. Since PR 87489 is a GCC 8-11 regression I propose to make this change on the trunk as well as on the release branches. Hi Martin, I tested your patch and it showed also one more warning for this testcase with -O2 -Wnonnull: class b { public: long c(); }; class B { public: B() : f() {} b *f; }; long d, e; class g : B { public: void h() { long a = f->c(); d = e = a; } }; class j { j(); g i; }; j::j() { i.h(); } I hope cvise didn't minimize too much, but at least in the original much larger code the warning looks reasonable too. Thanks. Yes, the warning would be useful here since the f pointer in the call to f->c() is null when i.h() is called from j's ctor. The FRE3 pass clearly exposes this : void j::j (struct j * const this) { long int _9; [local count: 1073741824]: MEM[(struct B *)this_3(D)] ={v} {CLOBBER}; MEM[(struct B *)this_3(D)].f = 0B; _9 = b::c (0B); ... Because the warning runs early (after CCP2), the null pointer is not detected. To see it the warning code would have to work quite hard to figure out that the two assignments below refer to the same location (it would essentially have to do what FRE does): MEM[(struct B *)this_3(D)].f = 0B; _7 = MEM[(struct b * *)_1]; _9 = b::c (_7); It's probably too late to make this change for GCC 11 as Jeff has already decided that it should be deferred until GCC 12, and even then there's a concern that doing so might cause false positives. I think it's worth revisiting the idea in GCC 12 to see if the concern is founded. Let me make a note of it in the bug. Martin
[WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
Hello Sorry for taking so long in replying. On 29/01/2021 3:03 pm, Jakub Jelinek wrote: It can also crash if team is NULL, which will happen any time this is called outside of a parallel. Just try (should go into testsuite too): #include int main () { omp_event_handle_t ev; #pragma omp task detach (ev) omp_fulfill_event (ev); return 0; } I have included this as task-detach-11.{c|f90}. Additionally, there is an important difference between fulfill for included tasks and for non-included tasks, for the former there is no team or anything to care about, for the latter there is a team and one needs to take the task_lock, but at that point it can do pretty much everything in omp_fulfill_event rather than handling it elsewhere. So, what I'm suggesting is: Replace bool detach; gomp_sem_t completion_sem; with struct gomp_task_detach *detach; and add struct gomp_task_detach that would contain everything that will be needed (indirect so that we don't waste space for it in every task, but only for those that have detach clause). We need: 1) some way to tell if it is an included task or not 2) for included tasks the gomp_sem_t completion_sem (and nothing but 1) and 2) for those), 3) struct gomp_team * for non-included tasks 4) some way to find out if the task has finished and is just waiting for fulfill event (perhaps your GOMP_TASK_DETACHED is ok for that) 5) some way to find out if the task has been fulfilled already (gomp_sem_t for that seems an overkill though) 1) could be done through the struct gomp_team *team; member, set it to NULL in included tasks (no matter if they are in some team or not) and to non-NULL team of the task (non-included tasks must have a team). I have opted for a union of completion_sem (for tasks that are undeferred) and a struct gomp_team *detach_team (for deferred tasks) that holds the team if the completion event has not yet fulfilled, or NULL if is it. I don't see the point of having an indirection to the union here since the union is just the size of a pointer, so it might as well be inlined. And I don't see the point of task_detach_queue if we can handle the dependers etc. all in omp_fulfill_event, which I think we can if we take the task_lock. I have removed the task_detach_queue. The team barrier, taskwait and taskgroup_end now just set the task kind to GOMP_TASK_DETACHED without decrementing the task_count if a task finishes with detach_team non-NULL. So, I think omp_fulfill_event should look at the task->detach it got, if task->detach->team is NULL, it is included task, GOMP_task should have initialized task->detach->completion_sem and omp_fulfill_event should just gomp_sem_post it and that is all, GOMP_task for included task needs to gomp_sem_wait after it finishes before it returns. omp_fulfill_event now posts completion_sem if the task kind is OMP_TASK_UNDEFERRED, and GOMP_task waits for it. Since the task is executed within GOMP_task, it already knows if the task has a detach clause or not, so we do not need to store that information in gomp_task. Otherwise, take the team's task_lock, and look at whether the task is still running, in that case just set the bool that it has been fulfilled (or whatever way of signalling 5), perhaps it can be say clearing task->detach pointer). detach_team is now set to NULL when the event is fulfilled if the task has not started yet or is still executing (checked by the kind). In that case, when the task finishes executing, it behaves just like a task without detach would and finishes normally. When creating non-included tasks in GOMP_task with detach clause through gomp_malloc, it would add the size needed for struct gomp_task_detach. Not necessary with the inlined union. But if the task is already in GOMP_TASK_DETACHED state, instead we need while holding the task_lock do everything that would have been done normally on task finish, but we've skipped it because it hasn't been fulfilled. Including the waking/sem_posts when something could be waiting on that task. Do you agree with this, or see some reason why this can't work? The main problem I see is this code in gomp_barrier_handle_tasks: if (--team->task_count == 0 && gomp_team_barrier_waiting_for_tasks (&team->barrier)) { gomp_team_barrier_done (&team->barrier, state); We do not have access to state from within omp_fulfill_event, so how should this be handled? And testsuite should include also cases where we wait for the tasks with detach clause to be fulfilled at the end of taskgroup (i.e. need to cover all of taskwait, taskgroup end and barrier). I have changed task-detach-[56].* to test the barrier, task-detach-[78].* to test taskwait, and task-detach-(9|10) to test taskgroup (with the first one without a target construct, the second with). I have included the current state of my patch. All task-detach-* tests pass when executed without offloading o
New French PO file for 'gcc' (version 11.1-b20210207)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the French team of translators. The file is available at: https://translationproject.org/latest/gcc/fr.po (This file, 'gcc-11.1-b20210207.fr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
libgo patch committed: Update to Go1.16 release
This patch updates libgo to the final Go 1.16 release. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian c89b42d3b9d76dedb35e8478913ddf5367f3b5e6 diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index eed9ce01905..217bdd55f1d 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -c406de0594782b1d6782a732a50f5b76387852dc +78a840e4940159a66072237f6b002ab79f441b79 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/libgo/MERGE b/libgo/MERGE index b2fc633f06c..183b0245ee2 100644 --- a/libgo/MERGE +++ b/libgo/MERGE @@ -1,4 +1,4 @@ -3e06467282c6d5678a6273747658c04314e013ef +f21be2fdc6f1becdbed1592ea0b245cdeedc5ac8 The first line of this file holds the git revision number of the last merge done from the master library sources. diff --git a/libgo/VERSION b/libgo/VERSION index a19294250b3..4befab24bc9 100644 --- a/libgo/VERSION +++ b/libgo/VERSION @@ -1 +1 @@ -go1.16rc1 +go1.16 diff --git a/libgo/go/archive/tar/strconv.go b/libgo/go/archive/tar/strconv.go index 6d0a4038082..f0b61e6dba6 100644 --- a/libgo/go/archive/tar/strconv.go +++ b/libgo/go/archive/tar/strconv.go @@ -265,8 +265,27 @@ func parsePAXRecord(s string) (k, v, r string, err error) { return "", "", s, ErrHeader } + afterSpace := int64(sp + 1) + beforeLastNewLine := n - 1 + // In some cases, "length" was perhaps padded/malformed, and + // trying to index past where the space supposedly is goes past + // the end of the actual record. + // For example: + //"30 mtime=1432668921.098285006\n30 ctime=2147483649.15163319" + // ^ ^ + // | | + // | afterSpace=35 + // | + // beforeLastNewLine=29 + // yet indexOf(firstSpace) MUST BE before endOfRecord. + // + // See https://golang.org/issues/40196. + if afterSpace >= beforeLastNewLine { + return "", "", s, ErrHeader + } + // Extract everything between the space and the final newline. - rec, nl, rem := s[sp+1:n-1], s[n-1:n], s[n:] + rec, nl, rem := s[afterSpace:beforeLastNewLine], s[beforeLastNewLine:n], s[n:] if nl != "\n" { return "", "", s, ErrHeader } diff --git a/libgo/go/archive/tar/strconv_test.go b/libgo/go/archive/tar/strconv_test.go index dd3505a758a..add65e272ae 100644 --- a/libgo/go/archive/tar/strconv_test.go +++ b/libgo/go/archive/tar/strconv_test.go @@ -368,6 +368,13 @@ func TestParsePAXRecord(t *testing.T) { {"16 longkeyname=hahaha\n", "16 longkeyname=hahaha\n", "", "", false}, {"3 somelongkey=\n", "3 somelongkey=\n", "", "", false}, {"50 tooshort=\n", "50 tooshort=\n", "", "", false}, + {"30 mtime=1432668921.098285006\n30 ctime=2147483649.15163319", "30 mtime=1432668921.098285006\n30 ctime=2147483649.15163319", "mtime", "1432668921.098285006", false}, + {"06 k=v\n", "06 k=v\n", "", "", false}, + {"6 k=v\n", "6 k=v\n", "", "", false}, + {"06 k=v\n", "06 k=v\n", "", "", false}, + {"00 k=v\n", "00 k=v\n", "", "", false}, + {"0 k=v\n", "0 k=v\n", "", "", false}, + {"+005 x=\n", "+005 x=\n", "", "", false}, } for _, v := range vectors { diff --git a/libgo/go/cmd/go/alldocs.go b/libgo/go/cmd/go/alldocs.go index 49d390297cd..e7c63f0749d 100644 --- a/libgo/go/cmd/go/alldocs.go +++ b/libgo/go/cmd/go/alldocs.go @@ -1808,7 +1808,7 @@ // The directory where the go command will write // temporary source files, packages, and binaries. // GOVCS -// Lists version control commands that may be used with matching servers. +// Lists version control commands that may be used with matching servers. // See 'go help vcs'. // // Environment variables for use with cgo: @@ -2410,6 +2410,17 @@ // // For a detailed reference on modules, see https://golang.org/ref/mod. // +// By default, the go command may download modules from https://proxy.golang.org. +// It may authenticate modules using the checksum database at +// https://sum.golang.org. Both services are operated by the Go team at Google. +// The privacy policies for these services are available at +// https://proxy.golang.org/privacy and https://sum.golang.org/privacy, +// respectively. +// +// The go command's download behavior may be configured using GOPROXY, GOSUMDB, +// GOPRIVATE, and other environment variables. See 'go help environment' +// and https://golang.org/ref/mod#private-module-p
Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
Hi! On Fri, Feb 19, 2021 at 11:06:16AM +0800, Kewen.Lin wrote: > on 2021/2/19 上午2:33, Segher Boessenkool wrote: > > Is there a PR you should mention here? > > I thought this is trivial so didn't file one upstream PR. I did a > searching in upstream bugzilla, PR93453 looks similar but different. > I confirmed that the current patch can't make it pass (just two insns > instead of three insns). > > Do you happen to have one related in mind? If no, I will commit it > without PR. That is fine. I just asked if perhaps you forgot to mention a PR; I do that all the time myself! Filing a new PR if a patch is ready and approved is just bureaucracy, so let's not (it is useful if it needs backporting though, to have a place to track that). Thanks, Segher
Re: [committed] jit: fix ICE on BUILT_IN_TRAP [PR99126]
On Fri, 2021-02-19 at 11:16 +0100, Andrea Corallo wrote: > David Malcolm via Gcc-patches writes: > > > I tried several approaches to fixing this; this seemed the > > least invasive. > > Hi Dave, > > thanks for fixing this. > > I do have a trivial question: are we guaranteed that the middle-end > will > not try to add any build-in other than a trap? No; the middle-end can add various other builtins. The C/C++ FE have c_define_builtins, which calls def_builtin_1 on everything with DEF_BUILTIN, then calls: targetm.init_builtins (); build_common_builtin_nodes (); jit_langhook_init calls build_common_builtin_nodes. The jit builtins_manager creates recording::functions (and their supporting types) for any builtins that are explicitly used, and these recording::functions get turned into trees during playback. I looked at the doing the equivalent of DEF_BUILTIN for any builtins that haven't been created yet, but when to do it? (a) in the builtin manager... but not all builtin-types are supported yet (e.g. BT_FLOAT16), so it would be a big patch to do it that way (b) in jit_langhook_init... but then if the user creates a builtin that wasn't referenced before in a 2nd in-memory compile, I think the types can get out-of-sync. So I went with the patch I did as it seems to be the least invasive way of fixing the specific problem. Dave
Re: [PATCH] PR 99133, Mark xxspltiw, xxspltidp, and xxsplti32x as being prefixed
On Thu, Feb 18, 2021 at 10:44:14PM -0500, Michael Meissner wrote: > On Wed, Feb 17, 2021 at 06:09:39PM -0600, Segher Boessenkool wrote: > > Why test a p10 patch on a p8? > > Well it is just a code gen patch, so I can test it on any system, even an > x86_64 with a cross compiler. Given that right now xxsplatiw is only created > via a built-in, That in itself should be fixed btw (don't use unspecs). > > > + Some prefixed instructions (xxspltiw, xxspltidp, xxsplti32dp, etc.) > > > do not > > > + have a leading 'p'. Setting the prefix attribute to special does not > > > the 'p' > > > + prefix. */ > > > > (grammar) > > > > "special" is the *normal* case. *Most* prefixed insns will be like > > this. They aren't right now, but they will be. > > > > It sounds like you should make a new attribute, "always_prefixed" or > > something, that then the code that sets "prefixed" can use. > > Yes agreed. Or better yet, "maybe_prefixed", which you set when something else will make the decision. Mmostly the default value of the prefixed attribute will then do that. See "maybe_var_shift" / "var_shift" for example. > > > void > > > rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int) > > > { > > > - next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO); > > > + next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO > > > + && get_attr_prefixed (insn) != PREFIXED_SPECIAL); > > >return; > > > } > > > > So you set next_insn_prefixed_p when exactly? The original code was > > correct, as far as I can see? > > rs6000_final_prescan_insn is called in the FINAL_PRESCAN_INSN target hook, and > it is the only place we can set the flag for ASM_OUTPUT_OPCODE to print the > initial 'p'. As you know, during the development of prefixed support, I went > through various different methods of generating the prefixed instruction > (i.e. pld instead of ld). The current method works for normal load, store and > add instructions that have a normal form and a prefixed form. But it doesn't > work for other instructions that we need to start dealing with. Ah, the flag doesn't mean that the next insn is prefixed. It means we want to prefix the mnemonic with a "p", instead. So some name change is in order (as a separate, trivial, patch at the start of the series) -- this helps bisection if needed later, but it also helps reviewing enormously). > > So, for insns like xxspltiw you should just set "prefixed" to "yes", > > because that makes sense, is not confusing. The code that prefixes "p" > > to the mnemonic should change, instead. It can look at some new > > attribute, but it could also just use > > prefixed_load_p (insn) || prefixed_store_p (insn) > > || prefixed_paddi_p (insn) > > or similar (perhaps make a helper function for that then?) > > Yes. Pat Haugen will be taking over the PR. Thanks, Segher
[PATCH] PR fortran/99169 - [9/10/11 Regression] Segfault when passing allocatable scalar into intent(out) dummy argument
Dear all, we should not clobber the pointer in case of an allocatable scalar being an intent(out) dummy argument to a procedure. Regtested on x86_64-pc-linux-gnu. OK for master? Since this is a regression, also for backports to 10/9? Thanks, Harald PR fortran/99169 - Do not clobber allocatable intent(out) dummy argument gcc/fortran/ChangeLog: * trans-expr.c (gfc_conv_procedure_call): Do not add clobber to allocatable intent(out) argument. gcc/testsuite/ChangeLog: * gfortran.dg/intent_optimize_3.f90: New test. diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 103cb31c664..cab58cd1bba 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6077,6 +6079,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, && !fsym->attr.allocatable && !fsym->attr.pointer && !e->symtree->n.sym->attr.dimension && !e->symtree->n.sym->attr.pointer + && !e->symtree->n.sym->attr.allocatable /* See PR 41453. */ && !e->symtree->n.sym->attr.dummy /* FIXME - PR 87395 and PR 41453 */ diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_3.f90 b/gcc/testsuite/gfortran.dg/intent_optimize_3.f90 new file mode 100644 index 000..6ecd722da76 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/intent_optimize_3.f90 @@ -0,0 +1,16 @@ +! { dg-do run } +! { dg-options "-O2" } +! PR99169 - Segfault passing allocatable scalar into intent(out) dummy argument + +program p + implicit none + integer, allocatable :: i + allocate (i) + call set (i) + if (i /= 5) stop 1 +contains + subroutine set (i) +integer, intent(out) :: i +i = 5 + end subroutine set +end program p
Re: [PATCH v7] Practical improvement to libgcc complex divide
On Tue, 2 Feb 2021, Patrick McGehearty via Gcc-patches wrote: > if (mode == TYPE_MODE (double_type_node)) > - ; /* Empty suffix correct. */ > + { > + ; /* Empty suffix correct. */ > + memcpy (modename, "DBL", 4); > + } > else if (mode == TYPE_MODE (float_type_node)) > - suffix[0] = 'f'; > + { > + suffix[0] = 'f'; > + memcpy (modename, "FLT", 4); > + } > else if (mode == TYPE_MODE (long_double_type_node)) > - suffix[0] = 'l'; > + { > + suffix[0] = 'l'; > + memcpy (modename, "LDBL", 4); > + } > else > { > bool found_suffix = false; > @@ -1305,6 +1316,8 @@ c_cpp_builtins (cpp_reader *pfile) > sprintf (suffix, "f%d%s", floatn_nx_types[i].n, >floatn_nx_types[i].extended ? "x" : ""); > found_suffix = true; > + sprintf (modename, "FLT%d%s", floatn_nx_types[i].n, > + floatn_nx_types[i].extended ? "X" : ""); > break; > } > gcc_assert (found_suffix); Here you're properly computing the mapping from mode to float.h macro prefix (though I think "modename" is a confusing name for the variable used for float.h macro prefixes; to me, "modename" suggests the names such as SF or DF, which are actually "name" here, and something like "float_h_prefix" would be better than "modename"). > + if ((mode == TYPE_MODE (float_type_node)) > + || (mode == TYPE_MODE (double_type_node)) > + || (mode == TYPE_MODE (long_double_type_node))) But then here you still have the check for the mode matching one of those three types, which defeats the point of computing the prefix in a way that works for *all* floating-point modes supported in libgcc. (The above "gcc_assert (found_suffix)" asserts that the relevant mapping did get found.) To be able to use the __LIBGCC_* macros in libgcc in all cases, they should be defined, using the modename (or float_h_prefix) found above, whether or not the mode matches one of float, double and long double. > #elif defined(L_multc3) || defined(L_divtc3) > # define MTYPE TFtype > # define CTYPE TCtype > # define MODEtc > # define CEXT__LIBGCC_TF_FUNC_EXT__ > # define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__) > +#if defined(__LIBGCC_TF_MIN__) > +# define RBIG((__LIBGCC_TF_MAX__)/2) > +# define RMIN(__LIBGCC_TF_MIN__) > +# define RMIN2 (__LIBGCC_TF_EPSILON__) > +# define RMINSCAL (1/__LIBGCC_TF_EPSILON__) > +#else > +# define RBIG((__LIBGCC_XF_MAX__)/2) > +# define RMIN(__LIBGCC_XF_MIN__) > +# define RMIN2 (__LIBGCC_XF_EPSILON__) > +# define RMINSCAL (1/__LIBGCC_XF_EPSILON__) > +#endif Once you've removed the unnecessary conditional above, you don't need this conditional on __LIBGCC_TF_MIN__ being defined either; the *TF* macros will always be defined when TFmode is supported in libgcc, so you never need to use *XF* macros instead when building TFmode code. -- Joseph S. Myers jos...@codesourcery.com
c++: Incorrect module-number ordering [PR 98741]
One of the very strong invariants in modules is that module numbers are allocated such that (other than the current TU), all imports have lesser module numbers, and also that the binding vector is only appended to with increasing module numbers. This broke down when module-directives became a thing and the preprocessing became entirely decoupled from parsing. We'd load header units and their macros (but not symbols of course) during preprocessing. Then we'd load named modules during parsing. This could lead to the situation where a header unit appearing after a named import had a lower module number than the import. Consequently, if they both bound the same identifier, the binding vector would be misorderd and bad things happen. This patch restores a pending import queue I previously had, but in simpler form (hurrah). During preprocessing we queue all module-directives and when we meet one for a header unit we do the minimal loading for all of the queue, so they get appropriate numbering. Then we load the preprocessor state for the header unit. PR c++/98741 gcc/cp/ * module.cc (pending_imports): New. (declare_module): Adjust test condition. (name_pending_imports): New. (preprocess_module): Reimplement using pending_imports. (preprocessed_module): Move name-getting to name_pending_imports. * name-lookup.c (append_imported_binding_slot): Assert module ordering is increasing. gcc/testsuite/ * g++.dg/modules/pr98741_a.H: New. * g++.dg/modules/pr98741_b.H: New. * g++.dg/modules/pr98741_c.C: New. * g++.dg/modules/pr98741_d.C: New. -- Nathan Sidwell diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc index 691381bf995..3d17b8ddcdb 100644 --- c/gcc/cp/module.cc +++ w/gcc/cp/module.cc @@ -3873,6 +3873,9 @@ module_state_hash::equal (const value_type existing, /* Mapper name. */ static const char *module_mapper_name; +/* Deferred import queue (FIFO). */ +static vec *pending_imports; + /* CMI repository path and workspace. */ static char *cmi_repo; static size_t cmi_repo_length; @@ -18944,7 +18947,7 @@ declare_module (module_state *module, location_t from_loc, bool exporting_p, gcc_assert (global_namespace == current_scope ()); module_state *current = (*modules)[0]; - if (module_purview_p () || module->loadedness != ML_NONE) + if (module_purview_p () || module->loadedness > ML_CONFIG) { error_at (from_loc, module_purview_p () ? G_("module already declared") @@ -19275,6 +19278,70 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps, } } +/* Process the pending_import queue, making sure we know the + filenames. */ + +static void +name_pending_imports (cpp_reader *reader, bool at_end) +{ + auto *mapper = get_mapper (cpp_main_loc (reader)); + + bool only_headers = (flag_preprocess_only + && !bool (mapper->get_flags () & Cody::Flags::NameOnly) + && !cpp_get_deps (reader)); + if (at_end + && (!vec_safe_length (pending_imports) || only_headers)) +/* Not doing anything. */ +return; + + timevar_start (TV_MODULE_MAPPER); + + dump.push (NULL); + dump () && dump ("Resolving direct import names"); + + mapper->Cork (); + for (unsigned ix = 0; ix != pending_imports->length (); ix++) +{ + module_state *module = (*pending_imports)[ix]; + gcc_checking_assert (module->is_direct ()); + if (!module->filename) + { + Cody::Flags flags + = (flag_preprocess_only ? Cody::Flags::None + : Cody::Flags::NameOnly); + + if (only_headers && !module->is_header ()) + ; + else if (module->module_p + && (module->is_partition () || module->exported_p)) + mapper->ModuleExport (module->get_flatname (), flags); + else + mapper->ModuleImport (module->get_flatname (), flags); + } +} + + auto response = mapper->Uncork (); + auto r_iter = response.begin (); + for (unsigned ix = 0; ix != pending_imports->length (); ix++) +{ + module_state *module = (*pending_imports)[ix]; + gcc_checking_assert (module->is_direct ()); + if (only_headers && !module->is_header ()) + ; + else if (!module->filename) + { + Cody::Packet const &p = *r_iter; + ++r_iter; + + module->set_filename (p); + } +} + + dump.pop (0); + + timevar_stop (TV_MODULE_MAPPER); +} + /* We've just lexed a module-specific control line for MODULE. Mark the module as a direct import, and possibly load up its macro state. Returns the primary module, if this is a module @@ -19322,17 +19389,22 @@ preprocess_module (module_state *module, location_t from_loc, } } + auto desired = ML_CONFIG; if (is_import - && !module->is_module () && module->is_header () - && module->loadedness < ML_PREPROCESSOR + && module->is_header () && (!cpp_get_options (reader)->preprocessed || cpp_get_options (reader)->directives_only)) +/* We need preprocessor sta
Re: [PATCH v7] Practical improvement to libgcc complex divide
Comments inline On 2/19/2021 3:27 PM, Joseph Myers wrote: On Tue, 2 Feb 2021, Patrick McGehearty via Gcc-patches wrote: if (mode == TYPE_MODE (double_type_node)) - ; /* Empty suffix correct. */ + { + ; /* Empty suffix correct. */ + memcpy (modename, "DBL", 4); + } else if (mode == TYPE_MODE (float_type_node)) - suffix[0] = 'f'; + { + suffix[0] = 'f'; + memcpy (modename, "FLT", 4); + } else if (mode == TYPE_MODE (long_double_type_node)) - suffix[0] = 'l'; + { + suffix[0] = 'l'; + memcpy (modename, "LDBL", 4); + } else { bool found_suffix = false; @@ -1305,6 +1316,8 @@ c_cpp_builtins (cpp_reader *pfile) sprintf (suffix, "f%d%s", floatn_nx_types[i].n, floatn_nx_types[i].extended ? "x" : ""); found_suffix = true; + sprintf (modename, "FLT%d%s", floatn_nx_types[i].n, +floatn_nx_types[i].extended ? "X" : ""); break; } gcc_assert (found_suffix); Here you're properly computing the mapping from mode to float.h macro prefix (though I think "modename" is a confusing name for the variable used for float.h macro prefixes; to me, "modename" suggests the names such as SF or DF, which are actually "name" here, and something like "float_h_prefix" would be better than "modename"). float_h_prefix puts me in mind of half-precision floating point rather than float.h. How would modeprefix do instead of modename? + if ((mode == TYPE_MODE (float_type_node)) + || (mode == TYPE_MODE (double_type_node)) + || (mode == TYPE_MODE (long_double_type_node))) But then here you still have the check for the mode matching one of those three types, which defeats the point of computing the prefix in a way that works for *all* floating-point modes supported in libgcc. (The above "gcc_assert (found_suffix)" asserts that the relevant mapping did get found.) To be able to use the __LIBGCC_* macros in libgcc in all cases, they should be defined, using the modename (or float_h_prefix) found above, whether or not the mode matches one of float, double and long double. My apologies. You are completely correct. I intended to remove the "mode == TYPE_MODE" code when making the other changes but overlooked it. #elif defined(L_multc3) || defined(L_divtc3) # define MTYPETFtype # define CTYPETCtype # define MODE tc # define CEXT __LIBGCC_TF_FUNC_EXT__ # define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__) +#if defined(__LIBGCC_TF_MIN__) +# define RBIG ((__LIBGCC_TF_MAX__)/2) +# define RMIN (__LIBGCC_TF_MIN__) +# define RMIN2 (__LIBGCC_TF_EPSILON__) +# define RMINSCAL (1/__LIBGCC_TF_EPSILON__) +#else +# define RBIG ((__LIBGCC_XF_MAX__)/2) +# define RMIN (__LIBGCC_XF_MIN__) +# define RMIN2 (__LIBGCC_XF_EPSILON__) +# define RMINSCAL (1/__LIBGCC_XF_EPSILON__) +#endif Once you've removed the unnecessary conditional above, you don't need this conditional on __LIBGCC_TF_MIN__ being defined either; the *TF* macros will always be defined when TFmode is supported in libgcc, so you never need to use *XF* macros instead when building TFmode code. And as you note, the test for LIBGCC_TF_MIN being defined is not needed either. I'll resubmit the patch after I rebuild and complete testings on the revised code. - patrick
Re: [PATCH v7] Practical improvement to libgcc complex divide
On Fri, 19 Feb 2021, Patrick McGehearty via Gcc-patches wrote: > > Here you're properly computing the mapping from mode to float.h macro > > prefix (though I think "modename" is a confusing name for the variable > > used for float.h macro prefixes; to me, "modename" suggests the names such > > as SF or DF, which are actually "name" here, and something like > > "float_h_prefix" would be better than "modename"). > > float_h_prefix puts me in mind of half-precision floating point rather than > float.h. > How would modeprefix do instead of modename? I'm not sure modeprefix is better; you could say SF is the prefix of SFmode, for example. Hence my suggestion of a name that makes clear it's the prefix *in float.h*. typeprefix might be a possibility, as it's really a prefix associated with a corresponding type rather than directly with the mode. -- Joseph S. Myers jos...@codesourcery.com
[PATCH v3 1/2] MIPS: Not trigger error for pre-R6 and -mcompact-branches=always
For MIPSr6, we may wish to use compact-branches only. Currently, we have to use `always' option, while it is mark as conflict with pre-R6. cc1: error: unsupported combination: ‘mips32r2’ -mcompact-branches=always Just ignore -mcompact-branches=always for pre-R6. This patch also defines __mips_compact_branches_never __mips_compact_branches_always __mips_compact_branches_optimal predefined macros gcc/ChangeLog: * config/mips/mips.c (mips_option_override): * config/mips/mips.h (TARGET_RTP_PIC): not trigger error for compact-branches=always for pre-R6. (TARGET_CB_NEVER): Likewise. (TARGET_CB_ALWAYS): Likewise. (struct mips_cpu_info): define macros for compact branch policy. * doc/invoke.texi: Document "always" with pre-R6. gcc/testsuite/ChangeLog: * gcc.target/mips/compact-branches-1.c: add isa_rev>=6. * gcc.target/mips/mips.exp: don't add -mipsXXr6 option for -mcompact-branches=always. It is usable for pre-R6 now. * gcc.target/mips/compact-branches-8.c: New test. * gcc.target/mips/compact-branches-9.c: New test. --- gcc/ChangeLog | 10 + gcc/config/mips/mips.c| 8 +-- gcc/config/mips/mips.h| 22 --- gcc/doc/invoke.texi | 1 + gcc/testsuite/ChangeLog | 8 +++ .../gcc.target/mips/compact-branches-1.c | 2 +- .../gcc.target/mips/compact-branches-8.c | 10 + .../gcc.target/mips/compact-branches-9.c | 10 + gcc/testsuite/gcc.target/mips/mips.exp| 4 +--- 9 files changed, 56 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-8.c create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-9.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index eb1a44ae676..caa3fda48ce 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,13 @@ +2021-02-20 YunQiang Su + + * config/mips/mips.c (mips_option_override): + * config/mips/mips.h (TARGET_RTP_PIC): not trigger error for + compact-branches=always for pre-R6. + (TARGET_CB_NEVER): Likewise. + (TARGET_CB_ALWAYS): Likewise. + (struct mips_cpu_info): define macros for compact branch policy. + * doc/invoke.texi: Document "always" with pre-R6. + 2021-02-19 Martin Sebor PR c/97172 diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 8bd2d29552e..9a75dd61031 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -20107,13 +20107,7 @@ mips_option_override (void) target_flags |= MASK_ODD_SPREG; } - if (!ISA_HAS_COMPACT_BRANCHES && mips_cb == MIPS_CB_ALWAYS) -{ - error ("unsupported combination: %qs%s %s", - mips_arch_info->name, TARGET_MICROMIPS ? " -mmicromips" : "", - "-mcompact-branches=always"); -} - else if (!ISA_HAS_DELAY_SLOTS && mips_cb == MIPS_CB_NEVER) + if (!ISA_HAS_DELAY_SLOTS && mips_cb == MIPS_CB_NEVER) { error ("unsupported combination: %qs%s %s", mips_arch_info->name, TARGET_MICROMIPS ? " -mmicromips" : "", diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index b4a60a55d80..b8399fe1b0d 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -103,11 +103,9 @@ struct mips_cpu_info { #define TARGET_RTP_PIC (TARGET_VXWORKS_RTP && flag_pic) /* Compact branches must not be used if the user either selects the - 'never' policy or the 'optimal' policy on a core that lacks + 'never' policy or the 'optimal' / 'always' policy on a core that lacks compact branch instructions. */ -#define TARGET_CB_NEVER (mips_cb == MIPS_CB_NEVER \ -|| (mips_cb == MIPS_CB_OPTIMAL \ -&& !ISA_HAS_COMPACT_BRANCHES)) +#define TARGET_CB_NEVER (mips_cb == MIPS_CB_NEVER || !ISA_HAS_COMPACT_BRANCHES) /* Compact branches may be used if the user either selects the 'always' policy or the 'optimal' policy on a core that supports @@ -117,10 +115,11 @@ struct mips_cpu_info { && ISA_HAS_COMPACT_BRANCHES)) /* Compact branches must always be generated if the user selects - the 'always' policy or the 'optimal' policy om a core that - lacks delay slot branch instructions. */ -#define TARGET_CB_ALWAYS (mips_cb == MIPS_CB_ALWAYS\ -|| (mips_cb == MIPS_CB_OPTIMAL \ + the 'always' policy on a core support compact branches, + or the 'optimal' policy on a core that lacks delay slot branch instructions. */ +#define TARGET_CB_ALWAYS ((mips_cb == MIPS_CB_ALWAYS \ +&& ISA_HAS_COMPACT_BRANCHES) \ +|| (mips_cb == MIPS_CB_OPTIMAL \ && !ISA_HAS_DELAY_SLOTS)) /* Special handling for JRC that exists in microMIPSR3 as well as R6 @@ -655,6 +654,13 @@ st
[PATCH v3 2/2] MIPS: add builtime option for -mcompact-branches
For R6+ target, it allows to configure gcc to use compact branches only. gcc/ChangeLog: * config.gcc: add -with-compact-branches=policy build option. * doc/install.texi: Likewise. --- gcc/ChangeLog| 5 + gcc/config.gcc | 12 +++- gcc/doc/install.texi | 19 +++ 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index caa3fda48ce..c5fae50e782 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2021-02-20 YunQiang Su + + * config.gcc: add -with-compact-branches=policy build option. + * doc/install.texi: Likewise. + 2021-02-20 YunQiang Su * config/mips/mips.c (mips_option_override): diff --git a/gcc/config.gcc b/gcc/config.gcc index 17fea83b2e4..047f5631067 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4743,7 +4743,7 @@ case "${target}" in ;; mips*-*-*) - supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 madd4" + supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 madd4 compact-branches" case ${with_float} in "" | soft | hard) @@ -4896,6 +4896,16 @@ case "${target}" in exit 1 ;; esac + + case ${with_compact_branches} in + never | always | optimal) + with_compact_branches=${with_compact_branches} + ;; + *) + echo "Unknown compact-branches policy used in --with-compact-branches" 1>&2 + exit 1 + ;; + esac ;; nds32*-*-*) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 4c38244ae58..865630826c6 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1464,6 +1464,25 @@ systems that support conditional traps). Division by zero checks use the break instruction. @end table +@item --with-compact-branches=@var{policy} +Specify how the compiler should generate code for checking for +division by zero. This option is only supported on the MIPS target. +The possibilities for @var{type} are: +@table @code +@item optimal +Cause a delay slot branch to be used if one is available in the +current ISA and the delay slot is successfully filled. If the delay slot +is not filled, a compact branch will be chosen if one is available. +@item never +Ensures that compact branch instructions will never be generated. +@item always +Ensures that a compact branch instruction will be generated if available. +If a compact branch instruction is not available, +a delay slot form of the branch will be used instead. +This option is supported from MIPS Release 6 onwards. +For pre-R6, this option is just same as never/optimal. +@end table + @c If you make --with-llsc the default for additional targets, @c update the --with-llsc description in the MIPS section below. -- 2.20.1