[r15-4709 Regression] FAIL: 23_containers/vector/cons/from_range.cc -std=gnu++26 (test for excess errors) on Linux/x86_64
On Linux/x86_64, b281e13ecad12d07209924a7282c53be3a1c3774 is the first bad commit:

commit b281e13ecad12d07209924a7282c53be3a1c3774
Author: Jonathan Wakely
Date:   Tue Oct 8 21:15:18 2024 +0100

    libstdc++: Add P1206R7 from_range members to std::vector [PR111055]

caused

FAIL: 23_containers/vector/cons/from_range.cc  -std=gnu++23 (test for excess errors)
FAIL: 23_containers/vector/cons/from_range.cc  -std=gnu++26 (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-4709/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.)
(If you hit problems related to cascadelake, disabling AVX512F on the command line might help.)
(However, please make sure that there are no potential problems with AVX512.)
[PATCH] Match: Fold pow calls to ldexp when possible [PR57492]
This patch transforms the following POW calls to equivalent LDEXP calls, as discussed in PR57492:

powi (2.0, i) -> ldexp (1.0, i)
a * powi (2.0, i) -> ldexp (a, i)
2.0 * powi (2.0, i) -> ldexp (1.0, i + 1)
pow (powof2, i) -> ldexp (1.0, i * log2 (powof2))
powof2 * pow (2, i) -> ldexp (1.0, i + log2 (powof2))

This is especially helpful for SVE architectures, as LDEXP calls can be implemented using the FSCALE instruction, as seen in the following patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/664160.html

SPEC2017 was run with this patch; while there are no noticeable improvements, there are no non-noise regressions either.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

OK for mainline?

Signed-off-by: Soumya AR

gcc/ChangeLog:

	PR target/57492
	* match.pd: Add patterns to fold certain calls to pow to ldexp.

gcc/testsuite/ChangeLog:

	PR target/57492
	* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.

0001-Match-Fold-pow-calls-to-ldexp-when-possible-PR57492.patch
Description: 0001-Match-Fold-pow-calls-to-ldexp-when-possible-PR57492.patch
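For readers unfamiliar with ldexp, a minimal scalar C sketch of the first two equivalences (my illustration, not the match.pd patterns from the attached patch):

    #include <math.h>

    /* powi (2.0, i) == ldexp (1.0, i): ldexp computes 1.0 * 2**i
       directly by adjusting the exponent field.  */
    double f1 (int i) { return ldexp (1.0, i); }

    /* a * powi (2.0, i) == ldexp (a, i): multiplying by a power of
       two is just an exponent adjustment of a.  */
    double f2 (double a, int i) { return ldexp (a, i); }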
[PATCH v2] [aarch64] Fix function multiversioning dispatcher link error with LTO
We forgot to apply DECL_EXTERNAL to the __init_cpu_features_resolver decl.  When building with LTO, the linker cannot find the __init_cpu_features_resolver.lto_priv* symbol, causing the link error.  This patch fixes that by adding DECL_EXTERNAL to the decl.  To avoid a "used but never defined" warning for this symbol, we also set TREE_PUBLIC on the decl.

Minimal steps to reproduce the bug:

echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
echo 'void func1();void func2();int main(){func1();func2();return 0;}' > main.c
gcc -flto -c 1.c 2.c
gcc -flto main.c 1.o 2.o

Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (dispatch_function_versions): Add
	DECL_EXTERNAL and TREE_PUBLIC to the __init_cpu_features_resolver
	decl.
---
 gcc/config/aarch64/aarch64.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5770491b30c..37123befeaf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20437,6 +20437,8 @@ dispatch_function_versions (tree dispatch_decl,
   tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
   tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
 				  init_fn_id, init_fn_type);
+  DECL_EXTERNAL (init_fn_decl) = 1;
+  TREE_PUBLIC (init_fn_decl) = 1;
   tree arg1 = DECL_ARGUMENTS (dispatch_decl);
   tree arg2 = TREE_CHAIN (arg1);
   ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
-- 
2.47.0
[PATCH v2] Fix MV clones can not redirect to specific target on some targets
Following the implementation of commit b8ce8129a5 ("Redirect call within specific target attribute among MV clones (PR ipa/82625)"), we can now optimize calls by invoking a versioned function callee from a caller that shares the same target attribute.  However, on targets that define TARGET_HAS_FMV_TARGET_ATTRIBUTE to zero, meaning they use the "target_version" attribute instead of "target", this optimization is not feasible.  Currently, the only target affected by this limitation is AArch64.

This commit resolves the issue by not directly using "target" with lookup_attribute.  Instead, it checks the TARGET_HAS_FMV_TARGET_ATTRIBUTE macro to decide between using the "target" or "target_version" attribute.

Fixes: 79891c4cb5 ("Add support for target_version attribute")

gcc/ChangeLog:

	* multiple_target.cc (redirect_to_specific_clone): Fix redirection
	not working on targets without TARGET_HAS_FMV_TARGET_ATTRIBUTE.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/mvc-redirect.C: New test.
---
 gcc/multiple_target.cc                        |  8 +++++---
 .../g++.target/aarch64/mvc-redirect.C         | 25 +++++++++++++++++++
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mvc-redirect.C

diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index d2c9671fc1b..a1c18f4a3a7 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -446,8 +446,10 @@ redirect_to_specific_clone (cgraph_node *node)
   cgraph_function_version_info *fv = node->function_version ();
   if (fv == NULL)
     return;
+  const char *fmv_attr = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
+			  ? "target" : "target_version");
 
-  tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES (node->decl));
+  tree attr_target = lookup_attribute (fmv_attr, DECL_ATTRIBUTES (node->decl));
   if (attr_target == NULL_TREE)
     return;
 
@@ -458,7 +460,7 @@ redirect_to_specific_clone (cgraph_node *node)
       if (!fv2)
 	continue;
 
-      tree attr_target2 = lookup_attribute ("target",
+      tree attr_target2 = lookup_attribute (fmv_attr,
 					    DECL_ATTRIBUTES (e->callee->decl));
 
       /* Function is not calling proper target clone.  */
@@ -472,7 +474,7 @@ redirect_to_specific_clone (cgraph_node *node)
       for (; fv2 != NULL; fv2 = fv2->next)
 	{
 	  cgraph_node *callee = fv2->this_node;
-	  attr_target2 = lookup_attribute ("target",
+	  attr_target2 = lookup_attribute (fmv_attr,
 					   DECL_ATTRIBUTES (callee->decl));
 	  if (attr_target2 != NULL_TREE
 	      && attribute_value_equal (attr_target, attr_target2))
diff --git a/gcc/testsuite/g++.target/aarch64/mvc-redirect.C b/gcc/testsuite/g++.target/aarch64/mvc-redirect.C
new file mode 100644
index 000..f29cc3745a3
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mvc-redirect.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_clones("default", "dotprod", "sve+sve2")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_clones("default", "dotprod", "sve+sve2")))
+int bar()
+{
+  return foo ();
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z3foov.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z3foov._Mdotprod\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z3foov._MsveMsve2\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, %gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */
-- 
2.47.0
Re: [PATCH] Add COBOL to gcc (was: Add 'cobol' to Makefile.def)
On Wed, 23 Oct 2024 15:12:19 +0200
Richard Biener wrote:

> The rest of the changes look OK to me.

Below is a revised patch incorporating recent feedback.

Changes:
* remove blank lines at EOF
* add gcc/cobol/lang.opt.urls
* simplify gcc/cobol/config-lang.in (and the FE requires C++)
* add stub gcc/cobol/ChangeLog
* group ChangeLog entries by directory
* support --enable-generated-files-in-srcdir
* remove reference to --fdump-generic-nodes option

> This would say
>
>	* configure: Regenerated.

done.

The previous patch reported "9 files" but contained only 8.  We added 2, so the total is now 10.

As before, this patch comprises all the "meta files" needed for the Cobol front end, including every existing file that we modified.
1. It does not interfere with --languages=c,c++, etc.
2. It does not work with --languages=cobol because the source files are missing.

I have not tested with git-gcc-verify because I don't know how to use it.  It does apply cleanly with "git am" (on my end, at least).

--jkl

[snip]

From be8c3d34ad7f8a92f4e1679dbbe411b4bcb04d0fbld.patch 4 Oct 2024 12:01:22 -0400
From: "James K. Lowden"
Date: Sat 26 Oct 2024 06:41:52 PM EDT
Subject: [PATCH] Add 'cobol' to 10 files

ChangeLog
	* Makefile.def: Add libgcobol module and cobol language.
	* configure: Regenerated.
	* configure.ac: Add libgcobol module and cobol language.
gcc/ChangeLog
	* gcc/common.opt: Add libgcobol module and cobol language.
gcc/cobol/ChangeLog
	* gcc/cobol/ChangeLog: Add gcc/cobol/ChangeLog
	* gcc/cobol/LICENSE: Add gcc/cobol/LICENSE
	* gcc/cobol/Make-lang.in: Add gcc/cobol/Make-lang.in
	* gcc/cobol/config-lang.in: Add gcc/cobol/config-lang.in
	* gcc/cobol/lang.opt: Add gcc/cobol/lang.opt
	* gcc/cobol/lang.opt.urls: Add gcc/cobol/lang.opt.urls
---
 Makefile.def             | ++-
 configure                | +-
 configure.ac             | +-
 gcc/cobol/ChangeLog      | ++-
 gcc/cobol/LICENSE        | +-
 gcc/cobol/Make-lang.in   | -
 gcc/cobol/config-lang.in | +++-
 gcc/cobol/lang.opt       | -
 gcc/cobol/lang.opt.urls  | +-
 gcc/common.opt           |
 10 files changed, 479 insertions(+), 10 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..1192e852c7a 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
 target_modules = { module= libitm; lib_path=.libs; };
 target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
 target_modules = { module= libgrust; };
+target_modules = { module= libgcobol; };
 
 // These are (some of) the make targets to be done in each subdirectory.
 // Not all; these are the ones which don't have special options.
@@ -324,6 +325,7 @@ flags_to_pass = { flag= CXXFLAGS_FOR_TARGET ; };
 flags_to_pass = { flag= DLLTOOL_FOR_TARGET ; };
 flags_to_pass = { flag= DSYMUTIL_FOR_TARGET ; };
 flags_to_pass = { flag= FLAGS_FOR_TARGET ; };
+flags_to_pass = { flag= GCOBOL_FOR_TARGET ; };
 flags_to_pass = { flag= GFORTRAN_FOR_TARGET ; };
 flags_to_pass = { flag= GOC_FOR_TARGET ; };
 flags_to_pass = { flag= GOCFLAGS_FOR_TARGET ; };
@@ -655,6 +657,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; no_c=true; };
 // built newlib on some targets (e.g. Cygwin).  It still needs
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
+lang_env_dependencies = { module=libgcobol; cxx=true; };
 
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
@@ -690,6 +693,7 @@ dependencies = { module=install-target-libvtv; on=install-target-libgcc; };
 dependencies = { module=install-target-libitm; on=install-target-libgcc; };
 dependencies = { module=install-target-libobjc; on=install-target-libgcc; };
 dependencies = { module=install-target-libstdc++-v3; on=install-target-libgcc; };
+dependencies = { module=install-target-libgcobol; on=install-target-libstdc++-v3; };
 
 // Target modules in the 'src' repository.
 lang_env_dependencies = { module=libtermcap; };
@@ -727,6 +731,8 @@ languages = { language=d; gcc-check-target=check-d;
 	lib-check-target=check-target-libphobos; };
 languages = { language=jit;	gcc-check-target=check-jit; };
 languages = { language=rust;	gcc-check-target=check-rust; };
+languages = { language=cobol;	gcc-check-target=check-cobol;
+	lib-check-target=check-target-libg
[PATCH 1/6] PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATE
Hi all,

simplify-rtx can transform (X << C1) | (X >> C2) into ROTATE (X, C1) when C1 + C2 == mode-width.  But the transformation is also valid for PLUS and XOR; indeed GIMPLE can also do the fold.  Let's teach RTL to do it too.

The motivating testcase for this is in AArch64 intrinsics:

uint64x2_t G2 (uint64x2_t a, uint64x2_t b)
{
  uint64x2_t c = veorq_u64 (a, b);
  return veorq_u64 (vaddq_u64 (c, c), vshrq_n_u64 (c, 63));
}

which I was hoping to fold to a single XAR (a ROTATE+XOR instruction) but GCC was failing to detect the rotate operation for two reasons:
1) The combination of the two arms of the expression is done under XOR rather than the IOR that simplify-rtx currently supports.
2) The ASHIFT operation is actually a (PLUS X X) operation and thus is not detected as the LHS of the two arms we require.

The patch fixes both issues.  The analysis of the two arms of the rotation expression is factored out into a common helper simplify_rotate_op, which is then used in the PLUS, XOR, IOR cases in simplify_binary_operation_1.

The check-assembly testcase for this is added in the following patch because it needs some extra AArch64 backend work, but I've added self-tests in this patch to validate the transformation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov

	PR target/117048
	* simplify-rtx.cc (extract_ashift_operands_p): Define.
	(simplify_rotate_op): Likewise.
	(simplify_context::simplify_binary_operation_1): Use the above in
	the PLUS, IOR, XOR cases.
	(test_vector_rotate): Define.
	(test_vector_ops): Use the above.

v3-0001-PR-117048-simplify-rtx-Simplify-X-C1-X-C2-into-ROTAT.patch
Description: v3-0001-PR-117048-simplify-rtx-Simplify-X-C1-X-C2-into-ROTAT.patch
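A scalar picture of the fold (my illustration; the patch itself works on RTL and carries self-tests): with a 64-bit mode, all three idioms below compute a left rotate by one, because the two shifted halves occupy disjoint bits, so IOR, XOR, and PLUS combine them identically.

    #include <stdint.h>

    uint64_t rot_ior  (uint64_t x) { return (x << 1) | (x >> 63); }
    uint64_t rot_xor  (uint64_t x) { return (x << 1) ^ (x >> 63); }
    /* The shift can itself appear as x + x, as in the G2 testcase.  */
    uint64_t rot_plus (uint64_t x) { return (x + x) + (x >> 63); }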
[PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modes
Hi all,

The MD pattern for the XAR instruction in SVE2 is currently expressed with non-canonical RTL by using a ROTATERT code with a constant rotate amount.  Fix it by using the left ROTATE code.  This necessitates adjusting the rotate amount during expand.

Additionally, as the SVE2 XAR instruction is unpredicated and can handle all element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE operation for Advanced SIMD modes where TARGET_SHA3 cannot be used (that can only handle V2DImode operands).  Therefore let's extend the accepted modes of the SVE2 pattern to include the Advanced SIMD integer modes.

This leads to some tests for the svxar* intrinsics failing because they now simplify to a plain EOR when the rotate amount is the width of the element.  This simplification is desirable (EOR instructions have better or equal throughput than XAR, and they are non-destructive of their input), so the tests are adjusted.

For V2DImode XAR operations we should prefer the Advanced SIMD version when it is available (TARGET_SHA3) because it is non-destructive, so restrict the SVE2 pattern accordingly.  Tests are added to confirm this.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov

gcc/
	* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Use
	SVE_ASIMD_FULL_I modes.  Use ROTATE code for the rotate step.
	Adjust output logic.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svxar_impl): Define.
	(svxar): Use the above.

gcc/testsuite/
	* gcc.target/aarch64/xar_neon_modes.c: New test.
	* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather
	than XAR.
	* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.

v3-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch
Description: v3-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch
[PATCH 3/6] PR 117048: aarch64: Add define_insn_and_split for vector ROTATE
The ultimate goal in this PR is to match the XAR pattern that is represented as a (ROTATE (XOR X Y) VCST) from the ACLE intrinsics code in the testcase.

The first blocker for this was the missing recognition of ROTATE in simplify-rtx, which is fixed in the previous patch.

The next problem is that once the ROTATE has been matched from the shifts and orr/xor/plus, combine will try to match it in an insn before trying to combine the XOR into it.  But as we don't have a backend pattern for a vector ROTATE this recog fails and combine does not try the followup XOR+ROTATE combination which would have succeeded.

This patch solves that by introducing a sort of "scaffolding" pattern for vector ROTATE, which allows it to be combined into the XAR.  If it fails to be combined into anything the splitter will break it back down into the SHL+USRA sequence that it would have emitted.

By having this splitter we can special-case some rotate amounts in the future to emit more specialised instructions, e.g. from the REV* family.  This can be done if the ROTATE is not combined into something else.  This optimisation is done in the next patch in the series.

Bootstrapped and tested on aarch64-none-linux-gnu.
I'll push this if the prerequisites are approved.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov

gcc/
	PR target/117048
	* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
	New define_insn_and_split.

gcc/testsuite/
	PR target/117048
	* gcc.target/aarch64/simd/pr117048.c: New test.

v3-0003-PR-117048-aarch64-Add-define_insn_and_split-for-vect.patch
Description: v3-0003-PR-117048-aarch64-Add-define_insn_and_split-for-vect.patch
[PATCH 4/6] expmed, aarch64: Optimize vector rotates as vector permutes where possible
Hi all,

Some vector rotate operations can be implemented in a single instruction rather than using the fallback SHL+USRA sequence.  In particular, when the rotate amount is half the bitwidth of the element we can use a REV64, REV32 or REV16 instruction.  More generally, rotates by a byte amount can be implemented using vector permutes.

This patch adds such a generic routine in expmed.cc called expand_rotate_as_vec_perm that calculates the required permute indices and uses the expand_vec_perm_const interface.  On aarch64 this ends up generating the single-instruction sequences above where possible, and can use LDR+TBL sequences too, which are a good choice.

With help from Richard, the routine should be VLA-safe.  However, the only use of expand_rotate_as_vec_perm introduced in this patch is in aarch64-specific code that for now only handles fixed-width modes.

A runtime aarch64 test is added to ensure the permute indices are not messed up.

Bootstrapped and tested on aarch64-none-linux-gnu.  Richard had approved these changes in the previous iteration, but I'll only push this after the prerequisites in the series.

Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov

gcc/
	* expmed.h (expand_rotate_as_vec_perm): Declare.
	* expmed.cc (expand_rotate_as_vec_perm): Define.
	* config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
	Declare prototype.
	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
	Call the above.

gcc/testsuite/
	* gcc.target/aarch64/vec-rot-exec.c: New test.
	* gcc.target/aarch64/simd/pr117048_2.c: New test.

v3-0004-aarch64-Optimize-vector-rotates-as-vector-permutes-w.patch
Description: v3-0004-aarch64-Optimize-vector-rotates-as-vector-permutes-w.patch
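The index calculation itself is easy to picture (a sketch under little-endian assumptions, not the actual expmed.cc routine): output byte j of an element left-rotated by ROT_BYTES bytes comes from input byte (j + ELT_BYTES - ROT_BYTES) % ELT_BYTES of the same element.

    /* Byte-permute selector for rotating each element of a vector
       left by a whole number of bytes (little-endian).  For 4-byte
       elements rotated by 2 bytes this yields 2,3,0,1, 6,7,4,5, ...  */
    static void
    rotate_perm_indices (unsigned vec_bytes, unsigned elt_bytes,
                         unsigned rot_bytes, unsigned char *sel)
    {
      for (unsigned i = 0; i < vec_bytes; i++)
        {
          unsigned elt = i / elt_bytes;   /* Which element.  */
          unsigned byte = i % elt_bytes;  /* Byte within the element.  */
          sel[i] = elt * elt_bytes
                   + (byte + elt_bytes - rot_bytes) % elt_bytes;
        }
    }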
[PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)
Hi all,

With the recent patch to improve detection of vector rotates at RTL level, combine now tries matching a V8HImode rotate by 8 in the example in the testcase.  We can teach AArch64 to emit a REV16 instruction for such a rotate, but really this operation corresponds to the RTL code BSWAP, for which we already have the right patterns.  BSWAP is arguably a simpler representation than ROTATE here because it has only one operand, so let's teach simplify-rtx to generate it.

With this patch the testcase now generates the simplest form:

.L2:
        ldr     q31, [x1, x0]
        rev16   v31.16b, v31.16b
        str     q31, [x0, x2]
        add     x0, x0, 16
        cmp     x0, 2048
        bne     .L2

instead of the previous:

.L2:
        ldr     q31, [x1, x0]
        shl     v30.8h, v31.8h, 8
        usra    v30.8h, v31.8h, 8
        str     q30, [x0, x2]
        add     x0, x0, 16
        cmp     x0, 2048
        bne     .L2

IMO ideally the bswap detection would have been done at vectorisation time, using the expanders for that, but teaching simplify-rtx to do this transformation is fairly straightforward and, unlike at tree level, we have the native RTL BSWAP code.

This change is not enough to generate the equivalent sequence in SVE, but that is something that should be tackled separately.

Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov

gcc/
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
	Simplify (rotate:HI x:HI, 8) -> (bswap:HI x:HI).

gcc/testsuite/
	* gcc.target/aarch64/rot_to_bswap.c: New test.

v3-0006-simplify-rtx-Simplify-ROTATE-HI-X-HI-8-into-BSWAP-HI.patch
Description: v3-0006-simplify-rtx-Simplify-ROTATE-HI-X-HI-8-into-BSWAP-HI.patch
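A quick scalar check of the equivalence (my example, not from the patch): for a 16-bit value, rotating by 8 in either direction swaps the two bytes, which is exactly a byte swap.

    #include <stdint.h>

    /* rot8 (0x1234) == bswap (0x1234) == 0x3412.  */
    uint16_t rot8  (uint16_t x) { return (uint16_t) ((x << 8) | (x >> 8)); }
    uint16_t bswap (uint16_t x) { return __builtin_bswap16 (x); }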
[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible
Hi all,

We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long as we zero out one of the input registers for it.  This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away from loops and hot parts.  This should be safe to do for 64-bit vectors as well, even though the XAR instructions operate on 128-bit values, as the bottom 64-bit result is later accessed through the right subregs.

This strategy is used whenever we have XAR instructions; the logic in aarch64_emit_opt_vec_rotate is adjusted to resort to expand_rotate_as_vec_perm only when it's expected to generate a single REV* instruction or when XAR instructions are not present.

With this patch we can generate for the input:

v4si G1 (v4si r)
{
    return (r >> 23) | (r << 9);
}

v8qi G2 (v8qi r)
{
    return (r << 3) | (r >> 5);
}

the assembly for +sve2:

G1:
        movi    v31.4s, 0
        xar     z0.s, z0.s, z31.s, #23
        ret

G2:
        movi    v31.4s, 0
        xar     z0.b, z0.b, z31.b, #5
        ret

instead of the current:

G1:
        shl     v31.4s, v0.4s, 9
        usra    v31.4s, v0.4s, 23
        mov     v0.16b, v31.16b
        ret

G2:
        shl     v31.8b, v0.8b, 3
        usra    v31.8b, v0.8b, 5
        mov     v0.8b, v31.8b
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov

gcc/
	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
	generation of XAR sequences when possible.

gcc/testsuite/
	* gcc.target/aarch64/rotate_xar_1.c: New test.

v3-0005-aarch64-Emit-XAR-for-vector-rotates-where-possible.patch
Description: v3-0005-aarch64-Emit-XAR-for-vector-rotates-where-possible.patch
Re: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions where possible
> On 25 Oct 2024, at 15:25, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>>> On 25 Oct 2024, at 13:46, Richard Sandiford wrote:
>>>
>>> Kyrylo Tkachov writes:
>>>> Thank you for the suggestions! I’m trying them out now.
>> +  if (rotamnt % BITS_PER_UNIT != 0)
>> +    return NULL_RTX;
>> +  machine_mode qimode;
>> +  if (!qimode_for_vec_perm (mode).exists (&qimode))
>> +    return NULL_RTX;
>> +
>> +  vec_perm_builder builder;
>> +  unsigned nunits = GET_MODE_SIZE (GET_MODE_INNER (mode));
>
> simpler as GET_MODE_UNIT_SIZE
>
>> +  unsigned total_units;
>> +  /* TODO: Handle VLA vector rotates?  */
>> +  if (!GET_MODE_SIZE (mode).is_constant (&total_units))
>> +    return NULL_RTX;
>
> Yeah.  I think we can do that by changing:
>
>> +  builder.new_vector (total_units, 1, total_units);
>
> to:
>
> builder.new_vector (total_units, 3, units);

I think units here is the size in units of the fixed-width component of the mode?  So e.g. 16 for V4SI and VNx4SI but 8 for V4HI and VN4HI?

>>> Ah, no, sorry, I meant "nunits" rather than "units", with "nunits"
>>> being the same as for your code.  So for V4SI and VNx4SI we'd push
>>> 12 elements total, as 4 (nunits) "patterns" of 3 elements each.
>>> The first argument (total_units) is just GET_MODE_SIZE (mode)
>>> in all its poly_int glory.
>>
>> Hmm, I’m afraid I’m lost again.  For V4SI we have a vector of 16 bytes,
>> how can 12 indices be enough to describe the permute?
>> With this scheme we do end up pushing 12 elements, in the order:
>> 2,3,0,1,6,7,4,5,10,11,8,9 .
>> In the final RTX emitted in the instruction stream this seems to end up as:
>>    (const_vector:V16QI [
>>        (const_int 2 [0x2])
>>        (const_int 3 [0x3])
>>        (const_int 0 [0])
>>        (const_int 1 [0x1])
>>        (const_int 6 [0x6])
>>        (const_int 7 [0x7])
>>        (const_int 4 [0x4])
>>        (const_int 5 [0x5])
>>        (const_int 10 [0xa])
>>        (const_int 11 [0xb])
>>        (const_int 8 [0x8])
>>        (const_int 9 [0x9]) repeated x2
>>        (const_int 14 [0xe])
>>        (const_int 7 [0x7])
>>        (const_int 0 [0])
>>    ])
>>
>> So the first 12 elements are indeed correct, but the last 4 elements are not.
>
> Gah, sorry, I got the arguments the wrong way around.  It should be:
>
> builder.new_vector (GET_MODE_SIZE (mode), nunits, 3);
>
> (4 patterns, 3 elements per pattern)

Thanks! That works.  I’ve resubmitted a fixed patch with
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/08.html
(along with other updates in the series)
Kyrill

> Thanks,
> Richard
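To make the encoding concrete for readers following along: with builder.new_vector (GET_MODE_SIZE (mode), nunits, 3), the 12 pushed indices describe 4 interleaved arithmetic series of 3 explicit elements each, and any further elements are extrapolated from each series' step.  That is how 2,3,0,1,6,7,4,5,10,11,8,9 correctly continues as 14,15,12,13 for V4SI and extends to VLA modes.  A rough sketch of the extrapolation rule (my reading of the discussion, not the actual vec_perm_builder code):

    /* SEL holds the 3 * NPATTERNS explicitly pushed indices, in vector
       order.  Element POS of the (possibly longer) selector belongs to
       pattern POS % NPATTERNS; past the pushed elements each pattern
       continues as an arithmetic series.  */
    static unsigned
    perm_index (const unsigned *sel, unsigned npatterns, unsigned pos)
    {
      if (pos < 3 * npatterns)
        return sel[pos];
      unsigned pat = pos % npatterns;
      unsigned step = sel[pat + 2 * npatterns] - sel[pat + npatterns];
      unsigned i = pos / npatterns;   /* Position within the pattern.  */
      return sel[pat + 2 * npatterns] + (i - 2) * step;
    }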
Re: [pushed] doc, fortran: Add a missing menu item.
On 27.10.24 00:15, Iain Sandoe wrote:
> Tested on x86_64-darwin21 and linux, with makeinfo 6.7
> pushed to trunk, thanks

Thanks!

For the record, makeinfo 6.8 did not show this as an error.

Best regards

	Thomas
[PATCH] Match: Optimize log (x) CMP CST and exp (x) CMP CST operations
This patch implements transformations for the following optimizations:

logN(x) CMP CST -> x CMP expN(CST)
expN(x) CMP CST -> x CMP logN(CST)

For example:

int foo (float x)
{
  return __builtin_logf (x) < 0.0f;
}

can just be:

int foo (float x)
{
  return x < 1.0f;
}

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

OK for mainline?

Signed-off-by: Soumya AR

gcc/ChangeLog:

	* match.pd: Fold logN(x) CMP CST -> x CMP expN(CST) and
	expN(x) CMP CST -> x CMP logN(CST).

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/log_exp.c: New test.

0001-Match-Optimize-log-x-CMP-CST-and-exp-x-CMP-CST-opera.patch
Description: 0001-Match-Optimize-log-x-CMP-CST-and-exp-x-CMP-CST-opera.patch
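The exp direction follows from the same monotonicity argument.  For instance (my example, not from the patch; such folds must still respect NaN and rounding semantics, so they are typically guarded by the appropriate math flags):

    int bar (float x)
    {
      /* expf is strictly increasing and expf (0.0f) == 1.0f,
         so this is equivalent to x <= 0.0f.  */
      return __builtin_expf (x) <= 1.0f;
    }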
Re: [pushed] doc, fortran: Add a missing menu item.
> On 27 Oct 2024, at 08:08, Thomas Koenig wrote:
>
> On 27.10.24 00:15, Iain Sandoe wrote:
>> Tested on x86_64-darwin21 and linux, with makeinfo 6.7
>> pushed to trunk, thanks
> For the record, makeinfo 6.8 did not show this as an error.

Hmm, that’s maybe a regression in texinfo 6.8 then, because the entry was, indeed, missing.

According to our installation pages we only require >= 4.7 (although, for some reason, I was under the impression that had been bumped up recently).

Anyway .. resolved for now
cheers
Iain
[PATCH] Fix MV clones can not redirect to specific target on some targets
Following the implementation of commit b8ce8129a5 ("Redirect call within specific target attribute among MV clones (PR ipa/82625)"), we can now optimize calls by invoking a versioned function callee from a caller that shares the same target attribute.  However, on targets that define TARGET_HAS_FMV_TARGET_ATTRIBUTE to zero, meaning they use the "target_version" attribute instead of "target", this optimization is not feasible.  Currently, the only target affected by this limitation is AArch64.

This commit resolves the issue by not directly using "target" with lookup_attribute.  Instead, it checks the TARGET_HAS_FMV_TARGET_ATTRIBUTE macro to decide between using the "target" or "target_version" attribute.

Fixes: 79891c4cb5 ("Add support for target_version attribute")

gcc/ChangeLog:

	* multiple_target.cc (redirect_to_specific_clone): Fix redirection
	not working on targets without TARGET_HAS_FMV_TARGET_ATTRIBUTE.
---
 gcc/multiple_target.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index d2c9671fc1b..a1c18f4a3a7 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -446,8 +446,10 @@ redirect_to_specific_clone (cgraph_node *node)
   cgraph_function_version_info *fv = node->function_version ();
   if (fv == NULL)
     return;
+  const char *fmv_attr = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
+			  ? "target" : "target_version");
 
-  tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES (node->decl));
+  tree attr_target = lookup_attribute (fmv_attr, DECL_ATTRIBUTES (node->decl));
   if (attr_target == NULL_TREE)
     return;
 
@@ -458,7 +460,7 @@ redirect_to_specific_clone (cgraph_node *node)
       if (!fv2)
 	continue;
 
-      tree attr_target2 = lookup_attribute ("target",
+      tree attr_target2 = lookup_attribute (fmv_attr,
 					    DECL_ATTRIBUTES (e->callee->decl));
 
       /* Function is not calling proper target clone.  */
@@ -472,7 +474,7 @@ redirect_to_specific_clone (cgraph_node *node)
       for (; fv2 != NULL; fv2 = fv2->next)
 	{
 	  cgraph_node *callee = fv2->this_node;
-	  attr_target2 = lookup_attribute ("target",
+	  attr_target2 = lookup_attribute (fmv_attr,
 					   DECL_ATTRIBUTES (callee->decl));
 	  if (attr_target2 != NULL_TREE
 	      && attribute_value_equal (attr_target, attr_target2))
-- 
2.47.0
[PATCH] vec-lowering: Fix ABSU lowering [PR111285]
ABSU_EXPR lowering incorrectly used the resulting type for the extracted operand of the new expression; but in the case of ABSU the resulting type is an unsigned type, so the signedness of the operand was lost.  The fix is to use a signed type for the operand instead.

Bootstrapped and tested on x86_64-linux-gnu.

	PR middle-end/111285

gcc/ChangeLog:

	* tree-vect-generic.cc (do_unop): Use a signed type for the
	operand if the operation was ABSU_EXPR.

gcc/testsuite/ChangeLog:

	* g++.dg/torture/vect-absu-1.C: New test.

Signed-off-by: Andrew Pinski
---
 gcc/testsuite/g++.dg/torture/vect-absu-1.C | 29 ++++++++++++++++++++++
 gcc/tree-vect-generic.cc                   | 10 +++++++-
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/vect-absu-1.C

diff --git a/gcc/testsuite/g++.dg/torture/vect-absu-1.C b/gcc/testsuite/g++.dg/torture/vect-absu-1.C
new file mode 100644
index 000..0b2035f638f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vect-absu-1.C
@@ -0,0 +1,29 @@
+// { dg-do run }
+// PR middle-end/111285
+
+// The lowering of vect absu was done incorrectly
+
+#define vect1 __attribute__((vector_size(sizeof(int))))
+
+#define negabs(a) a < 0 ? a : -a
+
+__attribute__((noinline))
+int s(int a)
+{
+  return negabs(a);
+}
+__attribute__((noinline))
+vect1 int v(vect1 int a)
+{
+  return negabs(a);
+}
+
+int main(void)
+{
+  for(int i = -10; i < 10; i++)
+  {
+    vect1 int t = {i};
+    if (v(t)[0] != s(i))
+      __builtin_abort();
+  }
+}
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index ef7d2dd259d..21d906e9c55 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -168,7 +168,15 @@ do_unop (gimple_stmt_iterator *gsi, tree inner_type, tree a,
 	 tree b ATTRIBUTE_UNUSED, tree bitpos, tree bitsize,
 	 enum tree_code code, tree type ATTRIBUTE_UNUSED)
 {
-  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  tree rhs_type = inner_type;
+
+  /* For ABSU_EXPR, use the signed type for the rhs if the rhs was signed.  */
+  if (code == ABSU_EXPR
+      && ANY_INTEGRAL_TYPE_P (TREE_TYPE (a))
+      && !TYPE_UNSIGNED (TREE_TYPE (a)))
+    rhs_type = signed_type_for (rhs_type);
+
+  a = tree_vec_extract (gsi, rhs_type, a, bitsize, bitpos);
   return gimplify_build1 (gsi, code, inner_type, a);
 }
-- 
2.43.0
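For reference, a scalar picture of ABSU semantics (my sketch, not GCC source): the operand is signed but the result type is unsigned, which is why the extracted operand must keep a signed type.

    /* Scalar equivalent of ABSU: signed in, unsigned out.  Negating
       in unsigned arithmetic makes absu (INT_MIN) well defined.  */
    unsigned int absu (int a)
    {
      return a < 0 ? -(unsigned int) a : (unsigned int) a;
    }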
[PATCH] phiopt: Move check for maybe_undef_p slightly earlier
This moves the check for maybe_undef_p in match_simplify_replacement slightly earlier, before figuring out the true/false arg using arg0/arg1 instead.  In most cases this makes no difference in compile time; just in the case where there is an undef in the args there would be a slight compile-time improvement, as there is no reason to figure out which arg corresponds to the true/false side of the conditional.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (match_simplify_replacement): Move check for
	maybe_undef_p earlier.

Signed-off-by: Andrew Pinski
---
 gcc/tree-ssa-phiopt.cc | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f8b119ea836..cffafe101a4 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -943,6 +943,13 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
 				 stmt_to_move_alt))
     return false;
 
+  /* Do not make conditional undefs unconditional.  */
+  if ((TREE_CODE (arg0) == SSA_NAME
+       && ssa_name_maybe_undef_p (arg0))
+      || (TREE_CODE (arg1) == SSA_NAME
+	  && ssa_name_maybe_undef_p (arg1)))
+    return false;
+
   /* At this point we know we have a GIMPLE_COND with two successors.
      One successor is BB, the other successor is an empty block which
      falls through into BB.
@@ -982,13 +989,6 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
       arg_false = arg0;
     }
 
-  /* Do not make conditional undefs unconditional.  */
-  if ((TREE_CODE (arg_true) == SSA_NAME
-       && ssa_name_maybe_undef_p (arg_true))
-      || (TREE_CODE (arg_false) == SSA_NAME
-	  && ssa_name_maybe_undef_p (arg_false)))
-    return false;
-
   tree type = TREE_TYPE (gimple_phi_result (phi));
   {
     auto_flow_sensitive s1(stmt_to_move);
-- 
2.43.0
Re: counted_by attribute and type compatibility
On Friday, 25.10.2024 at 14:03 +, Qing Zhao wrote:
>
> > On Oct 25, 2024, at 08:13, Martin Uecker wrote:
> >
> > > > I agree, an error makes sense.  What worries me a little bit
> > > > is tying this to a semantic change in type compatibility.
> > > >
> > > > typedef struct foo { int n; int m;
> > > >   [[gnu::counted_by(n)]] char buf[]; } aaa_t;
> > > >
> > > > void foo()
> > > > {
> > > >   struct foo { int n; int m;
> > > >     [[gnu::counted_by(m)]] char buf[]; } *b;
> > > >
> > > >   ... = _Generic(b, aaa_t*: 1, default: 0);
> > > > }
> > > >
> > > > would go into the default branch for compilers supporting
> > > > the attribute but go into the first branch for others.  Also
> > > > it affects aliasing rules.
> > >
> > > So, they are in separate compilations? Then the compiler is not able
> > > to catch such inconsistency during compilation time.
> >
> > I am not entirely sure what you mean by this.
> >
> > These are two different types in different scopes, so they
> > are allowed to be different.
>
> Okay, so the two types, aaa_t and the “struct foo” inside the function
> “foo”, are two different types.  And this is legal.
>
> > But _Generic then tests whether they are compatible and
> > takes the attribute into account for GCC.
>
> Then, these two types are not compatible due to the attribute, is this correct?

Correct.

> > But for
> > earlier GCC or other compilers that do not support the
> > attribute the result would be different.
>
> For a compiler that does not support the “counted_by” attribute, if the
> compiler reports an error for the unsupported attribute, then the user
> needs to modify the source code to eliminate the unsupported attribute,
> then the problem should be resolved by the user?

All compilers I know only emit a warning for unknown attributes.

> If the compiler just ignores the unsupported attribute, then these two
> types will be treated as compatible types by the compiler.  Will doing
> this cause any issue?  Since the “counted_by” attribute is not supported
> by the compiler and is ignored by the compiler, these two types should
> be compatible from my understanding, do I miss anything obvious here?

For standard attributes, there is a policy that the attribute should be ignorable, i.e. removing it from a valid program should not cause any change in semantics.  For GCC's attributes this is not necessarily the case, but I still think it is a good policy in general.

The reason is that, as a reviewer of code, you do not need to take subtle effects of an attribute into account.  You can just pretend those do not exist when analyzing core semantics, which reduces cognitive load and the specific knowledge one has to have to understand what is going on.

I do not think it is a big issue, but I think it would be better if removing / ignoring the attribute would *not* cause a change in program semantics.

Martin

> Qing
>
> > So maybe instead of changing the return value of comptypes,
> > we simply set different_types_p (which would prevent
> > redeclaration in the same scope) and also set another flag
> > similar to enum_and_int_p (e.g. inconsistent_counted_by_p)
> > and emit an error in the callers at some appropriate places.
> >
> > > > But maybe this is not a problem.
> > > This does look like an issue to me…
> > > Not sure how to resolve such issue at this moment.
> > >
> > > Or, only when the “counted_by” information is included into the TYPE,
> > > such issue can be resolved?
> > > > > > But I was thinking about the case where you have a type with
> > > > > > a counted_by attribute and one without. Using them together
> > > > > > seems useful, e.g. to add a counted_by in your local version
> > > > > > of a type which needs to be compatible to some API.
> > > > >
> > > > > For API compatibility purpose, yes, I agree here.
> > > > > A stupid question here: if one is defined locally, the other one
> > > > > is NOT defined locally, can such inconsistency be caught by the
> > > > > same compilation (is this the LTO compilation?)
> > > >
> > > > If there is separate compilation this is not caught.  LTO
> > > > has a much coarser notion of types and would not notice
> > > > either (I believe).
> > >
> > > Okay. Then such inconsistency will not be caught during compilation time.
> >
> > Yeah, but here we will miss many other inconsistencies too...
> >
> > > > > Suppose we can catch such inconsistency in the same compilation,
> > > > > which version we should keep? I guess that we should keep the
> > > > > version without the counted_by attribute?
> > > >
> > > > I would keep the one with the attribute, because this is the
> > > > one which has more information.
> > > Make sense to me
> > >
> > > Martin
> >
> > > Thanks.
> > > Qing
> >
> > Martin

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz Univers
[patch, Fortran] Introduce unsigned versions of MASKL and MASKR
Hello world,

MASKR and MASKL are obvious candidates for unsigned, too; in the previous version of the doc patch, I had promised that these would take unsigned arguments in the future.  What I had in mind was that they could take an unsigned argument and return an unsigned result.

Thinking about this a bit more, I realized that this was actually a bad idea; nowhere else do we allow UNSIGNED for bit counting, and things like checking for a negative number of bits (which is illegal) would not work.  Hence, two new intrinsics, UMASKL and UMASKR.

Regression-tested (and this time, I added the intrinsics to the list, so no trouble expected there :-)

OK for trunk?

Best regards

	Thomas

gcc/fortran/ChangeLog:

	* check.cc (gfc_check_mask): Handle BT_UNSIGNED.
	* gfortran.h (enum gfc_isym_id): Add GFC_ISYM_UMASKL and
	GFC_ISYM_UMASKR.
	* gfortran.texi: List UMASKL and UMASKR, remove future unsigned
	arguments for MASKL and MASKR.
	* intrinsic.cc (add_functions): Add UMASKL and UMASKR.
	* intrinsic.h (gfc_simplify_umaskl): New function.
	(gfc_simplify_umaskr): New function.
	(gfc_resolve_umasklr): New function.
	* intrinsic.texi: Document UMASKL and UMASKR.
	* iresolve.cc (gfc_resolve_umasklr): New function.
	* simplify.cc (gfc_simplify_umaskr): New function.
	(gfc_simplify_umaskl): New function.

gcc/testsuite/ChangeLog:

	* gfortran.dg/unsigned_39.f90: New test.

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 304ca1b9ae8..2d4af8e7df3 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -4466,7 +4466,12 @@ gfc_check_mask (gfc_expr *i, gfc_expr *kind)
 {
   int k;
 
-  if (!type_check (i, 0, BT_INTEGER))
+  if (flag_unsigned)
+    {
+      if (!type_check2 (i, 0, BT_INTEGER, BT_UNSIGNED))
+	return false;
+    }
+  else if (!type_check (i, 0, BT_INTEGER))
     return false;
 
   if (!nonnegative_check ("I", i))
@@ -4478,7 +4483,7 @@ gfc_check_mask (gfc_expr *i, gfc_expr *kind)
   if (kind)
     gfc_extract_int (kind, &k);
   else
-    k = gfc_default_integer_kind;
+    k = i->ts.type == BT_UNSIGNED ? gfc_default_unsigned_kind : gfc_default_integer_kind;
 
   if (!less_than_bitsizekind ("I", i, k))
     return false;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index dd599bc97a2..309095d74d5 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -699,6 +699,8 @@ enum gfc_isym_id
   GFC_ISYM_UBOUND,
   GFC_ISYM_UCOBOUND,
   GFC_ISYM_UMASK,
+  GFC_ISYM_UMASKL,
+  GFC_ISYM_UMASKR,
   GFC_ISYM_UNLINK,
   GFC_ISYM_UNPACK,
   GFC_ISYM_VERIFY,
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 3b2691649b0..429d8461f8f 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -2825,16 +2825,11 @@ The following intrinsics take unsigned arguments:
 The following intinsics are enabled with @option{-funsigned}:
 @itemize @bullet
 @item @code{UINT}, @pxref{UINT}
+@item @code{UMASKL}, @pxref{UMASKL}
+@item @code{UMASKR}, @pxref{UMASKR}
 @item @code{SELECTED_UNSIGNED_KIND}, @pxref{SELECTED_UNSIGNED_KIND}
 @end itemize
 
-The following intrinsics will take unsigned arguments
-in the future:
-@itemize @bullet
-@item @code{MASKL}, @pxref{MASKL}
-@item @code{MASKR}, @pxref{MASKR}
-@end itemize
-
 The following intrinsics are not yet implemented in GNU Fortran,
 but will take unsigned arguments once they have been:
 @itemize @bullet
diff --git a/gcc/fortran/intrinsic.cc b/gcc/fortran/intrinsic.cc
index 83b65d34e43..3fb1c63bbd4 100644
--- a/gcc/fortran/intrinsic.cc
+++ b/gcc/fortran/intrinsic.cc
@@ -2568,6 +2568,22 @@ add_functions (void)
 
   make_generic ("maskr", GFC_ISYM_MASKR, GFC_STD_F2008);
 
+  add_sym_2 ("umaskl", GFC_ISYM_UMASKL, CLASS_ELEMENTAL, ACTUAL_NO,
+	     BT_INTEGER, di, GFC_STD_F2008,
+	     gfc_check_mask, gfc_simplify_umaskl, gfc_resolve_umasklr,
+	     i, BT_INTEGER, di, REQUIRED,
+	     kind, BT_INTEGER, di, OPTIONAL);
+
+  make_generic ("umaskl", GFC_ISYM_UMASKL, GFC_STD_F2008);
+
+  add_sym_2 ("umaskr", GFC_ISYM_UMASKR, CLASS_ELEMENTAL, ACTUAL_NO,
+	     BT_INTEGER, di, GFC_STD_F2008,
+	     gfc_check_mask, gfc_simplify_umaskr, gfc_resolve_umasklr,
+	     i, BT_INTEGER, di, REQUIRED,
+	     kind, BT_INTEGER, di, OPTIONAL);
+
+  make_generic ("umaskr", GFC_ISYM_UMASKR, GFC_STD_F2008);
+
   add_sym_2 ("matmul", GFC_ISYM_MATMUL, CLASS_TRANSFORMATIONAL, ACTUAL_NO,
 	     BT_REAL, dr, GFC_STD_F95,
 	     gfc_check_matmul, gfc_simplify_matmul, gfc_resolve_matmul,
 	     ma, BT_REAL, dr, REQUIRED, mb, BT_REAL, dr, REQUIRED);
diff --git a/gcc/fortran/intrinsic.h b/gcc/fortran/intrinsic.h
index ea29219819d..61d85eedc69 100644
--- a/gcc/fortran/intrinsic.h
+++ b/gcc/fortran/intrinsic.h
@@ -434,6 +434,8 @@ gfc_expr *gfc_simplify_transpose (gfc_expr *);
 gfc_expr *gfc_simplify_trim (gfc_expr *);
 gfc_expr *gfc_simplify_ubound (gfc_expr *, gfc_expr *, gfc_expr *);
 gfc_expr *gfc_simplify_ucobound (gfc_expr *, gfc_expr *, gfc_expr *);
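As a side note for readers, the semantics being mirrored (my C paraphrase for a 32-bit kind; the helper names are hypothetical): MASKL(I) sets the I leftmost bits, MASKR(I) the I rightmost, and the U variants return an UNSIGNED result.

    #include <stdint.h>

    /* umaskr (i): i rightmost bits set, for 0 <= i <= 32.  */
    uint32_t umaskr32 (int i) { return i == 0 ? 0 : 0xffffffffu >> (32 - i); }

    /* umaskl (i): i leftmost bits set, for 0 <= i <= 32.  */
    uint32_t umaskl32 (int i) { return i == 0 ? 0 : 0xffffffffu << (32 - i); }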
[Patch, fortran] [13-15 regressions] PR115070 & 115348
Pushed as 'obvious' in commit r15-4702.  This patch has been on my tree since July, so I thought to get it out of the way before it died of bit-rot.  Will backport in a week.

Fortran: Fix regressions with intent(out) class [PR115070, PR115348].

2024-10-27  Paul Thomas

gcc/fortran
	PR fortran/115070
	PR fortran/115348
	* trans-expr.cc (gfc_trans_class_init_assign): If all the
	components of the default initializer are null for a scalar,
	build an empty statement to prevent prior declarations from
	disappearing.

gcc/testsuite/
	PR fortran/115070
	* gfortran.dg/pr115070.f90: New test.

	PR fortran/115348
	* gfortran.dg/pr115348.f90: New test.

Paul
[r15-4702 Regression] FAIL: gfortran.dg/pr115070.f90 -O (test for excess errors) on Linux/x86_64
On Linux/x86_64, ed8ca972f8857869d2bb4a416994bb896eb1c34e is the first bad commit:

commit ed8ca972f8857869d2bb4a416994bb896eb1c34e
Author: Paul Thomas
Date:   Sun Oct 27 12:40:42 2024 +

    Fortran: Fix regressions with intent(out) class [PR115070, PR115348].

caused

FAIL: gfortran.dg/pr115070.f90   -O  (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-4702/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.)
(If you hit problems related to cascadelake, disabling AVX512F on the command line might help.)
(However, please make sure that there are no potential problems with AVX512.)
Re: [PATCH] xtensa: Define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P target hook
On Tue, Oct 22, 2024 at 7:31 PM Takayuki 'January June' Suwa wrote:
>
> In commit bc5a9dab55d13f888a3cdd150c8cf5c2244f35e0 ("gcc: xtensa: reorder
> movsi_internal patterns for better code generation during LRA"), the
> instruction order in "movsi_internal" MD definition was changed to make
> LRA use load/store instructions with larger memory address displacements,
> but as a side effect, it now uses the larger displacements (ie., the
> larger instructions) even outside of reload operations.
>
> The underlying problem is that LRA assumes by default that there is only
> one maximal legitimate displacement for the same address structure,
> meaning that it has no choice but to use the first load/store instruction
> it finds.
>
> To fix this, define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P hook to always
> return true.
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa.cc (TARGET_DIFFERENT_ADDR_DISPLACEMENT_P):
>         Add new target hook to always return true.
>         * config/xtensa/xtensa.md (movsi_internal):
>         Revert the previous changes.
> ---
>  gcc/config/xtensa/xtensa.cc |  3 +++
>  gcc/config/xtensa/xtensa.md | 12 ++++++------
>  2 files changed, 9 insertions(+), 6 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

Thanks.
-- Max
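For context, target hooks like this are usually defined with the stock helpers from hooks.h; a minimal sketch of what such a definition typically looks like in a backend (my illustration, not necessarily the exact patch text):

    /* In config/xtensa/xtensa.cc.  hook_bool_void_true is the generic
       "always return true" helper provided by hooks.h.  */
    #undef TARGET_DIFFERENT_ADDR_DISPLACEMENT_P
    #define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P hook_bool_void_true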
Re: [PATCH v4 2/2] RISC-V: Add testcases for unsigned .SAT_SUB form 2 with IMM = 1.
On 10/24/24 7:22 PM, Li Xu wrote:
> From: xuli
>
> form2:
>   T __attribute__((noinline))             \
>   sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
>   {                                       \
>     return x >= (T)IMM ? x - (T)IMM : 0;  \
>   }
>
> Passed the rv64gcv regression test.
>
> Signed-off-by: Li Xu
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/sat_u_sub_imm-run-5.c: Add run case for imm=1.
>         * gcc.target/riscv/sat_u_sub_imm-run-6.c: Ditto.
>         * gcc.target/riscv/sat_u_sub_imm-run-7.c: Ditto.
>         * gcc.target/riscv/sat_u_sub_imm-run-8.c: Ditto.
>         * gcc.target/riscv/sat_u_sub_imm-5_3.c: New test.
>         * gcc.target/riscv/sat_u_sub_imm-6_3.c: New test.
>         * gcc.target/riscv/sat_u_sub_imm-7_3.c: New test.
>         * gcc.target/riscv/sat_u_sub_imm-8_1.c: New test.

This is fine once the prerequisite patch is installed.

Thanks,
jeff
Re: [PATCH #1/7] allow vuses in ifcombine blocks
On 10/25/24 5:50 AM, Alexandre Oliva wrote:
> Disallowing vuses in blocks for ifcombine is too strict, and it
> prevents usefully moving fold_truth_andor into ifcombine.  That
> tree-level folder has long ifcombined loads, absent other relevant
> side effects.
>
> for gcc/ChangeLog
>
>         * tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
>         but not vdefs.

OK
jeff
Re: [PATCH #2/7] drop redundant ifcombine_ifandif parm
On 10/25/24 5:51 AM, Alexandre Oliva wrote:
> In preparation for changes that may modify both inner and outer
> conditions in ifcombine, drop the redundant parameter result_inv, which
> is always identical to inner_inv.
>
> for gcc/ChangeLog
>
>         * tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
>         result_inv parm.  Adjust all callers.

OK
jeff
Re: [PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)
On 10/24/24 12:24 AM, Kyrylo Tkachov wrote:
>> On 24 Oct 2024, at 07:36, Jeff Law wrote:
>>
>> On 10/22/24 2:26 PM, Kyrylo Tkachov wrote:
>>> Hi all,
>>> With recent patch to improve detection of vector rotates at RTL level
>>> combine now tries matching a V8HImode rotate by 8 in the example in
>>> the testcase.  We can teach AArch64 to emit a REV16 instruction for
>>> such a rotate but really this operation corresponds to the RTL code
>>> BSWAP, for which we already have the right patterns.
>>> [...]
>>> ---
>>>  gcc/simplify-rtx.cc                          |  6 ++++++
>>>  .../gcc.target/aarch64/rot_to_bswap.c        | 23 +++++++++++++++++++
>>>  2 files changed, 29 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rot_to_bswap.c
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 089e03c2a7a..205a251f005 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -4328,6 +4328,12 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>>>  					    mode, op0, new_amount_rtx);
>>>  	}
>>>  #endif
>>> +      /* ROTATE/ROTATERT:HI (X:HI, 8) is BSWAP:HI (X).  */
>>> +      tem = unwrap_const_vec_duplicate (trueop1);
>>> +      if (GET_MODE_UNIT_BITSIZE (mode) == (2 * BITS_PER_UNIT)
>>> +	  && CONST_INT_P (tem) && INTVAL (tem) == BITS_PER_UNIT)
>>> +	return simplify_gen_unary (BSWAP, mode, op0, mode);
>>
>> So what about other modes?  I haven't really pondered this, but isn't
>> there something similar for ROTATE:SI (X:SI, 16)?  I guess the basic
>> question is whether or not this really needs to be limited to HImode.
>
> A (ROTATE:SI (X:SI, 16)) would represent a half-word swap, rather than
> a byte-swap.  For example, 0x12345678 rotated by 16 gives 0x56781234,
> whereas a bswap would give 0x78563412.  AArch64 does have native
> operations that perform these half-word (and word) swaps, but they are
> not RTL BSWAP operations unfortunately.  So this pattern effectively
> only works for HI and vector HI
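To make the quoted numbers easy to verify, a scalar check (my example, not part of the patch):

    #include <stdint.h>

    /* 0x12345678: rot16 gives 0x56781234, bswap gives 0x78563412,
       confirming that a 32-bit rotate by 16 is not a byte swap.  */
    uint32_t rot16   (uint32_t x) { return (x << 16) | (x >> 16); }
    uint32_t bswap32 (uint32_t x) { return __builtin_bswap32 (x); }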
[committed] libstdc++: Fix std::vector::emplace to forward parameter
If the parameter is not lvalue-convertible to bool then the current code will fail to compile.  The parameter should be forwarded to restore the original value category.

libstdc++-v3/ChangeLog:

	* include/bits/stl_bvector.h (emplace_back, emplace): Forward
	parameter pack to preserve value category.
	* testsuite/23_containers/vector/bool/emplace_rvalue.cc: New test.
---
Tested x86_64-linux.  Pushed to trunk.

 libstdc++-v3/include/bits/stl_bvector.h         |  4 ++--
 .../vector/bool/emplace_rvalue.cc               | 24 ++++++++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 42261ac5915..70f69b5b5b5 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -1343,7 +1343,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
       emplace_back(_Args&&... __args)
       {
-	push_back(bool(__args...));
+	push_back(bool(std::forward<_Args>(__args)...));
 #if __cplusplus > 201402L
 	return back();
 #endif
@@ -1353,7 +1353,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       _GLIBCXX20_CONSTEXPR
       iterator
       emplace(const_iterator __pos, _Args&&... __args)
-      { return insert(__pos, bool(__args...)); }
+      { return insert(__pos, bool(std::forward<_Args>(__args)...)); }
 #endif
 
     protected:
diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc
new file mode 100644
index 000..5dea2426d60
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++11 } }
+
+#include <vector>
+
+struct S
+{
+  explicit operator bool() &&;
+};
+
+void
+test_emplace_back()
+{
+  S s;
+  std::vector<bool> v;
+  v.emplace_back(std::move(s));
+}
+
+void
+test_emplace()
+{
+  S s;
+  std::vector<bool> v;
+  v.emplace(v.begin(), std::move(s));
+}
-- 
2.47.0
Re: [PATCH #3/7] introduce ifcombine_replace_cond
On 10/25/24 5:52 AM, Alexandre Oliva wrote:
> Refactor ifcombine_ifandif, moving the common code from the various
> paths that apply the combined condition to a new function.
>
> for gcc/ChangeLog
>
>         * tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
>         of...
>         (ifcombine_ifandif): ... this.

It looks like you also did some simplifications in ifcombine_ifandif.  Those should be noted in the ChangeLog.  Specifically you no longer make the calls to force_gimple_operand_gsi and simplified the equality test.

OK with that change.
jeff
Re: [PATCH] RISC-V: Remove skip of decl in registered_function.
On 10/22/24 12:24 AM, KuanLin Chen wrote:
> The GTY skip makes GGC clean the registered functions wrongly in LTO.
>
> Example:
> riscv64-unknown-elf-gcc -flto gcc/testsuite/gcc.target/riscv/rvv/base/bug-3.c -O2 -march=rv64gcv
>
> In file included from bug-3.c:2:
> internal compiler error: Segmentation fault
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-vector-builtins.cc (registered_function):
>         Remove skip at decl.

How was this tested?  I put it through a regression testsuite run and it resulted in about 4700 new failures for both riscv32-elf and riscv64-elf.

Patches need to be regression tested.

Jeff
Re: [PATCH] RISC-V: Fix rvv builtin function groups registration asynchronously.
On 10/22/24 12:26 AM, KuanLin Chen wrote:
> Originally, cc1 registers the RVV builtins with all vector subextensions
> turned on, but lto1 does not.  This makes LTO use an out-of-sync
> DECL_MD_FUNCTION_CODE from the LTO objects.
>
> Example:
> riscv64-unknown-elf-gcc -flto gcc/testsuite/gcc.target/riscv/rvv/base/bug-3.c -O2 -march=rv64gcv
>
> bug-3.c: In function 'main':
> bug-3.c:10:3: error: invalid argument to built-in function
>    10 |   __riscv_vse32_v_i32m1 (d, vd, 1);
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute):
>         Move to riscv-vector-builtins.cc.
>         (riscv_pragma_intrinsic_flags_restore): Ditto.
>         (riscv_ext_version_value): Remove flags initialization.
>         * config/riscv/riscv-vector-builtins.cc (reinit_builtins): Remove
>         handle_pragma_vector in lto_p.
>         (riscv_pragma_intrinsic_flags_pollute): Cut from riscv-c.cc.
>         (riscv_pragma_intrinsic_flags_restore): Ditto.
>         (riscv_vector_push_setting): Backup flags.
>         (riscv_vector_pop_setting): Restore flags.
>         (handle_pragma_vector): Initialize flags for registering builtins.

You need to run the regression testsuite and verify that there are no new failures after your patch compared to a run without your patch.

jeff
Re: [PATCH v2] testsuite: Sanitize pacbti test cases for Cortex-M
On 2024-10-25 12:30, Richard Earnshaw (lists) wrote:
> On 14/10/2024 13:23, Christophe Lyon wrote:
>> On 10/13/24 19:50, Torbjörn SVENSSON wrote:
>>> Ok for trunk and releases/gcc-14?
>>>
>>> Changes since v1:
>>> - Dropped changes to dg- instructions.  These will be addressed in a
>>>   separate set of patches later.
>>
>> LGTM, let's avoid mixing changes.
>
> This is OK, though I think in most (but not all) cases the additional
> matches on a tab are unnecessary when the instruction takes arguments.
> The problem cases are mostly for instructions that do not take any
> arguments (or where we don't try to match them).

I did not include it in v1, but it was suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662113.html
and I did not see how that would hurt, so I included it.

Pushed as r15-4701-g6ad29a858ba and r14.2.0-317-gec9bd14144a.

Kind regards,
Torbjörn
Re: [PATCH v2 8/8] RISC-V: Add else operand to masked loads [PR115336].
On 10/18/24 8:22 AM, Robin Dapp wrote:
> This patch adds else operands to masked loads.  Currently the default
> else operand predicate accepts "undefined" (i.e. SCRATCH) as well as
> all-ones values.
>
> Note that this series introduces a large number of new RVV FAILs for
> riscv.  All of them are due to us not being able to elide redundant
> vec_cond_exprs.
>
>         PR middle-end/115336
>         PR middle-end/116059
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md: Add else operand.
>         * config/riscv/predicates.md (maskload_else_operand): New
>         predicate.
>         * config/riscv/riscv-v.cc (get_else_operand): Remove static.
>         (expand_load_store): Use get_else_operand and adjust index.
>         (expand_gather_scatter): Ditto.
>         (expand_lanes_load_store): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/pr115336.c: New test.
>         * gcc.target/riscv/rvv/autovec/pr116059.c: New test.

OK once prereqs are resolved.

jeff
Re: [PATCH #4/7] adjust update_profile_after_ifcombine for noncontiguous ifcombine
On 10/25/24 5:54 AM, Alexandre Oliva wrote:
> Prepare for ifcombining noncontiguous blocks, adding (still unused)
> logic to the ifcombine profile updater to handle such cases.
>
> for gcc/ChangeLog
>
>         * tree-ssa-ifcombine.cc (known_succ_p): New.
>         (update_profile_after_ifcombine): Handle noncontiguous blocks.

OK
jeff