[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338 --- Comment #1 from CVS Commits --- The master branch has been updated by Jiu Fu Guo : https://gcc.gnu.org/g:5f56b76ff1c15118200204569389f85cca4e32d3 commit r14--g5f56b76ff1c15118200204569389f85cca4e32d3 Author: Jiufu Guo Date: Thu Sep 28 17:00:04 2023 +0800 rs6000: optimize moving to sf from highpart di Currently, we have the pattern "movsf_from_si2" which was trying to support moving high part DI to SF. But current pattern only accepts "ashiftrt": XX:SF=bitcast:SF(subreg(YY:DI>>32),0), but actually "lshiftrt" should also be ok. And current pattern only supports BE. Here, updating the pattern to support BE and "lshiftrt". PR target/108338 gcc/ChangeLog: * config/rs6000/predicates.md (lowpart_subreg_operator): New define_predicate. * config/rs6000/rs6000.md (any_rshift): New code_iterator. (movsf_from_si2): Rename to ... (movsf_from_si2_): ... this. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108338.c: New test.
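The commit message above says that "lshiftrt" should be as acceptable as "ashiftrt" when only the high 32 bits of a DI value are bitcast to SF. A minimal C sketch of why: after shifting right by 32, the low 32 bits are the same for both shift kinds. The helper names are hypothetical and not taken from the patch or its testcase, and the sketch assumes a 32-bit IEEE float.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes float is 32-bit IEEE). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* High part via an arithmetic shift ("ashiftrt" in RTL terms).
   Right-shifting a negative signed value is implementation-defined in
   ISO C; GCC documents it as an arithmetic shift. */
static float high_part_arith(int64_t di)
{
    return bits_to_float((uint32_t)(di >> 32));
}

/* High part via a logical shift ("lshiftrt"). The two shifts differ only
   in the upper 32 bits of the result, which the truncation discards. */
static float high_part_logical(uint64_t di)
{
    return bits_to_float((uint32_t)(di >> 32));
}
```

Both helpers return the same float for the same 64-bit pattern, which is the property the updated pattern relies on.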
[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338 --- Comment #2 from CVS Commits --- The master branch has been updated by Jiu Fu Guo : https://gcc.gnu.org/g:537d7a445ca0ed677751afd3cdcf8465ccd5fb7e commit r14-4445-g537d7a445ca0ed677751afd3cdcf8465ccd5fb7e Author: Jiufu Guo Date: Thu Sep 28 17:34:45 2023 +0800 rs6000: use mtvsrws to move sf from si p9 As mentioned in PR108338, on p9, we could use mtvsrws to implement the bitcast from SI to SF (or lowpart DI to SF). For example: *(long long*)buff = di; float f = *(float*)(buff); "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated. A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1". PR target/108338 gcc/ChangeLog: * config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws for P9. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
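For readers who want to reproduce the low-part case, here is a small sketch of the kind of source that ends up as an SI-to-SF bitcast. It is not the committed pr108338.c test; the function name is hypothetical, and it takes the low 32 bits explicitly so that it does not depend on byte order the way the buffer-based snippet in the commit message does. It assumes a 32-bit IEEE float.

```c
#include <stdint.h>
#include <string.h>

/* Bitcast the low 32 bits of a 64-bit integer to float.
   memcpy avoids the aliasing problems of pointer casts. */
static float low_part_to_float(uint64_t di)
{
    uint32_t lo = (uint32_t)di;   /* keep only the low word */
    float f;
    memcpy(&f, &lo, sizeof f);
    return f;
}
```

Compiling a function like this for a Power9 target and inspecting the assembly is one way to check whether the mtvsrws sequence described above is being selected.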
[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338 Jiu Fu Guo changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from Jiu Fu Guo --- Fixed.
[Bug target/111634] RISC-V vector: ICE RTL check: expected code 'reg', have 'lo_sum' in rhs_regno, at rtl.h:1934
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111634 Patrick O'Neill changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Patrick O'Neill --- Confirmed to be fixed on r14-4443-ga809a556dc0. Built and ran testsuite on rv32/64gcv glibc/newlib with --enable-checking=rtl. They all built successfully and no tests fail due to rtl checking!
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #6 from JuzheZhong --- Hi, Richi. I have recently been evaluating GCC's TSVC performance and found that both RISC-V and aarch64 can SLP-vectorize this case: https://godbolt.org/z/ssvTxxjeT Both GCC-13 and trunk GCC can SLP it like LLVM (GCC-12 failed), but only with -fno-vect-cost-model. I suspect we should adjust the vector cost model (I don't think we should adjust the cost model in the target backend, since LLVM vectorizes such cases by default).
[Bug tree-optimization/111718] New: Missed optimization of '(a+a)/a'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718 Bug ID: 111718 Summary: Missed optimization of '(a+a)/a' Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: 652023330028 at smail dot nju.edu.cn Target Milestone: --- Hello, we found some optimizations (regarding arithmetic optimization) that GCC may have missed. We would greatly appreciate it if you could take a look and let us know what you think. Given the following code: https://godbolt.org/z/5de17zvz9 unsigned n1,n2; void func1(unsigned a){ if(a>10&&a<20){ n1=a+a; n2=(a+a)/a; } } We note that `(a+a)/a` should be optimized to `2`, but gcc-trunk -O3 does not: func1(unsigned int): lea eax, [rdi-11] cmp eax, 8 ja .L1 lea eax, [rdi+rdi] xor edx, edx mov DWORD PTR n1[rip], eax div edi mov DWORD PTR n2[rip], eax .L1: ret Thank you very much for your time and effort! We look forward to hearing from you.
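It may help to spell out why the range check matters for this fold. With unsigned arithmetic, `a + a` can wrap, so `(a+a)/a` is only guaranteed to be 2 when the compiler knows that `a` is nonzero and small enough. The following is a minimal sketch; the helper name is hypothetical and not part of the report.

```c
#include <limits.h>

/* Computes (a + a) / a in unsigned arithmetic, as in the report.
   The caller must pass a nonzero value. */
static unsigned doubled_over_self(unsigned a)
{
    return (a + a) / a;   /* a + a wraps modulo UINT_MAX + 1 */
}
```

For the reported range 10 < a < 20 the result is always 2, whereas for a value such as `UINT_MAX / 2 + 1` the sum wraps to 0 and the quotient is 0. The simplification therefore depends on value-range information being available at the division.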
[Bug libstdc++/111129] std::regex incorrectly matches quantifiers with plus appended
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111129 Jonathan Wakely changed: What|Removed |Added CC||hewillk at gmail dot com --- Comment #4 from Jonathan Wakely --- *** Bug 111713 has been marked as a duplicate of this bug. ***
[Bug libstdc++/111713] libstdc++ accepts invalid regular expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111713 Jonathan Wakely changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #3 from Jonathan Wakely --- Yes, it's a dup *** This bug has been marked as a duplicate of bug 111129 ***
[Bug libstdc++/92798] -fshort-enums can break iterators of std::map
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92798 --- Comment #6 from Jonathan Wakely --- We could add an enumerator that forces sizeof(_Rb_tree_color) == sizeof(int), which would be valid for C++98.
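A rough illustration of the idea in the comment above, written in C rather than as the actual libstdc++ change: an extra enumerator whose value needs the full range of int keeps the enumeration from shrinking under -fshort-enums. The type and enumerator names here are hypothetical stand-ins for _Rb_tree_color and its members. A trick like this is presumably attractive because it works in C++98, which has no syntax for fixing an enum's underlying type (that arrived in C++11).

```c
#include <limits.h>

/* Hypothetical stand-in for _Rb_tree_color. */
enum rb_tree_color_sketch {
    sketch_red = 0,
    sketch_black = 1,
    /* Hypothetical padding enumerator: its value does not fit in a
       narrower type, so the enum stays int-sized with -fshort-enums. */
    sketch_force_int_size = INT_MAX
};
```

With this definition `sizeof(enum rb_tree_color_sketch)` is expected to equal `sizeof(int)` whether or not -fshort-enums is in effect.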
[Bug fortran/111719] New: Omitting data-sharing attribute for function return value in OpenMP does not raise an error.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111719 Bug ID: 111719 Summary: Omitting data-sharing attribute for function return value in OpenMP does not raise an error. Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: pmblakely at googlemail dot com Target Milestone: --- In the following Fortran90 example: program prog contains function b(arr) implicit none integer :: i, xCells real(kind = 8) :: dx, a, dc, b, dt real(kind = 8), allocatable, dimension(:), intent(in) :: arr b = 1d300 dc = 1d300 !$OMP parallel do default(none) reduction(min:dt) firstprivate(xCells, dx, a) shared(arr) do i = 0, xCells a = arr(i) b = min(b, 1d0 / a) end do end function b end program prog the return value for function 'b' is the intended reduction value in the do-loop, but is not mentioned in the OpenMP reduction clause (dt is incorrectly mentioned instead). Due to the default(none) clause, this should be a compile-time error. However: gfortran test.f90 -o test -fopenmp compiles this without warnings or errors (versions 13.1.0, 8.4.0, 9.4.0, 11.2.0 and 12.1.0 all tested). If "b = min(b, 1d0 / a)" is replaced by "dc = min(dc, 1d0/a)" then gfortran gives: "Error: 'dc' not specified in enclosing ‘parallel’" I would expect this error to be generated in the original case as well. Note that the OpenMP standard at https://www.openmp.org/spec-html/5.2/openmpsu33.html does not give an implicitly determined data-sharing attribute for the function return value. Also the Intel Fortran ifort (2021.8.0) does raise the expected error on the above test-code.
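For comparison, here is a C analogue of the loop with the reduction variable named correctly. This is only a sketch under assumptions: it is not the reporter's Fortran code, the function name is hypothetical, and C has no function-result variable, so it cannot reproduce the Fortran-specific problem itself. It does show the clause layout that default(none) is meant to enforce.

```c
/* Returns the minimum of 1.0 / arr[i] over i in [0, n), or 1e300 when
   n is 0. Every variable used in the loop is listed explicitly, as
   default(none) requires; the loop index is predetermined private. */
static double min_inverse(const double *arr, int n)
{
    double b = 1e300;
    int i;
    #pragma omp parallel for default(none) shared(arr, n) reduction(min:b)
    for (i = 0; i < n; ++i) {
        double inv = 1.0 / arr[i];
        if (inv < b)
            b = inv;
    }
    return b;
}
```

In C, removing `b` from the clauses while keeping default(none) is diagnosed by GCC, which matches what the reporter expects for the Fortran function result. The code also runs serially with the same result when built without -fopenmp, since the pragma is then ignored.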
[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718 Ivan Sorokin changed: What|Removed |Added CC||vanyacpp at gmail dot com --- Comment #1 from Ivan Sorokin --- GCC does the optimization if the return from the function is replaced with __builtin_unreachable: unsigned n1, n2; void func1(unsigned a) { if (a <= 10 || a >= 20) __builtin_unreachable(); n1 = a + a; n2 = (a + a)/a; } func1(unsigned int): mov DWORD PTR n2[rip], 2 add edi, edi mov DWORD PTR n1[rip], edi ret https://godbolt.org/z/Tjsz6neTs Perhaps this issue has the same underlying cause as the PR80015.
[Bug middle-end/110859] New FAIL: 23_containers/vector/bool/110807.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110859 --- Comment #3 from John David Anglin --- FAIL: 23_containers/vector/bool/110807.cc -std=gnu++17 (test for excess errors) Excess errors: /home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:440: warning: 'void* __builtin_memmove(void*, const void*, unsigned int)' writing between 5 and 268435455 bytes into a region of size 4 overflows the destination [-Wstringop-overflow=]
[Bug target/64215] -Os misses an opportunity to merge two ret instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64215 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #5 from Jeffrey A. Law --- Andrew, the reason the patch you referenced doesn't help this case is because we don't have an unconditional jump to a return only block. To optimize this case we'd have to detect that we have a return only block that is immediately preceded by another return block after bbro. ie: (note 48 23 59 6 [bb 6] NOTE_INSN_BASIC_BLOCK) (insn 59 48 49 6 (use (reg/i:SI 10 a0)) -1 (nil)) (jump_insn 49 59 37 6 (simple_return) 346 {simple_return} (nil) -> simple_return) ;; lr out 1 [ra] 2 [sp] 10 [a0] ;; live out 1 [ra] 2 [sp] 10 [a0] ;; succ: EXIT [always] count:52738306 (estimated locally, freq 0.4591) ;; basic block 7, loop depth 0, count 6317494 (estimated locally, freq 0.0550), maybe hot ;; prev block 6, next block 1, flags: (REACHABLE, RTL) ;; pred: 2 [5.5% (guessed)] count:6317494 (estimated locally, freq 0.0550) (CAN_FALLTHRU) ;; bb 7 artificial_defs: { } ;; bb 7 artificial_uses: { u-1(2){ }} ;; lr in1 [ra] 2 [sp] 10 [a0] ;; lr use 2 [sp] 10 [a0] ;; lr def ;; live in 1 [ra] 2 [sp] 10 [a0] ;; live gen ;; live kill (code_label 37 49 36 7 4 (nil) [1 uses]) (note 36 37 60 7 [bb 7] NOTE_INSN_BASIC_BLOCK) (insn 60 36 51 7 (use (reg/i:SI 10 a0)) -1 (nil)) (jump_insn 51 60 41 7 (simple_return) 346 {simple_return} (nil) -> simple_return)
[Bug target/106271] Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271 Jeffrey A. Law changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED --- Comment #8 from Jeffrey A. Law --- I wasn't aware of this BZ when I made the commit referenced in c#6. But yes, the whole point of that commit was to fix this problem.
[Bug target/109414] RISC-V: unnecessary sext.w in rv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109414 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED CC||law at gcc dot gnu.org --- Comment #5 from Jeffrey A. Law --- These code generation inefficiencies have been fixed. I didn't bisect, but I would hazard a guess it was Jivan's work on exposing the widening nature of the 32 bit operations and extracting the result via a promoted subreg. ie, for the first example we now generate this during expand: (insn 2 5 3 2 (set (reg/v:DI 136 [ x ]) (reg:DI 10 a0 [ x ])) "j.c":1:26 -1 (nil)) (insn 3 2 4 2 (set (reg/v:DI 137 [ n ]) (reg:DI 11 a1 [ n ])) "j.c":1:26 -1 (nil)) (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG) (insn 7 4 8 2 (set (reg:DI 140) (sign_extend:DI (plus:SI (subreg/s/u:SI (reg/v:DI 136 [ x ]) 0) (const_int 1 [0x1] "j.c":2:12 -1 (nil)) (insn 8 7 9 2 (set (reg:SI 139) (subreg/s/u:SI (reg:DI 140) 0)) "j.c":2:12 -1 (expr_list:REG_EQUAL (plus:SI (subreg/s/u:SI (reg/v:DI 136 [ x ]) 0) (const_int 1 [0x1])) (nil))) (insn 9 8 10 2 (set (reg:DI 141) (xor:DI (reg/v:DI 137 [ n ]) (subreg:DI (reg:SI 139) 0))) "j.c":2:17 -1 (nil)) (insn 10 9 11 2 (set (reg:DI 142) (sign_extend:DI (subreg:SI (reg:DI 141) 0))) "j.c":2:17 discrim 1 -1 (nil)) (insn 11 10 15 2 (set (reg:DI 135 [ ]) (reg:DI 142)) "j.c":2:17 discrim 1 -1 (nil)) (insn 15 11 16 2 (set (reg/i:DI 10 a0) (reg:DI 135 [ ])) "j.c":3:1 -1 (nil)) (insn 16 15 0 2 (use (reg/i:DI 10 a0)) "j.c":3:1 -1 (nil)) Which is much easier for combine to analyze and prove the trailing sign extension is unnecessary.
[Bug rtl-optimization/111384] missed optimization: GCC adds extra any extend when storing subreg#0 multiple times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111384 Jeffrey A. Law changed: What|Removed |Added Last reconfirmed||2023-10-07 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #4 from Jeffrey A. Law --- So this is something we've been pondering over in rv64 land. Joern has an extension to DCE which tracks subobjects in an attempt to determine if bits set by sign/zero extensions are never read. If they aren't read, then the extension can be eliminated.
[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699 --- Comment #8 from CVS Commits --- The releases/gcc-13 branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:add2afa9e25f1776fdfbeb1b99fd1efcf850f91f commit r13-7938-gadd2afa9e25f1776fdfbeb1b99fd1efcf850f91f Author: Andrew Pinski Date: Thu Oct 5 12:21:19 2023 -0700 MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a & b` Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)` into `vec_cond(a & b, c, d)` but since in this case a is a comparison fold will change `a & b` back into `vec_cond(a,b,0)` which causes an infinite loop. The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*) only for GIMPLE so we don't get an infinite loop for fold any more. Note this is a latent bug since these patterns were added in r11-2577-g229752afe3156a and was exposed by r14-3350-g47b833a9abe1 where now able to remove a VIEW_CONVERT_EXPR. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/111699 gcc/ChangeLog: * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr111699-1.c: New test. (cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)
[Bug bootstrap/111664] [14 regression] Fails to build with mawk (error in gcc/opt-read.awk) after r14-4354-ge4a4b8e983bac8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111664 Jeffrey A. Law changed: What|Removed |Added Status|ASSIGNED|RESOLVED CC||law at gcc dot gnu.org Resolution|--- |FIXED --- Comment #6 from Jeffrey A. Law --- Fixed on the trunk.
[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699 --- Comment #9 from CVS Commits --- The releases/gcc-12 branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:a63238cd52d974d364677def97d4ed70d26a7410 commit r12-9915-ga63238cd52d974d364677def97d4ed70d26a7410 Author: Andrew Pinski Date: Thu Oct 5 12:21:19 2023 -0700 MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a & b` Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)` into `vec_cond(a & b, c, d)` but since in this case a is a comparison fold will change `a & b` back into `vec_cond(a,b,0)` which causes an infinite loop. The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*) only for GIMPLE so we don't get an infinite loop for fold any more. Note this is a latent bug since these patterns were added in r11-2577-g229752afe3156a and was exposed by r14-3350-g47b833a9abe1 where now able to remove a VIEW_CONVERT_EXPR. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/111699 gcc/ChangeLog: * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr111699-1.c: New test. (cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)
[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699 --- Comment #10 from CVS Commits --- The releases/gcc-11 branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:9d4caf90e7bf1824ebabf0bc0541bfea511ef03b commit r11-11054-g9d4caf90e7bf1824ebabf0bc0541bfea511ef03b Author: Andrew Pinski Date: Thu Oct 5 12:21:19 2023 -0700 MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a & b` Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)` into `vec_cond(a & b, c, d)` but since in this case a is a comparison fold will change `a & b` back into `vec_cond(a,b,0)` which causes an infinite loop. The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*) only for GIMPLE so we don't get an infinite loop for fold any more. Note this is a latent bug since these patterns were added in r11-2577-g229752afe3156a and was exposed by r14-3350-g47b833a9abe1 where now able to remove a VIEW_CONVERT_EXPR. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/111699 gcc/ChangeLog: * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr111699-1.c: New test. (cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)
[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699 Andrew Pinski changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to work||12.3.1, 13.2.1 Resolution|--- |FIXED Target Milestone|13.3|11.5 --- Comment #11 from Andrew Pinski --- Fixed everywhere.
[Bug c/111720] New: RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 Bug ID: 111720 Summary: RISC-V: Ugly codegen in RVV Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: juzhe.zhong at rivai dot ai Target Milestone: --- Reference: https://godbolt.org/z/YqW7Y5Yve #include vbool8_t fn() { uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9}; uint8_t m = 1; vuint8m1_t varr = __riscv_vle8_v_u8m1(arr, 32); vuint8m1_t vand_m = __riscv_vand_vx_u8m1(varr, m, 32); vbool8_t vmask = __riscv_vreinterpret_v_u8m1_b8(vand_m); return vmask; } GCC asm: fn: lui a5,%hi(.LANCHOR0) addi sp,sp,-32 vsetivli zero,4,e64,m2,ta,ma addi a5,a5,%lo(.LANCHOR0) li a4,32 vle64.v v2,0(a5) vse64.v v2,0(sp) vsetvli zero,a4,e8,m1,ta,ma vle8.v v1,0(sp) vand.vi v1,v1,1 vsetvli a5,zero,e8,m1,ta,ma vsm.v v1,0(a0) addi sp,sp,32 jr ra LLVM ASM: fn: # @fn .Lpcrel_hi0: auipc a0, %pcrel_hi(.L__const.fn.arr) addi a0, a0, %pcrel_lo(.Lpcrel_hi0) li a1, 32 vsetvli zero, a1, e8, m1, ta, ma vle8.v v8, (a0) vand.vi v0, v8, 1 ret .L__const.fn.arr: .ascii "\001\002\007\001\003\004\005\003\001\000\001\002\004\004\t\t\001\002\007\001\003\004\005\003\001\000\001\002\004\004\t\t"
[Bug regression/111709] [13 Regression] Miscompilation of sysdeps/ieee754/dbl-64/s_fma.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111709 --- Comment #10 from dave.anglin at bell dot net --- On 2023-10-06 3:50 a.m., rguenth at gcc dot gnu.org wrote: > Does it work on trunk? No. Test results with gcc trunk are identical to with Debian gcc-13. Tried just rebuilding s_fma.c, and a full build and check.
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #1 from JuzheZhong --- The root cause is unnecessary VLS modes data movement: (insn 10 9 11 2 (set (reg:V4DI 143) (mem/u/c:V4DI (reg:DI 142) [0 S32 A128])) "/app/example.c":4:13 1119 {*movv4di} (nil)) (insn 11 10 12 2 (set (mem/c:V4DI (reg:DI 141) [0 S32 A128]) (reg:V4DI 143)) "/app/example.c":4:13 1119 {*movv4di} (nil))
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #2 from Andrew Pinski --- I noticed there is an ABI difference here. GCC is returning via a store to a0: vsm.v v1,0(a0) While LLVM is returning via v0 . Which one is correct?
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #3 from JuzheZhong --- (In reply to Andrew Pinski from comment #2) > I noticed there is an ABI difference here. > > GCC is returning via a store to a0: > vsm.v v1,0(a0) > > While LLVM is returning via v0 . > > Which one is correct? Both are correct. We have an experimental ABI doc. GCC also supports the same ABI, but it needs --param=riscv-vector-abi. Then the GCC asm is: fn: lui a5,%hi(.LANCHOR0) addi sp,sp,-32 addi a5,a5,%lo(.LANCHOR0) vsetivli zero,4,e64,m2,ta,ma li a4,32 vle64.v v8,0(a5) vse64.v v8,0(sp) vsetvli zero,a4,e8,m1,ta,ma vle8.v v0,0(sp) vand.vi v0,v0,1 addi sp,sp,32 jr ra GCC also returns via v0 when that ABI is enabled. The root cause is the unnecessary load/store: vle64.v v8,0(a5) vse64.v v8,0(sp)
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #4 from JuzheZhong --- I found this is not caused by VLS modes. With --param=riscv-autovec-preference=fixed-vlmax, disabling VLS modes, I also see the unnecessary load/store: fn: lui a5,%hi(.LANCHOR0) addi sp,sp,-32 addi a5,a5,%lo(.LANCHOR0) vl2re64.v v8,0(a5) - ??? unnecessary li a4,32 vs2r.v v8,0(sp) - ??? unnecessary vsetvli zero,a4,e8,m1,ta,ma vle8.v v0,0(sp) vand.vi v0,v0,1 addi sp,sp,32 jr ra The optimized tree is reasonable, but after the "expand" stage, the redundant load and store are produced.
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #5 from JuzheZhong --- Similar issue in GCC 13.2: https://godbolt.org/z/axKc4qj47 fn: lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) ld a1,0(a5) ld a2,8(a5) ld a3,16(a5) ld a4,24(a5) addi sp,sp,-32 sd a1,0(sp) sd a2,8(sp) sd a3,16(sp) sd a4,24(sp) li a5,32 vsetvli zero,a5,e8,m1,ta,ma vle8.v v24,0(sp) vand.vi v24,v24,1 vs1r.v v24,0(a0) addi sp,sp,32 jr ra Multiple ld/sd. It seems that we didn't allow natural constant mem pool
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #6 from Andrew Pinski --- I suspect if __riscv_vle8_v_u8m1 gets lowered into a load on the gimple level, it might just work ... But it gets expanded as: (insn 14 13 0 (set (reg/v:RVVM1QI 134 [ varrD.56526 ]) (if_then_else:RVVM1QI (unspec:RVVMF8BI [ (const_vector:RVVMF8BI repeat [ (const_int 1 [0x1]) ]) (reg:DI 145) (const_int 2 [0x2]) repeated x2 (const_int 0 [0]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (mem:RVVM1QI (reg:DI 144) [0 S[16, 16] A8]) (unspec:RVVM1QI [ (reg:SI 0 zero) ] UNSPEC_VUNDEF))) "/app/example.c":7:23 -1 (nil)) That seems complex.
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2023-10-07 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #7 from Andrew Pinski --- .
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #8 from JuzheZhong --- (In reply to Andrew Pinski from comment #6) > I suspect if __riscv_vle8_v_u8m1 gets lowered into a load on the gimple > level, it might just work ... > > But it gets expanded as: > (insn 14 13 0 (set (reg/v:RVVM1QI 134 [ varrD.56526 ]) > (if_then_else:RVVM1QI (unspec:RVVMF8BI [ > (const_vector:RVVMF8BI repeat [ > (const_int 1 [0x1]) > ]) > (reg:DI 145) > (const_int 2 [0x2]) repeated x2 > (const_int 0 [0]) > (reg:SI 66 vl) > (reg:SI 67 vtype) > ] UNSPEC_VPREDICATE) > (mem:RVVM1QI (reg:DI 144) [0 S[16, 16] A8]) > (unspec:RVVM1QI [ > (reg:SI 0 zero) > ] UNSPEC_VUNDEF))) "/app/example.c":7:23 -1 > (nil)) > > That seems complex. You mean the normal load MEM_REF in GCC ? I don't think we can do that since this intrinsic is defined with mask, len, else value,...etc.
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #9 from JuzheZhong --- (In reply to Andrew Pinski from comment #7) > . Besides, if we remove the data initialization: https://godbolt.org/z/qcjcP7s1c #include vuint8m1_t fn() { uint8_t arr[32]; uint8_t m = 1; vuint8m1_t varr = __riscv_vle8_v_u8m1(arr, 32); vuint8m1_t vand_m = __riscv_vand_vx_u8m1(varr, m, 32); //vbool8_t vmask = __riscv_vreinterpret_v_u8m1_b8(vand_m); return vand_m; } The issue is gone: fn: addi sp,sp,-32 li a5,32 vsetvli zero,a5,e8,m1,ta,ma vle8.v v24,0(sp) vand.vi v24,v24,1 vs1r.v v24,0(a0) addi sp,sp,32 jr ra The codegen is as good as LLVM's. I still think it is something like a constant memory pool issue.
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #10 from Andrew Pinski --- The issues is GCC does prop the load/store for arr into __riscv_vle8_v_u8m1 really.
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #11 from JuzheZhong --- (In reply to Andrew Pinski from comment #10) > The issues is GCC does prop the load/store for arr into __riscv_vle8_v_u8m1 > really. Ok. Do you know why GCC prop load/store for arr into __riscv_vle8_v_u8m1? Just because the __riscv_vle8_v_u8m1 pattern is complex? I don't think we can simplify __riscv_vle8_v_u8m1 pattern since we tried to fuse all feature into a single pattern (A pattern includes multiple features become complex) to reduce the building of insn-emit.cc and insn-opinit.cc
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #12 from JuzheZhong --- Hi, Andrew. I made another attempt: https://godbolt.org/z/heKxcMWsY I changed the load into a normal load of arr: vuint8m1_t varr = *(vuint8m1_t*)arr; Like you said, the issue is gone (as good as LLVM): fn: lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) li a4,32 vl1re8.v v1,0(a5) vsetvli zero,a4,e8,m1,ta,ma vand.vi v1,v1,1 vs1r.v v1,0(a0) ret It seems that GCC can only optimize the normal load? Do we have a chance to optimize such a case (for an unknown load)?
[Bug c/111721] New: RISC-V: Failed to SLP for gather_load in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721 Bug ID: 111721 Summary: RISC-V: Failed to SLP for gather_load in RVV Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: juzhe.zhong at rivai dot ai Target Milestone: --- https://godbolt.org/z/d5TPa5e5s void __attribute__((noipa)) f (int *restrict y, int *restrict x, int *restrict indices, int n) { for (int i = 0; i < n; ++i) { y[i * 2] = x[indices[i * 2]] + 1; y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; } } RVV ASM: f: ble a3,zero,.L5 .L3: vsetvli a5,a3,e32,m1,ta,ma vlseg2e32.v v2,(a2) > VEC_LOAD_LANES vsetivli zero,4,e32,m1,ta,ma vsll.vi v4,v2,2 vsll.vi v1,v3,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v4,(a1),v4 vluxei32.v v1,(a1),v1 vsetivli zero,4,e32,m1,ta,ma slli a4,a5,3 vadd.vi v2,v4,1 vadd.vi v3,v1,2 sub a3,a3,a5 vsetvli zero,a5,e32,m1,ta,ma vsseg2e32.v v2,(a0) > VEC_STORE_LANES add a2,a2,a4 add a0,a0,a4 bne a3,zero,.L3 .L5: ret Compared to aarch64, which can SLP this loop, RVV generates expensive load_lanes/store_lanes. This is because RVV uses MASK_LEN_GATHER_LOAD, for which we currently do not support SLP.
[Bug c/111722] New: gcc generates wrong code with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722 Bug ID: 111722 Summary: gcc generates wrong code with Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: zfigura at codeweavers dot com Target Milestone: ---
[Bug c/111722] gcc generates wrong code with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-10-08 Status|UNCONFIRMED |WAITING --- Comment #1 from Andrew Pinski --- There is nothing in this bug except saying there is wrong code happening (not even with what options or with anything else).
[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722 Zeb Figura changed: What|Removed |Added Version|unknown |13.2.0 Keywords||wrong-code Target||i686-linux-gnu Component|c |target Summary|gcc generates wrong code|manually defined memcpy() |with|and memmove() incorrectly ||handle overlap with -O2 ||-m32 -march=bdver2 --- Comment #2 from Zeb Figura --- Really sorry about that, I managed to accidentally hit the Enter key halfway through writing the title. Here is the actual bug description: -- Wine provides freestanding libraries, including manual definitions of memcpy() and memmove() [1]. Those are defined in C, and while our definitions are *technically* non-compliant C (violating the requirement that the pointers must point to the same object), they should be fine for our targets, and anyway, the case I'm running into is failure to handle overlap where the pointers *do* in fact point into the same object. I can't find fault with the definitions themselves, although I may be missing something. We also, contrary to standards, give memcpy() the semantics of memmove(), because some Windows programs are buggy and make that assumption. We do this by copy-pasting the definition (I'm not sure why we do this rather than just calling one function from the other, but it is what it is). I recently started compiling with -march=native, and found that gcc was failing to correctly handle overlap in memmove. Further investigation revealed that, somehow, memmove() was being incorrectly optimized to *not* check for overlap, while memcpy() remained in its unoptimized form. I ran into this originally with the i686-w64-mingw32 target, but I've adjusted the target to i686-linux-gnu since it happens there too. It does *not* happen on x86_64. [1] https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/ntdll/string.c#l98
[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722 --- Comment #3 from Zeb Figura --- Created attachment 56072 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56072&action=edit testcase Attaching a reduced-ish testcase, that contains the unmodified code of memcpy() and memmove(), plus two callers. The callers seem to be necessary to trigger the incorrect optimization. Compile with '-c -O2 -march=bdver2 -m32'.
[Bug c++/94039] conditional operator fails to use proper overload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94039 Arthur O'Dwyer changed: What|Removed |Added CC||arthur.j.odwyer at gmail dot com --- Comment #3 from Arthur O'Dwyer --- You can also hit this with a lambda, which of course is isomorphic to Andre's test case: void (*a)() = true ? []{} : nullptr; Bug #88458 ("GCC rejects (true ? 0 : nullptr)") might be tangentially related.
[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722 Andrew Pinski changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #4 from Andrew Pinski --- There is no bug here. ICF finds that your definition of memcpy is the same as memmove and merges the 2 and then calls memcpy from your memmove and then inlines the normal memcpy because well it says it is the same. You can just use -fno-builtin to fix the issue by saying memcpy and memmove are not builtins and treat them like normal functions. That fixes the issue by not inlining the target defined memcpy.
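To make the discussion concrete, here is a minimal sketch of a hand-written overlap-safe copy routine of the kind being described. It is not Wine's implementation and not the attached testcase; the name `sketch_memmove` is hypothetical and is chosen so that the compiler does not treat the function as the builtin. As in the Wine code, comparing addresses of unrelated objects goes beyond what ISO C strictly defines, so the comparison is done on integer values.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies n bytes from src to dst, handling overlapping regions. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned wraparound makes this true exactly when dst lies in
       [src, src + n): copying forward would then overwrite source
       bytes before they are read, so copy backward instead. */
    if ((uintptr_t)d - (uintptr_t)s < n) {
        while (n--)
            d[n] = s[n];
    } else {
        for (size_t i = 0; i < n; ++i)
            d[i] = s[i];
    }
    return dst;
}
```

When a routine like this is given the standard name memmove or memcpy in a freestanding library, the comment above suggests building with -fno-builtin so that GCC treats those names as ordinary functions instead of applying builtin semantics to them.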
[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722 --- Comment #5 from Zeb Figura --- (In reply to Andrew Pinski from comment #4) > There is no bug here. > ICF finds that your definition of memcpy is the same as memmove and merges > the 2 and then calls memcpy from your memmove and then inlines the normal > memcpy because well it says it is the same. I suppose I understand this explanation, but it does not feel like a very intuitive behaviour. The ICF part makes sense. The choice to optimize a builtin memcpy/memmove call into a different instruction sequence (which doesn't match the original) also makes sense. I would not really expect these two to be combined in this manner, though. memmove() is not calling builtin memcpy(), it is calling our implementation of memcpy(), which doesn't have the same semantics as builtin memcpy(). [It also seems odd to me that func2() would be replaced with a builtin memcpy() rather than a builtin memmove()?] > You can just use -fno-builtin to fix the issue by saying memcpy and memmove > are not builtins and treat them like normal functions. > > That fixes the issue by not inlining the target defined memcpy. Fair enough, I guess. I suppose that's the right thing to do anyway...
[Bug c++/111723] New: #pragma GCC system_header suppresses errors from narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111723

            Bug ID: 111723
           Summary: #pragma GCC system_header suppresses errors from narrowing conversions
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Keywords: accepts-invalid
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: de34 at live dot cn
  Target Milestone: ---

In the following program, the conversions are narrowing, but only the one for nonstd::in_fun_result is rejected.

When -Wsystem-headers is used, the narrowing conversion for std::ranges::in_fun_result is correctly diagnosed. But if -pedantic-errors and -Wsystem-headers are used together, some standard headers are rejected.

Godbolt link: https://godbolt.org/z/fT7b16eoe

```
#include
#include
#include

namespace nonstd {
template
struct in_fun_result {
    [[no_unique_address]] I in;
    [[no_unique_address]] F fun;

    template
        requires std::convertible_to && std::convertible_to
    constexpr operator in_fun_result() const& {
        return {in, fun};
    }

    template
        requires std::convertible_to && std::convertible_to
    constexpr operator in_fun_result() && {
        return {std::move(in), std::move(fun)};
    }
};
}

int main() {
    std::ranges::in_fun_result r1{};
    std::ranges::in_fun_result r2 = r1; // should be error, but not diagnosed by default
    nonstd::in_fun_result r3{};
    nonstd::in_fun_result r4 = r3; // error, rejected with -pedantic-errors
}
```

It seems to me that #pragma GCC system_header shouldn't suppress errors from narrowing conversions, because the diagnostics are required by the standard.
[Bug c++/111723] #pragma GCC system_header suppresses errors from narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111723

--- Comment #1 from Andrew Pinski ---
I think this is correct behavior really. Note even clang with libc++ has the same behavior ...
[Bug target/106708] [rs6000] 64bit constant generation with oris xoris
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106708

Jiu Fu Guo changed:

What       |Removed     |Added
----------------------------------------
Resolution |---         |FIXED
Status     |UNCONFIRMED |RESOLVED

--- Comment #5 from Jiu Fu Guo ---
The patch is now on trunk.
[Bug target/93176] PPC: inefficient 64-bit constant consecutive ones
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176

Jiu Fu Guo changed:

What       |Removed     |Added
----------------------------------------
Status     |ASSIGNED    |RESOLVED
Resolution |---         |FIXED

--- Comment #13 from Jiu Fu Guo ---
Patches have been committed that use "li/lis;rldicl/rldicr/rldic" to construct constants.
[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

--- Comment #2 from Yi <652023330028 at smail dot nju.edu.cn> ---
We noticed one change between gcc-13.2 and the current gcc-trunk:
https://godbolt.org/z/j5Mnvno9n

gcc-13.2 does not optimize the following code as expected, but gcc-trunk does.

unsigned n1,n2;
void func1(unsigned a){
    if(a<=10 || a>=20) return;
    n2=(a+a)/a;
}

Maybe this change will help solve this issue?
[Bug target/94393] Powerpc suboptimal 64-bit constant comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94393

Jiu Fu Guo changed:

What       |Removed     |Added
----------------------------------------
Resolution |---         |FIXED
CC         |            |guojiufu at gcc dot gnu.org
Status     |NEW         |RESOLVED

--- Comment #9 from Jiu Fu Guo ---
After r14-4470, trunk generates better code for this case.
[Bug target/94395] Powerpc suboptimal 64-bit constant generation near large values with few bits set
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94395

Jiu Fu Guo changed:

What       |Removed     |Added
----------------------------------------
Resolution |---         |FIXED
Status     |NEW         |RESOLVED
CC         |            |guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo ---
After r14-4470, trunk generates better code for this case.
[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

--- Comment #3 from Andrew Pinski ---
For comment #2 from EVRP:
Folding statement: _3 = _2 / a_5(D);
Applying pattern match.pd:934, gimple-match-4.cc:2021
gimple_simplified to _3 = 2;

Which corresponds to the match pattern:
/* Simplify (t * 2) / 2) -> t. */
(for div (trunc_div ceil_div floor_div round_div exact_div)
 (simplify
  (div (mult:c @0 @1) @1)
  (if (ANY_INTEGRAL_TYPE_P (type))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
    @0
#if GIMPLE
    (with {value_range vr0, vr1;}
     (if (INTEGRAL_TYPE_P (type)
          && get_range_query (cfun)->range_of_expr (vr0, @0)
          && get_range_query (cfun)->range_of_expr (vr1, @1)
          && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
      @0))
#endif
[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

Andrew Pinski changed:

What             |Removed     |Added
----------------------------------------------------
Severity         |normal      |enhancement
Ever confirmed   |0           |1
Status           |UNCONFIRMED |NEW
Keywords         |            |missed-optimization
Last reconfirmed |            |2023-10-08

--- Comment #4 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #3)
> For comment #2 from EVRP:
> Folding statement: _3 = _2 / a_5(D);
> Applying pattern match.pd:934, gimple-match-4.cc:2021
> gimple_simplified to _3 = 2;
>
> Which corresponds to the match pattern:
> /* Simplify (t * 2) / 2) -> t. */
> (for div (trunc_div ceil_div floor_div round_div exact_div)
>  (simplify
>   (div (mult:c @0 @1) @1)
>   (if (ANY_INTEGRAL_TYPE_P (type))
>    (if (TYPE_OVERFLOW_UNDEFINED (type))
>     @0
> #if GIMPLE
>     (with {value_range vr0, vr1;}
>      (if (INTEGRAL_TYPE_P (type)
>           && get_range_query (cfun)->range_of_expr (vr0, @0)
>           && get_range_query (cfun)->range_of_expr (vr1, @1)
>           && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
>       @0))
> #endif

Which was improved on the trunk by r14-4082-g55b22a6f630e (and then by r14-4191-gd946fc1c71bd).

I don't know why the original testcase is not causing the above pattern to match though, maybe because a*2 is used twice ...
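As a small illustration of why the range information matters for this fold, here is a sketch in plain C. It is not GCC code: both function names are hypothetical, and the range test only mirrors the spirit of the `overflow_free_p` check in the quoted pattern. For `unsigned`, `(a + a) / a` equals 2 only when `a + a` does not wrap, so the simplification is valid once the known range of `a` rules out wrapping.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The expression from the report, evaluated with ordinary wrapping
   unsigned arithmetic. The caller must pass a nonzero value. */
unsigned double_then_divide(unsigned a)
{
    assert(a != 0);
    return (a + a) / a;
}

/* True when a + a cannot wrap for any a in [lo, hi]. This is an
   illustrative stand-in for a range-based overflow check, not the
   compiler's implementation. */
bool doubling_is_overflow_free(unsigned lo, unsigned hi)
{
    return lo <= hi && hi <= UINT_MAX / 2;
}
```

For the range implied by the guard in `func1` (11 through 19), `doubling_is_overflow_free(11, 19)` holds and `double_then_divide` returns 2 for every value. For `a = UINT_MAX / 2 + 1` the sum wraps to 0, so the quotient is 0 rather than 2.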
[Bug middle-end/111621] [RISC-V] Bad register allocation in vadd.vi may cause operational error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111621

--- Comment #2 from liu xu ---
I'm sorry about that and will keep it in mind next time. The toolchain I used was built from the gcc master branch. One more point: only the masked vadd.vi instruction encounters the above problem; the unmasked form does not.

Looking forward to your reply!
[Bug go/46986] Go is not supported on Darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46986

--- Comment #49 from Sergey Fedorov ---
If someone happens to have some WIP on this, more recent than 2012, please share, if possible.