[Bug gcov-profile/82614] GCOV crashes while parsing gcda file

2017-12-19 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614 --- Comment #15 from PeteVine --- No, that's not it - gcov-dump 6/7 have no problem dumping previous versions. I'm just not sure if the problem with gcov-dump-8 is architecture specific (ARM) or it's something to do with my setup. I'm going to le

[Bug gcov-profile/83509] New: gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
-profile Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Created attachment 42931 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42931&action=edit example gcda file

[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509 --- Comment #3 from PeteVine --- OK, the following command was used to obtain the gcno/gcda files: $ gcc-8 -O3 -fprofile-generate -ftest-coverage sudoku.c && ./a.out Unlike the gcda file, gcno is dumpable with gcov-dump-8.

[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509 --- Comment #4 from PeteVine --- Created attachment 42934 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42934&action=edit corresponding C/gcda/gcno files

[Bug middle-end/58306] Broken profiling for unrar sources: error: corrupted value profile: value profile counter (X out of Y) inconsistent with basic-block count

2016-03-10 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58306 --- Comment #10 from PeteVine --- At least on ARM (gcc 4.9.3) it does work after a clean generate/use cycle.

[Bug c++/70282] New: cc1plus hangs taking 100% CPU

2016-03-18 Thread tulipawn at gmail dot com
++ Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 38004 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38004&action=edit Preprocessed source Target: arm-linux-gnueabihf Configured with: ../src/configure -v

[Bug middle-end/70282] cc1plus hangs taking 100% CPU

2016-03-19 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70282 --- Comment #1 from PeteVine --- It was doing the hogging for over 7h overnight so the chances it would ever complete are rather slim...

[Bug middle-end/70282] cc1plus hangs taking 100% CPU

2016-03-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70282 --- Comment #2 from PeteVine --- I'd like to add that, all else being equal about a month ago, Function.cpp compiled just fine with -O3.

[Bug middle-end/70282] cc1plus hangs taking 100% CPU

2016-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70282 --- Comment #3 from PeteVine --- I've just tried building with g++ 5.3.0 and `-flto` which to my surprise didn't hang unlike 5.3.0 without lto.

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2016-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 --- Comment #14 from PeteVine --- Tried it again using gcc 5.3.0: Building physfs (release) physfs.c ../src/physfs/physfs.c:76:5: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types] &__PHYSFS_Arch

[Bug gcov-profile/70773] New: Profiling makes sudoku solver slower

2016-04-23 Thread tulipawn at gmail dot com
-profile Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 38334 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38334&action=edit .c, .i and .gcda files Compiling the attached sudoku solver the usual

[Bug gcov-profile/70773] Profiling makes sudoku solver slower

2016-04-24 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #1 from PeteVine --- Created attachment 38337 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38337&action=edit the two assembly versions

[Bug middle-end/70773] Profiling makes sudoku solver slower

2016-04-24 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #3 from PeteVine --- Oh, a divmod issue. At least it's not using modsi3 ;) (llvm #26450) BTW, the attached assembly files were generated with lto and NEON enabled but the 20% difference stayed the same. (1s vs 1.2s)

[Bug middle-end/70773] Profiling makes sudoku solver slower

2016-04-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #5 from PeteVine --- The issue seems to be purely about soft division. (I was either using no -mcpu or -mcpu=cortex-a5) Compiling for e.g Cortex-A7, doesn't need to lower any library calls and even though hardware division is not u
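
For readers unfamiliar with the soft-division point, here is a minimal C sketch. It is not taken from the attached sudoku solver; the struct and the function name `cell_to_row_col` are hypothetical. On ARM cores without integer divide instructions, such as Cortex-A5, a `/` and `%` pair with a run-time divisor is typically lowered to a runtime library call (for example `__aeabi_idivmod`), whereas targeting a core with hardware division, such as Cortex-A7, lets the compiler use `sdiv` instead.

```c
#include <assert.h>

/* Hypothetical helper, not from the attached sudoku.c: split a flat
   index into row and column for a grid whose width is known only at
   run time. */
struct row_col { int row; int col; };

struct row_col cell_to_row_col(int index, int width)
{
    struct row_col rc;

    assert(width != 0);
    /* The divisor is not a compile-time constant, so the compiler
       cannot replace the division with a multiply-and-shift sequence. */
    rc.row = index / width;
    rc.col = index % width;
    return rc;
}
```

Compiling a file like this with `-S` for each `-mcpu` value and comparing the output is one way to see whether a library call is present.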

[Bug bootstrap/70896] New: gcc4 ABI compatible bootstrap fails

2016-05-01 Thread tulipawn at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- I've just tried bootstrapping gcc 6.1.0 on ARM Linux with gcc 4.9.3 using the following options: --enable-languages=c,c++,fortran --prefix=/usr/gcc6 --program-suffix=-6 --e

[Bug bootstrap/70896] gcc4 ABI compatible bootstrap fails

2016-05-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70896 --- Comment #2 from PeteVine --- I had this option in my environment FLAGS - is it problematic?

[Bug bootstrap/70896] gcc4 ABI compatible bootstrap fails

2016-05-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70896 --- Comment #3 from PeteVine --- That option never caused me any troubles, btw. FWIW, once I've removed the `--disable-libstdcxx-dual-abi` option, the build started chugging along quite nicely.

[Bug bootstrap/70896] gcc4 ABI compatible bootstrap fails

2016-05-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70896 --- Comment #4 from PeteVine --- Until it failed again during the fortran part: ../../../libgfortran/io/transfer.c: In function ‘bswap_array’: ../../../libgfortran/io/transfer.c:915:25: fatal error: You must enable NEON instructions (e.g. -mfloa

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2016-05-02 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 --- Comment #15 from PeteVine --- After deleting physfs profile data I got a little further (using gcc 6.1.0): ../src/luajit2/src/lj_gc.c: In function ‘lj_mem_newgco’: ../src/luajit2/src/lj_gc.c:816:10: error: corrupted value profile: ic profile

[Bug bootstrap/70896] gcc4 ABI compatible bootstrap fails

2016-05-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70896 --- Comment #6 from PeteVine --- Building with -mfpu=neon has duly helped. The fpu configure option is probably not relevant here (simply setting the default for emitted code) but rather the CFLAGS used for compiling these intrinsics. (-mfpu=vfpv

[Bug bootstrap/70896] gcc4 ABI compatible bootstrap fails

2016-05-24 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70896 --- Comment #8 from PeteVine --- Nice, any idea why `--disable-libstdcxx-dual-abi` should no longer work though?

[Bug bootstrap/70896] gcc4 ABI compatible bootstrap fails

2016-05-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70896 --- Comment #9 from PeteVine --- I've rerun the same `configure` command against gcc-7-20160522 source, only with `--disable-bootstrap`, and all went fine. The latter issue is definitely gone, not sure about the original one. If disabling bootst

[Bug target/79964] Cortex A53 codegen still not optimal

2017-07-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #5 from PeteVine --- Turns out the GCC 8 regression is caused by the +crc switch in -march=armv8-a+crc. Interesting, eh?

[Bug target/79964] Cortex A53 codegen still not optimal

2017-07-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #7 from PeteVine --- Thanks for pointing that out! I was using my bash history to change the CFLAGS and when I was flipping the crc switch I didn't notice I'd picked a version without -frename-registers, hence this wrong conclusion :)

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #11 from PeteVine --- I've just retested gcc7 on both ARM platforms. AArch64 gets a 3% improvement now, while ARMv7 reproduces the issue, just as before. I'm compiling/profiling on a Cortex A5 which could be the main reason behind a

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #12 from PeteVine --- It even reproduces the following way: I built an instrumented ARMv7 binary natively, ran it on a Cortex-A53, copied the gcda file back, recompiled with -fprofile-use and got the same 20% slowdown. Surely, that

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 --- Comment #36 from PeteVine --- Created attachment 41239 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41239&action=edit Assembly files produced with -fverbose-asm

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #13 from PeteVine --- Created attachment 41240 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41240&action=edit Assembly files produced with -fverbose-asm

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 PeteVine changed: What|Removed |Added Attachment #41239|0 |1 is obsolete|

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #15 from PeteVine --- I don't have a cross-compiler built/installed. If you're positive the bug doesn't reproduce on your end (targeting generic or A5 codegen), then maybe it's about some interaction between gcc instrumentation and t

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #16 from PeteVine --- Also, I'd like to repeat the fact using -mcpu=cortex-a7 fixes the issue (no library calls present). Incidentally, having run that A7 profiled binary on a Cortex-A53, I'm seeing a 10% hit compared to a vanilla A

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #18 from PeteVine --- > Well that sounds like the same issue. > Note -fprofile-generate simply inserts counters in the generated code. In > fact the generated code is practically identical between Cortex-A5 and > Cortex-A7. As lon

[Bug target/79964] Cortex A53 codegen still not optimal

2017-04-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #2 from PeteVine --- I can confirm the first part of the issue gets fixed with this patch: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html but there's a regression in gcc8 concerning the second part. (or rather the workarou

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-05-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #5 from PeteVine --- Unchanged in gcc version 8.0.0 20170501.

[Bug target/79964] Cortex A53 codegen still not optimal

2017-05-02 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #4 from PeteVine --- > I'm not sure what you're trying to measure here - it's very confusing with > multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it > related to -fipa-pta or is that not relevant? All the

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-06-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #7 from PeteVine --- Thanks, I promise to test any patches without delay :)

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-06-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #8 from PeteVine --- I've just confirmed the result on a newer Linux distribution (Ubuntu 16.04) and the difference between VFPv3 and v4 is clearly there (2330 vs 2560) using gcc 5.4. Unless the CPU itself requires an erratum, that

[Bug c++/69481] ICE with C++11 alias using with templates

2016-12-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481 PeteVine changed: What|Removed |Added CC||tulipawn at gmail dot com --- Comment #10

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-12-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added CC||ktkachov at gcc dot gnu.org --- Comment #12 f

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-12-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #13 from PeteVine --- Also, could these (sample) warnings actually matter when using ld.gold? NB, lra-constraints.c features in the previously provided backtrace: ../../libdecnumber/decNumber.c:3582:0: note: code may be misoptimized

[Bug middle-end/78994] New: -Ofast makes aarch64 C++ benchmark slower

2017-01-04 Thread tulipawn at gmail dot com
-end Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40463 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40463&action=edit Preprocessed source + assembly files After make && ./build

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #3 from PeteVine --- Hey, that works for me too! (62565 vs 70758 in favour of -Ofast). Usefully strange :)

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #4 from PeteVine --- I'm delighted to report **not** targeting Cortex-A53 actually incurs a performance penalty sometimes ;) http://openbenchmarking.org/result/1701128-TA-GCCCOMPAR79

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #6 from PeteVine --- It's possible I already had that patch included in my build, but in case I didn't, here's a quick addition to the previous result: http://openbenchmarking.org/result/1701143-TA-GCCCOMPAR66 The c-ray thunderx re

[Bug target/79105] New: Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-16 Thread tulipawn at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- As the title says, many results seem to suffer from switching to -mfpu=neon, etc. http://openbenchmarking.org/result/1701165-TA-1701143TA78

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 --- Comment #1 from PeteVine --- Updated to include an explicit -mfpu=neon-vfpv4. http://openbenchmarking.org/result/1701179-TA-1701143TA49 Not sure if -mcpu=cortex-a5 and -mfpu=neon shouldn't have implied VFPv4 but the explicit addition has f

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 --- Comment #2 from PeteVine --- $ gcc -v Configured with: ../configure -v --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-gettext --e

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-01-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #21 from PeteVine --- It would be great if https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 could get squashed in one fell swoop.

[Bug target/79239] New: [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40586 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40586&action=edit Preprocessed sour

[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 --- Comment #2 from PeteVine --- gcc -O2 or above elicits the ICE, configured with: --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-ge

[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 --- Comment #5 from PeteVine --- Yes, this came from the gl4es project, and compiling the whole thing normally, only gcc7 is affected.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #9 from PeteVine --- @jgreenhalgh Please have a look at the profiled assembly for both fast and slow codegen. (attached) According to @aldyh's bisection in #68664 this probably isn't the same issue.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #11 from PeteVine --- Super cool, thanks! That makes the OP a true prophet before his time ;)

[Bug target/79370] New: Cortex-A7 hardware division switched on for -mcpu but not -mtune

2017-02-03 Thread tulipawn at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40667 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40667&action=edit Preprocessed source Compil

[Bug target/79370] Cortex-A7 hardware division switched on for -mcpu but not -mtune

2017-02-03 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79370 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/79480] New: -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- The gl-117 binary (source link attached) compiled with: -mcpu=cortex-a5 -O3 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize crashes with a SIGBUS plus this kernel

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #2 from PeteVine --- OK, having been built with: -mcpu=cortex-a5 -O3 -ffast-math -marm -fomit-frame-pointer -fipa-pta -mfpu=neon-vfpv4 -ftree-vectorize -flto -fsanitize=undefined doesn't crash but prints many errors, e.g.: 3ds.cpp

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #3 from PeteVine --- That's the same command line that leads to an immediate crash (uninstrumented).

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #4 from PeteVine --- Whereas `-fsanitize=address` aborts all the same: ==28821==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0xaf012100 #0 0xb6af76fb in operator delete(void*, unsigned i

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 PeteVine changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #12 from PeteVine --- Nice, PR68664 patch has fixed the issue. FWIW, unlike previously, running on a Cortex-A53, showed perfect alignment with core type (-mfpu=vfpv3) on the first run: Cortex-A8 Rendering took: 1 seconds (1801 milli

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #24 from PeteVine --- I did a git pull and restarted the build so unless something didn't get reconfigured, it definitely should've been included. If you see the improvement, never mind then.

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #25 from PeteVine --- The original issue never mentioned -Ofast or -ffast-math and I see no difference at -Ofast, indeed: http://openbenchmarking.org/result/1702153-RI-CRAYFAST424 @jgreenhalgh Can you confirm there's no regression @

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #26 from PeteVine --- OK, maybe this SoC is kinky, I give up: http://openbenchmarking.org/result/1702154-RI-CRAYFAST326

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #28 from PeteVine --- Lesson learnt, thanks! If you look at the last -Ofast result (or 1702153-RI-CRAYFAST467), the suspect difference is there (the compiler had been rebuilt from scratch with all the patches), and I even managed to

[Bug target/79581] New: VFP4 slower than VFP3 in C-ray

2017-02-17 Thread tulipawn at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40762 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40762&action=edit preprocessed source $ gcc -marm -Ofast -mcpu=cortex-a5 -mfpu=vfpv3 c-ray-mt.i -lm -l

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 PeteVine changed: What|Removed |Added Target||armv7 --- Comment #1 from PeteVine --- Disti

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-18 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #2 from PeteVine --- Created attachment 40769 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40769&action=edit sphract The other file required to run the benchmark straight from bugzilla! :)

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #4 from PeteVine --- > Judging by your -mcpu option is this on a Cortex-A5? Yes, if you look at the results on a Cortex A53 running armv7 code, it doesn't reproduce either, and A5-codegen is king :) (hopefully due to in-order design

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-22 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 PeteVine changed: What|Removed |Added CC||tulipawn at gmail dot com --- Comment #5

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-22 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 --- Comment #6 from PeteVine --- But that's related to -mcpu=cortex-a53 again, so never mind I guess.

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 --- Comment #13 from PeteVine --- Still, the 5% regression must have happened very recently. The fast gcc was built on 20170220 and the slow one yesterday, using the original patch. Once again, switching away from Cortex-A53 codegen restores the
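
For reference, a minimal C sketch of the expression named in the bug title. Only `(x*x)/200` comes from the title; the function names are hypothetical and this is not the benchmark discussed in the thread. Signed overflow is undefined in C, so a compiler may assume `x * x` is non-negative, and for non-negative values signed and unsigned division give the same quotient. That appears to be the kind of simplification the title is about, though the truncated comments above do not spell it out.

```c
#include <assert.h>

/* Hypothetical names; only the expression (x*x)/200 is from the bug title. */

/* Signed form, as written in the title. */
int square_div_signed(int x)
{
    return (x * x) / 200;
}

/* Same computation with the division carried out on unsigned values. */
unsigned square_div_unsigned(int x)
{
    return (unsigned)(x * x) / 200u;
}

/* Valid only while x * x does not overflow int
   (|x| <= 46340 for a 32-bit int). */
void check_same_quotient(int x)
{
    assert((unsigned)square_div_signed(x) == square_div_unsigned(x));
}
```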

[Bug middle-end/79712] New: Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40829 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40829&action=edit preprocessed source It seems clang is probably doing a

[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #1 from PeteVine --- Created attachment 40830 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40830&action=edit C source

[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #2 from PeteVine --- Created attachment 40831 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40831&action=edit inputs

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #4 from PeteVine --- It's a gcc version 7.0.1 20170220 (experimental) (GCC) configured with: --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #5 from PeteVine --- Clang however gets no further improvement from -funroll-loops meaning a simple `-O3 -mcpu=cortex-a53` produces much better performance than gcc without unrolling.

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #6 from PeteVine --- The difference between clang and gcc is even greater on ARMv7 Cortex A5 but there's no way to catch up through unrolling (no effect): gcc version 7.0.1 20170225: 1227.2 Kpos/sec clang 3.6:

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #8 from PeteVine --- Seeing as unrolling does such a great job on aarch64, surpassing clang, should we leave the ARM issue bunched together with this one?

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-03-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #29 from PeteVine --- I used a different distribution image (binutils 2.25, no --fix-cortex-a53-835769 option) but the results haven't changed (thunderx tuning must have improved though as it stopped offering any benefit over A53): h

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #30 from PeteVine --- Or rather, the difference observed in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468#c7 is still there @ -Ofast, but the Cortex-A53 result is in the same range now. I'll have to investigate the effect of -

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #31 from PeteVine --- Indeed, that was it! I've probably found the source of my A53 issues: http://openbenchmarking.org/result/1703040-RI-CRAYERRAT99 This means comment #29 exposes a different issue and Cortex A53 codegen still is s

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #7 from PeteVine --- Not affected by -mno-fix-cortex-a53-843419 which gives the issue full validity. -Ofast pessimizes Cortex A53 codegen somehow and switching to e.g. -mcpu=cortex-a57 fixes it. (tested on trunk)

[Bug middle-end/77546] [6/7 regression] C++ software renderer performance drop

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77546 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2017-03-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2017-03-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #15 from PeteVine --- Sorry wrong number :) I meant --enable-fix-cortex-a53-843419

[Bug fortran/79933] New: gfortran no longer able to compile dolfyn benchmark

2017-03-06 Thread tulipawn at gmail dot com
Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40900 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40900&action=edit fortran source Used to work fine a few months ago. $ gfortra

[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)

2017-03-07 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/79964] New: Cortex A53 codegen still not optimal

2017-03-08 Thread tulipawn at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Two data points: - the integer benchmark from PR79665 runs about 7% slower with -mcpu=cortex-a53 vs other targets, equalling generic codegen. It was still indistinguishable on

[Bug ada/80007] New: --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
Priority: P3 Component: ada Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Never tried bootstrapping ada this way before (a full bootstrap succeeded a few days ago), so I'm not entirely sure if t

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #2 from PeteVine --- Right, I definitely used the same setup a few days ago minus --disable-bootstrap.

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #4 from PeteVine --- > Can you try again without --disable-bootstrap ? It's GNAT 5.4.0. OK, I'll try again.

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #5 from PeteVine --- The repeated full ada bootstrap was successful at the same revision, using identical flags and GNAT 5.4.0. On the other hand, the failing build prints two warnings during the ada part: g-debpoo.adb: In function

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #6 from PeteVine --- Turns out it's a miscompilation bug as I was using the same set of C(XX)FLAGS that work fine for those other languages. Removing the -fomit-frame-pointer flag while leaving the rest unchanged (-O3 -mtune=cortex-

[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-13 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #8 from PeteVine --- It was about -O3 -fomit-frame-pointer, but yeah, I don't care one bit either. Just make sure `--enable-languages=ada` works. (c++ is not being inferred so you end up with no xg++)

[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-13 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #9 from PeteVine --- Correction, it was about -fomit-frame-pointer period! Setting the environment C(XX)FLAGS to that flag alone triggers the bug.

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #9 from PeteVine --- Well, yes, that fixes the -Ofast issue for me: -mcpu=cortex-a53 -frename-registers iir: 65952 ns per loop iir_2: 63098 ns per loop -mcpu=cortex-a57 (-frename-registers) iir: 62839 ns per loop iir_2: 6267

[Bug target/79964] Cortex A53 codegen still not optimal

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 PeteVine changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #1 from

[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark

2017-11-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933 --- Comment #3 from PeteVine --- In gcc 8, -std=f2003 is required to overcome the issue but there's another failure later on: gfortran -c -O2 -std=f2003 solverinterface.f90 solverinterface.f90:108:9: real*4 fpar(16) ! hint by Shibo
