[Bug c/97083] New: __builtin_lround and __builtin_llround not replaced with fcvtas on aarch64

2020-09-17 Thread linux at carewolf dot com
: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- On aarch64 calling __builtin_round and casting the result to int or long long uses a single fcvtas instruction, but using

[Bug c++/51033] generic vector subscript and shuffle support was not added to C++

2013-02-17 Thread linux at carewolf dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com

[Bug c++/51033] generic vector subscript and shuffle support was not added to C++

2013-02-17 Thread linux at carewolf dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #32 from Allan Jensen 2013-02-17 15:23:49 UTC --- (In reply to comment #31) > (In reply to comment #30) > > Another example is binary operators between scalar and vectors. In C the > > scalar > > is automatically treated as a

[Bug middle-end/53460] New: Internal compiler error: in calc_dfs_tree, at dominance.c:395

2012-05-23 Thread linux at carewolf dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53460 Bug #: 53460 Summary: Internal compiler error: in calc_dfs_tree, at dominance.c:395 Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED

[Bug middle-end/53460] Internal compiler error: in calc_dfs_tree, at dominance.c:395

2012-05-23 Thread linux at carewolf dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53460 --- Comment #1 from Allan Jensen 2012-05-23 15:34:35 UTC --- Created attachment 27481 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27481 FontFastPath.ii.gz

[Bug middle-end/53460] Internal compiler error: in calc_dfs_tree, at dominance.c:395

2012-05-23 Thread linux at carewolf dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53460 --- Comment #2 from Allan Jensen 2012-05-23 15:37:32 UTC --- It appears I am not allowed to make more than one attachment so you will have to do with one example. Here is the console output: Using built-in specs. COLLECT_GCC=/usr/bin/g++ Target:

[Bug c++/48026] #pragma optimize ignored for C++

2012-07-25 Thread linux at carewolf dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48026 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #2

[Bug target/59422] New: Support more targets for function multi versioning

2013-12-08 Thread linux at carewolf dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Created attachment 31399 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31399&action=edit Patch Trying to compile a function with an "xop" multiversion fails with a &qu

[Bug tree-optimization/78394] False positives of maybe-uninitialized with -Og

2018-12-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394 --- Comment #9 from Allan Jensen --- I see two other level effort ways to possibly fix the issue. Disable the warning like for -O0 as it is buggy, or if we believe it still has some value in -Og even with the false positives, just removing it fr

[Bug c++/88475] -E -fdirectives-only clashes with raw strings

2019-01-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88475 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #1

[Bug c++/88475] -E -fdirectives-only clashes with raw strings

2019-01-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88475 --- Comment #3 from Allan Jensen --- No, it has to be a raw-string to be valid. https://wandbox.org/permlink/I0yF3U3OXoH6LbIM

[Bug target/89057] New: GCC 7->8 regression: ARM(64) ld3 st4 less optimized

2019-01-25 Thread linux at carewolf dot com
P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- When using the vld3_u8 and vst4_u8 intrinsics, the code generated with gcc8 is less efficient than the code generated with gcc7. One has 3 moves, and the othe

[Bug target/89058] New: GCC 7->8 regression: ARM(64) ld3 st4 less optimized

2019-01-25 Thread linux at carewolf dot com
P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- When using the vld3_u8 and vst4_u8 intrinsics, the code generated with gcc8 is less efficient than the code generated with gcc7. One has 3 moves, and the other 9 mo

[Bug target/89058] GCC 7->8 regression: ARM(64) ld3 st4 less optimized

2019-01-25 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89058 --- Comment #2 from Allan Jensen --- Oops, sorry.

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2019-01-30 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057 --- Comment #4 from Allan Jensen --- While that change might have made things worse, the real problem is probably that the registers for those instructions are loaded and stored using intrinsics, so proper register allocation and combining can't b

[Bug tree-optimization/85406] New: Unnecessary blend when vectorizing short-cutted calculations

2018-04-15 Thread linux at carewolf dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- If you have something like this: inline unsigned qPremultiply(unsigned x) { const unsigned a = x >> 24; if (a

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-15 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406 --- Comment #1 from Allan Jensen --- Note it might be hard to figure out for the compiler that the result for a==255 will leave the input unchanged, but you can observe the same if you instead test for a == 0 (and return 0). In that case the comp

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-20 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406 --- Comment #3 from Allan Jensen --- You need to add the loop around it void test(unsigned *buffer, int count) { for (int i = 0; i < count; ++i) buffer[i] = qPremultiply(buffer[i]); }

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-20 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406 --- Comment #4 from Allan Jensen --- Created attachment 43995 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43995&action=edit gccbug85406.cpp This version compiles with a pcmpeqd and pandn instead of a blend, but the principle is the same

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-20 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406 --- Comment #6 from Allan Jensen --- Yeah, the a==255 was actually not a case I would expect the compiler to solve, which is why I changed the example to the a==0 case, which should be solvable using existing constant propagation. Note you can

[Bug rtl-optimization/85551] New: No strength reduction of modulo and integer division

2018-04-27 Thread linux at carewolf dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 44030 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44030&action=edit strmod.cpp Many simple loops using modulo naively

[Bug rtl-optimization/85551] No strength reduction of modulo and integer division

2018-04-27 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85551 --- Comment #1 from Allan Jensen --- I also stumbled on this old motivating article when I tried googling the concept: http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TM-600.pdf

[Bug rtl-optimization/85551] No strength reduction of modulo and integer division

2018-04-27 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85551 --- Comment #2 from Allan Jensen --- Hmm.. I appear to have made unsafe assumptions in the mod_opt cases. The first safe optimization version would then be: void mod_opt(int *a, int count, int stride, unsigned width) { int pos_opt = 0; f

[Bug tree-optimization/85692] New: Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- If a vector initialization is using elements from only a single vector source, it will be optimized as a shuffle, but if it is using elements from two, it
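
The two situations described above can be sketched with GCC's generic vector extension. This is only an illustration, not the attached test case: the type name `v4si` and the functions `reverse` and `low_halves` are hypothetical, and the choice of elements is arbitrary.

```c
/* Generic vector type (GCC/Clang extension): four 32-bit ints. */
typedef int v4si __attribute__((vector_size(16)));

/* Every element comes from the single source vector a. */
v4si reverse(v4si a)
{
    v4si r = { a[3], a[2], a[1], a[0] };
    return r;
}

/* Elements come from two source vectors, a and b. */
v4si low_halves(v4si a, v4si b)
{
    v4si r = { a[0], a[1], b[0], b[1] };
    return r;
}
```

Comparing the assembler output of the two functions (for example with `-O2 -S`) is one way to check whether a given compiler version treats both initializations as permutes.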

[Bug tree-optimization/85692] Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692 --- Comment #1 from Allan Jensen --- Created attachment 44084 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44084&action=edit construct.cc Motivating examples. Compile with -msse4.1 for the second case.

[Bug tree-optimization/85692] Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692 --- Comment #4 from Allan Jensen --- Note I already posted a patch on gcc-patches myself. It is very similar to yours

[Bug tree-optimization/85692] Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692 --- Comment #5 from Allan Jensen --- Created attachment 44088 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44088&action=edit suggested patch

[Bug rtl-optimization/85950] New: Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-28 Thread linux at carewolf dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- When SSE4.1 is available, std::floor, std::ceil and their C counterparts are inlined to being a single roundss

[Bug rtl-optimization/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950 --- Comment #1 from Allan Jensen --- Sorry, forget the example above. I will attach the real code that triggers it. Note it does not trigger with -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math, so it is somethin

[Bug rtl-optimization/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950 --- Comment #2 from Allan Jensen --- Created attachment 44196 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44196&action=edit Example To trigger it, you need both a rounding conversion and a conversion to integer.

[Bug target/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-29 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950 --- Comment #4 from Allan Jensen --- Btw, I found this while trying to figure out why std::round() wasn't also optimized to a single roundss instruction, is that just a missing optimization or is there a quirk about that that makes them not fit?

[Bug target/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-29 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950 --- Comment #6 from Allan Jensen --- Btw, I have tested and the patch works for my cases.

[Bug tree-optimization/83847] [8 Regression] ICE in vectorizable_load, at tree-vect-stmts.c:7365

2018-01-16 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83847 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #3

[Bug tree-optimization/83847] [8 Regression] ICE in vectorizable_load, at tree-vect-stmts.c:7365

2018-01-16 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83847 --- Comment #4 from Allan Jensen --- Full output from the ICE: during GIMPLE pass: vect /src/qt5/qtbase/src/corelib/kernel/qmetaobjectbuilder.cpp: In function ‘int buildMetaObject(QMetaObjectBuilderPrivate*, char*, int, bool)’: /src/qt5/qtbase/s

[Bug middle-end/84019] New: [7/8 regression] ICE under fold-const

2018-01-24 Thread linux at carewolf dot com
-end Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- ICE when compiling Chromium in QtWebEngine under certain conditions. With gcc 8: during GIMPLE pass: fre ../../../../../qtwebengine/src/3rdparty/chromium/third_party/WebKit

[Bug middle-end/84019] [7/8 regression] ICE under fold-const

2018-01-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019 --- Comment #1 from Allan Jensen --- First line of the ICE (the only line reported by system gcc) ../../src/init2.c:52: MPFR assertion failed: p >= 2 && p <= ((mpfr_prec_t)((mpfr_uprec_t)(~(mpfr_uprec_t)0)>>1))

[Bug middle-end/84019] [7/8 regression] ICE under fold-const

2018-01-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019 --- Comment #2 from Allan Jensen --- I can provide the intermediate code, but I haven't created a reduced test-case, so it would be big.

[Bug lto/63688] all_symbols_read_handler: Assertion `lto_wrapper_argv' failed.

2018-02-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63688 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #2

[Bug debug/86582] [debug] vla size reported as 0 at Og

2019-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86582 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #3

[Bug debug/68836] GCC can't properly emit debug info for function arguments in a back-trace when using -Og

2019-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68836 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #8

[Bug c++/88475] -E -fdirectives-only clashes with raw strings

2019-03-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88475 --- Comment #5 from Allan Jensen --- Note, you can fix the conflict with icecc by setting ICECC_REMOTE_CPP=0. Icecc will only do this to enable the remote cpp feature.

[Bug rtl-optimization/43147] SSE shuffle merge

2019-05-19 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #9

[Bug c/66970] Add __has_builtin() macro

2019-07-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970 --- Comment #19 from Allan Jensen --- (In reply to felix from comment #18) > So even if this feature is adopted as-is, it will necessitate some changes > in the documentation. And while I can sympathise with claims that this > behaviour is surpri

[Bug c++/58407] [C++11] Should warn about deprecated implicit generation of copy constructor/assignment

2018-10-02 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58407 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment

[Bug middle-end/84019] [7/8 regression] ICE in fold-const of std::complex division

2018-02-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019 --- Comment #8 from Allan Jensen --- Yes, I will take a look again and produce the intermediate results

[Bug middle-end/84019] [7/8 regression] ICE in fold-const of std::complex division

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019 Allan Jensen changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug middle-end/84718] New: [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 43566 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43566&action=edit gcc log Using latest gcc 8 updated today I hit an internal compile

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718 --- Comment #1 from Allan Jensen --- Created attachment 43567 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43567&action=edit spdy_alt_svc_wire_format.s

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718 --- Comment #2 from Allan Jensen --- Created attachment 43568 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43568&action=edit spdy_alt_svc_wire_format.ii.gz

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718 --- Comment #4 from Allan Jensen --- I will update my gcc build and check

[Bug tree-optimization/84670] [8 Regression] ICE: in compute_antic_aux, at tree-ssa-pre.c:2148 with -O2 -fno-tree-dominator-opts

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84670 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718 Allan Jensen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug tree-optimization/84777] New: -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Neither the command-line flag -ftree-loop-vectorize nor -fopenmp combined with "#pragma omp simd" works when -Os is active. It seems that it when specified manually vec
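
A minimal sketch of the kind of loop the report is about, assuming a simple element-wise kernel. The function `scale` and its parameters are hypothetical and not taken from the bug. The report's claim is that a loop marked this way is still left unvectorized when built with -Os, even with -fopenmp or -ftree-loop-vectorize given explicitly.

```c
/* Hypothetical kernel: the loop is explicitly marked for SIMD execution.
   Without -fopenmp the pragma is ignored and the loop runs as plain scalar code. */
void scale(float *dst, const float *src, float k, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

Building such a file with -Os plus -fopt-info-vec-missed, the diagnostic flag used later in this thread, should show why the vectorizer skipped the loop.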

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777 --- Comment #4 from Allan Jensen --- I will try the patch. I just tried -fopt-info-vec-missed and the message reported for every loop was: note: not vectorized: latch block not empty. note: bad loop form.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777 --- Comment #6 from Allan Jensen --- Great. Your patch worked with 90% of the marked loops! The remaining report things like this with -fopt-info-vec-missed: note: not vectorized: relevant stmt not supported: idisty.872_437 = (unsigned int) idi

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777 --- Comment #8 from Allan Jensen --- Yes, those I say are missing are compared to -O2. I was investigating this in relation to Qt. We either build these files with -O3, or with -Os for customers that are binary-size sensitive. Since some of the im

[Bug other/70118] New: UBSan claims misaligned access in SSE intrinsics

2016-03-07 Thread linux at carewolf dot com
: other Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- The intrinsics _mm_loadl_epi64 and _mm_storel_epi64 trigger UBSan warnings on unaligned access because the intrinsics definitions in emmintrin.h are using __m64 and

[Bug rtl-optimization/81174] New: bswap not recognized in |= statement

2017-06-22 Thread linux at carewolf dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 41610 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41610&action=edit bswap-issue.cc In writing a big-endian bitfield accessor I notic
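
The attachment is not reproduced in this listing, so the following is only a guess at the general shape of such code: a big-endian 32-bit load assembled with `|=` statements. The name `load_be32` is hypothetical. On a little-endian target, an optimizer that recognizes the idiom could turn it into one load plus a byte swap.

```c
#include <stdint.h>

/* Hypothetical accessor: read a 32-bit big-endian value byte by byte,
   accumulating the result with |= statements. */
uint32_t load_be32(const unsigned char *p)
{
    uint32_t v = 0;
    v |= (uint32_t)p[0] << 24;
    v |= (uint32_t)p[1] << 16;
    v |= (uint32_t)p[2] << 8;
    v |= (uint32_t)p[3];
    return v;
}
```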

[Bug rtl-optimization/81174] bswap not recognized in |= statement

2017-06-22 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81174 Allan Jensen changed: What|Removed |Added Version|6.1.1 |7.1.0 --- Comment #1 from Allan Jensen -

[Bug target/70118] UBSan claims misaligned access in SSE intrinsics

2016-11-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118 --- Comment #2 from Allan Jensen --- I believe this to be fixed by r239889

[Bug target/70118] UBSan claims misaligned access in SSE intrinsics

2016-11-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118 --- Comment #3 from Allan Jensen --- Or r217608

[Bug target/70118] UBSan claims misaligned access in SSE intrinsics

2016-11-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118 --- Comment #4 from Allan Jensen --- Created attachment 40130 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40130&action=edit Proposed patch On closer inspection, we are only almost there, two minor changes are still needed. (testing patc

[Bug target/70118] UBSan claims misaligned access in SSE intrinsics

2016-11-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118 Allan Jensen changed: What|Removed |Added Attachment #40130|0 |1 is obsolete|

[Bug target/31667] Integer extensions vectorization could be improved

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #3

[Bug target/31667] Integer extensions vectorization could be improved

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667 --- Comment #4 from Allan Jensen --- (In reply to Allan Jensen from comment #3) > Gcc 5 and 6 produces code with pmovzx when compiling the example with -O3 > -msse4.1 > > I assume this can be closed. Note like comment 1 says, it will not use a

[Bug target/78563] New: SSE4.1 pmovzx shuffle pattern not recognized

2016-11-28 Thread linux at carewolf dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- An unpack pattern with a 0 constant is neither folded nor recognized as a pmovzx instruction. SSE2 code: _mm_unpacklo_epi32(X, _mm_setzero_si128()) GCC code

[Bug target/78563] SSE4.1 pmovzx shuffle pattern not recognized

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78563 --- Comment #1 from Allan Jensen --- Created attachment 40177 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40177&action=edit Test

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #7

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754 --- Comment #8 from Allan Jensen --- Note this happens with -mavx2, but not with -march=haswell. It appears the tuning is a bit too pessimistic when avx2 is enabled on generic x64.

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754 --- Comment #10 from Allan Jensen --- No I mean it triggers when you compile with -mavx2, it is solved with -march=haswell. It appears the issue is the tune flag X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL is set for all processors that support avx2,

[Bug target/78762] New: Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 40295 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40295&action=edit Test In gcc 7 when not optimiz

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 --- Comment #1 from Allan Jensen --- Created attachment 40296 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40296&action=edit Test compiled with -mavx2

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 --- Comment #2 from Allan Jensen --- Created attachment 40297 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40297&action=edit Test compiled with -march=haswell

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 --- Comment #3 from Allan Jensen --- Created attachment 40298 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40298&action=edit Test compiled with gcc 6

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754 --- Comment #11 from Allan Jensen --- I think the issue I noted is completely separate from this one, so I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 to deal with it. I think this one could probably be closed though.

[Bug target/70118] UBSan claims misaligned access in SSE intrinsics

2016-12-12 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118 Allan Jensen changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/59874] Missing builtin (__builtin_clzs) when compiling with g++

2016-12-12 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #5

[Bug c/66970] Add __has_builtin() macro

2016-12-12 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment #5

[Bug target/59874] Missing builtin (__builtin_clzs) when compiling with g++

2016-12-13 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874 --- Comment #8 from Allan Jensen --- Thanks that looks good. I will test it when I have a chance. I am changing the Qt sources to not assume the presence of __builtin_clzs when __BMI__ is defined. It can use __builtin_clz() and __builtin_ctz()-16
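
One way a fallback along those lines might look for the leading-zero case, as a sketch only. The helper name `clz16` is hypothetical, and the code assumes `unsigned int` is 32 bits wide; it is not the actual Qt change.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zero bits of a non-zero 16-bit value
   without relying on __builtin_clzs. */
static inline int clz16(uint16_t v)
{
    /* __builtin_clz counts over a 32-bit unsigned int (assumed width), so the
       16 extra leading zero bits are subtracted. The result is undefined for
       v == 0, as with __builtin_clz itself. */
    return __builtin_clz((unsigned int)v) - 16;
}
```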

[Bug target/59874] Missing builtin (__builtin_clzs) when compiling with g++

2016-12-15 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874 --- Comment #15 from Allan Jensen --- Yes, the patch works and it also evaluates at compile time.

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 --- Comment #10 from Allan Jensen --- That would solve the problem, but also leave the behavior as Sandybridge only (nehalem didn't have AVX).

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 --- Comment #11 from Allan Jensen --- Btw, did you benchmark store splitting on AMD? It is also enabled for BDVER and ZNVER1.

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 --- Comment #13 from Allan Jensen --- The question is if the unaligned store is still slow on Excavator and Ryzen which support AVX2. As far as I understand the bulldozer architectures just prefer split AVX because it was basically emulating them

[Bug target/78921] New: SSE/AVX shuffle intrinsics uses builtins instead of __builtin_shuffle

2016-12-24 Thread linux at carewolf dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- The intrinsics for x86 SIMD shuffle instructions could be redeclared using __builtin_shuffle. This would help folding and better

[Bug target/80040] SSE4.1 ptest not always merged

2017-03-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040 --- Comment #1 from Allan Jensen --- Created attachment 40972 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40972&action=edit Assembler output

[Bug target/80040] SSE4.1 ptest not always merged

2017-03-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040 --- Comment #2 from Allan Jensen --- Created attachment 40973 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40973&action=edit Assembler output from gcc 6 Easier to compare

[Bug target/80040] New: SSE4.1 ptest not always merged

2017-03-14 Thread linux at carewolf dot com
Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 40971 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40971&action=edit Example The intrinsics _mm_testz_si128 and _mm_testc_si128 both map to the exa

[Bug ipa/80277] New: ipa-icf missing overlooking functions

2017-04-01 Thread linux at carewolf dot com
Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 41100 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41100&action=edit icf.cc Several functions that produce identical assembler are not merged by ipa

[Bug tree-optimization/82426] New: Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 42299 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42299&action=edit vectslp.cpp The attached example is a simple matrix multipl

[Bug tree-optimization/82426] Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426 --- Comment #1 from Allan Jensen --- Created attachment 42300 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42300&action=edit Assembler output with -O3

[Bug tree-optimization/82426] Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426 --- Comment #2 from Allan Jensen --- Created attachment 42301 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42301&action=edit Assembler output with -Os -ftree-slp-vectorize

[Bug tree-optimization/82426] Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426 --- Comment #3 from Allan Jensen --- Note it appears the fact it can do it at all in -Os is new in gcc 7

[Bug c++/77796] New: tautological compare warning emitted for inherited static method comparisons

2016-09-29 Thread linux at carewolf dot com
: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- We have been running into several issues with the tautological compare warning in qtdeclarative, first there was https

[Bug tree-optimization/77902] New: Auto-vectorizes epilogue loops or manually vectorized functions

2016-10-08 Thread linux at carewolf dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 39774 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39774&action=edit Example that trig

[Bug tree-optimization/77902] Auto-vectorizes epilogue loops of manually vectorized functions

2016-10-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902 --- Comment #1 from Allan Jensen --- Further experimentation shows that GCC can sometimes reason about the remaining range but does so inconsistently. For instance this example also fails: int result = 0; for (; count >= 4; count -= 4) {
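
The snippet above is cut off, so the loop bodies below are invented for illustration. Only the loop header `for (; count >= 4; count -= 4)` comes from the comment, and `sum_bytes` is a hypothetical name. The sketch shows the pattern under discussion: a main loop that consumes four elements per iteration, followed by an epilogue that can run at most three times and so gains nothing from being auto-vectorized.

```c
/* Hypothetical example: four-at-a-time main loop plus scalar epilogue. */
int sum_bytes(const unsigned char *src, int count)
{
    int result = 0;
    /* Main loop, standing in for a manually vectorized body. */
    for (; count >= 4; count -= 4) {
        result += src[0] + src[1] + src[2] + src[3];
        src += 4;
    }
    /* Epilogue: count is in the range 0..3 here. */
    for (; count > 0; --count)
        result += *src++;
    return result;
}
```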

[Bug tree-optimization/77902] Auto-vectorizes epilogue loops of manually vectorized functions

2016-10-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902 --- Comment #2 from Allan Jensen --- While this has been the case in both GCC 5 and GCC 6, it appears that both failing cases previously mentioned already produce the best-case result with a fairly recent GCC 7. gcc version 7.0.0 20160923 (exp

[Bug tree-optimization/77902] Auto-vectorizes epilogue loops of manually vectorized functions

2016-10-18 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902 Allan Jensen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug pch/63319] [5 Regression] ICE: Segmentation fault building qt5 with pch

2016-11-03 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63319 Allan Jensen changed: What|Removed |Added CC||linux at carewolf dot com --- Comment

[Bug tree-optimization/78394] New: False positives of maybe-uninitialized with -Og

2016-11-17 Thread linux at carewolf dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linux at carewolf dot com Target Milestone: --- Created attachment 40064 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40064&action=edit maybe_uninitialized.cpp Compiling with -Og produces a nu

[Bug tree-optimization/78394] False positives of maybe-uninitialized with -Og

2016-11-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394 Allan Jensen changed: What|Removed |Added Attachment #40064|0 |1 is obsolete|
