[Bug target/88271] Omit test instruction after add

2018-12-10 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #10 from Daniel Fruzynski --- Here is a possible code transformation to an equivalent form, where this optimization can be applied easily. This change also has a somewhat surprising side effect: the second nested while loop is unrolled. [code] voi

[Bug target/88271] Omit test instruction after add

2018-12-07 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #9 from Daniel Fruzynski --- I have an idea for an alternate approach to this. gcc could try to look for relations between the loop control statement and other statements which modify variables used in that control statement. With such knowl

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #8 from Daniel Fruzynski --- I have results from Callgrind. The cycle estimation for the MoveRows function (without children) is 58.29%. This is for the app without the test instruction. So in a synthetic benchmark for this function only speed change w

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #7 from Daniel Fruzynski --- One more note: this particular function creates matrices with all possible permutations of the row order of the original matrix, which satisfies some additional criteria. So this optimization may be applicable to

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #6 from Daniel Fruzynski --- The average for the version with test is 246.313 ms; I deleted too many digits.

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #5 from Daniel Fruzynski --- How do I use perf? I have not had a chance to use it yet; I usually use the time command or callgrind. I have run my app compiled with AVX2 instructions on a Xeon E5-2683 v3, CentOS 7.6, on an idle CPU. I run it 3 tim

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #4 from Uroš Bizjak --- (In reply to Daniel Fruzynski from comment #3) > What about adding a new pass at the end? It would look for various possible > optimizations, which were missed earlier because they are cross-basic block. We do h

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #3 from Daniel Fruzynski --- What about adding a new pass at the end? It would look for various possible optimizations, which were missed earlier because they are cross-basic block. In my case this example code is part of a tight loop. F

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 Jakub Jelinek changed: What: CC | Removed: (none) | Added: jakub at gcc dot gnu.org --- Comment #2

[Bug target/88271] Omit test instruction after add

2018-12-06 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271 --- Comment #1 from Daniel Fruzynski --- I checked that in the simple case where a bit shift is used in an "if", it is optimized: [code] void f(); void g(); void test(int n) { if (n << 1) f(); else g(); } [/code] [asm] test(int):
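
For readers who want to reproduce the simple case from comment #1, here is a minimal self-contained sketch. The bodies of f() and g(), the last_called variable, and the which() and self_check() helpers are assumptions added for illustration; the original comment only declares f() and g(). The inputs are kept non-negative and small so the shift does not overflow.

```cpp
#include <cassert>

// Hypothetical stand-ins for the f() and g() that comment #1 only declares;
// each records that it ran so the branch taken can be observed.
static char last_called = '\0';
void f() { last_called = 'f'; }
void g() { last_called = 'g'; }

// Same shape as the snippet in comment #1: the shift result is used
// directly as the branch condition.
void test(int n) {
    if (n << 1)
        f();
    else
        g();
}

// Helper (not in the original): reports which function test() called.
char which(int n) {
    last_called = '\0';
    test(n);
    return last_called;
}

// Small self-check using only non-negative, non-overflowing inputs.
void self_check() {
    assert(which(0) == 'g');  // 0 << 1 == 0, so the else-branch runs
    assert(which(1) == 'f');  // 1 << 1 == 2, so the then-branch runs
}
```

Compiling this with something like `g++ -O2 -S` and reading the assembly for test(int) shows whether a separate test instruction follows the shift; the result may differ between compiler versions and targets.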