[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-19 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #18 from YunQiang Su --- https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654956.html

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-18 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #17 from YunQiang Su --- I sent the patch here. So we may need some more testing.

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-15 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #16 from Siarhei Volkau --- Might it be that LoongArch has a register reuse dependency? I observed similar behavior on XBurst with a load/store/reuse pattern: e.g. this code LW $v0, 0($t1) # XBurst load latency is 4 but it has bypa

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-15 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #15 from Siarhei Volkau --- Created attachment 58437 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58437&action=edit application to test performance of shift Here is the test application (MIPS32 specific) I wrote. It allows

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-15 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #14 from YunQiang Su --- And it seems that the performance of SLL is related to the operand. Just iterate from 0 to 1e9: ``` 0b00 : b00: 000223c0 sll a0,v0,0xf <-- the code is something wrong

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-14 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #13 from YunQiang Su --- I tried inserting `li $3, 500` and `li $5, 500` between SLL/BGEZ and LUI+AND/BNE. The latter is still somewhat faster on Loongson 3A4000. I noticed something like this in the 74K software manual:

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-14 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #12 from Siarhei Volkau --- Most likely it's because of a data dependency, and not the direct cost of shift operations on LoongArch, although I can't find information to prove that. So I guess it might still get a performance benefit in cas

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-14 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
YunQiang Su changed:
What|Removed |Added
Resolution|--- |INVALID
Status|UNCONFIRMED

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-14 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #10 from YunQiang Su --- I ran some performance tests. sll+bgez is somewhat slower than lui+and+beqz: about 10% on Loongson 3A4000. So this "optimization" makes sense only for -Os.
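
The two sequences compared in comment #10 test the same single bit in different ways. A minimal C sketch of the equivalence, assuming bit 16 as in the f32 test case from this thread; the function names are hypothetical and not taken from the patch. The mask form corresponds to lui+and+beqz, and the shift form moves the tested bit into the sign position so that a sign branch such as bgez can decide. It illustrates only that both forms give the same answer, not their relative speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask form: build the constant 1<<16 and AND it with the input
   (roughly what lui+and computes before the branch). */
static int bit16_set_mask(uint32_t a)
{
    return (a & (UINT32_C(1) << 16)) != 0;
}

/* Shift form: shifting left by 31-16 = 15 moves bit 16 into bit 31,
   so testing the top (sign) bit answers the same question
   (roughly what sll followed by a sign branch does). */
static int bit16_set_shift(uint32_t a)
{
    return ((a << 15) & UINT32_C(0x80000000)) != 0;
}

/* Hypothetical self-check: both forms must agree on sample inputs. */
static void check_forms_agree(void)
{
    const uint32_t samples[] = {
        0u, 1u, 0x8000u, 0x10000u, 0x20000u, 0x7FFFFFFFu,
        0x80000000u, 0xFFFEFFFFu, 0xFFFFFFFFu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bit16_set_mask(samples[i]) == bit16_set_shift(samples[i]));
}
```

The shift count 15 matches the `sll a0,v0,0xf` quoted in comment #14.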

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-12 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #9 from YunQiang Su --- About condmove: it has been broken since GCC 14. int f32(int a) { int p = (a & (1<<16)); if (p) return 100; else return 1000; }
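
The test case in comment #9 selects one of two constants based on a single bit, which is the shape a compiler can lower to a conditional move instead of a branch. A small sketch of that selection, assuming a hypothetical branch-free helper `f32_select` that is not part of the thread; whether GCC emits a conditional move for either spelling depends on the target and options and is not claimed here.

```c
#include <assert.h>

/* Test case as quoted in comment #9. */
static int f32(int a)
{
    int p = (a & (1 << 16));
    if (p)
        return 100;
    else
        return 1000;
}

/* Hypothetical branch-free spelling of the same selection:
   extract bit 16 as 0 or 1, then pick 1000 or 100 arithmetically. */
static int f32_select(int a)
{
    unsigned bit = ((unsigned)a >> 16) & 1u;
    return 1000 - 900 * (int)bit;
}

/* Hypothetical self-check: both spellings return the same value. */
static void check_select_agrees(void)
{
    const int samples[] = { 0, 1, 1 << 16, (1 << 16) - 1, -1, 0x7FFFFFFF };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(f32(samples[i]) == f32_select(samples[i]));
}
```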

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-07 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #8 from Siarhei Volkau --- Created attachment 58377 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58377&action=edit condmove testcase Tested with current GCC master branch: - Work with -Os confirmed. - Condmove issue present

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-06 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #7 from YunQiang Su --- Oh, I need to add "&&" before "!reload_completed". It seems to work with -Os. Can you give me your test code? I cannot come up with C code for which condmove fails. With the constant less than 0x, AN

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-06 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #6 from Siarhei Volkau --- Well, it works mostly well. However, it still has issues, addressed in my patch: 1) Doesn't work for -Os: most likely a costing issue. 2) Breaks condmoves, as mine does. I have no idea how to avoid tha

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-05 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #5 from YunQiang Su --- I copied the RTL pattern from RISC-V, and it seems to work ``` --- a/gcc/config/mips/mips.md +++ b/gcc/config/mips/mips.md @@ -6253,6 +6253,40 @@ (define_insn "*branch_bit_inverted" } [(set_attr "type"

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-05 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #4 from YunQiang Su --- Oh, RISC-V has solved this problem in a recent release. So we can just do similar work.

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-04 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #3 from Siarhei Volkau --- I know that the patch breaks condmove cases, that's why it is silly.

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-04 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #2 from YunQiang Su --- (In reply to YunQiang Su from comment #1) > RISC-V has this problem, too. > Maybe we can try to combine it in the `combine` pass, though it may not be easy. > It may break some code like: > > ``` > int f1(); > int

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-03 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376 --- Comment #1 from YunQiang Su --- RISC-V has this problem, too. Maybe we can try to combine it in the `combine` pass, though it may not be easy. It may break some code like: ``` int f1(); int f2(); int f(int a) { int p = (a & 0x8);