https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #18 from YunQiang Su ---
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654956.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #17 from YunQiang Su ---
I sent the patch here.
So we may need some more tests.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #16 from Siarhei Volkau ---
Might it be that LoongArch has a register reuse dependency?
I observed similar behavior on XBurst with a load/store/reuse pattern:
e.g. this code
LW $v0, 0($t1) # XBurst load latency is 4 but it has bypa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #15 from Siarhei Volkau ---
Created attachment 58437
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58437&action=edit
application to test performance of shift
Here is the test application (MIPS32 specific) I wrote.
It allows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #14 from YunQiang Su ---
And it seems that the performance of SLL depends on the operand.
Just iterate from 0 to 1e9:
```
0b00 :
b00: 000223c0 sll a0,v0,0xf <-- the code is something wrong
```
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #13 from YunQiang Su ---
I tried inserting
li $3, 500
li $5, 500
between SLL/BGEZ and LUI+AND/BNE.
The latter is still somewhat faster on Loongson 3A4000.
I noticed something like this in the 74K's software manual:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #12 from Siarhei Volkau ---
Most likely this is due to a data dependency rather than the direct cost of
shift operations on LoongArch, although I can't find information to prove that.
So, I guess it still might get performance benefit in cas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
YunQiang Su changed:
What|Removed |Added
Resolution|--- |INVALID
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #10 from YunQiang Su ---
I ran some performance tests.
sll+bgez is somewhat slower than lui+and+beqz.
On Loongson 3A4000, it is about 10% slower.
So this "optimization" makes sense only for -Os.
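For readers comparing the two sequences, here is a minimal C sketch of the two equivalent ways to test bit 16. The mask form corresponds roughly to lui+and+beqz and the shift form to sll+bgez. The function names are hypothetical, and the signed conversion in the shift form assumes GCC's wraparound behavior for out-of-range values.

```c
#include <stdint.h>

/* Mask form: AND with the constant 1<<16, then compare against zero.
   On MIPS the constant does not fit a 16-bit immediate, hence lui+and. */
static int bit16_set_mask(int32_t a)
{
  return (a & (INT32_C(1) << 16)) != 0;
}

/* Shift form: shift left by 15 so that bit 16 lands in the sign bit,
   then test the sign (sll followed by bgez/bltz).
   Assumption: converting an out-of-range uint32_t to int32_t wraps,
   which GCC documents but ISO C leaves implementation-defined. */
static int bit16_set_shift(int32_t a)
{
  return (int32_t)((uint32_t)a << 15) < 0;
}
```

Both functions return the same result for every 32-bit input; they differ only in the instruction sequence a compiler is likely to pick, which is what the timing comparison above is about.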
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #9 from YunQiang Su ---
I see the condmove problem now: it has been broken since GCC 14.
int
f32(int a)
{
  int p = (a & (1<<16));
  if (p)
    return 100;
  else
    return 1000;
}
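A small sketch for checking the testcase at runtime, for example when comparing builds with and without the patch. The helper `f32_ref` is hypothetical and not part of the report; it computes the same result arithmetically so it does not depend on how the compiler lowers the branch.

```c
/* Testcase from the comment above, unchanged in behavior. */
int
f32(int a)
{
  int p = (a & (1 << 16));
  if (p)
    return 100;
  else
    return 1000;
}

/* Hypothetical reference: extract bit 16 and select arithmetically,
   giving 100 when the bit is set and 1000 otherwise. */
int
f32_ref(int a)
{
  int bit = (int)(((unsigned)a >> 16) & 1u);
  return 1000 - 900 * bit;
}
```

Calling both functions on values with bit 16 set and clear and comparing the results gives a quick sanity check of the generated code.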
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #8 from Siarhei Volkau ---
Created attachment 58377
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58377&action=edit
condmove testcase
Tested with current GCC master branch:
- Works with -Os: confirmed.
- Condmove issue present
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #7 from YunQiang Su ---
Ohh, I need to add "&&" before "!reload_completed".
It seems to work with -Os.
Can you give me your test code?
I cannot come up with C code for which condmove fails.
With the constant less than 0x, AN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #6 from Siarhei Volkau ---
Well, it works mostly well.
However, it still has issues, addressed in my patch:
1) Doesn't work for -Os: most likely a costing issue.
2) Breaks condmoves, as mine does. I have no idea how to avoid tha
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #5 from YunQiang Su ---
I copied the RTL pattern from RISC-V, and it seems to work:
```
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6253,6 +6253,40 @@ (define_insn "*branch_bit_inverted"
}
[(set_attr "type"
```
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #4 from YunQiang Su ---
Ohh, RISC-V has solved this problem in a recent release.
So we can just do something similar.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #3 from Siarhei Volkau ---
I know that the patch breaks condmove cases; that's why it is silly.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #2 from YunQiang Su ---
(In reply to YunQiang Su from comment #1)
> RISC-V has this problem, too.
> Maybe we can try to combine it in the `combine` pass, though that may not be easy.
> It may break some code like:
>
> ```
> int f1();
> int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #1 from YunQiang Su ---
RISC-V has this problem, too.
Maybe we can try to combine it in the `combine` pass, though that may not be easy.
It may break some code like:
```
int f1();
int f2();
int f(int a) {
  int p = (a & 0x8);
```