Issue: 90033
Summary: [RISCV] Register copies in a loop should be eliminated
Labels: new issue
Assignees:
Reporter: mgudim

For the following code from the gcc benchmark:
```c
typedef struct simple_bitmap_def
{
  unsigned char *popcount;
  unsigned int n_bits;
  unsigned int size;
  unsigned long elms[1];
} *sbitmap;
typedef const struct simple_bitmap_def *const_sbitmap;
typedef unsigned long *sbitmap_ptr;
typedef const unsigned long *const_sbitmap_ptr;
static unsigned long sbitmap_elt_popcount (unsigned long);
void
sbitmap_a_or_b (sbitmap dst, const_sbitmap a, const_sbitmap b)
{
  unsigned int i, n = dst->size;
  sbitmap_ptr dstp = dst->elms;
  const_sbitmap_ptr ap = a->elms;
  const_sbitmap_ptr bp = b->elms;
  unsigned char has_popcount = dst->popcount != ((void *) 0);
  for (i = 0; i < n; i++)
    {
      const unsigned long tmp = *ap++ | *bp++;
      *dstp++ = tmp;
    }
}
```
We get copies in the loop body:
```
ld a4, 0(a3)
ld a5, 0(a2)
addi a1, a3, 8
addi a2, a2, 8
or a4, a4, a5
addi a3, a0, 8
sd a4, 0(a0)
mv a0, a3
mv a3, a1
bne a1, a6, .LBB0_2
```
Copies are introduced by PHI Elimination. The code before PHI Elimination:
```
bb.2.for.body:
; predecessors: %bb.1, %bb.2
successors: %bb.3(0x04000000), %bb.2(0x7c000000); %bb.3(3.12%), %bb.2(96.88%)
%5:gpr = PHI %3:gpr, %bb.1, %10:gpr, %bb.2
%6:gpr = PHI %1:gpr, %bb.1, %9:gpr, %bb.2
%7:gpr = PHI %2:gpr, %bb.1, %8:gpr, %bb.2
%8:gpr = ADDI %7:gpr, 8
%16:gpr = LD killed %7:gpr, 0 :: (load (s64) from %ir.ap.014, !tbaa !15)
%9:gpr = nuw ADDI %6:gpr, 8
%17:gpr = LD killed %6:gpr, 0 :: (load (s64) from %ir.bp.015, !tbaa !15)
%18:gpr = OR killed %17:gpr, killed %16:gpr
%10:gpr = nuw ADDI %5:gpr, 8
SD killed %18:gpr, killed %5:gpr, 0 :: (store (s64) into %ir.dstp.016, !tbaa !15)
BNE %8:gpr, %4:gpr, %bb.2
PseudoBR %bb.3
```
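Note that `SD killed %18:gpr, killed %5:gpr, 0 :: (store (s64) into %ir.dstp.016, !tbaa !15)` uses the value of the induction variable `%5`, which is also the operand of its update `%10:gpr = nuw ADDI %5:gpr, 8`. Because the store still reads `%5` after `%10` has been defined, the two values are live at the same time and cannot share a register, so the copy that PHI elimination inserts for this PHI cannot be coalesced away.
However, it is legal to move the store before the add. The same applies to the other copies.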
Possible solutions:
(1) Add a scheduling pass before PHI elimination. Right after non-global (SelectionDAG) `ISel` there is a scheduling step, where a target can choose a custom scheduler via `ST.getDAGScheduler`. None of the existing targets use this and, as I understand it, this code will be replaced by something else soon? Also, these schedulers are bottom-up, while I think this situation is best handled top-down.
Another possibility is to add a top-down scheduler somewhere before PHI Elimination?
(2) Do this reordering in some other existing non-scheduler pass? Maybe as a first step of PHI elimination? We would have to reproduce some of the scheduler's logic, though, which doesn't seem right.
(3) Something else?
What do you think?
CC:
@topperc @preames @asb @wangpc-pp