| Issue | 91780 |
| --- | --- |
| Summary | [SLPVectorizer] Missed optimization: missed vectorization with non-clobbering store at the end |
| Labels | llvm:SLPVectorizer, missed-optimization |
| Assignees | |
| Reporter | XChy |
Alive2 proof: verifying the vectorized output with Alive2 is too slow, so I use GCC as the oracle in the link below.
Godbolt link: https://godbolt.org/z/xe6Tzq8Ge
### Motivating example
For the reduced testcase below:
```c
#include <stdint.h>

void e1000x_core_prepare_arr(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    int i;
    arr[11] = value; // vectorized if we remove this line.
    for (i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }
    arr[0x3f] = checksum; // vectorized if we replace it with "return checksum".
}
```
We get a long chain of scalar `add i16` instructions (one per `checksum += arr[i]`): https://godbolt.org/z/xe6Tzq8Ge, and SLPVectorizer fails to vectorize it.
After some experiments, I found that removing either of the stores outside the loop makes vectorization happen, but the codegen is still poor compared with GCC: https://godbolt.org/z/rEfbzhW77
This looks like a phase-ordering problem, since the stores outside the loop may be merged with other stores after unrolling. But even for the case with only the bare loop, the number of vectorized instructions looks bad: https://godbolt.org/z/oznjWWcKr
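For reference, the variants discussed above can be placed side by side. This is only a sketch based on the comments in the reduced testcase: the function names `prepare_arr_with_stores`, `checksum_only`, and `check_variants_agree` are hypothetical, and `checksum_only` is my reading of the "bare loop" case, not necessarily the exact code behind the Godbolt links.

```c
#include <assert.h>
#include <stdint.h>

#define N_SUMMED 0x3f /* number of leading elements summed, as in the testcase */

/* Reported shape: one store before the loop and one after it. */
void prepare_arr_with_stores(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    arr[11] = value;
    for (int i = 0; i < N_SUMMED; i++)
        checksum += arr[i];
    arr[N_SUMMED] = checksum;
}

/* Assumed "bare loop" shape: no stores, the sum is returned instead. */
uint16_t checksum_only(const uint16_t *arr)
{
    uint16_t checksum = 0;
    for (int i = 0; i < N_SUMMED; i++)
        checksum += arr[i];
    return checksum;
}

/* Sanity check: both shapes compute the same 16-bit wrapping sum. */
void check_variants_agree(void)
{
    uint16_t a[N_SUMMED + 1];
    for (int i = 0; i <= N_SUMMED; i++)
        a[i] = (uint16_t)(i * 1021u);
    a[11] = 7; /* pre-set so that storing 7 below leaves the data unchanged */
    uint16_t expected = checksum_only(a);
    prepare_arr_with_stores(a, 7);
    assert(a[N_SUMMED] == expected);
}
```

Both functions perform the same reduction over `arr[0..0x3e]`, so comparing their output at `-O3` should isolate the effect of the two stores on the vectorizer.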
### Real-world motivation
This snippet of IR is derived from [qemu/hw/net/e1000x_common.c@e1000x_core_prepare_eeprom](https://github.com/qemu/qemu/blob/dafec285bdbfe415ac6823abdc510e0b92c3f094/hw/net/e1000x_common.c#L195) (after the O3 pipeline).
**Let me know if you can confirm that it's an optimization opportunity, thanks.**
cc @alexey-bataev