Issue 91780
Summary [SLPVectorizer] Missed optimization: missed vectorization with non-clobbering store at the end
Labels llvm:SLPVectorizer, missed-optimization
Assignees
Reporter XChy
Alive2 proof: Alive2 is too slow to verify the vectorized form, so I use GCC as the oracle in the link below.
Godbolt link: https://godbolt.org/z/xe6Tzq8Ge

### Motivating example 

For the reduced testcase below:
```c
#include <stdint.h>

void e1000x_core_prepare_arr(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    int i;

    arr[11] = value;  // vectorized if we remove this line.

    for (i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }

    arr[0x3f] = checksum;   // vectorized if we replace it with "return checksum".
}
```

We get a long `add i16 checksum, arr[i]` chain: https://godbolt.org/z/xe6Tzq8Ge, and SLPVectorizer fails to vectorize it.
After some trials, I found that if we remove either of the stores outside the loop, vectorization happens, but the codegen is still poor compared with GCC: https://godbolt.org/z/rEfbzhW77
This looks like a phase-ordering problem for such cases, since the store outside the loop may be merged into stores after unrolling. But even for the case with only a pure loop, the number of vectorized instructions looks bad: https://godbolt.org/z/oznjWWcKr
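
To make the expected transformation concrete, here is a minimal hand-written sketch of a lane-wise reduction that is equivalent to the testcase. It assumes 8 lanes of `i16` (one 128-bit vector); the names `prepare_lanes` and `lanes_match_scalar` are hypothetical and not from the report, and this is not claimed to be what GCC or SLPVectorizer actually emits. The reordering is legal because `uint16_t` addition wraps modulo 2^16, the store to `arr[11]` precedes every load, and `arr[0x3f]` is outside the range the loop reads.

```c
#include <stdint.h>
#include <string.h>

/* LANES = 8 is an assumption (128-bit vector of i16), not taken from the report. */
enum { N = 0x3f, LANES = 8 };

/* The reduced testcase from the report, with the same behavior. */
static void prepare_scalar(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;

    arr[11] = value;
    for (int i = 0; i < N; i++)
        checksum += arr[i];
    arr[N] = checksum;
}

/* Hypothetical lane-wise rewrite: each lanes[l] stands for one vector lane. */
static void prepare_lanes(uint16_t *arr, uint16_t value)
{
    uint16_t lanes[LANES] = {0};
    uint16_t checksum = 0;
    int i = 0;

    arr[11] = value;                    /* store precedes every load, as in the original */

    for (; i + LANES <= N; i += LANES)  /* 7 full groups: elements 0..55 */
        for (int l = 0; l < LANES; l++)
            lanes[l] += arr[i + l];

    for (int l = 0; l < LANES; l++)     /* horizontal reduction of the lanes */
        checksum += lanes[l];

    for (; i < N; i++)                  /* scalar tail: elements 56..62 */
        checksum += arr[i];

    arr[N] = checksum;                  /* index 63 is never read by the loops above */
}

/* Hypothetical check helper: returns 1 when both versions leave the buffer identical. */
static int lanes_match_scalar(uint16_t seed)
{
    uint16_t a[N + 1], b[N + 1];

    for (int i = 0; i <= N; i++)
        a[i] = b[i] = (uint16_t)(seed + (unsigned)i * 40503u);

    prepare_scalar(a, seed);
    prepare_lanes(b, seed);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `lanes_match_scalar` with a few seeds should return 1 each time, since the two functions differ only in the order of the wrapping additions.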

### Real-world motivation

This snippet of IR is derived from [qemu/hw/net/e1000x_common.c@e1000x_core_prepare_eeprom](https://github.com/qemu/qemu/blob/dafec285bdbfe415ac6823abdc510e0b92c3f094/hw/net/e1000x_common.c#L195) (after the O3 pipeline).

**Let me know if you can confirm that it's an optimization opportunity, thanks.**

cc @alexey-bataev 