| Issue | 152430 |
| --- | --- |
| Summary | Missed optimization and unexpected compilation difference when using inlined function to cast int32_t to uint8_t |
| Labels | new issue |
| Assignees | |
| Reporter | kasper93 |
Hi,
I wasn't sure what a good title for this issue would be. In short, the conversion `uint8_t min = mpeg_min;`, where `mpeg_min` is an `int32_t`, is ignored for vectorization unless it is done through a `static inline` function call, even though that call is fully inlined.
The problem is that the version vectorized on `uint8_t` is significantly faster than the one using `int32_t`. I had to add a dummy `static inline` function to make LLVM produce the better code. This is unexpected because, for what it's worth, both versions should compile to the same code.

I also tried `__builtin_assume`, but it doesn't change anything except the initial cast. My guess is that LLVM ignores the variable's range, even when the variable is explicitly narrowed, and uses the `int32_t` value directly, which hurts performance.
Looking at the IR, the fast version vectorizes on `i8`:
``` LLVM
%broadcast.splatinsert = insertelement <16 x i8> poison, i8 %conv, i64 0
```
The slow version, by contrast, works on `i32`:
``` LLVM
%broadcast.splatinsert = insertelement <16 x i32> poison, i32 %mpeg_min, i64 0
```
This is just one line; take a look at the full output.
The code, together with a diff, is at https://godbolt.org/z/aeG8zbE9x and is also attached below.
Compiled with `clang -O3`.
## Fast version
``` c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any byte in the first `width` columns of any row lies
 * outside [mpeg_min, mpeg_max], 0 otherwise. */
static inline int fast_impl(const uint8_t *data, ptrdiff_t stride,
                            ptrdiff_t width, ptrdiff_t height,
                            uint8_t mpeg_min, uint8_t mpeg_max)
{
    while (height--) {
        uint8_t cond = 0;
        for (int x = 0; x < width; x++) {
            const uint8_t val = data[x];
            cond |= val < mpeg_min || val > mpeg_max;
        }
        if (cond)
            return 1;
        data += stride;
    }
    return 0;
}

int fast(const uint8_t *data, ptrdiff_t stride,
         ptrdiff_t width, ptrdiff_t height,
         int mpeg_min, int mpeg_max)
{
    __builtin_assume(mpeg_min >= 0 && mpeg_min <= UINT8_MAX);
    __builtin_assume(mpeg_max >= 0 && mpeg_max <= UINT8_MAX);
    /* The int -> uint8_t narrowing happens at this (fully inlined) call. */
    return fast_impl(data, stride, width, height, mpeg_min, mpeg_max);
}
```
## Slow version
``` c
#include <stddef.h>
#include <stdint.h>

int slow(const uint8_t *data, ptrdiff_t stride,
         ptrdiff_t width, ptrdiff_t height,
         int32_t mpeg_min, int32_t mpeg_max)
{
    __builtin_assume(mpeg_min >= 0 && mpeg_min <= UINT8_MAX);
    __builtin_assume(mpeg_max >= 0 && mpeg_max <= UINT8_MAX);
    /* Same narrowing as in fast(), but done through local variables. */
    uint8_t min = mpeg_min;
    uint8_t max = mpeg_max;
    while (height--) {
        uint8_t cond = 0;
        for (int x = 0; x < width; x++) {
            const uint8_t val = data[x];
            cond |= val < min || val > max;
        }
        if (cond)
            return 1;
        data += stride;
    }
    return 0;
}
```
Thanks,
Kacper
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs