Issue 160733
Summary AArch64: folly::ConcurrentHashMap (aka code with a bunch of std::atomic) codegen could be better
Labels new issue
Assignees
Reporter MatzeB
We noticed that codegen in the context of folly::ConcurrentHashMap (i.e. code with a bunch of std::atomic) could be better. I am told this is a reduction of an important function that a colleague tried to vectorize:

```
#include <arm_neon.h>
#include <atomic>

struct __attribute__((packed)) mystruct {
    std::atomic<uint64_t> low_;
    std::atomic<uint64_t> hi_;
};

uint64_t occupiedMask(mystruct& tags_, uint64_t kFullMask) {
    uint64x2_t vec;
    vec[0] = tags_.low_.load(std::memory_order_relaxed);
    vec[1] = tags_.hi_.load(std::memory_order_relaxed);
    // signed shift extends top bit to all bits
    auto occupiedV =
        vreinterpretq_u8_s8(vshrq_n_s8(vreinterpretq_s8_u64(vec), 7));
    uint8x8_t maskV = vshrn_n_u16(vreinterpretq_u16_u8(occupiedV), 4);
    return vget_lane_u64(vreinterpret_u64_u8(maskV), 0) & kFullMask;
}
```
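
For reference, here is a scalar model of what the NEON sequence computes, which may help when checking that a changed lowering still gives the same result. This is only a sketch: the name `occupiedMaskScalar` is hypothetical (it is not part of folly), it takes the two already-loaded 64-bit values instead of the struct, and it assumes little-endian byte order within the vector. The signed shift by 7 turns each byte into 0xFF or 0x00 depending on its top bit, and the narrowing shift by 4 then keeps one nibble per input byte.

```cpp
#include <cstdint>

// Hypothetical scalar model of occupiedMask (illustration only).
// Nibble j of the result is 0xF when byte j of the 16-byte tag vector
// (the 8 bytes of `low` first, then the 8 bytes of `hi`) has its top bit set.
// Assumes little-endian byte order within the vector.
std::uint64_t occupiedMaskScalar(std::uint64_t low, std::uint64_t hi,
                                 std::uint64_t kFullMask) {
    std::uint64_t mask = 0;
    for (int j = 0; j < 16; ++j) {
        // Pick the 64-bit half that holds byte j, then extract that byte.
        std::uint64_t word = (j < 8) ? low : hi;
        std::uint8_t byte = static_cast<std::uint8_t>(word >> (8 * (j % 8)));
        if (byte & 0x80) {
            // cmlt yields 0xFF for this byte; shrn #4 keeps one nibble of it.
            mask |= std::uint64_t{0xF} << (4 * j);
        }
    }
    return mask & kFullMask;
}
```

For example, a tag vector whose only byte with the top bit set is byte 0 gives a mask with only the lowest nibble set, before `kFullMask` is applied.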

This currently produces:
```
occupiedMask(mystruct&, unsigned long):
        ldr     x8, [x0]
        ldr     x9, [x0, #8]
        fmov    d0, x8
        mov     v0.d[1], x9
        cmlt    v0.16b, v0.16b, #0
        shrn    v0.8b, v0.8h, #4
        fmov    x8, d0
        and     x0, x8, x1
        ret
```

(godbolt equivalent: https://godbolt.org/z/xobWMhe7W )

We think this could ideally be:

```
        ldr     q0, [x0]
        cmlt    v0.16b, v0.16b, #0
        shrn    v0.8b, v0.8h, #4
        fmov    x8, d0
        and     x0, x8, x1
        ret
```
Currently, you only get this code when the atomics are dropped.