Issue 142368
Summary SHA instructions mixed with AVX degrade performance by 10x
Labels backend:X86, performance
Assignees
Reporter chfast
    In the following implementation of SHA-256, the SHA intrinsics are used to accelerate performance on supported CPUs. Although SSE intrinsics are used for the other basic operations (load/store/shuffle), LLVM decides to emit AVX variants of these when AVX is enabled. This mixture of AVX and SSE/SHA instructions appears to degrade performance significantly (over 10x).

https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-x86.c

A small fragment of the implementation:

```c
#include <stdint.h>
#include <x86intrin.h>

void sha256_process_x86(uint32_t state[8], const uint8_t data[])
{
    __m128i STATE0, STATE1;
    __m128i MSG, TMP;
    __m128i MSG0, MSG1, MSG2, MSG3;
    __m128i ABEF_SAVE, CDGH_SAVE;
    const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);

    /* Load the current state (the full implementation also permutes it
       into the ABEF/CDGH order expected by the SHA instructions). */
    STATE0 = _mm_loadu_si128((const __m128i*)&state[0]);
    STATE1 = _mm_loadu_si128((const __m128i*)&state[4]);

    /* Rounds 0-3 */
    MSG = _mm_loadu_si128((const __m128i*)(data + 0));
    MSG0 = _mm_shuffle_epi8(MSG, MASK);
    MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0xE9B5DBA5B5C0FBCFULL, 0x71374491428A2F98ULL));
    STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
    MSG = _mm_shuffle_epi32(MSG, 0x0E);
    STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);

    _mm_storeu_si128((__m128i*)&state[0], STATE0);
    _mm_storeu_si128((__m128i*)&state[4], STATE1);
}
```
https://godbolt.org/z/4Ks53fjeo
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
