I've added a sample benchmarking program to measure the difference without hitting disk; it shows roughly a 40% speedup:
$ time ./cksum_bench_pclmul 1048576 10000
Hash: EFA0B24F, length: 1048576

real	0m3.018s
user	0m3.018s
sys	0m0.000s

$ time ./cksum_bench_avx2 1048576 10000
Hash: EFA0B24F, length: 1048576

real	0m1.824s
user	0m1.804s
sys	0m0.020s

The code effectively replicates the existing pclmul code, with new constants generated for the larger folds. The main gotcha was that the previous CRC gets inserted at an unusual offset due to endianness and byte swapping.

I don't have a Skylake processor, so I spun up an AWS instance to test the AVX512 version. It turns out there's a bug where virtualisation environments don't handle the AVX512 pclmul correctly despite the CPU reporting support: it gets past the __builtin_cpu_supports() gate but then raises an illegal instruction partway through the function. It might be worth disabling that path for now, though it would be nice if we could at least validate it. AVX2 has been around for over 10 years, so that part seems a safer addition.
#include "config.h"

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "cksum.h"

/* Fill BUFFER with LEN deterministic pseudo-random bytes (xorshift32).  */
static void
xorshift_populate (char *buffer, size_t len)
{
  unsigned int state = 0x123;
  for (size_t i = 0; i < len; i++)
    {
      state ^= state << 13;
      state ^= state >> 17;
      state ^= state << 5;
      buffer[i] = (char) state;
    }
}

int
main (int argc, char *argv[])
{
  uint_fast32_t hash = 0;
  uintmax_t length = 0;

  if (argc != 3)
    {
      fprintf (stderr, "Usage: %s length iterations\n", argv[0]);
      return EXIT_FAILURE;
    }

  size_t buffer_len = strtoul (argv[1], NULL, 10);
  size_t iterations = strtoul (argv[2], NULL, 10);

  char *buffer = calloc (1, buffer_len);
  if (!buffer)
    {
      fprintf (stderr, "%s: out of memory\n", argv[0]);
      return EXIT_FAILURE;
    }
  xorshift_populate (buffer, buffer_len);

  for (size_t i = 0; i < iterations; i++)
    {
      /* fmemopen keeps the data in memory, so the loop measures the
         CRC folding rather than disk I/O.  */
      FILE *fp = fmemopen (buffer, buffer_len, "r");
      cksum_pclmul (fp, &hash, &length);
      fclose (fp);
    }

  free (buffer);
  printf ("Hash: %08X, length: %ju\n", (unsigned int) hash, length);
  return EXIT_SUCCESS;
}
0001-cksum-Use-AVX2-and-AVX512-for-speedup.patch