Hi all,

I committed the patches for the new AVX10.2 instructions to trunk a few hours ago. The next and final part of the AVX10.2 upstreaming is to optimize code generation using these new instructions.
This patch series contains the following optimizations:
- VNNI instruction auto-vectorization (PATCH 1).
- Codegen optimization using the new scalar comparison instructions to eliminate redundant code (PATCH 2-3).
- BF16 instruction auto-vectorization (PATCH 4-8).

This completes the upstreaming of the AVX10.2 series. Afterwards, we may add V2BF/V4BF support in a separate thread, as we did for V2HF/V4HF when AVX512FP16 was upstreamed.

Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen