Hi all,

The AVX10.2 ymm rounding patches were merged to trunk about 6 hours ago. As mentioned before, the next step is AVX10.2 new instruction support.

This patch series can be divided into three parts.

The first patch refactors m512-check.h under testsuite to reuse the AVX-512 helper functions and unions and to avoid ABI warnings when using AVX10.

The following ten patches add support for all the new AVX10.2 instructions, including:

- AI Datatypes, Conversions, and post-Convolution Instructions.
- Media Acceleration.
- IEEE-754-2019 Minimum and Maximum Support.
- Saturating Conversions.
- Zero-extending Partial Vector Copies.
- FP Scalar Comparison.

For the FP Scalar Comparison part (a.k.a. comx instructions), we only provide pattern support and no intrin support, since intrins would be redundant with the comi ones for common usage. We will also add some optimizations for common usage with comx instructions afterwards. If there are strong requests, we will add intrin support in the future.

The final patch adds a bf8 -> fp16 intrin for convenience. Since bf8 and fp16 have the same number of exponent bits, the conversion only needs to widen the fraction part, so we use a sequence of instructions instead of new instructions. This is just like the bf16 -> fp32 conversion.

After all these patches are merged, the next step will be optimizations based on the new AVX10.2 instructions, including vnni vectorization, bf16 vectorization, comx optimization, etc.

Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen
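
For reference, the bf8 -> fp16 point can be illustrated with a minimal scalar sketch. It assumes bf8 is the E5M2 layout (1 sign, 5 exponent, 2 fraction bits) and fp16 is IEEE binary16 (1 sign, 5 exponent, 10 fraction bits), so widening is just a left shift by 8 on the raw bits, the same way bf16 -> fp32 is a left shift by 16. The helper names below are hypothetical and are not the intrin names used in the patch; the actual patch works on vectors, not scalars.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen one bf8 (assumed E5M2) value to fp16 raw bits.
   Sign and exponent fields line up, so the 2 fraction bits become the top
   2 of the 10 fp16 fraction bits and the rest are zero.  */
static uint16_t
bf8_to_fp16_bits (uint8_t x)
{
  return (uint16_t) ((uint16_t) x << 8);
}

/* Hypothetical helper: the analogous bf16 -> fp32 widening, shown for
   comparison.  bf16 shares the 8 exponent bits of fp32.  */
static float
bf16_bits_to_float (uint16_t x)
{
  uint32_t u = (uint32_t) x << 16;
  float f;
  memcpy (&f, &u, sizeof f);	/* Reinterpret the bits without aliasing issues.  */
  return f;
}

/* Small self-check over a few hand-encoded values.  */
static void
widen_self_check (void)
{
  assert (bf8_to_fp16_bits (0x3C) == 0x3C00);	/* 1.0 in both formats.  */
  assert (bf8_to_fp16_bits (0xC0) == 0xC000);	/* -2.0 in both formats.  */
  assert (bf16_bits_to_float (0x3F80) == 1.0f);
}
```

Because no rounding or exponent rebiasing is involved, every bf8 value (including infinities and NaNs) maps to the fp16 value with the same bit pattern in its high byte.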