On Fri, Jul 2, 2021 at 4:19 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Jul 2, 2021 at 10:07 AM Uros Bizjak via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > > > AVX512FP16 is disclosed, refer to [1].
> > > > > There're 100+ instructions for AVX512FP16, 67 gcc patches, for the
> > > > > convenience of review, we divide the 67 patches into 2 major parts.
> > > > > The first part is 2 patches containing basic support for AVX512FP16
> > > > > (options, cpuid, _Float16 type, libgcc, etc.), and the second part is
> > > > > 65 patches covering all instructions of AVX512FP16(including
> > > > > intrinsic support and some optimizations).
> > > > > There is a problem with the first part, _Float16 is not a C++
> > > > > standard, so the front-end does not support this type and its
> > > > > mangling, so we "make up" a _Float16 type on the back-end and use
> > > > > _DF16 as its mangling. The purpose of this is to align with llvm
> > > > > side, because llvm C++ FE already supports _Float16[2].
> > > > >
> > > > > [1]
> > > > > https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > > > [2] https://reviews.llvm.org/D33719
> > > >
> > > > Looking through implementation of _Float16 support, I think, there is
> > > > no need for _Float16 support to depend on AVX512FP16.
> > > >
> > > > The compiler is smart enough to use either a named pattern that
> > > > describes the instruction when available or diverts to a library call
> > > > to a soft-fp implementation. So, I think that general _Float16 support
> > > > should be implemented first (similar to _float128) and then upgraded
> > > > with AVX512FP16 specific instructions.
> > > >
> > > > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > > > secondary_reload register.
> > > >
> > > MOVD is under sse2, so is pinsrw, which means if we want xmm
> > > load/stores for HF, sse2 is the least requirement.
> > > Also we support PEXTRW reg/m16, xmm, imm8 under SSE4_1 under which we
> > > have 16bit direct load/store for HFmode and no need for a secondary
> > > reload.
> > > So for simplicity, can we just restrict _Float16 under sse4_1?
> >
> > When baseline is not met, the equivalent integer calling convention is
> > used, for example:
> >
> > --cut here--
> > typedef int __v2si __attribute__ ((vector_size (8)));
> >
> > __v2si foo (__v2si a, __v2si b)
> > {
> >   return a + b;
> > }
> > --cut here--
> >
> > will still compile with -m32 -mno-mmx with warnings:
> >
> > mmx1.c: In function ‘foo’:
> > mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
> > ABI [-Wpsabi]
> > mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
> > the ABI [-Wpsabi]
> >
> > So, by setting the baseline to SSE4.1, a big pool of targets will be
> > forced to use alternative ABI. This is quite inconvenient, and we
> > revert to the alternative ABI if we *really* can't satisfy ABI
> > requirements (e.g. register type is not available, basic move insn
> > can't be implemented). Based on your analysis, I think that SSE2
> > should be the baseline.
> >
> > Also, looking at insn tables, it looks that movzwl from memory + movd
> > is faster than pinsrw (and similar for pextrw to memory), but I have
> > no hard data here.
> >
> > Regarding secondary_reload, a scratch register is needed in case of
> > HImode moves between memory and XMM reg, since scratch register needs
> > a different mode than source and destination. Please see
> > TARGET_SECONDARY_RELOAD documentation and several examples in the
> > source.
>
> I would suggest for the purpose of simplifying the initial patch series to
> not make _Float16 supported on 32bits and leave that (and its ABI) for
w/o AVX512FP16, it's ok.
The problem is that AVX512FP16 instructions are also available with -m32,
and the corresponding intrinsics need the "_Float16" type (or some other
builtin type name), which users will then use as well. That means we
still need a 32-bit _Float16 ABI for them.
> future enhancement. Then the baseline should be SSE2 (x86-64 base)
> which I think should be OK despite needing some awkwardness for
> HFmode stores (scratch reg needed).
>
> Richard.
>
> > Uros.

--
BR,
Hongtao
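
P.S. For reference, here is a rough sketch of the two SSE2-only ways to
move a 16-bit value between memory and an XMM register that were discussed
above (movzwl + movd versus pinsrw for loads, movd or pextrw through a
general register for stores). It is written with SSE2 intrinsics, not as
backend patterns. The helper names are hypothetical, and the compiler may
well choose different instructions than the comments suggest, so treat it
only as an illustration of why a scratch GPR shows up with an SSE2 baseline.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Load: zero-extend the 16-bit value into a GPR (movzwl), then MOVD
   it into the low 32 bits of an XMM register.  */
static __m128i load16_movd (const uint16_t *p)
{
  return _mm_cvtsi32_si128 ((int) *p);
}

/* Load: insert the 16-bit value into element 0 of a zeroed XMM
   register (PINSRW).  */
static __m128i load16_pinsrw (const uint16_t *p)
{
  return _mm_insert_epi16 (_mm_setzero_si128 (), *p, 0);
}

/* Store: MOVD the low 32 bits into a GPR (the scratch register),
   then do a 16-bit store from that GPR.  */
static void store16_movd (uint16_t *p, __m128i v)
{
  *p = (uint16_t) _mm_cvtsi128_si32 (v);
}

/* Store: extract element 0 into a GPR (SSE2 PEXTRW, register form),
   then do a 16-bit store.  The direct PEXTRW-to-memory form needs
   SSE4.1.  */
static void store16_pextrw (uint16_t *p, __m128i v)
{
  *p = (uint16_t) _mm_extract_epi16 (v, 0);
}
```

Both store helpers go through a general register, which corresponds to the
scratch register that a secondary reload would have to provide for HFmode
stores when only SSE2 is available.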