Kirill, others,

in the course of putting together test harness extensions for AVX512 additions to the Xen hypervisor's built-in instruction emulator I've come across a number of issues. Since it may well be that I simply don't know the full background, I thought I'd inquire first rather than adding bugzilla entries for all of them:

1) An initial idea of mine was to use -ffixed-* to force the use of the high 16 {x,y,z}mm registers (effectively by disallowing the use of the lower ones), such that I'd easily get EVEX-encoded insns for whatever can be EVEX-encoded with the given -mavx512* option(s). This doesn't even come close to working - all sorts of internal compiler errors result for anything other than the most trivial examples, most notably with AVX512VL support enabled. I can't observe similar bad effects from using -ffixed-* for other register sub-groups. I realize the interactions between the various insns the *.md files provide may be difficult to sort out, and perhaps the root cause is the same as that of bug 87354, but is this really something that's not supposed to work?

2) There looks to be quite a widespread mix-up of Yk and k constraints on insns. Most instructions having mask register outputs can very well use %k0, yet they commonly use "=Yk". The only exceptions are scatter/gather insns, afaict. And of course insns using destination field masking have to use "Yk" inputs. Is there anything I'm overlooking here that prevents "=k" from being used as outlined?

2b) Both k and Yk are marked @internal in constraints.md, suggesting (to me) that I'm not supposed to use these constraints in inline asm() constructs. If that inference of mine is correct, how would I express the respective constraints?

3) Certain AVX512_VBMI2, AVX512_BITALG, and GFNI+AVX512F inline functions are unavailable without AVX512BW also enabled (contrary to what the SDM, XED, and binutils/gas imply, and unlike with AVX512_VBMI). I can see why VBMI implies BW even though the SDM doesn't suggest so, but if this is done, other ISA extensions imo should also enable BW where needed, rather than hiding part of their inline/builtin helpers.
Or the opposite position should be taken and no such implications should be made at all - aiui they're there solely for mask register size considerations, yet the respective insns could be used without masking, in which case no direct dependency on BW exists.

4) Even in very obvious situations there does not appear to be any use of embedded broadcasting. Is this something that's planned, or something I can only make use of via inline assembly?

5) The VPTERNLOG* instructions look to be heavily underutilized. For one, I observe strange VPTERNLOG*/VMOVDQA* (and the like) pairs, where the latter uses zeroing-masking just to produce a mix of all-zeroes and all-ones vector elements, when the same effect could have been achieved by using zeroing-masking on the VPTERNLOG* right away. For another, afaict the instructions can be used (with a suitably calculated immediate) for any logical (bit-wise boolean) operation on up to three inputs for which no specific insn exists, yet even a simple ~ gets carried out by VPXOR-ing with a vector of all ones.

Thanks,
Jan
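
P.S. To illustrate what I mean by a "suitably calculated immediate" in 5), here is a minimal sketch in plain C (no AVX512 needed to build or run it). It relies on the SDM's definition that, per bit position, the destination, first source, and second source bits form a 3-bit index into the imm8. All names (TL_*, ternlog32(), ternlog_selftest()) are made up for this example and are not taken from GCC or any header.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Truth-table columns of the three inputs. Bit i of the immediate is the
 * result for index i = (a << 2) | (b << 1) | c, where a is the
 * destination (first input), b the first source and c the second source.
 */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Applying the wanted expression to the columns yields the immediate. */
enum {
    TL_NOT_A  = (uint8_t)~TL_A,                   /* ~a          */
    TL_XOR3   = TL_A ^ TL_B ^ TL_C,               /* a ^ b ^ c   */
    TL_SELECT = (TL_A & TL_B) | (~TL_A & TL_C)    /* a ? b : c   */
};

/* Software model of a single 32-bit VPTERNLOGD element, without masking. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
    uint32_t r = 0;

    for (unsigned int bit = 0; bit < 32; bit++) {
        unsigned int idx = (((a >> bit) & 1) << 2) |
                           (((b >> bit) & 1) << 1) |
                           ((c >> bit) & 1);

        r |= (uint32_t)((imm >> idx) & 1) << bit;
    }

    return r;
}

/* Check the model against the plain C expressions for arbitrary inputs. */
static void ternlog_selftest(void)
{
    uint32_t a = 0x12345678, b = 0x9abcdef0, c = 0x0f0f0f0f;

    assert(ternlog32(a, b, c, TL_NOT_A) == ~a);
    assert(ternlog32(a, b, c, TL_XOR3) == (a ^ b ^ c));
    assert(ternlog32(a, b, c, TL_SELECT) == ((a & b) | (~a & c)));
}
```

With this, a plain ~ would be a single VPTERNLOG* with immediate TL_NOT_A (using the same register for all three operands), with no all-ones constant to materialize.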