Kirill, others,

in the course of putting together test harness extensions for AVX512
additions to the Xen hypervisor's built-in instruction emulator I've
come across a number of issues. Since it may well be that I simply
don't know the full background, I thought I'd inquire first rather
than filing bugzilla entries for all of them:

1) An initial idea of mine was to use -ffixed-* to force the use of the
   high 16 {x,y,z}mm registers (effectively by disallowing the use of
   the lower ones), such that I'd easily get EVEX encoded insns for
   whatever is possible to be EVEX-encoded with the given -mavx512*
   option(s). This doesn't even come close to working - all sorts of
   internal compiler errors result for all but the most trivial
   examples, most notably with AVX512VL support enabled. I can't
   observe similar bad effects from using -ffixed-* for other register
   sub-groups. I realize the interactions between the various insns
   the *.md files provide may be difficult to sort out, and perhaps
   the root cause is the same as that of bug 87354, but is this really
   something that's not supposed to work?

2) There looks to be quite a widespread mixup of Yk and k constraints on
   insns. Most instructions having mask register outputs can very
   well use %k0, yet they're commonly using "=Yk". Exceptions are
   scatter/gather insns only, afaict. And of course insns using
   destination field masking have to use "Yk" inputs. Is there
   anything I'm overlooking here that prevents "=k" from being used as
   outlined?

2b) Both k and Yk are marked @internal in constraints.md, suggesting
    (to me) that I'm not supposed to use these constraints in inline
    asm() constructs. If that inference of mine is correct, how would
    I express the respective constraints?

3) Certain AVX512_VBMI2, AVX512_BITALG, and GFNI+AVX512F inline
   functions are unavailable without AVX512BW also enabled (contrary to
   what SDM, XED, and binutils/gas imply, and unlike with AVX512_VBMI).
   I can see why VBMI implies BW even though the SDM doesn't suggest so,
   but if this is done, other ISA extensions imo should also enable BW
   if need be, rather than hiding part of their inline/builtin helpers. Or the
   opposite position should be taken and no such implications should be
   made at all - aiui they're there solely for mask register size
   considerations, yet the respective insns could be used without
   masking, in which case no direct dependency on BW exists.
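
   To make the mask register size consideration concrete, here is a
   minimal sketch. The function names are made up for illustration, and
   it assumes that AVX512F on its own provides 16-bit mask operations
   while the 32- and 64-bit forms come with AVX512BW. Only masked uses
   of byte/word element insns on wider vectors would then need BW.

```c
#include <assert.h>

/* Number of mask bits needed to predicate every element of a vector. */
unsigned mask_bits_needed(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/*
 * Returns 1 if a masked operation on such a vector needs more than the
 * 16 mask bits assumed to be available with AVX512F alone (i.e. it would
 * want the 32-/64-bit mask forms of AVX512BW), 0 otherwise.
 * An unmasked use of the same insn has no such requirement.
 */
int masking_needs_bw(unsigned vector_bits, unsigned element_bits)
{
    return mask_bits_needed(vector_bits, element_bits) > 16;
}
```

   For example, byte elements in a 512-bit vector need 64 mask bits,
   whereas dword elements in the same vector need only 16.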

4) Even in very obvious situations there does not appear to be any
   use of embedded broadcasting. Is this something that's planned,
   or something I can only possibly make use of using inline assembly?
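
   As an illustration of what such an "obvious situation" might look
   like (this is an assumed example, not one taken from the actual test
   harness; the function name is made up), consider a loop applying one
   scalar loaded from memory to every array element:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Multiply every element of dst[] by the single float *factor points to.
 *
 * When vectorized for AVX512F (e.g. with -O3 -mavx512f), the multiply
 * could in principle take the scalar as an embedded-broadcast memory
 * operand, along the lines of
 *     vmulps (%rsi){1to16}, %zmm0, %zmm0
 * instead of a separate VBROADCASTSS followed by a register-register
 * VMULPS. What a given compiler version actually emits is not verified
 * here.
 */
void scale_in_place(float *restrict dst, const float *restrict factor,
                    size_t n)
{
    assert(n == 0 || (dst != NULL && factor != NULL));
    for (size_t i = 0; i < n; ++i)
        dst[i] *= *factor;
}
```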

5) The VPTERNLOG* instructions look to be heavily underutilized. For
   one, I observe strange VPTERNLOG*/VMOVDQA* (and alike) pairs,
   where the latter uses zeroing-masking just to produce a mix of
   all-zeroes and all-ones vector elements, when the same effect
   could have been achieved by using zeroing-masking on the
   VPTERNLOG* right away. For another, afaict the instructions can
   be used for any up to 3-way logical (bit-wise boolean) operation
   for which no specific insn exists (with a suitably calculated
   immediate), yet even a simple ~ gets carried out by VPXOR-ing
   with a vector of all ones.
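
   For reference, a minimal sketch of how such an immediate can be
   calculated. The macro and function names are made up for
   illustration. It relies on the imm8 being a truth table indexed by
   (a << 2) | (b << 1) | c, where a is the bit from the destination
   (first source), b the one from the second, and c the one from the
   third source operand. Applying the desired boolean expression to the
   three constant bit patterns below then yields the immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table columns of the three inputs, indexed by (a<<2)|(b<<1)|c. */
#define TERN_A 0xF0u  /* destination / first source */
#define TERN_B 0xCCu  /* second source */
#define TERN_C 0xAAu  /* third source */

/* Result bit VPTERNLOG* produces for one bit position, given imm8. */
unsigned ternlog_bit(uint8_t imm8, unsigned a, unsigned b, unsigned c)
{
    assert(a <= 1 && b <= 1 && c <= 1);
    return (imm8 >> ((a << 2) | (b << 1) | c)) & 1u;
}

/* ~a: plain bit-wise NOT of the first operand. */
uint8_t ternlog_imm_not_a(void)
{
    return (uint8_t)~TERN_A;
}

/* a ^ b ^ c: three-way XOR. */
uint8_t ternlog_imm_xor3(void)
{
    return (uint8_t)(TERN_A ^ TERN_B ^ TERN_C);
}

/* a ? b : c, i.e. a bit-wise select with a as the selector. */
uint8_t ternlog_imm_select(void)
{
    return (uint8_t)((TERN_A & TERN_B) | (~TERN_A & TERN_C));
}
```

   The three helpers evaluate to 0x0F, 0x96, and 0xCA respectively.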

Thanks, Jan

