Hi Richard,

On 12/06/2026 18:14, Richard Sandiford wrote:
> Alex Coplan <[email protected]> writes:
> > Hi folks,
> >
> > Hoping for some input from Richard S here (or other AArch64 maintainers).  My
> > question is around the modes we use to reference ZA in the SME ACLE
> > implementation.
> >
> > I am particularly curious about the convention described in the following
> > comment above sme_2mode_function_t in aarch64-sve-builtins-functions.h:
> >
> > /* General SME unspec-based functions, parameterized on both the ZA mode
> >    and the vector mode.  If the elements of the ZA and vector modes are
> >    the same size (e.g. _za64_f64 or _za32_s32) then the two mode arguments
> >    are equal, otherwise the first mode argument is the single-vector integer
> >    mode associated with the ZA suffix and the second mode argument is the
> >    tuple mode associated with the vector suffix.  */
> > template<insn_code (*CODE) (int, machine_mode, machine_mode),
> >          insn_code (*CODE_SINGLE) (int, machine_mode, machine_mode)>
> > class sme_2mode_function_t : public
> > read_write_za<unspec_based_function_base>
> > {
> > [...]
> > }
> >
> > So essentially this means that for an FP intrinsic like
> > svmopa_za32_f32_m, we access ZA in an FP mode (VNx4SFmode), with an insn
> > like:
> >
> > (insn 9 8 0 2 (set (reg:VNx4SF 93 za)
> >         (unspec:VNx4SF [
> >                 (reg:VNx4SF 93 za)
> >                 (reg:DI 89 sme_state)
> >                 (const_int 0 [0])
> >                 (reg:VNx4BI 103) repeated x2
> >                 (reg/v:VNx4SF 101 [ zn ])
> >                 (reg/v:VNx4SF 102 [ zm ])
> >             ] UNSPEC_SME_FMOPA)) "t.c":6:5 15949
> > {aarch64_sme_fmopavnx4sfvnx4sf}
> >      (nil))
> >
> > but for a widening FP intrinsic like svmopa_za32_f16_m, we instead get
> > an integer mode for ZA (VNx4SImode):
> >
> > (insn 9 8 0 2 (set (reg:VNx4SI 93 za)
> >         (unspec:VNx4SI [
> >                 (reg:VNx4SI 93 za)
> >                 (reg:DI 89 sme_state)
> >                 (const_int 0 [0])
> >                 (reg:VNx4BI 103) repeated x2
> >                 (reg/v:VNx8HF 101 [ zn ])
> >                 (reg/v:VNx8HF 102 [ zm ])
> >             ] UNSPEC_SME_FMOPA)) "t.c":12:5 15959
> > {aarch64_sme_fmopavnx4sivnx8hf}
> >      (nil))
> >
> > which at first I found a little surprising, given that the underlying
> > instruction still interprets the ZA contents as floating point.
> >
> > I was curious about the rationale for this convention.  Possible
> > alternatives that come to mind are:
> >
> > (1) Always using an integer mode for ZA accesses (if it's OK to do it
> >     for the widening case above, why not always?)
> > (2) Match the ZA mode to the vector operands: so always use an FP mode
> >     of the appropriate width when the vector operands are FP operands, and
> >     otherwise use an integer mode.
> >
> > Of these, (2) seems the most natural to me, but I'm sure there's a good
> > reason that it's done the way it is.
>
> I don't think there's a perfect choice here.
>
> The mode of ZA is not interpreted strictly according to the usual RTL
> semantics.  That would be impossible with the current infrastructure,
> since the number of bytes depends on the VL squared.

Yeah.  IIUC this is because poly_ints can (quite reasonably!) only
represent first-order polynomials at the moment?

> Instead, the mode
> is supposedly just a convenience (although your question suggests it
> might fail there).

No, I haven't had a problem with this setup, it just stood out to me as
being somewhat unusual when reviewing other SME patches.

In what sense is it a convenience, though?  Just so we know the correct
element size from the mode used to reference ZA?

>
> This works since ZA is a fixed register and must always be accessed by
> unspecs that are opaque to target-independent code.
>
> It therefore doesn't matter whether the insn patterns use I modes or F modes.
>
> That being the case, there didn't seem any point in distinguishing
> between "ZA suffixes that map to an I mode" and "ZA suffixes that map
> to an F mode".  We might as well just have one set of ZA suffixes:
>
>   DEF_SME_ZA_SUFFIX (za8, 8, VNx16QImode)
>   DEF_SME_ZA_SUFFIX (za16, 16, VNx8HImode)
>   DEF_SME_ZA_SUFFIX (za32, 32, VNx4SImode)
>   DEF_SME_ZA_SUFFIX (za64, 64, VNx2DImode)
>   DEF_SME_ZA_SUFFIX (za128, 128, VNx1TImode)
>
> that map directly to the spec.
>
> That's the reason for not doing (2).  (2) would mean either (a) defining
> "integer ZA suffixes" and "FP ZA suffixes", or (b) encoding integerness
> or FPness in the function_base (meaning more variations of sme_2mode).

I agree that (with the current ISA) defining integer/FP ZA suffixes
doesn't make sense.  But (playing devil's advocate) couldn't
sme_2mode_function_t just have a helper, say:

  /* X and Y are both vector modes.  Return a vector mode that is like X in
     element size and NUNITS, but if X and Y disagree on the FPness of
     elements, make X agree with Y in this sense.  */
  machine_mode match_fpness (machine_mode x, machine_mode y);

which is then used in sme_2mode_function_t::expand (passing za_mode and
v_mode)?  I'm guessing it's not quite as simple as that, though.
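
For concreteness, here is a rough, self-contained sketch of the behaviour
I have in mind.  It deliberately uses a toy mode table rather than GCC's
real machine_mode API: toy_mode, toy_mode_info and the table entries are
purely illustrative assumptions, and a real version would presumably go
through the existing mode-query helpers instead.

```cpp
// Toy stand-in for GCC's machine_mode: illustrative only, not the real API.
enum toy_mode { VNx8HI, VNx4SI, VNx2DI, VNx8HF, VNx4SF, VNx2DF, NUM_TOY_MODES };

struct toy_mode_info
{
  unsigned elt_bits;    // Element size in bits.
  unsigned min_nunits;  // Minimum number of elements (at the smallest VL).
  bool is_fp;           // True for floating-point element types.
};

// Indexed by toy_mode, in enumerator order.
static const toy_mode_info toy_modes[NUM_TOY_MODES] = {
  { 16, 8, false },  // VNx8HI
  { 32, 4, false },  // VNx4SI
  { 64, 2, false },  // VNx2DI
  { 16, 8, true },   // VNx8HF
  { 32, 4, true },   // VNx4SF
  { 64, 2, true },   // VNx2DF
};

// Return a mode with the same element size and NUNITS as X, but whose
// FPness matches that of Y.  If the table has no such mode, return X
// unchanged (an arbitrary choice made for this sketch).
toy_mode
match_fpness (toy_mode x, toy_mode y)
{
  const toy_mode_info &xi = toy_modes[x];
  const bool want_fp = toy_modes[y].is_fp;
  if (xi.is_fp == want_fp)
    return x;
  for (int i = 0; i < NUM_TOY_MODES; ++i)
    {
      const toy_mode_info &ci = toy_modes[i];
      if (ci.is_fp == want_fp
          && ci.elt_bits == xi.elt_bits
          && ci.min_nunits == xi.min_nunits)
        return static_cast<toy_mode> (i);
    }
  return x;
}
```

With this, the za32_f16 case would give match_fpness (VNx4SI, VNx8HF)
== VNx4SF, while a za32_s32 case would leave VNx4SI untouched.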

>
> (1) would indeed be OK, which is why that is essentially the underlying
> function_instance encoding.  But it would mean that FP instructions
> that operate on a single datatype would nevertheless need to be
> parameterised on two different modes.
>
> And the way that "@" patterns work is that it is always the iterator that
> is passed in place of "<...>", even if the "<...>" is a mode attribute.
> Thus it would not be enough to have:
>
>   (define_insn "@aarch64_<op><FP_ITERATOR:int_equivalent><FP_ITERATOR:mode>"
>     ...)
>
> We would need to have two separate iterators: one integer and one FP:
>
>   (define_insn "@aarch64_<op><INT_ITERATOR:mode><FP_ITERATOR:mode>" ...)
>
> and use C++ conditions to make sure that they have the same element size.
>
> Although we do use that type of C++ condition for some mode combinations,
> it's better not to lean on it too much, since all combinations do still
> exist in a sense.  It's just that the generators make some attempt to
> compile out unneeded combinations.
>
> Also, sme_2mode's current approach is consistent with sme_1mode in cases
> where the ZA element size matches the vector element size.  This means
> that an intrinsic could be converted from sme_1mode to sme_2mode for
> later extensions without having to change the existing patterns.
> (1) would only achieve that if we standardised on integer ZA modes for
> all intrinsics, not just sme_2mode ones, which seemed like an extra
> level of complication.

I see, thanks a lot for the detailed explanation.

Alex

>
> Thanks,
> Richard
