Hi Richard,

On 12/06/2026 18:14, Richard Sandiford wrote:
> Alex Coplan <[email protected]> writes:
> > Hi folks,
> >
> > Hoping for some input from Richard S here (or other AArch64 maintainers). My
> > question is around the modes we use to reference ZA in the SME ACLE
> > implementation.
> >
> > I am particularly curious about the convention described in the following
> > comment above sme_2mode_function_t in aarch64-sve-builtins-functions.h:
> >
> > /* General SME unspec-based functions, parameterized on both the ZA mode
> >    and the vector mode.  If the elements of the ZA and vector modes are
> >    the same size (e.g. _za64_f64 or _za32_s32) then the two mode arguments
> >    are equal, otherwise the first mode argument is the single-vector integer
> >    mode associated with the ZA suffix and the second mode argument is the
> >    tuple mode associated with the vector suffix.  */
> > template<insn_code (*CODE) (int, machine_mode, machine_mode),
> >          insn_code (*CODE_SINGLE) (int, machine_mode, machine_mode)>
> > class sme_2mode_function_t : public read_write_za<unspec_based_function_base>
> > {
> >   [...]
> > }
> >
> > So essentially this means that for an FP intrinsic like
> > svmopa_za32_f32_m, we access ZA in an FP mode (VNx4SFmode), with an insn
> > like:
> >
> > (insn 9 8 0 2 (set (reg:VNx4SF 93 za)
> >         (unspec:VNx4SF [
> >                 (reg:VNx4SF 93 za)
> >                 (reg:DI 89 sme_state)
> >                 (const_int 0 [0])
> >                 (reg:VNx4BI 103) repeated x2
> >                 (reg/v:VNx4SF 101 [ zn ])
> >                 (reg/v:VNx4SF 102 [ zm ])
> >             ] UNSPEC_SME_FMOPA)) "t.c":6:5 15949 {aarch64_sme_fmopavnx4sfvnx4sf}
> >      (nil))
> >
> > but for a widening FP intrinsic like svmopa_za32_f16_m, we instead get
> > an integer mode for ZA (VNx4SImode):
> >
> > (insn 9 8 0 2 (set (reg:VNx4SI 93 za)
> >         (unspec:VNx4SI [
> >                 (reg:VNx4SI 93 za)
> >                 (reg:DI 89 sme_state)
> >                 (const_int 0 [0])
> >                 (reg:VNx4BI 103) repeated x2
> >                 (reg/v:VNx8HF 101 [ zn ])
> >                 (reg/v:VNx8HF 102 [ zm ])
> >             ] UNSPEC_SME_FMOPA)) "t.c":12:5 15959 {aarch64_sme_fmopavnx4sivnx8hf}
> >      (nil))
> >
> > which at first I found a little surprising, given that the underlying
> > instruction still interprets the ZA contents as floating point.
> >
> > I was curious about the rationale for this convention.  Possible
> > alternatives that come to mind are:
> >
> > (1) Always using an integer mode for ZA accesses (if it's OK to do it
> >     for the widening case above, why not always?)
> > (2) Match the ZA mode to the vector operands: so always use an FP mode
> >     of the appropriate width when the vector operands are FP operands, and
> >     otherwise use an integer mode.
> >
> > Of these, (2) seems the most natural to me, but I'm sure there's a good
> > reason that it's done the way it is.
> 
> I don't think there's a perfect choice here.
> 
> The mode of ZA is not interpreted strictly according to the usual RTL
> semantics.  That would be impossible with the current infrastructure,
> since the number of bytes depends on the VL squared.

Yeah. IIUC this is because poly_ints can (quite reasonably!) only
represent first-order polynomials at the moment?
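
To make that concrete, here is a toy sketch (not GCC's poly_int; the
struct and function names are made up for illustration).  It assumes a
vector length of 16 + 16x bytes, with x the runtime indeterminate, and
uses the fact that ZA is a square array of VL x VL bytes:

```cpp
// Toy model only: poly1 stands in for a two-coefficient poly_int,
// i.e. the value c0 + c1 * x for a runtime indeterminate x.
struct poly1 { long c0, c1; };

// The product of two first-order polynomials needs a third coefficient.
struct poly2 { long c0, c1, c2; };

constexpr poly2 mul (poly1 a, poly1 b)
{
  return { a.c0 * b.c0,
	   a.c0 * b.c1 + a.c1 * b.c0,
	   a.c1 * b.c1 };
}

// Assumed vector length in bytes: 16 + 16x.
constexpr poly1 vl_bytes = { 16, 16 };

// ZA size in bytes: (16 + 16x)^2 = 256 + 512x + 256x^2.
constexpr poly2 za_bytes = mul (vl_bytes, vl_bytes);
```

The nonzero x^2 coefficient is the part that a two-coefficient
representation has no room for.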

> Instead, the mode
> is supposedly just a convenience (although your question suggests it
> might fail there).

No, I haven't had a problem with this setup; it just stood out to me as
being somewhat unusual when reviewing other SME patches.  In what sense
is it a convenience, though?  Just so we know the correct element size
from the mode used to reference ZA?

> 
> This works since ZA is a fixed register and must always be accessed by
> unspecs that are opaque to target-independent code.
> 
> It therefore doesn't matter whether the insn patterns use I modes or F modes.
> 
> That being the case, there didn't seem any point in distinguishing
> between "ZA suffixes that map to an I mode" and "ZA suffixes that map
> to an F mode".  We might as well just have one set of ZA suffixes:
> 
> DEF_SME_ZA_SUFFIX (za8, 8, VNx16QImode)
> DEF_SME_ZA_SUFFIX (za16, 16, VNx8HImode)
> DEF_SME_ZA_SUFFIX (za32, 32, VNx4SImode)
> DEF_SME_ZA_SUFFIX (za64, 64, VNx2DImode)
> DEF_SME_ZA_SUFFIX (za128, 128, VNx1TImode)
> 
> that map directly to the spec.
> 
> That's the reason for not doing (2).  (2) would mean either (a) defining
> "integer ZA suffixes" and "FP ZA suffixes", or (b) encoding integerness
> or FPness in the function_base (meaning more variations of sme_2mode).

I agree that (with the current ISA) defining integer/FP ZA suffixes
doesn't make sense. But (playing devil's advocate) couldn't
sme_2mode_function_t just have a helper, say:

/* X and Y are both vector modes.  Return a vector mode that is like X
   in element size and NUNITS, but if X and Y disagree on the FPness of
   elements, make X agree with Y in this sense.  */

machine_mode match_fpness (machine_mode x, machine_mode y);

which is then used in sme_2mode_function_t::expand (passing za_mode and
v_mode)?  I'm guessing it's not quite as simple as that, though.
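
For illustration, a standalone sketch of the intended behaviour.  This is
a toy model, not GCC code: toy_mode, is_float, elt_bits and with_fpness
are hypothetical stand-ins for machine_mode and the real mode queries,
and only the 16-, 32- and 64-bit SVE modes are covered:

```cpp
// Toy stand-in for machine_mode, restricted to a handful of SVE modes.
enum class toy_mode { VNx8HI, VNx4SI, VNx2DI, VNx8HF, VNx4SF, VNx2DF };

// True if the elements of M are floating-point.
constexpr bool is_float (toy_mode m)
{
  return (m == toy_mode::VNx8HF
	  || m == toy_mode::VNx4SF
	  || m == toy_mode::VNx2DF);
}

// Element size of M in bits.
constexpr unsigned elt_bits (toy_mode m)
{
  return (m == toy_mode::VNx8HI || m == toy_mode::VNx8HF) ? 16
	 : (m == toy_mode::VNx4SI || m == toy_mode::VNx4SF) ? 32
	 : 64;
}

// The mode with BITS-bit elements and the requested FPness.
constexpr toy_mode with_fpness (unsigned bits, bool fp)
{
  return bits == 16 ? (fp ? toy_mode::VNx8HF : toy_mode::VNx8HI)
	 : bits == 32 ? (fp ? toy_mode::VNx4SF : toy_mode::VNx4SI)
	 : (fp ? toy_mode::VNx2DF : toy_mode::VNx2DI);
}

// Keep X's element size, but take the FPness of Y.
constexpr toy_mode match_fpness (toy_mode x, toy_mode y)
{
  return with_fpness (elt_bits (x), is_float (y));
}
```

With this, the svmopa_za32_f16_m case above (za_mode VNx4SI, v_mode VNx8HF)
would come out as VNx4SF, while same-size cases such as VNx4SF/VNx4SF are
left unchanged.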

> 
> (1) would indeed be OK, which is why that is essentially the underlying
> function_instance encoding.  But it would mean that FP instructions
> that operate on a single datatype would nevertheless need to be
> parameterised on two different modes.
> 
> And the way that "@" patterns work is that it is always the iterator that
> is passed in place of "<...>", even if the "<...>" is a mode attribute.
> Thus it would not be enough to have:
> 
> (define_insn "@aarch64_<op><FP_ITERATOR:int_equivalent><FP_ITERATOR:mode>" ...)
> 
> We would need to have two separate iterators: one integer and one FP:
> 
> (define_insn "@aarch64_<op><INT_ITERATOR:mode><FP_ITERATOR:mode>" ...)
> 
> and use C++ conditions to make sure that they have the same element size.
> 
> Although we do use that type of C++ condition for some mode combinations,
> it's better not to lean on it too much, since all combinations do still
> exist in a sense.  It's just that the generators make some attempt to
> compile out unneeded combinations.
> 
> Also, sme_2mode's current approach is consistent with sme_1mode in cases
> where the ZA element size matches the vector element size.  This means
> that an intrinsic could be converted from sme_1mode to sme_2mode for
> later extensions without having to change the existing patterns.
> (1) would only achieve that if we standardised on integer ZA modes for
> all intrinsics, not just sme_2mode ones, which seemed like an extra
> level of complication.

I see, thanks a lot for the detailed explanation.

Alex

> 
> Thanks,
> Richard
