On 24/04/2023 09:33, Richard Sandiford via Gcc-patches wrote:
Richard Sandiford <richard.sandif...@arm.com> writes:
Tamar Christina <tamar.christ...@arm.com> writes:
Hi All,

This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.

I am sending up this RFC to get feedback for it's inclusion in GCC 14.
With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.

The idea behind this compact syntax is that often times it's quite hard to
correlate the entries in the constrains list, attributes and instruction lists.

One has to count and this often is tedious.  Additionally when changing a single
line in the insn multiple lines in a diff change, making it harder to see what's
going on.

This new syntax takes into account many of the common things that are done in MD
files.   It's also worth saying that this version is intended to deal with the
common case of a string based alternatives.   For C chunks we have some ideas
but those are not intended to be addressed here.

It's easiest to explain with an example:

normal syntax:

(define_insn_and_split "*movsi_aarch64"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  r,  
r,  r, w,r,w, w")
        (match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
    mov\\t%w0, %w1
    mov\\t%w0, %w1
    mov\\t%w0, %w1
    mov\\t%w0, %1
    #
    * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
    ldr\\t%w0, %1
    ldr\\t%s0, %1
    str\\t%w1, %0
    str\\t%s1, %0
    adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
    adr\\t%x0, %c1
    adrp\\t%x0, %A1
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1
    * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
SImode)
     && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
    [(const_int 0)]
    "{
        aarch64_expand_mov_immediate (operands[0], operands[1]);
        DONE;
     }"
   ;; The "mov_imm" type for CNT is just a placeholder.
   [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
                    
load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
    (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
    (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
]
)

New syntax:

(define_insn_and_split "*movsi_aarch64"
   [(set (match_operand:SI 0 "nonimmediate_operand")
        (match_operand:SI 1 "aarch64_mov_operand"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@@ (cons: 0 1; attrs: type arch length)
    [=r, r  ; mov_reg  , *   , 4] mov\t%w0, %w1
    [k , r  ; mov_reg  , *   , 4] ^
    [r , k  ; mov_reg  , *   , 4] ^
    [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
    [r , n  ; mov_imm  , *   , *] #
    [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ('cnt', 
'%x0', operands[1]);
    [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
    [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
    [m , rZ ; store_4  , *   , 4] str\t%w1, %0
    [m , w  ; store_4  , fp  , 4] str\t%s1, %0
    [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
    [r , Usa; adr      , *   , 4] adr\t%x0, %c1
    [r , Ush; adr      , *   , 4] adrp\t%x0, %A1
    [w , rZ ; f_mcr    , fp  , 4] fmov\t%s0, %w1
    [r , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
    [w , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
    [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], SImode);"
   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
SImode)
     && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(const_int 0)]
   {
     aarch64_expand_mov_immediate (operands[0], operands[1]);
     DONE;
   }
   ;; The "mov_imm" type for CNT is just a placeholder.
)

The patch contains some more rewritten examples for both Arm and AArch64.  I
have included them for examples in this RFC but the final version posted in
GCC 14 will have these split out.

The main syntax rules are as follows (See docs for full rules):
   - Template must start with "@@" to use the new syntax.
   - "@@" is followed by a layout in parentheses which is "cons:" followed by
     a list of match_operand/match_scratch IDs, then a semicolon, then the
     same for attributes ("attrs:"). Both sections are optional (so you can
     use only cons, or only attrs, or both), and cons must come before attrs
     if present.
   - Each alternative begins with any amount of whitespace.
   - Following the whitespace is a comma-separated list of constraints and/or
     attributes within brackets [], with sections separated by a semicolon.
   - Following the closing ']' is any amount of whitespace, and then the actual
     asm output.
   - Spaces are allowed in the list (they will simply be removed).
   - All alternatives should be specified: a blank list should be
     "[,,]", "[,,;,]" etc., not "[]" or "" (however genattr may segfault if
     you leave certain attributes empty, I have found).
   - The actual constraint string in the match_operand or match_scratch, and
     the attribute string in the set_attr, must be blank or an empty string
     (you can't combine the old and new syntaxes).
   - The common idion * return can be shortened by using <<.
   - Any unexpanded iterators left during processing will result in an error at
     compile time.   If for some reason <> is needed in the output then these
     must be escaped using \.
   - Inside a @@ block '' is treated as "" when there are multiple characters
     inside the single quotes.  This version does not handle multi byte literals
     like specifying characters as their numerical encoding, like \003 nor does
     it handle unicode, especially multibyte encodings.  This feature may be 
more
     trouble than it's worth so have no finished it off, however this means one
     can use 'foo' instead of \"foo\" to denote a multicharacter string.
   - Inside an @@ block any unexpanded iterators will result in a compile time
     fault instead of incorrect assembly being generated at runtime.  If the
     literal <> is needed in the output this needs to be escaped with \<\>.
   - This check is not performed inside C blocks (lines starting with *).
   - Instead of copying the previous instruction again in the next pattern, one
     can use ^ to refer to the previous asm string.

Thanks for doing this.  The new syntax seems like a clear improvement
for complex patterns like movs.

Some comments/suggestions:

- From a style perspective, out-of-order constraints should IMO be strongly
   discouraged.  The asm string uses %0, %1, %2 etc. to refer to operands,
   and having that directly after a list that puts the constraints in
   a different order (such as [%2, %0, %1]) would IMO be very confusing.

   I agree there might be cases where dropping constraints makes sense.
   But I think in general we should encourage all constraints to be
   specified, and be specified in order.  And that's likely to be the
   natural choice in an overwhelming majority of cases anyway.

   So how about having a simpler syntax for the first line when all
   constraints are specified in order?  Maybe just "cons" (without the
   colon or numbers).

Alternatively: leading "=" and "+" characters describe the operand
as a whole, rather than individual alternatives.  So maybe "=" and
"+" should be in the "header" (the first line) rather than the first
"row"/alternative.  That would then be a justification for keeping
the operand numbers even for the simple case.

E.g.:

    cons: =0 1


Yes, that seems a sensible idea.

Also, it would be good if the header and rows were consistent about
whether they use comma separators or whitespace separators.  At the
moment, the headers are whitespace-separated while the rows are
comma-separated.

On that: it might be possible to use whitespace-separated columns even
for constraints, since an empty constraint should be equivalent to X.
That would look like:

     [r  r  ; mov_reg   *    4] mov\t%w0, %w1
     [k  r  ; mov_reg   *    4] ^
     [r  k  ; mov_reg   *    4] ^
     [r  M  ; mov_imm   *    4] mov\t%w0, %1
     [r  n  ; mov_imm   *    *] #
     [r  Usv; mov_imm   sve  4] << aarch64_output_sve_cnt_immediate ('cnt', 
'%x0', operands[1]);
     [r  m  ; load_4    *    4] ldr\t%w0, %1
     [w  m  ; load_4    fp   4] ldr\t%s0, %1
     [m  rZ ; store_4   *    4] str\t%w1, %0
     [m  w  ; store_4   fp   4] str\t%s1, %0
     [r  Usw; load_4    *    8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
     [r  Usa; adr       *    4] adr\t%x0, %c1
     [r  Ush; adr       *    4] adrp\t%x0, %A1
     [w  rZ ; f_mcr     fp   4] fmov\t%s0, %w1
     [r  w  ; f_mrc     fp   4] fmov\t%w0, %s1
     [w  w  ; fmov      fp   4] fmov\t%s0, %s1
     [w  Ds ; neon_move simd 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], SImode);"

Keeping the commas means that when an alternative takes two constraints (you can conform to either) it's possible to put white space between them, so "rZ" can be written as "r Z" to make it clear that it's not a multi-letter constraint. If white space becomes the delimiter then you must write "rZ" with no separator.



I'm not sure whether that's better or worse though.  Keeping commas is
fine by me FWIW, as long as we do it in the header too.

Thanks,
Richard

R.

Reply via email to