Re: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

Richard Sandiford via Gcc-patches Wed, 14 Jun 2023 12:41:41 -0700

Tamar Christina <tamar.christ...@arm.com> writes:
> +The syntax rules are as follows:
> +@itemize @bullet
> +@item
> +Templates must start with @samp{@{@@} to use the new syntax.
> +
> +@item
> +@samp{@{@@} is followed by a layout in parentheses which is @samp{cons:}


s/parentheses/square brackets/

> +followed by a comma-separated list of 
> @code{match_operand}/@code{match_scratch}
> +operand numbers, then a semicolon, followed by the same for attributes
> +(@samp{attrs:}).  Operand modifiers can be placed in this section group as 
> well.

How about:

  Operand modifiers like @code{=} and @code{+} can be placed before
  an operand number.

> +Both sections are optional (so you can use only @samp{cons}, or only
> +@samp{attrs}, or both), and @samp{cons} must come before @samp{attrs} if
> +present.
> +
> +@item
> +Each alternative begins with any amount of whitespace.
> +
> +@item
> +Following the whitespace is a comma-separated list of "constraints" and/or
> +"attributes" within brackets @code{[]}, with sections separated by a 
> semicolon.
> +
> +@item
> +Should you want to copy the previous asm line, the symbol @code{^} can be 
> used.
> +This allows less copy pasting between alternative and reduces the number of
> +lines to update on changes.
> +
> +@item
> +When using C functions for output, the idiom @samp{* return @var{function};}
> +can be replaced with the shorthand @samp{<< @var{function};}.
> +
> +@item
> +Following the closing @samp{]} is any amount of whitespace, and then the 
> actual
> +asm output.
> +
> +@item
> +Spaces are allowed in the list (they will simply be removed).
> +
> +@item
> +All constraint alternatives should be specified.  For example, a list of
> +of three blank alternatives should be written @samp{[,,]} rather than
> +@samp{[]}.
> +
> +@item
> +All attribute alternatives should be non-empty, with @samp{*}
> +representing the default attribute value.  For example, a list of three
> +default attribute values should be written @samp{[*,*,*]} rather than
> +@samp{[]}.
> +
> +

Nit: too many blank lines.

> +@item
> +Within an @samp{@{@@} block both multiline and singleline C comments are
> +allowed, but when used outside of a C block they must be the only 
> non-whitespace
> +blocks on the line.
> +
> +@item
> +Within an @samp{@{@@} block, any iterators that do not get expanded will 
> result
> +in an error.  If for some reason it is required to have @code{<} or @code{>} 
> in
> +the output then these must be escaped using @backslashchar{}.
> +
> +@item
> +It is possible to use the @samp{attrs} list to specify some attributes and to
> +use the normal @code{set_attr} syntax to specify other attributes.  There 
> must
> +not be any overlap between the two lists.
> +
> +In other words, the following is valid:
> +@smallexample
> +@group
> +(define_insn_and_split ""
> +  [(set (match_operand:SI 0 "nonimmediate_operand")
> +       (match_operand:SI 1 "aarch64_mov_operand"))]
> +  ""
> +  @{@@ [cons: 0, 1; attrs: type, arch, length]@}
> +  @dots{}
> +  [(set_attr "foo" "mov_imm")]
> +)
> +@end group
> +@end smallexample
> +
> +but this is not valid:
> +@smallexample
> +@group
> +(define_insn_and_split ""
> +  [(set (match_operand:SI 0 "nonimmediate_operand")
> +       (match_operand:SI 1 "aarch64_mov_operand"))]
> +  ""
> +  @{@@ [cons: 0, 1; attrs: type, arch, length]@}
> +  @dots{}
> +  [(set_attr "arch" "bar")
> +   (set_attr "foo" "mov_imm")]
> +)
> +@end group
> +@end smallexample
> +
> +because you can't mix and match new and old syntax.

Maybe “because it specifies @code{arch} twice”?  Suggesting that because
“new” and “old” tend not to age well.

> +/* Add constraints to an rtx.  This function is similar to 
> remove_constraints.
> +   Errors if adding the constraints would overwrite existing constraints.  */
> +
> +static void
> +add_constraints (rtx part, file_location loc, vec_conlist &cons)
> +{
> +  const char *format_ptr;
> +
> +  if (part == NULL_RTX)
> +    return;
> +
> +  /* If match_op or match_scr, check if we have the right one, and if so, 
> copy
> +     over the constraint list.  */
> +  if (GET_CODE (part) == MATCH_OPERAND || GET_CODE (part) == MATCH_SCRATCH)
> +    {
> +      int field = GET_CODE (part) == MATCH_OPERAND ? 2 : 1;
> +      unsigned id = XINT (part, 0);
> +
> +      if (id >= cons.size ())
> +       fatal_at (loc, "could not find match_operand/scratch with id %d", id);

Is this an error?  I thought it should be treated like...

> +
> +      if (cons[id].idx == -1)
> +       return;

...cons[id].idx == -1 is here.  I.e. I think they could be combined to:

  if (ids >= cons.size () || cons[id].idx == -1)
    return;

> +
> +      if (XSTR (part, field)[0] != '\0')
> +       {
> +         error_at (loc, "can't mix normal and compact constraint syntax");
> +         return;
> +       }
> +      XSTR (part, field) = cons[id].out ();
> +      cons[id].idx = -1;
> +    }
> +
> +  format_ptr = GET_RTX_FORMAT (GET_CODE (part));
> +
> +  /* Recursively search the rtx.  */
> +  for (int i = 0; i < GET_RTX_LENGTH (GET_CODE (part)); i++)
> +    switch (*format_ptr++)
> +      {
> +      case 'e':
> +      case 'u':
> +       add_constraints (XEXP (part, i), loc, cons);
> +       break;
> +      case 'E':
> +       if (XVEC (part, i) != NULL)
> +         for (int j = 0; j < XVECLEN (part, i); j++)
> +           add_constraints (XVECEXP (part, i, j), loc, cons);
> +       break;
> +      default:
> +       continue;
> +      }
> +}
> +
> +/* Add ATTRS to definition X's attribute list.  */
> +
> +static void
> +add_attributes (rtx x, vec_conlist &attrs)
> +{
> +  unsigned int attr_index = GET_CODE (x) == DEFINE_INSN ? 4 : 3;
> +  rtvec orig = XVEC (x, attr_index);
> +  size_t n_curr = orig ? XVECLEN (x, attr_index) : 0;
> +  rtvec copy = rtvec_alloc (n_curr + attrs.size ());
> +
> +  /* Create a shallow copy of existing entries.  */
> +  memcpy (&copy->elem[attrs.size ()], &orig->elem[0], sizeof (rtx) * n_curr);

Sorry for not noticing last time, but I think this should strictly
be guarded by:

  if (orig)

to avoid calculating &orig->elem[0] on a null pointer.

> +  XVEC (x, attr_index) = copy;
> +
> +  /* Create the new elements.  */
> +  for (unsigned i = 0; i < attrs.size (); i++)
> +    {
> +      rtx attr = rtx_alloc (SET_ATTR);
> +      XSTR (attr, 0) = xstrdup (attrs[i].name.c_str ());
> +      XSTR (attr, 1) = attrs[i].out ();
> +      XVECEXP (x, attr_index, i) = attr;
> +    }
> +}
> +
> +/* Consumes spaces and tabs.  */
> +
> +static inline void
> +skip_spaces (const char **str)
> +{
> +  while (ISBLANK (**str))
> +    (*str)++;
> +}
> +
> +/* Consumes the given character, if it's there.  */
> +
> +static inline bool
> +expect_char (const char **str, char c)
> +{
> +  if (**str != c)
> +    return false;
> +  (*str)++;
> +  return true;
> +}
> +
> +/* Parses the section layout that follows a "{@" if using new syntax. Builds
> +   a vector for a single section. E.g. if we have "attrs: length, arch]..."
> +   then list will have two elements, the first for "length" and the second
> +   for "arch".  */
> +
> +static void
> +parse_section_layout (file_location loc, const char **templ, const char 
> *label,
> +                     vec_conlist &list, bool numeric)
> +{
> +  const char *name_start;
> +  size_t label_len = strlen (label);
> +  if (strncmp (label, *templ, label_len) == 0)
> +    {
> +      *templ += label_len;
> +
> +      /* Gather the names.  */
> +      while (**templ != ';' && **templ != ']')
> +       {
> +         skip_spaces (templ);
> +         name_start = *templ;
> +         int len = 0;
> +         char val = (*templ)[len];
> +         while (val != ',' && val != ';' && val != ']')
> +           {
> +             if (val == 0 || val == '\n')
> +               fatal_at (loc, "missing ']'");
> +             val = (*templ)[++len];
> +           }
> +         *templ += len;
> +         if (val == ',')
> +           (*templ)++;
> +         list.push_back (conlist (name_start, len, numeric));
> +       }
> +    }
> +}
> +
> +/* Parse a section, a section is defined as a named space separated list, 
> e.g.
> +
> +   foo: a, b, c
> +
> +   is a section named "foo" with entries a, b and c.  */
> +
> +static void
> +parse_section (const char **templ, unsigned int n_elems, unsigned int alt_no,
> +              vec_conlist &list, file_location loc, const char *name)
> +{
> +  unsigned int i;
> +
> +  /* Go through the list, one character at a time, adding said character
> +     to the correct string.  */
> +  for (i = 0; **templ != ']' && **templ != ';'; (*templ)++)
> +    if (!ISBLANK (**templ))
> +      {
> +       list[i].add (**templ);
> +       if (**templ == ',')
> +         {
> +           ++i;
> +           if (i == n_elems)
> +             fatal_at (loc, "too many %ss in alternative %d: expected %d",
> +                       name, alt_no, n_elems);
> +         }
> +       if (**templ == 0 || **templ == '\n')
> +         fatal_at (loc, "missing ']'");
> +      }

I think it'd be more obvious for the NIL and EOL check to come first,
so that we don't do anything with invalid chars:

  /* Go through the list, one character at a time, adding said character
     to the correct string.  */
  for (i = 0; **templ != ']' && **templ != ';'; (*templ)++)
    if (!ISBLANK (**templ))
      {
        if (**templ == 0 || **templ == '\n')
          fatal_at (loc, "missing ']'");
        list[i].add (**templ);
        if (**templ == ',')
          {
            ++i;
            if (i == n_elems)
              fatal_at (loc, "too many %ss in alternative %d: expected %d",
                        name, alt_no, n_elems);
          }
      }

> +
> +  if (i + 1 < n_elems)
> +    fatal_at (loc, "too few %ss in alternative %d: expected %d, got %d",
> +             name, alt_no, n_elems, i);
> +
> +  list[i].add (',');
> +}
> +
> +/* The compact syntax has more convience syntaxes.  As such we post process
> +   the lines to get them back to something the normal syntax understands.  */
> +
> +static void
> +preprocess_compact_syntax (file_location loc, int alt_no, std::string &line,
> +                          std::string &last_line)
> +{
> +  /* Check if we're copying the last statement.  */
> +  if (line.find ("^") == 0 && line.size () == 1)
> +    {
> +      if (last_line.empty ())
> +       fatal_at (loc, "found instruction to copy previous line (^) in"
> +                      "alternative %d but no previous line to copy", alt_no);
> +      line = last_line;
> +      return;
> +    }
> +
> +  std::string result;
> +  std::string buffer;
> +  /* Check if we have << which means return c statement.  */
> +  if (line.find ("<<") == 0)
> +    {
> +      result.append ("* return ");
> +      const char *chunk = line.c_str () + 2;
> +      skip_spaces (&chunk);
> +      result.append (chunk);
> +    }
> +  else
> +    result.append (line);
> +
> +  line = result;
> +  return;
> +}
> +
> +/* Converts an rtx from compact syntax to normal syntax if possible.  */
> +
> +static void
> +convert_syntax (rtx x, file_location loc)
> +{
> +  int alt_no;
> +  unsigned int templ_index;
> +  const char *templ;
> +  vec_conlist tconvec, convec, attrvec;
> +
> +  templ_index = GET_CODE (x) == DEFINE_INSN ? 3 : 2;
> +
> +  templ = XTMPL (x, templ_index);
> +
> +  /* Templates with constraints start with "{@".  */
> +  if (strncmp ("*{@", templ, 3))
> +    return;
> +
> +  /* Get the layout for the template.  */
> +  templ += 3;
> +  skip_spaces (&templ);
> +
> +  if (!expect_char (&templ, '['))
> +    fatal_at (loc, "expecing `[' to begin section list");
> +
> +  parse_section_layout (loc, &templ, "cons:", tconvec, true);
> +  convec.resize (tconvec.size ());

IMO it'd be better to leave the last line out, since there's nothing
particularly special about tconvec.size() for this mapping.

> +
> +  /* Check for any duplicate cons entries and sort based on i.  */
> +  for (auto e : tconvec)
> +    {
> +      unsigned idx = e.idx;
> +      if (idx >= convec.size ())
> +       convec.resize (idx + 1);
> +
> +      if (convec[idx].idx >= 0)
> +       fatal_at (loc, "duplicate cons number found: %d", idx);
> +      convec[idx] = e;
> +    }
> +  tconvec.clear ();
> +
> +  if (*templ != ']')
> +    {
> +      if (*templ == ';')
> +       skip_spaces (&(++templ));
> +      parse_section_layout (loc, &templ, "attrs:", attrvec, false);
> +    }
> +
> +  if (!expect_char (&templ, ']'))
> +    fatal_at (loc, "expecting `]` to end section list - section list must 
> have "
> +                  "cons first, attrs second");
> +
> +  /* We will write the un-constrainified template into new_templ.  */
> +  std::string new_templ;
> +  new_templ.append ("@");
> +
> +  /* Skip to the first proper line.  */
> +  while (*templ++ != '\n');

This seems to have reverted to an earlier version, but it has the problems
discussed previously: it never terminates if the string doesn't have a '\n',
and it allows anything to come between the ']' and the '\n'.

I think it should be:

  /* Skip to the first proper line.  */
  skip_spaces (&templ);
  if (*templ == 0)
    fatal_at (loc, "'{@...}' blocks must have at least one alternative");
  if (*templ != '\n')
    fatal_at (loc, "unexpected character '%c' after ']'", *templ);
  templ++;

> +
> +  alt_no = 0;
> +  std::string last_line;
> +
> +  /* Process the alternatives.  */
> +  while (*(templ - 1) != '\0')
> +    {
> +      /* Skip leading whitespace.  */
> +      std::string buffer;
> +      skip_spaces (&templ);
> +
> +      /* Check if we're at the end.  */
> +      if (templ[0] == '}' && templ[1] == '\0')
> +       break;
> +
> +      if (expect_char (&templ, '['))
> +       {
> +         new_templ += '\n';
> +         new_templ.append (buffer);
> +         /* Parse the constraint list, then the attribute list.  */
> +         if (convec.size () > 0)
> +           parse_section (&templ, convec.size (), alt_no, convec, loc,
> +                          "constraint");
> +
> +         if (attrvec.size () > 0)
> +           {
> +             if (convec.size () > 0 && !expect_char (&templ, ';'))
> +               fatal_at (loc, "expected `;' to separate constraints "
> +                              "and attributes in alternative %d", alt_no);
> +
> +             parse_section (&templ, attrvec.size (), alt_no,
> +                            attrvec, loc, "attribute");
> +           }
> +
> +         if (!expect_char (&templ, ']'))
> +           fatal_at (loc, "expected end of constraint/attribute list but "
> +                          "missing an ending `]' in alternative %d", alt_no);
> +       }
> +      else if (templ[0] == '/' && templ[1] == '/')
> +       {
> +         templ += 2;
> +         /* Glob till newline or end of string.  */
> +         while (*templ != '\n' || *templ != '\0')
> +           templ++;
> +         templ++;

This should be deleted, since it will skip over '\0' as well as '\n'.
The loop below handles '\n' correctly.

> +
> +         /* Skip any newlines or whitespaces needed.  */
> +         while (ISSPACE(*templ))
> +           templ++;
> +         continue;
> +       }
> +      else if (templ[0] == '/' && templ[1] == '*')
> +       {
> +         templ += 2;
> +         /* Glob till newline or end of multiline comment.  */
> +         while (templ[0] != 0 && templ[0] != '*' && templ[1] != '/')
> +           templ++;
> +         templ += 2;

Same problem about moving past '\0' here.  But the break condition would
stop on things like "*]" or "//", not just "*/".  I think it should be:

   for (; templ[0] != 0; ++templ)
     if (templ[0] == '*' && templ[1] == '/')
       {
         templ += 2;
         break;
       }

> +
> +         /* Skip any newlines or whitespaces needed.  */
> +         while (ISSPACE(*templ))
> +           templ++;
> +         continue;
> +       }
> +      else
> +       fatal_at (loc, "expected constraint/attribute list at beginning of "
> +                      "alternative %d but missing a starting `['", alt_no);
> +
> +      /* Skip whitespace between list and asm.  */
> +      ++templ;

Looks like this line should be deleted.  We only get here after
expect_char (&templ, ']'), which has already skipped the ']'.

OK for trunk with those changes, thanks.

Richard

> +      skip_spaces (&templ);
> +
> +      /* Copy asm to new template.  */
> +      std::string line;
> +      while (*templ != '\n' && *templ != '\0')
> +       line += *templ++;
> +
> +      /* Apply any pre-processing needed to the line.  */
> +      preprocess_compact_syntax (loc, alt_no, line, last_line);
> +      new_templ.append (line);
> +      last_line = line;
> +
> +      /* Normal "*..." syntax expects the closing quote to be on the final
> +        line of asm, whereas we allow the closing "}" to be on its own line.
> +        Postpone copying the '\n' until we know that there is another
> +        alternative in the list.  */
> +      while (ISSPACE (*templ))
> +       templ++;
> +      ++alt_no;
> +    }
> +
> +  /* Write the constraints and attributes into their proper places.  */
> +  if (convec.size () > 0)
> +    add_constraints (x, loc, convec);
> +
> +  if (attrvec.size () > 0)
> +    add_attributes (x, attrvec);
> +
> +  /* Copy over the new un-constrainified template.  */
> +  XTMPL (x, templ_index) = xstrdup (new_templ.c_str ());
> +
> +  /* Register for later checks during iterator expansions.  */
> +  compact_syntax.add (x);
> +}
> +
>  /* Process a top level rtx in some way, queuing as appropriate.  */
>
>  static void
> @@ -553,10 +1036,12 @@ process_rtx (rtx desc, file_location loc)
>    switch (GET_CODE (desc))
>      {
>      case DEFINE_INSN:
> +      convert_syntax (desc, loc);
>        queue_pattern (desc, &define_insn_tail, loc);
>        break;
>
>      case DEFINE_COND_EXEC:
> +      convert_syntax (desc, loc);
>        queue_pattern (desc, &define_cond_exec_tail, loc);
>        break;
>
> @@ -631,6 +1116,7 @@ process_rtx (rtx desc, file_location loc)
>         attr = XVEC (desc, split_code + 1);
>         PUT_CODE (desc, DEFINE_INSN);
>         XVEC (desc, 4) = attr;
> +       convert_syntax (desc, loc);
>
>         /* Queue them.  */
>         insn_elem = queue_pattern (desc, &define_insn_tail, loc);

Re: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

Reply via email to