RE: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

Tamar Christina via Gcc-patches Tue, 06 Jun 2023 05:01:37 -0700

Hi,

Thanks for the review, just some quick responses before I make the changes:


> >    int operand_number;              /* Operand index in the big array.  */
> >    int output_format;               /* INSN_OUTPUT_FORMAT_*.  */
> > +  bool compact_syntax_p;
> >    struct operand_data operand[MAX_MAX_OPERANDS];  };
> >
> > @@ -700,12 +702,57 @@ process_template (class data *d, const char
> *template_code)
> >       if (sp != ep)
> >         message_at (d->loc, "trailing whitespace in output template");
> >
> > -     while (cp < sp)
> > +     /* Check for any unexpanded iterators.  */
> > +     if (bp[0] != '*' && d->compact_syntax_p)
> 
> I assume the bp[0] != '*' condition skips the check for C code blocks.
> Genuine question, but are you sure we want that?  C code often includes asm
> strings (in quotes), such as for the SVE CNT[BHWD] example.
> 
> Extending the check would mean that any use of <...> for C++ templates will
> need to be quoted, but explicit instantiation is pretty rare in .md files.  
> It would
> also look weird for conditions.
> 
> Either way is fine, just asking.

I excluded it entirely to avoid also running afoul of the binary operators. So 
e.g.
* a < b && b > c ? foo : bar shouldn't trigger it.   It seemed more trouble 
than it's
worth to try to get correct.

> > +  }
> > +
> > +  /* Adds a character to the end of the string.  */  void add (char
> > + c)  {
> > +    con += c;
> > +  }
> > +
> > +  /* Output the string in the form of a brand-new char *, then effectively
> > +     clear the internal string by resetting len to 0.  */  char * out
> > + ()
> 
> Formatting: no need for a space before "out".
> 
> > +  {
> > +    /* Final character is always a trailing comma, so strip it out.
> > + */
> 
> trailing ',', ';' or ']', rather than just a comma?

Ah no, this is a bit of a lazy intercalate, when the alternatives are pushed in 
it's
not easy to tell how many there will be (because we don't keep track of it in 
this part),
so we just always add a trailing "," and ignore the last char on output.  
Validation of the
alternative counts themselves is done later by the normal machinery.

> 
> > +    char * q;
> 
> Similarly no space before "q" here.
> 
> > +    if (modifier.empty ())
> > +      q = xstrndup (con.c_str (), con.size () - 1);
> 
> Could just be "xstrdup (con.c_str ())".
> 
> > +    else
> > +      {
> > +   int len = con.size () + modifier.size ();
> > +   q = XNEWVEC (char, len);
> > +   strncpy (q, modifier.c_str (), modifier.size ());
> > +   strncpy (q + modifier.size (), con.c_str (), con.size ());
> > +   q[len -1] = '\0';
> > +      }
> 
> Do we need the separation between "modifier" and "cons"?  It looks like the
> code completes the initialisation of "modifier" before it writes to "cons", 
> and
> so we could just use a single string.

Fair point.

> > +   {
> > +     if (XSTR (part, 1) && XSTR (part, 1)[0] != '\0')
> > +       {
> > +         error_at (loc, "can't mix normal and compact attribute syntax");
> > +         break;
> > +       }
> > +     XSTR (part, 1) = attrs[index].out ();
> > +
> > +     ++index;
> > +     if (index == attrs.size ())
> > +       break;
> > +   }
> 
> It looks like you forgive mixing new-style and old-style syntax, since 
> there's no
> "else error" here.  But the documentation said that that wasn't allowed.
> 
> Either way seems OK to me, but see the next comment.
> 
> > +    }
> > +
> > +  return index;
> > +}
> > +
> > +/* Modify the attributes list to make space for the implicitly declared
> > +   attributes in the attrs: list.  */
> > +
> > +static void
> > +create_missing_attributes (rtx x, file_location /* loc */,
> > +vec_conlist &attrs) {
> > +  if (attrs.empty ())
> > +    return;
> > +
> > +  unsigned int attr_index = GET_CODE (x) == DEFINE_INSN ? 4 : 3;
> > + vec_conlist missing;
> > +
> > +  /* This is an O(n*m) loop but it's fine, both n and m will always be very
> > +     small.  */
> 
> Agreed that quadraticness isn't a problem.  But I wonder how many people
> would write an explicit placeholder set_attr.  Unlike match_operand and
> match_scratch, a placeholder set_attr doesn't carry any additional
> information.
> 
> It might be simpler to drop add_attributes and add all attributes
> unconditionally in this function instead.  If the user tries to specify the 
> same
> attribute using both syntaxes, the pattern would end up with two definitions
> of the same attribute, which ought to be flagged by existing code.
> 

This was done to support the (in arm backend) common thing of having attributes
which are either too complex to add inline in the new syntax or that just 
repeat a
value.

i.e. it's to allow cases like this:

  [(set_attr "length")
   (set_attr "predicable" "yes")
   (set_attr "predicable_short_it")
   (set_attr "arch")
   (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
                      (const_string "alu_imm")
                      (const_string "alu_sreg")))

Where your attrs contains:

  {@ [cons: =0, 1, 2; attrs: length, predicable_short_it, arch]

However you're right, I could simply say that you must omit the set_attr in 
attrs and just
merge the two lists?  I think that's what you were alluding to?

Cheers,
Tamar

> > +  for (conlist cl : attrs)
> > +    {
> > +      bool found = false;
> > +      for (int i = 0; XVEC (x, attr_index) && i < XVECLEN (x, attr_index); 
> > ++i)
> > +   {
> > +     rtx part = XVECEXP (x, attr_index, i);
> > +
> > +     if (GET_CODE (part) != SET_ATTR
> > +         || cl.name.compare (XSTR (part, 0)) == 0)
> > +       {
> > +         found = true;
> > +         break;
> > +       }
> > +   }
> > +
> > +      if (!found)
> > +   missing.push_back (cl);
> > +    }
> > +
> > +  rtvec orig = XVEC (x, attr_index);
> > +  size_t n_curr = orig ? XVECLEN (x, attr_index) : 0;  rtvec copy =
> > + rtvec_alloc (n_curr + missing.size ());
> > +
> > +  /* Create a shallow copy of existing entries.  */  memcpy
> > + (&copy->elem[missing.size ()], &orig->elem[0], sizeof (rtx) *
> > + n_curr);  XVEC (x, attr_index) = copy;
> > +
> > +  /* Create the new elements.  */
> > +  for (unsigned i = 0; i < missing.size (); i++)
> > +    {
> > +      rtx attr = rtx_alloc (SET_ATTR);
> > +      XSTR (attr, 0) = xstrdup (attrs[i].name.c_str ());
> > +      XSTR (attr, 1) = NULL;
> > +      XVECEXP (x, attr_index, i) = attr;
> > +    }
> > +
> > +  return;
> > +}
> > +
> > +/* Consumes spaces and tabs.  */
> > +
> > +static inline void
> > +skip_spaces (const char **str)
> > +{
> > +  while (**str == ' ' || **str == '\t')
> 
> ISSPACE here too.
> 
> > +    (*str)++;
> > +}
> > +
> > +/* Consumes the given character, if it's there.  */
> > +
> > +static inline bool
> > +expect_char (const char **str, char c) {
> > +  if (**str != c)
> > +    return false;
> > +  (*str)++;
> > +  return true;
> > +}
> > +
> > +/* Parses the section layout that follows a "{@}" if using new syntax. 
> > Builds
> > +   a vector for a single section. E.g. if we have "attrs: length arch)..."
> > +   then list will have two elements, the first for "length" and the second
> > +   for "arch".  */
> > +
> > +static void
> > +parse_section_layout (const char **templ, const char *label,
> > +                 vec_conlist &list, bool numeric) {
> > +  const char *name_start;
> > +  size_t label_len = strlen (label);
> > +  if (strncmp (label, *templ, label_len) == 0)
> > +    {
> > +      *templ += label_len;
> > +
> > +      /* Gather the names.  */
> > +      while (**templ != ';' && **templ != ']')
> > +   {
> > +     skip_spaces (templ);
> > +     name_start = *templ;
> > +     int len = 0;
> > +     char val = (*templ)[len];
> > +     while (val != ',' && val != ';' && val != ']')
> > +        val = (*templ)[++len];
> > +     *templ += len;
> > +     if (val == ',')
> > +       (*templ)++;
> > +     list.push_back (conlist (name_start, len, numeric));
> > +   }
> > +    }
> > +}
> > +
> > +/* Parse a section, a section is defined as a named space separated list, 
> > e.g.
> > +
> > +   foo: a b c
> 
> Now comma-separated rather than space-separated.  Applies to the example
> too.
> 
> > +
> > +   is a section named "foo" with entries a,b and c.  */
> > +
> > +static void
> > +parse_section (const char **templ, unsigned int n_elems, unsigned int
> alt_no,
> > +          vec_conlist &list, file_location loc, const char *name) {
> > +  unsigned int i;
> > +
> > +  /* Go through the list, one character at a time, adding said character
> > +     to the correct string.  */
> > +  for (i = 0; **templ != ']' && **templ != ';'; (*templ)++)
> > +    {
> > +      if (**templ != ' ' && **templ != '\t')
> 
> !ISSPACE
> 
> > +   {
> > +     list[i].add(**templ);
> 
> Formatting: should be a space before "(".
> 
> > +     if (**templ == ',')
> > +       {
> > +         ++i;
> > +         if (i == n_elems)
> > +           fatal_at (loc, "too many %ss in alternative %d: expected %d",
> > +                     name, alt_no, n_elems);
> > +       }
> > +   }
> > +    }
> > +
> > +  if (i + 1 < n_elems)
> > +    fatal_at (loc, "too few %ss in alternative %d: expected %d, got %d",
> > +         name, alt_no, n_elems, i);
> > +
> > +  list[i].add(',');
> > +}
> > +
> > +/* The compact syntax has more convience syntaxes.  As such we post
> process
> > +   the lines to get them back to something the normal syntax
> > +understands.  */
> > +
> > +static void
> > +preprocess_compact_syntax (file_location loc, int alt_no, std::string 
> > &line,
> > +                      std::string &last_line)
> > +{
> > +  /* Check if we're copying the last statement.  */
> > +  if (line.find ("^") == 0 && line.size () == 1)
> > +    {
> > +      if (last_line.empty ())
> > +   fatal_at (loc, "found instruction to copy previous line (^) in"
> > +                  "alternative %d but no previous line to copy", alt_no);
> > +      line = last_line;
> > +      return;
> > +    }
> > +
> > +  std::string result;
> > +  std::string buffer;
> > +  /* Check if we have << which means return c statement.  */  if
> > + (line.find ("<<") == 0)
> > +    {
> > +      result.append ("* return ");
> > +      result.append (line.substr (3));
> 
> Seems like this should be line.substr (2) or that the find() should include a
> space after "<<".  As it stands, we'd accept <<X and drop the X.
> 
> > +    }
> > +  else
> > +    result.append (line);
> > +
> > +  line = result;
> > +  return;
> > +}
> > +
> > +/* Converts an rtx from compact syntax to normal syntax if possible.
> > +*/
> > +
> > +static void
> > +convert_syntax (rtx x, file_location loc) {
> > +  int alt_no;
> > +  unsigned int index, templ_index;
> > +  const char *templ;
> > +  vec_conlist tconvec, convec, attrvec;
> > +
> > +  templ_index = GET_CODE (x) == DEFINE_INSN ? 3 : 2;
> > +
> > +  templ = XTMPL (x, templ_index);
> > +
> > +  /* Templates with constraints start with "{@".  */  if (strncmp
> > + ("*{@", templ, 3))
> > +    return;
> > +
> > +  /* Get the layout for the template.  */  templ += 3;  skip_spaces
> > + (&templ);
> > +
> > +  if (!expect_char (&templ, '['))
> > +    fatal_at (loc, "expecing `[' to begin section list");
> > +
> > +  parse_section_layout (&templ, "cons:", tconvec, true);
> > + convec.resize (tconvec.size ());
> > +
> > +  /* Check for any duplicate cons entries and sort based on i.  */
> > +  for (unsigned i = 0; i < tconvec.size (); i++)
> > +    {
> > +      int idx = tconvec[i].idx;
> > +      if (convec[idx].idx >= 0)
> > +   fatal_at (loc, "duplicate cons number found: %d", idx);
> > +      convec[idx] = tconvec[i];
> > +    }
> > +  tconvec.clear ();
> 
> "convec.resize (tconvec.size ());" isn't guaranteed to be enough if the cons:
> skips operands.  Either we need to calculate the maximum idx first, or we need
> to grow convec on demand.
> 
> > +
> > +
> 
> Nit: excess whitespace
> 
> > +  if (*templ != ']')
> > +    {
> > +      if (*templ == ';')
> > +   skip_spaces (&(++templ));
> > +      parse_section_layout (&templ, "attrs:", attrvec, false);
> > +      create_missing_attributes (x, loc, attrvec);
> > +    }
> > +
> > +  if (!expect_char (&templ, ']'))
> > +    {
> > +      fatal_at (loc, "expecting `]` to end section list - section list "
> > +           "must have cons first, attrs second");
> > +    }
> 
> Formatting nit: unnecessary braces
> 
> > +
> > +  /* We will write the un-constrainified template into new_templ.  */
> > + std::string new_templ;  new_templ.append ("@");
> > +
> > +  /* Skip to the first proper line.  */  while (*templ++ != '\n');
> 
> This seems to allow anything to follow the "]".  Should we instead use
> skip_spaces and then require a '\n'?
> 
> > +  alt_no = 0;
> > +
> > +  std::string last_line;
> > +
> > +  /* Process the alternatives.  */
> > +  while (*(templ - 1) != '\0')
> > +    {
> > +      /* Copy leading whitespace.  */
> > +      std::string buffer;
> > +      while (*templ == ' ' || *templ == '\t')
> > +   buffer += *templ++;
> 
> Why do we need to do that?  The '@' handling in genoutput.cc seems to skip
> whatever space is present.
> 
> I was wondering if it was so that column numbers matched in compiler error
> messages against "<<" lines, but those would already be off because of the "*
> return" transformation (not an issue that needs to be fixed).
> 
> > +
> > +      /* Check if we're at the end.  */
> > +      if (templ[0] == '}' && templ[1] == '\0')
> > +   break;
> > +
> > +      new_templ += '\n';
> > +      new_templ.append (buffer);
> > +
> > +      if (expect_char (&templ, '['))
> > +   {
> > +     /* Parse the constraint list, then the attribute list.  */
> > +     if (convec.size () > 0)
> > +       parse_section (&templ, convec.size (), alt_no, convec, loc,
> > +                      "constraint");
> > +
> > +     if (attrvec.size () > 0)
> > +       {
> > +         if (convec.size () > 0 && !expect_char (&templ, ';'))
> > +           fatal_at (loc, "expected `;' to separate constraints "
> > +                          "and attributes in alternative %d", alt_no);
> > +
> > +         parse_section (&templ, attrvec.size (), alt_no,
> > +                        attrvec, loc, "attribute");
> > +       }
> > +
> > +     if (!expect_char (&templ, ']'))
> > +       fatal_at (loc, "expected end of constraint/attribute list but "
> > +                      "missing an ending `]' in alternative %d", alt_no);
> > +   }
> > +      else if (templ[0] == '/' && templ[1] == '/')
> > +   {
> > +     templ+=2;
> 
> Formatting: should be spaces around "+=".  But here, and...
> 
> > +     /* Glob till newline or end of string.  */
> > +     while (*templ != '\n' || *templ != '\0')
> > +       templ++;
> > +   }
> > +      else if (templ[0] == '/' && templ[1] == '*')
> > +   {
> > +     templ+=2;
> > +     /* Glob till newline or end of multiline comment.  */
> > +     while (templ[0] != '*' && templ[1] != '/')
> > +       templ++;
> > +     templ++;
> 
> ...especially here, I think we should instead completely skip lines with
> comments and then "continue", without adding anything to new_templ for
> that iteration of the loop.  That would ensure
> that:
> 
> (a) multi-line // comments work correctly
> (b) a comment at the end gets silently dropped without adding a
>     line to the new template
> 
> > +   }
> > +      else
> > +   fatal_at (loc, "expected constraint/attribute list at beginning of "
> > +                  "alternative %d but missing a starting `['", alt_no);
> > +
> > +      /* Skip whitespace between list and asm.  */
> > +      ++templ;
> > +      skip_spaces (&templ);
> > +
> > +      /* Copy asm to new template.  */
> > +      std::string line;
> > +      while (*templ != '\n' && *templ != '\0')
> > +   line += *templ++;
> > +
> > +      /* Apply any pre-processing needed to the line.  */
> > +      preprocess_compact_syntax (loc, alt_no, line, last_line);
> > +      new_templ.append (line);
> > +      last_line = line;
> > +
> > +      /* The processing is very sensitive to whitespace, so preserve
> > +    all but the trailing ones.  */
> > +      if (templ[0] == '\n')
> > +   *templ++;
> 
> Is the point here that we allow the closing "}" to be on its own line?
> It might be worth calling that out explicitly if so.
> 
> In other words, I'd understood this to mean something like:
> 
>     /* Normal "*..." syntax expects the closing quote to be on the final
>        line of asm, whereas we allow the closing "}" to be on its own line.
>        Postpone copying the '\n' until we know that there is another
>        alternative in the list.  */
> 
> > +      ++alt_no;
> > +    }
> > +
> > +  /* Write the constraints and attributes into their proper places.
> > +*/
> > +  if (convec.size () > 0)
> > +    {
> > +      index = add_constraints (x, loc, 0, convec);
> > +      if (index < convec.size ())
> > +   fatal_at (loc, "could not find match_operand/scratch with id %d",
> > +             convec[index].idx);
> > +    }
> > +
> > +  if (attrvec.size () > 0)
> > +    {
> > +      index = add_attributes (x, loc, attrvec);
> > +      if (index < attrvec.size ())
> > +   fatal_at (loc, "could not find set_attr for attribute %s",
> > +             attrvec[index].name.c_str ());
> > +    }
> > +
> > +  /* Copy over the new un-constrainified template.  */  XTMPL (x,
> > + templ_index) = xstrdup (new_templ.c_str ());
> > +
> > +  /* Register for later checks during iterator expansions.  */
> > + compact_syntax.add (x);
> > +
> > +#if DEBUG
> > +  print_rtl_single (stderr, x);
> > +#endif
> 
> IMO it'd be better to drop this.  It's easy enough to add locally if that's 
> what
> someone wants.  ("make mddump" would also be useful for debugging this.)
> 
> Thanks,
> Richard
> 
> > +}
> > +
> >  /* Process a top level rtx in some way, queuing as appropriate.  */
> >
> >  static void
> > @@ -553,10 +1121,12 @@ process_rtx (rtx desc, file_location loc)
> >    switch (GET_CODE (desc))
> >      {
> >      case DEFINE_INSN:
> > +      convert_syntax (desc, loc);
> >        queue_pattern (desc, &define_insn_tail, loc);
> >        break;
> >
> >      case DEFINE_COND_EXEC:
> > +      convert_syntax (desc, loc);
> >        queue_pattern (desc, &define_cond_exec_tail, loc);
> >        break;
> >
> > @@ -631,6 +1201,7 @@ process_rtx (rtx desc, file_location loc)
> >     attr = XVEC (desc, split_code + 1);
> >     PUT_CODE (desc, DEFINE_INSN);
> >     XVEC (desc, 4) = attr;
> > +   convert_syntax (desc, loc);
> >
> >     /* Queue them.  */
> >     insn_elem = queue_pattern (desc, &define_insn_tail, loc);

RE: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

Reply via email to