Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

Tamar Christina Thu, 16 Jul 2020 06:24:19 -0700

Hi Richard,

I have fixed the formatting issues and made it so it checks explicitly
for ':'.


Ok for master, GCC 10, 9 and 8 after some stew?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/driver-aarch64.c (INCLUDE_SET): New.
        (parse_field): Use std::string.
        (split_words, readline, find_field): New.
        (host_detect_local_cpu): Fix truncation issues.

The 07/09/2020 18:31, Richard Sandiford wrote:
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi Richard,
> >
> > Thanks for the review,
> >
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandif...@arm.com>
> >> Sent: Thursday, July 9, 2020 1:35 PM
> >> To: Tamar Christina <tamar.christ...@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> >> <richard.earns...@arm.com>; Marcus Shawcroft
> >> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> >> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
> >> 
> >> Tamar Christina <tamar.christ...@arm.com> writes:
> >> > Hi All,
> >> >
> >> > This patch fixes a couple of issues in AArch64's -mcpu=native processing:
> >> >
> >> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes
> >> > long.  While this was enough in the past with the increase in 
> >> > architecture
> >> extensions it is
> >> > no longer enough.   It results in two bugs:
> >> >
> >> > 1) No option string longer than 127 characters is correctly parsed.  
> >> > Features
> >> >    that are supported are silently ignored.
> >> >
> >> > 2) It incorrectly enables features that are not present on the machine:
> >> >   a) It checks for substring matching instead of full word matching.  
> >> > This
> >> makes
> >> >      it incorrectly detect sb support when ssbs is provided instead.
> >> >   b) Due to the truncation at the 127 char border it also incorrectly 
> >> > enables
> >> >      features due to the full feature being cut off and the part that is 
> >> > left
> >> >      accidentally enables something else.
> >> >
> >> > This breaks -mcpu=native detection on some of our newer system.
> >> >
> >> > The patch fixes these issues by reading full lines up to the \n in a 
> >> > string.
> >> > This gives us the full feature line.  Secondly it creates a set from
> >> > this string
> >> > to:
> >> >
> >> >  1) Reduce matching complexity from O(n*m) to O(n*logm).
> >> >  2) Perform whole word matching instead of substring matching.
> >> >
> >> > To make this code somewhat cleaner I also changed from using char* to
> >> > using std::string and std::set.
> >> >
> >> > Note that I have intentionally avoided the use of ifstream and
> >> > stringstream to make it easier to backport.  I have also not change
> >> > the substring matching for the initial line classification as I cannot
> >> > find a documented cpuinfo format which leads me to believe there may
> >> > be kernels out there that require this which may be why the original code
> >> does this.
> >> >
> >> > I also do not want this to break if the kernel adds a new line that is
> >> > long and indents the file by two tabs to keep everything aligned.  In
> >> > short I think an imprecise match is the right thing here.
> >> >
> >> > Test for this is added as the last thing in this series as it requires
> >> > some changes to be made to be able to test this.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> 
> >> Sorry to be awkward.  I know Kyrill's already approved this, but I kind-of
> >> object.
> >> 
> >> I think we should split this into two: fixing the limit bug, and fixing the
> >> complexity.  It's easier to justify backporting the fix for the limit bug 
> >> than the
> >> fix for the complexity.
> >
> > It was never the intention to fix the complexity. The change in complexity 
> > is just
> > simply because I split the feature strings into actual words now. I do so 
> > because
> > in my opinion this is simpler than doing memcmp and checking the the 
> > character
> > after the one you matched is a space or a null character, and then checking 
> > that you are
> > one character away from the previous space or at the start of the line.
> >
> > Do you know of a simpler way to do word matches in C?
> 
> This isn't an anti-C++ thing :-)  I just think any clean-up should
> be separate from the bugfix, and only the bugfix should be backported.
> 
> >> For fixing the limit bug, it seems better to use an obstack to read the 
> >> file
> >> instead.  This should actually be much easier than what the patch does. 
> >> :-)  It
> >> should also be more efficient.
> >> 
> >
> > Sorry.. I genuinely don't understand. I had looked at obstack and the 
> > interface seemed to be
> > more work to me.  I am probably wrong since this interface has zero 
> > documentation
> > but it looks like to use obstack I have to (based on guessing from what 
> > other code are doing)
> >
> > 1. call gcc_obstack_init
> > 2. call obstack_alloc
> > 3. call obstack_grow any time I need a bigger buffer.
> > 4. call obstack_free
> >
> > So I am missing why this is simpler than calling realloc... 
> > But obviously I am missing something.
> 
> Yeah, (1), (3) and (4) are the right steps.  But the simplifying
> thing is that obstack has “optimised” single-character insertions.
> So rather than going by fgets, we can just do something like:
> 
>   while ((ch = getc (f)) != EOF)
>     {
>       obstack_1grow (&ob, ch);
>       if (ch == '\n')
>         break;
>     }
>   obstack_1grow (&ob, 0);
>   return obstack_finish (&ob);
> 
> to read a zero-terminated line.  Then use:
> 
>   obstack_free (&ob, line);
> 
> after each line.  (Optional, but more efficient, since it reuses memory.)
> 
> This saves doing a fresh xmalloc+free for each line.
> 
> >> For fixing the complexity: does this actually make things noticeably 
> >> faster in
> >> practice?  The “m” in the O(n*m) is a fairly small constant.
> >> The cost of making it O(n*logm) this way involves many more memory
> >> allocations, so it isn't obvious that it's actually more efficient in 
> >> practice.  Do
> >> you have any timings?
> >
> > Well, no.. I didn't run any benchmarks. Did you have any in mind that you 
> > wanted me to run?
> >
> > As I mentioned, the complexity change was just a byproduct of doing a much 
> > easier to see correct
> > word matching.
> 
> Ah, OK.
> 
> > But if you want me to just add extra checks to the existing implementation 
> > to see
> > if it meets all the criteria of being a standalone word then sure I can do 
> > that.  I don't particularly find
> > that implementation easier to read and I would have thought correctness was 
> > to be favored over memory
> > allocation in something that is called once.
> >
> >> 
> >> If search time is a problem, building a hash_map might be better.
> >
> > I would have thought the red-black tree of a set is fine.. But if I'm only 
> > allowed to use GCC internal structures then
> > I will change it...
> 
> No, using C++ is fine.  The reason for suggesting hash_map
> (or hash_set, as I should have said) was that, if complexity
> was an issue, hash_set/map is supposed to have amortised constant
> lookup time for a good hash.
> 
> As far as the patch itself goes:
> 
> > @@ -126,6 +127,57 @@ parse_field (const char *field)
> >    return fint;
> >  }
> > 
> > +/* Splits and returns a string based on whitespace and return it as
> > +   part of a set. Empty strings are ignored.  */
> > +
> > +static void
> > +split_words (const std::string val, std::set<std::string> &result)
> 
> Not sure there's any benefit to marking “val” const, and adding it makes
> it look like it was supposed to be “const std::string &” instead
> (which should be more efficient).
> 
> > +{
> > +  size_t cur, prev = 0;
> > +  std::string word;
> > +  while ((cur = val.find_first_of (" \n", prev)) != std::string::npos)
> > +  {
> 
> The { … } should be indented two spaces relative the “while”.
> 
> > +    word = val.substr (prev, cur - prev);
> > +    /* Skip adding empty words.  */
> > +    if (!word.empty ())
> > +      result.insert (word);
> 
> Guess, this is just personal preference, sorry, but:
> 
>   if (prev != cur)
>     result.insert (std::string (prev, cur));
> 
> seems simpler.
> 
> > +    prev = cur+1;
> 
> Should be spaces around the “+”.
> 
> > +  }
> > […]
> > +
> > +  word = val.substr (prev, cur - prev);
> > +  if (!word.empty ())
> > +    result.insert (word);
> > +}
> > +
> > […]
> > @@ -164,7 +216,6 @@ host_detect_local_cpu (int argc, const char **argv)
> >  {
> >    const char *res = NULL;
> >    static const int num_exts = ARRAY_SIZE (aarch64_extensions);
> > -  char buf[128];
> >    FILE *f = NULL;
> >    bool arch = false;
> >    bool tune = false;
> > @@ -178,6 +229,7 @@ host_detect_local_cpu (int argc, const char **argv)
> >    bool processed_exts = false;
> >    uint64_t extension_flags = 0;
> >    uint64_t default_flags = 0;
> > +  std::string buf;
> > 
> >    gcc_assert (argc);
> > 
> > @@ -202,9 +254,9 @@ host_detect_local_cpu (int argc, const char **argv)
> > 
> >    /* Look through /proc/cpuinfo to determine the implementer
> >       and then the part number that identifies a particular core.  */
> > -  while (fgets (buf, sizeof (buf), f) != NULL)
> > +  while (!(buf = readline (f)).empty ())
> >      {
> > -      if (strstr (buf, "implementer") != NULL)
> > +      if (buf.find ("implementer") != std::string::npos)
> >     {
> >       unsigned cimp = parse_field (buf);
> >       if (cimp == INVALID_IMP)
> > @@ -216,8 +268,7 @@ host_detect_local_cpu (int argc, const char **argv)
> >       else if (imp != cimp)
> >         goto not_found;
> >     }
> > -
> > -      if (strstr (buf, "variant") != NULL)
> > +      else if (buf.find ("variant") != std::string::npos)
> >     {
> >       unsigned cvariant = parse_field (buf);
> >       if (!contains_core_p (variants, cvariant))
> > @@ -229,8 +280,7 @@ host_detect_local_cpu (int argc, const char **argv)
> >         }
> >            continue;
> >          }
> > -
> > -      if (strstr (buf, "part") != NULL)
> > +      else if (buf.find ("part") != std::string::npos)
> >     {
> >       unsigned ccore = parse_field (buf);
> >       if (!contains_core_p (cores, ccore))
> > @@ -242,39 +292,36 @@ host_detect_local_cpu (int argc, const char **argv)
> >         }
> >       continue;
> >     }
> > -      if (!tune && !processed_exts && strstr (buf, "Features") != NULL)
> > +      else if (!tune && !processed_exts
> > +          && buf.find ("Features") != std::string::npos)
> >     {
> > +     /* First create the list of features in the buffer.  */
> > +     std::set<std::string> features;
> > +     /* Drop everything till the :.  */
> > +     buf = buf.substr (buf.find (':') + 1);
> 
> We should check the result of the find, rather than just assume that the
> ':' exists.
> 
> Looks like current parse_field has the same problem though…
> 
> Thanks,
> Richard

--

diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
index d1229e676806f9607c258e5d678fb3175fadf1c2..18d670a1c584d3e6991fea4821bbf6dd394fca0c 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -21,6 +21,7 @@
 
 #include "config.h"
 #define INCLUDE_STRING
+#define INCLUDE_SET
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -116,9 +117,15 @@ valid_bL_core_p (unsigned int *core, unsigned int bL_core)
 /* Returns the hex integer that is after ':' for the FIELD.
    Returns -1 is returned if there was problem parsing the integer. */
 static unsigned
-parse_field (const char *field)
+parse_field (const std::string &field)
 {
-  const char *rest = strchr (field, ':');
+  const char *rest = strchr (field.c_str (), ':');
+
+  /* The line must be in the format of <name>:<value>, if it's not
+     then we have a weird format.  */
+  if (rest == NULL)
+    return -1;
+
   char *after;
   unsigned fint = strtol (rest + 1, &after, 16);
   if (after == rest + 1)
@@ -126,6 +133,75 @@ parse_field (const char *field)
   return fint;
 }
 
+/* Returns the index of the ':' inside the FIELD which must be found
+   after the value of KEY.  Returns string::npos if line does not contain
+   a field.  */
+
+static size_t
+find_field (const std::string &field, const std::string &key)
+{
+  size_t key_pos, sep_pos;
+  key_pos = field.find (key);
+  if (key_pos == std::string::npos)
+    return std::string::npos;
+
+  sep_pos = field.find (":", key_pos + 1);
+  if (sep_pos == std::string::npos)
+    return std::string::npos;
+
+  return sep_pos;
+}
+
+/* Splits and returns a string based on whitespace and return it as
+   part of a set. Empty strings are ignored.  */
+
+static void
+split_words (const std::string &val, std::set<std::string> &result)
+{
+  size_t cur, prev = 0;
+  std::string word;
+  while ((cur = val.find_first_of (" \n", prev)) != std::string::npos)
+    {
+      word = val.substr (prev, cur - prev);
+      /* Skip adding empty words.  */
+      if (!word.empty ())
+	result.insert (word);
+      prev = cur + 1;
+  }
+
+  if (prev != cur)
+    result.insert (val.substr (prev));
+}
+
+/* Read an entire line from F until '\n' or EOF.  */
+
+static std::string
+readline (FILE *f)
+{
+  char *buf = NULL;
+  int size = 0;
+  int last = 0;
+  const int buf_size = 128;
+
+  if (feof (f))
+    return std::string ();
+
+  do
+  {
+    size += buf_size;
+    buf = (char*)xrealloc (buf, size);
+    gcc_assert (buf);
+    fgets (buf + last, buf_size, f);
+    /* If we're not at the end of the line then override the
+       \0 added by fgets.  */
+    last = strlen (buf) - 1;
+  } while (!feof (f) && buf[last] != '\n');
+
+  std::string result (buf);
+  free (buf);
+  return result;
+}
+
 /*  Return true iff ARR contains CORE, in either of the two elements. */
 
 static bool
@@ -164,7 +240,6 @@ host_detect_local_cpu (int argc, const char **argv)
 {
   const char *res = NULL;
   static const int num_exts = ARRAY_SIZE (aarch64_extensions);
-  char buf[128];
   FILE *f = NULL;
   bool arch = false;
   bool tune = false;
@@ -178,6 +253,8 @@ host_detect_local_cpu (int argc, const char **argv)
   bool processed_exts = false;
   uint64_t extension_flags = 0;
   uint64_t default_flags = 0;
+  std::string buf;
+  size_t sep_pos = -1;
 
   gcc_assert (argc);
 
@@ -202,9 +279,9 @@ host_detect_local_cpu (int argc, const char **argv)
 
   /* Look through /proc/cpuinfo to determine the implementer
      and then the part number that identifies a particular core.  */
-  while (fgets (buf, sizeof (buf), f) != NULL)
+  while (!(buf = readline (f)).empty ())
     {
-      if (strstr (buf, "implementer") != NULL)
+      if (find_field (buf, "implementer") != std::string::npos)
 	{
 	  unsigned cimp = parse_field (buf);
 	  if (cimp == INVALID_IMP)
@@ -216,8 +293,7 @@ host_detect_local_cpu (int argc, const char **argv)
 	  else if (imp != cimp)
 	    goto not_found;
 	}
-
-      if (strstr (buf, "variant") != NULL)
+      else if (find_field (buf, "variant") != std::string::npos)
 	{
 	  unsigned cvariant = parse_field (buf);
 	  if (!contains_core_p (variants, cvariant))
@@ -229,8 +305,7 @@ host_detect_local_cpu (int argc, const char **argv)
 	    }
           continue;
         }
-
-      if (strstr (buf, "part") != NULL)
+      else if (find_field (buf, "part") != std::string::npos)
 	{
 	  unsigned ccore = parse_field (buf);
 	  if (!contains_core_p (cores, ccore))
@@ -242,39 +317,36 @@ host_detect_local_cpu (int argc, const char **argv)
 	    }
 	  continue;
 	}
-      if (!tune && !processed_exts && strstr (buf, "Features") != NULL)
+      else if (!tune && !processed_exts
+	       && (sep_pos = find_field (buf, "Features")) != std::string::npos)
 	{
+	  /* First create the list of features in the buffer.  */
+	  std::set<std::string> features;
+	  /* Drop everything till the :.  */
+	  buf = buf.substr (sep_pos + 1);
+	  split_words (buf, features);
+
 	  for (i = 0; i < num_exts; i++)
 	    {
-	      const char *p = aarch64_extensions[i].feat_string;
+	      const std::string val (aarch64_extensions[i].feat_string);
 
 	      /* If the feature contains no HWCAPS string then ignore it for the
 		 auto detection.  */
-	      if (*p == '\0')
+	      if (val.empty ())
 		continue;
 
 	      bool enabled = true;
 
 	      /* This may be a multi-token feature string.  We need
 		 to match all parts, which could be in any order.  */
-	      size_t len = strlen (buf);
-	      do
-		{
-		  const char *end = strchr (p, ' ');
-		  if (end == NULL)
-		    end = strchr (p, '\0');
-		  if (memmem (buf, len, p, end - p) == NULL)
-		    {
-		      /* Failed to match this token.  Turn off the
-			 features we'd otherwise enable.  */
-		      enabled = false;
-		      break;
-		    }
-		  if (*end == '\0')
-		    break;
-		  p = end + 1;
-		}
-	      while (1);
+	      std::set<std::string> tokens;
+	      split_words (val, tokens);
+	      std::set<std::string>::iterator it;
+
+	      /* Iterate till the first feature isn't found or all of them
+		 are found.  */
+	      for (it = tokens.begin (); enabled && it != tokens.end (); ++it)
+		 enabled = enabled && features.count (*it);
 
 	      if (enabled)
 		extension_flags |= aarch64_extensions[i].flag;

Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

Reply via email to