Hi Richard, I have fixed the formatting issues and made it so it checks explicitly for ':'.
Ok for master, GCC 10, 9 and 8 after some stew? Thanks, Tamar gcc/ChangeLog: * config/aarch64/driver-aarch64.c (INCLUDE_SET): New. (parse_field): Use std::string. (split_words, readline, find_field): New. (host_detect_local_cpu): Fix truncation issues. The 07/09/2020 18:31, Richard Sandiford wrote: > Tamar Christina <tamar.christ...@arm.com> writes: > > Hi Richard, > > > > Thanks for the review, > > > >> -----Original Message----- > >> From: Richard Sandiford <richard.sandif...@arm.com> > >> Sent: Thursday, July 9, 2020 1:35 PM > >> To: Tamar Christina <tamar.christ...@arm.com> > >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw > >> <richard.earns...@arm.com>; Marcus Shawcroft > >> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com> > >> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection. > >> > >> Tamar Christina <tamar.christ...@arm.com> writes: > >> > Hi All, > >> > > >> > This patch fixes a couple of issues in AArch64's -mcpu=native processing: > >> > > >> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes > >> > long. While this was enough in the past with the increase in > >> > architecture > >> extensions it is > >> > no longer enough. It results in two bugs: > >> > > >> > 1) No option string longer than 127 characters is correctly parsed. > >> > Features > >> > that are supported are silently ignored. > >> > > >> > 2) It incorrectly enables features that are not present on the machine: > >> > a) It checks for substring matching instead of full word matching. > >> > This > >> makes > >> > it incorrectly detect sb support when ssbs is provided instead. > >> > b) Due to the truncation at the 127 char border it also incorrectly > >> > enables > >> > features due to the full feature being cut off and the part that is > >> > left > >> > accidentally enables something else. > >> > > >> > This breaks -mcpu=native detection on some of our newer system. > >> > > >> > The patch fixes these issues by reading full lines up to the \n in a > >> > string. > >> > This gives us the full feature line. Secondly it creates a set from > >> > this string > >> > to: > >> > > >> > 1) Reduce matching complexity from O(n*m) to O(n*logm). > >> > 2) Perform whole word matching instead of substring matching. > >> > > >> > To make this code somewhat cleaner I also changed from using char* to > >> > using std::string and std::set. > >> > > >> > Note that I have intentionally avoided the use of ifstream and > >> > stringstream to make it easier to backport. I have also not change > >> > the substring matching for the initial line classification as I cannot > >> > find a documented cpuinfo format which leads me to believe there may > >> > be kernels out there that require this which may be why the original code > >> does this. > >> > > >> > I also do not want this to break if the kernel adds a new line that is > >> > long and indents the file by two tabs to keep everything aligned. In > >> > short I think an imprecise match is the right thing here. > >> > > >> > Test for this is added as the last thing in this series as it requires > >> > some changes to be made to be able to test this. > >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > >> Sorry to be awkward. I know Kyrill's already approved this, but I kind-of > >> object. > >> > >> I think we should split this into two: fixing the limit bug, and fixing the > >> complexity. It's easier to justify backporting the fix for the limit bug > >> than the > >> fix for the complexity. > > > > It was never the intention to fix the complexity. The change in complexity > > is just > > simply because I split the feature strings into actual words now. I do so > > because > > in my opinion this is simpler than doing memcmp and checking the the > > character > > after the one you matched is a space or a null character, and then checking > > that you are > > one character away from the previous space or at the start of the line. > > > > Do you know of a simpler way to do word matches in C? > > This isn't an anti-C++ thing :-) I just think any clean-up should > be separate from the bugfix, and only the bugfix should be backported. > > >> For fixing the limit bug, it seems better to use an obstack to read the > >> file > >> instead. This should actually be much easier than what the patch does. > >> :-) It > >> should also be more efficient. > >> > > > > Sorry.. I genuinely don't understand. I had looked at obstack and the > > interface seemed to be > > more work to me. I am probably wrong since this interface has zero > > documentation > > but it looks like to use obstack I have to (based on guessing from what > > other code are doing) > > > > 1. call gcc_obstack_init > > 2. call obstack_alloc > > 3. call obstack_grow any time I need a bigger buffer. > > 4. call obstack_free > > > > So I am missing why this is simpler than calling realloc... > > But obviously I am missing something. > > Yeah, (1), (3) and (4) are the right steps. But the simplifying > thing is that obstack has “optimised” single-character insertions. > So rather than going by fgets, we can just do something like: > > while ((ch = getc (f)) != EOF) > { > obstack_1grow (&ob, ch); > if (ch == '\n') > break; > } > obstack_1grow (&ob, 0); > return obstack_finish (&ob); > > to read a zero-terminated line. Then use: > > obstack_free (&ob, line); > > after each line. (Optional, but more efficient, since it reuses memory.) > > This saves doing a fresh xmalloc+free for each line. > > >> For fixing the complexity: does this actually make things noticeably > >> faster in > >> practice? The “m” in the O(n*m) is a fairly small constant. > >> The cost of making it O(n*logm) this way involves many more memory > >> allocations, so it isn't obvious that it's actually more efficient in > >> practice. Do > >> you have any timings? > > > > Well, no.. I didn't run any benchmarks. Did you have any in mind that you > > wanted me to run? > > > > As I mentioned, the complexity change was just a byproduct of doing a much > > easier to see correct > > word matching. > > Ah, OK. > > > But if you want me to just add extra checks to the existing implementation > > to see > > if it meets all the criteria of being a standalone word then sure I can do > > that. I don't particularly find > > that implementation easier to read and I would have thought correctness was > > to be favored over memory > > allocation in something that is called once. > > > >> > >> If search time is a problem, building a hash_map might be better. > > > > I would have thought the red-black tree of a set is fine.. But if I'm only > > allowed to use GCC internal structures then > > I will change it... > > No, using C++ is fine. The reason for suggesting hash_map > (or hash_set, as I should have said) was that, if complexity > was an issue, hash_set/map is supposed to have amortised constant > lookup time for a good hash. > > As far as the patch itself goes: > > > @@ -126,6 +127,57 @@ parse_field (const char *field) > > return fint; > > } > > > > +/* Splits and returns a string based on whitespace and return it as > > + part of a set. Empty strings are ignored. */ > > + > > +static void > > +split_words (const std::string val, std::set<std::string> &result) > > Not sure there's any benefit to marking “val” const, and adding it makes > it look like it was supposed to be “const std::string &” instead > (which should be more efficient). > > > +{ > > + size_t cur, prev = 0; > > + std::string word; > > + while ((cur = val.find_first_of (" \n", prev)) != std::string::npos) > > + { > > The { … } should be indented two spaces relative the “while”. > > > + word = val.substr (prev, cur - prev); > > + /* Skip adding empty words. */ > > + if (!word.empty ()) > > + result.insert (word); > > Guess, this is just personal preference, sorry, but: > > if (prev != cur) > result.insert (std::string (prev, cur)); > > seems simpler. > > > + prev = cur+1; > > Should be spaces around the “+”. > > > + } > > […] > > + > > + word = val.substr (prev, cur - prev); > > + if (!word.empty ()) > > + result.insert (word); > > +} > > + > > […] > > @@ -164,7 +216,6 @@ host_detect_local_cpu (int argc, const char **argv) > > { > > const char *res = NULL; > > static const int num_exts = ARRAY_SIZE (aarch64_extensions); > > - char buf[128]; > > FILE *f = NULL; > > bool arch = false; > > bool tune = false; > > @@ -178,6 +229,7 @@ host_detect_local_cpu (int argc, const char **argv) > > bool processed_exts = false; > > uint64_t extension_flags = 0; > > uint64_t default_flags = 0; > > + std::string buf; > > > > gcc_assert (argc); > > > > @@ -202,9 +254,9 @@ host_detect_local_cpu (int argc, const char **argv) > > > > /* Look through /proc/cpuinfo to determine the implementer > > and then the part number that identifies a particular core. */ > > - while (fgets (buf, sizeof (buf), f) != NULL) > > + while (!(buf = readline (f)).empty ()) > > { > > - if (strstr (buf, "implementer") != NULL) > > + if (buf.find ("implementer") != std::string::npos) > > { > > unsigned cimp = parse_field (buf); > > if (cimp == INVALID_IMP) > > @@ -216,8 +268,7 @@ host_detect_local_cpu (int argc, const char **argv) > > else if (imp != cimp) > > goto not_found; > > } > > - > > - if (strstr (buf, "variant") != NULL) > > + else if (buf.find ("variant") != std::string::npos) > > { > > unsigned cvariant = parse_field (buf); > > if (!contains_core_p (variants, cvariant)) > > @@ -229,8 +280,7 @@ host_detect_local_cpu (int argc, const char **argv) > > } > > continue; > > } > > - > > - if (strstr (buf, "part") != NULL) > > + else if (buf.find ("part") != std::string::npos) > > { > > unsigned ccore = parse_field (buf); > > if (!contains_core_p (cores, ccore)) > > @@ -242,39 +292,36 @@ host_detect_local_cpu (int argc, const char **argv) > > } > > continue; > > } > > - if (!tune && !processed_exts && strstr (buf, "Features") != NULL) > > + else if (!tune && !processed_exts > > + && buf.find ("Features") != std::string::npos) > > { > > + /* First create the list of features in the buffer. */ > > + std::set<std::string> features; > > + /* Drop everything till the :. */ > > + buf = buf.substr (buf.find (':') + 1); > > We should check the result of the find, rather than just assume that the > ':' exists. > > Looks like current parse_field has the same problem though… > > Thanks, > Richard --
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c index d1229e676806f9607c258e5d678fb3175fadf1c2..18d670a1c584d3e6991fea4821bbf6dd394fca0c 100644 --- a/gcc/config/aarch64/driver-aarch64.c +++ b/gcc/config/aarch64/driver-aarch64.c @@ -21,6 +21,7 @@ #include "config.h" #define INCLUDE_STRING +#define INCLUDE_SET #include "system.h" #include "coretypes.h" #include "tm.h" @@ -116,9 +117,15 @@ valid_bL_core_p (unsigned int *core, unsigned int bL_core) /* Returns the hex integer that is after ':' for the FIELD. Returns -1 is returned if there was problem parsing the integer. */ static unsigned -parse_field (const char *field) +parse_field (const std::string &field) { - const char *rest = strchr (field, ':'); + const char *rest = strchr (field.c_str (), ':'); + + /* The line must be in the format of <name>:<value>, if it's not + then we have a weird format. */ + if (rest == NULL) + return -1; + char *after; unsigned fint = strtol (rest + 1, &after, 16); if (after == rest + 1) @@ -126,6 +133,75 @@ parse_field (const char *field) return fint; } +/* Returns the index of the ':' inside the FIELD which must be found + after the value of KEY. Returns string::npos if line does not contain + a field. */ + +static size_t +find_field (const std::string &field, const std::string &key) +{ + size_t key_pos, sep_pos; + key_pos = field.find (key); + if (key_pos == std::string::npos) + return std::string::npos; + + sep_pos = field.find (":", key_pos + 1); + if (sep_pos == std::string::npos) + return std::string::npos; + + return sep_pos; +} + +/* Splits and returns a string based on whitespace and return it as + part of a set. Empty strings are ignored. */ + +static void +split_words (const std::string &val, std::set<std::string> &result) +{ + size_t cur, prev = 0; + std::string word; + while ((cur = val.find_first_of (" \n", prev)) != std::string::npos) + { + word = val.substr (prev, cur - prev); + /* Skip adding empty words. */ + if (!word.empty ()) + result.insert (word); + prev = cur + 1; + } + + if (prev != cur) + result.insert (val.substr (prev)); +} + +/* Read an entire line from F until '\n' or EOF. */ + +static std::string +readline (FILE *f) +{ + char *buf = NULL; + int size = 0; + int last = 0; + const int buf_size = 128; + + if (feof (f)) + return std::string (); + + do + { + size += buf_size; + buf = (char*)xrealloc (buf, size); + gcc_assert (buf); + fgets (buf + last, buf_size, f); + /* If we're not at the end of the line then override the + \0 added by fgets. */ + last = strlen (buf) - 1; + } while (!feof (f) && buf[last] != '\n'); + + std::string result (buf); + free (buf); + return result; +} + /* Return true iff ARR contains CORE, in either of the two elements. */ static bool @@ -164,7 +240,6 @@ host_detect_local_cpu (int argc, const char **argv) { const char *res = NULL; static const int num_exts = ARRAY_SIZE (aarch64_extensions); - char buf[128]; FILE *f = NULL; bool arch = false; bool tune = false; @@ -178,6 +253,8 @@ host_detect_local_cpu (int argc, const char **argv) bool processed_exts = false; uint64_t extension_flags = 0; uint64_t default_flags = 0; + std::string buf; + size_t sep_pos = -1; gcc_assert (argc); @@ -202,9 +279,9 @@ host_detect_local_cpu (int argc, const char **argv) /* Look through /proc/cpuinfo to determine the implementer and then the part number that identifies a particular core. */ - while (fgets (buf, sizeof (buf), f) != NULL) + while (!(buf = readline (f)).empty ()) { - if (strstr (buf, "implementer") != NULL) + if (find_field (buf, "implementer") != std::string::npos) { unsigned cimp = parse_field (buf); if (cimp == INVALID_IMP) @@ -216,8 +293,7 @@ host_detect_local_cpu (int argc, const char **argv) else if (imp != cimp) goto not_found; } - - if (strstr (buf, "variant") != NULL) + else if (find_field (buf, "variant") != std::string::npos) { unsigned cvariant = parse_field (buf); if (!contains_core_p (variants, cvariant)) @@ -229,8 +305,7 @@ host_detect_local_cpu (int argc, const char **argv) } continue; } - - if (strstr (buf, "part") != NULL) + else if (find_field (buf, "part") != std::string::npos) { unsigned ccore = parse_field (buf); if (!contains_core_p (cores, ccore)) @@ -242,39 +317,36 @@ host_detect_local_cpu (int argc, const char **argv) } continue; } - if (!tune && !processed_exts && strstr (buf, "Features") != NULL) + else if (!tune && !processed_exts + && (sep_pos = find_field (buf, "Features")) != std::string::npos) { + /* First create the list of features in the buffer. */ + std::set<std::string> features; + /* Drop everything till the :. */ + buf = buf.substr (sep_pos + 1); + split_words (buf, features); + for (i = 0; i < num_exts; i++) { - const char *p = aarch64_extensions[i].feat_string; + const std::string val (aarch64_extensions[i].feat_string); /* If the feature contains no HWCAPS string then ignore it for the auto detection. */ - if (*p == '\0') + if (val.empty ()) continue; bool enabled = true; /* This may be a multi-token feature string. We need to match all parts, which could be in any order. */ - size_t len = strlen (buf); - do - { - const char *end = strchr (p, ' '); - if (end == NULL) - end = strchr (p, '\0'); - if (memmem (buf, len, p, end - p) == NULL) - { - /* Failed to match this token. Turn off the - features we'd otherwise enable. */ - enabled = false; - break; - } - if (*end == '\0') - break; - p = end + 1; - } - while (1); + std::set<std::string> tokens; + split_words (val, tokens); + std::set<std::string>::iterator it; + + /* Iterate till the first feature isn't found or all of them + are found. */ + for (it = tokens.begin (); enabled && it != tokens.end (); ++it) + enabled = enabled && features.count (*it); if (enabled) extension_flags |= aarch64_extensions[i].flag;