Extending -flto parallel feature to the rest of the build
Hello-

I recently started using -flto in my builds; it's a very impressive feature, thanks very much for adding it. One thing occurred to me while switching over to using it: in an LTO world, the object files, it seems to me, are becoming increasingly less relevant, at least for some applications. Since you are already committing to the build taking a long time, in return for the run-time performance benefit, in many cases it makes sense to go whole-hog and just compile everything every time anyway. This comes with a lot of advantages: besides leaving fewer large files lying around, it simplifies things a lot. Say I don't need to worry about accidentally linking in an object file compiled differently from the rest (different -march, different compiler, etc.), since I am just rebuilding from scratch every time. In my use case, I do such things a lot, and find it very freeing to know I don't need to worry about any state from a previous build.

In any case, the above was some justification for why I think the following feature would be appreciated and used by others as well. It's perhaps a little surprising, or at least disappointing, that this:

g++ -flto=jobserver *.o

will be parallelized, but this:

g++ -flto=jobserver *.cpp

effectively will not be; each .cpp is compiled serially, then the LTO runs in parallel, but in many cases the first step dominates the build time.

Now, it's clear why things are done this way: if users want to parallelize the compile step, they are free to do so by naming each object as a separate target in their Makefile and running a parallel make. But this takes some effort to set up, especially if you want to take care to remove the intermediate .o files automatically, and since -flto has already opened the door to gcc providing parallelization features, it seems like it would be nice to enable parallelization more generally, for all parts of the build that could benefit from it.

I took a stab at implementing this.
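For reference, the manual workaround described above (one target per object, then a parallel make, with the intermediate .o files removed afterwards) might look roughly like this; the target name and flag choices are just illustrative:

```make
# Hypothetical hand-written equivalent of -fparallel: each object is its
# own target, so `make -jN` parallelizes the compile step; the jobserver
# then drives the parallel LTO link, and the .o files are cleaned up.
SRCS := $(wildcard *.cpp)
OBJS := $(SRCS:.cpp=.o)

app: $(OBJS)
	+g++ -flto=jobserver -o $@ $(OBJS)
	rm -f $(OBJS)

%.o: %.cpp
	g++ -O2 -flto -c -o $@ $<
```

Run with e.g. `make -j8 app`; the leading `+` on the link recipe tells make to pass its jobserver file descriptors down to the command, so that -flto=jobserver can actually use them.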
The below patch adds an option -fparallel=(jobserver|N) that works analogously to -flto=, but applies to the whole build. It generates a Makefile from each spec, with appropriate dependencies, and then runs make to execute it. The combination -fparallel=X -flto will also be parallelized on the LTO side, as if -flto=jobserver were specified; the idea would be that any downstream tool that could naturally offer parallel features would do so in the presence of the -fparallel switch.

I am sure this must be very rough around the edges; it's my first-ever look at the gcc codebase, but I tried not to make it overly restrictive. I only really have experience with Linux and C++, so I may have inadvertently specialized something to these cases, but I did try to keep it general. Here is a list of potential issues that could be addressed:

-For some jobs there are environment variables set on a per-job basis. I attempted to identify all of them and came up with COMPILER_PATH, LIBRARY_PATH, and COLLECT_GCC_OPTIONS. This list would need to be kept up to date if others are added.

-The mechanism I used to propagate environment variables (export + unset) is probably specific to the Bourne shell and wouldn't work on other platforms, but some simple platform-specific code could do it right for Windows and others.

-Similarly for -pipe mode: I put pipes into the Makefile recipe, so there may be platforms where this is not the correct syntax.

Anyway, here it is, in case there is any interest in pursuing it further. Thanks for listening...

-Lewis

=

diff --git gcc/common.opt gcc/common.opt
index 3b8b14d..4417847 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -1575,6 +1575,10 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.
 
+fparallel=
+Common Driver RejectNegative Joined Var(flag_parallel)
+Enable parallel build with number of parallel jobs or jobserver.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model)
 UnknownError(unknown LTO partitioning model %qs)

diff --git gcc/gcc.c gcc/gcc.c
index a5408a4..6f9c1cd 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1716,6 +1716,73 @@ static int have_c = 0;
 /* Was the option -o passed.  */
 static int have_o = 0;

+/* Parallel mode */
+static int parallel = 0;
+static int parallel_ctr = 0;
+static int parallel_sctr = 0;
+static enum {
+  parallel_mode_off,
+  parallel_mode_first_job_in_spec,
+  parallel_mode_continued_spec
+} parallel_mode = parallel_mode_off;
+static bool jobserver = false;
+static FILE *mstream = NULL;
+static const char *makefile = NULL;
+
+/* helper to turn $ -> $$ for make and
+   maybe escape single quotes for the shell. */
+static void
+mstream_escape_puts (const char *string, bool single_quote)
+{
+  if (single_quote)
+    fputc ('\'', mstream);
+  for (; *string; string++)
+    {
+      if (*string == '$')
+	fputs ("$$", mstream);
+      else if (single_quote && *string == '\'')
+	fputs ("\'\\\'\'", mstream);
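For illustration, the quoting rules that mstream_escape_puts implements in the patch above (doubling `$` for make, and the usual close-quote/escaped-quote/reopen dance for the shell) can be sketched in Python; the helper name and examples here are made up, not part of the patch:

```python
def mstream_escape(s: str, single_quote: bool = False) -> str:
    """Escape a string for use in a generated Makefile recipe: '$'
    becomes '$$' so make passes a literal dollar sign to the shell;
    optionally wrap in single quotes, escaping embedded quotes."""
    out = ["'"] if single_quote else []
    for ch in s:
        if ch == "$":
            out.append("$$")          # make: literal dollar sign
        elif single_quote and ch == "'":
            out.append("'\\''")       # shell: close, escaped quote, reopen
        else:
            out.append(ch)
    if single_quote:
        out.append("'")
    return "".join(out)

print(mstream_escape("cost=$5"))            # cost=$$5
print(mstream_escape("it's a $var", True))  # 'it'\''s a $$var'
```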
Re: Extending -flto parallel feature to the rest of the build
Well, I guess it's safe to say this did not generate resounding interest :-). Just thought I would check once more whether anyone thought it was a worthwhile thing to pursue, and/or had any feedback on the attempt at implementing it. FWIW, I have been using this myself for a while now and enjoy it. Thanks!

-Lewis

On Wed, Dec 17, 2014 at 1:12 PM, Lewis Hyatt wrote:
> Hello-
>
> I recently started using -flto in my builds, it's a very impressive
> feature, thanks very much for adding it. One thing that occurred to me
> while switching over to using it: In an LTO world, the object files,
> it seems to me, are becoming increasingly less relevant, at least for
> some applications. Since you are already committing to the build
> taking a long time, in return for the run-time performance benefit, it
> makes sense in a lot of cases to go whole-hog and just compile
> everything every time anyway. This comes with a lot of advantages,
> besides fewer large files laying around, it simplifies things a lot,
> say I don't need to worry about accidentally linking in an object file
> compiled differently vs the rest (different -march, different
> compiler, etc.), since I am just rebuilding from scratch every time.
> In my use case, I do such things a lot, and find it very freeing to
> know I don't need to worry about any state from a previous build.
>
> In any case, the above was some justification for why I think the
> following feature would be appreciated and used by others as well.
> It's perhaps a little surprising, or at least disappointing, that
> this:
>
> g++ -flto=jobserver *.o
>
> will be parallelized, but this:
>
> g++ -flto=jobserver *.cpp
>
> will effectively not be; each .cpp is compiled serially, then the LTO
> runs in parallel, but in many cases the first step dominates the build
> time.
>
> Now it's clear why things are done this way, if the user wants
> to parallelize the compile, they are free to do so by just naming each
> object as a separate target in their Makefile and running a parallel
> make. But this takes some effort to set up, especially if you want to
> take care to remove the intermediate .o files automatically, and since
> -flto has already opened the door to gcc providing parallelization
> features, it seems like it would be nice to enable parallelizing more
> generally, for all parts of the build that could benefit from it.
>
> I took a stab at implementing this. The below patch adds an option
> -fparallel=(jobserver|N) that works analogously to -flto=, but applies
> to the whole build. It generates a Makefile from each spec, with
> appropriate dependencies, and then runs make to execute it. The
> combination -fparallel=X -flto will also be parallelized on the lto
> side as well, as if -flto=jobserver were specified; the idea would be
> any downstream tool that could naturally offer parallel features would
> do so in the presence of the -fparallel switch.
>
> I am sure this must be very rough around the edges, it's my first-ever
> look at the gcc codebase, but I tried not to make it overly
> restrictive. I only really have experience with Linux and C++ so I may
> have inadvertently specialized something to these cases, but I did try
> to keep it general. Here is a list of potential issues that could be
> addressed:
>
> -For some jobs there are environment variables set on a per-job basis.
> I attempted to identify all of them and came up with COMPILER_PATH,
> LIBRARY_PATH, and COLLECT_GCC_OPTIONS. This would need to be kept up
> to date if others are added.
>
> -The mechanism I used to propagate environment variables (export +
> unset) is probably specific to the Bourne shell and wouldn't work on
> other platforms, but there would be some simple platform-specific code
> to do it right for Windows and others.
>
> -Similarly for -pipe mode, I put pipes into the Makefile recipe, so
> there may be platforms where this is not the correct syntax.
>
> Anyway, here it is, in case there is any interest to pursue it
> further. Thanks for listening...
>
> -Lewis
>
> =
>
> diff --git gcc/common.opt gcc/common.opt
> index 3b8b14d..4417847 100644
> --- gcc/common.opt
> +++ gcc/common.opt
> @@ -1575,6 +1575,10 @@ flto=
> Common RejectNegative Joined Var(flag_lto)
> Link-time optimization with number of parallel jobs or jobserver.
>
> +fparallel=
> +Common Driver RejectNegative Joined Var(flag_parallel)
> +Enable parallel build with number of parallel jobs or jobserver.
> +
> Enum
> Name(lto_partition_model) Type(enum lto_partition_model)
> UnknownError(unknown LTO partitioning model %qs)
>
> diff --git gcc/gcc.c gcc/gcc.c
> index a5408a4..6f9c1cd 100644
> --- gcc/gcc.c
Re: The encoding of GCC's stderr
On Tue, Nov 17, 2020 at 12:01 PM David Malcolm via Gcc wrote:
>
> As far as I can tell, GCC's diagnostic output on stderr is a mixture of
> bytes from various different places in our internal representation:
> - filenames
> - format strings from diagnostic messages (potentially translated via
>   .po files)
> - identifiers
> - quoted source code
> - fix-it hints
> - labels
>
> As noted in https://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
> source files can be in any character set, specified by -finput-charset=,
> and libcpp converts that to the "source character set", Unicode,
> encoding it internally as UTF-8. String and character constants are
> then converted to the execution character set (defaulting to
> UTF-8-encoded Unicode). In many places we use identifier_to_locale to
> convert from the "internal encoding" to the locale character set,
> falling back to converting non-ASCII characters to UCNs. I suspect
> that there are numerous places where we're not doing that, but ought
> to be.
>
> The only test coverage I could find for -finput-charset is
> gcc.dg/ucnid-16-utf8.c, which has a latin1-encoded source file, and
> verifies that a latin-1 encoded variable name becomes UTF-8 encoded in
> the resulting .s file. I shudder to imagine a DejaGnu test for a
> source encoding that's not a superset of ASCII (e.g. UCS-4) - how would
> the dg directives be handled? I wonder if DejaGnu can support tests in
> which the compiler's locale is overridden with environment variables
> (and thus having e.g. non-ASCII/non-UTF-8 output).
>
> What is the intended encoding of GCC's stderr?

I don't really have the context to comment much on this, since I just know what I tried to figure out while initially adding the support for UTF-8 identifiers, but I thought I would note a few relevant things I have come across. One is that you actually can't use -finput-charset with an encoding that is not a superset of ASCII.
I was confused by this and filed PR92987 about it (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92987), but Joseph explained that this is expected.

Another random tidbit: it's actually hard to test some of this stuff with DejaGnu, because TCL does some silent conversion without telling you. In particular, if it sees command output that looks like latin1, it will convert it to UTF-8 behind the scenes before presenting it to you, irrespective of the current locale. I ran across that when constructing the test cases for the patch attached to PR93067 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93067#c0) and had to find some random encoding satisfying both properties (a superset of ASCII that does not look like latin1) to use for that test.

> In gcc_init_libintl we call:
>
> #if defined HAVE_LANGINFO_CODESET
>   locale_encoding = nl_langinfo (CODESET);
>   if (locale_encoding != NULL
>       && (!strcasecmp (locale_encoding, "utf-8")
>           || !strcasecmp (locale_encoding, "utf8")))
>     locale_utf8 = true;
> #endif
>
> so presumably stderr ought to be nl_langinfo (CODESET).
>
> We use the above to potentially use the UTF-8 encoding of U+2018 and
> U+2019 for open/close quotes, falling back to ASCII for these.
>
> As far as I can tell, we currently:
> - blithely accept and emit filenames as bytes (I don't think we make
>   any attempt to enforce that they're any particular encoding)
> - emit format strings in whatever encoding gettext gives us
> - emit identifiers as char * from IDENTIFIER_POINTER, calling
>   identifier_to_locale on them in many places, but I suspect we're
>   missing some
> - blithely emit quoted source code as raw bytes (this is PR
>   other/93067, which has an old patch attached; presumably the source
>   ought to be emitted to stderr in the locale encoding)

When I was first trying to understand this stuff for supporting UTF-8 identifiers, I discussed some of it with Joseph as well here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c28
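The latin1-vs-UTF-8 behavior described above is easy to reproduce outside TCL: any byte string is valid latin1 (all 256 byte values map to a code point), so a latin1-to-UTF-8 conversion always "succeeds" even when the bytes were never latin1 to begin with. A quick Python illustration (the example bytes are made up):

```python
raw = b"caf\xe9"  # 'cafe' with e-acute encoded as latin1; not valid UTF-8

# Decoding as UTF-8 fails: 0xe9 starts a multi-byte sequence that the
# following bytes do not complete.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# But decoding as latin1 cannot fail, so a tool that guesses latin1 can
# silently re-encode the output as UTF-8, changing the bytes underneath you.
converted = raw.decode("latin-1").encode("utf-8")

print(utf8_ok)    # False
print(converted)  # b'caf\xc3\xa9'
```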
I would agree there are two missing conversions. Source should be converted to UTF-8 when it is read for diagnostics (PR93067; that patch, by the way, I have updated so it's no longer old...), and it should also be converted to the current locale on output. This second part is a bit involved, though, because we presumably do not want to disturb the alignment of labels and such. So it would seem that when a given character cannot be output, it should be converted to as many '?' characters as are needed to fill out its display width, or something like that.

Presumably the most common, if not the only practical, case would be UTF-8 source being processed in an ASCII locale; currently the user will get UTF-8 output anyway. But it can be done in general too. I could work on this if you like; it would probably be good to finalize the PR93067 patch first. Should I prepare it and post it here? I had left it on the PR for feedback because I wasn't sure whether the approach I took in implementing it was OK.

> - fix-it hints can contain identifiers a
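The '?'-padding idea sketched above could look something like the following. This is a rough sketch, not GCC's implementation; it assumes wide/fullwidth East Asian characters occupy two display columns and everything else one, which is the approximation needed to keep diagnostic labels aligned:

```python
import unicodedata

def ascii_approx(s: str) -> str:
    """Replace each non-ASCII character with enough '?' characters to
    cover its display width, preserving the column alignment of any
    labels or carets printed underneath the source line."""
    out = []
    for ch in s:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Wide ('W') and fullwidth ('F') characters take two columns.
            width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
            out.append("?" * width)
    return "".join(out)

print(ascii_approx("int h\u00e9llo;"))  # int h?llo;
print(ascii_approx("int \u6f22\u5b57;"))  # int ????;
```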
post-commit hook failure
Hello-

I wanted to note that I got some errors when pushing this commit:

commit 3ac6b5cff1eca4e1748c671960ef7b4ca5e47fd2 (HEAD -> master, origin/trunk, origin/master, origin/HEAD)
Author: Lewis Hyatt
Date:   Tue Aug 24 19:30:44 2021 -0400

    diagnostics: Support for -finput-charset [PR93067]

The errors were:

remote: Traceback (most recent call last):
remote:   File "hooks/post_receive.py", line 118, in <module>
remote:     post_receive(refs_data, args.submitter_email)
remote:   File "hooks/post_receive.py", line 65, in post_receive
remote:     submitter_email)
remote:   File "hooks/post_receive.py", line 47, in post_receive_one
remote:     update.send_email_notifications()
remote:   File "/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 189, in send_email_notifications
remote:     self.__email_new_commits()
remote:   File "/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 1031, in __email_new_commits
remote:     commit, self.get_standard_commit_email(commit))
remote:   File "/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 1011, in __send_commit_email
remote:     default_diff=email.diff)
remote:   File "/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 946, in __maybe_get_email_custom_contents
remote:     hook_input=json.dumps(hooks_data),
remote:   File "/usr/lib64/python2.7/json/__init__.py", line 244, in dumps
remote:     return _default_encoder.encode(obj)
remote:   File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
remote:     chunks = self.iterencode(o, _one_shot=True)
remote:   File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
remote:     return _iterencode(o, 0)
remote: UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 14638: invalid start byte

I think the commit itself was pushed fine; however, some expected things did not happen, such as updating Bugzilla.
The error makes some sense, because this commit added tests with non-standard encodings, while Python's json module wants everything it reads to be UTF-8 by default. I wanted to mention it in case it breaks other things, such as ChangeLog updating, that would need to be addressed... Please let me know if I should have done something differently here. Thanks!

-Lewis
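The failure mode is easy to reproduce. The byte value below matches the one from the traceback; the surrounding content is made up, but it mirrors what the hook does when it tries to json-encode a commit diff containing non-UTF-8 test files:

```python
import json

payload = b"test file contents \xf5 with a non-UTF-8 byte"

# json needs text, so the hook must decode the diff somewhere; 0xf5 is
# never a valid UTF-8 start byte, so that decode step raises.
try:
    json.dumps({"diff": payload.decode("utf-8")})
    error = None
except UnicodeDecodeError as e:
    error = e.reason

print(error)  # invalid start byte
```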