Extending -flto parallel feature to the rest of the build

2014-12-17 Thread Lewis Hyatt
Hello-

I recently started using -flto in my builds; it's a very impressive
feature, thanks very much for adding it. One thing that occurred to me
while switching over to it: in an LTO world, the object files seem to
be becoming less and less relevant, at least for some applications.
Since you are already committing to a long build in return for the
run-time performance benefit, in many cases it makes sense to go
whole-hog and just recompile everything every time. This has several
advantages besides fewer large files lying around: it simplifies
things a lot, since I don't need to worry about accidentally linking
in an object file compiled differently from the rest (different
-march, different compiler, etc.) when I am rebuilding from scratch
every time. In my use case I do such things a lot, and I find it very
freeing to know I don't need to worry about any state left over from a
previous build.

In any case, the above was some justification for why I think the
following feature would be appreciated and used by others as well.
It's perhaps a little surprising, or at least disappointing, that
this:

g++ -flto=jobserver *.o

will be parallelized, but this:

g++ -flto=jobserver *.cpp

will effectively not be: each .cpp is compiled serially, and only then
does the LTO step run in parallel, yet in many cases the first step
dominates the build time. It's clear why things are done this way: if
users want to parallelize the compile, they are free to do so by
naming each object as a separate target in their Makefile and running
a parallel make. But this takes some effort to set up, especially if
you want to remove the intermediate .o files automatically, and since
-flto has already opened the door to GCC providing parallelization
features, it would be nice to enable parallelism more generally, for
all parts of the build that could benefit from it.
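For concreteness, the manual workaround described above amounts to something like the following Makefile sketch (file names are illustrative; the `+` prefix on the link recipe is what lets -flto=jobserver find the jobserver):

```make
# Illustrative sketch: compile objects in parallel under "make -jN",
# then link with LTO sharing the same jobserver slots.
OBJS := $(patsubst %.cpp,%.o,$(wildcard *.cpp))

a.out: $(OBJS)
	+g++ -flto=jobserver $(OBJS) -o $@
	rm -f $(OBJS)          # clean up the intermediate objects by hand

%.o: %.cpp
	g++ -flto -c $< -o $@
```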

I took a stab at implementing this. The patch below adds an option
-fparallel=(jobserver|N) that works analogously to -flto=, but applies
to the whole build. It generates a Makefile from each spec, with
appropriate dependencies, and then runs make to execute it. The
combination -fparallel=X -flto is parallelized on the LTO side as
well, as if -flto=jobserver had been specified; the idea is that any
downstream tool that can naturally offer parallel features would do so
in the presence of the -fparallel switch.

I am sure this is very rough around the edges (it's my first-ever look
at the GCC codebase), but I tried not to make it overly restrictive. I
only really have experience with Linux and C++, so I may have
inadvertently specialized something to those cases, though I did try
to keep it general. Here is a list of potential issues that could be
addressed:

-For some jobs, environment variables are set on a per-job basis. I
attempted to identify all of them and came up with COMPILER_PATH,
LIBRARY_PATH, and COLLECT_GCC_OPTIONS. This list would need to be kept
up to date if others are added.

-The mechanism I used to propagate environment variables (export +
unset) is probably specific to the Bourne shell and wouldn't work on
other platforms, but some simple platform-specific code could do it
right for Windows and others.

-Similarly, for -pipe mode I put pipes into the Makefile recipe, so
there may be platforms where this is not the correct syntax.
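To make the Bourne-shell concern concrete, a recipe in the generated Makefile might look roughly like this (paths, tool names, and values are placeholders for illustration, not the patch's actual output):

```make
# Hypothetical shape of one generated per-object recipe, with the
# per-job environment exported around the command and -pipe mode
# expressed as a shell pipeline.
a.o: a.cpp
	export COMPILER_PATH='/usr/libexec/gcc/...'; \
	export LIBRARY_PATH='/usr/lib/gcc/...'; \
	cc1plus a.cpp -quiet | as -o a.o; \
	unset COMPILER_PATH LIBRARY_PATH
```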

Anyway, here it is, in case there is any interest to pursue it
further. Thanks for listening...

-Lewis

=

diff --git gcc/common.opt gcc/common.opt
index 3b8b14d..4417847 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -1575,6 +1575,10 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.

+fparallel=
+Common Driver RejectNegative Joined Var(flag_parallel)
+Enable parallel build with number of parallel jobs or jobserver.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model) UnknownError(unknown LTO partitioning model %qs)

diff --git gcc/gcc.c gcc/gcc.c
index a5408a4..6f9c1cd 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1716,6 +1716,73 @@ static int have_c = 0;
 /* Was the option -o passed.  */
 static int have_o = 0;

+/* Parallel mode  */
+static int parallel = 0;
+static int parallel_ctr = 0;
+static int parallel_sctr = 0;
+static enum {
+  parallel_mode_off,
+  parallel_mode_first_job_in_spec,
+  parallel_mode_continued_spec
+} parallel_mode = parallel_mode_off;
+static bool jobserver = false;
+static FILE* mstream = NULL;
+static const char* makefile = NULL;
+
+/* Helper to turn $ -> $$ for make and
+   maybe escape single quotes for the shell.  */
+static void
+mstream_escape_puts (const char* string, bool single_quote)
+{
+  if (single_quote)
+    fputc ('\'', mstream);
+  for (; *string; string++)
+    {
+      if (*string == '$')
+	fputs ("$$", mstream);
+      else if (single_quote && *string == '\'')
+	fputs ("\'\\\'\'", mstream
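For illustration, the escaping logic in the (truncated) helper above can be sketched standalone in Python; the function name `escape_for_make` is hypothetical, but the two transformations match what the patch does: double `$` for make, and close/escape/reopen single quotes for the shell.

```python
# Sketch of the patch's escaping helper (names are illustrative):
# '$' must become '$$' so make passes it through literally, and an
# embedded single quote becomes the shell idiom '\'' .
def escape_for_make(s, single_quote=False):
    out = "'" if single_quote else ""
    for ch in s:
        if ch == '$':
            out += '$$'
        elif single_quote and ch == "'":
            out += "'\\''"   # close quote, escaped quote, reopen quote
        else:
            out += ch
    return out + ("'" if single_quote else "")

print(escape_for_make('$HOME'))        # $$HOME
print(escape_for_make("it's", True))   # 'it'\''s'
```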

Re: Extending -flto parallel feature to the rest of the build

2015-01-15 Thread Lewis Hyatt
Well, I guess it's safe to say this did not generate resounding
interest :-). Just thought I would check once more whether anyone
thinks it is a worthwhile thing to pursue, and/or has any feedback on
the implementation attempt. FWIW, I have been using this myself for a
while now and enjoy it. Thanks!

-Lewis

On Wed, Dec 17, 2014 at 1:12 PM, Lewis Hyatt  wrote:
> [original message quoted in full above]
Re: The encoding of GCC's stderr

2020-11-17 Thread Lewis Hyatt via Gcc
On Tue, Nov 17, 2020 at 12:01 PM David Malcolm via Gcc  wrote:
>
> As far as I can tell, GCC's diagnostic output on stderr is a mixture of
> bytes from various different places in our internal representation:
> - filenames
> - format strings from diagnostic messages (potentially translated via
> .po files)
> - identifiers
> - quoted source code
> - fix-it hints
> - labels
>
> As noted in https://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
> source files can be in any character set, specified by -finput-charset=, and
> libcpp converts that to the "source character set", Unicode, encoding it
> internally as UTF-8.  String and character constants are then converted to
> the execution character set (defaulting to UTF-8-encoded Unicode).  In many
> places we use identifier_to_locale to convert from the "internal encoding" to
> the locale character set, falling back to converting non-ASCII characters to
> UCNs.  I suspect that there are numerous places where we're not doing that,
> but ought to be.
>
> The only test coverage I could find for -finput-charset is
> gcc.dg/ucnid-16-utf8.c, which has a latin1-encoded source file, and
> verifies that a latin-1 encoded variable name becomes UTF-8 encoded in
> the resulting .s file.  I shudder to imagine a DejaGnu test for a
> source encoding that's not a superset of ASCII (e.g. UCS-4) - how would
> the dg directives be handled?  I wonder if DejaGnu can support tests in
> which the compiler's locale is overridden with environment variables
> (and thus having e.g. non-ASCII/non-UTF-8 output).
>
> What is the intended encoding of GCC's stderr?

I don't really have the context to comment much on this, since I just
know what I tried to figure out while adding the initial support for
UTF-8 identifiers, but I thought I would note a few relevant things I
have come across. One is that you actually can't use -finput-charset
with an encoding that is not a superset of ASCII. I was confused by
this and filed PR92987 about it
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92987), but Joseph
explained that this is expected.
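The conversion pipeline described in the quoted message can be mimicked outside the compiler; this Python sketch (illustrative only, not GCC code) shows a Latin-1 source line becoming the UTF-8 internal representation that libcpp uses:

```python
# A Latin-1 encoded source line, as -finput-charset=ISO-8859-1 would see it
# (0xE9 is 'e' with acute accent in Latin-1).
source_bytes = b'int caf\xe9 = 1;'

# libcpp's step: decode from the input charset...
decoded = source_bytes.decode('iso-8859-1')

# ...and re-encode as UTF-8, the internal "source character set" encoding.
internal = decoded.encode('utf-8')

print(internal)  # b'int caf\xc3\xa9 = 1;'
```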

Another random tidbit, it's actually hard to test some of this stuff
with dejagnu because TCL does some silent conversion without telling
you. In particular, if it sees command output that looks like latin1,
it will convert it to UTF-8 behind the scenes before presenting it to
you, irrespective of the current locale. I ran across that when
constructing the test cases for the patch attached to PR93067
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93067#c0) and had to
find some random encoding that satisfied both properties (superset of
ASCII, not looking like latin1) to use for that test.

>
> In gcc_init_libintl we call:
>
> #if defined HAVE_LANGINFO_CODESET
>   locale_encoding = nl_langinfo (CODESET);
>   if (locale_encoding != NULL
>   && (!strcasecmp (locale_encoding, "utf-8")
>   || !strcasecmp (locale_encoding, "utf8")))
> locale_utf8 = true;
> #endif
>
> so presumably stderr ought to be nl_langinfo (CODESET).
>
> We use the above to potentially use the UTF-8 encoding of U+2018 and
> U+2019 for open/close quotes, falling back to ASCII for these.
>
> As far as I can tell, we currently:
> - blithely accept and emit filenames as bytes (I don't think we make
> any attempt to enforce that they're any particular encoding)
> - emit format strings in whatever encoding gettext gives us
> - emit identifiers as char * from IDENTIFIER_POINTER, calling
> identifier_to_locale on them in many places, but I suspect we're
> missing some
> - blithely emit quoted source code as raw bytes (this is PR
> other/93067, which has an old patch attached; presumably the source
> ought to be emitted to stderr in the locale encoding)
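The nl_langinfo (CODESET) check quoted above has a direct Python analogue (Unix-only, shown purely for illustration); it reports the same locale codeset the gcc_init_libintl snippet tests against "utf-8"/"utf8":

```python
import locale

# Adopt the environment's locale, then query the codeset the same way
# GCC's gcc_init_libintl does via nl_langinfo (CODESET).
locale.setlocale(locale.LC_ALL, '')
codeset = locale.nl_langinfo(locale.CODESET)

# Mirror of the strcasecmp checks in the quoted C code.
locale_utf8 = codeset.casefold() in ('utf-8', 'utf8')
print(codeset, locale_utf8)
```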

When I was first trying to understand this stuff for supporting UTF-8
identifiers, I discussed some of it with Joseph as well, here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c28 . I agree there
are two missing conversions: source should be converted to UTF-8 when
read for diagnostics (PR93067; that patch, by the way, I have updated,
so it's no longer old), and it should also be converted to the current
locale on output. This second part is a bit involved, though, because
we presumably do not want to disturb the alignment of labels and such.
So it would seem that when a given character cannot be output, it
should be converted to as many '?' characters as needed to fill out
its display width, or something like that. Presumably the most common,
if not the only practical, case would be UTF-8 source being processed
in an ASCII locale; currently the user will get UTF-8 anyway. But it
can be done in general too. I could work on this if you like; it would
probably be good to finalize the PR93067 patch first. Should I prepare
it and post it here? I had left it on the PR for feedback because I
wasn't sure whether the approach I implemented was OK.
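The '?'-padding idea above can be sketched in a few lines of Python (a rough illustration of the intent, not the proposed GCC implementation; the function name is made up). Characters that cannot be encoded in the target charset are replaced by one '?' per display column, so column alignment of labels survives:

```python
import unicodedata

def to_locale_with_padding(line, encoding):
    """Replace characters unencodable in `encoding` with '?' characters,
    one per display column, to preserve the line's visual width."""
    out = []
    for ch in line:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # East-Asian wide/fullwidth characters occupy two columns.
            wide = unicodedata.east_asian_width(ch) in ('W', 'F')
            out.append('??' if wide else '?')
    return ''.join(out)

print(to_locale_with_padding('int caf\u00e9 = 1;', 'ascii'))  # int caf? = 1;
```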

> - fix-it hints can contain identifiers a

post-commit hook failure

2021-08-25 Thread Lewis Hyatt via Gcc
Hello-

I wanted to note that I got some errors when pushing this commit:

commit 3ac6b5cff1eca4e1748c671960ef7b4ca5e47fd2 (HEAD -> master,
origin/trunk, origin/master, origin/HEAD)
Author: Lewis Hyatt 
Date:   Tue Aug 24 19:30:44 2021 -0400

diagnostics: Support for -finput-charset [PR93067]

The errors were:

remote: Traceback (most recent call last):
remote:   File "hooks/post_receive.py", line 118, in <module>
remote: post_receive(refs_data, args.submitter_email)
remote:   File "hooks/post_receive.py", line 65, in post_receive
remote: submitter_email)
remote:   File "hooks/post_receive.py", line 47, in post_receive_one
remote: update.send_email_notifications()
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py",
line 189, in send_email_notifications
remote: self.__email_new_commits()
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py",
line 1031, in __email_new_commits
remote: commit, self.get_standard_commit_email(commit))
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py",
line 1011, in __send_commit_email
remote: default_diff=email.diff)
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py",
line 946, in __maybe_get_email_custom_contents
remote: hook_input=json.dumps(hooks_data),
remote:   File "/usr/lib64/python2.7/json/__init__.py", line 244, in dumps
remote: return _default_encoder.encode(obj)
remote:   File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
remote: chunks = self.iterencode(o, _one_shot=True)
remote:   File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
remote: return _iterencode(o, 0)
remote: UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in
position 14638: invalid start byte


I think the commit itself pushed fine, but some expected things did
not happen, such as updating Bugzilla. The error makes some sense,
because this commit added tests with non-standard encodings, while the
JSON encoder in the hook script expects everything it reads to be
UTF-8. I wanted to mention it in case it breaks some other things that
need to be addressed, such as ChangeLog updating... Please let me know
if I should have done something differently here. Thanks!
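For reference, the hook's failure can be reproduced in miniature (the byte string here is made up; any byte like 0xF5 that can never start a UTF-8 sequence triggers the same error before json.dumps can succeed):

```python
import json

# Stand-in for the raw diff bytes of the commit, containing a byte
# that is invalid as the start of a UTF-8 sequence.
raw = b'test data \xf5 from a non-UTF-8 encoded file'

try:
    json.dumps({'diff': raw.decode('utf-8')})
    outcome = 'ok'
except UnicodeDecodeError as e:
    outcome = e.reason

print(outcome)  # invalid start byte
```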

-Lewis