Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Thomas Neumann
> For this _specific_ instance of the general problem, C++ users could
> use numeric_limits::max() and get away with it, but I don't
> believe such a solution (or the one you propose or similar I've seen)
> to this specific instance generalizes to portable, readable and
> maintainable solution to the general problem.   
while a solution as generic as numeric_limits is hard to do in C, a
simple calculation of the largest positive value can be done in C:

#define MAXSIGNEDVALUE(x) (~((~((x)0))<<(8*sizeof(x)-1)))

admittedly this simple definition still overflows during constant
folding (see below for a more complex one that avoids this), but you
can calculate the maximum value with a not too complex macro.

This should be enough for most use cases; if you need defined overflows,
use unsigned data types. And in the rare cases that this might not be
possible, use autoconf magic to find out the types/constants you have to
use. Your code will not be portable to other compilers anyway.
Restricting the optimizer by default just for these obscure corner cases
does not seem justified IMHO, especially if other compilers behave the same.

Thomas

Longer definition that avoids undefined overflows:

#define MAXSIGNEDVALUE(x) \
   ((x)(~((~((unsigned long long)0))<<(8*sizeof(x)-1))))
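
For illustration, here is a quick usage sketch (my toy example, not
part of any patch); like the macros above it assumes that unsigned
long long is the widest unsigned type:

#include <stdio.h>

#define MAXSIGNEDVALUE(x) \
   ((x)(~((~((unsigned long long)0))<<(8*sizeof(x)-1))))

int main(void)
{
  /* the macro takes a type, not a value */
  printf("%d\n", MAXSIGNEDVALUE(int));          /* 2147483647 */
  printf("%lld\n", MAXSIGNEDVALUE(long long));  /* 9223372036854775807 */
  return 0;
}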



Re: [RFA] C++ language compatibility in sources [was RE: Add missing casts in gengtype-lex]

2007-04-12 Thread Thomas Neumann
Dave Korn wrote:
> Maybe it would make more sense to bundle them up into two tranches,
> one for all the gen* utilities, one for the compiler core itself.
> That would be much more practical to do the full
> bootstrap-and-regtest procedure.
indeed. To get a fully C++-compliant compiler I would probably have to
send dozens of tiny patches. Even though the individual patches are
trivial, this would be cumbersome to apply. Bundling them into larger
logical patches would make sense, except for the reason below...

> However, bundling them all up into big patches would probably run
> over the size limit for "small patches that don't require paperwork".
> Do you have an assignment on file with the FSF?
no, and this is the reason why I send tiny patches. But I could try to
fill out the required paperwork (although I think I read that it takes
ages to be processed).

Thomas



tree_code and type safety

2007-04-13 Thread Thomas Neumann
Hi,

while waiting for my copyright assignment, I continued compiling gcc
with a C++ compiler. Most problems are minor, but now I encountered one
where I am unsure what to do:

The basic tree codes are defined by the enum tree_code, which
basically looks like this:

enum tree_code {
   ...,
   LAST_AND_UNUSED_TREE_CODE
};

The C front end apparently needs additional tree codes, and defines them
like this:

enum c_tree_code {
  C_DUMMY_TREE_CODE = LAST_AND_UNUSED_TREE_CODE,
  ...
};

So far so good, but the C front end then passes its private tree codes
to functions expecting tree_code values. This is not accepted by the
C++ compiler, as tree_code and c_tree_code are distinct types.
In fact I think the code is undefined even in C unless something like
MAXIMUM_TREE_CODE = 65535 is added to tree_code and MINIMUM_TREE_CODE
= 0 is added to c_tree_code. (At least if the C++ standard paragraph
7.2.6 is similar to its C counterpart.)

Now I have multiple options to fix this issue:

1) I could just explicitly cast from c_tree_code to tree_code (see the
sketch below). This avoids the error, and is only needed about two or
three times in the whole code base, as there is currently only one C
tree code. But more tree codes might be added in the future, and other
front ends might use more tree codes.

2) I could use an integer data type instead of an enum to hold the tree
code values. This avoids all problems, but is massively invasive: grep
"enum tree_code" returns 530 hits, changing all of these into tree_code
(as an integer typedef would not be in the enum namespace) would touch
many files. Probably not a good idea.

3) use preprocessor magic to add the front end tree codes into the tree
code enum, somewhat like this (just a rough sketch):

enum tree_code {
   ...,
   LAST_AND_UNUSED_TREE_CODE,
   FIRST_C_CODE = LAST_AND_UNUSED_TREE_CODE,
#include "c-common.def"
   FIRST_FOOLANG_CODE = LAST_AND_UNUSED_TREE_CODE,
#include "foolang-common.def"
};

This keeps the enum reasonable, but the core has to know about the
front ends somehow. Or perhaps it is enough to include the tree codes
of the _current_ front end, whatever it is. The preprocessor magic is
probably not trivial, but otherwise the rest of the code should not be
affected.
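
To illustrate option 1), here is a minimal self-contained sketch
(reduced, hypothetical enums and function, not the actual front end
code) of the kind of cast I mean:

enum tree_code { PLUS_EXPR, MINUS_EXPR, LAST_AND_UNUSED_TREE_CODE };
enum c_tree_code {
   C_DUMMY_TREE_CODE = LAST_AND_UNUSED_TREE_CODE,
   STMT_EXPR_CODE
};

/* a core function expecting the shared enum */
void set_code (enum tree_code code);

void c_frontend_example (void)
{
   /* C accepts the implicit conversion; C++ requires the cast */
   set_code ((enum tree_code) STMT_EXPR_CODE);
}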


I tend to just go for 1) and add the casts, but this is not very
future-proof. Any suggestions?

Thomas



Re: Why not contribute? (to GCC)

2010-04-24 Thread Thomas Neumann
> What reasons keep you from contributing to GCC?
I tried this a while ago, but ultimately gave up because I could not
get my patches in. Some were applied, but many never made it.
Admittedly they were perhaps not of general interest; they were only
improving the compatibility of the gcc code base with C++ (which Ian
Lance Taylor then did later).
The most frustrating part was not having patches rejected (I could
then improve them, after all), but having patches ignored. You submit
a number of patches, and the result is... nothing. No response at all.
Not exactly something that encourages working on gcc as a free-time
activity.

On the other hand I am a professional developer, too (even working on 
compilers), and I myself would perhaps also be reluctant to spend time on 
reviewing and merging patches that I do not really care about. So I 
understand the gcc developers. But it is still frustrating for outsiders.

Thomas




Re: The gcc-in-cxx branch now completes bootstrap

2009-04-11 Thread Thomas Neumann
>> Also, is there any significant difference in bootstrap times?
> 
> I haven't actually measured, but subjectively bootstrap does seem to
> take longer.
I tried this out of curiosity. The numbers below are the bootstrap
times on a 64-bit 2.6.28 Linux system (Core 2 E8400), building
single-threaded with --enable-languages=c,c++ --disable-multilib

Regular gcc:

real    59m6.914s
user    53m53.702s
sys     3m24.073s

gcc-in-cxx:

real    68m15.366s
user    61m32.255s
sys     4m24.481s

Of course the bootstrap times are not that useful themselves, as they 
compare two quite different compilation tasks (one C, one C++). To get a 
better idea about the different compiler speeds, I compiled some random 
(reasonably complex) C++ code I had at hand, and compared the compile times.

Regular gcc:

real    0m30.478s
user    0m27.842s
sys     0m1.888s

gcc-in-cxx:

real    0m35.926s
user    0m34.386s
sys     0m1.208s

Again the comparison is not 100% fair ("regular gcc" is current mainline, 
while the gcc-in-cxx branch is older), but apparently the C++ version is 
quite a bit slower.

Admittedly gcc-in-cxx only recently managed to bootstrap at all, so
perhaps performance comparisons are a bit unfair. But I do not mean
this as a critique of gcc-in-cxx; I want to help improve it and bring
it to the same speed as regular mainline.
Is there any reasonably simple way to find out why the C++ version is
slower? I can use something like oprofile, of course, but I thought
gcc could report statistics about its internal timings, which might be
more useful as a first approximation.

Thomas





Re: The gcc-in-cxx branch now completes bootstrap

2009-04-11 Thread Thomas Neumann
Ben Elliston wrote:
> Try using -ftime-report.
thanks, that was what I had in mind.

The largest difference seems to be in "tree STMT verifier" (36%
runtime increase); a few others increased slightly, and most seem to
be nearly identical. (This distribution could be an artifact of my
example code, of course.)

I will have to take a closer look to find the exact cause of the slow-down.

Thomas





Re: The gcc-in-cxx branch now completes bootstrap

2009-04-12 Thread Thomas Neumann
Curious. I ran both g++ variants in oprofile, and then compared the 
generated assembler code for the most critical functions.

The top function in both cases is pointer_set_insert, and there the
assembler code is 100% identical (modulo one choice between r14 and r15).

The second most critical function in the gcc-in-cxx build is
walk_tree_1, which only ranks fourth in mainline gcc.
There the code seems to be identical, too, except for code layout: the
compiler arranges the code in a different order, and apparently makes
different branch-prediction choices. The non-branching code is nearly
identical, too.
The "hottest" assembler instructions in walk_tree_1 are memory
accesses; apparently the mainline version causes slightly fewer cache
misses or benefits from better prediction? (my interpretation, not
measured yet)

I am a bit unsure how to proceed. The gcc-in-cxx assembler code looks ok, as 
it is nearly identical to the mainline code. The main differences are in the 
code/branch layout, and I wouldn't know how to debug this.

Thomas





Re: Using C++ in GCC is OK

2010-05-31 Thread Thomas Neumann
> Because C++ is a big language, I think we should try to enumerate what
> is OK, rather than what is not OK.
> Is there anyone who would like to volunteer to develop the C++ coding
> standards?
I hope you don't mind my question (as I am currently not an active GCC
developer), but what is the point of this? I understand why you do not
want to have sudden, disruptive changes to a large existing code base,
so any introduction of C++ features should be careful and controlled,
but why artificially eliminate parts of the language?
I personally think the coding standard should be: "All C++ code has to
be 1) easily readable, 2) easily maintainable, and 3) produce
efficient machine code and conserve memory. Apart from that, use
ISO/IEC 14882:1998".

Now I know that this is totally unrealistic in the context of the GCC
project, and some people here get really nervous about potential C++
creep, but IMHO artificial limitations on a purely syntactic basis are
not really meaningful. One should look at the consequences and not at
the syntax.

Thomas




Re: Using C++ in GCC is OK

2010-05-31 Thread Thomas Neumann
> Well anyone can think anything, but this view is way out of the
> mainstream. I do not know of a single large real project using a
> large complex language that does not have coding standards that
> limit the use of the language.
I know this, but I do not understand it. I have worked on reasonably
large commercial projects. Admittedly only one had more than a hundred
active developers (which in fact was the one with the loosest coding
standards). But as far as I have seen, coding standards are either in
the spirit of what I proposed (emphasizing code quality over language
features) or an over-detailed mess that is a pain in practice and
gives no reasonable justification for the imposed limitations. I would
like to prevent the latter for GCC.

But this is getting off-topic; GCC will probably limit itself to "C
with classes" anyway. Which in my opinion is a shame, but such is life.

Thomas




Re: Where is the egg?

2006-06-12 Thread Thomas Neumann
> These are pngcrushed versions with linear dimensions between 50% and 80% of
> the 200-pixel-high original.
how about using an SVG image as the master instead of a PNG? It could
be scaled without loss. I attached an SVG produced from the original
PNG.

Thomas


gcc.svg.bz2
Description: Binary data


Re: gcc-in-cxx branch created

2008-06-18 Thread Thomas Neumann
Hi,

On 2008-06-17 23:01, Ian Lance Taylor wrote:
> As I promised at the summit today, I have created the branch
> gcc-in-cxx (I originally said gcc-in-c++, but I decided that it was
> better to avoid possible meta-characters).  The goal of this branch is
> to develop a version of gcc which is compiled with C++.
I have a patch that allows building a working gcc
(--enable-languages=c only) with a C++ compiler. Are you interested in
it?
I tried to get parts of it into mainline, but as my patches were
mostly ignored I stopped tracking mainline, so it has bitrotted a bit.
If you are interested, I could try to bring it back into shape for
current mainline.

One thing that I noticed is that even new code frequently breaks C++
compatibility. People are very fond of using C++ keywords as variable
names...

Thomas




Re: genautomata.c bug

2007-05-14 Thread Thomas Neumann
>   Looks like you're right to me.  We get away with it because a decl is larger
> than a regexp.  Testing a patch now.
this would be fixed by my patch here

http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00814.html

which has not been reviewed yet. The patch switches to type-safe
memory allocation, which uncovered the bug.

Thomas



Bribing a reviewer

2007-05-25 Thread Thomas Neumann
Hi,

about two weeks ago I started submitting patches for C++
compatibility. Unfortunately reviewing has been, ahem, a bit slow.
Probably because nobody cares about C++ compatibility. As I have only
sent 4% of the total patch so far, the current acceptance rate (as in
0 patches in 2 weeks) bothers me a bit.

Therefore I am offering a deal to potential reviewers: If you promise to
review some of my patches, I will code something _you_ care about.
Within reasonable limits, of course :)

I don't expect my patches to be simply accepted (i.e. I will have to
redo the ChangeLog entries of the early ones), but some help in
getting them into gcc would be nice. And I will do something for you
first, so you basically cannot lose.

Thomas



insn_code -> tree_code in tree-vect-transform.c

2007-05-25 Thread Thomas Neumann
Hi,

as of revision 125076, tree-vect-transform.c contains the following code
in line 2010:

enum tree_code code, code1 = CODE_FOR_nothing, code2 = CODE_FOR_nothing;

This is most likely wrong: CODE_FOR_nothing is an insn_code, not a
tree_code. Unfortunately there is no obvious fix (at least not obvious
to me): the usage of code1 implies that code1 should indeed be a
tree_code. But CODE_FOR_nothing is a non-zero constant (2210), which
is larger than MAX_TREE_CODES. Should it perhaps have been ERROR_MARK
instead?
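
If so, the fix would presumably be just the obvious substitution
(untested):

enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;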

Thomas



Re: Bribing a reviewer

2007-05-28 Thread Thomas Neumann
> looking for something to review. And when posting a patch, try to make it
> easy for reviewers to tell that your patch is for their part of GCC.
I see your point. I originally thought I would be sending one patch
for the whole of gcc (as I have the complete patch ready), just broken
into smaller parts for reviewing. Therefore I called them "C++
compatibility". But I will name the mails according to their content
in the future. (And send pings with more reasonable names, too.)

> A more traditional approach would be to use the patch tracker
> <http://gcc.gnu.org/wiki/GCC_Patch_Tracking>, just in case a
> reviewer is
I tried this for a few patches, but I have some problems finding out
what area I should write there... But I will try to improve my patches.

Thanks for your suggestions.

Thomas


performance of exception handling

2020-05-11 Thread Thomas Neumann via Gcc
Hi,

I want to improve the performance of C++ exception handling, and I would
like to get some feedback on how to tackle that.

Currently, exception handling scales poorly due to global mutexes when
throwing. This can be seen with a small demo script here:
https://repl.it/repls/DeliriousPrivateProfiler
Using a thread count >1 is much slower than running single-threaded.
This global locking is particularly painful on a machine with more
than a hundred cores, where mutexes are expensive and contention
becomes much more likely due to the high degree of parallelism.
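
A rough stand-in for that demo (not the original script): several
threads repeatedly call a function that throws half the time, so they
all unwind concurrently and serialize on the global lock:

#include <stdexcept>
#include <thread>
#include <vector>

__attribute__((noinline)) static int may_throw(int i)
{
   if (i & 1)
      throw std::runtime_error("odd");
   return i;
}

static void worker(int iterations)
{
   for (int i = 0; i < iterations; i++)
      try { may_throw(i); } catch (const std::runtime_error&) {}
}

int main()
{
   std::vector<std::thread> threads;
   for (int t = 0; t < 6; t++)
      threads.emplace_back(worker, 1000000);
   for (auto& t : threads)
      t.join();
}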

Of course conventional wisdom is not to use exceptions when exceptions
can occur somewhat frequently. But I think that is a silly argument;
see the WG21 paper P0709 for a detailed discussion. In particular,
there is no technical reason why they have to be slow; it is just the
current implementation that is slow.

In the current gcc implementation on Linux the bottleneck is
_Unwind_Find_FDE, or more precisely the function dl_iterate_phdr,
which is called for every frame and iterates over all shared libraries
while holding a global lock.
That is inherently slow, both due to the global locking and due to the
data structures involved.
And it is not easy to speed that up with, e.g., a thread-local cache,
as glibc has no mechanism to notify us when a shared library is added
or removed.

We therefore need a way to locate the exception frames that is
independent of glibc. One way to achieve that would be to explicitly
register exception frames with __register_frame_info_bases in a
constructor function (and deregister them in a destructor function).
Of course probing explicitly registered frames currently uses a global
lock, too, but that implementation is provided by libgcc, and we can
change it to something better, allowing for lock-free reads.
In libgcc explicitly registered frames take precedence over the
dl_iterate_phdr mechanism, which means that we could mix future code
that does call __register_frame_info_bases explicitly with code that
does not. Code that does register will unwind faster than code that
does not, but both can coexist in one process.
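
To sketch what I mean (this is not the actual crtstuff.c code; the
frame-begin symbol and the size of the object buffer are assumptions),
the per-DSO registration could look roughly like this, here with the
simpler __register_frame_info variant:

/* Register this DSO's .eh_frame with libgcc eagerly, so unwinding
   never has to fall back to dl_iterate_phdr.  __EH_FRAME_BEGIN__ is a
   stand-in for the start of the .eh_frame section.  */

struct object;  /* opaque here; defined in libgcc's unwind-dw2-fde.h */

extern void  __register_frame_info (const void *begin, struct object *ob);
extern void *__deregister_frame_info (const void *begin);

extern const char __EH_FRAME_BEGIN__[];

/* storage for libgcc's bookkeeping; 16 pointers is comfortably larger
   than sizeof (struct object) */
static void *eh_frame_object[16];

__attribute__ ((constructor))
static void register_eh_frames (void)
{
  __register_frame_info (__EH_FRAME_BEGIN__,
                         (struct object *) eh_frame_object);
}

__attribute__ ((destructor))
static void deregister_eh_frames (void)
{
  __deregister_frame_info (__EH_FRAME_BEGIN__);
}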

Does that sound like a viable strategy to speed up exception handling? I
would be willing to contribute code for that, but I first wanted to know
if you are interested and if the strategy makes sense. Also, my
implementation makes use of atomics, which I hope are available on all
platforms that use unwind-dw2-fde.c, but I am not sure.

Thomas


Re: performance of exception handling

2020-05-11 Thread Thomas Neumann via Gcc
> Link: 
> 
> I'm not sure if your summary is correct.

I was referring to Section 3.2, where Herb says:

"We must remove all technical reasons for a C++ project to disable
exception handling (e.g., by compiler switch) or ban use of exceptions,
in all or part of their project."

In a way I am disagreeing with the paper, of course, in that I propose
to make the existing exception mechanism faster instead of inventing a
new one. But I agree with P0709 that it is unfortunate that many
projects disable exceptions due to performance concerns. And I think
the performance problems can be solved.

> My current preferred solution is something that moves the entire code
> that locates the relevant FDE table into glibc.

That is indeed an option, but I have two concerns there. First, it
will lead to code duplication, as libgcc will have to continue to
provide its own implementation on systems with "old" glibcs lacking
__dl_ehframe_find. And second, libgcc has a second lookup mechanism
for __register_frame_info_bases etc., which is needed for JITed code
anyway. And it seems attractive to handle that in the same data
structure that also covers the code from executables and shared
libraries. Of course one could move that part to glibc, too. But the
code duplication problems will persist for a long time, as gcc cannot
rely upon the system glibc being new enough to provide
__dl_ehframe_find.

Thomas


Re: performance of exception handling

2020-05-11 Thread Thomas Neumann via Gcc
> Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
> implementation in GLIBC creates its own set of advantages and
> disadvantages.

so what should I do now? Should I try to move the lookup into GLIBC?
Or handle it within libgcc, as I had originally proposed? Or give up
due to the inertia of a large, grown system?

Another concern is memory consumption. I wanted to store the FDE
entries in a b-tree, which allows for fast lookups and low-overhead
synchronization. Memory-wise that is not really worse than what we
have today (the "linear" and "erratic" arrays). But the current code
has a fallback for when it is unable to allocate these arrays, falling
back to linear search. Is something like that required? It would make
the code much more complicated (but I gathered from Moritz's mail that
some people really care about memory-constrained situations).
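
To illustrate what I mean by low-overhead synchronization, here is a
rough sketch (made-up names, not the proposed patch) of the
version-validated reads such a b-tree permits; writers make the
version odd while mutating, and readers retry instead of taking a
mutex:

#include <stdatomic.h>
#include <stdint.h>

enum { FANOUT = 15 };

struct btree_node
{
  _Atomic uint64_t version;          /* odd while a writer is active */
  _Atomic uintptr_t keys[FANOUT];    /* atomic: read while writers run */
  _Atomic uintptr_t values[FANOUT];
  _Atomic unsigned count;
};

static uintptr_t
node_lookup (struct btree_node *n, uintptr_t key)
{
  for (;;)
    {
      uint64_t v = atomic_load_explicit (&n->version,
                                         memory_order_acquire);
      if (v & 1)
        continue;                    /* writer active, retry */

      uintptr_t result = 0;
      unsigned count = atomic_load_explicit (&n->count,
                                             memory_order_relaxed);
      for (unsigned i = 0; i < count && i < FANOUT; i++)
        if (atomic_load_explicit (&n->keys[i], memory_order_relaxed)
            == key)
          result = atomic_load_explicit (&n->values[i],
                                         memory_order_relaxed);

      /* validate: if the version is unchanged we saw a consistent
         snapshot of the node; otherwise a writer interfered, retry */
      atomic_thread_fence (memory_order_acquire);
      if (atomic_load_explicit (&n->version, memory_order_relaxed) == v)
        return result;
    }
}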

Thomas


Re: performance of exception handling

2020-05-12 Thread Thomas Neumann via Gcc
> Some people use exceptions to propagate "low memory" up which
> made me increase the size of the EH emergency pool (which is
> used when malloc cannot even allocate the EH data itself) ...
> 
> So yes, people care.  There absolutely has to be a path in
> unwinding that allocates no (as little as possible) memory.

note that I would not allocate at all in the unwinding path. I would
allocate memory when new frames are registered, but unwinding would be
without any allocations.

Of course there is a trade-off here. We could delay allocating the
lookup structures until the first exception occurs, in order to speed up
programs that never throw any exceptions. But that would effectively
force us to implement a "no memory" fallback, for exactly the reason you
gave, as something like bad_alloc might be the first exception that we
encounter.

Thomas


Re: performance of exception handling

2020-05-12 Thread Thomas Neumann via Gcc
> Just echoing what David said really, but: if the libgcc changes
> are expected to be portable beyond glibc, then the existence of
> an alternative option for glibc shouldn't block the libgcc changes.
> The two approaches aren't be mutually exclusive and each approach
> would achieve something that the other one wouldn't.

to make this discussion a bit less abstract I have implemented a
prototype: https://pastebin.com/KtrPhci2
It is not perfect yet (for example, frame de-registration is
suboptimal), but it allows us to talk about an actual implementation
with real performance numbers.

To give some numbers I take my silly example from
https://repl.it/repls/DeliriousPrivateProfiler
with 6 * 1,000,000 function calls, where half of the functions throw,
and I execute it either single-threaded or multi-threaded (with 6
threads) on an i7-6800K. Note that the effects are even more dramatic
on larger machines.
The "old" implementation is gcc 9.3; the "new" implementation is gcc
git with the patch linked above. (Note that you have to both use the
patched gcc and use LD_LIBRARY_PATH or similar to force the new libgcc
when repeating the experiment.)

The execution times are:

old approach, single threaded: 4.3s
old approach, multi threaded: 6.5s
new approach, single threaded: 3.9s
new approach, multi threaded: 0.7s

This is faster even when single-threaded, and it is dramatically
faster when using multiple threads. On machines where atomics are
supported, raising an exception no longer takes a global mutex (except
for the first exception after new exception frames were added), and
thus exception processing scales nicely with the thread count. The
code also handles the out-of-memory condition, falling back to linear
search in that case (just like the old code).

Of course this needs more polishing and testing, but would something
like this be acceptable for gcc? It makes exceptions much more useful in
multi-threaded applications.

Thomas