Re: gcc behavior on memory exhaustion

2017-08-10 Thread Richard Earnshaw (lists)
On 09/08/17 14:05, Andrew Roberts wrote:
> I routinely build the weekly snapshots and RCs on x64, arm and aarch64.
> 
> The last gcc 8 snapshot and the two recent 7.2 RCs have failed to build
> on aarch64 (Raspberry Pi 3, running Arch Linux ARM). I have finally
> traced this to the system running out of memory. I guess a recent kernel
> update had changed the memory page size and the swap file was no longer
> being used because the page sizes didn't match.
> 
> Obviously this is my issue, but the errors I was getting from gcc did
> not help. I was getting ICEs, thus:
> 
> /usr/local/gcc/bin/g++ -Wall -Wextra -Wno-ignored-qualifiers
> -Wno-sign-compare -Wno-write-strings -std=c++14 -pipe -march=armv8-a
> -mcpu=cortex-a53 -mtune=cortex-a53 -ftree-vectorize -O3
> -DUNAME_S=\"linux\" -DUNAME_M=\"aarch64\" -DOSMESA=1 -I../libs/include
> -DRASPBERRY_PI -I/usr/include/freetype2 -I/usr/include/harfbuzz
> -I/usr/include/unicode   -c -o glerr.o glerr.cpp
> {standard input}: Assembler messages:
> {standard input}: Warning: end of file not at end of a line; newline
> inserted
> {standard input}:204: Error: operand 1 must be an integer register -- `mov'
> {standard input}: Error: open CFI at the end of file; missing
> .cfi_endproc directive
> g++: internal compiler error: Killed (program cc1plus)
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
> make: *** [: glerr.o] Error 4
> make: *** Waiting for unfinished jobs
> 
> I was seeing the problem when building with make -j2, both when building
> gcc and when building large user projects.
> 
> There are two issues here:
> 
> 1) There was discussion about increasing the amount of memory gcc would
> reserve to help speed up compilation of large source files; I wondered
> if this could be a factor.
> 
> 2) It would be nice to see some sort of out of memory error, rather than
> just an ICE.
> 
> The system has 858 MB of RAM without the swap file.
> 
> Building a single source file seems to use up to 97% of the available
> memory (for a 2522 line C++ source file).
> 
> make -j2 is enough to cause the failure.
> 
> Regards
> 
> Andrew Roberts

If you think gcc is using an unreasonable amount of memory for a
particular bit of code then please file a bug report, with pre-processed
source code (don't assume that because the sources are part of gcc we
can reproduce your setup).  You should also set the keyword "memory-hog"
on the report.  If you have statistics for older versions of the
compiler, or for other targets, that will add evidence for us to look at.

R.


Re: gcc behavior on memory exhaustion

2017-08-10 Thread Yuri Gribov
On Wed, Aug 9, 2017 at 3:14 PM, Andreas Schwab  wrote:
> On Aug 09 2017, Yuri Gribov  wrote:
>
>> On Wed, Aug 9, 2017 at 2:49 PM, Andrew Haley  wrote:
>>> On 09/08/17 14:05, Andrew Roberts wrote:
>>>> 2) It would be nice to see some sort of out of memory error, rather than
>>>> just an ICE.
>>>
>>> There's nothing we can do: the kernel killed us.  We can't emit any
>>> message before we die.  (killed) tells you that we were killed, but
>>> we don't know who done it.
>>
>> Well, driver could check syslog...
>
> The syslog is very system dependent and may not even be readable by
> unprivileged users.

It's best-effort of course.

> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: gcc behavior on memory exhaustion

2017-08-10 Thread Florian Weimer
* Andrew Haley:

> On 09/08/17 14:05, Andrew Roberts wrote:
>> 2) It would be nice to see some sort of out of memory error, rather than 
>> just an ICE.
>
> There's nothing we can do: the kernel killed us.  We can't emit any
> message before we die.  (killed) tells you that we were killed, but
> we don't know who done it.

The driver already prints a message.

The siginfo_t information should indicate that the signal originated
from the kernel.  It seems that for SIGKILL, there are currently three
causes in the kernel: the OOM killer, some apparently unreachable code
in ptrace, and something cgroups-related.  The latter would likely
take down the driver process, too, so a kernel-originated SIGKILL
strongly points to the OOM killer.

But the kernel could definitely do better and set a flag for SIGKILL.


Re: gcc behavior on memory exhaustion

2017-08-10 Thread Pedro Alves
On 08/10/2017 10:22 PM, Florian Weimer wrote:
> * Andrew Haley:
> 
>> On 09/08/17 14:05, Andrew Roberts wrote:
>>> 2) It would be nice to see some sort of out of memory error, rather than 
>>> just an ICE.
>>
>> There's nothing we can do: the kernel killed us.  We can't emit any
>> message before we die.  (killed) tells you that we were killed, but
>> we don't know who done it.
> 
> The driver already prints a message.
> 
> The siginfo_t information should indicate that the signal originated
> from the kernel.  

OOC, where?  While a parent process can use "waitid" to get
a siginfo_t with information about the child exit, that siginfo_t
is not the same siginfo_t a signal handler would get as
argument if you could catch/intercept SIGKILL, which you can't
on Linux.  I.e., checking for e.g., si_code == SI_KERNEL in
the siginfo filled in by waitid won't work, because that
siginfo_t has si_code values for SIGCHLD [CLD_EXITED/CLD_KILLED/etc.],
not for the signal that actually killed the process.

Doesn't seem to give you any more useful information beyond
what you can already get using waitpid (which is what libiberty's
pex code in question uses) and WIFSIGNALED/WTERMSIG.
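
To make the point concrete, here is a minimal sketch of what a parent gets
back from waitid for a killed child. The helper kill_child_and_waitid is
hypothetical and exists only for this illustration; it is not part of GCC
or libiberty.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that only sleeps, send it `sig`, then
// reap it with waitid() and return the result through `info`.
// Returns 0 on success, -1 on failure.
int kill_child_and_waitid(int sig, siginfo_t &info)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();   // child: block until a signal terminates us
    }
    if (kill(pid, sig) != 0)
        return -1;
    // The siginfo_t filled in here describes the SIGCHLD event for the
    // parent, not the signal that was delivered to the child.
    return waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED);
}
```

After calling this with SIGKILL, info.si_signo is SIGCHLD, info.si_code is
CLD_KILLED and info.si_status is SIGKILL. That is the same information
WIFSIGNALED/WTERMSIG provide, and it carries no indication of who sent the
signal.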

> It seems that for SIGKILL, there are currently three
> causes in the kernel: the OOM killer, some apparently unreachable code
> in ptrace, and something cgroups-related.  The latter would likely
> take down the driver process, too, so a kernel-originated SIGKILL
> strongly points to the OOM killer.
> 
> But the kernel could definitely do better and set a flag for SIGKILL.

Meanwhile, maybe just having the driver check for SIGKILL and
enumerate likely causes would be better than the status quo.

Pedro Alves
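
A rough sketch of such a check on a waitpid status follows. The function name
sigkill_hint and the wording of the hint are made up for illustration; the
real driver goes through libiberty's pex code, which is not shown here.

```cpp
#include <signal.h>
#include <sys/wait.h>

// Hypothetical helper: given a wait status as returned by waitpid(), return
// a hint to print when the child was terminated by SIGKILL, or nullptr
// for any other outcome.
const char *sigkill_hint(int status)
{
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return "killed by SIGKILL; possible causes include the kernel's "
               "out-of-memory killer or an explicit kill -9";
    return nullptr;
}
```

The hint only enumerates possibilities. As discussed above, the wait status
alone cannot say which of them actually applied.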


Re: gcc behavior on memory exhaustion

2017-08-10 Thread Andrew Roberts

On 11/08/17 02:09, Pedro Alves wrote:


> Meanwhile, maybe just having the driver check for SIGKILL and
> enumerate likely causes would be better than the status quo.
>
> Pedro Alves

I agree, having some indication it MIGHT be out of memory would stop 
people wasting a lot of time, and avoid spurious bug reports.


Meanwhile, I'm testing memory usage and compile times with my code on 
gcc 5.4.0, 6.4.0, 7.2.0 and 8.0.0, across x64, arm and aarch64.


I'll file a bug report on the memory-hog issue after I've narrowed down 
the issue. There certainly does seem to be far worse performance on 
aarch64 vs arm.


Here's a flavour, for a single C++ file:

aarch64

GCC     Compile Time   Memory Used
5.4.0   4:18.24        357328
6.4.0   2:57.14        730020
7.2.0   3:03.38        735748
8.0.0   3:29.16        837316

arm

GCC     Compile Time   Memory Used
5.4.0   3:39.20        247696
6.4.0   2:15.41        287904
7.2.0   2:22.85        294324
8.0.0   2:41.79        306032

So there has been a massive blow up in memory usage on aarch64 vs arm.

While compile times for a single file are better (and it's not yet 
pagefaulting), for multiple files I'm getting 400+ pagefaults while 
building the entire project, vs none for arm.


If anybody has some suggestions for things I should test I'll give it a 
go. I'll also play with:


--param ggc-min-expand= --param ggc-min-heapsize=

Is there a way of getting a list of individual optimizations enabled by -O3, so 
I can try removing individual ones?

BTW the code is very simple and dumb. It's an automatically generated file which 
just populates a std::map using 2330 inserts.
I'm sure there are better ways of doing this, but never discount how dumb your 
users are...
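
One possible alternative to thousands of individual insert statements,
sketched under assumptions: keep the generated data in a constant table and
fill the map in a loop. The Entry type, the table contents and the key/value
types below are placeholders, since the real generated file is not shown in
this thread. Whether this lowers the compiler's memory use for this file has
not been measured.

```cpp
#include <map>
#include <string>

// Placeholder entry type; the real key and value types are assumptions.
struct Entry {
    int code;
    const char *name;
};

// Placeholder data standing in for the generated table.
static const Entry kEntries[] = {
    {1, "one"},
    {2, "two"},
    {3, "three"},
};

// Build the map with a single loop over the table instead of one insert
// statement per entry.
std::map<int, std::string> build_map()
{
    std::map<int, std::string> m;
    for (const Entry &e : kEntries)
        m.emplace(e.code, e.name);
    return m;
}
```

The generator would then only need to emit the rows of kEntries, and the
function body stays the same size regardless of how many entries there are.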

Thanks

Andrew



Re: gcc behavior on memory exhaustion

2017-08-10 Thread Florian Weimer
* Pedro Alves:

>> The siginfo_t information should indicate that the signal originated
>> from the kernel.  
>
> OOC, where?  While a parent process can use "waitid" to get
> a siginfo_t with information about the child exit, that siginfo_t
> is not the same siginfo_t a signal handler would get as
> argument if you could catch/intercept SIGKILL, which you can't
> on Linux.  I.e., checking for e.g., si_code == SI_KERNEL in
> the siginfo filled in by waitid won't work, because that
> siginfo_t has si_code values for SIGCHLD [CLD_EXITED/CLD_KILLED/etc.],
> not for the signal that actually killed the process.

Oh, right.  Maybe the UID field has a special value.  But it's
probably zero, so you can't tell whether the system administrator or
the kernel has killed the process.

> Meanwhile, maybe just having the driver check for SIGKILL and
> enumerate likely causes would be better than the status quo.

Indeed.