Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Jiri Slaby
On 11/04/2009 07:44 AM, Justin P. Mattock wrote:
> as for compiling: libc compiled fine, kernel fine,
> and every package on the clfs list up to boot up the fresh system.

It might be pretty c++ only, I think.


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread KOSAKI Motohiro
> On Mon, 2 Nov 2009 13:29:29 -0800 Justin Mattock wrote:
> 
> > Hello,
> > I'm not sure how to handle this,
> > while compiling firefox-3.6b1.source
> > I get this with the default compiling options,
> > as well as custom:
> > 
> > ...
> >
> > active_anon:2360492kB inactive_anon:590196kB active_file:84kB
> 
> 2.8GB of anonymous memory
> 
> > [  532.942508] Free swap  = 0kB
> > [  532.942510] Total swap = 431632kB
> 
> 430MB of swap, all used up.
> 
> That's a genuine OOM.  Something (presumably cc1plus) has consumed
> way too much memory, quite possibly leaked it.
> 
> It would help if the oom-killer were to print some information about
> the oom-killed process's memory footprint.
> 


How about this?


Subject: [PATCH] oom: show vsz and rss information of the killed process

In a typical OOM analysis scenario, the first thing we want to know is
whether the killed process has a memory leak.
This patch adds vsz and rss information to the oom log to help with that
analysis. It saves a lot of debugging time.

example:
===
rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Pid: 1308, comm: rsyslogd Not tainted 2.6.32-rc6 #24
Call Trace:
[] ?_spin_unlock+0x2b/0x40
[] oom_kill_process+0xbe/0x2b0

(snip)

492283 pages non-shared
Out of memory: kill process 2341 (memhog) score 527276 or a child
Killed process 2341 (memhog) vsz:1054552kB, anon-rss:970588kB, file-rss:4kB
===
 ^
 |
here

Signed-off-by: KOSAKI Motohiro 
---
 mm/oom_kill.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ea2147d..498e6f6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -337,6 +337,8 @@ static void dump_tasks(const struct mem_cgroup *mem)
 	} while_each_thread(g, p);
 }
 
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
 /*
  * Send SIGKILL to the selected  process irrespective of  CAP_SYS_RAW_IO
  * flag though it's unlikely that  we select a process with CAP_SYS_RAW_IO
@@ -356,9 +358,16 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
 		return;
 	}
 
-	if (verbose)
-		printk(KERN_ERR "Killed process %d (%s)\n",
-				task_pid_nr(p), p->comm);
+	if (verbose) {
+		task_lock(p);
+		printk(KERN_ERR "Killed process %d (%s) "
+		       "vsz:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
+		       task_pid_nr(p), p->comm,
+		       K(p->mm->total_vm),
+		       K(get_mm_counter(p->mm, anon_rss)),
+		       K(get_mm_counter(p->mm, file_rss)));
+		task_unlock(p);
+	}
 
 	/*
 	 * We give our sacrificial lamb high priority and access to
-- 
1.6.2.5
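
[Editorial illustration, not part of the posted patch.] The K() macro in
the patch converts a page count into kB by shifting. A minimal userspace
sketch of that conversion follows. It assumes 4 kB pages (PAGE_SHIFT == 12,
as on x86), and pages_to_kb() is a hypothetical helper name that does not
exist in mm/oom_kill.c.

```c
/* Assumption: 4 kB pages, i.e. PAGE_SHIFT == 12 as on x86. */
#define PAGE_SHIFT 12

/* Same shape as the macro added by the patch: pages -> kB. */
#define K(x) ((x) << (PAGE_SHIFT - 10))

/* Hypothetical helper wrapping the macro so it can be called directly. */
unsigned long pages_to_kb(unsigned long pages)
{
	return K(pages);
}
```

Under these assumptions the vsz:1054552kB shown in the example log would
correspond to a total_vm of 263638 pages.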





Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Dave Korn
Andrew Morton wrote:
> On Mon, 2 Nov 2009 13:29:29 -0800 Justin Mattock wrote:
> 
>> Hello,
>> I'm not sure how to handle this,
>> while compiling firefox-3.6b1.source
>> I get this with the default compiling options,
>> as well as custom:
>>
>> ...
>>
>> active_anon:2360492kB inactive_anon:590196kB active_file:84kB
> 
> 2.8GB of anonymous memory
> 
>> [  532.942508] Free swap  = 0kB
>> [  532.942510] Total swap = 431632kB
> 
> 430MB of swap, all used up.
> 
> That's a genuine OOM.  Something (presumably cc1plus) has consumed
> way too much memory, quite possibly leaked it.
> 
> It would help if the oom-killer were to print some information about
> the oom-killed process's memory footprint.

  I would think that the quickest way to proceed would be to re-run the
failing compile command under gdb at the command-line and see what it's doing
when the oom killer signals it, wouldn't it?  Or turn up the swap until it
doesn't get killed and see what info can be gleaned from the cc1(plus?)
-fmem-report output.

cheers,
  DaveK



Re: How to support 40bit GP register

2009-11-04 Thread Mohamed Shafi
2009/10/22 Richard Henderson :
> On 10/21/2009 07:25 AM, Mohamed Shafi wrote:
>>
>> For accessing a->b GCC generates the following code:
>>
>>        move.l  (sp-16), d3
>>        lsrr.l  #<16, d3
>>        move.l  (sp-12),d2
>>        asll    #<16,d2
>>        or      d3,d2
>>        cmpeq.w #<2,d2
>>        jf      _L2
>>
>> Because data registers are 40 bit for 'asll' operation the shift count
>> should be 16+8 or there should be sign extension from 32bit to 40 bits
>> after the 'or' operation. The target has instruction to sign extend
>> from 32bit to 40 bit.
>>
>> Similarly there are other operation that requires sign/zero extension.
>> So is there any way to tell GCC that the data registers are 40bit and
>> there by expect it to generate sign/zero extension accordingly ?
>
> Define a machine mode for your 40-bit type in cpu-modes.def.  Depending on
> how your 40-bit type is stored in memory, you'll use either
>
>  INT_MODE (RI, 5)                // load-store uses exactly 5 bytes
>  FRACTIONAL_INT_MODE (RI, 40, 8) // load-store uses 8 bytes
>
Richard thanks for the reply.

Load-store uses 32bits. Sign extension happens automatically. So I
have chosen INT_MODE (RI, 5) and copied movsi and renamed it to
movri. I have also specified that RImode needs only one register.

> Where I've arbitrarily chosen "RImode" as a mnemonic for Register Integral
> Mode.  Now you define arithmetic operations, as needed, on
> RImode.  You define the "extendsiri" pattern to be that sign-extend from
> 32-to-40-bit instruction.  You define your comparison patterns on RImode,
> and not on SImode, since your comparison instruction works on the entire 40
> bits.

I have defined extendsiri and cbranchri4 patterns. When I compile a
program like

unsigned long xh = 1;
int main ()
{
unsigned long yh = 0xull;
unsigned long z = xh * yh;

 if (z != yh)
   abort ();

return 0;
}

I get the following ICE

internal compiler error: in immed_double_const, at emit-rtl.c:553

This happens when cse_insn () calls insert () -> gen_lowpart ->
gen_lowpart_common -> simplify_gen_subreg -> simplify_immed_subreg.
simplify_immed_subreg is called with the parameters (outermode=RImode,
(const_int 65535), innermode=DImode, byte=0)

cse_insn is called for the following insn

(insn 10 9 11 3 bug7.c:14 (set (reg:RI 67)
(const_int 65535 [0xffff])) 4 {movri} (nil))


How can I overcome this?

Regards,
Shafi

>
> You'll wind up with a selection of patterns in your machine description that
> have a sign-extension pattern built in, depending on the exact behaviour of
> your ISA.  There are plenty of examples on x86_64, mips64, and Alpha (to
> name a few) that have similar properties with SI and DImodes.  Examine the
> -fdump-rtl-combine-details dump for exemplars of the canonical forms that
> the combiner creates when it tries to merge sign-extension instructions into
> preceding patterns.
>


Re: IRA is not looking into the predicates ?

2009-11-04 Thread Mohamed Shafi
2009/10/30 Jeff Law :
> On 10/30/09 07:13, Mohamed Shafi wrote:
>>
>> Hi,
>>
>> I am doing a port for a 32bit target in GCC 4.4.0. The target does not
>> have support for symbolic address in QImode for load operations.
>
> You'll need to make sure to reject such addresses for QImode in
> GO_IF_LEGITIMATE_ADDRESS.
>
>
>>  In
>> order to do this, what I have done is, in the define_expand for moveqi,
>> reject symbolic addresses if they come in source operands, and I have
>> also written a predicate for *moveqi_internal to reject such cases.
>>
>
> OK.  Nothing wrong with these steps.  Though you really need to make sure
> GO_IF_LEGITIMATE_ADDRESS is defined correctly.
>
> IRA doesn't look at operand predicates or insn conditions.  It assumes that
> any insns are valid assuming any pseudo registers appearing in the insn get
> suitable hard registers.
>
> Based on the dumps you provided it appears that reg61 does not get a hard
> register and reload is generating the problematical insn #24.  This is a
> good indication that your GO_IF_LEGITIMATE_ADDRESS is incorrectly
> implemented.
>
   In the GO_IF_LEGITIMATE_ADDRESS macro I am allowing this
address because the target supports symbolic addresses in QImode for
store operations. And in the macro GO_IF_LEGITIMATE_ADDRESS there is
no option to check whether the address is used in a load or a store.
That's why, in the define_expand for moveqi, I reject symbolic addresses
if they come in source operands, and use a predicate for *moveqi_internal
to reject such cases. But I am still getting the ICE.  IIRC control does
not reach TARGET_SECONDARY_RELOAD either. How can I overcome this?

Regards,
Shafi


Re: IRA is not looking into the predicates ?

2009-11-04 Thread Mohamed Shafi
2009/10/30 Ian Lance Taylor :
> Mohamed Shafi  writes:
>
>> From ice4.c.168r.asmcons
>>
>> (insn 5 2 6 2 ice4.c:4 (set (reg:SI 61 [ s ])
>>         (mem/c/i:SI (symbol_ref:SI ("s") [flags 0x2] <var_decl 0xb7bfd000 s>) [0 s+0 S4 A32])) 2 {*movsi_internal} (nil))
>>
>> (insn 6 5 7 2 ice4.c:4 (set (reg:QI 62)
>>         (plus:QI (subreg:QI (reg:SI 61 [ s ]) 0)
>>             (const_int -100 [0xff9c]))) 16 {addqi3}
>> (expr_list:REG_DEAD (reg:SI 61 [ s ])
>>         (nil)))
>>
>> How can i prevent this ICE ?
>
> If asmcons is the first place that this appears, then I think it must
> be coming from some asm statement.  So the first step would be to look
> at the asm statement and see if it can be rewritten using a different
> constraint.
>
   No, this appears from RTL expand onwards.

Shafi


Re: How to support 40bit GP register

2009-11-04 Thread Dave Korn
Mohamed Shafi wrote:

> Load-store uses 32bits. Sign extension happens automatically. So I
> have chosen INT_MODE (RI, 5) and copied movsi and renamed it to
> movri. I have also specified that RImode needs only one register.

> I get the following ICE
> 
> internal compiler error: in immed_double_const, at emit-rtl.c:553
> 
> This happens when cse_insn () calls insert () -> gen_lowpart ->
> gen_lowpart_common -> simplify_gen_subreg -> simplify_immed_subreg.
> simplify_immed_subreg is called with the parameters (outermode=RImode,
> (const_int 65535), innermode=DImode, byte=0)
> 
> cse_insn is called for the following insn
> 
> (insn 10 9 11 3 bug7.c:14 (set (reg:RI 67)
> (const_int 65535 [0xffff])) 4 {movri} (nil))
> 
> 
> How can I overcome this?

  Just from reading the source for immed_double_const, I see:

>   /* There are the following cases (note that there are no modes with
>      HOST_BITS_PER_WIDE_INT < GET_MODE_BITSIZE (mode) < 2 * HOST_BITS_PER_WIDE_INT):

  Oops.  That's no longer true if HBPWI == 32 and your new mode has 40 bits.

>   gcc_assert (GET_MODE_BITSIZE (mode) == 2 * HOST_BITS_PER_WIDE_INT);

  I would guess that assert is firing.

>   /* If this integer fits in one word, return a CONST_INT.  */
>   if ((i1 == 0 && i0 >= 0) || (i1 == ~0 && i0 < 0))
>     return GEN_INT (i0);

  Here you'll want to mask out and check only the low 8 (== 40 - 32, i.e.
GET_MODE_BITSIZE(mode) - HOST_BITS_PER_WIDE_INT) bits of i1, I think.  The
rest of the code looks like it should work.

cheers,
  DaveK
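
[Editorial illustration, not GCC source.] The masking idea suggested above
can be modelled in plain C. This is only a sketch under stated assumptions:
HOST_WIDE_INT is taken to be 32 bits, the mode is 40 bits wide, and
fits_in_one_word(), HWI_BITS and MODE_BITS are hypothetical names chosen
for the example. Only the low 8 bits of the high word are compared, so bits
above bit 39 are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this model: 32-bit host wide int, 40-bit mode. */
#define HWI_BITS  32
#define MODE_BITS 40

/*
 * i0 is the low word and i1 the high word of a double-word constant.
 * Return true when the MODE_BITS-wide value is just the sign extension
 * of i0, i.e. when it could be represented by a single-word constant.
 */
bool fits_in_one_word(int32_t i0, int32_t i1)
{
	/* Mask covering the bits of i1 that belong to the mode: 0xff here. */
	uint32_t mask = (UINT32_C(1) << (MODE_BITS - HWI_BITS)) - 1;
	uint32_t hi = (uint32_t)i1 & mask;

	return (hi == 0 && i0 >= 0) || (hi == mask && i0 < 0);
}
```

With this model, the constant 65535 with a zero high word is accepted as a
one-word value, while a non-negative low word paired with a non-zero low
byte in the high word is not.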



Re: IRA is not looking into the predicates ?

2009-11-04 Thread Jeff Law



> In the GO_IF_LEGITIMATE_ADDRESS macro I am allowing this
> address because the target supports symbolic addresses in QImode for
> store operations.

If your target cannot use a symbolic address in a QImode load, then
GO_IF_LEGITIMATE_ADDRESS must reject symbolic addresses in QImode.  It's
that simple.


jeff
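
[Editorial illustration, not GCC source.] The rule stated above can be
modelled as a small predicate. This is a sketch only: the enums and
legitimate_address_p() are hypothetical stand-ins for GCC's machine modes,
address forms and the GO_IF_LEGITIMATE_ADDRESS check. The point it shows is
that the check sees only the mode and the address, not whether the access
is a load or a store, so an address form that is invalid for loads has to
be rejected for that mode altogether.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes and address forms. */
enum mode { QI_MODE, HI_MODE, SI_MODE };
enum addr_kind { ADDR_REG, ADDR_REG_PLUS_OFFSET, ADDR_SYMBOLIC };

/*
 * Model of an address legitimacy check.  It has no load/store
 * parameter, mirroring the limitation discussed in the thread, so
 * symbolic addresses are rejected for every QImode access.
 */
bool legitimate_address_p(enum mode mode, enum addr_kind kind)
{
	if (kind == ADDR_SYMBOLIC && mode == QI_MODE)
		return false;
	return true;
}
```

In this model a QImode store to a symbolic address would then have to be
formed some other way, for example by first loading the address into a
register.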




Re: PATCH: Support --enable-gold=both --with-linker=[bfd|gold]

2009-11-04 Thread H.J. Lu
On Tue, Nov 3, 2009 at 9:09 PM, Roland McGrath  wrote:
> I can't really tell how that's different from the patch I posted.
> It looks fine to me.
>

The difference is you can set the default linker.


-- 
H.J.


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Justin P. Mattock

Dave Korn wrote:
> Andrew Morton wrote:
>> On Mon, 2 Nov 2009 13:29:29 -0800 Justin Mattock wrote:
>>
>>> Hello,
>>> I'm not sure how to handle this,
>>> while compiling firefox-3.6b1.source
>>> I get this with the default compiling options,
>>> as well as custom:
>>>
>>> ...
>>>
>>> active_anon:2360492kB inactive_anon:590196kB active_file:84kB
>>
>> 2.8GB of anonymous memory
>>
>>> [  532.942508] Free swap  = 0kB
>>> [  532.942510] Total swap = 431632kB
>>
>> 430MB of swap, all used up.
>>
>> That's a genuine OOM.  Something (presumably cc1plus) has consumed
>> way too much memory, quite possibly leaked it.
>>
>> It would help if the oom-killer were to print some information about
>> the oom-killed process's memory footprint.
>
>   I would think that the quickest way to proceed would be to re-run the
> failing compile command under gdb at the command-line and see what it's doing
> when the oom killer signals it, wouldn't it?  Or turn up the swap until it
> doesn't get killed and see what info can be gleaned from the cc1(plus?)
> -fmem-report output.
>
>     cheers,
>       DaveK

I can try, only issue I have is I don't
use a distro, so building anything requires me
to hand compile it (hopefully not difficult for gdb).

So give me some time on this and I'll see if I can get this up
and running, and add that patch to kernel then go from there.

Justin P. Mattock


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Andrew Morton
On Wed,  4 Nov 2009 18:32:16 +0900 (JST) KOSAKI Motohiro wrote:

> > It would help if the oom-killer were to print some information about
> > the oom-killed process's memory footprint.
> > 
> 
> 
> How about this?

looks good, thanks.

> 
> Subject: [PATCH] oom: show vsz and rss information of the killed process
> 
> In a typical OOM analysis scenario, the first thing we want to know is
> whether the killed process has a memory leak.
> This patch adds vsz and rss information to the oom log to help with that
> analysis. It saves a lot of debugging time.
> 
> example:
> ===
> rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
> Pid: 1308, comm: rsyslogd Not tainted 2.6.32-rc6 #24
> Call Trace:
> [] ?_spin_unlock+0x2b/0x40
> [] oom_kill_process+0xbe/0x2b0
> 
> (snip)
> 
> 492283 pages non-shared
> Out of memory: kill process 2341 (memhog) score 527276 or a child
> Killed process 2341 (memhog) vsz:1054552kB, anon-rss:970588kB, file-rss:4kB
> ===
>  ^
>  |
> here
> ...
>
> + if (verbose) {
> + task_lock(p);

We need to be careful with which locks we take on the oom-killer path,
because it can be called by code which already holds locks.  But I
expect task_lock() will be OK.

> + printk(KERN_ERR "Killed process %d (%s) "
> +"vsz:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> +task_pid_nr(p), p->comm,
> +K(p->mm->total_vm),
> +K(get_mm_counter(p->mm, anon_rss)),
> +K(get_mm_counter(p->mm, file_rss)));
> + task_unlock(p);
> + }



Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Dave Korn
Justin P. Mattock wrote:

> I can try, only issue I have is I don't
> use a distro, so building anything requires me
> to hand compile it

  Oh, ouch!

> (hopefully not difficult for gdb).

  Indeed, hopefully not.

> So give me some time on this and I'll see if I can get this up
> and running, and add that patch to kernel then go from there.

  The one thing you can still try straight away for minimal effort is the
-fmem-report option, but it's also the least informative...

cheers,
  DaveK



Re: Preserving the argument spills for GDB

2009-11-04 Thread Jean Christophe Beyler
> You can force your writes to the stack to not be removed by making
> them use UNSPEC_VOLATILE.  You would write special define_insns for
> this.

Is there an architecture port that has done this already ?

> Not to miss the obvious, note that this will hurt optimization.
> However, if you need to have the argument values available for all
> backtraces, then I'm not sure what else to recommend.  In general gcc
> will discard argument values that are not needed.

I know. Personally, I have not been advocating this, but for the
moment we have been studying what would be needed and how
bad it would be.

However, I've been going through the first step : running GDB, setting
a break-point and doing a continue to see what I get and try to get
the information right for O3 too.

In O0, I get:
Breakpoint @@ 1, foo (a=4, b=3, c=2, d=1) at hello.c:10

In O3, I get:
Breakpoint @@ 1, foo (a=Variable "a" is not available.) at hello.c:11

Now, I've been able to tell GCC to save those arguments exactly at the
same address in O3 as it does in O0 (I hacked the varargs saving
arguments code so that it would do the same thing for all functions).

It seems that, in the O0 case, the Dwarf information is automatically
propagated to say "The input register is now here", but when I do it
in O3, I'm issuing the information in the same way.

What am I exactly missing? Any ideas why GDB would not have enough
information in this case?

Thanks,
Jean Christophe Beyler


Re: Preserving the argument spills for GDB

2009-11-04 Thread Nathan Froyd
On Wed, Nov 04, 2009 at 11:24:34AM -0500, Jean Christophe Beyler wrote:
> However, I've been going through the first step : running GDB, setting
> a break-point and doing a continue to see what I get and try to get
> the information right for O3 too.
> 
> In O0, I get:
> Breakpoint @@ 1, foo (a=4, b=3, c=2, d=1) at hello.c:10
> 
> In O3, I get:
> Breakpoint @@ 1, foo (a=Variable "a" is not available.) at hello.c:11
> 
> It seems that, in the O0 case, the Dwarf information is automatically
> propagated to say "The input register is now here", but when I do it
> in O3, I'm issuing the information in the same way.
> 
> What am I exactly missing? Any ideas why GDB would not have enough
> information in this case?

You should look at the DWARF information (readelf -wi) and see if the
function parameters have DW_AT_location attributes.  If they don't, then
you need to ensure that they get generated.  If they do, then perhaps
they are wrong or GDB is not interpreting them correctly.  (They get
generated with optimization and interpreted correctly on other platforms
that pass args in registers.)

-Nathan


Re: How to support 40bit GP register

2009-11-04 Thread Richard Henderson

On 11/04/2009 05:34 AM, Mohamed Shafi wrote:
> Load-store uses 32bits. Sign extension happens automatically. So I
> have chosen INT_MODE (RI, 5) and copied movsi and renamed it to
> movri. I have also specified that RImode needs only one register.

This isn't going to work.  In order to get correct code, you're going to
need to be able to spill and reload the full 40-bit value.

If you can't do this easily... then I'm afraid we'll have to find a more
complicated solution which involves only exposing RImode values after
register allocation.

> internal compiler error: in immed_double_const, at emit-rtl.c:553

Hmm.  This is a nasty little logic error.  The quickest work-around for
the problem is to set need_64bit_hwint in config.gcc.



r~


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Justin Mattock
On Wed, Nov 4, 2009 at 7:45 AM, Dave Korn wrote:
> Justin P. Mattock wrote:
>
>> I can try, only issue I have is I don't
>> use a distro, so building anything requires me
>> to hand compile it
>
>  Oh, ouch!
>

I know.. I'm a horror for optimization

>> (hopefully not difficult for gdb).
>
>  Indeed, hopefully not.
>

you never know, some packages big/small turn into brain surgery
just to get going. (I'll try after I do some morning exercises)

>> So give me some time on this and I'll see if I can get this up
>> and running, and add that patch to kernel then go from there.
>
>  The one thing you can still try straight away for minimal effort is the
> -fmem-report option, but it's also the least informative...
>
>    cheers,
>      DaveK
>
>

O.k. here is the info from dmesg (with the patch added)
and the -fmem-report output:


[  205.931940] kjournald starting.  Commit interval 5 seconds
[  205.931957] EXT3-fs warning: maximal mount count reached, running
e2fsck is recommended
[  205.935509] EXT3 FS on sdb1, internal journal
[  205.935513] EXT3-fs: mounted filesystem with writeback data mode.
[  205.956396] SELinux: initialized (dev sdb1, type ext3), uses xattr
[  434.205304] __ratelimit: 75 callbacks suppressed
[  434.205308] wicd-monitor invoked oom-killer: gfp_mask=0x201da,
order=0, oom_adj=0
[  434.205313] Pid: 1563, comm: wicd-monitor Tainted: P
2.6.32-rc5-00081-g964fe08-dirty #36
[  434.205316] Call Trace:
[  434.205325]  [] oom_kill_process+0x7c/0x243
[  434.205330]  [] __out_of_memory+0x146/0x15d
[  434.205335]  [] out_of_memory+0x6e/0x9d
[  434.205339]  [] __alloc_pages_nodemask+0x498/0x5ce
[  434.205345]  [] __do_page_cache_readahead+0xa0/0x1a1
[  434.205350]  [] ra_submit+0x1c/0x20
[  434.205353]  [] filemap_fault+0x1a6/0x346
[  434.205359]  [] __do_fault+0x4f/0x3d9
[  434.205363]  [] ? do_sync_read+0xe3/0x120
[  434.205369]  [] ? file_has_perm+0x90/0x9e
[  434.205373]  [] handle_mm_fault+0x3ab/0x6a7
[  434.205379]  [] do_page_fault+0x2bb/0x2d3
[  434.205383]  [] page_fault+0x25/0x30
[  434.205386] Mem-Info:
[  434.205388] DMA per-cpu:
[  434.205391] CPU0: hi:0, btch:   1 usd:   0
[  434.205394] CPU1: hi:0, btch:   1 usd:   0
[  434.205396] DMA32 per-cpu:
[  434.205399] CPU0: hi:  186, btch:  31 usd: 125
[  434.205401] CPU1: hi:  186, btch:  31 usd: 105
[  434.205404] Normal per-cpu:
[  434.205406] CPU0: hi:  186, btch:  31 usd: 172
[  434.205409] CPU1: hi:  186, btch:  31 usd: 154
[  434.205416] active_anon:708764 inactive_anon:266208 isolated_anon:0
[  434.205417]  active_file:71 inactive_file:11 isolated_file:0
[  434.205419]  unevictable:0 dirty:0 writeback:0 unstable:0 buffer:74
[  434.205420]  free:6961 slab_reclaimable:2782 slab_unreclaimable:16224
[  434.205421]  mapped:65 shmem:35 pagetables:2861 bounce:0
[  434.205430] DMA free:15944kB min:28kB low:32kB high:40kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15360kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? yes
[  434.205438] lowmem_reserve[]: 0 2976 3986 3986
[  434.205449] DMA32 free:9976kB min:6020kB low:7524kB high:9028kB
active_anon:2360156kB inactive_anon:589924kB active_file:60kB
inactive_file:44kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:3047792kB mlocked:0kB dirty:0kB
writeback:0kB mapped:88kB shmem:4kB slab_reclaimable:148kB
slab_unreclaimable:316kB kernel_stack:40kB pagetables:5952kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:225
all_unreclaimable? yes
[  434.205457] lowmem_reserve[]: 0 0 1010 1010
[  434.205468] Normal free:1924kB min:2040kB low:2548kB high:3060kB
active_anon:474900kB inactive_anon:474908kB active_file:224kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1034240kB mlocked:0kB dirty:0kB
writeback:0kB mapped:172kB shmem:136kB slab_reclaimable:10980kB
slab_unreclaimable:64572kB kernel_stack:824kB pagetables:5492kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:677
all_unreclaimable? yes
[  434.205476] lowmem_reserve[]: 0 0 0 0
[  434.205481] DMA: 2*4kB 2*8kB 3*16kB 2*32kB 3*64kB 2*128kB 2*256kB
1*512kB 2*1024kB 2*2048kB 2*4096kB = 15944kB
[  434.205493] DMA32: 2*4kB 14*8kB 16*16kB 10*32kB 3*64kB 1*128kB
1*256kB 1*512kB 2*1024kB 1*2048kB 1*4096kB = 9976kB
[  434.205505] Normal: 481*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1924kB
[  434.205516] 8029 total pagecache pages
[  434.205519] 7893 pages in swap cache
[  434.205521] Swap cache stats: add 112490, delete 104597, find 5058/5479
[  434.205524] Free swap  = 0kB
[  434.205526] Total swap = 431632kB
[  434.220125] 1048576 pages RAM
[  434.220127] 40493 pages reserved
[  434.220129] 170 pages shared
[  434.220131] 1000179 pages non-shared
[  434.220135] Out of memory: kill

Re: Preserving the argument spills for GDB

2009-11-04 Thread Ian Lance Taylor
Jean Christophe Beyler  writes:

>> You can force your writes to the stack to not be removed by making
>> them use UNSPEC_VOLATILE.  You would write special define_insns for
>> this.
>
> Is there an architecture port that has done this already ?

No, because, when given the choice, gcc prefers faster execution over
more reliable debugging at high optimization levels.

Ian


Re: i370 port

2009-11-04 Thread Ulrich Weigand
Paul Edwards wrote:

> The QI must be a signed char, and thus rejecting any value greater than 127.
> As you can see, I changed it to SI, which, with the constraints and tests
> in place, should be fine.

Ah, OK.  That would explain it.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: Understanding IRA

2009-11-04 Thread Vladimir Makarov

Jeff Law wrote:
> On 11/03/09 09:29, Ian Bolton wrote:
>> Hi again, Vladimir,
>>
>> I am pleased to report some performance improvements after altering
>> ira-costs.c.  A key benchmark for us has improved by 5%.
>>
>> Specifically, in record_reg_classes(), after the alt_cost has been
>> calculated and it will be applied to pp->mem_cost and pp->cost[k], I
>> check whether this particular operand wanted one of our BOTTOM_REGS
>> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
>> amount and also increase pp->cost[k] by an arbitrary amount if k
>> does not represent the BOTTOM_REGS class.  My aim here is to nudge
>> IRA in the right direction for operands that just want BOTTOM_REGS.
>>
>> After experimenting with different values for my "arbitrary
>> amounts", I discovered some that successfully made IRA more likely
>> to give BOTTOM_REGS to those instructions/operands that want
>> BOTTOM_REGS, since any other regs and memory ended up with high
>> enough costs for IRA to try and avoid using them.
>>
>> I have included a snippet from my version of record_reg_classes()
>> below:
>
> What I don't understand at this point is why the current mechanisms in
> IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher
> cost for TOP_REGS).  i.e.  I don't think any of this should be
> necessary as IRA should already be doing something similar.
>
> This may be a case where your backend hasn't indicated that TOP_REGS
> has a higher cost than BOTTOM_REGS in particular situations.

I agree with Jeff.  It is hard to understand what you are doing
without knowledge of the architecture and of some macro values in your
port (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).

I'd also add that, besides the right macro value definitions, you could
use insn alternative hints near register constraints, like ? or even *.





RE: Understanding IRA

2009-11-04 Thread Ian Bolton
Hi Jeff,

From an empirical perspective, the value of your patch is hard to
determine at this stage - one benchmark improved about 0.5% but others
have marginally regressed.

From an intellectual perspective, however, your patch is clearly a Good
Thing.  If I am understanding correctly, your aim is to prevent cases
where reload evicts a pseudo from a register that conflicts with the
pseudo that we are trying to reload, thereby undoing the clever cost-
based logic in IRA that gave the register to the current owner in the
first place?

My guess is that the performance drop could be attributed to reload
being lucky in some cases and your patch is preventing this luck from
happening.  Whilst I'm more of a risk-taker by nature, my employer
would prefer more predictable behaviour from its compiler, so we will
likely commit your patch to our development branch. Thanks very much!

Is there an ETA on when reload will be gone away? ;-)

Best regards,
Ian
 
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: 22 October 2009 23:05
> To: Ian Bolton
> Cc: Vladimir Makarov; gcc@gcc.gnu.org
> Subject: Re: Understanding IRA
> 
> On 10/19/09 12:30, Ian Bolton wrote:
> > Hi Jeff and Vladimir.
> >
> > Jeff: I'd be interested in trying the patch if you can send it my
> > way.
> >
> It's nothing special.
> 
> /* Return nonzero if REGNO is a particularly bad choice for reloading
>    X.  */
> static int
> ira_bad_reload_regno_1 (int regno, rtx x)
> {
>   int x_regno;
>   ira_allocno_t a;
>   enum reg_class pref;
> 
>   /* We only deal with pseudo regs.  */
>   if (! x || GET_CODE (x) != REG)
>     return 0;
> 
>   x_regno = REGNO (x);
>   if (x_regno < FIRST_PSEUDO_REGISTER)
>     return 0;
> 
>   /* If the pseudo prefers REGNO explicitly, then do not consider
>      REGNO a bad spill choice.  */
>   pref = reg_preferred_class (x_regno);
>   if (reg_class_size[pref] == 1
>       && TEST_HARD_REG_BIT (reg_class_contents[pref], regno))
>     return 0;
> 
>   /* If the pseudo conflicts with REGNO, then we consider REGNO a
>      poor choice for a reload regno.  */
>   a = ira_regno_allocno_map[x_regno];
>   if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno))
>     return 1;
> 
>   return 0;
> }
> 
> /* Return nonzero if REGNO is a particularly bad choice for reloading
>    IN or OUT.  */
> int
> ira_bad_reload_regno (int regno, rtx in, rtx out)
> {
>   return (ira_bad_reload_regno_1 (regno, in)
>           || ira_bad_reload_regno_1 (regno, out));
> }
> 
> Then change the loop in allocate_reload_reg to iterate 3 times instead
> of 2.  And add this fragment
> 
>   if (pass == 1
>       && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out))
>     continue;
> 
> to the body of the conditional starting with
> 
> if ((reload_reg_free_p ...
> 
> It's really just a hack.  I don't want to spend much time on that code
> as ultimately I want it all to go away.
> 
> Jeff
> 
> 
> 



Mercurial mirror of mainline not updated anymore?

2009-11-04 Thread Rainer Orth
I've just found (when I wanted to run a reghunt, which is way faster
with hg than with svn) that the mercurial mirror of svn mainline hasn't
been updated since September 10th.  Is there any chance that this mirror
can be revived?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Whole program optimization and functions-only-called-once.

2009-11-04 Thread Toon Moene

Jan,

I had some time to study the example I sent you a couple of weeks ago.

According to visual inspection of the source code, there are 5 
functions (subroutines in Fortran parlance) that are called once:


MAIN   calls
HLPROG calls
GEMINI calls
SL2TIM calls
PHCALL calls
PHTASK

I.e., the last five should be candidates for inlining of "functions only 
called once".


However, ccrPOljB.o.047i.inline says:

Deciding on functions called once:

Considering gemini_.clone.1 size 11443.
 Called once from hlprog 462 insns.
 Inlined into hlprog which now has 10728 size for a net change of 
-12620 size.


Considering hlprog size 10728.
 Called once from main 7 insns.
 Not inlined because --param large-function-growth limit reached.

Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.

The dump option -fdump-ipa-all also gives me the call graph, of which I 
copy here the relevant part:


phcall_.clone.3/11(-1) @0x7fd198c16400 (clone of phcall/33) 
availability:local 8281 time, 972 benefit 1351 size, 291 benefit 984 
bytes stack usage reachable local finalized inlinable

  called by: sl2tim/49 (0.44 per call) sl2tim_.clone.0/16 (0.44 per call)
phtask_.clone.2/12(-1) @0x7fd198c16500 (clone of phtask/41) 
availability:local 26416 time, 4268 benefit 4541 size, 880 benefit 480 
bytes stack usage reachable local finalized inlinable

  called by: phcall_.clone.3/11 (3.52 per call) phcall/33 (3.52 per call)
sl2tim_.clone.0/16(-1) @0x7fd198c16900 (clone of sl2tim/49) 
availability:local 207312 time, 26617 benefit 5169 size, 941 benefit 
3856 bytes stack usage reachable local finalized inlinable

  called by: gemini_.clone.1/40 (1.00 per call) gemini/0 (1.00 per call)
gemini_.clone.1/40(-1) @0x7fd198c35000 (inline copy in hlprog/17) 
(clone of gemini/0) availability:local 147324 time, 2770 benefit 11443 
size, 1177 benefit 11635 bytes stack usage reachable local finalized 
inlinable

  called by: hlprog/17 (3.57 per call) (inlined)
phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268 
benefit 4541 size, 880 benefit 480 bytes stack usage reachable body 
local finalized inlinable

  called by:
phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972 benefit 
1351 size, 291 benefit 984 bytes stack usage reachable body local 
finalized inlinable

  called by:
hlprog/17(-1) @0x7fd198c16a00 availability:local 560 time, 10 benefit 
(516762 after inlining) 462 size, 1 benefit (10728 after inlining) 4216 
bytes stack usage 15851 bytes after inlining reachable body local 
finalized inlinable

  called by: main/29 (1.00 per call)
sl2tim/49(-1) @0x7fd198c35900 availability:local 207312 time, 26617 
benefit 5169 size, 941 benefit 3856 bytes stack usage reachable body 
local finalized inlinable

  called by:
gemini/0(-1) @0x7fd198bef800 availability:local 147324 time, 2770 
benefit 11443 size, 1177 benefit 11635 bytes stack usage reachable body 
local finalized inlinable

  called by:

So if we have to believe this summary,

HLPROG is called by MAIN, but is not suitable for inlining (I can live 
with that).

GEMINI is not called, but GEMINI.clone is (by HLPROG) and is inlined.
SL2TIM is not called, but SL2TIM.clone is called by GEMINI and 
GEMINI.clone; because it is called twice, it is not considered a 
function-only-called-once.
PHCALL is not called, but PHCALL.clone is called by SL2TIM and 
SL2TIM.clone; because it is called twice, it is not considered a 
function-only-called-once.
PHTASK is not called, but PHTASK.clone is called by PHCALL and 
PHCALL.clone; because it is called twice, it is not considered a 
function-only-called-once.


I don't think this is really what we want with 
functions-only-called-once: If only the .clone version of a function is 
used, then a function that's only called once *inside this clone* is a 
function-only-called-once.
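
The counting rule proposed above can be sketched as follows.  This is only
an illustrative model in Python, not GCC's cgraph code: the edge list is
transcribed from the "called by" lines of the dump, the helper names
(reachable_from, live_caller_counts) are made up for the example, and it
assumes main/29 is the only entry point.

```python
from collections import defaultdict

# Caller -> callee edges, transcribed from the "called by" lines of the dump.
EDGES = [
    ("main/29", "hlprog/17"),
    ("hlprog/17", "gemini_.clone.1/40"),
    ("gemini_.clone.1/40", "sl2tim_.clone.0/16"),
    ("gemini/0", "sl2tim_.clone.0/16"),
    ("sl2tim_.clone.0/16", "phcall_.clone.3/11"),
    ("sl2tim/49", "phcall_.clone.3/11"),
    ("phcall_.clone.3/11", "phtask_.clone.2/12"),
    ("phcall/33", "phtask_.clone.2/12"),
]


def reachable_from(root, edges):
    """Return every node that can be reached from root by following calls."""
    callees = defaultdict(list)
    for caller, callee in edges:
        callees[caller].append(callee)
    seen, stack = {root}, [root]
    while stack:
        for nxt in callees[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def live_caller_counts(root, edges):
    """Count callers per function, ignoring callers that are themselves dead."""
    live_nodes = reachable_from(root, edges)
    counts = defaultdict(int)
    for caller, callee in edges:
        if caller in live_nodes:
            counts[callee] += 1
    return dict(counts)


# Naive count: every edge in the dump counts, even from uncalled originals.
naive = defaultdict(int)
for _, callee in EDGES:
    naive[callee] += 1

live = live_caller_counts("main/29", EDGES)
for name in sorted(live):
    print(f"{name}: {naive[name]} caller(s) in the dump, {live[name]} live")
```

With the edges from the dump, each of the sl2tim, phcall and phtask clones
has two callers when the uncalled originals are counted, and one when they
are not.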


I hope this analysis helps,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html


Re: Whole program optimization and functions-only-called-once.

2009-11-04 Thread Richard Guenther
On Wed, Nov 4, 2009 at 8:19 PM, Toon Moene  wrote:
> Jan,
>
> I had some time to study the example I sent you a couple of weeks ago.
>
> According to visible inspection of the source code, there are 5 functions
> (subroutines in Fortran parlance) that are called once:
>
> MAIN   calls
> HLPROG calls
> GEMINI calls
> SL2TIM calls
> PHCALL calls
> PHTASK
>
> I.e., the last five should be candidates for inlining of "functions only
> called once".
>
> However, ccrPOljB.o.047i.inline says:
>
> Deciding on functions called once:
>
> Considering gemini_.clone.1 size 11443.
>  Called once from hlprog 462 insns.
>  Inlined into hlprog which now has 10728 size for a net change of -12620
> size.
>
> Considering hlprog size 10728.
>  Called once from main 7 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.
>
> The dump option -fdump-ipa-all also gives me the call graph, of which I copy
> here the relevant part:
>
> phcall_.clone.3/11(-1) @0x7fd198c16400 (clone of phcall/33)
> availability:local 8281 time, 972 benefit 1351 size, 291 benefit 984 bytes
> stack usage reachable local finalized inlinable
>  called by: sl2tim/49 (0.44 per call) sl2tim_.clone.0/16 (0.44 per call)
> phtask_.clone.2/12(-1) @0x7fd198c16500 (clone of phtask/41)
> availability:local 26416 time, 4268 benefit 4541 size, 880 benefit 480 bytes
> stack usage reachable local finalized inlinable
>  called by: phcall_.clone.3/11 (3.52 per call) phcall/33 (3.52 per call)
> sl2tim_.clone.0/16(-1) @0x7fd198c16900 (clone of sl2tim/49)
> availability:local 207312 time, 26617 benefit 5169 size, 941 benefit 3856
> bytes stack usage reachable local finalized inlinable
>  called by: gemini_.clone.1/40 (1.00 per call) gemini/0 (1.00 per call)
> gemini_.clone.1phtask/40(-1) @0x7fd198c35000 (inline copy in hlprog/17)
> (clone of gemini/0) availability:local 147324 time, 2770 benefit 11443 size,
> 1177 benefit 11635 bytes stack usage reachable local finalized inlinable
>  called by: hlprog/17 (3.57 per call) (inlined)
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268 benefit
> 4541 size, 880 benefit 480 bytes stack usage reachable body local finalized
> inlinable
>  called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972 benefit 1351
> size, 291 benefit 984 bytes stack usage reachable body local finalized
> inlinable
>  called by:
> hlprog/17(-1) @0x7fd198c16a00 availability:local 560 time, 10 benefit
> (516762 after inlining) 462 size, 1 benefit (10728 after inlining) 4216
> bytes stack usage 15851 bytes after inlining reachable body local finalized
> inlinable
>  called by: main/29 (1.00 per call)
> sl2tim/49(-1) @0x7fd198c35900 availability:local 207312 time, 26617 benefit
> 5169 size, 941 benefit 3856 bytes stack usage reachable body local finalized
> inlinable
>  called by:
> gemini/0(-1) @0x7fd198bef800 availability:local 147324 time, 2770 benefit
> 11443 size, 1177 benefit 11635 bytes stack usage reachable body local
> finalized inlinable
>  called by:
>
> So if we have to believe this summary,
>
> HLPROG is called by MAIN, but is not suitable for inlining (I can live with
> that).
> GEMINI is not called, but GEMINI.clone is (by HLPROG) and is inlined.
> SL2TIM is not called, but SL2TIM.clone is called by GEMINI and GEMINI.clone;
> because it is called twice, it is not considered a
> function-only-called-once.
> PHCALL is not called, but PHCALL.clone is called by SL2TIM and SL2TIM.clone;
> because it is called twice, it is not considered a
> function-only-called-once.
> PHTASK is not called, but PHTASK.clone is called by PHCALL and PHCALL.clone;
> because it is called twice, it is not considered a
> function-only-called-once.
>
> I don't think this is really what we want with functions-only-called-once:
> If only the .clone version of a function is used, than a function that's
> only called once *inside this clone* is a function-only-called-once.
>
> I hope this analysis helps,

I think the underlying issue is

phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
local finalized inlinable
 called by:
phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
local finalized inlinable
 called by:

that these are not called but still reachable (they should not be reachable
anymore, instead the clones are now reachable).  I think there already is
a bug about cloning not updating cgraph reachability and not reclaiming
nodes after IPA transform application.

Richard.

> --
> Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
>


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Dave Korn
Justin Mattock wrote:

> O.k. here is the info from dmesg(with the patch added)
> and what -fmem-report:

  I don't know how to read the oom dmesg, but as to the -fmem-report:

> Memory still allocated at the end of the compilation process
> Size   Allocated   Used   Overhead
> Total   7200k   5293k   104k

... what that's telling us is that there isn't a substantial leak in GCC, as
there's only 7 meg left unreclaimed by GC at the end.  I think we'll have to
wait and see what the debugger tells us; either GCC really is using that much
memory in processing the file, or there's some kind of system or kernel bug
you're running into that is causing a leak in the VMM rather than the 
application.

> String pool

> bytes   86k (17592186044415M overhead)

  0xFFF0, lol, wut?  It's possible that indicates some sort of
memory corruption going on.  Maybe valgrind can help, do you have that?
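
A side note on the arithmetic: 17592186044415 is exactly 2^44 - 1, which is
what a small negative byte count turns into if it is held in an unsigned
64-bit integer and then scaled down to megabytes.  Whether -fmem-report
really computes the figure this way is an assumption here, not something
checked against the GCC source; the Python sketch below (helper name made
up) only shows that the number is consistent with an underflow.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit unsigned integer


def as_reported_megabytes(byte_count):
    """Wrap byte_count to 64 bits unsigned, then scale bytes to megabytes."""
    return (byte_count & MASK64) >> 20


reported = 17592186044415  # the "overhead" figure from the report above

print(reported == 2**44 - 1)
print(as_reported_megabytes(-1) == reported)
print(as_reported_megabytes(-4096) == reported)
```

All three comparisons hold, so any negative count between -1 and -2^20
bytes would be printed as this value under that assumption.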

cheers,
  DaveK



Re: Understanding IRA

2009-11-04 Thread Jeff Law

On 11/04/09 10:52, Ian Bolton wrote:

> Hi Jeff,
>
> From an empirical perspective, the value of your patch is hard to
> determine at this stage - one benchmark improved about 0.5% but others
> have marginally regressed.
   
It's a hack, no doubt about it.  Your results are about what I expected: 
a few things get better, a few things get worse.


The purpose of the patch was twofold.  First, hopefully make reload's 
actions more predictable.  Second, hopefully improve the overall code in 
at least some marginal fashion.


I didn't want to burn a lot of time on it since the reload inheritance 
is, IMHO, the biggest ball of hair in reload that I want to kill.  If 
I'm successful in killing the need for reload inheritance, then this 
little hack silently goes away.




> From an intellectual perspective, however, your patch is clearly a Good
> Thing.  If I am understanding correctly, your aim is to prevent cases
> where reload evicts a pseudo from a register that conflicts with the
> pseudo that we are trying to reload, thereby undoing the clever cost-
> based logic in IRA that gave the register to the current owner in the
> first place?
   
Basically, yes.  Note that we still allow eviction if no other register 
can be found.



> My guess is that the performance drop could be attributed to reload
> being lucky in some cases and your patch is preventing this luck from
> happening.

Precisely.



> Whilst I'm more of a risk-taker by nature, my employer
> would prefer more predictable behaviour from its compiler, so we will
> likely commit your patch to our development branch. Thanks very much!
   
There's definitely a lot more that could be done with this code.  If it 
were going to be around for a while, you'd want to sort the spill 
register array based on a number of criteria.  Part of those criteria 
would be the conflicts & costs recorded by IRA, whether or not the spill 
reg holds a value interesting for this insn, etc.  Given my goals, I 
didn't want to spend that much time on it.
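
The kind of ordering described above could look roughly like this.  It is a
Python sketch, not reload code: the SpillCandidate record, its fields, and
the priority order of the criteria are all hypothetical, chosen only to
illustrate sorting the candidates on several keys at once.

```python
from dataclasses import dataclass


@dataclass
class SpillCandidate:
    # All fields are hypothetical stand-ins for data reload/IRA would supply.
    regno: int                 # hard register number
    conflicts_with_ira: bool   # evicting it would undo an IRA allocation
    ira_cost: int              # cost IRA recorded for using this register
    holds_useful_value: bool   # holds a value this insn could reuse


def sort_spill_candidates(candidates):
    """Order candidates so the most attractive spill register comes first."""
    return sorted(
        candidates,
        key=lambda c: (
            c.conflicts_with_ira,      # False sorts before True
            not c.holds_useful_value,  # prefer registers with a reusable value
            c.ira_cost,                # then the cheapest register
            c.regno,                   # deterministic tie-break
        ),
    )


candidates = [
    SpillCandidate(3, True, 2, False),
    SpillCandidate(5, False, 8, False),
    SpillCandidate(7, False, 4, True),
]
print([c.regno for c in sort_spill_candidates(candidates)])
```

Here the conflicting register ends up last, so it would only be evicted if
nothing else were available.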




> Is there an ETA on when reload will be gone away? ;-)
   
It'll be a long time, though I'd like to start staging in pieces of the 
work for 4.6.


jeff



Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Justin P. Mattock

Dave Korn wrote:
> Justin Mattock wrote:
>
>> O.k. here is the info from dmesg(with the patch added)
>> and what -fmem-report:
>
>   I don't know how to read the oom dmesg, but as to the -fmem-report:
>
>> Memory still allocated at the end of the compilation process
>> Size   Allocated   Used   Overhead
>> Total   7200k   5293k   104k
>
> ... what that's telling us is that there isn't a substantial leak in GCC, as
> there's only 7 meg left unreclaimed by GC at the end.  I think we'll have to
> wait and see what the debugger tells us; either GCC really is using that much
> memory in processing the file, or there's some kind of system or kernel bug
> you're running into that is causing a leak in the VMM rather than the
> application.

just finished compiling and installing gdb/valgrind

>> String pool
>
>> bytes   86k (17592186044415M overhead)
>
>   0xFFF0, lol, wut?  It's possible that indicates some sort of
> memory corruption going on.  Maybe valgrind can help, do you have that?
>
> cheers,
>   DaveK


   

Not sure how to use these.(need to read)
Any quick commands I can do to get the info
to you?

Justin P. Mattock


MicroBlaze update

2009-11-04 Thread Michael Eager

I've checked in patches to the microblaze branch
to bring it into sync with gcc-4.4.2.  This has been
tagged with microblaze-4.4.2.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: Whole program optimization and functions-only-called-once.

2009-11-04 Thread Toon Moene

Richard Guenther wrote:

> I think the underlying issue is
>
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
> benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
> local finalized inlinable
>  called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
> benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
> local finalized inlinable
>  called by:
>
> that these are not called but still reachable (they should not be reachable
> anymore, instead the clones are now reachable).  I think there already is
> a bug about cloning not updating cgraph reachability and not reclaiming
> nodes after IPA transform application.


You don't happen to recall the bug number ?

The last time I did this sort of optimization was in 1992.

f2c (the Fortran-to-C compiler) gave me C equivalents of all Fortran 
code in the forecasting executable.


I spent a rainy Sunday afternoon pasting them into one giant source 
file, ordering them correctly (all called subroutines first) and then 
slapping "static inline" on them.


Subsequently, I compiled the (30,000 line) C file with gcc -O3.  The 
resulting executable was about 10 % faster than the original (which was 
also compiled by f2c - g77 didn't exist at that time).


So my hopes on this optimization (when done right) are quite high :-)

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Justin Mattock
here's what I did:
 valgrind --tool=memcheck --leak-check=full -v make -f client.mk build


e2 -march=core2 -O2 -pipe -fomit-frame-pointer   -DMOZILLA_CLIENT
-include ./js-confdefs.h -Wp,-MD,.deps/jsxml.pp
/home/justin/LFS/firefox/mozilla-1.9.2/js/src/jsxml.cpp
{standard input}: Assembler messages:
{standard input}:271839: Warning: end of file not at end of a line;
newline inserted
{standard input}:271896: Error: suffix or operands invalid for `movq'
{standard input}:271896: Error: open CFI at the end of file; missing
.cfi_endproc directive
c++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See  for instructions.
make[4]: *** [jsxml.o] Error 1
make[4]: Leaving directory
`/home/name/LFS/firefox/mozilla-1.9.2/obj-x86_64-unknown-linux-gnu/js/src'
make[3]: *** [libs_tier_js] Error 2
make[3]: Leaving directory
`/home/name/LFS/firefox/mozilla-1.9.2/obj-x86_64-unknown-linux-gnu'
make[2]: *** [tier_js] Error 2
make[2]: Leaving directory
`/home/name/LFS/firefox/mozilla-1.9.2/obj-x86_64-unknown-linux-gnu'
make[1]: *** [default] Error 2
make[1]: Leaving directory
`/home/name/LFS/firefox/mozilla-1.9.2/obj-x86_64-unknown-linux-gnu'
make: *** [build] Error 2
==4072==
==4072== HEAP SUMMARY:
==4072== in use at exit: 201,183 bytes in 4,237 blocks
==4072==   total heap usage: 28,879 allocs, 24,642 frees, 2,947,434
bytes allocated
==4072==
==4072== Searching for pointers to 4,237 not-freed blocks
==4072== Checked 268,808 bytes
==4072==
==4072== 6 bytes in 1 blocks are possibly lost in loss record 41 of 295
==4072==    at 0x4C2488A: malloc (vg_replace_malloc.c:195)
==4072==    by 0x50ABEE1: strdup (in /lib/libc-2.10.90.so)
==4072==    by 0x4118D8: xstrdup (in /usr/bin/make)
==4072==    by 0x41BAA4: define_variable_in_set (in /usr/bin/make)
==4072==    by 0x4160B4: eval (in /usr/bin/make)
==4072==    by 0x416766: eval_makefile (in /usr/bin/make)
==4072==    by 0x416A3F: read_all_makefiles (in /usr/bin/make)
==4072==    by 0x410853: main (in /usr/bin/make)
==4072==
==4072== 14 bytes in 1 blocks are possibly lost in loss record 69 of 295
==4072==    at 0x4C2488A: malloc (vg_replace_malloc.c:195)
==4072==    by 0x411977: xmalloc (in /usr/bin/make)
==4072==    by 0x411AA8: savestring (in /usr/bin/make)
==4072==    by 0x41BB7A: define_variable_in_set (in /usr/bin/make)
==4072==    by 0x410832: main (in /usr/bin/make)
==4072==
==4072== LEAK SUMMARY:
==4072==    definitely lost: 0 bytes in 0 blocks
==4072==    indirectly lost: 0 bytes in 0 blocks
==4072==      possibly lost: 20 bytes in 2 blocks
==4072==    still reachable: 201,163 bytes in 4,235 blocks
==4072==         suppressed: 0 bytes in 0 blocks
==4072== Reachable blocks (those to which a pointer was found) are not shown.
==4072== To see them, rerun with: --leak-check=full --show-reachable=yes
==4072==
==4072== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 5 from 5)
--4072--
--4072-- used_suppression:  2 dl-hack3-cond-1
--4072-- used_suppression:  3 glibc-2.5.x-on-SUSE-10.2-(PPC)-2a
==4072==
==4072== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 5 from 5)

I'll try out gdb, and more of valgrind.

-- 
Justin P. Mattock


Re: Whole program optimization and functions-only-called-once.

2009-11-04 Thread Andrew Pinski
On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene  wrote:
> You don't happen to recall the bug number ?

It might be related to PR 41735 which I noticed when looking at the
generated assembly and trying to compare 4.5 to 4.4.

Thanks,
Andrew Pinski


Re: Whole program optimization and functions-only-called-once.

2009-11-04 Thread Richard Guenther
On Wed, Nov 4, 2009 at 10:30 PM, Andrew Pinski  wrote:
> On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene  wrote:
>> You don't happen to recall the bug number ?
>
> It might be related to PR 41735 which I noticed when looking at the
> generated assembly and trying to compare 4.5 to 4.4.

Yes indeed.  Honza may be able to explain why it is like it is and if it's easy
to fix.  He's on vacation though ;)

Richard.


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread KOSAKI Motohiro
> > +   if (verbose) {
> > +   task_lock(p);
> 
> We need to be careful with which locks we take on the oom-killer path,
> because it can be called by code which already holds locks.  But I
> expect task_lock() will be OK.

Sure.
task_lock() is already used in various oom paths, so I think this is OK.





Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Dave Korn
Justin Mattock wrote:
> here's what I did:
>  valgrind --tool=memcheck --leak-check=full -v make -f client.mk build

> ==4072== LEAK SUMMARY:

> I'll try out gdb, and more of valgrind.

  Yep, that doesn't tell us a lot in its default modes.  I'm not a valgrind
expert but it looks from the docs like you want to try the Massif tool: it
looks really thorough.

http://valgrind.org/docs/manual/ms-manual.html

cheers,
  DaveK




Re: Understanding IRA

2009-11-04 Thread Jeff Law

On 11/04/09 10:14, Vladimir Makarov wrote:

> I am agree with Jeff.  It is hard to understand what you are doing
> without the architecture knowledge and some macro values in your port
> (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
>
> I'd also add that besides right macro value definitions, you could use
> insn alternative hints near register constraints like ? or even *.

I was wondering about the constraints in particular.  ISTM that for 
insns where TOP_REGS and BOTTOM_REGS have differing costs that the 
constraint letters for TOP_REGS ought to have a '?' to show they're 
slightly more costly than BOTTOM_REGS.


Alternatively, IRA_HARD_REGNO_ADD_COST_MULTIPLIER might be used to get 
BOTTOM_REGS preferred without having to add a zillion '?' modifiers to 
the constraints in the machine description.


jeff


Re: cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0

2009-11-04 Thread Justin Mattock
On Wed, Nov 4, 2009 at 4:36 PM, Dave Korn
 wrote:
> Justin Mattock wrote:
>> here's what I did:
>>  valgrind --tool=memcheck --leak-check=full -v make -f client.mk build
>
>> ==4072== LEAK SUMMARY:
>
>> I'll try out gdb, and more of valgrind.
>
>  Yep, that doesn't tell us a lot in its default modes.  I'm not a valgrind
> expert but it looks from the docs like you want to try the Massif tool: it
> looks really thorough.
>
> http://valgrind.org/docs/manual/ms-manual.html
>
>    cheers,
>      DaveK
>
>
>

o.k. round2 I can try gdb(now that I have somewhat of an idea of
how to use one of these tools).

in the meanwhile hopefully this is useful(if not just clip, and I'll try
to gather something useful)


using this for valgrind and the command(below in log)
valgrind -v --tool=memcheck --leak-check=yes --num-callers=10
--leak-check=full --show-reachable=yes --gen-suppressions=yes
(log of valgrind is a bit long, so here's just a few of what
suppressions printed out:)

==1830== Memcheck, a memory error detector
==1830== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1830== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==1830== Command: c++ -o jsxml.o -c -DOSTYPE="Linux2.6" -DOSARCH=Linux
-DEXPORT_JS_API -DJS_USE_SAFE_ARENA
-I/home/justin/LFS/firefox/mozilla-1.9.2/js/src -I.
-I./../../dist/include -I./../../dist/include/nsprpub
-I/usr/include/nspr -I/home/justin/LFS/firefox/mozilla-1.9.2/js/src
-fPIC -fno-rtti -fno-exceptions -Wall -Wpointer-arith
-Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy
-Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof
-Wno-variadic-macros -Wno-long-long -pedantic -fno-strict-aliasing
-pthread -pipe -DNDEBUG -DTRIMMED -m64 -mtune=core2 -march=core2 -O2
-pipe -fomit-frame-pointer -DMOZILLA_CLIENT -include ./js-confdefs.h
-Wp,-MD,.deps/jsxml.pp
/home/justin/LFS/firefox/mozilla-1.9.2/js/src/jsxml.cpp
==1830==
--1830-- Valgrind options:
--1830--    -v
--1830--    --tool=memcheck
--1830--    --leak-check=yes
--1830--    --num-callers=10
--1830--    --leak-check=full
--1830--    --show-reachable=yes
--1830--    --gen-suppressions=yes
--1830-- Contents of /proc/version:
--1830--   Linux version 2.6.32-rc5-00083-g04ea458 (jus...@linux-0)
(gcc version 4.4.1 (GCC for Cross-LFS 4.4.1.20090722) ) #2 SMP Sat Oct
24 21:38:54 PDT 2009
--1830-- Arch and hwcaps: AMD64, amd64-sse3-cx16
--1830-- Page sizes: currently 4096, max supported 4096
--1830-- Valgrind library directory: /usr/lib/valgrind
--1830-- Reading syms from /usr/bin/c++ (0x40)
--1830-- Reading syms from /lib/ld-2.10.90.so (0x400)
--1830-- Reading syms from /usr/lib/valgrind/memcheck-amd64-linux (0x3800)
--1830--    object doesn't have a dynamic symbol table
--1830-- Reading suppressions file: /usr/lib/valgrind/default.supp
--1830-- REDIR: 0x4016110 (strlen) redirected to 0x38040607
(vgPlain_amd64_linux_REDIR_FOR_strlen)
--1830-- Reading syms from
/usr/lib/valgrind/vgpreload_core-amd64-linux.so (0x4a2)
--1830-- Reading syms from
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so (0x4c21000)
==1830== WARNING: new redirection conflicts with existing -- ignoring it
--1830-- new: 0x04016110 (strlen  ) R-> 0x04c24fd0 strlen
--1830-- REDIR: 0x4015f30 (index) redirected to 0x4c24d50 (index)
--1830-- REDIR: 0x40160e0 (strcmp) redirected to 0x4c252e0 (strcmp)
--1830-- Reading syms from /lib/libc-2.10.90.so (0x4e28000)
--1830-- REDIR: 0x4ea4c40 (rindex) redirected to 0x4c24bb0 (rindex)
--1830-- REDIR: 0x4e9e010 (malloc) redirected to 0x4c24808 (malloc)
--1830-- REDIR: 0x4ea3330 (strncmp) redirected to 0x4c25210 (strncmp)
--1830-- REDIR: 0x4ea7670 (memcpy) redirected to 0x4c253e0 (memcpy)
--1830-- REDIR: 0x4e9df30 (free) redirected to 0x4c23b44 (free)
--1830-- REDIR: 0x4ea17d0 (strcmp) redirected to 0x4c25280 (strcmp)
--1830-- REDIR: 0x4ea3170 (strlen) redirected to 0x4c24f90 (strlen)
--1830-- REDIR: 0x4ea1750 (index) redirected to 0x4c24c50 (index)
--1830-- REDIR: 0x4ea9da0 (strchrnul) redirected to 0x4c25ff0 (strchrnul)
--1830-- REDIR: 0x4ea6d30 (mempcpy) redirected to 0x4c260e0 (mempcpy)
--1830-- REDIR: 0x4ea5bc0 (memchr) redirected to 0x4c253a0 (memchr)
--1830-- REDIR: 0x4e9f060 (realloc) redirected to 0x4c248b9 (realloc)
--1830-- REDIR: 0x4ea31c0 (strnlen) redirected to 0x4c24f60 (strnlen)
--1830-- REDIR: 0x4ea7360 (stpcpy) redirected to 0x4c25c10 (stpcpy)
--1830-- REDIR: 0x4ea2c20 (strcpy) redirected to 0x4c24ff0 (strcpy)
--1830-- REDIR: 0x4ea9d50 (rawmemchr) redirected to 0x4c26020 (rawmemchr)
--1830-- REDIR: 0x4e9d5d0 (calloc) redirected to 0x4c2322c (calloc)
--1830-- REDIR: 0x4ea4b80 (strncpy) redirected to 0x4c250c0 (strncpy)
--1830-- REDIR: 0x4ea1590 (strcat) redirected to 0x4c24d90 (strcat)
--1830-- REDIR: 0x4e5bf40 (putenv) redirected to 0x4c263d0 (putenv)
{standard input}: Assembler messages:
{standard input}:271839: Warning: end of file not at end of a line;
newline inserted
{standard input}:271896: Error: suffix or operands invalid for `movq'
{standard input