for getting profiling times in millisecond resolution.
Hi,

I want to get profiling data for an application on Linux. I am currently compiling with gcc's -pg option and then using gprof to generate the profiles. The times I get are only in terms of seconds, but I want millisecond resolution. Can anybody help me?

Thanks & regards,
Jayaraj
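As a hedged aside that is not part of the original thread: when the profiler's reported granularity is too coarse, one common workaround is to time the region of interest by hand. The sketch below assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC; the helper names elapsed_ms and time_region_ms are hypothetical and chosen only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* ask for clock_gettime() on strict compilers */
#endif
#include <time.h>

/* Hypothetical helper: difference between two timestamps, in milliseconds. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    double sec  = (double)(end->tv_sec - start->tv_sec);
    double nsec = (double)(end->tv_nsec - start->tv_nsec);
    return sec * 1000.0 + nsec / 1.0e6;
}

/* Hypothetical helper: run fn(arg) once and return the wall-clock time in
   milliseconds, or a negative value if the clock could not be read. */
static double time_region_ms(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    return elapsed_ms(&t0, &t1);
}
```

This measures elapsed wall-clock time around one call, so it complements a gprof profile rather than replacing it. On older C libraries clock_gettime() may additionally require linking with -lrt.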
Re: New branch 'yara-branch' is created
Hi Vladimir,

On Sat, 18 Mar 2006, Vladimir N. Makarov wrote:

> What I am going to do in short perspective is
> o work on code quality of some SPECINT tests (e.g. reload is doing
>   better job for crafty with many multi-registers than YARA)

I haven't looked at the new branch yet, so forgive me if this is obvious or handled already. In my attempt I found that the only way to handle multi-word registers well was to really track (and allocate) partially live registers, i.e. separate words in a whole multi-word web. That was what the whole subweb mess was coming from, which was slowing down the whole allocator quite a bit.

Ciao,
Michael.
Re: New branch 'yara-branch' is created
Michael Matz wrote:
> Hi Vladimir,
>
> On Sat, 18 Mar 2006, Vladimir N. Makarov wrote:
>> What I am going to do in short perspective is
>> o work on code quality of some SPECINT tests (e.g. reload is doing
>>   better job for crafty with many multi-registers than YARA)

The lower-subreg patch that Richard Henderson posted, and that comes up again and again from time to time, may also help. It does require a bit of hacking in the MDs (mostly removing the DImode patterns for logical operations since the middle-end is able to synthesize them on its own).

Paolo
Leaf functions and noreturn calls
Good morning everyone!

Here's a simple testcase that illustrates what I'm observing in gcc-3.3.3

--
extern void abort (void) __attribute ((noreturn));

int foo (int a, int b)
{
  if (a > 25)
    abort ();
  return (a + b);
}

int bar (int a, int b)
{
  if (a > 25)
    a++;
  return (a + b);
}
--

Function bar() is clearly a leaf function, so at stackframe layout time I get called with current_function_is_leaf == 1, and since it uses no nonvolatile registers I get a function with no stackframe, no registers saved, no prologue and nothing but a ret insn in the epilog.

Function foo() is not regarded as a leaf function, because it calls abort, and so my prologue generation code believes it has to create a stack frame just to save the link register. However, abort is clearly marked noreturn:-

--
;(call_insn 13 31 14 0x0 (parallel [
;(call (mem:SI (symbol_ref:SI ("abort")) [0 S4 A32])
;(const_int 0 [0x0]))
;(clobber (reg:SI 15 r15))
;]) 36 {call} (nil)
;(expr_list:REG_UNUSED (reg:SI 15 r15)
;(expr_list:REG_NORETURN (const_int 0 [0x0])
;(expr_list:REG_EH_REGION (const_int 0 [0x0])
;(nil
;(nil))
--

and without this call it would clearly be a leaf function, and since this call is noreturn we could in fact still treat it as a leaf function.

Is there some complication that I haven't realised why noreturn function calls can't be disregarded in deciding whether or not a function is leaf? Taking a look at leaf_function_p, I see that it specifically discounts sibcalls; why not noreturncalls as well?

cheers,
DaveK
--
Can't think of a witty .sigline today
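The question in this message can be modelled with a toy predicate. The sketch below is not GCC's leaf_function_p and does not use GCC's rtx representation; struct toy_insn, toy_leaf_function_p and the ignore_noreturn flag are all hypothetical names, assumed here only to illustrate the proposed rule of discounting noreturn calls in the same way sibcalls are discounted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an insn; NOT GCC's rtx representation. */
struct toy_insn {
    bool is_call;      /* the insn is a call */
    bool is_sibcall;   /* the call is a sibling (tail) call */
    bool is_noreturn;  /* the call carries a REG_NORETURN-style note */
};

/* Hypothetical predicate: a function is a leaf when it contains no call
   other than sibcalls and, if ignore_noreturn is set, noreturn calls. */
static bool toy_leaf_function_p(const struct toy_insn *insns, size_t n,
                                bool ignore_noreturn)
{
    for (size_t i = 0; i < n; i++) {
        if (!insns[i].is_call)
            continue;
        if (insns[i].is_sibcall)
            continue;               /* sibcalls are already discounted */
        if (ignore_noreturn && insns[i].is_noreturn)
            continue;               /* the extra case being asked about */
        return false;               /* an ordinary call: not a leaf */
    }
    return true;
}
```

With ignore_noreturn clear, a body shaped like foo() (one noreturn call) is reported as non-leaf; with it set, the same body is reported as leaf, which is the behaviour the message asks about.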
Re: gcc-4.2-20060304 is now available
On Sun, 19 Mar 2006, Laurent GUERBY wrote:
> Are statistics for GCC download available somewhere? I suspect
> in these days of broadband that just about everyone gets the full
> tarball (especially for releases...).

The FreeBSD ports, for example, by default do not build gfortran nor Java at this point and have never built Ada. And I doubt that many are building Ada, regardless of the platform they are using.

gfortran weighs in at less than one megabyte, thus isn't of too much concern. Java with close to 10MB and Ada with some 4.5MB are kind of heavy, though.

Keep in mind that depending on the part of the world many may still be using modem lines.

Gerald
Re: FUNCTION_OK_FOR_SIBCALL vs INITIAL_FRAME_POINTER_OFFSET
On Sat, Mar 11, 2006 at 12:41:32AM -, Dave Korn wrote:
> So, what if the decision it needs to make depends on the stack frame
> size of the current function?

How can this possibly be? When the sibcall is done, the current function's stack frame is removed.

r~
RE: FUNCTION_OK_FOR_SIBCALL vs INITIAL_FRAME_POINTER_OFFSET
On 20 March 2006 14:45, Richard Henderson wrote:

Hi Richard :)

> On Sat, Mar 11, 2006 at 12:41:32AM -, Dave Korn wrote:
>> So, what if the decision it needs to make depends on the stack frame
>> size of the current function?
>
> How can this possibly be? When the sibcall is done, the current
> function's stack frame is removed.
>
> r~

If the stack frame size is >32kB, I need to use a temporary register in the epilogue to assemble the lo/hi parts of the frame size before adding it to the SP. In the non-sibcall version of the epilogue[*] it uses one of the arg-passing volatiles as a scratch register, but of course in a sibcall epilogue that register might have been pre-loaded with an argument for the sibcall which we don't want to trash. So rather than get hairy with trying to allocate scratch regs, I was just going to refuse sibcalls for functions with huge stack frames. Hence my curiosity.

cheers,
DaveK

[*] - Which has until now been the only version for our custom target; it's still using fprintfs in TARGET_ASM_FUNCTION_EPILOGUE, and I'm going one step at a time to bring it up to scratch.
--
Can't think of a witty .sigline today
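The lo/hi assembly mentioned in this message can be sketched as follows. The message does not describe the target's instruction set, so this is a guess at one common scheme: it assumes a signed 16-bit add-immediate plus a way to load a value into the upper 16 bits of a scratch register, and it assumes frame sizes well below 2 GiB. The names struct hi_lo, split_frame_size and fits_simm16 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: the two immediates needed to rebuild a constant. */
struct hi_lo {
    uint32_t hi;  /* upper part, to be shifted left by 16 in the scratch register */
    int32_t  lo;  /* signed 16-bit remainder, in [-32768, 32767] */
};

/* Split size so that hi * 65536 + lo == size, with lo fitting a signed
   16-bit immediate (assumed encoding, not taken from the thread). */
static struct hi_lo split_frame_size(uint32_t size)
{
    struct hi_lo parts;

    parts.lo = (int32_t)(size & 0xffffu);
    if (parts.lo >= 0x8000)
        parts.lo -= 0x10000;            /* sign-extend the low half */
    parts.hi = (size + 0x8000u) >> 16;  /* compensate for a negative lo */
    return parts;
}

/* Nonzero when the size fits a signed 16-bit immediate directly, in which
   case no scratch register is needed at all. */
static int fits_simm16(uint32_t size)
{
    return size < 0x8000u;
}
```

Under these assumptions a 40000-byte frame splits into hi = 1 and lo = -25536, and only frames for which fits_simm16() is false need the scratch register that causes the conflict with sibcall arguments.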
Re: FUNCTION_OK_FOR_SIBCALL vs INITIAL_FRAME_POINTER_OFFSET
On Mon, Mar 20, 2006 at 02:56:00PM -, Dave Korn wrote:
> If the stack frame size is >32kB, I need to use a temporary register in the
> epilogue to assemble the lo/hi parts of the frame size before adding it to the
> SP. In the non-sibcall version of the epilogue[*] it uses one of the
> arg-passing volatiles as a scratch register, but of course in a sibcall
> epilogue that register might have been pre-loaded with an argument for the
> sibcall which we don't want to trash. So rather than get hairy with trying to
> allocate scratch regs, I was just going to refuse sibcalls for functions with
> huge stack frames. Hence my curiosity.

Ah, interesting. In this case I'd deny sibcalls to functions that use all of the available scratch registers for arguments. We do something similar for i386 indirect sibcalls -- deny unless there's a free register for the jump.

r~
Re: Leaf functions and noreturn calls
On Mon, Mar 20, 2006 at 12:57:14PM -, Dave Korn wrote:
> Taking a look at leaf_function_p, I see that it specifically discounts
> sibcalls; why not noreturncalls as well?

Because generally losing unwind information from noreturn calls is a lose when it comes to debugging.

r~
Re: New branch 'yara-branch' is created
Michael Matz wrote:
> Hi Vladimir,
>
> On Sat, 18 Mar 2006, Vladimir N. Makarov wrote:
>> What I am going to do in short perspective is
>> o work on code quality of some SPECINT tests (e.g. reload is doing
>>   better job for crafty with many multi-registers than YARA)
>
> I haven't looked at the new branch yet, so forgive me if this is obvious
> or handled already. In my attempt I found that the only way to handle
> multi-word registers well was to really track (and allocate) partially
> live registers, i.e. separate words in a whole multi-word web. That was
> what the whole subweb mess was coming from, which was slowing down the
> whole allocator quite a bit.

It is not done this way yet. I thought about that too and hope it can help crafty.
Re: New branch 'yara-branch' is created
Paolo Bonzini wrote:
> Michael Matz wrote:
>> Hi Vladimir,
>>
>> On Sat, 18 Mar 2006, Vladimir N. Makarov wrote:
>>> What I am going to do in short perspective is
>>> o work on code quality of some SPECINT tests (e.g. reload is doing
>>>   better job for crafty with many multi-registers than YARA)
>
> The lower-subreg patch that Richard Henderson posted, and that comes up
> again and again from time to time, may also help. It does require a bit
> of hacking in the MDs (mostly removing the DImode patterns for logical
> operations since the middle-end is able to synthesize them on its own).

Thanks for the information. I'll look at this.
RE: FUNCTION_OK_FOR_SIBCALL vs INITIAL_FRAME_POINTER_OFFSET
On 20 March 2006 15:14, Richard Henderson wrote:

> On Mon, Mar 20, 2006 at 02:56:00PM -, Dave Korn wrote:
>> If the stack frame size is >32kB, I need to use a temporary register in
>> the epilogue to assemble the lo/hi parts of the frame size before adding
>> it to the SP. In the non-sibcall version of the epilogue[*] it uses one
>> of the arg-passing volatiles as a scratch register, but of course in a
>> sibcall epilogue that register might have been pre-loaded with an argument
>> for the sibcall which we don't want to trash. So rather than get hairy
>> with trying to allocate scratch regs, I was just going to refuse sibcalls
>> for functions with huge stack frames. Hence my curiosity.
>
> Ah, interesting. In this case I'd deny sibcalls to functions that
> use all of the available scratch registers for arguments.

Heh, of course, lateral thinking, attack the conflict from the opposite direction!

> We do
> something similar for i386 indirect sibcalls -- deny unless there's
> a free register for the jump.

Mmmf. In that case, of course, you absolutely /have/ to have a register available and no two ways about it. But in the direct-sibcall case, this test will generate false positives since if the stackframe is small enough I don't need a register at all and the sibcall should be ok. And I expect it to be the case that 99% of all functions /will/ have small stack frames, and so this will deny every sibcall to a function using the full set of arg passing regs when they'll almost all be ok.

So I guess if I want to make the most of the opportunities for sibcalls, I still need to hope that get_frame_size() is valid at F_O_F_S time and make a best-guess at the size of the frame, and if it goes from < 32k to > 32k by the time I actually come to emit the sibcall epilogue I'll just have to abort.

Do you happen to know off the top of your head when get_frame_size() becomes valid?

cheers,
DaveK
--
Can't think of a witty .sigline today
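The two policies weighed in this message can be compared with a toy model. This is only a sketch and not a real FUNCTION_OK_FOR_SIBCALL implementation: the register count, the 32768-byte limit and every name below are assumptions made for illustration, and the frame size passed in stands for whatever estimate is available when the decision is made.

```c
#include <stdbool.h>

#define TOY_NUM_ARG_REGS       4       /* assumed number of arg-passing registers */
#define TOY_SMALL_FRAME_LIMIT  32768L  /* assumed size below which no scratch is needed */

/* Conservative rule from the quoted reply: refuse whenever the sibcall's
   arguments occupy every register that could serve as scratch. */
static bool toy_sibcall_ok_conservative(int arg_regs_used)
{
    return arg_regs_used < TOY_NUM_ARG_REGS;
}

/* Refinement discussed above: the scratch register only matters when the
   (estimated) frame is too large for a single immediate add. */
static bool toy_sibcall_ok_refined(long estimated_frame_size, int arg_regs_used)
{
    if (estimated_frame_size < TOY_SMALL_FRAME_LIMIT)
        return true;                          /* no scratch register needed */
    return arg_regs_used < TOY_NUM_ARG_REGS;  /* large frame: need a free one */
}
```

The refined rule accepts every small-frame sibcall that the conservative rule would reject, at the cost of depending on a frame-size estimate that may still change afterwards, which is exactly the risk the message raises.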
RE: Leaf functions and noreturn calls
On 20 March 2006 15:31, Richard Henderson wrote:

> On Mon, Mar 20, 2006 at 12:57:14PM -, Dave Korn wrote:
>> Taking a look at leaf_function_p, I see that it specifically discounts
>> sibcalls; why not noreturncalls as well?
>
> Because generally losing unwind information from noreturn calls
> is a lose when it comes to debugging.
>
> r~

Ah, good point. You want to know where abort() was called from indeed!

However, I might still want to make it an option for cases where debugging isn't going to be important; it seems to me that the generated code should still be valid.

cheers,
DaveK
--
Can't think of a witty .sigline today
Re: New branch 'yara-branch' is created
>> The lower-subreg patch that Richard Henderson posted, and that comes up
>> again and again from time to time, may also help. It does require a bit
>> of hacking in the MDs (mostly removing the DImode patterns for logical
>> operations since the middle-end is able to synthesize them on its own).
>
> Thanks for the information. I'll look at this.

Here is an updated patch; the code is also cleaned up a bit to comply better with the GCC coding standards.

The big TODO item there, is that the pass has "a bald-faced assumption that [every] subreg [of a multi-word reg] is actually inside an operand, and is thus replacable. This might be false if the target plays games with subregs in the patterns. Perhaps a better approach is to mirror what regrename does wrt recognizing the insn, iterating over the operands, smashing the operands out and iterating over the resulting pattern."

Note that regrename as far as I understand does *much* more than what this pass should do.

Paolo

Index: Makefile.in
===
--- Makefile.in (revision 108713)
+++ Makefile.in (working copy)
@@ -972,7 +978,7 @@ OBJS-common = \
  insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o \
  integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o \
  loop.o mode-switching.o modulo-sched.o optabs.o options.o opts.o \
- params.o postreload.o postreload-gcse.o predict.o\
+ params.o postreload.o postreload-gcse.o predict.o lower-subreg.o \
  insn-preds.o pointer-set.o \
  print-rtl.o print-tree.o profile.o value-prof.o var-tracking.o \
  real.o recog.o reg-stack.o regclass.o regmove.o regrename.o \
@@ -1722,6 +1767,8 @@ langhooks.o : langhooks.c $(CONFIG_H) $(
    $(TREE_H) toplev.h tree-inline.h $(RTL_H) insn-config.h $(INTEGRATE_H) \
    langhooks.h $(LANGHOOKS_DEF_H) $(FLAGS_H) $(GGC_H) $(DIAGNOSTIC_H) intl.h \
    $(TREE_GIMPLE_H)
+lower-subreg.o : lower-subreg.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(MACHMODE_H) $(RTL_H) bitmap.h
 tree.o : tree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
    $(FLAGS_H) function.h $(PARAMS_H) \
    toplev.h $(GGC_H) $(HASHTAB_H) $(TARGET_H) output.h $(TM_P_H) langhooks.h \
Index: dwarf2out.c
===
--- dwarf2out.c (revision 108713)
+++ dwarf2out.c (working copy)
@@ -8892,6 +8892,31 @@ concat_loc_descriptor (rtx x0, rtx x1)
   return cc_loc_result;
 }
 
+/* Return a descriptor that describes the concatenation of N locations. */
+
+static dw_loc_descr_ref
+concatn_loc_descriptor (rtx concatn)
+{
+  dw_loc_descr_ref cc_loc_result = NULL;
+  unsigned int i, n = XVECLEN (concatn, 0);
+
+  for (i = 0; i < n; ++i)
+    {
+      dw_loc_descr_ref ref;
+      rtx x = XVECEXP (concatn, 0, i);
+
+      ref = loc_descriptor (x);
+      if (ref == NULL)
+        return NULL;
+
+      add_loc_descr (&cc_loc_result, ref);
+      ref = new_loc_descr (DW_OP_piece, GET_MODE_SIZE (GET_MODE (x)), 0);
+      add_loc_descr (&cc_loc_result, ref);
+    }
+
+  return cc_loc_result;
+}
+
 /* Output a proper Dwarf location descriptor for a variable or parameter
    which is either allocated in a register or in a memory location. For a
    register, we just generate an OP_REG and the register number. For a
@@ -8929,6 +8954,10 @@ loc_descriptor (rtx rtl)
       loc_result = concat_loc_descriptor (XEXP (rtl, 0), XEXP (rtl, 1));
       break;
 
+    case CONCATN:
+      loc_result = concatn_loc_descriptor (rtl);
+      break;
+
     case VAR_LOCATION:
       /* Single part. */
       if (GET_CODE (XEXP (rtl, 1)) != PARALLEL)
Index: emit-rtl.c
===
--- emit-rtl.c (revision 108713)
+++ emit-rtl.c (working copy)
@@ -846,13 +846,12 @@ gen_reg_rtx (enum machine_mode mode)
   return val;
 }
 
-/* Generate a register with same attributes as REG, but offsetted by OFFSET.
+/* Update NEW with same attributes as REG, but offsetted by OFFSET.
    Do the big endian correction if needed. */
 
-rtx
-gen_rtx_REG_offset (rtx reg, enum machine_mode mode, unsigned int regno, int offset)
+static void
+update_reg_offset (rtx new, rtx reg, int offset)
 {
-  rtx new = gen_rtx_REG (mode, regno);
   tree decl;
   HOST_WIDE_INT var_size;
 
@@ -894,7 +893,7 @@ gen_rtx_REG_offset (rtx reg, enum machin
   if ((BYTES_BIG_ENDIAN || WORDS_BIG_ENDIAN)
       && decl != NULL
       && offset > 0
-      && GET_MODE_SIZE (GET_MODE (reg)) > GET_MODE_SIZE (mode)
+      && GET_MODE_SIZE (GET_MODE (reg)) > GET_MODE_SIZE (GET_MODE (new))
       && ((var_size = int_size_in_bytes (TREE_TYPE (decl))) > 0
           && var_size < GET_MODE_SIZE (GET_MODE (reg))))
     {
@@ -938,6 +937,27 @@ gen_rtx_REG_offset (rtx reg, enum machin
   REG_ATTRS (new) = get_reg_attrs (REG_EXPR (reg), REG_
using libmudflap with non-instrumented shared libraries
Using libmudflap to test a program that uses libxml2, I found that if a program accesses a constant pointer in a non-instrumented library, mudflap thinks that a read violation has occurred. A simple test that illustrates this is:

a.c:
-
char *p = "abc";
-

b.c:
#include <stdio.h>
extern char *p;
int main()
{
  char a = p[0];
  printf("%c\n",a);
  return 0;
}

compile and link with

gcc -shared -fPIC a.c -o liba.so
gcc -fmudflap -lmudflap b.c -la -L. -o b

When b is run, mudflap prints:

*** mudflap violation 1 (check/read): time=1142875338.034838 ptr=0xb7e2a521 size=1
pc=0xb7e34317 location=`b.c:5 (main)'
      /usr/lib/libmudflap.so.0(__mf_check+0x37) [0xb7e34317]
      ./b(main+0x7a) [0x80487f2]
      /usr/lib/libmudflap.so.0(__wrap_main+0x176) [0xb7e34ed6]
number of nearby objects: 0
-

Given how mudflap works, it would be very hard to avoid this false positive. It would be nice if this limitation were documented.

Thanks,
Rafael
Re: New branch 'yara-branch' is created
Paolo Bonzini <[EMAIL PROTECTED]> writes:

> >> The lower-subreg patch that Richard Henderson posted, and that
> >> comes up again and again from time to time, may also help. It does
> >> require a bit of hacking in the MDs (mostly removing the DImode
> >> patterns for logical operations since the middle-end is able to
> >> synthesize them on its own).
> > Thanks for the information. I'll look at this.
>
> Here is an updated patch; the code is also cleaned up a bit to comply
> better with the GCC coding standards.

I'll note that I also have an updated version of this patch, which I have not been posting due to copyright assignment issues. I think my version is a bit better since it recognizes the insn and walks over the operands, rather than walking over the whole pattern.

I have very good results on some test cases. In general, though, for best results on i386 the register allocator needs to be improved to track subreg deaths independently. I've started that work, but not completed it.

If anybody wants to look at my patch, let me know privately. I am hoping that the copyright assignment issue will be resolved Real Soon Now.

Ian
Re: NOPs inserting problem in GCC 4.1.x
"Ling-hua Tseng" <[EMAIL PROTECTED]> writes:

> Because I need to use the feature of `length' attribute (i.e., use
> get_attr_length() in machine description),
> I have to insert NOPs explicitly before performing the pass 58
> (shorten) such that the shorten pass can calculate the length of insns
> exactly.
> Can I direct move the reorg pass to the under of shorten pass by modifying
> the passes.c?

One typical trick is to insert the nops in TARGET_ASM_FUNCTION_PROLOGUE, and then call

    init_insn_lengths ();
    shorten_branches (get_insns ());

to recompute everything. Yes, it's ugly.

Ian
Re: using libmudflap with non-instrumented shared libraries
rafael.espindola wrote:

> [...]
> extern char *p;
> [...]
>    char a = p[0];
> [...]
> compile and link with
> gcc -shared -fPIC a.c -o liba.so
> gcc -fmudflap -lmudflap b.c -la -L. -o b

Did the compiler give you a warning about inability to track the lifetime of "p"? It should have.

- FChE
Re: using libmudflap with non-instrumented shared libraries
> Did the compiler give you a warning about inability to track the
> lifetime of "p"? It should have.

No. Not even with -Wall -O2.

gcc -v:
gcc (GCC) 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)

> - FChE
>

Thanks,
Rafael
Re: iWMMXt/Linux EABI toolchain
Hello again Daniel and all!

I'm still working on building the iWMMXt/EABI/NPTL toolchain. I've got to the stage where I have everything built up to the final linking of glibc. Sadly, despite making a lot of progress and fixing many problems, I am now stuck.

As I previously discovered, the current gcc-trunk will not build glibc with this configuration due to inlining when it should not, so I decided to try 4.1.0. It mostly works, though I had to make a few changes to enable my configuration. I have also ended up patching 4.1.0 with the missing ARM EABI changes from trunk while trying to get everything to work.

My current and hopefully final hurdle is attached. The missing symbols are certainly present in the target libc.a, any idea what I need to fix?

Using built-in specs.
Target: arm-iwmmxt-linux-gnueabi
Configured with: /var/tmp/portage/gcc-4.1.0/work/gcc-4.1.0/configure --prefix=/usr --bindir=/usr/arm-iwmmxt-linux-gnueabi/gcc-bin/4.1.0 --includedir=/usr/lib/gcc/arm-iwmmxt-linux-gnueabi/4.1.0/include --datadir=/usr/share/gcc-data/arm-iwmmxt-linux-gnueabi/4.1.0 --mandir=/usr/share/gcc-data/arm-iwmmxt-linux-gnueabi/4.1.0/man --infodir=/usr/share/gcc-data/arm-iwmmxt-linux-gnueabi/4.1.0/info --with-gxx-include-dir=/usr/lib/gcc/arm-iwmmxt-linux-gnueabi/4.1.0/include/g++-v4 --host=i686-pc-linux-gnu --target=arm-iwmmxt-linux-gnueabi --build=i686-pc-linux-gnu --disable-altivec --with-arch=iwmmxt --with-cpu=iwmmxt --disable-multilib --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --disable-multilib --disable-libmudflap --enable-java-awt=gtk --enable-shared --enable-threads=posix --enable-languages=c,c++,java --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.0

Steve

build.log Description: 988914365-build.log
alias time explosion
I'm not sure when this happened, but I noticed on the weekend that there has been an explosion in the time spent during the alias analysis phase. Using cplusplus-grammer.ii, it used to compile on my machine in about 55 seconds, and it's now up to about 150 seconds.

A quick gprof indicated about 60% of compile time is being spent in bitmap_bit_p, called from compute_may_aliases.

Someone made it WAY slow :-)

Andrew
Re: alias time explosion
On Mar 20, 2006, at 5:18 PM, Andrew MacLeod wrote:

> I'm not sure when this happened, but I noticed on the weekend that there
> has been an explosion in the time spent during the alias analysis phase.
> using cplusplus-grammer.ii, it use to compile on my machine in about 55
> seconds, and its now up to about 150 seconds.
>
> A quick gprof indicated about 60% of compile time is being spent in
> bitmap_bit_p, called from compute_may_aliases.
>
> someone made it WAY slow :-)

Could it be that 2 more passes of may_alias was added?

-- Pinski
Re: FSF Policy re. inclusion of source code from other projects in GCC
>>>>> "Mark" == Mark Mitchell <[EMAIL PROTECTED]> writes:

Mark> The FSF and GCC SC have decided to move fastjar to savannah, and
Mark> stop including it in future GCC releases, which will clarify
Mark> this situation. Will someone please volunteer to migrate
Mark> fastjar out of our repository?

I'll take it out later this week.

I am leaning toward putting it into the rhug repository on sourceware.org, simply because then we (the gcj hackers) won't have to go through some long project registration process. Speak up if you have a particular problem with this.

Tom
Re: FSF Policy re. inclusion of source code from other projects in GCC
Tom Tromey wrote:

> I am leaning toward putting it into the rhug repository on
> sourceware.org, simply because then we (the gcj hackers) won't have to
> go through some long project registration process. Speak up if you
> have a particular problem with this.

Thanks!

I would prefer it go on savannah, which is more clearly unaffiliated with any particular commercial entity. The discussion on the SC list did mention moving it to savannah, but I don't think we specifically said it had to be savannah; we were using that as an example.

I don't think I have any special authority to make this call, though, and I'm not trying to accuse anyone of anything whatsoever, so please don't interpret that request as some kind of dig. Nor do I in any way fail to appreciate Red Hat's support of free software by donating the machine. So, this is definitely in the my-two-cents category.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: Ada subtypes and base types
On Sat, 2006-03-18 at 10:24 +0100, Laurent GUERBY wrote:
> On Fri, 2006-03-17 at 12:51 -0700, Jeffrey A Law wrote:
> > I'm not suggesting the FEs deduce more types and track ranges;
> > that would be rather absurd. What I'm saying is that exposing
> > these types outside the FE is most likely costing you both on
> > the compile-time side and on the run-time side.
>
> About optimization, in most languages with array bounds and
> range checks, the FE will build a structure with bounds
> and code like the following for a typical array loop (sorry
> for poor C):
>
> struct {
>    int low,high; /* no alias */
>    double *data;
> } X;
>
> int first_i=X.low+2;
> int last_i=X.high-3;
> int i;
>
> if (first_i<=last_i) {
>    for(i=first_i;i<=last_i;i++) {
>       if (i<X.low||i>X.high) raise_bound_error(); /* CHECK */
>       do_something(array_with_bounds[i]);
>    }
> }
>
> The interesting thing performance wise would be to be able to remove the
> CHECK in the BE.
>
> Is there some optimization pass able to do that?
>
> And when "+2" and "-3" are replaced by "+x" and "-y" and we
> know through typing that x>=0 and y>=0?

Not sure, mostly because of the structure accesses and my lack of knowledge about the symbolic range support. I know we have symbolic range support, but haven't looked to see how good it really is. What we're doing now is certainly better than what DOM did for symbolic ranges (nothing).

Note that this is closely related to some of the bounds checking elimination we want to support for Java one day IIRC.

Note also that if i, first_i and/or last_i are globals, then the odds of these kind of tests being deleted go down considerably as they're likely call-clobbered by the call to do_something.

Jeff
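The redundancy of the CHECK in the quoted loop can be made concrete with a small, self-contained variant. This is a sketch under stated assumptions and not code from the thread: struct bounded_array, sum_inner and the bound_errors counter are hypothetical names, the storage layout (data[0] holding the element with index low) is assumed, and integer overflow in the index arithmetic is ignored.

```c
#include <stddef.h>

/* Hypothetical array-with-bounds descriptor, modelled on the quoted struct. */
struct bounded_array {
    int low, high;   /* inclusive index bounds */
    double *data;    /* assumed layout: data[0] is the element with index 'low' */
};

static int bound_errors;  /* counts failed checks instead of raising an error */

/* Sum the elements with indices in [low + skip_front, high - skip_back].
   When skip_front >= 0 and skip_back >= 0, every i in that range already
   satisfies low <= i <= high, so the CHECK below can never fire. */
static double sum_inner(const struct bounded_array *x,
                        int skip_front, int skip_back)
{
    double total = 0.0;
    int first_i = x->low + skip_front;
    int last_i  = x->high - skip_back;

    for (int i = first_i; i <= last_i; i++) {
        if (i < x->low || i > x->high) {   /* CHECK */
            bound_errors++;
            continue;
        }
        total += x->data[i - x->low];
    }
    return total;
}
```

Removing the CHECK automatically requires the compiler to derive low <= first_i and last_i <= high from the non-negative offsets, which is the symbolic range reasoning the reply says it is unsure about.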
Re: Multiple errors with GCOV
> I don't see any progress on GCOV, so I assume it's up to me to fix
> these bugs. I'm writing here to cooperate with GCOV developers to
> avoid duplicate work.

There are two gcov maintainers listed in the GCC maintainers file:

gcov    Jan Hubicka       [EMAIL PROTECTED]
gcov    Nathan Sidwell    [EMAIL PROTECTED]

You might like to drop them a mail message specifically (in case they haven't seen your message on the list). Generally speaking, if you find genuine gcov bugs, then your patches will be welcome.

Thanks,
Ben
Re: alias time explosion
On Mon, 2006-03-20 at 18:55 -0500, Andrew Pinski wrote:
> On Mar 20, 2006, at 5:18 PM, Andrew MacLeod wrote:
>
> > I'm not sure when this happened, but I noticed on the weekend that
> > there has been an explosion in the time spent during the alias
> > analysis phase.
> > using cplusplus-grammer.ii, it use to compile on my machine in about 55
> > seconds, and its now up to about 150 seconds.
> >
> > A quick gprof indicated about 60% of compile time is being spent in
> > bitmap_bit_p, called from compute_may_aliases.
> >
> > someone made it WAY slow :-)
>
> Could it be that 2 more passes of may_alias was added?

I don't think so. I would expect that to double or triple the time spent in alias analysis, not the entire compile time. This is a 200% increase in compiler time... going from 50 seconds to 150 is pretty significant. And since it's almost all in bitmap_bit_p, it sounds algorithmic to me...

btw, this was on a 3.0 GHz P4 running linux with a checkout from mainline this morning built with no checking...

Doing a quick check back, on 01/23 shows similar time (71% of compiler time spent in alias analysis, 97 seconds out of 135). The previous compiler to that which I have lying around is 10/30/05, and it shows a much more sensible 6.32 seconds in alias analysis. It looks like sometime between 10/30 and 01/23 alias analysis got out of hand. Odd it hasn't been noted before.

Andrew
Aliasing sets on Arrays Types
Take the following C code:

typedef long atype[];
typedef long atype1[];

int NumSift (atype *a, atype1 *a1)
{
  (*a)[0] = 0;
  (*a1)[0] = 1;
  return (*a)[0];
}

Shouldn't the aliasing set for the type atype be the same as atype1? In NumSift, shouldn't the store to (*a1)[0] interfere with (*a)[0] so that we don't return 0 always?

Here is a full testcase for testing (I don't get any warnings with -W -Wall -pedantic):

typedef long atype[];
typedef long atype1[];

int NumSift (atype *a, atype1 *a1)
{
  (*a)[0] = 0;
  (*a1)[0] = 1;
  return (*a)[0];
}

int main(void)
{
  long a[2];
  if (!NumSift(&a, &a))
    __builtin_abort ();
  return 0;
}

And this is a regression from 3.4.0 if this is a bug. Also note this was generated from looking at Daniel Berlin's Array Reference for Pointers patch.

Thanks,
Andrew Pinski