Re: Research Region Based Memory Management for Imperative Languages
We have had a long-term plan (which has not come to fruition so far) of implementing a static analysis for improving garbage collection. Our paper in TOPLAS (http://portal.acm.org/citation.cfm?id=1290521) describes our early work. The main bottleneck for our purpose is a good pointer analysis, and we have shifted our focus to that. Nothing shareable yet on that front. It may still be worthwhile to implement it using the existing pointer analysis, but I don't have the bandwidth for it. If someone wants to explore improving dynamic allocation, this may be a good beginning. I will be very happy to provide information. Uday Khedker. Matt Davis wrote, On Friday 27 August 2010 06:39 AM: Hello, I am just trying to settle on my PhD Computer Science dissertation topic. I want something low-level, compiler-related, and above all useful and practical. I am considering region-based memory management, to show memory efficiency and safety. For imperative languages such as C, this is rather difficult from static analysis alone (e.g. because of aliasing and weak typing). However, I do believe region-based management is possible. If I were to take something of this nature on for my topic, would it be valuable research, and is it even worth the effort? I am by no means any kind of compiler guru, and figured you all might know best. The other option would be to implement such concepts in a research language, which can still be interesting, but I'm not sure how practical it would be. -Matt
Gengtype : strange code in output_type_enum
Hello all, While hacking on gengtype with Basile, we noticed a strange piece of code at line 2539 in gcc/gengtype.c r162692:

static void
output_type_enum (outf_p of, type_p s)
{
  if (s->kind == TYPE_PARAM_STRUCT && s->u.s.line.file != NULL) /* Strange code @@*/
    {
      oprintf (of, ", gt_e_");
      output_mangled_typename (of, s);
    }
  else if (UNION_OR_STRUCT_P (s) && s->u.s.line.file != NULL)
    {
      oprintf (of, ", gt_ggc_e_");
      output_mangled_typename (of, s);
    }
  else
    oprintf (of, ", gt_types_enum_last");
}

We think that the enum type_kind discriminates the fields of the union in struct type. So for TYPE_PARAM_STRUCT we believe that the param_struct field of union u inside struct type is used. If this is true, the test s->u.s.line.file != NULL is meaningless when s->kind == TYPE_PARAM_STRUCT; in our opinion it should be s->u.param_struct.line.file != NULL instead. However, the existing code appears to work, and we don't understand why. Or can a type have kind TYPE_PARAM_STRUCT and only have s->u.s valid? It might be related to the code in new_structure near line 638 of gengtype.c which sets ls->kind = TYPE_LANG_STRUCT. Perhaps TYPE_PARAM_STRUCT has two different roles. If that is indeed the case, we have to distinguish them when serializing gengtype's state. Cheers. -- Jeremie Salvucci & Basile Starynkevitch
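The concern in this message can be illustrated with a deliberately simplified tagged union. The names below (demo_type, demo_kind, demo_fileloc, demo_type_file) are hypothetical and are not gengtype's real definitions; this is only a sketch of the pattern in which a kind field selects the valid union member, and a reader of line.file goes through the member that matches the kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT gengtype's actual types.  */
struct demo_fileloc { const char *file; int line; };

enum demo_kind { DEMO_STRUCT, DEMO_PARAM_STRUCT };

struct demo_type
{
  enum demo_kind kind;   /* discriminant: says which member of u is valid */
  union
  {
    struct { const char *tag; struct demo_fileloc line; } s;
    struct { struct demo_type *stru; struct demo_fileloc line; } param_struct;
  } u;
};

/* Return the file recorded for T, reading only the union member
   that T->kind declares valid.  */
const char *
demo_type_file (const struct demo_type *t)
{
  assert (t != NULL);
  switch (t->kind)
    {
    case DEMO_STRUCT:
      return t->u.s.line.file;
    case DEMO_PARAM_STRUCT:
      return t->u.param_struct.line.file;
    }
  return NULL;
}
```

Reading u.s.line.file from a value whose kind is DEMO_PARAM_STRUCT would compile, but it would interpret whatever bytes the param_struct member happens to place at that offset, which is the situation the message describes.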
Re: Gengtype : strange code in output_type_enum
2010/8/27 : > We think that the enum type_kind discriminates fields union in struct type. > So for TYPE_PARAM_STRUCT we believe that > the param_struct field of union u inside struct type is used. If this is > true, the test s->u.s.line.file != NULL is meaningless when s->kind == > TYPE_PARAM_STRUCT, it should be s->u.param_struct.line.file != NULL instead > in our opinion. > > > Or can a type have a kind TYPE_PARAM_STRUCT and only have s->u.s valid? It > might be related to the code in new_structure near line 638 of gengtype.c > which sets ls->kind = TYPE_LANG_STRUCT. > > Perhaps TYPE_PARAM_STRUCT has two different roles. If that is indeed the > case, we have to distinguish them when serializing gengtype's state. I don't have time to investigate this right now to come up with an answer, but did you try producing gengtype debugging dump and looking there for structs that have these combinations of properties? Especially since - > However, the existing code appears to work but we don't understand why. Cheers, -- Laurynas
Re: Gengtype : strange code in output_type_enum
"Or can a type have a kind TYPE_PARAM_STRUCT and only have s->u.s valid? It might be related to the code in new_structure near line 638 of gengtype.c which sets ls->kind = TYPE_LANG_STRUCT." Forget about this sentence, Basile messed up TYPE_PARAM_STRUCT & TYPE_LANG_STRUCT (and is typing this). Cheers -- Jeremie Salvucci & Basile Starynkevitch
Clustering switch cases
Hi, I have been analysing the GCC 4.4 code because of the way it handles:

extern void f(const char *);
extern void g(int);

#define C(n) case n: f(#n); break

void g(int n)
{
  switch(n)
  {
    C(0); C(1); C(2); C(3); C(4); C(5); C(6); C(7); C(8); C(9);
    C(10); C(11); C(12); C(13); C(14); C(15); C(16); C(17); C(18); C(19);
    C(20); C(21); C(22); C(23); C(24); C(25); C(26); C(27); C(28); C(29);

    C(1000); C(1001); C(1002); C(1003); C(1004); C(1005); C(1006); C(1007); C(1008); C(1009);
  }
}

The interesting thing about this is that GCC generates much better code if I do:

extern void f(const char *);
extern void g(int);

#define C(n) case n: f(#n); break

void g(int n)
{
  switch(n)
  {
    C(0); C(1); C(2); C(3); C(4); C(5); C(6); C(7); C(8); C(9);
    C(10); C(11); C(12); C(13); C(14); C(15); C(16); C(17); C(18); C(19);
    C(20); C(21); C(22); C(23); C(24); C(25); C(26); C(27); C(28); C(29);
  }
  switch(n)
  {
    C(1000); C(1001); C(1002); C(1003); C(1004); C(1005); C(1006); C(1007); C(1008); C(1009);
  }
}

In the first case, it generates a binary tree, and in the second, two jump tables. The jump-table solution is much more elegant (at least in our situation), generating less code and being faster. Now, what I am wondering is why GCC doesn't try to cluster the cases by looking for clusters of contiguous values in the switch. If there is no specific reason, then I would implement such a pass, which would split switches according to value clustering before expansion, since I think it would be a good code improvement. Currently GCC seems to use a jump table only if the range of the switch is not much bigger than its count, which works well in most cases except when you have big switches with clusters of contiguous values (like the first example I sent). Any comments on this would be appreciated. -- PMatos
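A rough way to see why the combined switch misses the jump-table path is to compare the span of the case values with the number of cases. The sketch below is not GCC's code: the function name demo_dense_enough and the max_ratio parameter are assumptions made for illustration, and the exact threshold GCC applies is not stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical density test: treat a jump table as worthwhile when the
   span of case values is at most max_ratio times the number of cases.  */
bool
demo_dense_enough (long min_case, long max_case, long count, long max_ratio)
{
  assert (count > 0 && max_case >= min_case && max_ratio > 0);
  long range = max_case - min_case + 1;   /* number of table slots needed */
  return range <= max_ratio * count;
}
```

With an assumed ratio of 10, the 40 cases spanning 0..1009 fail the test (1010 slots against a limit of 400), while the clusters 0..29 and 1000..1009 each pass on their own.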
Better performance on older version of GCC
Hello all, I have two computers with two different versions of GCC. Otherwise the two systems have identical hardware. I have a processor- and memory-intensive benchmark program which I compile on both systems, and I cannot understand why the system with the older GCC version produces faster code.
System A has GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)"
System B has GCC version "4.3.0 20080428 (Red Hat 4.3.0-8)"
I find that the executable compiled on system A runs faster (on both systems) than the executable compiled on system B (on both systems), by a factor of approximately 4. I have attempted to play with the GCC optimizer flags and have not been able to get System B (with the later GCC version) to compile code with any better performance. Could someone please help figure this out? Below is the GCC command I run on System A followed by the verbose output:
gcc -v -Wall -DOFFLINE_WEIGHTS -DDOUBLEP -g bfbenchmark_threaded.c -lm -lrt -lpthread -O3 -o bfbenchmark_threaded
---BEGIN OUTPUT-
Using built-in specs.
Target: i386-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c ++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=i386-redhat-linux Thread model: posix gcc version 4.1.2 20070925 (Red Hat 4.1.2-33) /usr/libexec/gcc/i386-redhat-linux/4.1.2/cc1 -quiet -v -DOFFLINE_WEIGHTS -DDOUBLEP bfbenchmark_threaded.c -quiet -dumpbase bfbenchmark_threaded.c -mtune=generic -auxbase bfbenchmark_threaded -g -O3 -Wall -version -o /tmp/ccvxPCd0.s ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../i386-redhat-linux/include" #include "..." search starts here: #include <...> search starts here: /usr/local/include /usr/lib/gcc/i386-redhat-linux/4.1.2/include /usr/include End of search list. GNU C version 4.1.2 20070925 (Red Hat 4.1.2-33) (i386-redhat-linux) compiled by GNU C version 4.1.2 20070925 (Red Hat 4.1.2-33). 
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: ab322ce5b87a7c6c23d60970ec7b7b31 as -V -Qy -o /tmp/ccU8kZL1.o /tmp/ccvxPCd0.s GNU assembler version 2.17.50.0.18 (i386-redhat-linux) using BFD version version 2.17.50.0.18-1 20070731 /usr/libexec/gcc/i386-redhat-linux/4.1.2/collect2 --eh-frame-hdr --build-id -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -o bfbenchmark_threaded /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.1.2/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.1.2 -L/usr/lib/gcc/i386-redhat-linux/4.1.2 -L/usr/lib/gcc/i386-redhat-linux/4.1.2/../../.. /tmp/ccU8kZL1.o -lm -lrt -lpthread -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i386-redhat-linux/4.1.2/crtend.o /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crtn.o
---END OUTPUT-
Below is the GCC command I run on System B followed by the verbose output:
gcc -v -Wall -DOFFLINE_WEIGHTS -DDOUBLEP -g bfbenchmark_threaded.c -lm -lrt -lpthread -O3 -o bfbenchmark_threaded
---BEGIN OUTPUT-
Using built-in specs.
Target: i386-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux Thread model: posix gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC) COLLECT_GCC_OPTIONS='-v' '-Wall' '-DOFFLINE_WEIGHTS' '-DDOUBLEP' '-g' '-O3' '-o' 'bfbenchmark_threaded' '-mtune=generic' /usr/libexec/gcc/i386-redhat-linux/4.3.0/cc1 -quiet -v -DOFFLINE_WEIGHTS -DDOUBLEP bfbenchmark_threaded.c -quiet -dumpbase bfbenchmark_threaded.c -mtune=generic -auxbase bfbenchmark_threaded -g -O3 -Wall -version -o /tmp/ccB4B5PI.s ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.3.0/include-fixed" ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.3.0/../../../../i386-redhat-linux/include" #include "..." sear
Re: Better performance on older version of GCC
On Fri, Aug 27, 2010 at 6:44 AM, Corey Kasten wrote: > Hello all, > > I have two computers with two different versions of GCC. Otherwise the > two systems have identical hardware. I have a processor and memory > intensive benchmark program which I compile on both systems and I cannot > understand why the system with older GCC version compiles faster code. > > System A has GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" > System B has GCC version "4.3.0 20080428 (Red Hat 4.3.0-8)" > > I find that the executable compiled on system A runs faster (on both > systems) than the executable compiled on system B (on both system), by a > factor about approximately 4 times. I have attempted to play with the > GCC optimizer flags and have not been able to get System B (with the > later GCC version) to compile code with any better performance. Could > someone please help figure this out? > Can you try gcc 4.5.1? -- H.J.
Re: Better performance on older version of GCC
On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote: > I find that the executable compiled on system A runs faster (on both > systems) than the executable compiled on system B (on both system), by a > factor about approximately 4 times. I have attempted to play with the > GCC optimizer flags and have not been able to get System B (with the > later GCC version) to compile code with any better performance. Could > someone please help figure this out? It's almost impossible to tell what's going on without an actual testcase. You might not be able to provide the actual code, but you could try distilling it down to something you could release. -Nathan
Re: Gengtype : strange code in output_type_enum
We recompiled GCC trunk r162692 with the following modification. In function output_type_enum of gcc/gengtype.c, we replaced

-  if (s->kind == TYPE_PARAM_STRUCT && s->u.s.line.file != NULL)
+  if (s->kind == TYPE_PARAM_STRUCT && s->u.param_struct.line.file != NULL)

and gengtype works as before with c, c++ and lto enabled. Do you think we should submit a one-line patch (and if so, could it be reviewed quickly)? We don't know why the old version works, and we think writing u.s.line.file is incorrect for TYPE_PARAM_STRUCT (even if it happens to work by luck), since the param_struct member of union u is the only valid one for TYPE_PARAM_STRUCT. -- Jeremie Salvucci & Basile Starynkevitch
Re: Gengtype : strange code in output_type_enum
> In function output_type_enum of gcc/gengtype.c, we replaced > > - if (s->kind == TYPE_PARAM_STRUCT && s->u.s.line.file != NULL) > + if (s->kind == TYPE_PARAM_STRUCT && s->u.param_struct.line.file != > NULL) > > And Gengtype works like before with c,c++, lto enabled. > > Do you think we have to submit a one line patch (if yes, could it be reviewed
Sure, one-line patches are actually welcome since they are well isolated and easy to review, as opposed to large patches containing unrelated stuff, which have basically zero chance of getting accepted or reviewed (other than "please break your patch into multiple pieces"). Arno
Re: Gengtype : strange code in output_type_enum
2010/8/27 : > We recompiled GCC-trunk r162692 with the following modification : > > In function output_type_enum of gcc/gengtype.c, we replaced > > - if (s->kind == TYPE_PARAM_STRUCT && s->u.s.line.file != NULL) > + if (s->kind == TYPE_PARAM_STRUCT && s->u.param_struct.line.file != NULL) > > And Gengtype works like before with c,c++, lto enabled. > > Do you think we have to submit a one line patch (if yes, could it be reviewed > quickly)? We don't know why the old version works, and we think writing > u.s.line.file is incorrect for TYPE_PARAM_STRUCT (even if it happens to work > by luck), since the union u.param_struct member is the only valid for > TYPE_PARAM_STRUCT. One-line patches are welcome, but in this instance could you please find out how the old code worked before changing it (as you admit, you don't understand it). -- Laurynas
Re: Gengtype : strange code in output_type_enum
jeremie.salvu...@free.fr writes: > While hacking on gengtype with Basile, we noticed a strange piece of code at > line 2539 in gcc/gengtype.c r162692 > > static void > output_type_enum (outf_p of, type_p s) > { > if (s->kind == TYPE_PARAM_STRUCT && s->u.s.line.file != NULL) /* Strange > code @@*/ > { > oprintf (of, ", gt_e_"); > output_mangled_typename (of, s); > } > else if (UNION_OR_STRUCT_P (s) && s->u.s.line.file != NULL) > { > oprintf (of, ", gt_ggc_e_"); > output_mangled_typename (of, s); > } > else > oprintf (of, ", gt_types_enum_last"); > } > > We think that the enum type_kind discriminates fields union in struct type. > So for TYPE_PARAM_STRUCT we believe that > the param_struct field of union u inside struct type is used. If this is > true, the test s->u.s.line.file != NULL is meaningless when s->kind == > TYPE_PARAM_STRUCT, it should be s->u.param_struct.line.file != NULL instead > in our opinion. I agree that this is wrong. > However, the existing code appears to work but we don't understand why. That one is fairly easy. If you look at the generated code, you will see that those values are only used to pass to gt_pch_note_object. From there they will eventually be passed to either ggc_pch_count_object or ggc_pch_alloc_object. The default page allocator ignores this type. The zone allocator does use the type, but nobody uses that allocator. And even if you do use the zone allocator, it will work correctly if perhaps suboptimally as long as it always gets the same type for a given struct, which I believe will happen. You should send in a tested patch to fix that problem (and nothing else). Ian
Re: Clustering switch cases
"Paulo J. Matos" writes: > In the first case, it generates a binary tree, and in the second two > jump tables. The jump tables solution is much more elegant (at least > in our situation), generating less code and being faster. > Now, what I am wondering is the reason why GCC doesn't try to cluster > the cases trying to find for clusters of contiguous values in the > switch. > > If there is no specific reason then I would implement such pass, which > would before expansion split switches according to value clustering, > since I find it would be a good code improvement. > > Currently GCC seems to only use jump table is the range of the switch > is not much bigger than its count, which works well in most cases > except when you have big switches with clusters of contiguous values > (like the first example I sent). I don't know of any specific reason not to look for clusters of switch cases. The main issue would be the affect on compilation time. If you can do it with an algorithm which is linear in the number of cases, then I think it would be an acceptable optimization. Ian
Re: Better performance on older version of GCC
On Fri, 2010-08-27 at 06:50 -0700, Nathan Froyd wrote: > On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote: > > I find that the executable compiled on system A runs faster (on both > > systems) than the executable compiled on system B (on both system), by a > > factor about approximately 4 times. I have attempted to play with the > > GCC optimizer flags and have not been able to get System B (with the > > later GCC version) to compile code with any better performance. Could > > someone please help figure this out? > > It's almost impossible to tell what's going on without an actual > testcase. You might not be able to provide the actual code, but you > could try distilling it down to something you could release. > > -Nathan
Thanks for the reply, Nathan. I have attached an archive with the test case code. The code is built by build.sh and outputs the number of microseconds taken to complete the processing. Compiling with GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" produces code that runs in about 66% of the time taken by the code produced by GCC version "4.3.0 20080428 (Red Hat 4.3.0-8)". Thanks, Corey
testbenchmark.100827.1050.tgz Description: application/compressed-tar
Re: Clustering switch cases
On Fri, Aug 27, 2010 at 3:47 PM, Ian Lance Taylor wrote: > > I don't know of any specific reason not to look for clusters of switch > cases. The main issue would be the affect on compilation time. If you > can do it with an algorithm which is linear in the number of cases, then > I think it would be an acceptable optimization. > Thanks. I will be working on it. I will let you know how it goes. Cheers, -- PMatos
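One way to meet the linear-time requirement quoted above is a single greedy pass over the sorted case values. The sketch below is only an assumption about how such a pass might look, not the implementation that was eventually written: demo_cluster, demo_cluster_cases and the max_gap threshold are hypothetical names, and the input is assumed to be sorted, distinct, and free of differences that overflow long.

```c
#include <assert.h>
#include <stddef.h>

/* One cluster: inclusive index range into the sorted case array.  */
struct demo_cluster { size_t first; size_t last; };

/* Split sorted, distinct case values into clusters in one pass.
   A new cluster starts whenever the gap to the previous value
   exceeds max_gap.  OUT must have room for N entries.
   Returns the number of clusters written.  */
size_t
demo_cluster_cases (const long *vals, size_t n, long max_gap,
                    struct demo_cluster *out)
{
  assert (max_gap >= 1);
  if (n == 0)
    return 0;

  size_t k = 0;
  out[0].first = 0;
  for (size_t i = 1; i < n; i++)
    {
      assert (vals[i] > vals[i - 1]);   /* sorted and distinct */
      if (vals[i] - vals[i - 1] > max_gap)
        {
          out[k].last = i - 1;          /* close the current cluster */
          k++;
          out[k].first = i;             /* open the next one */
        }
    }
  out[k].last = n - 1;
  return k + 1;
}
```

For the 40 case values of the first example (0..29 and 1000..1009) and any small max_gap, this yields two clusters, each of which could then be considered for a jump table separately.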
Re: Clustering switch cases
On Fri, Aug 27, 2010 at 4:47 PM, Ian Lance Taylor wrote: > "Paulo J. Matos" writes: > >> In the first case, it generates a binary tree, and in the second two >> jump tables. The jump tables solution is much more elegant (at least >> in our situation), generating less code and being faster. >> Now, what I am wondering is the reason why GCC doesn't try to cluster >> the cases trying to find for clusters of contiguous values in the >> switch. >> >> If there is no specific reason then I would implement such pass, which >> would before expansion split switches according to value clustering, >> since I find it would be a good code improvement. >> >> Currently GCC seems to only use jump table is the range of the switch >> is not much bigger than its count, which works well in most cases >> except when you have big switches with clusters of contiguous values >> (like the first example I sent). > > I don't know of any specific reason not to look for clusters of switch > cases. The main issue would be the affect on compilation time. If you > can do it with an algorithm which is linear in the number of cases, then > I think it would be an acceptable optimization. In fact we might want to move switch optimization up to the tree level (just because it's way easier to deal with there). Thus, lower switch to a mixture of binary tree & jump-tables (possibly using perfect hashing). Richard.
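The mixture of a decision tree and jump tables mentioned here can be pictured for the example switch as one comparison that selects a cluster, followed by a bounds check and a subtraction that give the table slot. This is a hand-written illustration of the shape of such lowered code, not what GCC emits; demo_case_slot and the slot numbering are hypothetical.

```c
#include <assert.h>

/* Hypothetical lowering of the example switch.  Returns a slot in
   0..39 (an index into a combined handler table), or -1 when n
   matches no case.  */
int
demo_case_slot (int n)
{
  int slot;
  if (n >= 1000)                                   /* tree node between clusters */
    slot = (n <= 1009) ? 30 + (n - 1000) : -1;     /* second table: slots 30..39 */
  else
    slot = (n >= 0 && n <= 29) ? n : -1;           /* first table: slots 0..29 */
  assert (slot >= -1 && slot < 40);
  return slot;
}
```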
Re: Better performance on older version of GCC
On Fri, Aug 27, 2010 at 5:02 PM, Corey Kasten wrote: > On Fri, 2010-08-27 at 06:50 -0700, Nathan Froyd wrote: >> On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote: >> > I find that the executable compiled on system A runs faster (on both >> > systems) than the executable compiled on system B (on both system), by a >> > factor about approximately 4 times. I have attempted to play with the >> > GCC optimizer flags and have not been able to get System B (with the >> > later GCC version) to compile code with any better performance. Could >> > someone please help figure this out? >> >> It's almost impossible to tell what's going on without an actual >> testcase. You might not be able to provide the actual code, but you >> could try distilling it down to something you could release. >> >> -Nathan > > Thanks for the reply Nathan. > > I have attached an archive with the test case code. The code is built by > build.sh and outputs the number of microseconds to complete the > processing. > > Compiling with GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" produces > code that runs in about 66% of the time than does GCC version "4.3.0 > 20080428 (Red Hat 4.3.0-8)" -fcx-limited-range or -fcx-fortran-rules. 4.3 now is more conforming than 4.1. Richard. > Thanks > > Corey >
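The flags suggested here concern complex arithmetic. As the GCC manual describes it, -fcx-limited-range states that no range reduction step is needed for complex division and that the results of complex multiplication and division are not checked for NaN + I*NaN. As a rough illustration only (this is not code from the benchmark, and demo_cmul_limited is a hypothetical name), the function below spells out the plain textbook product that such a relaxed mode corresponds to; on finite inputs it agrees with the built-in * operator.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* Textbook complex product (a+ib)(c+id) = (ac-bd) + i(ad+bc), with no
   recovery step for results that come out as NaN + I*NaN.
   Intended for finite inputs only.  */
double complex
demo_cmul_limited (double complex x, double complex y)
{
  double a = creal (x), b = cimag (x);
  double c = creal (y), d = cimag (y);
  assert (isfinite (a) && isfinite (b) && isfinite (c) && isfinite (d));
  return (a * c - b * d) + (a * d + b * c) * I;
}
```

To check whether this is what separates two compilers on a given program, one option is to build the same source with and without -fcx-limited-range and compare timings, as was done in this thread.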
specs and X-s and canadian X-s.
When doing native bootstraps things are nice and easy: the target spec definitions are also appropriate for the host.
===
I've been doing some Canadian crosses (specifically darwin 9 => darwin 7), where B = build, H = host and T = target. When doing B == H != T, the specs might need to be different from B != H == T (or B == H == T). In effect, the specs for the linker are used to generate code for the 'target' on the 'host'. The problem comes when the B host needs different specs from the native case.
So when doing B == H != T (first cross, to build a compiler capable of making T code on B), we need specs that B understands. (This all works quite easily using --with-sysroot= etc., and needs to use the right specs to allow generation on the B system.) [A specific case is inserting a path spec to point to the sysroot, which the B linker understands.]
When doing B != H == T (second cross, to make the compiler for T hosted on T), we need to generate native specs that T understands. [In my example the path spec causes the T == H linker to fail.]
===
Is it legitimate to wrap these circumstances with #ifndef CROSS_DIRECTORY_STRUCTURE ... #endif in the target headers? (Or maybe there's a flag somewhere that indicates H == T?) I recognize that this is changing the target headers depending on the host, but at the moment I can't see how else to do it; the specs say "do 'this' to generate correct code for the target", but "this" might well be host-dependent. It doesn't seem to belong in config/mh-* or gcc/config/x-*. Any insight much appreciated. Iain
Re: Better performance on older version of GCC
On Fri, 2010-08-27 at 17:09 +0200, Richard Guenther wrote: > On Fri, Aug 27, 2010 at 5:02 PM, Corey Kasten > wrote: > > On Fri, 2010-08-27 at 06:50 -0700, Nathan Froyd wrote: > >> On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote: > >> > I find that the executable compiled on system A runs faster (on both > >> > systems) than the executable compiled on system B (on both system), by a > >> > factor about approximately 4 times. I have attempted to play with the > >> > GCC optimizer flags and have not been able to get System B (with the > >> > later GCC version) to compile code with any better performance. Could > >> > someone please help figure this out? > >> > >> It's almost impossible to tell what's going on without an actual > >> testcase. You might not be able to provide the actual code, but you > >> could try distilling it down to something you could release. > >> > >> -Nathan > > > > Thanks for the reply Nathan. > > > > I have attached an archive with the test case code. The code is built by > > build.sh and outputs the number of microseconds to complete the > > processing. > > > > Compiling with GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" produces > > code that runs in about 66% of the time than does GCC version "4.3.0 > > 20080428 (Red Hat 4.3.0-8)" > > -fcx-limited-range or -fcx-fortran-rules. 4.3 now is more conforming than > 4.1. > > Richard. > > > Thanks > > > > Corey > >
Richard, -fcx-limited-range worked great on both my real benchmark and my test archive. GCC didn't recognize -fcx-fortran-rules, but obviously I don't need it. Thanks so much, Corey
Errors when invoking refs_may_alias_p_1
Hi all, I have instrumented a function call like foo(&a,&b) into the GIMPLE SSA representation (gcc-4.5), and the subsequent optimization passes fail on my instrumented code. The backtrace is as follows. The error occurred when the dse pass tried to test whether the call I inserted may use a memory reference. It is because the argument &a is not an SSA_VAR or INDIRECT_REF, so the assert in function

bool
refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)

  gcc_assert ((!ref1->ref
               || SSA_VAR_P (ref1->ref)
               || handled_component_p (ref1->ref)
               || INDIRECT_REF_P (ref1->ref)
               || TREE_CODE (ref1->ref) == TARGET_MEM_REF
               || TREE_CODE (ref1->ref) == CONST_DECL)
              && (!ref2->ref
                  || SSA_VAR_P (ref2->ref)
                  || handled_component_p (ref2->ref)
                  || INDIRECT_REF_P (ref2->ref)
                  || TREE_CODE (ref2->ref) == TARGET_MEM_REF
                  || TREE_CODE (ref2->ref) == CONST_DECL));

was violated. Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any actions needed to maintain the consistency of GIMPLE SSA?
#0  0x76fc8ee0 in exit () from /lib/libc.so.6
#1  0x005ae4ce in diagnostic_action_after_output (context=0x1323880, diagnostic=0x7fffd870) at ../../src/gcc/diagnostic.c:198
#2  0x005aed54 in diagnostic_report_diagnostic (context=0x1323880, diagnostic=0x7fffd870) at ../../src/gcc/diagnostic.c:424
#3  0x005afdc3 in internal_error (gmsgid=0xddfb57 "in %s, at %s:%d") at ../../src/gcc/diagnostic.c:709
#4  0x005aff4f in fancy_abort (file=0xe42670 "../../src/gcc/tree-ssa-alias.c", line=786, function=0xe427e0 "refs_may_alias_p_1") at ../../src/gcc/diagnostic.c:763
#5  0x008a1adb in refs_may_alias_p_1 (ref1=0x7fffdab0, ref2=0x7fffdb50, tbaa_p=1 '\001') at ../../src/gcc/tree-ssa-alias.c:775
#6  0x008a2b12 in ref_maybe_used_by_call_p_1 (call=0x76790630, ref=0x7fffdb50) at ../../src/gcc/tree-ssa-alias.c:1133
#7  0x008a2d2e in ref_maybe_used_by_call_p (call=0x76790630, ref=0x76848048) at ../../src/gcc/tree-ssa-alias.c:1147
#8  0x008a2dfa in ref_maybe_used_by_stmt_p (stmt=0x76790630, ref=0x76848048) at ../../src/gcc/tree-ssa-alias.c:1179
#9  0x008bf275 in dse_possible_dead_store_p (stmt=0x7683e820, use_stmt=0x7fffdca8) at ../../src/gcc/tree-ssa-dse.c:212
#10 0x008bfeb9 in dse_optimize_stmt (dse_gd=0x7fffddd0, bd=0x156bd30, gsi=...) at ../../src/gcc/tree-ssa-dse.c:297
#11 0x008c029d in dse_enter_block (walk_data=0x7fffdde0, bb=0x76a75068) at ../../src/gcc/tree-ssa-dse.c:370
#12 0x00cc26a5 in walk_dominator_tree (walk_data=0x7fffdde0, bb=0x76a75068) at ../../src/gcc/domwalk.c:185
#13 0x008c0812 in tree_ssa_dse () at ../../src/gcc/tree-ssa-dse.c:430
#14 0x0073af0a in execute_one_pass (pass=0x13cced0) at ../../src/gcc/passes.c:1572
#15 0x0073b21a in execute_pass_list (pass=0x13cced0) at ../../src/gcc/passes.c:1627
#16 0x0073b238 in execute_pass_list (pass=0x1312720) at ../../src/gcc/passes.c:1628
#17 0x0086e372 in tree_rest_of_compilation (fndecl=0x76b93500) at ../../src/gcc/tree-optimize.c:413
#18 0x009fa7c5 in cgraph_expand_function (node=0x76be7000) at ../../src/gcc/cgraphunit.c:1548
#19 0x009faa49 in cgraph_expand_all_functions () at ../../src/gcc/cgraphunit.c:1627
#20 0x009fb07e in cgraph_optimize () at ../../src/gcc/cgraphunit.c:1875
#21 0x009f9461 in cgraph_finalize_compilation_unit () at ../../src/gcc/cgraphunit.c:1096
#22 0x004a9e93 in c_write_global_declarations () at ../../src/gcc/c-decl.c:9519
#23 0x008180d4 in compile_file () at ../../src/gcc/toplev.c:1065
#24 0x0081a1c5 in do_compile () at ../../src/gcc/toplev.c:2417
#25 0x0081a286 in toplev_main (argc=21, argv=0x7fffe0f8) at ../../src/gcc/toplev.c:2459
#26 0x00519c6b in main (argc=21, argv=0x7fffe0f8) at ../../src/gcc/main.c:35

Thanks,
Hongtao
Purdue University
RE: Tutorial Proposal for GCC Summit
Hello Prof. Khedker,

Your tutorial would be very useful. I am trying my best to attend the Summit, and this tutorial is an important motivator.

Thank you.

Sincerely,
Anmol P. Paralkar

> -----Original Message-----
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Uday Khedker
> Sent: Wednesday, August 18, 2010 11:50 AM
> To: gcc@gcc.gnu.org
> Subject: Tutorial Proposal for GCC Summit
>
> Dear Friends,
>
> I have submitted a tutorial proposal for GCC Summit. This email is to get an idea of whether this tutorial will be of interest to a sufficient number of people. We have been conducting expanded versions of this tutorial in India for the past few years and it seems to generate some interest. Some friends in the US suggested that we should hold some version outside of India, hence I have sent in a proposal to GCC Summit.
>
> Looking forward to getting some feedback. I would also be happy to modify the tutorial based on the feedback.
>
> Thanks and regards,
>
> Uday.
> --
> Dr. Uday Khedker
> Professor
> Department of Computer Science & Engg.
> IIT Bombay, Powai, Mumbai 400 076, India.
> email : u...@cse.iitb.ac.in
> homepage: http://www.cse.iitb.ac.in/~uday
> phone : Office - 91 (22) 2576 7717
>         Res. - 91 (22) 2576 8717, 91 (22) 2572 0288
> --
>
> Tutorial on Essential Abstractions in GCC
> -----------------------------------------
>
> Motivation:
> -----------
>
> Most explanations of GCC which are publicly available describe many details and tend to be heavy on information rather than insights. As a consequence, a clear and crisp explanation of the journey from a given machine description to an actual run of a compiler generated from that machine description is not available.
>
> In this tutorial we describe some carefully chosen abstractions that help one to understand the retargetability mechanism and the architecture of the compiler generation framework of GCC and relate it to a generated compiler.
>
> Coverage
> --------
>
> The default duration of this tutorial is one day and it covers the following topics:
>
> - Meeting the challenge of understanding GCC.
> - The architecture of GCC.
> - Basic concepts in GCC configuration and building.
> - The structure of a GCC generated compiler.
> - Plugins structure of GCC.
> - First level graybox probing of the compilation sequence of a GCC generated compiler.
> - Graybox probing for machine independent optimizations.
> - Graybox probing for parallelization and vectorization.
> - Adding Gimple and RTL passes to GCC.
> - The retargetability and instruction selection mechanism of GCC.
> - Designing and understanding GCC machine descriptions.
> - The abstractions in GCC machine descriptions and their influence on a compiler generated from them.
> - The design and implementation of gdfa (generic data flow analyzer) for GCC.
>
> We have held several tutorials and workshops along these lines in the past few years in India and now would like to reach out to a larger audience.
>
> Our main source of material is the Workshop on Essential Abstractions in GCC held at IIT Bombay from 5th July to 8th July 2010 (http://www.cse.iitb.ac.in/grc/gcc-workshop-10).
>
> Target Audience
> ---------------
>
> People interested in using GCC for their research as well as people interested in contributing to GCC will benefit a lot from this tutorial. It is expected that this tutorial will bring down the ramp up period of novices to GCC from several frustrating months to a few stimulating weeks. This tutorial will also be useful for people who are interested in relating class room concepts of compilation to a large scale practical compiler which is widely used.
>
> --
> Dr. Uday Khedker
> Professor
> Department of Computer Science & Engg.
> IIT Bombay, Powai, Mumbai 400 076, India.
> email : u...@cse.iitb.ac.in
> homepage: http://www.cse.iitb.ac.in/~uday
> phone : Office - 91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct)
>         Res. - 91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)
> --
Re: Errors when invoking refs_may_alias_p_1
On Fri, Aug 27, 2010 at 5:27 PM, Hongtao wrote:
> Hi all,
>
> I have instrumented a function call like foo(&a,&b) into the gimple SSA representation (gcc-4.5), and the subsequent optimizations cannot get past my instrumented code. The backtrace is as follows. The error occurred when the dse pass tried to test whether the call I inserted may use a memory reference. It is because the argument &a is not an SSA_VAR or INDIRECT_REF, so the assert in function
>
> bool
> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>
> gcc_assert ((!ref1->ref
>              || SSA_VAR_P (ref1->ref)
>              || handled_component_p (ref1->ref)
>              || INDIRECT_REF_P (ref1->ref)
>              || TREE_CODE (ref1->ref) == TARGET_MEM_REF
>              || TREE_CODE (ref1->ref) == CONST_DECL)
>             && (!ref2->ref
>                 || SSA_VAR_P (ref2->ref)
>                 || handled_component_p (ref2->ref)
>                 || INDIRECT_REF_P (ref2->ref)
>                 || TREE_CODE (ref2->ref) == TARGET_MEM_REF
>                 || TREE_CODE (ref2->ref) == CONST_DECL));
>
> was violated.
>
> Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any step needed to maintain the consistency of Gimple SSA?

Yes. is_gimple_val () will return false for your arguments as it seems that the variables do not have function invariant addresses.

Richard.
>
> #0 0x76fc8ee0 in exit () from /lib/libc.so.6
> #1 0x005ae4ce in diagnostic_action_after_output (context=0x1323880, diagnostic=0x7fffd870) at ../../src/gcc/diagnostic.c:198
> #2 0x005aed54 in diagnostic_report_diagnostic (context=0x1323880, diagnostic=0x7fffd870) at ../../src/gcc/diagnostic.c:424
> #3 0x005afdc3 in internal_error (gmsgid=0xddfb57 "in %s, at %s:%d") at ../../src/gcc/diagnostic.c:709
> #4 0x005aff4f in fancy_abort (file=0xe42670 "../../src/gcc/tree-ssa-alias.c", line=786, function=0xe427e0 "refs_may_alias_p_1") at ../../src/gcc/diagnostic.c:763
> #5 0x008a1adb in refs_may_alias_p_1 (ref1=0x7fffdab0, ref2=0x7fffdb50, tbaa_p=1 '\001') at ../../src/gcc/tree-ssa-alias.c:775
> #6 0x008a2b12 in ref_maybe_used_by_call_p_1 (call=0x76790630, ref=0x7fffdb50) at ../../src/gcc/tree-ssa-alias.c:1133
> #7 0x008a2d2e in ref_maybe_used_by_call_p (call=0x76790630, ref=0x76848048) at ../../src/gcc/tree-ssa-alias.c:1147
> #8 0x008a2dfa in ref_maybe_used_by_stmt_p (stmt=0x76790630, ref=0x76848048) at ../../src/gcc/tree-ssa-alias.c:1179
> #9 0x008bf275 in dse_possible_dead_store_p (stmt=0x7683e820, use_stmt=0x7fffdca8) at ../../src/gcc/tree-ssa-dse.c:212
> #10 0x008bfeb9 in dse_optimize_stmt (dse_gd=0x7fffddd0, bd=0x156bd30, gsi=...) at ../../src/gcc/tree-ssa-dse.c:297
> #11 0x008c029d in dse_enter_block (walk_data=0x7fffdde0, bb=0x76a75068) at ../../src/gcc/tree-ssa-dse.c:370
> #12 0x00cc26a5 in walk_dominator_tree (walk_data=0x7fffdde0, bb=0x76a75068) at ../../src/gcc/domwalk.c:185
> #13 0x008c0812 in tree_ssa_dse () at ../../src/gcc/tree-ssa-dse.c:430
> #14 0x0073af0a in execute_one_pass (pass=0x13cced0) at ../../src/gcc/passes.c:1572
> #15 0x0073b21a in execute_pass_list (pass=0x13cced0) at ../../src/gcc/passes.c:1627
> #16 0x0073b238 in execute_pass_list (pass=0x1312720) at ../../src/gcc/passes.c:1628
> #17 0x0086e372 in tree_rest_of_compilation (fndecl=0x76b93500) at ../../src/gcc/tree-optimize.c:413
> #18 0x009fa7c5 in cgraph_expand_function (node=0x76be7000) at ../../src/gcc/cgraphunit.c:1548
> #19 0x009faa49 in cgraph_expand_all_functions () at ../../src/gcc/cgraphunit.c:1627
> #20 0x009fb07e in cgraph_optimize () at ../../src/gcc/cgraphunit.c:1875
> #21 0x009f9461 in cgraph_finalize_compilation_unit () at ../../src/gcc/cgraphunit.c:1096
> #22 0x004a9e93 in c_write_global_declarations () at ../../src/gcc/c-decl.c:9519
> #23 0x008180d4 in compile_file () at ../../src/gcc/toplev.c:1065
> #24 0x0081a1c5 in do_compile () at ../../src/gcc/toplev.c:2417
> #25 0x0081a286 in toplev_main (argc=21, argv=0x7fffe0f8) at ../../src/gcc/toplev.c:2459
> #26 0x00519c6b in main (argc=21, argv=0x7fffe0f8) at ../../src/gcc/main.c:35
>
> Thanks,
>
> Hongtao
> Purdue University
Re: Gengtype : strange code in output_type_enum
On Fri, 2010-08-27 at 17:25 +0300, Laurynas Biveinis wrote:
> 2010/8/27 :
> > We recompiled GCC-trunk r162692 with the following modification:
> >
> > In function output_type_enum of gcc/gengtype.c, we replaced
> >
> > - if (s->kind == TYPE_PARAM_STRUCT && s->u.s.line.file != NULL)
> > + if (s->kind == TYPE_PARAM_STRUCT && s->u.param_struct.line.file != NULL)
> >
> > And Gengtype works like before with c,c++, lto enabled.
> >
> > Do you think we have to submit a one line patch (if yes, could it be reviewed quickly)? We don't know why the old version works, and we think writing u.s.line.file is incorrect for TYPE_PARAM_STRUCT (even if it happens to work by luck), since the union u.param_struct member is the only valid one for TYPE_PARAM_STRUCT.
>
> One-line patches are welcome, but in this instance could you please find out how the old code worked before changing it (as you admit, you don't understand it).

My impression is that s->u.s.line.file usually happens to have the same offset (at least on GNU/Linux/AMD64=x86_64) as s->u.param_struct.param[0] and that for every type concerned by output_type_enum its param[0] subfield happens to be non-null. This explains that it worked by accident.

Is such a heuristic explanation enough to propose a patch? I am not sure I can provide a better one quickly (so if the explanation is not enough, I am not sure I want to propose a half-line patch).

By the way, what is the good way to find out exactly what svn commit introduced the bogus line?

What surprises me much more is that the s->u.s.line.file != NULL test was accepted a long time ago. From what we understand of gengtype, it could never have made any sense (because conceptually s->u.s does not exist for TYPE_PARAM_STRUCT!), even if it happens to work by pure luck.

I am quite surprised (but I admit I only looked at a few pages) that there do not seem to be any rules regarding the use of unions in C code inside GNU.
My personal requirement is that a union is only usable if it is inside a structure and is discriminated by a field of this structure (the usual case of a union of sub-structures each starting with a discriminant logically fits that requirement) or by a simple pure function depending on such a field (in ML or Ocaml parlance, a union is a discriminated sum type; also, rpcxdr from Sun twenty years ago had a similar requirement...). But I see no such rules within GCC, and I even saw several unions not used that way. My perhaps excessive opinion is that such union abuse always gives unmaintainable code (and I am in the minority which wants GCC code to be more easily maintainable & readable & hackable by new contributors, even at the expense of raw performance; I feel that competing free compilers like LLVM are much better in that respect).

###

For the curious people, our current work on gengtype is available as http://starynkevitch.net/Basile/gengtype-r163582-27-august-2010.diff

As usual, this is a temporary URL. Our patch is not yet ready for submission. I have to clean up the code, correct a bug or two, and understand exactly how the s->u.param_struct.line.file field is set in the present gengtype. I also have to split our work into several patches, and I am very afraid of not being able to make a sequence of small patches such that each change still leaves gengtype working for the entire GCC! I have no idea if this is even doable (since gengtype is a code generator with *global* side effects on GCC code; it could happen that some partial change works for C but not for the C++ or Ada parts).

I will perhaps propose a few *related* patches on gengtype this week-end, if I am motivated enough to work on it.

Cheers.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
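The discipline described in this message (a union living inside a structure and read only through the member that the structure's discriminant selects) can be sketched in a few lines of C. This is a minimal illustration under stated assumptions: `toy_type`, `toy_kind`, `toy_fileloc` and `toy_type_loc` are hypothetical, heavily simplified stand-ins, not gengtype's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-ins; NOT gengtype's real types.  */
enum toy_kind { TOY_STRUCT, TOY_PARAM_STRUCT };

struct toy_fileloc { const char *file; int line; };

struct toy_type
{
  enum toy_kind kind;   /* discriminant: selects the live member of u */
  union
  {
    struct { const char *tag; struct toy_fileloc line; } s;
    struct { struct toy_type *stru; struct toy_fileloc line; } param_struct;
  } u;
};

/* Return the source location of T, reading only the union member that
   T's kind says is valid.  */
static const struct toy_fileloc *
toy_type_loc (const struct toy_type *t)
{
  switch (t->kind)
    {
    case TOY_STRUCT:
      return &t->u.s.line;
    case TOY_PARAM_STRUCT:
      return &t->u.param_struct.line;
    }
  assert (!"unknown toy_kind");
  return NULL;
}
```

Funnelling every access through an accessor like `toy_type_loc` means a mismatch between the discriminant and the member being read can only occur in one place, which is the kind of rule the message argues for.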
Re: Gengtype : strange code in output_type_enum
Basile Starynkevitch writes:

> My impression is that s->u.s.line.file usually happens to have the same offset (at least on GNU/Linux/AMD64=x86_64) as s->u.param_struct.param[0] and that for every type concerned by output_type_enum its param[0] subfield happens to be non-null. This explains that it worked by accident.

No, that is not the case. But I already explained why this error doesn't matter: http://gcc.gnu.org/ml/gcc/2010-08/msg00396.html

> By the way, what is the good way to find out exactly what svn commit introduced the bogus line?

svn blame

Ian
Re: Errors when invoking refs_may_alias_p_1
On 08/27/10 12:35, Richard Guenther wrote:
> On Fri, Aug 27, 2010 at 5:27 PM, Hongtao wrote:
>> Hi all,
>>
>> I have instrumented a function call like foo(&a,&b) into the gimple SSA representation (gcc-4.5), and the subsequent optimizations cannot get past my instrumented code. The backtrace is as follows. The error occurred when the dse pass tried to test whether the call I inserted may use a memory reference. It is because the argument &a is not an SSA_VAR or INDIRECT_REF, so the assert in function
>>
>> bool
>> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>>
>> gcc_assert ((!ref1->ref
>>              || SSA_VAR_P (ref1->ref)
>>              || handled_component_p (ref1->ref)
>>              || INDIRECT_REF_P (ref1->ref)
>>              || TREE_CODE (ref1->ref) == TARGET_MEM_REF
>>              || TREE_CODE (ref1->ref) == CONST_DECL)
>>             && (!ref2->ref
>>                 || SSA_VAR_P (ref2->ref)
>>                 || handled_component_p (ref2->ref)
>>                 || INDIRECT_REF_P (ref2->ref)
>>                 || TREE_CODE (ref2->ref) == TARGET_MEM_REF
>>                 || TREE_CODE (ref2->ref) == CONST_DECL));
>>
>> was violated.
>>
>> Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any step needed to maintain the consistency of Gimple SSA?
>
> Yes. is_gimple_val () will return false for your arguments as it seems that the variables do not have function invariant addresses.
>
> Richard.

Thanks. But how can I turn my arguments into gimple vals? By assigning each one to a temporary first and then replacing the argument with that temporary?

Hongtao
Re: Errors when invoking refs_may_alias_p_1
On Fri, Aug 27, 2010 at 8:24 PM, Hongtao wrote:
> On 08/27/10 12:35, Richard Guenther wrote:
>> On Fri, Aug 27, 2010 at 5:27 PM, Hongtao wrote:
>>> Hi all,
>>>
>>> I have instrumented a function call like foo(&a,&b) into the gimple SSA representation (gcc-4.5), and the subsequent optimizations cannot get past my instrumented code. The backtrace is as follows. The error occurred when the dse pass tried to test whether the call I inserted may use a memory reference. It is because the argument &a is not an SSA_VAR or INDIRECT_REF, so the assert in function
>>>
>>> bool
>>> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>>>
>>> gcc_assert ((!ref1->ref
>>>              || SSA_VAR_P (ref1->ref)
>>>              || handled_component_p (ref1->ref)
>>>              || INDIRECT_REF_P (ref1->ref)
>>>              || TREE_CODE (ref1->ref) == TARGET_MEM_REF
>>>              || TREE_CODE (ref1->ref) == CONST_DECL)
>>>             && (!ref2->ref
>>>                 || SSA_VAR_P (ref2->ref)
>>>                 || handled_component_p (ref2->ref)
>>>                 || INDIRECT_REF_P (ref2->ref)
>>>                 || TREE_CODE (ref2->ref) == TARGET_MEM_REF
>>>                 || TREE_CODE (ref2->ref) == CONST_DECL));
>>>
>>> was violated.
>>>
>>> Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any step needed to maintain the consistency of Gimple SSA?
>>
>> Yes. is_gimple_val () will return false for your arguments as it seems that the variables do not have function invariant addresses.
>>
>> Richard.
>
> Thanks. But how can I turn my arguments into gimple vals? By assigning each one to a temporary first and then replacing the argument with that temporary?

Yes, that will work.

Richard.
Re: Errors when invoking refs_may_alias_p_1
On 08/27/10 14:29, Richard Guenther wrote:
> On Fri, Aug 27, 2010 at 8:24 PM, Hongtao wrote:
>> On 08/27/10 12:35, Richard Guenther wrote:
>>> On Fri, Aug 27, 2010 at 5:27 PM, Hongtao wrote:
>>>> Hi all,
>>>>
>>>> I have instrumented a function call like foo(&a,&b) into the gimple SSA representation (gcc-4.5), and the subsequent optimizations cannot get past my instrumented code. The backtrace is as follows. The error occurred when the dse pass tried to test whether the call I inserted may use a memory reference. It is because the argument &a is not an SSA_VAR or INDIRECT_REF, so the assert in function
>>>>
>>>> bool
>>>> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>>>>
>>>> gcc_assert ((!ref1->ref
>>>>              || SSA_VAR_P (ref1->ref)
>>>>              || handled_component_p (ref1->ref)
>>>>              || INDIRECT_REF_P (ref1->ref)
>>>>              || TREE_CODE (ref1->ref) == TARGET_MEM_REF
>>>>              || TREE_CODE (ref1->ref) == CONST_DECL)
>>>>             && (!ref2->ref
>>>>                 || SSA_VAR_P (ref2->ref)
>>>>                 || handled_component_p (ref2->ref)
>>>>                 || INDIRECT_REF_P (ref2->ref)
>>>>                 || TREE_CODE (ref2->ref) == TARGET_MEM_REF
>>>>                 || TREE_CODE (ref2->ref) == CONST_DECL));
>>>>
>>>> was violated.
>>>>
>>>> Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any step needed to maintain the consistency of Gimple SSA?
>>>
>>> Yes. is_gimple_val () will return false for your arguments as it seems that the variables do not have function invariant addresses.
>>>
>>> Richard.
>>
>> Thanks. But how can I turn my arguments into gimple vals? By assigning each one to a temporary first and then replacing the argument with that temporary?
>
> Yes, that will work.
>
> Richard.

OK. Do we have to rewrite it like this every time we insert a function call into a Gimple body, if an argument of that call is an expression?

Thanks,
Hongtao
Re: Errors when invoking refs_may_alias_p_1
On Fri, Aug 27, 2010 at 8:37 PM, Hongtao wrote:
> On 08/27/10 14:29, Richard Guenther wrote:
>> On Fri, Aug 27, 2010 at 8:24 PM, Hongtao wrote:
>>> On 08/27/10 12:35, Richard Guenther wrote:
>>>> On Fri, Aug 27, 2010 at 5:27 PM, Hongtao wrote:
>>>>> Hi all,
>>>>>
>>>>> I have instrumented a function call like foo(&a,&b) into the gimple SSA representation (gcc-4.5), and the subsequent optimizations cannot get past my instrumented code. The backtrace is as follows. The error occurred when the dse pass tried to test whether the call I inserted may use a memory reference. It is because the argument &a is not an SSA_VAR or INDIRECT_REF, so the assert in function
>>>>>
>>>>> bool
>>>>> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>>>>>
>>>>> gcc_assert ((!ref1->ref
>>>>>              || SSA_VAR_P (ref1->ref)
>>>>>              || handled_component_p (ref1->ref)
>>>>>              || INDIRECT_REF_P (ref1->ref)
>>>>>              || TREE_CODE (ref1->ref) == TARGET_MEM_REF
>>>>>              || TREE_CODE (ref1->ref) == CONST_DECL)
>>>>>             && (!ref2->ref
>>>>>                 || SSA_VAR_P (ref2->ref)
>>>>>                 || handled_component_p (ref2->ref)
>>>>>                 || INDIRECT_REF_P (ref2->ref)
>>>>>                 || TREE_CODE (ref2->ref) == TARGET_MEM_REF
>>>>>                 || TREE_CODE (ref2->ref) == CONST_DECL));
>>>>>
>>>>> was violated.
>>>>>
>>>>> Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any step needed to maintain the consistency of Gimple SSA?
>>>>
>>>> Yes. is_gimple_val () will return false for your arguments as it seems that the variables do not have function invariant addresses.
>>>>
>>>> Richard.
>>>
>>> Thanks. But how can I turn my arguments into gimple vals? By assigning each one to a temporary first and then replacing the argument with that temporary?
>>
>> Yes, that will work.
>>
>> Richard.
>
> OK. Do we have to rewrite it like this every time we insert a function call into a Gimple body, if an argument of that call is an expression?

If the argument fails the test is_gimple_reg_type (TREE_TYPE (arg)) ? is_gimple_val (arg) : is_gimple_lvalue (arg), then yes. See gimplify_arg in gimplify.c.

Richard.
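The rewrite agreed on in this thread (assign a non-trivial argument to a temporary, then pass the temporary) can be illustrated outside GCC with a toy model. The sketch below does not use GCC's internal API: `toy_expr`, `toy_is_val`, `toy_append` and `toy_emit_call` are hypothetical names, and the toy predicate rejects every address-of expression, which is stricter than what is_gimple_val does in GCC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy expression kinds; NOT GCC's tree codes.  */
enum toy_code { TOY_VAR, TOY_ADDR_OF };

struct toy_expr
{
  enum toy_code code;
  const char *name;     /* the variable named or whose address is taken */
};

struct toy_out
{
  char text[256];       /* statements emitted so far */
  int next_tmp;         /* counter used to name fresh temporaries */
};

/* Toy stand-in for an operand-validity predicate: only plain variables
   may appear directly as call arguments (an assumption of this sketch).  */
static int
toy_is_val (const struct toy_expr *e)
{
  return e->code == TOY_VAR;
}

/* Append a two-string formatted statement to OUT, asserting that it fits.  */
static void
toy_append (struct toy_out *out, const char *fmt, const char *a, const char *b)
{
  size_t used = strlen (out->text);
  int n = snprintf (out->text + used, sizeof out->text - used, fmt, a, b);
  assert (n >= 0 && (size_t) n < sizeof out->text - used);
}

/* Emit "FN (args);", first assigning every invalid operand to a temporary.  */
static void
toy_emit_call (struct toy_out *out, const char *fn,
               const struct toy_expr *args, int nargs)
{
  char call[128];
  size_t used = (size_t) snprintf (call, sizeof call, "%s (", fn);
  assert (used < sizeof call);
  for (int i = 0; i < nargs; i++)
    {
      char operand[32];
      if (toy_is_val (&args[i]))
        snprintf (operand, sizeof operand, "%s", args[i].name);
      else
        {
          /* Not a valid operand: compute it into a fresh temporary first.  */
          snprintf (operand, sizeof operand, "tmp%d", out->next_tmp++);
          toy_append (out, "%s = &%s;\n", operand, args[i].name);
        }
      used += (size_t) snprintf (call + used, sizeof call - used, "%s%s",
                                 i ? ", " : "", operand);
      assert (used < sizeof call);
    }
  toy_append (out, "%s%s", call, ");\n");
}
```

For a call foo (&a, &b), this toy emits two temporary assignments followed by a call that takes only the temporaries. Inside GCC the equivalent work is done on real statements, and gimplify_arg in gimplify.c, mentioned above, is the place to look.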
Re: Clustering switch cases
Another main thing missing is consideration of profile information (if available), so that the most frequent cases can be peeled out.

David

On Fri, Aug 27, 2010 at 8:03 AM, Richard Guenther wrote:
> On Fri, Aug 27, 2010 at 4:47 PM, Ian Lance Taylor wrote:
>> "Paulo J. Matos" writes:
>>
>>> In the first case, it generates a binary tree, and in the second two jump tables. The jump tables solution is much more elegant (at least in our situation), generating less code and being faster. Now, what I am wondering is the reason why GCC doesn't try to cluster the cases, looking for clusters of contiguous values in the switch.
>>>
>>> If there is no specific reason then I would implement such a pass, which would split switches according to value clustering before expansion, since I think it would be a good code improvement.
>>>
>>> Currently GCC seems to only use a jump table if the range of the switch is not much bigger than its count, which works well in most cases except when you have big switches with clusters of contiguous values (like the first example I sent).
>>
>> I don't know of any specific reason not to look for clusters of switch cases. The main issue would be the effect on compilation time. If you can do it with an algorithm which is linear in the number of cases, then I think it would be an acceptable optimization.
>
> In fact we might want to move switch optimization up to the tree level (just because it's way easier to deal with there). Thus, lower switch to a mixture of binary tree & jump-tables (possibly using perfect hashing).
>
> Richard.
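To make the linear-time requirement mentioned above concrete, here is one possible greedy clustering pass over sorted case values. It is only a sketch under stated assumptions, not an algorithm from GCC: `cluster_cases`, `struct cluster` and the `max_sparsity` threshold are hypothetical, and a cluster is kept growing while its value range stays within `max_sparsity` times its number of cases.

```c
#include <assert.h>
#include <stddef.h>

/* One cluster of cases, as inclusive indices into the sorted case array.  */
struct cluster { size_t first, last; };

/* Greedily split the sorted, distinct values CASES[0..N) into clusters
   whose spanned range is at most MAX_SPARSITY times their case count.
   Writes at most N clusters to OUT and returns how many were written.
   Each case is visited once, so the pass is linear in N.  */
static size_t
cluster_cases (const long *cases, size_t n, long max_sparsity,
               struct cluster *out)
{
  size_t count = 0, start = 0;
  assert (max_sparsity >= 1);
  for (size_t i = 1; i <= n; i++)
    {
      if (i < n)
        {
          assert (cases[i - 1] < cases[i]);   /* input must be sorted */
          unsigned long range
            = (unsigned long) cases[i] - (unsigned long) cases[start] + 1;
          unsigned long ncases = i - start + 1;
          if (range <= ncases * (unsigned long) max_sparsity)
            continue;   /* still dense enough: keep growing the cluster */
        }
      /* Close the current cluster before case I (or at the end).  */
      out[count].first = start;
      out[count].last = i - 1;
      count++;
      start = i;
    }
  return count;
}
```

Each resulting cluster with more than a handful of cases could become a jump table, with a binary tree (or a test on profile-hot cases first) choosing between clusters. A greedy split like this is not guaranteed to be optimal; it only shows that a single pass over the cases is enough to find contiguous groups.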
Re: Better performance on older version of GCC
Briefly looked at it -- the trunk gcc also regresses a lot compared to the binary you attached. (To match your binary, I also added the -mfpmath=387 -m32 options.)

Two problems:

1) more register spills in the trunk version -- the old compiler seems more effective in using fp stack registers;
2) the complex multiplication -- the old version emits an inline sequence while the trunk version emits a call to the libgcc routine __muldc3.

You can probably file a bug report on this.

Thanks,

David

On Fri, Aug 27, 2010 at 8:39 AM, Corey Kasten wrote:
> On Fri, 2010-08-27 at 17:09 +0200, Richard Guenther wrote:
>> On Fri, Aug 27, 2010 at 5:02 PM, Corey Kasten wrote:
>> > On Fri, 2010-08-27 at 06:50 -0700, Nathan Froyd wrote:
>> >> On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote:
>> >> > I find that the executable compiled on system A runs faster (on both systems) than the executable compiled on system B (on both systems), by a factor of approximately 4. I have attempted to play with the GCC optimizer flags and have not been able to get System B (with the later GCC version) to compile code with any better performance. Could someone please help figure this out?
>> >>
>> >> It's almost impossible to tell what's going on without an actual testcase. You might not be able to provide the actual code, but you could try distilling it down to something you could release.
>> >>
>> >> -Nathan
>> >
>> > Thanks for the reply Nathan.
>> >
>> > I have attached an archive with the test case code. The code is built by build.sh and outputs the number of microseconds to complete the processing.
>> >
>> > Compiling with GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" produces code that runs in about 66% of the time taken by code from GCC version "4.3.0 20080428 (Red Hat 4.3.0-8)".
>>
>> -fcx-limited-range or -fcx-fortran-rules. 4.3 now is more conforming than 4.1.
>>
>> Richard.
>>
>> > Thanks
>> >
>> > Corey
>
> Richard,
>
> -fcx-limited-range worked great on both my real benchmark and my test archive. GCC didn't recognize -fcx-fortran-rules, but obviously I don't need it.
>
> Thanks so much,
> Corey
Re: Better performance on older version of GCC
On Fri, Aug 27, 2010 at 5:12 PM, Xinliang David Li wrote:
> Briefly looked at it -- the trunk gcc also regresses a lot compared to the binary you attached. (To match your binary, I also added the -mfpmath=387 -m32 options.)
>
> Two problems:
>
> 1) more register spills in the trunk version -- the old compiler seems more effective in using fp stack registers;
> 2) the complex multiplication -- the old version emits an inline sequence while the trunk version emits a call to the libgcc routine __muldc3.

Neither of these seems like a real, reportable bug. The first is due to -fexcess-precision=standard being the default in 4.5 and above (see PR 323). The second is due to -fcx-limited-range no longer being the default (I cannot remember the number of the bug that changed that, though).

Thanks,
Andrew Pinski
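The difference behind the second point can be shown with the multiplication formula itself. The sketch below writes out the plain textbook product, which is roughly what an inline sequence computes when -fcx-limited-range is in effect; the default path additionally checks for a NaN + i*NaN result and tries to recover an infinity, which is what makes it more expensive. `struct cplx` and the function names are hypothetical helpers for illustration, not GCC or libgcc code.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper type; real code would normally use C99 double complex.  */
struct cplx { double re, im; };

/* Textbook product (a+bi)(c+di) = (ac-bd) + (ad+bc)i, with no check for
   NaN results and no attempt to recover infinities.  */
static struct cplx
cplx_mul_limited (struct cplx x, struct cplx y)
{
  struct cplx r;
  r.re = x.re * y.re - x.im * y.im;
  r.im = x.re * y.im + x.im * y.re;
  return r;
}

/* Nonzero when both parts are NaN, the case the stricter rules repair.  */
static int
cplx_all_nan (struct cplx z)
{
  return isnan (z.re) && isnan (z.im);
}

/* For finite operands the plain formula gives the expected product.  */
static void
cplx_selfcheck (void)
{
  struct cplx r = cplx_mul_limited ((struct cplx) { 1.0, 2.0 },
                                    (struct cplx) { 3.0, 4.0 });
  assert (r.re == -5.0 && r.im == 10.0);
}
```

With an infinite operand such as (inf + inf i) times (1 + 0i), the plain formula yields NaN in both parts, whereas C99 Annex G expects an infinite result. Code that never meets such operands loses nothing by using the limited-range form, which matches the experience reported earlier in this thread.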
Re: Better performance on older version of GCC
Right -- I missed Richard's previous email regarding the options.

Thanks,

David

On Fri, Aug 27, 2010 at 5:21 PM, Andrew Pinski wrote:
> On Fri, Aug 27, 2010 at 5:12 PM, Xinliang David Li wrote:
>> Briefly looked at it -- the trunk gcc also regresses a lot compared to the binary you attached. (To match your binary, I also added the -mfpmath=387 -m32 options.)
>>
>> Two problems:
>>
>> 1) more register spills in the trunk version -- the old compiler seems more effective in using fp stack registers;
>> 2) the complex multiplication -- the old version emits an inline sequence while the trunk version emits a call to the libgcc routine __muldc3.
>
> Neither of these seems like a real, reportable bug. The first is due to -fexcess-precision=standard being the default in 4.5 and above (see PR 323). The second is due to -fcx-limited-range no longer being the default (I cannot remember the number of the bug that changed that, though).
>
> Thanks,
> Andrew Pinski