Re: Inner loop unable to compute sufficient information during vectorization
gcc-ow...@gcc.gnu.org wrote on 25/05/2009 21:53:41:

> for a loop like
>
>   for(i=0;i
>     for(j=0;j
>       a[i][j] = a[i][j]+b[i][j];
>
> GCC 4.3.* is unable to get the information for the inner loop that
> array reference 'a' is alias of each other and generates code for
> runtime aliasing check during vectorization.

Both current trunk and GCC 4.4 vectorize the inner loop without any
runtime alias checks.

> Is it necessary to
> recompute all information in loop_vec_info in function
> vect_analyze_ref for analysis of inner loop also, as most of the
> information is similar for the outer loop for the program.

Maybe you are right, and it is possible to extract at least part of the
information for the inner loop from the outer loop information.

> Similarly, outer loop is able to compute correct chrec i.e. NULL, for
> array 'a' reference, while innerloop has chrec as chrec_dont_know, and
> therefore complains about runtime alias check.

The chrecs are not the same for inner and outer loops, so it is
reasonable that the results of the data dependence tests will be
different. In this case, however, it seems to be a bug.

Ira
March=native with a main 64bit system and 32bit chroot
Hi all,

I have a main Gentoo 64-bit system with CHOST="x86_64-pc-linux-gnu" and
CFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer".
My CPU is a Pentium 4 Prescott and I'm using gcc-4.3.2.

With this little script:

    echo 'float x(float x){return x < 0 ? -x : x;}' > x.c &&
    gcc -fverbose-asm -mtune=native -march=native -S x.c &&
    grep '\(-march\|-mtune\)' x.s && rm -fr x.c && rm -fr x.s

I found that gcc uses march=nocona and mtune=nocona, which is fine.

On this main system I also have some 32-bit chroots with
CHOST="i686-pc-linux-gnu" and CFLAGS="-march=native -mtune=native -O2
-pipe -fomit-frame-pointer". In those chroots gcc again uses
march=nocona and mtune=nocona, which I think is wrong.

Should I use march=native or march=prescott (which is what I was using
with older gcc versions) for my chroots?

If march=prescott is the right choice, I think gcc's "native" detection
should also consider $CHOST.
Intermediate representation
Hello again,

I'm still working on egcs 1.1, and the function cp_namespace_decls is
not implemented in it. I just want to get the classes and functions
implemented in my source code. I tried to get them with the function
gettags, but I think I misunderstood something. I tried to read about
the bindings in the file decl.c, but I could not make sense of it.

What does it do? Do I have to use it to get what I want?

Thanks.

Nicolas COLLIN
Intermediate representation
Hello again,

To answer your question, my code's purpose is to write a kind of tree to
a file. The top level of the hierarchy is the classes, followed by their
methods (with parameters, return type, ...), their attributes, etc. It
will also recognize some new keywords that I will introduce through
"__attribute__". Given that, where is the best place to put my code?

Thank you very much; I would not have made any progress without your
help.

Nicolas COLLIN

Dave Korn wrote:

> Bear in mind that global_namespace only exists in the C++ compiler
> 'cc1plus', so if you access it directly in toplev.c, the plain C
> compiler 'cc1' will fail to build with a link error. This might or
> might not matter to you for your purposes, but it's better for that
> reason to keep any code that needs to understand about anything to do
> with C++ in the /cp/ subdirectory.
>
> Without knowing what your code does, I can't say where is the best
> place to put it. The flow of control is that compile_file in
> gcc/toplev.c calls to the language-specific parser yyparse, which then
> calls into hooks in the language-specific files in gcc/ or gcc/cp/,
> ending eventually in finish_decl in either gcc/c-decl.c or
> gcc/cp/decl.c, where it calls rest_of_compilation to hand off the tree
> representation to the mid/backend for translation to assembler code.
>
> So, the C++ specific finish_decl would be one good place to add code
> that needs to analyse the trees. Bear in mind that by the time you see
> them there, some elementary optimisations like constant folding may
> already have been performed, so you might not see what exactly
> reflects the form of the original sources.
>
> cheers,
> DaveK
Re: March=native with a main 64bit system and 32bit chroot
Luca Zorzo writes:

> I've a main Gentoo 64bit system with CHOST="x86_64-pc-linux-gnu" and
> CFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer".
> My cpu is a Pentium 4 Prescott and i'm using gcc-4.3.2.
>
> With this little script:
> "echo 'float x(float x){return x < 0 ? -x : x;}' > x.c && gcc
> -fverbose-asm -mtune=native -march=native -S x.c && grep
> '\(-march\|-mtune\)' x.s && rm -fr x.c && rm -fr x.s "
> i've found that gcc uses march=nocona and mtune=nocona, and this is ok.
>
> In this main system i've some 32bit chroots with
> CHOST="i686-pc-linux-gnu" and CFLAGS="-march=native -mtune=native -O2
> -pipe -fomit-frame-pointer".
> But this time gcc is using march=nocona and mtune=nocona again, that
> is wrong i think.
>
> Should i use march=native or march=prescott (that is what i was using
> with old gcc versions) for my chroots?
>
> If march=prescott is the right choice i think that the gcc "native"
> detection should consider also $CHOST.

This question is appropriate for the mailing list
gcc-h...@gcc.gnu.org, not the mailing list g...@gcc.gnu.org. Please
take any followups to gcc-help. Thanks.

Your CPU is the same whether you have a chroot or not, and whether you
are running in 32-bit mode or in 64-bit mode, so it is correct for gcc
to handle -mtune=native -march=native the same way in both cases.

Naturally the -march option will be affected by whether you are in
32-bit mode or not. As far as I know the -mtune option doesn't make
any difference.

It's reasonable to ask whether -mtune=nocona gives the best results
when using a Prescott. Right now -mtune=prescott and -mtune=nocona are
handled identically in any case. I don't know if there is room for
improvement there or not.

Ian
Re: Intermediate representation
Nicolas COLLIN wrote:

> Hello again,
>
> I 'm still working on egcs 1.1 and the function cp_namespace_decls is
> not implemented in.

Well, the definition is very simple:

    tree
    cp_namespace_decls (tree ns)
    {
      return NAMESPACE_LEVEL (ns)->names;
    }

and NAMESPACE_LEVEL exists in egcs-1.1, so why not try back-porting it?

cheers,
DaveK
gcc-4.4-20090526 is now available
Snapshot gcc-4.4-20090526 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20090526/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 147883

You'll find:

gcc-4.4-20090526.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.4-20090526.tar.bz2       C front end and core compiler
gcc-ada-4.4-20090526.tar.bz2        Ada front end and runtime
gcc-fortran-4.4-20090526.tar.bz2    Fortran front end and runtime
gcc-g++-4.4-20090526.tar.bz2        C++ front end and runtime
gcc-java-4.4-20090526.tar.bz2       Java front end and runtime
gcc-objc-4.4-20090526.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.4-20090526.tar.bz2  The GCC testsuite

Diffs from 4.4-20090519 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Seeking suggestion
Jamie Prescott wrote:

> Is there a reason why something like this would not work?
>
>   if (!TARGET_XXX2)
>     emit_clobber(gen_rtx_REG(CCmode, CC_REGNUM));
>   emit_insn(gen_addsi3_nc(operands[0], operands[1], operands[2]));

Yes. The optimizer will not know that addsi3_nc uses CC_REGNUM, as it
is not mentioned, so the optimizer will not know that these two RTL
instructions always need to remain next to each other. Any
optimization pass that moves insns around may separate the add from the
clobber resulting in broken code. This is what parallels are for, to
make sure that the clobber and add stay together.

Jim
4.4: march-native gives -mno-sse4, but cpuinfo sse4_1
If I run

    gcc -fverbose-asm -mtune=native -march=native -S x.c

I get

cat x.s:

        .file   "x.c"
# GNU C (GCC) version 4.4.0 20090506 (Red Hat 4.4.0-4) (x86_64-redhat-linux)
#       compiled by GNU C version 4.4.0 20090506 (Red Hat 4.4.0-4), GMP version 4.2.4, MPFR version 2.4.1.
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  x.c -march=core2 -mcx16 -msahf --param l1-cache-size=32
# --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2
# -fverbose-asm
# options enabled:  -falign-loops -fargument-alias
...
# -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-sse4
# -mpush-args -mred-zone -msahf -msse -msse2 -msse3 -mssse3

cat /proc/cpuinfo:

flags : .sse sse2 ssse3 sse4_1 ...

Is this a bug in the march-native code, or is there a reason sse4.1 is
not enabled (this cpu does not support sse4.2)?

sean
Re: 4.4: march-native gives -mno-sse4, but cpuinfo sse4_1
sean darcy writes:

> If I run gcc -fverbose-asm -mtune=native -march=native -S x.c
>
> I get
> cat x.s:
> .file "x.c"
> # GNU C (GCC) version 4.4.0 20090506 (Red Hat 4.4.0-4) (x86_64-redhat-linux)
> # compiled by GNU C version 4.4.0 20090506 (Red Hat 4.4.0-4), GMP
> version 4.2.4, MPFR version 2.4.1.
> # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
> # options passed: x.c -march=core2 -mcx16 -msahf --param l1-cache-size=32
> # --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2
> # -fverbose-asm
> # options enabled: -falign-loops -fargument-alias
> ...
> # -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-sse4
> # -mpush-args -mred-zone -msahf -msse -msse2 -msse3 -mssse3
>
>
> cat /proc/cpuinfo:
>
> flags : .sse sse2 ssse3 sse4_1 ...
>
> Is this a bug in the march-native code, or is there a reason sse4.1 is
> not enabled (this cpu does not support sse4.2)?

This question is more appropriate for the gcc-h...@gcc.gnu.org mailing
list. Please take any followups to gcc-help. Thanks.

In gcc, -msse4 implies both SSE4.1 and SSE4.2. Since your processor
does not support both, -mno-sse4 is in effect.

Actually, at present gcc does not consider any processor to support
SSE4.1 or SSE4.2 by default. You always have to select them
explicitly. This seems to be a bug. I would encourage you to open a
bug report, if there isn't already one about this; see
http://gcc.gnu.org/bugs.html . Thanks.

Ian