-Warray-bounds false negative
Hello, I recently came across a false negative in GCC's detection of an array bounds violation. At first, I thought the other tool (PC-Lint) was reporting a false positive, but it turns out to be correct. The false negative occurs in GCC 4.3, 4.4.1, and the latest trunk (4.5). I'm curious to understand exactly where the detection breaks down, as I think it may affect if/how the loop in question is optimized. Here is the code:

#include <stdbool.h>

int main(int argc, char** argv)
{
  unsigned char data[8];
  int hyphen = 0, i = 0;
  char *option = *argv;

  for (i = 19; i < 36; ++i) {
    if (option[i] == '-') {
      if (hyphen)
        return false;
      ++hyphen;
      continue;
    }
    if (!(option[i] >= '0' && option[i] <= '9') &&
        !(option[i] >= 'A' && option[i] <= 'F') &&
        !(option[i] >= 'a' && option[i] <= 'f')) {
      return false;
    }
    data[(i - hyphen) / 2] = 0;
  }
  return 0;
}

When i is 35 and hyphen is 0 (and in many other cases), data[] will be overflowed by quite a bit. Where does the breakdown in array bounds detection occur, and why? Once I understand, and if the fix is simple enough, I can try to fix the bug and supply a patch. Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: -Warray-bounds false negative
On Fri, 13 Nov 2009, Andrew Pinski wrote:
> On Fri, Nov 13, 2009 at 1:09 PM, Matt wrote:
>> Hello, I recently came across a false negative in GCC's detection of
>> an array bounds violation. At first, I thought the other tool
>> (PC-Lint) was reporting a false positive, but it turns out to be
>> correct. The false negative occurs in GCC 4.3, 4.4.1, and the latest
>> trunk (4.5). I'm curious to understand how exactly the detection
>> breaks down, as I think it may affect if/how the loop in question is
>> optimized.
>
> Well, in this case all of the code that is considered dead is removed
> before the point where the warning would be emitted. If I change it so
> that data is read from (instead of just written to), the trunk warns
> about this code:
>
> t.c:21:20: warning: array subscript is above array bounds
>
> I changed the last return to be: return data[2];

d'oh! Next time I'll look at the objdump output first. Thanks for the quick explanation!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
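[A minimal self-contained illustration of the same effect, assuming the GCC 4.4/4.5 behavior described above (hypothetical file name warn.c; the exact diagnostic text may differ):

/* warn.c: compile with "gcc -O2 -Wall -c warn.c" */
int f(void)
{
  int a[4];
  int i;
  for (i = 0; i < 8; i++)
    a[i] = i;     /* out of bounds for i >= 4 */
  return a[2];    /* keeping a[] live lets -Warray-bounds fire; with
                     "return 0;" the stores are dead, get removed, and
                     no warning is emitted */
}]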
build failure bootstrapping trunk on Ubuntu 9.10
I'm getting this build failure with the latest trunk, as of the composing of this email:

../gcc-trunk/configure --prefix=/home/matt --enable-stage1-checking=all --enable-bootstrap --enable-lto --enable-languages=c,c++
make -j5
. . .
/home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem /home/matt/x86_64-unknown-linux-gnu/include -isystem /home/matt/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -fprofile-use -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libdecnumber -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -DCLOOG_PPL_BACKEND -I/usr/include/libelf ../../gcc-trunk/gcc/ira-lives.c -o ira-lives.o
cc1: warnings being treated as errors
../../gcc-trunk/gcc/ira-lives.c: In function 'ira_implicitly_set_insn_hard_regs':
../../gcc-trunk/gcc/ira-lives.c:748:13: error: 'regno' may be used uninitialized in this function

It looks like ira-lives.c:763 has some ambiguous parenthesizing that may be causing the warning that is failing the build. Note that the warning doesn't happen on a similar piece of code on line 830 in the same file. I've been fighting with the configure process for a few days and finally got past that to this issue. So, any help is greatly appreciated :) Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
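[For readers unfamiliar with this class of warning, a hypothetical reduction -- not the actual ira-lives.c code -- of the pattern that typically provokes "may be used uninitialized" under -Werror: the variable is assigned only under a condition the compiler cannot prove matches the condition guarding the use:

/* hypothetical reduction; compile with "gcc -O2 -Wall -c" */
extern int cond (int);

int f (int x)
{
  int regno;            /* not initialized on every path */
  if (cond (x))
    regno = x + 1;
  /* ... other work ... */
  if (cond (x))
    return regno;       /* warns: GCC can't prove both tests agree,
                           since cond() may return different values */
  return 0;
}]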
missed IPA/whopr optimization?
Hello all, In the work I'm doing on my new book, I'm trying to show how modern compiler optimizations can eliminate a good deal of the overhead introduced by a modular/unit-testable design. In verifying some of my text, I found that GCC 4.4 and 4.5 (20091018, Ubuntu 9.10 package) isn't doing an optimization that I expected it to do:

#include <stdio.h>

class Calculable
{
public:
  virtual unsigned char calculate() = 0;
};

class X : public Calculable
{
public:
  unsigned char calculate() { return 1; }
};

class Y : public Calculable
{
public:
  unsigned char calculate() { return 2; }
};

static void print(Calculable& c)
{
  printf("%d\n", c.calculate());
  printf("+1: %d\n", c.calculate() + 1);
}

int main()
{
  X x;
  Y y;
  print(x);
  print(y);
  return 0;
}

GCC 4.5 (and 4.4.1) generates this approximate code:

~/src $ /usr/lib/gcc-snapshot/bin/g++ -O3 -ftree-loop-ivcanon -fivopts -ftree-loop-im -fwhole-program -fipa-struct-reorg -fipa-matrix-reorg -fgcse-sm -fgcse-las -fgcse-after-reload --param max-gcse-memory=1 --param max-pending-list-length=10 folding-test-interface.cpp -o folding-test-interface_gcc450_20091018-O3-kitchen-sink
~/src $ objdump -Mintel -S folding-test-interface_gcc450_20091018-O3-kitchen-sink | less -p \

00400310 :
  400310: 53                      push   rbx
  400311: 48 83 ec 20             sub    rsp,0x20
  400315: 48 8d 5c 24 10          lea    rbx,[rsp+0x10]
  40031a: 48 c7 44 24 10 c0 04    mov    QWORD PTR [rsp+0x10],0x4004c0
  400321: 40 00
  400323: 48 c7 04 24 00 05 40    mov    QWORD PTR [rsp],0x400500
  40032a: 00
  40032b: 48 89 df                mov    rdi,rbx
  40032e: ff 15 8c 01 00 00       call   QWORD PTR [rip+0x18c]   # 4004c0 <_ZTV1X+0x10>
  400334: bf ac 04 40 00          mov    edi,0x4004ac
  400339: 0f b6 f0                movzx  esi,al
  40033c: 31 c0                   xor    eax,eax
  40033e: e8 a5 03 00 00          call   4006e8
  400343: 48 8b 44 24 10          mov    rax,QWORD PTR [rsp+0x10]
  400348: 48 89 df                mov    rdi,rbx
  40034b: ff 10                   call   QWORD PTR [rax]
  40034d: 0f b6 f0                movzx  esi,al
  400350: bf a4 04 40 00          mov    edi,0x4004a4
  400355: 31 c0                   xor    eax,eax
  400357: 83 c6 01                add    esi,0x1
  40035a: e8 89 03 00 00          call   4006e8
[...]

As seen here, GCC isn't folding/inlining the constants returned across the virtual function boundary, even though they are visible in the compilation unit and -O3 -fwhole-program is being used. (Note that I started with just that commandline, and added things in an attempt to induce the optimization I was hoping for.)

I was able to induce the optimization by removing a level of indirection in two ways: 1) by having two print() methods, one overloaded to accept X& and a second overload to accept Y&; and 2) by replacing the classes with single-level-indirection function pointers:

--
#include <stdio.h>

typedef unsigned char (*Calculable)(void);

unsigned char one() { return 1; }
unsigned char two() { return 2; }

static void print(Calculable calculate)
{
  printf("%d\n", calculate());
  printf("+1: %d\n", calculate() + 1);
}

int main()
{
  print(one);
  print(two);
  return 0;
}
--

For completeness, here is the code generated from the function-pointer example, which optimizes in the way I expect:

00400390 :
  400390: 48 83 ec 08             sub    rsp,0x8
  400394: ba 01 00 00 00          mov    edx,0x1
  400399: be e4 04 40 00          mov    esi,0x4004e4
  40039e: bf 01 00 00 00          mov    edi,0x1
  4003a3: 31 c0                   xor    eax,eax
  4003a5: e8 c6 02 00 00          call   400670 <__printf_...@plt>
  4003aa: ba 02 00 00 00          mov    edx,0x2
  4003af: be dc 04 40 00          mov    esi,0x4004dc
  4003b4: bf 01 00 00 00          mov    edi,0x1
  4003b9: 31 c0                   xor    eax,eax
  4003bb: e8 b0 02 00 00          call   400670 <__printf_...@plt>

Modifying this last example to include two function pointer indirections once again causes the optimization to be missed.
So, my questions are:

0) Am I missing some existing commandline parameter that would induce the optimization? (e.g. a bad connection between my chair and keyboard)
1) Is this a missed-optimization bug, or is this a missing feature?
2) Either way, what are the steps to correct the issue?

Thanks in advance for insights and/or help!

PS: I would test with a newer 4.5.0 build, but I'm having trouble bootstrapping. Any help with that email (sent yesterday) is appreciated, as well.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
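[Not what the poster asked for, but one workaround for the missed devirtualization is to move the indirection to compile time with a template, which preserves the unit-testable interface while removing the runtime dispatch. A minimal sketch, assuming the same X/Y class definitions as above:

#include <stdio.h>

template <typename C>                       // C supplies calculate() statically
static void print(C& c)
{
  printf("%d\n", c.calculate());            // resolved at compile time,
  printf("+1: %d\n", c.calculate() + 1);    // so the constants fold
}

int main()
{
  X x;
  Y y;
  print(x);   // instantiates print<X>
  print(y);   // instantiates print<Y>
  return 0;
}]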
Re: GCC 4.5 is uncompilable
Hey Dave, What OS are you bootstrapping on, and with which compiler/version? (Cygwin, I assume, but you never know ;>) I haven't been able to bootstrap for a few weeks, but no one answered my email asking for help (which probably got lost in the kernel-related fighting): http://gcc.gnu.org/ml/gcc/2009-11/msg00476.html

For the code in question, the uninitialized-variable warning (reported as an error) definitely looks valid. I'm surprised no one else has been seeing this, unless they aren't bootstrapping using 4.4.1 or above. Any help is appreciated -- I really want to get cracking on testing 4.5. Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
df_changeable_flags use in combine.c
Hi, I'm fixing some compiler errors when configuring with --enable-build-with-cxx, and ran into a curious line of code that may indicate a bug:

static unsigned int
rest_of_handle_combine (void)
{
  int rebuild_jump_labels_after_combine;

  df_set_flags (DF_LR_RUN_DCE + DF_DEFER_INSN_RESCAN);
  // ...
}

The DF_* values are from the df_changeable_flags enum, whose values are typically used in bitwise and/or operations for masking purposes. As such, I'm guessing the author may have meant to do:

df_set_flags (DF_LR_RUN_DCE | DF_DEFER_INSN_RESCAN);

I could have just added the explicit cast necessary to silence the gcc-as-cxx warning I was running into, but I wanted to be a good citizen :) Any pointers are appreciated, Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
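[For context, a minimal sketch -- with hypothetical flag values, not the actual df_changeable_flags definitions -- of why + happens to work in the quoted code: as long as the enumerators are distinct powers of two and no flag is combined with itself, a + b and a | b produce the same bit pattern, but | is the conventional, self-documenting choice:

#include <stdio.h>

enum flags {          /* hypothetical values */
  FLAG_A = 1 << 0,
  FLAG_B = 1 << 1
};

int main (void)
{
  /* identical results for disjoint bits... */
  printf ("%d %d\n", FLAG_A + FLAG_B, FLAG_A | FLAG_B);  /* 3 3 */
  /* ...but + breaks as soon as a bit repeats: */
  printf ("%d %d\n", FLAG_A + FLAG_A, FLAG_A | FLAG_A);  /* 2 1 */
  return 0;
}]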
[gcc-as-cxx] enum conversion to int
Hi, I'm trying to fix some errors/warnings to make sure that gcc-as-cxx doesn't bitrot too much. I ran into this issue, and am unsure how to fix it without really ugly casting:

enum df_changeable_flags
df_set_flags (enum df_changeable_flags changeable_flags)
{
  enum df_changeable_flags old_flags = df->changeable_flags;
  df->changeable_flags |= changeable_flags;
  return old_flags;
}

I'm getting this error on the second line of the function:

../../gcc-trunk/gcc/df-core.c: In function 'df_changeable_flags df_set_flags(df_changeable_flags)':
../../gcc-trunk/gcc/df-core.c:474: error: invalid conversion from 'int' to 'df_changeable_flags'

At first blush, it seems like df_changeable_flags should be a typedef to byte (or int, which is what it was being implicitly converted to everywhere), and the enum should be disbanded into individual #defines. I wanted to make sure that this wasn't a false positive first, though.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
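[A minimal sketch of the two usual C++-side fixes, assuming the enum and the df object from the snippet above; illustrative only, not the change that was eventually committed:

/* 1: cast at the assignment -- in C++ the |= expands to an
   int-typed expression, so spell it out: */
df->changeable_flags =
  (enum df_changeable_flags) (df->changeable_flags | changeable_flags);

/* 2: or overload the operator once for the enum type: */
static inline df_changeable_flags &
operator|= (df_changeable_flags &lhs, df_changeable_flags rhs)
{
  return lhs = (df_changeable_flags) ((int) lhs | (int) rhs);
}]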
Re: [gcc-as-cxx] enum conversion to int
On Tue, 5 Jan 2010, Ian Lance Taylor wrote:
> Matt writes:
>> I'm trying to fix some errors/warnings to make sure that gcc-as-cxx
>> doesn't bitrot too much. I ran into this issue, and am unsure how to
>> fix it without really ugly casting:
>>
>> enum df_changeable_flags
>> df_set_flags (enum df_changeable_flags changeable_flags)
>> {
>>   enum df_changeable_flags old_flags = df->changeable_flags;
>>   df->changeable_flags |= changeable_flags;
>>   return old_flags;
>> }
>
> On trunk df_set_flags looks like this:
>
> int
> df_set_flags (int changeable_flags)

Yes, what I pasted was a local change. I was trying to eliminate the implicit cast to int from the enum type, which was causing my --enable-werror build to fail. At this point, I think the better option would be to break up the enum values into individual #defines and do a typedef int df_changeable_flags;

> The gcc-in-cxx branch is no longer active. All the work was merged to
> trunk, where it is available via --enable-build-with-cxx. If you want
> to work on the gcc-in-cxx branch, start by merging from trunk.

Sorry, I didn't mean to imply I was working on the now-dead branch. I'm doing this work on trunk. I want the build-as-cxx option to work decently so that my profiledbootstrap exercises the C++ front end more, since that is what we compile all our code with here. As such, I'm building trunk to eliminate some of the cxx failures, and will submit a patch once it either builds completely or I've hit a brick wall. This should (hopefully) make for less work when the more invasive changes are started once trunk is open again.

PS: of course, it would be even better if profiledbootstrap allowed me to point at our build's makefile to generate the runtime profile.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: [gcc-as-cxx] enum conversion to int
On Tue, 5 Jan 2010, Ian Lance Taylor wrote:
> Matt writes:
>> Yes, what I pasted was a local change. I was trying to eliminate the
>> implicit cast to int from the enum type, which was causing my
>> --enable-werror build to fail. At this point, I think the better
>> option would be to break up the enum values into individual #defines
>> and do a typedef int df_changeable_flags;
>
> Don't use #defines. Enums give better debug info by default.
> typedef int df_changeable_flags is fine if that seems necessary.
> Right now the code simply doesn't use the df_changeable_flags type any
> time there is more than one flag.

Okay, good to know about the better debuggability of enums. If the flags are supposed to be mutually exclusive, then the code in my other email where two flags are added together seems contrary. Regardless, does this mean that the bitwise operations in set_flags and clear_flags could be changed to simple assignments? That would indeed fix this issue in a nice way.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: ICE building svn trunk on Ubuntu 9.x amd64
(now sending to gcc@ instead of gcc-help@, as suggested)

I have narrowed it down to this reduced commandline (the time is there just to show that it may take a while, but this particular issue doesn't cause a hang):

m...@hargett-755:~/src/gcc-obj/prev-gcc$ time /home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem /home/matt/x86_64-unknown-linux-gnu/include -isystem /home/matt/x86_64-unknown-linux-gnu/sys-include -c -O2 -ftree-loop-distribution -DIN_GCC -DHAVE_CONFIG_H -I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libdecnumber -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -Iyes/include -Iyes/include -DCLOOG_PPL_BACKEND ../../gcc-trunk/gcc/reload1.c -o reload1.o

../../gcc-trunk/gcc/reload1.c: In function 'delete_output_reload':
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.65146_650 = D.65145_651 - D.65141_624;
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.65154_658 = D.65153_659 - D.65149_647;
../../gcc-trunk/gcc/reload1.c:8391:1: internal compiler error: verify_stmts failed
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

real    9m25.630s
user    9m23.823s
sys     0m0.972s

-O0 -ftree-loop-distribution doesn't exhibit the problem, and neither does -O1 -ftree-loop-distribution. There's something about the combination of -O2 (or -O3) and -ftree-loop-distribution that causes the ICE on this particular file. I'll try bootstrapping without -ftree-loop-distribution and see if that works for me. If more information is needed, or I should file a bug report, let me know.

On Wed, 24 Jun 2009, Matt wrote:

Hi, I left my profiled bootstrap of svn r148885 to run overnight, and saw this in the morning:

/home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem /home/matt/x86_64-unknown-linux-gnu/include -isystem /home/matt/x86_64-unknown-linux-gnu/sys-include -c -O3 -floop-interchange -floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops -fprofile-generate -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition -Wc++-compat -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libdecnumber -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -Iyes/include -Iyes/include -DCLOOG_PPL_BACKEND ../../gcc-trunk/gcc/rtl.c -o rtl.o

../../gcc-trunk/gcc/reload1.c: In function 'delete_output_reload':
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.58046_964 = D.58045_963 - D.58041_946;
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.58054_972 = D.58053_971 - D.58049_967;
../../gcc-trunk/gcc/reload1.c:8391:1: internal compiler error: verify_stmts failed
Please submit a full bug report, with preprocessed source if appropriate.

This is using the 4:4.4.0-3ubuntu1 version of Ubuntu's gcc package on amd64. Here's my configure cmdline:

CFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops"
CPPFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops"
../gcc-trunk/configure --prefix=/home/matt --enable-stage1-checking=all --enable-bootstrap --enable-lto --enable-languages=c,c++ --with-ppl --with-cloog

and here's my make cmdline:

make BOOT_CFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block -findi
Re: Phase 1 of gcc-in-cxx now complete
> * Develop some trial patches which require C++, e.g., convert VEC to
>   std::vector.

Do you have any ideas for the easiest starting points? Is there anywhere that is decently self-contained, or will it have to be a big bang? I'd love to see this happen so there's more exercising of template expansion during the profiledbootstrap. If I can get pointed in the right direction, I can probably produce a patch within the next week. Thanks for this work and adding all the extra warnings!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
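[To make the scope of such a conversion concrete, a hypothetical before/after sketch -- illustrative only; actual VEC call sites vary, and a real migration would have to account for GCC's garbage collector:

/* before: GCC's C-style VEC macros */
VEC(tree,heap) *worklist = NULL;
VEC_safe_push (tree, heap, worklist, decl);
tree t = VEC_index (tree, worklist, 0);
unsigned n = VEC_length (tree, worklist);
VEC_free (tree, heap, worklist);

/* after: the std::vector equivalent */
std::vector<tree> worklist;
worklist.push_back (decl);
tree t = worklist[0];
unsigned n = worklist.size ();
/* storage released automatically by the destructor */]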
4.1.1 profiledbootstrap failure on amd64
I get this failure when trying to do a profiledbootstrap on amd64. This is a Gentoo Linux machine with gcc 3.4.4, glibc 2.3.5, binutils 2.16.1, autoconf 2.59, etc, etc.

make[6]: Entering directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
if [ -z "32" ]; then \
  true; \
else \
  rootpre=`${PWDCMD-pwd}`/; export rootpre; \
  srcrootpre=`cd ../../../gcc-4.1.1-20060517/libstdc++-v3; ${PWDCMD-pwd}`/; export srcrootpre; \
  lib=`echo ${rootpre} | sed -e 's,^.*/\([^/][^/]*\)/$,\1,'`; \
  compiler="/home/matt/src/gcc-bin/./gcc/xgcc -B/home/matt/src/gcc-bin/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include"; \
  for i in `${compiler} --print-multi-lib 2>/dev/null`; do \
    dir=`echo $i | sed -e 's/;.*$//'`; \
    if [ "${dir}" = "." ]; then \
      true; \
    else \
      if [ -d ../${dir}/${lib} ]; then \
        flags=`echo $i | sed -e 's/^[^;]*;//' -e 's/@/ -/g'`; \
        if (cd ../${dir}/${lib}; make "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" "CC_FOR_TARGET=/home/matt/src/gcc-bin/./gcc/xgcc -B/home/matt/src/gcc-bin/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include" "CFLAGS=-O2 -g -O2 " "CXXFLAGS=-g -O2 -D_GNU_SOURCE" "CFLAGS_FOR_BUILD=-g -O2" "CFLAGS_FOR_TARGET=-O2 -g -O2 " "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install -c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c" "LDFLAGS=" "LIBCFLAGS=-O2 -g -O2 " "LIBCFLAGS_FOR_TARGET=-O2 -g -O2 " "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 --split-size=500" "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/usr/local" "infodir=/usr/local/info" "libdir=/usr/local/lib" "includedir=/usr/local/include" "prefix=/usr/local" "tooldir=/usr/local/x86_64-unknown-linux-gnu" "gxx_include_dir=/usr/local/include/c++/4.1.1" "AR=ar" "AS=/home/matt/src/gcc-bin/./gcc/as" "LD=/home/matt/src/gcc-bin/./gcc/collect-ld" "RANLIB=ranlib" "NM=/home/matt/src/gcc-bin/./gcc/nm" "NM_FOR_BUILD=" "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" \
          CFLAGS="-O2 -g -O2 ${flags}" \
          FCFLAGS=" ${flags}" \
          FFLAGS=" ${flags}" \
          ADAFLAGS=" ${flags}" \
          prefix="/usr/local" \
          exec_prefix="/usr/local" \
          GCJFLAGS=" ${flags}" \
          CXXFLAGS="-g -O2 -D_GNU_SOURCE ${flags}" \
          LIBCFLAGS="-O2 -g -O2 ${flags}" \
          LIBCXXFLAGS="-g -O2 -D_GNU_SOURCE -fno-implicit-templates ${flags}" \
          LDFLAGS=" ${flags}" \
          MULTIFLAGS="${flags}" \
          DESTDIR="" \
          INSTALL="/usr/bin/install -c" \
          INSTALL_DATA="/usr/bin/install -c -m 644" \
          INSTALL_PROGRAM="/usr/bin/install -c" \
          INSTALL_SCRIPT="/usr/bin/install -c" \
          all); then \
          true; \
        else \
          exit 1; \
        fi; \
      else true; \
      fi; \
    fi; \
  done; \
fi
make[7]: Entering directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/32/libstdc++-v3'
make[7]: *** No rule to make target `all'. Stop.
make[7]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/32/libstdc++-v3'
make[6]: *** [multi-do] Error 1
make[6]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[5]: *** [all-multi] Error 2
make[5]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[2]: *** [all-target-libstdc++-v3] Error 2
make[2]: Leaving directory `/home/matt/src/gcc-bin'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/matt/src/gcc-bin'
make: *** [profiledbootstrap] Error 2
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: build failure, GMP not available
I have been struggling with this issue, and now that I have successfully built GCC I thought I would share my results. Hopefully it can help someone better versed in autotools to improve the build of GCC with GMP/MPFR.

For reference, a few older threads I've found:
http://gcc.gnu.org/ml/gcc/2006-01/msg00333.html
http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg00723.html

The long and short of it: my builds of the latest versions of GMP and MPFR were perfectly fine, although not ideal for building GCC. However, the GCC 4.1.1 configure script incorrectly decided that it _had_ located useful copies of GMP and MPFR, while in fact the GFortran build fails 90 minutes later with the error message (as in the second thread above): "../.././libgfortran/mk-kinds-h.sh: Unknown type"

This was configuring GCC via:

../srcdir/configure --with-gmp=/usr/local/lib64 --with-mpfr=/usr/local/lib64

I now understand that this is a mis-use of these options; however, recall configure was successful (I still do not understand why), while configure failed with the 'correct' options '--with-gmp=/usr/local --with-mpfr=/usr/local' (because *.h are in /usr/local/include, but *.a are in /usr/local/lib64).

I was finally successful by using the build directories rather than the installed libraries via:

../srcdir/configure --with-gmp-dir=/usr/local/gmp --with-mpfr-dir=/usr/local/mpfr

but only after I made the symlink:

ln -s /usr/local/mpfr/.libs/libmpfr.a /usr/local/mpfr/libmpfr.a

One issue here is that '--with-mpfr=path' assumes that 'libmpfr.a' is in 'path/lib' (not true for how I installed it), while '--with-mpfr-dir=path' assumes that 'libmpfr.a' is in 'path', rather than 'path/.libs' (can this work for anyone?). Note that '--with-gmp-dir=path' does look in 'path/.libs'.

This is all on RHEL4 x86_64. Note I am new to x86_64 and multilibs -- this certainly added to my difficulties. The machine does have older versions of GMP and MPFR installed in /usr/lib and /usr/lib64, while I had installed the latest versions in /usr/local (with the libraries in /usr/local/lib64). I would also note that GMP unfortunately hard-codes the bitness of the libraries in gmp.h, and that the older system /usr/include/gmp.h identifies itself as 64-bit (there are no #define switches as I would have expected).

My comments:

1) It would have been very useful to have explicit configure options such as --with-gmp-lib=path and --with-gmp-include=path (etc) that explicitly locate the *.a and *.h directories, rather than (or in addition to) the existing "install directory" and "build directory" options.

2) Ideally IMHO the top-level configure (or at least the libgfortran configure) would test the execution of some or all of the required functions in GMP/MPFR. I vaguely recall that this is possible with autoconf, and should be more robust. Would it add too much complexity to the top-level configure?

Thanks, - Matt
Re: build failure, GMP not available
>From: "Kaveh R. GHAZI" <[EMAIL PROTECTED]>
>> Matt Fago wrote:
>> One issue here is that '--with-mpfr=path' assumes that 'libmpfr.a' is
>> in 'path/lib' (not true for how I installed it), while
>> '--with-mpfr-dir=path' assumes that 'libmpfr.a' is in 'path', rather
>> than 'path/.libs' (can this work for anyone?). Note that
>> '--with-gmp-dir=path' does look in 'path/.libs'.
>
>This problem appears in the 4.0 series all the way through current
>mainline. I do believe it should be fixed and it is simple to do so. I'll
>take care of it.
>
>> My comments:
>>
>> 1) It would have been very useful to have explicit configure options
>> such as --with-gmp-lib=path and --with-gmp-include=path (etc) that
>> explicitly locate the *.a and *.h directories, rather than (or in
>> addition to) the existing "install directory" and "build directory"
>> options.
>
>Yes, the configure included in mpfr itself has this for searching for GMP
>which it relies on. I'll add something for this in GCC also. Thank you.
>
>> 2) Ideally IMHO the top-level configure (or at least the libgfortran
>> configure) would test the execution of some or all of the required
>> functions in GMP/MPFR. I vaguely recall that this is possible with
>> autoconf, and should be more robust. Would it add too much complexity
>> to the top-level configure?
>
>I tend to be reluctant about run tests because they don't work with a
>cross-compiler. Would you please tell me specifically what problem
>checking at runtime would prevent that the existing compile test doesn't
>detect?

Yes, a cross-compiler could not do runtime tests. I was trying to think of a more robust configuration-time test. This is difficult, as I do not quite understand why configure was successful in finding the libraries with the correct versions, but yet the compilation itself failed. Would a link test against all of the required GMP/MPFR functions (via AC_CHECK_LIB etc) offer anything?

Thanks, - Matt
Re: Bootstrap broken on x86_64 on the trunk in libgfortran?
>> ../../../trunk/libgfortran/mk-kinds-h.sh: Unknown type
>> grep '^#' < kinds.h > kinds.inc
>> /bin/sh: kinds.h: No such file or directory
>> make[2]: *** [kinds.inc] Error 1
>> make[2]: Leaving directory
>> `/home/daney/gccsvn/native-trunk/x86_64-unknown-linux-gnu/libgfortran'
>> make[1]: *** [all-target-libgfortran] Error 2
>> make[1]: Leaving directory `/home/daney/gccsvn/native-trunk'
>> make: *** [all] Error 2
>
>Usually (like 99% of the time), this means your GMP/MPFR are broken
>and are causing gfortran to crash out.

I think the patch concept below may help with these issues. The idea was to make configure try to link to libmpfr using the functions only in mpfr 2.2.0 or greater that GCC is currently using (those I could find, anyhow). Previously, configure could succeed if any version of libmpfr was available so long as the header was the correct version (this is likely on x86_64).

Please excuse any formatting issues -- this is my first patch. I have neither SVN access nor a copyright assignment, but this is a short patch. Would someone be willing to help test and possibly apply? Thanks! Matt

--- configure.in	(Revision 119232)
+++ configure.in	(Working Copy)
@@ -1123,7 +1123,12 @@ if test x"$have_gmp" = xyes; then
 #if MPFR_VERSION_MAJOR < 2 || (MPFR_VERSION_MAJOR == 2 && MPFR_VERSION_MINOR < 2)
 choke me
 #endif
-	mpfr_t n; mpfr_init(n);
+	int t;
+	mpfr_t n, x;
+	mpfr_init (n); mpfr_init (x);
+	mpfr_atan2 (n, n, x, GMP_RNDN);
+	mpfr_erfc (n, x, GMP_RNDN);
+	mpfr_subnormalize (x, t, GMP_RNDN);
 ], [AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]); have_gmp=no])
 	LIBS="$saved_LIBS"
 fi
Re: mpfr issues when Installing gcc 3.4 on fedora core
You do mean gcc 4.3, right (either a snapshot, or from svn)? Since you're running on x86_64, do you know that the libraries are the correct bitness (running 'file' on the mpfr and gmp libraries will tell)? By default gcc on x86_64 will build 64-bit, but libraries in /usr/local/lib should only be 32-bit (versus /usr/local/lib64). The linker will ignore any 32-bit libraries when linking a 64-bit executable.

How did you install gmp/mpfr (note the package from fedora is broken -- very old)? It took me quite a while to get 4.1 with fortran installed on RHEL until I got this all sorted out (I was new to multilibs). I just upgraded to fc6 and was able to install gcc from svn once I used --with-gmp-lib=/usr/local/lib64 (etc for include and mpfr) and set LD_LIBRARY_PATH=/usr/local/lib64 appropriately. Alternatively one could (carefully!) set up /etc/ld.so.conf and run ldconfig (I did this on RHEL).

I might be able to help tomorrow AM (US mountain time) if you email me directly. FWIW, I understand the reason to keep mpfr out of the gcc tree, but not bundling it makes gcc more difficult to bootstrap for a novice such as myself. Fedora's outdated gmp/mpfr package doesn't help either ... - Matt
Re: mpfr issues when Installing gcc 3.4 on fedora core
> drizzle drizzle wrote:
> And as matt suggested if mpfr is not needed by 3.4, how can I
> configure it that way. --disable-mpfr did not help.

MPFR should not have _anything_ to do with any gcc prior to 4.x. Where did you get gcc 3.4? A tarball from a gnu mirror or somewhere else? I think either the tarball is misnamed or something is terribly wrong with it.

> checking if gmp.h version and libgmp version are the same... (4.2.1/4.1.4) no
> configure: WARNING: 'gmp.h' and 'libgmp' seems to have different versions or
> configure: WARNING: we cannot run a program linked with GMP (if you cannot
> configure: WARNING: see the version numbers above).
> configure: WARNING: However since we can't use 'libtool' inside the configure,
> configure: WARNING: we can't be sure. See 'config.log' for details.

This means that mpfr needs to be told where gmp is and was probably not built correctly. When you configure mpfr, use the options:

  --with-gmp-include=DIR  GMP include directory
  --with-gmp-lib=DIR      GMP lib directory

Make sure these point to the lib and include directories with the new version of gmp. You can also use --libdir=/usr/local/lib64 if you wish to install the 64-bit libraries there instead of ../lib.

Note that fedora installs a 'bad' version of gmp 4.1.4 that includes a very old copy of mpfr. You seem to be picking up the library from this one. - Matt
Re: mpfr issues when Installing gcc 3.4 on fedora core
>From: drizzle drizzle <[EMAIL PROTECTED]>
>Still no luck so far ... I got the gcc 3.4 from the gcc archive. Any way
>I can make gcc 3.4 not use these libraries?

What is the exact file name and URL? I will download the same tarball and try to build it on my fc6 box. - M
Re: mpfr issues when Installing gcc 3.4 on fedora core
>From: drizzle drizzle <[EMAIL PROTECTED]>
>
>svn -q checkout svn://gcc.gnu.org/svn/gcc/trunk gcc_3_4_6_release

This is checking out the latest trunk, not version 3.4. The last argument only changes the name of the directory on your local machine. The path in the 'svn://' URL is what specifies the branch or tag being checked out (in this case 'trunk'). - Matt
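[For reference, a checkout of the actual 3.4.6 release would name the release tag in the URL path -- a sketch, assuming the standard layout of the GCC Subversion repository:

svn -q checkout svn://gcc.gnu.org/svn/gcc/tags/gcc_3_4_6_release gcc-3.4.6]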
gcc gcov and --coverage on x86_64
Having searched in bugzilla and asked on gcc-help to no avail ... gcc --coverage appears to be broken on x86_64 in gcc 4.1.1 on FC6 (works fine with Trunk). I'm almost certain that this is a known issue, but cannot find a reference in Bugzilla. Could someone please give me a pointer to the bug? Thanks, Matt
Re: gcc gcov and --coverage on x86_64
>From: Ben Elliston <[EMAIL PROTECTED]> >> gcc --coverage appears to be broken on x86_64 in gcc 4.1.1 on FC6 >> (works fine with Trunk). I'm almost certain that this is a known >> issue, but cannot find a reference in Bugzilla. > >I implemented that option, so can probably help you. Contact me in >private mail and we'll try and troubleshoot it. If necessary, you can >then file a bug report. FYI, this is an issue with ccache and not gcc (I forgot about that possibility). Guess it's time to dig into ccache. Thanks, Matt
VAX backend status
Over the past several weeks, I've revamped the VAX backend:

- fixed various bugs
- improved 64bit move, add, subtract code.
- added patterns for ffs, bswap16, bswap32, sync_lock_test_and_set, and sync_lock_release
- modified it to generate PIC code.
- fixed the dwarf2 output so it is readonly in shared libraries.
- moved the constraints from vax.h to constraints.md
- moved predicates to predicates.md
- added several peephole and peephole2 patterns

So the last major change to make the VAX backend completely modern is to remove the need for "HAVE_cc0". However, even instructions that modify the CC don't always change all the CC bits; some instructions preserve certain bits. I'd like to do this but currently it's above my level of gcc expertise.

Should the above be submitted as one megapatch? Or as a dozen or two smaller patches?

And finally a few musings ... I've noticed a few things in doing the above. GCC 4.x doesn't seem to do CSE on addresses. Because the VAX binutils doesn't support non-local symbols with a non-zero addend in the GOT, PIC will do a define_expand so that (const (plus (symbol_ref) (const_int))) will be split into separate instructions. However, gcc doesn't seem to be able to take advantage of that. For instance, gcc emits:

movab rpb,%r0
movab 100(%r0),%r1
cvtwl (%r1),%r0

but the "movab 100(%r0),%r1" is completely unneeded; this should have been emitted as:

movab rpb,%r0
cvtwl 100(%r0),%r0

I could add peepholes to find these and fix them but it would be nice if the optimizer could do that for me.

Another issue is that gcc has become "stupider" when it comes to using indexed addressing. For example:

static struct {
  void (*func)(void *);
  void *arg;
  int inuse;
} keys[64];
int nextkey;

int setkey(void (*func)(void *), void *arg)
{
  int i;
  for (i = nextkey; i < 64; i++) {
    if (!keys[i].inuse)
      goto out;
  }

emits:

movl nextkey,%r3
cmpl %r3,$63
jgtr .L38
mull3 %r3,$12,%r0
movab keys+8[%r0],%r0
tstl (%r0)

The last 3 instructions should have been:

mull3 %r3,$3,%r0
tstl keys+8[%r0]
[RFA] Invalid mmap(2) assumption in pch (ggc-common.c)
Running the libstdc++ testsuite on NetBSD/sparc or NetBSD/sparc64 results in most tests failing like:

:1: fatal error: had to relocate PCH
compilation terminated.
compiler exited with status 1

This is due to a misassumption in ggc-common.c:654 (mmap_gt_pch_use_address): this version assumes that the kernel honors the START operand of mmap even without MAP_FIXED if START through START+SIZE are not currently mapped with something. That is not true for NetBSD. Due to MMU idiosyncrasies, some architectures (like sparc and sparc64) will align mmap requests that don't have MAP_FIXED set, for architecture-specific reasons. Is there a reason why MAP_FIXED isn't used even though it probably should be?
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
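[To make the distinction concrete, a minimal sketch of the two behaviors, assuming ordinary POSIX mmap(2) semantics -- not the actual ggc-common.c code:

#include <sys/mman.h>
#include <stddef.h>

/* Map 'size' bytes of 'fd' at address 'base'.  Without MAP_FIXED,
   'base' is only a hint: NetBSD (e.g. on sparc/sparc64) may round the
   address for MMU reasons even when the range is free, and the PCH
   reader must then relocate.  With MAP_FIXED the mapping lands exactly
   at 'base' or the call fails -- but it also silently replaces any
   existing mapping in the range, which is the usual reason generic
   code avoids it.  */
static void *
map_pch_segment (void *base, size_t size, int fd, int use_map_fixed)
{
  int flags = MAP_PRIVATE | (use_map_fixed ? MAP_FIXED : 0);
  return mmap (base, size, PROT_READ | PROT_WRITE, flags, fd, 0);
}]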
[PATCH] VAX: cleanup; move macros from config/vax/vax.h to normal in config/vax/vax.c
This doesn't change any functionality; it just moves and cleans up a large number of complicated macros in vax.h to normal C code in vax.c. It's the first major step to integrating PIC support that I did for gcc 2.95.3. It also switches from using SYMBOL_REF_FLAG to SYMBOL_REF_LOCAL_P. Committed.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.

2005-03-26  Matt Thomas  <[EMAIL PROTECTED]>

	* config/vax/vax.c (legitimate_constant_address_p): New. Formerly
	CONSTANT_ADDRESS_P in config/vax/vax.h
	(legitimate_constant_p): New. Formerly CONSTANT_P in vax.h.
	(INDEX_REGISTER_P): New.
	(BASE_REGISTER_P): New.
	(indirectable_constant_address_p): New. Adapted from
	INDIRECTABLE_CONSTANT_ADDRESS_P in vax.h. Use SYMBOL_REF_LOCAL_P.
	(indirectable_address_p): New. Adapted from INDIRECTABLE_ADDRESS_P
	in vax.h.
	(nonindexed_address_p): New. Adapted from GO_IF_NONINDEXED_ADDRESS
	in vax.h.
	(index_temp_p): New. Adapted from INDEX_TERM_P in vax.h.
	(reg_plus_index_p): New. Adapted from GO_IF_REG_PLUS_INDEX in vax.h.
	(legitimate_address_p): New. Adapted from GO_IF_LEGITIMATE_ADDRESS
	in vax.h
	(vax_mode_dependent_address_p): New. Adapted from
	GO_IF_MODE_DEPENDENT_ADDRESS in vax.h
	* config/vax/vax.h (CONSTANT_ADDRESS_P): Use
	legitimate_constant_address_p
	(CONSTANT_P): Use legitimate_constant_p.
	(INDIRECTABLE_CONSTANT_ADDRESS_P): Removed.
	(INDIRECTABLE_ADDRESS_P): Removed.
	(GO_IF_NONINDEXED_ADDRESS): Removed.
	(INDEX_TEMP_P): Removed.
	(GO_IF_REG_PLUS_INDEX): Removed.
	(GO_IF_LEGITIMATE_ADDRESS): Use legitimate_address_p. Two
	definitions, depending on whether REG_OK_STRICT is defined.
	(GO_IF_MODE_DEPENDENT_ADDRESS): Use vax_mode_dependent_address_p.
	Two definitions, depending on whether REG_OK_STRICT is defined.
	* config/vax/vax-protos.h (legitimate_constant_address_p): Prototype
	added.
	(legitimate_constant_p): Prototype added.
	(legitimate_address_p): Prototype added.
	(vax_mode_dependent_address_p): Prototype added.

Index: vax.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/vax/vax.c,v
retrieving revision 1.60
diff -u -3 -p -r1.60 vax.c
--- vax.c	7 Apr 2005 21:44:57 -0000	1.60
+++ vax.c	26 Apr 2005 20:45:42 -0000
@@ -1100,3 +1100,227 @@ vax_output_conditional_branch (enum rtx_
     }
 }
 
+/* 1 if X is an rtx for a constant that is a valid address.  */
+
+int
+legitimate_constant_address_p (rtx x)
+{
+  return (GET_CODE (x) == LABEL_REF || GET_CODE (x) == SYMBOL_REF
+	  || GET_CODE (x) == CONST_INT || GET_CODE (x) == CONST
+	  || GET_CODE (x) == HIGH);
+}
+
+/* Nonzero if the constant value X is a legitimate general operand.
+   It is given that X satisfies CONSTANT_P or is a CONST_DOUBLE.  */
+
+int
+legitimate_constant_p (rtx x ATTRIBUTE_UNUSED)
+{
+  return 1;
+}
+
+/* The other macros defined here are used only in legitimate_address_p ().  */
+
+/* Nonzero if X is a hard reg that can be used as an index
+   or, if not strict, if it is a pseudo reg.  */
+#define	INDEX_REGISTER_P(X, STRICT) \
+  (GET_CODE (X) == REG && (!(STRICT) || REGNO_OK_FOR_INDEX_P (REGNO (X))))
+
+/* Nonzero if X is a hard reg that can be used as a base reg
+   or, if not strict, if it is a pseudo reg.  */
+#define	BASE_REGISTER_P(X, STRICT) \
+  (GET_CODE (X) == REG && (!(STRICT) || REGNO_OK_FOR_BASE_P (REGNO (X))))
+
+#ifdef NO_EXTERNAL_INDIRECT_ADDRESS
+
+/* Re-definition of CONSTANT_ADDRESS_P, which is true only when there
+   are no SYMBOL_REFs for external symbols present.  */
+
+static int
+indirectable_constant_address_p (rtx x)
+{
+  if (!CONSTANT_ADDRESS_P (x))
+    return 0;
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP ((x), 0)) == PLUS)
+    x = XEXP (XEXP (x, 0), 0);
+  if (GET_CODE (x) == SYMBOL_REF && !SYMBOL_REF_LOCAL_P (x))
+    return 0;
+
+  return 1;
+}
+
+#else /* not NO_EXTERNAL_INDIRECT_ADDRESS */
+
+static int
+indirectable_constant_address_p (rtx x)
+{
+  return CONSTANT_ADDRESS_P (x);
+}
+
+#endif /* not NO_EXTERNAL_INDIRECT_ADDRESS */
+
+/* Nonzero if X is an address which can be indirected.  External symbols
+   could be in a sharable image library, so we disallow those.  */
+
+static int
+indirectable_address_p (rtx x, int strict)
+{
+  if (indirectable_constant_address_p (x))
+    return 1;
+  if (BASE_REGISTER_P (x, strict))
+    return 1;
+  if (GET_CODE (x) == PLUS
+      && BASE_REGISTER_P (XEXP (x, 0), stric
GCC 4.1: Buildable on GHz machines only?
Over the past month I've been making sure that GCC 4.1 works on NetBSD. I've completed bootstraps on sparc, sparc64, arm, x86_64, i386, alpha, mipsel, mipseb, and powerpc. I've done cross-build targets for vax. Results have been sent to gcc-testsuite.

The times to complete bootstraps on older machines have been bothering me. It took nearly 72 hours for a 233MHz StrongArm with 64MB to complete a bootstrap (with libjava). It took over 48 hours for a 120MHz MIPS R4400 (little endian) with 128MB to finish (without libjava) and a bit over 24 hours for a 250MHz MIPS R4400 (big endian) with 256MB to finish (again, no libjava). That doesn't even include the time to run the testsuites. I have a 50MHz 68060 with 96MB of memory (MVME177) approaching 100 hours (48 hours just to exit stage3 and start on the libraries) doing a bootstrap, knowing that it's going to die when doing the ranlib of libjava. The kernel for the 060 isn't configured with a large enough dataspace to complete the ranlib.

Most of the machines I've listed above are relatively powerful machines near the apex of performance of their target architecture. And yet GCC 4.1 can barely be bootstrapped on them. I do most of my GCC work on a 2GHz x86_64 because it's so fast. I'm afraid the widespread availability of such fast machines hides the fact that the current performance of GCC on older architectures is appalling.

I'm going to run some bootstraps with --disable-checking just to see how much faster they are. I hope I'm going to be pleasantly surprised but I'm not counting on it.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Richard Henderson wrote: > On Tue, Apr 26, 2005 at 10:57:07PM -0400, Daniel Jacobowitz wrote: > >>I would expect it to be drastically faster. However this won't show up >>clearly in the bootstrap. The, bar none, longest bit of the bootstrap >>is building stage2; and stage1 is always built with optimization off and >>(IIRC) checking on. > > > Which is why I essentially always supply STAGE1_CFLAGS='-O -g' when > building on risc machines. Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was already doing) only decreased the bootstrap time by 10%. By far, the longest bit of the bootstrap is building libjava. -- Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
[RFA] Which is better? More and simpler patterns? Fewer patterns with more embedded code?
Back when I modified gcc 2.95.3 to produce PIC code for NetBSD/vax, I changed the patterns in vax.md to be more specific about the instructions that got matched. The one advantage (to me as the writer) was it made it much easier to track down what pattern caused what instruction to be emitted. For instance:

(define_insn "*pushal"
  [(set (match_operand:SI 0 "push_operand" "=g")
        (match_operand:SI 1 "address_operand" "p"))]
  ""
  "pushal %a1")

I like the more-and-simpler-patterns approach but I'm wondering what the general recommendation is?
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Gary Funck wrote:
>> -----Original Message-----
>> From: Matt Thomas
>> Sent: Tuesday, April 26, 2005 10:42 PM
> [...]
>> Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was
>> already doing) only decreased the bootstrap time by 10%. By far, the
>> longest bit of the bootstrap is building libjava.
>
> Is it fair to compare current build times, with libjava included,
> against past build times when it didn't exist? Would a closer
> apples-to-apples comparison be to bootstrap GCC Core only on
> the older sub-GHz platforms?

libjava is built on everything but vax and mips. Bootstrapping core might be better, but doing the configure on the fly isn't as easy as it used to be. It would be nice if bootstrap emitted timestamps when it was started and when it completed a stage so one could just look at the make output. Regardless, GCC 4.1 is a computational pig.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
David Edelsohn wrote:
>>>>>> Matt Thomas writes:
> Matt> Regardless, GCC 4.1 is a computational pig.
>
> If you are referring to the compiler itself, this has no basis in
> reality. If you are referring to the entire compiler collection,
> including runtimes, you are not using a fair comparison or are making
> extreme statements without considering the cause.

When I see the native stage2 m68k compiler spend 30+ minutes compute bound, with no paging activity, compiling a single source file, I believe that is an accurate term. Compiling stage3 on a 50MHz 68060 took 18 hours. (That 30 minutes was for fold-const.c, if you care to know.) At some points, I had no idea whether GCC had gone into an infinite loop due to a bug or was actually doing what it was supposed to.

> GCC now supports C++, Fortran 90 and Java. Those languages have
> extensive, complicated runtimes. The GCC Java environment is becoming
> much more complete and standards compliant, which means adding more and
> more features.

That's all positive, but if GCC also becomes too expensive to build then all those extra features become worthless. What is the slowest system that GCC has been recently bootstrapped on?

> If your point is that fully supporting modern, richly featured
> languages results in a longer build process, that is correct. Using
> disparaging terms like "pig" is missing the point. As others have pointed
> out, if you do not want to build some languages and runtimes, you can
> disable them. GCC is providing features that users want and that has a
> cost.

Yes, they have a cost, but the cost is mitigated by running on fast processors. They are just so fast they can hide inefficiencies and bloat. We have seen that for NetBSD and it's just as true for GCC or any other software. These slower processors provide useful feedback, but only if a GCC bootstrap is attempted on them on a semi-regular basis. Am I the only person who has attempted to do a native bootstrap on a system as slow as an M68k? I thought about doing a bootstrap on a MicroSPARC-based system but instead I decided to use an UltraSPARC-IIi system running with a 32-bit kernel.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Jonathan Wakely wrote:
> On Wed, Apr 27, 2005 at 08:05:39AM -0700, Matt Thomas wrote:
>> David Edelsohn wrote:
>>> GCC now supports C++, Fortran 90 and Java. Those languages have
>>> extensive, complicated runtimes. The GCC Java environment is becoming
>>> much more complete and standards compliant, which means adding more
>>> and more features.
>>
>> That's all positive but if GCC also becomes too expensive to build
>> then all those extra features become worthless.
>
> Worthless to whom?

To users of that platform who can no longer afford to build GCC.

> The features under discussion are new, they didn't exist before.

And because they never existed before, their cost for older platforms may not have been correctly assessed. If no one builds natively on older platforms, the recognition that the new features may be a problem for older platforms will never be made.

> If you survived without them previously you can do so now.
> (i.e. don't build libjava if your machine isn't capable of it)

Yes, you can skip building libjava. But can you skip building GCC? Will GCC 3.x be supported forever? If not, your compiler may have to rely on being cross-built. Being able to do a bootstrap is useful and is part of the expected GCC testing, but when it can only be done once or twice a week, it becomes a less practical test method.

> But claiming it's "worthless" when plenty of people are using it is
> just, well ... worthless.

Depends on your point of view.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Mike Stump wrote:
> On Apr 26, 2005, at 11:12 PM, Matt Thomas wrote:
>> It would be nice if bootstrap emitted timestamps when it was started
>> and when it completed a stage so one could just look at the make
>> output.
>
> You can get them differenced for free by using: time make bootstrap

I know that. But it only works overall. I want the per-stage times. Here's a sparc64--netbsd full bootstrap including libjava (the machine has 640MB and was doing nothing but building gcc):

    25406.01 real  21249.17 user  6283.15 sys
           0 maximum resident set size
           0 average shared memory size
           0 average unshared data size
           0 average unshared stack size
    54689526 page reclaims
        5349 page faults
         110 swaps
         723 block input operations
      377302 block output operations
          52 messages sent
          52 messages received
      285329 signals received
     1037478 voluntary context switches
      253151 involuntary context switches

-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
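[One low-tech way to approximate per-stage times without touching the Makefiles -- a sketch, assuming GNU awk is available for strftime():

# prefix every line of make output with a wall-clock timestamp;
# grepping the log for stage boundaries then gives per-stage times
make bootstrap 2>&1 | gawk '{ print strftime("%H:%M:%S"), $0 }' | tee make.log]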
Re: GCC 4.1: Buildable on GHz machines only?
Someone complained I was unfair in my gcc bootstrap times since some builds included libjava/gfortran and some did not. So in the past day, I've done bootstraps with just c,c++,objc on both 3.4 and gcc 4.1. I've put the results on a web page at http://3am-software.com/gcc-speed.html. The initial bootstrap compiler was gcc 3.3 and they are all running off the same base of NetBSD 3.99.3. While taking out fortran and java reduced the disparity, there is still a large increase in bootstrap times from 3.4 to 4.1.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Joe Buck wrote:
> I think you need to talk to the binutils people. It should be possible
> to make ar and ld more memory-efficient.

Even though systems may be demand paged, having super-large libraries that consume lots of address space can be a problem. I'd like to see libjava split into multiple shared libraries. In C, we have libc, libm, libpthread, etc. In X11, there's X11, Xt, etc. So why does Java have everything in one shared library? Could the Swing stuff be moved to its own? Are there other logical divisions?

Unlike other modern systems with a two-level page table structure, the VAX uses a single level of page table indirection. This greatly reduces the amount of address space a process can efficiently use. If there are components that will not be needed by some Java programs, it would be nice if they could be separated into their own shared libraries.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Use $(VARRAY_H) in dependencies?
Howdy, The rules for c-objc-common.o, loop-unroll.o, and tree-inline.o include $(VARRAY_H), which is never defined, in their dependency lists. The rest of the targets that depend on varray.h include varray.h in their dependency list. varray.h includes machmode.h, system.h, coretypes.h, and tm.h, so Makefile.in should define and use VARRAY_H, right? -- Matt
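[A sketch of what that would look like, following the conventions Makefile.in already uses for other headers; the exact dependency list is an assumption and would need to match what varray.h actually includes:

VARRAY_H = varray.h $(MACHMODE_H) $(SYSTEM_H) coretypes.h $(TM_H)

c-objc-common.o : c-objc-common.c $(CONFIG_H) $(VARRAY_H)
loop-unroll.o : loop-unroll.c $(CONFIG_H) $(VARRAY_H)
tree-inline.o : tree-inline.c $(CONFIG_H) $(VARRAY_H)]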
Re: Use $(VARRAY_H) in dependencies?
On Sun, May 08, 2005 at 07:31:38PM -0700, Matt Kraai wrote:
> On Mon, May 09, 2005 at 03:03:23AM +0100, Paul Brook wrote:
> > On Monday 09 May 2005 02:26, Matt Kraai wrote:
> > > Howdy,
> > >
> > > The rules for c-objc-common.o, loop-unroll.o, and tree-inline.o
> > > include $(VARRAY_H), which is never defined, in their dependency
> > > lists. The rest of the targets that depend on varray.h include
> > > varray.h in their dependency list.
> > >
> > > varray.h includes machmode.h, system.h, coretypes.h, and tm.h, so
> > > Makefile.in should define and use VARRAY_H, right?
> >
> > Already one step ahead of you :-)
> >
> > 2005-05-07  Paul Brook  <[EMAIL PROTECTED]>
> >
> >     * Makefile.in: Fix dependencies.
> >     (GCOV_IO_H, VARRAY_H): Set.
>
> Great.

The dependencies for the rules for build/genautomata.o, build/varray.o, and gtype-desc.o still include varray.h instead of $(VARRAY_H). Is this on purpose? If so, why? -- Matt
Targets
Hello: I was wondering if the team could add the following targets to GCC\G++\G77, basically making it even more cross-platform compliant and emulator friendly, e.g. adding the following cpu series: 8080, z80, 6502, 6800, and cpm/8000? :) Maybe OS-specific libraries too (eg CP/M-86\CP/M-86). Also, does G77 support Fortran-66? PS: Can I help in any way (testing the mingw port)? I don't have linux\bsd\unix\vms\os/2 or mac, just windows and dos. Matt Ritchie
bounty available for porting AVR backend to MODE_CC
Hi All, I don't subscribe but wanted developers to know there is a bounty available for porting the gcc AVR backend to use MODE_CC. Here is the reference: https://www.bountysource.com/issues/84630749-avr-convert-the-backend-to-mode_cc-so-it-can-be-kept-in-future-releases And this is a reference to the discussion on avrfreaks.net: https://www.avrfreaks.net/forum/avr-gcc-and-avr-g-are-deprecated-now Matt
Function attribute((optimize(...))) ignored on inline functions?
I'd like to tell gcc that it's okay to inline functions (such as rintf(), to get the SSE4.1 roundss instruction) at particular call sites without compiling the entire source file or calling function with different CFLAGS. I attempted this by making inline wrapper functions annotated with attribute((optimize(...))), but it appears that the annotation does not apply to inline functions? Take for example, ex.c:

#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
  return rintf(x);
}

float rintf_wrapper_inline_call(float x)
{
  return rintf_wrapper_inline(x);
}

float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
  return rintf(x);
}

% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <rintf_wrapper_inline_call>:
   0: e9 00 00 00 00          jmpq   5 <rintf_wrapper_inline_call+0x5>
   5: 66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
   c: 00 00 00 00

0000000000000010 <rintf_wrapper>:
  10: 66 0f 3a 0a c0 04       roundss $0x4,%xmm0,%xmm0
  16: c3                      retq

whereas I expected that rintf_wrapper_inline_call would be the same as rintf_wrapper. I've read that per-function optimization is broken [1]. Is this still the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html
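[Not an answer to the per-function-optimization question, but one way to get roundss at specific call sites regardless of -ftrapping-math is to spell it with the SSE4.1 intrinsic; a sketch, assuming the file is compiled with -msse4.1 (or the wrapper is given __attribute__((target("sse4.1")))):

#include <smmintrin.h>

/* round with a single roundss, independent of the -ftrapping-math
   setting in effect at the call site */
static inline float rintf_sse41(float x)
{
  __m128 v = _mm_set_ss(x);
  v = _mm_round_ss(v, v, _MM_FROUND_CUR_DIRECTION);  /* roundss $0x4 */
  return _mm_cvtss_f32(v);
}]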
RFA: [VAX] SUBREG of MEM with a mode dependent address
GCC 4.8 for VAX is generating a subreg:HI of a mem:SI indexed address. This eventually gets caught by an assert in change_address_1. Since the MEM rtx is SI, legitimate_address_p thinks it's fine. I have a change to vax.md which catches these, but it's extremely ugly and I have to think there's a better way. But I have to wonder why gcc is even constructing a subreg of a mem with a mode-dependent address.

(gdb) call debug_rtx(insn)
(insn 73 72 374 12 (set (reg/v:HI 0 %r0 [orig:29 iCol ] [29])
        (subreg:HI (mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] [22])
                        (const_int 4 [0x4]))
                    (reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101])) [4 MEM[base: _154, offset: 0B]+0 S4 A32]) 0)) sqlite3.c:92031 13 {movhi_2}
     (nil))

Since this wasn't movstricthi, this could be rewritten to avoid the subreg and just treat %r0 as SI, as in:

(insn 73 72 374 12 (set (reg/v:SI 0 %r0 [orig:29 iCol ] [29])
        (mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] [22])
                    (const_int 4 [0x4]))
                (reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101])) [4 MEM[base: _154, offset: 0B]+0 S4 A32])) sqlite3.c:92031 13 {movsi_2}

But even if movhi is a define_expand, as far as I can tell there isn't enough info to know whether that is possible. At that time, how can I tell that operands[0] will be a hard reg or operands[1] will be a subreg of a mode-dependent memory access?

I've tried using secondary_reload and it gets called with

(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)

but it dies in change_address_1 before invoking the code returned in sri.

I've tracked this down to reload replacing (reg:SI 113) with reg_equiv_mem (113) in the rtx. However, it doesn't verify the rtx is actually valid. I added a gcc_assert to trap this and got:

#1 0x0089ab87 in eliminate_regs_1 (x=0x7f7fe7b5c498, mem_mode=VOIDmode, insn=0x0, may_use_invariant=true, for_costs=true)
    at /u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:2850
(gdb) list
2845              && reg_equivs
2846              && reg_equiv_memory_loc (REGNO (SUBREG_REG (x))) != 0)
2847            {
2848              new_rtx = SUBREG_REG (x);
2849              rtx z = reg_equiv_memory_loc (REGNO (new_rtx));
2850              gcc_assert (memory_address_addr_space_p (GET_MODE (x),
2851                                                       XEXP (z, 0),
2852                                                       MEM_ADDR_SPACE (z)));
2853            }
2854          else
(gdb) call debug_rtx(z)
(mem:SI (plus:SI (mult:SI (reg/v:SI 22 [ i ])
            (const_int 4 [0x4]))
        (reg/v/f:SI 101 [ aiCol ])) [4 MEM[base: _154, offset: 0B]+0 S4 A32])
(gdb) call debug_rtx(x)
(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)
#2 0x0089cb31 in elimination_costs_in_insn (insn=0x7f7fe7b5bbd0)
    at /u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:3751
(gdb) call debug_rtx (insn)
(insn 73 72 374 12 (set (nil)
        (subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)) /u1/netbsd-HEAD/src/external/public-domain/sqlite/lib/../dist/sqlite3.c:92031 14 {movhi}
     (expr_list:REG_DEAD (reg:SI 113 [ MEM[base: _154, offset: 0B] ])
        (nil)))

And now I'm stymied. The limits of gcc-ness are now exceeded :) I'm looking for ideas on how to proceed. Thanks.
Re: RFA: [VAX] SUBREG of MEM with a mode dependent address
On May 30, 2014, at 10:39 AM, Jeff Law wrote: > On 05/25/14 18:19, Matt Thomas wrote: >> >> But even if movhi is a define_expand, as far as I can tell there >> isn't enough info to know whether that is possible. At that time, >> how can I tell that operands[0] will be a hard reg or operands[1] >> will be subreg of a mode dependent memory access? > At that time, you can't know those things. Not even close ;-) You certainly > don't want to try and rewrite the insn to just use SImode. This is all an > indication something has gone wrong elsewhere and this would just paper over > the problem. > >> >> I've tried using secondary_reload and it gets called with >> >> (subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0) >> >> but it dies in change_address_1 before invoking the code returned in >> sri. > I suspect if you dig deep enough, you can make a secondary reload do what you > want. It's just amazingly painful. > > You want to allocate an SImode temporary, do the load of the SI memory > location into that SImode temporary, then (subreg:SI (tempreg:SI)). Your best > bet is going to be to look at how some other ports handle their secondary > reloads. But I warn you, it's going to be painful. Doesn't work because the assert fires before the secondary reload takes place. In expr.c:convert_move there is code that would seem to prevent this: /* For truncation, usually we can just refer to FROM in a narrower mode. */ if (GET_MODE_BITSIZE (to_mode) < GET_MODE_BITSIZE (from_mode) && TRULY_NOOP_TRUNCATION_MODES_P (to_mode, from_mode)) { if (!((MEM_P (from) && ! MEM_VOLATILE_P (from) && direct_load[(int) to_mode] && ! mode_dependent_address_p (XEXP (from, 0), MEM_ADDR_SPACE (from))) || REG_P (from) || GET_CODE (from) == SUBREG)) from = force_reg (from_mode, from); if (REG_P (from) && REGNO (from) < FIRST_PSEUDO_REGISTER && ! HARD_REGNO_MODE_OK (REGNO (from), to_mode)) from = copy_to_reg (from); emit_move_insn (to, gen_lowpart (to_mode, from)); return; } but from at that point is just (mem:SI (reg:SI 112 [ D.118399 ]) [4 MEM[base: _154, offset: 0B]+0 S4 A32]) So there is not enough information for mode_dependent_address_p to return true. >> >> I've tracked this down to reload replacing (reg:SI 113) with >> reg_equiv_mem (113) in the rtx. However, it doesn't verify the rtx >> is actually valid. I added a gcc_assert to trap this and got: > Right. reload will make that replacement and it's not going to do any > verification at that point. Verification would have happened earlier. See above. If anywhere, that is where it would have been done. > You have to look at the beginning of the main reload loop and poke at that > for a while: > > /* For each pseudo register that has an equivalent location defined, > try to eliminate any eliminable registers (such as the frame pointer) > assuming initial offsets for the replacement register, which > is the normal case. > > If the resulting location is directly addressable, substitute > the MEM we just got directly for the old REG. > > If it is not addressable but is a constant or the sum of a hard reg > and constant, it is probably not addressable because the constant is > out of range, in that case record the address; we will generate > hairy code to compute the address in a register each time it is > needed. Similarly if it is a hard register, but one that is not > valid as an address register. > > If the location is not addressable, but does not have one of the > above forms, assign a stack slot. 
We have to do this to avoid the > potential of producing lots of reloads if, e.g., a location involves > a pseudo that didn't get a hard register and has an equivalent memory > location that also involves a pseudo that didn't get a hard register. > > Perhaps at some point we will improve reload_when_needed handling > so this problem goes away. But that's very hairy. */ I found a simpler solution. It seemed to me that reload_inner_reg_of_subreg was the right place to make this happen. The following diff (to gcc 4.8.3) fixes the problem: diff -u -p -r1.3 reload.c --- gcc/reload.c 1 Mar 2014 08:58:29 - 1.3 +++ gcc/reload.c 3 Jun 2014 17:24:27 - @@ -846,6 +846,7 @@ static bool reload_inner_reg_of_subreg (rtx x, enum machine_mode mode, bool output)
Re: GCC ARM: aligned access
On Aug 31, 2014, at 11:32 AM, Joel Sherrill wrote: >> Hi, >> >> I am writing some code and found that the system crashed. I found it was >> unaligned access which causes `data abort` exception. I wrote a piece >> of code and objdumped >> it. I am not sure this is right or not. >> >> command: >> arm-poky-linux-gnueabi-gcc -marm -mno-thumb-interwork -mabi=aapcs-linux >> -mword-relocations -march=armv7-a -mno-unaligned-access >> -ffunction-sections -fdata-sections -fno-common -ffixed-r9 -msoft-float >> -pipe -O2 -c 2.c -o 2.o >> >> arch is armv7-a and used '-mno-unaligned access' > > I think this is totally expected. You were passed a u8 pointer which is > aligned for that type (no restrictions likely). You cast it to a type with > stricter alignment requirements. The code is just flawed. Some CPUs handle > unaligned accesses but not your ARM. While armv7 and armv6 support unaligned access, that support has to be enabled by the underlying O/S. Not knowing the underlying environment, I can't say whether that support is enabled. One issue we had in NetBSD in moving to gcc4.8 was that the NetBSD/arm kernel didn't enable unaligned access for armv[67] CPUs. We quickly changed things so unaligned access is supported.
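To make the failure mode concrete, a minimal sketch (an illustration, not code from the thread): casting a byte pointer to a wider type is where the data abort comes from, and memcpy is the portable fix, since GCC lowers it to a single word load whenever the target permits:

#include <stdint.h>
#include <string.h>

uint32_t read32_bad(const uint8_t *p)
{
    /* Undefined behavior if p is not 4-byte aligned; with unaligned
       access disabled on ARM this raises a data abort.  */
    return *(const uint32_t *)p;
}

uint32_t read32_ok(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* byte loads or one word load,
                                 whichever is legal for the target */
    return v;
}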
Missed optimization case
Hi all, While digging into some GCC-generated code, I noticed a missed opportunity in GCC that Clang and ICC seem to take advantage of. All versions of GCC (up to 4.9.0) seem to have the same trouble. The following source (for x86_64) shows up the problem: - #include <stdint.h> #define add_carry32(sum, v) __asm__("addl %1, %0 ;" \ "adcl $0, %0 ;" \ : "=r" (sum) \ : "g" ((uint32_t) v), "0" (sum)) unsigned sorta_checksum(const void* src, int n, unsigned sum) { const uint32_t *s4 = (const uint32_t*) src; const uint32_t *es4 = s4 + (n >> 2); while( s4 != es4 ) { add_carry32(sum, *s4++); } add_carry32(sum, *(const uint16_t*) s4); return sum; } - (the example is a contrived version of the original code, which comes from Solarflare's OpenOnload project). GCC optimizes the loop but then re-calculates the "s4" variable outside of the loop before the last add_carry32. ICC and Clang both realise that the 's4' value in the loop is fine to re-use. GCC has an extra four instructions to calculate the same value known to be in a register upon loop exit. Compiler explorer links: GCC 4.9.0: http://goo.gl/fi3p2J ICC 13.0.1: http://goo.gl/PRTTc6 Clang 3.4.1: http://goo.gl/95JEQc I'll happily file a bug if necessary but I'm not clear in what phase the optimization opportunity has been missed. Thanks all, Matt
Re: Missed optimization case
On Tue, Dec 23, 2014 at 2:25 PM, Andi Kleen wrote: > Please file a bug with a test case. No need to worry about the phase > too much initially, just fill in a reasonable component. Thanks - filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64396 -matt
volatile access optimization (C++ / x86_64)
Hi all, I'm investigating ways to have single-threaded writers write to memory areas which are then (very infrequently) read from another thread for monitoring purposes. Things like "number of units of work done". I initially modeled this with relaxed atomic operations. This generates a "lock xadd" style instruction, as I can't convey that there are no other writers. As best I can tell, there's no memory order I can use to explain my usage characteristics. Giving up on the atomics, I tried volatiles. These are less than ideal, as they are less expressive, but in my instance I am not trying to fight the ISA's reordering; just prevent the compiler from eliding updates to my shared metrics. GCC's code generation uses a "load; add; store" for volatiles, instead of a single "add 1, [metric]". http://goo.gl/dVzRSq has the example (which is also at the bottom of my email). Is there a reason why (in principle) the volatile increment can't be made into a single add? Clang and ICC both emit the same code for the volatile and non-volatile case. Thanks in advance for any thoughts on the matter, Matt --- example code --- #include <atomic> std::atomic<int> a(0); void base_case() { a++; } void relaxed() { a.fetch_add(1, std::memory_order_relaxed); } void load_and_store_relaxed() { a.store(a.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed); } void cast_as_int_ptr() { (*(int*)&a) ++; } void cast_as_volatile_int_ptr() { (*(volatile int*)&a) ++; } ---example output (gcc490)--- base_case(): lock addl $1, a(%rip) ret relaxed(): lock addl $1, a(%rip) ret load_and_store_relaxed(): movl a(%rip), %eax addl $1, %eax movl %eax, a(%rip) ret cast_as_int_ptr(): addl $1, a(%rip) ret cast_as_volatile_int_ptr(): movl a(%rip), %eax addl $1, %eax movl %eax, a(%rip) ret
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote: > On 26/12/14 20:32, Matt Godbolt wrote: >> Is there a reason why (in principle) the volatile increment can't be >> made into a single add? Clang and ICC both emit the same code for the >> volatile and non-volatile case. > > Yes. Volatiles use the "as if" rule, where every memory access is as > written. A volatile increment is defined as a load, an increment, and > a store. That makes sense to me from a logical point of view. My understanding though is the volatile keyword was mainly used when working with memory-mapped devices, where memory loads and stores could not be elided. A single-instruction load-modify-write like "increment [addr]" adheres to these constraints even though it is a single instruction. I realise my understanding could be wrong here! If not though, both clang and icc are taking a short-cut that may put them into non-compliant state. > If you want a single atomic increment, atomics are what you > should use. If you want an increment to be written to memory, use a > store barrier after the increment. Thanks. I realise I was unclear in my original email. I'm really looking for a way to say "do a non-lock-prefixed increment". Atomics are too strong and enforce a bus lock. Doing a store barrier after the increment also appears heavy-handed: while I wish for eventual consistency with memory, I do not require it. I do however need the compiler to not move or elide my increment. At the moment I think the best I can do is to use an inline assembly version of the increment which prevents GCC from doing any optimisation upon it. That seems rather ugly though, and if anyone has any better suggestions I'd be very grateful. To give a concrete example: uint64_t num_done = 0; void process_work() { /* does something somewhat expensive */ } void worker_thread(int num_work) { for (int i = 0; i < num_work; ++i) { process_work(); num_done++; // ideally a relaxed atomic increment here } } void reporting_thread() { while(true) { sleep(60); printf("worker has done %lu\n", num_done); // ideally a relaxed read here } } In the non-atomic case above, no locked instructions are used. Given enough information about what process_work() does, the compiler can realise that num_done can be added to outside of the loop (num_done += num_work); which is the part I'd like to avoid. By making the int atomic and using relaxed, I get this guarantee but at the cost of a "lock addl". Thanks in advance for any ideas, Matt
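The inline-assembly fallback mentioned above might look like this sketch (x86-64 only; the function name is made up for illustration):

#include <stdint.h>

static inline void counter_inc(uint64_t *p)
{
    /* One non-locked read-modify-write. The "+m" constraint makes the
       memory both input and output, so the compiler can neither elide
       the update nor cache the value in a register.  */
    __asm__ __volatile__ ("incq %0" : "+m" (*p));
}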
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 4:51 PM, Marc Glisse wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 Thanks Marc
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 5:19 PM, Andrew Haley wrote: > On 26/12/14 22:49, Matt Godbolt wrote: >> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote: >>> On 26/12/14 20:32, Matt Godbolt wrote: >> I realise my understanding could be wrong here! >> If not though, both clang and icc are taking a short-cut that may >> put them into non-compliant state. > > It's hard to be certain. The language used by the standard is very > unhelpful: it requires all accesses to be as written, but does not > define exactly what constitutes an access. Thanks. My world is very x86-centric and so I find it hard to understand why a single instruction's RMW is different from three separate instructions; but I appreciate the standard is vague around volatiles, and that atomics go some way to using more well-defined semantics. >> Thanks. I realise I was unclear in my original email. I'm really >> looking for a way to say "do a non-lock-prefixed increment". > > Why? Performance. The single-threaded writers do not need to use a lock prefix: the atomicity of their read-add-write is guaranteed by my knowing no other threads write to the value. Thus the bus lock they take out unnecessarily slows down the instruction and potentially causes extra coherency traffic. The order of stores (on x86) is guaranteed and so provided I take a relaxed view in the consumer there's not even a need for any other flush. The memory write will necessarily "eventually" become visible to the reader. Within the constraints of the architecture I'm working in, this is plenty enough for a metric. > You could just use a compiler barrier: asm volatile(""); But this is > good only for x86 and a few others. This may be all I need, but my worry is this will inhibit other valid optimisations. I know that the "trick" used elsewhere as a barrier (asm volatile("":::"memory");) has the effect of flushing enregistered values to memory. Ideally this wouldn't be necessary. I'll be honest; I don't know the semantics of an empty volatile asm(), but I'm not sure how it could cause only the one write (metric++) to be emitted without affecting other variables too. > Everyone else needs a real store barrier. This is certainly true if the writer needs to guarantee visibility to other threads. But that's not the case for my use case. > Well, that's the problem: do you want a barrier or not? With no > barrier there is no guarantee that the data will ever be written to > memory. Do you only care about x86 processors? I appreciate your patience in understanding my case (given I'm not explaining myself very well!) In this instance, yes, only x86 processors. I do not need an explicit ISA-level flush. I do need a guarantee that the compiler cannot optimise the increment by loop-invariant motion. >> To give a concrete example: [snip] >> By making the int >> atomic and using relaxed, I get this guarantee but at the cost of a >> "lock addl". > > Ok, I get that, but not why. If you care about a particular x86 > instruction, you can use it in an inline asm. I'm not at all sure what > you want, really. I hope my other comments at least help to explain the why! It's not a particular instruction inasmuch as communicating to the compiler that there's only one writer, and so the lock prefix is unnecessary (for x86) as the write of the read-modify-write will not race with other writers (as none exist) and the write will eventually become visible to other threads in strict memory order (as the x86 guarantees). 
This last stage I believe is consistent with a "relaxed" model, with an optimisation that if no other writers exist, no bus lock is required on the writer. Again, thanks for the reply and the time taken thinking about the issue especially at this festive time of year! Best regards, Matt
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 5:20 PM, NightStrike wrote: > Have you tried release and acquire/consume instead? Yes; these emit the same instructions in this case. http://goo.gl/e94Ya7 Regards, Matt
Re: volatile access optimization (C++ / x86_64)
On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley wrote: > On 27/12/14 00:02, Matt Godbolt wrote: >> On Fri, Dec 26, 2014 at 5:19 PM, Andrew Haley wrote: >>> On 26/12/14 22:49, Matt Godbolt wrote: >>>> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote: >>> Why? >> >> Performance. > > Okay, but that's not what I was trying to ask: if you don't need an > atomic access, why do you care that it uses a read-modify-write > instruction instead of three instructions? Is it faster? Have you > measured it? Is it so much faster that it's critical for your > application? Good point. No; I've yet to measure it but I will. I'll be honest: my instinct is that really it won't make a measurable difference. From a microarchitectural point of view it devolves to almost exactly the same set of micro-operations (barring the duplicate memory address calculation). It does encode to a longer instruction stream (15 bytes vs 7 bytes), so there's an argument it puts more pressure than needed on the i-cache. But honestly, it's more from an aesthetic point of view that I prefer the increment. (The locked version *is* measurably slower). Also, it's always nice to understand why particular optimisations aren't performed by the compiler from a correctness point of view! :) Thanks all for your fascinating insights :) -matt
Re: volatile access optimization (C++ / x86_64)
> On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley wrote: > Is it faster? Have you measured it? Is it so much faster that it's critical > for your > application? Well, I couldn't really leave this be: I did a little bit of benchmarking using my company's proprietary benchmarking library, which I'll try and get open sourced. It follows Intel's recommendations for using RDTSCP/CPUID etc, and I've also spent some time looking at Agner Fog's techniques. I believe it to be pretty accurate, to within a clock cycle or two. On my laptop (Core i5 M520) the volatile and non-volatile increments are so fast as to be within the noise - 1-2 clock cycles. So that certainly lends support to your theory, Andrew, that it's probably not worth the effort (other than offending my aesthetic sensibilities!). Obviously this doesn't really take into account the extra i-cache pressure. As a comparison, the "lock xaddl" versions come out at 18 cycles. Obviously this is also pretty much "free" by any reasonable metric, but it's hard to measure the impact of the bus lock on other processors' memory accesses in a highly multi-threaded environment. For completeness I also tried it on a few other machines: X5670 : 0-2 for normal, 28 clocks for lock xadd E5-2667 v2: as above, 27 clocks for lock xadd E5-2667 v3: as above, 15 clocks for lock xadd On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley wrote: > Well, in this case you now know: it's a bug! But one that it's > fairly hard to care deeply about, although it might get fixed now. Understood completely! Thanks again, Matt
Re: volatile access optimization (C++ / x86_64)
On Tue, Dec 30, 2014 at 5:05 AM, Torvald Riegel wrote: > I agree with Andrew. My understanding of volatile is that the generated > code must do exactly what the abstract machine would do. That makes sense. I suppose I don't understand what the difference is in terms of an abstract machine of "load; add; store" versus the "load-add-store". At least on x86, from the perspective of the memory bus, there's no difference I'm aware of. > One can use volatiles for synchronization if one is also manually adding > HW barriers and potentially compiler barriers (depending on whether you > need to mix volatile and non-volatile) -- but volatiles really aim at a > different use case than atomics. Again, the processor's reordering and memory barriers are not of huge concern to me in this instance. I completely agree about volatile being the wrong use case. > For the single-writer shared-counter case, a load and a store operation > with memory_order_relaxed seem to be the right approach. I agree: this most closely models my intention: a non-atomic-increment but which has the semantics of being visible to other threads in a finite period of time (as per your previous email). The relaxed-load; add; relaxed-store generates the same code as the volatile code (as in; three separate instructions), but I prefer it over the volatile as it is more intention-revealing. As to whether it's valid to peephole optimize the three instructions to be a single increment in the case of x86 given relaxed memory ordering, I can offer no good opinion (though my instinct is it should be able to be!) Thanks all for your help, Matt
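In C11 terms (the C++ <atomic> spelling used in the thread is equivalent), the single-writer pattern Torvald describes could be sketched as follows; this is an illustration, and whether a compiler may coalesce relaxed accesses is exactly the open question above:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t num_done;

void writer_bump(void)
{
    /* Relaxed load + relaxed store: no lock prefix; in practice GCC
       emits one store per call, and x86 store ordering makes it
       eventually visible to the reader.  */
    uint64_t v = atomic_load_explicit(&num_done, memory_order_relaxed);
    atomic_store_explicit(&num_done, v + 1, memory_order_relaxed);
}

uint64_t reader_sample(void)
{
    return atomic_load_explicit(&num_done, memory_order_relaxed);
}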
Re: volatile access optimization (C++ / x86_64)
On Mon, Jan 5, 2015 at 11:53 AM, DJ Delorie wrote: > > Matt Godbolt writes: >> GCC's code generation uses a "load; add; store" for volatiles, instead >> of a single "add 1, [metric]". > > GCC doesn't know if a target's load/add/store patterns are > volatile-safe, so it must avoid them. There are a few targets that have > been audited for volatile-safe-ness such that gcc *can* use the combined > load/add/store when the backend says it's OK. x86 is not yet one of > those targets. Thanks DJ. One question: do you have an example of a non-volatile-safe machine so I can get a feel for the problems one might encounter? At best I can imagine a machine that optimizes "add 0, [mem]" to avoid the read/write, but I'm not aware of such an ISA. Much appreciated, Matt
5.1.0/4.9.2 native mingw64 lto-wrapper.exe issues (PR 65559 and 65582)
I was told I should repost this on this ML rather than the gcc-help list I originally posted this under. Here was my original thread: https://gcc.gnu.org/ml/gcc-help/2015-04/msg00167.html I came across PR 65559 and 65582 while investigating why I was getting the "lto1.exe: internal compiler error: in read_cgraph_and_symbols, at lto/lto.c:2947" error during a native MINGW64 LTO build. This also seems to be present when enabling bootstrap-lto within 5.1.0, presenting an error message akin to what is listed in PR 65582. 1. Under: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/lto-wrapper.c;h=404cb68e0d1f800628ff69b7672385b88450a3d5;hb=HEAD#l927 lto-wrapper processes command-line params for filenames matching (in my case) "./.libs/libspeexdsp.a@0x44e26" and separates the filename from the offset into separate variables. Since the following check to see if that file exists by opening it doesn't use the parsed filename variable and instead continues to use the argv parameter, the attempt to open it always fails and that file is not specifically parsed for LTO options. 2. One other issue I've noticed in my build happens as a result of the open call when trying to parse the options using libiberty. Under mingw64 native, the open call opens the object file in text mode and then passes the fd eventually to libiberty's simple_object_internal_read within simple-object.c. The issue springs up trying to perform a read and it hits a CTRL+Z (0x1A) within the object at which point the next read will return 0 bytes and trigger the break of the loop and a subsequent error message of "file too short" which gets silently ignored. In my testing, changing the 0x1A within the object file to something else returns the full read (or more data until another CTRL+Z is hit). Ref: https://msdn.microsoft.com/en-us/library/wyssk1bs.aspx This still happens within 4.9.2 and 4.9 trunk; however, in 4.9, the object file being checked for LTO sections is still passed along in the command-line whereas in 5.1.0 it gets skipped but is still listed within the res file, most likely leading to the ICE within 65559. This would also explain Kai's comment on why this issue only occurs on native builds. The ICE in 5.1.0 can also be avoided by using an lto-wrapper from 4.9 or prior allowing the link to complete though no LTO options will get processed due to #1. This is my first report so I wouldn't mind some guidance. I'm familiar enough with debugging to gather whatever other level of detail is requested. Most of this was found using gdb. -- Matt Breedlove
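A sketch of the two fixes implied by point 1 and point 2 (an illustration, not the actual lto-wrapper.c patch; the helper name is made up):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#ifndef O_BINARY
#define O_BINARY 0   /* no-op outside Windows */
#endif

static int
open_lto_object (const char *arg, long long *offset)
{
  char *filename = strdup (arg);
  char *at = strrchr (filename, '@');
  int fd;

  *offset = 0;
  if (at != NULL)
    {
      *offset = strtoll (at + 1, NULL, 0);  /* "@0x44e26" parses as hex */
      *at = '\0';
    }
  /* Fix 1: open the *parsed* filename, not the raw argument.
     Fix 2: O_BINARY keeps mingw's text mode from treating a 0x1A
     byte inside the object as end-of-file.  */
  fd = open (filename, O_RDONLY | O_BINARY);
  free (filename);
  return fd;
}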
5.1.0 / 5.1.1 mingw64 bootstrap LTO failure questions
I've posted an update to PR 66014 regarding mingw64 slim LTO bootstrap errors I had been getting, which I was hoping to get some comments on. Though this resolves the problem for me, I'm wondering what other potential issues similar to it may spring up and was hoping to get some feedback. In addition, there is another related failure when doing bootstrap-lto or bootstrap-lto-noplugin (slim or fat) in mingw64 relating to sys_siglist. mingw64 (as far as I know) does not have an implementation for it. The issue is as follows: 1. stage1 completes bootstrapping. strsignal and sys_siglist are undetected, resulting in HAVE_STRSIGNAL and HAVE_SYS_SIGLIST being left undefined. 2. stage2 (or stagefeedback) detects strsignal but not sys_siglist, leaving HAVE_SYS_SIGLIST defined. This causes libiberty to define strsignal but skip sys_siglist during the build, leaving an undefined reference to sys_siglist. 3. Build fails when attempting to link against the new LTO libiberty.a(strsignal.o) when building gcc-nm, gcc-ar, etc. Non-LTO builds suffer neither problem and fat bootstraps only suffer from the issue above, which I have worked around by passing in "libiberty_cv_var_sys_siglist=no" during configuration. Combined with building libiberty with "-fno-builtin-stpcpy" (PR 66014), I have gotten all builds to finally succeed. I could use some guidance on where to go from here, however. Thanks, Matt
Re: X32 psABI status
On Feb 12, 2011, at 1:29 PM, H.J. Lu wrote: > On Sat, Feb 12, 2011 at 1:10 PM, Florian Weimer wrote: >> * H. J. Lu: >> >>> We made lots of progress on the x32 psABI: >>> >>> https://sites.google.com/site/x32abi/ >>> >>> 1. Kernel interface with syscall is close to being finalized. >>> 2. GCC x32 branch is stabilizing. >>> 3. The Bionic C library works with the syscall kernel interface. >>> >>> The next major milestone will be x32 glibc port. >> >> It is a bit difficult to extract useful information from these >> resources. > > That is true. Contributions are more than welcome. > >> Is off_t 32 bits? Why is the ia32 compatibility kernel interface used? > Yes. off_t is not part of the psABI since it's OS dependent. >> I'm sure a lot of people want to get rid of that in cases where they >> control the whole software stack. > > That is debatable. The current thought is the x32 user space API > is the same as ia32. time_t is also an issue. Any system call method is beyond the scope of the psABI since it's OS dependent and user-code should never care.
Re: X32 psABI status
On Feb 12, 2011, at 7:02 PM, Andrew Pinski wrote: > On Sat, Feb 12, 2011 at 3:04 PM, H. Peter Anvin wrote: >> On 02/12/2011 01:10 PM, Florian Weimer wrote: >>> Why is the ia32 compatibility kernel interface used? >> >> Because there is no way in hell we're designing in a second >> compatibility ABI in the kernel (and it has to be a compatibility ABI, >> because of the pointer size difference.) > > I think he is asking why not create a new ABI layer for the kernel > like it is done for n32 for MIPS. The kernel syscall ABI needs to be able to pass 64-bit quantities in a single register (since that's what the calling ABI is capable of doing, but which I don't think the ia32 kernel interface can do). Maybe it's me, but I expected X32 to be the X86-64 ABI with 32-bit longs and pointers (converted to 64-bit arguments when passed in register or on the stack). That allows the same syscall argument marshalling that currently exists but just needs a different set of syscall vectors.
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 12:29 PM, David Daney wrote: > Background: > > Current MIPS 32-bit ABIs (both o32 and n32) are restricted to 2GB of > user virtual memory space. This is due to the way MIPS32 memory space is > segmented. Only the range from 0..2^31-1 is available. Pointer > values are always sign extended. > > Because there are not already enough MIPS ABIs, I present the ... > > Proposal: A new ABI to support 4GB of address space with 32-bit > pointers. > > The proposed new ABI would only be available on MIPS64 platforms. It > would be identical to the current MIPS n32 ABI *except* that pointers > would be zero-extended rather than sign-extended when resident in > registers. In the remainder of this document I will call it > 'n32-big'. As a result, applications would have access to a full 4GB > of virtual address space. The operating environment would be > configured such that the entire lower 4GB of the virtual address space > was available to the program. I have to wonder if it's worth the effort. The primary problem I see is that this new ABI requires a 64-bit kernel since faults through the upper 2G will go through the XTLB miss exception vector. > At a low level here is how it would work: > > 1) Load a pointer to a register from memory: > > n32: > LW $reg, offset($reg) > > n32-big: > LWU $reg, offset($reg) That might be sufficient for userland, but the kernel will need to do similar things (even if a 64-bit kernel) when accessing structures supplied by 32-bit syscalls. It seems to be workable, but if you need the additional address space, why not use N64?
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 6:22 PM, David Daney wrote: > On 02/14/2011 04:15 PM, Matt Thomas wrote: >> >> I have to wonder if it's worth the effort. The primary problem I see >> is that this new ABI requires a 64-bit kernel since faults through the >> upper 2G will go through the XTLB miss exception vector. >> > > Yes, that is correct. It is a 64-bit ABI, and like the existing n32 ABI > requires a 64-bit kernel. N32 doesn't require an LP64 kernel, just a 64-bit register aware kernel. Your N32-big does require an LP64 kernel.
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 6:26 PM, David Daney wrote: > On 02/14/2011 06:14 PM, Joe Buck wrote: >> On Mon, Feb 14, 2011 at 05:57:13PM -0800, Paul Koning wrote: >>> It seems that this proposal would benefit programs that need more than 2 GB >>> but less than 4 GB, and for some reason really don't want 64 bit pointers. >>> >>> This seems like a microscopically small market segment. I can't see any >>> sense in such an effort. >> >> I remember the RHEL hugemem patch being a big deal for lots of their >> customers, so a process could address the full 4GB instead of only 3GB >> on a 32-bit machine. If I recall correctly, upstream didn't want it >> (get a 64-bit machine!) but lots of paying customers clamored for it. >> >> (I personally don't have an opinion on whether it's worth bothering with). >> > > Also look at the new x86_64 ABI (See all those X32 psABI messages) that the > Intel folks are actively working on. This proposal is very similar to what > they are doing. untrue. N32 is closer to the X32 ABI since it is limited to 2GB.
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 6:50 PM, David Daney wrote: > On 02/14/2011 06:33 PM, Matt Thomas wrote: >> >> On Feb 14, 2011, at 6:22 PM, David Daney wrote: >> >>> On 02/14/2011 04:15 PM, Matt Thomas wrote: >>>> >>>> I have to wonder if it's worth the effort. The primary problem I see >>>> is that this new ABI requires a 64-bit kernel since faults through the >>>> upper 2G will go through the XTLB miss exception vector. >>>> >>> >>> Yes, that is correct. It is a 64-bit ABI, and like the existing n32 ABI >>> requires a 64-bit kernel. >> >> N32 doesn't require an LP64 kernel, just a 64-bit register aware kernel. >> Your N32-big does require an LP64 kernel. >> > > But using 'official' kernel sources the only way to get a 64-bit register > aware kernel is for it to also be LP64. So effectively, you do in fact need > a 64-bit kernel to run n32 userspace code. Not all the world is Linux. :) NetBSD supports N32 kernels. > My proposed ABI would need trivial kernel changes: > > o Fix a couple of places where pointers are sign extended instead of zero > extended. I think you'll find there are more of these than you'd expect. > o Change the stack address and address ranges returned by mmap(). My biggest concern is that many, many MIPS opcodes expect properly sign-extended values in registers. Thusly N32-big will require using daddu/dadd/dsub/dsubu for addresses. So that's yet another departure from N32, which can use addu/add/sub/subu. > The main work would be in the compiler toolchain and runtime libraries. You'd also need to update gas for la and dla expansion.
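A small C model of the addu/daddu hazard Matt describes (an illustration of the register values involved, not MIPS code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t p = 0x80000000u;  /* zero-extended n32-big pointer above 2GB */

    /* addu sign-extends its 32-bit result into the 64-bit register,
       corrupting the upper half of the pointer...  */
    int64_t addu_reg = (int32_t)((uint32_t)p + 8);

    /* ...while daddu keeps all 64 bits intact.  */
    uint64_t daddu_reg = p + 8;

    printf("addu:  %016llx\n", (unsigned long long)addu_reg);  /* ffffffff80000008 */
    printf("daddu: %016llx\n", (unsigned long long)daddu_reg); /* 0000000080000008 */
    return 0;
}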
Internal compiler error in targhooks.c: default_secondary_reload (ARM/Thumb)
I'm getting an internal compiler error on the following test program: #include <assert.h> void func(int a, int b, int c, int d, int e, int f, int g, short int h) { assert(a < 100); assert(b < 100); assert(c < 100); assert(d < 100); assert(e < 100); assert(f < 100); assert(g < 100); assert((-1000 < h) && (h < 0)); } Command line and output: $ arm-none-eabi-gcc -mthumb -O2 -c -o test.o test.c test.c: In function 'func': test.c:11:1: internal compiler error: in default_secondary_reload, at targhooks.c:769 Please submit a full bug report, with preprocessed source if appropriate. See <https://support.codesourcery.com/GNUToolchain/> for instructions. This is running on Windows XP. Version information: $ arm-none-eabi-gcc --version arm-none-eabi-gcc.exe (Sourcery G++ Lite 2010.09-51) 4.5.1 Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. From playing around with this, it looks to be some kind of register allocation problem--it needs to have lots of variables active at once, and the error doesn't occur unless I'm compiling for Thumb. Unfortunately I don't have a way to test this on tip, so I can't tell if it's been fixed there or not. Any information on this would be appreciated. Thanks, Matt
RE: Question about static code analysis features in GCC
Hey Sarah, Many array bounds and format string problems can already be found, especially with LTO, ClooG, loop-unrolling, and -O3 enabled. Seeing across object-file boundaries, understanding loop boundaries, and aggressive inlining allows GCC to warn about a lot of real-world vulnerabilities. When support for multiple IPA passes lands in trunk, it should be even better. What I think is missing is: 1) detection of double-free. There is already a function attribute called 'malloc', which is used to express a specific kind of allocation function whose return value will never be aliased. You could use that attribute, in addition to a new one ('free'), to track potential double-frees of values via VRP/IPA. 2) the ability to annotate functions as to the taint and filtering side-effects on their parameters, like the format() attribute. (I've asked for this feature from the PC-Lint people for some time.) You could make this even more generic and just add a new attribute that allows for tagging and checking of arbitrary tags: ssize_t recv(int sockfd, void *buf, size_t len, int flags) __attribute__ ((add_parameter_tag ("taint", 2))) __attribute__ ((add_return_value_tag ("taint"))); int count_sql_rows_for(const char* name) __attribute__ ((disallow_parameter_tag ("taint", 1))); void filter_sql_characters_from(const char* name) __attribute__ ((removes_parameter_tag ("taint", 1))); then a program like this: int main(void) { char name[20] = {0}; recv(GLOBAL_SOCKET, &name, sizeof(name), 0); filter_sql_characters_from(name); // comment this line to get warning count_sql_rows_for(name); } When I wrote my binary static analysis product, BugScan, we assumed that if a pointer was tainted, so was its contents. (This was especially a necessity for collections like lists and vectors in Java and C++ binaries.) You may want to get more explicit with that, by having a recursively_add_parameter_tag() or somesuch that only applies to pointer parameters. 3) lack of explicit NULL-termination of strings. This one gets really complicated, especially for situations where they are terminated properly and then become un-terminated. 4) if a loop that writes to a pointer, and increments that pointer, is bound by a tainted value. You'd have to add an extension to the loop unroller for that, and just check for the 'taint' tag on the bounds check. Of course, you still run into temporal ordering issues, especially with globals, where the CFG ordering won't help. But don't let that discourage you -- it would be great work to see done and commoditized, and would probably be better than most commercial analyzers as well ;) Let me know if you need any more of my expertise in this area. I can't speak for GCC internals, though.
RE: GCC 4.4/4.6/4.7 uninitialized warning regression?
> > This brings out 2 questions. Why don't GCC 4.4/4.6/4.7 warn it? > > Why doesn't 64bit GCC 4.2 warn it? > Good question. It seems that the difference is whether the compiler > generates a field-by-field copy or a call to memcpy(). According to > David, the trunk gcc in 32-bit mode doesn't call memcpy, but still > doesn't warn. He's looking at it. Is this related to this bug, which I filed a year or two ago? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42561 It would indeed be very nice to get this taken care of, as this kind of analysis would really help find a lot of bugs that currently require commercial tools.
gcc and scientific computing
Hi, I am involved in a scientific computing podcast, http://inscight.org/. I was wondering if anyone from the GCC project would like to be a special guest on the show to talk about recent developments in GCC for scientific computing in C/C++. We could discuss, e.g., the graphite optimizations, link time optimization, C++0x, ... Thanks, Matt
Detecting global pointers
I am writing a gcc plugin and am trying to detect if a value assigned by a function call is a global variable or not. Unfortunately, all calls to 'is_global_var' with a DECL type are returning false. My pass executes after alias analysis and IPA analysis. The cfun->gimple_df->ipa_pta is set to true, so I know the pta analysis should have resolved global information. Plugin code: if (is_gimple_call(stmt)) { gimple_debug_bb(stmt); tree lhs = gimple_call_lhs(stmt); if (lhs && is_global_var(SSA_NAME_VAR(lhs))) printf("Global detected\n"); } Source code (in Go): package main type T struct {id int} var myglobal *T; func fn() *T { myglobal = new(T); // Should be detected as global return myglobal; } func main() { t := fn(); } Basic Block dump as my plugin code executes for function 'fn': <bb 2>: # .MEM_4 = VDEF <.MEM_3(D)> main.myglobal.13_1 = __go_new_nopointers (4); # .MEM_5 = VDEF <.MEM_4> main.myglobal = main.myglobal.13_1; # VUSE <.MEM_5> D.186_2 = main.myglobal; return D.186_2; Any insight would be helpful. Thanks! -Matt
Re: Detecting global pointers
On Wed, May 4, 2011 at 7:38 PM, Richard Guenther wrote: > On Wed, May 4, 2011 at 6:16 AM, Matt Davis wrote: >> I am writing a gcc plugin and am trying to detect if a value assigned by a >> function call, is a global variable or not. Unfortunately, all calls to >> 'is_global_var' with a DECL type are returning false. >> >> My pass executes after alias analysis, and ipa analysis. The >> cfun->gimple_df->ipa_pta is set to true, so I know the pta analysis should >> have >> resolved global information. > > is_global_var is all you need, no need for PTA analysis (which doesn't > change this but simply uses is_global_var as well). Thanks for the clarification. >> Plugin code: >> if (is_gimple_call(stmt)) >> { >> gimple_debug_bb(stmt); >> tree lhs = gimple_call_lhs(stmt); >> if (lhs && is_global_var(SSA_NAME_VAR(lhs))) >> printf("Global detected\n"); > > That will only reliably work if the global is not of is_gimple_reg_type (), > otherwise the call will store to an automatic temporary and the store > to the global will happen in a separate statement. > >> } >> >> >> Source code (in Go): >> package main >> >> type T struct {id int} >> var myglobal *T; >> >> func fn() *T { >> myglobal = new(T); // Should be detected as global >> return myglobal; >> } >> >> func main() { >> t := fn(); >> } >> >> >> Basic Block dump as my plugin code executes for function 'fn': >> : >> # .MEM_4 = VDEF <.MEM_3(D)> >> main.myglobal.13_1 = __go_new_nopointers (4); > > assigns to a temporary > >> # .MEM_5 = VDEF <.MEM_4> >> main.myglobal = main.myglobal.13_1; > > and here is the store > > You can try looking up the store if the LHS of the call is an SSA name > by looking at its immediate uses, but of course for > > int glob; > > foo() > { > int i = call(); // not global > glob = i; > } > > this would also find the store to glob. > > So I'm not sure you can recover all information up to source level > precision. Thanks very much for the clarification and information. -Matt
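Following Richard's suggestion, a sketch of chasing the call's SSA result to a store into a global (plugin-side code against the GCC 4.6-era API; with the caveat he notes that this also finds the "i = call(); glob = i;" case):

tree lhs = gimple_call_lhs (stmt);
if (lhs && TREE_CODE (lhs) == SSA_NAME)
  {
    gimple use_stmt;
    imm_use_iterator iter;
    /* Walk every statement that consumes the call's result.  */
    FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
      if (gimple_assign_single_p (use_stmt))
        {
          tree slhs = gimple_assign_lhs (use_stmt);
          if (DECL_P (slhs) && is_global_var (slhs))
            printf ("Global store detected\n");
        }
  }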
Non-optimal stack usage with C++ temporaries
I've noticed some behavior with g++ that seems strange to me. I don't know if there's some technicality in the C++ standard that requires this, or if it's just a limitation of the optimization code, but it seemed strange so I thought I'd see if anybody could shed more light on it. Here's a test program that illustrates the behavior: struct Foo { char buf[256]; Foo() {} // suppress automatically-generated constructor code for clarity ~Foo() {} }; void func0(const Foo &); void func1(const Foo &); void func2(const Foo &); void func3(const Foo &); void f() { func0(Foo()); func1(Foo()); func2(Foo()); func3(Foo()); } Compiling with -O2 and "-fno-stack-protector -fno-exceptions" for clarity, on g++ 4.4.3, gives the following: 00000000 <_Z1fv>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 81 ec 18 04 00 00 sub $0x418,%esp 9: 8d 85 f8 fb ff ff lea -0x408(%ebp),%eax f: 89 04 24 mov %eax,(%esp) 12: e8 fc ff ff ff call 13 <_Z1fv+0x13> 17: 8d 85 f8 fc ff ff lea -0x308(%ebp),%eax 1d: 89 04 24 mov %eax,(%esp) 20: e8 fc ff ff ff call 21 <_Z1fv+0x21> 25: 8d 85 f8 fd ff ff lea -0x208(%ebp),%eax 2b: 89 04 24 mov %eax,(%esp) 2e: e8 fc ff ff ff call 2f <_Z1fv+0x2f> 33: 8d 85 f8 fe ff ff lea -0x108(%ebp),%eax 39: 89 04 24 mov %eax,(%esp) 3c: e8 fc ff ff ff call 3d <_Z1fv+0x3d> 41: c9 leave 42: c3 ret The function makes four function calls, each of which constructs a temporary for the parameter. The compiler dutifully allocates stack space to construct these, but it seems to allocate separate stack space for each of the temporaries. This seems unnecessary--since their lifetimes don't overlap, the same stack space could be used for each of them. The real-life code I adapted this example from had a fairly large number of temporaries strewn throughout it, each of which was quite large, so this behavior caused the generated function to use up a pretty substantial amount of stack, for what seems like no good reason. My question is, is this expected behavior? My understanding of the C++ standard is that each of those temporaries goes away at the semicolon, so it seems like they have non-overlapping lifetimes, but I know there are some exceptions to that rule. Could someone comment on whether this is an actual bug, or required for some reason by the standard, or just behavior that not enough people have run into problems with? Thanks, Matt
How to get function argument points-to information.
For some analysis I am doing, I need to determine if a particular SSA_NAME_VAR node is pointed-to by a function argument. I am iterating across the function's arguments via DECL_ARGUMENTS(), but each argument is just a DECL node, and contains no associated points-to data, as far as I can tell. I assume there is a better/different way of determining if an argument points to my node? Thanks for any insight. -Matt
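One way this is commonly done (a sketch against the GCC 4.6-era API; 'decl' is the candidate node and 'fndecl' the function, both assumed to exist): go from each PARM_DECL to its default-definition SSA name, whose points-to solution the PTA pass fills in:

tree parm;
for (parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm))
  {
    tree name;
    struct ptr_info_def *pi;

    if (!POINTER_TYPE_P (TREE_TYPE (parm)))
      continue;
    /* The argument's value on entry is its default definition.  */
    name = gimple_default_def (cfun, parm);
    if (!name || !(pi = SSA_NAME_PTR_INFO (name)))
      continue;
    if (pt_solution_includes (&pi->pt, decl))
      printf ("argument may point to the node\n");
  }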
missed optimization: transforming while(n>=1) into if(n>=1)
Hi, While trying to optimize pixman, I noticed that gcc is unable to recognize that 'while (n >= 1)' can often be simplified to 'if (n >= 1)'. Consider the following example, where there are loops that operate on larger amounts of data and smaller loops that deal with small or unaligned data. int sum(const int *l, int n) { int s = 0; while (n >= 2) { s += l[0] + l[1]; l += 2; n -= 2; } while (n >= 1) { s += l[0]; l += 1; n -= 1; } return s; } Clearly the while (n >= 1) loop can never execute more than once, as n must be < 2, and in the body of the loop, n is decremented. The resulting machine code includes the backward branch to the top of the while (n >= 1) loop, which can never be taken. I suppose this is a missed optimization. Is this known, or should I make a new bug report? Thanks, Matt Turner
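For concreteness, the hoped-for transform (a sketch of source-level intent, not actual GCC output): once the compiler proves n < 2 after the first loop, the second loop's back edge is dead:

/* what 'while (n >= 1) { ... }' could become when n < 2 is known: */
if (n >= 1)
    s += l[0];   /* the l/n updates are dead on this final iteration */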
[RFC] alpha/ev6: model 1-cycle cross-cluster delay
Alpha EV6 and newer can execute four instructions per cycle if correctly scheduled. The architecture has two clusters {0, 1}, each with its own register file. In each cluster, there are two slots {upper, lower}. Some instructions only execute from either upper or lower slots. Register values produced in one cluster take 1 cycle to appear in the other cluster, so improperly scheduled instructions may incur a cross-cluster delay. I've duplicated (define_insn_reservation ...) for instructions which can execute from either cluster, increased latencies by 1, and added bypasses. In my limited testing it seems to provide a minor improvement (I wouldn't expect much, since it should only remove single-cycle delays here and there). So, please review and provide feedback. I also have some questions: - The Compiler Writer's Guide [1] [2] doesn't seem to mention anything about cross-cluster delays from integer load/store instructions as producers. It seems plausible that load/stores could be a special case and update both clusters' register files at the same time, but maybe this is an oversight in (two versions of) the manual? - CMOV instructions are internally split as two distinct instructions on >=EV6 that may execute on any cluster/slot. Evidently, this means that the first part may execute on cluster 0 while the second executes on cluster 1, thereby incurring a 1-cycle cross-cluster delay. WTF. So, how can I represent this two-part instruction--by duplicating its define_insn_reservation 4 times? I can't find any rules for scheduling CMOVs in the CWG, so knowing this would be helpful too. - The CWG lists the latency of unconditional branches and jsr/call instructions as 3, whereas we have 1. I guess this latency value is only meaningful if the instruction produces a value? I'm a bit confused by this value in the CWG since it lists the latency of conditional branches as N/A, while these other types of branches as 3, although none produce a register value. - When increasing the default instruction latencies, I've added ',nothing' to the functional unit regexp. Is this the correct way to describe that the functional unit is free? - There's a ??? comment at the top that says "In addition, instruction order affects cluster issue." Does gcc understand how to do this already, or is this a TODO reminder? If it's a reminder, where should I look in gcc to add this? - I also see that fadd/fcmov/fmul instructions take an extra two cycles when the consumer is fst/ftoi, so something similar should be added for them. Can a (define_bypass ...) function specify a latency value greater than the default latency, or should I raise the default latency and special-case fst/ftoi consumers like I've done for cross-cluster delay? Thanks a lot! Matt Turner [1] http://www.compaq.com/cpq-alphaserver/technology/literature/cmpwrgd.pdf [2] http://download.majix.org/dec/comp_guide_v2.pdf --- ev6.md.orig 2007-08-02 06:49:31.0 -0400 +++ ev6.md 2011-05-24 23:15:39.414919424 -0400 @@ -24,19 +24,19 @@ ; EV6 has two symmetric pairs ("clusters") of two asymmetric integer ; units ("upper" and "lower"), yielding pipe names U0, U1, L0, L1. ; -; ??? The clusters have independent register files that are re-synced +; The clusters have independent register files that are re-synced ; every cycle. Thus there is one additional cycle of latency between -; insns issued on different clusters. 
Possibly model that by duplicating -; all EBOX insn_reservations that can issue to either cluster, increasing -; all latencies by one, and adding bypasses within the cluster. +; insns issued on different clusters. ; -; ??? In addition, instruction order affects cluster issue. +; ??? In addition, instruction order affects cluster issue. XXX: what to do? (define_automaton "ev6_0,ev6_1") (define_cpu_unit "ev6_u0,ev6_u1,ev6_l0,ev6_l1" "ev6_0") (define_reservation "ev6_u" "ev6_u0|ev6_u1") (define_reservation "ev6_l" "ev6_l0|ev6_l1") -(define_reservation "ev6_ebox" "ev6_u|ev6_l") +(define_reservation "ev6_ebox" "ev6_u|ev6_l") ; XXX: remove +(define_reservation "ev6_e0" "ev6_l0|ev6_u0") +(define_reservation "ev6_e1" "ev6_l1|ev6_u1") (define_cpu_unit "ev6_fa" "ev6_1") (define_cpu_unit "ev6_fm,ev6_fst0,ev6_fst1" "ev6_0") @@ -50,15 +50,26 @@ ; Integer loads take at least 3 clocks, and only issue to lower units. ; adjust_cost still factors in user-specified memory latency, so return 1 here. -(define_insn_reservation "ev6_ild" 1 +; XXX: CWG doesn't mention cross-cluster delay for ild/ist producers ??? +(define_insn_reservation "ev
Configure gcc with --multilib=... ?
Hi, I'd like to ship multilib Gentoo/MIPS installations with only n32 and n64 ABIs (i.e., no o32). The reasoning is that if your system can use either 64-bit ABI you don't have any reason to run o32, given that o32-only installation media also exists. I saw this mail http://gcc.gnu.org/ml/gcc/2010-01/msg00063.html suggesting the addition of a --multilib= configure option. Has such a thing been added? Is there a way to configure gcc to build only n32 and n64 ABIs? Thanks, Matt
RE: GCC 4.6.1 Status Report (2011-06-20) [BRANCH FROZEN]
> GCC 4.6.1 first release candidate has been uploaded, and the branch > is now frozen. All changes need RM approval now. > Please test it, if all goes well, 4.6.1 will be released early next > week. No chance for a fix for this in 4.6.1? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48600 This has been a critical regression for us, forcing the removal of cold attributes which in turn has reduced performance by a notable amount due to decreased spatial locality. If cold attributes are a sufficiently obscure feature that doesn't warrant a P1, let me know and I'll set expectations appropriately. Thanks!
RE: C++ bootstrap of GCC - still useful ?
> As of a couple of months, I perform a bootstrap-with-C++ > (--enable-build-with-cxx) daily on my machine between 18:10 and 20:10 UTC. > Is there still interest in daily builds like mine ? Absolutely! Especially if you do a profiled-bootstrap and/or LTO bootstrap in that mode. Hopefully this is feasible given the recent improvements in trunk that allowed Mozilla to be built this way. Even without those things, it's quite useful to make sure it stays working. So, thanks and keep it up :)
Updating the CFG after function modifcation
Hello, I have an IPA pass (implemented as a plugin) which executes after all IPA passes. My pass transforms functions by adding code and also modifying the function prototypes. I have had this work on a per-function basis, via a GIMPLE_PASS, which calls update_ssa, verify_ssa, and cleanup_cfg after each function is processed. However, I have recently moved my plugin to execute after all IPA passes, so I can iterate over the cfg of the program. The first iteration is an analysis, and the second iteration does the transformations. Unfortunately, I keep getting errors now, primarily a segfault in "compute_call_stmt_bb_frequency" in the processing of main(). The segfault occurs because the argument 'bb' is NULL and later dereferenced. (NOTE: I do not modify the prototype of main). The e->call_stmt that the null basic block references is from a statement I have removed via gsi_remove during my transformation pass. I need to clean up the cfg somehow, after I remove the statement. My gimple pass, with this same functionality, worked fine. Something tells me that my plugin should be in a different position. I also tried calling cleanup_tree_cfg() after my transformation pass, still no luck. Any suggestions would be welcome. Thanks for even reading this far. -Matt
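One plausible shape for the missing cleanup (a sketch against the 4.6-era API, not a confirmed fix): when deleting a call statement from an IPA pass, also drop its call-graph edge, so nothing later hands compute_call_stmt_bb_frequency a statement whose basic block is gone:

/* node is the caller's cgraph node; gsi points at the call stmt.  */
struct cgraph_edge *e = cgraph_edge (node, stmt);
if (e)
  cgraph_remove_edge (e);
unlink_stmt_vdef (stmt);   /* release the virtual operands first */
gsi_remove (&gsi, true);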
PARM_DECL to SSA_NAME
Hello, I have a PARM_DECL node that I am passing to a function. Previously, my code was working, but since I have made my optimization pass operate as an IPA pass, versus a GIMPLE pass, I think I am missing some verification/resolution call that I need to make. Of course, when I pass the PARM_DECL to my function, I am now getting an error from verify_ssa() suggesting that I should be passing an SSA_NAME instance. I tried using gimple_default_def() to obtain the SSA_NAME for that PARM_DECL; however, the return value is NULL. Is there some other way of accessing the SSA_NAME information for this PARM_DECL node? The SSA has been generated before my plugin executes. Also, I do call update_ssa() after the routines are processed by my passes. Thanks for any insight. -Matt
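If the parameter is never referenced in the body, no default definition exists yet; a sketch of creating one on demand (4.6-era API, under the assumption that is what is happening here):

tree name = gimple_default_def (cfun, parm);
if (!name)
  {
    /* Default defs are SSA names defined by an empty statement.  */
    name = make_ssa_name (parm, gimple_build_nop ());
    set_default_def (parm, name);
  }
/* 'name' is now the SSA_NAME to pass around instead of the PARM_DECL.  */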
Inline Expansion Problem
Hello, I am having the compiler insert a call to a function which is defined inside another object file. However, during inline expansion via expand_call_inline(), the following assertion fails in tree-inline.c: >> 3775: cg_edge = cgraph_edge (id->dst_node, stmt); >> 3776: gcc_checking_assert (cg_edge); cg_edge comes back as being NULL since there is only one callee and no indirect calls; the function that has the inserted call is main(). Is there something I forgot to do after inserting the gimple call statement? This works fine without optimization. -Matt
Re: Inline Expansion Problem
On Sat, Aug 27, 2011 at 09:27:49AM +0200, Richard Guenther wrote: > On Sat, Aug 27, 2011 at 4:47 AM, Matt Davis wrote: > > Hello, > > I am having the compiler insert a call to a function which is defined inside > > another object file. However, during inline expansion via > > expand_call_inline(), > > the following assertion fails in tree-inline.c: > >>> 3775: cg_edge = cgraph_edge (id->dst_node, stmt); > >>> 3776: gcc_checking_assert (cg_edge); > > > > cg_edge comes back as being NULL since there is only one callee and no > > indirect > > calls; the function that has the inserted call is main(). Is there > > something I > > forgot to do after inserting the gimple call statement? This works fine > > without > > optimization. > > Dependent on where you do it you have to add/rebuild cgraph edges. Thanks Richard, I tried "rebuild_cgraph_edges()" before I sent the initial email. Unfortunately, when I call that function after I add the statement, in an IPA pass, the resulting binary does not link, as it does not seem able to resolve the symbol to the callee. Maybe providing more context would help make more sense of this. insert_func_call inserts the call by adding a new gimple call statement. I've done this tons of times before, but it seems with -O the callgraph isn't happy. >> for (node=cgraph_nodes; node; node=node->next) >> { >> if (!(func = DECL_STRUCT_FUNCTION(node->decl))) >> continue; >> >> push_cfun(func); >> old_fn_decl = current_function_decl; >> current_function_decl = node->decl; >> >> insert_func_call(func); >> >> rebuild_cgraph_edges(); >> current_function_decl = old_fn_decl; >> pop_cfun(); >> } -Matt
Re: Inline Expansion Problem
On Sat, Aug 27, 2011 at 11:25:45AM +0200, Richard Guenther wrote: > On Sat, Aug 27, 2011 at 10:06 AM, Matt Davis wrote: > > On Sat, Aug 27, 2011 at 09:27:49AM +0200, Richard Guenther wrote: > >> On Sat, Aug 27, 2011 at 4:47 AM, Matt Davis wrote: > >> > Hello, > >> > I am having the compiler insert a call to a function which is defined > >> > inside > >> > another object file. However, during inline expansion via > >> > expand_call_inline(), > >> > the following assertion fails in tree-inline.c: > >> >>> 3775: cg_edge = cgraph_edge (id->dst_node, stmt); > >> >>> 3776: gcc_checking_assert (cg_edge); > >> > > >> > cg_edge comes back as being NULL since there is only one callee and no > >> > indirect > >> > calls; the function that has the inserted call is main(). Is there > >> > something I > >> > forgot to do after inserting the gimple call statement? This works fine > >> > without > >> > optimization. > >> > >> Dependent on where you do it you have to add/rebuild cgraph edges. > > > > Thanks Richard, > > I tried "rebuild_cgraph_edges()" before I sent the initial email. > > Unfortunately, when I call that function after I add the statement, in an > > IPA > > pass, the resulting binary does not link, as it does not seem able to > > resolve > > the symbol to the callee. Maybe providing more context would help make more > > sense of this. insert_func_call inserts the call by adding a new gimple call > > statement. > > I've done this tons of times before, but it seems with -O the callgraph > > isn't > > happy. > > If you are doing this from an IPA pass you have to add the edge manually using > update_edges_for_call_stmt. Thanks Richard, I was unable to properly use update_edges_for_call_stmt. It seems that routine is for updating an existing call. In my case I am inserting a new gimple call via gsi_insert_before() with GSI_NEW_STMT. As a gimple pass, this works fine. I appreciate all of your correspondence. -Matt
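For reference, manually registering a freshly inserted call might look like this sketch (API names from roughly the 4.6 era; cgraph_get_create_node is the later spelling of cgraph_node, and the frequency/nest arguments follow the signature of that period):

/* After gsi_insert_before (&gsi, call_stmt, GSI_NEW_STMT): */
basic_block bb = gimple_bb (call_stmt);
struct cgraph_node *callee = cgraph_node (callee_decl);
cgraph_create_edge (node, callee, call_stmt,
                    bb->count,
                    compute_call_stmt_bb_frequency (node->decl, bb),
                    bb->loop_depth);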
Adding functions at compile time
I am creating a few functions at compile time, via a gcc plugin. I create the functions and their bodies, and insert them into the call graph. This is all done before "cgraph_finalize_compilation_unit()" has been called. I then have another compiler pass, which gets started after the SSA representation has been generated, and it is this pass that uses the functions created previously, in the much earlier pass. The problem is that by the time the created functions are used, the cgraph has already removed those nodes since they are disjoint. I tried creating and modifying the functions in the same pass, but that was not successful either. I did not see any flag I could set in the cgraph nodes, which are created in the first pass I mentioned, that would prevent them from being removed. Is there a way I can keep those nodes around so the functions created at compile time actually get built? -Matt
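The usual way to pin such nodes (a sketch using 4.6-era names; the exact call varies across releases) is to mark the decl as force-output before unreachable-node removal runs:

DECL_PRESERVE_P (fndecl) = 1;                    /* like attribute((used)) */
cgraph_mark_needed_node (cgraph_node (fndecl));  /* keep it out of
                                                    unreachable-node removal */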
Go Garbage Collection Roots
As some of you might know, I have been researching and working on a region-based memory management plugin for GCC. My target is specifically the Go language. With that said, I have been making a fair amount of progress. More recently, I have been benchmarking my work, and it came to my attention that I need to handle types defined in external object files. For instance, when a new List object is created, the external package for List calls "new" and returns us a nice sparkly new List object. The Go runtime implements "new" as "__go_new," which calls the runtime's special allocator to produce an object that is garbage collected. This is causing some snags in my system. Mainly, I want to use my own allocator, since there is only a special case in which I want to use garbage collection in my region system. Is there a way/interface to register data as a root in the garbage collector, so that it's not in conflict with my allocation? The other option would be to try to override "__go_new" with my own implementation, keeping the same symbol name so that the linker does the dirty work. -Matt
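The second option is plain symbol interposition. A sketch, under the assumption that libgo's allocator entry point has a signature along the lines of void *__go_new (uintptr_t); the exact prototype would need to be checked against the libgo sources:

    #include <stdint.h>

    /* Hypothetical region allocator provided by the plugin's runtime.  */
    extern void *my_region_alloc (uintptr_t size);

    /* Sketch: define our own __go_new so the linker resolves the Go
       runtime's allocation calls to the region allocator instead.
       The signature is an assumption and must match libgo's.  */
    void *
    __go_new (uintptr_t size)
    {
      return my_region_alloc (size);
    }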
Creating a structure at compile time.
I am working on a gcc-plugin where I need to create a structure at compile time. I have looked over one of the front ends to learn more about creating structures at compile time. What I have thus far is a type node for my struct.

I now need to create an instance of this struct. For illustration, call this type 'struct T' and the instance of T 'my_T'. Using the build_constructor() routine in GCC, I create an instance, my_T, whose address I need to pass to a function. So I take this decl, my_T, and pass it to build_fold_addr_expr(). The result of the latter is what I pass to the function 'fn()'.

Yes, the function I am passing the reference to is expecting the proper type, that of address-of-T. Running this presents me with an error in expand_expr_real_1(): "Variables inherited from containing functions should have been lowered by this point."

So I figured I would create a temp variable, 'V', of type pointer-to-T, run make_ssa_name() on it, and then insert an assignment before the call to fn. Looking at the GIMPLE dump, I see 'V = &my_T; fn(V);', which is correct. However, in the type list of the caller, I only see 'struct * V;'. This concerns me; I would expect to see 'struct T *V;'. As above, this case also fails.

I am baffled. Do I even need to be creating the ssa_name instance ('V' above) to pass to 'fn()'? Or will build_constructor() produce a tree node that I can treat as a variable and pass to 'fn()'?

-Matt
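For concreteness, the shape of the tree manipulation being described looks roughly like this, sketched against the GCC 4.6-era internal API (t_type, the constructor elements, and fn_decl are assumed to already exist, and a real pass would likely also need to register the variable with the varpool):

    /* Sketch: build a static instance my_T of T_TYPE, give it a
       CONSTRUCTOR as its initial value, and pass &my_T to FN_DECL.  */
    VEC(constructor_elt,gc) *elts = NULL;  /* one constructor_elt per field */
    tree my_t = build_decl (UNKNOWN_LOCATION, VAR_DECL,
                            get_identifier ("my_T"), t_type);
    TREE_STATIC (my_t) = 1;
    DECL_INITIAL (my_t) = build_constructor (t_type, elts);
    tree addr = build_fold_addr_expr (my_t);
    gimple call = gimple_build_call (fn_decl, 1, addr);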
Re: Creating a structure at compile time.
On Fri, Dec 2, 2011 at 3:38 PM, Matt Davis wrote:
> I am working on a gcc-plugin where I need to create a structure at compile time. I have looked over one of the front ends to learn more about creating structures at compile time. What I have thus far is a type node for my struct.
>
> I now need to create an instance of this struct. For illustration, call this type 'struct T' and the instance of T 'my_T'. Using the build_constructor() routine in GCC, I create an instance, my_T, whose address I need to pass to a function. So I take this decl, my_T, and pass it to build_fold_addr_expr(). The result of the latter is what I pass to the function 'fn()'.
>
> Yes, the function I am passing the reference to is expecting the proper type, that of address-of-T. Running this presents me with an error in expand_expr_real_1(): "Variables inherited from containing functions should have been lowered by this point."
>
> So I figured I would create a temp variable, 'V', of type pointer-to-T, run make_ssa_name() on it, and then insert an assignment before the call to fn. Looking at the GIMPLE dump, I see 'V = &my_T; fn(V);', which is correct. However, in the type list of the caller, I only see 'struct * V;'. This concerns me; I would expect to see 'struct T *V;'. As above, this case also fails.
>
> I am baffled. Do I even need to be creating the ssa_name instance ('V' above) to pass to 'fn()'? Or will build_constructor() produce a tree node that I can treat as a variable and pass to 'fn()'?
>
> -Matt

Well, I have successfully created and used an initialized structure. Note that I do not need to run make_ssa_name(); I can declare the struct as TREE_STATIC and work from there. Now, my problem with the expand_expr_real_1() check failing is that some of the values I initialize in my compile-time created struct can be different at runtime. Is there a way I can take this constructor tree node and have all of the values in it set in the middle of my function, where those values are defined? I do not need the structure initialized upon function entry. What I need is to have all of the values, which I have already set up, actually filled out in the middle of the function being processed, instead of at function entry. I am unsure how to do this. The constructor node exists, and I'm in the middle of an IPA pass. I assume I can call gimplify_expr(), but I am thinking I need to pass it something different than just a constructor tree node. Thanks for any help. -Matt
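One way to express "fill in the fields in the middle of the function," without relying on DECL_INITIAL at all, is to emit an explicit store per field as a gimple assignment at the desired statement. A sketch, with the field decl, the value, and the iterator assumed in hand:

    /* Sketch: store VAL into my_T.FIELD just before the statement at GSI,
       instead of initializing the whole object at function entry.  */
    tree lhs = build3 (COMPONENT_REF, TREE_TYPE (field_decl),
                       my_t, field_decl, NULL_TREE);
    gimple assign = gimple_build_assign (lhs, val);
    gsi_insert_before (&gsi, assign, GSI_SAME_STMT);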
Obtaining the arguments to a function pointer
I am trying to look at the arguments that are passed to a function pointer. I have an SSA_NAME which is for a pointer-type to a function-type. I want to obtain the arguments being passed to the function pointer, but after looking all over the SSA_NAME node and its corresponding VAR_DECL I cannot seem to find the arguments stashed anywhere. I know this is somewhat of a special case. Typically, if I had a fndecl it would be easy, but all I know in my case is the function type. -Matt
Re: Obtaining the arguments to a function pointer
On Sat, Dec 10, 2011 at 12:40 PM, Ian Lance Taylor wrote:
> Matt Davis writes:
>
>> I am trying to look at the arguments that are passed to a function pointer. I have an SSA_NAME which is for a pointer-type to a function-type. I want to obtain the arguments being passed to the function pointer, but after looking all over the SSA_NAME node and its corresponding VAR_DECL I cannot seem to find the arguments stashed anywhere. I know this is somewhat of a special case. Typically, if I had a fndecl it would be easy, but all I know in my case is the function type.
>
> A function pointer doesn't have any associated arguments, at least not as I use that word. Are you looking for the argument types? Because there are no argument values.
>
> The argument types can be found from the type of the SSA_NAME, which should be a FUNCTION_TYPE. TYPE_ARG_TYPES of the FUNCTION_TYPE will be the argument types.

Ian,
I was actually looking for the argument instances and not the types. However, I have found I can get the gimple statement for this call, and just use that to obtain the actual arguments I need. Thanks for the fast reply!

-Matt
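The resolution, reading the actual arguments off the GIMPLE call statement rather than off the SSA_NAME, looks roughly like this (a sketch; stmt is the call statement, already located):

    /* Sketch: for the GIMPLE_CALL statement STMT of the indirect call,
       the actual argument trees are recorded on the statement itself.  */
    if (is_gimple_call (stmt))
      {
        unsigned i, n = gimple_call_num_args (stmt);
        for (i = 0; i < n; i++)
          {
            tree arg = gimple_call_arg (stmt, i);
            /* ... inspect ARG ... */
          }
      }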
Modifying the datatype of a formal parameter
I am using 'ipa_modify_formal_parameters()' to change the type of a function's formal parameter. After my pass completes, I get a 'gimple_expand_cfg()' error. I must be missing some key piece here, as the failure points to a NULL "SA.partition_to_pseudo" value. I also call set_default_ssa_name() on the value returned from ipa_modify_formal_parameters() (the adjustment's 'reduction' field). Do I need to re-gimplify the function or run some kind of 'cleanup' or 'update' once I modify this formal parameter? Thanks -Matt
Re: Modifying the datatype of a formal parameter
Hi Martin, and thank you very much for your reply. I do have some more resolution to my issue.

On Mon, Dec 19, 2011 at 8:42 PM, Martin Jambor wrote:
> Hi,
>
> On Sun, Dec 18, 2011 at 01:57:17PM +1100, Matt Davis wrote:
>> I am using 'ipa_modify_formal_parameters()' to change the type of a function's formal parameter. After my pass completes, I get a 'gimple_expand_cfg()' error. I must be missing some key piece here, as the failure points to a NULL "SA.partition_to_pseudo" value. I also call set_default_ssa_name() on the value returned from ipa_modify_formal_parameters() (the adjustment's 'reduction' field). Do I need to re-gimplify the function or run some kind of 'cleanup' or 'update' once I modify this formal parameter?
>
> It's difficult to say without knowing what you are doing and at what stage of the compilation.

My pass is getting called as the last IPA pass (PLUGIN_ALL_IPA_PASSES_END). I do use the same function "ipa_modify_formal_parameters()" to add additional parameters to certain functions, and it works well.

> The sad truth is that ipa_modify_formal_parameters is very much crafted for its sole user, which is IPA-SRA, and is probably quite less general than what the original intention was. Any pass using the function must then modify the body itself to reflect the changes, just like IPA-SRA does.
>
> SRA does not re-gimplify the modified functions; it just returns TODO_update_ssa or (TODO_update_ssa | TODO_cleanup_cfg) if any EH cleanup changed the CFG.

Yep, and I do call update_ssa and cleanup_tree_cfg() after my pass.

> So I would suggest having a look at IPA-SRA (grep for the only call to ipa_modify_formal_parameters in tree-sra.c), especially at what you do differently. If you then have any further questions, feel free to ask.

Yeah, that was one of the first things I did. Now, as mentioned, I do have some more clarity on my issue. Basically, I am just changing the type of an existing formal parameter. When I look at "gimple_expand_cfg()", which is called later, I notice that the "SA.partition_to_pseudo" entry for that parameter is NULL, on which "gimple_expand_cfg()" aborts. That value is NULL because "gimple_expand_cfg()" calls "expand_used_vars()", and I need "expand_one_var()" to be called, since that is what should fix up the RTX assigned to the parameter I am modifying. Unfortunately, the bitmap "SA.partition_has_default_def" is set for the parameter, even though I do not set it explicitly. And since it is always set, "expand_one_var()" is never called. I need to unset the default def associated with the param to force "expand_one_var()" to execute. So, for the ssa name assigned to the parameter I am modifying, I use SSA_NAME_IS_DEFAULT_DEF to set the flag to 'false'. This sounds like a really gross hack. If I do this, I will need to set a new ssa definition for the modified parameter.

-Matt
Re: Modifying the datatype of a formal parameter
Here is a follow-up. I am closer to what I need, but not quite there yet. Basically, I just want to switch the type of one formal parameter to a different type.

On Mon, Dec 19, 2011 at 11:05 PM, Matt Davis wrote:
> Hi Martin, and thank you very much for your reply. I do have some more resolution to my issue.
>
> On Mon, Dec 19, 2011 at 8:42 PM, Martin Jambor wrote:
>> Hi,
>>
>> On Sun, Dec 18, 2011 at 01:57:17PM +1100, Matt Davis wrote:
>>> I am using 'ipa_modify_formal_parameters()' to change the type of a function's formal parameter. After my pass completes, I get a 'gimple_expand_cfg()' error. I must be missing some key piece here, as the failure points to a NULL "SA.partition_to_pseudo" value. I also call set_default_ssa_name() on the value returned from ipa_modify_formal_parameters() (the adjustment's 'reduction' field). Do I need to re-gimplify the function or run some kind of 'cleanup' or 'update' once I modify this formal parameter?
>>
>> It's difficult to say without knowing what you are doing and at what stage of the compilation.
>
> My pass is getting called as the last IPA pass (PLUGIN_ALL_IPA_PASSES_END). I do use the same function "ipa_modify_formal_parameters()" to add additional parameters to certain functions, and it works well.
>
>> The sad truth is that ipa_modify_formal_parameters is very much crafted for its sole user, which is IPA-SRA, and is probably quite less general than what the original intention was. Any pass using the function must then modify the body itself to reflect the changes, just like IPA-SRA does.
>>
>> SRA does not re-gimplify the modified functions; it just returns TODO_update_ssa or (TODO_update_ssa | TODO_cleanup_cfg) if any EH cleanup changed the CFG.
>
> Yep, and I do call update_ssa and cleanup_tree_cfg() after my pass.
>
>> So I would suggest having a look at IPA-SRA (grep for the only call to ipa_modify_formal_parameters in tree-sra.c), especially at what you do differently. If you then have any further questions, feel free to ask.
>
> Yeah, that was one of the first things I did. Now, as mentioned, I do have some more clarity on my issue. Basically, I am just changing the type of an existing formal parameter. When I look at "gimple_expand_cfg()", which is called later, I notice that the "SA.partition_to_pseudo" entry for that parameter is NULL, on which "gimple_expand_cfg()" aborts. That value is NULL because "gimple_expand_cfg()" calls "expand_used_vars()", and I need "expand_one_var()" to be called, since that is what should fix up the RTX assigned to the parameter I am modifying. Unfortunately, the bitmap "SA.partition_has_default_def" is set for the parameter, even though I do not set it explicitly. And since it is always set, "expand_one_var()" is never called. I need to unset the default def associated with the param to force "expand_one_var()" to execute. So, for the ssa name assigned to the parameter I am modifying, I use SSA_NAME_IS_DEFAULT_DEF to set the flag to 'false'. This sounds like a really gross hack. If I do this, I will need to set a new ssa definition for the modified parameter.

I use ipa_modify_formal_parameters() and swap the type of the param with my desired type. The resulting PARM_DECL that the latter function gives me has no default definition. So I use make_ssa_name() and set its return value as the default definition for the PARM_DECL.
That works fine; however, I then need to somehow rebuild the SSANAMES for the function. The new name I have for the modified PARM_DECL is out of order, and gimple_expand_cfg() fails, because the new definition of the PARM_DECL is now out of order for SA.partition_to_pseudo by the time gimple_expand_cfg() is called: the partition-to-pseudo logic works off the index of the SSA_NAME in the function's list of SSANAMES, and gimple_expand_cfg() iterates across all SSANAMEs, including the one I no longer need. What I need to do is replace the old SSA_NAME with the newer SSA_NAME I get back from make_ssa_name(). I could do this directly, but I have yet to find an appropriate routine in tree-flow.h and tree-flow-inline.h. -Matt
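For what it's worth, the default-def plumbing being discussed lives in the tree-flow interface. A sketch of installing a fresh default definition for the modified parameter, assuming a GCC 4.6-era API (whether this resolves the partition-ordering issue above is untested):

    /* Sketch: give NEW_PARM a fresh SSA name and register it as the
       default definition, so out-of-ssa sees a consistent view of the
       parameter.  Default defs conventionally use an empty (nop)
       defining statement.  */
    tree new_name = make_ssa_name (new_parm, gimple_build_nop ());
    SSA_NAME_IS_DEFAULT_DEF (new_name) = 1;
    set_default_def (new_parm, new_name);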
RTL Conditional and Call
Hi,
I am having an RTL problem trying to make a function call from a COND_EXEC rtx. The reload pass has already run, and, very simply, I want to compare %rdx on 64-bit x86 with a specific integer value; if the comparison is true, my function call executes. I can call the function fine outside of the conditional, but when I put it inside the conditional expression, I get the following error:

    test.c:6:1: error: unrecognizable insn:
    (insn 27 13 20 2 (cond_exec (eq:BI (const_int 42 [0x2a])
                (reg:DI 1 dx))
            (call (mem:DI (symbol_ref:DI ("abort")) [0 S8 A8])
                (const_int 0 [0]))) -1 (nil))
    test.c:6:1: internal compiler error: in insn_default_length, at config/i386/i386.md:591

The original code for the condition:

    rtx cmp = gen_rtx_EQ (BImode,
                          gen_rtx_CONST_INT (VOIDmode, 42),
                          gen_rtx_REG (DImode, 1));

And the original code for the COND_EXEC expression, which is what I emit into the program:

    rtx sym = gen_rtx_SYMBOL_REF (Pmode, "abort");
    rtx abrt_addr = gen_rtx_MEM (Pmode, sym);
    rtx abrt = gen_rtx_CALL (VOIDmode, abrt_addr, const0_rtx);
    rtx cond = gen_rtx_COND_EXEC (VOIDmode, cmp, abrt);

Thanks
-Matt
Re: RTL Conditional and Call
On Sat, Dec 31, 2011 at 12:51 AM, Alexander Monakov wrote:
>
> On Sat, 31 Dec 2011, Matt Davis wrote:
>
>> Hi,
>> I am having an RTL problem trying to make a function call from a COND_EXEC rtx. The reload pass has already run, and, very simply, I want to compare %rdx on 64-bit x86 with a specific integer value; if the comparison is true, my function call executes. I can call the function fine outside of the conditional, but when I put it inside the conditional expression, I get the following error:
>>
>> test.c:6:1: error: unrecognizable insn:
>
> Indeed, x86 does not have a "conditional call" instruction. You would have to generate the call in a separate basic block and add a conditional branch instruction around it. You can reference the following code, which attempts to convert any COND_EXECs to explicit control flow:
>
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02383.html
>
> (but you will probably need to additionally generate comparison instructions).
>
> Hope that helps,

Thanks Alexander. This does help. What I have been doing is writing the same code in C, compiling that, and then dumping the RTL. I then try to create the same RTL by hand. The second thing I need to do, as the first is already in place in my code, is to compare a register with a constant. So, just to test things, I perform a simple "COMPARE" and set the mode to CCZ, which is what my analogous C variant produces in the RTL dump. Unfortunately, I'm still getting a similar "unrecognizable insn" error. I feel lame asking so many questions, but this is something I want to get stronger with, so aside from my current gcc research, I am tossing this into the mix in my free time. I've looked at rtl.def and nothing seems incorrect. My RTX:

    rtx cmp2 = gen_rtx_COMPARE (CCZmode,
                                gen_rtx_REG (DImode, 1),
                                gen_rtx_CONST_INT (VOIDmode, 42));

Once this is in place, I would wrap it in a SET rtx and actually set the CCZ register. I'm primarily concerned with getting the comparison piece in place first.

-Matt
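A plausible explanation for the remaining "unrecognizable insn" error, offered as an educated guess rather than something confirmed later in this thread: the i386 patterns only recognize a comparison as a SET of the flags hard register, so the bare COMPARE has to be wrapped before it is emitted. A sketch:

    /* Sketch: on x86 a compare is represented as setting the flags
       register from a COMPARE rtx; a bare COMPARE is not an insn.  */
    rtx flags = gen_rtx_REG (CCZmode, FLAGS_REG);
    rtx cmp   = gen_rtx_COMPARE (CCZmode,
                                 gen_rtx_REG (DImode, 1),  /* %rdx */
                                 GEN_INT (42));
    emit_insn (gen_rtx_SET (VOIDmode, flags, cmp));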
Interface Method Table
For a Go program being compiled with gcc, is there a way, from the middle end, to figure out which routines make up the interface method table? I could check the mangled name of the method table, but is there another way to deduce which methods compose it from the middle end? Thanks! -Matt
RTL AND Instruction
Hello (again),
I have a case where I need to emit an AND operation on a register and a const_int value. The machine architecture I am looking at, for the .md, is i386. Anyway, after matching things up with rtl.def and what is in the .md, I use the gen_rtx_AND macro and wrap that in a gen_rtx_SET. I could insert inline assembly with the ASM_OPERANDS macro, but I really want to do this with pure RTL. Essentially, I just want to emit: "and %eax, $0x7". Once I emit my rtx into the list of insns, GCC gives me an "unrecognizable insn" error. I can trace the code through the first part of the condition specified in i386.md, "ix86_binary_operator_ok," and that passes fine from the "anddi_1" define_insn. What I have in my source is the following:

    rtx eax = gen_rtx_REG (DImode, 0);
    rtx and = gen_rtx_AND (DImode, eax, gen_rtx_CONST_INT (VOIDmode, 7));
    and = gen_rtx_SET (DImode, eax, and);
    emit_insn_before (and, insn);

Thanks for any insight into this. On a side note, this is just for a side project, and I am trying to get a better grasp of RTL. I have gone through the internals manual for RTL and Machine Descriptions, but it seems I am still having a bit of trouble.

-Matt
Re: RTL AND Instruction
On Sun, Jan 29, 2012 at 8:21 PM, James Courtier-Dutton wrote:
>
> On Jan 22, 2012 5:21 AM, "Matt Davis" wrote:
>> Essentially, I just want to emit: "and %eax, $0x7"
>
> Assuming AT&T format, does that instruction actually exist?
> How can you store the result in the constant number 7?
> Did you instead mean
> and $0x7, %eax

Yes, I have it working. Much thanks to everyone :-)

-Matt
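The thread does not record the final fix, but one likely correction (an assumption on my part, not confirmed by the poster) is that a SET rtx must carry VOIDmode, with the machine mode living on the operands, so the DImode passed to gen_rtx_SET in the original snippet would itself leave the insn unrecognizable:

    /* Sketch: pre-GCC 5, gen_rtx_SET takes the mode of the SET itself,
       which must be VOIDmode; hard reg 0 in DImode is %rax.  */
    rtx rax = gen_rtx_REG (DImode, 0);
    rtx op  = gen_rtx_AND (DImode, rax, GEN_INT (7));
    emit_insn_before (gen_rtx_SET (VOIDmode, rax, op), insn);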
[alpha] Request for help wrt gcc bugs 27468, 27469
Hi,
Could someone please take a look at these two bugs?

27468 - sign-extending Alpha instructions not exploited
27469 - zero extension not eliminated [on Alpha]

Andrew Pinski confirmed both of them three and a half years ago. My uninformed feeling, after seeing bugs 8603 and 42113 fixed, is that both of them are relatively simple. I CC'd Richard since you probably know more about Alpha than anyone else, and I CC'd you, Uros, since you were extremely nice and helpful with the other two previously mentioned bugs. I'm more than willing to do any testing I can, and I can get you access to a quad-833MHz ES40 to do testing on, if need be.

Thanks,
Matt Turner
[alpha] Wrong code produced at -Os, -O2, and -O3
Hi Uros and Richard,
I was rewriting the Alpha sched_find_first_bit implementation for the Linux kernel, and in the process I think I've come across a gcc bug. I rewrote the function using cmov instructions, and wrote a small program to test its correctness and performance. I wrote the function initially as an external .S file, and once I was reasonably sure it was correct, converted it to a C function with inline assembly. Compiling both produces the exact same output, as shown:

    <sched_find_first_bit>:
        ldq     t0,0(a0)
        clr     t2
        ldq     t1,8(a0)
        cmoveq  t0,0x40,t2
        cmoveq  t0,t1,t0
        cttz    t0,t3
        addq    t3,t2,v0
        ret

In my test program, I found that when I executed the rewritten implementation _before_ the reference implementation, it produced bogus results. This only happens when using the C/inline asm function. When compiled with the external .S file, the results are correct. Attached is a tar.gz with my test code. Compile the test program with `gcc -O -mcpu=... find.c rewritten.S test.c -o test` with optional -D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST. At -Os, -O2, or -O3 with both -D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST, the program will produce incorrect results and assert(). At -O0 or -O1, or without one or both of the -D flags, it will produce correct results. I've tested with gcc-4.3.4 and gcc-4.4.2. Thanks. Let me know what I can do to help further.

Matt Turner

[Attachment: sched_find_first_bit.tar.gz]
Re: [alpha] Wrong code produced at -Os, -O2, and -O3
On Thu, Apr 8, 2010 at 2:16 AM, Uros Bizjak wrote:
> On Wed, Apr 7, 2010 at 8:38 PM, Matt Turner wrote:
>
>> I was rewriting the Alpha sched_find_first_bit implementation for the Linux kernel, and in the process I think I've come across a gcc bug.
>
> [...]
>
>> Thanks. Let me know what I can do to help further.
>
> Please file a Bugzilla bug report for your problem. Otherwise, it will be lost in the mailing lists.
>
> Uros.

Sure. Thanks for the email. I've filed it in Bugzilla, with as small a test case as I could manage. Thanks!

Matt

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691
Stack mangling for anonymous function pointers
I'm working on a system where we're jumping from Java into C to pull a function out of a dictionary (indexed by string name) and calling it as a 'long (*)(void *, ...)'. There's some confusion as to whether there is a method to copy a structure or an array onto the stack through the ... arg, such that the remainder of the stack can be used for the specific arguments the function is looking for (i.e., "f(void *, int, long, long, double)"). Online documentation is ambiguous as to whether a pointer to the structure, or the whole structure, is copied onto the stack. Is there a reliable way to write data to the stack such that a called function pointer can extract the values it seeks? Thanks, Matt
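On the structure question, standard C is unambiguous even though an ABI may pass the bytes in registers rather than literally on the stack: a struct passed through '...' goes by value, and the callee retrieves the whole struct with va_arg. A minimal sketch (the struct layout here is illustrative, not taken from the original system):

    #include <stdarg.h>

    struct pair { long a, b; };

    /* Sketch: the callee names the struct type itself in va_arg; it does
       not receive a pointer unless the caller explicitly passed one.  */
    long f(void *ctx, ...)
    {
        va_list ap;
        va_start(ap, ctx);
        struct pair p = va_arg(ap, struct pair);  /* whole struct by value */
        va_end(ap);
        return p.a + p.b;
    }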
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
On Mar 13, 2009, at 10:06 AM, Paolo Bonzini wrote:

Hm. In fold-const.c we try to make sure to produce the same result as the target would for constant-folding shifts. Thus, Paolo, I think what fold-const.c does is what we should assume for !SHIFT_COUNT_TRUNCATED. No?

Unfortunately it is not so simple. fold-const.c is actually wrong, as witnessed by this program:

    #include <stdio.h>

    static inline int f (int s) { return 2 << s; }
    int main () { printf ("%d\n", f (33)); }

which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu.

But this is because i?86 doesn't define SHIFT_COUNT_TRUNCATED, no?

Yes, so fold-const.c is *not* modeling the target in this case. But on the other hand, this means we can get by with documenting the effect of a conservative truncation mask: no wrong-code bugs, just differences between optimization levels for undefined programs. I'll check that the optimizations done based on the truncation mask are all conservative, or can be made so.

So, I'd still need the information for arm and m68k, because that information is about the bitfield instructions. For rs6000 it would be nice to see what they do for 64 bits (for 32-bit I know that PowerPCs truncate to 6 bits, not 5). But for the other architectures, we can be conservative. VAX doesn't truncate at all; if you specify >31 bits it raises a reserved operand exception.
Can't pass temporary with hidden copy ctor as const ref
Hi,
I'm having trouble compiling the following with g++ 4.2.1:

    class Uncopyable {
    public:
        Uncopyable(int x) {}
    private:
        Uncopyable(const Uncopyable & other) {}
    };

    class User {
    public:
        void foo(int x) { foo(Uncopyable(x)); }
        void foo(const Uncopyable & x) {
            // do something
        }
    };

    int main () {
        User u;
        u.foo(1);
        return 0;
    }

The compiler complains that it can't find a copy ctor for 'Uncopyable'; why is this? It would seem that temporaries could be passed directly as the const ref rather than needing a copy. Message:

    test.cc: In member function 'void User::foo(int)':
    test.cc:11: error: 'Uncopyable::Uncopyable(const Uncopyable&)' is private
variadic arguments not thread safe on amd64?
I've been trying to write a program with a logging thread that consumes messages in 'printf format' passed via a struct. It seemed that this should be possible using va_copy to copy the variadic arguments, but they would always come out as garbage. This is with gcc 4.1.2 on amd64. Reading through the amd64 ABI, it's now clear that the va_list is just a struct and the actual values are stored in registers. So I imagine that when it switches threads the registers are restored and the va_list isn't valid anymore. But I can't find any documentation about whether the va_* macros were ever supposed to be thread safe. It seems that they probably are everywhere except PPC and amd64. Is there a portable way to pass a va_list between threads?

Here's an example program. If you compile it on a 32-bit machine (or even with -m32) it prints out both strings ok, but on amd64 it will print nulls for the threaded case.

    $ gcc -m32 -g -lpthread test.c
    $ ./a.out hello world
    debug: hello world
    tdebug: hello world

    $ gcc -m64 -g -lpthread test.c
    $ ./a.out hello world
    debug: hello world
    tdebug: (null) (null)

    #include <stdio.h>
    #include <stdarg.h>
    #include <pthread.h>
    #include <unistd.h>

    typedef struct log_s {
        const char *format;
        va_list ap;
    } log_t;

    log_t mylog;
    pthread_mutex_t m;
    pthread_cond_t c;

    void printlog() {
        vprintf(mylog.format, mylog.ap);
    }

    void *tprintlog(void *arg) {
        pthread_mutex_lock(&m);
        pthread_cond_wait(&c, &m);
        vprintf(mylog.format, mylog.ap);
        pthread_mutex_unlock(&m);
        return NULL;
    }

    void debug(const char *format, ...) {
        va_list ap;
        mylog.format = format;
        va_start(ap, format);
        va_copy(mylog.ap, ap);
        printlog();
        va_end(ap);
    }

    void tdebug(const char *format, ...) {
        va_list ap;
        pthread_mutex_lock(&m);
        mylog.format = format;
        va_start(ap, format);
        va_copy(mylog.ap, ap);
        pthread_cond_signal(&c);
        pthread_mutex_unlock(&m);
    }

    int main(int argc, char *argv[]) {
        pthread_t t;
        debug("debug: %s %s\n", argv[1], argv[2]);
        pthread_mutex_init(&m, NULL);
        pthread_cond_init(&c, NULL);
        pthread_create(&t, NULL, tprintlog, NULL);
        sleep(1);
        tdebug("tdebug: %s %s\n", argv[1], argv[2]);
        sleep(1);
    }
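For context, the usual portable workaround (not proposed in this thread) is to avoid handing a va_list across threads at all: render the message with vsnprintf() in the calling thread, while the va_list is still valid, and pass the finished string to the consumer. A sketch, where enqueue_log_message() is a hypothetical hand-off to the logging thread:

    #include <stdarg.h>
    #include <stdio.h>

    extern void enqueue_log_message(const char *msg);  /* hypothetical; copies msg */

    /* Sketch: format in the caller's context, hand off only plain bytes.  */
    void tdebug_safe(const char *format, ...)
    {
        char buf[1024];
        va_list ap;

        va_start(ap, format);
        vsnprintf(buf, sizeof buf, format, ap);
        va_end(ap);

        enqueue_log_message(buf);
    }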
Re: variadic arguments not thread safe on amd64?
On Mon, Apr 27, 2009 at 08:49:27PM -0700, Andrew Pinski wrote:
> On Mon, Apr 27, 2009 at 8:37 PM, Matt Provost wrote:
> > void tdebug(const char *format, ...) {
> >     va_list ap;
> >     pthread_mutex_lock(&m);
> >     mylog.format = format;
> >     va_start(ap, format);
> >     va_copy(mylog.ap, ap);
> >     pthread_cond_signal(&c);
> >     pthread_mutex_unlock(&m);
>
> You are missing two va_end's here

Yes, I had a question about va_end in this situation. Putting one that clears 'ap' seems fine but doesn't change anything. But if you va_end the copy that you put in the struct, then what happens when the other thread goes to use it? Or should the va_end for that be in the tprintlog function, after it's done with it? In any case, none of those combinations seem to affect the output.

Thanks,
Matt