GCC 3.4.3 static constants, named sections, and -fkeep-static-consts
Given the following,

  static char const rcsid[] = "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
  int main() {}

When compiled with GCC 3.4.3 at -O2, the ident string above will _not_ appear in the executable. This is apparently expected behavior. However, interestingly, gcc -fkeep-static-consts -O2 t.c did not retain the ident string, rcsid, defined above. Shouldn't -fkeep-static-consts have ensured that this static constant would appear in the executable?

I also tried adding a section attribute to the string, with the hope that the compiler would retain the static constant because it had been explicitly targeted to a named section,

  static char const __attribute__ ((section("ident_sect"))) rcsid[] = "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
  int main() {}

but this didn't prevent the elimination of the static const definition. Any suggestions on another method to ensure that this static const string makes it into the executable when compiled at -O2? And shouldn't -fkeep-static-consts have ensured that the static const string wasn't eliminated? Bug?

The logic in wrapup_global_declarations (toplev.c) doesn't look quite right:

  else if (TREE_READONLY (decl) && !TREE_PUBLIC (decl)
           && (optimize || !flag_keep_static_consts || DECL_ARTIFICIAL (decl)))
    needed = 0;

If 'optimize' is asserted above, then flag_keep_static_consts will never be tested. Perhaps it should read as follows?

  && ((optimize && !flag_keep_static_consts)

Alternatively, I wonder if flag_keep_static_consts should be tested earlier, at a higher level, for example:

  if (flag_keep_static_consts) /* needed */;

but I'm not sure which of the earlier tests that set needed = 0; are mandatory and which are optional.

Enhancement request: assert node->needed if an explicit section attribute is supplied for the declaration associated with node, on the assumption that the data is being placed in a named section for a reason.
RE: GCC 3.4.3 static constants, named sections, and -fkeep-static-consts
> From: James E Wilson
> Sent: Tuesday, March 08, 2005 6:59 PM
[...]
> Try re-reading the docs. -fkeep-static-consts is the default. The
> purpose of this is that we don't perform this optimization at -O0
> normally, but if you use -fno-keep-static-consts, then we do. So this
> option can let you remove static consts in extra cases, but will never
> prevent the compiler from removing them.

Jim, thanks for the follow-up. I filed a bug report, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20319 and note #2 summarizes some relevant, conflicting facts: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20319#c2 The gist is that the documentation describes the current behavior. However, I don't think the current behavior is useful, and it does not agree with the comments in the source code, nor with the help line.

However, as you noted, __attribute__ ((used)) works well as a workaround, although it would be helpful if `used' were added to the documentation as a supported attribute that can be applied to variables.

I think the switch name -fkeep-static-consts might be more consistently named if it were given the opposite sense and named something like -fdelete-unused-static-consts. The idea here is that by asserting the switch a particular optimization is _enabled_. Thus the optimizations performed at each level can be consistently enumerated by asserting a particular set of switches which enable specific optimizations. This would change the present user interface; however, I doubt that anyone is making extensive use of the current interface, because at present only -fno-keep-static-consts, asserted at -O0 (no optimization), actually changes the default behavior of the compiler.
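For the archives, the workaround applied to the example from my first message looks like this:

  static char const __attribute__ ((used)) rcsid[] = "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
  int main() {}

With `used' asserted on the declaration, the string is emitted even at -O2, although nothing in the translation unit references it.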
RE: Merging calls to `abort'
Richard Stallman wrote (in part):

> What's the point of cross-jumping? It saves a certain amount of
> space; it has no other benefit. All else being equal, there's no
> reason not to do it. But cross-jumping abort calls interferes with
> debugging. That's a good reason not to do it. Let's get rid of this
> optimization. Those who want to use a fancy_abort function will still
> be able to do so, but this change will be an improvement for the rest.

Would a new attribute be in order that disables the optimization? For example, __attribute__ ((unique_call))? That way, the programmer could designate procedures other than abort() as procedures which should not be cross-jumped.
RE: Hand-written rec-descent parser of GCC-4.1 is WRONG!!!
The following paper provides some background on the difficulties encountered with parsing C++: http://citeseer.ist.psu.edu/irwin01generated.html Abstract: C++ is an extraordinarily difficult programming language to parse. The language cannot readily be approximated with an LL or LR grammar (regardless of lookahead size), and syntax analysis depends on semantic disambiguation. While conventional (LALR(1) and LL(k)) parser generation tools have been used to build C++ parsers, the effort involved in grammar modification and custom code development is substantial, rivaling the effort of constructing a parser manually. [...] Link to PDF: http://tinyurl.com/3remp And a related thread on the GCC mailing list back in 2002: http://gcc.gnu.org/ml/gcc/2002-08/msg00085.html
empty switch substitution doesn't erase matching switch?
This usage of a null substitution came up while I was trying to use this form of spec for a different switch, but the following illustrates the problem using the existing gcc compiler as built for Redhat Linux running on an SGI Altix. Given a spec of this form,

  %{S:X} substitutes X, if the -S switch was given to CC.

and a switch definition for -static:

  /* %{static:} simply prevents an error message if the target machine doesn't handle -static. */

and the resulting link command spec:

  *link_command: %{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:%(linker) %l %{pie:-pie} %X %{o*} %{A} %{d} %{e*} %{m} %{N} %{n} %{r}%{s} %{t} %{u*} %{x} %{z} %{Z} %{!A:%{!nostdlib:%{!nostartfiles:%S}}}%{static:} %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate:-lgcov} %{!nostdlib:%{!nodefaultlibs:%(link_gcc_c_sequence)}} %{!A:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} }}

  % gcc --version
  gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-34)

then the command "gcc -static t.c" ultimately yields the following collect2 command:

  /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/collect2 -static /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../../crt1.o /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../../crti.o /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/crtbegin.o -L/usr/lib/gcc-lib/ia64-redhat-linux/3.2.3 -L/usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../.. /tmp/ccc2ISqV.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/crtend.o /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../../crtn.o

I haven't followed the logic in detail, but should the spec %{static:} above erase the explicit -static switch that was passed to gcc?
tips on debugging a GCC 3.4.3 MIPS RTL optim problem?
Hello, using the 3.4.3 baseline on SGI MIPS3 Irix6.5, I'm running into a problem where bad code is generated on a relatively trivial program when both -funit-at-a-time and -foptimize-sibling-calls are asserted. The nature of the failure is that the RTL optimizer seems to get confused about what value should be targeted to an argument register; it seems to coalesce two separate temporaries into one. Note that the original RTL being generated originates in some new code that I've added to support an experimental dialect of C (called UPC), so it isn't out of the question that there is some aliasing or other issue that I've introduced. However, most tests are passing, and just a few show the failure mode illustrated below. All the tests pass on i386 and IA64, fyi -- they don't demonstrate this failure.

First question: are there known problems in 3.4.3 with -funit-at-a-time and/or -foptimize-sibling-calls? (I ran a few queries of the Bugzilla database but didn't find anything.)

I confirmed the problematic optimizations by compiling the program with -O0 -funit-at-a-time -foptimize-sibling-calls and noticed that correct code is generated if either or both optimization switches are removed from the command line. I tried debugging the problem by compiling with -da and looked at the various rtl dump files:

  t.upc.00.cgraph t.upc.07.addressof t.upc.25.greg t.upc.35.mach
  t.upc.01.rtl t.upc.11.cfg t.upc.26.postreload
  t.upc.02.sibling t.upc.19.life t.upc.27.flow2
  t.upc.04.jump t.upc.24.lreg t.upc.29.ce3

The bad code shows up in t.upc.02.sibling, so probably -dr -di would have sufficed. The problem that I'm seeing is illustrated in the following RTL:

  (insn 66 65 77 0 (set (reg:SI 225 [ ]) (reg/f:SI 177 virtual-stack-vars)) -1 (nil) (nil))
  (insn 77 66 78 0 (set (reg:DI 228) (const_int 0 [0x0])) -1 (nil) (nil))
  (insn 78 77 79 0 (set (reg:DI 228) (mem/s:DI (reg/f:SI 177 virtual-stack-vars) [0 S8 A128])) -1 (nil) (nil))
  (insn 79 78 80 0 (set (reg:DI 4 $4) (reg:DI 228)) -1 (nil) (nil))
  (insn 80 79 81 0 (set (reg:SI 5 $5) (reg:SI 225 [ ])) -1 (nil) (nil))
  (insn 81 80 82 0 (set (reg:SI 6 $6) (reg:SI 224 [ ])) -1 (nil) (nil))
  (insn 82 81 83 0 (set (reg:SI 229) (unspec:SI [ (reg:SI 28 $28) (const:SI (unspec:SI [ (symbol_ref:SI ("__putblk3") [flags 0x41] ) ] 107)) (reg:SI 79 $fakec) ] 27)) -1 (nil) (nil))
  (call_insn 83 82 115 0 (parallel [ (call (mem:SI (reg:SI 229) [0 S4 A32]) (const_int 0 [0x0])) (clobber (reg:SI 31 $31)) ]) -1 (nil) (nil) (expr_list (use (reg:SI 28 $28)) (expr_list (use (reg:SI 6 $6)) (expr_list (use (reg:SI 5 $5)) (expr_list (use (reg:DI 4 $4)) (nil))
  (insn 115 83 116 0 (clobber (mem/s:BLK (reg/f:SI 177 virtual-stack-vars) [0 A128])) -1 (nil)

Above, the second argument (reg:SI $5) is set to (reg:SI 225), which in turn is set to (reg/f:SI 177 virtual-stack-vars), which is simply the frame pointer. Note that the first argument (reg:SI $4) will end up being set to the contents of the location that the frame pointer points to -- this is incorrect -- it should be set to the contents of 16($fp), or at least some other location than the double word location beginning at $fp. It looks as if the optimizer somehow aliased the two locations, or it decided somehow that they weren't both live at the same time.
If we maintain the -foptimize-sibling-calls switch but do not assert -funit-at-a-time, the following correct RTL is generated: (insn 39 38 40 0 (set (reg:SI 205) (const_int 8 [0x8])) -1 (nil) (nil)) (insn 40 39 41 0 (set (reg:SI 206) (reg/f:SI 177 virtual-stack-vars)) -1 (nil) (nil)) (insn 41 40 42 0 (set (reg:DI 207) (const_int 0 [0x0])) -1 (nil) (nil)) (insn 42 41 43 0 (set (reg:DI 207) (mem/s:DI (plus:SI (reg/f:SI 177 virtual-stack-vars) (const_int 16 [0x10])) [0 S8 A128])) -1 (nil) (nil)) (insn 43 42 44 0 (set (reg:DI 4 $4) (reg:DI 207)) -1 (nil) (nil)) (insn 44 43 45 0 (set (reg:SI 5 $5) (reg:SI 206)) -1 (nil) (nil)) (insn 45 44 46 0 (set (reg:SI 6 $6) (reg:SI 205)) -1 (nil) (nil)) (call_insn 46 45 48 0 (parallel [ (call (mem:SI (symbol_ref:SI ("__putblk3") [flags 0x41] ) [0 S4 A32]) (const_int 0 [0x0])) (clobber (reg:SI 31 $31)) ]) -1 (nil) (nil) (expr_list (use (reg:SI 28 $28)) (expr_list (use (reg:SI 6 $6)) (expr_list (use (reg:SI 5 $5)) (expr_list (use (reg:DI 4 $4)) (nil)) (insn 48 46 49 0 (clobber (mem/s:BLK (plus:SI (reg/f:SI 177 virtual-stack-vars) (const_int 16 [0x10])) [0 A128
RE: gcc 4.0.0 optimization vs. id strings (RCS, SCCS, etc.)
We use the feature of placing strings into the object file somewhat differently. We record configuration and compilation-related info into strings which are coalesced into their own linkage section. A runtime component traverses this config info section to ensure that the various separately linked modules have been compiled with consistent settings. Yes, this might be better done by a host-based tool like collect, but that requires more work and more mechanism, and the simpler approach works fine for now.
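As a sketch of the technique (the section name and string contents here are made up for illustration), each compilation unit contributes an entry along these lines:

  static const char cfg_info[] __attribute__ ((used, section ("config_info")))
    = "module=foo.c opts=-O2 abi=64";

and on ELF targets built with GNU ld, the runtime component can walk the section contents between the linker-provided __start_config_info and __stop_config_info symbols.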
RE: Ada and bad configury architecture.
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of
> Nathanael Nerode
> Sent: Monday, April 25, 2005 8:47 PM
[...]
> Actually, I was going to try to convince y'all to allow the *configury*
> to be put in the *configure* files. All of it. The current scheme of
> stuffing the configury in the Makefile, although I know the Ada
> maintainers like it, is just trouble, and is fundamentally the source of
> most or all of the endless Ada cross-build problems.

We implement an experimental dialect of C, called UPC, which targets SIMD class machines. One of the changes between 3.3 and 3.4 that has caused us the most grief is the decision to defer per-language configuration to the make step. This means that the dialect-specific configuration runs after gcc configuration, and we can no longer, for example, overlay (or add to) the basic configuration. As an example, we need to introduce dialect-specific runtime start and end object files (serving a similar function to crtbegin.o and crtend.o), but the common start and end files are now built well before the UPC language files are even configured. Thus, there is no mechanism to add language-specific components onto the list of files that come with the base-level compiler. For 3.4 we've worked around the problem, but the workaround is kludgy.

In a related matter, I find it difficult to debug the makefiles that make use of included makefile fragments. I can see some advantages of these included files for developers who happen to be working on those fragments, but overall, the include files make life more difficult. Same thing goes for the included configure fragments, IMO. And while I'm ranting, I'd much prefer it if the make files were 'for-loop free'; that is, that they listed explicit dependencies and built those dependents in classic makefile fashion, rather than implementing iteration in the make step. Most of these suggestions argue for a method to generate make files in a more automated fashion.
RE: GCC 4.1: Buildable on GHz machines only?
> -Original Message-
> From: Matt Thomas
> Sent: Tuesday, April 26, 2005 10:42 PM
[...]
> Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was
> already doing) only decreased the bootstrap time by 10%. By far, the
> longest bit of the bootstrap is building libjava.

Is it fair to compare current build times, with libjava included, against past build times when it didn't exist? Would a closer apples-to-apples comparison be to bootstrap GCC Core only on the older sub-GHz platforms?
RE: GCC 3.4.4 Status (2005-04-29)
> From: Mark Mitchell > Sent: Friday, April 29, 2005 12:00 PM > > Now that GCC 4.0 is out the door, I've spent some time looking at the > status of the 3.4 branch. As stated previously, I'll be doing a 3.4.4 > release, and then turning the branch over to Gaby, to focus > exclusively on 4.0/4.1. [...] What is the target date for 3.4.4? Thanks.
GCC 3.3.6 - anomalous debug info?
configuration: i386-redhat-linux (Redhat 9.2), gcc 3.3.6 ("make bootstrap" from the sources), and gdb "(5.3post-0.20021129.18rh)" as well as gdb 6.3 (latest) built from sources.

I'm working on some changes to GCC 3.4.3, which I've built using gcc 3.3.6. The GCC (3.4.3) that I'm debugging is compiled with -g -O0, with --enable-checking. However, I notice that when I fire up GDB 5.3, it says:

  Breakpoint 4, main (argc=13, argv=0xbfffdc04) at /upc/gcc-upc/src/gcc/main.c:35
  35        return toplev_main (argc, argv);
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.

and the latest gdb 6.3 (built from sources) says the following:

  Breakpoint 4, main (argc=13, argv=0xbfffe7e4) at /upc/gcc-upc/src/gcc/main.c:35
  35        return toplev_main (argc, argv);
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.

Neither of these differing series of warning messages gives me confidence that the debugging info is correct. Is this a gcc problem, or a gdb problem? (I made a few quick probes in the Bugzilla database, but couldn't find anything that seemed relevant to malformed debug info.) Anyone else see messages like this when debugging gcc? Is there a workaround or fix? Is this something unique to what is now a fairly old version of the Linux OS?

BTW, one of the reasons I tried this with the latest GCC 3.3.6 and GDB 6.3, and compiled at -O0, was to see if some problems I was seeing where the debugger was having trouble navigating gcc's object->source mapping might be fixed. I saw a similar problem using an earlier version of gcc "3.2.2 20030222 (Red Hat Linux 3.2.2-5)".
RE: Full comparison in 'cbranchsi4' leads to error in gcc 4.0
> This works fine on gcc 3.4, however on gcc 4.0 it creates an error during
> optimization. According to my investigation, the error occurs when there is a
> division by a constant power of 2 which needs to be transformed into shifting.
> The error generated is:
>
> internal compiler error: in emit_cmp_and_jump_insn_1, at optabs.c:3599

The easiest thing to do is to debug gcc: set a breakpoint on fancy_abort, and go up a few levels to emit_cmp_and_jump_insn_1(). Note the incoming rtx args (x and y) and mode. From the looks of the code in there, it is looking for an instruction pattern that matches, and when no match is found, it tries a wider mode, until there are no wider modes, and then it aborts. You need to find the mode and rtx arguments that are being passed in, and then understand why no matching instruction is found. For example, your instruction pattern,

  (define_insn "cbranchsi4"
    [(set (pc) (if_then_else
                 (match_operator 0 "comparison_operator"
                   [(match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "nonmemory_operand" "r")])
                 (label_ref (match_operand 3 "" ""))
                 (pc)))]
    ""
    "c%C0jump %1 %2 %3"
    [(set_attr "type" "branch") (set_attr "length" "1")]
  )

isn't prepared to match a memory operand. Perhaps the optimizer pre-calculated a constant, and targeted the constant into memory rather than a register? In that case, there will be no match on the third argument because the pattern is expecting a "nonmemory_operand".
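To make the suggestion above concrete, a debugging session might look roughly like this (the cc1 invocation and the parameter names x, y, and mode are assumptions based on the code around the abort; adjust for your port and build tree):

  % gdb ./cc1
  (gdb) break fancy_abort
  (gdb) run -O2 t.c
  (gdb) up 3
  (gdb) print mode
  (gdb) call debug_rtx (x)
  (gdb) call debug_rtx (y)

debug_rtx() prints the operands in the same form as the RTL dump files, which should show whether one of them is a MEM (or a constant that landed in memory) that the cbranchsi4 pattern above cannot match.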
packaging a GCC binary distribution so it can be installed at arbitrary locations?
Given a binary distribution of GCC, for example, built to install under /usr/local, is it possible to configure and build the compiler in such a way that a binary packaging method such as RPM can allow a user to specify an alternate installation point (perhaps /opt, or even the user's home directory) and have it all work? My impression is that too many hard-coded paths are wired into gcc.c when it is built to make this ability to migrate the binary possible. There are workarounds for the user, such as setting various environment variables and using the -B switch, but I'm looking for a method that directly allows installation of the binary to a place other than where it was initially configured. Anyone found a way to do this?

(Separately, GCC 3.4 is now built using dynamic libraries for libgcc and libunwind, and these cause some different problems invoking gcc [assuming the user would prefer not to adjust their library path or doesn't have access to /etc/ld.so.conf]. I think things could be made simpler by specifying various -rpath settings when the executable is linked, but these -rpath settings may have to be fixed up when installing the binary to a place other than where it was built, unless the entries can be made relative to the executable.)
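For what it's worth, the workarounds alluded to above look roughly like this (the installation prefix is hypothetical):

  % export GCC_EXEC_PREFIX=/opt/gcc-3.4/lib/gcc/
  % /opt/gcc-3.4/bin/gcc -B/opt/gcc-3.4/lib/gcc/ hello.c -o hello

but that has to be wrapped around every invocation of the driver, which is exactly what I'd like to avoid by having a relocated installation just work.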
Is -static a link-only switch?
Does the -static switch play any role during compilation, or is it a link-only switch? A quick review of gcc.c indicates that -static may play a role on some targets:

  /* %{static:} simply prevents an error message if the target machine doesn't handle -static. */

However, the info documentation shows the following:

  *Note Options for Linking: Link Options. OBJECT-FILE-NAME -lLIBRARY -nostartfiles -nodefaultlibs -nostdlib -s -static -static-libgcc -shared -shared-libgcc -symbolic -Wl,OPTION -Xlinker OPTION -u SYMBOL.

I can think of target OSes that might define a different ABI for procedure calls for programs compiled with -static asserted than when compiled for a dynamic linking environment, but I can't quite tell whether -static in fact has any effect during compilation.
RE: packaging a GCC binary distribution so it can be installed at arbitrary locations?
> Yes, with recent versions of gcc you can move the entire tree around
> and the gcc driver will still be able to find the various internal
> executables and header files.
[...]

Ian, thanks. Which versions qualify as "recent" above? GCC 3.4, or 4.0, or both? Is there any documentation on how the new packaging mechanism works? If this was discussed on this list, would you happen to know approximately when (so I can do a search of the archives)?
RE: Is -static a link-only switch?
Ian Lance Taylor wrote (in part):

> In fact many targets compile code differently depending upon whether
> the code is to be put into a shared library or not, but this is
> controlled via options like -fpic, not -static.

Is it generally safe on all currently supported targets to assert -fno-pic when compiling programs that will ultimately be linked with -static asserted? Will targets that don't support -fpic (and -fno-pic) complain, or just quietly accept the switch?
RE: packaging a GCC binary distribution so it can be installed at arbitrary locations?
Ian Lance Taylor wrote (in part):

> Telling the dynamic linker about a dynamic libgcc is still a problem,
> but that is a problem wherever you put the compiler.

If I'm not interested in building a dynamically linked gcc, or in building libgcc and related libraries as dynamic libraries, can I simply assert --disable-shared when configuring gcc, and thus ensure that the resulting compiler binaries can be easily moved around?
C99 implies -Wimplicit-function-declaration?
I notice that while compiling with -std=c99 (which asserts flag_isoc99), the compiler issues warnings by default when it detects that a function call references a function which has not been previously declared. Although it is a useful warning, my copy of the C99 spec seems to indicate that such a warning is optional. My copy of the C99 standard (2nd edition, 1999-12-01) says the following in Annex I ("Common Warnings"):

--- begin quote
1 An implementation may generate warnings in many situations, none of which are specified as part of this International Standard. The following are a few of the more common situations. [...] A function is called but no prototype has been supplied (6.5.2.2).
--- end quote

There appears to be no requirement for the compiler to issue a warning, although one does seem to be permitted by the specification. Also, this behavior is not reflected in the documentation, http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Warning-Options.html#Warning-Options

  -Wimplicit-function-declaration -Werror-implicit-function-declaration
    Give a warning (or error) whenever a function is used before being declared.
    The form -Wno-error-implicit-function-declaration is not supported.
    This warning is enabled by -Wall (as a warning, not an error).

  -Wimplicit
    Same as -Wimplicit-int and -Wimplicit-function-declaration.
    This warning is enabled by -Wall.

The documentation is technically incorrect, because at the top of the page it states: "This manual lists only one of the two forms, whichever is not the default." However, for C99 the option is enabled by default.
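For reference, a minimal case that shows the behavior (file and function names are just for illustration, and the diagnostic text is approximate):

  /* t.c */
  int main (void)
  {
    return foo (1);
  }

  % gcc -std=c99 -c t.c
  t.c: In function 'main':
  t.c:4: warning: implicit declaration of function 'foo'

whereas compiling without -std=c99 stays quiet unless -Wimplicit-function-declaration or -Wall is given.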
RE: C99 implies -Wimplicit-function-declaration?
Joseph S. Myers wrote (in part):

> No prototype is different from no declaration at all. Implicit function
> declarations are not part of C99, so the code is in error in C99 mode.

OK, thanks. I (now) understand that the reference to a warning about a missing prototype does not apply. However, I don't see anything in section 6.5.2.2 (rev. 1999-12-01) that says that a function declaration or prototype declaration must (or should) precede a call to the function. And GCC isn't treating it as an error, but rather is enabling the warning by default. The code reads as follows (in c-objc-common.c):

  /* If still unspecified, make it match -std=c99
     (allowing for -pedantic-errors).  */
  if (mesg_implicit_function_declaration < 0)
    {
      if (flag_isoc99)
        mesg_implicit_function_declaration = flag_pedantic_errors ? 2 : 1;
      else
        mesg_implicit_function_declaration = 0;
    }

And mesg_implicit_function_declaration is initialized to -1 (c-common.c):

  /* Nonzero means message about use of implicit function declarations;
     1 means warning; 2 means error. */
  int mesg_implicit_function_declaration = -1;
RE: Sine and Cosine Accuracy
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of
> Menezes, Evandro
> Sent: Friday, May 27, 2005 1:55 PM
[...]
> That's because the error is the same but symmetrical for sin and
> cos, so that, when you calculate the sum of their squares, one
> cancels the other out.
>
> The lack of accuracy in x87 is well known: see
> http://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html#Errors-in-Math-Functions.

Ulrich Drepper used a different method to compute math function accuracy, described here: http://people.redhat.com/drepper/libm/index.html It might be interesting to re-run the safe/unsafe/x87 tests using his methodology. His results offer comparisons on a number of platforms, and the visual representation of the errors can offer some insight into the behavior of the implementation.
RE: What is wrong with Bugzilla?
As an occasional user of the Bugzilla database, I don't find it terrible to use, though it would be nice if there were an abbreviated interface that catered to the sorts of queries that users issue the most. These often-occurring queries might best be determined by saving a month's worth of queries and ferreting out the types of queries that occur most often. I also didn't find the requirement that I register my e-mail address to be particularly surprising or burdensome. As an aside, I often stumble into the middle of a Redhat discussion list thread via Google that seems to relate to a problem that I've encountered. Redhat for some reason requires https access, and in IE6, my browser of non-choice, I have to click OK to view the page. Now, _that_ is annoying.

What may be confusing to users: where do I report my problem? If I'm a Redhat user, do I log my potential GCC problem to their support site, or to the GCC site? To further confuse matters, for most users, the vendors often modify a given version of GCC to include specific patches and build options of their choice. This of course argues for logging bugs with the vendor. One wonders whether the vendors are timely in reporting legit bugs back to the GCC Bugzilla database, but one hopes so. If we for the moment assume that users of pre-packaged distributions report their bugs back to the vendor, then the GCC mailing lists and bug lists are left for those brave souls who are using GCC source code distributions directly. (Perhaps the GCC maintainers can comment on whether this theory in fact holds.)

Matters are further complicated by the fact that there are now several viable GCC releases to choose from: 3.3.x, 3.4.x, 4.0.x, CVS head, and so on. There's even the occasional bug filed against one of the many branches. When we consider the multitude of choices, it is amazing that there is any forward progress.

As a casual reader of the GCC lists, I do have one observation: the volume on the GCC bug list is very, very high. Often the bug traffic there relates to regressions and bugs that are found on the CVS head or recent development releases. As a user of the older releases (3.3, 3.4), I'd much prefer it if there were two separate bug-reporting lists: one for the more stable released versions, and a separate list for the "latest". I'd also like it if there were a web page for each stable release that showed the results of a canned Bugzilla query which lists open bugs and/or recently closed bugs against the stable releases (not sure how this would be organized).

As far as the tenor of the GCC mailing list goes, it is true that responses to "dumb questions" are often terse, but they're generally helpful. I think this is to be expected, when interacting with busy developers who have to balance many priorities and pressing deadlines. I've been particularly disappointed by queries to related lists like the glibc list, which of course is an equally important component of a useful C compilation system. I would vote affirmatively for somehow more closely linking GCC releases with specific GLIBC distributions, and having some sort of tighter coordination between the two. However, after delving into GLIBC on a particular platform, I can see where handling the many varieties of GLIBC builds is a big problem, and appears to be one that presently the vendors mainly deal with.
RE: What is wrong with Bugzilla? [Was: Re: GCC and Floating-Point]
> Next try documentation, installation. Talks about compiling again.
> Finally, at download, binaries I find what I want. Seeing as I suspect
> that is the link most people want when they first visit, it should
> perhaps be a little more obvious, and in the main body near the top?

Your scenario makes a lot of sense. However, it should be possible to verify actual usage patterns by investigating web site logs, to see which pages are visited and (perhaps) in what order. Based upon this information, the pages can be re-organized to place first, and most prominently, the pages that are generally visited first. Sub-question: which version would the maintainers recommend that a user looking for a stable release try first (3.3, 3.4, or 4.0)?
semantics of null lang_hooks.callgraph.expand_function?
While working with GCC's language hooks, we found that certain places in GCC test for a null value of lang_hooks.callgraph.expand_function, but cgraph_expand_function() calls the hook directly. In cgraphunit.c:

  /* Expand function specified by NODE.  */
  static void
  cgraph_expand_function (struct cgraph_node *node)
  {
    tree decl = node->decl;
    /* We ought to not compile any inline clones.  */
    gcc_assert (!node->global.inlined_to);
    if (flag_unit_at_a_time)
      announce_function (decl);
    cgraph_lower_function (node);
    /* Generate RTL for the body of DECL.  */
    lang_hooks.callgraph.expand_function (decl);

In toplev.c:

  /* Disable unit-at-a-time mode for frontends not supporting callgraph interface.  */
  if (flag_unit_at_a_time && ! lang_hooks.callgraph.expand_function)
    flag_unit_at_a_time = 0;

In function.c:

  /* Possibly warn about unused parameters.  When frontend does unit-at-a-time,
     the warning is already issued at finalization time.  */
  if (warn_unused_parameter && !lang_hooks.callgraph.expand_function)
    do_warn_unused_parameter (current_function_decl);

We tried setting lang_hooks.callgraph.expand_function to NULL:

  /* For now, disable unit-at-a-time by setting expand_function to NULL */
  #undef LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION
  #define LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION NULL

which has the desired effect of disabling unit-at-a-time, but runs aground in cgraph_expand_function() with a segfault, when it attempts to call lang_hooks.callgraph.expand_function(). It seems that GCC is handling lang_hooks.callgraph.expand_function in an inconsistent fashion. Is a null value for expand_function meaningful? If it is, then what is the fix for cgraph_expand_function()?
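If a null hook is in fact meant to be supported (as the tests in toplev.c and function.c suggest), then one minimal sketch of a fix would be to guard the call the same way those tests do; whether that is the right answer, or whether a null hook simply shouldn't be allowed, is exactly the question above:

  /* Generate RTL for the body of DECL, but only if the front end
     provides a callgraph expansion hook (a sketch, not a vetted fix).  */
  if (lang_hooks.callgraph.expand_function)
    lang_hooks.callgraph.expand_function (decl);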
Should bootstrap-O3 be the default for building/testing GCC?
Currently, the default optimization level when building/bootstrapping GCC is -O2. We routinely build with --with-build-config='bootstrap-debug bootstrap-O3' because we want to verify that our UPC changes don't affect the compiler when built with full optimizations. We also build with --enable-checking=all. Since most developers probably build/test GCC with the default -O2 options, we fairly often run into -O3-related issues when building GCC. Enough so that we're considering just using the default -O2 settings. I'm wondering if there might be benefit in changing the current defaults to use -O3 instead? Or perhaps have the configure infrastructure determine that the build is for a development version of GCC and set the flags and options accordingly? Somewhat related: has anyone recently determined whether a GCC built with -O3 is generally faster/smaller than one built at -O2? thanks, - Gary
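P.S. For reference, our build is configured roughly along these lines (the prefix and language list are only illustrative):

  .../gcc/configure --prefix=/usr/local/gupc \
      --enable-languages=c,c++ \
      --enable-checking=all \
      --with-build-config='bootstrap-debug bootstrap-O3'
  make && make -k check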
Re: Autotuning parameters/heuristics within gcc - best place to start?
On 09/26/14 07:47:05, Andi Kleen wrote: > One example of an existing autotuner is the gccflags tuner in opentuner. Although dated, ACOVEA might offer up some ideas. http://stderr.org/doc/acovea/html/acovea_4.html
Re: organization of optimization options in manual
On 01/14/15 23:15:59, Jeff Law wrote: > Sounds good. I think just starting with the list & creating the buckets > with the list. Then post here and we'll iterate and try to nail that down > before you start moving everything in the .texi file. Something to consider, if the optimization options are re-worked: Arrange the -O options such that -O1 can be described by a distinct set of specific optimizations enabled (or disabled) in addition to -O0, and -O2 would be described as a composite of specific optimizations applied to -O1 and so on. (This might require the addition of new optimization options.) For completeness, if a specific optimization requires certain passes or the assertion of other options, that should somehow be encoded internally within the compiler. This would potentially make it easier to find which optimization (or pass) is causing a regression and might make it easier for users to understand the exact effect of a particular -O option. - Gary
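P.S. As a partial data point on enumerating what each level enables today, the driver can already report the flag-level view, for example:

  gcc -Q -O1 --help=optimizers > /tmp/O1
  gcc -Q -O2 --help=optimizers > /tmp/O2
  diff /tmp/O1 /tmp/O2

although that listing is incomplete, since some optimizations are tied to passes rather than to individual -f options, which is part of what the suggestion above would address.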
how to make sure an init routine is kept in the call graph?
Recently, we tried to merge the GCC trunk into the GUPC branch and ran into an issue caused by a recent GCC update. The last successful merge was trunk version 172359, fyi. For certain UPC file scope static initializers, a per-file initialization routine is created, and its address is added to a global table (in its own section). The UPC runtime will call all the routines listed in that table before transferring control to the user's main program. After the recent trial merge, we see the following on some of our test cases:

  ../../gcc/xupc -O2 -g -fdump-ipa-cgraph -fdump-tree-cfg test25.upc -o test25
  /tmp/ccygQ8JN.o:(upc_init_array+0x0): undefined reference to `__upc_init_decls'

The call graph dump entry for `__upc_init_decls' is as follows:

  __upc_init_decls/80(80) @0x71eb3de0 (asm: __upc_init_decls) body finalized
    called by:
    calls:
    References:
    Refering this function:

As expected, no explicit references have been recorded. The compiler routine that creates this initialization routine is called from c_common_parse_file():

  push_file_scope ();
  c_parse_file ();
  /* Generate UPC global initialization code, if required.  */
  if (c_dialect_upc ())
    upc_write_global_declarations ();
  pop_file_scope ();

The routine that builds the initialization function is upc_build_init_func() in gcc/upc/upc-act.c (on the gupc branch). This routine does the following to build the function, mark it as used and referenced, and then add its address to the initialization table:

  DECL_SOURCE_LOCATION (current_function_decl) = loc;
  TREE_PUBLIC (current_function_decl) = 0;
  TREE_USED (current_function_decl) = 1;
  DECL_SECTION_NAME (current_function_decl) =
    build_string (strlen (UPC_INIT_SECTION_NAME), UPC_INIT_SECTION_NAME);
  /* Swap the statement list that we've built up, for the current statement list.  */
  t_list = c_begin_compound_stmt (true);
  TREE_CHAIN (stmt_list) = TREE_CHAIN (t_list);
  cur_stmt_list = stmt_list;
  free_stmt_list (t_list);
  t_list = c_end_compound_stmt (loc, stmt_list, true);
  add_stmt (t_list);
  finish_function ();
  gcc_assert (DECL_RTL (init_func));
  upc_init_array_section = get_section (UPC_INIT_ARRAY_SECTION_NAME, 0, NULL);
  mark_decl_referenced (init_func);
  init_func_symbol = XEXP (DECL_RTL (init_func), 0);
  assemble_addr_to_section (init_func_symbol, upc_init_array_section);

In the past, setting TREE_USED() and calling mark_decl_referenced() was sufficient to make sure that this routine was not removed from the call graph. What is needed in the new scheme of things to ensure that this initialization function stays in the call graph? thanks, - Gary
Re: how to make sure an init routine is kept in the call graph?
On 04/22/11 11:14:11, Richard Guenther wrote: > GF: What is needed in the new scheme of things to ensure that this > GF: initialization function stays in the call graph? > > Try setting DECL_PRESERVE_P to 1. Richard, thanks. That worked. - Gary
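P.S. For the archives, the change amounts to roughly one line in upc_build_init_func(), next to the existing TREE_USED()/mark_decl_referenced() calls shown in my earlier message:

  /* Keep the per-file initialization routine in the call graph,
     even though nothing in the IL refers to it.  */
  DECL_PRESERVE_P (current_function_decl) = 1;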
Re: RFC: [GUPC] UPC-related changes
This email is a follow-up to an email with a similar title (posted a year ago). During that time period, we have worked on making the changes suggested by Joseph Myers, Tom Tromey, and other reviewers. We have also implemented various bug fixes and improvements. Our goal with this RFC is to acquaint the reviewers with UPC and the impact of the UPC changes on the GCC front end, and to gain consensus that the changes are acceptable for incorporation into the GCC trunk. Once we make further suggested changes, and have a consensus on this batch of changes, I will send out RFC's for the "middle end" (the lowering pass), "debugging" (UPC-specific DWARF extensions), "runtime" (libupc), and "testing". Those additional RFC's are likely to be more modular and will have less impact on the GCC infrastructure. The email describing the UPC-related front-end and infrastructure changes was posted to the gcc-patches mailing list: http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html Thanks, - Gary
Re: GCC 4.7.0 Status Report (2011-09-09)
On 09/09/11 09:09:30, Jakub Jelinek wrote: > [...] What is the status of lra, reload-2a, pph, > cilkplus, gupc (I assume at least some of these are 4.8+ material)? For GUPC, we are targeting GCC 4.8. thanks, - Gary
Re: Profiling gcc itself
Two more suggestions (off-topic to the profiling point, but on topic to the idea of speeding up builds involving invocations of GCC): ccache: http://ccache.samba.org/ "ccache is a compiler cache. It speeds up recompilation by caching previous compilations and detecting when the same compilation is being done again. Supported languages are C, C++, Objective-C and Objective-C++." distcc: http://code.google.com/p/distcc/ "distcc is a program to distribute builds of C, C++, Objective C or Objective C++ code across several machines on a network. distcc should always generate the same results as a local build, is simple to install and use, and is usually much faster than a local compile."
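For example, the typical hookup looks like this (the host names and job counts are only illustrative):

  # ccache: interpose the cache when configuring the tree to be built
  ./configure CC='ccache gcc' CXX='ccache g++' ...
  make

  # distcc: list the helper machines, then raise the job count
  export DISTCC_HOSTS='localhost buildhost1 buildhost2'
  ./configure CC='distcc gcc' CXX='distcc g++' ...
  make -j12

Note that a GCC bootstrap rebuilds the compiler with the just-built compiler at each stage, so these mainly help with repeated configure/build cycles and with the portions of the build done by the host compiler.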
Re: GCC 4.7.0 Status Report (2011-12-06)
On 12/06/11 01:18:28, Joseph S. Myers wrote: > [...] It still seems reasonable to aim for > entering Stage 4 (regression fixes and documentation changes only) in > early January and the 4.7.0 release in March or April. At what point in time would the GCC 4.7 branch be created, and the trunk would then be open for new contributions (not planned for the 4.7 release)? Is that also early Jan.? Thanks, - Gary
RFC: cgraph/lowering vs. finish_file for GCC/UPC rewrites?
Recently, we have been working on upgrading GCC/UPC (see http://gccupc.org) to the GCC trunk. Previously, we sync'ed with the latest stable release, but now we want to stay more current. When built with GCC versions 4.0 through 4.3, we used the gimplify language hook, LANG_HOOKS_GIMPLIFY_EXPR, to rewrite trees that refer to UPC constructs and UPC shared variable references, converting them into non-UPC, gimplified tree structures. This worked well, though we did need to extend the language hook to include a gimplify test predicate and fallback so that we can rewrite modify_expr's involving UPC shared variables as the target:

  int upc_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
                         bool (* gimple_test_f) (tree), int fallback)

Working with the latest GCC 4.5 snapshot, we have run into a problem that leads me to believe that the current approach will no longer work with the 4.5/trunk version of GCC. In prior GCC versions, the gimplify pass was called before the call graph pass. This meant that we could safely employ the gimplify language hook to perform the rewrites, which may emit inlined runtime calls. An example UPC-related rewrite is to transform UPC shared variable references into runtime calls. This program:

  shared int x;
  shared int y;
  int main() { x = y; }

might be translated into something like:

  int main()
  {
    int y_tmp = upc_get_int(upc_shared_addr(&y));
    upc_put_int(upc_shared_addr(&x), &y_tmp);
  }

The definitions of the runtime functions upc_put_int() and upc_get_int() are found in a pre-included header file (the UPC driver adds a -include switch on the command line). Depending upon optimization level and compile-time switches, calls to the UPC runtime functions can be implemented as either inlined function calls or conventional calls to pre-compiled library routines. At optimization levels above -O0, most of the UPC runtime is inlined by default.

With the new/current organization of the compilation/call graph passes, we end up with the surprising result that the inlined runtime function definitions "disappear" before UPC's gimplify pass can refer to them. That's because the call graph pass noticed that the inline runtime functions were declared, but not referenced (yet). The gimplify pass is then run against the remaining function bodies, but the UPC runtime functions are no longer available.

One workaround for this issue might be to mark the runtime functions, in a fashion similar to ctors/dtors, so that the call graph pass won't eliminate them. I'm unsure if that will get the inlining aspects of those routines right, and it might retain unused function definitions in the form of compiled non-inlined code.

GOMP appears to use a "lowering" pass that runs after the call graph and gimplify passes. It calls runtime routines via builtin function definitions, ensuring that the function definitions won't go away. However, it looks to me as if GOMP does not inline those runtime functions?

OBJC implements some post-processing in the finish_file() hook routine, which in turn calls objc_finish_file(). That may be a reasonable place to relocate UPC's tree rewrites, but that leads to a few questions: Can gimplify_expr() be safely called on the same tree more than once? The question comes up because the simplest thing is to retain the current infrastructure where UPC rewrites occur in the gimplify language hook.
The second gimplify pass will redo some work, calling out to the UPC language hook again, but since all UPC constructs have been rewritten and gimplified, there will be no additional work done besides the traversal. How about an alternative approach that implements a custom tree-walk inside finish_file() (similar in structure to the one implemented in omp-low.c)? Is such a rewrite routine allowed to selectively gimplify parts of the tree and/or to create temp variables managed by the code in gimplify.c? Is the description above of the interactions between the cgraph, gimplify, and lowering passes correct? What approach would you recommend for the implementation of UPC tree rewrites that will support calls to the runtime (that are inlined, if applicable)? thanks, - Gary
Re: RFC: cgraph/lowering vs. finish_file for GCC/UPC rewrites?
On 09/14/09 11:52:11, Richard Guenther wrote: > Without reading all the details of your mail I suggest that you > perform a custom walk over the function bodies right before > the frontend calls cgraph_finalize_compilation_unit () that > performs the necessary lowering (and function creation) to > GENERIC. The C++ frontend already does this during its > genericize phase to transform frontend specific trees to > middle-end GENERIC trees. Richard, thanks. Will take a look at how C++ handles things. - Gary
reghunt and "trunk" (GCC 4.5.x)?
Hello, I'm trying to set up 'reghunt' to track down a change in behavior from 2009-03-27 (4.4.3) to present. This is my first time setting up 'reghunt' - it is quite possible that I still haven't got things set up properly. I think that I've got the SVN bits, and most of the config settings as they should be, but when I try to run my test, it fails trying to build 'cc1':

  /bin/sh gcc-reg-hunt/reghunt/src/gcc/../move-if-change tmp-options.h options.h
  echo timestamp > s-options-h
  TARGET_CPU_DEFAULT="" \
  HEADERS="auto-host.h ansidecl.h" DEFINES="" \
  /bin/sh gcc-reg-hunt/reghunt/src/gcc/mkconfig.sh bconfig.h
  x86_64-redhat-linux-gcc -c -g -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -Igcc-reg-hunt/reghunt/src/gcc -Igcc-reg-hunt/reghunt/src/gcc/build -Igcc-reg-hunt/reghunt/src/gcc/../include -Igcc-reg-hunt/reghunt/src/gcc/../libcpp/include -Igcc-reg-hunt/reghunt/src/gcc/../libdecnumber -Igcc-reg-hunt/reghunt/src/gcc/../libdecnumber/bid -I../libdecnumber \
  -o build/errors.o gcc-reg-hunt/reghunt/src/gcc/errors.c
  as: line 83: exec: : not found

(above, lines split for readability)

Above, 'as' is a script, and at line 83 it is trying to invoke the assembler, which indirectly will try to invoke ORIGINAL_AS_FOR_TARGET, but that variable is empty: ORIGINAL_AS_FOR_TARGET=""

I notice that the build script, 'reghunt/bin/gcc-build-simple', does some explicit configure/make steps:

  #msg "configure"
  ${REG_GCCSRC}/configure \
    --prefix=$REG_PREFIX \
    --enable-languages=$REG_LANGS \
    $REG_CONFOPTS \
    > configure.log 2>&1 || abort " configure failed"
  #msg "make libraries"
  make all-build-libiberty > ${LOGDIR}/make.all-build-libiberty.log 2>&1 || true
  make all-libcpp > ${LOGDIR}/make.all-libcpp.log 2>&1 || true
  make all-libdecnumber > ${LOGDIR}/make.all-libdecnumber.log 2>&1 || true
  make all-intl > ${LOGDIR}/make.all-intl.log 2>&1 || true
  make all-libbanshee > ${LOGDIR}/make.all-libbanshee.log 2>&1 || true
  make configure-gcc > ${LOGDIR}/make.configure-gcc.log 2>&1 || true

and then:

  cd gcc
  # REG_COMPILER is cc1, cc1plus, or f951
  #msg "make $REG_COMPILER"
  make $REG_MAKE_J $REG_COMPILER > ${LOGDIR}/make.${REG_COMPILER}.log 2>&1 \
    || abort " make failed"
  msg "build completed"

which is where we're failing. I know that in the past, I've had trouble building 'gcc' by first explicitly running a make on its configure-gcc target, because it seems that some other precursors might've been left out - and this area of configuration/build may have experienced some subtle changes over the past year/two. I'm guessing that I need to chase a config/setup problem of some sort, but my top-level question is: Has anyone used 'reghunt' to find regressions in the current GCC "trunk" dating back a year/so (in this case, using a "simple" build)? I'd welcome any help/suggestions on setting up 'reghunt'. thanks.
Re: reghunt and "trunk" (GCC 4.5.x)?
On 01/06/10 12:54:21, Ian Lance Taylor wrote: > I think you need to make sure that the script removes any existing > config.cache files. Ian, thanks. This turned out to be a cockpit error on my part. The reghunt tools apparently expect the checked out gcc source tree to have the form /gcc; thus the sub-tree containing the GCC compiler is named /gcc/gcc. I had left off the extra level of 'gcc', tried to patch around it in the reghunt tools, but didn't catch all the refs. The net effect is the build script tried to config/make gcc directly rather than config-ing/making from the top-level. After fixing that set up error, the reghunt tools are working just fine, and I was able to find the patch that I was looking for. - Gary
Re: dwarf2 - multiple DW_TAG_variable for global variable
On 01/09/10 12:39:55, Nenad Vukicevic wrote: > This dwarf code started appearing since this patch: Here's the GCC bug report that led to this patch: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39563 It references a GDB fix as well: http://sourceware.org/ml/gdb-patches/2009-03/threads.html#00595 http://sourceware.org/ml/gdb-patches/2009-04/threads.html#00040 http://sourceware.org/ml/gdb-cvs/2009-04/msg00021.html
multiple defs. of TLS common symbols?
We use TLS relocated symbols to create thread-local symbols in the GCC UPC compiler, and have run into an issue illustrated by the following program, on a test case that defines a common symbol in several files and uses it in a single file. The following program fails to link, with multiple defs:

  % head s.c t.c main.c
  ==> s.c <==
  __thread int x;

  ==> t.c <==
  __thread int x;

  ==> main.c <==
  __thread int x;
  int main() { x = 1; }

  % gcc s.c t.c main.c
  /tmp/ccK5Aj3k.o:(.tbss+0x0): multiple definition of `x'
  /tmp/ccm0kY5f.o:(.tbss+0x0): first defined here
  /tmp/ccchPiAt.o:(.tbss+0x0): multiple definition of `x'
  /tmp/ccm0kY5f.o:(.tbss+0x0): first defined here
  collect2: ld returned 1 exit status

But if we don't use TLS storage, it all links just fine:

  % gcc -D__thread= s.c t.c main.c

Off-hand this looks like it might be a linker issue, but perhaps there's an issue with the use of __thread in the context above?
Re: multiple defs. of TLS common symbols?
On 01/13/10 17:15:10, Ian Lance Taylor wrote:
[...]
> Otherwise TLS variables are generated as definitions rather than as
> common variables.
>
> Why do you want them to be common?

For GCC/UPC compiled programs there are two compilation modes:

1) Each UPC thread is implemented as a full process, and these processes might be distributed across a network.

2) Each UPC thread is implemented as an OS thread (i.e., a pthread), and they are created by a single process and execute within its address space.

In the "process model", "int x;" has the usual semantics. It is defined as a common symbol. In the "pthread model", each file scoped variable is "localized" and becomes thread local; this is implemented by defining the variable using TLS relocation. Intermixing previously compiled C code that refers to file scoped variables with GCC/UPC compiled "pthread mode" files will likely not work well. But if the C code is compiled with the GCC/UPC compiler in "pthread mode", all file scoped symbols will be localized and everything should work as expected.

The "process model" is the more natural and preferred way to compile UPC programs. The pthread model can offer some efficiencies and can make it easier to debug the program. Given the above, the goal of compiling in pthreads mode is to be able to compile regular "C" code as is, with the same behavior as when it was compiled in the normal process model. Thus, we want to translate all file scoped variables into localized TLS variables with the fewest surprises and differences.

> Personally I tend to think that that is a good
> thing. Treating uninitialized variables as common variables is a
> non-standard extension even for C90. We can't get rid of them for
> existing code, but __thread code is by definition new.

I agree with your statement above, but for our purposes things will work better if we do create commonized TLS symbols. Maybe we can use GOMP's method for creating commonized TLS variables. Thanks for pointing it out. Do you/others on this list have a reference that supports the statement: "Treating uninitialized variables as common variables is a non-standard extension even for C90."? (I did see a thread on this list, late April 1999, that discussed some of the issues, but nothing definitive.) thanks.
Re: RFC: cgraph/lowering vs. finish_file for GCC/UPC rewrites?
On 09/14/09 11:52:11, Richard Guenther wrote: > > What approach would you recommend for the > > implementation of UPC tree re-writes that will > > support calls to the runtime (that are inlined, > > if applicable)? > > Without reading all the details of your mail I suggest that you > perform a custom walk over the function bodies right before > the frontend calls cgraph_finalize_compilation_unit () that > performs the necessary lowering (and function creation) to > GENERIC. The C++ frontend already does this during its > genericize phase to transform frontend specific trees to > middle-end GENERIC trees. I tried the custom tree walk approach, but decided that it will require some of the infrastructure already present in the gimplify pass (e. g., the creation of temp. variables), and that it is more expedient to utilize the current language dependent gimplify hook, but to move it earlier in the processing of the function body. To that end, I defined a language dependent genericize hook: /* Determine if a tree is a function parameter pack. */ bool (*function_parameter_pack_p) (const_tree); + /* Genericize before finalization (called from finish_function()). + Perform lowering of function bodies from language dependent form + to language independent (GENERIC) form. */ + void (*genericize) (tree); + which is called from finish_function (instead of calling c_genericize): if (!decl_function_context (fndecl)) { invoke_plugin_callbacks (PLUGIN_PRE_GENERICIZE, fndecl); - c_genericize (fndecl); + /* Lower to GENERIC form before finalization. */ + lang_hooks.genericize (fndecl); The UPC genericize hook is implemented as: /* Convert the tree representation of FNDECL from UPC frontend trees to GENERIC. */ void upc_genericize (tree fndecl) { /* Take care of C-specific actions first. Normally, we'd do this after the language-specific actions, but c_genericize is only a dumping pass now, and should be renamed. */ c_genericize (fndecl); /* Perform a full gimplify pass, because the UPC lowering rewrites are implemented using the gimplify framework. */ gimplify_function_tree (fndecl); } Although this may not be the best fit with the current framework, it lets us re-use the gimplify pass that we have been using with previous GCC 4.x implementations. At some point, we'll need to develop a ground-up tree-walk rewrite pass.
How to mark gimple values addressable?
(I'm copying this thread back to the main GCC list, to document the problem that we ran into, RG's suggestion, and the fix that we made.)

While merging our GCC/UPC implementation with the GCC trunk, we ran into a situation where some tests failed on the check shown below in verify_gimple_assign_single(). This failed because our representation of a UPC pointer-to-shared has an internal struct representation but in other respects is a pointer type (and appears to be a register type). Some temps that UPC creates have to be marked as addressable, which causes them to no longer qualify as is_gimple_reg(), but the type still asserts is_gimple_reg_type(). The trees that were being created failed on this test:

  if (!is_gimple_reg (lhs)
      && is_gimple_reg_type (TREE_TYPE (lhs)))
    {
      error ("invalid rhs for gimple memory store");
      debug_generic_stmt (lhs);
      debug_generic_stmt (rhs1);
      return true;
    }

At first, I wondered if the checks above might be overly inclusive.

On 01/11/10 11:03:46, Richard Guenther wrote:
> You need a temporary for register type but non-register copy. Thus
> it needs to be
>
> tmp_2 = A;
> B = tmp_2;
>
> with tmp_2 being an SSA name, not
>
> B = A;

Looking at some of the code in gimplify.c, we determined that calling prepare_gimple_addressable() is all that is needed:

      if (!is_gimple_addressable (src) || is_gimple_non_addressable (src))
        {
          /* We can't address the object - we have to copy to a local (non-shared) temporary.  */
  -       src = get_initialized_tmp_var (src, pre_p, NULL);
  +       prepare_gimple_addressable (&src, pre_p);
          mark_addressable (src);
          is_shared_copy = 0;
          is_src_shared = 0;
        }
    }

To make this work, prepare_gimple_addressable() needed to be changed so that it is exported from gimplify.c:

  -static void
  +void
   prepare_gimple_addressable (tree *expr_p, gimple_seq *seq_p)
   {
     while (handled_component_p (*expr_p))
       expr_p = &TREE_OPERAND (*expr_p, 0);
     if (is_gimple_reg (*expr_p))
       *expr_p = get_initialized_tmp_var (*expr_p, seq_p, NULL);
   }

With this fix in place, we were able to pass the various checks in tree-cfg.c, and to generate the expected code.
Re: multiple defs. of TLS common symbols?
On 01/14/10 08:26:31, Ian Lance Taylor wrote: > Online I found this: > > http://www.faqs.org/docs/artu/c_evolution.html > > [T]he ANSI Draft Standard finally settled on definition-reference > rules in 1988. Common-block public storage is still admitted as > an acceptable variation by the standard. Thanks, I found some discussion in the C99 Rationale document, http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf section 6.2.2, "Linkage of Identifiers" (pp. 32-34). The email thread on this mailing list that I was referring to is here: http://gcc.gnu.org/ml/gcc/2009-04/msg00812.html
GCC and binutils dependencies
We recently ran into this 'as' bug running tests with the GCC (4.5 precursor) "trunk" compiler on an x86_64 target running Ubuntu 8.04: http://sourceware.org/bugzilla/show_bug.cgi?id=10255 (the bug was marked fixed in June 2009). The issue was noted in this GCC PR: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40332 Since GCC 4.5 isn't out yet, I'm wondering what policy, or general rule, is followed with respect to a new version of GCC depending upon a particular version of binutils, or some other important library? And would it make sense in this case to add a GCC test case that exercises this 'as' bug, so that we can detect either that the bug is present in the version of 'as' being used to build and test GCC, or that a regression has occurred? thanks.
GUPC: A GCC frontend for UPC
A GCC front-end (and runtime) for UPC (Unified Parallel C) is available via the following GCC branch: svn://svn/gcc/branches/gupc. The GUPC project is described here: http://gcc.gnu.org/projects/gupc.html. Over the course of this year, we plan to work with the GCC development community with the goal to merge UPC support into the GCC mainline (perhaps in the GCC 4.6 release). We appreciate any/all feedback and suggestions. Thanks, - The GUPC development team
RFC: merging GUPC into the GCC trunk?
Now that GCC 4.5 has been branched from the main line, it seems that this is an appropriate time to consider GUPC for inclusion into the GCC trunk. GUPC was recently checked in as a GCC branch: http://gcc.gnu.org/projects/gupc.html What is the recommended process for having GUPC reviewed (and hopefully, subsequently approved) for being merged into the GCC mainline? Thanks, - Gary
GCC primary/secondary platforms?
On 04/07/10 11:11:05, Diego Novillo wrote: > I would suggest splitting patches across reviewer domains. See > previous merges from big branches for examples. This makes it easier > for maintainers and reviewers to review the relevant parts. > Additionally, make sure that the branch bootstraps and tests on all > primary/secondary platforms with all languages enabled. Diego, thanks for your prompt reply and suggestions. Regarding the primary/secondary platforms: are those the ones listed here? http://gcc.gnu.org/gcc-4.5/criteria.html We have access to only a few of the listed platforms (and in the case of IA64 the underlying OS is SuSE, not "unknown-linux-gnu"). How are the following targets handled? arm-eabi, mipsisa64-elf Are these cross-compilers targeting some sort of instruction set simulator? Is there a "how to" for setting up those platforms and running tests? Given the nature of UPC we're not sure that some of those targets are applicable or will be initially supported; though I can certainly see the value of making sure that we don't break anything in the main line that would impact those platforms. Typically, how is this situation handled - where tests need to be run on hardware/software platforms that we don't have access to, prior to merging into the GCC trunk? thanks, - Gary
Re: GCC primary/secondary platforms?
Although the discussion regarding libstdc++-v3 is likely germane to various developers who are currently testing their changes and managing the ports that they're responsible for, it seems that this thread is venturing rather far from my initial query. I'm still wondering: do GCC developers routinely test their patches on MIPS, ARM, and S390 platforms (for example)? I signed up for the 'cfarm' and don't see an S390 there, and some of the secondary targets look like they might be really SLOW? thanks, - Gary
CSE bug when narrowing constants
(Configuration: x86_64, GCC 4.2.3 base line) I've run into a problem where GCSE decides to kill a conditional jump instruction because it thinks that the result is always false. This happens when GCSE decides to propagate a constant that is "narrowed" [the original mode of the constant is word_mode (DImode) and the use of the constant is in a narrower mode (SImode)]. This situation arises inside the code generated by our GCC/UPC compiler, and so far I haven't been able to come up with a regular C test case that demonstrates the failure. For efficiency reasons, internal to the compiler, we overlay a 16 byte struct on top of a TImode value. The 16 byte struct is the representation of UPC's "pointer-to-shared", which is a potentially cross-node pointer consisting of three parts (vaddr, thread, phase). It looks like this: typedef struct { void *vaddr; unsigned int thread; unsigned int phase; } __attribute__ ((__aligned__(16))) upc_shared_ptr_t; Although not allowed by GCC, you can think of it has having an additional "__attribute__ ((__mode__(__TI__)))" specification. Here is an excerpt from the offending RTL that when passed to GCSE will lead to incorrect deletion of a conditional jump: [...] (insn 19 16 21 2 (set (reg:DI 81) (const_int 4294967296 [0x1])) 81 {*movdi_1_rex64} (nil) (nil)) (insn 21 19 24 2 (set (subreg:DI (reg:TI 70 [ D.2967 ]) 8) (reg:DI 81)) 81 {*movdi_1_rex64} (nil) (nil)) (insn 24 21 25 2 (set (reg:SI 60 [ p$phase ]) (const_int 1 [0x1])) 40 {*movsi_1} (nil) (nil)) (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (const_int 4294967296 [0x1]) (nil))) [...] ;; Start of basic block 5, registers live: (nil) (code_label 53 52 54 5 2 "" [2 uses]) (note 54 53 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 56 54 57 5 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 61 [ p$thread ]) (const_int 0 [0x0]))) 3 {*cmpsi_ccno_1} (nil) (nil)) (jump_insn 57 56 59 5 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 63) (pc))) 531 {*jcc_1} (nil) (expr_list:REG_BR_PROB (const_int 7000 [0x1b58]) (nil))) [...] The conditional jump instruction formed by instructions 56 and 57 above is deleted because GCSE thinks that (reg:SI 61 [ p$thread ]) is non-zero. It comes to this conclusion when it propagates the REG_EQUAL (const_int 4294967296 [0x1]) value listed in instruction 25: (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (const_int 4294967296 [0x1]) (nil))) Note that it takes 33 bits to express 0x1, and it won't fit into an SImode container. What CSE/GCSE should have done here is written that REG_EQUAL note as follows: (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (const_int 0) (nil))) because only the lower 32 bits of the value are relevant. In that case, the conditional jump can be rewritten into an unconditional jump, but certainly not deleted. The code that decides it is OK to use the wider constant, without adjustment to the narrow mode is here: /* If we are looking for a CONST_INT, the mode doesn't really matter, as long as we are narrowing. So if we looked in vain for a mode narrower than word_mode before, look for word_mode now. 
*/ if (p == 0 && code == CONST_INT && GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (word_mode)) { x = copy_rtx (x); PUT_MODE (x, word_mode); p = lookup (x, SAFE_HASH (x, VOIDmode), word_mode); } The logic above is OK as far as it goes, but the subsequent return of the unadjusted wider constant causes problems: for (p = p->first_same_value; p; p = p->next_same_value) if (GET_CODE (p->exp) == code /* Make sure this is a valid entry in the table. */ && exp_equiv_p (p->exp, p->exp, 1, false)) return p->exp; I'd think that somewhere in there gen_lowpart() needs to be called. I'd appreciate your review of the above analysis and any suggestions that you might have on implementing a fix.
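To sketch where I think a fix might go (this is only a sketch, not a tested patch; orig_mode stands for the mode that X had before it was widened to word_mode for the lookup), the returned CONST_INT could be narrowed back to the original mode, so that any REG_EQUAL note derived from it stays representable in that mode:

  for (p = p->first_same_value; p; p = p->next_same_value)
    if (GET_CODE (p->exp) == code
        /* Make sure this is a valid entry in the table.  */
        && exp_equiv_p (p->exp, p->exp, 1, false))
      {
        if (code == CONST_INT && orig_mode != word_mode)
          /* Keep only the bits that fit in the narrower mode.  */
          return gen_lowpart (orig_mode, p->exp);
        return p->exp;
      }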
Re: CSE bug when narrowing constants
On 11/28/08 16:02:11, Gary Funck wrote: > > I'd think that somewhere in there gen_lowpart() needs to > be called. I posted a suggested patch: http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01466.html which fixes the reported problem.
Re: CSE bug when narrowing constants
On 11/29/08 10:37:33, Eric Botcazou wrote: > > The conditional jump instruction formed by instructions > > 56 and 57 above is deleted because GCSE thinks that > > (reg:SI 61 [ p$thread ]) is non-zero. It comes to this > > conclusion when it propagates the > >REG_EQUAL (const_int 4294967296 [0x1]) > > value listed in instruction 25: > > > > (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) > > (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) > > (expr_list:REG_EQUAL (const_int 4294967296 [0x1]) > > (nil))) > > > > Note that it takes 33 bits to express 0x1, and it won't > > fit into an SImode container. > > Then this note is invalid, REG_EQUAL pertains to the destination register: [...] Eric, thanks for the clarification on the role of REG_EQUAL notes. > IOW the culprit is not GCSE but whoever has created this note. Agreed. The routine that creates the errant REG_EQUAL note is lookup_as_function(). I posted a possible patch: http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01466.html (My FSF assignment is on file.) - Gary
Re: CSE bug when narrowing constants
On 11/29/08 14:45:49, Eric Botcazou wrote: > > Agreed. The routine that creates the errant REG_EQUAL note is > > lookup_as_function(). > > Really? Doesn't it only retrieve a pre-existing REG_EQUAL note? It retrieves an equivalent rtx constant, if it exists. Before the patch, the constant that was returned is a word mode (DImode) constant with the value 0x1 (33 bits), which won't fit into an SImode value, and therefore isn't equivalent. The fix is to call gen_lowpart() in the case where the word mode constant is narrowed to a smaller mode. In the example, the lower 32 bits of the constant will be used, which is 0, and is the correct equivalent constant. cse_insn() calls lookup_as_function() ultimately through fold_rtx(), IIRC, and is the routine that writes the REG_EQUAL note.
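To spell out the narrowing arithmetic in plain C (just for illustration):

  long long wide = 0x100000000LL;  /* 2^32: the DImode constant, needs 33 bits */
  int narrow = (int) wide;         /* keeps only the low 32 bits, so the value is 0 */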
Re: CSE bug when narrowing constants
On 12/01/08 11:50:48, Eric Botcazou wrote: > > cse_insn() calls lookup_as_function() ultimately through fold_rtx(), IIRC, > > and is the routine that writes the REG_EQUAL note. > > OK, thanks. But I'm a little at a loss as to why this problem arises only > now: the problematic code in lookup_as_function is one decade old. Do you > happen to have older compilers around (say GCC 4.1.x based) that correctly > compile the testcase? If so, what happens differently with them? Yeah, I was surprised as well. The compiler baseline this problem arose on is 4.2.3, but I think that it will occur in both older and newer baselines. The problem is triggered by code generated by the UPC (Unified Parallel C) support that we've implemented in a project we call GCC/UPC. It fails on a small UPC test case, but a number of factors have to be present to trigger the problem. I tried developing a vanilla C test case to duplicate the problem, but have so far been unsuccessful. Internally, we use VIEW_CONVERT_EXPR to overlay a TImode container on top of a struct. There is no exact C equivalent, though a union comes close. I tried that, but couldn't replicate the exact set of events that have to be present to hit the problem. I'll send what I tried to you separately. Perhaps adding some sort of logging in lookup_as_function() that indicates narrowing is occurring, and then running all test cases (including Ada, because its unchecked_conversion is close to what we're doing internally) would turn something up?
REC: gimplify - create a temp that is set at outermost block?
For UPC code generation, we're building an alternate method of accessing thread-local data that does not depend upon operating system support of the __thread qualifier. The motivation for this change is that we've noticed that __thread has varying levels of support across operating system/hardware platforms, and that when used extensively, we've seen capacity limitations on some target systems. UPC programs, when compiled in "pthreads mode" implicitly define all normal, file scoped or static, variables as being thread-local, which can lead to many TLS variables or to a TLS section that is quite large. The alternate implementation of TLS begins by targeting all TLS variables to a special named section. As an example, the declaration, __thread int x; can be thought of as being re-written into: int x __attribute__ ((section("tls_section"))); The runtime will allocate a per-thread block of memory that is the size of "tls_section", and initialized by the contents of that dummy section. This per-thread TLS base address will be maintained in an OS-dependent fashion as a per-thread value that will be returned by a function, called __get_tls(), which will obtain the per-thread value (possibly calling a function an OS-supplied function, for example, pthread_getspecific()). All references to 'x' will be rewritten by the UPC-specific gimplify pass into: *((&x - __tls_section_start) + __get_tls()) Above, "&x" is the address of 'x' derived in the conventional fashion as its address inside the TLS dummy section, which starts at the address given by "__tls_section_start". The gimplify code that currently implements this calculation looks like this: tls_base = lookup_name (get_identifier (UPC_TLS_BEGIN_NAME_STR)); if (!tls_base) fatal_error ("UPC thread-local section start address not found. " "Cannot find a definition for " UPC_TLS_BEGIN_NAME_STR); tls_base = build1 (ADDR_EXPR, char_ptr_type, tls_base); /* Refer to a shadow variable so that we don't try to re-gimplify * this TLS variable reference. */ var_addr = shadow_var_addr (var_decl); tls_offset = build_binary_op (MINUS_EXPR, convert (ptrdiff_type_node, var_addr), convert (ptrdiff_type_node, tls_base), 0); if (!useless_type_conversion_p (sizetype, TREE_TYPE (tls_offset))) tls_offset = convert (sizetype, tls_offset); tls_var_addr = build2 (POINTER_PLUS_EXPR, char_ptr_type, cfun->upc_thread_ctx_tmp, tls_offset); tls_ref = build_fold_indirect_ref (tls_var_addr); *expr_p = tls_ref; return GS_OK; (If you see any opportunities to improve/correct this code, please feel free to comment.) Above, you'll see a reference to "cfun->upc_thread_ctx_tmp"; this is a temporary variable that holds the value returned from __get_tls(). The idea is to call __get_tls() only once upon entry to the current function being compiled, and to re-use its value where needed. I made a first attempt at implementing this caching of the __get_tls() value, but have so far been unsuccessful. 
Here's the current implementation: if (!cfun->upc_thread_ctx_tmp) { const char *libfunc_name = UPC_GET_TLS_LIBCALL; tree libfunc, lib_call, tmp; libfunc = lookup_name (get_identifier (libfunc_name)); if (!libfunc) internal_error ("runtime function %s not found", libfunc_name); lib_call = build_function_call (libfunc, NULL_TREE); if (!lang_hooks.types_compatible_p (char_ptr_type, TREE_TYPE (lib_call))) lib_call = build1 (NOP_EXPR, char_ptr_type, lib_call); tmp = create_tmp_var_raw (char_ptr_type, "TLS"); TREE_READONLY (tmp) = 1; DECL_INITIAL (tmp) = lib_call; /* Record the TLS base address at the outermost level of * this function. */ DECL_CONTEXT (tmp) = current_function_decl; DECL_SEEN_IN_BIND_EXPR_P (tmp) = 1; declare_vars (tmp, DECL_SAVED_TREE (current_function_decl), false); cfun->upc_thread_ctx_tmp = tmp; } (The code from "TREE_READONLY" to "DECL_SEEN_IN_BIND_EXPR_P" above is cribbed from "gimple_add_tmp_var()" and "gimplify_init_constructor()".) The idea above is to initialize a temporary variable at the outer scope of the current function. Presumably, setting the initial value to the value returned by calling __get_tls(), and then calling "declare_vars()" to declare the temp. variable at the outermost scope of the function will do the job, but this code isn't having the intended effect. My sense is that the DECL_INITIAL() value above is being ignored, that code isn't being generated for it, and that it possibly won't be properly rescanned for gimplification. I'd appreciate any observations that you might have on why the implementation above doesn't work, and how to re-implement this section of code so that it has the desired effect. Perhaps there is code in GCC that currently does something like this that I can refer to.
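For readers following along, here is a rough source-level model of the reference rewrite described above. It is only an illustration: the names __tls_section_start, __get_tls(), and "tls_section" are the ones assumed in the description; the helper x_addr() is hypothetical and is not what the gimplify pass actually emits.

  #include <stddef.h>

  extern char __tls_section_start[];  /* start of the dummy TLS template section */
  extern void *__get_tls (void);      /* returns this thread's TLS block base address */

  int x __attribute__ ((section ("tls_section")));  /* stands in for: __thread int x; */

  static inline int *
  x_addr (void)
  {
    /* Offset of 'x' within the TLS template section...  */
    ptrdiff_t offset = (char *) &x - __tls_section_start;
    /* ...rebased onto the calling thread's TLS block.  */
    return (int *) ((char *) __get_tls () + offset);
  }

  /* A reference to 'x' is then rewritten as *x_addr ().  */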
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/19/09 11:29:57, Andrew Pinski wrote: > On Tue, May 19, 2009 at 11:25 AM, Gary Funck wrote: > > > > For UPC code generation, we're building an alternate > > method of accessing thread-local data that does not depend upon > > operating system support of the __thread qualifier. > > GCC has already added generic support for the __thread qualifier which > does not depend on the OS needing builtin support at all. Andrew, thanks. The only implementation that I'm aware of is described in Ulrich Drepper's 2005 paper, http://people.redhat.com/drepper/tls.pdf Is the __thread feature now more universally/portably supported? My impression is that this feature requires GNU/ELF linker and glibc support. Is that correct? We have been using builtin __thread support for quite a while. It generally has worked well on most modern Linux platforms, but we have encountered a few issues/glitches: * On SuSE 10/altix, we have seen overflows of the thread-local linker section, when compiling programs that declare many large TLS variables. * On CentOS 5/x86, we have seen programs that sometimes fail at 'exec' time, possibly because it can't muster the resources needed to start the program, or allocate the memory map. Those failures have been intermittent with no suspicious entries in the system logs. * On the older SGI/Irix systems, there has been no __thread support at all from what I can recall. Those limitations have motivated our need to provide a more portable implementation of TLS variables. thanks, - Gary
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/19/09 12:10:43, Andrew Pinski wrote: > Gary wrote: > > Is the __thread feature now more universally/portably > > supported? > > Yes, see emutls.c and the VAR_DECL case in expand_expr_addr_expr_1 and > expand_expr_real_1 in expr.c. > [...] for the emulated support which is > implemented on the target side in emutls.c. > > On the tree level __thread looks the same for both the emulated and > native supports. Experimenting with this __thread emulation a bit, I found that the following configure options appear to enable TLS emulation: --enable-threads=posix --disable-tls (where --enable-threads is likely unnecessary on most modern x86/Linux targets) Trying the following simple test program: __thread volatile int x; int main () { x = 1; return x; } The following code was generated: movl $__emutls_v.x, %edi call __emutls_get_address movl $1, (%rax) movl $__emutls_v.x, %edi call __emutls_get_address movl (%rax), %eax addq $8, %rsp ret Above, __emutls_get_address() is called twice, with the same argument. I was surprised to see that the optimizer (GCC 4.3.2) didn't notice this and use CSE to avoid the second redundant call, because __emutls_get_address is defined as a "const" function: DEF_EXT_LIB_BUILTIN (BUILT_IN_EMUTLS_GET_ADDRESS, "__emutls_get_address", BT_FN_PTR_PTR, ATTR_CONST_NOTHROW_NONNULL) Back to the issue at hand, it may turn out that GCC's TLS emulation (thanks for pointing this out) will have acceptable performance. I'm still interested in understanding how to create a gimple temporary that is set once upon entry to a function, so that its value is available within the function's body. thanks, - Gary
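As an aside (my own experiment sketch, not something verified in this thread): one source-level way to avoid the duplicated __emutls_get_address calls is to take the variable's address once and reuse it, which is roughly the effect I'd like the gimplifier to achieve automatically with a per-function temporary:

  __thread volatile int x;

  int
  main (void)
  {
    /* '&x' should lower to a single __emutls_get_address call when TLS is
       emulated; both accesses below then go through the cached pointer.  */
    volatile int *xp = &x;
    *xp = 1;
    return *xp;
  }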
grokdeclarator drops type qualifiers when -aux-info isn't asserted?
Recently, I was debugging an issue in the GCC/UPC front-end that related to some problems compiling specific UPC type declarations. The front-end was, in certain cases, dropping UPC's "shared" qualifier. The relevant code is in grokdeclarator: if (!flag_gen_aux_info && (TYPE_QUALS (element_type))) type = TYPE_MAIN_VARIANT (type); Above, if the -aux-info switch isn't asserted then the type is set to its main variant. The -aux-info switch does the following: `-aux-info FILENAME' Output to the given filename prototyped declarations for all functions declared and/or defined in a translation unit, including those in header files. This option is silently ignored in any language other than C. [...] Given that this switch enables the generation of a report, it is surprising that this switch would cause the front-end to work differently depending upon whether -aux-info is asserted or not. That aside, I wonder if it is an error to drop the qualifiers as shown above? In the case of UPC, for example, dropping qualifiers definitely leads to problems; it may be the case that UPC's logic has to be reworked a bit if, in fact, the TYPE_MAIN_VARIANT() call above is needed. thanks, - Gary
Re: grokdeclarator drops type qualifiers when -aux-info isn't asserted?
On 05/20/09 09:45:11, Joseph S. Myers wrote: > On Tue, 19 May 2009, Gary Funck wrote: > > > That aside, I wonder if it is an error to drop the qualifiers > > as shown above? In the case of UPC, for example, dropping qualifiers > > Please read the code (and comment) immediately above that you quoted, > which saves the qualifiers combined with those specified in the > declaration, and the subsequent code applying them in the process of > building up the type. > [...] See the named address space patches for > examples of adding extra type qualifiers. Thanks. We've generally gotten that part right by adding a few qualifier bits. We can't however encode UPC's "layout qualifier" into the qualifier bits and we have to maintain it separately. I do see now that the layout qualifier on an element type should be handled earlier along with the rest of the qualifiers in the section that you're referencing. > The bug would probably be that it doesn't also drop > them if flag_gen_aux_info. Agreed. Though presumably the flag_gen_aux_info logic will have to be adjusted as well.
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/20/09 10:40:02, Richard Guenther wrote: > Gary wrote: > > Above, __emutls_get_address() is called twice, with > > the same argument. I was surprised to see that the optimizer > > (GCC 4.3.2) didn't notice this and use CSE to avoid the second > > redundant call, because emultls_get_address is defined as > > a "const" function. > > This is likely because the libcall lacks a REG_EQUAL note (or > we lack something to put there). Tree level CSE would catch > it, but it doesn't see these function calls. Understood. Do you/others happen to know who is the maintainer of the TLS emulation? I tried a simple test case that works with the native TLS support, but it SEGV's when using TLS emulation. Perhaps a cockpit error on my part, but I'd like to see if I can use the TLS emulation for our purposes, and a first step is to get the example to work. thanks, - Gary
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/20/09 17:13:23, Ian Lance Taylor wrote: > Gary Funck writes: > > > Do you/others happen to know who is the maintainer of the > > TLS emulation? > > [...] If you have found a bug, the fastest > way to address is probably to file a bug report. Doing a bit of research, it seems that the bug has already been reported recently (against GCC 4.3, which is the baseline we're using), http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40024 (The suggested fix hasn't been committed to the head svn branch, however.) thanks, - Gary
RFC: [GUPC] UPC-related changes
FYI, over the course of the next week/so, I will post UPC-related changes to the gcc-patches mailing list, for review. The goal is to make the necessary fixes/changes, based upon review feedback, that need to be made prior to merging the GUPC branch into the GCC trunk. Email describing the changes will be grouped according to a general area (front-end, make/configure, debugging info., middle-end, etc.). The first email, describing front-end changes is here: http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00628.html - Gary
RFE: 'enable checking' as a GCC compilation switch?
Recently, I ran into a couple of bugs/regressions that show up only if checking is enabled. This led me to the observation that it might be useful if checking could be enabled at runtime via a gcc command line switch. If this capability can be enabled by default, then regression tests could depend upon the checking capability, or users could be asked to run with full checks enabled when reporting bugs, etc. There will be some overhead to test for the switch, though the code that does the checking might remain under an #ifdef as it does now, to ensure that it absolutely isn't compiled unless the appropriate configuration option is enabled. That said, I would argue that if we go to the trouble to implement the capability, then support for checking switches should be enabled by default. If the code is never conditionalized, then --enable-checking=xxx might be re-defined to assert the various checking flags by default. Here are some quick stats on the use/frequency of various checking options in GCC: ENABLE_CHECKING 251, ENABLE_RTL_CHECKING 21, ENABLE_IRA_CHECKING 11, ENABLE_FOLD_CHECKING 10, ENABLE_GC_CHECKING 9, ENABLE_DF_CHECKING 7, ENABLE_MALLOC_CHECKING 7, ENABLE_TYPES_CHECKING 5, ENABLE_TREE_CHECKING 4, ENABLE_ASSERT_CHECKING 3, ENABLE_GIMPLE_CHECKING 3, ENABLE_RTL_FLAG_CHECKING 1, ENABLE_SCOPE_CHECKING 1, ENABLE_VALGRIND_CHECKING 1 --- Total 334. Certainly, plenty of them to deal with, but perhaps with a bit of scripting the bulk of the changes can be automated.
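To make the idea concrete, here is a minimal sketch of the kind of change contemplated, assuming a hypothetical command-line flag named flag_checking; the flag name and verify_stmts() as the guarded check are illustrative only:

  /* Today, inside some pass: */
  #ifdef ENABLE_CHECKING
    verify_stmts ();
  #endif

  /* With a runtime switch, the guard would become: */
    if (flag_checking)
      verify_stmts ();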
GCC and out-of-range constant array indexes?
Consider the following: $ cat -n t.c 1 2 int A[10] = { 0 }; 3 4 int main() 5 { 6 A[10] = 10; 7 A[-1] = -1; 8 return 0; 9 } In a compiler test case that I reviewed recently, there was the expectation that the compiler would issue a compile-time warning on the statements at lines 6 and 7 above. I tried this with GCC version "gcc (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10)" recently and was unable to find compilation switches that would cause it to complain about the use of out-of-range indexes above. Is there a technical reason that the compiler should not issue a warning, or might this feature become a legitimate RFE? thanks, - Gary
Re: GCC and out-of-range constant array indexes?
On 10/07/10 21:24:18, Ian Lance Taylor wrote: > -Warray-bounds, but that is one of the warnings which is unfortunately > only available when optimizing. In this case it requires -O2. Ian, thanks. I had thought optimization might be involved, but didn't try -O2. > There was an attempt a couple of years ago to implement this warning > when not optimizing [...]. Would it be possible to compute enough of the control flow graph to process warnings like this one, without running the actual optimizations, unless those optimizations are requested? Would the cost be too high? - Gary
Re: RFE: 'enable checking' as a GCC compilation switch?
On 10/03/10 12:03:44, Ian Lance Taylor wrote: > You will need to try a sample implementation and see how much the > compiler slows down and how much bigger it gets. I began roughing out the required changes. This will be a background project. If I can finish it to the point of running some timing tests, I will post the results here. thanks, - Gary
Re: GCC and out-of-range constant array indexes?
On 10/08/10 18:38:29, Basile Starynkevitch wrote: > I am not an expert on these optimizations, but why would you want that? I routinely compile/build with "-O0 -g3" because the code is easier to debug. I also admit that I compile/build with "-O0" because it is faster than "-O2" or "-O3" for example, and during development I am more interested in faster turn-around time on builds than faster execution time. Also, when I compile/build projects, I try to use the maximum level of warnings and checking that the source code base will support. I am willing to trade off some support/build time in favor of more thorough warnings. - Gary
Re: GCC and out-of-range constant array indexes?
How about the following: 1) Default warnings are cheap, and work fine at -O0. 2) Expensive warnings (-Wall, -Warray-bounds, -Wuninitialized, -Wunused) [not sure about the actual list] that require optimizations will themselves trigger a warning when they are requested but the optimization level required for them to work in their maximal fashion has not been asserted. Or: specification of the expensive warnings will cause the control flow computations required to support those warning levels to be performed (as suggested previously).
Re: GCC and out-of-range constant array indexes?
On 10/08/10 13:22:46, Ian Lance Taylor wrote: > I think both of those alternatives would be surprising and easily > misunderstood behaviour for many compiler users. [...] I find the following behavior to be surprising: $ gcc -Warray-bounds -O0 -c t.c $ gcc -Warray-bounds -O1 -c t.c $ gcc -Warray-bounds -O2 -c t.c t.c: In function ‘main’: t.c:6: warning: array subscript is above array bounds t.c:7: warning: array subscript is below array bounds The impact is that I may think that after I build my project at -O0 or -O1, with various warnings enabled, that there are potential surprises that await, when I perform a production build at -O2 and higher. It makes perfect sense to me that the following happens: $ gcc -Warray-bounds -O1 -c t.c t.c: Warning: -Warray-bounds has no effect unless compiled with optimization level -O2 and higher. > Almost all current warnings already meet those requirements; the main > problem child is -Wuninitialized. ... and -Warray-bounds?
Re: GCC and out-of-range constant array indexes?
On 10/08/10 18:38:29, Basile Starynkevitch wrote: > I am not an expert on these optimizations, but why would you want that? > The optimizations involved are indeed expensive (otherwise it would be > -O1 not -O2), but once you asked for them, why only get warnings > without the code generation improvement? Because the optimizations also make the generated code more difficult to debug, and can introduce new (buggy optimization) bugs. I prefer to get the code working with -O0 and then verify that it still works after optimization, because I think that minimizes my development risk and maximizes my productivity. Along those lines, I would still like to have all the compile-time warnings that I can get, and am willing to have my non-optimized builds go a little slower (say, no more than 20% slower) to have the additional warnings. > However, I see a logic in needing -O2 to get some warnings. > Optimizations are expensive, and they compute static properties of the > source code, which are usable (& necessary and used) for additional > warnings. After hearing the pros/cons, I have come around to the point of view that GCC's method of detecting things like uninitialized local variables is part of its optimization architecture. If I accept that my development cycle is: ("first -O0, then full optimization"), then I will have to accept that some warnings might show up when optimizations are turned on. Either that, or I might routinely run a tool like PC-LINT, or Coverity during development, and this may minimize the surprise warnings that pop up when optimizations are enabled. Or as you suggested, always run two parallel builds: one optimized, and one not. I appreciate every one's ideas and suggestions. This has been an interesting discussion thread. - Gary
codegen differences for increment of a volatile int
I've been looking at how GCC 4.0 handles "volatile" internally, and may have a question/two on that later, but in the meantime, I noticed some interesting differences in generated code that I thought were a bit unusual, and was wondering if someone here might explain why GCC behaves as it does, and what might be the recommended behavior? Beginning with this simple example, 1 int j; 2 volatile int jv; 3 void p() 4 { 5 ++j; 6 ++jv; 7 } when compiled with "gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)" the following code results: incl j movl jv, %eax incl %eax movl %eax, jv Note that in the case where 'j' is _not_ volatile, a single 'incl' was generated, but in the case where 'jv' is volatile, the value was first loaded into a register, then incremented and stored back into memory. (asserting -O2 didn't substantially change the generated code) Compiling under "gcc (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)", the compiler always uses the form where the value is first loaded from memory into a register: movl j, %eax incl %eax movl %eax, j movl jv, %eax incl %eax movl %eax, jv However, if -O2 is asserted, then the behavior reverts back to the same behavior as demonstrated in gcc 3.4: incl j movl jv, %eax incl %eax movl %eax, jv [both systems are i386-redhat-linux (FC3 and FC4)] Is there a technical reason that the use of "volatile" would dictate the second form of increment that first loads the value from memory into a register? I would think that a systems programmer might expect the opposite behavior, where "volatile" would imply the single instruction form of increment (which is non-interruptible on single processor systems).
RE: codegen differences for increment of a volatile int
> From: Bernd Jendrissek > Sent: Friday, May 05, 2006 12:50 AM [...] > Systems programmers should know better than to expect a particular > implementation of volatile. :) > > How, for example, would you suggest GCC generate code for this? > > volatile int qwerty; > > void p() > { > printf("qwerty = %d\n", ++qwerty); > } > > You could get a (uniprocessor non-interruptible) single-instruction > incl qwerty > but then you'd have to read the value again to be able to print it: > movl %eax, qwerty > at which point you've lost your "one evaluation is one read cycle" > semantics which some people might find even more important than > (uniprocessor!) atomicity. > > Don't forget that if you really wanted SMP-safe modification of > volatiles you'd have to use the "lock" prefix too. All good points, and I agree. I just mentioned this idea because GCC is choosing the single instruction memory-to-memory form in some situations, and I was surprised that it chose this form in the non-volatile case, because it made more sense to me to prefer it in the volatile case - if it were to prefer it at all in one situation over another. The current GCC main branch compiler offers a new rendition of the generated code at -O2: movl jv, %eax addl $1, j addl $1, %eax movl %eax, jv where, when incrementing the non-volatile 'j', it chooses 'addl' over 'incl'.
create_tmp_var_raw (gimplify.c) inadvertently asserts 'volatile' on temps
While following GCC's handling of 'volatile' and other type qualifiers, I noticed that the gimplify pass created temporaries with a type with 'volatile' asserted if the underlying type also had 'volatile' asserted. Temporaries are created by the create_tmp_var_raw() procedure in gimplify.c, which reads as follows: tree create_tmp_var_raw (tree type, const char *prefix) { tree tmp_var; tree new_type; /* Make the type of the variable writable. */ new_type = build_type_variant (type, 0, 0); TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type); tmp_var = build_decl (VAR_DECL, prefix ? create_tmp_var_name (prefix) : NULL, type); [...] Note above that an unqualified type, new_type, is created but then subsequently not used in the call to build_decl. Because of this omission, if 'type' originally had any qualifiers set (such as volatile), they'll be propagated to the temporary, which might have some unexpected effects on subsequent optimizations and code generation. The fix, I think, is to pass 'new_type': Index: gimplify.c === --- gimplify.c (revision 113552) +++ gimplify.c (working copy) @@ -449,7 +449,7 @@ TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type); tmp_var = build_decl (VAR_DECL, prefix ? create_tmp_var_name (prefix) : NULL, - type); + new_type); /* The variable was declared by the compiler. */ DECL_ARTIFICIAL (tmp_var) = 1; (If this analysis is correct and it is recommended that I file a bug report on this, or post a patch, please let me know.)
'volatile' is propagated into constants and expression nodes (in some cases)?
Given, 1 volatile int jv; 2 3 int main () 4 { 5++jv; 6 } GCC (development branch, 4.0 and up) creates a tree node for the expression ++jv that has 'volatile' asserted in the type associated with the expression: unit size align 32 symtab 0 alias set -1 precision 32 min max > side-effects arg 0 side-effects volatile used public static common SI defer-output file a.c line 1 size unit size align 32> arg 1 constant invariant 1>> Further, 'volatile' is asserted in the type associated with the integral constant 1, above: (gdb) pt constant invariant 1> (gdb) p 0x402f2e04 $19 = 1076833796 (gdb) pt constant invariant 32> unit size constant invariant 4> align 32 symtab 0 alias set -1 precision 32 min max > We could argue whether this causes any real harm, because the ISO C spec. says the following: === 6.7.3: The properties associated with qualified types are meaningful only for expressions that are lvalues. 6.5.16: The type of an assignment expression is the type of the left operand unless the left operand has qualified type, in which case it is the unqualified version of the type of the left operand. And hopefully subsequent passes in the compiler won't be confused by seeing qualifiers asserted in expression nodes and in constants. IMO it would be better if the original tree constructed from the parsed program more closely followed the original source code, and where possible, removed extraneous qualifiers, unless they absolutely needed to convey correct semantics. Above, the qualifiers on expression nodes and constants seem to come about by a call to convert() from build_unary_op()which works its way through to this statement in fold_convert(): if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig) || lang_hooks.types_compatible_p (TYPE_MAIN_VARIANT (type), TYPE_MAIN_VARIANT (orig))) return fold_build1 (NOP_EXPR, type, arg); because the main variant types of the qualified "volatile int" and unqualified "int" are the same, convert() ends up recasting 'arg' into a qualified (volatile int) type. I don't know if there are other cases besides pre-/post- increment that have this problem. I think it is also possible that the code in the development head branch does a better job of generating expression nodes that have their qualifiers stripped than 4.0 did for example. Perhaps one way to gain some confidence that all possibilities have been covered is to add assertions in build_binary_op and build_unary_op (or build1 and build2 for that matter, for expression class nodes) that checks that TYPE_QUALS(t) == TYPE_UNQUALIFIED on expression nodes and constant nodes (though perhaps TYPE_CONST is meaninful for certain named constants?).
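A hedged sketch of the sanity check suggested at the end of the message above (illustrative only; t stands for the expression or constant node just built, and the exact placement in build1/build2 or build_unary_op/build_binary_op would need to be worked out):

  /* Expression and constant nodes should carry an unqualified type.  */
  if (EXPR_P (t) || CONSTANT_CLASS_P (t))
    gcc_assert (TYPE_QUALS (TREE_TYPE (t)) == TYPE_UNQUALIFIED);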
problem implementing language-specific gimplify of TRUTH_ANDIF expression
Working with GCC 4.0.1, we're implementing an experimental dialect of C, called UPC, which offers language extensions for parallel computing in a distributed shared memory setting (see: http://intrepid.com/upc). Generally, the work has proceeded well, and the language-specific callout in gimplify_expr() have been sufficient to implement UPC features by rewriting language extensions into C-like tree structures that can be further gimplified. However, we've run into a glitch, and I'm not quite certain where the fix should go, or how the fix should be implemented. UPC has a shared pointer that can address data in another process (called a thread in UPC terminology). A shared pointer has the following fields: struct shared_ptr_struct { unsigned long int phase : 48; unsigned int thread : 16; void *offset; }; typedef struct shared_ptr_struct shared_ptr_t; Two shared pointers are equal if all fields are equal: int cmp_ptr_eq (shared_ptr_t p1, shared_ptr_t p2) { return p1.offset == p2.offset && p1.phase == p2.phase && p1.thread == p2.thread; } The UPC-specific gimplify routine which implements shared pointer comparisons rewrites an expression like (p1 == p2) into the sort of code shown above. Here's the actual UPC-specific gimplify code: *expr_p = build_binary_op (TRUTH_ANDIF_EXPR, off_cmp, build_binary_op (TRUTH_ANDIF_EXPR, phase_cmp, offset_cmp, 0), 0); where off_cmp, thread_cmp and phase_cmp are expressions which evaluate the equality comparison for the offset, thread, and phase fields. For example, off0 = build3 (COMPONENT_REF, o_t, op0, upc_vaddr_field_node, NULL_TREE); off1 = build3 (COMPONENT_REF, o_t, op1, upc_vaddr_field_node, NULL_TREE); off_cmp = build_binary_op (code, off0, off1, 0); All this works pretty well, but ICE's on the following small UPC test program: shared int *p; int main(int argc, char **argv) { int errors = 0; if (p == NULL) { /* no action */ } else { errors = 1; } } % upc t.upc t.upc: In function 'main': t.upc:9: internal compiler error: in invert_truthvalue, at fold-const.c:3026 Please submit a full bug report, with preprocessed source if appropriate. See http://www.intrepid.com/upc/bugs.html> for instructions. It fails here: #1 0x005c2c39 in invert_truthvalue (arg=0x2e0bac30) at /upc/gcc-upc-4/src/gcc/fold-const.c:3026 3026 gcc_assert (TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE); The type of the arg is integer_type, not boolean: (gdb) p arg->common.type $1 = 0x2decca90 (gdb) pt constant invariant 32> unit size constant invariant 4> align 32 symtab 0 alias set -1 precision 32 min max pointer_to_this > It is an integer type because the initial build_binary_op(TRUTH_ANDIF_EXPR ... uses the type of the result of the comparisons, which is integer_type. The TRUTH_ANDIF expr is gimplified in gimplify_boolean_expr: 3079gimplify_boolean_expr (tree *expr_p) 3080{ 3081 /* Preserve the original type of the expression. */ 3082 tree type = TREE_TYPE (*expr_p); 3083 3084 *expr_p = build (COND_EXPR, type, *expr_p, 3085 convert (type, boolean_true_node), 3086 convert (type, boolean_false_node)); 3087 3088 return GS_OK; 3089} basically a boolean expression b is converted into a true or false value by rewriting it as: (b) ? true : false However, as the comment states "Preserve the original type of the expression.", the original type of the expression, 'b', is kept. In this case, the type is integer_type not boolean type. 
Thus the original (EQ_EXPR p1 p2) is rewritten into (COND_EXPR integer_type (TRUTH_ANDIF_EXPR (EQ_EXPR p1.offset p2.offset) (TRUTH_ANDIF_EXPR (EQ_EXPR p1.phase p2.phase) (EQ_EXPR p1.thread p2.thread))) (boolean_type true) (boolean_type false)) If we call the condition expression above 'cond', then the test program has the following structure: (COND_EXPR (cond) (void) (MODIFY_EXPR (VAR_DECL errors) (constant 1))) Invert_truthvalue wants to rewrite the construct above into: (COND_EXPR (TRUTH_NOT_EXPR (cond)) (MODIFY_EXPR (VAR_DECL errors) (constant 1)) (void)) This runs into trouble when invert_truthvalue attempts to negate the condition 'cond', insisting that cond be a boolean expression. Under normal conditions this isn't a problem, because the normal flow of control of parsing if statements and then gimplifying them would have forced 'cond' to be of boolean type. The problem arises when UPC rewrites the EQ_EXPR into a TRUTH_ANDIF expr. The condition expression missed a chance to be converted to a boolean type in gimplify_boolean_expr(), because that function preserves the incoming (integer) type, and it misses an opportunity again when th
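One possible repair, sketched from the above (only a sketch, not a committed change; it reuses the off_cmp, thread_cmp, and phase_cmp names from the earlier excerpt): build the rewritten comparison in boolean type and then convert back to the type of the original EQ_EXPR, so that passes such as invert_truthvalue() see a BOOLEAN_TYPE condition:

  tree cmp;
  cmp = build_binary_op (TRUTH_ANDIF_EXPR, off_cmp,
                         build_binary_op (TRUTH_ANDIF_EXPR,
                                          thread_cmp, phase_cmp, 0), 0);
  /* Force the result to boolean, then back to the result type of the
     original comparison.  */
  cmp = convert (boolean_type_node, cmp);
  *expr_p = convert (TREE_TYPE (*expr_p), cmp);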
externs and thread local storage
Consider the following program made up of two separate files: ==> file1.c <== extern int x; int main() { x = 5; } ==> file2.c <== int __thread x = 10; This will compile, link, and run on the IA64, but will fail at link time on AMD64: % gcc file2.c file1.c /usr/bin/ld: x: TLS definition in /tmp/ccmdUAs3.o section .tdata mismatches non-TLS reference in /tmp/ccuSmPAa.o /tmp/ccuSmPAa.o: could not read symbols: Bad value collect2: ld returned 1 exit status However if the initial extern were changed to: extern __thread int x; it will also compile, link, and run on the AMD64. To further complicate matters, if the program is rewritten into a single file as follows: int __thread x; int main() { extern int x; x = 5; } it will fail at compile-time with gcc 4.1: fx.c: In function 'main': fx.c:4: error: non-thread-local declaration of 'x' follows thread-local declaration fx.c:1: error: previous declaration of 'x' was here independent of the fact that this program likely would work fine on the IA64 and perhaps some other architectures. It seems that GCC is enforcing a policy that the __thread attribute has to be added to extern declarations if the underlying variable is declared with the __thread attribute. If we viewed the __thread attribute as something like assigning a variable to a particular linkage section (which is what it does), then shouldn't that assignment be transparent to programs referencing the variable via an extern? What are the technical reasons for the front-end enforcing this restriction, when apparently some linkers will handle the TLS linkage fine? If in fact it is required that __thread be added to the extern, is the compiler simply accommodating a limitation/bug in the linker?
RE: externs and thread local storage
Mike Stump wrote: > > This sounds like a bug that should be fixed. You should only need > __thread on the extern if there was not a previous declaration for it. > The compiler seems pretty determined to enforce this restriction. Same result with 'const' instead of __thread: int const x; int main() { extern int x; x = 5; } t.c: In function 'main': t.c:4: error: conflicting type qualifiers for 'x' t.c:1: error: previous declaration of 'x' was here
RE: externs and thread local storage
Andrew Pinski wrote: > I would have hoped people actually read: > http://gcc.gnu.org/onlinedocs/gcc/C99-Thread_002dLocal-Edits.html > > Which actually describes the edits to the C99 standard to how > __thread is supposed to behave. Thanks for the reference. Per that proposal, __thread is a storage-class specifier, which makes sense. I may have confused the issue by offering up an example using 'const' -- the point of the example was really just to show that the implementor of the __thread check wasn't lazy, but was following suit on the qualifier check. Given that __thread is a proposed extension, there isn't much precedent to lean on, because generally 'extern' can only refer to block scope identifiers and those objects are inherently global, and from the point of view of the "C" program referring to the objects, the actual method used to link, load, and access those objects is implementation defined. (Btw, personally, I'd prefer that a propoasl to extend the "C" language use something other than a keyword beginning with __ as a way of doing that. For example, a compound keyword such as "thread local" would read better and is unlikely to clobber many existing programs. If the idea of a compound keyword is too offensive, then thread_local seems a lot better than __thread to me.) Applying the proposed standard to the following: Dave Korn wrote: > Reasons like this are why we have 6.2.7.2 in the C language spec, aren't > they? > > "All declarations that refer to the same object or function shall have > compatible type; otherwise, the behavior is undefined." The answer is probably, no. Because the presence or absence of storage specifiers shouldn't affect type compatibilty. In reply to my question: > > What are the technical reasons for the front-end enforcing this > restriction, > > when apparently some linkers will handle the TLS linkage fine? > If in fact > > it is required that __thread be added to the extern, is the > compiler simply > > accommodating a limitation/bug in the linker? Seongbae Park wrote: > Because the compiler has to generate different code > for accesses to __thread vs non __thread variable In my view, this is implementation-defined, and generally can vary depending upon the underlying linker and OS technology. Further, there is at least one known platform (IA64) which seems to not impose this restriction. A few implementation techniques come to mind, where the programmer would not need to explicitly tag 'extern's with __thread: 1. The linker can fix up external references to __thread variables by inserting jumps to a "thunk" that executes the appropriate instuctions and then jumps back to the point following the original instruction. The IA64 linker might already be doing that. 2. The entire program might be compiled in some sort of PIC mode, where all external references go through some sort of indirect table, or some small subroutine is called to load the proper address, and there is no distinction between regular extern references and extern __thread references. 3. The linker coallesces __thread objects into a special linkage segment, and the OS allocates a new instance of this segment when it instantiates a thread. From the thread's point of view, access to this per-thread segment is just a regular memory reference. Thread creation may be slower, but access to TLS data is faster. 
Thus, I don't think gcc should be checking for the presence of the __thread specifier applied to an extern when the referenced object is also declared as having __thread persistence, and both declarations happen to be visible in a given compilation unit. If this check must remain, I think it should be downgraded to a warning (with a flag to turn off the warning), and the check should be target-tuple specific, with possible further target-dependent checks (such as special PIC modes, etc.). This part of the proposed spec.: "The declaration of an identifier for a variable that has block scope that specifies __thread shall also specify either extern or static." seems to indicate that the following declaration (at block scope) is erroneous: int __thread x; because it has neither "static" nor "extern" preceding it. Interestingly, when declared at an inner scope, the declaration above appears to be allowed, because __thread is a storage specifier. Perhaps the "C" standard says somewhere that a bare block scope declaration implies "extern", but the language in the spec. seems to call out precisely the presence of either "static" or "extern" ahead of __thread in block scope declarations. Thus, it seems that if gcc is going to move towards the proposed standard, it should also deprecate block scope declarations that aren't preceded with either "extern" or "static"? (and perhaps it should do this in preference to matching up bare declarations with extern declarations). If extern is required for block scoped objects then it seems to imply th
RE: externs and thread local storage
Seongbae Park wrote: > That's the only platform I know of that doesn't require different > sequence. > Should we make the language rules such that > it's easy to implement on one platform but not on the others, > or should we make it such that it's easy to implement in almost > all platforms ? The fact that one current generally available platform doesn't require the __thread attribute on the extern should be enough to at least question whether an *error* should be diagnosed. Also, consider that compiler can't check for consistency across separately linked files and the linker already will give an error if references to __thread local objects don't have thread lcoal relocations. > > Also, what is the benefit of allowing mismatch between > declaration and definition of __thread vs non __thread ? The extern does not mismatch, it simply doesn't provide the __thread attribution. The compiler can determine this and quietly upgrade the extern, if it chose to. In my view, all of this should be unnecessary, and should really be a linker and OS implementation issue, but it seems like it may be difficult getting a conensus on that. > It only makes reading the code more difficult > because it confuses everybody - you need to look at the definition > as well as the declaration to know whether you're dealing > with a thread local variable or not which is BAD. The example I gave had the global declaration and extern in the same source file, and there it looks pretty silly. Typically, however, one will have a .h file where the extern lives and a single source file where the __thread local variable is declared. And often, typically, the .h file might be handed off to another group as the API to be used when accessing the separately compiled implementation. In that scenario, the users of the .h file won't have the opportunity to check whether it agrees with the data object declaration anyway. So, the inconsistency will only be detected when the data object is declared, _if_ the programmer also #includes the header file with the extern declaration into the same file that declares the object. More to the point, I think it is rather too bad that the extern has to have the __thread attribute at all, and would have hoped that the linker and OS could have collaborted to make this transparent, in the same way that data can arranged in separately linked sections without impacting the way the external references are written. Thus, implementation is separated from interface. > ...proposed scheme snipped... Those weren't just proposals. Some systems already implement mechanisms like those mentioned (not proposed), and the IA64 is apparently one of those systems. > > The question to me is not whether it's doable, but whether it's > worth doing > - I see only downside and no upside of allowing mismatch. Given that on some systems, there is no need to have __thread on the extern at all, why should the compiler mandate it? If it does mandate consistency, then it should at least do so on a per platform, per compilation option basis. After all __thread isn't supported on all platforms or under certain compilation regimes -- thus the check for __thread support is made conditional upon the characteristics of the compilation target. I think the requirement to apply _thread to an extern should also be target specific. > If you're convinced that this is a really useful thing for a > particular platform, > why don't you create a new language extension flag that allows this, > and make it default on that platform ? 
Because it is the current implementation of __thread that is in opposition to the generally accepted practice of separating interface from implementation, and because some implementations (both present and future) do not require that the external reference be attributed with __thread.
RE: externs and thread local storage
Pinski wrote: > What about the following two sources: > char t; > --- > extern int t; > What should happen? According to the C standard this is invalid code but > the compiler does not need to diagnose the problem. Yup. Certainly a great way to re-use space across separately compiled "C" source files (ala Fortran's blank common). I can see where the compiler is within its rights to issue a warning above, or even a pedantic error.
RE: externs and thread local storage
Seongbae Park wrote: > As I said, you're welcome to implement a new option > (either a runtime option or a compile time configuration option) > that will allow mixing TLS vs non-TLS. In a way, we've already done that -- in an experimental dialect of "C" called UPC. When compiled for pthreads, all file scope data not declared in a system header file is made thread local, and in fact all data referenced through externs is also made thread local. There is a new syntax (a "shared" qualifier) used by the programmer to identify objects shared across all threads. Sounds a little scary, but works amazingly well. Because the tagging of data as thread local is done by the compiler transparently, I tend to think that we probably stress the TLS feature more than most. > Whether or not it should be enabled for a particular platform > should be a matter of discussion, and whether or not that patch will be > accepted in the mainline will be yet another. For myself, I've worked around the problem, and don't see any consensus forming, so I see little need to come up with a patch for something that has no support. As far as the process goes, I think it is better to discuss the issues here to develop a consensus (if any) before developing a patch.
RE: externs and thread local storage
Seongbae Park wrote: > In UPC, anything that's not TLS (or in UPC term, "private") > is marked explicitly as "shared". So it's NOT trasparent in > any sense of the word. > See, you have two choices - either > 1) make every global variable TLS by default and mark only > non-TLS (UPC) or > 2) vice versa (C99). > > It is not sane to allow TLS/non-TLS attribute changing underneath you > - which is what you proposed. Operations on UPC's shared objects have different semantics than regular "C". Array indexing and pointer operations span threads. Thus A[i] is on one thread but A[i+1] will (by default) take you to the next (cyclic) thread. Since the semantics are different, the programmer needs to know that -- it affects the API. TLS objects behave like regular "C" objects, at least from the perspective of the referencing thread. Note that this discussion started only on the question as to whether the compiler should issue an error if it sees a bare extern referencing a __thread object. My position is that it should be a target dependent error, and perhaps only a warning (because on some platforms the resulting program will link and execute as expected), and that there are many commonly occurring cases where the compiler can't catch the inconsistencies in declarations and these are left for the linker anyway. Also note that the proposed specification seems to side-step the issue by only allowing __thread after extern and static at block scope, and would not permit the situation used in the example that I presented. Further, it isn't clear the current compiler is in sync. with the proposed specification, and that is probably a higher priority issue. (Maybe my quick reading of the spec. was wrong, and someone can correct my misunderstanding.)
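To illustrate the shared-array semantics referred to above, here is a small UPC sketch of the default (cyclic) layout; the affinity comment reflects my reading of the default blocking factor of 1 and is offered for illustration only:

  shared int A[10];   /* default block size 1: elements cycle over threads */
  /* With THREADS = 4:
       A[0] -> thread 0, A[1] -> thread 1, A[2] -> thread 2, A[3] -> thread 3,
       A[4] -> thread 0, ...  i.e., A[i] has affinity to thread (i % THREADS).  */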
x86_64 - 128 bit structs not targeted to TImode: MAX_FIXED_MODE_SIZE too small?
Given,

    struct shared_ptr_struct
      {
        unsigned int   phase  : 24;
        unsigned short thread : 16;
        void *addr;
      };

On the x86_64 (i.e., Opteron[tm]) platform, GCC appears to designate the underlying mode of this type as BLKmode, instead of TImode. This has implications for the quality of the code that is generated to copy and manipulate 128-bit structures (as defined in the example above).

The decision to commit this type to BLKmode originates in this logic in mode_for_size():

      if (limit && size > MAX_FIXED_MODE_SIZE)
        return BLKmode;

On the x86 platform, there appears to be no target definition for MAX_FIXED_MODE_SIZE. Thus, the default in stor-layout.c applies:

    #ifndef MAX_FIXED_MODE_SIZE
    #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (DImode)
    #endif

Other 64-bit targets define MAX_FIXED_MODE_SIZE along these lines (some line wrapping may occur below):

    config/i960/i960.h:#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TImode)
    config/ia64/ia64.h:#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TImode)
    config/mips/mips.h:#define MAX_FIXED_MODE_SIZE LONG_DOUBLE_TYPE_SIZE
    config/sh/sh.h:#define MAX_FIXED_MODE_SIZE (TARGET_SH5 ? 128 : 64)

On MIPS, LONG_DOUBLE_TYPE_SIZE is defined as follows:

    /* A C expression for the size in bits of the type `long double' on
       the target machine.  If you don't define this, the default is two
       words.  */
    #define LONG_DOUBLE_TYPE_SIZE \
      (mips_abi == ABI_N32 || mips_abi == ABI_64 ? 128 : 64)

In the 'dev' tree, the s390 defines MAX_FIXED_MODE_SIZE as follows:

    config/s390/s390.h:#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode)

(Arguably, the s390 variant might be a better default value to be defined in stor-layout.c.)

I haven't tried making the suggested change to see if the x86_64 code generator can fully support it. Are there any technical reasons that the x86_64 shouldn't target 128-bit structs to TImode (i.e., two 64-bit registers)?
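Presumably the x86_64 definition would look something like the following, by analogy with the s390 definition (untested sketch only):

    /* in config/i386/i386.h -- untested */
    #define MAX_FIXED_MODE_SIZE \
      GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode)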
IA64 record alignment rules, and modes?
On the IA64, the following record,

    typedef struct sptr_struct
      {
        long unsigned int  phase:  48;
        short unsigned int thread: 16;
        void *addr;
      } sptr_t;

is assigned BLKmode rather than TImode, and I was wondering whether this is a requirement of the IA64 ABI, or a coincidental result of various target configuration definitions?

The final determination of the mode assigned to this struct is made in compute_record_mode(). The logic first tentatively assigns TImode (128 bits), as expected, in the second branch of this if statement (GCC version 3.3.2):

      /* If we only have one real field; use its mode.  This only applies to
         RECORD_TYPE.  This does not apply to unions.  */
      if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode)
        TYPE_MODE (type) = mode;
      else
        TYPE_MODE (type) = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1);

and then reverses that decision in the subsequent if statement:

      /* If structure's known alignment is less than what the scalar
         mode would need, and it matters, then stick with BLKmode.  */
      if (TYPE_MODE (type) != BLKmode
          && STRICT_ALIGNMENT
          && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
                || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (TYPE_MODE (type))))
        {
          /* If this is the only reason this type is BLKmode, then
             don't force containing types to be BLKmode.  */
          TYPE_NO_FORCE_BLK (type) = 1;
          TYPE_MODE (type) = BLKmode;
        }

primarily because STRICT_ALIGNMENT is asserted, and BIGGEST_ALIGNMENT is 128 in config/ia64/ia64.h:

    #define STRICT_ALIGNMENT 1

    /* Optional x86 80-bit float, quad-precision 128-bit float, and
       quad-word 128 bit integers all require 128 bit alignment.  */
    #define BIGGEST_ALIGNMENT 128

And this configuration parameter in config/ia64/ia64.h may also have led to the decision to force 64-bit alignment for this structure (it is asserted on most targets):

    /* Define this if you wish to imitate the way many other C compilers
       handle alignment of bitfields and the structures that contain them.

       The behavior is that the type written for a bit-field (`int', `short',
       or other integer type) imposes an alignment for the entire structure,
       as if the structure really did contain an ordinary field of that type.
       In addition, the bit-field is placed within the structure so that it
       would fit within such a field, not crossing a boundary for it.  */
    #define PCC_BITFIELD_TYPE_MATTERS 1

Question: If we assume that TImode would have been a more efficient mode to represent the record type above, would it not have been acceptable for the compiler to promote the alignment of this type to 128, given there are no apparent restrictions otherwise, or are there other C conventions at work that dictate otherwise? Is there a configuration tweak that would have led to using TImode rather than BLKmode?
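One untested workaround sketch: raising the record's alignment by hand, so that the second test above no longer demotes the tentative TImode to BLKmode:

    typedef struct sptr_struct
      {
        long unsigned int  phase:  48;
        short unsigned int thread: 16;
        void *addr;
      } __attribute__ ((aligned (16))) sptr_t;
    /* With TYPE_ALIGN raised to 128, the (TYPE_ALIGN >= BIGGEST_ALIGNMENT)
       test is satisfied, so the TImode assignment should survive.  */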
Re: Bad gcc/gtype-desc.h generated when using sparse checkout
On 07/15/12 21:53:02, Jonathan Wakely wrote:
> [...]
> It took me a while to get back to this, but your suggestion worked,
> this patch allows bootstrapping to get past cp/lex.o, it hasn't
> finished yet so I haven't run the tests:
> [...]
> Presumably gengtype goes through directories alphabetically, so if it
> doesn't find gcc/ada before gcc/c then it creates an invalid
> gtype-desc.h

We would like to see this patch, or something similar, applied. Currently, we subset the GCC source distribution to include only C, C++, and UPC when we build the GUPC (GNU UPC) source code distributions. For some test systems where we port GUPC, the available disk space is restricted, and the additional 600MB or so of space required for the additional languages and test suites might exceed the available quota.

If I recall correctly, there was some discussion on this list, or perhaps on gcc-patches, as to whether a decision should be made regarding the ability to subset the GCC source tree. If sub-setting is not prohibited, and there are no plans to upgrade or rewrite gengtype and its infrastructure, then something like this patch seems necessary.

thanks,
- Gary
graphite loop optimizer - "C" examples?
I have been experimenting with the graphite optimizer, based on GCC trunk, and cloog-isl. I started with the attached simple "C" program, which has this basic structure:

    #define N 2
    int a[N][N], b[N], c[N];
    [...]
    for (i = 0; i < N; i++)
      {
        b[i] = i;
        c[i] = i + N;
      }
    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        a[j][i] = b[i] + c[j];

(Attached is the full test case.) I compiled it with: -O3 -floop-block. A couple of questions:

1) What option should I supply to confirm that the graphite optimizer ran, and to determine (i) whether it in fact performed any optimizations, and (ii) which optimizations it performed?

2) If -floop-block couldn't optimize this program, what is the likely reason?

3) Would you please offer pointers to example "C" programs that highlight graphite-cloog-isl optimizations?

Thanks,
- Gary

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 2
    int a[N][N], b[N], c[N];

    static double
    cpu_time ()
    {
      struct timespec ts;
      double t;
      if (clock_gettime (CLOCK_MONOTONIC, &ts))
        abort ();
      t = ts.tv_sec + (ts.tv_nsec * 1.0e-9);
      return t;
    }

    int
    main (void)
    {
      int i, j, k;
      double start, stop, elapsed;
      for (i = 0; i < N; i++)
        {
          b[i] = i;
          c[i] = i + N;
        }
      start = cpu_time ();
      for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
          a[j][i] = b[i] + c[j];
      stop = cpu_time ();
      elapsed = stop - start;
      printf ("elapsed time = %0.2f secs.\n", elapsed);
      return 0;
    }
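Spelled out, I compile it roughly as follows (the dump switch and the file name are my guesses at the right way to see graphite's decisions, in case that helps with question 1):

    gcc -O3 -floop-block -fdump-tree-graphite-all graphite-test.c -o graphite-test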
Re: C++ and gather-detailed-mem-stats
Would it be possible to define a new function attribute that transparently adds parameters for the caller's function name, file name, and line number? Or one that establishes a binding between this information and existing parameter names? This might be useful for regular "C" programs as well.

    void do_something (T1 t1, T2 t2)
      __attribute__ ((caller_info (func => __FUNC__, file => __FILE__, line => __LINE__)));

Perhaps a compilation switch and a pre-defined macro are necessary to meaningfully be able to code the body of the function and/or to conditionally enable the collection of the data. The syntax above, with its mix of parameter names (but no types) and pre-defined macros, may not make sense, but perhaps the idea can be developed further if there is interest.

Or maybe a builtin type?

    void do_something (T1 t1, T2 t2, const __builtin_caller_info_t * const caller_info)
      __attribute__ ((caller_info));

This comes at the cost of an additional pointer argument, but it can be set to NULL if collection of caller info is disabled. It can also be an opaque type established via #define for configurations where either the compiler doesn't support the feature or it is disabled. Otherwise, __builtin_caller_info_t might have the obvious fields (function_name, file_name, line_number).

Off-hand, I can see how to make this work OK with default argument values and/or overloading in C++, but some more work would be involved (by the programmer) to make this work for regular "C" programs.

- Gary
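For plain "C", the #define fallback might look something like this sketch (the type and macro names are invented for illustration):

    typedef struct caller_info
      {
        const char *function_name;
        const char *file_name;
        int line_number;
      } caller_info_t;

    #if CALLER_INFO_ENABLED
    #define CALLER_INFO (&(const caller_info_t){ __func__, __FILE__, __LINE__ })
    #else
    #define CALLER_INFO ((const caller_info_t *) 0)
    #endif

    void do_something (int t1, double t2, const caller_info_t *caller_info);

    /* A call site then reads:  do_something (1, 2.0, CALLER_INFO);  */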
Re: C++ and gather-detailed-mem-stats
Or no explicit parameters at all ...

    void do_something (T1 t1, T2 t2) __attribute__ ((caller_info));

All this will do (with appropriate compilation switches and/or pre-defined macros) is pass one or more hidden arguments, which in turn can be accessed in the function body via a built-in function:

    #if CALLER_INFO_ENABLED
      __caller_info_t caller_info = __builtin_caller_info ();
      [...]
    #endif

In this way, the programmer-visible function prototype is unaffected, though the caller and the function body have to be compiled with compatible settings.
best method to implement dynamic initializers?
We need to generate code that initializes certain variables and runtime-related values with expressions that can't be evaluated statically at compile time. One method would be to create an __attribute__ ((constructor)) function that contains statements which initialize the values of interest.

Are there language dialects that already have this requirement to evaluate and assign initial values at runtime? Do they use a general mechanism like the constructor attribute, or do they somehow roll their own? (I can see how GCC's constructor mechanism may not be sufficiently general for Ada, for example.) Ideally, I'd prefer to use some already developed and proven code/approach rather than re-invent the wheel.

Any pointers/tips appreciated. Thanks.
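A minimal sketch of the constructor-attribute approach mentioned above (the variable names and the particular runtime query are just for illustration):

    #include <stdlib.h>
    #include <unistd.h>

    static char *dyn_buffer;
    static long dyn_page_size;

    /* Runs before main(), via GCC's constructor mechanism.  The values
       computed here can't be produced by a static initializer.  */
    static void __attribute__ ((constructor))
    init_dynamic_values (void)
    {
      dyn_page_size = sysconf (_SC_PAGESIZE);
      dyn_buffer = malloc ((size_t) dyn_page_size);
    }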
(gcc 4.2) how to create an ADDR_EXPR that refers to a linkage name?
We are in the process of updating GCC/UPC's support for the UPC dialect of C to version 4.2.0 of GCC. GCC/UPC is described here: http://www.intrepid.com/upc.html

Generally, things are working. However, at the moment, all tests fail when optimizations are enabled. For example:

    test00.upc:35: internal compiler error: in referenced_var_check_and_insert, at tree-dfa.c:639

It is failing on this check:

    (gdb) l
    634
    635       if (h)
    636         {
    637           /* DECL_UID has already been entered in the table.  Verify that it is
    638              the same entry as TO.  See PR 27793.  */
    639           gcc_assert (h->to == to);
    640           return false;
    641         }
    642
    643       h = GGC_NEW (struct int_tree_map);
    (gdb) p h->to
    $1 = 0x2e1ad160
    (gdb) pt
     unit size align 8 symtab 0 alias set -1 precision 8 min max pointer_to_this >
     addressable used public static common QI defer-output file test00.upc line 26
     size unit size align 8>
    (gdb) p to
    $2 = 0x2e1b7630
    (gdb) pt
     unit size align 8 symtab 0 alias set -1 precision 8 min max pointer_to_this >
     addressable used public static common QI defer-output file test00.upc line 26
     size unit size align 8>

Above, the two tree nodes are clones of each other, created by the following UPC-specific code:

    /* Convert shared variable reference VAR into a shared pointer
       value of the form {0, 0, &VAR}.  */
    tree
    upc_build_shared_var_addr (tree type, tree var)
    {
      tree new_var, var_addr, val;
      if (!(TREE_CODE (var) == VAR_DECL && TREE_SHARED (var)))
        abort ();
      if (!(TREE_CODE (type) == POINTER_TYPE && TYPE_SHARED (TREE_TYPE (type))))
        abort ();

      /* Create a VAR_DECL that is the same as VAR, but
         with qualifiers (esp. TYPE_QUAL_SHARED) removed so that
         we can create the actual address of the variable (in the shared
         section) without infinite recursion in the
         gimplification pass.  Make sure the new copy has
         the same UID as the old.  In the future, we might need
         to reference the symbol name directly.  */

      new_var = copy_node (var);
      DECL_UID (new_var) = DECL_UID (var);
      TREE_TYPE (new_var) = TYPE_MAIN_VARIANT (TREE_TYPE (var));
      TREE_SHARED (new_var) = 0;
      TREE_STRICT (new_var) = 0;
      TREE_RELAXED (new_var) = 0;
      var_addr = build_fold_addr_expr (new_var);
      TREE_CONSTANT (var_addr) = 1;
      val = upc_build_shared_ptr_value (type,
                                        integer_zero_node,
                                        integer_zero_node,
                                        var_addr);
      return val;
    }

As background, GCC/UPC adds a new qualifier, "shared", to indicate that a value must be accessed remotely and that it is shared across all UPC "threads" (which can be thought of as processes all running the same program, but with differing local copies of data). The UPC-specific aspects of the language are translated by a gimplify pass into normal GIMPLE trees that are then passed to the middle and back ends of GCC. For example, a reference to a value of a type that is qualified as "shared" will result in a call to a (possibly inlined) remote "get" library routine.

Where this gimplify pass can get confused is when it sees a reference to a shared variable. If it sees a reference to a shared variable on the right-hand side of an assignment, it assumes that its value is needed and generates a remote get call. The address of a shared variable has three parts (phase, thread, virtual address). For declared variables, the phase and thread are always 0. A constructor is used to create a shared address. That's what upc_build_shared_ptr_value() does above.
The virtual address part of the shared address is simply the regular address of the variable, because all shared variables are collected together in their own "upc_shared" linkage section. This section is needed simply for address-assignment purposes; the actual shared data is located in a global shared address region.

The code above clones a shared variable, stripping its type qualifiers (most importantly the "shared" qualifier). When the address of the cloned variable is taken, its normal C pointer-sized address results, and the special gimplify pass doesn't get confused into thinking that the address of the variable is a shared address.

The code above isn't clever: it clones the variable each time it needs to generate a shared address. In GCC 4.2, this runs into problems in the optimization pass that implements special checks for this sort of inconsistency.

The discussion above is a (very) long lead-up to a request for ideas and suggestions for better handling this situation.
Re: (gcc 4.2) how to create an ADDR_EXPR that refers to a linkage name?
On Sat, Sep 01, 2007 at 01:43:37PM -0400, Diego Novillo wrote:
>
> Have you considered using the data sharing machinery in OpenMP? We
> simply create a data structure holding all shared variables, allocate
> that in shared memory and re-write all references to shared variables
> as dereferences to that structure.

Diego, thanks. Some other implementations of UPC reference all shared variables indirectly through a table, built at runtime. The compiler tells the runtime how much space each variable requires, and the runtime allocates this from the shared memory region. The current strategy used by GCC/UPC is somewhat simpler; it lets the linker create the layout of the shared variable section. Perhaps we need to re-visit this design decision, and adopt a scheme similar to that used by GOMP. I'll review omp-low.c for ideas.

GCC/UPC does have a pthreads mode of operation, but that is a special case. UPC threads are usually mapped to separate processes. The shared memory region is potentially distributed across network nodes, often accessed via a high-speed interconnect. The runtime that is part of the GCC/UPC release supports only SMP configurations and relies on mmap(). However, GCC/UPC also works with a more general runtime developed by Berkeley, which supports many network interconnects.

> This trick you are implementing with cloning the VAR_DECLs is
> guaranteed not to work, sorry. We very explicitly assume that if
> DECL_UID (x1) == DECL_UID (x2) then x1 == x2. This is not something
> that will change.

Yeah, I suspected as much when I first wrote that code. I wasn't too surprised to see that it failed a consistency check in GCC 4.2.
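My (rough, untested) picture of the GOMP-style scheme, sketched at the C level; the struct, its name, and the runtime hook are invented for illustration, and this is not how GCC/UPC works today:

    /* All shared variables are collected into one structure ...  */
    struct upc_shared_block
      {
        int x;
        double y[100];
      };

    /* ... which the runtime allocates in shared memory at startup.  */
    static struct upc_shared_block *upc_shared_blk;

    static void
    example (void)
    {
      /* was:  x = 1;  y[3] = 2.0;  -- every reference becomes an
         indirect access through the shared block.  */
      upc_shared_blk->x = 1;
      upc_shared_blk->y[3] = 2.0;
    }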
how to chase a tree check failure in verify_ssa?
Background: GCC 4.2.0 baseline plus mods for the UPC dialect. The problem below is probably a result of the UPC mods and not something inherent in GCC 4.2.0.

Although the test cases that I ran pass at -O2, some fail when the value of THREADS (the number of parallel threads in the application) is set to the compile-time constant one. The failing tests ICE in verify_ssa as shown below. I'd appreciate any tips or recommendations on how to diagnose problems like this, likely things to look for, and so on.

The ICE occurs in tree-ssa.c at line 776 (--enable-checking is asserted):

    771
    772       FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter,
    773                                 SSA_OP_ALL_USES | SSA_OP_ALL_KILLS)
    774         {
    775           op = USE_FROM_PTR (use_p);
    776           if (verify_use (bb, definition_block[SSA_NAME_VERSION (op)],
    777                           use_p, stmt, false, !is_gimple_reg (op),
    778                           names_defined_in_bb))
    779             goto err;
    780         }

The operand, op:

    (gdb) p op
    $49 = 0x2e1ebc60
    (gdb) pt
     unit size align 128 symtab 0 alias set 3 fields unsigned external bit-field
     nonaddressable decl_4 DI file line 0 size unit size align 1 offset_align 128
     offset bit offset bit_field_type context chain > chain > used ignored TI
     file test02.upc line 33 size unit size align 128 context >

and the statement, stmt:

    (gdb) p stmt
    $50 = 0x2e1ee3c0
    (gdb) pt
     unit size align 64 symtab 0 alias set -1 precision 48 min max > side-effects
     arg 0 arg 0 used ignored TI file test02.upc line 33 size unit size align 128
     context > arg 1 unsigned external bit-field nonaddressable decl_4 DI file
     line 0 size unit size align 1 offset_align 128 offset bit offset
     bit_field_type context chain >> arg 1 constant invariant 0> test02.upc:33>

The failure occurs because SSA_NAME_VERSION() in turn calls SSA_NAME_CHECK(), which checks that the tree node is an SSA_NAME node, which 'op' clearly is not. Any ideas on how this situation might have occurred?

Note that the type of op above is the internal representation of a UPC shared pointer, which has three fields (phase, thread, vaddr). This representation overlays a shared pointer value, which is generally twice the size of a conventional pointer. Internally, UPC shared pointers are represented as POINTER_TYPE nodes whose TREE_TYPE() is qualified by a new qualifier, "shared". Various regular "C" optimizations on pointers have to be disabled for UPC's shared pointers. It may be the case that, with the particular settings used in the failing test, a "C" pointer optimization was inadvertently applied to a UPC shared pointer.

Thanks for your help.
Re: how to chase a tree check failure in verify_ssa?
On Mon, Sep 24, 2007 at 09:36:25AM -0400, Diego Novillo wrote:
> On 9/23/07, Gary Funck <[EMAIL PROTECTED]> wrote:
>
> > The operand, op:
> >
> > (gdb) p op
> > $49 = 0x2e1ebc60
> > (gdb) pt
>
> This symbol was not marked for renaming and the program is already in
> SSA form. When your pass introduces new symbols, you need to add them
> to the symbol table (with add_referenced_var) and also mark it for
> renaming (with mark_sym_for_renaming). For examples see passes like
> tree-sra.c or tree-pre.c that create new variables.

Diego, thanks. That particular symbol is being created in gimplify_expr, here (at line 541):

    536          won't allocate any variable that is used in more than one basic
    537          block, which means it will go into memory, causing much extra
    538          work in reload and final and poorer code generation, outweighing
    539          the extra memory allocation here.  */
    540       if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
    541         ret = create_tmp_from_val (val);
    542       else
    543         {
    544           elt_t elt, *elt_p;
    545           void **slot;

Above, optimize=3, is_formal=0, and by deduction, side-effects must be true. 'val' above is a constructor:

    (gdb) p debug_tree (val)
     unit size align 64 symtab 0 alias set 3 fields unsigned external bit-field
     nonaddressable decl_4 SI file line 0 size unit size align 1 offset_align 128
     offset bit offset bit_field_type context chain > chain > constant>

We use constructors to build a UPC shared pointer value (it has three parts: [phase, thread, vaddr]). I would have thought gimplify_expr's internal mechanisms would mark variables as referenced when it needs to?
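If I follow the advice above, the fix would be along these lines wherever our UPC lowering code introduces a new variable after the program is in SSA form (untested sketch; the temporary's name is invented):

    /* Register the new temporary with the SSA machinery so that
       verify_ssa and the renamer know about it.  */
    tree tmp = create_tmp_var (type, "upc_tmp");
    add_referenced_var (tmp);
    mark_sym_for_renaming (tmp);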
Re: how to chase a tree check failure in verify_ssa?
Diego, a bit more info. It seems that gimplify_operand is being called in the rewrite_uses pass of tree-ssa-loop-ivopts.c. gimplify_operand() is working on this expr:

     unit size align 32 symtab 0 alias set -1 precision 32 min max > constant
     invariant arg 0 constant invariant arg 0 constant static arg 0 constant>>>
     arg 1 constant invari

As you can see, we coerce a constructor into a UPC shared pointer, which works something like a pointer, but it is not directly interoperable with integers. Typically, we have to locate the places where these sorts of optimizations are attempted and disable them for UPC shared pointers.

Thanks for your help. It got me pointed in the right direction.

- Gary
cgraph, unit-at-a-time, and the "used" attribute
While working on UPC, we ran into an interesting problem: if -O1 is enabled and -funit-at-a-time is disabled (which is not the default configuration), a static variable declaration was not emitted in the generated assembler code. I haven't quite worked out why this is the case, but in reading the code I did notice some awkwardness in how "used" variables are detected and handled by the call graph (cgraph) pass(es).

The gist of the issue we ran into was the handling of this UPC construct:

    { static shared strict int x;  x = x; }

In UPC, "strict" is similar to volatile. The assignment of the dummy variable to itself above doesn't do anything very useful, but it does enforce a memory fence that ensures that remote reads and writes to UPC shared space can't flow past the assignment above. The UPC compiler runs a gimplify pass which finds all UPC-isms and rewrites them into C-isms, which then flow through the back end. The assignment above is loosely translated into:

    upc_put ([0, 0, &x], upc_get ([0, 0, &x], sizeof (x)));

where [0, 0, &x] is an aggregate constructor that builds the representation of a shared pointer having a thread number of 0, a phase of 0, and a virtual address of &x. All UPC shared variables are located in a special linkage section. In this way, &x points to a location in the global shared address space, and the linker lays out each thread's contribution to the global shared address space.

The difficulty comes in when we generate the runtime calls above referring to &x, by referring to a shadow variable we create (by necessity, to prevent infinite recursion in the gimplify pass) that has the same external name as 'x', with the shared qualifier removed. What happens is that cgraph has already been run and determined that 'x' isn't needed, and therefore it doesn't emit the declaration of 'x' into the generated assembler code.

We tried asserting TREE_USED() on 'x' when it was declared, but it turns out that instead of referring to TREE_USED() or even DECL_PRESERVE_P(), cgraph refers directly to the "used" attribute. Because of this, if __attribute__ ((used)) is added to the declaration above, all is well. That is because the front end checks directly for the "used" attribute in various places but seems not to check the various tree flags. Here are the relevant references (in the HEAD branch):

    c-decl.c-}
    c-decl.c-
    c-decl.c-  /* If this was marked 'used', be sure it will be output.  */
    c-decl.c:  if (!flag_unit_at_a_time && lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    c-decl.c-    mark_decl_referenced (decl);
    c-decl.c-
    c-decl.c-  if (TREE_CODE (decl) == TYPE_DECL)
    --
    cgraphunit.c-  if (node->local.externally_visible)
    cgraphunit.c-    return true;
    cgraphunit.c-
    cgraphunit.c:  if (!flag_unit_at_a_time && lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    cgraphunit.c-    return true;
    cgraphunit.c-
    cgraphunit.c-  /* ??? If the assembler name is set by hand, it is possible to assemble
    --
    cgraphunit.c-  for (node = cgraph_nodes; node != first; node = node->next)
    cgraphunit.c-    {
    cgraphunit.c-      tree decl = node->decl;
    cgraphunit.c:      if (lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    cgraphunit.c-        {
    cgraphunit.c-          mark_decl_referenced (decl);
    cgraphunit.c-          if (node->local.finalized)
    --
    cgraphunit.c-  for (vnode = varpool_nodes; vnode != first_var; vnode = vnode->next)
    cgraphunit.c-    {
    cgraphunit.c-      tree decl = vnode->decl;
    cgraphunit.c:      if (lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    cgraphunit.c-        {
    cgraphunit.c-          mark_decl_referenced (decl);
    cgraphunit.c-          if (vnode->finalized)
    --
    ipa-pure-const.c-{
    ipa-pure-const.c-  /* If the variable has the "used" attribute, treat it as if it had
    ipa-pure-const.c-     been touched by the devil.  */
    ipa-pure-const.c:  if (lookup_attribute ("used", DECL_ATTRIBUTES (t)))
    ipa-pure-const.c-    {
    ipa-pure-const.c-      local->pure_const_state = IPA_NEITHER;
    ipa-pure-const.c-      return;
    --
    ipa-reference.c-{
    ipa-reference.c-  /* If the variable has the "used" attribute, treat it as if it had
    ipa-reference.c-     been touched by the devil.  */
    ipa-reference.c:  if (lookup_attribute ("used", DECL_ATTRIBUTES (t)))
    ipa-reference.c-    return false;
    ipa-reference.c-
    ipa-reference.c-  /* Do not want to do anything with volatile except mark any
    --
    ipa-type-escape.c-  tree type = get_canon_type (TREE_TYPE (t), false, false);
    ipa-type-escape.c-  if (!type) return;
    ipa-type-escape.c-
    ipa-type-escape.c:  if (lookup_attribute ("used", DECL_ATTRIBUTES (t)))
    ipa-type-escape.c-    {
    ipa-type-escape.c-      mark_interesting_type (type, FULL_ESCAPE);
    ipa-type-escape.c-      return;
    --
    varpool.c-  if (node->externally_visible || node->force_output)
    varpool.c-    return true;
    varpool.c-  if (!flag_unit_at_a_time
    varpool.c:      && lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    varpool.c-    return true;
    varpool.c-
    varpool.c-  /* ??? If the assembler name is set by hand, it is possible to assemble

Given that the process
Re: cgraph, unit-at-a-time, and the "used" attribute
On Mon, Oct 08, 2007 at 02:50:06PM -0700, Janis Johnson wrote:
>
> Might this be related to http://gcc.gnu.org/PR33645?

Possibly. We think we saw a problem rebuilding one of the math functions in libgcc2 at -O2 with unit-at-a-time disabled, which resulted in a compilation failure. Since that isn't the usual configuration, perhaps there's an implicit dependency between -funit-at-a-time and one of the optimization passes? (We didn't look into the issue further. The baseline we're using is 4.2.0, fyi.)

Thanks for the reference to the PR.

- Gary
Re: gomp slowness
On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote:
>
> Do you know how thread local variables are handled?
> [Not using Posix TLS I hope .. that would be a disaster]

Would you please elaborate? What's wrong with the POSIX TLS implementation? Do you know of any studies? I ask because we presently use the TLS facility extensively, and have suspected that there are significant performance problems, but haven't looked into the issue.