GCC 3.4.3 static constants, named sections, and -fkeep-static-consts
Given the following,

  static char const rcsid[] = "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
  int main() {}

When compiled with GCC 3.4.3 at -O2, the ident string above will _not_ appear in the executable. This is apparently expected behavior. However, interestingly, gcc -fkeep-static-consts -O2 t.c did not retain the ident string, rcsid, defined above. Shouldn't -fkeep-static-consts have ensured that this static constant would appear in the executable?

I also tried adding a section attribute to the string, with the hope that the compiler would retain the static constant because it had been explicitly targeted to a named section,

  static char const __attribute__ ((section("ident_sect"))) rcsid[] = "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
  int main() {}

but this didn't prevent the elimination of the static const definition. Any suggestions on another method to ensure that this static const string makes it into the executable when compiled at -O2? And shouldn't -fkeep-static-consts have ensured that the static const string wasn't eliminated? Bug?

The logic in wrapup_global_declarations (toplev.c) doesn't look quite right:

  else if (TREE_READONLY (decl) && !TREE_PUBLIC (decl)
           && (optimize || !flag_keep_static_consts || DECL_ARTIFICIAL (decl)))
    needed = 0;

If 'optimize' is asserted above, then flag_keep_static_consts will never be tested. Perhaps it should read as follows?

  && ((optimize && !flag_keep_static_consts)

Alternatively, I wonder if flag_keep_static_consts should be tested earlier, at a higher level, for example:

  if (flag_keep_static_consts) /* needed */;

but I'm not sure which of the earlier tests that set needed = 0; are mandatory and which are optional.

Enhancement request: assert node->needed if an explicit section attribute is supplied for the declaration associated with node, on the assumption that the data is being placed in a named section for a reason.
RE: GCC 3.4.3 static constants, named sections, and -fkeep-static-consts
> From: James E Wilson
> Sent: Tuesday, March 08, 2005 6:59 PM
[...]
> Try re-reading the docs. -fkeep-static-consts is the default. The
> purpose of this is that we don't perform this optimization at -O0
> normally, but if you use -fno-keep-static-consts, then we do. So this
> option can let you remove static consts in extra cases, but will never
> prevent the compiler from removing them.

Jim, thanks for the follow-up. I filed a bug report, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20319 and note #2 summarizes some relevant, conflicting facts: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20319#c2 The gist is that the documentation describes the current behavior. However, I don't think the current behavior is useful, and it does not agree with the comments in the source code, nor with the help line.

However, as you noted, __attribute__ ((used)) works well as a workaround, although it would be helpful if `used' were added to the documentation as a supported attribute that can be applied to variables.

I think the switch name -fkeep-static-consts might be more consistently named if it were given the opposite sense and named something like -fdelete-unused-static-consts. The idea here is that by asserting the switch a particular optimization is _enabled_. Thus the optimizations performed at each level can be consistently enumerated by asserting a particular set of switches which enable specific optimizations. This would change the present user interface; however, I doubt that anyone is making extensive use of the current interface, because at present only -fno-keep-static-consts, asserted at -O0 (no optimization), actually changes the default behavior of the compiler.
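For the archives, the workaround applied to the example from my first message looks like this:

  static char const __attribute__ ((used)) rcsid[] = "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
  int main() {}

With `used' asserted on the declaration, the string is emitted even at -O2, although nothing in the translation unit references it.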
RE: Merging calls to `abort'
Richard Stallman wrote (in part):

> What's the point of cross-jumping? It saves a certain amount of
> space; it has no other benefit. All else being equal, there's no
> reason not to do it. But cross-jumping abort calls interferes with
> debugging. That's a good reason not to do it. Let's get rid of this
> optimization. Those who want to use a fancy_abort function will still
> be able to do so, but this change will be an improvement for the rest.

Would a new attribute be in order that disables the optimization? For example, __attribute__ ((unique_call))? That way, the programmer could designate procedures other than abort() as procedures which should not be cross-jumped.
RE: Hand-written rec-descent parser of GCC-4.1 is WRONG!!!
The following paper provides some background on the difficulties encountered with parsing C++: http://citeseer.ist.psu.edu/irwin01generated.html Abstract: C++ is an extraordinarily difficult programming language to parse. The language cannot readily be approximated with an LL or LR grammar (regardless of lookahead size), and syntax analysis depends on semantic disambiguation. While conventional (LALR(1) and LL(k)) parser generation tools have been used to build C++ parsers, the effort involved in grammar modification and custom code development is substantial, rivaling the effort of constructing a parser manually. [...] Link to PDF: http://tinyurl.com/3remp And a related thread on the GCC mailing list back in 2002: http://gcc.gnu.org/ml/gcc/2002-08/msg00085.html
empty switch substitution doesn't erase matching switch?
This usage of a null substitution came up while I was trying to use this form of spec for a different switch, but the following illustrates the problem using the existing gcc compiler as built for Redhat Linux running on an SGI Altix. Given a spec of this form,

  %{S:X} substitutes X, if the -S switch was given to CC.

and a switch definition for -static:

  /* %{static:} simply prevents an error message if the target machine doesn't handle -static. */

and the resulting link command spec:

  *link_command: %{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:%(linker) %l %{pie:-pie} %X %{o*} %{A} %{d} %{e*} %{m} %{N} %{n} %{r}%{s} %{t} %{u*} %{x} %{z} %{Z} %{!A:%{!nostdlib:%{!nostartfiles:%S}}}%{static:} %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate:-lgcov} %{!nostdlib:%{!nodefaultlibs:%(link_gcc_c_sequence)}} %{!A:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} }}

  % gcc --version
  gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-34)

then the command "gcc -static t.c" ultimately yields the following collect2 command:

  /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/collect2 -static /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../../crt1.o /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../../crti.o /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/crtbegin.o -L/usr/lib/gcc-lib/ia64-redhat-linux/3.2.3 -L/usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../.. /tmp/ccc2ISqV.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/crtend.o /usr/lib/gcc-lib/ia64-redhat-linux/3.2.3/../../../crtn.o

I haven't followed the logic in detail, but should the spec %{static:} above erase the explicit -static switch that was passed to gcc?
tips on debugging a GCC 3.4.3 MIPS RTL optim problem?
Hello, using the 3.4.3 baseline on SGI MIPS3 Irix6.5, I'm running into a problem where bad code is generated on a relatively trivial program when both -funit-at-a-time and -foptimize-sibling-calls are asserted. The nature of the failure is that the RTL optimizer seems to get confused about what value should be targeted to an argument register; it seems to coalesce two separate temporaries into one. Note that the original RTL being generated originates in some new code that I've added to support an experimental dialect of C (called UPC), so it isn't out of the question that there is some aliasing or other issue that I've introduced. However, most tests are passing, and just a few show the failure mode illustrated below. All the tests pass on i386 and IA64, fyi -- they don't demonstrate this failure.

First question: are there known problems in 3.4.3 with -funit-at-a-time and/or -foptimize-sibling-calls? (I ran a few queries of the Bugzilla database but didn't find anything.)

I confirmed the problematic optimizations by compiling the program with -O0 -funit-at-a-time -foptimize-sibling-calls and noticed that correct code is generated if either or both optimization switches are removed from the command line. I tried debugging the problem by compiling with -da and looked at the various rtl dump files:

  t.upc.00.cgraph t.upc.07.addressof t.upc.25.greg t.upc.35.mach
  t.upc.01.rtl t.upc.11.cfg t.upc.26.postreload
  t.upc.02.sibling t.upc.19.life t.upc.27.flow2
  t.upc.04.jump t.upc.24.lreg t.upc.29.ce3

The bad code shows up in t.upc.02.sibling, so probably -dr -di would have sufficed. The problem that I'm seeing is illustrated in the following RTL:

  (insn 66 65 77 0 (set (reg:SI 225 [ ]) (reg/f:SI 177 virtual-stack-vars)) -1 (nil) (nil))
  (insn 77 66 78 0 (set (reg:DI 228) (const_int 0 [0x0])) -1 (nil) (nil))
  (insn 78 77 79 0 (set (reg:DI 228) (mem/s:DI (reg/f:SI 177 virtual-stack-vars) [0 S8 A128])) -1 (nil) (nil))
  (insn 79 78 80 0 (set (reg:DI 4 $4) (reg:DI 228)) -1 (nil) (nil))
  (insn 80 79 81 0 (set (reg:SI 5 $5) (reg:SI 225 [ ])) -1 (nil) (nil))
  (insn 81 80 82 0 (set (reg:SI 6 $6) (reg:SI 224 [ ])) -1 (nil) (nil))
  (insn 82 81 83 0 (set (reg:SI 229) (unspec:SI [ (reg:SI 28 $28) (const:SI (unspec:SI [ (symbol_ref:SI ("__putblk3") [flags 0x41] ) ] 107)) (reg:SI 79 $fakec) ] 27)) -1 (nil) (nil))
  (call_insn 83 82 115 0 (parallel [ (call (mem:SI (reg:SI 229) [0 S4 A32]) (const_int 0 [0x0])) (clobber (reg:SI 31 $31)) ]) -1 (nil) (nil) (expr_list (use (reg:SI 28 $28)) (expr_list (use (reg:SI 6 $6)) (expr_list (use (reg:SI 5 $5)) (expr_list (use (reg:DI 4 $4)) (nil))
  (insn 115 83 116 0 (clobber (mem/s:BLK (reg/f:SI 177 virtual-stack-vars) [0 A128])) -1 (nil)

Above, the second argument (reg:SI $5) is set to (reg:SI 225), which in turn is set to (reg/f:SI 177 virtual-stack-vars), which is simply the frame pointer. Note that the first argument (reg:SI $4) will end up being set to the contents of the location that the frame pointer points to -- this is incorrect -- it should be set to the contents of 16($fp), or at least some other location than the double word location beginning at $fp. It looks as if the optimizer somehow aliased the two locations, or it decided somehow that they weren't both live at the same time.
If we maintain the -foptimize-sibling-calls switch but do not assert -funit-at-a-time, the following correct RTL is generated: (insn 39 38 40 0 (set (reg:SI 205) (const_int 8 [0x8])) -1 (nil) (nil)) (insn 40 39 41 0 (set (reg:SI 206) (reg/f:SI 177 virtual-stack-vars)) -1 (nil) (nil)) (insn 41 40 42 0 (set (reg:DI 207) (const_int 0 [0x0])) -1 (nil) (nil)) (insn 42 41 43 0 (set (reg:DI 207) (mem/s:DI (plus:SI (reg/f:SI 177 virtual-stack-vars) (const_int 16 [0x10])) [0 S8 A128])) -1 (nil) (nil)) (insn 43 42 44 0 (set (reg:DI 4 $4) (reg:DI 207)) -1 (nil) (nil)) (insn 44 43 45 0 (set (reg:SI 5 $5) (reg:SI 206)) -1 (nil) (nil)) (insn 45 44 46 0 (set (reg:SI 6 $6) (reg:SI 205)) -1 (nil) (nil)) (call_insn 46 45 48 0 (parallel [ (call (mem:SI (symbol_ref:SI ("__putblk3") [flags 0x41] ) [0 S4 A32]) (const_int 0 [0x0])) (clobber (reg:SI 31 $31)) ]) -1 (nil) (nil) (expr_list (use (reg:SI 28 $28)) (expr_list (use (reg:SI 6 $6)) (expr_list (use (reg:SI 5 $5)) (expr_list (use (reg:DI 4 $4)) (nil)) (insn 48 46 49 0 (clobber (mem/s:BLK (plus:SI (reg/f:SI 177 virtual-stack-vars) (const_int 16 [0x10])) [0 A128
RE: gcc 4.0.0 optimization vs. id strings (RCS, SCCS, etc.)
We use the feature of placing strings into the object file somewhat differently. We record configuration and compilation-related info into strings which are coalesced into their own linkage section. A runtime component traverses this config info section to ensure that the various separately linked modules have been compiled with consistent settings. Yes, this might be better done by a host-based tool like collect, but that requires more work and more mechanism, and the simpler approach works fine for now.
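As a sketch of the technique (the section name and string contents here are made up for illustration), each compilation unit contributes an entry along these lines:

  static const char cfg_info[] __attribute__ ((used, section ("config_info")))
    = "module=foo.c opts=-O2 abi=64";

and on ELF targets built with GNU ld, the runtime component can walk the section contents between the linker-provided __start_config_info and __stop_config_info symbols.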
RE: Ada and bad configury architecture.
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of
> Nathanael Nerode
> Sent: Monday, April 25, 2005 8:47 PM
[...]
> Actually, I was going to try to convince y'all to allow the *configury*
> to be put in the *configure* files. All of it. The current scheme of
> stuffing the configury in the Makefile, although I know the Ada
> maintainers like it, is just trouble, and is fundamentally the source of
> most or all of the endless Ada cross-build problems.

We implement an experimental dialect of C, called UPC, which targets SIMD class machines. One of the changes between 3.3 and 3.4 that has caused us the most grief is the decision to defer per-language configuration to the make step. This means that the dialect-specific configuration runs after gcc configuration, and we can no longer, for example, overlay (or add to) the basic configuration. As an example, we need to introduce dialect-specific runtime start and end object files (serving a similar function to crtbegin.o and crtend.o), but the common start and end files are now built well before the UPC language files are even configured. Thus, there is no mechanism to add language-specific components onto the list of files that come with the base-level compiler. For 3.4 we've worked around the problem, but the workaround is kludgy.

In a related matter, I find it difficult to debug the makefiles that make use of included makefile fragments. I can see some advantages of these included files for developers who happen to be working on those fragments, but overall, the include files make life more difficult. Same thing goes for the included configure fragments, IMO. And while I'm ranting, I'd much prefer it if the make files were 'for-loop free'; that is, that they listed explicit dependencies and built those dependents in classic makefile fashion, rather than implementing iteration in the make step. Most of these suggestions argue for a method to generate make files in a more automated fashion.
RE: GCC 4.1: Buildable on GHz machines only?
> -Original Message-
> From: Matt Thomas
> Sent: Tuesday, April 26, 2005 10:42 PM
[...]
> Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was
> already doing) only decreased the bootstrap time by 10%. By far, the
> longest bit of the bootstrap is building libjava.

Is it fair to compare current build times, with libjava included, against past build times when it didn't exist? Would a closer apples-to-apples comparison be to bootstrap GCC Core only on the older sub-GHz platforms?
RE: GCC 3.4.4 Status (2005-04-29)
> From: Mark Mitchell > Sent: Friday, April 29, 2005 12:00 PM > > Now that GCC 4.0 is out the door, I've spent some time looking at the > status of the 3.4 branch. As stated previously, I'll be doing a 3.4.4 > release, and then turning the branch over to Gaby, to focus > exclusively on 4.0/4.1. [...] What is the target date for 3.4.4? Thanks.
GCC 3.3.6 - anomalous debug info?
configuration: i386-redhat-linux (Redhat 9.2), gcc 3.3.6 ("make bootstrap" from the sources), and gdb "(5.3post-0.20021129.18rh)" as well as gdb 6.3 (latest) built from sources.

I'm working on some changes to GCC 3.4.3, which I've built using gcc 3.3.6. The GCC (3.4.3) that I'm debugging is compiled with -g -O0, with --enable-checking. However, I notice that when I fire up GDB 5.3, it says:

  Breakpoint 4, main (argc=13, argv=0xbfffdc04) at /upc/gcc-upc/src/gcc/main.c:35
  35        return toplev_main (argc, argv);
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.
  During symbol reading, inner block not inside outer block in print_rtx.

and the latest gdb 6.3 (built from sources) says the following:

  Breakpoint 4, main (argc=13, argv=0xbfffe7e4) at /upc/gcc-upc/src/gcc/main.c:35
  35        return toplev_main (argc, argv);
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.
  During symbol reading, Incomplete CFI data; unspecified registers at 0x08095c3e.

Neither of these differing series of warning messages gives me confidence that the debugging info is correct. Is this a gcc problem, or a gdb problem? (I made a few quick probes in the Bugzilla database, but couldn't find anything that seemed relevant to malformed debug info.) Anyone else see messages like this when debugging gcc? Is there a workaround or fix? Is this something unique to what is now a fairly old version of the Linux OS?

BTW, one of the reasons I tried this with the latest GCC 3.3.6 and GDB 6.3, and compiled at -O0, was to see if some problems I was seeing where the debugger was having trouble navigating gcc's object->source mapping might be fixed. I saw a similar problem using an earlier version of gcc "3.2.2 20030222 (Red Hat Linux 3.2.2-5)".
RE: Full comparison in 'cbranchsi4' leads to error in gcc 4.0
> This works fine on gcc 3.4, however on gcc 4.0 it creates an error during
> optimization. According to my investigation, the error occurs when there is a
> division by a constant power of 2 which needs to be transformed into shifting.
> The error generated is:
>
> internal compiler error: in emit_cmp_and_jump_insn_1, at optabs.c:3599

The easiest thing to do is to debug gcc: set a breakpoint on fancy_abort, and go up a few levels to emit_cmp_and_jump_insn_1(). Note the incoming rtx args (x and y) and mode. From the looks of the code in there, it is looking for an instruction pattern that matches, and when no match is found, it tries a wider mode, until there are no wider modes, and then it aborts. You need to find the mode and rtx arguments that are being passed in, and then understand why no matching instruction is found. For example, your instruction pattern,

  (define_insn "cbranchsi4"
    [(set (pc) (if_then_else
                 (match_operator 0 "comparison_operator"
                   [(match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "nonmemory_operand" "r")])
                 (label_ref (match_operand 3 "" ""))
                 (pc)))]
    ""
    "c%C0jump %1 %2 %3"
    [(set_attr "type" "branch") (set_attr "length" "1")]
  )

isn't prepared to match a memory operand. Perhaps the optimizer pre-calculated a constant, and targeted the constant into memory rather than a register? In that case, there will be no match on the third argument because the pattern is expecting a "nonmemory_operand".
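To make the suggestion above concrete, a debugging session might look roughly like this (the cc1 invocation and the parameter names x, y, and mode are assumptions based on the code around the abort; adjust for your port and build tree):

  % gdb ./cc1
  (gdb) break fancy_abort
  (gdb) run -O2 t.c
  (gdb) up 3
  (gdb) print mode
  (gdb) call debug_rtx (x)
  (gdb) call debug_rtx (y)

debug_rtx() prints the operands in the same form as the RTL dump files, which should show whether one of them is a MEM (or a constant that landed in memory) that the cbranchsi4 pattern above cannot match.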
packaging a GCC binary distribution so it can be installed at arbitrary locations?
Given a binary distribution of GCC, for example, built to install under /usr/local, is it possible to configure and build the compiler in such a way that a binary packaging method such as RPM can allow a user to specify an alternate installation point (perhaps /opt, or even the user's home directory) and have it all work? My impression is that too many hard-coded paths are wired into gcc.c when it is built to make this ability to migrate the binary possible. There are workarounds for the user, such as setting various environment variables and using the -B switch, but I'm looking for a method that directly allows installation of the binary to a place other than where it was initially configured. Anyone found a way to do this?

(Separately, GCC 3.4 is now built using dynamic libraries for libgcc and libunwind, and these cause some different problems invoking gcc [assuming the user would prefer not to adjust their library path or doesn't have access to /etc/ld.so.conf]. I think things could be made simpler by specifying various -rpath settings when the executable is linked, but these -rpath settings may have to be fixed up when installing the binary to a place other than where it was built, unless the entries can be made relative to the executable.)
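For what it's worth, the workarounds alluded to above look roughly like this (the installation prefix is hypothetical):

  % export GCC_EXEC_PREFIX=/opt/gcc-3.4/lib/gcc/
  % /opt/gcc-3.4/bin/gcc -B/opt/gcc-3.4/lib/gcc/ hello.c -o hello

but that has to be wrapped around every invocation of the driver, which is exactly what I'd like to avoid by having a relocated installation just work.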
Is -static a link-only switch?
Does the -static switch play any role during compilation, or is it a link-only switch? A quick review of gcc.c indicates that -static may play a role on some targets:

  /* %{static:} simply prevents an error message if the target machine doesn't handle -static. */

However, the info documentation shows the following:

  *Note Options for Linking: Link Options. OBJECT-FILE-NAME -lLIBRARY -nostartfiles -nodefaultlibs -nostdlib -s -static -static-libgcc -shared -shared-libgcc -symbolic -Wl,OPTION -Xlinker OPTION -u SYMBOL.

I can think of target OSes that might define a different ABI for procedure calls for programs compiled with -static asserted than when compiled for a dynamic linking environment, but I can't quite tell whether -static in fact has any effect during compilation.
RE: packaging a GCC binary distribution so it can be installed at arbitrary locations?
> Yes, with recent versions of gcc you can move the entire tree around
> and the gcc driver will still be able to find the various internal
> executables and header files.
[...]

Ian, thanks. Which versions qualify as "recent" above? GCC 3.4, or 4.0, or both? Is there any documentation on how the new packaging mechanism works? If this was discussed on this list, would you happen to know approximately when (so I can do a search of the archives)?
RE: Is -static a link-only switch?
Ian Lance Taylor wrote (in part):

> In fact many targets compile code differently depending upon whether
> the code is to be put into a shared library or not, but this is
> controlled via options like -fpic, not -static.

Is it generally safe on all currently supported targets to assert -fno-pic when compiling programs that will ultimately be linked with -static asserted? Will targets that don't support -fpic (and -fno-pic) complain, or just quietly accept the switch?
RE: packaging a GCC binary distribution so it can be installed at arbitrary locations?
Ian Lance Taylor wrote (in part):

> Telling the dynamic linker about a dynamic libgcc is still a problem,
> but that is a problem wherever you put the compiler.

If I'm not interested in building a dynamically linked gcc, or in building libgcc and related libraries as dynamic libraries, can I simply assert --disable-shared when configuring gcc, and thus ensure that the resulting compiler binaries can be easily moved around?
C99 implies -Wimplicit-function-declaration?
I notice that while compiling with -std=c99 (which asserts flag_isoc99), the compiler issues warnings by default when it detects that a function call references a function which has not been previously declared. Although it is a useful warning, my copy of the C99 spec seems to indicate that such a warning is optional. My copy of the C99 standard (2nd edition, 1999-12-01) says the following in Annex I ("Common Warnings"):

--- begin quote
1 An implementation may generate warnings in many situations, none of which are specified as part of this International Standard. The following are a few of the more common situations. [...] A function is called but no prototype has been supplied (6.5.2.2).
--- end quote

There appears to be no requirement for the compiler to issue a warning, although one does seem to be permitted by the specification. Also, this behavior is not reflected in the documentation, http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Warning-Options.html#Warning-Options

  -Wimplicit-function-declaration -Werror-implicit-function-declaration
    Give a warning (or error) whenever a function is used before being declared.
    The form -Wno-error-implicit-function-declaration is not supported.
    This warning is enabled by -Wall (as a warning, not an error).

  -Wimplicit
    Same as -Wimplicit-int and -Wimplicit-function-declaration.
    This warning is enabled by -Wall.

The documentation is technically incorrect, because at the top of the page it states: "This manual lists only one of the two forms, whichever is not the default." However, for C99 the option is enabled by default.
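For reference, a minimal case that shows the behavior (file and function names are just for illustration, and the diagnostic text is approximate):

  /* t.c */
  int main (void)
  {
    return foo (1);
  }

  % gcc -std=c99 -c t.c
  t.c: In function 'main':
  t.c:4: warning: implicit declaration of function 'foo'

whereas compiling without -std=c99 stays quiet unless -Wimplicit-function-declaration or -Wall is given.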
RE: C99 implies -Wimplicit-function-declaration?
Joseph S. Myers wrote (in part):

> No prototype is different from no declaration at all. Implicit function
> declarations are not part of C99, so the code is in error in C99 mode.

OK, thanks. I (now) understand that the reference to a warning about a missing prototype does not apply. However, I don't see anything in section 6.5.2.2 (rev. 1999-12-01) that says that a function declaration or prototype declaration must (or should) precede a call to the function. And GCC isn't treating it as an error, but rather is enabling the warning by default. The code reads as follows (in c-objc-common.c):

  /* If still unspecified, make it match -std=c99
     (allowing for -pedantic-errors).  */
  if (mesg_implicit_function_declaration < 0)
    {
      if (flag_isoc99)
        mesg_implicit_function_declaration = flag_pedantic_errors ? 2 : 1;
      else
        mesg_implicit_function_declaration = 0;
    }

And mesg_implicit_function_declaration is initialized to -1 (c-common.c):

  /* Nonzero means message about use of implicit function declarations;
     1 means warning; 2 means error. */
  int mesg_implicit_function_declaration = -1;
RE: Sine and Cosine Accuracy
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of
> Menezes, Evandro
> Sent: Friday, May 27, 2005 1:55 PM
[...]
> That's because the error is the same but symmetrical for sin and
> cos, so that, when you calculate the sum of their squares, one
> cancels the other out.
>
> The lack of accuracy in x87 is well known: see
> http://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html#Errors-in-Math-Functions.

Ulrich Drepper used a different method to compute math function accuracy, described here: http://people.redhat.com/drepper/libm/index.html It might be interesting to re-run the safe/unsafe/x87 tests using his methodology. His results offer comparisons on a number of platforms, and the visual representation of the errors can offer some insight into the behavior of the implementation.
RE: What is wrong with Bugzilla?
As an occasional user of the Bugzilla database, I don't find it terrible to use, though it would be nice if there were an abbreviated interface that catered to the sorts of queries that users issue the most. These often-occurring queries might best be determined by saving a month's worth of queries and ferreting out the types of queries that occur most often. I also didn't find the requirement that I register my e-mail address to be particularly surprising or burdensome. As an aside, I often stumble into the middle of a Redhat discussion list thread via Google that seems to relate to a problem that I've encountered. Redhat for some reason requires https access, and in IE6, my browser of non-choice, I have to click OK to view the page. Now, _that_ is annoying.

What may be confusing to users: where do I report my problem? If I'm a Redhat user, do I log my potential GCC problem to their support site, or to the GCC site? To further confuse matters, for most users, the vendors often modify a given version of GCC to include specific patches and build options of their choice. This of course argues for logging bugs with the vendor. One wonders whether the vendors are timely in reporting legit bugs back to the GCC Bugzilla database, but one hopes so. If we for the moment assume that users of pre-packaged distributions report their bugs back to the vendor, then the GCC mailing lists and bug lists are left for those brave souls who are using GCC source code distributions directly. (Perhaps the GCC maintainers can comment on whether this theory in fact holds.)

Matters are further complicated by the fact that there are now several viable GCC releases to choose from: 3.3.x, 3.4.x, 4.0.x, CVS head, and so on. There's even the occasional bug filed against one of the many branches. When we consider the multitude of choices, it is amazing that there is any forward progress.

As a casual reader of the GCC lists, I do have one observation: the volume on the GCC bug list is very, very high. Often the bug traffic there relates to regressions and bugs that are found on the CVS head or recent development releases. As a user of the older releases (3.3, 3.4), I'd much prefer it if there were two separate bug-reporting lists: one for the more stable released versions, and a separate list for the "latest". I'd also like it if there were a web page for each stable release that showed the results of a canned Bugzilla query which lists open bugs and/or recently closed bugs against the stable releases (not sure how this would be organized).

As far as the tenor of the GCC mailing list goes, it is true that responses to "dumb questions" are often terse, but they're generally helpful. I think this is to be expected, when interacting with busy developers who have to balance many priorities and pressing deadlines. I've been particularly disappointed by queries to related lists like the glibc list, which of course is an equally important component of a useful C compilation system. I would vote affirmatively for somehow more closely linking GCC releases with specific GLIBC distributions, and having some sort of tighter coordination between the two. However, after delving into GLIBC on a particular platform, I can see where handling the many varieties of GLIBC builds is a big problem, and appears to be one that presently the vendors mainly deal with.
RE: What is wrong with Bugzilla? [Was: Re: GCC and Floating-Point]
> Next try documentation, installation. Talks about compiling again.
> Finally, at download, binaries I find what I want. Seeing as I suspect
> that is the link most people want when they first visit, it should
> perhaps be a little more obvious, and in the main body near the top?

Your scenario makes a lot of sense. However, it should be possible to verify actual usage patterns by investigating web site logs, to see which pages are visited and (perhaps) in what order. Based upon this information, the pages can be re-organized to place first, and most prominently, the pages that are generally visited first. Sub-question: which version would the maintainers recommend that a user looking for a stable release try first (3.3, 3.4, or 4.0)?
semantics of null lang_hooks.callgraph.expand_function?
While working with GCC's language hooks, we found that certain places in GCC test for a null value of lang_hooks.callgraph.expand_function, but cgraph_expand_function() calls the hook directly. In cgraphunit.c:

  /* Expand function specified by NODE.  */
  static void
  cgraph_expand_function (struct cgraph_node *node)
  {
    tree decl = node->decl;
    /* We ought to not compile any inline clones.  */
    gcc_assert (!node->global.inlined_to);
    if (flag_unit_at_a_time)
      announce_function (decl);
    cgraph_lower_function (node);
    /* Generate RTL for the body of DECL.  */
    lang_hooks.callgraph.expand_function (decl);

In toplev.c:

  /* Disable unit-at-a-time mode for frontends not supporting callgraph interface.  */
  if (flag_unit_at_a_time && ! lang_hooks.callgraph.expand_function)
    flag_unit_at_a_time = 0;

In function.c:

  /* Possibly warn about unused parameters.  When frontend does unit-at-a-time,
     the warning is already issued at finalization time.  */
  if (warn_unused_parameter && !lang_hooks.callgraph.expand_function)
    do_warn_unused_parameter (current_function_decl);

We tried setting lang_hooks.callgraph.expand_function to NULL:

  /* For now, disable unit-at-a-time by setting expand_function to NULL */
  #undef LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION
  #define LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION NULL

which has the desired effect of disabling unit-at-a-time, but runs aground in cgraph_expand_function() with a segfault, when it attempts to call lang_hooks.callgraph.expand_function(). It seems that GCC is handling lang_hooks.callgraph.expand_function in an inconsistent fashion. Is a null value for expand_function meaningful? If it is, then what is the fix for cgraph_expand_function()?
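If a null hook is in fact meant to be supported (as the tests in toplev.c and function.c suggest), then one minimal sketch of a fix would be to guard the call the same way those tests do; whether that is the right answer, or whether a null hook simply shouldn't be allowed, is exactly the question above:

  /* Generate RTL for the body of DECL, but only if the front end
     provides a callgraph expansion hook (a sketch, not a vetted fix).  */
  if (lang_hooks.callgraph.expand_function)
    lang_hooks.callgraph.expand_function (decl);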
Should bootstrap-O3 be the default for building/testing GCC?
Currently, the default optimization level when building/bootstrapping GCC is -O2. We routinely build with --with-build-config='bootstrap-debug bootstrap-O3' because we want to verify that our UPC changes don't affect the compiler when built with full optimizations. We also build with --enable-checking=all. Since most developers probably build/test GCC with the default -O2 options, we fairly often run into -O3-related issues when building GCC. Enough so that we're considering just using the default -O2 settings. I'm wondering if there might be benefit in changing the current defaults to use -O3 instead? Or perhaps have the configure infrastructure determine that the build is for a development version of GCC and set the flags and options accordingly? Somewhat related: has anyone recently determined whether a GCC built with -O3 is generally faster/smaller than one built at -O2? thanks, - Gary
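P.S. For reference, our build is configured roughly along these lines (the prefix and language list are only illustrative):

  .../gcc/configure --prefix=/usr/local/gupc \
      --enable-languages=c,c++ \
      --enable-checking=all \
      --with-build-config='bootstrap-debug bootstrap-O3'
  make && make -k check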
Re: Autotuning parameters/heuristics within gcc - best place to start?
On 09/26/14 07:47:05, Andi Kleen wrote: > One example of an existing autotuner is the gccflags tuner in opentuner. Although dated, ACOVEA might offer up some ideas. http://stderr.org/doc/acovea/html/acovea_4.html
Re: organization of optimization options in manual
On 01/14/15 23:15:59, Jeff Law wrote: > Sounds good. I think just starting with the list & creating the buckets > with the list. Then post here and we'll iterate and try to nail that down > before you start moving everything in the .texi file. Something to consider, if the optimization options are re-worked: Arrange the -O options such that -O1 can be described by a distinct set of specific optimizations enabled (or disabled) in addition to -O0, and -O2 would be described as a composite of specific optimizations applied to -O1 and so on. (This might require the addition of new optimization options.) For completeness, if a specific optimization requires certain passes or the assertion of other options, that should somehow be encoded internally within the compiler. This would potentially make it easier to find which optimization (or pass) is causing a regression and might make it easier for users to understand the exact effect of a particular -O option. - Gary
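P.S. As a partial data point on enumerating what each level enables today, the driver can already report the flag-level view, for example:

  gcc -Q -O1 --help=optimizers > /tmp/O1
  gcc -Q -O2 --help=optimizers > /tmp/O2
  diff /tmp/O1 /tmp/O2

although that listing is incomplete, since some optimizations are tied to passes rather than to individual -f options, which is part of what the suggestion above would address.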
how to make sure an init routine is kept in the call graph?
Recently, we tried to merge the GCC trunk into the GUPC branch and ran into an issue caused by a recent GCC update. The last successful merge was trunk version 172359, fyi. For certain UPC file scope static initializers, a per-file initialization routine is created, and its address is added to a global table (in its own section). The UPC runtime will call all the routines listed in that table before transferring control to the user's main program. After the recent trial merge, we see the following on some of our test cases:

  ../../gcc/xupc -O2 -g -fdump-ipa-cgraph -fdump-tree-cfg test25.upc -o test25
  /tmp/ccygQ8JN.o:(upc_init_array+0x0): undefined reference to `__upc_init_decls'

The call graph dump entry for `__upc_init_decls' is as follows:

  __upc_init_decls/80(80) @0x71eb3de0 (asm: __upc_init_decls) body finalized
    called by:
    calls:
    References:
    Refering this function:

As expected, no explicit references have been recorded. The compiler routine that creates this initialization routine is called from c_common_parse_file():

  push_file_scope ();
  c_parse_file ();
  /* Generate UPC global initialization code, if required.  */
  if (c_dialect_upc ())
    upc_write_global_declarations ();
  pop_file_scope ();

The routine that builds the initialization function is upc_build_init_func() in gcc/upc/upc-act.c (on the gupc branch). This routine does the following to build the function, mark it as used and referenced, and then add its address to the initialization table:

  DECL_SOURCE_LOCATION (current_function_decl) = loc;
  TREE_PUBLIC (current_function_decl) = 0;
  TREE_USED (current_function_decl) = 1;
  DECL_SECTION_NAME (current_function_decl) =
    build_string (strlen (UPC_INIT_SECTION_NAME), UPC_INIT_SECTION_NAME);
  /* Swap the statement list that we've built up, for the current statement list.  */
  t_list = c_begin_compound_stmt (true);
  TREE_CHAIN (stmt_list) = TREE_CHAIN (t_list);
  cur_stmt_list = stmt_list;
  free_stmt_list (t_list);
  t_list = c_end_compound_stmt (loc, stmt_list, true);
  add_stmt (t_list);
  finish_function ();
  gcc_assert (DECL_RTL (init_func));
  upc_init_array_section = get_section (UPC_INIT_ARRAY_SECTION_NAME, 0, NULL);
  mark_decl_referenced (init_func);
  init_func_symbol = XEXP (DECL_RTL (init_func), 0);
  assemble_addr_to_section (init_func_symbol, upc_init_array_section);

In the past, setting TREE_USED() and calling mark_decl_referenced() was sufficient to make sure that this routine was not removed from the call graph. What is needed in the new scheme of things to ensure that this initialization function stays in the call graph? thanks, - Gary
Re: how to make sure an init routine is kept in the call graph?
On 04/22/11 11:14:11, Richard Guenther wrote: > GF: What is needed in the new scheme of things to ensure that this > GF: initialization function stays in the call graph? > > Try setting DECL_PRESERVE_P to 1. Richard, thanks. That worked. - Gary
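P.S. For the archives, the change amounts to roughly one line in upc_build_init_func(), next to the existing TREE_USED()/mark_decl_referenced() calls shown in my earlier message:

  /* Keep the per-file initialization routine in the call graph,
     even though nothing in the IL refers to it.  */
  DECL_PRESERVE_P (current_function_decl) = 1;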
Re: RFC: [GUPC] UPC-related changes
This email is a follow-up to an email with a similar title (posted a year ago). During that time period, we have worked on making the changes suggested by Joseph Myers, Tom Tromey, and other reviewers. We have also implemented various bug fixes and improvements. Our goal with this RFC is to acquaint the reviewers with UPC and the impact of the UPC changes on the GCC front end, and to gain consensus that the changes are acceptable for incorporation into the GCC trunk. Once we make further suggested changes, and have a consensus on this batch of changes, I will send out RFC's for the "middle end" (the lowering pass), "debugging" (UPC-specific DWARF extensions), "runtime" (libupc), and "testing". Those additional RFC's are likely to be more modular and will have less impact on the GCC infrastructure. The email describing the UPC-related front-end and infrastructure changes was posted to the gcc-patches mailing list: http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html Thanks, - Gary
Re: GCC 4.7.0 Status Report (2011-09-09)
On 09/09/11 09:09:30, Jakub Jelinek wrote: > [...] What is the status of lra, reload-2a, pph, > cilkplus, gupc (I assume at least some of these are 4.8+ material)? For GUPC, we are targeting GCC 4.8. thanks, - Gary
Re: Profiling gcc itself
Two more suggestions (off-topic to the profiling point, but on topic to the idea of speeding up builds involving invocations of GCC): ccache: http://ccache.samba.org/ "ccache is a compiler cache. It speeds up recompilation by caching previous compilations and detecting when the same compilation is being done again. Supported languages are C, C++, Objective-C and Objective-C++." distcc: http://code.google.com/p/distcc/ "distcc is a program to distribute builds of C, C++, Objective C or Objective C++ code across several machines on a network. distcc should always generate the same results as a local build, is simple to install and use, and is usually much faster than a local compile."
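For example, the typical hookup looks like this (the host names and job counts are only illustrative):

  # ccache: interpose the cache when configuring the tree to be built
  ./configure CC='ccache gcc' CXX='ccache g++' ...
  make

  # distcc: list the helper machines, then raise the job count
  export DISTCC_HOSTS='localhost buildhost1 buildhost2'
  ./configure CC='distcc gcc' CXX='distcc g++' ...
  make -j12

Note that a GCC bootstrap rebuilds the compiler with the just-built compiler at each stage, so these mainly help with repeated configure/build cycles and with the portions of the build done by the host compiler.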
Re: GCC 4.7.0 Status Report (2011-12-06)
On 12/06/11 01:18:28, Joseph S. Myers wrote: > [...] It still seems reasonable to aim for > entering Stage 4 (regression fixes and documentation changes only) in > early January and the 4.7.0 release in March or April. At what point in time would the GCC 4.7 branch be created, and the trunk would then be open for new contributions (not planned for the 4.7 release)? Is that also early Jan.? Thanks, - Gary
RFC: cgraph/lowering vs. finish_file for GCC/UPC rewrites?
Recently, we have been working on upgrading GCC/UPC (see http://gccupc.org) to the GCC trunk. Previously, we sync'ed with the latest stable release, but now we want to stay more current. When built with GCC versions 4.0 through 4.3, we used the gimplify language hook, LANG_HOOKS_GIMPLIFY_EXPR, to rewrite trees that refer to UPC constructs and UPC shared variable references, converting them into non-UPC, gimplified tree structures. This worked well, though we did need to extend the language hook to include a gimplify test predicate and fallback so that we can rewrite modify_expr's involving UPC shared variables as the target:

  int upc_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
                         bool (* gimple_test_f) (tree), int fallback)

Working with the latest GCC 4.5 snapshot, we have run into a problem that leads me to believe that the current approach will no longer work with the 4.5/trunk version of GCC. In prior GCC versions, the gimplify pass was called before the call graph pass. This meant that we could safely employ the gimplify language hook to perform the rewrites, which may emit inlined runtime calls. An example UPC-related rewrite is to transform UPC shared variable references into runtime calls. This program:

  shared int x;
  shared int y;
  int main() { x = y; }

might be translated into something like:

  int main()
  {
    int y_tmp = upc_get_int(upc_shared_addr(&y));
    upc_put_int(upc_shared_addr(&x), &y_tmp);
  }

The definitions of the runtime functions upc_put_int() and upc_get_int() are found in a pre-included header file (the UPC driver adds a -include switch on the command line). Depending upon optimization level and compile-time switches, calls to the UPC runtime functions can be implemented as either inlined function calls or conventional calls to pre-compiled library routines. At optimization levels above -O0, most of the UPC runtime is inlined by default.

With the new/current organization of the compilation/call graph passes, we end up with the surprising result that the inlined runtime function definitions "disappear" before UPC's gimplify pass can refer to them. That's because the call graph pass noticed that the inline runtime functions were declared, but not referenced (yet). The gimplify pass is then run against the remaining function bodies, but the UPC runtime functions are no longer available.

One workaround for this issue might be to mark the runtime functions, in a fashion similar to ctors/dtors, so that the call graph pass won't eliminate them. I'm unsure if that will get the inlining aspects of those routines right, and it might retain unused function definitions in the form of compiled non-inlined code.

GOMP appears to use a "lowering" pass that runs after the call graph and gimplify passes. It calls runtime routines via builtin function definitions, ensuring that the function definitions won't go away. However, it looks to me as if GOMP does not inline those runtime functions?

OBJC implements some post-processing in the finish_file() hook routine, which in turn calls objc_finish_file(). That may be a reasonable place to relocate UPC's tree rewrites, but that leads to a few questions: Can gimplify_expr() be safely called on the same tree more than once? The question comes up because the simplest thing is to retain the current infrastructure where UPC rewrites occur in the gimplify language hook.
The second gimplify pass will redo some work, calling out to the UPC language hook again, but since all UPC constructs have been rewritten and gimplified, there will be no additional work done besides the traversal. How about an alternative approach that implements a custom tree-walk inside finish_file() (similar in structure to the one implemented in omp-low.c)? Is such a rewrite routine allowed to selectively gimplify parts of the tree and/or to create temp variables managed by the code in gimplify.c? Is the description above of the interactions between the cgraph, gimplify, and lowering passes correct? What approach would you recommend for the implementation of UPC tree rewrites that will support calls to the runtime (that are inlined, if applicable)? thanks, - Gary
Re: RFC: cgraph/lowering vs. finish_file for GCC/UPC rewrites?
On 09/14/09 11:52:11, Richard Guenther wrote: > Without reading all the details of your mail I suggest that you > perform a custom walk over the function bodies right before > the frontend calls cgraph_finalize_compilation_unit () that > performs the necessary lowering (and function creation) to > GENERIC. The C++ frontend already does this during its > genericize phase to transform frontend specific trees to > middle-end GENERIC trees. Richard, thanks. Will take a look at how C++ handles things. - Gary
reghunt and "trunk" (GCC 4.5.x)?
Hello, I'm trying to set up 'reghunt' to track down a change in behavior from 2009-03-27 (4.4.3) to present. This is my first time setting up 'reghunt' - it is quite possible that I still haven't got things set up properly. I think that I've got the SVN bits, and most of the config settings as they should be, but when I try to run my test, it fails trying to build 'cc1':

  /bin/sh gcc-reg-hunt/reghunt/src/gcc/../move-if-change tmp-options.h options.h
  echo timestamp > s-options-h
  TARGET_CPU_DEFAULT="" \
  HEADERS="auto-host.h ansidecl.h" DEFINES="" \
  /bin/sh gcc-reg-hunt/reghunt/src/gcc/mkconfig.sh bconfig.h
  x86_64-redhat-linux-gcc -c -g -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -Igcc-reg-hunt/reghunt/src/gcc -Igcc-reg-hunt/reghunt/src/gcc/build -Igcc-reg-hunt/reghunt/src/gcc/../include -Igcc-reg-hunt/reghunt/src/gcc/../libcpp/include -Igcc-reg-hunt/reghunt/src/gcc/../libdecnumber -Igcc-reg-hunt/reghunt/src/gcc/../libdecnumber/bid -I../libdecnumber \
  -o build/errors.o gcc-reg-hunt/reghunt/src/gcc/errors.c
  as: line 83: exec: : not found

(above, lines split for readability)

Above, 'as' is a script, and at line 83 it is trying to invoke the assembler, which indirectly will try to invoke ORIGINAL_AS_FOR_TARGET, but that variable is empty: ORIGINAL_AS_FOR_TARGET=""

I notice that the build script, 'reghunt/bin/gcc-build-simple', does some explicit configure/make steps:

  #msg "configure"
  ${REG_GCCSRC}/configure \
    --prefix=$REG_PREFIX \
    --enable-languages=$REG_LANGS \
    $REG_CONFOPTS \
    > configure.log 2>&1 || abort " configure failed"
  #msg "make libraries"
  make all-build-libiberty > ${LOGDIR}/make.all-build-libiberty.log 2>&1 || true
  make all-libcpp > ${LOGDIR}/make.all-libcpp.log 2>&1 || true
  make all-libdecnumber > ${LOGDIR}/make.all-libdecnumber.log 2>&1 || true
  make all-intl > ${LOGDIR}/make.all-intl.log 2>&1 || true
  make all-libbanshee > ${LOGDIR}/make.all-libbanshee.log 2>&1 || true
  make configure-gcc > ${LOGDIR}/make.configure-gcc.log 2>&1 || true

and then:

  cd gcc
  # REG_COMPILER is cc1, cc1plus, or f951
  #msg "make $REG_COMPILER"
  make $REG_MAKE_J $REG_COMPILER > ${LOGDIR}/make.${REG_COMPILER}.log 2>&1 \
    || abort " make failed"
  msg "build completed"

which is where we're failing. I know that in the past, I've had trouble building 'gcc' by first explicitly running a make on its configure-gcc target, because it seems that some other precursors might've been left out - and this area of configuration/build may have experienced some subtle changes over the past year/two. I'm guessing that I need to chase a config/setup problem of some sort, but my top-level question is: Has anyone used 'reghunt' to find regressions in the current GCC "trunk" dating back a year/so (in this case, using a "simple" build)? I'd welcome any help/suggestions on setting up 'reghunt'. thanks.
Re: reghunt and "trunk" (GCC 4.5.x)?
On 01/06/10 12:54:21, Ian Lance Taylor wrote: > I think you need to make sure that the script removes any existing > config.cache files. Ian, thanks. This turned out to be a cockpit error on my part. The reghunt tools apparently expect the checked out gcc source tree to have the form /gcc; thus the sub-tree containing the GCC compiler is named /gcc/gcc. I had left off the extra level of 'gcc', tried to patch around it in the reghunt tools, but didn't catch all the refs. The net effect is the build script tried to config/make gcc directly rather than config-ing/making from the top-level. After fixing that set up error, the reghunt tools are working just fine, and I was able to find the patch that I was looking for. - Gary
Re: dwarf2 - multiple DW_TAG_variable for global variable
On 01/09/10 12:39:55, Nenad Vukicevic wrote: > This dwarf code started appearing since this patch: Here's the GCC bug report that led to this patch: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39563 It references a GDB fix as well: http://sourceware.org/ml/gdb-patches/2009-03/threads.html#00595 http://sourceware.org/ml/gdb-patches/2009-04/threads.html#00040 http://sourceware.org/ml/gdb-cvs/2009-04/msg00021.html
multiple defs. of TLS common symbols?
We use TLS relocated symbols to create thread-local symbols in the GCC UPC compiler, and have run into an issue illustrated by the following program, on a test case that defines a common symbol in several files and uses it in a single file. The following program fails to link, with multiple defs:

  % head s.c t.c main.c
  ==> s.c <==
  __thread int x;

  ==> t.c <==
  __thread int x;

  ==> main.c <==
  __thread int x;
  int main() { x = 1; }

  % gcc s.c t.c main.c
  /tmp/ccK5Aj3k.o:(.tbss+0x0): multiple definition of `x'
  /tmp/ccm0kY5f.o:(.tbss+0x0): first defined here
  /tmp/ccchPiAt.o:(.tbss+0x0): multiple definition of `x'
  /tmp/ccm0kY5f.o:(.tbss+0x0): first defined here
  collect2: ld returned 1 exit status

But if we don't use TLS storage, it all links just fine:

  % gcc -D__thread= s.c t.c main.c

Off-hand this looks like it might be a linker issue, but perhaps there's an issue with the use of __thread in the context above?
Re: multiple defs. of TLS common symbols?
On 01/13/10 17:15:10, Ian Lance Taylor wrote:
[...]
> Otherwise TLS variables are generated as definitions rather than as
> common variables.
>
> Why do you want them to be common?

For GCC/UPC compiled programs there are two compilation modes:

1) Each UPC thread is implemented as a full process, and these processes might be distributed across a network.

2) Each UPC thread is implemented as an OS thread (i.e., a pthread), and they are created by a single process and execute within its address space.

In the "process model", "int x;" has the usual semantics. It is defined as a common symbol. In the "pthread model", each file scoped variable is "localized" and becomes thread local; this is implemented by defining the variable using TLS relocation. Intermixing previously compiled C code that refers to file scoped variables with GCC/UPC compiled "pthread mode" files will likely not work well. But if the C code is compiled with the GCC/UPC compiler in "pthread mode", all file scoped symbols will be localized and everything should work as expected.

The "process model" is the more natural and preferred way to compile UPC programs. The pthread model can offer some efficiencies and can make it easier to debug the program. Given the above, the goal of compiling in pthreads mode is to be able to compile regular "C" code as is, with the same behavior as when it was compiled in the normal process model. Thus, we want to translate all file scoped variables into localized TLS variables with the fewest surprises and differences.

> Personally I tend to think that that is a good
> thing. Treating uninitialized variables as common variables is a
> non-standard extension even for C90. We can't get rid of them for
> existing code, but __thread code is by definition new.

I agree with your statement above, but for our purposes things will work better if we do create commonized TLS symbols. Maybe we can use GOMP's method for creating commonized TLS variables. Thanks for pointing it out. Do you/others on this list have a reference that supports the statement: "Treating uninitialized variables as common variables is a non-standard extension even for C90."? (I did see a thread on this list, late April 1999, that discussed some of the issues, but nothing definitive.) thanks.
Re: RFC: cgraph/lowering vs. finish_file for GCC/UPC rewrites?
On 09/14/09 11:52:11, Richard Guenther wrote: > > What approach would you recommend for the > > implementation of UPC tree re-writes that will > > support calls to the runtime (that are inlined, > > if applicable)? > > Without reading all the details of your mail I suggest that you > perform a custom walk over the function bodies right before > the frontend calls cgraph_finalize_compilation_unit () that > performs the necessary lowering (and function creation) to > GENERIC. The C++ frontend already does this during its > genericize phase to transform frontend specific trees to > middle-end GENERIC trees. I tried the custom tree walk approach, but decided that it will require some of the infrastructure already present in the gimplify pass (e. g., the creation of temp. variables), and that it is more expedient to utilize the current language dependent gimplify hook, but to move it earlier in the processing of the function body. To that end, I defined a language dependent genericize hook: /* Determine if a tree is a function parameter pack. */ bool (*function_parameter_pack_p) (const_tree); + /* Genericize before finalization (called from finish_function()). + Perform lowering of function bodies from language dependent form + to language independent (GENERIC) form. */ + void (*genericize) (tree); + which is called from finish_function (instead of calling c_genericize): if (!decl_function_context (fndecl)) { invoke_plugin_callbacks (PLUGIN_PRE_GENERICIZE, fndecl); - c_genericize (fndecl); + /* Lower to GENERIC form before finalization. */ + lang_hooks.genericize (fndecl); The UPC genericize hook is implemented as: /* Convert the tree representation of FNDECL from UPC frontend trees to GENERIC. */ void upc_genericize (tree fndecl) { /* Take care of C-specific actions first. Normally, we'd do this after the language-specific actions, but c_genericize is only a dumping pass now, and should be renamed. */ c_genericize (fndecl); /* Perform a full gimplify pass, because the UPC lowering rewrites are implemented using the gimplify framework. */ gimplify_function_tree (fndecl); } Although this may not be the best fit with the current framework, it lets us re-use the gimplify pass that we have been using with previous GCC 4.x implementations. At some point, we'll need to develop a ground-up tree-walk rewrite pass.
How to mark gimple values addressable?
(I'm copying this thread back to the main GCC list, to document the problem that we ran into, RG's suggestion, and the fix that we made.)

While merging our GCC/UPC implementation with the GCC trunk, we ran into a situation where some tests failed on the check shown below in verify_gimple_assign_single(). This failed because our representation of a UPC pointer-to-shared has an internal struct representation but in other respects is a pointer type (and appears to be a register type). Some temps that UPC creates have to be marked as addressable, which causes them to no longer qualify as is_gimple_reg(), but the type still asserts is_gimple_reg_type(). The trees that were being created failed on this test:

  if (!is_gimple_reg (lhs)
      && is_gimple_reg_type (TREE_TYPE (lhs)))
    {
      error ("invalid rhs for gimple memory store");
      debug_generic_stmt (lhs);
      debug_generic_stmt (rhs1);
      return true;
    }

At first, I wondered if the checks above might be overly inclusive.

On 01/11/10 11:03:46, Richard Guenther wrote:
> You need a temporary for register type but non-register copy. Thus
> it needs to be
>
> tmp_2 = A;
> B = tmp_2;
>
> with tmp_2 being an SSA name, not
>
> B = A;

Looking at some of the code in gimplify.c, we determined that calling prepare_gimple_addressable() is all that is needed:

      if (!is_gimple_addressable (src) || is_gimple_non_addressable (src))
        {
          /* We can't address the object - we have to copy to a local (non-shared) temporary.  */
  -       src = get_initialized_tmp_var (src, pre_p, NULL);
  +       prepare_gimple_addressable (&src, pre_p);
          mark_addressable (src);
          is_shared_copy = 0;
          is_src_shared = 0;
        }
    }

To make this work, prepare_gimple_addressable() needed to be changed so that it is exported from gimplify.c:

  -static void
  +void
   prepare_gimple_addressable (tree *expr_p, gimple_seq *seq_p)
   {
     while (handled_component_p (*expr_p))
       expr_p = &TREE_OPERAND (*expr_p, 0);
     if (is_gimple_reg (*expr_p))
       *expr_p = get_initialized_tmp_var (*expr_p, seq_p, NULL);
   }

With this fix in place, we were able to pass the various checks in tree-cfg.c, and to generate the expected code.
Re: multiple defs. of TLS common symbols?
On 01/14/10 08:26:31, Ian Lance Taylor wrote: > Online I found this: > > http://www.faqs.org/docs/artu/c_evolution.html > > [T]he ANSI Draft Standard finally settled on definition-reference > rules in 1988. Common-block public storage is still admitted as > an acceptable variation by the standard. Thanks, I found some discussion in the C99 Rationale document, http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf section 6.2.2, "Linkage of Identifiers" (pp. 32-34). The email thread on this mailing list that I was referring to is here: http://gcc.gnu.org/ml/gcc/2009-04/msg00812.html
GCC and binutils dependencies
We recently ran into this 'as' bug running tests with the GCC (4.5 precursor) "trunk" compiler on an x86_64 target running Ubuntu 8.04: http://sourceware.org/bugzilla/show_bug.cgi?id=10255 (the bug was marked fixed in June 2009). The issue was noted in this GCC PR: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40332 Since GCC 4.5 isn't out yet, I'm wondering what policy, or general rule, is followed with respect to a new version of GCC depending upon a particular version of binutils, or some other important library? And would it make sense in this case to add a GCC test case that exercises this 'as' bug, so that we can detect either that the bug is present in the version of 'as' being used to build and test GCC, or that a regression has occurred? thanks.
GUPC: A GCC frontend for UPC
A GCC front-end (and runtime) for UPC (Unified Parallel C) is available via the following GCC branch: svn://svn/gcc/branches/gupc. The GUPC project is described here: http://gcc.gnu.org/projects/gupc.html. Over the course of this year, we plan to work with the GCC development community with the goal to merge UPC support into the GCC mainline (perhaps in the GCC 4.6 release). We appreciate any/all feedback and suggestions. Thanks, - The GUPC development team
RFC: merging GUPC into the GCC trunk?
Now that GCC 4.5 has been branched from the main line, it seems that this is an appropriate time to consider GUPC for inclusion into the GCC trunk. GUPC was recently checked in as a GCC branch: http://gcc.gnu.org/projects/gupc.html What is the recommended process for having GUPC reviewed (and hopefully, subsequently approved) for being merged into the GCC mainline? Thanks, - Gary
GCC primary/secondary platforms?
On 04/07/10 11:11:05, Diego Novillo wrote: > I would suggest splitting patches across reviewer domains. See > previous merges from big branches for examples. This makes it easier > for maintainers and reviewers to review the relevant parts. > Additionally, make sure that the branch bootstraps and tests on all > primary/secondary platforms with all languages enabled. Diego, thanks for your prompt reply and suggestions. Regarding the primary/secondary platforms: are those the ones listed here? http://gcc.gnu.org/gcc-4.5/criteria.html We have access to only a few of the listed platforms (and in the case of IA64 the underlying OS is SuSE, not "unknown-linux-gnu"). How are the following targets handled? arm-eabi, mipsisa64-elf Are these cross-compilers targeting some sort of instruction set simulator? Is there a "how to" for setting up those platforms and running tests? Given the nature of UPC we're not sure that some of those targets are applicable or will be initially supported; though I can certainly see the value of making sure that we don't break anything in the main line that would impact those platforms. Typically, how is this situation handled - where tests need to be run on hardware/software platforms that we don't have access to, prior to merging into the GCC trunk? thanks, - Gary
Re: GCC primary/secondary platforms?
Although the discussion regarding libstdc++-v3 is likely germane to various developers who are currently testing their changes and managing the ports that they're responsible for, it seems that this thread is venturing rather far from my initial query. I'm still wondering: do GCC developers routinely test their patches on MIPS, ARM, and S390 platforms (for example)? I signed up for the 'cfarm' and don't see an S390 there, and some of the secondary targets look like they might be really SLOW? thanks, - Gary
CSE bug when narrowing constants
(Configuration: x86_64, GCC 4.2.3 base line) I've run into a problem where GCSE decides to kill a conditional jump instruction because it thinks that the result is always false. This happens when GCSE decides to propagate a constant that is "narrowed" [the original mode of the constant is word_mode (DImode) and the use of the constant is in a narrower mode (SImode)]. This situation arises inside the code generated by our GCC/UPC compiler, and so far I haven't been able to come up with a regular C test case that demonstrates the failure. For efficiency reasons, internal to the compiler, we overlay a 16 byte struct on top of a TImode value. The 16 byte struct is the representation of UPC's "pointer-to-shared", which is a potentially cross-node pointer consisting of three parts (vaddr, thread, phase). It looks like this: typedef struct { void *vaddr; unsigned int thread; unsigned int phase; } __attribute__ ((__aligned__(16))) upc_shared_ptr_t; Although not allowed by GCC, you can think of it has having an additional "__attribute__ ((__mode__(__TI__)))" specification. Here is an excerpt from the offending RTL that when passed to GCSE will lead to incorrect deletion of a conditional jump: [...] (insn 19 16 21 2 (set (reg:DI 81) (const_int 4294967296 [0x1])) 81 {*movdi_1_rex64} (nil) (nil)) (insn 21 19 24 2 (set (subreg:DI (reg:TI 70 [ D.2967 ]) 8) (reg:DI 81)) 81 {*movdi_1_rex64} (nil) (nil)) (insn 24 21 25 2 (set (reg:SI 60 [ p$phase ]) (const_int 1 [0x1])) 40 {*movsi_1} (nil) (nil)) (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (const_int 4294967296 [0x1]) (nil))) [...] ;; Start of basic block 5, registers live: (nil) (code_label 53 52 54 5 2 "" [2 uses]) (note 54 53 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 56 54 57 5 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 61 [ p$thread ]) (const_int 0 [0x0]))) 3 {*cmpsi_ccno_1} (nil) (nil)) (jump_insn 57 56 59 5 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 63) (pc))) 531 {*jcc_1} (nil) (expr_list:REG_BR_PROB (const_int 7000 [0x1b58]) (nil))) [...] The conditional jump instruction formed by instructions 56 and 57 above is deleted because GCSE thinks that (reg:SI 61 [ p$thread ]) is non-zero. It comes to this conclusion when it propagates the REG_EQUAL (const_int 4294967296 [0x1]) value listed in instruction 25: (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (const_int 4294967296 [0x1]) (nil))) Note that it takes 33 bits to express 0x1, and it won't fit into an SImode container. What CSE/GCSE should have done here is written that REG_EQUAL note as follows: (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (const_int 0) (nil))) because only the lower 32 bits of the value are relevant. In that case, the conditional jump can be rewritten into an unconditional jump, but certainly not deleted. The code that decides it is OK to use the wider constant, without adjustment to the narrow mode is here: /* If we are looking for a CONST_INT, the mode doesn't really matter, as long as we are narrowing. So if we looked in vain for a mode narrower than word_mode before, look for word_mode now. 
*/ if (p == 0 && code == CONST_INT && GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (word_mode)) { x = copy_rtx (x); PUT_MODE (x, word_mode); p = lookup (x, SAFE_HASH (x, VOIDmode), word_mode); } The logic above is OK as far as it goes, but the subsequent return of the unadjusted wider constant causes problems: for (p = p->first_same_value; p; p = p->next_same_value) if (GET_CODE (p->exp) == code /* Make sure this is a valid entry in the table. */ && exp_equiv_p (p->exp, p->exp, 1, false)) return p->exp; I'd think that somewhere in there gen_lowpart() needs to be called. I'd appreciate your review of the above analysis and any suggestions that you might have on implementing a fix.
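To sketch where I think a fix might go (this is only a sketch, not a tested patch; orig_mode stands for the mode that X had before it was widened to word_mode for the lookup), the returned CONST_INT could be narrowed back to the original mode, so that any REG_EQUAL note derived from it stays representable in that mode:

  for (p = p->first_same_value; p; p = p->next_same_value)
    if (GET_CODE (p->exp) == code
        /* Make sure this is a valid entry in the table.  */
        && exp_equiv_p (p->exp, p->exp, 1, false))
      {
        if (code == CONST_INT && orig_mode != word_mode)
          /* Keep only the bits that fit in the narrower mode.  */
          return gen_lowpart (orig_mode, p->exp);
        return p->exp;
      }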
Re: CSE bug when narrowing constants
On 11/28/08 16:02:11, Gary Funck wrote: > > I'd think that somewhere in there gen_lowpart() needs to > be called. I posted a suggested patch: http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01466.html which fixes the reported problem.
Re: CSE bug when narrowing constants
On 11/29/08 10:37:33, Eric Botcazou wrote: > > The conditional jump instruction formed by instructions > > 56 and 57 above is deleted because GCSE thinks that > > (reg:SI 61 [ p$thread ]) is non-zero. It comes to this > > conclusion when it propagates the > >REG_EQUAL (const_int 4294967296 [0x1]) > > value listed in instruction 25: > > > > (insn 25 24 26 2 (set (reg:SI 61 [ p$thread ]) > > (subreg:SI (reg:TI 70 [ D.2967 ]) 8)) 40 {*movsi_1} (nil) > > (expr_list:REG_EQUAL (const_int 4294967296 [0x1]) > > (nil))) > > > > Note that it takes 33 bits to express 0x1, and it won't > > fit into an SImode container. > > Then this note is invalid, REG_EQUAL pertains to the destination register: [...] Eric, thanks for the clarification on the role of REG_EQUAL notes. > IOW the culprit is not GCSE but whoever has created this note. Agreed. The routine that creates the errant REG_EQUAL note is lookup_as_function(). I posted a possible patch: http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01466.html (My FSF assignment is on file.) - Gary
Re: CSE bug when narrowing constants
On 11/29/08 14:45:49, Eric Botcazou wrote: > > Agreed. The routine that creates the errant REG_EQUAL note is > > lookup_as_function(). > > Really? Doesn't it only retrieve a pre-existing REG_EQUAL note? It retrieves an equivalent rtx constant, if it exists. Before the patch, the constant that was returned is a word mode (DImode) constant with the value 0x1 (33 bits), which won't fit into an SImode value, and therefore isn't equivalent. The fix is to call gen_lowpart() in the case where the word mode constant is narrowed to a smaller mode. In the example, the lower 32 bits of the constant will be used, which is 0, and is the correct equivalent constant. cse_insn() calls lookup_as_function() ultimately through fold_rtx(), IIRC, and is the routine that writes the REG_EQUAL note.
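To spell out the narrowing arithmetic in plain C (just for illustration):

  long long wide = 0x100000000LL;  /* 2^32: the DImode constant, needs 33 bits */
  int narrow = (int) wide;         /* keeps only the low 32 bits, so the value is 0 */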
Re: CSE bug when narrowing constants
On 12/01/08 11:50:48, Eric Botcazou wrote: > > cse_insn() calls lookup_as_function() ultimately through fold_rtx(), IIRC, > > and is the routine that writes the REG_EQUAL note. > > OK, thanks. But I'm a little at a loss as to why this problem arises only > now: the problematic code in lookup_as_function is one decade old. Do you > happen to have older compilers around (say GCC 4.1.x based) that correctly > compile the testcase? If so, what happens differently with them? Yeah, I was surprised as well. The compiler baseline this problem arose on is 4.2.3, but I think that it will occur in both older and newer baselines. The problem is triggered by code generated by the UPC (Unified Parallel C) support that we've implemented in a project we call GCC/UPC. It fails on a small UPC test case, but a number of factors have to be present to trigger the problem. I tried developing a vanilla C test case to duplicate the problem, but have so far been unsuccessful. Internally, we use VIEW_CONVERT_EXPR to overlay a TImode container on top of a struct. There is no exact C equivalent, though a union comes close. I tried that, but couldn't replicate the exact set of events that have to be present to hit the problem. I'll send what I tried to you separately. Perhaps adding some sort of logging in lookup_as_function() that indicates narrowing is occurring, and then running all test cases (including Ada, because its unchecked_conversion is close to what we're doing internally) would turn something up?
REC: gimplify - create a temp that is set at outermost block?
For UPC code generation, we're building an alternate method of accessing thread-local data that does not depend upon operating system support of the __thread qualifier. The motivation for this change is that we've noticed that __thread has varying levels of support across operating system/hardware platforms, and that when used extensively, we've seen capacity limitations on some target systems. UPC programs, when compiled in "pthreads mode" implicitly define all normal, file scoped or static, variables as being thread-local, which can lead to many TLS variables or to a TLS section that is quite large. The alternate implementation of TLS begins by targeting all TLS variables to a special named section. As an example, the declaration, __thread int x; can be thought of as being re-written into: int x __attribute__ ((section("tls_section"))); The runtime will allocate a per-thread block of memory that is the size of "tls_section", and initialized by the contents of that dummy section. This per-thread TLS base address will be maintained in an OS-dependent fashion as a per-thread value that will be returned by a function, called __get_tls(), which will obtain the per-thread value (possibly calling a function an OS-supplied function, for example, pthread_getspecific()). All references to 'x' will be rewritten by the UPC-specific gimplify pass into: *((&x - __tls_section_start) + __get_tls()) Above, "&x" is the address of 'x' derived in the conventional fashion as its address inside the TLS dummy section, which starts at the address given by "__tls_section_start". The gimplify code that currently implements this calculation looks like this: tls_base = lookup_name (get_identifier (UPC_TLS_BEGIN_NAME_STR)); if (!tls_base) fatal_error ("UPC thread-local section start address not found. " "Cannot find a definition for " UPC_TLS_BEGIN_NAME_STR); tls_base = build1 (ADDR_EXPR, char_ptr_type, tls_base); /* Refer to a shadow variable so that we don't try to re-gimplify * this TLS variable reference. */ var_addr = shadow_var_addr (var_decl); tls_offset = build_binary_op (MINUS_EXPR, convert (ptrdiff_type_node, var_addr), convert (ptrdiff_type_node, tls_base), 0); if (!useless_type_conversion_p (sizetype, TREE_TYPE (tls_offset))) tls_offset = convert (sizetype, tls_offset); tls_var_addr = build2 (POINTER_PLUS_EXPR, char_ptr_type, cfun->upc_thread_ctx_tmp, tls_offset); tls_ref = build_fold_indirect_ref (tls_var_addr); *expr_p = tls_ref; return GS_OK; (If you see any opportunities to improve/correct this code, please feel free to comment.) Above, you'll see a reference to "cfun->upc_thread_ctx_tmp"; this is a temporary variable that holds the value returned from __get_tls(). The idea is to call __get_tls() only once upon entry to the current function being compiled, and to re-use its value where needed. I made a first attempt at implementing this caching of the __get_tls() value, but have so far been unsuccessful. 
Here's the current implementation: if (!cfun->upc_thread_ctx_tmp) { const char *libfunc_name = UPC_GET_TLS_LIBCALL; tree libfunc, lib_call, tmp; libfunc = lookup_name (get_identifier (libfunc_name)); if (!libfunc) internal_error ("runtime function %s not found", libfunc_name); lib_call = build_function_call (libfunc, NULL_TREE); if (!lang_hooks.types_compatible_p (char_ptr_type, TREE_TYPE (lib_call))) lib_call = build1 (NOP_EXPR, char_ptr_type, lib_call); tmp = create_tmp_var_raw (char_ptr_type, "TLS"); TREE_READONLY (tmp) = 1; DECL_INITIAL (tmp) = lib_call; /* Record the TLS base address at the outermost level of * this function. */ DECL_CONTEXT (tmp) = current_function_decl; DECL_SEEN_IN_BIND_EXPR_P (tmp) = 1; declare_vars (tmp, DECL_SAVED_TREE (current_function_decl), false); cfun->upc_thread_ctx_tmp = tmp; } (The code from "TREE_READONLY" to "DECL_SEEN_IN_BIND_EXPR_P" above is cribbed from "gimple_add_tmp_var()" and "gimplify_init_constructor()".) The idea above is to initialize a temporary variable at the outer scope of the current function. Presumably, setting the initial value to the value returned by calling __get_tls(), and then calling "declare_vars()" to declare the temp. variable at the outermost scope of the function will do the job, but this code isn't having the intended effect. My sense is that the DECL_INITIAL() value above is being ignored, that code isn't being generated for it, and that it possibly won't be properly rescanned for gimplification. I'd appreciate any observations that you might have on why the implementation above doesn't work, and how to re-implement this section of code so that it has the desired effect. Perhaps there is code in GCC that currently does something like this that I can refer to.
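For readers following along, here is a rough source-level model of the reference rewrite described above. It is only an illustration: the names __tls_section_start, __get_tls(), and "tls_section" are the ones assumed in the description; the helper x_addr() is hypothetical and is not what the gimplify pass actually emits.

  #include <stddef.h>

  extern char __tls_section_start[];  /* start of the dummy TLS template section */
  extern void *__get_tls (void);      /* returns this thread's TLS block base address */

  int x __attribute__ ((section ("tls_section")));  /* stands in for: __thread int x; */

  static inline int *
  x_addr (void)
  {
    /* Offset of 'x' within the TLS template section...  */
    ptrdiff_t offset = (char *) &x - __tls_section_start;
    /* ...rebased onto the calling thread's TLS block.  */
    return (int *) ((char *) __get_tls () + offset);
  }

  /* A reference to 'x' is then rewritten as *x_addr ().  */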
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/19/09 11:29:57, Andrew Pinski wrote: > On Tue, May 19, 2009 at 11:25 AM, Gary Funck wrote: > > > > For UPC code generation, we're building an alternate > > method of accessing thread-local data that does not depend upon > > operating system support of the __thread qualifier. > > GCC has already added generic support for the __thread qualifier which > does not depend on the OS needing builtin support at all. Andrew, thanks. The only implementation that I'm aware of is described in Ulrich Drepper's 2005 paper, http://people.redhat.com/drepper/tls.pdf Is the __thread feature now more universally/portably supported? My impression is that this feature requires GNU/ELF linker and glibc support. Is that correct? We have been using builtin __thread support for quite a while. It generally has worked well on most modern Linux platforms, but we have encountered a few issues/glitches: * On SuSE 10/altix, we have seen overflows of the thread-local linker section, when compiling programs that declare many large TLS variables. * On CentOS 5/x86, we have seen programs that sometimes fail at 'exec' time, possibly because it can't muster the resources needed to start the program, or allocate the memory map. Those failures have been intermittent with no suspicious entries in the system logs. * On the older SGI/Irix systems, there has been no __thread support at all from what I can recall. Those limitations have motivated our need to provide a more portable implementation of TLS variables. thanks, - Gary
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/19/09 12:10:43, Andrew Pinski wrote: > Gary wrote: > > Is the __thread feature now more universally/portably > > supported? > > Yes, see emutls.c and the VAR_DECL case in expand_expr_addr_expr_1 and > expand_expr_real_1 in expr.c. > [...] for the emulated support which is > implemented on the target side in emutls.c. > > On the tree level __thread looks the same for both the emulated and > native supports. Experimenting with this __thread emulation a bit, I found that the following configure options appear to enable TLS emulation: --enable-threads=posix --disable-tls (where --enable-threads is likely unnecessary on most modern x86/Linux targets) Trying the following simple test program: __thread volatile int x; int main () { x = 1; return x; } The following code was generated: movl $__emutls_v.x, %edi call __emutls_get_address movl $1, (%rax) movl $__emutls_v.x, %edi call __emutls_get_address movl (%rax), %eax addq $8, %rsp ret Above, __emutls_get_address() is called twice, with the same argument. I was surprised to see that the optimizer (GCC 4.3.2) didn't notice this and use CSE to avoid the second redundant call, because __emutls_get_address is defined as a "const" function: DEF_EXT_LIB_BUILTIN (BUILT_IN_EMUTLS_GET_ADDRESS, "__emutls_get_address", BT_FN_PTR_PTR, ATTR_CONST_NOTHROW_NONNULL) Back to the issue at hand, it may turn out that GCC's TLS emulation (thanks for pointing this out) will have acceptable performance. I'm still interested in understanding how to create a gimple temporary that is set once upon entry to a function, so that its value is available within the function's body. thanks, - Gary
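As an aside (my own experiment sketch, not something verified in this thread): one source-level way to avoid the duplicated __emutls_get_address calls is to take the variable's address once and reuse it, which is roughly the effect I'd like the gimplifier to achieve automatically with a per-function temporary:

  __thread volatile int x;

  int
  main (void)
  {
    /* '&x' should lower to a single __emutls_get_address call when TLS is
       emulated; both accesses below then go through the cached pointer.  */
    volatile int *xp = &x;
    *xp = 1;
    return *xp;
  }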
grokdeclarator drops type qualifiers when -aux-info isn't asserted?
Recently, I was debugging an issue in the GCC/UPC front-end that related to some problems compiling specific UPC type declarations. The front-end was, in certain cases, dropping UPC's "shared" qualifier. The relevant code is in grokdeclarator: if (!flag_gen_aux_info && (TYPE_QUALS (element_type))) type = TYPE_MAIN_VARIANT (type); Above, if the -aux-info switch isn't asserted then the type is set to its main variant. The -aux-info switch does the following: `-aux-info FILENAME' Output to the given filename prototyped declarations for all functions declared and/or defined in a translation unit, including those in header files. This option is silently ignored in any language other than C. [...] Given that this switch enables the generation of a report, it is surprising that this switch would cause the front-end to work differently depending upon whether -aux-info is asserted or not. That aside, I wonder if it is an error to drop the qualifiers as shown above? In the case of UPC, for example, dropping qualifiers definitely leads to problems; it may be the case that UPC's logic has to be reworked a bit if, in fact, the TYPE_MAIN_VARIANT() call above is needed. thanks, - Gary
Re: grokdeclarator drops type qualifiers when -aux-info isn't asserted?
On 05/20/09 09:45:11, Joseph S. Myers wrote: > On Tue, 19 May 2009, Gary Funck wrote: > > > That aside, I wonder if it is an error to drop the qualifiers > > as shown above? In the case of UPC, for example, dropping qualifiers > > Please read the code (and comment) immediately above that you quoted, > which saves the qualifiers combined with those specified in the > declaration, and the subsequent code applying them in the process of > building up the type. > [...] See the named address space patches for > examples of adding extra type qualifiers. Thanks. We've generally gotten that part right by adding a few qualifier bits. We can't however encode UPC's "layout qualifier" into the qualifier bits and we have to maintain it separately. I do see now that the layout qualifier on an element type should be handled earlier along with the rest of the qualifiers in the section that you're referencing. > The bug would probably be that it doesn't also drop > them if flag_gen_aux_info. Agreed. Though presumably the flag_gen_aux_info logic will have to be adjusted as well.
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/20/09 10:40:02, Richard Guenther wrote: > Gary wrote: > > Above, __emutls_get_address() is called twice, with > > the same argument. I was surprised to see that the optimizer > > (GCC 4.3.2) didn't notice this and use CSE to avoid the second > > redundant call, because emultls_get_address is defined as > > a "const" function. > > This is likely because the libcall lacks a REG_EQUAL note (or > we lack something to put there). Tree level CSE would catch > it, but it doesn't see these function calls. Understood. Do you/others happen to know who is the maintainer of the TLS emulation? I tried a simple test case that works with the native TLS support, but it SEGV's when using TLS emulation. Perhaps a cockpit error on my part, but I'd like to see if I can use the TLS emulation for our purposes, and a first step is to get the example to work. thanks, - Gary
Re: REC: gimplify - create a temp that is set at outermost block?
On 05/20/09 17:13:23, Ian Lance Taylor wrote: > Gary Funck writes: > > > Do you/others happen to know who is the maintainer of the > > TLS emulation? > > [...] If you have found a bug, the fastest > way to address is probably to file a bug report. Doing a bit of research, it seems that the bug has already been reported recently (against GCC 4.3, which is the baseline we're using), http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40024 (The suggested fix hasn't been committed to the head svn branch, however.) thanks, - Gary
RFC: [GUPC] UPC-related changes
FYI, over the course of the next week/so, I will post UPC-related changes to the gcc-patches mailing list, for review. The goal is to make the necessary fixes/changes, based upon review feedback, that need to be made prior to merging the GUPC branch into the GCC trunk. Email describing the changes will be grouped according to a general area (front-end, make/configure, debugging info., middle-end, etc.). The first email, describing front-end changes is here: http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00628.html - Gary
RFE: 'enable checking' as a GCC compilation switch?
Recently, I ran into a couple of bugs/regressions that show up only if checking is enabled. This led me to the observation that it might be useful if checking could be enabled at runtime via a gcc command line switch. If this capability can be enabled by default, then regression tests could depend upon the checking capability, or users could be asked to run with full checks enabled when reporting bugs, etc. There will be some overhead to test for the switch, though the code that does the checking might remain under an #ifdef as it does now, to ensure that it absolutely isn't compiled unless the appropriate configuration option is enabled. That said, I would argue that if we go to the trouble to implement the capability, then support for checking switches should be enabled by default. If the code is never conditionalized, then --enable-checking=xxx might be re-defined to assert the various checking flags by default. Here are some quick stats on the use/frequency of various checking options in GCC: ENABLE_CHECKING 251, ENABLE_RTL_CHECKING 21, ENABLE_IRA_CHECKING 11, ENABLE_FOLD_CHECKING 10, ENABLE_GC_CHECKING 9, ENABLE_DF_CHECKING 7, ENABLE_MALLOC_CHECKING 7, ENABLE_TYPES_CHECKING 5, ENABLE_TREE_CHECKING 4, ENABLE_ASSERT_CHECKING 3, ENABLE_GIMPLE_CHECKING 3, ENABLE_RTL_FLAG_CHECKING 1, ENABLE_SCOPE_CHECKING 1, ENABLE_VALGRIND_CHECKING 1 --- Total 334. Certainly, plenty of them to deal with, but perhaps with a bit of scripting the bulk of the changes can be automated.
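To make the idea concrete, here is a minimal sketch of the kind of change contemplated, assuming a hypothetical command-line flag named flag_checking; the flag name and verify_stmts() as the guarded check are illustrative only:

  /* Today, inside some pass: */
  #ifdef ENABLE_CHECKING
    verify_stmts ();
  #endif

  /* With a runtime switch, the guard would become: */
    if (flag_checking)
      verify_stmts ();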
GCC and out-of-range constant array indexes?
Consider the following: $ cat -n t.c 1 2 int A[10] = { 0 }; 3 4 int main() 5 { 6 A[10] = 10; 7 A[-1] = -1; 8 return 0; 9 } In a compiler test case that I reviewed recently, there was the expectation that the compiler would issue a compile-time warning on the statements at lines 6 and 7 above. I tried this with GCC version "gcc (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10)" recently and was unable to find compilation switches that would cause it to complain about the use of out-of-range indexes above. Is there a technical reason that the compiler should not issue a warning, or might this feature become a legitimate RFE? thanks, - Gary
Re: GCC and out-of-range constant array indexes?
On 10/07/10 21:24:18, Ian Lance Taylor wrote: > -Warray-bounds, but that is one of the warnings which is unfortunately > only available when optimizing. In this case it requires -O2. Ian, thanks. I had thought optimization might be involved, but didn't try -O2. > There was an attempt a couple of years ago to implement this warning > when not optimizing [...]. Would it be possible to compute enough of the control flow graph to process warnings like this one, without running the actual optimizations, unless those optimizations are requested? Would the cost be too high? - Gary
Re: RFE: 'enable checking' as a GCC compilation switch?
On 10/03/10 12:03:44, Ian Lance Taylor wrote: > You will need to try a sample implementation and see how much the > compiler slows down and how much bigger it gets. I began roughing out the required changes. This will be a background project. If I can finish it to the point of running some timing tests, I will post the results here. thanks, - Gary
Re: GCC and out-of-range constant array indexes?
On 10/08/10 18:38:29, Basile Starynkevitch wrote: > I am not an expert on these optimizations, but why would you want that? I routinely compile/build with "-O0 -g3" because the code is easier to debug. I also admit that I compile/build with "-O0" because it is faster than "-O2" or "-O3" for example, and during development I am more interested in faster turn-around time on builds than faster execution time. Also, when I compile/build projects, I try to use the maximum level of warnings and checking that the source code base will support. I am willing to trade off some support/build time in favor of more thorough warnings. - Gary
Re: GCC and out-of-range constant array indexes?
How about the following: 1) Default warnings are cheap, and work fine at -O0. 2) Expensive warnings (-Wall, -Warray-bounds, -Wuninitialized, -Wunused) [not sure about the actual list] that require optimizations will themselves trigger a warning when they are requested but the optimization level required for them to work in their maximal fashion has not been asserted. Or: specification of the expensive warnings will cause the control flow computations required to support those warning levels to be performed (as suggested previously).
Re: GCC and out-of-range constant array indexes?
On 10/08/10 13:22:46, Ian Lance Taylor wrote: > I think both of those alternatives would be surprising and easily > misunderstood behaviour for many compiler users. [...] I find the following behavior to be surprising: $ gcc -Warray-bounds -O0 -c t.c $ gcc -Warray-bounds -O1 -c t.c $ gcc -Warray-bounds -O2 -c t.c t.c: In function ‘main’: t.c:6: warning: array subscript is above array bounds t.c:7: warning: array subscript is below array bounds The impact is that I may think that after I build my project at -O0 or -O1, with various warnings enabled, that there are potential surprises that await, when I perform a production build at -O2 and higher. It makes perfect sense to me that the following happens: $ gcc -Warray-bounds -O1 -c t.c t.c: Warning: -Warray-bounds has no effect unless compiled with optimization level -O2 and higher. > Almost all current warnings already meet those requirements; the main > problem child is -Wuninitialized. ... and -Warray-bounds?
Re: GCC and out-of-range constant array indexes?
On 10/08/10 18:38:29, Basile Starynkevitch wrote: > I am not an expert on these optimizations, but why would you want that? > The optimizations involved are indeed expensive (otherwise it would be > -O1 not -O2), but once you asked for them, why only get warnings > without the code generation improvement? Because the optimizations also make the generated code more difficult to debug, and can introduce new (buggy optimization) bugs. I prefer to get the code working with -O0 and then verify that it still works after optimization, because I think that minimizes my development risk and maximizes my productivity. Along those lines, I would still like to have all the compile-time warnings that I can get, and am willing to have my non-optimized builds go a little slower (say, no more than 20% slower) to have the additional warnings. > However, I see a logic in needing -O2 to get some warnings. > Optimizations are expensive, and they compute static properties of the > source code, which are usable (& necessary and used) for additional > warnings. After hearing the pros/cons, I have come around to the point of view that GCC's method of detecting things like uninitialized local variables is part of its optimization architecture. If I accept that my development cycle is: ("first -O0, then full optimization"), then I will have to accept that some warnings might show up when optimizations are turned on. Either that, or I might routinely run a tool like PC-LINT, or Coverity during development, and this may minimize the surprise warnings that pop up when optimizations are enabled. Or as you suggested, always run two parallel builds: one optimized, and one not. I appreciate every one's ideas and suggestions. This has been an interesting discussion thread. - Gary
codegen differences for increment of a volatile int
I've been looking at how GCC 4.0 handles "volatile" internally, and may have a question/two on that later, but in the meantime, I noticed some interesting differences in generated code that I thought were a bit unusual, and was wondering if someone here might explain why GCC behaves as it does, and what might be the recommended behavior? Beginning with this simple example, 1 int j; 2 volatile int jv; 3 void p() 4 { 5 ++j; 6 ++jv; 7 } when compiled with "gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)" the following code results: incl j movl jv, %eax incl %eax movl %eax, jv Note that in the case where 'j' is _not_ volatile, a single 'incl' was generated, but in the case where 'jv' is volatile, the value was first loaded into a register, then incremented and stored back into memory. (asserting -O2 didn't substantially change the generated code) Compiling under "gcc (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)", the compiler always uses the form where the value is first loaded from memory into a register: movl j, %eax incl %eax movl %eax, j movl jv, %eax incl %eax movl %eax, jv However, if -O2 is asserted, then the behavior reverts back to the same behavior as demonstrated in gcc 3.4: incl j movl jv, %eax incl %eax movl %eax, jv [both systems are i386-redhat-linux (FC3 and FC4)] Is there a technical reason that the use of "volatile" would dictate the second form of increment that first loads the value from memory into a register? I would think that a systems programmer might expect the opposite behavior, where "volatile" would imply the single instruction form of increment (which is non-interruptible on single processor systems).
RE: codegen differences for increment of a volatile int
> From: Bernd Jendrissek > Sent: Friday, May 05, 2006 12:50 AM [...] > Systems programmers should know better than to expect a particular > implementation of volatile. :) > > How, for example, would you suggest GCC generate code for this? > > volatile int qwerty; > > void p() > { > printf("qwerty = %d\n", ++qwerty); > } > > You could get a (uniprocessor non-interruptible) single-instruction > incl qwerty > but then you'd have to read the value again to be able to print it: > movl %eax, qwerty > at which point you've lost your "one evaluation is one read cycle" > semantics which some people might find even more important than > (uniprocessor!) atomicity. > > Don't forget that if you really wanted SMP-safe modification of > volatiles you'd have to use the "lock" prefix too. All good points, and I agree. I just mentioned this idea because GCC is choosing the single instruction memory-to-memory form in some situations, and I was surprised that it chose this form in the non-volatile case, because it made more sense to me to prefer it in the volatile case - if it were to prefer it at all in one situation over another. The current GCC main branch compiler offers a new rendition of the generated code at -O2: movl jv, %eax addl $1, j addl $1, %eax movl %eax, jv where, when incrementing the non-volatile 'j', it chooses 'addl' over 'incl'.
create_tmp_var_raw (gimplify.c) inadvertently asserts 'volatile' on temps
While following GCC's handling of 'volatile' and other type qualifiers, I noticed that the gimplify pass created temporaries with a type with 'volatile' asserted if the underlying type also had 'volatile' asserted. Temporaries are created by the create_tmp_var_raw() procedure in gimplify.c, which reads as follows: tree create_tmp_var_raw (tree type, const char *prefix) { tree tmp_var; tree new_type; /* Make the type of the variable writable. */ new_type = build_type_variant (type, 0, 0); TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type); tmp_var = build_decl (VAR_DECL, prefix ? create_tmp_var_name (prefix) : NULL, type); [...] Note above that an unqualified type, new_type, is created but then subsequently not used in the call to build_decl. Because of this omission, if 'type' originally had any qualifiers set (such as volatile), they'll be propagated to the temporary, which might have some unexpected effects on subsequent optimizations and code generation. The fix, I think, is to pass 'new_type': Index: gimplify.c === --- gimplify.c (revision 113552) +++ gimplify.c (working copy) @@ -449,7 +449,7 @@ TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type); tmp_var = build_decl (VAR_DECL, prefix ? create_tmp_var_name (prefix) : NULL, - type); + new_type); /* The variable was declared by the compiler. */ DECL_ARTIFICIAL (tmp_var) = 1; (If this analysis is correct and it is recommended that I file a bug report on this, or post a patch, please let me know.)
'volatile' is propagated into constants and expression nodes (in some cases)?
Given, 1 volatile int jv; 2 3 int main () 4 { 5++jv; 6 } GCC (development branch, 4.0 and up) creates a tree node for the expression ++jv that has 'volatile' asserted in the type associated with the expression: unit size align 32 symtab 0 alias set -1 precision 32 min max > side-effects arg 0 side-effects volatile used public static common SI defer-output file a.c line 1 size unit size align 32> arg 1 constant invariant 1>> Further, 'volatile' is asserted in the type associated with the integral constant 1, above: (gdb) pt constant invariant 1> (gdb) p 0x402f2e04 $19 = 1076833796 (gdb) pt constant invariant 32> unit size constant invariant 4> align 32 symtab 0 alias set -1 precision 32 min max > We could argue whether this causes any real harm, because the ISO C spec. says the following: === 6.7.3: The properties associated with qualified types are meaningful only for expressions that are lvalues. 6.5.16: The type of an assignment expression is the type of the left operand unless the left operand has qualified type, in which case it is the unqualified version of the type of the left operand. And hopefully subsequent passes in the compiler won't be confused by seeing qualifiers asserted in expression nodes and in constants. IMO it would be better if the original tree constructed from the parsed program more closely followed the original source code, and where possible, removed extraneous qualifiers, unless they absolutely needed to convey correct semantics. Above, the qualifiers on expression nodes and constants seem to come about by a call to convert() from build_unary_op()which works its way through to this statement in fold_convert(): if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig) || lang_hooks.types_compatible_p (TYPE_MAIN_VARIANT (type), TYPE_MAIN_VARIANT (orig))) return fold_build1 (NOP_EXPR, type, arg); because the main variant types of the qualified "volatile int" and unqualified "int" are the same, convert() ends up recasting 'arg' into a qualified (volatile int) type. I don't know if there are other cases besides pre-/post- increment that have this problem. I think it is also possible that the code in the development head branch does a better job of generating expression nodes that have their qualifiers stripped than 4.0 did for example. Perhaps one way to gain some confidence that all possibilities have been covered is to add assertions in build_binary_op and build_unary_op (or build1 and build2 for that matter, for expression class nodes) that checks that TYPE_QUALS(t) == TYPE_UNQUALIFIED on expression nodes and constant nodes (though perhaps TYPE_CONST is meaninful for certain named constants?).
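A hedged sketch of the sanity check suggested at the end of the message above (illustrative only; t stands for the expression or constant node just built, and the exact placement in build1/build2 or build_unary_op/build_binary_op would need to be worked out):

  /* Expression and constant nodes should carry an unqualified type.  */
  if (EXPR_P (t) || CONSTANT_CLASS_P (t))
    gcc_assert (TYPE_QUALS (TREE_TYPE (t)) == TYPE_UNQUALIFIED);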
problem implementing language-specific gimplify of TRUTH_ANDIF expression
Working with GCC 4.0.1, we're implementing an experimental dialect of C, called UPC, which offers language extensions for parallel computing in a distributed shared memory setting (see: http://intrepid.com/upc). Generally, the work has proceeded well, and the language-specific callout in gimplify_expr() have been sufficient to implement UPC features by rewriting language extensions into C-like tree structures that can be further gimplified. However, we've run into a glitch, and I'm not quite certain where the fix should go, or how the fix should be implemented. UPC has a shared pointer that can address data in another process (called a thread in UPC terminology). A shared pointer has the following fields: struct shared_ptr_struct { unsigned long int phase : 48; unsigned int thread : 16; void *offset; }; typedef struct shared_ptr_struct shared_ptr_t; Two shared pointers are equal if all fields are equal: int cmp_ptr_eq (shared_ptr_t p1, shared_ptr_t p2) { return p1.offset == p2.offset && p1.phase == p2.phase && p1.thread == p2.thread; } The UPC-specific gimplify routine which implements shared pointer comparisons rewrites an expression like (p1 == p2) into the sort of code shown above. Here's the actual UPC-specific gimplify code: *expr_p = build_binary_op (TRUTH_ANDIF_EXPR, off_cmp, build_binary_op (TRUTH_ANDIF_EXPR, phase_cmp, offset_cmp, 0), 0); where off_cmp, thread_cmp and phase_cmp are expressions which evaluate the equality comparison for the offset, thread, and phase fields. For example, off0 = build3 (COMPONENT_REF, o_t, op0, upc_vaddr_field_node, NULL_TREE); off1 = build3 (COMPONENT_REF, o_t, op1, upc_vaddr_field_node, NULL_TREE); off_cmp = build_binary_op (code, off0, off1, 0); All this works pretty well, but ICE's on the following small UPC test program: shared int *p; int main(int argc, char **argv) { int errors = 0; if (p == NULL) { /* no action */ } else { errors = 1; } } % upc t.upc t.upc: In function 'main': t.upc:9: internal compiler error: in invert_truthvalue, at fold-const.c:3026 Please submit a full bug report, with preprocessed source if appropriate. See http://www.intrepid.com/upc/bugs.html> for instructions. It fails here: #1 0x005c2c39 in invert_truthvalue (arg=0x2e0bac30) at /upc/gcc-upc-4/src/gcc/fold-const.c:3026 3026 gcc_assert (TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE); The type of the arg is integer_type, not boolean: (gdb) p arg->common.type $1 = 0x2decca90 (gdb) pt constant invariant 32> unit size constant invariant 4> align 32 symtab 0 alias set -1 precision 32 min max pointer_to_this > It is an integer type because the initial build_binary_op(TRUTH_ANDIF_EXPR ... uses the type of the result of the comparisons, which is integer_type. The TRUTH_ANDIF expr is gimplified in gimplify_boolean_expr: 3079gimplify_boolean_expr (tree *expr_p) 3080{ 3081 /* Preserve the original type of the expression. */ 3082 tree type = TREE_TYPE (*expr_p); 3083 3084 *expr_p = build (COND_EXPR, type, *expr_p, 3085 convert (type, boolean_true_node), 3086 convert (type, boolean_false_node)); 3087 3088 return GS_OK; 3089} basically a boolean expression b is converted into a true or false value by rewriting it as: (b) ? true : false However, as the comment states "Preserve the original type of the expression.", the original type of the expression, 'b', is kept. In this case, the type is integer_type not boolean type. 
Thus the original (EQ_EXPR p1 p2) is rewritten into (COND_EXPR integer_type (TRUTH_ANDIF_EXPR (EQ_EXPR p1.offset p2.offset) (TRUTH_ANDIF_EXPR (EQ_EXPR p1.phase p2.phase) (EQ_EXPR p1.thread p2.thread))) (boolean_type true) (boolean_type false)) If we call the condition expression above 'cond', then the test program has the following structure: (COND_EXPR (cond) (void) (MODIFY_EXPR (VAR_DECL errors) (constant 1))) Invert_truthvalue wants to rewrite the construct above into: (COND_EXPR (TRUTH_NOT_EXPR (cond)) (MODIFY_EXPR (VAR_DECL errors) (constant 1)) (void)) This runs into trouble when invert_truthvalue attempts to negate the condition 'cond', insisting that cond be a boolean expression. Under normal conditions this isn't a problem, because the normal flow of control of parsing if statements and then gimplifying them would have forced 'cond' to be of boolean type. The problem arises when UPC rewrites the EQ_EXPR into a TRUTH_ANDIF expr. The condition expression missed a chance to be converted to a boolean type in gimplify_boolean_expr(), because that function preserves the incoming (integer) type, and it misses an opportunity again when th
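One possible repair, sketched from the above (only a sketch, not a committed change; it reuses the off_cmp, thread_cmp, and phase_cmp names from the earlier excerpt): build the rewritten comparison in boolean type and then convert back to the type of the original EQ_EXPR, so that passes such as invert_truthvalue() see a BOOLEAN_TYPE condition:

  tree cmp;
  cmp = build_binary_op (TRUTH_ANDIF_EXPR, off_cmp,
                         build_binary_op (TRUTH_ANDIF_EXPR,
                                          thread_cmp, phase_cmp, 0), 0);
  /* Force the result to boolean, then back to the result type of the
     original comparison.  */
  cmp = convert (boolean_type_node, cmp);
  *expr_p = convert (TREE_TYPE (*expr_p), cmp);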
externs and thread local storage
Consider the following program made up of two separate files: ==> file1.c <== extern int x; int main() { x = 5; } ==> file2.c <== int __thread x = 10; This will compile, link, and run on the IA64, but will fail at link time on AMD64: % gcc file2.c file1.c /usr/bin/ld: x: TLS definition in /tmp/ccmdUAs3.o section .tdata mismatches non-TLS reference in /tmp/ccuSmPAa.o /tmp/ccuSmPAa.o: could not read symbols: Bad value collect2: ld returned 1 exit status However if the initial extern were changed to: extern __thread int x; it will also compile, link, and run on the AMD64. To further complicate matters, if the program is rewritten into a single file as follows: int __thread x; int main() { extern int x; x = 5; } it will fail at compile-time with gcc 4.1: fx.c: In function 'main': fx.c:4: error: non-thread-local declaration of 'x' follows thread-local declaration fx.c:1: error: previous declaration of 'x' was here independent of the fact that this program likely would work fine on the IA64 and perhaps some other architectures. It seems that GCC is enforcing a policy that the __thread attribute has to be added to extern declarations if the underlying variable is declared with the __thread attribute. If we viewed the __thread attribute as something like assigning a variable to a particular linkage section (which is what it does), then shouldn't that assignment be transparent to programs referencing the variable via an extern? What are the technical reasons for the front-end enforcing this restriction, when apparently some linkers will handle the TLS linkage fine? If in fact it is required that __thread be added to the extern, is the compiler simply accommodating a limitation/bug in the linker?
RE: externs and thread local storage
Mike Stump wrote: > > This sounds like a bug that should be fixed. You should only need > __thread on the extern if there was not a previous declaration for it. > The compiler seems pretty determined to enforce this restriction. Same result with 'const' instead of __thread: int const x; int main() { extern int x; x = 5; } t.c: In function 'main': t.c:4: error: conflicting type qualifiers for 'x' t.c:1: error: previous declaration of 'x' was here
RE: externs and thread local storage
Andrew Pinski wrote: > I would have hoped people actually read: > http://gcc.gnu.org/onlinedocs/gcc/C99-Thread_002dLocal-Edits.html > > Which actually describes the edits to the C99 standard to how > __thread is supposed to behave. Thanks for the reference. Per that proposal, __thread is a storage-class specifier, which makes sense. I may have confused the issue by offering up an example using 'const' -- the point of the example was really just to show that the implementor of the __thread check wasn't lazy, but was following suit on the qualifier check. Given that __thread is a proposed extension, there isn't much precedent to lean on, because generally 'extern' can only refer to block scope identifiers and those objects are inherently global, and from the point of view of the "C" program referring to the objects, the actual method used to link, load, and access those objects is implementation defined. (Btw, personally, I'd prefer that a propoasl to extend the "C" language use something other than a keyword beginning with __ as a way of doing that. For example, a compound keyword such as "thread local" would read better and is unlikely to clobber many existing programs. If the idea of a compound keyword is too offensive, then thread_local seems a lot better than __thread to me.) Applying the proposed standard to the following: Dave Korn wrote: > Reasons like this are why we have 6.2.7.2 in the C language spec, aren't > they? > > "All declarations that refer to the same object or function shall have > compatible type; otherwise, the behavior is undefined." The answer is probably, no. Because the presence or absence of storage specifiers shouldn't affect type compatibilty. In reply to my question: > > What are the technical reasons for the front-end enforcing this > restriction, > > when apparently some linkers will handle the TLS linkage fine? > If in fact > > it is required that __thread be added to the extern, is the > compiler simply > > accommodating a limitation/bug in the linker? Seongbae Park wrote: > Because the compiler has to generate different code > for accesses to __thread vs non __thread variable In my view, this is implementation-defined, and generally can vary depending upon the underlying linker and OS technology. Further, there is at least one known platform (IA64) which seems to not impose this restriction. A few implementation techniques come to mind, where the programmer would not need to explicitly tag 'extern's with __thread: 1. The linker can fix up external references to __thread variables by inserting jumps to a "thunk" that executes the appropriate instuctions and then jumps back to the point following the original instruction. The IA64 linker might already be doing that. 2. The entire program might be compiled in some sort of PIC mode, where all external references go through some sort of indirect table, or some small subroutine is called to load the proper address, and there is no distinction between regular extern references and extern __thread references. 3. The linker coallesces __thread objects into a special linkage segment, and the OS allocates a new instance of this segment when it instantiates a thread. From the thread's point of view, access to this per-thread segment is just a regular memory reference. Thread creation may be slower, but access to TLS data is faster. 
Thus, I don't think gcc should be checking for the presence of the __thread specifier applied to an extern when the referenced object is also declared as having __thread persistence, and both declarations happen to be visible in a given compilation unit. If this check must remain, I think it should be downgraded to a warning (with a flag to turn off the warning), and the check should be target-tuple specific, with possible further target-dependent checks (such as special PIC modes, etc.). This part of the proposed spec.: "The declaration of an identifier for a variable that has block scope that specifies __thread shall also specify either extern or static." seems to indicate that the following declaration (at block scope) is erroneous: int __thread x; because it has neither "static" nor "extern" preceding it. Interestingly, when declared at an inner scope, the declaration above appears to be allowed, because __thread is a storage specifier. Perhaps the "C" standard says somewhere that a bare block scope declaration implies "extern", but the language in the spec. seems to call out precisely the presence of either "static" or "extern" ahead of __thread in block scope declarations. Thus, it seems that if gcc is going to move towards the proposed standard, it should also deprecate block scope declarations that aren't preceded with either "extern" or "static"? (and perhaps it should do this in preference to matching up bare declarations with extern declarations). If extern is required for block scoped objects then it seems to imply th
RE: externs and thread local storage
Seongbae Park wrote: > That's the only platform I know of that doesn't require different > sequence. > Should we make the language rules such that > it's easy to implement on one platform but not on the others, > or should we make it such that it's easy to implement in almost > all platforms ? The fact that one current generally available platform doesn't require the __thread attribute on the extern should be enough to at least question whether an *error* should be diagnosed. Also, consider that compiler can't check for consistency across separately linked files and the linker already will give an error if references to __thread local objects don't have thread lcoal relocations. > > Also, what is the benefit of allowing mismatch between > declaration and definition of __thread vs non __thread ? The extern does not mismatch, it simply doesn't provide the __thread attribution. The compiler can determine this and quietly upgrade the extern, if it chose to. In my view, all of this should be unnecessary, and should really be a linker and OS implementation issue, but it seems like it may be difficult getting a conensus on that. > It only makes reading the code more difficult > because it confuses everybody - you need to look at the definition > as well as the declaration to know whether you're dealing > with a thread local variable or not which is BAD. The example I gave had the global declaration and extern in the same source file, and there it looks pretty silly. Typically, however, one will have a .h file where the extern lives and a single source file where the __thread local variable is declared. And often, typically, the .h file might be handed off to another group as the API to be used when accessing the separately compiled implementation. In that scenario, the users of the .h file won't have the opportunity to check whether it agrees with the data object declaration anyway. So, the inconsistency will only be detected when the data object is declared, _if_ the programmer also #includes the header file with the extern declaration into the same file that declares the object. More to the point, I think it is rather too bad that the extern has to have the __thread attribute at all, and would have hoped that the linker and OS could have collaborted to make this transparent, in the same way that data can arranged in separately linked sections without impacting the way the external references are written. Thus, implementation is separated from interface. > ...proposed scheme snipped... Those weren't just proposals. Some systems already implement mechanisms like those mentioned (not proposed), and the IA64 is apparently one of those systems. > > The question to me is not whether it's doable, but whether it's > worth doing > - I see only downside and no upside of allowing mismatch. Given that on some systems, there is no need to have __thread on the extern at all, why should the compiler mandate it? If it does mandate consistency, then it should at least do so on a per platform, per compilation option basis. After all __thread isn't supported on all platforms or under certain compilation regimes -- thus the check for __thread support is made conditional upon the characteristics of the compilation target. I think the requirement to apply _thread to an extern should also be target specific. > If you're convinced that this is a really useful thing for a > particular platform, > why don't you create a new language extension flag that allows this, > and make it default on that platform ? 
Because it is the current implementation of __thread that is in opposition to the generally accepted practice of separating interface from implementation, and because some implementations (both present and future) do not require that the external reference be attributed with __thread.
RE: externs and thread local storage
Pinski wrote: > What about the following two sources: > char t; > --- > extern int t; > What should happen? According to the C standard this is invalid code but > the compiler does not need to diagnose the problem. Yup. Certainly a great way to re-use space across separately compiled "C" source files (ala Fortran's blank common). I can see where the compiler is within its rights to issue a warning above, or even a pedantic error.
RE: externs and thread local storage
Seongbae Park wrote: > As I said, you're welcome to implement a new option > (either a runtime option or a compile time configuration option) > that will allow mixing TLS vs non-TLS. In a way, we've already done that -- in an experimental dialect of "C" called UPC. When compiled for pthreads, all file scope data not declared in a system header file is made thread local, and in fact all data referenced through externs is also made thread local. There is a new syntax (a "shared" qualifier) used by the programmer to identify objects shared across all threads. Sounds a little scary, but works amazingly well. Because the tagging of data as thread local is done by the compiler transparently, I tend to think that we probably stress the TLS feature more than most. > Whether or not it should be enabled for a particular platform > should be a matter of discussion, and whether or not that patch will be > accepted in the mainline will be yet another. For myself, I've worked around the problem, and don't see any consensus forming, so I see little need to come up with a patch for something that has no support. As far as the process goes, I think it is better to discuss the issues here to develop a consensus (if any) before developing a patch.
RE: externs and thread local storage
Seongbae Park wrote: > In UPC, anything that's not TLS (or in UPC term, "private") > is marked explicitly as "shared". So it's NOT trasparent in > any sense of the word. > See, you have two choices - either > 1) make every global variable TLS by default and mark only > non-TLS (UPC) or > 2) vice versa (C99). > > It is not sane to allow TLS/non-TLS attribute changing underneath you > - which is what you proposed. Operations on UPC's shared objects have different semantics than regular "C". Array indexing and pointer operations span threads. Thus A[i] is on one thread but A[i+1] will (by default) take you to the next (cyclic) thread. Since the semantics are different, the programmer needs to know that -- it affects the API. TLS objects behave like regular "C" objects, at least from the perspective of the referencing thread. Note that this discussion started only on the question as to whether the compiler should issue an error if it sees a bare extern referencing a __thread object. My position is that it should be a target dependent error, and perhaps only a warning (because on some platforms the resulting program will link and execute as expected), and that there are many commonly occurring cases where the compiler can't catch the inconsistencies in declarations and these are left for the linker anyway. Also note that the proposed specification seems to side-step the issue by only allowing __thread after extern and static at block scope, and would not permit the situation used in the example that I presented. Further, it isn't clear the current compiler is in sync. with the proposed specification, and that is probably a higher priority issue. (Maybe my quick reading of the spec. was wrong, and someone can correct my misunderstanding.)
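To illustrate the shared-array semantics referred to above, here is a small UPC sketch of the default (cyclic) layout; the affinity comment reflects my reading of the default blocking factor of 1 and is offered for illustration only:

  shared int A[10];   /* default block size 1: elements cycle over threads */
  /* With THREADS = 4:
       A[0] -> thread 0, A[1] -> thread 1, A[2] -> thread 2, A[3] -> thread 3,
       A[4] -> thread 0, ...  i.e., A[i] has affinity to thread (i % THREADS).  */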
x86_64 - 128 bit structs not targeted to TImode: MAX_FIXED_MODE_SIZE too small?
Given,

    struct shared_ptr_struct
      {
        unsigned int   phase  : 24;
        unsigned short thread : 16;
        void *addr;
      };

On the x86_64 (i.e., Opteron[tm]) platform, GCC appears to designate the underlying mode of this type as BLKmode, instead of TImode. This has implications for the quality of the code that is generated to copy and manipulate 128-bit structures (as defined in the example above).

The decision to commit this type to BLKmode originates in this logic in mode_for_size():

      if (limit && size > MAX_FIXED_MODE_SIZE)
        return BLKmode;

On the x86 platform, there appears to be no target definition for MAX_FIXED_MODE_SIZE. Thus, the default in stor-layout.c applies:

    #ifndef MAX_FIXED_MODE_SIZE
    #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (DImode)
    #endif

Other 64-bit targets define MAX_FIXED_MODE_SIZE along these lines (some line wrapping may occur below):

    config/i960/i960.h:#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TImode)
    config/ia64/ia64.h:#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TImode)
    config/mips/mips.h:#define MAX_FIXED_MODE_SIZE LONG_DOUBLE_TYPE_SIZE
    config/sh/sh.h:#define MAX_FIXED_MODE_SIZE (TARGET_SH5 ? 128 : 64)

On MIPS, LONG_DOUBLE_TYPE_SIZE is defined as follows:

    /* A C expression for the size in bits of the type `long double' on
       the target machine.  If you don't define this, the default is two
       words.  */
    #define LONG_DOUBLE_TYPE_SIZE \
      (mips_abi == ABI_N32 || mips_abi == ABI_64 ? 128 : 64)

In the 'dev' tree, the s390 defines MAX_FIXED_MODE_SIZE as follows:

    config/s390/s390.h:#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode)

(Arguably, the s390 variant might be a better default value to be defined in stor-layout.c.)

I haven't tried making the suggested change to see if the x86_64 code generator can fully support it. Are there any technical reasons that the x86_64 shouldn't target 128-bit structs to TImode (i.e., two 64-bit registers)?
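Presumably the x86_64 definition would look something like the following, by analogy with the s390 definition (untested sketch only):

    /* in config/i386/i386.h -- untested */
    #define MAX_FIXED_MODE_SIZE \
      GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode)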
IA64 record alignment rules, and modes?
On the IA64, the following record,

    typedef struct sptr_struct
      {
        long unsigned int  phase:  48;
        short unsigned int thread: 16;
        void *addr;
      } sptr_t;

is assigned BLKmode rather than TImode, and I was wondering whether this is a requirement of the IA64 ABI, or a coincidental result of various target configuration definitions?

The final determination of the mode assigned to this struct is made in compute_record_mode(). The logic first tentatively assigns TImode (128 bits), as expected, in the second branch of this if statement (GCC version 3.3.2):

      /* If we only have one real field; use its mode.  This only applies to
         RECORD_TYPE.  This does not apply to unions.  */
      if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode)
        TYPE_MODE (type) = mode;
      else
        TYPE_MODE (type) = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1);

and then reverses that decision in the subsequent if statement:

      /* If structure's known alignment is less than what the scalar
         mode would need, and it matters, then stick with BLKmode.  */
      if (TYPE_MODE (type) != BLKmode
          && STRICT_ALIGNMENT
          && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
                || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (TYPE_MODE (type))))
        {
          /* If this is the only reason this type is BLKmode, then
             don't force containing types to be BLKmode.  */
          TYPE_NO_FORCE_BLK (type) = 1;
          TYPE_MODE (type) = BLKmode;
        }

primarily because STRICT_ALIGNMENT is asserted, and BIGGEST_ALIGNMENT is 128 in config/ia64/ia64.h:

    #define STRICT_ALIGNMENT 1

    /* Optional x86 80-bit float, quad-precision 128-bit float, and
       quad-word 128 bit integers all require 128 bit alignment.  */
    #define BIGGEST_ALIGNMENT 128

And this configuration parameter in config/ia64/ia64.h may also have led to the decision to force 64-bit alignment for this structure (it is asserted on most targets):

    /* Define this if you wish to imitate the way many other C compilers
       handle alignment of bitfields and the structures that contain them.

       The behavior is that the type written for a bit-field (`int', `short',
       or other integer type) imposes an alignment for the entire structure,
       as if the structure really did contain an ordinary field of that type.
       In addition, the bit-field is placed within the structure so that it
       would fit within such a field, not crossing a boundary for it.  */
    #define PCC_BITFIELD_TYPE_MATTERS 1

Question: If we assume that TImode would have been a more efficient mode to represent the record type above, would it not have been acceptable for the compiler to promote the alignment of this type to 128, given there are no apparent restrictions otherwise, or are there other C conventions at work that dictate otherwise? Is there a configuration tweak that would have led to using TImode rather than BLKmode?
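One untested workaround sketch: raising the record's alignment by hand, so that the second test above no longer demotes the tentative TImode to BLKmode:

    typedef struct sptr_struct
      {
        long unsigned int  phase:  48;
        short unsigned int thread: 16;
        void *addr;
      } __attribute__ ((aligned (16))) sptr_t;
    /* With TYPE_ALIGN raised to 128, the (TYPE_ALIGN >= BIGGEST_ALIGNMENT)
       test is satisfied, so the TImode assignment should survive.  */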
Re: Bad gcc/gtype-desc.h generated when using sparse checkout
On 07/15/12 21:53:02, Jonathan Wakely wrote:
> [...]
> It took me a while to get back to this, but your suggestion worked,
> this patch allows bootstrapping to get past cp/lex.o, it hasn't
> finished yet so I haven't run the tests:
> [...]
> Presumably gengtype goes through directories alphabetically, so if it
> doesn't find gcc/ada before gcc/c then it creates an invalid
> gtype-desc.h

We would like to see this patch, or something similar, applied. Currently, we subset the GCC source distribution to include only C, C++, and UPC when we build the GUPC (GNU UPC) source code distributions. For some test systems where we port GUPC, the available disk space is restricted, and the additional 600MB or so of space required for the additional languages and test suites might exceed the available quota.

If I recall correctly, there was some discussion on this list, or perhaps on gcc-patches, as to whether a decision should be made regarding the ability to subset the GCC source tree. If sub-setting is not prohibited, and there are no plans to upgrade or rewrite gengtype and its infrastructure, then something like this patch seems necessary.

thanks,
- Gary
graphite loop optimizer - "C" examples?
I have been experimenting with the graphite optimizer, based on GCC trunk, and cloog-isl. I started with the attached simple "C" program, which has this basic structure:

    #define N 2
    int a[N][N], b[N], c[N];
    [...]
    for (i = 0; i < N; i++)
      {
        b[i] = i;
        c[i] = i + N;
      }
    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        a[j][i] = b[i] + c[j];

(Attached is the full test case.) I compiled it with: -O3 -floop-block. A couple of questions:

1) What option should I supply to confirm that the graphite optimizer ran, and to determine (i) whether it in fact performed any optimizations, and (ii) which optimizations it performed?

2) If -floop-block couldn't optimize this program, what is the likely reason?

3) Would you please offer pointers to example "C" programs that highlight graphite-cloog-isl optimizations?

Thanks,
- Gary

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 2
    int a[N][N], b[N], c[N];

    static double
    cpu_time ()
    {
      struct timespec ts;
      double t;
      if (clock_gettime (CLOCK_MONOTONIC, &ts))
        abort ();
      t = ts.tv_sec + (ts.tv_nsec * 1.0e-9);
      return t;
    }

    int
    main (void)
    {
      int i, j, k;
      double start, stop, elapsed;
      for (i = 0; i < N; i++)
        {
          b[i] = i;
          c[i] = i + N;
        }
      start = cpu_time ();
      for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
          a[j][i] = b[i] + c[j];
      stop = cpu_time ();
      elapsed = stop - start;
      printf ("elapsed time = %0.2f secs.\n", elapsed);
      return 0;
    }
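Spelled out, I compile it roughly as follows (the dump switch and the file name are my guesses at the right way to see graphite's decisions, in case that helps with question 1):

    gcc -O3 -floop-block -fdump-tree-graphite-all graphite-test.c -o graphite-test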
Re: C++ and gather-detailed-mem-stats
Would it be possible to define a new function attribute that transparently adds parameters for the caller's function name, file name, and line number? Or one that establishes a binding between this information and existing parameter names? This might be useful for regular "C" programs as well.

    void do_something (T1 t1, T2 t2)
      __attribute__ ((caller_info (func => __FUNC__, file => __FILE__, line => __LINE__)));

Perhaps a compilation switch and a pre-defined macro are necessary to meaningfully be able to code the body of the function and/or to conditionally enable the collection of the data. The syntax above, with its mix of parameter names (but no types) and pre-defined macros, may not make sense, but perhaps the idea can be developed further if there is interest.

Or maybe a builtin type?

    void do_something (T1 t1, T2 t2, const __builtin_caller_info_t * const caller_info)
      __attribute__ ((caller_info));

This comes at the cost of an additional pointer argument, but it can be set to NULL if collection of caller info is disabled. It can also be an opaque type established via #define for configurations where either the compiler doesn't support the feature or it is disabled. Otherwise, __builtin_caller_info_t might have the obvious fields (function_name, file_name, line_number).

Off-hand, I can see how to make this work OK with default argument values and/or overloading in C++, but some more work would be involved (by the programmer) to make this work for regular "C" programs.

- Gary
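For plain "C", the #define fallback might look something like this sketch (the type and macro names are invented for illustration):

    typedef struct caller_info
      {
        const char *function_name;
        const char *file_name;
        int line_number;
      } caller_info_t;

    #if CALLER_INFO_ENABLED
    #define CALLER_INFO (&(const caller_info_t){ __func__, __FILE__, __LINE__ })
    #else
    #define CALLER_INFO ((const caller_info_t *) 0)
    #endif

    void do_something (int t1, double t2, const caller_info_t *caller_info);

    /* A call site then reads:  do_something (1, 2.0, CALLER_INFO);  */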
Re: C++ and gather-detailed-mem-stats
Or no explicit parameters at all ...

    void do_something (T1 t1, T2 t2) __attribute__ ((caller_info));

All this will do (with appropriate compilation switches and/or pre-defined macros) is pass one or more hidden arguments, which in turn can be accessed in the function body via a built-in function:

    #if CALLER_INFO_ENABLED
      __caller_info_t caller_info = __builtin_caller_info ();
      [...]
    #endif

In this way, the programmer-visible function prototype is unaffected, though the caller and the function body have to be compiled with compatible settings.
best method to implement dynamic initializers?
We need to generate code that initializes certain variables and runtime-related values with expressions that can't be evaluated statically at compile time. One method would be to create an __attribute__ ((constructor)) function that contains statements which initialize the values of interest.

Are there language dialects that already have this requirement to evaluate and assign initial values at runtime? Do they use a general mechanism like the constructor attribute, or do they somehow roll their own? (I can see how GCC's constructor mechanism may not be sufficiently general for Ada, for example.) Ideally, I'd prefer to use some already developed and proven code/approach rather than re-invent the wheel.

Any pointers/tips appreciated. Thanks.
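A minimal sketch of the constructor-attribute approach mentioned above (the variable names and the particular runtime query are just for illustration):

    #include <stdlib.h>
    #include <unistd.h>

    static char *dyn_buffer;
    static long dyn_page_size;

    /* Runs before main(), via GCC's constructor mechanism.  The values
       computed here can't be produced by a static initializer.  */
    static void __attribute__ ((constructor))
    init_dynamic_values (void)
    {
      dyn_page_size = sysconf (_SC_PAGESIZE);
      dyn_buffer = malloc ((size_t) dyn_page_size);
    }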
(gcc 4.2) how to create an ADDR_EXPR that refers to a linkage name?
We are in the process of updating GCC/UPC's support for the UPC dialect of C to version 4.2.0 of GCC. GCC/UPC is described here: http://www.intrepid.com/upc.html

Generally, things are working. However, at the moment, all tests fail when optimizations are enabled. For example:

    test00.upc:35: internal compiler error: in referenced_var_check_and_insert, at tree-dfa.c:639

It is failing on this check:

    (gdb) l
    634
    635       if (h)
    636         {
    637           /* DECL_UID has already been entered in the table.  Verify that it is
    638              the same entry as TO.  See PR 27793.  */
    639           gcc_assert (h->to == to);
    640           return false;
    641         }
    642
    643       h = GGC_NEW (struct int_tree_map);
    (gdb) p h->to
    $1 = 0x2e1ad160
    (gdb) pt
     unit size align 8 symtab 0 alias set -1 precision 8 min max pointer_to_this >
     addressable used public static common QI defer-output file test00.upc line 26
     size unit size align 8>
    (gdb) p to
    $2 = 0x2e1b7630
    (gdb) pt
     unit size align 8 symtab 0 alias set -1 precision 8 min max pointer_to_this >
     addressable used public static common QI defer-output file test00.upc line 26
     size unit size align 8>

Above, the two tree nodes are clones of each other, created by the following UPC-specific code:

    /* Convert shared variable reference VAR into a shared pointer
       value of the form {0, 0, &VAR}.  */
    tree
    upc_build_shared_var_addr (tree type, tree var)
    {
      tree new_var, var_addr, val;
      if (!(TREE_CODE (var) == VAR_DECL && TREE_SHARED (var)))
        abort ();
      if (!(TREE_CODE (type) == POINTER_TYPE && TYPE_SHARED (TREE_TYPE (type))))
        abort ();

      /* Create a VAR_DECL that is the same as VAR, but
         with qualifiers (esp. TYPE_QUAL_SHARED) removed so that
         we can create the actual address of the variable (in the shared
         section) without infinite recursion in the
         gimplification pass.  Make sure the new copy has
         the same UID as the old.  In the future, we might need
         to reference the symbol name directly.  */

      new_var = copy_node (var);
      DECL_UID (new_var) = DECL_UID (var);
      TREE_TYPE (new_var) = TYPE_MAIN_VARIANT (TREE_TYPE (var));
      TREE_SHARED (new_var) = 0;
      TREE_STRICT (new_var) = 0;
      TREE_RELAXED (new_var) = 0;
      var_addr = build_fold_addr_expr (new_var);
      TREE_CONSTANT (var_addr) = 1;
      val = upc_build_shared_ptr_value (type,
                                        integer_zero_node,
                                        integer_zero_node,
                                        var_addr);
      return val;
    }

As background, GCC/UPC adds a new qualifier, "shared", to indicate that a value must be accessed remotely and that it is shared across all UPC "threads" (which can be thought of as processes all running the same program, but with differing local copies of data). The UPC-specific aspects of the language are translated by a gimplify pass into normal GIMPLE trees that are then passed to the middle and back ends of GCC. For example, a reference to a value of a type that is qualified as "shared" will result in a call to a (possibly inlined) remote "get" library routine.

Where this gimplify pass can get confused is when it sees a reference to a shared variable. If it sees a reference to a shared variable on the right-hand side of an assignment, it assumes that its value is needed and generates a remote get call. The address of a shared variable has three parts (phase, thread, virtual address). For declared variables, the phase and thread are always 0. A constructor is used to create a shared address. That's what upc_build_shared_ptr_value() does above.
The virtual address part of the shared address is simply the regular address of the variable, because all shared variables are collected together in their own "upc_shared" linkage section. This section is needed simply for address-assignment purposes; the actual shared data is located in a global shared address region.

The code above clones a shared variable, stripping its type qualifiers (most importantly the "shared" qualifier). When the address of the cloned variable is taken, its normal C pointer-sized address results, and the special gimplify pass doesn't get confused into thinking that the address of the variable is a shared address.

The code above isn't clever: it clones the variable each time it needs to generate a shared address. In GCC 4.2, this runs into problems in the optimization pass that implements special checks for this sort of inconsistency.

The discussion above is a (very) long lead-up to a request for ideas and suggestions for better handling this situation.
Re: (gcc 4.2) how to create an ADDR_EXPR that refers to a linkage name?
On Sat, Sep 01, 2007 at 01:43:37PM -0400, Diego Novillo wrote:
>
> Have you considered using the data sharing machinery in OpenMP? We
> simply create a data structure holding all shared variables, allocate
> that in shared memory and re-write all references to shared variables
> as dereferences to that structure.

Diego, thanks. Some other implementations of UPC reference all shared variables indirectly through a table, built at runtime. The compiler tells the runtime how much space each variable requires, and the runtime allocates this from the shared memory region. The current strategy used by GCC/UPC is somewhat simpler; it lets the linker create the layout of the shared variable section. Perhaps we need to re-visit this design decision, and adopt a scheme similar to that used by GOMP. I'll review omp-low.c for ideas.

GCC/UPC does have a pthreads mode of operation, but that is a special case. UPC threads are usually mapped to separate processes. The shared memory region is potentially distributed across network nodes, often accessed via a high-speed interconnect. The runtime that is part of the GCC/UPC release supports only SMP configurations and relies on mmap(). However, GCC/UPC also works with a more general runtime developed by Berkeley, which supports many network interconnects.

> This trick you are implementing with cloning the VAR_DECLs is
> guaranteed not to work, sorry. We very explicitly assume that if
> DECL_UID (x1) == DECL_UID (x2) then x1 == x2. This is not something
> that will change.

Yeah, I suspected as much when I first wrote that code. I wasn't too surprised to see that it failed a consistency check in GCC 4.2.
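My (rough, untested) picture of the GOMP-style scheme, sketched at the C level; the struct, its name, and the runtime hook are invented for illustration, and this is not how GCC/UPC works today:

    /* All shared variables are collected into one structure ...  */
    struct upc_shared_block
      {
        int x;
        double y[100];
      };

    /* ... which the runtime allocates in shared memory at startup.  */
    static struct upc_shared_block *upc_shared_blk;

    static void
    example (void)
    {
      /* was:  x = 1;  y[3] = 2.0;  -- every reference becomes an
         indirect access through the shared block.  */
      upc_shared_blk->x = 1;
      upc_shared_blk->y[3] = 2.0;
    }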
how to chase a tree check failure in verify_ssa?
Background: GCC 4.2.0 baseline plus mods for the UPC dialect. The problem below is probably a result of the UPC mods and not something inherent in GCC 4.2.0.

Although the test cases that I ran pass at -O2, some fail when the value of THREADS (the number of parallel threads in the application) is set to the compile-time constant one. The failing tests ICE in verify_ssa as shown below. I'd appreciate any tips or recommendations on how to diagnose problems like this, likely things to look for, and so on.

The ICE occurs in tree-ssa.c at line 776 (--enable-checking is asserted):

    771
    772       FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter,
    773                                 SSA_OP_ALL_USES | SSA_OP_ALL_KILLS)
    774         {
    775           op = USE_FROM_PTR (use_p);
    776           if (verify_use (bb, definition_block[SSA_NAME_VERSION (op)],
    777                           use_p, stmt, false, !is_gimple_reg (op),
    778                           names_defined_in_bb))
    779             goto err;
    780         }

The operand, op:

    (gdb) p op
    $49 = 0x2e1ebc60
    (gdb) pt
     unit size align 128 symtab 0 alias set 3 fields unsigned external bit-field
     nonaddressable decl_4 DI file line 0 size unit size align 1 offset_align 128
     offset bit offset bit_field_type context chain > chain > used ignored TI
     file test02.upc line 33 size unit size align 128 context >

and the statement, stmt:

    (gdb) p stmt
    $50 = 0x2e1ee3c0
    (gdb) pt
     unit size align 64 symtab 0 alias set -1 precision 48 min max > side-effects
     arg 0 arg 0 used ignored TI file test02.upc line 33 size unit size align 128
     context > arg 1 unsigned external bit-field nonaddressable decl_4 DI file
     line 0 size unit size align 1 offset_align 128 offset bit offset
     bit_field_type context chain >> arg 1 constant invariant 0> test02.upc:33>

The failure occurs because SSA_NAME_VERSION() in turn calls SSA_NAME_CHECK(), which checks that the tree node is an SSA_NAME node, which 'op' clearly is not. Any ideas on how this situation might have occurred?

Note that the type of op above is the internal representation of a UPC shared pointer, which has three fields (phase, thread, vaddr). This representation overlays a shared pointer value, which is generally twice the size of a conventional pointer. Internally, UPC shared pointers are represented as POINTER_TYPE nodes whose TREE_TYPE() is qualified by a new qualifier, "shared". Various regular "C" optimizations on pointers have to be disabled for UPC's shared pointers. It may be the case that, with the particular settings used in the failing test, a "C" pointer optimization was inadvertently applied to a UPC shared pointer.

Thanks for your help.
Re: how to chase a tree check failure in verify_ssa?
On Mon, Sep 24, 2007 at 09:36:25AM -0400, Diego Novillo wrote:
> On 9/23/07, Gary Funck <[EMAIL PROTECTED]> wrote:
>
> > The operand, op:
> >
> > (gdb) p op
> > $49 = 0x2e1ebc60
> > (gdb) pt
>
> This symbol was not marked for renaming and the program is already in
> SSA form. When your pass introduces new symbols, you need to add them
> to the symbol table (with add_referenced_var) and also mark it for
> renaming (with mark_sym_for_renaming). For examples see passes like
> tree-sra.c or tree-pre.c that create new variables.

Diego, thanks. That particular symbol is being created in gimplify_expr, here (at line 541):

    536          won't allocate any variable that is used in more than one basic
    537          block, which means it will go into memory, causing much extra
    538          work in reload and final and poorer code generation, outweighing
    539          the extra memory allocation here.  */
    540       if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
    541         ret = create_tmp_from_val (val);
    542       else
    543         {
    544           elt_t elt, *elt_p;
    545           void **slot;

Above, optimize=3, is_formal=0, and by deduction, side-effects must be true. 'val' above is a constructor:

    (gdb) p debug_tree (val)
     unit size align 64 symtab 0 alias set 3 fields unsigned external bit-field
     nonaddressable decl_4 SI file line 0 size unit size align 1 offset_align 128
     offset bit offset bit_field_type context chain > chain > constant>

We use constructors to build a UPC shared pointer value (it has three parts: [phase, thread, vaddr]). I would have thought gimplify_expr's internal mechanisms would mark variables as referenced when it needs to?
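If I follow the advice above, the fix would be along these lines wherever our UPC lowering code introduces a new variable after the program is in SSA form (untested sketch; the temporary's name is invented):

    /* Register the new temporary with the SSA machinery so that
       verify_ssa and the renamer know about it.  */
    tree tmp = create_tmp_var (type, "upc_tmp");
    add_referenced_var (tmp);
    mark_sym_for_renaming (tmp);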
Re: how to chase a tree check failure in verify_ssa?
Diego, a bit more info. It seems that gimplify_operand is being called in the rewrite_uses pass of tree-ssa-loop-ivopts.c. gimplify_operand() is working on this expr:

     unit size align 32 symtab 0 alias set -1 precision 32 min max > constant
     invariant arg 0 constant invariant arg 0 constant static arg 0 constant>>>
     arg 1 constant invari

As you can see, we coerce a constructor into a UPC shared pointer, which works something like a pointer, but it is not directly interoperable with integers. Typically, we have to locate the places where these sorts of optimizations are attempted and disable them for UPC shared pointers.

Thanks for your help. It got me pointed in the right direction.

- Gary
cgraph, unit-at-a-time, and the "used" attribute
While working on UPC, we ran into an interesting problem: if -O1 is enabled and -funit-at-a-time is disabled (which is not the default configuration), a static variable declaration was not emitted in the generated assembler code. I haven't quite worked out why this is the case, but in reading the code I did notice some awkwardness in how "used" variables are detected and handled by the call graph (cgraph) pass(es).

The gist of the issue we ran into was the handling of this UPC construct:

    { static shared strict int x;  x = x; }

In UPC, "strict" is similar to volatile. The assignment of the dummy variable to itself above doesn't do anything very useful, but it does enforce a memory fence that ensures that remote reads and writes to UPC shared space can't flow past the assignment above. The UPC compiler runs a gimplify pass which finds all UPC-isms and rewrites them into C-isms, which then flow through the back end. The assignment above is loosely translated into:

    upc_put ([0, 0, &x], upc_get ([0, 0, &x], sizeof (x)));

where [0, 0, &x] is an aggregate constructor that builds the representation of a shared pointer having a thread number of 0, a phase of 0, and a virtual address of &x. All UPC shared variables are located in a special linkage section. In this way, &x points to a location in the global shared address space, and the linker lays out each thread's contribution to the global shared address space.

The difficulty comes in when we generate the runtime calls above referring to &x, by referring to a shadow variable we create (by necessity, to prevent infinite recursion in the gimplify pass) that has the same external name as 'x', with the shared qualifier removed. What happens is that cgraph has already been run and determined that 'x' isn't needed, and therefore it doesn't emit the declaration of 'x' into the generated assembler code.

We tried asserting TREE_USED() on 'x' when it was declared, but it turns out that instead of referring to TREE_USED() or even DECL_PRESERVE_P(), cgraph refers directly to the "used" attribute. Because of this, if __attribute__ ((used)) is added to the declaration above, all is well. That is because the front end checks directly for the "used" attribute in various places but seems not to check the various tree flags. Here are the relevant references (in the HEAD branch):

    c-decl.c-}
    c-decl.c-
    c-decl.c-  /* If this was marked 'used', be sure it will be output.  */
    c-decl.c:  if (!flag_unit_at_a_time && lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    c-decl.c-    mark_decl_referenced (decl);
    c-decl.c-
    c-decl.c-  if (TREE_CODE (decl) == TYPE_DECL)
    --
    cgraphunit.c-  if (node->local.externally_visible)
    cgraphunit.c-    return true;
    cgraphunit.c-
    cgraphunit.c:  if (!flag_unit_at_a_time && lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    cgraphunit.c-    return true;
    cgraphunit.c-
    cgraphunit.c-  /* ??? If the assembler name is set by hand, it is possible to assemble
    --
    cgraphunit.c-  for (node = cgraph_nodes; node != first; node = node->next)
    cgraphunit.c-    {
    cgraphunit.c-      tree decl = node->decl;
    cgraphunit.c:      if (lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    cgraphunit.c-        {
    cgraphunit.c-          mark_decl_referenced (decl);
    cgraphunit.c-          if (node->local.finalized)
    --
    cgraphunit.c-  for (vnode = varpool_nodes; vnode != first_var; vnode = vnode->next)
    cgraphunit.c-    {
    cgraphunit.c-      tree decl = vnode->decl;
    cgraphunit.c:      if (lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    cgraphunit.c-        {
    cgraphunit.c-          mark_decl_referenced (decl);
    cgraphunit.c-          if (vnode->finalized)
    --
    ipa-pure-const.c-{
    ipa-pure-const.c-  /* If the variable has the "used" attribute, treat it as if it had
    ipa-pure-const.c-     been touched by the devil.  */
    ipa-pure-const.c:  if (lookup_attribute ("used", DECL_ATTRIBUTES (t)))
    ipa-pure-const.c-    {
    ipa-pure-const.c-      local->pure_const_state = IPA_NEITHER;
    ipa-pure-const.c-      return;
    --
    ipa-reference.c-{
    ipa-reference.c-  /* If the variable has the "used" attribute, treat it as if it had
    ipa-reference.c-     been touched by the devil.  */
    ipa-reference.c:  if (lookup_attribute ("used", DECL_ATTRIBUTES (t)))
    ipa-reference.c-    return false;
    ipa-reference.c-
    ipa-reference.c-  /* Do not want to do anything with volatile except mark any
    --
    ipa-type-escape.c-  tree type = get_canon_type (TREE_TYPE (t), false, false);
    ipa-type-escape.c-  if (!type) return;
    ipa-type-escape.c-
    ipa-type-escape.c:  if (lookup_attribute ("used", DECL_ATTRIBUTES (t)))
    ipa-type-escape.c-    {
    ipa-type-escape.c-      mark_interesting_type (type, FULL_ESCAPE);
    ipa-type-escape.c-      return;
    --
    varpool.c-  if (node->externally_visible || node->force_output)
    varpool.c-    return true;
    varpool.c-  if (!flag_unit_at_a_time
    varpool.c:      && lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
    varpool.c-    return true;
    varpool.c-
    varpool.c-  /* ??? If the assembler name is set by hand, it is possible to assemble

Given that the process
Re: cgraph, unit-at-a-time, and the "used" attribute
On Mon, Oct 08, 2007 at 02:50:06PM -0700, Janis Johnson wrote:
>
> Might this be related to http://gcc.gnu.org/PR33645?

Possibly. We think we saw a problem rebuilding one of the math functions in libgcc2 at -O2 with unit-at-a-time disabled, which resulted in a compilation failure. Since that isn't the usual configuration, perhaps there's an implicit dependency between -funit-at-a-time and one of the optimization passes? (We didn't look into the issue further. The baseline we're using is 4.2.0, fyi.)

Thanks for the reference to the PR.

- Gary
Re: gomp slowness
On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote:
>
> Do you know how thread local variables are handled?
> [Not using Posix TLS I hope .. that would be a disaster]

Would you please elaborate? What's wrong with the POSIX TLS implementation? Do you know of any studies? I ask because we presently use the TLS facility extensively, and have suspected that there are significant performance problems, but haven't looked into the issue.