-Warray-bounds false negative

2009-11-13 Thread Matt

Hello,

I recently came across a false negative in GCC's detection of an array 
bounds violation. At first, I thought the other tool (PC-Lint) was 
producing a false positive, but it turns out to be correct. The false 
negative occurs in GCC 4.3, 4.4.1, and the latest trunk (4.5). I'm 
curious to understand how exactly the detection breaks down, as I think 
it may affect if/how the loop in question is optimized.


Here is the code:

#include <stdbool.h>

int main(int argc, char** argv)
{
    unsigned char data[8];
    int hyphen = 0, i = 0;
    char *option = *argv;

    for (i = 19; i < 36; ++i) {
        if (option[i] == '-') {
            if (hyphen) return false;
            ++hyphen;
            continue;
        }

        if (!(option[i] >= '0' && option[i] <= '9')
            && !(option[i] >= 'A' && option[i] <= 'F')
            && !(option[i] >= 'a' && option[i] <= 'f')) {
            return false;
        }

        data[(i - hyphen) / 2] = 0;
    }

    return 0;
}

When i is 35 and hyphen is 0 (and in many other cases), data[] will be 
overflowed by quite a bit. Where does the breakdown in array bounds 
detection occur, and why? Once I understand, and if the fix is simple 
enough, I can try to fix the bug and supply a patch.


Thanks!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


Re: -Warray-bounds false negative

2009-11-13 Thread Matt

On Fri, 13 Nov 2009, Andrew Pinski wrote:


On Fri, Nov 13, 2009 at 1:09 PM, Matt  wrote:

Hello,

I recently came across a false negative in GCC's detection of an array 
bounds violation. At first, I thought the other tool (PC-Lint) was 
producing a false positive, but it turns out to be correct. The false 
negative occurs in GCC 4.3, 4.4.1, and the latest trunk (4.5). I'm 
curious to understand how exactly the detection breaks down, as I think 
it may affect if/how the loop in question is optimized.


Well, in this case all of the code that is considered dead is removed 
before the warning would be emitted.
If I change it so that data is read from (instead of just written to), 
the trunk warns about this code:
t.c:21:20: warning: array subscript is above array bounds

I changed the last return to be:
  return data[2];


d'oh! Next time I'll look at the objdump output first.

Thanks for the quick explanation!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


build failure bootstrapping trunk on Ubuntu 9.10

2009-11-18 Thread Matt
I'm getting this build failure with latest trunk, as of the composing of 
this email:


../gcc-trunk/configure --prefix=/home/matt --enable-stage1-checking=all 
--enable-bootstrap --enable-lto --enable-languages=c,c++


make -j5

.
.
.
/home/matt/src/gcc-obj/./prev-gcc/xgcc 
-B/home/matt/src/gcc-obj/./prev-gcc/ 
-B/home/matt/x86_64-unknown-linux-gnu/bin/ 
-B/home/matt/x86_64-unknown-linux-gnu/bin/ 
-B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem 
/home/matt/x86_64-unknown-linux-gnu/include -isystem 
/home/matt/x86_64-unknown-linux-gnu/sys-include -c  -g -O2 
-fprofile-use -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual 
-Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute 
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings 
-Werror -Wold-style-definition -Wc++-compat -fno-common  -DHAVE_CONFIG_H 
-I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. 
-I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include 
-I../../gcc-trunk/gcc/../libdecnumber 
-I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber 
-DCLOOG_PPL_BACKEND  -I/usr/include/libelf 
../../gcc-trunk/gcc/ira-lives.c -o ira-lives.o


cc1: warnings being treated as errors
../../gcc-trunk/gcc/ira-lives.c: In function 
ira_implicitly_set_insn_hard_regs:
../../gcc-trunk/gcc/ira-lives.c:748:13: error: regno may be used 
uninitialized in this function




It looks like ira-lives.c:763 has some ambiguous parenthesizing that 
may be causing the warning that is failing the build. Note that the 
warning doesn't happen for a similar piece of code on line 830 of the 
same file.


I've been fighting with the configure process for a few days and finally 
got past that to this issue. So, any help is greatly appreciated :)


Thanks!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


missed IPA/whopr optimization?

2009-11-19 Thread Matt


Hello all,

In the work I'm doing on my new book, I'm trying to show how modern 
compiler optimizations can eliminate a good deal of the overhead 
introduced by a modular/unit-testable design. In verifying some of my 
text, I found that GCC 4.4 and 4.5 (20091018, Ubuntu 9.10 package) 
aren't doing an optimization that I expected them to do:


#include <cstdio>

class Calculable
{
public:
    virtual unsigned char calculate() = 0;
};

class X : public Calculable
{
public:
    unsigned char calculate() { return 1; }
};

class Y : public Calculable
{
public:
    unsigned char calculate() { return 2; }
};

static void print(Calculable& c)
{
    printf("%d\n", c.calculate());
    printf("+1: %d\n", c.calculate() + 1);
}

int main()
{
    X x;
    Y y;

    print(x);
    print(y);

    return 0;
}

GCC 4.5 (and 4.4.1) generates this approximate code:

~/src $ /usr/lib/gcc-snapshot/bin/g++ -O3 -ftree-loop-ivcanon -fivopts 
-ftree-loop-im -fwhole-program -fipa-struct-reorg -fipa-matrix-reorg 
-fgcse-sm -fgcse-las -fgcse-after-reload --param max-gcse-memory=1 
--param max-pending-list-length=10   folding-test-interface.cpp -o 
folding-test-interface_gcc450_20091018-O3-kitchen-sink


~/src$ objdump -Mintel -S 
folding-test-interface_gcc450_20091018-O3-kitchen-sink | less -p \

00400310 :
  400310:   53                      push   rbx
  400311:   48 83 ec 20             sub    rsp,0x20
  400315:   48 8d 5c 24 10          lea    rbx,[rsp+0x10]
  40031a:   48 c7 44 24 10 c0 04    mov    QWORD PTR [rsp+0x10],0x4004c0
  400321:   40 00
  400323:   48 c7 04 24 00 05 40    mov    QWORD PTR [rsp],0x400500
  40032a:   00
  40032b:   48 89 df                mov    rdi,rbx
  40032e:   ff 15 8c 01 00 00       call   QWORD PTR [rip+0x18c]    # 4004c0 <_ZTV1X+0x10>
  400334:   bf ac 04 40 00          mov    edi,0x4004ac
  400339:   0f b6 f0                movzx  esi,al
  40033c:   31 c0                   xor    eax,eax
  40033e:   e8 a5 03 00 00          call   4006e8 
  400343:   48 8b 44 24 10          mov    rax,QWORD PTR [rsp+0x10]
  400348:   48 89 df                mov    rdi,rbx
  40034b:   ff 10                   call   QWORD PTR [rax]
  40034d:   0f b6 f0                movzx  esi,al
  400350:   bf a4 04 40 00          mov    edi,0x4004a4
  400355:   31 c0                   xor    eax,eax
  400357:   83 c6 01                add    esi,0x1
  40035a:   e8 89 03 00 00          call   4006e8 
[...]

As seen here, GCC isn't folding/inlining the constants returned across 
the virtual function boundary, even though they are visible in the 
compilation unit and -O3 -fwhole-program is being used. (Note that I 
started with just that command line, and added options in an attempt 
to induce the optimization I was hoping for.)


I was able to induce the optimization by removing a level of 
indirection in two ways: 1) by having two print() overloads, one 
accepting X& and a second accepting Y&; and 2) by replacing the 
classes with single-level-indirection function pointers:

--
#include <stdio.h>

typedef unsigned char(*Calculable)(void);

unsigned char one() { return 1; }
unsigned char two() { return 2; }

static void print(Calculable calculate)
{
    printf("%d\n", calculate());
    printf("+1: %d\n", calculate() + 1);
}

int main()
{
    print(one);
    print(two);

    return 0;
}
--
For completeness, here is the code generated from the function-pointer 
example, which optimizes in the way I expect:

00400390 :
  400390:   48 83 ec 08             sub    rsp,0x8
  400394:   ba 01 00 00 00          mov    edx,0x1
  400399:   be e4 04 40 00          mov    esi,0x4004e4
  40039e:   bf 01 00 00 00          mov    edi,0x1
  4003a3:   31 c0                   xor    eax,eax
  4003a5:   e8 c6 02 00 00          call   400670 <__printf_...@plt>
  4003aa:   ba 02 00 00 00          mov    edx,0x2
  4003af:   be dc 04 40 00          mov    esi,0x4004dc
  4003b4:   bf 01 00 00 00          mov    edi,0x1
  4003b9:   31 c0                   xor    eax,eax
  4003bb:   e8 b0 02 00 00          call   400670 <__printf_...@plt>



Modifying this last example to use two levels of function-pointer 
indirection once again causes the optimization to be missed.


So, my questions are:
0) Am I missing some existing commandline parameter that would induce the 
optimization? (e.g. a bad connection between my chair and keyboard)

1) Is this a missed optimization bug, or is this a missing feature?
2) Either way, what are the steps to correct the issue?

Thanks in advance for insights and/or help!



PS: I would test with a newer 4.5.0 build, but I'm having trouble 
bootstrapping. Any help with that email (sent yesterday) would be 
appreciated as well.


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


Re: GCC 4.5 is uncompilable

2009-11-20 Thread Matt

Hey Dave,

What OS are you bootstrapping on, and with which compiler/version? 
(Cygwin, I assume, but you never know ;>)


I haven't been able to bootstrap for a few weeks, but no one answered my 
email for help (which probably got lost in the kernel-related fighting):

http://gcc.gnu.org/ml/gcc/2009-11/msg00476.html

Looking at the code in question, the uninitialized warning (reported 
as an error) definitely looks valid. I'm surprised no one else has 
been seeing this, unless they aren't bootstrapping with 4.4.1 or 
above.


Any help is appreciated -- I really want to get cracking on testing 4.5.

Thanks!




--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


df_changeable_flags use in combine.c

2010-01-04 Thread Matt

Hi,

I'm fixing some compiler errors that show up when configuring with 
--enable-build-with-cxx, and ran into a curious line of code that may 
indicate a bug:


static unsigned int
rest_of_handle_combine (void)
{
  int rebuild_jump_labels_after_combine;

  df_set_flags (DF_LR_RUN_DCE + DF_DEFER_INSN_RESCAN);
 // ...
}

The DF_* values are from the df_changeable_flags enum, whose values 
are typically combined with bitwise and/or operations for masking 
purposes. As such, I'm guessing the author meant the bitwise OR:

  df_set_flags (DF_LR_RUN_DCE | DF_DEFER_INSN_RESCAN);

I could have just added the explicit cast needed to silence the 
gcc-as-cxx warning I was running into, but I wanted to be a good 
citizen :)


Any pointers are appreciated,
Thanks!




--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


[gcc-as-cxx] enum conversion to int

2010-01-04 Thread Matt

Hi,

I'm trying to fix some errors/warnings to make sure that gcc-as-cxx 
doesn't bitrot too much. I ran into this issue, and am unsure how to 
fix it without really ugly casting:


enum df_changeable_flags
df_set_flags (enum df_changeable_flags changeable_flags)
{
  enum df_changeable_flags old_flags = df->changeable_flags;
  df->changeable_flags |= changeable_flags;
  return old_flags;
}

I'm getting this warning on the second line of the function:

../../gcc-trunk/gcc/df-core.c: In function 'df_changeable_flags 
df_set_flags(df_changeable_flags)':
../../gcc-trunk/gcc/df-core.c:474: error: invalid conversion from 
'int' to 'df_changeable_flags'


At first blush, it seems like df_changeable_flags should be a typedef 
for int (which is what it was being implicitly converted to 
everywhere), and the enum should be disbanded into individual 
#defines.


I wanted to make sure that this wasn't a warning false positive first, 
though.


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


Re: [gcc-as-cxx] enum conversion to int

2010-01-05 Thread Matt

On Tue, 5 Jan 2010, Ian Lance Taylor wrote:


Matt  writes:


I'm trying to fix some errors/warnings to make sure that gcc-as-cxx
doesn't bitrot too much. I ran into this issue, and am unsure how to
fix it without really ugly casting:

enum df_changeable_flags
df_set_flags (enum df_changeable_flags changeable_flags)
{
  enum df_changeable_flags old_flags = df->changeable_flags;
  df->changeable_flags |= changeable_flags;
  return old_flags;
}


On trunk df_set_flags looks like this:

int
df_set_flags (int changeable_flags)


Yes, what I pasted was a local change. I was trying to eliminate the 
implicit conversion between the enum type and int, which was causing 
my --enable-werror build to fail. At this point, I think the better 
option would be to break the enum values up into individual #defines 
and do a typedef int df_changeable_flags;



The gcc-in-cxx branch is no longer active.  All the work was merged to
trunk, where it is available via --enable-build-with-cxx.  If you want
to work on the gcc-in-cxx branch, start by merging from trunk.


Sorry, I didn't mean to imply I was working on the now-dead branch. I'm 
doing this work in trunk. I want the build-as-cxx option to work decently 
so that my profiledbootstrap exercises the C++ front-end more, since that 
is what we compile all our code with here. As such, I'm building trunk to 
eliminate some of the cxx failures, and will submit a patch once it either 
builds completely or I've hit a brick wall. This should (hopefully) make 
for less work when the more invasive changes are started once trunk is 
open again.


PS: of course, it would be even better if profiledbootstrap allowed me to 
point at our build's makefile to generate the runtime profile.

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


Re: [gcc-as-cxx] enum conversion to int

2010-01-05 Thread Matt

On Tue, 5 Jan 2010, Ian Lance Taylor wrote:


Matt  writes:


Yes, what I pasted was a local change. I was trying to eliminate the
implicit conversion between the enum type and int, which was causing
my --enable-werror build to fail. At this point, I think the better
option would be to break the enum values up into individual #defines
and do a typedef int df_changeable_flags;


Don't use #defines.  Enums give better debug info by default.  typedef
int df_changeable_flags is fine if that seems necessary.  Right now
the code simply doesn't use the df_changeable_flags type any time
there is more than one flag.


Okay, good to know about the better debuggability of enums. If the flags 
are supposed to be mutually exclusive, then the code in my other email 
where two flags are added together seems contrary.


Regardless, does this mean that the bitwise operations in set_flags 
and clear_flags could be changed to simple assignments? That would 
indeed fix this issue in a nice way.


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


Re: ICE building svn trunk on Ubuntu 9.x amd64

2009-06-25 Thread Matt

(now sending to gcc@ instead of gcc-help@, as suggested)

I have narrowed it down to this reduced commandline (the time is there 
just to show that it may take a while, but this particular issue doesn't 
cause a hang):


 m...@hargett-755:~/src/gcc-obj/prev-gcc$ time 
/home/matt/src/gcc-obj/./prev-gcc/xgcc 
-B/home/matt/src/gcc-obj/./prev-gcc/ 
-B/home/matt/x86_64-unknown-linux-gnu/bin/ 
-B/home/matt/x86_64-unknown-linux-gnu/bin/ 
-B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem 
/home/matt/x86_64-unknown-linux-gnu/include -isystem 
/home/matt/x86_64-unknown-linux-gnu/sys-include -c  -O2 
-ftree-loop-distribution -DIN_GCC -DHAVE_CONFIG_H -I. -I. 
-I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. 
-I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include 
-I../../gcc-trunk/gcc/../libdecnumber 
-I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -Iyes/include 
-Iyes/include -DCLOOG_PPL_BACKEND   ../../gcc-trunk/gcc/reload1.c -o 
reaload1.o

../../gcc-trunk/gcc/reload1.c: In function delete_output_reload:
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary 
expression

long unsigned int



long unsigned int

D.65146_650 = D.65145_651 - D.65141_624;

../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary 
expression

long unsigned int



long unsigned int

D.65154_658 = D.65153_659 - D.65149_647;

../../gcc-trunk/gcc/reload1.c:8391:1: internal compiler error: 
verify_stmts failed

Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

real9m25.630s
user9m23.823s
sys 0m0.972s


-O0 -ftree-loop-distribution doesn't exhibit the problem, and neither does 
-O1 -ftree-loop-distribution. There's something about the combination of 
-O2 (or -O3) and -ftree-loop-distribution that causes the ICE on this 
particular file.


I'll try bootstrapping without -ftree-loop-distribution and see if that 
works for me. If more information is needed, or I should file a bug 
report, let me know.


On Wed, 24 Jun 2009, Matt wrote:


Hi,

I left my profiled bootstrap of svn r148885 to run overnight, and saw this 
in the morning:


/home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ 
-B/home/matt/x86_64-unknown-linux-gnu/bin/ 
-B/home/matt/x86_64-unknown-linux-gnu/bin/ 
-B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem 
/home/matt/x86_64-unknown-linux-gnu/include -isystem 
/home/matt/x86_64-unknown-linux-gnu/sys-include -c  -O3 -floop-interchange 
-floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion 
-fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee 
-ftree-loop-linear -ftree-loop-distribution -ftree-loop-im 
-ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops 
-fprofile-generate -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wcast-qual -Wold-style-definition -Wc++-compat 
-Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. 
-I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include 
-I../../gcc-trunk/gcc/../libdecnumber 
-I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -Iyes/include 
-Iyes/include -DCLOOG_PPL_BACKEND   ../../gcc-trunk/gcc/rtl.c -o rtl.o

../../gcc-trunk/gcc/reload1.c: In function delete_output_reload:
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary 
expression

long unsigned int



long unsigned int

D.58046_964 = D.58045_963 - D.58041_946;

../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary 
expression

long unsigned int



long unsigned int

D.58054_972 = D.58053_971 - D.58049_967;

../../gcc-trunk/gcc/reload1.c:8391:1: internal compiler error: verify_stmts 
failed

Please submit a full bug report,
with preprocessed source if appropriate.

This is using the 4:4.4.0-3ubuntu1 version of Ubuntu's gcc package on amd64.

Here's my configure cmdline:
CFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block 
-findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm 
-fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear 
-ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt 
-funroll-loops -funswitch-loops" CPPFLAGS="-O3 -floop-interchange 
-floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion 
-fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee 
-ftree-loop-linear -ftree-loop-distribution -ftree-loop-im 
-ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops" 
../gcc-trunk/configure --prefix=/home/matt --enable-stage1-checking=all 
--enable-bootstrap --enable-lto --enable-languages=c,c++ --with-ppl 
--with-cloog


and here's my make cmdline:
make BOOT_CFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block 
-findi

Re: Phase 1 of gcc-in-cxx now complete

2009-06-26 Thread Matt



* Develop some trial patches which require C++, e.g., convert VEC to
 std::vector.


Do you have any ideas for the easiest starting points? Is there anywhere 
that is decently self-contained, or will it have to be a big bang?


I'd love to see this happen so there's more exercising of template 
expansion during the profiledbootstrap. If I can get pointed in the right 
direction, I can probably produce a patch within the next week.


Thanks for this work and adding all the extra warnings!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


4.1.1 profiledbootstrap failure on amd64

2006-05-23 Thread Matt
I get this failure when trying to do a profiledbootstrap on amd64. This is
a Gentoo Linux machine with gcc 3.4.4, glibc 2.3.5, binutils 2.16.1,
autoconf 2.59, etc.

make[6]: Entering directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
if [ -z "32" ]; then \
  true; \
else \
  rootpre=`${PWDCMD-pwd}`/; export rootpre; \
  srcrootpre=`cd ../../../gcc-4.1.1-20060517/libstdc++-v3;
${PWDCMD-pwd}`/; export srcrootpre; \
  lib=`echo ${rootpre} | sed -e 's,^.*/\([^/][^/]*\)/$,\1,'`; \
  compiler="/home/matt/src/gcc-bin/./gcc/xgcc
-B/home/matt/src/gcc-bin/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/
-B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem
/usr/local/x86_64-unknown-linux-gnu/include -isystem
/usr/local/x86_64-unknown-linux-gnu/sys-include"; \
  for i in `${compiler} --print-multi-lib 2>/dev/null`; do \
dir=`echo $i | sed -e 's/;.*$//'`; \
if [ "${dir}" = "." ]; then \
  true; \
else \
  if [ -d ../${dir}/${lib} ]; then \
flags=`echo $i | sed -e 's/^[^;]*;//' -e 's/@/ -/g'`; \
if (cd ../${dir}/${lib}; make "AR_FLAGS=rc" "CC_FOR_BUILD=gcc"
"CC_FOR_TARGET=/home/matt/src/gcc-bin/./gcc/xgcc
-B/home/matt/src/gcc-bin/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/
-B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem
/usr/local/x86_64-unknown-linux-gnu/include -isystem
/usr/local/x86_64-unknown-linux-gnu/sys-include" "CFLAGS=-O2 -g -O2 "
"CXXFLAGS=-g -O2  -D_GNU_SOURCE" "CFLAGS_FOR_BUILD=-g -O2"
"CFLAGS_FOR_TARGET=-O2 -g -O2 " "INSTALL=/usr/bin/install -c"
"INSTALL_DATA=/usr/bin/install -c -m 644"
"INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c"
"LDFLAGS=" "LIBCFLAGS=-O2 -g -O2 " "LIBCFLAGS_FOR_TARGET=-O2 -g -O2 "
"MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500
--split-size=500" "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh"
"RUNTESTFLAGS=" "exec_prefix=/usr/local" "infodir=/usr/local/info"
"libdir=/usr/local/lib" "includedir=/usr/local/include"
"prefix=/usr/local" "tooldir=/usr/local/x86_64-unknown-linux-gnu"
"gxx_include_dir=/usr/local/include/c++/4.1.1" "AR=ar"
"AS=/home/matt/src/gcc-bin/./gcc/as"
"LD=/home/matt/src/gcc-bin/./gcc/collect-ld" "RANLIB=ranlib"
"NM=/home/matt/src/gcc-bin/./gcc/nm" "NM_FOR_BUILD=" "NM_FOR_TARGET=nm"
"DESTDIR=" "WERROR=" \
CFLAGS="-O2 -g -O2  ${flags}" \
FCFLAGS=" ${flags}" \
FFLAGS=" ${flags}" \
ADAFLAGS=" ${flags}" \
prefix="/usr/local" \
exec_prefix="/usr/local" \
GCJFLAGS=" ${flags}" \
CXXFLAGS="-g -O2  -D_GNU_SOURCE ${flags}" \
LIBCFLAGS="-O2 -g -O2  ${flags}" \
LIBCXXFLAGS="-g -O2  -D_GNU_SOURCE
-fno-implicit-templates ${flags}" \
LDFLAGS=" ${flags}" \
MULTIFLAGS="${flags}" \
DESTDIR="" \
    INSTALL="/usr/bin/install -c" \
INSTALL_DATA="/usr/bin/install -c -m 644" \
INSTALL_PROGRAM="/usr/bin/install -c" \
INSTALL_SCRIPT="/usr/bin/install -c" \
all); then \
  true; \
else \
  exit 1; \
    fi; \
  else true; \
  fi; \
fi; \
  done; \
fi
make[7]: Entering directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/32/libstdc++-v3'
make[7]: *** No rule to make target `all'.  Stop.
make[7]: Leaving directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/32/libstdc++-v3'
make[6]: *** [multi-do] Error 1
make[6]: Leaving directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[5]: *** [all-multi] Error 2
make[5]: Leaving directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[3]: *** [all] Error 2
make[3]: Leaving directory
`/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[2]: *** [all-target-libstdc++-v3] Error 2
make[2]: Leaving directory `/home/matt/src/gcc-bin'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/matt/src/gcc-bin'
make: *** [profiledbootstrap] Error 2


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


Re: build failure, GMP not available

2006-11-16 Thread Matt Fago
I have been struggling with this issue, and now that I have  
successfully built GCC I thought I would share my results. Hopefully  
it can help someone better versed in autotools to improve the build  
of GCC with GMP/MPFR.


For reference, a few older threads I've found:
	http://gcc.gnu.org/ml/gcc/2006-01/msg00333.html
	http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg00723.html


The long and short of it: my builds of the latest versions of GMP and 
MPFR were perfectly fine, although not ideal for building GCC. 
However, the GCC 4.1.1 configure script incorrectly decided that it 
_had_ located useful copies of GMP and MPFR, while in fact the 
GFortran build failed 90 minutes later with the error message (as in 
the second thread above):


"../.././libgfortran/mk-kinds-h.sh: Unknown type"

This was configuring GCC via:

	../srcdir/configure --with-gmp=/usr/local/lib64 --with-mpfr=/usr/local/lib64


I now understand that this is a misuse of these options; however, 
recall that configure was successful (I still do not understand why), 
while configure failed with the 'correct' options 
'--with-gmp=/usr/local --with-mpfr=/usr/local' (because the *.h are 
in /usr/local/include, but the *.a are in /usr/local/lib64).


I was finally successful by using the build-directories rather than  
the installed libraries via:


	../srcdir/configure --with-gmp-dir=/usr/local/gmp --with-mpfr-dir=/usr/local/mpfr


but only after I made the symlink:

ln -s /usr/local/mpfr/.libs/libmpfr.a /usr/local/mpfr/libmpfr.a

One issue here is that '--with-mpfr=path' assumes that 'libmpfr.a' is 
in 'path/lib' (not true for how I installed it), while 
'--with-mpfr-dir=path' assumes that 'libmpfr.a' is in 'path', rather 
than 'path/.libs' (can this work for anyone?). Note that 
'--with-gmp-dir=path' does look in 'path/.libs'.


This is all on RHEL4 x86_64. Note I am new to x86_64 and multilibs -- 
this certainly added to my difficulties. The machine does have older 
versions of GMP and MPFR installed in /usr/lib and /usr/lib64, while 
I had installed the latest versions in /usr/local (with the libraries 
in /usr/local/lib64). I would also note that GMP unfortunately 
hard-codes the bitness of the libraries in gmp.h, and that the older 
system /usr/include/gmp.h identifies itself as 64-bit (there are no 
#define switches as I would have expected).



My comments:

1) It would have been very useful to have explicit configure options  
such as --with-gmp-lib=path and --with-gmp-include=path (etc) that  
explicitly locate the *.a and *.h directories, rather than (or in  
addition to) the existing "install directory" and "build directory"   
options.


2) Ideally IMHO the top-level configure (or at least the libgfortran  
configure) would test the execution of some or all of the required  
functions in GMP/MPFR. I vaguely recall that this is possible with  
autoconf, and should be more robust. Would it add too much complexity  
to the top-level configure?




 Thanks,
 - Matt


Re: build failure, GMP not available

2006-11-17 Thread Matt Fago
>From: "Kaveh R. GHAZI" <[EMAIL PROTECTED]>
>> Matt Fago wrote:
>> One issue here is that '--with-mpfr=path' assumes that 'libmpfr.a' is
>> in 'path/lib' (not true for how I installed it), while
>> '--with-mpfr-dir=path' assumes that 'libmpfr.a' is in 'path', rather
>> than 'path/.libs' (can this work for anyone?). Note that
>> '--with-gmp-dir=path' does look in 'path/.libs'.
>
>This problem appears in the 4.0 series all the way through current
>mainline.  I do believe it should be fixed and it is simple to do so. I'll
>take care of it.
>
>> My comments:
>>
>> 1) It would have been very useful to have explicit configure options
>> such as --with-gmp-lib=path and --with-gmp-include=path (etc) that
>> explicitly locate the *.a and *.h directories, rather than (or in
>> addition to) the existing "install directory" and "build directory"
>> options.
>
>Yes, the configure included in mpfr itself has this for searching for GMP
>which it relies on.  I'll add something for this in GCC also.

Thank you.

>> 2) Ideally IMHO the top-level configure (or at least the libgfortran
>> configure) would test the execution of some or all of the required
>> functions in GMP/MPFR. I vaguely recall that this is possible with
>> autoconf, and should be more robust. Would it add too much complexity
>> to the top-level configure?
>
>I tend to be reluctant about run tests because they don't work with a
>cross-compiler.  Would you please tell me specifically what problem
>checking at runtime would prevent that the existing compile test doesn't
>detect?

Yes, a cross-compiler could not do runtime tests. I was trying to think of a 
more robust configuration-time test. This is difficult, as I do not quite 
understand why configure was successful in finding the libraries with the 
correct versions, yet the compilation itself failed. Would a link test 
against all of the required GMP/MPFR functions (via AC_CHECK_LIB etc.) 
offer anything?

 Thanks,
 - Matt


Re: Bootstrap broken on x86_64 on the trunk in libgfortran?

2006-11-30 Thread Matt Fago

>> ../../../trunk/libgfortran/mk-kinds-h.sh: Unknown type
>> grep '^#' < kinds.h > kinds.inc
>> /bin/sh: kinds.h: No such file or directory
>> make[2]: *** [kinds.inc] Error 1
>> make[2]: Leaving directory 
>> `/home/daney/gccsvn/native-trunk/x86_64-unknown-linux-gnu/libgfortran'
>> make[1]: *** [all-target-libgfortran] Error 2
>> make[1]: Leaving directory `/home/daney/gccsvn/native-trunk'
>> make: *** [all] Error 2
>
>Usually (like 99% of the time), this means your GMP/MPFR is broken
>and is causing gfortran to crash out.

I think the patch concept below may help with these issues. The idea
was to make configure try to link against libmpfr using functions only in 
mpfr 2.2.0 or greater that GCC currently uses (those that I could find, 
anyhow). Previously, configure could succeed if any version of libmpfr was 
available so long as the header was the correct version (this is likely on 
x86_64).

Please excuse any formatting issues -- this is my first patch. I have neither
SVN access nor a copyright assignment, but this is a short patch. Would
someone be willing to help test and possibly apply? 

Thanks!
Matt

--- configure.in	(Revision 119232)
+++ configure.in	(Working Copy)
@@ -1123,7 +1123,12 @@ if test x"$have_gmp" = xyes; then
 #if MPFR_VERSION_MAJOR < 2 || (MPFR_VERSION_MAJOR == 2 && MPFR_VERSION_MINOR < 2)
   choke me
 #endif
-  mpfr_t n; mpfr_init(n);
+  int t;
+  mpfr_t n, x;
+  mpfr_init (n); mpfr_init (x);
+  mpfr_atan2 (n, n, x, GMP_RNDN);
+  mpfr_erfc (n, x, GMP_RNDN);
+  mpfr_subnormalize (x, t, GMP_RNDN);
 ], [AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]); have_gmp=no])
   LIBS="$saved_LIBS"
 fi




Re: mpfr issues when Installing gcc 3.4 on fedora core

2007-01-03 Thread Matt Fago

You do mean gcc 4.3 right (either a snapshot, or from svn)?

Since you're running on x86_64, do you know that the libraries are 
the correct bitness? (Running 'file' on the mpfr and gmp libraries 
will tell.) By default gcc on x86_64 will build 64-bit, but libraries 
in /usr/local/lib should only be 32-bit (versus /usr/local/lib64). 
The linker will ignore any 32-bit libraries when linking a 64-bit 
executable. How did you install gmp/mpfr (note the package from 
Fedora is broken -- very old)?


It took me quite a while to get 4.1 with Fortran installed on RHEL 
until I got this all sorted out (I was new to multilibs). I just 
upgraded to FC6 and was able to install gcc from svn once I used 
--with-gmp-lib=/usr/local/lib64 (etc. for include and mpfr) and set 
LD_LIBRARY_PATH=/usr/local/lib64 appropriately. Alternatively, one 
could (carefully!) set up /etc/ld.so.conf and run ldconfig (I did 
this on RHEL).


I might be able to help tomorrow AM (US mountain time) if you email  
me directly.


FWIW, I understand the reason to keep mpfr out of the gcc tree, but
doing so makes gcc more difficult to bootstrap for a novice such as
myself.  Fedora's outdated gmp/mpfr package doesn't help either ...



 - Matt



Re: mpfr issues when Installing gcc 3.4 on fedora core

2007-01-04 Thread Matt Fago
> drizzle drizzle wrote:
>And as matt suggested if mpfr is not needed by 3.4, how can I
>configure it that way. --disable -mpfr did not help.

MPFR should not have _anything_ to do with any gcc prior to 4.x. Where did you 
get gcc 3.4? A tarball from a gnu mirror or somewhere else? I think either the 
tarball is misnamed or something is terribly wrong with it. 

>checking if gmp.h version and libgmp version are the same... (4.2.1/4.1.4) no
>configure: WARNING: 'gmp.h' and 'libgmp' seems to have different versions or
>configure: WARNING: we cannot run a program linked with GMP (if you cannot
>configure: WARNING: see the version numbers above).
>configure: WARNING: However since we can't use 'libtool' inside the configure,
>configure: WARNING: we can't be sure. See 'config.log' for details.

This means that mpfr needs to be told where gmp is and was probably not built 
correctly. When you configure mpfr use the options:

  --with-gmp-include=DIR  GMP include directory
  --with-gmp-lib=DIR GMP lib directory

Make sure these point to the lib and include directories with the new version 
of gmp. You can also use:

 --libdir=/usr/local/lib64 

if you wish to install the 64-bit libraries there instead of ../lib.

Note that fedora installs a 'bad' version of gmp 4.1.4 that includes a very old 
copy of mpfr. You seem to be picking up the library from this one.

 - Matt


Re: mpfr issues when Installing gcc 3.4 on fedora core

2007-01-04 Thread Matt Fago
>From: drizzle drizzle <[EMAIL PROTECTED]>
>Still no luck so far .. I got the gcc3.4 from the gcc archive. Any way
>I can make gcc 3.4 not use these libraries ?


What is the exact file name and URL? I will download the same tarball and try 
to build it on my fc6 box.

 - M


Re: mpfr issues when Installing gcc 3.4 on fedora core

2007-01-04 Thread Matt Fago
>From: drizzle drizzle <[EMAIL PROTECTED]>
>
>svn -q checkout svn://gcc.gnu.org/svn/gcc/trunk gcc_3_4_6_release

This is checking out the latest trunk, not version 3.4. The last argument only 
changes the name of the directory on your local machine. The path in the URL is 
what specifies the branch or tag (in this case 'trunk').
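Assuming the standard GCC SVN repository layout, the two checkouts would look like this:

```sh
# trunk (what the original command actually fetched)
svn -q checkout svn://gcc.gnu.org/svn/gcc/trunk gcc-trunk

# the 3.4.6 release tag
svn -q checkout svn://gcc.gnu.org/svn/gcc/tags/gcc_3_4_6_release gcc-3.4.6
```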

 - Matt



gcc gcov and --coverage on x86_64

2007-03-08 Thread Matt Fago
Having searched in bugzilla and asked on gcc-help to no avail ...

gcc --coverage appears to be broken on x86_64 in gcc 4.1.1 on FC6 (works fine 
with Trunk). I'm almost certain that this is a known issue, but cannot find a 
reference in Bugzilla.

Could someone please give me a pointer to the bug?

Thanks,
Matt


Re: gcc gcov and --coverage on x86_64

2007-03-14 Thread Matt Fago

>From: Ben Elliston <[EMAIL PROTECTED]>
>> gcc --coverage appears to be broken on x86_64 in gcc 4.1.1 on FC6
>> (works fine with Trunk). I'm almost certain that this is a known
>> issue, but cannot find a reference in Bugzilla.
>
>I implemented that option, so can probably help you.  Contact me in
>private mail and we'll try and troubleshoot it.  If necessary, you can
>then file a bug report.

FYI, this is an issue with ccache and not gcc (I forgot about that possibility).

Guess it's time to dig into ccache.

Thanks,
Matt



VAX backend status

2007-04-01 Thread Matt Thomas

Over the past several weeks, I've revamped the VAX backend:

 - fixed various bugs
 - improved 64bit move, add, subtract code.
 - added patterns for ffs, bswap16, bswap32, sync_lock_test_and_set, and
   sync_lock_release
 - modified it to generate PIC code.
 - fixed the dwarf2 output so it is readonly in shared libraries.
 - moved the constraints from vax.h to constraints.md
 - moved predicates to predicates.md
 - added several peephole and peephole2 patterns

So the last major change to make the VAX backend completely modern is to
remove the need for "HAVE_cc0".  However, even instructions that modify
the CC don't always change all the CC bits; some instructions preserve
certain bits.  I'd like to do this but currently it's above my level of
gcc expertise.

Should the above be submitted as one megapatch?  Or as a dozen or two
smaller patches?

And finally a few musings ...

I've noticed a few things in doing the above.  GCC 4.x doesn't seem to
do CSE on addresses.  Because the VAX binutils doesn't support non-local
symbols with a non-zero addend in the GOT, PIC will do a define_expand
so that (const (plus (symbol_ref) (const_int))) will be split into
separate instructions.  However, gcc doesn't seem to be able to take
advantage of that.  For instance, gcc emits:

movab rpb,%r0
movab 100(%r0),%r1
cvtwl (%r1),%r0

but the movab 100(%r0),%r1 is completely unneeded; this should have
been emitted as:

movab rpb,%r0
cvtwl 100(%r0),%r0

I could add peepholes to find these and fix them but it would be nice
if the optimizer could do that for me.

Another issue is that gcc has become "stupider" when it comes to using
indexed addressing.  For example:

static struct { void (*func)(void *); void *arg; int inuse; } keys[64];

int nextkey;

int
setkey(void (*func)(void *), void *arg)
{
int i;
for (i = nextkey; i < 64; i++) {
if (!keys[i].inuse)
goto out;
}

emits:

movl nextkey,%r3
cmpl %r3,$63
jgtr .L38
mull3 %r3,$12,%r0
movab keys+8[%r0],%r0
tstl (%r0)

The last 3 instructions should have been:

mull3 %r3,$3,%r0
tstl keys+8[%r0]




[RFA] Invalid mmap(2) assumption in pch (ggc-common.c)

2005-04-23 Thread Matt Thomas
Running the libstdc++ testsuite on NetBSD/sparc or NetBSD/sparc64
results in most tests failing like:
:1: fatal error: had to relocate PCH
compilation terminated.
compiler exited with status 1
This is due to a misassumption in ggc-common.c:654
(mmap_gt_pch_use_address):
   This version assumes that the kernel honors the START operand of mmap
   even without MAP_FIXED if START through START+SIZE are not currently
   mapped with something.
That is not true for NetBSD.  Due to MMU idiosyncrasies, some architectures
(like sparc and sparc64) will align mmap requests that don't have MAP_FIXED
set, for architecture-specific reasons.
Is there a reason why MAP_FIXED isn't used even though it probably
should be?
--
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


[PATCH] VAX: cleanup; move macros from config/vax/vax.h to normal in config/vax/vax.c

2005-04-26 Thread Matt Thomas
This doesn't change any functionality, it just moves and cleans up a
large number of complicated macros in vax.h to normal C code in vax.c.
It's the first major step to integrating PIC support that I did for
gcc 2.95.3.  It also switches from using SYMBOL_REF_FLAG to
SYMBOL_REF_LOCAL_P.
Committed.
--
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.
2005-03-26  Matt Thomas <[EMAIL PROTECTED]>

* config/vax/vax.c (legitimate_constant_address_p): New.  Formerly
CONSTANT_ADDRESS_P in config/vax/vax.h
(legitimate_constant_p): New.  Formerly CONSTANT_P in vax.h. 
(INDEX_REGISTER_P): New.
(BASE_REGISTER_P): New.
(indirectable_constant_address_p): New.  Adapted from
INDIRECTABLE_CONSTANT_ADDRESS_P in vax.h.
Use SYMBOL_REF_LOCAL_P.
(indirectable_address_p): New.  Adapted from
INDIRECTABLE_ADDRESS_P in vax.h.
(nonindexed_address_p): New.  Adapted from
GO_IF_NONINDEXED_ADDRESS in vax.h.
(index_temp_p): New.  Adapted from
INDEX_TERM_P in vax.h.
(reg_plus_index_p): New.  Adapted from
GO_IF_REG_PLUS_INDEX in vax.h.
(legitimate_address_p): New.  Adapted from
GO_IF_LEGITIMATE_ADDRESS in vax.h
(vax_mode_dependent_address_p): New.  Adapted from
GO_IF_MODE_DEPENDENT_ADDRESS in vax.h
* config/vax/vax.h (CONSTANT_ADDRESS_P): Use
legitimate_constant_address_p
(CONSTANT_P): Use legitimate_constant_p.
(INDIRECTABLE_CONSTANT_ADDRESS_P): Removed.
(INDIRECTABLE_ADDRESS_P): Removed.
(GO_IF_NONINDEXED_ADDRESS): Removed.
(INDEX_TEMP_P): Removed.
(GO_IF_REG_PLUS_INDEX): Removed.
(GO_IF_LEGITIMATE_ADDRESS): Use legitimate_address_p.
Two definitions, depending on whether REG_OK_STRICT is defined.
(GO_IF_MODE_DEPENDENT_ADDRESS): Use vax_mode_dependent_address_p.
Two definitions, depending on whether REG_OK_STRICT is defined.
* config/vax/vax-protos.h (legitimate_constant_address_p): Prototype
added.
(legitimate_constant_p): Prototype added.
(legitimate_address_p): Prototype added.
(vax_mode_dependent_address_p): Prototype added.


Index: vax.c
===
RCS file: /cvs/gcc/gcc/gcc/config/vax/vax.c,v
retrieving revision 1.60
diff -u -3 -p -r1.60 vax.c
--- vax.c   7 Apr 2005 21:44:57 -   1.60
+++ vax.c   26 Apr 2005 20:45:42 -
@@ -1100,3 +1100,227 @@ vax_output_conditional_branch (enum rtx_
 }
 }
 
+/* 1 if X is an rtx for a constant that is a valid address.  */
+
+int
+legitimate_constant_address_p (rtx x)
+{
+  return (GET_CODE (x) == LABEL_REF || GET_CODE (x) == SYMBOL_REF
+ || GET_CODE (x) == CONST_INT || GET_CODE (x) == CONST
+ || GET_CODE (x) == HIGH);
+}
+
+/* Nonzero if the constant value X is a legitimate general operand.
+   It is given that X satisfies CONSTANT_P or is a CONST_DOUBLE.  */
+
+int
+legitimate_constant_p (rtx x ATTRIBUTE_UNUSED)
+{
+  return 1;
+}
+
+/* The other macros defined here are used only in legitimate_address_p ().  */
+
+/* Nonzero if X is a hard reg that can be used as an index
+   or, if not strict, if it is a pseudo reg.  */
+#define INDEX_REGISTER_P(X, STRICT) \
+  (GET_CODE (X) == REG && (!(STRICT) || REGNO_OK_FOR_INDEX_P (REGNO (X))))
+
+/* Nonzero if X is a hard reg that can be used as a base reg
+   or, if not strict, if it is a pseudo reg.  */
+#define BASE_REGISTER_P(X, STRICT) \
+  (GET_CODE (X) == REG && (!(STRICT) || REGNO_OK_FOR_BASE_P (REGNO (X))))
+
+#ifdef NO_EXTERNAL_INDIRECT_ADDRESS
+
+/* Re-definition of CONSTANT_ADDRESS_P, which is true only when there
+   are no SYMBOL_REFs for external symbols present.  */
+
+static int
+indirectable_constant_address_p (rtx x)
+{
+  if (!CONSTANT_ADDRESS_P (x))
+return 0;
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP ((x), 0)) == PLUS)
+x = XEXP (XEXP (x, 0), 0);
+  if (GET_CODE (x) == SYMBOL_REF && !SYMBOL_REF_LOCAL_P (x))
+return 0;
+
+  return 1;
+}
+
+#else /* not NO_EXTERNAL_INDIRECT_ADDRESS */
+
+static int
+indirectable_constant_address_p (rtx x)
+{
+  return CONSTANT_ADDRESS_P (x);
+}
+
+#endif /* not NO_EXTERNAL_INDIRECT_ADDRESS */
+
+/* Nonzero if X is an address which can be indirected.  External symbols
+   could be in a sharable image library, so we disallow those.  */
+
+static int
+indirectable_address_p (rtx x, int strict)
+{
+  if (indirectable_constant_address_p (x))
+return 1;
+  if (BASE_REGISTER_P (x, strict))
+return 1;
+  if (GET_CODE (x) == PLUS
+  && BASE_REGISTER_P (XEXP (x, 0), stric

GCC 4.1: Buildable on GHz machines only?

2005-04-26 Thread Matt Thomas
Over the past month I've been making sure that GCC 4.1 works on NetBSD.
I've completed bootstraps on sparc, sparc64, arm, x86_64, i386, alpha,
mipsel, mipseb, and powerpc.  I've done cross-build targets for vax.
Results have been sent to gcc-testsuite.
The times to complete bootstraps on older machines have been bothering me.
It took nearly 72 hours for a 233MHz StrongARM with 64MB to complete a
bootstrap (with libjava).  It took over 48 hours for a 120MHz MIPS R4400
(little endian) with 128MB to finish (without libjava) and a bit over 24
hours for a 250MHz MIPS R4400 (big endian) with 256MB to finish (again,
no libjava).  That doesn't even include the time to run the testsuites.
I have a 50MHz 68060 with 96MB of memory (MVME177) approaching 100 hours
(48 hours just to exit stage3 and start on the libraries) doing a bootstrap,
knowing that it's going to die when doing the ranlib of libjava.  The kernel
for the 060 isn't configured with a large enough dataspace to complete the
ranlib.
Most of the machines I've listed above are relatively powerful machines
near the apex of performance for their target architecture.  And yet GCC 4.1
can barely be bootstrapped on them.
I do most of my GCC work on a 2GHz x86_64 because it's so fast.  I'm afraid
the widespread availability of such fast machines hides the fact that the
current performance of GCC on older architectures is appalling.
I'm going to run some bootstraps with --disable-checking just to see how
much faster they are.  I hope I'm going to be pleasantly surprised, but I'm
not counting on it.
--
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.



Re: GCC 4.1: Buildable on GHz machines only?

2005-04-26 Thread Matt Thomas
Richard Henderson wrote:
> On Tue, Apr 26, 2005 at 10:57:07PM -0400, Daniel Jacobowitz wrote:
> 
>>I would expect it to be drastically faster.  However this won't show up
>>clearly in the bootstrap.  The, bar none, longest bit of the bootstrap
>>is building stage2; and stage1 is always built with optimization off and
>>(IIRC) checking on.
> 
> 
> Which is why I essentially always supply STAGE1_CFLAGS='-O -g' when
> building on risc machines.

Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was
already doing) only decreased the bootstrap time by 10%.  By far, the
longest bit of the bootstrap is building libjava.

-- 
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


[RFA] Which is better? More and simplier patterns? Fewer patterns with more embedded code?

2005-04-26 Thread Matt Thomas
Back when I modified gcc 2.95.3 to produce PIC code for NetBSD/vax, I changed
the patterns in vax.md to be more specific about the instructions that got
matched.  One advantage (to me as the writer) was that it made it much easier
to track down which pattern caused which instruction to be emitted.

For instance:

(define_insn "*pushal"
  [(set (match_operand:SI 0 "push_operand" "=g")
        (match_operand:SI 1 "address_operand" "p"))]
  ""
  "pushal %a1")

I like the more-and-simpler-patterns approach, but I'm wondering what
the general recommendation is.
-- 
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


Re: GCC 4.1: Buildable on GHz machines only?

2005-04-26 Thread Matt Thomas
Gary Funck wrote:
> 
>>-Original Message-
>>From: Matt Thomas
>>Sent: Tuesday, April 26, 2005 10:42 PM
> 
> [...]
> 
>>Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was
>>already doing) only decreased the bootstrap time by 10%.  By far, the
>>longest bit of the bootstrap is building libjava.
>>
> 
> 
> Is it fair to compare current build times, with libjava included,
> against past build times when it didn't exist?  Would a closer
> apples-to-apples comparison be to bootstrap GCC Core only on
> the older sub Ghz platforms?

libjava is built on everything but vax and mips.  Bootstrapping just core
might be better, but with the way configure is done on the fly now, it's not
as easy as it used to be.

It would be nice if bootstrap emitted timestamps when it was started
and when it completed a stage so one could just look at the make output.
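One crude way to get per-stage timestamps without touching the build machinery is to drive the stages separately and bracket each with `date`; the stage target names below are placeholders and depend on the GCC version's toplevel bootstrap support:

```sh
# Hypothetical per-stage timing wrapper (target names vary by release)
for target in stage1-bubble stage2-bubble stage3-bubble; do
    date
    make $target || break
done
date
```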

Regardless, GCC4.1 is a computational pig.
-- 
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


Re: GCC 4.1: Buildable on GHz machines only?

2005-04-27 Thread Matt Thomas
David Edelsohn wrote:
>>>>>>Matt Thomas writes:
> 
> 
> Matt> Regardless, GCC4.1 is a computational pig.
> 
>   If you are referring to the compiler itself, this has no basis in
> reality.  If you are referring to the entire compiler collection,
> including runtimes, you are not using a fair comparison or are making
> extreme statements without considering the cause.

When I see the native stage2 m68k compiler spend 30+ minutes compute bound
with no paging activity compiling a single source file, I believe
that is an accurate term.  Compiling stage3 on a 50MHz 68060 took 18 hours.
(That 30 minutes was for fold-const.c if you care to know).

At some points, I had no idea whether GCC had just gone into an infinite
loop due a bug or was actually doing what it was supposed to.

>   GCC now supports C++, Fortran 90 and Java.  Those languages have
> extensive, complicated runtimes.  The GCC Java environment is becoming
> much more complete and standards compliant, which means adding more and
> more features.

That's all positive but if GCC also becomes too expensive to build then
all those extra features become worthless.  What is the slowest system
that GCC has been recently bootstrapped on?

>   If your point is that fully supporting modern, richly featured
> languages results in a longer build process, that is correct.  Using
> disparaging terms like "pig" is missing the point.  As others have pointed
> out, if you do not want to build some languages and runtimes, you can
> disable them.  GCC is providing features that users want and that has a
> cost.

Yes, they have a cost, but the cost is masked by running on fast processors.
They are just so fast that they can hide inefficiencies and bloat.  We have
seen that for NetBSD, and it's just as true for GCC or any other software.
These slower processors provide useful feedback, but only if a GCC bootstrap
is attempted on them on a semi-regular basis.

Am I the only person who has attempted to do a native bootstrap on a system
as slow as an m68k?  I thought about doing a bootstrap on a MicroSPARC-based
system, but instead I decided to use an UltraSPARC-IIi system running with a
32-bit kernel.
-- 
Matt Thomas     email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


Re: GCC 4.1: Buildable on GHz machines only?

2005-04-27 Thread Matt Thomas
Jonathan Wakely wrote:
> On Wed, Apr 27, 2005 at 08:05:39AM -0700, Matt Thomas wrote:
> 
> 
>>David Edelsohn wrote:
>>
>>
>>> GCC now supports C++, Fortran 90 and Java.  Those languages have
>>>extensive, complicated runtimes.  The GCC Java environment is becoming
>>>much more complete and standards compliant, which means adding more and
>>>more features.
>>
>>That's all positive but if GCC also becomes too expensive to build then
>>all those extra features become worthless.
> 
> 
> Worthless to whom?

To users of a platform who can no longer afford to build GCC.

> The features under discussion are new, they didn't exist before.

And because they never existed before, their cost on older platforms
may not have been correctly assessed.  If no one builds natively on
older platforms, the recognition that the new features may be a problem
for older platforms will never be made.

> If you survived without them previously you can do so now.
> (i.e. don't build libjava if your machine isn't capable of it)

Yes, you can skip building libjava.  But can you skip building GCC?
Will GCC 3.x be supported forever?  If not, your compiler may have
to rely on being cross-built.  Being able to do a bootstrap is useful
and is part of the expected GCC testing, but when it can only be
done once or twice a week, it becomes a less practical test method.

> But claiming it's "worthless" when plenty of people are using it is
> just, well ... worthless.

Depends on your point of view.
-- 
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


Re: GCC 4.1: Buildable on GHz machines only?

2005-04-27 Thread Matt Thomas
Mike Stump wrote:
> On Apr 26, 2005, at 11:12 PM, Matt Thomas wrote:
>> It would be nice if bootstrap emitted timestamps when it was started
>> and when it completed a stage so one could just look at the make output.
>
> You can get them differenced for free by using:
>   time make bootstrap

I know that.  But that only works overall.  I want the per-stage
times.  Here's a sparc64--netbsd full bootstrap including libjava
(the machine has 640MB and was doing nothing but building gcc):
  25406.01 real   21249.17 user   6283.15 sys
         0  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
  54689526  page reclaims
      5349  page faults
       110  swaps
       723  block input operations
    377302  block output operations
        52  messages sent
        52  messages received
    285329  signals received
   1037478  voluntary context switches
    253151  involuntary context switches
--
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


Re: GCC 4.1: Buildable on GHz machines only?

2005-04-28 Thread Matt Thomas

Someone complained I was unfair in my gcc bootstrap times since
some builds included libjava/gfortran and some did not.

So in the past day, I've done bootstrap with just c,c++,objc on
both 3.4 and gcc4.1.  I've put the results in a web page at
http://3am-software.com/gcc-speed.html.  The initial bootstrap
compiler was gcc3.3 and they are all running off the same base
of NetBSD 3.99.3.

While taking out fortran and java reduced the disparity, there
is still a large increase in bootstrap times from 3.4 to 4.1.
-- 
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.


Re: GCC 4.1: Buildable on GHz machines only?

2005-04-29 Thread Matt Thomas
Joe Buck wrote:
> I think you need to talk to the binutils people.  It should be possible
> to make ar and ld more memory-efficient.

Even though systems may be demand-paged, having super-large libraries
that consume lots of address space can be a problem.

I'd like libjava to be split into multiple shared libraries.
In C, we have libc, libm, libpthread, etc.  In X11, there's X11, Xt, etc.
So why does Java have everything in one shared library?  Could
the Swing stuff be moved to its own?  Are there other logical
divisions?

Unlike other modern systems with a two-level page table structure,
the VAX uses a single level of page table indirection.  This greatly
reduces the amount of address space a process can efficiently use.  If
there are components that will not be needed by some Java programs, it
would be nice if they could be separated into their own shared libraries.
-- 
Matt Thomas email: [EMAIL PROTECTED]
3am Software Foundry  www: http://3am-software.com/bio/matt/
Cupertino, CA  disclaimer: I avow all knowledge of this message.



Use $(VARRAY_H) in dependencies?

2005-05-08 Thread Matt Kraai
Howdy,

The rules for c-objc-common.o, loop-unroll.o, and tree-inline.o
include $(VARRAY_H), which is never defined, in their dependency
lists.  The rest of the targets that depend on varray.h include
varray.h in their dependency list.

varray.h includes machmode.h, system.h, coretypes.h, and tm.h, so
Makefile.in should define and use VARRAY_H, right?

-- 
Matt


signature.asc
Description: Digital signature


Re: Use $(VARRAY_H) in dependencies?

2005-05-08 Thread Matt Kraai
On Sun, May 08, 2005 at 07:31:38PM -0700, Matt Kraai wrote:
> On Mon, May 09, 2005 at 03:03:23AM +0100, Paul Brook wrote:
> > On Monday 09 May 2005 02:26, Matt Kraai wrote:
> > > Howdy,
> > >
> > > The rules for c-objc-common.o, loop-unroll.o, and tree-inline.o
> > > include $(VARRAY_H), which is never defined, in their dependency
> > > lists.  The rest of the targets that depend on varray.h include
> > > varray.h in their dependency list.
> > >
> > > varray.h includes machmode.h, system.h, coretypes.h, and tm.h, so
> > > Makefile.in should define and use VARRAY_H, right?
> > 
> > Already one step ahead of you :-)
> > 
> > 2005-05-07  Paul Brook  <[EMAIL PROTECTED]>
> > 
> > * Makefile.in: Fix dependencies.
> > (GCOV_IO_H, VARRAY_H): Set.
> 
> Great.

The dependencies for the rules for build/genautomata.o,
build/varray.o, and gtype-desc.o still include varray.h instead of
$(VARRAY_H).  Is this on purpose?  If so, why?

-- 
Matt


signature.asc
Description: Digital signature


Targets

2005-12-29 Thread Matt Ritchie
Hello:
I was wondering if the team could add the following
targets to GCC/G++/G77:

Basically, make it even more cross-platform compliant
and emulator friendly,
e.g. add the following CPU series: 8080, Z80, 6502,
6800, and CP/M-8000? :)
Maybe OS-specific libraries too (e.g. CP/M-86).
Also, does G77 support FORTRAN 66?

PS: Can I help in any way (testing the mingw port)? I
don't have Linux/BSD/Unix/VMS/OS/2 or Mac, just
Windows and DOS.

Matt Ritchie 


bounty available for porting AVR backend to MODE_CC

2020-02-23 Thread Matt Wette

Hi All,

I don't subscribe, but I wanted developers to know there is a bounty
available for porting the gcc AVR backend to use MODE_CC.  Here is the
reference:

https://www.bountysource.com/issues/84630749-avr-convert-the-backend-to-mode_cc-so-it-can-be-kept-in-future-releases

And this is a reference to the discussion on avrfreaks.net:

https://www.avrfreaks.net/forum/avr-gcc-and-avr-g-are-deprecated-now

Matt



Function attribute((optimize(...))) ignored on inline functions?

2015-07-30 Thread Matt Turner
I'd like to tell gcc that it's okay to inline functions (such as
rintf(), to get the SSE4.1 roundss instruction) at particular call
sites without compiling the entire source file or the calling function
with different CFLAGS.

I attempted this by making inline wrapper functions annotated with
attribute((optimize(...))), but it appears that the annotation does
not apply to inline functions? Take for example, ex.c:

#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
   return rintf(x);
}

float
rintf_wrapper_inline_call(float x)
{
   return rintf_wrapper_inline(x);
}

float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
   return rintf(x);
}

% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o: file format elf64-x86-64


Disassembly of section .text:

0000 <rintf_wrapper_inline_call>:
   0: e9 00 00 00 00   jmpq   5 <rintf_wrapper_inline_call+0x5>
   5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
   c: 00 00 00 00

0010 <rintf_wrapper>:
  10: 66 0f 3a 0a c0 04 roundss $0x4,%xmm0,%xmm0
  16: c3   retq

whereas I expected that rintf_wrapper_inline_call would be the same as
rintf_wrapper.

I've read that per-function optimization is broken [1]. Is this still
the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html


RFA: [VAX] SUBREG of MEM with a mode dependent address

2014-05-25 Thread Matt Thomas

GCC 4.8 for VAX is generating a subreg:HI for a mem:SI indexed address.  This 
eventually gets caught by an assert in change_address_1.  Since the MEM rtx is 
SI, legitimate_address_p thinks it's fine.

I have a change to vax.md which catches these, but it's extremely ugly and I 
have to think there's a better way.  But I have to wonder why gcc is even 
constructing a subreg of a mem with a mode-dependent address.

(gdb) call debug_rtx(insn)
(insn 73 72 374 12 (set (reg/v:HI 0 %r0 [orig:29 iCol ] [29])
(subreg:HI (mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] 
[22])
(const_int 4 [0x4]))
(reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101])) [4 MEM[base: 
_154, offset: 0B]+0 S4 A32]) 0)) sqlite3.c:92031 13 {movhi_2}
 (nil))

Since this wasn't movstricthi, this could be rewritten to avoid the subreg and 
just treat %r0 as SI as in:

(insn 73 72 374 12 (set (reg/v:SI 0 %r0 [orig:29 iCol ] [29])
(mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] [22])
(const_int 4 [0x4]))
(reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101]) [4 MEM[base: 
_154, offset: 0B]+0 S4 A32]) 0)) sqlite3.c:92031 13 {movsi_2}

But even if movhi is a define_expand, as far as I can tell there isn't 
enough info to know whether that is possible.  At that time, how can I tell 
that operands[0] will be a hard reg or operands[1] will be a subreg of a 
mode-dependent memory access?

I've tried using secondary_reload, and it gets called with 

(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)

but it dies in change_address_1 before invoking the code returned in sri.

I've tracked this down to reload replacing (reg:SI 113) with reg_equiv_mem 
(113) in the rtx.  However, it doesn't verify that the rtx is actually valid.  I 
added a gcc_assert to trap this and got:

#1  0x0089ab87 in eliminate_regs_1 (x=0x7f7fe7b5c498,
    mem_mode=VOIDmode, insn=0x0, may_use_invariant=true, for_costs=true)
    at /u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:2850
(gdb) list
2845  && reg_equivs
2846  && reg_equiv_memory_loc (REGNO (SUBREG_REG (x))) != 0)
2847{
2848  new_rtx = SUBREG_REG (x);
2849  rtx z = reg_equiv_memory_loc (REGNO (new_rtx));
2850  gcc_assert (memory_address_addr_space_p (GET_MODE (x),
2851   XEXP (z, 0),
2852   MEM_ADDR_SPACE (z)));
2853}
2854  else
(gdb) call debug_rtx(z)
(mem:SI (plus:SI (mult:SI (reg/v:SI 22 [ i ])
(const_int 4 [0x4]))
(reg/v/f:SI 101 [ aiCol ])) [4 MEM[base: _154, offset: 0B]+0 S4 A32])
(gdb) call debug_rtx(x)
(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)

#2  0x0089cb31 in elimination_costs_in_insn (insn=0x7f7fe7b5bbd0)
    at /u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:3751
(gdb) call debug_rtx (insn)
(insn 73 72 374 12 (set (nil)
(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)) 
/u1/netbsd-HEAD/src/external/public-domain/sqlite/lib/../dist/sqlite3.c:92031 
14 {movhi}
 (expr_list:REG_DEAD (reg:SI 113 [ MEM[base: _154, offset: 0B] ])
(nil)))

And now I'm stymied.  The limits of gcc-ness are now exceeded :)  I'm looking 
for ideas on how to proceed.

Thanks.

Re: RFA: [VAX] SUBREG of MEM with a mode dependent address

2014-06-03 Thread Matt Thomas

On May 30, 2014, at 10:39 AM, Jeff Law  wrote:

> On 05/25/14 18:19, Matt Thomas wrote:
>> 
>> But even if movhi is a define_expand, as far as I can tell there
>> isn't enough info to know whether that is possible.  At that time,
>> how can I tell that operands[0] will be a hard reg or operands[1]
>> will be subreg of a mode dependent memory access?
> At that time, you can't know those things.  Not even close ;-)  You certainly 
> don't want to try and rewrite the insn to just use SImode. This is all an 
> indication something has gone wrong elsewhere and this would just paper over 
> the problem.
> 
>> 
>> I've tried using secondary_reload and it called called with
>> 
>> (subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)
>> 
>> but it dies in change_address_1 before invoking the code returned in
>> sri.
> I suspect if you dig deep enough, you can make a secondary reload do what you 
> want.  It's just amazingly painful.
> 
> You want to allocate an SImode temporary, do the load of the SI memory 
> location into that SImode temporary, then (subreg:SI (tempreg:SI)). Your best 
> bet is going to be to look at how some other ports handle their secondary 
> reloads.  But I warn you, it's going to be painful.

That doesn't work, because the assert fires before the secondary reload takes place.

In expr.c:convert_move there is code that would seem to prevent this:

  /* For truncation, usually we can just refer to FROM in a narrower mode.  */
  if (GET_MODE_BITSIZE (to_mode) < GET_MODE_BITSIZE (from_mode)
  && TRULY_NOOP_TRUNCATION_MODES_P (to_mode, from_mode))
{
  if (!((MEM_P (from)
 && ! MEM_VOLATILE_P (from)
 && direct_load[(int) to_mode]
 && ! mode_dependent_address_p (XEXP (from, 0),
MEM_ADDR_SPACE (from)))
|| REG_P (from)
|| GET_CODE (from) == SUBREG))
from = force_reg (from_mode, from);
  if (REG_P (from) && REGNO (from) < FIRST_PSEUDO_REGISTER
  && ! HARD_REGNO_MODE_OK (REGNO (from), to_mode))
from = copy_to_reg (from);
  emit_move_insn (to, gen_lowpart (to_mode, from));
  return;
}

but 'from' at that point is just

(mem:SI (reg:SI 112 [ D.118399 ]) [4 MEM[base: _154, offset: 0B]+0 S4 A32])

So there is not enough information for mode_dependent_address_p to return true.

>> 
>> I've tracked this down to reload replacing (reg:SI 113) with
>> reg_equiv_mem (133) in the rtx.  However, it doesn't verify the rtx
>> is actually valid.  I added a gcc_assert to trap this and got:
> Right.  reload will make that replacement and it's not going to do any 
> verification at that point.  Verification would have happened earlier.

See above.  If anywhere, that is where it would have been done.

> You have to look at the beginning of the main reload loop and poke at that 
> for a while:
> 
> /* For each pseudo register that has an equivalent location defined,
> try to eliminate any eliminable registers (such as the frame pointer)
> assuming initial offsets for the replacement register, which
> is the normal case.
> 
> If the resulting location is directly addressable, substitute
> the MEM we just got directly for the old REG.
> 
> If it is not addressable but is a constant or the sum of a hard reg
> and constant, it is probably not addressable because the constant is
> out of range, in that case record the address; we will generate
> hairy code to compute the address in a register each time it is
> needed.  Similarly if it is a hard register, but one that is not
> valid as an address register.
> 
> If the location is not addressable, but does not have one of the
> above forms, assign a stack slot.  We have to do this to avoid the
> potential of producing lots of reloads if, e.g., a location involves
> a pseudo that didn't get a hard register and has an equivalent memory
> location that also involves a pseudo that didn't get a hard register.
> 
> Perhaps at some point we will improve reload_when_needed handling
> so this problem goes away.  But that's very hairy.  */

I found a simpler solution.  It seemed to me that reload_inner_reg_of_subreg
was the right place to make this happen.  The following diff (to gcc 4.8.3)
fixes the problem:

diff -u -p -r1.3 reload.c
--- gcc/reload.c1 Mar 2014 08:58:29 -   1.3
+++ gcc/reload.c3 Jun 2014 17:24:27 -
@@ -846,6 +846,7 @@ static bool
 reload_inner_reg_of_subreg (rtx x, enum machine_mode mode, bool output)
 

Re: GCC ARM: aligned access

2014-08-31 Thread Matt Thomas

On Aug 31, 2014, at 11:32 AM, Joel Sherrill  wrote:

>> Hi,
>> 
>> I am writing some code and found that system crashed. I found it was
>> unaligned access which causes `data abort` exception. I write a piece
>> of code and objdump
>> it. I am not sure this is right or not.
>> 
>> command:
>> arm-poky-linux-gnueabi-gcc -marm -mno-thumb-interwork -mabi=aapcs-linux
>> -mword-relocations -march=armv7-a -mno-unaligned-access
>> -ffunction-sections -fdata-sections -fno-common -ffixed-r9 -msoft-float
>> -pipe  -O2 -c 2.c -o 2.o
>> 
>> arch is armv7-a and used '-mno-unaligned-access'
> 
> I think this is totally expected. You were passed a u8 pointer which is 
> aligned for that type (no restrictions likely). You cast it to a type with 
> stricter alignment requirements. The code is just flawed. Some CPUs handle 
> unaligned accesses but not your ARM.

While armv7 and armv6 support unaligned access, that support has to be 
enabled by the underlying O/S.  Not knowing the underlying environment, 
I can't say whether that support is enabled.  One issue we had in NetBSD
in moving to gcc4.8 was that the NetBSD/arm kernel didn't enable unaligned
access for armv[67] CPUs.  We quickly changed things so unaligned access
is supported.
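A minimal sketch of the failure mode discussed in this thread (function names are mine, not from the original report): casting a byte pointer to a wider type promises alignment and can data-abort on ARM when unaligned access is disabled, while memcpy lets the compiler pick a sequence the target actually permits.

```c
#include <stdint.h>
#include <string.h>

/* May data-abort on ARM with -mno-unaligned-access (or when the OS
   leaves alignment checking enabled): the cast promises alignment. */
uint32_t read_u32_cast(const uint8_t *p)
{
    return *(const uint32_t *)p;
}

/* Always safe: the compiler emits byte loads or a single word load,
   whichever the target allows. */
uint32_t read_u32_memcpy(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

At -O2 the memcpy version typically compiles to a single load on targets where unaligned access is legal, so it costs nothing where the cast would have worked anyway.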

Missed optimization case

2014-12-22 Thread Matt Godbolt
Hi all,

While digging into some GCC-generated code, I noticed a missed
opportunity in GCC that Clang and ICC seem to take advantage of. All
versions of GCC (up to 4.9.0) seem to have the same trouble. The
following source (for x86_64) shows up the problem:

-
#include <stdint.h>

#define add_carry32(sum, v)  __asm__("addl %1, %0 ;"  \
"adcl $0, %0 ;"  \
: "=r" (sum)  \
: "g" ((uint32_t) v), "0" (sum))

unsigned sorta_checksum(const void* src, int n, unsigned sum)
{
  const uint32_t *s4 = (const uint32_t*) src;
  const uint32_t *es4 = s4 + (n >> 2);

  while( s4 != es4 ) {
add_carry32(sum, *s4++);
  }

  add_carry32(sum, *(const uint16_t*) s4);
  return sum;
}
-

(the example is a contrived version of the original code, which comes
from Solarflare's OpenOnload project).

GCC optimizes the loop but then re-calculates the "s4" variable
outside of the loop before the last add_carry32.  ICC and Clang both
realise that the 's4' value in the loop is fine to re-use. GCC has an
extra four instructions to calculate the same value known to be in a
register upon loop exit.

Compiler explorer links:
GCC 4.9.0: http://goo.gl/fi3p2J
ICC 13.0.1: http://goo.gl/PRTTc6
Clang 3.4.1: http://goo.gl/95JEQc

I'll happily file a bug if necessary but I'm not clear in what phase
the optimization opportunity has been missed.

Thanks all, Matt


Re: Missed optimization case

2014-12-23 Thread Matt Godbolt
On Tue, Dec 23, 2014 at 2:25 PM, Andi Kleen  wrote:
>
> Please file a bug with a test case. No need to worry about the phase
> too much initially, just fill in a reasonable component.
>

Thanks - filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64396

-matt


volatile access optimization (C++ / x86_64)

2014-12-26 Thread Matt Godbolt
Hi all,

I'm investigating ways to have single-threaded writers write to memory
areas which are then (very infrequently) read from another thread for
monitoring purposes. Things like "number of units of work done".

I initially modeled this with relaxed atomic operations. This
generates a "lock xadd" style instruction, as I can't convey that
there are no other writers.

As best I can tell, there's no memory order I can use to explain my
usage characteristics. Giving up on the atomics, I tried volatiles.
These are less than ideal as their power is less expressive, but in my
instance I am not trying to fight the ISA's reordering; just prevent
the compiler from eliding updates to my shared metrics.

GCC's code generation uses a "load; add; store" for volatiles, instead
of a single "add 1, [metric]".

http://goo.gl/dVzRSq has the example (which is also at the bottom of my email).

Is there a reason why (in principle) the volatile increment can't be
made into a single add? Clang and ICC both emit the same code for the
volatile and non-volatile case.

Thanks in advance for any thoughts on the matter,

Matt

--- example code ---
#include <atomic>
std::atomic<int> a(0);

void base_case() {
a++;
}

void relaxed() {
a.fetch_add(1, std::memory_order_relaxed);
}

void load_and_store_relaxed() {
  a.store(a.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

void cast_as_int_ptr() {
  (*(int*)&a) ++;
}

void cast_as_volatile_int_ptr() {
  (*(volatile int*)&a) ++;
}

---example output (gcc490)---

base_case():
  lock addl $1, a(%rip)
  ret
relaxed():
  lock addl $1, a(%rip)
  ret
load_and_store_relaxed():
  movl a(%rip), %eax
  addl $1, %eax
  movl %eax, a(%rip)
  ret
cast_as_int_ptr():
  addl $1, a(%rip)
  ret
cast_as_volatile_int_ptr():
  movl a(%rip), %eax
  addl $1, %eax
  movl %eax, a(%rip)
  ret


Re: volatile access optimization (C++ / x86_64)

2014-12-26 Thread Matt Godbolt
On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley  wrote:
> On 26/12/14 20:32, Matt Godbolt wrote:
>> Is there a reason why (in principal) the volatile increment can't be
>> made into a single add? Clang and ICC both emit the same code for the
>> volatile and non-volatile case.
>
> Yes.  Volatiles use the "as if" rule, where every memory access is as
> written.  a volatile increment is defined as a load, an increment, and
> a store.

That makes sense to me from a logical point of view. My understanding
though is the volatile keyword was mainly used when working with
memory-mapped devices, where memory loads and stores could not be
elided. A single-instruction load-modify-write like "increment [addr]"
adheres to these constraints even though it is a single instruction.
I realise my understanding could be wrong here!  If not though, both
clang and icc are taking a short-cut that may put them into a
non-compliant state.

> If you want single atomic increment, atomics are what you
> should use.  If you want an increment to be written to memory, use a
> store barrier after the increment.

Thanks. I realise I was unclear in my original email. I'm really
looking for a way to say "do a non-lock-prefixed increment". Atomics
are too strong and enforce a bus lock.  Doing a store barrier after
the increment also appears heavy-handed: while I wish for eventual
consistency with memory, I do not require it. I do however need the
compiler to not move or elide my increment.

At the moment I think the best I can do is to use an inline assembly
version of the increment which prevents GCC from doing any
optimisation upon it. That seems rather ugly though, and if anyone has
any better suggestions I'd be very grateful.
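For what it's worth, one x86-64 sketch of that inline-assembly fallback (my own illustration, not code from the thread; AT&T syntax, GCC-specific): a single non-locked RMW the compiler can neither elide nor hoist, because the asm body is opaque to it.

```c
#include <stdint.h>

static uint64_t num_done;

/* Non-locked increment the optimizer cannot move or combine: the
   "+m" constraint ties the asm to num_done's memory location, so
   every call performs exactly one read-modify-write of memory. */
static inline void count_one(void)
{
    __asm__ volatile("incq %0" : "+m"(num_done));
}
```

The obvious downside, as noted above, is that this pins the code to one ISA and hides the operation from every other optimization pass.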

To give a concrete example:

uint64_t num_done = 0;
void process_work() { /* does something somewhat expensive */}
void worker_thread(int num_work) {
  for  (int i = 0; i < num_work; ++i) {
process_work();
num_done++;  // ideally a relaxed atomic increment here
  }
}

void reporting_thread() {
  while (true) {
    sleep(60);
    printf("worker has done %lu\n", num_done);  // ideally a relaxed read here
  }
}


In the non-atomic case above, no locked instructions are used. Given
enough information about what process_work() does, the compiler can
realise that num_done can be added to outside of the loop (num_done +=
num_work); which is the part I'd like to avoid.  By making the int
atomic and using relaxed, I get this guarantee but at the cost of a
"lock addl".

Thanks in advance for any ideas,

Matt


Re: volatile access optimization (C++ / x86_64)

2014-12-26 Thread Matt Godbolt
On Fri, Dec 26, 2014 at 4:51 PM, Marc Glisse  wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677

Thanks Marc


Re: volatile access optimization (C++ / x86_64)

2014-12-26 Thread Matt Godbolt
On Fri, Dec 26, 2014 at 5:19 PM, Andrew Haley  wrote:
> On 26/12/14 22:49, Matt Godbolt wrote:
>> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley  wrote:
>>> On 26/12/14 20:32, Matt Godbolt wrote:

>> I realise my understanding could be wrong here!
>> If not though, both clang and icc are taking a short-cut that may
>> puts them into non-compliant state.
>
> It's hard to be certain.  The language used by the standard is very
> unhelpful: it requires all accesses to be as written, but does not
> define exactly what constitutes an access.

Thanks. My world is very x86-centric and so I find it hard to
understand why a single instruction's RMW is different from three
separate instructions; but I appreciate the standard is vague around
volatiles, and that atomics go some way to using more well-defined
semantics.

>> Thanks. I realise I was unclear in my original email. I'm really
>> looking for a way to say "do a non-lock-prefixed increment".
>
> Why?

Performance. The single-threaded writers do not need to use a lock
prefix: the atomicity of their read-add-write is guaranteed by my
knowing no other threads write to the value. Thus the bus lock they
take out unnecessarily slows down the instruction and potentially
causes extra coherency traffic.  The order of stores (on x86) is
guaranteed and so provided I take a relaxed view in the consumer
there's not even a need for any other flush.  The memory write will
necessarily "eventually" become visible to the reader. Within the
constraints of the architecture I'm working in, this is plenty enough
for a metric.

> You could just use a compiler barrier: asm volatile(""); But this is
> good only for x86 and a few others.

This may be all I need, but my worry is this will inhibit other valid
optimisations. I know that the "trick" used elsewhere as a barrier
(asm volatile("":::"memory");) has the effect of flushing
enregistered values to memory. Ideally this wouldn't be necessary.
I'll be honest; I don't know the semantics of an empty volatile asm(),
but I'm not sure how it could cause only the one write (metric++) to
be emitted without affecting other variables too.
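One narrower alternative (my sketch, not a suggestion from the thread): instead of a full "memory" clobber, tie an empty asm to the one object, so only that variable is forced out to memory and other enregistered values are left alone.

```c
#include <stdint.h>

static uint64_t metric;

/* The empty asm with a "+m" operand acts as a compiler barrier for
   'metric' alone: the preceding store cannot be elided or hoisted
   past it, but no other registers are spilled, unlike the global
   asm volatile("":::"memory") barrier. */
static inline void metric_inc(void)
{
    metric++;
    __asm__ volatile("" : "+m"(metric));
}
```

Because the asm body is empty, no extra instructions are emitted; the operand list only constrains what the optimizer may assume about that one location.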

> Everyone else needs a real store barrier.

This is certainly true if the writer needs to guarantee visibility to
other threads. But that's not the case for my use case.

> Well, that's the problem: do you want a barrier or not?  With no
> barrier there is no guarantee that the data will ever be written to
> memory.  Do you only care about x86 processors?

I appreciate your patience in understanding my case (given I'm not
explaining myself very well!)  In this instance, yes, only x86
processors. I do not need an explicit ISA-level flush. I do need a
guarantee that the compiler cannot optimise the increment by
loop-invariant motion.

>> To give a concrete example:
[snip]
>> By making the int
>> atomic and using relaxed, I get this guarantee but at the cost of a
>> "lock addl".
>
> Ok, I get that, but not why.  If you care about a particular x86
> instruction, you can use it in an inlne asm.  I'm not at all sure what
> you want, really.

I hope my other comments at least help to explain the why! It's not a
particular instruction inasmuch as communicating to the compiler that
there's only one writer, and so the lock prefix is unnecessary (for
x86) as the write of the read-modify-write will not race with other
writers (as none exist) and the write will eventually become visible
to other threads in strict memory order (as the x86 guarantees). This
last stage I believe is consistent with a "relaxed" model, with an
optimisation that if no other writers exist, no bus lock is required
on the writer.

Again, thanks for the reply and the time taken thinking about the
issue especially at this festive time of year!

Best regards, Matt


Re: volatile access optimization (C++ / x86_64)

2014-12-26 Thread Matt Godbolt
On Fri, Dec 26, 2014 at 5:20 PM, NightStrike  wrote:
> Have you tried release and acquire/consume instead?

Yes; these emit the same instructions in this case. http://goo.gl/e94Ya7

Regards, Matt


Re: volatile access optimization (C++ / x86_64)

2014-12-27 Thread Matt Godbolt
On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley  wrote:
> On 27/12/14 00:02, Matt Godbolt wrote:
>> On Fri, Dec 26, 2014 at 5:19 PM, Andrew Haley  wrote:
>>> On 26/12/14 22:49, Matt Godbolt wrote:
>>>> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley  wrote:
>>> Why?
>>
>> Performance.
>
> Okay, but that's not what I was trying to ask: if you don't need an
> atomic access, why do you care that it uses a read-modify-write
> instruction instead of three instructions?  Is it faster?  Have you
> measured it?  Is it so much faster that it's critical for your
> application?

Good point. No; I've yet to measure it but I will. I'll be honest: my
instinct is that really it won't make a measurable difference. From a
microarchitectural point of view it devolves to almost exactly the
same set of micro-operations (barring the duplicate memory address
calculation). It does encode to a longer instruction stream (15 bytes
vs 7 bytes), so there's an argument it puts more pressure than needed
on the i-cache. But honestly, it's more from an aesthetic point of
view I prefer the increment. (The locked version *is* measurably
slower).

Also, it's always nice to understand why particular optimisations
aren't performed by the compiler from a correctness point of view! :)

Thanks all for your fascinating insights :)

-matt


Re: volatile access optimization (C++ / x86_64)

2014-12-27 Thread Matt Godbolt
> On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley  wrote:
> Is it faster?  Have you measured it?  Is it so much faster that it's critical 
> for your
> application?

Well, I couldn't really leave this be: I did a little bit of
benchmarking using my company's proprietary benchmarking library,
which I'll try and get open sourced. It follows Intel's
recommendations for using RDTSCP/CPUID etc, and I've also spent some
time looking at Agner Fog's techniques. I believe it to be pretty
accurate, to within a clock cycle or two.

On my laptop (Core i5 M520) the volatile and non-volatile increments
are so fast as to be within the noise - 1-2 clock cycles. So that
certainly lends support to your theory Andrew that it's probably not
worth the effort (other than offending my aesthetic sensibilities!).
Obviously this doesn't really take into account the extra i-cache
pressure.

As a comparison, the "lock xaddl" versions come out at 18 cycles.
Obviously this is also pretty much "free" by any reasonable metric,
but it's hard to measure the impact of the bus lock on other
processors' memory accesses in a highly multi-threaded environment.

For completeness I also tried it on a few other machines:
X5670 : 0-2 for normal, 28 clocks for lock xadd
E5-2667 v2: as above, 27 clocks for lock xadd
E5-2667 v3: as above, 15 clocks for lock xadd

On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley  wrote:
> Well, in this case you now know: it's a bug!  But one that it's
>fairly hard to care deeply about, although it might get fixed now.

Understood completely! Thanks again,

Matt


Re: volatile access optimization (C++ / x86_64)

2014-12-30 Thread Matt Godbolt
On Tue, Dec 30, 2014 at 5:05 AM, Torvald Riegel  wrote:
> I agree with Andrew.  My understanding of volatile is that the generated
> code must do exactly what the abstract machine would do.

That makes sense. I suppose I don't understand what the difference is
in terms of an abstract machine of "load; add; store" versus the
"load-add-store". At least on x86, from the perspective of the
memory bus, there's no difference I'm aware of.

> One can use volatiles for synchronization if one is also manually adding
> HW barriers and potentially compiler barriers (depending on whether you
> need to mix volatile and non-volatile) -- but volatiles really aim at a
> different use case than atomics.

Again, the processor's reordering and memory barriers are not of huge
concern to me in this instance. I completely agree about volatile
being the wrong use case.

> For the single-writer shared-counter case, a load and a store operation
> with memory_order_relaxed seem to be right approach.

I agree: this most closely models my intention: a non-atomic-increment
but which has the semantics of being visible to other threads in a
finite period of time (as per your previous email).

The relaxed-load; add; relaxed-store generates the same code as the
volatile code (as in; three separate instructions), but I prefer it
over the volatile as it is more intention-revealing.  As to whether
it's valid to peephole optimize the three instructions to be a single
increment in the case of x86 given relaxed memory ordering, I can
offer no good opinion (though my instinct is it should be able to be!)
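The single-writer pattern settled on above might look like this (my sketch of the approach, not code from the thread):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::uint64_t> num_done{0};

// Writer side: relaxed load + add + relaxed store. No lock prefix is
// required because only one thread ever writes; the store still
// becomes visible to readers in a finite period of time.
void count_one()
{
    num_done.store(num_done.load(std::memory_order_relaxed) + 1,
                   std::memory_order_relaxed);
}

// Reader side: a relaxed load is sufficient for a monitoring metric
// that tolerates slightly stale values.
std::uint64_t read_metric()
{
    return num_done.load(std::memory_order_relaxed);
}
```

On current GCC this emits the three-instruction sequence shown earlier in the thread, but unlike volatile it documents the cross-thread intent in the type system.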

Thanks all for your help, Matt


Re: volatile access optimization (C++ / x86_64)

2015-01-05 Thread Matt Godbolt
On Mon, Jan 5, 2015 at 11:53 AM, DJ Delorie  wrote:
>
> Matt Godbolt  writes:
>> GCC's code generation uses a "load; add; store" for volatiles, instead
>> of a single "add 1, [metric]".
>
> GCC doesn't know if a target's load/add/store patterns are
> volatile-safe, so it must avoid them.  There are a few targets that have
> been audited for volatile-safe-ness such that gcc *can* use the combined
> load/add/store when the backend says it's OK.  x86 is not yet one of
> those targets.

Thanks DJ.

One question: do you have an example of a non-volatile-safe machine so
I can get a feel for the problems one might encounter?  At best I can
imagine a machine that optimizes "add 0, [mem]" to avoid the
read/write, but I'm not aware of such an ISA.

Much appreciated, Matt


5.1.0/4.9.2 native mingw64 lto-wrapper.exe issues (PR 65559 and 65582)

2015-04-28 Thread Matt Breedlove
I was told I should repost this on this ML rather than the gcc-help
list I originally posted this under.  Here was my original thread:

https://gcc.gnu.org/ml/gcc-help/2015-04/msg00167.html

I came across PR 65559 and 65582 while investigating why I was getting
the "lto1.exe: internal compiler error: in read_cgraph_and_symbols, at
lto/lto.c:2947" error during a native MINGW64 LTO build.  This also
seems to be present when enabling bootstrap-lto within 5.1.0
presenting an error message akin to what is listed in PR 65582.

1.

Under:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/lto-wrapper.c;h=404cb68e0d1f800628ff69b7672385b88450a3d5;hb=HEAD#l927

lto-wrapper processes command-line params for filenames match (in my
case) "./.libs/libspeexdsp.a@0x44e26" and separates the filename from
the offset into separate variables.  Since the following check to see
if that file exists by opening it doesn't use the parsed filename
variable and instead continues to use the argv parameter, the attempt
to open it always fails and that file is not specifically parsed for
LTO options.


2.

One other issue I've noticed in my build happens as a result of the
open call when trying to parse the options using libiberty.  Under
mingw64 native, the open call opens the object file in text mode and
then passes the fd eventually to libiberty's
simple_object_internal_read within simple-object.c.  The issue springs
up trying to perform a read and it hits a CTRL+Z (0x1A) within the
object at which point the next read will return 0 bytes and trigger
the break of the loop and a subsequent error message of "file too
short" which gets silently ignored.  In my testing, changing the 0x1A
within the object file to something else returns the full read (or
more data until another CTRL+Z is hit).

Ref: https://msdn.microsoft.com/en-us/library/wyssk1bs.aspx
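The fix direction implied by item 2, sketched under my own assumptions (the actual patch may differ): open object files in binary mode so that a 0x1A byte does not act as end-of-file on Windows.

```c
#include <fcntl.h>

/* O_BINARY exists only on Windows C runtimes; make it a no-op
   elsewhere so the same code builds on POSIX. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

int open_object_file(const char *name)
{
    /* Text-mode reads stop at CTRL+Z (0x1A) on mingw64; binary
       mode reads the whole file. */
    return open(name, O_RDONLY | O_BINARY);
}
```

On POSIX systems the flag is a no-op, so the guard costs nothing outside of Windows.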

This still happens within 4.9.2 and 4.9 trunk however in 4.9, the
object file being checked for LTO sections is still passed along in
the command-line whereas in 5.1.0 it gets skipped but is still listed
within the res file most likely leading to the ICE within 65559.  This
would also explain Kai's comment on why this issue only occurs on
native builds.  The ICE in 5.1.0 can also be avoided by using an
lto-wrapper from 4.9 or prior allowing the link to complete though no
LTO options will get processed due to #1.


This is my first report so I wouldn't mind some guidance.  I'm
familiar enough with debugging to gather whatever other level details
are requested.  Most of this was found using gdb.

--
Matt Breedlove


5.1.0 / 5.1.1 mingw64 bootstrap LTO failure questions

2015-05-11 Thread Matt Breedlove
I've posted an update to PR 66014 regarding the mingw64 slim LTO bootstrap
errors I had been getting, which I was hoping to get some comments on.
Though this resolves the problem for me, I'm wondering what other
potential issues similar to it may spring up and was hoping to get
some feedback.

In addition, there is another related failure when doing bootstrap-lto
or bootstrap-lto-noplugin (slim or fat) in mingw64 relating to
sys_siglist.  mingw64 (as far as I know) does not have an
implementation for it.  The issue is as follows:

1.  stage1 completes bootstrapping.  strsignal and sys_siglist are
undetected resulting in HAVE_STRSIGNAL and HAVE_SYS_SIGLIST.

2.  stage2 (or stagefeedback) detects strsignal but not sys_siglist
leaving HAVE_SYS_SIGLIST defined.  This causes libiberty to define
strsignal but skip sys_siglist during the build, leaving an undefined
reference to sys_siglist.

3. Build fails when attempting to link against the new LTO
libiberty.a(strsignal.o) when building gcc-nm, gcc-ar, etc.


Non-LTO builds suffer neither problem and fat bootstraps only suffer
from the issue above which I have worked around by passing in
"libiberty_cv_var_sys_siglist=no" during configuration.  Combined with
building libiberty with "-fno-builtin-stpcpy" (PR 66014), I have
gotten all builds to finally succeed.  I could use some guidance on
where to go from here, however.

Thanks,
Matt


Re: X32 psABI status

2011-02-12 Thread Matt Thomas

On Feb 12, 2011, at 1:29 PM, H.J. Lu wrote:

> On Sat, Feb 12, 2011 at 1:10 PM, Florian Weimer  wrote:
>> * H. J. Lu:
>> 
>>> We made lots of progresses on x32 pABI:
>>> 
>>> https://sites.google.com/site/x32abi/
>>> 
>>> 1. Kernel interface with syscall is close to be finalized.
>>> 2. GCC x32 branch is stabilizing.
>>> 3. The Bionic C library works with the syscall kernel interface.
>>> 
>>> The next major milestone will be x32 glibc port.
>> 
>> It is a bit difficult to extract useful information from these
>> resources.
> 
> That is true. Contributions are more than welcome.
> 
>> Is off_t 32 bits?  Why is the ia32 compatiblity kernel interface used?
> 
> Yes.

off_t is not part of the psABI since it's OS dependent.

>> I'm sure a lot of people want to get rid of that in cases where they
>> control the whole software stack.
> 
> That is debatable. The current thought is the x32 user space API
> is the same as is ia32.  time_t is also an issue.

Any system call method is beyond the scope of the psABI since it's
OS dependent and user-code should never care.


Re: X32 psABI status

2011-02-12 Thread Matt Thomas

On Feb 12, 2011, at 7:02 PM, Andrew Pinski wrote:

> On Sat, Feb 12, 2011 at 3:04 PM, H. Peter Anvin  wrote:
>> On 02/12/2011 01:10 PM, Florian Weimer wrote:
>>> Why is the ia32 compatiblity kernel interface used?
>> 
>> Because there is no way in hell we're designing in a second
>> compatibility ABI in the kernel (and it has to be a compatibility ABI,
>> because of the pointer size difference.)
> 
> I think he is asking why not create a new ABI layer for the kernel
> like it is done for n32 for MIPS.

The kernel syscall ABI needs to be able to pass 64-bit quantities
in a single register (since that's what the calling ABI is capable
of doing, but I don't think the ia32 kernel interface can do that).

Maybe it's me, but I expected X32 to be the X86-64 ABI with 32-bit longs
and pointers (converted to 64-bit arguments when passed in register or
on the stack).  That allows the same syscall argument marshalling that
currently exists but just needs a different set of syscall vectors.




Re: RFC: A new MIPS64 ABI

2011-02-14 Thread Matt Thomas

On Feb 14, 2011, at 12:29 PM, David Daney wrote:

> Background:
> 
> Current MIPS 32-bit ABIs (both o32 and n32) are restricted to 2GB of
> user virtual memory space.  This is due the way MIPS32 memory space is
> segmented.  Only the range from 0..2^31-1 is available.  Pointer
> values are always sign extended.
> 
> Because there are not already enough MIPS ABIs, I present the ...
> 
> Proposal: A new ABI to support 4GB of address space with 32-bit
> pointers.
> 
> The proposed new ABI would only be available on MIPS64 platforms.  It
> would be identical to the current MIPS n32 ABI *except* that pointers
> would be zero-extended rather than sign-extended when resident in
> registers.  In the remainder of this document I will call it
> 'n32-big'.  As a result, applications would have access to a full 4GB
> of virtual address space.  The operating environment would be
> configured such that the entire lower 4GB of the virtual address space
> was available to the program.

I have to wonder if it's worth the effort.  The primary problem I see
is that this new ABI requires a 64bit kernel since faults through the
upper 2G will go through the XTLB miss exception vector.  

> At a low level here is how it would work:
> 
> 1) Load a pointer to a register from memory:
> 
> n32:
>   LW $reg, offset($reg)
> 
> n32-big:
>   LWU $reg, offset($reg)


That might be sufficient for userland, but the kernel will need
to do similar things (even if a 64bit kernel) when accessing 
structures supplied by 32-bit syscalls.  

It seems to be workable but if you need the additional address space
why not use N64?



Re: RFC: A new MIPS64 ABI

2011-02-14 Thread Matt Thomas

On Feb 14, 2011, at 6:22 PM, David Daney wrote:

> On 02/14/2011 04:15 PM, Matt Thomas wrote:
>> 
>> I have to wonder if it's worth the effort.  The primary problem I see
>> is that this new ABI requires a 64bit kernel since faults through the
>> upper 2G will go through the XTLB miss exception vector.
>> 
> 
> Yes, that is correct.  It is a 64-bit ABI, and like the existing n32 ABI 
> requires a 64-bit kernel.

N32 doesn't require a LP64 kernel, just a 64-bit register aware kernel.
Your N32-big does require a LP64 kernel.



Re: RFC: A new MIPS64 ABI

2011-02-14 Thread Matt Thomas

On Feb 14, 2011, at 6:26 PM, David Daney wrote:

> On 02/14/2011 06:14 PM, Joe Buck wrote:
>> On Mon, Feb 14, 2011 at 05:57:13PM -0800, Paul Koning wrote:
>>> It seems that this proposal would benefit programs that need more than 2 GB 
>>> but less than 4 GB, and for some reason really don't want 64 bit pointers.
>>> 
>>> This seems like a microscopically small market segment.  I can't see any 
>>> sense in such an effort.
>> 
>> I remember the RHEL hugemem patch being a big deal for lots of their
>> customers, so a process could address the full 4GB instead of only 3GB
>> on a 32-bit machine.  If I recall correctly, upstream didn't want it
>> (get a 64-bit machine!) but lots of paying customers clamored for it.
>> 
>> (I personally don't have an opinion on whether it's worth bothering with).
>> 
> 
> Also look at the new x86_64 ABI (See all those X32 psABI messages) that the 
> Intel folks are actively working on.  This proposal is very similar to what 
> they are doing.

Untrue.  N32 is closer to the X32 ABI since it is limited to 2GB.



Re: RFC: A new MIPS64 ABI

2011-02-14 Thread Matt Thomas

On Feb 14, 2011, at 6:50 PM, David Daney wrote:

> On 02/14/2011 06:33 PM, Matt Thomas wrote:
>> 
>> On Feb 14, 2011, at 6:22 PM, David Daney wrote:
>> 
>>> On 02/14/2011 04:15 PM, Matt Thomas wrote:
>>>> 
>>>> I have to wonder if it's worth the effort.  The primary problem I see
>>>> is that this new ABI requires a 64bit kernel since faults through the
>>>> upper 2G will go through the XTLB miss exception vector.
>>>> 
>>> 
>>> Yes, that is correct.  It is a 64-bit ABI, and like the existing n32 ABI 
>>> requires a 64-bit kernel.
>> 
>> N32 doesn't require a LP64 kernel, just a 64-bit register aware kernel.
>> Your N32-big does require a LP64 kernel.
>> 
> 
> But using 'official' kernel sources the only way to get a 64-bit register 
> aware kernel is for it to also be LP64.  So effectively, you do in fact need 
> a 64-bit kernel to run n32 userspace code.

Not all the world is Linux. :)  NetBSD supports N32 kernels.  

> My proposed ABI would need trivial kernel changes:
> 
> o Fix a couple of places where pointers are sign extended instead of zero 
> extended.

I think you'll find there are more of these than you'd expect.

> o Change the stack address and address ranges returned by mmap().

My biggest concern is that many MIPS opcodes expect a properly
sign-extended value in registers.  Thus N32-big will require
using daddu/dadd/dsub/dsubu for addresses.  So that's yet another
departure from N32 which can use addu/add/sub/subu.

> The main work would be in the compiler toolchain and runtime libraries.

You'd also need to update gas for la and dla expansion.



Internal compiler error in targhooks.c: default_secondary_reload (ARM/Thumb)

2011-04-04 Thread Matt Fischer
I'm getting an internal compiler error on the following test program:

void func(int a, int b, int c, int d, int e, int f, int g, short int h)
{
assert(a < 100);
assert(b < 100);
assert(c < 100);
assert(d < 100);
assert(e < 100);
assert(f < 100);
assert(g < 100);
assert((-1000 < h) && (h < 0));
}

Command line and output:

$ arm-none-eabi-gcc -mthumb -O2 -c -o test.o test.c
test.c: In function 'func':
test.c:11:1: internal compiler error: in default_secondary_reload, at
targhooks.c:769
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://support.codesourcery.com/GNUToolchain/> for instructions.


This is running on Windows XP.  Version information:

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc.exe (Sourcery G++ Lite 2010.09-51) 4.5.1
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

From playing around with this, it looks to be some kind of register
allocation problem--it needs to have lots of variables active at once,
and the error doesn't occur unless I'm compiling for Thumb.
Unfortunately I don't have a way to test this on tips, so I can't tell
if it's been fixed there or not.  Any information on this would be
appreciated.

Thanks,
Matt


RE: Question about static code analysis features in GCC

2011-04-12 Thread Hargett, Matt
Hey Sarah,

Many array bounds and format string problems can already be found, especially 
with LTO, ClooG, loop-unrolling, and -O3 enabled. Seeing across object-file 
boundaries, understanding loop boundaries, and aggressive inlining allows GCC 
to warn about a lot of real-world vulnerabilities. When the multiple-IPA-passes
work lands in trunk, it should be even better.

What I think is missing is:

1) detection of double-free. There is already a function attribute called
'malloc', which is used to express a specific kind of allocation function whose
return value will never be aliased. You could use that attribute, in addition
to a new one ('free'), to track potential double-frees of values via VRP/IPA.

2) the ability to annotate functions as to the taint and filtering side-effects 
to their parameters, like the format() attribute. (I've asked for this feature 
from the PC-Lint people for some time.) You could make this even more generic 
and just add a new attribute that allows for tagging and checking of arbitrary 
tags:
ssize_t recv(int sockfd, void *buf, size_t len, int flags)
    __attribute__ ((add_parameter_tag ("taint", 2)))
    __attribute__ ((add_return_value_tag ("taint")));

int count_sql_rows_for(const char* name)
    __attribute__ ((disallow_parameter_tag ("taint", 1)));

void filter_sql_characters_from(const char* name)
    __attribute__ ((removes_parameter_tag ("taint", 1)));

then a program like this:
int main(void) {
  char name[20] = {0};
  recv(GLOBAL_SOCKET, &name, sizeof(name), 0);
  filter_sql_characters_from(name); // comment this line to get warning
  count_sql_rows_for(name);
}

When I wrote my binary static analysis product, BugScan, we assumed that if a 
pointer was tainted, so was its contents. (This was especially a necessity for 
collections like lists and vectors in Java and C++ binaries.) You may want to 
get more explicit with that, by having a recursively_add_parameter_tag() or
somesuch that only applies to pointer parameters.

3) lack of explicit NULL-termination of strings. This one gets really 
complicated, especially for situations where they are terminated properly and 
then become un-terminated.

4) detecting when a loop that writes to a pointer, and increments that pointer,
is bounded by a tainted value. You'd have to add an extension to the loop
unroller for that, and just check for the 'taint' tag on the bounds check.


Of course, you still run into temporal ordering issues, especially with 
globals, where the CFG ordering won't help.

But don't let that discourage you -- it would be great work to see done and 
commoditized, and would probably be better than most commercial analyzers as 
well ;)

Let me know if you need any more of my expertise in this area. I can't speak 
for GCC internals, though.




RE: GCC 4.4/4.6/4.7 uninitialized warning regression?

2011-04-22 Thread Hargett, Matt
> > This brings out 2 questions.  Why don't GCC 4.4/4.6/4.7 warn it?
> > Why doesn't 64bit GCC 4.2 warn it?

> Good question. It seems that the difference is whether the compiler
> generates a field-by-field copy or a call to memcpy(). According to
> David, the trunk gcc in 32-bit mode doesn't call memcpy, but still
> doesn't warn. He's looking at it.

Is this related to this bug, which I filed a year or two ago?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42561

It would indeed be very nice to get this taken care of, as this kind of 
analysis would really help find a lot of bugs that currently require commercial 
tools. 


gcc and scientific computing

2011-04-25 Thread Matt McCormick
Hi,

I am involved in a scientific computing podcast,
http://inscight.org/

I was wondering if anyone from the GCC project would like to be a special guest
on the show to talk about recent developments in GCC for scientific computing
in C/C++.  We could discuss, e.g., the Graphite optimizations, link-time
optimization, C++0x, ...

Thanks,
Matt



Detecting global pointers

2011-05-03 Thread Matt Davis
I am writing a gcc plugin and am trying to detect if a value assigned by a
function call, is a global variable or not.  Unfortunately, all calls to
'is_global_var' with a DECL type are returning false.

My pass executes after alias analysis, and ipa analysis.  The
cfun->gimple_df->ipa_pta is set to true, so I know the pta analysis should have
resolved global information. 


Plugin code:
if (is_gimple_call(stmt))
{
gimple_debug_bb(stmt);
tree lhs = gimple_call_lhs(stmt);
if (lhs && is_global_var(SSA_NAME_VAR(lhs)))
  printf("Global detected\n");
}


Source code (in Go):
package main

type T struct{ id int }

var myglobal *T

func fn() *T {
    myglobal = new(T) // Should be detected as global
    return myglobal
}

func main() {
    t := fn()
    _ = t // silence the "declared and not used" error
}


Basic Block dump as my plugin code executes for function 'fn':
:
# .MEM_4 = VDEF <.MEM_3(D)>
main.myglobal.13_1 = __go_new_nopointers (4);
# .MEM_5 = VDEF <.MEM_4>
main.myglobal = main.myglobal.13_1;
# VUSE <.MEM_5>
D.186_2 = main.myglobal;
return D.186_2;


Any insight would be helpful.
Thanks!

-Matt


Re: Detecting global pointers

2011-05-04 Thread Matt Davis
On Wed, May 4, 2011 at 7:38 PM, Richard Guenther
 wrote:
> On Wed, May 4, 2011 at 6:16 AM, Matt Davis  wrote:
>> I am writing a gcc plugin and am trying to detect if a value assigned by a
>> function call, is a global variable or not.  Unfortunately, all calls to
>> 'is_global_var' with a DECL type are returning false.
>>
>> My pass executes after alias analysis, and ipa analysis.  The
>> cfun->gimple_df->ipa_pta is set to true, so I know the pta analysis should 
>> have
>> resolved global information.
>
> is_global_var is all you need, no need for PTA analysis (which doesn't
> change this but simply uses is_global_var as well).

Thanks for the clarification.

>> Plugin code:
>>    if (is_gimple_call(stmt))
>>    {
>>        gimple_debug_bb(stmt);
>>        tree lhs = gimple_call_lhs(stmt);
>>        if (lhs && is_global_var(SSA_NAME_VAR(lhs)))
>>          printf("Global detected\n");
>
> That will only reliably work if the global is not of is_gimple_reg_type (),
> otherwise the call will store to an automatic temporary and the store
> to the global will happen in a separate statement.
>
>>    }
>>
>>
>> Source code (in Go):
>>    package main
>>
>>    type T struct {id int}
>>    var myglobal *T;
>>
>>    func fn() *T {
>>        myglobal = new(T); // Should be detected as global
>>        return myglobal;
>>    }
>>
>>    func main() {
>>        t := fn();
>>    }
>>
>>
>> Basic Block dump as my plugin code executes for function 'fn':
>>    :
>>    # .MEM_4 = VDEF <.MEM_3(D)>
>>    main.myglobal.13_1 = __go_new_nopointers (4);
>
> assigns to a temporary
>
>>    # .MEM_5 = VDEF <.MEM_4>
>>    main.myglobal = main.myglobal.13_1;
>
> and here is the store
>
> You can try looking up the store if the LHS of the call is an SSA name
> by looking at its immediate uses, but of course for
>
> int glob;
>
> foo()
> {
>  int i = call(); // not global
>  glob = i;
> }
>
> this would also find the store to glob.
>
> So I'm not sure you can recover all information up to source level
> precision.

Thanks very much for the clarification and information.

-Matt


Non-optimal stack usage with C++ temporaries

2011-05-11 Thread Matt Fischer
I've noticed some behavior with g++ that seems strange to me.  I don't
know if there's some technicality in the C++ standard that requires
this, or if it's just a limitation to the optimization code, but it
seemed strange so I thought I'd see if anybody could shed more light
on it.

Here's a test program that illustrates the behavior:

struct Foo {
char buf[256];
Foo() {} // suppress automatically-generated constructor code for clarity
~Foo() {}
};

void func0(const Foo &);
void func1(const Foo &);
void func2(const Foo &);
void func3(const Foo &);

void f()
{
func0(Foo());
func1(Foo());
func2(Foo());
func3(Foo());
}

Compiling with -O2 and "-fno-stack-protector -fno-exceptions" for
clarity, on g++ 4.4.3, gives the following:

 :
   0:   55  push   %ebp
   1:   89 e5   mov%esp,%ebp
   3:   81 ec 18 04 00 00   sub$0x418,%esp
   9:   8d 85 f8 fb ff ff   lea-0x408(%ebp),%eax
   f:   89 04 24mov%eax,(%esp)
  12:   e8 fc ff ff ff  call   13 <_Z1fv+0x13>
  17:   8d 85 f8 fc ff ff   lea-0x308(%ebp),%eax
  1d:   89 04 24mov%eax,(%esp)
  20:   e8 fc ff ff ff  call   21 <_Z1fv+0x21>
  25:   8d 85 f8 fd ff ff   lea-0x208(%ebp),%eax
  2b:   89 04 24mov%eax,(%esp)
  2e:   e8 fc ff ff ff  call   2f <_Z1fv+0x2f>
  33:   8d 85 f8 fe ff ff   lea-0x108(%ebp),%eax
  39:   89 04 24mov%eax,(%esp)
  3c:   e8 fc ff ff ff  call   3d <_Z1fv+0x3d>
  41:   c9  leave
  42:   c3  ret

The function makes four function calls, each of which constructs a
temporary for the parameter.  The compiler dutifully allocates stack
space to construct these, but it seems to allocate separate stack
space for each of the temporaries.  This seems unnecessary--since
their lifetimes don't overlap, the same stack space could be used for
each of them.  The real-life code I adapted this example from had a
fairly large number of temporaries strewn throughout it, each of which
were quite large, so this behavior caused the generated function to
use up a pretty substantial amount of stack, for what seems like no
good reason.

My question is, is this expected behavior?  My understanding of the
C++ standard is that each of those temporaries goes away at the
semicolon, so it seems like they have non-overlapping lifetimes, but I
know there are some exceptions to that rule.  Could someone comment on
whether this is an actual bug, or required for some reason by the
standard, or just behavior that not enough people have run into
problems with?

Thanks,
Matt


How to get function argument points-to information.

2011-05-17 Thread Matt Davis
For some analysis I am doing, I need to determine if a particular SSA_NAME_VAR
node is pointed-to by a function argument.  I am iterating across the function's
arguments via DECL_ARGUMENTS(), but each argument is just a DECL node, and
contains no associated points-to data, as far as I can tell.  I assume there is
a better/different way of determining if an argument points to my node?

Thanks for any insight.

-Matt


missed optimization: transforming while(n>=1) into if(n>=1)

2011-05-20 Thread Matt Turner
Hi,

While trying to optimize pixman, I noticed that gcc is unable to
recognize that 'while (n >= 1)' can often be simplified to 'if (n >=
1)'. Consider the following example, where there are loops that
operate on larger amounts of data and smaller loops that deal with
small or unaligned data.

int sum(const int *l, int n)
{
int s = 0;

while (n >= 2) {
s += l[0] + l[1];

l += 2;
n -= 2;
}

while (n >= 1) {
s += l[0];

l += 1;
n -= 1;
}

return s;
}

Clearly the while (n >= 1) loop can never execute more than once, as n
must be < 2, and in the body of the loop, n is decremented.

The resulting machine code includes the backward branch to the top of
the while (n >= 1) loop, which can never be taken.

I suppose this is a missed optimization. Is this known, or should I
make a new bug report?

Thanks,
Matt Turner


[RFC] alpha/ev6: model 1-cycle cross-cluster delay

2011-05-24 Thread Matt Turner
Alpha EV6 and newer can execute four instructions per cycle if correctly
scheduled. The architecture has two clusters {0, 1}, each with its own
register file. In each cluster, there are two slots {upper, lower}. Some
instructions only execute from either upper or lower slots.

Register values produced in one cluster take 1 cycle to appear in the
other cluster, so improperly scheduled instructions may incur a cross-
cluster delay.

I've duplicated (define_insn_reservation ...) for instructions which can
execute from either cluster, increased latencies by 1, and added
bypasses.

In my limited testing it seems to provide a minor improvement (I
wouldn't expect much, since it should only remove single-cycle delays
here and there).

So, please review and provide feedback.

I also have some questions:

 - In the Compiler Writer's Guide [1] [2], it doesn't seem to mention
   anything about cross-cluster delays from integer load/store
   instructions as producers. It seems plausible that load/stores could
   be a special case and update both clusters' register files at the
   same time, but maybe this is an oversight in (two versions of) the
   manual?

 - CMOV instructions are internally split as two distinct instructions
   on >=EV6 that may execute on any cluster/slot. Evidently, this means
   that the first part may execute on cluster 0 while the second
   executes on cluster 1, thereby incurring a 1-cycle cross-cluster
   delay. WTF. So, how can I represent this two-part instruction--by
   duplicating its define_insn_reservation 4 times? I can't find any
   rules for scheduling CMOVs in the CWG, so knowing this would be
   helpful too.

 - The CWG lists the latency of unconditional branches and jsr/call
   instructions as 3, whereas we have 1. I guess this latency value is
   only meaningful if the instruction produces a value? I'm a bit
   confused by this value in the CWG since it lists the latency of
   conditional branches as N/A, while these other types of branches as
   3, although none produce a register value.

 - When increasing the default instruction latencies, I've added
   ',nothing' to the functional unit regexp. Is this the correct way to
   describe that the functional unit is free?

 - There's a ??? comment at the top that says "In addition, instruction
   order affects cluster issue." Does gcc understand how to do this
   already, or is this a TODO reminder? If it's a reminder, where should
   I look in gcc to add this?

 - I also see that fadd/fcmov/fmul instructions take an extra two cycles
   when the consumer is fst/ftoi, so something similar should be added
   for them. Can a (define_bypass ...) function specify a latency value
   greater than the default latency, or should I raise the default
   latency and special-case fst/ftoi consumers like I've done for
   cross-cluster delay?

Thanks a lot!

Matt Turner

[1] http://www.compaq.com/cpq-alphaserver/technology/literature/cmpwrgd.pdf
[2] http://download.majix.org/dec/comp_guide_v2.pdf


--- ev6.md.orig 2007-08-02 06:49:31.0 -0400
+++ ev6.md  2011-05-24 23:15:39.414919424 -0400
@@ -24,19 +24,19 @@
 ; EV6 has two symmetric pairs ("clusters") of two asymmetric integer
 ; units ("upper" and "lower"), yielding pipe names U0, U1, L0, L1.
 ;
-; ??? The clusters have independent register files that are re-synced
+; The clusters have independent register files that are re-synced
 ; every cycle.  Thus there is one additional cycle of latency between
-; insns issued on different clusters.  Possibly model that by duplicating
-; all EBOX insn_reservations that can issue to either cluster, increasing
-; all latencies by one, and adding bypasses within the cluster.
+; insns issued on different clusters.
 ;
-; ??? In addition, instruction order affects cluster issue.
+; ??? In addition, instruction order affects cluster issue. XXX: what to do?
 
 (define_automaton "ev6_0,ev6_1")
 (define_cpu_unit "ev6_u0,ev6_u1,ev6_l0,ev6_l1" "ev6_0")
 (define_reservation "ev6_u" "ev6_u0|ev6_u1")
 (define_reservation "ev6_l" "ev6_l0|ev6_l1")
-(define_reservation "ev6_ebox" "ev6_u|ev6_l")
+(define_reservation "ev6_ebox" "ev6_u|ev6_l") ; XXX: remove
+(define_reservation "ev6_e0" "ev6_l0|ev6_u0")
+(define_reservation "ev6_e1" "ev6_l1|ev6_u1")
 
 (define_cpu_unit "ev6_fa" "ev6_1")
 (define_cpu_unit "ev6_fm,ev6_fst0,ev6_fst1" "ev6_0")
@@ -50,15 +50,26 @@
 
 ; Integer loads take at least 3 clocks, and only issue to lower units.
 ; adjust_cost still factors in user-specified memory latency, so return 1 here.
-(define_insn_reservation "ev6_ild" 1
+; XXX: CWG doesn't mention cross-cluster delay for ild/ist producers ???
+(define_insn_reservation "ev

Configure gcc with --multilib=... ?

2011-06-14 Thread Matt Turner
Hi,

I'd like to ship multilib Gentoo/MIPS installations with only n32 and
n64 ABIs (ie, no o32). The reasoning is that if your system can use
either 64-bit ABI you don't have any reason to run o32, given that
o32-only installation media also exists.

I saw this mail http://gcc.gnu.org/ml/gcc/2010-01/msg00063.html
suggesting the addition of a --multilib= configure option. Has such a
thing been added? Is there a way to configure gcc to build only n32
and n64 ABIs?

Thanks,
Matt


RE: GCC 4.6.1 Status Report (2011-06-20) [BRANCH FROZEN]

2011-06-20 Thread Hargett, Matt
> GCC 4.6.1 first release candidate has been uploaded, and the branch
> is now frozen.  All changes need RM approval now.
> Please test it, if all goes well, 4.6.1 will be released early next
> week.

No chance for a fix for this in 4.6.1?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48600

This has been a critical regression for us, forcing the removal of cold 
attributes which in turn has reduced performance by a notable amount due to 
decreased spatial locality.

If cold attributes are a sufficiently obscure feature that doesn't warrant a 
P1, let me know and I'll set expectations appropriately.

Thanks!


RE: C++ bootstrap of GCC - still useful ?

2011-07-09 Thread Hargett, Matt
> As of a couple of months, I perform a bootstrap-with-C++
> (--enable-build-with-cxx) daily on my machine between 18:10 and 20:10 UTC.

> Is there still interest in daily builds like mine ?

Absolutely! Especially if you do a profiled-bootstrap and/or LTO bootstrap in 
that mode. Hopefully this is feasible given the recent improvements in trunk 
that allowed Mozilla to be built this way.

Even without those things, it's quite useful to make sure it stays working. So, 
thanks and keep it up :)


Updating the CFG after function modifcation

2011-07-15 Thread Matt Davis
Hello,
I have an IPA pass (implemented as a plugin) which executes after all IPA
passes.  My pass transforms functions by adding code and also modifying the
function prototypes.  I have had this work on a per-function basis, via a
GIMPLE_PASS, which calls update_ssa verify_ssa and cleanup_cfg after each
function is processed.  However, I have recently moved my plugin to execute
after all IPA passes, so I can iterate over the cfg of the program.  The first
iteration is an analysis, and the second iteration does the transformations.
Unfortunately, I keep getting errors now, primarily a segfault in
"compute_call_stmt_bb_frequency" in the processing of the main().  The segfault
occurs because the argument 'bb' is NULL and later dereferenced.  (NOTE: I do
not modify the prototype of main).

The e->call_stmt that the null basic block references is from a statement I have
removed via gsi_remove during my transformation pass.  I need to clean up the
cfg somehow, after I remove the statement.  My gimple pass, with this same
functionality worked fine.  Something tells me that my plugin should be in a
different position.  I also tried calling cleanup_tree_cfg() after my
transformation pass, but still no luck.

Any suggestions would be welcomed.  Thanks for even reading this far.

-Matt


PARM_DECL to SSA_NAME

2011-07-16 Thread Matt Davis
Hello,
I have a PARM_DECL node that I am passing to a function.  Previously, my code
was working, but since I have made my optimization pass operate as an IPA pass,
versus a GIMPLE pass, I think I am missing some verification/resolution call
that I need to make.

Of course, when I pass the PARM_DECL to my function, I am now getting an error
from verify_ssa() suggesting that I should be passing a SSA_NAME instance.  I 
tried
using gimple_default_def() to obtain the SSA_NAME for that PARM_DECL, however,
the return value is NULL.  Is there some other way of accessing the SSA_NAME
information for this PARM_DECL node?  The SSA has been generated before my 
plugin
executes.  Also, I do call update_ssa() after the routines are processed by my
passes.

Thanks for any insight.

-Matt


Inline Expansion Problem

2011-08-26 Thread Matt Davis
Hello,
I am having the compiler insert a call to a function which is defined inside
another object file.  However, during inline expansion via expand_call_inline(),
the following assertion fails in tree-inline.c:
>> 3775: edge = cgraph_edge (id->dst_node, stmt);
>> 3776: gcc_checking_assert (cg_edge);

cg_edge comes back NULL.  There is only one callee and no indirect calls;
the function that has the inserted call is main().  Is there something I
forgot to do after inserting the gimple call statement?  This works fine without
optimization.

-Matt


Re: Inline Expansion Problem

2011-08-27 Thread Matt Davis
On Sat, Aug 27, 2011 at 09:27:49AM +0200, Richard Guenther wrote:
> On Sat, Aug 27, 2011 at 4:47 AM, Matt Davis  wrote:
> > Hello,
> > I am having the compiler insert a call to a function which is defined inside
> > another object file.  However, during inline expansion via 
> > expand_call_inline(),
> > the following assertion fails in tree-inline.c:
> >>> 3775: edge = cgraph_edge (id->dst_node, stmt);
> >>> 3776: gcc_checking_assert (cg_edge);
> >
> > cg_node comes back as being NULL since there is only one callee and no 
> > indirect
> > calls, the function that has the inserted call is main().  Is there 
> > something I
> > forgot to do after inserting the gimple call statement?  This works fine 
> > without
> > optimization.
> 
> Dependent on where you do it you have to add/rebuild cgraph edges.

Thanks Richard,
I tried "rebuild_cgraph_edges()" before I sent the initial email.
Unfortunately, when I call that function after I add the statement, in an IPA
pass, the resulting binary does not link, as it does not seem able to resolve
the symbol to the callee.  Maybe providing more context would help make more
sense.  insert_func_call inserts the call by adding a new gimple call statement.
I've done this tons of times before, but it seems with -O the callgraph isn't
happy.

>> for (node=cgraph_nodes; node; node=node->next)
>> {
>> if (!(func = DECL_STRUCT_FUNCTION(node->decl)))
>>   continue;
>> 
>> push_cfun(func);
>> old_fn_decl = current_function_decl;
>> current_function_decl = node->decl;
>> 
>> insert_func_call(func);
>> 
>> rebuild_cgraph_edges();
>> current_function_decl = old_fn_decl;
>> pop_cfun();
>> }

-Matt


Re: Inline Expansion Problem

2011-08-27 Thread Matt Davis
On Sat, Aug 27, 2011 at 11:25:45AM +0200, Richard Guenther wrote:
> On Sat, Aug 27, 2011 at 10:06 AM, Matt Davis  wrote:
> > On Sat, Aug 27, 2011 at 09:27:49AM +0200, Richard Guenther wrote:
> >> On Sat, Aug 27, 2011 at 4:47 AM, Matt Davis  wrote:
> >> > Hello,
> >> > I am having the compiler insert a call to a function which is defined 
> >> > inside
> >> > another object file.  However, during inline expansion via 
> >> > expand_call_inline(),
> >> > the following assertion fails in tree-inline.c:
> >> >>> 3775: edge = cgraph_edge (id->dst_node, stmt);
> >> >>> 3776: gcc_checking_assert (cg_edge);
> >> >
> >> > cg_node comes back as being NULL since there is only one callee and no 
> >> > indirect
> >> > calls, the function that has the inserted call is main().  Is there 
> >> > something I
> >> > forgot to do after inserting the gimple call statement?  This works fine 
> >> > without
> >> > optimization.
> >>
> >> Dependent on where you do it you have to add/rebuild cgraph edges.
> >
> > Thanks Richard,
> > I tired "rebuild_cgraph_edges()" before I sent the initial email.
> > Unfortunately, when I call that function after I add the statement, in an 
> > IPA
> > pass, the resulting binary does not link, as it does not seem able to 
> > resolve
> > the symbol to the callee.  Maybe providing more context would help make more
> > sense.  insert_func_call inserts the call by adding a new gimple call 
> > statement.
> > I've done this tons of times before, but it seems with -O the callgraph 
> > isn't
> > happy.
> 
> If you are doing this from an IPA pass you have to add the edge manually using
> update_edges_for_call_stmt.

Thanks Richard,
I was unable to properly use update_edges_for_call_stmt.  It seems that routine
is for updating an existing call.  In my case I am inserting a new gimple call
via gsi_insert_before() with GSI_NEW_STMT.  As a gimple pass, this works fine.
I appreciate all of your correspondence.

-Matt


Adding functions at compile time

2011-09-11 Thread Matt Davis
I am creating a few functions at compile time, via a gcc plugin.  I create the
functions and their bodies, and insert them into the call graph.  This is all
done before "cgraph_finalize_compilation_unit()" has been called.  I then have
another compiler pass, which gets started after the SSA representation has been
generated, and it is this pass that uses the functions created previously, in
the much earlier pass.  The problem is that by the time the created functions
are used, the cgraph has already removed those nodes since they are disjoint.  I
tried creating and modifying the functions in the same pass, but that was not
successful either.  I did not see any flag I could set in the cgraph nodes,
which are created in the first pass I mentioned preventing them from being
removed.  Is there a way I can keep those nodes around so the functions created
at compile time actually get built?

-Matt


Go Garbage Collection Roots

2011-09-29 Thread Matt Davis
As some of you might know, I have been researching and working on a region-based
memory management plugin for GCC.  My target is specifically the Go language.
With that said, I have been making a fair amount of progress.  More recently, I
have been benchmarking my work, and it came to my attention that I need to
handle types defined in an external object files.  For instance, when a new List
object is created, the external package for List, calls "new" and returns us a
nice sparkly new List object.  The runtime of Go implements "new" as "__go_new,"
which calls the runtime's special allocator to produce an object that is garbage
collected.  This is causing some snags in my system.  Mainly, I want to use my
own allocator, since there is only a special case when I want to use garbage
collection in my region system.  Is there a way/interface to register data as a
root in the garbage collector, so that it's not in conflict with my allocation?

The other option would be to try to override "__go_new" with my own
implementation, but keeping the same symbol name so that the linker does the
dirty work.

-Matt


Creating a structure at compile time.

2011-12-01 Thread Matt Davis
I am working on a gcc-plugin where I need to create a structure at compile time.
I have gleaned over one of the front ends to learn more about creating
structures at compile time.  What I have thus far is a type node for my struct.

I now need to create an instance of this struct.  For exemplary purposes we will
call this type 'struct T' and we will call the instance of T, 'my_T'  By using
the build_constructor() routine in GCC I create an instance, my_T, which I need
to pass the address of to a function.  So, I take this decl, my_T, and pass it 
to 
build_fold_addr_expr().  The result of the latter is what I pass to the
function 'fn()'.

Yes, the function I am passing the reference to is expecting the proper type,
that of address-to-T.  Running this presents me with an error in
expand_expr_real_1() where "Variables inherited from containing functions should
have been lowered by this point."

So, I figure, if I create a temp variable, 'V', of type pointer-to-T, and run
make_ssa_name() on that temp.  And then insert an assignment before the call to
fn, so I get: 'V = &my_T;'  After looking at the GIMPLE dump, I see, 'V = &my_T;
fn(V);'  Which is correct, however, in the type list of the caller, I only see:
'struct * V;'  Now, this concerns me, I would expect to see "struct T *V;"  As
above, this case also fails.

I am baffled, do I need to even be creating the ssa_name instance to pass to
'fn()', which is 'V' in the case above?  Or, will the build_constructor()
produce a tree node that I can treat as a variable, that I can pass to 'fn()' ? 
 

-Matt


Re: Creating a structure at compile time.

2011-12-03 Thread Matt Davis
On Fri, Dec 2, 2011 at 3:38 PM, Matt Davis  wrote:
> I am working on a gcc-plugin where I need to create a structure at compile 
> time.
> I have gleaned over one of the front ends to learn more about creating
> structures at compile time.  What I have thus far is a type node for my 
> struct.
>
> I now need to create an instance of this struct.  For exemplary purposes we 
> will
> call this type 'struct T' and we will call the instance of T, 'my_T'  By using
> the build_constructor() routine in GCC I create an instance, my_T, which I 
> need
> to pass the address of to a function.  So, I take this decl, my_T, and pass 
> it to
> build_fold_addr_expr().  The result of the latter is what I pass to the
> function 'fn()'.
>
> Yes, the function I am passing the reference to is expecting the proper type,
> that of address-to-T.  Running this presents me with an error in
> expand_expr_real_1() where "Variables inherited from containing functions 
> should
> have been lowered by this point."
>
> So, I figure, if I create a temp variable, 'V', of type pointer-to-T, and run
> make_ssa_name() on that temp.  And then insert an assignment before the call 
> to
> fn, so I get: 'V = &my_T;'  After looking at the GIMPLE dump, I see, 'V = 
> &my_T;
> fn(V);'  Which is correct, however, in the type list of the caller, I only 
> see:
> 'struct * V;'  Now, this concerns me, I would expect to see "struct T *V;"  As
> above, this case also fails.
>
> I am baffled, do I need to even be creating the ssa_name instance to pass to
> 'fn()', which is 'V' in the case above?  Or, will the build_constructor()
> produce a tree node that I can treat as a variable, that I can pass to 'fn()' 
> ?
>
> -Matt

Well, I have successfully created and used an initialized structure.
Note that I do not need to run the make_ssa_name.  I can declare the
struct as TREE_STATIC and work from there.  Now, my problem with the
expand_expr_real_1 check failing is because some of the values I
initialize in my compile-time created struct can be different at
runtime.  Is there a way I can take this constructor tree node, and
have all of the values in it set in the middle of my function, where
those values are defined?   I do not need the structure initialized
upon function entry.  What I need is to have all of the values, which
I already setup when I am in the middle of the function being
processed.  I need these values actually filled-out in the middle of
function instead at function entry.  I am unsure how to do this.  The
constructor node exists, and I'm in the middle of an IPA pass.  I
assume I can call gimplify_expr() but I am thinking I need to pass it
something different than just a constructor tree node.

Thanks for any help

-Matt


Obtaining the arguments to a function pointer

2011-12-09 Thread Matt Davis
I am trying to look at the arguments that are passed to a function
pointer.  I have an SSA_NAME which is for a pointer-type to a
function-type.  I want to obtain the arguments being passed to the
function pointer, but after looking all over the SSA_NAME node and its
corresponding VAR_DECL I cannot seem to find the arguments stashed
anywhere.  I know this is somewhat of a special case.  Typically, if I
had a fndecl it would be easy, but all I know in my case is the
function type.

-Matt


Re: Obtaining the arguments to a function pointer

2011-12-09 Thread Matt Davis
On Sat, Dec 10, 2011 at 12:40 PM, Ian Lance Taylor  wrote:
> Matt Davis  writes:
>
>> I am trying to look at the arguments that are passed to a function
>> pointer.  I have an SSA_NAME which is for a pointer-type to a
>> function-type.  I want to obtain the arguments being passed to the
>> function pointer, but after looking all over the SSA_NAME node and its
>> corresponding VAR_DECL I cannot seem to find the arguments stashed
>> anywhere.  I know this is somewhat of a special case.  Typically, if I
>> had a fndecl it would be easy, but all I know in my case is the
>> function type.
>
> A function pointer doesn't have any associated arguments, at least not
> as I use that word.  Are you looking for the argument types?  Because
> there are no argument values.
>
> The argument types can be found from the type of the SSA_NAME, which
> should be a FUNCTION_TYPE.  TYPE_ARG_TYPES of the FUNCTION_TYPE will be
> the argument types.

Ian,
I was actually looking for the argument instances and not the types.
However, I have found I can get the gimple statement for this call,
and just use that to obtain the actual arguments I need.  Thanks for
the fast reply!

-Matt


Modifying the datatype of a formal parameter

2011-12-17 Thread Matt Davis
I am using 'ipa_modify_formal_parameters()' to change the type of a function's
formal parameter.  After my pass completes, I get a 'gimple_expand_cfg()'
error. I must be missing some key piece here, as the failure points to a NULL
"SA.partition_to_pseudo" value.  I also set_default_ssa_name() on the returned
value from ipa_modify_formal_parameter (the adjustment's 'reduction' field).  Do
I need to re-gimplify the function or run some kind of 'cleanup' or 'update'
once I modify this formal parameter?

Thanks

-Matt


Re: Modifying the datatype of a formal parameter

2011-12-19 Thread Matt Davis
Hi Martin and thank you very much for your reply.  I do have some more
resolution to my issue.

On Mon, Dec 19, 2011 at 8:42 PM, Martin Jambor  wrote:
> Hi,
>
> On Sun, Dec 18, 2011 at 01:57:17PM +1100, Matt Davis wrote:
>> I am using 'ipa_modify_formal_parameters()' to change the type of a 
>> function's
>> formal parameter.  After my pass completes, I get a 'gimple_expand_cfg()'
>> error. I must be missing some key piece here, as the failure points to a NULL
>> "SA.partition_to_pseudo" value.  I also set_default_ssa_name() on the 
>> returned
>> value from ipa_modify_formal_parameter (the adjustment's 'reduction' field). 
>>  Do
>> I need to re-gimplify the function or run some kind of 'cleanup' or 'update'
>> once I modify this formal parameter?
>
> It's difficult to say without knowing what and at what stage of the
> compilation you are doing.

My pass is getting called as the last IPA pass
(PLUGIN_ALL_IPA_PASSES_END).  I do use the same function
"ipa_modify_formal_parameters()" to add additional parameters to
certain functions.  And it works well.

> The sad truth is that
> ipa_modify_formal_parameters is very much crafted for its sole user
> which is IPA-SRA and is probably quite less general than what the
> original intention was.  Any pass using the function then must modify
> the body itself to reflect the changes, just like IPA-SRA does.
>
> SRA does not re-gimplify the modify functions, it just returns
> TODO_update_ssa or (TODO_update_ssa | TODO_cleanup_cfg) if any EH
> cleanup changed the CFG.

Yep, and I do call update_ssa and cleanup_tree_cfg() after my pass.

> So I would suggest to have a look at IPA-SRA (grep for the only call
> to ipa_modify_formal_parameters in tree-sra.c), especially at what you
> do differently.  If you then have any further questions, feel free to
> ask.

Yeah, that was one of the first things I did.  Now, as mentioned, I
do have some more clarity on my issue.  Basically, I am just changing
the type of an existing formal parameter.  When I look at
gimple_expand_cfg(), which is called later, I notice that the
"SA.partition_to_pseudo" entry for that parameter is NULL, on which
gimple_expand_cfg() aborts.  That value is NULL because
gimple_expand_cfg() calls expand_used_vars().  I need
expand_one_var() called, since that should fix up the RTX assigned to
the parameter I am modifying.  Unfortunately, the bitmap
"SA.partition_has_default_def" is true for the parameter even if I do
not set it explicitly, and since it is always set, expand_one_var()
is never called.  I need to unset the default def associated with the
param to force expand_one_var() to execute.  So, for the SSA name
assigned to the parameter I am modifying, I use
SSA_NAME_IS_DEFAULT_DEF to set the flag to 'false'.  This sounds like
a really gross hack.  If I do this, I will need to set a new SSA
definition for the modified parameter.

-Matt


Re: Modifying the datatype of a formal parameter

2011-12-20 Thread Matt Davis
Here is a follow up.  I am closer to what I need, but not quite there
yet.  Basically I just want to switch the type of one formal parameter
to a different type.

On Mon, Dec 19, 2011 at 11:05 PM, Matt Davis  wrote:
> Hi Martin and thank you very much for your reply.  I do have some more
> resolution to my issue.
>
> On Mon, Dec 19, 2011 at 8:42 PM, Martin Jambor  wrote:
>> Hi,
>>
>> On Sun, Dec 18, 2011 at 01:57:17PM +1100, Matt Davis wrote:
>>> I am using 'ipa_modify_formal_parameters()' to change the type of a 
>>> function's
>>> formal parameter.  After my pass completes, I get a 'gimple_expand_cfg()'
>>> error. I must be missing some key piece here, as the failure points to a 
>>> NULL
>>> "SA.partition_to_pseudo" value.  I also set_default_ssa_name() on the 
>>> returned
>>> value from ipa_modify_formal_parameter (the adjustment's 'reduction' 
>>> field).  Do
>>> I need to re-gimplify the function or run some kind of 'cleanup' or 'update'
>>> once I modify this formal parameter?
>>
>> It's difficult to say without knowing what and at what stage of the
>> compilation you are doing.
>
> My pass is getting called as the last IPA pass
> (PLUGIN_ALL_IPA_PASSES_END).  I do use the same function
> "ipa_modify_formal_parameters()" to add additional parameters to
> certain functions.  And it works well.
>
>> The sad truth is that
>> ipa_modify_formal_parameters is very much crafted for its sole user
>> which is IPA-SRA and is probably quite less general than what the
>> original intention was.  Any pass using the function then must modify
>> the body itself to reflect the changes, just like IPA-SRA does.
>>
>> SRA does not re-gimplify the modify functions, it just returns
>> TODO_update_ssa or (TODO_update_ssa | TODO_cleanup_cfg) if any EH
>> cleanup changed the CFG.
>
> Yep, and I do call update_ssa and cleanup_tree_cfg() after my pass.
>
>> So I would suggest to have a look at IPA-SRA (grep for the only call
>> to ipa_modify_formal_parameters in tree-sra.c), especially at what you
>> do differently.  If you then have any further questions, feel free to
>> ask.
>
> Yeah, that was one of the first things I did.   Now, as mentioned, I
> do have some more clarity on my issue.  Basically, I am just changing
> the type of an existing formal parameter.  When I look at
> "gimple_expand_cfg()" which is called later, I notice that the
> "SA.partition_to_pseudo" for that parameter is NULL, to which
> "gimple_expand_cfg()" aborts() on.  Now, that value is NULL, because
> in "gimple_expand_cfg()" the function "expand_used_vars()" is called.
> I need "expand_one_var()" called since that should fix-up the RTX
> assigned to the parameter I am modifying.  Unfortunately, the bitmap,
> "SA.partition_has_default_def" is true for the parameter, even if I do
> not set it explicitly.  And since it is always set, the
> "expand_one_var()" routine is never called.  I need to unset the
> default def associated to the param to force "expand_one_var()" to
> execute.  So, for the ssa name assigned to the parameter I am
> modifying, I use SSA_NAME_IS_DEFAULT_DEF to set the flag to 'false'
> This sounds like a really gross hack.  If I do this, I will need to
> set a new ssa definition for the modified parameter.

I use ipa_modify_formal_parameters() and swap the type of the param
with my desired type.  The resulting PARM_DECL has no default
definition, so I use make_ssa_name() and set its return value as the
default definition for the PARM_DECL.  That works fine; however, I
now need to somehow rebuild the SSANAMES for the function.  The new
SSA name for the modified PARM_DECL is out of order, so
gimple_expand_cfg() fails: SA.partition_to_pseudo is built from the
index of each SSA_NAME in the function's SSANAMES list, and
gimple_expand_cfg() iterates across all SSA_NAMEs, including the one
I no longer need.  What I need to do is replace the old SSA_NAME with
the newer SSA_NAME I get back from make_ssa_name().  I could do this
directly, but I have yet to find an appropriate routine in
tree-flow.h or tree-flow-inline.h.

-Matt


RTL Conditional and Call

2011-12-30 Thread Matt Davis
Hi,
I am having an RTL problem trying to make a function call from a
COND_EXEC rtx.  The reload pass has been run, and very simply I want
to compare 64-bit x86 %rdx with a specific integer value and, if the
comparison holds, execute my function call.  I can call the function
fine outside of the conditional, but when I set it in the conditional
expression, I get the following error:

test.c:6:1: error: unrecognizable insn:
(insn 27 13 20 2 (cond_exec (eq:BI (const_int 42 [0x2a])
(reg:DI 1 dx))
(call (mem:DI (symbol_ref:DI ("abort")) [0 S8 A8])
(const_int 0 [0]))) -1
 (nil))
test.c:6:1: internal compiler error: in insn_default_length, at
config/i386/i386.md:591

The original code for the condition:
rtx cmp = gen_rtx_EQ(
BImode,
gen_rtx_CONST_INT(VOIDmode, 42),
gen_rtx_REG(DImode, 1));

And the original code for the COND_EXEC expression, which is what I
emit into the program:
rtx sym = gen_rtx_SYMBOL_REF(Pmode, "abort");
rtx abrt_addr = gen_rtx_MEM(Pmode, sym);
rtx abrt = gen_rtx_CALL(VOIDmode, abrt_addr, const0_rtx);
rtx cond = gen_rtx_COND_EXEC(VOIDmode, cmp, abrt);

Thanks

-Matt


Re: RTL Conditional and Call

2011-12-30 Thread Matt Davis
On Sat, Dec 31, 2011 at 12:51 AM, Alexander Monakov  wrote:
>
>
> On Sat, 31 Dec 2011, Matt Davis wrote:
>
>> Hi,
>> I am having an RTL problem trying to make a function call from a
>> COND_EXEC rtx.  The reload pass has been called, and very simply I
>> want to compare on an 64bit x86 %rdx with a specific integer value,
>> and if that value is true, my function call executes.  I can call the
>> function fine outside of the conditional, but when I set it in the
>> conditional expression, I get the following error:
>>
>> test.c:6:1: error: unrecognizable insn:
>
> Indeed, x86 does not have a "conditional call" instruction.  You would have to
> generate the call in a separate basic block and add a conditional branch
> instruction around it.  You can reference the following code, which attempts
> to convert any COND_EXECs to explicit control flow:
>
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02383.html
>
> (but you will probably need to additionally generate comparison instructions).
>
> Hope that helps,

Thanks Alexander.  This does help.  What I have been doing is writing
the same code in C, compiling that, and then dumping the RTL; I then
try to create the same RTL by hand.  The second thing I need to do
(the first is already in place in my code) is to compare a register
with a constant.  So, just to test things, I perform a simple COMPARE
and set the mode to CCZ, which is what my analogous C variant
produces in the RTL dump.  Unfortunately, I am still getting a
similar "unrecognizable insn" error.  I feel lame asking so many
questions, but this is something I want to get stronger with, so
aside from my current gcc research, I am tossing this into the mix in
my free time.  I have looked at rtl.def and nothing seems incorrect.

My RTX:
rtx cmp2 = gen_rtx_COMPARE(
CCZmode,
gen_rtx_REG(DImode, 1),
gen_rtx_CONST_INT(VOIDmode, 42));

Once this is in place I would wrap a SET rtx and actually set the CCZ
register.  I'm primarily just concerned with getting the comparison
piece in place first.

-Matt


Interface Method Table

2012-01-19 Thread Matt Davis
For a Go program being compiled by gcc, is there a way, from the middle end, to
figure out which routines make up the interface method table?  I could check the
mangled name of the method table, but is there another way to deduce, from the
middle end, which methods compose it?

Thanks!

-Matt


RTL AND Instruction

2012-01-21 Thread Matt Davis
Hello (again),
I have a case where I need to emit an AND operation on a register and a
const_int value.  The machine architecture I am targeting, for the .md, is
i386.  Anyway, after matching things up with rtl.def and what is in the
.md, I use the gen_rtx_AND macro and wrap that in a gen_rtx_SET.  I could
insert inline assembly with the ASM_OPERANDS macro, but I really want to do this
with pure RTL.  Essentially, I just want to emit:  "and %eax, $0x7"

Once I emit my rtx into the list of insns, GCC gives me an "unrecognized insn"
error.  I can trace the code through the first part of the condition, specified
in i386.md, "ix86_binary_operator_ok," and that passes fine from the
"anddi_1" define_insn.  What I have in my source is the following:

rtx eax = gen_rtx_REG(DImode, 0);
rtx and = gen_rtx_AND(DImode, eax, gen_rtx_CONST_INT(VOIDmode, 7));
and = gen_rtx_SET(DImode, eax, and);
emit_insn_before(and, insn);

Thanks for any insight into this.  On a side note, this is just for a
side project, and I am trying to get a better grasp of RTL.  I have gone through
the internals manual for RTL and Machine Descriptions, but it seems I am still
having a bit of trouble.

-Matt


Re: RTL AND Instruction

2012-01-29 Thread Matt Davis
On Sun, Jan 29, 2012 at 8:21 PM, James Courtier-Dutton
 wrote:
>
> On Jan 22, 2012 5:21 AM, "Matt Davis"  wrote:
>>  Essentially, I just want to emit:  "and %eax, $0x7"
>>
> Assuming at&t format, does that instruction actually exist?
> How can you store the result in the constant number 7?
> Did you instead mean
> and $0x7, %eax

Yes, I have it working.  Much thanks to everyone :-)

-Matt


[alpha] Request for help wrt gcc bugs 27468, 27469

2009-12-02 Thread Matt Turner
Hi,
Could someone please take a look at these two bugs?

27468 - sign-extending Alpha instructions not exploited
27469 - zero extension not eliminated [on Alpha]

Andrew Pinski confirmed both of them three and a half years ago.
My uninformed feeling, after seeing bugs 8603 and 42113 fixed, is that
both of them are relatively simple.

I CC'd Richard since you probably know more about Alpha than anyone
else, and I CC'd you, Uros, since you were extremely nice and helpful
with the two previously mentioned bugs.

I'm more than willing to do any testing I can, and I can get you
access to a quad-833MHz ES40 to do testing on, if need be.

Thanks,
Matt Turner


[alpha] Wrong code produced at -Os, -O2, and -O3

2010-04-07 Thread Matt Turner
Hi Uros and Richard,
I was rewriting the Alpha sched_find_first_bit implementation for the
Linux Kernel, and in the process I think I've come across a gcc bug.

I rewrote the function using cmov instructions and wrote a small
program to test its correctness and performance.  I wrote the function
initially as an external .S file and, once I was reasonably sure it
was correct, converted it to a C function with inline assembly.
Compiling both produces the exact same output, as shown:

:
        ldq     t0,0(a0)
        clr     t2
        ldq     t1,8(a0)
        cmoveq  t0,0x40,t2
        cmoveq  t0,t1,t0
        cttz    t0,t3
        addq    t3,t2,v0
        ret

In my test program, I found that when I executed the rewritten
implementation _before_ the reference implementation that it produced
bogus results. This only happens when using the C/inline asm function.
When compiled with the external .S file, the results are correct.

Attached is a tar.gz with my test code. Compile the test program with
`gcc -O -mcpu=... find.c rewritten.S test.c -o test` with optional
-D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST. At -Os, -O2, or -O3 and
-D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST the program will produce
incorrect results and assert(). At -O0 or -O1 or without one or both
of the -D flags, it will produce correct results. I've tested with
gcc-4.3.4 and gcc-4.4.2.

Thanks. Let me know what I can do to help further.

Matt Turner


sched_find_first_bit.tar.gz
Description: GNU Zip compressed data


Re: [alpha] Wrong code produced at -Os, -O2, and -O3

2010-04-08 Thread Matt Turner
On Thu, Apr 8, 2010 at 2:16 AM, Uros Bizjak  wrote:
> On Wed, Apr 7, 2010 at 8:38 PM, Matt Turner  wrote:
>
>> I was rewriting the Alpha sched_find_first_bit implementation for the
>> Linux Kernel, and in the process I think I've come across a gcc bug.
>
> [...]
>
>> Thanks. Let me know what I can do to help further.
>
> Please fill a Bugzilla bugreport with your problem. Otherwise, it will
> be lost in the mailing lists.
>
> Uros.
>

Sure. Thanks for the email.

I've filed it in Bugzilla, with as small a test case as I can.

Thanks!
Matt

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691


Stack mangling for anonymous function pointers

2008-10-24 Thread Matt Hauer
I'm working on a system where we're jumping from Java into C to pull a
function out of a dictionary (indexed by string name) and calling it
as a 'long (*)(void *, ...)'.  There's some confusion as to whether
there is a way to copy a structure or an array onto the stack through
the ... arg such that the remainder of the stack can be used for the
specific arguments that the function is looking for (i.e., "f(void *,
int, long, long, double)").  Online documentation is inconsistent as
to whether a pointer to the structure, or the whole structure, is
copied onto the stack.

Is there a reliable way to write data to the stack such that a called
function pointer can extract the values it seeks?

Thanks,
Matt


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Matt Thomas


On Mar 13, 2009, at 10:06 AM, Paolo Bonzini wrote:




Hm.  In fold-const.c we try to make sure to produce the same result
as the target would for constant-folding shifts.  Thus, Paolo, I think
what fold-const.c does is what we should assume for
!SHIFT_COUNT_TRUNCATED.  No?

Unfortunately it is not so simple.  fold-const.c is actually wrong, as
witnessed by this program

static inline int f (int s) { return 2 << s; }
int main () { printf ("%d\n", f(33)); }

which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu.

But this is because i?86 doesn't define SHIFT_COUNT_TRUNCATED, no?

Yes, so fold-const.c is *not* modeling the target in this case.

But on the other hand, this means we can get by with documenting the
effect of a conservative truncation mask: no wrong code bugs, just
differences between optimization levels for undefined programs.  I'll
check that the optimizations done based on the truncation mask are all
conservative or can be made so.

So, I'd still need the information for arm and m68k, because that
information is about the bitfield instructions.  For rs6000 it would be
nice to see what they do for 64-bits (for 32-bit I know that PowerPCs
truncate to 6 bits, not 5).  But for the other architectures, we can be
conservative.

VAX doesn't truncate at all, if you specify >31 bits it raises a
reserved operand exception.


Can't pass temporary with hidden copy ctor as const ref

2009-04-09 Thread Matt Hoosier
Hi,

I'm having trouble compiling the following with g++ 4.2.1:

  class Uncopyable
  {
  public:
  Uncopyable(int x) {}
  private:
  Uncopyable(const Uncopyable & other) {}
  };

  class User
  {
  public:
  void foo(int x)
  {
  foo(Uncopyable(x));
  }

  void foo(const Uncopyable & x)
  {
  // do something
  }
  };

  int main ()
  {
  User u;
  u.foo(1);
  return 0;
  }

The compiler complains that the copy ctor for 'Uncopyable' is
inaccessible; why is this? It would seem that temporaries could be
passed directly as the const ref rather than needing a copy.

Message:

test.cc: In member function 'void User::foo(int)':
test.cc:11: error: 'Uncopyable::Uncopyable(const Uncopyable&)' is private


variadic arguments not thread safe on amd64?

2009-04-27 Thread Matt Provost
I've been trying to write a program with a logging thread that will
consume messages in 'printf format' passed via a struct. It seemed
that this should be possible using va_copy to copy the variadic
arguments but they would always come out as garbage. This is with gcc
4.1.2 on amd64. Reading through the amd64 ABI it's now clear that the
va_list is just a struct and the actual values are stored in
registers. So I imagine that when it switches threads the registers
are restored and the va_list isn't valid anymore. But I can't find any
documentation about whether the va_* macros were ever supposed to be
thread safe. It seems that they probably are everywhere except PPC and
amd64.

Is there a portable way to pass a va_list between threads?

Here's an example program, if you compile it on a 32 bit machine (or
even with -m32) it prints out both strings ok, but on amd64 it will
print nulls for the threaded case.

$ gcc -m32 -g -lpthread test.c
$ ./a.out hello world
debug: hello world
tdebug: hello world
$ gcc -m64 -g -lpthread test.c
$ ./a.out hello world
debug: hello world
tdebug: (null) (null)


#include <stdio.h>
#include <stdarg.h>
#include <pthread.h>
#include <unistd.h>

typedef struct log_s {
const char *format;
va_list ap;
} log_t;

log_t mylog;
pthread_mutex_t m;
pthread_cond_t c;

void printlog() {
vprintf(mylog.format, mylog.ap);
}

void *tprintlog() {
pthread_mutex_lock(&m);
pthread_cond_wait(&c, &m);
vprintf(mylog.format, mylog.ap);
pthread_mutex_unlock(&m);
}

void debug(const char *format, ...) {
va_list ap;
mylog.format = format;
va_start(ap, format);
va_copy(mylog.ap, ap);
printlog();
va_end(ap);
}

void tdebug(const char *format, ...) {
va_list ap;
pthread_mutex_lock(&m);
mylog.format = format;
va_start(ap, format);
va_copy(mylog.ap, ap);
pthread_cond_signal(&c);
pthread_mutex_unlock(&m);
}

int main(int argc, char *argv[]) {
pthread_t t;

debug("debug: %s %s\n", argv[1], argv[2]);

pthread_mutex_init(&m, NULL);
pthread_cond_init(&c, NULL);
pthread_create(&t, NULL, tprintlog, NULL);

sleep(1);

tdebug("tdebug: %s %s\n", argv[1], argv[2]);

sleep(1);
}


Re: variadic arguments not thread safe on amd64?

2009-04-27 Thread Matt Provost
On Mon, Apr 27, 2009 at 08:49:27PM -0700, Andrew Pinski wrote:
> On Mon, Apr 27, 2009 at 8:37 PM, Matt Provost  wrote:
> > void tdebug(const char *format, ...) {
> >     va_list ap;
> >     pthread_mutex_lock(&m);
> >     mylog.format = format;
> >     va_start(ap, format);
> >     va_copy(mylog.ap, ap);
> >     pthread_cond_signal(&c);
> >     pthread_mutex_unlock(&m);
> 
> You are missing two va_end's here
> 

Yes I had a question about va_end in this situation. Putting one that
clears 'ap' seems fine but doesn't change anything. But if you va_end
the copy that you put in the struct, then what happens when the other
thread goes to use it? Or should the va_end for that be in the tprintlog
function after it's done with it?

In any case none of those combinations seem to affect the output.

Thanks,
Matt

