Re: Modifying ARM code generator for elimination of 8bit writes - need help

2006-07-21 Thread Rask Ingemann Lambertsen
On Thu, Jul 20, 2006 at 03:27:49PM +0200, Rask Ingemann Lambertsen wrote:

> (define_expand "reload_outqi"
>   [(clobber (match_operand:QI 0 "memory_operand" "=Q"))
>    (clobber (match_operand:DI 2 "register_operand" "=&r"))
>    (set (match_dup 4) (match_dup 5))
>    (parallel [
>      (set (match_dup 6)
>           (match_operand:QI 1 "register_operand" "r"))
>      (clobber (match_dup 3))]
>    )]
[...]

I should perhaps explain how this works. Let's say operand 0 is (mem:QI
(reg:SI 0)), operand 1 is (reg:QI 1) and operand 2 is (reg:DI 2). We then
get:

(clobber (mem:QI (reg:SI 0)))
(clobber (reg:DI 2))
(set (reg:SI 3) (reg:SI 3)) ; {*arm_movsi_insn}, optimized away later.
(parallel [
  (set (mem:QI (reg:SI 0)) (reg:QI 1))
  (clobber (reg:QI 2))
])  ; {_arm_movqi_insn_swp}

If operand 0 is (mem:QI (plus:SI (reg:SI 0) (const_int 16))), we get:

(clobber (mem:QI (reg:SI 0)))
(clobber (reg:DI 2))
(set (reg:SI 3)
     (plus:SI (reg:SI 0) (const_int 16))) ; {*arm_addsi3}
(parallel [
  (set (mem:QI (reg:SI 3)) (reg:QI 1))
  (clobber (reg:QI 2))
])  ; {_arm_movqi_insn_swp}

I'll rewrite it to make clearer what is going on. Also, the two clobber
expressions have no purpose in the insn stream. They only exist because all
external operands must be declared using match_operand somewhere in the RTL
template and RTL offers no good way of doing that in a case like this one.
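
For readers who don't have the instruction in their head: ARM's "swpb Rd, Rm, [Rn]" reads the byte at [Rn] into Rd and writes the low byte of Rm to the same address, and it only takes a bare register as the address. That is why an address like reg + 16 has to be computed into a scratch register first. Below is a minimal C++ model of that behaviour. The names Machine, swpb and store_byte_via_swp are made up for illustration and are not part of GCC; the register numbers in the usage note simply follow the example above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy machine: 16 registers and 256 bytes of memory.
struct Machine {
    std::array<std::uint32_t, 16> reg{};
    std::array<std::uint8_t, 256> mem{};
};

// Model of "swpb rd, rm, [rn]": load the old byte into rd and store the
// low byte of rm. The address is a bare register, with no offset.
inline void swpb(Machine &m, int rd, int rm, int rn) {
    std::uint32_t addr = m.reg[rn];
    assert(addr < m.mem.size());
    std::uint8_t old = m.mem[addr];
    m.mem[addr] = static_cast<std::uint8_t>(m.reg[rm]);
    m.reg[rd] = old;  // rd is overwritten, hence the scratch clobber
}

// Sketch of the expansion for (mem:QI (plus (reg base) (const_int offset))):
// compute the address into addr_scratch, then swap through data_scratch.
inline void store_byte_via_swp(Machine &m, int base, std::uint32_t offset,
                               int src, int addr_scratch, int data_scratch) {
    m.reg[addr_scratch] = m.reg[base] + offset;  // corresponds to {*arm_addsi3}
    swpb(m, data_scratch, src, addr_scratch);    // corresponds to {_arm_movqi_insn_swp}
}
```

With base register 0, source register 1 and the DI scratch covering registers 2 and 3, a call such as store_byte_via_swp(m, 0, 16, 1, 3, 2) leaves the source register untouched and destroys both halves of the scratch pair.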

-- 
Rask Ingemann Lambertsen


Re: Gcov: Counting lines of source code (untested files) as gcov does

2006-07-21 Thread Fredrik Johansson

Apparently I forgot to CC this to the list last time, so here is a new attempt!


> Fredrik, if you can modify the source: if any line in a source file
> is touched, you'll get a .da file.  So you could add a dummy routine
> to each file and call them all at startup.  That will be an easier
> project than trying to write a new .bb/.bbg reader.
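
One way the "dummy routine per file" suggestion could look in C++ is sketched below. It takes the quoted statement at face value (executing any line in a file is enough to get a .da file) and does not verify it. Every name here (TouchRegistrar, touch_registry, touch_all, widget.cpp, parser.cpp) is hypothetical, and the two "files" are shown in one block only so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using TouchFn = void (*)();

// Function-local static, so registration order between files does not matter.
inline std::vector<TouchFn> &touch_registry() {
    static std::vector<TouchFn> fns;
    return fns;
}

// Constructing one of these at namespace scope registers a dummy routine.
struct TouchRegistrar {
    explicit TouchRegistrar(TouchFn fn) { touch_registry().push_back(fn); }
};

// --- would live in widget.cpp (hypothetical file) ---
static void touch_widget() {}
static TouchRegistrar widget_registrar(touch_widget);

// --- would live in parser.cpp (hypothetical file) ---
static void touch_parser() {}
static TouchRegistrar parser_registrar(touch_parser);

// Call once at startup of the test runner.
// Returns the number of dummy routines that were executed.
inline std::size_t touch_all() {
    for (TouchFn fn : touch_registry()) {
        assert(fn != nullptr);
        fn();
    }
    return touch_registry().size();
}
```

The test main would call touch_all() before loading the suite. The obvious cost is that each component gains two lines that exist only for coverage bookkeeping.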



In that case I guess I wouldn't need any "dummy routine" either; I could just
include the code from a test file that I know is being run. I have a
TestMain.cpp or something similar that loads the test suite and such. That
would work, wouldn't it? I'm at home now so I don't have the code in
front of me, but an included file that isn't used gets 0% coverage,
right?

But anyway, I would really, really hate to have to alter the main code
base. I'm hired over the summer to alter the build system, and if I
came and told the other developers to make changes to their other
components I think that would discourage them from using gcov at all, which
is the opposite of what I'm hired to do :)

Isn't it possible to "trick" gcov (or create a version of gcov that
can be tricked) into believing that there is a .da file in which no
lines have been executed, or something like that (the line count is
done by reading the .bb anyway, right?)?

Thanks!

Regards,
Fredrik


Re: Modifying ARM code generator for elimination of 8bit writes - need help

2006-07-21 Thread Rask Ingemann Lambertsen
On Thu, Jul 20, 2006 at 04:37:41PM +0200, Rask Ingemann Lambertsen wrote:
> ;; This is primarily a hack for the Nintendo DS external RAM.
> (define_insn "_arm_movqi_insn_swp"
>   [(set (match_operand:QI 0 "reg_or_Qmem_operand" "=r,r,r,Q,Q")
>         (match_operand:QI 1 "general_operand" "rI,K,m,r,r"))
>    (clobber (match_scratch:QI 2 "=X,X,X,1,&r"))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES
>    && (   register_operand (operands[0], QImode)
>        || register_operand (operands[1], QImode))"
>   "@
>    mov%?\\t%0, %1
>    mvn%?\\t%0, #%B1
>    ldr%?b\\t%0, %1
>    swp%?b\\t%1, %1, [%|%m0]
>    swp%?b\\t%2, %1, [%|%m0]"
>   [(set_attr "type" "*,*,load1,store1,store1")
>    (set_attr "predicable" "yes")]
> )

I found that this peephole optimization improves the code a whole lot:

;; The register allocator is often stupid. Try to change
;;   mov r2, r1
;;   swpb r2, r2, [r0]
;; into
;;   swpb r2, r1, [r0]
;; (and pretend it is just another way of allocating a scratch register).
(define_peephole2
  [(parallel
     [(set (match_operand:QI 2 "register_operand")
           (match_operand:QI 1 "register_operand"))
      (clobber (match_scratch:QI 3))])
   (parallel
     [(set (match_operand:QI 0 "memory_operand") (match_dup 2))
      (clobber (match_dup 2))])]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
  [(parallel
     [(set (match_dup 0) (match_dup 1))
      (clobber (match_dup 2))])]
)
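
For anyone who finds the RTL hard to read, the same rewrite can be sketched on a toy instruction list in C++. This is only a model of the idea, not how GCC's peephole2 pass works internally. Op, Insn and peephole are names invented for this example, and the check on the address register is a precaution of the sketch rather than something taken from the pattern above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-IR, just enough to express the two insns involved.
enum class Op { Mov, Swpb };

struct Insn {
    Op op;
    int rd;  // destination (for Swpb: the register receiving the old byte)
    int rm;  // source register
    int rn;  // address register, unused for Mov
};

// Rewrite "mov t, s; swpb t, t, [a]" into "swpb t, s, [a]".
// t is overwritten by swpb anyway, so the copy into t is dead afterwards.
inline std::vector<Insn> peephole(const std::vector<Insn> &in) {
    std::vector<Insn> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size()
            && in[i].op == Op::Mov
            && in[i + 1].op == Op::Swpb
            && in[i + 1].rd == in[i].rd     // swpb clobbers the copy
            && in[i + 1].rm == in[i].rd     // and stores the copy
            && in[i + 1].rn != in[i].rd) {  // address must not depend on the mov
            out.push_back({Op::Swpb, in[i].rd, in[i].rm, in[i + 1].rn});
            ++i;  // skip the swpb that was just merged
        } else {
            out.push_back(in[i]);
        }
    }
    assert(out.size() <= in.size());  // the rewrite never adds instructions
    return out;
}
```

Feeding it the two-instruction sequence from the comment (mov r2, r1 followed by swpb r2, r2, [r0]) yields the single swpb r2, r1, [r0].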

Another way of improving the code was to swap the order of the two last
alternatives of _arm_movqi_insn_swp. There are a few differences in the
generated code, shown with "1,&r" to the left and "&r,1" to the right:

.L92:                     .L92:
ldr r2, [fp, #-144]     | ldr r1, [fp, #-144]
ldr r3, [fp, #-152]       ldr r3, [fp, #-152]
cmp r2, #0              | cmp r1, #0
add r2, r3, #2            add r2, r3, #2
ldreq   r0, [fp, #-144] | moveq   r0, r1

Above, reload from memory [fp, #-144] for no apparent reason.

.L141:                    .L141:
ldr r0, [fp, #-152]     | ldr r2, [fp, #-152]
sub r3, r0, #2          | sub r3, r2, #2
cmp r5, r3                cmp r5, r3
beq .L142                 beq .L142
cmp r5, #0                cmp r5, #0
movne   r2, r0          | beq .L144
bne .L146               | b   .L146
b   .L144               <

Some sort of register allocation mismatch.

beq .L160               | beq .L155
cmp r0, #44               cmp r0, #44
cmpne   r0, #59           cmpne   r0, #59
beq .L160               | beq .L155
cmp r0, #61               cmp r0, #61
cmpne   r0, #43           cmpne   r0, #43
bne .L158                 bne .L158
                        > .L155:
                        > mov ip, #95
                        > str r8, [fp, #-120]
                        > mov r0, #1
                        > swpb r2, ip, [r6]
                        > b   .L159
.L160:                    .L160:
mov r3, #95               mov r3, #95
str r8, [fp, #-120]       str r8, [fp, #-120]
mov r0, #1                mov r0, #1
swpb r1, r3, [r6]         swpb r1, r3, [r6]
b   .L159                 b   .L159

Code duplication, presumably because of the different register allocation.

ldr lr, [fp, #-104]       ldr lr, [fp, #-104]
ldrb r2, [r1, ip]         ldrb r2, [r1, ip]
add r3, r1, lr            add r3, r1, lr
swpb r2, r2, [r3]       | swpb lr, r2, [r3]
ldr r2, [fp, #-132]       ldr r2, [fp, #-132]
add r1, r1, #1            add r1, r1, #1
                        > ldr lr, [fp, #-104]
add r2, r2, #1            add r2, r2, #1
cmp r1, r0                cmp r1, r0
str r2, [fp, #-132]       str r2, [fp, #-132]
add r3, lr, r1            add r3, lr, r1

Here, the register allocator is just plain stupid in not using the best
alternative. I suspect this is because only reload allocates scratch
registers and doesn't realize that the input register dies in this insn.

ldr r2, [fp, #-184]     | ldr r5, [fp, #-184]
strh r3, [r4, #22]        strh r3, [r4, #22]
strh r3, [r4, #14]        strh r3, [r4, #14]
mov r0, r2, asr #16     <
ldrh r2, [fp, #-48]       ldrh r2, [fp, #-48]
mov r1, #0                m

c++ variable-length array?

2006-07-21 Thread Neal Becker
Using gcc-4.1.1.  Info says variable-length array is supported in c++ mode,
but doesn't seem to work:

#include 
#include 

template
void F (in_t const& in, int size, int x[size]) {}

void G (std::vector const& in, int size, int x[size]) {}

int main () {
  std::vector i (10);
  int x (10);
  F (i, boost::size (i), x);
  G (i, boost::size (i), x);
}
g++ -c Test.cc -I /usr/local/src/boost.cvs
Test.cc:5: error: ‘size’ was not declared in this scope
Test.cc:7: error: ‘size’ was not declared in this scope
Test.cc: In function ‘int main()’:
Test.cc:12: error: no matching function for call to ‘F(std::vector >&, size_t, int&)’
Test.cc:7: error: too many arguments to function ‘void G(const
std::vector >&, int)’
Test.cc:13: error: at this point in file
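
For comparison, here is a form that standard C++ does accept. An array parameter decays to a pointer, so nothing is lost by passing the pointer and the element count separately. This is only a sketch of a workaround and says nothing about what GCC's variable-length array extension is meant to accept. The function name sum_first, the int element type and the summing body are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The bound travels as an ordinary parameter instead of being written
// as x[size] in the declarator.
inline int sum_first(std::vector<int> const &in, std::size_t size,
                     int const *x) {
    assert(size <= in.size());
    int total = 0;
    for (std::size_t k = 0; k < size; ++k)
        total += in[k] + x[k];  // caller guarantees x has at least size elements
    return total;
}
```

A caller with a local array int x[3] would write sum_first(v, v.size(), x), and the array name converts to a pointer to its first element.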




Fortran fail on 4.0 branch

2006-07-21 Thread Andreas Krebbel
Hi,

my daily build for s390(x) still shows the 2 following 
gfortran testcases failing on the 4.0 branch:

actual_array_constructor_2.f90  (#28167)
actual_array_substr_2.f90   (#28174)

Both were committed with rev. 115186 for gcc 4.1 and with rev. 115185 
for 4.0 by Alexandre Oliva. The patch fixing these came with rev. 115222 for 
4.1.
Unfortunately there was no fix for the 4.0 branch so they are still failing :(

Is there a fix for 4.0? If not, should we remove the testcases or mark them
as xfail - just to polish the test summary a bit ;-)

Bye,

-Andreas-


Re: Fortran fail on 4.0 branch

2006-07-21 Thread Steven Bosscher

On 7/21/06, Andreas Krebbel <[EMAIL PROTECTED]> wrote:

> Is there a fix for 4.0? If not, should we remove the testcases or mark them
> as xfail - just to polish the test summary a bit ;-)


IMHO backporting any fortran patches to 4.0 is a waste of time. There
are so many known bugs in the gfortran that was in gcc 4.0 that nobody
should use it anyway. Fixing these two test cases might give folks a
false sense of confidence in gfortran 4.0 ;-)

Gr.
Steven


Re: MIPS RDHWR instruction reordering

2006-07-21 Thread Atsushi Nemoto
On 19 Jun 2006 16:45:43 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> I'm not sure, because I'm not sure what is hoisting the instruction.
> 
> I tried recreating this, but I couldn't.  I get this:
> 
> foo:
>   .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
>   .mask   0x,0
>   .fmask  0x,0
>   .set    noreorder
>   .cpload $25
>   .set    reorder
>   .set    noreorder
>   .set    nomacro
>   beq $4,$0,$L7
>   .set    push
>   .set    mips32r2
>   rdhwr   $3,$29
>   .set    pop
>   .set    macro
>   .set    reorder

FYI, I found that the difference between your result (gcc 4.2) and
mine (gcc 4.1.1) comes from the r108713 commit.

With r108712 I got:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.set    noreorder
.cpload $25
.set    nomacro

lw  $2,%gottprel(x)($28)
.set    push
.set    mips32r2
rdhwr   $3,$29
.set    pop
addu    $2,$2,$3
beq $4,$0,$L4
move    $3,$0

lw  $3,0($2)
$L4:
j   $31
move    $2,$3

And with r108713 I got:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.set    noreorder
.cpload $25
.set    nomacro

beq $4,$0,$L7
.set    push
.set    mips32r2
rdhwr   $3,$29
.set    pop

lw  $2,%gottprel(x)($28)
nop
addu    $2,$2,$3
lw  $2,0($2)
j   $31
nop

$L7:
j   $31
move    $2,$0

And I cannot see why that commit makes such a difference...

---
Atsushi Nemoto


Re: MIPS RDHWR instruction reordering

2006-07-21 Thread Ian Lance Taylor
Atsushi Nemoto <[EMAIL PROTECTED]> writes:

> And with r108713 I got:
> 
> foo:
>   .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
>   .mask   0x,0
>   .fmask  0x,0
>   .set    noreorder
>   .cpload $25
>   .set    nomacro
>
>   beq $4,$0,$L7
>   .set    push
>   .set    mips32r2
>   rdhwr   $3,$29
>   .set    pop
>
>   lw  $2,%gottprel(x)($28)
>   nop
>   addu    $2,$2,$3
>   lw  $2,0($2)
>   j   $31
>   nop
>
> $L7:
>   j   $31
>   move    $2,$0
> 
> And I cannot see why that commit makes such a difference...

I also don't see why revision 108713 would affect this.

But I do note that this version is still bad.  The rdhwr instruction
is in the branch delay slot, and is therefore always executed.
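
To make the delay-slot point concrete: inside a noreorder region, the instruction immediately following a MIPS branch or jump executes whether or not the branch is taken. The toy interpreter below counts how often an rdhwr runs for a program shaped like the r108713 output. It is far simpler than a real MIPS pipeline, and all names in it (Kind, Step, count_rdhwr, r108713_shape) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Branch, Return, Rdhwr, Other };

struct Step {
    Kind kind;
    std::size_t target;  // index to jump to; only meaningful for Branch
};

// Runs the program and returns how many times an Rdhwr was executed.
// A Branch or Return always executes the following instruction first.
inline int count_rdhwr(const std::vector<Step> &prog, bool branch_taken) {
    int executed = 0;
    std::size_t pc = 0;
    while (pc < prog.size()) {
        const Step &s = prog[pc];
        if (s.kind == Kind::Branch || s.kind == Kind::Return) {
            assert(pc + 1 < prog.size());  // every jump needs a delay slot
            if (prog[pc + 1].kind == Kind::Rdhwr)
                ++executed;                // the slot runs before the jump lands
            if (s.kind == Kind::Return)
                break;
            pc = branch_taken ? s.target : pc + 2;
            continue;
        }
        if (s.kind == Kind::Rdhwr)
            ++executed;
        ++pc;
    }
    return executed;
}

// Same instruction order as the r108713 listing quoted above.
inline std::vector<Step> r108713_shape() {
    return {
        {Kind::Branch, 8},  // beq $4,$0,$L7
        {Kind::Rdhwr, 0},   // rdhwr $3,$29   (delay slot)
        {Kind::Other, 0},   // lw
        {Kind::Other, 0},   // nop
        {Kind::Other, 0},   // addu
        {Kind::Other, 0},   // lw
        {Kind::Return, 0},  // j $31
        {Kind::Other, 0},   // nop            (delay slot)
        {Kind::Return, 0},  // $L7: j $31
        {Kind::Other, 0},   // move $2,$0     (delay slot)
    };
}
```

In this model count_rdhwr(r108713_shape(), true) and count_rdhwr(r108713_shape(), false) both return 1, so the rdhwr is paid for even on the path that returns 0 without touching the TLS variable.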

Ian


Re: c++ variable-length array?

2006-07-21 Thread Mike Stump

On Jul 21, 2006, at 7:14 AM, Neal Becker wrote:
> Using gcc-4.1.1.  Info says variable-length array is supported in c++ mode,
> but doesn't seem to work


Nope, sure doesn't.  I don't recall any good reason why we can't  
support it.  I'd file a bug report for it.


Being able to compile c99 style code would be useful/nice.


gcc-4.1-20060721 is now available

2006-07-21 Thread gccadmin
Snapshot gcc-4.1-20060721 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20060721/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch 
revision 115654

You'll find:

gcc-4.1-20060721.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20060721.tar.bz2       C front end and core compiler
gcc-ada-4.1-20060721.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20060721.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20060721.tar.bz2        C++ front end and runtime
gcc-java-4.1-20060721.tar.bz2       Java front end and runtime
gcc-objc-4.1-20060721.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20060721.tar.bz2  The GCC testsuite

Diffs from 4.1-20060714 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.