Re: Modifying ARM code generator for elimination of 8bit writes - need help
On Thu, Jul 20, 2006 at 03:27:49PM +0200, Rask Ingemann Lambertsen wrote:
> (define_expand "reload_outqi"
>   [(clobber (match_operand:QI 0 "memory_operand" "=Q"))
>    (clobber (match_operand:DI 2 "register_operand" "=&r"))
>    (set (match_dup 4) (match_dup 5))
>    (parallel [
>      (set (match_dup 6)
>           (match_operand:QI 1 "register_operand" "r"))
>      (clobber (match_dup 3))]
>   )]
[...]

I should perhaps explain how this works. Let's say operand 0 is (mem:QI (reg:SI 0)), operand 1 is (reg:QI 1) and operand 2 is (reg:DI 2). We then get:

  (clobber (mem:QI (reg:SI 0)))
  (clobber (reg:DI 2))
  (set (reg:SI 3) (reg:SI 3))            ; {*arm_movsi_insn}, optimized away later.
  (parallel [
    (set (mem:QI (reg:SI 0)) (reg:QI 1))
    (clobber (reg:QI 2))
  ])                                     ; {_arm_movqi_insn_swp}

If operand 0 is (mem:QI (plus:SI (reg:SI 0) (const_int 16))), we get:

  (clobber (mem:QI (plus:SI (reg:SI 0) (const_int 16))))
  (clobber (reg:DI 2))
  (set (reg:SI 3) (plus:SI (reg:SI 0) (const_int 16)))   ; {*arm_addsi3}
  (parallel [
    (set (mem:QI (reg:SI 3)) (reg:QI 1))
    (clobber (reg:QI 2))
  ])                                     ; {_arm_movqi_insn_swp}

I'll rewrite it to make it clearer what is going on. Also, the two clobber expressions have no purpose in the insn stream. They only exist because all external operands must be declared using match_operand somewhere in the RTL template, and RTL offers no good way of doing that in a case like this one.

-- 
Rask Ingemann Lambertsen
Re: Gcov: Counting lines of source code (untested files) as gcov does
Apparently I forgot to CC this to the list last time, so here is a new attempt!

> Fredrik, if you can modify the source: if any line in a source file is touched, you'll get a .da file. So you could add a dummy routine to each file and call them all at startup. That will be an easier project than trying to write a new .bb/.bbg reader.

In that case I guess I wouldn't need any "dummy routine" either. I could just include the code from a test file that I know is being run, since I have one TestMain.cpp or similar that loads the test suite. That would work, wouldn't it? I'm at home now so I don't have the code in front of me, but an included file that isn't used gets 0% coverage, right?

But anyway, I would really, really hate to have to alter the main code base. I've been hired for the summer to alter the build system. If I told the other developers to make changes to their components, I think that would discourage them from using gcov at all, which is the opposite of what I'm hired to do :)

Is it not possible to "trick" gcov (or create a version of gcov that can be tricked) into believing that there is a .da file but that no lines have been executed, or something like that? (The line count is done by reading the .bb file anyway, right?)

Thanks!

Regards,
Fredrik
Re: Modifying ARM code generator for elimination of 8bit writes - need help
On Thu, Jul 20, 2006 at 04:37:41PM +0200, Rask Ingemann Lambertsen wrote:
> ;; This is primarily a hack for the Nintendo DS external RAM.
> (define_insn "_arm_movqi_insn_swp"
>   [(set (match_operand:QI 0 "reg_or_Qmem_operand" "=r,r,r,Q,Q")
>         (match_operand:QI 1 "general_operand" "rI,K,m,r,r"))
>    (clobber (match_scratch:QI 2 "=X,X,X,1,&r"))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES
>    && (   register_operand (operands[0], QImode)
>        || register_operand (operands[1], QImode))"
>   "@
>    mov%?\\t%0, %1
>    mvn%?\\t%0, #%B1
>    ldr%?b\\t%0, %1
>    swp%?b\\t%1, %1, [%|%m0]
>    swp%?b\\t%2, %1, [%|%m0]"
>   [(set_attr "type" "*,*,load1,store1,store1")
>    (set_attr "predicable" "yes")]
> )

I found that this peephole optimization improves the code a whole lot:

;; The register allocator is often stupid. Try to change
;;      mov     r2, r1
;;      swpb    r2, r2, [r0]
;; into
;;      swpb    r2, r1, [r0]
;; (and pretend it is just another way of allocating a scratch register).
(define_peephole2
  [(parallel [(set (match_operand:QI 2 "register_operand")
                   (match_operand:QI 1 "register_operand"))
              (clobber (match_scratch:QI 3))])
   (parallel [(set (match_operand:QI 0 "memory_operand")
                   (match_dup 2))
              (clobber (match_dup 2))])]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
  [(parallel [(set (match_dup 0) (match_dup 1))
              (clobber (match_dup 2))])]
)

Another way of improving the code was to swap the order of the last two alternatives of _arm_movqi_insn_swp. There are a few differences in the generated code, shown with "1,&r" on the left and "&r,1" on the right:

.L92:                         .L92:
  ldr r2, [fp, #-144]       |   ldr r1, [fp, #-144]
  ldr r3, [fp, #-152]           ldr r3, [fp, #-152]
  cmp r2, #0                |   cmp r1, #0
  add r2, r3, #2                add r2, r3, #2
  ldreq r0, [fp, #-144]     |   moveq r0, r1

Above, [fp, #-144] is reloaded from memory for no apparent reason.

.L141:                        .L141:
  ldr r0, [fp, #-152]       |   ldr r2, [fp, #-152]
  sub r3, r0, #2            |   sub r3, r2, #2
  cmp r5, r3                    cmp r5, r3
  beq .L142                     beq .L142
  cmp r5, #0                    cmp r5, #0
  movne r2, r0              |   beq .L144
  bne .L146                 |   b .L146
  b .L144                   <

Some sort of register allocation mismatch.
  beq .L160                 |   beq .L155
  cmp r0, #44                   cmp r0, #44
  cmpne r0, #59                 cmpne r0, #59
  beq .L160                 |   beq .L155
  cmp r0, #61                   cmp r0, #61
  cmpne r0, #43                 cmpne r0, #43
  bne .L158                     bne .L158
                            > .L155:
                            >   mov ip, #95
                            >   str r8, [fp, #-120]
                            >   mov r0, #1
                            >   swpb r2, ip, [r6]
                            >   b .L159
.L160:                        .L160:
  mov r3, #95                   mov r3, #95
  str r8, [fp, #-120]           str r8, [fp, #-120]
  mov r0, #1                    mov r0, #1
  swpb r1, r3, [r6]             swpb r1, r3, [r6]
  b .L159                       b .L159

Code duplication, presumably because of the different register allocation.

  ldr lr, [fp, #-104]           ldr lr, [fp, #-104]
  ldrb r2, [r1, ip]             ldrb r2, [r1, ip]
  add r3, r1, lr                add r3, r1, lr
  swpb r2, r2, [r3]         |   swpb lr, r2, [r3]
  ldr r2, [fp, #-132]           ldr r2, [fp, #-132]
  add r1, r1, #1                add r1, r1, #1
                            >   ldr lr, [fp, #-104]
  add r2, r2, #1                add r2, r2, #1
  cmp r1, r0                    cmp r1, r0
  str r2, [fp, #-132]           str r2, [fp, #-132]
  add r3, lr, r1                add r3, lr, r1

Here, the register allocator is just plain stupid in not using the best alternative. I suspect this is because only reload allocates scratch registers and doesn't realize that the input register dies in this insn.

  ldr r2, [fp, #-184]       |   ldr r5, [fp, #-184]
  strh r3, [r4, #22]            strh r3, [r4, #22]
  strh r3, [r4, #14]            strh r3, [r4, #14]
  mov r0, r2, asr #16       <
  ldrh r2, [fp, #-48]           ldrh r2, [fp, #-48]
  mov r1, #0                    m
c++ variable-length array?
Using gcc-4.1.1. Info says variable-length arrays are supported in C++ mode, but this doesn't seem to work:

#include
#include

template
void F (in_t const& in, int size, int x[size]) {}

void G (std::vector const& in, int size, int x[size]) {}

int main () {
  std::vector i (10);
  int x (10);
  F (i, boost::size (i), x);
  G (i, boost::size (i), x);
}

g++ -c Test.cc -I /usr/local/src/boost.cvs
Test.cc:5: error: ‘size’ was not declared in this scope
Test.cc:7: error: ‘size’ was not declared in this scope
Test.cc: In function ‘int main()’:
Test.cc:12: error: no matching function for call to ‘F(std::vector >&, size_t, int&)’
Test.cc:7: error: too many arguments to function ‘void G(const std::vector >&, int)’
Test.cc:13: error: at this point in file
Fortran fail on 4.0 branch
Hi,

my daily build for s390(x) still shows the following two gfortran testcases failing on the 4.0 branch:

actual_array_constructor_2.f90 (#28167)
actual_array_substr_2.f90 (#28174)

Both were committed by Alexandre Oliva, with rev. 115186 for gcc 4.1 and with rev. 115185 for 4.0. The patch fixing them came with rev. 115222 for 4.1. Unfortunately there was no fix for the 4.0 branch, so they are still failing :(

Is there a fix for 4.0? If not, should we remove the testcases or mark them as xfail - just to polish the test summary a bit ;-)

Bye,
-Andreas-
Re: Fortran fail on 4.0 branch
On 7/21/06, Andreas Krebbel <[EMAIL PROTECTED]> wrote:
> Is there a fix for 4.0? If not, should we remove the testcases or mark them as xfail - just to polish the test summary a bit ;-)

IMHO backporting any fortran patches to 4.0 is a waste of time. There are so many known bugs in the gfortran that was in gcc 4.0 that nobody should use it anyway. Fixing these two test cases might give folks a false sense of confidence in gfortran 4.0 ;-)

Gr.
Steven
Re: MIPS RDHWR instruction reordering
On 19 Jun 2006 16:45:43 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> I'm not sure, because I'm not sure what is hoisting the instruction.
>
> I tried recreating this, but I couldn't. I get this:
>
> foo:
>         .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
>         .mask   0x00000000,0
>         .fmask  0x00000000,0
>         .set    noreorder
>         .cpload $25
>         .set    reorder
>         .set    noreorder
>         .set    nomacro
>         beq     $4,$0,$L7
>         .set    push
>         .set    mips32r2
>         rdhwr   $3,$29
>         .set    pop
>         .set    macro
>         .set    reorder

FYI, I found that the difference between your result (gcc 4.2) and mine (gcc 4.1.1) comes from the r108713 commit. With r108712 I got:

foo:
        .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x00000000,0
        .fmask  0x00000000,0
        .set    noreorder
        .cpload $25
        .set    nomacro
        lw      $2,%gottprel(x)($28)
        .set    push
        .set    mips32r2
        rdhwr   $3,$29
        .set    pop
        addu    $2,$2,$3
        beq     $4,$0,$L4
        move    $3,$0
        lw      $3,0($2)
$L4:
        j       $31
        move    $2,$3

And with r108713 I got:

foo:
        .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x00000000,0
        .fmask  0x00000000,0
        .set    noreorder
        .cpload $25
        .set    nomacro

        beq     $4,$0,$L7
        .set    push
        .set    mips32r2
        rdhwr   $3,$29
        .set    pop

        lw      $2,%gottprel(x)($28)
        nop
        addu    $2,$2,$3
        lw      $2,0($2)
        j       $31
        nop

$L7:
        j       $31
        move    $2,$0

And I cannot see why the commit makes such a difference...

---
Atsushi Nemoto
Re: MIPS RDHWR instruction reordering
Atsushi Nemoto <[EMAIL PROTECTED]> writes:

> And with r108713 I got:
>
> foo:
>         .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
>         .mask   0x00000000,0
>         .fmask  0x00000000,0
>         .set    noreorder
>         .cpload $25
>         .set    nomacro
>
>         beq     $4,$0,$L7
>         .set    push
>         .set    mips32r2
>         rdhwr   $3,$29
>         .set    pop
>
>         lw      $2,%gottprel(x)($28)
>         nop
>         addu    $2,$2,$3
>         lw      $2,0($2)
>         j       $31
>         nop
>
> $L7:
>         j       $31
>         move    $2,$0
>
> And I cannot see why the commit makes such a difference...

I also don't see why revision 108713 would affect this. But I do note that this version is still bad. The rdhwr instruction is in the branch delay slot, and is therefore always executed.

Ian
Re: c++ variable-length array?
On Jul 21, 2006, at 7:14 AM, Neal Becker wrote:
> Using gcc-4.1.1. Info says variable-length array is supported in c++ mode, but doesn't seem to work

Nope, sure doesn't. I don't recall any good reason why we can't support it. I'd file a bug report for it. Being able to compile c99 style code would be useful/nice.
gcc-4.1-20060721 is now available
Snapshot gcc-4.1-20060721 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20060721/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 115654

You'll find:

gcc-4.1-20060721.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20060721.tar.bz2       C front end and core compiler
gcc-ada-4.1-20060721.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20060721.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20060721.tar.bz2        C++ front end and runtime
gcc-java-4.1-20060721.tar.bz2       Java front end and runtime
gcc-objc-4.1-20060721.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20060721.tar.bz2  The GCC testsuite

Diffs from 4.1-20060714 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.