Re: Second GCC 4.6.0 release candidate is now available
On Tue, Mar 22, 2011 at 11:12 AM, Jakub Jelinek wrote: > A second GCC 4.6.0 release candidate is available at: > > ftp://gcc.gnu.org/pub/gcc/snapshots/4.6.0-RC-20110321/ > > Please test the tarballs and report any problems to Bugzilla. > CC me on the bugs if you believe they are regressions from > previous releases severe enough to block the 4.6.0 release. > > If no more blockers appear I'd like to release GCC 4.6.0 > early next week. The RC bootstraps C, C++, Fortran, Obj-C, and Obj-C++ on ARMv7/Cortex-A9/Thumb-2/NEON, ARMv5T/ARM/softfp, ARMv5T/Thumb/softfp, and ARMv4T/ARM/softfp. I'm afraid I haven't reviewed the test results (Richard? Ramana?) See: http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02298.html http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02391.html http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02394.html http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02393.html and: http://builds.linaro.org/toolchain/gcc-4.6.0-RC-20110321/logs/ -- Michael
Re: Second GCC 4.6.0 release candidate is now available
On Sat, Mar 26, 2011 at 4:16 AM, Ramana Radhakrishnan wrote: > Hi Michael, > > Thanks for running these. I spent some time this morning looking > through the results, they largely look ok though I don't have much > perspective on the > the objc/ obj-c++ failures. > > These failures here > > For v7-a , A9 and Neon - these failures below: > >> Running target unix >> FAIL: gfortran.dg/array_constructor_11.f90 -O3 -fomit-frame-pointer (test >> for excess errors) >> UNRESOLVED: gfortran.dg/array_constructor_11.f90 -O3 -fomit-frame-pointer >> compilation failed to produce executable >> FAIL: gfortran.dg/array_constructor_11.f90 -O3 -fomit-frame-pointer >> -funroll-loops (test for excess errors) >> UNRESOLVED: gfortran.dg/array_constructor_11.f90 -O3 -fomit-frame-pointer >> -funroll-loops compilation failed to produce executable >> FAIL: gfortran.dg/array_constructor_11.f90 -O3 -fomit-frame-pointer >> -funroll-all-loops -finline-functions (test for excess errors) >> UNRESOLVED: gfortran.dg/array_constructor_11.f90 -O3 -fomit-frame-pointer >> -funroll-all-loops -finline-functions compilation failed to produce >> executable >> FAIL: gfortran.dg/array_constructor_11.f90 -O3 -g (test for excess errors) >> UNRESOLVED: gfortran.dg/array_constructor_11.f90 -O3 -g compilation failed >> to produce executable >> FAIL: gfortran.dg/func_assign_3.f90 -O3 -fomit-frame-pointer (test for >> excess errors) >> UNRESOLVED: gfortran.dg/func_assign_3.f90 -O3 -fomit-frame-pointer >> compilation failed to produce executable >> FAIL: gfortran.dg/func_assign_3.f90 -O3 -fomit-frame-pointer -funroll-loops >> (test for excess errors) >> UNRESOLVED: gfortran.dg/func_assign_3.f90 -O3 -fomit-frame-pointer >> -funroll-loops compilation failed to produce executable >> FAIL: gfortran.dg/func_assign_3.f90 -O3 -fomit-frame-pointer >> -funroll-all-loops -finline-functions (test for excess errors) >> UNRESOLVED: gfortran.dg/func_assign_3.f90 -O3 -fomit-frame-pointer >> -funroll-all-loops -finline-functions 
compilation failed to produce >> executable >> FAIL: gfortran.dg/func_assign_3.f90 -O3 -g (test for excess errors) >> UNRESOLVED: gfortran.dg/func_assign_3.f90 -O3 -g compilation failed to >> produce executable > are caused by a broken assembler. All these tests appear to pass > fine in a cross environment on my machine. I've updated to binutils 2.21.51 which should fix the fault. I'm re-running the Cortex-A9 build against the 4.6.0 release now. > From v5t. > >> FAIL: gcc.dg/c90-intconst-1.c (internal compiler error) >> FAIL: gcc.dg/c90-intconst-1.c (test for excess errors) I re-ran this against the 4.6.0 release and these fails went away. Good. http://gcc.gnu.org/ml/gcc-testresults/2011-04/msg00319.html -- Michael
Re: GCC 4.6.1 Release Candidate available from gcc.gnu.org
On Tue, Jun 21, 2011 at 1:01 AM, Jakub Jelinek wrote:
> The first release candidate for GCC 4.6.1 is available from
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.6.1-RC-20110620
>
> and shortly its mirrors. It has been generated from SVN revision 175201.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux and i686-linux. Please test it and report any issues to
> bugzilla.

It bootstraps C, C++ and Fortran in an ARM Cortex-A9 and an ARMv5TE configuration. The test results are here:

http://gcc.gnu.org/ml/gcc-testresults/2011-06/msg02632.html
http://gcc.gnu.org/ml/gcc-testresults/2011-06/msg02633.html

with more detail here:

http://builds.linaro.org/toolchain/gcc-4.6.1-RC-20110620/logs/

Ramana or Richard, could you have a read over the results please?

-- Michael
Re: performance regression with trunk's gengtype on ARM?
On Mon, Aug 29, 2011 at 8:57 AM, Mikael Pettersson wrote:
> I'm seeing what appears to be a recent massive performance regression
> with trunk's gengtype, as compiled and run in stage 2, on ARM V5TE.
>
> Right now 4.7-20110827's stage2 gengtype has been running for almost
> 10 hours on my ARM build machine, but the process is tiny and no swapping
> occurs. To put those 10 hours in perspective, on this machine (1.6 GHz
> ARM V5TE uniprocessor running Linux) I regularly do full bootstraps and
> regression test suite runs for c,c++,ada,fortran in about 18 hours for
> gcc 4.4, about 20 hours for gcc 4.5, about 24 hours for gcc 4.6, and
> about 27 hours for trunk until recently. So 10 hours or more just in
> stage 2 gengtype is suspicious.
>
> I believe 4.7-20110820 also was unusually slow to build, but I didn't
> monitor that build very carefully so can't say if gengtype was involved
> then too.

FWIW, I build trunk once a week on a PandaBoard. r178096 took 10 hours to bootstrap C, C++, and Fortran and 9 hours to test. The 4.5 release branch at r177893 takes 3:50 to bootstrap and 6:15 to test.

I've put the user time in seconds below. 4.5 is ~20000 s, 4.6 is ~23000 s, and current trunk ~46000 s (2.3 x slower than 4.5).

See http://builds.linaro.org/toolchain/ for more.
-- Michael

gcc-4.5+svn175369 20602.07
gcc-4.5+svn175745 19768.21
gcc-4.5+svn176026 19739.35
gcc-4.5+svn176306 19711.70
gcc-4.5+svn176615 19668.38
gcc-4.5+svn176915 19728.38
gcc-4.5+svn177422 19713.14
gcc-4.5+svn177688 19746.67
gcc-4.5+svn177893 19744.96
gcc-4.6+svn175136 22979.51
gcc-4.6+svn175369 23092.50
gcc-4.6+svn175745 22958.21
gcc-4.6+svn176026 23009.37
gcc-4.6+svn176306 22952.93
gcc-4.6+svn176615 22952.11
gcc-4.6+svn177422 22946.22
gcc-4.6+svn177688 22847.87
gcc-4.6+svn177894 22964.09
gcc-4.6+svn178096 22934.61
gcc-4.7~svn175284 34518.10
gcc-4.7~svn175368 34887.17
gcc-4.7~svn175422 34975.48
gcc-4.7~svn175617 34908.60
gcc-4.7~svn175745 35040.42
gcc-4.7~svn175795 35110.84
gcc-4.7~svn175904 34893.29
gcc-4.7~svn176026 34972.99
gcc-4.7~svn176133 35171.65
gcc-4.7~svn176224 35247.44
gcc-4.7~svn176306 35038.07
gcc-4.7~svn176494 26151.21
gcc-4.7~svn176615 26257.04
gcc-4.7~svn176733 40401.18
gcc-4.7~svn176816 40048.32
gcc-4.7~svn176915 40102.52
gcc-4.7~svn176998 40161.10
gcc-4.7~svn177229 28604.27
gcc-4.7~svn177422 44991.89
gcc-4.7~svn177554 45199.05
gcc-4.7~svn177610 45173.94
gcc-4.7~svn177688 45469.00
gcc-4.7~svn177823 45391.00
gcc-4.7~svn177949 28769.64
gcc-4.7~svn178025 45605.43
gcc-4.7~svn178096 45599.59
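The "2.3 x slower" figure can be checked from the last sample of each branch in the timings above. This is only a small arithmetic sketch: the helper name `slowdown` and the constant names are made up for illustration, and taking the most recent sample per branch (rather than a mean or median) is an assumption.

```c
/* Ratio of two user times; a result above 1.0 means `t` is slower than `base`. */
double slowdown(double base, double t)
{
    return t / base;
}

/* Most recent user-time sample per branch, copied from the timings above. */
static const double GCC_45 = 19744.96;  /* gcc-4.5+svn177893 */
static const double GCC_46 = 22934.61;  /* gcc-4.6+svn178096 */
static const double GCC_47 = 45599.59;  /* gcc-4.7~svn178096 */
```

With these samples, `slowdown(GCC_45, GCC_47)` comes out a little above 2.3, and `slowdown(GCC_45, GCC_46)` is roughly 1.16.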
Re: RFC: Improving support for known testsuite failures
On Thu, Sep 8, 2011 at 8:31 PM, Richard Guenther wrote: > On Wed, Sep 7, 2011 at 5:28 PM, Diego Novillo wrote: >> One of the most vexing aspects of GCC development is dealing with >> failures in the various testsuites. In general, we are unable to >> keep failures down to zero. We tolerate some failures and tell >> people to "compare your build against a clean build". >> >> This forces developers to either double their testing time by >> building the compiler twice or search in gcc-testresults and hope >> to find a relatively similar build to compare against. >> >> Additionally, the marking mechanisms in DejaGNU are generally >> cumbersome and hard to add. Even worse, depending on the >> controlling script, there may not be an XFAIL marker at all. >> >> So, while we would ideally keep NO failures in the testsuite, the >> reality is that we are content with having KNOWN failures. For a >> given set of failures out of 'make check', I would like to have a >> simple filtering mechanism that prunes the known failures out. >> >> Desired features: >> >> - List of known failures lives in SVN. >> - Each target can have its own list. >> - Supports ignoring FAIL, UNRESOLVED and XPASS results. >> - Supports pattern matching to glob sets of failures. >> - Co-exists with the existing XFAIL support in DejaGNU. >> - Supports flaky tests. >> - Supports timestamps to avoid having tests in a knonw-to-fail >> state forever. >> >> In terms of implementation, this filter could be part of 'make >> check'. We'd pipe make check's output to it and it would decide >> whether to emit FAIL/UNRESOLVED/XPASS lines based on the black >> list. >> >> I could also make this a post-check filter that runs on all the >> generated .sum files. The filter could live in >> /contrib and be used on demand. >> >> I am not thrilled about the prospect of implementing this in >> DejaGNU directly. >> >> Thoughts? 
> > I think it would be more useful to have a script parse gcc-testresults@ > postings from the various autotesters and produce a nice webpage > with revisions and known FAIL/XPASSes for the target triplets that > are tested. > > That's been a long time on my TODO list, but my web/script FU is > weak enough that I've been pushing that back. I have something along those lines for the Linaro releases: http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.6-2011.08/logs/armv7l-natty-cbuild162-ursa1-cortexa9r1/gcc-testsuite.txt?base=gcc-linaro-4.6-2011.07-0 and a lower level diff-on-sum-files for each commit: http://builds.linaro.org/toolchain/gcc-linaro-4.5+bzr99541~rsandifo~lp823708-4.5/logs/armv7l-natty-cbuild181-ursa4-armv5r2/testsuite-diff.txt http://builds.linaro.org/toolchain/gcc-linaro-4.6+bzr106801~ams-codesourcery~merge-from-fsf-20110908-4.6/logs/x86_64-natty-cbuild181-oort1-x86_64r1/testsuite-diff.txt They're both a hack and only work against local files. The code is available at: https://launchpad.net/tcwg-web and: https://launchpad.net/cbuild They're both similar to contrib/compare_results but webified and hooked into our auto builders. -- Michael
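For what it's worth, the core of the known-failure filter Diego describes can be sketched in a few lines of C. This is not the code in tcwg-web, cbuild, or contrib/compare_results; the function names are hypothetical, and plain prefix matching stands in for the glob patterns, timestamps, and flaky-test handling in the wish list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `line` is one of the result kinds the filter is meant to prune. */
static bool is_reportable(const char *line)
{
    return strncmp(line, "FAIL: ", 6) == 0 ||
           strncmp(line, "UNRESOLVED: ", 12) == 0 ||
           strncmp(line, "XPASS: ", 7) == 0;
}

/* True if `line` starts with any entry of the known-failure list.
   Prefix matching is a simplification of the requested glob support. */
static bool is_known(const char *line, const char *const known[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strncmp(line, known[i], strlen(known[i])) == 0)
            return true;
    }
    return false;
}

/* True if a .sum line should still be shown after pruning known failures. */
bool is_new_failure(const char *line, const char *const known[], size_t n)
{
    return is_reportable(line) && !is_known(line, known, n);
}
```

A driver would read each line of a .sum file and print only those for which `is_new_failure` returns true, with the `known` list loaded from a per-target file.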
Re: Improvement of Cortex-A15
On Thu, Jan 19, 2012 at 9:35 PM, Yang Yueming wrote: > I want to do some optimizations for Cortex-A15,Is anyone doing this too or is > there any work has been done? > > Yang Yueming Hi there. Cortex-A15 boards aren't readily available so most of the work is being done by the PDSW group inside ARM. Here at Linaro we target the Cortex-A9 but many of the changes also help the A15. Here's our todo list: http://apus.seabright.co.nz/helpers/backlog/project/gcc-linaro We work upstream in the FSF trunk and backport the changes to our 4.6 based branch. We're a pretty open bunch and, if you wish, are happy to backport any changes you make. -- Michael
Changing the order when generating a spill address
Hi there. The port that I'm working on has pointer registers backed by a cache. It's unusual as the cache changes immediately when the pointer register is modified instead of later when it is dereferenced. This means that it is cheaper to copy a base address into the pointer register and then add the offset, as it is less likely that the cache row will change.

Normal code such as this:

---
struct abc { int a; char cc[64]; int b; };

int foo(struct abc *p)
{
  return p->a + p->b;
}
---

generates the correct code:

  LOADACC, R10
  STOREACC, X
  LOADLONG, #68
  ADD, X
  LOADACC, (X)

Code that saves or loads a value into a spill slot however does the opposite, loading the offset first and then adding the base:

  LOADLONG, #16
  STOREACC, X
  LOADACC, R1E (the stack pointer)
  ADD, X
  LOADACC, R10
  STOREACC, (X)

I did a spot check on the bfin port and it does the same:

  P2 = -16 (X);
  P2 = P2 + FP;
  R0 = [P2];

Is there a way of setting the order in which reload generates a spill slot address? I can work around it by implementing LEGITIMIZE_RELOAD_ADDRESS but was wondering if there is a better way.

Thank you,

-- Michael
Picking between alternative ways of expanding a section
Hi there. This is a follow-up to my email of the 24th of May. The short version is: how can I track down why GCC picks one of two alternatives for implementing a function?

In a memcpy() where Pmode == SImode, I get a near ideal implementation. If Pmode == PSImode (due to limitations of the pointer registers) I get something much worse. The difference happens early on. In the .128r.expand dump with Pmode == SImode I get:

;; MEM[base: to] = MEM[base: p];

With PSImode I get offset addressing instead:

;; MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25];

This flows through into the actual code. I assume this is due to GCC assuming that PSImode works differently from SImode and that the cast/translation cost is enough to make offset addressing cheaper overall. The m32c compiler is the only other one using PSImode but it doesn't generate offset addresses. The same thing happens with and without a basic TARGET_ADDRESS_COST and TARGET_RTX_COSTS. I guess I want a way of telling the compiler that PSImode and SImode are equivalent.

The longer version is: the machine I'm working on has two special registers for memory access that are backed by caches. Any change to these registers can cause an expensive cache load cycle, so while they're great for memory access they're terrible for general use. The problem is that with Pmode == SImode the register allocator will now and again use these registers for general operations. I've implemented a partial integer mode PSImode, as suggested by Michael Meissner, and set Pmode to PSImode. This correctly separates things but the compiler now generates significantly worse code.
The example is a simple memcpy():

void copy(int *pfrom, int *pto, int count)
{
  while (count != 0) {
    *pto = *pfrom;
    pto++;
    pfrom++;
    count--;
  }
}

If I have #define Pmode SImode then I get the near-best code:

copy:
  LOADACC, R12    ;# 133 loadaccsi_insn/1
  STOREACC, R13   ;# 134 storeaccsi_insn
  LOADLONG, #0    ;# 139 loadaccsi_insn/2
  XOR, R13        ;# 140 cmpccsi_insn/3
  LOADLONG, #.L4  ;# 43 *bCCeq
  SKIP_IF
  STOREACC, PC
  LOADACC, R11    ;# 121 loadaccsi_insn/1
  STOREACC, Y     ;# 122 storeaccsi_insn
  LOADACC, R10    ;# 127 loadaccsi_insn/1
  STOREACC, X     ;# 128 storeaccsi_insn
.L3:
  LOADACC, (X)    ;# 79 loadaccsi_insn/1
  STOREACC, (Y)   ;# 86 storeaccsi_insn
  LOADLONG, #4    ;# 149 loadaccsi_insn/2
  ADD, Y          ;# 150 addsi3_acc
  ADD, X          ;# 151 addsi3_acc
  LOADLONG, #-1   ;# 103 loadaccsi_insn/2
  ADD, R12        ;# 104 addsi3_acc
  LOADACC, R12    ;# 109 loadaccsi_insn/1
  STOREACC, R10   ;# 110 storeaccsi_insn
  LOADLONG, #0    ;# 115 loadaccsi_insn/2
  XOR, R10        ;# 116 cmpccsi_insn/3
  LOADLONG, #.L3  ;# 57 *bCCne
  STOREACC, PC_IF
.L4:
  POP             ;# 147 *expanded_return
  STOREACC, PC

Note the good

  LOADACC, (X)    ;# 79 loadaccsi_insn/1
  STOREACC, (Y)   ;# 86 storeaccsi_insn
  LOADLONG, #4    ;# 149 loadaccsi_insn/2
  ADD, Y          ;# 150 addsi3_acc
  ADD, X          ;# 151 addsi3_acc

in the middle.
Instead if I have #define Pmode PSImode I get

copy:
  LOADACC, R14    ;# 186 loadaccsi_insn/1
  PUSH            ;# 187 pushsi_acc
  LOADACC, R12    ;# 163 loadaccsi_insn/1
  STOREACC, R13   ;# 164 storeaccsi_insn
  LOADLONG, #0    ;# 169 loadaccsi_insn/2
  XOR, R13        ;# 170 cmpccsi_insn/3
  LOADLONG, #.L4  ;# 43 *bCCeq
  SKIP_IF
  STOREACC, PC
  LOADLONG, #0    ;# 157 loadaccsi_insn/2
  STOREACC, R13   ;# 158 storeaccsi_insn
.L3:
  LOADACC, R13    ;# 85 loadaccsi_insn/1
  STOREACC, X     ;# 86 storeaccsi_insn
  ; No-op truncate on X = X ;# 47 truncsipsi2/1
  LOADACC, R11    ;# 91 loadaccpsi_insn/1
  STOREACC, Y     ;# 92 storeaccpsi_insn
  LOADACC, X      ;# 97 loadaccpsi_insn/1
  ADD, Y          ;# 98 addpsi3_acc
  LOADACC, R10    ;# 103 loadaccpsi_insn/1
  STOREACC, R14   ;# 104 storeaccpsi_insn
  LOADACC, X      ;# 109 loadaccpsi_insn/1
  ADD, R14        ;# 110 addpsi3_acc
  LOADACC, R14    ;# 115 loadaccpsi_insn/1
  STOREACC, X     ;# 116 storeaccpsi_insn
  LOADACC, (X)    ;# 121 loadaccsi_insn/1
  STOREACC, (Y)   ;# 128 storeaccsi_insn
  LOADLONG, #-1   ;# 133 loadaccsi_insn/2
  ADD, R12        ;# 134 addsi3_acc
  LOADLONG, #4    ;# 139 loadaccsi_insn/2
  ADD, R13        ;# 140 addsi3_acc
  LOADACC, R12    ;# 145 loadaccsi_insn/1
  STOREACC, X     ;# 146 storeaccsi_insn
  LOADLONG, #0    ;# 151 loadaccsi_insn/2
  XOR, X          ;# 152 cmpccsi_insn/3
  LOADLONG, #.L3  ;# 59 *bCCne
  STOREACC, PC_IF
.L4:
  POP             ;# 178 popsi_insn
  STOREACC, R14
  POP             ;# 179 *
Re: Compiler for gcc
Hi Harshal. I'm no expert, but GCC can be built by another C compiler. If you have a look at how GCC builds you'll see that it goes through a few stages - the first is where the local C compiler builds a first version of GCC, and then this new version of GCC is used to build itself. The same technique is used to build newer versions of GCC. If your machine currently has GCC version 3 and you want to build version 4 then the first step uses GCC 3 to build a temporary version of GCC 4, and then this temporary version is used to build the final version. -- Michael 2009/8/9 Harshal Jain : > As we know gcc is used 2 compile c programs n also gcc is used 2 > compile linux kernels also bt i wanted 2 know who is d compiler of > gcc? > means in which programming language compiler for gcc is written??? > -- > Regards , > Harshal Jain > > “UNIX is simple. It just takes a genius to understand its simplicity.” > – Dennis Ritchie >
Improving code with no offset addressing
Hi there. The architecture I'm working on porting gcc to has indirect addressing but no constant offset or register offset versions. Code like this:

void fill(int* p)
{
  p[0] = 0;
  p[1] = 0;
  p[2] = 0;
  p[3] = 0;
}

Turns into:

  X = p
  *X = 0
  X = X + 4
  *X = 0
  X = p
  X = X + 8
  *X = 0
  X = p
  X = X + 12
  *X = 0

at both -O and -O2. Note that the first step recognises that X contains p and correctly increases it instead of rebuilding it. I'd like to generate the following code instead:

  X = p
  *X = 0
  X = X + 4
  *X = 0
  X = X + 4
  *X = 0
  X = X + 4
  *X = 0

What is the best way to approach this? It seems to be common across ports (see the note on ia64 and ARM Thumb below). Is there a cost function I can change? Will changing LEGITIMIZE_ADDRESS fix it? Is there some type of value tracking that could be turned on/added?

I've checked the ia64, which also only has indirect addressing, and ARM Thumb, which has limited offsets. ia64 generates the same reload base/add offset as mine:

  mov r14 = r32
  ;;
  st4 [r14] = r0, 4
  ;;
  st4 [r14] = r0
  adds r14 = 8, r32
  ;;
  st4 [r14] = r0
  adds r14 = 12, r32
  ;;
  st4 [r14] = r0
  adds r14 = 16, r32

ARM Thumb does the same when the offset is large (p[70] and p[71] in this case):

  str r3, [r0]       ; p[0]
  str r3, [r0, #4]   ; p[1]
  str r3, [r0, #8]   ; p[2]
  str r3, [r0, #12]  ; p[3]
  mov r2, #140
  mov r3, #0
  lsl r2, r2, #1
  str r3, [r0, r2]   ; p[70]
  mov r2, #142
  lsl r2, r2, #1
  str r3, [r0, r2]   ; p[71]

Thanks for any pointers,

-- Michael
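One source-level experiment that goes with the question above is to write the stores so that each one uses the pointer left behind by the previous one. This is a sketch, not a fix for the port: `fill_incr` is a made-up name, and GCC may well canonicalise both forms to the same base-plus-offset representation internally, in which case the generated code will not change.

```c
/* Same effect as fill(), written so that each store reuses the pointer
   value produced by the previous store instead of indexing from p. */
void fill_incr(int *p)
{
    *p++ = 0;
    *p++ = 0;
    *p++ = 0;
    *p++ = 0;
}
```

Comparing the output for `fill` and `fill_incr` at -O and -O2 should at least show whether the base address is rebuilt because of how the source is written or because of a later pass.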
Re: porting GCC to a micro with a very limited addressing mode --- what to write in LEGITIMATE_ADDRESS, LEGITIMIZE_ADDRESS and micro.md ?!
Hi Sergio. My port has similar addressing modes - all memory must be accessed by one of two registers and can only be accessed indirectly, indirect with pre increment, and indirect with post increment. The key is GO_IF_LEGITIMATE_ADDRESS and the legitimate address helper function. Mine looks like this: /* Return 1 if the address is OK, otherwise 0. Used by GO_IF_LEGITIMATE_ADDRESS. */ bool tomi_legitimate_address (enum machine_mode mode ATTRIBUTE_UNUSED, rtx x, bool strict_checking) { /* (mem reg) */ if (REG_P (x) && tomi_reg_ok (x, strict_checking) ) { return 1; } if (GET_CODE(x) == PRE_DEC) { ... } if (GET_CODE(x) == POST_INC) { ... } return 0; } tomi_reg_ok returns true if x is any register when strict checking is clear and true if x is one of my addressing registers when strict checking is set. GCC will feed any memory accesses through this function to see if they are directly supported, and if not it will break them up into something smaller and try again. Hope that helps, -- Michael 2010/1/26 Sergio Ruocco : > Gabriel Paubert wrote: >> On Mon, Jan 25, 2010 at 01:34:09PM +0100, Sergio Ruocco wrote: >>> Hi everyone, >>> >>> I am porting GCC to a custom 16-bit microcontroller with very limited >>> addressing modes. Basically, it can only load/store using a (general >>> purpose) register as the address, without any offset: >>> >>> LOAD (R2) R1 ; load R1 from memory at address (R2) >>> STORE R1 (R2) ; store R1 to memory at address (R2) >>> >>> As far as I can understand, this is more limited than the current >>> architectures supported by GCC that I found in the current gcc/config/*. >> >> The Itanium (ia64) has the same limited choice of addressing modes. >> >> Gabriel > > Thanks Gabriel. > > I dived into the ia64 md, but it is still unclear to me how the various > parts (macros, define_expand and define_insn in MD etc.) work together > to force the computation of a source/dest address plus offset into a > register... can anyone help me with this ? 
> > Thanks, > > Sergio >
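The strict versus non-strict behaviour of tomi_reg_ok described above can be modelled outside of GCC. The sketch below uses plain integers instead of rtx values, and the register numbers and function names are invented for illustration; it only mirrors the rule stated in the message, not the real port.

```c
#include <stdbool.h>

/* Hypothetical register numbering, for illustration only. */
enum { REG_X = 0, REG_Y = 1, REG_R10 = 5, FIRST_PSEUDO = 32 };

/* The two registers that may hold an address on this imaginary machine. */
static bool is_addr_regno(int regno)
{
    return regno == REG_X || regno == REG_Y;
}

/* Model of the rule in the message: when strict checking is clear any
   register is accepted, when it is set only an address register is. */
bool reg_ok_for_address(int regno, bool strict)
{
    if (!strict)
        return true;
    return is_addr_regno(regno);
}
```

The non-strict form lets the earlier passes use any register as an address, and the strict form lets reload force the value into X or Y.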
Re: porting GCC to a micro with a very limited addressing mode --- success with LEGITIMATE / LEGITIMIZE_ADDRESS, stuck with ICE !
Hi Sergio. Here's the interesting parts from my port. The code's a bit funny looking as I've edited it for this post. In .h: #define BASE_REG_CLASS ADDR_REGS #define INDEX_REG_CLASS NO_REGS #ifdef REG_OK_STRICT # define _REG_OK_STRICT 1 #else # define _REG_OK_STRICT 0 #endif #define REGNO_OK_FOR_BASE_P(r) _regno_ok_for_base_p(r, _REG_OK_STRICT) #define REGNO_OK_FOR_INDEX_P(r) 0 In .c: static bool _reg_ok(rtx reg, bool strict) { int regno = REGNO(reg); bool is_addr = _is_addr_regno(regno); bool ok_strict = is_addr; bool special = regno == ARG_POINTER_REGNUM || regno == TREG_S ; if (strict) { return ok_strict || special; } else { return ok_strict || special || regno >= FIRST_PSEUDO_REGISTER ; } } bool _legitimate_address (enum machine_mode mode ATTRIBUTE_UNUSED, rtx x, bool strict_checking) { /* (mem reg) */ if (REG_P (x) && _reg_ok (x, strict_checking) ) { return 1; } return 0; } Note that this ISA only has indirect addressing and has no indirect + offset or indirect + register modes. GCC handles this just fine by splitting up any other type that fails legitimate_address into smaller components. -- Michael On 10 February 2010 09:02, Sergio Ruocco wrote: > > Michael Hope wrote: >> Hi Sergio. Any luck so far? > > Micheal, thanks for your inquiry. I made some progress, in fact. > > I got the GO_IF_LEGITIMATE_ADDRESS() macro to detect correctly REG+IMM > addresses, and then the LEGITIMIZE_ADDRESS() macro to force them to be > pre-computed in a register. > > However, now the compiler freaks out with an ICE.. :-/ I put some > details below. Thanks for any clue that you or others can give me. > > Cheers, > > Sergio > > == > > > This is a fragment of my LEGITIMIZE_ADDRESS(): > - > > rtx > legitimize_address(rtx X,rtx OLDX, enum machine_mode MODE) > { > rtx op1,op2,op,sum; > op=NULL; > ... 
> if(GET_CODE(X)==PLUS && !no_new_pseudos) > { > op1=XEXP(X,0); > op2=XEXP(X,1); > if(GET_CODE(op1) == CONST_INT && (GET_CODE(op2) == REG || > GET_CODE(op2) == SUBREG)) // base displacement > { > sum = gen_rtx_PLUS (MODE, op1, op2); > op = force_reg(MODE, sum); > } > ... > - > > > Now when compiling a simple program such as: > > void foobar(int par1, int par2, int parN) > { > int a,b; > a = 0x1234; > b = a; > } > > the instructions (n. 8,12,13) which compute the addresses in registers > seem to be generated correctly: > > - > ;; Function foobar > > ;; Register dispositions: > 37 in 4 38 in 2 39 in 4 40 in 2 41 in 2 > > ;; Hard regs used: 2 4 30 > > (note 2 0 3 NOTE_INSN_DELETED) > > (note 3 2 6 0 NOTE_INSN_FUNCTION_BEG) > > ;; Start of basic block 1, registers live: 1 [A1] 29 [B13] 30 [B14] > (note 6 3 8 1 [bb 1] NOTE_INSN_BASIC_BLOCK) > > (insn 8 6 9 1 (set (reg/f:HI 4 A4 [37]) > (plus:HI (reg/f:HI 30 B14) > (const_int -16 [0xfff0]))) 9 {addhi3} (nil) > (nil)) > > (insn 9 8 10 1 (set (reg:HI 2 A2 [38]) > (const_int 4660 [0x1234])) 5 {*constant_load} (nil) > (nil)) > > (insn 10 9 12 1 (set (mem/i:HI (reg/f:HI 4 A4 [37]) [0 a+0 S2 A32]) > (reg:HI 2 A2 [38])) 7 {*store_word} (nil) > (nil)) > > (insn 12 10 13 1 (set (reg/f:HI 4 A4 [39]) > (plus:HI (reg/f:HI 30 B14) > (const_int -14 [0xfff2]))) 9 {addhi3} (nil) > (nil)) > > (insn 13 12 14 1 (set (reg/f:HI 2 A2 [40]) > (plus:HI (reg/f:HI 30 B14) > (const_int -16 [0xfff0]))) 9 {addhi3} (nil) > (nil)) > > (insn 14 13 15 1 (set (reg:HI 2 A2 [orig:41 a ] [41]) > (mem/i:HI (reg/f:HI 2 A2 [40]) [0 a+0 S2 A32])) 4 {*load_word} (nil) > (nil)) > > (insn 15 14 16 1 (set (mem/i:HI (reg/f:HI 4 A4 [39]) [0 b+0 S2 A16]) > (reg:HI 2 A2 [orig:41 a ] [41])) 7 {*store_word} (nil) > (nil)) > ;; End of basic block 1, registers live: > 1 [A1] 29 [B13] 30 [B14] > > (note 16 15 25 NOTE_INSN_FUNCTION_END) > > (note 25 16 0 NOTE_INSN_DELETED) > - > > However, when I compile it > > $ hcc -da foobar8.c > > I get an ICE at the end
Re: GCC porting tutorials
Hi Radu. I found the MMIX backend to be quite useful. It's reasonably small and acceptably up to date. Keep in mind that the MMIX is a 64 bit machine though. The Picochip and ARM are good as well. The ARM port is very complicated due to the number of targets that it supports but fairly clean once you get into it. -- Michael On 24 April 2010 20:53, Radu Hobincu wrote: > Hello, > > My name is Radu Hobincu, I am part of a team at "Politehnica" University > of Bucharest that is developing a massive parallel computing architecture > and currently my job is to port the GCC compiler to this new machine. > > I've been looking over the GCC official site at http://gcc.gnu.org/ but I > couldn't find an official porting tutorial. Is there such a thing? And > maybe a small example for a lightweight architecture? > > Regards, > Radu >
Side effects on memory access
Hi there. I'm looking at porting GCC to a new architecture which has a quite small instruction set and I'm afraid I can't figure out how to represent unintended side effects on instructions. My current problem is accessing memory. Reading an aligned 32 bit word is simple using LOADACC, (X). Half words and bytes are harder as the only instruction available is a load byte with post increment 'LOADACC, (X+)'. How can I tell GCC that loading a byte also increases the pointer register? My first version reserved one of the pointer registers and threw away the modified value but this is inefficient. I suspect that some type of clobber or define_expand is required but I can't figure it out. Thanks for any help, -- Michael
Re: Side effects on memory access
Thanks for the response Ian. Doing the define_expand inserts the post increment but GCC doesn't seem to notice the change in X. I added this code:

(define_expand "movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand")
        (match_operand:QI 1 "general_operand" ""))]
  ""
  {
    if (can_create_pseudo_p () && MEM_P (operands[1]))
      {
        rtx reg = copy_to_reg (XEXP (operands[1], 0));
        emit_insn (gen_movqi_mem (operands[0], reg));
        DONE;
      }
  }
)

; PENDING: The SI here is actually a P
(define_insn "movqi_mem"
  [(set (match_operand:QI 0 "register_operand" "=d")
        (mem:QI (post_inc:SI (match_operand:SI 1 "register_operand" "a"))))]
  ""
  "LOADACC, (%1+)\;STOREACC, %0"
)

The 'd' constraint is for data registers and the 'a' for address registers, which is only the X register due to cache coherency reasons. When compiling this test case:

uint store5(volatile char* p)
{
  return *p + *p;
}

I get the following move2.i.139r.subreg:

---
(insn 3 5 4 2 move2.c:56 (set (reg/v/f:SI 30 [ p ])
        (reg:SI 5 R10 [ p ])) 6 {movsi} (nil))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 move2.c:57 (set (reg:QI 31)
        (mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0 {movqi_mem} (nil))

(insn 8 7 9 2 move2.c:57 (set (reg:SI 27 [ D.1191 ])
        (zero_extend:SI (reg:QI 31))) 24 {zero_extendqisi2} (nil))

(insn 9 8 10 2 move2.c:57 (set (reg:QI 32)
        (mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0 {movqi_mem} (nil))

(insn 10 9 11 2 move2.c:57 (set (reg:SI 26 [ D.1193 ])
        (zero_extend:SI (reg:QI 32))) 24 {zero_extendqisi2} (nil))

(insn 11 10 12 2 move2.c:57 (set (reg:SI 33)
        (plus:SI (reg:SI 26 [ D.1193 ])
            (reg:SI 27 [ D.1191 ]))) 9 {addsi3} (nil))

(insn 12 11 16 2 move2.c:57 (set (reg:SI 28 [ ])
        (reg:SI 33)) 6 {movsi} (nil))

(insn 16 12 22 2 move2.c:58 (set (reg/i:SI 5 R10)
        (reg:SI 28 [ ])) 6 {movsi} (nil))

(insn 22 16 0 2 move2.c:58 (use (reg/i:SI 5 R10)) -1 (nil))
---

Instruction 3 copies the incoming argument in R10 into pseudo 30.
Pseudo 30 is then used at instruction 7 then instruction 9 without either being reloaded or corrected for the post increment. -- Michael 2009/4/22 Ian Lance Taylor : > Michael Hope writes: > >> Hi there. I'm looking at porting GCC to a new architecture which has >> a quite small instruction set and I'm afraid I can't figure out how to >> represent unintended side effects on instructions. >> >> My current problem is accessing memory. Reading an aligned 32 bit >> word is simple using LOADACC, (X). Half words and bytes are harder as >> the only instruction available is a load byte with post increment >> 'LOADACC, (X+)'. > > Wow. > >> How can I tell GCC that loading a byte also increases the pointer >> register? My first version reserved one of the pointer registers and >> threw away the modified value but this is inefficient. I suspect that >> some type of clobber or define_expand is required but I can't figure >> it out. > > Well, you can use a define_expand to generate the move in the first > place. If can_create_pseudo_p() returns true, then you can call > copy_to_reg (addr) to get the address into a register, and you can > generate the post increment. > > (define_expand "movhi" > ... > if (can_create_pseudo_p () && MEM_P (operands[1])) > { > rtx reg = copy_to_reg (XEXP (operands[1], 0)); > emit_insn (gen_movhi_insn (operands[0], reg)); > DONE; > } > ... > ) > > (define_insn "movhi_insn" > [(set (match_operand:HI 0 ...) > (mem:HI (post_inc:P (match_operand:P 1 "register_operand" ...] > ... > ) > > The difficulties are going to come in reload. Reload will want to load > and store 16-bit values in order to spill registers. You will need a > scratch register to dothis, and that means that you need to implement > TARGET_SECONDARY_RELOAD. This is complicated:read the docs carefully > and look at the existing examples. > > Ian >
Re: Side effects on memory access
No luck on that. I've re-baselined off GCC 4.4.0 to get the add_reg_note() function but the register is still re-used without being reloaded. The test case is:

--
uint32_t load_q(volatile uint8_t* p)
{
  return *p + *p;
}
--

The appropriate section of the md file is:

---
(define_expand "movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand")
        (match_operand:QI 1 "general_operand" ""))]
  ""
  {
    if (can_create_pseudo_p () && MEM_P (operands[1]))
      {
        rtx reg = copy_to_reg (XEXP (operands[1], 0));
        rtx insn = emit_insn (gen_movqi_mem (operands[0], reg));
        add_reg_note (insn, REG_INC, reg);
        DONE;
      }
  }
)

(define_insn "movqi_mem"
  [(set (match_operand:QI 0 "register_operand" "=d")
        (mem:QI (post_inc:SI (match_operand:SI 1 "register_operand" "a"))))]
  ""
  "LOADACC, (%1+)\;STOREACC, %0"
)
---

My last RTL dump was wrong due to it hitting a zero extend from memory optimisation. However, this time test.i.136r.subreg1 contains:

---
(insn 3 5 4 2 loads.c:4 (set (reg/v/f:SI 30 [ p ])
        (reg:SI 5 R10 [ p ])) 6 {movsi} (nil))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 loads.c:5 (set (reg:SI 32)
        (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 8 7 9 2 loads.c:5 (set (reg:QI 31)
        (mem:QI (post_inc:SI (reg:SI 32)) [0 S1 A8])) 0 {movqi_mem}
     (expr_list:REG_INC (reg:SI 32) (nil)))

(insn 9 8 10 2 loads.c:5 (set (reg:SI 27 [ D.1215 ])
        (zero_extend:SI (reg:QI 31))) 24 {zero_extendqisi2} (nil))

(insn 10 9 11 2 loads.c:5 (set (reg:SI 34)
        (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 11 10 12 2 loads.c:5 (set (reg:QI 33)
        (mem:QI (post_inc:SI (reg:SI 34)) [0 S1 A8])) 0 {movqi_mem}
     (expr_list:REG_INC (reg:SI 34) (nil)))

(insn 12 11 13 2 loads.c:5 (set (reg:SI 26 [ D.1217 ])
        (zero_extend:SI (reg:QI 33))) 24 {zero_extendqisi2} (nil))

(insn 13 12 14 2 loads.c:5 (set (reg:SI 35)
        (plus:SI (reg:SI 26 [ D.1217 ])
            (reg:SI 27 [ D.1215 ]))) 9 {addsi3} (nil))
---

This is correct so far, but the next step in test.i.138r.cse1 contains:

---
(insn 3 5 4 2 loads.c:4 (set (reg/v/f:SI 30 [ p ])
        (reg:SI 5 R10 [ p ])) 6 {movsi} (nil))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 loads.c:5 (set (reg/f:SI 32 [ p ])
        (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 8 7 9 2 loads.c:5 (set (reg:QI 31)
        (mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0 {movqi_mem}
     (expr_list:REG_INC (reg/f:SI 32 [ p ]) (nil)))

(insn 9 8 10 2 loads.c:5 (set (reg:SI 27 [ D.1215 ])
        (zero_extend:SI (reg:QI 31))) 24 {zero_extendqisi2} (nil))

(insn 10 9 11 2 loads.c:5 (set (reg/f:SI 34 [ p ])
        (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 11 10 12 2 loads.c:5 (set (reg:QI 33)
        (mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0 {movqi_mem}
     (expr_list:REG_INC (reg/f:SI 34 [ p ]) (nil)))

(insn 12 11 13 2 loads.c:5 (set (reg:SI 26 [ D.1217 ])
        (zero_extend:SI (reg:QI 33))) 24 {zero_extendqisi2} (nil))

(insn 13 12 14 2 loads.c:5 (set (reg:SI 35)
        (plus:SI (reg:SI 26 [ D.1217 ])
            (reg:SI 27 [ D.1215 ]))) 9 {addsi3} (nil))
---

At this level pseudo register 30 is being used in each load without being invalidated or re-loaded, so the second load sees the address already incremented by the first.

-- Michael
Re: Side effects on memory access
Thanks. I'm going to work around it for now by post correcting X - it's a hack but I'm in the early stages of the port so I can get back to it later. -- Michael 2009/4/28 Ian Lance Taylor : > Michael Hope writes: > >> My last RTL dump was wrong due to it hitting a zero extend from memory >> optimisation. However, this time test.i.136r.subreg1 contains: > >> (insn 7 4 8 2 loads.c:5 (set (reg:SI 32) >> (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil)) >> >> (insn 8 7 9 2 loads.c:5 (set (reg:QI 31) >> (mem:QI (post_inc:SI (reg:SI 32)) [0 S1 A8])) 0 {movqi_mem} >> (expr_list:REG_INC (reg:SI 32) >> (nil))) > >> This is correct so far, but the next step in test.i.138r.cse1 contains: > >> (insn 7 4 8 2 loads.c:5 (set (reg/f:SI 32 [ p ]) >> (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil)) >> >> (insn 8 7 9 2 loads.c:5 (set (reg:QI 31) >> (mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0 >> {movqi_mem} (expr_list:REG_INC (reg/f:SI 32 [ p ]) >> (nil))) > > This substitution is clearly invalid. So there is a bug in CSE. Most > likely this bug has not been noticed before because POST_INC and friends > are normally inserted by the inc_dec pass which runs after CSE. > > It may be that all that is needed is to change the cse_insn function to > look for REG_INC notes. > > Ian >
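For anyone following the thread, here is a small stand-alone C model of why the substitution is invalid. It is only a sketch: it is not GCC code, the names load_post_inc, sum_with_copies and sum_after_cse are made up for the illustration, and it ignores the volatile qualifier. A post-increment load modifies its address register, so replacing the private copies (regs 32 and 34) with the shared register 30 changes which byte the second load reads.

```c
#include <stdint.h>

/* Model of (mem:QI (post_inc:SI reg)): read the byte, then bump the
   address register that was used for the access. */
static uint8_t load_post_inc(const uint8_t **addr_reg)
{
    uint8_t value = **addr_reg;
    (*addr_reg)++;
    return value;
}

/* Shape of the subreg1 dump: every load works on its own copy of p
   (regs 32 and 34), so p itself (reg 30) is never modified. */
static uint32_t sum_with_copies(const uint8_t *p)
{
    const uint8_t *r32 = p;
    uint32_t first = load_post_inc(&r32);
    const uint8_t *r34 = p;
    uint32_t second = load_post_inc(&r34);
    return first + second;
}

/* Shape of the cse1 dump: both loads post-increment p directly, so the
   second load reads the byte after the one that was intended. */
static uint32_t sum_after_cse(const uint8_t *p)
{
    uint32_t first = load_post_inc(&p);
    uint32_t second = load_post_inc(&p);
    return first + second;
}
```

With a buffer holding the bytes 10 and 20, the first function adds 10 + 10 while the second adds 10 + 20, which is the difference the REG_INC note is meant to protect against.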
Unexpected offsets when eliminating SP
Hi there. I'm working on porting GCC to a new architecture which only does indirect addressing - there is no indirect with displacement.

The problem is with spill locations in GCC 4.4.0. The elimination code correctly eliminates the frame and argument pointers and replaces them with register X. The problem is that it then generates indirect-with-offset loads to load spilt values.

Normal usage such as:

struct foo
{
  int a;
  int b;
};

int bar(struct foo* p)
{
  return p->b;
}

is correctly split into loading X with p, adding four, and then dereferencing.

The RTL is generated after the IRA stage. GCC aborts in post reload with an 'instruction does not satisfy constraints' on:

(insn 183 181 75 3 mandelbrot.c:117 (set (reg:SI 6 R11)
        (mem/c:SI (plus:SI (reg:SI 3 X)
                (const_int -8 [0xfff8])) [0 %sfp+-8 S4 A32])) -1 (nil))

The movsi it matches against is:

(define_insn "movsi_insn"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r,rm,rm,rm,C, rm")
        (match_operand:SI 1 "general_operand" "r, m,I, i ,n, rm,C"))]
  ""
  "@
LOADACC, %1\;STOREACC, %0
LOADACC, %1\;STOREACC, %0
LOADI, #%1\;STOREACC, %0
LOADLONG, #%1\;STOREACC, %0
LOADLONG, %1\;STOREACC, %0
Foo
Bar"
)

I believe it fails because the 'm' constraint does not match: go_if_legitimate_address only supports (mem (reg)) and not (mem (plus (reg...))).

I don't think I had this problem when working against 4.3.3 but I'm not sure.

Could someone point me in the right direction please? Is it appropriate to ask such questions on this list?

-- Michael
Re: Unexpected offsets when eliminating SP
Thanks Jim and Ian. I've added a secondary_reload which does this: ... if (code == MEM) { if (fp_plus_const_operand(XEXP(x, 0), mode)) { sri->icode = in_p ? CODE_FOR_reload_insi : CODE_FOR_reload_outsi; return NO_REGS; } where fp_plus_const_operand is taken from the bfin port - it checks that this is RTL of the form ((plus (reg const)). The .md file contains: --- (define_expand "reload_insi" [(parallel [(set (match_operand:SI 0 "register_operand" "=r") (match_operand:SI 1 "memory_operand" "m")) (clobber (match_operand:SI 2 "register_operand" "=a"))])] "" { fprintf(stderr, "reload_insi\n"); rtx plus_op = XEXP(operands[1], 0); rtx fp_op = XEXP (plus_op, 0); rtx const_op = XEXP (plus_op, 1); rtx primary = operands[0]; rtx scratch = operands[2]; emit_move_insn (scratch, fp_op); emit_insn (gen_addsi3 (scratch, scratch, const_op)); emit_move_insn (primary, gen_rtx_MEM(Pmode, scratch)); DONE; } ) (define_expand "reload_outsi" [(parallel [(match_operand 0 "memory_operand" "=m") (match_operand 1 "register_operand" "r") (match_operand:SI 2 "register_operand" "=&a")])] "" { fprintf(stderr, "reload_outsi\n"); rtx plus_op = XEXP(operands[0], 0); rtx fp_op = XEXP (plus_op, 0); rtx const_op = XEXP (plus_op, 1); rtx primary = operands[1]; rtx scratch = operands[2]; emit_move_insn (scratch, fp_op); emit_insn (gen_addsi3 (scratch, scratch, const_op)); emit_move_insn (gen_rtx_MEM(Pmode, scratch), primary); DONE; } ) --- The reload_insi is being called and is expanding into the correct code but for some reason the reload_outsi never gets called. sri->icode is being set correctly and propagates a few levels up but I couldn't track it any further. The s390 port does the reload in the same way as me. The bfin is similar. I haven't looked further into GO_IF_LEGITIMATE_ADDRESS but it's the next part to look at. It's a stripped down version of the mmix one so it should be roughly OK. I'm a bit confused with the documentation versus the ports. 
For example, REGNO_MODE_CODE_OK_FOR_BASE_P doesn't appear to need a strict form according to the documentation but the bfin port has a strict and non-strict version. Most of the ports have a REG_OK_FOR_BASE_P macro with strict and non-strict versions macro but it's not documented, isn't used, and might have been removed around gcc 4.0. Any ideas on why the reload_outsi above is being eaten? Thanks, -- Michael 2009/4/30 Jim Wilson : > Michael Hope wrote: >> >> HI there. I'm working on porting gcc to a new architecture which only >> does indirect addressing - there is no indirect with displacement. > > The IA-64 target also has only indirect addressing. Well, it has some > auto-increment addressing modes too, but that isn't relevant here. You > could try looking at the IA-64 port to see why it works and yours doesn't. > >> The problem is with spill locations in GCC 4.4.0. The elimination >> code correctly elimates the frame and args pointer and replaces it >> with register X. The problem is that it then generates indirect with >> offset loads to load spilt values. > > Since this is happening inside reload, first thing I would check is to make > sure you handle REG_OK_STRICT correctly. Before reload, a pseudo-reg is a > valid memory address. Inside reload, an unallocated pseudo-reg is actually > a memory location, and hence can not be a valid memory address. This is > controlled by REG_OK_STRICT. > > Jim >
Re: Unexpected offsets when eliminating SP
Thanks for everybody's help. I've gotten things working so I thought I'd quickly write it up.

The architecture I'm working on is deliberately simple. It has:

* An accumulator
* Fourteen general purpose registers R10 to R1E
* X and Y cache registers each backed by non-coherent (!) caches
* A stack backed by the S cache

Memory can only be accessed by the X or Y registers. The cache-coherency problem means you can really only use X unless you can tell Y is far away - but that's a problem for another time. It also means you can't use the S stack as a data stack as you can't address it using X. The only addressing is 32 bit word indirect, 8 bit with pre-decrement, and 8 bit with post-increment.

I allocated R1E to the data stack and R1D to the frame pointer. The general purpose registers are in the DATA_REGS class while X and Y are in ADDR_REGS. Y is marked as fixed to prevent it being used.

The implementation is:

* Set BASE_REG_CLASS to ADDR_REGS
* Set INDEX_REG_CLASS to NO_REGS to reject index addressing
* Implement GO_IF_LEGITIMATE_ADDRESS so that it accepts (mem x) but rejects (mem (plus reg const)) and the others

You can't set BASE_REG_CLASS to NO_REGS as (mem x) is treated as (mem (plus reg 0)).

This works fine until you spill a variable. Spills generate offsets relative to the frame pointer. This is OK provided your frame pointer is a member of ADDR_REGS - mine isn't, so the resulting fixup generates an offset address which kills the compiler.

You can't pretend and put the FP in ADDR_REGS. A non-zero offset will correctly be rejected by GO_IF_LEGITIMATE_ADDRESS and loaded into X, but a zero offset will try to load from R1D.

The solution here is to copy the mc68hc11 and use LEGITIMIZE_RELOAD_ADDRESS to recognise the offset and cause another reload.
This code:

if (GET_CODE (x) == PLUS
    && GET_CODE (XEXP (x, 0)) == REG
    && GET_CODE (XEXP (x, 1)) == CONST_INT)
  {
    HOST_WIDE_INT value = INTVAL (XEXP (x, 1));
    push_reload (x, NULL_RTX, px, NULL, ADDR_REGS, GET_MODE (x), VOIDmode, 0, 0, opnum, reload_type);
    return true;
  }

does that.

I tried TARGET_SECONDARY_RELOAD as well. Code similar to the above would correctly generate the code on an 'in' reload, but for some reason the code for the 'out' reload would never get inserted.

-- Michael

2009/4/29 Michael Hope :
> HI there. I'm working on porting gcc to a new architecture which only
> does indirect addressing - there is no indirect with displacement.
>
> The problem is with spill locations in GCC 4.4.0. The elimination
> code correctly elimates the frame and args pointer and replaces it
> with register X. The problem is that it then generates indirect with
> offset loads to load spilt values.
>
> Normal usage such as:
>
> struct foo
> {
> int a;
> int b;
> }
>
> int bar(struct foo* p)
> {
> return p->b;
> }
>
> is correctly split into load X with p, add four, and then de-references.
>
> The RTL is generated after the IRA stage. GCC aborts in post reload
> with a 'instruction does not satisfy constraints' on:
> (insn 183 181 75 3 mandelbrot.c:117 (set (reg:SI 6 R11)
> (mem/c:SI (plus:SI (reg:SI 3 X)
> (const_int -8 [0xfff8])) [0 %sfp+-8 S4
> A32])) -1 (nil))
>
> The movsi it matches against is:
>
> (define_insn "movsi_insn"
> [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r,rm,rm,rm,C, rm")
> (match_operand:SI 1 "general_operand" "r, m,I, i ,n, rm,C"))]
> ""
> "@
> LOADACC, %1\;STOREACC, %0
> LOADACC, %1\;STOREACC, %0
> LOADI, #%1\;STOREACC, %0
> LOADLONG, #%1\;STOREACC, %0
> LOADLONG, %1\;STOREACC, %0
> Foo
> Bar"
> )
>
> I believe it fails on the constraints as the 'm' constraint misses as
> go_if_legitimate_address only supports (mem (reg)) and not (mem (plus
> (reg...)))
>
> I don't think I had this problem when working against 4.3.3 but I'm not sure.
> > Could someone point me in the right direction please? Is it > appropriate to ask such questions on this list? > > -- Michael >
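The address rules discussed in this thread can be modelled outside GCC. What follows is a toy sketch under stated assumptions, not the port's code and not GCC's RTL API: struct toy_addr, enum toy_reg and both function names are hypothetical, and the only facts taken from the thread are that X and Y form ADDR_REGS, that the frame pointer (R1D) is not in that class, and that only plain register-indirect addresses are accepted.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an address expression. */
enum toy_addr_code { TOY_ADDR_REG, TOY_ADDR_PLUS_CONST };
enum toy_reg { TOY_REG_X, TOY_REG_Y, TOY_REG_FP /* R1D */, TOY_REG_DATA };

struct toy_addr {
    enum toy_addr_code code;
    enum toy_reg base;
    long offset; /* only meaningful for TOY_ADDR_PLUS_CONST */
};

/* ADDR_REGS holds just X and Y in the port described above. */
static bool toy_in_addr_regs(enum toy_reg reg)
{
    return reg == TOY_REG_X || reg == TOY_REG_Y;
}

/* The idea behind the legitimate-address check: accept (mem (reg)) when
   the register is an address register, reject everything else,
   including (mem (plus reg const)). */
static bool toy_legitimate_address_p(const struct toy_addr *addr)
{
    return addr->code == TOY_ADDR_REG && toy_in_addr_regs(addr->base);
}

/* The idea behind the reload-address hook: a reg+const address must be
   computed into an address register before it can be used. */
static bool toy_needs_address_reload(const struct toy_addr *addr)
{
    return addr->code == TOY_ADDR_PLUS_CONST;
}
```

In this model a spill slot at X-8 is rejected as an address and flagged for reload, and a zero-offset slot addressed through the frame pointer is rejected as well because R1D is not an address register.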
Destructive comparison
Hi there. I'm having trouble figuring out how to represent a destructive comparison on the port I'm attempting. The ISA is very simple and accumulator based, so to generate a compare of two registers you would do: ; Compare R10 and R11, destroying R11 and setting C LOADACC, R10 XOR, R11 Note that the XOR instruction leaves the result in R11, i.e. R11 = R11 ^ ACC ; Greater than or equals, unsigned: LOADACC, R10 NOTACC ; Ones complement the accumulator ADD, R11 ; R11 = R11 + ACC, set C The C flag is equivalent to a zero flag in many cases and a carry flag in others so I've followed docs and defined different carry modes. Setting C is done similar to MMIX and bfin where you finally emit a set compare instruction such as: (define_insn "cmpcc_insn" [(set (match_operand:CC 0 "register_operand" "=C") (compare:CC (match_operand:SI 1 "register_operand" "d") (match_operand:SI 2 "register_operand" "b"))) ] "" "XOR, %1" ) Note here the 'b' constraint is for registers in the ACC_REGS class and 'd' is for registers in the DATA_REGS class. This seems to work fine, properly reloading the right operand into the accumulator. How should I represent the destruction/clobbering of operand 1? 
I've tried: * Setting the constraint to '=d' or '+d' to mark it as written * Using a (clobber (match_dup 1)) in the insn form, such as: (define_insn "cmpcc_insn" [(set (match_operand:CC 0 "register_operand" "=C") (compare:CC (match_operand:SI 1 "register_operand" "d") (match_operand:SI 2 "register_operand" "b"))) ] "" "XOR, %1" ) * Using a define_expand to clobber operand 1 later (outside the insn's implicit parallel) * Using a define_insn to mark it as both a destructive xor and compare in parallel, such as: (define_insn "cmpcc_insn" [ (set (match_operand:SI 0 "register_operand" "=d") (xor:SI (match_operand:SI 1 "register_operand" "%0") (match_operand:SI 2 "register_operand" "b"))) (set (match_operand:CC 3 "register_operand" "=C") (compare:CC (match_dup 1) (match_dup 2) )) I'd rather not use a scratch register as the moving between registers involves ACC, which would mean I'd need to save the right hand operand before doing the move. I'd rather have the reload do the move earlier if required if the left operand lives past this instruction. Thanks for any help, -- Michael
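The two instruction sequences from the question can be modelled in plain C to make the side effects explicit. This is a sketch, not the port's code: struct toy_cpu and the function names are invented for the example, and how the hardware derives C from the XOR result is an assumption (the model simply reports whether the result is zero). For the second sequence the arithmetic is exact: the carry out of R11 + ~R10 is set precisely when R11 > R10 as unsigned values, so "R10 >= R11" corresponds to the carry being clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy machine state: only the registers the two sequences touch. */
struct toy_cpu {
    uint32_t acc;
    uint32_t r10;
    uint32_t r11;
};

/* LOADACC, R10 ; XOR, R11
   R11 is overwritten with R11 ^ ACC; the result is zero exactly when the
   two registers held the same value. */
static bool toy_xor_compare(struct toy_cpu *cpu)
{
    cpu->acc = cpu->r10;
    cpu->r11 ^= cpu->acc;
    return cpu->r11 == 0;
}

/* LOADACC, R10 ; NOTACC ; ADD, R11
   R11 is overwritten with R11 + ~R10; the return value is the carry out
   of that 32 bit addition. */
static bool toy_add_not_carry(struct toy_cpu *cpu)
{
    cpu->acc = ~cpu->r10;
    uint64_t wide = (uint64_t)cpu->r11 + cpu->acc;
    cpu->r11 = (uint32_t)wide;
    return (wide >> 32) != 0;
}
```

Both helpers overwrite r11, which is the clobber the pattern has to describe: any value that must survive the comparison has to be copied elsewhere first.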
Re: Destructive comparison
Thanks, that worked. I ended up using: (define_insn "cmpcc_xor" [(set (match_operand:CC 0 "register_operand" "=C") (compare:CC (not:SI (xor:SI (match_operand:SI 1 "register_operand" "%r") (match_operand:SI 2 "register_operand" "b"))) (const_int 0))) (set (match_operand:SI 3 "register_operand" "=1") (not:SI (xor:SI (match_dup 1) (match_dup 2] "" "XOR, %1" ) The important thing was in the generation. The XOR is two operand but I needed to supply a third pretend operand using: emit_insn (gen_cmpcc_(cc_reg, x, y, gen_reg_rtx(SImode))); Using a match_dup instead of operand 3 above, or supplying 'x' twice, lead to the compiler not noticing the change. -- Michael 2009/5/18 Jim Wilson : > Michael Hope wrote: >> >> * Using a define_insn to mark it as both a destructive xor and >> compare in parallel, such as: > > When a compare is in a parallel, the compare must be the first operation. > You have it second. This kind of pattern should work. You can find many > examples of it in the sparc.md file for instance. Of course, in this case, > they aren't generated at RTL generation time. They are generated at combine > time. Still, I'd expect this to work, though there might be some early RTL > optimization passes that are not prepared to handle it. > > See for instance the cmp_cc_xor_not_set pattern in the sparc.md file, which > is similar to what you want. > > Jim >
Re: Destructive comparison
Yip, picked that up after I sent it. Thanks. 2009/5/19 Jim Wilson : > On Mon, 2009-05-18 at 19:58 +1200, Michael Hope wrote: >> (set (match_operand:SI 3 "register_operand" "=1") >> (not:SI (xor:SI (match_dup 1) (match_dup 2] > > not xor is aka xnor. You probably want this without the two "not" > operations. > > Jim > > >
Accumulator based machines
Hi there. The machine I'm working is part accumulator based, part register based. I'm having trouble figuring out how best to tell the compiler how ACC is affected and when. For example, the add instruction is two operand with the destination being a general register: ADD, R11 is equivalent to R11 = R11 + ACC This works fine using a rule like (define_insn "addsi3_insn" [(set (match_operand:SI 0 "register_operand" "=r") (plus:SI (match_operand:SI 1 "register_operand" "0") (match_operand:SI 2 "register_operand" "b")))] (b is the constraint that the register comes from the ACC_REGS class) The logical right shift instruction only works on the accumulator: LSR1 is equivalent to ACC = ACC >> 1 This works fine using: (define_insn "lshrsi3_const" [(set (match_operand:SI 0 "register_operand" "=b") (lshiftrt:SI (match_operand:SI 1 "register_operand" "0") (match_operand:SI 2 "immediate_operand" "")))] The problem is when I have to clobber ACC such as when moving between registers. The output should be: LOADACC, R10; STOREACC, R11 (equivalent to ACC = R10; R11 = ACC) I've tried a parallel clobber like: (define_insn "movsi" [(set (match_operand:SI 0 "nonimmediate_operand" "=b, dam,dam") (match_operand:SI 1 "general_operand" "dami,b, dam")) (clobber (reg:SI TREG_ACC)) but this causes trouble when setting up ACC for the likes of the add above. The compiler runs but the code is incorrect I've tried a parallel with a match_scratch like: (define_insn "movsi" [(set (match_operand:SI 0 "nonimmediate_operand" "=b, rm,rm") (match_operand:SI 1 "general_operand" "rmi,b, rm")) (clobber (match_scratch:SI 2 "=X,X,b")) ] "" "@ LOADACC, %1 STOREACC, %0 LOADACC, %1\;STOREACC, %0" This uses a 'b' constraint to put the scratch into ACC when moving between registers and a 'X' constraint to ignore the scratch when moving to or from ACC directly. This basically works but fails when mixed with other instructions. 
For example, the code: return left + right fails with a 'movsi does not meet constraints' as ACC was already allocated to one of the operands of the addsi, was not available for the scratch register, and as such something else was given to the movsi which didn't match the 'b' constraint. All of the other instructions are OK as I can clobber or mark ACC as an output reload to mark it as dirty. Even the 68hc11 is better off as it can directly move between any two registers :) Any ideas? Am I going about this the wrong way? My first port treated ACC as a fixed register which avoided all of this but generated too many loads and stores. Is there a way of using a register only if a chain of instructions use it? Can I peephole it in someway instead? -- Michael
Limiting the use of pointer registers
Hi there. I'm working on a port to an architecture where the pointer registers X and Y are directly backed by small 128 byte caches. Changing one of these registers to a different memory row causes a cache load cycle, so using them for memory access is fine but using them as general purpose registers is expensive.

How can I prevent the register allocator from using these for anything but memory access?

I have a register class called ADDR_REGS that contains just X and Y and one called DATA_REGS which contains the general registers R10 to R1E. GENERAL_REGS is the same as DATA_REGS. The order they appear in reg_class is DATA_REGS, GENERAL_REGS, then ADDR_REGS. I've defined the constraints for most of the patterns to only take 'r', which prevents X or Y being used as operands for those patterns. I have to allow X and Y to be used in movsi and addsi3 to allow indirect memory addresses to be calculated.

Unfortunately Pmode is SImode so I can't tell the difference between pointer and normal values in PREFERRED_RELOAD_CLASS, LIMIT_RELOAD_CLASS, or TARGET_SECONDARY_RELOAD. I tried setting REGISTER_MOVE_COST and MEMORY_MOVE_COST to 100 when the source or destination is ADDR_REGS but this didn't affect the output.

I suspect that I'll have to do the same as the accumulator and hide X and Y from the register allocator: pretend that any general register can access memory and then use a post reload split to turn the patterns into X based patterns for the later phases to tidy up.

One more question. The backing caches aren't coherent so X and Y can't read and write to the same 128 bytes of memory at the same time. Does GCC have any other information about the location of a pointer that I could use?
Something like: * Pointer is to text memory or read only data, so it is safe to read from * Pointer 1 is in the stack and pointer 2 is in BSS, so they are definitely far apart * Pointer 1 is to to one on stack item and pointer 2 is to a stack item at least 128 bytes apart * The call stack is known and pointer 1 and pointer 2 point to different rows My fallback plan is to add a variable attribute so the programmer can mark the pointer as non overlapping and push the problem onto them. Something clever would be nice though :) Sorry for all the questions - this is quite a difficult architecture. I hope to collect all the answers and do a write up for others to use when I'm done. -- Michael
Using a umulhisi3
Hi there. The architecture I'm working is a 32 bit, word based machine with a 16x16 -> 32 unsigned multiply. For some reason the combine stage is converting the umulhisi3 into a mulsi3 and I'm not sure how to track this down. The test code is part of an alpha blend: void blend(uint8_t* sb, uint8_t* db) { uint16_t ia = 256 - *sb; uint16_t d = *db; *db = ((d * ia) >> 8) + *sb; } I've define the different multiplies in the .md file: (define_insn "umulhisi3" [(set (match_operand:SI 0 "register_operand" "=r") (mult:SI (zero_extend:SI (match_operand:HI 1 "register_operand" "%r")) (zero_extend:SI (match_operand:HI 2 "register_operand" "r"] "" ... (define_insn "mulsi3" [(set (match_operand:SI 0 "register_operand" "=r") (mult:SI (match_operand:SI 1 "register_operand" "%r") (match_operand:SI 2 "register_operand" "r")))] "" ... Running at -O level optimisations gives the following in umul.157r.outof_cfglayout, just before the combine stage: --- (insn 3 6 4 2 umul.c:16 (set (reg/v/f:SI 28 [ sb ]) (reg:SI 0 R10 [ sb ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 0 R10 [ sb ]) (nil))) (insn 4 3 5 2 umul.c:16 (set (reg/v/f:SI 29 [ db ]) (reg:SI 1 R11 [ db ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 1 R11 [ db ]) (nil))) (note 5 4 8 2 NOTE_INSN_FUNCTION_BEG) (insn 8 5 9 2 umul.c:17 (set (reg:SI 26 [ D.1217 ]) (zero_extend:SI (mem:QI (reg/v/f:SI 28 [ sb ]) [0 S1 A8]))) 27 {zero_extendqisi2} (expr_list:REG_DEAD (reg/v/f:SI 28 [ sb ]) (nil))) (insn 9 8 10 2 umul.c:20 (set (reg:HI 30) (const_int 256 [0x100])) 1 {movhi_insn} (nil)) (insn 10 9 11 2 umul.c:20 (set (reg:SI 31) (minus:SI (subreg:SI (reg:HI 30) 0) (reg:SI 26 [ D.1217 ]))) 12 {subsi3} (expr_list:REG_DEAD (reg:HI 30) (nil))) (insn 11 10 12 2 umul.c:20 (set (reg:SI 33) (zero_extend:SI (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8]))) 27 {zero_extendqisi2} (nil)) (insn 12 11 13 2 umul.c:20 (set (reg:HI 32) (subreg:HI (reg:SI 33) 0)) 1 {movhi_insn} (expr_list:REG_DEAD (reg:SI 33) (nil))) (insn 13 12 14 2 umul.c:20 (set (reg:SI 34) (mult:SI 
(zero_extend:SI (reg:HI 32)) (zero_extend:SI (subreg:HI (reg:SI 31) 0 14 {umulhisi3} (expr_list:REG_DEAD (reg:HI 32) (expr_list:REG_DEAD (reg:SI 31) (nil (insn 14 13 15 2 umul.c:20 (set (reg:SI 35) (ashiftrt:SI (reg:SI 34) (const_int 8 [0x8]))) 21 {ashrsi3_const} (expr_list:REG_DEAD (reg:SI 34) (nil))) (insn 15 14 16 2 umul.c:20 (set (reg:QI 36) (subreg:QI (reg:SI 35) 0)) 0 {movqi_insn} (expr_list:REG_DEAD (reg:SI 35) (nil))) (insn 16 15 17 2 umul.c:20 (set (reg:SI 37) (plus:SI (reg:SI 26 [ D.1217 ]) (subreg:SI (reg:QI 36) 0))) 11 {addsi3} (expr_list:REG_DEAD (reg:QI 36) (expr_list:REG_DEAD (reg:SI 26 [ D.1217 ]) (nil (insn 17 16 0 2 umul.c:20 (set (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8]) (subreg:QI (reg:SI 37) 0)) 0 {movqi_insn} (expr_list:REG_DEAD (reg:SI 37) (expr_list:REG_DEAD (reg/v/f:SI 29 [ db ]) (nil --- The umulhisi3 has been correctly found and used at this stage. In the following combine stage however, it gets converted into a mulsi3. The .combine dump is attached. The xtensa port is the closest match I can find as it is 32 bit, word based, and has the umulhisi3. It correctly keeps the 16 bit multiply. Some other test cases like: uint32_t mul(uint16_t a, uint16_t b) { return a*b; } come through fine. It might be something to do with the memory access. How does the combine stage work? It looks like it could get multiple potential matches for a set of RTLs. Does it use some type of costing function to pick between them? Can I tell combine that a umulhisi3 is cheaper than a mulsi3? Thanks for the earlier help on the post reload split to use the accumulator - it's working well. -- Michael umul.i.159r.combine Description: Binary data
Re: Machine Description Template?
I've found the MMIX port to be a good place to start. It's a bit old but the architecture is nice and simple and the implementation nice and brief. Watch out though as it is a pure 64 bit machine - you'll need to think SI every time you see DI.

The trick past there is to compare the significant features of your machine with existing machines. For example, GCC prefers a 68000 style machine with a set of condition codes, however many machines only have one condition flag that changes meaning based on what you are doing.

-- Michael

2009/6/6 Graham Reitz :
>
> Is there a machine description template in the gcc file source tree?
>
> If there is also template for the 'C header file of macro definitions' that
> would be good to know too.
>
> I did a file search for '.md' and there are tons of examples. Although, I
> was curious if there was a generic template.
>
> graham
>
Good progress
Hi there. Sorry for the noise, but I thought it would be nice to hear from a new porter who has gotten past the first few hurdles.

The architecture I'm working on is a 32 bit accumulator based machine with a very small instruction set. Binutils and GAS were straightforward and after some help I've incorporated the destructive compares, post reload fixes for the accumulator, and the limited addressing modes (well, mode :)

I've hooked my own simulator into the test suite. The compile test suite passes fine and the execute tests are down from an initial 700 failures to 300. The last fifty will be messy but the next few hundred should drop fairly easily.

Thanks for everyone's help so far. The code generated is already decent and will only get better.

-- Michael
Re: GCC 4.7.0 Release Candidate available from gcc.gnu.org
On Sat, Mar 3, 2012 at 2:44 AM, Richard Guenther wrote:
>
> GCC 4.7.0 Release Candidate available from gcc.gnu.org
>
> The first release candidate for GCC 4.7.0 is available from
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.7.0-RC-20120302
>
> and shortly its mirrors. It has been generated from SVN revision 184777.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux. Please test it and report any issues to bugzilla.

The RC bootstraps C, C++, Fortran, and Obj-C in arm-linux-gnueabi Cortex-A9/Thumb-2/NEON/softfp and ARMv5T/ARM/soft-float configurations.

The test results are here:
http://builds.linaro.org/toolchain/gcc-4.7.0-RC-20120302/logs/armv7l-natty-cbuild259-tcpanda03-armv5r2/gcc-testsuite.txt

and:
http://builds.linaro.org/toolchain/gcc-4.7.0-RC-20120302/logs/armv7l-natty-cbuild259-tcpanda02-cortexa9r1/gcc-testsuite.txt

and, on reflection, should be sent to gcc-testresults. The host details are in the same directory.

There's a fair number of failures:
http://people.linaro.org/~michaelh/incoming/a9-faults.txt
http://people.linaro.org/~michaelh/incoming/armv5-faults.txt

Ramana, any thoughts? If you ignore the guality and tls ones then most are testisms but there are a couple of ICEs.

-- Michael
Re: How to avoid sign or zero extension
On 3 June 2012 17:06, i-love-spam wrote: > I'm writing some optimized functions for gcc-arm in a library that obuses > shorts. So the problem I have is that in extremely many places resutls of my > optimized functions are needlessly sign or zero extended. That is, gcc adds > UXTH or SXTH opcode. > > For example, imagine if I use clz instructions (count leading zeros). Result > of the function will be positive number between 0 and 32. So, in places where > result of that clz functions is assigned to a short int it shouldn't > sign-extend the result. > > I use inline asm, and it works with arm's armcc if I use short as a result of > inline asm expression: > > static __inline short CLZ(int n) > { > short ret; > #ifdef __GNUC__ > __asm__("clz %0, %1" : "=r"(ret) : "r"(n)); > #else > __asm { clz ret, n; } > #endif > return ret; > } > > //test function > short test_clz(int n) > { > return CLZ(n); > } > > > ARMCC generates this code: > test_clz: > CLZ r0,r0 > BX lr > > GCC generates this code: > test_clz: > clz r0, r0 > sxth r0, r0 <--- offending line. > bx lr Hi there. This list is about the development of GCC. I recommend using the gcc-help list for end user topics. In this case, GCC is correct. Section 5.4 of the ARM AAPCS says "A Fundamental Data Type that is smaller than 4 bytes is zero- or sign-extended to a word and returned in r0". You've used inline assembler so GCC can't tell that the clz instruction already clears the top bits. How about using __builtin_clz() instead? You get the bonus that GCC can then reason about the function and optimise away if possible. -- Michael
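A minimal sketch of the __builtin_clz() suggestion, for reference. The wrapper name count_leading_zeros is made up for the example; __builtin_clz itself is the GCC builtin, it takes an unsigned int, and its result is undefined for an argument of zero, so the sketch handles that case explicitly. Whether the sxth disappears from the generated code depends on the compiler version and options, so treat this as something to try rather than a guaranteed fix.

```c
/* Hypothetical replacement for the inline-asm CLZ() helper. */
static inline short count_leading_zeros(unsigned int n)
{
    /* __builtin_clz(0) is undefined, so return the full bit width
       (32 where unsigned int is 32 bits wide) for that input. */
    if (n == 0)
        return (short)(sizeof(n) * 8);

    /* The compiler knows the builtin's semantics, unlike opaque inline
       asm, so it has a chance to see that the result fits in a short. */
    return (short)__builtin_clz(n);
}
```

Because the builtin is visible to the optimiser, calls with constant arguments can also be folded at compile time, which is not possible with the inline assembler version.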
Re: GCC 4.7.1 Release Candidate available from gcc.gnu.org
On 6 June 2012 22:14, Richard Guenther wrote: > > The first release candidate for GCC 4.7.1 is available from > > ftp://gcc.gnu.org/pub/gcc/snapshots/4.7.1-RC-20120606 > > and shortly its mirrors. It has been generated from SVN revision 188257. > > I have so far bootstrapped and tested the release candidate on > x86_64-linux. Please test it and report any issues to bugzilla. > > If all goes well, I'd like to release 4.7.1 at the end of next week. This bootstraps and tests OK in ARMv5T+ARM+soft, Cortex-A9+Thumb-2+softfp+NEON, and Cortex-A9+Thumb-2+hard+NEON configurations for C, C++, Fortran, and objc[1]. The regressions compared to 4.7.0 are testisms or the vectoriser not applying. I haven't logged them in bugzilla, sorry. -- Michael [1] http://builds.linaro.org/toolchain/gcc-4.7.1-RC-20120606/logs/
Re: ARM: gcc generates two identical strd instructions to store 8 bytes
On 26 June 2012 00:48, Nathanaël Prémillieu wrote: > Hi all, > > I am using the gcc ARM cross-compiler (gcc version 4.6.3 (Ubuntu/Linaro > 4.6.3-1ubuntu5)). Compiling the test.c code (in attachement) with: > > 'arm-linux-gnueabi-gcc -S test.c' > > I obtain the test.s assembly code (in attachement). At lines 56 and 57 of > the test.s there is two identical strd instructions: > > 56 strd r2, [r7] > 57 strd r2, [r7] > > I have checked the semantic of the ARM strd instruction and I have not seen > any side effect of this instruction that could explain why gcc need to put > this instruction two times in a row. For me, one is sufficient to store the > 8-bytes variable into memory. > > Is there an explanation? Hi Nathanaël. Your question is more appropriate for the gcc-help list. This list is about the development of GCC itself. You've built with optimisation turned off so GCC has generated correct but inefficient code. The double store could be side effect of expanding the 64 bit multiply into the component 32 bit multiplies or the conditional. Try building at -O or higher. -- Michael
Re: thumb2 support
On 11 October 2012 17:58, Grant wrote: >>> Hello, I'm working with the BeagleBone and gcc-4.5.4 on Gentoo. If I >>> try to compile the 3.6 kernel with CONFIG_THUMB2_KERNEL, I get: >>> >>> arch/arm/boot/compressed/head.S:127: Error: selected processor does >>> not support requested special purpose register -- `mrs r2,cpsr' >>> arch/arm/boot/compressed/head.S:134: Error: selected processor does >>> not support requested special purpose register -- `mrs r2,cpsr' >>> arch/arm/boot/compressed/head.S:136: Error: selected processor does >>> not support requested special purpose register -- `msr cpsr_c,r2' >>> >>> This post indicates that mainline gcc does not currently support thumb2: >>> >>> https://groups.google.com/d/msg/beagleboard/P52fgMDzp8A/vupzuh71vdYJ >>> >>> However, this indicates that thumb2 has been supported since 4.3: >>> >>> http://gcc.gnu.org/gcc-4.3/changes.html >>> >>> Can anyone clear this up? >> >> The errors are coming from an assembler file that is not part of the >> GCC sources. Are those instructions valid for Thumb2? I don't know. >> If they are valid, then the issue is with the assembler, which is not >> part of GCC; check the version of the GNU binutils that you have >> installed. If those instructions are not valid, then you need to >> change your source. > > Thanks Ian. I'm using binutils-2.22-r1. Do you happen to know which > version of binutils should support thumb2? Hi Grant. I'm pretty sure this was fixed by: commit c0d796cf810a84f10703c0390f7b1c5887b837c9 Author: Nick Clifton Date: Wed Jun 13 14:18:59 2012 + PR gas/12698 * config/tc-arm.c (do_t_mrs): Do not require an m-profile architecure when assembling for all archiectures. (do_t_msr): Likewise. which will be in the upcoming binutils 2.23. Debian/Ubuntu carry this as a patch on top of their 2.22. -- Michael