Re: Second GCC 4.6.0 release candidate is now available

2011-03-24 Thread Michael Hope
On Tue, Mar 22, 2011 at 11:12 AM, Jakub Jelinek  wrote:
> A second GCC 4.6.0 release candidate is available at:
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.6.0-RC-20110321/
>
> Please test the tarballs and report any problems to Bugzilla.
> CC me on the bugs if you believe they are regressions from
> previous releases severe enough to block the 4.6.0 release.
>
> If no more blockers appear I'd like to release GCC 4.6.0
> early next week.

The RC bootstraps C, C++, Fortran, Obj-C, and Obj-C++ on
ARMv7/Cortex-A9/Thumb-2/NEON, ARMv5T/ARM/softfp, ARMv5T/Thumb/softfp,
and ARMv4T/ARM/softfp.  I'm afraid I haven't reviewed the test results
(Richard? Ramana?)

See:
 http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02298.html
 http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02391.html
 http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02394.html
 http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02393.html

and:
 http://builds.linaro.org/toolchain/gcc-4.6.0-RC-20110321/logs/

-- Michael


Re: Second GCC 4.6.0 release candidate is now available

2011-04-04 Thread Michael Hope
On Sat, Mar 26, 2011 at 4:16 AM, Ramana Radhakrishnan
 wrote:
> Hi Michael,
>
> Thanks for running these. I spent some time this morning looking
> through the results, they largely look ok though I don't have much
> perspective on the
> objc/obj-c++ failures.
>
> These failures here
>
> For v7-a, A9 and Neon - these failures below:
>
>> Running target unix
>> FAIL: gfortran.dg/array_constructor_11.f90  -O3 -fomit-frame-pointer  (test for excess errors)
>> UNRESOLVED: gfortran.dg/array_constructor_11.f90  -O3 -fomit-frame-pointer  compilation failed to produce executable
>> FAIL: gfortran.dg/array_constructor_11.f90  -O3 -fomit-frame-pointer -funroll-loops  (test for excess errors)
>> UNRESOLVED: gfortran.dg/array_constructor_11.f90  -O3 -fomit-frame-pointer -funroll-loops  compilation failed to produce executable
>> FAIL: gfortran.dg/array_constructor_11.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  (test for excess errors)
>> UNRESOLVED: gfortran.dg/array_constructor_11.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  compilation failed to produce executable
>> FAIL: gfortran.dg/array_constructor_11.f90  -O3 -g  (test for excess errors)
>> UNRESOLVED: gfortran.dg/array_constructor_11.f90  -O3 -g  compilation failed to produce executable
>> FAIL: gfortran.dg/func_assign_3.f90  -O3 -fomit-frame-pointer  (test for excess errors)
>> UNRESOLVED: gfortran.dg/func_assign_3.f90  -O3 -fomit-frame-pointer  compilation failed to produce executable
>> FAIL: gfortran.dg/func_assign_3.f90  -O3 -fomit-frame-pointer -funroll-loops  (test for excess errors)
>> UNRESOLVED: gfortran.dg/func_assign_3.f90  -O3 -fomit-frame-pointer -funroll-loops  compilation failed to produce executable
>> FAIL: gfortran.dg/func_assign_3.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  (test for excess errors)
>> UNRESOLVED: gfortran.dg/func_assign_3.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  compilation failed to produce executable
>> FAIL: gfortran.dg/func_assign_3.f90  -O3 -g  (test for excess errors)
>> UNRESOLVED: gfortran.dg/func_assign_3.f90  -O3 -g  compilation failed to produce executable
> are caused by a broken assembler. All these tests appear to pass
> fine in a cross environment on my machine.

I've updated to binutils 2.21.51 which should fix the fault.  I'm
re-running the Cortex-A9 build against the 4.6.0 release now.

> From v5t.
>
>> FAIL: gcc.dg/c90-intconst-1.c (internal compiler error)
>> FAIL: gcc.dg/c90-intconst-1.c (test for excess errors)

I re-ran this against the 4.6.0 release and these fails went away. Good.
  http://gcc.gnu.org/ml/gcc-testresults/2011-04/msg00319.html

-- Michael


Re: GCC 4.6.1 Release Candidate available from gcc.gnu.org

2011-06-22 Thread Michael Hope
On Tue, Jun 21, 2011 at 1:01 AM, Jakub Jelinek  wrote:
> The first release candidate for GCC 4.6.1 is available from
>
>  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6.1-RC-20110620
>
> and shortly its mirrors.  It has been generated from SVN revision 175201.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux and i686-linux.  Please test it and report any issues to
> bugzilla.

It bootstraps C, C++ and Fortran in an ARM Cortex-A9 and ARMv5TE
configuration.  The test results are here:
 http://gcc.gnu.org/ml/gcc-testresults/2011-06/msg02632.html
 http://gcc.gnu.org/ml/gcc-testresults/2011-06/msg02633.html

with more detail here:
 http://builds.linaro.org/toolchain/gcc-4.6.1-RC-20110620/logs/

Ramana or Richard, could you have a read over the results please?

-- Michael


Re: performance regression with trunk's gengtype on ARM?

2011-08-28 Thread Michael Hope
On Mon, Aug 29, 2011 at 8:57 AM, Mikael Pettersson  wrote:
> I'm seeing what appears to be a recent massive performance regression
> with trunk's gengtype, as compiled and run in stage 2, on ARM V5TE.
>
> Right now 4.7-20110827's stage2 gengtype has been running for almost
> 10 hours on my ARM build machine, but the process is tiny and no swapping
> occurs.  To put those 10 hours in perspective, on this machine (1.6 GHz
> ARM V5TE uniprocessor running Linux) I regularly do full bootstraps and
> regression test suite runs for c,c++,ada,fortran in about 18 hours for
> gcc 4.4, about 20 hours for gcc 4.5, about 24 hours for gcc 4.6, and
> about 27 hours for trunk until recently.  So 10 hours or more just in
> stage 2 gengtype is suspicious.
>
> I believe 4.7-20110820 also was unusually slow to build, but I didn't
> monitor that build very carefully so can't say if gengtype was involved
> then too.

FWIW, I build trunk once a week on a PandaBoard.  r178096 took 10
hours to bootstrap C, C++, and Fortran and 9 hours to test.  The 4.5
release branch at r177893 takes 3:50 to bootstrap and 6:15 to test.

I've put the user time in seconds below.  4.5 is ~20000 s, 4.6 is
~23000 s, and current trunk is ~46000 s (2.3x slower than 4.5).

See http://builds.linaro.org/toolchain/ for more.

-- Michael

gcc-4.5+svn175369 20602.07
gcc-4.5+svn175745 19768.21
gcc-4.5+svn176026 19739.35
gcc-4.5+svn176306 19711.70
gcc-4.5+svn176615 19668.38
gcc-4.5+svn176915 19728.38
gcc-4.5+svn177422 19713.14
gcc-4.5+svn177688 19746.67
gcc-4.5+svn177893 19744.96
gcc-4.6+svn175136 22979.51
gcc-4.6+svn175369 23092.50
gcc-4.6+svn175745 22958.21
gcc-4.6+svn176026 23009.37
gcc-4.6+svn176306 22952.93
gcc-4.6+svn176615 22952.11
gcc-4.6+svn177422 22946.22
gcc-4.6+svn177688 22847.87
gcc-4.6+svn177894 22964.09
gcc-4.6+svn178096 22934.61
gcc-4.7~svn175284 34518.10
gcc-4.7~svn175368 34887.17
gcc-4.7~svn175422 34975.48
gcc-4.7~svn175617 34908.60
gcc-4.7~svn175745 35040.42
gcc-4.7~svn175795 35110.84
gcc-4.7~svn175904 34893.29
gcc-4.7~svn176026 34972.99
gcc-4.7~svn176133 35171.65
gcc-4.7~svn176224 35247.44
gcc-4.7~svn176306 35038.07
gcc-4.7~svn176494 26151.21
gcc-4.7~svn176615 26257.04
gcc-4.7~svn176733 40401.18
gcc-4.7~svn176816 40048.32
gcc-4.7~svn176915 40102.52
gcc-4.7~svn176998 40161.10
gcc-4.7~svn177229 28604.27
gcc-4.7~svn177422 44991.89
gcc-4.7~svn177554 45199.05
gcc-4.7~svn177610 45173.94
gcc-4.7~svn177688 45469.00
gcc-4.7~svn177823 45391.00
gcc-4.7~svn177949 28769.64
gcc-4.7~svn178025 45605.43
gcc-4.7~svn178096 45599.59
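As a quick check of the 2.3x figure, the sketch below averages the gcc-4.5 rows of the table and divides a later build's user time by that mean.  Treating the 4.5 mean as the baseline is an assumption about how the ratio was meant, and the three trunk rows in the 26000 to 29000 s range look like outliers and are simply not used.

```c
#include <assert.h>
#include <stddef.h>

/* User times in seconds, copied from the gcc-4.5 rows of the table.  */
static const double gcc45_user_s[] = {
  20602.07, 19768.21, 19739.35, 19711.70, 19668.38,
  19728.38, 19713.14, 19746.67, 19744.96
};

/* Arithmetic mean of N samples; N must be non-zero.  */
double mean(const double *v, size_t n)
{
  assert(n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum / (double) n;
}

/* Ratio of a build's user time T to the mean gcc-4.5 user time.  */
double slowdown_vs_45(double t)
{
  size_t n = sizeof gcc45_user_s / sizeof gcc45_user_s[0];
  return t / mean(gcc45_user_s, n);
}
```

With the trunk r178096 time, slowdown_vs_45(45599.59) comes out at about 2.30; the 4.6 time at the same revision (22934.61) gives about 1.16.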


Re: RFC: Improving support for known testsuite failures

2011-09-08 Thread Michael Hope
On Thu, Sep 8, 2011 at 8:31 PM, Richard Guenther
 wrote:
> On Wed, Sep 7, 2011 at 5:28 PM, Diego Novillo  wrote:
>> One of the most vexing aspects of GCC development is dealing with
>> failures in the various testsuites.  In general, we are unable to
>> keep failures down to zero.  We tolerate some failures and tell
>> people to "compare your build against a clean build".
>>
>> This forces developers to either double their testing time by
>> building the compiler twice or search in gcc-testresults and hope
>> to find a relatively similar build to compare against.
>>
>> Additionally, the marking mechanisms in DejaGNU are generally
>> cumbersome and hard to add.  Even worse, depending on the
>> controlling script, there may not be an XFAIL marker at all.
>>
>> So, while we would ideally keep NO failures in the testsuite, the
>> reality is that we are content with having KNOWN failures.  For a
>> given set of failures out of 'make check', I would like to have a
>> simple filtering mechanism that prunes the known failures out.
>>
>> Desired features:
>>
>> - List of known failures lives in SVN.
>> - Each target can have its own list.
>> - Supports ignoring FAIL, UNRESOLVED and XPASS results.
>> - Supports pattern matching to glob sets of failures.
>> - Co-exists with the existing XFAIL support in DejaGNU.
>> - Supports flaky tests.
>> - Supports timestamps to avoid having tests in a knonw-to-fail
>>  state forever.
>>
>> In terms of implementation, this filter could be part of 'make
>> check'.  We'd pipe make check's output to it and it would decide
>> whether to emit FAIL/UNRESOLVED/XPASS lines based on the black
>> list.
>>
>> I could also make this a post-check filter that runs on all the
>> generated .sum files.  The filter could live in
>> /contrib and be used on demand.
>>
>> I am not thrilled about the prospect of implementing this in
>> DejaGNU directly.
>>
>> Thoughts?
>
> I think it would be more useful to have a script parse gcc-testresults@
> postings from the various autotesters and produce a nice webpage
> with revisions and known FAIL/XPASSes for the target triplets that
> are tested.
>
> That's been a long time on my TODO list, but my web/script FU is
> weak enough that I've been pushing that back.

I have something along those lines for the Linaro releases:
 
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.6-2011.08/logs/armv7l-natty-cbuild162-ursa1-cortexa9r1/gcc-testsuite.txt?base=gcc-linaro-4.6-2011.07-0

and a lower level diff-on-sum-files for each commit:
 
http://builds.linaro.org/toolchain/gcc-linaro-4.5+bzr99541~rsandifo~lp823708-4.5/logs/armv7l-natty-cbuild181-ursa4-armv5r2/testsuite-diff.txt
 
http://builds.linaro.org/toolchain/gcc-linaro-4.6+bzr106801~ams-codesourcery~merge-from-fsf-20110908-4.6/logs/x86_64-natty-cbuild181-oort1-x86_64r1/testsuite-diff.txt

They're both a hack and only work against local files.  The code is
available at:
 https://launchpad.net/tcwg-web

and:
 https://launchpad.net/cbuild

They're both similar to contrib/compare_results but webified and
hooked into our auto builders.
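The filtering step in Diego's proposal (drop FAIL, UNRESOLVED and XPASS lines that match a per-target list of glob patterns) is small enough to sketch in C using POSIX fnmatch().  This is not the testcompare or cbuild code linked above, only an illustration; matching each glob against the whole .sum line is an assumed pattern format, and flaky-test and timestamp support are left out.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Result prefixes the proposal wants to be able to suppress.  */
static const char *const filterable[] = { "FAIL: ", "UNRESOLVED: ", "XPASS: " };

/* Return true if LINE, one line of a DejaGNU .sum file, is a FAIL,
   UNRESOLVED or XPASS result that matches one of the NKNOWN glob
   patterns in KNOWN.  Patterns are matched against the whole line with
   fnmatch(), e.g. "FAIL: gfortran.dg/func_assign_3.f90 *".  */
bool is_known_failure(const char *line, const char *const *known, size_t nknown)
{
  assert(line != NULL);

  bool kind_ok = false;
  for (size_t i = 0; i < sizeof filterable / sizeof filterable[0]; i++)
    if (strncmp(line, filterable[i], strlen(filterable[i])) == 0)
      kind_ok = true;
  if (!kind_ok)
    return false;   /* PASS, XFAIL, UNSUPPORTED, ... are passed through */

  for (size_t i = 0; i < nknown; i++)
    if (fnmatch(known[i], line, 0) == 0)
      return true;
  return false;
}
```

Piping 'make check' output or a .sum file through a loop that prints only the lines for which is_known_failure() returns false would give the "known failures pruned" view described in the proposal.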

-- Michael


Re: Improvement of Cortex-A15

2012-01-24 Thread Michael Hope
On Thu, Jan 19, 2012 at 9:35 PM, Yang Yueming  wrote:
> I want to do some optimizations for Cortex-A15,Is anyone doing this too or is 
> there any work has been done?
>
> Yang Yueming

Hi there.  Cortex-A15 boards aren't readily available so most of the
work is being done by the PDSW group inside ARM.  Here at Linaro we
target the Cortex-A9 but many of the changes also help the A15.

Here's our todo list:
 http://apus.seabright.co.nz/helpers/backlog/project/gcc-linaro

We work upstream in the FSF trunk and backport the changes to our 4.6
based branch.  We're a pretty open bunch and, if you wish, are happy
to backport any changes you make.

-- Michael


Changing the order when generating a spill address

2009-07-20 Thread Michael Hope
Hi there.  The port that I'm working on has pointer registers backed
by a cache.  It's unusual as the cache changes immediately when the
pointer register is modified instead of later when it is dereferenced.
This means that it is cheaper to copy a base address into the pointer
register, then add the offset as it is less likely that the cache row
will change.

Normal code such as this:
---
struct abc
{
  int a;
  char cc[64];
  int b;
};

int foo(struct abc *p)
{
  return p->a + p->b;
}
---
generates the correct code:
LOADACC, R10
STOREACC, X
LOADLONG, #68
ADD, X
LOADACC, (X)

Code that stores a value to or loads a value from a spill slot, however, does the opposite:
LOADLONG, #16
STOREACC, X
LOADACC, R1E (the stack pointer)
ADD, X
LOADACC, R10
STOREACC, (X)

I did a spot check on the bfin port and it does the same:
P2 = -16 (X);
P2 = P2 + FP;
R0 = [P2];

Is there a way of setting the order that reload generates a spill slot
address?  I can work around it by implementing
LEGITIMIZE_RELOAD_ADDRESS but was wondering if there is a better way.

Thank you,

-- Michael


Picking between alternative ways of expanding a section

2009-07-23 Thread Michael Hope
Hi there.  This is a follow-up to my email of the 24th of May.

The short version is: how can I track down why GCC is picking between
two alternatives for implementing a function?  In a memcpy() where
Pmode == SImode, I get a near ideal implementation.  If Pmode ==
PSImode (due to limitations of the pointer registers) I get something
much worse.

The difference happens early on.  In the .128r.expand with Pmode ==
SImode I get:
 ;; MEM[base: to] = MEM[base: p];

With PSImode I get offset addressing instead:
;; MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25];

This flows through into the actual code.

I assume this is due to GCC assuming that PSImode works differently to
SImode and that the cast/translation cost is enough to make offset
addressing overall cheaper.

The m32c port is the only other one using PSImode, but it doesn't
generate offset addresses.  The same thing happens with and without
a basic TARGET_ADDRESS_COST and TARGET_RTX_COSTS.

I guess I want a way of telling the compiler that PSImode and SImode
are equivalent.

The longer version is:
The machine I'm working on has two special registers for memory access
that are backed by caches.  Any change to these registers can cause an
expensive cache load cycle so while they're great for memory access
they're terrible for general use.

The problem is that Pmode == SImode so the register allocator will now
and again use these registers for general operations.  I've
implemented a partial integer mode PSImode, as suggested by Michael
Meissner and set Pmode to PSImode. This correctly separates things but
the compiler now generates significantly worse code.

The example is a simple memcpy():

void copy(int *pfrom, int *pto, int count)
{
  while (count != 0)
    {
      *pto = *pfrom;
      pto++;
      pfrom++;
      count--;
    }
}

If I have #define Pmode SImode then I get the near-best code:
copy:
LOADACC, R12    ;# 133  loadaccsi_insn/1
STOREACC, R13   ;# 134  storeaccsi_insn
LOADLONG, #0    ;# 139  loadaccsi_insn/2
XOR, R13        ;# 140  cmpccsi_insn/3
LOADLONG, #.L4  ;# 43   *bCCeq
SKIP_IF
STOREACC, PC
LOADACC, R11    ;# 121  loadaccsi_insn/1
STOREACC, Y     ;# 122  storeaccsi_insn
LOADACC, R10    ;# 127  loadaccsi_insn/1
STOREACC, X     ;# 128  storeaccsi_insn
.L3:
LOADACC, (X)    ;# 79   loadaccsi_insn/1
STOREACC, (Y)   ;# 86   storeaccsi_insn
LOADLONG, #4    ;# 149  loadaccsi_insn/2
ADD, Y          ;# 150  addsi3_acc
ADD, X          ;# 151  addsi3_acc
LOADLONG, #-1   ;# 103  loadaccsi_insn/2
ADD, R12        ;# 104  addsi3_acc
LOADACC, R12    ;# 109  loadaccsi_insn/1
STOREACC, R10   ;# 110  storeaccsi_insn
LOADLONG, #0    ;# 115  loadaccsi_insn/2
XOR, R10        ;# 116  cmpccsi_insn/3
LOADLONG, #.L3  ;# 57   *bCCne
STOREACC, PC_IF
.L4:
POP             ;# 147  *expanded_return
STOREACC, PC

Note the good
LOADACC, (X)    ;# 79   loadaccsi_insn/1
STOREACC, (Y)   ;# 86   storeaccsi_insn
LOADLONG, #4    ;# 149  loadaccsi_insn/2
ADD, Y          ;# 150  addsi3_acc
ADD, X          ;# 151  addsi3_acc

in the middle.

If instead I have #define Pmode PSImode, I get:
copy:
LOADACC, R14    ;# 186  loadaccsi_insn/1
PUSH            ;# 187  pushsi_acc
LOADACC, R12    ;# 163  loadaccsi_insn/1
STOREACC, R13   ;# 164  storeaccsi_insn
LOADLONG, #0    ;# 169  loadaccsi_insn/2
XOR, R13        ;# 170  cmpccsi_insn/3
LOADLONG, #.L4  ;# 43   *bCCeq
SKIP_IF
STOREACC, PC
LOADLONG, #0    ;# 157  loadaccsi_insn/2
STOREACC, R13   ;# 158  storeaccsi_insn
.L3:
LOADACC, R13    ;# 85   loadaccsi_insn/1
STOREACC, X     ;# 86   storeaccsi_insn
; No-op truncate on X = X   ;# 47   truncsipsi2/1
LOADACC, R11    ;# 91   loadaccpsi_insn/1
STOREACC, Y     ;# 92   storeaccpsi_insn
LOADACC, X      ;# 97   loadaccpsi_insn/1
ADD, Y          ;# 98   addpsi3_acc
LOADACC, R10    ;# 103  loadaccpsi_insn/1
STOREACC, R14   ;# 104  storeaccpsi_insn
LOADACC, X      ;# 109  loadaccpsi_insn/1
ADD, R14        ;# 110  addpsi3_acc
LOADACC, R14    ;# 115  loadaccpsi_insn/1
STOREACC, X     ;# 116  storeaccpsi_insn
LOADACC, (X)    ;# 121  loadaccsi_insn/1
STOREACC, (Y)   ;# 128  storeaccsi_insn
LOADLONG, #-1   ;# 133  loadaccsi_insn/2
ADD, R12        ;# 134  addsi3_acc
LOADLONG, #4    ;# 139  loadaccsi_insn/2
ADD, R13        ;# 140  addsi3_acc
LOADACC, R12    ;# 145  loadaccsi_insn/1
STOREACC, X     ;# 146  storeaccsi_insn
LOADLONG, #0    ;# 151  loadaccsi_insn/2
XOR, X          ;# 152  cmpccsi_insn/3
LOADLONG, #.L3  ;# 59   *bCCne
STOREACC, PC_IF
.L4:
POP             ;# 178  popsi_insn
STOREACC, R14
POP             ;# 179  *

Re: Compiler for gcc

2009-08-08 Thread Michael Hope
Hi Harshal.  I'm no expert, but GCC can be built by another C
compiler.  If you have a look at how GCC builds you'll see that it
goes through a few stages - the first is where the local C compiler
builds a first version of GCC, and then this new version of GCC is
used to build itself.  The same technique is used to build newer
versions of GCC. If your machine currently has GCC version 3 and you
want to build version 4 then the first step uses GCC 3 to build a
temporary version of GCC 4, and then this temporary version is used to
build the final version.

-- Michael

2009/8/9 Harshal Jain :
> As we know, gcc is used to compile C programs and also to compile
> Linux kernels, but I wanted to know: what is the compiler of gcc?
> That is, in which programming language is the compiler for gcc written?
> --
> Regards ,
> Harshal Jain
>
> “UNIX is simple.  It just takes a genius to understand its simplicity.”
> – Dennis Ritchie
>


Improving code with no offset addressing

2009-10-19 Thread Michael Hope
Hi there.  The architecture I'm working on porting gcc to has indirect
addressing but no constant offset or register offset versions.  Code
like this:

void fill(int* p)
{
  p[0] = 0;
  p[1] = 0;
  p[2] = 0;
  p[3] = 0;
}

Turns into:
 X = p
 *X = 0
 X = X + 4
 *X = 0
 X = p
 X = X + 8
 *X = 0
 X = p
 X = X + 12
 *X = 0

at both -O and -O2.  Note that the first step recognises that X
contains p and correctly increases it instead of rebuilding it.

I'd like to generate the following code instead:
 X = p
 *X = 0
 X = X + 4
 *X = 0
 X = X + 4
 *X = 0
 X = X + 4
 *X = 0

What is the best way to approach this?  It seems to be common across
ports (see the note on ia64 and ARM Thumb below).  Is there a cost
function I can change?  Will changing LEGITIMIZE_ADDRESS fix it?  Is
there some type of value tracking that could be turned on/added?
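The value tracking asked about here amounts to remembering how far X currently is from p and adding only the difference.  A standalone C model of that bookkeeping is sketched below; plan_offsets is a hypothetical helper, not a GCC hook or pass, and it counts instructions without weighing their costs.  For the four stores in fill() it gives one load of p and three adds of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  Plans the pointer-register
   updates for accesses at byte offsets offs[0..n-1] from a base pointer
   p on a machine with only (X) addressing.  X is loaded from p once and
   its current offset from p is remembered, so each access needs at most
   one "X = X + step[i]" (step[i] == 0 means no add is needed).
   Returns the number of pointer-register instructions: the initial
   "X = p" plus one per non-zero step.  */
int plan_offsets(const long *offs, size_t n, long *step)
{
  assert(n == 0 || (offs != NULL && step != NULL));

  long cur = 0;                 /* X == p + cur after "X = p" */
  int insns = (n > 0) ? 1 : 0;  /* the initial "X = p" */

  for (size_t i = 0; i < n; i++)
    {
      step[i] = offs[i] - cur;  /* negative for backward accesses */
      if (step[i] != 0)
        insns++;
      cur = offs[i];
    }
  return insns;
}
```

For the Thumb p[70]/p[71] case the same bookkeeping gives steps of 268 and 4 after p[3]; whether that beats rebuilding from p depends on the target's add and constant-load costs, which this model ignores.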

I've checked the ia64, which also only has indirect addressing, and
ARM Thumb which has limited offsets.  ia64 generates the same reload
base/add offset as mine:

mov r14 = r32
;;
st4 [r14] = r0, 4
;;
st4 [r14] = r0
adds r14 = 8, r32
;;
st4 [r14] = r0
adds r14 = 12, r32
;;
st4 [r14] = r0
adds r14 = 16, r32

ARM Thumb does the same when the offset is large (p[70] and p[71] in this case):

str r3, [r0] ; p[0]
str r3, [r0, #4] ; p[1]
str r3, [r0, #8] ; p[2]
str r3, [r0, #12] ; p[3]
mov r2, #140
mov r3, #0
lsl r2, r2, #1
str r3, [r0, r2] ; p[70]
mov r2, #142
lsl r2, r2, #1
str r3, [r0, r2] ; p[71]

Thanks for any pointers,

-- Michael


Re: porting GCC to a micro with a very limited addressing mode --- what to write in LEGITIMATE_ADDRESS, LEGITIMIZE_ADDRESS and micro.md ?!

2010-01-25 Thread Michael Hope
Hi Sergio.  My port has similar addressing modes - all memory must be
accessed by one of two registers and can only be accessed indirectly,
indirect with pre-decrement, and indirect with post-increment.  The
key is GO_IF_LEGITIMATE_ADDRESS and the legitimate address helper
function.  Mine looks like this:

/* Return 1 if the address is OK, otherwise 0.
   Used by GO_IF_LEGITIMATE_ADDRESS.  */

bool
tomi_legitimate_address (enum machine_mode mode ATTRIBUTE_UNUSED,
                         rtx x,
                         bool strict_checking)
{
  /* (mem reg) */
  if (REG_P (x)
      && tomi_reg_ok (x, strict_checking))
    {
      return 1;
    }

  if (GET_CODE (x) == PRE_DEC)
    {
      ...
    }

  if (GET_CODE (x) == POST_INC)
    {
      ...
    }

  return 0;
}

tomi_reg_ok returns true if x is any register when strict checking is
clear and true if x is one of my addressing registers when strict
checking is set.

GCC will feed any memory accesses through this function to see if they
are directly supported, and if not it will break them up into
something smaller and try again.

Hope that helps,

-- Michael


2010/1/26 Sergio Ruocco :
> Gabriel Paubert wrote:
>> On Mon, Jan 25, 2010 at 01:34:09PM +0100, Sergio Ruocco wrote:
>>> Hi everyone,
>>>
>>> I am porting GCC to a custom 16-bit microcontroller with very limited
>>> addressing modes. Basically, it can only load/store using a (general
>>> purpose) register as the address, without any offset:
>>>
>>>      LOAD (R2) R1    ; load R1 from memory at address (R2)
>>>      STORE R1 (R2)   ; store R1 to memory at address (R2)
>>>
>>> As far as I can understand, this is more limited than the current
>>> architectures supported by GCC that I found in the current gcc/config/*.
>>
>> The Itanium (ia64) has the same limited choice of addressing modes.
>>
>>       Gabriel
>
> Thanks Gabriel.
>
> I dived into the ia64 md, but it is still unclear to me how the various
> parts (macros, define_expand and define_insn in MD etc.) work together
> to force the computation of a source/dest address plus offset into a
> register... can anyone help me with this ?
>
> Thanks,
>
>        Sergio
>


Re: porting GCC to a micro with a very limited addressing mode --- success with LEGITIMATE / LEGITIMIZE_ADDRESS, stuck with ICE !

2010-02-10 Thread Michael Hope
Hi Sergio.  Here's the interesting parts from my port.  The code's a
bit funny looking as I've edited it for this post.

In .h:

#define BASE_REG_CLASS  ADDR_REGS
#define INDEX_REG_CLASS NO_REGS

#ifdef REG_OK_STRICT
# define _REG_OK_STRICT 1
#else
# define _REG_OK_STRICT 0
#endif

#define REGNO_OK_FOR_BASE_P(r) _regno_ok_for_base_p(r, _REG_OK_STRICT)
#define REGNO_OK_FOR_INDEX_P(r) 0

In .c:

static bool
_reg_ok(rtx reg, bool strict)
{
  int regno = REGNO(reg);

  bool is_addr = _is_addr_regno(regno);
  bool ok_strict = is_addr;
  bool special = regno == ARG_POINTER_REGNUM
                 || regno == TREG_S;

  if (strict)
    {
      return ok_strict || special;
    }
  else
    {
      return ok_strict || special
             || regno >= FIRST_PSEUDO_REGISTER;
    }
}

bool
_legitimate_address (enum machine_mode mode ATTRIBUTE_UNUSED,
                     rtx x,
                     bool strict_checking)
{
  /* (mem reg) */
  if (REG_P (x)
      && _reg_ok (x, strict_checking))
    {
      return 1;
    }

  return 0;
}

Note that this ISA only has indirect addressing and has no indirect +
offset or indirect + register modes.  GCC handles this just fine by
splitting up any address that fails legitimate_address into smaller
components.

-- Michael

On 10 February 2010 09:02, Sergio Ruocco  wrote:
>
> Michael Hope wrote:
>> Hi Sergio.  Any luck so far?
>
> Michael, thanks for your inquiry. I made some progress, in fact.
>
> I got the GO_IF_LEGITIMATE_ADDRESS() macro to detect correctly REG+IMM
> addresses, and then the LEGITIMIZE_ADDRESS() macro to force them to be
> pre-computed in a register.
>
> However, now the compiler freaks out with an ICE.. :-/ I put some
> details below. Thanks for any clue that you or others can give me.
>
> Cheers,
>
>        Sergio
>
> ==
>
>
> This is a fragment of my LEGITIMIZE_ADDRESS():
> -
>
> rtx
> legitimize_address(rtx X,rtx OLDX, enum machine_mode MODE)
> {
>        rtx op1,op2,op,sum;
>        op=NULL;
> ...
>        if(GET_CODE(X)==PLUS && !no_new_pseudos)
>        {
>                op1=XEXP(X,0);
>                op2=XEXP(X,1);
>                if(GET_CODE(op1) == CONST_INT && (GET_CODE(op2) == REG ||
> GET_CODE(op2) == SUBREG)) // base displacement
>                {
>                        sum = gen_rtx_PLUS (MODE, op1, op2);
>                        op = force_reg(MODE, sum);
>                }
> ...
> -
>
>
> Now when compiling a simple program such as:
>
> void foobar(int par1, int par2, int parN)
> {
>        int a,b;
>        a = 0x1234;
>        b = a;
> }
>
> the instructions (n. 8,12,13) which compute the addresses in registers
> seem to be generated correctly:
>
> -
> ;; Function foobar
>
> ;; Register dispositions:
> 37 in 4  38 in 2  39 in 4  40 in 2  41 in 2
>
> ;; Hard regs used:  2 4 30
>
> (note 2 0 3 NOTE_INSN_DELETED)
>
> (note 3 2 6 0 NOTE_INSN_FUNCTION_BEG)
>
> ;; Start of basic block 1, registers live: 1 [A1] 29 [B13] 30 [B14]
> (note 6 3 8 1 [bb 1] NOTE_INSN_BASIC_BLOCK)
>
> (insn 8 6 9 1 (set (reg/f:HI 4 A4 [37])
>        (plus:HI (reg/f:HI 30 B14)
>            (const_int -16 [0xfff0]))) 9 {addhi3} (nil)
>    (nil))
>
> (insn 9 8 10 1 (set (reg:HI 2 A2 [38])
>        (const_int 4660 [0x1234])) 5 {*constant_load} (nil)
>    (nil))
>
> (insn 10 9 12 1 (set (mem/i:HI (reg/f:HI 4 A4 [37]) [0 a+0 S2 A32])
>        (reg:HI 2 A2 [38])) 7 {*store_word} (nil)
>    (nil))
>
> (insn 12 10 13 1 (set (reg/f:HI 4 A4 [39])
>        (plus:HI (reg/f:HI 30 B14)
>            (const_int -14 [0xfff2]))) 9 {addhi3} (nil)
>    (nil))
>
> (insn 13 12 14 1 (set (reg/f:HI 2 A2 [40])
>        (plus:HI (reg/f:HI 30 B14)
>            (const_int -16 [0xfff0]))) 9 {addhi3} (nil)
>    (nil))
>
> (insn 14 13 15 1 (set (reg:HI 2 A2 [orig:41 a ] [41])
>        (mem/i:HI (reg/f:HI 2 A2 [40]) [0 a+0 S2 A32])) 4 {*load_word} (nil)
>    (nil))
>
> (insn 15 14 16 1 (set (mem/i:HI (reg/f:HI 4 A4 [39]) [0 b+0 S2 A16])
>        (reg:HI 2 A2 [orig:41 a ] [41])) 7 {*store_word} (nil)
>    (nil))
> ;; End of basic block 1, registers live: 1 [A1] 29 [B13] 30 [B14]
>
> (note 16 15 25 NOTE_INSN_FUNCTION_END)
>
> (note 25 16 0 NOTE_INSN_DELETED)
> -
>
> However, when I compile it
>
> $ hcc -da foobar8.c
>
> I get an ICE at the end 

Re: GCC porting tutorials

2010-04-24 Thread Michael Hope
Hi Radu.  I found the MMIX backend to be quite useful.  It's
reasonably small and acceptably up to date.  Keep in mind that the
MMIX is a 64-bit machine though.

The Picochip and ARM are good as well.  The ARM port is very
complicated due to the number of targets that it supports but fairly
clean once you get into it.

-- Michael

On 24 April 2010 20:53, Radu Hobincu  wrote:
> Hello,
>
> My name is Radu Hobincu, I am part of a team at "Politehnica" University
> of Bucharest that is developing a massive parallel computing architecture
> and currently my job is to port the GCC compiler to this new machine.
>
> I've been looking over the GCC official site at http://gcc.gnu.org/ but I
> couldn't find an official porting tutorial. Is there such a thing? And
> maybe a small example for a lightweight architecture?
>
> Regards,
> Radu
>


Side effects on memory access

2009-04-21 Thread Michael Hope
Hi there.  I'm looking at porting GCC to a new architecture which has
a quite small instruction set and I'm afraid I can't figure out how to
represent unintended side effects on instructions.

My current problem is accessing memory.  Reading an aligned 32-bit
word is simple using LOADACC, (X).  Half words and bytes are harder as
the only instruction available is a load byte with post increment
'LOADACC, (X+)'.

How can I tell GCC that loading a byte also increases the pointer
register?  My first version reserved one of the pointer registers and
threw away the modified value but this is inefficient.  I suspect that
some type of clobber or define_expand is required but I can't figure
it out.

Thanks for any help,

-- Michael


Re: Side effects on memory access

2009-04-23 Thread Michael Hope
Thanks for the response, Ian.  Doing the define_expand inserts the post
increment but GCC doesn't seem to notice the change in X.

I added this code:


(define_expand "movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand")
        (match_operand:QI 1 "general_operand" ""))]
  ""
  {
    if (can_create_pseudo_p () && MEM_P (operands[1]))
      {
        rtx reg = copy_to_reg (XEXP (operands[1], 0));
        emit_insn (gen_movqi_mem (operands[0], reg));
        DONE;
      }
  }
)

; PENDING: The SI here is actually a P
(define_insn "movqi_mem"
  [(set (match_operand:QI 0 "register_operand" "=d")
        (mem:QI (post_inc:SI (match_operand:SI 1 "register_operand" "a"))))]
  ""
  "LOADACC, (%1+)\;STOREACC, %0"
)

The 'd' constraint is for data registers and the 'a' for address
registers, which is only the X register due to cache coherency
reasons.

When compiling this test case:

uint store5(volatile char* p)
{
  return *p + *p;
}

I get the following move2.i.139r.subreg:
---
(insn 3 5 4 2 move2.c:56 (set (reg/v/f:SI 30 [ p ])
(reg:SI 5 R10 [ p ])) 6 {movsi} (nil))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 move2.c:57 (set (reg:QI 31)
(mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0
{movqi_mem} (nil))

(insn 8 7 9 2 move2.c:57 (set (reg:SI 27 [ D.1191 ])
(zero_extend:SI (reg:QI 31))) 24 {zero_extendqisi2} (nil))

(insn 9 8 10 2 move2.c:57 (set (reg:QI 32)
(mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0
{movqi_mem} (nil))

(insn 10 9 11 2 move2.c:57 (set (reg:SI 26 [ D.1193 ])
(zero_extend:SI (reg:QI 32))) 24 {zero_extendqisi2} (nil))

(insn 11 10 12 2 move2.c:57 (set (reg:SI 33)
(plus:SI (reg:SI 26 [ D.1193 ])
(reg:SI 27 [ D.1191 ]))) 9 {addsi3} (nil))

(insn 12 11 16 2 move2.c:57 (set (reg:SI 28 [  ])
(reg:SI 33)) 6 {movsi} (nil))

(insn 16 12 22 2 move2.c:58 (set (reg/i:SI 5 R10)
(reg:SI 28 [  ])) 6 {movsi} (nil))

(insn 22 16 0 2 move2.c:58 (use (reg/i:SI 5 R10)) -1 (nil))
---
Instruction 3 copies the incoming argument in R10 into pseudo 30.
Pseudo 30 is then used by instruction 7 and again by instruction 9
without being reloaded or corrected for the post-increment.

-- Michael

2009/4/22 Ian Lance Taylor :
> Michael Hope  writes:
>
>> Hi there.  I'm looking at porting GCC to a new architecture which has
>> a quite small instruction set and I'm afraid I can't figure out how to
>> represent unintended side effects on instructions.
>>
>> My current problem is accessing memory.  Reading an aligned 32 bit
>> word is simple using LOADACC, (X).  Half words and bytes are harder as
>> the only instruction available is a load byte with post increment
>> 'LOADACC, (X+)'.
>
> Wow.
>
>> How can I tell GCC that loading a byte also increases the pointer
>> register?  My first version reserved one of the pointer registers and
>> threw away the modified value but this is inefficient.  I suspect that
>> some type of clobber or define_expand is required but I can't figure
>> it out.
>
> Well, you can use a define_expand to generate the move in the first
> place.  If can_create_pseudo_p() returns true, then you can call
> copy_to_reg (addr) to get the address into a register, and you can
> generate the post increment.
>
> (define_expand "movhi"
>  ...
>  if (can_create_pseudo_p () && MEM_P (operands[1]))
>    {
>      rtx reg = copy_to_reg (XEXP (operands[1], 0));
>      emit_insn (gen_movhi_insn (operands[0], reg));
>      DONE;
>    }
>  ...
> )
>
> (define_insn "movhi_insn"
>  [(set (match_operand:HI 0 ...)
>        (mem:HI (post_inc:P (match_operand:P 1 "register_operand" ...]
>  ...
> )
>
> The difficulties are going to come in reload.  Reload will want to load
> and store 16-bit values in order to spill registers.  You will need a
> scratch register to do this, and that means that you need to implement
> TARGET_SECONDARY_RELOAD.  This is complicated: read the docs carefully
> and look at the existing examples.
>
> Ian
>


Re: Side effects on memory access

2009-04-26 Thread Michael Hope
No luck on that.  I've re-baselined off GCC 4.4.0 to get the
add_reg_note() function, but the register is still re-used without
being reloaded.

The test case is:
--
uint32_t load_q(volatile uint8_t* p)
{
  return *p + *p;
}
--
The appropriate section of the md file is:
---
(define_expand "movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand")
        (match_operand:QI 1 "general_operand" ""))]
  ""
  {
    if (can_create_pseudo_p () && MEM_P (operands[1]))
      {
        rtx reg = copy_to_reg (XEXP (operands[1], 0));
        rtx insn = emit_insn (gen_movqi_mem (operands[0], reg));
        add_reg_note (insn, REG_INC, reg);
        DONE;
      }
  }
)

(define_insn "movqi_mem"
  [(set (match_operand:QI 0 "register_operand" "=d")
        (mem:QI (post_inc:SI (match_operand:SI 1 "register_operand" "a"))))]
  ""
  "LOADACC, (%1+)\;STOREACC, %0"
)
---
My last RTL dump was wrong due to it hitting a zero extend from memory
optimisation.  However, this time test.i.136r.subreg1 contains:
---
(insn 3 5 4 2 loads.c:4 (set (reg/v/f:SI 30 [ p ])
(reg:SI 5 R10 [ p ])) 6 {movsi} (nil))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 loads.c:5 (set (reg:SI 32)
(reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 8 7 9 2 loads.c:5 (set (reg:QI 31)
(mem:QI (post_inc:SI (reg:SI 32)) [0 S1 A8])) 0 {movqi_mem}
(expr_list:REG_INC (reg:SI 32)
(nil)))

(insn 9 8 10 2 loads.c:5 (set (reg:SI 27 [ D.1215 ])
(zero_extend:SI (reg:QI 31))) 24 {zero_extendqisi2} (nil))

(insn 10 9 11 2 loads.c:5 (set (reg:SI 34)
(reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 11 10 12 2 loads.c:5 (set (reg:QI 33)
(mem:QI (post_inc:SI (reg:SI 34)) [0 S1 A8])) 0 {movqi_mem}
(expr_list:REG_INC (reg:SI 34)
(nil)))

(insn 12 11 13 2 loads.c:5 (set (reg:SI 26 [ D.1217 ])
(zero_extend:SI (reg:QI 33))) 24 {zero_extendqisi2} (nil))

(insn 13 12 14 2 loads.c:5 (set (reg:SI 35)
(plus:SI (reg:SI 26 [ D.1217 ])
(reg:SI 27 [ D.1215 ]))) 9 {addsi3} (nil))
---
This is correct so far, but the next step in test.i.138r.cse1 contains:
---
(insn 3 5 4 2 loads.c:4 (set (reg/v/f:SI 30 [ p ])
(reg:SI 5 R10 [ p ])) 6 {movsi} (nil))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 loads.c:5 (set (reg/f:SI 32 [ p ])
(reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 8 7 9 2 loads.c:5 (set (reg:QI 31)
(mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0
{movqi_mem} (expr_list:REG_INC (reg/f:SI 32 [ p ])
(nil)))

(insn 9 8 10 2 loads.c:5 (set (reg:SI 27 [ D.1215 ])
(zero_extend:SI (reg:QI 31))) 24 {zero_extendqisi2} (nil))

(insn 10 9 11 2 loads.c:5 (set (reg/f:SI 34 [ p ])
(reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))

(insn 11 10 12 2 loads.c:5 (set (reg:QI 33)
(mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0
{movqi_mem} (expr_list:REG_INC (reg/f:SI 34 [ p ])
(nil)))

(insn 12 11 13 2 loads.c:5 (set (reg:SI 26 [ D.1217 ])
(zero_extend:SI (reg:QI 33))) 24 {zero_extendqisi2} (nil))

(insn 13 12 14 2 loads.c:5 (set (reg:SI 35)
(plus:SI (reg:SI 26 [ D.1217 ])
(reg:SI 27 [ D.1215 ]))) 9 {addsi3} (nil))
---
At this level pseudo register 30 is used directly in both
post-increment loads without being invalidated or reloaded, while the
REG_INC notes still name pseudos 32 and 34.
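A C model of the two dumps shows why the cse1 substitution changes the result. It is a sketch: `load_post_inc` is a hypothetical stand-in for `movqi_mem`, and the variable names just mirror the pseudo numbers in the dumps.

```c
#include <stdint.h>

/* Hypothetical stand-in for movqi_mem: read a byte, then advance the pointer. */
static uint8_t load_post_inc(const volatile uint8_t **x)
{
    return *(*x)++;
}

/* As in the subreg1 dump: each load post-increments its own copy of p
   (pseudos 32 and 34), so both loads read p[0]. */
static uint32_t load_q_expected(const volatile uint8_t *p)
{
    const volatile uint8_t *r32 = p;
    const volatile uint8_t *r34 = p;
    uint32_t first = load_post_inc(&r32);
    uint32_t second = load_post_inc(&r34);
    return first + second;
}

/* As in the cse1 dump: both loads post-increment pseudo 30 itself, so the
   first load moves it on and the second reads p[1]. */
static uint32_t load_q_after_cse(const volatile uint8_t *p)
{
    const volatile uint8_t *r30 = p;
    uint32_t first = load_post_inc(&r30);
    uint32_t second = load_post_inc(&r30);
    return first + second;
}
```

With p pointing at the bytes {3, 5}, the first version returns 6 and the second returns 8.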

-- Michael


Re: Side effects on memory access

2009-04-27 Thread Michael Hope
Thanks.  I'm going to work around it for now by post-correcting X.
It's a hack, but I'm in the early stages of the port so I can come back
to it later.

-- Michael

2009/4/28 Ian Lance Taylor :
> Michael Hope  writes:
>
>> My last RTL dump was wrong due to it hitting a zero extend from memory
>> optimisation.  However, this time test.i.136r.subreg1 contains:
>
>> (insn 7 4 8 2 loads.c:5 (set (reg:SI 32)
>>         (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))
>>
>> (insn 8 7 9 2 loads.c:5 (set (reg:QI 31)
>>         (mem:QI (post_inc:SI (reg:SI 32)) [0 S1 A8])) 0 {movqi_mem}
>> (expr_list:REG_INC (reg:SI 32)
>>         (nil)))
>
>> This is correct so far, but the next step in test.i.138r.cse1 contains:
>
>> (insn 7 4 8 2 loads.c:5 (set (reg/f:SI 32 [ p ])
>>         (reg/v/f:SI 30 [ p ])) 6 {movsi} (nil))
>>
>> (insn 8 7 9 2 loads.c:5 (set (reg:QI 31)
>>         (mem:QI (post_inc:SI (reg/v/f:SI 30 [ p ])) [0 S1 A8])) 0
>> {movqi_mem} (expr_list:REG_INC (reg/f:SI 32 [ p ])
>>         (nil)))
>
> This substitution is clearly invalid.  So there is a bug in CSE.  Most
> likely this bug has not been noticed before because POST_INC and friends
> are normally inserted by the inc_dec pass which runs after CSE.
>
> It may be that all that is needed is to change the cse_insn function to
> look for REG_INC notes.
>
> Ian
>


Unexpected offsets when eliminating SP

2009-04-29 Thread Michael Hope
Hi there.  I'm working on porting GCC to a new architecture which only
does register-indirect addressing; there is no indirect with displacement.

The problem is with spill locations in GCC 4.4.0.  The elimination
code correctly eliminates the frame and argument pointers and replaces
them with register X.  The problem is that it then generates
indirect-with-offset loads to load spilled values.

Normal usage such as:

struct foo
{
  int a;
  int b;
};

int bar(struct foo* p)
{
   return p->b;
}

is correctly split into: load X with p, add four, then dereference X.
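The three steps can be written out in C. This sketch assumes a 32-bit `int` with no padding between members, so `offsetof(struct foo, b)` is the 4 that gets added; `bar_split` is a hypothetical name.

```c
#include <stddef.h>

struct foo
{
    int a;
    int b;
};

/* Hypothetical helper: the access to p->b written as the three steps the
   port emits, using only plain register-indirect addressing. */
static int bar_split(struct foo *p)
{
    const char *x = (const char *)p;    /* load X with p */
    x += offsetof(struct foo, b);       /* add four (with 32-bit int) */
    return *(const int *)x;             /* dereference X */
}
```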

The RTL is generated after the IRA stage.  GCC aborts in postreload
with an 'instruction does not satisfy constraints' error on:
(insn 183 181 75 3 mandelbrot.c:117 (set (reg:SI 6 R11)
(mem/c:SI (plus:SI (reg:SI 3 X)
(const_int -8 [0xfff8])) [0 %sfp+-8 S4
A32])) -1 (nil))

The movsi it matches against is:

(define_insn "movsi_insn"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r,rm,rm,rm,C, rm")
        (match_operand:SI 1 "general_operand"   "r, m,I, i ,n, rm,C"))]
  ""
  "@
   LOADACC, %1\;STOREACC, %0
   LOADACC, %1\;STOREACC, %0
   LOADI, #%1\;STOREACC, %0
   LOADLONG, #%1\;STOREACC, %0
   LOADLONG, %1\;STOREACC, %0
   Foo
   Bar"
)

I believe it fails the constraints because the 'm' constraint doesn't
match: go_if_legitimate_address only accepts (mem (reg)), not
(mem (plus (reg ...) ...)).

I don't think I had this problem when working against 4.3.3 but I'm not sure.

Could someone point me in the right direction please?  Is it
appropriate to ask such questions on this list?

-- Michael


Re: Unexpected offsets when eliminating SP

2009-05-03 Thread Michael Hope
Thanks Jim and Ian.  I've added a secondary_reload which does this:

...
  if (code == MEM)
    {
      if (fp_plus_const_operand (XEXP (x, 0), mode))
        {
          sri->icode = in_p ? CODE_FOR_reload_insi : CODE_FOR_reload_outsi;
          return NO_REGS;
        }

where fp_plus_const_operand is taken from the bfin port; it checks
that the address is of the form (plus (reg) (const_int)).  The .md file
contains:

---
(define_expand "reload_insi"
  [(parallel [(set (match_operand:SI 0 "register_operand" "=r")
                   (match_operand:SI 1 "memory_operand" "m"))
              (clobber (match_operand:SI 2 "register_operand" "=a"))])]
  ""
{
  rtx plus_op = XEXP (operands[1], 0);
  rtx fp_op = XEXP (plus_op, 0);
  rtx const_op = XEXP (plus_op, 1);
  rtx primary = operands[0];
  rtx scratch = operands[2];

  fprintf (stderr, "reload_insi\n");
  /* scratch = fp + const; primary = (mem scratch) */
  emit_move_insn (scratch, fp_op);
  emit_insn (gen_addsi3 (scratch, scratch, const_op));
  emit_move_insn (primary, gen_rtx_MEM (SImode, scratch));
  DONE;
}
)

(define_expand "reload_outsi"
  [(parallel [(match_operand 0 "memory_operand" "=m")
              (match_operand 1 "register_operand" "r")
              (match_operand:SI 2 "register_operand" "=&a")])]
  ""
{
  rtx plus_op = XEXP (operands[0], 0);
  rtx fp_op = XEXP (plus_op, 0);
  rtx const_op = XEXP (plus_op, 1);
  rtx primary = operands[1];
  rtx scratch = operands[2];

  fprintf (stderr, "reload_outsi\n");
  /* scratch = fp + const; (mem scratch) = primary */
  emit_move_insn (scratch, fp_op);
  emit_insn (gen_addsi3 (scratch, scratch, const_op));
  emit_move_insn (gen_rtx_MEM (SImode, scratch), primary);
  DONE;
}
)
---
reload_insi is being called and expands into the correct code, but
for some reason reload_outsi never gets called.  sri->icode is set
correctly and propagates a few levels up, but I couldn't track it any
further.

The s390 port does the reload the same way I do, and the bfin port is
similar.  I haven't looked further into GO_IF_LEGITIMATE_ADDRESS, but
it's the next part to look at.  It's a stripped-down version of the
mmix one, so it should be roughly OK.

I'm a bit confused by the documentation versus the ports.  For
example, REGNO_MODE_CODE_OK_FOR_BASE_P doesn't appear to need a strict
form according to the documentation, but the bfin port has strict and
non-strict versions.  Most of the ports have a REG_OK_FOR_BASE_P macro
with strict and non-strict versions, but it's not documented, isn't
used, and might have been removed around GCC 4.0.

Any ideas on why the reload_outsi above is being eaten?

Thanks,

-- Michael

2009/4/30 Jim Wilson :
> Michael Hope wrote:
>>
>> HI there.  I'm working on porting gcc to a new architecture which only
>> does indirect addressing - there is no indirect with displacement.
>
> The IA-64 target also has only indirect addressing.  Well, it has some
> auto-increment addressing modes too, but that isn't relevant here.  You
> could try looking at the IA-64 port to see why it works and yours doesn't.
>
>> The problem is with spill locations in GCC 4.4.0.  The elimination
>> code correctly eliminates the frame and args pointer and replaces it
>> with register X.  The problem is that it then generates indirect with
>> offset loads to load spilled values.
>
> Since this is happening inside reload, first thing I would check is to make
> sure you handle REG_OK_STRICT correctly.  Before reload, a pseudo-reg is a
> valid memory address.  Inside reload, an unallocated pseudo-reg is actually
> a memory location, and hence can not be a valid memory address.  This is
> controlled by REG_OK_STRICT.
>
> Jim
>


Re: Unexpected offsets when eliminating SP

2009-05-09 Thread Michael Hope
Thanks for everybody's help.  I've got things working, so I thought
I'd quickly write it up.

The architecture I'm working on is deliberately simple.  It has:
 * An accumulator
 * Fourteen general purpose registers R10 to R1E
 * X and Y cache registers each backed by non-coherent (!) caches
 * A stack backed by the S cache

Memory can only be accessed through the X or Y registers.  The
cache-coherency problem means you can really only use X unless you can
tell that Y points far away, but that's a problem for another time.  It
also means you can't use the S stack as a data stack, as you can't
address it using X.

The only addressing modes are 32-bit word indirect, 8-bit indirect with
pre-decrement, and 8-bit indirect with post-increment.

I allocated R1E to the data stack and R1D to the frame pointer.  The
general purpose registers are in the DATA_REGS class while X and Y are
in ADDR_REGS.  Y is marked as fixed to prevent it being used.

The implementation is:
 * Set BASE_REG_CLASS to ADDR_REGS
 * Set INDEX_REG_CLASS to NO_REGS to reject index addressing
 * Implement GO_IF_LEGITIMATE_ADDRESS so that it accepts a plain
register address, (mem (reg)), but rejects (mem (plus (reg) (const_int)))
and everything else

You can't set BASE_REG_CLASS to NO_REGS, as a plain (mem (reg)) is
treated as a base register plus a zero offset.

This works fine until you spill a variable.  Spills generate offsets
relative to the frame pointer.  This is OK provided your frame pointer
is a member of ADDR_REGS.  Mine isn't, so the resulting fixup generates
an offset address, which kills the compiler.

You can't pretend and put the FP in ADDR_REGS.  A non-zero offset will
correctly be rejected by GO_IF_LEGITIMATE_ADDRESS and loaded into X,
but a zero offset will try to load from R1D.

The solution here is to copy the mc68hc11 and use
LEGITIMIZE_RELOAD_ADDRESS to recognise the offset and cause another
reload.  This code:

  if (GET_CODE (x) == PLUS
      && GET_CODE (XEXP (x, 0)) == REG
      && GET_CODE (XEXP (x, 1)) == CONST_INT)
    {
      /* Reload the whole (plus (reg) (const_int)) into an ADDR_REGS
         register so the access becomes a plain (mem (reg)).  */
      push_reload (x, NULL_RTX, px, NULL,
                   ADDR_REGS, GET_MODE (x), VOIDmode, 0, 0, opnum, reload_type);

      return true;
    }

does that.
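The net effect of the extra reload (and of the reload_insi expander earlier in the thread) can be modelled in C. This is a toy, not port code: the `machine` struct, its 256-byte memory, and all function names are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Toy model, not the real port: 256 bytes of memory, an address register X
   and a frame pointer FP (R1D) that cannot be used as an address. */
typedef struct {
    uint8_t mem[256];
    uint32_t x;   /* ADDR_REGS: the only register memory is read through */
    uint32_t fp;  /* DATA_REGS: frame pointer, not a valid base register */
} machine;

/* (mem:SI (reg X)): the only word access the machine supports. */
static uint32_t load_word_via_x(const machine *m)
{
    uint32_t v;
    memcpy(&v, &m->mem[m->x], sizeof v);
    return v;
}

/* A spill-slot read (mem:SI (plus (reg FP) (const_int off))) after the
   extra reload: X = FP; X = X + off; then a plain indirect load. */
static uint32_t load_spill_slot(machine *m, int32_t off)
{
    m->x = m->fp;
    m->x += (uint32_t)off;
    return load_word_via_x(m);
}

/* Usage: store a value in the slot at FP + off, then read it back.
   The caller must keep FP + off within the 256-byte toy memory. */
static uint32_t spill_demo(uint32_t fp, int32_t off, uint32_t value)
{
    machine m;
    memset(&m, 0, sizeof m);
    m.fp = fp;
    memcpy(&m.mem[fp + (uint32_t)off], &value, sizeof value);
    return load_spill_slot(&m, off);
}
```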

I tried TARGET_SECONDARY_RELOAD as well.  Similar code to the above
correctly generated the code for an 'in' reload, but for some reason
the code for the 'out' reload never got inserted.

-- Michael



Destructive comparison

2009-05-17 Thread Michael Hope
Hi there.  I'm having trouble figuring out how to represent a
destructive comparison on the port I'm attempting.  The ISA is very
simple and accumulator based, so to generate a compare of two
registers you would do:

; Compare R10 and R11, destroying R11 and setting C

LOADACC, R10
XOR, R11

Note that the XOR instruction leaves the result in R11, i.e. R11 = R11 ^ ACC

; Greater than or equals, unsigned:

LOADACC, R10
NOTACC ; Ones complement the accumulator
ADD, R11 ; R11 = R11 + ACC, set C
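The two sequences can be checked in C. This sketch assumes C is the plain carry out of bit 31 of the ADD; under that assumption C is set exactly when R11 > R10 (unsigned), so C clear means R10 >= R11. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* LOADACC, R10; XOR, R11: R11 becomes R11 ^ R10 (the old R11 is lost),
   which is zero exactly when the two registers were equal. */
static bool eq_via_xor(uint32_t r10, uint32_t r11)
{
    uint32_t acc = r10;
    r11 ^= acc;
    return r11 == 0;
}

/* LOADACC, R10; NOTACC; ADD, R11: R11 becomes R11 + ~R10.  Returns the
   carry out of bit 31, assuming that is what C records. */
static bool carry_of_add_not(uint32_t r10, uint32_t r11)
{
    uint32_t acc = ~r10;
    uint64_t sum = (uint64_t)r11 + acc;   /* widen so the carry is visible */
    return (sum >> 32) != 0;
}
```

For example, carry_of_add_not(5, 7) is true and carry_of_add_not(7, 7) is false.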

The C flag acts as a zero flag in some cases and as a carry flag in
others, so I've followed the documentation and defined different CC
modes.  Setting C is done similarly to MMIX and bfin, where you finally
emit a set-compare instruction such as:

(define_insn "cmpcc_insn"
  [(set (match_operand:CC 0 "register_operand" "=C")
        (compare:CC
         (match_operand:SI 1 "register_operand" "d")
         (match_operand:SI 2 "register_operand" "b")))
  ]
  ""
  "XOR, %1"
)

Note here the 'b' constraint is for registers in the ACC_REGS class
and 'd' is for registers in the DATA_REGS class.  This seems to work
fine, properly reloading the right operand into the accumulator.

How should I represent the destruction/clobbering of operand 1?  I've tried:

 * Setting the constraint to '=d' or '+d' to mark it as written
 * Using a (clobber (match_dup 1)) in the insn form, such as:

(define_insn "cmpcc_insn"
  [(set (match_operand:CC 0 "register_operand" "=C")
        (compare:CC
         (match_operand:SI 1 "register_operand" "d")
         (match_operand:SI 2 "register_operand" "b")))
   (clobber (match_dup 1))
  ]
  ""
  "XOR, %1"
)

 * Using a define_expand to clobber operand 1 later (outside the
insn's implicit parallel)
 * Using a define_insn to mark it as both a destructive xor and
compare in parallel, such as:

(define_insn "cmpcc_insn"
  [
   (set (match_operand:SI 0 "register_operand" "=d")
        (xor:SI
         (match_operand:SI 1 "register_operand" "%0")
         (match_operand:SI 2 "register_operand" "b")))
   (set (match_operand:CC 3 "register_operand" "=C")
        (compare:CC
         (match_dup 1)
         (match_dup 2)
        ))

I'd rather not use a scratch register, as moving between registers
goes through ACC, which would mean saving the right-hand operand before
doing the move.  I'd rather have reload do the copy beforehand, if
required, when the left operand is live after this instruction.

Thanks for any help,

-- Michael


Re: Destructive comparison

2009-05-18 Thread Michael Hope
Thanks, that worked.  I ended up using:

(define_insn "cmpcc_xor"
  [(set (match_operand:CC 0 "register_operand" "=C")
        (compare:CC
         (not:SI (xor:SI (match_operand:SI 1 "register_operand" "%r")
                         (match_operand:SI 2 "register_operand" "b")))
         (const_int 0)))
   (set (match_operand:SI 3 "register_operand" "=1")
        (not:SI (xor:SI (match_dup 1) (match_dup 2))))]
  ""
  "XOR, %1"
)

The important part was the RTL generation.  The XOR is two-operand,
but I needed to supply a third, pretend operand using:

  emit_insn (gen_cmpcc_(cc_reg, x, y, gen_reg_rtx(SImode)));

Using a match_dup instead of operand 3 above, or supplying 'x' twice,
led to the compiler not noticing the change.

-- Michael

2009/5/18 Jim Wilson :
> Michael Hope wrote:
>>
>>  * Using a define_insn to mark it as both a destructive xor and
>> compare in parallel, such as:
>
> When a compare is in a parallel, the compare must be the first operation.
>  You have it second.  This kind of pattern should work.  You can find many
> examples of it in the sparc.md file for instance.  Of course, in this case,
> they aren't generated at RTL generation time. They are generated at combine
> time.  Still, I'd expect this to work, though there might be some early RTL
> optimization passes that are not prepared to handle it.
>
> See for instance the cmp_cc_xor_not_set pattern in the sparc.md file, which
> is similar to what you want.
>
> Jim
>


Re: Destructive comparison

2009-05-18 Thread Michael Hope
Yip, picked that up after I sent it.  Thanks.

2009/5/19 Jim Wilson :
> On Mon, 2009-05-18 at 19:58 +1200, Michael Hope wrote:
>>   (set (match_operand:SI 3 "register_operand" "=1")
>>       (not:SI (xor:SI (match_dup 1) (match_dup 2))))]
>
> not xor is aka xnor.  You probably want this without the two "not"
> operations.
>
> Jim
>
>
>
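Jim's point is easy to check in C: comparing `~(a ^ b)` against zero tests whether `a == ~b`, not whether `a == b`. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* (compare (not (xor a b)) 0): true only when every bit of a differs from b. */
static bool xnor_is_zero(uint32_t a, uint32_t b)
{
    return (uint32_t)~(a ^ b) == 0;
}

/* (compare (xor a b) 0): true exactly when a == b. */
static bool xor_is_zero(uint32_t a, uint32_t b)
{
    return (a ^ b) == 0;
}
```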


Accumulator based machines

2009-05-21 Thread Michael Hope
Hi there.  The machine I'm working on is part accumulator based, part
register based.  I'm having trouble figuring out how best to tell the
compiler how and when ACC is affected.

For example, the add instruction is two operand with the destination
being a general register:
  ADD, R11 is equivalent to R11 = R11 + ACC

This works fine using a rule like
(define_insn "addsi3_insn"
  [(set (match_operand:SI 0 "register_operand"  "=r")
        (plus:SI
         (match_operand:SI 1 "register_operand" "0")
         (match_operand:SI 2 "register_operand" "b")))]

(b is the constraint that the register comes from the ACC_REGS class)

The logical right shift instruction only works on the accumulator:
  LSR1 is equivalent to ACC = ACC >> 1

This works fine using:
(define_insn "lshrsi3_const"
  [(set (match_operand:SI 0 "register_operand" "=b")
        (lshiftrt:SI
         (match_operand:SI 1 "register_operand" "0")
         (match_operand:SI 2 "immediate_operand" "")))]

The problem is when I have to clobber ACC such as when moving between
registers.  The output should be:
 LOADACC, R10; STOREACC, R11 (equivalent to ACC = R10; R11 = ACC)
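A small C model shows why the move has to tell the compiler that ACC is clobbered. The `cpu` struct and helpers are hypothetical stand-ins for LOADACC, STOREACC, and ADD, and the register numbering (r[0] for R10 and so on) is an assumption.

```c
#include <stdint.h>

/* Hypothetical model of the accumulator machine described above. */
typedef struct {
    uint32_t acc;
    uint32_t r[16];   /* r[0] stands for R10, r[1] for R11, ... */
} cpu;

static void loadacc(cpu *c, int src)  { c->acc = c->r[src]; }   /* LOADACC, Rs  */
static void storeacc(cpu *c, int dst) { c->r[dst] = c->acc; }   /* STOREACC, Rd */
static void add(cpu *c, int dst)      { c->r[dst] += c->acc; }  /* ADD, Rd      */

/* A register-to-register move has to go through ACC. */
static void movsi_reg_reg(cpu *c, int dst, int src)
{
    loadacc(c, src);
    storeacc(c, dst);
}

/* R11 = R11 + R10 with nothing between setting up ACC and the ADD. */
static uint32_t add_ok(uint32_t r10, uint32_t r11)
{
    cpu c = {0};
    c.r[0] = r10;
    c.r[1] = r11;
    loadacc(&c, 0);
    add(&c, 1);
    return c.r[1];
}

/* Same, but an unrelated move R12 = R13 lands between the ACC set-up and
   the ADD, which is what the compiler may do if it doesn't know the move
   clobbers ACC. */
static uint32_t add_with_move_in_between(uint32_t r10, uint32_t r11, uint32_t r13)
{
    cpu c = {0};
    c.r[0] = r10;
    c.r[1] = r11;
    c.r[3] = r13;
    loadacc(&c, 0);
    movsi_reg_reg(&c, 2, 3);   /* overwrites ACC with R13 */
    add(&c, 1);                /* adds R13 rather than R10 */
    return c.r[1];
}
```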

I've tried a parallel clobber like:
(define_insn "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=b,   dam,dam")
        (match_operand:SI 1 "general_operand"   "dami,b,  dam"))
  (clobber (reg:SI TREG_ACC))

but this causes trouble when setting up ACC for instructions like the
add above.  The compiler runs, but the generated code is incorrect.

I've tried a parallel with a match_scratch like:
(define_insn "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=b,   rm,rm")
        (match_operand:SI 1 "general_operand"   "rmi,b,  rm"))
   (clobber (match_scratch:SI 2 "=X,X,b"))
  ]
  ""
  "@
   LOADACC, %1
   STOREACC, %0
   LOADACC, %1\;STOREACC, %0"
)

This uses a 'b' constraint to put the scratch into ACC when moving
between registers and a 'X' constraint to ignore the scratch when
moving to or from ACC directly.

This basically works but fails when mixed with other instructions.
For example, the code:

  return left + right

fails with a 'movsi does not meet constraints' error: ACC was already
allocated to one of the operands of the addsi, so it was not available
for the scratch register, and something else was given to the movsi
which didn't match the 'b' constraint.

All of the other instructions are OK as I can clobber or mark ACC as
an output reload to mark it as dirty.

Even the 68hc11 is better off as it can directly move between any two
registers :)

Any ideas?  Am I going about this the wrong way?  My first port
treated ACC as a fixed register which avoided all of this but
generated too many loads and stores.  Is there a way of using a
register only if a chain of instructions uses it?  Can I peephole it in
some way instead?

-- Michael


Limiting the use of pointer registers

2009-05-24 Thread Michael Hope
Hi there.  I'm working on a port to an architecture where the pointer
registers X and Y are directly backed by small 128 byte caches.
Changing one of these registers to a different memory row causes a
cache load cycle, so using them for memory access is fine but using
them as general purpose registers is expensive.

How can I prevent the register allocator from using these for anything
but memory access?  I have a register class called ADDR_REGS that
contains just X and Y and one called DATA_REGS which contains the
general registers R10 to R1E.  GENERAL_REGS is the same as DATA_REGS.
The order they appear in reg_class is DATA_REGS, GENERAL_REGS, then
ADDR_REGS.  I've defined the constraints for most of the patterns to
only take 'r', which prevents X or Y being used as operands for those
patterns.  I have to allow X and Y to be used in movsi and addsi3 to
allow indirect memory addresses to be calculated.

Unfortunately Pmode is SImode so I can't tell the difference between
pointer and normal values in PREFERRED_RELOAD_CLASS,
LIMIT_RELOAD_CLASS, or TARGET_SECONDARY_RELOAD.  I tried setting
REGISTER_MOVE_COST and MEMORY_MOVE_COST to 100 when the source or
destination is ADDR_REGS but this didn't affect the output.

I suspect that I'll have to do the same as the accumulator and hide X
and Y from the register allocator.  Pretend that any general register
can access memory and then use post reload split to turn the patterns
into X based patterns for the later phases to tidy up.

One more question.  The backing caches aren't coherent so X and Y
can't read and write the same 128 bytes of memory at the same time.
Does GCC have any other information about the location of a pointer
that I could use?  Something like:
 * Pointer is to text memory or read only data, so it is safe to read from
 * Pointer 1 is in the stack and pointer 2 is in BSS, so they are
definitely far apart
 * Pointer 1 points to one stack item and pointer 2 to another stack
item at least 128 bytes away
 * The call stack is known and pointer 1 and pointer 2 point to different rows
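For the "definitely far apart" cases the arithmetic is simple. This sketch assumes each cache holds one 128-byte row aligned to a 128-byte boundary (the thread only says 128 bytes), and the helper names are hypothetical. Two byte addresses 128 or more bytes apart can never share a 128-byte window, whatever its alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a cache row is 128 bytes, aligned to a 128-byte boundary. */
#define ROW_BYTES 128u

static uint32_t row_of(uint32_t addr)
{
    return addr / ROW_BYTES;
}

/* True if X and Y would be caching the same row, i.e. could see stale data. */
static bool same_row(uint32_t a, uint32_t b)
{
    return row_of(a) == row_of(b);
}

/* Conservative test that needs no alignment knowledge: addresses at least
   ROW_BYTES apart are always in different rows; closer ones may not be. */
static bool provably_disjoint(uint32_t a, uint32_t b)
{
    uint32_t dist = a > b ? a - b : b - a;
    return dist >= ROW_BYTES;
}
```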

My fallback plan is to add a variable attribute so the programmer can
mark the pointer as non overlapping and push the problem onto them.
Something clever would be nice though :)

Sorry for all the questions - this is quite a difficult architecture.
I hope to collect all the answers and do a write up for others to use
when I'm done.

-- Michael


Using a umulhisi3

2009-06-03 Thread Michael Hope
Hi there.  The architecture I'm working on is a 32 bit, word based
machine with a 16x16 -> 32 unsigned multiply.  For some reason the
combine stage is converting the umulhisi3 into a mulsi3 and I'm not
sure how to track this down.

The test code is part of an alpha blend:

#include <stdint.h>

void blend(uint8_t* sb, uint8_t* db)
{
  uint16_t ia = 256 - *sb;
  uint16_t d = *db;

  *db = ((d * ia) >> 8) + *sb;
}
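One point worth checking in C: with `uint16_t` operands, `d * ia` is evaluated in `int` after integer promotion, which is consistent with the SImode arithmetic and the `ashiftrt` in the dump below. Both operands are zero-extended 16-bit values, so a 16x16 -> 32 unsigned multiply gives the same answer for every input. This sketch checks that exhaustively; `umul16x16` is a hypothetical model of `umulhisi3`, not GCC code.

```c
#include <stdint.h>

/* Model of a 16x16 -> 32 unsigned multiply: zero-extend, then multiply. */
static uint32_t umul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* The blend as written in C (int arithmetic after promotion). */
static uint8_t blend_ref(uint8_t s, uint8_t d)
{
    uint16_t ia = 256 - s;
    uint16_t dd = d;
    return (uint8_t)(((dd * ia) >> 8) + s);
}

/* The same blend using only the 16x16 multiply. */
static uint8_t blend_umul(uint8_t s, uint8_t d)
{
    uint16_t ia = (uint16_t)(256 - s);
    return (uint8_t)((umul16x16(d, ia) >> 8) + s);
}

/* Compare both versions over every input; returns the number of mismatches. */
static unsigned blend_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned s = 0; s < 256; s++)
        for (unsigned d = 0; d < 256; d++)
            if (blend_ref((uint8_t)s, (uint8_t)d) != blend_umul((uint8_t)s, (uint8_t)d))
                bad++;
    return bad;
}
```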

I've defined the different multiplies in the .md file:
(define_insn "umulhisi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (mult:SI (zero_extend:SI
                  (match_operand:HI 1 "register_operand" "%r"))
                 (zero_extend:SI
                  (match_operand:HI 2 "register_operand" "r"))))]
  ""
...

(define_insn "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (mult:SI (match_operand:SI 1 "register_operand" "%r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
...

Running at -O level optimisations gives the following in
umul.157r.outof_cfglayout, just before the combine stage:
---
(insn 3 6 4 2 umul.c:16 (set (reg/v/f:SI 28 [ sb ])
(reg:SI 0 R10 [ sb ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 0
R10 [ sb ])
(nil)))

(insn 4 3 5 2 umul.c:16 (set (reg/v/f:SI 29 [ db ])
(reg:SI 1 R11 [ db ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 1
R11 [ db ])
(nil)))

(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)

(insn 8 5 9 2 umul.c:17 (set (reg:SI 26 [ D.1217 ])
(zero_extend:SI (mem:QI (reg/v/f:SI 28 [ sb ]) [0 S1 A8]))) 27
{zero_extendqisi2} (expr_list:REG_DEAD (reg/v/f:SI 28 [ sb ])
(nil)))

(insn 9 8 10 2 umul.c:20 (set (reg:HI 30)
(const_int 256 [0x100])) 1 {movhi_insn} (nil))

(insn 10 9 11 2 umul.c:20 (set (reg:SI 31)
(minus:SI (subreg:SI (reg:HI 30) 0)
(reg:SI 26 [ D.1217 ]))) 12 {subsi3} (expr_list:REG_DEAD (reg:HI 30)
(nil)))

(insn 11 10 12 2 umul.c:20 (set (reg:SI 33)
(zero_extend:SI (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8]))) 27
{zero_extendqisi2} (nil))

(insn 12 11 13 2 umul.c:20 (set (reg:HI 32)
(subreg:HI (reg:SI 33) 0)) 1 {movhi_insn} (expr_list:REG_DEAD
(reg:SI 33)
(nil)))

(insn 13 12 14 2 umul.c:20 (set (reg:SI 34)
(mult:SI (zero_extend:SI (reg:HI 32))
(zero_extend:SI (subreg:HI (reg:SI 31) 0)))) 14
{umulhisi3} (expr_list:REG_DEAD (reg:HI 32)
(expr_list:REG_DEAD (reg:SI 31)
(nil))))

(insn 14 13 15 2 umul.c:20 (set (reg:SI 35)
(ashiftrt:SI (reg:SI 34)
(const_int 8 [0x8]))) 21 {ashrsi3_const}
(expr_list:REG_DEAD (reg:SI 34)
(nil)))

(insn 15 14 16 2 umul.c:20 (set (reg:QI 36)
(subreg:QI (reg:SI 35) 0)) 0 {movqi_insn} (expr_list:REG_DEAD
(reg:SI 35)
(nil)))

(insn 16 15 17 2 umul.c:20 (set (reg:SI 37)
(plus:SI (reg:SI 26 [ D.1217 ])
(subreg:SI (reg:QI 36) 0))) 11 {addsi3}
(expr_list:REG_DEAD (reg:QI 36)
(expr_list:REG_DEAD (reg:SI 26 [ D.1217 ])
(nil))))

(insn 17 16 0 2 umul.c:20 (set (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8])
(subreg:QI (reg:SI 37) 0)) 0 {movqi_insn} (expr_list:REG_DEAD
(reg:SI 37)
(expr_list:REG_DEAD (reg/v/f:SI 29 [ db ])
(nil))))
---
The umulhisi3 has been correctly found and used at this stage.  In the
following combine stage however, it gets converted into a mulsi3.  The
.combine dump is attached.

The xtensa port is the closest match I can find as it is 32 bit, word
based, and has the umulhisi3.  It correctly keeps the 16 bit multiply.

Some other test cases like:
uint32_t mul(uint16_t a, uint16_t b)
{
return a*b;
}

come through fine.  It might be something to do with the memory access.

How does the combine stage work?  It looks like it could get multiple
potential matches for a set of RTLs.  Does it use some type of costing
function to pick between them?  Can I tell combine that a umulhisi3 is
cheaper than a mulsi3?

Thanks for the earlier help on the post reload split to use the
accumulator - it's working well.

-- Michael


umul.i.159r.combine


Re: Machine Description Template?

2009-06-05 Thread Michael Hope
I've found the MMIX port to be a good place to start.  It's a bit old,
but the architecture is nice and simple and the implementation nice and
brief.  Watch out, though, as it is a pure 64-bit machine: you'll need
to think SI every time you see DI.

The trick past there is to compare the significant features of your
machine with existing machines.  For example, GCC prefers a 68000
style machine with a set of condition codes, however many machines
only have one condition flag that changes meaning based on what you
are doing.

-- Michael

2009/6/6 Graham Reitz :
>
> Is there a machine description template in the gcc file source tree?
>
> If there is also template for the 'C header file of macro definitions' that
> would be good to know too.
>
> I did a file search for '.md' and there are tons of examples.  Although, I
> was curious if there was a generic template.
>
> graham
>


Good progress

2009-06-28 Thread Michael Hope
Hi there.  Sorry for the noise, but I thought it would be nice to hear
from a new porter who has gotten past the first few hurdles.

The architecture I'm working on is a 32 bit accumulator based machine
with a very small instruction set.  Binutils and GAS were
straightforward, and after some help I've incorporated the destructive
compares, post-reload fixes for the accumulator, and the limited
addressing modes (well, mode :)

I've hooked in my own simulator to the test suite.  The compile test
suite passes fine and the execute test failures are down from an
initial 700 to 300.  The last fifty will be messy, but the next few
hundred should drop fairly easily.

Thanks for everyone's help so far.  The generated code is already
decent and will only get better.

-- Michael


Re: GCC 4.7.0 Release Candidate available from gcc.gnu.org

2012-03-05 Thread Michael Hope
On Sat, Mar 3, 2012 at 2:44 AM, Richard Guenther  wrote:
>
> GCC 4.7.0 Release Candidate available from gcc.gnu.org
>
> The first release candidate for GCC 4.7.0 is available from
>
>  ftp://gcc.gnu.org/pub/gcc/snapshots/4.7.0-RC-20120302
>
> and shortly its mirrors.  It has been generated from SVN revision 184777.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux.  Please test it and report any issues to bugzilla.

The RC bootstraps C, C++, Fortran, and Obj-C in arm-linux-gnueabi
Cortex-A9/Thumb-2/NEON/softfp and ARMv5T/ARM/soft-float
configurations.  The test results are here:
 
http://builds.linaro.org/toolchain/gcc-4.7.0-RC-20120302/logs/armv7l-natty-cbuild259-tcpanda03-armv5r2/gcc-testsuite.txt
and:
 
http://builds.linaro.org/toolchain/gcc-4.7.0-RC-20120302/logs/armv7l-natty-cbuild259-tcpanda02-cortexa9r1/gcc-testsuite.txt

and, on reflection, should be sent to gcc-testresults.  The host
details are in the same directory.

There's a fair number of failures:
 http://people.linaro.org/~michaelh/incoming/a9-faults.txt
 http://people.linaro.org/~michaelh/incoming/armv5-faults.txt

Ramana, any thoughts?  If you ignore the guality and tls ones then
most are testisms, but there are a couple of ICEs.

-- Michael


Re: How to avoid sign or zero extension

2012-06-04 Thread Michael Hope
On 3 June 2012 17:06, i-love-spam  wrote:
> I'm writing some optimized functions for gcc-arm in a library that abuses
> shorts. So the problem I have is that in extremely many places results of my
> optimized functions are needlessly sign or zero extended. That is, gcc adds 
> UXTH or SXTH opcode.
>
> For example, imagine if I use clz instructions (count leading zeros). Result 
> of the function will be positive number between 0 and 32. So, in places where 
> result of that clz functions is assigned to a short int it shouldn't 
> sign-extend the result.
>
> I use inline asm, and it works with arm's armcc if I use short as a result of 
> inline asm expression:
>
> static __inline short CLZ(int n)
> {
>    short ret;
> #ifdef __GNUC__
>    __asm__("clz %0, %1" : "=r"(ret) : "r"(n));
> #else
>    __asm { clz ret, n; }
> #endif
>    return ret;
> }
>
> //test function
> short test_clz(int n)
> {
>    return CLZ(n);
> }
>
>
> ARMCC generates this code:
> test_clz:
>    CLZ      r0,r0
>    BX       lr
>
> GCC generates this code:
> test_clz:
>    clz   r0, r0
>    sxth r0, r0    <--- offending line.
>    bx   lr

Hi there.  This list is about the development of GCC.  I recommend
using the gcc-help list for end user topics.

In this case, GCC is correct.  Section 5.4 of the ARM AAPCS says "A
Fundamental Data Type that is smaller than 4 bytes is zero- or
sign-extended to a word and returned in r0".  You've used inline
assembler so GCC can't tell that the clz instruction already clears
the top bits.

How about using __builtin_clz() instead?  You get the bonus that GCC
can then reason about the function and optimise it away where possible.
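A sketch of that suggestion: the same helper written with `__builtin_clz`. `__builtin_clz` is undefined for a zero argument, so the sketch maps 0 to 32, which is what the ARM CLZ instruction returns. It assumes a 32-bit `unsigned`. Whether GCC then drops the `sxth` depends on the version and options, so check the generated code.

```c
/* CLZ helper using GCC's builtin instead of inline asm. */
static inline short CLZ(int n)
{
    unsigned u = (unsigned)n;
    /* __builtin_clz(0) is undefined, so handle zero explicitly. */
    return (short)(u == 0 ? 32 : __builtin_clz(u));
}
```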

-- Michael


Re: GCC 4.7.1 Release Candidate available from gcc.gnu.org

2012-06-07 Thread Michael Hope
On 6 June 2012 22:14, Richard Guenther  wrote:
>
> The first release candidate for GCC 4.7.1 is available from
>
>  ftp://gcc.gnu.org/pub/gcc/snapshots/4.7.1-RC-20120606
>
> and shortly its mirrors.  It has been generated from SVN revision 188257.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux.  Please test it and report any issues to bugzilla.
>
> If all goes well, I'd like to release 4.7.1 at the end of next week.

This bootstraps and tests OK in ARMv5T+ARM+soft,
Cortex-A9+Thumb-2+softfp+NEON, and Cortex-A9+Thumb-2+hard+NEON
configurations for C, C++, Fortran, and objc[1].  The regressions
compared to 4.7.0 are testisms (problems in the tests rather than the
compiler) or cases where the vectoriser no longer applies.  I haven't
logged them in Bugzilla, sorry.

-- Michael
[1] http://builds.linaro.org/toolchain/gcc-4.7.1-RC-20120606/logs/


Re: ARM: gcc generates two identical strd instructions to store 8 bytes

2012-06-25 Thread Michael Hope
On 26 June 2012 00:48, Nathanaël Prémillieu  wrote:
> Hi all,
>
> I am using the gcc ARM cross-compiler (gcc version 4.6.3 (Ubuntu/Linaro
> 4.6.3-1ubuntu5)). Compiling the test.c code (in attachement) with:
>
> 'arm-linux-gnueabi-gcc -S test.c'
>
> I obtain the test.s assembly code (in attachement). At lines 56 and 57 of
> the test.s there is two identical strd instructions:
>
> 56      strd    r2, [r7]
> 57      strd    r2, [r7]
>
> I have checked the semantic of the ARM strd instruction and I have not seen
> any side effect of this instruction that could explain why gcc need to put
> this instruction two times in a row. For me, one is sufficient to store the
> 8-bytes variable into memory.
>
> Is there an explanation?

Hi Nathanaël.  Your question is more appropriate for the gcc-help
list.  This list is about the development of GCC itself.

You've built with optimisation turned off, so GCC has generated correct
but inefficient code.  The double store could be a side effect of
expanding the 64-bit multiply into its component 32-bit multiplies, or
of the conditional.  Try building at -O or higher.
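To illustrate what "expanding the 64-bit multiply" means on a 32-bit
target, here is a hand-written sketch of the low 64 bits of a 64x64
product built from 32-bit pieces.  It is not what GCC literally emits
for your test.c (which isn't shown here), and mul64_from_32 is a
hypothetical name, not a GCC internal.  At -O0 each of these
intermediate values tends to be spilled to and reloaded from the stack,
which is where redundant stores like your pair of strd instructions can
come from.

```c
#include <stdint.h>

/* Hypothetical sketch: low 64 bits of a * b computed from 32-bit halves,
   roughly how a 64-bit multiply is split up on a 32-bit target. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 32x32->64 product of the low halves (e.g. UMULL on ARM). */
    uint64_t lo = (uint64_t)a_lo * b_lo;

    /* Cross terms only reach the upper word; anything beyond bit 63 is
       discarded, so 32-bit arithmetic suffices.  a_hi * b_hi is dropped
       entirely because it only contributes at bit 64 and above. */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;

    return lo + ((uint64_t)cross << 32);
}
```

Comparing `arm-linux-gnueabi-gcc -S -O0` against `-O2` output for a
function like this should show the extra stack traffic disappearing
once optimisation is on.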

-- Michael


Re: thumb2 support

2012-10-14 Thread Michael Hope
On 11 October 2012 17:58, Grant  wrote:
>>> Hello, I'm working with the BeagleBone and gcc-4.5.4 on Gentoo.  If I
>>> try to compile the 3.6 kernel with CONFIG_THUMB2_KERNEL, I get:
>>>
>>> arch/arm/boot/compressed/head.S:127: Error: selected processor does
>>> not support requested special purpose register -- `mrs r2,cpsr'
>>> arch/arm/boot/compressed/head.S:134: Error: selected processor does
>>> not support requested special purpose register -- `mrs r2,cpsr'
>>> arch/arm/boot/compressed/head.S:136: Error: selected processor does
>>> not support requested special purpose register -- `msr cpsr_c,r2'
>>>
>>> This post indicates that mainline gcc does not currently support thumb2:
>>>
>>> https://groups.google.com/d/msg/beagleboard/P52fgMDzp8A/vupzuh71vdYJ
>>>
>>> However, this indicates that thumb2 has been supported since 4.3:
>>>
>>> http://gcc.gnu.org/gcc-4.3/changes.html
>>>
>>> Can anyone clear this up?
>>
>> The errors are coming from an assembler file that is not part of the
>> GCC sources.  Are those instructions valid for Thumb2?  I don't know.
>> If they are valid, then the issue is with the assembler, which is not
>> part of GCC; check the version of the GNU binutils that you have
>> installed.  If those instructions are not valid, then you need to
>> change your source.
>
> Thanks Ian.  I'm using binutils-2.22-r1.  Do you happen to know which
> version of binutils should support thumb2?

Hi Grant.  I'm pretty sure this was fixed by:

commit c0d796cf810a84f10703c0390f7b1c5887b837c9
Author: Nick Clifton 
Date:   Wed Jun 13 14:18:59 2012 +

PR gas/12698
* config/tc-arm.c (do_t_mrs): Do not require an m-profile
architecure when assembling for all archiectures.
(do_t_msr): Likewise.

which will be in the upcoming binutils 2.23.  Debian/Ubuntu carry this
as a patch on top of their 2.22.

-- Michael