Re: No documentation of -fsched-pressure-algorithm

2012-05-02 Thread David Brown

On 01/05/2012 20:11, Joern Rennecke wrote:

Quoting Richard Sandiford :


nick clifton  writes:

OK, but what if it turns out that the new algorithm improves the
performance of some benchmarks/applications, but degrades others, within
the same architecture ? If that turns out to be the case (and I suspect
that it will) then having a documented command line option to select the
algorithm makes more sense.


That was my point though. If that's the situation, we need to find
out why. We shouldn't hand the user a long list of options and tell
them to figure out which ones happen to produce the best code.


Actually, having a long list of things that you can tweak is exactly what
Milepost thrives on.


Well, given the replies from you, Ian and Vlad (when reviewing the
patch),
I feel once again in a minority of one here :-) but... I just don't
think we should be advertising this sort of stuff to users. Not because
I'm trying to be cliquey, but because any time the user ends up having
to use stuff like this represents a failure on the part of the compiler.


We have misfiring heuristics all over the place. This has actually
gotten a lot worse with the shift from rtl to tree optimizers. These
days the optimizers know how to do almost any transformation, but squat
about the cost/benefit equation.

But I don't see how hiding the switches to force the heuristics makes
this situation any better.


I mean, at what level would we document it? We could give a detailed
description of the two algorithms, but there should never be any need
to explain those to users (or for the users to have to read about them).
And there's no guarantee we won't change the algorithms between releases.
So I suspect we'd just have documentation along the lines of
"here, we happen to have two algorithms to do this. Treat them as
black boxes, try each one on each source file, and see what works
out best." Which isn't particularly insightful and not IMO a good
user interface.


It's not ideal, but workable. If you could explain coherently when the
option should be used, you could probably improve the heuristics already.
If you want to make this a bit more meaningful, you could have a bugzilla
bug for the imperfect heuristics, and ask people to submit their testcases
when they see significant benefit from using an obscure option.



One thing that might be nice is to split the documentation pages a 
little, such as into "normal user options", "advanced user options", and 
"here be dragons".  gcc has a /lot/ of options, many of which are of 
little use to most users, and it can be overwhelming to see so many on 
the same page of the manual.  If you made a "here be dragons" page, it 
would make it much easier to have only rough information there.  The 
page would start with a disclaimer that these options are for expert 
usage and testing, they can change at any time in different versions of 
gcc, and users should not expect support from suppliers (Code Sourcery, 
Red Hat, etc.) on their usage.  Then you could add limited documentation 
for options like "-fsched-pressure-algorithm" or the various "--param" 
options, with just a rough explanation.  It doesn't really matter if the 
explanation is incomprehensible to mortal users - and it may even just 
be a link to a gcc wiki page or part of the gcc internals documentation.


A second thing that would be hugely convenient for advanced users and 
testers (and people like me who just like to read manuals) would be a 
version number attached to each option, so that we can see which gcc 
versions support it.  Some of us use multiple gcc versions (I do 
embedded work - I have gcc for different targets with versions ranging 
from 2.95 to 4.6) - it would be /very/ nice to be able to look at the 
latest version of the documentation rather than always having to go back 
to old versions to figure out if a particular option exists in that 
particular version.


mvh.,

David


Re: [RFC] Converting end of loop computations to MIN_EXPRs.

2012-05-02 Thread Richard Guenther
On Tue, May 1, 2012 at 8:36 AM, Ramana Radhakrishnan
 wrote:
> Sorry about the delayed response, I've been away for some time.
>
>>
>> I don't exactly understand why the general transform is not advisable.
>> We already synthesize min/max operations.
>
>
>>
>> Can you elaborate on why you think that better code might be generated
>> when not doing this transform?
>
> The reason I wasn't happy was the code we ended up generating for
> ARM in this case. Comparing the simple examples showed the difference
> below. I'm pretty sure I can massage the backend to generate the
> right form with splitters, but I probably didn't realize this when I
> wrote the mail. Given this, I wonder whether it is worth doing this
> transformation in general, in a fold type operation, rather than
> restricting ourselves only to invariant operands ?

Yes, I think doing this generally would be beneficial.  Possible places to
hook this up are tree-ssa-forwprop.c if you have

 tem1 = i < x;
 tem2 = i < y;
 tem3 = tem1 && tem2;
 if (tem3)

or tree-ssa-ifcombine.c if you instead see

  if (i < x)
    if (i < y)
      ...

Richard.
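
As a rough illustration of the identity such a fold would rely on, the C sketch below compares the two forms of the condition over a small range of values. The names (min_int, cond_and, cond_min, forms_agree, check_forms) are invented for this example and are not GCC functions; the sketch only checks the arithmetic, not how tree-ssa-forwprop.c or tree-ssa-ifcombine.c would implement the transformation.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: plain integer minimum. */
int min_int(int x, int y) { return x < y ? x : y; }

/* The condition as written in the source: two compares and an AND. */
int cond_and(int i, int x, int y) { return i < x && i < y; }

/* The condition after the proposed fold: one minimum, one compare. */
int cond_min(int i, int x, int y) { return i < min_int(x, y); }

/* Compare both forms for every (i, x, y) in [-lim, lim]^3.
   Returns 1 if they agree everywhere, 0 otherwise. */
int forms_agree(int lim)
{
  for (int i = -lim; i <= lim; i++)
    for (int x = -lim; x <= lim; x++)
      for (int y = -lim; y <= lim; y++)
        if (cond_and(i, x, y) != cond_min(i, x, y))
          return 0;
  return 1;
}

/* Small self-check over a modest range. */
void check_forms(void) { assert(forms_agree(8)); }
```

No overflow concern arises for this same-operator case, since neither form performs any arithmetic on the operands.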


>
> The canonical example is as below :
>
>
> #define min(x, y) (((x) < (y)) ? (x) : (y))
> int foo (int i, int x, int y)
> {
> // return ( i < x) && (i < y);
>  return i < (min (x, y));
> }
>
>
> Case with min_expr:
>
>        cmp     r2, r1  @ 8     *arm_smin_insn/1        [length = 8]
>        movge   r2, r1
>        cmp     r2, r0  @ 23    *arm_cmpsi_insn/3       [length = 4]
>        movle   r0, #0  @ 24    *p *arm_movsi_insn/2    [length = 4]
>        movgt   r0, #1  @ 25    *p *arm_movsi_insn/2    [length = 4]
>        bx      lr      @ 28    *arm_return     [length = 12]
>
>
> This might well be:
>
>        cmp     r2, r0
>        cmpge   r1, r0
>        movle   r0, #0
>        movgt   r0, #1
>        bx      lr
>
> Case without min_expr:
>
>        cmp     r0, r2  @ 28    *cmp_and/6      [length = 8]
>        cmplt   r0, r1
>        movge   r0, #0  @ 29    *mov_scc        [length = 8]
>        movlt   r0, #1
>        bx      lr      @ 32    *arm_return     [length = 12]
>
>
>
>>
>>> #define min(x,y) ((x) <= (y) ? (x) : (y))
>>>
>>> void foo (int x, int y, int *  a, int * b, int *c)
>>> {
>>>  int i;
>>>
>>>  for (i = 0;
>>>       i < x && i < y;
>>>       /* i < min (x, y); */
>>>       i++)
>>>    a[i] = b[i] * c[i];
>>>
>>> }
>>>
>>> The patch below deals with this case and I'm guessing that it could
>>> also handle more of the comparison cases and come up with more
>>> intelligent choices and should be made quite a lot more robust than
>>> what it is right now.
>>
>> Yes.  At least if you have i < 5 && i < y we canonicalize it to
>> i <= 4 && i < y, so your pattern matching would fail.
>
> Of course, considering overflow semantics, you could transform this to
> i < min (x + 1, y) where the original condition was i <= x && i < y.
>
> Thinking about it, it's probably right to state that
>
> i op1 X && i op2 Y => i op min (X1, Y1)
>
> when op1 and op2 are identical or according to the table below :
>
> op1   op2   op    X1      Y1
> <     <=    <=    X + 1   Y
> >     >=    >     X       Y + 1
> <=    <     <=    X       Y + 1
> >=    >     >     X + 1   Y
>
>
> Other than being careful about overflow semantics the second table
> is probably worthwhile looking at -
>
>>
>> Btw, the canonical case this happens in is probably
>>
>>   for (i = 0; i < n; ++i)
>>     for (j = 0; j < m && j < i; ++j)
>>       a[i][j] = ...
>>
>> thus iterating over the lower/upper triangular part of a non-square matrix
>> (including or not including the diagonal, thus also j < m && j <= i).
>
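
The triangular iteration described above can be sketched in C as follows. Both functions count inner-loop iterations; the second folds the two bounds into a single minimum computed once per outer iteration. The function names and the include_diag flag are invented for this illustration. The `j <= i` variant is rewritten as `j < i + 1`, which is safe here because `i < n` guarantees that `i + 1` cannot overflow.

```c
#include <assert.h>

/* Count iterations with the combined condition tested on every step. */
long count_with_and(int n, int m, int include_diag)
{
  long count = 0;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m && (include_diag ? j <= i : j < i); ++j)
      ++count;
  return count;
}

/* Same iteration space with the bound folded into one minimum. */
long count_with_min(int n, int m, int include_diag)
{
  long count = 0;
  for (int i = 0; i < n; ++i)
    {
      int row = include_diag ? i + 1 : i;   /* j <= i  ==>  j < i + 1 */
      int bound = m < row ? m : row;        /* min (m, row) */
      for (int j = 0; j < bound; ++j)
        ++count;
    }
  return count;
}

/* Both variants must visit the same number of elements. */
void check_triangular(void)
{
  for (int d = 0; d <= 1; ++d)
    assert(count_with_and(6, 4, d) == count_with_min(6, 4, d));
}
```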
> Ok thanks - fair enough .
>
>
> Ramana
>
>>
>> Richard.
>>
>>> regards,
>>> Ramana
>>>
>>>
>>>
>>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>>> index ce5eb20..a529536 100644
>>> --- a/gcc/tree-ssa-loop-im.c
>>> +++ b/gcc/tree-ssa-loop-im.c
>>> @@ -563,6 +563,7 @@ stmt_cost (gimple stmt)
>>>
>>>   switch (gimple_assign_rhs_code (stmt))
>>>     {
>>> +    case MIN_EXPR:
>>>     case MULT_EXPR:
>>>     case WIDEN_MULT_EXPR:
>>>     case WIDEN_MULT_PLUS_EXPR:
>>> @@ -971,6 +972,124 @@ rewrite_reciprocal (gimple_stmt_iterator *bsi)
>>>   return stmt1;
>>>  }
>>>
>>> +/* We look for a sequence that is :
>>> +   def_stmt1  : x = a < b
>>> +   def_stmt2  : y = a < c
>>> +   stmt: z = x & y
>>> +   use_stmt_cond: if ( z != 0)
>>> +
>>> +   where b, c are loop invariant .
>>> +
>>> +   In which case we might as well replace this by :
>>> +
>>> +   t = min (b, c)
>>> +   if ( a < t )
>>> +*/
>>> +
>>> +static gimple
>>> +rewrite_min_test (gimple_stmt_iterator *bsi)
>>> +{
>>> +  gimple stmt, def_stmt_x, def_stmt_y, use_stmt_cond, stmt1;
>>> +  tree x, y, z, a, b, c, var, t, name;
>>> +  use_operand_p use;
>>> +  bool is_lhs_of_comparison = false;
>>> +
>>> +

Porting new target architecture to GCC

2012-05-02 Thread Ben Morgan

Hello,

In a course at my university (Universität Würzburg, Germany) we have
created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL)
which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far
if we want to do anything with it, we have to write the assembly code
ourselves.

How much work would it be to write a HadesXI backend for GCC?
(The idea is to use this as a possible bachelor thesis.)

Where would be a good place to start; what are the prerequisites for
undertaking a project like this other than knowing the CPU architecture
inside out?

Thanks for your advice,
Ben Morgan


Re: Porting new target architecture to GCC

2012-05-02 Thread Basile Starynkevitch
On Wed, May 02, 2012 at 01:30:19PM +0200, Ben Morgan wrote:

> In a course at my university (Universität Würzburg, Germany) we have
> created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL)
> which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far
> if we want to do anything with it, we have to write the assembly code
> ourselves.
> 
> How much work would it be to write a HadesXI backend for GCC?
> (The idea is to use this as a possible bachelor thesis.)

I am not familiar with back-ends (I'm more familiar with the middle-end),
and I am not very familiar with the German university system.

I'm guessing that what you call a "bachelor thesis" is what is called 
today "License" in the French university system.

My feeling is that understanding GCC and writing a small backend
is a lot of work for a student. (For a GCC expert, it is rumored that
making a suboptimal backend for a new architecture is several months of work).

So I believe that making a new backend for GCC is quite an
ambitious goal (perhaps too ambitious for a bachelor thesis, if your goal
is mostly to make something usable, not only to learn a lot of things).
If you follow that route, you should first find out which of the many existing
GCC back-ends targets an architecture similar to your HaDesXI.

Notice that GCC even has back-ends for "fictitious" architectures like Knuth's
MMIX.

To get a picture of GCC, you might be interested to have a glance at some 
slides 
under http://gcc-melt.org/ notably http://gcc-melt.org/GCC-MELT-HiPEAC2012.pdf 
which has many links to other material. Of course, you can find a lot
of other material about GCC on the Internet.

(you might even want to play with GCC MELT to understand some 
of the basic internal representations of GCC)

On the other hand, GCC offers you a very powerful back-end architecture. 
But GCC is complex, and significantly evolving!

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: No documentation of -fsched-pressure-algorithm

2012-05-02 Thread Jonathan Wakely
On 2 May 2012 10:37, David Brown wrote:
> A second thing that would be hugely convenient for advanced users and
> testers (and people like me who just like to read manuals) would be a
> version number attached to each option, so that we can see which gcc
> versions support it.  Some of us use multiple gcc versions (I do embedded
> work - I have gcc for different targets with versions ranging from 2.95 to
> 4.6) - it would be /very/ nice to be able to look at the latest version of
> the documentation rather than always having to go back to old versions to
> figure out if a particular option exists in that particular version.

I'm sure you know the drill ... patches welcome.


Re: Porting new target architecture to GCC

2012-05-02 Thread Alexander Monakov

On Wed, 2 May 2012, Ben Morgan wrote:

> Hello,
> 
> In a course at my university (Universität Würzburg, Germany) we have
> created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL)
> which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far
> if we want to do anything with it, we have to write the assembly code
> ourselves.
> 
> How much work would it be to write a HadesXI backend for GCC?

I remember "6 months and more of full-time work for a skilled developer"
mentioned on this mailing list.

> Where would be a good place to start; what are the prerequisites for
> undertaking a project like this other than knowing the CPU architecture
> inside out?

I recommend reading "The GGX patch archive" blog entries to get a "big
picture" of the steps involved.  It was available at spindazzle.org/ggx,
but at the moment you'll have to browse it via The Internet Archive
( http://web.archive.org/web/20100117171845/http://spindazzle.org/ggx/ ).
Apart from that, the GCC wiki has accumulated many resources, especially
in the GettingStarted section ( http://gcc.gnu.org/wiki/GettingStarted ).

Alexander

[ANN] ODB C++ ORM 2.0.0 released

2012-05-02 Thread Boris Kolpackov
I am pleased to announce the release of ODB 2.0.0.

ODB is an open source object-relational mapping (ORM) system for C++. It
allows you to persist C++ objects to a relational database without having
to deal with tables, columns, or SQL and without manually writing any of
the mapping code.

ODB is implemented as a GCC plugin and this release adds support for GCC
4.7 series in addition to GCC 4.6 and 4.5.

Other major new features in this release:

  * Support for C++11 which adds integration with the new C++11 standard
library components, including smart pointers and containers. Now you
can use std::unique_ptr and std::shared_ptr as object pointers (their
lazy versions are also provided). For containers, support was added
for std::array, std::forward_list, and the unordered containers.

  * Support for polymorphism which allows you to persist, load, update,
erase, and query objects of derived classes using their base class
interfaces. Persistent class hierarchies are mapped to the relational
database model using the table-per-difference mapping.

  * Support for composite object ids which are translated to composite
primary keys in the relational database.

  * Support for the NULL semantics for composite values.

A more detailed discussion of these features can be found in the
following blog post:

http://www.codesynthesis.com/~boris/blog/2012/05/02/odb-2-0-0-released/

For the complete list of new features in this version see the official
release announcement:

http://www.codesynthesis.com/pipermail/odb-announcements/2012/13.html

ODB is written in portable C++ and you should be able to use it with any
modern C++ compiler. In particular, we have tested this release on GNU/Linux
(x86/x86-64), Windows (x86/x86-64), Mac OS X, and Solaris (x86/x86-64/SPARC)
with GNU g++ 4.2.x-4.7.x, MS Visual C++ 2008 and 2010, Sun Studio 12, and
Clang 3.0.

The currently supported database systems are MySQL, SQLite, PostgreSQL,
Oracle, and SQL Server. ODB also provides profiles for Boost and Qt, which
allow you to seamlessly use value types, containers, and smart pointers
from these libraries in your persistent classes.

More information, documentation, source code, and pre-compiled binaries are
available from:

http://www.codesynthesis.com/products/odb/

Enjoy,
Boris



Paradoxical subreg reload issue

2012-05-02 Thread Aurelien Buhrig
Hi,

I have an issue (gcc 4.6.3, private backend) when reloading operands of
this insn:
(set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
 (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))

The register 21 is reloaded into
(reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.

Note that word_mode is SImode, whereas the class r0 belongs to is
HI-wide. I don't know if this matters when reloading.

I have no idea how to debug this, if it is a backend or a reload bug.
Any idea?

Thank you in advance,
Aurélien


Re: making sizeof(void*) different from sizeof(void(*)())

2012-05-02 Thread Paulo J. Matos

On 30/04/12 13:01, Peter Bigot wrote:

I would like to see the technical details, if your code is released somewhere.



Hi Peter,

Sorry for the delay.
The code is not released, however I can send you a patch against GCC 
4.6.3 sources (our GCC 4.7.0 port is not yet stable) of our changes and 
will also try to explain how it works.



Without having started it yet, I'm thinking this can be done by
modifying build_pointer_type to generalize the
TARGET_ADDR_SPACE_POINTER_MODE to TARGET_TYPE_POINTER_MODE, pass it
the whole type instead of just the address space field, and moving
TARGET_ADDR_SPACE_POINTER_MODE support to the default implementation
for that hook. Likewise for build_reference_type.  Then judicious
application of attributes to types and decls would allow detection of
the situation where a non-standard pointer size is needed.  I'm hoping
there aren't too many other places where that work would get undone.



As you will see, I haven't used anything related to address spaces 
feature in GCC.




Sounds like a useful set of changes to have in the main sources, since
this is hardly a singular need!


Yes.  Is there an existing bug/enhancement report for this capability?



Don't think so but I would be happy to contribute with whatever I can.


--
PMatos
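
For context, ISO C does not require object pointers and function pointers to share a size or representation, so a port on which they differ is conforming. The sketch below shows how a program can observe the two sizes. The struct and function names are invented for the example, and on most mainstream targets both values will simply be equal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two sizes under discussion. */
struct pointer_sizes
{
  size_t object_ptr;    /* sizeof (void *) */
  size_t function_ptr;  /* sizeof (void (*) (void)) */
};

struct pointer_sizes query_pointer_sizes(void)
{
  struct pointer_sizes s;
  s.object_ptr = sizeof(void *);
  s.function_ptr = sizeof(void (*)(void));
  return s;
}

/* Portable code may rely on both being non-zero, but not on equality. */
void check_pointer_sizes(void)
{
  struct pointer_sizes s = query_pointer_sizes();
  assert(s.object_ptr >= 1 && s.function_ptr >= 1);
}
```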



Re: No documentation of -fsched-pressure-algorithm

2012-05-02 Thread nick clifton

Hi Richard,


Well, given the replies from you, Ian and Vlad (when reviewing the patch),
I feel once again in a minority of one here :-) but... I just don't
think we should be advertising this sort of stuff to users.


OK, what about Ian's suggestion of controlling the algorithm selection 
via a --param instead of a -f option ?




Not because
I'm trying to be cliquey, but because any time the user ends up having
to use stuff like this represents a failure on the part of the compiler.


A nice idea in principle, but in practice GCC already has a ton of these 
specialist options.  Maybe you feel that we should not be adding another 
one to this list, but I think that we are already too far gone.  GCC and 
its long list of command line options is an established norm.


Perhaps now is the time to consider embracing projects like Acovea and 
Milepost and making them an official, easier-to-use meta front end to gcc ?




I mean, at what level would we document it?


Well I rather like David's suggestion - a split gcc invocation manual 
with options like -fsched-pressure-algorithm only appearing in the 
here-be-dragons section.




  "Here, we happen to have two algorithms to do this.
   Treat them as black boxes, try each one on each
   source file, and see what works out best."

Or:

"Here, we have two algorithms to do this. You can
 treat them as black boxes, try both and see which
 works best for your application.  Or you can delve
 into their intricacies to see which ought to be the
 better one for your target. See this post  for a description of the algorithms.
 Either way we would be interested in hearing about
 which algorithm works best for you, what your application
 looks like and which architecture you are using.
 Please contact us at "


Cheers
  Nick


Re: Porting new target architecture to GCC

2012-05-02 Thread Georg-Johann Lay
Ben Morgan wrote:

> In a course at my university (Universität Würzburg, Germany) we have created
> a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) which we then
> play onto a FPGA (the Xilinx Spartan-3AN) to use. So far if we want to do
> anything with it, we have to write the assembly code ourselves.

You have already ported binutils and gdb if I understand correctly?

> How much work would it be to write a HadesXI backend for GCC? (The idea is
> to use this as a possible bachelor thesis.)

It's not the idea of your Betreuer, I hope. If so, it's unfair to propose
this as a bachelor thesis.  Besides that the pure implementation will
take several months for an experienced GCC developer (others already commented
on this), you will have to author and write corresponding paperwork.

Porting GCC is "only filling in hooks", yes, but the internals linked below
are often misleading and hard to read for newcomers, likewise intuition from
programming experience is often misleading and wrong.

Without an experienced GCC developer / backend guy as tutor I'd strongly
discourage picking this topic, and even with an experienced tutor it's
a *very* ambitious project, and bugs and shortcomings of the implementation and
the resulting gcc executables are likely to diminish your grading in an unfair
way.

> Where would be a good place to start; what are the prerequisites for 
> undertaking a project like this other than knowing the CPU architecture 
> inside out?

One basis is a reasonable assembler like GNU as. If the tools after GCC are not
"mighty" enough, e.g. if you cannot express things by means of respective
relocations or expression modifiers and such as needed, the assembler is not
much help.

And such a port will be hard without a debugger and a simulator.
Many things are easier with a simulator than on silicon.

For a start with GCC, it's the internals, see

http://gcc.gnu.org/onlinedocs/gccint/

and in particular chapters

10 RTL Representation
16 Machine Descriptions
17 Target Description Macros and Functions
19.1 Target Makefile Fragments

Besides that it's reading existing backends.
Avoid overly complicated ones like x86 and rs6000.

s390 is nicely documented and it can be helpful to
consult backends even if the hardware is not similar
to your hardware.

The more orthogonal the instruction set is, the easier will be the backend.
Similar for register set and addressing modes.

> 
> Thanks for your advice, Ben Morgan
> 

Johann


Re: Porting new target architecture to GCC

2012-05-02 Thread Ian Lance Taylor
Ben Morgan  writes:

> In a course at my university (Universität Würzburg, Germany) we have
> created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL)
> which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far
> if we want to do anything with it, we have to write the assembly code
> ourselves.
>
> How much work would it be to write a HadesXI backend for GCC?
> (The idea is to use this as a possible bachelor thesis.)
>
> Where would be a good place to start; what are the prerequisites for
> undertaking a project like this other than knowing the CPU architecture
> inside out?

The difficulty depends entirely on the characteristics of the CPU and
the extent to which you want GCC to take advantage of any unusual
features.

I've seen other messages commenting on the length of time and the
difficulties of the internal docs, but I think they are exaggerating the
problems.  Porting a new CPU is the best documented part of GCC
internals.  My rule of thumb for an experienced toolchain programmer to
add a complete GNU toolchain port--compiler, assembler, linker,
debugger--is three months.  The compiler alone is about half that.

Other than knowing the CPU, the prerequisite is the ability to read and
understand the GCC internal docs, the willingness to look at other GCC
ports for similar processors, and the willingness to write code.

It's worth looking at Anthony Green's blog about implementing moxie at
http://moxielogic.org/ , as he described the process of doing a full GCC
port.

I don't know what a bachelor thesis is, so I don't know if this would be
suitable.  A GCC port by itself would be too simple for a masters thesis
in the U.S.

Ian


Re: Paradoxical subreg reload issue

2012-05-02 Thread Ian Lance Taylor
Aurelien Buhrig  writes:

> I have an issue (gcc 4.6.3, private backend) when reloading operands of
> this insn:
> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>  (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>
> The register 21 is reloaded into
> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>
> Note that word_mode is SImode, whereas the class r0 belongs to is
> HI-wide. I don't know if this matters when reloading.
>
> I have no idea how to debug this, if it is a backend or a reload bug.
> Any idea?

Where did that insn come from?  It looks like it really wants to be

(set (reg:QI 21)
 (truncate:QI (lshiftrt:SI (reg:SI 24) (const_int 31))))

Ian


Re: Porting new target architecture to GCC

2012-05-02 Thread Aurelien Buhrig

> Ben Morgan wrote:
> 
>> In a course at my university (Universität Würzburg, Germany) we have created
>> a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) which we then
>> play onto a FPGA (the Xilinx Spartan-3AN) to use. So far if we want to do
>> anything with it, we have to write the assembly code ourselves.
> 
> You have already ported binutils and gdb if I understand correctly?

And don't forget an ISS (gdb sim, sid, ...) or a testsuite/board
interface if you want to run the GCC execution testsuite...


>> How much work would it be to write a HadesXI backend for GCC? (The idea is
>> to use this as a possible bachelor thesis.)
> 
> It's not the idea of your Betreuer, I hope. If so, it's unfair to propose
> this as a bachelor thesis.  Besides that the pure implementation will
> take several months for an experienced GCC developer (others already commented
> on this), you will have to author and write corresponding paperwork.
> 
> Porting GCC is "only filling in hooks", yes, but the internals linked below
> are often misleading and hard to read for newcomers, likewise intuition from
> programming experience is often misleading and wrong.
> 
> Without an experienced GCC developer / backend guy as tutor I'd strongly
> discourage picking this topic, and even with an experienced tutor it's
> a *very* ambitious project, and bugs and shortcomings of the implementation and
> the resulting gcc executables are likely to diminish your grading in an unfair
> way.
> 

I do agree with Johann.
As an example, we proposed, years ago, a 6-month engineering school
internship to develop a gcc backend "as much as possible", only
focused on GCC (sid/gdb/as/ld/... was already done). The student had no
GCC backend skills before beginning. I think he coped with it very well,
but the result was not stable at all, the testsuite was not set up, and
the work had to be continued for weeks/months. And I'm not even talking
about optimizations...
So be careful not to underestimate the amount of work.

Aurelien



Re: Paradoxical subreg reload issue

2012-05-02 Thread Aurelien Buhrig
Le 02/05/2012 16:41, Ian Lance Taylor a écrit :
> Aurelien Buhrig  writes:
> 
>> I have an issue (gcc 4.6.3, private backend) when reloading operands of
>> this insn:
>> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>>  (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>>
>> The register 21 is reloaded into
>> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
>> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>>
>> Note that word_mode is SImode, whereas the class r0 belongs to is
>> HI-wide. I don't know if this matters when reloading.
>>
>> I have no idea how to debug this, if it is a backend or a reload bug.
>> Any idea?
> 
> Where did that insn come from?  It looks like it really wants to be
> 
> (set (reg:QI 21)
>  (truncate:QI (lshiftrt:SI (reg:SI 24) (const_int 31))))
> 

It comes from the combine pass, which merged the following insns:

(insn 20 19 21 5
(set (reg:SI 27)
  (lshiftrt:SI (reg/v:SI 24 [w])
(const_int 31 [0x1f]))) {*lshrsi3_split}
 (nil))

(insn 21 20 22 5
(set (reg:QI 21 [ iftmp.1 ])
 (subreg:QI (reg:SI 27) 3)) {movqi}
 (expr_list:REG_DEAD (reg:SI 27)

--
Here is the combiner output:

Trying 20 -> 21:
Successfully matched this instruction:
(set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
(lshiftrt:SI (reg/v:SI 24 [ w ])
(const_int 31 [0x1f])))
deferring deletion of insn with uid = 20.
modifying insn i321 r21:QI#0=r24:SI 0>>0x1f
deferring rescan insn with uid = 21.
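
To make the byte numbering in these dumps concrete: on a big-endian target the subreg byte offset counts from the most significant end, which is why the low-order QImode part of the SImode register appears as (subreg:QI (reg:SI 27) 3) in insn 21. The C sketch below mimics that with an explicit big-endian byte array and compares it with the truncate form suggested in the quoted reply. The function names are invented, and this models only the numbering and the arithmetic, not what combine or reload actually do.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 32-bit value most-significant byte first, as a
   big-endian target would store it. */
void store_be32(uint32_t value, uint8_t out[4])
{
  out[0] = (uint8_t)(value >> 24);
  out[1] = (uint8_t)(value >> 16);
  out[2] = (uint8_t)(value >> 8);
  out[3] = (uint8_t)value;
}

/* Model of insns 20 and 21: logical right shift by 31 in SImode,
   then the low-order byte taken at big-endian byte offset 3. */
uint8_t shift_then_lowpart(uint32_t w)
{
  uint8_t bytes[4];
  store_be32(w >> 31, bytes);
  return bytes[3];
}

/* Model of the truncate:QI form: shift, then narrow to 8 bits. */
uint8_t shift_then_truncate(uint32_t w)
{
  return (uint8_t)(w >> 31);
}

/* Both models should yield the same QImode value. */
void check_lowpart(void)
{
  assert(shift_then_lowpart(0x80000000u) == shift_then_truncate(0x80000000u));
  assert(shift_then_lowpart(0x7fffffffu) == shift_then_truncate(0x7fffffffu));
}
```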


Thanks,
Aurélien



Re: Porting new target architecture to GCC

2012-05-02 Thread Alexander Monakov


On Wed, 2 May 2012, Ian Lance Taylor wrote:

> It's worth looking at Anthony Green's blog about implementing moxie at
> http://moxielogic.org/ , as he described the process of doing a full GCC
> port.

Let me clarify that Anthony described porting in his "GGX patch archives",
linked in my other response in this thread; when the port was functional,
the architecture was renamed to 'moxie' and a new blog was started.  The new
http://moxielogic.org/blog does not contain all those posts about porting.

Alexander


Re: No documentation of -fsched-pressure-algorithm

2012-05-02 Thread Richard Earnshaw
On 02/05/12 14:13, nick clifton wrote:
> Hi Richard,
> 
>> Well, given the replies from you, Ian and Vlad (when reviewing the patch),
>> I feel once again in a minority of one here :-) but... I just don't
>> think we should be advertising this sort of stuff to users.
> 
> OK, what about Ian's suggestion of controlling the algorithm selection 
> via a --param instead of a -f option ?
> 
> 
>> Not because
>> I'm trying to be cliquey, but because any time the user ends up having
>> to use stuff like this represents a failure on the part of the compiler.
> 
> A nice idea in principle, but in practice GCC already has a ton of these 
> specialist options.  Maybe you feel that we should not be adding another 
> one to this list, but I think that we are already too far gone.  GCC and 
> its long list of command line options is an established norm.
> 
> Perhaps now is the time to consider embracing projects like Acovea and 
> Milepost and making them an official, easier-to-use meta front end to gcc ?
> 
> 
>> I mean, at what level would we document it?
> 
> Well I rather like David's suggestion - a split gcc invocation manual 
> with options like -fsched-pressure-algorithm only appearing in the 
> here-be-dragons section.
> 

I think we should document the option, stress that it is a new feature
and say that it has only been enabled on targets where benchmarking has
shown it to be an overall benefit.  Finally we should solicit feedback
from the community as to whether it makes code better or worse.

R.



Re: Using movw/movt rather than minipools in ARM gcc

2012-05-02 Thread Ramana Radhakrishnan
On Fri, Apr 27, 2012 at 9:24 PM, David Sehr  wrote:
> Hello All,
>
> We are using gcc trunk as of 4/27/12, and are attempting to add
> support to the ARM gcc compiler for Native Client.
> We are trying to get gcc -march=armv7-a to use movw/movt consistently
> instead of minipools. The motivation is for
> a new target variant where armv7-a is the minimum supported and
> non-code in .text is never allowed (per Native Client rules).
> But the current behavior looks like a generically poor optimization
> for -march=armv7-a.  (Surely memory loads are slower
> than movw/movt, and no space is saved in many cases.)  For further
> details, this seems to only happen with -O2 or higher.
> -O1 generates movw/movt, seemingly because cprop is folding away a
> LO_SUM/HIGH pair.  Another data point to note
> is that "Ubuntu/Linaro 4.5.2-8ubuntu3" does produce movw/movt for this
> test case, but we haven't tried stock 4.5.

I remember this one: it is
https://bugs.launchpad.net/gcc-linaro/+bug/886124 and I reached the
same conclusion as you did :) Unfortunately I've not been able to work
out why the change occurred or what triggered it. Would you be
able to experiment with some of the suggestions in that report and
maybe create an equivalent one in the GCC bugzilla?

I haven't had the time to investigate this particular problem further.

regards,
Ramana




>
> I have enabled TARGET_USE_MOVT, which should force a large fraction of
> constant materialization to use movw/movt
> rather than pc-relative loads.  However, I am still seeing pc-relative
> loads for the following example case and am looking
> for help from the experts here.
>
> int a[1000], b[1000], c[1000];
>
> void foo(int n) {
>   int i;
>   for (i = 0; i < n; ++i) {
>     a[i] = b[i] + c[i];
>   }
> }
>
> When I compile this I get:
>
> foo:
>         ...
>         ldr     r3, .L7
>         ldr     r1, .L7+4
>         ldr     r2, .L7+8
>         ...
> .L7:
>         .word   b
>         .word   c
>         .word   a
>         .size   foo, .-foo
>         .comm   c,4000,4
>         .comm   b,4000,4
>         .comm   a,4000,4
>
> From some investigation, it seems I need to add a define_split to
> convert SYMBOL_REFs to LO_SUM/HIGH pairs.
> There is already a function called arm_split_constant that seems to do
> this, but no rule seems to be firing to cause
> it to get invoked.  Before I dive into writing the define_split, am I
> missing something obvious?
>
> Cheers,
>
> David
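
For readers following the thread: a movw/movt pair builds a 32-bit value in a register from two 16-bit immediates, so no literal pool entry in .text is needed. The C sketch below only models that arithmetic; the function names are hypothetical and it is not code from GCC or from the ARM back end. The two halves correspond to what the GNU assembler's #:lower16: and #:upper16: operators select for a symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of a value: the immediate a movw would carry. */
uint16_t lower16(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* High 16 bits of a value: the immediate a movt would carry. */
uint16_t upper16(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Model of the destination register after
       movw rd, #lo
       movt rd, #hi
   Both immediates must fit in 16 bits. */
uint32_t movw_movt(uint32_t lo, uint32_t hi)
{
    assert(lo <= 0xFFFFu && hi <= 0xFFFFu);
    uint32_t rd = lo;                   /* movw: zero-extend imm16 into rd */
    rd = (rd & 0xFFFFu) | (hi << 16);   /* movt: replace only the top half */
    return rd;
}
```

Splitting an address with lower16/upper16 and feeding the halves to movw_movt gives back the original 32-bit value, which is the property the requested LO_SUM/HIGH split relies on.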


Re: making sizeof(void*) different from sizeof(void(*)())

2012-05-02 Thread Peter Bigot
On Wed, May 2, 2012 at 8:08 AM, Paulo J. Matos  wrote:
> On 30/04/12 13:01, Peter Bigot wrote:
>>
>> I would like to see the technical details, if your code is released
>> somewhere.
>>
>
> Hi Peter,
>
> Sorry for the delay.
> The code is not released, however I can send you a patch against GCC 4.6.3
> sources (our GCC 4.7.0 port is not yet stable) of our changes and will also
> try to explain how it works.

Thanks; I'd appreciate it.

>> Without having started it yet, I'm thinking this can be done by
>> modifying build_pointer_type to generalize the
>> TARGET_ADDR_SPACE_POINTER_MODE to TARGET_TYPE_POINTER_MODE, pass it
>> the whole type instead of just the address space field, and moving
>> TARGET_ADDR_SPACE_POINTER_MODE support to the default implementation
>> for that hook. Likewise for build_reference_type.  Then judicious
>> application of attributes to types and decls would allow detection of
>> the situation where a non-standard pointer size is needed.  I'm hoping
>> there aren't too many other places where that work would get undone.
>>

I've had pretty good success with the above approach, involving the
following changes:

* Eliminate some gratuitous passing of function expressions through
  memory_address(), which insists on treating everything as though it was in
  ADDR_SPACE_GENERIC and therefore forces a conversion to Pmode; also fix
  one use of Pmode which probably should have been FUNCTION_MODE back when
  it was added by rms in 1992.

* Provide new TARGET_TYPE_* hooks paralleling TARGET_ADDR_SPACE_* so that
  the appropriate pointer and address modes can examine the whole type tree,
  rather than assuming the address space is sufficient.  This provides
  access to attributes that influence the selection of appropriate mode,
  which I need for both data and function types.

* Cache the desired pointer_mode and address_mode values in struct
  mem_attrs instead of assuming addrspace is sufficient to recalculate them.
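
The second change in the list above can be pictured with a small standalone model. This is only a sketch under stated assumptions: the enum, the struct, the "far" flag and all function names are hypothetical stand-ins, not GCC's tree or hook definitions, and the mode widths are illustrative. It shows a whole-type hook that inspects the type first and falls back to the address-space-only answer by default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer modes; real ports use machine_mode values. */
enum ptr_mode { MODE_16, MODE_20, MODE_32 };

/* Hypothetical summary of a pointee type. */
struct type_info {
    int  addr_space;   /* all that an address-space-only hook can see */
    bool is_function;  /* pointee is a function type */
    bool is_far;       /* set by a hypothetical "far" attribute */
};

/* Address-space-only selection, in the spirit of
   TARGET_ADDR_SPACE_POINTER_MODE. */
enum ptr_mode addr_space_pointer_mode(int addr_space)
{
    return addr_space == 0 ? MODE_16 : MODE_32;
}

/* Default whole-type hook: defer to the address-space answer. */
enum ptr_mode default_type_pointer_mode(const struct type_info *type)
{
    assert(type != NULL);
    return addr_space_pointer_mode(type->addr_space);
}

/* A target override: function types and "far" data get a wider pointer,
   everything else uses the default. */
enum ptr_mode target_type_pointer_mode(const struct type_info *type)
{
    assert(type != NULL);
    if (type->is_function || type->is_far)
        return MODE_20;
    return default_type_pointer_mode(type);
}
```

Because the hook receives the whole type, a function pointer and a data pointer in the same address space can end up with different modes, which an address-space-only hook cannot express.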

All in all, not too painful.  These'll be in the mspgcc git repository
for gcc at 
http://mspgcc.git.sourceforge.net/git/gitweb.cgi?p=mspgcc/gcc;a=summary
in a couple weeks when I do another release.  Dunno whether it's worth
considering them for trunk sometime.

> As you will see, I haven't used anything related to address spaces feature
> in GCC.

Yeah, the fact that address spaces are ignored for function types, and
apparently aren't available in C++, makes them useless for my needs
even though the support infrastructure is very similar to what I
wanted.

Peter


Re: Paradoxical subreg reload issue

2012-05-02 Thread Eric Botcazou
> I have an issue (gcc 4.6.3, private backend) when reloading operands of
> this insn:
> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>  (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>
> The register 21 is reloaded into
> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>
> Note that word_mode is SImode, whereas the class r0 belongs to is
> HI-wide. I don't know if this matters when reloading.
>
> I have no idea how to debug this, if it is a backend or a reload bug.

RA/reload is known to have issues with word-mode paradoxical subregs on 
big-endian machines.  For example, on SPARC 64-bit, we run into similar 
problems for FP regs, which are 32-bit.  Likewise on HP-PA 64-bit I think.

So we have kludges in the back-end:

/* Defines invalid mode changes.  Borrowed from the PA port.

   SImode loads to floating-point registers are not zero-extended.
   The definition for LOAD_EXTEND_OP specifies that integer loads
   narrower than BITS_PER_WORD will be zero-extended.  As a result,
   we inhibit changes from SImode unless they are to a mode that is
   identical in size.

   Likewise for SFmode, since word-mode paradoxical subregs are
   problematic on big-endian architectures.  */

#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)   \
  (TARGET_ARCH64                                    \
   && GET_MODE_SIZE (FROM) == 4                     \
   && GET_MODE_SIZE (TO) != 4                       \
   ? reg_classes_intersect_p (CLASS, FP_REGS) : 0)
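
One way to see where the -1 in the quoted report may come from is the simplified model below. It is an assumption about the mechanism, not a copy of GCC's subreg code, and the function names are hypothetical: on a big-endian target the low-order part of a multi-register value sits in the highest-numbered register, so widening a value that lives in one register makes the wider value start at a lower register number.

```c
#include <assert.h>

/* Number of hard registers of reg_size bytes needed to hold mode_size bytes. */
int hard_regs_needed(int mode_size, int reg_size)
{
    assert(mode_size > 0 && reg_size > 0);
    return (mode_size + reg_size - 1) / reg_size;
}

/* First hard register of a paradoxical subreg (outer mode wider than inner)
   in this simplified model.  All sizes are in bytes. */
int paradoxical_subreg_first_regno(int inner_regno, int inner_size,
                                   int outer_size, int reg_size,
                                   int big_endian)
{
    assert(outer_size > inner_size);   /* paradoxical by definition */
    int extra = hard_regs_needed(outer_size, reg_size)
                - hard_regs_needed(inner_size, reg_size);
    /* Big-endian: the inner value is the last register of the group,
       so the group starts "extra" registers earlier. */
    return big_endian ? inner_regno - extra : inner_regno;
}
```

With the numbers from the report (QImode value in r0, SImode outer mode, 2-byte hard registers, big-endian), the call paradoxical_subreg_first_regno(0, 1, 4, 2, 1) yields -1; the same call with big_endian set to 0 yields 0.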

-- 
Eric Botcazou