Re: No documentation of -fsched-pressure-algorithm
On 01/05/2012 20:11, Joern Rennecke wrote:
> Quoting Richard Sandiford:
>> nick clifton writes:
>>> OK, but what if it turns out that the new algorithm improves the performance of some benchmarks/applications, but degrades others, within the same architecture?  If that turns out to be the case (and I suspect that it will) then having a documented command line option to select the algorithm makes more sense.
>>
>> That was my point though.  If that's the situation, we need to find out why.  We shouldn't hand the user a long list of options and tell them to figure out which ones happen to produce the best code.
>
> Actually, having a long list of things that you can tweak is exactly what Milepost thrives on.
>
>> Well, given the replies from you, Ian and Vlad (when reviewing the patch), I feel once again in a minority of one here :-) but... I just don't think we should be advertising this sort of stuff to users.  Not because I'm trying to be cliquey, but because any time the user ends up having to use stuff like this represents a failure on the part of the compiler.
>
> We have misfiring heuristics all over the place.  This has actually gotten a lot worse with the shift from rtl to tree optimizers.  These days the optimizers know how to do almost any transformation, but squat about the cost/benefit equation.  But I don't see how hiding the switches that force the heuristics makes this situation any better.
>
>> I mean, at what level would we document it?  We could give a detailed description of the two algorithms, but there should never be any need to explain those to users (or for the users to have to read about them).  And there's no guarantee we won't change the algorithms between releases.  So I suspect we'd just have documentation along the lines of "here, we happen to have two algorithms to do this.  Treat them as black boxes, try each one on each source file, and see what works out best."  Which isn't particularly insightful and not IMO a good user interface.
>
> It's not ideal, but workable.  If you could explain coherently when the option should be used, you could probably improve the heuristics already.  If you want to make this a bit more meaningful, you could have a bugzilla bug for the imperfect heuristics, and ask people to submit their testcases when they see significant benefit from using an obscure option.

One thing that might be nice is to split the documentation pages a little, such as into "normal user options", "advanced user options", and "here be dragons".  gcc has a /lot/ of options, many of which are of little use to most users, and it can be overwhelming to see so many on the same page of the manual.

If you made a "here be dragons" page, it would make it much easier to have only rough information there.  The page would start with a disclaimer that these options are for expert usage and testing, that they can change at any time in different versions of gcc, and that users should not expect support from suppliers (CodeSourcery, Red Hat, etc.) on their usage.  Then you could add limited documentation for options like "-fsched-pressure-algorithm" or the various "--param" options, with just a rough explanation.  It doesn't really matter if the explanation is incomprehensible to mortal users - it may even just be a link to a gcc wiki page or part of the gcc internals documentation.

A second thing that would be hugely convenient for advanced users and testers (and people like me who just like to read manuals) would be a version number attached to each option, so that we can see which gcc versions support it.  Some of us use multiple gcc versions (I do embedded work - I have gcc for different targets with versions ranging from 2.95 to 4.6) - it would be /very/ nice to be able to look at the latest version of the documentation rather than always having to go back to old versions to figure out if a particular option exists in that particular version.

mvh.,
David
Re: [RFC] Converting end of loop computations to MIN_EXPRs.
On Tue, May 1, 2012 at 8:36 AM, Ramana Radhakrishnan wrote:
> Sorry about the delayed response, I've been away for some time.
>
>> I don't exactly understand why the general transform is not advisable.  We already synthesize min/max operations.
>>
>> Can you elaborate on why you think that better code might be generated when not doing this transform?
>
> The reason why I wasn't happy was because of the code we ended up generating in this case for ARM - comparing the simple examples showed the following difference.  While I'm pretty sure I can massage the backend to generate the right form in this case with splitters, I probably didn't realize this when I wrote the mail.  Given this, I wonder if it is worth in general doing this transformation in a fold type operation rather than restricting ourselves only to invariant operands?

Yes, I think doing this generally would be beneficial.  Possible places to hook this up are tree-ssa-forwprop.c if you have

  tem1 = i < x;
  tem2 = i < y;
  tem3 = tem1 && tem2;
  if (tem3)

or tree-ssa-ifcombine.c if you instead see

  if (i < x)
    if (i < y)
      ...

Richard.

> The canonical example is as below:
>
> #define min(x, y) ((x) < (y)) ? (x) : (y)
> int foo (int i, int x, int y)
> {
>   // return (i < x) && (i < y);
>   return i < (min (x, y));
> }
>
> Case with min_expr:
>
>         cmp     r2, r1        @ 8  *arm_smin_insn/1      [length = 8]
>         movge   r2, r1
>         cmp     r2, r0        @ 23 *arm_cmpsi_insn/3     [length = 4]
>         movle   r0, #0        @ 24 *p *arm_movsi_insn/2  [length = 4]
>         movgt   r0, #1        @ 25 *p *arm_movsi_insn/2  [length = 4]
>         bx      lr            @ 28 *arm_return           [length = 12]
>
> This might well be:
>
>         cmp     r2, r0
>         cmpge   r1, r0
>         movle   r0, #0
>         movgt   r0, #1
>         bx      lr
>
> Case without min_expr:
>
>         cmp     r0, r2        @ 28 *cmp_and/6            [length = 8]
>         cmplt   r0, r1
>         movge   r0, #0        @ 29 *mov_scc              [length = 8]
>         movlt   r0, #1
>         bx      lr            @ 32 *arm_return           [length = 12]
>
>>> #define min(x,y) ((x) <= (y) ? (x) : (y))
>>>
>>> void foo (int x, int y, int *a, int *b, int *c)
>>> {
>>>   int i;
>>>
>>>   for (i = 0;
>>>        i < x && i < y;
>>>        /* i < min (x, y); */
>>>        i++)
>>>     a[i] = b[i] * c[i];
>>> }
>>>
>>> The patch below deals with this case and I'm guessing that it could also handle more of the comparison cases and come up with more intelligent choices and should be made quite a lot more robust than what it is right now.
>>
>> Yes.  At least if you have i < 5 && i < y we canonicalize it to i <= 4 && i < y, so your pattern matching would fail.
>
> Of course, considering overflow semantics you could transform this to i < min (x + 1, y) where the original condition was i <= x && i < y.
>
> Thinking about it, it's probably right to state that
>
>   i op1 X && i op2 Y  =>  i op min (X1, Y1)
>
> when op1 and op2 are identical or according to the table below:
>
>   op1   op2   op    X1      Y1
>   <     <=    <=    X + 1   Y
>   >     >=    >     X       Y + 1
>   <=    <     <=    X       Y + 1
>   >=    >     >     X + 1   Y
>
> Other than being careful about overflow semantics, the second table is probably worthwhile looking at.
>
>> Btw, the canonical case this happens in is probably
>>
>>   for (i = 0; i < n; ++i)
>>     for (j = 0; j < m && j < i; ++j)
>>       a[i][j] = ...
>>
>> thus iterating over the lower/upper triangular part of a non-square matrix (including or not including the diagonal, thus also j < m && j <= i)
>
> Ok thanks - fair enough.
>
> Ramana
>
>> Richard.
>>> regards,
>>> Ramana
>>>
>>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>>> index ce5eb20..a529536 100644
>>> --- a/gcc/tree-ssa-loop-im.c
>>> +++ b/gcc/tree-ssa-loop-im.c
>>> @@ -563,6 +563,7 @@ stmt_cost (gimple stmt)
>>>
>>>    switch (gimple_assign_rhs_code (stmt))
>>>      {
>>> +    case MIN_EXPR:
>>>      case MULT_EXPR:
>>>      case WIDEN_MULT_EXPR:
>>>      case WIDEN_MULT_PLUS_EXPR:
>>> @@ -971,6 +972,124 @@ rewrite_reciprocal (gimple_stmt_iterator *bsi)
>>>    return stmt1;
>>>  }
>>>
>>> +/* We look for a sequence that is:
>>> +     def_stmt1:      x = a < b
>>> +     def_stmt2:      y = a < c
>>> +     stmt:           z = x & y
>>> +     use_stmt_cond:  if (z != 0)
>>> +
>>> +   where b, c are loop invariant.
>>> +
>>> +   In which case we might as well replace this by:
>>> +
>>> +     t = min (b, c)
>>> +     if (a < t)
>>> +*/
>>> +
>>> +static gimple
>>> +rewrite_min_test (gimple_stmt_iterator *bsi)
>>> +{
>>> +  gimple stmt, def_stmt_x, def_stmt_y, use_stmt_cond, stmt1;
>>> +  tree x, y, z, a, b, c, var, t, name;
>>> +  use_operand_p use;
>>> +  bool is_lhs_of_comparison = false;
>>> +
>>> +
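The arithmetic behind the transforms discussed in this thread is easy to sanity-check. Below is a minimal, self-contained C program (an illustration only, not part of Ramana's patch and not GCC code) that brute-forces the two equivalences over a small range, chosen so that x + 1 cannot overflow:

/* Brute-force check of the bound-combining transforms discussed above.
   The ranges are kept small so that x + 1 never overflows.  */
#include <assert.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int
main (void)
{
  int i, x, y;

  for (i = -8; i <= 8; i++)
    for (x = -8; x <= 8; x++)
      for (y = -8; y <= 8; y++)
        {
          /* i < x && i < y   <=>   i < min (x, y)  */
          assert ((i < x && i < y) == (i < MIN (x, y)));
          /* i <= x && i < y  <=>   i < min (x + 1, y)  */
          assert ((i <= x && i < y) == (i < MIN (x + 1, y)));
        }

  puts ("equivalences hold on the tested range");
  return 0;
}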
Porting new target architecture to GCC
Hello,

In a course at my university (Universität Würzburg, Germany) we have created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far if we want to do anything with it, we have to write the assembly code ourselves.

How much work would it be to write a HadesXI backend for GCC? (The idea is to use this as a possible bachelor thesis.)

Where would be a good place to start; what are the prerequisites for undertaking a project like this other than knowing the CPU architecture inside out?

Thanks for your advice,
Ben Morgan
Re: Porting new target architecture to GCC
On Wed, May 02, 2012 at 01:30:19PM +0200, Ben Morgan wrote:
> In a course at my university (Universität Würzburg, Germany) we have created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far if we want to do anything with it, we have to write the assembly code ourselves.
>
> How much work would it be to write a HadesXI backend for GCC?
> (The idea is to use this as a possible bachelor thesis.)

I am not familiar with back-ends (I'm more familiar with the middle-end), and I am not very familiar with the German university system. I'm guessing that what you call a "bachelor thesis" is what is today called a "Licence" in the French university system.

My feeling is that understanding GCC and writing a small backend is a great deal of work for a student. (For a GCC expert, it is rumored that making a suboptimal backend for a new architecture takes several months of work.) So I would perhaps believe that making a new backend for GCC is a quite ambitious goal (perhaps too ambitious for a bachelor thesis, if your goal is mostly to make something usable, not only to learn a great deal of things).

If you follow that route, you should first find out which of the many existing GCC back-ends targets an architecture similar to your HaDesXI. Notice that GCC even has back-ends for "fictitious" architectures like Knuth's MMIX.

To get a picture of GCC, you might be interested in having a glance at some slides under http://gcc-melt.org/ notably http://gcc-melt.org/GCC-MELT-HiPEAC2012.pdf which has many links to other material. Of course, you can find a lot of other material about GCC on the Internet. (You might even want to play with GCC MELT to understand some of the basic internal representations of GCC.)

On the other hand, GCC offers you a very powerful back-end architecture. But GCC is complex, and significantly evolving!

Regards.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Re: No documentation of -fsched-pressure-algorithm
On 2 May 2012 10:37, David Brown wrote: > A second thing that would be hugely convenient for advanced users and > testers (and people like me who just like to read manuals) would be a > version number attached to each option, so that we can see which gcc > versions support it. Some of us use multiple gcc versions (I do embedded > work - I have gcc for different targets with versions ranging from 2.95 to > 4.6) - it would be /very/ nice to be able to look at the latest version of > the documentation rather than always having to go back to old versions to > figure out if a particular option exists in that particular version. I'm sure you know the drill ... patches welcome.
Re: Porting new target architecture to GCC
On Wed, 2 May 2012, Ben Morgan wrote: > Hello, > > In a course at my university (Universität Würzburg, Germany) we have > created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) > which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far > if we want to do anything with it, we have to write the assembly code > ourselves. > > How much work would it be to write a HadesXI backend for GCC? I remember "6 months and more of full-time work for a skilled developer" mentioned on this mailing list. > Where would be a good place to start; what are the prerequisites for > undertaking a project like this other than knowing the CPU architecture > inside out? I recommend reading "The GGX patch archive" blog entries to get a "big picture" of the steps involved. It was available at spindazzle.org/ggx, but at the moment you'll have to browse it via The Internet Archive ( http://web.archive.org/web/20100117171845/http://spindazzle.org/ggx/ ). Apart from that, the GCC wiki has accumulated many resources, especially in the GettingStarted section ( http://gcc.gnu.org/wiki/GettingStarted ). Alexander
[ANN] ODB C++ ORM 2.0.0 released
I am pleased to announce the release of ODB 2.0.0.

ODB is an open source object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL and without manually writing any of the mapping code.

ODB is implemented as a GCC plugin and this release adds support for the GCC 4.7 series in addition to GCC 4.6 and 4.5. Other major new features in this release:

* Support for C++11, which adds integration with the new C++11 standard library components, including smart pointers and containers. Now you can use std::unique_ptr and std::shared_ptr as object pointers (their lazy versions are also provided). For containers, support was added for std::array, std::forward_list, and the unordered containers.

* Support for polymorphism, which allows you to persist, load, update, erase, and query objects of derived classes using their base class interfaces. Persistent class hierarchies are mapped to the relational database model using the table-per-difference mapping.

* Support for composite object ids, which are translated to composite primary keys in the relational database.

* Support for NULL semantics for composite values.

A more detailed discussion of these features can be found in the following blog post:

http://www.codesynthesis.com/~boris/blog/2012/05/02/odb-2-0-0-released/

For the complete list of new features in this version see the official release announcement:

http://www.codesynthesis.com/pipermail/odb-announcements/2012/13.html

ODB is written in portable C++ and you should be able to use it with any modern C++ compiler. In particular, we have tested this release on GNU/Linux (x86/x86-64), Windows (x86/x86-64), Mac OS X, and Solaris (x86/x86-64/SPARC) with GNU g++ 4.2.x-4.7.x, MS Visual C++ 2008 and 2010, Sun Studio 12, and Clang 3.0.

The currently supported database systems are MySQL, SQLite, PostgreSQL, Oracle, and SQL Server. ODB also provides profiles for Boost and Qt, which allow you to seamlessly use value types, containers, and smart pointers from these libraries in your persistent classes.

More information, documentation, source code, and pre-compiled binaries are available from:

http://www.codesynthesis.com/products/odb/

Enjoy,
Boris
Paradoxical subreg reload issue
Hi,

I have an issue (gcc 4.6.3, private backend) when reloading operands of this insn:

  (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
       (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))

The register 21 is reloaded into (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register. Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.

Note that word_mode is SImode, whereas the class r0 belongs to is HI-wide. I don't know if this matters when reloading.

I have no idea how to debug this, or whether it is a backend or a reload bug. Any idea?

Thank you in advance,
Aurélien
Re: making sizeof(void*) different from sizeof(void(*)())
On 30/04/12 13:01, Peter Bigot wrote:
> I would like to see the technical details, if your code is released somewhere.

Hi Peter,

Sorry for the delay. The code is not released, however I can send you a patch of our changes against the GCC 4.6.3 sources (our GCC 4.7.0 port is not yet stable) and will also try to explain how it works.

> Without having started it yet, I'm thinking this can be done by modifying build_pointer_type to generalize the TARGET_ADDR_SPACE_POINTER_MODE to TARGET_TYPE_POINTER_MODE, pass it the whole type instead of just the address space field, and moving TARGET_ADDR_SPACE_POINTER_MODE support to the default implementation for that hook. Likewise for build_reference_type. Then judicious application of attributes to types and decls would allow detection of the situation where a non-standard pointer size is needed. I'm hoping there aren't too many other places where that work would get undone.

As you will see, I haven't used anything related to the address spaces feature in GCC.

> Sounds like a useful set of changes to have in the main sources, since this is hardly a singular need!

Yes.

> Is there an existing bug/enhancement report for this capability?

Don't think so, but I would be happy to contribute with whatever I can.

-- 
PMatos
Re: No documentation of -fsched-pressure-algorithm
Hi Richard,

> Well, given the replies from you, Ian and Vlad (when reviewing the patch), I feel once again in a minority of one here :-) but... I just don't think we should be advertising this sort of stuff to users.

OK, what about Ian's suggestion of controlling the algorithm selection via a --param instead of a -f option?

> Not because I'm trying to be cliquey, but because any time the user ends up having to use stuff like this represents a failure on the part of the compiler.

A nice idea in principle, but in practice GCC already has a ton of these specialist options. Maybe you feel that we should not be adding another one to this list, but I think that we are already too far gone. GCC and its long list of command line options is an established norm.

Perhaps now is the time to consider embracing projects like Acovea and Milepost and making them an official, easier-to-use meta front end to gcc?

> I mean, at what level would we document it?

Well, I rather like David's suggestion - a split gcc invocation manual with options like -fsched-pressure-algorithm only appearing in the here-be-dragons section.

> "Here, we happen to have two algorithms to do this. Treat them as black boxes, try each one on each source file, and see what works out best."

Or: "Here, we have two algorithms to do this. You can treat them as black boxes, try both and see which works best for your application. Or you can delve into their intricacies to see which ought to be the better one for your target. See this post for a description of the algorithms. Either way we would be interested in hearing about which algorithm works best for you, what your application looks like and which architecture you are using. Please contact us at "

Cheers
  Nick
Re: Porting new target architecture to GCC
Ben Morgan wrote:

> In a course at my university (Universität Würzburg, Germany) we have created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far if we want to do anything with it, we have to write the assembly code ourselves.

You have already ported binutils and gdb, if I understand correctly?

> How much work would it be to write a HadesXI backend for GCC? (The idea is to use this as a possible bachelor thesis.)

It's not the idea of your Betreuer (supervisor), I hope. If so, it's unfair to propose this as a bachelor thesis. Besides the fact that the pure implementation will take several months for an experienced GCC developer (others have already commented on this), you will also have to write the corresponding paperwork.

Porting GCC is "only filling in hooks", yes, but the internals linked below are often misleading and hard to read for newcomers; likewise, intuition from programming experience is often misleading and wrong.

Without an experienced GCC developer / backend guy as tutor, I'd strongly discourage picking this topic, and even with an experienced tutor it's a *very* ambitious project, and bugs and shortcomings of the implementation and the resulting gcc executables are likely to diminish your grading in an unfair way.

> Where would be a good place to start; what are the prerequisites for undertaking a project like this other than knowing the CPU architecture inside out?

One basis is a reasonable assembler like GNU as. If the tools after GCC are not "mighty" enough, e.g. if you cannot express things by means of the respective relocations or expression modifiers and such as needed, the assembler is not much help. And such a port will be hard without a debugger and a simulator. Many things are easier with a simulator than on silicon.

For a start with GCC, it's the internals, see http://gcc.gnu.org/onlinedocs/gccint/ and in particular chapters

  10   RTL Representation
  16   Machine Descriptions
  17   Target Description Macros and Functions
  19.1 Target Makefile Fragments

Besides that, it's reading existing backends. Avoid overly complicated ones like x86 and rs6000. s390 is nicely documented, and it can be helpful to consult backends even if the hardware is not similar to your hardware. The more orthogonal the instruction set is, the easier the backend will be. The same goes for the register set and addressing modes.

> Thanks for your advice, Ben Morgan

Johann
Re: Porting new target architecture to GCC
Ben Morgan writes: > In a course at my university (Universität Würzburg, Germany) we have > created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) > which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far > if we want to do anything with it, we have to write the assembly code > ourselves. > > How much work would it be to write a HadesXI backend for GCC? > (The idea is to use this as a possible bachelor thesis.) > > Where would be a good place to start; what are the prerequisites for > undertaking a project like this other than knowing the CPU architecture > inside out? The difficulty depends entirely on the characteristics of the CPU and the extent to which you want GCC to take advantage of any unusual features. I've seen other messages commenting on the length of time and the difficulties of the internal docs, but I think they are exaggerating the problems. Porting a new CPU is the best documented part of GCC internals. My rule of thumb for an experienced toolchain programmer to add a complete GNU toolchain port--compiler, assembler, linker, debugger--is three months. The compiler alone is about half that. Other than knowing the CPU, the prerequisite is the ability to read and understand the GCC internal docs, the willingness to look at other GCC ports for similar processors, and the willingness to write code. It's worth looking at Anthony Green's blog about implementing moxie at http://moxielogic.org/ , as he described the process of doing a full GCC port. I don't know what a bachelor thesis is, so I don't know if this would be suitable. A GCC port by itself would be too simple for a masters thesis in the U.S. Ian
Re: Paradoxical subreg reload issue
Aurelien Buhrig writes:

> I have an issue (gcc 4.6.3, private backend) when reloading operands of this insn:
>
>   (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>        (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>
> The register 21 is reloaded into (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register. Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>
> Note that word_mode is SImode, whereas the class r0 belongs to is HI-wide. I don't know if this matters when reloading.
>
> I have no idea how to debug this, if it is a backend or a reload bug. Any idea?

Where did that insn come from?  It looks like it really wants to be

  (set (reg:QI 21)
       (truncate:QI (lshiftrt:SI (reg:SI 24) (const_int 31))))

Ian
Re: Porting new target architecture to GCC
> Ben Morgan wrote:
>
>> In a course at my university (Universität Würzburg, Germany) we have created a 32-bit RISC CPU architecture -- the HaDesXI-CPU -- (in VHDL) which we then play onto a FPGA (the Xilinx Spartan-3AN) to use. So far if we want to do anything with it, we have to write the assembly code ourselves.
>
> You have already ported binutils and gdb, if I understand correctly?

And don't forget an ISS (gdb sim, sid, ...) or a testsuite/board interface if you want to run the GCC execution testsuite...

>> How much work would it be to write a HadesXI backend for GCC? (The idea is to use this as a possible bachelor thesis.)
>
> It's not the idea of your Betreuer (supervisor), I hope. If so, it's unfair to propose this as a bachelor thesis. Besides the fact that the pure implementation will take several months for an experienced GCC developer (others have already commented on this), you will also have to write the corresponding paperwork.
>
> Porting GCC is "only filling in hooks", yes, but the internals linked below are often misleading and hard to read for newcomers; likewise, intuition from programming experience is often misleading and wrong.
>
> Without an experienced GCC developer / backend guy as tutor, I'd strongly discourage picking this topic, and even with an experienced tutor it's a *very* ambitious project, and bugs and shortcomings of the implementation and the resulting gcc executables are likely to diminish your grading in an unfair way.

I do agree with Johann. As an example, we proposed, years ago, a 6-month engineering school internship to develop a gcc backend "as much as possible", focused only on GCC (sid/gdb/as/ld/... was already done). The student had no GCC backend skills before beginning. I think he coped with it very well, but the result was not stable at all, the testsuite was not set up, and the work had to be continued for weeks/months. And I'm not even talking about optimizations... So be careful not to underestimate the amount of work.

Aurelien
Re: Paradoxical subreg reload issue
On 02/05/2012 16:41, Ian Lance Taylor wrote:
> Aurelien Buhrig writes:
>
>> I have an issue (gcc 4.6.3, private backend) when reloading operands of this insn:
>>
>>   (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>>        (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>>
>> The register 21 is reloaded into (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register. Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>>
>> Note that word_mode is SImode, whereas the class r0 belongs to is HI-wide. I don't know if this matters when reloading.
>>
>> I have no idea how to debug this, if it is a backend or a reload bug. Any idea?
>
> Where did that insn come from?  It looks like it really wants to be
>
>   (set (reg:QI 21)
>        (truncate:QI (lshiftrt:SI (reg:SI 24) (const_int 31))))

It comes from the combine pass, which merged the following insns:

(insn 20 19 21 5 (set (reg:SI 27)
        (lshiftrt:SI (reg/v:SI 24 [w])
            (const_int 31 [0x1f]))) {*lshrsi3_split}
     (nil))

(insn 21 20 22 5 (set (reg:QI 21 [ iftmp.1 ])
        (subreg:QI (reg:SI 27) 3)) {movqi}
     (expr_list:REG_DEAD (reg:SI 27) --

Here is the combiner output:

Trying 20 -> 21:
Successfully matched this instruction:
(set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
    (lshiftrt:SI (reg/v:SI 24 [ w ])
        (const_int 31 [0x1f])))
deferring deletion of insn with uid = 20.
modifying insn i3    21 r21:QI#0=r24:SI 0>>0x1f
deferring rescan insn with uid = 21.

Thanks,
Aurélien
Re: Porting new target architecture to GCC
On Wed, 2 May 2012, Ian Lance Taylor wrote: > It's worth looking at Anthony Green's blog about implementing moxie at > http://moxielogic.org/ , as he described the process of doing a full GCC > port. Let me clarify that Anthony described porting in his "GGX patch archives", linked in my other response in this thread; when the port was functional, the architecture was renamed to 'moxie' and a new blog was started. The new http://moxielogic.org/blog does not contain all those posts about porting. Alexander
Re: No documentation of -fsched-pressure-algorithm
On 02/05/12 14:13, nick clifton wrote: > Hi Richard, > >> Well, given the replies from you, Ian and Vlad (when reviewing the patch), >> I feel once again in a minority of one here :-) but... I just don't >> think we should be advertising this sort of stuff to users. > > OK, what about Ian's suggestion of controlling the algorithm selection > via a --param instead of a -f option ? > > >> Not because >> I'm trying to be cliquey, but because any time the user ends up having >> to use stuff like this represents a failure on the part of the compiler. > > A nice idea in principle, but in practice GCC already has a ton of these > specialist options. Maybe you feel that we should not be adding another > one to this list, but I think that we are already too far gone. GCC and > its long list of command line options is an established norm. > > Perhaps now is the time to consider embracing projects like Acovea and > Milepost and making them an official, easier-to-use meta front end to gcc ? > > >> I mean, at what level would we document it? > > Well I rather like David's suggestion - a split gcc invocation manual > with options like -fsched-pressure-algorithm only appearing in the > here-be-dragons section. > I think we should document the option, stress that it is a new feature and say that it has only been enabled on targets where benchmarking has shown it to be an overall benefit. Finally we should solicit feedback from the community as to whether it makes code better or worse. R.
Re: Using movw/movt rather than minipools in ARM gcc
On Fri, Apr 27, 2012 at 9:24 PM, David Sehr wrote:
> Hello All,
>
> We are using gcc trunk as of 4/27/12, and are attempting to add support to the ARM gcc compiler for Native Client. We are trying to get gcc -march=armv7-a to use movw/movt consistently instead of minipools. The motivation is a new target variant where armv7-a is the minimum supported and non-code in .text is never allowed (per Native Client rules). But the current behavior looks like a generically poor optimization for -march=armv7-a. (Surely memory loads are slower than movw/movt, and no space is saved in many cases.) For further details, this seems to only happen with -O2 or higher. -O1 generates movw/movt, seemingly because cprop is folding away a LO_SUM/HIGH pair. Another data point to note is that "Ubuntu/Linaro 4.5.2-8ubuntu3" does produce movw/movt for this test case, but we haven't tried stock 4.5.

I remember this one - this is https://bugs.launchpad.net/gcc-linaro/+bug/886124 and I reached the same conclusion as you did :) Unfortunately I've not been able to work out why such a change occurred and what triggered it. Would you be able to experiment with some of the suggestions in that report and maybe create an equivalent one in the GCC bugzilla? I haven't had the time to investigate this particular problem further.

regards,
Ramana

> I have enabled TARGET_USE_MOVT, which should force a large fraction of constant materialization to use movw/movt rather than pc-relative loads. However, I am still seeing pc-relative loads for the following example case and am looking for help from the experts here.
>
> int a[1000], b[1000], c[1000];
>
> void foo(int n) {
>   int i;
>   for (i = 0; i < n; ++i) {
>     a[i] = b[i] + c[i];
>   }
> }
>
> When I compile this I get:
>
> foo:
>         ...
>         ldr     r3, .L7
>         ldr     r1, .L7+4
>         ldr     r2, .L7+8
>         ...
> .L7:
>         .word   b
>         .word   c
>         .word   a
>         .size   foo, .-foo
>         .comm   c,4000,4
>         .comm   b,4000,4
>         .comm   a,4000,4
>
> From some investigation, it seems I need to add a define_split to convert SYMBOL_REFs to LO_SUM/HIGH pairs. There is already a function called arm_split_constant that seems to do this, but no rule seems to be firing to cause it to get invoked. Before I dive into writing the define_split, am I missing something obvious?
>
> Cheers,
>
> David
Re: making sizeof(void*) different from sizeof(void(*)())
On Wed, May 2, 2012 at 8:08 AM, Paulo J. Matos wrote:
> On 30/04/12 13:01, Peter Bigot wrote:
>> I would like to see the technical details, if your code is released somewhere.
>
> Hi Peter,
>
> Sorry for the delay. The code is not released, however I can send you a patch of our changes against the GCC 4.6.3 sources (our GCC 4.7.0 port is not yet stable) and will also try to explain how it works.

Thanks; I'd appreciate it.

>> Without having started it yet, I'm thinking this can be done by modifying build_pointer_type to generalize the TARGET_ADDR_SPACE_POINTER_MODE to TARGET_TYPE_POINTER_MODE, pass it the whole type instead of just the address space field, and moving TARGET_ADDR_SPACE_POINTER_MODE support to the default implementation for that hook. Likewise for build_reference_type. Then judicious application of attributes to types and decls would allow detection of the situation where a non-standard pointer size is needed. I'm hoping there aren't too many other places where that work would get undone.

I've had pretty good success with the above approach, involving the following changes:

* Eliminate some gratuitous passing of function expressions through memory_address(), which insists on treating everything as though it was in ADDR_SPACE_GENERIC and therefore forces a conversion to Pmode; also fix one use of Pmode which probably should have been FUNCTION_MODE back when it was added by rms in 1992.

* Provide new TARGET_TYPE_* hooks paralleling TARGET_ADDR_SPACE_* so that the appropriate pointer and address modes can examine the whole type tree, rather than assuming the address space is sufficient. This provides access to attributes that influence the selection of the appropriate mode, which I need for both data and function types.

* Cache the desired pointer_mode and address_mode values in struct mem_attrs instead of assuming addrspace is sufficient to recalculate them.

All in all, not too painful. These'll be in the mspgcc git repository for gcc at http://mspgcc.git.sourceforge.net/git/gitweb.cgi?p=mspgcc/gcc;a=summary in a couple weeks when I do another release. Dunno whether it's worth considering them for trunk sometime.

> As you will see, I haven't used anything related to the address spaces feature in GCC.

Yeah, the fact that address spaces are ignored for function types, and apparently aren't available in C++, makes them useless for my needs even though the support infrastructure is very similar to what I wanted.

Peter
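For readers unfamiliar with the hook being generalized in this thread, here is a minimal sketch of how a backend currently selects a pointer mode per named address space via the existing TARGET_ADDR_SPACE_POINTER_MODE hook (GCC 4.6-era interface). The function name, the ADDR_SPACE_FAR id and the chosen modes are made up for illustration; this is not code from mspgcc or from Peter's patches.

/* Sketch only: the existing per-address-space pointer-mode hook that the
   proposed TARGET_TYPE_* hooks would generalize to whole types.  The
   address space ADDR_SPACE_FAR, the modes and the function name are
   hypothetical.  This would live in a backend's .c file, next to its
   other target hooks and before the targetm initializer.  */

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "target.h"
#include "target-def.h"

#define ADDR_SPACE_FAR 1   /* made-up non-generic address space */

static enum machine_mode
example_addr_space_pointer_mode (addr_space_t addrspace)
{
  /* 32-bit pointers only in the hypothetical "far" space, the target's
     normal 16-bit pointers everywhere else.  */
  return addrspace == ADDR_SPACE_FAR ? SImode : HImode;
}

#undef TARGET_ADDR_SPACE_POINTER_MODE
#define TARGET_ADDR_SPACE_POINTER_MODE example_addr_space_pointer_mode

The point of contention in the thread is that this hook only sees the address-space id, whereas selecting a pointer mode from attributes on the pointed-to type (including function types) needs the whole type tree.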
Re: Paradoxical subreg reload issue
> I have an issue (gcc 4.6.3, private backend) when reloading operands of this insn:
>
>   (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>        (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>
> The register 21 is reloaded into (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register. Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>
> Note that word_mode is SImode, whereas the class r0 belongs to is HI-wide. I don't know if this matters when reloading.
>
> I have no idea how to debug this, if it is a backend or a reload bug.

RA/reload is known to have issues with word-mode paradoxical subregs on big-endian machines. For example, on SPARC 64-bit, we run into similar problems for FP regs, which are 32-bit. Likewise on HP-PA 64-bit, I think. So we have kludges in the back-end:

/* Defines invalid mode changes.  Borrowed from the PA port.

   SImode loads to floating-point registers are not zero-extended.
   The definition for LOAD_EXTEND_OP specifies that integer loads
   narrower than BITS_PER_WORD will be zero-extended.  As a result,
   we inhibit changes from SImode unless they are to a mode that is
   identical in size.  Likewise for SFmode, since word-mode paradoxical
   subregs are problematic on big-endian architectures.  */

#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)   \
  (TARGET_ARCH64                                    \
   && GET_MODE_SIZE (FROM) == 4                     \
   && GET_MODE_SIZE (TO) != 4                       \
   ? reg_classes_intersect_p (CLASS, FP_REGS) : 0)

-- 
Eric Botcazou
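In the same spirit, and purely as an untested sketch rather than a known fix, an analogous kludge for a target whose r0 class is only HImode-wide while word_mode is SImode might refuse mode changes into anything wider than that class can hold. R0_REGS below is a placeholder for whatever the backend actually calls the HI-wide class containing r0.

/* Hypothetical adaptation of the SPARC/PA kludge above; untested sketch.
   Refusing to change a narrow pseudo to a mode wider than 2 bytes for
   the HI-wide class keeps the allocator from needing a word-mode
   paradoxical subreg of a QImode pseudo in r0, which is where
   subreg_regno comes out as -1 on a big-endian target.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)   \
  (GET_MODE_SIZE (FROM) < GET_MODE_SIZE (TO)        \
   && GET_MODE_SIZE (TO) > 2                        \
   ? reg_classes_intersect_p (CLASS, R0_REGS) : 0)

Whether this actually avoids the reload problem depends on the backend; the cleaner route, as Ian suggests, is to make combine produce the truncate form in the first place.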