Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks again. Shiva 2015-04-18 0:03 GMT+08:00 Jeff Law : > On 04/17/2015 03:57 AM, Shiva Chen wrote: >> >> Hi, >> >> I think the rtl dump in >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916 >> is not jump2 phase rtl dump. >> >> Because jump2 is after ira, the register number should be hardware >> register number. >> >> the jump2 rtl dump should as follow >> >> ... >> 31: NOTE_INSN_BASIC_BLOCK 5 >> 32: [r6:SI]=r4:SI >>REG_DEAD r6:SI >>REG_DEAD r4:SI >> 33: [r5:SI]=r0:SI >>REG_DEAD r5:SI >>REG_DEAD r0:SI >> 7: r0:SI=0 >>REG_EQUAL 0 >> 85: use r0:SI >> 86: >> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} >>REG_UNUSED pc:SI >>REG_UNUSED r3:SI >>REG_CFA_RESTORE r7:SI >>REG_CFA_RESTORE r6:SI >>REG_CFA_RESTORE r5:SI >>REG_CFA_RESTORE r4:SI >>REG_CFA_RESTORE r3:SI >> 77: barrier >> 46: L46: >> 45: NOTE_INSN_BASIC_BLOCK 6 >> 8: r0:SI=r4:SI >>REG_DEAD r4:SI >>REG_EQUAL 0x >> 87: use r0:SI >> 88: >> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} >>REG_UNUSED pc:SI >>REG_UNUSED r3:SI >>REG_CFA_RESTORE r7:SI >>REG_CFA_RESTORE r6:SI >>REG_CFA_RESTORE r5:SI >>REG_CFA_RESTORE r4:SI >>REG_CFA_RESTORE r3:SI >> 79: barrier >> 54: L54: >> 53: NOTE_INSN_BASIC_BLOCK 7 >> 9: r0:SI=0x <== lost REG_EQUAL after patch >> 34: L34: >> 35: NOTE_INSN_BASIC_BLOCK 8 >> 41: use r0:SI >> 90: >> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} >>REG_UNUSED pc:SI >>REG_UNUSED r3:SI >>REG_CFA_RESTORE r7:SI >>REG_CFA_RESTORE r6:SI >>REG_CFA_RESTORE r5:SI >>REG_CFA_RESTORE r4:SI >>REG_CFA_RESTORE r3:SI >> 89: barrier > > Intead of the slim dump, can you please include the full RTL dump. I find > those much easier to read. > > > >> >> Possible patch for can_replace_by in cfgcleanup.c. >> >> - if (!note1 || !note2 || !rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0)) >> - || !CONST_INT_P (XEXP (note1, 0))) >> + >> + if (!note1 || !CONST_INT_P (XEXP (note1, 0))) >> return dir_none; >> >> + if (note2) >> +{ >> + if (!rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0))) >> + return dir_none; >> +} >> + else >> +{ >> + if (!CONST_INT_P (SET_SRC (s2)) >> + || !rtx_equal_p (XEXP (note1, 0), SET_SRC (s2))) >> + return dir_none; >> +} >> + >> >> I'm not sure the idea is ok or it might crash something. >> Any suggestion would be very helpful. > > Seems like you're on a reasonable path to me. I suggest you stick with it. > > Basically what it appears you're trying to do is unify insns from different > blocks where one looks like > > (set x y) with an attached REG_EQUAL note > > And the other looks like > > (set x const_int) > > Where the REG_EQUAL note has the same value as the const_int in the second > set. > > I think you'd want to handle both cases i1 has the note i2, no note and i1 > has no note and i2 has a note. > > Jeff > > jeff can_replace_by.patch Description: Binary data pr43920-2.c.244r.jump2.ori Description: Binary data pr43920-2.c.244r.jump2.patch_can_replace_by Description: Binary data Changelog.can_replace_by Description: Binary data
Re: NIOS and atomic primitives
On 19 April 2015 at 03:06, Sandra Loosemore wrote: > Is it one of the standard g++ or libstdc++ test cases that is failing? I > could check what nios2-elf does with it if I knew what to look for. Libstdc++ does not use the __sync built-ins in the 4.9 branch, it has switched to the __atomic ones which should be provided by libatomic, if that exists for the target.
RE: try_merge_delay_insn with delay list > 1
I've attached the fixed version of the patch. I've tested it on the trunk with my private target. I can't provide a test because apparently no backend (other than my private one) uses delay slots with more that 1 slot. I was also unable to test the behaviour of this patch for an hypothetic target providing delay lots with more that 1 slot AND the possibility to annul instruction in delay slots. It seems to me that this patch is a small enhancement anyway. I hope it's ok for trunk :) Regards, Selim -Message d'origine- De : Jeff Law [mailto:l...@redhat.com] Envoyé : vendredi 17 avril 2015 18:41 À : BELBACHIR Selim; gcc@gcc.gnu.org Objet : Re: try_merge_delay_insn with delay list > 1 On 03/10/2015 07:40 AM, BELBACHIR Selim wrote: > Me again :) > > I enhanced my patch because it was not generalized for instructions with N > delay_slots. Mostly OK, though there are some formatting nits that need to be corrected. We have whitespace around arithmetic, logical and comparison operators to separate them from their operands. So instead of slot_number+j Use slot_number + j Instead of j=1 Use j = 1 Lines should be wrapped at 80 columns. So you end up with something like this foo (argument1, argument2, argument3) ie, when you wrap, the arguments to the call will line up vertically. It may help wrapping to create a local variable to hold PATTERN (insn). Call it 'pat' :-) When referring to variables or parameters in a comment, capitalize them. The patch may need updating for the trunk, please test it with the trunk when you ask for it to be included on the trunk. These are all fairly minor issues. The actual change seems reasonable. What systems do you have that you could do a bootstrap and regression test with? Ideally since you're changing the delay slot branching code it'd be a system with delay slots :-) If that's not possible because you don't have access to a bootstrapping system with delay slots, make sure to mention it. Ideally you'd have a test for this bug. However, with a private port I wouldn't consider it a necessity. However, you may want to go ahead and create one for internal uses. And if you ever submit your port to the offficial sources, you can include target specific tests at that time. try_merge_patch3 Description: try_merge_patch3
How to get anchors to onlinedocs that can be used in external documents?
How to add an anchor to one of the onlinedocs texi documents? Suppose I'd like to add an HTML anchor to one of the onlinedocs, for example to link it from the gcc release notes. Currently the only anchors are either auto-generated and contain some hashes (hence not usable from external documents) or are trivial like: x86-Windows-Options.html#x86-Windows-Options Submodel-Options.html#Submodel-Options x86-transactional-memory-intrinsics.html#x86-transactional-memory-intrinsics ... and are always pointing to the top of the respective HTML page. How can I get an anchor to, say, a subsection without fragmenting the respective HTML document into dozens of individual documents and thereby changing the document layout, which is not what I want. Examples as listed in the GCC onlindocs "Table of Contents": - 3.17.1.1 -march and -mcpu Feature Modifiers - 3.17.5.1 EIND and Devices with More Than 128 Ki Bytes of Flash - 6.16.4 SPU Named Address Spaces - 6.43.2.8 x86 Floating-Point asm Operands All these entries in the TOC don't link to the indicated place but to the respective document's header, e.g. "x86 Floating-Point asm Operands" does not forward to that section but to "Extended-Asm.html#Extended-Asm" which contains 9 subsections. Johann
Do we have any plans to un-flatten our header files?
Hi, because I really dislike the hassle our (almost) flattened header files cause quite often, I have made a very simple experiment to find out how the header files really depend on each other. Some results, together with a dozen of short paragraphs of relevant text are here: http://labs.suse.cz/mjambor/150417-headers_in_gcc/headers_in_gcc.html Short summary: I wrote a script removing an include at a time and parsing errors and recording such discovered "dependencies." The page has text-file reports showing what needs to be included before what, in which files and because of which error messages. I have also plotted a few graphs from these "dependencies." The pictures look interesting, I am not sure if they are in any way useful, but have a look and judge for yourself. Regardless of whether or not this experiment had any value. Do we plan to do un-flattening of header files in the course of the next release or two? The pain the flat header files is real and enduring it for much longer or indefinitely seems unhealthy to me. Thanks, Martin
Re: Do we have any plans to un-flatten our header files?
On Mon, Apr 20, 2015 at 3:02 PM, Martin Jambor wrote: > Hi, > > because I really dislike the hassle our (almost) flattened header > files cause quite often, I have made a very simple experiment to find > out how the header files really depend on each other. Some results, > together with a dozen of short paragraphs of relevant text are here: > > http://labs.suse.cz/mjambor/150417-headers_in_gcc/headers_in_gcc.html > > Short summary: I wrote a script removing an include at a time and > parsing errors and recording such discovered "dependencies." The page > has text-file reports showing what needs to be included before what, > in which files and because of which error messages. I have also > plotted a few graphs from these "dependencies." The pictures look > interesting, I am not sure if they are in any way useful, but have > a look and judge for yourself. > > Regardless of whether or not this experiment had any value. Do we > plan to do un-flattening of header files in the course of the next > release or two? The pain the flat header files is real and enduring > it for much longer or indefinitely seems unhealthy to me. The goal was to end up with a few aggregating includes so you just include sth like ssa-pass.h from a tree pass that works on SSA form. Or gimple.h for anything that only needs to look at GIMPLE stmts. The first step was the flattening to see what dependencies we have (ok, it's not really good at that...) and to move stuff (mostly prototypes) between files to reduce dependencies. It's easy to create a ssa-pass.h file, but it's hard to determine the minimum stuff that should be in there (no rtl.h for example). For Andrews project (that tree stuff) the most important thing is to not have tree(-core?).h leak into files that only use GIMPLE (and thus the "valid" parts of tree). Richard. > Thanks, > > Martin
Re: NIOS and atomic primitives
On 4/20/2015 5:04 AM, Jonathan Wakely wrote: > On 19 April 2015 at 03:06, Sandra Loosemore wrote: >> Is it one of the standard g++ or libstdc++ test cases that is failing? I >> could check what nios2-elf does with it if I knew what to look for. > Libstdc++ does not use the __sync built-ins in the 4.9 branch, it has > switched to the __atomic ones which should be provided by libatomic, > if that exists for the target. This is with 4.9.2. This is the only RTEMS target giving this error so I must be missing the magic bit of configurery for it. /users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49: undefined reference to `__sync_fetch_and_add_4' /users/joel/rtems-4.11-work/tools/lib/gcc/nios2-rtems4.11/4.9.2/libstdc++.a(locale-inst.o):/users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49: more undefined references to `__sync_fetch_and_add_4' follow Where should I be looking? -- Joel Sherrill, Ph.D. Director of Research & Development joel.sherr...@oarcorp.comOn-Line Applications Research Ask me about RTEMS: a free RTOS Huntsville AL 35805 Support Available(256) 722-9985
Re: NIOS and atomic primitives
On Mon, Apr 20, 2015 at 11:14:48AM -0500, Joel Sherrill wrote: > > > On 4/20/2015 5:04 AM, Jonathan Wakely wrote: > > On 19 April 2015 at 03:06, Sandra Loosemore wrote: > >> Is it one of the standard g++ or libstdc++ test cases that is failing? I > >> could check what nios2-elf does with it if I knew what to look for. > > Libstdc++ does not use the __sync built-ins in the 4.9 branch, it has > > switched to the __atomic ones which should be provided by libatomic, > > if that exists for the target. > This is with 4.9.2. This is the only RTEMS target giving this error so > I must > be missing the magic bit of configurery for it. > > /users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49: > undefined reference to `__sync_fetch_and_add_4' > /users/joel/rtems-4.11-work/tools/lib/gcc/nios2-rtems4.11/4.9.2/libstdc++.a(locale-inst.o):/users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49: > more undefined references to `__sync_fetch_and_add_4' follow > > Where should I be looking? Look where your atomicity.h comes from? On the 4.9 branch, none of the atomicity.h headers included in vanilla libstdc++-v3 have any __sync_* uses. Jakub
RE: IRA preferencing issues
Interestingly even when the preferences are accurate, lra_constraints completely ignores the preferred/allocno class. If the cost of 2 alternatives is equal in every way (which will be the case if they are both legal matches as the standard cost functions are not used at all), the wrong one may be chosen depending on the order in the MD file. This is particularly bad when the spill optimization pass later removes some of the spills and correctly allocates them to their preferred allocno class... Forcing win = true if the register class of the alternative intersects with the preferred class generates significantly better spill code for cases where the preference is accurate (ie. not just ALL_REGS), resulting in far less confusion between integer and FP registers. So shouldn't get_reg_class return the preference/allocno class like below rather than NO_REGS? diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 0ddd842..f38914a 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -263,7 +263,10 @@ get_reg_class (int regno) } if (regno >= new_regno_start) return lra_get_allocno_class (regno); - return NO_REGS; + return reg_preferred_class (regno); } Wilco
Re: Do we have any plans to un-flatten our header files?
On 04/20/2015 09:18 AM, Richard Biener wrote: On Mon, Apr 20, 2015 at 3:02 PM, Martin Jambor wrote: Hi, because I really dislike the hassle our (almost) flattened header files cause quite often, I have made a very simple experiment to find out how the header files really depend on each other. Some results, together with a dozen of short paragraphs of relevant text are here: http://labs.suse.cz/mjambor/150417-headers_in_gcc/headers_in_gcc.html Short summary: I wrote a script removing an include at a time and parsing errors and recording such discovered "dependencies." The page has text-file reports showing what needs to be included before what, in which files and because of which error messages. I have also plotted a few graphs from these "dependencies." The pictures look interesting, I am not sure if they are in any way useful, but have a look and judge for yourself. The problem with this simple approach (which was the original one) is that you miss things like target macros and conditional compilation. There was a long discussion about why that doesnt work in general, but it boils down to constructs like header.h #define HAS_LOOP_INSN file.c #if HAS_LOOP_INSN code; #end file.c compiles and executes perfectly fine without header.h... all you get is suboptimal code which would be hellish to find later. The same thing applies to header files which may depend on other header files... this isnt a compilation-fail dependency Most of them will miss tm.h in particular, but require some closer examination to see if they are immune to this kind of thing. Regardless of whether or not this experiment had any value. Do we plan to do un-flattening of header files in the course of the next release or two? The pain the flat header files is real and enduring it for much longer or indefinitely seems unhealthy to me. The goal was to end up with a few aggregating includes so you just include sth like ssa-pass.h from a tree pass that works on SSA form. Or gimple.h for anything that only needs to look at GIMPLE stmts. The first step was the flattening to see what dependencies we have (ok, it's not really good at that...) and to move stuff (mostly prototypes) between files to reduce dependencies. It's easy to create a ssa-pass.h file, but it's hard to determine the minimum stuff that should be in there (no rtl.h for example). For Andrews project (that tree stuff) the most important thing is to not have tree(-core?).h leak into files that only use GIMPLE (and thus the "valid" parts of tree). Richard. I plan to spend some time and get it wrapped it during this stage 1...It flattened enough to be useful, and we ought to be able to product better aggregators. Andrew
Re: try_merge_delay_insn with delay list > 1
On 04/20/2015 05:08 AM, BELBACHIR Selim wrote: I've attached the fixed version of the patch. I've tested it on the trunk with my private target. I can't provide a test because apparently no backend (other than my private one) uses delay slots with more that 1 slot. I was also unable to test the behaviour of this patch for an hypothetic target providing delay lots with more that 1 slot AND the possibility to annul instruction in delay slots. It seems to me that this patch is a small enhancement anyway. I hope it's ok for trunk :) Even for small enhancements or bugfixes, we try to at least do some basic testing. Unfortunately with no sparc or mips machines in the compile farm, good testing of a reorg.c change is hard. I built mips-elf cross tools and used those to compile newlib for mips-elf. Then I applied your patch, rebuilt the compiler and used that to compile newlib again. Then I compared all the objects from the two copies of newlib and verified the code we generated as identical. So there's at least some degree of confidence we didn't mess anything up in the single delay slot case. I fixed a couple more minor formatting problems and installed your change on the trunk. jeff
Announcing Segher Boessenkool as combine.c maintainer
I'm pleased to announce that Segher Boessenkool has been appointed as maintainer for instruction combiner (combine.c). Segher, can you please add yourself to the MAINTAINERS file for the additional role. Thanks, jeff
Re: IRA preferencing issues
On 17/04/15 09:26 AM, Matthew Fortune wrote: Wilco Dijkstra writes: While investigating why the IRA preferencing algorithm often chooses incorrect preferences from the costs, I noticed this thread: https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html I am seeing the exact same issue on AArch64 - during the final preference selection ira-costs takes the union of any register classes that happen to have equal cost. As a result many registers get ALL_REGS as the preferred register eventhough its cost is much higher than either GENERAL_REGS or FP_REGS. So we end up with lots of scalar SIMD instructions and expensive int<->FP moves in integer code when register pressure is high. When the preference is computed correctly as in the proposed patch (choosing the first class with lowest cost, ie. GENERAL_REGS) the resulting code is much more efficient, and there are no spurious SIMD instructions. Choosing a preferred class when it doesn't have the lowest cost is clearly incorrect. So is there a good reason why the proposed patch should not be applied? I actually wonder why we'd ever need to do a union - if there are 2 classes with equal cost, you'd use the 2nd as the alternative class. The other question I had is whether there is a good way to get improve the preference in cases like this and avoid classes with equal cost altogether. The costs are clearly not equal: scalar SIMD instructions have higher latency and require extra int<->FP moves. It is possible to mark variants in the MD patterns using '?' to discourage them but that seems like a hack, just like '*'. Is there a general way to say that GENERAL_REGS is preferred over FP_REGS for SI/DI mode? MIPS has the same problem here and we have been looking at ways to address it purely via costings rather than changing IRA. What we have done so far is to make the cost of a move from GENERAL_REGS to FP_REGS more expensive than memory if the move has an integer mode. The goal for MIPS is to never allocate an FP register to an integer mode unless it was absolutely necessary owing to an integer to fp conversion where the integer has to be put in an FP register. Ideally I'd like a guarantee that FP registers will never be used unless a floating point type is present in the source but I haven't found a way to do that given the FP-int conversion issue requiring SImode to be allowed in FP regs. The patch for MIPS is not submitted yet but has eliminated the final two uses of FP registers when building the whole Linux kernel with hard-float enabled. I am however still not confident enough to say you can build integer only code with hard-float and never touch an FP register. Since there are multiple architectures suffering from this I guess we should look at properly addressing it in generic code. Wilco, Matt, thanks for sharing your problem cases. It would be nice if you provide a small test case and fill PR in GCC bugzilla, or point me one if it already exists. Preferred and alternative classes are from old RA which implemented Chow's priority based coloring. This algorithm has a different coloring criteria, generally speaking it can be considered as simultaneous coloring and assigning and permits easy usage of several different priorities reg classes for a pseudo. IRA uses Chaitin-Briggs coloring originally with Kempe's criteria which is a standard now for industrial optimizing compilers (LLVM is a rare exclusion). It assumes that we have non-intersected reg-classes and each pseudo belongs to one class. New coloring criteria developed on ones proposed by Smith's and Holloway were lately added to IRA. These criteria permit pseudo-register classes form tree (one class can fully include another class). Preferred and alternative classes can not be integrated to Chaitin-Briggs approach (as to undeservedly forgotten Ershow's graph coloring based on merging non-connected nodes of conflict graph). So we need to use one class in IRA for a pseudo. If we have mem-mem move through a register and floating point or integer register, it would be wrong to choose only one class in case when only one integer or fp register pressure is high. IRA costs has many drawbacks and I planned to improve it. I believe it should choose insn alternatives first and reg classes after that based on the chosen alternatives. Wilco, I might be wrong and the patch you mentioned works well (e.g. probability of the above case mem-mem move is small). If you provide data (e.g. on SPEC2000/SPEC2006) for your target how the patch improves the code, we could consider it as some temporary (it might become a permanent) solution until ira-costs.c is rewritten. I myself could measure the patch effect on x86/x86-64 to make a final decision. Thanks.
Re: PR63633: May middle-end come up width hard regs for insn expanders?
On 17/04/15 05:58 AM, Georg-Johann Lay wrote: I allowed me to CC Vladimir; maybe he can propose how the backend can describe an efficient, constraint-based solution. The problem is about expanders producing insns with non-fixed hard-regs as in/out operands or clobbers. This includes move insn from non-generic address spaces which require dedicated hard regs. Issue is about correctness and efficiency of generated code. I might be wrong but I think you have a bloated code because you use scratches. I already told several times that usage of scratch is always a bad idea. It was a bad idea for an old RA and is still a bad idea for IRA. The usage of scratches should be prohibited, probably we should write it somewhere. It is better to use just a regular pseudo instead. Why it is a bad idea? Because IRA (or the old global RA) does not take them into account *at all*. It means that IRA might think that there are enough registers for pseudos but in reality it is wrong because of scratches in live range of the pseudos. If it is not the case I should investigate why you have a bloated code and small test would help here. Thanks, I hope my comments will be useful.
Re: PR63633: May middle-end come up width hard regs for insn expanders?
On Mon, Apr 20, 2015 at 10:11 PM, Vladimir Makarov wrote: > I might be wrong but I think you have a bloated code because you use > scratches. I already told several times that usage of scratch is always a > bad idea. It was a bad idea for an old RA and is still a bad idea for IRA. > The usage of scratches should be prohibited, probably we should write it > somewhere. It is better to use just a regular pseudo instead. Thanks Vladimir, I didn't know this. Does this mean that, for example, extendsidi in i386.md would be better if it did not use match_scratch? The expander and the 32-bits version of the insn currently look like this: (define_expand "extendsidi2" [(set (match_operand:DI 0 "register_operand") (sign_extend:DI (match_operand:SI 1 "register_operand")))] "" { if (!TARGET_64BIT) { emit_insn (gen_extendsidi2_1 (operands[0], operands[1])); DONE; } }) (define_insn "extendsidi2_1" [(set (match_operand:DI 0 "nonimmediate_operand" "=*A,r,?r,?*o") (sign_extend:DI (match_operand:SI 1 "register_operand" "0,0,r,r"))) (clobber (reg:CC FLAGS_REG)) (clobber (match_scratch:SI 2 "=X,X,X,&r"))] "!TARGET_64BIT" "#") // there is a post reg-alloc splitter If I understand your remark on SCRATCH correctly, then the expander should use gen_reg_rtx for operand 2 of extendsidi2_1; and the insn should use match_operand on operand 2 instead of match_scratch. Correct? When is a scratch still OK? The most common pattern in i386 is (clobber (match_scratch ...))) which probably always results in the need for a register that IRA doesn't see, so I assume that this is one of the cases where you would recommend using a pseudo instead...? (A quick grep on config/*.md shows 952 cases like this, and 113 match_scratch uses of a different form.) A match_scratch or (clobber (scratch)) in a define_peephole2 should also always be fine, because that's just a way to see if there is a suitable register available to perform the peephole code transformation (peephole2 runs after reg-alloc). (mem:BLK (scratch)) is always OK, > Why it is a bad idea? Because IRA (or the old global RA) does not take > them into account *at all*. It means that IRA might think that there are > enough registers for pseudos but in reality it is wrong because of > scratches in live range of the pseudos. Is there a reason why IRA doesn't replace scratches with pseudos, like LRA does (and IIRC reload does, also)? Ciao! Steven
[wwwdocs] PATCH for Re: [ANN] gcc-lua: Lua plugin for the GNU Compiler Collection
Hi Peter, On Wed, 31 Oct 2012, Peter Colberg wrote: > gcc‑lua extends the GNU Compiler Collection with the ability to run Lua > scripts. The plugin provides an interface to register callback functions > for plugin events, and inspect the abstract syntax tree of a translation > unit. The plugin is useful for static C code analysis. gcc‑lua supports > GCC 4.6 or 4.7, and, Lua 5.1 or 5.2, or LuaJIT. I added this to our GCC Extensions page at https://gcc.gnu.org/extensions.html with the patch below. Let us know if you'd like to see this tweaked or changed otherwise. Thanks for the heads-up! Gerald Index: extensions.html === RCS file: /cvs/gcc/wwwdocs/htdocs/extensions.html,v retrieving revision 1.53 diff -u -r1.53 extensions.html --- extensions.html 16 Apr 2012 12:46:03 - 1.53 +++ extensions.html 20 Apr 2015 22:49:19 - @@ -26,6 +26,13 @@ to ease development of GCC plugin-like extensions. +http://colberg.org/gcc-lua/";>Lua plugin for GCC + +gcc‑lua extends GCC with the ability to run Lua scripts. The plugin + provides an interface to register callback functions for plugin events, + and inspect the abstract syntax tree of a translation unit. + + http://runtime.bordeaux.inria.fr/StarPU/";>StarPU StarPU is a GCC extension and run-time support library for hybrid
Re: PR63633: May middle-end come up width hard regs for insn expanders?
On 20/04/15 06:27 PM, Steven Bosscher wrote: On Mon, Apr 20, 2015 at 10:11 PM, Vladimir Makarov wrote: I might be wrong but I think you have a bloated code because you use scratches. I already told several times that usage of scratch is always a bad idea. It was a bad idea for an old RA and is still a bad idea for IRA. The usage of scratches should be prohibited, probably we should write it somewhere. It is better to use just a regular pseudo instead. Thanks Vladimir, I didn't know this. Does this mean that, for example, extendsidi in i386.md would be better if it did not use match_scratch? The expander and the 32-bits version of the insn currently look like this: (define_expand "extendsidi2" [(set (match_operand:DI 0 "register_operand") (sign_extend:DI (match_operand:SI 1 "register_operand")))] "" { if (!TARGET_64BIT) { emit_insn (gen_extendsidi2_1 (operands[0], operands[1])); DONE; } }) (define_insn "extendsidi2_1" [(set (match_operand:DI 0 "nonimmediate_operand" "=*A,r,?r,?*o") (sign_extend:DI (match_operand:SI 1 "register_operand" "0,0,r,r"))) (clobber (reg:CC FLAGS_REG)) (clobber (match_scratch:SI 2 "=X,X,X,&r"))] "!TARGET_64BIT" "#") // there is a post reg-alloc splitter If I understand your remark on SCRATCH correctly, then the expander should use gen_reg_rtx for operand 2 of extendsidi2_1; and the insn should use match_operand on operand 2 instead of match_scratch. Correct? This case is more complicated than one for the AVR patch where scratch should always get a register. I've should not have been too unconditional about prohibiting scratches. When is a scratch still OK? There are some insns where scratch will rest as scratch after RA. The above insn is an example (constraint X). If it is changed by pseudo in RA, the pseudo in such case might get a stack slot. It is not a big deal as scratch pseudo are short-live one and probability of stack space increase is small. But for small stackless functions the penalty could be big. For the above case it is also hard for me to say now will the pseudo use in IRA improve the allocation because we don't know yet what is the final insn alternative. We might assign a hard reg in IRA and spill other pseudos for this, although the final alternative will use constraint X. If scratch should always get a register, that what you describe above would be a right thing to do. In such case using scratch is really a bad thing. The most common pattern in i386 is (clobber (match_scratch ...))) which probably always results in the need for a register that IRA doesn't see, so I assume that this is one of the cases where you would recommend using a pseudo instead...? (A quick grep on config/*.md shows 952 cases like this, and 113 match_scratch uses of a different form.) A match_scratch or (clobber (scratch)) in a define_peephole2 should also always be fine, because that's just a way to see if there is a suitable register available to perform the peephole code transformation (peephole2 runs after reg-alloc). (mem:BLK (scratch)) is always OK, Right, there are cases where scratch is useful. Why it is a bad idea? Because IRA (or the old global RA) does not take them into account *at all*. It means that IRA might think that there are enough registers for pseudos but in reality it is wrong because of scratches in live range of the pseudos. Is there a reason why IRA doesn't replace scratches with pseudos, like LRA does (and IIRC reload does, also)? As I remember reload does not do it but of course it takes scratches into account. When reload processes insn, it looks what reload registers are needed for operands including scratches. LRA process all pseudos not just reload ones having a bigger picture and of course it should looks at scratch too. All scratches are changed by pseudos to simplify implementation of LRA and spilled pseudo are transformed back into scratches. IRA could use the same approach with scratches as LRA but as I wrote above for some cases it is hard to say is it a right thing to do as we don't know the final insn alternative. For LRA, changing scratches by pseudos is the best approach as we assign hard registers after deciding the final insn alternative. So I should have written in my first email that usage of scratches should be deprecated when it has register constraints without X.
Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
On 04/20/2015 01:09 AM, Shiva Chen wrote: Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks. This looks pretty good. I expanded the comment for the new function a bit and renamed the function in an effort to clarify its purpose. From reviewing can_replace_by, it seems it should have been handling this case, but clearly wasn't due to implementation details. I then bootstrapped and regression tested the patch on x86_64-linux-gnu where it passed. I also instrumented that compiler to see how often this code triggers. During a bootstrap it triggers a couple hundred times (which is obviously a proxy for cross jumping improvements). So it's triggering regularly on x86_64, which is good. I also verified that this fixes BZ64916 for an arm-non-eabi toolchain configured with --with-arch=armv7. Installed on the trunk. No new testcase as it's covered by existing tests. Thanks,, jeff
Re: PR63633: May middle-end come up width hard regs for insn expanders?
On Tue, Apr 21, 2015 at 12:27:40AM +0200, Steven Bosscher wrote: > On Mon, Apr 20, 2015 at 10:11 PM, Vladimir Makarov wrote: > > I might be wrong but I think you have a bloated code because you use > > scratches. I already told several times that usage of scratch is always a > > bad idea. It was a bad idea for an old RA and is still a bad idea for IRA. > > The usage of scratches should be prohibited, probably we should write it > > somewhere. It is better to use just a regular pseudo instead. > > Thanks Vladimir, I didn't know this. > Does this mean that, for example, extendsidi in i386.md would be > better if it did not use match_scratch? The combiner can add or remove clobbers of scratches whenever needed, but it cannot do that for clobbers of pseudos. Segher