Re: Scheduling automaton question
On Friday 11 February 2011 at 13:33 +0100, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.

If you generate an NDFA (using '(automata_option "ndfa")'), it should allow
you to schedule both instructions together, since the NDFA should try all
functional unit alternatives.

Fred
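To make the difference concrete, here is a toy Python model of the two treatments of '|'. It is only a sketch and not GCC's automaton code: the unit names A, B and C come from the message above, while the function names and the "first free alternative" rule are assumptions chosen to mirror the behaviour Bernd describes.

```python
from itertools import product

# Each insn is described by the functional units it may use (its alternatives).
INSN_1 = ("A", "B", "C")   # reserves (A|B|C)
INSN_2 = ("A",)            # reserves A only


def issue_greedy(insns):
    """Commit to the first free alternative of each insn, in order."""
    busy = set()
    issued = 0
    for alternatives in insns:
        unit = next((u for u in alternatives if u not in busy), None)
        if unit is None:
            break              # this insn has to wait for the next cycle
        busy.add(unit)
        issued += 1
    return issued


def issue_all_alternatives(insns):
    """Try every combination of alternatives; accept any conflict-free one."""
    for choice in product(*insns):
        if len(set(choice)) == len(choice):   # no unit is used twice
            return len(insns)
    return issue_greedy(insns)                # fall back to the greedy count


print(issue_greedy([INSN_1, INSN_2]))            # first insn takes A, second stalls
print(issue_all_alternatives([INSN_1, INSN_2]))  # first insn takes B, second takes A
```

In this model the greedy version issues one insn in the cycle and the exhaustive version issues both, which is the effect the "ndfa" option is expected to have.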
Re: Describing multi-register values in RTL
On Thursday 21 October 2010 at 21:11 -0700, Ian Lance Taylor wrote:
> Paul Koning writes:
>
> > To take that example, on the pdp11 an SImode is two HImodes. Could
> > the RTL template in the MD file for, say, addsi3 split that into two
> > or three insns that operate on HImode values and describe the actual
> > instructions? In this case: add high parts, then add low parts and
> > propagate carry into high. Split that way it would seem you would not
> > be constrained to adjacent registers, or for that matter to both being
> > registers at all.

This is exactly the kind of thing I'm looking at.

> The lower subreg pass will do that for you if you have the right set of
> insns.

Could you expand a bit on what the 'right set of instructions' is or,
even better, point to an md file where we could find an example?

Thanks a lot!
Fred
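For readers who want to see the arithmetic Paul describes, here is a small Python sketch of a 32-bit add done as 16-bit adds plus a carry. It only illustrates the decomposition and says nothing about how the RTL or the md patterns would look; the helper names are made up for this example.

```python
MASK16 = 0xFFFF


def split_si(value):
    """Split a 32-bit value into its (high, low) 16-bit halves."""
    return (value >> 16) & MASK16, value & MASK16


def add_si_as_two_hi(a, b):
    """Add two 32-bit values using only 16-bit additions and a carry."""
    a_hi, a_lo = split_si(a)
    b_hi, b_lo = split_si(b)
    hi = (a_hi + b_hi) & MASK16          # add high parts
    lo_sum = a_lo + b_lo                 # add low parts
    carry = lo_sum >> 16                 # 1 if the low halves overflowed
    hi = (hi + carry) & MASK16           # propagate carry into high
    return (hi << 16) | (lo_sum & MASK16)


print(hex(add_si_as_two_hi(0x0001FFFF, 0x00000001)))
```

Because each half is handled separately, nothing in this decomposition requires the two halves to live in adjacent registers, which is the point made in the quoted text.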
Re: Bug in expand_builtin_setjmp_receiver ?
Hi Jon,

On Tuesday 26 October 2010 at 13:07 +0100, Jon Beniston wrote:
> What problems do you have building lm32-elf? If you let me know, I can try
> to look in to them.

If you have access to an lm32 toolchain, could you test whether
gcc.c-torture/execute/built-in-setjmp.c passes at different optimization
levels?

Many thanks,
Fred
Re: combine two load insns
On Tuesday 7 December 2010 at 06:18 -0700, Jeff Law wrote:
> On 12/06/10 15:07, Ian Lance Taylor wrote:
> Given the two loads don't have a def-use data dependency combine won't
> ever get the opportunity to do anything with them. In general there is
> no pass which combines insns without a true data dependency and targets
> which have such insns have had to handle those combinations in machine
> dependent reorg. In fact, it was the combination of independent insns
> which led to the introduction of the machine dependent reorg pass eons ago.

The issue with this approach is that reorg runs very late. I suppose that
if one wants to combine two SI loads into a DI load, it needs to be done
before IRA so that the register constraints generated by the DI load can
be satisfied.

Fred
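To show why doing this after register allocation is limiting, here is a toy Python check of whether two already-allocated SI loads could be merged into one DI load. It is a sketch under assumptions that are not stated in the thread: a DI value is assumed to need an even/odd hard register pair, and the `Load` tuple and function name are invented for the example.

```python
from collections import namedtuple

# A 32-bit load: dest <- [base + offset], with dest a hard register number.
Load = namedtuple("Load", "dest base offset")


def mergeable_after_ra(first, second):
    """Could two allocated SI loads be replaced by a single DI load?"""
    adjacent_memory = (first.base == second.base
                       and second.offset == first.offset + 4)
    # Assumed constraint: a DI value lives in an even/odd register pair.
    valid_pair = first.dest % 2 == 0 and second.dest == first.dest + 1
    return adjacent_memory and valid_pair


print(mergeable_after_ra(Load(4, "sp", 0), Load(5, "sp", 4)))  # registers form a pair
print(mergeable_after_ra(Load(4, "sp", 0), Load(7, "sp", 4)))  # registers do not
```

After allocation the merge only works when the allocator happened to pick a suitable pair; merging before IRA would let the allocator choose the pair to fit the DI load.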
Re: Subreg splitting and floating point
On Thursday 6 January 2011 at 09:29 -0800, Richard Henderson wrote:
> On 01/06/2011 06:58 AM, Frederic Riss wrote:
> > 136 is a pseudo. I have movdf and movsf patterns that accepts
> > constants.
>
> This one statement is suspicious to me. Do I read from this that
> you have fp move patterns that accept constants but not registers?

No, I was just pointing out that they don't reject constants, because the
pattern in question was moving a constant.

> Move patterns are special in that they *must* handle everything,
> modulo some constants which reload can spill to memory.

Yes, I learned that a few weeks ago while working on improving my
testsuite results (although this particular error was only causing a few
obscure failures in the C++ tests).

> The other piece of advice that I can give from elsewhere in this
> thread is that you should never match or generate SUBREG by hand.
> You should always use the gen_lowpart, gen_highpart, simplify_gen_subreg
> interfaces. Those will greatly simplify the double-subreg issues
> that you have been having.

Yeah, Ian pointed me at those. I'll update my code.

> Finally, do you actually have dedicated hard registers for fp? If
> yours is a soft-fp target -- or one of the rare targets that does
> hard fp out of the general register set -- consider totally eliminating
> the fp move patterns. Once upon a time gcc required them even for
> soft-fp, but we've gotten much better with introduction of the
> lower-subreg pass. Not that too many existing ports have been updated
> for that pass, leading others to conclude that the patterns are still
> required...

This is quite interesting. I'll remove the patterns and see what happens.

Thanks for all the advice!
Fred
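As an illustration of why a soft-fp target might not need dedicated fp move patterns, here is a Python sketch that treats a 64-bit double as two 32-bit words. It only shows the idea that moving the value amounts to moving its two halves; the function names are hypothetical and this is not how gen_lowpart or gen_highpart are implemented.

```python
import struct


def split_double(value):
    """Return the (high, low) 32-bit words of an IEEE 754 double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits >> 32, bits & 0xFFFFFFFF


def join_double(high, low):
    """Rebuild a double from its (high, low) 32-bit words."""
    (value,) = struct.unpack(">d", struct.pack(">Q", (high << 32) | low))
    return value


high, low = split_double(1.0)
print(hex(high), hex(low))
print(join_double(high, low))
```

Copying the two words separately and rejoining them gives back the original value, so on a target where doubles live in general registers a DFmode move can be expressed as two word-sized moves.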
Auto-vectorizer and (mis-)alignment support assumptions
On Thu, 2013-09-12 at 17:39 +0200, Frederic Riss wrote:
> The issue is that I am using super-block
> scheduling in sched2 and that my sched_reorder hook prioritized the
> load operation over the conditional branch that did the alignment
> check.
>
> I'm now leaning toward a scheduler bug (or my customization thereof).
> I expect superblock scheduling to hoist instructions out of their
> original basic-block, but it seems very dangerous to move memory
> accesses this way (without speculation).

I tracked this down to may_trap_p(). Stack pointer relative accesses are
always considered non-trapping, thus the scheduler is allowed to execute
them speculatively. The vectorizer has protected the access, but the fact
that it's stack-relative allows it to escape the protected region...

I hit this case because of extended basic-block scheduling, but I think
the problem is more general than that. may_trap_or_fault_p() does the
right thing by taking the misalignment of stack accesses into account on
STRICT_ALIGNMENT targets. Would it be a solution to call that instead of
may_trap_p() from haifa-sched.c:may_trap_exp()?

I'm not clear why may_trap_p() and may_trap_or_fault_p() are different
functions. When would we want to disallow a trap, but allow a
misalignment fault?

Fred
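The distinction described above can be sketched with a toy Python model. This is not GCC's implementation of may_trap_p() or may_trap_or_fault_p(); it only encodes the behaviour as reported in this message, and the `MemAccess` fields and the alignment numbers are assumptions made for the example.

```python
from collections import namedtuple

# known_align: alignment (in bytes) the compiler can prove for the address.
# required_align: alignment (in bytes) the access needs on the target.
MemAccess = namedtuple("MemAccess", "stack_relative known_align required_align")


def may_trap(access):
    """Toy rule from the message: stack-relative accesses never trap."""
    return not access.stack_relative


def may_trap_or_fault(access, strict_alignment=True):
    """Also treat a possibly misaligned access as unsafe on strict targets."""
    if strict_alignment and access.known_align < access.required_align:
        return True
    return may_trap(access)


# A vector load from a stack slot whose alignment is only checked at run time.
vector_load = MemAccess(stack_relative=True, known_align=4, required_align=16)
print(may_trap(vector_load))           # considered safe to hoist
print(may_trap_or_fault(vector_load))  # considered unsafe to hoist
```

In this model a scheduler that consults only the first predicate would feel free to move the load above its alignment check, while one that consults the second would not.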
[4.7 regression?] HImode 'smax' RTL generation
Hello,

I'm trying to port a private backend from GCC 4.5 to 4.7, and I'm seeing
some performance degradation in HImode benchmarks. The backend has no
HImode insns apart from the mov and SImode extensions.

I tracked one of the regressions down to the RTL expansion pass. The 4.7
version won't generate smax RTL patterns for COND_EXPR statements working
on HImode operands.

In GCC 4.5, COND_EXPR was of GIMPLE_SINGLE_RHS rhs_class. At expansion
time, the COND_EXPR would go through:

  expand_gimple_stmt
   -> expand_gimple_stmt_1
   -> expand_assignment
   -> store_expr
   -> fold_convert_loc (in the promoted subreg case)

...and the folding converts the COND_EXPR to a MAX_EXPR that generates a
smax RTL pattern.

In 4.7, the COND_EXPR has become a GIMPLE_TERNARY_RHS rhs_class, meaning
that it won't use the expand_assignment path in expand_gimple_stmt_1, but
will use straight expression expansion, which generates control flow RTL
for the COND_EXPR.

Should this be considered a code generation regression, or should the
MAX_EXPR trees be generated at some other point in the middle-end?

Many thanks,
Fred
Re: IRA_COVER_CLASSES In gcc47
Hi Vladimir,

On Friday 23 March 2012 at 12:08 -0400, Vladimir Makarov wrote:
> Since 4.7 we use more sophisticated trivial coloring criteria which work
> well even on intersected register classes. To be more accurate, we
> calculate an approximation of an profitable hard regs for each pseudo.
> These approximations form a tree. The tree is used for find trivial
> colorability of the pseudos. It was a surprise that such approach is
> profitable even for architectures with regular register files like ppc.
> Here is an excerpt from comments on the top ira.c file:

I started porting an in-house backend to GCC 4.7 and saw at least one
huge regression relating to the IRA changes. The symptom in that
benchmark is that all allocnos get pushed on the allocation stack as
trivially colorable, but at unstacking time only the first ones get a
hard reg and all others get spilled.

AFAIU the CB coloring algorithm, all registers that get pushed as
trivially colorable should get a hard reg when they are popped from the
stack. Thus there must be some description issue in my backend that
confuses IRA.

The register file is very regular. There are 64 SImode registers, of
which 3 are fixed. In order to store DImode values, 2 consecutive SImode
registers are needed. The DImode pairs cannot be chosen anywhere in the
register file: they need to start at an even offset (however, this still
allows 32 DImode registers).

In the testcase I'm looking at, we have ~32 DImode allocnos that are live
from function start to the end. These registers conflict with (nearly)
all other allocnos. These registers get pushed last, but strangely, they
get pushed as trivially colorable. Of course, once the register file has
been fully allocated to these long-lived registers, nothing else can get
a hard register anymore. In GCC 4.5, some of these DImode registers got
marked as potential spills, which allowed the other pseudos to be
correctly allocated.

Do you have any idea what in my backend could confuse the trivial
coloring criteria into marking all allocnos as trivially colorable?

Many thanks,
Fred
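To put rough numbers on the register pressure described above, here is a small Python sketch. It is not IRA's algorithm: it uses the classic conservative colorability test rather than the 4.7 criteria, and the choice of which three registers are fixed is an assumption, since the message does not say.

```python
def free_even_pairs(num_regs, fixed):
    """List the (even, odd) register pairs in which neither register is fixed."""
    return [(r, r + 1) for r in range(0, num_regs, 2)
            if r not in fixed and (r + 1) not in fixed]


def trivially_colorable(own_size, neighbour_sizes, available):
    """Classic conservative test: enough registers remain even if every
    conflicting neighbour is allocated first."""
    return sum(neighbour_sizes) + own_size <= available


NUM_REGS = 64
FIXED = {61, 62, 63}          # assumption: which 3 registers are fixed is unknown
available = NUM_REGS - len(FIXED)

# Number of even-aligned pairs usable for DImode values.
print(len(free_even_pairs(NUM_REGS, FIXED)))
# One of 32 mutually conflicting DImode allocnos, each needing 2 registers.
print(trivially_colorable(2, [2] * 31, available))
```

Under these assumptions only 30 aligned pairs are free, and a DImode allocno conflicting with 31 others fails the classic test, which matches the expectation that some of them should be marked as potential spills.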