Re: Scheduling automaton question

2011-02-11 Thread Frédéric RISS
On Friday, 11 February 2011 at 13:33 +0100, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. This is a problem when there's an
> anti-dependence of cost 0 between the two instructions.

If you generate an NDFA (using '(automata_option "ndfa")'), it should
allow you to schedule both instructions together, since the automaton
should then try all functional unit alternatives.

Fred
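
The difference can be illustrated with a toy model in C. This is only a
sketch of the idea, not how genautomata builds or queries its automata:
the function names and the bitmask encoding of units are assumptions
made for the example. The greedy version commits each insn to its first
free alternative, while the exhaustive version keeps every alternative
open until all insns of the cycle have been placed.

```c
#include <assert.h>
#include <stddef.h>

/* Functional units as bits; a reservation is the set of acceptable units. */
enum { UNIT_A = 1u << 0, UNIT_B = 1u << 1, UNIT_C = 1u << 2 };

/* Greedy model: each insn takes the first free unit among its
   alternatives, in issue order.  Returns how many insns were issued
   in the cycle before one of them had to wait. */
size_t issue_greedy(const unsigned *alts, size_t n)
{
    unsigned busy = 0;
    size_t issued = 0;

    assert(alts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned free_alts = alts[i] & ~busy;
        if (free_alts == 0)
            break;                              /* no unit left for this insn */
        busy |= free_alts & (0u - free_alts);   /* commit to the lowest free unit */
        issued++;
    }
    return issued;
}

/* Exhaustive model: try every alternative for insn i and succeed if
   some choice lets all remaining insns issue in the same cycle. */
int can_issue_all(const unsigned *alts, size_t n, size_t i, unsigned busy)
{
    if (i == n)
        return 1;
    for (unsigned unit = UNIT_A; unit <= UNIT_C; unit <<= 1)
        if ((alts[i] & unit) && !(busy & unit)
            && can_issue_all(alts, n, i + 1, busy | unit))
            return 1;
    return 0;
}
```

With the reservations (A|B|C) followed by A, the greedy model gives unit
A to the first insn and then cannot place the second one, whereas the
exhaustive model finds the assignment B for the first and A for the
second.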



Re: Describing multi-register values in RTL

2010-10-21 Thread Frédéric RISS
On Thursday, 21 October 2010 at 21:11 -0700, Ian Lance Taylor wrote:
> Paul Koning  writes:
> 
> > To take that example, on the pdp11 an SImode is two HImodes.  Could
> > the RTL template in the MD file for, say, addsi3 split that into two
> > or three insns that operate on HImode values and describe the actual
> > instructions?  In this case: add high parts, then add low parts and
> > propagate carry into high.  Split that way it would seem you would not
> > be constrained to adjacent registers, or for that matter to both being
> > registers at all.

This is exactly the kind of thing I'm looking at.

> The lower subreg pass will do that for you if you have the right set of
> insns.

Could you expand a bit on what the 'right set of insns' is or, even
better, point to an md file where we could find an example?

Thanks a lot!
Fred
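
The decomposition Paul describes (add high parts, add low parts,
propagate the carry into the high part) can be written out in C. This is
a sketch of the arithmetic only, not of the RTL a port would emit; the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value held as two independent 16-bit halves.  Nothing here
   requires the halves to live in adjacent registers. */
struct si_pair { uint16_t hi, lo; };

struct si_pair add_si_via_hi(struct si_pair x, struct si_pair y)
{
    struct si_pair r;

    r.hi = (uint16_t)(x.hi + y.hi);            /* add high parts */
    r.lo = (uint16_t)(x.lo + y.lo);            /* add low parts */
    r.hi = (uint16_t)(r.hi + (r.lo < x.lo));   /* carry out of low into high */
    return r;
}

/* Reassemble the halves, for checking against a plain 32-bit add. */
uint32_t si_value(struct si_pair p)
{
    return ((uint32_t)p.hi << 16) | p.lo;
}
```

The carry test relies on unsigned wraparound: the 16-bit low sum is
smaller than one of its operands exactly when the addition overflowed.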



Re: Bug in expand_builtin_setjmp_receiver ?

2010-10-27 Thread Frédéric RISS
Hi Jon,

On Tuesday, 26 October 2010 at 13:07 +0100, Jon Beniston wrote:
> What problems do you have building lm32-elf? If you let me know, I can try
> to look in to them.

If you have access to an lm32 toolchain, can you test whether
gcc.c-torture/execute/built-in-setjmp.c passes at different optimization
levels?

Many thanks,
Fred



Re: combine two load insns

2010-12-07 Thread Frédéric RISS
On Tuesday, 7 December 2010 at 06:18 -0700, Jeff Law wrote:
> On 12/06/10 15:07, Ian Lance Taylor wrote:
> Given the two loads don't have a def-use data dependency combine won't 
> ever get the opportunity to do anything with them.  In general there is 
> no pass which combines insns without a true data dependency and targets 
> which have such insns have had to handle those combinations in machine 
> dependent reorg.  In fact, it was the combination of independent insns 
> which led to the introduction of the machine dependent reorg pass eons ago.

The issue with this approach is that reorg runs very late. I suppose
that if one wants to combine two SImode loads into one DImode load, this
needs to be done before IRA so that the register constraints of the
resulting insn can be satisfied.

Fred



Re: Subreg splitting and floating point

2011-01-06 Thread Frédéric RISS
On Thursday, 6 January 2011 at 09:29 -0800, Richard Henderson wrote:
> On 01/06/2011 06:58 AM, Frederic Riss wrote:
> > 136 is a pseudo. I have movdf and movsf patterns that accepts
> > constants.
> 
> This one statement is suspicious to me.  Do I read from this that
> you have fp move patterns that accept constants but not registers?

No, I was just pointing out that they don't reject constants, because
the pattern in question was moving a constant.

> Move patterns are special in that they *must* handle everything,
> modulo some constants which reload can spill to memory.

Yes, I learned that a few weeks ago while working on improving my
testsuite results (although this particular error was only causing a few
obscure failures in the C++ tests).

> The other piece of advice that I can give from elsewhere in this
> thread is that you should never match or generate SUBREG by hand.
> You should always use the gen_lowpart, gen_highpart, simplify_gen_subreg
> interfaces.  Those will greatly simplify the double-subreg issues
> that you have been having.

Yeah, Ian pointed me at those. I'll update my code.

> Finally, do you actually have dedicated hard registers for fp?  If
> yours is a soft-fp target -- or one of the rare targets that does
> hard fp out of the general register set -- consider totally eliminating
> the fp move patterns.  Once upon a time gcc required them even for
> soft-fp, but we've gotten much better with introduction of the
> lower-subreg pass.  Not that too many existing ports have been updated
> for that pass, leading others to conclude that the patterns are still
> required...

This is quite interesting. I'll remove the patterns and see what
happens. Thanks for all the advice!

Fred




Auto-vectorizer and (mis-)alignment support assumptions

2013-09-12 Thread Frédéric RISS
On Thu, 2013-09-12 at 17:39 +0200, Frederic Riss wrote:
> The issue is that I am using super-block
> scheduling in sched2 and that my sched_reorder hook prioritized the
> load operation over the conditional branch that did the alignment
> check.
> 
> I'm now leaning toward a scheduler bug (or my customization thereof).
> I expect superblock scheduling to hoist instructions out of their
> original basic-block, but it seems very dangerous to move memory
> accesses this way (without speculation). 

I tracked this down to may_trap_p(). Stack-pointer-relative accesses are
always considered non-trapping, thus the scheduler is allowed to execute
them speculatively. The vectorizer has protected the access, but the fact
that it's stack-relative allows it to escape the protected region... I
hit this case because of extended basic-block scheduling, but I think
the problem is more general than that.

may_trap_or_fault_p() does the right thing by taking the misalignment
of stack accesses into account on STRICT_ALIGNMENT targets. Would it be
a solution to call that instead of may_trap_p() from
haifa-sched.c:may_trap_exp()?

I'm not clear why may_trap_p() and may_trap_or_fault_p() are different
functions. When would we want to disallow a trap, but allow a
misalignment fault? 

Fred



[4.7 regression?] HImode 'smax' RTL generation

2012-03-13 Thread Frédéric RISS
Hello,

I'm trying to port a private backend from GCC 4.5 to 4.7, and I'm seeing
some performance degradation in HImode benchmarks. The backend has no
HImode insns apart from moves and extensions to SImode.

I tracked one of the regressions down to the RTL expansion pass. The 4.7
version won't generate smax RTL patterns for COND_EXPR statements
working on HImode operands. 

In GCC 4.5, COND_EXPR was of GIMPLE_SINGLE_RHS rhs_class. At expansion
time, the COND_EXPR would go through:
expand_gimple_stmt -> expand_gimple_stmt_1 -> expand_assignment
  -> store_expr -> fold_convert_loc (in the promoted subreg case)
...and the folding converted the COND_EXPR to a MAX_EXPR, which generated
an smax RTL pattern.

In 4.7, the COND_EXPR has become a GIMPLE_TERNARY_RHS rhs_class, meaning
that it no longer takes the expand_assignment path in
expand_gimple_stmt_1, but goes through straight expression expansion,
which generates control flow RTL for the COND_EXPR.

Should this be considered a code generation regression, or should the
MAX_EXPR trees be generated at some other point in the middle-end?

Many thanks,
Fred
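
For reference, a reduced example of the kind of source this concerns
could look like the following. It is an assumption about the shape of
the affected code, not the actual benchmark, and the function name is
hypothetical. The conditional selects the larger of two 16-bit
operands, so it is equivalent to a MAX_EXPR on HImode values.

```c
#include <assert.h>
#include <stdint.h>

/* A COND_EXPR on 16-bit operands that computes a signed maximum. */
int16_t max_hi(int16_t a, int16_t b)
{
    return a > b ? a : b;
}

/* The same selection used as a reduction over an array, which is where
   branchy code instead of a max instruction tends to cost the most. */
int16_t max_of(const int16_t *v, int n)
{
    assert(v != 0 && n > 0);
    int16_t m = v[0];
    for (int i = 1; i < n; i++)
        m = max_hi(m, v[i]);
    return m;
}
```

Comparing the RTL dumps of the expand pass for such a function between
the two compiler versions should show whether an smax pattern or a
compare-and-branch sequence is produced.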



Re: IRA_COVER_CLASSES In gcc47

2012-03-23 Thread Frédéric RISS
Hi Vladimir,

On Friday, 23 March 2012 at 12:08 -0400, Vladimir Makarov wrote:
> Since 4.7 we use more sophisticated trivial coloring criteria which work 
> well even on intersected register classes.  To be more accurate, we 
> calculate an approximation of an profitable hard regs for each pseudo.  
> These approximations form a tree.  The tree is used for find trivial 
> colorability of the pseudos.  It was a surprise that such approach is 
> profitable even for architectures with regular register files like ppc.  
> Here is an excerpt from comments on the top ira.c file:

I started porting an in-house backend to GCC 4.7 and saw at least one
huge regression related to the IRA changes. The symptom in that
benchmark is that all allocnos get pushed on the allocation stack as
trivially colorable, but at unstacking time only the first ones get a
hard reg and all others get spilled.

As far as I understand the Chaitin-Briggs coloring algorithm, all
allocnos that get pushed as trivially colorable should get a hard reg
when they are popped from the stack. Thus there must be some description
issue in my backend that confuses IRA.

The register file is very regular. There are 64 SImode registers, of
which 3 are fixed. In order to store DImode values, 2 consecutive SImode
registers are needed. The DImode pairs cannot be placed anywhere in the
register file: they need to start at an even register number (however,
this still allows for 32 DImode registers).

In the testcase I'm looking at, we have ~32 DImode allocnos that are
live from function start to the end. These registers conflict with
(nearly) all other allocnos. These registers get pushed last, but
strangely, they get pushed as trivially colorable. Of course, once the
register file has been fully allocated to these long lived registers,
nothing else can get a hard register anymore.

In GCC 4.5, some of these DImode registers got marked as potential
spills, which allowed the other pseudos to be correctly allocated.

Do you have any idea what in my backend could confuse the trivial
coloring criteria into marking all allocnos as trivially colorable?

Many thanks,
Fred
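
The expectation stated above, that anything pushed as trivially
colorable gets a color when popped, can be checked on a textbook
simplify/select sketch. This is not IRA's implementation: it assumes
every node needs exactly one register, so it does not model DImode
allocnos that need an aligned pair, and all names are hypothetical.

```c
#include <assert.h>

#define MAX_NODES 16

/* Simplify/select on a conflict graph given as a symmetric adjacency
   matrix with an empty diagonal.  Nodes with fewer than k remaining
   neighbours are pushed as trivially colorable; colors are assigned as
   nodes are popped.  Nodes that never become trivially colorable are
   left uncolored (color -1).  Returns the number of colored nodes. */
int color_graph(int n, int k, int adj[MAX_NODES][MAX_NODES],
                int color[MAX_NODES])
{
    int stack[MAX_NODES], top = 0;
    int removed[MAX_NODES] = { 0 };
    int colored = 0;

    assert(n >= 0 && n <= MAX_NODES && k >= 0 && k <= 32);

    /* Simplify: keep pushing nodes whose remaining degree is below k. */
    for (int progress = 1; progress; ) {
        progress = 0;
        for (int v = 0; v < n; v++) {
            int degree = 0;
            if (removed[v])
                continue;
            for (int w = 0; w < n; w++)
                degree += adj[v][w] && !removed[w];
            if (degree < k) {
                removed[v] = 1;
                stack[top++] = v;
                progress = 1;
            }
        }
    }

    for (int v = 0; v < n; v++)
        color[v] = -1;

    /* Select: pop in reverse push order and take the lowest color that
       no already-colored neighbour uses. */
    while (top > 0) {
        int v = stack[--top];
        unsigned used = 0;
        for (int w = 0; w < n; w++)
            if (adj[v][w] && color[w] >= 0)
                used |= 1u << color[w];
        for (int c = 0; c < k; c++)
            if (!(used & (1u << c))) {
                color[v] = c;
                colored++;
                break;
            }
    }
    return colored;
}
```

In this single-register model a pushed node has fewer than k neighbours
left in the graph, and only those can be colored before it is popped,
so a free color always exists. When allocnos need several registers or
aligned pairs, counting neighbours is no longer sufficient, and the
criterion has to account for how many hard regs each conflicting
allocno can occupy.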