compiling very large functions.

2006-11-04 Thread Kenneth Zadeck
I think that it is time that we in the GCC community took some time to
address the problem of compiling very large functions in a somewhat
systematic manner.

GCC has two competing interests here:  it needs to be able to provide
state-of-the-art optimization for modest-sized functions and it needs to
be able to properly process very large machine-generated functions using
reasonable resources. 

I believe that the default behavior for the compiler should be that
certain non-essential passes be skipped if a very large function is
encountered. 

There are two problems here:

1) defining the set of optimizations that need to be skipped.
2) defining the set of functions that trigger the special processing.


For (1) I would propose that three measures be made of each function. 
These measures should be made before inlining occurs. The three measures
are the number of variables, the number of statements, and the number of
basic blocks. 
Many of the gcc passes are non-linear in one or more of these measures,
and these passes should be skipped if one or more of these measures
exceeds some threshold.
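
To make this concrete, here is a minimal sketch of the measures and the
size check (purely illustrative; none of these names exist in gcc):

struct function_measures
{
  int num_vars;     /* number of variables */
  int num_stmts;    /* number of statements */
  int num_blocks;   /* number of basic blocks */
};

/* Return nonzero if any measure exceeds its threshold, i.e. the
   function is "very large" for the pass in question.  */
static int
function_is_very_large (const struct function_measures *fn,
                        int max_vars, int max_stmts, int max_blocks)
{
  return fn->num_vars > max_vars
         || fn->num_stmts > max_stmts
         || fn->num_blocks > max_blocks;
}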

For (2) I would propose that we add three new fields to the compilation
manager.  These fields would be null or zero if the optimization is
either essential or is only linear in the measure.  Otherwise, some
indication of either a threshold or the exponent of the growth is used
as the field. 

The compilation manager could then look at the options, in particular
the -O level and perhaps some new options to indicate that this is a
small machine or in the other extreme "optimize all functions come hell
or high water!!" and skip those passes which will cause performance
problems.
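
Continuing the sketch above, the per-pass fields and the check the
compilation manager would make might look like this (again, purely
illustrative; all names are made up):

struct pass_limits
{
  int max_vars;     /* zero if essential or only linear in variables */
  int max_stmts;    /* zero if essential or only linear in statements */
  int max_blocks;   /* zero if essential or only linear in blocks */
};

static int
should_run_pass (const struct pass_limits *pass,
                 const struct function_measures *fn,
                 int hell_or_high_water)
{
  if (hell_or_high_water)   /* "optimize all functions" option */
    return 1;
  return (pass->max_vars == 0 || fn->num_vars <= pass->max_vars)
         && (pass->max_stmts == 0 || fn->num_stmts <= pass->max_stmts)
         && (pass->max_blocks == 0 || fn->num_blocks <= pass->max_blocks);
}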

I do not claim to understand how sensitive every pass is to these
measures.  However, I could possibly make a good first cut on the rtl
passes. 

However, I think that before anyone starts hacking anything, we should
come to a consensus on an overall strategy and implement something
consistent for the entire compiler rather than attack some particular
pass in a manner that only gets us past the next PR. 

Volunteer(s) to implement the compilation manager part of this would
also be appreciated.

Kenny




Re: compiling very large functions.

2006-11-04 Thread Kenneth Zadeck
Richard Guenther wrote:
> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> I think that it is time that we in the GCC community took some time to
>> address the problem of compiling very large functions in a somewhat
>> systematic manner.
>>
>> GCC has two competing interests here:  it needs to be able to provide
>> state of the art optimization for modest sized functions and it needs to
>> be able to properly process very large machine generated functions using
>> reasonable resources.
>>
>> I believe that the default behavior for the compiler should be that
>> certain non essential passes be skipped if a very large function is
>> encountered.
>>
>> There are two problems here:
>>
>> 1) defining the set of optimizations that need to be skipped.
>> 2) defining the set of functions that trigger the special processing.
>>
>>
>> For (1) I would propose that three measures be made of each function.
>> These measures should be made before inlining occurs. The three measures
>> are the number of variables, the number of statements, and the number of
>> basic blocks.
>
> Why before inlining?  These three numbers can change quite significantly
> as a function passes through the pass pipeline.  So we should try to keep
> them up-to-date to have an accurate measurement.
>
I am flexible here.  We may want inlining to be able to update the
numbers.  However, I think that we should drive the inlining aggression
based on these numbers. 
> Otherwise the proposal sounds reasonable but we should make sure the
> limits we impose allow reproducible compilations for N x M cross
> configurations and native compilation on different sized machines.
>
I do not want to get into the game where we are looking at the size of
the machine and making this decision.  Doing that would make it hard to
reproduce bugs that come in from the field.  Thus, I think that the
limits (or functions) should be platform independent.

> Richard.



Re: compiling very large functions.

2006-11-04 Thread Kenneth Zadeck
Richard Guenther wrote:
> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> Richard Guenther wrote:
>> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> >> I think that it is time that we in the GCC community took some
>> time to
>> >> address the problem of compiling very large functions in a somewhat
>> >> systematic manner.
>> >>
>> >> GCC has two competing interests here:  it needs to be able to provide
>> >> state of the art optimization for modest sized functions and it
>> needs to
>> >> be able to properly process very large machine generated functions
>> using
>> >> reasonable resources.
>> >>
>> >> I believe that the default behavior for the compiler should be that
>> >> certain non essential passes be skipped if a very large function is
>> >> encountered.
>> >>
>> >> There are two problems here:
>> >>
>> >> 1) defining the set of optimizations that need to be skipped.
>> >> 2) defining the set of functions that trigger the special processing.
>> >>
>> >>
>> >> For (1) I would propose that three measures be made of each function.
>> >> These measures should be made before inlining occurs. The three
>> measures
>> >> are the number of variables, the number of statements, and the
>> number of
>> >> basic blocks.
>> >
>> > Why before inlining?  These three numbers can change quite
>> significantly
>> > as a function passes through the pass pipeline.  So we should try
>> to keep
>> > them up-to-date to have an accurate measurement.
>> >
>> I am flexible here. We may want inlining to be able to update the
>> numbers.  However, I think that we should drive the inlining agression
>> based on these numbers.
>
> Well, for example jump threading and tail duplication can cause these
> numbers to significantly change.  Also CFG instrumentation and PRE
> can increase the BB count.  So we need to deal with cases where an
> optimization produces an overly large number of basic blocks or
> instructions
> (like by throttling those passes properly).
>
I lean toward leaving the numbers static even if they do increase as time
goes by.  Otherwise you get two effects: the first optimizations get to
run more, and you get the weird non-linear step functions where small
changes in some upstream function affect the downstream ones.

kenny

> Richard.



Re: compiling very large functions.

2006-11-05 Thread Kenneth Zadeck
Paolo Bonzini wrote:
>
>> While I agree with you, I think that there are so many things we are
>> already trying to address, that this one can wait.  I think we've
>> been doing a very good job on large functions too, and I believe that
>> authors of very large functions are just getting not only what they
>> deserve, but actually what they expect: large compile times
>> (superlinear).
>
> Not to mention that these huge functions are usually central to the
> program.  If GCC decided that it is not worth optimizing the
> machine-generated bytecode interpreter of GNU Smalltalk, for example,
> I might as well rewrite it in assembly (or as a JIT compiler).  Same
> for interpret.cc in libjava, though it is a tad smaller than GNU
> Smalltalk's interpreter.
>
> Unlike the authors of other VM's, I have no problem writing code so
> that the *latest* version of GCC will do its best, instead of
> complaining that GCC compiles my code worse on every release.  So, I
> am ok with GCC doing stupid things because of bugs that I/we can fix,
> but not with GCC just giving up optimization on code that has always
> been compiled perfectly (in one/two minutes for about 30,000 lines of
> machine-generated code, despite being chock-full of computed gotos),
> that *can* be optimized very well, and that is central to the
> performance of a program.
>
> Paolo
I actually think that your Smalltalk example is the exception and not
the rule.  I would guess that the vast majority of very large functions
are machine-generated simulations where the optimizer most likely
provides little benefit. 

In the case of dataflow, reaching defs is much more expensive than
simple liveness, not because there are more bit vectors (there are
exactly the same number) but because there are an order of magnitude
more bits in those bit vectors than in live variables and it just takes
more time and space to move them around. 
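
To put made-up but representative numbers on that (liveness keeps one
bit per register per vector, reaching defs one bit per definition, and
a huge function has many more defs than registers):

#include <stdio.h>

int
main (void)
{
  long blocks = 10000;   /* basic blocks in a huge function */
  long regs   = 5000;    /* registers tracked by liveness */
  long defs   = 50000;   /* definitions tracked by reaching defs */

  /* Roughly four bit vectors (in, out, gen, kill) per block.  */
  printf ("liveness bits:      %ld\n", 4 * blocks * regs);
  printf ("reaching defs bits: %ld\n", 4 * blocks * defs);
  return 0;
}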

The thing is that even as memories get larger, something has to give. 
There are and will always be programs that are too large for the most
aggressive techniques and my proposal is simply a way to gracefully shed
the most expensive techniques as the programs get very large. 

The alternative is to just shelve these bugs and tell the
submitter not to use optimization on them.  I do not claim to know what
the right approach is.

kenny


Re: compiling very large functions.

2006-11-05 Thread Kenneth Zadeck
Eric Botcazou wrote:
>> AFAIK not one of the tree optimizers disables itself, but perhaps we
>> should. The obvious candidates would be the ones that require
>> recomputation of alias analysis, and the ones that don't update SSA
>> info on the fly (i.e. require update_ssa, which is a horrible compile
>> time hog).
>> 
>
> Tree alias analysis can partially disable itself though:
>
>   /* If the program has too many call-clobbered variables and/or function
>  calls, create .GLOBAL_VAR and use it to model call-clobbering
>  semantics at call sites.  This reduces the number of virtual operands
>  considerably, improving compile times at the expense of lost
>  aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration procedures 
> generated for Ada packages instantiating gazillions of generics.  We have 
> actually lowered the threshold locally.
>
>   
I would like to point out that the central point of my proposal was to
have the compilation manager be the process that manages whether an
optimization is skipped or not, rather than having each pass make a
decision on its own.  If we have a central mechanism, then it is
relatively easy to find some sweet spots.  If every pass rolls its own, it
is more difficult to balance.


Kenny



Re: compiling very large functions.

2006-11-06 Thread Kenneth Zadeck
Andrew MacLeod wrote:
> On Sat, 2006-11-04 at 15:17 -0500, Kenneth Zadeck wrote:
>> However, I think that before anyone starts hacking anything, we should
>> come to a consensus on an overall strategy and implement something
>> consistent for the entire compiler rather than attack some particular
>> pass in a manner that only gets us pass the next pr. 
>> 
>
> I think we really do need to deal with it on a per pass basis. We
> occasionally get a monster PR which shows half a dozen (or more) serious
> time hogs. We more often get a pathological case for a specific pass. 
>
> In each and every one of these cases, someone has taken a look at how to
> improve the time hog(s). Usually the result is a leaner/faster and
> improved pass, although it sometimes takes a new release for it to
> happen :-). Occasionally, we turn something off.   A couple of PRs with
> massive functions are primarily responsible for the pending changes to
> out-of-ssa... And those changes will benefit small functions as well as
> the large ones.  
>
>   
The problem with trying to solve this problem on a per-pass basis rather
than coming up with an integrated solution is that we are completely
leaving the user out of the thought process.

There are some users who have big machines or a lot of time on their
hands and want "damn the torpedoes, full speed ahead", and there are
some users who want reasonable decisions made even at high
optimization.  We need to give them an easy-to-turn knob. 

I am not saying that my original proposal was the best of all possible
worlds, but hacking things on a pass-by-pass or PR-by-PR basis
is not really solving the problem.

kenny






new DATAFLOW PORTING wiki available.

2007-01-15 Thread Kenneth Zadeck
I have posted a new wiki page intended to help port maintainers with
issues that may arise with the dataflow branch.  It can be found
at http://gcc.gnu.org/wiki/DataflowPorting.

Thanks,

Kenny


re: Some thoughts and questions about the data flow infrastructure

2007-02-12 Thread Kenneth Zadeck
>   On Sunday I had an accidental chat about the df infrastructure on
> IRC.  I've got some thoughts which I'd like to share.
> 
>   I have liked the df infrastructure code from day one for its clearness.
> Unfortunately users don't see it and probably don't care about it.
> From my point of view the df infrastructure has a design flaw.  It
> extracts a lot of information about RTL and keeps it on the side.  It
> does not make the code fast.  It would be ok if we got better code
> quality.  Danny told me that they have 1.5% better code using df.  It
> is a really big improvement (about half a year of work for the whole
> compiler team according to Proebsting's law).  IMHO, it could justify
> the promised 5% compiler slowness.
>
Vlad, 

I think that different people can have different perspectives.  

You have been working on improving the register allocation for several
years, but very little has come of it because the reload
infrastructure does not suit itself to being integrated with modern
register allocators.  You have spent several years of work without
touching the underlying problem that reload is generally going to
defeat almost any effort to get good benefits out of a new register
allocator.  I do not want to denigrate your work in any way, but at
the end of the day, any new register allocator will be compromised by
the existing reload implementation.

I am interested in bringing the rest of the back end into the modern
world.  While some of the passes can and should be moved into the ssa
middle end of the compiler, there are several optimizations that can
only be done after the details of the target have been fully exposed.

My experience with trying to do this was that the number one problem
was that the existing dataflow is in many cases wrong or too
conservative, and that it was not flexible enough to accommodate many of
the most modern optimization techniques.  So rather than hack around the
problem, I decided to attack the bad infrastructure problem first, and
open the way for myself and the others who work on the back end to
benefit from that infrastructure to get the rest of the passes into shape.

There are certainly performance issues here.  There are limits on how
much I, and the others who have worked on this have been able to
change before we do our merge.  So far, only those passes that were
directly hacked into flow, such as dce, and auto-inc-dec detection
have been rewritten from the ground up to fully utilize the new
framework.  

However, it had gotten to the point where the two frameworks really
should not coexist.  Both implementations expect to work in an
environment where the information is maintained from pass to pass and
doing this with two systems was not workable.  So the plan accepted by the
steering committee accommodates the wholesale replacement of the
dataflow analysis but even after the merge, there will still be many
passes that will be changed.

I would have liked to have the df information more tightly integrated
into the rtl rather than it being on the side.  It is cumbersome to
keep this information up to date.  However, the number of places in
the backends that depend on the existing rtl data structures apis make
such a replacement very difficult.

I do believe that by the time that we merge the branch, we will be
down to a 5% compile time regression.  While I would like this number
to be 0% or negative, I personally believe that having precise and correct
information is worth it and that over time we will be able to remove
that 5% penalty.  

As far as the other regressions, these will be dealt with very soon.  

Kenny



Re: Some thoughts and questions about the data flow infrastructure

2007-02-13 Thread Kenneth Zadeck
Vladimir N. Makarov wrote:
>> Vlad,
>> I think that different people can have different perspectives. 
>> You have been working on improving the register allocation for several
>> years, but very little has come of it because the reload
>> infrastructure does not suit itself to being integrated with modern
>> register allocators.  You have spent several years of work without
>> touching the underlying problem that reload is generally going to
>> defeat almost any effort to get good benefits out of a new register
>> allocator.  I do not want to denigrate your work in any way, but at
>> the end of the day, any new register allocator will be compromised by
>> the existing reload implementation.
>>
>
> Ken, to be exact I've been working for a bit more than 2 years on the
> register allocator itself.  Probably you don't know that I attacked exactly
> the underlying problem - the reload.  If you look at the YARA branch, it is
> a register allocator without the reload.  And I am really worrying that
> there is so little outcome.  But ridding us of the reload is such a complex
> problem.  I can not work on it for a few more years.  I need some result
> too.
>
> Using the experience I've got from the YARA branch I've created another
> register allocator (IRA branch) to make it ready for gcc-4.4.  IRA
> still uses reload.  But maybe I have higher standards.  I don't want
> to include the code for the sake of inclusion.  I have no worse code
> generated by IRA than the current register allocator.  The code size
> is smaller for most platforms.  For some platforms I have better
> generated code (up to 4% on SPECINT2000 in 32-bit mode: to be exact, 1930
> vs 1850 for Core2 according to this weekend's benchmarking).
>
> Actually I could make IRA ready for gcc-4.3.  It works for x86, x86_64,
> itanium, ppc, sparc, s390, arm.  It is optional so other platforms can
> use the current register allocator.  But I don't want to rush.
>
> But still you are right that the reload is compromising the generated
> code.  If you are interested in my opinion, the df infrastructure is a
> tiny part of the RA implementation problem (and as I understand of the
> insn scheduler and code selection too).  Actually IRA uses the df
> infrastructure but it can easily be switched to the old life analysis.
>
>> I am interested bringing the rest of the back end into the modern
>> world.  While some of the passes can and should be moved into the ssa
>> middle end of the compiler, there are several optimizations that can
>> only be done after the details of the target have been fully exposed.
>>
>
> Bringing the rest of the back end into the modern world is too challenging
> a task.  If you really want it, imho you should attack RTL and the machine
> descriptions.  But this task is a magnitude more difficult than
> introducing tree-SSA.
>
It is a hard project and you are right that replacing rtl would be
better.  However, I do not know how to do that, either from a logistical
point of view or from the point of view of having a better replacement 
that covers all of the platforms as well.

However there are a lot of sins in the back end and a large number of
them are either being directly addressed by this replacement or are now
accessible.  The addition of df will allow others to introduce better
technology.

>> My experience with trying to do this was that the number one problem
>> was that the existing dataflow is in many cases wrong or too
>> conservative and that it was not flexible enough to accommodate many
>> most modern optimization techniques.  So rather than hack around the
>> problem, I decided to attack the bad infrastructure problem first, and
>> open the way for myself and the others who work on the back end to
>> benefit from that infrastructure to get the rest of passes into shape.
>>
>
> I am not against a new, more accurate DF infrastructure.  I am
> against a slower infrastructure.
>
>> There are certainly performance issues here.  There are limits on
>> how much I, and the others who have worked on this have been able to
>> change before we do our merge.  So far, only those passes that were
>> directly hacked into flow, such as dce, and auto-inc-dec detection
>> have been rewritten from the ground up to fully utilize the new
>> framework.  However, it had gotten to the point where the two
>> frameworks really should not coexist.  Both implementations expect
>> to work in an environment where the information is maintained from
>> pass to pass and doing with two systems was not workable.  So the
>> plan accepted by the steering committee accommodates the wholesale
>> replacement of the dataflow analysis but even after the merge, there
>> will still be many passes that will be changed.
>
> Does it mean that the compiler will be even slower?
>
>> I would have liked
>> to have the df information more tightly integrated into the rtl
>> rather than it being on the side.  It is cumbersome to keep this
>> information up to date.  However, the number of places in the
>> backends that depend on the existing rtl data structures apis make
>> such a replacement very difficult.

Re: Some thoughts and questions about the data flow infrastructure

2007-02-13 Thread Kenneth Zadeck
Richard Kenner wrote:
>> Regs_ever_live is the poster child of this.  In theory regs_ever_live is
>> easy, it is just the set of hard registers that are used.  In practice
>> this is a disaster to keep track of because it was only updated
>> occasionally and its values are "randomly" changed by the backends in
>> totally undocumented ways.  Maintaining regs_ever_live requires a lot of
>> special mechanism that slows down a the incremental scanning.  
>> 
>
> The history here, by the way, is that it was originally very simple and just
> supposed to provide a "quick and easy" way of having a conservative view of
> which registers *weren't* ever used.  So it was set when a register might
> possibly be used.  That was indeed easy.
>
> But then people wanted to be able to know *for sure* which registers were
> used, so mechanisms were added to clear it out when we knew a register
> *wasn't* used, which added the complexity you mention.
>
> This is a problem with a lot of the ad-hoc structures used: they were
> originally meant for one specific purpose and often used very locally and
> were reasonably well-designed for that purpose, but then were used more
> globally and/or for other purposes and they weren't quite so well designed
> for that purpose anymore, but nobody went to the trouble to change them.
>
> I strongly support a new, common infrastructure that will allow all of these
> older structures to be replaced.  But the history is important in my opinion
> because it means that we need to think as generally as possible and to ensure
> we come up with as broad a structure as possible in order both to replace the
> current structures, but also to support many other uses in the future.  From
> what I understand, the current mechanism does that, but I think it's
> important to keep this criterion in mind when evaluating any possible
> "competitors".
>   
I very much understand the "I just need to make this one little tweak"
road to hell.  By definition, this was no one's plan.  It will most
likely take us all of stage 2 to understand and replace all of the one
little tweaks with respect to regs_ever_live. 


RE: new auto-inc-dec pass

2007-03-05 Thread Kenneth Zadeck
> Hi,
> 
> I'm currently looking at the dataflow branch for m68k, mainly because of 
> the new auto-inc-dec pass, I worked a bit on the old code in flow.c, but 
> considering the new pass, I think it doesn't make much sense to continue 
> it.
> 
I agree.  With high probability the auto inc detection will be
replaced with the new auto-inc-dec.c on the dataflow branch.

I would certainly encourage you to enhance the new code in
auto-inc-dec.c where you find that it is lacking.  That code does get
all of the cases that were found by the old version of flow and does
get many things that were missed.  

But there is no expectation that this is the last word in auto-inc-dec
detection.  My main concern was to cleanly separate this function from
flow without any regressions.


> 
> 
> I'm attaching my current patch and some test cases, but before delving 
> deeper into the code, I have a few questions and I'd like to hear some 
> opinions. Currently I only looked at the output and it misses some 
> opportunities (not that my patch can do everything either :) ).

I agree that the auto-inc-dec code will already get much of this.  In
particular there is no longer (unless the machine requires it) a
reliance that the increment match the size of the load or store. 
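
For instance, a loop like the following is the classic candidate; on a
machine with post-increment addressing both pointer updates should be
folded into the memory accesses (illustrative source, not from the
patch):

void
copy (int *dst, const int *src, int n)
{
  while (n-- > 0)
    *dst++ = *src++;   /* both accesses can use (post_inc (reg)) */
}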
> 
> One case is about multiple increments, the tree optimizer merges them and 
> increments the register only once, so if one only looks at the size of the
> pointer value one misses them, e.g. something like this:
> 
>   (set (mem (reg)) (x))
>   (set (mem (plus (reg) (const_int 4))) (x))
>   (set (reg) (plus (reg) (const_int 8))
> 
> Another case are multiple uses of the register within an instruction, the 
> old code didn't do this at all, the new code seems to do a bit better, but 
> there as a rather tricky case I looked into, e.g.
> 
>   (set (mem) (plus (mem) (reg)))
> 
> could be turned into:
> 
>   (set (post_inc (mem)) (plus (mem) (reg)))
> 
I do find this a little weird, but if it makes a difference go for it.


> or
> 
>   (set (mem) (plus (pre_dec (mem)) (reg)))
> 
> This is at least what operands_match_p() accepts, but hasn't been 
> generated in a long time and has the potential to trigger bugs in the 
> back end (e.g. m68k only looks at one of the operands).
> A question now would be whether there is a general interest in reactivating 
> this, so gcc could generate an instruction like "add.l %d0,(%a0)+" for 
> m68k.
> 
> The more general question is whether someone is already working on further 
> improvements, so I won't duplicate anything (which wouldn't be before I 
> looked into the remaining m68k dataflow problems anyway. :) ).
> 
We have stopped development on this piece of code.  It is efficient
and works with no regressions on arm, ppc and ia-64.  So it meets the
minimum criteria for inclusion onto mainline with the upcoming merge.

However, that should in no way stop anyone else from enhancing it further.

> The dataflow porting document mentions other possible issues, but doesn't 
> give any examples.  Anything I might have to look out for regarding the 
> m68k post_inc/pre_dec addressing modes?
> 
> bye, Roman

Kenny


Re: [DataFlow Branch] Query regarding an ICE in global.c

2007-04-18 Thread Kenneth Zadeck
Ramana Radhakrishnan wrote:
> Hi , 
>
> I am working on integrating a private port into the Dataflow branch and
> am running into a couple of issues with ICEs in global.c  . The ICE is
> at gcc_assert ( REGS_LIVE (i)) at line 534 in global_alloc in
> global.c .. 
>
> However, because of the way we generate calls in the backend, there is an
> extra clobber register for calculating the return address.  The temporary
> which gets clobbered is here:
>
> (call_insn:HI 32 29 34
> 4 ../../../../gnu/libgcc/../gcc/unwind-dw2-fde.c:487 (parallel [
> (set (reg:SI 1 $r1)
> (call (mem:SI (reg/v/f:SI 145 [ fde_compare ]) [0 S4
> A32])
> (const_int 0 [0x0])))
> (use (const_int 0 [0x0]))
> (clobber (reg:SI 31 $link))
> (clobber (reg:SI 153))
> ]) 40 {call_value_indirect} (expr_list:REG_DEAD (reg:SI 3 $c3)
> (expr_list:REG_DEAD (reg:SI 2 $r2)
> (nil)))
> (expr_list:REG_DEP_TRUE (use (reg:SI 3 $r3))
> (expr_list:REG_DEP_TRUE (use (reg:SI 2 $r2))
> (expr_list:REG_DEP_TRUE (use (reg:SI 1 $r1 [ ob ]))
> (nil)
>
>
>
>
> Is it something to do with the way that REGS_EVER_LIVE is calculated ?
> If so where does this get updated in the new infrastructure. I am still
> feeling my way through the df code , so any help would be appreciated.
> Thanks in advance. 
>
> This is with revision 123253 of the DF branch which is slightly older .
> I am considering moving up but wanted to check before that . 
>
>
> cheers
> Ramana
>
>
>
>   
There may have been a few fixes that might help so you should try
updating first. 
There is a fix that causes the REG_N... information to be rebuilt at
the start of global rather than reusing the info computed during local. 
If you are adding this pseudo very late, this will likely fix the problem.

Fixing your kind of bug was not the reason for this fix.  The fix
solves an issue where the set of regs live across a call computed during
local is different from the way it is computed during global. 

This kind of information is built by the RI problem when df_analyze is
called and that problem is not resolved when df_analyze is called inside
global.

Hope this fixes your issue.

Kenny



Re: Canonical form of the RTL CFG for an IF-THEN-ELSE block?

2005-04-09 Thread Kenneth Zadeck

Jan Hubicka wrote:
Hi,
We would like to know if there is some way to find the true and false
branches of a conditional jump in RTL.  In the tree CFG, we have two
edge flags for that, EDGE_{TRUE,FALSE}_VALUE, but those flags have no
meaning for the RTL CFG.  So our question is, is there some other way
to tell what edge will be taken in a conditional jump if the condition
is true?
It seems that some passes assume a canonical form of IF-THEN-ELSE even
on RTL.  From ifcvt.c:find_if_header:
 /* The THEN edge is canonically the one that falls through.  */
 if (then_edge->flags & EDGE_FALLTHRU)
   ;
 else if (else_edge->flags & EDGE_FALLTHRU)
   {
 edge e = else_edge;
 else_edge = then_edge;
 then_edge = e;
   }
 else
   /* Otherwise this must be a multiway branch of some sort.  */
   return NULL;
On the other hand, in cfgexpand.c:expand_gimple_cond_expr we have,
 false_edge->flags |= EDGE_FALLTHRU;
and loop-unswitch.c assumes that the BRANCH_EDGE is the true_edge:
 true_edge = BRANCH_EDGE (unswitch_on_alt);
 false_edge = FALLTHRU_EDGE (unswitch_on);
So which is it?  Is BRANCH_EDGE always taken if the condition is true,
or FALLTHRU_EDGE, or do you have to look at the condition to know?
Who knows an answer?  :-)
   

:) It depends on how the conditional is constructed.
If you use get_condition the edge taken when conditional is true is
always BRANCH_EDGE if some exists (it is possible to have conditional
jump to the following instruction where you have only one edge with
EDGE_FALLTHRU flag).
Otherwise you have to look into conditional jump RTL yourself to figure
out if it has form
(set (pc) (if_then_else (cond) (pc) (label_ref xxx))
or
(set (pc) (if_then_else (cond) (label_ref xxx) (pc))
In the first case we are taking the branch edge when the conditional is false.
 

It seems there are two approaches to solving this problem, constructive 
and preservative.   According to danny, the constructive approach is 
what is used in the rtl level if-conversion and is dead slow.  In the 
preservative approach, you keep the markings that are inserted from the 
tree code and track their changes as the rtl gets modified.  This 
assumes that we do not throw away the cfg right before rtl generation, 
which may be a big assumption, or at least that you add notes in the rtl 
that preserve this information for later reconstruction.

I have to admit that I am always a fan of preserving information rather 
than reconstructing it.  This need to keep rescanning the 
intermediate representation over and over again is a big flaw in gcc and 
hurts performance. 

kenny

Honza
 

Gr.
Steven
   



Re: Proposed resolution to aliasing issue.

2005-05-18 Thread Kenneth Zadeck
> Mark Mitchell wrote:
> >
> >  struct A {...};
> >  struct B { ...; struct A a; ...; };
> >
> >
> >  void f() {
> >B b;
> >g(&b.a);
> >  }
> >
> >
> >does the compiler have to assume that "g" may access the parts of
> >"b" outside of "a". If the compiler can see the body of "g" than
> >it may be able to figure out that it can't access any other
> >parts, or figure out which parts it can access, and in that case
> >it can of course use that information. The interesting case,
> >therefore, is when the body of "g" is not available, or is
> >insufficient to make a conclusive determination.
> >
> 
> I attended a UK C++ panel meeting yesterday, and took the opportunity
> to solicit opinions on this.  The question I posed was
>   struct A {
>   ...
>   T1 a;
>   T2 b;
>   };
>   void g(T1 &a);
>   void Foo () {
>  A v;
>  v.b = 2;
>  g (v.a);
>  if (v.b == 2) ...
> }
> Does the compiler have a right to presume v.b does indeed == 2 in the if
> condition? -- assuming T2 is a suitable type for that :)
> 
> 
> After I explained the optimization (and the related structure splitting
> optimization), the general consensus was 'yes that would be a useful
> optimization'.  But no one was sufficiently confident of proving it
> was allowable.  The opinion was expressed that if it was not allowable,
> there was a bug in the std.
> 
> 
> The observation was made that if A is non-POD, one cannot play offsetof
> tricks to get from A::a to A::b, so the optimization is safe on non-PODs.
> (Of course one would have to prove the address of 'v' did not escape,
> so I guess the ctor and dtor would need to be trivial or visible.)
> 
> 
> One of the panel members is looking at C++'s memory model WRT
> multithreading.  This question has a direct impact there too, as
> separate threads might be operating on v.a and v.b.  He indicated
> he would consider the issue.
> 
> 
> I also outlined the approach gcc would take with a compile time
> switch and an attribute.  The preference was expressed that
> the attribute should be of the form
>   void badfunc (A & __I_AM_BAD__ m);
> rather than
>   void goodfunc (A & __I_AM_GOOD__ m);
> because (a) badfuncs are more than likely rare and (b) it would be a useful
> aid to the programmer.[1] Mark outlines an __I_AM_GOOD__ attribute,  I think
> it would be better to have both flavours and then the compiler switch can
> specify which way the default goes.
> 

I would like to point out some problems with this approach.  Consider
the case where you have three modules: A, B and C, each with a single
function (a, b, and c respectively), where a calls b and b calls c.  Also
assume that c has the __I_AM_BAD__ attribute.

What is known when compiling the A module?  Function a does not know
that b calls c.  Are you going to require that b's prototype also have
the __I_AM_BAD__ attribute because it calls c?  Where does this stop?

I believe that the only way to have this work is to have the attribute
be associated with the data type.  This attribute discriminates this
type from a similar type that does not have the attribute, i.e. you
cannot just assign a pointer to a bad int to a pointer to an int.

This will force the programmer to have separate versions of functions
that take the bad pointer and the good pointer, but this lets the
compiler compile the good functions in a manner that rewards the
good.  It is also easy to track through the maze of separate compilation.
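
A sketch of what I mean, with the caveat that the attribute name is
invented and the type rule is hypothetical:

/* "escapes" is a made-up attribute; the proposed rule is that it
   makes the pointer type distinct from a plain int *.  */
typedef int bad_int __attribute__ ((escapes));

void good_func (int *p);      /* may assume p stays inside its object */
void bad_func (bad_int *p);   /* may reach outside the object */

/* Under the proposed rule, passing an int * where a bad_int * is
   expected (or vice versa) is a type error, so the property follows
   the data through separate compilation.  */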

Kenny

Disclaimer:  I have never written a single line of C++ in my life nor
have I ever read any C++ books or specifications.   I am firmly rooted
in the ML, Modula-3 and Java school of strongly typed, well defined
languages.

> 
> nathan
> 
> [1] it was of course noted that that looked stunningly like 'restrict', and
> the suggestion that it be spelled 'noalias' was jokingly made :)
> 
> 
> --
> Nathan Sidwell::   http://www.codesourcery.com   :: CodeSourcery LLC
> [EMAIL PROTECTED]::
> http://www.planetfall.pwp.blueyonder.co.uk



Re: Existing tree functionality?

2005-07-06 Thread Kenneth Zadeck
I hope to have my code checked in today. 
Look in ipa-reference.c


Kenny

Daniel Berlin wrote:

Most of this can be found in the cgraph nodes.  The rest requires
scanning the IL.

Ken Zadeck should have code to do this.


On Wed, 2005-07-06 at 08:32 -0400, Michael Tegtmeyer wrote:
 

Hello,

Is there existing functionality somewhere to sweep a function and collect 
all externally visible variables at the tree level or do I need to roll my 
own? I've looked in tree.h and grepped around as much as I could but I 
haven't found anything obvious.


Thanks in advance,
Mike Tegtmeyer
   


why are there many copies of function_decls that have the same information including the same uid?

2005-07-11 Thread Kenneth Zadeck

Is this a bug or a feature?


Re: why are there many copies of function_decls that have the same information including the same uid?

2005-07-11 Thread Kenneth Zadeck

I was wrong, I misread some debugging output.

Sorry,

kenny

Andrew Pinski wrote:


On Jul 11, 2005, at 8:50 AM, Daniel Berlin wrote:


On Mon, 2005-07-11 at 08:40 -0400, Kenneth Zadeck wrote:

Is this a bug or a feature?

Bug.
where is it occurring?


I want to say the C++ front-end since that is where a couple of
ICEs have showed up because of that.

-- Pinski


more on duplicate decls

2005-07-12 Thread Kenneth Zadeck
I have been trying to clean up the location where the persistent ipa 
information is stored.


The original way that I did this was that I had a field in the var_ann 
structure that pointed to the ipa_information. 

Now that danny has reorganized the decls, the plan was to just add a 
field to the function_decl.


Danny has suggested that if I was going to add such a field, it should 
be a pointer to the cgraph_node and then store the persistent ipa 
information there.


I noticed that if I did this, there would be a lot of stuff that could 
fall away.  There would be no need to have the hash table that points 
from function_decls to cgraph nodes, nor any need for the 
master_clone field, since the cgraph node that the decl pointed to would 
be just as good.  All of this sounds great... smaller,  faster.


This has not gone as easily as I had hoped.  Danny found the first 
stumbling block yesterday: the C decl merging needed to be enhanced.

The second bug appears to be harder.  In c++, there really can be 
multiple function_decls with the same uid.


The kludge to handle them is documented in cp/name-lookup.c around line 
670 and one test case to generate them is in 
gcc/testsuite/g++.dg/opt/inline2.C.
There are several ways to go about fixing this:  One is to just abandon 
my fix and let others (such as the CodeSourcery people) deal with it 
when they go to build a real symbol table.


It seems like a better way is to build a table of merges that need to be 
done and find some place after the c++ front end is finished but before 
the cgraph gets built and do one scan of the code and replace all of the 
offending decls. 
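
Something along these lines is what I have in mind (illustrative sketch
only, not real tree-walking code):

/* Map each uid to a single canonical decl; the first copy seen wins.
   The scan over the translation unit would then replace every decl
   reference R with canonicalize_decl (R).  */

#define MAX_UID 65536

struct decl { int uid; /* ... */ };

static struct decl *canonical_decl[MAX_UID];

static struct decl *
canonicalize_decl (struct decl *d)
{
  if (canonical_decl[d->uid] == 0)
    canonical_decl[d->uid] = d;
  return canonical_decl[d->uid];
}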


Does such a location exist?

Is this the correct plan?

Kenny




Re: more on duplicate decls

2005-07-13 Thread Kenneth Zadeck



Mark Mitchell wrote:

Kenneth Zadeck wrote:

The kludge to handle them is documented in cp/name-lookup.c around 
line 670


Ugh.

IMO, the right thing here is that there should be only one 
FUNCTION_DECL for a given function, ever, period.  Default arguments 
are not part of "the function" in C++; they are an aspect of 
particular declarations of the function.  The problem we're having is 
that we associate them with the FUNCTION_DECL, rather than with the 
bindings (mappings from names to FUNCTION_DECLs).


Unfortunately, fixing that properly is a rather large change.

It seems like a better way is to build a table of merges that need to 
be done and find some place after the c++ front end is finished but 
before the cgraph gets built and do one scan of the code and replace 
all of the offending decls.


After the C++ front end is done, it would be OK to canonicalize all 
FUNCTION_DECLs with a single UID into one.  A stupid way of doing that 
would be to just walk the entire translation unit, and whenever you 
find a reference to a non-canonical copy of the FUNCTION_DECL, 
re-point it at the right one.


The C++ front end always operates in unit-at-a-time mode, so, yes, you 
could do this when the C++ front end is finished.
Are you saying that if I wait until the front end is finished, I do 
not have to worry about the default args problem you mention above?


Should I fix it this simple way or should I let a c++ front end person 
fix it as the decls are created?


kenny


Re: tree-ipa-type-escape slow

2005-08-30 Thread Kenneth Zadeck

the splay tree in question is there as part of the type unification.

This is required because our design for combining modules does not unify 
the types and this messes up the type escape analysis. 

Because of this, the type name must be the key.  However, some 
special treatment for unnamed types may be possible.  I did not look 
into it.  I do not understand the front ends enough to know how often 
these are created. 


Is this a C++ issue, or an issue associated with only this benchmark?

kenny



Richard Guenther wrote:

I'm seeing splay tree usage from tree-ipa-type-escape in the first
places of profiles from tramp3d-v4:

samples  %        image name     app name    symbol name
141814   6.5462   no-vmlinux     no-vmlinux  (no symbols)
71471    3.2992   cc1plus        cc1plus     splay_tree_splay_helper
51114    2.3595   cc1plus        cc1plus     comptypes
49001    2.2619   cc1plus        cc1plus     compute_may_aliases

---
  11160  15.6147  cc1plus        cc1plus     splay_tree_splay
  59855  83.7473  cc1plus        cc1plus     splay_tree_splay_helper
71471    3.2992   cc1plus        cc1plus     splay_tree_splay_helper
  71471  43.1489  cc1plus        cc1plus     splay_tree_splay_helper [self]
  59855  36.1360  cc1plus        cc1plus     splay_tree_splay_helper
  12966   7.8279  libc-2.3.5.so  cc1plus     strcmp
  11404   6.8849  cc1plus        cc1plus     compare_type_brand
   8650   5.      cc1plus        cc1plus     splay_tree_compare_pointers

---

looking into the ipa-type-escape code I wonder if there's a reason
to use TYPE_NAME as a key instead of TYPE_UID.  Also if TYPE_NAME
is necessary, instead of using "" using NULL and special
casing that will be a lot faster, too.  Can you elaborate on what
exactly the datastructure is used for?  Maybe for a large number
of types it isn't really the most efficient one.

Thanks,
Richard.
 


Re: [RFC] propagating loop dependences from trees to RTL (for SMS)

2005-09-22 Thread Kenneth Zadeck
I will pull a patch together tomorrow.  There is currently nothing in 
the code for keeping the region stuff up to date as changes are made to 
the cfg.  For most changes this would not be hard, but for some it is 
really hard. 


Daniel Berlin wrote:

On Thu, 2005-09-22 at 18:49 -0700, Devang Patel wrote:
 

On Sep 22, 2005, at 2:32 AM, Steven Bosscher wrote:

   
On Sep 22, 2005 11:25 AM, Zdenek Dvorak  
<[EMAIL PROTECTED]> wrote:


 

4. Other ideas?

 
Preserving the information about loops throughout the  
optimizations, and
just keeping this information attached at the loop description  
would by
far be the cleanest approach -- admitedly also the one that  
requires the

greatest amount of work.

   

Shouldn't the regions patch allow us to preserve loops quite easily?
 

Any pointer to this "regions patch" ?
Thanks,
   


Ask kenny for a copy.

 


Re: Link-time optimization

2005-11-17 Thread Kenneth Zadeck
On Wed, Nov 16, 2005 at 02:26:28PM -0800, Mark Mitchell wrote:
> >   http://gcc.gnu.org/projects/lto/lto.pdf
> 
> Section 4.2
> 
>   What is the rationale for using a stack-based representation rather
>   than a register-based representation?  A infinite register based
>   solution would seem to map better onto gimple, making it easier to
>   understand the conversion into and out of.  The required stacking
>   and unstacking operations would seem to get in the way.
> 
>   C.f. plan9's inferno, rather than jvm or cil.

What we wanted was something that was portable and easy to specify but
also easily exported from and imported into GCC.  With respect to
those constraints, JVM, CIL and LLVM do not make the cut because they
are different enough from the tree-gimple code that they would make
the translation into and/or out of gcc difficult.  This is not NIH,
just being pragmatic about not wanting to have to commit resources into
things that are not important.

A stack machine representation was chosen for the same reason.  Tree-
gimple is a series of statements, each statement being a tree.
Smashing the trees and introducing temps is easy on the output side
but requires a lot of work on the input side.  I am not a fan of our
tree representations, but I did not believe that changing them should
be necessary to get to link time optimization.  If we decide we want
to get rid of trees as an intermediate form, this decision should
change.

A well designed stack machine also provides for a very tight encoding.
It is very desirable from a size of portable code point of view to
minimize the number of temps that you create.  You can do this in
several ways:

1) Do some register allocation of the temps so that they are reused.
   This is non-trivial to undo (but truly doable), especially where
   you wish to not adversely impact debugging.

2) Just generate a lot of temps and hope that some other form of
   compression will save the day.

3) Use a stack to represent the intermediate nodes of the tree.  This
   is what we chose to do.

It is trivial to generate the stack code from a single walk of the
tree.  It is trivial to regenerate the tree from a single pass over
the stack code.
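
The single walk is just a postorder traversal; as a tiny illustration
(not the real GVM encoder):

#include <stdio.h>

struct expr
{
  char op;                 /* '+', '*', ... or 0 for a leaf */
  int temp;                /* temp/variable number when op == 0 */
  struct expr *lhs, *rhs;
};

/* Postorder walk: operands first, then the operator, which pops two
   entries and pushes one.  Rebuilding the tree is the mirror image.  */
static void
emit_stack_code (const struct expr *e)
{
  if (e->op == 0)
    printf ("  push t%d\n", e->temp);
  else
    {
      emit_stack_code (e->lhs);
      emit_stack_code (e->rhs);
      printf ("  %c\n", e->op);
    }
}

int
main (void)
{
  struct expr a = { 0, 1, 0, 0 }, b = { 0, 2, 0, 0 };
  struct expr sum = { '+', 0, &a, &b };
  emit_stack_code (&sum);   /* prints: push t1, push t2, + */
  return 0;
}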

The stack machine that we have in mind will be as stripped down as
possible.  The idea is just to get the trees in and get them back out.


Kenny



Re: Link-time optimization

2005-11-17 Thread Kenneth Zadeck
Mark Mitchell <[EMAIL PROTECTED]> writes:

>   http://gcc.gnu.org/projects/lto/lto.pdf

> 
> Section 4.2 (Executable Representation) describes the GVM as a stack
> machine, and mentions load, store, duplicate, and swap operations.
> But it also discusses having registers which correspond to GIMPLE
> local variables.  The way I put this together is that a MODIFY_EXPR
> which sets a local variable, say, will be converted to something that
> computes the expression using a stack and then assigns the value to a
> register.  It's easy to see how to convert such a MODIFY_EXPR into a
> stack machine and back.  But it's not easy to see why that stack
> machine is going to need, e.g, a swap operation.  There is no GIMPLE
> swap operator.  So I may still be confused about something.

You are most likely correct.  We either do not need a swap or do not
want to encourage people to try to optimize the trees so much that
they need a swap.
> 
> Ian



Re: Link-time optimization

2005-11-17 Thread Kenneth Zadeck
> >> Thanks for woking on this. Any specific reason why using the LLVM
> >> bytecode wasn't taken into account?
> >
> > It was.
> > A large number of alternatives were explored, including CIL, the JVM,
> > LLVM, etc.
> >
> >> It is proven to be stable, high-level enough to
> >> perform any kind of needed optimization,
> >
> > This is not true, unfortunately.
> > That's why it is called "low level virtual machine".
> > It doesn't have things we'd like to do high level optimizations on,
> > like dynamic_cast removal, etc.
> 
> 
> Anyway, *slightly* extending an existing VM which already exists, is
> production-ready, is GPL compatible, is supported by a full toolchain
> (including interpreters, disassemblers, jitters, loaders, optimizers...) looks
> like a much better deal. Also, I'm sure Chris would be willing to provide us
> with all the needed help.
> 
> I also think CIL would have egregiously worked.  I'm sure the reasons to
> refuse it are more political than technical, so it's useless to go into
> further details, I presume.

I do not think that CIL really would have worked.  CIL has been
carefully crafted to support what Microsoft wants it to support and
unrestricted C, C++, and Fortran are not in that mix.  

Remember that Microsoft wants to be able to take a CIL program
produced on one (?Microsoft?) platform and run it on another
(?Microsoft?) platform.  By the time we get to where we want to
produce the interchangeable code, the il has a lot of platform knowledge
(for instance generated by the C preprocessor) that cannot be easily
accounted for.
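
A trivial example of the kind of platform knowledge that is already
baked in by the time the il exists:

#include <stddef.h>

size_t
payload_words (void)
{
#ifdef __LP64__
  return 2;    /* layout decision made at preprocessing time... */
#else
  return 4;    /* ...so the il differs per target before any
                  optimization happens.  */
#endif
}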


> 
> Giovanni Bajo



Re: LLVM/GCC Integration Proposal

2005-11-19 Thread Kenneth Zadeck

>  Specifically, advocates of the recent link-time-optimization proposal
>  [1] may claim that exposing all of the information that DWARF captures
>  (e.g. about the type system) at link-time is a good thing: it makes it
>  easier to do specific high-level optimizations, because all of the
>  information is trivially available. However, they are ignoring the
>  issue of handling multiple languages at the same time: an optimizer
>  with this design has to be aware of (and be able to update) all
>  possible source-language information and must be extended in the
>  future as new languages are added (this is the problem with universal
>  compilers, dating back to the UNCOL design). Also, the linker needs to
>  know how to combine declarations defined in different languages,
>  because an interprocedural optimizer only want to see one declaration
>  for each logical variable (for example). This quickly becomes
>  difficult and messy, which is presumably why the link-time proposal
>  allows the linker to "give up" linking two translation units.

The complexity of the type system handling in our
proposal was motivated primarily by two concerns:

1) We need to keep track of the types so that they are available for
   debugging.  As LLVM does not support debuggers at this point, it
   may be difficult to see through all of the issues required to keep
   track of the information while still doing good optimization.
   In the end we will have to produce a set of debugging info for the
   optimized program; losing all the type information before you
   start did not seem the correct plan.

2) GCC has a very large user community that knows nothing about
   compilers except how to shepherd their programs through GCC and
   run the result.  Unless I am mistaken about the LLVM, it has no
   such user community.  I personally cannot guarantee that GCC (or
   for that matter any optimizing compiler) can correctly cross inline
   and compile a program if the types in one module are not consistent
   with the types in another module.  Just because the program
   happens to work correctly when separately compiled is not enough
   (there is a small example after this list).

   When Mark and I started working on this proposal (and later the
   rest of the volunteers) we decided that this was not going to be
   either an academic exercise or just something to run benchmarks.
   What that means to me is that the link time optimizer needs to be
   able to either generate correct code or give up in some predictable
   manner.  Having the compiler push forward and hope everything
   turns out OK is not enough.  Discretion is the better part
   of valor.
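
Here is the small example promised in point 2: separately compiled
this may happen to run on some targets, but it cannot be cross-inlined
correctly, so the link time optimizer must notice the mismatch and
give up:

/* module1.c */
extern int counter;
int get_counter (void) { return counter; }

/* module2.c -- the same symbol with an inconsistent type.  */
long counter = 0;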

I think that taking advantage of mixed C, C++ or C and Fortran
programs is going to be hard.  But it is what the GCC customers want
and there is a desire to accommodate them if possible.

On the positive side, LLVM has a lot more experience in managing whole
program compilation, and it is a much cleaner software base.  It would
be good to find some mechanism to take advantage of that experience.
Trying to make the hippo dance is not really a lot of fun.


Kenny



question about registers that are set on entry to a function.

2006-01-05 Thread Kenneth Zadeck

Ian and Honza,

Here is a first draft of a function to find the set of registers 
that are (or will be) defined on entry to a function.  I generated this 
from our conversation on chat and by looking around. 

This function is necessary in the new version of df because, unlike 
flow, we do both forwards and backward propagation to get better 
information.  Thus there is no code in flow to give me a hint about 
this.  Furthermore, the other places that do forwards analysis, like 
make_accurate_live_analysis, punt on the hard registers and just assume 
that they are all live on entry to the function.


1) Do you believe that this code could be correct?
2) Most of the paragraphs are protected by either reload_completed or 
epilogue_completed (which has a comment at its point of declaration 
that it is true for both the prologue and epilogue).  Is it true that 
after these epochs, that there will really be code in the first basic 
block that will def these registers?  I.e. are any of them implicitly 
set before the function is called so I will not find a def?

3) Am I missing any classes of registers?

Anyone else who has any opinions (or secret, undocumented knowledge of 
ports) is encouraged to answer back also.


Thanks,

Kenny



/* Record the (conservative) set of hard registers that are defined on
   entry to the function.  */

static void
df_record_entry_block_defs (struct dataflow *dflow)
{
  unsigned int i;
  bitmap_iterator bi;
  rtx r;
  struct df *df = dflow->df;

  bitmap_clear (df->entry_block_defs);

  if (! (df->flags & DF_HARD_REGS))
    return;

  /* Once the prologue has been generated, all of these registers
     should just show up in the first regular block.  */
  if (HAVE_prologue && epilogue_completed)
    {
      for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
        {
          if (FUNCTION_ARG_REGNO_P (i))
            bitmap_set_bit (df->entry_block_defs, i);
        }

#ifdef EH_USES
      for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
        if (EH_USES (i))
          {
            bitmap_set_bit (df->entry_block_defs, i);
          }
#endif

      if (REG_P (INCOMING_RETURN_ADDR_RTX))
        bitmap_set_bit (df->entry_block_defs,
                        REGNO (INCOMING_RETURN_ADDR_RTX));

      /* If STATIC_CHAIN_INCOMING_REGNUM == STATIC_CHAIN_REGNUM
         only STATIC_CHAIN_REGNUM is defined.  If they are different,
         we only care about the STATIC_CHAIN_INCOMING_REGNUM.  */
#ifdef STATIC_CHAIN_INCOMING_REGNUM
      bitmap_set_bit (df->entry_block_defs, STATIC_CHAIN_INCOMING_REGNUM);
#else
#ifdef STATIC_CHAIN_REGNUM
      bitmap_set_bit (df->entry_block_defs, STATIC_CHAIN_REGNUM);
#endif
#endif

      r = TARGET_STRUCT_VALUE_RTX (current_function_decl, true);
      if (r && REG_P (r))
        bitmap_set_bit (df->entry_block_defs, REGNO (r));
    }

  /* These registers are live everywhere.  */
  if (! reload_completed)
    {
      /* Any reference to any pseudo before reload is a potential
         reference of the frame pointer.  */
      bitmap_set_bit (df->entry_block_defs, FRAME_POINTER_REGNUM);

#if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
      /* Pseudos with argument area equivalences may require
         reloading via the argument pointer.  */
      if (fixed_regs[ARG_POINTER_REGNUM])
        bitmap_set_bit (df->entry_block_defs, ARG_POINTER_REGNUM);
#endif

      /* Any constant, or pseudo with constant equivalences, may
         require reloading from memory using the pic register.  */
      if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
          && fixed_regs[PIC_OFFSET_TABLE_REGNUM])
        bitmap_set_bit (df->entry_block_defs, PIC_OFFSET_TABLE_REGNUM);
    }

  EXECUTE_IF_SET_IN_BITMAP (df->entry_block_defs, 0, i, bi)
    {
      rtx def = df_reg_def_gen (true, i);
      df_ref_create_structure (dflow, regno_reg_rtx[i],
                               &XEXP (def, 0), ENTRY_BLOCK_PTR, NULL,
                               DF_REF_REG_DEF, DF_REF_ARTIFICIAL);
    }
}






Re: question about registers that are set on entry to a function.

2006-01-05 Thread Kenneth Zadeck

Ian Lance Taylor wrote:

Kenneth Zadeck <[EMAIL PROTECTED]> writes:

  

1) Do you believe that this code could be correct?



Well, no.

  

You do not have to sugar coat it, I can handle the truth.

2) Most of the paragraphs are protected by either reload_completed or
epilogue_completed (which has a comment at its point of
declaration that it is true for both the prologue and epilogue).  Is
it true that after these epochs, that there will really be code in the
first basic block that will def these registers?  I.e. are any of them
implicitly set before the function is called so I will not find a def?



This isn't quite accurate.  When the RTL is initially generated, for
any values passed in parameters you will see at the start of the
function
 (set (pseudo-reg) (param-reg))
That is, parameter handling is independent of the prologue/epilogue
code, and thus independent of epilogue_completed.  If you want to
declare that all FUNCTION_ARG_REGNO_P registers are live, you should
do that regardless of epilogue_completed.

Actually FUNCTION_ARG_REGNO_P is not enough.  It returns true for
outgoing function registers.  On a target with register windows, at
the start of the function you need to apply INCOMING_REGNO to each
register for which FUNCTION_ARG_REGNO_P returns true.

The prologue code which is set up when epilogue_completed is true is
code which sets up the stack frame and saves the callee saved
registers on the stack.  So when epilogue_completed is true, then at
the start of the function you will probably want to treat as live
every callee saved register for which regs_ever_live is true.  For
some targets the frame pointer will be live on function entry, for
others it is initialized by the prologue if necessary.

Assuming that PIC_OFFSET_TABLE_REGNUM is defined isn't always going to
be correct, although it may not matter.  On some targets, e.g., i386,
the register will be defined by the prologue code anyhow.

  

This all seems doable.  I can spin a better version of this.


3) Am I missing any classes of registers?



For complete accuracy, there are probably going to be some target
specific registers which need to be handled, unfortunately.  For
example, on MIPS, with -mabicalls (which is the default on GNU/Linux),
$25 is live on function entry.  It holds the address of the function,
and is used to initialize $28, which is PIC_OFFSET_TABLE_REGNUM.  I
don't think there is any target independent way of getting at that
fact.

  
Is the right thing to do to define a target hook for this that defaults
to doing nothing, and only define it in those places where necessary,
or should I add a few other macros to the ports as necessary and just
check to see if they are defined?  I have noticed that there are
several macros that are only used on a single port.



It is also worth noting that rather than assuming that all
FUNCTION_ARG_REGNO_P registers are live, you could use
FUNCTION_ARG/FUNCTION_ARG_ADVANCE to walk through the parameters and
determine precisely which ones are live.  I don't know whether that
will make a significant difference for the rest of the code, though.
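As a rough sketch of that suggestion (this is not code from the thread;
the function name, the -1 named-args count, and the treatment of every
parameter as named are illustrative assumptions), the walk might look
like:

/* Walk the actual parameters to find precisely which argument
   registers are live on entry.  Aggregate passing and promotion
   subtleties are ignored.  */
static void
mark_precise_arg_regs (tree fndecl, bitmap defs)
{
  CUMULATIVE_ARGS cum;
  tree parm;

  INIT_CUMULATIVE_ARGS (cum, TREE_TYPE (fndecl), NULL_RTX, fndecl, -1);
  for (parm = DECL_ARGUMENTS (fndecl); parm; parm = TREE_CHAIN (parm))
    {
      enum machine_mode mode = DECL_MODE (parm);
      rtx incoming = FUNCTION_ARG (cum, mode, TREE_TYPE (parm), 1);

      /* On register window targets, apply INCOMING_REGNO here, per
	 the earlier point about incoming vs. outgoing registers.  */
      if (incoming && REG_P (incoming))
	bitmap_set_bit (defs, REGNO (incoming));
      FUNCTION_ARG_ADVANCE (cum, mode, TREE_TYPE (parm), 1);
    }
}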

  
I will look into this also. 


Ian
  





Re: question about registers that are set on entry to a function.

2006-01-06 Thread Kenneth Zadeck

Ian Lance Taylor wrote:

Kenneth Zadeck <[EMAIL PROTECTED]> writes:

  

For complete accuracy, there are probably going to be some target
specific registers which need to be handled, unfortunately.  For
example, on MIPS, with -mabicalls (which is the default on GNU/Linux),
$25 is live on function entry.  It holds the address of the function,
and is used to initialize $28, which is PIC_OFFSET_TABLE_REGNUM.  I
don't think there is any target independent way of getting at that
fact.


  

Is the right thing to do to define a target hook for this that
defaults to doing nothing and we only define it in those places where
necessary or should I add a few other macros to the ports as necessary
and just check to see if they are defined?  I have noticed that there
are several macros that are only used on a single port.



I think the best approach would be a target hook.  I guess you could
pass it the function decl and have it set bits in a bitmap or a
HARD_REG_SET.  You might even want to do the whole thing in a target
hook, in which case the code you wrote would become the default
version of the hook (and would probably be called by non-default
versions).  I don't have a strong feeling as to which approach would
be best.

Ian
  

Being the glutton for punishment, let's try this again.

I have one question about EH_USES on the ia-64 (the only place it is
used).  The function that implements this starts out with

int
ia64_eh_uses (int regno)
{
 if (! reload_completed)
   return 0;

This means that before register allocation, all of the uses of these 
registers are hidden.


However, the definition of REG_ALLOC_ORDER contains all of the
registers that are defined in ia64_eh_uses after reload.  They are at
the end of the order, and there are a large number of them, but it
seems to me that if someone were to write a program that needed a large
boatload of registers, the register allocator might be tempted to use
these registers (unless there are no instruction patterns that use
them), and then the exception unwinder would not work.


Is this a latent bug?

Aside from the concern above and the plumbing of the target hook, how
does this look?


/* Record the (conservative) set of hard registers that are defined on
  entry to the function.  */

static void
df_record_entry_block_defs (struct dataflow * dflow)
{
 unsigned int i;
 bitmap_iterator bi;
 rtx r;
 struct df * df = dflow->df;

 bitmap_clear (df->entry_block_defs);

 if (! (df->flags & DF_HARD_REGS))
   return;

 for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   {
 if (FUNCTION_ARG_REGNO_P (i))
#ifdef INCOMING_REGNO
   bitmap_set_bit (df->entry_block_defs, INCOMING_REGNO (i));
#else
   bitmap_set_bit (df->entry_block_defs, i);
#endif
   }

  /* Once the prologue has been generated, all of these registers
     should just show up in the first regular block.  */
 if (HAVE_prologue && epilogue_completed)
   {
 /* Defs for the callee saved registers are inserted so that the
pushes have some defining location.  */
 for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if ((call_used_regs[i] == 0) && (regs_ever_live[i]))
 bitmap_set_bit (df->entry_block_defs, i);
   }
 else
   {
 if (REG_P (INCOMING_RETURN_ADDR_RTX))
   bitmap_set_bit (df->entry_block_defs, REGNO (INCOMING_RETURN_ADDR_RTX));


  /* If STATIC_CHAIN_INCOMING_REGNUM == STATIC_CHAIN_REGNUM
     only STATIC_CHAIN_REGNUM is defined.  If they are different,
     we only care about the STATIC_CHAIN_INCOMING_REGNUM.  */
#ifdef STATIC_CHAIN_INCOMING_REGNUM
 bitmap_set_bit (df->entry_block_defs, STATIC_CHAIN_INCOMING_REGNUM);
#else
#ifdef STATIC_CHAIN_REGNUM
 bitmap_set_bit (df->entry_block_defs, STATIC_CHAIN_REGNUM);
#endif
#endif

  r = targetm.calls.struct_value_rtx (current_function_decl, true);
  if (r && REG_P (r))
    bitmap_set_bit (df->entry_block_defs, REGNO (r));
   }

 /* These registers are live everywhere.  */
 if (!reload_completed)
   {
 /* Any reference to any pseudo before reload is a potential
reference of the frame pointer.  */
 bitmap_set_bit (df->entry_block_defs, FRAME_POINTER_REGNUM);

#ifdef EH_USES
 /* The ia-64, the only machine that uses this, does not define these
until after reload.  */
 for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if (EH_USES (i))
 {
   bitmap_set_bit (df->entry_block_defs, i);
 }
#endif

#if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
 /* Pseudos with argument area equivalences may require
reloading via the argument pointer.  */
 if (fixed_regs[ARG_POINTER_REGNUM])
   bitmap_set_bit (df->entry_block_defs, ARG_POINTER_REGNUM);
#endif

#ifdef PIC_OFFSET_TABLE_REGNUM
 /* Any constant, or pseudo with constant equivalences, may
require reloading from memory using the pic register.  */
  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
      && fixed_regs[PIC_OFFSET_TABLE_REGNUM])
    bitmap_set_bit (df->entry_block_defs, PIC_OFFSET_TABLE_REGNUM);
#endif

Re: Mainline build failure

2006-01-11 Thread Kenneth Zadeck

Steven Bosscher wrote:

Hi,

I can't build the trunk today:

gcc -c   -O0 -g -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes
-Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros
-Wold-style-definition -Wmissing-format-attribute -DHAVE_CONFIG_H -I. -I.
-I../../trunk/gcc -I../../trunk/gcc/. -I../../trunk/gcc/../include
-I../../trunk/gcc/../libcpp/include  -I../../trunk/gcc/../libdecnumber
-I../libdecnumber ../../trunk/gcc/df-problems.c -o df-problems.o
../../trunk/gcc/df-core.c: In function ‘df_compact_blocks’:
../../trunk/gcc/df-core.c:795: error: invalid lvalue in assignment
../../trunk/gcc/df-core.c:803: error: invalid lvalue in assignment
../../trunk/gcc/df-core.c: In function ‘df_bb_replace’:
../../trunk/gcc/df-core.c:833: error: invalid lvalue in assignment
make[2]: *** [df-core.o] Error 1
make[2]: *** Waiting for unfinished jobs

Path to problem:
../trunk/configure --enable-languages=c --disable-{libmudflap,nls,libssp,checking} --disable-bootstrap 
make -j 2 CFLAGS="-O0 -g"


My host compiler is "gcc (GCC) 4.0.2 20050901".

Gr.
Steven

  
I posted the obvious patch for this a few hours ago, but because of the
problems with the mail, no one has seen it.  I just talked to David
Edelsohn and he told me to check it in.


It will be committed in a minute.


Re: Mainline build failure

2006-01-11 Thread Kenneth Zadeck
Steven, actually the last piece of mail I sent you was a lie. 
The bug I fixed is different.


The problem is that between the time that I checked my code in and now,
someone changed the representation of BASIC_BLOCK.


lorien:~/gccDFTest/gcc(27) diff basic-block.h  ../../gccBaseline/gcc
370c370
<   varray_type x_basic_block_info;
---
>   VEC(basic_block,gc) *x_basic_block_info;
402c402
<   (VARRAY_BB (basic_block_info_for_function(FN), (N)))
---
>   (VEC_index (basic_block, basic_block_info_for_function(FN), (N)))
414c414,415
< #define BASIC_BLOCK(N)		(VARRAY_BB (basic_block_info, (N)))
---
> #define BASIC_BLOCK(N)		(VEC_index (basic_block, basic_block_info, (N)))
> #define SET_BASIC_BLOCK(N,BB)	(VEC_replace (basic_block, basic_block_info, (N), (BB)))
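In other words, every place that assigned through BASIC_BLOCK must now
call the setter.  A minimal sketch of the mechanical fix for the
df-core.c loop quoted below, assuming only the VEC accessors shown in
the diff above:

  /* BASIC_BLOCK (N) is no longer an lvalue now that basic_block_info
     is a VEC, so assignments go through SET_BASIC_BLOCK instead.  */
  i = NUM_FIXED_BLOCKS;
  FOR_EACH_BB (bb)
    {
      SET_BASIC_BLOCK (i, bb);
      bb->index = i;
      i++;
    }

  for (; i < last_basic_block; i++)
    SET_BASIC_BLOCK (i, NULL);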


I will change my code and submit something when I get it thru stage 1 build.

kenny


Steven Bosscher wrote:

On Wednesday 11 January 2006 21:44, Steven Bosscher wrote:
  

Hi,

I can't build the trunk today:

gcc -c   -O0 -g -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes
-Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros
-Wold-style-definition -Wmissing-format-attribute -DHAVE_CONFIG_H -I.
-I. -I../../trunk/gcc -I../../trunk/gcc/. -I../../trunk/gcc/../include
-I../../trunk/gcc/../libcpp/include  -I../../trunk/gcc/../libdecnumber
-I../libdecnumber ../../trunk/gcc/df-problems.c -o df-problems.o
../../trunk/gcc/df-core.c: In function ‘df_compact_blocks’:
../../trunk/gcc/df-core.c:795: error: invalid lvalue in assignment
../../trunk/gcc/df-core.c:803: error: invalid lvalue in assignment
../../trunk/gcc/df-core.c: In function ‘df_bb_replace’:
../../trunk/gcc/df-core.c:833: error: invalid lvalue in assignment
make[2]: *** [df-core.o] Error 1
make[2]: *** Waiting for unfinished jobs



From df-core.c:
  free (problem_temps);

  i = NUM_FIXED_BLOCKS;
  FOR_EACH_BB (bb)
{
  BASIC_BLOCK (i) = bb;
  bb->index = i;
  i++;
}

  gcc_assert (i == n_basic_blocks);

  for (; i < last_basic_block; i++)
BASIC_BLOCK (i) = NULL;


Now look at cfg.c:compact_blocks...  df-core.c should use that function
instead of duplicating it.

Gr.
Steven

  




reviewers for wide int.

2014-04-22 Thread Kenneth Zadeck

Richi,

David Edelsohn said that I should talk to you about appointing reviewers
for wide-int.  While I think that it may not be necessary to have any
reviewers for wide-int in the long term, I think that it would be useful
to make Richard Sandiford, Mike Stump and myself reviewers at least for
this release cycle.


While of course one hopes that there will be no issues with wide-int, a 
change of this size will have some pain no matter how well we have 
tested it.  Having three reviewers will assure problems are resolved 
quickly.


Thanks,

Kenny


Re: reviewers for wide int.

2014-04-22 Thread Kenneth Zadeck

On 04/22/2014 03:37 PM, Richard Biener wrote:

On April 22, 2014 9:28:15 PM CEST, Kenneth Zadeck  
wrote:

Richi,

David Edelsohn said that I should talk to you about appointing
reviewers for wide-int.  While I think that it may not be necessary to
have any reviewers for wide-int in the long term, I think that it
would be useful to make Richard Sandiford, Mike Stump and myself
reviewers at least for this release cycle.

While of course one hopes that there will be no issues with wide-int,
a change of this size will have some pain no matter how well we have
tested it.  Having three reviewers will assure problems are resolved
quickly.

Works for me.  I suppose this mainly covers wide-int.[CH], right?
if you want to define it that narrowly you can.   it really depends on 
how much help you want and how much you trust us not to go beyond what 
is reasonable.   All three of us have been at this long enough to know 
when to ask for help.


Kenny

Richard.


Thanks,

Kenny






Re: reviewers for wide int.

2014-04-23 Thread Kenneth Zadeck

On 04/23/2014 04:04 AM, Richard Biener wrote:

On Tue, 22 Apr 2014, Mike Stump wrote:


On Apr 22, 2014, at 12:48 PM, Kenneth Zadeck  wrote:

While of course one hopes that there will be no issues with wide-int, a
change of this size will have some pain no matter how well we have
tested it.  Having three reviewers will assure problems are resolved
quickly.

Works for me.  I suppose this mainly covers wide-int.[CH], right?

if you want to define it that narrowly you can.   it really depends on how much 
help you want and how much you trust us not to go beyond what is reasonable.   
All three of us have been at this long enough to know when to ask for help.

There is a large class of bugs that can creep in due to the subtle
change of interface from double-int to wide-int.  These happen outside
of the wide-int.[ch] code and seem statistically more likely by a large
margin than bugs in wide-int.[ch].  The good news, resolving them is
easy enough with side-by-side comparisons (say of dump files and .s
files).  Most of those fixes I’d expect to be trivial (for some
definition of trivial).

Yeah.  Note that it's difficult to define "reviewer for code that
uses wide-int", thus my question (that is, what do you put into
MAINTAINERS and how would you interpret the entry).

But as always we apply common sense to reviewer/maintainership
areas.

richi.

This is not without precedent.   The dataflow reviewers are authorized
to review changes to dataflow anywhere in the rtl level and back ends.
In the many years that that has been in place, none of us "went
rogue".  We will be conservative.


kenny


Richard.





status of wide-int patch.

2014-04-28 Thread Kenneth Zadeck
At this point we believe that we have addressed all of the concerns
that the community has raised about the wide-int branch.   We have also
had each of the sections of the branch approved by the area maintainers.


We are awaiting a clean build on the arm and are currently retesting
x86-64, s390, and p7, but assuming that those are clean, we are ready to
merge this branch into trunk in the next day or so.  Other port
maintainers may wish to consider testing on the branch before we commit.
Otherwise we will fix any regressions after the merge.


Thanks for all of the help we have received along the way.

Kenny


merging the wide-int branch

2014-05-05 Thread Kenneth Zadeck
We are now ready to merge the wide-int branch.  The branch was broken
into small pieces and each of the area maintainers has approved their
pieces.


The branch has been tested and runs regression free on three 64 bit 
platforms: x86, ppc, and s390 and on three 32 bit platforms: x86, arm 
and s390.The range of these includes both big and little endian for 
both sizes.  The testing has included all supported front ends.


Our plan, unless we hear any objections, is to merge tomorrow, on
Tuesday May 6.  While I am sure there will be some fallout, there are
wide-int maintainers in all time zones so we should be able to address
any issues in a hurry.


Thanks,

Kenny




we are starting the wide int merge

2014-05-06 Thread Kenneth Zadeck
please hold off on committing patches for the next couple of hours as we 
have a very large merge to do.

thanks.

kenny


Re: How to update reg_dead notes

2015-02-24 Thread Kenneth Zadeck
It is generally as easy as just adding the problem and calling 
df_analyze.  You tend to get into trouble if the rtl is not good, i.e. 
there is improper sharing or other violations of the canonical rtl 
rules.  DF does not like improperly shared rtl and it has not been 
uncommon for port specific passes to play loose with these rules.   I 
would suggest that first you try a build where you turn on all of the 
rtl checking and see if that comes out clean.
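Concretely, the two calls in question, as they would sit in a pass's
execute function (a minimal sketch, assuming df is initialized and the
CFG is still intact):

  /* Request the REG_DEAD/REG_UNUSED note problem and (re)solve; the
     notes are rewritten as a side effect of df_analyze.  */
  df_note_add_problem ();
  df_analyze ();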


Kenny

On 02/24/2015 06:41 AM, Georg-Johann Lay wrote:
Hi, in order to fix PR64331 I tried to implement new target-specific 
passes whose sole purpose is to recompute REG_DEAD notes.


The avr BE relies on correct dead notes which are used in 
avr.c:reg_unused_after which uses dead_or_set_p.  avr BE needs correct 
dead notes in ADJUST_INSN_LENGTH, get_attr_length, 
get_attr_min_length, etc.


After trying for more than one day I am really frustrated; each 
approach ended up in seg_faults somewhere in df.


Following the source comments in df-core.c, recomputing dead notes 
should be as easy as



  df_note_add_problem ();
  df_analyze ();

in the execute() method of the new pass.


As this (and many many other tries using df_scan_alloc, df_scan_blocks 
df_finish_pass, df_insn_rescan_all, etc.) always crashes the compiler, 
I must do something completely wrong...


Could you give me some advice on correct usage of df or even more 
preferred point me to a comprehensible documentation of df which is 
more complete than in df-core.c?


The internals manual doesn't treat df, and the source comments are not
really helpful, e.g. the complete documentation of df_analyze is /* Analyze
dataflow info.  */.  Not a single word about prerequisites (except 
that it must run after df_init and before df_finish and needs correct 
cfg).


One example of a crash is that df->insns[uid] is being accessed and 
dereferenced where uid is a valid uid but df->insns[uid] is NULL.


df-core.c mentions "given instances of df".  How do I get one?  The
only instance I can find is the global struct df_d *df.  Does this
mean one has to mess around with that global variable?







Re: How to update reg_dead notes

2015-02-24 Thread Kenneth Zadeck
When I suggested that you do a build with all of the checking turned on,
I wanted you to do this without your new code in it.  There is a good
possibility that the problem is that your port is generating bad rtl.
Also, you should generate a debuggable compiler so that the line numbers
have some relevance to reality.




On 02/24/2015 01:23 PM, Georg-Johann Lay wrote:

Am 02/24/2015 um 06:06 PM schrieb Eric Botcazou:
Could you give me some advice on correct usage of df or even more
preferred point me to a comprehensible documentation of df which is
more complete than in df-core.c?


Take a look at the c6x and mep ports.



Thanks for the hint.  I changed the execute method to:

unsigned int execute ()
  {
compute_bb_for_insn ();
//df_clear_flags (DF_LR_RUN_DCE);
df_note_add_problem ();
df_analyze ();
df_finish_pass (false);

return 0;
  }

but it keeps aborting.  Actually I am just copy-pasting code from some
other passes / BEs, but these places also don't have explanations for
why they must use, may use, or must not use specific functions.


Compiling the mentioned test case from libgcc with -Os (so that less 
insns are left over) and patching df-scan.c:df_refs_verify to print 
the refs just before it does gcc_assert (0):


new_ref = { u-1(18){ }}
*old_rec = { u-1(18){ }u-1(19){ }u-1(24){ }u-1(25){ }}
libgcc2-addvsi3.c: In function '__addvhi3':
libgcc2-addvsi3.c:1514:1: internal compiler error: in df_refs_verify, 
at df-scan.c:4338


In df_insn_refs_verify which is in the call chain of df_refs_verify, 
insn reads:


(insn 21 19 22 5 (parallel [
(set (cc0)
(compare (reg/v:HI 18 r18 [orig:48 a ] [48])
(reg/v:HI 24 r24 [orig:46 w ] [46])))
(clobber (scratch:QI))
]) libgcc2-addvsi3.c:1511 408 {*cmphi}
 (expr_list:REG_DEAD (reg/v:HI 18 r18 [orig:48 a ] [48])
(nil)))

r18 and r24 are ordinary general-purpose hard registers.


The latest pass which runs before the crash is .split5, i.e. 
recog.c::pass_split_for_shorten_branches which executes 
split_all_insns_noflow which in turn reads:


/* Same as split_all_insns, but do not expect CFG to be available.
   Used by machine dependent reorg passes.  */

unsigned int
split_all_insns_noflow (void)
{ ...

Does this mean CFG is (or might be) messed up after this pass and this 
is the reason for why df crashes because df needs correct CFG?


Johann








Re: How to update reg_dead notes

2015-02-25 Thread Kenneth Zadeck
It is a good policy to do a run with full checking every once in a
while.   It is easy to have some garbage sneak in.  df is particularly
sensitive to bad rtl, so that was my first suggestion.  I had forgotten
that df (as well as a lot of the infrastructure) does not work very late.

On 02/25/2015 01:54 PM, Georg-Johann Lay wrote:

Am 02/24/2015 um 07:33 PM schrieb Kenneth Zadeck:
When I suggested that you do a build with all of the checking turned
on, I wanted you to do this without your new code in it.  There is a
good possibility that the problem is that your port is generating bad
rtl.


Ah, ok.  This actually revealed an ICE with checking; but that was not
related to the df problems...


I think the problem was that I tried to run passes needing df after 
free_cfg.


My current solution works so far: It fixes the test case (correct code 
+ insn lengths) and it did not ICE up to now.


Stuff is running a bit slow with all checks on so building and running 
tests will take a while...


FYI, you find my current patch attached.

Thanks for your help and time,

Johann



Index: config/avr/avr.c
===
--- config/avr/avr.c(revision 220963)
+++ config/avr/avr.c(working copy)
@@ -51,6 +51,8 @@
 #include "target-def.h"
 #include "params.h"
 #include "df.h"
+#include "context.h"
+#include "tree-pass.h"

 /* Maximal allowed offset for an address in the LD command */
 #define MAX_LD_OFFSET(MODE) (64 - (signed)GET_MODE_SIZE (MODE))
@@ -285,6 +287,58 @@ avr_to_int_mode (rtx x)
 }


+static const pass_data avr_pass_data_recompute_notes =
+{
+  RTL_PASS, /* type */
+  "avr-xx", /* name (will be patched) */
+  OPTGROUP_NONE, /* optinfo_flags */
+  false, /* has_gate */
+  true, /* has_execute */
+  TV_DF_SCAN, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  (TODO_df_finish | TODO_verify_rtl_sharing
+   | TODO_verify_flow ), /* todo_flags_finish */
+};
+
+
+class avr_pass_recompute_notes : public rtl_opt_pass
+{
+public:
+  avr_pass_recompute_notes (gcc::context *ctxt, const char *pass_name)
+: rtl_opt_pass (avr_pass_data_recompute_notes, ctxt)
+  {
+name = pass_name;
+  }
+
+  unsigned int execute ()
+  {
+df_note_add_problem ();
+df_analyze ();
+
+return 0;
+  }
+}; // avr_pass_recompute_notes
+
+
+static void
+avr_register_passes (void)
+{
+  /* This avr-specific pass (re)computes insn notes, in particular REG_DEAD
+ notes which are used by `avr.c::reg_unused_after' and branch offset
+ computations.  These notes must be correct, i.e. there must be no
+ dangling REG_DEAD notes; otherwise wrong code might result, cf. PR64331.

+
+ DF needs (correct) CFG, hence right before free_cfg is the last
+ opportunity to rectify notes.  */
+
+  register_pass (new avr_pass_recompute_notes (g, "avr-notes-free-cfg"),
+ PASS_POS_INSERT_BEFORE, "*free_cfg", 1);
+}
+
+
 /* Implement `TARGET_OPTION_OVERRIDE'.  */

 static void
@@ -346,6 +400,11 @@ avr_option_override (void)
   init_machine_status = avr_init_machine_status;

   avr_log_set_avr_log();
+
+  /* Register some avr-specific pass(es).  There is no canonical place for
+ pass registration.  This function is convenient.  */
+
+  avr_register_passes ();
 }

 /* Function to set up the backend function structure.  */





Re: inconsistencies in the documentation regarding side effects with auto inc-dec

2010-03-17 Thread Kenneth Zadeck
i will fix it.

kenny

On 03/17/2010 07:28 PM, Ramana Radhakrishnan wrote:
> Hi Kenneth,
>
> The documentation of auto-inc-dec still refers to flow and when I
> raised this on IRC folks suggested that you might have some
> documentation fixes if any, in this area.
>
> http://gcc.gnu.org/onlinedocs/gccint/Incdec.html#Incdec
>
> The lines in the doco are as below :
>
> These embedded side effect expressions must be used with care.
> Instruction patterns may not use them. Until the `flow' pass of the
> compiler, they may occur only to represent pushes onto the stack. The
> `flow' pass finds cases where registers are incremented or decremented
> in one instruction and used as an address shortly before or after;
> these cases are then transformed to use pre- or post-increment or
> -decrement.
>
>
> Cheers
> Ramana
>   


Re: [lto][RFC] Do not emit hybrid object files

2008-10-17 Thread Kenneth Zadeck
Andrew Pinski wrote:
> On Fri, Oct 17, 2008 at 2:15 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
>   
>> On Fri, Oct 17, 2008 at 16:51, Richard Henderson <[EMAIL PROTECTED]> wrote:
>>
>> 
>>> If the version of GCC being used isn't compatible with the version of the IL
>>> in the object file, we can just fall back on the final code.
>>>   
>> Fair enough.  But this could be provided via a flag to optionally emit
>> final code.  The more common scenario is likely to be hermetic builds
>> that use the same compiler throughout.  Duplicating code generation
>> for *every* build seems too wasteful to make it the compiler's default
>> behaviour.
>> 
>
> The idea was that people did not have to change their makefiles. If
> some static library provided by vendor 1 does not include the real
> machine code, then the build just fails.  So having the machine code
> by default is still seems like the correct idea to go forward and
> provided a flag to turn off it if really needed (though I think it is
> bad to have two different modes in this case).
>
> Thanks,
> Andrew Pinski
>   
Sorry to have missed most of this thread today.   I was at a meeting in
the city all day.

Andrew is correct that the reason for putting both lto and final code in
the same file was to do the least damage to people's build tools.   A
change from each invocation of gcc producing two files instead of one
would severely break many people's build environments.  Mark and I were
very concerned about how widely used LTO would be if trying it cost more
than adding -flto in several places.

As far as the support of non-elf compilers, the idea was to minimize and
modularize the actual dependence on elf as much as possible and then to
let anyone who was still non-elf fill in the pieces.  At the present
time, all of the uses of elf go through indirect calls, so in theory this
will be easy.   However, in practice this is likely to be harder.   For
instance, the current lto implementation assumes that there can be a lot
of different sections, but the aix format, for instance, does not allow this.


Re: [lto][RFC] Do not emit hybrid object files

2008-10-18 Thread Kenneth Zadeck
Diego Novillo wrote:
> On Fri, Oct 17, 2008 at 20:52, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>
>   
>> Andrew is correct that the reason for putting both lto and final code in
>> the same file was to do the least damage to peoples build tools.   A
>> change from each invocation of gcc produce two files instead of one will
>> severely break many people's build environments.  Mark and I were very
>> concerned about how widely used LTO would be, if trying it cost more
>> than adding -flto in several places.
>> 
>
> OK, we can then offer the option of emitting hybrid objects for
> libraries that want to provide the duality.  For the final user, the
> distinction will be transparent.  However, for the most common use
> scenario, emitting hybrid files is really wasteful.
>
>   
I actually think that the hybrid files should be the default.  If you
are willing to make invasive changes to your build environment to
support two files, then you should be willing to add extra options to
support that.  
>> However, in practice this is likely to be harder.
>> 
>
> Yes, agreed.
>
>
> Diego.
>   



bitwise dataflow analysis

2009-03-07 Thread Kenneth Zadeck
Silvius,

If you want to pursue this, you should go back and look at my patches
for subreg level dataflow and start hacking from there.   It is not a
big step to start from that and modify it for bits.   If you start
from that, it is most likely not much more than a few days' work for
the analysis part.  However, hacking combine could be a life long effort!

All of this code got removed because its only client was the old ra.

Kenny


naked zero_extracts longer than a word.

2009-05-11 Thread Kenneth Zadeck
Would those that know (or even those that are just generally vocal) be
willing to support a change to rtl.texi for sign_extract (and by
implication, zero_extract) from

If @var{loc} is in memory, its mode must be a single-byte integer mode.
If @var{loc} is in a register, the mode to use is specified by the
operand of the @code{insv} or @code{extv} pattern
(@pxref{Standard Names}) and is usually a full-word integer mode,
which is the default if none is specified.

to a version that explicitly prohibits the use of a mode longer than a word?

The back ends do not appear to support this anyway.

Kenny



Re: Fwd: Register Pressure in Instruction Level Parallelism

2009-06-28 Thread Kenneth Zadeck
David Edelsohn wrote:
> -- Forwarded message --
> From: Albert Cohen 
> Date: Sun, Jun 28, 2009 at 6:25 PM
> Subject: Re: Register Pressure in Instruction Level Parallelism
> To: Dave Korn 
> Cc: re...@meinersbur.de, gcc@gcc.gnu.org, Sid Touati
> , Frederic Brault 
>
>
> Hi all,
>
> I'm convinced that saturation and sufficiency based approaches are the
> future of register pressure management.
>
> [Please keep my colleague Sid Touati in CC of this thread, until he
> registers to the list.]
>
> Unfortunately, the state of the art (more recent that the thesis
> referenced in the original email, see Touati's web page) is limited to
> basic block and software-pipelining scopes, and limited to scheduling.
>
> Compared to the tasks currently managed by reload, it certainly misses
> a whole bunch of instruction selection and immediate operand/address
> mode corner case problems (depending on the target). It also misses
> global scheduling, but extended BB scheduling is not very hard to
> approximate, as well as superblock scheduling.
>
>   
> I'm not at all a knowledgeable person to tell you what to do in the
> case of GCC, but for sure saturation/sufficiency-based approches are
> not sufficient to kill the dragon.
>
> However, I will strongly advise anybody (= Kenny Zadeck) looking at a
> fresh SSA-based backend design to consider such an approach to avoid
> messing up with pressure-aware-* where * is any backend pass that
> bears a risk of blowing up register pressure.
>
>   
I am considering such a design and I decided, late last week, to go
public with my project.  I am now working to put up a project in the
public spaces.  I would very much like anyone who is interested in
working in this area to consider working on it in gcc.


kenny
> If you interested in collaboration on the topic, we are about to start
> extending those approaches to general control-flow, and implementing
> it in a production compiler (not GCC, but a free one, and not LLVM
> either).
>
> Albert
>
>
> Dave Korn wrote:
>   
>> Michael Kruse wrote:
>>
>> 
>>> So, now my questions: How much do you think could this could improve
>>> compiled code speed? Would the current GCC/YARA benefit from such an
>>> optimization pass at all? What are the chances that this could get into
>>> the main GCC tree if it shows up to be an improvement?
>>>   
>>  One of the major problems in gcc is the intertangling of instruction
>> selection with register allocation and spill generation.  If these could be
>> separated it would almost certainly generate better code and be welcomed with
>> open arms!
>>
>> 
>>> I'd prefer to implement this for the gcc, but my advisor wants me to do
>>> it for the university's own compiler. Therefore I could also need
>>> arguments why to do it for the GCC.
>>>   
>>  Because destroying reload(*) would be an incalculable public service and
>> your name will be remembered in history as the one who slew the dragon? ;-)
>>
>>cheers,
>>  DaveK
>> 



Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Kenneth Zadeck

Daniel Berlin wrote:

On Fri, 2006-01-20 at 10:53 +0530, Ranjit Mathew wrote:
  


Hi,

  Mainline fails to bootstrap for me (revision 110017)
on i686-pc-linux-gnu.

Configured as:

$GCC_SRC_DIR/configure --prefix=$HOME/gcc --enable-languages=c,c++,java \
  --with-as=/home/ranmath/gnu/bin/as --with-gnu-as \
  --with-ld=/home/ranmath/gnu/bin/ld --with-gnu-ld \
  --with-arch=pentium4 --with-tune=pentium4 \
  --disable-nls --disable-checking --disable-libmudflap \
  --disable-debug --enable-threads=posix --enable-__cxa_atexit \
  --disable-static





Kenny thought it would be nice, rather than pass the actual bb info to free to 
the freeing function, to instead pass some random bitmap.


The attached fixes *that*, but this just causes a crash deeper in trying to 
free some chains.

However, it looks like that is either caused by a double free, or because
 we never null out pointers to things after we free the memory for what they 
are pointing to.

  



Index: df-core.c
===
--- df-core.c   (revision 110017)
+++ df-core.c   (working copy)
@@ -292,6 +292,7 @@ are write-only operations.  
 static struct df *ddf = NULL;

 struct df *shared_df = NULL;
 
+static void * df_get_bb_info (struct dataflow *, unsigned int);

 /*
   Functions to create, destroy and manipulate an instance of df.
 */
@@ -370,7 +371,7 @@ df_set_blocks (struct df *df, bitmap blo
  EXECUTE_IF_SET_IN_BITMAP (diff, 0, bb_index, bi)
{
  basic_block bb = BASIC_BLOCK (bb_index);
- (*dflow->problem->free_bb_fun) (dflow, bb, diff);
+ (*dflow->problem->free_bb_fun) (dflow, bb, df_get_bb_info (dflow, bb_index));
}
}
}
  
This is why C sucks.  In any language with a reasonable type system, 
this could have been written with a subtype and would not require using 
a void * arg.


Sorry for the problems and thanks for looking into them.

kenny


Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Kenneth Zadeck

Andreas Jaeger wrote:

Kenneth Zadeck <[EMAIL PROTECTED]> writes:

  

Daniel Berlin wrote:



  

The attached fixes *that*, but this just causes a crash deeper in trying to 
free some chains.
  

[...]
Sorry for the problems and thanks for looking into them.



Ken, please reread the email.  The issue is *not* fixed according to
Daniel, there's still another problem.  Could you look into it,
please?

Andreas
  

I will update and start looking.

sorry

kenny


Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Kenneth Zadeck

Andreas Jaeger wrote:

Kenneth Zadeck <[EMAIL PROTECTED]> writes:

  

Daniel Berlin wrote:



  

The attached fixes *that*, but this just causes a crash deeper in trying to 
free some chains.
  

[...]
Sorry for the problems and thanks for looking into them.



Ken, please reread the email.  The issue is *not* fixed according to
Daniel, there's still another problem.  Could you look into it,
please?

Andreas
  
I would like permission to revert Zdenek's patch for a few days.  There
is nothing wrong with Zdenek's patch, per se, but it exercises a part of
df that should work but does not.


We could revert my storage patch, but the problem is much deeper than
that.  The storage patch only causes this problem to happen when
bootstrapping.  The problem will still be there and may cause random
ices when running Zdenek's code.


The problem is that whenever you delete any basic blocks, you need to
get rid of the def-use and use-def chains that either start or end in
the deleted block; furthermore, this has to be done before the
instructions in those blocks are deleted.  This will take me a day to
get correct.  Zdenek's patch is the only code that manipulates the
blocks after use-def or def-use chains are built.
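The invariant might look something like the sketch below (the per-insn
reset function is the one named in the ChangeLog later in this thread,
but its exact signature, and the wrapper itself, are assumptions):

  /* Sever def-use/use-def chain links that start or end in BB.  The
     chains hang off the refs, which hang off the insns, so this must
     run while the insns still exist.  */
  static void
  reset_chains_for_block (struct dataflow *dflow, basic_block bb)
  {
    rtx insn;

    FOR_BB_INSNS (bb, insn)
      if (INSN_P (insn))
	df_chain_insn_reset (dflow, insn);
  }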


Kenny


Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Kenneth Zadeck

Zdenek Dvorak wrote:

Hello,

  
The attached fixes *that*, but this just causes a crash deeper in trying 
to free some chains.
 
  

[...]
Sorry for the problems and thanks for looking into them.
   


Ken, please reread the email.  The issue is *not* fixed according to
Daniel, there's still another problem.  Could you look into it,
please?

  
I would like permission to revert Zdenek's patch for a few days.  There
is nothing wrong with Zdenek's patch, per se, but it exercises a part of
df that should work but does not.



I don't quite like this idea; as you yourself say, the problem is not in
that patch (in fact, mainline bootstraps and passes regtesting with it
without your patch); so I think that if we indeed want to go into
reverting patches, it should be the one causing the problem (and I have
a followup patch that I would like to be able to test).

Btw.: of course it may happen that some patch sometimes breaks
bootstrap, it happened to every one of us.  But, with your patch, not
even libgcc builds.  This means that you did not even try to build gcc
before committing your patch.  Please do that next time.  In fact,
running at least a partial bootstrap till gengtype is run helped me
avoid introducing bootstrap failures (caused by unexpected interaction
with patches committed since I tested the patch fully the last time)
several times, so it is a good idea even if you are quite sure that no
such problem may occur.

  

Zdenek,
There are several problems here.  If I fix the immediate problem, you 
only hit the next one.


The underlying problem is that when you change the blocks in the program
and you have def-use or use-def chains built, these have to be cleaned
out in a way that was not being done.
If I revert my patch, it will fix the immediate problem that
df_set_blocks crashes, but the crash will most likely happen in other
places.  It is because of the other places that it is not sufficient to
just revert my patch.


Yours is currently the only place that plays with blocks when the
use-def chain problem has been built.  This is something that I
advertised could be done, and I blew it.  But removing my last patch is
not going to fix that; it will still be broken, it will just fail less
often.


Kenny
We could revert my storage patch, but the problem is much deeper than
that.  The storage patch only causes this problem to happen when
bootstrapping.  The problem will still be there and may cause random
ices when running Zdenek's code.


The problem is that whenever you delete any basic blocks, you need to
get rid of the def-use and use-def chains that either start or end in
the deleted block; furthermore, this has to be done before the
instructions in those blocks are deleted.  This will take me a day to
get correct.  Zdenek's patch is the only code that manipulates the
blocks after use-def or def-use chains are built.



This analysis seems wrong to me -- the crash occurs when called from
estimate_probability, and we do not change CFG in branch prediction.

Zdenek
  




Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Kenneth Zadeck

Eric Botcazou wrote:

So he updated his tree, saw changes in the middle-end and committed
his without testing.



So Kenny would have had to lauch a new bootstrap, wait for a couple of hours 
only to discover that something again changed in-between, and so on?


  
This is exactly what I did: when I got the approval, I did an update and
launched a bootstrap and regression test; when that passed, I did
another update and committed.


I am sure that Zdenek has access to a 1000-teraflop machine to make this
process almost instantaneous.  The rest of us have to make do with
machines that only run at a few gigahertz.


kenny



Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Kenneth Zadeck

This fixes the problems that became apparent from Zdenek's patch.

Bootstrapped and regression tested against the version with Zdenek's
original code (since this directly tickled the failure), and bootstrapped
(and in the process of regression testing) against the current mainline.
Both on i686-pc-linux-gnu.


Kenny


2006-01-20  Kenneth Zadeck <[EMAIL PROTECTED]>

   * df-scan.c (problem_SCAN): Added NULL reset function.
   (df_scan_reset_blocks): Added code to call reset block function.
   (df_bb_refs_delete): Fixed comment.
   (df_insn_refs_delete): Made tolerant of deleting non existent info
   for dataflow problems that need to be reset.
   * df-core.c (df_set_blocks): Ditto.
   * df.h (struct df_problem): Added reset_fun.
   * df-problems.c (problem_RU, problem_RD, problem_LR, problem_UR,
   problem_UREC, problem_CHAIN, problem_RI): Initialized reset_fun field.
   (df_chain_insn_reset, df_chain_bb_reset, df_chain_reset): New
   functions to clear out all references to def-use or use-def chains.


Zdenek Dvorak wrote:

Hello,

  
The attached fixes *that*, but this just causes a crash deeper in trying 
to free some chains.
 
  

[...]
Sorry for the problems and thanks for looking into them.
   


Ken, please reread the email.  The issue is *not* fixed according to
Daniel, there's still another problem.  Could you look into it,
please?

  
I would like permission to revert Zdenek's patch for a few days.  There
is nothing wrong with Zdenek's patch, per se, but it exercises a part of
df that should work but does not.



I don't quite like this idea; as you yourself say, the problem is not in
that patch (in fact, mainline bootstraps and passes regtesting with it
without your patch); so I think that if we indeed want to go into
reverting patches, it should be the one causing the problem (and I have
a followup patch that I would like to be able to test).

Btw.: of course it may happen that some patch sometimes breaks
bootstrap, it happened to every one of us.  But, with your patch, not
even libgcc builds.  This means that you did not even try to build gcc
before committing your patch.  Please do that next time.  In fact,
running at least a partial bootstrap till gengtype is run helped me
avoid introducing bootstrap failures (caused by unexpected interaction
with patches committed since I tested the patch fully the last time)
several times, so it is a good idea even if you are quite sure that no
such problem may occur.

  
We could revert my storage patch, but the problem is much deeper than
that.  The storage patch only causes this problem to happen when
bootstrapping.  The problem will still be there and may cause random
ices when running Zdenek's code.


The problem is that whenever you delete any basic blocks, you need to
get rid of the def-use and use-def chains that either start or end in
the deleted block; furthermore, this has to be done before the
instructions in those blocks are deleted.  This will take me a day to
get correct.  Zdenek's patch is the only code that manipulates the
blocks after use-def or def-use chains are built.



This analysis seems wrong to me -- the crash occurs when called from
estimate_probability, and we do not change CFG in branch prediction.

Zdenek
  


Index: df-scan.c
===
--- df-scan.c	(revision 110059)
+++ df-scan.c	(working copy)
@@ -304,6 +304,7 @@ static struct df_problem problem_SCAN =
   DF_SCAN,                    /* Problem id.  */
   DF_NONE,                    /* Direction.  */
   df_scan_alloc,              /* Allocate the problem specific data.  */
+  NULL,                       /* Reset global information.  */
   df_scan_free_bb_info,       /* Free basic block info.  */
   NULL,                       /* Local compute function.  */
   NULL,                       /* Init the solution specific data.  */
@@ -426,6 +427,8 @@ df_rescan_blocks (struct df *df, bitmap 
 
   if (blocks)
 {
+  int i;
+
   /* Need to assure that there are space in all of the tables.  */
   unsigned int insn_num = get_max_uid () + 1;
   insn_num += insn_num / 4;
@@ -443,6 +446,24 @@ df_rescan_blocks (struct df *df, bitmap 
   df->def_info.add_refs_inline = true;
   df->use_info.add_refs_inline = true;
 
+  for (i = df->num_problems_defined; i; i--)
+	{
+	  struct dataflow *dflow = df->problems_in_order[i - 1];
+	  bitmap blocks_to_reset = NULL;
+	  if (dflow->problem->reset_fun)
+	{
+	  if (!blocks_to_reset)
+		{
+		  blocks_to_reset = BITMAP_ALLOC (NULL);
+		  bitmap_copy (blocks_to_reset, local_blocks_to_scan);
+		  if (df->blocks_to_scan)
+		bitmap_ior_into (blocks_to_reset, df->blocks_to_scan);
+		}
+	  (*dflow->problem->reset_fun) (dflow, blocks_to_reset);
+	}
+	  if (blocks_to_reset)
+	BITMAP_FREE (blocks_to_reset);
+	}
+
   df_refs_delete (dflow, local_blocks_to_scan);
 
   /* This may be a mis

Re: Bootstrap failure on Linux/x86-64

2006-01-22 Thread Kenneth Zadeck

Graham Stott wrote:

Andreas,

FWIW I've had successful bootstrap with these checking flags on
x86_64-unknown-lunux-gnu

Graham

My bootstrap was fine also using x86_64-suse-linux-gnu.

Kenny



Re: Bootstrap failure on Linux/x86-64

2006-01-22 Thread Kenneth Zadeck

Andreas Jaeger wrote:

Andrew Pinski <[EMAIL PROTECTED]> writes:

  

On Jan 21, 2006, at 1:09 PM, Andreas Jaeger wrote:



On Sat, Jan 21, 2006 at 12:42:24PM -0500, Andrew Pinski wrote:
  

On Jan 21, 2006, at 12:38 PM, Andreas Jaeger wrote:



I'm still seeing this with current Subversion.  Anybody else having
the same problem on x86-64?  It fails for me on the two systems I
tested it on,
  

What reversion is this on?

This worked for me on 110050 and 110059.


I'll try 110059 now to double check.  Perhaps my configure flags are
the
problem, I use:
/cvs/gcc-svn/trunk/configure   --prefix=/opt/gcc/4.2-devel
--enable-checking=misc,tree,gc,rtl,rtlflag,assert
--enable-threads=posix --enable-clocale=gnu --enable-__cxa_atexit
--enable-shared --with-system-zlib x86_64-suse-linux-gnu
--enable-languages=c,ada,c++,fortran,java,objc,treelang
  

I just used 110067 and it worked.  Maybe it is one of the checking
options you are
using which is causing it.  I just used the default checking.



I now used 110067 with the default checking options - and it worked,
so it's the checking options.

Kenneth, Zdenek, could you look into this?  Please use the same
checking options that I have above.

  

Let me double check with the checking you have.



Thanks,
Andreas
  
I am not going to see this failure on my machine.  I have a 2GB x86-64
which will easily handle compiling a file of this size.   I was watching
top as it compiled a stage2 insn-attrtab, which I think is the largest
function in the gcc stack, and the VIRT topped out at 501MB, which is no
sweat for this machine.  You could argue (and I would not disagree with
you) that 501MB is not tolerable and is symptomatic of something bad
happening.  There are, most likely, storage leaks in the system.  But I
do not think that you are going to necessarily find the problem by
looking to see which version happens to crash on your machine.  The
worst case scenario is that this is just footprint creep, and the
version that failed was just the one that added the straw that broke the
camel's back with your particular config.


I am now fully into building stage3 and have no failure.

BTW, I am doing this with the latest baseline compiler.  I can go back
to earlier compilers if you want.  But given that Andreas's failed with
the following line:

cc1: out of memory allocating 22752576 bytes after a total of 142843904 bytes

I am not going to see the problem.

Kenny

Re: Mainline bootstrap failure (revision 110017)

2006-01-24 Thread Kenneth Zadeck

Zdenek Dvorak wrote:

Hello,

  

On Fri, 20 Jan 2006, Zdenek Dvorak wrote:


I propose the following workaround instead, that also restores
bootstrap.  It changes the way loop-iv uses df to more conservative one,
that does not seem to cause problems.
  

That's what I like to see... options.  Yes, this is OK for mainline,
please hold off on the reversion (or if necessary reapply with this
change).



OK to revert this workaround now?  Mainline now passes bootstrap &
regtesting on i686 without it.

  
I tested my code before and after your revert and it passed.  As far as 
I am concerned, put it back.



Zdenek

* loop-iv.c (iv_analysis_loop_init): Use df analysis in a more
efficient way.

Index: loop-iv.c
===
*** loop-iv.c   (revision 110143)
--- loop-iv.c   (working copy)
*** iv_analysis_loop_init (struct loop *loop
*** 250,260 
current_loop = loop;
  
/* Clear the information from the analysis of the previous loop.  */

!   if (!first_time)
! iv_analysis_done ();
!   df = df_init (DF_HARD_REGS | DF_EQUIV_NOTES);
!   df_chain_add_problem (df, DF_UD_CHAIN);
!   bivs = htab_create (10, biv_hash, biv_eq, free);
  
for (i = 0; i < loop->num_nodes; i++)

  {
--- 250,263 
current_loop = loop;
  
/* Clear the information from the analysis of the previous loop.  */

!   if (first_time)
! {
!   df = df_init (DF_HARD_REGS | DF_EQUIV_NOTES);
!   df_chain_add_problem (df, DF_UD_CHAIN);
!   bivs = htab_create (10, biv_hash, biv_eq, free);
! }
!   else
! clear_iv_info ();
  
for (i = 0; i < loop->num_nodes; i++)

  {
  




very confused about single_set

2006-02-19 Thread Kenneth Zadeck
Steven, Zdenek

1) single_set is built on top of single_set_2.
2) single_set_2 uses REG_UNUSED notes.
3) tree-ssa-loop-ivopts.c:seq_cost uses single_set.

I can find no indication that rtl dfa is run here to provide the
information for single_set to produce the correct answer.

Inquiring minds want to know.

Kenny


Re: very confused about single_set

2006-02-19 Thread Kenneth Zadeck
Daniel Berlin wrote:
> On Sun, 2006-02-19 at 09:56 -0500, Kenneth Zadeck wrote:
>   
>> Steven, Zdenek
>>
>> 1) single_set is built on top of single_set_2.
>> 
> Yes
>   
>> 2) single_set_2 uses REG_UNUSED notes.
>> 
> Only if there are multiple sets.
>
>   
>> 3) tree-ssa-loop-ivopts.c:seq_cost uses single_set.
>> 
>
> This is because Zdenek builds RTL to determine target costs.
>
>   
>> I can find no indication that rtl dfa is run here to provide the
>> information for single_set to produce the correct answer.
>> 
>
> I don't understand what you mean.
> Why does the DFA need to be run to produce the correct answer?
> (Our ports use Deterministic Finite Automaton based schedulers, so
> saying DFA is an ambiguous term at best)
>
> Or do you mean DataFlow Analysis, in which case, i'm sure everyone
> expects the notes are always up to date (even though they may not be).
>
>
>   
I was referring to the REG_UNUSED notes not being there at all when this
is called.  You are correct that this is not so much a correctness issue
as a place where the code is pessimistic.

kenny



Re: Reload producing lots of duplicate insns

2006-02-26 Thread Kenneth Zadeck
Steven Bosscher wrote:
> Hi,
>
> While teaching dce.c on the dataflow-branch to delete noop-moves,
> I noticed that most noop moves are produced by reload.  It inserts
> duplicate insns for some reloads, postreload turns the duplicate
> reload into a simple reg-reg move (but the lhs and rhs are the same
> register of course), and then dce.c removes these moves because
> they are noop-moves.
>
>   
Steven,

Did Richard's untested patch not fix this?


Kenny


Re: Deprecating -f options?

2006-06-15 Thread Kenneth Zadeck
Steven Bosscher wrote:
> On 6/15/06, Joern RENNECKE <[EMAIL PROTECTED]> wrote:
>> In http://gcc.gnu.org/ml/gcc/2006-06/msg00549.html, you wrote:
>>
>> > -ftree-live-range-split  splits live ranges on exit from ssa.  This is
>> > on by default, and in fact it is more work to NOT split unrelated live
>> > ranges, and creates extra register pressure.
>>
>> I've found that in 4.1, -fno-web is often needed in order to generate
>> auto-increment addressing.
>> I suppose that will be similar for any live range splitting that
>> happens before flow.
>
> Well, is that something you have to blame live range splitting for, or
> is perhaps flow just too stupid to see through the split live ranges?
>
> Actually, Zadeck is working on a new auto-increment pass anyway IIUC.
>
> Gr.
> Steven

My pass is redoing the auto-inc code that is currently in flow so that
it is no longer in flow, and it does the pre-modify and post-modify
properly.  It does not attempt to do what Joern is doing, i.e. changing
a reference from one base register to another.

Kenny


Re: local data flow

2006-07-16 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> I 've been looking at the problem of converting the struct-equiv code
> to use DEF-USE chains
> instead of global dataflow information, but I have hit a snag.
> We can find local registers as being registers that are defined
> somewhere in the examined (partial) block,
> and have all their uses within the (partial) block.  However, there is
> no feasible way to find dying inputs
> with def-use / use-def chains.  We could find the definition with
> use-def chains, and examine all the uses
> of this definition with def-use chains, but we can't easily tell if a
> use could be reached through the examined
> (partial) block.
>
> And this is really not an isolated problem.  Generally, when we
> examine REG_DEAD notes, there is no
> equivalent information to be found in def-use chains.  The problem is
> that we are not really interested
> in what uses can be reached from the definition, but which can be
> reached from the use we are examining.
> I.e. we'd like a list of uses of the same register after the current
> use.  In fact, we don't need an exhaustive list.
> It is sufficient if we have a set of uses such that any use which is
> reachable from the current use is dominated
> by one of the uses in the set (a minimal set is desirable, but not
> necessary).  With these sets in place, we could
> also restrict the def-use information to such a set.  When we add back
> pointers to each link, these data structures
> should be reasonably easy to keep sufficiently up-to-date to remain
> usable, i.e. such that every dependency is
> represented, and that there are no dangling pointers.
You can have def-use chains, you can have use-def chains, or you can have
both.  It seems like what you are asking for is use-def chains, i.e. the
list of defs that can reach each use.  If the set is empty, this is equiv
to a reg-dead note.

kenny


Re: local data flow

2006-07-16 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>>  
>>
>> you can have def-use chains, you can have use-def chains or you can have
>> both.
>> It seems like what you are asking for are use-def chains,
>>
> No, I want to know if there exists a path from the current *use* of a
> register to
> some other *use* of the same register without going through a def.
>
The right way to do this is not to build chains but to define your own
dataflow problem to do this. 

I think that what you want is something like the reaching uses problem
but you want a forwards version of this rather than a backwards version
as is defined in df-problems.c.
However, you may find that that problem does provide the info you are
looking for.

If this is what you want, I can help you define it.



>> i.e. the list
>> of defs that can reach each use.  If the set is empty, this is equiv to
>> a reg-dead note.
>>  
>>
> Huh?  If the set is empty, the use is either uninitialized or
> unreachable.
yes, I am sorry


Re: local data flow

2006-07-16 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>>  
>>
>> The right way to do this is not to build chains but to define your own
>> dataflow problem to do this.  
>>
> But wouldn't I need to update the problem solution every time a change
> a bit of the
> program - which would be much more costly then doing a local update of
> some
> local def-firstuse or use-nextuse chains?
>
Depending on what you are doing, you can update the solution in place.
The point of the dataflow talk was not to say that you cannot do
anything incremental; it was to say that there are no good GENERAL
techniques.  Many times it is possible to update the solution precisely
if you have a very good understanding of the local conditions under
which the transformation is done.

The other trick is to do what I do in the new fast dce or the if-cvt on
the dataflow branch: order the basic blocks so that it does not matter
if you mess up the solution.  Generally a postorder or a reverse
postorder traversal of the basic blocks will have the property that you
can just mess things up in a wave that moves from the beginning to the
end, or from the end to the beginning, of the program without ever
seeing the mess you make.

The only time you need to iterate is if you mess things up at the top
of a block that is the destination of a back edge.

This is a very useful trick in a compiler.  The cfg is your friend.

>> I think that what you want is something like the reaching uses problem
>> but you want a forwards version of this rather than a backwards version
>> as is defined in df-problems.c.
>>  
>>
> It is reaching uses, but the starting point is not necessarily a
> definition, but is more
> often a use.  I want to know about uses that are forward of the
> current site in the
> control flow, but I suppose this is best computed with a backward
> propagation of
> lifeness data.  AFAICT that's the same direction that the current
> reaching use problem
> has.
>
The gen set of the reaching uses problem is the set of uses.  The kill
set are the defs and the clobbers.  This is why the problem is called
"reaching uses".


Re: local data flow

2006-07-17 Thread Kenneth Zadeck
Joern Rennecke wrote:
> In http://gcc.gnu.org/ml/gcc/2006-07/msg00390.html, you write:
>   
>> depending on what you are doing, you can update the solution in place.
>> The point of the dataflow talk was not to say that you cannot do
>> anything incremental, it was to say that there are no good GENERAL
>> techniques.  Many times it is possible to update the solution precisely
>> if you have a very good understanding of the local conditions under
>> which the transformation is done on.
>> 
>
> Right.  So the struct-equiv code does not have to be converted to
> use def-use chains to be better integrated with the dataflow branch;
> it can happily go on using regsets of live registers.
> What is needed is a solid understanding about what invariants are
> required and how we can maintain them while we are changing the rtl,
> in order to keep the cost of updating the information under control.
> Which, ironically, is what I proposed to use in the first place outside
> of the dataflow branch to address the compile time problems, but Berndt
> considered that approach too fragile.
>
>
>   
It is fragile, but not in the normal sense.  It cannot be generalized to
work with any general  transformation.  You will end up with something
that only works for the kinds of transformations that you are doing in
the pass you are writing.  This is fine. 

I should point out that you will also not be able to cure cancer with
this.  You just have to be satisfied to do a good job solving the
problem at hand. 
>> The other trick is to do what I do in the new fast dce or the if-cvt on
>> the dataflow branch:
>> order the basic blocks so that it does not matter if you mess up the
>> solution.  Generally a postorder or a reverse postorder traversal of
>> the basic blocks will have the property that you can just mess things up
>> in a wave that moves from the beginning to the end, or from the end to
>> the beginning, of the program without ever seeing the mess you make.
>> 
>
> cross-jumping two entire basic blocks creates opportunities for further
> cross-jumping and jump simplification involving the predecessor blocks.
> This appears to fit a reverse postorder traversal.
>
> However, if-conversion creates further if-conversion opportunities involving
> the newly merged block.  Thus, while a postorder / reverse postorder
> traversal makes sense, we also need valid information for the newly merged
> block, its successors and predecessors.
> I suppose reg-live-at-start / reg-live-at-end information is actually easier
> to maintain during if-conversion than def-use chains.
>
>   
This is true, certainly in theory, a lot less so in practice.
The way that you order things is this. 

while (something changes and we have not done too many passes)
  do the analysis
  do one properly ordered pass over the cfg, changing what you can
    change without using information that has been corrupted during this pass.
 
You should limit the number of passes to some small number times the
optimization level.
You only iterate if you made changes during that pass.
You try to keep the dataflow up to date.  If that is too hard for some
transformation, you mark the block as not transformable until the next
pass.

In practice, there are few second-order effects, so you end up iterating
twice.
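
As a sketch in C of the driver shape (the function names and the cap
of two passes per -O level are illustrative, not an existing API):

#include <stdbool.h>

struct cfg;
extern void compute_dataflow (struct cfg *);
extern bool ordered_transform_pass (struct cfg *);

/* Recompute the dataflow, make one properly ordered pass, and only go
   around again if that pass changed something, capping the pass count
   at a small multiple of the optimization level.  */
static void
iterate_transforms (struct cfg *cfg, int optimize_level)
{
  int max_passes = 2 * optimize_level;
  bool changed = true;
  for (int pass = 0; changed && pass < max_passes; pass++)
    {
      compute_dataflow (cfg);              /* fresh, correct solution */
      changed = ordered_transform_pass (cfg);
    }
}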

>> I think that what you want is something like the reaching uses problem
>> but you want a forwards version of this rather than a backwards version
>> as is defined in df-problems.c.
> ...
>   
>> the gen set of the reaching uses problem is the set of uses.  the kill
>> set are the defs and the clobbers.  This is why the problem is called
>> "reaching uses".
>> 
>
> In my problem, the gen set is the set of uses, and the kill set are
> defs, clobbers and uses.  Hence, it's still backwards problem.
>
>   
You could define it either way.  However, the backwards problem is
already there in both the mainline and the dataflow branch.  It is
called reaching uses or DF_RU.

> But it is not really useful to compute this just for one or two code
> transformations - benefits could only be obtained if this information
> can be kept usable between different transformation to reduce the number
> of global updates.
>
>   
You have to let it go.  This is why we defined it the way we did.  Just
compute it, use it, and throw it away when you are finished.  It is way
too hard to figure out how to keep this up to date for all of the passes.

> I suppose I should instead continue to operate on regsets, but keep them
> usable without frequent global updates, if that's all right with Berndt.
>   

I think you should look closely at DF_RU.



Re: local data flow

2006-07-17 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>>> I suppose reg-live-at-start / reg-live-at-end information is
>>> actually easier
>>> to maintain during if-conversion than def-use chains.
>> This is true, certainly in theory, a lot less so in practice.
>> The way that you order things is this.
>> while (something changes and we have not done too many passes)
>>do the analysis
>>do one properly ordered pass over the cfg changing what you can
>> change without using information that has been corrupted during this
>> pass.
>>  
>>
> Except that with the current def-use chains, updating them is
> pointless since they
> don't have the required information in the first place.
>>> In my problem, the gen set is the set of uses, and the kill set are
>>> defs, clobbers and uses.  Hence, it's still backwards problem.
>> You could define it either way.  However, the backwards problem is
>> already there in both the mainline and the dataflow branch.  It is
>> called reaching uses or DF_RU.
>>  
>>
> When the block has one or more uses, but no def of a register, the
> question
> to ask is 'which uses of this register reaches this *use* site'
> (backwards).  That's
> different from DF_RU, which calculates this information for def sites
> instead.
>
>>  I think you should look closely at DF_RU.
>
> What more is there to see?  When the block I'm looking at only has
> uses of a
> register, I don't want to know about uses in 'siblings' in the control
> flow graph -
> what I want to know is if the register is live at the end of the
> block.  LR should
> answer that, and I can keep this information up-to-date inasmuch as it
> will
> mention all live registers, and remain self-consistent.
>
Joern, I do not know what problem you want to solve.  You have never
given me, or I think anyone else, a succinct description of exactly what
you want to do here.

What you did say that you needed was "No, I want to know if there exists
a path from the current *use* of a register to some other *use* of the
same register without going through a def. "

From that description I assumed that you really did care which uses
actually reached which other uses.  The reaching uses problem tracks
each use separately.  If this isn't what you need then you are free to
use LR which is certainly much cheaper than RU.


Re: local data flow

2006-07-17 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>> From that description I assumed that you really did care which uses
>> actually reached which other uses.  The reaching uses problem tracks
>> each use separately.  If this isn't what you need then you are free to
>> use LR which is certainly much cheaper than RU.
>>  
>>
> Yes, LR is perfectly adequate.  I was thinking in terms of def-use
> chains only
> because these can be kept more comprehensively up-to-date - i.e. when
> a register
> is live in a loop solely because it is used afterwards, and the use
> after the loop
> goes away, def-use chains won't keep a lingering liveness of the
> register in the
> loop.
> But since the existing def-use chains don't actually fit the problem,
> there is no
> point in trying to use them here.
>
> So, in summary, the plan is:
> - Use LR (Live Registers) information for if-conversion / cross-jumping.
> - Update the information after each transformation to keep a correct,
> but not necessarily
>  minimal solution.
> - After traversing the entire flowgraph, if anything changed,
> re-compute LR from scratch.
> - After re-computing LR, we need to traverse the flowgraph again. 
> This might mean we
>  end up doing more if-conversion / cross-jumping checks than with the
> current scheme, but
>  since these checks are local, this should be generally cheaper than
> trying to keep a minimal
>  LR solution all the time.
>
> I'm not sure how to best handle splitting blocks when doing
> cross-jumping.
> propagate_block can produce inconsistent liveness information for
> stack regs after
> reg_stack.  Currently this is handled by zeroing out all the stack reg
> liveness information.
> Maybe we shouldn't even try to update LR for the split upper parts of
> the original BBs.
> The live-at-end information for the joined lower part should be
> basically the same as both
> original BBs had, and doing further optimizations with a
> partial or fully joined block
> appears not unlikely; however, the remaining unjoined upper parts of
> the original BBs
> are somehow different from each other, so any further cross-jumping for
> them, while not
> impossible, seems unlikely - AFAICT it would require a complete
> cross-jumping of the joined
> lower part with some other block.  So maybe we could mark them as
> having no valid LR
> information, and not consider them again till the next global LR
> recomputation.
You should do this starting from the dataflow branch.  The version of
if-cvt there works as I have mentioned in the previous mail and does not
use propagate block at all.





Re: local data flow

2006-07-17 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>>  
>>
>> You should do this starting from the dataflow branch.  The version of
>> if-cvt there works as I have mentioned in the previous mail and does not
>> use propagate block at all.
>>  
>>
> if-conversion always joins blocks.  But cross-jumping merges blocks or
> partial blocks.
> If the latter, both of the original blocks are split, and the bottom
> part of the blocks
> becomes one merge block.  You need to re-compute the live-at-end
> information for
> the upper parts of the original blocks, which become two separate BBs.
>
>>
>>
>>  
>>
>
Are you saying that this is a problem with my code or a problem with
what you have to do?
Updating the LR dataflow when splitting a basic block is generally
pretty easy.
You start from the end of the block and just interpret the uses and defs
for each insn.
The code in the new version of if-cvt actually does this where it
replaces the old code that used to call propagate_block.
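
Roughly, the backward scan looks like this (a sketch with a toy
bitmask register set and hypothetical insn accessors, not the if-cvt
code itself):

struct insn;
extern struct insn *insn_prev (struct insn *);
extern unsigned long insn_defs (struct insn *);
extern unsigned long insn_uses (struct insn *);

/* Recompute live-at-start for the upper half of a split block: start
   from the still-correct live-at-end set and walk the insns
   backwards, killing each insn's defs and adding its uses.  */
static unsigned long
live_at_start (struct insn *first, struct insn *last,
               unsigned long live_at_end)
{
  unsigned long live = live_at_end;
  for (struct insn *i = last;; i = insn_prev (i))
    {
      live &= ~insn_defs (i);   /* a def ends liveness above it */
      live |= insn_uses (i);    /* a use extends liveness above it */
      if (i == first)
        break;
    }
  return live;
}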

kenny


Re: local data flow

2006-07-17 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>> Updating the LR dataflow when splitting a basic block is generally
>> pretty easy. You start from the end of the block and just interpret
>> the uses and defs
>> for each insn.
>>  
>>
> This means duplicating this aspect of the propagate_block functionality.
> If we really have to do this in multiple places, we should have a
> function
> to do that.
>
I have functions for it, look at the code.
>> The code in the new version of if-cvt actually does this where it
>> replaces the old code that used to call propagate block.
>>  
>>
> Even propagate_block couldn't handle stack regs properly.  The
> problem is
> that arithmetic operations do implicit pushes / pops, which
> effectively renames
> your registers.
> You won't see this problem for passes that run before reg-stack.
Is it really necessary to do your pass after reg-stack?  Given that
there is no if-conversion that runs after reg-stack, what is your point?

I should point out that there are no passes that currently use any
dataflow after regstack.


Re: local data flow

2006-07-17 Thread Kenneth Zadeck
Joern RENNECKE wrote:
> Kenneth Zadeck wrote:
>
>>  
>>
>> is it really necessary to do your pass after reg stack.  Given that
>> there is no if conversion that runs after regstack what is your point?
>>  
>>
> I am talking about cross-jumping after regstack.
>
>> I should point out that there are no passes that currently use any
>> dataflow after regstack.
>>  
>>
> That's because the current cross-jumping is so stupid.
I would be very careful treading here.  regstack is a can of worms and
not very good worms.  We have had a lot of trouble retrofitting better
dataflow into regstack because of the fragile nature of its
implementation.

If you think that your cross-jumping would be good to do as an
interleaved pass with if-conversion, it is possible to see the benefit
there.  However, I think that the code gets hardened a lot by register
allocation, and it is difficult for me to see how heroic transformations
are going to be really useful after that pass.

Anyway, many compilers do label the stack and do dataflow on the
individual stack elements.  GCC is problematic because most of the stack
ops are only settled in register allocation, and in truth it really is
too late to get much benefit after that pass.  But it is not hard to
define a problem that did label the stack elements.

Kenny



type consistency of gimple

2006-08-11 Thread Kenneth Zadeck
Mark,

I have had some discussions with Honza and Diego about the type
consistency at the gimple level.  They told me that Andrew was in the
process of making gimple properly type consistent.

I just wanted to point out how this affects encoding gimple into dwarf.
If the gimple is type consistent, then it looks like the only place
where I will need to write type references is at CONVERT_EXPRs and
NOP_EXPRs.  If it is not type consistent, Diego and Honza do not believe
that I can properly get away without putting type references at every
tree node. 

This looks like about 20% of the size of writing a function body.  I do
not know how close Pinskia's project is to completion, but anything that
you, or any one else, could do to help will pay off for LTO.  It has
been suggested that I assume that gimple is type consistent as a way to
force the issue.  I like this idea, but it is not something that I feel
should be kept a secret either. 

Kenny


Re: type consistency of gimple

2006-08-11 Thread Kenneth Zadeck
Richard Guenther wrote:
> On 8/11/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> Mark,
>>
>> I have had some discussions with Honza and Diego about the type
>> consistency at the gimple level.  They told me that Andrew was in the
>> process of making gimple properly type consistent.
>>
>> I just wanted to point out how this effects encoding gimple into dwarf.
>> If the gimple is type consistent, then it looks like the only place
>> where I will need to write type references is at CONVERT_EXPRs and
>> NOP_EXPRs.  If it is not type consistent, Diego and Honza do not believe
>> that I can properly get away without putting type references at every
>> tree node.
>>
>> This looks like about 20% of the size of writing a function body.  I do
>> not know how close Pinskia's project is to completion, but anything that
>> you, or any one else, could do to help will pay off for LTO.  It has
>> been suggested that I assume that gimple is type consistent as a way to
>> force the issue.  I like this idea, but it is not something that I feel
>> should be kept a secret either.
>
> Maybe you can elaborate how you are missing information (assuming
> the SSA temporaries still have a type in their "decl" node)?  Especially
> fold relies on exact matching types for arithmetic operands, the only
> place where type transitions are not explicit in all cases is for
> MODIFY_EXPRs which can have an implicit type-conversion from
> the type of the RHS to the type of the LHS (see
> tree_ssa_useless_type_conversion()).
>
> Richard.
Richard,

I actually do not know the details.  What I was told by Diego and
Honza while we were in Russia is that what you say is in fact the gold
standard, and unfortunately gcc does not live up to that standard.  The
feeling was that Andrew was in fact fixing bugs, not changing the
definition of gimple.

I am writing this because no one seems to think that Andrew and the
reviewers are really finished with this process, and I was just giving a
supportive nudge.

Kenny




Re: type consistency of gimple

2006-08-11 Thread Kenneth Zadeck
Richard Guenther wrote:
> On 8/11/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> Richard Guenther wrote:
>> > On 8/11/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> >> Mark,
>> >>
>> >> I have had some discussions with Honza and Diego about the type
>> >> consistency at the gimple level.  They told me that Andrew was in the
>> >> process of making gimple properly type consistent.
>> >>
>> >> I just wanted to point out how this effects encoding gimple into
>> dwarf.
>> >> If the gimple is type consistent, then it looks like the only place
>> >> where I will need to write type references is at CONVERT_EXPRs and
>> >> NOP_EXPRs.  If it is not type consistent, Diego and Honza do not
>> believe
>> >> that I can properly get away without putting type references at every
>> >> tree node.
>> >>
>> >> This looks like about 20% of the size of writing a function body. 
>> I do
>> >> not know how close Pinskia's project is to completion, but
>> anything that
>> >> you, or any one else, could do to help will pay off for LTO.  It has
>> >> been suggested that I assume that gimple is type consistent as a
>> way to
>> >> force the issue.  I like this idea, but it is not something that I
>> feel
>> >> should be kept a secret either.
>> >
>> > Maybe you can elaborate how you are missing information (assuming
>> > the SSA temporaries still have a type in their "decl" node)? 
>> Especially
>> > fold relies on exact matching types for arithmetic operands, the only
>> > place where type transitions are not explicit in all cases is for
>> > MODIFY_EXPRs which can have an implicit type-conversion from
>> > the type of the RHS to the type of the LHS (see
>> > tree_ssa_useless_type_conversion()).
>> >
>> > Richard.
>> Richard,
>>
>> I actually do not know the details.  From what I was told by Diego and
>> Honza while we were in Russia is that what you say is in fact the gold
>> standard, and unfortunately gcc does not live up to that standard.  The
>> feeling was that Andrew was in fact fixing bugs, not changing the
>> definition of gimple.
>>
>> I am writing this because no one seems to think that Andrew, and the
>> reviewers are really finished with this process and I was just giving a
>> supportive nudge.
>
> Ok, maybe it's about time we put the various patches that are floating
> around to verify this type correctness on the LTO branch?  I know that
> at least at some point we could bootstrap with them.
>
> Richard.
It is quite possible that that may be the correct plan.  We should wait
until some of the gimple elite respond to this email.  I know that Honza's
patch has rotted because he tried it in Russia and it was DOA.  I assume
that Pinskia has a valid one since he has been beating out the varmints.




Re: type consistency of gimple

2006-08-14 Thread Kenneth Zadeck
David Edelsohn wrote:
>> Diego Novillo writes:
>> 
>
> Diego> Agreed.  This is a good opportunity for us to design a GIMPLE type
> Diego> system.  Besides the obvious space savings and cleanliness, it is also
> Diego> needed to remove lang_hooks.types_compatible_p.
>
>   And this last statement is the key point.  We can and probably
> should implement a prototype LTO without cleaning up the type system, but
> for LTO to be usable in a production compiler, the GIMPLE type system will
> need to be fixed.  Cleaning the type system is not an immediate
> requirement -- not a dependency for the initial implementation of LTO --
> but LTO cannot be generally usable without this problem being addressed.
> A more compact representation is a byproduct, but multi-language
> correctness is necessary for full, production functionality.
>
> David
>
>   
I am modifying my code so that there is a preprocessor flag,
STUPID_TYPE_SYSTEM, that either writes or does not write the redundant
type nodes.
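
In outline, every per-node type reference in the writer ends up behind
a guard like this (the function and its arguments are invented for
illustration, not the actual writer code):

#include <stdio.h>

/* While the GIMPLE type system is still inconsistent, emit a
   (redundant) type reference for every node; once it is consistent,
   emit nothing.  */
static void
maybe_output_type_ref (FILE *out, int type_uid)
{
#if STUPID_TYPE_SYSTEM
  fprintf (out, " T%d", type_uid);      /* redundant per-node type ref */
#else
  (void) out;
  (void) type_uid;                      /* type consistency assumed */
#endif
}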

The comments by Diego imply that not only is this a space issue but that
it will have to be solved in order to get rid of one or more of the
lang_hooks.  Thus, it does make some sense to work on this sooner rather
than later.

I would suggest that we ask those with patches to strengthen the type
system to contribute those patches to the lto branch, and for Diego (who
I believe has the last working type checker) to contribute that type
checker to the lto branch.  And that we further invite those in the
community who are willing to attack some of these problems to contribute
their patches to the lto branch as they become available, on the basis
that perhaps we can provide some fast path to getting them into the
mainline if we are able to solve this problem in a timely manner.

When we get to the point where we can compile C with the type checker, we
should turn it on in the lto branch and remove my STUPID_TYPE_SYSTEM
code.  This will keep the pressure up but at the same time allow us to
get a prototype in a timely manner.

Kenny


Re: type consistency of gimple

2006-08-15 Thread Kenneth Zadeck
Nick Clifton wrote:
> Hi Diego,
>
>> Jeff's point about our optimizers is also true.  Nick, remember that
>> issue with MIPS optimizations you were discussing with Jeff a few days
>> ago?  I didn't follow most of the details, but it involved ivopts and
>> sign issues.  Could you send a summary?
>
> Sure:
>
>   I was looking at how a gcc 4.1 based compiler optimized a fast
> fourier transform algorithm for the MIPS target.  What I found was the
> it was producing much worse code than a similarly configured gcc 3.4
> based compiler, and the culprit was the creation of the induction
> variables used in the inner loop.
>
I think that you are pointing the gun at the wrong suspect.  I believe
that the true villain is somewhere downstream in the backend passes
that cannot see through the type conversions.  This is one case of us
having "degraded" the back end because the middle end likes to break
things up into smaller pieces and the back end has too small a window
to look through to do its work.

We should be looking at the back end to see where it cannot see what it
needs to see rather than trying to stop getting the middle end code into
a reasonable form.

> The nested loops look like this:
>
> signed short i,j,k,DataSizeExponent,DataSize;
> signed short RealBitRevData[MAX], ImagBitRevData[MAX];
> signed short * CosineV, * SineV;
>
> for (k = 1; k <= DataSizeExponent; k++)
>   {
>     signed short n1 = 1 << k;
>     signed short n2 = n1 >> 1;
>     signed short ArgIndex = 0;
>     signed short DeltaIndex = (DataSize >> 1) / n2;
>
>     for (j = 0; j < n2; j++)
>       {
>         signed int WReal = CosineV[ArgIndex];
>         signed int WImag = SineV[ArgIndex];
>
>         ArgIndex += DeltaIndex;
>         for (i = j; i < DataSize; i += n1)
>           {
>             signed short l = i + n2;
>             signed int tRealData = (WReal * RealBitRevData[l]) +
>                                    (WImag * ImagBitRevData[l]);
>             signed int tImagData = (WReal * ImagBitRevData[l]) -
>                                    (WImag * RealBitRevData[l]);
>
>             tRealData = tRealData >> BUTTERFLY_SCALE_FACTOR;
>             tImagData = tImagData >> BUTTERFLY_SCALE_FACTOR;
>             RealBitRevData[l] = RealBitRevData[i] - tRealData;
>             ImagBitRevData[l] = ImagBitRevData[i] - tImagData;
>             RealBitRevData[i] += tRealData;
>             ImagBitRevData[i] += tImagData;
>           }
>       }
>   }
>
> and the inner loop was being transformed into:
>
>   short int D.2046, D.2043;
>   long int D.2038, D.2035;
>   int D.2033, D.2024,  D.2020, D.2008, D.2001;
>[...]
>
> :;
>   D.2033 = (int) (signed short) ivtmp.67;
>   D.2035 = (long int) RealBitRevData[D.2033];
>   D.2038 = (long int) ImagBitRevData[D.2033];
>   tRealData = WReal * D.2035 + WImag * D.2038;
>   tImagData = WReal * D.2038 - WImag * D.2035;
>   D.2001 = (int) i;
>   D.2043 = (short int) (tRealData >> 15);
>   RealBitRevData[D.2033] = RealBitRevData[D.2001] - D.2043;
>   D.2046 = (short int) (tImagData >> 15);
>   ImagBitRevData[D.2033] = ImagBitRevData[D.2001] - D.2046;
>   RealBitRevData[D.2001] = D.2043 + RealBitRevData[D.2001];
>   ImagBitRevData[D.2001] = D.2046 + ImagBitRevData[D.2001];
>   i = (signed short) ((short unsigned int) i + ivtmp.78);
>   ivtmp.68 = ivtmp.68 + ivtmp.78;
>   ivtmp.67 = ivtmp.67 + ivtmp.78;
>   if (DataSize > (signed short) (ivtmp.68 - ivtmp.78)) goto ; else
> goto ;
>
>
> The net result of these induction variables, and especially the type
> translations necessary to go between signed and unsigned and shorts
> and ints was an inner loop consisting of 42 instructions, as opposed
> to an inner loop of 32 instructions as produced by the gcc 3.4 compiler.
>
> Cheers
>   Nick
>



Re: type consistency of gimple

2006-08-15 Thread Kenneth Zadeck
Diego Novillo wrote:
> Kenneth Zadeck wrote on 08/15/06 11:57:
>
>   
>> We should be looking at the back end to see where it cannot see what it
>> needs to see rather than trying to stop getting the middle end code into
>> a reasonable form.
>>
>> 
> You're confused.  This is a middle-end mis-optimization.  However, it is
> true that we should probably make the optimizers smarter.
>
> First, though, we must define a GIMPLE type system.
>   
I most likely am.  You are right, we need a type system.

kenny


Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-08-28 Thread Kenneth Zadeck
Chris Lattner wrote:
> On Aug 28, 2006, at 6:57 AM, Kenneth Zadeck wrote:
>
>> This posting is a progress report of my task of encoding and decoding
>> the GIMPLE stream into LTO.   Included in this posting is a patch that
>> encodes functions and dumps the result to files.
>
> Interesting email.  One question: how big are the files you're
> generating?  For example, how much space does it take to store the
> body of a function with 100 add operations in it (before gzip'ing
> it)?  What is the size proportional to?  # instructions, # basic
> blocks, what else?
>
> -Chris
The size will be large because the trees are still large.  Encoding
everything in a bootstrap for the three languages was on the order of
60MB.  As you know, size is a well known problem in gcc.  The plan is to
track the changes in gimple and as that gets smaller, this will get
smaller.

I would like to get the types and perhaps the flags out of the internal
nodes of the trees.  When that gets done, these operations should be
small.

I actually do not think that it is productive to spend that much time
measuring this until we first assure ourselves that we are getting all
of the information output correctly.  That 60MB number will change a lot
(both up and down) as we get closer to getting everything tied
together.  Then we can spend more time looking at the pieces and seeing
where we need to do better.



Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-08-28 Thread Kenneth Zadeck
Michael Eager wrote:
> Kenneth Zadeck wrote:
>> This posting is a progress report of my task of encoding and decoding
>> the GIMPLE stream into LTO.   Included in this posting is a patch that
>> encodes functions and dumps the result to files.  
>
> I have only a limited understanding of GIMPLE and LTO, but here are my
> comments regarding DWARF.
>
> DWARF is a debugging format used to describe a program's source and how
> it is represented in an executable file.  GIMPLE is an intermediate
> representation, not program source.  It's always interesting to see one
> tool used in a different context, but I'm not clear that in this case
> you are using the right tool for the right task in the way that it was
> designed.
>
> Speaking on behalf of the DWARF Standards Workgroup, we welcome
> suggestions for extensions or improvements in the DWARF standard.
> I wasn't able to identify any specific extension in your email (although
> I didn't look at the patch in detail).  If there are specific areas
> where DWARF can be improved, please let me know.
>
Michael,

My original posting was primarily intended for the gcc developer
community which is familiar with gimple and not so familiar with dwarf. 
I actually think that dwarf is probably pretty good for what it was
designed for and it is also documented much better than most of the
internals of gcc. 

The task I was given was to encode gimple into dwarf, something that
looked like it might be a reasonable idea, but is, in fact, a bad idea.
My posting was to tell the gcc community: been there, done that, now
let's move on.

I will respond to many of your points, but you should not take this as
criticism of dwarf.  These are mostly just places where the interface
between gimple and dwarf was particularly troublesome, and unless you
want to get into the transporting-intermediate-code business, which I
strongly suggest you avoid, you can just ignore most of these problems.


>
>> 1) ABBREV TABLES ARE BAD FOR LTO.
>>
>> The abbrev table is a mechanism for making a DWARF3 file self
>> descriptive.  It contains a set of templates that are used to describe
>> the records and attributes that are written to describe the object
>> module.
>>
>> This self description mechanism is much weaker (an therefor much more
>> compact) than an XML file's schemata because it depends on a common
>> understanding of the meaning of the tags and attributes that describe
>> the pieces of the data.  The tags themselves can be thought of as
>> characters, and an abbrev entry is a magic cookie that tells you how
>> to break the characters into words.
>
> DWARF is not self-descriptive, except in the very limited sense that
> several sections contain version numbers.  The DWARF abbreviation table
> is not intended to be self-descriptive or to be descriptive of the
> DWARF data.  It is a simple compression scheme which compacts the
> Tag/Attribute/Value structure of a DWARF DIE into a much more condensed
> format.
>
> Certainly the DWARF Abbrev table makes no pretense of being anything
> similar to an XML schema.  I wouldn't use an XML schema for debugging
> information (it would be huge and slow), and I wouldn't expect to use
> DWARF abbreviation tables to describe anything at all.
>
While I understand your point, from a language theoretic point of view,
they really can be thought of as schemata.  They are a description of
the record types that have been written by the compiler (or some other
tool) for the tags and attributes in this module.
This may not be the way that they evolved in your community, but for
someone coming from the outside with no knowledge of your history, this
is what they are.
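
For readers who have not looked inside .debug_abbrev, an entry is
roughly the following (an illustrative layout, not byte-exact); each
DIE in .debug_info then just names an abbrev code and supplies the
attribute values, which is exactly the record-schema reading above:

.debug_abbrev:
  abbrev code 1:  DW_TAG_variable, DW_CHILDREN_no
    DW_AT_name      DW_FORM_string
    DW_AT_type      DW_FORM_ref4
    0, 0            (end of attribute list)

.debug_info:
  DIE:  abbrev code 1, "counter", <4-byte reference to the type's DIE>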

>> However, this mechanism is only self descriptive if you do not extend
>> the set of tags.  That is not an option for LTO.  While the DWARF3
>> designers seem to have added every conceivable tag to describe object
>> code, it is not surprising that this comes up short when describing
>> GIMPLE.  While some of GIMPLE could be shoe-horned into the existing
>> tags, all of it will not fit, so anyone who wants to understand
>> GIMPLE, will need to use the GCC headers.
>
> The DWARF standard provides for a consistent means to extend
> the debugging format.  If you extend DWARF beyond the standard, as has
> often been done, you will need to provide descriptions (in some form)
> of what these extensions mean.  You can provide this as written
> material, or you can have the implementer read the code.  In either
> case, the Abbrev tables only provide compression for the Tag/Attr/Val
> structure of a DIE.  They do not provide any semantic description.
>
What I did was add 4 new DW_FORMS.  This is one of the areas t

Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-08-30 Thread Kenneth Zadeck
Mark Mitchell wrote:
> Kenneth Zadeck wrote:
>
>
>
>> This
>> will be more cumbersome if we have to keep reloading each object
>> file's abbrev table just to tear apart a single function in that .o
>> file.  While the abbrev sections average slightly less than 2% of
>> the size of the GIMPLE encoding for an entire file, each abbrev table
>> averages about the same size as a single function.
>
> Interesting datapoint.
>
> (Implied, but not stated, in your mail is the fact that the
> abbreviation table cannot be indexed directly.  If it could be, then
> you wouldn't have to read the entire abbreviation table for each
> function; you would just read the referenced abbreviations.  Because
> the abbreviation table records are of variable length, it is indeed
> true that you cannot make random accesses to the table.  So, this
> paragraph is just fleshing out your argument.)
>
> I think the conclusion that you reach (that the size of the tables is
> a problem) depends on how you expect the compiler to process functions
> at link-time.  My expectation was that you would form a global
> control-flow graph for the entire program (by reading CFG data encoded
> in each .o file), eliminate unreachable functions, and then
> inline/optimize functions one-at-a-time.
>
> If you sort the function-reading so that you prefer to read functions
> from the same object file in order, then I would expect that you would
> considerably reduce the impact of reading the abbreviation tables. 
> I'm making the assumption that if f calls N functions, then they
> probably come from < N object files.  I have no data to back up that
> assumption.
>
> (There is nothing that says that you can only have one abbreviation
> table for all functions.  You can equally well have one abbreviation
> table per function.  In that mode, you trade space (more abbreviation
> tables, and the same abbreviation appearing in multiple tables)
> against the fact that you now only need to read the abbreviation
> tables you need.  I'm not claiming this is a good idea.)
>
> I don't find this particular argument (that the abbreviation tables
> will double file I/O) very convincing.  I don't think it's likely that
> the problem we're going to have with LTO is running out of *virtual*
> memory, especially as 64-bit hardware becomes nearly universal.  The
> problem is going to be running out of physical memory, and thereby
> paging like crazy, running out of D-cache.  So, I'd assume you'd just
> read the tables as-needed, and never bother discarding them.  As long as
> there is reasonable locality of reference to abbreviation tables
> (i.e., you can arrange to hit object files in groups), then the cost
> here doesn't seem like it would be very big.
>
Even if we decide that we are going to process all of the functions in
one file at one time, we still have to have access to the functions that
are going to be inlined into the function being compiled.  Getting at
those functions that are going to be inlined is where the doubled I/O
argument comes from.

I have never depended on the kindness of strangers or the virtues of
virtual memory.  I fear the size of the virtual memory when we go to
compile really large programs.


>> 2) I PROMISED TO USE THE DWARF3 STACK MACHINE AND I DID NOT.
>
> I never imagined you doing this; as per above, I always expected that
> you would use DWARF tags for the expression nodes.  I agree that the
> stack-machine is ill-suited.
>
>> 3) THERE IS NO COMPRESSION IN DWARF3.
>
>> In 1 file per mode, zlib -9 compression is almost 6:1.  In 1 function
>> per mode, zlib -9 compression averages about 3:1.
>
> In my opinion, if you considered DWARF + zlib to be satisfactory, then
> I think that would be fine.  For LTO, we're allowed to do whatever we
> want.  I feel the same about your confession that you invented a new
> record form; if DWARF + extensions is a suitable format, that's fine.
> In other words, in principle, using a somewhat non-standard variant of
> DWARF for LTO doesn't seem evil to me -- if that met our needs.
>
One of the comments that was made by a person on the dwarf committee is
that the abbrev tables really can be used for compression.  If you have
information that is really common to a bunch of records, you can build
an abbrev entry with the common info in it.

I have not seen a place where any use can be made of this for encoding
gimple except for a couple of places where I have encoded a true or
false.  I therefore really do not see that they add anything except for
the code to read and write them.


>> 2) LOCAL DECLARATIONS 
>> Mark was going to do all of the types and all of the declar

Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-08-31 Thread Kenneth Zadeck
Mark Mitchell wrote:
> Kenneth Zadeck wrote:
>
>> Even if we decide that we are going to process all of the functions in
>> one file at one time, we still have to have access to the functions that
>> are going to be inlined into the function being compiled.  Getting at
>> those functions that are going to be inlined is where the double the i/o
>> arguement comes from. 
>
> I understand -- but it's natural to expect that those functions will
> be clumped together.  In a gigantic program, I expect there are going
> to be clumps of tightly connected object files, with relatively few
> connections between the clumps.  So, you're likely to get good cache
> behavior for any per-object-file specific data that you need to access.
>
I just do not know.  I assume that you are right, that there is some
clumping.  But I am just not sure.

>> I have never depended on the kindness of strangers or the virtues of
>> virtual memory.  I fear the size of the virtual memory when we go to
>> compile really large programs. 
>
> I don't think we're going to blow out a 64-bit address space any time
> soon.  Disks are big, but they are nowhere near *that* big, so it's
> going to be pretty hard for anyone to hand us that many .o files. 
> And, there's no point manually reading/writing stuff (as opposed to
> mapping it into memory), unless we actually run out of address space.
>
I am not so concerned with running out of virtual address space as I
am about being able to break this up so that it can be done in parallel,
on a farm of machines.  Otherwise, lto can never be part of anyone's
compile/test loop.

The notion of having 20 or 50 compile servers each mapping all of the
files of a large system seems like a bad design point.


> In fact, if you're going to design your own encoding formats, I would
> consider a format with self-relative pointers (or, offsets from some
> fixed base) that you could just map into memory.  It wouldn't be as
> compact as using compression, so the total number of bytes written
> when generating the object files would be bigger.  But, it will be
> very quick to load it into memory.
>
If you look at my code, that is what I have, at least with respect to
the function itself.
There is one big difference here between lto and what a debugger needs.
I could see designing a debugger (and I have no idea if any such
debuggers exist) that simply maps in the debug information and just uses
the in-core representation as is.  Dwarf seems to have been designed to
support this (but then again I could be dreaming).  With an
intermediate form of a compiler, the usage is quite different.  All that
we are going to do is load a function, convert it to gimple and then
throw away (the notion of throw away may not have meaning for memory
mapped files) the on-disk version.

The prime goal is that the format be designed so that an entity
(generally a function) can be expanded into gimple in one pass.  Then
the question of the benefit of using a compressor comes down to
processor speed vs. I/O speed.

With the parts that you are in charge of, namely the types and the
globals, this is not true.  I can very easily see an implementation of
the types and decls that is like I describe for the debugger: you map it
into memory and just use it from there.  But since the intermediate code
for a function body is going to get heavily modified, and our existing
gimple is chock full of pointers, it is harder to envision ever
winning at the mapping game.
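
(For concreteness, the self-relative-pointer scheme Mark describes is
the following trick -- a sketch, not the actual LTO encoding:)

#include <stdint.h>

/* A reference stored as a byte offset from its own location, so a
   mapped image needs no relocation no matter where it lands in the
   address space.  */
typedef int32_t rel_ref;

static inline const void *
rel_deref (const rel_ref *r)
{
  return (const char *) r + *r;
}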


> I guess my overriding concern is that we're focusing heavily on the
> data format here (DWARF?  Something else?  Memory-mappable?  What
> compression scheme?) and we may not have enough data.  I guess we just
> have to pick something and run with it.  I think we should try to keep
> that code as as separate as possible so that we can recover easily if
> whatever we pick turns out to be (another) bad choice. :-)
>
>> One of the comments that was made by a person on the dwarf committee is
>> that the abbrev tables really can be used for compression.  If you have
>> information that is really common to a bunch of records, you can build
>> an abbrev entry with the common info in it. 
>
> Yes.  I was a little bit surprised that you don't seem to have seen
> much commonality.  If you recorded most of the tree flags, and treated
> them as DWARF attributes, I'd expect you would see relatively many
> expressions of a fixed form.  Like, there must be a lot of PLUS_EXPRs
> with TREE_USED set on them.  But, I gather that you're trying to avoid
> recording some of these flags, hoping either that (a) they won't be
> needed, or (b) you can recreate them when reading the file.  I think
> both (a) and (b) hold in many cases, so I think it'

Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-08-31 Thread Kenneth Zadeck
Mark Mitchell wrote:
> Kenneth Zadeck wrote:
>
>> I am not so concerned with running out of virtual address space than I
>> am about being able to break this up so that it can be done in parallel,
>> on a farm of machines.  Otherwise, lto can never be part of anyone's
>> compile/test loop. 
>
> I think we just expanded the scope of work by an order of magnitude. :-)
>
> If you had just said that you wanted to support multi-threaded LTO,
> that would have been a big deal.  But multiple machines with multiple
> address spaces trying to do LTO on one program is a really big deal. 
> (Of course, there is a "cheap hack" way of doing what you want: run
> LTO on clumps of object files in parallel, and then just link the
> pre-optimized files together in the ordinary way.)
>
> I'd really like to see us inline a function before we even begin to
> have this conversation. :-)
I have no intention of expanding the scope of the work.  I just am not
going to do anything that precludes doing this.  As long as the function
bodies are in a form that can be easily accessed without reading and
processing the entire file, I am fine.


>
>> I have no idea how "stable" all the types and decls are over a
>> compilation.  I write my info pretty early, and I assume the types and
>> decls are written pretty late in the compilation (otherwise you would
>> not have address expressions for the debugger).  If there has been any
>> "processing" on these between when I write my stuff and when the types
>> and decls get written, things may not match up. 
>
> I don't think that this is an issue.  The important information about
> types and declaration is stable.  Things like "is this declaration
> used?" change over the course of the compilation, but that's not
> useful for DWARF anyhow -- and, in general, we don't write out
> information about types/declarations that are entirely unused.  The
> key aspects (sizes/layouts/etc.) are fixed.
>
You are either right, or we will fix it.  Those are the only two
options.  My only point was that there is a lot of compiler in between
where someone could have done something silly.




Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-09-01 Thread Kenneth Zadeck
Daniel Jacobowitz wrote:
> On Thu, Aug 31, 2006 at 04:00:25PM -0700, Mark Mitchell wrote:
>   
>> I think this is probably moot, since I believe that Kenny feels DWARF is 
>> not suitable for reasons other than the abbreviation table issue, but 
>> this is a clever technique.
>> 
>
> ... for GIMPLE; when I discussed with him, I got the impression he was
> still open to using it for other things, like types.  I may have been
> mistaken.
>
>   
Given that neither Mark nor, for that matter, anyone else really pushed
back, I am pretty much committed to not using dwarf.

Kenny


Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-09-01 Thread Kenneth Zadeck
Daniel Jacobowitz wrote:
> On Fri, Sep 01, 2006 at 09:45:34AM -0400, Kenneth Zadeck wrote:
>   
>> Given that Mark, and for that matter no one else,  did not really push
>> back, I am pretty much committed not to use dwarf.
>> 
>
> Then... what are you going to do about things like types?  Invent a new
> serialization for those too?  I think that confusing dwarf-for-types
> and dwarf-for-gimple would be a mistake.
>
>   
My part is only the function bodies; we are still going to use dwarf for
the types and the global variables.  There are people at CodeSourcery
who, even as we speak, are busily enhancing that part to get all of the
pieces output, not just the parts used for the debugger.

Kenny


Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-09-01 Thread Kenneth Zadeck
Diego Novillo wrote:
> Kenneth Zadeck wrote on 08/28/06 09:57:
>
>   
>> I have not done this because I do not rule the earth.  That was not
>> what I was assigned to do, and I agreed that DWARF3 sounded like a
>> reasonable way to go.  Now that I understand the details of DWARF3, I
>> have changed my mind about the correct direction.  Now is the time to
>> make that change before there is a lot of infrastructure built that
>> assumes the DWARF3 encoding.
>>
>> 
> FWIW.  I agree with your conclusion.  I do not know DWARF3 very well,
> but it always felt a bit heavy handed to me.
>
> At first, I was under the impression that we were going to use DWARF3 to
> describe the types and symbol table, and embed our bytecode alongside.
> It sounds to me like we want to either invent our own bytecode or adapt
> an existing one to our needs.  Inventing our own is probably the best
> long-term alternative.
>
> A few comments on the code:
>
>   
>> Index: lto-tree-flags.def
>> [ ... ]
>> +/* This file is designed to be inlined into both the writing and the
>> +   reading of the lto information.  What it does depends on the glue
>> +   that put in front and at the end of this and how ADD_BASE_FLAG is
>> +   defined. 
>> +
>> +   For those interested in extra credit, this could also be used as
>> +   the checking code to see if each flag is used correctly.  10 extra
>> +   credit points will be given for the intrepid programmer who does
>> +   this and is able to removes the comment that this was generated
>> +   from in tree.h.  */
>>
>> 
> Nothing in this comment helps in understanding what the file is supposed
> to do.  What is this used for?
>   

When I get the other side of the code finished and enhance the comments,
you will see. 
>   
>> +  switch (code)
>> +{
>> +case ABS_EXPR:
>> +  break;
>> +  
>> +case ADDR_EXPR:
>> +  break;
>> +  
>> +  ADD_BASE_FLAG (static_flag)
>>
>> 
> This code is unreachable.  There are many instances of this here.
>
>
>   
I would have found that when I wrote the code that reads this back.
>> +/* Return the master clone node of N if it is available and if
>> +   CHECK_OVERWRITE is true, not overwritable.  */ 
>> +
>> 
> What does it mean to be overwritable?  You never seem to call this
> function with check_overwrite == false.
>
>   
I have no idea what overwritable is used for.  This is something that
honza and I went around many times on when I was writing my ipa code.
It appears that you can put attributes on some functions that allow the
function to be replaced late, like at link or even runtime.

For instance, when I was doing my pure-const analysis, I had to mark
these as not being pure or const even if they looked like they could be,
since they could be replaced by versions that were not pure or const.

However, here, I need the master clone node, but I do not care if it is
overwritable; I have to put it out anyway.


I  do call it that way now. Thanks for noticing.
>   
>> +struct char_ptr_base
>> +{
>> +  char *ptr;
>> +};
>> +
>> 
> Hmm?  Why?  Planning to have something different in the future?
>
>
>   
>> +/* An incore byte stream to buffer the various parts of the
>> +function. The entire structure should be zeroed when created.  The
>> +record consists of a set of blocks.  The first sizeof (ptr) bytes are
>> +used as a chain, and the rest store the bytes to be written.  */
>> +
>> +struct output_stream
>> +{
>> +  /* The pointer to the first block in the stream.  */
>> +  struct char_ptr_base * first_block;
>> +  /* The pointer to the last and current block in the stream.  */
>> +  struct char_ptr_base * current_block;
>> +  /* The pointer to where the next char should be written.  */
>> +  char * current_pointer;
>> +  /* The number of characters left in the current block.  */
>> +  unsigned int left_in_block;
>> +  /* The block size of the last block allocated.  */
>> +  unsigned int block_size;
>> +  /* The total number of characters written.  */
>> +  unsigned int total_size;
>> +};
>> +
>> 
> Maybe there's code out there for paging/caching data streams?  Though we
> would probably want to tweak it quite a bit, it may save some effort for
> the base functionality.
>
> Even if we end up not using DWARF3 as output, the streaming aspect of
> this code will remain, I guess?
>
>   
Yes, the streaming remains; I need to do it this way so that I can then
pass some of the buffers into a compressor.

I had though
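
For concreteness, here is a minimal sketch of how bytes might be
appended to the block-chained output_stream quoted above (the structs
are the ones from the patch; the growth policy here is an assumption,
not the actual LTO code):

#include <stdlib.h>

static void
output_byte (struct output_stream *s, char c)
{
  if (s->left_in_block == 0)
    {
      /* Start a new block: the first sizeof (ptr) bytes chain to the
         next block, the rest hold data.  */
      if (s->block_size == 0)
        s->block_size = 1024;
      struct char_ptr_base *b = malloc (s->block_size);
      b->ptr = NULL;
      if (s->current_block)
        s->current_block->ptr = (char *) b;
      else
        s->first_block = b;
      s->current_block = b;
      s->current_pointer = (char *) b + sizeof (struct char_ptr_base);
      s->left_in_block = s->block_size - sizeof (struct char_ptr_base);
    }
  *s->current_pointer++ = c;
  s->left_in_block--;
  s->total_size++;
}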

Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-09-01 Thread Kenneth Zadeck
Andrew Pinski wrote:
> On Fri, 2006-09-01 at 09:51 -0400, Diego Novillo wrote:
>   
>>> +#if STUPID_TYPE_SYSTEM
>>>
>>>   
>> STUPID_TYPE_SYSTEM?  No need to be insulting.  It's unpleasant.
>> 
>
> Well, right now it is stupid; this is just a workaround anyways until
> people fix the type mismatches really.  Is it more insulting than having
> stupid.c which existed in GCC before 3.0.0?  Also this is very temporary
> and will go away when the problem in the rest of the compiler is fixed.
>
> -- Pinski
>
>   
I will tone it down, the point has been made.

kenny


this code in fold-const.c:fold_single_bit_test looks wrong to me

2013-06-25 Thread Kenneth Zadeck

   if (TREE_CODE (inner) == RSHIFT_EXPR
       && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST
       && TREE_INT_CST_HIGH (TREE_OPERAND (inner, 1)) == 0
       && bitnum < TYPE_PRECISION (type)
       && 0 > compare_tree_int (TREE_OPERAND (inner, 1),
                                bitnum - TYPE_PRECISION (type)))
     {
       bitnum += TREE_INT_CST_LOW (TREE_OPERAND (inner, 1));
       inner = TREE_OPERAND (inner, 0);
     }


In particular, in the last stanza of the test:
TREE_OPERAND (inner, 1) is a non-negative number from the second stanza,
and bitnum is always positive and less than TYPE_PRECISION (type)
from the third stanza,

so bitnum - TYPE_PRECISION (type) is always negative,

so compare_tree_int always returns a positive value, the "0 >" test
always fails, and this code will never be executed.  (For instance, with
a 32-bit type and bitnum == 5, the call is compare_tree_int (shift, -27),
which is positive for any valid shift count.)

It is hard to believe that this is what you want.

kenny



Re: [wide-int] int_traits

2013-10-18 Thread Kenneth Zadeck
I am a little confused here.  What is the reason for doing the
is_sign_extended thing?
Is the point of deferring the canonicalization that we can avoid the use
of the scratch array until the value is actually used, and then we
canonicalize on demand?


That seems like it is going to require several bits of state: like one
to say if we have canonicalized or not.  All of these will then have to
be checked on every operation.


However, I guess if it keeps the alias analyzer sane then it is
ok, but it makes everything else a little more complex.


Kenny




On 10/18/2013 10:57 AM, Richard Sandiford wrote:

[off-list]

Kenneth Zadeck  writes:

Richi,

Do you want me to back out the patch that changes the rep for unsigned
tree-csts?

kenny

Doesn't look like you're on IRC, so FWIW:

 richi: just to check, you still want the scratch array to be
   separate from the other fields, to avoid the other fields escaping,
   is that right? [11:13]
 will try that today if so
 rsandifo: that would be nice, but I'm not settled yet on what to do on
::decompose fully - the idea of a ::is_sign_extended storage-ref flag
popped up [11:17]
 rsandifo: so eventually we don't need a scratch member anymore (which
would be best)
 rsandifo: so if you can give that idea a take instead ... ;) [11:18]
 today I'll try to clear my patch review pipeline ...
 yeah.  Just to be sure: we still need the extra zero HWI for the
   large unsigned constants though, right?
 no, in that case we wouldn't need that
 we'd have extra (conditional) sext() operations in all sign-dependend
ops though [11:19]
 thus delay the canonicalization until the first use
 I thought the ::is_sign_extended was dealing with a different case
   (small_prec).  The problem with the extra zero HWI is different.
 no, it would deal with the case in general and say "if you need a
sign-extended rep you have to sign-extend" [11:20]
 but the point of the extra HWI is that if you extend a 64-bit tree
   to 128 bits, you have two signficant HWIs.
 for the fixed_wide_int rep we then want to combine the extension with
the copy in its constructor [11:21]
 For unsigned constants that extra HWI is zero.  For signed
   constants it's minus one.
 The choice matters in that case, since it affects the 128-bit
   result of any arithmetic
 but here the interface is just what limits us - the fact that
decompose allows precision != xprecision [11:22]
 that case should be split out to a different "decompose"
 it's just needed for fixed_wide_int AFAIK?
 But the interface was defined that way to avoid constructing
   fixed_wide_ints for x and y when doing x + y -> fixed_wide_int
 we could instead do addition with three different precisions (x, y
   and the result), but that'd get complicated... [11:23]
 hmm, I see
 well, then leave the extra zeros alone for now [11:24]
 ok, works for me, thanks
 so for this ::is_sign_extended thing, we basically go back to the
   original "arbitrary upper bits" as the default (conservative)
   assumption, but add optimisations for the case where
   ::is_sign_extended is a compile-time true? [11:25]
 yes
 OK
 "arbitrary upper bits" would be either sign- or zero-extended thouhg
(but it probaly doesn't make a difference to arbtrary)
 on a tree wide_int_ref would have that "arbitrary upper bits"
 that in as far as I can see should get rid of scratch (fingers
crossing) [11:27]
 s/on/only/
 Right -- that copy wasn't there in Kenny's original version, it
   only came in because of the "small_prec having defined upper bits".
 I agree ::is_sign_extended sounds like a nice compromise [11:28]
 let's hope it will work out and improve code as desired ;)
 also the parts Mike suggested, adding more compile-time known
optimizations [11:29]
 yeah
 and making tree.c predicates use the tree rep directly
 though then integer_zerop will be magically more efficient than t == 0
... [11:30]
 would be nice if we could wi:: to be efficient enough so that we
   can keep the tree.c bits as-is [11:32]
 ...could get... [11:33]
 with CONSTANT (...) tricks or whatever
 realise that might be a bit optimistic though...
 yeah [12:41]





suspect code in fold-const.c

2013-11-14 Thread Kenneth Zadeck
In doing the wide-int conversion, I have found the following code on the
trunk which seems somewhat suspect:



this code from fold-const.c starts on line 13811.

else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
         && TREE_INT_CST_LOW (arg1) == signed_max_lo
         && TYPE_UNSIGNED (arg1_type)
         /* We will flip the signedness of the comparison operator
            associated with the mode of arg1, so the sign bit is
            specified by this mode.  Check that arg1 is the signed
            max associated with this sign bit.  */
         && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
         /* signed_type does not work on pointer types.  */
         && INTEGRAL_TYPE_P (arg1_type))

It seems that the check on bitsize should really be a check on the
precision of the variable.  If this seems right, I will correct this on
the trunk and make the appropriate changes to the wide-int branch.


kenny




Re: suspect code in fold-const.c

2013-11-15 Thread Kenneth Zadeck

On 11/15/2013 04:07 AM, Eric Botcazou wrote:

this code from fold-const.c starts on line 13811.

  else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
   && TREE_INT_CST_LOW (arg1) == signed_max_lo
   && TYPE_UNSIGNED (arg1_type)
   /* We will flip the signedness of the comparison operator
  associated with the mode of arg1, so the sign bit is
  specified by this mode.  Check that arg1 is the signed
  max associated with this sign bit.  */
   && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
   /* signed_type does not work on pointer types.  */
   && INTEGRAL_TYPE_P (arg1_type))

with width defined as:

unsigned int width = TYPE_PRECISION (arg1_type);


it seems that the check on bitsize should really be a check on the
precision of the variable.   If this seems right, i will correct this on
the trunk and make the appropriate changes to the wide-int branch.

Do you mean

   && width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))

instead?  If so, that would probably make sense, but there are a few other
places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular the
very similar transformation done in fold_single_bit_test_into_sign_test.

yes.  I understand the need to do this check on the mode rather than the 
precision of the type itself.
The point is that if the mode under this type happens to be a partial 
int mode, then that sign bit may not even be where the bitsize points to.
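
To make the partial-int case concrete, a sketch (hypothetical 24-bit
target; the numbers are assumptions for illustration):

/* Suppose PSImode holds 24 significant bits in a 32-bit container:

     GET_MODE_BITSIZE (PSImode)   == 32    size of the container
     GET_MODE_PRECISION (PSImode) == 24    significant bits

   For a type with TYPE_PRECISION == 24 the sign bit is bit 23.  */
unsigned int width = TYPE_PRECISION (arg1_type);  /* 24 */
/* old test: width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
     -> 24 == 32, never fires, and a bitsize-based fold would look
        for the sign bit at bit 31;
   new test: width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))
     -> 24 == 24, correct.  */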


However, having just done a few greps, it looks like this case was just 
the one that i found while doing the wide-int work, there may be several 
more of these cases.   Just in fold-const, there are a couple in 
fold_binary_loc.   The one in tree.c:int_fits_type_p looks particularly 
wrong.


I think that there are also several in tree-vect-patterns.c.

Kenny


Re: suspect code in fold-const.c

2013-11-15 Thread Kenneth Zadeck


This patch fixes a number of places where the mode bitsize had been used 
but the mode precision should have been used.  The tree level is 
somewhat sloppy about this - some places use the mode precision and some 
use the mode bitsize.   It seems that the mode precision is the proper 
choice since it does the correct thing if the underlying mode is a 
partial int mode.


This code has been tested on x86-64 with no regressions.   Ok to commit?



2013-11-15 Kenneth Zadeck 
* tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
GET_MODE_PRECISION.
* fold-const.c (fold_single_bit_test_into_sign_test,
fold_binary_loc):  Change GET_MODE_BITSIZE to
GET_MODE_PRECISION.

Kenny


On 11/15/2013 08:32 AM, Kenneth Zadeck wrote:

On 11/15/2013 04:07 AM, Eric Botcazou wrote:

this code from fold-const.c starts on line 13811.

  else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
   && TREE_INT_CST_LOW (arg1) == signed_max_lo
   && TYPE_UNSIGNED (arg1_type)
   /* We will flip the signedness of the comparison operator
  associated with the mode of arg1, so the sign bit is
  specified by this mode.  Check that arg1 is the signed
  max associated with this sign bit.  */
   && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
   /* signed_type does not work on pointer types. */
   && INTEGRAL_TYPE_P (arg1_type))

with width defined as:

unsigned int width = TYPE_PRECISION (arg1_type);


it seems that the check on bitsize should really be a check on the
precision of the variable.   If this seems right, i will correct this on
the trunk and make the appropriate changes to the wide-int branch.

Do you mean

   && width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))

instead?  If so, that would probably make sense, but there are a few other
places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular the
very similar transformation done in fold_single_bit_test_into_sign_test.

yes.  I understand the need to do this check on the mode rather than 
the precision of the type itself.
The point is that if the mode under this type happens to be a partial 
int mode, then that sign bit may not even be where the bitsize points to.


However, having just done a few greps, it looks like this case was 
just the one that i found while doing the wide-int work, there may be 
several more of these cases.   Just in fold-const, there are a couple 
in fold_binary_loc.   The one in tree.c:int_fits_type_p looks 
particularly wrong.


I think that there are also several in tree-vect-patterns.c.

Kenny


Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 204842)
+++ gcc/fold-const.c	(working copy)
@@ -6593,7 +6593,7 @@ fold_single_bit_test_into_sign_test (loc
 	  /* This is only a win if casting to a signed type is cheap,
 	 i.e. when arg00's type is not a partial mode.  */
 	  && TYPE_PRECISION (TREE_TYPE (arg00))
-	 == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg00)))
+	 == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg00)))
 	{
 	  tree stype = signed_type_for (TREE_TYPE (arg00));
 	  return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
@@ -12050,7 +12050,7 @@ fold_binary_loc (location_t loc,
 	zerobits = ((((unsigned HOST_WIDE_INT) 1) << shiftc) - 1);
 	  else if (TREE_CODE (arg0) == RSHIFT_EXPR
 		   && TYPE_PRECISION (TREE_TYPE (arg0))
-		  == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg0))))
+		  == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg0))))
 	{
 	  prec = TYPE_PRECISION (TREE_TYPE (arg0));
 	  tree arg00 = TREE_OPERAND (arg0, 0);
@@ -12061,7 +12061,7 @@ fold_binary_loc (location_t loc,
 		{
 		  tree inner_type = TREE_TYPE (TREE_OPERAND (arg00, 0));
 		  if (TYPE_PRECISION (inner_type)
-		  == GET_MODE_BITSIZE (TYPE_MODE (inner_type))
+		  == GET_MODE_PRECISION (TYPE_MODE (inner_type))
 		  && TYPE_PRECISION (inner_type) < prec)
 		{
 		  prec = TYPE_PRECISION (inner_type);
@@ -13816,7 +13816,7 @@ fold_binary_loc (location_t loc,
 			associated with the mode of arg1, so the sign bit is
 			specified by this mode.  Check that arg1 is the signed
 			max associated with this sign bit.  */
-		 && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
+		 && width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))
 		 /* signed_type does not work on pointer types.  */
 		 && INTEGRAL_TYPE_P (arg1_type))
 	  {
Index: gcc/tree.c
===
--- gcc/tree.c	(revision 204842)
+++ gcc/tree.c	(working copy)
@@ -8614,7 +8614,7 @@ retry:
   /* Third, unsigned integers with top bit set never fit signed types.  */
   if (! TYPE_UNSIGNED (type) && unsc)
 {
-  int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (c))) - 1;
+  int prec = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (c))) - 1;

Re: suspect code in fold-const.c

2013-11-18 Thread Kenneth Zadeck

committed as revision 204987.

thanks

kenny

On 11/18/2013 05:38 AM, Richard Biener wrote:

On Fri, 15 Nov 2013, Kenneth Zadeck wrote:


This patch fixes a number of places where the mode bitsize had been used but
the mode precision should have been used.  The tree level is somewhat sloppy
about this - some places use the mode precision and some use the mode bitsize.
It seems that the mode precision is the proper choice since it does the
correct thing if the underlying mode is a partial int mode.

This code has been tested on x86-64 with no regressions.   Ok to commit?

Ok.

Thanks,
Richard.



2013-11-15 Kenneth Zadeck 
 * tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
 GET_MODE_PRECISION.
 * fold-const.c (fold_single_bit_test_into_sign_test,
 fold_binary_loc):  Change GET_MODE_BITSIZE to
 GET_MODE_PRECISION.

Kenny


On 11/15/2013 08:32 AM, Kenneth Zadeck wrote:

On 11/15/2013 04:07 AM, Eric Botcazou wrote:

this code from fold-const.c starts on line 13811.

   else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
&& TREE_INT_CST_LOW (arg1) == signed_max_lo
&& TYPE_UNSIGNED (arg1_type)
/* We will flip the signedness of the comparison operator
   associated with the mode of arg1, so the sign bit is
   specified by this mode.  Check that arg1 is the signed
   max associated with this sign bit.  */
&& width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
/* signed_type does not work on pointer types. */
&& INTEGRAL_TYPE_P (arg1_type))

with width defined as:

 unsigned int width = TYPE_PRECISION (arg1_type);


it seems that the check on bitsize should really be a check on the
precision of the variable.   If this seems right, i will correct this on
the trunk and make the appropriate changes to the wide-int branch.

Do you mean

&& width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))

instead?  If so, that would probably make sense, but there are a few other
places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular the
very similar transformation done in fold_single_bit_test_into_sign_test.


yes.  I understand the need to do this check on the mode rather than the
precision of the type itself.
The point is that if the mode under this type happens to be a partial int
mode, then that sign bit may not even be where the bitsize points to.

However, having just done a few greps, it looks like this case was just the
one that i found while doing the wide-int work, there may be several more of
these cases.   Just in fold-const, there are a couple in fold_binary_loc.
The one in tree.c:int_fits_type_p looks particularly wrong.

I think that there are also several in tree-vect-patterns.c.

Kenny




Index: gcc/tree.c
===
--- gcc/tree.c	(revision 204986)
+++ gcc/tree.c	(working copy)
@@ -8629,7 +8629,7 @@ retry:
   /* Third, unsigned integers with top bit set never fit signed types.  */
   if (! TYPE_UNSIGNED (type) && unsc)
 {
-  int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (c))) - 1;
+  int prec = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (c))) - 1;
   if (prec < HOST_BITS_PER_WIDE_INT)
 	{
 	  if (((((unsigned HOST_WIDE_INT) 1) << prec) & dc.low) != 0)
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 204986)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2013-11-18 Kenneth Zadeck 
+
+	* tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
+	GET_MODE_PRECISION.
+	* fold-const.c (fold_single_bit_test_into_sign_test)
+	(fold_binary_loc):  Change GET_MODE_BITSIZE to
+	GET_MODE_PRECISION.
+
 2013-11-18  Teresa Johnson  
 
 	* gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 204986)
+++ gcc/fold-const.c	(working copy)
@@ -6593,7 +6593,7 @@ fold_single_bit_test_into_sign_test (loc
 	  /* This is only a win if casting to a signed type is cheap,
 	 i.e. when arg00's type is not a partial mode.  */
 	  && TYPE_PRECISION (TREE_TYPE (arg00))
-	 == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg00)))
+	 == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg00)))
 	{
 	  tree stype = signed_type_for (TREE_TYPE (arg00));
 	  return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
@@ -12049,7 +12049,7 @@ fold_binary_loc (location_t loc,
 	zerobits = ((((unsigned HOST_WIDE_INT) 1) << shiftc) - 1);
 	  else if (TREE_CODE (arg0) == RSHIFT_EXPR
 		   && TYPE_PRECISION (TREE_TYPE (arg0))
-		  == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg0))))
+		  == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg0))))
 	{
 	  prec = TYPE_PRECISION (TREE_TYPE (arg0));
 	  

pr52543

2012-03-19 Thread Kenneth Zadeck
I have figured out the root cause of pr52543, but i need some 
advice as to how to fix it.
The bug only happens if the source or destination of the move is a hard 
register.   lower-subreg never breaks up pseudo to pseudo moves that are 
larger than word mode.   According to richard sandiford, this bug also 
appears on the neon, but i do not know if there is a bugzilla for it.   
It also appears on my private port, which is why i am interested in it.


in the particular case of pr52543 and my port, this happens because the 
input arguments are hard regs.


The offending code is in can_decompose_p.   The problem is that if the 
reg is a hard reg, it completely blows off all of the information that 
it accumulated during the first pass and unconditionally splits the 
register (assuming it is legal to do so).


My question for the list: what predicate should replace the code that 
always decomposes hard regs (assuming it is legal)?   In the case of the 
neon and my port, decomposing costs 4x more than using a wide move.   I 
assume the avr is similar.


Kenny




Re: pr52543

2012-03-20 Thread Kenneth Zadeck

Ian is certainly correct.

I think that the question is really bigger than finding the correct line 
to fix.   The problem is, that this code assumes that machines do not 
have multiword moves or multiword shifts.   My machine has both, and i 
assume that the avr and the neon have at least multiword moves (but i do 
not know about the shifts).   And as life moves forward, more machines 
will have these.


It seems like the right way to fix this is to somehow enhance the code 
at the beginning of decompose_multiword_subregs to ask which modes are 
not cheap to move or shift and then modify the second loop to never 
lower for those operations for those modes.


The question is do i add 2 more target hooks (one for shifting and one 
for moves) or do i use the rtx_cost mechanism and split for anything 
over COSTS_N_INSNS (1) or some such?
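
For illustration, a sketch of the rtx_cost flavour (the helper name and
the one-insn cutoff are my assumptions, not a worked-out design):

/* Return true if the target can shift a whole MODE value cheaply, in
   which case lower-subreg should leave shifts in MODE alone.  Builds
   a representative shift and asks the generic cost machinery.  */
static bool
cheap_multiword_shift_p (enum machine_mode mode, bool speed_p)
{
  rtx reg = gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
  rtx shift = gen_rtx_ASHIFT (mode, reg, GEN_INT (BITS_PER_WORD));
  return set_src_cost (shift, speed_p) <= COSTS_N_INSNS (1);
}

The same trick with a (set (reg) (reg)) would cover the move case
without adding a target hook.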


Kenny

On 03/20/2012 01:13 AM, Ian Lance Taylor wrote:

Kenneth Zadeck  writes:


I have figured out what the root cause of pr52543, but i need some
advise as to how to fix it.
The bug only happens if the source or destination of the move is a
hard register.   lower-subreg never breaks up pseudo to pseudo moves
that are larger than word mode.   According to richard sandiford, this
bug also appears on the neon, but i do not know if there is a bugzilla
for it.   It also appears on my private port, which is why i am
interested in it.

in the particular case of pr52543 and my port, this happens because
the input arguments are hard regs.

The offending code is in can_decompose_p.   The problem is that if the
reg is a hard reg, it completely blows off all of the information that
it accumulated during the first pass and unconditionally splits the
register (assuming it is legal to do so).

My question for the list, what is the predicate that we want to
replace the code that always  decomposes hardregs (assuming it is
legal).In the case of the neon and my port, decomposing costs 4x
more than using a wide move.   I assume the avr is similar.

I don't think can_decompose_p would be the right thing to change.  If
that function returns false, resolve_simple_move is still going to split
up the move.  You need to change resolve_simple_move.  In fact, looking
at resolve_simple_move, I don't think it will break up the register
unless it has already decided to do so:

   /* If we didn't have any big SUBREGS of decomposed registers, and
  neither side of the move is a register we are decomposing, then
  we don't have to do anything here.  */

  if (src == SET_SRC (set)
      && dest == SET_DEST (set)
      && !resolve_reg_p (src)
      && !resolve_subreg_p (src)
      && !resolve_reg_p (dest)
      && !resolve_subreg_p (dest))
    {
      end_sequence ();
      return insn;
    }

So I think you need to analyze this a bit more.  I don't think that is
the offending code.

Ian


Re: pr52543

2012-03-20 Thread Kenneth Zadeck
i actually care about all registers, not just the hard ones.   As it 
turns out i had been wrong: lower-subreg splits pseudo to pseudo 
moves, and hard reg to and from pseudo moves.


register_move_cost requires the regclasses.

anyway that is not the right thing to do for the shifts.

kenny

On 03/20/2012 09:40 AM, Ian Lance Taylor wrote:

Kenneth Zadeck  writes:


I think that the question is really bigger than finding the correct
line to fix.   The problem is, that this code assumes that machines do
not have multiword moves or multiword shifts.   My machine has both,
and i assume that the avr and the neon have at least multiword moves
(but i do not know about the shifts).   And as life moves forward,
more machines will have these.

It seems like the right way to fix this is to somehow enhance the code
at the beginning of decompose_multiword_subregs to ask which modes are
not cheap to move or shift and then modify the second loop to never
lower for those operations for those modes.

The question is do i add 2 more target hooks (one for shifting and one
for moves) or do i use the rtx_cost mechanism and split for anything
over COSTS_N_INSNS (1) or some such?

Why not use REGISTER_MOVE_COST?  You only care about hard registers, and
all that matters are moves between hard registers and pseudo-regs.  So
if you find a hard register REG, and if

   register_move_cost (GET_MODE (reg), REGNO_REG_CLASS (REGNO (REG)),
   REGNO_REG_CLASS (REGNO (REG)))
 == 2

then put the pseudo reg into non_decomposable_context.  This would be in
find_decomposable_subregs.

Ian


confusion about fma description in section 16.9 of gccint doc.

2012-05-14 Thread Kenneth Zadeck
Section 16.9 of the current gcc doc is as follows.   It implies that the 
fma pattern should/could be used on a machine that double rounds the 
multiply add.


`fmam4'
   Multiply operand 2 and operand 1, then add operand 3, storing the
   result in operand 0. All operands must have mode m. This pattern is
   used to implement the fma, fmaf, and fmal builtin functions from the
   ISO C99 standard. The fma operation may produce different results
   than doing the multiply followed by the add if the machine does not
   perform a rounding step between the operations.

However, this gccint doc says that this pattern is to be used to 
implement the fma* builtin functions of ISO C99.


The iso c99 standard states:

7.12.13 Floating multiply-add
7.12.13.1 The fma functions
Synopsis
1   #include <math.h>
    double fma(double x, double y, double z);
    float fmaf(float x, float y, float z);
    long double fmal(long double x, long double y, long double z);
Description
2   The fma functions compute (x × y) + z, rounded as one ternary 
operation: they compute the value (as if) to infinite precision and round once 
to the result format, according to the rounding mode characterized by the value 
of FLT_ROUNDS.
Returns
3   The fma functions return (x × y) + z, rounded as one ternary operation.


Point 2 of the standard above states that those functions only do single 
rounding.  My guess is that the gccint doc is wrong.   (I should point 
out that the gccint doc for the new rtl operator in section 10.9 does 
require only single rounding.   But that is not the question here.)
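
A small illustration of the difference (my example, not from the
thread; compile with -ffp-contract=off so the compiler does not itself
fuse the expression):

#include <math.h>
#include <stdio.h>

int
main (void)
{
  double x = 1.0 + 0x1p-52;   /* smallest double above 1.0 */
  double y = 1.0 - 0x1p-53;   /* largest double below 1.0 */
  double z = -1.0;
  /* x*y is exactly 1 + 2^-53 - 2^-105, which rounds to 1.0 in IEEE
     double, so the separate add yields 0.  */
  printf ("%a\n", x * y + z);      /* 0x0p+0 */
  /* fma rounds once, per C99 7.12.13, keeping the residual.  */
  printf ("%a\n", fma (x, y, z));  /* 0x1.fffffffffffffp-54 */
  return 0;
}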


Should i change the section 16.9 doc to indicate that this pattern is 
only to be used if the machine can do this with a single rounding?


kenny


Re: confusion about fma description in section 16.9 of gccint doc.

2012-05-14 Thread Kenneth Zadeck

committed in revision 187494.

thanks.

On 05/14/2012 08:05 PM, Ian Lance Taylor wrote:

Kenneth Zadeck  writes:


Should i change the section 16.9 doc to indicate that this pattern is
only to be used if the machine can do this with a single rounding?

Sure.

Ian


Re: ARM/getting rid of superfluous zero extension

2012-10-04 Thread Kenneth Zadeck

David and i have been talking about this for some time.

what is needed is a real global optimization algorithm.  my leaning is 
to do it at the rtl level because that is where everything has been 
exposed.   but it would be a lot easier in ssa form.


The first step in my opinion is to ask the backend what kinds of 
operations are expensive and what kinds are cheap.  for instance, some 
risc machines can do comparisons in any mode and some only do them at 
the width of a register.  also, some machines have different costs 
associated with sign or zero extension.  I really think that you not 
only need to connect up the sources and sinks but you have to 
understand the space of transformations that are going to be cheapest.
People do not seem to ask this kind of question at the tree level, which 
is one of the reasons that i lean to doing it at the rtl level.


but the chain that goes from a load or some other operation that has a 
specific width output to the sink that has a specific width input can 
be a long one, and only something equivalent to ssa or dataflow is 
going to let you connect the true sources with the true sinks.


this is the problem with tom's patch.   One of the examples that i have 
seen is a loop that has a di mode add of an index variable and (had) an 
si mode comparison to exit the loop.   However, later passes removed the 
si mode comparison because on the ppc it used the loop count 
instruction, but the add of the index variable with sign extension 
remained because it was an index variable and so the value fed back into 
itself.
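
A sketch of that shape of loop in C (an assumed example, not taken from
the PR):

/* On a 64-bit target, i lives in si mode but the address arithmetic
   for a[i] needs di mode, so each iteration sign-extends i.  Once the
   doloop pass replaces the si mode exit comparison with the count
   register, the extension is the only si mode use left in the web,
   but it survives because the incremented value feeds back into
   itself.  */
long
sum (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}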


you are never going to clean this up unless you do something global so 
that you can break the recurrence and see that there is no sink in that 
web that really needs the value to be si mode.


kenny

On 10/04/2012 06:16 PM, David Edelsohn wrote:

On Thu, Oct 4, 2012 at 2:18 PM, Eric Botcazou  wrote:

Any suggestion about how I could avoid generating this zero_extension?

Redundant extensions have been a hot topic for some time.  The combiner should
catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom
recently posted:
   http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html
which should catch more complex cases.

I guess someone should gather the various missed optimization cases, draw a
global picture of the situation and see how the various approaches could be
fitted together.

Kenny already did that as part of the thread with Tom.  Tom's patch
also is incomplete.

- David




possible typo in gcc/java/expr.c at line 1053

2012-10-11 Thread Kenneth Zadeck

this code looks bogus, i think that the "== INTEGER_CST" needs to disappear.

kenny

tree
build_newarray (int atype_value, tree length)
{
  tree type_arg;

  tree prim_type = decode_newarray_type (atype_value);
  tree type
    = build_java_array_type (prim_type,
			     host_integerp (length, 0) == INTEGER_CST
			     ? tree_low_cst (length, 0) : -1);
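
For reference, host_integerp returns 1 or 0 rather than a tree code, so
the presumably intended code (a sketch, not a committed patch) is just:

  tree type
    = build_java_array_type (prim_type,
			     host_integerp (length, 0)
			     ? tree_low_cst (length, 0) : -1);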



Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Kenneth Zadeck

jakub,

i am hoping to get the rest of my wide integer conversion posted by nov 
5.   I am under some adverse conditions here: hurricane sandy hit here 
pretty badly.  my house is hooked up to a small generator, and no one 
has any power for miles around.


So far richi has promised to review them.   he has sent some comments, 
but so far no reviews.   Some time after i get the first round of them 
posted, i will do a second round that incorporates everyone's comments.


But i would like a little slack here if possible.While this work is 
a show stopper for my private port, the patches address serious problems 
for many of the public ports, especially ones that have very flexible 
vector units.I believe that there are significant set of latent 
problems currently with the existing ports that use ti mode that these 
patches will fix.


However, i will do everything in my power to get the first round of the 
patches posted by nov 5 deadline.


kenny

On 10/29/2012 01:56 PM, Jakub Jelinek wrote:

Status
==

I'd like to close the stage 1 phase of GCC 4.8 development
on Monday, November 5th.  If you have still patches for new features you'd
like to see in GCC 4.8, please post them for review soon.  Patches
posted before the freeze, but reviewed shortly after the freeze, may
still go in, further changes should be just bugfixes and documentation
fixes.


Quality Data


Priority  #   Change from Last Report
---   ---
P1   23   + 23
P2   77   +  8
P3   85   + 84
---   ---
Total   185   +115


Previous Report
===

http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

The next report will be sent by me again, announcing end of stage 1.




Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck

Richi,

Let me explain to you what a broken api is.   I have spent the last week 
screwing around with tree-vrp and as of last night i finally got it to 
work.   In tree-vrp, it is clear that double-int is the precise 
definition of a broken api.


The tree-vrp uses an infinite-precision view of arithmetic.  However, 
that infinite precision is implemented on top of a finite, CARVED IN 
STONE, base that is, and without a patch like this always will be, 128 
bits on an x86-64.   However, as was pointed out earlier, tree-vrp 
needs 2 * the size of a type + 1 bit to work correctly (for __int128 
that is 257 bits, more than the two HWIs double-int provides).   Until 
yesterday i did not fully understand the significance of that 1 bit.  
what this means is that tree-vrp does not work on an x86-64 with __int128 
variables.


There are no checks in tree-vrp to back off when it sees something too 
large, tree-vrp simply gets the wrong answer.   To me, this is a broken 
api and is GCC at its very worst.   The patches that required this 
SHOULD HAVE NEVER GONE INTO GCC.   What you have with my patches is 
someone who is willing to fix a large and complex problem that should 
have been fixed years ago.


I understand that you do not like several aspects of the wide-int api 
and i am willing to make some of those improvements.   However, what i 
am worried about is that you are in some ways really attached to the 
style of programming where everything is dependent on the size of a 
HWI.   I will continue to push back on those comments but have been 
working the rest in as i have been going along.


To answer your other question, it will be a significant problem if i 
cannot get these patches in.   They are very prone to patch rot and my 
customer wants a product without many patches to the base code.
Also, i fear that the real reason you want to wait is that you really 
do not like the fact that these patches get rid of double-int and that 
style of programming, and putting off that day serves no one well.


kenny









On 10/31/2012 05:59 AM, Richard Sandiford wrote:

Richard Biener  writes:

On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
  wrote:

jakub,

i am hoping to get the rest of my wide integer conversion posted by nov 5.
I am under some adverse conditions here: hurricane sandy hit here pretty
badly.  my house is hooked up to a small generator, and no one has any power
for miles around.

So far richi has promised to review them.   he has sent some comments, but
so far no reviews.   Some time after i get the first round of them posted,
i will do a second round that incorporates everyones comments.

But i would like a little slack here if possible.While this work is a
show stopper for my private port, the patches address serious problems for
many of the public ports, especially ones that have very flexible vector
units.I believe that there are significant set of latent problems
currently with the existing ports that use ti mode that these patches will
fix.

However, i will do everything in my power to get the first round of the
patches posted by nov 5 deadline.

I suppose you are not going to merge your private port for 4.8 and thus
the wide-int changes are not a show-stopper for you.

That said, I considered the main conversion to be appropriate to be
deferred for the next stage1.  There is no advantage in disrupting the
tree more at this stage.

I would like the wide_int class and rtl stuff to go in 4.8 though.
IMO it's a significant improvement in its own right, and Kenny
submitted it well before the deadline.

Richard





Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck

jakub

my port has 256 bit integers.   They are done by strapping together all 
of the elements of a vector unit.
if one looks at where intel is going, they are doing exactly the same 
thing.The difference is that they like to add the operations one at 
a time rather than just do a clean implementation like we did.   Soon 
they will get there, it is just a matter of time.


i understand the tree-vrp code well enough to say that this operation 
does not work if you have timode, but i do not know how to translate 
that back into c to generate a test case.My patch to tree-vrp is 
adaptable in that it looks at the types in the program and adjusts its 
definition of infinite precision based on the code that it sees.  I can 
point people to that code in tree vrp and am happy to do that, but that 
is not my priority now.


also, richi pointed out that there are places in the tree level constant 
propagators that require infinite precision so he is really the person 
who both should know about this and generate proper tests.


kenny

On 10/31/2012 09:55 AM, Jakub Jelinek wrote:

On Wed, Oct 31, 2012 at 09:44:50AM -0400, Kenneth Zadeck wrote:

The tree-vrp uses an infinite-precision view of arithmetic. However,
that infinite precision is implemented on top of a finite, CARVED IN
STONE, base that is and will always be without a patch like this,
128 bits on an x86-64.However, as was pointed out by earlier,
tree-vrp needs 2 * the size of a type + 1 bit to work correctly.
Until yesterday i did not fully understand the significance of that
1 bit.  what this means is that tree-vrp does not work on an x86-64
with __int128 variables.

If you see a VRP bug, please file a PR with a testcase, or point to existing
PR.  I agree with richi that it would be better to add a clean wide_int
implementation for 4.9, rather than rushing something in, introducing
lots of bugs, just for a port that hasn't been submitted, nor do I understand
why > int128_t integer types are so crucial to your port, the vector
support doesn't generally need very large integers, even if your
vector unit is 256-bit, 512-bit or larger.

Jakub




Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck
I was not planning to do that mangling for 4.8.   My primary 
justification for getting it in publicly now is that there are a large 
number of places where the current compiler (both at the tree and rtl 
levels) does not do the optimization if the value is larger than a single 
hwi.   My code generalizes all of these places so that they do the 
transformations independent of the size of the hwi.   (in some cases at 
the rtl level, the transformations were only done on 32 bit or smaller 
types, but i have seen nothing like that at the tree level.)   This 
provides benefits for cross compilers and for ports that support timode 
now.


The fact that i have chosen to do it in such a way that we will never 
have this problem again is the part of the patch that richi seems to 
object to.


We have patches that do the mangling for 256 bits for the front ends, but 
we figured that we would post those for comments.   These are likely to 
be controversial because they require extensions to the syntax to accept 
large constants.


But there is no reason why the patches that fix the existing problems in 
a general way should not be considered for this release.


Kenny

On 10/31/2012 10:27 AM, Jakub Jelinek wrote:

On Wed, Oct 31, 2012 at 10:04:58AM -0400, Kenneth Zadeck wrote:

if one looks at where intel is going, they are doing exactly the
same thing.The difference is that they like to add the
operations one at a time rather than just do a clean implementation
like we did.   Soon they will get there, it is just a matter of
time.

All I see on Intel is whole vector register shifts (and like on many other
ports and/or/xor/andn could be considered whole register too).
And, even if your port has 256-bit integer arithmetics, there is no mangling
for __int256_t or similar, so I don't see how you can offer such data type
as supported in the 4.8 timeframe.

Jakub




Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck

Jakub,

it is hard from all of the threads to actually distill what the real 
issues are here.  So let me start from a clean slate and state them simply.


Richi has three primary objections:

1) that we can do all of this with a templated version of double-int.
2) that we should not be passing in a precision and bitsize into the 
interface.

3) that the interface is too large.

I have attached a fragment of my patch #5 to illustrate the main thrust 
of my patches and to illustrate the usefulness to gcc right now.


In the current trunk, we have code that does simplification when the 
mode fits in an HWI and we have code that does the simplification if the 
mode fits in two HWIs.   If the mode does not fit in two HWIs, the code 
does not do the simplification.


Thus here and in a large number of other places we have two copies of 
the code.   Richi wants there to be multiple template instantiations of 
double-int.   This means that we are now going to have to have 3 copies 
of this code to support oi mode on a 64 bit host and 4 copies on a 32 
bit host.


Further note that there are not as many cases for the 2*hwi in the code 
as there are for the hwi case, and in general this is true throughout 
the compiler.  (CLRSB is missing from the 2hwi case in the patch.)  We 
really did not write twice the code when we started supporting 2 hwi, we 
added about 1.5 times the code (simplify-rtx is better than most of the 
rest of the compiler).  I am using the rtl level as an example here 
because i have posted all of those patches, but the tree level is no 
better.


I do not want to write this code a third time and certainly not a fourth 
time.   Just fixing all of this is quite useful now: it fills in a lot 
of gaps in our transformations and it removes many edge case crashes 
because ti mode really is lightly tested.  However, this patch becomes 
crucial as the world gets larger.


Richi's second point is that we should be doing everything at "infinite 
precision" and not passing in an explicit bitsize and precision.   That 
works ok (sans the issues i raised with it in tree-vrp earlier) when the 
largest precision on the machine fits in a couple of hwis.However, 
for targets that have large integers or cross compilers, this becomes 
expensive.The idea behind my set of patches is that for the 
transformations that can work this way, we do the math in the precision 
of the type or mode.   In general this means that almost all of the math 
will be done quickly, even on targets that support really big 
integers.   For passes like tree-vrp, the math will be done at some 
multiple of the largest type seen in the actual program.The amount 
of the multiple is a function of the optimization, not the target or the 
host. Currently (on my home computer) the wide-int interface allows the 
optimization to go 4x the largest mode on the target.


I can get rid of this bound at the expense of doing an alloca rather 
than stack allocating a fixed sized structure.   However, given the 
extremely heavy use of this interface, that does not seem like the best 
of tradeoffs.
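
A sketch of the representation this implies (member names are
assumptions; the 4x factor is the bound just mentioned):

/* Enough HWIs to hold 4x the largest integer mode on the target,
   rounded up; the size is fixed at build time, so values live on
   the stack with no alloca and no heap allocation.  */
#define WIDE_INT_MAX_ELTS \
  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
   / HOST_BITS_PER_WIDE_INT)

class wide_int
{
  unsigned int len;         /* number of significant blocks in val */
  unsigned int precision;   /* precision of the value, in bits */
  HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
};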


The truth is that the vast majority of the compiler actually wants to 
see the math done the way that it is going to be done on the machine.  
Tree-vrp and the gimple constant prop do not.  But i have made 
accommodations to handle both needs.   I believe that the reason that 
double-int was never used at the rtl level is that it does not actually 
do the math in a way that is useful to the target.


Richi's third objection is that the interface is too large.   I 
disagree.   It was designed based on the actual usage of the 
interface.   When i found places where i was writing the same code over 
and over again, i put it in a function as part of the interface.   I 
later went back and optimized many of these because this is a very 
heavily used interface.  Richi has many other objections, but i have 
agreed to fix almost all of them, so i am not going to address them here.


It really will be a huge burden to have to carry these patches until the 
next revision.  We are currently in stage 1 and i believe that the minor 
issues that richi raises can be easily addressed.


kenny
@@ -1373,302 +1411,87 @@ simplify_const_unary_operation (enum rtx_code code, enum machine_mode mode,
   return CONST_DOUBLE_FROM_REAL_VALUE (d, mode);
 }
 
-  if (CONST_INT_P (op)
-  && width <= HOST_BITS_PER_WIDE_INT && width > 0)
+  if (CONST_SCALAR_INT_P (op) && width > 0)
 {
-  HOST_WIDE_INT arg0 = INTVAL (op);
-  HOST_WIDE_INT val;
+  wide_int result;
+  enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode;
+  wide_int op0 = wide_int::from_rtx (op, imode);
+
+#if TARGET_SUPPORTS_WIDE_INT == 0
+  /* This assert keeps the simplification from producing a result
+	 that cannot be represented in a CONST_DOUBLE but a lot of
+	 upstream callers expect that this function never fails to
+	 simplify something and so you if you a
