Re: IVs optimization issue

2012-03-01 Thread Richard Guenther
On Wed, Feb 29, 2012 at 6:02 PM, Aurelien Buhrig
 wrote:
> Le 29/02/2012 17:08, Richard Guenther a écrit :
>> On Wed, Feb 29, 2012 at 4:41 PM, Aurelien Buhrig
>>  wrote:
>>> Le 29/02/2012 16:15, Richard Guenther a écrit :
 On Wed, Feb 29, 2012 at 4:08 PM, Aurelien Buhrig
  wrote:
>
>> The issue is most probably that on GIMPLE we only deal with ptr_mode,
>> not Pmode, and IVOPTs thinks that pointer induction variables will
>> have ptr_mode.  To fix this the cost computation would need to take
>> into account ptr_mode to Pmode conversions _and_ would need to
>> consider Pmode IVs in the first place (I'm not sure that will be easy).
>
>
> Thank you, Richard, for your reply.
>
> I guess such an issue is not among the top-priority tasks of the main
> developers. So I think I'll have to look at it myself, if I feel
> confident enough to carry out such a job (I've never worked at the
> tree level).
>
> My main question is about Pmode IVs: since the GIMPLE representation only
> deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?

 Its TREE_TYPE.  In your case you'd have a POINTER_TYPE with
 PSImode for Pmode and a POINTER_TYPE with SImode for ptr_mode
 pointers.  They will differ in TYPE_MODE and TYPE_PRECISION.
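
A minimal GCC-internal C sketch of such a check (the helper itself is
hypothetical; POINTER_TYPE_P, TYPE_MODE, Pmode and ptr_mode are the usual
GCC macros/globals):

/* Hypothetical helper: true if TYPE is a pointer type carried in the
   address mode (Pmode) rather than in the C pointer mode (ptr_mode).  */
static bool
pmode_pointer_type_p (tree type)
{
  return POINTER_TYPE_P (type)
         && TYPE_MODE (type) == Pmode
         && TYPE_MODE (type) != ptr_mode;
}
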
>>>
>>> Thanks, I will look at it.
>>>
> BTW, this question is not limited to IVs. What controls the choice of
> Pmode vs ptr_mode when mapping to RTL?

 ptr_mode is the C language specified mode for all pointers.  Pmode is
 the mode used for pointers in address operands of CPU instructions.
 Usually they are the same.  When mapping to RTL all ptr_mode uses
 for memory accesses are translated to Pmode while operations on
 the value of ptr_mode quantities are done on ptr_mode (IIRC).
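
For concreteness, a hypothetical target fragment where the two differ
(ptr_mode is derived from POINTER_SIZE, while Pmode is defined directly in
the backend headers):

/* Hypothetical target .h fragment: C pointers are 32 bits, so ptr_mode
   becomes SImode, while addresses in instructions use PSImode.  */
#define POINTER_SIZE 32
#define Pmode PSImode
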
>>>
>>> Another point that is not optimal for my backend is computing the
>>> address of an array element (M[i]). Currently, both the address of M and i are
>>> extended to ptr_mode and the sum is truncated to Pmode; whereas it would
>>> be much better to extend i to Pmode and then perform the add in Pmode.
>>> So if I understand correctly, the latter option cannot be generated. Right?
>>
>> Not by IVOPTs at least.  There is also the long-standing issue that
>> POINTER_PLUS_EXPR only accepts sizetype offsets - that may cause
>> issues if your target does not define sizetype having the same mode as
>> ptr_mode.  (And of course complicates using Pmode on the gimple level)
>
> Sorry, it wasn't related to ivopts, but to the use of Pmode from GIMPLE,
> and especially when computing an M[i] address. (My ptr_mode and SIZE_TYPE
> mode are the same.) Can you confirm that it's not possible to compute
> the address of M[i] in Pmode without truncating from ptr_mode? Because
> mapping POINTER_PLUS_EXPR directly to Pmode would also be (along with ivopts
> PSI support) a great improvement for Pmode=PSImode targets.
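
To make the contrast concrete, a hand-written RTL sketch (not compiler
output), assuming Pmode == PSImode, ptr_mode == SImode and an index i held
in a HImode register:

;; what is generated today: extend and add in ptr_mode, then truncate
(set (reg:SI t1) (zero_extend:SI (reg:HI i)))
(set (reg:SI t2) (plus:SI (reg:SI m) (reg:SI t1)))
(set (reg:PSI a) (truncate:PSI (reg:SI t2)))

;; what is being asked for: extend the index and do the add in Pmode
(set (reg:PSI t3) (zero_extend:PSI (reg:HI i)))
(set (reg:PSI a)  (plus:PSI (reg:PSI m) (reg:PSI t3)))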

Not sure what you mean by "not possible"; it's not done.

Richard.

> Thanks for your help,
> Aurélien
>
>> Richard.
>>
 Richard.

> Thanks,
> Aurélien
>
>>>
>


register renaming issue

2012-03-01 Thread Konstantin Vladimirov
Hi,

I am supporting a custom 32-bit backend for gcc 4.6.2, and I want to
implement DI to DF conversion. Earlier it was done through libgcc. A
peculiarity of the backend is that a DF value may be represented as one
register, but the source DI must be split into two SIs to be used.

So the overall insn pattern (the actual splitting is performed inside the
expand pattern) is:

;; split 64-bit integer to double float
(define_insn "floatdidf2_32_internal"
  [(parallel
     [(set (subreg:SI (match_operand:DF 0 "register_operand" "=r") 4)
           (subreg:SI (float:DF (match_operand:SI 1 "register_operand" "r")) 4))
      (set (subreg:SI (match_dup 0) 0)
           (subreg:SI (float:DF (match_operand:SI 2 "register_operand" "r")) 0))])]
  "TARGET_32BIT"
  {
return output_pseudo_didf(operands);
  }
  [(set_attr "predicable" "no")]
)

output_pseudo_didf is a function that outputs a non-trivial instruction
sequence using operands[0], operands[1], and operands[2].
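
For illustration only, a purely hypothetical sketch of the shape of such a
routine (the mnemonics are made up; only the output_asm_insn calls reflect
the real GCC API):

/* Hypothetical sketch: emit the two halves of the conversion.  */
const char *
output_pseudo_didf (rtx *operands)
{
  output_asm_insn ("cvtdf.lo\t%0, %2", operands);  /* low SI half  */
  output_asm_insn ("cvtdf.hi\t%0, %1", operands);  /* high SI half */
  return "";
}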

Everything works just fine at -O2, but with -O2 -frename-registers it
behaves strangely:

Before renaming:

(insn 20 19 8 3 (set:DF (reg:DF 3 %r10 [74])
(const_double:DF 0.0 [0x0.0p+0])) test.c:4 11 {movdf_internal_vec}
 (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
(nil)))

(insn 8 20 13 3 (parallel [
(set (subreg:SI (reg:DF 3 %r10 [74]) 4)
(subreg:SI (float:DF (reg:SI 7 %r2 [ s ])) 4))
(set (subreg:SI (reg:DF 3 %r10 [74]) 0)
(subreg:SI (float:DF (reg:SI 8 %r3 [ s+4 ])) 0))
]) test.c:4 160 {floatdidf2_32_internal}
 (expr_list:REG_DEAD (reg:SI 8 %r3 [ s+4 ])
(expr_list:REG_DEAD (reg:SI 7 %r2 [ s ])
(nil))))

(insn 13 8 16 3 (set:DF (reg/i:DF 7 %r2)
(reg:DF 3 %r10 [74])) test.c:5 11 {movdf_internal_vec}
 (expr_list:REG_DEAD (reg:DF 3 %r10 [74])
(nil)))

After renaming:

(insn 20 19 8 3 (set:DF (reg:DF 3 %r10 [74])
(const_double:DF 0.0 [0x0.0p+0])) test.c:4 11 {movdf_internal_vec}
 (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
(nil)))

(insn 8 20 13 3 (parallel [
(set (subreg:SI (reg:DF 3 %r10 [74]) 4)
(subreg:SI (float:DF (reg:SI 7 %r2 [ s ])) 4))
(set (subreg:SI (reg:DF 4 %r11 [74]) 0)
<-- here %r10 was renamed to %r11, which is senseless
(subreg:SI (float:DF (reg:SI 8 %r3 [ s+4 ])) 0))
]) test.c:4 160 {floatdidf2_32_internal}
 (expr_list:REG_DEAD (reg:SI 8 %r3 [ s+4 ])
(expr_list:REG_DEAD (reg:SI 7 %r2 [ s ])
(nil))))

(insn 13 8 16 3 (set:DF (reg/i:DF 7 %r2)
(reg:DF 4 %r11 [74])) test.c:5 11 {movdf_internal_vec}
 (expr_list:REG_DEAD (reg:DF 4 %r11 [74])
(nil)))

It seems that register renaming in gcc 4.6.2 doesn't know how to
preserve match_dup, and breaks it. How can I rewrite my pattern to
express exactly what I mean: "synchronously update the higher and
lower parts of the same register, no renaming please"? Or maybe I can
somehow patch the renaming pass itself (I don't want to do that and would
prefer to change only my backend code, but as a last resort it is an
option).
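
One possible direction (an untested sketch; UNSPEC_FLOATDIDF is a
hypothetical constant the backend would have to define) would be to hide
the two halves behind an unspec, so that operand 0 is written exactly once
and no subreg of it is exposed to the renaming pass:

(define_insn "floatdidf2_32_internal"
  [(set (match_operand:DF 0 "register_operand" "=r")
        (unspec:DF [(match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "register_operand" "r")]
                   UNSPEC_FLOATDIDF))]
  "TARGET_32BIT"
  {
    return output_pseudo_didf(operands);
  }
  [(set_attr "predicable" "no")]
)

Would something along these lines be the right approach?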

---
With best regards, Konstantin


Re: IVs optimization issue

2012-03-01 Thread Aurelien Buhrig
Le 01/03/2012 11:09, Richard Guenther a écrit :
> On Wed, Feb 29, 2012 at 6:02 PM, Aurelien Buhrig
>  wrote:
>> Le 29/02/2012 17:08, Richard Guenther a écrit :
>>> On Wed, Feb 29, 2012 at 4:41 PM, Aurelien Buhrig
>>>  wrote:
 Le 29/02/2012 16:15, Richard Guenther a écrit :
> On Wed, Feb 29, 2012 at 4:08 PM, Aurelien Buhrig
>  wrote:
>>
>>> The issue is most probably that on GIMPLE we only deal with ptr_mode,
>>> not Pmode, and IVOPTs thinks that pointer induction variables will
>>> have ptr_mode.  To fix this the cost computation would need to take
>>> into account ptr_mode to Pmode conversions _and_ would need to
>>> consider Pmode IVs in the first place (I'm not sure that will be easy).
>>
>>
>> Thank you, Richard, for your reply.
>>
>> I guess such an issue is not among the top-priority tasks of the main
>> developers. So I think I'll have to look at it myself, if I feel
>> confident enough to carry out such a job (I've never worked at the
>> tree level).
>>
>> My main question is about Pmode IVs: since the GIMPLE representation only
>> deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?
>
> Its TREE_TYPE.  In your case you'd have a POINTER_TYPE with
> PSImode for Pmode and a POINTER_TYPE with SImode for ptr_mode
> pointers.  They will differ in TYPE_MODE and TYPE_PRECISION.

 Thanks, I will look at it.

>> BTW, this question is not limited to IVs. What controls the choice of
>> Pmode vs ptr_mode when mapping to RTL?
>
> ptr_mode is the C language specified mode for all pointers.  Pmode is
> the mode used for pointers in address operands of CPU instructions.
> Usually they are the same.  When mapping to RTL all ptr_mode uses
> for memory accesses are translated to Pmode while operations on
> the value of ptr_mode quantities are done on ptr_mode (IIRC).

 Another point that is not optimal for my backend is computing the
 address of an array element (M[i]). Currently, both the address of M and i are
 extended to ptr_mode and the sum is truncated to Pmode; whereas it would
 be much better to extend i to Pmode and then perform the add in Pmode.
 So if I understand correctly, the latter option cannot be generated. Right?
>>>
>>> Not by IVOPTs at least.  There is also the long-standing issue that
>>> POINTER_PLUS_EXPR only accepts sizetype offsets - that may cause
>>> issues if your target does not define sizetype having the same mode as
>>> ptr_mode.  (And of course complicates using Pmode on the gimple level)
>>
>> Sorry, it wasn't related to ivopts, but to the use of Pmode from GIMPLE,
>> and especially when computing an M[i] address. (My ptr_mode and SIZE_TYPE
>> mode are the same.) Can you confirm that it's not possible to compute
>> the address of M[i] in Pmode without truncating from ptr_mode? Because
>> mapping POINTER_PLUS_EXPR directly to Pmode would also be (along with ivopts
>> PSI support) a great improvement for Pmode=PSImode targets.
> 
> Not sure what you mean by "not possible"; it's not done.

It's what I meant. Thank you for your reply,
Aurélien

> Richard.
> 
>> Thanks for your help,
>> Aurélien
>>
>>> Richard.
>>>
> Richard.
>
>> Thanks,
>> Aurélien
>>

>>



Re: Graphite news

2012-03-01 Thread Richard Guenther
On Thu, Feb 9, 2012 at 1:42 PM, Tobias Grosser  wrote:
> Hi,
>
> it has been quiet around Graphite for a while and I think it is high
> time to give an update on Graphite.
>
> == The Status of Graphite ==
>
> Graphite has been around for a while in GCC. During this time a lot of
> people tested Graphite and Sebastian fixed many bugs. As of today the
> Graphite infrastructure is pretty stable and already hosts specific
> optimizations such as loop-interchange, blocking and loop-flattening.
>
> However, during the development of Graphite we also found areas where
> we are still far behind what is possible.
> First of all we realized that the use of a rational polyhedral library, even
> though it provides some functionality for integer polyhedra, is blocking us.
> Rational polyhedra worked OK for some time, but we have now come to
> a point where the absence of real integer polyhedra is causing problems. We
> have bugs that cannot be solved, just because rational polyhedra do not
> correctly represent the set of integer points in the loop iterations.
> Another deficit in Graphite is the absence of a generic optimizer. Even
> though classical loop transformations work well for certain problems, one of
> the major selling points of polyhedral techniques is the possibility to go
> beyond classical loop transformations and to forget about the corresponding
> pass ordering issues. Instead it is possible to define a generic cost
> function for which to optimize. We currently do not take advantage of this
> possibility and therefore miss possible performance gains.
> And as a last point, Graphite still does not apply to as much code as it
> could. We cannot transform a lot of code, not only because of the missing
> support for casts (for which we need integer polyhedra), but also because of
> an ad hoc SCoP detection and because some passes in the
> GCC pass order complicate Graphite's job. Moving these roadblocks out of
> the way should significantly increase the amount of code we can optimize.
>
> == The pipeline of upcoming graphite changes ==
>
> As just pointed out there is still a lot of work to be done. We have been
> aware of this and we actually have several ongoing projects to get this work
> done.
>
> 0. Moving to a recent version of CLooG.
>
> Graphite was relying for a long time on CLooG-PPL, a CLooG version Sebastian
> forked and ported to PPL, because of copyright issues at that time. The fork
> was never officially maintained by cloog.org, but always by Sebastian
> himself. This was a significant maintenance burden and meant that we were
> cut off from improvements in the official CLooG library. With Andreas
> Simbuerger we had a 2011 Summer of Code student who added support for using
> the official cloog.org library. The cloog.org version
> proved to be very stable, but we could not yet switch entirely over,
> as this version uses isl as its polyhedral library, which would introduce
> another library dependence into GCC (ppl, CLooG and now isl). One solution to
> get this patch in without increasing the number of library dependences is
> to follow CLooG and replace PPL with isl. As this was
> desirable for several other reasons, Sebastian went ahead:
>
> 1. The integer set library
>
> Back in September Sebastian started the work to move Graphite to an actual
> integer set library. We have chosen isl [1], which is nowadays probably the
> most advanced open source integer set library*. The patch set as posted in
> September was incomplete and in parts incorrect. I finished the patch set.
> With the new patch set the core graphite transformations work entirely with
> isl. The only exceptions are the interchange cost function, the openscop
> export/import and the loop-flattening pass. Due to the native support for
> integer maps and especially due to how we can combine sets and maps with
> isl, the isl
> implementation of graphite functions is often a lot simpler and easier to
> understand. But, more importantly, it finally allows us to gather modulo
> wrapping and undefined overflow characteristics and solves several other
> issues we had due to the use of rational polyhedra.
>
> 2. A real polyhedral optimizer
>
> To get a real, generic polyhedral optimizer for Graphite we have chosen the
> Pluto algorithm. The original implementation of Pluto is available here [2],
> the relevant publications are [3] and [4]. Pluto is a polyhedral optimizer
> that uses a single cost function to optimize simultaneously for data
> locality, parallelism and tileability. It has shown good results on various
> kernels [5] (or see the papers), and Uday, the original author, was employed
> to reimplement it in IBM XL. We added an implementation of this algorithm to
> isl. My recent patch set enables Graphite to use this new optimizer. Even
> though the patch is an early draft and definitely needs tuning to match the
> results of the original implementation, it is a great starting point for a
> real polyhedral optimizer in G

Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-03-01 Thread Bin.Cheng
>> Second point, as you said, PRE often get confused and moves compare
>> EXPR far from jump statement. Could we rely on register re-materialize
>> to handle this, or any other solution?
>
> Well, a simple kind of solution would be to preprocess the IL before
> redundancy elimination and separate the predicate computation from
> their uses and then as followup combine predicates back (tree forwprop
> would do that, for example - even for multiple uses).  The question is
> what you gain in the end.

I realized there is no merit if the compare EXPR is factored only for the PRE pass.

>
>> I would like to learn more about this case, so do you have any opinion on
>> how this should be fixed for now.
>
> The GIMPLE IL should be better here, especially if you consider that
> we force away predicate computation that may trap for -fnon-call-exceptions
> already.  So, simplifying the IL is still the way to go IMHO.  But as I said
> above - it's a non-trivial task with possibly much fallout.
>
There is another benefit. Currently a general compare EXPR is a dead case GCC
cannot handle in conditional const/copy propagation. It can be handled properly
after the rewriting, since the GIMPLE_COND then only contains a predicate SSA_NAME.
For example, here is the redundant gimple generated for the test case in pr38998:

:
  if (y_3(D) < 1.0e+1)
goto ;
  else
goto ;

:
  D.4069_7 = cos (y_3(D));
  if (y_3(D) < 1.0e+1)
goto ;
  else
goto ;
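
After the proposed rewriting the condition would be carried by a predicate
SSA_NAME, roughly like this (hand-written sketch, not compiler output, with
the block labels elided as above):

  pred_8 = y_3(D) < 1.0e+1;
  if (pred_8 != 0)
    goto ;
  else
    goto ;

  D.4069_7 = cos (y_3(D));
  pred_9 = y_3(D) < 1.0e+1;
  if (pred_9 != 0)
    goto ;
  else
    goto ;

The second compare is then an ordinary redundant scalar computation that
FRE/PRE can remove, leaving the second GIMPLE_COND testing pred_8 directly.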

I do think these "non-canonical" compare EXPRs might cause other issues.

As for the fallout you mentioned, how about introducing a light-weight pass
at the very end of the middle end to propagate the compare EXPR back into the
GIMPLE_COND if the corresponding predicate SSA_NAME is down-safe
only because it is used by the GIMPLE_COND?

So what do you think?

-- 
Best Regards.


Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-03-01 Thread Richard Guenther
On Thu, Mar 1, 2012 at 3:45 PM, Bin.Cheng  wrote:
>>> Second point, as you said, PRE often get confused and moves compare
>>> EXPR far from jump statement. Could we rely on register re-materialize
>>> to handle this, or any other solution?
>>
>> Well, a simple kind of solution would be to preprocess the IL before
>> redundancy elimination and separate the predicate computation from
>> their uses and then as followup combine predicates back (tree forwprop
>> would do that, for example - even for multiple uses).  The question is
>> what you gain in the end.
>
> I realized there is no merit if the compare EXPR is factored only for the PRE pass.
>
>>
>>> I would like to learn more about this case, so do you have any opinion on
>>> how this should be fixed for now.
>>
>> The GIMPLE IL should be better here, especially if you consider that
>> we force away predicate computation that may trap for -fnon-call-exceptions
>> already.  So, simplifying the IL is still the way to go IMHO.  But as I said
>> above - it's a non-trivial task with possibly much fallout.
>>
> There is another benefit. Currently a general compare EXPR is a dead case GCC
> cannot handle in conditional const/copy propagation. It can be handled
> properly after the rewriting, since the GIMPLE_COND then only contains a
> predicate SSA_NAME. For example, here is the redundant gimple generated for
> the test case in pr38998:
>
> :
>  if (y_3(D) < 1.0e+1)
>    goto ;
>  else
>    goto ;
>
> :
>  D.4069_7 = cos (y_3(D));
>  if (y_3(D) < 1.0e+1)
>    goto ;
>  else
>    goto ;
>
> I do think these "non-canonical" compare EXPRs might cause other issues.
>
> As for the fallout you mentioned, how about introducing a light-weight pass
> at the very end of the middle end to propagate the compare EXPR back into the
> GIMPLE_COND if the corresponding predicate SSA_NAME is down-safe
> only because it is used by the GIMPLE_COND?
>
> So what do you think?

Well, I'm all for it, but the fallout is in the GIMPLE middle-end pieces.
It's just a lot of work ;)  And I'd rather start forcing the predicate
separation for VEC_COND_EXPRs and COND_EXPRs as they appear
on the RHS of gimple assigns.  That should be simpler and the fallout
should be less.
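
For concreteness, that means rewriting assignments like (hand-written
sketch, not compiler output)

  x_5 = a_1 < b_2 ? c_3 : d_4;

into

  pred_6 = a_1 < b_2;
  x_5 = pred_6 ? c_3 : d_4;

so the comparison gets its own SSA_NAME and can be value-numbered on its own.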

If you want to do the work I promise to review patches.

Richard.

> --
> Best Regards.


GCC 4.6.3 Released

2012-03-01 Thread Jakub Jelinek
The GNU Compiler Collection version 4.6.3 has been released.

GCC 4.6.3 is a bug-fix release containing fixes for regressions and serious
bugs in GCC 4.6.2, with over 70 bugs fixed since the previous release.  This
release is available from the FTP servers listed at:

  http://www.gnu.org/order/ftp.html

Please do not contact me directly regarding questions or comments about
this release.  Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release -- far
too many to thank individually!


gcc-4.5-20120301 is now available

2012-03-01 Thread gccadmin
Snapshot gcc-4.5-20120301 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20120301/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 184762

You'll find:

 gcc-4.5-20120301.tar.bz2 Complete GCC

  MD5=f42ee096cd501240c9b7eba60961df21
  SHA1=941f3b275de21fea5a8438fafb4bf71fd20c43ed

Diffs from 4.5-20120223 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.