Inconsistent next_bb info when EXIT is a successor

2007-03-02 Thread Andrey Belevantsev

Hello,

While working on the selective scheduler, I triggered an assert in our
code which requires that a fall-through edge satisfy e->src->next_bb ==
e->dest.  It fired for a bb whose fall-through successor was EXIT_BLOCK
but whose next_bb pointed to another block.


I was wondering why verify_flow_info didn't catch this issue.  The code 
starting at cfgrtl.c:1973 (in the January 03 trunk) does check this, but 
only when e->src != ENTRY_BLOCK_PTR && e->dest != EXIT_BLOCK_PTR.


I have tried to reorganize the check so that the "e->src->next_bb ==
e->dest" condition is checked for all edges (see the patch below).  Of
course, GCC does not bootstrap with this patch: it hits an "incorrect
fallthru block" assert in cfg_layout_finalize after the RTL loop
optimizations.  In my case, it was combine that broke the condition.


Does this ring any bells for anybody?  Is this a bug, or should this
condition not be checked for edges pointing to the exit block at all?


Thanks, Andrey

--- cfgrtl.c	(revision 24203)
+++ cfgrtl.c	(local)
@@ -1969,8 +1969,7 @@ rtl_verify_flow_info (void)
  break;
}
}
-  else if (e->src != ENTRY_BLOCK_PTR
-  && e->dest != EXIT_BLOCK_PTR)
+  else
{
  rtx insn;

@@ -1981,7 +1980,9 @@ rtl_verify_flow_info (void)
 e->src->index, e->dest->index);
  err = 1;
}
- else
+
+  if (e->src != ENTRY_BLOCK_PTR
+  && e->dest != EXIT_BLOCK_PTR)
 for (insn = NEXT_INSN (BB_END (e->src)); insn != BB_HEAD (e->dest);
  insn = NEXT_INSN (insn))
  if (BARRIER_P (insn) || INSN_P (insn))


Re: Inconsistent next_bb info when EXIT is a successor

2007-03-02 Thread Andrey Belevantsev

Steven Bosscher wrote:

No. The condition you're checking is simply not true in cfglayout
mode. The whole point of cfglayout mode is to get rid of the
requirement that basic blocks are serial. That means a fallthru edge
in cfglayout mode doesn't have to go to next_bb. It can go to *any*
bb.
Yes, but I'm not in cfglayout mode, because I'm either in sched1 or 
sched2.  In that case, should this condition be preserved or not?


Andrey




Re: Inconsistent next_bb info when EXIT is a successor

2007-03-03 Thread Andrey Belevantsev

Steven Bosscher wrote:
> I don't understand this.  You're saying there is a fallthrough edge
> from your e->src to  EXIT_BLOCK. This case is explicitly allowed by
> the checking code. It is an exception from the rule: For a fallthrough
> edge to EXIT, e->src->next_bb != e->dest is OK.
Thanks!  It's the answer I was looking for -- this case is a known 
exception, so I shouldn't worry.  Given that ...


> It is hard to tell without more context what your problem is. That
> assert, is it an assert in your own code?  Maybe it is too strict?
... yes, the assert was too strict.  I've fixed that so we don't rely on 
next_bb in these cases.
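The rule Steven describes can be written down as a small predicate. This is a sketch under stated assumptions: sketch_bb and sketch_edge are hypothetical stand-ins for GCC's basic_block and edge, cfglayout mode is reduced to a flag, and the function is not the real rtl_verify_flow_info logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC's basic_block and
   edge; only the fields needed for the check are modelled.  */
struct sketch_bb
{
  int index;
  struct sketch_bb *next_bb;   /* next block in the insn chain */
};

struct sketch_edge
{
  struct sketch_bb *src, *dest;
  bool fallthru;
};

/* Return true if E respects the fallthru invariant: outside cfglayout
   mode, a fallthru edge must reach the block that textually follows
   its source, except when the edge goes to EXIT, which is an accepted
   exception.  */
static bool
fallthru_edge_consistent_p (const struct sketch_edge *e,
                            const struct sketch_bb *exit_bb,
                            bool in_cfglayout_mode)
{
  assert (e != NULL && e->src != NULL && e->dest != NULL);

  if (!e->fallthru || in_cfglayout_mode)
    return true;              /* nothing to check */
  if (e->dest == exit_bb)
    return true;              /* fallthru to EXIT: next_bb may differ */
  return e->src->next_bb == e->dest;
}
```

An assert built on a predicate like this tolerates fallthru edges to EXIT instead of insisting on next_bb.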


Andrey




Re: Improvements of the haifa scheduler

2007-03-05 Thread Andrey Belevantsev

Vladimir N. Makarov wrote:
Good aliasing is very important for the scheduler.  But I'd look at this 
more broadly.  We need good aliasing for many RTL optimizations.  What 
happened to the ISP RAS aliasing patch propagating SSA info to RTL?  Why 
is it stalled?
We plan to work on it further in the near future.  The initial plan was 
to update it to trunk, then to gather new disambiguation numbers and the 
like, and then to check the consistency of the saved information via 
some verifier.


As for Sanjiv Gupta's aliasing work, that was interesting, but as I 
remember the patch made the compiler too slow (like 40% slower).  You 
should make this approach faster for it to be accepted and enabled by default.
Alexander Monakov was working on it in the last year (CC'd).  I have 
looked through his paper to recall what was done:


- instead of iterating over the CFG, a single pass in topological order 
was used to calculate address descriptors (which was enough for using 
this info in the scheduler);
- instead of one per-function hashtable for all address descriptors, 
separate per-bb hashtables were introduced, lowering the time needed to 
access the hashtables;
- instead of saving the out lists of descriptors for each bb, the in 
lists were saved (and not recalculated several times);
- descriptors were saved for each mem instead of each bb.  Earlier, when 
an aliasing query was made, we searched for the insns corresponding to 
the mems via a hashtable, reanalyzed the basic block up to those insns, 
and then answered the query using the calculated address descriptors.  
After the fix, we just fetch the final descriptor from the first 
hashtable and answer the query.
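The first bullet relies on a standard property: on an acyclic region, visiting blocks in topological order guarantees that every predecessor is finished before its successors, so one pass replaces iteration to a fixed point. A minimal sketch, assuming a hypothetical adjacency-matrix CFG and a 32-bit "facts" mask standing in for the real address descriptors (merged here by plain union); it is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BBS 16   /* hypothetical limit for the sketch */

/* Kahn's algorithm: succ[i][j] is true if there is an edge i -> j.
   Returns how many blocks were placed in ORDER (less than N if the
   region has a cycle).  */
static int
topo_order (int n, bool succ[MAX_BBS][MAX_BBS], int order[MAX_BBS])
{
  int indeg[MAX_BBS] = { 0 };
  int queue[MAX_BBS], head = 0, tail = 0, count = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      if (succ[i][j])
        indeg[j]++;
  for (int i = 0; i < n; i++)
    if (indeg[i] == 0)
      queue[tail++] = i;
  while (head < tail)
    {
      int b = queue[head++];
      order[count++] = b;
      for (int j = 0; j < n; j++)
        if (succ[b][j] && --indeg[j] == 0)
          queue[tail++] = j;
    }
  return count;
}

/* One forward pass: IN of a block is the union of its predecessors'
   OUT sets, and OUT adds the block's LOCAL facts.  In topological
   order each IN set is final when computed, so no iteration is
   needed.  Returns false if the region is not acyclic.  */
static bool
propagate_once (int n, bool succ[MAX_BBS][MAX_BBS],
                const uint32_t local[MAX_BBS],
                uint32_t in[MAX_BBS], uint32_t out[MAX_BBS])
{
  int order[MAX_BBS];

  assert (n >= 0 && n <= MAX_BBS);
  if (topo_order (n, succ, order) != n)
    return false;             /* cycle: a single pass is not enough */
  for (int k = 0; k < n; k++)
    {
      int b = order[k];
      in[b] = 0;
      for (int p = 0; p < n; p++)
        if (succ[p][b])
          in[b] |= out[p];
      out[b] = in[b] | local[b];
    }
  return true;
}
```

Saving the IN sets, as in the third bullet, falls out naturally: they are computed once per block and never revisited.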


After all fixes, bootstrap and compiling the cc1-i-files were 2% slower.  
The compiler built with the patch enabled compiled tramp3d 0.5% faster 
and produced 0.6% faster code.  We'll dig out this patch together with 
the rest of the aliasing patches and send it as an RFC.  It is my 
mistake for not doing this earlier.


If you need benchmarking for machines (like ppc) you have no access to, 
I can provide the benchmarking.
That's great, because we have access to powerpc750 only.  I have used it 
to try the scheduler on ppc, but that was slow.


I really appreciate it.  Maybe if you or ISP RAS could find students 
(e.g. from Moscow University) to do this as a Google Summer of Code 
project, it could help you.  I think it is not too late.  You should ask 
Ian Taylor or Daniel Berlin if you want to do this.
We'll work on aliasing anyway (see above).  Three students are working 
with us, but they are busy with different projects.  I'll ask my advisor 
about it.


Andrey


Re: anyone using svk?

2007-04-12 Thread Andrey Belevantsev

Rafael Espindola wrote:

Is anyone using svk? I tried to create a local depot by updating the
one pointed to on the wiki. Unfortunately it is trying to use too much
RAM and crashing.
Yes, we keep a local mirror of trunk and the sel-sched branch using svk.  
As far as I remember, I did the setup from scratch, starting from some 
revision (probably a branch point).  We used svk 1.05 and are now using 
1.08.


Andrey


Re: Scheduler questions (related to PR17808)

2005-06-30 Thread Andrey Belevantsev

Vladimir Makarov wrote:

I'll look at this PR today.


We looked at this issue today. We think the problem is that the proposed 
patch to sched_get_condition() treats conditional jumps like COND_EXECs, 
but doesn't fix the other places in sched-deps where COND_EXECs are 
considered. Maxim Kuvyrkov proposed the attached patch, which allows gcc 
to bootstrap on ia64 and fixes the testcase in the PR.


We've also found that the current mainline ICEs when compiling the 
testcase with "-O0 -fschedule-insns -fschedule-insns2". This is because 
several pseudos still remain in the global_live_at_start sets after 
reload. The pseudos then reach regsets through 
compute_jump_reg_dependencies(), and sched-deps segfaults in the 
EXECUTE_IF_SET_IN_REG_SET loop at sched-deps.c:948.
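A post-reload sanity check of the kind that would flag this before sched-deps walks the set could look like the sketch below. It is illustrative only: the regset is a plain bit array rather than GCC's bitmap-based regset, and SKETCH_FIRST_PSEUDO_REGISTER and SKETCH_MAX_REGNO are made-up values (the real FIRST_PSEUDO_REGISTER is target-specific).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical values for the sketch.  */
#define SKETCH_FIRST_PSEUDO_REGISTER 64
#define SKETCH_MAX_REGNO 256
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define REGSET_WORDS ((SKETCH_MAX_REGNO + WORD_BITS - 1) / WORD_BITS)

typedef struct { unsigned long bits[REGSET_WORDS]; } sketch_regset;

static void
regset_set (sketch_regset *rs, unsigned regno)
{
  assert (regno < SKETCH_MAX_REGNO);
  rs->bits[regno / WORD_BITS] |= 1UL << (regno % WORD_BITS);
}

/* Return the first pseudo (regno >= SKETCH_FIRST_PSEUDO_REGISTER) set
   in RS, or -1 if only hard registers are present.  After reload, any
   hit is a stale pseudo that later passes would trip over.  */
static int
first_stale_pseudo (const sketch_regset *rs)
{
  for (unsigned regno = SKETCH_FIRST_PSEUDO_REGISTER;
       regno < SKETCH_MAX_REGNO; regno++)
    if (rs->bits[regno / WORD_BITS] & (1UL << (regno % WORD_BITS)))
      return (int) regno;
  return -1;
}
```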


We don't know reload well enough to say for sure which place should be 
fixed: reload itself, or maybe update_life_info(). Is this issue worth 
opening another PR?


Andrey




--- gcc/gcc/sched-deps.c	Sun Jun 19 16:37:49 2005
+++ orig/gcc/sched-deps.c   Thu Jun 30 18:00:23 2005
@@ -149,7 +149,7 @@
 return 0;
 
   src = SET_SRC (pc_set (insn));
-#if 0
+#if 1
   /* The previous code here was completely invalid and could never extract
  the condition from a jump.  This code does the correct thing, but that
  triggers latent bugs later in the scheduler on ports with conditional
@@ -1019,7 +1019,8 @@
 {
   /* In the case of barrier the most added dependencies are not
  real, so we use anti-dependence here.  */
-  if (GET_CODE (PATTERN (insn)) == COND_EXEC)
+  /* if (GET_CODE (PATTERN (insn)) == COND_EXEC)  */
+  if (sched_get_condition (insn))
{
  EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i, rsi)
{
@@ -1066,7 +1067,8 @@
 {
   /* If the current insn is conditional, we can't free any
 of the lists.  */
-  if (GET_CODE (PATTERN (insn)) == COND_EXEC)
+  /* if (GET_CODE (PATTERN (insn)) == COND_EXEC)  */
+  if (sched_get_condition (insn))
{
  EXECUTE_IF_SET_IN_REG_SET (reg_pending_uses, 0, i, rsi)
{


Re: Needs advises on rotating register allocation for IA64 in GCC

2005-07-21 Thread Andrey Belevantsev

Steven Bosscher wrote:

Hmm, I've never seen any discussions about this on [EMAIL PROTECTED]  Could you
give some links to messages in the mailing list archives that you
may have found?


I've seen only the thread mentioning the work of Ritu Sabharwal 
(http://gcc.gnu.org/ml/gcc/2002-12/msg00508.html), and then questions of 
Canqun Yang and Feng Wang 
(http://gcc.gnu.org/ml/gcc/2003-09/msg00924.html and 
http://gcc.gnu.org/ml/gcc/2004-10/msg01193.html, respectively). Maybe 
I've missed something.


Andrey




[GCC 4.2 Project] Support for IA-64 speculation

2005-09-02 Thread Andrey Belevantsev

Hello,

I work on GCC for the Institute for System Programming in Russia. Below 
is a brief summary of the project aimed at adding support for ia64 
speculation to the GCC instruction scheduler. I presented the project at 
the last GCC Summit.


This description doesn't have any implementation details, but rather 
refers to the summit paper. If needed, I'd be happy to provide a longer 
summary on the wiki page.


I won't be able to respond to your comments until next Tuesday.

Regards, Andrey

---
Support for IA-64 speculation

Speculation is one of the features of the IA-64 architecture aimed at 
exposing instruction-level parallelism. Speculation allows a compiler 
to overcome dependencies by moving a load across an ambiguous store or 
across a branch (with data and control speculation, respectively). This 
technique helps to hide the latency of memory loads and reduce execution 
time.
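At the source level, data speculation can be imitated in portable C: hoist the load above the ambiguous store, then check after the store and redo the load if the two accesses overlapped. This is only an analogy with made-up function names; on IA-64 the check is done by hardware (an advanced load, ld.a, registers its address in the ALAT, and a later ld.c or chk.a performs recovery if an intervening store hit it), not by comparing pointers.

```c
#include <assert.h>

/* Original order: the load of *q must stay below the store to *p,
   because p and q may point to the same object.  */
static int
load_after_store (int *p, int *q, int v)
{
  *p = v;
  return *q + 1;
}

/* Data-speculative order (software analogy): the load is issued
   before the ambiguous store, and a check after the store repairs the
   value if the two accesses did overlap.  */
static int
speculative_load_after_store (int *p, int *q, int v)
{
  int t = *q;        /* "advanced" load, hoisted above the store */
  *p = v;            /* ambiguous store */
  if (p == q)        /* check: did the store hit the loaded location? */
    t = *q;          /* recovery: redo the load */
  return t + 1;
}

/* Both orders must agree whether or not p and q alias.  */
static void
check_same_result (int *p, int *q, int v)
{
  int saved_p = *p, saved_q = *q;
  int r1 = load_after_store (p, q, v);

  *q = saved_q;      /* restore memory before running the other order */
  *p = saved_p;
  int r2 = speculative_load_after_store (p, q, v);
  assert (r1 == r2);
}
```

The payoff is that the load's latency overlaps with the store in the common no-alias case, while the rare aliasing case pays for the recovery.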


The patch adds support for both data and control speculation to the GCC 
instruction scheduler. Implementation issues of the patch are described 
in the paper 'Improving GCC instruction scheduler for IA-64', which can 
be found in the proceedings of GCC Summit 2005.


Personnel

Maxim Kuvyrkov, Andrey Belevantsev (Institute for System Programming, 
Russian Academy of Sciences)


Delivery Date

This project will be ready during the first stage of GCC 4.2.

Benefits

The patch improves SPEC FP by 2% (with -O2, as of May 2005). Aggressive 
inlining and loop unrolling help the patch produce better results on 
SPEC INT. Detailed results are given in the above-mentioned paper.


Dependencies

None.

Modifications Required

Target-independent parts of the patch modify the scheduler source files. 
A new flag and params are added for enabling and controlling speculation 
support. The target-dependent part includes new speculative instructions 
and pipeline descriptions in the ia64 backend, and ia64.c changes.





Re: Where are the fortran test results for cvs trunk?

2005-09-16 Thread Andrey Belevantsev

Christian Joensson wrote:
So, I just wonder what's going wrong here... 

Could it be the problem explained in
http://gcc.gnu.org/ml/gcc-patches/2005-09/msg00872.html? The patch is 
available later in that thread: 
http://gcc.gnu.org/ml/gcc-patches/2005-09/msg00879.html


Andrey





Bootstrap problems on ia64

2005-12-23 Thread Andrey Belevantsev

Hi,

When bootstrapping rev. 109012 on ia64-linux (checked out around 9am GMT 
today), I get


make[3]: Entering directory `/mnt/sda5/bonzo/obj-trunk/stage2-libdecnumber'
source='../../trunk/libdecnumber/decNumber.c' object='decNumber.o' 
libtool=no /home/bonzo/local/obj-trunk/./prev-gcc/xgcc 
-B/home/bonzo/local/obj-trunk/./prev-gcc/ 
-B/mnt/sda5/bonzo/obj-trunk//ia64-unknown-linux-gnu/bin/ 
-I../../trunk/libdecnumber -I.  -g -O2 -W -Wall -Wwrite-strings 
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition 
-Wmissing-format-attribute -pedantic -Wno-long-long -Werror 
-I../../trunk/libdecnumber -I.  -c ../../trunk/libdecnumber/decNumber.c

cc1: warnings being treated as errors
../../trunk/libdecnumber/decNumber.c: In function 'decToString':
../../trunk/libdecnumber/decNumber.c:2013: warning: value computed is 
not used

make[3]: *** [decNumber.o] Error 1

Looking at the source, the warning seems to be spurious:

static void
decToString (decNumber * dn, char *string, Flag eng)
{
 <...>
  char *c = string; /* work [output pointer] */
 <...>
  if (dn->bits & DECSPECIAL)
{   /* Is a special value */
  if (decNumberIsInfinite (dn))
{
  strcpy (c, "Infinity");  <--- here
  return;
}

And when specifying --disable-werror, I get a comparison failure:

Bootstrap comparison failure!
./gcc.o differs

Is this a known issue, or have I just broken my installation? GCC was 
configured with just --prefix, and I used 3.4.3 for stage1.


Andrey



Re: Remove sel-sched?

2016-01-13 Thread Andrey Belevantsev

Hello Bernd,

On 13.01.2016 21:25, Bernd Schmidt wrote:

There are a few open PRs involving sel-sched, and I'd like to start a
discussion about removing it. Having two separate schedulers isn't a very
good idea in the first place IMO, and since ia64 is dead, sel-sched gets
practically no testing despite being the more complex one.

Thoughts?


Out of the PRs we have, two are actually fixed but not marked as such. 
This year's PRs come from Zdenek's recent Debian rebuild with GCC 6, and I 
will be on them now.  The other two PRs, from last year, are my fault for 
not fixing them in a timely manner.  Frankly, 2015 was very tough for me 
and my colleagues (we worked 6 days a week for most of the year), but since 
January things are fine again and we'll catch up now.  Sorry for that.


You're also right that sel-sched now gets limited testing.  We initially 
made it work for ia64, x64, ppc and cell, and then added ARM, too. 
Outside of the ia64 world, I had private reports of sel-sched being used 
successfully for cell, and we used it in our own contract work for 
optimizing some ARM apps with GCC.


In short, we're willing to maintain sel-sched, and I apologize for the slow 
PR fixing last year; it should no longer be a problem.  If there are any 
big plans for reorganizing the schedulers and sel-sched stands in the way 
of those, let's discuss it, and we'll be willing to help in any way.


Andrey




Bernd




Re: Remove sel-sched?

2016-01-15 Thread Andrey Belevantsev

On 14.01.2016 20:26, Jeff Law wrote:


FWIW, I've downgraded the sel-sched stuff to P4 for this release given how
that scheduler is typically used (ia64, which is a dead platform).

I think the bigger question Bernd is asking here is whether or not it makes
sense to have multiple schedulers.  In an ideal world we'd bake them off,
select the best, and deprecate/remove the others.

I didn't follow sel-sched development closely, so forgive me if the
questions are simplistic/naive, but what are the main benefits of sel-sched
and is it at a point (performance-wise) where it could conceivably replace
the aging haifa scheduler infrastructure?


The main sel-sched points at the time of its inclusion were as follows: 
bookkeeping code support (moving an insn between any blocks in the 
scheduling region), insn transformation support (renaming, unification, 
substitution through register copies), scheduling at several points at 
once, and pipelining support.  Together these paid off with something like 
7-8% on SPEC on ia64 at the time, but not so on the other archs, where we 
didn't spend much time on tuning and usually got both ups and downs 
compared to haifa.  On ia64 the speedup came mostly from pipelining with 
speculation, as far as I recall; for others, including ARM, renaming and 
substitution were useful.


Since then, Vlad and Bernd have added more improvements to the haifa 
scheduler, including sched pressure, predication and backtracking, so each 
scheduler now has features not present in the other, and the initial 
feature advantage has somewhat worn off.


Also, the big problem of sel-sched is speed.  It is slow because the 
dependency lists are not maintained through the scheduler; most of the 
transformation machinery is implemented by moving an insn up the region 
and checking what has to happen to allow insn A to move up through insn B. 
I've done most of what I could think of to speed it up, but haven't 
managed to make sel-sched the default at -O2.


So, to sum up, I don't think sel-sched can replace haifa in its current 
state.  These days, to speed up the scheduler I'd add something like 
path-based dependency tracking with bit vectors, as is done in Intel's 
wavefront scheduling, though it is patented (Vlad may correct me here). 
Or we need to devise other means of keeping dependencies up to date. 
We've tried that but never got it working well enough.


The thing I would not like to lose is sel-sched pipelining.  It can work on 
any loops, not only countable ones like modulo scheduling, and this can 
make a difference for some apps even outside of ia64.  But if one basic 
scheduler is desired, maybe the better use of our resources would be to 
improve modulo scheduling instead, so as not to lose pipelining 
capabilities in gcc.  Modulo scheduling is completely unmaintained now; my 
colleague Roman Zhuykov had a couple of improvements ~4yrs ago, but most of 
them never got into trunk due to lack of review.  He can step up as a 
modulo-sched maintainer if needed; the code is alive (see PR69252).


Sorry for a long mail :)

Andrey



Jeff




Re: broken links?

2014-06-29 Thread Andrey Belevantsev

On 30.06.2014 7:22, Hebenstreit, Michael wrote:

I tested from home to reach https://gcc.gnu.org/pub/gcc/infrastructure/ - same 
result; ftp://gcc.gnu.org/pub/gcc/infrastructure/ works though. Trying ftp from 
behind the company FW on FF redirects me to https, though on IE it works


There were a number of bug reports to Firefox about unexpected changes in 
the URL protocol when using autocomplete, e.g. 
https://bugzilla.mozilla.org/show_bug.cgi?id=769348 and 
https://bugzilla.mozilla.org/show_bug.cgi?id=769994.  These are marked as 
fixed but still get occasional comments; maybe you can describe your 
situation there.


Best,
Andrey



Regards
Michael

-----Original Message-----
From: Ingwie Phoenix [mailto:ingwie2...@googlemail.com]
Sent: Sunday, June 29, 2014 7:53 PM
To: Hebenstreit, Michael
Cc: Gerald Pfeifer; Jonathan Wakely; gcc@gcc.gnu.org
Subject: Re: broken links?


On 30.06.2014 at 01:43, Hebenstreit, Michael wrote:


Could our firewall (plus proxy) be the reason? I still get "page not found" for 
both Firefox 25 and IE

Michael

Does the proxy have a cache? Some proxies have something similar to a browser 
cache - in fact, more comparable with what CloudFlare does. It probably saved 
a copy of the URL's result and serves it upon request. So you might wish to 
check this out too.





Re: Selective scheduling and its usage

2018-03-21 Thread Andrey Belevantsev
Hi Martin,

On 21.03.2018 12:48, Martin Liška wrote:
> Hello.
> 
> I noticed there are quite many selective scheduling PRs:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
> 
> and many others.
> 
> I want to ask you if you plan to maintain the scheduling?

Yes.  The current status is that I have patches for 83530, 83962, 83913,
83480, 83972, 80463.  I don't have patches for any of the 84* issues.
I'm planning to submit the patches for the former set and to look at the
latter set next week.

I usually do most of the work by myself after internal discussions with
Alexander and other colleagues here, and there might be delays when I get
busy with unrelated stuff.  However, if there's a pressing need, we have
enough knowledgeable people to fix any sel-sched PR within a week or so.

> Is it enabled by default for any target we support?

Yes, ia64 at -O3.  Our usual testing is as follows: bootstrap and test
on ia64, bootstrap with sel-sched enabled on x86-64, and run any new
tests from PRs on x86-64, ia64, and ppc.  This way I'm confident that it
mostly works on those platforms.

> Should we deprecate it for GCC 8?

No, I don't think so.

Best,
Andrey

> 
> Thank you,
> Martin



Re: Selective scheduling and its usage

2018-03-21 Thread Andrey Belevantsev
On 21.03.2018 13:31, Martin Liška wrote:
> On 03/21/2018 11:17 AM, Andrey Belevantsev wrote:
>> Hi Martin,
>>
>> On 21.03.2018 12:48, Martin Liška wrote:
>>> Hello.
>>>
>>> I noticed there are quite many selective scheduling PRs:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
>>>
>>> and many others.
>>>
>>> I want to ask you if you plan to maintain the scheduling?
>>
>> Yes.  The current status is that I have patches for 83530, 83962, 83913,
>> 83480, 83972, 80463.  I don't have patches for any of the 84* issues.
>> I'm planning to submit the patches for the former set and to look at the
>> latter set next week.
> 
> Nice!
> 
> Maybe we can create a meta bug to track all sel. scheduling issue.
> May I create it?

Yes, of course.  I would be happy, because I can easily lose track (I have
some Bugzilla queries about scheduling, but I don't monitor all of the
gcc-bugs traffic).  In fact, I wasn't aware of some of the PRs in your
list until your mail.

Best,
Andrey

> 
>>
>> I usually do most of the work by myself after internal discussions with
>> Alexander and other colleagues here, and there might be delays when I get
>> busy with unrelated stuff.  However, if there's a pressing need, we have
>> enough knowledgeable people to fix any sel-sched PR within a week or so.
>>
>>> Is it enabled by default for any target we support?
>>
>> Yes, ia64 at -O3.  Our usual testing is as follows: bootstrap and test
>> on ia64, bootstrap with sel-sched enabled on x86-64, and run any new
>> tests from PRs on x86-64, ia64, and ppc.  This way I'm confident that it
>> mostly works on those platforms.
> 
> Great.
> 
>>
>>> Should we deprecate it for GCC 8?
>>
>> No, I don't think so.
> 
> Works for me.
> 
> Thanks for clarification.
> Martin
> 
>>
>> Best,
>> Andrey
>>
>>>
>>> Thank you,
>>> Martin
>>
> 



Re: Problems with selective scheduling

2009-10-28 Thread Andrey Belevantsev

Hi Markus,

Markus L wrote:

Thank you very much for your detailed response!


I suspect your machine description says that the dependency between loads
and multiply-add has zero latency, thus allowing the scheduler to place
them into one instruction group.  Grep for the various comments about the
tick_check_p function.
In verbose scheduler dumps, there should be something like

Expr 35 is not ready yet until cycle 2
No best expr found!
Finished a cycle.  Current cycle = 2


At a glance when compiling without the -fsel-sched-pipelining flag
(but with -fselective-scheduling2) proper VLIW grouping is performed
so I guess the dependency is not zero latency but I will try to
investigate the details. Increasing verbosity and comparing dumps to
ia64 will probably be helpful.
You are welcome to email me or Alexander if you need any help with 
debugging this.





On the high level, yes.  In this particular example, pipelining of loads
would not be possible for the following reasons:
not be possible for the following reasons:
1) speculative motion of loads with pre/post-increment is not implemented
(ia64 backend disables auto-inc generation pass when sel-sched is enabled);


Is there a fundamental problem with pre/post-increment support in the
selective scheduling approach or is this something that might be
implemented in the future?
There is no fundamental problem; we just didn't have time for this within 
the scope of that project.  We are willing to tackle this in our spare 
time during the next stage1, if we have any.






2) when pipelining loads, the scheduler needs to transform them into
control-speculative form (since no loop epilogue is generated, the load on
the very last iteration of the transformed loop may access unallocated
memory).  In other words, the selective scheduler does not preserve the
number of instruction executions (pipelined instructions from the original
loop will be executed more times than the number of loop iterations).
Speculative loads are not supported by any mainline GCC target except ia64.
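The second point can be seen on a loop whose load is pipelined by one stage without an epilogue: the load for iteration i + 1 is issued during iteration i, so the last iteration loads one element past the end. The sketch below imitates a non-faulting (control-speculative) load with a hypothetical helper, spec_load, that yields a dummy value instead of touching out-of-range memory, loosely like ia64 deferring the fault; the dummy value is never consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one load per iteration.  */
static long
sum_scaled (const int *a, size_t n, int k)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += (long) a[i] * k;
  return s;
}

/* Hypothetical stand-in for a non-faulting speculative load: if the
   access is out of range, produce a dummy value instead of faulting.  */
static int
spec_load (const int *a, size_t n, size_t i)
{
  return i < n ? a[i] : 0;
}

/* Pipelined by one stage without an epilogue: the load for the next
   iteration is issued before the current value is used, so loads run
   n + 1 times for n iterations; the extra one must not fault.  */
static long
sum_scaled_pipelined (const int *a, size_t n, int k)
{
  long s = 0;

  assert (n == 0 || a != NULL);
  if (n == 0)
    return 0;
  int cur = spec_load (a, n, 0);            /* prologue */
  for (size_t i = 0; i < n; i++)
    {
      int next = spec_load (a, n, i + 1);   /* i + 1 may equal n */
      s += (long) cur * k;
      cur = next;
    }
  return s;
}
```

On a target where loads can never trap, as Markus describes below, the guard inside a helper like spec_load is unnecessary, which is why such loads can be treated as safe to speculate.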


On my target it is always safe to perform loads, so I suppose I could
pretend to support speculative loads in order to get around this.
That's good.  You can check that VINSN_MAY_TRAP_P is false for your 
loads (it is set from haifa_classify_insn, which in turn uses 
may_trap_exp).  If this is the case, then you should be safe.


Yours, Andrey




/Markus




Re: Is there any plan for "data propagation from Tree SSA to RTL" to be in GCC mainline?

2008-11-09 Thread Andrey Belevantsev

Diego Novillo wrote:

On Sun, Nov 9, 2008 at 06:38, Steven Bosscher <[EMAIL PROTECTED]> wrote:


Wasn't there a GSoC project for this last year?  And this year?

It'd be interesting to hear if anything came out of that...


Nothing came of that, unfortunately.
There are two patches, actually.  The patch propagating data dependences 
to RTL is ready and working; it wasn't committed at the time only because 
it was completed during stage3.  The patch for propagating alias info 
wasn't finished within the scope of this year's GSoC, unfortunately, and 
I consider that more my fault than the student's, as I failed to help him 
locally with organizing his work.


We are nevertheless trying to put some work into finishing this patch. 
As it is not completed yet, there is nothing concrete to discuss.  I hope 
that before the next stage1 we'll manage to finish the patches and unify 
them before submitting, as the mechanism they use for mapping MEMs to 
trees is the same.  If we don't finish the second patch, we'll submit the 
first one anyway.


Sorry for not writing this earlier -- I've had a few busy months (mostly 
finishing and defending my Ph.D. thesis :)


Andrey


Re: Is there any plan for "data propagation from Tree SSA to RTL" to be in GCC mainline?

2008-11-11 Thread Andrey Belevantsev

Bingfeng Mei wrote:

I found the GSoC project and patch here (only 2007)
http://code.google.com/soc/2007/gcc/appinfo.html?csaid=E0FEBB869A5F65A8

Is this patch only for propagating data dependency or does it include 
propagating alias info as well?
The patch at http://gcc.gnu.org/ml/gcc/2007-12/msg00240.html (I presume 
this is the same patch, I'm just giving you the link to its submission 
to the GCC ML) only does propagating data dependency info.


Andrey


Re: GCC & OpenCL ?

2009-02-03 Thread Andrey Belevantsev

Hello,

We at ISP RAS plan to work in the near future on generating either CUDA 
source or PTX from C programs (probably with simple OpenMP directives).  
Of course, we would benefit from OpenCL infrastructure in GCC if one were 
available.


Mark Mitchell wrote:

We (CodeSourcery) have been talking to our commercial partners about
implementing OpenCL in GCC and trying to develop/assess the level of
interest.  As others have stated, our theory of operation here would be
to have the compiler depend on a library API that could be implemented
for GPUs or for ordinary multi-core CPUs.  (Just as the libgomp API
could be provided to the compiler without depending on POSIX threads.)
In the case of generating code for GPUs, I am wondering whether the 
backend that produces device assembly for the kernel code would (in your 
theory) be implemented inside GCC, or would it be a third-party tool?  
Obviously, a library alone is not enough for a heterogeneous system, or 
am I missing something from your description?  As far as I know, there 
is no device-independent bytecode in the OpenCL standard that such a 
backend could generate.


Andrey


Re: Fixing the pre-pass scheduler on x86 (Bug 38403)

2009-04-08 Thread Andrey Belevantsev

Vladimir Makarov wrote:

Steven Bosscher wrote:
On Wed, Apr 8, 2009 at 5:19 AM, Vladimir Makarov wrote:
I've been working on register-pressure sensitive insn scheduling for the
last two months and I hope to submit this work for gcc4.5.  I am also
implementing a mode in insn-scheduler to do only live range shrinkage.


Is all of this still necessary if the selective scheduler (with
register renaming) is made to work on i686/x86_64 after reload?



 That is a really interesting question, Steven.  I have been thinking
about this for a few months (since last fall).  Here are the points that
led me to start my work on haifa-scheduler:

 1. Selective scheduler works only for Itanium now.  As far as I know,
there are some plans to make it work on PPC, but there are no plans
to make it work for other architectures for now.
There are some patches for fixing sel-sched on PPC that I need to ping, 
thanks for reminding me :)  They were too late for regression-only mode, 
and we didn't get access to modern PPCs as we had hoped, so this was not 
an issue earlier.  Btw, when we submitted the scheduler, it did work on 
x86-64 (compile farm); I don't know whether this is still the case.




 2. My understanding is that (register-pressure sensitive insn
scheduling + RA + 2nd insn scheduling) is not equal to (RA +
selective scheduling with register renaming) in terms of potential
performance.  In the first case, the pressure-sensitive insn
scheduler could improve RA by live-range shrinkage.  In the second
case that is already impossible.  It could be improved by a second
RA, but RA is more time consuming than scheduling now.  In general
this chicken-and-egg problem could be solved by iterating the two
contradictory passes, insn scheduling + RA (or RA + insn scheduling).
It is a matter of practicality when to stop these iterations.

 3. My current understanding is that the selective scheduler is overkill
for architectures with few registers.  In other words, I don't
think it will give better performance for such architectures.
On the other hand, it is much slower than the Haifa scheduler because
of its complexity.  The Haifa scheduler before RA already adds 3-5% compile
time; the selective scheduler will add 5-7% more compile time without
performance improvement for the mainstream architectures x86/x86_64.
I think it is intolerable.

We still plan to do some speedups to the sel-sched code within the 4.5 
timeframe, mainly to the dependence handling code.  But even after that, 
I agree that for out-of-order architectures the selective scheduler will 
probably be overkill.  Besides, register renaming itself would remain 
an expensive operation, because it needs to scan all paths along which 
an insn was moved to prove that the new register can be used, though we 
have invested a lot of time in speeding up this process via various 
caching mechanisms.  On ia64, register renaming is useful for pipelining 
loops.
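To illustrate why this check is costly, here is a deliberately simplified model (not sel-sched code). It assumes, hypothetically, that every path an expression was moved along is recorded as an array of per-insn register summaries (the `insn_regs_t` and `path_t` types below), and it accepts a candidate hard register only if no insn on any path reads or writes it. The real check also has to consider liveness and target constraints; the point is only that the work grows with the total length of all paths, which is what the caching tries to avoid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one insn: bitmasks of the (up to 64) hard
   registers it reads and writes.  */
typedef struct
{
  uint64_t uses;
  uint64_t defs;
} insn_regs_t;

/* One path the moved expression crossed, as an array of insn summaries.  */
typedef struct
{
  const insn_regs_t *insns;
  size_t n_insns;
} path_t;

/* Return true if hard register REGNO (< 64) is neither read nor written
   by any insn on any of the N_PATHS paths.  Every insn on every path is
   visited, so the cost is proportional to the total path length.  */
static bool
reg_free_along_paths (const path_t *paths, size_t n_paths, unsigned regno)
{
  const uint64_t mask = UINT64_C (1) << regno;

  for (size_t p = 0; p < n_paths; p++)
    for (size_t i = 0; i < paths[p].n_insns; i++)
      if ((paths[p].insns[i].uses | paths[p].insns[i].defs) & mask)
        return false;
  return true;
}
```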


Also, we tried to limit register renaming in the first selective 
scheduling pass by tracking register pressure and applying a cutoff, 
but it didn't work out very well on ia64, so I agree that much 
more RA knowledge should be brought in for this task.
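A pressure cutoff of the kind mentioned above can be sketched as follows. The bitmask representation of live registers and the `max_pressure` threshold are illustrative assumptions, not the heuristic or parameter values that were actually tried.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of registers set in a live-register bitmask.  */
static unsigned
count_live (uint64_t live)
{
  unsigned n = 0;
  for (; live != 0; live &= live - 1)  /* clear the lowest set bit */
    n++;
  return n;
}

/* Hypothetical cutoff: allow renaming only if giving the expression a
   fresh register keeps the number of simultaneously live registers at
   or below MAX_PRESSURE.  */
static bool
renaming_allowed (uint64_t live_at_point, unsigned max_pressure)
{
  return count_live (live_at_point) + 1 <= max_pressure;
}
```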


Hope this helps.  Vlad, Steven, thanks for caring.

Andrey




Re: Questions about selective scheduler and PowerPC

2010-10-18 Thread Andrey Belevantsev

Hi Jie,

On 18.10.2010 10:49, Jie Zhang wrote:


When this error happens, FENCE_ISSUED_INSNS (fence) is 2 and issue_rate is 1.
PowerPC 8540 is capable of issuing 2 instructions in one cycle, but
rs6000_issue_rate lies to the scheduler, claiming it can only issue 1 instruction
before register allocation (reload) is done. See the following code:


See PR 45352.  I've tried to fix this in the selective scheduler by 
modeling the lying behavior in line with the haifa scheduler.  Let me know 
if the last patch from the PR audit trail doesn't work for you.


In addition, after the above patch goes in, I can make the selective 
scheduler stop jumping through hoops to put correct sched cycles on 
insns for targets which don't need them in their target_finish hook.  
I guess powerpc needs this, but x86-64 (for which PR 45352 was opened) 
almost surely does not.
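A toy model of the bookkeeping behind the failing check, written from the description above and the `more_issue >= 0` assertion rather than from the sel-sched sources: the remaining slots are computed against the issue rate the target reports, while the pipeline description decides how many insns really issue. The function and parameter names are hypothetical.

```c
/* REPORTED_ISSUE_RATE is what the target hook claims, DFA_ISSUE_WIDTH is
   how many insns the pipeline description accepts per cycle, and
   READY_INSNS is how many independent insns are ready.  Returns the
   analogue of "more_issue" after the cycle; a negative value means more
   insns were issued than the reported rate allows.  */
static int
remaining_issue_slots (int reported_issue_rate, int dfa_issue_width,
                       int ready_insns)
{
  /* The DFA, not the reported rate, limits how many insns issue.  */
  int issued = ready_insns < dfa_issue_width ? ready_insns : dfa_issue_width;

  return reported_issue_rate - issued;
}
```

With the PPC 8540 numbers from the report (reported rate 1, dual issue in the DFA, two ready insns) the result is -1, which is exactly the situation the assertion rejects.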


Thanks, Andrey


Re: Questions about selective scheduler and PowerPC

2010-10-18 Thread Andrey Belevantsev

On 18.10.2010 11:31, Jie Zhang wrote:

Hi Andrey,

Thanks for your reply. I just tried. That patch does not help for this issue.

I see, the patch didn't touch the failing assert.  Can you just 
remove the assert and see if that helps?  I cannot think of a way to 
relax it and still keep it useful.


Andrey





Re: Questions about selective scheduler and PowerPC

2010-10-19 Thread Andrey Belevantsev

On 19.10.2010 17:57, Jie Zhang wrote:

Removing the failing assert fixes the test case. But I wonder why not just
get max_issue correct. I'm testing the attached patch. IMHO, max_issue
looks confusing.

* The concept of ISSUE POINT has never been used since the code landed in
the repository.

* In the comment just before the function, it's mentioned that MAX_POINTS
is the sum of points of all instructions in READY. But it does not match
the code. The code only sums the points of the first MORE_ISSUE
instructions. If ISSUE_POINTS later become non-uniform, that piece of code
should be redesigned.

So I think it's good to remove it now. And "top - choice_stack" is a good
replacement for top->n, so we can remove field n from struct choice_entry,
too.

Now I'm looking at the MIPS target to find out why this change in the would
cause PR37360.

I agree that ISSUE_POINTS can be removed, as it was not used (maybe Maxim 
can comment more on this).  However, the assert is not about the points but 
precisely about the situation where a target lies to the compiler about 
its issue rate.


Ideally, we would agree that this should never happen, but then you 
need to fix all targets that use this trick, and there seem to be at 
least mips, ppc, and x86-64 (which is why I pointed you to 45352).  The 
fix would be to find out why claiming the true issue rate degrades 
performance and to implement the proper scheduling hooks for changing 
the priority of some insns, or to enable -fsched-pressure for the 
offending targets.
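A minimal sketch of that alternative, under stated assumptions: a toy list scheduler uses the true issue rate and asks a priority hook (a stand-in for a target hook such as GCC's `TARGET_SCHED_ADJUST_PRIORITY`) to demote selected insns. The `toy_insn` type and the rule of demoting loads are made up for illustration; a real target would pick its own criterion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
  int uid;
  int priority;
  bool is_load;   /* hypothetical property the target cares about */
} toy_insn;

/* Hypothetical target hook: demote loads instead of lying about the
   issue rate.  */
static int
adjust_priority (const toy_insn *insn, int priority)
{
  return insn->is_load ? priority - 10 : priority;
}

/* Higher priority first; ties broken by uid to keep the order stable.  */
static int
cmp_priority_desc (const void *a, const void *b)
{
  const toy_insn *x = a, *y = b;
  if (x->priority != y->priority)
    return x->priority < y->priority ? 1 : -1;
  return x->uid - y->uid;
}

/* Apply the hook, sort the ready list, and issue up to ISSUE_RATE insns
   (the target's true width).  The uids of issued insns go to ISSUED;
   the return value is how many were issued.  */
static size_t
schedule_cycle (toy_insn *ready, size_t n, int issue_rate, int *issued)
{
  for (size_t i = 0; i < n; i++)
    ready[i].priority = adjust_priority (&ready[i], ready[i].priority);
  qsort (ready, n, sizeof *ready, cmp_priority_desc);

  size_t k = 0;
  for (; k < n && (int) k < issue_rate; k++)
    issued[k] = ready[k].uid;
  return k;
}
```

Here the load (uid 1) starts with the highest priority but is pushed behind the two other insns, while the cycle still uses the real issue width of 2.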


This is a lot of work, which is why this assert was installed in max_issue 
only for a relatively short time.  Maybe it's time to try again, but 
let's first reach consensus that this assert should never trigger by 
design and that we have enough flexibility in the scheduler to provide 
legal means to achieve the same performance effect.


Andrey





/* ??? We used to assert here that we never issue more insns than issue_rate.
However, some targets (e.g. MIPS/SB1) claim lower issue rate than can be
achieved to get better performance. Until these targets are fixed to use
scheduler hooks to manipulate insns priority instead, the assert should
- be disabled.
-
- gcc_assert (more_issue >= 0); */
+ be disabled. */






Re: software pipelining

2010-11-10 Thread Andrey Belevantsev

Hi,

On 10.11.2010 12:32, roy rosen wrote:

Hi,

I was wondering if gcc has software pipelining.
I saw options -fsel-sched-pipelining -fselective-scheduling
-fselective-scheduling2 but I don't see any pipelining happening
(tried with ia64).
Is there a gcc VLIW port in which I can see it working?

You need to try -fmodulo-sched.  Selective scheduling works by default on 
ia64 with -O3; otherwise you need -fselective-scheduling2 
-fsel-sched-pipelining.  Note that selective scheduling disables autoinc 
generation for the pipelining to work, and modulo scheduling will likely 
refuse to pipeline a loop with autoincs.


Modulo scheduling implementation in GCC may be improved, but that's a 
different topic.


Andrey



For an example function like

int nor(char* __restrict__ c, char* __restrict__ d)
{
  int i, sum = 0;
  for (i = 0; i < 256; i++)
    d[i] = c[i] << 3;
  return sum;
}

without pipelining I get code like

r1 = 0
r2 = c
r3 = d
_startloop
if r1 == 256 jmp _end
r4 = [r2]+
r4 <<= 3
[r3]+ = r4
r1++
jmp _startloop
_end

Here, inside the loop, there is a data dependency between all 3 insns
(only the r1++ is independent), which does not permit any parallelism.

with pipelining I expect code like

r1 = 2
r2 = c
r3 = d
// prologue: start the first two iterations
r4 = [r2]+
r4 <<= 3
r5 = [r2]+
_startloop
if r1 == 256 jmp _end
[r3]+ = r4 ; r4 = r5 << 3 ; r5 = [r2]+
r1++
jmp _startloop
_end
// epilogue: drain the last two iterations
[r3]+ = r4 ; r4 = r5 << 3
[r3]+ = r4

Now the data dependency is broken and parallelism is possible.
As I said, I could not see that happening.
Can someone please tell me on which port and with which options I can
get such a result?

Thanks, Roy.
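For reference, the transformation described above can be written out by hand at the C level. This only illustrates what a three-stage software-pipelined loop computes (prologue, kernel, epilogue); it is not what GCC's modulo or selective scheduler emits. It uses `unsigned char` so the shifted value converts back in a well-defined way, whereas the original example uses `char`, and the function names are invented.

```c
#include <stddef.h>

/* Plain loop: each iteration loads, shifts and stores in sequence.  */
static void
shift_plain (const unsigned char *restrict c, unsigned char *restrict d,
             size_t n)
{
  for (size_t i = 0; i < n; i++)
    d[i] = (unsigned char) (c[i] << 3);
}

/* The same computation, software-pipelined by hand with three stages
   (load, shift, store).  In the kernel, the store of iteration i, the
   shift of iteration i + 1 and the load of iteration i + 2 do not depend
   on each other, so a VLIW machine could issue them together.
   Requires N >= 2.  */
static void
shift_pipelined (const unsigned char *restrict c, unsigned char *restrict d,
                 size_t n)
{
  unsigned char loaded, shifted;
  size_t i;

  /* Prologue: start iterations 0 and 1.  */
  shifted = (unsigned char) (c[0] << 3);
  loaded = c[1];

  /* Kernel.  Statement order matters in sequential C: each statement
     reads the value produced in the previous kernel iteration.  */
  for (i = 0; i + 2 < n; i++)
    {
      d[i] = shifted;                            /* store, iteration i */
      shifted = (unsigned char) (loaded << 3);   /* shift, iteration i+1 */
      loaded = c[i + 2];                         /* load, iteration i+2 */
    }

  /* Epilogue: drain the last two iterations.  */
  d[i] = shifted;
  d[i + 1] = (unsigned char) (loaded << 3);
}
```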




Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Andrey Belevantsev

Hello,

On 14.11.2010 0:08, Xinliang David Li wrote:

I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a core-2 box.  -fno-strict-aliasing is added to gcc
because clang/llvm's type-based aliasing is incomplete and not
enabled by default. I also added -fomit-frame-pointer to clang/llvm as
this is gcc's default. The base option is -O2.


It would be very interesting to also compare peak numbers, i.e. with LTO 
and strict aliasing enabled, as well as -O3 and -ffast-math/-funroll-loops, 
similar to Vlad's or OpenSUSE's options.  Can you try to measure these? 
Maybe you can also run SPEC2k6, if there are enough machine resources, but 
that's probably asking too much...


Andrey



Re: alias time explosion

2006-03-22 Thread Andrey Belevantsev

Hi Daniel,

I can't find the testcase attached to any message in the thread.  Could 
it be because of the message size?  If so, please send the testcase both 
to me and to Maxim; one of us will look into it.


Thanks, Andrey




Re: [patch] Improve loop array prefetch for IA-64

2006-06-02 Thread Andrey Belevantsev

Canqun Yang wrote:

Hi, all

This patch results in a performance increase of 4% for SPECfp2000 and 13% for the NAS 
benchmark suite on an Itanium-2 system. Further performance gains can be expected 
from further tuning of the parameters and improving the prefetch algorithm at the 
tree level. 


Hi Canqun,

It's great news that you continued to work on prefetch tuning for 
ia64!  Do you plan to port your other changes for the old RTL 
prefetching to the tree level?



@@ -1985,13 +1985,18 @@
??? This number is bogus and needs to be replaced before the value is
actually used in optimizations.  */


I suggest removing this comment, as it has become outdated with your 
patch.  Instead, you might explain how you chose this particular value 
(and PREFETCH_BLOCK too).  Just my 2c.
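As an example of what such a comment could explain: as far as I recall, the tree-level prefetch pass decides how many iterations ahead to prefetch by dividing the prefetch latency by the estimated time of one loop iteration, rounded up, while the block size determines how many iterations share one cache line. The sketch below restates that arithmetic with made-up inputs; it is not the GCC source, and the values in the checks are illustrative only.

```c
/* Iterations to prefetch ahead: ceil (latency / iteration_time).
   ITER_TIME must be positive.  */
static unsigned
prefetch_iterations_ahead (unsigned prefetch_latency, unsigned iter_time)
{
  return (prefetch_latency + iter_time - 1) / iter_time;
}

/* Iterations that touch one cache line of BLOCK_SIZE bytes when each
   iteration advances by STEP bytes (STEP > 0); only one prefetch per
   line is needed.  */
static unsigned
iterations_per_line (unsigned block_size, unsigned step)
{
  return step >= block_size ? 1 : block_size / step;
}
```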


Andrey



Re: Power/Energy Awareness in GCC

2013-04-15 Thread Andrey Belevantsev

On 15.04.2013 20:21, Tobias Burnus wrote:

Ghassan Shobaki wrote:

We are currently working on a research
project on instruction scheduling for low power (experimenting with
different algorithms for minimizing switching power) and would like to
find out if GCC already has such a scheduler and how it can be enabled,
so that we can experiment with it.


Your project sounds similar to the following project:
http://gcc.gnu.org/ml/gcc/2013-03/msg00259.html ("Identifying Compiler
Options to Minimise Energy Consumption for Embedded Platforms").


We also did similar research in the past, mainly investigating how compiler 
feedback (via instrumentation and profiling) can assist dynamic voltage 
scaling (DVS) in the OS (we had fully offline DVS as well as patches to the 
Linux scheduler), but we also did some experiments with controlling 
bit-switching in the scheduler and some other work.
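For readers unfamiliar with the term: bit-switching here means how many bits toggle between consecutively issued instruction words. A scheduler targeting it would compare candidate orders with a cost like the one below. This is a generic illustration, assuming 32-bit encodings and a plain sum of Hamming distances, not the cost model from our experiments.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of differing bits between two instruction encodings.  */
static unsigned
hamming32 (uint32_t a, uint32_t b)
{
  uint32_t x = a ^ b;
  unsigned n = 0;
  for (; x != 0; x &= x - 1)  /* clear the lowest set bit */
    n++;
  return n;
}

/* Total bit switching when the N encodings in ENC are issued in the
   order given by the index array ORDER.  */
static unsigned
switching_cost (const uint32_t *enc, const size_t *order, size_t n)
{
  unsigned cost = 0;
  for (size_t i = 1; i < n; i++)
    cost += hamming32 (enc[order[i - 1]], enc[order[i]]);
  return cost;
}
```

A switching-aware scheduler would prefer, among orders that satisfy all dependences and resource constraints, the one with the lower cost; how much freedom that leaves is exactly the limitation discussed below.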


You can find the relevant papers with some preliminary DVS work and 
bit-switching experiments at 
http://www.doc.ic.ac.uk/~phjk/GROW09/papers/06-PowerBelevantsev.pdf and 
with more DVS research and experiments from GROW 2010 at 
http://gcc.gnu.org/wiki/GROW-2010?action=AttachFile&do=get&target=2010-GROW-Proceedings.pdf.


To be honest, I can only repeat what I said to James at the 2012 
Cauldron: I don't think you can achieve much power savings from within 
the compiler, at least until hardware design changes.  The 
scheduling freedom in minimizing bit-switching is very limited and is 
not going to buy much.  Any information that a compiler can produce for you 
will be about program behavior related to the CPU and memory, and no matter how 
well you use it, again you will not save much power, as nowadays the CPU is 
not the largest power consumer in an embedded system (if you save e.g. 
10% of CPU power consumption and the CPU accounts for 10% of total system 
consumption, that's only 1%).  One can achieve more by turning off a display 
or a networking device.
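The estimate in parentheses above, written out:

```latex
\Delta P_{\text{system}}
  = \underbrace{0.10}_{\text{CPU share of system power}}
    \times \underbrace{0.10}_{\text{saving within the CPU}}
  = 0.01 = 1\%
```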


But I will be happy to be proven wrong :-) So good luck with the research.

Andrey


Re: Update the c++1y/c++14 support page

2013-10-06 Thread Andrey Belevantsev

On 07.10.2013 9:54, Jeff Hammond wrote:

Given your company (Oracle) sued Google over 9 lines of Java (the now
infamous rangeCheck function), I hardly think it's appropriate for you
to discourage someone from following through with copyright assignment
for a minor contribution.


This has nothing to do with Oracle v. Google, but with the GCC policy of not 
requiring a copyright assignment for small patches from first-time 
contributors (which Paolo knows as a long-time GCC hacker).  Of course, 
beyond a certain size the assignment is required.  See 
http://gcc.gnu.org/contribute.html.


Also, please don't top post on this list.

Andrey



Jeff

On Sun, Oct 6, 2013 at 7:48 AM, Paolo Carlini  wrote:



Hi,

Morwenn Ed wrote:

Well, it never got to sign the copyright assignment.


Certainly you don't need an assignment for 3 lines of docs!

Paolo








Re: RFC: -Wall by default

2012-04-05 Thread Andrey Belevantsev

On 05.04.2012 16:33, Robert Dewar wrote:

On 4/5/2012 8:28 AM, Michael Veksler wrote:


It is not that they can't remember. I am a TA in a moderately basic
programming course, and students submit home assignments with horrible
errors. These errors, such as free(*str) or *str=malloc(n), are easily
caught by -Wall. I have to remember to advise them to use -Wall and to
fix all the warnings, which I sometimes forget to do.


Wouldn't it be better in a "moderately basic programming course" to
provide standard canned scripts that set things up nicely for students
including the switches they need? Indeed for such a course wouldn't it
be better to use an appropriate IDE, so they could concentrate on the
task at hand and not fiddling with commands. Yes, I think it is very
important for students to learn what is going on, but you can't do
everything at once in a basic course.

And even in the context you give, surely it is not too much to expect
a TA to remember important advice like this?


FWIW, in our "basic programming" course students have to submit their 
homework to an automated testing system which forces the compiler options 
we think useful, including all the relevant warning switches and -Werror. 
Of course, there is a web page explaining the meaning of the switches, and 
TAs help by emphasizing their importance to students.  And indeed, you 
can't do everything in a 101 course, so not much of this (helpful) 
information stays in their heads.  But it's better than nothing.


Andrey


Re: SMS issues

2012-07-19 Thread Andrey Belevantsev

Hello Alex,

On 18.07.2012 18:40, Alex Turjan wrote:


I'm writing to you with respect to some strange SMS functionality.
In the code below there are 2 instructions (a builtin store and a builtin load)
as they appear in the program flow before SMS:


...


Issues:
1.  What is the reason why (T,1) is built up? To me it seems that (T,0)
should be enough.


This looks like the issue that Roman's patch from 
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html should be fixing, 
could you try it?


Ayal, Revital, could you again take a look at the above patch and all the 
SMS improvement patches mentioned in 
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01859.html?  The last comments 
from me are at http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00478.html.  At 
the Cauldron, I was talking to Ramana about pushing these forward as 
important for arm and Linaro, so it would be good to have them in 4.8.


Andrey


2.  Looking inside generate_reg_moves, it seems to me that this function
is not meant to deal with replacing memory accesses but only with
register replacements (see the call to replace_rtx, which in
my case tries to replace the mem accesses inside 136).

3. The (T,1) dep is assumed to take place as if, before the SMS pass,
insn 136 preceded insn 134:

(insn 136 135 137 12 tdscdma_pfu_ccdec.c:292
    (set (reg/v:HI 248 [ mappingAddress_i16 ])
        (unspec:HI [
                (mem:HI (plus:PSI (reg/v/f:PSI 170 [ curMappingTable_pi16 ])
                        (reg:SI 305)) [0 S2 A16])
            ] 696)) 755 {INSN_BUILTIN___loadbyteofs_16} (expr_list:REG_DEAD (reg:SI 305)
        (nil)))

(insn 134 133 135 12 tdscdma_pfu_ccdec.c:289
    (set (mem:HI (plus:PSI (reg/v/f:PSI 185 [ ccdecInterim_pi16 ])
                (reg:SI 303)) [0 S2 A16])
        (unspec:HI [
                (reg/v:HI 244 [ outData_u16 ])
            ] 1752)) 1377 {INSN_BUILTIN___storebyteofs_16} (expr_list:REG_DEAD (reg:SI 303)
        (expr_list:REG_DEAD (reg/v:HI 244 [ outData_u16 ])
            (nil))))

If that were the case, there would also be an anti-dependence of
distance 0 between 134 and 136. Because in the pipelined schedule
134 is scheduled before 136 (SCHED_TIME (136) > SCHED_TIME (134)), the modulo
variable expansion needs to take place as explained before.

SMS decides to produce a modulo variable expansion in a case where it is not
needed. However, it fails to carry out the whole modulo variable expansion
procedure, thereby masking the possibly incorrect behavior described above.
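Background on modulo variable expansion, as a simplified sketch rather than a copy of `generate_reg_moves`: a value defined at cycle `def_time` of one iteration and used at cycle `use_time` of an iteration `distance` later stays live for `use_time + distance * ii - def_time` cycles, where `ii` is the initiation interval. Every `ii` cycles the next instance of the definition overwrites the register, so each further `ii` of lifetime needs one more register copy. The helper name and the convention that reads precede writes within a cycle are assumptions.

```c
/* Hypothetical helper: number of extra register copies (reg moves)
   needed so that a use at USE_TIME in iteration i + DISTANCE still sees
   the value defined at DEF_TIME in iteration i, for initiation interval
   II.  Assumes a legal schedule (positive lifetime) and that a register
   read in a cycle sees the value from before any write in that cycle.  */
static int
reg_moves_needed (int def_time, int use_time, int distance, int ii)
{
  int lifetime = use_time + distance * ii - def_time;

  /* The original register holds the value for ii cycles, until the next
     instance of the definition; each further ii cycles needs one copy.  */
  return (lifetime + ii - 1) / ii - 1;
}
```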

regards,
Alex





Re: SMS issues

2012-07-23 Thread Andrey Belevantsev

Hello,

On 19.07.2012 13:14, Alex Turjan wrote:

Andrey, thanks for the patch. I applied it and so far it seems OK. I
will run further tests and let you know if I see problems.

Back to the last part of my email, I'm still wondering what happens when
the modulo-expanded variable is a memory location, because as far as I can see
generate_reg_moves is not able to handle such a situation... or perhaps
there is something which prevents the modulo scheduler from arriving at
this situation?


The dependencies that get removed with -fmodulo-sched-allow-regmoves are 
only register ones, and with the REG_P check on the set destination in 
schedule_reg_moves we are not supposed to touch memory.
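GCC's documentation describes -fmodulo-sched-allow-regmoves as deleting certain anti-dependence edges and relying on reg moves generated from life-range analysis instead. A toy filter in that spirit, with made-up types standing in for DDG edges and for the REG_P check on real RTL, might look like this:

```c
#include <stdbool.h>

enum dep_kind { DEP_REG, DEP_MEM };
enum dep_type { DEP_TRUE, DEP_ANTI, DEP_OUTPUT };

/* Hypothetical stand-in for a data dependence graph edge.  */
typedef struct
{
  enum dep_kind kind;
  enum dep_type type;
} toy_dep;

/* Only register anti-dependences are candidates for removal: the value
   they protect can be preserved later by reg moves.  Memory dependences
   stay as hard scheduling constraints, so a memory location never needs
   to be "expanded".  */
static bool
dep_removable_with_regmoves (const toy_dep *d)
{
  return d->kind == DEP_REG && d->type == DEP_ANTI;
}
```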


I suggest you look at the trunk code, as it was rewritten by Richard 
Sandiford (CC'd); he can comment more on how the scheduling of register 
moves was changed.  See also the thread starting at 
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02428.html.


Andrey



Alex









Re: Global Value Numbering and dependence on SSA in GCC

2013-02-10 Thread Andrey Belevantsev

Hi Kartik,

On Sun, 10 Feb 2013 15:41:17 +0530, Kartik Singhal 
 wrote:

Thanks Richard for pointing out tree-ssa-sccvn.c

On Wed, Feb 6, 2013 at 8:14 PM, Richard Biener
 wrote:


Well, to ignore SSA form, simply treat each SSA name as a separate variable.

You of course need to handle PHI nodes as copies on CFG edges then.


I am not sure if I understood this correctly.

Consider the following example:

 if (...)
   a_1 = 5;
 else if (...)
   a_2 = 2;
 else
   a_3 = 13;

 # a_4 = PHI 
 return a_4;

Do you mean that I treat a_1, a_2 and a_3 as 3 different variables? In
this approach, I lose the information that they are actually the same
variable.

Or should I write a mini lexer function to convert the SSA names into
original variable names by removing _1, _2, etc. as suffix from each?

Regarding PHI nodes, I think you mean I should treat them as identity
functions, but I am not clear exactly how in the first approach
above.

In the second, I can just treat it as a=a.


Richard means that you can treat the above code as

  if (...) {
    a_1 = 5;
    a_4 = a_1;
  } else if (...) {
    a_2 = 2;
    a_4 = a_2;
  } else {
    a_3 = 13;
    a_4 = a_3;
  }
  return a_4;

The extra code above is "copies on CFG edges".
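Continuing that view, a value-numbering pass can treat the PHI result as equal to its arguments when all incoming copies carry the same value, and as a new value otherwise; this is the usual simple treatment of PHIs in value numbering. The toy value table below (integer value numbers, constants looked up by value, no overflow checking) is an assumption for illustration and not the tree-ssa-sccvn.c interface.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VALUES 64

/* Toy value table: value number V is either the constant CONST_VAL[V]
   or an opaque value equal to nothing else.  No overflow checking.  */
static bool is_const[MAX_VALUES];
static int const_val[MAX_VALUES];
static int n_values;

/* Value number of a constant, reusing an existing number if the same
   constant was already seen.  */
static int
vn_constant (int c)
{
  for (int v = 0; v < n_values; v++)
    if (is_const[v] && const_val[v] == c)
      return v;
  is_const[n_values] = true;
  const_val[n_values] = c;
  return n_values++;
}

/* A fresh value number that equals no other.  */
static int
vn_fresh (void)
{
  is_const[n_values] = false;
  return n_values++;
}

/* PHI result = copy of ARGS[i] on incoming edge i (N >= 1).  If every
   edge copies the same value, the result has that value; otherwise it
   gets a new one.  */
static int
vn_phi (const int *args, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (args[i] != args[0])
      return vn_fresh ();
  return args[0];
}
```

For the example above, a_1, a_2 and a_3 get three different value numbers, so a_4 receives a fresh one; if all three branches assigned 5, a_4 would share the value number of the constant 5.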

Andrey





--
Kartik
http://k4rtik.wordpress.com/


Release planning for GCC 4.4?

2008-03-03 Thread Andrey Belevantsev

Hello,

As GCC 4.3 is almost out of the door, I thought I might ask 
whether there will be a release plan for GCC 4.4 similar to the ones for 
previous releases.  The reason I'm asking is that my colleagues and I 
are working on preparing the selective scheduler branch for 
inclusion in mainline.  We'll need another month for the final cleanup 
and compile time tuning of the scheduler, so we plan to submit it 
sometime in April.  I would like to avoid any possible conflicts with 
other projects (though this work is unlikely to conflict with any others I 
know of, except maybe Vlad's register allocator), and I think that the 
idea of a release plan used for previous development cycles worked quite 
nicely.


Thanks, Andrey


Re: GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor

2008-04-11 Thread Andrey Belevantsev

吴曦 wrote:

Hi:

I am working on gcc-4.3.0 and Redhat ES 4. When I use the compiler to
build the SPECint-2006 benchmarks,
none of them builds with the compiler option -msched-control-spec
(enable control speculation on IA-64).

Control speculation is disabled by default on IA-64, so I think one of 
the scheduler patches accidentally broke it.  This may be Maxim's 
rewrite of dependence lists.  We will take a look.  Meanwhile you can 
try the 4.2 series; it should work.


Andrey


Re: gcc on IA64 platform

2008-05-20 Thread Andrey Belevantsev

Hello,

Tadas V wrote:

I am a computer science student and I am currently preparing my master's
degree final work on "Compiler optimization on IA64 platforms". So
could you provide me some information about the current
situation with gcc and the IA64 platform - I mean what the open
optimization issues are and so on. After googling a while I found this
document http://gcc.gnu.org/projects/ia64.html and I would like to
know if this information is up to date. Looking forward to hearing
from you. Thank you in advance.


As you can see from the above page, it comes from the 2001 mini summit, 
so most of the projects mentioned there are already done.  Moreover, GCC 
infrastructure has been dramatically improved since then.  The current 
state can be summarized as follows:


o Alias analysis improvements mentioned on the page were done long ago. 
There are two unfinished IA-64-inspired patches concerning alias 
analysis: Alexander Monakov's improvements to Sanjiv Gupta's patch 
tracking base+offset calculations on RTL, which we didn't manage to 
submit (see http://gcc.gnu.org/ml/gcc/2007-03/msg00148.html), and the 
patch propagating alias information from Tree SSA to RTL, which produced 
too few disambiguations and should be improved by Alexander Monakov 
during this year's Google Summer of Code.


o Data prefetching is now reimplemented on trees instead of RTL.  There 
was a project by Canqun Yang on tuning the old RTL data prefetching for 
IA-64, but AFAIK it was never ported to the new implementation.


o DFA scheduler was implemented by Vladimir Makarov and checked in long 
ago.  Bundling is now performed using DFA too, see bundling () in 
config/ia64/ia64.c.


o Profile-directed block ordering and inlining is already supported AFAIK.

o Control and data speculation are supported since GCC 4.2 as a result 
of a project of ISP RAS.  The implementation was done by Maxim Kuvyrkov.


o Extended basic block scheduling is implemented and works as a second 
scheduling pass on IA-64.  Superblock formation was also implemented on 
RTL and fairly recently moved to Tree SSA by Robert Kidd.  There is also 
a treegion scheduling patch on the treegion branch, but it was never 
committed to mainline.


o Modulo scheduling was implemented by the IBM Haifa team.  It has been 
working on IA-64 since GCC 4.3 after some small bugfixes (sorry I didn't 
mention that in changes.html).  Also, there is a patch by Alexander 
Monakov that propagates data dependency information to RTL.  It wasn't 
committed because we were in stage 3 at that time, 
and I think it will be unified with the analogous aliasing patch 
mentioned above.


o We (ISP RAS) are currently preparing the selective scheduling 
implementation, also inspired by IA64, for merging into mainline.  The 
actual code is in sel-sched-branch in the SVN repository.


o Rotating registers are not yet supported.

o Link-time optimization (LTO) is ongoing work; you can take a look 
at the LTO branch.


Also, there was a meeting of the Gelato GCC group in 2006, and some 
information can be found in the minutes at 
http://gcc.gelato.org/MeetingNotes.  You can also search the mailing list 
archives for similar discussions that happened in the past.


Hope this helps,
Andrey




Re: gcc on IA64 platform

2008-05-20 Thread Andrey Belevantsev

Gerald Pfeifer wrote:

Any chance you could make a pass through that page and remove those
items that you know have already been done, or separate those that
are still open and those that have been done into two different
sections?  

Sure, I'll make a note to do this sometime during stage 2.

Andrey


Using cfglayout mode in the selective scheduler

2008-08-11 Thread Andrey Belevantsev

Hello,

Currently, the selective scheduler pass uses cfgrtl mode.  This results 
in creating extra jumps and basic blocks while changing control flow, 
especially when redirecting edges.  When this happens, we need to 
initialize the scheduler's data structures for them.  To do this, we have 
implemented control flow hooks and RTL hooks to notify the scheduler about 
all created insns/bbs.  The new RTL hooks were not considered a good idea. 
Instead, Steven and Ian suggested using cfglayout mode in the scheduler 
in such a way that we'd see all generated jumps and initialize them.


The basic idea is enabling cfglayout mode and then ensuring that insn 
stream and control flow are in sync with each other at all times.  This 
is required because e.g. on Itanium the final bundling happens right 
after scheduling, and any extra jumps emitted by cfg_layout_finalize 
will spoil the schedule.  So we need to ensure that leaving cfglayout 
mode will not create extra insns by fixing the insn stream on the fly. 
We also need to maintain the existing behavior (fixes are done while 
finalizing) for other users of cfglayout mode.


I see several options for supporting this functionality:

1. Make the required fixes inside the cfglayout hooks so that both the 
new behavior and the old behavior are supported and the user can choose 
one of them.  As we still need to see the created jumps, we need to make 
the try_redirect_by_replacing_jump and force_nonfallthru functions either 
call user-defined hooks on the new jumps or record the new jumps in a 
vector that the user can access.


2. Factor out the hooks and helpers from cfg*.c into smaller functions 
and create the alternative implementations of hooks inside the 
scheduler, which will see the new jumps.  The old behavior will be 
retained as we'll not change the original hooks.


3. Modify try_redirect_by_replacing_jump and force_nonfallthru as in #1, 
but do this in cfgrtl mode.  No changes to the cfglayout mode will be 
needed then, and it will not be used at all.
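For options #1 and #3, the "record the new jumps in a vector" variant could look roughly like the following self-contained model. It is not GCC code: `record_new_jump`, `cfg_new_jump_hook`, `jump_vec` and the integer insn uids are hypothetical stand-ins for whatever try_redirect_by_replacing_jump and force_nonfallthru would call after emitting a jump.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable vector of uids of jumps created while
   redirecting edges.  */
typedef struct
{
  int *uids;
  size_t len, cap;
} jump_vec;

/* Optional user callback, e.g. to let the scheduler initialize its
   per-insn data immediately.  */
typedef void (*new_jump_hook_fn) (int uid, void *data);

static new_jump_hook_fn cfg_new_jump_hook;
static void *cfg_new_jump_hook_data;
static jump_vec *cfg_new_jumps;   /* NULL: recording disabled */

/* Would be called by the CFG manipulation routines after emitting a
   jump; users that set neither the hook nor the vector see no change.  */
static void
record_new_jump (int uid)
{
  if (cfg_new_jump_hook)
    cfg_new_jump_hook (uid, cfg_new_jump_hook_data);

  if (cfg_new_jumps)
    {
      jump_vec *v = cfg_new_jumps;
      if (v->len == v->cap)
        {
          size_t cap = v->cap ? 2 * v->cap : 8;
          int *p = realloc (v->uids, cap * sizeof *p);
          if (!p)
            abort ();
          v->uids = p;
          v->cap = cap;
        }
      v->uids[v->len++] = uid;
    }
}
```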



Going with #1 means that the easier handling of control flow given by 
the cfglayout mode will not be used (except maybe for better support of 
current_loops data).  But it is better if we want to get rid of cfgrtl 
eventually.  Going with #2 will mean some code duplication, hopefully 
not very large thanks to factoring out the hooks and reusing their parts. 
Also, we will not need to handle all cases, e.g. splitting an 
abnormal edge.  Going with #3 means the smallest amount of changes, but 
doesn't use cfglayout at all :)


I would choose #3, but as people think that moving to cfglayout is a 
good idea in general, I will be happy to implement #1 or #2.  What do 
people think?  Are there better options I've overlooked?


Thanks, Andrey


Re: Using cfglayout mode in the selective scheduler

2008-08-12 Thread Andrey Belevantsev

Zdenek Dvorak wrote:

I am probably missing something:


The basic idea is enabling cfglayout mode and then ensuring that insn
stream and control flow are in sync with each other at all times. This
is required because e.g. on Itanium the final bundling happens right
after scheduling, and any extra jumps emitted by cfg_layout_finalize
will spoil the schedule.

what is the difference between this idea and the cfgrtl mode?

In cfgrtl mode, the functions to manipulate the cfg ensure that the
insn stream and the CFG match. For cfglayout mode you have to do it by
hand.


I must say that I do not like this idea at all.  As cfgrtl mode routines
show, this is nontrivial (and consequently, error-prone), and you would
be duplicating much of cfgrtl functionality.  I fail to see the
advantage over simply using cfgrtl mode and handling the created jump
insns (by checking the last insn of altered and newly created blocks, no
changes to cfgrtl routines or new hooks necessary),

I would prefer to do that too, as it seems a cleaner approach.  I am 
still willing to try the cfglayout idea, but if there is disagreement 
about its usefulness, maybe it's better to proceed with your suggestion 
(I have suggested something similar to Ian at 
http://gcc.gnu.org/ml/gcc-patches/2008-06/msg01738.html).  What do the 
other maintainers think?
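The approach suggested above, scanning only the last insn of altered and newly created blocks, can be modeled as below. This is a hypothetical sketch: `toy_bb`, the `altered` flag and the assumption that new blocks get indices at or above the old block count are simplifications, not the sel-sched data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified basic block summary.  */
typedef struct
{
  int last_insn_uid;
  bool last_insn_is_jump;
  bool altered;     /* set by the caller when it changed this block */
} toy_bb;

/* After a CFG change, find jumps whose data is not initialized yet by
   looking only at the last insn of blocks that were altered or created
   (index >= OLD_N_BLOCKS).  KNOWN, indexed by uid, marks insns already
   initialized.  Writes the new jump uids to OUT and returns their count.  */
static size_t
collect_new_jumps (const toy_bb *bbs, size_t n_blocks, size_t old_n_blocks,
                   const bool *known, int *out)
{
  size_t n = 0;
  for (size_t i = 0; i < n_blocks; i++)
    {
      const toy_bb *bb = &bbs[i];
      if (!bb->altered && i < old_n_blocks)
        continue;
      if (bb->last_insn_is_jump && !known[bb->last_insn_uid])
        out[n++] = bb->last_insn_uid;
    }
  return n;
}
```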


Andrey


Re: build error from trunk sources

2008-09-01 Thread Andrey Belevantsev

Janus Weil wrote:

Building trunk rev. 139857 on linux/x86_64, I get the following failure:

...
cc1: warnings being treated as errors
/home/local/jweil/gcc44/trunk/gcc/sel-sched-ir.c:946: error:
'cmp_v_in_regset_pool' defined but not used

I will commit the following as obvious once bootstrap finishes.  Sorry 
for the breakage.


Andrey


2008-09-01  Andrey Belevantsev  <[EMAIL PROTECTED]>

* sel-sched-ir.c (cmp_v_in_regset_pool): Surround with
#ifdef ENABLE_CHECKING.

Index: gcc/sel-sched-ir.c
===
*** gcc/sel-sched-ir.c  (revision 139859)
--- gcc/sel-sched-ir.c  (working copy)
*** return_regset_to_pool (regset rs)
*** 939,944 
--- 939,945 
regset_pool.v[regset_pool.n++] = rs;
  }

+ #ifdef ENABLE_CHECKING
  /* This is used as a qsort callback for sorting regset pool stacks.
 X and XX are addresses of two regsets.  They are never equal.  */
  static int
*** cmp_v_in_regset_pool (const void *x, con
*** 946,951 
--- 947,953 
  {
return *((const regset *) x) - *((const regset *) xx);
  }
+ #endif

  /*  Free the regset pool possibly checking for memory leaks.  */
  void
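As an aside on the comparator in the patch: it returns a pointer difference converted to int, which can truncate on 64-bit hosts, and subtracting pointers to unrelated objects is undefined in ISO C. A more defensive qsort callback for an array of pointers is sketched below; it is a generic C example using `void *` elements instead of `regset`, not the committed code.

```c
#include <stdint.h>
#include <stdlib.h>   /* qsort, used with this callback */

/* qsort callback ordering an array of void * by address.  Converting to
   uintptr_t avoids subtracting pointers to unrelated objects and the
   truncation of a ptrdiff_t result to int.  */
static int
cmp_pointers (const void *x, const void *xx)
{
  uintptr_t a = (uintptr_t) *(void *const *) x;
  uintptr_t b = (uintptr_t) *(void *const *) xx;

  return (a > b) - (a < b);
}
```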


Re: Optimizations for itanium

2007-06-01 Thread Andrey Belevantsev

Prasad, Kamal R wrote:

Hello,

 Can someone tell me the back-end optimizations available for itanium
(IA64)?
We (HP) may be able to contribute to this from our side.

To add to the summary Vlad already gave, you may want to take a look at 
the notes from the last meeting of the Gelato GCC Itanium group in 
Moscow available at http://gcc.gelato.org/MeetingNotes.  This is a good 
summary of what is considered the most important work for Itanium in 
GCC.  More information about the meeting is at 
http://gcc.gelato.org/MoscowMeeting.


Our team is currently focused on performance tuning and compile-time 
improvement of the new scheduler for Itanium, available on the sel-sched 
branch.  As for data dependence propagation, there will be a Google SoC 
project on it; I hope to have more data to discuss in a few weeks.


Some time ago I sent a patch to fix modulo scheduling on Itanium. 
The IBM folks considered it acceptable, and I think it will go 
in together with their other fixes, but I don't know the details. 
Also, Dmitry Zhurikhin from our team tried using resource-aware 
constraints in modulo scheduling for his MS thesis, which worked for two 
SPEC tests.  He will work on better scheduling heuristics in 
another SoC project this summer.


I would also suggest tuning the default parameters of some optimizations 
for Itanium.  For example, there was a patch from Canqun Yang that 
increased the prefetching parameters for Itanium and produced much 
better results than the default values.  However, that was for the old 
RTL prefetch pass, which has since been replaced by the tree-SSA 
implementation done by Zdenek.  Inlining parameters for Itanium could 
be more aggressive too.  This kind of work is quite simple, but requires 
a lot of machine resources, and people usually don't have access to many 
Itaniums.  (We have only two, for example.)  So maybe you can help here.


Andrey


hash_rtx and volatile subexpressions

2007-06-07 Thread Andrey Belevantsev

Hello,

I would like to use some sort of hashing to speed up searching for an 
insn in an availability set in the selective scheduler.  It seems 
natural to use hash_rtx for this purpose.  However, hash_rtx does not 
hash volatile subexpressions; it returns 0 for them.  This is 
fine by me, but together with the recursion optimization it has near 
line 2345...


  for (; i >= 0; i--)
    {
      switch (fmt[i])

...

          /* If we are about to do the last recursive call
             needed at this level, change it into iteration.
             This function  is called enough to be worth it.  */
          if (i == 0)
            {
              x = XEXP (x, i);
              goto repeat;
            }

...one does not get a hash e.g. for any conditional jump like

(set (pc)
    (if_then_else (ne (reg:BI 262 p6 [360])
            (const_int 0 [0x0]))
        (label_ref:DI 191)
        (pc)))

...because the (pc) destination is always operand 0 of the set, so the 
recursive call for it is turned into iteration by the above code.  The 
'return 0;' for the volatile (pc) then discards everything hashed so far, 
which results in a zero hash for the whole pattern.
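The effect can be reproduced outside GCC with a small C model.  This is a sketch under stated assumptions, not GCC code: the node type, toy_hash and hash_cond_jump are hypothetical, K_PC stands in for the volatile (pc), the (ne ...) condition is reduced to a bare register, and label_ref is modelled as a constant.  The last-to-first operand loop and the 'goto repeat' trick mirror the fragment above, and the return_partial flag compares the current 'return 0;' with returning the hash accumulated so far.

```c
#include <assert.h>

/* Hypothetical toy expression node; this is not GCC's rtx.  */
enum kind { K_REG, K_CONST, K_PC, K_SET, K_IF };

struct node
{
  enum kind kind;
  int value;               /* Register number or constant.  */
  int nops;                /* Number of operands used in OPS.  */
  struct node *ops[3];
};

/* Toy analogue of hash_rtx.  K_PC plays the role of a volatile
   subexpression.  If RETURN_PARTIAL is 0 we mimic 'return 0;',
   otherwise we return the hash accumulated so far.  */
static unsigned
toy_hash (const struct node *x, int *do_not_record_p, int return_partial)
{
  unsigned hash = 0;
  int i;

 repeat:
  assert (x != 0);
  hash += (unsigned) x->kind * 31u;
  switch (x->kind)
    {
    case K_PC:
      *do_not_record_p = 1;
      return return_partial ? hash : 0;
    case K_REG:
    case K_CONST:
      return hash + (unsigned) x->value;
    default:
      break;
    }

  /* Walk operands from last to first, as the GCC loop does.  */
  for (i = x->nops - 1; i >= 0; i--)
    {
      /* Turn the last recursive call into iteration; HASH keeps the
         value accumulated for the operands already visited.  */
      if (i == 0)
        {
          x = x->ops[0];
          goto repeat;
        }
      hash += toy_hash (x->ops[i], do_not_record_p, return_partial);
    }
  return hash;
}

/* Hash a toy version of the conditional jump above:
   (set (pc) (if_then_else COND (label_ref 191) (pc))).  */
static unsigned
hash_cond_jump (int return_partial, int *do_not_record_p)
{
  struct node pc_dest = { K_PC, 0, 0, { 0 } };
  struct node pc_else = { K_PC, 0, 0, { 0 } };
  struct node cond = { K_REG, 262, 0, { 0 } };
  struct node label = { K_CONST, 191, 0, { 0 } };
  struct node ite = { K_IF, 0, 3, { &cond, &label, &pc_else } };
  struct node set = { K_SET, 0, 2, { &pc_dest, &ite } };

  *do_not_record_p = 0;
  return toy_hash (&set, do_not_record_p, return_partial);
}
```

With return_partial set to 0, hash_cond_jump returns 0: the if_then_else operand is hashed first, then the destination (pc) is reached through the goto and its early 'return 0;' throws the accumulated value away.  With the partial variant the model returns 825, while *do_not_record_p is still set, so a caller that must not record the expression can still tell.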


Is this intentional, or do we want to have 'return hash;' instead of 
'return 0;' in all places where *do_not_record_p is set to 1?  Is there a 
better hash_rtx somewhere that I don't know about?
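For the availability-set search itself, a hash is typically used only as a quick filter in front of a full comparison, so a zero ("not hashable") hash costs speed but not correctness.  A minimal sketch under that assumption: the av_entry type, av_set_find and toy_expr_hash are hypothetical, strings stand in for insn patterns, and a djb2-style string hash stands in for hash_rtx.  A hash of 0 is treated as unknown, as hash_rtx returns for volatile subexpressions, and such entries always fall back to the full comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical availability-set entry: EXPR stands in for an insn
   pattern, HASH for its precomputed hash (0 means "not hashable").  */
struct av_entry
{
  const char *expr;
  unsigned hash;
};

/* Toy djb2-style string hash standing in for hash_rtx.  */
static unsigned
toy_expr_hash (const char *s)
{
  unsigned h = 5381;
  while (*s)
    h = h * 33u + (unsigned char) *s++;
  return h;
}

/* Return the index of EXPR (with hash HASH) in the N entries of SET,
   or -1.  Differing hashes prove inequality only when both are
   nonzero; otherwise fall back to the full comparison.  */
static int
av_set_find (const struct av_entry *set, size_t n, const char *expr,
             unsigned hash)
{
  size_t i;

  assert (set != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (hash != 0 && set[i].hash != 0 && set[i].hash != hash)
        continue;               /* Cheap rejection.  */
      if (strcmp (set[i].expr, expr) == 0)
        return (int) i;
    }
  return -1;
}
```

The design choice here is that a zero hash never short-circuits a comparison, which is why hash_rtx returning 0 for whole jump patterns would make every jump lookup take the slow path rather than give a wrong answer.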


Andrey



Re: IA64 optimizations..

2007-09-05 Thread Andrey Belevantsev

Hello,

Kumar Rangarajan wrote:
I am interested in understanding the limitations and optimization 
opportunities of the IA64 version of gcc.  I read about the proposed 
optimizations for the IA64 platform in the projects list on the gcc site, 
and I see that some of the requests date from around 2001.  I am 
wondering what the current state of those optimization projects is.  I 
tried reading other sites out there, but most of the information seemed 
a little dated (again from 2001) (e.g.: 
http://ia64-linux.org/compilers/gcc_wishlist.html).


Can someone please let me know the current status of those 
projects or of any other optimization wishlists?
This question was raised a few months ago -- you may find it useful to look 
at the thread starting at http://gcc.gnu.org//ml/gcc/2007-06/msg7.html.


Andrey


Re: Git and GCC

2007-12-06 Thread Andrey Belevantsev

Vincent Lefevre wrote:

It's surprising that you don't mention svk, which is based on top
of Subversion[*]. Has anyone tried? Is there any problem with it?
I must agree with Ismail's reply here.  We have used svk for our 
internal development for about two years, because it makes it easy to 
mirror gcc trunk and branch from it locally.  I would not 
complain about its speed, but sometimes we had problems merging from 
trunk: we ended up with, e.g., zero-sized files in our branch for files 
that had been removed from trunk, or we couldn't merge at all, and I had to 
resort to the underlying Subversion repository for merging.  As a result, 
we're currently migrating to Mercurial.


Andrey