Inconsistent next_bb info when EXIT is a successor
Hello,

During my work on the selective scheduler I have triggered an assert in our code saying that a fall-through edge should have e->src->next_bb == e->dest. This was for a bb with EXIT_BLOCK as its fall-through successor, but its next_bb pointing to another block.

I was wondering why verify_flow_info didn't catch this issue. The code starting at cfgrtl.c:1973 (in the January 03 trunk) does check this, but only when e->src != ENTRY_BLOCK_PTR && e->dest != EXIT_BLOCK_PTR. I have tried to reorganize the check so that the "e->src->next_bb == e->dest" condition is checked for all edges (see the patch below). Of course, GCC does not bootstrap with this patch, triggering an assert of incorrect fallthru block in cfg_layout_finalize, after RTL loop optimizations. In my case, combine has broken that condition.

Does this ring any bells to anybody? Is this a bug, or should this condition not be checked for edges pointing to the exit block at all?

Thanks,
Andrey

--- cfgrtl.c (revision 24203)
+++ cfgrtl.c (local)
@@ -1969,8 +1969,7 @@ rtl_verify_flow_info (void)
                 break;
               }
           }
-        else if (e->src != ENTRY_BLOCK_PTR
-                 && e->dest != EXIT_BLOCK_PTR)
+        else
           {
             rtx insn;
 
@@ -1981,7 +1980,9 @@ rtl_verify_flow_info (void)
                        e->src->index, e->dest->index);
                 err = 1;
               }
-            else
+
+            if (e->src != ENTRY_BLOCK_PTR
+                && e->dest != EXIT_BLOCK_PTR)
               for (insn = NEXT_INSN (BB_END (e->src)); insn != BB_HEAD (e->dest);
                    insn = NEXT_INSN (insn))
                 if (BARRIER_P (insn) || INSN_P (insn))
Re: Inconsistent next_bb info when EXIT is a successor
Steven Bosscher wrote:
> No. The condition you're checking is simply not true in cfglayout mode. The whole point of cfglayout mode is to get rid of the requirement that basic blocks are serial. That means a fallthru edge in cfglayout mode doesn't have to go to next_bb. It can go to *any* bb.

Yes, but I'm not in cfglayout mode, because I'm either in sched1 or sched2. In that case, should this condition be preserved or not?

Andrey
Re: Inconsistent next_bb info when EXIT is a successor
Steven Bosscher wrote: > I don't understand this. You're saying there is a fallthrough edge > from your e->src to EXIT_BLOCK. This case is explicitly allowed by > the checking code. It is an exception from the rule: For a fallthrough > edge to EXIT, e->src->next_bb != e->dest is OK. Thanks! It's the answer I was looking for -- this case is a known exception, so I shouldn't worry. Given that ... > It is hard to tell without more context what your problem is. That > assert, is it an assert in your own code? Maybe it is too strict? ... yes, the assert was too strict. I've fixed that so we don't rely on next_bb in these cases. Andrey
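The rule as Steven states it can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: struct bb, struct edge, the is_entry/is_exit flags and fallthru_edge_ok are hypothetical stand-ins invented for this note, not GCC's basic_block and edge types or any function in cfgrtl.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's basic_block and edge;
   the real structures carry much more state.  */
struct bb {
    int index;
    struct bb *next_bb;   /* next block in the insn-stream order */
    bool is_entry;        /* plays the role of ENTRY_BLOCK_PTR */
    bool is_exit;         /* plays the role of EXIT_BLOCK_PTR */
};

struct edge {
    struct bb *src;
    struct bb *dest;
    bool fallthru;
};

/* Return true if the edge satisfies the layout rule discussed above:
   outside cfglayout mode a fallthru edge must lead to src->next_bb,
   except when it leaves ENTRY or enters EXIT.  */
static bool fallthru_edge_ok(const struct edge *e)
{
    if (!e->fallthru)
        return true;                  /* the rule covers fallthru edges only */
    if (e->src->is_entry || e->dest->is_exit)
        return true;                  /* the known exception */
    return e->src->next_bb == e->dest;
}
```

With this shape, an assert written as "every fallthru edge has src->next_bb == dest" is stricter than the verifier, which is the situation described in this thread.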
Re: Improvements of the haifa scheduler
Vladimir N. Makarov wrote:
> Good aliasing is very important for the scheduler. But I'd look at this more wider. We need a good aliasing for many RTL optimizations. What's happened to ISP RAS aliasing patch propagating SSA info to RTL? Why is it stalled?

We plan to work on it further in the near future. The initial plan was to update it to trunk, then to gather the new numbers of disambiguations and so on, then to check the consistency of the saved information via some verifier.

> As for Sanjiv Gupta's aliasing work, that was interesting but as I remember the patch made compiler too slow (like 40% slower). You should make this approach faster to make it accepted as used by default.

Alexander Monakov was working on it last year (CC'd). I have looked through his paper to recall what was done:

- instead of iterating over the cfg, a single pass in topological order is made to calculate address descriptors (which is enough for using this info in the scheduler);
- instead of one per-function hashtable for all address descriptors, separate per-bb hashtables were introduced, lowering the time needed to access the hashtables;
- instead of saving the out-lists of descriptors for each bb, the in-lists are saved (and not recalculated several times);
- descriptors are saved for each mem instead of each bb. Earlier, when an aliasing query was made, we searched the hashtable for the insns corresponding to the mems, then reanalyzed the basic block up to those insns, then answered the query using the calculated address descriptors. After the fix, we just get the final descriptor from the first hashtable and then answer the query.

After all the fixes, bootstrap and cc1-i-files compiled 2% slower. The compiler built with the patch enabled compiled tramp3d 0.5% faster and produced 0.6% faster code.

We'll dig out this patch together with the rest of the aliasing patches and send it as an RFC. It is my mistake for not doing this earlier.
> If you need benchmarking for machines (like ppc) you have no access to, I can provide the benchmarking.

That's great, because we have access to powerpc750 only. I have used it to try the scheduler on ppc, but that was slow. I really appreciate it.

> Maybe if you or ISP RAS could find students (e.g. from Moscow University) to do this as Google Summer of Code, it could help you. I think it is not too late. You should ask Ian Taylor or Daniel Berlin, if you want to do this.

We'll work on aliasing anyway (see above). Three students are working with us, but they are busy with different projects. I'll ask my advisor about it.

Andrey
Re: anyone using svk?
Rafael Espindola wrote: Is anyone using svk? I tried to create a local depot by updating the one pointed on the wiki. Unfortunately it is trying to use too much ram and crashing. Yes, we keep a local mirror of trunk and sel-sched branch using svk. As far as I remember, I did the setup from scratch, starting from some revision (probably, a branch point). We used svk 1.05 and now are using 1.08. Andrey
Re: Scheduler questions (related to PR17808)
Vladimir Makarov wrote:
> I'll look at this PR today.

We've looked at this issue today. We think the problem is that the proposed patch to sched_get_condition() treats conditional jumps like COND_EXECs, but it doesn't fix the other places in sched-deps where COND_EXECs are considered. Maxim Kuvyrkov proposed the attached patch, which allows gcc to bootstrap on ia64 and fixes the testcase in the PR.

We've also found that current mainline ICEs when compiling the testcase with "-O0 -fschedule-insns -fschedule-insns2". That is because after reload several pseudos still remain in the global_live_at_start sets. The pseudos then appear in regsets through compute_jump_reg_dependencies(), and sched-deps segfaults in the EXECUTE_IF_SET_IN_REG_SET loop at sched-deps.c:948. We don't know reload well enough to know for sure which place should be fixed in reload, or maybe in update_life_info(). Is this issue worth opening another PR?

Andrey

--- gcc/gcc/sched-deps.c Sun Jun 19 16:37:49 2005
+++ orig/gcc/sched-deps.c Thu Jun 30 18:00:23 2005
@@ -149,7 +149,7 @@
     return 0;
 
   src = SET_SRC (pc_set (insn));
-#if 0
+#if 1
   /* The previous code here was completely invalid and could never extract
      the condition from a jump.  This code does the correct thing, but that
      triggers latent bugs later in the scheduler on ports with conditional
@@ -1019,7 +1019,8 @@
         {
           /* In the case of barrier the most added dependencies are not
              real, so we use anti-dependence here.  */
-          if (GET_CODE (PATTERN (insn)) == COND_EXEC)
+          /* if (GET_CODE (PATTERN (insn)) == COND_EXEC) */
+          if (sched_get_condition (insn))
             {
               EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i, rsi)
                 {
@@ -1066,7 +1067,8 @@
         {
           /* If the current insn is conditional, we can't free any
              of the lists.  */
-          if (GET_CODE (PATTERN (insn)) == COND_EXEC)
+          /* if (GET_CODE (PATTERN (insn)) == COND_EXEC) */
+          if (sched_get_condition (insn))
             {
               EXECUTE_IF_SET_IN_REG_SET (reg_pending_uses, 0, i, rsi)
                 {
Re: Needs advises on rotating register allocation for IA64 in GCC
Steven Bosscher wrote: Hmm, I've never seen any discussions about this on [EMAIL PROTECTED] Could you give some links to messages in the mailing list archives that you may have found? I've seen only the thread mentioning the work of Ritu Sabharwal (http://gcc.gnu.org/ml/gcc/2002-12/msg00508.html), and then questions of Canqun Yang and Feng Wang (http://gcc.gnu.org/ml/gcc/2003-09/msg00924.html and http://gcc.gnu.org/ml/gcc/2004-10/msg01193.html, respectively). Maybe I've missed something. Andrey
[GCC 4.2 Project] Support for IA-64 speculation
Hello,

I work on GCC for the Institute for System Programming in Russia. Below is a brief summary of the project aiming at adding support for ia64 speculation to the GCC instruction scheduler. I presented the project at the last GCC Summit. This description doesn't have any implementation details, but rather refers to the summit paper. If needed, I'd be happy to provide a longer summary on the wiki page.

I'll not be able to respond to your comments until next Tuesday.

Regards,
Andrey

---

Support for IA-64 speculation

Speculation is one of the features of the IA-64 architecture aimed at exposing instruction-level parallelism. Using speculation allows a compiler to overcome dependencies by moving a load through an ambiguous store or across a branch (with data and control speculation, respectively). This technique helps to hide the latency of memory loads and reduce the execution time.

The patch adds support for both data and control speculation to the GCC instruction scheduler. Implementation issues of the patch are described in the paper 'Improving GCC instruction scheduler for IA-64', which can be found in the proceedings of GCC Summit 2005.

Personnel

Maxim Kuvyrkov, Andrey Belevantsev (Institute for System Programming, Russian Academy of Sciences)

Delivery Date

This project will be ready during the first stage of GCC 4.2.

Benefits

The patch improves SPEC FP by 2% (with -O2, as of May 2005). Aggressive inlining and loop unrolling help the patch to produce better results on SPEC INT. Detailed results are given in the abovementioned paper.

Dependencies

None.

Modifications Required

Target-independent parts of the patch modify the scheduler source files. A new flag and params are added for enabling and controlling speculation support. The target-dependent part includes new speculative instructions and pipeline descriptions in the ia64 backend, and ia64.c changes.
Re: Where are the fortran test results for cvs trunk?
Christian Joensson wrote: So, I just wonder what's going wrong here... Could it be the problem explained in http://gcc.gnu.org/ml/gcc-patches/2005-09/msg00872.html? The patch is available later in that thread: http://gcc.gnu.org/ml/gcc-patches/2005-09/msg00879.html Andrey
Bootstrap problems on ia64
Hi,

When bootstrapping rev. 109012 on ia64-linux (checked out around 9am GMT today), I get

make[3]: Entering directory `/mnt/sda5/bonzo/obj-trunk/stage2-libdecnumber'
source='../../trunk/libdecnumber/decNumber.c' object='decNumber.o' libtool=no /home/bonzo/local/obj-trunk/./prev-gcc/xgcc -B/home/bonzo/local/obj-trunk/./prev-gcc/ -B/mnt/sda5/bonzo/obj-trunk//ia64-unknown-linux-gnu/bin/ -I../../trunk/libdecnumber -I. -g -O2 -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute -pedantic -Wno-long-long -Werror -I../../trunk/libdecnumber -I. -c ../../trunk/libdecnumber/decNumber.c
cc1: warnings being treated as errors
../../trunk/libdecnumber/decNumber.c: In function 'decToString':
../../trunk/libdecnumber/decNumber.c:2013: warning: value computed is not used
make[3]: *** [decNumber.o] Error 1

Looking at the source, the warning seems to be spurious:

static void
decToString (decNumber * dn, char *string, Flag eng)
{
<...>
  char *c = string;             /* work [output pointer] */
<...>
  if (dn->bits & DECSPECIAL)
    {                           /* Is a special value */
      if (decNumberIsInfinite (dn))
        {
          strcpy (c, "Infinity");       <--- here
          return;
        }

And when specifying --disable-werror, I get a comparison failure:

Bootstrap comparison failure!
./gcc.o differs

Is this known, or did I just break my installation? GCC was configured just with --prefix, and I used 3.4.3 for stage1.

Andrey
Re: Remove sel-sched?
Hello Bernd,

On 13.01.2016 21:25, Bernd Schmidt wrote:
> There are a few open PRs involving sel-sched, and I'd like to start a discussion about removing it. Having two separate schedulers isn't a very good idea in the first place IMO, and since ia64 is dead, sel-sched gets practically no testing despite being the more complex one.
>
> Thoughts?

Out of the PRs we have, two are actually fixed but not marked as such. This year's PRs are from Zdenek's recent Debian rebuild with GCC 6 and I will be on them now. For the other two PRs from last year, it is my fault for not fixing them in a timely manner. Frankly, 2015 was very tough for me and my colleagues (we worked 6 days a week for most of the year), but since January it is fine again and we'll catch up now. Sorry for that.

You're also right that sel-sched now gets limited testing. We made it work initially for ia64, x64, ppc and cell, and then added ARM, too. Outside of the ia64 world, I had private reports of sel-sched being used for cell with success, and we used it in our own contractor work for optimizing some ARM apps with GCC.

In short, we're willing to maintain sel-sched and I apologize for the slow PR fixing speed last year; it should no longer be a problem. If there are any big plans for reorganizing the schedulers and sel-sched stands in the way of those, let's discuss it and we'll be willing to help in any way.

Andrey

> Bernd
Re: Remove sel-sched?
On 14.01.2016 20:26, Jeff Law wrote:
> On 01/14/2016 12:07 AM, Andrey Belevantsev wrote:
>> Hello Bernd,
>>
>> On 13.01.2016 21:25, Bernd Schmidt wrote:
>>> There are a few open PRs involving sel-sched, and I'd like to start a discussion about removing it. Having two separate schedulers isn't a very good idea in the first place IMO, and since ia64 is dead, sel-sched gets practically no testing despite being the more complex one.
>>>
>>> Thoughts?
>>
>> Out of the PRs we have, two are actually fixed but not marked as such. This year's PRs are from the recent Zdenek's Debian rebuild with GCC 6 and I will be on them now. For the other two last year PRs, it is my fault not to fix them in a timely manner. Frankly, 2015 was very tough for me and my colleagues (we worked 6 days a week most part of the year), but since January it is fine again and we'll catch up now. Sorry for that.
>>
>> You're also right that sel-sched now gets limited testing. We're made it work initially for ia64, x64, ppc and cell, and then added ARM, too. Outside of ia64 world, I had private reports of sel-sched being used for cell with success, and we used it in our own contractor work for optimizing some ARM apps with GCC.
>>
>> In short, we're willing to maintain sel-sched and I apologize for the slow PR fixing speed last year, it should be no problem anymore as of now. If there are any big plans of reorganizing schedulers and sel-sched stands in the way of those, let's discuss it and we'll be willing to help in any way.
>
> FWIW, I've downgraded the sel-sched stuff to P4 for this release given how that scheduler is typically used (ia64, which is a dead platform).
>
> I think the bigger question Bernd is asking here is whether or not it makes sense to have multiple schedulers. In an ideal world we'd bake them off select the best and deprecate/remove the others.
> I didn't follow sel-sched development closely, so forgive me if the questions are simplistic/naive, but what are the main benefits of sel-sched and is it at a point (performance-wise) where it could conceivably replace the aging haifa scheduler infrastructure?

The main sel-sched points at the time of its inclusion were as follows: bookkeeping code support (move an insn between any blocks in the scheduling region), insn transformation support (renaming, unification, substitution through register copies), scheduling at several points at once, and pipelining support. Together it paid off with something like 7-8% on SPEC at the time on ia64, but not so on the other archs, where we didn't spend much time on tuning and usually got both ups and downs compared to haifa. On ia64 the speedup was mostly because of pipelining with speculation, as far as I recall; for the others, including ARM, renaming and substitution were useful.

Since then, Vlad and Bernd have put more improvements into the haifa scheduler, including sched pressure, predication and backtracking, so both schedulers now have features not present in the other one and the initial feature advantage has somewhat worn off. Also, the big problem of sel-sched is speed -- it is slow because the dependency lists are not maintained through the scheduler, and most of the transformation machinery is implemented by moving an insn up the region and looking at what has to happen to allow insn A to move up through insn B. I've done most of what I could imagine to speed it up but haven't managed to make sel-sched the default at -O2.

So to sum this up, I don't think sel-sched can replace haifa in its current state. These days, to speed up the scheduler I'd add something like path-based dependency tracking with bit vectors, as is done in Intel's wavefront scheduling, though it is patented (Vlad may correct me here). Or we need to devise other means of keeping dependencies up to date. We've tried that but never got it working well enough.
The thing I would not like to lose is sel-sched pipelining. It can work on any loops, not only countable ones like modulo scheduling, and this can make a difference for some apps even outside of ia64. But if one basic scheduler is desired, maybe the better use of our resources would be to improve modulo scheduling instead, so as not to lose pipelining capabilities in gcc. It is completely unmaintained now; my colleague Roman Zhuykov had a couple of improvements ~4 years ago, but most of them never got into trunk due to lack of review. He can step up as a modulo-sched maintainer if needed; the code is alive (see PR69252).

Sorry for the long mail :)

Andrey

> Jeff
Re: broken links?
On 30.06.2014 7:22, Hebenstreit, Michael wrote: I tested from home to reach https://gcc.gnu.org/pub/gcc/infrastructure/ - same result; ftp://gcc.gnu.org/pub/gcc/infrastructure/ works though. Trying ftp from behind the company FW on FF redirects me to the htpps, though on IE it works There was a number of bug reports to Firefox about unexpected changes in the URL protocol when using autocomplete, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=769348 and https://bugzilla.mozilla.org/show_bug.cgi?id=769994. These are marked as fixed but still get occasional comments, maybe you can describe your situation there. Best, Andrey Regards Michael -Original Message- From: Ingwie Phoenix [mailto:ingwie2...@googlemail.com] Sent: Sunday, June 29, 2014 7:53 PM To: Hebenstreit, Michael Cc: Gerald Pfeifer; Jonathan Wakely; gcc@gcc.gnu.org Subject: Re: broken links? Am 30.06.2014 um 01:43 schrieb Hebenstreit, Michael : Could our firewall (plus proxy) be the reason? I still get "page no found" for both Firefox 25 and IE Michael Does the proxy have a cache? Some proxys have something similar to a browser cache - in fact, more compareable with what CloudFlare does. It probably saved a copy of the URL's result and deals it out upon request. So you might wish to check this out too.
Re: Selective scheduling and its usage
Hi Martin, On 21.03.2018 12:48, Martin Liška wrote: > Hello. > > I noticed there are quite many selective scheduling PRs: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872 > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842 > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659 > > and many others. > > I want to ask you if you plan to maintain the scheduling? Yes. The current status is that I have patches for 83530, 83962, 83913, 83480, 83972, 80463. I don't have patches for any of the 84* issues. I'm planning to submit the patches for the former set and to look at the later set next week. I usually do most of the work by myself after internal discussions with Alexander and other colleagues here, and there might be delays when I get busy with unrelated stuff. However, if there's a pressing need, we have enough knowledgeable people to fix any sel-sched PR within a week or so. > Is it enabled by default for any target we support? Yes, ia64 at -O3. The testing we make usually is like follows: bootstrap and test on ia64, bootstrap with sel-sched enabled on x86-64, and make any new tests from PRs be run on x86-64, ia64, and ppc. This way I'm confident that it mostly works on that platforms. > Should we deprecate it for GCC 8? No, I don't think so. Best, Andrey > > Thank you, > Martin
Re: Selective scheduling and its usage
On 21.03.2018 13:31, Martin Liška wrote: > On 03/21/2018 11:17 AM, Andrey Belevantsev wrote: >> Hi Martin, >> >> On 21.03.2018 12:48, Martin Liška wrote: >>> Hello. >>> >>> I noticed there are quite many selective scheduling PRs: >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872 >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842 >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659 >>> >>> and many others. >>> >>> I want to ask you if you plan to maintain the scheduling? >> >> Yes. The current status is that I have patches for 83530, 83962, 83913, >> 83480, 83972, 80463. I don't have patches for any of the 84* issues. >> I'm planning to submit the patches for the former set and to look at the >> later set next week. > > Nice! > > Maybe we can create a meta bug to track all sel. scheduling issue. > May I create it? Yes, of course. I would be happy because I can easily lose track (I have some queries in Bugzilla about scheduling but I don't monitor whole gcc-bugs traffic). In fact, I wasn't aware of some PR from your list until your mail. Best, Andrey > >> >> I usually do most of the work by myself after internal discussions with >> Alexander and other colleagues here, and there might be delays when I get >> busy with unrelated stuff. However, if there's a pressing need, we have >> enough knowledgeable people to fix any sel-sched PR within a week or so. >> >>> Is it enabled by default for any target we support? >> >> Yes, ia64 at -O3. The testing we make usually is like follows: bootstrap >> and test on ia64, bootstrap with sel-sched enabled on x86-64, and make any >> new tests from PRs be run on x86-64, ia64, and ppc. This way I'm confident >> that it mostly works on that platforms. > > Great. > >> >>> Should we deprecate it for GCC 8? >> >> No, I don't think so. > > Works for me. > > Thanks for clarification. > Martin > >> >> Best, >> Andrey >> >>> >>> Thank you, >>> Martin >> >
Re: Problems with selective scheduling
Hi Markus, Markus L wrote: Thank you very much for your detailed response! I suspect your machine description says that dependency between loads and multiply-add has zero latency, thus allowing the scheduler to place them into one instruction group. Grep for various comments about tick_check_p function. In verbose scheduler dumps, there should be something like Expr 35 is not ready yet until cycle 2 No best expr found! Finished a cycle. Current cycle = 2 At a glance when compiling without the -fsel-sched-pipelining flag (but with -fselective-scheduling2) proper VLIW grouping is performed so I guess the dependency is not zero latency but I will try to investigate the details. Increasing verbosity and comparing dumps to ia64 will probably be helpful. You are welcome to email me or Alexander if you'd need any help with debugging this. On the high level, yes. In this particular example, pipelining of loads would not be possible for the following reasons: 1) speculative motion of loads with pre/post-increment is not implemented (ia64 backend disables auto-inc generation pass when sel-sched is enabled); Is there a fundamental problem with pre/post-increment support in the selective scheduling approach or is this something that might be implemented in the future? There is no problem, we just didn't have time for this within the frames of that project. We are willing to tackle this within our spare time during next stage1, if we would have any. 2) when pipelining loads, scheduler needs to transform them into control-speculative form (since loop epilogue is not generated, load on the very last iteration of the transformed loop may access unallocated memory). In other words, selective scheduler does not preserve number of instruction executions (pipelined instructions from original loop will be executed more times than number of loop iterations). Speculative loads are not supported by any mainline GCC target except ia64. 
> On my target it is always safe to perform loads so I suppose I could pretend to support speculative loads in order to get around this.

That's good. You can check that VINSN_MAY_TRAP_P is false for your loads (it is set from haifa_classify_insn which in turn uses may_trap_exp). If this is the case, then you should be safe.

Yours,
Andrey

> /Markus
Re: Is there any plan for "data propagation from Tree SSA to RTL" to be in GCC mainline?
Diego Novillo wrote: On Sun, Nov 9, 2008 at 06:38, Steven Bosscher <[EMAIL PROTECTED]> wrote: Wasn't there a GSoC project for this last year? And this year? It'd be interesting to hear if anything came out of that... Nothing came of that, unfortunately. There are two patches, actually. The patch of propagating data dependences to RTL is ready and working, it wasn't (at that time) committed just because it was initially completed during stage3. The patch for propagating alias info wasn't finished within the scope of this year's GSoC, unfortunately, and I take it more as my fault than a student's fault, as I failed to help him locally with organizing his work. We are nevertheless trying to put some work into finishing this patch. As it is not completed yet, I don't have a subject to discuss. I hope that before the next stage1 we'll manage to finish the patches and to unify them before submitting, as the mechanism they use for mapping MEMs to trees is the same. If we'd not finish the second patch, we'll submit the first one anyways. Sorry for not writing this earlier -- I've had a few busy months (mostly finishing and defending ph.d. thesis :) Andrey
Re: Is there any plan for "data propagation from Tree SSA to RTL" to be in GCC mainline?
Bingfeng Mei wrote: I found the the GsoC project and patch here (only 2007) http://code.google.com/soc/2007/gcc/appinfo.html?csaid=E0FEBB869A5F65A8 Is this patch only for propagating data dependency or does it include propagating alias info as well? The patch at http://gcc.gnu.org/ml/gcc/2007-12/msg00240.html (I presume this is the same patch, I'm just giving you the link to its submission to the GCC ML) only does propagating data dependency info. Andrey
Re: GCC & OpenCL ?
Hello, We at ISP RAS have plans to work in the near future on generating either CUDA source or PTX from C programs (probably with simple OpenMP directives). Of course, we would benefit from the OpenCL infrastructure in GCC if one was available. Mark Mitchell wrote: We (CodeSourcery) have been talking to our commercial partners about implementing OpenCL in GCC and trying to develop/assess the level of interest. As others have stated, our theory of operation here would be to have the compiler depend on a library API that could be implemented for GPUs or for ordinary multi-core CPUs. (Just as the libgomp API could be provided to the compiler without depending on POSIX threads.) In the case of generating code for GPUs, I am wondering whether the backend that produces device assembly for the kernel code will be (in your theory) implemented inside GCC, or would that be a third-party tool? Obviously, a library is not enough for a heterogeneous system, or am I missing anything from your description? As I know, e.g. there is no device-independent bytecode in the OpenCL standard which such a backend could generate. Andrey
Re: Fixing the pre-pass scheduler on x86 (Bug 38403)
Vladimir Makarov wrote: Steven Bosscher wrote: On Wed, Apr 8, 2009 at 5:19 AM, Vladimir Makarov wrote: I've been working on register-pressure sensitive insn scheduling last two months and I hope to submit this work for gcc4.5. I am implementing also a mode in insn-scheduler to do only live range shrinkage. Is all of this still necessary if the selective scheduler (with register renaming) is made to work on i686/x86_64 after reload? That is a really interesting question, Steven. I thought about this for a few months (since last fall). Here is my points to result me in starting my work on haifa-scheduler: 1. Selective scheduler works only for Itanium now. As I know there are some plans to make it working on PPC, but there are no plans to make it working for other acrhictectures for now. There are some patches for fixing sel-sched on PPC that I need to ping, thanks for reminding me :) They were too late for regression-only mode, and we didn't get access to modern PPCs as we hoped, so this was not an issue earlier. Btw, when we've submitted the scheduler, it did work on x86-64 (compile farm), I don't know whether this is still the case. 2. My understanding is that (register-pressure sensitive insn scheduling + RA + 2nd insn scheduling) is not equal to (RA + selective scheduling with register renaming) with the point of the potential performance results. In first case, the pressure-sensitive insn scheduler could improve RA by live-range shrinkage. In the 2nd case it is already impossible. It could be improved by the 2nd RA but RA is more time consuming than scheduling now. In general this chicken-egg problem could be solved by iterating the 2 contradictory passes insn scheduling + RA (or RA + insns scheduling). It is a matter of practicality when to stop these iterations. 3. My current understanding is that selective scheduler is overkill for architectures with few registers. In other words, I don't think it will give a better performance for such architectures. 
On the other hand, it is much slower than haifa-scheduler because of its complexity. Haifa-scheduler before RA already adds 3-5% compile time, selective scheduler will add 5-7% more compile time without performance improvement for mainstream architectures x86/x86_64. I think it is intolerable. We still plan to do some speedups to the sel-sched code within 4.5 timeframe, mainly to the dependence handling code. But even after that, I agree that for out-of-order architectures selective scheduler will probably be an overkill. Besides, register renaming itself would remain an expensive operation, because it needs to scan all paths along which an insn was moved to prove that the new register can be used, though we have invested a lot of time for speeding up this process via various caching mechanisms. On ia64, register renaming is useful for pipelining loops. Also, we have tried to limit register renaming for the 1st pass selective scheduler via tracking register pressure and having a cutoff for that, but it didn't work out very well on ia64, so I agree that much more of RA knowledge should be brought in for this task. Hope this helps. Vlad, Steven, thanks for caring. Andrey
Re: Questions about selective scheduler and PowerPC
Hi Jie, On 18.10.2010 10:49, Jie Zhang wrote: When this error happens, FENCE_ISSUED_INSNS (fence) is 2 and issue_rate is 1. PowerPC 8540 is capable to issue 2 instructions in one cycle, but rs6000_issue_rate lies to scheduler that it can only issue 1 instruction before register relocation is done. See the following code: See PR 45352. I've tried to fix this in the selective scheduler by modeling the lying behavior in line with the haifa scheduler. Let me know if the last patch from the PR audit trail doesn't work for you. In addition, after the above patch goes in, I can make the selective scheduler not try to jump through the hoops with putting correct sched cycles on insns for targets which don't need it in their target_finish hook. I guess powerpc needs this though, but x86-64 (for which PR 45342 was opened) almost surely does not. Thanks, Andrey
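The mismatch behind the failing assert can be illustrated with a small C model. This is only a sketch under stated assumptions: reported_issue_rate and issue_count_consistent are hypothetical names made up for this note, the hardware rate of 2 is taken from the PowerPC 8540 description above, and none of this is the actual rs6000 hook or the scheduler's code.

```c
#include <stdbool.h>

/* Hypothetical model of a target hook that under-reports its issue rate
   before register allocation, as described for rs6000_issue_rate above.  */
static int reported_issue_rate(bool after_regalloc)
{
    const int hw_issue_rate = 2;   /* what the CPU can really issue per cycle */
    return after_regalloc ? hw_issue_rate : 1;
}

/* The kind of consistency check that fails: the number of insns issued
   on the current cycle must not exceed the rate the target reported.  */
static bool issue_count_consistent(int issued_this_cycle, bool after_regalloc)
{
    return issued_this_cycle <= reported_issue_rate(after_regalloc);
}
```

In this model, two insns issued on one cycle before register allocation fail the check even though the pipeline description permits them, which matches the FENCE_ISSUED_INSNS (fence) == 2 versus issue_rate == 1 situation reported here.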
Re: Questions about selective scheduler and PowerPC
On 18.10.2010 11:31, Jie Zhang wrote: Hi Andrey, On 10/18/2010 03:13 PM, Andrey Belevantsev wrote: Hi Jie, On 18.10.2010 10:49, Jie Zhang wrote: When this error happens, FENCE_ISSUED_INSNS (fence) is 2 and issue_rate is 1. PowerPC 8540 is capable to issue 2 instructions in one cycle, but rs6000_issue_rate lies to scheduler that it can only issue 1 instruction before register relocation is done. See the following code: See PR 45352. I've tried to fix this in the selective scheduler by modeling the lying behavior in line with the haifa scheduler. Let me know if the last patch from the PR audit trail doesn't work for you. In addition, after the above patch goes in, I can make the selective scheduler not try to jump through the hoops with putting correct sched cycles on insns for targets which don't need it in their target_finish hook. I guess powerpc needs this though, but x86-64 (for which PR 45342 was opened) almost surely does not. Thanks for your reply. I just tried. That patch does not help for this issue. I see, I didn't touch the failing assert with the patch. Can you just remove the assert and see if that helps for you? I cannot think of how it can be relaxed and still be useful. Andrey
Re: Questions about selective scheduler and PowerPC
On 19.10.2010 17:57, Jie Zhang wrote: Removing the failing assert fixes the test case. But I wonder why not just get max_issue correct. I'm testing the attached patch. IMHO, max_issue looks confusing.
* The concept of ISSUE POINT has never been used since the code landed in the repository.
* In the comment just before the function, it's mentioned that MAX_POINTS is the sum of points of all instructions in READY. But it does not match the code. The code only sums the points of the first MORE_ISSUE instructions. If later ISSUE_POINTS become not uniform, that piece of code should be redesigned. So I think it's good to remove it now. And "top - choice_stack" is a good replacement for top->n. So we can remove field n from struct choice_entry, too.
Now I'm looking at the MIPS target to find out why this change would cause PR37360.
I agree that ISSUE_POINTS can be removed, as it was not used (maybe Maxim can comment more on this). However, the assert is not about the points but exactly about the situation when a target is lying to the compiler about its issue rate. The ideal situation is that we agree that this should never happen, but then you need to fix all targets that use this trick, and it seems that there is at least mips, ppc, and x86-64 (which is why I pointed you to 45352). The fix would be to find out why claiming the true issue rate degrades performance and to implement the proper scheduling hooks for changing priority of some insns, or to enable -fsched-pressure for the offending targets. This is a lot of work, which is why this assert was installed in max_issue for a relatively short amount of time. Maybe it's time to try again, but let's have a consensus first that this assert should never trigger by design and we have enough flexibility in the scheduler to provide legal means to achieve the same performance effect. Andrey
   /* ??? We used to assert here that we never issue more insns than issue_rate.
      However, some targets (e.g. MIPS/SB1) claim lower issue rate than can be
      achieved to get better performance.  Until these targets are fixed to use
      scheduler hooks to manipulate insns priority instead, the assert should
-     be disabled.
-
-     gcc_assert (more_issue >= 0);  */
+     be disabled.  */
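Editorial note: the remark that "top - choice_stack" can replace top->n relies on ordinary C pointer arithmetic: the difference of two pointers into the same array is the number of elements between them. The following is only a minimal sketch of that idea. The struct is a simplified stand-in (not the real struct choice_entry from haifa-sched.c), and stack_depth is a hypothetical helper introduced for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for the scheduler's choice stack entry; the real
   struct choice_entry in GCC has different fields.  */
struct choice_entry
{
  int index;  /* Illustrative payload only.  */
  int rest;   /* Illustrative payload only.  */
};

/* Hypothetical helper: number of entries between the stack base and TOP.
   Both pointers must point into the same array, so the difference is
   exactly the depth that a separate counter field would have to store.  */
ptrdiff_t
stack_depth (const struct choice_entry *choice_stack,
             const struct choice_entry *top)
{
  return top - choice_stack;
}
```

With an array "struct choice_entry stack[4]", stack_depth (stack, stack + 3) evaluates to 3, so a per-entry counter would be redundant under this assumption.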
Re: software pipelining
Hi, On 10.11.2010 12:32, roy rosen wrote: Hi, I was wondering if gcc has software pipelining. I saw options -fsel-sched-pipelining -fselective-scheduling -fselective-scheduling2 but I don't see any pipelining happening (tried with ia64). Is there a gcc VLIW port in which I can see it working?
You need to try -fmodulo-sched. Selective scheduling works by default on ia64 with -O3, otherwise you need -fselective-scheduling2 -fsel-sched-pipelining. Note that selective scheduling disables autoinc generation for the pipelining to work, and modulo scheduling will likely refuse to pipeline a loop with autoincs. Modulo scheduling implementation in GCC may be improved, but that's a different topic. Andrey
For an example function like
    int nor(char* __restrict__ c, char* __restrict__ d)
    {
      int i, sum = 0;
      for (i = 0; i < 256; i++)
        d[i] = c[i] << 3;
      return sum;
    }
with no pipelining a code like
        r1 = 0
        r2 = c
        r3 = d
    _startloop
        if r1 == 256 jmp _end
        r4 = [r2]+
        r4 >>= r4
        [r3]+ = r4
        r1++
        jmp _startloop
    _end
here inside the loop there is a data dependency between all 3 insns (only the r1++ is independent) which does not permit any parallelism. With pipelining I expect a code like
        r1 = 2
        r2 = c
        r3 = d
        // peel first iteration
        r4 = [r2]+
        r4 >>= r4
        r5 = [r2]+
    _startloop
        if r1 == 256 jmp _end
        [r3]+ = r4 ; r4 >>= r5 ; r5 = [r2]+
        r1++
        jmp _startloop
    _end
Now the data dependency is broken and parallelism is possible. As I said I could not see that happening. Can someone please tell me on which port and with what options can I get such a result? Thanks, Roy.
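Editorial note: the transformation Roy asks about can be written out by hand at the C level, which may make the prologue / steady state / epilogue structure easier to see. This is a sketch only, not GCC output. The function names shift_plain and shift_pipelined are hypothetical, the length is passed as a parameter instead of the fixed 256, and unsigned char is used instead of char so that the left shift is well defined.

```c
#include <stddef.h>

/* Straightforward loop: load, shift and store form one dependence chain
   inside each iteration.  */
void
shift_plain (const unsigned char *restrict c, unsigned char *restrict d,
             size_t n)
{
  for (size_t i = 0; i < n; i++)
    d[i] = (unsigned char) (c[i] << 3);
}

/* Hand-pipelined version: in the steady state the store for element i,
   the shift for element i + 1 and the load for element i + 2 come from
   three different iterations and do not depend on each other.  */
void
shift_pipelined (const unsigned char *restrict c, unsigned char *restrict d,
                 size_t n)
{
  if (n == 0)
    return;
  if (n == 1)
    {
      d[0] = (unsigned char) (c[0] << 3);
      return;
    }

  /* Prologue: start the first two iterations.  */
  unsigned char shifted = (unsigned char) (c[0] << 3);
  unsigned char loaded = c[1];
  size_t i;

  /* Steady state: three independent operations per trip.  */
  for (i = 0; i + 2 < n; i++)
    {
      d[i] = shifted;                           /* store element i      */
      shifted = (unsigned char) (loaded << 3);  /* shift element i + 1  */
      loaded = c[i + 2];                        /* load element i + 2   */
    }

  /* Epilogue: drain the two iterations still in flight (i == n - 2).  */
  d[i] = shifted;
  d[i + 1] = (unsigned char) (loaded << 3);
}
```

Both functions produce the same output array; the second merely exposes the overlap that a modulo scheduler would try to create on the instruction level.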
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
Hello, On 14.11.2010 0:08, Xinliang David Li wrote: I re-measured the performance difference using trunk gcc and trunk clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc because clang/llvm's type based aliasing is not complete and not enabled by default. I also added -fomit-frame-pointer to clang/llvm as this is gcc's default. The base option is -O2.
It would be very interesting to compare also peak numbers, i.e. with LTO and strict aliasing enabled, as well as -O3 and -ffast-math/-funroll-loops, similar to Vlad's or OpenSUSE's options. Can you try to measure these? Maybe you can also run SPEC2k6, if there are enough machine resources, but that's probably asking too much... Andrey
Re: alias time explosion
Hi Daniel, I can't find the testcase attached to any message of the thread. Could it be because of the message size? If so, please send the testcase both to me and Maxim, one of us will look into it. Thanks, Andrey
Re: [patch] Improve loop array prefetch for IA-64
Canqun Yang wrote: Hi, all This patch results in a performance increase of 4% for SPECfp2000 and 13% for NAS benchmark suite on Itanium-2 system, respectively. More performance increase is hopeful by further tuning the parameters and improving the prefetch algorithm at tree level.
Hi Canqun, It's great news that you continued to work on prefetching tuning for ia64! Do you plan to port your other changes for the old RTL prefetching to the tree level?
    @@ -1985,13 +1985,18 @@
    ??? This number is bogus and needs to be replaced before the value is actually used in optimizations. */
I suggest removing this comment as it has become outdated with your patch. Instead you might say how you chose this particular value (and PREFETCH_BLOCK too). Just my 2c. Andrey
Re: Power/Energy Awareness in GCC
On 15.04.2013 20:21, Tobias Burnus wrote: Ghassan Shobaki wrote: We are currently working on a research project on instruction scheduling for low power (experimenting with different algorithms for minimizing switching power) and would like to find out if GCC already has such a scheduler and how it can be enabled, so that we can experiment with it. Your project sounds similar to the following project: http://gcc.gnu.org/ml/gcc/2013-03/msg00259.html ("Identifying Compiler Options to Minimise Energy Consumption for Embedded Platforms").
We also did similar research in the past, mainly investigating how compiler feedback (via instrumentation and profiling) can assist dynamic voltage scaling in the OS (we had fully offline DVS as well as the patches to the Linux scheduler), but we also did some experiments with controlling bit-switching in the scheduler and some other work. You can find the relevant papers with some preliminary DVS work and bit-switching experiments at http://www.doc.ic.ac.uk/~phjk/GROW09/papers/06-PowerBelevantsev.pdf and with more DVS research and experiments from GROW 2010 at http://gcc.gnu.org/wiki/GROW-2010?action=AttachFile&do=get&target=2010-GROW-Proceedings.pdf.
To be honest, I can only repeat the same things I said to James at the 2012 Cauldron -- I don't think you can achieve much in power savings from within the compiler, at least until the hardware design changes. The scheduling freedom in minimizing bit-switching is very limited and this is not going to buy much. Any information that a compiler can produce for you will be about program behavior related to CPU and memory, and no matter how well you use it, again you will not save much power, as nowadays the CPU is not the largest power consumer in an embedded system (if you save e.g. 10% of CPU power and the CPU accounts for 10% of total system consumption, that's only 1% of the total). One can achieve more by turning off a display or a networking device.
But I will be happy to be proven wrong :-) So good luck with the research. Andrey
Re: Update the c++1y/c++14 support page
On 07.10.2013 9:54, Jeff Hammond wrote: Given your company (Oracle) sued Google over 9 lines of Java (the now infamous rangeCheck function), I hardly think it's appropriate for you to discourage someone from following through with copyright assignment for a minor contribution.
This has nothing to do with Oracle v. Google, but with the GCC policy of not requiring a copyright assignment for small patches from first-time contributors (which Paolo knows as a long-time GCC hacker). Of course, after a certain limit the assignment is required. See http://gcc.gnu.org/contribute.html. Also, please don't top post on this list. Andrey
Jeff On Sun, Oct 6, 2013 at 7:48 AM, Paolo Carlini wrote: Hi, Morwenn Ed wrote: Well, it never got to sign the copyright assignment. Certainly you don't need an assignment for 3 lines of docs! Paolo
Re: RFC: -Wall by default
On 05.04.2012 16:33, Robert Dewar wrote: On 4/5/2012 8:28 AM, Michael Veksler wrote: It is not that they can't remember. I am a TA at a moderately basic programming course, and students submit home assignments with horrible errors. These errors, such as free(*str) or *str=malloc(n), are easily caught by -Wall. I have to remember to advise them to use -Wall and to fix all the warnings, which I sometimes forget to do.
Wouldn't it be better in a "moderately basic programming course" to provide standard canned scripts that set things up nicely for students including the switches they need? Indeed for such a course wouldn't it be better to use an appropriate IDE, so they could concentrate on the task at hand and not fiddling with commands. Yes, I think it is very important for students to learn what is going on, but you can't do everything at once in a basic course. And even in the context you give, surely it is not too much to expect a TA to remember important advice like this?
FWIW, in our "basic programming" course students have to hand their homework to an automated testing system which forces the compiler options we think useful, including all the relevant warning switches and -Werror. Of course, there is a web page explaining the meaning of the switches, and TAs help with emphasizing their importance to students. And indeed, you can't do everything in a 101 course, thus not much of this (helpful) information remains in their heads. But it's better than nothing. Andrey
Re: SMS issues
Hello Alex, On 18.07.2012 18:40, Alex Turjan wrote: I'm writing to you with respect to some strange SMS functionality. In the code below there are 2 instructions (a builtin store and a builtin load) as they appear in the program flow before SMS: ...
Issues:
1. What is the reason why (T,1) is built up? To me it seems that (T,0) must be enough.
This looks like the issue that Roman's patch from http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html should be fixing, could you try it? Ayal, Revital, could you again take a look at the above patch and all the SMS improvement patches mentioned in http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01859.html? The last comments from me are at http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00478.html. At the Cauldron, I was talking to Ramana about pushing these forward as important for arm and Linaro, so it would be good to have them in 4.8. Andrey
2. Looking inside generate_reg_moves it seems to me that this function is not meant to deal with replacing memory accesses but only with register replacements (see inside the call to replace_rtx, which in my case tries to replace the mem accesses inside 136).
3. The (T,1) dep is assumed to take place as if before the SMS pass, insn 136 was preceding insn 134:
    (insn 136 135 137 12 tdscdma_pfu_ccdec.c:292 (set (reg/v:HI 248 [ mappingAddress_i16 ]) (unspec:HI [ (mem:HI (plus:PSI (reg/v/f:PSI 170 [ curMappingTable_pi16 ]) (reg:SI 305)) [0 S2 A16]) ] 696)) 755 {INSN_BUILTIN___loadbyteofs_16} (expr_list:REG_DEAD (reg:SI 305) (nil)))
    (insn 134 133 135 12 tdscdma_pfu_ccdec.c:289 (set (mem:HI (plus:PSI (reg/v/f:PSI 185 [ ccdecInterim_pi16 ]) (reg:SI 303)) [0 S2 A16]) (unspec:HI [ (reg/v:HI 244 [ outData_u16 ]) ] 1752)) 1377 {INSN_BUILTIN___storebyteofs_16} (expr_list:REG_DEAD (reg:SI 303) (expr_list:REG_DEAD (reg/v:HI 244 [ outData_u16 ]) (nil
If that were the case, then between 134 and 136 there would also be an antidependence of distance 0. Because in the pipelined schedule 134 is scheduled before 136 (SCHED_TIME (136) > SCHED_TIME (134)), the modulo variable expansion needs to take place as explained before. SMS decides to produce a modulo variable expansion in a case when it is not needed. However, it fails in fulfilling the whole modulo variable expansion procedure, covering in this way the possibly incorrect behavior described above. regards, Alex
Re: SMS issues
Hello, On 19.07.2012 13:14, Alex Turjan wrote: Andrey, Thanks for the patch. I applied it and so far it seems ok. I will run further testing and let you know if I see problems. Back to the last part of my email, I'm still wondering what happens in case the variable modulo expanded is a memory location? Because as I see it, generate_reg_moves is not able to handle such a situation... or perhaps there is something which prevents the modulo scheduler from arriving at this situation?
The dependencies that get removed with -fmodulo-sched-allow-regmoves are only register ones, and with the check for setting REG_P in schedule_reg_moves we are not supposed to touch memory. I suggest you look at the trunk's code as it was rewritten by Richard Sandiford (CC'd), he could comment more on how the scheduling of register moves was changed. See also the thread starting at http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02428.html. Andrey
Re: Global Value Numbering and dependence on SSA in GCC
Hi Kartik, On Sun, 10 Feb 2013 15:41:17 +0530, Kartik Singhal wrote: Thanks Richard for pointing out tree-ssa-sccvn.c On Wed, Feb 6, 2013 at 8:14 PM, Richard Biener wrote: Well, to ignore SSA form simply treat each SSA name as separate variable. You of course need to handle PHI nodes as copies on CFG edges then.
I am not sure if I understood this correctly. Consider the following example:
    if (...)
      a_1 = 5;
    else if (...)
      a_2 = 2;
    else
      a_3 = 13;
    # a_4 = PHI
    return a_4;
Do you mean that I treat a_1, a_2 and a_3 as 3 different variables? In this approach, I lose the information that they are actually the same variable. Or should I write a mini lexer function to convert the SSA names into original variable names by removing _1, _2, etc. as suffix from each? Regarding PHI nodes, I think you mean I should treat them as identity functions, but I am not clear exactly how in the first approach above. In the second, I can just treat it as a=a.
Richard means that you can treat the above code as
    if (...)
      {
        a_1 = 5;
        a_4 = a_1;
      }
    else if (...)
      {
        a_2 = 2;
        a_4 = a_2;
      }
    else
      {
        a_3 = 13;
        a_4 = a_3;
      }
    return a_4;
The extra code above is "copies on CFG edges". Andrey
-- Kartik http://k4rtik.wordpress.com/
Release planning for GCC 4.4?
Hello, As GCC 4.3 is almost out of the door, I thought it possible to ask whether there will be a release plan for GCC 4.4 similar to the ones for previous releases. The reason I'm asking is that my colleagues and I are working on preparing the selective scheduler branch for inclusion in mainline. We'll need another month for the final cleanup and compile time tuning of the scheduler, so we plan to submit it sometime in April. I would like to avoid any possible conflicts with other projects (though this work is unlikely to conflict with others I know of, except maybe Vlad's register allocator), and I think that the idea of a release plan used for previous development cycles worked quite nicely. Thanks, Andrey
Re: GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor
吴曦 wrote: Hi: I am working on gcc-4.3.0 and Redhat ES 4. When I use the compiler to build specint-2006 benchmarks, none of them builds with the compiler option -msched-control-spec (enable control speculation on IA-64).
Control speculation is disabled by default on IA-64, so I think one of the scheduler patches accidentally broke it. This can be Maxim's rewrite of dependence lists. We will take a look. Meanwhile you can try the 4.2 series, it should work. Andrey
Re: gcc on IA64 platform
Hello, Tadas V wrote: I am a computer science student and currently I am preparing my master degree final work on "Compiler optimization on IA64 platforms". So could you provide some information to me on what the current situation with gcc and the IA64 platform is - I mean what are open optimization issues and so on. After googling a while I found this document http://gcc.gnu.org/projects/ia64.html and I would like to know if this information is up-to-date. Looking forward to hearing from you. Thank you in advance.
As you can see from the above page, it comes from the 2001 mini summit, so most of the projects mentioned there are already done. Moreover, GCC infrastructure has been dramatically improved since then. The current state can be summarized as follows:
o Alias analysis improvements mentioned on the page were done long ago. There are two unfinished IA-64 inspired patches concerning alias analysis: improvements of Sanjiv Gupta's patch tracking base+offset calculations on RTL done by Alexander Monakov, which we didn't manage to submit (see http://gcc.gnu.org/ml/gcc/2007-03/msg00148.html), and the patch propagating alias information from Tree SSA to RTL, which produced too few disambiguations and should be improved by Alexander Monakov during this year's Google Summer of Code.
o Data prefetching is now reimplemented on trees instead of RTL. There was a project by Canqun Yang on tuning the old RTL data prefetching for IA-64, but AFAIK it was never ported to the new implementation.
o DFA scheduler was implemented by Vladimir Makarov and checked in long ago. Bundling is now performed using DFA too, see bundling () in config/ia64/ia64.c.
o Profile-directed block ordering and inlining is already supported AFAIK.
o Control and data speculation are supported since GCC 4.2 as a result of a project of ISP RAS. The implementation was done by Maxim Kuvyrkov.
o Extended basic block scheduling is implemented and works as a second scheduling pass on IA-64. Superblock formation was also implemented on RTL and fairly recently moved to Tree SSA by Robert Kidd. There is also a treegion scheduling patch on the treegion branch, but it was never committed to mainline.
o Modulo scheduling is implemented by the IBM Haifa team. It has been working on IA-64 since GCC 4.3 after some small bugfixes (sorry I didn't mention that in changes.html). Also, there is a patch that does propagation of data dependency information to RTL done by Alexander Monakov. It wasn't committed because there was a stage3 at that time, and I think it will be unified with the analogous aliasing patch mentioned above.
o We (ISP RAS) are currently preparing the selective scheduling implementation, also inspired by IA64, for merge with mainline. The actual code is in sel-sched-branch in the SVN repository.
o Rotating registers are not yet supported.
o Link time optimization (LTO) is ongoing work, you can take a look at the LTO branch.
Also, there was a meeting of the Gelato GCC group in 2006, and some information can be found in the minutes at http://gcc.gelato.org/MeetingNotes. You can also search the mailing list archives for similar discussions that happened in the past. Hope this helps, Andrey
Re: gcc on IA64 platform
Gerald Pfeifer wrote: Any chance you could make a pass through that page and remove those items that you know have already been done, or separate those that are still open and those that have been done into two different sections? Sure, I would make a note to do this somewhere during stage2. Andrey
Using cfglayout mode in the selective scheduler
Hello, Currently, the selective scheduler pass uses cfgrtl mode. This results in creating extra jumps and basic blocks while changing control flow, especially when redirecting edges. When this happens, we need to initialize the scheduler's data structures. To do this, we have implemented control flow hooks and RTL hooks to notify the scheduler about all created insns/bbs. The new RTL hooks were considered not a good idea. Instead, Steven and Ian suggested using cfglayout mode in the scheduler in such a way that we'd see all generated jumps and initialize them.
The basic idea is enabling cfglayout mode and then ensuring that insn stream and control flow are in sync with each other at all times. This is required because e.g. on Itanium the final bundling happens right after scheduling, and any extra jumps emitted by cfg_layout_finalize will spoil the schedule. So we need to ensure that leaving cfglayout mode will not create extra insns by fixing the insn stream on the fly. We also need to maintain the existing behavior (fixes are done while finalizing) for other users of cfglayout mode. I see several options for supporting this functionality:
1. Make the required fixes inside the cfglayout hooks so that both the new behavior and the old behavior are supported and the user can choose one of them. As we still need to see the created jumps, we need to make the try_redirect_by_replacing_jump and force_nonfallthru functions either call user-defined hooks on the new jumps or record the new jumps in a vector to which the user can get access.
2. Factor out the hooks and helpers from cfg*.c into smaller functions and create alternative implementations of the hooks inside the scheduler, which will see the new jumps. The old behavior will be retained as we'll not change the original hooks.
3. Modify try_redirect_by_replacing_jump and force_nonfallthru as in #1, but do this in cfgrtl mode. No changes to the cfglayout mode will be needed then, and it will not be used at all.
Going with #1 means that the easier handling of control flow given by the cfglayout mode will not be used (except for maybe better support of current_loops data). But it is better if we'd want to get rid of cfgrtl eventually. Going with #2 will mean some code duplication, hopefully not very large due to factoring out the hooks and reusing their parts. Also, we will not need to handle all cases like e.g. splitting an abnormal edge. Going with #3 means the smallest amount of changes, but doesn't use cfglayout at all :)
I would choose #3, but as people think that moving to cfglayout is a good idea in general, I will be happy to implement #1 or #2. What do people think? Are there better options I've overlooked? Thanks, Andrey
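Editorial note: options #1 and #3 both depend on the CFG routines reporting every jump they create, either through a user-defined hook or by recording it in a vector. The shape of such an interface might look roughly like the toy sketch below. Everything here is hypothetical and invented for illustration: struct jump_log, new_jump_hook, record_new_jump and toy_force_nonfallthru are not GCC functions or types, and insns are reduced to integer uids.

```c
#include <stddef.h>

#define MAX_NEW_JUMPS 8

/* Hypothetical log of jumps created while the CFG was being changed.  */
struct jump_log
{
  int uids[MAX_NEW_JUMPS];
  size_t n;
};

/* Hypothetical hook type: called once for every jump the CFG code emits.  */
typedef void (*new_jump_hook) (int jump_uid, void *data);

/* Hook implementation that records the jump in a struct jump_log, so the
   client (e.g. a scheduler) can initialize its own data for it later.  */
void
record_new_jump (int jump_uid, void *data)
{
  struct jump_log *log = data;

  if (log->n < MAX_NEW_JUMPS)
    log->uids[log->n++] = jump_uid;
}

/* Toy stand-in for a routine such as force_nonfallthru: it "emits" a jump
   with the given uid and reports it through HOOK if one is installed.
   Callers that pass a null hook keep the old, silent behavior.  */
int
toy_force_nonfallthru (int new_uid, new_jump_hook hook, void *data)
{
  if (hook)
    hook (new_uid, data);
  return new_uid;
}
```

The point of the sketch is only that existing users can pass no hook and see no change, while a client that installs one learns about every new jump at the moment it is created.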
Re: Using cfglayout mode in the selective scheduler
Zdenek Dvorak wrote: I am probably missing something: The basic idea is enabling cfglayout mode and then ensuring that insn stream and control flow are in sync with each other at all times. This is required because e.g. on Itanium the final bundling happens right after scheduling, and any extra jumps emitted by cfg_layout_finalize will spoil the schedule. what is the difference between this idea and the cfgrtl mode? In cfgrtl mode, the functions to manipulate the cfg ensure that the insn stream and the CFG match. For cfglayout mode you have to do it by hand. I must say that I do not like this idea at all. As cfgrtl mode routines show, this is nontrivial (and consequently, error-prone), and you would be duplicating much of cfgrtl functionality. I fail to see the advantage over simply using cfgrtl mode and handling the created jump insns (by checking the last insn of altered and newly created blocks, no changes to cfgrtl routines or new hooks necessary), I would prefer to do that too, as it seems a cleaner approach. I am still willing to try the cfglayout idea, but if there is a disagreement about its usefulness, maybe it's better to proceed with your suggestion (I have suggested something similar to Ian at http://gcc.gnu.org/ml/gcc-patches/2008-06/msg01738.html). What do the other maintainers think? Andrey
Re: build error from trunk sources
Janus Weil wrote: Building trunk rev. 139857 on linux/x86_64, I get the following failure: ...
    cc1: warnings being treated as errors
    /home/local/jweil/gcc44/trunk/gcc/sel-sched-ir.c:946: error: 'cmp_v_in_regset_pool' defined but not used
I will commit the following as obvious once bootstrap finishes. Sorry for the breakage. Andrey
2008-09-01 Andrey Belevantsev <[EMAIL PROTECTED]>
    * sel-sched-ir.c (cmp_v_in_regset_pool): Surround with #ifdef ENABLE_CHECKING.
Index: gcc/sel-sched-ir.c
===
*** gcc/sel-sched-ir.c (revision 139859)
--- gcc/sel-sched-ir.c (working copy)
*** return_regset_to_pool (regset rs)
*** 939,944
--- 939,945
    regset_pool.v[regset_pool.n++] = rs;
  }
+ #ifdef ENABLE_CHECKING
  /* This is used as a qsort callback for sorting regset pool stacks.
     X and XX are addresses of two regsets. They are never equal. */
  static int
*** cmp_v_in_regset_pool (const void *x, con
*** 946,951
--- 947,953
  {
    return *((const regset *) x) - *((const regset *) xx);
  }
+ #endif
  /* Free the regset pool possibly checking for memory leaks. */
  void
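Editorial note: the failure above is the usual consequence of a static helper whose only caller is compiled conditionally. When the caller disappears, the helper is "defined but not used", and with warnings treated as errors the build stops. The fix is to put the helper under the same guard as its caller. A reduced sketch follows; cmp_ints and pool_has_no_duplicates are hypothetical names, and the pool holds plain ints rather than regsets.

```c
#include <stdlib.h>

#ifdef ENABLE_CHECKING
/* qsort callback; only needed by the checking code below, so it lives
   under the same guard as its single caller.  */
static int
cmp_ints (const void *x, const void *xx)
{
  int a = *(const int *) x;
  int b = *(const int *) xx;

  return (a > b) - (a < b);
}
#endif

/* Return nonzero if POOL holds no duplicate entries.  The check (which
   sorts POOL in place) is done only when ENABLE_CHECKING is defined;
   otherwise the pool is assumed to be fine.  */
int
pool_has_no_duplicates (int *pool, size_t n)
{
#ifdef ENABLE_CHECKING
  qsort (pool, n, sizeof *pool, cmp_ints);
  for (size_t i = 1; i < n; i++)
    if (pool[i - 1] == pool[i])
      return 0;
#else
  (void) pool;
  (void) n;
#endif
  return 1;
}
```

Compiled either with or without -DENABLE_CHECKING, this translation unit has no unused static function, which is what the committed patch achieves for cmp_v_in_regset_pool.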
Re: Optimizations for itanium
Prasad, Kamal R wrote: Hello, Can someone tell me the back-end optimizations available for itanium (IA64)? We (HP) may be able to contribute to this from our side.
To add to the summary Vlad already did, you may want to take a look at the notes from the last meeting of the Gelato GCC Itanium group in Moscow available at http://gcc.gelato.org/MeetingNotes. This is a good summary of what is considered the most important work for Itanium in GCC. More information about the meeting is at http://gcc.gelato.org/MoscowMeeting. Our team is currently focused on performance tuning and compile time improvement of the new scheduler for Itanium, available on the sel-sched branch. As for data dependence propagation, there will be a Google SoC project about this; I hope to have more data in a few weeks to discuss. I sent a patch to fix modulo scheduling on Itanium some time ago. It was considered acceptable by IBM folks, and I think that it will go in with the other fixes done by them, but I don't know the details. Also, Dmitry Zhurikhin from our team tried to use resource-aware constraints in modulo scheduling for his MS thesis, which worked for two tests from SPEC. He will work on better heuristics for scheduling in another SoC project this summer.
I would also suggest tuning default parameters of some optimizations for Itanium. For example, there was a patch from Canqun Yang about increasing prefetching parameters for Itanium, which produced much better results than the default values. However, that was for the old RTL prefetch pass, which is replaced by the tree ssa implementation done by Zdenek. Inlining parameters for Itanium can be more aggressive too. This kind of work is quite simple, but requires a lot of machine resources, and people usually don't have access to many Itaniums. (We have only two, for example.) So maybe you can help here. Andrey
hash_rtx and volatile subexpressions
Hello, I would like to use some sort of hashing to speed up searching for an insn in an availability set in the selective scheduler. It seems natural to use hash_rtx for this purpose. However, hash_rtx will not hash any volatile subexpressions, returning 0 in this case. This is fine by me, but together with the recursion optimization it has near line 2345...
    for (; i >= 0; i--)
      {
        switch (fmt[i])
        ...
          /* If we are about to do the last recursive call needed at this
             level, change it into iteration. This function is called
             enough to be worth it. */
          if (i == 0)
            {
              x = XEXP (x, i);
              goto repeat;
            }
...one does not get a hash e.g. for any conditional jump like
    (set (pc) (if_then_else (ne (reg:BI 262 p6 [360]) (const_int 0 [0x0])) (label_ref:DI 191) (pc)))
...because the pc subexpression is always the first one, and the recursive call for it gets optimized by the above code, which results in a zero hash for the whole pattern. Is this intentional, or do we want to have 'return hash;' instead of 'return 0;' in all places where *do_not_record_p is set to 1? Is there a better hash_rtx somewhere, which I don't know about? Andrey
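Editorial note: the interaction described above can be reproduced on a toy expression tree. The sketch below is not the code from cse.c. It only mimics the two ingredients in question: operands are walked from last to first with operand 0 handled by iteration instead of recursion, and a leaf that must not be recorded sets *do_not_record_p. All names (struct expr, toy_hash, the keep_partial switch, the constant 17) are hypothetical.

```c
#include <stddef.h>

enum expr_kind { LEAF, UNRECORDABLE, NODE };

struct expr
{
  enum expr_kind kind;
  unsigned value;             /* Used by LEAF.  */
  int n_ops;                  /* Used by NODE.  */
  const struct expr *ops[3];  /* Operands of a NODE.  */
};

/* Toy hash.  KEEP_PARTIAL selects what happens on an unrecordable leaf:
   zero mimics 'return 0;', nonzero mimics 'return hash;'.  */
unsigned
toy_hash (const struct expr *x, int *do_not_record_p, int keep_partial)
{
  unsigned hash = 0;

 repeat:
  switch (x->kind)
    {
    case LEAF:
      return hash + x->value;

    case UNRECORDABLE:
      *do_not_record_p = 1;
      return keep_partial ? hash : 0;

    case NODE:
      break;
    }

  hash += 17u * (unsigned) x->n_ops;  /* Stand-in for hashing the code.  */
  for (int i = x->n_ops - 1; i >= 0; i--)
    {
      if (i == 0)
        {
          /* Last call needed at this level: iterate instead of recursing.
             HASH now carries everything accumulated so far.  */
          x = x->ops[0];
          goto repeat;
        }
      hash += toy_hash (x->ops[i], do_not_record_p, keep_partial);
    }
  return hash;
}
```

For a two-operand node whose operand 0 is unrecordable (the analogue of (set (pc) ...)), the walk reaches that operand through the goto with the accumulated hash still in hand, so 'return 0;' throws the whole value away, while 'return hash;' would keep the contribution of the other operand.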
Re: IA64 optimizations..
Hello, Kumar Rangarajan wrote: I am interested in understanding the limitations/optimization opportunities of the IA64 version of gcc. I read from the projects list on the gcc site about the proposed optimizations for the IA64 platform. I see that some of the requests were from the 2001 or so timeframe, and I am wondering what the current state of those optimization projects is. I tried reading other sites out there, but most of the information seemed a little dated (again from 2001) (eg: http://ia64-linux.org/compilers/gcc_wishlist.html). Can someone please let me know what the current status of those projects or any other optimization wishlists is?
This question was raised a few months ago -- you may find it useful to look at the thread starting at http://gcc.gnu.org//ml/gcc/2007-06/msg7.html. Andrey
Re: Git and GCC
Vincent Lefevre wrote: It's surprising that you don't mention svk, which is based on top of Subversion[*]. Has anyone tried? Is there any problem with it?
I must agree with Ismail's reply here. We have used svk for our internal development for about two years, because it makes it easy to mirror gcc trunk and branch from it locally. I would not complain about its speed, but sometimes we had problems with merges from trunk, ending up with e.g. zero-sized files in our branch which were removed from trunk, or we even couldn't merge at all, and I had to resort to the underlying subversion repository for merging. As a result, we're currently migrating to mercurial. Andrey