http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751
--- Comment #20 from Oleg Endo <oleg.e...@t-online.de> 2011-12-12 02:11:12 UTC ---
(In reply to comment #19)
> The results look way better now.  I've tested your latest patch for
> sh4-unknown-linux-gnu and found no new regressions for gcc testsuite.
> CSiBE with "-O2 -fpic" on that target shows that 144 improvements and
> 28 dis-improvements for size on 896 files.  The worst case is
>   -4.34783 net/ipv4/ip_forward 704 736
> which looks the case of the high r0 register pressure.  The best one is
>   25.7426 arch/testplatform/kernel/traps 10160 8080
> which looks to be very impressive.

That looks nice!  Thanks for checking it out!
I haven't run CSiBE with "-O2 -fpic", only with
"-Os -mpretend-cmove -mfused-madd -freg-struct-return".
I will use your params and try to see what is happening in those cases
where it gets worse.  Maybe it can be "fixed" with a peephole.

> >    /* We want to enable the use of SUBREGs as a means to
> >       VEC_SELECT a single element of a vector.  */
> >+
> >+  /* This effectively disallows using GENERAL_REGS for SFmode vector subregs.
> >+     This can be problematic when SFmode vector subregs need to be accessed
> >+     on the stack with displacement addressing, as it happens with -O0.
> >+     Thus we allow the mode change for -O0.  */
> >    if (to == SFmode && VECTOR_MODE_P (from) && GET_MODE_INNER (from) == SFmode)
> >-     return (reg_classes_intersect_p (GENERAL_REGS, rclass));
> >+     return optimize ? (reg_classes_intersect_p (GENERAL_REGS, rclass)) : false;

> >+     Thus we allow the mode change for -O0.  */

.. should be _disallow_ of course...

Another note that should have gone into the comment:
As far as I could observe, this is mainly triggered by the following
in sh_legitimate_index_p:

+      if (mode == QImode && (unsigned) INTVAL (op) < 16)
+        return true;

.. probably because this makes the generic "m" constraint match QImode
displacement addressing, and then it tries using it.
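
For readers without the patch at hand, here is a minimal standalone sketch of the kind of range check the hunk above performs. It is not the GCC code. The names sketch_mode, sketch_legitimate_disp_p and sketch_disp_needs_r0_p are hypothetical. The QImode limit (offset below 16) comes from the hunk. The HImode and SImode cases are an assumption based on the SH instruction encoding, where "mov.{b,w,l} @(disp,Rn)" has a 4-bit displacement scaled by the access size. They do not necessarily match what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's machine modes.  Only the access size
   in bytes matters for this sketch.  */
enum sketch_mode { SKETCH_QI = 1, SKETCH_HI = 2, SKETCH_SI = 4 };

/* Return true if OFFSET fits a 4-bit displacement field that is scaled
   by the access size.  For QImode this is the same condition as
   "(unsigned) INTVAL (op) < 16" in the quoted hunk.  */
bool
sketch_legitimate_disp_p (enum sketch_mode mode, long offset)
{
  int size = (int) mode;
  assert (size == 1 || size == 2 || size == 4);

  /* Negative or misaligned offsets cannot be encoded.  */
  if (offset < 0 || offset % size != 0)
    return false;

  /* The scaled displacement must fit in 4 bits (0..15).  */
  return offset / size < 16;
}

/* On SH the byte and word displacement forms can only load into or
   store from R0, which is where the R0 pressure comes from.  */
bool
sketch_disp_needs_r0_p (enum sketch_mode mode)
{
  return mode == SKETCH_QI || mode == SKETCH_HI;
}
```

With this model, a QImode access at offset 15 is accepted and one at offset 16 is not, and every accepted QImode or HImode access ties its data register to R0.
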

> Rather than that, I guess that the QI/HImode disp addressing would
> be an optimization unneeded for -O0 in the first place.  Perhaps
> something like -mpreferdisp option and TARGET_PREFER_DISP macro
> which are enable by default but disable at -O0 might be help.

Yeah, could also be an option.

> It'll also help some unfortunate anormallies for which those optimizations
> will generate worse codes.

You mean, by giving the user the option to turn off displacement
addressing for e.g. some specific files / modules by specifying
-mno-preferdisp or something like that?
By anomalies do you mean code that gets worse because of too much
pressure on R0 and all the reloads around it, or do you have any other
bad use cases?

BTW, the vector mode handling seems a bit unfinished (see also PR13423).
I was planning to address that at a later point...

> Maybe.  Implementing it with predicates and constraints would be
> smarter if possible but may be difficult because the register
> allocator handles the "m" constraint specially.

Yes, the "m" constraint is an obstacle in this case.
What I've tried out is splitting it into a memory constraint that allows
displacement addressing and another memory constraint that disallows it
("Snd", "Sdd" - they are added by the patch) and using those in the
move / sign extend insns instead of the generic "m" constraint.
For example, something like this:

(define_insn "*extendqisi2"
  [(set (match_operand:SI 0 "arith_reg_dest" "=z,r,r")
        (sign_extend:SI (match_operand:QI 1 "general_movsrc_operand" "Sdd,Snd,r")))]
  "TARGET_SH1"
  "@
        mov.b   %1,%0
        mov.b   %1,%0
        exts.b  %1,%0"
  [(set_attr "type" "load,load,arith")])

This basically seems to work.  But when there are consecutive loads,
reload uses displacement addressing for the first load, but not for
the following loads, because R0 is already allocated at that point.
Ideally, reload should take into account that "reloading around R0" is
in most cases more efficient than other strategies, especially on SH4.
However, I'm not sure whether changing reload for this issue is a good
idea ;)

Another thing I could try out is to have load/store insns that allow
arbitrary operands in displacement addressing like on SH2A, and split
them into two insns, one load/store and one reg-reg move, after reload.
But that would probably require the R0 clobber in the expander, which
could produce worse code in cases where displacement addressing is not
used, I guess.  Do you think this approach could make sense?

> I think so, though we are in stage 3 and have to wait the trunk returns
> to stage 1 or 2 for committing such changes.

I was afraid you might say something like this :T

> You have the time for implementing HImode support.

Yep, sure.
I've noticed that the latest version of the patch seems to fix some
more testsuite failures.  I will investigate which hunk is responsible
for the fixes so that it can be pulled out of the patch.  OK?

> BTW, the changes for white spaces, spells and other clean-ups which
> are not essential for this work should be separated into another patch.

Ah yeah, sure.  Will pull them out and submit them as a cleanup patch
to the patches-list.