>> Date: Thu, 29 Apr 2010 08:55:56 +0200 (CEST)
>> From: "Jonas Paulsson" <d0...@student.lth.se>
>
>> It feels good to know that the widening mults issue has been
>> resolved
>
> Yes, nice, and as late as last week too, though the patch was
> from February.
>
>> as it was a bit of a disappointment when I noted the erratic
>> behaviour with GCC 4.4.1. Perhaps you would care to comment on
>> what to expect as a user now, then?
>
> IIUC, it should Just Work. No, I haven't checked. Note that
> the fix was somewhat along the lines of what you wrote in your
> thesis IIUC; adding a specific pass to fix up separated
> operations. See
> <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29274> and
> <http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00643.html>. BTW,
> my observation was from the 4.3 era. It's a regression, which
> explains why I hadn't noticed it with the 3.x version I used
> before that. A pity it was deemed too invasive to fix for 4.5.
>
>> Another issue that gave me porting problems was the SIMD memory
>> accesses, e.g. doing a wide load into two adjacent narrow
>> registers with one instruction. This was resolved earlier on the
>> mailing list to not be handleable on RTL, so I wonder now if
>> anything has been done for this, as it too seems rather
>> reasonable, just like the widening loads?
>
> You wanted to load adjacent data in a wider mode that was then
> to be separately used in a mode half that size, but the
> registers had to be adjacent too? That's kind of the opposite
> problem to what's usually needed! If the use of the data was
> actually for the obvious wider mode (SI or V2HI), you'd just
> have to define the movsi or movv2hi pattern and it would be
> used, but that unfortunately seems not applicable in any way.
> I'm not sure that problem is of common interest I'm afraid, but
> if it can be resolved with a target-specific pass, there'd be
> reason to add a hook somewhat like
> TARGET_MACHINE_DEPENDENT_REORG, but earlier.
>
> But, did you check whether combine tried to match RTL that
> looked somewhat like:
>
> (parallel
>  [(set (reg:HI 1) (mem:HI (plus:SI (reg:HI 3) (const_int 2))))
>   (set (reg:HI 2) (mem:HI (plus:SI (reg:HI 3) (const_int 4))))])
>
> I.e. a parallel with the two loads where the addresses were
> adjacent? From gdb you inspect the calls to try_combine (IIRC).
> That insn could have been matched to a pattern like:
>
> (define_insn "*load_wide"
>  [(set (match_operand:HI 0 "register_operand" "=d0,d1,d2")
>        (match_operand:HI 1 "reg_plus_const_memory_operand" "m"))
>   (set (match_operand:HI 2 "register_operand" "=d1,d2,d3")
>        (match_operand:HI 3 "reg_plus_const_memory_operand" "m"))]
>  "rtx_equal_p (XEXP (operands[3], 0),
>                plus_constant (XEXP (operands[1], 0), 2))"
>  "load_wide %0,%1")
>

Yes, of course I checked with combine, but the debug dump of the pass
revealed that it does not look for a wider load when two loads are
adjacent, unfortunately.
I checked this again now by setting a breakpoint in try_combine, but
there is no attempt to use the wider load insn. I then handled this
combination with an added pass that located and replaced load/store
instructions. Along with successful uses of post-inc insns, this was an
important optimization for the project. It does not make sense to me
not to do such a simple thing as looking for offset 1 in the local
block, even... (Of course, I just did the simple thing of checking for
CODE_FOR... in terms of locating load insns.) I think I tried to add a
pattern with a parallel semantic pattern, but combine did not care
about it.

> Just a WAG, there are reasons this would not match in the
> general case (for one, you'd want to try to match the opposite
> order too). Don't pay too much attention to the exact matching
> predicates, constraints and condition above. The point is just
> whether combine tried to generate and match a parallel with two
> valid loads, given source where there was obvious opportunity
> for it.
>
> That insn *could* then be caught with a pattern which would,
> through the right constraints, coerce register allocation to make
> the right choices for the (initially separate) registers. In
> the example above, four registers are assumed to be valid as
> destination with the matching singleton constraints d0..d3.
>

I guess I wonder here how much of a tweak it is to use GCC in this
fashion - pairing 16-bit regs to 32-bit regs. There does not seem to
be complete support for it, although it works on a basic level.

> brgds, H-P
>
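
For illustration only, the kind of block-local pairing discussed above (two narrow loads from adjacent addresses into adjacent registers, replaced by one wide load) could be sketched in standalone C roughly as follows. This is not GCC code and not the pass from the thesis: `struct insn`, `pairable_p` and `combine_adjacent_loads` are hypothetical names, the instruction model is a deliberate simplification of RTL, and only directly consecutive loads in ascending address order are considered (the opposite order mentioned above is not handled).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an insn; NOT GCC's rtx.  */
enum insn_kind { INSN_LOAD_HI, INSN_LOAD_WIDE, INSN_DELETED, INSN_OTHER };

struct insn
{
  enum insn_kind kind;
  int dest;   /* destination register number */
  int base;   /* base address register number */
  int offset; /* byte offset from the base register */
};

/* Assumed size in bytes of one narrow (HImode-like) access.  */
#define HI_SIZE 2

/* Nonzero if load A directly followed by load B could be done as a
   single wide load into the register pair starting at A's destination.  */
static int
pairable_p (const struct insn *a, const struct insn *b)
{
  return a->kind == INSN_LOAD_HI
         && b->kind == INSN_LOAD_HI
         && a->base == b->base                 /* same base register */
         && b->offset == a->offset + HI_SIZE   /* adjacent in memory */
         && b->dest == a->dest + 1             /* adjacent registers */
         && a->dest != a->base;                /* A must not clobber the base */
}

/* Scan one block of N insns and merge each pairable couple of
   consecutive loads.  Returns the number of wide loads created.  */
int
combine_adjacent_loads (struct insn *insns, size_t n)
{
  int replaced = 0;

  assert (insns != NULL || n == 0);

  for (size_t i = 0; i + 1 < n; i++)
    if (pairable_p (&insns[i], &insns[i + 1]))
      {
        insns[i].kind = INSN_LOAD_WIDE;    /* now loads dest and dest + 1 */
        insns[i + 1].kind = INSN_DELETED;  /* second narrow load is gone */
        replaced++;
        i++;                               /* skip the deleted insn */
      }

  return replaced;
}
```

In a real port the same check would have to be made on RTL, where it also has to account for intervening insns, aliasing stores and register liveness; none of that is modelled here.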