On 2021/1/26 1:59 AM, Richard Sandiford via Gcc-patches wrote:
> Richard Biener <rguent...@suse.de> writes:
>> On Fri, 22 Jan 2021, Segher Boessenkool wrote:
>>
>>> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
>>>> On Thu, 21 Jan 2021, Segher Boessenkool wrote:
>>>>> What is holding up this patch still?  Ke Wen has pinged it every month
>>>>> since May, and there has still not been a review.
>>>
>>> Richard Sandiford wrote:
>>>> FAOD (since I'm on cc:), I don't feel qualified to review this.
>>>> Tree-level loop stuff isn't really my area.
>>>
>>> And Richard Biener wrote:
>>>> I don't like it; it feels wrong, but I don't have a good suggestion
>>>> that has had positive feedback.  Since a reviewer / approver is indirectly
>>>> responsible for at least the design, I do not want to ack this patch.
>>>> Bin made forward progress on the other parts of the series but clearly
>>>> there's somebody missing with the appropriate privileges who feels
>>>> positive about the patch and its general direction.
>>>>
>>>> Sorry to be of no help here.
>>>
>>> How unfortunate :-(
>>>
>>> So, first off, this will then have to wait for next stage 1 to make any
>>> progress.  Rats.
>>>
>>> But what could have been done differently that would have helped?  Of
>>> course Ke Wen could have written a better patch (aka one that is more
>>> acceptable); either of you could have made your current replies earlier,
>>> so that it is clear help needs to be sought elsewhere; and I could have
>>> pushed people earlier, too.  No one really did anything wrong, I'm not
>>> seeking who to blame, I'm just trying to find out how to prevent
>>> deadlocks like this in the future (where one party waits for replies
>>> that will never come).
>>>
>>> Is it just that we have a big gaping hole in reviewers with experience
>>> in such loop optimisations?
>>
>> Maybe.  But what I think is the biggest problem is that we do not
>> have a good way to achieve what the patch tries (if you review the
>> communications you'll see many ideas tossed around), first and foremost
>> because IV selection happens early on GIMPLE while unrolling
>> happens late on RTL.  Both need a quite accurate estimate of costs,
>> but unrolling has an even harder time than IV selection, where we've
>> got along with throwing dummy RTL at costing functions.
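To make the phase split concrete, here is a minimal C-level sketch of the kind of choice IV selection (ivopts) makes on GIMPLE, long before the RTL unroller runs.  The function names are invented for illustration; the real pass works on GIMPLE statements and queries the target's cost hooks rather than looking at source code:

    /* One IV candidate is the array index: the address a + i*4 is
       recomputed from i on every iteration.  */
    void scale_index (float *a, float s, long n)
    {
      for (long i = 0; i < n; i++)
        a[i] *= s;
    }

    /* Another candidate is a pointer IV: the address is the pointer
       itself, bumped by the element size each iteration.  ivopts picks
       whichever candidate set its cost model says is cheaper, without
       knowing what the later RTL unroller will do to the loop.  */
    void scale_pointer (float *a, float s, long n)
    {
      for (float *p = a, *end = a + n; p != end; ++p)
        *p *= s;
    }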
>>
>> IMHO the patch is the wrong "start" for trying to fix the issue, and my
>> fear is that wiring this kind of "feature" into the current
>> (fundamentally broken) state will make it much harder to rework
>> that state without introducing regressions on said features (I'm
>> in that situation with trying to turn the vectorizer upside down - for
>> three years now, struggling not to regress any of the "features" we've
>> accumulated for various targets, most of which feel "bolted-on"
>> rather than well-designed ;/).
> 
> Thinking of any features in particular here?
> 
> Most of the ones I can think of seem to be doing things in the way
> that the current infrastructure expects.  But of course, the current
> infrastructure isn't perfect, so the end result isn't either.
> 
> Still, I agree with the above apart from maybe that last bit. ;-)
> 
>> I think IV selection and unrolling (and scheduling FWIW) need to move
>> closer together.  I do not have a good idea how that can work out,
>> but I very much believe that this "most wanted" GIMPLE unroller
>> will not be a good way of progressing here.
> 
> How do you feel about unrolling in the vectoriser (by doubling the VF, etc.)
> in cases where something about the target indicates that that would be
> useful?  I think that's a good place to do it (for the cases that it
> handles) because it's hard to unroll later and then interleave.
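As a rough, source-level illustration of what "unrolling in the vectoriser by doubling the VF" amounts to (plain C with GNU vector extensions rather than the vectorizer's internal IL; the vector width of four floats and the function names are assumptions for the example):

    typedef float v4sf __attribute__ ((vector_size (16)));

    /* Vectorized with VF = 4: one vector operation per iteration.
       n4 is the number of whole vectors.  */
    void axpy_vf4 (const v4sf *x, v4sf *y, float s, long n4)
    {
      v4sf vs = { s, s, s, s };
      for (long i = 0; i < n4; i++)
        y[i] += vs * x[i];
    }

    /* The same loop with the VF doubled to 8: two independent vector
       operations per iteration, which the target can issue in parallel.
       (Assumes n4 is even; the vectorizer would handle the remainder
       with an epilogue loop.)  */
    void axpy_vf8 (const v4sf *x, v4sf *y, float s, long n4)
    {
      v4sf vs = { s, s, s, s };
      for (long i = 0; i < n4; i += 2)
        {
          y[i] += vs * x[i];
          y[i + 1] += vs * x[i + 1];
        }
    }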
> 
>> Maybe taking the bullet and moving IV selection back to RTL is the
>> answer.
> 
> I think that would be a bad move.  The trend recently seems to have been
> to lower stuff to individual machine operations earlier in the rtl pass
> pipeline (often immediately during expand) rather than splitting them later.
> The reasoning behind that is that (1) gimple has already heavily optimised
> the unlowered form and (2) lowering earlier gives the more powerful rtl
> optimisers a chance to do something with the individual machine operations.
> It's going to be hard for an RTL ivopts pass to piece everything back
> together.
> 
>> For a "short term" solution I still think that trying to perform
>> unrolling and IV selection (for the D-form case you're targeting)
>> at the same time is a better design, even if it means complicating
>> the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
>> any good idea about scheduling).  There are currently 20+ GIMPLE
>> optimization passes and 10+ RTL optimization passes between
>> IV selection and unrolling; the idea that the transform decision
>> and the transform application can be this far apart looks scary.
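To make the D-form case concrete (a hedged, source-level sketch only; the real decision is made on GIMPLE and RTL for Power's base+displacement addressing, and the function name is invented): if IV selection knows the loop will be unrolled by 4, it can keep a single pointer IV that is bumped once per unrolled body, so the four loads can use D-form addresses 0(p), 8(p), 16(p), 24(p) followed by a single pointer update.  If it instead commits to a per-element IV before the unroll factor is known, the RTL unroller is left with a separate address computation or pointer bump for every copy of the body.

    /* Single pointer IV stepped once per unrolled body; assumes n is a
       multiple of 4.  Each of the four loads is at a small constant
       offset from p, which maps naturally onto D-form addressing.  */
    double sum_unrolled (const double *a, long n)
    {
      double s = 0.0;
      for (const double *p = a, *end = a + n; p != end; p += 4)
        s += p[0] + p[1] + p[2] + p[3];
      return s;
    }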
> 
> FWIW, another option might be to go back to something like:
> 
>   https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532676.html
> 
> I agree that it was worth putting that series on hold and trying a more
> target-independent approach, but I think in the end it didn't work out,
> for the reasons Richard says.  At least the target-specific pass would
> be making a strict improvement to the IL that it sees, rather than
> having to predict what future passes might do or might want.
> 

Yeah, I had this thought in mind too: if we cannot find a good
target-independent approach, it seems reasonable to revisit this series and
go with the target-specific approach.

BR,
Kewen
