On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener <richard.guent...@gmail.com>
wrote:

> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene <t...@moene.org> wrote:
> >> A few days ago there was a rant on the Fortran Standardization Committee's
> >> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
> >>
> >> An example picked at random from our weather forecasting code:
> >>
> >>     ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
> >>     ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
> >>     ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
> >>     ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
> >>
> >> The reaction from one of the members of the committee (about "their"
> >> compiler):
> >>
> >> 'And multiple consecutive array statements with the same shape are “fused”
> >> exactly so that the compiler can generate good cache use. This sort of
> >> optimization is pretty low hanging fruit.'
> >>
> >> As far as I can see, loop fusion as a stand-alone optimization is not
> >> supported as yet, although some mention is made in the context of Graphite.
> >>
> >> Is this something that should be pursued?
> > Hi,
> > I don't know the current status of fusion in Graphite.  As for the
> > traditional fusion transformation, I think it would not be very difficult
> > to implement alongside the existing loop distribution; in fact, quite a
> > lot of code could be shared.  What we do need are more motivating cases
> > and a good, conservative cost model.
>
> Yes, I guess before distribution you want to do maximum fusion and then
> apply (re-)distribution on the fused loop.  The cost model should be the
> very same for distribution/fusion.
>
> Richard.
>


I recall Fujitsu bragging that the key to their good application
performance (read: outside LINPACK) on the K computer is extensive use of
loop FISSION plus software pipelining. Though I guess software pipelining is
only useful if you have lots of architectural registers, which disqualifies
x86-64.
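
To make the fusion/fission trade-off concrete, here is a rough C sketch of
the two shapes the four array assignments quoted above could take once
lowered to loops. Everything in it is an assumption made for illustration:
the flat layout of pgfl as four consecutive planes, the function names, and
the use of restrict to assert that the arrays do not overlap (which is what
makes fusing legal here). It is not what gfortran or any other compiler
actually emits.

```c
#include <assert.h>
#include <stddef.h>

enum { NFIELDS = 4 };  /* hypothetical: four planes, standing in for YI, YL, YR, YS */

/* Distributed (fissioned) form: one loop per array statement.
   Each loop touches only two memory streams. */
void copy_separate(size_t n, const double *restrict pgfl,
                   double *restrict qice, double *restrict qli,
                   double *restrict qrain, double *restrict qsnow)
{
    for (size_t i = 0; i < n; i++) qice[i]  = pgfl[0 * n + i];
    for (size_t i = 0; i < n; i++) qli[i]   = pgfl[1 * n + i];
    for (size_t i = 0; i < n; i++) qrain[i] = pgfl[2 * n + i];
    for (size_t i = 0; i < n; i++) qsnow[i] = pgfl[3 * n + i];
}

/* Fused form: a single loop whose body holds all four statements.
   One set of loop overhead, but eight memory streams at once. */
void copy_fused(size_t n, const double *restrict pgfl,
                double *restrict qice, double *restrict qli,
                double *restrict qrain, double *restrict qsnow)
{
    for (size_t i = 0; i < n; i++) {
        qice[i]  = pgfl[0 * n + i];
        qli[i]   = pgfl[1 * n + i];
        qrain[i] = pgfl[2 * n + i];
        qsnow[i] = pgfl[3 * n + i];
    }
}

/* Self-check: both forms must produce identical results. */
void check_fused_matches_separate(void)
{
    enum { N = 16 };
    double pgfl[NFIELDS * N];
    double a[NFIELDS][N], b[NFIELDS][N];

    for (size_t i = 0; i < NFIELDS * N; i++)
        pgfl[i] = (double)i;

    copy_separate(N, pgfl, a[0], a[1], a[2], a[3]);
    copy_fused(N, pgfl, b[0], b[1], b[2], b[3]);

    for (size_t f = 0; f < NFIELDS; f++)
        for (size_t i = 0; i < N; i++)
            assert(a[f][i] == b[f][i]);
}
```

If the four statements really do read four different planes, as assumed
here, they share no data, so fusing them saves loop overhead but buys no
cache reuse. Fusion would pay off more clearly when consecutive statements
read the same array, which is presumably why a shared, conservative cost
model for fusion and distribution matters.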


-- 
Janne Blomqvist
