On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene <t...@moene.org> wrote:
> >> A few days ago there was a rant on the Fortran Standardization Committee's
> >> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
> >>
> >> An example picked at random from our weather forecasting code:
> >>
> >> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
> >> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
> >> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
> >> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
> >>
> >> The reaction from one of the members of the committee (about "their"
> >> compiler):
> >>
> >> 'And multiple consecutive array statements with the same shape are “fused”
> >> exactly so that the compiler can generate good cache use. This sort of
> >> optimization is pretty low hanging fruit.'
> >>
> >> As far as I can see loop fusion as a stand-alone optimization is not
> >> supported as yet, although some mention is made in the context of graphite.
> >>
> >> Is this something that should be pursued ?
> > Hi,
> > I don't know the current status of fusion in graphite. As for
> > traditional fusion transformation, I think it's not very difficult to
> > be implemented along with existing distribution, actually, quite lot
> > of code should be shared. What we do need are something like: more
> > motivation cases, good/conservative cost model.
>
> Yes, I guess before distribution you want to do maximum fusion and then
> apply (re-)distribution on the fused loop. The cost model should be the
> very same for distribution/fusion.
>
> Richard.
>

I recall Fujitsu bragging that the key to them getting good
application performance (read: outside linpack) on the K computer is
extensive use of loop FISSION + software pipelining.

Though I guess sw-pipelining is only useful if you have lots of
architectural registers, which disqualifies x86-64..

-- 
Janne Blomqvist
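
For concreteness, here is a minimal sketch of the fusion/fission pair under discussion, applied to the four array statements quoted above. It is written in C rather than Fortran so it compiles standalone. The fixed sizes, the names copy_unfused and copy_fused, and the plain integer indices standing in for YI%MP, YL%MP, YR%MP and YS%MP are all assumptions made for illustration; none of this comes from the weather code or from GCC.

```c
/* Hypothetical sizes; in the real code NPROMA and NFLEVG are runtime values. */
enum { NPROMA = 8, NFLEVG = 4, NFIELDS = 4 };

/* One 2-D slice. The last Fortran index (the field) is the slowest-varying
   one, so in C it becomes the first index of pgfl. */
typedef double field2d[NFLEVG][NPROMA];

/* Unfused form: one loop nest per array statement, which is what a
   straightforward scalarization of the four assignments produces. */
void copy_unfused(field2d pgfl[NFIELDS], int yi, int yl, int yr, int ys,
                  field2d zqice, field2d zqli, field2d zqrain, field2d zqsnow)
{
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            zqice[k][i] = pgfl[yi][k][i];
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            zqli[k][i] = pgfl[yl][k][i];
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            zqrain[k][i] = pgfl[yr][k][i];
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            zqsnow[k][i] = pgfl[ys][k][i];
}

/* Fused form: the four nests have identical bounds and no dependences
   between them (assuming the destinations do not alias the source), so
   their bodies can share a single loop nest. */
void copy_fused(field2d pgfl[NFIELDS], int yi, int yl, int yr, int ys,
                field2d zqice, field2d zqli, field2d zqrain, field2d zqsnow)
{
    for (int k = 0; k < NFLEVG; k++) {
        for (int i = 0; i < NPROMA; i++) {
            zqice[k][i]  = pgfl[yi][k][i];
            zqli[k][i]   = pgfl[yl][k][i];
            zqrain[k][i] = pgfl[yr][k][i];
            zqsnow[k][i] = pgfl[ys][k][i];
        }
    }
}
```

Loop distribution (fission) is the reverse rewrite, from the second form back to the first. The fused nest touches eight memory streams at once where each unfused nest touches two, so which direction pays off depends on the target, which is the question the shared cost model would have to answer.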