Do we need to do a loop invariant motion after loop interchange ?

2019-11-21 Thread Li Jia He

Hi,

I found that for the following code:

#define N 256
int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N];
void __attribute__((noinline))
double_reduc (int n)
{
  for (int k = 0; k < n; k++)
    {
      for (int l = 0; l < n; l++)
        {
          c[k][l] = 0;
          for (int m = 0; m < n; m++)
            c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
        }
    }
}

I dumped the file after loop interchange and got the following information:

 [local count: 118111600]:
  # m_46 = PHI <0(7), m_45(11)>
  # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
  _39 = _49 + 1;

   [local count: 955630224]:
  # l_48 = PHI <0(3), l_47(12)>
  # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
  c_I_I_lsm.5_18 = c[k_28][l_48];
  c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
  _2 = a[k_28][m_46][l_48];
  _3 = d[k_28][m_46];
  _4 = _2 * _3;
  _5 = b[k_28][m_46][l_48];
  _6 = _3 * _5;
  _7 = _4 + _6;
  _8 = _7 + c_I_I_lsm.5_53;
  c[k_28][l_48] = _8;
  l_47 = l_48 + 1;
  ivtmp_40 = ivtmp_41 - 1;
  if (ivtmp_40 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

We can see that '_3 = d[k_28][m_46];' is invariant in the inner loop (loop l).
Do we need to add a loop invariant motion pass after loop interchange?

BR,
Lijia He
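
To make the question concrete, here is a minimal hand-written sketch of what the interchanged nest computes once the load of d[k][m] is hoisted out of loop l. It is not compiler output. N is reduced to 16 so it runs quickly, and the names c_ref, double_reduc_ref, double_reduc_hoisted, fill_inputs and results_match are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define N 16  /* assumption: reduced from 256 so the sketch runs quickly */

int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N], c_ref[N][N];

/* Original loop order from the post (k, l, m); writes into c_ref.  */
void double_reduc_ref (int n)
{
  assert (n <= N);
  for (int k = 0; k < n; k++)
    for (int l = 0; l < n; l++)
      {
        c_ref[k][l] = 0;
        for (int m = 0; m < n; m++)
          c_ref[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
      }
}

/* Hand-written model of the interchanged nest (k, m, l) with the load
   of d[k][m] hoisted out of loop l.  */
void double_reduc_hoisted (int n)
{
  assert (n <= N);
  for (int k = 0; k < n; k++)
    for (int m = 0; m < n; m++)
      {
        int dkm = d[k][m];  /* invariant in loop l: loaded once per m */
        for (int l = 0; l < n; l++)
          {
            /* Mirrors "m_46 != 0 ? c_I_I_lsm.5_18 : 0" in the dump.  */
            int prev = (m != 0) ? c[k][l] : 0;
            c[k][l] = a[k][m][l] * dkm + dkm * b[k][m][l] + prev;
          }
      }
}

/* Fill the inputs with small deterministic values (made-up test data).  */
void fill_inputs (void)
{
  for (int k = 0; k < N; k++)
    for (int m = 0; m < N; m++)
      {
        d[k][m] = k - m;
        for (int l = 0; l < N; l++)
          {
            a[k][m][l] = k + 2 * m + l;
            b[k][m][l] = k * l - m;
          }
      }
}

/* Returns 1 when both versions agree on the n-by-n result.  */
int results_match (int n)
{
  fill_inputs ();
  double_reduc_ref (n);
  double_reduc_hoisted (n);
  for (int k = 0; k < n; k++)
    if (memcmp (c[k], c_ref[k], n * sizeof (int)) != 0)
      return 0;
  return 1;
}
```

Hoisting the load is safe here because c and d are distinct arrays, so the stores to c[k][l] in loop l cannot change d[k][m].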



Re: Do we need to do a loop invariant motion after loop interchange ?

2019-11-21 Thread Richard Biener
On Thu, Nov 21, 2019 at 10:22 AM Li Jia He  wrote:
>
> Hi,
>
> I found for the follow code:
>
> #define N 256
> int a[N][N][N], b[N][N][N];
> int d[N][N], c[N][N];
> void __attribute__((noinline))
> double_reduc (int n)
> {
>   for (int k = 0; k < n; k++)
>     {
>       for (int l = 0; l < n; l++)
>         {
>           c[k][l] = 0;
>           for (int m = 0; m < n; m++)
>             c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
>         }
>     }
> }
>
> I dumped the file after loop interchange and got the following information:
>
>  [local count: 118111600]:
>   # m_46 = PHI <0(7), m_45(11)>
>   # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
>   _39 = _49 + 1;
>
>  [local count: 955630224]:
>   # l_48 = PHI <0(3), l_47(12)>
>   # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
>   c_I_I_lsm.5_18 = c[k_28][l_48];
>   c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
>   _2 = a[k_28][m_46][l_48];
>   _3 = d[k_28][m_46];
>   _4 = _2 * _3;
>   _5 = b[k_28][m_46][l_48];
>   _6 = _3 * _5;
>   _7 = _4 + _6;
>   _8 = _7 + c_I_I_lsm.5_53;
>   c[k_28][l_48] = _8;
>   l_47 = l_48 + 1;
>   ivtmp_40 = ivtmp_41 - 1;
>   if (ivtmp_40 != 0)
>     goto ; [89.00%]
>   else
>     goto ; [11.00%]
>
> we can see '_3 = d[k_28][m_46];'  is a loop invariant.
> Do we need to add a loop invariant motion pass after the loop interchange?

There is one at the end of the loop pipeline.

> BR,
> Lijia He
>


Re: GCC's instrumentation and the target environment

2019-11-21 Thread Martin Liška

On 11/20/19 4:14 PM, David Taylor wrote:

Sorry for not responding sooner.

Thanks Martin.

Like Joel we have a third party solution to instrumentation.  Part of
my objection to the third party solution is freedom.  There are
customizations we would like, but not having source we're at the mercy
of the vendor both for whether it gets done and the timing.  Part of
the objection is the massive amount of changes I had to make to our
build system to integrate it and the resulting feeling of fragility.
They are reportedly addressing the latter in a future release.

By contrast, looking at GCC based instrumentation, the changes
required to our build system are very small and easy.


Hello.

That sounds promising to me!



Part of my purpose in posting was the belief that this problem --
wanting to instrument embedded code -- is not uncommon and has likely
been solved already.  And the hope that one of the solvers would feel
that their existing solution was in a good enough shape to contribute
back.  Or that someone would point out that there is already an
existing {Free | Open} source solution that I am overlooking.


Well, one quite similar use case is the use of gcov in the Linux kernel;
please take a look at the kernel/gcov subfolder.



Since no one has mentioned an existing solution, here is a first draft
of a proposed solution to GCC instrumentation not playing well with
embedded...

NOTE: *NONE* of the following has been implemented as yet.  I would
ultimately like this to be something that once implemented would be
considered for becoming part of standard GCC.  So, if you see something
that would impede that goal or if changed would improve its chances,
please speak up.

Add a new configure option --{with|without}-libgcov-standalone-env

Default: without (i.e., hosted)

Question: should hosted libgcov be the default for all configuration
tuples?  Or should it only be the default when there are headers?
Or...?


That's a detail, so let's say we'll have an option that provides
wrappers (__gcov_ # fn).



When standalone, when building libgcov.a files, suppress
-Dinhibit_libc and add -DLIBGCOV_STANDALONE_ENV to the command line
switches.

Then in libgcov-driver.c, libgcov-driver-system.c, gcov-io.c, replace

all calls of    with calls of
    fopen       __gcov_open_int
    fread       __gcov_read_int
    fwrite      __gcov_write_int
    fclose      __gcov_close_int
    fseek       __gcov_seek_int
    ftell       __gcov_tell_int

    setbuf      __gcov_setbuf_int

        Probably belongs inside __gcov_open_int instead of as
        a separate routine.

    getenv      __gcov_getenv_int

    abort       __gcov_abort_int

        When the application is 'the kernel' or 'the system',
        abort isn't really an option.

    fprintf     __gcov_fprintf_int

        This is called in two places -- gcov_error and
        gcov_exit_open_gcda_file.  The latter hard codes the
        stream as stderr; the former calls get_gcov_error_file
        to get the stream (which defaults to stderr but can be
        overridden via an environment variable).

        I think that get_gcov_error_file should be renamed to
        __gcov_get_error_file, be made non-static, and be
        called by gcov_exit_open_gcda_file instead of hard
        coding stderr.

        For that matter, I feel that gcov_exit_open_gcda_file
        should just call __gcov_error instead of doing its
        own error reporting.  And __gcov_get_error_file and
        __gcov_error should be replaceable routines, as
        embedded systems might well have a different way of
        reporting errors.

    vfprintf    __gcov_vfprintf_int

        If gcov_exit_open_gcda_file is altered to call
        __gcov_error and __gcov_error becomes a replaceable
        routine, then fprintf and vfprintf do not need to be
        wrapped.

    malloc      __gcov_malloc_int
    free        __gcov_free_int

        Embedded applications often do memory allocation
        differently.

While I think that the above list is complete, I wouldn't be surprised
if I missed one or two.


That seems reasonable to me and can easily be achieved by a macro that
will either expand to __gcov # fn _ int, or to fn. I can imagine that.
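
A minimal sketch of such a macro, under stated assumptions: the macro name GCOV_SYS, the helper gcov_lookup and the "/flash/gcov" value are made up for illustration; only LIBGCOV_STANDALONE_ENV and the __gcov_*_int naming come from the proposal. Note that plain token pasting turns fopen into __gcov_fopen_int, not the __gcov_open_int spelling proposed for the file routines, so those names would need either that spelling or explicit per-function definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per the proposal this would come from -DLIBGCOV_STANDALONE_ENV on the
   command line; it is defined here only to keep the sketch self-contained.  */
#define LIBGCOV_STANDALONE_ENV 1

#ifdef LIBGCOV_STANDALONE_ENV
/* Standalone: route the call to a user-provided routine.  */
# define GCOV_SYS(fn) __gcov_##fn##_int
#else
/* Hosted: call the C library function directly.  */
# define GCOV_SYS(fn) fn
#endif

/* Stand-in for a routine the embedded user would supply; the returned
   value is illustrative only.  */
char *__gcov_getenv_int (const char *name)
{
  static char prefix[] = "/flash/gcov";
  return strcmp (name, "GCOV_PREFIX") == 0 ? prefix : NULL;
}

/* Example call site as it might look inside libgcov (hypothetical).  */
const char *gcov_lookup (const char *name)
{
  assert (name != NULL);
  return GCOV_SYS (getenv) (name);
}
```

With the define removed, the same call site compiles to a plain getenv call, so the hosted build is unchanged.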



Other than __gcov_open_int, there would be no conflict if the _int was
left off the end.  I put it on to emphasize that these routines were
not meant to be called by the user, but rather are provided by the
user.  Some other naming convention might be better.

There would be a new heade

Re: Commit messages and the move to git

2019-11-21 Thread Joseph Myers
On Tue, 19 Nov 2019, Eric S. Raymond wrote:

> Richard Earnshaw (lists) :
> > Nope, that was from running the go version from yesterday.  This one, to
> > be precise:  1ab3c514c6cd5e1a5d6b68a8224df299751ca637
> > 
> > This pass used to be very fast a couple of weeks back, but something
> > went in recently that's caused a major slowdown.
> > 
> > Oh, and I've been having problems with the ChangeLogs command as well.
> > It used to run fine on my machine (128G), but now it's started blowing
> > memory and taking my X server down.
> 
> That sucks.  Those were stretches of code the two guys working with me
> have been trying to speed up. Looks like that backfired.
> 
> Please file issues at  https://gitlab.com/esr/reposurgeon/issues and
> include timing reports if you can.

I see the changelogs issue is fixed (I can run a conversion past that 
point on a system with 128GB memory, with mergeinfo processing being very 
slow as described by Richard).  But then I get errors:

*** Unknown syntax: relax

followed by the "tag /branch-root|branchpoint/ delete" command giving an 
error

reposurgeon: assignments invalidated by GC

and a "script abort" in conversion.log, after which it starts writing out 
gcc.fi (I think without processing any of the rest of gcc.lift).  I don't 
know whether the above errors are bugs in reposurgeon or in the 
gcc-conversion scripts.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Commit messages and the move to git

2019-11-21 Thread Richard Earnshaw (lists)

On 21/11/2019 16:40, Joseph Myers wrote:

> On Tue, 19 Nov 2019, Eric S. Raymond wrote:
>
>> Richard Earnshaw (lists) :
>> > Nope, that was from running the go version from yesterday.  This one, to
>> > be precise:  1ab3c514c6cd5e1a5d6b68a8224df299751ca637
>> >
>> > This pass used to be very fast a couple of weeks back, but something
>> > went in recently that's caused a major slowdown.
>> >
>> > Oh, and I've been having problems with the ChangeLogs command as well.
>> > It used to run fine on my machine (128G), but now it's started blowing
>> > memory and taking my X server down.
>>
>> That sucks.  Those were stretches of code the two guys working with me
>> have been trying to speed up. Looks like that backfired.
>>
>> Please file issues at  https://gitlab.com/esr/reposurgeon/issues and
>> include timing reports if you can.
>
> I see the changelogs issue is fixed (I can run a conversion past that
> point on a system with 128GB memory, with mergeinfo processing being very
> slow as described by Richard).

This is
https://gitlab.com/esr/reposurgeon/issues/153

> But then I get errors:
>
> *** Unknown syntax: relax

Change that to

set relax

> followed by the "tag /branch-root|branchpoint/ delete" command giving an
> error
>
> reposurgeon: assignments invalidated by GC
>
> and a "script abort" in conversion.log, after which it starts writing out
> gcc.fi (I think without processing any of the rest of gcc.lift).  I don't
> know whether the above errors are bugs in reposurgeon or in the
> gcc-conversion scripts.





Re: Commit messages and the move to git

2019-11-21 Thread Eric S. Raymond
Joseph Myers :
> I see the changelogs issue is fixed (I can run a conversion past that 
> point on a system with 128GB memory, with mergeinfo processing being very 
> slow as described by Richard).  But then I get errors:
> 
> *** Unknown syntax: relax

Missing "relax" command probably means your reposurgeon is very old.
What does "version" say?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Commit messages and the move to git

2019-11-21 Thread Eric S. Raymond
Richard Earnshaw (lists) :
> > But then I get errors:
> > 
> > *** Unknown syntax: relax
> > 
> 
> Change that to
> 
> set relax

Oops.  He's right.  It used to be a command, but that changed recently
as part of a redesign of log levels and options.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




How to properly build and run testsuite?

2019-11-21 Thread Andrew Dean via gcc
I'm curious what other people are doing, because I'm never able to match the 
results that get reported to the test-results list. I created a brand new 
virtual machine running Ubuntu 18.04 (x86_64), installed the prereqs as listed 
here:  https://gcc.gnu.org/install/prerequisites.html, created the repo 
following the "Getting Started - Read Only" instructions listed here: 
https://gcc.gnu.org/wiki/GitMirror, then ran these commands from my build 
folder.
configure --disable-multilib --prefix=/home/adean/install
make
make check -k

As an example, the gcc summary for me (10.0.0 20191120) shows 
# of unexpected failures        85
# of unexpected successes       35

Whereas the most recent reported results (10.0.0 20191118) show only 2 
unexpected failures and no unexpected successes in the gcc summary.

Is it really just because I'm two days newer that ~120 regressions entered the 
picture (unlikely) or am I doing something wrong on my machine?

Thanks,
Andrew



Re: Commit messages and the move to git

2019-11-21 Thread Richard Earnshaw (lists)
On 21/11/2019 16:40, Joseph Myers wrote:
> On Tue, 19 Nov 2019, Eric S. Raymond wrote:
> 
>> Richard Earnshaw (lists) :
>> > Nope, that was from running the go version from yesterday.  This one, to
>> > be precise:  1ab3c514c6cd5e1a5d6b68a8224df299751ca637
>> > 
>> > This pass used to be very fast a couple of weeks back, but something
>> > went in recently that's caused a major slowdown.
>> > 
>> > Oh, and I've been having problems with the ChangeLogs command as well.
>> > It used to run fine on my machine (128G), but now it's started blowing
>> > memory and taking my X server down.
>> 
>> That sucks.  Those were stretches of code the two guys working with me
>> have been trying to speed up. Looks like that backfired.
>> 
>> Please file issues at  https://gitlab.com/esr/reposurgeon/issues and
>> include timing reports if you can.
> 
> I see the changelogs issue is fixed (I can run a conversion past that
> point on a system with 128GB memory, with mergeinfo processing being very
> slow as described by Richard).  But then I get errors:
> 

Eric, now that the changelogs command can take a selection set, do you
have a suggestion for how we might construct sets that are just the
merge commands, or just the copies?  Both of these seem to get the wrong
author attribution and it would be nice to exclude them.

R.

> *** Unknown syntax: relax
> 
> followed by the "tag /branch-root|branchpoint/ delete" command giving an
> error
> 
> reposurgeon: assignments invalidated by GC
> 
> and a "script abort" in conversion.log, after which it starts writing out
> gcc.fi (I think without processing any of the rest of gcc.lift).  I don't
> know whether the above errors are bugs in reposurgeon or in the
> gcc-conversion scripts.
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



Re: How to properly build and run testsuite?

2019-11-21 Thread Jonathan Wakely
On Thu, 21 Nov 2019 at 19:11, Andrew Dean via gcc  wrote:
>
> I'm curious what other people are doing, because I'm never able to match the 
> results that get reported to the test-results list. I created a brand new 
> virtual machine running Ubuntu 18.04 (x86_64), installed the prereqs as 
> listed here:  https://gcc.gnu.org/install/prerequisites.html, created the 
> repo following the "Getting Started - Read Only" instructions listed here: 
> https://gcc.gnu.org/wiki/GitMirror, then ran these commands from my build 
> folder.
> configure --disable-multilib --prefix=/home/adean/install
> make
> make check -k
>
> As an example, the gcc summary for me (10.0.0 20191120) shows
> # of unexpected failures        85
> # of unexpected successes       35

That doesn't seem too bad.

> Whereas the most recent reported results (10.0.0 20191118) show only 2 
> unexpected failures and no unexpected successes in the gcc summary.

Which results are you looking at?
Two failures sounds very low, it's probably not running the guality
tests which usually fail.

> Is it really just because I'm two days newer that ~120 regressions entered 
> the picture (unlikely) or am I doing something wrong on my machine?
>
> Thanks,
> Andrew
>


RE: How to properly build and run testsuite?

2019-11-21 Thread Andrew Dean via gcc
> > Whereas the most recent reported results (10.0.0 20191118) show only 2
> unexpected failures and no unexpected successes in the gcc summary.
> 
> Which results are you looking at?
> Two failures sounds very low, it's probably not running the guality tests 
> which
> usually fail.
> 
I searched the mailing list for x86_64-pc-linux-gnu to make sure I was 
comparing apples to apples, and this was the most recent report: 
https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01190.html


Re: How to properly build and run testsuite?

2019-11-21 Thread Jonathan Wakely
On Thu, 21 Nov 2019 at 21:30, Andrew Dean  wrote:
>
> > > Whereas the most recent reported results (10.0.0 20191118) show only 2
> > unexpected failures and no unexpected successes in the gcc summary.
> >
> > Which results are you looking at?
> > Two failures sounds very low, it's probably not running the guality tests 
> > which
> > usually fail.
> >
> I searched the mailing list for x86_64-pc-linux-gnu to make sure I was 
> comparing apples to apples, and this was the most recent report: 
> https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01190.html

Yes, I thought you might be looking at that one. That has:
# of unsupported tests 7126
which seems high. Here's a more typical set of results:
https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01255.html

The one you looked at is built on an EC2 instance, so might be missing
something (GDB?) needed for the guality tests.

So I don't think you're doing anything wrong, you just got unlucky and
looked at one which skips most of the tests that are failing for you.
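
When comparing two runs it helps to look at the unsupported count alongside the failures, as done above. The following is a minimal sketch, not part of GCC or DejaGnu, that tallies result lines from the text of a .sum file; it assumes the usual "FAIL: ", "XPASS: " and "UNSUPPORTED: " line prefixes, and the names sum_counts, starts_with and tally_text are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Counts of the result kinds discussed in this thread.  */
struct sum_counts
{
  int fail;
  int xpass;
  int unsupported;
};

/* Returns nonzero when LINE begins with PREFIX.  */
int starts_with (const char *line, const char *prefix)
{
  return strncmp (line, prefix, strlen (prefix)) == 0;
}

/* Tallies result lines in TEXT, the newline-separated contents of a
   .sum file held in memory.  */
struct sum_counts tally_text (const char *text)
{
  struct sum_counts counts = { 0, 0, 0 };
  const char *line = text;

  assert (text != NULL);
  while (*line != '\0')
    {
      if (starts_with (line, "FAIL: "))
        counts.fail++;
      else if (starts_with (line, "XPASS: "))
        counts.xpass++;
      else if (starts_with (line, "UNSUPPORTED: "))
        counts.unsupported++;

      /* Advance to the start of the next line, if there is one.  */
      const char *newline = strchr (line, '\n');
      if (newline == NULL)
        break;
      line = newline + 1;
    }
  return counts;
}
```

A run with few failures but an unusually high unsupported count, like the EC2 report, is likely skipping tests rather than passing them.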


RE: How to properly build and run testsuite?

2019-11-21 Thread Andrew Dean via gcc
> > > > Whereas the most recent reported results (10.0.0 20191118) show
> > > > only 2
> > > unexpected failures and no unexpected successes in the gcc summary.
> > >
> > > Which results are you looking at?
> > > Two failures sounds very low, it's probably not running the guality
> > > tests which usually fail.
> > >
> > I searched the mailing list for x86_64-pc-linux-gnu to make sure I was
> > comparing apples to apples, and this was the most recent report:
> > > https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01190.html
> 
> Yes, I thought you might be looking at that one. That has:
> # of unsupported tests 7126
> which seems high. Here's a more typical set of results:
> https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01255.html
> 
> The one you looked at is built on an EC2 instance, so might be missing
> something (GDB?) needed for the guality tests.
> 
> So I don't think you're doing anything wrong, you just got unlucky and looked 
> at
> one which skips most of the tests that are failing for you.

Thanks for your help!


Re: Do we need to do a loop invariant motion after loop interchange ?

2019-11-21 Thread Li Jia He




On 2019/11/21 8:10 PM, Richard Biener wrote:

On Thu, Nov 21, 2019 at 10:22 AM Li Jia He  wrote:


Hi,

I found for the follow code:

#define N 256
int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N];
void __attribute__((noinline))
double_reduc (int n)
{
for (int k = 0; k < n; k++)
{
  for (int l = 0; l < n; l++)
   {
 c[k][l] = 0;
  for (int m = 0; m < n; m++)
c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
   }
}
}

I dumped the file after loop interchange and got the following information:

 [local count: 118111600]:
# m_46 = PHI <0(7), m_45(11)>
# ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
_39 = _49 + 1;

 [local count: 955630224]:
# l_48 = PHI <0(3), l_47(12)>
# ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
c_I_I_lsm.5_18 = c[k_28][l_48];
c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
_2 = a[k_28][m_46][l_48];
_3 = d[k_28][m_46];
_4 = _2 * _3;
_5 = b[k_28][m_46][l_48];
_6 = _3 * _5;
_7 = _4 + _6;
_8 = _7 + c_I_I_lsm.5_53;
c[k_28][l_48] = _8;
l_47 = l_48 + 1;
ivtmp_40 = ivtmp_41 - 1;
if (ivtmp_40 != 0)
  goto ; [89.00%]
else
  goto ; [11.00%]

we can see '_3 = d[k_28][m_46];'  is a loop invariant.
Do we need to add a loop invariant motion pass after the loop interchange?


There is one at the end of the loop pipeline.


Hi,

The one at the end of the loop pipeline may miss some optimization
opportunities.  If we vectorize the above code (a.c.158t.vect), we
can get information similar to the following:

bb 3:
  # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
  if (_59 <= 2)
    goto bb 20;
  else
    goto bb 15;

bb 15:
  _89 = d[k_28][m_46];
  vect_cst__90 = {_89, _89, _89, _89};

bb 4:
  # l_48 = PHI  // loop l, inner loop
  vect__6.23_100 = vect_cst__99 * vect__5.22_98;
  if (ivtmp_110 < bnd.8_1)
    goto bb 12;
  else
    goto bb 17;

bb 20:
bb 18:
  _27 = d[k_28][m_46];
  if (ivtmp_12 != 0)
    goto bb 19;
  else
    goto bb 21;

Vectorization transforms the code in this case.  We can see that
‘_89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are both loop invariant
relative to loop l.  We could move ‘d[k_28][m_46]’ in front of
‘if (_59 <= 2)’ so that the value is loaded from memory once instead of
once in each branch.

The invariant motion pass at the end of the loop pipeline can't handle
this situation.  If we moved d[k_28][m_46] out of loop l into loop m
before vectorization, this situation would not arise.

--
BR,
Lijia He
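
As a rough model of the shape described above (hand-written, not vectorizer output), the sketch below loads d[k][m] once in front of the branch and feeds both the 4-lane path and the scalar path from that single value. N is reduced to 8, the lane loop merely stands in for a vector operation, and the names step, update_row and versioned_matches_reference are made up for this illustration.

```c
#include <assert.h>

#define N 8  /* assumption: small size so the sketch runs instantly */

int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N];

/* One scalar step of the interchanged loop body; dkm is the already
   loaded value of d[k][m].  */
static void step (int k, int m, int l, int dkm)
{
  int prev = (m != 0) ? c[k][l] : 0;
  c[k][l] = a[k][m][l] * dkm + dkm * b[k][m][l] + prev;
}

/* Models loop l for fixed k and m after the loop has been versioned.  */
void update_row (int k, int m, int n)
{
  assert (n <= N);
  int dkm = d[k][m];  /* loaded once, in front of the branch */
  int l = 0;

  if (n > 2)  /* stands in for the "_59 <= 2" check in the dump */
    for (; l + 4 <= n; l += 4)
      for (int lane = 0; lane < 4; lane++)  /* models one 4-lane vector op */
        step (k, m, l + lane, dkm);

  for (; l < n; l++)  /* scalar path: short trip counts and leftovers */
    step (k, m, l, dkm);
}

/* Runs the whole nest and compares it with the original k, l, m order.
   Returns 1 on agreement.  */
int versioned_matches_reference (int n)
{
  assert (n <= N);
  for (int k = 0; k < N; k++)
    for (int m = 0; m < N; m++)
      {
        d[k][m] = k - 2 * m;
        for (int l = 0; l < N; l++)
          {
            a[k][m][l] = k + m + l;
            b[k][m][l] = k * m - l;
          }
      }

  for (int k = 0; k < n; k++)
    for (int m = 0; m < n; m++)
      update_row (k, m, n);

  for (int k = 0; k < n; k++)
    for (int l = 0; l < n; l++)
      {
        int expect = 0;
        for (int m = 0; m < n; m++)
          expect += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
        if (c[k][l] != expect)
          return 0;
      }
  return 1;
}
```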




BR,
Lijia He





Re: Do we need to do a loop invariant motion after loop interchange ?

2019-11-21 Thread Richard Biener
On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He  
wrote:
>
>
>On 2019/11/21 8:10 PM, Richard Biener wrote:
>> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He 
>wrote:
>>>
>>> Hi,
>>>
>>> I found for the follow code:
>>>
>>> #define N 256
>>> int a[N][N][N], b[N][N][N];
>>> int d[N][N], c[N][N];
>>> void __attribute__((noinline))
>>> double_reduc (int n)
>>> {
>>> for (int k = 0; k < n; k++)
>>> {
>>>   for (int l = 0; l < n; l++)
>>>{
>>>  c[k][l] = 0;
>>>   for (int m = 0; m < n; m++)
>>> c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
>>>}
>>> }
>>> }
>>>
>>> I dumped the file after loop interchange and got the following
>information:
>>>
>>>  [local count: 118111600]:
>>> # m_46 = PHI <0(7), m_45(11)>
>>> # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
>>> _39 = _49 + 1;
>>>
>>>  [local count: 955630224]:
>>> # l_48 = PHI <0(3), l_47(12)>
>>> # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
>>> c_I_I_lsm.5_18 = c[k_28][l_48];
>>> c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
>>> _2 = a[k_28][m_46][l_48];
>>> _3 = d[k_28][m_46];
>>> _4 = _2 * _3;
>>> _5 = b[k_28][m_46][l_48];
>>> _6 = _3 * _5;
>>> _7 = _4 + _6;
>>> _8 = _7 + c_I_I_lsm.5_53;
>>> c[k_28][l_48] = _8;
>>> l_47 = l_48 + 1;
>>> ivtmp_40 = ivtmp_41 - 1;
>>> if (ivtmp_40 != 0)
>>>   goto ; [89.00%]
>>> else
>>>   goto ; [11.00%]
>>>
>>> we can see '_3 = d[k_28][m_46];'  is a loop invariant.
>>> Do we need to add a loop invariant motion pass after the loop
>interchange?
>> 
>> There is one at the end of the loop pipeline.
>
>Hi,
>
>The one at the end of the loop pipeline may miss some optimization
>opportunities.  If we vectorize the above code (a.c.158t.vect), we
>can get information similar to the following:
>
>bb 3:
>  # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
>   if (_59 <= 2)
> goto bb 20;
>   else
> goto bb 15;
>
>bb 15:
>   _89 = d[k_28][m_46];
>   vect_cst__90 = {_89, _89, _89, _89};
>
>bb 4:
># l_48 = PHI  // loop l, inner loop
>   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
>if (ivtmp_110 < bnd.8_1)
> goto bb 12;
>   else
> goto bb 17;
>
>bb 20:
>bb 18:
>_27 = d[k_28][m_46];
>if (ivtmp_12 != 0)
> goto bb 19;
>   else
> goto bb 21;
>
>Vectorization will do some conversions in this case.  We can see
>‘ _89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
>relative to loop l.  We can move ‘d[k_28][m_46]’ to the front of
>‘if (_59 <= 2)’ to get rid of loading data from memory in both
>branches.
>
>The one at at the end of the loop pipeline can't handle this situation.
>If we move d[k_28][m_46] from loop l to loop m before doing
>vectorization, we can get rid of this situation.

But we can't run every pass after every other pass.  With multiple passes,
ordering issues are inevitable.

Now - interchange could trigger a region-based invariant motion just for the
nest it interchanged. But that doesn't exist right now.

Richard. 


Re: Do we need to do a loop invariant motion after loop interchange ?

2019-11-21 Thread Bin.Cheng
On Fri, Nov 22, 2019 at 3:19 PM Richard Biener
 wrote:
>
> On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He  
> wrote:
> >
> >
> >On 2019/11/21 8:10 PM, Richard Biener wrote:
> >> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He 
> >wrote:
> >>>
> >>> Hi,
> >>>
> >>> I found for the follow code:
> >>>
> >>> #define N 256
> >>> int a[N][N][N], b[N][N][N];
> >>> int d[N][N], c[N][N];
> >>> void __attribute__((noinline))
> >>> double_reduc (int n)
> >>> {
> >>> for (int k = 0; k < n; k++)
> >>> {
> >>>   for (int l = 0; l < n; l++)
> >>>{
> >>>  c[k][l] = 0;
> >>>   for (int m = 0; m < n; m++)
> >>> c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
> >>>}
> >>> }
> >>> }
> >>>
> >>> I dumped the file after loop interchange and got the following
> >information:
> >>>
> >>>  [local count: 118111600]:
> >>> # m_46 = PHI <0(7), m_45(11)>
> >>> # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
> >>> _39 = _49 + 1;
> >>>
> >>>  [local count: 955630224]:
> >>> # l_48 = PHI <0(3), l_47(12)>
> >>> # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
> >>> c_I_I_lsm.5_18 = c[k_28][l_48];
> >>> c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
> >>> _2 = a[k_28][m_46][l_48];
> >>> _3 = d[k_28][m_46];
> >>> _4 = _2 * _3;
> >>> _5 = b[k_28][m_46][l_48];
> >>> _6 = _3 * _5;
> >>> _7 = _4 + _6;
> >>> _8 = _7 + c_I_I_lsm.5_53;
> >>> c[k_28][l_48] = _8;
> >>> l_47 = l_48 + 1;
> >>> ivtmp_40 = ivtmp_41 - 1;
> >>> if (ivtmp_40 != 0)
> >>>   goto ; [89.00%]
> >>> else
> >>>   goto ; [11.00%]
> >>>
> >>> we can see '_3 = d[k_28][m_46];'  is a loop invariant.
> >>> Do we need to add a loop invariant motion pass after the loop
> >interchange?
> >>
> >> There is one at the end of the loop pipeline.
> >
> >Hi,
> >
> >The one at the end of the loop pipeline may miss some optimization
> >opportunities.  If we vectorize the above code (a.c.158t.vect), we
> >can get information similar to the following:
> >
> >bb 3:
> >  # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
> >   if (_59 <= 2)
> > goto bb 20;
> >   else
> > goto bb 15;
> >
> >bb 15:
> >   _89 = d[k_28][m_46];
> >   vect_cst__90 = {_89, _89, _89, _89};
> >
> >bb 4:
> ># l_48 = PHI  // loop l, inner loop
> >   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
> >if (ivtmp_110 < bnd.8_1)
> > goto bb 12;
> >   else
> > goto bb 17;
> >
> >bb 20:
> >bb 18:
> >_27 = d[k_28][m_46];
> >if (ivtmp_12 != 0)
> > goto bb 19;
> >   else
> > goto bb 21;
> >
> >Vectorization will do some conversions in this case.  We can see
> >‘ _89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
> >relative to loop l.  We can move ‘d[k_28][m_46]’ to the front of
> >‘if (_59 <= 2)’ to get rid of loading data from memory in both
> >branches.
> >
> >The one at at the end of the loop pipeline can't handle this situation.
> >If we move d[k_28][m_46] from loop l to loop m before doing
> >vectorization, we can get rid of this situation.
>
> But we can't run every pass after every other. With multiple passes having 
> ordering issues is inevitable.
>
> Now - interchange could trigger a region based invariant motion just for the 
> nest it interchanged. But that doesn't exist right now.
With data reference/dependence information in the pass, I think it
could be quite straightforward.  I didn't realize before that we
needed it.

Thanks,
bin
>
> Richard.