Do we need to do a loop invariant motion after loop interchange ?
Hi,

I found the following for this code:

#define N 256
int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N];
void __attribute__((noinline))
double_reduc (int n)
{
  for (int k = 0; k < n; k++)
    {
      for (int l = 0; l < n; l++)
        {
          c[k][l] = 0;
          for (int m = 0; m < n; m++)
            c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
        }
    }
}

I dumped the file after loop interchange and got the following information:

  [local count: 118111600]:
  # m_46 = PHI <0(7), m_45(11)>
  # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
  _39 = _49 + 1;

  [local count: 955630224]:
  # l_48 = PHI <0(3), l_47(12)>
  # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
  c_I_I_lsm.5_18 = c[k_28][l_48];
  c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
  _2 = a[k_28][m_46][l_48];
  _3 = d[k_28][m_46];
  _4 = _2 * _3;
  _5 = b[k_28][m_46][l_48];
  _6 = _3 * _5;
  _7 = _4 + _6;
  _8 = _7 + c_I_I_lsm.5_53;
  c[k_28][l_48] = _8;
  l_47 = l_48 + 1;
  ivtmp_40 = ivtmp_41 - 1;
  if (ivtmp_40 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

We can see that '_3 = d[k_28][m_46];' is loop invariant in the inner (l)
loop. Do we need to add a loop invariant motion pass after the loop
interchange?

BR,
Lijia He
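To make the question concrete, here is a hand-written C sketch (not GCC output) of the nest after the m and l loops are interchanged, with the load of d[k][m] hoisted out of the l loop by hand. N is reduced to 8 so the arrays stay small, and the names double_reduc_ref, double_reduc_hoisted, c_ref, fill_inputs and versions_agree are hypothetical. Unlike the dump, which keeps the zeroing of c inside the loop via `m_46 != 0 ? ... : 0`, the sketch zeroes c[k][*] in a separate loop for readability.

```c
#include <assert.h>

/* Hypothetical small size; the original example uses 256.  */
#define N 8

int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N], c_ref[N][N];

/* Deterministic small inputs so the products cannot overflow.  */
static void fill_inputs (void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      {
        d[i][j] = (i * 3 + j) % 5 - 2;
        for (int l = 0; l < N; l++)
          {
            a[i][j][l] = (i + 2 * j + 3 * l) % 7 - 3;
            b[i][j][l] = (2 * i + j + l) % 4 - 1;
          }
      }
}

/* Original loop order k, l, m: d[k][m] is read on every (l, m) pair.  */
void double_reduc_ref (int n)
{
  for (int k = 0; k < n; k++)
    for (int l = 0; l < n; l++)
      {
        c_ref[k][l] = 0;
        for (int m = 0; m < n; m++)
          c_ref[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
      }
}

/* Interchanged order k, m, l with d[k][m] hoisted out of the l loop,
   so it is loaded once per m iteration instead of once per (m, l).  */
void double_reduc_hoisted (int n)
{
  for (int k = 0; k < n; k++)
    {
      for (int l = 0; l < n; l++)
        c[k][l] = 0;
      for (int m = 0; m < n; m++)
        {
          int dkm = d[k][m];   /* invariant in the l loop */
          for (int l = 0; l < n; l++)
            c[k][l] += a[k][m][l] * dkm + b[k][m][l] * dkm;
        }
    }
}

/* Runs both versions on the same inputs; returns 1 if they agree.  */
int versions_agree (int n)
{
  assert (n >= 0 && n <= N);
  fill_inputs ();
  double_reduc_ref (n);
  double_reduc_hoisted (n);
  for (int k = 0; k < n; k++)
    for (int l = 0; l < n; l++)
      if (c[k][l] != c_ref[k][l])
        return 0;
  return 1;
}
```

versions_agree (n) returns 1 when both loop orders produce the same c over the first n rows and columns; the hoisted form performs n loads of d per k instead of n * n.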
Re: Do we need to do a loop invariant motion after loop interchange ?
On Thu, Nov 21, 2019 at 10:22 AM Li Jia He wrote:
> we can see '_3 = d[k_28][m_46];' is a loop invariant.
> Do we need to add a loop invariant motion pass after the loop interchange?

There is one at the end of the loop pipeline.
Re: GCC's instrumentation and the target environment
On 11/20/19 4:14 PM, David Taylor wrote:
> Sorry for not responding sooner. Thanks Martin.
>
> Like Joel, we have a third party solution to instrumentation. Part of my
> objection to the third party solution is freedom. There are customizations
> we would like, but not having source we're at the mercy of the vendor, both
> for whether it gets done and the timing.
>
> Part of the objection is the massive amount of changes I had to make to our
> build system to integrate it and the resulting feeling of fragility. They
> are reportedly addressing the latter in a future release.
>
> By contrast, looking at GCC based instrumentation, the changes required to
> our build system are very small and easy.

Hello.

That sounds promising to me!

> Part of my purpose in posting was the belief that this problem -- wanting
> to instrument embedded code -- is not uncommon and has likely been solved
> already. And the hope that one of the solvers would feel that their
> existing solution was in a good enough shape to contribute back. Or that
> someone would point out that there is already an existing {Free | Open}
> source solution that I am overlooking.

Well, one quite similar use case is gcov in the Linux kernel; please take a
look at the kernel/gcov subfolder.

> Since no one has mentioned an existing solution, here is a first draft of
> a proposed solution to GCC instrumentation not playing well with
> embedded...
>
> NOTE: *NONE* of the following has been implemented as yet.
>
> I would ultimately like this to be something that once implemented would
> be considered for becoming part of standard GCC. So, if you see something
> that would impede that goal or if changed would improve its chances, please
> speak up.
>
> Add a new configure option --{with|without}-libgcov-standalone-env
>
> Default: without (i.e., hosted)
>
> Question: should hosted libgcov be the default for all configuration
> tuples? Or should it only be the default when there are headers? Or...?
That's a detail, so let's say we'll have an option that will come up with
wrappers (__gcov_ # fn).

> When standalone, suppress -Dinhibit_libc when building the libgcov.a files
> and add -DLIBGCOV_STANDALONE_ENV to the command line switches.
>
> Then in libgcov-driver.c, libgcov-driver-system.c and gcov-io.c, replace
> all calls of
>
>   fopen      with calls of   __gcov_open_int
>   fread                      __gcov_read_int
>   fwrite                     __gcov_write_int
>   fclose                     __gcov_close_int
>   fseek                      __gcov_seek_int
>   ftell                      __gcov_tell_int
>   setbuf                     __gcov_setbuf_int
>       Probably belongs inside __gcov_open_int instead of as a separate
>       routine.
>   getenv                     __gcov_getenv_int
>   abort                      __gcov_abort_int
>       When the application is 'the kernel' or 'the system', abort isn't
>       really an option.
>   fprintf                    __gcov_fprintf_int
>       This is called in two places -- gcov_error and
>       gcov_exit_open_gcda_file. The latter hard codes the stream as stderr;
>       the former calls get_gcov_error_file to get the stream (which
>       defaults to stderr but can be overridden via an environment
>       variable). I think that get_gcov_error_file should be renamed to
>       __gcov_get_error_file, be made non-static, and be called by
>       gcov_exit_open_gcda_file instead of hard coding stderr. For that
>       matter, I feel that gcov_exit_open_gcda_file should just call
>       __gcov_error instead of doing its own error reporting. And
>       __gcov_get_error_file and __gcov_error should be replaceable
>       routines, as embedded systems might well have a different way of
>       reporting errors.
>   vfprintf                   __gcov_vfprintf_int
>       If gcov_exit_open_gcda_file is altered to call __gcov_error and
>       __gcov_error becomes a replaceable routine, then fprintf and vfprintf
>       do not need to be wrapped.
>   malloc                     __gcov_malloc_int
>   free                       __gcov_free_int
>       Embedded applications often do memory allocation differently.
>
> While I think that the above list is complete, I wouldn't be surprised if
> I missed one or two.

That seems reasonable to me and can be easily achieved by a macro that will
either expand to __gcov # fn _ int, or to fn.

I can imagine that.
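Martin's macro suggestion can be sketched roughly as below. This is a minimal illustration assuming the proposed LIBGCOV_STANDALONE_ENV switch; GCOV_CALL, gcov_make_buffer and the bytes_requested counter are hypothetical names, and the hook bodies simply forward to malloc and free where an embedded port would plug in its own allocator. Token pasting only covers names that map one-to-one (malloc, free, getenv, abort); the proposed list maps fopen to __gcov_open_int, so the f* functions would need a two-argument form of the macro.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: in a real build this would come from
   -DLIBGCOV_STANDALONE_ENV on the command line.  */
#define LIBGCOV_STANDALONE_ENV 1

#ifdef LIBGCOV_STANDALONE_ENV
/* Standalone: GCOV_CALL (malloc) expands to __gcov_malloc_int.  */
#define GCOV_CALL(fn) __gcov_##fn##_int
#else
/* Hosted: GCOV_CALL (malloc) expands to plain malloc.  */
#define GCOV_CALL(fn) fn
#endif

/* Hooks the user would provide in a standalone environment.  Here they
   forward to the C library and count bytes, purely for illustration.  */
static size_t bytes_requested;

void *__gcov_malloc_int (size_t size)
{
  bytes_requested += size;
  return malloc (size);
}

void __gcov_free_int (void *p)
{
  free (p);
}

/* Stand-in for libgcov-internal code that needs a temporary buffer.  */
int gcov_make_buffer (size_t n)
{
  assert (n > 0);
  unsigned char *buf = GCOV_CALL (malloc) (n);
  if (!buf)
    return 0;
  buf[0] = 0;
  GCOV_CALL (free) (buf);
  return 1;
}
```

With the switch undefined, the same source compiles against malloc and free directly, which is the point of routing every call through one macro.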
> Other than __gcov_open_int, there would be no conflict if the _int was
> left off the end. I put it on to emphasize that these routines were not
> meant to be called by the user, but rather are provided by the user. Some
> other naming convention might be better.
>
> There would be a new heade
Re: Commit messages and the move to git
On Tue, 19 Nov 2019, Eric S. Raymond wrote:

> Richard Earnshaw (lists):
> > Nope, that was from running the go version from yesterday. This one, to
> > be precise: 1ab3c514c6cd5e1a5d6b68a8224df299751ca637
> >
> > This pass used to be very fast a couple of weeks back, but something
> > went in recently that's caused a major slowdown.
> >
> > Oh, and I've been having problems with the ChangeLogs command as well.
> > It used to run fine on my machine (128G), but now it's started blowing
> > memory and taking my X server down.
>
> That sucks. Those were stretches of code the two guys working with me
> have been trying to speed up. Looks like that backfired.
>
> Please file issues at https://gitlab.com/esr/reposurgeon/issues and
> include timing reports if you can.

I see the changelogs issue is fixed (I can run a conversion past that point
on a system with 128GB memory, with mergeinfo processing being very slow as
described by Richard). But then I get errors:

*** Unknown syntax: relax

followed by the "tag /branch-root|branchpoint/ delete" command giving an
error

reposurgeon: assignments invalidated by GC

and a "script abort" in conversion.log, after which it starts writing out
gcc.fi (I think without processing any of the rest of gcc.lift). I don't
know whether the above errors are bugs in reposurgeon or in the
gcc-conversion scripts.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: Commit messages and the move to git
On 21/11/2019 16:40, Joseph Myers wrote:
> I see the changelogs issue is fixed (I can run a conversion past that
> point on a system with 128GB memory, with mergeinfo processing being very
> slow as described by Richard).

This is https://gitlab.com/esr/reposurgeon/issues/153

> But then I get errors:
>
> *** Unknown syntax: relax

Change that to

set relax
Re: Commit messages and the move to git
Joseph Myers:
> I see the changelogs issue is fixed (I can run a conversion past that
> point on a system with 128GB memory, with mergeinfo processing being very
> slow as described by Richard). But then I get errors:
>
> *** Unknown syntax: relax

Missing "relax" command probably means your reposurgeon is very old. What
does "version" say?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Commit messages and the move to git
Richard Earnshaw (lists):
> > But then I get errors:
> >
> > *** Unknown syntax: relax
>
> Change that to
>
> set relax

Oops. He's right. It used to be a command, but that changed recently as part
of a redesign of log levels and options.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>
How to properly build and run testsuite?
I'm curious what other people are doing, because I'm never able to match the
results that get reported to the gcc-testresults list. I created a brand new
virtual machine running Ubuntu 18.04 (x86_64), installed the prerequisites
listed at https://gcc.gnu.org/install/prerequisites.html, created the repo
following the "Getting Started - Read Only" instructions at
https://gcc.gnu.org/wiki/GitMirror, and then ran these commands from my
build folder:

  configure --disable-multilib --prefix=/home/adean/install
  make
  make check -k

As an example, the gcc summary for me (10.0.0 20191120) shows

  # of unexpected failures      85
  # of unexpected successes     35

whereas the most recent reported results (10.0.0 20191118) show only 2
unexpected failures and no unexpected successes in the gcc summary.

Is it really just because I'm two days newer that ~120 regressions entered
the picture (unlikely), or am I doing something wrong on my machine?

Thanks,
Andrew
Re: Commit messages and the move to git
On 21/11/2019 16:40, Joseph Myers wrote:
> I see the changelogs issue is fixed (I can run a conversion past that
> point on a system with 128GB memory, with mergeinfo processing being very
> slow as described by Richard). But then I get errors:

Eric, now that the changelogs command can take a selection set, do you have
a suggestion for how we might construct sets that are just the merge
commands, or just the copies? Both of these seem to get the wrong author
attribution and it would be nice to exclude them.

R.
Re: How to properly build and run testsuite?
On Thu, 21 Nov 2019 at 19:11, Andrew Dean via gcc wrote:
> As an example, the gcc summary for me (10.0.0 20191120) shows
> # of unexpected failures      85
> # of unexpected successes     35

That doesn't seem too bad.

> Whereas the most recent reported results (10.0.0 20191118) show only 2
> unexpected failures and no unexpected successes in the gcc summary.

Which results are you looking at? Two failures sounds very low; it's
probably not running the guality tests, which usually fail.
RE: How to properly build and run testsuite?
> > Whereas the most recent reported results (10.0.0 20191118) show only 2
> > unexpected failures and no unexpected successes in the gcc summary.
>
> Which results are you looking at?
> Two failures sounds very low, it's probably not running the guality tests
> which usually fail.

I searched the mailing list for x86_64-pc-linux-gnu to make sure I was
comparing apples to apples, and this was the most recent report:
https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01190.html
Re: How to properly build and run testsuite?
On Thu, 21 Nov 2019 at 21:30, Andrew Dean wrote:
> I searched the mailing list for x86_64-pc-linux-gnu to make sure I was
> comparing apples to apples, and this was the most recent report:
> https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01190.html

Yes, I thought you might be looking at that one. That has:

# of unsupported tests 7126

which seems high. Here's a more typical set of results:
https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01255.html

The one you looked at is built on an EC2 instance, so it might be missing
something (GDB?) needed for the guality tests.

So I don't think you're doing anything wrong; you just got unlucky and
looked at one which skips most of the tests that are failing for you.
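When comparing two runs, the number of skipped tests matters as much as the number of failures. Below is a minimal C sketch that tallies FAIL, XPASS and UNSUPPORTED result lines from the text of a DejaGnu .sum file such as gcc.sum, already read into memory. The struct and function names are hypothetical, and only these three line prefixes are counted; XFAIL and the other result kinds are ignored.

```c
#include <assert.h>
#include <string.h>

/* Counts for the result kinds discussed in this thread.  */
struct sum_counts
{
  int fail;         /* "FAIL: " lines: unexpected failures   */
  int xpass;        /* "XPASS: " lines: unexpected successes */
  int unsupported;  /* "UNSUPPORTED: " lines: tests not run  */
};

static int starts_with (const char *line, const char *prefix)
{
  return strncmp (line, prefix, strlen (prefix)) == 0;
}

/* Walks the .sum text line by line and counts result lines by prefix.
   "XFAIL: " does not match "FAIL: " because matching is anchored at
   the start of each line.  */
struct sum_counts tally_sum (const char *text)
{
  struct sum_counts c = { 0, 0, 0 };
  assert (text != NULL);
  const char *line = text;
  while (line && *line)
    {
      if (starts_with (line, "FAIL: "))
        c.fail++;
      else if (starts_with (line, "XPASS: "))
        c.xpass++;
      else if (starts_with (line, "UNSUPPORTED: "))
        c.unsupported++;
      const char *nl = strchr (line, '\n');
      line = nl ? nl + 1 : NULL;
    }
  return c;
}
```

A run that reports thousands of UNSUPPORTED lines, like the EC2 report above, can show far fewer FAIL lines simply because those tests never ran.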
RE: How to properly build and run testsuite?
> Yes, I thought you might be looking at that one. That has:
> # of unsupported tests 7126
> which seems high. Here's a more typical set of results:
> https://gcc.gnu.org/ml/gcc-testresults/2019-11/msg01255.html
>
> The one you looked at is built on an EC2 instance, so might be missing
> something (GDB?) needed for the guality tests.
>
> So I don't think you're doing anything wrong, you just got unlucky and
> looked at one which skips most of the tests that are failing for you.

Thanks for your help!
Re: Do we need to do a loop invariant motion after loop interchange ?
On 2019/11/21 8:10 PM, Richard Biener wrote:
> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He wrote:
>> we can see '_3 = d[k_28][m_46];' is a loop invariant.
>> Do we need to add a loop invariant motion pass after the loop interchange?
>
> There is one at the end of the loop pipeline.

Hi,

The one at the end of the loop pipeline may miss some optimization
opportunities. If we vectorize the above code (a.c.158t.vect), we can get
information similar to the following:

bb 3:
  # m_46 = PHI <0(7), m_45(11)>   // loop m, outer loop
  if (_59 <= 2)
    goto bb 20;
  else
    goto bb 15;

bb 15:
  _89 = d[k_28][m_46];
  vect_cst__90 = {_89, _89, _89, _89};

bb 4:
  # l_48 = PHI   // loop l, inner loop
  vect__6.23_100 = vect_cst__99 * vect__5.22_98;
  if (ivtmp_110 < bnd.8_1)
    goto bb 12;
  else
    goto bb 17;

bb 20:
bb 18:
  _27 = d[k_28][m_46];
  if (ivtmp_12 != 0)
    goto bb 19;
  else
    goto bb 21;

Vectorization will do some conversions in this case. We can see that
‘_89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
relative to loop l.
We can move ‘d[k_28][m_46]’ in front of ‘if (_59 <= 2)’ to avoid loading it
from memory in both branches.

The one at the end of the loop pipeline can't handle this situation. If we
move d[k_28][m_46] from loop l to loop m before doing vectorization, we can
avoid this situation.

-- 
BR,
Lijia He
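The effect described above can be approximated in plain C for the l loop of a single m iteration. This is a hand-written sketch, not vectorizer output: VF, the `n >= VF` guard (standing in for `_59 <= 2`, whose exact meaning the dump does not show), the inner j loop (standing in for one 4-lane vector operation) and all function names are assumptions. The first function reloads d[k][m] in the scalar epilogue on every iteration, as `_27` does; the second loads it once before the guard so both paths share it. `restrict` encodes what GCC knows in the original, that c and d are distinct arrays, which is what makes the hoist legal.

```c
#include <assert.h>

#define VF 4  /* hypothetical vectorization factor (4 lanes, as in vect_cst__90) */

/* Roughly the shape in the dump: the vector path loads d once (_89),
   but the scalar epilogue reloads it on every iteration (_27).  */
void l_loop_as_vectorized (int *restrict c_row, const int *restrict a_row,
                           const int *restrict b_row,
                           const int *restrict dkm_p, int n)
{
  int l = 0;
  if (n >= VF)                       /* stand-in for the _59 <= 2 guard */
    {
      int dkm = *dkm_p;              /* like _89 in bb 15 */
      for (; l + VF <= n; l += VF)
        for (int j = 0; j < VF; j++) /* stands in for one vector op */
          c_row[l + j] += a_row[l + j] * dkm + b_row[l + j] * dkm;
    }
  for (; l < n; l++)                 /* scalar epilogue / small-n path */
    {
      int dkm = *dkm_p;              /* like _27 in bb 18: reloaded */
      c_row[l] += a_row[l] * dkm + b_row[l] * dkm;
    }
}

/* Same computation with the invariant load moved ahead of the guard.  */
void l_loop_hoisted (int *restrict c_row, const int *restrict a_row,
                     const int *restrict b_row,
                     const int *restrict dkm_p, int n)
{
  int dkm = *dkm_p;                  /* single load shared by both paths */
  int l = 0;
  if (n >= VF)
    for (; l + VF <= n; l += VF)
      for (int j = 0; j < VF; j++)
        c_row[l + j] += a_row[l + j] * dkm + b_row[l + j] * dkm;
  for (; l < n; l++)
    c_row[l] += a_row[l] * dkm + b_row[l] * dkm;
}

/* Runs both shapes on identical inputs; returns 1 if results match.  */
int paths_agree (int n)
{
  int a_row[16], b_row[16], c1[16], c2[16];
  int dkm = 3;
  assert (n >= 0 && n <= 16);
  for (int i = 0; i < 16; i++)
    {
      a_row[i] = i - 5;
      b_row[i] = 2 * i + 1;
      c1[i] = c2[i] = i;
    }
  l_loop_as_vectorized (c1, a_row, b_row, &dkm, n);
  l_loop_hoisted (c2, a_row, b_row, &dkm, n);
  for (int i = 0; i < 16; i++)
    if (c1[i] != c2[i])
      return 0;
  return 1;
}
```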
Re: Do we need to do a loop invariant motion after loop interchange ?
On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He wrote:
>The one at the end of the loop pipeline can't handle this situation.
>If we move d[k_28][m_46] from loop l to loop m before doing
>vectorization, we can get rid of this situation.

But we can't run every pass after every other. With multiple passes, having
ordering issues is inevitable.

Now - interchange could trigger a region-based invariant motion just for the
nest it interchanged. But that doesn't exist right now.

Richard.
Re: Do we need to do a loop invariant motion after loop interchange ?
On Fri, Nov 22, 2019 at 3:19 PM Richard Biener wrote:
> But we can't run every pass after every other. With multiple passes having
> ordering issues is inevitable.
>
> Now - interchange could trigger a region based invariant motion just for the
> nest it interchanged. But that doesn't exist right now.

With data reference/dependence information in the pass, I think it could be
quite straightforward. I didn't realize that we needed it before.

Thanks,
bin
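As a toy illustration of why data-reference information makes a region-based invariant motion straightforward, here is a hypothetical C sketch of the check it could make for the interchanged nest k { m { l } }: a load is treated as invariant in a loop if that loop's induction variable does not appear in its subscripts and no store in the loop body writes the same array. The struct, the bitmask encoding of subscript variables, and the "same array means conflict" rule are simplifying assumptions; GCC's real dependence analysis is far more precise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array ids and loop bits for the interchanged nest.  */
enum { ARR_A, ARR_B, ARR_C, ARR_D };
enum { LOOP_K = 1u << 0, LOOP_M = 1u << 1, LOOP_L = 1u << 2 };

/* Toy data reference: the array it touches, the loops whose induction
   variables appear in its subscripts, and whether it is a store.  */
struct toy_dr
{
  int array_id;
  unsigned iv_mask;
  bool is_store;
};

/* Conservative invariance test for a load REF within loop LOOP, whose
   body contains the N_REFS references in BODY.  */
bool load_invariant_in (const struct toy_dr *ref, unsigned loop,
                        const struct toy_dr *body, size_t n_refs)
{
  assert (ref != NULL && !ref->is_store);
  if (ref->iv_mask & loop)        /* address changes with this loop */
    return false;
  for (size_t i = 0; i < n_refs; i++)
    if (body[i].is_store && body[i].array_id == ref->array_id)
      return false;               /* possible write to the same array */
  return true;
}
```

For the l-loop body from the dump, d[k][m] passes the test, while a[k][m][l] and the c[k][l] load fail because their subscripts use l.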