On July 4, 2015 4:17:02 PM GMT+02:00, Jakub Jelinek <ja...@redhat.com> wrote: >On Sat, Jul 04, 2015 at 09:20:04AM +0200, Jakub Jelinek wrote: >> On Fri, Jul 03, 2015 at 03:21:47PM +0200, Marek Polacek wrote: >> > This patch implements a new pass, called laddress, which deals with >> > lowering ADDR_EXPR assignments. Such lowering ought to help the >> > vectorizer, but it also could expose more CSE opportunities, maybe >> > help reassoc, etc. It's only active when optimize != 0. >> > >> > So e.g. >> > _1 = (sizetype) i_9; >> > _7 = _1 * 4; >> > _4 = &b + _7; >> > instead of >> > _4 = &b[i_9]; >> > >> > This triggered 14105 times during the regtest and 6392 times during >> > the bootstrap. >> > >> > The fallout (at least on x86_64) is surprisingly small, i.e. none, >just >> > gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is >due >> > to a bug in the vectorizer. Jakub has a patch and knows the >details. >> > As the test shows, we're now able to vectorize ADDR_EXPR of >non-invariants >> > (that was the motivation of this pass). >> >> Just FYI, while bootstrapping/regtesting your patch together with the >one >> I've posted and another one to vectorize pr59984.c better, I've >noticed there >> is another regression with your patch (reverting my patches doesn't >help, >> disabling your gate does help): >> >> +FAIL: libgomp.c/simd-3.c execution test >> +FAIL: libgomp.c++/simd-3.C execution test >> >> on both x86_64-linux and i686-linux (at least on AVX capable box). >> Most likely hitting another latent vectorizer issue, haven't analyzed >it >> yet. > >Here is a fix for that, bootstrapped/regtested on x86_64-linux and >i686-linux on top of Marek's patch and my two other patches, ok for >trunk?
OK. Thanks, Richard. >The problem was that in loops that don't need any scalar post-loop the >GOMP_SIMD_LAST_LANE value was wrong - 0 instead of vf - 1. > >2015-07-04 Jakub Jelinek <ja...@redhat.com> > > PR tree-optimization/66718 > * tree-vect-stmts.c (vectorizable_call): Replace uses of > GOMP_SIMD_LANE outside of loop with vf - 1 rather than 0. > >--- gcc/tree-vect-stmts.c.jj 2015-07-03 20:43:42.000000000 +0200 >+++ gcc/tree-vect-stmts.c 2015-07-04 14:08:18.659356110 +0200 >@@ -2601,6 +2601,30 @@ vectorizable_call (gimple gs, gimple_stm > lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info)); > else > lhs = gimple_call_lhs (stmt); >+ >+ if (gimple_call_internal_p (stmt) >+ && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE) >+ { >+ /* Replace uses of the lhs of GOMP_SIMD_LANE call outside the >loop >+ with vf - 1 rather than 0, that is the last iteration of the >+ vectorized loop. */ >+ imm_use_iterator iter; >+ use_operand_p use_p; >+ gimple use_stmt; >+ FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) >+ { >+ basic_block use_bb = gimple_bb (use_stmt); >+ if (use_bb >+ && !flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo), >use_bb)) >+ { >+ FOR_EACH_IMM_USE_ON_STMT (use_p, iter) >+ SET_USE (use_p, build_int_cst (TREE_TYPE (lhs), >+ ncopies * nunits_out - 1)); >+ update_stmt (use_stmt); >+ } >+ } >+ } >+ > new_stmt = gimple_build_assign (lhs, build_zero_cst (type)); > set_vinfo_for_stmt (new_stmt, stmt_info); > set_vinfo_for_stmt (stmt, NULL); > > > Jakub