Hi Christophe, > -----Original Message----- > From: Christophe Lyon [mailto:christophe.l...@linaro.org] > Sent: Tuesday, May 24, 2016 8:45 PM > To: Kumar, Venkataramanan <venkataramanan.ku...@amd.com> > Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org > Subject: Re: [Patch V2] Fix SLP PR58135. > > Hi Venkat, > > > On 23 May 2016 at 11:54, Kumar, Venkataramanan > <venkataramanan.ku...@amd.com> wrote: > > Hi Richard, > > > >> -----Original Message----- > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > >> Sent: Thursday, May 19, 2016 4:08 PM > >> To: Kumar, Venkataramanan <venkataramanan.ku...@amd.com> > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: Re: [Patch V2] Fix SLP PR58135. > >> > >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan > >> <venkataramanan.ku...@amd.com> wrote: > >> > Hi Richard, > >> > > >> >> -----Original Message----- > >> >> From: Richard Biener [mailto:richard.guent...@gmail.com] > >> >> Sent: Tuesday, May 17, 2016 5:40 PM > >> >> To: Kumar, Venkataramanan <venkataramanan.ku...@amd.com> > >> >> Cc: gcc-patches@gcc.gnu.org > >> >> Subject: Re: [Patch V2] Fix SLP PR58135. > >> >> > >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan > >> >> <venkataramanan.ku...@amd.com> wrote: > >> >> > Hi Richard, > >> >> > > >> >> > I created the patch by passing -b option to git. Now the patch > >> >> > is more > >> >> readable. > >> >> > > >> >> > As per your suggestion I tried to fix the PR by splitting the > >> >> > SLP store group at > >> >> vector boundary after the SLP tree is built. > >> >> > > >> >> > Boot strap PASSED on x86_64. > >> >> > Checked the patch with check_GNU_style.sh. > >> >> > > >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. > >> >> > Hence it > >> >> generated 2 more vzeroupper. > >> >> > As recommended I adjusted the test case by adding > >> >> > -fno-tree-slp-vectorize > >> >> to make it as expected after loop vectorization. > >> >> > > >> >> > The following tests are now passing. > >> >> > > >> >> > ------ Snip----- > >> >> > Tests that now work, but didn't before: > >> >> > > >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects > >> >> > scan-tree-dump-times > >> >> > slp2 "basic block vectorized" 1 > >> >> > > >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block > >> >> > vectorized" 1 > >> >> > > >> >> > New tests that PASS: > >> >> > > >> >> > gcc.dg/vect/pr58135.c (test for excess errors) > >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess > >> >> > errors) > >> >> > > >> >> > ------ Snip----- > >> >> > > >> >> > ChangeLog > >> >> > > >> >> > 2016-05-14 Venkataramanan Kumar > >> >> <venkataramanan.ku...@amd.com> > >> >> > PR tree-optimization/58135 > >> >> > * tree-vect-slp.c: When group size is not multiple of vector > >> >> > size, > >> >> > allow splitting of store group at vector boundary. > >> >> > > >> >> > Test suite ChangeLog > >> >> > 2016-05-14 Venkataramanan Kumar > >> >> <venkataramanan.ku...@amd.com> > >> >> > * gcc.dg/vect/bb-slp-19.c: Remove XFAIL. > >> >> > * gcc.dg/vect/pr58135.c: Add new. > >> >> > * gfortran.dg/pr46519-1.f: Adjust test case. > >> >> > > >> >> > The attached patch Ok for trunk? > >> >> > >> >> > >> >> Please avoid the excessive vertical space around the > >> >> vect_build_slp_tree > >> call. > >> > Yes fixed in the attached patch. > >> >> > >> >> + /* Calculate the unrolling factor. */ > >> >> + unrolling_factor = least_common_multiple > >> >> + (nunits, group_size) / group_size; > >> >> ... > >> >> + else > >> >> { > >> >> /* Calculate the unrolling factor based on the smallest type. > >> >> */ > >> >> if (max_nunits > nunits) > >> >> - unrolling_factor = least_common_multiple (max_nunits, > group_size) > >> >> - / group_size; > >> >> + unrolling_factor > >> >> + = least_common_multiple (max_nunits, > >> >> + group_size)/group_size; > >> >> > >> >> please compute the "correct" unroll factor immediately and move > >> >> the "unrolling of BB required" error into the if() case by > >> >> post-poning the nunits < group_size check (and use max_nunits here). > >> >> > >> > Yes fixed in the attached patch. > >> > > >> >> + if (is_a <bb_vec_info> (vinfo) > >> >> + && nunits < group_size > >> >> + && unrolling_factor != 1 > >> >> + && is_a <bb_vec_info> (vinfo)) > >> >> + { > >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> >> + "Build SLP failed: store group " > >> >> + "size not a multiple of the vector size " > >> >> + "in basic block SLP\n"); > >> >> + /* Fatal mismatch. */ > >> >> + matches[nunits] = false; > >> >> > >> >> this is too pessimistic - you want to add the extra 'false' at > >> >> group_size / max_nunits * max_nunits. > >> > Yes fixed in attached patch. > >> > > >> >> > >> >> It looks like you leak 'node' in the if () path as well. You need > >> >> > >> >> vect_free_slp_tree (node); > >> >> loads.release (); > >> >> > >> >> thus treat it as a failure case. > >> > > >> > Yes fixed. I added an else part before scalar_stmts.release call > >> > for the case > >> when SLP tree is not built. This avoids double freeing. > >> > Bootstrapped and reg tested on X86_64. > >> > > > This patch is causing regressions on armeb: > http://people.linaro.org/~christophe.lyon/cross- > validation/gcc/trunk/236591/report-build-info.html > The following fortran tests now fail at runtime: > > gfortran.dg/c_f_pointer_logical.f03 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > gfortran.dg/c_f_pointer_logical.f03 -O3 -g execution test > gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test > > Can you have a look? >
Sure. I will take a look. > Thanks, > > Christophe. > > > > >> > Ok for trunk ? > >> > >> + /*Calculate the unrolling factor based on the smallest type. */ > >> + unrolling_factor > >> > >> Space after /* missing. > >> > >> + unrolling_factor > >> + = least_common_multiple (max_nunits, group_size)/group_size; > >> > >> spaces before/after the '/' > >> > >> + if (max_nunits > nunits > >> + && unrolling_factor != 1 > >> + && is_a <bb_vec_info> (vinfo)) > >> { > >> if (dump_enabled_p ()) > >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> - "Build SLP failed: unrolling required in > >> basic" > >> - " block SLP\n"); > >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, > >> + vect_location, > >> + "Build SLP failed: unrolling " > >> + "required in basic block SLP\n"); > >> vect_free_slp_tree (node); > >> loads.release (); > >> return false; > >> } > >> > >> + if (is_a <bb_vec_info> (vinfo) > >> + && max_nunits < group_size > >> + && unrolling_factor != 1) > >> > >> please merge these. I think you have a not covered case of > >> max_nunits == nunits, max_nunits > group_size. > >> So do > >> > >> if (unrolling_factor != 1 > >> && is_a <...>) > >> { > >> if (max_nunits >= group_size) > >> { > >> first error > >> } > >> ... signal fatal mismatch instead... > >> } > >> else > >> { > >> > >> Ok with that change. > > > > Thank you for the review. I updated and patch and up streamed it to trunk. > > Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582 > > > > Regards, > > Venkat. Regards, Venkat.