On Thu, Mar 31, 2016 at 6:43 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, Mar 29, 2016 at 9:37 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Mon, Mar 28, 2016 at 9:57 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> Sorry, Should have replied to gcc-patches list. >>> >>> Thanks, >>> bin >>> >>> ---------- Forwarded message ---------- >>> From: "Bin.Cheng" <amker.ch...@gmail.com> >>> Date: Tue, 29 Mar 2016 03:55:04 +0800 >>> Subject: Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking >>> DR against its innermost loop bahavior if possible >>> To: Richard Biener <richard.guent...@gmail.com> >>> >>> On 3/17/16, Richard Biener <richard.guent...@gmail.com> wrote: >>>> On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>>> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener >>>>> <richard.guent...@gmail.com> wrote: >>>>>> >>>>>> Hmm. >>>>> Hi, >>>>> Thanks for reviewing. >>>>>> >>>>>> + equal_p = true; >>>>>> + if (e1->base_address && e2->base_address) >>>>>> + equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0); >>>>>> + if (e1->offset && e2->offset) >>>>>> + equal_p &= operand_equal_p (e1->offset, e2->offset, 0); >>>>>> >>>>>> surely better to return false early. >>>>>> >>>>>> I think we don't want this in tree-data-refs.h also because of ... >>>>>> >>>>>> @@ -615,15 +619,29 @@ >>>>>> hash_memrefs_baserefs_and_store_DRs_read_written_info >>>>>> (data_reference_p a) >>>>>> data_reference_p *master_dr, *base_master_dr;and REALPART) before >>>>>> creating the DR (or adjust the equality function >>>>> and hashing >>>>>> tree ref = DR_REF (a); >>>>>> tree base_ref = DR_BASE_OBJECT (a); >>>>>> + innermost_loop_behavior *innermost = &DR_INNERMOST (a); >>>>>> tree ca = bb_predicate (gimple_bb (DR_STMT (a))); >>>>>> bool exist1, exist2; >>>>>> >>>>>> - while (TREE_CODE (ref) == COMPONENT_REF >>>>>> - || TREE_CODE (ref) == IMAGPART_EXPR >>>>>> - || TREE_CODE (ref) == REALPART_EXPR) >>>>>> - ref = TREE_OPERAND (ref, 0); >>>>>> + /* If reference in DR has innermost loop behavior and it is not >>>>>> + a compound memory reference, we store it to innermost_DR_map, >>>>>> + otherwise to ref_DR_map. */ >>>>>> + if (TREE_CODE (ref) == COMPONENT_REF >>>>>> + || TREE_CODE (ref) == IMAGPART_EXPR >>>>>> + || TREE_CODE (ref) == REALPART_EXPR >>>>>> + || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a) >>>>>> + || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a))) >>>>>> + { >>>>>> + while (TREE_CODE (ref) == COMPONENT_REF >>>>>> + || TREE_CODE (ref) == IMAGPART_EXPR >>>>>> + || TREE_CODE (ref) == REALPART_EXPR) >>>>>> + ref = TREE_OPERAND (ref, 0); >>>>>> + >>>>>> + master_dr = &ref_DR_map->get_or_insert (ref, &exist1); >>>>>> + } >>>>>> + else >>>>>> + master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1); >>>>>> >>>>>> we don't want an extra hashmap but replace ref_DR_map entirely. So we'd >>>>>> need to >>>>>> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART >>>>>> and REALPART) before creating the DR (or adjust the equality function >>>>>> and hashing >>>>>> to disregard them which means subtracting their offset from DR_INIT. >>>>> I am not sure if I understand correctly. But for component reference, >>>>> it is the base object that we want to record/track. For example, >>>>> >>>>> for (i = 0; i < N; i++) { >>>>> m = *data++; >>>>> >>>>> m1 = p1->x - m; >>>>> m2 = p2->x + m; >>>>> >>>>> p3->y = (m1 >= m2) ? p1->y : p2->y; >>>>> >>>>> p1++; >>>>> p2++; >>>>> p3++; >>>>> } >>>>> We want to infer that reads of p1/p2 in condition statement won't trap >>>>> because there are unconditional reads of the structures, though the >>>>> unconditional reads are actual of other sub-objects. Here it is the >>>>> invariant part of address that we want to track. >>>> >>>> Well, the variant parts - we want to strip invariant parts as far as we can >>>> (offsetof (x) and offsetof (y)) >>>> >>>>> Also illustrated by this example, we can't rely on data-ref analyzer >>>>> here. Because in gathering/scattering cases, the address could be not >>>>> affine at all. >>>> >>>> Sure, but that's a different issue. >>>> >>>>>> >>>>>> To adjust the references we collect you'd maybe could use a callback >>>>>> to get_references_in_stmt >>>>>> to adjust them. >>>>>> >>>>>> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple >>>>>> as >>>>> Is this a part of the method you suggested above, or is it an >>>>> alternative one? If it's the latter, then I have below questions >>>>> embedded. >>>> >>>> It is an alternative to adding a hook to get_references_in_stmt and >>>> probably "easier". >>>> >>>>>> >>>>>> Index: tree-if-conv.c >>>>>> =================================================================== >>>>>> --- tree-if-conv.c (revision 234215) >>>>>> +++ tree-if-conv.c (working copy) >>>>>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo >>>>>> >>>>>> for (i = 0; refs->iterate (i, &dr); i++) >>>>>> { >>>>>> + tree *refp = &DR_REF (dr); >>>>>> + while ((TREE_CODE (*refp) == COMPONENT_REF >>>>>> + && TREE_OPERAND (*refp, 2) == NULL_TREE) >>>>>> + || TREE_CODE (*refp) == IMAGPART_EXPR >>>>>> + || TREE_CODE (*refp) == REALPART_EXPR) >>>>>> + refp = &TREE_OPERAND (*refp, 0); >>>>>> + if (refp != &DR_REF (dr)) >>>>>> + { >>>>>> + tree saved_base = *refp; >>>>>> + *refp = integer_zero_node; >>>>>> + >>>>>> + if (DR_INIT (dr)) >>>>>> + { >>>>>> + tree poffset; >>>>>> + int punsignedp, preversep, pvolatilep; >>>>>> + machine_mode pmode; >>>>>> + HOST_WIDE_INT pbitsize, pbitpos; >>>>>> + get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, >>>>>> &poffset, >>>>>> + &pmode, &punsignedp, &preversep, >>>>>> &pvolatilep, >>>>>> + false); >>>>>> + gcc_assert (poffset == NULL_TREE); >>>>>> + >>>>>> + DR_INIT (dr) >>>>>> + = wide_int_to_tree (ssizetype, >>>>>> + wi::sub (DR_INIT (dr), >>>>>> + pbitpos / BITS_PER_UNIT)); >>>>>> + } >>>>>> + >>>>>> + *refp = saved_base; >>>>>> + DR_REF (dr) = *refp; >>>>>> + } >>>>> Looks to me the code is trying to resolve difference between two (or >>>>> more) component references, which is DR_INIT in the code. But DR_INIT >>>>> is not the only thing needs to be handled. For a structure containing >>>>> two sub-arrays, DR_OFFSET may be different too. >>>> >>>> Yes, but we can't say that if >>>> >>>> a->a[i] >>>> >>>> doesn't trap that then >>>> >>>> a->b[i] >>>> >>>> doesn't trap either. We can only "strip" outermost >>>> non-variable-offset components. >>>> >>>> But maybe I'm missing what example you are thinking of. >>> Hmm, this was the case I meant. What I don't understand is current >>> code logic does infer trap information for a.b[i] from a.a[i]. Given >>> below example: >>> struct str >>> { >>> int a[10]; >>> int b[20]; >>> char c; >>> }; >>> >>> void bar (struct str *); >>> int foo (int x, int n) >>> { >>> int i; >>> struct str s; >>> bar (&s); >>> for (i = 0; i < n; i++) >>> { >>> s.a[i] = s.b[i]; >>> if (x > i) >>> s.b[i] = 0; >>> } >>> bar (&s); >>> return 0; >>> } >>> The loop is convertible because of below code in function >>> ifcvt_memrefs_wont_trap: >>> >>> /* If a is unconditionally accessed then ... */ >>> if (DR_RW_UNCONDITIONALLY (*master_dr)) >>> { >>> /* an unconditional read won't trap. */ >>> if (DR_IS_READ (a)) >>> return true; >>> >>> /* an unconditionaly write won't trap if the base is written >>> to unconditionally. */ >>> if (base_master_dr >>> && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) >>> return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); >>> else >>> { >>> /* or the base is know to be not readonly. */ >>> tree base_tree = get_base_address (DR_REF (a)); >>> if (DECL_P (base_tree) >>> && decl_binds_to_current_def_p (base_tree) >>> && ! TREE_READONLY (base_tree)) >>> return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); >>> } >>> } >>> It is the main object '&s' that is recorded in base_master_dr, so >>> s.b[i] is considered not trap. Even it's not, I suppose >>> get_base_address will give same result? >> >> Well, for this case it sees that s.b[i] is read from so it can't be an >> out-of-bound >> access. And s.a[i] is written to unconditionally so 's' cannot be a >> readonly object. >> With both pieces of information we can conclude that s.b[i] = 0 doesn't trap. > > Hi Richard, > Attachment is the updated patch. I made below changes wrto your > review comments: > 1) Moved hash class for innermost loop behavior from tree-data-ref.h > to tree-if-conv.c. > I also removed check on innermost_loop_behavior.aligned_to which > seems redundant to me because it's computed from offset and we have > already checked equality for offset. > 2) Replaced ref_DR_map with new map innermost_DR_map. > 3) Post-processed DR in if_convertible_loop_p_1 for compound reference > or references don't have innermost behavior. This cleans up following > code using DR information. > 4) Added a vectorization test for PR56625. > > I didn't incorporate your proposed code because I think it handles > COMPONENT_REF of structure array like struct_arr[i].x/struct_arr[i].y.
But the previous code already handled it, so not handling it would be a regression. Note that I think your patch will handle it as well given you hash innermost behavior. > Looks to me it is another ifcvt opportunity different from PR69489. > Anyway, fix is easy, I can send another patch in GCC7. > > Bootstrap and test on x86_64/AArch64, so any comments on this version? Looks good to me. Richard. > Thanks, > bin > > 2016-03-30 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/56625 > PR tree-optimization/69489 > * tree-data-ref.h (DR_INNERMOST): New macro. > * tree-if-conv.c (innermost_loop_behavior_hash): New class for > hashing struct innermost_loop_behavior. > (ref_DR_map): Remove. > (innermost_DR_map): New map. > (baseref_DR_map): Revise comment. > (hash_memrefs_baserefs_and_store_DRs_read_written_info): Store DR > to innermost_DR_map accroding to its innermost loop behavior. > (ifcvt_memrefs_wont_trap): Get DR from innermost_DR_map according > to its innermost loop behavior. > (if_convertible_loop_p_1): Remove intialization for ref_DR_map. > Add initialization for innermost_DR_map. Record memory reference > in DR_BASE_ADDRESS if the reference is compound one or it doesn't > have innermost loop behavior. > (if_convertible_loop_p): Remove release for ref_DR_map. Release > innermost_DR_map. > > gcc/testsuite/ChangeLog > 2016-03-30 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/56625 > PR tree-optimization/69489 > * gcc.dg/vect/pr56625.c: New test. > * gcc.dg/tree-ssa/ifc-pr69489-1.c: New test.