On Tue, Jun 27, 2017 at 1:44 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>> Rebased V3 for changes in previous patches.  Bootstap and test on
>> x86_64 and aarch64.
>
> why is ldist-12.c no longer distributed?  your comment says it doesn't expose
> more "parallelism" but the point is to reduce memory bandwith requirements
> which it clearly does.
>
> Likewise for -13.c, -14.c.  -4.c may be a questionable case but the wording
> of "parallelism" still confuses me.
>
> Can you elaborate on that.  Now onto the patch:
Given we don't model data locality or memory bandwidth, whether
distribution enables loops that can be executed paralleled becomes the
major criteria for distribution.  BTW, I think a good memory stream
optimization model shouldn't consider small loops as in ldist-12.c,
etc., appropriate for distribution.

>
> +   Loop distribution is the dual of loop fusion.  It separates statements
> +   of a loop (or loop nest) into multiple loops (or loop nests) with the
> +   same loop header.  The major goal is to separate statements which may
> +   be vectorized from those that can't.  This pass implements distribution
> +   in the following steps:
>
> misses the goal of being a memory stream optimization, not only a 
> vectorization
> enabler.  distributing a loop can also reduce register pressure.
I will revise the comment, but as explained, enabling more
vectorization is the major criteria for distribution to some extend
now.

>
> You introduce ldist_alias_id in struct loop (probably in 01/n which I
> didn't look
> into yet).  If you don't use that please introduce it separately.
Hmm, yes it is introduced in patch [01/n] and set in this patch.

>
> +             /* Be conservative.  If data references are not well analyzed,
> +                or the two data references have the same base address and
> +                offset, add dependence and consider it alias to each other.
> +                In other words, the dependence can not be resolved by
> +                runtime alias check.  */
> +             if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
> +                 || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
> +                 || !DR_INIT (dr1) || !DR_INIT (dr2)
> +                 || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
> +                 || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
> +                 || res == 0)
>
> ISTR a helper that computes whether we can handle a runtime alias check for
> a specific case?
I guess you mean runtime_alias_check_p that I factored out previously?
 Unfortunately, it's factored out vectorizer's usage and doesn't fit
here straightforwardly.  Shall I try to further generalize the
interface as independence patch to this one?

>
> +  /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
> +  if (flag_tree_loop_vectorize)
> +    {
>
> so at this point I'd condition the whole runtime-alias check generating
> on flag_tree_loop_vectorize.  You seem to support versioning w/o
> that here but in other places disable versioning w/o flag_tree_loop_vectorize.
> That looks somewhat inconsistent...
It is a bit complicated.  In function version_for_distribution_p, we have
+
+  /* Need to version loop if runtime alias check is necessary.  */
+  if (alias_ddrs->length () > 0)
+    return true;
+
+  /* Don't version the loop with call to IFN_LOOP_DIST_ALIAS if
+     vectorizer is not enable because no other pass can fold it.  */
+  if (!flag_tree_loop_vectorize)
+    return false;
+

It means we also versioning loops even if runtime alias check is
unnecessary.  I did this because we lack cost model and current
distribution may result in too many distribution?  If that's the case,
at least vectorizer will remove distributed version loop and fall back
to the original one.  Hmm, shall I change it into below code:
+
+  /* Need to version loop if runtime alias check is necessary.  */
+  if (alias_ddrs->length () == 0)
+    return false;
+
+  /* Don't version the loop with call to IFN_LOOP_DIST_ALIAS if
+     vectorizer is not enable because no other pass can fold it.  */
+  if (!flag_tree_loop_vectorize)
+    return false;
+

Then I can remove the check you mentioned in function
version_loop_by_alias_check?

>
> +  /* Don't version loop if any partition is builtin.  */
> +  for (i = 0; partitions->iterate (i, &partition); ++i)
> +    {
> +      if (partition->kind != PKIND_NORMAL)
> +       break;
> +    }
>
> why's that?  Do you handle the case where only a subset of partitions
One reason is I generally consider distributed builtin functions as a
win, thus distribution won't be canceled later in vectorizer.  Another
reason is if all distributed loops are recognized as builtins, we
can't really version with current logic because the
IFN_LOOP_DIST_ALIAS won't be folded in vectorizer.

> require a runtime alias check to be generated?  Thus from a loop
> generate three loops, only two of them being versioned?  The
> complication would be that both the runtime alias and non-alias
> versions would be "transformed".  Or do we rely on recursively
> distributing the distribution result, thus if we have partitions that
> can be handled w/o runtime alias check fuse the remaining partitions
> and recurse on those?
No, this is not precisely handled now, the pass will version the whole
loop once.  Though I think it's not very difficult to do two stages
distribution, I am not sure how useful it is.

Thanks,
bin

Reply via email to