On Sat, Jan 20, 2018 at 4:10 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Jan 19, 2018 at 5:42 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> This patch is meant to fix a regression caused by loop distribution when
>> -ftree-parallelize-loops is enabled.  The reason is that the distributed
>> memset call can't be understood/analyzed by data reference analysis; as a
>> result, parloop can only parallelize the innermost 2-level loop nest.  Before
>> the distribution change, parloop could parallelize the innermost 3-level loop
>> nest, i.e., more parallelization.
>> As commented in the PR, ideally, loop distribution should be able to
>> distribute the memset call for the 3-level loop nest.  Unfortunately, this
>> requires sophisticated work to prove equality between tree expressions,
>> which GCC is not good at now.
>> Another fix would be to improve data reference analysis so that memset
>> calls can be supported.  We don't know how big this change is, and it's
>> definitely not a GCC 8 task.
>>
>> So this patch fixes the regression in a somewhat hacky way.  First, it
>> enables 3-level loop nest distribution when flag_tree_parloops > 1.  Second,
>> it supports 3-level loop nest distribution for a ZERO-ing stmt that can only
>> be distributed as a loop (nest) of memset, not as a single memset.  The
>> overall effect is that the ZERO-ing stmt will be distributed to one loop
>> level deeper than now, so parloop can parallelize as before.
>>
>> Bootstrap and test on x86_64 and AArch64 are ongoing.  Is it OK if there
>> are no errors?

Ok.

Thanks,
Richard.

> Tests finished without errors.  I also checked
> -ftree-parallelize-loops=6 on AArch64 and can confirm the regression
> is resolved.
>
> Thanks,
> bin
>>
>> Thanks,
>> bin
>> 2018-01-19  Bin Cheng  <bin.ch...@arm.com>
>>
>>         PR tree-optimization/82604
>>         * tree-loop-distribution.c (enum partition_kind): New enum item
>>         PKIND_PARTIAL_MEMSET.
>>         (partition_builtin_p): Support above new enum item.
>>         (generate_code_for_partition): Ditto.
>>         (compute_access_range): Differentiate cases in which equality can
>>         be proven at all loops, only the innermost loops, or no loops.
>>         (classify_builtin_st, classify_builtin_ldst): Adjust call to above
>>         function.  Set PKIND_PARTIAL_MEMSET for partition appropriately.
>>         (finalize_partitions, distribute_loop): Don't fuse partition of
>>         PKIND_PARTIAL_MEMSET kind when distributing 3-level loop nest.
>>         (prepare_perfect_loop_nest): Distribute 3-level loop nest only if
>>         parloop is enabled.
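
The ZERO-ing case described in the patch can be illustrated at the source level. Such a statement can only become a loop (nest) of memset rather than a single memset. The C sketch below is hand-written and is not GCC's actual output. The array shape and the names `zero_loop`, `zero_partial_memset` and `partial_memset_matches` are hypothetical.

The inner bound `kk` may be smaller than the row length `K`. In that case the zeroed region is not one contiguous block and cannot collapse into a single memset. Only the innermost loop becomes a memset, which leaves the outer `i`/`j` loops in place for parloop to parallelize.

```c
#include <string.h>

#define N 4
#define M 5
#define K 8

/* Original form (hypothetical): a ZERO-ing stmt in a 3-level loop nest.
   When kk < K, each row is only partially cleared, so the region is not
   contiguous and cannot become one memset over the whole array.  */
static void
zero_loop (double a[N][M][K], int kk)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      for (int k = 0; k < kk; k++)
        a[i][j][k] = 0.0;
}

/* Hand-written "partial memset" equivalent: only the innermost loop is
   replaced by memset; the outer i/j loops remain as a loop nest.  */
static void
zero_partial_memset (double a[N][M][K], int kk)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      memset (&a[i][j][0], 0, (size_t) kk * sizeof (double));
}

/* Returns 1 if both forms leave identical array contents, including the
   untouched tail of each row when kk < K.  Requires 0 <= kk <= K.  */
int
partial_memset_matches (int kk)
{
  double a[N][M][K], b[N][M][K];

  /* Fill both arrays with the same non-zero pattern.  */
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      for (int k = 0; k < K; k++)
        a[i][j][k] = b[i][j][k] = 1.0 + i + j + k;

  zero_loop (a, kk);
  zero_partial_memset (b, kk);

  return memcmp (a, b, sizeof a) == 0;
}
```

For example, `partial_memset_matches (3)` compares the two forms when only the first three elements of each row are cleared. This sketch assumes IEEE 754 doubles, where all-bits-zero is `0.0`.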
