On Sat, Jan 20, 2018 at 4:10 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Jan 19, 2018 at 5:42 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> This patch is supposed to fix the regression caused by loop distribution
>> with -ftree-parallelize-loops. The reason is that the distributed memset
>> call can't be understood/analyzed by data reference analysis; as a result,
>> parloop can only parallelize the innermost 2-level loop nest. Before the
>> distribution change, parloop could parallelize the innermost 3-level loop
>> nest, i.e., more parallelization.
>> As commented in the PR, ideally, loop distribution should be able to
>> distribute the memset call for the 3-level loop nest. Unfortunately this
>> requires sophisticated work proving equality between tree expressions,
>> which GCC is not good at now.
>> Another fix is to improve data reference analysis so that the memset call
>> can be supported. We don't know how big this change is, and it's
>> definitely not a GCC 8 task.
>>
>> So this patch fixes the regression in a somewhat hacky way. First, it
>> enables 3-level loop nest distribution when flag_tree_parloops > 1.
>> Second, it supports 3-level loop nest distribution for a ZERO-ing stmt
>> that can only be distributed as a loop (nest) of memset, but can't be
>> distributed as a single memset. The overall effect is that the ZERO-ing
>> stmt will be distributed to one loop deeper than now, so parloop can
>> parallelize as before.
>>
>> Bootstrap and test on x86_64 and AArch64 ongoing. Is it OK if no errors?
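To make "a loop (nest) of memset, but not a single memset" concrete, here is a hand-written C illustration. It assumes a hypothetical int array a[NI][NJ][NK] whose innermost dimension is only partly zeroed (indices KLO..KHI-1); the sizes and function names are made up for this sketch. It is not the PR 82604 testcase and not what GCC actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration.  */
#define NI 4
#define NJ 5
#define NK 8
#define KLO 2 /* first zeroed index in the innermost dimension */
#define KHI 6 /* one past the last zeroed index */

/* Original 3-level nest: zero a[i][j][KLO..KHI-1] for every (i, j).
   Each innermost run is contiguous, but consecutive runs are separated
   by elements outside [KLO, KHI), so one memset cannot cover the nest.  */
static void zero_nest_loops(int a[NI][NJ][NK])
{
  for (size_t i = 0; i < NI; i++)
    for (size_t j = 0; j < NJ; j++)
      for (size_t k = KLO; k < KHI; k++)
        a[i][j][k] = 0;
}

/* Hand-written "loop nest of memset": only the innermost loop becomes a
   memset call; the i and j loops remain ordinary loops.  */
static void zero_nest_memsets(int a[NI][NJ][NK])
{
  for (size_t i = 0; i < NI; i++)
    for (size_t j = 0; j < NJ; j++)
      memset(&a[i][j][KLO], 0, (KHI - KLO) * sizeof(int));
}

/* Self-check: both versions produce identical arrays, and elements
   outside [KLO, KHI) are left untouched.  */
static void check_zeroing_equivalent(void)
{
  int a[NI][NJ][NK], b[NI][NJ][NK];
  for (size_t i = 0; i < NI; i++)
    for (size_t j = 0; j < NJ; j++)
      for (size_t k = 0; k < NK; k++)
        a[i][j][k] = b[i][j][k] = 7;

  zero_nest_loops(a);
  zero_nest_memsets(b);

  assert(memcmp(a, b, sizeof a) == 0);
  assert(a[0][0][KLO - 1] == 7 && a[0][0][KHI] == 7);
  assert(a[NI - 1][NJ - 1][KHI - 1] == 0);
}
```

In the memset form, the outer i loop still carries no cross-iteration dependence, which is the kind of loop a parallelizer such as parloop could split across threads.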
Ok.

Thanks,
Richard.

> Test finished without error. Also I checked
> -ftree-parallelize-loops=6 on AArch64 and can confirm the regression
> is resolved.
>
> Thanks,
> bin
>>
>> Thanks,
>> bin
>> 2018-01-19  Bin Cheng  <bin.ch...@arm.com>
>>
>>     PR tree-optimization/82604
>>     * tree-loop-distribution.c (enum partition_kind): New enum item
>>     PKIND_PARTIAL_MEMSET.
>>     (partition_builtin_p): Support above new enum item.
>>     (generate_code_for_partition): Ditto.
>>     (compute_access_range): Differentiate cases that equality can be
>>     proven at all loops, the innermost loops or no loops.
>>     (classify_builtin_st, classify_builtin_ldst): Adjust call to above
>>     function. Set PKIND_PARTIAL_MEMSET for partition appropriately.
>>     (finalize_partitions, distribute_loop): Don't fuse partition of
>>     PKIND_PARTIAL_MEMSET kind when distributing 3-level loop nest.
>>     (prepare_perfect_loop_nest): Distribute 3-level loop nest only if
>>     parloop is enabled.
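The classification rules listed in the ChangeLog can be modeled in a few lines of C. This is a simplified sketch, not the code in tree-loop-distribution.c: the names proof_scope, part_kind, classify_zeroing_store, max_nest_depth and may_fuse are hypothetical, and the depth-2 limit when parloops is off is an assumption inferred from the mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three outcomes described for
   compute_access_range: equality of the accessed range proven for all
   loops of the nest, only for the innermost loop, or for none.  */
enum proof_scope { PROOF_NONE, PROOF_INNERMOST, PROOF_ALL_LOOPS };

/* Simplified stand-ins for partition kinds; not GCC's enum.  */
enum part_kind { KIND_NORMAL, KIND_PARTIAL_MEMSET, KIND_MEMSET };

/* Classify a zeroing store by how far the range proof reaches.  */
static enum part_kind classify_zeroing_store(enum proof_scope scope)
{
  switch (scope)
    {
    case PROOF_ALL_LOOPS:
      return KIND_MEMSET;         /* a single memset for the whole nest */
    case PROOF_INNERMOST:
      return KIND_PARTIAL_MEMSET; /* a loop (nest) of memset calls */
    default:
      return KIND_NORMAL;         /* left as an ordinary loop */
    }
}

/* Deepest nest considered for distribution.  Assumption: 2 by default,
   3 when flag_tree_parloops > 1, as the mail describes.  */
static int max_nest_depth(int flag_tree_parloops)
{
  return flag_tree_parloops > 1 ? 3 : 2;
}

/* Partial-memset partitions are not fused when a 3-level nest is being
   distributed.  */
static bool may_fuse(enum part_kind kind, int nest_depth)
{
  return !(kind == KIND_PARTIAL_MEMSET && nest_depth == 3);
}

/* Self-check of the model's key rules.  */
static void check_model(void)
{
  assert(classify_zeroing_store(PROOF_INNERMOST) == KIND_PARTIAL_MEMSET);
  assert(!may_fuse(KIND_PARTIAL_MEMSET, max_nest_depth(6)));
  assert(may_fuse(KIND_NORMAL, max_nest_depth(6)));
}
```

Keeping the partial-memset partition separate is what lets the zeroing stay one loop deeper, so the surrounding loops remain visible to parloop.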