Hello,

I have found that the current modulo pipelining is very inefficient for many loops. One reason is its primitive cross-iteration memory dependency analysis: the add_inter_loop_mem_dep function in ddg.c simply adds a true dependency between every write/read pair. This is quite inadequate, since many loops read from memory at the beginning of the loop body and write to memory at the end. The spurious dependency from a store to the next iteration's load keeps iterations from overlapping, so in the end we obtain a schedule no better than list scheduling.
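
To make the problem concrete, here is a minimal sketch contrasting a rough model of the current rule (every store/load pair gets a distance-1 true dependency) with a simple affine dependence test. This is not GCC code: `struct access`, `conservative_dep` and `refined_dep` are hypothetical names. The sketch assumes every memory reference has the form base[stride * i + offset], that distinct bases never alias, and it models only cross-iteration true (read-after-write) dependencies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an affine memory reference
   base[stride * i + offset], where i is the loop iteration number. */
struct access {
  int base;   /* array identifier; distinct bases are ASSUMED not to alias */
  int stride; /* elements advanced per iteration */
  int offset; /* constant element offset */
};

/* Rough model of the current rule: every store/load pair gets a
   cross-iteration true dependency of distance 1, whatever the addresses. */
static bool conservative_dep(struct access store, struct access load,
                             int *distance)
{
  assert(distance != NULL);
  (void) store;
  (void) load;
  *distance = 1;
  return true;
}

/* Simple affine test.  The store in iteration i writes element
   stride*i + store.offset; the load in iteration j reads element
   stride*j + load.offset.  They coincide when
   stride * (j - i) == store.offset - load.offset, so a true dependency
   from an earlier iteration exists only if
   d = (store.offset - load.offset) / stride is a positive integer.  */
static bool refined_dep(struct access store, struct access load,
                        int *distance)
{
  assert(distance != NULL);
  if (store.base != load.base)
    return false;                /* assumed disjoint arrays */
  if (store.stride != load.stride || store.stride == 0)
    return conservative_dep(store, load, distance); /* fall back safely */

  int diff = store.offset - load.offset;
  if (diff % store.stride != 0)
    return false;                /* never the same element */

  int d = diff / store.stride;
  if (d <= 0)
    return false;                /* d == 0: same iteration; d < 0: the load
                                    comes first (an anti dependency) */
  *distance = d;
  return true;
}
```

For a loop such as `for (i = 0; i < n; i++) { t = a[i]; ...; a[i] = t + 1; }` the load and the store both touch a[i], so the affine test finds no cross-iteration true dependency, while the conservative rule still adds one with distance 1. A loop that writes a[i + 1] and reads a[i] does get a genuine distance-1 dependency from both rules.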
I am aware of the work on propagating tree-level dependency information to RTL (http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/melnik-propagation-greps2007.pdf). It should help a lot in improving the memory dependency analysis. Is there any plan for this work to make it into GCC mainline?

Thanks in advance.

Kind Regards,
Bingfeng Mei
Broadcom UK