I think it is useful to have a bugzilla here.
Will do.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40168
Btw, complete unrolling is also hindered by the artificial limit of completely unrolling at most 16 iterations; your inner loops iterate 27 times. It is further limited by the artificial cap on the maximal unrolled size. With --param max-completely-peel-times=27 --param max-completely-peeled-insns=666 (values for trunk), the loops are unrolled at -O3.
Hmmm, but that leads to slower code.
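To make the two limits concrete, here is a minimal sketch of the kind of loop nest being discussed. It is an assumption, not the actual test case attached to PR 40168: the kernel name `matvec` and the 27x27 matrix-vector product are hypothetical stand-ins chosen only because the inner loop has a constant trip count of 27, which is above the 16-iteration limit mentioned above.

```c
/* Hypothetical stand-in for the loops in the bug report (not the real
 * test case): the inner loop has a constant trip count of 27, more than
 * the 16 iterations GCC will completely unroll by default. */
#define N 27

/* out = a * b, a dense 27x27 matrix-vector product. */
void matvec(double a[N][N], const double b[N], double out[N])
{
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        /* 27 iterations: only a candidate for complete unrolling when
         * max-completely-peel-times and max-completely-peeled-insns
         * are raised as described in the thread. */
        for (int j = 0; j < N; j++)
            s += a[i][j] * b[j];
        out[i] = s;
    }
}
```

One way to compare the two code-generation choices is to build the same file once with plain `gcc -O3 -S` and once with `gcc -O3 --param max-completely-peel-times=27 --param max-completely-peeled-insns=666 -S`, then inspect the assembly and time both. As the thread notes, the fully unrolled version is not necessarily faster.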