https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Tejas Belagod from comment #7)
> I tried this, but it still doesn't seem to fold for aarch64.
> 
> So, here is the DOM trace for aarch64:
> 
> Optimizing statement a = *.LC0;

Why do we get LC0 in the first place?  It seems like it is happening because
of some cost model issue with MOVECOST.

> LKUP STMT a = *.LC0 with .MEM_3(D)
> LKUP STMT *.LC0 = a with .MEM_3(D)
> Optimizing statement vectp_a.5_1 = &a;
> LKUP STMT vectp_a.5_1 = &a
> ==== ASGN vectp_a.5_1 = &a
> Optimizing statement vect__6.6_13 = MEM[(int *)vectp_a.5_1];
> Replaced 'vectp_a.5_1' with constant '&aD.2604'
> LKUP STMT vect__6.6_13 = MEM[(int *)&a] with .MEM_4
> 2>>> STMT vect__6.6_13 = MEM[(int *)&a] with .MEM_4
> Optimizing statement vect_sum_7.7_6 = vect__6.6_13;
> LKUP STMT vect_sum_7.7_6 = vect__6.6_13
> ==== ASGN vect_sum_7.7_6 = vect__6.6_13
> Optimizing statement vectp_a.4_7 = vectp_a.5_1 + 16;
> Replaced 'vectp_a.5_1' with constant '&aD.2604'
> LKUP STMT vectp_a.4_7 = &a pointer_plus_expr 16
> 2>>> STMT vectp_a.4_7 = &a pointer_plus_expr 16
> ==== ASGN vectp_a.4_7 = &MEM[(void *)&a + 16B]
> Optimizing statement ivtmp_8 = 1;
> LKUP STMT ivtmp_8 = 1
> ==== ASGN ivtmp_8 = 1
> Optimizing statement vect__6.6_10 = MEM[(int *)vectp_a.4_7];
> Replaced 'vectp_a.4_7' with constant '&MEM[(voidD.39 *)&aD.2604 + 16B]'
> Folded to: vect__6.6_10 = MEM[(int *)&a + 16B];
> LKUP STMT vect__6.6_10 = MEM[(int *)&a + 16B] with .MEM_4
> 2>>> STMT vect__6.6_10 = MEM[(int *)&a + 16B] with .MEM_4
> Optimizing statement vect_sum_7.7_17 = vect_sum_7.7_6 + vect__6.6_10;
> Replaced 'vect_sum_7.7_6' with variable 'vect__6.6_13'
> gimple_simplified to vect_sum_7.7_17 = vect__6.6_10 + vect__6.6_13;
> Folded to: vect_sum_7.7_17 = vect__6.6_10 + vect__6.6_13;
> LKUP STMT vect_sum_7.7_17 = vect__6.6_10 plus_expr vect__6.6_13
> 2>>> STMT vect_sum_7.7_17 = vect__6.6_10 plus_expr vect__6.6_13
> ...
> 
> In x86's case, by this time, the constant vectors have been propagated and
> folded into a constant vector:
> 
> Optimizing statement vect_cst_.12_23 = { 0, 1, 2, 3 };
> LKUP STMT vect_cst_.12_23 = { 0, 1, 2, 3 }
> ==== ASGN vect_cst_.12_23 = { 0, 1, 2, 3 }
> Optimizing statement vect_cst_.11_32 = { 4, 5, 6, 7 };
> LKUP STMT vect_cst_.11_32 = { 4, 5, 6, 7 }
> ==== ASGN vect_cst_.11_32 = { 4, 5, 6, 7 }
> Optimizing statement vectp.14_2 = &a[0];
> LKUP STMT vectp.14_2 = &a[0]
> ==== ASGN vectp.14_2 = &a[0]
> Optimizing statement MEM[(int *)vectp.14_2] = vect_cst_.12_23;
> Replaced 'vectp.14_2' with constant '&aD.1831[0]'
> Replaced 'vect_cst_.12_23' with constant '{ 0, 1, 2, 3 }'
> Folded to: MEM[(int *)&a] = { 0, 1, 2, 3 };
> LKUP STMT MEM[(int *)&a] = { 0, 1, 2, 3 } with .MEM_3(D)
> LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_3(D)
> LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_25
> 2>>> STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_25
> Optimizing statement vectp.14_21 = vectp.14_2 + 16;
> Replaced 'vectp.14_2' with constant '&aD.1831[0]'
> LKUP STMT vectp.14_21 = &a[0] pointer_plus_expr 16
> 2>>> STMT vectp.14_21 = &a[0] pointer_plus_expr 16
> ==== ASGN vectp.14_21 = &MEM[(void *)&a + 16B]
> Optimizing statement MEM[(int *)vectp.14_21] = vect_cst_.11_32;
> Replaced 'vectp.14_21' with constant '&MEM[(voidD.41 *)&aD.1831 + 16B]'
> Replaced 'vect_cst_.11_32' with constant '{ 4, 5, 6, 7 }'
> Folded to: MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
> LKUP STMT MEM[(int *)&a + 16B] = { 4, 5, 6, 7 } with .MEM_25
> LKUP STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_25
> LKUP STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_19
> 2>>> STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_19
> Optimizing statement vectp_a.5_22 = &a;
> LKUP STMT vectp_a.5_22 = &a
> ==== ASGN vectp_a.5_22 = &a
> Optimizing statement vect__13.6_20 = MEM[(int *)vectp_a.5_22];
> Replaced 'vectp_a.5_22' with constant '&aD.1831'
> LKUP STMT vect__13.6_20 = MEM[(int *)&a] with .MEM_19
> FIND: { 0, 1, 2, 3 }
> Replaced redundant expr '# VUSE <.MEM_19>
> MEM[(intD.6 *)&aD.1831]' with '{ 0, 1, 2, 3 }'
> ==== ASGN vect__13.6_20 = { 0, 1, 2, 3 }
> Optimizing statement vect_sum_14.7_13 = vect__13.6_20;
> Replaced 'vect__13.6_20' with constant '{ 0, 1, 2, 3 }'
> LKUP STMT vect_sum_14.7_13 = { 0, 1, 2, 3 }
> ==== ASGN vect_sum_14.7_13 = { 0, 1, 2, 3 }
> ....
> 
> While the MEM[vect_ptr + CST] gets replaced correctly by 'a', it doesn't
> seem to figure out that the literal pool load 'a = *LC0' is nothing but
> 
> vect_cst_.12_23 = { 0, 1, 2, 3 }; and vect_cst_.11_32 = { 4, 5, 6, 7 };
> 
> which is the only major difference between how the const vector is
> initialized in x86 and aarch64.  Is DOM not able to understand 'a = *LC0'?
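
For reference, a minimal sketch of the kind of source that could produce
dumps like the ones above.  This is an assumed shape for illustration only
(the function name and the exact initializer are guesses, not the actual
reproducer attached to this PR):

    /* Hypothetical reproducer: a local array with a constant initializer,
       summed in a loop that the vectorizer handles.  */
    int
    sum8 (void)
    {
      int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
      int sum = 0;

      for (int i = 0; i < 8; i++)
        sum += a[i];

      /* On x86 the initializer reaches DOM as explicit vector-constant
         stores,
             MEM[(int *)&a]       = { 0, 1, 2, 3 };
             MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
         so the later vector loads are replaced with those constants (the
         "FIND: { 0, 1, 2, 3 }" step in the dump).  On aarch64 the
         initializer instead survives as an aggregate copy from a
         literal-pool object, a = *.LC0;, which DOM treats as an opaque
         load, so the loads are never folded.  */
      return sum;
    }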