https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101621
Bug ID: 101621
Summary: gcc cannot optimize int8_t vector assign with subscription to shuffle
Product: gcc
Version: 11.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: yumeyao at gmail dot com
Target Milestone: --

https://gcc.godbolt.org/z/91cqenf99

    typedef char v16b __attribute__((vector_size(16)));

To sum it up, regarding optimizing v = { v[n] ...} into a shuffle when targeting Intel x86 (x86_64):

There is a missed optimization when one of the elements is zero.
There is a regression starting from gcc 9.

So this might be two issues, but I think a proper fix could resolve both.

* gcc can optimize an int8_t vector assignment built from subscripts of the same vector into a shuffle, like this:

    v16b gcc_can_shuffle(v16b b) {
        return (v16b) {b[0], b[0], b[0], b[0], b[4], b[4], b[4], b[4],
                       b[8], b[8], b[8], b[8], b[12], b[12], b[12], b[12]};
    }

* However, if one of the elements is zero, gcc can't handle it. This is actually supported on Intel x86: in the byte shuffle (pshufb), a negative index selects the 'zero value'. Clang performs the optimization starting with clang 5.

* Furthermore, there is a regression: gcc < 8 can always optimize it, but starting with gcc 9 the optimization fails if there is a cast:

    typedef long v2si64 __attribute__((vector_size(16)));

    v16b gcc_cannot_shuffle_with_cast(v2si64 x) {
        v16b b = (v16b)x;
        v16b b0 = {b[0], b[0], b[0], b[0], b[4], b[4], b[4], b[4],
                   b[8], b[8], b[8], b[8], b[12], b[12], b[12], b[12]};
        return b0;
    }

  gcc 11 can optimize it at -O3, but not at -O1 or -O2.
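For reference, a minimal GNU C sketch of what each pattern above is equivalent to. Assumptions: with_zeros is a hypothetical zero-lane variant (the exact code behind the godbolt link is not reproduced here); the names splat4_mask, zeros_mask, *_builtin, pshufb_model, with_zeros_ctrl, v16b_equal and iota16 are illustrative; and __builtin_shuffle is GCC-specific. The pshufb_model function is a scalar model of the instruction's rule (a control byte with its high bit set, i.e. a negative index, yields zero), not the instruction itself.

```c
#include <string.h>

/* Types from the report. */
typedef char v16b __attribute__((vector_size(16)));
typedef long v2si64 __attribute__((vector_size(16)));
/* Assumed mask type for __builtin_shuffle: same lane count and width as v16b. */
typedef unsigned char v16u __attribute__((vector_size(16)));

/* From the report: every output lane is a subscript of the same vector. */
v16b gcc_can_shuffle(v16b b)
{
    return (v16b){b[0], b[0], b[0], b[0], b[4], b[4], b[4], b[4],
                  b[8], b[8], b[8], b[8], b[12], b[12], b[12], b[12]};
}

/* From the report: the same pattern, applied after a vector cast. */
v16b gcc_cannot_shuffle_with_cast(v2si64 x)
{
    v16b b = (v16b)x;
    v16b b0 = {b[0], b[0], b[0], b[0], b[4], b[4], b[4], b[4],
               b[8], b[8], b[8], b[8], b[12], b[12], b[12], b[12]};
    return b0;
}

/* Hypothetical zero-lane variant (not the exact code behind the godbolt link). */
v16b with_zeros(v16b b)
{
    return (v16b){b[0], 0, 0, 0, b[4], 0, 0, 0,
                  b[8], 0, 0, 0, b[12], 0, 0, 0};
}

/* Explicit GCC shuffles that produce the same results. */
static const v16u splat4_mask = {0, 0, 0, 0, 4, 4, 4, 4,
                                 8, 8, 8, 8, 12, 12, 12, 12};
/* In the two-operand form, indices 16..31 pick lanes of the second vector (all zero here). */
static const v16u zeros_mask = {0, 16, 16, 16, 4, 16, 16, 16,
                                8, 16, 16, 16, 12, 16, 16, 16};

v16b splat4_builtin(v16b b) { return __builtin_shuffle(b, splat4_mask); }
v16b splat4_cast_builtin(v2si64 x) { return __builtin_shuffle((v16b)x, splat4_mask); }
v16b with_zeros_builtin(v16b b) { return __builtin_shuffle(b, (v16b){0}, zeros_mask); }

/* Scalar model of SSSE3 pshufb (128-bit form): a control byte with bit 7 set,
   i.e. a negative index, yields 0; otherwise its low 4 bits select a source byte. */
v16b pshufb_model(v16b src, v16b ctrl)
{
    v16b r = {0};
    for (int i = 0; i < 16; i++)
        r[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
    return r;
}

/* pshufb control equivalent to with_zeros(): -1 marks a zeroed lane. */
static const v16b with_zeros_ctrl = {0, -1, -1, -1, 4, -1, -1, -1,
                                     8, -1, -1, -1, 12, -1, -1, -1};

/* Illustrative helpers for checking results. */
int v16b_equal(v16b a, v16b b) { return memcmp(&a, &b, sizeof a) == 0; }

v16b iota16(void)
{
    return (v16b){1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
}
```

Writing the shuffle explicitly with __builtin_shuffle is one way to state the intended permutation without relying on the optimizer to recognize the subscript pattern. On x86 the instruction actually emitted still depends on the enabled ISA; pshufb itself requires SSSE3 (e.g. -mssse3).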