https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Target Milestone|---                           |11.0
             Target|                              |x86_64-*-* s390x-*-*
           Keywords|                              |missed-optimization

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The most straight-forward approach would be to treat

  r_14 = BIT_INSERT_EXPR <r_15(D), _18, 0 (32 bits)>;
  r_33 = BIT_INSERT_EXPR <r_14, _27, 32 (32 bits)>;
  r_32 = BIT_INSERT_EXPR <r_33, _36, 64 (32 bits)>;
  r_31 = BIT_INSERT_EXPR <r_32, _3, 96 (32 bits)>;

itself as a SLP source much like we look for CTORs as SLP source.  Note
the transformed load is an extra complication but at least I added support
to SLP existing vectors.

Also regresses on x86_64.

I'll see whether I can cook up sth.
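
For readers unfamiliar with the dump above, here is a minimal sketch in plain C of why such a chain can stand in for a vector constructor (CTOR). It is not GCC code: the type `v4u32` and the helpers `bit_insert32`, `build_by_inserts`, `build_by_ctor`, and `same_vector` are hypothetical names made up for illustration. It assumes that bit position N selects lane N / 32 of a 128-bit vector with four 32-bit lanes. Because the chain starts from `r_15(D)`, an uninitialized default definition, and the four inserts overwrite every lane, the result should not depend on the starting value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector with four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } v4u32;

/* Models BIT_INSERT_EXPR <vec, val, bitpos (32 bits)>: returns a copy of
   vec whose 32-bit lane at bit offset bitpos is replaced by val.
   Assumption: bit offset N maps to lane N / 32. */
v4u32 bit_insert32(v4u32 vec, uint32_t val, unsigned bitpos)
{
    assert(bitpos % 32 == 0 && bitpos < 128);
    vec.lane[bitpos / 32] = val;
    return vec;
}

/* The chain from the comment: inserts at bit offsets 0, 32, 64 and 96.
   Variable names mirror the SSA names in the dump. */
v4u32 build_by_inserts(v4u32 r15, uint32_t a, uint32_t b,
                       uint32_t c, uint32_t d)
{
    v4u32 r14 = bit_insert32(r15, a, 0);
    v4u32 r33 = bit_insert32(r14, b, 32);
    v4u32 r32 = bit_insert32(r33, c, 64);
    v4u32 r31 = bit_insert32(r32, d, 96);
    return r31;
}

/* The constructor form that the chain is equivalent to once every
   lane has been overwritten. */
v4u32 build_by_ctor(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    v4u32 r = { { a, b, c, d } };
    return r;
}

/* Lane-wise equality of two vectors. */
int same_vector(v4u32 x, v4u32 y)
{
    return memcmp(x.lane, y.lane, sizeof x.lane) == 0;
}
```

Calling `build_by_inserts` with any starting vector and comparing the result against `build_by_ctor` via `same_vector` illustrates the equivalence. How the vectorizer would actually recognize the chain is not shown here.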