https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
What flags do I need?  Hmm, unincluding isn't very successful, with trunk it
seems to at least compile...

I see

  <bb 5> [local count: 955630224]:
  # ivtmp.13749_388 = PHI <0(4), ivtmp.13749_387(5)>
  _571 = {x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111, x_111};
  _24 = {color_13(D), color_13(D), color_13(D), color_13(D)};
  _93 = MEM[base: src_15(D), index: ivtmp.13749_388, offset: 0B];
  _462 = (unsigned int) _93;
  _461 = BIT_FIELD_REF <_93, 32, 32>;
  _460 = BIT_FIELD_REF <_93, 32, 64>;
  _459 = BIT_FIELD_REF <_93, 32, 96>;
  MEM <unsigned int> [(struct Vec *)&D.151762] = _462;
  MEM <unsigned int> [(struct Vec *)&D.151762 + 4B] = _461;
  MEM <unsigned int> [(struct Vec *)&D.151762 + 8B] = _460;
  MEM <unsigned int> [(struct Vec *)&D.151762 + 12B] = _459;
  src_2 = MEM[(struct Vec *)&D.151762];
  _32 = (unsigned char) src_2;
  _33 = BIT_FIELD_REF <src_2, 8, 8>;
  _34 = BIT_FIELD_REF <src_2, 8, 16>;
  _35 = BIT_FIELD_REF <src_2, 8, 24>;
...
  _106 = (unsigned char) _455;
  _107 = (unsigned char) _456;
  _108 = (unsigned char) _457;
  _109 = (unsigned char) _458;
  _566 = {_109, _108, _107, _106, _105, _104, _103, _102, _101, _100, _99, _98, _97, _96, _95, _94};
  _550 = VIEW_CONVERT_EXPR<__int128 unsigned>(_566);
  _89 = (unsigned int) _550;
  _90 = BIT_FIELD_REF <_566, 32, 32>;
  _91 = BIT_FIELD_REF <_566, 32, 64>;
  _92 = BIT_FIELD_REF <_566, 32, 96>;
  c ={v} {CLOBBER};
  D.98791.lo.lo.val = _89;
  D.98791.lo.hi.val = _90;
  D.98791.hi.lo.val = _91;
  D.98791.hi.hi.val = _92;
  _538 = MEM <__int128 unsigned> [(char * {ref-all})&D.98791];

so we're not able to "combine" through all sorts of reshuffling here
(FRE would be able to at least produce some vector CTOR for the final one
but it currently resists because of cost reasons in general).

This might be all because of stupid intrinsic use or because our
intrinsic [inline] expansion is stupid or ...

"extracting" the actual loops (inlined and all) in intrinsic form as a
C testcase would be really really nice.
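
For illustration only, here is a plain-C sketch of the kind of reduced loop being asked for. It is not the testcase from this PR and it does not use the original intrinsics. It only mimics the shape visible in the dump: a 16-byte load split into four 32-bit lanes, each lane split into bytes, a per-byte operation, and the result reassembled and stored as one 128-bit block. The names Vec4u, scale_byte and scale_row are hypothetical, and whether GCC emits the same BIT_FIELD_REF / vector CTOR sequence for this code has not been checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "struct Vec" in the dump: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } Vec4u;

/* Hypothetical per-byte operation: scale s by x/255 with rounding. */
static uint8_t scale_byte(uint8_t s, uint8_t x)
{
    return (uint8_t)((s * x + 127) / 255);
}

/* Process n_blocks blocks of 16 bytes each. */
static void scale_row(uint8_t *dst, const uint8_t *src, uint8_t x,
                      size_t n_blocks)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n_blocks; ++i) {
        Vec4u v;
        uint8_t out[16];

        /* One 128-bit load, spilled to a struct as four 32-bit lanes. */
        memcpy(&v, src + 16 * i, sizeof v);

        for (int l = 0; l < 4; ++l) {
            uint8_t px[4];

            /* Split the lane into bytes in memory order (endian-neutral). */
            memcpy(px, &v.lane[l], sizeof px);
            for (int k = 0; k < 4; ++k)
                out[4 * l + k] = scale_byte(px[k], x);
        }

        /* Reassemble the 16 bytes and store them as one block. */
        memcpy(dst + 16 * i, out, sizeof out);
    }
}
```

The interesting question for a real reduction would be whether the temporary struct and the byte shuffling are optimized away so that only vector operations remain in the loop body.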