On Tue, Aug 24, 2021 at 09:57:52AM +0800, Hongtao Liu wrote:
> Trying 5 -> 7:
>     5: r85:V4SF=[`*.LC0']
>       REG_EQUAL const_vector
>     7: r84:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel)
>       REG_DEAD r85:V4SF
>       REG_EQUAL const_vector
> Failed to match this instruction:
> (set (reg:V4SF 84)
>     (const_vector:V4SF [
>             (const_double:SF 3.0e+0 [0x0.cp+2])
>             (const_double:SF 2.0e+0 [0x0.8p+2])
>             (const_double:SF 4.0e+0 [0x0.8p+3])
>             (const_double:SF 1.0e+0 [0x0.8p+1])
>         ]))
>
> (insn 5 2 7 2 (set (reg:V4SF 85)
>         (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16
> A128]))
> "/export/users/liuhongt/install/git_trunk_master_native/lib/gcc/x86_64-pc-linux-gnu/12.0.0/include/xmmintrin.h":746:19
> 1600 {movv4sf_internal}
>      (expr_list:REG_EQUAL (const_vector:V4SF [
>                 (const_double:SF 4.0e+0 [0x0.8p+3])
>                 (const_double:SF 3.0e+0 [0x0.cp+2])
>                 (const_double:SF 2.0e+0 [0x0.8p+2])
>                 (const_double:SF 1.0e+0 [0x0.8p+1])
>             ])
>         (nil)))
> (insn 7 5 11 2 (set (reg:V4SF 84)
>         (vec_select:V4SF (vec_concat:V8SF (reg:V4SF 85)
>                 (reg:V4SF 85))
>             (parallel [
>                     (const_int 1 [0x1])
>                     (const_int 2 [0x2])
>                     (const_int 4 [0x4])
>                     (const_int 7 [0x7])
>                 ])))
> "/export/users/liuhongt/install/git_trunk_master_native/lib/gcc/x86_64-pc-linux-gnu/12.0.0/include/xmmintrin.h":746:19
> 3015 {sse_shufps_v4sf}
>      (expr_list:REG_DEAD (reg:V4SF 85)
>         (expr_list:REG_EQUAL (const_vector:V4SF [
>                     (const_double:SF 3.0e+0 [0x0.cp+2])
>                     (const_double:SF 2.0e+0 [0x0.8p+2])
>                     (const_double:SF 4.0e+0 [0x0.8p+3])
>                     (const_double:SF 1.0e+0 [0x0.8p+1])
>                 ])
>             (nil))))
>
> I think pass_combine should be extended to force illegitimate constant
> to constant pool and recog load insn again, It looks like a general
> optimization that better not do it in the backend.

Patches welcome.  You should do this like change_zero_ext is done, and
perhaps make sure you do not introduce new is_just_move insns that can
make 2->2 combinations do the wrong thing.  Also somehow make this not
take exponential time?

It looks like this should only be done in cases where change_zero_ext is
not, and the reverse, so this will work fine with a little attention to
detail.

gl;hf,


Segher