https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, we have simplify_vector_constructor, but that handles only extractions
from some other vector, not reads from memory.
E.g. on -mavx2 -O2 -mtune=intel:

typedef int __attribute__((vector_size(32))) V;
typedef short __attribute__((vector_size(16))) W;

V
foo (short *a)
{
  return (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

V
bar (int *a)
{
  return (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

V
baz (short *b)
{
  W a = *(W *)b;
  return (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

V
qux (short *b)
{
  W a = *(W *)b;
  return __builtin_convertvector (a, V);
}

it triggers on baz, but emits worse code than qux. After forwprop1 and dce,
baz is:

  W a;
  vector(8) short int _1;
  vector(16) short int _20;
  V _21;

  <bb 2> :
  _1 = MEM[(W *)b_19(D)];
  _20 = BIT_INSERT_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, _1, 0>;
  _21 = [vec_unpack_lo_expr] _20;
  return _21;

and we don't really emit the code we want for the BIT_INSERT_EXPR.
I guess it might be better to canonicalize on .VEC_CONVERT for this.
And then for the memory reads, see if the target supports unaligned vector
loads and in that case also support MEM_REFs.
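
For anyone experimenting with the proposed canonicalization, here is a minimal sketch of a check that the three short-to-int forms from the comment produce the same value: element-wise reads from memory (foo), element-wise extraction from a loaded vector (baz), and __builtin_convertvector (qux). The function names and the out-pointer signatures are hypothetical and not part of the testcase above. Results are written through a pointer only so the sketch does not depend on the vector-return ABI, and memcpy stands in for the `*(W *)b` load as an assumption to avoid alignment and aliasing concerns. It checks values only and says nothing about the generated code.

```c
#include <string.h>

typedef int   __attribute__((vector_size(32))) V;
typedef short __attribute__((vector_size(16))) W;

/* Like foo: build the vector from eight scalar loads. */
void from_scalars(const short *a, V *out)
{
  *out = (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

/* Like baz: load a short vector, then rebuild element by element. */
void from_vector_elts(const short *b, V *out)
{
  W a;
  memcpy(&a, b, sizeof a);   /* assumption: memcpy instead of *(W *)b */
  *out = (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

/* Like qux: load a short vector and widen it in one step. */
void from_convertvector(const short *b, V *out)
{
  W a;
  memcpy(&a, b, sizeof a);
  *out = __builtin_convertvector(a, V);
}

/* Returns 1 when all three forms agree for the 8 shorts at IN, else 0. */
int conversions_agree(const short *in)
{
  V x, y, z;
  from_scalars(in, &x);
  from_vector_elts(in, &y);
  from_convertvector(in, &z);
  return memcmp(&x, &y, sizeof x) == 0 && memcmp(&x, &z, sizeof x) == 0;
}
```

Compiling this with -O2 -mavx2 and comparing the assembly of the three functions is one way to see whether a given GCC version already treats them alike.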