https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---

During the expansion, we don't try vec_extract because we are trying to extract the low DImode (64 bits) out of a V16QImode pseudo, which is not really vector element extraction, and the middle end doesn't know that on this target it is beneficial to just subreg the V16QImode pseudo to an identically sized vector with different sized elements (V2DImode in this case).

So, in order to handle this at the expansion level, we probably would need to add some new optab like vec_extract that would be not just about the source mode, but also the target mode (a conversion optab?), or some target hook or macro that would instruct the middle end to also try to subreg the vector mode to an identically sized other vector mode before trying vec_extract. Immediately after the vec_extract check, we already convert the V16QImode to TImode and force_reg it, so that is the last spot that can do something about it during expansion.

To fix this up before reload, we have the option of either a !reload_completed splitter or some combiner pattern(s).

Short testcase that shows hopefully optimal (or close to optimal) output for f5-f8 and really bad code for f1-f4, both with -O2 -m64 and with -O2 -msse2 -m32:

typedef unsigned char V __attribute__((vector_size (16)));
typedef unsigned long long W __attribute__((vector_size (16)));
typedef unsigned int T __attribute__((vector_size (16)));

void f1 (unsigned long long *x, V y) { *x = ((W)y)[0]; }
unsigned long long f2 (V y) { return ((W)y)[0]; }
void f3 (unsigned int *x, V y) { *x = ((T)y)[0]; }
unsigned int f4 (V y) { return ((T)y)[0]; }
void f5 (unsigned long long *x, W y) { *x = ((W)y)[0]; }
unsigned long long f6 (W y) { return ((W)y)[0]; }
void f7 (unsigned int *x, T y) { *x = ((T)y)[0]; }
unsigned int f8 (T y) { return ((T)y)[0]; }
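
For anyone who wants to convince themselves that the cast in f1-f4 is only a reinterpretation of the same 16 bytes (which is why a subreg to V2DImode or V4SImode ought to be enough), here is a minimal sketch. It assumes GCC's vector extensions; the helper names (low64_via_cast, low64_via_memcpy, make_test_vector, self_check and so on) are made up for illustration and are not part of the testcase above or of GCC.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char V __attribute__((vector_size (16)));
typedef unsigned long long W __attribute__((vector_size (16)));
typedef unsigned int T __attribute__((vector_size (16)));

/* Same shape as f2 and f4: cast to an identically sized vector type
   with wider elements, then take element 0.  */
unsigned long long low64_via_cast (V y) { return ((W) y)[0]; }
unsigned int low32_via_cast (V y) { return ((T) y)[0]; }

/* Reference form: copy the first bytes of the vector's memory image.
   Element 0 sits at the lowest address, so this reads the same bytes.  */
unsigned long long low64_via_memcpy (V y)
{
  unsigned long long r;
  memcpy (&r, &y, sizeof r);
  return r;
}

unsigned int low32_via_memcpy (V y)
{
  unsigned int r;
  memcpy (&r, &y, sizeof r);
  return r;
}

/* Hypothetical helper: a vector with 16 distinct byte values.  */
V make_test_vector (void)
{
  V y;
  for (int i = 0; i < 16; i++)
    y[i] = (unsigned char) (0x10 + i);
  return y;
}

/* Both forms must agree, independent of byte order.  */
void self_check (void)
{
  V y = make_test_vector ();
  assert (low64_via_cast (y) == low64_via_memcpy (y));
  assert (low32_via_cast (y) == low32_via_memcpy (y));
}
```

Calling self_check () from a small main and comparing the -O2 -S output for the cast and memcpy forms is one way to see how the same extraction gets expanded; what the assembly looks like depends on the compiler version and flags.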