Hi,

SSE4.1 introduced zero-extending and sign-extending loads, such as

    pmovzxbd (%rax), %xmm0

which takes four bytes from (%rax), zero-extends them to four 32-bit dwords, and puts them into %xmm0. However, GCC's intrinsics support only the register form

    pmovzxbd %xmm1, %xmm0

which takes the low 32 bits of %xmm1 and does the same. This is reflected in the definition of the intrinsic (from the GCC 4.4.1 manual):

    v4si __builtin_ia32_pmovzxbd128 (v16qi)

This makes it rather hard and indirect to load, say, 32 bits from an unaligned char*, especially if you're not sure that the next 96 bits are readable. (Just casting the char* pointer to a v16qi* and dereferencing it in the intrinsic's argument causes GCC to emit an aligned 128-bit load to a register, followed by a reg/reg pmovzxbd, at least in my program.)

Could you please add the forms that take v2qi/v4qi/v8qi/v2hi/v4hi/v2si as well, for the entire pmovzx* and pmovsx* family?

--
           Summary: Please add memory forms of pmovzx* (SSE4.1)
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484
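
For illustration, here is a minimal sketch of the kind of indirect workaround the report describes. It assumes the standard <smmintrin.h> intrinsics are available when the compiler defines __SSE4_1__ (for example with -msse4.1) and falls back to a scalar loop otherwise. The helper name load_zx_u8x4 is hypothetical and is not a GCC builtin. The sketch reads exactly four bytes, so it never touches the following 96 bits, but whether the compiler then folds the load into a single memory-form pmovzxbd is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics; also pulls in the SSE2 ones */
#endif

/* Hypothetical helper (not a GCC builtin): read exactly four bytes from a
 * possibly unaligned pointer and zero-extend each one to 32 bits. */
static void load_zx_u8x4(const unsigned char *p, uint32_t out[4])
{
#if defined(__SSE4_1__)
    int32_t bits;
    memcpy(&bits, p, sizeof bits);        /* reads only p[0..3], any alignment */
    __m128i v = _mm_cvtsi32_si128(bits);  /* move the 32 bits into the low lane */
    v = _mm_cvtepu8_epi32(v);             /* register form of pmovzxbd */
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned 16-byte store */
#else
    /* Scalar model of the same operation for builds without SSE4.1. */
    for (int i = 0; i < 4; i++)
        out[i] = p[i];
#endif
}
```

Calling load_zx_u8x4(buf + 1, out) on an unsigned char buffer fills out[0..3] with the zero-extended values of buf[1..4]. A memory-form intrinsic, as requested above, would make the memcpy and the register move unnecessary.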