typedef int v4si __attribute__ ((vector_size (16)));

v4si y(v4si *s3)
{
	return *s3;
}

extern v4si s1, s2;

v4si x(void)
{
	v4si s3 = s1 + s2;
	return y(&s3);
}

Compile it with -O2 -fno-inline -msse2 -fomit-frame-pointer.

The variable s3 is stored using an unaligned store (movdqu) and loaded using an aligned load (movdqa).

-mpreferred-stack-boundary=4 doesn't guarantee stack alignment; it only advises that the stack is aligned. The function may be called from an OS callback, a signal handler, another library compiled with a lesser alignment, etc., and i386 mandates only 4-byte stack alignment. So the use of movdqa is incorrect.

(Does the GCC ABI mandate that all vector types must be aligned? If so, then movdqa is correct, but storing the variable on the stack while relying on the alignment from -mpreferred-stack-boundary=4 is not.)

Now, if you compile it with -mpreferred-stack-boundary=2, function "x" aligns the stack but uses movdqu to store to the aligned stack, so it generates suboptimal code.

-- 
           Summary: gcc shouldn't assume that the stack is aligned
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
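
The alignment assumption discussed in this report can be checked at run time. The following is a minimal sketch, not part of the original test case: the helper names is_aligned_16 and require_aligned_16 are hypothetical, and the only assumption is that an aligned load such as movdqa needs an address that is a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: returns 1 if p is a multiple of 16, i.e. an
   address that an aligned 16-byte access (movdqa) could safely use. */
int is_aligned_16(const void *p)
{
	return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical debug guard: aborts (unless NDEBUG is defined) when p
   is not 16-byte aligned. */
void require_aligned_16(const void *p)
{
	assert(is_aligned_16(p));
}
```

Calling require_aligned_16(s3) at the top of y() in the test case would be one way to see whether a caller with a 4-byte-aligned stack ever passes a misaligned pointer, rather than waiting for the aligned load to fault.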