On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
> The LLVM implementation always used 32-byte alignment for stack redzones.
> I never actually did any performance checking on x86 (32-byte aligned
> vs 8-byte aligned),
> although I suspect 32-byte aligned redzones should be ~2x faster.

If the ~2x speedup is supposed to come from unaligned vs. aligned integer
stores, I can't spot anything like that on e.g.

__attribute__((noinline, noclone)) void
foo (int *p)
{
  int i;
  /* 32 consecutive int stores through a possibly misaligned pointer.  */
  for (i = 0; i < 32; i++)
    p[i] = 0x12345678;
}

int
main (int argc, const char **argv)
{
  char buf[1024];
  /* Each extra command-line argument moves p one more byte past the
     start of buf.  */
  int *p = (int *) &buf[argc - 1];
  int i;
  __builtin_printf ("%p\n", p);
  for (i = 0; i < 100000000; i++)
    foo (p);
  return 0;
}

Time with zero arguments (i.e. argc 1) is the same as the time with 1, 2 or 3
arguments on a SandyBridge CPU.  I guess there could be penalties on page
boundaries, etc., but I think hot caches are the usual case for the stack.

	Jakub