I've committed a flag to the LLVM implementation that disables stack realignment (-mllvm -asan-realign-stack=0). On a Xeon W3690 I measured no performance difference with it (I tried the C/C++ part of SPEC2006). So on x86 it's probably right not to realign the stack.
--kcc

On Mon, Dec 3, 2012 at 10:41 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
>> The LLVM implementation always used 32-byte alignment for stack redzones.
>> I never actually did any performance checking on x86 (32-byte aligned
>> vs 8-byte aligned),
>> although I suspect 32-byte aligned redzones should be ~2x faster.
>
> If the ~2x faster comes from unaligned vs. aligned integer stores, I can't
> spot anything like that on e.g.
>
> __attribute__((noinline, noclone)) void
> foo (int *p)
> {
>   int i;
>   for (i = 0; i < 32; i++)
>     p[i] = 0x12345678;
> }
>
> int
> main (int argc, const char **argv)
> {
>   char buf[1024];
>   int *p = &buf[argc - 1];
>   int i;
>   __builtin_printf ("%p\n", p);
>   for (i = 0; i < 100000000; i++)
>     foo (p);
>   return 0;
> }
>
> Time with zero arguments (i.e. argc 1) is the same as time with 1, 2 or 3
> arguments on SandyBridge CPU. I guess there could be penalties on page
> boundaries, etc., but I think hot caches is the usual operation on the
> stack.
>
> Jakub