I've committed a flag to the LLVM implementation to not realign the
stack (-mllvm -asan-realign-stack=0).
On a Xeon W3690 I measured no performance difference (tried the C/C++
part of SPEC2006).
So, on x86 it's probably the right thing to not realign the stack.

--kcc

On Mon, Dec 3, 2012 at 10:41 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
>> The LLVM implementation always used 32-byte alignment for stack redzones.
>> I never actually did any performance checking on x86 (32-byte aligned
>> vs 8-byte aligned),
>> although I suspect 32-byte aligned redzones should be ~2x faster.
>
> If the ~2x faster comes from unaligned vs. aligned integer stores, I can't
> spot anything like that on e.g.
>
> __attribute__((noinline, noclone)) void
> foo (int *p)
> {
>   int i;
>   for (i = 0; i < 32; i++)
>     p[i] = 0x12345678;
> }
>
> int
> main (int argc, const char **argv)
> {
>   char buf[1024];
>   int *p = (int *) &buf[argc - 1];  /* deliberately misaligned when argc > 1 */
>   int i;
>   __builtin_printf ("%p\n", p);
>   for (i = 0; i < 100000000; i++)
>     foo (p);
>   return 0;
> }
>
> Time with zero arguments (i.e. argc 1) is the same as time with 1, 2 or 3
> arguments on a SandyBridge CPU.  I guess there could be penalties on page
> boundaries, etc., but I think hot caches are the usual case on the
> stack.
>
>         Jakub
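
For anyone who wants to repeat the comparison above, here is a minimal
sketch of the same idea. It is not the code from this thread: the names
fill_words, time_offset and read_back are hypothetical, memcpy stands in
for the cast pointer so the misaligned store stays well-defined C, and
clock() only gives coarse CPU time, so treat the numbers as rough.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

enum { WORDS = 32 };

/* Store WORDS 32-bit values starting at dst, which may be misaligned.
   memcpy keeps the access well-defined; on x86 compilers typically
   lower it to an ordinary (possibly unaligned) store. */
static void fill_words(unsigned char *dst)
{
    const uint32_t v = 0x12345678u;
    for (int i = 0; i < WORDS; i++)
        memcpy(dst + 4 * (size_t)i, &v, sizeof v);
}

/* Hypothetical helper: time `iters` calls of fill_words at byte offset
   `off` (0..3) into a stack buffer. Returns elapsed CPU seconds. */
static double time_offset(int off, long iters)
{
    unsigned char buf[1024];
    /* Volatile pointer: the compiler must reload it each iteration and
       cannot prove the stores are dead. */
    unsigned char *volatile p = buf + off;

    assert(off >= 0 && off < 4);
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        fill_words(p);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Sanity check: write at the given offset and read the last word back. */
static uint32_t read_back(int off)
{
    unsigned char buf[1024];
    uint32_t out;

    fill_words(buf + off);
    memcpy(&out, buf + off + 4 * (WORDS - 1), sizeof out);
    return out;
}
```

Calling time_offset(off, 100000000) for off = 0, 1, 2 and 3 and comparing
the four results corresponds to running the quoted program with zero to
three arguments. Whether the timings differ will depend on the CPU and on
the compiler flags used.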
