On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
> The LLVM implementation always used 32-byte alignment for stack redzones.
> I never actually did any performance checking on x86 (32-byte aligned
> vs 8-byte aligned),
> although I suspect 32-byte aligned redzones should be ~2x faster.

If the ~2x speedup is supposed to come from aligned vs. unaligned integer
stores, I can't measure any such difference with e.g. this testcase:

__attribute__((noinline, noclone)) void
foo (int *p)
{
  int i;
  for (i = 0; i < 32; i++)
    p[i] = 0x12345678;
}

int
main (int argc, const char **argv)
{
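  char buf[1024];
  /* argc - 1 is the byte offset into buf, so each extra argument
     shifts the stores by one byte.  */
  int *p = (int *) &buf[argc - 1];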
  int i;
  __builtin_printf ("%p\n", p);
  for (i = 0; i < 100000000; i++)
    foo (p);
  return 0;
}

The run time with no arguments (i.e. argc == 1) is the same as with 1, 2
or 3 arguments on a SandyBridge CPU.  I guess there could be penalties at
page boundaries, etc., but I think hot caches are the usual case for
stack accesses.
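
To get a feel for how often a misaligned store in such a loop actually
straddles a cache line or a page, here is a rough sketch.  The helper
name count_split_stores is made up for illustration, and the 64-byte
line size and 64-byte-aligned base in the note below are assumptions,
not something the testcase above guarantees or measures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the testcase above: count how many
   of n_stores consecutive stores of store_size bytes, starting at
   address start, touch two different blocks of block_size bytes
   (e.g. a cache line or a page).  */
int
count_split_stores (uintptr_t start, int n_stores, size_t store_size,
                    size_t block_size)
{
  int i, splits = 0;
  for (i = 0; i < n_stores; i++)
    {
      uintptr_t first = start + (uintptr_t) i * store_size;
      uintptr_t last = first + store_size - 1;
      /* The store is split if its first and last byte land in
         different blocks.  */
      if (first / block_size != last / block_size)
        splits++;
    }
  return splits;
}
```

Assuming the base is 64-byte aligned, count_split_stores (off, 32, 4, 64)
gives 0 for off 0 and 2 for off 1, 2 or 3, so only 2 of the 32 stores
per call would cross a line in the misaligned runs.  Passing 4096 as
block_size counts page crossings the same way.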

        Jakub
