On Fri, 23 Dec 2011, Alexander Best wrote:

> is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer?
> i built GENERIC (including modules) with and without that flag. the results
> are:

The same as it has always been.  It avoids some bloat.

> 1654496 bytes with the flag set
> vs.
> 1654952 bytes with the flag unset

I don't believe this.  GENERIC is enormously bloated, so it has size
more like 16MB than 1.6MB.  Even a savings of 4K instead of 456 bytes
is hard to believe.  I get a savings of 9K (text) in a 5MB kernel.
Changing the default target arch from i386 to pentium-undocumented has
reduced the text space savings a little, since the default for passing
args is now to preallocate stack space for them and store to this,
instead of to push them; this preallocation results in more functions
needing to allocate some stack space explicitly, and when some is
allocated explicitly, the text space cost for this doesn't depend on
the size of the allocation.

Anyway, the savings are mostly from avoiding cache misses from
sparse allocation on stacks.
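
To put rough numbers on the sparse allocation, here is a minimal sketch
of how padding each frame to the preferred boundary inflates stack use.
It is only a model: round_up() and stack_used() are hypothetical helper
names, the frame size and call depth are made up, and a real frame also
holds the return address and saved registers.

```c
#include <stddef.h>

/*
 * Round a frame's byte count up to the next multiple of align.
 * align must be a power of two (4 for -mpreferred-stack-boundary=2,
 * 16 for the default boundary).
 */
size_t
round_up(size_t bytes, size_t align)
{
	return (bytes + align - 1) & ~(align - 1);
}

/*
 * Stack consumed by a chain of depth nested calls that each need
 * frame_bytes bytes, when every frame is padded to align.
 * Illustrative model only; the inputs are assumptions, not measurements.
 */
size_t
stack_used(size_t depth, size_t frame_bytes, size_t align)
{
	return depth * round_up(frame_bytes, align);
}
```

In this model, ten nested calls with 20-byte frames use 200 bytes with
4-byte alignment and 320 bytes with 16-byte alignment, so the same data
is spread over more cache lines.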

Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
- KSTACK_PAGES on i386 is 2, while on amd64 it is 4.  Using more
  stack might push something over the edge
- not much care is taken to align the initial stack or to keep the
  stack aligned in calls from asm code.  E.g., any alignment for
  mi_startup() (and thus proc0?) is accidental.  This may result
  in perfect alignment or perfect misalignment.  Hopefully, more
  care is taken with thread startup.  For gcc, the alignment is
  done bogusly in main() in userland, but there is no main() in
  the kernel.  The alignment doesn't matter much (provided the
  perfect misalignment is still to a multiple of 4), but when it
  matters, the random misalignment that results from not trying to
  do it at all is better than perfect misalignment from getting it
  wrong.  With 4-byte alignment, the only cases where extra alignment
  helps are those with 64-bit variables.
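
The difference between perfect and random misalignment can be sketched
with a small simulation.  This is an illustration under assumptions, not
kernel code: aligned_frames() is a hypothetical helper, the stack is
modelled as a 32-bit pointer that only moves down by padded frame sizes,
and the starting address and frame sizes are invented.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a downward-growing stack starting at initial_sp.  Each of the n
 * frames is padded to align (a power of two).  Return how many frames
 * start at an 8-byte aligned address, i.e. where a 64-bit variable at
 * the frame base would be naturally aligned.
 */
size_t
aligned_frames(uint32_t initial_sp, const uint32_t *frame_bytes, size_t n,
    uint32_t align)
{
	uint32_t sp = initial_sp;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		sp -= (frame_bytes[i] + align - 1) & ~(align - 1);
		if (sp % 8 == 0)
			count++;
	}
	return count;
}
```

With frames of 12, 20, 8 and 28 bytes and a starting stack pointer of
0x1004 (off by 4), padding to 16 leaves every frame off by 4, so no
frame is 8-byte aligned.  Padding to 4 from the same start lets the
alignment wander, and two of the four frames land on an 8-byte boundary.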

> the gcc(1) man page states the following:
>
> "
> This extra alignment does consume extra stack space, and generally
> increases code size.  Code that is sensitive to stack space usage,
> such as embedded systems and operating system kernels, may want to
> reduce the preferred alignment to -mpreferred-stack-boundary=2.
> "
>
> the comment in sys/conf/kern.mk however sorta suggests that the default
> alignment of 4 bytes might improve performance.

The default stack alignment is 16 bytes, which unimproves performance.

clang handles stack alignment correctly (only does it when it is needed)
so it doesn't need a -mpreferred-stack-boundary option and doesn't
always break without alignment in main().  Well, at least it used to,
IIRC.  Testing it now shows that it does the necessary andl of the
stack pointer for __aligned(32), but for __aligned(16) it now assumes
that the stack is aligned by the caller.  So it now needs
-mpreferred-stack-boundary=2, but doesn't have it.  OTOH, clang doesn't
do the andl in main() like gcc does (unless you put a dummy __aligned(32)
there), but requires crt to pass an aligned stack.
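
A test of the __aligned(32) case might look roughly like the sketch
below.  It spells out the attribute that FreeBSD's __aligned() macro
expands to so that it builds without <sys/cdefs.h>;
aligned_local_offset() is a hypothetical name.  When the requested
alignment exceeds what the compiler assumes about the incoming stack, it
has to realign in the prologue (the andl on i386), which is what makes
the check succeed.

```c
#include <stdint.h>

/*
 * Return the address of a local that requests 32-byte alignment,
 * taken modulo 32.  A result of 0 means the compiler realigned the
 * stack (or otherwise placed the object) as requested.
 */
unsigned
aligned_local_offset(void)
{
	/* volatile keeps the object from being optimized away. */
	volatile char buf[32] __attribute__((aligned(32)));

	buf[0] = 1;
	return (unsigned)((uintptr_t)buf % 32);
}
```

Compiling this with -S and looking for the andl in the function
prologue shows whether the compiler realigns by itself or trusts the
caller.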

Bruce
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current