On Mon, 26 Nov 2012, Konstantin Belousov wrote:
On Mon, Nov 26, 2012 at 06:31:34AM -0800, sig6247 wrote:

Just checked out r243529, this only happens when the kernel is compiled
by clang, and only on i386, either recompiling the kernel with gcc or
booting from a UFS root works fine. Is it a known problem?

It looks like that clang uses more stack than gcc, and zfs makes quite
deep call chains. It would be a waste, generally, to increase the init
process kernel stack size only to pacify zfs. And I suspect that it would
not help in the similar situations when the same procedure initiated for
non-root mounts.

Or to pacify clang...
----------------------------------------------------------------------
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:zroot []...

Fatal double fault:
eip = 0xc0adc37d
esp = 0xc86bffc8
ebp = 0xc86c003c
cpuid = 1; apic id = 01
panic: double fault
cpuid = 1
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at kdb_enter+0x3d: movl $0,kdb_why
db> bt
Tracing pid 1 tid 100002 td 0xc89efbc0
kdb_enter(c1064aa4,c1064aa4,c10b806f,c139e3b8,f5eacada,...) at kdb_enter+0x3d
panic(c10b806f,1,1,1,c86c003c,...) at panic+0x14b
dblfault_handler() at dblfault_handler+0xab
--- trap 0x17, eip = 0xc0adc37d, esp = 0xc86bffc8, ebp = 0xc86c003c ---
witness_checkorder(c1fd7508,9,c109df18,7fa,0,...) at witness_checkorder+0x37d
__mtx_lock_flags(c1fd7518,0,c109df18,7fa,c135d918,...) at __mtx_lock_flags+0x87
uma_zalloc_arg(c1fd66c0,0,1,4d3,c86c0110,...) at uma_zalloc_arg+0x605
vm_map_insert(c1fd508c,c13dfc10,bb1f000,0,cba1e000,...) at vm_map_insert+0x499
kmem_back(c1fd508c,cba1e000,1000,3,c86c01d4,...) at kmem_back+0x76
kmem_malloc(c1fd508c,1000,3) at kmem_malloc+0x250
page_alloc(c1fd1d80,1000,c86c020b,3,c1fd1d80,...) at page_alloc+0x27
keg_alloc_slab(103,4,c109df18,870,cb99ef6c,...) at keg_alloc_slab+0xc3
keg_fetch_slab(103,c1fd1d80,cb99ef6c,c1fc8230,c86c02c0,...) at keg_fetch_slab+0xe2
zone_fetch_slab(c1fd1d80,c1fd0480,103,826,0,...) at zone_fetch_slab+0x43
uma_zalloc_arg(c1fd1d80,0,102,3,2,...) at uma_zalloc_arg+0x3f2
malloc(4c,c1686100,102,c86c0388,c173d09a,...) at malloc+0xe9
zfs_kmem_alloc(4c,102,cb618820,c89efbc0,cb618820,...) at zfs_kmem_alloc+0x20
vdev_mirror_io_start(cb8218a0,10,cb8218a0,1,0,...) at vdev_mirror_io_start+0x14a
zio_vdev_io_start(cb8218a0,c89efbc0,0,cb8218a0,c86c0600,...) at zio_vdev_io_start+0x228
zio_execute(cb8218a0,cb618000,cba1b640,cb900000,400,...) at zio_execute+0x106
spa_load_verify_cb(cb618000,0,cba1b640,cb884b40,c86c0600,...) at spa_load_verify_cb+0x89
traverse_visitbp(cb884b40,cba1b640,c86c0600,c86c0ba0,0,...) at traverse_visitbp+0x29f
traverse_dnode(cb884b40,0,0,8b,0,...) at traverse_dnode+0x92
traverse_visitbp(cb884bb8,cba07200,c86c0890,cb884bf4,c16ce7e0,...) at traverse_visitbp+0xe47
traverse_visitbp(cb884bf4,cb9bf840,c86c0968,c86c0ba0,0,...) at traverse_visitbp+0xf32
traverse_dnode(cb884bf4,0,0,0,0,...) at traverse_dnode+0x92
traverse_visitbp(0,cb618398,c86c0b50,2,cb9f1c78,...) at traverse_visitbp+0x96d
traverse_impl(0,0,cb618398,3e1,0,...) at traverse_impl+0x268
traverse_pool(cb618000,3e1,0,d,c1727830,...) at traverse_pool+0x79
spa_load(0,1,c86c0ec4,1e,0,...) at spa_load+0x1dde
spa_load(0,0,c13d8c94,1,3,...) at spa_load+0x11a5
spa_load_best(0,ffffffff,ffffffff,1,c0adc395,...) at spa_load_best+0x71
spa_open_common(c17e0e1e,0,0,c86c1190,c16f5a1c,...) at spa_open_common+0x11a
spa_open(c86c1078,c86c1074,c17e0e1e,c135d918,c1fd7798,...) at spa_open+0x27
dsl_dir_open_spa(0,cb770030,c17e11b1,c86c11f8,c86c11f4,...) at dsl_dir_open_spa+0x6c
dsl_dataset_hold(cb770030,cb613800,c86c1240,cb613800,cb613800,...) at dsl_dataset_hold+0x3a
dsl_dataset_own(cb770030,0,cb613800,c86c1240,c1684e30,...) at dsl_dataset_own+0x21
dmu_objset_own(cb770030,2,1,cb613800,c86c1290,...) at dmu_objset_own+0x2a
zfsvfs_create(cb770030,c86c13ac,c17ee05d,681,0,...) at zfsvfs_create+0x4c
zfs_mount(cb78ed20,c17f411c,c9ff4600,c89cae80,0,...) at zfs_mount+0x42c
vfs_donmount(c89efbc0,4000,0,c86c1790,cb6c0800,...) at vfs_donmount+0xc6d
kernel_mount(cb7700b0,4000,0,0,1,...) at kernel_mount+0x6b
parse_mount(cb7700e0,c1194498,0,1,0,...) at parse_mount+0x606
vfs_mountroot(c13d95b0,4,c105c042,2bb,0,...) at vfs_mountroot+0x6cf
start_init(0,c86c1d08,c105e94c,3db,0,...) at start_init+0x6a
fork_exit(c0a42090,0,c86c1d08) at fork_exit+0x7f
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc86c1d40, ebp = 0 ---
db>
----------------------------------------------------------------------

43 deep (before the double fault) is disgusting, but even if clang has
broken stack alignment due to a wrong default and no
-mpreferred-stack-boundary to fix it, that's still only about 8*43
extra bytes (8 for the average extra stack to align to 16 bytes).
Probably zfs is also putting large data structures on the stack.
It would be useful if the stack trace printed the stack pointer
on every function call, so that you could see how much stack each
function used.
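
As a rough check of the arithmetic above, the two esp values that the
trace does print can be subtracted to get the total stack in use. The
sketch below is only illustrative: the helper names are made up, and
treating the esp in the final "trap 0" line (0xc86c1d40) as the top of
the thread's stack is an assumption. The frame count of 43 and the 8
bytes of average padding are the figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of stack between the outermost and innermost stack pointers.
 * The x86 stack grows down, so the outermost frame has the higher address.
 */
static uint32_t
stack_bytes_used(uint32_t outer_esp, uint32_t inner_esp)
{
	assert(outer_esp >= inner_esp);
	return (outer_esp - inner_esp);
}

/* Average stack bytes per frame, rounded down. */
static uint32_t
avg_bytes_per_frame(uint32_t total, uint32_t nframes)
{
	assert(nframes > 0);
	return (total / nframes);
}

/* Estimated bytes lost to alignment padding: avg_pad bytes in each frame. */
static uint32_t
alignment_overhead(uint32_t nframes, uint32_t avg_pad)
{
	return (nframes * avg_pad);
}
```

With outer_esp = 0xc86c1d40 and inner_esp = 0xc86bffc8 this gives 7544
bytes in use, or about 175 bytes per frame over 43 frames, of which the
estimated alignment padding (8*43 = 344 bytes) is under 5%. That is
consistent with the padding alone not explaining the overflow.
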
All those ', ...' printed after 5 args show further apparent clang
lossage. vfs_mountroot takes no args at all, but the above shows
it taking 5 known args and further unknown args. ddb's stack tracer
does limited parsing of the bytes in function's caller to guess the
number of args. It usually guesses right for gcc. It apparently
usually guesses wrong for clang.
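
To make the kind of guessing concrete, here is a minimal sketch of such
a heuristic for i386. It is not ddb's actual code: guess_nargs() and
NARGS_UNKNOWN are hypothetical names, and the sketch recognizes only the
"addl $4*N,%esp" cleanup that is discussed at the end of this message,
using the imm8 encoding of that instruction (bytes 83 c4 imm8).

```c
#include <assert.h>
#include <stddef.h>

#define NARGS_UNKNOWN	(-1)	/* hypothetical "could not guess" marker */

/*
 * Hypothetical helper, not ddb's implementation: look at the bytes at a
 * return address in the caller and guess how many 4-byte args the caller
 * pushed, based on how it pops them after the call returns.
 */
static int
guess_nargs(const unsigned char *ret, size_t len)
{
	assert(ret != NULL);
	/* addl $imm8,%esp is encoded as 83 c4 imm8; each arg is 4 bytes. */
	if (len >= 3 && ret[0] == 0x83 && ret[1] == 0xc4)
		return (ret[2] / 4);
	/* Anything else is not understood by this sketch. */
	return (NARGS_UNKNOWN);
}
```

For the bytes 83 c4 0c (addl $12,%esp) this returns 3. If the compiler
schedules another instruction first, e.g. 89 c6 (movl %eax,%esi) before
the addl, the sketch gives up and returns NARGS_UNKNOWN, and a tracer
built on it would have to fall back to printing some default number of
args.
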
I can't test this for i386 since ref10-i386 is down, but for amd64 the
parsing seems to be already too difficult for gcc and impossible for
clang: for the function call in `void bar(void); void foo(void) {
bar(); }', clang generates a tail call:
% test: # @test
% .cfi_startproc
% # BB#0: # %entry
% xorb %al, %al
% jmp bar # TAILCALL
% .Ltmp0:
The bytes after the tail call are unparseable. The bytes before the
tail call are difficult to parse, and AFAIR ddb doesn't look at them
(the xorb says that there are no xmm args, but that is of no interest
in the kernel).
Gcc generates:
% test:
% .LFB2:
% subq $8, %rsp
% .LCFI0:
% movl $0, %eax
% call bar
% addq $8, %rsp
% ret
The addq $8,%rsp after the call looks like it is popping 1 arg and is
probably misguessed as such -- it is actually to clean up the stack
alignment before return. Actually, this guess doesn't apply to amd64
-- the first few args are in registers where their number is almost
impossible to guess even for gcc if there are none on stack; it is
only when there are some on the stack that the number there can possibly
be guessed and the total number = #number in registers + #on stack.
With 1 arg: `void bar(int x); void foo(int x) { bar(x); }', the generated
code is identical for both compilers. So to have any chance of guessing the
number of args for bar(), bytes in bar()'s caller's callers would have
to be checked, recursively. This is too hard for ddb. With tail calls,
the checking is unbounded.
Tail calls also break the stack trace, but save space so running out of
kernel stack is less likely :-).
It is arguably a bug to compile kernels with options that make stack tracing
impossible. Even passing args in registers makes correct arg printing
impossible -- ddb would have to assume that there are always at least
3 (?) args on amd64. The above shows 3 being printed for fork_exit(),
which is correct, and 3 for kmem_malloc(), which is correct, and none
for fork_trampoline(), which depends on ddb having special knowledge
of this and a few other low-level/asm functions. All the others are
broken.
A very old bug in this area is that functions written in asm don't have
normal frames (at least on amd64 and i386) unless profiling is configured.
They are mostly leaf functions that shouldn't trap, so this is not very
important for interactive use of ddb. But it makes the args of a function
like memcpy() harder to find in interactive use, and breaks stack traces
if an interrupt or trap occurs in the functions.

Compilers may also omit frame pointers for leaf functions, but this is
easy to prevent. The efficiency gains from not using frame pointers are
small or negative, at least on modern x86, since the frame pointer
management can run in parallel and is especially fast in leaf functions
where it is not actually used (except for debugging), so frame pointers
should never be omitted if there is any chance that the code needs to be
debugged. The chance is almost 100% for kernels since they can be
debugged if GDB or DDB or stack tracing is configured or if panic dumps
are enabled and occur.

A not so old bug in this area is that compiling with -O2 or just with
-march tends to break things. Even on old i386 kernels compiled with
-O -march=i386, I mostly see 5 args (but no ', ...'). For just a few
functions, the calling sequence is "call foo; addl $4*N,%esp" and ddb
correctly guesses that this means N args. Another sequence is "call foo;
movl %eax,%esi; addl $8,%esp". ddb doesn't understand this at all, and
misguesses that there are 5 args.

Bruce
