Robert Milkowski wrote:
> Hi.
>
> On NFS clients that mount the file system f3-1/d611 I can see 3-5s
> periods of 100% busy (iostat) with almost no I/Os issued to the NFS
> server; on the NFS server at the same time, disk activity is almost 0
> (both iostat and zpool iostat). However, CPU activity in SYS increases
> during those periods.
>
> A different time period, when disk activity is small:
>
> # lockstat -kgIw sleep 10 | less
Did you happen to get 'lockstat -kgIW' output while the problem was
occurring? (note the capital W) I'm not sure how to interpret the -w
output... (and sorry I gave you the wrong flags before).
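For reference, something like this during one of the stalls should do (the
-D 20 cutoff just keeps the output to the top 20 entries, and the 10-second
window is only illustrative):

  # lockstat -kgIW -D 20 sleep 10

With -I, lockstat does time-based kernel profiling instead of lock tracing,
and -W folds the samples together regardless of which CPU they came from, so
the per-function totals are easier to read.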
> Now, during another period when disk activity is low and the NFS clients
> see the problem:
>
> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' \
>          -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' \
>          -n tick-5s'{printa(@);exit(0);}'
> [...]
>   page_next_scan_large          23648600
>   generic_idle_cpu              69234100
>   disp_getwork                 139261800
>   avl_walk                     669424900
Hmm, that's a possibility, but the method you're using to gather this
information (tracing *every* function entry and exit) is a bit
heavy-handed, and it may be distorting the results.
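If you get a chance to re-run it, a lighter-weight way to get a similar
per-function picture is to sample the kernel PC with the profile provider
rather than tracing every entry and return; something like this (the 997 Hz
rate and the top-20 truncation are arbitrary choices):

  # dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); }' \
           -n 'tick-10s { trunc(@, 20); printa(@); exit(0); }'

Because this fires only ~997 times per second per CPU, it adds far less
overhead than fbt on every function and is less likely to distort the result.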
> Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
>
> So I wait for another such period and run (for 6-7 seconds):
>
> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
> [...]
>               zfs`metaslab_ff_alloc+0x9c
>               zfs`space_map_alloc+0x10
>               zfs`metaslab_group_alloc+0x1e4
>               zfs`metaslab_alloc_dva+0x114
>               zfs`metaslab_alloc+0x2c
>               zfs`zio_alloc_blk+0x70
>               zfs`zil_lwb_write_start+0x8c
>               zfs`zil_lwb_commit+0x1ac
>               zfs`zil_commit+0x1b0
>               zfs`zfs_fsync+0xa8
>               genunix`fop_fsync+0x14
>               nfssrv`rfs3_create+0x700
>               nfssrv`common_dispatch+0x444
>               rpcmod`svc_getreq+0x154
>               rpcmod`svc_run+0x198
>               nfs`nfssys+0x1c4
>               unix`syscall_trap32+0xcc
>           1415957
Hmm, assuming that avl_walk() is actually consuming most of our CPU
(which the lockstat -kgIW output will confirm), this seems to indicate
that we're taking a long time trying to find free chunks of space. This
can happen if you have lots of small fragments of free space, but no
chunks large enough to hold the block we're trying to allocate. We try
to avoid this situation by allocating from the metaslabs with the most
free space, but it's possible that there's an error in this algorithm.
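One way to check that on a live system is to look at how long each
allocation attempt spends inside metaslab_ff_alloc() during a stall; badly
fragmented space maps should show up as a long tail in the distribution. A
rough sketch (fbt probe availability can vary slightly between builds):

  # dtrace -n 'fbt::metaslab_ff_alloc:entry { self->ts = timestamp; }' \
           -n 'fbt::metaslab_ff_alloc:return /self->ts/ { @["ns per metaslab_ff_alloc"] = quantize(timestamp - self->ts); self->ts = 0; }' \
           -n 'tick-10s { printa(@); exit(0); }'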
> So let's destroy the oldest snapshot:
>
> # zfs destroy f3-1/d611@<oldest snapshot>
> [it took about 4 minutes!]
>
> After the snapshot was destroyed the problem is completely gone.
FYI, destroying the snapshot probably helped simply by (a) returning
some big chunks of space to the pool and/or (b) perturbing the system
enough so that we try different metaslabs which aren't so fragmented.
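If you want to see how the remaining free space is spread across the
metaslabs, zdb can dump the layout read-only from the pool (assuming your
bits have the -m option; the output format varies between releases, and a
large pool can take a while to walk):

  # zdb -m f3-1

A pool whose metaslabs are mostly allocated, with only small free segments
left in each, would fit the picture above.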
--matt