Robert Milkowski wrote:
Hi.

    On nfs clients which are mounting file system f3-1/d611 I can see 3-5s 
periods of 100% busy (iostat) and almost no IOs issued to nfs server, on nfs 
server at the same time disk activity is almost 0 (both iostat and zpool 
iostat). However CPU activity increases in SYS during that periods.

Different time period when disk activity is small:

# lockstat -kgIw sleep 10 | less

Did you happen to get 'lockstat -kgIW' output while the problem was occurring? (note the capital W) I'm not sure how to interpret the -w output... (and sorry I gave you the wrong flags before).

Now during another period when disk activity is low and nfs clients see problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/[EMAIL 
PROTECTED](vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  page_next_scan_large                                       23648600
  generic_idle_cpu                                           69234100
  disp_getwork                                              139261800
  avl_walk                                                  669424900

Hmm, that's a possibility, but the method you're using to gather this information (tracing *every* function entry and exit) is a bit heavy-handed, and it may be distorting the results.

Heh, I'm sure I have seen avl_walk consuming lot of CPU before...

So wait for another such period and (6-7seconds):

# dtrace -n fbt::avl_walk:entry'[EMAIL PROTECTED]()]=count();}'
[...]

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
          1415957

Hmm, assuming that avl_walk() is actually consuming most of our CPU (which the lockstat -kgIW will confirm), this seems to indicate that we're taking a long time trying to find free chunks of space. This may happen if you have lots of small fragments of free space, but no chunks large enough to hold the block we're trying to allocate. We try to avoid this situation by trying to allocate from the metaslabs with the most free space, but it's possible that there's an error in this algorithm.

So lets destroy oldest snapshot:

# zfs destroy f3-1/[EMAIL PROTECTED]
[it took about 4 minutes!]

After snapshot was destroyed problem is completly gone.

FYI, destroying the snapshot probably helped simply by (a) returning some big chunks of space to the pool and/or (b) perturbing the system enough so that we try different metaslabs which aren't so fragmented.

--matt

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to