I moved my main workspaces over to ZFS a while ago and noticed that my disk got 
really noisy (yes, one of those subjective measurements). It sounded like the 
head was being bounced around a lot at the end of each transaction group.

Today I grabbed the iosnoop DTrace script (from 
<http://www.opensolaris.org/os/community/dtrace/scripts/>) and took a look at 
its output. It's strange: the blocks appear to be written to disk in nearly 
random order.

I have a two-vdev pool, just plain disk slices, no mirroring etc. (I'm not 
using whole disks because I've just got the two disks in my workstation and my 
root is still on UFS.) If I use 'dd' to create a 1MB file out of 1KB writes and 
wait for it to be pushed to disk, one of the two disks sees a block stream like 
this (start:length, both in 512-byte sectors):

  27610929:1
  27610930:3
  27610933:9
  27610942:13
  39425458:13  <-- huh?
  27565952:16  <-- now we've gone backwards
  39400576:16
  27463484:4
  39342412:4
  27581454:2
  39382602:2
  27581456:2
  ...

So the head of this disk is happily bouncing back and forth at this point 
(well, they're FC disks with a reasonably deep queue, so it's not as bad as it 
could be, but it's still not great).

The other disk is behaving a little better, but still moving back and forth 
between two block ranges.
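For anyone who wants to reproduce this, a minimal sketch of the test workload follows: a 1MB file built from 1KB write(2) calls, roughly what a 'dd if=/dev/zero of=FILE bs=1k count=1024' invocation does. That exact dd command line is an assumption (only "1MB out of 1KB writes" is given above), and write_in_chunks is a hypothetical helper.

```python
import os


def write_in_chunks(path, total_bytes=1 << 20, chunk_bytes=1 << 10):
    """Create `path` from total_bytes of zeros using chunk_bytes-sized write(2)
    calls (1MB in 1KB pieces by default) and return the number of calls made."""
    buf = bytes(chunk_bytes)
    calls = 0
    remaining = total_bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while remaining > 0:
            written = os.write(fd, buf[:min(chunk_bytes, remaining)])
            calls += 1
            remaining -= written
    finally:
        os.close(fd)
    # No fsync() on purpose: on ZFS it would trigger intent-log (ZIL) writes,
    # whereas here the data should be written out by the next transaction group sync.
    return calls
```

Point it at a file on the pool (any path there will do), start iosnoop, and then wait for the next transaction group to sync before reading the trace.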

Before I find time to dig into the intricacies of the I/O scheduler, does 
anyone have hints as to why this might be happening? My intuition is that, 
since only the überblock write commits them, the blocks could be written out in 
any order, so we should be able to use an always-move-forward (elevator) 
ordering and, of course, let the disk do its own scheduling within that. Also, 
why the very small adjacent writes? The first four writes in the trace pushed 
out 13K of contiguous data using four separate write operations, which is 
wasteful. (There are others too: towards the end of the excerpt above we're 
doing two 1K writes to adjacent blocks.) Does the scheduler attempt to coalesce 
adjacent writes as well?
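A minimal sketch of the ordering I'd naively expect, assuming (as argued above) that nothing in the transaction group is referenced until the überblock is written, so the data writes can be reordered freely. This is not ZFS's actual I/O scheduler: cscan_order and coalesce are hypothetical names, and it simply applies a circular-SCAN sort followed by merging of touching extents to the excerpt.

```python
# (start, length) pairs in 512-byte sectors, copied from the trace excerpt above.
EXCERPT = [
    (27610929, 1), (27610930, 3), (27610933, 9), (27610942, 13),
    (39425458, 13), (27565952, 16), (39400576, 16), (27463484, 4),
    (39342412, 4), (27581454, 2), (39382602, 2), (27581456, 2),
]


def cscan_order(ios, head=0):
    """Circular SCAN: issue everything at or past `head` in ascending LBA order,
    then wrap around and issue the rest, so the head only ever sweeps forward."""
    ahead = sorted(io for io in ios if io[0] >= head)
    behind = sorted(io for io in ios if io[0] < head)
    return ahead + behind


def coalesce(ios):
    """Merge consecutive I/Os whose extents touch or overlap into one larger I/O."""
    merged = []
    for start, length in ios:
        if merged:
            prev_start, prev_len = merged[-1]
            prev_end = prev_start + prev_len
            if prev_start <= start <= prev_end:
                merged[-1] = (prev_start, max(prev_end, start + length) - prev_start)
                continue
        merged.append((start, length))
    return merged


plan = coalesce(cscan_order(EXCERPT, head=0))
print(len(EXCERPT), "writes ->", len(plan), "writes")
for start, length in plan:
    print(f"{start}:{length}")
```

On the excerpt this turns 12 writes into 8, including a single 26-sector (13K) write in place of the first four and one 4-sector write in place of the two adjacent 1K writes. A real scheduler would presumably also cap the size of a merged I/O; that is left out here.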

(I should mention that this is S10U2, so there have certainly been fixes since.)
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
