On 08/09/2009, at 2:01 AM, Ross Walker wrote:
On Sep 7, 2009, at 1:32 AM, James Lever <j...@jamver.id.au> wrote:
Well, an MD1000 holds 15 drives, so a good compromise might be two 7-drive
RAIDZ2s with a hot spare... Since each RAIDZ vdev does random writes at
roughly the speed of a single disk, that should provide 320 IOPS instead of
160, a big difference.
The issue is interactive responsiveness and whether there is a way to
tune the system to give that while still having good performance
for builds when they are run.
Look at the write IOPS of the pool with 'zpool iostat -v' and see
how many are happening on the RAIDZ2 vdev.
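For example, something like the following run while the load is on (the
pool name is just a placeholder):

    # per-vdev read/write operations and bandwidth, 5 second intervals
    zpool iostat -v tank 5

The 'operations' columns show how many IOPS land on the raidz2 vdev
versus the log device.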
I was suggesting that slog writes were possibly starving reads from
the L2ARC as they were on the same device. This appears not to
have been the issue, as the problem has persisted even with the
L2ARC devices removed from the pool.
The SSD will handle a lot more IOPS than the pool, and the L2ARC is a
lazy reader; it mostly just holds on to read cache data.
It may just be that the pool configuration can't handle the
write IOPS needed and reads are starving.
Possible, but hard to tell. Have a look at the iostat results I’ve
posted.
The busy times of the disks while the issue is occurring should let
you know.
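For reference, the per-disk busy percentages can be watched with something
like:

    # extended device statistics, 5 second intervals; %b is the busy column
    iostat -xn 5

If the raidz2 member disks sit near 100 %b while the slog is mostly idle
(or vice versa), that shows where the bottleneck is.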
So it turns out that the problem is that all writes coming via NFS are
going through the slog. When that happens, the transfer speed to the
device drops to ~70MB/s (the write speed of this SLC SSD), and until the
load drops all new write requests are blocked, causing a noticeable
delay (which has been observed to be up to 20s, but is generally only
2-4s).
I can reproduce this behaviour by copying a large file (hundreds of MB
in size) using 'cp src dst' on an NFS (still currently v3) client and
observing that all of the data is pushed through the slog device (a 10GB
partition of a Samsung 50GB SSD behind a PERC 6/i w/256MB BBC) rather
than going direct to the primary storage disks.
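Roughly, the reproduction looks like this (the paths and pool name are
just placeholders):

    # on an NFS client
    cp /tmp/largefile /net/server/export/builds/

    # on the server, in another shell, watch where the writes land
    zpool iostat -v tank 1

While the copy runs, essentially all of the write operations show up
against the log device rather than the raidz2 vdev.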
On a related note, I had 2 of these devices (both using just 10GB
partitions) connected as log devices (so the pool had 2 separate log
devices) and the second one was consistently running significantly
slower than the first. Removing the second device improved performance,
but did not remove the occasional observed pauses.
I was of the (mis)understanding that only metadata and writes smaller
than 64k went via the slog device in the event of an O_SYNC write
request?
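For reference, I believe the threshold I had in mind is the
zfs_immediate_write_sz tunable, though I'm not sure it is still honoured
once a dedicated log device is attached. On a 64-bit kernel it can at
least be inspected with mdb:

    # current threshold in bytes (8-byte unsigned decimal)
    echo zfs_immediate_write_sz/E | mdb -k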
The clients are (mostly) RHEL5.
Is there a way to tune this on the NFS server or clients such that
when I perform a large synchronous write, the data does not go via the
slog device?
I have investigated using the logbias setting, but that would also kill
small-file performance on any filesystem using it and defeat the
purpose of having a slog device at all.
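For reference, logbias is set per filesystem (the dataset name below is
just a placeholder):

    # send large synchronous writes for this dataset straight to the pool disks
    zfs set logbias=throughput tank/builds

Because it applies to the whole dataset, the small synchronous writes on
that filesystem would lose the benefit of the slog as well.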
cheers,
James
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss