On 04/10/10 09:28, Edward Ned Harvey wrote:
> Neil or somebody? Actual ZFS developers? Taking feedback here? ;-)
> While I was putting my poor little server through cruel and unusual
> punishment as described in my post a moment ago, I noticed something
> unexpected:
> I expected that while I'm stressing my log device by infinite sync
> writes, my primary storage devices would also be busy(ish). Not
> really busy, but not totally idle either. Since the primary storage
> is a stripe of spindle mirrors, obviously it can handle much more
> sustainable throughput than the individual log device, but the log
> device can respond with smaller latency. What I noticed was this:
> For several seconds, **only** the log device is busy. Then it stops,
> and for maybe 0.5 secs **only** the primary storage disks are busy.
> Repeat, recycle.
These are the txgs getting pushed out.
> I expected to see the log device busy nonstop. And the spindle disks
> blinking lightly. As long as the spindle disks are idle, why wait for
> a larger TXG to be built? Why not flush out smaller TXG's as long as
> the disks are idle?
Sometimes it's more efficient to batch up requests. Fewer blocks are
written. As you mentioned, you weren't stressing the system heavily.
ZFS will perform differently when under pressure: it will shorten the
time between txgs if the data arrives more quickly.
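To make the batching tradeoff concrete, here is a toy C sketch of that
decision. It only models the behaviour described above; the names
(txg_should_commit, TXG_TIMEOUT_SECONDS, TXG_DIRTY_THRESHOLD) and the
threshold values are invented for illustration, not the actual ZFS
symbols or tunables:

    /*
     * Toy model of txg batching (not the real ZFS code; all names and
     * values here are made up).  The open txg is closed and pushed to
     * the main pool disks either when a timer expires or when enough
     * dirty data has accumulated, so under a light synchronous load the
     * slog absorbs writes for several seconds and then the spindles get
     * one short burst.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define TXG_TIMEOUT_SECONDS  5            /* hypothetical timer        */
    #define TXG_DIRTY_THRESHOLD  (64 << 20)   /* hypothetical 64MB ceiling */

    struct open_txg {
            uint64_t dirty_bytes;    /* data buffered for this txg     */
            uint64_t seconds_open;   /* time since this txg was opened */
    };

    bool
    txg_should_commit(const struct open_txg *txg)
    {
            /*
             * Batching means fewer, larger writes to the pool.  Under
             * heavier load dirty_bytes grows faster, the threshold is
             * reached sooner, and txgs are pushed more frequently.
             */
            return (txg->seconds_open >= TXG_TIMEOUT_SECONDS ||
                txg->dirty_bytes >= TXG_DIRTY_THRESHOLD);
    }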
> But worse yet ... During the 1-second (or 0.5 second) that the
> spindle disks are busy, why stop the log device? (Presumably also
> stopping my application that's doing all the writing.)
Yes, this has been observed by many people. There are two sides to this
problem, related to the CPU and I/O used while pushing a txg:
6806882 need a less brutal I/O scheduler
6881015 ZFS write activity prevents other threads from running in a
timely manner
The CPU side (6881015) was fixed relatively recently in snv_129.
> This means, if I'm doing zillions of **tiny** sync writes, I will get
> the best performance with the dedicated log device present. But if
> I'm doing large sync writes, I would actually get better performance
> without the log device at all. Or else ... add just as many log
> devices as I have primary storage devices. Which seems kind of crazy.
Yes, you're right: there are times when it's better to bypass the slog
and use the pool disks, which can deliver better bandwidth.
The algorithm for where and what the ZIL writes has become quite complex
(a rough sketch of the decision logic follows the list below):
- There was another change recently to bypass the slog if 1MB had been
sent to it and 2MB were waiting to be sent.
- There's a new property, logbias, which when set to throughput directs
the ZIL to send all of its writes to the main pool devices, thus freeing
the slog for more latency-sensitive work (ideal for database data files).
- If synchronous writes are large (>32K) and block-aligned, then the
blocks are written directly to the pool and a small record is
written to the log. Later, when the txg commits, the blocks are
just linked into the txg. However, this processing is not
done if there are any slogs, because I found it didn't perform as well.
That probably ought to be re-evaluated.
- There are further tweaks being suggested which might make it to a
ZIL near you soon.
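To tie the rules above together, here is a rough C sketch of that
decision tree. It models only what is listed in this mail; the enum,
struct, field names, and function are simplified inventions of mine,
not the actual ZIL implementation:

    /*
     * Illustrative decision tree for where a synchronous write goes
     * (invented names; thresholds taken from the description above).
     */
    #include <stdbool.h>
    #include <stddef.h>

    enum zil_destination {
            WRITE_TO_SLOG,       /* log record + data go to the slog           */
            WRITE_TO_POOL_LOG,   /* log record + data go to main pool devices  */
            WRITE_DATA_DIRECT    /* data written in place, small record logged */
    };

    struct zil_state {
            bool   has_slog;            /* a dedicated log device exists        */
            bool   logbias_throughput;  /* dataset has logbias=throughput       */
            size_t slog_sent;           /* bytes already sent to the slog       */
            size_t slog_waiting;        /* bytes waiting to be sent to the slog */
    };

    enum zil_destination
    zil_choose_destination(const struct zil_state *zs, size_t write_size,
        bool block_aligned)
    {
            /* logbias=throughput: keep the slog free for latency-sensitive work. */
            if (zs->logbias_throughput)
                    return (WRITE_TO_POOL_LOG);

            /* Bypass a backed-up slog: ~1MB already sent, ~2MB still waiting. */
            if (zs->has_slog && zs->slog_sent > 1 * 1024 * 1024 &&
                zs->slog_waiting > 2 * 1024 * 1024)
                    return (WRITE_TO_POOL_LOG);

            /*
             * Large (>32K), block-aligned sync writes with no slog: write the
             * data block in place and log only a small record, which is linked
             * into the txg when it commits.
             */
            if (!zs->has_slog && write_size > 32 * 1024 && block_aligned)
                    return (WRITE_DATA_DIRECT);

            return (zs->has_slog ? WRITE_TO_SLOG : WRITE_TO_POOL_LOG);
    }

The logbias case corresponds to the dataset property, e.g.
"zfs set logbias=throughput <dataset>".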
Neil.