On 15 Feb 08, at 11:38, Philip Beevers wrote:

> Hi everyone,
>
> This is my first post to zfs-discuss, so be gentle with me :-)
>
> I've been doing some testing with ZFS - in particular, in checkpointing
> the large, proprietary in-memory database which is a key part of the
> application I work on. In doing this I've found what seems to be some
> fairly unhelpful write throttling behaviour from ZFS.
>
> In summary, the environment is:
>
> * An x4600 with 8 CPUs and 128GBytes of memory
> * A 50GByte in-memory database
> * A big, fast disk array (a 6140 with a LUN comprised of 4 SATA drives)
> * Running Solaris 10 update 4 (problems initially seen on U3 so I got it
>   patched)
>
> The problems happen when I checkpoint the database, which involves
> putting that database on disk as quickly as possible, using the write(2)
> system call.
>
> The first time the checkpoint is run, it's quick - about 160MBytes/sec,
> even though the disk array is only sustaining 80MBytes/sec. So we're
> dirtying stuff in the ARC (and growing the ARC) at a pretty impressive
> rate.
>
> After letting the IO subside, running the checkpoint again results in
> very different behaviour. It starts running very quickly, again at
> 160MByte/sec (with the underlying device doing 80MBytes/sec), and after
> a while (presumably once the ARC is full) things go badly wrong. In
> particular, a write(2) system call hangs for 6-7 minutes, apparently
> until all the outstanding IO is done. Any reads from that device also
> take a huge amount of time, making the box very unresponsive.
>
> Obviously this isn't good behaviour, but it's particularly unfortunate
> given that this checkpoint is stuff that I don't want to retain in any
> kind of cache anyway - in fact, preferably I wouldn't pollute the ARC
> with it in the first place.
> But it seems directio(3C) doesn't work with
> ZFS (unsurprisingly as I guess this is implemented in segmap), and
> madvise(..., MADV_DONTNEED) doesn't drop data from the ARC (again, I
> guess, as it's working on segmap/segvn).
>
> Of course, limiting the ARC size to something fairly small makes it
> behave much better. But this isn't really the answer.
>
> I also tried using O_DSYNC, which stops the pathological behaviour but
> makes things pretty slow - I only get a maximum of about 20MBytes/sec,
> which is obviously much less than the hardware can sustain.
>
> It sounds like we could do with different write throttling behaviour to
> head this sort of thing off. Of course, the ideal would be to have some
> way of telling ZFS not to bother keeping pages in the ARC.
>
> The latter appears to be bug 6429855. But the underlying behaviour
> doesn't really seem desirable; are there plans afoot to do any work on
> ZFS write throttling to address this kind of thing?
>
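For anyone trying to reproduce this, the checkpoint path described above (dumping an in-memory image with write(2), optionally opening the file with O_DSYNC) might look roughly like the following. This is a minimal, portable POSIX sketch, not the application's actual code: the names `checkpoint` and `write_all`, the chunked loop, and the chunk size are hypothetical, and Solaris-only calls such as directio(3C) are deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno set by write(2)). */
static int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -1;
        }
        buf += n;                   /* short write: advance and continue */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical checkpoint routine: dump an in-memory image to path in
 * chunk-sized write(2) calls. With use_dsync set, the file is opened with
 * O_DSYNC, so each write(2) returns only once its data is on stable storage. */
static int checkpoint(const char *path, const char *image, size_t size,
                      size_t chunk, int use_dsync)
{
    assert(chunk > 0);
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (use_dsync)
        flags |= O_DSYNC;
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;
    for (size_t off = 0; off < size; off += chunk) {
        size_t n = (size - off < chunk) ? size - off : chunk;
        if (write_all(fd, image + off, n) != 0) {
            int saved = errno;      /* keep write(2)'s errno across close */
            close(fd);
            errno = saved;
            return -1;
        }
    }
    return close(fd);
}
```

A call such as `checkpoint("/pool/ckpt.bin", image, size, 1 << 20, 0)` (path and 1 MB chunk purely illustrative) corresponds to the fast, cache-filling case; passing `1` for `use_dsync` forces each chunk to stable storage before the next write(2) starts, which keeps dirty data from piling up but serialises the writer behind the disk, in line with the much lower throughput reported above.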

Throttling is being addressed.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

BTW, the new code will adjust write speed to disk speed very quickly.
You will not see those ultra-fast initial checkpoints. Is this a
concern?

-r

> Regards,
>
> --
>
> Philip Beevers
> Fidessa Infrastructure Development
>
> mailto:[EMAIL PROTECTED]
> phone: +44 1483 206571
>
> ********************************************************************************************************************************************************************************************
> This message is intended only for the stated addressee(s) and may be
> confidential. Access to this email by anyone else is unauthorised.
> Any opinions expressed in this email do not necessarily reflect the
> opinions of Fidessa. Any unauthorised disclosure, use or
> dissemination, either whole or in part is prohibited. If you are not
> the intended recipient of this message, please notify the sender
> immediately.
>
> Fidessa plc - Registered office:
> Dukes Court, Duke Street, Woking, Surrey, GU21 5BH, United Kingdom
> Registered in England no. 3781700 VAT registration no. 688 9008 78
>
> Fidessa group plc - Registered Office:
> Dukes Court, Duke Street, Woking, Surrey, GU21 5BH, United Kingdom
> Registered in England no. 3234176 VAT registration no. 688 9008 78
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
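To make the trade-off in the reply concrete (no long hang, but also no fast initial checkpoint), here is a deliberately crude model in one-second steps. It is not ZFS's algorithm, and every name and parameter in it is hypothetical: the cache is a fixed dirty-data cap `cache_mb`, an unthrottled writer that hits the cap blocks until all dirty data has been flushed (mimicking the write(2) hang described in the original post), and a throttled writer is simply held to the disk rate.

```c
#include <assert.h>

/* Toy model only: NOT how ZFS throttles writes. Units are MB and seconds. */
struct sim_result {
    long writer_done_s; /* second in which the application's last write returned */
    long max_stall_s;   /* longest run of seconds the writer was blocked */
    long total_s;       /* second in which the last dirty MB reached disk */
};

static struct sim_result simulate(long total_mb, long app_rate, long disk_rate,
                                  long cache_mb, int throttled)
{
    assert(app_rate > 0 && disk_rate > 0 && cache_mb > 0);
    /* A throttled writer is held to the disk's rate; otherwise it runs flat out. */
    long rate = (throttled && disk_rate < app_rate) ? disk_rate : app_rate;
    long t = 0, written = 0, dirty = 0, stall = 0;
    int blocked = 0;
    struct sim_result r = {0, 0, 0};

    while (written < total_mb || dirty > 0) {
        t++;
        /* The disk flushes up to disk_rate MB of dirty data each second. */
        dirty -= (dirty < disk_rate) ? dirty : disk_rate;
        /* A blocked writer resumes only once all dirty data is on disk. */
        if (blocked && dirty == 0)
            blocked = 0;
        if (written < total_mb && !blocked) {
            long n = (total_mb - written < rate) ? total_mb - written : rate;
            if (n > cache_mb - dirty) {
                /* Cache full: accept what fits, then block. */
                n = cache_mb - dirty;
                blocked = 1;
            }
            written += n;
            dirty += n;
            stall = 0;
            if (written == total_mb)
                r.writer_done_s = t;
        } else if (written < total_mb) {
            /* Writer still has data but is blocked this second. */
            if (++stall > r.max_stall_s)
                r.max_stall_s = stall;
        }
    }
    r.total_s = t;
    return r;
}
```

With the rates from the thread (160 MB/s in, 80 MB/s out), a hypothetical 400 MB cap and 1000 MB to write, `simulate(1000, 160, 80, 400, 0)` has the writer finish at 11 s after a 4 s stall, while `simulate(1000, 160, 80, 400, 1)` finishes writing at 13 s with no stall; both are fully on disk at 14 s. When the data fits in the cap (e.g. 300 MB), the unthrottled writer returns at 2 s versus 4 s throttled, which is the "ultra-fast initial checkpoint" that goes away.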