Tao Chen writes:

> On 5/12/06, Roch Bourbonnais - Performance Engineering
> <[EMAIL PROTECTED]> wrote:
> >
> >   From: Gregory Shaw <[EMAIL PROTECTED]>
> >   Regarding directio and quickio, is there a way with ZFS to skip the
> >   system buffer cache? I've seen big benefits for using directio when
> >   the data files have been segregated from the log files.
> >
> > Were the benefits coming from extra concurrency (no single writer lock)
>
> Does DIO bypass the "writer lock" on Solaris?

Yep.

> Not on AIX, which uses CIO (concurrent I/O) to bypass managing locks
> at the filesystem level:
> http://oracle.ittoolbox.com/white-papers/improving-database-performance-with-aix-concurrent-io-2582
>
> > or avoiding the extra copy to page cache
>
> Certainly. Also to avoid VM overhead (DBs do like raw devices).

OK, but again, is it to avoid badly configured readahead, or to get
extra concurrency, or something else? I have a hard time believing that
managing the page cache represents a cost when you compare it to a 5ms
I/O.

> > or from too much readahead that is not used before pages need to be
> > recycled.
>
> Not sure what you mean (avoid unnecessary readahead?)

There is this thing where a 2K read over UFS, if it crosses a page
boundary, can lead UFS to assert sequential access to the file and do a
clustered readahead. Since clusters are often set to 1MB, you can get a
lot of spurious I/O from this one 2K read. If the readahead data then
turns out never to be used, because of memory pressure, you have a
suboptimal configuration. This is one kind of issue that DIO would not
have.
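For concreteness, here is roughly what per-file DIO looks like on
Solaris, via directio(3C). A minimal sketch only; the file name and the
error handling are purely illustrative:

    #include <sys/types.h>
    #include <sys/fcntl.h>   /* directio(), DIRECTIO_ON */
    #include <fcntl.h>       /* open() */
    #include <stdio.h>       /* perror() */

    int
    main(void)
    {
            /* Hypothetical datafile path, for illustration only. */
            int fd = open("/ufs/db/datafile.dbf", O_RDWR);

            if (fd == -1) {
                    perror("open");
                    return (1);
            }

            /*
             * Advise the filesystem to bypass the page cache for this
             * file. A filesystem that does not accept the advice
             * fails with ENOTTY, and I/O keeps going through the
             * cache as usual.
             */
            if (directio(fd, DIRECTIO_ON) == -1)
                    perror("directio");

            /* ... subsequent reads/writes bypass the page cache ... */
            return (0);
    }

Mounting UFS with -o forcedirectio gives the same behavior
filesystem-wide.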
> > ZFS already has the concurrency.
>
> Interesting, would like to find more on this.

I'll have to blog this down one day.

> > The page cache copy is really rather cheap
>
> VM as a whole is certainly not cheap.

In some respects, certainly. Compared to an I/O, I'd say it's really
cheap, minus bugs. My point is to be cautious of this syllogism: DIO, a
VM bypass mechanism, can be much faster than regular I/O; thus the VM
is costly. DIO is a VM bypass _and_ a different UFS codepath.

> > and I assert somewhat necessary to ensure data integrity.
>
> Not following you.

I'm on thin ground here. But I believe that you can't directly write a
disk block and its checksum in the referring block in the way ZFS wants
to; or at least you couldn't do this and hold up the application in a
way that is acceptable performance-wise. So to ensure the data
integrity that ZFS delivers, it has to keep the data cached for the
time it takes to update the on-disk format properly. I'm willing to be
corrected on this (and on anything else, for that matter; we live in a
complex world).

> > The extra readahead is somewhat of a bug in UFS (read 2 pages, get
> > a maxcontig chunk (1MB)).
>
> Ouch.

You said it. But people have learned to tune it down when this hits
(tunefs -a, to lower maxcontig), which is not that often.

> > ZFS is new; conventional wisdom may or may not apply.
>
> This (zfs-discuss) is the place where we can be enlightened :-)
>
> Tao

-r

____________________________________________________________________________________
Roch Bourbonnais                 Sun Microsystems, Icnc-Grenoble
Senior Performance Analyst       180, Avenue De L'Europe, 38330,
                                 Montbonnot Saint Martin, France
Performance & Availability Engineering
http://icncweb.france/~rbourbon          http://blogs.sun.com/roller/page/roch
[EMAIL PROTECTED]                 (+33).4.76.18.83.20
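P.S. For anyone hitting the UFS readahead issue described above, the
tuning in question looks something like this (device name illustrative;
maxcontig is counted in filesystem blocks, so with 8K blocks a value of
16 caps readahead clusters at 128K instead of 1MB):

    # tunefs -a 16 /dev/rdsk/c0t0d0s6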