Tao Chen writes:

> On 5/12/06, Roch Bourbonnais - Performance Engineering
> <[EMAIL PROTECTED]> wrote:
> >
> >   From: Gregory Shaw <[EMAIL PROTECTED]>
> >   Regarding directio and quickio, is there a way with ZFS to skip the
> >   system buffer cache? I've seen big benefits for using directio when
> >   the data files have been segregated from the log files.
> >
> > Were the benefits coming from extra concurrency (no single writer lock)
>
> Does DIO bypass the "writer lock" on Solaris?

Yep.

> Not on AIX, which uses CIO (concurrent I/O) to bypass managing locks
> at the filesystem level:
> http://oracle.ittoolbox.com/white-papers/improving-database-performance-with-aix-concurrent-io-2582
>
> > or avoiding the extra copy to page cache
>
> Certainly. Also to avoid VM overhead (DBs do like raw devices).

OK, but again, is it to avoid badly configured readahead, or to get
extra concurrency, or something else? I have a hard time believing that
managing the page cache represents a cost when you compare it to a 5ms
I/O.

> > or from too much readahead that is not used before pages need to be
> > recycled.
>
> Not sure what you mean (avoid unnecessary readahead?)

There is this thing where a 2K read over UFS, if it crosses a page
boundary, can lead UFS to assert sequential access to the file and do a
clustered readahead. Since clusters are often set to 1MB, you can get a
lot of spurious I/O from this one 2K read. If the readahead data then
turns out never to be used, because of memory pressure, you have a
suboptimal configuration. This is one kind of issue that DIO would not
have.
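For concreteness, here is roughly what per-file DIO looks like on
Solaris, via directio(3C). A minimal sketch only; the file name and the
error handling are purely illustrative:

    #include <sys/types.h>
    #include <sys/fcntl.h>   /* directio(), DIRECTIO_ON */
    #include <fcntl.h>       /* open() */
    #include <stdio.h>       /* perror() */

    int
    main(void)
    {
            /* Hypothetical datafile path, for illustration only. */
            int fd = open("/ufs/db/datafile.dbf", O_RDWR);

            if (fd == -1) {
                    perror("open");
                    return (1);
            }

            /*
             * Advise the filesystem to bypass the page cache for this
             * file. A filesystem that does not accept the advice
             * fails with ENOTTY, and I/O keeps going through the
             * cache as usual.
             */
            if (directio(fd, DIRECTIO_ON) == -1)
                    perror("directio");

            /* ... subsequent reads/writes bypass the page cache ... */
            return (0);
    }

Mounting UFS with -o forcedirectio gives the same behavior
filesystem-wide.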
> > ZFS already has the concurrency.
>
> Interesting, would like to find more on this.

I'll have to blog this down one day.

> > The page cache copy is really rather cheap
>
> VM as a whole is certainly not cheap.

In some respects, certainly. Compared to an I/O, I'd say it's really
cheap, minus bugs. My point is to be cautious of this syllogism: DIO, a
VM bypass mechanism, can be much faster than regular I/O; thus the VM
is costly. DIO is a VM bypass _and_ a different UFS codepath.

> > and I assert somewhat necessary to ensure data integrity.
>
> Not following you.

I'm on thin ground here. But I believe that you can't directly write a
disk block and its checksum in the referring block in the way ZFS wants
to; or at least you couldn't do this and hold up the application in a
way that is acceptable performance-wise. So to ensure the data
integrity that ZFS delivers, it has to keep the data cached for the
time it takes to update the on-disk format properly. I'm willing to be
corrected on this (and on anything else, for that matter; we live in a
complex world).

> > The extra readahead is somewhat of a bug in UFS (read 2 pages, get
> > a maxcontig chunk (1MB)).
>
> Ouch.

You said it. But people have learned to tune it down when this hits
(tunefs -a, to lower maxcontig), which is not that often.

> > ZFS is new; conventional wisdom may or may not apply.
>
> This (zfs-discuss) is the place where we can be enlightened :-)
>
> Tao

-r

____________________________________________________________________________________
Roch Bourbonnais                 Sun Microsystems, Icnc-Grenoble
Senior Performance Analyst       180, Avenue De L'Europe, 38330,
                                 Montbonnot Saint Martin, France
Performance & Availability Engineering
http://icncweb.france/~rbourbon          http://blogs.sun.com/roller/page/roch
[EMAIL PROTECTED]                 (+33).4.76.18.83.20
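P.S. For anyone hitting the UFS readahead issue described above, the
tuning in question looks something like this (device name illustrative;
maxcontig is counted in filesystem blocks, so with 8K blocks a value of
16 caps readahead clusters at 128K instead of 1MB):

    # tunefs -a 16 /dev/rdsk/c0t0d0s6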