eric kustarz writes:
 > >
 > > Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
 > > surprised that this is being met with skepticism considering that
 > > Oracle highly recommends direct IO be used,  and, IIRC, Oracle
 > > performance was the main motivation to adding DIO to UFS back in
 > > Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
 > > it's the buffer caching they all employ. So I'm a big fan of seeing
 > > 6429855 come to fruition.
 > 
 > The point is that directI/O typically means two things:
 > 1) concurrent I/O
 > 2) no caching at the file system
 > 

In my blog I also mention:

   3) no readahead (but can be viewed as an implicit consequence of 2)

And someone chimed in with

   4) the ability to do I/O at sector granularity.


I also think that, for many, 2) is too weak a form of what
they expect:

   5) DMA straight from user buffer to disk avoiding a copy.


So
 
   1) Concurrent I/O: we have this in ZFS.

   2) No caching.
      We could do this by taking a directio hint and evicting
      the ARC buffer immediately after the copyout to user space
      for reads, and after txg completion for writes.

   3) No prefetching.
      We have two levels of prefetching. The low level was
      fixed recently and should not cause problems for DB loads.
      The high level still needs fixing on its own. Then we
      should honor the same hint as in 2) to disable prefetching
      altogether. In the meantime we can tune our way into this
      mode (see the /etc/system sketch just after this list).

   4) Sector-sized I/O.
      This is really foreign to the ZFS design.

   5) Zero copy & better CPU efficiency.
      This, I think, is where the debate is.
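
For 3), a sketch of how one could tune prefetching off today via
/etc/system (assuming I have the tunable names right; these are
unsupported knobs and may change):

   * disable file-level (high level) prefetch
   set zfs:zfs_prefetch_disable = 1
   * disable vdev-level (low level) readahead
   set zfs:zfs_vdev_cache_size = 0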



My line has been that 5) won't help latency much, and latency
is where I think the game is currently played. Now the
disconnect might be that people feel the game is not latency
but CPU efficiency: "how many CPU cycles do I burn to get data
from disk to the user buffer?" This is a valid point:
configurations with a very large number of disks can end up
saturated by the filesystem's CPU utilisation.

So I still think that the major areas for ZFS perf gains are
on the latency front: block allocation (now much improved
with the separate intent log), I/O scheduling, and other
fixes to the threading & ARC behavior. But at some point we
can turn our microscope on the CPU efficiency of the
implementation. The copy will certainly be a big chunk of
the CPU cost per I/O, but I would still like to gather that
data.

Also consider: 50 disks at 200 IOPS of 8K each is 80 MB/sec.
That means maybe 1/10th of a single CPU to be saved by
avoiding just the copy. Probably not what people have in
mind. How many CPUs do you have when attaching 1000 drives
to a host running a 100 TB database? That many drives will
barely occupy 2 cores running the copies.
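
Spelling out the arithmetic (the implied assumption being that
one core can copy data at something on the order of 1 GB/sec):

     50 disks * 200 IOPS * 8K =   80 MB/sec to copy  ~ 1/10th of a core
   1000 disks * 200 IOPS * 8K = 1600 MB/sec to copy  ~ 2 cores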

People want performance and efficiency. Directio is just an
overloaded name that delivered those gains to other
filesystems.

Right now, what I think is worth gathering is the cycles spent
in ZFS per read & write in a large DB environment where the DB
holds 90% of memory. For comparison with another FS, we should
disable checksums, file prefetching and vdev prefetching, cap
the ARC, turn atime off, and use an 8K recordsize. A breakdown
and comparison of the CPU cost per layer will be quite
interesting and will point to what needs work.
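
Roughly, with a hypothetical pool/dataset name, and on top of
the prefetch tunables sketched earlier:

   zfs set checksum=off  tank/db
   zfs set atime=off     tank/db
   zfs set recordsize=8k tank/db

plus an ARC cap in /etc/system (the value is just an example,
here 1 GB):

   * cap the ARC at 1 GB
   set zfs:zfs_arc_max = 0x40000000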

Another interesting thing for me would be: what is your
budget?

        "How many cycles per DB read and write are you
        willing to spend, and how did you come to that number?"


But, as Eric says, let's develop 2) and in parallel I'll try to
figure out the per-layer cost breakdown.

-r



 > Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning  
 > on "directI/O".
 > 
 > ZFS *does* 1.  It doesn't do 2 (currently).
 > 
 > That is what we're trying to discuss here.
 > 
 > Where does the win come from with "directI/O"?  Is it 1), 2), or some  
 > combination?  If its a combination, what's the percentage of each  
 > towards the win?
 > 
 > We need to tease 1) and 2) apart to have a full understanding.  I'm  
 > not against adding 2) to ZFS but want more information.  I suppose  
 > i'll just prototype it and find out for myself.
 > 
 > eric

