eric kustarz writes:
> >
> > Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
> > surprised that this is being met with skepticism considering that
> > Oracle highly recommends direct IO be used, and, IIRC, Oracle
> > performance was the main motivation to adding DIO to UFS back in
> > Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
> > it's the buffer caching they all employ. So I'm a big fan of seeing
> > 6429855 come to fruition.
>
> The point is that directI/O typically means two things:
> 1) concurrent I/O
> 2) no caching at the file system
>
In my blog I also mention:

3) no readahead (but that can be viewed as an implicit consequence of 2)).

And someone chimed in with:

4) the ability to do I/O at sector granularity.

I also think that for many, 2) is too weak a form of what they expect:

5) DMA straight from the user buffer to disk, avoiding a copy.

So:

1) Concurrent I/O: we already have this in ZFS.

2) No caching: we could do this by taking a directio hint and evicting the
ARC buffer immediately after the copyout to user space for reads, and after
txg completion for writes.

3) No prefetching: we have two levels of prefetching. The low-level one was
fixed recently and should not cause problems for DB loads. The high-level one
still needs fixing on its own; then we should honor the same hint as in 2) to
disable it altogether. In the meantime we can tune our way into this mode.

4) Sector-sized I/O: this is really foreign to the ZFS design.

5) Zero copy & better CPU efficiency: this, I think, is where the debate is.
(A small sketch of what an application actually asks for here is appended at
the end of this mail.)

My line has been that 5) won't help latency much, and latency is where I
think the game is currently played. Now the disconnect might be that people
feel the game is not latency but CPU efficiency: "how many CPU cycles do I
burn to get data from disk to the user buffer?" This is a valid point;
configurations with a very large number of disks can end up saturated by
filesystem CPU utilisation.

So I still think that the major area for ZFS perf gains is on the latency
front: block allocation (now much improved with the separate intent log),
I/O scheduling, and other fixes to the threading & ARC behavior. But at some
point we can turn our microscope onto the CPU efficiency of the
implementation. The copy will certainly be a big chunk of the CPU cost per
I/O, but I would still like to gather that data.

Also consider: 50 disks at 200 IOPS of 8K each is 80 MB/sec. That means maybe
1/10th of a single CPU to be saved by avoiding just the copy. Probably not
what people have in mind. How many CPUs do you have when attaching 1000
drives to a host running a 100 TB database? That many drives will barely
occupy 2 cores running the copies. (The arithmetic is spelled out in the
second snippet at the end of this mail.)

People want performance and efficiency; directio is just an overloaded name
that delivered those gains to other filesystems.

Right now, what I think is worth gathering is the cycles spent in ZFS per
read and write in a large DB environment where the DB holds 90% of memory.
For comparison with another FS, we should disable checksums, file-level
prefetching and vdev prefetching, cap the ARC, turn atime off, and use an
8K recordsize. A breakdown and comparison of the CPU cost per layer will be
quite interesting and will point to what needs work.

Another interesting thing for me would be: what is your budget? "How many
cycles per DB read and write are you willing to spend, and how did you come
to that number?"

But, as Eric says, let's develop 2) and I'll try in parallel to figure out
the per-layer breakdown of cost.

-r

> Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning
> on "directI/O".
>
> ZFS *does* 1. It doesn't do 2 (currently).
>
> That is what we're trying to discuss here.
>
> Where does the win come from with "directI/O"? Is it 1), 2), or some
> combination? If its a combination, what's the percentage of each
> towards the win?
>
> We need to tease 1) and 2) apart to have a full understanding. I'm
> not against adding 2) to ZFS but want more information. I suppose
> i'll just prototype it and find out for myself.
>
> eric
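
First snippet: as a concrete illustration of what a DB-style application asks
for when it says "directio", here is a minimal user-land sketch. This is not
ZFS code; it shows the directio(3C) advice call that UFS/QFS honor today plus
an aligned buffer, which is roughly what 2), 4) and 5) above amount to from
the application's side. The file name, block size and offset are made-up
example values.

/*
 * Illustration only: the application side of "directio" on Solaris.
 * The directio(3C) advice asks the filesystem to skip its cache (2);
 * the aligned 8K buffer is what makes sector-granular, zero-copy DMA
 * between disk and user space even possible (4 and 5).
 * File name, block size and offset are made-up example values.
 */
#include <sys/types.h>
#include <sys/fcntl.h>		/* directio(), DIRECTIO_ON */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define	DB_BLOCK	8192	/* typical DB block size */

int
main(void)
{
	int fd;
	void *buf;
	ssize_t n;

	fd = open("/dbdata/datafile.dbf", O_RDWR);
	if (fd == -1) {
		perror("open");
		return (1);
	}

	/* Hint: bypass the filesystem cache for this file. */
	if (directio(fd, DIRECTIO_ON) == -1)
		perror("directio");	/* ZFS does not honor this hint today */

	/* Block-aligned buffer, so the I/O could be DMA'd straight in. */
	buf = memalign(DB_BLOCK, DB_BLOCK);
	if (buf == NULL) {
		perror("memalign");
		(void) close(fd);
		return (1);
	}

	/* Aligned 8K read at an aligned offset. */
	n = pread(fd, buf, DB_BLOCK, (off_t)0);
	if (n == -1)
		perror("pread");

	free(buf);
	(void) close(fd);
	return (0);
}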
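
Second snippet: a throwaway back-of-the-envelope calculation behind the
1/10th-of-a-CPU and 2-core figures above. The roughly 1 GB/s of copy
bandwidth per core is an assumed plug number, chosen only because it is
consistent with the fractions quoted in the mail, not a measured value.

/*
 * Back-of-the-envelope check of the copy-cost figures quoted above.
 * ASSUMPTION: ~1 GB/s of bcopy bandwidth per core; a plug number
 * consistent with the mail, not a measurement.
 */
#include <stdio.h>

int
main(void)
{
	double iops_per_disk = 200.0;			/* random IOPS per spindle */
	double io_size = 8.0 * 1024;			/* 8K DB block, in bytes */
	double copy_bw = 1024.0 * 1024 * 1024;		/* assumed copy B/W per core */
	int disks[] = { 50, 1000 };			/* the two configs in the mail */
	int i;

	for (i = 0; i < 2; i++) {
		double bps = disks[i] * iops_per_disk * io_size;
		printf("%4d disks: %5.0f MB/s of copy, about %.2f cores\n",
		    disks[i], bps / (1024 * 1024), bps / copy_bw);
	}
	return (0);
}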