> Basically speaking - there needs to be some sort of strategy for
> bypassing the ARC or even parts of the ARC for applications that
> may need to advise the filesystem of either:
> 1) the delicate nature of imposing additional buffering for their
>    data flow
> 2) already well optimized applications that need more adaptive
>    cache in the application instead of the underlying filesystem or
>    volume manager
This advice can't be sensibly delivered to ZFS via a Direct I/O mechanism. Anton's characterization of Direct I/O as "an optimization which allows data to be transferred directly between user data buffers and disk, without a memory-to-memory copy" is concise and accurate. Trying to intuit advice from this is unlikely to be useful. It would be better to develop a separate mechanism for delivering advice about the application to the filesystem. (fadvise, perhaps?)

A DIO implementation for ZFS is more complicated than it was for UFS, and it adversely impacts well optimized applications. I looked into this late last year when we had a customer who was suffering from too much bcopy overhead. Billm found another workaround instead of bypassing the ARC.

The challenge in implementing DIO for ZFS is dealing with access to the pages mapped by the user application. Since ZFS has to checksum all of its data, the user's pages involved in a direct I/O cannot be written to by another thread while the I/O is in progress. If this policy isn't enforced, it is possible for the data written to or read from disk to differ from its checksum.

In order to protect the user pages while a DIO is in progress, we want support from the VM that isn't presently implemented. To prevent a page from being accessed by another thread, we have to unmap the TLB/PTE entries and lock the page. There's a cost associated with this, as it may be necessary to cross-call other CPUs. Any thread that accesses the locked pages will block. While it's possible to lock pages in the VM today, there isn't a neat set of interfaces the filesystem can use to maintain the integrity of the user's buffers.

Without an experimental prototype to verify the design, it's impossible to say whether the overhead of manipulating the page permissions is more than the cost of bypassing the cache.

What do you see as potential use cases for ZFS Direct I/O? I'm having a hard time imagining a situation in which this would be useful to a customer.
The application would probably have to be single-threaded, and if not, it would have to be pretty careful about how its threads access buffers involved in I/O.

-j

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss