Now, could we detect the pattern that makes holding on to the cached block suboptimal, and do a quick freebehind after the copyout? Something like random access + very large file + poor cache hit ratio?
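Purely to illustrate what I mean, a rough sketch of that heuristic in C -- every structure, field, and threshold here is made up, nothing like this exists in ZFS today:

#include <stdint.h>

/*
 * Hypothetical sketch: after the copyout, decide whether holding the
 * block in the cache is likely to pay off, and free it behind us if not.
 */
typedef struct file_cache_stats {
	uint64_t fcs_file_size;		/* bytes */
	uint64_t fcs_cache_hits;
	uint64_t fcs_cache_misses;
	int	 fcs_sequential;	/* nonzero if access looks sequential */
} file_cache_stats_t;

#define	FREEBEHIND_MIN_SIZE	(1ULL << 30)	/* "very large file": arbitrary 1 GB */
#define	FREEBEHIND_MAX_HIT_PCT	10		/* "poor hit ratio": arbitrary 10% */

static int
should_freebehind(const file_cache_stats_t *fcs)
{
	uint64_t accesses = fcs->fcs_cache_hits + fcs->fcs_cache_misses;

	if (fcs->fcs_sequential || fcs->fcs_file_size < FREEBEHIND_MIN_SIZE)
		return (0);
	if (accesses == 0)
		return (0);
	return ((fcs->fcs_cache_hits * 100 / accesses) < FREEBEHIND_MAX_HIT_PCT);
}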
We might detect it ... or we could let the application give us the hint, via the directio ioctl, which for ZFS might mean not "bypass the cache" but "free cache as soon as possible." (The problem with detecting this situation is that we don't know future access patterns, and we don't know whether the application is doing its own caching, in which case any caching that we do isn't particularly useful ... unless there are sub-block writes in the future, in which case our cache can be used to avoid the read-modify-write.)
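For reference, this is how an application already gives that hint on Solaris today, through the directio(3C) wrapper around the ioctl. The path and the error handling below are just illustrative; whether ZFS would interpret DIRECTIO_ON as "free the cache as soon as possible" rather than "bypass the cache" is exactly the open question:

#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdio.h>

int
main(void)
{
	int fd = open("/tank/db/datafile", O_RDONLY);	/* illustrative path */

	if (fd < 0) {
		perror("open");
		return (1);
	}

	/*
	 * Advise the filesystem that the application caches its own data.
	 * On UFS this bypasses the page cache; the suggestion above is that
	 * ZFS could instead treat it as "copy out, then drop the cached
	 * block as soon as possible."
	 */
	if (directio(fd, DIRECTIO_ON) != 0)
		perror("directio");

	/* ... application reads into its own buffer pool ... */
	return (0);
}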
Now, about avoiding the copy: that would mean DMA straight into user space? But if the checksum does not validate the data, what do we do? If the storage is not RAID-protected and we have to return EIO, I don't think we can do that _and_ corrupt the user buffer as well; I'm not sure what POSIX says about this situation.
Well, direct I/O behaves that way today. Actually, paged I/O does as well -- we move one page at a time into user space, so if we encounter an error while reading a later portion of the request, the earlier portion of the user buffer will already have been overwritten. SUSv3 doesn't specify anything about buffer contents in the event of an error. (It even leaves the file offset undefined.) So I think we're safe here.
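A user-space analogue of the same semantics, just to make the point concrete (the helper and the chunk size are invented for illustration; this is not kernel code): when a request is satisfied one piece at a time, a failure on a later piece leaves the earlier pieces already copied into the caller's buffer.

#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define	CHUNK	4096	/* stand-in for PAGESIZE */

/*
 * Read 'len' bytes at the current offset into 'buf', one chunk at a
 * time.  If a later chunk fails, the earlier chunks have already been
 * written into 'buf' -- the partial-overwrite-on-error behavior
 * described above, which SUSv3 leaves unspecified.
 */
static ssize_t
chunked_read(int fd, char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t want = (len - done < CHUNK) ? len - done : CHUNK;
		ssize_t got = read(fd, buf + done, want);

		if (got < 0)
			return (-1);	/* buf[0..done) already modified */
		if (got == 0)
			break;		/* EOF */
		done += (size_t)got;
	}
	return ((ssize_t)done);
}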
Now, latency-wise, the cost of the copy is small compared to the I/O, right? So it turns into an issue of saving some CPU cycles.
CPU cycles and memory bandwidth (which both can be in short supply on a database server).

Anton