Now, could we detect the patterns for which holding on to the
cached block is not optimal, and do a quick freebehind after
the copyout?  Something like random access + very large file +
poor cache hit ratio?
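
Purely as illustration, a heuristic along those lines might look
roughly like the sketch below; the struct, fields, and thresholds
are invented for the example, not taken from the ZFS/ARC code.

    #include <stdint.h>

    /* Hypothetical per-file read statistics, for illustration only. */
    typedef struct read_stats {
        uint64_t file_size;     /* bytes */
        uint64_t seq_reads;     /* reads adjacent to the previous offset */
        uint64_t rand_reads;    /* reads that jumped somewhere else */
        uint64_t cache_hits;
        uint64_t cache_misses;
    } read_stats_t;

    static int
    should_freebehind(const read_stats_t *rs, uint64_t physmem)
    {
        int mostly_random = rs->rand_reads > 4 * rs->seq_reads;
        int huge_file     = rs->file_size > physmem;        /* won't fit */
        int poor_hits     = 10 * rs->cache_hits <
                            rs->cache_hits + rs->cache_misses;   /* <10% */

        return (mostly_random && huge_file && poor_hits);
    }

When it returns true, the block could be dropped right after the
copyout instead of staying in the cache.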

We might detect it ... or we could let the application give us
the hint, via the directio ioctl, which for ZFS might mean not
"bypass the cache" but "free cache as soon as possible."

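For reference, the application-side interface on Solaris is
directio(3C); a minimal sketch of how a database might issue the
hint (the path is made up):

    #include <sys/types.h>
    #include <sys/fcntl.h>
    #include <fcntl.h>
    #include <stdio.h>

    int
    open_uncached(const char *path)
    {
        int fd = open(path, O_RDWR);

        /*
         * Ask the filesystem not to cache on our behalf.  On UFS this
         * bypasses the page cache; the suggestion above is that ZFS
         * could instead treat it as "evict the block as soon as the
         * copyout is done" rather than "don't cache at all".
         */
        if (fd >= 0 && directio(fd, DIRECTIO_ON) != 0)
            perror("directio");

        return (fd);
    }
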
(The problem with detecting this situation is that we don't
know future access patterns, and we don't know whether the
application is doing its own caching, in which case any caching
that we do isn't particularly useful ... unless there are
sub-block writes in the future, in which case our cache can be
used to avoid the read-modify-write.)
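
To make the read-modify-write point concrete, here is a simplified
sketch (hypothetical structure and helpers, not real ZFS code) of
why a sub-record write needs the whole record in memory:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical, simplified record -- not a real ZFS structure. */
    typedef struct record {
        char   *data;       /* recsize bytes if cached, NULL otherwise */
        size_t  recsize;    /* e.g. 128 KB */
        int     dirty;
    } record_t;

    /* Stand-in for the actual disk read of the whole record. */
    static void
    read_record(record_t *rec)
    {
        rec->data = malloc(rec->recsize);
        /* ... issue the I/O and fill rec->data ... */
    }

    /*
     * A sub-record write has to produce a whole new record, so if
     * the old record isn't cached it must be read back in first.
     */
    static void
    write_subblock(record_t *rec, const void *buf, size_t off, size_t len)
    {
        if (rec->data == NULL)
            read_record(rec);           /* the read a cache hit avoids */

        memcpy(rec->data + off, buf, len);
        rec->dirty = 1;                 /* full record written out later */
    }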

Now, about avoiding the copy: that would mean DMA straight
into user space?  But if the checksum does not validate the
data, what do we do?  If the storage is not RAID-protected and
we have to return EIO, I don't think we can do that _and_ also
corrupt the user buffer; I'm not sure what POSIX says about
this situation.

Well, direct I/O behaves that way today.  Actually, paged I/O
does as well -- we move one page at a time into user space, so
if we encounter an error while reading a later portion of the
request, the earlier portion of the user buffer will already
have been overwritten.
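
Schematically, the loop looks something like this (a simplified
user-level sketch, not the actual Solaris code; copy_one_chunk is
a hypothetical stand-in for the fill-and-copyout step):

    #include <stddef.h>

    #define CHUNK 4096      /* one page at a time */

    /*
     * Hypothetical: read this chunk from disk and copy it into the
     * user buffer; returns 0 or an errno such as EIO.
     */
    extern int copy_one_chunk(char *ubuf, size_t off, size_t len);

    int
    read_loop(char *ubuf, size_t resid)
    {
        size_t off = 0;

        while (resid > 0) {
            size_t len = resid < CHUNK ? resid : CHUNK;
            int    err = copy_one_chunk(ubuf, off, len);

            if (err != 0)
                return (err);   /* EIO: ubuf[0..off) already modified */

            off += len;
            resid -= len;
        }
        return (0);
    }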

SUSv3 doesn't specify anything about buffer contents in the
event of an error.  (It even leaves the file offset undefined.)

So I think we're safe here.

Now, latency-wise, the cost of the copy is small compared to
the I/O, right?  So it turns into an issue of saving some CPU
cycles.

CPU cycles and memory bandwidth (which both can be in short
supply on a database server).
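
Rough, illustrative numbers (not measurements): for a 128 KB record,

    copy cost  ~  128 KB at ~4 GB/s memcpy bandwidth  =  ~30 microseconds
    disk read  ~  5-10 milliseconds on rotating media

so the copy barely shows up in per-request latency, but each copied
byte is read and written once, so 400 MB/s of sustained reads adds
roughly 800 MB/s of extra memory traffic -- which is noticeable on
a busy database server.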

Anton
