> This is a request for a review. This patch fixes a bug in specfs
> relating to dealing with the EOF condition of a block device.
>
> If the EOF occurs in the middle of a block, specfs was not
> properly calculating the truncation for the I/O.
I didn't have time to review this properly when you first asked.
It is a bit over-commented, and returns wrong error codes for EOF.
> Index: miscfs/specfs/spec_vnops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/miscfs/specfs/spec_vnops.c,v
> retrieving revision 1.108
> diff -u -r1.108 spec_vnops.c
> --- spec_vnops.c 1999/09/17 06:10:26 1.108
> +++ spec_vnops.c 1999/09/20 17:50:48
> @@ -311,19 +311,37 @@
> do {
> bn = btodb(uio->uio_offset) & ~(bscale - 1);
> on = uio->uio_offset % bsize;
> - n = min((unsigned)(bsize - on), uio->uio_resid);
> if (seqcount > 1) {
> nextbn = bn + bscale;
> error = breadn(vp, bn, (int)bsize, &nextbn,
> (int *)&bsize, 1, NOCRED, &bp);
> } else {
> error = bread(vp, bn, (int)bsize, NOCRED, &bp);
> + }
> +
> + /*
> + * Figure out how much of the buffer is valid relative
> + * to our offset into the buffer, which may be negative
> + * if we are beyond the EOF.
> + *
> + * The valid size of the buffer is based on
> + * bp->b_bcount (which may have been truncated by
> + * dscheck or the device) minus bp->b_resid, which
> + * may be indicative of an I/O error if non-zero.
> + */
The short read must have been caused by EOF unless error != 0. Note that
filesystems depend on this, i.e., the following handling in VOP_STRATEGY():
Case 1: do some i/o (possibly null), then hit an i/o error:
Return the amount not done in b_resid, and set B_ERROR to indicate
that i/o was truncated because of an error.
Case 2: do some i/o (possibly null), then hit EOF:
Return the amount not done in b_resid, and don't touch B_ERROR.
Some drivers change b_bcount, but this is dubious, especially
for writes where the data beyond b_bcount is still valid and
something may be using it.
ffs handles these cases as follows:
Case 1:
Throw away the partial block. Depend on VOP_STRATEGY() setting
B_ERROR so that bread()/bwrite()/etc. return nonzero in this
case.
Case 2:
Assume that this case can't happen for metadata (a reasonable
assumption, since filesystems don't use blocks beyond EOF),
and don't check for it. Similarly for VOP_WRITE(). Handle it
in VOP_READ(), although I think it can't happen there either.
specfs (for bdevs) currently handle these cases like ffs. Short counts
are only handled for Case 2.
physio() handles the cases like VOP_STRATEGY(). It passes the distinction
bewtween EOF and error to its caller by returning nonzero only if there
was an error. (Distinguishing the cases is especially important for
write-once devices.) Control eventually reaches the level dofileread(),
etc., where the distinction is screwed up :-( (`error' is only reset to
0 for nonzero short counts if it is ERESTART, EINTR or EWOULDBLOCK).
> + n = bp->b_bcount - on;
> + if (n < 0) {
> + error = EINVAL;
`error' shouldn't be changed here if it is already nonzero.
EINVAL is a wrong value to return for EOF here (but neccessary to give
bug for bug compatibility here). See the comment in my version.
> + } else {
> + n -= bp->b_resid;
> + if (n < 0)
> + error = EIO;
EIO is a wronger value to return for EOF.
Here are my fixes (relative to the old version) for spec_read(). They are
over-commented to temporarily document the buggy EOF return code.
---
diff -c2 spec_vnops.c~ spec_vnops.c
*** spec_vnops.c~ Sat Sep 11 15:54:39 1999
--- spec_vnops.c Wed Sep 22 19:47:52 1999
***************
*** 309,313 ****
bn = btodb(uio->uio_offset) & ~(bscale - 1);
on = uio->uio_offset % bsize;
- n = min((unsigned)(bsize - on), uio->uio_resid);
if (vp->v_lastr + bscale == bn) {
nextbn = bn + bscale;
--- 311,314 ----
***************
*** 317,325 ****
error = bread(vp, bn, (int)bsize, NOCRED, &bp);
vp->v_lastr = bn;
- n = min(n, bsize - bp->b_resid);
if (error) {
brelse(bp);
return (error);
}
error = uiomove((char *)bp->b_data + on, n, uio);
brelse(bp);
--- 318,343 ----
error = bread(vp, bn, (int)bsize, NOCRED, &bp);
vp->v_lastr = bn;
if (error) {
brelse(bp);
return (error);
}
+
+ /*
+ * The amount of valid data in the buffer is
+ * bp->b_bcount - bp->b_resid. If this is smaller
+ * than `on', return EINVAL to give bug for bug
+ * compatibility with physio() and with ourself
+ * (most strategy routines erroneously return EINVAL
+ * for i/o completely beyond EOF). POSIX and POLA
+ * specify returning 0 for reads from all offsets
+ * beyond EOF.
+ */
+ n = bp->b_bcount - bp->b_resid - on;
+ if (n < 0) {
+ brelse(bp);
+ return (EINVAL);
+ }
+
+ n = imin(n, uio->uio_resid);
error = uiomove((char *)bp->b_data + on, n, uio);
brelse(bp);
---
I was also concerned about what happens when buffers with a short
b_bcount or a nonzero b_resid are left in the cache. It seems that
nothing fatal happens, but they just get in the way. I observed
the following farcial handling for rereading block 4 on /dev/vn0a
where /dev/vn0a has size 6:
spec_read(): call bread()
bread(): call getblk()
getblk(): call gbincore()
gb_incore(): returns the buffer successfully
getblk(): The buffer has the wrong size, so it is thrown away.
Do a VOP_WRITE() as part of throwing it away, although
the buffer is not dirty! :-(
Return an empty buffer.
bread(): do a VOP_STRATEGY() to read the buffer again.
Bruce
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message