--------
In message <dc23d104-f5f3-4844-8638-4644dc9dd...@samsco.org>, Scott Long writes:

> Why is overloading EIO so bad?  brelse() will call bdirty() when a BIO_WRITE
> command has failed with EIO.  Calling bdirty() has the effect of retrying the
> I/O.  This disregards the fact that disk drivers only return EIO when they’ve
> decided that the I/O cannot be retried.  It has no termination condition for
> the retries, and will endlessly retry I/O in vain; I’ve seen this quite
> frequently.

The really annoying thing about this particular class of errors
is that if we propagated them up to the filesystems, very often
things could be relocated to different blocks and we would avoid the
unnecessary filesystem corruption.

The real fundamental deficiency is that we do not have a way to say "give up
if this bio cannot be completed in X time" which is what people actually want.

That is surprisingly hard to provide.  There are far too many
corner cases for me to enumerate them all, but let me give just one
example:

Imagine you issue a deadlined write to a RAID5 thing.  Three component
writes happen smoothly, but the last two miss the deadline, with
no way to predict how long it will take before they complete
or fail.

* Does the bio write transaction fail ?

* Does the bio write transaction time out ?

* Do you attempt to complete the write to the RAID5 ?

* Where do you store a copy of the data if you do ?

* What happens next time a read happens on this bio's extent ?

Then for an encore, imagine it was a read bio: Three DMAs go smoothly,
two are outstanding and you don't know if/when they will complete/fail.

* If you fail or time out the bio, how do you "taint" the space
  being read into while the two remaining DMAs are outstanding ?

* What if that space is mapped into userland ?

* What if that space is being executed ?

* What if one of the two outstanding DMAs later returns garbage ?

My conclusion, back when I did GEOM, was that the only way to
do something like this sanely is to have a special GEOM do it
for you, one which always allocates a temp-space:

        allocate temp buffer
        if (write)
                copy write data to temp buffer
        issue bio downwards on temp buffer
        if (timeout)
                park temp buffer until biodone
                return (timeout)
        if (read)
                copy temp buffer to read space
        return (ok/error)
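
The outline above can be tried out in user space.  What follows is only a
sketch under stated assumptions, not GEOM code: deadline_io(), park_reap(),
struct tmpbuf, and the toy providers fast_disk() and slow_disk() are all
hypothetical names, the lower layer is modelled as a synchronous callback
that reports how many ticks it took, and "biodone" is modelled by an
explicit call to park_reap().  Unlike the outline, the sketch copies read
data back only when the lower layer reports success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum dl_status { DL_OK, DL_ERROR, DL_TIMEOUT };

/* Hypothetical lower layer: does the transfer on buf, reports elapsed ticks. */
typedef int (*lower_fn)(int is_write, unsigned char *buf, size_t len,
    int *ticks);

/* Temp buffer with a link field so it can be parked after a timeout. */
struct tmpbuf {
	struct tmpbuf *next;
	unsigned char data[];	/* C99 flexible array member */
};

static struct tmpbuf *parked;	/* buffers whose I/O may still be in flight */

/* Model the late biodone: release every parked buffer, return the count. */
static int
park_reap(void)
{
	int n = 0;

	while (parked != NULL) {
		struct tmpbuf *t = parked;

		parked = t->next;
		free(t);
		n++;
	}
	return (n);
}

static enum dl_status
deadline_io(int is_write, unsigned char *data, size_t len, lower_fn lower,
    int deadline)
{
	int error, ticks = 0;
	struct tmpbuf *t;

	assert(lower != NULL);
	t = calloc(1, sizeof(*t) + len);	/* allocate temp buffer */
	if (t == NULL)
		return (DL_ERROR);
	if (is_write)
		memcpy(t->data, data, len);	/* copy write data to temp */
	error = lower(is_write, t->data, len, &ticks);	/* issue downwards */
	if (ticks > deadline) {
		t->next = parked;		/* park until biodone */
		parked = t;
		return (DL_TIMEOUT);		/* caller's buffer untouched */
	}
	if (error == 0 && !is_write)
		memcpy(data, t->data, len);	/* copy temp to read space */
	free(t);
	return (error == 0 ? DL_OK : DL_ERROR);
}

/* Two toy providers for demonstration: one prompt, one far too slow. */
static int
fast_disk(int is_write, unsigned char *buf, size_t len, int *ticks)
{
	if (!is_write)
		memset(buf, 0xAB, len);
	*ticks = 1;
	return (0);
}

static int
slow_disk(int is_write, unsigned char *buf, size_t len, int *ticks)
{
	if (!is_write)
		memset(buf, 0xCD, len);
	*ticks = 100;
	return (0);
}
```

The point of the extra copy is that after a timeout the caller's buffer has
never been handed to the lower layer, so a late or garbage DMA can only land
in the parked temp buffer.  The cost is one allocation and one copy per
request, even when nothing goes wrong.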


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
p...@freebsd.org         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
freebsd-geom@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-geom
To unsubscribe, send any mail to "freebsd-geom-unsubscr...@freebsd.org"
