bio re-ordering

2022-01-28 Thread peterj
I'm working on a GEOM Gate network client to better handle high-latency
connections and have some questions regarding bio ordering assumptions
(alternatively, how much should I be able to re-order bio requests without
breaking things).  Within geom_gate, an incoming bio request is retrieved
from the kernel using a G_GATE_CMD_START ioctl, processed in userland
(typically by forwarding it to a remote system) and then returned via a
G_GATE_CMD_DONE ioctl.  My GEOM Gate client can reorder requests quite
aggressively and I suspect it's breaking some kernel assumptions regarding
bio behaviour.  The following questions assume that BIO_READ, BIO_WRITE and
BIO_FLUSH are valid but BIO_DELETE isn't supported.

a) In the absence of BIO_FLUSH operations, what (if any) are the limits on
   reordering operations?  Given a block that initially contains A, followed
   by a write B, read and write C, is there any constraint on which content
   the read returns?

b) Are individual BIO_READ and BIO_WRITE operations expected to be atomic
   with respect to other BIO_WRITE operations?  Given 2 adjacent blocks that
   initially contain AB, and successive write CD, read and write EF
   operations to those blocks, is it expected that the read would return CD
   (or maybe AD or EF, assuming that's valid from the previous question) or
   could the write operations partially complete in different orders,
   resulting in something like AD, CF, EB etc?

c) I assume that a BIO_FLUSH should not return DONE until all preceding
   write operations have completed.  Is it required that write
   operations issued after the BIO_FLUSH must not complete before the
   BIO_FLUSH completes?

-- 
Peter Jeremy




Re: bio re-ordering

2022-01-28 Thread Konstantin Belousov
On Sat, Jan 29, 2022 at 03:29:39PM +1100, pet...@freebsd.org wrote:
> I'm working on a GEOM Gate network client to better handle high-latency
> connections and have some questions regarding bio ordering assumptions
> (alternatively, how much should I be able to re-order bio requests without
> breaking things).  Within geom_gate, an incoming bio request is retrieved
> from the kernel using a G_GATE_CMD_START ioctl, processed in userland
> (typically by forwarding it to a remote system) and then returned via a
> G_GATE_CMD_DONE ioctl.  My GEOM Gate client can reorder requests quite
> aggressively and I suspect it's breaking some kernel assumptions regarding
> bio behaviour.  The following questions assume that BIO_READ, BIO_WRITE and
> BIO_FLUSH are valid but BIO_DELETE isn't supported.
> 
> a) In the absence of BIO_FLUSH operations, what (if any) are the limits on
>    reordering operations?  Given a block that initially contains A, followed
>    by a write B, read and write C, is there any constraint on which content
>    the read returns?
There are no limits.  Either other software entities, or hardware itself,
can process requests in arbitrary order.  This is why things are typically
done in the completion handler, and part of the reason why the complexity
of UFS SU exists.
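
To make the "no limits" answer concrete, here is a minimal userland sketch in C. It is a toy model, not kernel code: `struct op` and `run_in_order()` are hypothetical names invented for this illustration and have nothing to do with `struct bio`. It applies the three requests from the question to a single block in whatever order a reordering layer might choose, and records what the read observes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request type for this sketch; not the kernel's struct bio. */
enum op_kind { OP_READ, OP_WRITE };

struct op {
	enum op_kind kind;
	char value;	/* payload for writes, observed content for reads */
};

/*
 * Apply the ops to a single block in the order given by 'order'
 * (a permutation of 0..n-1), as a reordering layer might.
 * Reads store what they saw in op->value; the final block content
 * is returned.
 */
static char
run_in_order(char initial, struct op *ops, const size_t *order, size_t n)
{
	char block = initial;

	for (size_t i = 0; i < n; i++) {
		assert(order[i] < n);
		struct op *o = &ops[order[i]];

		if (o->kind == OP_WRITE)
			block = o->value;
		else
			o->value = block;
	}
	return (block);
}
```

With the ops issued as write B, read, write C and processed in the order {1, 2, 0}, the read sees A and the block ends up holding B. Other orders make the read return B or C, so a consumer that needs one specific outcome has to wait for the earlier request's completion before issuing the next.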

> 
> b) Are individual BIO_READ and BIO_WRITE operations expected to be atomic
>    with respect to other BIO_WRITE operations?  Given 2 adjacent blocks that
>    initially contain AB, and successive write CD, read and write EF
>    operations to those blocks, is it expected that the read would return CD
>    (or maybe AD or EF, assuming that's valid from the previous question) or
>    could the write operations partially complete in different orders,
>    resulting in something like AD, CF, EB etc?
No.  At the very least, underlying entities can split a request into several,
each of which is ordered individually.  Typically, it is higher-level
code that ensures that there are no concurrent modifications of the same
block.  For instance, we exclusively lock vnodes and buffers around 
metadata updates.  Similarly, we lock buffers until the data is written
to the device.
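
The effect of splitting can be sketched in userland C as follows. This is a toy model under stated assumptions: `struct subwrite` and `apply_subwrites()` are hypothetical names for this illustration, and each two-block write is assumed to be split into two single-block sub-requests that may be applied in any order.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 2

/* Hypothetical single-block sub-request produced by splitting a larger write. */
struct subwrite {
	size_t blkno;	/* target block index */
	char value;	/* content written to that block */
};

/*
 * Apply the first 'count' sub-writes listed in 'order' to 'disk'.
 * Whatever 'disk' holds afterwards is what a read issued at that
 * point could observe.
 */
static void
apply_subwrites(char disk[NBLOCKS], const struct subwrite *subs,
    const size_t *order, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		const struct subwrite *s = &subs[order[i]];

		assert(s->blkno < NBLOCKS);
		disk[s->blkno] = s->value;
	}
}
```

Starting from AB with write CD split into C and D, and write EF split into E and F: applying only D before the read gives AD, applying C then F gives CF, and applying only E gives EB. Nothing at this layer rules out such torn results, which is why the locking mentioned above is done by the higher-level code.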

> 
> c) I assume that a BIO_FLUSH should not return DONE until all preceding
>    write operations have completed.  Is it required that write
>    operations issued after the BIO_FLUSH must not complete before the
>    BIO_FLUSH completes?
UFS SU relies on BIO_FLUSH being a full barrier.
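
One conservative way a reordering client could honour a full barrier is sketched below in userland C. It is an illustration only, not how geom_gate or any existing client does it: `struct barrier_state`, `can_dispatch()`, `req_dispatch()` and `req_complete()` are hypothetical names, and the sketch assumes the caller considers requests in arrival order and stops at the first one that cannot be dispatched, so nothing overtakes a waiting flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds, loosely mirroring BIO_READ/BIO_WRITE/BIO_FLUSH. */
enum req_kind { REQ_READ, REQ_WRITE, REQ_FLUSH };

struct barrier_state {
	size_t inflight;	/* dispatched but not yet completed */
	bool flush_active;	/* a flush is dispatched and not yet completed */
};

/* May the request at the head of the arrival queue be dispatched now? */
static bool
can_dispatch(const struct barrier_state *st, enum req_kind kind)
{
	if (st->flush_active)
		return (false);	/* nothing passes an active flush */
	if (kind == REQ_FLUSH)
		return (st->inflight == 0);	/* wait for all earlier requests */
	return (true);	/* reads and writes may otherwise run freely */
}

/* Record that a request has been sent to the backend. */
static void
req_dispatch(struct barrier_state *st, enum req_kind kind)
{
	assert(can_dispatch(st, kind));
	st->inflight++;
	if (kind == REQ_FLUSH)
		st->flush_active = true;
}

/* Record that the backend has completed a request. */
static void
req_complete(struct barrier_state *st, enum req_kind kind)
{
	assert(st->inflight > 0);
	st->inflight--;
	if (kind == REQ_FLUSH)
		st->flush_active = false;
}
```

Between flushes the client remains free to reorder reads and writes as aggressively as it likes; only the flush itself forces a drain. Holding back reads as well as writes is the conservative reading of "full barrier" and may be stricter than strictly necessary.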