Am 12.02.2025 um 16:05 hat Paolo Bonzini geschrieben:
> On 2/11/25 22:43, Kevin Wolf wrote:
> > +/// A request to a block driver
> > +pub enum Request {
> > +    Read { offset: u64, len: u64 },
> > +}
> > +
> 
> Maybe add flags already?
> > +#[allow(dead_code)]
> > +pub enum MappingTarget {
> > +    /// The described blocks are unallocated. Reading from them yields 
> > zeros.
> > +    Unmapped,
> > +
> > +    /// The described blocks are stored in a child node.
> > +    Data {
> > +        /// Child node in which the data is stored
> > +        node: (),
> 
> Make it already a *mut BlockDriverState, or *mut BdrvChild?  Or are you 
> worried of
> irritating the borrow checker? :)

Mostly I just didn't need it yet and was too lazy to think about the
details.

The right type would probably be Arc<driver::BdrvChild> (which contains
the raw *mut pointer). But I haven't thought much about how this plays
together with graph changes.

> > +        /// Offset in the child node at which the data is stored
> > +        offset: u64,
> > +    },
> > +}
> > +
> > +/// A mapping for a number of contiguous guest blocks
> > +pub struct Mapping {
> > +    /// Offset of the mapped blocks from the perspective of the guest
> > +    pub offset: u64,
> > +    /// Length of the mapping in bytes
> > +    pub len: u64,
> > +    /// Where the data for the described blocks is stored
> > +    pub target: MappingTarget,
> > +}
> > +
> >   /// A trait for writing block drivers.
> >   ///
> >   /// Types that implement this trait can be registered as QEMU block 
> > drivers using the
> > @@ -37,6 +72,11 @@ unsafe fn open(
> >       /// Returns the size of the image in bytes
> >       fn size(&self) -> u64;
> > +
> > +    /// Returns the mapping for the first part of `req`. If the returned 
> > mapping is shorter than
> > +    /// the request, the function can be called again with a shortened 
> > request to get the mapping
> > +    /// for the remaining part.
> > +    async fn map(&self, req: &Request) -> io::Result<Mapping>;
> 
> I am not sure I like the idea of making this the only way to do a read.

We'll clearly need a way for drivers to have explicit functions when the
blocks aren't just mapped to somewhere else, e.g. with compression or
encryption. (My idea for that was that it would be another MappingTarget
branch.)

But using map() as the primary interface is intentional as it enables a
few things, even though this isn't visible in the code yet. Of course,
that basically every block driver duplicates the same loop is a reason
to unify it, but as you say there are other ways to do that.

One thing map() can do in the common case is provide the caller just a
list of mappings to a file descriptor and an offset. This may in the
future allow some block exports like ublk or fuse to use zero copy
operations. You can't do that if the block driver already does an
explicit read into a buffer.

You can cache mappings outside of the block driver and even of QEMU.
Block exports could tell the kernel that it shouldn't even bother with
going to userspace if a request can be completed using a cached mapping.
Two hosts with shared storage could access the same qcow2 image without
one of them sending all data across the network (like we did in KubeSAN
with NBD), but it's enough if metadata (= mappings) are synchronised.

So I think this is a useful interface to have, even if we don't know
which of the possibilities we'll actually make use of. And it should be
the primary interface, because if you want to do caching, then cache
invalidation must be done properly by block drivers, which is more
likely to go wrong if it's just an additional interface that isn't
normally used.

Kevin


Reply via email to