Hi Paolo,

Sorry for the late reply and thanks for the review. See my comments inline:
On 02/11/2016 01:17 PM, Paolo Bonzini wrote:
>
> On 16/12/2015 17:55, Alex Pyrgiotis wrote:
>> +/*
>> + * Create a QEMUIOVector from a scatter-gather list.
>> + *
>> + * This function does not copy the data of the scatter-gather list.
>> + * Instead, it uses the dma_memory_map() function to map physical
>> + * memory regions of the virtual device (as interpreted by the guest
>> + * kernel) into the address space of the QEMU process, in order to
>> + * have access to the data.
>> + */
>> +static void dma_map_sg(DMAAIOCB *dbs)
>
> In special cases where the QEMUSGList includes MMIO regions, dma_map_sg
> might not be able to map the whole list. In this case, for regular I/O
> it is possible to break the operation in multiple steps---in fact, this
> breaking of requests is the main purpose of most of the code in
> dma-helpers.c.

You're right. I had even written a comment about that, yet somehow the
implications of rescheduling a DMA operation flew over my head. Great.

So, I've tried to grasp the current situation and this is what I've got
so far. Please correct me where I'm wrong.

Guest kernel space:

1. The guest kernel wants to send a SCSI request to a (para)virtual
   SCSI controller.
2. This SCSI request involves a list of physical address ranges.
   Whether the request refers to a single contiguous physical range or
   to a scatter-gather list depends on the implementation of the
   kernel's low-level device driver (LLDD) and the capabilities of the
   SCSI controller.
3. Some of these ranges may include MMIO regions. Since I'm not aware
   of a scenario where this happens, can you provide an example?
4. The LLDD writes the SCSI CDB to the SCSI controller's queue and
   notifies the controller by writing to some sort of doorbell
   register.

QEMU/hardware space:

5. The SCSI controller code creates a QEMUSGList that points to the
   memory regions of the SCSI request. This QEMUSGList may also
   include the MMIO regions.
6. The QEMU device implementation, e.g.
   scsi-block, chooses to use the dma_* interface.
7. The dma_blk_read/write() code ultimately attempts to map all the
   memory regions pointed to by the QEMUSGList in order to create a
   QEMUIOVector.
8. At some point during the mapping loop, the code encounters an MMIO
   region. Since reading from and writing to an MMIO region requires
   special handling, e.g. calling MemoryRegion->ops->write(), we
   cannot include it in our read/write system call to the host kernel.
9. This leads to a partial read/write, and the mapping loop resumes
   once the partial read/write() has finished.

Are we on the same page so far? If so, let's dig into the
dma_blk_read/write() logic. The following is simplified pseudocode:

    allocate QEMUIOVector

    callback:
        # Undo any changes to the QEMUIOVector from previous operations
        for address in QEMUIOVector:
            unmap address
        reset QEMUIOVector

        # Check if the previous I/O operation was the last one and, if
        # so, call the completion callback of the user and return.
        if operation is completed:
            dma_complete()
            return

        offset = adjust read/write offset

        # Map each sg of the QEMUSGList and add it to the QEMUIOVector
        for sg in QEMUSGList:
            address = dma_memory_map(sg)
            if address == NULL:
                break
            add address to QEMUIOVector

        # When either of the following operations has finished, restart
        # this DMA operation.
        if nothing mapped:
            # If we can't map anything, we need to retry
            create reschedule_dma callback
            cpu_register_map_client(callback)
        else:
            # Else, we can perform a partial read/write
            do readv/writev(offset, QEMUIOVector, callback)

Is the above OK? If so, I have some questions:

a) Is an MMIO region one of the reasons why we can't map an sg?
b) At which point will the relevant ops->write() method for the MMIO
   region be called when we have to DMA into the region? Is it handled
   implicitly in dma_memory_map()?
c) I'm not quite sure about the logic of the "nothing mapped" section.
Correct me if I'm wrong, but what I think it does is register a
callback (reschedule_dma) to be run once some sort of mapping has
completed. What kind of mapping is this? Is there anything more to it?

> However, it is not possible to do the same for ioctls. This is actually
> the reason why no one has ever tried to make scsi-generic do anything
> but bounce-buffering. I think that your code breaks horribly in this
> case, and I don't see a way to fix it, except for reverting to bounce
> buffering.
>
> This would require major changes in your patches, and I'm not sure
> whether they are worth it for the single use case of tape devices...

Well, I wouldn't narrow it down to tape devices. The scsi-generic
interface is the universal interface for all SCSI devices and the only
interface that is fully passthrough, so this patch would effectively
boost the baseline performance of all SCSI devices. I think it's worth
a try.

Thanks,
Alex