> I changed Slirp output to use vectored IO to avoid the slowdown from
> memcpy (see the patch for the work in progress, gives a small
> performance improvement). But then I got the idea that using AIO would
> be nice at the outgoing end of the network IO processing. In fact,
> vectored AIO model could even be used for the generic DMA! The benefit
> is that no buffering or copying should be needed.

An interesting idea. However, I don't want to underestimate the difficulty of
implementing this correctly. I suspect that to get real benefits you need to
support zero-copy async operation all the way through. Things get really hairy
if you allow some operations to complete synchronously and some to be
deferred.

I've done async operation for SCSI and USB. The latter is really not pretty,
and the former has some notable warts. A generic IODMA framework needs to make
sure it covers these requirements without making things worse. Hopefully it'll
also help fix the things that are wrong with them.

> For the specific Sparc32 case, unfortunately Lance bus byte swapping
> makes buffering necessary at that stage, unless we can make N vectors
> with just a single byte faster than memcpy + bswap of memory block
> with size N.

We really want to be dealing with largeish blocks. The {ptr,size} vector is 64
or 128 bits per element, so the overhead on blocks < 64 bytes is going to be
really brutal. Also, the time taken to do address translation will be
O(number of vectors).

> Inside Qemu the vectors would use target physical addresses (struct
> qemu_iovec), but at some point the addresses would change to host
> pointers suitable for real AIO.

Phrases like "at some point" worry me :-)

I think it would be good to get a top-down description of what each different
entity (initiating device, host endpoint, bus translation, memory) is
responsible for, and how they all fit together.

I have some ideas, but without more detailed investigation I can't tell if they
will actually work in practice, or if they fit into the code fragments you've
posted. My suspicion is that they don't, as I can't make head or tail of how
your gdma_aiov.diff patch would be used in practice.

Paul
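
For a rough feel of the per-element overhead discussed above, here is a minimal sketch in C. It assumes the vector elements look like POSIX struct iovec (one pointer plus one length); the helper names iov_count and iov_overhead_percent are made up for illustration and are not part of Qemu or of the posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec { void *iov_base; size_t iov_len; } */

/*
 * Hypothetical helper: number of vector elements needed to describe
 * `total` bytes when every element covers at most `block_size` bytes.
 * Address translation has to visit each element, so this is also the
 * number of translations required.
 */
static size_t iov_count(size_t total, size_t block_size)
{
    assert(block_size > 0);
    return (total + block_size - 1) / block_size;   /* round up */
}

/*
 * Hypothetical helper: size of one descriptor as a percentage of the
 * payload it describes. sizeof(struct iovec) is typically 8 bytes on a
 * 32-bit host and 16 bytes on a 64-bit host.
 */
static double iov_overhead_percent(size_t block_size)
{
    assert(block_size > 0);
    return 100.0 * (double)sizeof(struct iovec) / (double)block_size;
}
```

Under these assumptions, a 1500 byte frame split into single-byte elements, as in the Lance byte-swapping case, needs iov_count(1500, 1) = 1500 elements and as many address translations, and each descriptor is many times larger than the byte it points at. Described as one block, the same frame needs a single element.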