On 10/11/2011 11:56 AM, Gleb Natapov wrote:
> On Tue, Oct 11, 2011 at 11:49:16AM +0200, Avi Kivity wrote:
> > >Whatever we do, the interface will never be as fast as DMA. We will always have to do sanity / permission checks for every IO operation, can batch up only so many IO requests and in QEMU again have to call our callbacks in a loop.
> >
> > We can batch per page, which makes the overhead negligible.
> >
> Current code batch userspace exit per 1024 bytes IIRC and changing it to page didn't show significant improvement (also IIRC). But after io data is copied into the kernel emulator process it byte by byte. Possible optimization, which I didn't tried, is to check that destination memory is not mmio and write back the whole buffer if it is the case.
All the permission checks, segment checks, register_address_increment, page table walking, can be done per page. Right now they are done per byte.
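To illustrate the difference, here is a rough standalone sketch, not the actual emulator code. The toy_guest struct, toy_check_write() and the two copy functions are hypothetical names made up for this example; toy_check_write() only stands in for the real permission checks, segment checks and page table walk, and counts how often it runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NPAGES    4u

/* Hypothetical toy guest: flat memory plus one writable flag per page. */
struct toy_guest {
    uint8_t mem[NPAGES * PAGE_SIZE];
    int writable[NPAGES];
    unsigned long checks;          /* number of times the checks ran */
};

/* Stand-in for segment/permission checks and the page table walk. */
static int toy_check_write(struct toy_guest *g, size_t addr)
{
    g->checks++;
    if (addr >= sizeof(g->mem))
        return 0;
    return g->writable[addr / PAGE_SIZE];
}

/* Per-byte variant: every byte pays for a full check. */
static size_t copy_per_byte(struct toy_guest *g, size_t addr,
                            const uint8_t *src, size_t len)
{
    size_t done;

    assert(src != NULL);
    for (done = 0; done < len; done++) {
        if (!toy_check_write(g, addr + done))
            break;
        g->mem[addr + done] = src[done];
    }
    return done;                   /* bytes actually written */
}

/* Per-page variant: one check per page touched, then a bulk copy. */
static size_t copy_per_page(struct toy_guest *g, size_t addr,
                            const uint8_t *src, size_t len)
{
    size_t done = 0;

    assert(src != NULL);
    while (done < len) {
        size_t cur = addr + done;
        /* Bytes left until the end of the current page. */
        size_t chunk = PAGE_SIZE - (cur % PAGE_SIZE);

        if (chunk > len - done)
            chunk = len - done;
        if (!toy_check_write(g, cur))
            break;
        memcpy(&g->mem[cur], src + done, chunk);
        done += chunk;
    }
    return done;                   /* bytes actually written */
}
```

In this toy model a 1024-byte write that straddles a page boundary runs the check 1024 times in the per-byte variant and twice in the per-page variant. The real emulator has more state to track (register updates, faults in the middle of the operation), so take this only as a picture of where the checks move.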
btw Intel also made this optimization: current processors copy complete cache lines instead of bytes, so they probably also do the checks just once.
-- error compiling committee.c: too many arguments to function