On Tue, Oct 11, 2011 at 11:59:45AM +0200, Avi Kivity wrote:
> On 10/11/2011 11:56 AM, Gleb Natapov wrote:
> >On Tue, Oct 11, 2011 at 11:49:16AM +0200, Avi Kivity wrote:
> >> >Whatever we do, the interface will never be as fast as DMA. We will
> >> always have to do sanity / permission checks for every IO operation, can
> >> batch up only so many IO requests and in QEMU again have to call our
> >> callbacks in a loop.
> >>
> >> We can batch per page, which makes the overhead negligible.
> >>
> >Current code batch userspace exit per 1024 bytes IIRC and changing it to
> >page didn't show significant improvement (also IIRC). But after io data
> >is copied into the kernel emulator process it byte by byte. Possible
> >optimization, which I didn't tried, is to check that destination memory is
> >not mmio and write back the whole buffer if it is the case.
> >
>
> All the permission checks, segment checks,
> register_address_increment, page table walking, can be done per
> page. Right now they are done per byte.
>
The permission check result is cached in ctxt->perm_ok. I see that the
current code checks it after several function calls, but this was not
the case before. All the others are currently done on each iteration.
By writing back the whole buffer at once we eliminate the others too.
It would be interesting to see how much that improves the situation.

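To make the per-byte vs. per-page difference concrete, here is a rough
userspace sketch. It is not the emulator code: struct sketch_ctxt,
sketch_check(), writeback_per_byte() and writeback_per_page() are made-up
names, sketch_check() merely stands in for the permission/segment checks
and the page table walk, and guest memory is assumed to be a flat,
non-mmio buffer. It only counts how often the checks run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical stand-in for emulator state; not the real KVM structure. */
struct sketch_ctxt {
	uint8_t *guest_mem;	/* flat guest RAM, assumed not to be mmio */
	size_t guest_size;
	unsigned long checks;	/* number of times the checks were run */
};

/* Stand-in for permission/segment checks and the page table walk. */
static int sketch_check(struct sketch_ctxt *c, size_t gva, size_t len)
{
	c->checks++;
	if (len > c->guest_size || gva > c->guest_size - len)
		return -1;
	return 0;
}

/* Roughly the behaviour described above: one full check per byte. */
static int writeback_per_byte(struct sketch_ctxt *c, size_t gva,
			      const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (sketch_check(c, gva + i, 1))
			return -1;
		c->guest_mem[gva + i] = buf[i];
	}
	return 0;
}

/* The proposed variant: check once per page, then copy the whole chunk. */
static int writeback_per_page(struct sketch_ctxt *c, size_t gva,
			      const uint8_t *buf, size_t len)
{
	while (len) {
		/* Bytes left until the next page boundary. */
		size_t chunk = PAGE_SIZE - (gva % PAGE_SIZE);

		if (chunk > len)
			chunk = len;
		assert(chunk > 0 && chunk <= PAGE_SIZE);

		if (sketch_check(c, gva, chunk))
			return -1;
		memcpy(c->guest_mem + gva, buf, chunk);

		gva += chunk;
		buf += chunk;
		len -= chunk;
	}
	return 0;
}
```

For a 1024 byte buffer that straddles one page boundary the first
variant runs the checks 1024 times and the second one twice. A real
implementation would also have to handle an mmio destination and the
direction flag, both of which this sketch ignores.
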
> btw Intel also made this optimization, current processors copy
> complete cache lines instead of bytes, so they probably also do the
> checks just once.
>
> --
> error compiling committee.c: too many arguments to function

--
			Gleb.