> On Oct 28, 2019, at 10:43 AM, Stephen Hemminger <step...@networkplumber.org> wrote:
>
> On Fri, 25 Oct 2019 08:42:25 +0200
> Willy Tarreau <w...@1wt.eu> wrote:
>
>> Hi Andy,
>>
>>> On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote:
>>> Hi all-
>>>
>>> Supporting iopl() in the Linux kernel is becoming a maintainability
>>> problem. As far as I know, DPDK is the only major modern user of
>>> iopl().
>>>
>>> After doing some research, DPDK uses direct io port access for only a
>>> single purpose: accessing legacy virtio configuration structures.
>>> These structures are mapped in IO space in BAR 0 on legacy virtio
>>> devices.
>>>
>>> There are at least three ways you could avoid using iopl(). Here they
>>> are in rough order of quality in my opinion:
>> (...)
>>
>> I'm just wondering, why wouldn't we introduce a sys_ioport() syscall
>> to perform I/Os in the kernel without having to play at all with iopl()/
>> ioperm() ? That would alleviate the need for these large port maps.
>> Applications that use outb/inb() usually don't need extreme speeds.
>> Each time I had to use them, it was to access a watchdog, a sensor, a
>> fan, control a front panel LED, or read/write to NVRAM. Some userland
>> drivers possibly don't need much more, and very likely run with
>> privileges turned on all the time, so replacing their inb()/outb() calls
>> would mostly be a matter of redefining them using a macro to use the
>> syscall instead.
>>
>> I'd see an API more or less like this :
>>
>>   int ioport(int op, u16 port, long val, long *ret);
>>
>> <op> would take values such as INB,INW,INL to fill *<ret>, OUTB,OUTW,OUTL
>> to read from <val>, possibly ORB,ORW,ORL to read, or with <val>, write
>> back and return previous value to <ret>, ANDB/W/L, XORB/W/L to do the
>> same with and/xor, and maybe a TEST operation to just validate support
>> at start time and replace ioperm/iopl so that subsequent calls do not
>> need to check for errors. Applications could then replace :
>>
>>   ioperm() with ioport(TEST,port,0,0)
>>   iopl()   with ioport(TEST,0,0,0)
>>   outb()   with ioport(OUTB,port,val,0)
>>   inb()    with ({ char val;ioport(INB,port,0,&val);val;})
>>
>> ... and so on.
>>
>> And then ioperm/iopl can easily be dropped.
>>
>> Maybe I'm overlooking something ?
>> Willy
>
> DPDK does not want to make system calls. It kills performance.
> With pure user mode access it can reach > 10 Million Packets/sec
> with a system call per packet that drops to 1 Million Packets/sec.

If you are getting 10 MPPS with an OUT per packet, I’ll buy you a whole
case of beer.

I’m suggesting that, on virtio-legacy, you benchmark the performance hit
of using a syscall to ring the doorbell. Right now, you're doing an OUT
instruction that traps to the hypervisor, probably gets emulated, and
goes out to whatever host-side driver is running. The cost of doing that
is going to be quite high, especially on older machines. I'm guessing
that adding a syscall to the mix won't make much difference.

--Andy
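
A rough way to put a number on the syscall half of that comparison is
sketched below. It only times the round trip of a trivial syscall; it
does not issue an OUT or touch a virtio device. The helper names
(now_ns, syscall_cost_ns) are made up for illustration, and getppid is
an assumed stand-in for whatever doorbell syscall would actually be
used, since no ioport() syscall exists today.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average cost, in nanoseconds, of one trivial syscall round trip.
 *
 * Assumption: getppid does almost no work in the kernel, so its cost
 * approximates the pure entry/exit overhead that a doorbell syscall
 * would add on top of the OUT it performs.
 */
static double syscall_cost_ns(unsigned long iterations)
{
    uint64_t start, end;
    unsigned long i;

    assert(iterations > 0);

    start = now_ns();
    for (i = 0; i < iterations; i++)
        syscall(SYS_getppid);   /* raw syscall, no libc-side caching */
    end = now_ns();

    return (double)(end - start) / (double)iterations;
}
```

Calling syscall_cost_ns(1000000) from a small main() and printing the
result gives the per-call overhead on the machine under test. The
interesting comparison is between that figure and the measured cost of
the trapping OUT inside the guest, which this sketch does not measure.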