On Fri, 25 Oct 2019 08:42:25 +0200 Willy Tarreau <w...@1wt.eu> wrote:
> Hi Andy,
>
> On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote:
> > Hi all-
> >
> > Supporting iopl() in the Linux kernel is becoming a maintainability
> > problem. As far as I know, DPDK is the only major modern user of
> > iopl().
> >
> > After doing some research, DPDK uses direct io port access for only a
> > single purpose: accessing legacy virtio configuration structures.
> > These structures are mapped in IO space in BAR 0 on legacy virtio
> > devices.
> >
> > There are at least three ways you could avoid using iopl(). Here they
> > are in rough order of quality in my opinion:
> (...)
>
> I'm just wondering, why wouldn't we introduce a sys_ioport() syscall
> to perform I/Os in the kernel without having to play at all with iopl()/
> ioperm() ? That would alleviate the need for these large port maps.
> Applications that use outb/inb() usually don't need extreme speeds.
> Each time I had to use them, it was to access a watchdog, a sensor, a
> fan, control a front panel LED, or read/write to NVRAM. Some userland
> drivers possibly don't need much more, and very likely run with
> privileges turned on all the time, so replacing their inb()/outb() calls
> would mostly be a matter of redefining them using a macro to use the
> syscall instead.
>
> I'd see an API more or less like this :
>
>   int ioport(int op, u16 port, long val, long *ret);
>
> <op> would take values such as INB,INW,INL to fill *<ret>, OUTB,OUTW,OUL
> to read from <val>, possibly ORB,ORW,ORL to read, or with <val>, write
> back and return previous value to <ret>, ANDB/W/L, XORB/W/L to do the
> same with and/xor, and maybe a TEST operation to just validate support
> at start time and replace ioperm/iopl so that subsequent calls do not
> need to check for errors. Applications could then replace :
>
>   ioperm() with ioport(TEST,port,0,0)
>   iopl()   with ioport(TEST,0,0,0)
>   outb()   with ioport(OUTB,port,val,0)
>   inb()    with ({ char val;ioport(INB,port,0,&val);val;})
>
> ... and so on.
>
> And then ioperm/iopl can easily be dropped.
>
> Maybe I'm overlooking something ?
> Willy

DPDK does not want to make system calls; they kill performance. With pure
user-mode access it can reach more than 10 million packets/sec; with a
system call per packet, that drops to 1 million packets/sec.

Also, adding new system calls might help in the long term, but users are
often on kernels that are at least 5 years behind upstream.