On Thu, Oct 11, 2018 at 3:27 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet
> <asmad...@codewreck.org> wrote:
>> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>>> > That's still the tricky part, I'm afraid... Making a separate server
>>> > would have been easy because I could have reused some of my junk for the
>>> > actual connection handling (some rdma helper library I wrote ages
>>> > ago[1]), but if you're going to just embed C code you'll probably want
>>> > something lower level? I've never seen syzkaller use any library call,
>>> > but I'm not even sure I would know how to create a qp without
>>> > libibverbs. Would standard stuff be OK?
>>>
>>> Raw syscalls preferably.
>>> What does 'rxe_cfg start ens3' do on the syscall level? Some netlink?
>>
>> It runs modprobe rdma_rxe (and a bunch of other rdma modules before that),
>> then writes the interface name to /sys/module/rdma_rxe/parameters/add
>> apparently, then checks that it worked.
>> This part could be done in C directly without too much trouble, as
>> long as the proper kernel configuration/modules are available.
>
> Now we are talking!
> We generally assume that all modules are simply compiled into the kernel.
> At least that's what we have on syzbot. If somebody can't compile them in,
> we can suggest adding modprobe to init.
> So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.
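If the setup really reduces to one sysfs write, that step can be done with raw syscalls only, as preferred in this thread. A minimal sketch, assuming rdma_rxe is built into the kernel; the helper name `rxe_add_interface` is hypothetical, and the parameter path is passed in (normally /sys/module/rdma_rxe/parameters/add) so the function can also be exercised against an ordinary file. It returns 0 on success or a negative errno, so "not available" (ENOENT) and "rejected by the kernel" (for example EINVAL when the named interface does not exist) are plain error codes rather than something to debug.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask rdma_rxe to bind to an existing network
 * interface by writing its name to the module's "add" parameter,
 * e.g. /sys/module/rdma_rxe/parameters/add.
 * Returns 0 on success, or -errno on failure.
 */
static int rxe_add_interface(const char *param_path, const char *ifname)
{
    int fd = open(param_path, O_WRONLY);
    if (fd < 0)
        return -errno; /* e.g. -ENOENT if rdma_rxe is not available */

    size_t len = strlen(ifname);
    /* Write the bare name with no trailing newline, like `echo -n`. */
    ssize_t n = write(fd, ifname, len);

    /* Capture errno before close() can overwrite it. */
    int err = 0;
    if (n < 0)
        err = -errno; /* e.g. -EINVAL if the kernel rejects the name */
    else if ((size_t)n != len)
        err = -EIO;   /* short write: treat as failure */

    close(fd);
    return err;
}
```

Usage would be `rxe_add_interface("/sys/module/rdma_rxe/parameters/add", "eth0")`, where `eth0` stands in for whatever interface actually exists in the test VM.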
This fails for me:

root@syzkaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add
[20992.905406] rdma_rxe: interface syz1 not found
bash: echo: write error: Invalid argument

>>> Any libraries and utilities are a hell of a pain in the Linux world. Will
>>> it work in Android userspace? gVisor? Who will explain to all syzkaller
>>> users where to get this for their who-knows-what distro, which is 10
>>> years old because of corp policies, and debug why their version of the
>>> library is slightly incompatible?
>>> For example, after figuring out that rxe_cfg actually comes from
>>> rdma-core (which is a separate delight on Linux), my Debian
>>> distribution failed to install it because of some conflicts around
>>> /etc/modprobe.d/mlx4.conf, and my Ubuntu distro does not know about
>>> such a package. And we've just started :)
>>
>> The rdma ecosystem is a pain, I'll easily agree with that...
>>
>>> Syscalls tend to be simpler and more reliable. If it gives ENOTSUP,
>>> OK, that's it. If it works, great, we can use it.
>>
>> I'll have to look into it a bit more; libibverbs abstracts a lot of
>> stuff into per-NIC userspace drivers (the files I cited in a previous
>> mail), and basically, with the Mellanox cards I'm familiar with, the
>> whole user session looks like this:
>> * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
>>   /dev/infiniband/uverbs0 (plus a bunch of files to figure out the ABI
>>   version, which user driver to load, etc.)
>> * it and the userspace driver issue "commands" over these two files' fds
>>   to set up the connection; some commands are standard but some are
>>   specific to the interface and defined in the driver.
>
> But we will use some kind of virtual/stub driver, right? We don't have
> real hardware. So all these commands should be fixed and known for the
> virtual/stub driver.
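The first step of that session, opening the two character devices, needs nothing beyond open(2), and it already gives the "if it fails, that's it" behavior asked for here. A minimal probe sketch; `rdma_probe`, `open_rdma_node` and `struct rdma_fds` are hypothetical names, and the device paths are parameters (normally /dev/infiniband/rdma_cm and /dev/infiniband/uverbs0). It deliberately stops before the driver-specific commands, which this sketch does not attempt to model.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical holder for the two RDMA character-device fds. */
struct rdma_fds {
    int cm;     /* e.g. /dev/infiniband/rdma_cm */
    int uverbs; /* e.g. /dev/infiniband/uverbs0 */
};

/* Open one device node read-write; return the fd or -errno. */
static int open_rdma_node(const char *path)
{
    int fd = open(path, O_RDWR);
    return fd < 0 ? -errno : fd;
}

/*
 * Hypothetical probe: open both nodes. If either is unavailable, release
 * whatever was already opened and return -errno so the caller can simply
 * skip RDMA testing. On success, fill *out and return 0.
 */
static int rdma_probe(const char *cm_path, const char *uverbs_path,
                      struct rdma_fds *out)
{
    int cm = open_rdma_node(cm_path);
    if (cm < 0)
        return cm;

    int uverbs = open_rdma_node(uverbs_path);
    if (uverbs < 0) {
        close(cm);
        return uverbs;
    }

    out->cm = cm;
    out->uverbs = uverbs;
    return 0;
}
```

A caller that gets a negative value back (typically -ENOENT when the nodes do not exist) can treat RDMA as unsupported on that kernel and move on.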
>
>> There are many facets to a connection in RDMA: a protection domain used
>> to register memory with the NIC, a queue pair that is the actual tx/rx
>> connection, optionally a completion channel that will be another fd to
>> listen on for events that tell you something happened, and finally some
>> memory regions to communicate directly with the NIC from userspace,
>> depending on the specific driver.
>> * then there's the actual usage: more commands through the uverbs0 char
>>   device to register the memory you'll use, and once that's done it's
>>   entirely up to the driver. For example, the Mellanox lib can do
>>   everything in userspace by playing with the memory regions it
>>   registered, but I'd wager the rxe driver does more calls through the
>>   uverbs0 fd...
>>
>> Honestly, I'm not keen on reimplementing all of this; the interface
>> itself pretty much depends on your version of the kernel (there is a
>> common ABI defined, but as far as specific NICs are concerned, if your
>> kernel module doesn't match the user library version you can get some
>> nasty surprises), and it's far from the black or white of a good ol'
>> ENOTSUP error.
>>
>> I'll see if I can figure out whether there is a common subset of verbs
>> commands that are standard and sufficient to set up a listening
>> connection and exchange data, that should be supported by all devices
>> and would let us reimplement just that. But while I hear your point
>> about Android and ten years in the future, I think it's more likely
>> that ten years in the future the verbs ABI will have changed, but
>> libibverbs will just have the new version implemented and hide the
>> change :P
>
> But again, we don't need to support all of the available hardware.
> For example, we are testing the net stack from the external side using
> tun. tun is a very simple, virtual abstraction of a network card. It
> allows us to test all of the generic net stack starting from L2 without
> dealing with any real drivers and their differences at all.
> I had the impression that we are talking about something similar here
> too. Or not?
>
> Also, I am missing a bit of context about the rdma<->9p interface. Do we
> need to set up all these ring buffers to satisfy the parts that 9p
> needs? Does 9p actually read data directly from these ring buffers? Or
> is there some higher-level rdma interface that 9p uses?
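The tun comparison can be made concrete. Below is a minimal sketch of creating a TAP interface through the tun clone device, modeled on the kernel's tuntap documentation rather than on syzkaller's own code; `tap_alloc` and the interface name `syz_tap0` are hypothetical. TAP mode exchanges whole Ethernet frames, which matches testing "starting from L2". Creating the interface needs CAP_NET_ADMIN and CONFIG_TUN.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/*
 * Hypothetical helper: create (or attach to) a TAP interface named `name`
 * through the tun clone device at `clone_path` (normally /dev/net/tun).
 * Returns the queue fd on success, or -errno on failure.
 */
static int tap_alloc(const char *clone_path, const char *name)
{
    int fd = open(clone_path, O_RDWR);
    if (fd < 0)
        return -errno;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    /* IFF_TAP: Ethernet frames (L2); IFF_NO_PI: no extra packet-info header. */
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
    /* Copy at most IFNAMSIZ - 1 bytes so the name stays NUL-terminated. */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        int err = -errno; /* capture before close() */
        close(fd);
        return err;
    }
    return fd;
}
```

With `int fd = tap_alloc("/dev/net/tun", "syz_tap0");`, frames written to `fd` are delivered to the kernel as if received on `syz_tap0`, and frames the kernel transmits on that interface can be read back from `fd`. No real NIC driver is involved.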