On Fri, 22 Sep 2023 09:19:07 +0100 Bruce Richardson <bruce.richard...@intel.com> wrote:
> Following my talk at the recent DPDK Summit [1], here is an RFC patchset containing the prototypes I created which led to the talk. This patchset is simply to demonstrate:
>
> * what is currently possible with DPDK in terms of zero-copy IPC
> * where the big gaps and general problem areas are
> * what the performance is like doing zero-copy between processes
> * how we may look to have new deployment models for DPDK apps
>
> This cover letter is quite long, as it covers how to run the demo app and use the drivers included in this set. I felt it more accessible this way than putting it in rst files in the patches. This patchset depends upon patchsets [2] and [3].
>
> [1] https://dpdksummit2023.sched.com/event/1P9wU
> [2] http://patches.dpdk.org/project/dpdk/list/?series=29536
> [3] http://patches.dpdk.org/project/dpdk/list/?series=29538
>
> Overview
> --------
>
> At a high level, the patchset contains the following parts: a proxy application which performs packet IO and steers traffic on a per-queue basis to other applications which connect to it via unix sockets, and a set of drivers to be used by those applications so that they can (hopefully) receive packets from the proxy app without any changes to their own code. Together these demonstrate the feasibility of zero-copy packet transfer between independent DPDK apps.
>
> The drivers are:
>
> * a bus driver, which makes the connection to the proxy app via the unix socket. Thereafter it accepts the shared memory from the proxy and maps it into the running process for use for buffers, rings, etc. It also handles communication with the proxy app on behalf of the other two drivers.
> * a mempool driver, which simply manages a set of buffers on the basis of offsets within the shared memory area rather than using pointers. The big downside of its use is that it assumes all the objects stored in the mempool are mbufs. (As described in my talk, this is a big issue for which I'm not sure we have a good solution available right now.)
> * an ethernet driver, which creates an rx and a tx ring in shared memory for use in communicating with the proxy app. All buffers sent/received are converted to offsets within the shared memory area.
>
> The proxy app itself implements all the other logic - mostly inside datapath.c - to allow the connecting app to run. When an app connects to the unix socket, the proxy app uses memfd to create a hugepage block to be passed through to the "guest" app, and then sends/receives the messages from the drivers until the app connection is up and running to handle traffic. [Ideally, this IPC-over-unix-socket mechanism should probably be generalized into a library used by the app, but for now it's just built in.] As stated above, the steering of traffic is done per queue, that is, each app connects to a specific socket corresponding to a NIC queue. For demo purposes, traffic is distributed to the queues using RSS, but obviously it would be possible to use e.g. rte_flow to do more interesting distribution in future.
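For anyone trying to picture the mechanism described above: the memfd/unix-socket handoff on the proxy side amounts to roughly the sketch below. This is not code from the patches - the function name, sizes and error handling are illustrative only.

    /*
     * Proxy-side sketch: create a hugepage-backed memfd for one guest and
     * hand the fd across the already-accepted unix-socket connection using
     * SCM_RIGHTS. Illustrative only; real code needs proper error handling
     * and a length that is a multiple of the hugepage size.
     */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    static int
    share_memory_with_guest(int conn_fd, size_t len, void **base)
    {
        int mem_fd = memfd_create("proxy-shm", MFD_HUGETLB);

        if (mem_fd < 0 || ftruncate(mem_fd, len) < 0)
            return -1;

        /* The proxy's own mapping, used for the rings and mbuf storage. */
        *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, mem_fd, 0);
        if (*base == MAP_FAILED)
            return -1;

        /* Pass the fd as ancillary data; the guest mmaps the same file. */
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } ctrl;
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &mem_fd, sizeof(int));

        return sendmsg(conn_fd, &msg, 0) < 0 ? -1 : 0;
    }

The key point is that only the fd crosses the socket; the guest maps the same memory at (in general) a different virtual address, which is why the drivers deal in offsets rather than pointers.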
> Running the Apps
> ----------------
>
> To get things all working, just do a DPDK build as normal, then run the io-proxy app. It takes a single parameter: the core number to use. For example, on my system I run it on lcore 25:
>
> ./build/app/dpdk-io-proxy 25
>
> The sockets to be created, and how they map to ports/queues, are controlled via the command line, but a startup script can be provided; it just needs to be in the current directory and named "dpdk-io-proxy.cmds". Patch 5 of this set contains an example setup that I use, so it's recommended that you run the proxy app from a directory containing that file. If so, the proxy app will use two ports and create two queues on each, mapping them to 4 unix socket files in /tmp. (Each socket is created in its own directory to simplify use with docker containers, as described in the next section.)
>
> No traffic is handled by the app until other end-user apps connect to it. Testpmd works as that second "guest" app without any changes to it. To run multiple testpmd instances, each taking traffic from a unique RX queue and forwarding it back, the following sequence of commands can be used [in this case, doing forwarding on cores 26 through 29, and using the 4 unix sockets configured by the startup file referenced above]:
>
> ./build/app/dpdk-testpmd -l 24,26 --no-huge -m1 --no-shconf \
>         -a sock:/tmp/socket_0_0/sock -- --forward-mode=macswap
> ./build/app/dpdk-testpmd -l 24,27 --no-huge -m1 --no-shconf \
>         -a sock:/tmp/socket_0_1/sock -- --forward-mode=macswap
> ./build/app/dpdk-testpmd -l 24,28 --no-huge -m1 --no-shconf \
>         -a sock:/tmp/socket_1_0/sock -- --forward-mode=macswap
> ./build/app/dpdk-testpmd -l 24,29 --no-huge -m1 --no-shconf \
>         -a sock:/tmp/socket_1_1/sock -- --forward-mode=macswap
>
> NOTE:
> * the "--no-huge -m1" is present to guarantee that no regular DPDK hugepage memory is being used by the app; it's all coming from the proxy app's memfd
> * the "--no-shconf" parameter is there just to avoid us needing to specify a unique file-prefix for each instance
> * the forwarding type to be used is optional; macswap is chosen just to have some work done inside testpmd, to prove it can touch the packet payload and not just the mbuf header
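As a side note on what the "sock:" device in the commands above does at startup: conceptually it is the mirror image of the proxy-side sketch earlier - receive the memfd over the unix socket, map it, and from then on exchange buffers as offsets into that mapping rather than as raw pointers. Again a rough sketch with made-up names, not code from the patches:

    /*
     * Guest-side sketch: receive the memfd over the connected unix socket,
     * map it, and convert between local pointers and the offsets carried
     * on the shared rx/tx rings. Names and error handling are illustrative.
     */
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int
    receive_shared_memory(int conn_fd, size_t len, void **base)
    {
        char byte;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } ctrl;
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
        };
        int mem_fd;

        if (recvmsg(conn_fd, &msg, 0) < 0 || CMSG_FIRSTHDR(&msg) == NULL)
            return -1;
        memcpy(&mem_fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(int));

        /*
         * The region lands at a different virtual address here than in the
         * proxy, which is why only offsets are meaningful on the rings.
         */
        *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, mem_fd, 0);
        return *base == MAP_FAILED ? -1 : 0;
    }

    /* Buffers cross the shared rings as offsets, not pointers. */
    static inline uint64_t
    buf_to_offset(const void *base, const void *buf)
    {
        return (uintptr_t)buf - (uintptr_t)base;
    }

    static inline void *
    offset_to_buf(void *base, uint64_t off)
    {
        return (char *)base + off;
    }

In the mempool driver the same offset handling has to cover the mbufs themselves, which is where the "everything in the pool must be an mbuf" limitation mentioned above comes from.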
> Using with docker containers
> ----------------------------
>
> The testpmd instances run above can also be run within a docker container. Using a dockerfile like the one below, we can run testpmd in a container, getting its packets in a zero-copy manner from the io-proxy running on the host.
>
> # syntax=docker/dockerfile:1-labs
> FROM alpine
> RUN apk add --update alpine-sdk \
>         py3-elftools meson ninja \
>         bsd-compat-headers \
>         linux-headers \
>         numactl-dev \
>         bash
> ADD . dpdk
> WORKDIR dpdk
> RUN rm -rf build
> RUN meson setup -Denable_drivers=*/shared_mem -Ddisable_libs=* \
>         -Denable_apps=test-pmd -Dtests=false build
> RUN ninja -v -C build
> ENTRYPOINT ["/dpdk/build/app/dpdk-testpmd"]
>
> To access the proxy, all the container needs is access to the unix socket on the filesystem. Since in the example startup script each socket is placed in its own directory, we can use the "--volume" parameter to give each instance its own unique unix socket, and therefore its own proxied NIC RX/TX queue. To run four testpmd instances as above, just in containers, the following commands can be used - assuming the dockerfile above has been built into an image called "testpmd":
>
> docker run -it --volume=/tmp/socket_0_0:/run testpmd \
>         -l 24,26 --no-huge -a sock:/run/sock -- \
>         --no-mlockall --forward-mode=macswap
> docker run -it --volume=/tmp/socket_0_1:/run testpmd \
>         -l 24,27 --no-huge -a sock:/run/sock -- \
>         --no-mlockall --forward-mode=macswap
> docker run -it --volume=/tmp/socket_1_0:/run testpmd \
>         -l 24,28 --no-huge -a sock:/run/sock -- \
>         --no-mlockall --forward-mode=macswap
> docker run -it --volume=/tmp/socket_1_1:/run testpmd \
>         -l 24,29 --no-huge -a sock:/run/sock -- \
>         --no-mlockall --forward-mode=macswap
>
> NOTE: since these docker testpmd instances don't access IO or allocate hugepages directly, they should be runnable without extra privileges, so long as they can connect to the unix socket.
>
> Additional info
> ---------------
>
> * Stats are available via the app's command line.
> * By default (a #define in the code), the proxy app only uses 2 queues per port, so you can't configure more than that via the command line.
> * Any ports used by the proxy must support queue reconfiguration at runtime without stopping the port.
> * When a "guest" process connected to a socket terminates, all shared memory used by that process is destroyed, and a new memfd is created on reconnect.
> * The above setups using testpmd are the only ways in which this app and drivers have been tested. I would be hopeful that other apps would work too, but there are quite a few limitations (see my DPDK Summit talk for some more details on those).
>
> Congratulations on reading this far! :-)
> All comments/feedback on this welcome.
>
> Bruce Richardson (5):
>   bus: new driver to accept shared memory over unix socket
>   mempool: driver for mempools of mbufs on shared memory
>   net: new ethdev driver to communicate using shared mem
>   app: add IO proxy app using shared memory interfaces
>   app/io-proxy: add startup commands
>
>  app/io-proxy/command_fns.c                 | 160 ++++
>  app/io-proxy/commands.list                 |   6 +
>  app/io-proxy/datapath.c                    | 595 +++++++++++++++++++++
>  app/io-proxy/datapath.h                    |  37 ++
>  app/io-proxy/datapath_mp.c                 |  78 +++
>  app/io-proxy/dpdk-io-proxy.cmds            |   6 +
>  app/io-proxy/main.c                        |  71 +++
>  app/io-proxy/meson.build                   |  12 +
>  app/meson.build                            |   1 +
>  drivers/bus/meson.build                    |   1 +
>  drivers/bus/shared_mem/meson.build         |  11 +
>  drivers/bus/shared_mem/shared_mem_bus.c    | 323 +++++
>  drivers/bus/shared_mem/shared_mem_bus.h    |  75 +++
>  drivers/bus/shared_mem/version.map         |  11 +
>  drivers/mempool/meson.build                |   1 +
>  drivers/mempool/shared_mem/meson.build     |  10 +
>  drivers/mempool/shared_mem/shared_mem_mp.c |  94 ++++
>  drivers/net/meson.build                    |   1 +
>  drivers/net/shared_mem/meson.build         |  11 +
>  drivers/net/shared_mem/shared_mem_eth.c    | 295 ++++
>  20 files changed, 1799 insertions(+)
>  create mode 100644 app/io-proxy/command_fns.c
>  create mode 100644 app/io-proxy/commands.list
>  create mode 100644 app/io-proxy/datapath.c
>  create mode 100644 app/io-proxy/datapath.h
>  create mode 100644 app/io-proxy/datapath_mp.c
>  create mode 100644 app/io-proxy/dpdk-io-proxy.cmds
>  create mode 100644 app/io-proxy/main.c
>  create mode 100644 app/io-proxy/meson.build
>  create mode 100644 drivers/bus/shared_mem/meson.build
>  create mode 100644 drivers/bus/shared_mem/shared_mem_bus.c
>  create mode 100644 drivers/bus/shared_mem/shared_mem_bus.h
>  create mode 100644 drivers/bus/shared_mem/version.map
>  create mode 100644 drivers/mempool/shared_mem/meson.build
>  create mode 100644 drivers/mempool/shared_mem/shared_mem_mp.c
>  create mode 100644 drivers/net/shared_mem/meson.build
>  create mode 100644 drivers/net/shared_mem/shared_mem_eth.c
> --
> 2.39.2

This looked interesting but appears to be a dead end: there has been no further work on it, and it was never made clear how it differs from memif. It would also need more documentation etc. to be a real NIC driver. If there is still interest, please resubmit it.
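For reference, the closest in-tree equivalent is the memif PMD, which also does shared-memory packet transfer negotiated over a unix socket. A back-to-back testpmd pair over memif can be brought up roughly as follows (from memory - check the memif PMD guide for the exact devargs; the default role is client):

    ./build/app/dpdk-testpmd -l 0-1 --file-prefix=pmd1 \
            --vdev=net_memif,role=server -- -i
    ./build/app/dpdk-testpmd -l 2-3 --file-prefix=pmd2 \
            --vdev=net_memif -- -i

Any resubmission would need to spell out what this proxy/driver set offers beyond that, e.g. the central IO proxy doing per-queue steering for unmodified applications.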