Hi all,
I am starting this discussion to see what can be done in order to make
the in-built nodes (i.e. https://git.dpdk.org/dpdk/tree/lib/node) easier
to reuse in external applications.
So far here are the limitations I have found when working on grout. Some
of these limitations are trivial, some others are more tricky. I hope we
can get clean solutions.
ethdev_rx and ethdev_tx require cloning
---------------------------------------
These nodes have been written to receive from or transmit to a single
queue. When the number of ports and/or rx/tx queues changes, the graph
needs to be recreated.
* Node names must all be unique (hence, node clones need to have
different names than their original).
=> There is a routine that automatically adds a "unique" suffix to the
cloned ethdev_rx and ethdev_tx names. The "ethdev_rx-<port>-<queue>"
and "ethdev_tx-<port>" name patterns are used.
=> It is not possible to prepare the new nodes in advance without
destroying the active graph. For example, if one port+queue combination
isn't changed, the "ethdev_rx-<port>-<queue>" name will already exist
and be active in the graph. Reusing the same name could lead to data
races.
* Node context data cannot be passed during rte_graph_create() or
rte_graph_clone().
Instead, each node's init() callback must determine its own context
data based on the graph and node pointers it is given. In most cases
this is trivial, but not for nodes that have multiple copies per graph.
* Once created/cloned, nodes cannot be destroyed.
=> Removing ports and/or reducing the number of queues results in stale
node clones remaining. It isn't the end of the world, but it would be
nice to allow destroying them for clarity.
* Graph statistics are per-node.
=> When cloning nodes, we end up with obscure node names such as
"ethdev_rx-4-7" in graph statistics. It would be clearer if the clones
were collapsed into their original node in the statistics. Having
clones is an implementation detail which shouldn't be reflected in the
results.
The same goes for the DOT graph dump: clones make the graph images
bloated and also produce a different image per worker. It would be
clearer if only the original node names were used.
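To make the two points about naming and context concrete, here is a
minimal sketch of what an application ends up doing: generating unique
clone names, then parsing them back inside init() because no user
context can be attached at creation time. The helper names, buffer
handling and error convention are all made up for illustration; only
the "ethdev_rx-<port>-<queue>" pattern comes from the existing code.

```c
#include <stdio.h>
#include <stdint.h>

/* Build the per-port/queue clone name for ethdev_rx, following the
 * "ethdev_rx-<port>-<queue>" pattern. Hypothetical helper, not DPDK API. */
static int
rx_clone_name(char *buf, size_t len, uint16_t port, uint16_t queue)
{
	return snprintf(buf, len, "ethdev_rx-%u-%u",
			(unsigned int)port, (unsigned int)queue);
}

/* Inverse operation, usable from an init() callback: since no user
 * context can be passed through rte_graph_create(), the clone has to
 * recover its port and queue from its own node name.
 * Returns 0 on success, -1 if the name does not match the pattern. */
static int
rx_ctx_from_name(const char *name, uint16_t *port, uint16_t *queue)
{
	if (sscanf(name, "ethdev_rx-%hu-%hu", port, queue) != 2)
		return -1;
	return 0;
}
```

This round trip through string formatting is exactly the kind of
implicit contract that makes reuse fragile: the name is the only
channel between the code that clones the node and the init() callback.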
ip* nodes assume the mbuf data offset is at 0
---------------------------------------------
L3 and L4 nodes assume that the mbufs they process have their data
offset pointing to an Ethernet header.
This prevents implementing IP tunnels or control-to-data-plane
communication, where the data pointer may need to be at the end of the
L3 header, for example.
If we change that to adjust the data pointer to the correct OSI layer,
it would also mandate that each individual node only deals with a single
OSI layer.
This means that the current ip*_rewrite nodes would need to be split in
two: ip*_output and eth_output. This may have big implications on the
optimizations that were made in these nodes.
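To make the layering issue concrete, here is a toy model (not the real
struct rte_mbuf) of the data offset adjustment involved: a tunnel decap
or exception-path node would advance the offset past the L2 header
before handing the packet to an L3 node. On a real mbuf this is what
rte_pktmbuf_adj() does.

```c
#include <stdint.h>

#define ETHER_HDR_LEN 14 /* Ethernet header without VLAN, in bytes */

/* Toy stand-in for the relevant rte_mbuf fields. */
struct toy_mbuf {
	uint16_t data_off; /* offset of the first valid byte */
	uint16_t data_len; /* number of valid bytes */
};

/* Advance the data pointer past the L2 header, so that a node dealing
 * only with the L3 layer sees the IP header first. */
static void
toy_adj_l2(struct toy_mbuf *m)
{
	m->data_off += ETHER_HDR_LEN;
	m->data_len -= ETHER_HDR_LEN;
}
```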
No explicit API to pass application specific data around
--------------------------------------------------------
This one is more of a documentation issue. It would help if there were
a clear description of how the in-built nodes work together and what
kind of mbuf private data they require in order to function properly.
Next nodes are hard coded
-------------------------
All next-node edges are hard coded in the in-built nodes. This prevents
reusing only a subset of them.
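For reference, this is roughly how an in-built node pins its edges at
registration time (simplified from lib/node, not an exact copy, and not
meant to compile standalone):

    static struct rte_node_register ip4_lookup_node = {
            .process = ip4_lookup_node_process,
            .name = "ip4_lookup",
            .nb_edges = 2,
            .next_nodes = {
                    [0] = "ip4_rewrite",
                    [1] = "pkt_drop",
            },
    };
    RTE_NODE_REGISTER(ip4_lookup_node);

Since the edge names live in a static struct inside the library, an
application cannot substitute, say, its own drop or classifier node
without patching the library sources.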
No support for logical interfaces
---------------------------------
All interfaces are supposed to be DPDK ports (e.g. IP next hops contain
destination Ethernet addresses and DPDK port IDs). This prevents support
of logical interfaces such as IP tunnels.
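As a sketch of what logical interface support could look like (all
names are hypothetical, this is not an API proposal): next hops would
reference an application-level interface id, and only a resolution step
would map down to a DPDK port.

```c
#include <stdint.h>

enum iface_type {
	IFACE_PORT,   /* backed by a DPDK ethdev port */
	IFACE_TUNNEL, /* e.g. an IP tunnel, no port of its own */
};

struct iface {
	uint16_t id;          /* application-level interface id */
	enum iface_type type;
	uint16_t port_id;     /* valid when type == IFACE_PORT */
	uint16_t parent_id;   /* valid when type == IFACE_TUNNEL */
};

/* Resolve the DPDK port that eventually transmits for an interface,
 * following tunnel parents down to a physical port. */
static uint16_t
iface_tx_port(const struct iface *ifaces, uint16_t id)
{
	while (ifaces[id].type != IFACE_PORT)
		id = ifaces[id].parent_id;
	return ifaces[id].port_id;
}
```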
No support for multiple VRF
---------------------------
There is a single lpm/lpm6 instance for all ports. This is somewhat
linked to the previous limitation about not having support for logical
interfaces. Ideally, the lpm/lpm6 instance should be determined from
the VRF identifier of the input interface (NB: not the same thing as
the receiving DPDK port).
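The table selection itself is simple; a hypothetical sketch (struct and
function names made up, with the lpm instances kept opaque) of how a
lookup node could pick the table from the input interface's VRF id:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_VRF 256 /* arbitrary bound for illustration */

struct vrf_tables {
	void *lpm[MAX_VRF]; /* would be struct rte_lpm * in real code */
};

/* Return the routing table for a VRF, falling back to the default
 * VRF (id 0) when the requested one does not exist. */
static void *
vrf_lookup_table(const struct vrf_tables *t, uint16_t vrf_id)
{
	if (vrf_id >= MAX_VRF || t->lpm[vrf_id] == NULL)
		return t->lpm[0];
	return t->lpm[vrf_id];
}
```

The hard part is not this lookup but plumbing the VRF id from the input
interface to the lookup node, which circles back to the logical
interface limitation above.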
Cheers!
--
Robin