Hi all,

I am starting this discussion to see what can be done in order to make the in-built nodes (i.e. https://git.dpdk.org/dpdk/tree/lib/node) easier to reuse in external applications.

So far here are the limitations I have found when working on grout. Some of these limitations are trivial, some others are more tricky. I hope we can get clean solutions.

ethdev_rx and ethdev_tx require cloning
---------------------------------------

These nodes have been written to receive from or transmit to a single queue. When changing the number of ports and/or rx/tx queues, the graph needs to be recreated.

* Node names must all be unique (hence, node clones need to have different names than their original).

=> There is a routine that automatically adds a "unique" suffix to the cloned ethdev_rx and ethdev_tx names, following the "ethdev_rx-<port>-<queue>" and "ethdev_tx-<port>" name patterns.

=> It is not possible to prepare the new nodes in advance without destroying the active graph. For example, if one port+queue pair isn't changed, the "ethdev_rx-<port>-<queue>" name will already exist and be active in the graph. Reusing the same name could lead to data races.
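To make the naming scheme concrete, here is a minimal sketch of helpers that build such names (my own illustration of the patterns described above, not the actual DPDK routine; in practice the generated suffix would be passed to rte_node_clone()):

```c
#include <stdio.h>
#include <stdint.h>

/* Sketch: one ethdev_rx clone per (port, queue) pair and one ethdev_tx
 * clone per port. The resulting names must be unique application-wide. */
static void
rx_node_name(char *buf, size_t len, uint16_t port, uint16_t queue)
{
	/* matches the "ethdev_rx-<port>-<queue>" pattern */
	snprintf(buf, len, "ethdev_rx-%u-%u", port, queue);
}

static void
tx_node_name(char *buf, size_t len, uint16_t port)
{
	/* matches the "ethdev_tx-<port>" pattern */
	snprintf(buf, len, "ethdev_tx-%u", port);
}
```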

* Node context data cannot be passed during rte_graph_create() or rte_graph_clone().

Instead, each node's init() callback must derive its own context data from the graph and node pointers it receives. In most cases this is trivial, but not for nodes that have multiple copies per graph.
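For a per-queue rx clone, the only reliable per-clone information available in init() ends up being the node name itself. A sketch of what that recovery looks like (assuming the "ethdev_rx-<port>-<queue>" naming pattern; this is an illustration, not code from lib/node):

```c
#include <stdio.h>
#include <stdint.h>

/* Parse "ethdev_rx-<port>-<queue>" back into the context the clone
 * actually needs. Returns 0 on success, -1 if the name doesn't match. */
static int
parse_rx_node_name(const char *name, uint16_t *port, uint16_t *queue)
{
	unsigned int p, q;

	if (sscanf(name, "ethdev_rx-%u-%u", &p, &q) != 2)
		return -1;
	*port = (uint16_t)p;
	*queue = (uint16_t)q;
	return 0;
}
```

Being able to pass context data directly at clone/create time would remove the need for this kind of name round-tripping.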

* Once created/cloned, nodes cannot be destroyed.

=> Removing ports and/or reducing the number of queues leaves stale node clones behind. It isn't the end of the world, but it would be nice to be able to destroy them for clarity.

* Graph statistics are per-node.

=> When cloning nodes, we end up with obscure node names such as "ethdev_rx-4-7" in the graph statistics. It would be clearer if the clones were collapsed into their parent node in the statistics. Having clones is an implementation detail which shouldn't show in the results.

The same applies to the DOT graph dump: the clones bloat the graph images and produce a different image per worker. It would be clearer if only the original node names were used.

ip* nodes assume the mbuf data offset is at 0
---------------------------------------------

L3 and L4 nodes assume that the mbufs they process have their data offset pointing to an Ethernet header.

This prevents implementing IP tunnels or control-to-data-plane communication, where the data pointer may need to be at the end of the L3 header, for example.

If we change that to adjust the data pointer to the correct OSI layer, it would also mandate that each individual node only deals with a single OSI layer.

This means that the current ip*_rewrite nodes would need to be split in two: ip*_output and eth_output. This may have significant implications for the optimizations made in these nodes.
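A toy model of the proposed per-layer convention, to make the split concrete (my own illustration, not DPDK code; on a real mbuf the offset move would be done with rte_pktmbuf_prepend() against the headroom):

```c
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14

/* Each node receives the packet with the data offset at the start of its
 * own layer: ip4_output leaves it at the IP header, then eth_output moves
 * it back 14 bytes and writes the Ethernet header in front. */
struct toy_pkt {
	uint8_t buf[2048];
	uint16_t data_off; /* remaining headroom before the current data */
};

static uint8_t *
eth_output(struct toy_pkt *p, const uint8_t dst[6], const uint8_t src[6])
{
	if (p->data_off < ETH_HDR_LEN)
		return NULL; /* no headroom left */
	p->data_off -= ETH_HDR_LEN;
	uint8_t *eth = &p->buf[p->data_off];
	memcpy(eth, dst, 6);
	memcpy(eth + 6, src, 6);
	eth[12] = 0x08; eth[13] = 0x00; /* IPv4 ethertype */
	return eth;
}
```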

No explicit API to pass application specific data around
--------------------------------------------------------

This one is more of a documentation issue. It would help to have a clear description of how the in-built nodes work together and what mbuf private data they require in order to function properly.

Next nodes are hard coded
-------------------------

All next-nodes are set statically in the in-built nodes. This prevents reusing only a subset of them.

No support for logical interfaces
---------------------------------

All interfaces are assumed to be DPDK ports (e.g. IP next hops contain destination Ethernet addresses and DPDK port IDs). This prevents supporting logical interfaces such as IP tunnels.
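A sketch of the indirection that would lift this limitation (all names here are hypothetical, not existing DPDK API): next hops point at an interface table entry rather than carrying a raw port id, so an entry can be either a hardware port or a tunnel.

```c
#include <stdint.h>

enum iface_type { IFACE_PORT, IFACE_TUNNEL };

struct iface {
	enum iface_type type;
	uint16_t port_id;    /* valid when type == IFACE_PORT */
	uint32_t tunnel_dst; /* valid when type == IFACE_TUNNEL */
};

struct next_hop {
	uint8_t eth_dst[6];
	uint16_t iface_id; /* index into the interface table, not a port id */
};

static struct iface ifaces[64];

/* Resolving through the table lets real ports go to ethdev_tx while
 * tunnels loop back into the IP output path after encapsulation. */
static const struct iface *
nh_iface(const struct next_hop *nh)
{
	return &ifaces[nh->iface_id];
}
```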

No support for multiple VRFs
----------------------------

There is a single lpm/lpm6 instance for all ports. This is somewhat linked to the previous limitation about not having support for logical interfaces. Ideally, the lpm/lpm6 instance should be determined from the VRF identifier of the input interface (NB: not the same thing as the receiving DPDK port).
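The selection itself is cheap; a sketch of what it could look like (hypothetical names, with the rte_lpm instance reduced to an opaque pointer here):

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_VRF 256

/* One route table (an rte_lpm instance in practice) per VRF id. */
static void *vrf_lpm[MAX_VRF];

struct input_iface {
	uint16_t vrf_id; /* carried by the logical interface, not the port */
};

/* Pick the lookup table from the input interface's VRF id. */
static void *
lookup_table(const struct input_iface *in)
{
	if (in->vrf_id >= MAX_VRF)
		return NULL;
	return vrf_lpm[in->vrf_id];
}
```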

Cheers!

--
Robin
