Hello Harry,
Thanks for posting this RFC. I'll share my API RFC in the coming days too, for
now
some feedback (which is already integrated/existing in the API I've been
working on)
Bigger picture; the APIs below implement "DPDK C API thinking" in Rust. I'd like to
rework the APIs to be "DPDK with Rust Ergonomics" instead:
Example:
1) DpdkPortConf::new_from(0, c_dev_conf_struct, c_rx_conf_struct,
c_tx_conf_struct, 8, 8, 512, 512, mempool (or None));
- the API here is not use case focussed. The values passed in are
documented, but not easy to explore/understand.
2) let ports = dpdk::get_ports(); // returns a Vec<DpdkPort>
let p = ports.get(0);
p.rxqs(8, rx_mempool)?;
p.rx_crc_strip(true);
p.set_rss_hash(&[0,1,2,3..N])?;
// Notes: configure RX values: note the type-safe, and english-syntax
functions.
// These functions prompt in modern IDEs, making "exploring" APIs
easy/fast/intuitive (struct member bit-fields are not as easy to explore)
// The "rxqs()" function takes the parameters it needs (u16 count, and a
Arc<Dpdk::mempool> object)
// - this avoids exposing an "Option<Arc<Mempool>>" publically, hiding impl
detail, giving better ergonomics
// Notice the "?" operator, this can be used "inline" to early-return and
error (e.g. rss_hash array has incorrect length)
Agree.
That approach is much better.
The ability to error check individual functions/APIs allow better error
reporting to the user,
specifically at the point where the error occurred. As these are not datapath
APIs, allocations and
overhead in the failure case should not be a problem - user-experience is the
goal:
1) on error: -EINVAL returned from rte_eth_rxq_configure(), difficult to see
what parameter caused the -EINVAL.
2) on error: "rss_hash has invalid lenght of 42" (or whatever we make the
Debug/std::error::Error impl say)
To summarize, "use-case focused APIs" are easier to read, debug, and maintain.
The work here is to design/build these Rust ergonomic APIs: it takes DPDK
knowledge, and some Rust API
best-practices to figure these out. There's probably a "playbook" of like 3
rules/strategies, which will
cover 90%+ of the API design.
Hopefully this email is a starting point; some resources for future reader's
reference;
- https://burntsushi.net/rust-error-handling/
- https://corrode.dev/blog/idiomatic-rust-resources/ (look for "error-handling"
tag)
I'll definitely check these out
```rust
/// Configuration details for a DPDK port.
///
/// # Overview
///
/// `DpdkPortConf` is used to initialize and configure ports in a DPDK
environment. It includes:
/// - Device information (`dev_info`) pulled from DPDK.
/// - Transmit and receive configurations, including queue settings and memory
pools.
/// - Details about the number of queues and descriptors for transmission and
reception.
///
/// This struct is typically instantiated using the [`DpdkPortConf::new_from`]
method.
///
/// # Example
///
/// ```
/// let conf = DpdkPortConf::new_from(
/// port_id,
/// dev_conf,
/// tx_conf,
/// rx_conf,
/// 8, // Number of RX queues
/// 8, // Number of TX queues
/// 512, // RX descriptors
/// 512, // TX descriptors
/// 0, // RX queue socket ID
/// 0, // TX queue socket ID
/// Some(rx_mempool) // RX queue mempool
/// ).unwrap();
A few observations; The DpdkPortConf itself is a Rust struct, but internally it
exposes a number
of "pub" fields (rte_eth_dev_info, _eth_conf, rte_eth_rxconf, ...), which are
directly bindgen-ed
from the DPDK C API. While this works, it will not provide the ergonomics we'd
like for Rust code.
For example, the eth_dev_info->driver_name is a CString (aka, const char *),
which needs to be
converted to a safe Rust CString or CStr (see
https://doc.rust-lang.org/beta/std/ffi/struct.CStr.html#method.from_ptr)
A big value add in Rust development is the automatic "Debug" implementations,
allowing one to just
println!("{dev_info:?}");
a struct, and that the compiler is able to identify the contents, and present
them automatically.
That will not work if we have C structures with pointers, resulting in bad
debug/logging for users.
What about explicitly implementing Debug for types that are referenced as
pointers ?
For example:
```rust
#[derive(Copy, Clone)]
struct MbufPtr(*const rte_mbuf);
impl std::fmt::Debug for MbufPtr {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if self.0.is_null() {
return write!(f, "MbufPtr(null)");
}
let mbuf = unsafe { &*self.0 };
let data_off = unsafe { mbuf.__bindgen_anon_1.__bindgen_anon_1.data_off
};
let pkt_len = unsafe { mbuf.__bindgen_anon_2.__bindgen_anon_1.pkt_len };
write!(f, "MbufPtr {{ address: {:p}, data_offset: {}, packet_length: {}
}}",
self.0,
data_off,
pkt_len
)
}
}
```
The activation is like this:
```rust
let mbuf_ptr: *const rte_mbuf = PTR;
println!("{:?}", MbufPtr(mbuf_ptr));
```
Output:
```
MbufPtr { address: 0x1033af140, data_offset: 128, packet_length: 62 }
```
///
#[derive(Clone)]
pub struct DpdkPortConf {
/// Information about the DPDK Ethernet device (e.g., driver, capabilities,
etc.).
pub dev_info: rte_eth_dev_info,
/// Configuration for the Ethernet device. Determines the overall behavior
of the device.
pub dev_conf: rte_eth_conf,
/// Configuration for transmitting (Tx) packets on the Ethernet device.
pub tx_conf: rte_eth_txconf,
/// Configuration for receiving (Rx) packets on the Ethernet device.
pub rx_conf: rte_eth_rxconf,
/// Number of receive (Rx) queues configured on this port.
pub rxq_num: u16,
/// Number of transmit (Tx) queues configured on this port.
pub txq_num: u16,
/// Number of descriptors for each transmit (Tx) queue.
/// Descriptors represent items in a queue to handle packets.
pub tx_desc_num: u16,
/// Number of descriptors for each receive (Rx) queue.
pub rx_desc_num: u16,
/// NUMA socket ID associated with the memory used by receive (Rx) queues.
pub rxq_socket_id: u32,
/// NUMA socket ID associated with the memory used by transmit (Tx) queues.
pub txq_socket_id: u32,
/// Memory pool associated with receive (Rx) queues.
/// This manages the buffers used for storing incoming packets.
pub rxq_mempool: Option<Arc<DpdkMempool>>,
}
Any reaons that rxq_mempool has an Option<>..? Is it for TX-only port
configurations?
That's the only thing I could imagine is a cause. See above API where RX and TX
are
configured independently - this avoids passing an "RX mempool" at all, if only
TX is configured!
Right. The Option<> is for Tx only configuration.
/// A trait defining basic operations for managing and interacting with a DPDK
port.
///
/// # Overview
///
/// The `DpdkPort` trait standardizes how to operate on DPDK ports, making it
possible to:
/// - Configure the Ethernet device using [`configure`].
/// - Start the device using [`start`].
/// - Handle Rx and Tx bursts of packets using [`rx_burst`] and [`tx_burst`].
///
pub trait DpdkPort: Send + Sync {
This trait provides a very C API centric view. Unfortunately, that view does
not include the
Rust API semantics we need for RXQ and TXQ being Send/!Sync (see
https://youtu.be/lb6xn2xQ-NQ?t=802)
Example of INCORRECT usage of this RFC API (that Rust compiler cannot help with)
(refer to
https://github.com/getelson-at-mellanox/rdpdk/blob/main/port/mlx5/mlx5_port.rs#L30
for port creation)
thread 1:
let port = Mlx5::from(0, &port_conf);
let pkts = &mut [*mut rte_mbuf; 32];
port.rx_burst(0, &mut pkts); // segfault simulateous access on (port/queue
combo)
thread 2:
let port = Mlx5::from(0, &port_conf);
let pkts = &mut [*mut rte_mbuf; 32];
port.rx_burst(0, &mut pkts); // segfault simulateous access on (port/queue
combo)
The problem here is that the user was able to create two "handles" to the same
"port".
And from that handle of a port, it was possible to call rx_burst(), (or
tx_burst(), etc).
The Rust compiler cannot identify that Mlx5::from(0) is not safe to call twice.
My current preferred solution is to have a Dpdk::eth::get_ports() API, which
returns
all the port instances. This removes the user's ability to "create" ports at
all, and hence
it cannot be mis-used either.
All in all, the method in which C DPDK APIs work (pass in integer ID for
port/queue), is
a very different to how safe Rust APIs will look when building in Send/Sync
safety.
I see your point now.
/// Returns the port ID of the DPDK port.
///
/// # Return Value
/// A `u16` that uniquely identifies the DPDK port.
///
/// # Example
/// ```
/// let port_id = dpdk_port.port_id();
/// println!("DPDK port ID: {}", port_id);
/// ```
fn port_id(&self) -> u16;
/// Returns a reference to the configuration object of the DPDK port.
///
/// # Return Value
/// A reference to [`DpdkPortConf`], which contains various settings like
Rx/Tx queue configurations,
/// memory pools, and NUMA socket IDs.
///
/// # Example
/// ```
/// let port_config = dpdk_port.port_conf();
/// println!("Rx queues: {}", port_config.rxq_num);
/// ```
fn port_conf(&self) -> &DpdkPortConf;
/// Configures the DPDK Ethernet device with the settings specified in the
port configuration.
///
/// This method is typically called before starting the port to ensure it
is prepared for Rx and Tx operations.
///
/// # Return Value
/// - `Ok(())` if the configuration was applied successfully.
/// - `Err(String)` with a descriptive error message if the configuration
failed.
///
/// # Example
/// ```
/// let result = dpdk_port.configure();
/// if let Err(err) = result {
/// eprintln!("Failed to configure the port: {}", err);
/// }
/// ```
fn configure(&mut self) -> Result<(), String>;
/// Starts the DPDK Ethernet device.
///
/// This method initializes the Rx and Tx queues, making the port ready for
data transmission
/// and reception.
///
/// # Return Value
/// - `Ok(())` if the port was started successfully.
/// - `Err(String)` if the startup process failed, with a descriptive error
message.
///
/// # Example
/// ```
/// let result = dpdk_port.start();
/// if let Err(err) = result {
/// eprintln!("Failed to start the port: {}", err);
/// }
/// ```
fn start(&mut self) -> Result<(), String>;
/// Receives a burst of packets on the specified Rx queue.
///
/// # Parameters
/// - `queue_id`: The ID of the Rx queue to receive packets from.
/// - `pkts`: A mutable reference to an array of packet buffers (`*mut
rte_mbuf`) where received packets
/// will be written.
///
/// # Return Value
/// - `Ok(u16)` containing the number of packets successfully received.
/// - `Err(String)` if the operation failed.
///
/// # Example
/// ```
/// let pkts: Vec<*mut rte_mbuf> = vec![std::ptr::null_mut(); 32];
/// let received = dpdk_port.rx_burst(0, &pkts);
/// match received {
/// Ok(count) => println!("Received {} packets", count),
/// Err(err) => eprintln!("Rx burst failed: {}", err),
/// }
/// ```
fn rx_burst(&mut self, queue_id: u16, pkts: &[*mut rte_mbuf]) -> Result<u16,
String>;
/// Sends a burst of packets on the specified Tx queue.
///
/// # Parameters
/// - `queue_id`: The ID of the Tx queue to send packets on.
/// - `pkts`: A reference to an array of packet buffers (`*mut rte_mbuf`)
to send.
///
/// # Return Value
/// - `Ok(u16)` containing the number of packets successfully sent.
/// - `Err(String)` if the operation failed.
///
/// # Example
/// ```
/// let pkts: Vec<*mut rte_mbuf> = vec![some_packet_ptr1, some_packet_ptr2];
/// let sent = dpdk_port.tx_burst(0, &pkts);
/// match sent {
/// Ok(count) => println!("Sent {} packets", count),
/// Err(err) => eprintln!("Tx burst failed: {}", err),
/// }
/// ```
fn tx_burst(&mut self, queue_id: u16, pkts: &[*mut rte_mbuf]) -> Result<u16,
String>;
}
```
Having a single trait with configure(), start() and rx_burst() makes it
impossible to encode the
Send/Sync attributes of the "Port" concept, and the "RxQueue" concept
seperately.
So perhaps a solution with two traits could work; I haven't investigated in
detail yet.
The API uses raw FFI pointers in DpdkPort rx_burst() and tx_burst() to preserve
DPDK performance.
Cool, nice trick... but does it remove all virtual functions/function-pointer
dispatch?
-
https://github.com/getelson-at-mellanox/rdpdk/blob/main/app/runpmd/runpmd.rs#L143
This seems to accept a "&mut Vec<Box<dyn DpdkPort>>", which would mean the
".rx_burst()"
is a virtual-function into the trait, and then a function-pointer into the PMD?
Mlx5 trait implementation calls PMD rx_burst() and tx_burst() functions
directly.
This way I did not need to re-implement original DPDK static inlining for IO
calls.
For that approach to work DPDK must be patched to expose PMD IO functions.
See
https://github.com/getelson-at-mellanox/rdpdk/blob/main/dpdk-patches/0002-net-mlx5-refactor-Rx-Tx-functions-selection.patch
It could be interesting to change "do_io" to a generic function, taking a
generic DpdkPort instance? Then the first level
trait-resolution would become compile-time (through Rust compiler monomorphisation of
the generic do_io<Pmd: impl DpdkPort>() into a
more specific "do_io<Mlx5>()" code?
For multi-hw setups that function declaration need to be `do_io<R: DpdkPort, T:
DpkdPort>()`
But that will not compile for Rx-only calls.
Without generics the function works for any HW type and for all directions.
I don't believe this to *actually* matter for performance. It might matter for
an IO-forward, 0 packet data processing test.
So micro-benchmark bragging, yes this has impact... but touch the packet data
and it no longer matters!
The API implementation is here:
API definition:
https://github.com/getelson-at-mellanox/rdpdk/blob/main/lib/port/port.rs
API implementation with RTE function calls:
https://github.com/getelson-at-mellanox/rdpdk/blob/main/lib/port/raw_port.rs
Implement direct calls to mlx5 Rx/Tx IO functions:
https://github.com/getelson-at-mellanox/rdpdk/blob/main/port/mlx5/mlx5_port.rs
Signed-off-by: Gregory Etelson <getel...@nvidia.com>
Thanks for sharing the links and RFC!
I will share the RFC API that communicates the Send/Sync requirements to the
Rust compiler for Ethdev Port Rxq as a sample,
and we can try compare/converge the "trait DpdkPort" and the
compile-time-thread-safe approaches into the best API?
== thumbs-up ==
Regards,
Gregory