Hello Harry,


Thanks for posting this RFC. I'll share my API RFC in the coming days too, for 
now
some feedback (which is already integrated/existing in the API I've been 
working on)

Bigger picture; the APIs below implement "DPDK C API thinking" in Rust. I'd like to 
rework the APIs to be "DPDK with Rust Ergonomics" instead:

Example:
1) DpdkPortConf::new_from(0, c_dev_conf_struct, c_rx_conf_struct, 
c_tx_conf_struct, 8, 8, 512, 512, mempool (or None));
   - the API here is not use case focussed. The values passed in are 
documented, but not easy to explore/understand.

2) let ports = dpdk::get_ports(); // returns a Vec<DpdkPort>
  let p = ports.get(0);
  p.rxqs(8, rx_mempool)?;
  p.rx_crc_strip(true);
  p.set_rss_hash(&[0,1,2,3..N])?;

  // Notes: configure RX values: note the type-safe, and english-syntax 
functions.
  // These functions prompt in modern IDEs, making "exploring" APIs 
easy/fast/intuitive (struct member bit-fields are not as easy to explore)
  // The "rxqs()" function takes the parameters it needs (u16 count, and a 
Arc<Dpdk::mempool> object)
  //   - this avoids exposing an "Option<Arc<Mempool>>" publically, hiding impl 
detail, giving better ergonomics
  // Notice the "?" operator, this can be used "inline" to early-return and 
error (e.g. rss_hash array has incorrect length)


Agree.
That approach is much better.


The ability to error check individual functions/APIs allow better error 
reporting to the user,
specifically at the point where the error occurred. As these are not datapath 
APIs, allocations and
overhead in the failure case should not be a problem - user-experience is the 
goal:

1) on error: -EINVAL returned from rte_eth_rxq_configure(), difficult to see 
what parameter caused the -EINVAL.
2) on error: "rss_hash has invalid lenght of 42" (or whatever we make the 
Debug/std::error::Error impl say)

To summarize, "use-case focused APIs" are easier to read, debug, and maintain.
The work here is to design/build these Rust ergonomic APIs: it takes DPDK 
knowledge, and some Rust API
best-practices to figure these out. There's probably a "playbook" of like 3 
rules/strategies, which will
cover 90%+ of the API design.

Hopefully this email is a starting point; some resources for future reader's 
reference;
- https://burntsushi.net/rust-error-handling/
- https://corrode.dev/blog/idiomatic-rust-resources/ (look for "error-handling" 
tag)


I'll definitely check these out


```rust
/// Configuration details for a DPDK port.
///
/// # Overview
///
/// `DpdkPortConf` is used to initialize and configure ports in a DPDK 
environment. It includes:
/// - Device information (`dev_info`) pulled from DPDK.
/// - Transmit and receive configurations, including queue settings and memory 
pools.
/// - Details about the number of queues and descriptors for transmission and 
reception.
///
/// This struct is typically instantiated using the [`DpdkPortConf::new_from`] 
method.
///
/// # Example
///
/// ```
/// let conf = DpdkPortConf::new_from(
///     port_id,
///     dev_conf,
///     tx_conf,
///     rx_conf,
///     8,               // Number of RX queues
///     8,               // Number of TX queues
///     512,             // RX descriptors
///     512,             // TX descriptors
///     0,               // RX queue socket ID
///     0,               // TX queue socket ID
///     Some(rx_mempool) // RX queue mempool
/// ).unwrap();

A few observations; The DpdkPortConf itself is a Rust struct, but internally it 
exposes a number
of "pub" fields (rte_eth_dev_info, _eth_conf, rte_eth_rxconf, ...), which are 
directly bindgen-ed
from the DPDK C API. While this works, it will not provide the ergonomics we'd 
like for Rust code.

For example, the eth_dev_info->driver_name is a CString (aka, const char *), 
which needs to be
converted to a safe Rust CString or CStr (see 
https://doc.rust-lang.org/beta/std/ffi/struct.CStr.html#method.from_ptr)

A big value add in Rust development is the automatic "Debug" implementations, 
allowing one to just
       println!("{dev_info:?}");
a struct, and that the compiler is able to identify the contents, and present 
them automatically.
That will not work if we have C structures with pointers, resulting in bad 
debug/logging for users.


What about explicitly implementing Debug for types that are referenced as 
pointers ?
For example:

```rust
#[derive(Copy, Clone)]
struct MbufPtr(*const rte_mbuf);

impl std::fmt::Debug for MbufPtr {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if self.0.is_null() {
            return write!(f, "MbufPtr(null)");
        }

        let mbuf = unsafe { &*self.0 };
        let data_off = unsafe { mbuf.__bindgen_anon_1.__bindgen_anon_1.data_off 
};
        let pkt_len = unsafe { mbuf.__bindgen_anon_2.__bindgen_anon_1.pkt_len };

        write!(f, "MbufPtr {{ address: {:p}, data_offset: {}, packet_length: {} 
}}",
            self.0,
            data_off,
            pkt_len
        )
    }
}
```

The activation is like this:
```rust
let mbuf_ptr: *const rte_mbuf = PTR;
println!("{:?}", MbufPtr(mbuf_ptr));
```

Output:
```
MbufPtr { address: 0x1033af140, data_offset: 128, packet_length: 62 }
```


///
#[derive(Clone)]
pub struct DpdkPortConf {
    /// Information about the DPDK Ethernet device (e.g., driver, capabilities, 
etc.).
    pub dev_info: rte_eth_dev_info,

    /// Configuration for the Ethernet device. Determines the overall behavior 
of the device.
    pub dev_conf: rte_eth_conf,

    /// Configuration for transmitting (Tx) packets on the Ethernet device.
    pub tx_conf: rte_eth_txconf,

    /// Configuration for receiving (Rx) packets on the Ethernet device.
    pub rx_conf: rte_eth_rxconf,

    /// Number of receive (Rx) queues configured on this port.
    pub rxq_num: u16,

    /// Number of transmit (Tx) queues configured on this port.
    pub txq_num: u16,

    /// Number of descriptors for each transmit (Tx) queue.
    /// Descriptors represent items in a queue to handle packets.
    pub tx_desc_num: u16,

    /// Number of descriptors for each receive (Rx) queue.
    pub rx_desc_num: u16,

    /// NUMA socket ID associated with the memory used by receive (Rx) queues.
    pub rxq_socket_id: u32,

    /// NUMA socket ID associated with the memory used by transmit (Tx) queues.
    pub txq_socket_id: u32,

    /// Memory pool associated with receive (Rx) queues.
    /// This manages the buffers used for storing incoming packets.
    pub rxq_mempool: Option<Arc<DpdkMempool>>,
}

Any reaons that rxq_mempool has an Option<>..? Is it for TX-only port 
configurations?
That's the only thing I could imagine is a cause. See above API where RX and TX 
are
configured independently - this avoids passing an "RX mempool" at all, if only 
TX is configured!


Right. The Option<> is for Tx only configuration.


/// A trait defining basic operations for managing and interacting with a DPDK 
port.
///
/// # Overview
///
/// The `DpdkPort` trait standardizes how to operate on DPDK ports, making it 
possible to:
/// - Configure the Ethernet device using [`configure`].
/// - Start the device using [`start`].
/// - Handle Rx and Tx bursts of packets using [`rx_burst`] and [`tx_burst`].
///
pub trait DpdkPort: Send + Sync {

This trait provides a very C API centric view. Unfortunately, that view does 
not include the
Rust API semantics we need for RXQ and TXQ being Send/!Sync (see 
https://youtu.be/lb6xn2xQ-NQ?t=802)

Example of INCORRECT usage of this RFC API (that Rust compiler cannot help with)
(refer to 
https://github.com/getelson-at-mellanox/rdpdk/blob/main/port/mlx5/mlx5_port.rs#L30
 for port creation)
thread 1:
  let port = Mlx5::from(0, &port_conf);
  let pkts = &mut [*mut rte_mbuf; 32];
  port.rx_burst(0, &mut pkts); // segfault simulateous access on (port/queue 
combo)

thread 2:
  let port = Mlx5::from(0, &port_conf);
  let pkts = &mut [*mut rte_mbuf; 32];
  port.rx_burst(0, &mut pkts); // segfault simulateous access on (port/queue 
combo)

The problem here is that the user was able to create two "handles" to the same 
"port".
And from that handle of a port, it was possible to call rx_burst(), (or 
tx_burst(), etc).

The Rust compiler cannot identify that Mlx5::from(0) is not safe to call twice.
My current preferred solution is to have a Dpdk::eth::get_ports() API, which 
returns
all the port instances. This removes the user's ability to "create" ports at 
all, and hence
it cannot be mis-used either.

All in all, the method in which C DPDK APIs work (pass in integer ID for 
port/queue), is
a very different to how safe Rust APIs will look when building in Send/Sync 
safety.



I see your point now.


    /// Returns the port ID of the DPDK port.
    ///
    /// # Return Value
    /// A `u16` that uniquely identifies the DPDK port.
    ///
    /// # Example
    /// ```
    /// let port_id = dpdk_port.port_id();
    /// println!("DPDK port ID: {}", port_id);
    /// ```
    fn port_id(&self) -> u16;

    /// Returns a reference to the configuration object of the DPDK port.
    ///
    /// # Return Value
    /// A reference to [`DpdkPortConf`], which contains various settings like 
Rx/Tx queue configurations,
    /// memory pools, and NUMA socket IDs.
    ///
    /// # Example
    /// ```
    /// let port_config = dpdk_port.port_conf();
    /// println!("Rx queues: {}", port_config.rxq_num);
    /// ```
    fn port_conf(&self) -> &DpdkPortConf;

    /// Configures the DPDK Ethernet device with the settings specified in the 
port configuration.
    ///
    /// This method is typically called before starting the port to ensure it 
is prepared for Rx and Tx operations.
    ///
    /// # Return Value
    /// - `Ok(())` if the configuration was applied successfully.
    /// - `Err(String)` with a descriptive error message if the configuration 
failed.
    ///
    /// # Example
    /// ```
    /// let result = dpdk_port.configure();
    /// if let Err(err) = result {
    ///     eprintln!("Failed to configure the port: {}", err);
    /// }
    /// ```
    fn configure(&mut self) -> Result<(), String>;

    /// Starts the DPDK Ethernet device.
    ///
    /// This method initializes the Rx and Tx queues, making the port ready for 
data transmission
    /// and reception.
    ///
    /// # Return Value
    /// - `Ok(())` if the port was started successfully.
    /// - `Err(String)` if the startup process failed, with a descriptive error 
message.
    ///
    /// # Example
    /// ```
    /// let result = dpdk_port.start();
    /// if let Err(err) = result {
    ///     eprintln!("Failed to start the port: {}", err);
    /// }
    /// ```
    fn start(&mut self) -> Result<(), String>;

    /// Receives a burst of packets on the specified Rx queue.
    ///
    /// # Parameters
    /// - `queue_id`: The ID of the Rx queue to receive packets from.
    /// - `pkts`: A mutable reference to an array of packet buffers (`*mut 
rte_mbuf`) where received packets
    ///   will be written.
    ///
    /// # Return Value
    /// - `Ok(u16)` containing the number of packets successfully received.
    /// - `Err(String)` if the operation failed.
    ///
    /// # Example
    /// ```
    /// let pkts: Vec<*mut rte_mbuf> = vec![std::ptr::null_mut(); 32];
    /// let received = dpdk_port.rx_burst(0, &pkts);
    /// match received {
    ///     Ok(count) => println!("Received {} packets", count),
    ///     Err(err) => eprintln!("Rx burst failed: {}", err),
    /// }
    /// ```
    fn rx_burst(&mut self, queue_id: u16, pkts: &[*mut rte_mbuf]) -> Result<u16, 
String>;

    /// Sends a burst of packets on the specified Tx queue.
    ///
    /// # Parameters
    /// - `queue_id`: The ID of the Tx queue to send packets on.
    /// - `pkts`: A reference to an array of packet buffers (`*mut rte_mbuf`) 
to send.
    ///
    /// # Return Value
    /// - `Ok(u16)` containing the number of packets successfully sent.
    /// - `Err(String)` if the operation failed.
    ///
    /// # Example
    /// ```
    /// let pkts: Vec<*mut rte_mbuf> = vec![some_packet_ptr1, some_packet_ptr2];
    /// let sent = dpdk_port.tx_burst(0, &pkts);
    /// match sent {
    ///     Ok(count) => println!("Sent {} packets", count),
    ///     Err(err) => eprintln!("Tx burst failed: {}", err),
    /// }
    /// ```
    fn tx_burst(&mut self, queue_id: u16, pkts: &[*mut rte_mbuf]) -> Result<u16, 
String>;
}
```

Having a single trait with configure(), start() and rx_burst() makes it 
impossible to encode the
Send/Sync attributes of the "Port" concept, and the "RxQueue" concept 
seperately.
So perhaps a solution with two traits could work; I haven't investigated in 
detail yet.


The API uses raw FFI pointers in DpdkPort rx_burst() and tx_burst() to preserve 
DPDK performance.

Cool, nice trick... but does it remove all virtual functions/function-pointer 
dispatch?
- 
https://github.com/getelson-at-mellanox/rdpdk/blob/main/app/runpmd/runpmd.rs#L143
This seems to accept a "&mut Vec<Box<dyn DpdkPort>>", which would mean the 
".rx_burst()"
is a virtual-function into the trait, and then a function-pointer into the PMD?


Mlx5 trait implementation calls PMD rx_burst() and tx_burst() functions 
directly.
This way I did not need to re-implement original DPDK static inlining for IO 
calls.
For that approach to work DPDK must be patched to expose PMD IO functions.
See https://github.com/getelson-at-mellanox/rdpdk/blob/main/dpdk-patches/0002-net-mlx5-refactor-Rx-Tx-functions-selection.patch


It could be interesting to change "do_io" to a generic function, taking a 
generic DpdkPort instance? Then the first level
trait-resolution would become compile-time (through Rust compiler monomorphisation of 
the generic do_io<Pmd: impl DpdkPort>() into a
more specific  "do_io<Mlx5>()"  code?

For multi-hw setups that function declaration need to be `do_io<R: DpdkPort, T: 
DpkdPort>()`
But that will not compile for Rx-only calls.
Without generics the function works for any HW type and for all directions.


I don't believe this to *actually* matter for performance. It might matter for 
an IO-forward, 0 packet data processing test.
So micro-benchmark bragging, yes this has impact... but touch the packet data 
and it no longer matters!


The API implementation is here:

API definition:
https://github.com/getelson-at-mellanox/rdpdk/blob/main/lib/port/port.rs

API implementation with RTE function calls:
https://github.com/getelson-at-mellanox/rdpdk/blob/main/lib/port/raw_port.rs

Implement direct calls to mlx5 Rx/Tx IO functions:
https://github.com/getelson-at-mellanox/rdpdk/blob/main/port/mlx5/mlx5_port.rs


Signed-off-by: Gregory Etelson <getel...@nvidia.com>

Thanks for sharing the links and RFC!

I will share the RFC API that communicates the Send/Sync requirements to the 
Rust compiler for Ethdev Port Rxq as a sample,
and we can try compare/converge the "trait DpdkPort" and the 
compile-time-thread-safe approaches into the best API?


== thumbs-up ==

Regards,
Gregory

Reply via email to