Hi Maxim,

Thanks for your questions.

1. Yes, the bindgen output for packet headers isn't as convenient for Rust plugins to use as I would like. I have explored using other Rust crates for accessing packet contents, such as https://docs.rs/etherparse/latest/etherparse/, but most likely those external Rust crates won't have compatible performance goals. Therefore, we will probably end up with some wrapper or native Rust types, either directly in the vpp-plugin crate or in a new crate.

2. The Rust standard library uses libc malloc/free, but the vpp-plugin wrappers around VPP objects (e.g. buffers and vecs) use the same allocator as the C code. As far as I'm aware, there is no problem with both allocators coexisting in VPP, provided that the code doesn't lose track of which allocator was used (which is greatly helped by the Rust type system), although Rust does allow providing an alternative allocator if it turns out to be required/desired. However, the Rust standard library algorithms likely aren't going to be ideal for use in the fast path, because they aren't optimised for this use case, so further zero-cost abstractions around the VPP C functionality will be added where required.

Thanks,
Rob

On Mon, 29 Dec 2025 at 11:12, Maxim Uvarov via lists.fd.io <[email protected]> wrote:
>
> Hello Rob,
>
> 1. In the first example, direct bindgen looks a little bit strange. I guess that
> can use direct define in future.
> let ip: *const ip4_header_t = b0.current_ptr_mut() as *const ip4_header_t;
> if (*ip).__bindgen_anon_1.protocol == IP_PROTOCOL_ICMP {
>
> 2. How do Rust libraries use alloc/free for their common algorithms? Do they
> use VPP pools and buffers (huge tables, NUMA, cache alignment)? I am interested
> in how Rust or C++ can be integrated into the realization of VPP nodes. Many
> standard algorithms can request memory allocation. Allocation is not expected
> in the runtime data path. Is that handled by your Rust port, or are there some
> limitations on usage?
>
> Thank you,
> Maxim.
>
> 27.11.2025, 15:32, [email protected]
>
> Results with node functions performing prefetching and processing four
> buffers at a time, with vector instructions emitted by both compilers
> in the update of next node indices:
> C: 1.08e1 clocks per packet
> Rust: 1.09e1 clocks per packet
>
> So again, approximately equal.
>
> The code for this:
> C:
> https://github.com/rshearman/vpp/blob/48ca99dc7079bd46b2bb37605ec3d62a44bbf58f/src/plugins/example-c/example_node.c
> Rust:
> https://github.com/rshearman/vpp-plugin-rs/blob/de41333b46b671b8c61182b6f38d0996fce8cf5e/vpp-example-plugin/src/lib.rs
>
> Median values from:
> $ grep example */show_runtime.txt
> c-1/show_runtime.txt:example-c active 809723 28203452 0 1.08e1 34.83
> c-2/show_runtime.txt:example-c active 816954 28795970 0 1.06e1 35.25
> c-3/show_runtime.txt:example-c active 906334 28421093 0 1.16e1 31.36
> rust-1/show_runtime.txt:example active 864643 28368780 0 1.09e1 32.81
> rust-2/show_runtime.txt:example active 924042 29116994 0 1.13e1 31.51
> rust-3/show_runtime.txt:example active 858871 29217427 0 1.07e1 34.02
>
> All of these runs were done while offering the highest packet rate the
> traffic generator could do (rather than NDR in the previous set of
> results).
>
> Thanks,
> Rob
>
> On Tue, 25 Nov 2025 at 14:52, Robert Shearman via lists.fd.io
> wrote:
> >
> > There is SIMD usage in vlib_get_buffers (and the Rust implementation
> > of the same functionality), so it does cover that base, but I can
> > enhance the plugins to perform prefetching.
> >
> > Thanks,
> > Rob
> >
> > On Tue, 25 Nov 2025 at 13:55, Damjan Marion via lists.fd.io
> > wrote:
> > >
> > >
> > > Well, your C plugin is very basic, no prefetching, no instruction level
> > > parallelism, no SIMD usage so no surprises that numbers are close.
> > > Any chance you can try something involving those techniques?
> > >
> > > Thanks,
> > >
> > > Damjan
> > >
> > >
> > > > On 24.11.2025., at 21:22, Robert Shearman via lists.fd.io wrote:
> > > >
> > > > I've been able to capture some numbers for what I believe is a fair
> > > > apples-to-apples comparison of a C plugin versus a Rust plugin, and a
> > > > short summary is that they are approximately equal in performance.
> > > >
> > > > C: 1.22e1 clock cycles per packet
> > > > Rust: 1.19e1 clock cycles per packet
> > > >
> > > > Details:
> > > >
> > > > Rust plugin code (also using vpp-plugin crate from same git hash):
> > > > https://github.com/rshearman/vpp-plugin-rs/tree/15437cd8d848fd877dbb5858dec1e4ab853bbd42/vpp-example-plugin
> > > > C plugin code:
> > > > https://github.com/rshearman/vpp/blob/8ebcf29538b1145e376bdf7f8b405ca4822e9c24/src/plugins/example-c/example_node.c
> > > >
> > > > Obviously, the plugins here are about as basic as it comes but the
> > > > more basic the plugins the easier it is to perform an apples-to-apples
> > > > comparison and visually validate they are doing the same job.
> > > >
> > > > Rust compiler version:
> > > > rustc 1.91.0 (f8297e351 2025-10-28)
> > > >
> > > > C compiler version:
> > > > Ubuntu clang version 18.1.3 (1ubuntu1)
> > > > Target: x86_64-pc-linux-gnu
> > > > Thread model: posix
> > > > InstalledDir: /usr/bin
> > > >
> > > > The test setup is a VM with 8G of memory where TRex is running,
> > > > connected via two virtio interfaces to a second VM with 4G of memory
> > > > where vpp is run. Both VMs have 2 lcores pinned to set lcores on the
> > > > hypervisor. Both VMs are running Ubuntu 24.04.3, along with the
> > > > hypervisor. Given that this was a one-time performance test, I made no
> > > > attempt at doing CPU isolation in either the guest VMs or the
> > > > hypervisor, with the hope that the noise in the results that this
> > > > causes is acceptable.
> > > >
> > > > VPP code was built using "make pkg-deb" and then installed as packages
> > > > in the VM, with the only change to the configuration being to
> > > > configure `workers 1`. The Rust plugin was built using `cargo build
> > > > --release` and then the resulting .so file for the vpp-example-plugin
> > > > was copied into the expected location in the VM.
> > > >
> > > > IPv4 UDP 1500-byte packets are generated from TRex in an NDR test
> > > > (although the overhead from the example plugin turned out to be lost
> > > > in the noise), meaning that these packets don't match in the plugins
> > > > under test so the packets aren't dropped, but follow the next-feature
> > > > path.
> > > >
> > > > The CPU on which the test was run is 11th Gen Intel(R) Core(TM)
> > > > i5-11600K @ 3.90GHz (Rocket Lake) with 12 cores (although only 4 cores
> > > > were being used by the VMs part of the test topology), meaning Icelake
> > > > multiarch functions are in use.
> > > >
> > > > Three runs were performed for each of the C and Rust plugins being
> > > > enabled (with the C and Rust runs interleaved to avoid bias, such as
> > > > the CPU becoming thermal-limited in performance), with the median
> > > > clocks value being picked for each (to avoid bias from outliers):
> > > >
> > > > $ grep example */show_runtime.txt
> > > > c-1/show_runtime.txt:example-c active 5567253 168564707 0 1.31e1 30.28
> > > > c-2/show_runtime.txt:example-c active 5724169 169749914 0 1.22e1 29.65
> > > > c-3/show_runtime.txt:example-c active 5831253 165498112 0 1.15e1 28.38
> > > > rust-1/show_runtime.txt:example active 5423194 162165846 0 1.19e1 29.90
> > > > rust-2/show_runtime.txt:example active 5768590 172390325 0 1.15e1 29.88
> > > > rust-3/show_runtime.txt:example active 4822482 136679532 0 1.22e1 28.34
> > > >
> > > > The full "vppctl show runtime" output for the median runs is attached
> > > > for reference.
> > > >
> > > > Thanks,
> > > > Rob
> > > >
> > > > On Fri, 14 Nov 2025 at 09:34, Robert Shearman wrote:
> > > >>
> > > >> Hi Damjan,
> > > >>
> > > >> I haven't done that yet, but I'll give it a go!
> > > >>
> > > >> Thanks,
> > > >> Rob
> > > >>
> > > >> On Thu, 13 Nov 2025 at 12:49, Damjan Marion via lists.fd.io
> > > >> wrote:
> > > >>>
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> have you tried to implement something already existing in C and compare
> > > >>> performance?
> > > >>>
> > > >>> I would really like to see apple-to-apple comparison of same functionality in
> > > >>> C and rust when it comes to high-performance datapath code.
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> --
> > > >>> Damjan
> > > >>>
> > > >>>
> > > >>>> On 12.11.2025., at 13:21, Robert Shearman via lists.fd.io wrote:
> > > >>>>
> > > >>>> Hi folks,
> > > >>>>
> > > >>>> I believe there could be benefits in having the option of writing VPP
> > > >>>> plugins in Rust, so to that end I've created a set of Rust
> > > >>>> crates/packages to make it easier to write plugins, make use of the
> > > >>>> underlying VPP C APIs, and an example feature plugin all of which can
> > > >>>> be found here:
> > > >>>>
> > > >>>> https://github.com/rshearman/vpp-plugin-rs/
> > > >>>>
> > > >>>> The goal is to have performance parity with VPP plugins written in C
> > > >>>> (compiling with support for different instruction sets similar to C
> > > >>>> code is already supported, for example), but whilst still feeling like
> > > >>>> Rust code.
> > > >>>>
> > > >>>> I'd be interested in feedback from the VPP development community.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> --
> > > >>>> Rob Shearman

--
Rob Shearman
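
To illustrate point 1 of the reply at the top of this message, here is a minimal sketch of what a thin wrapper over a bindgen-style header could look like. Everything in it is an assumption made for illustration: `RawIp4Header` and `RawIp4Anon1` are hypothetical stand-ins for bindgen output (they are not copied from VPP's `ip4_header_t`), and `Ip4Header`, `from_ptr`, `protocol` and `is_icmp` are hypothetical names rather than the vpp-plugin crate's API. The idea is only that a `#[repr(transparent)]` newtype can hide the `__bindgen_anon_1` field behind accessors without changing the layout.

```rust
// Hypothetical stand-in for bindgen output of a C header that contains an
// anonymous struct member. Not the real VPP ip4_header_t layout.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct RawIp4Anon1 {
    pub ttl: u8,
    pub protocol: u8,
    pub checksum: u16,
}

#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct RawIp4Header {
    pub ip_version_and_header_length: u8,
    pub tos: u8,
    pub length: u16,
    pub fragment_id: u16,
    pub flags_and_fragment_offset: u16,
    pub __bindgen_anon_1: RawIp4Anon1,
    pub src_address: [u8; 4],
    pub dst_address: [u8; 4],
}

/// IANA protocol number for ICMP.
pub const IP_PROTOCOL_ICMP: u8 = 1;

/// Hypothetical wrapper: same layout as the raw header, nicer accessors.
#[repr(transparent)]
pub struct Ip4Header(RawIp4Header);

impl Ip4Header {
    /// Reinterpret a raw header pointer as a wrapper reference.
    ///
    /// # Safety
    /// `ptr` must be non-null, properly aligned, and point to a valid
    /// header that stays alive and unmodified for the lifetime `'a`.
    pub unsafe fn from_ptr<'a>(ptr: *const RawIp4Header) -> &'a Ip4Header {
        // The cast is layout-compatible because of #[repr(transparent)].
        unsafe { &*(ptr as *const Ip4Header) }
    }

    /// Protocol field, without exposing the bindgen-generated field name.
    #[inline]
    pub fn protocol(&self) -> u8 {
        self.0.__bindgen_anon_1.protocol
    }

    #[inline]
    pub fn is_icmp(&self) -> bool {
        self.protocol() == IP_PROTOCOL_ICMP
    }
}

/// Usage example: build a header locally and query it through the wrapper.
pub fn example_is_icmp() -> bool {
    let mut hdr = RawIp4Header::default();
    hdr.__bindgen_anon_1.protocol = IP_PROTOCOL_ICMP;
    // SAFETY: `hdr` is a live, aligned local that outlives `ip`.
    let ip = unsafe { Ip4Header::from_ptr(&hdr) };
    ip.is_icmp()
}
```

With something along these lines, the check quoted in Maxim's message would read `ip.is_icmp()` instead of reaching into `__bindgen_anon_1`, and the accessors should inline to the same field load.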
View/Reply Online (#26690): https://lists.fd.io/g/vpp-dev/message/26690
