On 2024-07-27 17:45, Wathsala Wathawana Vithanage wrote:
Hi Mattias,

The primary goal of this patch is to provide a direct interface to HW,
instead of letting kernel handle it. This is not an API just for Arm
CPUs, as other vendors also have similar HW features. For instance,
Intel and AMD have support for x86 RDRAND and RDSEED instructions, thus
can easily implement this API.


No DPDK library (or PMD) currently needs this functionality, and no
application, to my knowledge, has asked for this. If an app or a DPDK library
would require cryptographically secure random numbers, it would most likely
require it on all CPU/OS platforms (and with all DPDK -march flags).


I'm sorry, I'm not following this. Are you saying

(a) adding a feature someone hasn't already asked for is impossible?


No, not if you can explain why this feature will be useful. You guys made no such attempt.

(b) it is impossible to add support for a feature that is only available in a
few CPUs of an architecture family?


Cryptographically secure random numbers are available on all CPUs, via the operating system. Arguably, such random numbers are more secure than anything that a machine instruction can offer.

If your patch is to have a non-zero chance of being accepted, it should include a base implementation based on getrandom() (and the Windows equivalent), with the proper optimizations (e.g., batching entropy requests to the kernel on a per-lcore basis).
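
To make the batching idea concrete, here is a minimal sketch for Linux only. It is not part of any patch: the names csrng_bytes(), struct entropy_pool and POOL_SIZE are hypothetical, and a C11 thread-local pool stands in for real per-lcore state. It assumes glibc 2.25 or later for getrandom().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/random.h> /* getrandom() */
#include <sys/types.h>  /* ssize_t */

#define POOL_SIZE 256 /* hypothetical batch size, in bytes */

/* Hypothetical per-thread pool of kernel-provided random bytes. */
struct entropy_pool {
	uint8_t buf[POOL_SIZE];
	size_t avail; /* unread bytes remaining at the tail of buf */
};

/* One pool per thread, standing in for per-lcore state. */
static _Thread_local struct entropy_pool pool;

/* Refill the whole pool with a batch of bytes from the kernel. */
static int pool_refill(struct entropy_pool *p)
{
	size_t filled = 0;

	while (filled < POOL_SIZE) {
		ssize_t n = getrandom(p->buf + filled, POOL_SIZE - filled, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, retry */
			return -errno;
		}
		filled += (size_t)n;
	}
	p->avail = POOL_SIZE;
	return 0;
}

/* Copy len random bytes to dst; returns 0 or a negative errno value. */
int csrng_bytes(void *dst, size_t len)
{
	uint8_t *out = dst;

	while (len > 0) {
		if (pool.avail == 0) {
			int rc = pool_refill(&pool);

			if (rc < 0)
				return rc;
		}
		assert(pool.avail > 0);

		size_t chunk = len < pool.avail ? len : pool.avail;

		/* Hand out bytes that have not been returned before. */
		memcpy(out, pool.buf + (POOL_SIZE - pool.avail), chunk);
		pool.avail -= chunk;
		out += chunk;
		len -= chunk;
	}
	return 0;
}
```

With this shape, a caller asking for 8 bytes at a time pays for one system call per 32 requests. In DPDK itself the pool would presumably be declared with RTE_DEFINE_PER_LCORE rather than _Thread_local, and a production version would also have to consider wiping consumed bytes and fork safety.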

You would also need to provide a rationale why ARM CPU HW random numbers are better than what the kernel can offer. The only potential reason I can think of is performance, so you would need to quantify that in some way.

In addition, reliance on CPU CSRNG would need to be a build-time option, and disabled by default.

Plus, as I've mentioned several times, give a rationale for why DPDK should have this functionality.

RDRAND is only available on certain x86_64 CPUs, and is incredibly slow
- slower than getting entropy via the kernel, even with non-vDSO syscalls.

Agner Fog lists the RDRAND latency as ~3700 cc for Zen 2. Later generations of
both AMD and Intel CPUs have much shorter latencies, but a reciprocal
throughput so low that one has to wait thousands of clock cycles before
issuing another RDRAND, or risk stalling the core.

My Raptor Lake seems to require ~1000 cc to retire RDRAND, which is ~11x
slower than getting entropy (in bulk) via getentropy().
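
For anyone who wants to reproduce this kind of comparison, a rough sketch follows. It is not the code used for the numbers above: bench_getentropy() and bench_rdrand() are hypothetical names, results are wall-clock nanoseconds per 64-bit word rather than clock cycles, and the RDRAND half is only compiled on x86_64 when the build uses -mrdrnd. It assumes Linux with glibc 2.25 or later for getentropy().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h> /* getentropy() */

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average ns per 64-bit word when entropy is fetched from the kernel in
 * 256-byte batches (the most getentropy() returns per call).
 * Returns a negative value if getentropy() fails.
 */
double bench_getentropy(unsigned int rounds)
{
	uint64_t buf[32]; /* 32 * 8 = 256 bytes */
	volatile uint64_t sink = 0; /* keeps the results observable */

	assert(rounds > 0);

	uint64_t start = now_ns();

	for (unsigned int i = 0; i < rounds; i++) {
		if (getentropy(buf, sizeof(buf)) != 0)
			return -1.0;
		sink ^= buf[0];
	}

	uint64_t elapsed = now_ns() - start;

	return (double)elapsed / ((double)rounds * 32.0);
}

#if defined(__x86_64__) && defined(__RDRND__)
#include <immintrin.h>

/* Average ns per 64-bit word obtained with back-to-back RDRAND. */
double bench_rdrand(unsigned int words)
{
	unsigned long long v;
	volatile uint64_t sink = 0;

	assert(words > 0);

	uint64_t start = now_ns();

	for (unsigned int i = 0; i < words; i++) {
		/* RDRAND reports failure via the carry flag; retry until ok. */
		while (!_rdrand64_step(&v))
			;
		sink ^= v;
	}

	uint64_t elapsed = now_ns() - start;

	return (double)elapsed / (double)words;
}
#endif
```

Comparing bench_getentropy(100000) against bench_rdrand(100000 * 32) on the same core gives a per-word ratio. Converting to clock cycles would need the core frequency or a cycle counter, which this sketch leaves out.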

What is the latency for the ARM equivalent? Does it also have a reciprocal
throughput issue?


Agree, from the numbers you are showing, it looks like they are all slow and
unsuitable for the data plane. But aren't there multi-plane DPDK applications
with control-plane threads that can benefit from a CSRNG, albeit slow?

