> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Monday, 4 July 2022 18.33
> 
> On Sun, 3 Jul 2022 20:38:21 +0100
> Konstantin Ananyev <konstantin.v.anan...@yandex.ru> wrote:
> 
> > >
> > > The base/existing design for DPDK was done with one particular HW
> architecture in mind where there was an abundance of resources.
> Unfortunately, that HW architecture is fast evolving and DPDK is
> adopted in use cases where that kind of resources are not available.
> For ex: efficiency cores are being introduced by every CPU vendor now.
> Soon enough, we will see big-little architecture in networking as well.
> The existing PMD design introduces 512B of stores (256B for copying to
> stack variable and 256B to store lcore cache) and 256B load/store on RX
> side every 32 packets back to back. It doesn't make sense to have that
> kind of memcopy for little/efficiency cores just for the driver code.
> >
> > I don't object about specific use-case optimizations.
> > Specially if the use-case is a common one.
Or exotic, but high-volume, use cases! Those usually get a lot of attention 
from sales and product management people. :-)

DPDK needs to support those in the mainline, or we will end up with forks like 
Qualcomm's QSDK fork of the Linux kernel. (The QSDK fork from Qualcomm, a 
leading Wi-Fi chip set vendor, bypasses a lot of the Linux kernel's IP stack to 
provide much higher throughput for one specific use case, which happens to be 
quite high volume: a Wi-Fi Access Point.)

> > But I think such changes have to be transparent to the user as
> > much as possible and shouldn't cause further DPDK code fragmentation
> > (new CONFIG options, etc.).
> > I understand that it is not always possible, but for pure SW based
> > optimizations, I think it is a reasonable expectation.
> 
> Great discussion.
> 
> Also, if you look back at the mailing list history, you can see that
> lots of users just use DPDK because it is "go fast" secret sauce and
> have no understanding of the internals.

Certainly, DPDK should still do that!

I just want DPDK to be able to go faster for experts.

Car analogy: If you buy a fast car, it will go fast. If you bring it to a 
tuning specialist, it will go faster. Similarly, DPDK should go "fast", but 
also accept that specialists can make it go "faster".

> 
> My concern is that if one untestable optimization goes in for one
> hardware platform, then users will enable it all the time, thinking it
> makes any and all use cases faster.
> Try explaining to a Linux user that the real-time kernel is *not*
> faster than the normal kernel...

Yes, because of the common misconception that faster equals higher bandwidth. 
But the real-time kernel does provide lower latency (under certain conditions), 
which means faster to some of us. I'm sorry... working with latency as one of 
our KPIs, I just couldn't resist it! ;-)

Seriously, DPDK cannot be limited to cater to everyone on Stack Overflow!

Jokes aside...

When we started using DPDK at SmartShare Systems, DPDK was a highly optimized 
development kit for embedded network appliances, perfect for our SmartShare 
StraightShaper WAN optimization appliances and future roadmap. Over time, DPDK 
has morphed into a packet processing library for Ubuntu and Red Hat, with a lot 
of added features we don't use, and no ability to remove them. Those added 
features potentially degrade fast path performance, and increase the risk of 
bugs at the system level.

Some software optimizations have been proposed for DPDK to support specific 
high-volume use cases. "mbuf fast free" got accepted, "direct re-arm" is 
getting a lot of push-back, and the most recent suggestion, "IOVA VA only 
mode", is currently being discussed.

In theory, it would be nice if all software optimizations could be supported at 
run-time, but that adds at least one branch to the fast path for every 
optimization, eventually slowing down the fast path significantly. And some of 
the optimizations simply make much better sense at compile time than at 
runtime, e.g. the "IOVA VA only mode".
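The cost difference can be sketched in C. This is only an illustration; the 
function names and the `RTE_ENABLE_FAST_FREE` macro are hypothetical, not 
actual DPDK symbols:

```c
#include <stdbool.h>

/* Hypothetical compile-time option, in the style of an rte_config.h
 * flag. Not a real DPDK macro. */
#define RTE_ENABLE_FAST_FREE 1

/* Run-time variant: every pass through the fast path pays for loading
 * the flag and taking the branch, even when the feature is disabled. */
static int process_burst_runtime(int nb_pkts, bool fast_free)
{
    if (fast_free)          /* branch evaluated on every burst, forever */
        return nb_pkts;     /* optimized path */
    return nb_pkts - 1;     /* generic path (placeholder work) */
}

/* Compile-time variant: the disabled path is removed by the
 * preprocessor, so the fast path carries no extra branch at all. */
static int process_burst_compiletime(int nb_pkts)
{
#if RTE_ENABLE_FAST_FREE
    return nb_pkts;         /* optimized path, unconditionally */
#else
    return nb_pkts - 1;
#endif
}
```

One branch is cheap in isolation; the point is that each new run-time option 
adds another one, and they accumulate in the hottest loop of the application.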

So, I think we should start thinking about such optimizations differently: If 
someone needs to optimize something for a specific use case, it can be done at 
compile time; there is no need to do it at runtime. Which is what I meant by 
the subject of my email: Don't offer optimizations as runtime features; they 
are use case specific, and should be chosen at compile time only.

Referring to the Linux kernel as the gold standard: it even has "make 
menuconfig", a menu-driven configuration interface for compile time 
configuration. Why must DPDK have every exotic option available at runtime, 
when the Linux kernel considers it perfectly acceptable to have some things 
configurable at compile time only?
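For DPDK's meson-based build, such a knob could already be passed today through 
the standard meson `c_args` built-in option; the `RTE_EXAMPLE_OPT` define below 
is a hypothetical placeholder, not an existing DPDK option:

```shell
# c_args is a standard meson built-in; RTE_EXAMPLE_OPT is a hypothetical
# compile-time optimization flag used only for illustration.
meson setup build -Dc_args='-DRTE_EXAMPLE_OPT=1'
ninja -C build
```

A proper solution would of course expose such options as first-class meson 
options rather than raw compiler defines, but the mechanism is the same.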

With this discussion, I am only asking for software optimizations (which 
usually also imply some other limitations) to be compile time options, rather 
than runtime options. Any application can achieve exactly the same without 
those optimizations enabled; it will just be faster with them enabled.

I would love to go back to the good old days, where DPDK had a lot of compile 
time options to disable cruft we're not using, but I know that game was lost a 
long time ago! So I'm trying to find some middle ground that keeps all features 
in the "DPDK library for distros", but also allows hard core developers to tune 
the performance for their individual use cases.

Offering software optimizations as compile time options only should also 
reduce the amount of push-back against such optimizations.

Reading all the feedback from the thread, it seems that the major concern is 
testing. And for some mysterious reason, compile-time testing 2^N feature 
combinations causes more concern than run-time testing the same 2^N 
combinations. I get the sense that run-time testing of the various feature 
combinations is not happening today. :-(
