On Thu, 26 Oct 2023 09:33:42 +0200 Morten Brørup <m...@smartsharesystems.com> wrote:
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Wednesday, 25 October 2023 23.33
> >
> > On Wed, 25 Oct 2023 19:54:06 +0200
> > Morten Brørup <m...@smartsharesystems.com> wrote:
> >
> > > I agree with Thomas on this.
> > >
> > > If you want the log message, please degrade it to INFO or DEBUG level.
> > > It is only relevant when chasing problems, not for normal production -
> > > and thus NOTICE is too high.
> >
> > I don't want the message to be hidden.
> > If we get any bug reports want to be able to say "read the log, don't do
> > that".
>
> Since Stephen is arguing so strongly for it, I have changed my mind, and now
> support Stephen's suggestion.
>
> It's a tradeoff: Noise for carefully designed systems, vs. important bug
> hunting information for systems under development (or casually developed
> systems).
> As Stephen points out, it is a good starting point to check for bug reports
> possibly related to this. And, I suppose the experienced users who really
> understands it will not be seriously confused by such a NOTICE message in the
> log.
>
> > > Someone might build a kernel with options to keep non-dataplane threads
> > > off some dedicated CPU cores, so they can be used for guaranteed
> > > low-latency dataplane threads. We do. We don't use real-time priority,
> > > though.
> >
> > This is really, hard to do.
>
> As my kids would say: This is really, really, really, really, really hard to
> do!
>
> We have not been able to find an authoritative source of documentation
> describing how to do it. :-(
>
> And our experiment shows that we didn't 100 % succeed doing it. But we got
> close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz
> CPU corresponds to max 3 microseconds of added worst-case latency.
>
> It would be great for latency-sensitive applications if the DPDK
> documentation went more into detail on this topic.
> However, if the DPDK runs on top of a Linux distro, it essentially depends
> on the distro, and should be documented there. And if running on top of a
> custom built Linux Kernel, it essentially depends on the kernel, and should
> be documented there. In other words: Such information should be contributed
> there, and not in the DPDK documentation. ;-)
>
> > Isolated CPU's are not isolated from interrupts
> > and other sources which end up scheduling work as kernel threads. Plus there
> > is the behavior where kernel decides to turn a soft irq into a kernel
> > thread, then starve itself.
>
> We have configured the kernel to put all of this on CPU 0. (Details further
> below.)
>
> > Under starvation, disk corruption is likely if interrupts never get
> > processed :-(
> >
> > > For reference, we did some experiments (using this custom built kernel)
> > > with a dedicated thread doing nothing but a loop calling
> > > rte_rdtsc_precise() and registering the delta. Although the overwhelming
> > > majority is ca. 80 CPU cycles, there are some big outliers at ca. 9,000
> > > CPU cycles. (Order of magnitude: ca. 45 of these big outliers per
> > > minute.) Apparently some kernel threads steal some cycles from this
> > > thread, regardless of our customizations.
> > > We haven't bothered analyzing and optimizing it further.
> >
> > Was this on isolated CPU?
>
> Yes. We isolate all CPUs but CPU 0.
>
> > Did you check that that CPU was excluded from the smp_affinity mask on all
> > devices?
>
> Not sure how to do that?
>
> NB: We are currently only using single-socket hardware - this makes some
> things easier. Perhaps this is one of those things?
>
> > Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?
>
> Yes:
> # Timers subsystem
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y
>
> CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"
>
> > Same thing for RCU, need to adjust parameters?
>
> Yes:
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> CONFIG_SRCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_CONTEXT_TRACKING=y
> CONFIG_RCU_NOCB_CPU=y
> CONFIG_RCU_NOCB_CPU_ALL=y
>
> > Also, on many systems there can be SMI BIOS hidden execution that will cause
> > big outliers.
>
> Yes, this is a big surprise to many people, when it happens. Our hardware
> doesn't suffer from that.
>
> > Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots
> > of places.
>
> Yes, this is very important! We treat CPU 0 as if any random process or
> interrupt handler can take it away at any time.
>
> > > I think our experiment supports the need to allow kernel threads to run,
> > > e.g. by calling sleep() or similar, when an EAL thread has real-time
> > > priority.

One benefit of running a real-time thread is that the kernel will be more
precise in any calls to sleep. If you do a small sleep in a normal thread, the
kernel will round up the timer to try to avoid reprogramming the timer chip
and to save power (fewer wakeups from idle). With an RT thread it will do
"you wanted 21us, OK, you will get 21us".

The project that was originally Vyatta has a script that tries to isolate
interrupts etc. I started it, but they have worked on it since then.

https://github.com/danos/vyatta-cpu-shield

It adjusts kernel workers, softirq, cgroups, etc.
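
On the earlier question of how to check the smp_affinity mask: each IRQ
exposes its allowed-CPU mask as hex text in /proc/irq/<N>/smp_affinity, so a
short script can list the IRQs that may still be delivered to an isolated CPU.
The following is only a rough sketch, not something taken from the
vyatta-cpu-shield script. The function names are made up for illustration, and
it assumes the usual format of comma-separated 32-bit hex groups with the most
significant group first.

```python
#!/usr/bin/env python3
"""List IRQs whose smp_affinity mask still includes a given CPU (sketch)."""
import glob
import os
import sys


def parse_affinity_mask(text):
    """Convert a mask such as 'ffffffff,00000001' to an integer bitmap.

    Assumption: groups are 32-bit hex chunks, most significant first,
    so removing the commas yields one big hex number.
    """
    return int(text.strip().replace(",", ""), 16)


def cpu_in_mask(text, cpu):
    """Return True if bit number `cpu` is set in the mask text."""
    return bool((parse_affinity_mask(text) >> cpu) & 1)


def irqs_allowed_on(cpu, proc_irq="/proc/irq"):
    """Return the IRQ numbers (as strings) whose mask includes `cpu`."""
    hits = []
    pattern = os.path.join(proc_irq, "[0-9]*", "smp_affinity")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                mask = f.read()
        except OSError:
            continue  # IRQ vanished or is unreadable; skip it
        if cpu_in_mask(mask, cpu):
            hits.append(os.path.basename(os.path.dirname(path)))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: pass the isolated CPU number, default CPU 1.
    cpu = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    for irq in irqs_allowed_on(cpu):
        print(f"IRQ {irq} may still be delivered to CPU {cpu}")
```

Run it with an isolated CPU number as the argument. Any IRQ it prints still
has that CPU in its allowed mask. Note that this only shows where an interrupt
is permitted to go, not where it actually fires; /proc/interrupts has the
per-CPU counts for that.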