On Tue, Apr 16, 2013 at 09:41:30AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
>
> The Linux kernel uses a number of per-CPU kthreads, any of which might
> contribute to OS jitter at any time. The usual approach to normal
> kthreads, namely to bind them to a "housekeeping" CPU, does not work
> with these kthreads because they cannot operate correctly if moved to
> some other CPU. This commit therefore lists ways of controlling OS
> jitter from the Linux kernel's per-CPU kthreads.
>
> Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> Cc: Frederic Weisbecker <fweis...@gmail.com>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Borislav Petkov <b...@alien8.de>
> Cc: Arjan van de Ven <ar...@linux.intel.com>
> Cc: Kevin Hilman <khil...@linaro.org>
> Cc: Christoph Lameter <c...@linux.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Olivier Baetz <olivier.ba...@novasparks.com>
> Reviewed-by: Randy Dunlap <rdun...@infradead.org>
> ---
>  Documentation/kernel-per-CPU-kthreads.txt | 186 ++++++++++++++++++++++++++++++
>  1 file changed, 186 insertions(+)
>  create mode 100644 Documentation/kernel-per-CPU-kthreads.txt
>
> diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
> new file mode 100644
> index 0000000..bfecc1c
> --- /dev/null
> +++ b/Documentation/kernel-per-CPU-kthreads.txt
> @@ -0,0 +1,186 @@
> +REDUCING OS JITTER DUE TO PER-CPU KTHREADS
> +
> +This document lists per-CPU kthreads in the Linux kernel and presents
> +options to control OS jitter due to these kthreads. Note that kthreads

s/due to/which can be caused by/

> +that are not per-CPU are not listed here -- to reduce OS jitter from one

One too many "that"s: s/that/which/

> +non-per-CPU kthreads, bind them to a "housekeeping" CPU that is dedicated

s/that/which/

> +to such work.
> +
> +
> +REFERENCES
> +
> +o Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
> +
> +o Documentation/cgroups: Using cgroups to bind tasks to sets of CPUs.
> +
> +o man taskset: Using the taskset command to bind tasks to sets
> +  of CPUs.
> +
> +o man sched_setaffinity: Using the sched_setaffinity() system
> +  call to bind tasks to sets of CPUs.
> +
> +
> +KTHREADS
> +
> +Name: ehca_comp/%u
> +Purpose: Periodically process Infiniband-related work.
> +To reduce corresponding OS jitter, do any of the following:
> +1. Don't use EHCA Infiniband hardware. This will prevent these

Sounds like this particular hardware is slow and its IRQ handler/softirq
needs a lot of time. Yes, no?

Can we have a reason why people shouldn't use that hw?

> +   kthreads from being created in the first place. (This will
> +   work for most people, as this hardware, though important,
> +   is relatively old and is produced in relatively low unit
> +   volumes.)
> +2. Do all EHCA-Infiniband-related work on other CPUs, including
> +   interrupts.
> +
> +
> +Name: irq/%d-%s
> +Purpose: Handle threaded interrupts.
> +To reduce corresponding OS jitter, do the following:

This sentence keeps repeating; maybe explain the purpose of this doc once
at the beginning and drop this sentence from the later sections.

> +1. Use irq affinity to force the irq threads to execute on
> +   some other CPU.
> +
> +Name: kcmtpd_ctr_%d
> +Purpose: Handle Bluetooth work.
> +To reduce corresponding OS jitter, do one of the following:
> +1. Don't use Bluetooth, in which case these kthreads won't be
> +   created in the first place.
> +2. Use irq affinity to force Bluetooth-related interrupts to
> +   occur on some other CPU and furthermore initiate all
> +   Bluetooth activity on some other CPU.
> +
> +Name: ksoftirqd/%u
> +Purpose: Execute softirq handlers when threaded or when under heavy load.
> +To reduce corresponding OS jitter, each softirq vector must be handled
> +separately as follows:
> +TIMER_SOFTIRQ: Do all of the following:
> +1. To the extent possible, keep the CPU out of the kernel when it
> +   is non-idle, for example, by avoiding system calls and by forcing
> +   both kernel threads and interrupts to execute elsewhere.
> +2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
> +   the CPU offline, then bring it back online. This forces
> +   recurring timers to migrate elsewhere. If you are concerned

We don't migrate them back to that CPU when we online it again, do we?

> +   with multiple CPUs, force them all offline before bringing the
> +   first one back online.
> +NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following:
> +1. Force networking interrupts onto other CPUs.
> +2. Initiate any network I/O on other CPUs.
> +3. Once your application has started, prevent CPU-hotplug operations
> +   from being initiated from tasks that might run on the CPU to
> +   be de-jittered. (It is OK to force this CPU offline and then
> +   bring it back online before you start your application.)
> +BLOCK_SOFTIRQ: Do all of the following:
> +1. Force block-device interrupts onto some other CPU.
> +2. Initiate any block I/O on other CPUs.
> +3. Once your application has started, prevent CPU-hotplug operations
> +   from being initiated from tasks that might run on the CPU to
> +   be de-jittered. (It is OK to force this CPU offline and then
> +   bring it back online before you start your application.)
> +BLOCK_IOPOLL_SOFTIRQ: Do all of the following:
> +1. Force block-device interrupts onto some other CPU.
> +2. Initiate any block I/O and block-I/O polling on other CPUs.
> +3. Once your application has started, prevent CPU-hotplug operations
> +   from being initiated from tasks that might run on the CPU to
> +   be de-jittered. (It is OK to force this CPU offline and then
> +   bring it back online before you start your application.)

More repeated text in brackets; maybe a footnote somewhere instead...

> +TASKLET_SOFTIRQ: Do one or more of the following:
> +1. Avoid use of drivers that use tasklets.
> +2. Convert all drivers that you must use from tasklets to workqueues.
> +3. Force interrupts for drivers using tasklets onto other CPUs,
> +   and also do I/O involving these drivers on other CPUs.

How do I check which drivers use tasklets?

> +SCHED_SOFTIRQ: Do all of the following:
> +1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
> +   for example, ensure that at most one runnable kthread is

Which sentence does "for example" belong to? Depending on the answer, you
can split that sentence.

> +   present on that CPU. If a thread awakens that expects
> +   to run on the de-jittered CPU, the scheduler will send

"If a thread expecting to run on the de-jittered CPU awakens, the
scheduler..."

> +   an IPI that can result in a subsequent SCHED_SOFTIRQ.
> +2. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
> +   CONFIG_NO_HZ_FULL=y, and in addition ensure that the CPU

Commas: ", and, in addition, ensure..."

> +   to be de-jittered is marked as an adaptive-ticks CPU using the
> +   "nohz_full=" boot parameter. This reduces the number of
> +   scheduler-clock interrupts that the de-jittered CPU receives,
> +   minimizing its chances of being selected to do load balancing,

I don't think there's a "," if the "which..." part refers to the previous
"load balancing" and not to the whole sentence.

> +   which happens in SCHED_SOFTIRQ context.
> +3. To the extent possible, keep the CPU out of the kernel when it
> +   is non-idle, for example, by avoiding system calls and by
> +   forcing both kernel threads and interrupts to execute elsewhere.

This time "for example" reads ok.

> +   This further reduces the number of scheduler-clock interrupts
> +   that the de-jittered CPU receives.

s/that/which/ would suit better here IMHO.

> +HRTIMER_SOFTIRQ: Do all of the following:
> +1. To the extent possible, keep the CPU out of the kernel when it
> +   is non-idle, for example, by avoiding system calls and by forcing
> +   both kernel threads and interrupts to execute elsewhere.

Ok, I think I get your "for example" usage pattern.

"blabablabla. For example, do blabalbal."

I think that would be a bit more readable.

> +2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
> +   CPU offline, then bring it back online. This forces recurring
> +   timers to migrate elsewhere. If you are concerned with multiple
> +   CPUs, force them all offline before bringing the first one
> +   back online.

Same question: do the timers get migrated back when the CPU reappears
online?

> +RCU_SOFTIRQ: Do at least one of the following:
> +1. Offload callbacks and keep the CPU in either dyntick-idle or
> +   adaptive-ticks state by doing all of the following:
> +   a. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
> +      CONFIG_NO_HZ_FULL=y, and in addition ensure that the CPU

", and, in addition,"

> +      to be de-jittered is marked as an adaptive-ticks CPU
> +      using the "nohz_full=" boot parameter. Bind the rcuo
> +      kthreads to housekeeping CPUs that can tolerate OS jitter.

which

> +   b. To the extent possible, keep the CPU out of the kernel
> +      when it is non-idle, for example, by avoiding system
> +      calls and by forcing both kernel threads and interrupts
> +      to execute elsewhere.
> +2. Enable RCU to do its processing remotely via dyntick-idle by
> +   doing all of the following:
> +   a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
> +   b. Ensure that the CPU goes idle frequently, allowing other

I'm ensuring that by selecting the proper workload which has idle
breathers?

> +      CPUs to detect that it has passed through an RCU quiescent
> +      state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
> +      userspace execution also allows other CPUs to detect that
> +      the CPU in question has passed through a quiescent state.
> +   c. To the extent possible, keep the CPU out of the kernel
> +      when it is non-idle, for example, by avoiding system
> +      calls and by forcing both kernel threads and interrupts
> +      to execute elsewhere.
> +
> +Name: rcuc/%u
> +Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
> +To reduce corresponding OS jitter, do at least one of the following:
> +1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
> +   kthreads from being created in the first place, and also prevents
> +   RCU priority boosting from ever being required. This approach

"... this obviates the need for RCU priority boosting."

> +   is feasible for workloads that do not require high degrees of
> +   responsiveness.
> +2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
> +   kthreads from being created in the first place. This approach
> +   is feasible only if your workload never requires RCU priority
> +   boosting, for example, if you ensure frequent idle time on all
> +   CPUs that might execute within the kernel.
> +3. Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y,
> +   which offloads all RCU callbacks to kthreads that can be moved
> +   off of CPUs susceptible to OS jitter. This approach prevents the
> +   rcuc/%u kthreads from having any work to do, so that they are
> +   never awakened.
> +4. Ensure that the CPU never enters the kernel and in particular

", and, in particular,"

> +   avoid initiating any CPU hotplug operations on this CPU. This is
> +   another way of preventing any callbacks from being queued on the
> +   CPU, again preventing the rcuc/%u kthreads from having any work
> +   to do.
> +
> +Name: rcuob/%d, rcuop/%d, and rcuos/%d
> +Purpose: Offload RCU callbacks from the corresponding CPU.
> +To reduce corresponding OS jitter, do at least one of the following:
> +1. Use affinity, cgroups, or other mechanism to force these kthreads
> +   to execute on some other CPU.
> +2. Build with CONFIG_RCU_NOCB_CPUS=n, which will prevent these
> +   kthreads from being created in the first place. However,
> +   please note that this will not eliminate the corresponding

can we drop "corresponding" here?

> +   OS jitter, but will instead shift it to RCU_SOFTIRQ.
> +
> +Name: watchdog/%u
> +Purpose: Detect software lockups on each CPU.
> +To reduce corresponding OS jitter, do at least one of the following:

ditto.

> +1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
> +   kthreads from being created in the first place.
> +2. Echo a zero to /proc/sys/kernel/watchdog to disable the
> +   watchdog timer.
> +3. Echo a large number of /proc/sys/kernel/watchdog_thresh in
> +   order to reduce the frequency of OS jitter due to the watchdog
> +   timer down to a level that is acceptable for your workload.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
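For illustration only, a minimal Python sketch of the "use affinity ... to force these kthreads to execute on some other CPU" option for the rcuob/rcuop/rcuos kthreads. It assumes that those kthread names (as listed in the patch) appear verbatim in /proc/<pid>/comm, that the de-jittered CPUs are given in the kernel's CPU-list format (e.g. "1-3", as for nohz_full=), and that it runs as root. The helper names (parse_cpu_list(), rcu_offload_pids(), and so on) are hypothetical, not an existing tool; os.sched_setaffinity() is Python's wrapper around the sched_setaffinity() system call from the REFERENCES list.

```python
import os

# Hypothetical helpers: a sketch, not an existing tool. Assumes the
# rcuob/rcuop/rcuos kthread names listed in the patch show up in
# /proc/<pid>/comm, and that the caller runs as root (CAP_SYS_NICE).


def parse_cpu_list(text):
    """Parse a kernel CPU-list string such as "0-3,8" into a set of ints.

    This is the format of /sys/devices/system/cpu/online and of the
    nohz_full= boot parameter.
    """
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpu_list(path):
    """Read a CPU-list file such as /sys/devices/system/cpu/online."""
    with open(path) as f:
        return parse_cpu_list(f.read())


def housekeeping_cpus(online, isolated):
    """Return the online CPUs that are not being de-jittered."""
    hk = set(online) - set(isolated)
    if not hk:
        raise ValueError("no housekeeping CPU left to absorb the kthreads")
    return hk


def rcu_offload_pids(proc_root="/proc"):
    """Return the PIDs whose comm starts with rcuob/, rcuop/ or rcuos/."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # task exited while we were scanning /proc
        if name.startswith(("rcuob/", "rcuop/", "rcuos/")):
            pids.append(int(entry))
    return sorted(pids)


def bind_to_housekeeping(pids, cpus):
    """Restrict each task to the given CPUs; return PIDs that failed."""
    failed = []
    for pid in pids:
        try:
            os.sched_setaffinity(pid, cpus)
        except OSError:
            failed.append(pid)  # e.g. EPERM without root, or task gone
    return failed
```

With CPUs 1-3 being de-jittered, a root shell could call `bind_to_housekeeping(rcu_offload_pids(), housekeeping_cpus(read_cpu_list("/sys/devices/system/cpu/online"), parse_cpu_list("1-3")))`; any PIDs it returns were not moved. Returning failures instead of raising keeps one stubborn task from stopping the rest.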