On Thu 29-08-19 19:14:46, Tetsuo Handa wrote:
> On 2019/08/29 16:11, Michal Hocko wrote:
> > On Wed 28-08-19 12:46:20, Edward Chron wrote:
> >> Our belief is if you really think eBPF is the preferred mechanism
> >> then move OOM reporting to an eBPF.
> >
> > I've said that all this additional information has to be dynamically
> > extensible rather than a part of the core kernel. Whether eBPF is the
> > suitable tool, I do not know. I haven't explored that. There are other
> > ways to inject code to the kernel. systemtap/kprobes, kernel modules and
> > probably others.
>
> As for SystemTap, guru mode (an expert mode which disables protection provided
> by SystemTap; allowing kernel to crash when something went wrong) could be used
> for holding spinlock. However, as far as I know, holding mutex (or doing any
> operation that might sleep) from such dynamic hooks is not allowed. Also we will
> need to export various symbols in order to allow access from such dynamic hooks.

This is the oom path and it should better not use any sleeping locks in
the first place.

> I'm not familiar with eBPF, but I guess that eBPF is similar.
>
> But please be aware that, I REPEAT AGAIN, I don't think neither eBPF nor
> SystemTap will be suitable for dumping OOM information. OOM situation means
> that even single page fault event cannot complete, and temporary memory
> allocation for reading from kernel or writing to files cannot complete.

And I repeat that no such reporting is going to write to files. This is
an OOM path after all.

> Therefore, we will need to hold all information in kernel memory (without
> allocating any memory when OOM event happened). Dynamic hooks could hold
> a few lines of output, but not all lines we want. The only possible buffer
> which is preallocated and large enough would be printk()'s buffer. Thus,
> I believe that we will have to use printk() in order to dump OOM information.
> At that point,

Yes, this is what I've had in mind.

>
> static bool (*oom_handler)(struct oom_control *oc) = default_oom_killer;
>
> bool out_of_memory(struct oom_control *oc)
> {
>         return oom_handler(oc);
> }
>
> and let in-tree kernel modules override current OOM killer would be
> the only practical choice (if we refuse adding many knobs).

Or simply provide a hook with the oom_control to be called to report
without replacing the whole oom killer behavior. That is not necessary.

--
Michal Hocko
SUSE Labs
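
For illustration only, the report-only hook suggested above might look
roughly like the userspace sketch below. Everything in it is an assumption
made for the example and not existing kernel API: the trimmed-down struct
oom_control, the oom_report_fn type, register_oom_report_hook(),
unregister_oom_report_hook(), and the example_report() callback. printf()
stands in for printk(), and a real hook would also have to avoid sleeping
and allocating. The point is only that out_of_memory() keeps calling the
stock killer and the hook gets read-only access for reporting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct oom_control; only the
 * two fields this sketch prints are included. */
struct oom_control {
	unsigned long totalpages;
	int order;
};

/* Hypothetical report hook type: read-only view of the OOM context. */
typedef void (*oom_report_fn)(const struct oom_control *oc);

/* NULL means no extra reporting is installed. */
static oom_report_fn oom_report_hook;

/* Hypothetical helper: install a hook; refuses NULL or a second hook. */
static bool register_oom_report_hook(oom_report_fn fn)
{
	if (fn == NULL || oom_report_hook != NULL)
		return false;
	oom_report_hook = fn;
	return true;
}

/* Hypothetical helper: remove whatever hook is installed. */
static void unregister_oom_report_hook(void)
{
	oom_report_hook = NULL;
}

/* Stand-in for the existing killer logic, which stays untouched. */
static bool default_oom_killer(struct oom_control *oc)
{
	(void)oc;
	return true;
}

/* The killer is never replaced; the hook only adds reporting. */
bool out_of_memory(struct oom_control *oc)
{
	if (oom_report_hook)
		oom_report_hook(oc);
	return default_oom_killer(oc);
}

/* Example hook: counts its calls and prints one line per OOM event. */
static int example_report_calls;

static void example_report(const struct oom_control *oc)
{
	example_report_calls++;
	printf("oom: order=%d totalpages=%lu\n", oc->order, oc->totalpages);
}
```

A module would call register_oom_report_hook(example_report) on load and
unregister_oom_report_hook() on unload. Compared with the oom_handler
pointer quoted above, a buggy hook can at worst produce bad output, it
cannot change which task gets killed.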