On Mon, Dec 29, 2025 at 10:45:56AM +0800, Yafang Shao wrote:
> We maintain a vmcore analysis script on each server that automatically
> parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> us save considerable effort by avoiding analysis of known bugs.
>
> For vmcores triggered by a driver bug, the system calls print_modules() to
> list the loaded modules. However, print_modules() does not output module
> version information. Across a large fleet of servers, there are often many
> different module versions running simultaneously, and we need to know which
> driver version caused a given vmcore.
>
> Currently, the only reliable way to obtain the module version associated
> with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself, an
> operation that is resource-intensive. Therefore, we propose printing the
> driver version directly in the log, which is far more efficient.
>
> - Before this patch
>
>   Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
>
> - After this patch
>
>   Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
>
> Signed-off-by: Yafang Shao <[email protected]>
> ---
>  kernel/module/main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 710ee30b3bea..1ad9afec8730 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -3901,7 +3901,10 @@ void print_modules(void)
>  	list_for_each_entry_rcu(mod, &modules, list) {
>  		if (mod->state == MODULE_STATE_UNFORMED)
>  			continue;
> -		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
> +		pr_cont(" %s", mod->name);
> +		if (mod->version)
> +			pr_cont("-%s", mod->version);
> +		pr_cont("%s", module_flags(mod, buf, true));
>  	}
>  
>  	print_unloaded_tainted_modules();
> -- 
> 2.43.5
>
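To make the difference between the two output formats concrete, here is a small user-space model of the print_modules() loop before and after the patch. It is a sketch, not kernel code: struct mod_info and format_modules() are hypothetical stand-ins for struct module and the pr_cont() calls, and the flags string is assumed to be pre-rendered the way module_flags() would produce it (for example "(PO)", or "" for an untainted module).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-in for the fields print_modules() reads. */
struct mod_info {
	const char *name;
	const char *version; /* NULL when the module declares no version */
	const char *flags;   /* pre-rendered taint flags, e.g. "(PO)" or "" */
};

/*
 * Build the "Modules linked in:" line into buf.
 * with_version == false models the existing output: name(flags).
 * with_version == true models the patch: name-version(flags), with the
 * "-version" part omitted when the module has no version.
 * Returns false if buf is too small to hold the whole line.
 */
static bool format_modules(const struct mod_info *mods, size_t n,
			   bool with_version, char *buf, size_t len)
{
	int ret = snprintf(buf, len, "Modules linked in:");
	size_t used;

	if (ret < 0 || (size_t)ret >= len)
		return false;
	used = (size_t)ret;

	for (size_t i = 0; i < n; i++) {
		const char *ver = (with_version && mods[i].version) ?
				  mods[i].version : NULL;

		/* Same order as the patch: name, optional "-version", flags. */
		ret = snprintf(buf + used, len - used, " %s%s%s%s",
			       mods[i].name, ver ? "-" : "", ver ? ver : "",
			       mods[i].flags);
		if (ret < 0 || (size_t)ret >= len - used)
			return false;
		used += (size_t)ret;
	}
	return true;
}
```

Formatting into a caller-supplied buffer instead of calling pr_cont() is only a convenience for testing; with the sample modules from the commit message, with_version == false yields the "Before" line and with_version == true yields the "After" line.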
Hi Yafang,

While I certainly appreciate the operational burden of managing a
large-scale fleet and the desire to automate crash triage, I am somewhat
hesitant to support this change in its current form.

A more appropriate approach might be to extend the existing module
information infrastructure so that the version is printed only when it is
explicitly requested, for example by introducing a separate
print_module_versions() helper rather than changing the default
"Modules linked in:" output.

In my view, while the requirement for better version visibility is valid,
we must ensure that the change does not compromise the readability of the
crash report for the rest of the community.

Nacked-by: Aaron Tomlin <[email protected]>

Kind regards,
-- 
Aaron Tomlin
