We maintain a vmcore analysis script on each server that automatically
parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. Because
vmcores caused by known bugs are recognized automatically, we no longer
have to analyze them by hand, which saves considerable effort.

When a vmcore is triggered by a driver bug, the oops report includes the
output of print_modules(), which lists the loaded modules. However,
print_modules() does not print module versions. Across a large fleet of
servers, many different versions of the same module often run at the same
time, and we need to know which driver version caused a given vmcore.
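
As an illustration only (the analysis script is not part of this patch),
the first step of such a script, locating the module list in
vmcore-dmesg.txt, might look like the C sketch below. find_module_list()
is a hypothetical name, and the sketch assumes the whole list appears on
the same line as the "Modules linked in:" marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text that follows the first
 * "Modules linked in:" marker in a vmcore-dmesg.txt buffer, up to the
 * end of that line, into out. Returns 0 on success, or -1 if the marker
 * is missing or out is too small.
 */
static int find_module_list(const char *log, char *out, size_t outlen)
{
	static const char marker[] = "Modules linked in:";
	const char *start, *end;
	size_t len;

	assert(log != NULL && out != NULL && outlen > 0);

	start = strstr(log, marker);
	if (!start)
		return -1;
	start += sizeof(marker) - 1;	/* skip the marker, not its NUL */

	end = strchr(start, '\n');	/* the list ends at the line break */
	len = end ? (size_t)(end - start) : strlen(start);
	if (len >= outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

The returned text keeps its leading space (for example " xfs nvidia(PO)"),
so it can be split on spaces directly.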

Currently, the only reliable way to obtain the module version associated
with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself, which
is a resource-intensive operation. We therefore propose printing the module
version directly in the log, which is far cheaper to consume. Modules that
do not set a version (mod->version is NULL) are printed exactly as before.
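
To make the resulting format concrete, here is a userspace C sketch that
mirrors one iteration of the patched print_modules() loop. The kernel code
uses three pr_cont() calls; this sketch combines them with snprintf(), and
format_module_entry() is a hypothetical name. The flags argument stands in
for the string returned by module_flags().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace mimic of one iteration of the patched print_modules() loop:
 * " name", then "-version" only when a version is set, then the flags
 * string (e.g. "(PO)", possibly empty). Returns the length the formatted
 * entry needs, as snprintf() does.
 */
static int format_module_entry(char *buf, size_t size, const char *name,
			       const char *version, const char *flags)
{
	assert(buf != NULL && name != NULL && flags != NULL);

	if (version)
		return snprintf(buf, size, " %s-%s%s", name, version, flags);
	return snprintf(buf, size, " %s%s", name, flags);
}
```

For example, name "nvidia", version "535.274.02" and flags "(PO)" yield
" nvidia-535.274.02(PO)", while a module without a version keeps its old
form, such as " xfs".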

- Before this patch

  Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
  Unloaded tainted modules: nvidia_peermem(PO):1

- After this patch

  Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
  Unloaded tainted modules: nvidia_peermem(PO):1
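
With versions in the log, an analysis script can split each entry of the
"Modules linked in:" line back into name, version and flags. The C sketch
below is one way to do that; struct module_entry and parse_module_token()
are hypothetical names. It assumes module names never contain '-' (kbuild
converts '-' to '_' in module names), so the first '-' before any trailing
"(...)" group starts the version. It does not handle the "Unloaded tainted
modules:" line, and a version string containing parentheses would confuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of one "Modules linked in:" entry. */
struct module_entry {
	char name[64];
	char version[64];	/* empty if the entry carries no version */
	char flags[16];		/* e.g. "(PO)", empty if none printed */
};

/* Copy len bytes of src into dst as a NUL-terminated string. */
static int copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/*
 * Split a token such as "nvidia-535.274.02(PO)" into its parts.
 * Returns 0 on success, or -1 if a field does not fit its buffer.
 */
static int parse_module_token(const char *tok, struct module_entry *e)
{
	const char *paren, *dash;
	size_t len, ver_end, name_end;

	assert(tok != NULL && e != NULL);
	len = strlen(tok);

	/* A trailing "(...)" group holds the flags. */
	paren = strrchr(tok, '(');
	ver_end = (paren && tok[len - 1] == ')') ? (size_t)(paren - tok) : len;

	/* Only search for '-' before the flags, which may contain one. */
	dash = memchr(tok, '-', ver_end);
	name_end = dash ? (size_t)(dash - tok) : ver_end;

	if (copy_field(e->name, sizeof(e->name), tok, name_end))
		return -1;
	if (dash) {
		if (copy_field(e->version, sizeof(e->version), dash + 1,
			       ver_end - name_end - 1))
			return -1;
	} else {
		e->version[0] = '\0';
	}
	return copy_field(e->flags, sizeof(e->flags), tok + ver_end,
			  len - ver_end);
}
```

Entries such as "nvme_core-1.0" parse to a name and a version with empty
flags, and entries without a version, such as "xfs", keep an empty version
field.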

Signed-off-by: Yafang Shao <[email protected]>
---
 kernel/module/main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 710ee30b3bea..1ad9afec8730 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3901,7 +3901,10 @@ void print_modules(void)
        list_for_each_entry_rcu(mod, &modules, list) {
                if (mod->state == MODULE_STATE_UNFORMED)
                        continue;
-               pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+               pr_cont(" %s", mod->name);
+               if (mod->version)
+                       pr_cont("-%s", mod->version);
+               pr_cont("%s", module_flags(mod, buf, true));
        }
 
        print_unloaded_tainted_modules();
-- 
2.43.5