We maintain a vmcore analysis script on each server that automatically
parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This saves
us considerable effort by avoiding repeated analysis of known bugs.

For vmcores triggered by a driver bug, the system calls print_modules() to
list the loaded modules. However, print_modules() does not output module
version information. Across a large fleet of servers, many different
module versions are often running simultaneously, and we need to know
which driver version caused a given vmcore.

Currently, the only reliable way to obtain the module version associated
with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself, which
is resource-intensive. Printing the driver version directly in the log is
far more efficient.

The motivation behind this change is that the external NVIDIA driver
[0] frequently causes kernel panics across our server fleet.
While we continuously upgrade to newer NVIDIA driver versions,
upgrading the entire fleet is time-consuming, so we need to
identify which driver version is responsible for each panic.

In-tree modules are already tied to a specific kernel version, so
printing their versions is redundant. For external drivers (like
proprietary networking or GPU stacks), however, the version is the single
most critical piece of metadata for triage. To avoid bloating the
loaded-module list, we print the version for external modules only.

- Before this patch

  Modules linked in: mlx5_core(O) nvidia(PO) nvme_core

- After this patch

  Modules linked in: mlx5_core-5.8-2.0.3(O) nvidia-535.274.02(PO) nvme_core
                              ^^^^^^^^^^          ^^^^^^^^^^^

  Note: nvme_core is an in-tree module [1], so its version isn't printed.

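For illustration only, a triage script could split each token of the new
"Modules linked in:" line roughly as sketched below. This is a user-space
sketch, not part of the patch: struct mod_token and parse_mod_token() are
hypothetical names. It assumes that tokens are separated by spaces, that
module names contain no '-' (so the first '-' outside the flag group starts
the version), and that the version string contains neither spaces nor '('.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type for a triage tool; not a kernel structure. */
struct mod_token {
	char name[64];
	char version[64];	/* empty if no version was printed */
	char flags[16];		/* taint flags without parentheses, e.g. "PO" */
};

/*
 * Split one token such as "nvidia-535.274.02(PO)" into name, version
 * and flags. Assumption: the module name contains no '-'.
 * Returns 0 on success, -1 if a field is empty or does not fit.
 */
static int parse_mod_token(const char *tok, struct mod_token *out)
{
	size_t len = strlen(tok);
	const char *end = tok + len;
	const char *paren = NULL;
	const char *dash;
	size_t n;

	memset(out, 0, sizeof(*out));

	/* The flags, if present, form a trailing "(...)" group. */
	if (len >= 2 && end[-1] == ')')
		paren = strrchr(tok, '(');
	if (paren) {
		n = (size_t)((end - 1) - (paren + 1));
		if (n >= sizeof(out->flags))
			return -1;
		memcpy(out->flags, paren + 1, n);
		end = paren;
	}

	/* The first '-' before the flag group starts the version. */
	dash = memchr(tok, '-', (size_t)(end - tok));
	if (dash) {
		n = (size_t)(end - (dash + 1));
		if (n >= sizeof(out->version))
			return -1;
		memcpy(out->version, dash + 1, n);
		end = dash;
	}

	/* Whatever remains is the module name. */
	n = (size_t)(end - tok);
	if (n == 0 || n >= sizeof(out->name))
		return -1;
	memcpy(out->name, tok, n);
	return 0;
}
```

The flag group is stripped before searching for '-', so a state marker
inside the parentheses (such as the '-' shown for a module being unloaded)
is not mistaken for the start of a version.
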
As pointed out by Sami, we must ensure mod->version is valid in
print_modules():
: We release the memory for mod->version in:
:
: free_module
: -> module_remove_modinfo_attrs
: -> attr->free = free_modinfo_version
:
: And this happens before the module is removed from the list.
: Couldn't there be a race condition where we read a non-NULL
: mod->version here, but the buffer is being concurrently released
: by another core that's unloading the module, resulting in a
: use-after-free in the pr_cont call?
:
: In order to do this safely, we should presumably drop the attr->free
: call from module_remove_modinfo_attrs and release the attributes
: only after the synchronize_rcu call in free_module (there's already
: free_modinfo we can use), so mod->version is valid for the entire
: time the module is on the list.

Link: https://github.com/NVIDIA/open-gpu-kernel-modules/tags [0]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/core.c?h=v6.19-rc3#n5448 [1]
Suggested-by: Petr Pavlu <[email protected]>
Suggested-by: Sami Tolvanen <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
---
kernel/module/main.c | 29 +++++++++++++++++------------
kernel/module/sysfs.c | 2 --
2 files changed, 17 insertions(+), 14 deletions(-)
---
v2->v3:
- ensure mod->version is valid when printing it. (Sami)
v1->v2:
- print it for external module only (Petr, Aaron)
- add comment for it (Aaron)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 2bac4c7cd019..c8f41fa90f8a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1384,6 +1384,17 @@ static void free_mod_mem(struct module *mod)
module_memory_free(mod, MOD_DATA);
}
+static void free_modinfo(struct module *mod)
+{
+ const struct module_attribute *attr;
+ int i;
+
+ for (i = 0; (attr = modinfo_attrs[i]); i++) {
+ if (attr->free)
+ attr->free(mod);
+ }
+}
+
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
@@ -1422,6 +1433,7 @@ static void free_module(struct module *mod)
module_bug_cleanup(mod);
/* Wait for RCU synchronizing before releasing mod->list and buglist. */
synchronize_rcu();
+ free_modinfo(mod);
if (try_add_tainted_module(mod))
pr_err("%s: adding tainted module to the unloaded tainted modules list failed.\n",
mod->name);
@@ -1779,17 +1791,6 @@ static int setup_modinfo(struct module *mod, struct load_info *info)
return 0;
}
-static void free_modinfo(struct module *mod)
-{
- const struct module_attribute *attr;
- int i;
-
- for (i = 0; (attr = modinfo_attrs[i]); i++) {
- if (attr->free)
- attr->free(mod);
- }
-}
-
bool __weak module_init_section(const char *name)
{
return strstarts(name, ".init");
@@ -3901,7 +3902,11 @@ void print_modules(void)
list_for_each_entry_rcu(mod, &modules, list) {
if (mod->state == MODULE_STATE_UNFORMED)
continue;
- pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+ pr_cont(" %s", mod->name);
+ /* Only append version for out-of-tree modules */
+ if (mod->version && test_bit(TAINT_OOT_MODULE, &mod->taints))
+ pr_cont("-%s", mod->version);
+ pr_cont("%s", module_flags(mod, buf, true));
}
print_unloaded_tainted_modules();
diff --git a/kernel/module/sysfs.c b/kernel/module/sysfs.c
index 01c65d608873..17d1796d6dc7 100644
--- a/kernel/module/sysfs.c
+++ b/kernel/module/sysfs.c
@@ -278,8 +278,6 @@ static void module_remove_modinfo_attrs(struct module *mod, int end)
if (!attr->attr.name)
break;
sysfs_remove_file(&mod->mkobj.kobj, &attr->attr);
- if (attr->free)
- attr->free(mod);
}
kfree(mod->modinfo_attrs);
}
--
2.47.3