On Sat, Feb 14, 2015 at 11:18:40AM +0800, Daniel J Blueman wrote:
> When ECC interrupts occur on memory controllers after EDAC_MAX_MCS (16), the

I knew this artificial limit would come back to bite us someday :-\

> kernel fatally dereferences unallocated structures [1]; this occurs on at
> least NumaConnect systems.
>
> Minimally fix by checking if a memory controller info structure is allocated;
> candidate for stable.
>
> Signed-off-by: Daniel J Blueman <dan...@numascale.com>
>
> -- [1]
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
> IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
> PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
> Oops: 0000 [#2] SMP
> Modules linked in:
> CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1

CPU 224?! What node is that? :)

> ---
>  drivers/edac/amd64_edac.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 17638d7..baccc0e 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -2175,7 +2175,7 @@ static void __log_bus_error(struct mem_ctl_info *mci, struct err_info *err,
>  static inline void decode_bus_error(int node_id, struct mce *m)
>  {
>  	struct mem_ctl_info *mci = mcis[node_id];
> -	struct amd64_pvt *pvt = mci->pvt_info;
> +	struct amd64_pvt *pvt;
>  	u8 ecc_type = (m->status >> 45) & 0x3;
>  	u8 xec = XEC(m->status, 0x1f);
>  	u16 ec = EC(m->status);
> @@ -2190,6 +2190,11 @@ static inline void decode_bus_error(int node_id, struct mce *m)
>  	if (xec && xec != F10_NBSL_EXT_ERR_ECC)
>  		return;
>
> +	/* Unable to decode on memory controllers after EDAC_MAX_MCS, as no mci is allocated */
> +	if (!mci)
> +		return;
> +	pvt = mci->pvt_info;

Hmm, so we have all the facilities to fix that properly, IINM:
edac_mc_find(), add_mc_to_global_list() and so on. Would looking through
the list of the memory controllers help instead, i.e. if you do:

static inline void decode_bus_error(int node_id, struct mce *m)
{
	struct mem_ctl_info *mci = edac_mc_find(node_id);

	if (!mci)
		return;

?

Then we can get rid of that local mcis dumbness and do it properly...

Thanks.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
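
For illustration only, here is a minimal userspace sketch of the lookup
pattern suggested above: find the controller by walking a global list and
bail out when nothing is registered, instead of indexing a fixed-size array.
It is not the kernel implementation. struct mock_mci, mock_mc_add(),
mock_mc_find() and mock_decode_bus_error() are hypothetical names that only
stand in for struct mem_ctl_info, the EDAC list helpers and
decode_bus_error(); the real structures, locking and signatures are assumed
to differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mem_ctl_info; not the kernel definition. */
struct mock_mci {
	int mc_idx;              /* controller (node) index */
	const char *name;        /* label used in the demo output */
	struct mock_mci *next;   /* next controller in the mock global list */
};

/* Head of the mock global controller list (assumption: no locking needed). */
static struct mock_mci *mock_mc_list;

/* Register a controller; any index is accepted, there is no fixed maximum. */
static void mock_mc_add(struct mock_mci *mci)
{
	mci->next = mock_mc_list;
	mock_mc_list = mci;
}

/* Walk the list and return the controller with index idx, or NULL. */
static struct mock_mci *mock_mc_find(int idx)
{
	struct mock_mci *mci;

	for (mci = mock_mc_list; mci; mci = mci->next)
		if (mci->mc_idx == idx)
			return mci;

	return NULL;
}

/* Returns 1 if the error was decoded, 0 if no controller is registered. */
static int mock_decode_bus_error(int node_id)
{
	struct mock_mci *mci = mock_mc_find(node_id);

	if (!mci)
		return 0;        /* nothing registered for this node: bail out */

	printf("decoding error on %s (node %d)\n", mci->name, mci->mc_idx);
	return 1;
}
```

With controllers registered for, say, nodes 3 and 40, calling
mock_decode_bus_error(40) finds its entry even though 40 is above the old
limit of 16, and mock_decode_bus_error(17) returns early instead of
dereferencing a NULL pointer.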