Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Gavin Shan
Ben, thanks a lot for the backtrace, which helped narrow down the root cause. Thanks also for explaining how to parse the backtrace and the register contents printed by the oops ;-) Finally, I successfully reproduced the issue on a Firebird-L machine without loading the corresponding device driver for Emulex etherne

Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Benjamin Herrenschmidt
On Tue, 2012-04-17 at 11:37 +1000, Anton Blanchard wrote: > > No. I replaced that backtrace in eeh_dn_check_failure with a WARN_ON() > because the backtrace doesn't give us enough info. I'm submitting a > patch for that today. > > Bottom line is mstmread has been causing an EEH error since at lea

Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Anton Blanchard
Hi, > Thanks for the information. I'll try to reproduce the issue on > Firebird-L today. By the way, it seems that "mstmread" is some > user-level application accessing the config space while the problem > happened? The EEH error is caused by the Mellanox firmware tools. > It seems the crash was

Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Gavin Shan
>> I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08). >> Haven't had a chance to narrow it down yet. Thanks for the information. I'll try to reproduce the issue on Firebird-L today. By the way, it seems that "mstmread" is some user-level application accessing the config space while

Re: [PATCH v5 00/21] EEH reorganization

2012-04-12 Thread Anton Blanchard
Hi, > I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08). > Haven't had a chance to narrow it down yet. Looking closer, it was caused by an EEH error at boot. It looks like the Mellanox infiniband card gets an error when probed by their firmware tool (mstmread), but only if the ke

Re: [PATCH v5 00/21] EEH reorganization

2012-04-12 Thread Anton Blanchard
Hi Gavin, > This series of patches is going to reorganize EEH so that it could > support multiple platforms in future. The requirements were raised > from the aspects. I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08). Haven't had a chance to narrow it down yet. Oops: Kernel acc

Re: [PATCH v5 00/21] EEH reorganization

2012-02-28 Thread Gavin Shan
Hi Ben, Could you please take a look at this when you have time? Thanks, Gavin > This series of patches is going to reorganize EEH so that it could support > multiple platforms in future. The requirements were raised from the aspects. > > * The original EEH implementation only support pSerie

[PATCH v5 00/21] EEH reorganization

2012-02-27 Thread Gavin Shan
This series of patches reorganizes EEH so that it can support multiple platforms in the future. The requirements arise from the following aspects. * The original EEH implementation only supports the pSeries platform, which is regarded as a guest system. Platform powernv is co