Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-25 Raj, Ashok
On Fri, Sep 25, 2015 at 10:29:01AM +0200, Borislav Petkov wrote: > > > > > > > The last patch of that series had 2 changes. > > > > 1. Allow offline cpu's to participate in the rendezvous. Since in the odd > > chance the offline cpus have any errors collected we can still report them. > > (we ch

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-25 Borislav Petkov
+ x...@kernel.org On Thu, Sep 24, 2015 at 02:25:41PM -0700, Raj, Ashok wrote: > Hi Boris > > I should have expanded on it.. > > On Thu, Sep 24, 2015 at 11:07:33PM +0200, Borislav Petkov wrote: > > > > How are you ever going to call into those from an offlined CPU?! > > > > And that's easy: >

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Raj, Ashok
Hi Boris I should have expanded on it.. On Thu, Sep 24, 2015 at 11:07:33PM +0200, Borislav Petkov wrote: > > How are you ever going to call into those from an offlined CPU?! > > And that's easy: > > if (!cpu_online(cpu)) > return; > The last patch of that series had 2 ch

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Borislav Petkov
On Thu, Sep 24, 2015 at 01:22:12PM -0700, Raj, Ashok wrote: > Another reason i had a separate buffer in my earlier patch was to avoid > calling rcu() functions from the offline CPU. I had an offline discussion > with Paul McKenney he said don't do that... > > mce_gen_pool_add()->gen_pool_alloc

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Raj, Ashok
Hi Boris On Thu, Sep 24, 2015 at 09:22:24PM +0200, Borislav Petkov wrote: > > Ah, we return. But we shouldn't return - we should overwrite. I believe > we've talked about the policy of overwriting old errors with new ones. > Another reason i had a separate buffer in my earlier patch was to avoi
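Ashok's point is that an offline CPU joining the rendezvous would reach mce_gen_pool_add()->gen_pool_alloc() and therefore call RCU functions, which he reports Paul McKenney advised against; his earlier patch used a separate buffer instead. A minimal single-threaded userspace model of that idea, assuming one preallocated slot per CPU that an offline CPU fills with a plain store and that an online CPU drains later. Every name here (record_error(), offline_slot, the mocked cpu_is_online[] array, SKETCH_NR_CPUS) is hypothetical and not from the patch; a real version would also need memory barriers between the offline CPU's store and the drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8                /* hypothetical CPU count */

struct mce_rec {                        /* hypothetical, far smaller than struct mce */
	int cpu;
	int bank;
	uint64_t status;
};

/* One preallocated slot per CPU: an offline CPU only stores into its own
 * slot, so it never allocates memory or touches RCU-protected state. */
static struct mce_rec offline_slot[SKETCH_NR_CPUS];
static bool offline_slot_used[SKETCH_NR_CPUS];

/* Stand-in for the pool used by online CPUs (gen_pool in the kernel). */
static struct mce_rec online_log[64];
static size_t online_count;

static bool cpu_is_online[SKETCH_NR_CPUS];   /* mock of cpu_online() */

static void record_error(int cpu, int bank, uint64_t status)
{
	struct mce_rec r = { .cpu = cpu, .bank = bank, .status = status };

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!cpu_is_online[cpu]) {
		/* Offline path: keep only the first error, no allocation. */
		if (!offline_slot_used[cpu]) {
			offline_slot[cpu] = r;
			offline_slot_used[cpu] = true;
		}
		return;
	}
	/* Online path: append if there is room (drops the record otherwise). */
	if (online_count < sizeof(online_log) / sizeof(online_log[0]))
		online_log[online_count++] = r;
}

/* Run later on an online CPU: move offline records into the main log.
 * A record is discarded if the main log is already full. */
static void drain_offline_slots(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!offline_slot_used[cpu])
			continue;
		if (online_count < sizeof(online_log) / sizeof(online_log[0]))
			online_log[online_count++] = offline_slot[cpu];
		offline_slot_used[cpu] = false;
	}
}
```

The design choice being modelled is simply "no allocator on the offline path": the cost is a fixed amount of memory per CPU and at most one saved record per offline CPU in this sketch.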

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Borislav Petkov
On Thu, Sep 24, 2015 at 07:00:46PM +, Luck, Tony wrote: > > If we get new ones logged in the meantime and userspace hasn't managed > > to consume and delete the present ones yet, we overwrite the oldest ones > > and set MCE_OVERFLOW like mce_log does now for mcelog. And that's no > > difference
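The thread contrasts two full-buffer policies: what mce_log() does now (set MCE_OVERFLOW and return, so the new record is lost) and Borislav's proposal (set the overflow flag but overwrite the oldest record). A minimal sketch of both on a tiny ring, assuming a fixed length and a plain boolean in place of the MCE_OVERFLOW bit; struct ring, RING_LEN and the function names are hypothetical, and this is not the kernel's mcelog code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_LEN 4      /* hypothetical; tiny so wraparound is easy to see */

struct ring {
	uint64_t rec[RING_LEN];
	size_t head;        /* index of the oldest record */
	size_t count;       /* number of valid records */
	bool overflow;      /* set once any record has been lost */
};

/* Policy A (current mce_log() behaviour per the thread):
 * when full, flag overflow and drop the NEW record. */
static void ring_add_drop_new(struct ring *r, uint64_t v)
{
	if (r->count == RING_LEN) {
		r->overflow = true;
		return;
	}
	r->rec[(r->head + r->count) % RING_LEN] = v;
	r->count++;
}

/* Policy B (the proposal): when full, flag overflow and
 * overwrite the OLDEST record. */
static void ring_add_overwrite_old(struct ring *r, uint64_t v)
{
	if (r->count == RING_LEN) {
		r->overflow = true;
		r->rec[r->head] = v;                /* newest replaces oldest */
		r->head = (r->head + 1) % RING_LEN; /* next-oldest is now head */
		return;
	}
	r->rec[(r->head + r->count) % RING_LEN] = v;
	r->count++;
}

/* i-th record in age order, 0 = oldest. */
static uint64_t ring_get(const struct ring *r, size_t i)
{
	assert(i < r->count);
	return r->rec[(r->head + i) % RING_LEN];
}
```

Feeding records 1 through 6 into both rings, policy A ends up holding 1..4 and policy B holds 3..6; both set the overflow flag. The difference is which errors survive: A keeps the earliest records (during a cascade these may point at the original cause), B keeps the most recent ones.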

RE: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Luck, Tony
> If we get new ones logged in the meantime and userspace hasn't managed > to consume and delete the present ones yet, we overwrite the oldest ones > and set MCE_OVERFLOW like mce_log does now for mcelog. And that's no > difference in functionality than what we have now. U. No.

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Borislav Petkov
On Thu, Sep 24, 2015 at 06:44:25PM +, Luck, Tony wrote: > > Now that we have this shiny 2-pages sized lockless gen_pool, why are we > > still dealing with struct mce_log mcelog? Why can't we rip it out and > > kill it finally? And switch to the gen_pool? > > > > All code that reads from mcelog

RE: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Luck, Tony
> Now that we have this shiny 2-pages sized lockless gen_pool, why are we > still dealing with struct mce_log mcelog? Why can't we rip it out and > kill it finally? And switch to the gen_pool? > > All code that reads from mcelog - /dev/mcelog chrdev - should switch to > the lockless buffer and will
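Borislav suggests retiring struct mce_log mcelog in favour of the "2-pages sized lockless gen_pool". To our understanding, records taken from that pool are chained on a lock-free singly linked list (the kernel's llist, via llist_add() and llist_del_all()). The following is a userspace C11 analogue of that push/take-all pattern, written for illustration only; ev_push(), ev_take_all(), ev_reverse() and the structs are hypothetical and are not the kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct ev_node {
	struct ev_node *next;
	uint64_t status;                /* hypothetical payload */
};

struct ev_list {
	_Atomic(struct ev_node *) first;
};

/* Lock-free push: any number of producers may call this concurrently,
 * and it never sleeps or takes a lock. */
static void ev_push(struct ev_list *l, struct ev_node *n)
{
	struct ev_node *old = atomic_load_explicit(&l->first, memory_order_relaxed);

	do {
		n->next = old;          /* re-linked on every retry */
	} while (!atomic_compare_exchange_weak_explicit(&l->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* The consumer detaches the whole list with one atomic exchange and then
 * walks it privately. Nodes come out newest first. */
static struct ev_node *ev_take_all(struct ev_list *l)
{
	return atomic_exchange_explicit(&l->first, NULL, memory_order_acquire);
}

/* Reverse a detached list so it can be consumed oldest first. */
static struct ev_node *ev_reverse(struct ev_node *head)
{
	struct ev_node *prev = NULL;

	while (head) {
		struct ev_node *next = head->next;
		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Because the consumer only ever removes the entire list at once, there is no single-node pop and hence no ABA problem; that is what makes this pattern usable from a context like a machine-check handler.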

Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

2015-09-24 Borislav Petkov
On Thu, Sep 24, 2015 at 01:48:38AM -0400, Ashok Raj wrote: > MCE_LOG_LEN appears to be short for high core count parts. Especially when > handling fatal errors, we don't clear MCE banks. Socket level MC banks > are visible to all CPUs that share banks. > > Assuming 18 core part, 2 threads per core
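The quoted patch description is cut off after "Assuming 18 core part, 2 threads per core", so its own figures are not reproduced here. The first step of that kind of estimate is simple: if banks are not cleared and a socket-level bank is visible to every CPU sharing it, each of those CPUs may log the same error. A minimal sketch of the worst case, assuming one record per logical CPU per shared bank with an error and an MCE_LOG_LEN of 32 (the value in arch/x86/include/uapi/asm/mce.h around that time; check your tree). The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ASSUMED_MCE_LOG_LEN 32u   /* assumption: MCE_LOG_LEN of that era */

/* Worst case: every logical CPU on every socket logs every socket-level
 * bank that holds an error, because the banks are not cleared. */
static unsigned worst_case_records(unsigned sockets,
				   unsigned cores_per_socket,
				   unsigned threads_per_core,
				   unsigned shared_banks_with_errors)
{
	unsigned cpus_per_socket = cores_per_socket * threads_per_core;

	return sockets * cpus_per_socket * shared_banks_with_errors;
}

static bool fits_in_log(unsigned records)
{
	return records <= ASSUMED_MCE_LOG_LEN;
}
```

With 18 cores and 2 threads per core, a single socket already has 36 logical CPUs, so even one shared-bank error reported by each of them (36 records) exceeds a 32-entry log.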