Re: [PATCH v11 3/4] x86, mce: Add __mcsafe_copy()

2016-02-18 Thread Luck, Tony
On Thu, Feb 18, 2016 at 10:12:42AM -0800, Linus Torvalds wrote: > On Wed, Feb 17, 2016 at 10:20 AM, Tony Luck wrote: > > > > If we faulted during the copy, then 'trapnr' will say which type > > of trap (X86_TRAP_PF or X86_TRAP_MC) and 'remain' says how many > > bytes were not copied. > > So apart

[PATCH v12] x86, mce: Add memcpy_trap()

2016-02-18 Thread Luck, Tony
Make use of the EXTABLE_FAULT exception table entries. This routine returns a structure to indicate the result of the copy: struct mcsafe_ret { u64 trap_nr; u64 bytes_left; }; If the copy is successful, then both 'trap_nr' and 'bytes_left' are zero. If we faulted during the copy,
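For illustration, here is a minimal user-space sketch of how a caller might interpret the returned structure. The enum values mirror the x86 trap vectors (14 for X86_TRAP_PF, 18 for X86_TRAP_MC). The helper classify_copy() and enum copy_result are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* x86 trap vectors: 14 = page fault (X86_TRAP_PF), 18 = machine check (X86_TRAP_MC). */
enum { TRAP_NONE = 0, TRAP_PF = 14, TRAP_MC = 18 };

struct mcsafe_ret {
	uint64_t trap_nr;    /* 0 if no trap, else the trap vector */
	uint64_t bytes_left; /* bytes not copied */
};

enum copy_result { COPY_OK, COPY_POISON, COPY_FAULT, COPY_UNKNOWN };

/* Hypothetical caller-side helper: classify the result and report bytes copied. */
static enum copy_result classify_copy(struct mcsafe_ret r, uint64_t len,
				      uint64_t *copied)
{
	*copied = len - r.bytes_left;
	if (r.trap_nr == TRAP_NONE && r.bytes_left == 0)
		return COPY_OK;
	if (r.trap_nr == TRAP_MC)
		return COPY_POISON;  /* consumed poisoned memory during the copy */
	if (r.trap_nr == TRAP_PF)
		return COPY_FAULT;   /* ordinary page fault on a bad address */
	return COPY_UNKNOWN;
}
```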

Re: [PATCH -v2] x86: Add an archinfo dumper module

2016-02-09 Thread Luck, Tony
> What I was going to propose, though, was to simplify the parsing by > doing this: > > struct reg_range { > const char * const names; > unsigned flags; > unsigned len; > }; > > which describes a bit slice of the register and then do this: > > const struct reg_range reg_descrip
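A minimal sketch of the bit-slice idea. It assumes the slices are listed from bit 0 upward and that each entry's len is the slice width in bits. The example_reg table, its field names, and reg_field() are hypothetical. flags is carried but unused here.

```c
#include <stdint.h>
#include <stddef.h>

struct reg_range {
	const char * const names; /* field name */
	unsigned flags;           /* unused in this sketch */
	unsigned len;             /* slice width in bits */
};

/* Hypothetical register layout, slices listed from bit 0 upward. */
static const struct reg_range example_reg[] = {
	{ "enable", 0, 1 },
	{ "mode",   0, 3 },
	{ "count",  0, 8 },
};

/* Extract slice idx from val: its shift is the sum of the widths below it. */
static uint64_t reg_field(const struct reg_range *r, size_t idx, uint64_t val)
{
	unsigned shift = 0;

	for (size_t i = 0; i < idx; i++)
		shift += r[i].len;
	uint64_t mask = (r[idx].len >= 64) ? ~0ULL : ((1ULL << r[idx].len) - 1);
	return (val >> shift) & mask;
}
```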

Re: [PATCH v10 3/4] x86, mce: Add __mcsafe_copy()

2016-02-09 Thread Luck, Tony
> You can save yourself this MOV here in what is, I'm assuming, the > general likely case where @src is aligned and do: > > /* check for bad alignment of source */ > testl $7, %esi > /* already aligned? */ > jz 102f > > movl %esi,%ecx > subl $8,%ecx
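The alignment test can be expressed in C as the number of leading bytes that must be copied before the source reaches an 8-byte boundary. This is a hedged sketch using a hypothetical head_bytes() helper, not the patch's assembly. It returns 0 on the common already-aligned path, which corresponds to the early `jz` in the quoted suggestion.

```c
#include <stdint.h>
#include <stddef.h>

/* Leading bytes to copy one at a time before src is 8-byte aligned. */
static size_t head_bytes(uintptr_t src, size_t len)
{
	if ((src & 7) == 0)          /* already aligned: the likely fast path */
		return 0;
	size_t head = 8 - (src & 7); /* distance to the next 8-byte boundary */
	return head < len ? head : len;
}
```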

Re: [PATCH v10 4/4] x86: Create a new synthetic cpu capability for machine check recovery

2016-02-09 Thread Luck, Tony
> > + if (mca_cfg.recovery || (mca_cfg.ser && > > + !strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24))) > > Eeww, a model string check :-( > > Lemme guess: those E7s can't be represented by a range of > model/steppings, can they? We use the same model number for E5 and E7 ser
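A sketch of the quoted condition as a standalone predicate. The prefix "Intel(R) Xeon(R) CPU E7-" is exactly 24 characters, which is why the patch passes 24 to strncmp(). The helpers is_xeon_e7() and recovery_capable() and their boolean parameters are hypothetical stand-ins for mca_cfg.recovery, mca_cfg.ser and c->x86_model_id.

```c
#include <string.h>
#include <stdbool.h>

static bool is_xeon_e7(const char *model_id)
{
	static const char prefix[] = "Intel(R) Xeon(R) CPU E7-";

	/* sizeof(prefix) - 1 == 24, the length used in the quoted patch */
	return strncmp(model_id, prefix, sizeof(prefix) - 1) == 0;
}

/* Hypothetical mirror of: mca_cfg.recovery || (mca_cfg.ser && <E7 model string>) */
static bool recovery_capable(bool cfg_recovery, bool ser, const char *model_id)
{
	return cfg_recovery || (ser && is_xeon_e7(model_id));
}
```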

Re: [PATCH v13] x86, mce: Add memcpy_trap()

2016-02-25 Thread Luck, Tony
For reference below is what I'd hoped to be able to do with copy_from_user() [obviously needs to not just replace that ALTERNATIVE_2 setup in _copy_from_user ... would have to invent an ALTERNATIVE_3 to pick the new function for people willing to sacrifice speed for recoverability] BUT ... there a

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-18 Thread Luck, Tony
> Your test case is presumably doing something that involves setting > undocumented registers* to program the CPU or memory controller to > generate a machine check on access to some address. Presumably this > is done by broadcasting an SMI and programming the registers in SMM. Good theory - but

RE: [PATCH v3 0/3] Handle IST interrupts from userspace on the normal stack

2014-11-19 Thread Luck, Tony
> NB: Tony has seen odd behavior when stress-testing injected > machine checks with this series applied. I suspect that > it's a bug in something else, possibly his BIOS. Bugs in > this series shouldn't be ruled out, though. v3 did 3.5x better than earlier ones ... survived overnight but died at

[PATCH] x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks

2014-11-19 Thread Luck, Tony
We now switch to the kernel stack when a machine check interrupts during user mode. This means that we can perform recovery actions in the tail of do_machine_check(). So say goodbye to TIF_MCE_NOTIFY, mce_save_info(), mce_find_info() and mce_notify_process() Signed-off-by: Tony Luck --- Obvious

RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic

2014-11-19 Thread Luck, Tony
>> No information besides that it is a machine check. This happens in two cases: >> 1) The CPU logs the error with the MCi_STATUS.EN bit set to zero, and Linux >>ignores EN=0 entries (as it should). > Well, I guess we shouldn't anymore. Apparently hw forgets to set the > bit when raising an MC

[GIT PULL] Couple more changes for x86/ras branch

2014-11-20 Thread Luck, Tony
The following changes since commit 8dcf32ea220d87ca517e164de85d336480c9d172: x86, MCE, AMD: Assign interrupt handler only when bank supports it (2014-11-01 11:28:23 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-ucn

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Luck, Tony
> Not that easy for testing the #MC path - there we have to inject real > MCEs and then noodle through the memory_failure() code. I'd be very much > interested to see what would happen if two MCEs happen back-to-back with > your change, the second one being raised when we're on the kernel stack > a

RE: [PATCH v3 1/2] x86, mce, severity: extend the mce_severity mechanism to handle UCNA/DEFERRED error

2014-11-12 Thread Luck, Tony
> Just as what you said, the severity table entry for the "EN" check > should have been skipped when calling from the CMCI/Poll handler. > As shown below: > >MCESEV( >NO, "Not enabled", >EXCP, BITCLR(MCI_STATUS_EN) >), Yes - that worked. The

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Luck, Tony
> v2 coming soon with these changes and some additional comment cleanups. So v1 + do_machine_check change is not surviving some real testing. I'm injecting and consuming errors sequentially with a small delay in between - so no fancy corner cases with multiple errors being processed ... we get

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Luck, Tony
> v2's not going to make a difference unless you're using uprobes at the > same time. Not (knowingly) using uprobes. System is installed with a RHEL7 userspace ... but is essentially idle except for my test program. > In the interest of my sanity, can you add something like > BUG_ON(!user_mode_v

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Luck, Tony
> printk seems to work just fine in do_machine_check. Any chance you > can instrument, for each cpu, all entries to do_machine_check, all > calls to do_machine_check, all returns, and everything that tries to > do memory_failure? I first added a printk() just for the cpu that calls do_machine_che

RE: randconfig build error with next-20141113, in fs/pstore/inode.c

2014-11-13 Thread Luck, Tony
> Building with the attached random configuration file, > > fs/built-in.o: In function `pstore_check_syslog_permissions': > inode.c:(.text+0x13a1bd): undefined reference to `check_syslog_permissions' > make: *** [vmlinux] Error 1 Sebastian, This looks to come from your "Honor dmesg_restrict sysct

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Luck, Tony
> Are you sure that this works in an unmodified kernel Unmodified kernel has run tens of thousands of injection/consumption/recovery cycles. I did get a crash with the entry/exit traces you asked for. Last 2 lines of console log attached. There are a couple of OOPs before things fall apar

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Luck, Tony
"worst == MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)" This can't happen. We can only declare AR severity for a user mode fault. Sent from my iPhone > On Nov 13, 2014, at 16:50, Andy Lutomirski wrote: > > worst == > MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
> Can you also try rebasing onto what will probably be v3? > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tag/?id=paranoid-stack-v2.9 Built that - with none of my other changes ... i.e. still use TIF_NOTIFY_MCE etc. No printk() in the MCE context. System ran 736 injection/consum

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
>> It adds debugging for inappropriate reschedules from the wrong stack. >> Setting CONFIG_DEBUG_ATOMIC_SLEEP might also be a good idea. > > Will add that for next build/test Didn't see anything new. System died at 1108 recoveries with the "Timeout synchronization ..." panic -Tony

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
> So far, the only thing I've come up with is that do_machine_check > seems to be missing exception_enter or the equivalent. Do you have > CONFIG_CONTEXT_TRACKING on and/or full nohz enabled? I don't think > that this explains my bug, though. Yes to both: $ grep CONTEXT_TRACK .config CONFIG_CON

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
>> Right, I can do it in the meantime and we can always experiment more >> later. Getting rid of _TIF_MCE_NOTIFY is a good thing already. > > Yep, it looks pretty simple - not tested yet, it builds though. It seems pretty solid under test so far. Can we make it pass the address/flag to mce_notify

RE: Request for help: what did I do wrong with idtentry?

2014-11-14 Thread Luck, Tony
> causes Tony's MCE stress test to fail, presumably when some CPU either > becomes permanently non-interruptable or otherwise wanders off into > the weeds. It might be that recent "improvements" I made to my test harness have messed things up. I trimmed one delay (between injection and consumptio

RE: [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error

2014-11-10 Thread Luck, Tony
> Looks ok to me. And also to me > Tony, let me know if I should pick them up or you want to. Btw, there's > already tip/x86/ras for 3.19. I can take it - if you retype the "Looks ok" in the form of an "Acked-by" tag :-) -Tony

RE: [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

2014-11-10 Thread Luck, Tony
> In fact, you could redo this patch in the meantime without the AMD vendor > check so that Tony can pick them up soon. I'll add the correct AMD bits > later. You just want this hunk deleted ... + if (c->x86_vendor == X86_VENDOR_AMD) { + /* +* AMD BKDGs - Machi

RE: [PATCH v3 1/2] x86, mce, severity: extend the mce_severity mechanism to handle UCNA/DEFERRED error

2014-11-10 Thread Luck, Tony
But then I tested it ... I injected a UC error to memory - then did a simple byte write to the target line. This resulted in two banks logging errors: [ 124.638045] poll: CPU54 saw ec010092 in bank 7 [ 124.639006] poll: severity = 0 [ 124.647333] poll: CPU54 saw b8200179 in ba

RE: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error

2014-11-11 Thread Luck, Tony
>> The bank 7 error reported as severity 0 because EN=0 ... so we took no >> action for it. > > How come EN is 0? Bank7 error reporting is not enabled? Why? Or the > error injection thing doesn't do it? The "EN" bit is poorly named, and not well documented. Here's a clip from the SDM: One of b
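To make the EN discussion concrete, here is a sketch that decodes the two status words from the earlier poll output. It assumes each printed word is the upper 32 bits of IA32_MCi_STATUS, so VAL is bit 63, UC bit 61, EN bit 60 and ADDRV bit 58. Under that assumption bank 7's ec010092 has VAL=1, UC=1 and EN=0, consistent with it being graded severity 0. The macro names and helpers are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* IA32_MCi_STATUS flag bits (Intel SDM machine-check architecture). */
#define MCI_VAL   (1ULL << 63)
#define MCI_OVER  (1ULL << 62)
#define MCI_UC    (1ULL << 61)
#define MCI_EN    (1ULL << 60)
#define MCI_MISCV (1ULL << 59)
#define MCI_ADDRV (1ULL << 58)
#define MCI_PCC   (1ULL << 57)
#define MCI_S     (1ULL << 56)

/* Assumption: the poll debug output printed only the high 32 bits. */
static uint64_t status_from_high32(uint32_t hi)
{
	return (uint64_t)hi << 32;
}

/* A valid log that the "Not enabled" (EN=0) rule would grade as severity 0. */
static bool ignored_because_en_clear(uint64_t status)
{
	return (status & MCI_VAL) && !(status & MCI_EN);
}
```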

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Luck, Tony
So here is the flow: 1) A machine check happens - it is (currently) broadcast to all logical cpus on all sockets 2) First cpu to execute "order = atomic_inc_return(&mce_callin);" in mce_start() gets to be the "monarch" and directs things during the handler. 3) Every cpu gets to scan all the ma
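Steps 1 and 2 of the flow can be sketched with C11 atomics. atomic_fetch_add() returns the old value, so adding 1 reproduces the kernel's atomic_inc_return(), and whichever caller gets order 1 is the monarch. The counter name, the helpers, and the single-threaded simulation are assumptions for illustration. all_checked_in() also shows why a CPU that never checks in leaves the others waiting until the timeout.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int mce_callin_sim; /* hypothetical stand-in for mce_callin */

/* Each CPU entering the handler takes a ticket; order 1 becomes the monarch. */
static int mce_take_order(void)
{
	/* atomic_fetch_add returns the old value; +1 matches atomic_inc_return() */
	return atomic_fetch_add(&mce_callin_sim, 1) + 1;
}

static bool is_monarch(int order)
{
	return order == 1;
}

/* Callers spin until every expected CPU has checked in (cf. mce_start()). */
static bool all_checked_in(int expected_cpus)
{
	return atomic_load(&mce_callin_sim) == expected_cpus;
}
```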

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Luck, Tony
Andy said: > Yeah. But if you haven't cleared MCIP, you go boom, which is the same > with pretty much any approach. The current code has an ugly hole at the moment. End of do_machine_check() clears MCG_STATUS. At that point we are still running on the magic stack for machine check exceptions ..

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Luck, Tony
> I've thought about one sneaky option. If we can reliably determine > that we're an innocent bystander of a broadcast #MC, can we send an > IPI-to-self and return without clearing MCIP? Then we get another > interrupt as soon as interrupts are enabled, and we can clear MCIP at > a time when we'r

RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic

2014-11-21 Thread Luck, Tony
> leave them in. Then you can read them out again on panic time. The mce > log buffer will have to become a circular buffer or something like that. This is a mixed bag. If there are a bunch of errors so that we overflow the buffer, then general wisdom says that people want to see the first error
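The trade-off can be sketched as two insertion policies on a fixed-size log: keep-first stops storing once the log is full, while a circular buffer overwrites the oldest entry. LOG_LEN, struct err_log and both helpers are hypothetical and do not reflect the kernel's mce log implementation.

```c
#include <stdint.h>
#include <stddef.h>

#define LOG_LEN 4 /* hypothetical capacity */

struct err_log {
	uint64_t entry[LOG_LEN];
	size_t next;    /* entries written (keep-first: capped at LOG_LEN) */
	size_t dropped; /* errors not retained once the log was full */
};

/* Keep-first policy: once full, new errors are counted but not stored. */
static void log_keep_first(struct err_log *log, uint64_t status)
{
	if (log->next < LOG_LEN)
		log->entry[log->next++] = status;
	else
		log->dropped++;
}

/* Circular policy: once full, each new error overwrites the oldest one. */
static void log_circular(struct err_log *log, uint64_t status)
{
	log->entry[log->next % LOG_LEN] = status;
	if (log->next >= LOG_LEN)
		log->dropped++;
	log->next++;
}
```

With six errors and room for four, keep-first retains errors 1 to 4, while the circular log ends up holding 5, 6, 3, 4 and has lost the first error.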

RE: [PATCH] sb_edac: Add support for Broadwell-DE processor

2014-11-21 Thread Luck, Tony
>> +{ PCI_DESCR(PCI_DEVICE_ID_INTEL_BROADWELL_IMC_HA0_TAD2, 1) }, >> +{ PCI_DESCR(PCI_DEVICE_ID_INTEL_BROADWELL_IMC_HA0_TAD3, 1) }, > > You are marking TAD2 and TAD3 as optional here, but > >> +for (i = 0; i < NUM_CHANNELS; i++) { >> +if (!pvt->pci_tad[i]) >> +

RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic

2014-11-21 Thread Luck, Tony
> >/* > * No machine check event found. Must be some external > * source or one CPU is hung. Panic. > */ >if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3) >mce_panic("Machine check from unknown source", NULL, NULL); > > Provided

RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic

2014-11-21 Thread Luck, Tony
>> That means there were no VALID=1, EN=1, S=1 errors anywhere. But there >> might be some other things logged that would help us understand. > > By "other things" you mean other MCEs? Logs with EN=0 and/or S=0. They may have interesting information, and have a good chance of being useful (espec

RE: Final per cpu consistency patch for -next or late in 3.19 merge period

2014-12-02 Thread Luck, Tony
> From: Christoph Lameter > Subject: ia64: Update comment that references __get_cpu_var > > __get_cpu_var was removed. Update the comments. Applied. Will send to Linus towards the end of the merge window Thanks -Tony

[PATCH] ACPI, EINJ: Enhance error injection tolerance level

2014-12-03 Thread Luck, Tony
From: "Chen, Gong" Some BIOSes utilize PCI MMCFG space read/write operations to trigger specific errors. EINJ will report errors as below when hitting such cases: APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers It is because on x86 platform ACPI based PCI MMCFG

[PATCH] sb_edac: Add support for Broadwell-DE processor

2014-11-17 Thread Luck, Tony
Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Signed-off-by: Tony Luck --- Mauro: Naming of routines and #defines follows the existing convention of only making new functions wher

RE: linux-next: build warning after merge of the ia64 tree

2014-11-17 Thread Luck, Tony
> In file included from kernel/printk/printk.c:41:0: > include/linux/syslog.h:55:12: warning: 'check_syslog_permissions' defined but > not used [-Wunused-function] > static int check_syslog_permissions(int type, bool from_file) Bah - missed the "inline". Fixed now. Thanks -Tony

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Luck, Tony
>> However, I'd like to be very sure this thing doesn't introduce any >> regressions to the MCA code. So even if Tony's testing passes, I'd like >> to be very conservative here and stress it more than usual. Because once >> this thing hits upstream and stuff starts breaking, it'll be a serious >> P

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Luck, Tony
> I still wonder whether the timeout code is the real culprit. My patch > will slow down entry into do_machine_check by tens of cycles, several > cachelines, and possibly a couple of TLB misses. Given that the > timing seemed marginal to me, it's possible (albeit not that likely) > that it pushed

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Luck, Tony
> It could also be interesting to tweak mce_panic to not actually panic > the machine but to try to return and stop the test instead. Then real > debugging could be possible :) The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually have to do a full power cycle. -Tony

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-18 Thread Luck, Tony
>> The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually >> have to do a full power cycle. > How is it even possible that I did that with a few lines of asm? Probably not directly your fault - some cascade of errors may have occurred. > Could this be a hardware bug?

[GIT PULL] ensure unique filenames in pstore

2014-10-16 Thread Luck, Tony
The following changes since commit bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9: Linux 3.17 (2014-10-05 12:23:04 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore for you to fetch changes up to d4bf205da618bbd0b0

RE: [PATCH] pstore: do not use message compression without lock

2015-05-21 Thread Luck, Tony
> - if (big_oops_buf) { > + if (big_oops_buf && is_locked) { > dst = big_oops_buf; > hsize = sprintf(dst, "%s#%d Part%u\n", why, > Bump Thanks for the reminder. Applied. Should show up in linux-next soon, and then go to Linus i

RE: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

2015-05-05 Thread Luck, Tony
Should you check whether the address is valid before blindly reading the register? > m.bank = bank; if (m.status & MCI_STATUS_ADDRV) rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr); > mce_log(&m); -Tony
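A user-space sketch of the suggested guard. The MSR read is replaced by a hypothetical fake_read_addr_msr(), MCI_STATUS_ADDRV is bit 58 of IA32_MCi_STATUS, and struct mce_rec is a cut-down, hypothetical stand-in for struct mce.

```c
#include <stdint.h>

#define MCI_STATUS_ADDRV (1ULL << 58) /* IA32_MCi_ADDR holds a valid address */

struct mce_rec {
	uint64_t status;
	uint64_t addr;
	int bank;
};

/* Hypothetical stand-in for rdmsrl(MSR_IA32_MCx_ADDR(bank), ...). */
static uint64_t fake_read_addr_msr(int bank)
{
	return 0x1000ULL * (uint64_t)(bank + 1);
}

static void fill_record(struct mce_rec *m, int bank, uint64_t status)
{
	m->bank = bank;
	m->status = status;
	m->addr = 0;
	/* Only read the address register when the hardware says it is valid. */
	if (m->status & MCI_STATUS_ADDRV)
		m->addr = fake_read_addr_msr(bank);
}
```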

RE: [RFC 3/3] x86, mirror: x86 enabling - find mirrored memory ranges and tell memblock

2015-05-18 Thread Luck, Tony
On 2015/2/4 6:40, Tony Luck wrote: >> Can't post this part yet because it uses things in an upcoming[*] ACPI, >> UEFI, or some >> other four-letter-ending-in-I standard. So just imagine a call someplace >> early >> in startup that reads information about mirrored address ranges and does: >> >

RE: [RFC 0/3] Mirrored memory support for boot time allocations

2015-05-18 Thread Luck, Tony
> Is it means that you will create a new zone to fill mirrored memory, like the > movable zone, right? That's my general plan. > I think this will change a lot of code, why not create a new migrate type? > such as CMA, e.g. MIGRATE_MIRROR I'm still exploring options ... the idea is to use mirro

RE: [PATCH 1/2] x86: mce: kdump: use under_crashdumping to turn off MCE in all CPUs together

2015-02-24 Thread Luck, Tony
> I'd even venture a guess and say that clearing CR4.MCE should be enough > but I *think* that doesn't prevent errors from being logged. Just to be > extra sure, you should clear MCG_CTL bits too. It all depends on what > exactly you want to do. I'm not sure - the broadcast MCE will still arrive a

RE: [PATCH 1/2] x86: mce: kdump: use under_crashdumping to turn off MCE in all CPUs together

2015-02-24 Thread Luck, Tony
> If that is the case, the tolerance level might be the better approach > after all... We should also take a look at mce_start(): int cpus = num_online_cpus(); ... /* * Wait for everyone. */ while (atomic_read(&mce_callin) != cpus) { since offl

RE: [PATCH V4 0/7] x86/intel_rdt: Intel Cache Allocation Technology

2015-02-25 Thread Luck, Tony
> The CAT thing was annoying already, but at least one can find that in > the SDM, this RDT thing, not a single mention. The problems of development at the bleeding edge. Would you rather Linux sat on the sidelines until there are enough Google hits from other users of new features? I did get one

RE: [PATCH] PCI/AER: Avoid info leak in __print_tlp_header

2015-02-25 Thread Luck, Tony
> I think we should expect AER to be used on big-endian machines. I'm > pretty sure it's used on Itanium in big-endian mode. Itanium can run in either big or little endian mode - but Linux uses little endian: arch/ia64/include/uapi/asm/byteorder.h:#include -Tony

RE: [PATCH] x86/PCI: Fully disable devices before releasing IRQ resource

2015-03-11 Thread Luck, Tony
>> Unfortunately there's a long standing comment in pci_device_remove(): >> >> /* >> * We would love to complain here if pci_dev->is_enabled is set, that >> * the driver should have called pci_disable_device(), but the >> * unfortunate fact is there are too many

RE: [PATCH] mce: use safe MSR accesses

2015-03-11 Thread Luck, Tony
> When running as a guest under kvm, it's possible that the MSR > being accessed may not be implemented. All MSR accesses should > be prepared to handle exceptions. Isn't that a KVM bug? The code here first checks family/model before accessing the MSR: if (c->x86 == 0x15 &&

RE: [PATCH V3 0/2] Rework mce_severity

2015-03-23 Thread Luck, Tony
> Patch1: Introduce AMD severities function > Patch2: Initialise mce_severity function pointer to mce_severity_intel > and override it to mce_severity_amd on AMD systems both parts: Acked-by: Tony Luck -Tony

RE: [PATCH v4] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-03-04 Thread Luck, Tony
> - fixed AR and UC order in enum severity_level because UC is severer than AR > by definition. Current code is not affected by this wrong order by chance. AR and AO are both UC errors - that happen also to be recoverable. Are you really sure about this re-order not affecting existing code? Yo

RE: [PATCH] x86, mce, severities: Add AMD severities function

2015-03-18 Thread Luck, Tony
One other thought. Instead of the run-time test to see if this is an AMD processor on every call to this function, would it be cleaner to: 1) Rename existing mce_severity() function to mce_severity_intel() 2) Declare a function pointer named mce_severity. 3) Assign that pointer to the _intel() o
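The proposed structure, sketched with hypothetical severity bodies (the real functions grade a struct mce, not a bare status word). The pointer defaults to the Intel variant and is switched once at init on AMD, as the later V3 series describes, so the per-call path carries no vendor check.

```c
#include <stdbool.h>

/* Hypothetical per-vendor graders; the return values are placeholders. */
static int mce_severity_intel(unsigned long long status) { (void)status; return 1; }
static int mce_severity_amd(unsigned long long status)   { (void)status; return 2; }

/* Function pointer named mce_severity, defaulting to the Intel version. */
static int (*mce_severity)(unsigned long long status) = mce_severity_intel;

/* Decide the vendor once at init instead of testing it on every call. */
static void mce_severity_init(bool is_amd)
{
	if (is_amd)
		mce_severity = mce_severity_amd;
}
```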

RE: [PATCH v2] mce: use safe MSR accesses

2015-03-13 Thread Luck, Tony
-rdmsrl(msrs[i], val); - -/* CntP bit set? */ -if (val & BIT_64(62)) { - val &= ~BIT_64(62); - wrmsrl(msrs[i], val); -
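The removed hunk clears bit 62 (the "CntP" bit, per its comment) only when it is set. Here is a sketch of that read-modify-write, operating on a plain variable instead of an MSR. BIT_64 is redefined locally and clear_cntp() is hypothetical. Returning whether a write is needed keeps the MSR write conditional, as in the original.

```c
#include <stdint.h>
#include <stdbool.h>

#define BIT_64(n) (1ULL << (n))

/* Clear bit 62 in *msr_val; returns true if a write-back would be needed. */
static bool clear_cntp(uint64_t *msr_val)
{
	if (*msr_val & BIT_64(62)) {
		*msr_val &= ~BIT_64(62);
		return true;  /* caller would write the new value back to the MSR */
	}
	return false;         /* bit already clear: skip the write */
}
```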

RE: [PATCH] fs/pstore/ram.c: Fix the ramoops module parameters update

2015-03-16 Thread Luck, Tony
> I swear this had been fixed before. Thanks for catching it! Maybe I missed it? > Acked-by: Kees Cook Got it this time - should show up in the next linux-next build. Thanks -Tony

[GIT PULL] RAS update for 3.20 (one more thing)

2015-02-02 Thread Luck, Tony
The following changes since commit 26bc420b59a38e4e6685a73345a0def461136dce: Linux 3.19-rc6 (2015-01-25 20:04:41 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-fixmcelog for you to fetch changes up to 728b6f14abaa7f

RE: [PATCH v3 1/2] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-03-03 Thread Luck, Tony
+static void machine_check_under_kdump(struct pt_regs *regs, long error_code) +{ + if (mca_cfg.kdump_cpu == smp_processor_id()) + pr_emerg("MCE triggered when kdumping. If you are lucky enough, you will have a kdump. Otherwise, this is a dying message.\n"); I'm worried about t

[GIT PULL] fs/pstore for 3.20

2015-02-11 Thread Luck, Tony
The following changes since commit eaa27f34e91a14cdceed26ed6c6793ec1d186115: linux 3.19-rc4 (2015-01-11 12:44:53 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore for you to fetch changes up to a6b8978c54b771

RE: [PATCH] einj: Documentation text corrections and streamlining

2015-01-29 Thread Luck, Tony
> -To use EINJ, make sure the following are enabled in your kernel > +To use EINJ, make sure the following are options enabled in your kernel > configuration: How about a paragraph telling people how to check whether their platform supports EINJ before they start building kernels. Either look f

RE: [PATCH] einj: Documentation text corrections and streamlining

2015-01-30 Thread Luck, Tony
>> How about a paragraph telling people how to check whether their platform >> supports > > I took your text and massaged it into the doc, diff ontop: Acked-by: Tony Luck

[PATCH] drm: fb helper should avoid sleeping in panic context

2014-12-15 Thread Luck, Tony
From: Rui Wang There are still some places in the fb helper that need to avoid sleeping in panic context. Here's an example: [ 65.615496] bad: scheduling from the idle thread! [ 65.620747] CPU: 92 PID: 0 Comm: swapper/92 Tainted: G ME 3.18.0-rc4-7-default+ #20 [ 65.630364] Har

[PATCHv3] ACPI, EINJ: Enhance error injection tolerance level

2014-12-15 Thread Luck, Tony
From: "Chen, Gong" Some BIOSes utilize PCI MMCFG space read/write operations to trigger specific errors. EINJ will report errors as below when hitting such cases: APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers It is because on x86 platform ACPI based PCI MMCFG

RE: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Thread Luck, Tony
> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at > least on Intel, according to Tony I checked with the architects ... and I was right. If you clear CR4.MCE you'll still see the machine check - and you'll pull the big system reset lever. If you think the other cpus can survi

[GIT PULL] Fix mcelog regression

2015-02-17 Thread Luck, Tony
Sorry for bypassing normal channels, but this looks like a trivial regression fix to me, but I'm getting pushback from my co-maintainer and from Ingo. 1. This used to work 2. Now it doesn't 3. People have complained Previous threads: https://lkml.org/lkml/2015/1/30/641 https://lkml.org/

[GIT PULL] two more pstore commits for merge window

2014-12-12 Thread Luck, Tony
The following changes since commit 069fb0b63722f8c9f8b4bbce236793626c89af33: syslog: Provide stub check_syslog_permissions (2014-11-17 10:28:04 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-morepstore for you to

RE: [PATCH v14] x86, mce: Add memcpy_mcsafe()

2016-03-10 Thread Luck, Tony
> But you return 0 == false for success and 1 == true for failure. Aaargh! -ETOOMUCHSHELLSCRIPTPROGRAMMING -Tony

RE: [PATCH v2 0/2] ACPI, APEI: Memory leaks

2016-03-10 Thread Luck, Tony
>> drivers/acpi/apei/apei-base.c |6 -- >> drivers/acpi/apei/erst.c |3 +++ >> 2 files changed, 7 insertions(+), 2 deletions(-) > > Tony, Boris, should I apply these? Rafael, Yes please. -Tony

RE: [PATCH] ia64: define ioremap_uc()

2016-03-15 Thread Luck, Tony
>> All architectures now need ioremap_uc(), ia64 seems defines >> this already through its ioremap_nocache() and it already >> ensures it *only* uses UC. >> >> Reported-by: 0 day bot >> Signed-off-by: Luis R. Rodriguez > > *Poke* Luis, Thanks for the reminder. Applied. -Tony

RE: [PATCH] ia64: define ioremap_uc()

2016-03-15 Thread Luck, Tony
>> Note, this is actually needed since v4.3 to complete an allyesconfig >> compile on ia64, there were others archs that needed this, and this >> one just seems to have fallen through the cracks. > > So a cc:stable was needed. I've added that to my copy of the patch. > Tony ack? Acked-by: Tony L

Re: [PATCH V2] x86/irq: Cure live lock in fixup_irqs()

2016-03-19 Thread Luck, Tony
are over 4000 lines like this: [ 218.220045] Broke affinity for irq 66 on the console for each offline pass. Also a couple of these: [ 221.951171] IRQ fixup: irq 68 move in progress, old vector 79 for each pass offlining all but one cpu. Tested-by: Tony Luck -Tony

RE: [PATCH 4/4] perf/x86/cqm: Support cqm/mbm only for perf events

2016-04-25 Thread Luck, Tony
>> Hence removing support for the parts which are broken rather than >> pretending to support it and giving incorrect data. > > Uh what, how about attempt to fix it? No hope to do that by 4.6 release ... so I suggested to Vikas that it would be better to disable the feature now so users wouldn't

RE: [PATCH 4.2.y-ckt 092/218] EDAC/sb_edac: Fix computation of channel address

2016-03-31 Thread Luck, Tony
> 4.2.8-ckt7 -stable review patch. If anyone has any objections, please let me > know. Can you hold it for a bit? There is a silly error in this which breaks the channel computation. Also needs to check if address hashing is enabled. So 1 or 2 follow up patches should be coming out soon and

RE: [PATCH] mm: Introduce kernelcore=reliable option

2015-10-30 Thread Luck, Tony
> If each memory controller has the same distance/latency, you (your firmware) > don't need > to allocate reliable memory per each memory controller. > If distance is problem, another node should be allocated. > > ...is the behavior(splitting zone) really required ? It's useful from a memory band

RE: [UNTESTED PATCH] x86, mce: Avoid double entry of deferred errors into the genpool.

2015-11-23 Thread Luck, Tony
> Also, two more fixes I've done while injecting in a kvm guest I'm > sending as a reply to this message. Will inject on a real box too. Ok ... applied those two on top of my "UNTESTED" patch and injected an error to force a UCNA log. Everything looked ok. Just one copy on the console and in /

RE: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations

2015-06-30 Thread Luck, Tony
> Sounds logical. In that case, bootmem awareness would be crucial. > Enabling support in just the page allocator is too late. Andrew already applied some patches from me that I think covered bootmem mirror allocations: commit fc6daaf93151877748f8096af6b3fddb147f22d6 mm/memblock: add extra "f

RE: [PATCH][RFC] mm: Introduce kernelcore=reliable option

2015-10-09 Thread Luck, Tony
> I understand if the mirrored regions are always at the start of the zone > today, but is that somehow guaranteed going forward on all future hardware? > > I think it's important to at least consider what we would do if DMA32 > turned out to be non-reliable. Current hardware can map one mirrored

RE: [PATCH][RFC] mm: Introduce kernelcore=reliable option

2015-10-09 Thread Luck, Tony
> I remember Kame has already suggested this idea. In my opinion, > I still think it's better to add a new migratetype or a new zone, > so both user and kernel could use mirrored memory. A new zone would be more flexible ... and probably the right long term solution. But this looks like a very cl

RE: [PATCH] tree wide: Use kvfree() than conditional kfree()/vfree()

2015-11-09 Thread Luck, Tony
> ACK for the ACPI changes (and CCing Tony and Boris for the heads-up as they > are way more familiar with the APEI code than I am). Sure. If kvfree() really is smart enough to figure it out then there is no point in the if (blah) kfree() else vfree(). The drivers/acpi/apei/erst.c code isn't doi

Re: [RFC PATCH 0/3] Machine check recovery when kernel accesses poison

2015-11-10 Thread Luck, Tony
On Tue, Nov 10, 2015 at 12:21:01PM +0100, Borislav Petkov wrote: > Just a general, why-do-we-do-this, question: on big systems, the memory > occupied by the kernel is a very small percentage compared to whole RAM, > right? And yet we want to recover from there too? Not, say, kexec... I need to add

Re: [PATCH 1/3] x86, ras: Add new infrastructure for machine check fixup tables

2015-11-10 Thread Luck, Tony
On Tue, Nov 10, 2015 at 12:21:16PM +0100, Borislav Petkov wrote: > > +# define _ASM_MCEXTABLE(from, to) \ > > Maybe add an intermediary macro which abstracts the table name: > > #define __ASM_EXTABLE(from, to, table) > ... > > and then do > > #define _ASM_EXTABLE(from,

Re: [PATCH 2/3] x86, ras: Extend machine check recovery code to annotated ring0 areas

2015-11-10 Thread Luck, Tony
On Tue, Nov 10, 2015 at 12:21:42PM +0100, Borislav Petkov wrote: > You could save a precious indentation level here: > > if (cfg->tolerant == 3) > goto clear; > > and add the "clear" label below. > > clear: > if (worst > 0) > mce_report_event(regs); >

[GIT PULL] pstore changes for 4.4

2015-11-03 Thread Luck, Tony
The following changes since commit 7379047d5585187d1288486d4627873170d0005a: Linux 4.3-rc6 (2015-10-18 16:08:42 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore for you to fetch changes up to 306e5c2a3cb45a0

Re: [PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error

2019-03-15 Thread Luck, Tony
On Fri, Mar 15, 2019 at 06:37:20PM +0100, Borislav Petkov wrote: > I think the shared code should not have any reference to modules because > it is supposed to be a library. Can you move the THIS_MODULE out of the > common file and into the actual module? Yes - Qiuxu did that already ... patch re

Re: [PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error

2019-03-15 Thread Luck, Tony
On Fri, Mar 15, 2019 at 07:02:06PM +0100, Borislav Petkov wrote: > On Fri, Mar 15, 2019 at 10:49:56AM -0700, Luck, Tony wrote: > > Yes - Qiuxu did that already ... patch reposted below. > > ... to which Arnd said that it were fragile because it might break if it > includes a THI

RE: [PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error

2019-03-15 Thread Luck, Tony
> Basically I cheat Kconfig, so if one driver is built-in and > the other is a loadable module, we compile both as built-in. My mail client may have munged that patch. Both "git am" and "patch -p1" barf when I try to apply it. Since it was small I tried to replicate manually, but I must have messe

RE: [PATCH] MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h

2019-08-21 Thread Luck, Tony
> sed -i -e 's/\(INTEL_FAM6_.*\)_MOBILE/\1_ULT/g' ${i} I think it would be better to change the one _ULT to _MOBILE, rather than all the _MOBILE to _ULT -Tony

Re: [PATCH v2 2/3] x86/cpu: Add new Intel Atom CPU model name

2019-08-21 Thread Luck, Tony
On Tue, Aug 20, 2019 at 04:57:35PM +0200, Peter Zijlstra wrote: > On Tue, Aug 20, 2019 at 12:48:05PM +0000, Luck, Tony wrote: > > > > >> +#define INTEL_FAM6_ATOM_AIRMONT_NP 0x75 /* Lightning Mountain */ > > > > > > What's _NP ? > > > >

Re: [PATCH v2 2/3] x86/cpu: Add new Intel Atom CPU model name

2019-08-22 Thread Luck, Tony
On Thu, Aug 22, 2019 at 12:29:55PM +0200, Peter Zijlstra wrote: > On Wed, Aug 21, 2019 at 01:18:46PM -0700, Luck, Tony wrote: > > On Tue, Aug 20, 2019 at 04:57:35PM +0200, Peter Zijlstra wrote: > > > As I mentioned above, there are some folks internally that think > > NP

[PATCH] x86/cpu: Add new Airmont variant to Intel family

2019-08-22 Thread Luck, Tony
On Thu, Aug 22, 2019 at 11:53:47AM -0700, Luck, Tony wrote: > On Thu, Aug 22, 2019 at 12:29:55PM +0200, Peter Zijlstra wrote: > > On Wed, Aug 21, 2019 at 01:18:46PM -0700, Luck, Tony wrote: > > > On Tue, Aug 20, 2019 at 04:57:35PM +0200, Peter Zijlstra wrote: > > > >

Re: [PATCH 0/5] Further sanitize INTEL_FAM6 naming

2019-08-22 Thread Luck, Tony
On Thu, Aug 22, 2019 at 12:23:06PM +0200, Peter Zijlstra wrote: > Lots of variation has crept in; time to collapse the lot again. Conceptually good. But I applied the series on top of tip/master and got a build error: CC arch/x86/kernel/cpu/common.o arch/x86/kernel/cpu/common.c:1031:19: e

RE: [PATCH 0/5] Further sanitize INTEL_FAM6 naming

2019-08-22 Thread Luck, Tony
> Looks like your scripts didn't anticipate the CPP gymnastics like: > > #define VULNWL_INTEL(model, whitelist) \ >VULNWL(INTEL, 6, INTEL_FAM6_##model, whitelist) Also INTEL_CPU_FAM6() macro -Tony

Re: [PATCH] MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h

2019-08-15 Thread Luck, Tony
On Thu, Aug 15, 2019 at 09:58:22AM +0200, Borislav Petkov wrote: > On Wed, Aug 14, 2019 at 04:40:30PM -0700, Tony Luck wrote: > > There are a few different subsystems in the kernel that depend on > > model specific behaviour (perf, EDAC, power, ...). Easier for just > > one person to have the task

Re: [PATCH] MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h

2019-08-15 Thread Luck, Tony
On Thu, Aug 15, 2019 at 07:54:55PM +0200, Borislav Petkov wrote: > On Thu, Aug 15, 2019 at 10:21:59AM -0700, Luck, Tony wrote: > > Like this? > > Actually, I was thinking you'd put it above the defines in the file > intel-family.h itself so that *everyone* who wants to

Re: [PATCH] MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h

2019-08-15 Thread Luck, Tony
On Thu, Aug 15, 2019 at 10:22:07PM +0200, Thomas Gleixner wrote: > On Thu, 15 Aug 2019, Luck, Tony wrote: > > On Thu, Aug 15, 2019 at 07:54:55PM +0200, Borislav Petkov wrote: > So we should document the list of valid and usable ones and either fixup > broken ones or document that th

RE: [PATCH] MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h

2019-08-16 Thread Luck, Tony
>> + * The defined symbol names have the following form: >> + * INTEL_FAM6{OPTFAMILY}_{MICROARCH}{OPTDIFF} > > I think you want to have the underscores in the template: > > INTEL_FAM6_{OPTFAMILY}_{MICROARCH}_{OPTDIFF} > > but no need to resend if this is the only issue - I'll fix it up when

RE: [PATCH -v2 0/5] Further sanitize INTEL_FAM6 naming

2019-08-27 Thread Luck, Tony
> I'm reposting because the version Ingo applied and partially fixed up still > generates build bot failure. Looks like this version gets them all. I built my standard config, allmodconfig and allyesconfig. Reviewed-by: Tony Luck What happens next? Will Ingo back out the previous set & his par
