Re: [PATCH v2 4/4] x86/mce: Add Zhaoxin LMCE support

2019-09-12 Thread Luck, Tony
On Tue, Sep 10, 2019 at 08:20:07AM +0000, Tony W Wang-oc wrote: > Zhaoxin newer CPUs support LMCE that compatible with Intel's > "Machine-Check Architecture", so add support for Zhaoxin LMCE > in mce/core.c. Your mailer included a header: Content-Language: zh-CN which seems to have made

RE: [PATCH 0/7] Address most issues when building with W=1

2019-09-13 Thread Luck, Tony
> Looks ok to me at a quick glance, ACK. Me too. Also ACK. -Tony

Re: [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support

2019-09-13 Thread Luck, Tony
On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote: > + /* Checks after this one are Intel/Zhaoxin-specific: */ > + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL && > + boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN) Is it time to have a big cleanup on how we handle

Re: [PATCH v3 4/4] x86/mce: Add Zhaoxin LMCE support

2019-09-16 Thread Luck, Tony
On Mon, Sep 16, 2019 at 11:37:18AM +0000, Tony W Wang-oc wrote: > Zhaoxin newer CPUs support LMCE that compatible with Intel's > "Machine-Check Architecture", so add support for Zhaoxin LMCE > in mce/core.c. > > Signed-off-by: Tony W Wang-oc > --- > arch/x86/kernel/cpu/mce/core.c | 35 ++
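For context on what "LMCE support" involves: a minimal sketch of the capability check, written as a pure C function over raw MSR values so it can run anywhere. It assumes the Intel SDM bit layout (MCG_SER_P bit 24 and MCG_LMCE_P bit 27 in IA32_MCG_CAP; lock bit 0 and LMCE-enable bit 20 in IA32_FEATURE_CONTROL) applies unchanged to Zhaoxin parts, as the patch description claims. The name lmce_usable() is hypothetical, loosely modelled on the kernel's lmce_supported() helper; reading the MSRs is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the Intel SDM; assumed to match on Zhaoxin. */
#define MCG_SER_P             (1ULL << 24) /* IA32_MCG_CAP: software error recovery */
#define MCG_LMCE_P            (1ULL << 27) /* IA32_MCG_CAP: local machine check */
#define FEAT_CTL_LOCKED       (1ULL << 0)  /* IA32_FEATURE_CONTROL: BIOS lock bit */
#define FEAT_CTL_LMCE_ENABLED (1ULL << 20) /* IA32_FEATURE_CONTROL: BIOS opted in */

/*
 * Hypothetical helper: decide whether LMCE can be used, given the raw
 * contents of IA32_MCG_CAP and IA32_FEATURE_CONTROL.
 */
static inline bool lmce_usable(uint64_t mcg_cap, uint64_t feat_ctl)
{
	/* LMCE depends on recovery support, so both capability bits are needed. */
	if ((mcg_cap & (MCG_SER_P | MCG_LMCE_P)) != (MCG_SER_P | MCG_LMCE_P))
		return false;

	/* An unlocked FEATURE_CONTROL means BIOS has not finalized its settings. */
	if (!(feat_ctl & FEAT_CTL_LOCKED))
		return false;

	return (feat_ctl & FEAT_CTL_LMCE_ENABLED) != 0;
}
```

In kernel code the two arguments would come from rdmsrl() on the respective MSRs.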

Re: [PATCH v3 4/4] x86/mce: Add Zhaoxin LMCE support

2019-09-17 Thread Luck, Tony
On Tue, Sep 17, 2019 at 06:54:05AM +0000, Tony W Wang-oc wrote: > But have a question about below codes: > if (mcgstatus & MCG_STATUS_RIPV) { > mce_wrmsrl(MSR_IA32_MCG_STATUS, 0); > return true; > } > These seems require all #MC exception errors set MCG_STATU
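The question turns on what the MCG_STATUS bits mean. A hedged sketch of how they decode, using the Intel SDM bit positions (RIPV bit 0, EIPV bit 1, MCIP bit 2, LMCES bit 3); the helpers mcg_can_resume() and mcg_is_local() are hypothetical, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits per the Intel SDM. */
#define MCG_STATUS_RIPV  (1ULL << 0) /* restart IP valid: can resume at saved IP */
#define MCG_STATUS_EIPV  (1ULL << 1) /* error IP valid: saved IP is the faulting insn */
#define MCG_STATUS_MCIP  (1ULL << 2) /* machine check in progress */
#define MCG_STATUS_LMCES (1ULL << 3) /* #MC delivered only to this CPU (LMCE) */

/* Hypothetical: true when execution may safely resume after the handler. */
static inline bool mcg_can_resume(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_RIPV) != 0;
}

/* Hypothetical: true when the #MC was local rather than broadcast. */
static inline bool mcg_is_local(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_LMCES) != 0;
}
```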

Re: [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long

2019-09-17 Thread Luck, Tony
On Tue, Sep 17, 2019 at 08:29:28AM +0000, David Laight wrote: > From: Tony Luck > > Sent: 16 September 2019 23:40 > > From: Fenghua Yu > > > > The x86_capability array in cpuinfo_x86 is defined as u32 and thus is > > naturally aligned to 4 bytes. But, set_bit() and clear_bit() require > > the arr
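A minimal user-space sketch of the alignment problem the patch addresses. The struct names and the array length of 20 are hypothetical. The point is that a u32 array placed after a smaller field is only guaranteed 4-byte alignment, while bitops that treat the array as unsigned long words want sizeof(unsigned long) alignment. This uses the C11 alignas specifier; in kernel code the equivalent would typically be an __aligned() attribute on the member.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define NCAPWORDS 20 /* hypothetical word count, for illustration only */

/* The array follows a byte, so only 4-byte alignment is guaranteed. */
struct caps_plain {
	uint8_t  family;
	uint32_t capability[NCAPWORDS];
};

/* Force unsigned-long alignment so word-sized bitops never straddle. */
struct caps_aligned {
	uint8_t  family;
	alignas(sizeof(unsigned long)) uint32_t capability[NCAPWORDS];
};
```

On a typical LP64 ABI such as x86-64, capability sits at offset 4 in the first struct and at offset 8 in the second.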

RE: pstore does not work under xen

2019-09-19 Thread Luck, Tony
> I have been investigating a regression in our environment where pstore > (efi-pstore specifically but I suspect this would affect all > implementations) no longer works after upgrading from a 4.4 to 5.0 > kernel when running under xen. (This is an Ubuntu kernel but I don't > think there are

Re: [PATCH v2 2/3] x86/cpu: Add new Intel Atom CPU model name

2019-08-20 Thread Luck, Tony
>> +#define INTEL_FAM6_ATOM_AIRMONT_NP 0x75 /* Lightning Mountain */ > > What's _NP ? Network Processor. But that is too narrow a descriptor. This is going to be used in other areas besides networking. I’m contemplating calling it AIRMONT2 -Tony

Re: [PATCH v2 2/3] x86/cpu: Add new Intel Atom CPU model name

2019-08-20 Thread Luck, Tony
> Author: Peter Zijlstra > Date: Tue Aug 7 10:17:27 2018 -0700 > >x86/cpu: Sanitize FAM6_ATOM naming > > > What 2 or 3 or other number means? In this case I want it to mean “This is an Airmont derived core. Mostly like original Airmont, so you might see some places where we have the s

Re: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()

2020-04-30 Thread Luck, Tony
On Thu, Apr 30, 2020 at 11:42:20AM -0700, Andy Lutomirski wrote: > I suppose there could be a consistent naming like this: > > copy_from_user() > copy_to_user() > > copy_from_unchecked_kernel_address() [what probe_kernel_read() is] > copy_to_unchecked_kernel_address() [what probe_kernel_write() i

Re: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()

2020-04-30 Thread Luck, Tony
On Thu, Apr 30, 2020 at 12:50:40PM -0700, Linus Torvalds wrote: I see your point about the naming being important. I think Dan's case is indeed "copy from pmem to user" where only options for faulting are #MC on the source addresses, and #PF on the destination. > The only *fundamental* access wo

RE: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()

2020-05-01 Thread Luck, Tony
> Now maybe copy_to_user() should *always* work this way, but I’m not convinced. > Certainly put_user() shouldn’t — the result wouldn’t even be well defined. > And I’m > unconvinced that it makes much sense for the majority of copy_to_user() > callers > that are also directly accessing the sour

Re: [PATCH 1/2] rtc/ia64: remove legacy efirtc driver

2019-10-23 Thread Luck, Tony
repeated "rtc-efi" at the start of the line is redundant). Acked-by: Tony Luck -Tony

RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
> If fd release cleans up then how should there be something in flight at > the final mmdrop? ENQCMD from the user is only synchronous in that it lets the user know their request has been added to a queue (or not). Execution of the request may happen later (if the device is busy working on reques

RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
>> So the driver needs to use flush/drain operations to make sure all >> the in-flight work has completed before releasing/re-using the PASID. >> > Are you suggesting we should let driver also hold a reference of the > PASID? The sequence for bare metal is: process is queuing requests to

RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
> There are two users of a PASID, mm and device driver(FD). If > either one is not done with the PASID, it cannot be reclaimed. As you > mentioned, it could take a long time for the driver to abort. If the > abort ends *after* mmdrop, we are in trouble. > If driver drops reference after abort/drain
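The lifetime rule under discussion (two users, the mm and the driver, and the PASID is reclaimed only after both are done) is ordinary reference counting. A single-threaded sketch with all names hypothetical; real kernel code would use refcount_t or atomics and return the ID to an allocator rather than setting a flag.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical PASID object shared by the mm and a device driver. */
struct pasid_ref {
	int pasid;
	int refs;          /* one reference per user still relying on the PASID */
	bool *freed_flag;  /* test hook: set when the PASID is released */
};

static struct pasid_ref *pasid_alloc(int pasid, bool *freed_flag)
{
	struct pasid_ref *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->pasid = pasid;
	p->refs = 1;       /* the mm's reference */
	p->freed_flag = freed_flag;
	*freed_flag = false;
	return p;
}

/* Driver takes its own reference before queuing work tagged with the PASID. */
static void pasid_get(struct pasid_ref *p)
{
	p->refs++;
}

/* Drop one reference; release only when the last user is done. */
static void pasid_put(struct pasid_ref *p)
{
	if (--p->refs == 0) {
		*p->freed_flag = true; /* stands in for freeing the ID */
		free(p);
	}
}
```

With this shape it no longer matters whether mmdrop or the driver's abort/drain finishes first.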

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-14 Thread Luck, Tony
On Mon, Oct 14, 2019 at 11:36:18PM +0200, Borislav Petkov wrote: > This description is already *begging* for this delay value to be > automatically set by the kernel. Putting yet another knob in front of > the user who doesn't have a clue most of the time shows one more time > that we haven't done
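One way to let the kernel pick the policy instead of exposing a knob is to report only throttling episodes that persist. The following is a hypothetical illustration of such a policy, not the code from this patch. It logs at most once per episode, and only after throttling has lasted at least min_ms. The caller supplies timestamps and the current throttle state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU throttle tracking; times in milliseconds. */
struct throttle_state {
	bool     active;   /* currently inside a throttling episode */
	bool     reported; /* already logged this episode */
	uint64_t start_ms; /* when the episode began */
};

/* Returns true when the caller should emit a log message. */
static bool throttle_should_log(struct throttle_state *s, bool throttled,
				uint64_t now_ms, uint64_t min_ms)
{
	if (!throttled) {
		s->active = false; /* episode over; the next one starts fresh */
		return false;
	}
	if (!s->active) {
		s->active = true;
		s->reported = false;
		s->start_ms = now_ms;
	}
	if (!s->reported && now_ms - s->start_ms >= min_ms) {
		s->reported = true;
		return true;
	}
	return false;
}
```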

RE: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-17 Thread Luck, Tony
>> That all sounds like the printk should be downgraded too, it is not a >> KERN_CRIT warning. It is more a notification that we're getting warm. > > Right, and I think we should take Benjamin's patch after all - perhaps > even tag it for stable if that message is annoying people too much - and > S

RE: [RFD] x86/split_lock: Request to Intel

2019-10-17 Thread Luck, Tony
> If that's not going to happen, then we just bury the whole thing and put it > on hold until a sane implementation of that functionality surfaces in > silicon some day in the not so foreseeable future. We will drop the patches to flip the MSR bits to enable checking. But we can fix the split loc

RE: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-17 Thread Luck, Tony
> * we throttle the machine from within the kernel - whatever that may mean > * if that doesn't help, we stop scheduling !root tasks > * if that doesn't help, we halt The silicon will do that "halt" step all by itself if the temperature continues to rise and hits the highest of the temperature thr

RE: [PATCH v2 03/33] ia64: Use pr_warn instead of pr_warning

2019-10-18 Thread Luck, Tony
> As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of > pr_warning"), removing pr_warning so all logging messages use a > consistent _warn style. Let's do it. Acked-by: Tony Luck

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Luck, Tony
On Fri, Oct 18, 2019 at 03:23:09PM +0200, Borislav Petkov wrote: > On Fri, Oct 18, 2019 at 05:26:36AM -0700, Srinivas Pandruvada wrote: > > Server/desktops generally rely on the embedded controller for FAN > > control, which kernel have no control. For them this warning helps to > > either bring i

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Luck, Tony
On Fri, Oct 18, 2019 at 09:45:03PM +0200, Borislav Petkov wrote: > On Fri, Oct 18, 2019 at 11:02:57AM -0700, Luck, Tony wrote: > > So what should we do next? > > I was simply keying off this statement of yours: > > "Depending on what we end up with from Srinivas ... w

Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Luck, Tony
On Thu, Feb 07, 2019 at 03:01:31PM +0100, Peter Zijlstra wrote: > On Thu, Feb 07, 2019 at 11:50:52AM +0000, Linus Torvalds wrote: > > If you re-generate the canonical address in __cpa_addr(), now we'll > > actually have the real virtual address around for a lot of code-paths > > (pte lookup etc), w

Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Luck, Tony
On Thu, Feb 07, 2019 at 06:57:20PM +0100, Peter Zijlstra wrote: > Something like so then? AFAICT CLFLUSH will also #GP if feed it crap. Correct. CLFLUSH will also #GP on a non-canonical address. > - __flush_tlb_one_kernel(__cpa_addr(cpa, i)); > + __flush_tlb_one_kernel(fix_

Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Luck, Tony
On Thu, Feb 07, 2019 at 10:07:28AM -0800, Andy Lutomirski wrote: > Joining this thread late... > > This is all IMO rather crazy. How about we fiddle with CR0 to turn off > the cache, then fiddle with page tables, then turn caching on? Or, heck, > see if there’s some chicken bit we can set to imp

Re: [PATCH] x86: avoid confusion over the new RESCTRL config prompt

2019-01-29 Thread Luck, Tony
On Wed, Jan 30, 2019 at 12:08:45AM +0100, Borislav Petkov wrote: > On Tue, Jan 29, 2019 at 05:52:18PM -0500, Johannes Weiner wrote: > > config X86_RESCTRL > > - bool "Resource Control support" > > + bool "x86 cache control support" > > Except that it is not only cache but memory (bandwidth) c

Re: [PATCH] x86/mce: Initialize "bank" when we find a fatal error in mce_no_way_out()

2019-02-01 Thread Luck, Tony
On Fri, Feb 01, 2019 at 10:55:53AM +0100, Borislav Petkov wrote: > On Thu, Jan 31, 2019 at 04:33:41PM -0800, Tony Luck wrote: > > if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= > > MCE_PANIC_SEVERITY) { > > + m->bank = i; > > So conceptually this write belongs

RE: [PATCH] x86: avoid confusion over the new RESCTRL config prompt

2019-02-01 Thread Luck, Tony
>> What about >> >> s/X86_RESCTRL/X86_CPU_RESCTRL/g > > Good idea. > > Tony, Babu, that look okay to you guys as well? For now. But very soon we will also have ARM_CPU_RESCTRL, and some of this code will become generic. Will we need an arch-independent name for the bits of code shared by arm an

Re: [PATCH] EDAC, sb_edac: remove redundant update of tad_base

2019-05-08 Thread Luck, Tony
On Wed, May 08, 2019 at 11:42:01PM +0100, Colin King wrote: > From: Colin Ian King > > The variable tad_base is being set to a value that is never read > and is being over-written on the next iteration of a for-loop. > This assignment is therefore redundant and can be removed. > > Addresses-Cove

Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

2019-05-21 Thread Luck, Tony
On Tue, May 21, 2019 at 10:29:02PM +0200, Borislav Petkov wrote: > > Can we do instead: > > -static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array); > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank, > mce_banks_array[MAX_NR_BANKS]); > > which should be something like 9*32 = 2

Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-17 Thread Luck, Tony
On Fri, May 17, 2019 at 09:34:31PM +0200, Borislav Petkov wrote: > On Fri, May 17, 2019 at 11:06:07AM -0700, Luck, Tony wrote: > > and thus end up with that extra level on indent for the rest > > of the function. > > Ok: > > @@ -1569,7 +1575,13 @@ static void __mchec

Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-18 Thread Luck, Tony
On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > Which reminds me, Tony, I think all those debugging files "pfn" > and "array" and the one you add now, should all be under a > CONFIG_RAS_CEC_DEBUG which is default off and used only for development. > Mind adding that too pls? Pat

Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-19 Thread Luck, Tony
On Fri, Apr 19, 2019 at 02:29:11AM +0200, Borislav Petkov wrote: > On Thu, Apr 18, 2019 at 05:07:45PM -0700, Luck, Tony wrote: > > On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > > > Which reminds me, Tony, I think all those debugging files "pfn" &

RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-22 Thread Luck, Tony
> Err, this all sounds to me like the storm detection code should > *automatically* disable the CEC in such cases, I'd say. Sounds good. But we should distinguish storms that have many different addresses from storms that just ping a few addresses. CEC will see counts hit the threshold in the lat
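To make the storm distinction concrete: a toy, single-threaded per-page counter in the spirit of the CEC. It is not the implementation in drivers/ras/cec.c (which packs PFN and count into one word and decays old counts); the table size, eviction rule and names are all hypothetical. Errors hammering one page reach the threshold quickly, while a storm spread over many addresses keeps evicting entries and never gets there.

```c
#include <stdbool.h>
#include <stdint.h>

#define CEC_SLOTS 8 /* hypothetical; kept small so the behaviour is easy to follow */

struct cec_entry {
	uint64_t pfn;
	unsigned int count;
};

struct cec {
	struct cec_entry e[CEC_SLOTS];
	unsigned int used;
	unsigned int threshold;
};

/*
 * Record one corrected error at @pfn. Returns true exactly once, when
 * that page's count reaches the threshold (i.e. it should be offlined).
 * When the table is full, the entry with the lowest count is evicted.
 */
static bool cec_add(struct cec *c, uint64_t pfn)
{
	unsigned int i, victim = 0;

	for (i = 0; i < c->used; i++)
		if (c->e[i].pfn == pfn)
			return ++c->e[i].count == c->threshold;

	if (c->used < CEC_SLOTS) {
		victim = c->used++;
	} else {
		for (i = 1; i < CEC_SLOTS; i++)
			if (c->e[i].count < c->e[victim].count)
				victim = i;
	}
	c->e[victim].pfn = pfn;
	c->e[victim].count = 1;
	return c->threshold == 1;
}
```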

RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-22 Thread Luck, Tony
> Now, if you still want to know how many errors and where they happened > and when they happened and yadda yadda, you *disable* the CEC. Rebooting isn't popular in many end user situations. Many CSP (cloud service providers) vehemently hate the idea of rebooting. -Tony

RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-22 Thread Luck, Tony
>> Rebooting isn't popular in many end user situations. Many CSP (cloud >> service providers) vehemently hate the idea of rebooting. > > I meant disable in Kconfig - not build it in at all. If rebooting is bad, then re-compiling and rebooting is 100x worse. :-) -Tony

RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-22 Thread Luck, Tony
> I think we're talking past each other here: I mean disable the CEC > *forever* and *never* use it. Use only a userspace agent and log errors > with it. > > Makes sense? Not really. We want pretty much everyone to enable and use CEC. That way people don't bother us about the occasional neutron s

Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-22 Thread Luck, Tony
On Mon, Apr 22, 2019 at 07:15:32PM +0200, Borislav Petkov wrote: > On Mon, Apr 22, 2019 at 03:59:16PM +0000, Luck, Tony wrote: > > > Err, this all sounds to me like the storm detection code should > > > *automatically* disable the CEC in such cases, I'd say. > >

RE: [PATCH 1/3] tty: simserial: drop unused iflag macro

2019-04-26 Thread Luck, Tony
> Drop the RELEVANT_IFLAG() macro which hasn't been used for over a > decade. > > Cc: Tony Luck > Cc: Fenghua Yu > Signed-off-by: Johan Hovold > --- > arch/ia64/hp/sim/simserial.c | 2 -- > 1 file changed, 2 deletions(-) Acked-by: Tony Luck

RE: DISCONTIGMEM is deprecated

2019-04-29 Thread Luck, Tony
> ia64 has a such a huge number of memory model choices. Maybe we > need to cut it down to a small set that actually work. SGI systems had extremely discontiguous memory (they used some high order physical address bits in the tens/hundreds of terabyte range for the node number ... so there would

RE: ERROR: "paddr_to_nid" [drivers/md/raid1.ko] undefined!

2019-05-03 Thread Luck, Tony
From: Randy Dunlap [mailto:rdun...@infradead.org] >>ERROR: "paddr_to_nid" [drivers/block/brd.ko] undefined! >>ERROR: "paddr_to_nid" [crypto/ccm.ko] undefined! >> > > --- > Exporting paddr_to_nid() in arch/ia64/mm/numa.c fixes all of these build > errors. > Is there a problem with doing t

RE: ERROR: "paddr_to_nid" [drivers/md/raid1.ko] undefined!

2019-05-03 Thread Luck, Tony
>> Exporting paddr_to_nid() in arch/ia64/mm/numa.c fixes all of these build >> errors. >> Is there a problem with doing that? > > I don't see a problem with exporting it. But I also don't see these build errors. I'm using the same HEAD commit. I think the same .config (derived from arch/ia64/co

RE: [PATCH] arch: ia64: sn: pci: Use kmemdup in tioce_bus_fixup

2019-08-12 Thread Luck, Tony
> arch/ia64/sn/pci/tioce_provider.c | 4 ++-- Thanks for the patch, but Christoph is working on a patch series that deletes all of arch/ia64/sn/ -Tony

RE: fix misc compiler warnings in the ia64 build

2019-08-12 Thread Luck, Tony
> this little series fixes various warnings I see in ia64 builds. Applied. Thanks. [I assume you are using some up-to-date version of gcc that generates these warnings ... I'm not seeing them, but I'm still using a compiler from the stone age] -Tony

RE: [PATCH] EDAC, pnd2: Fix ioremap() size in dnv_rd_reg() from 64K -> 32K

2019-08-08 Thread Luck, Tony
- base = ioremap((resource_size_t)addr, 0x10000); + base = ioremap((resource_size_t)addr, 0x8000); Changing one magic value for another. :-( Do different BIOS do different things? I don't recall seeing this error (but perhaps I missed it, or perhaps the kernel has ad

Re: [PATCH] EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()

2019-08-09 Thread Luck, Tony
On Fri, Aug 09, 2019 at 02:18:02PM +0000, Stephen Douthit wrote: > Depending on how BIOS has marked the reserved region containing the 32KB > MCHBAR you can get warnings like: > > resource sanity check: requesting [mem 0xfed1-0xfed1], which spans > more than reserved [mem 0xfed1-0xfed
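The warning is range arithmetic: the requested mapping [addr, addr + size - 1] must fall inside the region BIOS reserved. A small sketch of that check; the base address used in the test is hypothetical, because the real MCHBAR address is truncated in the log above. 0x10000 is 64 KB and 0x8000 is 32 KB, so only the 32 KB mapping fits a 32 KB reservation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Does [req_start, req_start + req_size - 1] fit inside
 * [res_start, res_start + res_size - 1]?  Sizes must be non-zero;
 * wrap-around at the top of the address space is not handled.
 */
static inline bool range_within(uint64_t req_start, uint64_t req_size,
				uint64_t res_start, uint64_t res_size)
{
	uint64_t req_end = req_start + req_size - 1;
	uint64_t res_end = res_start + res_size - 1;

	return req_start >= res_start && req_end <= res_end;
}
```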

Re: [PATCH] x86/asm: Add support for MOVDIR64B instruction

2019-08-01 Thread Luck, Tony
On Thu, Aug 01, 2019 at 12:03:41PM +0200, Borislav Petkov wrote: > On Wed, Jul 31, 2019 at 02:05:54AM +0300, Kirill A. Shutemov wrote: > > Several upcoming patchsets will make use of the helper. > > ... so why aren't you sending it together with its first user? Just to get another of the non-cont

Re: [PATCH] x86/asm: Add support for MOVDIR64B instruction

2019-08-01 Thread Luck, Tony
On Thu, Aug 01, 2019 at 10:43:48PM +0300, Alexey Dobriyan wrote: > > +static inline void movdir64b(void *dst, const void *src) > > +{ > > + /* movdir64b [rdx], rax */ > > + asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02" > > + : "=m" (*(char *)dst) >

RE: [PATCH] x86/asm: Add support for MOVDIR64B instruction

2019-08-01 Thread Luck, Tony
> I think Tony's in the right direction. We already do dst "sizing" like > that for the compiler in clwb(). The clwb case does look like what we want for movdir64b(). But is it right for clwb() ... that doesn't modify anything, just pushes things from cache to memory. So why is it using "+m"? -T

Re: [PATCH] MAINTAINERS: update EDAC entry to reflect current tree and maintainers

2019-07-25 Thread Luck, Tony
ub/scm/linux/kernel/git/bp/bp.git for-next > > -T: git > > git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac.git > > linux_next > > +T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git > > edac-for-next > > S: Supported > > F: Documentation/admin-guide/ras.rst > > F: Documentation/driver-api/edac.rst > > -- > > Acked-by: Borislav Petkov > Acked-by: Tony Luck -Tony

[PATCH V2] IB/core: Add mitigation for Spectre V1

2019-07-30 Thread Luck, Tony
Some processors may mispredict an array bounds check and speculatively access memory that they should not. With a user supplied array index we like to play things safe by masking the value with the array size before it is used as an index. Signed-off-by: Tony Luck --- V2: Mask the index *AFTER
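The masking described above is what the kernel's array_index_nospec() helper in <linux/nospec.h> provides. A user-space sketch of its generic, branch-free mask computation follows. The helper names are hypothetical, the sketch omits the compiler barrier the kernel adds, and x86 kernels use an architecture-specific asm version instead. It assumes index and size are at most LONG_MAX and that right-shifting a negative long is arithmetic (true for GCC and Clang, implementation-defined in ISO C).

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * ~0UL when index < size, 0 otherwise, with no conditional branch:
 * if index >= size, (size - 1 - index) wraps and sets the top bit.
 */
static inline unsigned long index_mask_nospec(unsigned long index,
					      unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG - 1));
}

/* Clamp an untrusted index to 0 when it is out of bounds. */
static inline unsigned long index_nospec(unsigned long index,
					 unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

A caller would apply it after the ordinary bounds check, e.g. idx = index_nospec(idx, ARRAY_SIZE(tbl)), so that even a mispredicted check yields index 0 rather than an attacker-chosen one.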

RE: [PATCH] ia64:unwind: fix double free for mod->arch.init_unw_table

2019-08-06 Thread Luck, Tony
> Here, set mod->arch.init_unw_table = NULL after remove the unwind > table to avoid double free. Applied. Thanks. -Tony

RE: remove sn2, hpsim and ia64 machvecs

2019-08-07 Thread Luck, Tony
I like the idea ... and it sure gets rid of a lot of code. > A git tree is also available at: > >git://git.infradead.org/users/hch/misc.git ia64-remove-machvecs I grabbed this tree and ran through my build scripts. I found that vmlinux.gz doesn't get built. Which is odd, because I don't see

RE: remove sn2, hpsim and ia64 machvecs

2019-08-07 Thread Luck, Tony
> Even if I explicitly run: > > $ make compressed > > It still doesn't build it. Weird. Ugh! The rule to do the compression was in arch/ia64/hp/sim/boot/Makefile which went away as part of the deletion of hpsim. -Tony

Re: remove sn2, hpsim and ia64 machvecs

2019-08-07 Thread Luck, Tony
On Wed, Aug 07, 2019 at 01:26:17PM -0700, Luck, Tony wrote: > Ugh! The rule to do the compression was in arch/ia64/hp/sim/boot/Makefile > which went away as part of the deletion of hpsim. This fixes it ... should fold into the patch that dropped the arch/ia64/hp/sim/boot/Makefile I ju

Re: remove sn2, hpsim and ia64 machvecs

2019-08-08 Thread Luck, Tony
On Thu, Aug 08, 2019 at 08:51:23AM +0200, 'Christoph Hellwig' wrote: > On Wed, Aug 07, 2019 at 04:07:37PM -0700, Luck, Tony wrote: > > On Wed, Aug 07, 2019 at 01:26:17PM -0700, Luck, Tony wrote: > > > Ugh! The rule to do the compression was in arch/ia64/hp/sim/boot/Ma

Re: [PATCH] EDAC, ie31200: Add Intel Coffee Lake CPU support

2019-06-10 Thread Luck, Tony
On Sun, Jun 09, 2019 at 05:16:13PM +0200, Marco Elver wrote: Marco, Thanks for the patch. One comment below. > - { > - PCI_VEND_DEV(INTEL, IE31200_HB_1), PCI_ANY_ID, PCI_ANY_ID, 0, 0, > - IE31200}, > - { > - PCI_VEND_DEV(INTEL, IE31200_HB_2), PCI_ANY_I

RE: [PATCH v2 2/2] EDAC, ie31200: Reformat PCI device table

2019-06-10 Thread Luck, Tony
> Reformat device table after Coffee Lake additions to be more readable. I like that you put the reformat second ... if some old version needs a backport to get Coffee Lake support they can just take part 1 to get the functionality and then decide whether or not to take part 2. Both parts: Acked

Re: [PATCH] EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec

2019-06-27 Thread Luck, Tony
On Thu, Jun 27, 2019 at 06:11:18PM +0100, James Morse wrote: > Hello, > > (CC: +Tony Luck. > Original Patch: lore.kernel.org/r/20190626054011.30044-1-de...@etsukata.com ) Heh: My mail agent "helpfully" made that clickable, but as a "mailto:" URL rather than an https: one! > > On 26/06/2019 06:

[GIT PULL] EDAC driver changes for v5.3

2019-07-08 Thread Luck, Tony
The following changes since commit 9e0babf2c06c73cda2c0cd37a1653d823adb40ec: Linux 5.2-rc5 (2019-06-16 08:49:45 -1000) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-for_5.3 for you to fetch changes up to d8655e7630dafa88b

Re: [PATCH] x86, mce: Fix machine_check_poll() tests for which errors to log

2019-03-11 Thread Luck, Tony
On Mon, Mar 11, 2019 at 08:25:53PM +0000, Ghannam, Yazen wrote: > > + if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S)) > > + goto log_it; > > + > > Can you please include a vendor check with this? MCi_STATUS[56] is > not defined the same way on AMD system
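A sketch of the vendor check being requested: the "neither PCC nor S" shortcut is trusted only on Intel, where bit 57 is PCC (processor context corrupt) and bit 56 is S (signaled). The enum and function name are hypothetical. On other vendors the shortcut simply does not apply, and the caller falls back to its other checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits per the Intel SDM. */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56) /* error was signaled via #MC */

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD }; /* hypothetical */

/* True only when the "neither PCC nor S set" shortcut applies. */
static inline bool log_shortcut_applies(enum cpu_vendor v, uint64_t status)
{
	if (v != VENDOR_INTEL)
		return false; /* bit 56 is defined differently elsewhere */
	return !(status & (MCI_STATUS_PCC | MCI_STATUS_S));
}
```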

RE: [RFC PATCH] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section

2019-02-13 Thread Luck, Tony
> I think the last time this came up, it was said that those people still > running Linux on Itanium were running old distro kernels, not upstream. > > So yeah, we could probably do whatever and nobody would ever notice, > except maybe Al, who is rumoured to still have an ia64 :-) I haven't heard

[PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error

2019-03-21 Thread Luck, Tony
From: Qiuxu Zhuo Kbuild failed on the kernel configurations below: CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=m CONFIG_EDAC_I10NM=y or CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m Failed log: ... CC [M] drivers/edac/skx

Re: [PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error

2019-03-22 Thread Luck, Tony
On Fri, Mar 22, 2019 at 03:00:25PM +0100, Arnd Bergmann wrote: > Sorry, this was my mistake, my email was garbled. The patch was > correct though: the idea here is not to change the Kconfig symbols > but to change the Makefile to do the right thing even when Kconfig > is set wrong. Well this does

[PATCH] MAINTAINERS: Update entry for EDAC-SKYLAKE

2019-03-25 Thread Luck, Tony
Code refactoring to share some source code with a new EDAC driver resulted in renaming one file (skx_edac.c became skx_base.c) and adding a new file (skx_common.c). Update the file pattern in MAINTAINERS to take account of this change. Reported-by: Joe Perches Fixes: 98f2fc829e3b ("EDAC, skx_eda

[PATCH] MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE

2019-03-25 Thread Luck, Tony
Code restructuring renamed arch/x86/kernel/cpu/mcheck/ to be arch/x86/kernel/cpu/mce/ Update the MAINTAINERS file pattern to account for this change. Fixes: 21afaf181362 ("x86/mce: Streamline MCE subsystem's naming") Reported-by: Joe Perches Signed-off-by: Tony Luck --- MAINTAINERS | 2 +- 1 f

[PATCH] MAINTAINERS: Add entry for EDAC-I10NM

2019-03-25 Thread Luck, Tony
We forgot to update the MAINTAINERS file when adding this new driver. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Tony Luck --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index e5f3230d3f1

Re: [PATCH v3 08/34] ia64: mm: Add p?d_large() definitions

2019-03-04 Thread Luck, Tony
On Mon, Mar 04, 2019 at 01:16:47PM +0000, Steven Price wrote: > On 01/03/2019 21:57, Kirill A. Shutemov wrote: > > On Wed, Feb 27, 2019 at 05:05:42PM +0000, Steven Price wrote: > >> walk_page_range() is going to be allowed to walk page tables other than > >> those of user space. For this it needs t

[PATCH] EDAC, {skx|i10nm}_edac: Fix randconfig build error

2019-03-06 Thread Luck, Tony
From: Qiuxu Zhuo Kbuild failed on the kernel configurations below: CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=m CONFIG_EDAC_I10NM=y or CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m Failed log: ... CC [M] drivers/edac/skx_c

Re: [REVIEW][PATCH 0/3] signal/ia64: siginfo fixes and cleanups

2018-09-24 Thread Luck, Tony
ree. If you feel it > should go through your arch tree let me know. All of the prerequisites > should have been merged several releases ago. Sure. Merge away. Acked-by: Tony Luck -Tony

Re: [PATCH] Raise maximum number of memory controllers

2018-09-25 Thread Luck, Tony
On Tue, Sep 25, 2018 at 05:26:59PM +0200, Borislav Petkov wrote: > On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote: > > We observe an oops in the skx_edac module during boot. > > Examining /var/log/messages: > > [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller

[PATCH] EDAC: Don't add devices under /sys/bus/edac

2018-10-01 Thread Luck, Tony
Nobody(*) uses them. Dropping this will allow us to make the total number of memory controllers configurable (as we won't have to worry about duplicated device names under this directory). (*) https://marc.info/?l=linux-edac&m=153809709903987&w=2 Signed-off-by: Tony Luck --- Boris: Apply this,

[PATCH V2] x86/mce: Fix set_mce_nospec() to avoid #GP fault

2018-08-31 Thread Luck, Tony
The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while we update the 1:1 map used to work when we wanted to unmap the page. But it falls down horribly when we try to directly set the page as uncacheable. The problem is that when we change the cach
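A sketch of the address trick, assuming 4-level paging (48-bit virtual addresses, so bits 63..47 of a canonical address are all equal). Flipping bit 63 turns a canonical address into a non-canonical one that faults with #GP if dereferenced, and copying bit 62 back into bit 63 recovers the original. The function names are hypothetical; the casts rely on GCC/Clang semantics for signed right shifts and for converting out-of-range uint64_t values to int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit addressing, bits 63..47 must all match bit 47. */
static inline bool is_canonical_48(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
}

/* Flip bit 63 so the value cannot be used as a pointer by accident. */
static inline uint64_t make_decoy(uint64_t addr)
{
	return addr ^ (1ULL << 63);
}

/* Undo the flip: sign-extend from bit 62, which still holds the original bit. */
static inline uint64_t restore_addr(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 1) >> 1);
}
```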

Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake

2018-06-07 Thread Luck, Tony
On Fri, May 25, 2018 at 02:42:09PM -0700, Tony Luck wrote: > Currently we just check the "CAPID0" register to see whether the CPU > can recover from machine checks. > > But there are also some special SKUs which do not have all advanced > RAS features, but do enable machine check recovery for use

Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake

2018-06-07 Thread Luck, Tony
On Thu, Jun 07, 2018 at 10:24:46PM +0200, Borislav Petkov wrote: > On Thu, Jun 07, 2018 at 01:18:31PM -0700, Dan Williams wrote: > > I'm making an effort to get all persistent memory error handling holes > > covered this cycle, so I think it makes sense for this to go through > > the nvdimm tree. T

Re: PROBLEM: mce: [Hardware Error] from dmesg -l emerg

2018-05-21 Thread Luck, Tony
On Mon, May 21, 2018 at 05:31:52PM +0530, Jeffrin Thalakkottoor wrote: > > Ok, but please do not top-post. > > Ok > > > Looks like mcelog has trouble decoding this. Have you updated mcelog to > > the latest version in your distro? > . > mcelog 153+dfsg-1 So this is

Re: PROBLEM: mce: [Hardware Error] from dmesg -l emerg

2018-05-21 Thread Luck, Tony
I guess I didn't explain that very clearly. I need all the lines in between. How about this: $ sudo dmesg -r | grep -C 30 Bank -Tony

Re: PROBLEM: mce: [Hardware Error] from dmesg -l emerg

2018-05-21 Thread Luck, Tony
On Tue, May 22, 2018 at 02:43:37AM +0530, Jeffrin Thalakkottoor wrote: > mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: ee40110b > mce: [Hardware Error]: TSC 0 ADDR 16080 MISC 5040008086 > mce: [Hardware Error]: PROCESSOR 0:306d4 TIME 1526932210 SOCKET 0 APIC > 0 microcode 2a T
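The hex value after "PROCESSOR 0:" in the log is the CPUID leaf 1 signature. A sketch of decoding it, following the family/model rules used by the kernel's x86_family() and x86_model() helpers: the extended family is added only for family 0xf, and the extended model only for family 6 and above. For 0x306d4 this gives family 6, model 0x3d, stepping 4. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode a CPUID leaf 1 EAX signature. */
static inline struct cpu_sig decode_cpu_sig(uint32_t eax)
{
	struct cpu_sig s;

	s.stepping = eax & 0xf;
	s.model    = (eax >> 4) & 0xf;
	s.family   = (eax >> 8) & 0xf;

	if (s.family == 0xf)
		s.family += (eax >> 20) & 0xff;      /* extended family */
	if (s.family >= 0x6)
		s.model += ((eax >> 16) & 0xf) << 4; /* extended model */

	return s;
}
```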

v4.16+ seeing many unaligned access in dequeue_task_fair() on IA64

2018-04-02 Thread Luck, Tony
v4.16 boots cleanly. But with the first bunch of merges (Linus HEAD = 46e0d28bdb8e6d00e27a0fe9e1d15df6098f0ffb) I see a bunch of: ia64_handle_unaligned: 4863 callbacks suppressed kernel unaligned access to 0xe0031660fd74, ip=0xa001000f23e0 kernel unaligned access to 0xe0033bdffbcc, ip=

RE: v4.16+ seeing many unaligned access in dequeue_task_fair() on IA64

2018-04-02 Thread Luck, Tony
> kernel unaligned access to 0xe0031660fd74, ip=0xa001000f23e0 > kernel unaligned access to 0xe0033bdffbcc, ip=0xa001000f2370 Here's the disassembly of dequeue_task_fair() in case it would help to see which two instructions are getting all the faults: a001000f21c0 : a001000

RE: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

2017-02-01 Thread Luck, Tony
> I was asking for requirements, not a design proposal. In order to make a > design you need a requirements specification. Here's what I came up with ... not a fully baked list, but should allow for some useful discussion on whether any of these are not really needed, or if there is a glaring ho

Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Luck, Tony
On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote: > On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote: > > One possible timing sequence would be: > > 1st kernel running on multiple cpus panicked > > then the crash dump code starts > > the crash dump code stops the others cp

Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Luck, Tony
On Mon, Jan 23, 2017 at 06:51:30PM +0100, Borislav Petkov wrote: > Hey Tony, > > a "welcome back" is in order? :-) Yes - first day back today. Lots of catching up to do. > And apparently crash knows about poisoned pages and handles them: > > static int __init crash_save_vmcoreinfo_init(void) >

Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface

2016-07-13 Thread Luck, Tony
On Wed, Jul 13, 2016 at 02:47:30PM +0200, Thomas Gleixner wrote: > On Tue, 12 Jul 2016, Fenghua Yu wrote: > > +3. Hierarchy in rscctrl > > +=== > > What means rscctrl? > > You were not able to find a more cryptic acronym? rscctrl == resource control Intel marketing would (pr

Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface

2016-07-14 Thread Luck, Tony
On Thu, Jul 14, 2016 at 08:53:17AM +0200, Thomas Gleixner wrote: > > Happy to take suggestions for something in between those > > extremes :-) > > I'd suggest "resctrl" and the abbreviation dictionaries tell me that the most > common ones for resource are: R, RESORC, RES OK. "resctrl" it is. > A

Re: [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT

2016-07-25 Thread Luck, Tony
On Mon, Jul 25, 2016 at 11:31:24AM -0500, Nilay Vaish wrote: > I was thinking more about this software caching of CLOSids. How > likely do you think these CLOSids would be found cached? I think the > software cache would be very infrequently accessed, so it seems you > are likely to miss these in

Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management

2016-07-25 Thread Luck, Tony
the 80% when they are on their own socket and the spare 20% if they wander off to the other socket. Sent from my iPhone > On Jul 25, 2016, at 19:13, Marcelo Tosatti wrote: > >> On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote: >>> On Fri, Jul 22, 2016 at 04:1
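One way to make the 80%/20% arrangement concrete is to note that a CLOSid carries one L3 capacity bitmask (CBM) per socket. CAT requires the set bits in each CBM to be contiguous. The sketch below assumes a hypothetical 20-way L3 and two sockets. The task's home socket gets the 16-way block and the other socket gets the remaining 4 ways. `cbm_range()`, `split_80_20()` and `L3_WAYS` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define L3_WAYS 20 /* hypothetical cache geometry */

/* Contiguous mask of @count ways starting at way @start. */
static uint32_t cbm_range(unsigned int start, unsigned int count)
{
	assert(count >= 1 && start + count <= 32);
	return (uint32_t)(((1ULL << count) - 1) << start);
}

/* Per-socket CBMs for one CLOSid on a two-socket system: 80% of the
 * ways on the home socket, the spare 20% on the remote socket. */
static void split_80_20(unsigned int home, uint32_t cbm[2])
{
	unsigned int big = L3_WAYS * 80 / 100; /* 16 ways */

	assert(home < 2);
	cbm[home] = cbm_range(0, big);               /* 0x0ffff */
	cbm[!home] = cbm_range(big, L3_WAYS - big);  /* 0xf0000 */
}
```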

Re: [PATCH] cacheinfo: Introduce cache id

2016-07-01 Thread Luck, Tony
On Fri, Jul 01, 2016 at 12:21:43PM +0200, Borislav Petkov wrote: > On Wed, Jun 29, 2016 at 06:56:10PM -0700, Fenghua Yu wrote: > > From: Fenghua Yu > > > > Each cache node is described by cacheinfo and is a unique node across > > What is a cache node? Clearly not a good name for the concept we

Re: [PATCH] cacheinfo: Introduce cache id

2016-07-01 Thread Luck, Tony
> Basically all cache indices carry the APIC ID of the core, so L1D on > CPU0 has ID 0 and then L1I has ID 0 too and then L2 has also the same > ID. > > How does that look on a CAT system? Do all the different cache levels > get different IDs? For CAT we only need the IDs to be unique at each lev
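To illustrate "unique at each level": the sketch below assumes the ID is the APIC ID shifted right by ceil(log2(threads sharing that cache)). Under that scheme all CPUs behind one cache agree on its ID. L1D, L1I and L2 of CPU0 can all be 0 without conflict, because IDs are only compared within a level. `order_of()` and `cache_id()` are illustrative helpers, not the patch's functions.

```c
#include <assert.h>

/* Smallest n with (1u << n) >= x, i.e. ceil(log2(x)), for x >= 1. */
static unsigned int order_of(unsigned int x)
{
	unsigned int n = 0;

	assert(x >= 1);
	while ((1u << n) < x)
		n++;
	return n;
}

/* CPUs sharing one cache map to the same ID; IDs need only be
 * unique among caches of the same level. */
static unsigned int cache_id(unsigned int apicid, unsigned int threads_sharing)
{
	return apicid >> order_of(threads_sharing);
}
```

For example, with two hyperthreads per core, APIC IDs 4 and 5 share an L2 with ID 2. With 16 threads per L3, APIC ID 17 lands in L3 number 1.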

RE: [PATCH 12/20] x86, edac: use Intel family name macros for edac driver

2016-06-02 Thread Luck, Tony
> Another straightforward replacement of magic numbers. It would be if I hadn't forgotten that INTEL_FAM6_MODEL_BROADWELL_XEON_D had a separate model number from the other Broadwell Xeons when I switched the driver from PCI device lookup to cpu model number. This needs to add an entry for BDX-D

RE: [PATCH 12/20] x86, edac: use Intel family name macros for edac driver

2016-06-02 Thread Luck, Tony
> This needs to add an entry for BDX-DE (use the same table initializer). > Probably as > a separate patch before/after this. Oops ... a bit worse than that. I assumed that index into the array matches the enum ... (with a comment!) ... having two entries for the same "type" would break that. I'
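A sketch of one common way to avoid the "array index must match the enum" trap described here. Per-type data is indexed with designated initializers, and a separate model-to-type table lets two CPU models (Broadwell-EP/EX and Broadwell-DE) share one type. The enum, table and function names are hypothetical. The model numbers are the Intel family-6 values as I understand them (0x3F, 0x4F, 0x56) and should be checked against `intel-family.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver "types"; several CPU models may share one. */
enum sketch_type { HASWELL, BROADWELL, NR_TYPES };

/* Designated initializers keep entries tied to their enum value. */
static const char *const type_name[NR_TYPES] = {
	[HASWELL]   = "Haswell",
	[BROADWELL] = "Broadwell",
};

struct model_map {
	unsigned int model;
	enum sketch_type type;
};

/* Two models can map to one type without disturbing type_name[]. */
static const struct model_map models[] = {
	{ 0x3F, HASWELL },   /* Haswell-EP/EX */
	{ 0x4F, BROADWELL }, /* Broadwell-EP/EX */
	{ 0x56, BROADWELL }, /* Broadwell-DE: same type, own model number */
};

/* Returns the type for @model, or -1 if the model is unsupported. */
static int model_to_type(unsigned int model)
{
	for (size_t i = 0; i < sizeof(models) / sizeof(models[0]); i++)
		if (models[i].model == model)
			return (int)models[i].type;
	return -1;
}
```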

RE: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.

2016-05-31 Thread Luck, Tony
>> -m.bank = 1; >> +m.bank = mca_cfg.banks; > > There's struct cper_sec_mem_err.bank. Why aren't we copying that? Because that is DDR3/DDR4 "bank" (internal DIMM detail) as opposed to machine check "bank" (CPU microarchitecture detail). We need the latter here. -Tony

RE: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.

2016-05-31 Thread Luck, Tony
> Btw, would it have any benefit of writing a "magic" value in m.bank > to denote the error comes from APEI instead of number of banks which > differs between generations? > > Something like > > m.bank = -1; > > or so? That might be a bit more obvious than my subtle "one more than possible o
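The two options in this thread can be compared side by side. The patch stores the bank count, which is one past the highest real bank on that CPU. The suggestion is a fixed magic value. This sketch uses a hypothetical, trimmed-down record and assumes the bank field is an 8-bit unsigned value, which is my understanding of `bank` in the kernel's `struct mce`. Under that assumption, `m.bank = -1` would be stored as 255.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down MCE record with an 8-bit bank number. */
struct mce_sketch {
	uint8_t bank;
};

#define APEI_BANK_MAGIC ((uint8_t)-1) /* 255 */

/* Option in the patch: one more than the highest possible bank. */
static void mark_apei_by_count(struct mce_sketch *m, unsigned int nr_banks)
{
	assert(nr_banks < APEI_BANK_MAGIC); /* must fit and not hit 255 */
	m->bank = (uint8_t)nr_banks;
}

/* Option suggested in review: a fixed magic value. */
static void mark_apei_by_magic(struct mce_sketch *m)
{
	m->bank = APEI_BANK_MAGIC;
}

/* A consumer needs this CPU's bank count to spot the first form... */
static bool is_apei_by_count(const struct mce_sketch *m, unsigned int nr_banks)
{
	return m->bank == nr_banks;
}

/* ...but not the second. */
static bool is_apei_by_magic(const struct mce_sketch *m)
{
	return m->bank == APEI_BANK_MAGIC;
}
```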

RE: [PATCH] x86/MCE: Remove MCP_TIMESTAMP

2016-11-07 Thread Luck, Tony
> So, get rid of all that and simply log an MCE with a TSC value always. > Simplifies the code a bit too. I'm not necessarily opposed to this ... but there was once some logic behind when we logged TSC, and when we didn't. Essentially we wanted the TSC when we were logging from #CMCI or #MC be

RE: [PATCH] x86/MCE: Remove MCP_TIMESTAMP

2016-11-07 Thread Luck, Tony
> One other possibility would be to use ->time and write ->tsc *only* > when exact - i.e., in the handler - and this is then enough info about > timing. > > ->time will give you somewhere around where it happened and ->tsc - only > if set - will give you exact, well, *timestamp* :) > > This sounds
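The scheme proposed here can be sketched as follows. Every record gets a coarse wall-clock `time`. The `tsc` field is filled in only when logging from the #MC or CMCI handler, where it is exact, and not from a periodic poll that finds the error some time later. The field names follow the discussion. The struct and `log_sketch()` are hypothetical, and the convention that 0 means "no TSC captured" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mce_log_sketch {
	uint64_t time; /* coarse wall-clock seconds, always set */
	uint64_t tsc;  /* exact timestamp, 0 when not captured */
};

/* @from_handler: true when called from the #MC or CMCI handler,
 * false when a periodic poll found the error after the fact. */
static void log_sketch(struct mce_log_sketch *m, uint64_t now_secs,
		       uint64_t now_tsc, bool from_handler)
{
	m->time = now_secs;
	m->tsc = from_handler ? now_tsc : 0;
}
```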

Re: [PATCH 22/25] x86/mcheck: Do the init in one place

2016-11-07 Thread Luck, Tony
On Mon, Nov 07, 2016 at 07:45:32PM +0100, Borislav Petkov wrote: > On Thu, Nov 03, 2016 at 03:50:18PM +0100, Sebastian Andrzej Siewior wrote: > > Part of the init (memory allocation and so on) is done > > in mcheck_cpu_init(). While moving the allocation to > > mcheck_init_device() (where the h

RE: [PATCH] x86/MCE: Remove MCP_TIMESTAMP

2016-11-08 Thread Luck, Tony
> This still preserves the precise TSC timestamp in intel_threshold_interrupt(). Yup - this looks right. Acked-by: Tony Luck -Tony

RE: [PATCH 22/25] x86/mcheck: Do the init in one place

2016-11-09 Thread Luck, Tony
> That's why the hotplug callback mce_disable_cpu() doesn't fiddle with > CR4 - it only clears the bits in MCi_CTL. And I think we should remain > that way. N.B. See vendor_disable_error_reporting() ... on Intel we don't clear MCi_CTL. -Tony
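This simplified sketch shows the distinction being pointed out. A generic CPU-offline path would zero every bank's MCi_CTL, but on Intel that step is skipped. My understanding is that the reason is that some banks are shared by a whole socket, so clearing them for one offlined CPU would also silence reporting for the CPUs still online. The vendor enum, the array standing in for the MCi_CTL MSRs and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum vendor_sketch { VENDOR_INTEL, VENDOR_OTHER };

#define NR_BANKS_SKETCH 4
static uint64_t mci_ctl[NR_BANKS_SKETCH]; /* stands in for MCi_CTL MSRs */

/* Generic path: stop every bank from signalling errors. */
static void disable_error_reporting_sketch(void)
{
	for (int i = 0; i < NR_BANKS_SKETCH; i++)
		mci_ctl[i] = 0;
}

/* CPU-offline hook: leave MCi_CTL untouched on Intel. */
static void vendor_disable_sketch(enum vendor_sketch v)
{
	if (v == VENDOR_INTEL)
		return;
	disable_error_reporting_sketch();
}
```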

[PATCH 2/2] mcelog: Print the PPIN in machine check records when it is available

2016-11-17 Thread Luck, Tony
From: Tony Luck Intel Xeons from Ivy Bridge onwards support a processor identification number. Kernels v4.9 and higher include it in the "mce" record. Signed-off-by: Tony Luck --- mcelog.c | 3 +++ mcelog.h | 3 +++ 2 files changed, 6 insertions(+) diff --git a/mcelog.c b/mcelog.c index 7214a
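The patch's diff is truncated here, so the following is only a sketch of the idea, not its actual code. It assumes a record with a 64-bit `ppin` field where 0 means the CPU did not provide one, and it prints a PPIN line only when a value is present. `struct mce_record_sketch` and `format_ppin()` are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mce_record_sketch {
	uint64_t ppin; /* assumed 0 when no PPIN is available */
};

/* Formats a "PPIN <hex>" line into @buf when the record carries one.
 * Returns the number of characters written, or 0 if there is no PPIN. */
static int format_ppin(const struct mce_record_sketch *m, char *buf, size_t len)
{
	if (!m->ppin) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "PPIN %" PRIx64 "\n", m->ppin);
}
```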
