Dear Ashok, dear Borislav,

On 01/05/17 02:12, Raj, Ashok wrote:

CPUID Vendor Intel Family 6 Model 142
This is Kabylake Mobile

Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 7880018086 ADDR fef1ce40
TIME 1483543069 Wed Jan  4 16:17:49 2017
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0

Decoding the bits further from MCi_STATUS above:
Val=1, OVER=1, UC=1, but EN=0 indicates this isn't an MCE, hence it should have
been signaled by a CMCI.

PCC=1, but should be ignored when EN=0.
MCACOD: 110a MSCOD: 0040
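
For reference, the bit decoding above can be reproduced with a short script. This is only a sketch, assuming the architectural IA32_MCi_STATUS layout (flags in bits 63..57, MSCOD in bits 31:16, MCACOD in bits 15:0) and the compound cache hierarchy code pattern 000F 0001 RRRR TTLL. The function names are made up for illustration and are not part of mcelog.

```python
# Sketch only: flag positions assume the architectural IA32_MCi_STATUS layout.
STATUS_FLAGS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57,
}


def decode_mci_status(status):
    """Split a raw MCi_STATUS value into its flag bits and error codes."""
    fields = {name: (status >> bit) & 1 for name, bit in STATUS_FLAGS.items()}
    fields["MSCOD"] = (status >> 16) & 0xFFFF   # model-specific error code
    fields["MCACOD"] = status & 0xFFFF          # architectural error code
    return fields


def decode_cache_mcacod(mcacod):
    """Decode a compound cache hierarchy code (000F 0001 RRRR TTLL).

    Returns None if the code does not match that pattern.
    """
    if (mcacod & 0xEF00) != 0x0100:             # ignore F (bit 12) in the match
        return None
    levels = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
    kinds = {0: "instruction", 1: "data", 2: "generic"}
    return {
        "filtering": bool((mcacod >> 12) & 1),  # F bit: corrected filtering
        "request": (mcacod >> 4) & 0xF,         # RRRR, 0 means generic error
        "type": kinds.get((mcacod >> 2) & 0x3, "reserved"),
        "level": levels[mcacod & 0x3],
    }


if __name__ == "__main__":
    status = decode_mci_status(0xEE0000000040110A)
    print(status)
    print(decode_cache_mcacod(status["MCACOD"]))
```

Fed the STATUS value from the report, it gives VAL=1, OVER=1, UC=1, EN=0, PCC=1, MSCOD 0x0040 and MCACOD 0x110a, with the MCACOD matching a generic Level-2 cache error with the filtering bit set.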

If the system is stable enough after the report, can you send the output of
/proc/interrupts to confirm that?

To be clear, other than the message, the system is stable for me.

Here is `/proc/interrupts`.

```
$ more /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:         27          0          0          0  IR-IO-APIC  2-edge         timer
   1:          3          2        125          5  IR-IO-APIC  1-edge         i8042
   8:          0          1          0          0  IR-IO-APIC  8-edge         rtc0
   9:        108         31        397          5  IR-IO-APIC  9-fasteoi      acpi
  12:         66         18         92         35  IR-IO-APIC  12-edge        i8042
  14:          0          0          0          0  IR-IO-APIC  14-fasteoi     INT344B:00
  16:          0          0          0          0  IR-IO-APIC  16-fasteoi     idma64.0, i801_smbus, i2c_designware.0
  17:        419         42        280        415  IR-IO-APIC  17-fasteoi     idma64.1, i2c_designware.1
  51:          2          0          0          1  IR-IO-APIC  51-fasteoi     DLL075B:01
 120:          0          0          0          0  DMAR-MSI    0-edge         dmar0
 121:          0          0          0          0  DMAR-MSI    1-edge         dmar1
 274:         17          2          0          4  IR-PCI-MSI  30932992-edge  rtsx_pci
 275:         89         26         57         45  IR-PCI-MSI  327680-edge    xhci_hcd
 276:       1886          0       2361          0  IR-PCI-MSI  31457280-edge  nvme0q0, nvme0q1
 277:          0       3010       2570          0  IR-PCI-MSI  31457281-edge  nvme0q2
 278:          0          0       2023       3480  IR-PCI-MSI  31457282-edge  nvme0q3
 279:          0       3319          0       5863  IR-PCI-MSI  31457283-edge  nvme0q4
 280:         45          0          0          0  IR-PCI-MSI  360448-edge    mei_me
 281:        201         52       3008         85  IR-PCI-MSI  32768-edge     i915
 282:        151         29        997      24821  IR-PCI-MSI  30408704-edge  ath10k_pci
 283:        331        938        677        188  IR-PCI-MSI  514048-edge    snd_hda_intel:card0
 NMI:          1          0          0          0   Non-maskable interrupts
 LOC:      15198      21708      16850      31954   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          1          0          0          0   Performance monitoring interrupts
 IWI:          3          0          0          0   IRQ work interrupts
 RTR:          0          0          0          0   APIC ICR read retries
 RES:       1329       1974       1532       1959   Rescheduling interrupts
 CAL:       2254       3827       1969       3963   Function call interrupts
 TLB:        396       2349        342       2193   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 DFR:          0          0          0          0   Deferred Error APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          9          9          9          9   Machine check polls
 ERR:         17
 MIS:          0
 PIN:          0          0          0          0   Posted-interrupt notification event
 PIW:          0          0          0          0   Posted-interrupt wakeup event
```
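
In case it helps when comparing outputs over time, here is a rough sketch that pulls the per-CPU counters for selected rows out of text in the `/proc/interrupts` format. The helper name `parse_interrupt_counts` is hypothetical, and the embedded sample is just a few rows copied from the output above.

```python
# Sketch only: parse_interrupt_counts is a made-up helper, not an existing tool.
SAMPLE = """\
            CPU0       CPU1       CPU2       CPU3
 NMI:          1          0          0          0   Non-maskable interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          9          9          9          9   Machine check polls
 ERR:         17
"""


def parse_interrupt_counts(text):
    """Map each row label to its list of per-CPU counters."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        values = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():         # stop at the description text
                break
            values.append(int(token))
        counts[label.strip()] = values
    return counts


if __name__ == "__main__":
    counts = parse_interrupt_counts(SAMPLE)
    for label in ("MCE", "MCP", "THR"):
        print(label, counts[label], "total:", sum(counts[label]))
```

On a live system the same function could be applied to `open("/proc/interrupts").read()`. For the rows above it reports zero for MCE and THR and 9 machine check polls per CPU.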

Although it's reported as an L2 error, some memory errors can also manifest
themselves as cache errors in certain cases.  In this case it looks like
a speculative fetch from bad memory might be the cause.

MCGCAP c08 APICID 0 SOCKETID 0

MCG_CAP: c08
Supports CMCI (bit 10, CMCI_P: Corrected Machine Check Interrupt) and
threshold-based error reporting (bit 11, TES_P).
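
The MCG_CAP value can be checked the same way. A small sketch, assuming the architectural IA32_MCG_CAP layout (bank count in bits 7:0, MCG_CTL_P in bit 8, MCG_EXT_P in bit 9, CMCI_P in bit 10, TES_P in bit 11); `decode_mcg_cap` is a name made up for this example.

```python
# Sketch only: bit positions assume the architectural IA32_MCG_CAP layout.
def decode_mcg_cap(cap):
    """Extract the bank count and a few capability bits from MCG_CAP."""
    return {
        "bank_count": cap & 0xFF,       # number of reporting banks
        "MCG_CTL_P": (cap >> 8) & 1,    # IA32_MCG_CTL present
        "MCG_EXT_P": (cap >> 9) & 1,    # extended state registers present
        "CMCI_P": (cap >> 10) & 1,      # corrected machine check interrupt
        "TES_P": (cap >> 11) & 1,       # threshold-based error status
    }


if __name__ == "__main__":
    print(decode_mcg_cap(0xC08))
```

For `c08` this reports 8 banks with CMCI_P and TES_P set, which matches the statement above.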


Do you have another machine which doesn't report these errors? If so, try
swapping memory between them to see if the error disappears.

No, I don’t. And everybody I talked to with a Dell XPS13 (9360) seems to have these errors.

I don't have the model-specific error codes handy. I will check that in the
meantime to get some decoding as well.

If you haven't already, running some memory tests would also help.

I need some time for that.

If you replaced the motherboard, did that involve both CPU and memory,
or just the motherboard swap?

Sorry, I don’t know, as I am not the person from StackExchange [1].


Kind regards,

Paul


[1] https://unix.stackexchange.com/questions/324237/understanding-machine-check-exceptions-mce/330283
