On 28-03-2015 08:50 PM, Mick wrote:
On Saturday 28 Mar 2015 22:48:48 Sebas Pedersen wrote:
On 28-03-2015 07:37 PM, Volker Armin Hemmann wrote:
> Am 28.03.2015 um 23:00 schrieb Sebas Pedersen:
>> On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:
>>> Am 28.03.2015 um 14:58 schrieb Sebas Pedersen:
>>>> Hi guys,
>>>>
>>>> For a few days now I have been experiencing an MCE error.
>>>> Sometimes when I turn on the computer, at some point while booting the
>>>> kernel (after the GRUB menu) it just freezes and prints this:
>>>>
>>>> CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
>>>> TSC f5acc9180
>>>> PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0
>>>>
>>>> The TSC number may vary, but the status b200000000070f0f is always
>>>> the same (at least for now). The error message suggests parsing the
>>>> above error with mcelog. I did that and the result was:
>>>>
>>>> Hardware event. This is not a software error.
>>>> CPU 0 4 northbridge TSC f5acc9180
>>>> TIME 1427486735 Fri Mar 27 17:05:35 2015
>>>>
>>>> Northbridge Watchdog error
>>>>
>>>> bit57 = processor context corrupt
>>>> bit61 = error uncorrected
>>>>
>>>> bus error 'generic participation, request timed out
>>>>
>>>> generic error mem transaction
>>>> generic access, level generic'
>>>>
>>>> STATUS b200000000070f0f MCGSTATUS 4
>>>> CPUID Vendor AMD Family 15 Model 44
>>>> SOCKET 0 APIC 0 microcode 0
>>>>
>>>> The error suggests it's a hardware problem. I replaced the RAM with no
>>>> luck. The same error keeps happening.
>>>>
>>>> Any suggestions for identifying the problem or how to proceed?
>>>>
>>>> Many thanks in advance!
>>>>
>>>> Sebas
>>>
>>> BIOS update/microcode update. A Google search suggests that you have
>>> run into an erratum.
>>
>> Oh OK, thank you. I must have missed that in my search. So you are saying
>> that the error comes from a BIOS erratum (and I don't know what microcode
>> is), and the fix is to update the BIOS?
>
> No, possibly from a CPU erratum, and a BIOS update might bring in the
> microcode update that works around it.
I see, thanks for clarifying that. So it looks like there are not too many
options: either try to update the BIOS and/or replace the CPU.
I really appreciate your replies and time.
Thanks!
Sebas
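
For anyone reading along, the STATUS value quoted above can also be picked apart by hand. The following is only a rough sketch, assuming the standard x86 machine-check layout (flag bits 63 down to 57 in the upper part of the register, and the bus-error code format 0000 1PPT RRRR IILL in the low 16 bits). The function name decode_status is made up for illustration and is not part of mcelog.

```python
# Sketch: decode the architectural bits of an x86 MCi_STATUS value.
# Assumption: standard machine-check layout; decode_status is a
# hypothetical helper, not an mcelog API.

# Flag bits in the upper part of the register, highest bit first.
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error uncorrected
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MISC register holds valid data
    (58, "ADDRV"),  # ADDR register holds valid data
    (57, "PCC"),    # processor context corrupt
]


def decode_status(status):
    """Return the set flags and the low 16-bit error code of a status word."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    # Bus/interconnect errors use the pattern 0000 1PPT RRRR IILL,
    # i.e. bits 15..11 of the error code equal 00001.
    if code >> 11 == 1:
        result["bus_error"] = {
            "PP": (code >> 9) & 0x3,    # participation (3 = generic)
            "T": (code >> 8) & 0x1,     # 1 = request timed out
            "RRRR": (code >> 4) & 0xF,  # request type (0 = generic error)
            "II": (code >> 2) & 0x3,    # transaction type (3 = generic)
            "LL": code & 0x3,           # cache level (3 = generic)
        }
    return result


if __name__ == "__main__":
    decoded = decode_status(0xB200000000070F0F)
    print(decoded["flags"])
    print(hex(decoded["error_code"]))
    print(decoded.get("bus_error"))
```

With the value from this thread, the set flags include UC (bit 61) and PCC (bit 57), and the bus-error fields show a timeout with generic participation, which lines up with what mcelog printed.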
There's 'CONFIG_MICROCODE=y' and friends in the kernel which, along with
sys-apps/microcode-ctl, will load whatever is the latest Intel/AMD CPU code
(firmware) to patch any bugs with instructions that the CPU manufacturers
have discovered.
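
A quick way to see whether any of this is in effect is to look at the kernel configuration and at the "microcode" field in /proc/cpuinfo. The sketch below is only illustrative: the helper names are invented, the path /usr/src/linux/.config is an assumption based on the usual Gentoo layout, and which CONFIG_MICROCODE_* options exist depends on the kernel version.

```python
# Sketch: report microcode-related kernel options and the microcode
# revision the running kernel sees. Helper names and the .config path
# are assumptions for illustration only.
import re
from pathlib import Path


def enabled_options(config_text, prefix="CONFIG_MICROCODE"):
    """Return {option: 'y' or 'm'} for enabled options starting with prefix."""
    pattern = r"^(" + re.escape(prefix) + r"\w*)=([ym])$"
    return dict(re.findall(pattern, config_text, flags=re.MULTILINE))


def microcode_revisions(cpuinfo_text):
    """Return the distinct 'microcode' values in /proc/cpuinfo-style text."""
    found = re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text,
                       flags=re.MULTILINE)
    return set(found)


def read_text_or_empty(path):
    """Read a file, returning an empty string if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    # Assumed location of the kernel config on a Gentoo system.
    config = read_text_or_empty("/usr/src/linux/.config")
    cpuinfo = read_text_or_empty("/proc/cpuinfo")
    print("enabled options:", enabled_options(config))
    print("microcode revisions:", microcode_revisions(cpuinfo))
```

If the first line prints an empty dict, the options are probably not set in that config (or the file is somewhere else). Comparing the reported revision before and after a reboot with the new kernel shows whether an update was actually applied.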
That's nice. I'm going to compile the kernel and see what happens.
Many thanks!