>> I don't have any information about the microcode update, only a note from Dell support
>> saying that it fixes instability on VMware. (So I don't know about KVM.)
Here are the details of erratum 815, addressed by the microcode patch:

815 Processor May Read Partially Updated Branch Status Register

Description
Under a highly specific and detailed set of internal timing conditions, the processor may read an internal branch status register (BSR) while the register is being updated, resulting in an incorrect rIP.

Potential Effect on System
The incorrect rIP causes unpredictable program or system behavior, usually observed as a page fault.

Suggested Workaround
Contact your AMD representative for information on a BIOS update.

Fix Planned
No fix planned

I had another crash this afternoon, and this host had been at around 90% CPU usage for 12 hours (though the load average was OK). So maybe sustained high CPU usage makes it more likely to hit the condition.

I have applied this BIOS update; I'll wait and see whether it improves things.

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "datanom.net" <m...@datanom.net>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Monday, 29 December 2014 16:56:32
Subject: Re: [pve-devel] need help to debug random host freeze on multiple hosts

>> Could this, given the high load, be caused by a race condition which is
>> solved in the new microcode?

I don't have any information about the microcode update, only a note from Dell support saying that it fixes instability on VMware. (So I don't know about KVM.)

>> Have you tried connecting a serial console to one of the nodes?
>>
>> If you have IPMI on the nodes you should also be able to monitor
>> further than on the default console.

I'm going to set up serial console output over the Dell iDRAC.

----- Original Message -----
From: "datanom.net" <m...@datanom.net>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Cc: "aderumier" <aderum...@odiso.com>
Sent: Monday, 29 December 2014 13:27:08
Subject: Re: [pve-devel] need help to debug random host freeze on multiple hosts

On Mon, 29 Dec 2014 07:31:32 +0100 (CET)
Alexandre DERUMIER <aderum...@odiso.com> wrote:

>
> Yes, sure, I have nothing in the logs.
> (That's why I thought of kdump, to try to get more info.)
>
> I really don't know whether it's a real software kernel panic or a hardware
> bug.
>
> I just saw some AMD microcode bugs mentioned on the VMware forum, and saw that
> Dell released a new BIOS update this month.
> I'll try updating to see if it helps.
>

Could this, given the high load, be caused by a race condition which is
solved in the new microcode?

Have you tried connecting a serial console to one of the nodes?

If you have IPMI on the nodes you should also be able to monitor
further than on the default console.

--
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
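After a BIOS update like the one discussed here, it can help to confirm that the newer microcode is actually running before waiting to see whether the freezes stop. The following is a minimal Python sketch, assuming a Linux host whose kernel exposes a `microcode` field in `/proc/cpuinfo`. The helper name `microcode_revisions` is hypothetical, and the script does not know which revision corresponds to the Dell fix, so compare its output with the value recorded before the update.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches lines such as "microcode\t: 0x600063e" (assumed /proc/cpuinfo format).
# [ \t]* keeps the match on a single line.
_MICROCODE_RE = re.compile(r"^microcode[ \t]*:[ \t]*(\S+)", re.MULTILINE)


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Return the distinct microcode revisions reported across all logical CPUs."""
    return set(_MICROCODE_RE.findall(cpuinfo_text))


def main() -> None:
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
        return

    revisions = microcode_revisions(text)
    if not revisions:
        print("no 'microcode' field found in /proc/cpuinfo")
    elif len(revisions) > 1:
        # Cores running different revisions would suggest the update was not applied uniformly.
        print("mixed microcode revisions across CPUs:", sorted(revisions))
    else:
        print("microcode revision:", next(iter(revisions)))


if __name__ == "__main__":
    main()
```

Returning a set rather than a single value makes it easy to spot a host where not every core reports the same revision.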