--- Begin Message ---
Hi,

On 2/9/22 at 9:59, Eneko Lacunza wrote:
Hi,

On 2/9/22 at 9:47, Fiona Ebner wrote:
On 02.09.22 at 09:22, Eneko Lacunza wrote:
Hi Fiona,

Does this patch correspond to kernels linked in this forum thread?

https://forum.proxmox.com/threads/proxmox-7-2-3-ceph-16-2-7-migrating-vms-hangs-them-kernel-panic-on-linux-freeze-on-windows.109645/page-2#post-488479

No, there is no public build with the below patch yet.
Ok, thanks for the clarification.

Did you already test the kernel with the fpu patches that's mentioned in
that forum post?

No, I was waiting for a good time-window in our prod cluster to test it :) Seems it will be today.

I have just tested it, and that patch doesn't seem to help. With that kernel version, VMs hung at 100% CPU usage on the live-migration destination host. I have just updated the Bugzilla entry.



If so I can test them and see if that helps with bugzilla entry #4073:
https://bugzilla.proxmox.com/show_bug.cgi?id=4073

I don't think these issues are related, as there, the VM that's been
migrated hangs, and here other VMs on the node were affected.

Yes, that's true, but I have also seen other VMs on the nodes being affected (though less frequently). Maybe we are impacted by both issues :)

I easily reproduced the hang on migrated (Linux) VMs, but other VMs did not hang in today's tests.

Cheers



which might be responsible for several issues reported in the
community forum[0][1].

In my case, loading a VM snapshot that originally was taken on
a CPU from a different vendor often caused problems in other VMs(!).
In particular, it often led to RCU stalls (with similar messages as in
[1]) or slowdowns, and sometimes clock jumps far into the future (like
in [0]). With this revert applied, everything seems to run smoothly
even after loading the "bad" snapshot 10 times.

[0] https://forum.proxmox.com/threads/112756/
[1] https://forum.proxmox.com/threads/111494/
The fix 11d39e8cc43e1c6737af19ca9372e590061b5ad2 is only for AMD/SVM, so
most likely [1], where people with Intel N5105 are affected, is not
related either. RCU stall messages can happen for different reasons of
course ;)


Our cluster has AMD CPUs.

I'll report back the results of our tests if I can finally try the test kernel today.

Thanks


Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/

--- End Message ---
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
