Hi Jürgen, just wanted to give you (and everyone who may be keeping an eye on this) an update.
Somehow, after applying your kernel patch, the VM has now been running for
10+ days without a problem. I'll keep experimenting (A/B-testing style), but
at this point I'm actually pretty perplexed as to why this patch would make
a difference, since it is basically just for observability. Any thoughts on
that?

Thanks,
Roman.

On Wed, Feb 24, 2021 at 7:06 PM Roman Shaposhnik <ro...@zededa.com> wrote:
>
> Hi Jürgen!
>
> sorry for the belated reply -- I wanted to externalize the VM before I
> do -- but let me at least reply to you:
>
> On Tue, Feb 23, 2021 at 5:17 AM Jürgen Groß <jgr...@suse.com> wrote:
> >
> > On 18.02.21 06:21, Roman Shaposhnik wrote:
> > > On Wed, Feb 17, 2021 at 12:29 AM Jürgen Groß <jgr...@suse.com> wrote:
> > >
> > > On 17.02.21 09:12, Roman Shaposhnik wrote:
> > > > Hi Jürgen, thanks for taking a look at this. A few comments below:
> > > >
> > > > On Tue, Feb 16, 2021 at 10:47 PM Jürgen Groß <jgr...@suse.com> wrote:
> > > >>
> > > >> On 16.02.21 21:34, Stefano Stabellini wrote:
> > > >>> + x86 maintainers
> > > >>>
> > > >>> It looks like the tlbflush is getting stuck?
> > > >>
> > > >> I have seen this case multiple times on customer systems now, but
> > > >> reproducing it reliably seems to be very hard.
> > > >
> > > > It is reliably reproducible under my workload, but it takes a long
> > > > time (~3 days of the workload running in the lab).
> > >
> > > This is by far the best reproduction rate I have seen up to now.
> > >
> > > The next best reproducer seems to be a huge installation with several
> > > hundred hosts and thousands of VMs with about 1 crash each week.
> > >
> > > >> I suspected fifo events to be blamed, but just yesterday I've been
> > > >> informed of another case with fifo events disabled in the guest.
> > > >>
> > > >> One common pattern seems to be that up to now I have seen this
> > > >> effect only on systems with Intel Gold cpus. Can it be confirmed to
> > > >> be true in this case, too?
> > > >
> > > > I am pretty sure mine isn't -- I can get you full CPU specs if
> > > > that's useful.
> > >
> > > Just the output of "grep model /proc/cpuinfo" should be enough.
> > >
> > > processor: 3
> > > vendor_id: GenuineIntel
> > > cpu family: 6
> > > model: 77
> > > model name: Intel(R) Atom(TM) CPU C2550 @ 2.40GHz
> > > stepping: 8
> > > microcode: 0x12d
> > > cpu MHz: 1200.070
> > > cache size: 1024 KB
> > > physical id: 0
> > > siblings: 4
> > > core id: 3
> > > cpu cores: 4
> > > apicid: 6
> > > initial apicid: 6
> > > fpu: yes
> > > fpu_exception: yes
> > > cpuid level: 11
> > > wp: yes
> > > flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> > > pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp
> > > lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> > > nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est
> > > tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer
> > > aes rdrand lahf_lm 3dnowprefetch cpuid_fault epb pti ibrs ibpb stibp
> > > tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida
> > > arat md_clear
> > > vmx flags: vnmi preemption_timer invvpid ept_x_only flexpriority
> > > tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
> > > bugs: cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
> > > bogomips: 4800.19
> > > clflush size: 64
> > > cache_alignment: 64
> > > address sizes: 36 bits physical, 48 bits virtual
> > > power management:
> > >
> > > >> In case anybody has a reproducer (either in a guest or dom0) with a
> > > >> setup where a diagnostic kernel can be used, I'd be _very_ interested!
> > > >
> > > > I can easily add things to Dom0 and DomU. Whether that will disrupt
> > > > the experiment is, of course, another matter. Still, please let me
> > > > know what would be helpful to do.
> > >
> > > Is there a chance to switch to an upstream kernel in the guest? I'd
> > > like to add some diagnostic code to the kernel, and creating the
> > > patches will be easier this way.
> > >
> > > That's a bit tough -- the VM is based on stock Ubuntu, and if I upgrade
> > > the kernel I'll have to fiddle with a lot of things to make the
> > > workload functional again.
> > >
> > > However, I can install a debug kernel (from Ubuntu, etc. etc.)
> > >
> > > Of course, if patching the kernel is the only way to make progress --
> > > let's try that -- please let me know.
> >
> > I have found a nice upstream patch, which - with some modifications - I
> > plan to give our customer as a workaround.
> >
> > The patch is for kernel 4.12, but chances are good it will apply to a
> > 4.15 kernel, too.
>
> I'm slightly confused about this patch -- it seems to me that it needs
> to be applied to the guest kernel, correct?
>
> If that's the case -- the challenge I have is that I need to re-build
> the Canonical (Ubuntu) distro kernel with this patch -- this seems
> a bit daunting at first (I mean -- I'm pretty good at rebuilding kernels,
> I just never do it with the vendor ones ;-)).
>
> So... if there's anyone here who has any suggestions on how to do that
> -- I'd appreciate pointers.
>
> > I have been able to gather some more data.
> >
> > I have contacted the author of the upstream kernel patch I've been using
> > for our customer (and that helped, by the way).
> >
> > It seems as if the problem is occurring when running as a guest at least
> > under Xen, KVM, and VMware, and there have been reports of bare-metal
> > cases, too. The hunt for this bug has been going on for several years
> > now; the patch author has been at it for 8 months.
> >
> > So we can rule out a Xen problem.
> >
> > Finding the root cause is still important, of course, and your setup
> > seems to have the best reproduction rate up to now.
> >
> > So any help would really be appreciated.
> >
> > Is the VM self-contained? Would it be possible to start it e.g. on a
> > test system on my side? If yes, would you be allowed to pass it on to
> > me?
>
> I'm working on externalizing the VM in a way that doesn't disclose anything
> about the customer workload. I'm almost there -- sans my question about
> the vendor kernel rebuild. I plan to make that VM available this week.
>
> Goes without saying, but I would really appreciate your help in chasing this.
>
> Thanks,
> Roman.
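
P.S. For anyone else who ends up needing to rebuild a stock Ubuntu guest
kernel with a diagnostic patch like this, here's a rough sketch of one way
to do it. This is just an outline, not exactly what I ran -- the patch
filename below is a placeholder, and it assumes deb-src entries are enabled
in /etc/apt/sources.list on the guest:

  # fetch build dependencies and the source package for the running kernel
  sudo apt-get build-dep linux-image-$(uname -r)
  apt-get source linux-image-$(uname -r)
  cd linux-*/

  # apply the diagnostic patch (placeholder filename)
  patch -p1 < ../tlbflush-debug.patch

  # rebuild the binary packages the Ubuntu way
  fakeroot debian/rules clean
  fakeroot debian/rules binary-headers binary-generic binary-perarch

  # install the resulting .debs in the guest and reboot into the new kernel
  sudo dpkg -i ../linux-image-*.deb ../linux-modules-*.deb ../linux-headers-*.deb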