[Xen-devel] gcc version used by developers
Hello, what version of gcc do Xen developers use for Xen? Is gcc 5.4 or 6.4 safe to use?

Regards Andreas
Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X
On 23.09.2019 10:17, Jan Beulich wrote:
> While, according to AMD's processor specs page, the 3700X is just an
> 8-core chip, I wonder whether
> https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01954.html
> still affects this configuration as well. Could you give this a try in
> at least the viridian=0 case? As to Linux, did you check that PVH

As a first input for the Xen developers I used the tool from http://www.etallen.com/cpuid.html to dump complete cpuid information.

Regards Andreas

[Attachment: cpuid-3700x.tar.xz]
Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X
On 23.09.2019 10:17, Jan Beulich wrote:
>>> Does booting with a single vCPU work?
>> Number of vCPUs makes no difference
> Well, according to Steven it does, with viridian=0. Could you re-check
> this?

I can confirm that viridian=0 AND vcpus=1 makes the system bootable (with a long delay though).

> at least the viridian=0 case? As to Linux, did you check that PVH
> (or HVM, which you don't mention) guests actually start all their
> vCPU-s successfully?

I just tried PVH and HVM with 8 vcpus. Everything works, tested with make -j9 on a kernel tree.

> 8-core chip, I wonder whether
> https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01954.html
> still affects this configuration as well. Could you give this a try in

Does it still make sense to try the patch given the cpuid I posted?

Also I have an AMD 7302P in my lab (cpuid dump attached). No change in behavior from 3700X - cpuid is nearly the same.

Regards Andreas

[Attachment: cpuid-amd7302p.tar.xz]
[Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X
While AMD Ryzen 2700X was working perfectly in my tests with Windows 10, the new 3700X does not even boot a Windows HVM. With viridian=1 you get BSOD HAL_MEMORY_ALLOCATION and with viridian=0 you get "multiprocessor config not supported".

xl dmesg says:

(XEN) d1v0 VIRIDIAN CRASH: ac 0 a0a0 f8065c06bf88 bf8
(XEN) d2v0 VIRIDIAN CRASH: ac 0 a0a0 f8035b049f88 bf8

Linux domUs with PV and PVH seem to work so far.

Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged from 2700X (working) to 3700X (crashing).

Is it a known problem? Did someone test the new EPYCs?

Regards Andreas
Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X
On 20.08.2019 20:12, Andrew Cooper wrote:
>> Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged
>> from 2700X (working) to 3700X (crashing).
> So you've done a Zen v1 => Zen v2 CPU upgrade on an existing system?

With "existing system" you mean the Windows installation? Yes, but it is not relevant. The same BSODs happen if you boot the HVM with just the iso installation medium and no disks.

>> Is it a known problem? Did someone test the new EPYCs?
> This looks familiar, and is still somewhere on my TODO list.

Do you already know the reason or is that still to investigate?

> Does booting with a single vCPU work?

Number of vCPUs makes no difference.

Regards Andreas
Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X
On 20.08.2019 22:38, Andrew Cooper wrote:
> On 20/08/2019 21:36, Andreas Kinzler wrote:
>> On 20.08.2019 20:12, Andrew Cooper wrote:
>>>> Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is
>>>> unchanged from 2700X (working) to 3700X (crashing).
>>> So you've done a Zen v1 => Zen v2 CPU upgrade on an existing system?
>> With "existing system" you mean the Windows installation?
> I meant same computer, not same VM.

Tried with 2 mainboards: Asrock X370 Pro4 and AsrockRack X470D4U. You need to flash the BIOS for Zen2. X470D4U BIOS 3.1 works with 2700X but not with 3700X. X370 Pro4 with a somewhat older BIOS worked for 2700X and does not work with the current (6.00) BIOS and 3700X.

>> Yes, but it is not relevant. The same BSODs happen if you boot the HVM
>> with just the iso installation medium and no disks.
> That's a useful datapoint. I wouldn't expect this to be relevant, given
> how Windows' HAL works.

It should make debugging for you quite "simple" because it can be reproduced very easily.

Regards Andreas
[Xen-devel] Ryzen 3xxx works with Windows
Hello All,

I compared the CPUID listings from Ryzen 2700X (attached as tar.xz) to 3700X and found only very few differences. I added

cpuid = [ "0x8008:ecx=0100" ]

to xl.cfg and then Windows runs great with 16 vCPUs. Cinebench R15 score is >2050 which is more or less the bare metal value.

Regards Andreas

[Attachment: cpuid-2700X.tar.xz]
Re: [Xen-devel] Ryzen 3xxx works with Windows
On 15.11.2019 18:13, George Dunlap wrote:
> On 11/15/19 5:06 PM, Andreas Kinzler wrote:
>> Hello All, I compared the CPUID listings from Ryzen 2700X (attached as
>> tar.xz) to 3700X and found only very few differences. I added
>> cpuid = [ "0x8008:ecx=0100" ]
>> to xl.cfg and then Windows runs great with 16 vCPUs. Cinebench R15
>> score is >2050 which is more or less the bare metal value.
> Awesome. Any idea what those bits do?

From the AMD APM (https://www.amd.com/system/files/TechDocs/24594.pdf):

APIC ID size. The number of bits in the initial APIC20[ApicId] value that indicate core ID within a processor. A zero value indicates that legacy methods must be used to derive the maximum number of cores. The size of this field determines the maximum number of cores (MNC) that the processor could theoretically support, not the actual number of cores that are actually implemented or enabled on the processor, as indicated by CPUID Fn8000_0008_ECX[NC].

if (ApicIdCoreIdSize[3:0] == 0) {
    // Used by legacy dual-core/single-core processors
    MNC = CPUID Fn8000_0008_ECX[NC] + 1;
} else {
    // use ApicIdCoreIdSize[3:0] field
    MNC = (2 ^ ApicIdCoreIdSize[3:0]);
}

The value programmed in 2700X is 4, on 3700X it is 7. See my dump in https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html

Please note that the value is an exponent - that means MNC is programmed as 16 for 2700X and 128 for 3700X.

Regards Andreas
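For illustration only (this sketch is not from the thread): the APM formula quoted above can be checked on any host with a small C program that reads CPUID Fn8000_0008 via GCC's <cpuid.h> and derives MNC from the NC and ApicIdCoreIdSize fields. The field positions (ECX[7:0] and ECX[15:12]) are per the APM; everything else is an assumption of the example.

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID Fn8000_0008: processor capacity parameters. */
    if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
        return 1;

    unsigned int nc   = ecx & 0xff;        /* ECX[7:0]:   NC (cores - 1)   */
    unsigned int size = (ecx >> 12) & 0xf; /* ECX[15:12]: ApicIdCoreIdSize */
    unsigned int mnc  = size ? 1u << size : nc + 1;  /* APM formula above  */

    /* Per the mail: 2700X reports size=4 (MNC=16), 3700X size=7 (MNC=128). */
    printf("NC=%u ApicIdCoreIdSize=%u MNC=%u\n", nc, size, mnc);
    return 0;
}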
Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices
On 15.11.2019 12:01, Andreas Kinzler wrote:
> On 14.11.2019 12:29, Jan Beulich wrote:
>> On 14.11.2019 00:10, Andreas Kinzler wrote:
>>> I came across the following: https://lkml.org/lkml/2019/8/29/536
>>> Could that be the reason for the problem mentioned below? Xen is using
>>> HPET as clocksource on the platform/mainboard. Is there an (easy) way
>>> to verify if Xen uses PC10?
>> Hence I can only suggest that you try again with limited or no use of
>> C states, to at least get a hint as to a possible
> I changed the BIOS setting to a limit of PC7 and it is now running. I
> have to wait for the result. Thanks.

Previously the drift after 4 days uptime was 60 sec. Now after 4 days uptime the drift is 9 sec. So setting the package c-state limit to PC7 was a success.

Regards Andreas
Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices
On 19.11.2019 10:29, Jan Beulich wrote:
> On 18.11.2019 20:35, Andreas Kinzler wrote:
>> On 15.11.2019 12:01, Andreas Kinzler wrote:
>>> On 14.11.2019 12:29, Jan Beulich wrote:
>>>> On 14.11.2019 00:10, Andreas Kinzler wrote:
>>>>> I came across the following: https://lkml.org/lkml/2019/8/29/536
>>>>> Could that be the reason for the problem mentioned below? Xen is
>>>>> using HPET as clocksource on the platform/mainboard. Is there an
>>>>> (easy) way to verify if Xen uses PC10?
>>>> Hence I can only suggest that you try again with limited or no use of
>>>> C states, to at least get a hint as to a possible
>>> I changed the BIOS setting to a limit of PC7 and it is now running. I
>>> have to wait for the result. Thanks.
>> Previously the drift after 4 days uptime was 60 sec. Now after 4 days
>> uptime drift is 9 sec. So setting the package c-state limit to PC7 was
>> a success.
> 9s still seems quite a lot to me, but yes, it's an improvement.

It seems it is even better than some other platforms now. Some snapshot measurements from running systems:

Xeon E3-1230v5 (Skylake): drift of 4 sec per day (23.999MHz HPET)
Xeon E3-1240v6 (Kaby Lake): drift of 1.9 sec per day (23.999MHz HPET)
Xeon E3-1240v5 (Skylake): drift of 4.85 sec per day (23.999MHz HPET)
Xeon E5-1620v4 (Broadwell): drift of 2.7 sec per day (14.318MHz HPET)

All these values are not great, but it is OK for me.

> Now would you be up to checking whether, rather than via BIOS settings
> (which not all BIOSes may offer) the same can be achieved by using
> Xen's command line option "max_cstate="? Also did you check whether
> further limiting C state use would

I cannot try on production machines. I may have a slot on lab machines but I cannot promise.

> further improve the situation? And did you possibly also check
> whether telling Xen not to use the HPET would make a difference?

Which other clocksource do you prefer? Is Xen tested (field-proven) on that other clocksource?

Regards Andreas
Re: [Xen-devel] Ryzen 3xxx works with Windows
On 18.11.2019 17:25, George Dunlap wrote:
> Where were these values collected -- on a PV dom0? Or from within the
> guest?

Neither. Bare metal kernel - no Xen at all.

> Could you try this with `0111` instead?

Works. '1000' crashes again. Now it is clear that 7 is the maximum Windows accepts.

Regards Andreas
Re: [Xen-devel] [PATCH] x86: avoid HPET use on certain Intel platforms
On 22.11.2019 13:58, Andrew Cooper wrote:
> On 22/11/2019 12:57, Jan Beulich wrote:
>> On 22.11.2019 13:50, Andrew Cooper wrote:
>>> On 22/11/2019 12:46, Jan Beulich wrote:
>>>> Linux commit fc5db58539b49351e76f19817ed1102bf7c712d0 says
>>>> "Some Coffee Lake platforms have a skewed HPET timer once the SoCs
>>>> entered PC10, which in consequence marks TSC as unstable because
>>>> HPET is used as watchdog clocksource for TSC."
>>>> Adjust a few types in touched or nearby code at the same time.
>>> Reported-by ?
>> The Linux commit has a Suggested-by, but no Reported-by. Do you want me
>> to copy that one? Or else do you have any suggestion as to who the
>> reporter was?
> Well - this patch was identified by someone on xen-devel, which I
> presume was your basis for looking into it.

https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00662.html

BTW: Xeon E-2136 @ C242 has 8086:3eca as ID. One needs to check with Intel which combinations are really affected.

Regards Andreas
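For illustration only (not from the thread): the ID Andreas quotes is presumably the vendor/device ID of the platform's PCI host bridge at 0000:00:00.0, which is what such quirks usually key on. Below is a minimal C sketch of collecting it on a Linux host via sysfs, with the single example ID from the mail hard-coded. The sysfs paths, the read_hex() helper and the assumption that 8086:3eca identifies the host bridge are all part of the example, not of the original report.

#include <stdio.h>

/* Hypothetical helper: read a single hex value from a sysfs attribute. */
static int read_hex(const char *path, unsigned int *val)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int rc = (fscanf(f, "%x", val) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}

int main(void)
{
    unsigned int vendor, device;

    if (read_hex("/sys/bus/pci/devices/0000:00:00.0/vendor", &vendor) ||
        read_hex("/sys/bus/pci/devices/0000:00:00.0/device", &device))
        return 1;

    printf("host bridge ID: %04x:%04x\n", vendor, device);

    /* 8086:3eca is the ID reported above for the Xeon E-2136 / C242 board. */
    if (vendor == 0x8086 && device == 0x3eca)
        printf("matches the Coffee Lake ID mentioned in this thread\n");
    return 0;
}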
Re: [Xen-devel] [PATCH] x86: avoid HPET use on certain Intel platforms
On 25.11.2019 11:15, Jan Beulich wrote:
> On 23.11.2019 00:10, Andreas Kinzler wrote:
>> BTW: Xeon E-2136 @ C242 has 8086:3eca as ID. One needs to check with
>> Intel which combinations are really affected.
> Are you saying you observed the same issue on such a (server processor)
> system as well? Neither its datasheet nor its specification update

The whole thread starting with https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00966.html was about Xeon E-2136. Setting a limit to PC7 greatly reduced the drift (https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg01044.html).

> (which I specifically downloaded and looked through just because of
> your remark) have any mention of a similar issue. I also take it that
> the code comment inherited from Linux says "SoCs" for a reason.

Even the kernel mailing list postings lack official confirmation from Intel. That is why I said: someone (with internal Intel knowledge) needs to confirm which combinations are affected.

Regards Andreas
Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X
On 20.08.2019 22:38, Andrew Cooper wrote:
> On 20/08/2019 21:36, Andreas Kinzler wrote:
>>>> Is it a known problem? Did someone test the new EPYCs?
>>> This looks familiar, and is still somewhere on my TODO list.
>> Do you already know the reason or is that still to investigate?
>>> Does booting with a single vCPU work?
> Hmm - perhaps it's not the same issue then. Either way, it's firmly in
> the "still to investigate" phase.

Any update on the topic?

Regards Andreas
[Xen-devel] wall clock drift on C24x mainboard, best practices
Hello all, hello Paul,

On a certain new mainboard with chipset C242 and Intel Xeon E-2136 I notice a severe clock drift. This is from dom0:

# uptime
 20:13:52 up 81 days, 1:41, 1 user, load average: 0.00, 0.00, 0.00
# hwclock
2019-10-12 20:27:37.204966+02:00
# date
Sat Oct 12 20:07:19 CEST 2019

Kernel is 4.13.16 vanilla, Xen 4.10.2.

So after 81 days of uptime there is a difference of over 20 minutes between "date" and "hwclock". I operate many Xen servers and have never seen such a large drift except on this type of mainboard. What could be the reason?

In general, what is the current best practice for NTP sync? Run it in dom0? In domU? Both? How does the domU type (Linux HVM/PVM/PVH or Windows HVM with WinPV drivers) make a difference?

Regards Andreas
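As a quick back-of-the-envelope check (not part of the original mail): hwclock and date above differ by about 20 minutes 18 seconds, i.e. roughly 1218 seconds over 81 days, or about 15 seconds per day. A trivial C sketch of the same arithmetic:

#include <stdio.h>

int main(void)
{
    /* Values read off the dom0 output above (20:27:37 vs. 20:07:19). */
    double offset_seconds = 20 * 60 + 18;
    double uptime_days = 81;

    printf("drift: %.1f s/day\n", offset_seconds / uptime_days); /* ~15 s/day */
    return 0;
}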
Re: [Xen-devel] Debugging Windows HVM crashes on Ryzen 3xxx series CPUs.
Hello All,

https://www.reddit.com/r/Amd/comments/ckr5f4/amd_ryzen_3000_series_linux_support_and/ is concerning KVM, but it identified that the TOPOEXT feature was important to getting Windows to boot.

I just tried qemu 3.1.1 with KVM (kernel 5.1.21) on a Ryzen 3700X and started qemu with "-cpu host,-topoext" and it still works perfectly. So it seems that TOPOEXT is not relevant.

Regards Andreas
Re: [Xen-devel] Ryzen 3xxx plans for 4.13
On 06.11.2019 18:50, George Dunlap wrote:
> Modern Windows guests (at least Windows 10 and Windows Server 2016)
> crash when running under Xen on AMD Ryzen 3xxx desktop-class cpus (but
> not the corresponding server cpus).

In my tests the second generation EPYC CPUs (codename "Rome") fail exactly the same way as the Ryzen 3xxx desktop CPUs.

Regards Andreas
Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices
Hello All,

I came across the following: https://lkml.org/lkml/2019/8/29/536

Could that be the reason for the problem mentioned below? Xen is using HPET as clocksource on the platform/mainboard. Is there an (easy) way to verify if Xen uses PC10?

Regards Andreas

On 12.10.2019 20:47, Andreas Kinzler wrote:
> Hello all, hello Paul,
>
> On a certain new mainboard with chipset C242 and Intel Xeon E-2136 I
> notice a severe clock drift. This is from dom0:
>
> # uptime
>  20:13:52 up 81 days, 1:41, 1 user, load average: 0.00, 0.00, 0.00
> # hwclock
> 2019-10-12 20:27:37.204966+02:00
> # date
> Sat Oct 12 20:07:19 CEST 2019
>
> Kernel is 4.13.16 vanilla, Xen 4.10.2
>
> So after 81 days uptime there is a difference of over 20 minutes
> between "date" and "hwclock". I operate many Xen servers and have never
> seen such a great drift except on this type of mainboard. What could be
> the reason?
>
> In general, what is the current best practice for NTP sync? Run it in
> dom0? In domU? Both? How does the domU type (Linux HVM/PVM/PVH or
> Windows HVM with WinPV drivers) make a difference?
>
> Regards Andreas
Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices
On 14.11.2019 12:29, Jan Beulich wrote:
> On 14.11.2019 00:10, Andreas Kinzler wrote:
>> I came across the following: https://lkml.org/lkml/2019/8/29/536
>> Could that be the reason for the problem mentioned below? Xen is using
>> HPET as clocksource on the platform/mainboard. Is there an (easy) way
>> to verify if Xen uses PC10?
> In principle this can be obtained via both the xenpm utility and the
> 'c' debug key.

Both xenpm and 'c' debug key show only up to level 7 in Xen 4.10.x (unmodified code).

> For Coffee Lake, however, I can't find any indication in the SDM that a
> PC10 residency MSR would exist.

I used turbostat (https://github.com/torvalds/linux/blob/master/tools/power/x86/turbostat/turbostat.c) as a help. See functions has_c8910_msrs and intel_model_duplicates. I then added Coffee Lake with PC8/9/10 to do_get_hw_residencies and then I got high counts in PC8+PC9 and zero in PC10.

> Hence I can only suggest that you try again with limited or no use of
> C states, to at least get a hint as to a possible

I changed the BIOS setting to a limit of PC7 and it is now running. I have to wait for the result. Thanks.

Regards Andreas
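For illustration only (this is not the Xen change Andreas describes): on a Linux host the same package C8/C9/C10 residency counters can be sampled from user space through the msr driver. The MSR numbers 0x630-0x632 are the PKG_C8/C9/C10_RESIDENCY registers turbostat uses for these client parts; the /dev/cpu/0/msr path assumes the msr module is loaded and root privileges.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

/* Read one MSR on CPU 0 via the Linux msr driver. */
static int rdmsr0(uint32_t reg, uint64_t *val)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = (pread(fd, val, sizeof(*val), reg) == sizeof(*val)) ? 0 : -1;
    close(fd);
    return rc;
}

int main(void)
{
    /* Package C8/C9/C10 residency counters (as used by turbostat). */
    const uint32_t msrs[]  = { 0x630, 0x631, 0x632 };
    const char    *names[] = { "PC8", "PC9", "PC10" };

    for (int i = 0; i < 3; i++) {
        uint64_t v;
        if (rdmsr0(msrs[i], &v))
            return 1;
        printf("%-4s residency counter: %llu\n", names[i], (unsigned long long)v);
    }
    return 0;
}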
Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode
On 15.11.2019 11:57, George Dunlap wrote:
> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
> guests") attempted to "fake up" a topology which would induce guest
> operating systems to not treat vcpus as sibling hyperthreads. This
> involved (among other things) actually reporting hyperthreading as
> available, but giving vcpus every other APICID. The resulting cpu
> featureset is invalid, but most operating systems on most hardware
> managed to cope with it.
> Unfortunately, Windows running on modern AMD hardware -- including
> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
> confused by the resulting contradictory feature bits and crashes during
> installation. (Linux guests have so far continued to cope.)

I do not understand a central point: no matter why and/or how a fake topology is presented by Xen, why does the older generation Ryzen 2xxx work while Ryzen 3xxx doesn't? What is the change in AMD(!), not Xen, that causes the one to work and the other to fail?

Regards Andreas
Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode
On 15.11.2019 12:29, George Dunlap wrote:
> On 11/15/19 11:17 AM, Andreas Kinzler wrote:
>> I do not understand a central point: No matter why and/or how a fake
>> topology is presented by Xen, why did the older generation Ryzen 2xxx
>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
>> causes the one to work and the other to fail?
> The CPU features that the guest sees are a mix of the real underlying
> features and changes made by Xen. Xen and/or the hardware will behave

Why not analyze the bits in detail? I already posted the complete CPUID for 3700X (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).

@Steven:
> If this is helpful, I can probably provide the same from:
> * Ryzen 2700x
> * Ryzen 3900x

Can you post for 2700X? Then someone with detailed knowledge could compare the two.

Regards Andreas
Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode
On 15.11.2019 13:10, George Dunlap wrote:
> On 11/15/19 11:39 AM, Andreas Kinzler wrote:
>> On 15.11.2019 12:29, George Dunlap wrote:
>>> On 11/15/19 11:17 AM, Andreas Kinzler wrote:
>>>> I do not understand a central point: No matter why and/or how a fake
>>>> topology is presented by Xen, why did the older generation Ryzen 2xxx
>>>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen
>>>> that causes the one to work and the other to fail?
>>> The CPU features that the guest sees are a mix of the real underlying
>>> features and changes made by Xen. Xen and/or the hardware will behave
>> Why not analyze the bits in detail? I already posted the complete CPUID
>> for 3700X
>> (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).
>> Then someone with detailed knowledge could compare the two?
> What would be the purpose? The code is going to look like this -- an
> impenetrable maze of "switch" and "if" statements based on individual
> bits or features or models. *Somewhere* in Windows' version of that
> code, there's a path which is triggered by

As of this moment all of this is just an assumption - you might very well be right, but it could also be something totally different. What if the CPUID is nearly identical? This would lead to the conclusion that the problem has completely different root causes.

Regards Andreas
Re: [Xen-devel] PCI passthrough performance loss with Skylake-SP
On Tue, 26 Jun 2018 09:47:11 +0200, Paul Durrant wrote:
>> is not affected at all. The test uses standard iperf3 as a client -
>> the passed PCI device is not used in the test - so that
>> just the presence of the passed device will cause the iperf3
>> performance to drop from 6.5 gbit/sec (no passthrough)
>> to 4.5 gbit/sec.
> I assume that the network interface that you are testing is a PV
> network interface?

Yes, win-pv.

>> Any explanation/fixes for that?
> Are both systems using the same version of Xen and Linux?

Yes, same SSD. Attaching it to different machines.

> I can't necessarily claim credit for the discovery but that is indeed
> the case, and the sort of performance drop seen is exactly what I'd
> expect. I recently put a change into the Windows PV drivers to use a
> ballooned-out region of the guest RAM to host the grant tables instead,
> which avoids this problem. We run with this little hack in XenServer,
> which also 'fixes' things for guest OSes that have not been modified:
>
> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c

I tried the patch and it seems to solve the problem. Thanks. Is the patch accepted by Xen devs as an upstream patch?

Regards Andreas
[Xen-devel] xen + i40e: transmit queue timeout
I am currently researching a transmit queue timeout with Xen 4.8.2 and Intel X722 (i40e driver). The problem occurs with various linux versions (4.8.17, 4.13.16, SLES 15 port of i40e). The problem seems to be related to heavy forwarding/bridging as I am running a heavy network stress test in a domU (linux/pvm 4.13.16). It seems that if I run the same test without Xen, it works (not sure).

Any ideas?

Regards Andreas

[ 441.823998] NETDEV WATCHDOG: eth0 (i40e): transmit queue 0 timed out
[ 441.824033] [ cut here ]
[ 441.824046] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x218/0x220
[ 441.824048] Modules linked in: i40e nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_physdev br_netfilter bridge stp llc xt_tcpudp iptable_filter ip_tables x_tables binfmt_misc tun mlx5_core
[ 441.824074] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.16-ak2 #1
[ 441.824077] Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 2.0 11/29/2017
[ 441.824079] task: 81810480 task.stack: 8180
[ 441.824084] RIP: e030:dev_watchdog+0x218/0x220
[ 441.824087] RSP: e02b:88005d203e68 EFLAGS: 00010296
[ 441.824091] RAX: 0038 RBX: RCX: 003e
[ 441.824093] RDX: RSI: 88005d203ce4 RDI: 0004
[ 441.824095] RBP: 88005d203e98 R08: 8800571ae200 R09: 880056c00248
[ 441.824097] R10: 0082 R11: 0040 R12: 880052da9f40
[ 441.824099] R13: R14: 880055d04800 R15: 0080
[ 441.824112] FS: () GS:88005d20() knlGS:88005d20
[ 441.824115] CS: e033 DS: ES: CR0: 80050033
[ 441.824117] CR2: 7f22b20ed3a0 CR3: 518d1000 CR4: 00042660
[ 441.824121] Call Trace:
[ 441.824123]
[ 441.824130] ? qdisc_rcu_free+0x40/0x40
[ 441.824133] ? qdisc_rcu_free+0x40/0x40
[ 441.824140] call_timer_fn.isra.5+0x1f/0x90
[ 441.824144] expire_timers+0x99/0xb0
[ 441.824148] run_timer_softirq+0x7b/0xc0
[ 441.824155] ? handle_percpu_irq+0x35/0x50
[ 441.824159] ? generic_handle_irq+0x1d/0x30
[ 441.824166] ? __evtchn_fifo_handle_events+0x142/0x160
[ 441.824170] __do_softirq+0xe5/0x200
[ 441.824174] irq_exit+0xb1/0xc0
[ 441.824179] xen_evtchn_do_upcall+0x2b/0x40
[ 441.824186] xen_do_hypervisor_callback+0x1e/0x30
[ 441.824188]
[ 441.824193] ? xen_hypercall_sched_op+0xa/0x20
[ 441.824197] ? xen_hypercall_sched_op+0xa/0x20
[ 441.824204] ? xen_safe_halt+0x10/0x20
[ 441.824207] ? default_idle+0x9/0x10
[ 441.824210] ? arch_cpu_idle+0xa/0x10
[ 441.824213] ? default_idle_call+0x1e/0x30
[ 441.824218] ? do_idle+0x183/0x1b0
[ 441.824222] ? cpu_startup_entry+0x18/0x20
[ 441.824226] ? rest_init+0xcb/0xd0
[ 441.824232] ? start_kernel+0x399/0x3a6
[ 441.824238] ? x86_64_start_reservations+0x2a/0x2c
[ 441.824241] ? xen_start_kernel+0x54f/0x55b
[ 441.824244] Code: 63 8e a0 03 00 00 eb 93 4c 89 f7 c6 05 89 22 3a 00 01 e8 7c fc fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 58 3e 76 81 e8 84 3a bf ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[ 441.824306] ---[ end trace ae5cd79c539b9f32 ]---
[ 441.824323] i40e 0000:19:00.0 eth0: tx_timeout: VSI_seid: 390, Q 0, NTC: 0x73, HWB: 0x73, NTU: 0x5d, TAIL: 0x73, INT: 0x1
[ 441.824330] i40e 0000:19:00.0 eth0: tx_timeout recovery level 1, hung_queue 0
Re: [Xen-devel] xen + i40e: transmit queue timeout
On Fri, 06 Jul 2018 14:03:00 +0200, Jan Beulich wrote:
>> I am currently researching a transmit queue timeout with Xen 4.8.2 and
>> Intel X722 (i40e driver). The problem occurs with various linux
>> versions (4.8.17, 4.13.16, SLES 15 port of i40e). The problem seems to
>> be related to heavy forwarding/bridging as I am running a heavy network
>> stress test in a domU (linux/pvm 4.13.16). It seems that if I run the
>> same test without Xen, it works (not sure).
> The log fragment below of course tells about nothing on why this is
> happening. Couple of questions therefore:

Thanks for suggesting helpful further steps.

> - Are interrupts still arriving for this device at the point of the
>   reported timeout?
> - Are interrupts distributed reasonably evenly between (v)CPUs?
> - Is the overall interrupt rate not higher than what the system can
>   reasonably handle (the lower handling overhead means without Xen a
>   higher rate would still be acceptable)?
> - Is the same heavy forwarding/bridging in effect when trying this
>   without Xen?
> - Does running the same stress test in Dom0 work?
> I take it that there are no other relevant messages in any of the logs,
> or else you would have provided them right away.

Actually, it seems that the driver is the problem, which is quite counterintuitive because the driver is quite old (started in 2013) and you would expect it to be very mature. For a test I used the 2.4.10 version from https://sourceforge.net/projects/e1000/files/i40e%20stable/ and all the problems went away. I am writing this here so that others with the same problem have a possible solution to try.

Regards Andreas
[Xen-devel] PCI passthrough performance loss with Skylake-SP
I am currently testing PCI passthrough on the Skylake-SP platform using a Supermicro X11SPi-TF mainboard. Using PCI passthrough (an LSI SAS HBA) causes severe performance loss on the Skylake-SP platform while Xeon E3 v5 is not affected at all. The test uses standard iperf3 as a client - the passed PCI device is not used in the test - so that just the presence of the passed device will cause the iperf3 performance to drop from 6.5 gbit/sec (no passthrough) to 4.5 gbit/sec. Any explanation/fixes for that?

Below is the first part of xl dmesg for both systems.

Regards Andreas

Xeon E3-1240v5:

(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) 0000:00:13.0: unknown type 0
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB

Skylake-SP:

(XEN) Intel VT-d iommu 2 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 3 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Allocated console ring of 128 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - APIC Register Virtualization
(XEN)  - Virtual Interrupt Delivery
(XEN)  - Posted Interrupt Processing
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN)  - TSC Scaling
[Xen-devel] Xen 4.10.x and PCI passthrough
Hello Roger,

in August 2017, I reported a problem with PCI passthrough and MSI interrupts (https://lists.xenproject.org/archives/html/xen-devel/2017-08/msg01433.html). That report led to some patches for Xen and qemu.

Some weeks ago I tried a quite new version of Xen 4.10.2-pre (http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a645331a9f4190e92ccf41a950bc4692f8904239) and the PCI card (LSI SAS HBA) using Windows 2012 R2 as a guest. Everything works, but only to the point where Windows reboots -> then the card is no longer usable. If you destroy the domain and recreate it, the card works again.

Did I miss something simple or should we analyze the problem again using similar debug prints as before?

Regards Andreas
Re: [Xen-devel] Xen 4.10.x and PCI passthrough
Hello Roger,

>> Some weeks ago I tried a quite new version of Xen 4.10.2-pre
>> (http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a645331a9f4190e92ccf41a950bc4692f8904239)
>> and the PCI card (LSI SAS HBA) using Windows 2012 R2 as a guest.
>> Everything works but only to the point where Windows reboots -> then
>> the card is no longer usable. If you destroy the domain and recreate
>> the card again
>> Did I miss something simple or should we analyze the problem again
>> using similar debug prints as before?
> Not sure, but it doesn't look to me like this issue is related to the
> one fixed by the patches mentioned above, I think this is a different
> issue, and by the looks of it it's a toolstack issue.
> Can you paste the output of `xl -vvv create ` and the contents of the
> log that you will find in /var/log/xen/xl-.log after you have attempted
> a reboot?

xl-domain.log is attached. From what I can see, the problem is that the card is deleted after domid 3 and not added back later. To confirm, I ran "xl pci-list winsrv" during domid 3 and the card is shown. After the Windows reboot (domid 4) "xl pci-list winsrv" gives an empty output.

Regards Andreas

[Attachment: xl-domain.log]
Re: [Xen-devel] [PATCH] libxl: keep assigned pci devices across domain reboots
> Fill the from_xenstore libxl_device_type hook for PCI devices so that
> libxl_retrieve_domain_configuration can properly retrieve PCI devices
> from xenstore. This fixes disappearing pci devices across domain
> reboots.

This patch seems to be committed now. Please backport this to the Xen 4.10 stable branch, for the upcoming 4.10.3, because the original bug report was about Xen 4.10. Thanks to the devs for the patch.

I tested the patch and I can confirm that it fixes my original problem on Xen 4.10. To use it with Xen 4.10 you need a small backport patch (attached).

Regards Andreas

[Attachment: backport-4.10.patch]