[Xen-devel] gcc version used by developers

2017-11-27 Thread Andreas Kinzler

Hello,

What version of gcc do Xen developers use to build Xen? Are gcc 5.4 and 6.4
safe to use?


Regards Andreas

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-09-24 Thread Andreas Kinzler

On 23.09.2019 10:17, Jan Beulich wrote:

While, according to AMD's processor specs page, the 3700X is just an
8-core chip, I wonder whether
https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01954.html
still affects this configuration as well. Could you give this a try in
at least the viridian=0 case? As to Linux, did you check that PVH


As a first input for the Xen developers, I used the tool from
http://www.etallen.com/cpuid.html to dump the complete cpuid information.


Regards Andreas


cpuid-3700x.tar.xz
Description: Binary data

Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-09-24 Thread Andreas Kinzler

On 23.09.2019 10:17, Jan Beulich wrote:

Does booting with a single vCPU work?

The number of vCPUs makes no difference.

Well, according to Steven it does, with viridian=0. Could you
re-check this?


I can confirm that viridian=0 AND vcpus=1 makes the system bootable
(though with a long delay).



at least the viridian=0 case? As to Linux, did you check that PVH
(or HVM, which you don't mention) guests actually start all their vCPU-s
successfully?


I just tried PVH and HVM with 8 vcpus. Everything works, tested with 
make -j9 on a kernel tree.


> 8-core chip, I wonder whether
> https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01954.html
> still affects this configuration as well. Could you give this a try in

Does it still make sense to try the patch given the cpuid dump I posted?
I also have an AMD 7302P in my lab (cpuid dump attached). Its behavior is no
different from the 3700X, and its cpuid is nearly the same.


Regards Andreas


cpuid-amd7302p.tar.xz
Description: Binary data

[Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-08-20 Thread Andreas Kinzler
While AMD Ryzen 2700X was working perfectly in my tests with Windows 10, 
the new 3700X does not even boot a Windows HVM. With viridian=1 you get 
BSOD HAL_MEMORY_ALLOCATION and with viridian=0 you get "multiprocessor 
config not supported".


xl dmesg says:
(XEN) d1v0 VIRIDIAN CRASH: ac 0 a0a0 f8065c06bf88 bf8
(XEN) d2v0 VIRIDIAN CRASH: ac 0 a0a0 f8035b049f88 bf8

Linux domUs with PV and PVH seem to work so far.

Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged 
from 2700X (working) to 3700X (crashing).


Is it a known problem? Did someone test the new EPYCs?

Regards Andreas



Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-08-20 Thread Andreas Kinzler

On 20.08.2019 20:12, Andrew Cooper wrote:

Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged
from 2700X (working) to 3700X (crashing).

So you've done a Zen v1 => Zen v2 CPU upgrade and an existing system?


By "existing system", do you mean the Windows installation? Yes, but that is
not relevant. The same BSODs happen if you boot the HVM with just the ISO
installation medium and no disks.



Is it a known problem? Did someone test the new EPYCs?

This looks familiar, and is still somewhere on my TODO list.


Do you already know the reason, or does that still need to be investigated?


Does booting with a single vCPU work?


The number of vCPUs makes no difference.

Regards Andreas


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-08-20 Thread Andreas Kinzler

On 20.08.2019 22:38, Andrew Cooper wrote:

On 20/08/2019 21:36, Andreas Kinzler wrote:

On 20.08.2019 20:12, Andrew Cooper wrote:

Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged
from 2700X (working) to 3700X (crashing).

So you've done a Zen v1 => Zen v2 CPU upgrade and an existing system?

With "existing system" you mean the Windows installation?

I meant same computer, not same VM.


I tried two mainboards: Asrock X370 Pro4 and AsrockRack X470D4U.
For Zen 2 you need to flash the BIOS. The X470D4U with BIOS 3.1 works with the
2700X but not with the 3700X. The X370 Pro4 worked with the 2700X on a somewhat
older BIOS and does not work with the 3700X on the current BIOS (6.00).



Yes, but it is not relevant. The same BSODs happen if you boot the HVM
with just the iso installation medium and no disks.

That's a useful datapoint.  I wouldn't expect this to be relevant, given
how Windows' HAL works.


It should make debugging quite "simple" for you because the problem can be
reproduced very easily.


Regards Andreas


[Xen-devel] Ryzen 3xxx works with Windows

2019-11-15 Thread Andreas Kinzler

Hello All,

I compared the CPUID listings from Ryzen 2700X (attached as tar.xz) to 
3700X and found only very few differences. I added


cpuid = [ "0x80000008:ecx=xxxxxxxxxxxxxxxx0100xxxxxxxxxxxx" ]

to xl.cfg and then Windows runs great with 16 vCPUs. Cinebench R15 score 
is >2050 which is more or less the bare metal value.
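The bit string for such an override is easy to get wrong by hand, so here is a minimal Python sketch that builds it. It assumes the xl.cfg convention of a 32-character string per register, most significant bit first, with 'x' meaning "keep the default", and it assumes that ApicIdCoreIdSize sits in ECX[15:12] of leaf 0x80000008 as described in the AMD APM. The helper name cpuid_bitmask is made up for this illustration and is not part of any Xen tool.

```python
def cpuid_bitmask(high_bit: int, low_bit: int, value: int) -> str:
    """Return a 32-character xl.cfg bit string, most significant bit first.

    Bits high_bit..low_bit are forced to `value`; all other positions are 'x'
    (leave the default policy untouched).
    """
    if not (0 <= low_bit <= high_bit <= 31):
        raise ValueError("bit range must lie within 0..31")
    width = high_bit - low_bit + 1
    if not (0 <= value < (1 << width)):
        raise ValueError("value does not fit in the bit range")
    field = format(value, "0{}b".format(width))
    return "x" * (31 - high_bit) + field + "x" * low_bit


# Assumption: ApicIdCoreIdSize is ECX[15:12] of CPUID leaf 0x80000008.
mask = cpuid_bitmask(15, 12, 4)
line = 'cpuid = [ "0x80000008:ecx={}" ]'.format(mask)
print(line)
```

Changing the last argument from 4 to another value is enough to try a different ApicIdCoreIdSize without counting characters.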


Regards Andreas


cpuid-2700X.tar.xz
Description: Binary data

Re: [Xen-devel] Ryzen 3xxx works with Windows

2019-11-18 Thread Andreas Kinzler

On 15.11.2019 18:13, George Dunlap wrote:

On 11/15/19 5:06 PM, Andreas Kinzler wrote:

Hello All,

I compared the CPUID listings from Ryzen 2700X (attached as tar.xz) to
3700X and found only very few differences. I added

cpuid = [ "0x80000008:ecx=xxxxxxxxxxxxxxxx0100xxxxxxxxxxxx" ]

to xl.cfg and then Windows runs great with 16 vCPUs. Cinebench R15 score
is >2050 which is more or less the bare metal value.

Awesome.  Any idea what those bits do?


From the AMD APM (https://www.amd.com/system/files/TechDocs/24594.pdf):

APIC ID size. The number of bits in the initial APIC20[ApicId] value 
that indicate core ID within a processor. A zero value indicates that 
legacy methods must be used to derive the maximum number of cores. The 
size of this field determines the maximum number of cores (MNC) that the 
processor could theoretically support, not the actual number of cores 
that are actually implemented or enabled on the processor, as indicated 
by CPUID Fn8000_0008_ECX[NC].

if (ApicIdCoreIdSize[3:0] == 0){
  // Used by legacy dual-core/single-core processors
  MNC = CPUID Fn8000_0008_ECX[NC] + 1;
} else {
  // use ApicIdCoreIdSize[3:0] field
  MNC = (2 ^ ApicIdCoreIdSize[3:0]);
}

The value programmed in 2700X is 4, on 3700X it is 7. See my dump in 
https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html


Please note that the value is an exponent - that means MNC is programmed 
as 16 for 2700X and 128 for 3700X.
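As a cross-check of the numbers above, the APM rule can be written as a short Python sketch. Note that "^" in the APM pseudocode denotes exponentiation, not XOR. The field positions used here (NC in ECX[7:0], ApicIdCoreIdSize in ECX[15:12]) follow the APM; the NC value of 15 is an assumption for a 16-thread part and does not affect the result when ApicIdCoreIdSize is non-zero.

```python
def max_cores(ecx: int) -> int:
    """Maximum number of cores (MNC) derived from CPUID Fn8000_0008 ECX."""
    nc = ecx & 0xFF                    # ECX[7:0]: NC, number of cores minus one
    core_id_size = (ecx >> 12) & 0xF   # ECX[15:12]: ApicIdCoreIdSize
    if core_id_size == 0:
        # Legacy method for older dual-core/single-core processors
        return nc + 1
    # The field is an exponent: MNC = 2 to the power of ApicIdCoreIdSize
    return 1 << core_id_size


for name, size in (("2700X", 4), ("3700X", 7)):
    ecx = (size << 12) | 15            # NC = 15 is assumed (16 logical CPUs)
    print(name, "MNC =", max_cores(ecx))
```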


Regards Andreas


Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices

2019-11-18 Thread Andreas Kinzler

On 15.11.2019 12:01, Andreas Kinzler wrote:

On 14.11.2019 12:29, Jan Beulich wrote:

On 14.11.2019 00:10, Andreas Kinzler wrote:

I came across the following: https://lkml.org/lkml/2019/8/29/536
Could that be the reason for the problem mentioned below? Xen is using
HPET as clocksource on the platform/mainboard. Is there an (easy) way to
verify if Xen uses PC10?

Hence I can only suggest that you try again with limited or no
use of C states, to at least get a hint as to a possible
I changed the BIOS setting to a limit of PC7 and it is now running. I 
have to wait for the result. Thanks.


Previously the drift after 4 days uptime was 60 sec. Now after 4 days 
uptime drift is 9 sec. So setting the package c-state limit to PC7 was a 
success.


Regards Andreas


Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices

2019-11-19 Thread Andreas Kinzler

On 19.11.2019 10:29, Jan Beulich wrote:

On 18.11.2019 20:35, Andreas Kinzler wrote:

On 15.11.2019 12:01, Andreas Kinzler wrote:

On 14.11.2019 12:29, Jan Beulich wrote:

On 14.11.2019 00:10, Andreas Kinzler wrote:

I came across the following: https://lkml.org/lkml/2019/8/29/536
Could that be the reason for the problem mentioned below? Xen is using
HPET as clocksource on the platform/mainboard. Is there an (easy) way to
verify if Xen uses PC10?

Hence I can only suggest that you try again with limited or no
use of C states, to at least get a hint as to a possible

I changed the BIOS setting to a limit of PC7 and it is now running. I
have to wait for the result. Thanks.


Previously the drift after 4 days uptime was 60 sec. Now after 4 days
uptime drift is 9 sec. So setting the package c-state limit to PC7 was a
success.


9s still seems quite a lot to me, but yes, it's an improvement.


It seems it is even better than some other platforms now. Some snapshot 
measurements from running systems:

Xeon E3-1230v5 (Skylake): drift of 4 sec per day (23.999MHz HPET)
Xeon E3-1240v6 (Kaby Lake): drift of 1.9 sec per day (23.999MHz HPET)
Xeon E3-1240v5 (Skylake): drift of 4.85 sec per day (23.999MHz HPET)
Xeon E5-1620v4 (Broadwell): drift of 2.7 sec per day (14.318MHz HPET)

All these values are not great, but it is OK for me.
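To make these figures comparable, they can be converted to parts per million (ppm), the unit NTP tools usually report. The following Python sketch uses only the numbers quoted in this thread; the labels are shorthand, and the two E-2136 rows assume that the 60 sec and 9 sec drifts were each measured over exactly 4 days.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(drift_seconds: float, days: float) -> float:
    """Clock drift expressed in parts per million of elapsed time."""
    return drift_seconds / (days * SECONDS_PER_DAY) * 1e6


# (label, drift in seconds, measurement period in days)
measurements = [
    ("Xeon E-2136, no PC limit", 60.0, 4.0),
    ("Xeon E-2136, PC7 limit", 9.0, 4.0),
    ("Xeon E3-1230v5", 4.0, 1.0),
    ("Xeon E3-1240v6", 1.9, 1.0),
    ("Xeon E3-1240v5", 4.85, 1.0),
    ("Xeon E5-1620v4", 2.7, 1.0),
]

for label, secs, days in measurements:
    print(f"{label:<26} {secs / days:5.2f} s/day {drift_ppm(secs, days):6.1f} ppm")
```

For scale, 1 sec per day corresponds to roughly 11.6 ppm.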


Now would you be up to checking whether, rather than via BIOS
settings (which not all BIOSes may offer) the same can be
achieved by using Xen's command line option "max_cstate="?
Also did you check whether further limiting C state use would


I cannot try on production machines. I may have a slot on lab machines 
but I cannot promise.


> further improve the situation? And did you possibly also check
> whether telling Xen not to use the HPET would make a difference?

Which other clocksource do you prefer? Is Xen tested (field-proven) on 
that other clocksource?


Regards Andreas


Re: [Xen-devel] Ryzen 3xxx works with Windows

2019-11-19 Thread Andreas Kinzler

On 18.11.2019 17:25, George Dunlap wrote:

Where were these values collected -- on a PV dom0?  Or from within the
guest?


Neither. Bare metal kernel - no Xen at all.


Could you try this with `0111` instead?


Works. '1000' crashes again. Now it is clear that 7 is the maximum 
Windows accepts.


Regards Andreas


Re: [Xen-devel] [PATCH] x86: avoid HPET use on certain Intel platforms

2019-11-22 Thread Andreas Kinzler

On 22.11.2019 13:58, Andrew Cooper wrote:

On 22/11/2019 12:57, Jan Beulich wrote:

On 22.11.2019 13:50, Andrew Cooper wrote:

On 22/11/2019 12:46, Jan Beulich wrote:

Linux commit fc5db58539b49351e76f19817ed1102bf7c712d0 says

"Some Coffee Lake platforms have a skewed HPET timer once the SoCs entered
  PC10, which in consequence marks TSC as unstable because HPET is used as
  watchdog clocksource for TSC."

Adjust a few types in touched or nearby code at the same time.

Reported-by ?

The Linux commit has a Suggested-by, but no Reported-by. Do you
want me to copy that one? Or else do you have any suggestion as
to who the reporter was?

Well - this patch was identified by someone on xen-devel, which I
presume was your basis for looking into it.


https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00662.html

BTW: Xeon E-2136 @ C242 has 8086:3eca as ID. One needs to check with 
Intel which combinations are really affected.


Regards Andreas


Re: [Xen-devel] [PATCH] x86: avoid HPET use on certain Intel platforms

2019-11-25 Thread Andreas Kinzler

On 25.11.2019 11:15, Jan Beulich wrote:

On 23.11.2019 00:10, Andreas Kinzler wrote:

BTW: Xeon E-2136 @ C242 has 8086:3eca as ID. One needs to check with
Intel which combinations are really affected.

Are you saying you observed the same issue on such a (server processor)
system as well? Neither its datasheet nor its specification update


The whole thread starting with 
https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00966.html 
was about Xeon E-2136.


Setting a limit to PC7 greatly reduced the drift 
(https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg01044.html)



(which I specifically downloaded and looked through just because of your
remark) have any mention of a similar issue. I also take it that the
code comment inherited from Linux says "SoCs" for a reason.


Even the kernel mailing list postings lack official confirmation from 
Intel. That is why I said: someone (with internal Intel knowledge) needs 
to confirm which combinations are affected.


Regards Andreas


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-10-02 Thread Andreas Kinzler

On 20.08.2019 22:38, Andrew Cooper wrote:

On 20/08/2019 21:36, Andreas Kinzler wrote:

Is it a known problem? Did someone test the new EPYCs?

This looks familiar, and is still somewhere on my TODO list.

Do you already know the reason or is that still to investigate?

Does booting with a single vCPU work?

Hmm - perhaps it's not the same issue then.  Either way, it's firmly in
the "still to investigate" phase.


Any update on the topic?

Regards Andreas


[Xen-devel] wall clock drift on C24x mainboard, best practices

2019-10-12 Thread Andreas Kinzler

Hello all, hello Paul,

On a certain new mainboard with chipset C242 and Intel Xeon E-2136 I 
notice a severe clock drift. This is from dom0:


# uptime
 20:13:52 up 81 days,  1:41,  1 user,  load average: 0.00, 0.00, 0.00
# hwclock
2019-10-12 20:27:37.204966+02:00
# date
Sat Oct 12 20:07:19 CEST 2019

Kernel is 4.13.16 vanilla, Xen 4.10.2

So after 81 days uptime there is a difference of over 20 minutes between 
"date" and "hwclock". I operate many Xen servers and have never seen 
such a great drift except on this type of mainboard. What could be the 
reason?
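A rough quantification of the drift, as a Python sketch using the timestamps above. It assumes that the hardware clock is the reference, that both clocks agreed at boot, and that the few seconds between running hwclock and date can be ignored; none of these assumptions is verified here.

```python
from datetime import datetime, timedelta

# Values copied from the hwclock, date and uptime output above
hwclock = datetime(2019, 10, 12, 20, 27, 37, 204966)
system_time = datetime(2019, 10, 12, 20, 7, 19)
uptime = timedelta(days=81, hours=1, minutes=41)

offset = (hwclock - system_time).total_seconds()
days_up = uptime.total_seconds() / 86_400
per_day = offset / days_up
ppm = offset / uptime.total_seconds() * 1e6

print(f"offset: {offset:.1f} s over {days_up:.2f} days")
print(f"drift:  {per_day:.2f} s/day ({ppm:.0f} ppm)")
```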


In general, what is the current best practice for NTP sync? Run it in 
dom0? In domU? Both? How does the domU type (Linux HVM/PVM/PVH or 
Windows HVM with WinPV drivers) make a difference?


Regards Andreas


Re: [Xen-devel] Debugging Windows HVM crashes on Ryzen 3xxx series CPUs.

2019-10-28 Thread Andreas Kinzler

Hello All,


https://www.reddit.com/r/Amd/comments/ckr5f4/amd_ryzen_3000_series_linux_support_and/
is concerning KVM, but it identified that the TOPOEXT feature was
important to getting windows to boot.


I just tried qemu 3.1.1 with KVM (kernel 5.1.21) on a Ryzen 3700X and 
started qemu with "-cpu host,-topoext" and it still works perfectly. So 
it seems that TOPOEXT is not relevant.


Regards Andreas



Re: [Xen-devel] Ryzen 3xxx plans for 4.13

2019-11-06 Thread Andreas Kinzler

On 06.11.2019 18:50, George Dunlap wrote:

Modern Windows guests (at least Windows 10 and Windows Server 2016)
crash when running under Xen on AMD Ryzen 3xxx desktop-class cpus (but
not the corresponding server cpus).


In my tests the second generation EPYC CPUs (codename "Rome") fail
exactly the same way as the Ryzen 3xxx desktop CPUs.


Regards Andreas


Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices

2019-11-13 Thread Andreas Kinzler

Hello All,

I came across the following: https://lkml.org/lkml/2019/8/29/536

Could that be the reason for the problem mentioned below? Xen is using 
HPET as clocksource on the platform/mainboard. Is there an (easy) way to 
verify if Xen uses PC10?


Regards Andreas

On 12.10.2019 20:47, Andreas Kinzler wrote:

Hello all, hello Paul,

On a certain new mainboard with chipset C242 and Intel Xeon E-2136 I 
notice a severe clock drift. This is from dom0:


# uptime
 20:13:52 up 81 days,  1:41,  1 user,  load average: 0.00, 0.00, 0.00
# hwclock
2019-10-12 20:27:37.204966+02:00
# date
Sat Oct 12 20:07:19 CEST 2019

Kernel is 4.13.16 vanilla, Xen 4.10.2

So after 81 days uptime there is a difference of over 20 minutes 
between "date" and "hwclock". I operate many Xen servers and have 
never seen such a great drift except on this type of mainboard. What 
could be the reason?


In general, what is the current best practice for NTP sync? Run it in 
dom0? In domU? Both? How does the domU type (Linux HVM/PVM/PVH or 
Windows HVM with WinPV drivers) make a difference?


Regards Andreas



Re: [Xen-devel] wall clock drift on Coffee Lake / C24x mainboard (HPET broken?), best practices

2019-11-15 Thread Andreas Kinzler

On 14.11.2019 12:29, Jan Beulich wrote:

On 14.11.2019 00:10, Andreas Kinzler wrote:

I came across the following: https://lkml.org/lkml/2019/8/29/536
Could that be the reason for the problem mentioned below? Xen is using
HPET as clocksource on the platform/mainboard. Is there an (easy) way to
verify if Xen uses PC10?

In principle this can be obtained via both the xenpm utility and
the 'c' debug key.


Both xenpm and 'c' debug key show only up to level 7 in Xen 4.10.x 
(unmodified code).



For Coffee Lake, however, I can't find any
indication in the SDM that a PC10 residency MSR would exist.


I used turbostat
(https://github.com/torvalds/linux/blob/master/tools/power/x86/turbostat/turbostat.c)
as a reference. See the functions has_c8910_msrs and intel_model_duplicates.


I then added Coffee Lake with PC8/9/10 to do_get_hw_residencies and then 
I got high counts in PC8+PC9 and zero in PC10.



Hence I can only suggest that you try again with limited or no
use of C states, to at least get a hint as to a possible


I changed the BIOS setting to a limit of PC7 and it is now running. I 
have to wait for the result. Thanks.


Regards Andreas


Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode

2019-11-15 Thread Andreas Kinzler

On 15.11.2019 11:57, George Dunlap wrote:

Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
guests") attempted to "fake up" a topology which would induce guest
operating systems to not treat vcpus as sibling hyperthreads.  This
involved (among other things) actually reporting hyperthreading as
available, but giving vcpus every other APICID.  The resulting cpu
featureset is invalid, but most operating systems on most hardware
managed to cope with it.

Unfortunately, Windows running on modern AMD hardware -- including
Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
confused by the resulting contradictory feature bits and crashes
during installation.  (Linux guests have so far continued to cope.)


I do not understand a central point: no matter why and/or how a fake
topology is presented by Xen, why does the older generation Ryzen 2xxx
work while Ryzen 3xxx does not? What changed on the AMD side (not in Xen!)
that causes the one to work and the other to fail?


Regards Andreas


Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode

2019-11-15 Thread Andreas Kinzler

On 15.11.2019 12:29, George Dunlap wrote:

On 11/15/19 11:17 AM, Andreas Kinzler wrote:

I do not understand a central point: No matter why and/or how a fake
topology is presented by Xen, why did the older generation Ryzen 2xxx
work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
causes the one to work and the other to fail?

The CPU features that the guest sees are a mix of the real underlying
features and changes made by Xen.  Xen and/or the hardware will behave


Why not analyze the bits in detail? I already posted the complete CPUID 
for 3700X 
(https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).


@Steven:
> If this is helpful, I can probably provide the same from:
>* Ryzen 2700x
>* Ryzen 3900x
Can you post for 2700X?

Then someone with detailed knowledge could compare the two?

Regards Andreas


Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode

2019-11-15 Thread Andreas Kinzler

On 15.11.2019 13:10, George Dunlap wrote:

On 11/15/19 11:39 AM, Andreas Kinzler wrote:

On 15.11.2019 12:29, George Dunlap wrote:

On 11/15/19 11:17 AM, Andreas Kinzler wrote:

I do not understand a central point: No matter why and/or how a fake
topology is presented by Xen, why did the older generation Ryzen 2xxx
work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
causes the one to work and the other to fail?

The CPU features that the guest sees are a mix of the real underlying
features and changes made by Xen.  Xen and/or the hardware will behave

Why not analyze the bits in detail? I already posted the complete CPUID
for 3700X
(https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).
Then someone with detailed knowledge could compare the two?

What would be the purpose?
The code is going to look like this --
an impenetrable maze of "switch" and "if" statements based on
individual bits or features or models.  *Somewhere* in Windows'
version of that code, there's a path which is triggered by


As of this moment all of this is just an assumption - you might very 
well be right, but it could also be something totally different. What if 
the CPUID is nearly identical? This would lead to the conclusion that 
the problem has completely different root causes.


Regards Andreas


Re: [Xen-devel] PCI passthrough performance loss with Skylake-SP

2018-07-01 Thread Andreas Kinzler
On Tue, 26 Jun 2018 09:47:11 +0200, Paul Durrant wrote:

> is not affected at all. The test uses standard iperf3 as a client - the
> passed PCI device is not used in the test - so that just the presence of
> the passed device will cause the iperf3 performance to drop from 6.5
> gbit/sec (no passthrough) to 4.5 gbit/sec.

I assume that the network interface that you are testing is a PV
network interface?


Yes, win-pv.


> Any explanation/fixes for that?
Are both systems using the same version of Xen and Linux?


Yes, it is the same SSD; I attach it to the different machines.

I can't necessarily claim credit for the discovery but that is indeed  
the case, and the sort of performance drop seen is exactly what I'd  
expect. I recently put a change into the Windows PV drivers to use a  
ballooned-out region of the guest RAM to host the grant tables instead,  
which avoids this problem.
We run with this little hack in XenServer, which also 'fixes' things for  
guests OS that have not been modified:

--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c


I tried the patch and it seems to solve the problem. Thanks.
Is the patch accepted by Xen devs as upstream patch?

Regards Andreas


[Xen-devel] xen + i40e: transmit queue timeout

2018-07-06 Thread Andreas Kinzler
I am currently researching a transmit queue timeout with Xen 4.8.2 and  
Intel X722 (i40e driver). The problem occurs with various linux versions  
(4.8.17, 4.13.16, SLES 15 port of i40e). The problem seems to be related  
to heavy forwarding/bridging as I am running a heavy network stress test  
in a domU (linux/pvm 4.13.16). It seems that if I run the same test  
without Xen, it works (not sure).


Any ideas?

Regards Andreas

[  441.823998] NETDEV WATCHDOG: eth0 (i40e): transmit queue 0 timed out
[  441.824033] ------------[ cut here ]------------
[  441.824046] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x218/0x220
[  441.824048] Modules linked in: i40e nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_physdev br_netfilter bridge stp llc xt_tcpudp iptable_filter ip_tables x_tables binfmt_misc tun mlx5_core
[  441.824074] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.16-ak2 #1
[  441.824077] Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 2.0 11/29/2017

[  441.824079] task: 81810480 task.stack: 8180
[  441.824084] RIP: e030:dev_watchdog+0x218/0x220
[  441.824087] RSP: e02b:88005d203e68 EFLAGS: 00010296
[  441.824091] RAX: 0038 RBX:  RCX:  
003e
[  441.824093] RDX:  RSI: 88005d203ce4 RDI:  
0004
[  441.824095] RBP: 88005d203e98 R08: 8800571ae200 R09:  
880056c00248
[  441.824097] R10: 0082 R11: 0040 R12:  
880052da9f40
[  441.824099] R13:  R14: 880055d04800 R15:  
0080
[  441.824112] FS:  () GS:88005d20()  
knlGS:88005d20

[  441.824115] CS:  e033 DS:  ES:  CR0: 80050033
[  441.824117] CR2: 7f22b20ed3a0 CR3: 518d1000 CR4:  
00042660

[  441.824121] Call Trace:
[  441.824123]  
[  441.824130]  ? qdisc_rcu_free+0x40/0x40
[  441.824133]  ? qdisc_rcu_free+0x40/0x40
[  441.824140]  call_timer_fn.isra.5+0x1f/0x90
[  441.824144]  expire_timers+0x99/0xb0
[  441.824148]  run_timer_softirq+0x7b/0xc0
[  441.824155]  ? handle_percpu_irq+0x35/0x50
[  441.824159]  ? generic_handle_irq+0x1d/0x30
[  441.824166]  ? __evtchn_fifo_handle_events+0x142/0x160
[  441.824170]  __do_softirq+0xe5/0x200
[  441.824174]  irq_exit+0xb1/0xc0
[  441.824179]  xen_evtchn_do_upcall+0x2b/0x40
[  441.824186]  xen_do_hypervisor_callback+0x1e/0x30
[  441.824188]  
[  441.824193]  ? xen_hypercall_sched_op+0xa/0x20
[  441.824197]  ? xen_hypercall_sched_op+0xa/0x20
[  441.824204]  ? xen_safe_halt+0x10/0x20
[  441.824207]  ? default_idle+0x9/0x10
[  441.824210]  ? arch_cpu_idle+0xa/0x10
[  441.824213]  ? default_idle_call+0x1e/0x30
[  441.824218]  ? do_idle+0x183/0x1b0
[  441.824222]  ? cpu_startup_entry+0x18/0x20
[  441.824226]  ? rest_init+0xcb/0xd0
[  441.824232]  ? start_kernel+0x399/0x3a6
[  441.824238]  ? x86_64_start_reservations+0x2a/0x2c
[  441.824241]  ? xen_start_kernel+0x54f/0x55b
[  441.824244] Code: 63 8e a0 03 00 00 eb 93 4c 89 f7 c6 05 89 22 3a 00 01 e8 7c fc fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 58 3e 76 81 e8 84 3a bf ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[  441.824306] ---[ end trace ae5cd79c539b9f32 ]---
[  441.824323] i40e 0000:19:00.0 eth0: tx_timeout: VSI_seid: 390, Q 0, NTC: 0x73, HWB: 0x73, NTU: 0x5d, TAIL: 0x73, INT: 0x1
[  441.824330] i40e 0000:19:00.0 eth0: tx_timeout recovery level 1, hung_queue 0

Re: [Xen-devel] xen + i40e: transmit queue timeout

2018-07-09 Thread Andreas Kinzler

On Fri, 06 Jul 2018 14:03:00 +0200, Jan Beulich  wrote:

I am currently researching a transmit queue timeout with Xen 4.8.2 and
Intel X722 (i40e driver). The problem occurs with various linux versions
(4.8.17, 4.13.16, SLES 15 port of i40e). The problem seems to be related
to heavy forwarding/bridging as I am running a heavy network stress test
in a domU (linux/pvm 4.13.16). It seems that if I run the same test
without Xen, it works (not sure).

The log fragment below of course tells about nothing on why this
is happening. Couple of questions therefore:


Thanks for suggesting helpful further steps.


- Are interrupts still arriving for this device at the point of the
  reported timeout?
- Are interrupts distributed reasonably evenly between (v)CPUs?
- Is the overall interrupt rate not higher than what the system
  can reasonably handle (the lower handling overhead means
  without Xen a higher rate would still be acceptable)?
- Is the same heavy forwarding/bridging in effect when trying this
  without Xen?
- Does running the same stress test in Dom0 work?
I take it that there are no other relevant messages in any of the
logs, or else you would have provided them right away.


Actually, it seems that the driver is the problem, which is quite
counterintuitive because the driver is quite old (started in 2013) and you
would expect it to be very mature.


For a test I used the 2.4.10 version from  
https://sourceforge.net/projects/e1000/files/i40e%20stable/ and all the  
problems went away. I am writing this here so that others with the same  
problem have a possible solution to try.


Regards Andreas


[Xen-devel] PCI passthrough performance loss with Skylake-SP

2018-06-25 Thread Andreas Kinzler
I am currently testing PCI passthrough on the Skylake-SP platform using a  
Supermicro X11SPi-TF mainboard. Using PCI passthrough (an LSI SAS HBA)  
causes severe performance loss on the Skylake-SP platform while Xeon E3 v5  
is not affected at all. The test uses standard iperf3 as a client - the  
passed PCI device is not used in the test - so that just the presence of  
the passed device will cause the iperf3 performance to drop from 6.5  
gbit/sec (no passthrough) to 4.5 gbit/sec.


Any explanation/fixes for that?

Below is the first part of xl dmesg for both systems.

Regards Andreas

Xeon E3-1240v5:
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) 0000:00:13.0: unknown type 0
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB

Skylake-SP:
(XEN) Intel VT-d iommu 2 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 3 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Allocated console ring of 128 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - APIC Register Virtualization
(XEN)  - Virtual Interrupt Delivery
(XEN)  - Posted Interrupt Processing
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN)  - TSC Scaling
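
For anyone comparing the two excerpts, the VMX feature lists can be diffed mechanically. The following is only a sketch in Python: the helper name vmx_features is made up for illustration, and the two strings are shortened copies of the excerpts above (only the first part of each dmesg was posted, so the comparison covers no more than that).

```python
def vmx_features(dmesg: str) -> set[str]:
    """Collect the '(XEN)  - <feature>' lines that follow the VMX header."""
    features = set()
    in_vmx = False
    for line in dmesg.splitlines():
        text = line.removeprefix("(XEN)").strip()
        if text.startswith("VMX: Supported advanced features"):
            in_vmx = True
        elif in_vmx and text.startswith("- "):
            features.add(text[2:])
        elif in_vmx:
            break  # first non-feature line ends the VMX list
    return features


E3_1240V5 = """\
(XEN)  - Dom0 mode: Relaxed
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN) HVM: ASIDs enabled.
"""

SKYLAKE_SP = """\
(XEN)  - Dom0 mode: Relaxed
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - APIC Register Virtualization
(XEN)  - Virtual Interrupt Delivery
(XEN)  - Posted Interrupt Processing
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN)  - TSC Scaling
"""

only_skylake = sorted(vmx_features(SKYLAKE_SP) - vmx_features(E3_1240V5))
only_e3 = sorted(vmx_features(E3_1240V5) - vmx_features(SKYLAKE_SP))
print("only on Skylake-SP:", only_skylake)
print("only on E3-1240v5:", only_e3)
```

On these excerpts the Skylake-SP system additionally reports APIC Register Virtualization, Virtual Interrupt Delivery, Posted Interrupt Processing and TSC Scaling; nothing is listed for the E3-1240v5 only. Whether any of these differences relates to the throughput drop is not established by the logs.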


[Xen-devel] Xen 4.10.x and PCI passthrough

2018-09-07 Thread Andreas Kinzler

Hello Roger,

in August 2017, I reported a problem with PCI passthrough and MSI  
interrupts  
(https://lists.xenproject.org/archives/html/xen-devel/2017-08/msg01433.html).


That report led to some patches for Xen and qemu.

Some weeks ago I tried a fairly recent version of Xen 4.10.2-pre
(http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a645331a9f4190e92ccf41a950bc4692f8904239)
with the PCI card (LSI SAS HBA), using Windows 2012 R2 as a guest.
Everything works, but only up to the point where Windows reboots; after
that the card is no longer usable. If you destroy the domain and recreate
it, the card works again.


Did I miss something simple or should we analyze the problem again using  
similar debug prints as before?


Regards Andreas


Re: [Xen-devel] Xen 4.10.x and PCI passthrough

2018-09-17 Thread Andreas Kinzler

Hello Roger,

> > Some weeks ago I tried a fairly recent version of Xen 4.10.2-pre
> > (http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a645331a9f4190e92ccf41a950bc4692f8904239)
> > with the PCI card (LSI SAS HBA), using Windows 2012 R2 as a guest.
> > Everything works, but only up to the point where Windows reboots;
> > after that the card is no longer usable. If you destroy the domain
> > and recreate it, the card works again.
> >
> > Did I miss something simple or should we analyze the problem again
> > using similar debug prints as before?
>
> Not sure, but it doesn't look to me like this issue is related to the
> one fixed by the patches mentioned above, I think this is a different
> issue, and by the looks of it it's a toolstack issue.
> Can you paste the output of `xl -vvv create ` and the
> contents of the log that you will find in
> /var/log/xen/xl-.log after you have attempted a reboot?


xl-domain.log is attached. From what I can see, the problem is that the  
card is deleted after domid 3 and not added back later. To confirm I ran  
"xl pci-list winsrv" during domid 3 and the card is shown. After Windows  
reboot (domid 4) "xl pci-list winsrv" gives an empty output.
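
The check described above can be scripted. This is a rough Python sketch, assuming it runs in dom0 with xl in the PATH; the function names are made up, and the only command used is the `xl pci-list` call already mentioned. The listing is only tested for being empty, so nothing is assumed about its column layout.

```python
import subprocess


def pci_list_output(domain: str) -> str:
    """Run `xl pci-list <domain>` and return its stdout (needs a Xen dom0)."""
    result = subprocess.run(
        ["xl", "pci-list", domain],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def has_assigned_devices(output: str) -> bool:
    """True if the listing contains anything besides whitespace."""
    return bool(output.strip())
```

Calling has_assigned_devices(pci_list_output("winsrv")) once before and once after the guest reboot should then give True followed by False on an affected system.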


Regards Andreas

Re: [Xen-devel] [PATCH] libxl: keep assigned pci devices across domain reboots

2018-10-01 Thread Andreas Kinzler

> > Fill the from_xenstore libxl_device_type hook for PCI devices so that
> > libxl_retrieve_domain_configuration can properly retrieve PCI devices
> > from xenstore.
> > This fixes disappearing pci devices across domain reboots.
>
> This patch seems to be committed now. Please backport this to Xen 4.10
> stable branch, for upcoming 4.10.3, because original bugreport was about
> Xen 4.10.


Thanks to the developers for the patch. I tested it and can confirm that
it fixes my original problem on Xen 4.10. To use it with Xen 4.10 you need
a small backport patch (attached).


Regards Andreas

backport-4.10.patch
Description: Binary data