Re: [Xen-devel] [PATCH] x86/apicv: enhance posted-interrupt processing

2017-02-27 Thread Xuquan (Quan Xu)
On February 23, 2017 4:38 PM, Chao Gao wrote:
>On Thu, Feb 23, 2017 at 11:55:15AM +, Xuquan (Quan Xu) wrote:
>>On February 23, 2017 7:01 PM, Jan Beulich wrote:
>> On 23.02.17 at 11:53,  wrote:
 On February 23, 2017 5:59 PM, Jan Beulich wrote:
 On 23.02.17 at 10:28,  wrote:
>> On February 18, 2017 12:33 AM, Jan Beulich wrote:
>> On 17.02.17 at 09:49,  wrote:
>diff --git a/xen/arch/x86/hvm/vmx/vmx.c
>b/xen/arch/x86/hvm/vmx/vmx.c
>index 61925cf..3887c32 100644
>--- a/xen/arch/x86/hvm/vmx/vmx.c
>+++ b/xen/arch/x86/hvm/vmx/vmx.c
>@@ -1846,8 +1846,7 @@ static void
>>>__vmx_deliver_posted_interrupt(struct vcpu *v)
> {
> unsigned int cpu = v->processor;
>
>-if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>>>&softirq_pending(cpu))
>- && (cpu != smp_processor_id()) )
>+if ( !softirq_pending(cpu) && (cpu !=
>+ smp_processor_id())
>+ )
>>
>> Jan,
>> could you help me present the definition of ' smp_processor_id()'
>>>and '
>> current' in __vmx_deliver_posted_interrupt() ? thanks..
>
>I'm afraid I don't understand the request.

 IOW,
  which vcpu does the 'current' refer to?
  which cpu does the ' smp_processor_id()' refer to?
>>>
>>>current: currently running vCPU
>>
>>in a SMP machine, are there more than one currently running vCPU?
>>I think so, the condition "if ( running && (in_irq() || (v != current)) )", in
>__vmx_deliver_posted_interrupt() looks strange -- when vCPU is running,
>why to check ' v != current '..
>>
>>
>>>smp_processor_id(): processor ID of the CPU we're running on
>>>
>>I think if vcpu is running, ' cpu != smp_processor_id() ' should be true.
>>
>
>I am afraid it is not right. when the 'current' wants to deliver posted 
>interrupt
>to 'current', the vcpu is running and cpu == smp_processor_id().
>
>>
>>I think we could simplify __vmx_deliver_posted_interrupt():
>>
>>   1. set VCPU_KICK_SOFTIRQ bit of v->processor.
>
>Do we need set VCPU_KICK_SOFTIRQ bit when vcpu is not running? IMO, if
>vcpu is not running, just unblock (or, vcpu_kick since when vcpu is not
>running, vcpu_kick is the same with vcpu_unblock )is enough.
>
>>   2. IF vcpu is running:
>> - send_IPI
>
>Do we need send self-IPI when the target vcpu is 'current'? IMO, set
>VCPU_KICK_SOFTIQR is enough.
>
>> ELSE
>> - vcpu_kick
>>
>>

Chao, I almost agree to your comments.. I will send out v2 later..
I appreciate your(KT, JB and you) comments that help me understand posted 
interrupt..

Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [linux-linus test] 106172: regressions - FAIL

2017-02-27 Thread osstest service owner
flight 106172 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106172/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail REGR. vs. 59254
 test-armhf-armhf-xl   6 xen-boot  fail REGR. vs. 59254
 test-armhf-armhf-xl-multivcpu  6 xen-boot fail REGR. vs. 59254
 test-armhf-armhf-xl-xsm   6 xen-boot  fail REGR. vs. 59254
 build-i386-pvops  5 kernel-build  fail REGR. vs. 59254
 test-armhf-armhf-xl-cubietruck 11 guest-start fail REGR. vs. 59254
 test-armhf-armhf-libvirt  6 xen-boot  fail REGR. vs. 59254
 test-armhf-armhf-xl-credit2   6 xen-boot  fail REGR. vs. 59254
 test-armhf-armhf-xl-arndale   6 xen-boot  fail REGR. vs. 59254
 test-armhf-armhf-libvirt-xsm  6 xen-boot  fail REGR. vs. 59254
 build-armhf-pvops 4 host-build-prep fail in 106152 REGR. vs. 59254

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 13 guest-localmigrate fail pass in 
106152

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  6 xen-boot  fail REGR. vs. 59254
 test-amd64-amd64-xl-rtds  9 debian-installfail REGR. vs. 59254
 test-armhf-armhf-xl-vhd   6 xen-bootfail baseline untested
 test-armhf-armhf-libvirt-raw  6 xen-bootfail baseline untested
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 59254
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 59254

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu  1 build-check(1)  blocked in 106152 n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1) blocked in 106152 n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked in 106152 n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked in 106152 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-raw1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumprun-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-arm64-arm64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-win7-amd64  1 build-che

Re: [Xen-devel] [PATCH] x86/emul: Fix sarx emulation test

2017-02-27 Thread Jan Beulich
>>> Andrew Cooper  02/24/17 7:14 PM >>>
>The emulation tests run `sarx %edx,(%ecx),%ebx` with 0xfedcba98 pointed at by
>%ecx, and 0xff13 in %rdx.
>
>As the instruction uses a 32bit operand size, the expected result is
>0xffdb in %rbx (rather than 0xffdb), due to normal
>usual zeroing of the upper half of the 64bit register.
>
>The test harness was incorrectly sign extending from 32 bits to 64 bits rather
>than zero extending when checking the result of emulation, causing a false
>negative failure.
>
>Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

>--- a/tools/tests/x86_emulator/test_x86_emulator.c
>+++ b/tools/tests/x86_emulator/test_x86_emulator.c
>@@ -1155,7 +1155,7 @@ int main(int argc, char **argv)
>regs.eflags = 0xa43;
>rc = x86_emulate(&ctxt, &emulops);
>if ( (rc != X86EMUL_OKAY) ||
>- regs.ebx != ((signed)*res >> (regs.edx & 0x1f)) ||
>+ regs.ebx != (unsigned)(((signed)*res >> (regs.edx & 0x1f))) ||

If "ebx" was what its name says, there wouldn't have been a problem.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v8 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-27 Thread Jan Beulich
>>> Wei Liu  02/26/17 6:41 PM >>>
>On Wed, Feb 15, 2017 at 04:49:19PM +0800, Yi Sun wrote:
>> +static void free_feature(struct psr_socket_info *info)
>> +{
>> +struct feat_node *feat, *next;
>> +
>> +if ( !info )
>> +return;
>> +
>> +/*
>> + * Free resources of features. But we do not free global feature list
>> + * entry, like feat_l3_cat. Although it may cause a few memory leak,
>> + * it is OK simplify things.
>
>I don't think it is OK to leak memory in the hypervisor in general.
>Please specify why it is OK in this particular case in the comment.

The problem is to call this a leak in the comment. There's no leak here,
the allocation is simply being kept until the next CPU online attempt.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 2/8] xen/passthrough: Reject self-(de)assignment of devices

2017-02-27 Thread Chao Gao
That is to say, don't support a domain assigns a device to itself or detachs
a device from itself.

Signed-off-by: Chao Gao 
---
v9:
- Newly added

 xen/drivers/passthrough/pci.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 338d6b4..a80d59e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1603,6 +1603,13 @@ int iommu_do_pci_domctl(
 break;
 
 case XEN_DOMCTL_assign_device:
+/* Don't support self-(de)assignment of devices */
+if ( d == current->domain )
+{
+ret = -EINVAL;
+break;
+}
+
 ret = -ENODEV;
 if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_PCI )
 break;
@@ -1643,6 +1650,13 @@ int iommu_do_pci_domctl(
 break;
 
 case XEN_DOMCTL_deassign_device:
+/* Don't support self-(de)assignment of devices */
+if ( d == current->domain )
+{
+ret = -EINVAL;
+break;
+}
+
 ret = -ENODEV;
 if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_PCI )
 break;
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 0/8] VMX: Properly handle pi descriptor and per-cpu

2017-02-27 Thread Chao Gao
The current VT-d PI related code may operate incorrectly in the 
following scenarios: 
1. When the last assigned device is dettached from the domain, all 
the PI related hooks are removed then, however, the vCPU can be 
blocked, switched to another pCPU, etc, all without the aware of 
PI. After the next time we attach another device to the domain, 
which makes the PI realted hooks avaliable again, the status 
of the pi descriptor is not true. Beside that, the blocking vcpu 
may still remain in the per-cpu blocking in this case. Patch [1/8] 
and [3/8] handle this. Patch [2/8] rejects self-(de)assignment to
make domain pause in [3/8] safe.

2. In the current code, we didn't mind the assignment sequences
of the four hook functions(ie, vcpu_block, switch_to, switch_from,
do_resume). It caused the last the assertion in vmx_vcpu_block()
failed. Patch [4/8] handles the sequences carefully to make sure
that the assertion will not be triggered again.

3. When IRTE is in posted mode, we don't need to set the irq 
affinity for it, since the destination of these interrupts is 
vCPU and the vCPU affinity is set during vCPU scheduling. Patch 
[5/8] handles this. In order to make the logic clearer, it also
extracts the common part from pi_update_irte() and
msi_msg_to_rte_entry() which two functions update irte to make
the logic clearer.

4. [6/8] is a cleanup patch 

5. When a pCPU is unplugged, and there might be vCPUs on its 
list. Since the pCPU is offline, those vCPUs might not be woken 
up again. [7/8] addresses it. 

6. IRTE is updated through structure assigment or memcpy() which is
unsafe. To resolve this, Patch [8/8] use cmpxchg16b() if supported or
two 64-bit write operations to update irte.

Note that I just took over feng's work and the v9 patch series are based
on feng's v8 patch
(https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg01472.html).
Most of these patches are reviewed or acked by maintainers. Patch [2/8],
Patch [5/8], Patch [8/8] is added by me to address remainly issues.

Chao Gao (3):
  xen/passthrough: Reject self-(de)assignment of devices
  VT-d: Introduce a new function update_irte_for_msi_common
  VT-d: Add copy_irte_{to,from}_irt for updating irte

Feng Wu (5):
  VMX: Permanently assign PI hook vmx_pi_switch_to()
  VMX: Properly handle pi when all the assigned devices are removed
  VMX: Make sure PI is in proper state before install the hooks
  VT-d: Some cleanups
  VMX: Fixup PI descriptor when cpu is offline

 xen/arch/x86/hvm/vmx/vmcs.c|  14 +-
 xen/arch/x86/hvm/vmx/vmx.c | 134 +-
 xen/drivers/passthrough/pci.c  |  14 ++
 xen/drivers/passthrough/vtd/intremap.c | 252 +++--
 xen/include/asm-x86/hvm/vmx/vmx.h  |   3 +
 5 files changed, 297 insertions(+), 120 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 4/8] VMX: Make sure PI is in proper state before install the hooks

2017-02-27 Thread Chao Gao
From: Feng Wu 

We may hit the last ASSERT() in vmx_vcpu_block in the current code,
since vmx_vcpu_block() may get called before vmx_pi_switch_to()
has been installed or executed. Here We use cmpxchg to update
the NDST field, this can make sure we only update the NDST when
vmx_pi_switch_to() has not been called. So the NDST is in a
proper state in vmx_vcpu_block().

Suggested-by: Jan Beulich 
Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
Acked-by: Kevin Tian 
---
v6: 
- Comments changes 
- Define macro 'APIC_INVALID_DEST' for '0x' 

v5: 
- Use 0x as the invalid value for NDST field. 

v4: 
- This patch is previously called "Pause/Unpause the domain before/after
assigning PI hooks" 
- Remove the pause/unpause method 
- Use cmpxchg to update NDST 

 xen/arch/x86/hvm/vmx/vmcs.c   | 13 +
 xen/arch/x86/hvm/vmx/vmx.c| 27 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0b77dbc..7905d3e 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -956,16 +956,13 @@ void virtual_vmcs_vmwrite(const struct vcpu *v, u32 
vmcs_encoding, u64 val)
  */
 static void pi_desc_init(struct vcpu *v)
 {
-uint32_t dest;
-
 v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
 
-dest = cpu_physical_id(v->processor);
-
-if ( x2apic_enabled )
-v->arch.hvm_vmx.pi_desc.ndst = dest;
-else
-v->arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+/*
+ * Mark NDST as invalid, then we can use this invalid value as a
+ * marker to whether update NDST or not in vmx_pi_hooks_assign().
+ */
+v->arch.hvm_vmx.pi_desc.ndst = APIC_INVALID_DEST;
 }
 
 static int construct_vmcs(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a7a70e7..e03786b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -246,14 +246,39 @@ static void vmx_pi_do_resume(struct vcpu *v)
 /* This function is called when pcidevs_lock is held */
 void vmx_pi_hooks_assign(struct domain *d)
 {
+struct vcpu *v;
+
 if ( !iommu_intpost || !has_hvm_container_domain(d) )
 return;
 
 ASSERT(!d->arch.hvm_domain.pi_ops.vcpu_block);
 
-d->arch.hvm_domain.pi_ops.vcpu_block = vmx_vcpu_block;
+/*
+ * We carefully handle the timing here:
+ * - Install the context switch first
+ * - Then set the NDST field
+ * - Install the block and resume hooks in the end
+ *
+ * This can make sure the PI (especially the NDST feild) is
+ * in proper state when we call vmx_vcpu_block().
+ */
 d->arch.hvm_domain.pi_ops.switch_from = vmx_pi_switch_from;
 d->arch.hvm_domain.pi_ops.switch_to = vmx_pi_switch_to;
+
+for_each_vcpu ( d, v )
+{
+unsigned int dest = cpu_physical_id(v->processor);
+struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+/*
+ * We don't need to update NDST if vmx_pi_switch_to()
+ * has already got called.
+ */
+(void)cmpxchg(&pi_desc->ndst, APIC_INVALID_DEST,
+x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+}
+
+d->arch.hvm_domain.pi_ops.vcpu_block = vmx_vcpu_block;
 d->arch.hvm_domain.pi_ops.do_resume = vmx_pi_do_resume;
 }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h 
b/xen/include/asm-x86/hvm/vmx/vmx.h
index f4183d9..00e6f0d 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -601,6 +601,8 @@ void vmx_pi_per_cpu_init(unsigned int cpu);
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
 
+#define APIC_INVALID_DEST   0x
+
 /* EPT violation qualifications definitions */
 typedef union ept_qual {
 unsigned long raw;
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 1/8] VMX: Permanently assign PI hook vmx_pi_switch_to()

2017-02-27 Thread Chao Gao
From: Feng Wu 

PI hook vmx_pi_switch_to() is needed even after any previously
assigned device is detached from the domain. Since 'SN' bit is
also used to control the CPU side PI and we change the state of
SN bit in vmx_pi_switch_to() and vmx_pi_switch_from(), then
evaluate this bit in vmx_deliver_posted_intr() when trying to
deliver the interrupt in posted way via software. The problem
is if we deassign the hooks while the vCPU is runnable in the
runqueue with 'SN' set, all the furture notificaton event will
be suppressed. This patch makes the hook permanently assigned.

Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
Reviewed-by: Konrad Rzeszutek Wilk 
---
v9:
- Comments changes [per Kevin's comments]

v8: 
- Comments changes 

v7: 
- comments changes. 

v6: 
- Adjust the comments and wording. 

v5: 
- Zap "pi_switch_from" hook 

v4: 
- Don't zap vmx_pi_switch_from() and vmx_pi_switch_to() when 
any previously assigned device is detached from the domain. 
- Comments changes. 

 xen/arch/x86/hvm/vmx/vmx.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d3d98da..5fa16dd 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -260,9 +260,15 @@ void vmx_pi_hooks_deassign(struct domain *d)
 
 ASSERT(d->arch.hvm_domain.pi_ops.vcpu_block);
 
+/*
+ * Note that we don't set 'd->arch.hvm_domain.pi_ops.switch_to' to NULL
+ * here. If we deassign the hooks while the vCPU is runnable in the
+ * runqueue with 'SN' set, all the future notification event will be
+ * suppressed. Preserving the 'switch_to' hook function can make sure
+ * event time the vCPU is running the 'SN' field is cleared.
+ */
 d->arch.hvm_domain.pi_ops.vcpu_block = NULL;
 d->arch.hvm_domain.pi_ops.switch_from = NULL;
-d->arch.hvm_domain.pi_ops.switch_to = NULL;
 d->arch.hvm_domain.pi_ops.do_resume = NULL;
 }
 
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 3/8] VMX: Properly handle pi when all the assigned devices are removed

2017-02-27 Thread Chao Gao
From: Feng Wu 

This patch handles some corner cases when the last assigned device
is removed from the domain. In this case we should carefully handle
pi descriptor and the per-cpu blocking list, to make sure:
- all the PI descriptor are in the right state when next time a
devices is assigned to the domain again.
- No remaining vcpus of the domain in the per-cpu blocking list.

Here we call vmx_pi_unblock_vcpu() to remove the vCPU from the blocking list
if it is on the list. However, this could happen when vmx_vcpu_block() is
being called, hence we might incorrectly add the vCPU to the blocking list
while the last devcie is detached from the domain. Consider that the situation
can only occur when detaching the last device from the domain and it is not
a frequent operation, so we use domain_pause before that, which is considered
as an clean and maintainable solution for the situation.

Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
v9:
- Based on [v8 2/7]. Add a assertion before domain pause.

v7: 
- Prevent the domain from pausing itself. 

v6: 
- Comments changes 
- Rename vmx_pi_list_remove() to vmx_pi_unblock_vcpu() 

v5: 
- Remove a no-op wrapper 

v4: 
- Rename some functions: 
vmx_pi_remove_vcpu_from_blocking_list() -> vmx_pi_list_remove() 
vmx_pi_blocking_cleanup() -> vmx_pi_list_cleanup() 
- Remove the check in vmx_pi_list_cleanup() 
- Comments adjustment 

 xen/arch/x86/hvm/vmx/vmx.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 5fa16dd..a7a70e7 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -155,14 +155,12 @@ static void vmx_pi_switch_to(struct vcpu *v)
 pi_clear_sn(pi_desc);
 }
 
-static void vmx_pi_do_resume(struct vcpu *v)
+static void vmx_pi_unblock_vcpu(struct vcpu *v)
 {
 unsigned long flags;
 spinlock_t *pi_blocking_list_lock;
 struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
 
-ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
-
 /*
  * Set 'NV' field back to posted_intr_vector, so the
  * Posted-Interrupts can be delivered to the vCPU when
@@ -170,12 +168,12 @@ static void vmx_pi_do_resume(struct vcpu *v)
  */
 write_atomic(&pi_desc->nv, posted_intr_vector);
 
-/* The vCPU is not on any blocking list. */
 pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
 
 /* Prevent the compiler from eliminating the local variable.*/
 smp_rmb();
 
+/* The vCPU is not on any blocking list. */
 if ( pi_blocking_list_lock == NULL )
 return;
 
@@ -195,6 +193,13 @@ static void vmx_pi_do_resume(struct vcpu *v)
 spin_unlock_irqrestore(pi_blocking_list_lock, flags);
 }
 
+static void vmx_pi_do_resume(struct vcpu *v)
+{
+ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
+
+vmx_pi_unblock_vcpu(v);
+}
+
 /*
  * To handle posted interrupts correctly, we need to set the following
  * state:
@@ -255,12 +260,23 @@ void vmx_pi_hooks_assign(struct domain *d)
 /* This function is called when pcidevs_lock is held */
 void vmx_pi_hooks_deassign(struct domain *d)
 {
+struct vcpu *v;
+
 if ( !iommu_intpost || !has_hvm_container_domain(d) )
 return;
 
 ASSERT(d->arch.hvm_domain.pi_ops.vcpu_block);
 
 /*
+ * Pausing the domain can make sure the vCPUs are not
+ * running and hence not calling the hooks simultaneously
+ * when deassigning the PI hooks and removing the vCPU
+ * from the blocking list.
+ */
+ASSERT(current->domain != d);
+domain_pause(d);
+
+/*
  * Note that we don't set 'd->arch.hvm_domain.pi_ops.switch_to' to NULL
  * here. If we deassign the hooks while the vCPU is runnable in the
  * runqueue with 'SN' set, all the future notification event will be
@@ -270,6 +286,11 @@ void vmx_pi_hooks_deassign(struct domain *d)
 d->arch.hvm_domain.pi_ops.vcpu_block = NULL;
 d->arch.hvm_domain.pi_ops.switch_from = NULL;
 d->arch.hvm_domain.pi_ops.do_resume = NULL;
+
+for_each_vcpu ( d, v )
+vmx_pi_unblock_vcpu(v);
+
+domain_unpause(d);
 }
 
 static int vmx_domain_initialise(struct domain *d)
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 7/8] VMX: Fixup PI descriptor when cpu is offline

2017-02-27 Thread Chao Gao
From: Feng Wu 

When cpu is offline, we need to move all the vcpus in its blocking
list to another online cpu, this patch handles it.

Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
Acked-by: Kevin Tian 
---
v7: 
- Pass unsigned int to vmx_pi_desc_fixup()

v6: 
- Carefully suppress 'SN' to avoid missing notification event
during moving the vcpu to the new list

v5: 
- Add some comments to explain why it doesn't cause deadlock
for the ABBA deadlock scenario. 

v4: 
- Remove the pointless check since we are in machine stop
context and no other cpus go down in parallel.

 xen/arch/x86/hvm/vmx/vmcs.c   |  1 +
 xen/arch/x86/hvm/vmx/vmx.c| 70 +++
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 3 files changed, 72 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7905d3e..7e3e093 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -578,6 +578,7 @@ void vmx_cpu_dead(unsigned int cpu)
 vmx_free_vmcs(per_cpu(vmxon_region, cpu));
 per_cpu(vmxon_region, cpu) = 0;
 nvmx_cpu_dead(cpu);
+vmx_pi_desc_fixup(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e03786b..b8a385b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -200,6 +200,76 @@ static void vmx_pi_do_resume(struct vcpu *v)
 vmx_pi_unblock_vcpu(v);
 }
 
+void vmx_pi_desc_fixup(unsigned int cpu)
+{
+unsigned int new_cpu, dest;
+unsigned long flags;
+struct arch_vmx_struct *vmx, *tmp;
+spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+if ( !iommu_intpost )
+return;
+
+/*
+ * We are in the context of CPU_DEAD or CPU_UP_CANCELED notification,
+ * and it is impossible for a second CPU go down in parallel. So we
+ * can safely acquire the old cpu's lock and then acquire the new_cpu's
+ * lock after that.
+ */
+spin_lock_irqsave(old_lock, flags);
+
+list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+{
+/*
+ * Suppress notification or we may miss an interrupt when the
+ * target cpu is dying.
+ */
+pi_set_sn(&vmx->pi_desc);
+
+/*
+ * Check whether a notification is pending before doing the
+ * movement, if that is the case we need to wake up it directly
+ * other than moving it to the new cpu's list.
+ */
+if ( pi_test_on(&vmx->pi_desc) )
+{
+list_del(&vmx->pi_blocking.list);
+vmx->pi_blocking.lock = NULL;
+vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
+}
+else
+{
+/*
+ * We need to find an online cpu as the NDST of the PI descriptor, 
it
+ * doesn't matter whether it is within the cpupool of the domain or
+ * not. As long as it is online, the vCPU will be woken up once the
+ * notification event arrives.
+ */
+new_cpu = cpumask_any(&cpu_online_map);
+new_lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;
+
+spin_lock(new_lock);
+
+ASSERT(vmx->pi_blocking.lock == old_lock);
+
+dest = cpu_physical_id(new_cpu);
+write_atomic(&vmx->pi_desc.ndst,
+ x2apic_enabled ? dest : MASK_INSR(dest, 
PI_xAPIC_NDST_MASK));
+
+list_move(&vmx->pi_blocking.list,
+  &per_cpu(vmx_pi_blocking, new_cpu).list);
+vmx->pi_blocking.lock = new_lock;
+
+spin_unlock(new_lock);
+}
+
+pi_clear_sn(&vmx->pi_desc);
+}
+
+spin_unlock_irqrestore(old_lock, flags);
+}
+
 /*
  * To handle posted interrupts correctly, we need to set the following
  * state:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h 
b/xen/include/asm-x86/hvm/vmx/vmx.h
index 00e6f0d..70155fb 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -597,6 +597,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void vmx_pi_per_cpu_init(unsigned int cpu);
+void vmx_pi_desc_fixup(unsigned int cpu);
 
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 6/8] VT-d: Some cleanups

2017-02-27 Thread Chao Gao
From: Feng Wu 

Use type-safe structure assignment instead of memcpy()
Use sizeof(*iremap_entry).

Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
Reviewed-by: Konrad Rzeszutek Wilk 
Acked-by: Kevin Tian 
---
v9:
- Delete several lines for patch [5/8] has been reworked.

v7: 
- Remove a useless cleanup

v6: 
- More descripion about the patch

 xen/drivers/passthrough/vtd/intremap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index 4269cd4..737b886 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -183,8 +183,8 @@ static void free_remap_entry(struct iommu *iommu, int index)
 GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
  iremap_entries, iremap_entry);
 
-memset(iremap_entry, 0, sizeof(struct iremap_entry));
-iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+memset(iremap_entry, 0, sizeof(*iremap_entry));
+iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
 
 unmap_vtd_domain_page(iremap_entries);
@@ -310,7 +310,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
  iremap_entries, iremap_entry);
 
-memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
+new_ire = *iremap_entry;
 
 if ( rte_upper )
 {
@@ -353,8 +353,8 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 remap_rte->format = 1;/* indicate remap format */
 }
 
-memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+*iremap_entry = new_ire;
+iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
 
 unmap_vtd_domain_page(iremap_entries);
@@ -639,7 +639,7 @@ static int update_irte_for_msi_common(
 iremap_entry->hi = new_ire.hi;
 }
 
-iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
 }
 
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 5/8] VT-d: Introduce a new function update_irte_for_msi_common

2017-02-27 Thread Chao Gao
Both pi_update_irte() and msi_msg_to_remap_entry() update the content of IRTE;
besides, the current msi_msg_to_remap_entry is buggy when the live IRTE is in
posted format. This patch try to rework these two functions to make them
clearer by moving their common part to the new function.

Signed-off-by: Feng Wu 
Signed-off-by: Chao Gao 
---
v9:
- Newly added.

 xen/drivers/passthrough/vtd/intremap.c | 232 +++--
 1 file changed, 131 insertions(+), 101 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index bfd468b..4269cd4 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -420,7 +420,8 @@ void io_apic_write_remap_rte(
 __ioapic_write_entry(apic, ioapic_pin, 1, old_rte);
 }
 
-static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
+static void set_msi_source_id(const struct pci_dev *pdev,
+  struct iremap_entry *ire)
 {
 u16 seg;
 u8 bus, devfn, secbus;
@@ -547,16 +548,116 @@ static int remap_entry_to_msi_msg(
 return 0;
 }
 
+/*
+ * This function is a common interface to update irte for msi case.
+ *
+ * If @pi_desc != NULL and @gvec != 0, the IRTE will be updated to a posted
+ * format. In this case, @msg is ignored because constructing a posted format
+ * IRTE doesn't need any information about the msi address or msi data.
+ *
+ * If @pi_desc == NULL and @gvec == 0, the IRTE will be updated to a remapped
+ * format. In this case, @msg can't be NULL.
+ *
+ * Assume 'ir_ctrl->iremap_lock' has been acquired and the remap_index
+ * of msi_desc has a benign value.
+ */
+static int update_irte_for_msi_common(
+struct iommu *iommu, const struct pci_dev *pdev,
+const struct msi_desc *msi_desc, struct msi_msg *msg,
+const struct pi_desc *pi_desc, const uint8_t gvec)
+{
+struct iremap_entry *iremap_entry = NULL, *iremap_entries;
+struct iremap_entry new_ire = {{0}};
+unsigned int index = msi_desc->remap_index;
+struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+
+ASSERT( ir_ctrl );
+ASSERT( spin_is_locked(&ir_ctrl->iremap_lock) );
+ASSERT( (index >= 0) && (index < IREMAP_ENTRY_NR) );
+
+if ( (!pi_desc && gvec) || (pi_desc && !gvec) )
+return -EINVAL;
+
+if ( !pi_desc && !gvec && !msg )
+return -EINVAL;
+
+GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
+ iremap_entries, iremap_entry);
+
+if ( !pi_desc )
+{
+   /* Set interrupt remapping table entry */
+new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
+new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
+new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
+/* Hardware require RH = 1 for LPR delivery mode */
+new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+MSI_DATA_VECTOR_MASK;
+if ( x2apic_enabled )
+new_ire.remap.dst = msg->dest32;
+else
+new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
+ & 0xff) << 8;
+new_ire.remap.p = 1;
+}
+else
+{
+new_ire.post.im = 1;
+new_ire.post.vector = gvec;
+new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
+new_ire.post.p = 1;
+}
+
+if ( pdev )
+set_msi_source_id(pdev, &new_ire);
+else
+set_hpet_source_id(msi_desc->hpet_id, &new_ire);
+
+if ( iremap_entry->val != new_ire.val )
+{
+if ( cpu_has_cx16 )
+{
+__uint128_t ret;
+struct iremap_entry old_ire;
+
+old_ire = *iremap_entry;
+ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
+
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validate it.
+ */
+ASSERT(ret == old_ire.val);
+}
+else
+{
+iremap_entry->lo = new_ire.lo;
+iremap_entry->hi = new_ire.hi;
+}
+
+iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+iommu_flush_iec_index(iommu, 0, index);
+}
+
+unmap_vtd_domain_page(iremap_entries);
+return 0;
+}
+
 static int msi_msg_to_remap_entry(
-struct iommu *iommu, struct pci_dev *pdev,
+struct iommu *iommu, const struct pci_dev *pdev,
 struct msi_desc *msi_desc, struct msi_msg *msg)
 {
 struct iremap_entry *iremap_entry = NULL, *iremap_entries;
-struct iremap_entry new_ire;
 struct msi_msg_remap_entry *remap_rte;
 unsigned in

[Xen-devel] [PATCH v9 8/8] VT-d: Add copy_irte_{to, from}_irt for updating irte

2017-02-27 Thread Chao Gao
We used structure assignment to update irte which was not safe when a
interrupt happend during update. It is better to update IRTE atomically
through cmpxchg16b(). When cmpxchg16b is not supported, two 64-bit write
operation can atomically update IRTE when only one the high dword or
low dword is intented to be changed.

Signed-off-by: Chao Gao 
---
v9:
- Newly added.

 xen/drivers/passthrough/vtd/intremap.c | 58 --
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index 737b886..649bebd 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -169,6 +169,37 @@ bool_t __init iommu_supports_eim(void)
 return 1;
 }
 
+static inline void copy_irte_from_irt(struct iremap_entry *dst,
+  struct iremap_entry *src)
+{
+*dst = *src;
+}
+
+static void copy_irte_to_irt(struct iremap_entry *dst, struct iremap_entry 
*src)
+{
+if ( cpu_has_cx16 )
+{
+__uint128_t ret;
+struct iremap_entry old_ire;
+
+copy_irte_from_irt(&old_ire, dst);
+ret = cmpxchg16b(dst, &old_ire, src);
+
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validate it.
+ */
+ASSERT(ret == old_ire.val);
+}
+else
+{
+dst->lo = src->lo;
+dst->hi = src->hi;
+}
+}
+
 /* Mark specified intr remap entry as free */
 static void free_remap_entry(struct iommu *iommu, int index)
 {
@@ -310,7 +341,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
  iremap_entries, iremap_entry);
 
-new_ire = *iremap_entry;
+copy_irte_from_irt(&new_ire, iremap_entry);
 
 if ( rte_upper )
 {
@@ -353,7 +384,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
 remap_rte->format = 1;/* indicate remap format */
 }
 
-*iremap_entry = new_ire;
+copy_irte_to_irt(iremap_entry, &new_ire);
 iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
 
@@ -617,28 +648,7 @@ static int update_irte_for_msi_common(
 
 if ( iremap_entry->val != new_ire.val )
 {
-if ( cpu_has_cx16 )
-{
-__uint128_t ret;
-struct iremap_entry old_ire;
-
-old_ire = *iremap_entry;
-ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
-
-/*
- * In the above, we use cmpxchg16 to atomically update the 128-bit
- * IRTE, and the hardware cannot update the IRTE behind us, so
- * the return value of cmpxchg16 should be the same as old_ire.
- * This ASSERT validate it.
- */
-ASSERT(ret == old_ire.val);
-}
-else
-{
-iremap_entry->lo = new_ire.lo;
-iremap_entry->hi = new_ire.hi;
-}
-
+copy_irte_to_irt(iremap_entry, &new_ire);
 iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
 }
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Michal Hocko
From: Michal Hocko 

This knob has been added by 31bc3858ea3e ("memory-hotplug: add automatic
onlining policy for the newly added memory") mainly to cover memory
hotplug based balooning solutions currently implemented for HyperV
and Xen. Both of them want to online the memory as soon after
registering as possible otherwise they can register too much memory
which cannot be used and trigger the oom killer (we need ~1.5% of the
registered memory so a large increase can consume all the available
memory). hv_mem_hot_add even waits for the userspace to online the
memory if the auto onlining is disabled to mitigate that problem.

Adding yet another knob and a config option just doesn't make much sense
IMHO. How is a random user supposed to know when to enable this option?
Ballooning drivers know much better that they want to do an immediate
online rather than waiting for the userspace to do that. If the memory
is onlined for a different purpose then we already have a notification
for the userspace and udev can handle the onlining. So the knob as well
as the config option for the default behavior just doesn't make any
sense. Let's remove them and allow user of add_memory to request the
online status explicitly. Not only it makes more sense it also removes a
lot of clutter.

Signed-off-by: Michal Hocko 
---

Hi,
I am sending this as an RFC because this is a user visible change. Maybe
we won't be able to remove the sysfs knob which would be sad, especially
when it has been added without a wider discussion and IMHO it is just
wrong. Is there any reason why a kernel command line parameter wouldn't
work just fine?

Even in that case I believe that we should remove
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE knob. It just adds to an already
messy config space. Does anybody depend on the policy during the early
boot before the userspace can set the sysfs knob? Or why those users cannot
simply use the kernel command line parameter.

I also believe that the wait-for-userspace in hyperV should just die. It
should do the unconditional onlining. Same as Xen. I do not see any
reason why those should depend on the userspace. This should be just
fixed regardless of the sysfs/config part. I can separate this out of course.

Thoughts/Concerns?

 drivers/acpi/acpi_memhotplug.c |  2 +-
 drivers/base/memory.c  | 33 +
 drivers/hv/hv_balloon.c| 26 +-
 drivers/s390/char/sclp_cmd.c   |  2 +-
 drivers/xen/balloon.c  |  2 +-
 include/linux/memory_hotplug.h |  4 +---
 mm/Kconfig | 16 
 mm/memory_hotplug.c| 22 ++
 8 files changed, 8 insertions(+), 99 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..2b1c35fb36d1 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = add_memory(node, info->start_addr, info->length);
+   result = add_memory(node, info->start_addr, info->length, 
false);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index fa26ffd25fa6..476c2c02f938 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -446,37 +446,6 @@ print_block_size(struct device *dev, struct 
device_attribute *attr,
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
 
 /*
- * Memory auto online policy.
- */
-
-static ssize_t
-show_auto_online_blocks(struct device *dev, struct device_attribute *attr,
-   char *buf)
-{
-   if (memhp_auto_online)
-   return sprintf(buf, "online\n");
-   else
-   return sprintf(buf, "offline\n");
-}
-
-static ssize_t
-store_auto_online_blocks(struct device *dev, struct device_attribute *attr,
-const char *buf, size_t count)
-{
-   if (sysfs_streq(buf, "online"))
-   memhp_auto_online = true;
-   else if (sysfs_streq(buf, "offline"))
-   memhp_auto_online = false;
-   else
-   return -EINVAL;
-
-   return count;
-}
-
-static DEVICE_ATTR(auto_online_blocks, 0644, show_auto_online_blocks,
-  store_auto_online_blocks);
-
-/*
  * Some architectures will have custom drivers to do this, and
  * will not need to do it from userspace.  The fake hot-add code
  * as well as ppc64 will do all of their discovery in userspace
@@ -500,7 +469,7 @@ memory_probe_store(struct device *dev, struct 
device_attribute *attr,
 
nid = memory_add_physaddr_to_nid(phys_addr);
ret = add_memory(nid, phys_addr,
-MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+ 

Re: [Xen-devel] [PATCH] x86/paging: Package up the log dirty function pointers

2017-02-27 Thread George Dunlap
On 16/02/17 17:13, Andrew Cooper wrote:
> They depend soley on paging mode, so don't need to be repeated per domain, and
> can live in .rodata.  While making this change, drop the redundant log_dirty
> from the function pointer names.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: George Dunlap 

Sorry for the delay.



> ---
> CC: Jan Beulich 
> CC: Tim Deegan 
> CC: George Dunlap 
> ---
>  xen/arch/x86/mm/hap/hap.c   | 10 +++---
>  xen/arch/x86/mm/paging.c| 23 ---
>  xen/arch/x86/mm/shadow/common.c |  9 +++--
>  xen/include/asm-x86/domain.h|  8 +---
>  xen/include/asm-x86/paging.h|  6 +-
>  5 files changed, 28 insertions(+), 28 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index b5870bf..e7bad69 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -444,14 +444,18 @@ static void hap_destroy_monitor_table(struct vcpu* v, 
> mfn_t mmfn)
>  //
>  void hap_domain_init(struct domain *d)
>  {
> +static const struct log_dirty_ops hap_ops = {
> +.enable  = hap_enable_log_dirty,
> +.disable = hap_disable_log_dirty,
> +.clean   = hap_clean_dirty_bitmap,
> +};
> +
>  INIT_PAGE_LIST_HEAD(&d->arch.paging.hap.freelist);
>  
>  d->arch.paging.gfn_bits = hap_paddr_bits - PAGE_SHIFT;
>  
>  /* Use HAP logdirty mechanism. */
> -paging_log_dirty_init(d, hap_enable_log_dirty,
> -  hap_disable_log_dirty,
> -  hap_clean_dirty_bitmap);
> +paging_log_dirty_init(d, &hap_ops);
>  }
>  
>  /* return 0 for success, -errno for failure */
> diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
> index d964ed5..58d7f13 100644
> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -234,7 +234,7 @@ int paging_log_dirty_enable(struct domain *d, bool_t 
> log_global)
>  return -EINVAL;
>  
>  domain_pause(d);
> -ret = d->arch.paging.log_dirty.enable_log_dirty(d, log_global);
> +ret = d->arch.paging.log_dirty.ops->enable(d, log_global);
>  domain_unpause(d);
>  
>  return ret;
> @@ -250,7 +250,7 @@ static int paging_log_dirty_disable(struct domain *d, 
> bool_t resuming)
>  /* Safe because the domain is paused. */
>  if ( paging_mode_log_dirty(d) )
>  {
> -ret = d->arch.paging.log_dirty.disable_log_dirty(d);
> +ret = d->arch.paging.log_dirty.ops->disable(d);
>  ASSERT(ret <= 0);
>  }
>  }
> @@ -572,7 +572,7 @@ static int paging_log_dirty_op(struct domain *d,
>  {
>  /* We need to further call clean_dirty_bitmap() functions of specific
>   * paging modes (shadow or hap).  Safe because the domain is paused. 
> */
> -d->arch.paging.log_dirty.clean_dirty_bitmap(d);
> +d->arch.paging.log_dirty.ops->clean(d);
>  }
>  domain_unpause(d);
>  return rv;
> @@ -624,22 +624,15 @@ void paging_log_dirty_range(struct domain *d,
>  flush_tlb_mask(d->domain_dirty_cpumask);
>  }
>  
> -/* Note that this function takes three function pointers. Callers must supply
> - * these functions for log dirty code to call. This function usually is
> - * invoked when paging is enabled. Check shadow_enable() and hap_enable() for
> - * reference.
> +/* Callers must supply log_dirty_ops for the log dirty code to call. This
> + * function usually is invoked when paging is enabled. Check shadow_enable()
> + * and hap_enable() for reference.
>   *
>   * These function pointers must not be followed with the log-dirty lock held.
>   */
> -void paging_log_dirty_init(struct domain *d,
> -   int(*enable_log_dirty)(struct domain *d,
> -  bool_t log_global),
> -   int(*disable_log_dirty)(struct domain *d),
> -   void   (*clean_dirty_bitmap)(struct domain *d))
> +void paging_log_dirty_init(struct domain *d, const struct log_dirty_ops *ops)
>  {
> -d->arch.paging.log_dirty.enable_log_dirty = enable_log_dirty;
> -d->arch.paging.log_dirty.disable_log_dirty = disable_log_dirty;
> -d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
> +d->arch.paging.log_dirty.ops = ops;
>  }
>  
>  //
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 1c9d9b9..a18aa25 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -48,6 +48,12 @@ static void sh_clean_dirty_bitmap(struct domain *);
>   * Called for every domain from arch_domain_create() */
>  int shadow_domain_init(struct domain *d, unsigned int domcr_flags)
>  {
> +static const struct log_dirty_ops sh_ops = {
> +.enable  = sh_enable_log_dirty,
> +.disable = sh_disable_log_dirty,
> +.clean   = sh_clea

Re: [Xen-devel] [PATCH] x86/mm: Swap mfn_valid() to use mfn_t

2017-02-27 Thread George Dunlap
On 16/02/17 20:07, Andrew Cooper wrote:
> Replace one opencoded mfn_eq() and some coding style issues on altered lines.
> Swap __mfn_valid() to being bool, although it can't be updated to take mfn_t
> because of include dependencies.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: George Dunlap 

> ---
> CC: Jan Beulich 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> CC: Tim Deegan 
> CC: George Dunlap 
> ---
>  xen/arch/arm/mem_access.c   |  2 +-
>  xen/arch/arm/mm.c   |  2 +-
>  xen/arch/arm/p2m.c  |  6 +++---
>  xen/arch/arm/setup.c|  3 ++-
>  xen/arch/x86/cpu/mcheck/mce.c   |  2 +-
>  xen/arch/x86/cpu/mcheck/vmce.c  |  2 +-
>  xen/arch/x86/cpu/vpmu.c |  2 +-
>  xen/arch/x86/debug.c|  2 +-
>  xen/arch/x86/domctl.c   |  2 +-
>  xen/arch/x86/hvm/mtrr.c |  2 +-
>  xen/arch/x86/mm.c   | 20 ++--
>  xen/arch/x86/mm/hap/guest_walk.c|  2 +-
>  xen/arch/x86/mm/hap/hap.c   |  2 --
>  xen/arch/x86/mm/hap/nested_hap.c|  2 --
>  xen/arch/x86/mm/mem_access.c|  4 ++--
>  xen/arch/x86/mm/mem_sharing.c   |  4 +---
>  xen/arch/x86/mm/p2m-ept.c   |  4 ++--
>  xen/arch/x86/mm/p2m-pod.c   |  2 --
>  xen/arch/x86/mm/p2m-pt.c|  2 --
>  xen/arch/x86/mm/p2m.c   |  2 --
>  xen/arch/x86/mm/paging.c|  2 --
>  xen/arch/x86/mm/shadow/private.h|  2 --
>  xen/arch/x86/tboot.c|  4 ++--
>  xen/arch/x86/x86_64/mm.c| 16 
>  xen/arch/x86/x86_64/traps.c | 14 +++---
>  xen/common/grant_table.c| 10 +-
>  xen/common/memory.c |  8 
>  xen/common/page_alloc.c | 12 ++--
>  xen/common/pdx.c|  2 +-
>  xen/drivers/passthrough/amd/iommu_guest.c   | 10 +-
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  2 +-
>  xen/drivers/passthrough/vtd/dmar.c  |  2 +-
>  xen/drivers/passthrough/vtd/x86/vtd.c   |  2 +-
>  xen/include/asm-arm/mm.h|  4 ++--
>  xen/include/asm-arm/p2m.h   |  2 +-
>  xen/include/asm-x86/p2m.h   |  2 +-
>  xen/include/asm-x86/page.h  |  2 +-
>  xen/include/xen/pdx.h   |  2 +-
>  xen/include/xen/tmem_xen.h  |  2 +-
>  39 files changed, 77 insertions(+), 92 deletions(-)
> 
> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
> index 03b20c4..04b1506 100644
> --- a/xen/arch/arm/mem_access.c
> +++ b/xen/arch/arm/mem_access.c
> @@ -172,7 +172,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned 
> long flag,
>  if ( mfn_eq(mfn, INVALID_MFN) )
>  goto err;
>  
> -if ( !mfn_valid(mfn_x(mfn)) )
> +if ( !mfn_valid(mfn) )
>  goto err;
>  
>  /*
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 2d96423..f0a2edd 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1350,7 +1350,7 @@ int replace_grant_host_mapping(unsigned long addr, 
> unsigned long mfn,
>  
>  bool is_iomem_page(mfn_t mfn)
>  {
> -return !mfn_valid(mfn_x(mfn));
> +return !mfn_valid(mfn);
>  }
>  
>  void clear_and_clean_page(struct page_info *page)
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 5e8f6cd..e36d075 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -648,7 +648,7 @@ static void p2m_put_l3_page(const lpae_t pte)
>  {
>  unsigned long mfn = pte.p2m.base;
>  
> -ASSERT(mfn_valid(mfn));
> +ASSERT(mfn_valid(_mfn(mfn)));
>  put_page(mfn_to_page(mfn));
>  }
>  }
> @@ -695,7 +695,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>  p2m_flush_tlb_sync(p2m);
>  
>  mfn = _mfn(entry.p2m.base);
> -ASSERT(mfn_valid(mfn_x(mfn)));
> +ASSERT(mfn_valid(mfn));
>  
>  free_domheap_page(mfn_to_page(mfn_x(mfn)));
>  }
> @@ -1412,7 +1412,7 @@ struct page_info *get_page_from_gva(struct vcpu *v, 
> vaddr_t va,
>  if ( rc )
>  goto err;
>  
> -if ( !mfn_valid(maddr >> PAGE_SHIFT) )
> +if ( !mfn_valid(_mfn(maddr >> PAGE_SHIFT)) )
>  goto err;
>  
>  page = mfn_to_page(maddr >> PAGE_SHIFT);
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 2bf4363..b25ad80 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -268,7 +268,8 @@ void __init discard_initial_modules(void)
>  if ( mi->module[i].kind == BOOTMOD_XEN )
>  continue;
>  
> -if ( !mfn_valid(paddr_to_pfn(s)) || !mfn_valid(paddr_to_pfn(e)))
> +if ( !mfn_valid(_mfn(paddr_to_pfn(s))) ||
> + !mfn_valid(_mfn(

Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Vitaly Kuznetsov
Michal Hocko  writes:

> From: Michal Hocko 
>
> This knob has been added by 31bc3858ea3e ("memory-hotplug: add automatic
> onlining policy for the newly added memory") mainly to cover memory
> hotplug based balooning solutions currently implemented for HyperV
> and Xen. Both of them want to online the memory as soon after
> registering as possible otherwise they can register too much memory
> which cannot be used and trigger the oom killer (we need ~1.5% of the
> registered memory so a large increase can consume all the available
> memory). hv_mem_hot_add even waits for the userspace to online the
> memory if the auto onlining is disabled to mitigate that problem.
>
> Adding yet another knob and a config option just doesn't make much sense
> IMHO. How is a random user supposed to know when to enable this option?
> Ballooning drivers know much better that they want to do an immediate
> online rather than waiting for the userspace to do that. If the memory
> is onlined for a different purpose then we already have a notification
> for the userspace and udev can handle the onlining. So the knob as well
> as the config option for the default behavior just doesn't make any
> sense. Let's remove them and allow user of add_memory to request the
> online status explicitly. Not only it makes more sense it also removes a
> lot of clutter.
>
> Signed-off-by: Michal Hocko 
> ---
>
> Hi,
> I am sending this as an RFC because this is a user visible change. Maybe
> we won't be able to remove the sysfs knob which would be sad, especially
> when it has been added without a wider discussion and IMHO it is just
> wrong. Is there any reason why a kernel command line parameter wouldn't
> work just fine?
>
> Even in that case I believe that we should remove
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE knob. It just adds to an already
> messy config space. Does anybody depend on the policy during the early
> boot before the userspace can set the sysfs knob? Or why those users cannot
> simply use the kernel command line parameter.
>
> I also believe that the wait-for-userspace in hyperV should just die. It
> should do the unconditional onlining. Same as Xen. I do not see any
> reason why those should depend on the userspace. This should be just
> fixed regardless of the sysfs/config part. I can separate this out of course.
>
> Thoughts/Concerns?

I don't have anything new to add to the discussion happened last week
but I'd like to summarize my arguments against this change:

1) This patch doesn't solve any issue. Configuration option is not an
issue by itself, it is an option for distros to decide what they want to
ship: udev rule with known issues (legacy mode) or enable the new
option. Distro makers and users building their kernels should be able to
answer this simple question "do you want to automatically online all
newly added memory or not". There are distros already which ship kernels
with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE enabled (Fedora 24 and 25 as
far as I remember, maybe someone else).

2) This patch creates an imbalance between Xen/Hyper-V on one side and
KVM/Vmware on another. KVM/Vmware use pure ACPI memory hotplug and this
memory won't get onlined. I don't understand how this problem is
supposed to be solved by distros. They'll *have to* continue shipping
a udev rule which has and always will have issues.

3) Kernel command line is not a viable choice, it is rather a debug
method. Having all newly added memory online as soon as possible is a
major use-case not something a couple of users wants (and this is
proved by major distros shipping the unconditional 'offline->online'
rule with udev).

A couple of other thoughts:
1) Having all newly added memory online ASAP is probably what people
want for all virtual machines. Unfortunately, we have additional
complexity with memory zones (ZONE_NORMAL, ZONE_MOVABLE) and in some
cases manual intervention is required. Especially, when further unplug
is expected.

2) Adding new memory can (in some extreme cases) still fail as we need
some *other* memory before we're able to online the newly added
block. This is an issue to be solved and it is doable (IMO) with some
pre-allocation.

I'd also like to notice that this patch doesn't re-introduce the issue I
was fixing with in-kernel memory onlining as all memory added through
the Hyper-V driver will be auto-onlined unconditionally. What I disagree
with here is taking away choice without fixing any real world issues.

[snip]

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] x86: fix memory leak in pvh_setup_acpi_madt

2017-02-27 Thread Roger Pau Monné
On Sun, Feb 26, 2017 at 03:49:31PM +, Wei Liu wrote:
> Switch to use goto style error handling to avoid leaking madt.

Thanks, I guess it's fine to don't leak memory here, but in any case this is
doomed to fail, no further progress is going to be made if Dom0 build fails,
hence that's why memory leaks here are not really meaningful.

> Coverity-ID: 1401534
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Roger Pau Monné 

> ---
>  xen/arch/x86/domain_build.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> index 0534838c8e..1f18f9283f 100644
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -2290,7 +2290,8 @@ static int __init pvh_setup_acpi_madt(struct domain *d, 
> paddr_t *addr)
>  if ( !madt )
>  {
>  printk("Unable to allocate memory for MADT table\n");
> -return -ENOMEM;
> +rc = -ENOMEM;
> +goto out;

This one doesn't need the goto, there's nothing leaked here (although there's
also no problem in doing in that way).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] x86: fix memory leak in pvh_setup_acpi_xsdt

2017-02-27 Thread Roger Pau Monné
On Sun, Feb 26, 2017 at 03:49:32PM +, Wei Liu wrote:
> Switch to use goto style error handling to avoid leaking xsdt.
> 
> Coverity-ID: 1401535
> 
> Signed-off-by: Wei Liu 

Thanks! Same comments as the ones in the previous patch, and the first goto is
not really needed.

Reviewed-by: Roger Pau Monné 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 04/28] ARM: GICv3 ITS: allocate device and collection table

2017-02-27 Thread Andre Przywara
Hi Shanker,

thanks for having a look.

On 24/02/17 19:29, Shanker Donthineni wrote:
> Hi Andre
>
>
> On 02/16/2017 01:03 PM, Shanker Donthineni wrote:
>> Hi Andre,
>>
>>
>> On 01/30/2017 12:31 PM, Andre Przywara wrote:
>>> Each ITS maps a pair of a DeviceID (usually the PCI b/d/f triplet) and
>>> an EventID (the MSI payload or interrupt ID) to a pair of LPI number
>>> and collection ID, which points to the target CPU.
>>> This mapping is stored in the device and collection tables, which
>>> software
>>> has to provide for the ITS to use.
>>> Allocate the required memory and hand it the ITS.
>>> The maximum number of devices is limited to a compile-time constant
>>> exposed in Kconfig.
>>>
>>> Signed-off-by: Andre Przywara 
>>> ---
>>>   xen/arch/arm/Kconfig |  14 +
>>>   xen/arch/arm/gic-v3-its.c| 129
>>> +++
>>>   xen/arch/arm/gic-v3.c|   5 ++
>>>   xen/include/asm-arm/gic_v3_its.h |  55 -
>>>   4 files changed, 202 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>>> index 71734a1..81bc233 100644
>>> --- a/xen/arch/arm/Kconfig
>>> +++ b/xen/arch/arm/Kconfig
>>> @@ -64,6 +64,20 @@ config MAX_PHYS_LPI_BITS
>>> This can be overriden on the command line with the
>>> max_lpi_bits
>>> parameter.
>>>
>>> +config MAX_PHYS_ITS_DEVICE_BITS
>>> +depends on HAS_ITS
>>> +int "Number of device bits the ITS supports"
>>> +range 1 32
>>> +default "10"
>>> +help
>>> +  Specifies the maximum number of devices which want to use the
>>> ITS.
>>> +  Xen needs to allocates memory for the whole range very early.
>>> +  The allocation scheme may be sparse, so a much larger
>>> number must
>>> +  be supported to cover devices with a high bus number or
>>> those on
>>> +  separate bus segments.
>>> +  This can be overriden on the command line with the
>>> max_its_device_bits
>>> +  parameter.
>>> +
>>>   endmenu
>>>
>>>   menu "ARM errata workaround via the alternative framework"
>>> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
>>> index ff0f571..c31fef6 100644
>>> --- a/xen/arch/arm/gic-v3-its.c
>>> +++ b/xen/arch/arm/gic-v3-its.c
>>> @@ -20,9 +20,138 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>> +#include 
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>> +
>>> +#define BASER_ATTR_MASK   \
>>> +((0x3UL << GITS_BASER_SHAREABILITY_SHIFT)   | \
>>> + (0x7UL << GITS_BASER_OUTER_CACHEABILITY_SHIFT) | \
>>> + (0x7UL << GITS_BASER_INNER_CACHEABILITY_SHIFT))
>>> +#define BASER_RO_MASK   (GENMASK(58, 56) | GENMASK(52, 48))
>>> +
>>> +static uint64_t encode_phys_addr(paddr_t addr, int page_bits)
>>> +{
>>> +uint64_t ret;
>>> +
>>> +if ( page_bits < 16 )
>>> +return (uint64_t)addr & GENMASK(47, page_bits);
>>> +
>>> +ret = addr & GENMASK(47, 16);
>>> +return ret | (addr & GENMASK(51, 48)) >> (48 - 12);
>>> +}
>>> +
>>> +#define PAGE_BITS(sz) ((sz) * 2 + PAGE_SHIFT)
>>> +
>>> +static int its_map_baser(void __iomem *basereg, uint64_t regc, int
>>> nr_items)
>>> +{
>>> +uint64_t attr, reg;
>>> +int entry_size = ((regc >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f)
>>> + 1;
>>> +int pagesz = 0, order, table_size;
>
> Please try ITS page sizes in the order 64K, 16K and 4K to cover more ITS
> devices using a flat table. Similar to Linux ITS driver.
>
>>> +void *buffer = NULL;
>>> +
>>> +attr  = GIC_BASER_InnerShareable << GITS_BASER_SHAREABILITY_SHIFT;
>>> +attr |= GIC_BASER_CACHE_SameAsInner <<
>>> GITS_BASER_OUTER_CACHEABILITY_SHIFT;
>>> +attr |= GIC_BASER_CACHE_RaWaWb <<
>>> GITS_BASER_INNER_CACHEABILITY_SHIFT;
>>> +
>>> +/*
>>> + * Setup the BASE register with the attributes that we like.
>>> Then read
>>> + * it back and see what sticks (page size, cacheability and
>>> shareability
>>> + * attributes), retrying if necessary.
>>> + */
>>> +while ( 1 )
>>> +{
>>> +table_size = ROUNDUP(nr_items * entry_size,
>>> BIT(PAGE_BITS(pagesz)));
>>> +order = get_order_from_bytes(table_size);
>>> +
>
> Limit to 256 ITS pages, ITS spec doesn't support more than 256 ITS pages.
>
>/* Maximum of 256 ITS pages are allowed */
>if ( (table_size >> PAGE_BITS(pagesz)) > GITS_BASER_PAGES_MAX )
>table_size = BIT(PAGE_BITS(pagesz)) * GITS_BASER_PAGES_MAX;
>
>>> +if ( !buffer )
>>> +buffer = alloc_xenheap_pages(order, 0);
>>> +if ( !buffer )
>>> +return -ENOMEM;
>>> +
>
> Please zero memory memset(buffer, 0x00, order << PAGE_SHIFT)

All three comments make sense, thanks for pointing these out.
I will incorporate those changes into the next post.

Cheers,
Andre.
IMPORTANT NOTICE: The contents of this email and any 

Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Michal Hocko
On Mon 27-02-17 11:02:09, Vitaly Kuznetsov wrote:
[...]
> I don't have anything new to add to the discussion happened last week
> but I'd like to summarize my arguments against this change:
> 
> 1) This patch doesn't solve any issue. Configuration option is not an
> issue by itself, it is an option for distros to decide what they want to
> ship: udev rule with known issues (legacy mode) or enable the new
> option. Distro makers and users building their kernels should be able to
> answer this simple question "do you want to automatically online all
> newly added memory or not".

OK, so could you be more specific? Distributions have no clue about
which HW their kernel runs on so how can they possibly make a sensible
decision here?

> There are distros already which ship kernels
> with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE enabled (Fedora 24 and 25 as
> far as I remember, maybe someone else).
> 
> 2) This patch creates an imbalance between Xen/Hyper-V on one side and
> KVM/Vmware on another. KVM/Vmware use pure ACPI memory hotplug and this
> memory won't get onlined. I don't understand how this problem is
> supposed to be solved by distros. They'll *have to* continue shipping
> a udev rule which has and always will have issues.

They have notifications for udev to online that memory and AFAICU
neither KVM nor VMware are using memory hotplut for ballooning - unlike
HyperV and Xen.

> 3) Kernel command line is not a viable choice, it is rather a debug
> method.

Why?

> Having all newly added memory online as soon as possible is a
> major use-case not something a couple of users wants (and this is
> proved by major distros shipping the unconditional 'offline->online'
> rule with udev).

I would argue because this really depends on the usecase. a) somebody
might want to online memory as movable and that really depends on which
node we are talking about because not all of them can be movable b) it
is easier to handle potential errors from userspace than the kernel.

> A couple of other thoughts:
> 1) Having all newly added memory online ASAP is probably what people
> want for all virtual machines. Unfortunately, we have additional
> complexity with memory zones (ZONE_NORMAL, ZONE_MOVABLE) and in some
> cases manual intervention is required. Especially, when further unplug
> is expected.

and that is why we do not want to hardwire this into the kernel and we
have a notification to handle this in userspace.

> 2) Adding new memory can (in some extreme cases) still fail as we need
> some *other* memory before we're able to online the newly added
> block. This is an issue to be solved and it is doable (IMO) with some
> pre-allocation.

you cannot preallocate for all the possible memory that can be added.
-- 
Michal Hocko
SUSE Labs

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Error in Xen while booting for salvator-X (M3 Board)

2017-02-27 Thread George John
Hi,
Thanks for the reply,
I am using Linux version 4.6.
The memory nodes were already squashed. When I have used a different
version of Xen, it booted to dom0. but still the crash occurs as shown in
the log below.

I have also noticed that for salvator x M3 board(r8a7796) the dtb file used
was r8a7795-salvator-x-dom0.dtb
Is it ok?

regards,
George

On Fri, Feb 24, 2017 at 8:43 PM, Oleksandr Tyshchenko 
wrote:

> Hi,
>
> Not 100% sure, but anyway...
>
> Can you recheck after squashing all memory nodes to a single one.
>
> ---
> I guess, you have following in your device tree:
>
> memory@4800 {
> device_type = "memory";
> /* first 128MB is reserved for secure area. */
> reg = <0x0 0x4800 0x0 0x3800>;
> };
>
> memory@5 {
> device_type = "memory";
> reg = <0x5 0x 0x0 0x4000>;
> };
>
> memory@6 {
> device_type = "memory";
> reg = <0x6 0x 0x0 0x4000>;
> };
>
> memory@7 {
> device_type = "memory";
> reg = <0x7 0x 0x0 0x4000>;
> };
>
> ---
> Try to make next:
>
> memory@4800 {
> device_type = "memory";
> /* first 128MB is reserved for secure area. */
> reg = <0x0 0x4800 0x0 0x3800>,
>  <0x5 0x 0x0 0x4000>,
>  <0x6 0x 0x0 0x4000>,
>  <0x7 0x 0x0 0x4000>;
> };
>
>
>
> On Fri, Feb 24, 2017 at 4:53 PM, Julien Grall 
> wrote:
> >
> >
> > On 21/02/17 12:03, George John wrote:
> >>
> >> Hi,
> >
> >
> > Hello,
> >
> >
> >> I was trying out xen in salvator-X(M3 Board as described
> >> in
> >> https://wiki.xenproject.org/wiki/Xen_ARM_with_
> Virtualization_Extensions/Salvator-X
> >>
> >> I ran in to following error:
> >>
> >>
> >> U-Boot 2015.04 (Feb 21 2017 - 14:24:48)
> >>
> >> CPU: Renesas Electronics R8A7796 rev 1.0
> >> Board: Salvator-X
> >> I2C:   ready
> >> DRAM:  3.9 GiB
> >> MMC:   sh-sdhi: 0, sh-sdhi: 1, sh-sdhi: 2
> >> In:serial
> >> Out:   serial
> >> Err:   serial
> >> Net:   Board Net Initialization Failed
> >> No ethernet found.
> >> Hit any key to stop autoboot:  0
> >> 819584 bytes read in 89 ms (8.8 MiB/s)
> >> 64927 bytes read in 23 ms (2.7 MiB/s)
> >> 14038016 bytes read in 1188 ms (11.3 MiB/s)
> >> 10319 bytes read in 19 ms (530.3 KiB/s)
> >> ## Booting kernel from Legacy Image at 4808 ...
> >>Image Name:   XEN
> >>Image Type:   AArch64 Linux Kernel Image (uncompressed)
> >>Data Size:819520 Bytes = 800.3 KiB
> >>Load Address: 7808
> >>Entry Point:  7808
> >>Verifying Checksum ... OK
> >> ## Flattened Device Tree blob at 4800
> >>Booting using the fdt blob at 0x4800
> >>Loading Kernel Image ... OK
> >>Using Device Tree in place at 4800, end 48012d9e
> >>
> >> Starting kernel ...
> >>
> >> - UART enabled -
> >> - CPU  booting -
> >> - Current EL 0008 -
> >> - Xen starting at EL2 -
> >> - Zero BSS -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) Checking for initrd in /chosen
> >> (XEN) RAM: 4800 - 7fff
> >> (XEN) RAM: 0005 - 00053fff
> >> (XEN) RAM: 0006 - 00063fff
> >> (XEN) RAM: 0007 - 00073fff
> >> (XEN)
> >> (XEN) MODULE[0]: 4800 - 4801 Device Tree
> >> (XEN) MODULE[1]: 7a00 - 7c00 Kernel
> >> (XEN) MODULE[2]: 7c00 - 7c01 XSM
> >> (XEN)  RESVD[0]: 4800 - 4801
> >> (XEN)
> >> (XEN) Command line: dom0_mem=512M console=dtuart dtuart=serial0
> >> dom0_max_vcpus=1 bootscrub=0 flask_enforcing=1
> >> (XEN) Placing Xen at 0x7fe0-0x8000
> >> (XEN) Update BOOTMOD_XEN from 7808-78196e01 =>
> >> 7fe0-7ff16e01
> >
> >
> > Which kernel version is it?
> >
> >>
> >>
> >>
> >>
> >>
> >>
> >> After this, it hangs. What could be the possible reason?
> >
> >
> > Xen will initialize the heap and then continue into the boot. I would add
> > more debug around setup_mm to see where it failed.
> >
> > Regards,
> >
> > --
> > Julien Grall
> >
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > https://lists.xen.org/xen-devel
>
>
>
> --
> Regards,
>
> Oleksandr Tyshchenko
>


log_27_feb2017
Description: Binary data
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [qemu-mainline test] 106182: regressions - FAIL

2017-02-27 Thread osstest service owner
flight 106182 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106182/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl   6 xen-boot fail REGR. vs. 106141
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs. 106141

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 106141
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 106141
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 106141
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 106141
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 106141
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 106141

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-rtds  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 build-arm64   5 xen-buildfail   never pass
 build-arm64-xsm   5 xen-buildfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 build-arm64-pvops 5 kernel-build fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuu685783c5b69c83c942d1fc21679311eeb8f79ab9
baseline version:
 qemuu28f997a82cb509bf4775d4006b368e1bde8b7bdd

Last test of basis   106141  2017-02-26 08:00:26 Z1 days
Failing since106163  2017-02-26 17:45:03 Z0 days2 attempts
Testing same since   106182  2017-02-27 04:18:05 Z0 days1 attempts


People who touched revisions under test:
  Jeff Cody 
  John Snow 
  Kevin Wolf 
  Nir Soffer 
  Paul Burton 
  Peter Lieven 
  Peter Maydell 
  Samuel Thibault 
  tianqing 
  Yongbok Kim 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  fail
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd6

Re: [Xen-devel] Error in Xen while booting for salvator-X (M3 Board)

2017-02-27 Thread Oleksandr Tyshchenko
Hi.

On Mon, Feb 27, 2017 at 12:29 PM, George John  wrote:
> Hi,
> Thanks for the reply,
> I am using Linux version 4.6.
> The memory nodes were already squashed. When I have used a different version
> of Xen, it booted to dom0. but still the crash occurs as shown in the log
> below.
>
> I have also noticed that for salvator x M3 board(r8a7796) the dtb file used
> was r8a7795-salvator-x-dom0.dtb
> Is it ok?
I don't know about M3 board.
CC my colleague who plays with M3 board. Hope, that he can shed some lights.

>
> regards,
> George
>
> On Fri, Feb 24, 2017 at 8:43 PM, Oleksandr Tyshchenko 
> wrote:
>>
>> Hi,
>>
>> Not 100% sure, but anyway...
>>
>> Can you recheck after squashing all memory nodes to a single one.
>>
>> ---
>> I guess, you have following in your device tree:
>>
>> memory@4800 {
>> device_type = "memory";
>> /* first 128MB is reserved for secure area. */
>> reg = <0x0 0x4800 0x0 0x3800>;
>> };
>>
>> memory@5 {
>> device_type = "memory";
>> reg = <0x5 0x 0x0 0x4000>;
>> };
>>
>> memory@6 {
>> device_type = "memory";
>> reg = <0x6 0x 0x0 0x4000>;
>> };
>>
>> memory@7 {
>> device_type = "memory";
>> reg = <0x7 0x 0x0 0x4000>;
>> };
>>
>> ---
>> Try to make next:
>>
>> memory@4800 {
>> device_type = "memory";
>> /* first 128MB is reserved for secure area. */
>> reg = <0x0 0x4800 0x0 0x3800>,
>>  <0x5 0x 0x0 0x4000>,
>>  <0x6 0x 0x0 0x4000>,
>>  <0x7 0x 0x0 0x4000>;
>> };
>>
>>
>>
>> On Fri, Feb 24, 2017 at 4:53 PM, Julien Grall 
>> wrote:
>> >
>> >
>> > On 21/02/17 12:03, George John wrote:
>> >>
>> >> Hi,
>> >
>> >
>> > Hello,
>> >
>> >
>> >> I was trying out xen in salvator-X(M3 Board as described
>> >> in
>> >>
>> >> https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X
>> >>
>> >> I ran in to following error:
>> >>
>> >>
>> >> U-Boot 2015.04 (Feb 21 2017 - 14:24:48)
>> >>
>> >> CPU: Renesas Electronics R8A7796 rev 1.0
>> >> Board: Salvator-X
>> >> I2C:   ready
>> >> DRAM:  3.9 GiB
>> >> MMC:   sh-sdhi: 0, sh-sdhi: 1, sh-sdhi: 2
>> >> In:serial
>> >> Out:   serial
>> >> Err:   serial
>> >> Net:   Board Net Initialization Failed
>> >> No ethernet found.
>> >> Hit any key to stop autoboot:  0
>> >> 819584 bytes read in 89 ms (8.8 MiB/s)
>> >> 64927 bytes read in 23 ms (2.7 MiB/s)
>> >> 14038016 bytes read in 1188 ms (11.3 MiB/s)
>> >> 10319 bytes read in 19 ms (530.3 KiB/s)
>> >> ## Booting kernel from Legacy Image at 4808 ...
>> >>Image Name:   XEN
>> >>Image Type:   AArch64 Linux Kernel Image (uncompressed)
>> >>Data Size:819520 Bytes = 800.3 KiB
>> >>Load Address: 7808
>> >>Entry Point:  7808
>> >>Verifying Checksum ... OK
>> >> ## Flattened Device Tree blob at 4800
>> >>Booting using the fdt blob at 0x4800
>> >>Loading Kernel Image ... OK
>> >>Using Device Tree in place at 4800, end 48012d9e
>> >>
>> >> Starting kernel ...
>> >>
>> >> - UART enabled -
>> >> - CPU  booting -
>> >> - Current EL 0008 -
>> >> - Xen starting at EL2 -
>> >> - Zero BSS -
>> >> - Setting up control registers -
>> >> - Turning on paging -
>> >> - Ready -
>> >> (XEN) Checking for initrd in /chosen
>> >> (XEN) RAM: 4800 - 7fff
>> >> (XEN) RAM: 0005 - 00053fff
>> >> (XEN) RAM: 0006 - 00063fff
>> >> (XEN) RAM: 0007 - 00073fff
>> >> (XEN)
>> >> (XEN) MODULE[0]: 4800 - 4801 Device Tree
>> >> (XEN) MODULE[1]: 7a00 - 7c00 Kernel
>> >> (XEN) MODULE[2]: 7c00 - 7c01 XSM
>> >> (XEN)  RESVD[0]: 4800 - 4801
>> >> (XEN)
>> >> (XEN) Command line: dom0_mem=512M console=dtuart dtuart=serial0
>> >> dom0_max_vcpus=1 bootscrub=0 flask_enforcing=1
>> >> (XEN) Placing Xen at 0x7fe0-0x8000
>> >> (XEN) Update BOOTMOD_XEN from 7808-78196e01 =>
>> >> 7fe0-7ff16e01
>> >
>> >
>> > Which kernel version is it?
>> >
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> After this, it hangs. What could be the possible reason?
>> >
>> >
>> > Xen will initialize the heap and then continue into the boot. I would
>> > add
>> > more debug around setup_mm to see where it failed.
>> >
>> > Regards,
>> >
>> > --
>> > Julien Grall
>> >
>> > ___
>> > Xen-devel mailing list
>> > Xen-devel@lists.xen.org
>> > https://lists.xen.org/xen-devel
>>
>>
>>
>> --
>> Regards,
>>
>> Oleksandr Tyshchenko
>
>



-- 
Regards,

Oleksandr Tyshchenko

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Vitaly Kuznetsov
Michal Hocko  writes:

> On Mon 27-02-17 11:02:09, Vitaly Kuznetsov wrote:
> [...]
>> I don't have anything new to add to the discussion happened last week
>> but I'd like to summarize my arguments against this change:
>> 
>> 1) This patch doesn't solve any issue. Configuration option is not an
>> issue by itself, it is an option for distros to decide what they want to
>> ship: udev rule with known issues (legacy mode) or enable the new
>> option. Distro makers and users building their kernels should be able to
>> answer this simple question "do you want to automatically online all
>> newly added memory or not".
>
> OK, so could you be more specific? Distributions have no clue about
> which HW their kernel runs on so how can they possibly make a sensible
> decision here?

They at least have an idea if they ship udev rule or not. I can also
imagine different choices for non-x86 architectures but I don't know
enough about them to have an opinion.

>
>> There are distros already which ship kernels
>> with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE enabled (Fedora 24 and 25 as
>> far as I remember, maybe someone else).
>> 
>> 2) This patch creates an imbalance between Xen/Hyper-V on one side and
>> KVM/Vmware on another. KVM/Vmware use pure ACPI memory hotplug and this
>> memory won't get onlined. I don't understand how this problem is
>> supposed to be solved by distros. They'll *have to* continue shipping
>> a udev rule which has and always will have issues.
>
> They have notifications for udev to online that memory and AFAICU
> neither KVM nor VMware are using memory hotplut for ballooning - unlike
> HyperV and Xen.
>

No, Hyper-V doesn't use memory hotplug for ballooning purposes. It is
just a memory hotplug. The fact that the code is located in hv_balloon
is just a coincidence.

The difference with real hardware is how the operation is performed:
with real hardware you need to take a DIMM, go to your server room, open
the box, insert DIMM, go back to your seat. Asking to do some manual
action to actually enable memory is kinda OK. The beauty of hypervisors
is that everything happens automatically (e.g. when the VM is running
out of memory).

>> 3) Kernel command line is not a viable choice, it is rather a debug
>> method.
>
> Why?
>

Because we usually have just a few things there (root=, console=) and
the rest is used when something goes wrong or for 'special' cases, not
for the majority of users.

>> Having all newly added memory online as soon as possible is a
>> major use-case not something a couple of users wants (and this is
>> proved by major distros shipping the unconditional 'offline->online'
>> rule with udev).
>
> I would argue because this really depends on the usecase. a) somebody
> might want to online memory as movable and that really depends on which
> node we are talking about because not all of them can be movable

This is possible and that's why I introduce kernel command line options
back then. To simplify, I argue that the major use-case is 'online ASAP,
never offline' and for other use-cases we have options, both for distros
(config) and for users (command-line)


> b) it
> is easier to handle potential errors from userspace than the kernel.
>

Yes, probably, but memory hotplug was around for quite some time and I
didn't see anything but the dump udev rule (offline->online) without any
handling. And I think that we should rather focus on fixing potential
issues and making failures less probable (e.g. it's really hard to come
up with something different from 'failed->retry').

>> A couple of other thoughts:
>> 1) Having all newly added memory online ASAP is probably what people
>> want for all virtual machines. Unfortunately, we have additional
>> complexity with memory zones (ZONE_NORMAL, ZONE_MOVABLE) and in some
>> cases manual intervention is required. Especially, when further unplug
>> is expected.
>
> and that is why we do not want to hardwire this into the kernel and we
> have a notification to handle this in userspace.

Yes and I don't know about any plans to remove this notification. In
case some really complex handling is required just don't turn on the
automatic onlining.

Memory hotplug in real x86 hardware is rare, memory hotplug for VMs is
ubiquitous.

>
>> 2) Adding new memory can (in some extreme cases) still fail as we need
>> some *other* memory before we're able to online the newly added
>> block. This is an issue to be solved and it is doable (IMO) with some
>> pre-allocation.
>
> you cannot preallocate for all the possible memory that can be added.

For all, no, but for 1 next block - yes, and then I'll preallocate for
the next one.

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Error in Xen while booting for salvator-X (M3 Board)

2017-02-27 Thread Andrii Anisov

Dear George,


Actually r8a7796 and r8a7795 are different chips with different 
peripherals, so you should adjust your r8a7796-xxx.dtb.


I'm not really sure what the problem is with XEN 4.6, but the crash 
point you are stepped in with 4.8 is commented as following


/*
 * Currently, to ensure hypervisor safety, when we received a
 * guest-generated vSerror/vAbort, we just crash the guest to protect
 * the hypervisor. In future we can better handle this by injecting
 * a vSerror/vAbort to the guest.
 */

So I guess you see the abort on access to a peripheral missing on 
r8a7796 but described in a device tree for r8a7795.


--

*Andrii Anisov*

*Lead Systems Engineer*

*Office: *+380 44 390 5457  *x* 66766 
*Cell: *+380 50 5738852 *Email: 
*andrii_ani...@epam.com 


*Kyiv**,* *Ukraine *(GMT+3)*epam.com *

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or 
entity(ies) to which it is addressed and contains information that is 
legally privileged and confidential. If you are not the intended 
recipient, or the person responsible for delivering the message to the 
intended recipient, you are hereby notified that any dissemination, 
distribution or copying of this communication is strictly prohibited. 
All unintended recipients are obliged to delete this message and destroy 
any printed copies.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86/apicv: enhance posted-interrupt processing

2017-02-27 Thread Xuquan (Quan Xu)
>From 6b5f702927d832513d270a2bca4634b271f4df47 Mon Sep 17 00:00:00 2001
From: Quan Xu 
Date: Tue, 28 Feb 2017 02:48:29 +0800
Subject: [PATCH v2] x86/apicv: enhance posted-interrupt processing

If guest is already in non-root mode, an posted interrupt will
be directly delivered to guest (leaving softirq being set without
actually incurring a VM-Exit - breaking desired softirq behavior).
Then further posted interrupts will skip the IPI, stay in PIR and
not noted until another VM-Exit happens.

When target vCPU is in non-root mode and running on remote CPU,
posted way is triggered to inject interrupt without VM-Exit.
Otherwise, set VCPU_KICK_SOFTIRQ bit to inject interrupt until
another VM-Entry as a best effort.

Signed-off-by: Quan Xu 
---
 xen/arch/x86/hvm/vmx/vmx.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 61925cf..ecdfbbb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1839,17 +1839,14 @@ static void vmx_process_isr(int isr, struct vcpu *v)

 static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
-bool_t running = v->is_running;
+unsigned int cpu = v->processor;

 vcpu_unblock(v);
-if ( running && (in_irq() || (v != current)) )
-{
-unsigned int cpu = v->processor;
-
-if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
- && (cpu != smp_processor_id()) )
-send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
-}
+if ( v->is_running && (in_irq() || (v != current)) &&
+ (cpu != smp_processor_id()) )
+send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
+else
+set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu));
 }

 static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
--
1.8.3.1



0001-x86-apicv-enhance-posted-interrupt-processing.patch
Description: 0001-x86-apicv-enhance-posted-interrupt-processing.patch
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [distros-debian-sid test] 68617: regressions - trouble: blocked/broken/fail/pass

2017-02-27 Thread Platform Team regression test user
flight 68617 distros-debian-sid real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/68617/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-i386-sid-netboot-pygrub 9 debian-di-install fail REGR. vs. 
68583

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-i386-sid-netboot-pvgrub 10 guest-start fail like 68583
 test-amd64-amd64-amd64-sid-netboot-pvgrub 10 guest-start   fail like 68583
 test-armhf-armhf-armhf-sid-netboot-pygrub  9 debian-di-install fail like 68583

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-armhf-sid-netboot-pygrub  1 build-check(1)blocked n/a
 build-arm64   2 hosts-allocate   broken never pass
 build-arm64-pvops 2 hosts-allocate   broken never pass
 build-arm64-pvops 3 capture-logs broken never pass
 build-arm64   3 capture-logs broken never pass

baseline version:
 flight   68583

jobs:
 build-amd64  pass
 build-arm64  broken  
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-arm64-pvopsbroken  
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-sid-netboot-pvgrubfail
 test-amd64-i386-i386-sid-netboot-pvgrub  fail
 test-amd64-i386-amd64-sid-netboot-pygrub pass
 test-arm64-arm64-armhf-sid-netboot-pygrubblocked 
 test-armhf-armhf-armhf-sid-netboot-pygrubfail
 test-amd64-amd64-i386-sid-netboot-pygrub fail



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Error in Xen while booting for salvator-X (M3 Board)

2017-02-27 Thread Oleksandr Andrushchenko

Hi,

PFA the DTS I use for M3ULCB board


On 02/27/2017 12:48 PM, Oleksandr Tyshchenko wrote:

Hi.

On Mon, Feb 27, 2017 at 12:29 PM, George John  wrote:

Hi,
Thanks for the reply,
I am using Linux version 4.6.
The memory nodes were already squashed. When I have used a different version
of Xen, it booted to dom0. but still the crash occurs as shown in the log
below.

I have also noticed that for salvator x M3 board(r8a7796) the dtb file used
was r8a7795-salvator-x-dom0.dtb
Is it ok?

I don't know about M3 board.
CC my colleague who plays with M3 board. Hope, that he can shed some lights.


regards,
George

On Fri, Feb 24, 2017 at 8:43 PM, Oleksandr Tyshchenko 
wrote:

Hi,

Not 100% sure, but anyway...

Can you recheck after squashing all memory nodes to a single one.

---
I guess, you have following in your device tree:

memory@4800 {
device_type = "memory";
/* first 128MB is reserved for secure area. */
reg = <0x0 0x4800 0x0 0x3800>;
};

memory@5 {
device_type = "memory";
reg = <0x5 0x 0x0 0x4000>;
};

memory@6 {
device_type = "memory";
reg = <0x6 0x 0x0 0x4000>;
};

memory@7 {
device_type = "memory";
reg = <0x7 0x 0x0 0x4000>;
};

---
Try to make next:

memory@4800 {
device_type = "memory";
/* first 128MB is reserved for secure area. */
reg = <0x0 0x4800 0x0 0x3800>,
  <0x5 0x 0x0 0x4000>,
  <0x6 0x 0x0 0x4000>,
  <0x7 0x 0x0 0x4000>;
};



On Fri, Feb 24, 2017 at 4:53 PM, Julien Grall 
wrote:


On 21/02/17 12:03, George John wrote:

Hi,


Hello,



I was trying out xen in salvator-X(M3 Board as described
in

https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X

I ran in to following error:


U-Boot 2015.04 (Feb 21 2017 - 14:24:48)

CPU: Renesas Electronics R8A7796 rev 1.0
Board: Salvator-X
I2C:   ready
DRAM:  3.9 GiB
MMC:   sh-sdhi: 0, sh-sdhi: 1, sh-sdhi: 2
In:serial
Out:   serial
Err:   serial
Net:   Board Net Initialization Failed
No ethernet found.
Hit any key to stop autoboot:  0
819584 bytes read in 89 ms (8.8 MiB/s)
64927 bytes read in 23 ms (2.7 MiB/s)
14038016 bytes read in 1188 ms (11.3 MiB/s)
10319 bytes read in 19 ms (530.3 KiB/s)
## Booting kernel from Legacy Image at 4808 ...
Image Name:   XEN
Image Type:   AArch64 Linux Kernel Image (uncompressed)
Data Size:819520 Bytes = 800.3 KiB
Load Address: 7808
Entry Point:  7808
Verifying Checksum ... OK
## Flattened Device Tree blob at 4800
Booting using the fdt blob at 0x4800
Loading Kernel Image ... OK
Using Device Tree in place at 4800, end 48012d9e

Starting kernel ...

- UART enabled -
- CPU  booting -
- Current EL 0008 -
- Xen starting at EL2 -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 4800 - 7fff
(XEN) RAM: 0005 - 00053fff
(XEN) RAM: 0006 - 00063fff
(XEN) RAM: 0007 - 00073fff
(XEN)
(XEN) MODULE[0]: 4800 - 4801 Device Tree
(XEN) MODULE[1]: 7a00 - 7c00 Kernel
(XEN) MODULE[2]: 7c00 - 7c01 XSM
(XEN)  RESVD[0]: 4800 - 4801
(XEN)
(XEN) Command line: dom0_mem=512M console=dtuart dtuart=serial0
dom0_max_vcpus=1 bootscrub=0 flask_enforcing=1
(XEN) Placing Xen at 0x7fe0-0x8000
(XEN) Update BOOTMOD_XEN from 7808-78196e01 =>
7fe0-7ff16e01


Which kernel version is it?







After this, it hangs. What could be the possible reason?


Xen will initialize the heap and then continue into the boot. I would
add
more debug around setup_mm to see where it failed.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel



--
Regards,

Oleksandr Tyshchenko







/*
 * Device Tree Source for the M3ULCB (R-Car Starter Kit Pro) board
 *
 * Copyright (C) 2016-2017 Renesas Electronics Corp.
 * Copyright (C) 2016 Cogent Embedded, Inc.
 *
 * This file is licensed under the terms of the GNU General Public License
 * version 2.  This program is licensed "as is" without any warranty of any
 * kind, whether express or implied.
 */

/dts-v1/;
#include "r8a7796.dtsi"
#include 
#include 

/ {
model = "Renesas M3ULCB board based on r8a7796";
compatible = "renesas,m3ulcb", "renesas,r8a7796";

aliases {
serial0 = &scif2;
ethernet0 = &avb;
};

chosen {
bootargs = "dom0_mem=752M console=dtuart dtuart=serial0 
dom0_max_vcpus=4 bootscrub=0 flask_enforcing=1 loglvl=all";
xen,dom0-bootargs = "console=hvc0 root=/dev/mmcblk1p1 rw 
rootwait rootfstype=ext4 ignore_loglevel cma=128M";
//xen,dom0-bo

Re: [Xen-devel] [PATCH v8 08/24] x86: refactor psr: set value: implement framework.

2017-02-27 Thread Jan Beulich
>>> Yi Sun  02/27/17 8:06 AM >>>
>On 17-02-26 17:41:43, Wei Liu wrote:
>> On Wed, Feb 15, 2017 at 04:49:23PM +0800, Yi Sun wrote:
>> > +/*
>> > + * Step 0:
>> > + * old_cos means the COS ID current domain is using. By default, it 
>> > is 0.
>> > + *
>> > + * For every COS ID, there is a reference count to record how many 
>> > domains
>> > + * are using the COS register corresponding to this COS ID.
>> > + * - If ref[old_cos] is 0, that means this COS is not used by any 
>> > domain.
>> > + * - If ref[old_cos] is 1, that means this COS is only used by current
>> > + *   domain.
>> > + * - If ref[old_cos] is more than 1, that mean multiple domains are 
>> > using
>> > + *   this COS.
>> > + */
>> > +old_cos = d->arch.psr_cos_ids[socket];
>> > +if ( old_cos > MAX_COS_REG_CNT )
>> 
>> How could this happen? This function is the setter of cos, it is a bug
>> if it ever sets a value larger than MAX_COS_REG_CNT.
>> 
>You are right. This should not happen. This check is just a protection. If you
>think it is not necessary, I will remove it.

In which case  - make in an ASSERT() instead of an if().

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 0/3] x86/vmx: fix for vmentry failure with TSX bits in LBR

2017-02-27 Thread Andrew Cooper
On 23/02/17 09:33, Sergey Dyasli wrote:
> The first 2 patches is a general optimization which is nice to have
> prior to the 3rd patch which contains the real fix.
>
> A similar bug was fixed in Linux's perf subsystem in Jun 2016:
>
> commit 19fc9ddd61e059cc45464bdf6e8fa304bb94080f
> ("perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug when no TSX")
>
> But the issue affects virtualized systems in a more severe way since
> all LBR entries have to be fixed during vmentry.
>
> 
>
> Sergey Dyasli (3):
>   x86/vmx: introduce vmx_find_msr()
>   x86/vmx: optimize vmx_read/write_guest_msr()
>   x86/vmx: fix vmentry failure with TSX bits in LBR

Pushed, thanks.

There is a separate bug to do with the order of microcode loading and
collection of cpuid features.  At the moment, if microcode ends up
disabling TSX/HLE, Xen's BSP features still see them as present at the
point that lbr_tsx_fixup_check() is called.

In some copious free time, I will see about loading microcode earlier
(Linux loads it out of the trampoline for APs), and trying to load the
BSP microcode before feature evaluation occurs.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Heiko Carstens
On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote:
> A couple of other thoughts:
> 1) Having all newly added memory online ASAP is probably what people
> want for all virtual machines.

This is not true for s390. On s390 we have "standby" memory that a guest
sees and potentially may use if it sets it online. Every guest that sets
memory offline contributes to the hypervisor's standby memory pool, while
onlining standby memory takes memory away from the standby pool.

The use-case is that a system administrator in advance knows the maximum
size a guest will ever have and also defines how much memory should be used
at boot time. The difference is standby memory.

Auto-onlining of standby memory is the last thing we want.

> Unfortunately, we have additional complexity with memory zones
> (ZONE_NORMAL, ZONE_MOVABLE) and in some cases manual intervention is
> required. Especially, when further unplug is expected.

This also is a reason why auto-onlining doesn't seem be the best way.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/2] Two PVH dom0 series fixes

2017-02-27 Thread Andrew Cooper
On 26/02/17 15:49, Wei Liu wrote:
> Wei Liu (2):
>   x86: fix memory leak in pvh_setup_acpi_madt
>   x86: fix memory leak in pvh_setup_acpi_xsdt
>
>  xen/arch/x86/domain_build.c | 37 ++---
>  1 file changed, 26 insertions(+), 11 deletions(-)
>

Reviewed and committed.  Thanks,

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/28] ARM: GICv3: allocate LPI pending and property table

2017-02-27 Thread Andre Przywara
Hi,

"Yes, will fix" to everything not explicitly mentioned below.

On 06/02/17 16:26, Julien Grall wrote:
> Hi Andre,
> 
> On 30/01/17 18:31, Andre Przywara wrote:
>> The ARM GICv3 provides a new kind of interrupt called LPIs.
>> The pending bits and the configuration data (priority, enable bits) for
>> those LPIs are stored in tables in normal memory, which software has to
>> provide to the hardware.
>> Allocate the required memory, initialize it and hand it over to each
>> redistributor. The maximum number of LPIs to be used can be adjusted with
>> the command line option "max_lpi_bits", which defaults to a compile time
>> constant exposed in Kconfig.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  xen/arch/arm/Kconfig  |  15 +
>>  xen/arch/arm/Makefile |   1 +
>>  xen/arch/arm/gic-v3-lpi.c | 129
>> ++
>>  xen/arch/arm/gic-v3.c |  44 +
>>  xen/include/asm-arm/bitops.h  |   1 +
>>  xen/include/asm-arm/gic.h |   2 +
>>  xen/include/asm-arm/gic_v3_defs.h |  52 ++-
>>  xen/include/asm-arm/gic_v3_its.h  |  22 ++-
>>  8 files changed, 264 insertions(+), 2 deletions(-)
>>  create mode 100644 xen/arch/arm/gic-v3-lpi.c
>>
>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>> index bf64c61..71734a1 100644
>> --- a/xen/arch/arm/Kconfig
>> +++ b/xen/arch/arm/Kconfig
>> @@ -49,6 +49,21 @@ config HAS_ITS
>>  bool "GICv3 ITS MSI controller support"
>>  depends on HAS_GICV3
>>
>> +config MAX_PHYS_LPI_BITS
>> +depends on HAS_ITS
>> +int "Maximum bits for GICv3 host LPIs (14-32)"
>> +range 14 32
>> +default "20"
>> +help
>> +  Specifies the maximum number of LPIs (in bits) Xen should take
>> +  care of. The host ITS may provide support for a very large
>> number
>> +  of supported LPIs, for all of which we may not want to
>> allocate
>> +  memory, so this number here allows to limit this.
> 
> I think the description is misleading, if a user wants 8K worth of LPIs
> by default, he would have to use 14 and not 13.
> 
> Furthermore, you provide both a runtime option (via command line) and
> build time option (via Kconfig). You don't express what is the
> differences between both and how there are supposed to co-exist.
> 
> Anyway, IHMO the command line option should be sufficient to allow
> override if necessary. So I would drop the Kconfig version.

The idea is simply to let the Kconfig version specify the default value
if there is no command line option. So giving a command line option will
always override Kconfig.
Should we know a sensible default value, we can indeed get rid of this
Kconfig snippet here.

>> +  Xen itself does not know how many LPIs domains will ever need
>> +  beforehand.
>> +  This can be overriden on the command line with the
>> max_lpi_bits
> 
> s/overriden/overridden/
> 
>> +  parameter.
>> +
>>  endmenu
>>
>>  menu "ARM errata workaround via the alternative framework"
>> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
>> index 5f4ff23..4ccf2eb 100644
>> --- a/xen/arch/arm/Makefile
>> +++ b/xen/arch/arm/Makefile
>> @@ -19,6 +19,7 @@ obj-y += gic.o
>>  obj-y += gic-v2.o
>>  obj-$(CONFIG_HAS_GICV3) += gic-v3.o
>>  obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
>> +obj-$(CONFIG_HAS_ITS) += gic-v3-lpi.o
>>  obj-y += guestcopy.o
>>  obj-y += hvm.o
>>  obj-y += io.o
>> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
>> new file mode 100644
>> index 000..e2fc901
>> --- /dev/null
>> +++ b/xen/arch/arm/gic-v3-lpi.c
>> @@ -0,0 +1,129 @@
>> +/*
>> + * xen/arch/arm/gic-v3-lpi.c
>> + *
>> + * ARM GICv3 Locality-specific Peripheral Interrupts (LPI) support
>> + *
>> + * Copyright (C) 2016,2017 - ARM Ltd
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + */
>> +
>> +#include 
> 
> xen/config.h is not necessary.
> 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +/* Global state */
>> +static struct {
>> +uint8_t *lpi_property;
>> +unsigned int host_lpi_bits;
> 
> On the previous version, Stefano suggested to rename this to
> phys_lpi_bits + adding a comment as you store the number of bits.
> 
> However, looking at the usage the number of bits is only required during
> the initialization. Runtime code (such as gic_get_host_lpi) will use the
> number of LPIs (see gic_get_host_lpi) and therefore require

Re: [Xen-devel] [PATCH 02/28] ARM: GICv3 ITS: parse and store ITS subnodes from hardware DT

2017-02-27 Thread Andre Przywara
Hi,

On 06/02/17 12:58, Julien Grall wrote:
> Hi Andre,
> 
> On 30/01/17 18:31, Andre Przywara wrote:
>> Parse the DT GIC subnodes to find every ITS MSI controller the hardware
>> offers. Store that information in a list to both propagate all of them
>> later to Dom0, but also to be able to iterate over all ITSes.
>> This introduces an ITS Kconfig option.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  xen/arch/arm/Kconfig |  4 +++
>>  xen/arch/arm/Makefile|  1 +
>>  xen/arch/arm/gic-v3-its.c| 71
>> 
>>  xen/arch/arm/gic-v3.c| 12 ---
>>  xen/include/asm-arm/gic_v3_its.h | 57 
>>  5 files changed, 141 insertions(+), 4 deletions(-)
>>  create mode 100644 xen/arch/arm/gic-v3-its.c
>>  create mode 100644 xen/include/asm-arm/gic_v3_its.h
>>
>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>> index 2e023d1..bf64c61 100644
>> --- a/xen/arch/arm/Kconfig
>> +++ b/xen/arch/arm/Kconfig
>> @@ -45,6 +45,10 @@ config ACPI
>>  config HAS_GICV3
>>  bool
>>
>> +config HAS_ITS
>> +bool "GICv3 ITS MSI controller support"
>> +depends on HAS_GICV3
> 
> Should not this be disabled by default until the last patch of the
> series in order to avoid potential issue if bisecting Xen?

It should work without it, as we only ever map something driven by Dom0,
which does not learn about the virtual until the very last patch.
Actually enabling it early on allows fine-grained build and runtime
testing and bisecting, so we can pinpoint breakages in the VGIC
subsystem to a particular patch.

So do you see any real issues with that? Or was that just a feeling?

Cheers,
Andre.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v1 02/21] x86: NUMA: Refactor NUMA code

2017-02-27 Thread Vijay Kilari
Hi Jan,

On Thu, Feb 9, 2017 at 9:41 PM, Jan Beulich  wrote:
 On 09.02.17 at 16:56,  wrote:
>> From: Vijaya Kumar K 
>>
>> Move common generic NUMA code to xen/common/numa.c from
>> xen/arch/x86/numa.c. Also move generic code in header file
>> xen/include/asm-x86/numa.h to xen/include/xen/numa.h
>>
>> This common code can be re-used later for ARM.
>>
>> Signed-off-by: Vijaya Kumar K 
>
> I would have been nice if you Cc-ed the maintainers of the code
> you're moving.
>
>> --- /dev/null
>> +++ b/xen/common/numa.c
>> @@ -0,0 +1,342 @@
>> +/*
>> + * Common NUMA handling functions for x86 and arm.
>> + * Original code extracted from arch/x86/numa.c
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see .
>> + */
>> +
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>
> This last one would better not be included here.
>
>> +struct node_data node_data[MAX_NUMNODES];
>> +
>> +/* Mapping from pdx to node id */
>> +int memnode_shift;
>> +unsigned long memnodemapsize;
>> +u8 *memnodemap;
>> +typeof(*memnodemap) _memnodemap[64];
>
> Careful with removing any "static" please. For the last one in
> particular you would need to change the name if it's really necessary
> to make non-static. Even better would be though to keep it static
> and provide suitable accessors.
>
> Also please sanitize types as you're moving stuff: memnode_shift
> looks like it really wants to be unsigned int, and u8 should really
> be uint8_t (as we're trying to phase out u8 & Co). This also applies
> to code further down.

You mean to change all occurrences of
s/u8/uint8_t
s/u32/uint32_t
s/u64/uint64_t

Also, I see that xen/arch/x86/srat.c coding style is not adhere to xen
coding style.
Shall I clean up before I move the code?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v8 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-27 Thread Wei Liu
On Mon, Feb 27, 2017 at 02:42:31PM +0800, Yi Sun wrote:
> On 17-02-26 17:41:08, Wei Liu wrote:
> > On Wed, Feb 15, 2017 at 04:49:19PM +0800, Yi Sun wrote:
> > > This patch implements the CPU init and free flow including L3 CAT
> > > initialization and feature list free.
> > > 
> > > Signed-off-by: Yi Sun 
> > 
> > Either you need to use a separate patch to move cpuid_count_leaf or you
> > should state it is moved in the commit message.
> > 
> Thanks! I will add description in commit message.
> 
> > > +
> > > +/* Common functions. */
> > > +static void free_feature(struct psr_socket_info *info)
> > > +{
> > > +struct feat_node *feat, *next;
> > > +
> > > +if ( !info )
> > > +return;
> > > +
> > > +/*
> > > + * Free resources of features. But we do not free global feature list
> > > + * entry, like feat_l3_cat. Although it may cause a few memory leak,
> > > + * it is OK simplify things.
> > 
> > I don't think it is OK to leak memory in the hypervisor in general.
> > Please specify why it is OK in this particular case in the comment.
> > 
> In most cases, such global feature list entry will be added into feature list
> so that it can be freed here.
> 
> In extreme case, e.g. socket 1 does not support L3 CAT. The feat_l3_cat
> allocated in psr_cpu_prepare will not be released. But this is rare case.
> 
> Jan, Konrad and me disucssed this before. Per Jan's suggestion, we do not free
> it.

Then I would suggest you to not use "leak" in the comment. And put the
relevant bits from the discussion in the comment. Otherwise a drive-by
reviewer like me will call this out again. :-)

> 
> > > + */
> > > +list_for_each_entry_safe(feat, next, &info->feat_list, list)
> > > +{
> > > +if ( !feat )
> > > +return;
> > > +
> > > +__clear_bit(feat->feature, &info->feat_mask);
> > > +list_del(&feat->list);
> > > +xfree(feat);
> > > +}
> > > +}
> > > +
> > > -static int psr_cpu_prepare(unsigned int cpu)
> > > +static void cpu_init_work(void)
> > > +{
> > > +struct psr_socket_info *info;
> > > +unsigned int socket;
> > > +unsigned int cpu = smp_processor_id();
> > > +struct feat_node *feat;
> > > +struct cpuid_leaf regs = { .a = 0, .b = 0, .c = 0, .d = 0 };
> > > +
> > > +if ( !cpu_has(¤t_cpu_data, X86_FEATURE_PQE) )
> > > +return;
> > > +else if ( current_cpu_data.cpuid_level < PSR_CPUID_LEVEL_CAT )
> > > +{
> > > +__clear_bit(X86_FEATURE_PQE, current_cpu_data.x86_capability);
> > > +return;
> > > +}
> > > +
> > > +socket = cpu_to_socket(cpu);
> > > +info = socket_info + socket;
> > > +if ( info->feat_mask )
> > > +return;
> > > +
> > > +INIT_LIST_HEAD(&info->feat_list);
> > > +spin_lock_init(&info->ref_lock);
> > > +
> > > +cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, ®s);
> > > +if ( regs.b & PSR_RESOURCE_TYPE_L3 )
> > > +{
> > 
> > You can move
> > 
> >struct feat_node *feat
> > 
> > here.
> > 
> This variable will also be used by L2 CAT which codes exist in a different
> branch. So, I declare it at the top of the function. Please refer below
> patch:
> [PATCH v8 17/24] x86: L2 CAT: implement CPU init and free flow.

OK.

> 
> > > +cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 1, ®s);
> > > +
> > > +feat = feat_l3_cat;
> > > +/* psr_cpu_prepare will allocate it on subsequent CPU onlining. 
> > > */
> > > +feat_l3_cat = NULL;
> > > +feat->ops = l3_cat_ops;
> > > +
> > > +l3_cat_init_feature(regs, feat, info);
> > > +}
> > > +}
> > > +
> > [...]
> > >  
> > > @@ -359,7 +528,7 @@ static int cpu_callback(
> > >  switch ( action )
> > >  {
> > >  case CPU_UP_PREPARE:
> > > -rc = psr_cpu_prepare(cpu);
> > > +rc = psr_cpu_prepare();
> > >  break;
> > >  case CPU_STARTING:
> > >  psr_cpu_init();
> > > @@ -388,10 +557,14 @@ static int __init psr_presmp_init(void)
> > >  if ( (opt_psr & PSR_CMT) && opt_rmid_max )
> > >  init_psr_cmt(opt_rmid_max);
> > >  
> > > -psr_cpu_prepare(0);
> > > +if ( opt_psr & PSR_CAT )
> > > +init_psr();
> > > +
> > > +if ( psr_cpu_prepare() )
> > > +psr_free();
> > >  
> > >  psr_cpu_init();
> > > -if ( psr_cmt_enabled() )
> > > +if ( psr_cmt_enabled() || socket_info )
> > 
> > Why not have psr_cat_enabled() here?
> > 
> psr_cmt_enabled() returns true of false by checking if the global pointer
> 'psr_cmt' has been allocated or not. The 'psr_cmt' is also used in sysctl.c.
> For allocation features, we have a similar global pointer 'socket_info'. But
> it is only used in psr.c and all allocation features(CAT/CDP/MBA) use it. So
> we directly use it in psr.c to check if related initialization has been done.

The problem with using socket_info directly is that the name doesn't
tell you much. It doesn't carry specific semantics by itself. Wrapping
it inside an inline function with a 

Re: [Xen-devel] [PATCH v8 09/24] x86: refactor psr: set value: assemble features value array.

2017-02-27 Thread Wei Liu
On Mon, Feb 27, 2017 at 03:11:35PM +0800, Yi Sun wrote:
> On 17-02-26 17:43:04, Wei Liu wrote:
> > On Wed, Feb 15, 2017 at 04:49:24PM +0800, Yi Sun wrote:
> > [...]
> > >  
> > > +static unsigned int l3_cat_get_cos_num(const struct feat_node *feat)
> > > +{
> > > +return 1;
> > > +}
> > > +
> > > +static int l3_cat_get_old_val(uint64_t val[],
> > 
> > And the length of val is? How can you bound-check the access?
> > 
> > But I *think* this is just a pointer to uint64_t, you can just use
> > uint64_t *val here and *val = x; in code?
> > 
> > Same comment applies to the set_new_val handler as well.
> > 
> Such implementation is to simplify these functions. The bound-check will be
> done in caller functions, such as combine_val_array/set_new_val_to_array/etc.
> 

Please consider adding a comment to say bound-check is don't by the
caller.

> > > +  const struct feat_node *feat,
> > > +  unsigned int old_cos)
> > > +{
> > > +if ( old_cos > feat->info.l3_cat_info.cos_max )
> > > +/* Use default value. */
> > > +old_cos = 0;
> > > +
> > > +/* CAT */
> > > +val[0] =  feat->cos_reg_val[old_cos];
> > > +
> > > +return 0;
> > > +}
> > > +
> [...]
> > > -static int assemble_val_array(uint64_t *val,
> > > +static int combine_val_array(uint64_t *val,
> > >uint32_t array_len,
> > >const struct psr_socket_info *info,
> > >unsigned int old_cos)
> > 
> > Please just name this function combine_val_array in your previous patch
> > instead of trying to rename it here. Or just don't change the name at
> > all -- I don't see why changing name is necessary.
> > 
> Per Konrad's suggestion to change the name. Because he thought the 'assemble'
> is not accurate here.
> 

The such change should be inside the patch to introduce the framework,
not in this patch.

Wei.

> > Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86/shadow: Fix build with CONFIG_SHADOW_PAGING=n following c/s 45ac805

2017-02-27 Thread Andrew Cooper
c/s 45ac805 "x86/paging: Package up the log dirty function pointers" neglected
the case when CONFIG_SHADOW_PAGING is disabled.  Make a similar adjustment to
the none stubs.

Spotted by a Travis RANDCONFIG run.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
---
 xen/arch/x86/mm/shadow/none.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
index 69e56c5..41ce593 100644
--- a/xen/arch/x86/mm/shadow/none.c
+++ b/xen/arch/x86/mm/shadow/none.c
@@ -20,8 +20,13 @@ static void _clean_dirty_bitmap(struct domain *d)
 
 int shadow_domain_init(struct domain *d, unsigned int domcr_flags)
 {
-paging_log_dirty_init(d, _enable_log_dirty,
-  _disable_log_dirty, _clean_dirty_bitmap);
+static const struct log_dirty_ops sh_none_ops = {
+.enable  = _enable_log_dirty,
+.disable = _disable_log_dirty,
+.clean   = _clean_dirty_bitmap,
+};
+
+paging_log_dirty_init(d, &sh_none_ops);
 return is_pv_domain(d) ? 0 : -EOPNOTSUPP;
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/shadow: Fix build with CONFIG_SHADOW_PAGING=n following c/s 45ac805

2017-02-27 Thread Tim Deegan
At 11:47 + on 27 Feb (1488196049), Andrew Cooper wrote:
> c/s 45ac805 "x86/paging: Package up the log dirty function pointers" neglected
> the case when CONFIG_SHADOW_PAGING is disabled.  Make a similar adjustment to
> the none stubs.
> 
> Spotted by a Travis RANDCONFIG run.
> 
> Signed-off-by: Andrew Cooper 
> ---

Acked-by:  Tim Deegan 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Vitaly Kuznetsov
Heiko Carstens  writes:

> On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote:
>> A couple of other thoughts:
>> 1) Having all newly added memory online ASAP is probably what people
>> want for all virtual machines.

Sorry, obviously missed 'x86' in the above statement.

>
> This is not true for s390. On s390 we have "standby" memory that a guest
> sees and potentially may use if it sets it online. Every guest that sets
> memory offline contributes to the hypervisor's standby memory pool, while
> onlining standby memory takes memory away from the standby pool.
>
> The use-case is that a system administrator in advance knows the maximum
> size a guest will ever have and also defines how much memory should be used
> at boot time. The difference is standby memory.
>
> Auto-onlining of standby memory is the last thing we want.
>

This is actually a very good example of why we need the config
option. s390 kernels will have it disabled.

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Crash w/ bios="ovmf"

2017-02-27 Thread Wei Liu
On Wed, Feb 22, 2017 at 03:50:28PM -0500, Jason Dickens wrote:
> I'm trying to find a solution to an immediate VM crash which occurs by
> simply adding "bios='ovmf' to my configuration?
> 
> I started with a standard Ubuntu install which contained Xen 4.6.0 and had
> the crash. The VM works fine booting w/ SeaBIOS once the configuration line
> is removed. It also works fine with OVMF using just QEMU.
> 
> Then I was led to believe that i needed to rebuild Xen with --enable-ovmf ,
> using the source from apt-get source, this proved to be difficult and
> ultimately I had to disable stub domains to get it to work. (in case, that
> is relevant)
> 
> Still I get the crash almost immediately after the VM is launched. With
> sdl=1 I can see the window flash up and disappear.
> I should mention that ultimately I plan to use a non-standard build of the
> OVMF kernel, but for now I'm happy to get it working with any one.
> 
> At this point, I have been studying the source to try and determine where
> things diverge. I find very little use of LIBXL_BIOS_TYPE_OVMF and can't
> find  exactly where setting hvm.bios that even matters?
> 
> In any case, I would appreciate help on how to avoid the crash and/or
> understanding the Xen modifications for OVMF.
> 

Because OVMF doesn't do regular releases, we ship a commit of it when we
release a new version of Xen.

Are you comfortable with building Xen from the development branch? If
you use 4.8 or our master branch, you should be able to provide an OVMF
blob without rebuilding HVMloader every time.

Wei.


> Jason
> 
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 106183: regressions - FAIL

2017-02-27 Thread osstest service owner
flight 106183 xen-4.4-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106183/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xend-qemut-winxpsp3 15 guest-localmigrate/x10 fail in 106051 
REGR. vs. 105835

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-win7-amd64 3 host-install(3) broken in 106018 pass in 
106183
 test-amd64-amd64-xl-qemuu-ovmf-amd64 3 host-install(3) broken in 106018 pass 
in 106183
 test-amd64-amd64-pygrub  3 host-install(3) broken in 106018 pass in 106183
 test-amd64-i386-xl-qemuu-ovmf-amd64 3 host-install(3) broken in 106018 pass in 
106183
 test-amd64-i386-xl   3 host-install(3) broken in 106018 pass in 106183
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 3 host-install(3) broken in 106018 
pass in 106183
 test-amd64-i386-freebsd10-amd64 3 host-install(3) broken in 106051 pass in 
106183
 test-armhf-armhf-xl-multivcpu 16 guest-start.2   fail in 106051 pass in 106130
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail in 106051 
pass in 106183
 test-armhf-armhf-xl-credit2   6 xen-boot fail in 106130 pass in 106183
 test-armhf-armhf-xl-multivcpu 15 guest-start/debian.repeat fail in 106168 pass 
in 106130
 test-armhf-armhf-xl-arndale   9 debian-install   fail in 106168 pass in 106183
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail pass in 106018
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail pass in 106051
 test-armhf-armhf-xl-multivcpu 11 guest-start   fail pass in 106168
 test-armhf-armhf-xl-credit2  15 guest-start/debian.repeat  fail pass in 106168

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xend-qemut-winxpsp3  9 windows-installfail like 105815
 test-xtf-amd64-amd64-2   16 xtf/test-pv32pae-selftestfail  like 105835
 test-xtf-amd64-amd64-4   16 xtf/test-pv32pae-selftestfail  like 105835
 test-xtf-amd64-amd64-3   20 xtf/test-hvm32-invlpg~shadow fail  like 105835
 test-xtf-amd64-amd64-3 33 xtf/test-hvm32pae-invlpg~shadow fail like 105835
 test-xtf-amd64-amd64-3   44 xtf/test-hvm64-invlpg~shadow fail  like 105835
 test-xtf-amd64-amd64-2   54 leak-check/check fail  like 105835
 test-xtf-amd64-amd64-4   54 leak-check/check fail  like 105835
 test-xtf-amd64-amd64-1   54 leak-check/check fail  like 105835
 test-xtf-amd64-amd64-5   54 leak-check/check fail  like 105835
 test-xtf-amd64-amd64-3   54 leak-check/check fail  like 105835
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 105835
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 105835

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumprun-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumprun-i386  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail in 106168 never 
pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail in 106168 
never pass
 test-xtf-amd64-amd64-2   10 xtf-fep  fail   never pass
 test-xtf-amd64-amd64-4   10 xtf-fep  fail   never pass
 test-xtf-amd64-amd64-2   18 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-4   18 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-1   10 xtf-fep  fail   never pass
 test-xtf-amd64-amd64-1   16 xtf/test-pv32pae-selftestfail   never pass
 test-xtf-amd64-amd64-1   18 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-2 31 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-4 31 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-2 37 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-4 37 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-2   41 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-1 31 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-4   41 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-1 37 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-1   41 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-5   10 xtf-fep  fail   never pass
 build-amd64-rumprun   7 xen-buildfail   never pass
 build-i386-rumprun7 xen-buildfail   never pass
 test-xtf-amd64-amd64-3   10 xtf-fep  fail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-xtf-amd64-amd64-2   53 xtf/test-hvm64-xsa-195   fail   never pass
 test-xtf-amd64-amd64-4   53 xtf/test-hvm64-xsa-195   

[Xen-devel] [PATCH] x86/PVHv2: fix dereference of native RSDP table mapping

2017-02-27 Thread Roger Pau Monne
Check that the RSDP is mapped before trying to access it.

Spotted-by: Coverity
Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/domain_build.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index d742965..78ae741 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -2592,6 +2592,11 @@ static int __init pvh_setup_acpi(struct domain *d, 
paddr_t start_info)
 
 /* Craft a custom RSDP. */
 native_rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(rsdp));
+if ( !native_rsdp )
+{
+printk("Failed to map native RSDP\n");
+return -ENOMEM;
+}
 memcpy(rsdp.oem_id, native_rsdp->oem_id, sizeof(rsdp.oem_id));
 acpi_os_unmap_memory(native_rsdp, sizeof(rsdp));
 rsdp.xsdt_physical_address = xsdt_paddr;
-- 
2.10.1 (Apple Git-78)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] libs/devicemodel: free xencall handle in error path in _open()

2017-02-27 Thread Wei Liu
Change the allocation to use calloc to get zeroed structure. Free
xencall handler in error path.

Spotted by Coverity.

Signed-off-by: Wei Liu 
---
 tools/libs/devicemodel/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 19ebef63b3..a85cb49c22 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -24,7 +24,7 @@
 xendevicemodel_handle *xendevicemodel_open(xentoollog_logger *logger,
unsigned open_flags)
 {
-xendevicemodel_handle *dmod = malloc(sizeof(*dmod));
+xendevicemodel_handle *dmod = calloc(1, sizeof(*dmod));
 int rc;
 
 if (!dmod)
@@ -54,6 +54,7 @@ xendevicemodel_handle *xendevicemodel_open(xentoollog_logger 
*logger,
 
 err:
 xtl_logger_destroy(dmod->logger_tofree);
+xencall_close(dmod->xcall);
 free(dmod);
 return NULL;
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable-smoke test] 106196: tolerable trouble: broken/fail/pass - PUSHED

2017-02-27 Thread osstest service owner
flight 106196 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106196/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64   5 xen-buildfail   never pass
 build-arm64-pvops 5 kernel-build fail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 xen  d6e9f8d4f35d938417f9dc2ea50f6e8004e26725
baseline version:
 xen  1f24be6c945c8f8e25547aed4a56c092133df713

Last test of basis   106089  2017-02-24 18:01:12 Z2 days
Testing same since   106196  2017-02-27 11:02:16 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  George Dunlap 
  Kevin Tian 
  Sergey Dyasli 
  Tim Deegan 

jobs:
 build-amd64  pass
 build-arm64  fail
 build-armhf  pass
 build-amd64-libvirt  pass
 build-arm64-pvopsfail
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=d6e9f8d4f35d938417f9dc2ea50f6e8004e26725
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
d6e9f8d4f35d938417f9dc2ea50f6e8004e26725
+ branch=xen-unstable-smoke
+ revision=d6e9f8d4f35d938417f9dc2ea50f6e8004e26725
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' xd6e9f8d4f35d938417f9dc2ea50f6e8004e26725 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.or

Re: [Xen-devel] [PATCH 03/28] ARM: GICv3: allocate LPI pending and property table

2017-02-27 Thread Julien Grall



On 27/02/17 11:34, Andre Przywara wrote:

Hi,


Hi Andre,



"Yes, will fix" to everything not explicitly mentioned below.

On 06/02/17 16:26, Julien Grall wrote:

Hi Andre,

On 30/01/17 18:31, Andre Przywara wrote:

The ARM GICv3 provides a new kind of interrupt called LPIs.
The pending bits and the configuration data (priority, enable bits) for
those LPIs are stored in tables in normal memory, which software has to
provide to the hardware.
Allocate the required memory, initialize it and hand it over to each
redistributor. The maximum number of LPIs to be used can be adjusted with
the command line option "max_lpi_bits", which defaults to a compile time
constant exposed in Kconfig.

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/Kconfig  |  15 +
 xen/arch/arm/Makefile |   1 +
 xen/arch/arm/gic-v3-lpi.c | 129
++
 xen/arch/arm/gic-v3.c |  44 +
 xen/include/asm-arm/bitops.h  |   1 +
 xen/include/asm-arm/gic.h |   2 +
 xen/include/asm-arm/gic_v3_defs.h |  52 ++-
 xen/include/asm-arm/gic_v3_its.h  |  22 ++-
 8 files changed, 264 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/gic-v3-lpi.c

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index bf64c61..71734a1 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -49,6 +49,21 @@ config HAS_ITS
 bool "GICv3 ITS MSI controller support"
 depends on HAS_GICV3

+config MAX_PHYS_LPI_BITS
+depends on HAS_ITS
+int "Maximum bits for GICv3 host LPIs (14-32)"
+range 14 32
+default "20"
+help
+  Specifies the maximum number of LPIs (in bits) Xen should take
+  care of. The host ITS may provide support for a very large
number
+  of supported LPIs, for all of which we may not want to
allocate
+  memory, so this number here allows to limit this.


I think the description is misleading, if a user wants 8K worth of LPIs
by default, he would have to use 14 and not 13.

Furthermore, you provide both a runtime option (via command line) and
build time option (via Kconfig). You don't express what is the
differences between both and how there are supposed to co-exist.

Anyway, IHMO the command line option should be sufficient to allow
override if necessary. So I would drop the Kconfig version.


The idea is simply to let the Kconfig version specify the default value
if there is no command line option. So giving a command line option will
always override Kconfig.
Should we know a sensible default value, we can indeed get rid of this
Kconfig snippet here.


Please have in mind that distribution will ship one version of Xen for 
all the platforms. There is no sensible default value and I don't see 
how a distribution will be able to pick-up one.


So I still don't see the point of this default option.




+  Xen itself does not know how many LPIs domains will ever need
+  beforehand.
+  This can be overriden on the command line with the
max_lpi_bits


s/overriden/overridden/


+  parameter.
+
 endmenu

 menu "ARM errata workaround via the alternative framework"
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 5f4ff23..4ccf2eb 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -19,6 +19,7 @@ obj-y += gic.o
 obj-y += gic-v2.o
 obj-$(CONFIG_HAS_GICV3) += gic-v3.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
+obj-$(CONFIG_HAS_ITS) += gic-v3-lpi.o
 obj-y += guestcopy.o
 obj-y += hvm.o
 obj-y += io.o
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
new file mode 100644
index 000..e2fc901
--- /dev/null
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -0,0 +1,129 @@
+/*
+ * xen/arch/arm/gic-v3-lpi.c
+ *
+ * ARM GICv3 Locality-specific Peripheral Interrupts (LPI) support
+ *
+ * Copyright (C) 2016,2017 - ARM Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 


xen/config.h is not necessary.


+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Global state */
+static struct {
+uint8_t *lpi_property;
+unsigned int host_lpi_bits;


On the previous version, Stefano suggested to rename this to
phys_lpi_bits + adding a comment as you store the number of bits.

However, looking at the usage the number of bits is only required during
the initialization. Runtime code (such as gic_get_host_lpi) will use the
number of LPIs (see gic_get_host_lpi) and therefore require extra
instructions to comp

Re: [Xen-devel] [PATCH 02/28] ARM: GICv3 ITS: parse and store ITS subnodes from hardware DT

2017-02-27 Thread Julien Grall

On 27/02/17 11:43, Andre Przywara wrote:

Hi,


Hi Andre,


On 06/02/17 12:58, Julien Grall wrote:

Hi Andre,

On 30/01/17 18:31, Andre Przywara wrote:

Parse the DT GIC subnodes to find every ITS MSI controller the hardware
offers. Store that information in a list to both propagate all of them
later to Dom0, but also to be able to iterate over all ITSes.
This introduces an ITS Kconfig option.

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/Kconfig |  4 +++
 xen/arch/arm/Makefile|  1 +
 xen/arch/arm/gic-v3-its.c| 71

 xen/arch/arm/gic-v3.c| 12 ---
 xen/include/asm-arm/gic_v3_its.h | 57 
 5 files changed, 141 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/arm/gic-v3-its.c
 create mode 100644 xen/include/asm-arm/gic_v3_its.h

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2e023d1..bf64c61 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -45,6 +45,10 @@ config ACPI
 config HAS_GICV3
 bool

+config HAS_ITS
+bool "GICv3 ITS MSI controller support"
+depends on HAS_GICV3


Should not this be disabled by default until the last patch of the
series in order to avoid potential issue if bisecting Xen?


It should work without it, as we only ever map something driven by Dom0,
which does not learn about the virtual until the very last patch.
Actually enabling it early on allows fine-grained build and runtime
testing and bisecting, so we can pinpoint breakages in the VGIC
subsystem to a particular patch.

So do you see any real issues with that? Or was that just a feeling?


Just a feeling. If you say it should be ok, then I am happy with that.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libs/devicemodel: free xencall handle in error path in _open()

2017-02-27 Thread Paul Durrant
> -Original Message-
> From: Wei Liu [mailto:wei.l...@citrix.com]
> Sent: 27 February 2017 12:20
> To: Xen-devel 
> Cc: Ian Jackson ; Paul Durrant
> ; Wei Liu 
> Subject: [PATCH] libs/devicemodel: free xencall handle in error path in
> _open()
> 
> Change the allocation to use calloc to get zeroed structure. Free
> xencall handler in error path.
> 
> Spotted by Coverity.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
>  tools/libs/devicemodel/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> index 19ebef63b3..a85cb49c22 100644
> --- a/tools/libs/devicemodel/core.c
> +++ b/tools/libs/devicemodel/core.c
> @@ -24,7 +24,7 @@
>  xendevicemodel_handle *xendevicemodel_open(xentoollog_logger
> *logger,
> unsigned open_flags)
>  {
> -xendevicemodel_handle *dmod = malloc(sizeof(*dmod));
> +xendevicemodel_handle *dmod = calloc(1, sizeof(*dmod));
>  int rc;
> 
>  if (!dmod)
> @@ -54,6 +54,7 @@ xendevicemodel_handle
> *xendevicemodel_open(xentoollog_logger *logger,
> 
>  err:
>  xtl_logger_destroy(dmod->logger_tofree);
> +xencall_close(dmod->xcall);
>  free(dmod);
>  return NULL;
>  }
> --
> 2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Michal Hocko
On Mon 27-02-17 11:49:43, Vitaly Kuznetsov wrote:
> Michal Hocko  writes:
> 
> > On Mon 27-02-17 11:02:09, Vitaly Kuznetsov wrote:
> > [...]
> >> I don't have anything new to add to the discussion happened last week
> >> but I'd like to summarize my arguments against this change:
> >> 
> >> 1) This patch doesn't solve any issue. Configuration option is not an
> >> issue by itself, it is an option for distros to decide what they want to
> >> ship: udev rule with known issues (legacy mode) or enable the new
> >> option. Distro makers and users building their kernels should be able to
> >> answer this simple question "do you want to automatically online all
> >> newly added memory or not".
> >
> > OK, so could you be more specific? Distributions have no clue about
> > which HW their kernel runs on so how can they possibly make a sensible
> > decision here?
> 
> They at least have an idea if they ship udev rule or not. I can also
> imagine different choices for non-x86 architectures but I don't know
> enough about them to have an opinion.

I really do not follow. If they know whether they ship the udev rule
then why do they need a kernel help at all? Anyway this global policy
actually breaks some usecases. Say you would have a default set to
online. What should user do if _some_ nodes should be online_movable?
Or, say that HyperV or other hotplug based ballooning implementation
really want to online the movable memory in order to have a realiable
hotremove. Now you have a global policy which goes against it.

> >> There are distros already which ship kernels
> >> with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE enabled (Fedora 24 and 25 as
> >> far as I remember, maybe someone else).
> >> 
> >> 2) This patch creates an imbalance between Xen/Hyper-V on one side and
> >> KVM/Vmware on another. KVM/Vmware use pure ACPI memory hotplug and this
> >> memory won't get onlined. I don't understand how this problem is
> >> supposed to be solved by distros. They'll *have to* continue shipping
> >> a udev rule which has and always will have issues.
> >
> > They have notifications for udev to online that memory and AFAICU
> > neither KVM nor VMware are using memory hotplut for ballooning - unlike
> > HyperV and Xen.
> >
> 
> No, Hyper-V doesn't use memory hotplug for ballooning purposes. It is
> just a memory hotplug. The fact that the code is located in hv_balloon
> is just a coincidence.

OK, I might be wrong here but 1cac8cd4d146 ("Drivers: hv: balloon:
Implement hot-add functionality") suggests otherwise.

> The difference with real hardware is how the operation is performed:
> with real hardware you need to take a DIMM, go to your server room, open
> the box, insert DIMM, go back to your seat. Asking to do some manual
> action to actually enable memory is kinda OK. The beauty of hypervisors
> is that everything happens automatically (e.g. when the VM is running
> out of memory).

I do not see your point. Either you have some (semi)automatic way to
balance memory in guest based on the memory pressure (let's call it
ballooning) or this is an administration operation (say you buy more
DIMs or pay more to your virtualization provider) and then it is up to
the guest owner to tell what to do about that memory. In other words you
really do not want to wait in the first case as you are under memory
pressure which is _actively_ managed or this is much more relaxed
environment.
 
> >> 3) Kernel command line is not a viable choice, it is rather a debug
> >> method.
> >
> > Why?
> >
> 
> Because we usually have just a few things there (root=, console=) and
> the rest is used when something goes wrong or for 'special' cases, not
> for the majority of users.

auto online or even memory hotplug seems something that requires
a special HW/configuration already so I fail to see your point. It is
normal to put kernel parameters to override the default. And AFAIU
default offline is a sensible default for the standard memory hotplug.

[...]

> >> 2) Adding new memory can (in some extreme cases) still fail as we need
> >> some *other* memory before we're able to online the newly added
> >> block. This is an issue to be solved and it is doable (IMO) with some
> >> pre-allocation.
> >
> > you cannot preallocate for all the possible memory that can be added.
> 
> For all, no, but for 1 next block - yes, and then I'll preallocate for
> the next one.

You are still thinking in the scope of your particular use case and I
believe the whole thing is shaped around that very same thing and that
is why it should have been rejected in the first place. Especially when
that use case can be handled without user visible configuration knob.

-- 
Michal Hocko
SUSE Labs

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable bisection] complete test-amd64-i386-xl-qemut-debianhvm-amd64-xsm

2017-02-27 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-xl-qemut-debianhvm-amd64-xsm
testid xen-boot

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  c5b9805bc1f79319ae342c65fcc201a15a47
  Bug not present: b199c44afa3a0d18d0e968e78a590eb9e69e20ad
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/106197/


  commit c5b9805bc1f79319ae342c65fcc201a15a47
  Author: Daniel Kiper 
  Date:   Wed Feb 22 14:38:06 2017 +0100
  
  efi: create new early memory allocator
  
  There is a problem with place_string() which is used as early memory
  allocator. It gets memory chunks starting from start symbol and goes
  down. Sadly this does not work when Xen is loaded using multiboot2
  protocol because then the start lives on 1 MiB address and we should
  not allocate a memory from below of it. So, I tried to use mem_lower
  address calculated by GRUB2. However, this solution works only on some
  machines. There are machines in the wild (e.g. Dell PowerEdge R820)
  which uses first ~640 KiB for boot services code or data... :-(((
  Hence, we need new memory allocator for Xen EFI boot code which is
  quite simple and generic and could be used by place_string() and
  efi_arch_allocate_mmap_buffer(). I think about following solutions:
  
  1) We could use native EFI allocation functions (e.g. AllocatePool()
 or AllocatePages()) to get memory chunk. However, later (somewhere
 in __start_xen()) we must copy its contents to safe place or reserve
 it in e820 memory map and map it in Xen virtual address space. This
 means that the code referring to Xen command line, loaded modules and
 EFI memory map, mostly in __start_xen(), will be further complicated
 and diverge from legacy BIOS cases. Additionally, both former things
 have to be placed below 4 GiB because their addresses are stored in
 multiboot_info_t structure which has 32-bit relevant members.
  
  2) We may allocate memory area statically somewhere in Xen code which
 could be used as memory pool for early dynamic allocations. Looks
 quite simple. Additionally, it would not depend on EFI at all and
 could be used on legacy BIOS platforms if we need it. However, we
 must carefully choose size of this pool. We do not want increase Xen
 binary size too much and waste too much memory but also we must fit
 at least memory map on x86 EFI platforms. As I saw on small machine,
 e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
 than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
 So, it means that we need more than 8 KiB for EFI memory map only.
 Additionally, if we use this memory pool for Xen and modules command
 line storage (it would be used when xen.efi is executed as EFI 
application)
 then we should add, I think, about 1 KiB. In this case, to be on safe
 side, we should assume at least 64 KiB pool for early memory 
allocations.
 Which is about 4 times of our earlier calculations. However, during
 discussion on Xen-devel Jan Beulich suggested that just in case we 
should
 use 1 MiB memory pool like it is in original place_string() 
implementation.
 So, let's use 1 MiB as it was proposed. If we think that we should not
 waste unallocated memory in the pool on running system then we can mark
 this region as __initdata and move all required data to dynamically
 allocated places somewhere in __start_xen().
  
  2a) We could put memory pool into .bss.page_aligned section. Then allocate
  memory chunks starting from the lowest address. After init phase we 
can
  free unused portion of the memory pool as in case of .init.text or 
.init.data
  sections. This way we do not need to allocate any space in image file 
and
  freeing of unused area in the memory pool is very simple.
  
  Now #2a solution is implemented because it is quite simple and requires
  limited number of changes, especially in __start_xen().
  
  New allocator is quite generic and can be used on ARM platforms too.
  Though it is not enabled on ARM yet due to lack of some prereq.
  List of them is placed before ebmalloc code.
  
  Signed-off-by: Daniel Kiper 
  Acked-by: Jan Beulich 
  Acked-by: Julien Grall 
  Reviewed-by: Doug Goldstein 
  Tested-by: Doug Goldstein 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproj

Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Vitaly Kuznetsov
Michal Hocko  writes:

> On Mon 27-02-17 11:49:43, Vitaly Kuznetsov wrote:
>> Michal Hocko  writes:
>> 
>> > On Mon 27-02-17 11:02:09, Vitaly Kuznetsov wrote:
>> > [...]
>> >> I don't have anything new to add to the discussion happened last week
>> >> but I'd like to summarize my arguments against this change:
>> >> 
>> >> 1) This patch doesn't solve any issue. Configuration option is not an
>> >> issue by itself, it is an option for distros to decide what they want to
>> >> ship: udev rule with known issues (legacy mode) or enable the new
>> >> option. Distro makers and users building their kernels should be able to
>> >> answer this simple question "do you want to automatically online all
>> >> newly added memory or not".
>> >
>> > OK, so could you be more specific? Distributions have no clue about
>> > which HW their kernel runs on so how can they possibly make a sensible
>> > decision here?
>> 
>> They at least have an idea if they ship udev rule or not. I can also
>> imagine different choices for non-x86 architectures but I don't know
>> enough about them to have an opinion.
>
> I really do not follow. If they know whether they ship the udev rule
> then why do they need a kernel help at all? Anyway this global policy
> actually breaks some usecases. Say you would have a default set to
> online. What should user do if _some_ nodes should be online_movable?
> Or, say that HyperV or other hotplug based ballooning implementation
> really want to online the movable memory in order to have a realiable
> hotremove. Now you have a global policy which goes against it.
>

While I think that hotremove is a special case which really requires
manual intervention (at least to decide which memory goes NORMAL and
which MOVABLE), MEMORY_HOTPLUG_DEFAULT_ONLINE is probably not for it.

[snip]

>
>> The difference with real hardware is how the operation is performed:
>> with real hardware you need to take a DIMM, go to your server room, open
>> the box, insert DIMM, go back to your seat. Asking to do some manual
>> action to actually enable memory is kinda OK. The beauty of hypervisors
>> is that everything happens automatically (e.g. when the VM is running
>> out of memory).
>
> I do not see your point. Either you have some (semi)automatic way to
> balance memory in guest based on the memory pressure (let's call it
> ballooning) or this is an administration operation (say you buy more
> DIMs or pay more to your virtualization provider) and then it is up to
> the guest owner to tell what to do about that memory. In other words you
> really do not want to wait in the first case as you are under memory
> pressure which is _actively_ managed or this is much more relaxed
> environment.

I don't see a contradiction between what I say and what you say here :-)
Yes, there are case when we're not in a hurry and there are cases when
we can't wait.

>
>> >> 3) Kernel command line is not a viable choice, it is rather a debug
>> >> method.
>> >
>> > Why?
>> >
>> 
>> Because we usually have just a few things there (root=, console=) and
>> the rest is used when something goes wrong or for 'special' cases, not
>> for the majority of users.
>
> auto online or even memory hotplug seems something that requires
> a special HW/configuration already so I fail to see your point. It is
> normal to put kernel parameters to override the default. And AFAIU
> default offline is a sensible default for the standard memory hotplug.
>

It depends how we define 'standard'. The point I'm trying to make is
that it's really common for VMs to use this technique while in hardware
(x86) world it is a rare occasion. The 'sensible default' may differ.

> [...]
>
>> >> 2) Adding new memory can (in some extreme cases) still fail as we need
>> >> some *other* memory before we're able to online the newly added
>> >> block. This is an issue to be solved and it is doable (IMO) with some
>> >> pre-allocation.
>> >
>> > you cannot preallocate for all the possible memory that can be added.
>> 
>> For all, no, but for 1 next block - yes, and then I'll preallocate for
>> the next one.
>
> You are still thinking in the scope of your particular use case and I
> believe the whole thing is shaped around that very same thing and that
> is why it should have been rejected in the first place. Especially when
> that use case can be handled without user visible configuration knob.

I think my use case is broad enough. At least it applies to all
virtualization technoligies and not only to Hyper-V. But yes, I agree
that adding a parameter to add_memory() solves my particular use case as
well.

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/arm: Hiding SMMUs from Dom0 when using ACPI on Xen

2017-02-27 Thread Vijay Kilari
Hi Julien,

On Wed, Feb 22, 2017 at 7:40 PM, Julien Grall  wrote:
> Hello,
>
> There was few discussions recently about hiding SMMUs from DOM0 when using
> ACPI. I thought it would be good to have a separate thread for this.
>
> When using ACPI, the SMMUs will be described in the IO Remapping Table
> (IORT). The specification can be found on the ARM website [1].
>
> For a brief summary, the IORT can be used to discover the SMMUs present on
> the platform and find for a given device the ID to configure components such
> as ITS (DeviceID) and SMMU (StreamID).
>
> The appendix A in the specification gives an example how DeviceID and
> StreamID can be found. For instance, when a PCI device is both protected by
> an SMMU and MSI-capable the following translation will happen:
> RID -> StreamID -> DeviceID
>
> Currently, SMMUs are hidden from DOM0 because they are been used by Xen and
> we don't support stage-1 SMMU. If we pass the IORT as it is, DOM0 will try
> to initialize SMMU and crash.
>
> I first thought about using a Xen specific way (STAO) or extending a flag in
> IORT. But that is not ideal.
>
> So we would have to rewrite the IORT for DOM0. Given that a range of RID can
> mapped to multiple ranges of DeviceID, we would have to translate RID one by
> one to find the associated DeviceID. I think this may end up to complex code
> and have a big IORT table.

Why can't we replace Output base of IORT of PCI node with SMMU output base?.
I mean similar to PCI node without SMMU, why can't replace output base
of PCI node with
SMMU's output base?.

The issue I see is RID is [15:0] where is DeviceID is [17:0].

>
> However, given that DeviceID will be used by DOM0 to only configure the ITS.
> We have no need to use to have the DOM0 DeviceID equal to the host DeviceID.
> So I think we could simplify our life by generating DeviceID for each RID
> range.

If DOM0 DeviceID != host Device ID, then we cannot initialize ITS using DOM0
ITS commands (MAPD). So, is it concluded that ITS initializes all the devices
with platform specific Device ID's in Xen?.

>
> Any opinions?
>
> Cheers,
>
> [1]
> http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
>
> --
> Julien Grall
>
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/PVHv2: fix dereference of native RSDP table mapping

2017-02-27 Thread Andrew Cooper
On 27/02/17 12:14, Roger Pau Monne wrote:
> Check that the RSDP is mapped before trying to access it.
>
> Spotted-by: Coverity
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/7] x86/hvm: Adjust hvm_nx_enabled() to match how Xen behaves

2017-02-27 Thread Andrew Cooper
On Intel hardware, EFER is not fully switched between host and guest contexts.
In practice, this means that Xen's EFER.NX setting leaks into guest context,
and influences the behaviour of the hardware pagewalker.

When servicing a pagefault, Xen's model of guests behaviour should match
hardware's behaviour, to allow correct interpretation of the pagefault error
code, and to avoid creating observable difference in behaviour from the guests
point of view.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 

Fixing this isn't trivial.  On more modern hardware, we can use EFER loading.
On older hardware, we can use general MSR loading if available.  On
older-hardware-yet, we could reload EFER right before/after vmentry/vmexit.
However, doing so would require reloading EFER before any data accesses (as
the NX bit will cause #PF[RSVD]), and that is rather hard given the need to
preserve the GPRs.
---
 xen/include/asm-x86/hvm/hvm.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 87b203a..9907a7a 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -292,8 +292,10 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t 
dest, uint8_t dest_mode);
 (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMEP))
 #define hvm_smap_enabled(v) \
 (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMAP))
+/* HVM guests on Intel hardware leak Xen's NX settings into guest context. */
 #define hvm_nx_enabled(v) \
-(!!((v)->arch.hvm_vcpu.guest_efer & EFER_NX))
+((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && cpu_has_nx) || \
+ !!((v)->arch.hvm_vcpu.guest_efer & EFER_NX))
 #define hvm_pku_enabled(v) \
 (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE))
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/7] Fixes to pagetable handling

2017-02-27 Thread Andrew Cooper
This series has been a long time in preparation (i.e. most of the 4.8 and 4.9
dev cycles).  It started when I tried to make an XTF PoC for XSA-176, and
stumbled upon the the _PAGE_PAGED aliasing issue (see patch 7) which caused by
PoC to be descheduled waiting for (a non-existent) paging agent to respond.

This series is built upon:
  1) The switch to using _Bool. I spent rather too long trying to debug why
 CR0.WP wasn't behaving properly, and it was down to static inline bool_t
 guest_wp_enabled().
  2) The series to switch emulation to using system-segment relative memory
 accesses (directly relevant to patches 1 and 2), which in turn resulted
 in the discovery of XSA-191.
  3) The CPUID improvement work to get maxphysaddr into a sensibly audited
 state, and sensibly accessible location.

Outstanding hardware issues discovered include:
  1) There is an observable delay in AMD Fam 10h processors between loading a
 segment selector, and the results of the LDT/GDT memory access being
 visible in the pagetables (via the Access bits being set).
  2) SMAP provides insufficient information to the pagefault handler (via the
 error code) to fully determine the nature of the access causing the
 pagefault in the first place.  See patch 2 for details.

Andrew Cooper (7):
  x86/hvm: Correctly identify implicit supervisor accesses
  x86/shadow: Try to correctly identify implicit supervisor accesses
  x86/pagewalk: Helpers for reserved bit handling
  x86/hvm: Adjust hvm_nx_enabled() to match how Xen behaves
  x86/shadow: Use the pagewalk reserved bits helpers
  x86/pagewalk: Consistently use guest_walk_*() helpers for translation
  x86/pagewalk: Re-implement the pagetable walker

 xen/arch/x86/hvm/emulate.c|  18 +-
 xen/arch/x86/mm/guest_walk.c  | 444 --
 xen/arch/x86/mm/hap/guest_walk.c  |  23 +-
 xen/arch/x86/mm/hap/nested_ept.c  |   2 +-
 xen/arch/x86/mm/p2m.c |  17 +-
 xen/arch/x86/mm/shadow/multi.c| 108 +++---
 xen/include/asm-x86/guest_pt.h| 180 +---
 xen/include/asm-x86/hvm/hvm.h |   4 +-
 xen/include/asm-x86/p2m.h |   2 +-
 xen/include/asm-x86/page.h|   3 -
 xen/include/asm-x86/processor.h   |   3 +
 xen/include/asm-x86/x86_64/page.h |   6 -
 12 files changed, 489 insertions(+), 321 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/7] x86/shadow: Try to correctly identify implicit supervisor accesses

2017-02-27 Thread Andrew Cooper
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/shadow/multi.c | 60 --
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 128809d..7c6b017 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2857,7 +2857,7 @@ static int sh_page_fault(struct vcpu *v,
 const struct x86_emulate_ops *emul_ops;
 int r;
 p2m_type_t p2mt;
-uint32_t rc;
+uint32_t rc, error_code;
 int version;
 const struct npfec access = {
  .read_access = 1,
@@ -3011,13 +3011,69 @@ static int sh_page_fault(struct vcpu *v,
 
  rewalk:
 
+error_code = regs->error_code;
+
+/*
+ * When CR4.SMAP is enabled, instructions which have a side effect of
+ * accessing the system data structures (e.g. mov to %ds accessing the
+ * LDT/GDT, or int $n accessing the IDT) are known as implicit supervisor
+ * accesses.
+ *
+ * The distinction between implicit and explicit accesses form part of the
+ * determination of access rights, controlling whether the access is
+ * successful, or raises a #PF.
+ *
+ * Unfortunately, the processor throws away the implicit/explicit
+ * distinction and does not provide it to the pagefault handler
+ * (i.e. here.) in the #PF error code.  Therefore, we must try to
+ * reconstruct the lost state so it can be fed back into our pagewalk
+ * through the guest tables.
+ *
+ * User mode accesses are easy to reconstruct:
+ *
+ *   If we observe a cpl3 data fetch which was a supervisor walk, this
+ *   must have been an implicit access to a system table.
+ *
+ * Supervisor mode accesses are not easy:
+ *
+ *   In principle, we could decode the instruction under %rip and have the
+ *   instruction emulator tell us if there is an implicit access.
+ *   However, this is racy with other vcpus updating the pagetable or
+ *   rewriting the instruction stream under our feet.
+ *
+ *   Therefore, we do nothing.  (If anyone has a sensible suggestion for
+ *   how to distinguish these cases, xen-devel@ is all ears...)
+ *
+ * As a result, one specific corner case will fail.  If a guest OS with
+ * SMAP enabled ends up mapping a system table with user mappings, sets
+ * EFLAGS.AC to allow explicit accesses to user mappings, and implicitly
+ * accesses the user mapping, hardware and the shadow code will disagree
+ * on whether a #PF should be raised.
+ *
+ * Hardware raises #PF because implicit supervisor accesses to user
+ * mappings are strictly disallowed.  As we can't reconstruct the correct
+ * input, the pagewalk is performed as if it were an explicit access,
+ * which concludes that the access should have succeeded and the shadow
+ * pagetables need modifying.  The shadow pagetables are modified (to the
+ * same value), and we re-enter the guest to re-execute the instruction,
+ * which causes another #PF, and the vcpu livelocks, unable to make
+ * forward progress.
+ *
+ * In practice, this is tolerable because no OS would deliberately
+ * construct such a corner case, as doing so would mean it is trivially
+ * root-able by its own userspace.
+ */
+if ( !(error_code & (PFEC_insn_fetch|PFEC_user_mode)) &&
+ (is_pv_vcpu(v) ? (regs->ss & 3) : hvm_get_cpl(v)) == 3 )
+error_code |= PFEC_implicit;
+
 /* The walk is done in a lock-free style, with some sanity check
  * postponed after grabbing paging lock later. Those delayed checks
  * will make sure no inconsistent mapping being translated into
  * shadow page table. */
 version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
 rmb();
-rc = sh_walk_guest_tables(v, va, &gw, regs->error_code);
+rc = sh_walk_guest_tables(v, va, &gw, error_code);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
 regs->error_code &= ~PFEC_page_present;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 6/7] x86/pagewalk: Consistently use guest_walk_*() helpers for translation

2017-02-27 Thread Andrew Cooper
hap_p2m_ga_to_gfn() and sh_page_fault() currently use guest_l1e_get_gfn() to
obtain the translation of a pagewalk.  This is conceptually wrong (the
semantics of gw.l1e is an internal detail), and will actually be wrong when
PSE36 superpage support is fixed.  Switch them to using guest_walk_to_gfn().

Take the opportunity to const-correct the walk_t parameter of the
guest_walk_to_*() helpers, and implement guest_walk_to_gpa() in terms of
guest_walk_to_gfn() to avoid duplicating the actual translation calculation.

While editing guest_walk_to_gpa(), fix a latent bug by causing it to return
INVALID_PADDR rather than 0 for a failed translation, as 0 is also a valid
successful result.  The sole caller, sh_page_fault(), has already confirmed
that the translation is valid, so this doesn't cause a behavioural change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/hap/guest_walk.c |  2 +-
 xen/arch/x86/mm/shadow/multi.c   |  2 +-
 xen/include/asm-x86/guest_pt.h   | 19 +--
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 569a495..313f82f 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -98,7 +98,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 /* Interpret the answer */
 if ( missing == 0 )
 {
-gfn_t gfn = guest_l1e_get_gfn(gw.l1e);
+gfn_t gfn = guest_walk_to_gfn(&gw);
 struct page_info *page;
 page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), &p2mt,
  NULL, P2M_ALLOC | P2M_UNSHARE);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 702835b..f73b553 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3109,7 +3109,7 @@ static int sh_page_fault(struct vcpu *v,
 }
 
 /* What mfn is the guest trying to access? */
-gfn = guest_l1e_get_gfn(gw.l1e);
+gfn = guest_walk_to_gfn(&gw);
 gmfn = get_gfn(d, gfn, &p2mt);
 
 if ( shadow_mode_refcounts(d) &&
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 1c3d384..1aa383f 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -323,8 +323,7 @@ struct guest_pagetable_walk
 
 /* Given a walk_t, translate the gw->va into the guest's notion of the
  * corresponding frame number. */
-static inline gfn_t
-guest_walk_to_gfn(walk_t *gw)
+static inline gfn_t guest_walk_to_gfn(const walk_t *gw)
 {
 if ( !(guest_l1e_get_flags(gw->l1e) & _PAGE_PRESENT) )
 return INVALID_GFN;
@@ -333,19 +332,19 @@ guest_walk_to_gfn(walk_t *gw)
 
 /* Given a walk_t, translate the gw->va into the guest's notion of the
  * corresponding physical address. */
-static inline paddr_t
-guest_walk_to_gpa(walk_t *gw)
+static inline paddr_t guest_walk_to_gpa(const walk_t *gw)
 {
-if ( !(guest_l1e_get_flags(gw->l1e) & _PAGE_PRESENT) )
-return 0;
-return ((paddr_t)gfn_x(guest_l1e_get_gfn(gw->l1e)) << PAGE_SHIFT) +
-   (gw->va & ~PAGE_MASK);
+gfn_t gfn = guest_walk_to_gfn(gw);
+
+if ( gfn_eq(gfn, INVALID_GFN) )
+return INVALID_PADDR;
+
+return (gfn_x(gfn) << PAGE_SHIFT) | (gw->va & ~PAGE_MASK);
 }
 
 /* Given a walk_t from a successful walk, return the page-order of the 
  * page or superpage that the virtual address is in. */
-static inline unsigned int 
-guest_walk_to_page_order(walk_t *gw)
+static inline unsigned int guest_walk_to_page_order(const walk_t *gw)
 {
 /* This is only valid for successful walks - otherwise the 
  * PSE bits might be invalid. */
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3/7] x86/pagewalk: Helpers for reserved bit handling

2017-02-27 Thread Andrew Cooper
Some bits are unconditionally reserved in pagetable entries, or reserved
because of alignment restrictions.  Other bits are reserved because of control
register configuration.

Introduce helpers which take an individual vcpu and guest pagetable entry, and
calculates whether any reserved bits are set.

While here, add a couple of newlines to aid readability, drop some trailing
whitespace and bool/const correct the existing helpers to allow the new
helpers to take const vcpu pointers.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: George Dunlap 
CC: Tim Deegan 
---
 xen/include/asm-x86/guest_pt.h | 98 ++
 1 file changed, 90 insertions(+), 8 deletions(-)

diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 0bf6cf9..1c3d384 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -44,6 +44,18 @@ gfn_to_paddr(gfn_t gfn)
 #undef get_gfn
 #define get_gfn(d, g, t) get_gfn_type((d), gfn_x(g), (t), P2M_ALLOC)
 
+/* Mask covering the reserved bits from superpage alignment. */
+#define SUPERPAGE_RSVD(bit) \
+(((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
+
+static inline uint32_t fold_pse36(uint64_t val)
+{
+return (val & ~(0x1ffULL << 13)) | ((val & (0x1ffULL << 32)) >> (32 - 13));
+}
+static inline uint64_t unfold_pse36(uint32_t val)
+{
+return (val & ~(0x1ffULL << 13)) | ((val & (0x1ffULL << 13)) << (32 - 13));
+}
 
 /* Types of the guest's page tables and access functions for them */
 
@@ -51,9 +63,13 @@ gfn_to_paddr(gfn_t gfn)
 
 #define GUEST_L1_PAGETABLE_ENTRIES 1024
 #define GUEST_L2_PAGETABLE_ENTRIES 1024
+
 #define GUEST_L1_PAGETABLE_SHIFT 12
 #define GUEST_L2_PAGETABLE_SHIFT 22
 
+#define GUEST_L1_PAGETABLE_RSVD   0
+#define GUEST_L2_PAGETABLE_RSVD   0
+
 typedef uint32_t guest_intpte_t;
 typedef struct { guest_intpte_t l1; } guest_l1e_t;
 typedef struct { guest_intpte_t l2; } guest_l2e_t;
@@ -88,21 +104,39 @@ static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, 
u32 flags)
 #else /* GUEST_PAGING_LEVELS != 2 */
 
 #if GUEST_PAGING_LEVELS == 3
+
 #define GUEST_L1_PAGETABLE_ENTRIES  512
 #define GUEST_L2_PAGETABLE_ENTRIES  512
 #define GUEST_L3_PAGETABLE_ENTRIES4
+
 #define GUEST_L1_PAGETABLE_SHIFT 12
 #define GUEST_L2_PAGETABLE_SHIFT 21
 #define GUEST_L3_PAGETABLE_SHIFT 30
+
+#define GUEST_L1_PAGETABLE_RSVD0x7ff0UL
+#define GUEST_L2_PAGETABLE_RSVD0x7ff0UL
+#define GUEST_L3_PAGETABLE_RSVD  \
+(0xfff0UL | _PAGE_GLOBAL | _PAGE_PSE | _PAGE_DIRTY | \
+ _PAGE_ACCESSED | _PAGE_USER | _PAGE_RW)
+
 #else /* GUEST_PAGING_LEVELS == 4 */
+
 #define GUEST_L1_PAGETABLE_ENTRIES  512
 #define GUEST_L2_PAGETABLE_ENTRIES  512
 #define GUEST_L3_PAGETABLE_ENTRIES  512
 #define GUEST_L4_PAGETABLE_ENTRIES  512
+
 #define GUEST_L1_PAGETABLE_SHIFT 12
 #define GUEST_L2_PAGETABLE_SHIFT 21
 #define GUEST_L3_PAGETABLE_SHIFT 30
 #define GUEST_L4_PAGETABLE_SHIFT 39
+
+#define GUEST_L1_PAGETABLE_RSVD0
+#define GUEST_L2_PAGETABLE_RSVD0
+#define GUEST_L3_PAGETABLE_RSVD0
+/* NB L4e._PAGE_GLOBAL is reserved for AMD, but ignored for Intel. */
+#define GUEST_L4_PAGETABLE_RSVD_PAGE_PSE
+
 #endif
 
 typedef l1_pgentry_t guest_l1e_t;
@@ -170,27 +204,30 @@ static inline guest_l4e_t guest_l4e_from_gfn(gfn_t gfn, 
u32 flags)
 
 /* Which pagetable features are supported on this vcpu? */
 
-static inline int
-guest_supports_superpages(struct vcpu *v)
+static inline bool guest_supports_superpages(const struct vcpu *v)
 {
 /* The _PAGE_PSE bit must be honoured in HVM guests, whenever
- * CR4.PSE is set or the guest is in PAE or long mode. 
+ * CR4.PSE is set or the guest is in PAE or long mode.
  * It's also used in the dummy PT for vcpus with CR4.PG cleared. */
 return (is_pv_vcpu(v)
 ? opt_allow_superpage
-: (GUEST_PAGING_LEVELS != 2 
+: (GUEST_PAGING_LEVELS != 2
|| !hvm_paging_enabled(v)
|| (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)));
 }
 
-static inline int
-guest_supports_1G_superpages(struct vcpu *v)
+static inline bool guest_has_pse36(const struct vcpu *v)
+{
+ /* No support for 2-level PV guests. */
+return is_pv_vcpu(v) ? 0 : paging_mode_hap(v->domain);
+}
+
+static inline bool guest_supports_1G_superpages(const struct vcpu *v)
 {
 return (GUEST_PAGING_LEVELS >= 4 && hvm_pse1gb_supported(v->domain));
 }
 
-static inline int
-guest_supports_nx(struct vcpu *v)
+static inline bool guest_supports_nx(const struct vcpu *v)
 {
 if ( GUEST_PAGING_LEVELS == 2 || !cpu_has_nx )
 return 0;
@@ -213,6 +250,51 @@ guest_supports_nx(struct vcpu *v)
 #define _PAGE_INVALID_BITS _PAGE_INVALID_BIT

[Xen-devel] [PATCH 1/7] x86/hvm: Correctly identify implicit supervisor accesses

2017-02-27 Thread Andrew Cooper
All actions which refer to the active ldt/gdt/idt or task register
(e.g. loading a new segment selector) are known as implicit supervisor
accesses, even when the access originates from user code.

The distinction is necessary in the pagewalk when SMAP is enabled.  Refer to
Intel SDM Vol 3 "Access Rights" for the exact details.

Introduce a new pagewalk input, and make use of the new system segment
references in hvmemul_{read,write}().  While modifying those areas, move the
calculation of the appropriate pagewalk input before its first use.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Paul Durrant 
CC: George Dunlap 
CC: Tim Deegan 
---
 xen/arch/x86/hvm/emulate.c  | 18 ++
 xen/arch/x86/mm/guest_walk.c|  4 
 xen/include/asm-x86/processor.h |  1 +
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f24d289..9b32bca 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -783,6 +783,11 @@ static int __hvmemul_read(
 struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
 int rc;
 
+if ( is_x86_system_segment(seg) )
+pfec |= PFEC_implicit;
+else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
+pfec |= PFEC_user_mode;
+
 rc = hvmemul_virtual_to_linear(
 seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
 if ( rc != X86EMUL_OKAY || !bytes )
@@ -793,10 +798,6 @@ static int __hvmemul_read(
  (vio->mmio_gla == (addr & PAGE_MASK)) )
 return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec, 
hvmemul_ctxt, 1);
 
-if ( (seg != x86_seg_none) &&
- (hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
-pfec |= PFEC_user_mode;
-
 rc = ((access_type == hvm_access_insn_fetch) ?
   hvm_fetch_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo) :
   hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo));
@@ -893,6 +894,11 @@ static int hvmemul_write(
 struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
 int rc;
 
+if ( is_x86_system_segment(seg) )
+pfec |= PFEC_implicit;
+else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
+pfec |= PFEC_user_mode;
+
 rc = hvmemul_virtual_to_linear(
 seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
 if ( rc != X86EMUL_OKAY || !bytes )
@@ -902,10 +908,6 @@ static int hvmemul_write(
  (vio->mmio_gla == (addr & PAGE_MASK)) )
 return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, 
hvmemul_ctxt, 1);
 
-if ( (seg != x86_seg_none) &&
- (hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
-pfec |= PFEC_user_mode;
-
 rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
 
 switch ( rc )
diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index faaf70c..4f8d0e3 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -161,6 +161,10 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 bool_t pse1G = 0, pse2M = 0;
 p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
+/* Only implicit supervisor data accesses exist. */
+ASSERT(!(pfec & PFEC_implicit) ||
+   !(pfec & (PFEC_insn_fetch|PFEC_user_mode)));
+
 perfc_incr(guest_walk);
 memset(gw, 0, sizeof(*gw));
 gw->va = va;
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index dda8b83..d3ba8ea 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -76,6 +76,7 @@
 /* Internally used only flags. */
 #define PFEC_page_paged (1U<<16)
 #define PFEC_page_shared(1U<<17)
+#define PFEC_implicit   (1U<<18) /* Pagewalk input for ldt/gdt/idt/tr 
accesses. */
 
 /* Other exception error code values. */
 #define X86_XEC_EXT (_AC(1,U) << 0)
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 7/7] x86/pagewalk: Re-implement the pagetable walker

2017-02-27 Thread Andrew Cooper
The existing pagetable walker has complicated return semantics, which squeeze
multiple pieces of information into single integer.  This would be fine if the
information didn't overlap, but it does.

Specifically, _PAGE_INVALID_BITS for 3-level guests alias _PAGE_PAGED and
_PAGE_SHARED.  A guest which constructs a PTE with bits 52 or 53 set (the
start of the upper software-available range) will create a virtual address
which, when walked by Xen, tricks Xen into believing the frame is paged or
shared.  This behaviour was introduced by XSA-173 (c/s 8b17648).

It is also complicated to turn rc back into a normal pagefault error code.
Instead, change the calling semantics to return a boolean indicating success,
and have the function accumulate a real pagefault error code as it goes
(including synthetic error codes, which do not alias hardware ones).  This
requires an equivalent adjustment to map_domain_gfn().

Issues fixed:
 * 2-level PSE36 superpages now return the correct translation.
 * 2-level L2 superpages without CR0.PSE now return the correct translation.
 * SMEP now inhibits a user instruction fetch even if NX isn't active.
 * Supervisor writes without CR0.WP now set the leaf dirty bit.
 * L4e._PAGE_GLOBAL is strictly reserved on AMD.
 * 3-level l3 entries have all reserved bits checked.
 * 3-level entries can no longer alias Xen's idea of paged or shared.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/guest_walk.c  | 448 +++---
 xen/arch/x86/mm/hap/guest_walk.c  |  21 +-
 xen/arch/x86/mm/hap/nested_ept.c  |   2 +-
 xen/arch/x86/mm/p2m.c |  17 +-
 xen/arch/x86/mm/shadow/multi.c|  27 +--
 xen/include/asm-x86/guest_pt.h|  63 --
 xen/include/asm-x86/p2m.h |   2 +-
 xen/include/asm-x86/page.h|   3 -
 xen/include/asm-x86/processor.h   |   2 +
 xen/include/asm-x86/x86_64/page.h |   6 -
 10 files changed, 303 insertions(+), 288 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 4f8d0e3..01bcb1d 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -32,44 +32,6 @@ asm(".file \"" __OBJECT_FILE__ "\"");
 #include 
 #include 
 
-extern const uint32_t gw_page_flags[];
-#if GUEST_PAGING_LEVELS == CONFIG_PAGING_LEVELS
-const uint32_t gw_page_flags[] = {
-/* I/F -  Usr Wr */
-/* 0   0   0   0 */ _PAGE_PRESENT,
-/* 0   0   0   1 */ _PAGE_PRESENT|_PAGE_RW,
-/* 0   0   1   0 */ _PAGE_PRESENT|_PAGE_USER,
-/* 0   0   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER,
-/* 0   1   0   0 */ _PAGE_PRESENT,
-/* 0   1   0   1 */ _PAGE_PRESENT|_PAGE_RW,
-/* 0   1   1   0 */ _PAGE_PRESENT|_PAGE_USER,
-/* 0   1   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER,
-/* 1   0   0   0 */ _PAGE_PRESENT|_PAGE_NX_BIT,
-/* 1   0   0   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_NX_BIT,
-/* 1   0   1   0 */ _PAGE_PRESENT|_PAGE_USER|_PAGE_NX_BIT,
-/* 1   0   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_NX_BIT,
-/* 1   1   0   0 */ _PAGE_PRESENT|_PAGE_NX_BIT,
-/* 1   1   0   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_NX_BIT,
-/* 1   1   1   0 */ _PAGE_PRESENT|_PAGE_USER|_PAGE_NX_BIT,
-/* 1   1   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_NX_BIT,
-};
-#endif
-
-/* Flags that are needed in a pagetable entry, with the sense of NX inverted */
-static uint32_t mandatory_flags(struct vcpu *v, uint32_t pfec) 
-{
-/* Don't demand not-NX if the CPU wouldn't enforce it. */
-if ( !guest_supports_nx(v) )
-pfec &= ~PFEC_insn_fetch;
-
-/* Don't demand R/W if the CPU wouldn't enforce it. */
-if ( is_hvm_vcpu(v) && unlikely(!hvm_wp_enabled(v)) 
- && !(pfec & PFEC_user_mode) )
-pfec &= ~PFEC_write_access;
-
-return gw_page_flags[(pfec & 0x1f) >> 1] | _PAGE_INVALID_BITS;
-}
-
 /* Modify a guest pagetable entry to set the Accessed and Dirty bits.
  * Returns non-zero if it actually writes to guest memory. */
 static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
@@ -90,62 +52,33 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, 
int set_dirty)
 return 0;
 }
 
-#if GUEST_PAGING_LEVELS >= 4
-static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
-uint32_t pte_flags, uint32_t pte_pkey)
-{
-uint32_t pkru;
-
-/* When page isn't present,  PKEY isn't checked. */
-if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
-return 0;
-
-/*
- * PKU:  additional mechanism by which the paging controls
- * access to user-mode addresses based on the value in the
- * PKRU register. A fault is considered as a PKU violation if all
- * of the following conditions are true:
- * 1.CR4_PKE=1.
- * 2.EFER_LMA=1.
- * 3.Page is present with no reserved bit violations.
- * 4.The access is not an instruction fetch.
- * 5.The access is to a user page.
- * 6.PKRU.AD=1 or
- *  the access is a

[Xen-devel] [PATCH 5/7] x86/shadow: Use the pagewalk reserved bits helpers

2017-02-27 Thread Andrew Cooper
The shadow logic should never create a shadow of a guest PTE which contains
reserved bits from the guests point of view.  Such a shadowed entry might not
cause #PF[RSVD] when walked by hardware, thus won't behave architecturally
from the guests point of view.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/shadow/multi.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 7c6b017..702835b 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2157,7 +2157,8 @@ static int validate_gl4e(struct vcpu *v, void *new_ge, 
mfn_t sl4mfn, void *se)
 
 perfc_incr(shadow_validate_gl4e_calls);
 
-if ( guest_l4e_get_flags(new_gl4e) & _PAGE_PRESENT )
+if ( (guest_l4e_get_flags(new_gl4e) & _PAGE_PRESENT) &&
+ !guest_l4e_rsvd_bits(v, new_gl4e) )
 {
 gfn_t gl3gfn = guest_l4e_get_gfn(new_gl4e);
 mfn_t gl3mfn = get_gfn_query_unlocked(d, gfn_x(gl3gfn), &p2mt);
@@ -2215,7 +2216,8 @@ static int validate_gl3e(struct vcpu *v, void *new_ge, 
mfn_t sl3mfn, void *se)
 
 perfc_incr(shadow_validate_gl3e_calls);
 
-if ( guest_l3e_get_flags(new_gl3e) & _PAGE_PRESENT )
+if ( (guest_l3e_get_flags(new_gl3e) & _PAGE_PRESENT) &&
+ !guest_l3e_rsvd_bits(v, new_gl3e) )
 {
 gfn_t gl2gfn = guest_l3e_get_gfn(new_gl3e);
 mfn_t gl2mfn = get_gfn_query_unlocked(d, gfn_x(gl2gfn), &p2mt);
@@ -2248,7 +2250,8 @@ static int validate_gl2e(struct vcpu *v, void *new_ge, 
mfn_t sl2mfn, void *se)
 
 perfc_incr(shadow_validate_gl2e_calls);
 
-if ( guest_l2e_get_flags(new_gl2e) & _PAGE_PRESENT )
+if ( (guest_l2e_get_flags(new_gl2e) & _PAGE_PRESENT) &&
+ !guest_l2e_rsvd_bits(v, new_gl2e) )
 {
 gfn_t gl1gfn = guest_l2e_get_gfn(new_gl2e);
 if ( guest_supports_superpages(v) &&
@@ -2288,8 +2291,7 @@ static int validate_gl1e(struct vcpu *v, void *new_ge, 
mfn_t sl1mfn, void *se)
 shadow_l1e_t new_sl1e;
 guest_l1e_t new_gl1e = *(guest_l1e_t *)new_ge;
 shadow_l1e_t *sl1p = se;
-gfn_t gfn;
-mfn_t gmfn;
+mfn_t gmfn = INVALID_MFN;
 p2m_type_t p2mt;
 int result = 0;
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -2298,8 +2300,13 @@ static int validate_gl1e(struct vcpu *v, void *new_ge, 
mfn_t sl1mfn, void *se)
 
 perfc_incr(shadow_validate_gl1e_calls);
 
-gfn = guest_l1e_get_gfn(new_gl1e);
-gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+if ( (guest_l1e_get_flags(new_gl1e) & _PAGE_PRESENT) &&
+ !guest_l1e_rsvd_bits(v, new_gl1e) )
+{
+gfn_t gfn = guest_l1e_get_gfn(new_gl1e);
+
+gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+}
 
 l1e_propagate_from_guest(v, new_gl1e, gmfn, &new_sl1e, ft_prefetch, p2mt);
 result |= shadow_set_l1e(d, sl1p, new_sl1e, p2mt, sl1mfn);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/arm: Hiding SMMUs from Dom0 when using ACPI on Xen

2017-02-27 Thread Julien Grall



On 27/02/17 13:23, Vijay Kilari wrote:

Hi Julien,


Hello Vijay,


On Wed, Feb 22, 2017 at 7:40 PM, Julien Grall  wrote:

Hello,

There was few discussions recently about hiding SMMUs from DOM0 when using
ACPI. I thought it would be good to have a separate thread for this.

When using ACPI, the SMMUs will be described in the IO Remapping Table
(IORT). The specification can be found on the ARM website [1].

For a brief summary, the IORT can be used to discover the SMMUs present on
the platform and find for a given device the ID to configure components such
as ITS (DeviceID) and SMMU (StreamID).

The appendix A in the specification gives an example how DeviceID and
StreamID can be found. For instance, when a PCI device is both protected by
an SMMU and MSI-capable the following translation will happen:
RID -> StreamID -> DeviceID

Currently, SMMUs are hidden from DOM0 because they are been used by Xen and
we don't support stage-1 SMMU. If we pass the IORT as it is, DOM0 will try
to initialize SMMU and crash.

I first thought about using a Xen specific way (STAO) or extending a flag in
IORT. But that is not ideal.

So we would have to rewrite the IORT for DOM0. Given that a range of RID can
mapped to multiple ranges of DeviceID, we would have to translate RID one by
one to find the associated DeviceID. I think this may end up to complex code
and have a big IORT table.


Why can't we replace Output base of IORT of PCI node with SMMU output base?.
I mean similar to PCI node without SMMU, why can't replace output base
of PCI node with
SMMU's output base?.


Because I don't see anything in the spec preventing one RC ID mapping to 
produce multiple SMMU ID mapping. So which output base would you use?




The issue I see is RID is [15:0] where is DeviceID is [17:0].



However, given that DeviceID will be used by DOM0 to only configure the ITS.
We have no need to use to have the DOM0 DeviceID equal to the host DeviceID.
So I think we could simplify our life by generating DeviceID for each RID
range.


If DOM0 DeviceID != host Device ID, then we cannot initialize ITS using DOM0
ITS commands (MAPD). So, is it concluded that ITS initializes all the devices
with platform specific Device ID's in Xen?.


Initializing ITS using DOM0 ITS command is a workaround until we get PCI 
passthrough done. It would still be possible to implement that with 
vDeviceID != pDeviceID as Xen would likely have the mapping between the 
2 DeviceID.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable-smoke test] 106199: tolerable trouble: broken/fail/pass - PUSHED

2017-02-27 Thread osstest service owner
flight 106199 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106199/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64   5 xen-buildfail   never pass
 build-arm64-pvops 5 kernel-build fail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 xen  3564f9e400c417fc3d47b732f474019c494434f9
baseline version:
 xen  d6e9f8d4f35d938417f9dc2ea50f6e8004e26725

Last test of basis   106196  2017-02-27 11:02:16 Z0 days
Testing same since   106199  2017-02-27 13:01:37 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Haozhong Zhang 
  Juergen Gross 
  Tim Deegan 
  Wei Liu 

jobs:
 build-amd64  pass
 build-arm64  fail
 build-armhf  pass
 build-amd64-libvirt  pass
 build-arm64-pvopsfail
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=3564f9e400c417fc3d47b732f474019c494434f9
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
3564f9e400c417fc3d47b732f474019c494434f9
+ branch=xen-unstable-smoke
+ revision=3564f9e400c417fc3d47b732f474019c494434f9
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' x3564f9e400c417fc3d47b732f474019c494434f9 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:

Re: [Xen-devel] [PATCH 06/10] x86/cpuid: Handle leaf 0x6 in guest_cpuid()

2017-02-27 Thread Andrew Cooper
On 22/02/17 09:26, Jan Beulich wrote:
 On 22.02.17 at 10:12,  wrote:
>> On 22/02/17 08:23, Andrew Cooper wrote:
>>> On 22/02/17 07:31, Jan Beulich wrote:
>>> On 21.02.17 at 18:40,  wrote:
> On 21/02/17 17:25, Jan Beulich wrote:
> On 20.02.17 at 12:00,  wrote:
>>> The PV MSR handling logic as minimal support for some thermal/perf 
>>> operations
>>> from the hardware domain, so leak through the implemented subset of 
>>> features.
>> Does it make sense to continue to special case PV hwdom here?
> Being able to play with these MSRs will be actively wrong for HVM
> context.  It is already fairly wrong for PV context, as nothing prevents
> you being rescheduled across pcpus while in the middle of a read/write
> cycle on the MSRs.
 So the MSRs in question are, afaics
 - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all
   of which are is_cpufreq_controller() dependent)
 - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS
   (both of which are is_pinned_vcpu() dependent)
 For the latter your argument doesn't apply. For the former, I've
 been wondering for a while whether we shouldn't do away with
 "cpufreq=dom0-kernel".
>>> Hmm.  All good points.  If I can get away without leaking any of this,
>>> that would be ideal.  (Lets see what Linux thinks of such a setup.)
>> Linux seems fine without any of this leakage.
> But is that for a broad range of versions, or just the one you had
> to hand?

3.10 and 4.4 PVOps.  Looking at the 2.6.32-classic source, I can't see
anything which would be a problem.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v1 02/21] x86: NUMA: Refactor NUMA code

2017-02-27 Thread Jan Beulich
>>> Vijay Kilari  02/27/17 12:43 PM >>>
>On Thu, Feb 9, 2017 at 9:41 PM, Jan Beulich  wrote:
> On 09.02.17 at 16:56,  wrote:
>>> +struct node_data node_data[MAX_NUMNODES];
>>> +
>>> +/* Mapping from pdx to node id */
>>> +int memnode_shift;
>>> +unsigned long memnodemapsize;
>>> +u8 *memnodemap;
>>> +typeof(*memnodemap) _memnodemap[64];
>>
>> Careful with removing any "static" please. For the last one in
>> particular you would need to change the name if it's really necessary
>> to make non-static. Even better would be though to keep it static
>> and provide suitable accessors.
>>
>> Also please sanitize types as you're moving stuff: memnode_shift
>> looks like it really wants to be unsigned int, and u8 should really
>> be uint8_t (as we're trying to phase out u8 & Co). This also applies
>> to code further down.
>
>You mean to change all occurrences of
>s/u8/uint8_t
>s/u32/uint32_t
>s/u64/uint64_t

Yes.

>Also, I see that xen/arch/x86/srat.c coding style is not adhere to xen
>coding style.
>Shall I clean up before I move the code?

If you want to take the time - sure. All I'd like to ask for is that at least 
the
file(s) you move the code _to_ end up with consistent style.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/10] x86/cpuid: Handle leaf 0xb in guest_cpuid()

2017-02-27 Thread Andrew Cooper
On 22/02/17 10:37, Jan Beulich wrote:
 On 22.02.17 at 11:22,  wrote:
>> On 22/02/17 09:16, Jan Beulich wrote:
>> On 20.02.17 at 12:00,  wrote:
 Leaf 0xb is reserved by AMD, and uniformly hidden from guests by the 
>> toolstack
 logic and hypervisor PV logic.

 The previous dynamic logic filled in the x2APIC ID for all HVM guests.  
 This
 is modified to respect the entire leaf being reserved by AMD, but is 
 altered
 to include PV Intel guests, so they get more sensible values in their 
 emulated
 and faulted view of CPUID.
>>> Don't we expose x2APIC to HVM guests even on AMD systems? In
>>> which case we surely will want to also expose the x2APIC ID.
>> The x2apic feature bit is still listed as reserved in the latest AMD
>> SDM, and I haven't seen a piece of hardware which supports it.
> This doesn't seem to answer my question, as hardware behavior is
> of no interest here (our emulation works on old Intel hardware too).
> Just to be sure I've checked a HVM guest on an AMD box, and this
> is what Linux reports:
>
> <6>Enabling x2apic
> <6>Enabled x2apic
> <6>Switched APIC routing to physical x2apic.

Hmm - that is a good point - we do offer x2apic even to AMD HVM guests.

However, this is problematic and highlights another can of worms.  We
cannot turn on APICV/AVIC for guests if our cpuid features claim a
greater featureset than hardware can support.  (Looking at c/s
3e0c8272f, TSC_DEADLINE is a feature in a similar position.)

I think the only feasible way of making this work is to ensure that the
default policy exactly matches hardware, and the max policy contains all
features we are able to emulate.  Determination of whether to use
APICV/AVIC or not must be on whether the chosen featureset is compatible
with hardware or not.

>
>> Also, x2apic or not, this leaf is documented as fully reserved, so we
>> shouldn't be filling it in anyway.
> While guests appear to do fine without, I think this is inconsistent
> with providing x2APIC support. But as current behavior is the same,
> we can as well leave this for the future.
>
 @@ -959,6 +950,14 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
  }
  break;
  
 +case 0xb:
 +if ( p->x86_vendor == X86_VENDOR_INTEL )
 +{
 +/* Fix the x2APIC identifier. */
 +res->d = v->vcpu_id * 2;
 +}
 +break;
>>> Irrespective of the comment above, wouldn't the if() here better
>>> look at the x2APIC feature flag of the domain?
>> Perhaps.  There isn't a formal link called out, but looking at the
>> content, I am guessing that real hardware is consistent with their
>> support of x2apic and the availability of this leaf?
> I suppose so, at least they've got introduced together iirc.

Ok.  I will change the determination to be based exclusively on x2apic
being advertised, and leave a comment explaining why.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 10/10] x86/cpuid: Always enable faulting for the control domain

2017-02-27 Thread Andrew Cooper
On 22/02/17 10:10, Jan Beulich wrote:
 On 22.02.17 at 11:00,  wrote:
>> On 22/02/17 09:23, Jan Beulich wrote:
>> On 20.02.17 at 12:00,  wrote:
 The domain builder in libxc no longer depends on leaked CPUID information 
 to
 properly construct HVM domains.  Remove the control domain exclusion.
>>> Am I missing some intermediate step? As long as there's a raw
>>> CPUID invocation in xc_cpuid_x86.c (which is still there in staging
>>> and I don't recall this series removing it) it at least _feels_ unsafe.
>> Strictly speaking, the domain builder part of this was completed after
>> my xsave adjustments.  All the guest-type-dependent information now
>> comes from non-cpuid sources in libxc, or Xen ignores the toolstack
>> values and recalculates information itself.
>>
>> However, until the Intel leaves were complete, dom0 had a hard time
>> booting with this change as there were no toolstack-provided policy and
>> no leakage from hardware.
> So what are the CPUID uses in libxc then needed for at this point?
> Could they be removed in a prereq patch to make clear all needed
> information is now being obtained via hypercalls?

I'd prefer to defer that work.  The next chunk of CPUID work is going to
be redesigning and reimplementing the hypervisor/libxc interface, and
all cpuid() calls in libxc will fall out there, but its not a trivial
set of changes to make.

>
>>> Also the change here then results in Dom0 observing different
>>> behavior between faulting-capable and faulting-incapable hosts.
>>> I'm not convinced this is desirable.
>> I disagree.  Avoiding the leakage is very desirable moving forwards.
>>
>> Other side effects are that it makes PV and PVH dom0 functionally
>> identical WRT CPUID, and PV userspace (which, unlikely the kernel, tends
>> not to be Xen-aware) sees sensible information.
> I can see the upsides too, hence the "I'm not convinced" ...

So is that an ack or a nack?  I am afraid that this isn't very helpful.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RTDS Patch v2 for Xen4.8] xen: rtds: only tickle non-already tickled CPUs

2017-02-27 Thread Dario Faggioli
On Fri, 2017-02-24 at 15:54 -0600, Haoran Li wrote:
> From: naroahlee 
> 
> Bug Analysis:
>
Just kill this line above.

> When more than one idle VCPUs that have the same PCPU as their
> previous running core invoke runq_tickle(), they will tickle the same
> PCPU. The tickled PCPU will only pick at most one VCPU, i.e., the
> highest-priority one, to execute. The other VCPUs will not be
> scheduled for a period, even when there is an idle core, making these
> VCPUs unnecessarily starve for one period.
> Therefore, always make sure that we only tickle PCPUs that have not
> been tickled already.
>
And I'd say to wrap around the lines at a shorter threshold. `git log',
for instance, indents the changelogs, and the idea would be for them to
look good on 80 characters terminal.

> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1144,9 +1144,10 @@ rt_vcpu_sleep(const struct scheduler *ops,
> struct vcpu *vc)
>   * Called by wake() and context_saved()
>   * We have a running candidate here, the kick logic is:
>   * Among all the cpus that are within the cpu affinity
> - * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> - * 2) if there are any idle vcpu, kick it.
> - * 3) now all pcpus are busy;
> + * 1) if there are any idle vcpu, kick it.
> + *For cache benefit, we first search new->cpu.
> + * 
> + * 2) now all pcpus are busy;
>
As Meng said, no blank line here.

>   *among all the running vcpus, pick lowest priority one
>   *if snext has higher priority, kick it.
>   *
> @@ -1174,17 +1175,11 @@ runq_tickle(const struct scheduler *ops,
> struct rt_vcpu *new)
>  cpumask_and(¬_tickled, online, new->vcpu->cpu_hard_affinity);
>  cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled);
>  
> -/* 1) if new's previous cpu is idle, kick it for cache benefit
> */
> -if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> -{
> -SCHED_STAT_CRANK(tickled_idle_cpu);
> -cpu_to_tickle = new->vcpu->processor;
> -goto out;
> -}
> -
> -/* 2) if there are any idle pcpu, kick it */
> +/* 1) if there are any idle pcpu, kick it */
>
While there, do you mind adding a full stop at the end of the sentence?

>  /* The same loop also find the one with lowest priority */
> -for_each_cpu(cpu, ¬_tickled)
> + /* For cache benefit, we search new->cpu first */
>
And this looks to me to be misindented.

If you fix these things and resend, you can add (together to Meng's
one):

Reviewed-by: Dario Faggioli 

And I'm Cc-ing George, so he can also adivse if he wants, as hee is
also a scheduler maintainer... not to mention that he will most likely
be the one that will commit the change, so please do Cc him yourself as
well when you resend the patch (I should have asked to do that before,
but did not notice he was not there).

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RESEND RFC 1/8] mm: Separate free page chunk merging into its own routine

2017-02-27 Thread Andrew Cooper
On 27/02/17 00:37, Boris Ostrovsky wrote:
> +static void merge_chunks(struct page_info *pg, unsigned int node,
> + unsigned int zone, unsigned int order)
> +{
> +ASSERT(spin_is_locked(&heap_lock));
> +
> +/* Merge chunks as far as possible. */
> +while ( order < MAX_ORDER )
> +{
> +unsigned int mask = 1UL << order;

This was unsigned long before.  If order is guaranteed never to be
larger than 31, we are ok.  If not, it needs correctly.

~Andrew

>  /* Free 2^@order set of pages. */
>  static void free_heap_pages(
>  struct page_info *pg, unsigned int order)
>  {
> -unsigned long mask, mfn = page_to_mfn(pg);
> +unsigned long mfn = page_to_mfn(pg);
>  unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
>  unsigned int zone = page_to_zone(pg);
>  
> @@ -977,38 +1024,7 @@ static void free_heap_pages(
>  midsize_alloc_zone_pages = max(
>  midsize_alloc_zone_pages, total_avail_pages / 
> MIDSIZE_ALLOC_FRAC);
>  
> -/* Merge chunks as far as possible. */
> -while ( order < MAX_ORDER )
> -{
> -mask = 1UL << order;
> -


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL]

2017-02-27 Thread Dario Faggioli
On Sun, 2017-02-26 at 16:53 +0100, Dario Faggioli wrote:
> On Fri, 2017-02-24 at 17:14 +0100, Dario Faggioli wrote:
> > On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote:
> > > 
> > > However, comparing with the staging version of the file
> > > (which is heavily different), the immediate code involved here
> > > isn't
> > > all that different, so I wonder whether (a) this is a problem on
> > > staging too or (b) we're missing another backport. Dario?
> > > 
> > So, according to my investigation, this is a genuine race. It
> > affects
> > this branch as well as staging, but it manifests less frequently
> > (or,
> > I
> > should say, very rarely) in the latter.
> > 
> Actually, this is probably wrong. It looks like the following commit:
> 
>  f3d47501db2b7bb8dfd6a3c9710b7aff4b1fc55b
>  xen: fix a (latent) cpupool-related race during domain destroy
> 
> is not in staging-4.7.
> 
And my testing confirms that backporting the changeset above (which
just applies cleanly on staging-4.7, AFAICT) make the problem go away.

As the changelog of that commit says, I've even seen something similar
happening already during my development... Sorry I did not recognise it
sooner, and for failing to request backport of that change in the first
place.

I'm therefore doing that now: I ask for backport of:

 f3d47501db2b7bb8dfd6a3c9710b7aff4b1fc55b
 xen: fix a (latent) cpupool-related race during domain destroy

to 4.7.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RESEND RFC 2/8] mm: Place unscrubbed pages at the end of pagelist

2017-02-27 Thread Andrew Cooper
On 27/02/17 00:37, Boris Ostrovsky wrote:
> . so that it's easy to find pages that need to be scrubbed (those pages are
> now marked with _PGC_need_scrub bit).
>
> Signed-off-by: Boris Ostrovsky 
> ---
>  xen/common/page_alloc.c  |   97 +++--
>  xen/include/asm-x86/mm.h |4 ++
>  2 files changed, 79 insertions(+), 22 deletions(-)
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 352eba9..653bb91 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -385,6 +385,8 @@ typedef struct page_list_head 
> heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
>  static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>  #define heap(node, zone, order) ((*_heap[node])[zone][order])
>  
> +static unsigned node_need_scrub[MAX_NUMNODES];

This will overflow if there is 16TB of ourstanding memory needing
scrubbed per node.

In the worst case, all available patches could be oustanding for
scrubbing, so should be counted using the same types.

> +
>  static unsigned long *avail[MAX_NUMNODES];
>  static long total_avail_pages;
>  
> @@ -935,11 +942,16 @@ static bool_t can_merge(struct page_info *head, 
> unsigned int node,
>(phys_to_nid(page_to_maddr(head)) != node) )
>  return 0;
>  
> +if ( !!need_scrub ^
> + !!test_bit(_PGC_need_scrub, &head->count_info) )
> +return 0;
> +
>  return 1;
>  }
>  
>  static void merge_chunks(struct page_info *pg, unsigned int node,
> - unsigned int zone, unsigned int order)
> + unsigned int zone, unsigned int order,
> + bool_t need_scrub)

Can't you calculate need_scrub from *pg rather than passing an extra
parameter?

>  {
>  ASSERT(spin_is_locked(&heap_lock));
>  
> @@ -970,12 +982,49 @@ static void merge_chunks(struct page_info *pg, unsigned 
> int node,
>  }
>  
>  PFN_ORDER(pg) = order;
> -page_list_add_tail(pg, &heap(node, zone, order));
> +if ( need_scrub )
> +page_list_add_tail(pg, &heap(node, zone, order));
> +else
> +page_list_add(pg, &heap(node, zone, order));
> +}
> +
> +static void scrub_free_pages(unsigned int node)
> +{
> +struct page_info *pg;
> +unsigned int i, zone;
> +int order;
> +
> +ASSERT(spin_is_locked(&heap_lock));
> +
> +if ( !node_need_scrub[node] )
> +return;
> +
> +for ( zone = 0; zone < NR_ZONES; zone++ )
> +{
> +for ( order = MAX_ORDER; order >= 0; order-- )
> +{
> +while ( !page_list_empty(&heap(node, zone, order)) )
> +{
> +/* Unscrubbed pages are always at the end of the list. */
> +pg = page_list_last(&heap(node, zone, order));
> +if ( !test_bit(_PGC_need_scrub, &pg[0].count_info) )

&pg->count_info

> +break;
> +
> +for ( i = 0; i < (1 << order); i++)

1U, and probably unsigned long.  Similarly later.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Michal Hocko
On Mon 27-02-17 12:25:10, Heiko Carstens wrote:
> On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote:
> > A couple of other thoughts:
> > 1) Having all newly added memory online ASAP is probably what people
> > want for all virtual machines.
> 
> This is not true for s390. On s390 we have "standby" memory that a guest
> sees and potentially may use if it sets it online. Every guest that sets
> memory offline contributes to the hypervisor's standby memory pool, while
> onlining standby memory takes memory away from the standby pool.
> 
> The use-case is that a system administrator in advance knows the maximum
> size a guest will ever have and also defines how much memory should be used
> at boot time. The difference is standby memory.
> 
> Auto-onlining of standby memory is the last thing we want.
> 
> > Unfortunately, we have additional complexity with memory zones
> > (ZONE_NORMAL, ZONE_MOVABLE) and in some cases manual intervention is
> > required. Especially, when further unplug is expected.
> 
> This also is a reason why auto-onlining doesn't seem be the best way.

Can you imagine any situation when somebody actually might want to have
this knob enabled? From what I understand it doesn't seem to be the
case.

-- 
Michal Hocko
SUSE Labs

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [ovmf test] 106187: regressions - FAIL

2017-02-27 Thread osstest service owner
flight 106187 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106187/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 105963
 test-amd64-i386-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail REGR. vs. 105963

version targeted for testing:
 ovmf 29c90f14c91bf134a202c1b4ed600c9766bc1411
baseline version:
 ovmf e0307a7dad02aa8c0cd8b3b0b9edce8ddb3fef2e

Last test of basis   105963  2017-02-21 21:43:31 Z5 days
Failing since105980  2017-02-22 10:03:53 Z5 days   12 attempts
Testing same since   106187  2017-02-27 06:33:44 Z0 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Ard Biesheuvel 
  Jeff Fan 
  Jiewen Yao 
  Laszlo Ersek 
  Paolo Bonzini 
  Star Zeng 
  Yonghong Zhu 
  Zhang Lubo 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 1029 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RESEND RFC 2/8] mm: Place unscrubbed pages at the end of pagelist

2017-02-27 Thread Boris Ostrovsky

>>  
>>  static void merge_chunks(struct page_info *pg, unsigned int node,
>> - unsigned int zone, unsigned int order)
>> + unsigned int zone, unsigned int order,
>> + bool_t need_scrub)
> Can't you calculate need_scrub from *pg rather than passing an extra
> parameter?

Right, I can just look at the head's PGC_need_scrub bit.

>>  {
>>  ASSERT(spin_is_locked(&heap_lock));
>>  
>> @@ -970,12 +982,49 @@ static void merge_chunks(struct page_info *pg, 
>> unsigned int node,
>>  }
>>  
>>  PFN_ORDER(pg) = order;
>> -page_list_add_tail(pg, &heap(node, zone, order));
>> +if ( need_scrub )
>> +page_list_add_tail(pg, &heap(node, zone, order));
>> +else
>> +page_list_add(pg, &heap(node, zone, order));
>> +}
>> +
>> +static void scrub_free_pages(unsigned int node)
>> +{
>> +struct page_info *pg;
>> +unsigned int i, zone;
>> +int order;
>> +
>> +ASSERT(spin_is_locked(&heap_lock));
>> +
>> +if ( !node_need_scrub[node] )
>> +return;
>> +
>> +for ( zone = 0; zone < NR_ZONES; zone++ )
>> +{
>> +for ( order = MAX_ORDER; order >= 0; order-- )
>> +{
>> +while ( !page_list_empty(&heap(node, zone, order)) )
>> +{
>> +/* Unscrubbed pages are always at the end of the list. */
>> +pg = page_list_last(&heap(node, zone, order));
>> +if ( !test_bit(_PGC_need_scrub, &pg[0].count_info) )
> &pg->count_info
>
>> +break;
>> +
>> +for ( i = 0; i < (1 << order); i++)
> 1U, and probably unsigned long.  Similarly later.

I'll update sizing throughout the series.

I just noticed, BTW, that this routine lost something like

   page_list_del(pg, &heap(node, zone, order));
   merge_chunks(pg, node, zone, chunk_order, 0);

otherwise we just scrub the last chunk on each list.

This is all rewritten in a later patch which is why I missed this during
testing.

-boris




___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/29] xl: move some helper functions to xl_utils.c

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 08/29] xl: move some helper functions to xl_utils.c"):
> Move some commonly used functions to a new file. Prepend "x" to
> function names to stick to consistent naming scheme.

The `xfoo' naming scheme is a reference to `xmalloc' et al, which
conventionally crash the program on failure.

I don't think you need to add this to every function in xl.  It is
applicable only when there is an existing function which returns an
error code, which is wrapped by one which never fails.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 05/29] xl: generate _paths.h

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 05/29] xl: generate _paths.h"):
> It is included by xl.h. Previously it was using _paths.h from some other
> place. We'd better generate one for xl as well.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 01/29] xl: remove accidentally committed hunk from Makefile

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 01/29] xl: remove accidentally committed hunk from 
Makefile"):
> It was never intended to be committed. Lucky the high level Makefile was
> correct so it didn't cause us problem when building xl.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/29] xl: remove inclusion of libxl_osdeps.h

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 03/29] xl: remove inclusion of libxl_osdeps.h"):
> There is no reason for a client to include a private header from libxl.
> Remove the inclusion and define _GNU_SOURCE for {v,}asprintf in
> xl_cmdimpl.c.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 04/29] xl: use <> variant to include Xen tools library headers

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 04/29] xl: use <> variant to include Xen tools library 
headers"):
> They should be treated like any other libraries installed on the build
> host. Compiler options are set correctly to point to their locations.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 02/29] xl: update copyright information

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 02/29] xl: update copyright information"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/29] xl: remove trailing spaces in xl_cmdimpl.c

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 06/29] xl: remove trailing spaces in xl_cmdimpl.c"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 09/29] xl: split out tmem related code to xl_tmem.c

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 09/29] xl: split out tmem related code to xl_tmem.c"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 07/29] xl: lift a bunch of macros to xl_utils.h

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 07/29] xl: lift a bunch of macros to xl_utils.h"):
> We're going to split xl_cmdimpl.c into multiple files. Lift the commonly
> used macros to xl_utils.h.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 13/29] xl: split out vtpm related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 13/29] xl: split out vtpm related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 22/29] xl: split out psr related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 22/29] xl: split out psr related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 20/29] xl: split out cd related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 20/29] xl: split out cd related code"):
> Signed-off-by: Wei Liu 
> ---
>  tools/xl/Makefile |   2 +-
>  tools/xl/xl_cd.c  | 114 
> ++

Would you mind calling this `libxl_cdrom.c' rather than `_cd' ?
The latter makes me think of chdir...

Thanks,
Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 10/29] xl: split out xl_parse.[ch]

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 10/29] xl: split out xl_parse.[ch]"):
> Some notable changes other than code movement:
> 1. rename cpurange_parse to parse_cpurange;
> 2. provide a function to return shutdown action name.

Can you split that out so that I can review it separately from the
code motion ?

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 17/29] xl: split out scheduler related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 17/29] xl: split out scheduler related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

BTW, for all of these code motion patches: it would be nice if the
comment said `code motion only, apart from the obvious Makefile
change' or something.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 14/29] xl: split out block related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 14/29] xl: split out block related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 29/29] xl: merge xl_cmdimpl.c into xl.c

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 29/29] xl: merge xl_cmdimpl.c into xl.c"):
> After splitting out all the meaty bits, xl_cmdimpl.c doesn't contain
> much. Merge the rest into xl.c and delete the file.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 25/29] xl: split out miscellaneous functions

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 25/29] xl: split out miscellaneous functions"):
> A collections of functions that don't warrant their own files.
> 
> Moving main_devd there requires lifting do_daemonize to xl_utils.c.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 24/29] xl: split out vnc and console related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 24/29] xl: split out vnc and console related code"):
> The new file also contains code for channel, which is just a console
> in disguise.
> 
> Replace the call to vncviewer() with libxl_vncviewer_exec() directly in
> main_vncviewer.

Again, please separate out the non-code-motion change.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 28/29] xl: split out migration related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 28/29] xl: split out migration related code"):
> Include COLO / Remus code because they are built on top of the existing
> migration protocol.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 11/29] xl: split out cpupool related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 11/29] xl: split out cpupool related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 16/29] xl: split out usb related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 16/29] xl: split out usb related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 26/29] xl: split out vm lifecycle control functions

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 26/29] xl: split out vm lifecycle control functions"):
> Including create, reboot, shutdown, pause, unpause and destroy.
> 
> Lift a bunch of core data structures and function declarations to xl.h
> because they are needed in both xl_cmdimpl.c and xl_vmcontrol.c.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 23/29] xl: split out functions to print out information

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 23/29] xl: split out functions to print out 
information"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 21/29] xl: split out memory related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 21/29] xl: split out memory related code"):
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 27/29] xl: split out save/restore related code

2017-02-27 Thread Ian Jackson
Wei Liu writes ("[PATCH 27/29] xl: split out save/restore related code"):
> Add some function declarations to xl.h because they are now needed in
> multiple files.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


  1   2   >