[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".

2014-09-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=83381

robert...@intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |robert...@intel.com

--- Comment #10 from robert...@intel.com ---
I grepped its history: it was found on the 3.9 kernel, and I am certain it was
fine in 3.0. I think it should be treated as a regression.
As for the component it's related to, I agree we can change its component to
kernel or something else, other than KVM.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Frank Blaschka
On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> > This set of patches implements pci pass-through support for qemu/KVM on s390.
> > PCI support on s390 is very different from other platforms.
> > Major differences are:
> > 
> > 1) all PCI operations are driven by special s390 instructions
> 
> Generating config cycles is always arch specific.
> 
> > 2) all s390 PCI instructions are privileged
> 
> While the operations to generate config cycles on x86 are not
> privileged, they must be arbitrated between accesses, so in a sense
> they're privileged.
> 
> > 3) PCI config and memory spaces can not be mmap'ed
> 
> VFIO has mapping flags that allow any region to specify mmap support.
>

Hi Alex,

thx for your reply.

Let me elaborate a little bit more on 1 - 3. Config and memory space cannot
be accessed via memory operations; you have to use special s390 instructions.
These instructions cannot be executed in user space, so there is no other
way than executing them in the kernel. Yes, vfio does support a
slow path via ioctl that we could use, but this seems suboptimal from a
performance point of view.
 
> > 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
> 
> VFIO delivers interrupts as eventfds regardless of the underlying
> platform mechanism.
> 

Yes, that's right, but then we have to do platform specific stuff to present
the irq to the guest. I am not saying this is impossible, but we would have to
add s390 specific code to vfio.

> > 5) For DMA access there is always an IOMMU required.
> 
> x86 requires the same.
> 
> >  s390 pci implementation
> >does not support a complete memory to iommu mapping, dma mappings are
> >created on request.
> 
> Sounds like POWER.

I don't know the details of POWER; maybe it is similar, but not the same.
We might be able to extend vfio to have a new interface allowing
us to do DMA mappings on request.

> 
> > 6) The OS does not get any information about the physical layout
> >of the PCI bus.
> 
> If that means that every device is isolated (seems unlikely for
> multifunction devices) then that makes IOMMU group support really easy.
>

OK
 
> > 7) To take advantage of system z specific virtualization features
> >we need to access the SIE control block residing in the kernel KVM
> 
> The KVM-VFIO device allows interaction between VFIO devices and KVM.
> 
> > 8) To enable system z specific virtualization features we have to manipulate
> >the zpci device in kernel.
> 
> VFIO supports different device backends, currently pci_dev and working
> towards platform devices.  zpci might just be an extension to standard
> pci.
> 

7 - 8: At least this is not as straightforward as the pure kernel approach, but
I would have to dig into that in more detail if we agree on a vfio
solution.

> > For these reasons I decided to implement a kernel based approach similar
> > to x86 device assignment. There is a new qemu device (s390-pci) representing
> > a pass-through device on the host. Here is a sample qemu device configuration:
> > 
> > -device s390-pci,host=:00:00.0
> > 
> > The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy instance
> > in the kernel KVM and connects this instance to the host pci device.
> > 
> > kernel patches apply to linux-kvm
> > 
> > s390: cio: chsc function to register GIB
> > s390: pci: export pci functions for pass-through usage
> > KVM: s390: Add GISA support
> > KVM: s390: Add PCI pass-through support
> > 
> > qemu patches apply to qemu-master
> > 
> > s390: Add PCI bus support
> > s390: Add PCI pass-through device support
> > 
> > Feedback and discussion is highly welcome ...
> 
> KVM-based device assignment needs to go away.  It's a horrible model for
> devices, it offers very little protection to the kernel, assumes every
> device is fully isolated and visible to the IOMMU, relies on smattering
> of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
> VFIO-based device assignment.  Why is s390 special enough to repeat all
> the mistakes that x86 did?  Thanks,
> 

Is this your personal opinion or was this a strategic decision of the
QEMU/KVM community? Can anybody give us direction about this?

Actually I can understand your point. In the last weeks I did some development
and testing regarding the use of vfio too. But the in-kernel solution seems to
offer the best performance and the most straightforward implementation for our
platform.

Greetings,

Frank

> Alex

Re: [PATCH v2 12/15] arm/arm64: KVM: add virtual GICv3 distributor emulation

2014-09-05 Thread Andre Przywara
Hi wanghaibin,

On 05/09/14 04:28, wanghaibin wrote:
> On 2014/8/21 21:06, Andre Przywara wrote:
> 
> 
>> +void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>> +{
>> +struct kvm *kvm = vcpu->kvm;
>> +struct kvm_vcpu *c_vcpu;
>> +struct vgic_dist *dist = &kvm->arch.vgic;
>> +u16 target_cpus;
>> +u64 mpidr, mpidr_h, mpidr_l;
>> +int sgi, mode, c, vcpu_id;
>> +int updated = 0;
>> +
>> +vcpu_id = vcpu->vcpu_id;
>> +
>> +sgi = (reg >> 24) & 0xf;
>> +mode = (reg >> 40) & 0x1;
>> +target_cpus = reg & 0xffff;
>> +mpidr = ((reg >> 48) & 0xff) << MPIDR_LEVEL_SHIFT(3);
>> +mpidr |= ((reg >> 32) & 0xff) << MPIDR_LEVEL_SHIFT(2);
>> +mpidr |= ((reg >> 16) & 0xff) << MPIDR_LEVEL_SHIFT(1);
>> +mpidr &= ~MPIDR_LEVEL_MASK;
>> +
> 
>> +/*
>> + * We take the dist lock here, because we come from the sysregs
>> + * code path and not from MMIO (where this is already done)
>> + */
>> +spin_lock(&dist->lock);
>> +kvm_for_each_vcpu(c, c_vcpu, kvm) {
> 
> 
> Hi Andre, here is a suggestion: move the
> 
>> +if (target_cpus == 0)
>> +break;
> 
> code out of the kvm_for_each_vcpu loop, like:
> 
> 
>   if (!mode && target_cpus == 0)   /* this check does not need to be inside the kvm_for_each_vcpu loop */
>   return;

I am not so much concerned about someone actually sending an SGI to
no one, but the code is there to stop the loop after the only CPU has
been serviced.
...

>   spin_lock(&dist->lock);
>   kvm_for_each_vcpu(c, c_vcpu, kvm) {
> 
>> +if (mode && c == vcpu_id)   /* not to myself */
>> +continue;
>> +if (!mode) {
>> +mpidr_h = kvm_vcpu_get_mpidr(c_vcpu);
>> +mpidr_l = MPIDR_AFFINITY_LEVEL(mpidr_h, 0);
>> +mpidr_h &= ~MPIDR_LEVEL_MASK;
>> +if (mpidr != mpidr_h)
>> +continue;
>> +if (!(target_cpus & BIT(mpidr_l)))
>> +continue;
>> +target_cpus &= ~BIT(mpidr_l);

Here the CPU bit is removed from target_cpus. The idea is that most of
the time we trigger a SGI for a single CPU only, so there is no need to
further iterate through all VCPUs once we found the first and only one.
That's why I check target_cpus inside the loop.

Regards,
Andre.

>> +}
>> +/* Flag the SGI as pending */
>> +vgic_dist_irq_set(c_vcpu, sgi);
>> +updated = 1;
>> +kvm_debug("SGI%d from CPU%d to CPU%d\n", sgi, vcpu_id, c);
>> +}
>> +if (updated)
>> +vgic_update_state(vcpu->kvm);
>> +spin_unlock(&dist->lock);
>> +if (updated)
>> +vgic_kick_vcpus(vcpu->kvm);
>> +}
>> +
>> +


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> This set of patches implements pci pass-through support for qemu/KVM on s390.
> PCI support on s390 is very different from other platforms.
> Major differences are:
> 
> 1) all PCI operations are driven by special s390 instructions
> 2) all s390 PCI instructions are privileged
> 3) PCI config and memory spaces can not be mmap'ed

That's ok, vfio abstracts config space anyway.

> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.

This is in line with other implementations. Interrupts go from

  device -> PHB -> PIC -> CPU

(sometimes you can have another converter device in between)

In your case, the PHB converts INTX and MSI interrupts to Adapter
interrupts to go to the floating interrupt controller. Same thing as
everyone else really.

> 5) For DMA access there is always an IOMMU required. s390 pci implementation
>does not support a complete memory to iommu mapping, dma mappings are
>created on request.

Sounds great :). So I suppose we should implement a guest facing IOMMU?

> 6) The OS does not get any information about the physical layout
>of the PCI bus.

So how does it know whether different devices are behind the same IOMMU
context? Or can we assume that every device has its own context?

> 7) To take advantage of system z specific virtualization features
>we need to access the SIE control block residing in the kernel KVM

Please elaborate.

> 8) To enable system z specific virtualization features we have to manipulate
>the zpci device in kernel.

Why?

> 
> For these reasons I decided to implement a kernel based approach similar
> to x86 device assignment. There is a new qemu device (s390-pci) representing a

I fail to see the rationale and I definitely don't want to see anything
even remotely similar to the legacy x86 device assignment on s390 ;).

Can't we just enhance VFIO?

Also, I think we'll get the cleanest model if we start off with an
implementation that allows us to add emulated PCI devices to an s390x
machine and only then follow on with physical ones.


Alex


Re: [RFC][patch 3/6] KVM: s390: Add GISA support

2014-09-05 Thread Alexander Graf


On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> From: Frank Blaschka 
> 
> This patch adds GISA (Guest Interrupt State Area) support
> to s390 kvm. GISA can be used for exitless interrupts. The
> patch provides a set of functions for GISA related operations
> like accessing GISA fields or registering ISCs for alert.
> Exploiters of GISA will follow with additional patches.
> 
> Signed-off-by: Frank Blaschka 

That's a nice feature. However, please make sure that you maintain the
abstraction levels.

What should happen is that you request an irqfd from FLIC. Then you
associate that irqfd with the PCI device.

Thanks to that association, both parties can now talk to each other and
negotiate their GISA number space and make sure things are connected.

However, it should always be possible to do things without this direct
IRQ injection.

So you should be able to receive an irqfd event when an IRQ happened, so
that VFIO user space applications can also handle interrupts for example.

And the same applies for interrupt injection. We also need to be able to
inject an adapter interrupt from QEMU for emulated devices ;).


Alex


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 05.09.14 09:46, Frank Blaschka wrote:
> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM on s390.
>>> PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>
>> Generating config cycles is always arch specific.
>>
>>> 2) all s390 PCI instructions are privileged
>>
>> While the operations to generate config cycles on x86 are not
>> privileged, they must be arbitrated between accesses, so in a sense
>> they're privileged.
>>
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> VFIO has mapping flags that allow any region to specify mmap support.
>>
> 
> Hi Alex,
> 
> thx for your reply.
> 
> Let me elaborate a little bit more on 1 - 3. Config and memory space cannot
> be accessed via memory operations; you have to use special s390 instructions.
> These instructions cannot be executed in user space, so there is no other
> way than executing them in the kernel. Yes, vfio does support a
> slow path via ioctl that we could use, but this seems suboptimal from a
> performance point of view.

Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
to call into the kernel for every PCI access, but I still think that
VFIO provides the correct abstraction layer for us to use. If nothing
else, it would at least give us the same configuration as x86 and
debuggability on par with the other platforms.

>  
>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> VFIO delivers interrupts as eventfds regardless of the underlying
>> platform mechanism.
>>
> 
> Yes, that's right, but then we have to do platform specific stuff to present
> the irq to the guest. I am not saying this is impossible, but we would have to
> add s390 specific code to vfio.

Not at all - interrupt delivery is completely transparent to VFIO.

> 
>>> 5) For DMA access there is always an IOMMU required.
>>
>> x86 requires the same.
>>
>>>  s390 pci implementation
>>>does not support a complete memory to iommu mapping, dma mappings are
>>>created on request.
>>
>> Sounds like POWER.
> 
> I don't know the details of POWER; maybe it is similar, but not the same.
> We might be able to extend vfio to have a new interface allowing
> us to do DMA mappings on request.

We already have that.

> 
>>
>>> 6) The OS does not get any information about the physical layout
>>>of the PCI bus.
>>
>> If that means that every device is isolated (seems unlikely for
>> multifunction devices) then that makes IOMMU group support really easy.
>>
> 
> OK
>  
>>> 7) To take advantage of system z specific virtualization features
>>>we need to access the SIE control block residing in the kernel KVM
>>
>> The KVM-VFIO device allows interaction between VFIO devices and KVM.
>>
>>> 8) To enable system z specific virtualization features we have to manipulate
>>>the zpci device in kernel.
>>
>> VFIO supports different device backends, currently pci_dev and working
>> towards platform devices.  zpci might just be an extension to standard
>> pci.
>>
> 
> 7 - 8: At least this is not as straightforward as the pure kernel approach, but
> I would have to dig into that in more detail if we agree on a vfio
> solution.

Please do so, yes :).

> 
>>> For these reasons I decided to implement a kernel based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci) representing
>>> a pass-through device on the host. Here is a sample qemu device configuration:
>>>
>>> -device s390-pci,host=:00:00.0
>>>
>>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy instance
>>> in the kernel KVM and connects this instance to the host pci device.
>>>
>>> kernel patches apply to linux-kvm
>>>
>>> s390: cio: chsc function to register GIB
>>> s390: pci: export pci functions for pass-through usage
>>> KVM: s390: Add GISA support
>>> KVM: s390: Add PCI pass-through support
>>>
>>> qemu patches apply to qemu-master
>>>
>>> s390: Add PCI bus support
>>> s390: Add PCI pass-through device support
>>>
>>> Feedback and discussion is highly welcome ...
>>
>> KVM-based device assignment needs to go away.  It's a horrible model for
>> devices, it offers very little protection to the kernel, assumes every
>> device is fully isolated and visible to the IOMMU, relies on smattering
>> of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
>> VFIO-based device assignment.  Why is s390 special enough to repeat all
>> the mistakes that x86 did?  Thanks,
>>
> 
> Is this your personal opinion or was this a strategic decision of the
> QEMU/KVM community? Can anybody give us direction about this?
> 
> Actually I can understand your point. In the last

Re: [RFC][patch 4/6] KVM: s390: Add PCI pass-through support

2014-09-05 Thread Alexander Graf


On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> From: Frank Blaschka 
> 
> This patch implements PCI pass-through kernel support for s390.
> Design approach is very similar to the x86 device assignment.
> User space executes the KVM_ASSIGN_PCI_DEVICE ioctl to create
> a proxy instance in the kernel KVM and connect this instance to the
> host pci device. s390 pci instructions are intercepted in kernel and
> operations are passed directly to the assigned pci device.
> To take advantage of all system z specific virtualization features
> we need to access the SIE control block residing in KVM. Also we have to
> enable z pci devices with special configuration information coming
> from the SIE block as well.
> 
> Signed-off-by: Frank Blaschka 
> ---
>  arch/s390/include/asm/kvm_host.h |1 
>  arch/s390/kvm/Makefile   |2 
>  arch/s390/kvm/intercept.c|1 
>  arch/s390/kvm/kvm-s390.c |   33 
>  arch/s390/kvm/kvm-s390.h |   17 
>  arch/s390/kvm/pci.c  | 2130 +++
>  arch/s390/kvm/priv.c |   21 
>  7 files changed, 2202 insertions(+), 3 deletions(-)


I would love to review this patch, but in its current form it's
impossible to do. I can't possibly keep > 2000 lines of code in my head.


Alex


Re: [PATCH v5 1/3] contrib: add ivshmem client and server

2014-09-05 Thread Claudio Fontana
Just to point out that the client also has a debug_log macro that should be
uppercased (DEBUG_LOG), as was already pointed out for the server.

>> diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c
>> new file mode 100644
>> index 000..ad210c8
>> --- /dev/null
>> +++ b/contrib/ivshmem-client/ivshmem-client.c
>> @@ -0,0 +1,405 @@
>> +/*
>> + * Copyright 6WIND S.A., 2014
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * (at your option) any later version.  See the COPYING file in the
>> + * top-level directory.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "qemu-common.h"
>> +#include "qemu/queue.h"
>> +
>> +#include "ivshmem-client.h"
>> +
>> +/* log a message on stdout if verbose=1 */
>> +#define debug_log(client, fmt, ...) do { \
>> +if ((client)->verbose) { \
>> +printf(fmt, ## __VA_ARGS__); \
>> +}\
>> +} while (0)
>> +

..here (DEBUG_LOG?)

Thanks to all who are working on this.

Claudio




Re: [PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions

2014-09-05 Thread Gleb Natapov
On Thu, Sep 04, 2014 at 07:44:51PM +0200, Paolo Bonzini wrote:
> Il 04/09/2014 17:05, Gleb Natapov ha scritto:
> >> > 
> >> > If you do that, KVM gets down to the "if (writeback)" and writes the 
> >> > ctxt->eip from L2 into the L1 EIP.
> > Heh, that's a bummer. We should not write back if an instruction caused a 
> > vmexit.
> > 
> 
> You're right, that works.
Looks good!

Reviewed-by: Gleb Natapov 

> 
> Paolo
> 
> -- 8< -
> Subject: [PATCH] KVM: x86: skip writeback on injection of nested exception
> 
> If a nested page fault happens during emulation, we will inject a vmexit,
> not a page fault.  However because writeback happens after the injection,
> we will write ctxt->eip from L2 into the L1 EIP.  We do not write back
> if an instruction caused an interception vmexit---do the same for page
> faults.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/x86.c  | 15 ++-
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 08cc299..c989651 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -893,7 +893,6 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
>  int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>   gfn_t gfn, void *data, int offset, int len,
>   u32 access);
> -void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
>  bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
>  
>  static inline int __kvm_irq_line_state(unsigned long *irq_state,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e4ed85e..3541946 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -408,12 +408,14 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
>  
> -void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> +static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  {
>   if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
>   vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
>   else
>   vcpu->arch.mmu.inject_page_fault(vcpu, fault);
> +
> + return fault->nested_page_fault;
>  }
>  
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
> @@ -4929,16 +4931,18 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>   }
>  }
>  
> -static void inject_emulated_exception(struct kvm_vcpu *vcpu)
> +static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
>  {
>   struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
>   if (ctxt->exception.vector == PF_VECTOR)
> - kvm_propagate_fault(vcpu, &ctxt->exception);
> - else if (ctxt->exception.error_code_valid)
> + return kvm_propagate_fault(vcpu, &ctxt->exception);
> +
> + if (ctxt->exception.error_code_valid)
>   kvm_queue_exception_e(vcpu, ctxt->exception.vector,
> ctxt->exception.error_code);
>   else
>   kvm_queue_exception(vcpu, ctxt->exception.vector);
> + return false;
>  }
>  
>  static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
> @@ -5300,8 +5304,9 @@ restart:
>   }
>  
>   if (ctxt->have_exception) {
> - inject_emulated_exception(vcpu);
>   r = EMULATE_DONE;
> + if (inject_emulated_exception(vcpu))
> + return r;
>   } else if (vcpu->arch.pio.count) {
>   if (!vcpu->arch.pio.in) {
>   /* FIXME: return into emulator if single-stepping.  */
> -- 
> 1.9.3
> 
> 

--
Gleb.


[GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Paolo Bonzini
Linus,

The following changes since commit 30d1e0e806e5b2fadc297ba78f2d7afd6ba309cf:

  virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it (2014-08-19 15:12:28 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to b11ba8c62be3eba54784611c769f4d7d07fac4a5:

  KVM: x86: fix kvmclock breakage from timers branch merge (2014-09-04 23:42:11 +0200)


A smattering of bug fixes across most architectures.

The x86 fix is really working around a timekeeping bug that is being
exposed by KVM.  Thomas and John are aware of it and they are working
on the real fix.  I am including the workaround because I think it
makes the code nicer, so I would submit it sooner or later anyway.


Christian Borntraeger (4):
  KVM: s390: Fix user triggerable bug in dead code
  KVM: s390/mm: try a cow on read only pages for key ops
  KVM: s390/mm: Fix storage key corruption during swapping
  KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags

Christoffer Dall (1):
  arm/arm64: KVM: Complete WFI/WFE instructions

Laurent Dufour (1):
  powerpc/kvm/cma: Fix panic introduces by signed shift operation

Paolo Bonzini (4):
  Merge tag 'kvm-s390-20140825' of git://git.kernel.org/.../kvms390/linux into kvm-master
  Merge tag 'kvm-arm-for-v3.17-rc3' of git://git.kernel.org/.../kvmarm/kvmarm into HEAD
  Merge tag 'kvm-s390-master-20140902' of git://git.kernel.org/.../kvms390/linux into kvm-master
  KVM: x86: fix kvmclock breakage from timers branch merge

Pranavkumar Sawargaonkar (1):
  ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU

 arch/arm/kvm/handle_exit.c  |  2 ++
 arch/arm/kvm/init.S |  4 
 arch/arm64/kvm/handle_exit.c|  2 ++
 arch/arm64/kvm/hyp-init.S   |  4 
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  4 ++--
 arch/s390/include/asm/pgtable.h |  6 --
 arch/s390/kvm/kvm-s390.c| 13 -
 arch/s390/mm/pgtable.c  | 10 ++
 arch/x86/kvm/x86.c  | 13 +++--
 9 files changed, 35 insertions(+), 23 deletions(-)


Re: [PATCH] kvm tools: arm: remove register accessor macros now that they are in uapi

2014-09-05 Thread Will Deacon
On Fri, Aug 29, 2014 at 02:00:24PM +0100, Will Deacon wrote:
> The kernel now exposes register accessor macros in the uapi/ headers
> for arm and arm64, so use those instead (and avoid the compile failure
> from the duplicate definitions).
> 
> Signed-off-by: Will Deacon 
> ---
> 
> Pekka -- please take this as a fix, since merging the 3.16 sources has
>  caused some breakage for ARM. Cheers!

Ping? It would be great to get the build failure fixed on master.

Cheers,

Will

>  tools/kvm/arm/aarch32/kvm-cpu.c | 15 +--
>  tools/kvm/arm/aarch64/kvm-cpu.c | 15 ---
>  2 files changed, 1 insertion(+), 29 deletions(-)
> 
> diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c
> index 464b473dc936..95fb1da5ba3d 100644
> --- a/tools/kvm/arm/aarch32/kvm-cpu.c
> +++ b/tools/kvm/arm/aarch32/kvm-cpu.c
> @@ -7,25 +7,12 @@
>  #define ARM_CORE_REG(x)  (KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE | \
>KVM_REG_ARM_CORE_REG(x))
>  
> -#define ARM_CP15_REG_SHIFT_MASK(x,n) \
> - (((x) << KVM_REG_ARM_ ## n ## _SHIFT) & KVM_REG_ARM_ ## n ## _MASK)
> -
> -#define __ARM_CP15_REG(op1,crn,crm,op2)  \
> - (KVM_REG_ARM | KVM_REG_SIZE_U32 |   \
> -  (15 << KVM_REG_ARM_COPROC_SHIFT)   |   \
> -  ARM_CP15_REG_SHIFT_MASK(op1, OPC1) |   \
> -  ARM_CP15_REG_SHIFT_MASK(crn, 32_CRN)   |   \
> -  ARM_CP15_REG_SHIFT_MASK(crm, CRM)  |   \
> -  ARM_CP15_REG_SHIFT_MASK(op2, 32_OPC2))
> -
> -#define ARM_CP15_REG(...)__ARM_CP15_REG(__VA_ARGS__)
> -
>  unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu)
>  {
>   struct kvm_one_reg reg;
>   u32 mpidr;
>  
> - reg.id = ARM_CP15_REG(ARM_CPU_ID, ARM_CPU_ID_MPIDR);
> + reg.id = ARM_CP15_REG32(ARM_CPU_ID, ARM_CPU_ID_MPIDR);
>   reg.addr = (u64)(unsigned long)&mpidr;
>   if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
>   die("KVM_GET_ONE_REG failed (get_mpidr vcpu%ld", vcpu->cpu_id);
> diff --git a/tools/kvm/arm/aarch64/kvm-cpu.c b/tools/kvm/arm/aarch64/kvm-cpu.c
> index 71a2a3a7789d..1b293748efd6 100644
> --- a/tools/kvm/arm/aarch64/kvm-cpu.c
> +++ b/tools/kvm/arm/aarch64/kvm-cpu.c
> @@ -15,21 +15,6 @@
>  #define ARM64_CORE_REG(x)(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>  
> -#define ARM64_SYS_REG_SHIFT_MASK(x,n)\
> - (((x) << KVM_REG_ARM64_SYSREG_ ## n ## _SHIFT) &\
> -  KVM_REG_ARM64_SYSREG_ ## n ## _MASK)
> -
> -#define __ARM64_SYS_REG(op0,op1,crn,crm,op2) \
> - (KVM_REG_ARM64 | KVM_REG_SIZE_U64   |   \
> -  KVM_REG_ARM64_SYSREG   |   \
> -  ARM64_SYS_REG_SHIFT_MASK(op0, OP0) |   \
> -  ARM64_SYS_REG_SHIFT_MASK(op1, OP1) |   \
> -  ARM64_SYS_REG_SHIFT_MASK(crn, CRN) |   \
> -  ARM64_SYS_REG_SHIFT_MASK(crm, CRM) |   \
> -  ARM64_SYS_REG_SHIFT_MASK(op2, OP2))
> -
> -#define ARM64_SYS_REG(...)   __ARM64_SYS_REG(__VA_ARGS__)
> -
>  unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu)
>  {
>   struct kvm_one_reg reg;
> -- 
> 2.1.0
> 


Re: [PATCH v5 3/3] ivshmem: add check on protocol version in QEMU

2014-09-05 Thread Stefan Hajnoczi
On Thu, Sep 04, 2014 at 02:51:01PM +0200, David Marchand wrote:
> diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c
> index ad210c8..0c4e016 100644
> --- a/contrib/ivshmem-client/ivshmem-client.c
> +++ b/contrib/ivshmem-client/ivshmem-client.c
> @@ -184,10 +184,18 @@ ivshmem_client_connect(IvshmemClient *client)
>  goto err_close;
>  }
>  
> -/* first, we expect our index + a fd == -1 */
> +/* first, we expect a protocol version */
> +if (read_one_msg(client, &tmp, &fd) < 0 ||
> +(tmp != IVSHMEM_PROTOCOL_VERSION) || fd != -1) {
> +debug_log(client, "cannot read from server\n");
> +goto err_close;
> +}
> +debug_log(client, "our_id=%ld\n", client->local.id);

This debug_log() is probably not intentional.  local.id will always be
-1 here so the output is not useful.

> +static void ivshmem_check_version(void *opaque, const uint8_t * buf, int flags)
> +{
> +IVShmemState *s = opaque;
> +PCIDevice *dev = PCI_DEVICE(s);
> +int tmp;
> +long version;
> +
> +memcpy(&version, buf, sizeof(long));
> +tmp = qemu_chr_fe_get_msgfd(s->server_chr);
> +if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
> +fprintf(stderr, "incompatible version, you are connecting to a ivhsmem-"
> +"server using a different protocol please check your setup\n");
> +qemu_chr_delete(s->server_chr);
> +s->server_chr = NULL;
> +return;
> +}
> +
> +IVSHMEM_DPRINTF("version check ok, finish init and switch to real chardev "
> +"handler\n");
> +
> +pci_register_bar(dev, 2, s->ivshmem_attr, &s->bar);

Not sure if it is okay to delay PCI initialization to an fd handler
callback.

If the version message is too slow the guest could see the PCI adapter
without the BAR!

Did you move this code in order to prevent the guest from accessing the
device before it has connected to the server?  Perhaps the device needs
a state field that tracks whether or not it is ready for operation.  Any
access before RUNNING state is reached will be ignored (?).




Re: [PATCH v5 2/3] docs: update ivshmem device spec

2014-09-05 Thread Stefan Hajnoczi
On Thu, Sep 04, 2014 at 02:51:00PM +0200, David Marchand wrote:
> Add some notes on the parts needed to use ivshmem devices: more specifically,
> explain the purpose of an ivshmem server and the basic concept to use the
> ivshmem devices in guests.
> Move some parts of the documentation and re-organise it.
> 
> Signed-off-by: David Marchand 
> Reviewed-by: Claudio Fontana 
> ---
>  docs/specs/ivshmem_device_spec.txt |  124 +++-
>  1 file changed, 93 insertions(+), 31 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v5 1/3] contrib: add ivshmem client and server

2014-09-05 Thread Stefan Hajnoczi
On Thu, Sep 04, 2014 at 02:50:59PM +0200, David Marchand wrote:
> When using ivshmem devices, notifications between guests can be sent as
> interrupts using a ivshmem-server (typical use described in documentation).
> The client is provided as a debug tool.
> 
> Signed-off-by: Olivier Matz 
> Signed-off-by: David Marchand 
> ---
>  Makefile                                |    8 +
>  configure                               |    3 +
>  contrib/ivshmem-client/ivshmem-client.c |  405 +++
>  contrib/ivshmem-client/ivshmem-client.h |  239 ++
>  contrib/ivshmem-client/main.c           |  237 ++
>  contrib/ivshmem-server/ivshmem-server.c |  395 ++
>  contrib/ivshmem-server/ivshmem-server.h |  186 ++
>  contrib/ivshmem-server/main.c           |  244 +++
>  qemu-doc.texi                           |   10 +-
>  9 files changed, 1724 insertions(+), 3 deletions(-)
>  create mode 100644 contrib/ivshmem-client/ivshmem-client.c
>  create mode 100644 contrib/ivshmem-client/ivshmem-client.h
>  create mode 100644 contrib/ivshmem-client/main.c
>  create mode 100644 contrib/ivshmem-server/ivshmem-server.c
>  create mode 100644 contrib/ivshmem-server/ivshmem-server.h
>  create mode 100644 contrib/ivshmem-server/main.c

Modulo Michael's comments:

Reviewed-by: Stefan Hajnoczi 




Re: [RFC][patch 3/6] KVM: s390: Add GISA support

2014-09-05 Thread Frank Blaschka
On Fri, Sep 05, 2014 at 10:29:26AM +0200, Alexander Graf wrote:
> 
> 
> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> > From: Frank Blaschka 
> > 
> > This patch adds GISA (Guest Interrupt State Area) support
> > to s390 kvm. GISA can be used for exitless interrupts. The
> > patch provides a set of functions for GISA related operations
> > like accessing GISA fields or registering ISCs for alert.
> > Exploiters of GISA will follow with additional patches.
> > 
> > Signed-off-by: Frank Blaschka 
> 
> That's a nice feature. However, please make sure that you maintain the
> abstraction levels.
> 
> What should happen is that you request an irqfd from FLIC. Then you
> associate that irqfd with the PCI device.
> 
> Thanks to that association, both parties can now talk to each other and
> negotiate their GISA number space and make sure things are connected.
> 
> However, it should always be possible to do things without this direct
> IRQ injection.
> 
> So you should be able to receive an irqfd event when an IRQ happened, so
> that VFIO user space applications can also handle interrupts for example.
> 
> And the same applies for interrupt injection. We also need to be able to
> inject an adapter interrupt from QEMU for emulated devices ;).
>

OK, assuming we are doing the vfio solution, exploiting GISA would be a
second step. Will take your feedback into account. THX!
> 
> Alex



Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Frank Blaschka
On Fri, Sep 05, 2014 at 10:21:27AM +0200, Alexander Graf wrote:
> 
> 
> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> > This set of patches implements pci pass-through support for qemu/KVM on s390.
> > PCI support on s390 is very different from other platforms.
> > Major differences are:
> > 
> > 1) all PCI operations are driven by special s390 instructions
> > 2) all s390 PCI instructions are privileged
> > 3) PCI config and memory spaces can not be mmap'ed
> 
> That's ok, vfio abstracts config space anyway.
> 
> > 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
> 
> This is in line with other implementations. Interrupts go from
> 
>   device -> PHB -> PIC -> CPU
> 
> (some times you can have another converter device in between)
> 
> In your case, the PHB converts INTX and MSI interrupts to Adapter
> interrupts to go to the floating interrupt controller. Same thing as
> everyone else really.
> 

Yes, I think this can be done, but we need s390 specific changes in vfio.

> > 5) For DMA access there is always an IOMMU required. s390 pci implementation
> >does not support a complete memory to iommu mapping, dma mappings are
> >created on request.
> 
> Sounds great :). So I suppose we should implement a guest facing IOMMU?
> 
> > 6) The OS does not get any information about the physical layout
> >of the PCI bus.
> 
> So how does it know whether different devices are behind the same IOMMU
> context? Or can we assume that every device has its own context?

Actually yes

> 
> > 7) To take advantage of system z specific virtualization features
> >we need to access the SIE control block residing in the kernel KVM
> 
> Please elaborate.
> 
> > 8) To enable system z specific virtualization features we have to manipulate
> >the zpci device in kernel.
> 
> Why?
>

We have the following s390-specific virtualization features:

1) Interpretive execution of PCI load/store instructions. If we use this function,
   PCI accesses do not get intercepted (no SIE exit) but are handled via microcode.
   To enable this we have to disable the zpci device and enable it again with
   information from the SIE control block. A further problem in qemu is that vfio
   traps access to the MSI-X table, so we have to find another way of programming
   MSI-X if we do not get intercepts for memory space accesses.

2) Adapter event forwarding (with alerting). This is a mechanism by which the
   adapter event (irq) is forwarded directly to the guest. To set this up we also
   need to manipulate the zpci device (in kernel) with information from the SIE
   block. Exploiting GISA is only one part of this mechanism.

Both might be possible with some more or less nice-looking vfio extensions. As I
said before, we have to dig into this further. These could also be further
optimization steps later, once we have a running vfio implementation on the
platform.
 
> > 
> > For these reasons I decided to implement a kernel based approach similar
> > to x86 device assignment. There is a new qemu device (s390-pci) representing a
> 
> I fail to see the rationale and I definitely don't want to see anything
> even remotely similar to the legacy x86 device assignment on s390 ;).
> 
> Can't we just enhance VFIO?
> 

Probably yes, but we need some vfio changes (kernel and qemu)

> Also, I think we'll get the cleanest model if we start off with an
> implementation that allows us to add emulated PCI devices to an s390x
> machine and only then follow on with physical ones.
> 

I can already do this. With some more s390 intercepts a device can be detected and
the guest is able to access config/memory space. Unfortunately the s390 platform does
not support I/O BARs, so none of the emulated devices will work on the platform ...

> 
> Alex



Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Frank Blaschka
On Fri, Sep 05, 2014 at 10:35:59AM +0200, Alexander Graf wrote:
> 
> 
> On 05.09.14 09:46, Frank Blaschka wrote:
> > On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
> >> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> >>> This set of patches implements pci pass-through support for qemu/KVM on s390.
> >>> PCI support on s390 is very different from other platforms.
> >>> Major differences are:
> >>>
> >>> 1) all PCI operations are driven by special s390 instructions
> >>
> >> Generating config cycles is always arch specific.
> >>
> >>> 2) all s390 PCI instructions are privileged
> >>
> >> While the operations to generate config cycles on x86 are not
> >> privileged, they must be arbitrated between accesses, so in a sense
> >> they're privileged.
> >>
> >>> 3) PCI config and memory spaces can not be mmap'ed
> >>
> >> VFIO has mapping flags that allow any region to specify mmap support.
> >>
> > 
> > Hi Alex,
> > 
> > thx for your reply.
> > 
> > Let me elaborate a little bit more on 1 - 3. Config and memory space can not
> > be accessed via memory operations. You have to use special s390 instructions.
> > These instructions can not be executed in user space. So there is no other
> > way than executing these instructions in the kernel. Yes, vfio does support a
> > slow path via ioctl we could use, but this seems suboptimal from a performance
> > point of view.
> 
> Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
> to call into the kernel for every PCI access, but I still think that
> VFIO provides the correct abstraction layer for us to use. If nothing
> else, it would at least give us identical configuration to x86 and nice
> debuggability on par with the other platforms.
> 
> >  
> >>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >>>of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
> >>
> >> VFIO delivers interrupts as eventfds regardless of the underlying
> >> platform mechanism.
> >>
> > 
> > yes that's right, but then we have to do platform specific stuff to present
> > the irq to the guest. I do not say this is impossible but we have to add s390
> > specific code to vfio.
> 
> Not at all - interrupt delivery is completely transparent to VFIO.
>

interrupt yes, but MSIX no
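For reference, the eventfd mechanism Alex Williamson mentions can be shown in a few lines of Linux userspace C. This sketch only demonstrates eventfd semantics (a counter that accumulates signals until read); it does not use VFIO, where the eventfd would be registered for a device interrupt with the VFIO_DEVICE_SET_IRQS ioctl. The helper names are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>

/* Producer: stands in for the host side signalling an interrupt. */
static int signal_irq(int efd)
{
    return eventfd_write(efd, 1);
}

/* Consumer: returns how many signals accumulated since the last read,
 * or 0 if none are pending (the fd is assumed to be EFD_NONBLOCK). */
static uint64_t consume_irqs(int efd)
{
    eventfd_t count = 0;

    if (eventfd_read(efd, &count) < 0) {
        return 0;               /* e.g. EAGAIN when the counter is zero */
    }
    return count;
}
```

Because the counter accumulates, several interrupts signalled before the consumer runs are reported by a single read with a count greater than one.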
 
> > 
> >>> 5) For DMA access there is always an IOMMU required.
> >>
> >> x86 requires the same.
> >>
> >>>  s390 pci implementation
> >>>does not support a complete memory to iommu mapping, dma mappings are
> >>>created on request.
> >>
> >> Sounds like POWER.
> > 
> > I don't know the details of POWER; maybe it is similar but not the same.
> > We might be able to extend vfio to have a new interface allowing
> > us to do DMA mappings on request.
> 
> We already have that.
>

Great, can you give me some pointers how to use? Thx!
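Alex does not say in this thread which interface he means. Assuming it is the VFIO type1 IOMMU map/unmap ioctls (VFIO_IOMMU_MAP_DMA and VFIO_IOMMU_UNMAP_DMA from linux/vfio.h), a minimal sketch of mapping and unmapping one range on request might look like this. Whether the type1 backend fits the s390 IOMMU model is itself an open assumption. container_fd is assumed to be a /dev/vfio/vfio container already set up with VFIO_SET_IOMMU, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill a type1 DMA map request for one IOVA range backed by vaddr. */
static struct vfio_iommu_type1_dma_map make_dma_map(void *vaddr, uint64_t iova,
                                                    uint64_t size, int writable)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ;
    if (writable) {
        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
    }
    map.vaddr = (uint64_t)(uintptr_t)vaddr;
    map.iova = iova;
    map.size = size;
    return map;
}

/* Map on request, e.g. when the guest programs an IOMMU entry. */
static int map_on_request(int container_fd, void *vaddr, uint64_t iova,
                          uint64_t size, int writable)
{
    struct vfio_iommu_type1_dma_map map =
        make_dma_map(vaddr, iova, size, writable);

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

/* Tear the mapping down again when the guest invalidates it. */
static int unmap_on_request(int container_fd, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap;

    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = sizeof(unmap);
    unmap.iova = iova;
    unmap.size = size;
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}
```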
 
> > 
> >>
> >>> 6) The OS does not get any information about the physical layout
> >>>of the PCI bus.
> >>
> >> If that means that every device is isolated (seems unlikely for
> >> multifunction devices) then that makes IOMMU group support really easy.
> >>
> > 
> > OK
> >  
> >>> 7) To take advantage of system z specific virtualization features
> >>>we need to access the SIE control block residing in the kernel KVM
> >>
> >> The KVM-VFIO device allows interaction between VFIO devices and KVM.
> >>
> >>> 8) To enable system z specific virtualization features we have to manipulate
> >>>the zpci device in kernel.
> >>
> >> VFIO supports different device backends, currently pci_dev and working
> >> towards platform devices.  zpci might just be an extension to standard
> >> pci.
> >>
> > 
> > 7 - 8 At least this is not as straightforward as the pure kernel approach, but
> > I have to dig into that in more detail if we could only agree on a vfio solution.
> 
> Please do so, yes :).
> 
> > 
> >>> For these reasons I decided to implement a kernel based approach similar
> >>> to x86 device assignment. There is a new qemu device (s390-pci) representing a
> >>> pass through device on the host. Here is a sample qemu device configuration:
> >>>
> >>> -device s390-pci,host=:00:00.0
> >>>
> >>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy instance
> >>> in the kernel KVM and connect this instance to the host pci device.
> >>>
> >>> kernel patches apply to linux-kvm
> >>>
> >>> s390: cio: chsc function to register GIB
> >>> s390: pci: export pci functions for pass-through usage
> >>> KVM: s390: Add GISA support
> >>> KVM: s390: Add PCI pass-through support
> >>>
> >>> qemu patches apply to qemu-master
> >>>
> >>> s390: Add PCI bus support
> >>> s390: Add PCI pass-through device support
> >>>
> >>> Feedback and discussion is highly welcome ...
> >>
> >> KVM-based device assignment needs to go away.  It's a horrible model for
> >> devices, it offers very little protection to the kernel, assumes every
> >> device is full

Re: [PATCH v5 1/3] contrib: add ivshmem client and server

2014-09-05 Thread David Marchand

Hello Michael,

On 09/04/2014 05:56 PM, Michael S. Tsirkin wrote:

> > +    /* create the unix listening socket */
> > +    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
> > +    if (sock_fd < 0) {
> > +        debug_log(server, "cannot create socket: %s\n", strerror(errno));
> > +        goto err_close_shm;
> > +    }
> > +
> > +    sun.sun_family = AF_UNIX;
> > +    snprintf(sun.sun_path, sizeof(sun.sun_path), "%s", server->unix_sock_path);
> > +    unlink(sun.sun_path);

> why unlink it?


Yes, this is wrong, because this means that when starting multiple
servers on the same socket, the last server is the one who wins ...
while I think it should be the opposite (first server wins, as it may
have some connected clients).

I have been scratching my head about this: when should I unlink?

My current fix unlinks from ivshmem_server_close() (which should be the
right place).
I need to call this when exiting, but I can only do so when the server
exits gracefully (when an error occurs on the server socket or when
receiving a SIGTERM).

If something unexpected happens (like a bug/crash or a SIGKILL), the
socket won't be unlinked and the next server process will refuse to start.
Is this acceptable?

Do you have a better idea ?
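One common pattern (an assumption here, not what the patch does) is to unlink at startup only when the existing socket is stale: probe the path with connect(), treat ECONNREFUSED or ENOENT as "nobody is listening", and refuse to start if a live server answers. This keeps "first server wins" while surviving a SIGKILL. The check-then-unlink step is racy if two servers start at the same moment. The helper names below are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static void fill_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    snprintf(addr->sun_path, sizeof(addr->sun_path), "%s", path);
}

/* 1: a live server accepts connections on path; 0: path is missing or
 * stale (no listener); -1: unexpected error. */
static int socket_path_in_use(const char *path)
{
    struct sockaddr_un addr;
    int fd, ret, saved_errno;

    fill_addr(&addr, path);
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    saved_errno = errno;              /* close() may clobber errno */
    close(fd);
    if (ret == 0) {
        return 1;
    }
    if (saved_errno == ECONNREFUSED || saved_errno == ENOENT) {
        return 0;
    }
    return -1;
}

/* Create the listening socket, unlinking the path only if it is stale,
 * so the first running server keeps its clients. */
static int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (socket_path_in_use(path) != 0) {
        return -1;                    /* live server (or error): refuse */
    }
    unlink(path);                     /* stale leftover, e.g. after SIGKILL */

    fill_addr(&addr, path);
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Graceful shutdown can still unlink the path in the close routine; the probe only covers the crash case.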


Thanks.

--
David Marchand


Re: [PATCH v5 3/3] ivshmem: add check on protocol version in QEMU

2014-09-05 Thread David Marchand

Hello Stefan,

On 09/05/2014 12:29 PM, Stefan Hajnoczi wrote:

> On Thu, Sep 04, 2014 at 02:51:01PM +0200, David Marchand wrote:

> > diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c
> > index ad210c8..0c4e016 100644
> > --- a/contrib/ivshmem-client/ivshmem-client.c
> > +++ b/contrib/ivshmem-client/ivshmem-client.c
> > @@ -184,10 +184,18 @@ ivshmem_client_connect(IvshmemClient *client)
> >          goto err_close;
> >      }
> >  
> > -    /* first, we expect our index + a fd == -1 */
> > +    /* first, we expect a protocol version */
> > +    if (read_one_msg(client, &tmp, &fd) < 0 ||
> > +        (tmp != IVSHMEM_PROTOCOL_VERSION) || fd != -1) {
> > +        debug_log(client, "cannot read from server\n");
> > +        goto err_close;
> > +    }
> > +    debug_log(client, "our_id=%ld\n", client->local.id);


> This debug_log() is probably not intentional.  local.id will always be
> -1 here so the output is not useful.


Yes, this is most likely a merge/rebase issue.
Will remove this.




> > +static void ivshmem_check_version(void *opaque, const uint8_t * buf, int flags)
> > +{
> > +    IVShmemState *s = opaque;
> > +    PCIDevice *dev = PCI_DEVICE(s);
> > +    int tmp;
> > +    long version;
> > +
> > +    memcpy(&version, buf, sizeof(long));
> > +    tmp = qemu_chr_fe_get_msgfd(s->server_chr);
> > +    if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
> > +        fprintf(stderr, "incompatible version, you are connecting to a ivhsmem-"


Hum, typo: ivhs -> ivsh.


> > +                "server using a different protocol please check your setup\n");
> > +        qemu_chr_delete(s->server_chr);
> > +        s->server_chr = NULL;
> > +        return;
> > +    }
> > +
> > +    IVSHMEM_DPRINTF("version check ok, finish init and switch to real chardev "
> > +                    "handler\n");
> > +
> > +    pci_register_bar(dev, 2, s->ivshmem_attr, &s->bar);


> Not sure if it is okay to delay PCI initialization to an fd handler
> callback.
>
> If the version message is too slow the guest could see the PCI adapter
> without the BAR!
>
> Did you move this code in order to prevent the guest from accessing the
> device before it has connected to the server?  Perhaps the device needs
> a state field that tracks whether or not it is ready for operation.  Any
> access before RUNNING state is reached will be ignored (?).


Yes, exactly.

There already is a synchronisation mechanism described in the documentation:
"When using the server, since the server is a separate process, the VM 
ID will only be set when the device is ready (shared memory is received 
from the server and accessible via the device).  If the device is not 
ready, the IVPosition will return -1.
Applications should ensure that they have a valid VM ID before accessing 
the shared memory."


So actually, this move is unneeded if ivshmem users comply with this.

I will leave the init stuff (pci_register_bar + gmalloc) where it was
before; ivshmem_check_version will only switch the chardev handler.

What do you think about this?


Thanks.

--
David Marchand


Re: [RFC v2 0/9] KVM-VFIO IRQ forward control

2014-09-05 Thread Eric Auger
On 09/02/2014 11:05 PM, Alex Williamson wrote:
> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
>> This RFC proposes an integration of "ARM: Forwarding physical
>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
>> KVM.
>>
>> It makes it possible to transform a VFIO platform driver IRQ into a forwarded
>> IRQ. The direct benefit is that, for a level-sensitive IRQ, a VM
>> switch can be avoided on guest virtual IRQ completion. Before this
>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
>>
>> When the IRQ is forwarded, the VFIO platform driver does not need to
>> disable the IRQ anymore. Indeed when returning from the IRQ handler
>> the IRQ is not deactivated. Only its priority is lowered. This means
>> the same IRQ cannot hit before the guest completes the virtual IRQ
>> and the GIC automatically deactivates the corresponding physical IRQ.
>>
>> Besides, the injection is still based on irqfd triggering. The only
>> impact on the irqfd process is that resamplefd is no longer called on
>> virtual IRQ completion, since the latter becomes "transparent".
>>
>> The current integration is based on an extension of the KVM-VFIO
>> device, previously used by KVM to interact with VFIO groups. The
>> patch series now enables KVM to directly interact with a VFIO
>> platform device. The VFIO external API was extended for that purpose.
>>
>> The KVM-VFIO device can get/put the vfio platform device, check its
>> integrity and type, get the IRQ number associated to an IRQ index.
>>
>> The IRQ forward programming is architecture specific (virtual interrupt
>> controller programming basically). However the whole infrastructure is
>> kept generic.
>>
>> From a user point of view, the functionality is provided through new
>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
>> Assignment can only be changed when the physical IRQ is not active.
>> It is the responsibility of the user to do this check.
>>
>> This patch series has the following dependencies:
>> - "ARM: Forwarding physical interrupts to a guest VM"
>>   (http://lwn.net/Articles/603514/)
>> - [PATCH v3] irqfd for ARM
>> - and obviously the VFIO platform driver series:
>>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
>>   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>
>> Integrated pieces can be found at
>> ssh://git.linaro.org/people/eric.auger/linux.git
>> on branch 3.17rc3_irqfd_forward_integ_v2
>>
>> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
>>
>> v1 -> v2:
>> - forward control is moved from architecture specific file into generic
>>   vfio.c module.
>>   only kvm_arch_set_fwd_state remains architecture specific
>> - integrate Kim's patch which enables KVM-VFIO for ARM
>> - fix vgic state bypass in vgic_queue_hwirq
>> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>>   to include/uapi/linux/kvm.h
>>   also irq_index renamed into index and guest_irq renamed into gsi
>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>> - vfio_external_get_base_device renamed into vfio_external_base_device
>> - vfio_external_get_type removed
>> - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>
>> Eric Auger (8):
>>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ
>>   KVM: ARM: VGIC: add forwarded irq rbtree lock
>>   VFIO: platform: handler tests whether the IRQ is forwarded
>>   KVM: KVM-VFIO: update user API to program forwarded IRQ
>>   VFIO: Extend external user API
>>   KVM: KVM-VFIO: add new VFIO external API hooks
>>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
>>   KVM: KVM-VFIO: ARM forwarding control
>>
>> Kim Phillips (1):
>>   ARM: KVM: Enable the KVM-VFIO device
>>
>>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
>>  arch/arm/include/asm/kvm_host.h            |   7 +
>>  arch/arm/kvm/Kconfig                       |   1 +
>>  arch/arm/kvm/Makefile                      |   4 +-
>>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +
>>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
>>  drivers/vfio/vfio.c                        |  24 ++
>>  include/kvm/arm_vgic.h                     |   1 +
>>  include/linux/kvm_host.h                   |  27 ++
>>  include/linux/vfio.h                       |   3 +
>>  include/uapi/linux/kvm.h                   |   9 +
>>  virt/kvm/arm/vgic.c                        |  59 +++-
>>  virt/kvm/vfio.c                            | 497 -
>>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>>
> 
> Have we ventured too far in the other direction?  I suppose what I was
> hoping to see was something more like:
> 
>   case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> 
>  

Re: [PATCH] KVM: x86: fix kvmclock breakage from timers branch merge

2014-09-05 Thread Thomas Gleixner
On Thu, 4 Sep 2014, Paolo Bonzini wrote:

> On 04/09/2014 22:58, Thomas Gleixner wrote:
> > This is simply wrong.
> 
> It is.
> 
> > Now I have no idea why you think it needs to add xtime_sec. If the
> > result is wrong, then we need to figure out which one of the supplied
> > values is wrong and not blindly add xtime_sec just because that makes
> > it magically correct.
> > 
> > Can you please provide a proper background why you think that adding
> > xtime_sec is a good idea?
> 
> It's not a good idea indeed.  I didn't fully digest the 3.16->3.17
> timekeeping changes and messed up this patch.
> 
> However, there is a bug in the "base_mono + offs_boot" formula, given
> that:
> 
> - bisection leads to the merge commit of John's timers branch
> 
> - bisecting within John's timers branch, with a KVM commit on top to
>   make the code much easier to trigger, leads to commit cbcf2dd3b3d4
>   (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based,
>   2014-07-16).
> 
> - I backported your patch to 3.16, using wall_to_monotonic +
>   total_sleep_time + xtime_sec (wtm+xtime_sec as in pre-cbcf2dd3b3d4
>   code, total_sleep_time from 3.16 monotonic_to_bootbased) and it works
> 
> - In v2 of the patch I fixed the bug by changing the formula
>   "base_mono + offs_boot" to "offs_boot - offs_real" (and then adding
>   xtime_sec separately as in the 3.16 backport), but the two formulas
>   "base_mono + offs_boot" and "offs_boot - offs_real + xtime_sec" ought
>   to be identical.

So let's look at the differences here:

3.16                                    3.17

inject_sleeptime(delta)                 inject_sleeptime(delta)

    xtime += delta;                         xtime += delta;

    wall_to_mono -= delta;                  wall_to_mono -= delta;
    offs_real = -wall_to_mono;              offs_real = -wall_to_mono;

    sleeptime += delta;
    offs_boot = sleeptime;                  offs_boot += delta;

getboottime()

    tmp = wall_to_mono + sleeptime;
    boottime = -tmp;

  So:
    boottime = -wall_to_mono - sleeptime;

  Because of the above:
    offs_real = -wall_to_mono;
    offs_boot = sleeptime;

  The result is:

    boottime = offs_real - offs_boot;       boottime = offs_real - offs_boot;

monotonic_to_bootbased(mono)            monotonic_to_bootbased(mono)

    return mono + sleeptime;

  Because of the above:
    offs_boot = sleeptime;

  The result is:

    return mono + offs_boot;                return mono + offs_boot;

Now on the KVM side we have

update_pvclock_gtod()                   update_pvclock_gtod()

    mono_base = xtime + wall_to_mono;       boot_base = mono_base + offs_boot;

and

do_monotonic()                          do_monotonic_boot()

    mono_now = mono_base + delta_ns;        boot_now = boot_base + delta_ns;

kvm_get_time_and_clockread()

    mono_now = do_monotonic();

    boot_now = mono_to_boot(mono_now);

So both sides compute the same:

    boot_now = mono_base + offs_boot + delta_ns;

So that means the code is correct. Now where is the bug?
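The equivalence argued above can be checked mechanically with a toy model that uses plain integers in place of the kernel's time types. The structs and functions below are hypothetical and only encode the equations from the comparison; they are not kernel code.

```c
#include <stdint.h>

/* Toy 3.16 state: sleep time is tracked separately from offs_boot. */
struct tk316 {
    int64_t xtime, wall_to_mono, sleeptime, offs_real, offs_boot;
};

/* Toy 3.17 state: offs_boot accumulates the sleep time directly. */
struct tk317 {
    int64_t xtime, wall_to_mono, offs_real, offs_boot;
};

static void inject_sleeptime_316(struct tk316 *tk, int64_t delta)
{
    tk->xtime += delta;
    tk->wall_to_mono -= delta;
    tk->offs_real = -tk->wall_to_mono;
    tk->sleeptime += delta;
    tk->offs_boot = tk->sleeptime;
}

static void inject_sleeptime_317(struct tk317 *tk, int64_t delta)
{
    tk->xtime += delta;
    tk->wall_to_mono -= delta;
    tk->offs_real = -tk->wall_to_mono;
    tk->offs_boot += delta;
}

/* 3.16: boottime = -(wall_to_mono + sleeptime) */
static int64_t getboottime_316(const struct tk316 *tk)
{
    return -(tk->wall_to_mono + tk->sleeptime);
}

/* 3.17: boottime = offs_real - offs_boot */
static int64_t getboottime_317(const struct tk317 *tk)
{
    return tk->offs_real - tk->offs_boot;
}

static int64_t mono_to_boot_316(const struct tk316 *tk, int64_t mono)
{
    return mono + tk->sleeptime;
}

static int64_t mono_to_boot_317(const struct tk317 *tk, int64_t mono)
{
    return mono + tk->offs_boot;
}
```

Starting both models from the same consistent state (offs_real == -wall_to_mono, no sleep yet), they agree after any sequence of injections, so the formulas themselves cannot be the bug.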

Well hidden and still so obvious that it's even visible through the
brown paper bag I'm wearing ...

Thanks,

tglx

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index fb4a9c2cf8d9..ec1791fae965 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -442,11 +442,12 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 		tk->ntp_error = 0;
 		ntp_clear();
 	}
-	update_vsyscall(tk);
-	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
 
 	tk_update_ktime_data(tk);
 
+	update_vsyscall(tk);
+	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
+
 	if (action & TK_MIRROR)
 		memcpy(&shadow_timekeeper, &tk_core.timekeeper,
 		       sizeof(tk_core.timekeeper));



Re: [PATCH] KVM: x86: fix kvmclock breakage from timers branch merge

2014-09-05 Thread Paolo Bonzini
On 05/09/2014 17:14, Thomas Gleixner wrote:
> So that means the code is correct. Now where is the bug?

In kernel/time/timekeeping.c?

We know that we should have

  base_mono = wall_to_monotonic + xtime_sec

Instead it is

  base_mono = wall_to_monotonic + xtime_sec
  - seconds from boot time

which is... zero.  Given this is the only use of base_mono in a
notifier, I wonder if it is as simple as this (which I don't have time
to test right now):

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index fb4a9c2cf8d9..f6807a85b8c9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -443,9 +443,9 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 		ntp_clear();
 	}
 	update_vsyscall(tk);
-	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
 
 	tk_update_ktime_data(tk);
+	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
 
 	if (action & TK_MIRROR)
 		memcpy(&shadow_timekeeper, &tk_core.timekeeper,

:)

Paolo

> Well hidden and still so obvious that it's even visible through the
> brown paper bag I'm wearing ...



Re: [PATCH v3 0/4 resend] Introduce device assignment flag operation helper function

2014-09-05 Thread Bjorn Helgaas
On Fri, Aug 08, 2014 at 01:36:03PM +0800, Ethan Zhao wrote:
> This patch set introduces three PCI device flag operation helper functions
> for setting a PCI device PF/VF to assigned or deassigned status and for
> checking it. Patches 2, 3 and 4 apply these helper functions to KVM, XEN and PCI.
> 
> v2: simplify unnecessary ternary operation in function pci_is_dev_assigned().
> v3: amend helper function naming.
> 
> Appreciate suggestions from
> alex.william...@redhat.com,
> david.vra...@citrix.com,
> alexander.h.du...@intel.com
> 
> Resend for v3.16 building.
> 
> Thanks,
> Ethan
> ---
> Ethan Zhao (4):
>   PCI: introduce helper functions for device flag operation
>   KVM: use pci device flag operation helper functions
>   xen-pciback: use pci device flag operation helper function
>   PCI: use device flag operation helper function in iov.c
> 
>  drivers/pci/iov.c                  |    2 +-
>  drivers/xen/xen-pciback/pci_stub.c |    4 ++--
>  include/linux/pci.h                |   13 +
>  virt/kvm/assigned-dev.c            |    2 +-
>  virt/kvm/iommu.c                   |    4 ++--
>  5 files changed, 19 insertions(+), 6 deletions(-)
> 

Applied to pci/virtualization for v3.18, thanks!
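The helpers in this series wrap one device flag bit in set/clear/test accessors (the cover letter names pci_is_dev_assigned()). A userspace sketch of that pattern follows; struct toy_pci_dev, TOY_DEV_FLAGS_ASSIGNED and its bit value, and the toy_* function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative bit value only; the kernel defines its own flag. */
#define TOY_DEV_FLAGS_ASSIGNED (1u << 2)

struct toy_pci_dev {
    unsigned int dev_flags;
};

static inline void toy_set_dev_assigned(struct toy_pci_dev *pdev)
{
    pdev->dev_flags |= TOY_DEV_FLAGS_ASSIGNED;
}

static inline void toy_clear_dev_assigned(struct toy_pci_dev *pdev)
{
    pdev->dev_flags &= ~TOY_DEV_FLAGS_ASSIGNED;
}

/* Plain boolean test: no ternary needed. */
static inline bool toy_is_dev_assigned(const struct toy_pci_dev *pdev)
{
    return (pdev->dev_flags & TOY_DEV_FLAGS_ASSIGNED) != 0;
}
```

Callers then no longer open-code the bit manipulation, and other flags in dev_flags are left untouched by set and clear.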


Re: [PATCH] KVM: x86: fix kvmclock breakage from timers branch merge

2014-09-05 Thread Thomas Gleixner
On Fri, 5 Sep 2014, Paolo Bonzini wrote:

> On 05/09/2014 17:14, Thomas Gleixner wrote:
> > So that means the code is correct. Now where is the bug?
> 
> In kernel/time/timekeeping.c?
> 
> We know that we should have
> 
>   base_mono = wall_to_monotonic + xtime_sec
> 
> Instead it is
> 
>   base_mono = wall_to_monotonic + xtime_sec
>   - seconds from boot time
> 
> which is... zero.  Given this is the only use of base_mono in a
> notifier, I wonder if it is as simple as this (which I don't have time
> to test right now):
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index fb4a9c2cf8d9..f6807a85b8c9 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -443,9 +443,9 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
>   ntp_clear();
>   }
>   update_vsyscall(tk);
> - update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
> 
>   tk_update_ktime_data(tk);
> + update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);

Why are you moving the update between vsyscall and pvclock update as I
did in my patch? We really need to update everything before calling
somewhere.
 
And yes, it is that simple. I instrumented the stuff and it's correct
now.
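The root cause is an ordering problem: base_mono is cached data refreshed by tk_update_ktime_data(), and update_pvclock_gtod() was called before that refresh, so the notifier saw stale values. A toy model of the before/after ordering follows, with hypothetical names and plain integers in place of struct timekeeper. Thomas's patch also moves update_vsyscall() after the refresh, so every consumer sees the updated data.

```c
#include <stdint.h>

/* Toy model: base_mono is a cache that must be recomputed from xtime
 * and wall_to_mono before anyone consumes it. */
struct toy_tk {
    int64_t xtime, wall_to_mono;
    int64_t base_mono;          /* refreshed by toy_update_ktime_data() */
};

static int64_t notifier_seen;   /* what the pvclock notifier observed */

static void toy_update_ktime_data(struct toy_tk *tk)
{
    tk->base_mono = tk->xtime + tk->wall_to_mono;
}

static void toy_update_pvclock_gtod(const struct toy_tk *tk)
{
    notifier_seen = tk->base_mono;  /* consumer reads the cache */
}

/* Order before the fix: notifier runs first and sees the stale cache. */
static void toy_timekeeping_update_buggy(struct toy_tk *tk)
{
    toy_update_pvclock_gtod(tk);
    toy_update_ktime_data(tk);
}

/* Order after the fix: refresh the cache, then notify. */
static void toy_timekeeping_update_fixed(struct toy_tk *tk)
{
    toy_update_ktime_data(tk);
    toy_update_pvclock_gtod(tk);
}
```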

Thanks

tglx


Re: [PATCH] KVM: x86: fix kvmclock breakage from timers branch merge

2014-09-05 Thread Paolo Bonzini
On 05/09/2014 20:33, Thomas Gleixner wrote:
>> >update_vsyscall(tk);
>> > -  update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
>> > 
>> >tk_update_ktime_data(tk);
>> > +  update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
> Why are you moving the update between vsyscall and pvclock update as I
> did in my patch? We really need to update everything before calling
> somewhere.

Do you mean the call should be moved not just after tk_update_ktime_data
(which sets base_mono), but further down after

update_fast_timekeeper(tk);

?

Paolo


Re: [PATCH] KVM: x86: fix kvmclock breakage from timers branch merge

2014-09-05 Thread Thomas Gleixner
On Fri, 5 Sep 2014, Paolo Bonzini wrote:

> On 05/09/2014 20:33, Thomas Gleixner wrote:
> >> >  update_vsyscall(tk);
> >> > -update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
> >> > 
> >> >  tk_update_ktime_data(tk);
> >> > +update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
> > Why are you moving the update between vsyscall and pvclock update as I
> > did in my patch? We really need to update everything before calling
> > somewhere.
> 
> Do you mean the call should be moved not just after tk_update_ktime_data
> (which sets base_mono), but further down after
> 
> update_fast_timekeeper(tk);

No, it needs to be above update_vsyscall(). Here is the patch again
which I sent before. [https://lkml.org/lkml/2014/9/5/395]

Thanks,

tglx

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index fb4a9c2cf8d9..ec1791fae965 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -442,11 +442,12 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 		tk->ntp_error = 0;
 		ntp_clear();
 	}
-	update_vsyscall(tk);
-	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
 
 	tk_update_ktime_data(tk);
 
+	update_vsyscall(tk);
+	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
+
 	if (action & TK_MIRROR)
 		memcpy(&shadow_timekeeper, &tk_core.timekeeper,
 		       sizeof(tk_core.timekeeper));
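The ordering problem this patch addresses can be illustrated with a toy model: a derived field (standing in for base_mono, which tk_update_ktime_data() sets) must be refreshed before any consumer (standing in for update_vsyscall() and the pvclock notifier) takes its snapshot. This is a heavily simplified sketch under stated assumptions; struct toy_tk, its fields and all toy_* helpers are hypothetical and are not the kernel's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the timekeeper: raw state plus
 * one field derived from it. Not the real struct timekeeper. */
struct toy_tk {
	int64_t xtime_ns;        /* raw wall-clock state */
	int64_t wall_to_mono_ns; /* offset from wall clock to monotonic */
	int64_t base_mono;       /* derived: xtime_ns + wall_to_mono_ns */
};

/* Snapshot taken by a consumer (think vsyscall data or pvclock) on update. */
static int64_t consumer_base_mono;

/* Stand-in for tk_update_ktime_data(): recompute the derived field. */
static void toy_update_ktime_data(struct toy_tk *tk)
{
	tk->base_mono = tk->xtime_ns + tk->wall_to_mono_ns;
}

/* Stand-in for update_vsyscall()/update_pvclock_gtod(): copy derived data. */
static void toy_notify_consumers(const struct toy_tk *tk)
{
	consumer_base_mono = tk->base_mono;
}

/* Old order: consumers copy base_mono before it is recomputed,
 * so they publish the value from the previous update. */
static void toy_update_buggy(struct toy_tk *tk)
{
	toy_notify_consumers(tk);
	toy_update_ktime_data(tk);
}

/* Patched order: recompute derived data first, then notify. */
static void toy_update_fixed(struct toy_tk *tk)
{
	toy_update_ktime_data(tk);
	toy_notify_consumers(tk);
}
```

Calling toy_update_buggy() leaves the consumer holding the stale base_mono, while toy_update_fixed() publishes the freshly computed value, which is the point of moving both update calls below tk_update_ktime_data().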


Re: [GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Linus Torvalds
On Fri, Sep 5, 2014 at 3:16 AM, Paolo Bonzini  wrote:
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

Nothing new there. Forgot to push out, or perhaps to use "-f" to
overwrite the previous tag of the same name?

   Linus


Re: [GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Thomas Gleixner
On Fri, 5 Sep 2014, Linus Torvalds wrote:

> On Fri, Sep 5, 2014 at 3:16 AM, Paolo Bonzini  wrote:
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus
> 
> Nothing new there. Forgot to push out, or perhaps to use "-f" to
> overwrite the previous tag of the same name?

And even if there would be something, please do not pull the top most
commit b11ba8c62be3eb (KVM: x86: fix kvmclock breakage from timers
branch merge).

That one is blatantly wrong and just hacks badly around a brown
paperbag bug in the core timekeeping code, which I introduced in the
last overhaul.

It's debugged and understood. Fix is posted and if confirmed by the
KVM folks it will go to you before rc4.

Thanks,

tglx




Re: [PATCH] KVM: x86: fix kvmclock breakage from timers branch merge

2014-09-05 Thread Paolo Bonzini
Il 05/09/2014 22:41, Thomas Gleixner ha scritto:
> No, it needs to be above update_vsyscall(). Here is the patch again
> which I sent before. [https://lkml.org/lkml/2014/9/5/395]

Ah, I missed it after your signature.  Thanks, I'll test yours then next
week.

Paolo


Re: [GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Paolo Bonzini
Il 05/09/2014 22:58, Thomas Gleixner ha scritto:
> Nothing new there. Forgot to push out, or perhaps to use "-f" to
> overwrite the previous tag of the same name?

It's there now.  Probably a --dry-run too much (I have
push=+refs/tags/for-linus:refs/tags/for-linus in the remote configuration).

> And even if there would be something, please do not pull the top most
> commit b11ba8c62be3eb (KVM: x86: fix kvmclock breakage from timers
> branch merge).
> 
> That one is blatantly wrong and just hacks badly around a brown
> paperbag bug in the core timekeeping code, which I introduced in the
> last overhaul.

It's not wrong, it's just different.  The commit message says clearly
that besides acting as a workaround, I find the patched code easier to
understand, and I clearly stated the same in the tag message.

Paolo


Re: [GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Thomas Gleixner
On Fri, 5 Sep 2014, Paolo Bonzini wrote:

> Il 05/09/2014 22:58, Thomas Gleixner ha scritto:
> > Nothing new there. Forgot to push out, or perhaps to use "-f" to
> > overwrite the previous tag of the same name?
> 
> It's there now.  Probably a --dry-run too much (I have
> push=+refs/tags/for-linus:refs/tags/for-linus in the remote configuration).
> 
> > And even if there would be something, please do not pull the top most
> > commit b11ba8c62be3eb (KVM: x86: fix kvmclock breakage from timers
> > branch merge).
> > 
> > That one is blatantly wrong and just hacks badly around a brown
> > paperbag bug in the core timekeeping code, which I introduced in the
> > last overhaul.
> 
> It's not wrong, it's just different.  The commit message says clearly

Right, it's different. Because you paper at the receiving end over a
core bug and that's wrong to begin with.

> that besides acting as a workaround, I find the patched code easier to
> understand, and I clearly stated the same in the tag message.

Well, we might have different opinions about easier to understand. I
went to great lengths to disentangle the monotonic boot time in which
you are interested from xtime, because the latter does not make any
sense outside of the core timekeeping code. Aside of that I optimized
the whole thing to avoid conversions, loops and hoops. So you just add
another multiply and add to make it more understandable. Sigh.
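The cost difference described here can be sketched in isolation: deriving a nanosecond boot time from a split seconds/nanoseconds pair costs a multiply and an add on every read, whereas a base already kept as scalar nanoseconds needs only an add. This is an illustration under assumptions, not the actual timekeeping or kvmclock code; both toy structs and their field names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000LL

/* Hypothetical split representation: seconds + nanoseconds, plus an
 * offset to reach boot time. Reading it needs a multiply and an add. */
struct toy_split_time {
	int64_t sec;
	int64_t nsec;           /* assumed normalized: 0 <= nsec < 1e9 */
	int64_t boot_offset_ns;
};

static int64_t toy_boot_ns_from_split(const struct toy_split_time *t)
{
	return t->sec * TOY_NSEC_PER_SEC + t->nsec + t->boot_offset_ns;
}

/* Hypothetical scalar representation: the base is precomputed in
 * nanoseconds once per update, so each read is a single add. */
struct toy_scalar_time {
	int64_t base_mono_ns;
	int64_t offs_boot_ns;
};

static int64_t toy_boot_ns_from_scalar(const struct toy_scalar_time *t)
{
	return t->base_mono_ns + t->offs_boot_ns;
}
```

Both functions yield the same value for equivalent inputs; the scalar form simply moves the conversion out of the read path.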

Thanks,

tglx








Re: [GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Paolo Bonzini
Il 05/09/2014 23:24, Thomas Gleixner ha scritto:
> 
>> > that besides acting as a workaround, I find the patched code easier to
>> > understand, and I clearly stated the same in the tag message.
> Well, we might have different opinions about easier to understand. I
> went to great lengths to disentangle the monotonic boot time in which
> you are interested from xtime, because the latter does not make any
> sense outside of the core timekeeping code. Aside of that I optimized
> the whole thing to avoid conversions, loops and hoops. So you just add
> another multiply and add to make it more understandable. 

Fair enough, I've dropped the patch.

Thanks for helping out with the core timekeeping fix.

Paolo


[GIT PULL v2] KVM changes for 3.17-rc4

2014-09-05 Thread Paolo Bonzini
Linus,

The following changes since commit 30d1e0e806e5b2fadc297ba78f2d7afd6ba309cf:

  virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it (2014-08-19 15:12:28 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 02a68d0503fa470abff8852e10b1890df5730a08:

  powerpc/kvm/cma: Fix panic introduces by signed shift operation (2014-09-03 10:34:07 +0200)


A smattering of bug fixes across most architectures.


Christian Borntraeger (4):
  KVM: s390: Fix user triggerable bug in dead code
  KVM: s390/mm: try a cow on read only pages for key ops
  KVM: s390/mm: Fix storage key corruption during swapping
  KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags

Christoffer Dall (1):
  arm/arm64: KVM: Complete WFI/WFE instructions

Laurent Dufour (1):
  powerpc/kvm/cma: Fix panic introduces by signed shift operation

Paolo Bonzini (3):
  Merge tag 'kvm-s390-20140825' of git://git.kernel.org/.../kvms390/linux into kvm-master
  Merge tag 'kvm-arm-for-v3.17-rc3' of git://git.kernel.org/.../kvmarm/kvmarm into HEAD
  Merge tag 'kvm-s390-master-20140902' of git://git.kernel.org/.../kvms390/linux into kvm-master

Pranavkumar Sawargaonkar (1):
  ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU

 arch/arm/kvm/handle_exit.c          |  2 ++
 arch/arm/kvm/init.S                 |  4
 arch/arm64/kvm/handle_exit.c        |  2 ++
 arch/arm64/kvm/hyp-init.S           |  4
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  4 ++--
 arch/s390/include/asm/pgtable.h     |  6 --
 arch/s390/kvm/kvm-s390.c            | 13 -
 arch/s390/mm/pgtable.c              | 10 ++
 8 files changed, 28 insertions(+), 17 deletions(-)


Re: [GIT PULL] KVM changes for 3.17-rc4

2014-09-05 Thread Thomas Gleixner
On Fri, 5 Sep 2014, Paolo Bonzini wrote:

> Il 05/09/2014 23:24, Thomas Gleixner ha scritto:
> > 
> >> > that besides acting as a workaround, I find the patched code easier to
> >> > understand, and I clearly stated the same in the tag message.
> > Well, we might have different opinions about easier to understand. I
> > went to great lengths to disentangle the monotonic boot time in which
> > you are interested from xtime, because the latter does not make any
> > sense outside of the core timekeeping code. Aside of that I optimized
> > the whole thing to avoid conversions, loops and hoops. So you just add
> > another multiply and add to make it more understandable. 
> 
> Fair enough, I've dropped the patch.

Though I agree that the struct member name choice is not the most
brilliant one, so feel free to fix that.
 
> Thanks for helping out with the core timekeeping fix.

I'm still banging my head against the wall


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 05.09.14 13:55, Frank Blaschka wrote:
> On Fri, Sep 05, 2014 at 10:35:59AM +0200, Alexander Graf wrote:
>>
>>
>> On 05.09.14 09:46, Frank Blaschka wrote:
>>> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>>>> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
>>>>> This set of patches implements pci pass-through support for qemu/KVM on s390.
>>>>> PCI support on s390 is very different from other platforms.
>>>>> Major differences are:
>>>>>
>>>>> 1) all PCI operations are driven by special s390 instructions
>>>>
>>>> Generating config cycles is always arch specific.
>>>>
>>>>> 2) all s390 PCI instructions are privileged
>>>>
>>>> While the operations to generate config cycles on x86 are not
>>>> privileged, they must be arbitrated between accesses, so in a sense
>>>> they're privileged.
>>>>
>>>>> 3) PCI config and memory spaces can not be mmap'ed
>>>>
>>>> VFIO has mapping flags that allow any region to specify mmap support.
>>>>
>>>
>>> Hi Alex,
>>>
>>> thx for your reply.
>>>
>>> Let me elaborate a little bit more on 1 - 3. Config and memory space can not
>>> be accessed via memory operations. You have to use special s390 instructions.
>>> These instructions can not be executed in user space. So there is no other
>>> way than executing these instructions in kernel. Yes vfio does support a
>>> slow path via ioctl we could use, but this seems suboptimal from a performance
>>> point of view.
>>
>> Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
>> to call into the kernel for every PCI access, but I still think that
>> VFIO provides the correct abstraction layer for us to use. If nothing
>> else, it would at least give us identical configuration to x86 and nice
>> debuggability on par with the other platforms.
>>
>>>  
>>>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>>>
>>>> VFIO delivers interrupts as eventfds regardless of the underlying
>>>> platform mechanism.
>>>>
>>>
>>> yes that's right, but then we have to do platform specific stuff to present
>>> the irq to the guest. I do not say this is impossible but we have to add s390
>>> specific code to vfio.
>>
>> Not at all - interrupt delivery is completely transparent to VFIO.
>>
> 
> interrupt yes, but MSIX no
>  
>>>
>>>>> 5) For DMA access there is always an IOMMU required.
>>>>
>>>> x86 requires the same.
>>>>
>>>>>  s390 pci implementation
>>>>>    does not support a complete memory to iommu mapping, dma mappings are
>>>>>    created on request.
>>>>
>>>> Sounds like POWER.
>>>
>>> Don't know the details from power, maybe it is similar but not the same.
>>> We might be able to extend vfio to have a new interface allowing
>>> us to do DMA mappings on request.
>>
>> We already have that.
>>
> 
> Great, can you give me some pointers how to use? Thx!

Sure! :)

So on POWER (sPAPR) you get a list of page entries that describe the
device -> ram mapping. Every time you want to modify any of these
entries, you need to invoke a hypercall (H_PUT_TCE).

So every time the guest wants to add a DMA window at runtime, we trap into
put_tce_emu() in hw/ppc/spapr_iommu.c. Here we call
memory_region_notify_iommu().

This call goes either to an emulated IOMMU context for emulated devices
or to the special VFIO IOMMU context for VFIO devices.

In the VFIO case, we end up in vfio_iommu_map_notify() at hw/misc/vfio.c
which calls ioctl(VFIO_IOMMU_MAP_DMA) at the end of the day. The
in-kernel implementation of the host IOMMU provider uses this map to
create the virtual DMA window map.

Basically, VFIO *only* supports "DMA mappings on request" as you call
them. Prepopulated DMA windows are just a coincidence that may or may
not happen.
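To make the "DMA mappings on request" flow concrete, here is a toy model: the guest installs one translation entry at a time (as with H_PUT_TCE on sPAPR), each change is forwarded to an IOMMU notifier (in the real VFIO case that path ends in ioctl(VFIO_IOMMU_MAP_DMA)), and the device can only translate addresses that have been installed. This is a self-contained sketch under assumptions; the table layout, the 4 KiB page size, and every toy_* name are hypothetical, real TCE permission bits are omitted, and no QEMU or VFIO API is called.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumed 4 KiB IOMMU pages */
#define TOY_TABLE_SIZE 16   /* assumed size of the DMA window, in pages */

/* Hypothetical per-device translation table plus its change notifier. */
struct toy_tce_table {
	uint64_t entry[TOY_TABLE_SIZE];                        /* 0 = unmapped */
	void (*notify)(uint64_t iova, uint64_t gpa, int map);  /* IOMMU notifier */
};

/* Counters standing in for the host IOMMU provider behind the notifier. */
static int toy_maps, toy_unmaps;

static void toy_vfio_notify(uint64_t iova, uint64_t gpa, int map)
{
	(void)iova;
	(void)gpa;
	if (map)
		toy_maps++;
	else
		toy_unmaps++;
}

/* Emulated hypercall: set or clear one entry, then notify the listener. */
static int toy_put_tce(struct toy_tce_table *t, uint64_t ioba, uint64_t tce)
{
	uint64_t idx = ioba >> TOY_PAGE_SHIFT;

	if (idx >= TOY_TABLE_SIZE)
		return -1;              /* outside the DMA window */
	t->entry[idx] = tce;
	if (t->notify)
		t->notify(idx << TOY_PAGE_SHIFT, tce, tce != 0);
	return 0;
}

/* Device-side lookup: succeeds only for entries installed on request. */
static int toy_translate(const struct toy_tce_table *t, uint64_t iova,
			 uint64_t *gpa)
{
	uint64_t idx = iova >> TOY_PAGE_SHIFT;

	if (idx >= TOY_TABLE_SIZE || t->entry[idx] == 0)
		return -1;
	*gpa = t->entry[idx] + (iova & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1));
	return 0;
}
```

A translation fails until the guest calls toy_put_tce() for that page, which mirrors the point that nothing is prepopulated unless the guest asks for it.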

I hope that makes it slightly more clear what the path looks like :). If
you have more questions on this, don't hesitate to ask.


Alex


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 05.09.14 13:39, Frank Blaschka wrote:
> On Fri, Sep 05, 2014 at 10:21:27AM +0200, Alexander Graf wrote:
>>
>>
>> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM on s390.
>>> PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>> 2) all s390 PCI instructions are privileged
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> That's ok, vfio abstracts config space anyway.
>>
>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> This is in line with other implementations. Interrupts go from
>>
>>   device -> PHB -> PIC -> CPU
>>
>> (sometimes you can have another converter device in between)
>>
>> In your case, the PHB converts INTX and MSI interrupts to Adapter
>> interrupts to go to the floating interrupt controller. Same thing as
>> everyone else really.
>>
> 
> Yes, I think this can be done, but we need s390 specific changes in vfio.
> 
>>> 5) For DMA access there is always an IOMMU required. s390 pci implementation
>>>does not support a complete memory to iommu mapping, dma mappings are
>>>created on request.
>>
>> Sounds great :). So I suppose we should implement a guest facing IOMMU?
>>
>>> 6) The OS does not get any information about the physical layout
>>>of the PCI bus.
>>
>> So how does it know whether different devices are behind the same IOMMU
>> context? Or can we assume that every device has its own context?
> 
> Actually yes

That greatly simplifies things. Awesome :).

> 
>>
>>> 7) To take advantage of system z specific virtualization features
>>>we need to access the SIE control block residing in the kernel KVM
>>
>> Please elaborate.
>>
>>> 8) To enable system z specific virtualization features we have to manipulate
>>>the zpci device in kernel.
>>
>> Why?
>>
> 
> We have following s390 specific virtualization features:
> 
> 1) interpretive execution of pci load/store instructions. If we use this function
>    pci access does not get intercepted (no SIE exit) but is handled via microcode.
>    To enable this we have to disable the zpci device and enable it again with
>    information from the SIE control block.

Hrm. So how about you create a special vm ioctl for KVM that allows you
to attach a VFIO device fd into the KVM VM context? Then the default
would stay "accessible by mmap traps", but we could accelerate it with KVM.

>    A further problem in qemu is that vfio traps access to the
>    MSIX table, so we have to find another way of programming MSIX if we do not get
>    intercepts for memory space access.

We trap access to the MSIX table because it's a shared resource. If it's
not shared for you, there's no need to trap it.

> 2) Adapter event forwarding (with alerting). This is a mechanism by which the adapter
>    event (irq) is directly forwarded to the guest. To set this up we also need to
>    manipulate the zpci device (in kernel) with information from the SIE block.
>    Exploiting GISA is only one part of this mechanism.

How does this work when the VM is not running (because it's idle)?

Either way, we have a very similar thing on x86. It's called "posted
interrupts" there. I'm not sure everything's in place for VFIO and
posted interrupts to work properly, but whatever we do it sounds like
the interfaces and configuration flow should be identical.
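Whatever the acceleration path, the baseline interface Alex Williamson mentions earlier in the thread is that VFIO signals device interrupts to userspace through eventfds. A minimal sketch of eventfd semantics on Linux follows: each raised interrupt increments a counter, and a read returns the accumulated count and resets it to zero (no EFD_SEMAPHORE). The toy_raise_irq() and toy_collect_irqs() helpers are hypothetical; in a real setup the host kernel signals the fd and QEMU, or KVM via irqfd, consumes it.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Producer side (hypothetical helper): one call per interrupt raised. */
static int toy_raise_irq(int efd)
{
	return eventfd_write(efd, 1);      /* adds 1 to the eventfd counter */
}

/* Consumer side (hypothetical helper): returns the number of interrupts
 * raised since the last read, or -1 if none are pending (EAGAIN on a
 * non-blocking fd) or the read failed. The counter is reset to 0. */
static int64_t toy_collect_irqs(int efd)
{
	eventfd_t count;

	if (eventfd_read(efd, &count) != 0)
		return -1;
	return (int64_t)count;
}
```

Usage: create the fd with eventfd(0, EFD_NONBLOCK), raise a few interrupts, then one read collects them all; several interrupts raised before the consumer runs coalesce into a single wakeup with a count.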

> Both might be possible with some more or less nice looking vfio extensions. As I said
> before, we have to dig into this more. Also, these can be further optimization steps
> later, once we have a running vfio implementation on the platform.

Yup :). That's the nice part about it.

>  
>>>
>>> For these reasons I decided to implement a kernel based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci) representing a
>>
>> I fail to see the rationale and I definitely don't want to see anything
>> even remotely similar to the legacy x86 device assignment on s390 ;).
>>
>> Can't we just enhance VFIO?
>>
> 
> Probably yes, but we need some vfio changes (kernel and qemu)

We need changes either way ;). So let's better do the right ones.

> 
>> Also, I think we'll get the cleanest model if we start off with an
>> implementation that allows us to add emulated PCI devices to an s390x
>> machine and only then follow on with physical ones.
>>
> 
> I can already do this. With some more s390 intercepts a device can be detected and
> the guest is able to access config/memory space. Unfortunately the s390 platform does
> not support I/O BARs, so none of the emulated devices will work on the platform ...

Oh? How about "nec-usb-xhci" or "intel-hda"?


Alex