Re: [PATCH] KVM/X86: Check input sreg values before loading vcpu

2018-03-08 Thread rkrc...@redhat.com
2018-02-27 06:57+, Tianyu Lan:
> From: Lan Tianyu 
> 
> This patch is to check sreg value first and then load vcpu in order
> to avoid redundant loading/putting vcpu.
> 
> Signed-off-by: Lan Tianyu 
> ---

Patch "KVM: x86: KVM_CAP_SYNC_REGS" made significant changes to the
sregs setter, so the patch cannot be applied in current form.

I think that moving the X86_CR4_OSXSAVE check to guest_cpuid_has still
makes sense, but avoiding the vcpu_load/put would produce worse code
elsewhere and avoiding the load/put is not critical as any error is
probably going to be the end for this VM.

Thanks.


Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-21 Thread rkrc...@redhat.com
2016-01-21 13:44+0800, Yang Zhang:
> On 2016/1/21 13:41, Wu, Feng wrote:
>>>From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>>>We may have different understanding on PI mode. My understanding is if
>>>we set the IRTE to PI format, than the subsequent interrupt will be
>>>handled in PI mode. multi-cast and broadcast interrupts cannot be
>>>injected to guest directly but it doesn't mean cannot be handled in PI
>>>mode. As i said, we can handle it in wake up vector or via other
>>>approach.But it is much complexity.

KVM has to intercept the interrupt, so we'd need to trigger a deferred
work from the notification handler to send the multicast.
Reusing existing PI vectors would mean slowing them down, so we should
define a new PI notification vector just for this purpose, which would
be confusing in /proc/interrupts anyway.
On top of that, we'd need to define new PIRR array(s) and create unique
PID for every IRTE, to avoid parsing those PIRR arrays as the vector is
stored in IRTE ... it's going a bit too far, I guess.

>>For the multicast/broastcast, we cannot set the related IRTE in PI
>>mode, since we cannot set only one destination in IRTE. If an interrupt
>>is for multiple destination, how can you use VT-d PI to injection it
>>to all the destinations?
> 
> You may still not get my point. Anyway, it doesn't matter. Rollback to
> legacy mode still is the best choice so far.

I think we can't do much better than we do now.


Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-21 Thread rkrc...@redhat.com
2016-01-21 05:33+, Wu, Feng:
>> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
>> ow...@vger.kernel.org] On Behalf Of Yang Zhang
>> On 2016/1/20 9:42, Feng Wu wrote:
>> > +  /*
>> > +   * We may find a hardware disabled LAPIC here, if
>> that
>> > +   * is the case, print out a error message once for each
>> > +   * guest and return.
>> > +   */
>> > +  if (!dst[idx-1] &&
>> > +  (kvm->arch.disabled_lapic_found == 0)) {
>> > +  kvm->arch.disabled_lapic_found = 1;
>> > +  printk(KERN_ERR
>> > +  "Disabled LAPIC found during irq
>> injection\n");
>> > +  goto out;
>> 
>> What does "goto out" mean? Inject successfully or fail? According the
>> value of ret which is set to ture here, it means inject successfully but

(true actually means that fast path did the job and slow path isn't
 needed.)

>> i = -1.

(I think there isn't a practical difference between *r=-1 and *r=0.)

> Oh, I didn't notice 'ret' is initialized to true, I thought it was initialized
> to false like another function, I should add a "ret = false' here. We should
> failed to inject the interrupt since hardware disabled LAPIC is found.

'ret = true' is the better one.  We know that the interrupt is not
deliverable [1], so there's no point in trying to deliver with the slow
path.  We behave similarly when the interrupt targets a single disabled
APIC.

---
1: Well ... it's possible that slowpath would deliver it thanks to
   different handling of disabled APICs, but it's undefined behavior,
   so it doesn't matter matter if we don't try.


Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-22 Thread rkrc...@redhat.com
2016-01-22 10:03+0800, Yang Zhang:
> On 2016/1/22 0:35, rkrc...@redhat.com wrote:
>>2016-01-21 13:44+0800, Yang Zhang:
>>>On 2016/1/21 13:41, Wu, Feng wrote:
>>>>>From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>>>>>We may have different understanding on PI mode. My understanding is if
>>>>>we set the IRTE to PI format, than the subsequent interrupt will be
>>>>>handled in PI mode. multi-cast and broadcast interrupts cannot be
>>>>>injected to guest directly but it doesn't mean cannot be handled in PI
>>>>>mode. As i said, we can handle it in wake up vector or via other
>>>>>approach.But it is much complexity.
>>
>>KVM has to intercept the interrupt, so we'd need to trigger a deferred
>>work from the notification handler to send the multicast.
>>Reusing existing PI vectors would mean slowing them down, so we should
>>define a new PI notification vector just for this purpose, which would
>>be confusing in /proc/interrupts anyway.
>>On top of that, we'd need to define new PIRR array(s) and create unique
>>PID for every IRTE, to avoid parsing those PIRR arrays as the vector is
>>stored in IRTE ... it's going a bit too far, I guess.
> 
> Not so complicated. We can reuse the wake up vector and check whether the
> interrupt is multicast when one of destination vcpu handles it.

I'm not sure what you mean now ... I guess it is:
- Deliver the interrupt to a guest VCPU and relay the multicast to other
  VCPUs.  No, it's strictly worse than intercepting it in the host.

- Modify host's wakeup vector handler to send the multicast.
  It's so complicated, because all information you start with in the
  host is a vector number.  You start with no idea what the multicast
  interrupt should be.

  We could add per-multicast PID to the list of parsed PIDs in
  wakeup_handler and use PID->multicast interrupt mapping to tell which
  interrupt we should send, but that seems worse than just delivering a
  non-remapped interrupt.

  Also, if wakeup vector were used for wakeup and multicast, we'd be
  uselessly doing work, because we can't tell which reason triggered the
  interrupt before finishing one part -- using separate vectors for that
  would be a bit nicer.


Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-22 Thread rkrc...@redhat.com
2016-01-22 12:00+0800, Yang Zhang:
> On 2016/1/22 1:21, rkrc...@redhat.com wrote:
>>(I think there isn't a practical difference between *r=-1 and *r=0.)
> 
> Currently, if *r == -1, the remote_irr may get set. But it seems wrong. I

Yeah ...

> need to have a double check to see whether it is a bug in current code.

Looking forward to the patch!

Thanks.

>>'ret = true' is the better one.  We know that the interrupt is not
>>deliverable [1], so there's no point in trying to deliver with the slow
>>path.  We behave similarly when the interrupt targets a single disabled
>>APIC.
>>
>>---
>>1: Well ... it's possible that slowpath would deliver it thanks to
>>different handling of disabled APICs, but it's undefined behavior,
> 
> why it is undefined behavior? Besides, why we will keep two different
> handling logic for the fast path and slow path? It looks weird.

It does look very weird ... the slow path would require refactoring,
though, so we save effort without a considerable drawback.
(I would love if it behaved identically, but I don't want to force it on
 someone and likely won't do it myself ...)

I consider it undefined because SMD says that an OS musn't configure
this behavior and doesn't say what should happen if the OS does => we
could do anything.  (Killing the guest would be great for debugging OS
issues, but ours behavior is fairly conservative.)


Re: [PATCH 0/6] KVM/x86: Add workaround to support ExtINT with AVIC

2019-04-04 Thread rkrc...@redhat.com
2019-03-22 11:57+, Suthikulpanit, Suravee:
> This series is one of the prerequisites for supporting AMD AVIC with
> in-kernel irqchip (kernel_irqchip=on).
> 
> Since AVIC does not support ExtINT interrupt, which is required during
> the booting phase of Windows and FreeBSD VMs (e.g. PIT -> PIC -> ExtInt).
> This results in VM hang in the boot loader with kernel_irqchip=on mode.
> 
> This series provides workaround by temporary deactivate AVIC and fallback
> to use legacy interrupt injection (w/ vINTR and interrupt window).
> Then re-activate AVIC once the intrrupts are handled.

The solution looks reasonable (although it seems dangerous to me) as we
only enable the workaround if the interrupt cannot be immediately
injected.

The interesting part is that split irqchip works, yet it cannot inject
ExtInt correctly either -- does the guest OS behave differently, or does
split irqchip just ignore the ExtInt?

Thanks.


Re: [PATCH 0/6] KVM/x86: Add workaround to support ExtINT with AVIC

2019-04-04 Thread rkrc...@redhat.com
2019-03-22 11:57+, Suthikulpanit, Suravee:
> This series is one of the prerequisites for supporting AMD AVIC with
> in-kernel irqchip (kernel_irqchip=on).
> 
> Since AVIC does not support ExtINT interrupt, which is required during
> the booting phase of Windows and FreeBSD VMs (e.g. PIT -> PIC -> ExtInt).
> This results in VM hang in the boot loader with kernel_irqchip=on mode.
> 
> This series provides workaround by temporary deactivate AVIC and fallback
> to use legacy interrupt injection (w/ vINTR and interrupt window).
> Then re-activate AVIC once the intrrupts are handled.

Hm, another idea.  It is possible to inject the ExtInt in APICv, but if
interrupt injection is currently disabled, we need to wait until the
interrupt window opens, which can't be done with int_* controls n.

Wouldn't intercepting IRET/STGI/IF writes be enough to eventually reach
the point where we can do event_inj?

Thanks.


Re: [PATCH v5 9/9] Documentation: virtual: kvm: Support vcpu preempted check

2016-10-21 Thread rkrc...@redhat.com
2016-10-21 11:27+, David Laight:
> From: Pan Xinhui
>> Sent: 20 October 2016 22:28
>> Commit ("x86, kvm: support vcpu preempted check") add one field "__u8
>> preempted" into struct kvm_steal_time. This field tells if one vcpu is
>> running or not.
>> 
>> It is zero if 1) some old KVM deos not support this filed. 2) the vcpu is
>> preempted. Other values means the vcpu has been preempted.
>> 
>> Signed-off-by: Pan Xinhui 
>> ---
>>  Documentation/virtual/kvm/msr.txt | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/Documentation/virtual/kvm/msr.txt 
>> b/Documentation/virtual/kvm/msr.txt
>> index 2a71c8f..3376f13 100644
>> --- a/Documentation/virtual/kvm/msr.txt
>> +++ b/Documentation/virtual/kvm/msr.txt
>> @@ -208,7 +208,8 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
>>  __u64 steal;
>>  __u32 version;
>>  __u32 flags;
>> -__u32 pad[12];
>> +__u8  preempted;
>> +__u32 pad[11];
>>  }
> 
> I think I'd be explicit about the 3 pad bytes you've left.

Seconded.

With that change are all KVM bits

Acked-by: Radim Krčmář 


Re: [PATCH v5 9/9] Documentation: virtual: kvm: Support vcpu preempted check

2016-10-24 Thread rkrc...@redhat.com
2016-10-24 16:42+0200, Paolo Bonzini:
> On 21/10/2016 20:39, rkrc...@redhat.com wrote:
>> 2016-10-21 11:27+, David Laight:
>>> From: Pan Xinhui
>>>> Sent: 20 October 2016 22:28
>>>> Commit ("x86, kvm: support vcpu preempted check") add one field "__u8
>>>> preempted" into struct kvm_steal_time. This field tells if one vcpu is
>>>> running or not.
>>>>
>>>> It is zero if 1) some old KVM deos not support this filed. 2) the vcpu is
>>>> preempted. Other values means the vcpu has been preempted.
>>>>
>>>> Signed-off-by: Pan Xinhui 
>>>> ---
>>>>  Documentation/virtual/kvm/msr.txt | 8 +++-
>>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/msr.txt 
>>>> b/Documentation/virtual/kvm/msr.txt
>>>> index 2a71c8f..3376f13 100644
>>>> --- a/Documentation/virtual/kvm/msr.txt
>>>> +++ b/Documentation/virtual/kvm/msr.txt
>>>> @@ -208,7 +208,8 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
>>>>__u64 steal;
>>>>__u32 version;
>>>>__u32 flags;
>>>> -  __u32 pad[12];
>>>> +  __u8  preempted;
>>>> +  __u32 pad[11];
>>>>}
>>>
>>> I think I'd be explicit about the 3 pad bytes you've left.
>> 
>> Seconded.
>> 
>> With that change are all KVM bits
>> 
>> Acked-by: Radim Krčmář 
> 
> Saw this after replying to the previous message.  If you need to post v6
> of the full series, it would be nice if you removed the
> kvm_read_guest_cached.  But anyway it wasn't my intention to override Radim.

The patch was acceptable to me even now, so I definitely wouldn't mind
if it were even nicer. :)


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-22 Thread rkrc...@redhat.com
2015-12-22 07:19+, Wu, Feng:
>> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>> On 2015/12/22 14:59, Wu, Feng wrote:
>> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>> >> On 2015/12/16 9:37, Feng Wu wrote:
>> >>> +for_each_set_bit(i, &bitmap, 16) {
>> >>> +if (!dst[i]
>> >> && !kvm_lapic_enabled(dst[i]->vcpu)) {
>> >>
>> >> It should be or(||) not and (&&).
>> >
>> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
>> 
>>  btw, i think the kvm_lapic_enabled check is wrong here? Why need it 
>>  here?
>> >>>
>> >>> If the lapic is not enabled, I think we cannot recognize it as a 
>> >>> candidate, can
>> >> we?
>> >>> Maybe Radim can confirm this, Radim, what is your option?

SDM 10.6.2.2 Logical Destination Mode:
  For both configurations of logical destination mode, when combined
  with lowest priority delivery mode, software is responsible for
  ensuring that all of the local APICs included in or addressed by the
  IPI or I/O subsystem interrupt are present and enabled to receive the
  interrupt.

The case is undefined if some targeted LAPICs weren't hardware enabled
as no interrupts can be delivered to hardware disabled LAPIC, so we can
check for hardware enabled.

It's not obvious if "enabled to receive the interrupt" means hardware or
software enabled, but lowest priority cannot deliver NMI/INIT/..., so
checking for software enabled doesn't restrict any valid uses either.

so ... KVM only musn't blow up when encountering this situation :)

The current code seems correct, but redundant.  Just for reference, KVM
now does:
- check for software enabled LAPIC since patch aefd18f01ee8 ("KVM: x86:
  In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's")
- check only for hardware enabled LAPIC in the fast path, since
  1e08ec4a130e ("KVM: optimize apic interrupt delivery"))

(v1 was arguable better, I pointed the need for enabled LAPIC in v1 only
 from looking at one KVM function, sorry.)

>> >> Lapic can be disable by hw or sw. Here we only need to check the hw is
>> >> enough which is already covered while injecting the interrupt into
>> >> guest. I remember we(Glab, Macelo and me) have discussed it several ago,
>> >> but i cannot find the mail thread.
>>
>> >
>> > But if the lapic is disabled by software, we cannot still inject 
>> > interrupts to
>> > it, can we?
>> 
>> Yes, We cannot inject the normal interrupt. But this already covered by
>> current logic and add a check here seems meaningless. Conversely, it may
>> do bad thing..
>> 
> 
> Let's wait for Radim/Paolo's opinions about this.

I'd pick whatever results in less code: this time it seems like checking
for hardware enabled LAPIC in both paths (implicitly in the fast path).
Maybe it can be done better, I haven't given it much thought.

We should revert aefd18f01ee8 at the same time, so our PI/non-PI slow
paths won't diverge -- I hope it wasn't fixing a bug :)

I'll review the series tomorrow, thanks for your patience.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-23 Thread rkrc...@redhat.com
2015-12-23 02:12+, Wu, Feng:
>> From: rkrc...@redhat.com [mailto:rkrc...@redhat.com]
>> 2015-12-22 07:19+, Wu, Feng:
>> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>> >> On 2015/12/22 14:59, Wu, Feng wrote:
>> >> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>> >> >>>>>> On 2015/12/16 9:37, Feng Wu wrote:
>> The case is undefined if some targeted LAPICs weren't hardware enabled
>> as no interrupts can be delivered to hardware disabled LAPIC, so we can
>> check for hardware enabled.
>> 
>> It's not obvious if "enabled to receive the interrupt" means hardware or
>> software enabled, but lowest priority cannot deliver NMI/INIT/..., so
>> checking for software enabled doesn't restrict any valid uses either.
>> 
>> so ... KVM only musn't blow up when encountering this situation :)
>> 
>> The current code seems correct, but redundant.  Just for reference, KVM
>> now does:
>> - check for software enabled LAPIC since patch aefd18f01ee8 ("KVM: x86:
>>   In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's")
>> - check only for hardware enabled LAPIC in the fast path, since
>>   1e08ec4a130e ("KVM: optimize apic interrupt delivery"))
> 
> Software enabled LAPIC is also checked in patch 1e08ec4a130e
> ("KVM: optimize apic interrupt delivery"), however, it was removed
> in patch 3b5a5ffa928a3f875b0d5dd284eeb7c322e1688a.

Right, thanks.  (The software check was actually removed in 173beedc1601
("KVM: x86: Software disabled APIC should still deliver NMIs"), which
introduced a two pass mechanism that was later simplified.)

>Now I am
> a little confused about the policy, when and where should we do
> the software/hardware enabled check?

It's a mess, I think we'd like both checks to be done early and ideally
only in one place.

The fast path would like to precompute as much as possible, but only
hardware enabled affects all interrupts (like non-present LAPIC);
software disabled still needs an extra condition for every interrupt.

>> I'd pick whatever results in less code: this time it seems like checking
>> for hardware enabled LAPIC in both paths (implicitly in the fast path).
>> Maybe it can be done better, I haven't given it much thought.
>> 
>> We should revert aefd18f01ee8 at the same time, so our PI/non-PI slow
>> paths won't diverge -- I hope it wasn't fixing a bug :)
> 
> From the change log, It seems to me this patch was fixing a bug.

Yeah, I found the original discussion
  RFC: http://www.spinics.net/lists/kvm/msg36190.html
  v1:  http://www.spinics.net/lists/kvm/msg36395.html
  v2:  http://www.spinics.net/lists/kvm/msg36651.html

that led to some explanation in bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=596223 (a clone of
  https://bugzilla.redhat.com/show_bug.cgi?id=505527)

It seems that kexec on VCPU != 0 did something with BSP APIC ID that
resulted in a wrong delivery -- I didn't look where the bug was, but the
solution we adopted is probably just a lucky workaround.
Makes sense to look deeper into it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-23 Thread rkrc...@redhat.com
2015-12-22 14:42+0800, Yang Zhang:
> On 2015/12/22 12:36, Wu, Feng wrote:
>>>From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>>>On 2015/12/21 9:55, Wu, Feng wrote:
>From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
>On 2015/12/16 9:37, Feng Wu wrote:
>>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>@@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm
>>>*kvm,
>unsigned int host_irq,
>>   */
>>
>>  kvm_set_msi_irq(e, &irq);
>>- if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>- continue;
>>+
>>+ if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>+ if (!kvm_vector_hashing_enabled() ||
>>+ irq.delivery_mode !=
>APIC_DM_LOWEST)
>>+ continue;
>>+
>>+ vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
>>+ if (!vcpu)
>>+ continue;
>>+ }
>
>I am a little confused with the 'continue'. If the destination is not
>single vcpu, shouldn't we rollback to use non-PI mode?

Here is the logic:
- If it is single destination, we will use PI no matter it is fixed or 
lowest-priority.
- If it is not single destination:
a) It is fixed, we will use non-PI
b) It is lowest-priority and vector-hashing is enabled, we will use PI
c) otherwise, use non-PI
>>>
>>>If it is single destination previously, then change to no-single mode.
>>>Can current code cover this case?
>>
>>In my test, before setting irq affinity (change single vcpu to non-single vcpu
>>in this case), the guest will mask the interrupt first, so before getting 
>>here, IRTE
>>has been changed back to remapped mode already(when guest masks the MSIx,
>>we will change back to remapped mode), hence nothing needed here.
>>
>>Digging into the linux code (guest) a bit more, I found that if interrupt 
>>remapping
>>is not enabled in the guest (IR is not supported for guest anyway), it will 
>>always
>>mask the MSI/MSIx before setting the irq affinity. So the code should work
>>well currently.
> 
> We should not rely on guest's behavior. From code level, it need be fixed.
> 
>>However, for robustness, I think explicitly changing IRTE back to remapped
>>mode for the 'continue' case should be a good idea.
> 
> This is what i am looking for.

I agree, that would be a nice addition.

IIRC, the masking is optional -- if the guest can handle interrupts that
are generated while the device is half-configured, it doesn't need to
disable MSIs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-26 Thread rkrc...@redhat.com
2016-01-26 09:44+0800, Yang Zhang:
> On 2016/1/25 21:59, rkrc...@redhat.com wrote:
>>2016-01-25 09:49+0800, Yang Zhang:
>>>On 2016/1/22 21:31, rkrc...@redhat.com wrote:
>>>>2016-01-22 10:03+0800, Yang Zhang:
>>>>>Not so complicated. We can reuse the wake up vector and check whether the
>>>>>interrupt is multicast when one of destination vcpu handles it.
>>>>
>>>>I'm not sure what you mean now ... I guess it is:
>>>>- Deliver the interrupt to a guest VCPU and relay the multicast to other
>>>>   VCPUs.  No, it's strictly worse than intercepting it in the host.
>>>
>>>It is still handled in host context not guest context. The wakeup event
>>>cannot be consumed like posted event.
>>
>>Ok.  ("when one of destination vcpu handles it" confused me into
>>thinking that you'd like to handle it with the notification vector.)
> 
> Sorry for my poor english. :(

It's good.  Ambiguity is hard to avoid if a reader doesn't want to
assume only the most likely meaning.

>>>>   Also, if wakeup vector were used for wakeup and multicast, we'd be
>>>>   uselessly doing work, because we can't tell which reason triggered the
>>>>   interrupt before finishing one part -- using separate vectors for that
>>>>   would be a bit nicer.
>>
>>(imprecise -- we would always have to check for ON bit of all PIDs from
>>  blocked VCPUs, for the original meaning of wakeup vector, and always
> 
> This is what KVM does currently.

Yep.

>>  either read the PIRR or check for ON bit of all PIDs that encode
>>  multicast interrupts;  then we have to clear ON bits for multicasts.)
> 
> Also, most part of work is covered by current logic except checking the
> multicast.

We could reuse the setup that gets us to wakeup_handler, but there is
nothing to share in the handler itself.  Sharing a handler means that we
always have to execute both parts.

We must create new PID anyway and compared to the extra work needed for
multicast handling, a new vector + handler is a relatively small code
investment that adds clarity to the design (and performance).

(Taking the vector splitting to the extreme, we'd improve performance if
 we added a vector per assigned device.  That is practically the same as
 non-posted mode, just more complicated.)

>>---
>>There might be a benefit of using posted interrupts for host interrupts
>>when we run out of free interrupt vectors:  we could start using vectors
>>by multiple sources through posted interrupts, if using posted
> 
> Do you mean per vcpu posted interrupts?

I mean using posting for host device interrupts (no virt involved).

Let's say we have 300 devices for one CPU and CPU has 200 useable
vectors.  We have 100 device interrupts that need to be shared in some
vectors and using posting might be faster than directly checking
multiple devices.

(I couldn't come up with a plausible scenario where we might want to use
 posting for host interrupts.)


Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-27 Thread rkrc...@redhat.com
2016-01-27 10:07+0800, Yang Zhang:
> On 2016/1/27 2:22, rkrc...@redhat.com wrote:
>>2016-01-26 09:44+0800, Yang Zhang:
>>>On 2016/1/25 21:59, rkrc...@redhat.com wrote:
>>>>>>   Also, if wakeup vector were used for wakeup and multicast, we'd be
>>>>>>   uselessly doing work, because we can't tell which reason triggered the
>>>>>>   interrupt before finishing one part -- using separate vectors for that
>>>>>>   would be a bit nicer.
>>>>
>>>>(imprecise -- we would always have to check for ON bit of all PIDs from
>>>>  blocked VCPUs, for the original meaning of wakeup vector, and always
>>>>  either read the PIRR or check for ON bit of all PIDs that encode
>>>>  multicast interrupts;  then we have to clear ON bits for multicasts.)
>>>
>>>Also, most part of work is covered by current logic except checking the
>>>multicast.
>>
>>We could reuse the setup that gets us to wakeup_handler, but there is
>>nothing to share in the handler itself.  Sharing a handler means that we
>>always have to execute both parts.
> 
> I don't quite understand it. There is nothing need to be modified for wakeup
> logic. The only thing we need to do is add the checking before the vcpu pick
> up the pending interrupt(This is happened in VCPU context, not in handler).

I see, there are few problems with that.

>>We must create new PID anyway and compared to the extra work needed for
>>multicast handling, a new vector + handler is a relatively small code
>>investment that adds clarity to the design (and performance).
> 
> No new PID is needed. If the target vcpu is running, no additional work is
> required in wakeup handler. If target vcpu is not running, the current logic
> will wake up the vcpu, then let vcpu itself to check whether pending
> interrupt is a multicast and handle it in vcpu's context.

We do need a new PID.  The existing VCPU PID switches between wakeup
vector and notification vector, so if the VCPU was running when the
device triggered an interrupt, we'd deliver the posted interrupt without
exiting, but we need to handle the interrupt in the host.

=> We need at least one PID that is never set to notification vector.

Reusing VCPU's PIRR is in new PID(s) is not doable.
Parsing PIRR would be our only option of recognizing multicast
interrupts and if the guest configured many sources to send the same
vector, we'd have to do unacceptable things to tell which one was
triggered.

=> We also need at least on one new PIRR.

Handling the interrupt in VCPU context doesn't pose any advantage and we
even want to do it outside, because all VCPUs can be running when the
interrupt arrives and can therefore be posted further.

I hope I covered other disadvantages of PIDs and PIRRs earlier.


Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-25 Thread rkrc...@redhat.com
2016-01-25 09:49+0800, Yang Zhang:
> On 2016/1/22 21:31, rkrc...@redhat.com wrote:
>>2016-01-22 10:03+0800, Yang Zhang:
>>>Not so complicated. We can reuse the wake up vector and check whether the
>>>interrupt is multicast when one of destination vcpu handles it.
>>
>>I'm not sure what you mean now ... I guess it is:
>>- Deliver the interrupt to a guest VCPU and relay the multicast to other
>>   VCPUs.  No, it's strictly worse than intercepting it in the host.
> 
> It is still handled in host context not guest context. The wakeup event
> cannot be consumed like posted event.

Ok.  ("when one of destination vcpu handles it" confused me into
thinking that you'd like to handle it with the notification vector.)

>   So it relies on hypervisor to inject
> the interrupt to guest. We can add the check at this point.

Yes, but I don't think we want to do that, because of following
drawbacks:

>>- Modify host's wakeup vector handler to send the multicast.
>>   It's so complicated, because all information you start with in the
>>   host is a vector number.  You start with no idea what the multicast
>>   interrupt should be.
>>
>>   We could add per-multicast PID to the list of parsed PIDs in
>>   wakeup_handler and use PID->multicast interrupt mapping to tell which
>>   interrupt we should send, but that seems worse than just delivering a
>>   non-remapped interrupt.

(should have been "remapped, but non-posted".)

>>   Also, if wakeup vector were used for wakeup and multicast, we'd be
>>   uselessly doing work, because we can't tell which reason triggered the
>>   interrupt before finishing one part -- using separate vectors for that
>>   would be a bit nicer.

(imprecise -- we would always have to check for ON bit of all PIDs from
 blocked VCPUs, for the original meaning of wakeup vector, and always
 either read the PIRR or check for ON bit of all PIDs that encode
 multicast interrupts;  then we have to clear ON bits for multicasts.)


---
There might be a benefit of using posted interrupts for host interrupts
when we run out of free interrupt vectors:  we could start using vectors
by multiple sources through posted interrupts, if using posted
interrupts is the fastest way to distinguish the interrupt source.