On Sun, Jun 06, 2010 at 12:10:07PM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote:
> >>>> Gleb Natapov wrote:
> >>>>> On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:
> >>>>>>> I'd like to also support EOI handling. When the guest clears the
> >>>>>>> interrupt condition, the EOI callback would be called. This could occur
> >>>>>>> much later than the IRQ delivery time. I'm not sure if we need the
> >>>>>>> result code in that case.
> >>>>>>>
> >>>>>>> If any intermediate device (IOAPIC?) needs to be informed about either
> >>>>>>> delivery or EOI also, it could create a proxy message with its
> >>>>>>> callbacks in place. But we need then a separate opaque field (in
> >>>>>>> addition to payload) to store the original message.
> >>>>>>>
> >>>>>>> struct IRQMsg {
> >>>>>>>     DeviceState *src;
> >>>>>>>     void (*delivery_cb)(IRQMsg *msg, int result);
> >>>>>>>     void (*eoi_cb)(IRQMsg *msg, int result);
> >>>>>>>     void *src_opaque;
> >>>>>>>     void *payload;
> >>>>>>> };
> >>>>>> Extending the lifetime of IRQMsg objects beyond the delivery call stack
> >>>>>> means qemu_malloc/free for every delivery. I think it takes a _very_
> >>>>>> appealing reason to justify this. But so far I do not see any use case
> >>>>>> for eoi_cb at all.
> >>>>>>
> >>>>> I dislike the use of EOI for reinjecting missing interrupts since
> >>>>> it eliminates use of the internal PIC/APIC queue of not yet delivered
> >>>>> interrupts. PIC and APIC have an internal queue that can handle two
> >>>>> elements: one is the delivered, but not yet acked, interrupt in isr and
> >>>>> the other is the pending interrupt in irr. Using an EOI callback (or ack
> >>>>> notifier, as it's called inside the kernel), an interrupt will be
> >>>>> considered coalesced even if irr is cleared, but no ack was received for
> >>>>> the previously delivered interrupt.
> >>>>> But ack notifiers actually have another use: device assignment. There is
> >>>>> a plan to move device assignment from kernel to userspace and for that
> >>>>> ack notifiers will have to be extended to userspace too. If so we can
> >>>>> use them to do irq decoalescing as well. I doubt they should be part
> >>>>> of IRQMsg though. Why not do what the kernel does: have a globally
> >>>>> registered notifier based on irqchip/pin.
> >>>> I read this twice but I still don't get your plan. Do you like or
> >>>> dislike using EOI for de-coalescing? And how should these notifiers work?
> >>>>
> >>> That's because I confused myself :) I _dislike_ them being used, but
> >>> since device assignment requires ack notifiers anyway, maybe it is better
> >>> to introduce one mechanism for device assignment + de-coalescing instead
> >>> of introducing two different mechanisms. Using ack notifiers should be
> >>> easy: RTC registers an ack notifier and keeps track of delivered
> >>> interrupts. If the timer triggers after the previous irq was set, but
> >>> before it was acked, the coalesced counter is incremented. In the ack
> >>> notifier callback the coalesced counter is checked and if it is not zero
> >>> a new irq is set.
> >> Ack notifier registrations and event deliveries still need to be routed.
> >> Piggy-backing this on IRQ messages may be unavoidable for that reason.
> > It is done in the kernel without piggy-backing.
>
> As it does not include any IRQ routers in front of the interrupt
> controller. Maybe it works for x86, but it is no generic solution.
>
x86 has an IRQ router in front of the interrupt controller, inside the
PCI host bridge.

> Also, periodic timer sources get no information about the fact that
> their interrupt is masked somewhere along the path to the VCPUs and will
> possibly replay countless IRQs when the masking ends, no?
>
Correct, for that we have mask notifiers in the kernel. Gets uglier by
the minute.

> >
> >> Anyway, I'm going to post my HPET updates with the infrastructure for
> >> IRQMsg now. Maybe it's helpful to see the other option in reality.
> >>
> > One other thing to consider: the current approach does not always work.
> > Win2K3-64bit-smp and Win2k8-64bit-smp configure the RTC interrupt to be
> > broadcast to all cpus, but only the boot cpu does time calculation. With
> > the current approach, if the interrupt is delivered to at least one vcpu
> > it will not be considered coalesced, but if the cpu it was delivered to
> > is not the cpu that does time accounting then the clock will drift.
>
> That means we would have to fire callbacks per receiving CPU and report
> its number back. Is there a way to find out if we are running such a
> guest without an '-enable-win2k[38]-64bit-smp-rtc-drift-fix'?
>
Not that I know of.

--
			Gleb.
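
For reference, the ack-notifier de-coalescing scheme described earlier in
this thread (the RTC counts ticks that fire while an interrupt is still
unacked, and re-raises the interrupt from its ack callback) could be
sketched roughly as below. This is only a minimal illustration under
stated assumptions: RTCSketch, rtc_raise_irq, rtc_timer_tick and
rtc_ack_notifier are hypothetical names, not QEMU or KVM interfaces, and
routing, masking and per-CPU delivery are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the QEMU or KVM API. */
typedef struct RTCSketch {
    bool irq_in_flight;   /* an interrupt was raised and not yet acked */
    unsigned coalesced;   /* ticks that fired while one was in flight */
    unsigned raised;      /* interrupts actually raised, for inspection */
} RTCSketch;

/* Stand-in for asserting the interrupt line toward the controller. */
static void rtc_raise_irq(RTCSketch *rtc)
{
    rtc->irq_in_flight = true;
    rtc->raised++;
}

/* Called on every periodic timer tick. */
static void rtc_timer_tick(RTCSketch *rtc)
{
    assert(rtc != NULL);
    if (rtc->irq_in_flight) {
        /* Previous irq was set but not yet acked: count it as coalesced. */
        rtc->coalesced++;
    } else {
        rtc_raise_irq(rtc);
    }
}

/* Ack notifier callback: runs when the guest acks the interrupt. */
static void rtc_ack_notifier(void *opaque)
{
    RTCSketch *rtc = opaque;

    assert(rtc != NULL);
    rtc->irq_in_flight = false;
    if (rtc->coalesced > 0) {
        /* Replay one missed tick per ack. */
        rtc->coalesced--;
        rtc_raise_irq(rtc);
    }
}
```

In this sketch each tick that arrives while an interrupt is unacked only
bumps the counter, and each ack replays one missed interrupt. It does not
address the broadcast case raised above, where the ack may come from a
cpu that does not do time accounting.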