On Mon, Mar 04, 2019 at 11:54:38PM -0800, Mike Larkin wrote:
> On Mon, Mar 04, 2019 at 01:25:16PM +0200, Wictor Lund wrote:
> > Hi misc@!
> > 
> > I have figured out that it is possible to get vmd(8) into a state where
> > 1) com1_dev.rcv_pending != 0 
> > 2) there is data pending on com1_dev.fd
> > 3) the guest doesn't seem to care
> > 
> > This results in a locked up situation where com_rcv_event() is called on
> > indefinitely.  It seems to me that an interrupt is lost somewhere, leading
> > to a situation where the guest OS is happily ignorant of the available data,
> > while the vmm is waiting for the guest to eat it up. 
> > 
> > This has made it impossible to install Linux via the serial console on
> > vmm(4).  It seems that people previously have reported "freezing" problems
> > in vmm(4) form time to time, but when reported no one else have been able to
> > reproduce it.
> > 
> > I have solved the problem for myself by changing com_rcv_event() to the
> > following:
> > 
> > static void
> > com_rcv_event(int fd, short kind, void *arg)
> > {
> >         mutex_lock(&com1_dev.mutex);
> > 
> >         /*
> >          * We already have other data pending to be received. The data that
> >          * has become available now will be moved to the com port later.
> >          */
> >         if (com1_dev.rcv_pending) {
> >                 /* If pending interrupt, inject */
> >                 if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
> >                         utrace("comrcv injintr", &com1_dev.regs.lsr, 
> > sizeof(com1_dev.regs.lsr));
> >                         /* XXX: vcpu_id */
> >                         vcpu_assert_pic_irq((uintptr_t) arg, 0, 
> > com1_dev.irq);
> >                         vcpu_deassert_pic_irq((uintptr_t) arg, 0, 
> > com1_dev.irq);
> >                 }
> >                 mutex_unlock(&com1_dev.mutex);
> >                 return;
> >         }
> >         if (com1_dev.regs.lsr & LSR_RXRDY)
> >                 com1_dev.rcv_pending = 1;
> >         else {
> >                 com_rcv(&com1_dev, (uintptr_t) arg, 0);
> > 
> >                 /* If pending interrupt, inject */
> >                 if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
> >                         /* XXX: vcpu_id */
> >                         vcpu_assert_pic_irq((uintptr_t) arg, 0, 
> > com1_dev.irq);
> >                         vcpu_deassert_pic_irq((uintptr_t) arg, 0, 
> > com1_dev.irq);
> >                 }
> >         }
> > 
> >         mutex_unlock(&com1_dev.mutex);
> > }
> > 
> > However, I have little experience in the interrupt behaviour on x86.  I'm
> > also aware of that there has been an attempt to fix this behaviour [1].
> > 
> > I think the problem is that when com_rcv() is called from
> > vcpu_process_com_data(), the interrupt is triggered using vcpu_exit_inout(),
> > which was not touched in the previous attempt [1] to fix the "freezing"
> > problem.  vcpu_exit_inout() still uses a simple vcpu_assert_pic_irq() call
> > to trigger the interrupt while for example com_rcv_event() uses the
> > vcpu_assert_pic_irq(); vcpu_deassert_pic_irq() sequence to trigger it.
> > 
> > With my modifications to com_rcv_event() I was able to install not only
> > alpine linux, but even debian using the serial console.  Without the
> > modification I can't even install alpine linux via the serial console.
> > 
> > Any thoughts on this?  If people think my change is a sound one, I can make
> > a proper patch for it.  If people think the change is unsound, I would have
> > to look into changing vcpu_exit_inout() and probably extend the interface to
> > it to decide how the interrupt should be triggered.
> > 
> > 1. https://marc.info/?l=openbsd-cvs&m=153115270302514&w=2
> > 
> > --
> > Wictor Lund
> > 
> 
> Thanks Wictor!
> 
> Can you make a proper diff and resend please?

The diff is appended below.

I noticed that uint8_t elcr[2] is handled under pthread_mutex_t pic_mtx in
i8259_deassert_irq() (which is called from vcpu_deassert_pic_irq()), while
elcr is not handled under pic_mtx in pic_set_elcr() and vcpu_exit_elcr().

This may or may not be of importance because i8259_deassert_irq() is a no-op
depending on the value of elcr.

I could write a diff immediately that makes those functions take the mutex,
but I haven't studied the locking strategies in vmd(8) well enough to know
whether it may cause a deadlock.

--
Wictor Lund

diff --git a/usr.sbin/vmd/ns8250.c b/usr.sbin/vmd/ns8250.c
index 6ed614b0a9f..3efbf357702 100644
--- a/usr.sbin/vmd/ns8250.c
+++ b/usr.sbin/vmd/ns8250.c
@@ -115,6 +115,12 @@ com_rcv_event(int fd, short kind, void *arg)
         * has become available now will be moved to the com port later.
         */
        if (com1_dev.rcv_pending) {
+               /* If no interrupt is currently pending, inject one */
+               if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
+                       /* XXX: vcpu_id */
+                       vcpu_assert_pic_irq((uintptr_t) arg, 0, com1_dev.irq);
+                       vcpu_deassert_pic_irq((uintptr_t) arg, 0, com1_dev.irq);
+               }
                mutex_unlock(&com1_dev.mutex);
                return;
        }

Reply via email to