Claudio Jeker <cje...@diehard.n-r-g.com> writes:
> On Thu, Aug 03, 2023 at 07:01:51PM -0400, Dave Voutila wrote: >> >> Claudio Jeker <cje...@diehard.n-r-g.com> writes: >> >> > On Thu, Aug 03, 2023 at 04:20:47PM -0400, Dave Voutila wrote: >> >> Found this while working on some virtio stuff. My original >> >> implementation as part of the multi-process redesign didn't handle if >> >> imsg_flush failed due to resource shortages (EAGAIN), so adopt the >> >> do/while idiom used by other daemons like iked(8). >> >> >> >> Manifests with errors from the vm process looking like: >> >> >> >> virtio_pci_io: imsg_flush (write) >> >> vmd: pci i/o access function failed >> >> >> >> ok? >> > >> > There are way to many imsg_flush() calls in this code. I don't think this >> > is designed the right way. imsg_flush() should almost never be used. >> > Busy loop on EAGAIN will bite you further down the road. >> > >> >> Bite me how? > > We had massive EAGAIN spins in relayd. Busy loop on EAGAIN is almost > always wrong. A possible fix is to poll on the fd until it is writable -- > while this does not burn CPU for nothing it still does not allow any other > event from happening either. Have a look at what atomicio.c does you may > need to add something similar for imsg_flush(). > Ah. In this case, it's the vcpu thread that doesn't participate in an event loop via libevent. >> This is all for the synchronous imsg channel to the virtio device >> process, not the async one. What's going to actually perform the flush >> for me if not imsg_flush? > > This is where I'm a bit confused. There is no such concept of a > synchronous imsg channel. imsg are by nature asynchronous. You pass > messages beween processes (especially in a libevent driven daemon). > It seems this code requires some kind of synchronisation but blocking > all processing during that time seems like a very big hammer. > Right. It's the vcpu thread. Reading a register on the virtio device needs to block until the value is produced, unless I'm missing something here. > I don't fully grasp vmd but looking at the code and seeing that atomicio > is mixed with libevent is a red flag. Also why is the synchronous channel > set non-blocking? If it is synchronous it should most probably be blocking > (which would also solve the EAGAIN issue). > atomicio is used after exec and before the child process can initialize its libevent loop and imsg channels. Afterwards, it's all imsg. So I guess that's the part that's a bit messy? The vm process's vcpu thread sends imsgs "synchronously", but the device process will process and respond in an asynchronous way using libevent. I'm going to revisit this a bit more closely and see how it can be reworked.