On Thu, Aug 03, 2023 at 07:01:51PM -0400, Dave Voutila wrote: > > Claudio Jeker <cje...@diehard.n-r-g.com> writes: > > > On Thu, Aug 03, 2023 at 04:20:47PM -0400, Dave Voutila wrote: > >> Found this while working on some virtio stuff. My original > >> implementation as part of the multi-process redesign didn't handle if > >> imsg_flush failed due to resource shortages (EAGAIN), so adopt the > >> do/while idiom used by other daemons like iked(8). > >> > >> Manifests with errors from the vm process looking like: > >> > >> virtio_pci_io: imsg_flush (write) > >> vmd: pci i/o access function failed > >> > >> ok? > > > > There are way to many imsg_flush() calls in this code. I don't think this > > is designed the right way. imsg_flush() should almost never be used. > > Busy loop on EAGAIN will bite you further down the road. > > > > Bite me how?
We had massive EAGAIN spins in relayd. Busy loop on EAGAIN is almost always wrong. A possible fix is to poll on the fd until it is writable -- while this does not burn CPU for nothing it still does not allow any other event from happening either. Have a look at what atomicio.c does you may need to add something similar for imsg_flush(). > This is all for the synchronous imsg channel to the virtio device > process, not the async one. What's going to actually perform the flush > for me if not imsg_flush? This is where I'm a bit confused. There is no such concept of a synchronous imsg channel. imsg are by nature asynchronous. You pass messages beween processes (especially in a libevent driven daemon). It seems this code requires some kind of synchronisation but blocking all processing during that time seems like a very big hammer. I don't fully grasp vmd but looking at the code and seeing that atomicio is mixed with libevent is a red flag. Also why is the synchronous channel set non-blocking? If it is synchronous it should most probably be blocking (which would also solve the EAGAIN issue). > >> diff refs/heads/master refs/heads/vmd-imsg_flush > >> commit - 40d57955f2f1a3a65c42ea374f86c74cf879d76d > >> commit + 2c42dedb675f013276cdbd47464f656e2451b92c > >> blob - 798b5fea6d589d43083c134a040df6272037869f > >> blob + ff1ab5fdcb52866f10a14df38d30abde277619df > >> --- usr.sbin/vmd/virtio.c > >> +++ usr.sbin/vmd/virtio.c > >> @@ -825,8 +825,11 @@ virtio_shutdown(struct vmd_vm *vm) > >> if (ret == -1) > >> fatalx("%s: failed to send shutdown to device", > >> __func__); > >> - if (imsg_flush(ibuf) == -1) > >> - fatalx("%s: imsg_flush", __func__); > >> + do { > >> + ret = imsg_flush(ibuf); > >> + } while (ret == -1 && errno == EAGAIN); > >> + if (ret == -1) > >> + fatal("%s: imsg_flush", __func__); > >> } > >> > >> /* > >> @@ -1132,8 +1135,11 @@ vionet_dump(int fd) > >> __func__, dev->vionet.idx); > >> return (-1); > >> } > >> - if (imsg_flush(ibuf) == -1) { > >> - log_warnx("%s: imsg_flush", __func__); > >> + do { > >> + ret = imsg_flush(ibuf); > >> + } while (ret == -1 && errno == EAGAIN); > >> + if (ret == -1) { > >> + log_warn("%s: imsg_flush", __func__); > >> return (-1); > >> } > >> > >> @@ -1189,12 +1195,14 @@ vioblk_dump(int fd) > >> __func__, dev->vioblk.idx); > >> return (-1); > >> } > >> - if (imsg_flush(ibuf) == -1) { > >> - log_warnx("%s: imsg_flush", __func__); > >> + do { > >> + ret = imsg_flush(ibuf); > >> + } while (ret == -1 && errno == EAGAIN); > >> + if (ret == -1) { > >> + log_warn("%s: imsg_flush", __func__); > >> return (-1); > >> } > >> > >> - > >> sz = atomicio(read, dev->sync_fd, &temp, sizeof(temp)); > >> if (sz != sizeof(temp)) { > >> log_warnx("%s: failed to dump vioblk[%d]", __func__, > >> @@ -1660,8 +1668,11 @@ virtio_pci_io(int dir, uint16_t reg, uint32_t > >> *data, u > >> " device", __func__); > >> return (ret); > >> } > >> - if (imsg_flush(ibuf) == -1) { > >> - log_warnx("%s: imsg_flush (write)", __func__); > >> + do { > >> + ret = imsg_flush(ibuf); > >> + } while (ret == -1 && errno == EAGAIN); > >> + if (ret == -1) { > >> + log_warn("%s: imsg_flush (write)", __func__); > >> return (-1); > >> } > >> } else { > >> @@ -1675,8 +1686,11 @@ virtio_pci_io(int dir, uint16_t reg, uint32_t > >> *data, u > >> " device", __func__); > >> return (ret); > >> } > >> - if (imsg_flush(ibuf) == -1) { > >> - log_warnx("%s: imsg_flush (read)", __func__); > >> + do { > >> + ret = imsg_flush(ibuf); > >> + } while (ret == -1 && errno == EAGAIN); > >> + if (ret == -1) { > >> + log_warn("%s: imsg_flush (read)", __func__); > >> return (-1); > >> } > >> -- :wq Claudio