On Wed, Jan 19, 2022 at 02:22:56PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
>
> On Thu, Jan 13, 2022 at 7:42 AM Daniel P. Berrangé <berra...@redhat.com> wrote:
> >
> > On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > > On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > > >      }
> > > > > >
> > > > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > > +        sflags = MSG_ZEROCOPY;
> > > > > > +    }
> > > > > > +
> > > > > >   retry:
> > > > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > > > >      if (ret <= 0) {
> > > > > > -        if (errno == EAGAIN) {
> > > > > > +        switch (errno) {
> > > > > > +        case EAGAIN:
> > > > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > > > -        }
> > > > > > -        if (errno == EINTR) {
> > > > > > +        case EINTR:
> > > > > >              goto retry;
> > > > > > +        case ENOBUFS:
> > > > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > > > +                error_setg_errno(errp, errno,
> > > > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > > > +                return -1;
> > > > > > +            }
> > > > >
> > > > > I have no idea whether it'll make a real difference, but - should we
> > > > > better add a "break" here?  If you agree and with that fixed, feel
> > > > > free to add:
> > > > >
> > > > > Reviewed-by: Peter Xu <pet...@redhat.com>
> > > > >
> > > > > I also wonder whether you hit ENOBUFS in any of the environments.  On
> > > > > Fedora here it's by default unlimited, but just curious when we should
> > > > > keep an eye.
> > > >
> > > > Fedora doesn't allow unlimited locked memory by default
> > > >
> > > > $ grep "locked memory" /proc/self/limits
> > > > Max locked memory         65536                65536                bytes
> > > >
> > > > And regardless of Fedora defaults, libvirt will set a limit
> > > > for the guest. It will only be unlimited if requiring certain
> > > > things like VFIO.
> > >
> > > Thanks, I obviously checked the wrong host..
> > >
> > > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > > there be a limit?  Is it linear to the number of sockets/channels?
> >
> > IIRC we decided it would be limited by the socket send buffer size, rather
> > than guest RAM, because writes will block once the send buffer is full.
> >
> > This has a default global setting, with per-socket override. On one box I
> > have it is 200 KB. With multifd you'll need "num-sockets * send buffer".
>
> Oh, I was not aware there is a send buffer size (or maybe I am unable
> to recall).
> That sure makes things much easier.
>
> > >
> > > It'll be better if we can fail at enabling the feature when we detect
> > > that the specified locked memory limit may not suffice.
>
> sure
>
> >
> > Checking this value against available locked memory though will always
> > have an error margin because other things in QEMU can use locked memory
> > too
>
> We can get the current limit (before zerocopy) as an error margin:
> req_lock_mem = num-sockets * send buffer + BASE_LOCKED
>
> Where BASE_LOCKED is the current libvirt value, or so on.

Hmm.. I'm not familiar with libvirt, so I'm curious: does libvirt actually
raise the allowed locked memory on Fedora, given that the default is 64KB?

I think it would be great to catch the most obvious going-to-fail scenarios
up front.  For example, I wonder whether a QEMU run without libvirt as a
non-root user on Fedora will simply fail right away, even with 1 channel,
because of the 64K limit.  The other extreme is a container environment
where the user does not allow locking memory at all (i.e. when we see that
the max locked memory is zero).

It's not only about failing early, it's also about failing with a
meaningful error so the user knows what to tune.  I'm not sure the error
will be easy to understand if we wait until io_writev() fails.

Thanks,

-- 
Peter Xu
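
To make the early check discussed above concrete, here is a minimal sketch in
C.  It assumes the estimate quoted in the thread (num-sockets * send buffer +
BASE_LOCKED) is a usable bound, which was stated there only as a recollection.
The names zero_copy_locked_mem_ok() and zero_copy_precheck() are hypothetical
and are not part of the patch or of QEMU.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: does the locked-memory limit cover the estimate
 *   num_sockets * sndbuf_bytes + base_locked ?
 * The estimate itself is an assumption taken from the discussion.
 */
bool zero_copy_locked_mem_ok(rlim_t memlock_limit, unsigned num_sockets,
                             uint64_t sndbuf_bytes, uint64_t base_locked)
{
    uint64_t need;

    if (memlock_limit == RLIM_INFINITY) {
        return true;                 /* no limit, nothing to check */
    }
    /* Treat arithmetic overflow as "cannot possibly fit". */
    if (num_sockets != 0 &&
        sndbuf_bytes > (UINT64_MAX - base_locked) / num_sockets) {
        return false;
    }
    need = (uint64_t)num_sockets * sndbuf_bytes + base_locked;
    return need <= (uint64_t)memlock_limit;
}

/*
 * Hypothetical wrapper: read the soft RLIMIT_MEMLOCK of this process and
 * the send buffer size of one socket, then apply the check above.
 * Returns 1 if the limit looks sufficient, 0 if not, -1 on syscall error.
 */
int zero_copy_precheck(int fd, unsigned num_sockets, uint64_t base_locked)
{
    struct rlimit rl;
    int sndbuf = 0;
    socklen_t len = sizeof(sndbuf);

    if (getrlimit(RLIMIT_MEMLOCK, &rl) < 0) {
        return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) < 0) {
        return -1;
    }
    return zero_copy_locked_mem_ok(rl.rlim_cur, num_sockets,
                                   (uint64_t)sndbuf, base_locked) ? 1 : 0;
}
```

With the numbers mentioned in the thread (a 64K limit and a 200 KB send
buffer), the check fails even for a single channel, and a limit of zero fails
for any non-empty send buffer.  A caller could turn a 0 result into an error
at the point where the feature is enabled, naming the locked memory limit as
the thing to tune.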