Hello Dave, On Thu, Apr 28, 2022 at 1:20 PM Dr. David Alan Gilbert <dgilb...@redhat.com> wrote: > > Leo: > Unfortunately this is failing a couple of CI tests; the MSG_ZEROCOPY > one I guess is the simpler one; I think Stefanha managed to find the > liburing fix for the __kernel_timespec case, but that looks like a bit > more fun! > > Dave
About MSG_ZEROCOPY error: I tracked down how the test happened, downloaded the same docker image from the tests(opensuse-leap-15.2), and took a look at the filesystem for the MSG_ZEROCOPY define, which I could not find anywhere. Then I took a look into /usr/include/bits/socket.h, which is where RHEL has MSG_ZEROCOPY defined. Zypper defines it as been provided by glibc-devel, which is versioned at 2.26-lp152.26.12.1. I then took a look at https://sourceware.org/git/glibc.git, and found commit 78cde19f62 that introduces MSG_ZEROCOPY. The first version that has this commit is glibc-2.27. So, basically, this means opensuse-leap-15.2 glibc version does not support MSG_ZEROCOPY. Based on that, I had a few ideas on how to solve the CI bug: 1 - Propose a backport of this patch (few comments + single define) for leap-15.x, wait for them to accept and update the version in qemu CI. (TBH I have no idea how the opensuse community works, I just suppose it could be a way of tackling this.) 2 - include an #ifndef MSG_ZEROCOPY #define MSG_ZEROCOPY 0x4000000 #endif in code, which is ugly IMHO, but will be fast and clean. 3 - In CI, patch /usr/include/bits/socket.h before building, which will also work fine, but defeats the purpose of keeping qemu building on the platform. Among the above, I would go with (2), as it seems a reasonable way of dealing with this. Does anyone else have any further suggestions, or know how this kind of issue is generally solved in qemu? Best regards, Leo > > > Job #2390848140 ( https://gitlab.com/dagrh/qemu/-/jobs/2390848140/raw ) > Name: build-system-alpine > In file included from /usr/include/linux/errqueue.h:6, > from ../io/channel-socket.c:29: > /usr/include/linux/time_types.h:7:8: error: redefinition of 'struct > __kernel_timespec' > 7 | struct __kernel_timespec { > | ^~~~~~~~~~~~~~~~~ > In file included from /usr/include/liburing.h:19, > from /builds/dagrh/qemu/include/block/aio.h:18, > from /builds/dagrh/qemu/include/io/channel.h:26, > from /builds/dagrh/qemu/include/io/channel-socket.h:24, > from ../io/channel-socket.c:24: > /usr/include/liburing/compat.h:9:8: note: originally defined here > 9 | struct __kernel_timespec { > | ^~~~~~~~~~~~~~~~~ > > ---- > Name: build-system-opensuse > > https://gitlab.com/dagrh/qemu/-/jobs/2390848160/raw > ../io/channel-socket.c: In function ‘qio_channel_socket_writev’: > ../io/channel-socket.c:578:18: error: ‘MSG_ZEROCOPY’ undeclared (first > use in this function); did you mean ‘SO_ZEROCOPY’? > sflags = MSG_ZEROCOPY; > ^~~~~~~~~~~~ > SO_ZEROCOPY > ../io/channel-socket.c:578:18: note: each undeclared identifier is reported > only once for each function it appears in > > * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote: > > From: Leonardo Bras <leob...@redhat.com> > > > > For CONFIG_LINUX, implement the new zero copy flag and the optional callback > > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY > > feature is available in the host kernel, which is checked on > > qio_channel_socket_connect_sync() > > > > qio_channel_socket_flush() was implemented by counting how many times > > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the > > socket's error queue, in order to find how many of them finished sending. > > Flush will loop until those counters are the same, or until some error > > occurs. > > > > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY: > > 1: Buffer > > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid > > copying, > > some caution is necessary to avoid overwriting any buffer before it's sent. > > If something like this happen, a newer version of the buffer may be sent > > instead. > > - If this is a problem, it's recommended to call qio_channel_flush() before > > freeing > > or re-using the buffer. > > > > 2: Locked memory > > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, > > and > > unlocked after it's sent. > > - Depending on the size of each buffer, and how often it's sent, it may > > require > > a larger amount of locked memory than usually available to non-root user. > > - If the required amount of locked memory is not available, writev_zero_copy > > will return an error, which can abort an operation like migration, > > - Because of this, when an user code wants to add zero copy as a feature, it > > requires a mechanism to disable it, so it can still be accessible to less > > privileged users. > > > > Signed-off-by: Leonardo Bras <leob...@redhat.com> > > Reviewed-by: Peter Xu <pet...@redhat.com> > > Reviewed-by: Daniel P. Berrangé <berra...@redhat.com> > > Reviewed-by: Juan Quintela <quint...@redhat.com> > > Message-Id: <20220426230654.637939-3-leob...@redhat.com> > > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > > --- > > include/io/channel-socket.h | 2 + > > io/channel-socket.c | 108 ++++++++++++++++++++++++++++++++++-- > > 2 files changed, 106 insertions(+), 4 deletions(-) > > > > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h > > index e747e63514..513c428fe4 100644 > > --- a/include/io/channel-socket.h > > +++ b/include/io/channel-socket.h > > @@ -47,6 +47,8 @@ struct QIOChannelSocket { > > socklen_t localAddrLen; > > struct sockaddr_storage remoteAddr; > > socklen_t remoteAddrLen; > > + ssize_t zero_copy_queued; > > + ssize_t zero_copy_sent; > > }; > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c > > index 696a04dc9c..1dd85fc1ef 100644 > > --- a/io/channel-socket.c > > +++ b/io/channel-socket.c > > @@ -25,6 +25,10 @@ > > #include "io/channel-watch.h" > > #include "trace.h" > > #include "qapi/clone-visitor.h" > > +#ifdef CONFIG_LINUX > > +#include <linux/errqueue.h> > > +#include <bits/socket.h> > > +#endif > > > > #define SOCKET_MAX_FDS 16 > > > > @@ -54,6 +58,8 @@ qio_channel_socket_new(void) > > > > sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); > > sioc->fd = -1; > > + sioc->zero_copy_queued = 0; > > + sioc->zero_copy_sent = 0; > > > > ioc = QIO_CHANNEL(sioc); > > qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); > > @@ -153,6 +159,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket > > *ioc, > > return -1; > > } > > > > +#ifdef CONFIG_LINUX > > + int ret, v = 1; > > + ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v)); > > + if (ret == 0) { > > + /* Zero copy available on host */ > > + qio_channel_set_feature(QIO_CHANNEL(ioc), > > + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY); > > + } > > +#endif > > + > > return 0; > > } > > > > @@ -533,6 +549,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel > > *ioc, > > char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)]; > > size_t fdsize = sizeof(int) * nfds; > > struct cmsghdr *cmsg; > > + int sflags = 0; > > > > memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)); > > > > @@ -557,15 +574,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel > > *ioc, > > memcpy(CMSG_DATA(cmsg), fds, fdsize); > > } > > > > + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) { > > + sflags = MSG_ZEROCOPY; > > + } > > + > > retry: > > - ret = sendmsg(sioc->fd, &msg, 0); > > + ret = sendmsg(sioc->fd, &msg, sflags); > > if (ret <= 0) { > > - if (errno == EAGAIN) { > > + switch (errno) { > > + case EAGAIN: > > return QIO_CHANNEL_ERR_BLOCK; > > - } > > - if (errno == EINTR) { > > + case EINTR: > > goto retry; > > + case ENOBUFS: > > + if (sflags & MSG_ZEROCOPY) { > > + error_setg_errno(errp, errno, > > + "Process can't lock enough memory for > > using MSG_ZEROCOPY"); > > + return -1; > > + } > > + break; > > } > > + > > error_setg_errno(errp, errno, > > "Unable to write to socket"); > > return -1; > > @@ -659,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel > > *ioc, > > } > > #endif /* WIN32 */ > > > > + > > +#ifdef CONFIG_LINUX > > +static int qio_channel_socket_flush(QIOChannel *ioc, > > + Error **errp) > > +{ > > + QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); > > + struct msghdr msg = {}; > > + struct sock_extended_err *serr; > > + struct cmsghdr *cm; > > + char control[CMSG_SPACE(sizeof(*serr))]; > > + int received; > > + int ret = 1; > > + > > + msg.msg_control = control; > > + msg.msg_controllen = sizeof(control); > > + memset(control, 0, sizeof(control)); > > + > > + while (sioc->zero_copy_sent < sioc->zero_copy_queued) { > > + received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE); > > + if (received < 0) { > > + switch (errno) { > > + case EAGAIN: > > + /* Nothing on errqueue, wait until something is available > > */ > > + qio_channel_wait(ioc, G_IO_ERR); > > + continue; > > + case EINTR: > > + continue; > > + default: > > + error_setg_errno(errp, errno, > > + "Unable to read errqueue"); > > + return -1; > > + } > > + } > > + > > + cm = CMSG_FIRSTHDR(&msg); > > + if (cm->cmsg_level != SOL_IP && > > + cm->cmsg_type != IP_RECVERR) { > > + error_setg_errno(errp, EPROTOTYPE, > > + "Wrong cmsg in errqueue"); > > + return -1; > > + } > > + > > + serr = (void *) CMSG_DATA(cm); > > + if (serr->ee_errno != SO_EE_ORIGIN_NONE) { > > + error_setg_errno(errp, serr->ee_errno, > > + "Error on socket"); > > + return -1; > > + } > > + if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) { > > + error_setg_errno(errp, serr->ee_origin, > > + "Error not from zero copy"); > > + return -1; > > + } > > + > > + /* No errors, count successfully finished sendmsg()*/ > > + sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1; > > + > > + /* If any sendmsg() succeeded using zero copy, return 0 at the end > > */ > > + if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) { > > + ret = 0; > > + } > > + } > > + > > + return ret; > > +} > > + > > +#endif /* CONFIG_LINUX */ > > + > > static int > > qio_channel_socket_set_blocking(QIOChannel *ioc, > > bool enabled, > > @@ -789,6 +886,9 @@ static void qio_channel_socket_class_init(ObjectClass > > *klass, > > ioc_klass->io_set_delay = qio_channel_socket_set_delay; > > ioc_klass->io_create_watch = qio_channel_socket_create_watch; > > ioc_klass->io_set_aio_fd_handler = > > qio_channel_socket_set_aio_fd_handler; > > +#ifdef CONFIG_LINUX > > + ioc_klass->io_flush = qio_channel_socket_flush; > > +#endif > > } > > > > static const TypeInfo qio_channel_socket_info = { > > -- > > 2.35.1 > > > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > >