Hello Dave,

On Thu, Apr 28, 2022 at 1:20 PM Dr. David Alan Gilbert
<dgilb...@redhat.com> wrote:
>
> Leo:
>   Unfortunately this is failing a couple of CI tests; the MSG_ZEROCOPY
> one I guess is the simpler one; I think Stefanha managed to find the
> liburing fix for the __kernel_timespec case, but that looks like a bit
> more fun!
>
> Dave

About MSG_ZEROCOPY error:

I tracked down how the test happened, downloaded the same docker image from the
tests(opensuse-leap-15.2), and took a look at the filesystem for the
MSG_ZEROCOPY define, which I could not find anywhere.

Then I took a look into /usr/include/bits/socket.h, which is where RHEL has
MSG_ZEROCOPY defined. Zypper defines it as been provided by glibc-devel, which
is versioned at 2.26-lp152.26.12.1.

I then took a look at https://sourceware.org/git/glibc.git, and found commit
78cde19f62 that introduces MSG_ZEROCOPY. The first version that has this commit
is glibc-2.27.

So, basically, this means opensuse-leap-15.2 glibc version does not support
MSG_ZEROCOPY. Based on that, I had a few ideas on how to solve the CI bug:
1 - Propose a backport of this patch (few comments +  single define) for
leap-15.x, wait for them to accept and update the version in qemu CI.
(TBH I have no idea how the opensuse community works, I just suppose it could
be a way of tackling this.)
2 - include an #ifndef MSG_ZEROCOPY #define MSG_ZEROCOPY 0x4000000 #endif in
code, which is ugly IMHO, but will be fast and clean.
3 - In CI, patch /usr/include/bits/socket.h before building, which will also
work fine, but defeats the purpose of keeping qemu building on the platform.

Among the above, I would go with (2), as it seems a reasonable way of dealing
with this.

Does anyone else have any further suggestions, or know how this kind of issue
is generally solved in qemu?

Best regards,
Leo


>
>
> Job #2390848140 ( https://gitlab.com/dagrh/qemu/-/jobs/2390848140/raw )
> Name: build-system-alpine
> In file included from /usr/include/linux/errqueue.h:6,
>                  from ../io/channel-socket.c:29:
> /usr/include/linux/time_types.h:7:8: error: redefinition of 'struct 
> __kernel_timespec'
>     7 | struct __kernel_timespec {
>       |        ^~~~~~~~~~~~~~~~~
> In file included from /usr/include/liburing.h:19,
>                  from /builds/dagrh/qemu/include/block/aio.h:18,
>                  from /builds/dagrh/qemu/include/io/channel.h:26,
>                  from /builds/dagrh/qemu/include/io/channel-socket.h:24,
>                  from ../io/channel-socket.c:24:
> /usr/include/liburing/compat.h:9:8: note: originally defined here
>     9 | struct __kernel_timespec {
>       |        ^~~~~~~~~~~~~~~~~
>
> ----
> Name: build-system-opensuse
>
> https://gitlab.com/dagrh/qemu/-/jobs/2390848160/raw
> ../io/channel-socket.c: In function ‘qio_channel_socket_writev’:
> ../io/channel-socket.c:578:18: error: ‘MSG_ZEROCOPY’ undeclared (first 
> use in this function); did you mean ‘SO_ZEROCOPY’?
>          sflags = MSG_ZEROCOPY;
>                   ^~~~~~~~~~~~
>                   SO_ZEROCOPY
> ../io/channel-socket.c:578:18: note: each undeclared identifier is reported 
> only once for each function it appears in
>
> * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
> > From: Leonardo Bras <leob...@redhat.com>
> >
> > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_flush() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error 
> > occurs.
> >
> > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid 
> > copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent 
> > instead.
> > - If this is a problem, it's recommended to call qio_channel_flush() before 
> > freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, 
> > and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may 
> > require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zero_copy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zero copy as a feature, it
> > requires a mechanism to disable it, so it can still be accessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leob...@redhat.com>
> > Reviewed-by: Peter Xu <pet...@redhat.com>
> > Reviewed-by: Daniel P. Berrangé <berra...@redhat.com>
> > Reviewed-by: Juan Quintela <quint...@redhat.com>
> > Message-Id: <20220426230654.637939-3-leob...@redhat.com>
> > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
> > ---
> >  include/io/channel-socket.h |   2 +
> >  io/channel-socket.c         | 108 ++++++++++++++++++++++++++++++++++--
> >  2 files changed, 106 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > index e747e63514..513c428fe4 100644
> > --- a/include/io/channel-socket.h
> > +++ b/include/io/channel-socket.h
> > @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> >      socklen_t localAddrLen;
> >      struct sockaddr_storage remoteAddr;
> >      socklen_t remoteAddrLen;
> > +    ssize_t zero_copy_queued;
> > +    ssize_t zero_copy_sent;
> >  };
> >
> >
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index 696a04dc9c..1dd85fc1ef 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -25,6 +25,10 @@
> >  #include "io/channel-watch.h"
> >  #include "trace.h"
> >  #include "qapi/clone-visitor.h"
> > +#ifdef CONFIG_LINUX
> > +#include <linux/errqueue.h>
> > +#include <bits/socket.h>
> > +#endif
> >
> >  #define SOCKET_MAX_FDS 16
> >
> > @@ -54,6 +58,8 @@ qio_channel_socket_new(void)
> >
> >      sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
> >      sioc->fd = -1;
> > +    sioc->zero_copy_queued = 0;
> > +    sioc->zero_copy_sent = 0;
> >
> >      ioc = QIO_CHANNEL(sioc);
> >      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> > @@ -153,6 +159,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket 
> > *ioc,
> >          return -1;
> >      }
> >
> > +#ifdef CONFIG_LINUX
> > +    int ret, v = 1;
> > +    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> > +    if (ret == 0) {
> > +        /* Zero copy available on host */
> > +        qio_channel_set_feature(QIO_CHANNEL(ioc),
> > +                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
> > +    }
> > +#endif
> > +
> >      return 0;
> >  }
> >
> > @@ -533,6 +549,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >      char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
> >      size_t fdsize = sizeof(int) * nfds;
> >      struct cmsghdr *cmsg;
> > +    int sflags = 0;
> >
> >      memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
> >
> > @@ -557,15 +574,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >      }
> >
> > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +        sflags = MSG_ZEROCOPY;
> > +    }
> > +
> >   retry:
> > -    ret = sendmsg(sioc->fd, &msg, 0);
> > +    ret = sendmsg(sioc->fd, &msg, sflags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            if (sflags & MSG_ZEROCOPY) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Process can't lock enough memory for 
> > using MSG_ZEROCOPY");
> > +                return -1;
> > +            }
> > +            break;
> >          }
> > +
> >          error_setg_errno(errp, errno,
> >                           "Unable to write to socket");
> >          return -1;
> > @@ -659,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >  }
> >  #endif /* WIN32 */
> >
> > +
> > +#ifdef CONFIG_LINUX
> > +static int qio_channel_socket_flush(QIOChannel *ioc,
> > +                                    Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    struct msghdr msg = {};
> > +    struct sock_extended_err *serr;
> > +    struct cmsghdr *cm;
> > +    char control[CMSG_SPACE(sizeof(*serr))];
> > +    int received;
> > +    int ret = 1;
> > +
> > +    msg.msg_control = control;
> > +    msg.msg_controllen = sizeof(control);
> > +    memset(control, 0, sizeof(control));
> > +
> > +    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> > +        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > +        if (received < 0) {
> > +            switch (errno) {
> > +            case EAGAIN:
> > +                /* Nothing on errqueue, wait until something is available 
> > */
> > +                qio_channel_wait(ioc, G_IO_ERR);
> > +                continue;
> > +            case EINTR:
> > +                continue;
> > +            default:
> > +                error_setg_errno(errp, errno,
> > +                                 "Unable to read errqueue");
> > +                return -1;
> > +            }
> > +        }
> > +
> > +        cm = CMSG_FIRSTHDR(&msg);
> > +        if (cm->cmsg_level != SOL_IP &&
> > +            cm->cmsg_type != IP_RECVERR) {
> > +            error_setg_errno(errp, EPROTOTYPE,
> > +                             "Wrong cmsg in errqueue");
> > +            return -1;
> > +        }
> > +
> > +        serr = (void *) CMSG_DATA(cm);
> > +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> > +            error_setg_errno(errp, serr->ee_errno,
> > +                             "Error on socket");
> > +            return -1;
> > +        }
> > +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> > +            error_setg_errno(errp, serr->ee_origin,
> > +                             "Error not from zero copy");
> > +            return -1;
> > +        }
> > +
> > +        /* No errors, count successfully finished sendmsg()*/
> > +        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > +
> > +        /* If any sendmsg() succeeded using zero copy, return 0 at the end 
> > */
> > +        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > +            ret = 0;
> > +        }
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +#endif /* CONFIG_LINUX */
> > +
> >  static int
> >  qio_channel_socket_set_blocking(QIOChannel *ioc,
> >                                  bool enabled,
> > @@ -789,6 +886,9 @@ static void qio_channel_socket_class_init(ObjectClass 
> > *klass,
> >      ioc_klass->io_set_delay = qio_channel_socket_set_delay;
> >      ioc_klass->io_create_watch = qio_channel_socket_create_watch;
> >      ioc_klass->io_set_aio_fd_handler = 
> > qio_channel_socket_set_aio_fd_handler;
> > +#ifdef CONFIG_LINUX
> > +    ioc_klass->io_flush = qio_channel_socket_flush;
> > +#endif
> >  }
> >
> >  static const TypeInfo qio_channel_socket_info = {
> > --
> > 2.35.1
> >
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
>


Reply via email to