On 2022/5/13 2:28 PM, Leonardo Bras wrote:
@@ -557,15 +578,31 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
          memcpy(CMSG_DATA(cmsg), fds, fdsize);
      }
+#ifdef QEMU_MSG_ZEROCOPY
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        sflags = MSG_ZEROCOPY;
+    }
+#endif
+
   retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, sflags);
      if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
              return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
              goto retry;
+#ifdef QEMU_MSG_ZEROCOPY
+        case ENOBUFS:
+            if (sflags & MSG_ZEROCOPY) {
+                error_setg_errno(errp, errno,
+                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
+                return -1;
+            }
+            break;
+#endif
          }
+
          error_setg_errno(errp, errno,
                           "Unable to write to socket");
          return -1;

Hi, Leo.

There are some other questions I would like to discuss with you.

I tested multifd zero-copy migration and found that, even with QEMU's max locked memory set to 16GB (much greater than `MULTIFD_PACKET_SIZE`), the error "Process can't lock enough memory for using MSG_ZEROCOPY" was still reported sometimes.

I noticed that the doc (https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html) says: "A zerocopy failure will return -1 with errno ENOBUFS. This happens if the socket option was not set, _the socket exceeds its optmem limit_ or the user exceeds its ulimit on locked pages."

I also found that the RFC (https://lwn.net/Articles/715279/) says: "The change to allocate notification skbuffs from optmem requires ensuring that net.core.optmem is at least a few 100KB."

On my host, optmem_max was initially set to 20KB. I changed it to 100KB (echo 102400 > /proc/sys/net/core/optmem_max) as the RFC suggests. Then I tested multifd zero-copy migration repeatedly, and the error disappeared.
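
For reference, the current limit can also be read programmatically instead of via the shell. This is only a minimal sketch, assuming a Linux host with procfs mounted; `parse_sysctl_long` and `read_optmem_max` are hypothetical helper names, not existing QEMU functions.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a decimal value such as "102400\n"; returns 0 on success, -1 on error. */
static int parse_sysctl_long(const char *text, long *out)
{
    char *end = NULL;
    long val;

    errno = 0;
    val = strtol(text, &end, 10);
    if (errno != 0 || end == text) {
        return -1;
    }
    *out = val;
    return 0;
}

/* Read net.core.optmem_max (in bytes) from procfs; returns -1 on failure. */
static long read_optmem_max(void)
{
    char buf[32];
    long val = -1;
    FILE *f = fopen("/proc/sys/net/core/optmem_max", "r");

    if (!f) {
        return -1;
    }
    if (fgets(buf, sizeof(buf), f) && parse_sysctl_long(buf, &val) < 0) {
        val = -1;
    }
    fclose(f);
    return val;
}
```

A caller could compare the value returned by `read_optmem_max()` against 102400 (the value I used above) before enabling zero copy. That threshold is just what worked on my host, not a documented minimum.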

So when sendmsg returns -1 with errno ENOBUFS, should we distinguish between the "socket exceeds optmem limit" error and the "user exceeds ulimit on locked pages" error? Or is there a better way to avoid this problem?
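
One possible direction, sketched below, is to report both limits in the error hint. According to the doc quoted above, ENOBUFS is returned for either cause, so the code does not try to tell which limit was hit. `zerocopy_enobufs_hint` is a hypothetical helper, not part of the patch; it uses only `getrlimit(RLIMIT_MEMLOCK)` and takes the optmem_max value from the caller.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: format a hint for an ENOBUFS failure of a
 * MSG_ZEROCOPY send. optmem_max is net.core.optmem_max in bytes, or a
 * negative number if the caller could not read it. It does not detect
 * which limit was exceeded; it only prints both so the user can check.
 * Returns the snprintf() result.
 */
static int zerocopy_enobufs_hint(char *buf, size_t len, long optmem_max)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return snprintf(buf, len,
                        "MSG_ZEROCOPY failed with ENOBUFS: memlock limit "
                        "unknown, optmem_max=%ld bytes", optmem_max);
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return snprintf(buf, len,
                        "MSG_ZEROCOPY failed with ENOBUFS: memlock limit "
                        "unlimited, optmem_max=%ld bytes", optmem_max);
    }
    return snprintf(buf, len,
                    "MSG_ZEROCOPY failed with ENOBUFS: memlock limit=%llu "
                    "bytes, optmem_max=%ld bytes",
                    (unsigned long long)rl.rlim_cur, optmem_max);
}
```

The resulting string could be attached to the existing error in the `ENOBUFS` case, so that a user with a large memlock limit is pointed at optmem_max as well.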

Best Regards,

chuang xu
