On Sat, 2 Aug 2025 at 00:09, Daniel P. Berrangé <berra...@redhat.com> wrote: > eb3618e9 migration: activate TLS thread safety workaround > edea8183 io: add support for activating TLS thread safety workaround > 24ad5e19 crypto: implement workaround for GNUTLS thread safety problems > > an attempt was made to workaround broken gnutls thread safety when > TLS 1.3 rekeying is performed. > > Those patches acquired locks before calling gnutls_record_{send|recv} > but temporarily dropped the locks in the push/pull functions, in the > mistaken belief that there was a race inside gnutls that did not cross > execution of the push/pull functions. > > A non-deterministic reproducer mislead into thinking the workaround > was operating as expected, but this was wrong. Juraj demonstrated > that QEMU would still see errors from GNUTLS as well as crashes. > > The issue is that a pointer to internal state is saved before the > the push/pull functions are called, and after they return this > saved pointer is potentially invalid. IOW, it is never safe to > temporarily drop the mutexes inside the push/pull functions. The > lock must be held throughout execution of gnutls_record_{send|recv}. > > This would be possible with QEMU migration, except that the return > path thread sits in a blocking read waiting for data that very > rarely arrives from the destination QEMU. This blocks ability to > send any migration data in the other thread. > > It is possible to workaround this issue, however, by proactively > calling poll() to check for available incoming data before trying > the qio_channel_read() call. > > Reported-by: Juraj Marcin <jmar...@redhat.com> > Signed-off-by: Daniel P. Berrangé <berra...@redhat.com> > --- > crypto/tlssession.c | 16 ---------------- > migration/qemu-file.c | 16 ++++++++++++++++ > 2 files changed, 16 insertions(+), 16 deletions(-) > > diff --git a/crypto/tlssession.c b/crypto/tlssession.c > index 86d407a142..7e11317528 100644 > --- a/crypto/tlssession.c > +++ b/crypto/tlssession.c > @@ -95,19 +95,11 @@ qcrypto_tls_session_push(void *opaque, const void *buf, > size_t len) > return -1; > }; > > - if (session->lockEnabled) { > - qemu_mutex_unlock(&session->lock); > - } > - > error_free(session->werr); > session->werr = NULL; > > ret = session->writeFunc(buf, len, session->opaque, &session->werr); > > - if (session->lockEnabled) { > - qemu_mutex_lock(&session->lock); > - } > - > if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) { > errno = EAGAIN; > return -1; > @@ -134,16 +126,8 @@ qcrypto_tls_session_pull(void *opaque, void *buf, size_t > len) > error_free(session->rerr); > session->rerr = NULL; > > - if (session->lockEnabled) { > - qemu_mutex_unlock(&session->lock); > - } > - > ret = session->readFunc(buf, len, session->opaque, &session->rerr); > > - if (session->lockEnabled) { > - qemu_mutex_lock(&session->lock); > - } > - > if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) { > errno = EAGAIN; > return -1; > diff --git a/migration/qemu-file.c b/migration/qemu-file.c > index 8ee44c5ac9..cf6115e699 100644 > --- a/migration/qemu-file.c > +++ b/migration/qemu-file.c > @@ -338,6 +338,22 @@ static ssize_t coroutine_mixed_fn > qemu_fill_buffer(QEMUFile *f) > return 0; > } > > + /* > + * This feature triggers acquisition of mutexes around every > + * read and write. Thus we must not sit in a blocking read > + * if this is set, but must instead poll proactively. This > + * does not work with some channel types, however, so must > + * only pre-poll when the featre is set. > + */ > + if (qio_channel_has_feature(f->ioc, > + QIO_CHANNEL_FEATURE_CONCURRENT_IO)) { > + if (qemu_in_coroutine()) { > + qio_channel_yield(f->ioc, G_IO_IN); > + } else { > + qio_channel_wait(f->ioc, G_IO_IN); > + } > + } > + > do { > struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending }; > len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0, > --
* Looks okay. Reviewed-by: Prasad Pandit <p...@fedoraproject.org> Thank you. --- - Prasad