On 4.02.2025 19:25, Fabiano Rosas wrote:
Peter Xu <pet...@redhat.com> writes:
On Tue, Feb 04, 2025 at 03:08:02PM +0000, Daniel P. Berrangé wrote:
On Mon, Feb 03, 2025 at 01:20:01PM -0500, Peter Xu wrote:
On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
Multifd send channels are terminated by calling
qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
multifd_send_terminate_threads(), which in the TLS case essentially
calls shutdown(SHUT_RDWR) on the underlying raw socket.
Unfortunately, this does not terminate the TLS session properly and
the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.
The only reason why this wasn't causing migration failures is because
the current migration code apparently does not check for migration
error being set after the end of the multifd receive process.
However, this will change soon so the multifd receive code has to be
prepared to not return an error on such premature TLS session EOF.
Use the newly introduced QIOChannelTLS method for that.
It's worth noting that even if the sender were to be changed to terminate
the TLS connection properly the receive side still needs to remain
compatible with older QEMU bit stream which does not do this.
If this is an existing bug, we could add a Fixes.
Two pure questions..
- What is the correct way to terminate the TLS session without this flag?
- Why this is only needed by multifd sessions?
Graceful TLS termination (via gnutls_bye()) should only be important to
security if the QEMU protocol in question does not know how much data it
is expecting to recieve. ie it cannot otherwise distinguish between an
expected EOF, and a premature EOF triggered by an attacker.
If the migration protocol has sufficient info to know when a chanel is
expected to see EOF, then we should stop trying to read from the TLS
channel before seeing the underlying EOF.
Ignoring GNUTLS_E_PREMATURE_TERMINATION would be valid if we know that
migration will still fail corretly in the case of a malicious attack
causing premature termination.
If there's a risk that migration may succeed, but with incomplete data,
then we would need the full gnutls_bye dance.
IIUC that's not required for migration then, because migration should know
exactly how much data to receive, and migration should need to verify that
and fail if the received data didn't match the expectation along the way.
We also have QEMU_VM_EOF as the end mark of stream.
The migration overall can detect whether EOF should have been reached,
but multifd threads cannot. If one multifd channel experiences an issue
and sees a premature termination, but ignores it, then that's a hang in
QEMU because nothing provided the syncs needed (p->sem_sync, most
likely).
I think allowing premature TLS termination simply makes the TLS case
function the same as non-TLS case here if the source were to close the
multifd channel(s) early.
Aren't we just postponing a bug?
Thanks,
Maciej