On Wed, Sep 04, 2024 at 04:58:14PM -0400, Steven Sistare wrote:
> On 8/21/2024 2:34 PM, Peter Xu wrote:
> > On Fri, Aug 16, 2024 at 01:09:23PM -0400, Steven Sistare wrote:
> > > On 8/16/2024 12:17 PM, Peter Xu wrote:
> > > > On Fri, Aug 16, 2024 at 05:00:32PM +0100, Daniel P. Berrangé wrote:
> > > > > On Fri, Aug 16, 2024 at 11:34:10AM -0400, Peter Xu wrote:
> > > > > > On Fri, Aug 16, 2024 at 04:16:50PM +0100, Daniel P. Berrangé wrote:
> > > > > > > On Fri, Aug 16, 2024 at 11:06:10AM -0400, Peter Xu wrote:
> > > > > > > > On Thu, Aug 15, 2024 at 04:55:20PM -0400, Steven Sistare wrote:
> > > > > > > > > On 8/13/2024 3:46 PM, Peter Xu wrote:
> > > > > > > > > > On Tue, Aug 06, 2024 at 04:56:18PM -0400, Steven Sistare 
> > > > > > > > > > wrote:
> > > > > > > > > > > > The flipside, however, is that localhost migration via 
> > > > > > > > > > > > 2 separate QEMU
> > > > > > > > > > > > processes has issues where both QEMUs want to be 
> > > > > > > > > > > > opening the very same
> > > > > > > > > > > > file, and only 1 of them can ever have them open.
> > > > > > > > > > 
> > > > > > > > > > I thought we used to have similar issue with block devices, 
> > > > > > > > > > but I assume
> > > > > > > > > > it's solved for years (and whoever owns it will take proper 
> > > > > > > > > > file lock,
> > > > > > > > > > IIRC, and QEMU migration should properly serialize the time 
> > > > > > > > > > window on who's
> > > > > > > > > > going to take the file lock).
> > > > > > > > > > 
> > > > > > > > > > Maybe this is about something else?
> > > > > > > > > 
> > > > > > > > > I don't have an example where this fails.
> > > > > > > > > 
> > > > > > > > > I can cause "Failed to get "write" lock" errors if two qemu 
> > > > > > > > > instances open
> > > > > > > > > the same block device, but the error is suppressed if you add 
> > > > > > > > > the -incoming
> > > > > > > > > argument, due to this code:
> > > > > > > > > 
> > > > > > > > >     blk_attach_dev()
> > > > > > > > >       if (runstate_check(RUN_STATE_INMIGRATE))
> > > > > > > > >         blk->disable_perm = true;
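
(For illustration only, a minimal standalone sketch of the kind of write-lock
conflict described above, using an OFD lock similar in spirit to what QEMU's
image locking relies on; this is not QEMU's actual file-posix.c code and the
image path is hypothetical:)

    #define _GNU_SOURCE          /* for F_OFD_SETLK */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int fd = open("/tmp/disk.img", O_RDWR);   /* hypothetical image */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct flock fl = {
            .l_type   = F_WRLCK,    /* exclusive "write" lock */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,          /* whole file */
        };

        /* Fails if another process already holds the lock, which is what
         * surfaces as 'Failed to get "write" lock' when two QEMUs open
         * the same image and -incoming is not used. */
        if (fcntl(fd, F_OFD_SETLK, &fl) < 0) {
            fprintf(stderr, "Failed to get \"write\" lock: %s\n",
                    strerror(errno));
            return 1;
        }
        printf("lock acquired\n");
        return 0;
    }
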
> > > > > > > > 
> > > > > > > > Yep, this one is pretty much expected.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > > Indeed, and "files" includes unix domain sockets.
> > > > > > > > > 
> > > > > > > > > More on this -- the second qemu to bind a unix domain socket 
> > > > > > > > > for listening
> > > > > > > > > wins, and the first qemu loses it (because second qemu 
> > > > > > > > > unlinks and recreates
> > > > > > > > > the socket path before binding on the assumption that it is 
> > > > > > > > > stale).
> > > > > > > > > 
> > > > > > > > > One must use a different name for the socket for second qemu, 
> > > > > > > > > and clients
> > > > > > > > > that wish to connect must be aware of the new port.
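
(Again purely illustrative, not QEMU's chardev code: the second listener
unlinks the path on the assumption it is stale and binds again, which is how
it silently takes the socket over from the first QEMU; the socket path below
is hypothetical:)

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/qemu-monitor.sock";   /* hypothetical */
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

        unlink(path);              /* assume any existing socket is stale */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }
        listen(fd, 1);             /* the most recent binder "wins" */
        return 0;
    }
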
> > > > > > > > > 
> > > > > > > > > > > Network ports also conflict.
> > > > > > > > > > > cpr-exec avoids such problems, and is one of the 
> > > > > > > > > > > advantages of the method that
> > > > > > > > > > > I forgot to promote.
> > > > > > > > > > 
> > > > > > > > > > I was thinking that's fine, as the host ports should be the 
> > > > > > > > > > backend of the
> > > > > > > > > > VM ports only anyway so they don't need to be identical on 
> > > > > > > > > > both sides?
> > > > > > > > > > 
> > > > > > > > > > IOW, my understanding is it's the guest IP/ports/... which 
> > > > > > > > > > should still be
> > > > > > > > > > stable across migrations, where the host ports can be 
> > > > > > > > > > different as long as
> > > > > > > > > > the host ports can forward guest port messages correctly?
> > > > > > > > > 
> > > > > > > > > Yes, one must use a different host port number for the second 
> > > > > > > > > qemu, and clients
> > > > > > > > > that wish to connect must be aware of the new port.
> > > > > > > > > 
> > > > > > > > > That is my point -- cpr-transfer requires fiddling with such 
> > > > > > > > > things.
> > > > > > > > > cpr-exec does not.
> > > > > > > > 
> > > > > > > > Right, and my understanding is all these facilities are already 
> > > > > > > > there, so
> > > > > > > > no new code should be needed on reconnect issues if to support 
> > > > > > > > cpr-transfer
> > > > > > > > in Libvirt or similar management layers that support 
> > > > > > > > migrations.
> > > > > > > 
> > > > > > > Note Libvirt explicitly blocks localhost migration today because
> > > > > > > solving all these clashing resource problems is a huge can of 
> > > > > > > worms
> > > > > > > and it can't be made invisible to the user of libvirt in any 
> > > > > > > practical
> > > > > > > way.
> > > > > > 
> > > > > > Ahhh, OK.  I'm pretty surprised by this, as I thought at least 
> > > > > > kubevirt
> > > > > > supported local migration somehow on top of libvirt.
> > > > > 
> > > > > Since kubevirt runs inside a container, "localhost" migration
> > > > > is effectively migrating between 2 completely separate OS installs
> > > > > (containers), that happen to be on the same physical host. IOW, it
> > > > > is a cross-host migration from Libvirt & QEMU's POV, and there are
> > > > > no clashing resources to worry about.
> > > > 
> > > > OK, makes sense.
> > > > 
> > > > Then do you think it's possible to support cpr-transfer in that scenario
> > > > from Libvirt POV?
> > > > 
> > > > > 
> > > > > > Does it mean that cpr-transfer is a no-go in this case at least for 
> > > > > > Libvirt
> > > > > > to consume it (as cpr-* is only for local host migrations so far)?  
> > > > > > Even if
> > > > > > all the rest issues we're discussing with cpr-exec, is that the 
> > > > > > only way to
> > > > > > go for Libvirt, then?
> > > > > 
> > > > > cpr-exec is certainly appealing from the POV of avoiding the clashing
> > > > > resources problem in libvirt.
> > > > > 
> > > > > It has its own issues though, because libvirt runs all QEMU processes with
> > > > > seccomp filters that block 'execve', as we consider QEMU to be 
> > > > > untrustworthy
> > > > > and thus don't want to allow it to exec anything !
> > > > > 
> > > > > I don't know which is the lesser evil from libvirt's POV.
> > > > > 
> > > > > Personally I see security controls as an overriding requirement for
> > > > > everything.
> > > > 
> > > > One thing I am aware of is that cpr-exec is not the only feature that might start to
> > > > use exec() in QEMU. TDX fundamentally will need to create another key 
> > > > VM to
> > > > deliver the keys and the plan seems to be using exec() too.  However in
> > > > that case per my understanding the exec() is optional - the key VM can 
> > > > also
> > > > be created by Libvirt.
> > > > 
> > > > IOW, it looks like we can still stick with execve() being blocked yet so
> > > > far except cpr-exec().
> > > > 
> > > > Hmm, this makes the decision harder to make.  We need to figure out a 
> > > > way
> > > > on knowing how to consume this feature for at least open source virt
> > > > stack..  So far it looks like it's only possible (if we take seccomp 
> > > > high
> > > > priority) we use cpr-transfer but only in a container.
> > > 
> > > libvirt starts qemu with the -sandbox spawn=deny option which blocks 
> > > fork, exec,
> > > and change namespace operations.  I have a patch in my workspace to be 
> > > submitted
> > > later called "seccomp: fine-grained control of fork, exec, and namespace" 
> > > that allows
> > > libvirt to block fork and namespace but allow exec.
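
(As a rough, hypothetical illustration of what such fine-grained control
amounts to at the seccomp level -- this is not the actual patch, nor QEMU's
qemu-seccomp.c -- a libseccomp filter can deny fork and namespace syscalls
while leaving execve alone:)

    #include <errno.h>
    #include <seccomp.h>     /* link with -lseccomp */

    static int apply_filter(void)
    {
        /* Default action: allow, then deny selected syscalls. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
        if (!ctx) {
            return -1;
        }

        /* Deny process creation (thread creation via clone is ignored
         * here for simplicity; the real thing needs argument filtering). */
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fork), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(vfork), 0);

        /* Deny namespace manipulation. */
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(unshare), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(setns), 0);

        /* execve/execveat stay allowed by the default action, which is
         * what a cpr-exec style flow would rely on. */

        int rc = seccomp_load(ctx);
        seccomp_release(ctx);
        return rc;
    }
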
> > 
> > The question is whether that would be accepted, and it also gives me the
> > feeling that even if it's accepted, it might limit the use cases that cpr
> > can apply to.
> 
> This is more acceptable for libvirt running in a container (such as under 
> kubevirt)
> with a limited set of binaries in /bin that could be exec'd.  In that case 
> allowing
> exec is more reasonable.
> 
> > What I read so far from Dan is that cpr-transfer seems to be also preferred
> > from Libvirt POV:
> > 
> >    https://lore.kernel.org/r/zr9-ivorkgjre...@redhat.com
> > 
> > Did I read it right?
> 
> I read that as: cpr-transfer is a viable option for libvirt.  I don't hear him
> excluding the possibility of cpr-exec.

I would prefer not having two solutions: if they solve the same problem,
one of them may end up unmaintained at some point, unless they suit
different needs.  But I don't feel strongly about it, especially if
cpr-exec is a light addition on top of cpr-transfer.

> 
> I agree that "Dan the libvirt expert prefers cpr-transfer" is a good reason to
> provide cpr-transfer.  Which I will do.
> 
> So does "Steve the OCI expert prefers cpr-exec" carry equal weight, for also
> providing cpr-exec?

As an open source project, Libvirt consuming the feature means it will be
actively used and tested.  And if, e.g., a new feature ever replaces CPR,
we will know when the old CPR can be deprecated, whether -exec or
-transfer.

Closed source projects can be great in their own right, but IMHO they
naturally carry less weight in open source communities because they are
not accessible to anyone in the community.  E.g., we never know when a
closed source project abandons a feature, so QEMU could end up carrying
that feature forever without knowing whether anyone still uses it.

To me it's the same reason Linux doesn't maintain a kABI for out-of-tree
drivers.  It's just that here the open source virt stack is a huge
project and QEMU plays its role within it.

> 
> We are at an impasse on this series.  To make forward progress, I am willing 
> to
> reorder the patches, and re-submit cpr-transfer as the first mode, so we can
> review and pull that.  I will submit cpr-exec as a follow on and we can resume
> our arguments then.

Yes, that would also make it easier to justify how small a change
cpr-exec needs on top of cpr-transfer.  But I'd still wait for comments
from Dan or others in case they chime in, just to avoid sinking your
time into rebases.

Thanks,

-- 
Peter Xu
