Re: [PATCH v5 3/5] migration: enable multifd and postcopy together

Peter Xu Tue, 11 Feb 2025 07:20:45 -0800

On Tue, Feb 11, 2025 at 02:34:07PM +0530, Prasad Pandit wrote:
> On Mon, 10 Feb 2025 at 22:29, Peter Xu <pet...@redhat.com> wrote:
> > Yes, and I suggest a rename or introduce a new helper, per previous reply.
> 
> * Okay, will try it.
> 
> > I didn't follow, sorry - do you mean this patch is correct on dropping the
> > mapped-ram check? I don't yet understand how it can work if without.
> 
> * It goes for channel peek if '!migrate_mapped_ram', ie. when
> mapped_ram is not set. When it is enabled, likely it just falls into
> the multifd channel, like other tls/file channels. I'll see if we have
> to add a check for mapped_ram stream, like tls/file one.
> 
> > I meant tls channels should have these magics too.  Do you mean they're not?
> 
> * Yes. AFAIU, tls/file channels don't send magic values.


Please double check whether TLS channels send magics.  AFAICT, they should.

> 
> > No I don't think so.
> > Flushing sending side makes sure send buffer is empty.  It doesn't
> > guarantee recv buffer is empty on the other side.
> 
> * A simple 'flush' operation is not supposed to guarantee reception on
> the destination side. It is just a 'flush' operation. If we want to
> _confirm_ whether the pages sent to the destination are received or
> not, then the destination side should send an 'Acknowledgement' to the
> source side. Is there such a mechanism in place currently?

No.  We need to figure out a way to do that properly, and that's exactly
what I mentioned as one of the core changes we need in this series, which
is still missing.  We may or may not need an ACK message.  Please think
about it.
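
To make the flush-vs-receive distinction concrete, here is a minimal
standalone sketch in Python.  It is not QEMU code: the socketpair stands in
for one migration channel, and send_and_flush() is a hypothetical name used
only for illustration.  It shows that the send side can finish completely
while the receiver has consumed nothing.

```python
import socket


def send_and_flush(sock, payload):
    """Hypothetical sender: returns once the kernel has accepted every byte."""
    sock.sendall(payload)
    return len(payload)


# A connected pair of sockets stands in for one migration channel.
src, dst = socket.socketpair()
page = b"\xaa" * 4096      # stand-in for one guest page
consumed = bytearray()     # what the destination has actually processed

sent = send_and_flush(src, page)

# The send side is done here, yet the destination has read nothing:
# the page is still sitting in the receive-side socket buffer.
pending_before = len(consumed)

# The page is only consumed once the receiver actually runs.
while len(consumed) < sent:
    consumed += dst.recv(sent - len(consumed))

src.close()
dst.close()
```

Whatever mechanism we pick has to close the window between "sent" and
"consumed"; an ACK message would be one way to do it, not necessarily the
only one.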

> 
> > >
> > > * If all multifd pages are sent/written/flushed onto the multifd
> > > channels before postcopy_start() is called, then multifd pages should
> > > not arrive at the destination after postcopy starts IIUC.  If that is
> > > happening, we need a reproducer for such a case. Do we have such a
> > > reproducer?
> >
> > With or without a reproducer, we need to at least justify it in theory.  If
> > it doesn't even work in theory, it's a problem.
> 
> * The theory that both multifd and postcopy channels use the same
> underlying network wire; And in that multifd pages get delayed, but
> postcopy pages don't, is not understandable. There must be something
> else happening in such a case, which a reproducer could help with.

Please consider the case where the multifd recv threads get scheduled out
on the dest host during the precopy phase and do not get a chance to run
again until postcopy has already started on dst.  A recv thread can then
stumble upon a page that was sent during precopy.  As long as that can
always be avoided, I think we should be good.
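
A toy model of that interleaving, as a Python sketch under loud assumptions:
the queue stands in for one multifd channel, the Event simulates the recv
thread being scheduled out, and run(), guest_ram and the page labels are all
hypothetical.  It does not model QEMU's real data structures; it only shows
why the destination has to drain the precopy pages before postcopy starts.

```python
import queue
import threading


def run(drain_before_postcopy):
    channel = queue.Queue()       # stand-in for one multifd channel
    guest_ram = {}                # page index -> contents on the destination
    may_run = threading.Event()   # cleared = recv thread is "scheduled out"

    def recv_thread():
        while True:
            may_run.wait()        # blocks while the thread is scheduled out
            item = channel.get()
            if item is None:      # sentinel: channel is shut down
                channel.task_done()
                return
            idx, data = item
            guest_ram[idx] = data  # apply the page received over multifd
            channel.task_done()

    t = threading.Thread(target=recv_thread)
    t.start()

    # Source sent (and flushed) this page during precopy.
    channel.put((0, "precopy-v1"))

    if drain_before_postcopy:
        may_run.set()
        channel.join()            # wait until every queued page is applied

    # Postcopy is running now: page 0 gets newer contents on the destination.
    guest_ram[0] = "postcopy-v2"

    # The recv thread finally gets to run (if it had not already).
    may_run.set()
    channel.put(None)
    t.join()
    return guest_ram[0]


stale = run(drain_before_postcopy=False)
fresh = run(drain_before_postcopy=True)
```

Without the drain, the late recv thread overwrites the newer contents with
the stale precopy copy; with the drain, the newer contents survive.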

Thanks,

-- 
Peter Xu
