On Mon, Oct 30, 2023 at 6:58 AM Fabiano Rosas <faro...@suse.de> wrote:
>
> Hao Xiang <hao.xi...@bytedance.com> writes:
>
> > On Fri, Oct 27, 2023 at 5:30 AM Fabiano Rosas <faro...@suse.de> wrote:
> >>
> >> Hao Xiang <hao.xi...@bytedance.com> writes:
> >>
> >> > Juan Quintela had a patchset enabling zero page checking in multifd
> >> > threads.
> >> >
> >> > https://lore.kernel.org/all/20220802063907.18882-13-quint...@redhat.com/
> >>
> >> Hmm, risky to base your series on code more than a year old. We should
> >> bother Juan so he sends an updated version for review.
> >>
> >> I have concerns about that series. First is why are we doing payload
> >> processing (i.e. zero page detection) in the multifd thread. And that
> >> affects your series directly, because AFAICS we're now doing more
> >> processing still.
> >>
> >
> > I am pretty new to QEMU so my take could be wrong. We can wait for Juan
> > to comment here. My understanding is that the migration main loop was
> > originally designed to handle a single sender thread (before the multifd
> > feature). Zero page checking is a pretty CPU intensive operation. So in
> > the case of multifd, we scaled up the number of sender threads in order
> > to saturate network traffic.
>
> Right. That's all fine.
>
> > Doing zero page checking in the
> > main loop is not going to scale with this new design.
>
> Yep. Moving work outside of the main loop is reasonable. Juan is
> focusing on separating the migration code from the QEMUFile internals,
> so moving zero page into multifd is a step in the right direction from
> that perspective.
>
> > In fact, we (Bytedance) have merged Juan's change into our internal
> > QEMU and we have been using this feature since last year. I was told
> > that it improved performance pretty significantly. Ideally, I would
> > love to see zero page checking be done in a separate thread pool so we
> > can scale it independently from the sender threads, but doing it in the
> > sender thread is an inexpensive way to scale.
>
> Yep, you got the point. And I acknowledge that reusing the sender
> threads is the natural next step. Even if we go that route, let's make
> sure it still leaves us space to separate pre-processing from actual
> sending.

Totally agree. Right now, pre-processing is more than zero page
checking. One can turn on compression/decompression, and that is also
done before the actual sending. Currently, doing compression/decompression
on the CPU (even with multiple threads) doesn't quite keep up with today's
large network throughput, but hardware acceleration like Intel's QAT
can really make a difference. To make that happen, the multifd
sender/receiver path needs some refactoring.
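
To make the pre-processing vs. sending split concrete, here is a minimal
sketch in plain C. It is not QEMU code: PageDesc, page_is_zero(),
preprocess_pages() and SKETCH_PAGE_SIZE are hypothetical names, and the
4 KiB page size is an assumption. It only shows a pre-processing pass
that classifies pages so that a later send stage (not shown) can skip
the payload of the zero pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical per-page descriptor; not a QEMU structure. */
typedef struct {
    const uint8_t *data; /* points at SKETCH_PAGE_SIZE bytes */
    bool is_zero;        /* filled in by the pre-processing stage */
} PageDesc;

/* Portable zero check: OR all 64-bit words of the page together. */
static bool page_is_zero(const uint8_t *page, size_t len)
{
    uint64_t acc = 0;

    assert(len % sizeof(uint64_t) == 0);
    for (size_t i = 0; i < len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, page + i, sizeof(w)); /* avoids unaligned access */
        acc |= w;
    }
    return acc == 0;
}

/*
 * Pre-processing stage: mark zero pages and return how many pages
 * still need their payload sent. A separate send stage would then
 * act on the is_zero flags.
 */
static size_t preprocess_pages(PageDesc *pages, size_t n)
{
    size_t to_send = 0;

    for (size_t i = 0; i < n; i++) {
        pages[i].is_zero = page_is_zero(pages[i].data, SKETCH_PAGE_SIZE);
        if (!pages[i].is_zero) {
            to_send++;
        }
    }
    return to_send;
}
```

Keeping the classification in its own function is what would let it
run either in the sender thread or in a separate pool (or be replaced
by an accelerated implementation) without touching the send stage.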
