On Mon, Nov 24, 2025 at 8:35 AM Jason Wang <[email protected]> wrote:
>
> On Thu, Nov 6, 2025 at 11:23 AM Zhang Chen <[email protected]> wrote:
> >
> > On Thu, Nov 6, 2025 at 9:10 AM Zhijian Li (Fujitsu)
> > <[email protected]> wrote:
> > >
> > >
> > >
> > > On 06/11/2025 04:58, Peter Xu wrote:
> > > > On Tue, Nov 04, 2025 at 09:36:06AM +0800, Li Zhijian wrote:
> > > > >> Commit 4881411136 ("migration: Always set DEVICE state") set a new DEVICE
> > > > >> state before completed during migration, which broke the original transition
> > > > >> to COLO. The migration flow for precopy has changed to:
> > > > >> active -> pre-switchover -> device -> completed.
> > > >>
> > > >> This patch updates the transition state to ensure that the Pre-COLO
> > > >> state corresponds to DEVICE state correctly.
> > > >>
> > > >> Fixes: 4881411136 ("migration: Always set DEVICE state")
> > > >> Signed-off-by: Li Zhijian <[email protected]>
> > > >> ---
> > > >>   migration/migration.c | 4 ++--
> > > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > > >>
> > > >> diff --git a/migration/migration.c b/migration/migration.c
> > > >> index a63b46bbef..6ec7f3cec8 100644
> > > >> --- a/migration/migration.c
> > > >> +++ b/migration/migration.c
> > > >> @@ -3095,9 +3095,9 @@ static void migration_completion(MigrationState 
> > > >> *s)
> > > >>           goto fail;
> > > >>       }
> > > >>
> > > >> -    if (migrate_colo() && s->state == MIGRATION_STATUS_ACTIVE) {
> > > >> +    if (migrate_colo() && s->state == MIGRATION_STATUS_DEVICE) {
> > > >>           /* COLO does not support postcopy */
> > > >> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> > > >> +        migrate_set_state(&s->state, MIGRATION_STATUS_DEVICE,
> > > >>                             MIGRATION_STATUS_COLO);
> > > >>       } else {
> > > >>           migration_completion_end(s);
> > > >
> > > > Thanks a lot for fixing it, Zhijian.  It means I broke COLO already for
> > > > 10.0/10.1..
> > > >
> > > > Hailiang/Chen, do you still know anyone who is using COLO, especially in
> > > > enterprise?  I don't expect any individual using it.. It definitely
> > > > > complicates migration logic all over the place.  Fabiano and I have
> > > > > discussed removing legacy code a few times, and COLO was always on the list.
> > > >
> > > > > We used to discuss RDMA obsoletion too; that's when Huawei developers at
> > > > > least tried to re-implement the whole of RDMA using rsocket, which didn't
> > > > > land only because of a perf regression.  Meanwhile, Zhijian also provided a
> > > > > unit test, which we have relied on recently to at least not break RDMA.
> > > >
> > > > > If we do not have known users, I sincerely want to discuss with you the
> > > > > obsoletion and removal of COLO from the qemu codebase.  Do you see that as
> > > > > feasible?
> > > >
> > > > Zhijian, do you have any input here?
> > >
> > >
> > > > If we don't have any known users, I personally have no objection to
> > > > removing COLO.
> > >
> > > > From my previous understanding, its use cases are rather limited, and the
> > > > checkpointing overhead is significant.
> > > > Moreover, with the continuous development of Cloud Native over the past
> > > > decade, service-based FT/HA solutions have become very mature, which shrinks
> > > > the use cases for VM-based FT solutions even further.
> > >
> > > I think it's worth keeping if we have:
> > >
> > > - Active users who depend on it.
> > > - A unit test for the COLO framework.
> > >
> > > Thanks
> > > Zhijian
> > >
> > >
> >
> > Add CC Lukas.
> >
> > From a technical point of view, I agree with Zhijian's comments. We can
> > probably do this gradually.
> > On my side, I know some local companies build their HA/FT products based on
> > COLO.
> > In this case, I think most of them have already forked the upstream QEMU code
> > into a private repo that they maintain internally.
> > It may cause some upgrade issues in the future.
> >
> > Another part is the COLO integration in the pacemaker project, which Lukas
> > covered, and I don't know the status of its users.
> > Maybe Lukas can add some comments?
> >
> > As for the implementation, COLO is not only the migration code (which
> > is the core of COLO); it also includes network and block replication
> > parts that work together with it.
> > If we remove the migration-related code, we need to consider how to handle
> > the other parts: should the network part become a general QEMU netfilter?
> > What about block replication?
>
> Since the netfilter code is mostly decoupled from COLO, I think we can
> keep it. Or just deprecate colo-compare.
>

If the decision to deprecate is made, I will send a patch to deprecate
the colo-compare related code.
That is OK with me.

Thanks
Chen

> Thanks
>
> >
> > For the COLO framework unit test, I think we need to add some "#if
> > defined(qtest)" in the migration code for testing (the COLO proxy/netfilter
> > already has an independent qtest).
> >
> > Thanks
> > Chen
> >
> >
> >
> >
> >
> > >
> > > >
> > > > Thanks,
> > > >
> >
>
