On Mon, Nov 24, 2025 at 8:35 AM Jason Wang <[email protected]> wrote:
>
> On Thu, Nov 6, 2025 at 11:23 AM Zhang Chen <[email protected]> wrote:
> >
> > On Thu, Nov 6, 2025 at 9:10 AM Zhijian Li (Fujitsu)
> > <[email protected]> wrote:
> > >
> > >
> > > On 06/11/2025 04:58, Peter Xu wrote:
> > > > On Tue, Nov 04, 2025 at 09:36:06AM +0800, Li Zhijian wrote:
> > > >> Commit 4881411136 ("migration: Always set DEVICE state") set a new DEVICE
> > > >> state before completed during migration, which broke the original transition
> > > >> to COLO. The migration flow for precopy has changed to:
> > > >> active -> pre-switchover -> device -> completed.
> > > >>
> > > >> This patch updates the transition state to ensure that the Pre-COLO
> > > >> state corresponds to DEVICE state correctly.
> > > >>
> > > >> Fixes: 4881411136 ("migration: Always set DEVICE state")
> > > >> Signed-off-by: Li Zhijian <[email protected]>
> > > >> ---
> > > >>  migration/migration.c | 4 ++--
> > > >>  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >>
> > > >> diff --git a/migration/migration.c b/migration/migration.c
> > > >> index a63b46bbef..6ec7f3cec8 100644
> > > >> --- a/migration/migration.c
> > > >> +++ b/migration/migration.c
> > > >> @@ -3095,9 +3095,9 @@ static void migration_completion(MigrationState *s)
> > > >>          goto fail;
> > > >>      }
> > > >>
> > > >> -    if (migrate_colo() && s->state == MIGRATION_STATUS_ACTIVE) {
> > > >> +    if (migrate_colo() && s->state == MIGRATION_STATUS_DEVICE) {
> > > >>          /* COLO does not support postcopy */
> > > >> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> > > >> +        migrate_set_state(&s->state, MIGRATION_STATUS_DEVICE,
> > > >>                            MIGRATION_STATUS_COLO);
> > > >>      } else {
> > > >>          migration_completion_end(s);
> > > >
> > > > Thanks a lot for fixing it, Zhijian.  It means I broke COLO already for
> > > > 10.0/10.1..
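
For anyone following the state change in the patch quoted above, here is a
rough standalone sketch of why the old check stopped matching. It is a toy
model, not QEMU code: the ToyStatus enum, toy_set_state() and
toy_completion() are hypothetical stand-ins, and I am assuming the
transition helper only moves the state when the current value equals the
expected old value, which is how the patch uses migrate_set_state().

```c
#include <stdbool.h>

/* Simplified stand-ins for the MIGRATION_STATUS_* values named in the patch. */
typedef enum {
    TOY_STATUS_ACTIVE,
    TOY_STATUS_PRE_SWITCHOVER,
    TOY_STATUS_DEVICE,
    TOY_STATUS_COMPLETED,
    TOY_STATUS_COLO,
} ToyStatus;

/*
 * Compare-and-set transition: move *state to new_state only if it currently
 * equals old_state.  Returns true when the transition happened.
 * (Assumption: the real helper behaves similarly; it is not reproduced here.)
 */
static bool toy_set_state(ToyStatus *state, ToyStatus old_state,
                          ToyStatus new_state)
{
    if (*state != old_state) {
        return false;
    }
    *state = new_state;
    return true;
}

/*
 * Walk the precopy flow described in the commit message
 * (active -> pre-switchover -> device), then try to enter COLO from the
 * state the completion check expects.  Returns the final state.
 */
ToyStatus toy_completion(bool colo_enabled, ToyStatus expected)
{
    ToyStatus state = TOY_STATUS_ACTIVE;

    toy_set_state(&state, TOY_STATUS_ACTIVE, TOY_STATUS_PRE_SWITCHOVER);
    toy_set_state(&state, TOY_STATUS_PRE_SWITCHOVER, TOY_STATUS_DEVICE);

    if (colo_enabled && state == expected) {
        /* COLO path: only taken when the expected state matches. */
        toy_set_state(&state, expected, TOY_STATUS_COLO);
    } else {
        /* Stand-in for the normal completion path. */
        toy_set_state(&state, TOY_STATUS_DEVICE, TOY_STATUS_COMPLETED);
    }
    return state;
}
```

In this model, toy_completion(true, TOY_STATUS_ACTIVE) skips the COLO
branch and ends in the completed state, while
toy_completion(true, TOY_STATUS_DEVICE) enters COLO, which matches the
behaviour change the patch describes.
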
> > > >
> > > > Hailiang/Chen, do you still know anyone who is using COLO, especially in
> > > > enterprise?  I don't expect any individual to be using it.  It definitely
> > > > complicates the migration logic all over the place.  Fabiano and I have
> > > > discussed removing legacy code a few times, and COLO was always on the list.
> > > >
> > > > We used to discuss RDMA obsoletion too.  That's when Huawei developers at
> > > > least tried to re-implement the whole RDMA using rsocket, which didn't land
> > > > only because of a perf regression.  Meanwhile, Zhijian also provided a
> > > > unit test, which we have relied on recently to at least not break RDMA.
> > > >
> > > > If we do not have known users, I sincerely want to discuss with you the
> > > > obsoletion and removal of COLO from the qemu codebase.  Do you see this as
> > > > feasible?
> > > >
> > > > Zhijian, do you have any input here?
> > >
> > >
> > > If we don't have any known users, I personally have no objection to
> > > removing COLO.
> > >
> > > From my previous understanding, its use cases are rather limited, and
> > > the checkpointing overhead is significant.
> > > Moreover, with the continuous development of Cloud Native over the past
> > > decade, service-based FT/HA solutions have become very mature, which
> > > shrinks the use cases for VM-based FT solutions even further.
> > >
> > > I think it's worth keeping if we have:
> > >
> > > - Active users who depend on it.
> > > - A unit test for the COLO framework.
> > >
> > > Thanks
> > > Zhijian
> > >
> >
> > Add CC Lukas.
> >
> > From a technical point of view, I agree with Zhijian's comments. We can
> > probably do this gradually.
> > On my side, I know some local companies build their HA/FT products based
> > on COLO.
> > In this case, I think most of them have already forked the upstream QEMU
> > code into a private repo for internal maintenance.
> > It may cause some upgrade issues in the future.
> >
> > Another point is that Lukas covered integrating COLO with the pacemaker
> > project, and I don't know the status of its users.
> > Maybe Lukas can add some comments?
> >
> > For the implementation, COLO does not only have the migration part of the
> > code (which is the core of COLO); it also includes network and block
> > replication that work together with it.
> > If we remove the migration-related code, we need to consider how to handle
> > the other parts: maybe change the network part to a general QEMU netfilter?
> > And block replication?
>
> Since the netfilter code is mostly decoupled from COLO, I think we can
> keep it. Or just deprecate colo-compare.
>

If the deprecation decision is made, I will send a patch to deprecate the
colo-compare related code. That is OK with me.

Thanks
Chen

> Thanks
>
> >
> > For the COLO framework unit test, I think we need to add some "#if
> > defined(qtest)" in the migration code for testing (COLO proxy/netfilter
> > already have independent qtests).
> >
> > Thanks
> > Chen
> >
> > > >
> > > > Thanks,
> > > >
