On Tue Aug 22, 2023 at 2:44 PM AEST, Pavel Dovgalyuk wrote:
> On 11.08.2023 04:47, Nicholas Piggin wrote:
> > RR CPU switching is driven by timers and events so it is deterministic
> > like everything else. Record a CPU switch event and use that to drive
> > the CPU switch on replay.
> >
> > Signed-off-by: Nicholas Piggin <npig...@gmail.com>
> > ---
> > This is still in RFC phase because so far I've only really tested ppc
> > pseries, and only with patches that are not yet upstream (but posted
> > to list).
> >
> > It works with smp 2, can step, reverse-step, reverse-continue, etc.
> > throughout a Linux boot.
>
> I still didn't have time to test it, but here are some comments.

That's okay, I got a little further, mainly adding vmstate to migrate it
(otherwise we can only use the initial snapshot). Unless there is more
interest, I will focus on getting ppc fixes upstream first. Let me know
if you have more time to look, I can send you the latest.

[snip]

> > @@ -294,9 +346,9 @@ static void *rr_cpu_thread_fn(void *arg)
> >                  qatomic_set_mb(&cpu->exit_request, 0);
> >              }
> >
> > -            if (all_cpu_threads_idle()) {
> > -                rr_stop_kick_timer();
> > +            qatomic_set(&rr_next_cpu, cpu);
>
> This does not seem to be in the mainline.

Sorry, I meant to squash that in or send it out. The kick timer init vs
start needed to be moved to make it work.

[snip]

> > -bool replay_exception(void)
> > +bool replay_switch_cpu(void)
> > +{
> > +    if (replay_mode == REPLAY_MODE_RECORD) {
> > +        g_assert(replay_mutex_locked());
> > +        replay_save_instructions();
> > +        replay_put_event(EVENT_SWITCH_CPU);
> > +        return true;
> > +    } else if (replay_mode == REPLAY_MODE_PLAY) {
> > +        bool res = replay_has_switch_cpu();
> > +        if (res) {
> > +            replay_finish_event();
> > +        } else {
> > +            g_assert_not_reached();
> > +        }
> > +        return res;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +bool replay_has_switch_cpu(void)
>
> Is this function really needed?

I found it was easier to fit in with the way the CPU scheduling is done
in rr. I think the main scheduling loop could be refactored a bit to
avoid the need for this (e.g., a helper function that returns the next
CPU, with all the selection code including rr in there). But that became
non-trivial, and the code looks a bit delicate. I might try to tackle
that afterwards.

Thanks,
Nick
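
For illustration only, the refactoring idea mentioned above (a single
helper that returns the next CPU and keeps the record/replay decision in
one place) might look roughly like the standalone toy model below. It is
not QEMU code: every name in it (toy_next_cpu, toy_sched_log, the TOY_*
constants) is hypothetical, and as a simplifying assumption it logs every
scheduling decision, whereas the patch records only the switch event and
relies on the instruction count to position it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the idea; none of these names exist in QEMU. */
enum toy_mode { TOY_MODE_NONE, TOY_MODE_RECORD, TOY_MODE_PLAY };
enum toy_event { TOY_EV_STAY, TOY_EV_SWITCH_CPU };

#define TOY_LOG_MAX 64

struct toy_sched_log {
    enum toy_mode mode;
    enum toy_event events[TOY_LOG_MAX];
    size_t len;   /* number of decisions recorded so far */
    size_t pos;   /* next decision to consume during replay */
};

/*
 * Return the index of the CPU that should run next.
 *
 * RECORD: the live input (timer_fired) decides, and the decision is logged.
 * PLAY:   the live input is ignored and the logged decision is used instead,
 *         so the schedule is reproduced exactly.
 * NONE:   plain round-robin driven by the live input.
 */
static int toy_next_cpu(struct toy_sched_log *log, int cur, int nr_cpus,
                        bool timer_fired)
{
    bool do_switch = timer_fired;

    if (log->mode == TOY_MODE_RECORD) {
        assert(log->len < TOY_LOG_MAX);
        log->events[log->len++] = do_switch ? TOY_EV_SWITCH_CPU : TOY_EV_STAY;
    } else if (log->mode == TOY_MODE_PLAY) {
        /* Running past the end of the log means the replay has diverged. */
        assert(log->pos < log->len);
        do_switch = (log->events[log->pos++] == TOY_EV_SWITCH_CPU);
    }

    return do_switch ? (cur + 1) % nr_cpus : cur;
}
```

With this shape the caller never asks "is there a switch event pending?";
it only asks for the next CPU, which is the property that would make a
separate replay_has_switch_cpu()-style query unnecessary.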