Paolo Bonzini <pbonz...@redhat.com> writes: > On 31/03/2017 13:21, Alex Bennée wrote: >> Anyway I think I'm getting closer to narrowing it down. On record I see >> replay_current_step jump backwards with this: >> >> /* A common event print, called when reading or saving an event */ >> static void print_event(uint8_t event) >> { >> static int event_count; >> static uint64_t last_step; >> uint64_t this_step = replay_get_current_step(); >> >> fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n", >> event_count, event, this_step); >> >> if (this_step < last_step) { >> fprintf(stderr,"%s: !!! step %d went backwards >> %#"PRIx64"=>%#"PRIx64"!!!\n", >> __func__, event_count, last_step, this_step); >> abort(); >> } >> last_step = this_step; >> event_count++; >> } >> >> void replay_put_event(uint8_t event) >> { >> assert(event < EVENT_COUNT); >> replay_put_byte(event); >> print_event(event); >> } >> >> The jump back is fairly consistent in my case (event 66 in the vexpress >> run) but I'm fairly sure that is the bug. I can't see any reason for the >> step count to go backwards - indeed that breaks the premise of . > > Good catch! I suspect it's the "else" case in cpu_get_icount_raw: > > icount = timers_state.qemu_icount; > if (cpu) { > if (!cpu->can_do_io) { > fprintf(stderr, "Bad icount read\n"); > exit(1); > } > icount -= (cpu->icount_decr.u16.low + cpu->icount_extra); > } > > Between > > timers_state.qemu_icount += count; > > and > > timers_state.qemu_icount -= (cpu->icount_decr.u16.low > + cpu->icount_extra); > > the I/O thread can read a value that is later rolled back. I think you > need to do this the other way round: add something to icount in > cpu_get_icount_raw rather than taking it off: > > icount = timers_state.qemu_icount; > if (cpu) { > if (!cpu->can_do_io) { > fprintf(stderr, "Bad icount read\n"); > exit(1); > } > icount += cpu->pending_icount > - (cpu->icount_decr.u16.low + cpu->icount_extra); > } > > where cpu->pending_icount is set before cpu_exec, and folded into > timers_state.qemu_icount afterwards. > > Also, here: > > if (use_icount) { > int64_t count; > int decr; > timers_state.qemu_icount -= (cpu->icount_decr.u16.low > + cpu->icount_extra); > cpu->icount_decr.u16.low = 0; > cpu->icount_extra = 0; > > this should be dead code because tcg_cpu_exec also clears the two > decrementer fields. So if you can replace the three assignments with > assertions on cpu->icount_decr.u16.low and cpu->icount_extra, that would > also simplify the code and remove race opportunities.
I'll have a look at that on Monday. I wrote this before I saw your email: https://github.com/stsquad/qemu/tree/mttcg/debug-record-replay-v1 It fixes the race so time only ever goes forward. Combined with the following patches it also makes the record/replay streams tolerant of "missing the boat". With this I can do a initrd run of the vexpress kernel in both record and playback. I'm not sure it counts as deterministic though - the vCPU and main-loop thread seem to figure it out as the go along but I suspect if we really want to be sure we push the replay_lock() further up. This would ensure all related events from whichever thread are pushed together. This is very much a come back on Monday and see if it still seems like a good idea in the cold light of morning ;-) -- Alex Bennée