Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9

Alex Bennée Fri, 31 Mar 2017 12:49:30 -0700

Paolo Bonzini <pbonz...@redhat.com> writes:

> On 31/03/2017 13:21, Alex Bennée wrote:
>> Anyway I think I'm getting closer to narrowing it down. On record I see
>> replay_current_step jump backwards with this:
>>
>> /* A common event print, called when reading or saving an event */
>> static void print_event(uint8_t event)
>> {
>>     static int event_count;
>>     static uint64_t last_step;
>>     uint64_t this_step = replay_get_current_step();
>>
>>     fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
>>             event_count, event, this_step);
>>
>>     if (this_step < last_step) {
>>         fprintf(stderr,"%s: !!! step %d went backwards 
>> %#"PRIx64"=>%#"PRIx64"!!!\n",
>>                 __func__, event_count, last_step, this_step);
>>         abort();
>>     }
>>     last_step = this_step;
>>     event_count++;
>> }
>>
>> void replay_put_event(uint8_t event)
>> {
>>     assert(event < EVENT_COUNT);
>>     replay_put_byte(event);
>>     print_event(event);
>> }
>>
>> The jump back is fairly consistent in my case (event 66 in the vexpress
>> run) but I'm fairly sure that is the bug. I can't see any reason for the
>> step count to go backwards - indeed that breaks the premise of .
>
> Good catch!  I suspect it's the "else" case in cpu_get_icount_raw:
>
>     icount = timers_state.qemu_icount;
>     if (cpu) {
>         if (!cpu->can_do_io) {
>             fprintf(stderr, "Bad icount read\n");
>             exit(1);
>         }
>         icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
>     }
>
> Between
>
>         timers_state.qemu_icount += count;
>
> and
>
>         timers_state.qemu_icount -= (cpu->icount_decr.u16.low
>                                     + cpu->icount_extra);
>
> the I/O thread can read a value that is later rolled back.  I think you
> need to do this the other way round: add something to icount in
> cpu_get_icount_raw rather than taking it off:
>
>     icount = timers_state.qemu_icount;
>     if (cpu) {
>         if (!cpu->can_do_io) {
>             fprintf(stderr, "Bad icount read\n");
>             exit(1);
>         }
>         icount += cpu->pending_icount
>             - (cpu->icount_decr.u16.low + cpu->icount_extra);
>     }
>
> where cpu->pending_icount is set before cpu_exec, and folded into
> timers_state.qemu_icount afterwards.
>
> Also, here:
>
>     if (use_icount) {
>         int64_t count;
>         int decr;
>         timers_state.qemu_icount -= (cpu->icount_decr.u16.low
>                                     + cpu->icount_extra);
>         cpu->icount_decr.u16.low = 0;
>         cpu->icount_extra = 0;
>
> this should be dead code because tcg_cpu_exec also clears the two
> decrementer fields.  So if you can replace the three assignments with
> assertions on cpu->icount_decr.u16.low and cpu->icount_extra, that would
> also simplify the code and remove race opportunities.


I'll have a look at that on Monday. I wrote this before I saw your
email:

  https://github.com/stsquad/qemu/tree/mttcg/debug-record-replay-v1

It fixes the race so time only ever goes forward. Combined with the
following patches it also makes the record/replay streams tolerant of
"missing the boat". With this I can do a initrd run of the vexpress
kernel in both record and playback.

I'm not sure it counts as deterministic though - the vCPU and main-loop
thread seem to figure it out as the go along but I suspect if we really
want to be sure we push the replay_lock() further up. This would ensure
all related events from whichever thread are pushed together.

This is very much a come back on Monday and see if it still seems like a
good idea in the cold light of morning ;-)

--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9

Reply via email to