It wasn't easy to apply this patch, both because the version you pointed to had compilation problems and because the mail archive distorted the patch content, but I finally got it working :)
Applying this patch finally made all my tests succeed... almost :) Now qemu may hang at a random moment of emulation, though not a hard hang. The symptoms look like those I've already reported here: https://bugs.launchpad.net/qemu/+bug/1790460 . So, this isn't record/replay-specific. Although I wasn't able to trigger this issue without the rr= option, that doesn't mean much, given how unstable the issue is and how hard it is to reproduce.

Commit I tested against (with patches applied): 53a19a9a5f9811a911e9b69ef36afb0d66b5d85c .

Tue, 9 Oct 2018 at 17:26, Pavel Dovgalyuk <dovga...@ispras.ru>:

> Maybe this will help?
>
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg560780.html
>
> Pavel Dovgalyuk
>
> *From:* Artem Pisarenko [mailto:artem.k.pisare...@gmail.com]
> *Sent:* Tuesday, October 09, 2018 2:24 PM
> *To:* Pavel Dovgalyuk
> *Cc:* pavel.dovga...@ispras.ru; qemu-devel@nongnu.org
> *Subject:* Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and
> adding reverse debugging
>
> (Since all previous patches are already merged to master, I'm running
> tests against latest (almost) version from master branch. Following results
> are based on master commit dafd95053611aa14dda40266857608d12ddce658 .)
>
> Applying this patch made Tests 1 and 2 succeed (at least I wasn't able to
> achieve failures with several attempts).
>
> Also I've tried a few tests without sleep=off and/or rtc base options. All
> of them succeed too, except one case - removing sleep=off (regardless of
> -rtc option values or its presence at all) causes qemu to hang hard in
> recording mode at very startup. Process needs to be killed.
>
> Some info from debugger:
>
> qemu-system-x86_64 [13231] [cores: 2,4,5,7]
>   Thread #1 [qemu-system-x86] 13231 [core: 2] (Suspended : Container)
>     __lll_lock_wait() at lowlevellock.S:135 0x7f00b116626d
>     __GI___pthread_mutex_lock() at pthread_mutex_lock.c:80 0x7f00b115fdbd
>     qemu_mutex_lock_impl() at qemu-thread-posix.c:66 0x947ac4
>     replay_mutex_lock() at replay-internal.c:206 0x7f3dea
>     os_host_main_loop_wait() at main-loop.c:235 0x94335e
>     main_loop_wait() at main-loop.c:497 0x943429
>     main_loop() at vl.c:1,853 0x5be70f
>     main() at vl.c:4,575 0x5c56e0
>   Thread #2 [qemu-system-x86] 13282 [core: 4] (Suspended : Container)
>   Thread #3 [qemu-system-x86] 13283 [core: 5] (Suspended : Container)
>   Thread #4 [qemu-system-x86] 13284 [core: 7] (Suspended : Step)
>     cpu_get_icount_raw() at cpus.c:301 0x45a0a0
>     replay_get_current_step() at replay.c:67 0x7f2f14
>     replay_save_instructions() at replay-internal.c:225 0x7f3ea0
>     replay_save_clock() at replay-time.c:24 0x7f483d
>     icount_warp_rt() at cpus.c:512 0x45a745
>     qemu_account_warp_timer() at cpus.c:690 0x45ad55
>     qemu_tcg_rr_cpu_thread_fn() at cpus.c:1,498 0x45c554
>     qemu_thread_start() at qemu-thread-posix.c:504 0x9485cf
>     start_thread() at pthread_create.c:333 0x7f00b115d6ba
>     clone() at clone.S:109 0x7f00b0e9341d
>   gdb (7.11.1)
>
> Threads #2,3 are just waiting in poll or similar. Nothing extraordinary.
>
> Thread #4 cycles inside do {} while() loop of cpu_get_icount_raw()
> function:
>
>     do {
>         start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
>         icount = cpu_get_icount_raw_locked();
>     } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
>
> Value of timers_state.vm_clock_seqlock.sequence is always 3.
>
> Tue, 9 Oct 2018 at 15:04, Pavel Dovgalyuk <dovga...@ispras.ru>:
>
> Please try the following patch.
>
> There was a problem with rtc option in record/replay mode.
>
> diff --git a/vl.c b/vl.c
> index 40d5d0f..afe1c20 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2885,6 +2885,7 @@ int main(int argc, char **argv, char **envp)
>      DisplayState *ds;
>      QemuOpts *opts, *machine_opts;
>      QemuOpts *icount_opts = NULL, *accel_opts = NULL;
> +    QemuOpts *rtc_opts = NULL;
>      QemuOptsList *olist;
>      int optind;
>      const char *optarg;
> @@ -3691,12 +3692,11 @@ int main(int argc, char **argv, char **envp)
>                  warn_report("This option is ignored and will be removed soon");
>                  break;
>              case QEMU_OPTION_rtc:
> -                opts = qemu_opts_parse_noisily(qemu_find_opts("rtc"), optarg,
> -                                               false);
> -                if (!opts) {
> +                rtc_opts = qemu_opts_parse_noisily(qemu_find_opts("rtc"),
> +                                                   optarg, false);
> +                if (!rtc_opts) {
>                      exit(1);
>                  }
> -                configure_rtc(opts);
>                  break;
>              case QEMU_OPTION_tb_size:
>  #ifndef CONFIG_TCG
> @@ -3907,6 +3907,9 @@ int main(int argc, char **argv, char **envp)
>      loc_set_none();
>      replay_configure(icount_opts);
> +    if (rtc_opts) {
> +        configure_rtc(rtc_opts);
> +    }
>      if (incoming && !preconfig_exit_requested) {
>          error_report("'preconfig' and 'incoming' options are "
>
> Pavel Dovgalyuk
>
> *From:* Artem Pisarenko [mailto:artem.k.pisare...@gmail.com]
> *Sent:* Thursday, October 04, 2018 4:16 PM
> *To:* dovgaluk
> *Cc:* pavel.dovga...@ispras.ru; qemu-devel@nongnu.org
> *Subject:* Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and
> adding reverse debugging
>
> No, it didn't change the test results, at least for
> https://github.com/ispras/qemu/tree/rr-180911 . Even the step values it
> gets stuck on are the same for most runs.
>
> Playing with master and my own branch gives different results for tests
> without sleep=off and -rtc base. It seems that the patch you mentioned
> didn't change them very much.
>
> The only thing that can be said for sure is that this patch does not fix
> the issues completely. But it MAY fix them partially or in some other
> specific cases...
>
> Wed, 3 Oct 2018 at 12:47, dovgaluk <dovga...@ispras.ru>:
>
> Can you try applying this patch?
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg563798.html
>
> I also encountered the problems with x86_64 replaying and found a
> misprint in the code, which was fixed after the series had been sent
> to the mailing list.
>
> Pavel Dovgalyuk
>
> Artem Pisarenko wrote on 2018-10-02 10:02:
> > I've added "-monitor stdio" option to command line of Test 1 and
> > repeatedly entered the command during execution:
> >
> > QEMU 3.0.50 monitor - type 'help' for more information
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 311736195
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 318198367
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 324737211
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 329890795
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 607069789
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 607069789
> > (qemu) info replay
> > Replaying execution 'icount_rr_capture.bin': current step = 607069789
> > ...
> >
> > Some notes on the step value it gets stuck on:
> > - mostly it's the same (even across different record-replay pairs);
> > - stressing host during replay may cause it to change even for same
> > record-replay pair (i.e. different replay executions for same file
> > recorded).
> >
> > This specific case seems to be stable to reproduce.
> >
> > Tue, 2 Oct 2018 at 0:22, Artem Pisarenko
> > <artem.k.pisare...@gmail.com>:
> >
> >> I've posted a bug report with extended tests (incl. case without
> >> sleep=off). You may find guest image (kernel) in bug description.
> >> https://bugs.launchpad.net/qemu/+bug/1795369 [1]
> >>
> >> The most annoying thing is that some issues are almost not
> >> reproducible. There are definitely race conditions somewhere in qemu
> >> code. Running 'stress-ng' utility with CPU and I/O stressors in
> >> parallel with qemu execution greatly reduces the number of attempts
> >> needed to trigger some of the issues I encounter.
> >>
> >> I'll try 'info monitor' command tomorrow, but no guarantees that
> >> I'll be able to reproduce the issue again.
> >>
> >> Speaking about '-nographic' and SDL... I've noted that UI greatly
> >> reduces the possibility of hanging (but doesn't avoid it completely)
> >> when using icount in general, so this effect isn't rr-specific. I've
> >> already reported this bug too.
> >>
> >> Mon, 1 Oct 2018, 20:14 dovgaluk <dovga...@ispras.ru>:
> >>
> >>> Artem Pisarenko wrote on 2018-09-30 14:01:
> >>>> Feature still broken :(
> >>>
> >>> Thanks for testing.
> >>>
> >>>>
> >>>> Brief description of my tests.
> >>>>
> >>>> Guest image is Linux, which just powers off after kernel boots
> >>>> (instead of proceeding to user-space /init or /sbin/init).
> >>>> Base cmdline:
> >>>> qemu-system-x86_64 -nodefaults -machine pc,accel=tcg -m 2048 -cpu
> >>>> qemu64 -rtc clock=vm,base=2000-01-01T00:00:00 -kernel bzImage -initrd
> >>>> rootfs -append 'nokaslr console=ttyS0 rdinit=/init_poweroff'
> >>>> -nographic -serial SERIAL_VALUE -icount
> >>>> 1,sleep=off,rr=RR_VALUE,rrfile=icount_rr_capture.bin
> >>>
> >>> I've never tried it with sleep=off. Can you remove it and try
> >>> again?
> >>>
> >>> We have also seen a problem with '-nographic'. When we remove this
> >>> option and QEMU runs with an SDL window, everything is ok. There is
> >>> some problem with the main loop, which may sleep when there is no GUI
> >>> to update, or something like that. We couldn't fix it yet.
> >>>
> >>>>
> >>>> Test 1.
> >>>> When SERIAL_VALUE=none
> >>>> Running with RR_VALUE=record completes successfully.
> >>>> Running with RR_VALUE=replay doesn't complete. The qemu process just
> >>>> eats ~100% cpu and memory usage doesn't grow after some moment. I
> >>>> don't see what happens because of problem no.2 (see below).
> >>>
> >>> Try 'info replay' monitor command. Does the instruction counter
> >>> increase?
> >>>
> >>>>
> >>>> Test 2. When SERIAL_VALUE=stdio
> >>>> Running with RR_VALUE=record completes successfully.
> >>>>
> >>>> Running with RR_VALUE=replay causes exit with error:
> >>>>
> >>>> "qemu-system-x86_64: Missing character write event in the replay log"
> >>>>
> >>>> These problems are the same with qemu 2.12 (both vanilla and with
> >>>> previous versions of these patches applied). Furthermore, I consider
> >>>> the whole icount mode broken and determinism isn't achievable.
> >>>> The irony is that I actually don't need the record/replay feature.
> >>>> I've tried to use it only as an instrument to debug failing
> >>>> determinism in qemu code. But since the replay/record feature itself
> >>>> relies on determinism, which is broken, it's no wonder that it fails
> >>>> too (I just hoped to bypass it).
> >>>>
> >>>> Contact me if you need more details. I'm just very tired of trying
> >>>> to get all these things working... Hope is leaving me...
> >>>
> >>> Can you share the kernel in case the icount is still broken?
> >>>
> >>> Pavel Dovgalyuk
> >> --
> >>
> >> Best regards,
> >> Artem Pisarenko
> > --
> >
> > Best regards,
> > Artem Pisarenko
> >
> > Links:
> > ------
> > [1] https://bugs.launchpad.net/qemu/+bug/1795369
> --
>
> Best regards,
> Artem Pisarenko
>
> --
>
> Best regards,
> Artem Pisarenko
--
Best regards,
Artem Pisarenko
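
A note on the quoted debugger output, where Thread #4 spins in the do {} while() loop of cpu_get_icount_raw() and timers_state.vm_clock_seqlock.sequence stays at 3: in the usual seqlock pattern, an odd sequence value means a write section is open, and a reader keeps retrying until it is closed. Below is a minimal single-threaded sketch of that pattern, assuming the usual even/odd convention. ToySeqLock and all toy_* names are hypothetical, invented for illustration; this is not QEMU's seqlock code, it omits atomics and memory barriers, and the retry loop is bounded by max_tries so that the sketch always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a seqlock; NOT QEMU's implementation.
 * Even sequence: no write in progress. Odd sequence: write in progress. */
typedef struct {
    unsigned sequence;
} ToySeqLock;

/* Writer enters its critical section: sequence becomes odd. */
static void toy_write_begin(ToySeqLock *sl)
{
    sl->sequence++;
}

/* Writer leaves its critical section: sequence becomes even again. */
static void toy_write_end(ToySeqLock *sl)
{
    assert(sl->sequence & 1u);   /* must pair with toy_write_begin() */
    sl->sequence++;
}

/* Reader takes a snapshot with the low bit cleared, so that an odd
 * (write-in-progress) value can never match it in toy_read_retry(). */
static unsigned toy_read_begin(const ToySeqLock *sl)
{
    return sl->sequence & ~1u;
}

/* Reader must retry if the sequence differs from its snapshot. */
static bool toy_read_retry(const ToySeqLock *sl, unsigned start)
{
    return sl->sequence != start;
}

/* Bounded variant of the reader loop. Returns true and stores the value
 * in *out if a consistent read was obtained within max_tries attempts. */
static bool toy_read_value(const ToySeqLock *sl, const int64_t *value,
                           int64_t *out, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned start = toy_read_begin(sl);
        int64_t snapshot = *value;
        if (!toy_read_retry(sl, start)) {
            *out = snapshot;
            return true;
        }
    }
    return false;   /* sequence stayed odd or kept changing */
}
```

In this model, calling toy_read_value() after toy_write_begin() and before toy_write_end() fails however large max_tries is, because the sequence stays odd; after toy_write_end() it succeeds on the first attempt. Whether an unfinished write section is what actually happens in the reported hang is not established in this thread.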