On 07/06/2015 11:35 AM, Peter Maydell wrote:
> I'm seeing a qtest hang in the /x86_64/ahci/io/ncq/simple
> test case. It looks like QEMU is running OK, but the qtest test
> is busy-looping in ahci_command_wait():
>
> #0 ahci_command_wait (ahci=0x1003f3f9400, cmd=0x1003f401810) at
> /home/pm215/qemu/tests/libqos/ahci.c:929
> #1 0x000000001001ba10 in ahci_command_issue (ahci=0x1003f3f9400,
> cmd=0x1003f401810)
> at /home/pm215/qemu/tests/libqos/ahci.c:937
> #2 0x0000000010019f18 in ahci_guest_io (ahci=0x1003f3f9400, port=5
> '\005', ide_cmd=97 'a', buffer=1097728,
> bufsize=4096, sector=0) at /home/pm215/qemu/tests/libqos/ahci.c:632
> #3 0x000000001000b640 in ahci_test_io_rw_simple (ahci=0x1003f3f9400,
> bufsize=4096, sector=0, read_cmd=
> 96 '`', write_cmd=97 'a') at /home/pm215/qemu/tests/ahci-test.c:886
> #4 0x000000001000d434 in test_ncq_simple () at
> /home/pm215/qemu/tests/ahci-test.c:1439
> #5 0x00000080753db01c in 000000ca.plt_call.strncasecmp@@GLIBC_2.3+0
> () from /lib64/libglib-2.0.so.0
> #6 0x00000080753db1f8 in 000000ca.plt_call.strncasecmp@@GLIBC_2.3+0
> () from /lib64/libglib-2.0.so.0
> #7 0x00000080753db1f8 in 000000ca.plt_call.strncasecmp@@GLIBC_2.3+0
> () from /lib64/libglib-2.0.so.0
> #8 0x00000080753db1f8 in 000000ca.plt_call.strncasecmp@@GLIBC_2.3+0
> () from /lib64/libglib-2.0.so.0
> #9 0x00000080753db1f8 in 000000ca.plt_call.strncasecmp@@GLIBC_2.3+0
> () from /lib64/libglib-2.0.so.0
> #10 0x00000080753db6c0 in .g_test_run_suite () from /lib64/libglib-2.0.so.0
> #11 0x00000080753db778 in .g_test_run () from /lib64/libglib-2.0.so.0
> #12 0x000000001000de70 in main (argc=1, argv=0x3fffeeb5a578) at
> /home/pm215/qemu/tests/ahci-test.c:1703
>
> If you singlestep, we just loop round and round. Presumably
> the condition we're expecting just never becomes true.
>
> I've only seen this on a ppc64 host system; my x86-64 and arm
> 'make check' runs have been fine. Have you tested your AHCI
> qtest code on a big endian system? (Not necessarily the
> problem, Paolo also said he'd seen an intermittent failure
> in one of these ahci tests on an x86 host. But the ppc fail
> seems to be reliably always on the same test.)
>
> thanks
> -- PMM
>

I'll take a look. It's more than possible that there's a race in the
wait condition that I have just never seen before. I tweaked it a
little recently to wait on both NCQ and traditional completion, but it
doesn't have a timeout or anything, so if the completion never shows up
the test just spins.

I'll go ahead and fix that while I'm here. Should be something simple,
hopefully.

--js
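
For illustration only, a bounded wait could look roughly like the sketch
below. This is not the libqos code: wait_with_timeout(), wait_pred_fn,
struct fake_port and fake_port_done() are hypothetical names, and the
counter stands in for whatever completion state the real
ahci_command_wait() checks. The point is just that the loop gives up at
a deadline and reports failure instead of spinning forever.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and nanosleep() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical predicate: returns true once the awaited condition holds. */
typedef bool (*wait_pred_fn)(void *opaque);

/* Current monotonic time in microseconds. */
static int64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Poll pred(opaque) until it returns true or timeout_us microseconds
 * have elapsed. Returns true if the condition was met, false on timeout.
 */
bool wait_with_timeout(wait_pred_fn pred, void *opaque, int64_t timeout_us)
{
    const struct timespec pause = { 0, 100000 };   /* 100 us between polls */
    int64_t deadline;

    assert(pred != NULL);
    deadline = now_us() + timeout_us;

    for (;;) {
        if (pred(opaque)) {
            return true;
        }
        if (now_us() >= deadline) {
            return false;
        }
        nanosleep(&pause, NULL);
    }
}

/* Stand-in for a device: done after N polls, or never if N is negative. */
struct fake_port {
    int polls_until_done;
};

bool fake_port_done(void *opaque)
{
    struct fake_port *p = opaque;

    if (p->polls_until_done < 0) {
        return false;               /* simulates a completion that never arrives */
    }
    if (p->polls_until_done == 0) {
        return true;
    }
    p->polls_until_done--;
    return false;
}
```

A caller in a test would then assert on the return value, so a stuck
command shows up as a test failure rather than a hang.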