Jason Wang <jasow...@redhat.com> writes:

> On 2019/1/25 3:12 PM, Markus Armbruster wrote:
>> Jason Wang <jasow...@redhat.com> writes:
>>
>>> On 2019/1/24 5:47 PM, Markus Armbruster wrote:
>>>> Please cc: me on QMP issues.
>>>
>>> Ok.
>>>
>>>
>>>> Jason Wang <jasow...@redhat.com> writes:
>>>>
>>>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>>>> * Jason Wang (jasow...@redhat.com) wrote:
>>>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasow...@redhat.com> wrote:
>>>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>>>> <dgilb...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>>>>> > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>>> > intermittently, typically when run on some other TCG architecture.
>>>>>>>>>> > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>>> > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>>> > perhaps with other guests.
>>>>>>>>>>
>>>>>>>>>> Watch out to see if you really do see it for other guests;
>>>>>>>>>> it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>>> uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>>>
>>>>>>>>>> > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>>> > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>>> > memory bitmaps like the migration test (which also hangs
>>>>>>>>>> > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>>> > in the past) ?
>>>>>>>>>>
>>>>>>>>>> I don't think it relies on the CPU at all.
>>>>>>>>>> I have no idea about this currently, but Jason and I designed the
>>>>>>>>>> test case.
>>>>>>>>>> Add Jason: Have any comments about this ?
>>>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>>>>> when mainloop work.
>>>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> -- PMM
>>>>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>>>>> running.
>>>>>>>
>>>>>>> Bisection points to OOB monitor[1].
>>>>>>>
>>>>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>>>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>>>>>> there any other similar and simple thing that we could do to kick the
>>>>>>> mainloop?
>>>>>> Do you mean the:
>>>>>>
>>>>>> /* send a qmp command to guarantee that 'connected' is setting to
>>>>>> true. */
>>>>>> qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>> Yes.
>>>>>
>>>>>
>>>>>> why was that ever sufficient to know the socket was ready?
>>>>> It was suggested by Fam, I don't remember the details. Can we make
>>>>> sure all pending events has been processed (UNIX socket was set to
>>>>> connected) after query-status is returned with an non OOB monitor?
>>>> I'm afraid I lack context. Which socket are you talking about? The
>>>> test has at least the QMP socket, the send_sock[], and recv_sock. What
>>>> exactly are you trying to accomplish?
>>>
>>> I mean recv_sock. If mirror tries to send a packet to it before its
>>> is_connected is set to true, packet will be dropped.
>> So the *socket* is connected (in the TCP sense),
>
>
> UNIX domain socket actually in the case of this test.

Yes.

>> but something else
>> (whatever owns is_connected) is not. Can you point me to where
>> is_connected is set to true?
>
>
> Sorry, should be "connected". It was set in tcp_chr_connect(). So if
> filter want to send a packet to socket chardev before
> tcp_chr_connect() is called, the packet will be dropped silently by
> tcp_chr_write(). This will fail this unit-test.

Aha: the thing that isn't connected is the character device.

>>>> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
>>>> looks rather fishy. Why create a temporary file only to create a Unix
>>>> domain socket right over it?
>>>
>>> I vaguely remember passing fd created by unix domain socket doesn't
>>> work when the test is introduced. So my understanding is the author
>>> needs a way to create a unique file name which will be used by Unix
>>> domain socket at that time.
>> We should really, really, really improve the test harness to run each
>> test program in its very own temporary directory. Then tests can simply
>> create files with fixed names, and leave cleanup to the test harness.
>
>
> Agree, but for this test, since passing fd works now. I tend to using
> socketpair().

Resources that don't require manual cleanup (such as file descriptors
obtained with socketpair() or pipe()) are the best choice when they
work.

[..]
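
To illustrate the socketpair() point, here is a minimal standalone sketch.
It is not taken from test-filter-mirror.c: the helper name
socketpair_roundtrip() is hypothetical and it uses plain POSIX calls only.
It shows that both descriptors are usable as soon as socketpair() returns
and that nothing is left in the filesystem to clean up. It says nothing
about when the chardev's own "connected" flag gets set, which is a separate
question.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: write msg into one end of a
 * socketpair and read it back from the other end.
 * Returns 0 if the bytes read back match msg, -1 on any failure.
 */
static int socketpair_roundtrip(const char *msg)
{
    int sv[2];
    char buf[64];
    size_t len, got = 0;
    int ret = -1;

    assert(msg != NULL);
    len = strlen(msg);
    if (len == 0 || len > sizeof(buf)) {
        return -1;
    }

    /* No path name, no temporary file: both ends come back connected. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    if (write(sv[0], msg, len) == (ssize_t)len) {
        /* A stream socket may return short reads, so loop until done. */
        while (got < len) {
            ssize_t n = read(sv[1], buf + got, len - got);
            if (n <= 0) {
                break;
            }
            got += (size_t)n;
        }
        if (got == len && memcmp(buf, msg, len) == 0) {
            ret = 0;
        }
    }

    /* Closing the two descriptors is the only cleanup required. */
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Compared with mkstemp() followed by unix_connect(), there is no socket
file to unlink if the test aborts half way.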