I wrote:
> This patch works (in that it fixes this problem with a test case I have
> coincidentally received from another reporter this week), although I
> notice that doing read/write syscalls via gdb is dreadfully slow
> because there seems to be ~1second delay between gdb sending its response
> to a syscall (Fwrite/Fread) packet and getting the ack back from qemu.
> I'm guessing that's a different bug...

In fact it's an extremely close relative of the bug you're trying to fix,
and it shows why the approach you're trying to take here is the wrong one.

Basically the stalls occur when gdb sends us the reply to our syscall
before we've got round to processing the vm_stop(): in this case
gdb_read_byte() will drop the initial "$" on the floor and then send
(another) vm_stop(). After this we drop the whole of the reply packet
(because we're in RS_IDLE/RS_SYSCALL and the "$" was dropped rather than
handled). So we stall until gdb times out and retransmits the packet.

Some debug tracing demonstrating this: here's a "good" syscall
(# comments mine):

gdb_do_syscall: vm_stop(RUN_STATE_DEBUG)
reply='Fread,00000003,04000188,00000200'
gdb_vm_state_change: to 0   # we got the vm_state_change first...
gdb_chr_receive bytes 1042
Got ACK   # so we take the "+" ACK and then handle the packet
<5:$><2:M><2:4><2:0><2:0><2:0><2:1><2:8><2:8><2:,><2:2><2:0><2:0><2::><2:3><2:4><2:3><2:5><2:3><2:2><2:3><2:2><2:0><2:9><2:3><2:1><2:3><2:4><2:3><2:4><2:3><2:3><2:3><2:8><2:3><2:6><2:3><2:2><2:3><2:3><2:3><2:2><2:3><2:7><2:0><2:9><2:3><2:3><2:3><2:5><2:3><2:9><2:3><2:7><2:3><2:7><2:3><2:2><2:3><2:2>
# this tracing is <s->state:char>; I've snipped the dull middle bit out of it
<2:7><2:#><3:0><4:3>command='M4000188,200:343532320931343433383632333237093335393737323233340a313332333532340939313030343533323909323639393834360a323037313036353035340933333438373030313909313232313039383630350a313433313031363434390933303239343136303409313037323631363436380a33363337323434300931353134373636363009313533393131343436320a333438313332353031093131383038383930373909313732333831303638360a32303033303836343639093133333831333634373609313537363139353139350a333838353638383330093136343832333036343209313331303035373931310a33373333343736343009363437373232363633093939303738343230370a313830323134383139340934363932383339313709313831333238343936370a3130393835323638373309383239303536313531093133393836333037380a3230303835373232303209383331373535393937093337343936373630300a3139353935383537330932303532383534363032093337363239313132350a34393839303031373809393737393837343232093239393837323533310a3635303337363833380933363936313832333609313733303838383938300a31383331323635393137093230393334323839323309313736373236313432300a3130323139313837343609313532323134303437'
# that was GDB writing the memory for our read()
reply='OK'
gdb_chr_receive bytes 9
Got ACK
<1:$><2:F><2:2><2:0><2:0><2:#><3:d><4:8>command='F200'
# and the final status from the read(), so go again:
gdb_continue: vm_start()
gdb_vm_state_change: to 1

Here's tracing when it goes wrong:

gdb_do_syscall: vm_stop(RUN_STATE_DEBUG)
reply='Fread,00000003,04000188,00000200'
gdb_chr_receive bytes 1042   # got the data back before the state change
Got ACK
dropping char $, vm_stop(RUN_STATE_PAUSED)
# ...so gdb_read_byte drops this $ and sends a spurious state change:
gdb_vm_state_change: to 0
<5:M><1:4><1:0><1:0><1:0><1:1><1:8><1:8><1:,><1:2><1:0><1:0><1::><1:3><1:6><1:3><1:3><1:3><1:9><1:0><1:9><1:3><1:1><1:3><1:3><1:3><1:8><1:3><1:9><1:3><1:0><1:3><1:7><1:3><1:9><1:3>
# and we end up ignoring the whole packet
<1:1><1:2>gdb_chr_receive bytes 1041   # gdb got bored and retransmitted
<1:$><2:M><2:4><2:0><2:0>
# snip again; this time we got it, though.
<2:#><3:1><4:2>command='M4000188,200:3633390931333839303739333432093533353238363134310a31363432363633313938093437343432363309313538323438323433370a313033333230363230320938343431363939333909313135333236333539300a3139393238363531323809323836373931363331093138313232363531330a313635303939343537310931343835353131383034093938363437383235370a323132343839383133380938343839333436383309313133313335323334360a313534313431373534300939343331393034393509313134353135313232350a33303338373232360938373730363839373209313234353033363432310a313339303836353732350939353631333431353809313630383334303633340a38333230373736343509313733313139303935320936353132303335360a37333637333333390931313839393334343609313232303538353437320a3738343137363033093137303134373538383309313636333536383131310a3932323538373534320937303732353538323509313135383734373636310a31323039333739313734093838383438323333390934343437303231360a353437343037333330093138373439363035393609323033373333353334340a313339363334323031330938353838323932393409313534303834363236370a3139323034383836300932303033393830353139'
# etc

I think the right way to deal with both the problem you were seeing
and this related issue is simply not to try to send the syscall request
until we have really stopped the CPU. That is, when not in
CONFIG_USER_ONLY we should send the syscall request from
gdb_vm_state_change().

-- PMM
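
To make the suggestion above concrete, here is a rough, self-contained toy model of the "don't send the syscall request until we have really stopped" idea. It is only a sketch and is not QEMU code: ToyStub and all the toy_* functions are made-up names for illustration, and the real gdbstub state machine (RS_IDLE, RS_SYSCALL and friends) is collapsed into a single "is the VM running" flag. The point it tries to show is just the ordering: the request is queued when the syscall is raised and only goes out on the wire from the state-change callback, so gdb cannot reply before we are ready to parse the "$".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: these are hypothetical names, not QEMU's structures. */
typedef struct {
    bool vm_running;      /* is the guest CPU still executing? */
    bool stop_requested;  /* a stop was asked for but has not happened yet */
    char pending[64];     /* syscall request waiting to be sent ("" if none) */
    char wire[64];        /* last request actually sent to gdb ("" if none) */
} ToyStub;

static void toy_init(ToyStub *s)
{
    memset(s, 0, sizeof(*s));
    s->vm_running = true;
}

/* Raise a syscall: queue the request and ask for a stop, but do NOT
 * transmit anything yet, because the CPU has not actually stopped. */
static void toy_do_syscall(ToyStub *s, const char *request)
{
    assert(strlen(request) < sizeof(s->pending));
    snprintf(s->pending, sizeof(s->pending), "%s", request);
    s->stop_requested = true;
}

/* State-change callback: this is the only place a queued request is sent,
 * and only on the transition to "stopped". */
static void toy_vm_state_change(ToyStub *s, bool running)
{
    s->vm_running = running;
    if (running) {
        return;
    }
    s->stop_requested = false;
    if (s->pending[0] != '\0') {
        snprintf(s->wire, sizeof(s->wire), "%s", s->pending);
        s->pending[0] = '\0';
    }
}

/* In this model an incoming '$' is only parsed as a packet start once the
 * VM is stopped; while running it would be dropped. */
static bool toy_accepts_packet(const ToyStub *s)
{
    return !s->vm_running;
}
```

With this ordering, by the time anything appears in wire (and so by the time gdb could possibly answer), toy_accepts_packet() is already true, so the reply's leading "$" is never the character that gets dropped.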