Re: [Kgdb-bugreport] [PATCH 2.6.20-rc7] 8139too KGDBoE fix

Mark Huth Fri, 23 Feb 2007 12:34:48 -0800

Stephen Hemminger wrote:

On Fri, 23 Feb 2007 11:10:40 -0700
Mark Huth <[EMAIL PROTECTED]> wrote:
Amit S. Kale wrote:
Hi Net Gurus,
This thread came up on the kgdb-bugreport mailing list. Could you please suggest to us the correct way of fixing this problem?
1. When running kgdb on an RTL8139 ethernet interface, the 8139too driver prints too many "Out-of-sync dirty pointer" messages on the console and gdb can't connect to the kgdb stub. These messages can be suppressed, though that still results in frequent connection failures.
We think this comes from calling the driver while the queue is stopped. Drivers should not do horrible things when hard start is called with the queue stopped, but unfortunately, at this time, at least some drivers do explode or complain under that condition.
The kernel is built on a set of assumptions about calling context. Your
out-of-tree code is violating one of them. Why not check for a stopped queue
and do some action to try and clear it? That is what netconsole does.
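The "check for a stopped queue and try to clear it" idea can be sketched in user space. This is only a toy model under stated assumptions, not the netpoll or netconsole code: struct poll_dev, toy_poll(), toy_xmit() and polled_send() are hypothetical names, and the "hardware" is reduced to a counter of in-flight descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device; not the kernel's struct net_device. */
struct poll_dev {
    bool queue_stopped;  /* models the driver's "queue stopped" flag */
    int  tx_in_flight;   /* descriptors the toy hardware still owns */
    int  tx_ring_size;   /* total descriptors in the toy TX ring */
    int  sent;           /* packets accepted for transmission */
};

/* Models the driver's poll/interrupt path: reclaim one descriptor, wake the queue. */
static void toy_poll(struct poll_dev *dev)
{
    if (dev->tx_in_flight > 0)
        dev->tx_in_flight--;
    if (dev->tx_in_flight < dev->tx_ring_size)
        dev->queue_stopped = false;
}

/* Models hard start: refuses to run on a stopped queue, stops it when the ring fills. */
static int toy_xmit(struct poll_dev *dev)
{
    if (dev->queue_stopped)
        return -1;  /* a real driver might complain or worse here */
    dev->tx_in_flight++;
    dev->sent++;
    if (dev->tx_in_flight == dev->tx_ring_size)
        dev->queue_stopped = true;
    return 0;
}

/* Polled send: never call toy_xmit() on a stopped queue; poll the device
 * to try to clear the condition, and give up after max_tries attempts. */
static int polled_send(struct poll_dev *dev, int max_tries)
{
    assert(dev != NULL && max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (!dev->queue_stopped)
            return toy_xmit(dev);
        toy_poll(dev);
    }
    return -1;
}
```

With a full, stopped ring, a call such as polled_send(&dev, 3) polls once to free a descriptor and then transmits, while polled_send(&dev, 1) gives up without ever calling the transmit routine on the stopped queue.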

Yes, of course. This is just an incidental thing that happens because of the real problem, which is the use of CONFIG_NETPOLL_TRAP in the netif_stop/wake_queue routines. Information about the necessity of that code would be appreciated, because when that option is selected, the queue management interface is squashed. That leads to the situation where the device driver thinks the queue is stopped but the flag for that does not get changed, and device drivers then either panic or complain.
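A small user-space model may make the mismatch concrete. It assumes, as described above, that the stop/wake helpers return early while the trap is set; struct trap_dev and the toy_* functions are hypothetical names and this is not the netdevice.h code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the global trap state that netpoll_set_trap() toggles (assumption). */
static bool trapped;

/* Hypothetical device: the shared flag versus the driver's own belief. */
struct trap_dev {
    bool xoff;                   /* queue-stopped flag other code looks at */
    bool driver_thinks_stopped;  /* what the driver believes it did */
};

static void toy_set_trap(bool on)
{
    trapped = on;
}

/* Stop/wake helpers that are squashed while the trap is set. */
static void toy_stop_queue(struct trap_dev *dev)
{
    assert(dev != NULL);
    if (trapped)
        return;  /* flag is left unchanged */
    dev->xoff = true;
}

static void toy_wake_queue(struct trap_dev *dev)
{
    assert(dev != NULL);
    if (trapped)
        return;  /* a queue stopped earlier can't be restarted */
    dev->xoff = false;
}

/* Driver path: the TX ring filled up, so the driver stops the queue. */
static void toy_driver_ring_full(struct trap_dev *dev)
{
    dev->driver_thinks_stopped = true;
    toy_stop_queue(dev);
}
```

In this model, calling toy_driver_ring_full() with the trap set leaves driver_thinks_stopped true and xoff false, which is the disagreement described above. The opposite case also shows up: a queue stopped before the trap was set stays stopped, because toy_wake_queue() does nothing until the trap is cleared.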

AFAIK, NETPOLL_RX is not used at all, and NETPOLL_TRAP is only used in netdevice.h to turn off the transmit flow control/queue management function. Netpoll already bypasses the actual queue, but it does try to honor the queue state. However, KGDBOE breaks the queue state management by selecting NETPOLL_TRAP.

This is not exactly out-of-tree code, because netpoll is the entity that calls the driver, leading to errors and worse from the drivers. And KGDB is from the community tree. We're just trying to make it work, and the patches will be returned when we figure this out. We're also trying to get this to work with the RT stuff, which creates another whole set of problems due to major semantic changes. However, it looks like the latest netpoll code should be okay wrt RT.

And I remain of the opinion that a device driver ought not panic or corrupt data, or anything else obnoxious given a hard_start call at the wrong time, but that's another battle for another day.

2. Here is how kgdb uses the polling mechanism for communication with gdb. kgdb calls netpoll_set_trap(1) just before entering a loop where it communicates with gdb. It calls netpoll_set_trap(0) after it is done and wants to resume the kernel. The communication with gdb goes through netpoll_poll (which calls the kgdb rx_hook) and the netpoll_send_udp functions.
3. A queue for an interface may have been stopped by its driver by calling netif_stop_queue. After this, if kgdb attempts to enter communication with gdb, it'll call netpoll_set_trap(1), after which the queue can't be started again. This is a potential deadlock situation. Is there a way out of this?
We are trying without setting the CONFIG_NETPOLL_TRAP option. This option is what turns off the function of the netif_stop/wake_queue calls, which breaks the usual flow control mechanism used by the netpoll transmit function. It also prevents the netif_schedule call, which puts the device on the tx softirq queue. However, in the case where interrupts are off and scheduling is not allowed, which would be the netpoll_set_trap(1) condition, the softirq will not run until netpoll is done and the user of netpoll returns the system to normal operation. So it is not clear to me that allowing the schedule is a problem. There may be some obscure race conditions on SMP, so we are trying to analyze that part, but for the moment we are testing with the netif_schedule call allowed in the event of queuing the device.
4. Is it necessary to call netpoll_set_trap(1) at all before entering the gdb communication loop? Even if a driver stops the queue in the middle of the communication, netpoll_poll and netpoll_send_udp calls can recover from that by calling the driver's interrupt and poll routines. Is this a valid statement?
netpoll_set_trap() is necessary, as it informs the netpoll code to respond to ARP requests on behalf of the netpoll user, as well as making sure that skbs are freed without needing the completion queue stuff to run (I think).
Thanks a lot.
-Amit


