On 5/22/2017 8:26 PM, Shyam Sundar S K wrote: > > On 5/22/2017 6:49 PM, Mathias Nyman wrote: >> On 22.05.2017 11:56, Shyam Sundar S K wrote: >>> Hi Mathias, >>> >>> >>> On 5/19/2017 12:43 PM, Mathias Nyman wrote: >>>> On 18.05.2017 16:46, Alan Stern wrote: >>>>> On Thu, 18 May 2017, Shyam Sundar S K wrote: >>>>> >>>>>> on AMD platforms with SNPS 3.1 USB controller has an issue >>>>>> if the stop EP command is issued when the controller is not >>>>>> in running state. If issued, it is leading to a critical RTL bug >>>>>> because of which controller goes into irrecoverable state. >>>>>> >>>>>> This patch adds a appropriate checks to make sure that scenario >>>>>> does not happen. >>>>>> >>>>>> Signed-off-by: Shyam Sundar S K <[email protected]> >>>>>> Signed-off-by: Nehal Shah <[email protected]> >>>>>> --- >>>>>> --- a/drivers/usb/host/xhci.h >>>>>> +++ b/drivers/usb/host/xhci.h >>>>>> @@ -1819,6 +1819,7 @@ struct xhci_hcd { >>>>>> /* For controller with a broken Port Disable implementation */ >>>>>> #define XHCI_BROKEN_PORT_PED (1 << 25) >>>>>> #define XHCI_LIMIT_ENDPOINT_INTERVAL_7 (1 << 26) >>>>>> +#define XHCI_BROKEN_STOP (1 << 27) >>>>> Does there really need to be a quirk flag for this? I should think >>>>> that you never want to issue a STOP EP command while the controller is >>>>> not running, no matter what kind of controller it is. >>>>> >>>>> Alan Stern >>>>> >>>> If it's about controller not running then there shouldn't be any problems. >>>> We shouldn't issue a stop endpoint command if controller is halted. >>>> >>>> If this is about issuing a stop endpoint command while endpoint isn't >>>> running, then fully working controllers should just respond with a command >>>> completion with "context state error" status. >>> As per SNPS the controller is responding with "Context State Error", >>> however the same is not getting >>> reflected when we check the cmd->status in the xhci driver. >>> >>>> Anyway, as Alan said the quirk is probably unnecessary here. >>> OK. We will take care of this. >>> >>>> We shouldn't need to >>>> stop endpoints that are not running. Only problem I see here is that the >>>> endpoint state in the output endpoint context might not be up to date. If >>>> driver >>>> just restarted the endpoint by ringing the doorbell, the output context >>>> state >>>> might not be updated yet. >>> Before issuing the stop end point command, we checked the state of the >>> endpoint and it looks the state of >>> the end point is EP_STATE_STOPPED. If the output endpoint context is not >>> updated is there a better way >>> to retrieve the EP state before issuing the stop end point command ? >> Not really, checking endpoint context and possible a software variable kept >> up to date >> by driver to keep track of doorbell. Perhaps checking endpoint ctx is enough >> for now > So, is it OK to guard the stop endpoint by checking the EP context before > queuing it? > >>>> How does this SNPS 3.1 controller react if the endpoint just halted or >>>> moved to >>>> error state just before controller runs the stop endpoint command? Still >>>> triggers >>>> the RTL bug? >>> As per SNPS analysis. >>> >>> 1) Driver issues STOP ENDPOINT command and the EP is in Running state. >>> 2) HW executes the STOP ENDPOINT command successfully >>> 3) Driver again issues STOP ENDPOINT command. >>> 4) Since the EP is already halted/stopped, HW completes the command >>> execution and reports “device context error” completion code. This is as >>> per the spec. >>> 5) However HW on receiving the second command additionally marks EP to Flow >>> control state in HW which is RTL bug >>> 6) The above bug causes the HW not to respond to any further doorbells that >>> are rung by the driver. This causes the EP to not functional anymore and >>> causes gross functional failures. >>> >> What happens if endpoint ctx shows endpoint is in the halted or error state >> when stop endpoint command is issued? >> still RTL bug? > Yes. That's right. If EP context shows as halted/stopped/error and we issue a > stop endpoint command it is triggering the RTL bug. Since the tapeout has > already happened and there is no way to fix this > from SNPS side, they are asking for a SW workaround i.e. "issuing the stop > endpoint command only when the EP context state is running." > > So, it is OK to have this patch submission which will check for EP context > before queuing the stop endpoint command ? > >>>> I'm talking about the in xhci spec 4.6.9: >>>> >>>> " A Busy endpoint may asynchronously transition from the Running to the >>>> Halted or Error state due >>>> to error conditions detected while processing TRBs. A possible race >>>> condition may occur if >>>> software, thinking an endpoint is in the Running state, issues a Stop >>>> Endpoint Command however >>>> at the same time the xHC asynchronously transitions the endpoint to the >>>> Halted or Error state. In >>>> this case, a Context State Error may be generated for the command >>>> completion. Software may >>>> verify that this case occurred by inspecting the EP State for Halted or >>>> Error when a Stop Endpoint >>>> Command results in a Context State Error." >>>> >>> Since we are novice, can you please help to understand what is the >>> intuition behind sending two stop endpoint commands ? >> No need for two stop endpoint commands, that can be fixed in the driver. > How ? can you please help to understand. > >> -Mathias Hi Mathias,
Any feedback on this patch ? Thanks, Shyam -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
