On 5/22/2017 8:26 PM, Shyam Sundar S K wrote:
>
> On 5/22/2017 6:49 PM, Mathias Nyman wrote:
>> On 22.05.2017 11:56, Shyam Sundar S K wrote:
>>> Hi Mathias,
>>>
>>>
>>> On 5/19/2017 12:43 PM, Mathias Nyman wrote:
>>>> On 18.05.2017 16:46, Alan Stern wrote:
>>>>> On Thu, 18 May 2017, Shyam Sundar S K wrote:
>>>>>
>>>>>> on AMD platforms with SNPS 3.1 USB controller has an issue
>>>>>> if the stop EP command is issued when the controller is not
>>>>>> in running state. If issued, it is leading to a critical RTL bug
>>>>>> because of which controller goes into irrecoverable state.
>>>>>>
>>>>>> This patch adds a appropriate checks to make sure that scenario
>>>>>> does not happen.
>>>>>>
>>>>>> Signed-off-by: Shyam Sundar S K <[email protected]>
>>>>>> Signed-off-by: Nehal Shah <[email protected]>
>>>>>> ---
>>>>>> --- a/drivers/usb/host/xhci.h
>>>>>> +++ b/drivers/usb/host/xhci.h
>>>>>> @@ -1819,6 +1819,7 @@ struct xhci_hcd {
>>>>>>    /* For controller with a broken Port Disable implementation */
>>>>>>    #define XHCI_BROKEN_PORT_PED    (1 << 25)
>>>>>>    #define XHCI_LIMIT_ENDPOINT_INTERVAL_7    (1 << 26)
>>>>>> +#define XHCI_BROKEN_STOP    (1 << 27)
>>>>> Does there really need to be a quirk flag for this?  I should think
>>>>> that you never want to issue a STOP EP command while the controller is
>>>>> not running, no matter what kind of controller it is.
>>>>>
>>>>> Alan Stern
>>>>>
>>>> If it's about controller not running then there shouldn't be any problems.
>>>> We shouldn't issue a stop endpoint command if controller is halted.
>>>>
>>>> If this is about issuing a stop endpoint command while endpoint isn't
>>>> running, then fully working controllers should just respond with a command
>>>> completion with "context state error" status.
>>> As per SNPS the controller is responding with "Context State Error", 
>>> however the same is not getting
>>> reflected when we check the cmd->status in the xhci driver.
>>>
>>>> Anyway, as Alan said the quirk is probably unnecessary here.
>>> OK. We will take care of this.
>>>
>>>> We shouldn't need to
>>>> stop endpoints that are not running. Only problem I see here is that the
>>>> endpoint state in the output endpoint context might not be up to date. If 
>>>> driver
>>>> just restarted the endpoint by ringing the doorbell, the output context 
>>>> state
>>>> might not be updated yet.
>>> Before issuing the stop end point command, we checked the state of the 
>>> endpoint and it looks the state of
>>> the end point is EP_STATE_STOPPED. If the output endpoint context is not 
>>> updated is there a better way
>>> to retrieve the EP state before issuing the stop end point command ?
>> Not really, checking endpoint context and possible a software variable kept 
>> up to date
>> by driver to keep track of doorbell. Perhaps checking endpoint ctx is enough 
>> for now
> So, is it OK to guard the stop endpoint by checking the EP context before 
> queuing it?
>
>>>> How does this SNPS 3.1 controller react if the endpoint just halted or 
>>>> moved to
>>>> error state just before controller runs the stop endpoint command? Still 
>>>> triggers
>>>> the RTL bug?
>>> As per SNPS analysis.
>>>
>>> 1) Driver issues STOP ENDPOINT command  and the EP is in Running state.
>>> 2) HW executes the STOP ENDPOINT command successfully
>>> 3) Driver again issues STOP ENDPOINT command.
>>> 4) Since the EP is already halted/stopped, HW completes the command 
>>> execution and reports “device context error” completion code. This is as 
>>> per the spec.
>>> 5) However HW on receiving the second command additionally marks EP to Flow 
>>> control state in HW which is RTL bug
>>> 6) The above bug causes the HW not to respond to any further doorbells that 
>>> are rung by the driver. This causes the EP to not functional anymore and 
>>> causes gross functional failures.
>>>
>> What happens if endpoint ctx shows endpoint is in the halted or error state 
>> when stop endpoint command is issued?
>>  still RTL bug?
> Yes. That's right. If EP context shows as halted/stopped/error and we issue a 
> stop endpoint command it is triggering the RTL bug. Since the tapeout has 
> already happened and there is no way to fix this
> from SNPS side, they are asking for a SW workaround i.e.  "issuing the stop 
> endpoint command only when the EP context state is running."
>
> So, it is OK to have this patch submission which will check for EP context 
> before queuing the stop endpoint command ?
>
>>>> I'm talking about the in xhci spec 4.6.9:
>>>>
>>>> " A Busy endpoint may asynchronously transition from the Running to the 
>>>> Halted or Error state due
>>>> to error conditions detected while processing TRBs. A possible race 
>>>> condition may occur if
>>>> software, thinking an endpoint is in the Running state, issues a Stop 
>>>> Endpoint Command however
>>>> at the same time the xHC asynchronously transitions the endpoint to the 
>>>> Halted or Error state. In
>>>> this case, a Context State Error may be generated for the command 
>>>> completion. Software may
>>>> verify that this case occurred by inspecting the EP State for Halted or 
>>>> Error when a Stop Endpoint
>>>> Command results in a Context State Error."
>>>>
>>> Since we are novice, can you please help to understand what is the 
>>> intuition behind sending two stop endpoint commands ?
>> No need for two stop endpoint commands, that can be fixed in the driver.
> How ? can you please help to understand.
>
>> -Mathias
Hi Mathias,

Any feedback on this patch ?

Thanks,
Shyam
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to