Hello!

On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:

> According to the report of [1], this driver is possible to cause
> the following error in ravb_tx_timeout_work().
> 
> ravb e6800000.ethernet ethernet: failed to switch device to config mode

   Hmm, maybe we need a larger timeout there? The current one amounts to only
~100 ms for all cases (maybe we should parametrize the timeout?)...
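
   To make the "parametrize the timeout" idea concrete, here is a minimal
user-space sketch. It is not the driver's actual wait helper: struct fake_reg,
fake_read() and wait_reg_timeout() are hypothetical names, the register is
simulated, and the delay between polls is left out so the code runs outside
the kernel. The only point is that the poll budget becomes a caller-supplied
argument instead of a fixed constant.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register; a real driver reads MMIO. */
struct fake_reg {
	uint32_t value;        /* value reported once the hardware settles */
	unsigned int ready_at; /* number of reads before value becomes visible */
	unsigned int polls;    /* reads performed so far */
};

/* Simulated register read: returns 0 until ready_at reads have happened. */
static uint32_t fake_read(struct fake_reg *reg)
{
	reg->polls++;
	return reg->polls >= reg->ready_at ? reg->value : 0;
}

/*
 * Poll until (reg & mask) == value, giving up after max_polls reads.
 * A kernel version would delay between reads; that is omitted here
 * (assumption) so the sketch stays runnable in user space.
 */
static int wait_reg_timeout(struct fake_reg *reg, uint32_t mask,
			    uint32_t value, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if ((fake_read(reg) & mask) == value)
			return 0;
	}
	return -ETIMEDOUT;
}
```

   With something like this, a caller that has to wait for a mode switch could
pass a larger budget than a caller polling a fast status bit.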
  
> This error means that the hardware could not change the state
> from "Operation" to "Configuration" while some tx and/or rx queue
> are operating. After that, ravb_config() in ravb_dmac_init() will fail,

   Are we seeing double messages from ravb_config()? I think we aren't...

> and then any descriptors will be not allocated anymore so that NULL
> pointer dereference happens after that on ravb_start_xmit().
> 
> To fix the issue, the ravb_tx_timeout_work() should check
> the return value of ravb_stop_dma() whether this hardware can be
> re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> re-enables TX and RX and just exits.
> 
> [1]
> https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.be...@de.bosch.com/
> 
> Reported-by: Dirk Behme <dirk.be...@de.bosch.com>
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda...@renesas.com>

   Assuming the comment below is fixed:

Reviewed-by: Sergei Shtylyov <sergei.shtyl...@gmail.com>

> ---
>  Changes from RFC v1:
>  - Check the return value of ravb_stop_dma() and exit if the hardware
>    condition can not be initialized in the tx timeout.
>  - Update the commit subject and description.
>  - Fix some typo.
>  https://patchwork.kernel.org/patch/11570217/
> 
>  Unfortunately, I still didn't reproduce the issue yet. So, I still
>  marked RFC on this patch.

    I think the Bosch people should test this patch, as they reported the 
kernel oops...

> 
>  drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index a442bcf6..500f5c1 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>               ravb_ptp_stop(ndev);
>  
>       /* Wait for DMA stopping */
> -     ravb_stop_dma(ndev);
> +     if (ravb_stop_dma(ndev)) {
> +             /* If ravb_stop_dma() fails, the hardware is still in-progress
> +              * as "Operation" mode for TX and/or RX. So, this should not

   s/in-progress as "Operation" mode/operating/.

> +              * call the following functions because ravb_dmac_init() is
> +              * possible to fail too. Also, this should not retry
> +              * ravb_stop_dma() again and again here because it's possible
> +              * to wait forever. So, this just re-enables the TX and RX and
> +              * skip the following re-initialization procedure.
> +              */
> +             ravb_rcv_snd_enable(ndev);
> +             goto out;
> +     }
>  
>       ravb_ring_free(ndev, RAVB_BE);
>       ravb_ring_free(ndev, RAVB_NC);
> @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct 
> *work)
>       ravb_dmac_init(ndev);

   BTW, that one also may fail...
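
   As a rough illustration of what checking both results could look like,
here is a user-space model of the control flow. Everything in it is
hypothetical (struct fake_dev and the fake_*() helpers are not driver code),
and it only propagates the second error to the caller. How the driver should
actually recover when the re-initialization fails is not settled in this
thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device state; not the driver's private data. */
struct fake_dev {
	bool stop_dma_fails;  /* inject a failure when stopping DMA */
	bool dmac_init_fails; /* inject a failure when re-initializing DMAC */
	bool rings_valid;     /* descriptor rings are allocated */
	bool rxtx_enabled;    /* RX and TX are enabled */
	bool emac_inited;     /* E-MAC has been re-initialized */
};

/* Models stopping DMA; assumed to disable RX/TX on the way. */
static int fake_stop_dma(struct fake_dev *d)
{
	d->rxtx_enabled = false;
	return d->stop_dma_fails ? -ETIMEDOUT : 0;
}

/* Models DMAC init; assumed to allocate the rings only on success. */
static int fake_dmac_init(struct fake_dev *d)
{
	if (d->dmac_init_fails)
		return -ETIMEDOUT;
	d->rings_valid = true;
	return 0;
}

/* Models the timeout handler with both return values checked. */
static int fake_timeout_work(struct fake_dev *d)
{
	int error;

	error = fake_stop_dma(d);
	if (error) {
		/* Hardware still operating: keep the rings, re-enable RX/TX. */
		d->rxtx_enabled = true;
		return error;
	}

	d->rings_valid = false; /* models freeing the rings */

	error = fake_dmac_init(d);
	if (error)
		return error; /* rings are gone; the caller must not transmit */

	d->emac_inited = true;  /* models E-MAC init, assumed to enable RX/TX */
	d->rxtx_enabled = true;
	return 0;
}
```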

>       ravb_emac_init(ndev);
>  
> +out:
>       /* Initialise PTP Clock driver */
>       if (priv->chip_id == RCAR_GEN2)
>               ravb_ptp_init(ndev, priv->pdev);
> 
