On Tue, Feb 12, 2019 at 01:59:19PM +0000, Russell King - ARM Linux admin wrote:
> Hi,
> 
> Booting 4.20 on SolidRun Clearfog reliably provokes the following
> warning - this is with mvneta built in, but DSA as modules:
> 
> WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
> mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it 
> has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
> Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink 
> armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy 
> mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
> CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
> [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
> [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
> [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
> [<c00b0370>] (check_sync) from [<c00b04f8>] 
> (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
> [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] 
> (mvneta_poll+0x298/0xf58)
> [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
> [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
> [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
> [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
> [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] 
> (gic_handle_irq+0x48/0x98)
> [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
> ...
> 
> This appears to be from:
> 
>                 if (rx_bytes <= rx_copybreak) {
>                         /* better copy a small frame and not unmap the DMA 
> region */
>                         skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
>                         if (unlikely(!skb))
>                                 goto err_drop_frame_ret_pool;
> 
>                         dma_sync_single_range_for_cpu(dev->dev.parent,
>                                                       rx_desc->buf_phys_addr,
>                                                       MVNETA_MH_SIZE + 
> NET_SKB_PAD,
>                                                       rx_bytes,
>                                                       DMA_FROM_DEVICE);
> 
> which suggests that rx_desc->buf_phys_addr is not something that should
> be passed to dma_sync_single_range_for_cpu().  I've not been able to
> track down why that is, nor which interface is provoking that.
> 
> As I don't have the details of how the buffer management hardware works
> on Armada 388, I'm unable to debug this myself.

Doing what debugging I _can_ do, it seems that this has been a long-term
error in mvneta, but one that was merely uncovered by:

  commit 562e2f467e71f45f0400ebee5077eaa426d3e426
  Author: Yelena Krivosheev <yel...@marvell.com>
  Date:   Wed Jul 18 18:10:57 2018 +0200

The buffer that is being complained about is sync'd using a device of
dev->dev.parent 'f1070000.ethernet', but is allocated by
mvneta_bm_construct() against a different device:

mvneta_bm_construct: 0x2dd85c00 +0x140 for ee113294 (f10c8000.bm)

namely 'f10c8000.bm'.

It's long-term, because it will only trigger in older kernels if we
hit the copy-break stuff, which used to do:

        if (rx_bytes <= rx_copybreak) {
                skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
                if (unlikely(!skb)) {
                        ...
                }
                
                dma_sync_single_range_for_cpu(dev->dev.parent,
                                              phys_addr,
                                              MVNETA_MH_SIZE + NET_SKB_PAD,
                                              rx_bytes,
                                              DMA_FROM_DEVICE);

where rx_copybreak is 256 bytes.  Quite why that hasn't been seen
already, I do not know.

Looking at the code after the commit, if mvneta is used on a
non-coherent platform, then we have problems:

        copy_size = min(skb_size, rx_bytes);
        ...
        memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
               copy_size);
        ...
        if (rxq->left_size == 0) {
                int size = copy_size + MVNETA_MH_SIZE;
                dma_sync_single_range_for_cpu(dev->dev.parent,
                                              phys_addr, 0,
                                              size,
                                              DMA_FROM_DEVICE);

Since the sync is done _after_ we've copied data from a non-coherent
buffer.

If this code has been written to assume that we're always coherent,
then is there any point at all to having the incorrect dma_sync_*()
calls at all?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

Reply via email to