date:20210304

Re: [PATCH V3 5/5] arm64: dts: renesas: beacon kits: Setup AVB refclk

2021-03-04 Thread Geert Uytterhoeven

On Wed, Feb 24, 2021 at 12:52 PM Adam Ford  wrote:
> The AVB refererence clock assumes an external clock that runs

reference

> automatically.  Because the Versaclock is wired to provide the
> AVB refclock, the device tree needs to reference it in order for the
> driver to start the clock.
>
> Signed-off-by: Adam Ford 

Reviewed-by: Geert Uytterhoeven 
i.e. will queue in renesas-devel (with the typo fixed) once the DT
bindings have been accepted.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH V3 4/5] net: ethernet: ravb: Enable optional refclk

2021-03-04 Thread Geert Uytterhoeven

Hi Adam,

On Wed, Feb 24, 2021 at 12:52 PM Adam Ford  wrote:
> For devices that use a programmable clock for the AVB reference clock,
> the driver may need to enable them.  Add code to find the optional clock
> and enable it when available.
>
> Signed-off-by: Adam Ford 

Thanks for your patch!

> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -2148,6 +2148,13 @@ static int ravb_probe(struct platform_device *pdev)
> goto out_release;
> }
>
> +   priv->refclk = devm_clk_get_optional(&pdev->dev, "refclk");
> +   if (IS_ERR(priv->refclk)) {
> +   error = PTR_ERR(priv->refclk);
> +   goto out_release;
> +   }
> +   clk_prepare_enable(priv->refclk);
> +

Shouldn't the reference clock be disabled in case of any failure below?

> ndev->max_mtu = 2048 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
> ndev->min_mtu = ETH_MIN_MTU;
>
> @@ -2260,6 +2267,9 @@ static int ravb_remove(struct platform_device *pdev)
> if (priv->chip_id != RCAR_GEN2)
> ravb_ptp_stop(ndev);
>
> +   if (priv->refclk)
> +   clk_disable_unprepare(priv->refclk);
> +
> dma_free_coherent(ndev->dev.parent, priv->desc_bat_size, 
> priv->desc_bat,
>   priv->desc_bat_dma);
> /* Set reset mode */

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: Re: [RFC v4 07/11] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-03-04 Thread Yongji Xie

On Thu, Mar 4, 2021 at 2:27 PM Jason Wang  wrote:
>
>
> On 2021/2/23 7:50 下午, Xie Yongji wrote:
> > This VDUSE driver enables implementing vDPA devices in userspace.
> > Both control path and data path of vDPA devices will be able to
> > be handled in userspace.
> >
> > In the control path, the VDUSE driver will make use of message
> > mechnism to forward the config operation from vdpa bus driver
> > to userspace. Userspace can use read()/write() to receive/reply
> > those control messages.
> >
> > In the data path, VDUSE_IOTLB_GET_FD ioctl will be used to get
> > the file descriptors referring to vDPA device's iova regions. Then
> > userspace can use mmap() to access those iova regions. Besides,
> > userspace can use ioctl() to inject interrupt and use the eventfd
> > mechanism to receive virtqueue kicks.
> >
> > Signed-off-by: Xie Yongji 
> > ---
> >   Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
> >   drivers/vdpa/Kconfig   |   10 +
> >   drivers/vdpa/Makefile  |1 +
> >   drivers/vdpa/vdpa_user/Makefile|5 +
> >   drivers/vdpa/vdpa_user/vduse_dev.c | 1348 
> > 
> >   include/uapi/linux/vduse.h |  136 ++
> >   6 files changed, 1501 insertions(+)
> >   create mode 100644 drivers/vdpa/vdpa_user/Makefile
> >   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
> >   create mode 100644 include/uapi/linux/vduse.h
> >
> > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> > b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > index a4c75a28c839..71722e6f8f23 100644
> > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > @@ -300,6 +300,7 @@ Code  Seq#Include File  
> >  Comments
> >   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
> > conflict!
> >   '|'   00-7F  linux/media.h
> >   0x80  00-1F  linux/fb.h
> > +0x81  00-1F  linux/vduse.h
> >   0x89  00-06  arch/x86/include/asm/sockios.h
> >   0x89  0B-DF  linux/sockios.h
> >   0x89  E0-EF  linux/sockios.h 
> > SIOCPROTOPRIVATE range
> > diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> > index ffd1e098bfd2..92f07715e3b6 100644
> > --- a/drivers/vdpa/Kconfig
> > +++ b/drivers/vdpa/Kconfig
> > @@ -25,6 +25,16 @@ config VDPA_SIM_NET
> >   help
> > vDPA networking device simulator which loops TX traffic back to RX.
> >
> > +config VDPA_USER
> > + tristate "VDUSE (vDPA Device in Userspace) support"
> > + depends on EVENTFD && MMU && HAS_DMA
> > + select DMA_OPS
> > + select VHOST_IOTLB
> > + select IOMMU_IOVA
> > + help
> > +   With VDUSE it is possible to emulate a vDPA Device
> > +   in a userspace program.
> > +
> >   config IFCVF
> >   tristate "Intel IFC VF vDPA driver"
> >   depends on PCI_MSI
> > diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> > index d160e9b63a66..66e97778ad03 100644
> > --- a/drivers/vdpa/Makefile
> > +++ b/drivers/vdpa/Makefile
> > @@ -1,5 +1,6 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   obj-$(CONFIG_VDPA) += vdpa.o
> >   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> > +obj-$(CONFIG_VDPA_USER) += vdpa_user/
> >   obj-$(CONFIG_IFCVF)+= ifcvf/
> >   obj-$(CONFIG_MLX5_VDPA) += mlx5/
> > diff --git a/drivers/vdpa/vdpa_user/Makefile 
> > b/drivers/vdpa/vdpa_user/Makefile
> > new file mode 100644
> > index ..260e0b26af99
> > --- /dev/null
> > +++ b/drivers/vdpa/vdpa_user/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +vduse-y := vduse_dev.o iova_domain.o
> > +
> > +obj-$(CONFIG_VDPA_USER) += vduse.o
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > new file mode 100644
> > index ..393bf99c48be
> > --- /dev/null
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -0,0 +1,1348 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * VDUSE: vDPA Device in Userspace
> > + *
> > + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights 
> > reserved.
> > + *
> > + * Author: Xie Yongji 
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "iova_domain.h"
> > +
> > +#define DRV_VERSION  "1.0"
> > +#define DRV_AUTHOR   "Yongji Xie "
> > +#define DRV_DESC "vDPA Device in Userspace"
> > +#define DRV_LICENSE  "GPL v2"
> > +
> > +#define VDUSE_DEV_MAX (1U << MINORBITS)
> > +
> > +struct vduse_virtqueue {
> > + u16 index;
> > + bool ready;
> > + spinlock_t kick_lock;
> > + spinlock_t irq_lock;
> > + struct eventfd_ctx *kickfd;
> > + struct v

Re: Re: [RFC v4 11/11] vduse: Support binding irq to the specified cpu

2021-03-04 Thread Yongji Xie

On Thu, Mar 4, 2021 at 3:30 PM Jason Wang  wrote:
>
>
> On 2021/2/23 7:50 下午, Xie Yongji wrote:
> > Add a parameter for the ioctl VDUSE_INJECT_VQ_IRQ to support
> > injecting virtqueue's interrupt to the specified cpu.
>
>
> How userspace know which CPU is this irq for? It looks to me we need to
> do it at different level.
>
> E.g introduce some API in sys to allow admin to tune for that.
>
> But I think we can do that in antoher patch on top of this series.
>

OK. I will think more about it.

Thanks,
Yongji

Re: [PATCH net] docs: networking: drop special stable handling

2021-03-04 Thread Leon Romanovsky

On Wed, Mar 03, 2021 at 05:00:07PM +, patchwork-bot+netdev...@kernel.org 
wrote:
> Hello:
>
> This patch was applied to netdev/net.git (refs/heads/master):
>
> On Tue,  2 Mar 2021 18:46:43 -0800 you wrote:
> > Leave it to Greg.
> >
> > Signed-off-by: Jakub Kicinski 
> > ---
> >  Documentation/networking/netdev-FAQ.rst   | 72 ++-
> >  Documentation/process/stable-kernel-rules.rst |  6 --
> >  Documentation/process/submitting-patches.rst  |  5 --
> >  3 files changed, 6 insertions(+), 77 deletions(-)
>
> Here is the summary with links:
>   - [net] docs: networking: drop special stable handling
> https://git.kernel.org/netdev/net/c/dbbe7c962c3a

Excellent news, thanks

Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-04 Thread Jason Wang




On 2021/3/3 4:29 下午, Cornelia Huck wrote:

On Wed, 3 Mar 2021 12:01:01 +0800
Jason Wang  wrote:


On 2021/3/2 8:08 下午, Cornelia Huck wrote:

On Mon, 1 Mar 2021 11:51:08 +0800
Jason Wang  wrote:
  

On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:

On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:

On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:

Confused. What is wrong with the above? It never reads the
field unless the feature has been offered by device.

So the spec said:

"

The following driver-read-only field, max_virtqueue_pairs only exists if
VIRTIO_NET_F_MQ is set.

"

If I read this correctly, there will be no max_virtqueue_pairs field if the
VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() violates
what spec said.

Thanks

I think that's a misunderstanding. This text was never intended to
imply that field offsets change beased on feature bits.
We had this pain with legacy and we never wanted to go back there.

This merely implies that without VIRTIO_NET_F_MQ the field
should not be accessed. Exists in the sense "is accessible to driver".

Let's just clarify that in the spec, job done.

Ok, agree. That will make things more eaiser.

Yes, that makes much more sense.

What about adding the following to the "Basic Facilities of a Virtio
Device/Device Configuration Space" section of the spec:

"If an optional configuration field does not exist, the corresponding
space is still present, but reserved."


This became interesting after re-reading some of the qemu codes.

E.g in virtio-net.c we had:

*static VirtIOFeature feature_sizes[] = {
      {.flags = 1ULL << VIRTIO_NET_F_MAC,
   .end = endof(struct virtio_net_config, mac)},
      {.flags = 1ULL << VIRTIO_NET_F_STATUS,
   .end = endof(struct virtio_net_config, status)},
      {.flags = 1ULL << VIRTIO_NET_F_MQ,
   .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
      {.flags = 1ULL << VIRTIO_NET_F_MTU,
   .end = endof(struct virtio_net_config, mtu)},
      {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
   .end = endof(struct virtio_net_config, duplex)},
      {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL <<
VIRTIO_NET_F_HASH_REPORT),
   .end = endof(struct virtio_net_config, supported_hash_types)},
      {}
};*

*It has a implict dependency chain. E.g MTU doesn't presnet if
DUPLEX/RSS is not offered ...
*

But I think it covers everything up to the relevant field, no? So MTU
is included if we have the feature bit, even if we don't have
DUPLEX/RSS.

Given that a config space may be shorter (but must not collapse
non-existing fields), maybe a better wording would be:

"If an optional configuration field does not exist, the corresponding
space will still be present if it is not at the end of the
configuration space (i.e., further configuration fields exist.)



This should work but I think we need to define the end of configuration 
space first?




This
implies that a given field, if it exists, is always at the same offset
from the beginning of the configuration space."



(Do we need to specify what a device needs to do if the driver tries to
access a non-existing field? We cannot really make assumptions about
config space accesses; virtio-ccw can just copy a chunk of config space
that contains non-existing fields...


Right, it looks to me ccw doesn't depends on config len which is kind of
interesting. I think the answer depends on the implementation of both
transport and device:

(virtio-ccw is a bit odd, because channel I/O does not have the concept
of a config space and I needed to come up with something)



Ok.





Let's take virtio-net-pci as an example, it fills status unconditionally
in virtio_net_set_config() even if VIRTIO_NET_F_STATUS is not
negotiated. So with the above feature_sizes:

1) if one of the MQ, MTU, DUPLEX or RSS is implemented, driver can still
read status in this case
2) if none of the above four is negotied, driver can only read a 0xFF
(virtio_config_readb())



I guess the device could ignore
writes and return zeroes on read?)


So consider the above, it might be too late to fix/clarify that in the
spec on the expected behaviour of reading/writing non-exist fields.

We could still advise behaviour via SHOULD, though I'm not sure it adds
much at this point in time.

It really feels a bit odd that a driver can still read and write fields
for features that it did not negotiate, but I fear we're stuck with it.



Yes, since the device (at least virtio-pci) implment thing like this.

Thanks





Thanks



I've opened https://github.com/oasis-tcs/virtio-spec/issues/98 for the
spec issues.
  


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org

Re: BUG: soft lockup in ieee80211_tasklet_handler

2021-03-04 Thread Johannes Berg

On Tue, 2021-03-02 at 20:01 +0100, Dmitry Vyukov wrote:
> 
> Looking at the reproducer that mostly contains just perf_event_open,
> It may be the old known issue of perf_event_open with some extreme
> parameters bringing down kernel.
> +perf maintainers
> And as far as I remember +Peter had some patch to restrict
> perf_event_open parameters.
> 
> r0 = perf_event_open(&(0x7f01d000)={0x1, 0x70, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x3ff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffe, 0x0, @perf_config_ext}, 0x0,
> 0x0, 0x, 0x0)

Oh! Thanks for looking.

Seems that also applies to

https://syzkaller.appspot.com/bug?extid=d6219cf21f26bdfcc22e

FWIW. I was still tracking that one, but never had a chance to look at
it (also way down the list since it was reported as directly in hwsim)

johannes

Re: [PATCH net 1/3] sh_eth: fix TRSCER mask for SH771x

2021-03-04 Thread Geert Uytterhoeven

On Sun, Feb 28, 2021 at 9:54 PM Sergey Shtylyov  wrote:
> According  to  the SH7710, SH7712, SH7713 Group User's Manual: Hardware,
> Rev. 3.00, the TRSCER register actually has only bit 7 valid (and named
> differently), with all the other bits reserved. Apparently, this was not
> the case with some early revisions of the manual as we have the other
> bits declared (and set) in the original driver.  Follow the suit and add
> the explicit sh_eth_cpu_data::trscer_err_mask initializer for SH771x...
>
> Fixes: 86a74ff21a7a ("net: sh_eth: add support for Renesas SuperH Ethernet")
> Signed-off-by: Sergey Shtylyov 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH net 2/3] sh_eth: fix TRSCER mask for R7S72100

2021-03-04 Thread Geert Uytterhoeven

On Sun, Feb 28, 2021 at 9:54 PM Sergey Shtylyov  wrote:
> According  to  the RZ/A1H Group, RZ/A1M Group User's Manual: Hardware,
> Rev. 4.00, the TRSCER register has bit 9 reserved, hence we can't use
> the driver's default TRSCER mask.  Add the explicit initializer for
> sh_eth_cpu_data::trscer_err_mask for R7S72100.
>
> Fixes: db893473d313 ("sh_eth: Add support for r7s72100")
> Signed-off-by: Sergey Shtylyov 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH net 3/3] sh_eth: fix TRSCER mask for R7S9210

2021-03-04 Thread Geert Uytterhoeven

On Sun, Feb 28, 2021 at 9:56 PM Sergey Shtylyov  wrote:
> According  to the RZ/A2M Group User's Manual: Hardware, Rev. 2.00,
> the TRSCER register has bit 9 reserved, hence we can't use the driver's
> default TRSCER mask.  Add the explicit initializer for sh_eth_cpu_data::
> trscer_err_mask for R7S9210.
>
> Fixes: 6e0bb04d0e4f ("sh_eth: Add R7S9210 support")
> Signed-off-by: Sergey Shtylyov 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH intel-net 1/3] i40e: move headroom initialization to i40e_configure_rx_ring

2021-03-04 Thread Jesper Dangaard Brouer

On Wed,  3 Mar 2021 16:39:26 +0100
Maciej Fijalkowski  wrote:

> i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
> relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.
> 
> Currently, the callsite of mentioned function is placed incorrectly
> within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
> set yet. This causes the XDP_REDIRECT to be partially broken due to
> inability to create xdp_frame in the headroom space, as the headroom is
> 0.
> 
> For the record, below is the call graph:
> 
> i40e_vsi_open
>  i40e_vsi_setup_rx_resources
>   i40e_setup_rx_descriptors
>i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below
> 
>  i40e_vsi_configure_rx
>   i40e_configure_rx_ring
>set_ring_build_skb_enabled(ring) <-- set build_skb flag
> 
> Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
> the flag setting.
> 
> Fixes: f7bb0d71d658 ("i40e: store the result of i40e_rx_offset() onto 
> i40e_ring")
> Reported-by: Jesper Dangaard Brouer 
> Co-developed-by: Jesper Dangaard Brouer 
> Signed-off-by: Jesper Dangaard Brouer 
> Signed-off-by: Maciej Fijalkowski 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 13 +
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 
>  2 files changed, 13 insertions(+), 12 deletions(-)

Acked-by: Jesper Dangaard Brouer 
Tested-by: Jesper Dangaard Brouer 

I'm currently looking at extending samples/bpf/ xdp_redirect_map to
detect the situation.  As with this bug the redirect tests/sample
programs will just report really high performance numbers (because
packets are dropped earlier due to err).   Knowing what performance
numbers to expect, I could see that they were out-of-spec, and
investigated the root-cause.

I assume Intel QA tested XDP-redirect and didn't find the bug due to
this.  Red Hat QA also use samples/bpf/xdp* and based on the reports I
get from them, I could not blame them if this bug would slip through,
as the tool reports "good" results.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: Re: [RFC v4 10/11] vduse: Introduce a workqueue for irq injection

2021-03-04 Thread Yongji Xie

On Thu, Mar 4, 2021 at 2:59 PM Jason Wang  wrote:
>
>
> On 2021/2/23 7:50 下午, Xie Yongji wrote:
> > This patch introduces a workqueue to support injecting
> > virtqueue's interrupt asynchronously. This is mainly
> > for performance considerations which makes sure the push()
> > and pop() for used vring can be asynchronous.
>
>
> Do you have pref numbers for this patch?
>

No, I can do some tests for it if needed.

Another problem is the VIRTIO_RING_F_EVENT_IDX feature will be useless
if we call irq callback in ioctl context. Something like:

virtqueue_push();
virtio_notify();
ioctl()
-
irq_cb()
virtqueue_get_buf()

The used vring is always empty each time we call virtqueue_push() in
userspace. Not sure if it is what we expected.

Thanks,
Yongji

[PATCH net 0/2] nexthop: Do not flush blackhole nexthops when loopback goes down

2021-03-04 Thread Ido Schimmel

From: Ido Schimmel 

Patch #1 prevents blackhole nexthops from being flushed when the
loopback device goes down given that as far as user space is concerned,
these nexthops do not have a nexthop device.

Patch #2 adds a test case.

There are no regressions in fib_nexthops.sh with this change:

 # ./fib_nexthops.sh
 ...
 Tests passed: 165
 Tests failed:   0

Ido Schimmel (2):
  nexthop: Do not flush blackhole nexthops when loopback goes down
  selftests: fib_nexthops: Test blackhole nexthops when loopback goes
down

 net/ipv4/nexthop.c  | 10 +++---
 tools/testing/selftests/net/fib_nexthops.sh |  8 
 2 files changed, 15 insertions(+), 3 deletions(-)

-- 
2.29.2

[PATCH net 2/2] selftests: fib_nexthops: Test blackhole nexthops when loopback goes down

2021-03-04 Thread Ido Schimmel

From: Ido Schimmel 

Test that blackhole nexthops are not flushed when the loopback device
goes down.

Output without previous patch:

 # ./fib_nexthops.sh -t basic

 Basic functional tests
 --
 TEST: List with nothing defined [ OK ]
 TEST: Nexthop get on non-existent id[ OK ]
 TEST: Nexthop with no device or gateway [ OK ]
 TEST: Nexthop with down device  [ OK ]
 TEST: Nexthop with device that is linkdown  [ OK ]
 TEST: Nexthop with device only  [ OK ]
 TEST: Nexthop with duplicate id [ OK ]
 TEST: Blackhole nexthop [ OK ]
 TEST: Blackhole nexthop with other attributes   [ OK ]
 TEST: Blackhole nexthop with loopback device down   [FAIL]
 TEST: Create group  [ OK ]
 TEST: Create group with blackhole nexthop   [FAIL]
 TEST: Create multipath group where 1 path is a blackhole[ OK ]
 TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
 TEST: Create group with non-existent nexthop[ OK ]
 TEST: Create group with same nexthop multiple times [ OK ]
 TEST: Replace nexthop with nexthop group[ OK ]
 TEST: Replace nexthop group with nexthop[ OK ]
 TEST: Nexthop group and device  [ OK ]
 TEST: Test proto flush  [ OK ]
 TEST: Nexthop group and blackhole   [ OK ]

 Tests passed:  19
 Tests failed:   2

Output with previous patch:

 # ./fib_nexthops.sh -t basic

 Basic functional tests
 --
 TEST: List with nothing defined [ OK ]
 TEST: Nexthop get on non-existent id[ OK ]
 TEST: Nexthop with no device or gateway [ OK ]
 TEST: Nexthop with down device  [ OK ]
 TEST: Nexthop with device that is linkdown  [ OK ]
 TEST: Nexthop with device only  [ OK ]
 TEST: Nexthop with duplicate id [ OK ]
 TEST: Blackhole nexthop [ OK ]
 TEST: Blackhole nexthop with other attributes   [ OK ]
 TEST: Blackhole nexthop with loopback device down   [ OK ]
 TEST: Create group  [ OK ]
 TEST: Create group with blackhole nexthop   [ OK ]
 TEST: Create multipath group where 1 path is a blackhole[ OK ]
 TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
 TEST: Create group with non-existent nexthop[ OK ]
 TEST: Create group with same nexthop multiple times [ OK ]
 TEST: Replace nexthop with nexthop group[ OK ]
 TEST: Replace nexthop group with nexthop[ OK ]
 TEST: Nexthop group and device  [ OK ]
 TEST: Test proto flush  [ OK ]
 TEST: Nexthop group and blackhole   [ OK ]

 Tests passed:  21
 Tests failed:   0

Signed-off-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 tools/testing/selftests/net/fib_nexthops.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/net/fib_nexthops.sh 
b/tools/testing/selftests/net/fib_nexthops.sh
index 4c7d33618437..d98fb85e201c 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -1524,6 +1524,14 @@ basic()
run_cmd "$IP nexthop replace id 2 blackhole dev veth1"
log_test $? 2 "Blackhole nexthop with other attributes"
 
+   # blackhole nexthop should not be affected by the state of the loopback
+   # device
+   run_cmd "$IP link set dev lo down"
+   check_nexthop "id 2" "id 2 blackhole"
+   log_test $? 0 "Blackhole nexthop with loopback device down"
+
+   run_cmd "$IP link set dev lo up"
+
#
# groups
#
-- 
2.29.2

[PATCH net 1/2] nexthop: Do not flush blackhole nexthops when loopback goes down

2021-03-04 Thread Ido Schimmel

From: Ido Schimmel 

As far as user space is concerned, blackhole nexthops do not have a
nexthop device and therefore should not be affected by the
administrative or carrier state of any netdev.

However, when the loopback netdev goes down all the blackhole nexthops
are flushed. This happens because internally the kernel associates
blackhole nexthops with the loopback netdev.

This behavior is both confusing to those not familiar with kernel
internals and also diverges from the legacy API where blackhole IPv4
routes are not flushed when the loopback netdev goes down:

 # ip route add blackhole 198.51.100.0/24
 # ip link set dev lo down
 # ip route show 198.51.100.0/24
 blackhole 198.51.100.0/24

Blackhole IPv6 routes are flushed, but at least user space knows that
they are associated with the loopback netdev:

 # ip -6 route show 2001:db8:1::/64
 blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium

Fix this by only flushing blackhole nexthops when the loopback netdev is
unregistered.

Fixes: ab84be7e54fc ("net: Initial nexthop code")
Signed-off-by: Ido Schimmel 
Reported-by: Donald Sharp 
Reviewed-by: David Ahern 
---
 net/ipv4/nexthop.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index f1c6cbdb9e43..743777bce179 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -1399,7 +1399,7 @@ static int insert_nexthop(struct net *net, struct nexthop 
*new_nh,
 
 /* rtnl */
 /* remove all nexthops tied to a device being deleted */
-static void nexthop_flush_dev(struct net_device *dev)
+static void nexthop_flush_dev(struct net_device *dev, unsigned long event)
 {
unsigned int hash = nh_dev_hashfn(dev->ifindex);
struct net *net = dev_net(dev);
@@ -1411,6 +1411,10 @@ static void nexthop_flush_dev(struct net_device *dev)
if (nhi->fib_nhc.nhc_dev != dev)
continue;
 
+   if (nhi->reject_nh &&
+   (event == NETDEV_DOWN || event == NETDEV_CHANGE))
+   continue;
+
remove_nexthop(net, nhi->nh_parent, NULL);
}
 }
@@ -2189,11 +2193,11 @@ static int nh_netdev_event(struct notifier_block *this,
switch (event) {
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
-   nexthop_flush_dev(dev);
+   nexthop_flush_dev(dev, event);
break;
case NETDEV_CHANGE:
if (!(dev_get_flags(dev) & (IFF_RUNNING | IFF_LOWER_UP)))
-   nexthop_flush_dev(dev);
+   nexthop_flush_dev(dev, event);
break;
case NETDEV_CHANGEMTU:
info_ext = ptr;
-- 
2.29.2

[PATCH net] ethtool: Add indicator field for link_mode validity to link_ksettings

2021-03-04 Thread Danielle Ratson

Some drivers clear the 'ethtool_link_ksettings' struct in their
get_link_ksettings() callback, before populating it with actual values.
Such drivers will set the new 'link_mode' field to zero, resulting in
user space receiving wrong link mode information given that zero is a
valid value for the field.

Fix this by introducing a new boolean field ('link_mode_valid'), which
indicates whether the 'link_mode' field is valid or not. Set it in mlxsw
which is currently the only driver supporting the new API.

Another problem is that some drivers (notably tun) can report random
values in the 'link_mode' field. This can result in a general protection
fault when the field is used as an index to the 'link_mode_params' array
[1].

This happens because such drivers implement their set_link_ksettings()
callback by simply overwriting their private copy of
'ethtool_link_ksettings' struct with the one they get from the stack,
which is not always properly initialized.

Fix this by making sure that the new 'link_mode_valid' field is always
initialized to 'false' before invoking the set_link_ksettings()
callback.

[1]
general protection fault, probably for non-canonical address 
0xdc00f14cc32c:  [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range 
[0x00078a661960-0x00078a661967]
CPU: 0 PID: 8452 Comm: syz-executor360 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:__ethtool_get_link_ksettings+0x1a3/0x3a0 net/ethtool/ioctl.c:446
Code: b7 3e fa 83 fd ff 0f 84 30 01 00 00 e8 16 b0 3e fa 48 8d 3c ed 60 d5 69 
8a 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 
e0 07 83 c0 03
+38 d0 7c 08 84 d2 0f 85 b9
RSP: 0018:c900019df7a0 EFLAGS: 00010202
RAX: dc00 RBX: 888026136008 RCX: 
RDX: f14cc32c RSI: 873439ca RDI: 00078a661960
RBP: 8880 R08:  R09: 88802613606f
R10: 873439bc R11:  R12: 
R13: 88802613606c R14: 888011d0c210 R15: 888011d0c210
FS:  00749300() GS:8880b9c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004b60f0 CR3: 185c2000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 linkinfo_prepare_data+0xfd/0x280 net/ethtool/linkinfo.c:37
 ethnl_default_notify+0x1dc/0x630 net/ethtool/netlink.c:586
 ethtool_notify+0xbd/0x1f0 net/ethtool/netlink.c:656
 ethtool_set_link_ksettings+0x277/0x330 net/ethtool/ioctl.c:620
 dev_ethtool+0x2b35/0x45d0 net/ethtool/ioctl.c:2842
 dev_ioctl+0x463/0xb70 net/core/dev_ioctl.c:440
 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060
 sock_ioctl+0x477/0x6a0 net/socket.c:1177
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c8907043c6ac9 ("ethtool: Get link mode in use instead of speed and 
duplex parameters")
Signed-off-by: Danielle Ratson 
Reported-by: Eric Dumazet 
Reviewed-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c | 9 -
 include/linux/ethtool.h| 1 +
 net/ethtool/ioctl.c| 6 --
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
index 0bd64169bf81..66eb1cd17bc4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
@@ -1232,14 +1232,14 @@ mlxsw_sp1_from_ptys_link_mode(struct mlxsw_sp 
*mlxsw_sp, bool carrier_ok,
 {
int i;
 
-   cmd->link_mode = -1;
-
if (!carrier_ok)
return;
 
for (i = 0; i < MLXSW_SP1_PORT_LINK_MODE_LEN; i++) {
-   if (ptys_eth_proto & mlxsw_sp1_port_link_mode[i].mask)
+   if (ptys_eth_proto & mlxsw_sp1_port_link_mode[i].mask) {
cmd->link_mode = 
mlxsw_sp1_port_link_mode[i].mask_ethtool;
+   cmd->link_mode_valid = true;
+   }
}
 }
 
@@ -1672,8 +1672,6 @@ mlxsw_sp2_from_ptys_link_mode(struct mlxsw_sp *mlxsw_sp, 
bool carrier_ok,
struct mlxsw_sp2_port_link_mode link;
int i;
 
-   cmd->link_mode = -1;
-
if (!carrier_ok)
return;
 
@@ -1681,6 +1679,7 @@ mlxsw_sp2_from_ptys_link_mode(struct mlxsw_sp *mlxsw_sp, 
bool carrier_ok,
if (ptys_eth_proto & mlxsw_sp2_port_link_mode[i].mask) {
link = mlxsw_sp2_port_link_mode[i];
cmd->link_mode = link.mask_ethtool[1];
+   cm

Re: [PATCH] net: mac802154: Fix null pointer dereference

2021-03-04 Thread Pavel Skripkin

Hi, thanks for your reply!

On Wed, 2021-03-03 at 21:40 -0500, Alexander Aring wrote:
> Hi,
> 
> On Wed, 3 Mar 2021 at 11:28, Pavel Skripkin 
> wrote:
> > syzbot found general protection fault in crypto_destroy_tfm()[1].
> > It was caused by wrong clean up loop in llsec_key_alloc().
> > If one of the tfm array members won't be initialized it will cause
> > NULL dereference in crypto_destroy_tfm().
> > 
> > Call Trace:
> >  crypto_free_aead include/crypto/aead.h:191 [inline] [1]
> >  llsec_key_alloc net/mac802154/llsec.c:156 [inline]
> >  mac802154_llsec_key_add+0x9e0/0xcc0 net/mac802154/llsec.c:249
> >  ieee802154_add_llsec_key+0x56/0x80 net/mac802154/cfg.c:338
> >  rdev_add_llsec_key net/ieee802154/rdev-ops.h:260 [inline]
> >  nl802154_add_llsec_key+0x3d3/0x560 net/ieee802154/nl802154.c:1584
> >  genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
> >  genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
> >  genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
> >  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
> >  genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
> >  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
> >  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
> >  sock_sendmsg_nosec net/socket.c:654 [inline]
> >  sock_sendmsg+0xcf/0x120 net/socket.c:674
> >  sys_sendmsg+0x6e8/0x810 net/socket.c:2350
> >  ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
> >  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
> >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > Signed-off-by: Pavel Skripkin 
> > Reported-by: syzbot+12cf5fbfdeba210a8...@syzkaller.appspotmail.com
> > ---
> >  net/mac802154/llsec.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c
> > index 585d33144c33..6709f186f777 100644
> > --- a/net/mac802154/llsec.c
> > +++ b/net/mac802154/llsec.c
> > @@ -151,7 +151,7 @@ llsec_key_alloc(const struct
> > ieee802154_llsec_key *template)
> >  err_tfm0:
> > crypto_free_sync_skcipher(key->tfm0);
> >  err_tfm:
> > -   for (i = 0; i < ARRAY_SIZE(key->tfm); i++)
> > +   for (; i >= 0; i--)
> > if (key->tfm[i])
> 
> I think this need to be:
> 
> if (!IS_ERR_OR_NULL(key->tfm[i]))
> 
> otherwise we still run into issues for the current iterator when
> key->tfm[i] is in range of IS_ERR().

Oh... I got it completly wrong, I'm sorry. If it's still not fixed,
I'll send rigth patch for that.

> 
> - Alex
-- 
With regards,
Pavel Skripkin

Re: [Patch bpf-next v2 2/9] sock: introduce sk_prot->update_proto()

2021-03-04 Thread Lorenz Bauer

On Wed, 3 Mar 2021 at 18:21, Cong Wang  wrote:
>
> Yeah, I am not surprised we can change tcp_update_ulp() too, but
> why should I bother kTLS when I do not have to? What you suggest
> could at most save us a bit of code size, not a big gain. So, I'd keep
> its return value as it is, unless you see any other benefits.

I think the end result is code that is easier to understand and
therefore maintain. Keep it as it is if you prefer.

> BTW, I will rename it to 'psock_update_sk_prot', please let me know
> if you have any better names.

SGTM.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

[PATCH] i40e: improve locking of mac_filter_hash

2021-03-04 Thread Stefan Assmann

i40e_config_vf_promiscuous_mode() calls
i40e_getnum_vf_vsi_vlan_filters() without acquiring the
mac_filter_hash_lock spinlock.

This is unsafe because mac_filter_hash may get altered in another thread
while i40e_getnum_vf_vsi_vlan_filters() traverses the hashes.

Simply adding the spinlock in i40e_getnum_vf_vsi_vlan_filters() is not
possible as it already gets called in i40e_get_vlan_list_sync() with the
spinlock held. Therefore adding a wrapper that acquires the spinlock and
call the correct function where appropriate.

Fixes: 37d318d7805f ("i40e: Remove scheduling while atomic possibility")
Fix-suggested-by: Paolo Abeni 
Signed-off-by: Stefan Assmann 
---
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c| 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 1b6ec9be155a..61d3e9a00f65 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1101,12 +1101,12 @@ static int i40e_quiesce_vf_pci(struct i40e_vf *vf)
 }
 
 /**
- * i40e_getnum_vf_vsi_vlan_filters
+ * __i40e_getnum_vf_vsi_vlan_filters
  * @vsi: pointer to the vsi
  *
  * called to get the number of VLANs offloaded on this VF
  **/
-static int i40e_getnum_vf_vsi_vlan_filters(struct i40e_vsi *vsi)
+static int __i40e_getnum_vf_vsi_vlan_filters(struct i40e_vsi *vsi)
 {
struct i40e_mac_filter *f;
u16 num_vlans = 0, bkt;
@@ -1119,6 +1119,23 @@ static int i40e_getnum_vf_vsi_vlan_filters(struct 
i40e_vsi *vsi)
return num_vlans;
 }
 
+/**
+ * i40e_getnum_vf_vsi_vlan_filters
+ * @vsi: pointer to the vsi
+ *
+ * wrapper for __i40e_getnum_vf_vsi_vlan_filters() with spinlock held
+ **/
+static int i40e_getnum_vf_vsi_vlan_filters(struct i40e_vsi *vsi)
+{
+   int num_vlans;
+
+   spin_lock_bh(&vsi->mac_filter_hash_lock);
+   num_vlans = __i40e_getnum_vf_vsi_vlan_filters(vsi);
+   spin_unlock_bh(&vsi->mac_filter_hash_lock);
+
+   return num_vlans;
+}
+
 /**
  * i40e_get_vlan_list_sync
  * @vsi: pointer to the VSI
@@ -1136,7 +1153,7 @@ static void i40e_get_vlan_list_sync(struct i40e_vsi *vsi, 
u16 *num_vlans,
int bkt;
 
spin_lock_bh(&vsi->mac_filter_hash_lock);
-   *num_vlans = i40e_getnum_vf_vsi_vlan_filters(vsi);
+   *num_vlans = __i40e_getnum_vf_vsi_vlan_filters(vsi);
*vlan_list = kcalloc(*num_vlans, sizeof(**vlan_list), GFP_ATOMIC);
if (!(*vlan_list))
goto err;
-- 
2.29.2

Re: [PATCH v2 1/1] xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume

2021-03-04 Thread Steffen Klassert

On Tue, Mar 02, 2021 at 08:00:04AM +1300, Evan Nimmo wrote:
> A situation can occur where the interface bound to the sk is different
> to the interface bound to the sk attached to the skb. The interface
> bound to the sk is the correct one however this information is lost inside
> xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume
> instead. This assumes that the sk bound interface and the bound interface
> attached to the sk within the skb are the same which can lead to lookup
> failures inside ip_route_me_harder resulting in the packet being dropped.
> 
> We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the
> global VRF however we have an encapsulated dot1q tunnel interface that
> is within a different VRF. We also have a mangle rule that marks the 
> packets causing them to be processed inside ip_route_me_harder.
> 
> Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this
> worked fine as the sk attached to the skb was changed from the dot1q
> encapsulated interface to the sk for the tunnel which meant the interface
> bound to the sk and the interface bound to the skb were identical.
> Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk
> when routing harder") fixed some of these issues however a similar
> problem existed in the xfrm code.
> 
> Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
> 
> Signed-off-by: Evan Nimmo 

Applied, thanks Evan!

RE: [PATCH] gianfar: fix jumbo packets+napi+rx overrun crash

2021-03-04 Thread Claudiu Manoil

>-Original Message-
>From: michael-...@fami-braun.de 
>Sent: Thursday, March 4, 2021 5:13 AM
>To: Claudiu Manoil 
>Cc: netdev@vger.kernel.org; Michael Braun 
>Subject: [PATCH] gianfar: fix jumbo packets+napi+rx overrun crash
>
>From: Michael Braun 
>
>When using jumbo packets and overrunning rx queue with napi enabled,
>the following sequence is observed in gfar_add_rx_frag:
>

Hi,

Could you help provide some context, e.g.:
On what board/soc were you able to trigger this issue?
How often does the overrun occur? What's the use case? Is the issue triggered 
with smaller packets than 9600B?
Increasing the Rx ring size does significantly reduce ring overruns?
Thanks.

>   | lstatus  |   | skb   |
>t  | lstatus,  size, flags| first | len, data_len, *ptr   |
>---+--+---+---+
>13 | 18002348, 9032, INTERRUPT LAST   | 0 | 9600, 8000,  f554c12e |
>12 | 1640, 1600, INTERRUPT| 0 | 8000, 6400,  f554c12e |
>11 | 1640, 1600, INTERRUPT| 0 | 6400, 4800,  f554c12e |
>10 | 1640, 1600, INTERRUPT| 0 | 4800, 3200,  f554c12e |
>09 | 1640, 1600, INTERRUPT| 0 | 3200, 1600,  f554c12e |
>08 | 14000640, 1600, INTERRUPT FIRST  | 0 | 1600, 0, f554c12e |
>07 | 14000640, 1600, INTERRUPT FIRST  | 1 | 0,0, f554c12e |
>06 | 1c80, 128,  INTERRUPT LAST FIRST | 1 | 0,0, abf3bd6e |
>05 | 18002348, 9032, INTERRUPT LAST   | 0 | 8000, 6400,  c5a57780 |
>04 | 1640, 1600, INTERRUPT| 0 | 6400, 4800,  c5a57780 |
>03 | 1640, 1600, INTERRUPT| 0 | 4800, 3200,  c5a57780 |
>02 | 1640, 1600, INTERRUPT| 0 | 3200, 1600,  c5a57780 |
>01 | 1640, 1600, INTERRUPT| 0 | 1600, 0, c5a57780 |
>00 | 14000640, 1600, INTERRUPT FIRST  | 1 | 0,0, c5a57780 |
>
>So at t=7 a new packets is started but not finished, probably due to rx
>overrun - but rx overrun is not indicated in the flags. Instead a new
>packets starts at t=8. This results in skb->len to exceed size for the LAST
>fragment at t=13 and thus a negative fragment size added to the skb.
>
>This then crashes:
>
>kernel BUG at include/linux/skbuff.h:2277!
>Oops: Exception in kernel mode, sig: 5 [#1]
>...
>NIP [c04689f4] skb_pull+0x2c/0x48
>LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
>Call Trace:
>[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
>[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
>[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
>[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
>[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
>[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
>[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
>[ec4bff08] [c0062718] kthread+0x140/0x144
>[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
>
>This patch fixes this by checking for computed LAST fragment size, so a
>negative sized fragment is never added.
>In order to prevent the newer rx frame from getting corrupted, the FIRST
>flag is checked to discard the incomplete older frame.
>
>Signed-off-by: Michael Braun 
>---
> drivers/net/ethernet/freescale/gianfar.c | 14 ++
> 1 file changed, 14 insertions(+)
>
>diff --git a/drivers/net/ethernet/freescale/gianfar.c
>b/drivers/net/ethernet/freescale/gianfar.c
>index 541de32ea662..2aecae23bfd0 100644
>--- a/drivers/net/ethernet/freescale/gianfar.c
>+++ b/drivers/net/ethernet/freescale/gianfar.c
>@@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff
>*rxb, u32 lstatus,
>   if (lstatus & BD_LFLAG(RXBD_LAST))
>   size -= skb->len;
>
>+  WARN(size < 0, "gianfar: rx fragment size underflow");
>+  if (size < 0)
>+  return false;
>+
>   skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
>   rxb->page_offset + RXBUF_ALIGNMENT,
>   size, GFAR_RXB_TRUESIZE);
>@@ -2552,6 +2556,16 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q
>*rx_queue,
>   if (lstatus & BD_LFLAG(RXBD_EMPTY))
>   break;
>
>+  /* lost RXBD_LAST descriptor due to overrun */
>+  if (skb &&
>+  (lstatus & BD_LFLAG(RXBD_FIRST))) {
>+  /* discard faulty buffer */
>+  dev_kfree_skb(skb);
>+  skb = NULL;
>+
>+  /* can continue normally */
>+  }
>+

This is indeed an invalid state. If you hit this condition, discarding the skb 
is the right thing to do.

Acked-by: Claudiu Manoil 

>   /* order rx buffer descriptor reads */
>   rmb();
>
>--
>2.20.1

[PATCH -next] iwlwifi: mvm: fix old-style static const declaration

2021-03-04 Thread 'Wei Yongjun

From: Wei Yongjun 

GCC reports warning as follows:

drivers/net/wireless/intel/iwlwifi/mvm/rfi.c:14:1: warning:
 'static' is not at beginning of declaration [-Wold-style-declaration]
   14 | const static struct iwl_rfi_lut_entry iwl_rfi_table[IWL_RFI_LUT_SIZE] = 
{
  | ^

Move static to the beginning of declaration.

Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support")
Reported-by: Hulk Robot 
Signed-off-by: Wei Yongjun 
---
 drivers/net/wireless/intel/iwlwifi/mvm/rfi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c
index 873919048143..4d5a99cbcc9d 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c
@@ -11,7 +11,7 @@
  * DDR needs frequency in units of 16.666MHz, so provide FW with the
  * frequency values in the adjusted format.
  */
-const static struct iwl_rfi_lut_entry iwl_rfi_table[IWL_RFI_LUT_SIZE] = {
+static const struct iwl_rfi_lut_entry iwl_rfi_table[IWL_RFI_LUT_SIZE] = {
/* LPDDR4 */
 
/* frequency 3733MHz */

net: mscc: ocelot: issue with uninitialized pointer read in ocelot_flower_parse_key

2021-03-04 Thread Colin Ian King

Hi,

Static analysis with Coverity had detected an uninitialized pointer read
in function ocelot_flower_parse_key in
drivers/net/ethernet/mscc/ocelot_flower.c introduced by commit:

commit 75944fda1dfe836fdd406bef6cb3cc8a80f7af83
Author: Xiaoliang Yang 
Date:   Fri Oct 2 15:02:23 2020 +0300

net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1

The analysis is as follows:

531

   10. Condition flow_rule_match_key(rule,
FLOW_DISSECTOR_KEY_IPV4_ADDRS), taking true branch.
   11. Condition proto == 2048, taking true branch.

532if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_IPV4_ADDRS) &&
533proto == ETH_P_IP) {

   12. var_decl: Declaring variable match without initializer.

534struct flow_match_ipv4_addrs match;
535u8 *tmp;
536

   13. Condition filter->block_id == VCAP_ES0, taking false branch.

537if (filter->block_id == VCAP_ES0) {
538NL_SET_ERR_MSG_MOD(extack,
539   "VCAP ES0 cannot match on
IP address");
540return -EOPNOTSUPP;
541}
542

   14. Condition filter->block_id == VCAP_IS1, taking true branch.
   Uninitialized pointer read (UNINIT)
   15. uninit_use: Using uninitialized value match.mask.

543if (filter->block_id == VCAP_IS1 && *(u32
*)&match.mask->dst) {
544NL_SET_ERR_MSG_MOD(extack,
545   "Key type S1_NORMAL cannot
match on destination IP");
546return -EOPNOTSUPP;
547}

match is declared in line 534 and is not initialized and the
uninitialized match.mask is being dereferenced on line 543. Not sure
what intent was on this and how to fix, hence I'm reporting this issue.

Colin

[PATCH net] net: mscc: ocelot: properly reject destination IP keys in VCAP IS1

2021-03-04 Thread Vladimir Oltean

From: Vladimir Oltean 

An attempt is made to warn the user about the fact that VCAP IS1 cannot
offload keys matching on destination IP (at least given the current half
key format), but sadly that warning fails miserably in practice, due to
the fact that it operates on an uninitialized "match" variable. We must
first decode the keys from the flow rule.

Fixes: 75944fda1dfe ("net: mscc: ocelot: offload ingress skbedit and vlan 
actions to VCAP IS1")
Reported-by: Colin Ian King 
Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot_flower.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c 
b/drivers/net/ethernet/mscc/ocelot_flower.c
index c3ac026f6aea..a41b458b1b3e 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -540,13 +540,14 @@ ocelot_flower_parse_key(struct ocelot *ocelot, int port, 
bool ingress,
return -EOPNOTSUPP;
}
 
+   flow_rule_match_ipv4_addrs(rule, &match);
+
if (filter->block_id == VCAP_IS1 && *(u32 *)&match.mask->dst) {
NL_SET_ERR_MSG_MOD(extack,
   "Key type S1_NORMAL cannot match on 
destination IP");
return -EOPNOTSUPP;
}
 
-   flow_rule_match_ipv4_addrs(rule, &match);
tmp = &filter->key.ipv4.sip.value.addr[0];
memcpy(tmp, &match.key->src, 4);
 
-- 
2.25.1

Re: [PATCH net] net: mscc: ocelot: properly reject destination IP keys in VCAP IS1

2021-03-04 Thread Colin Ian King

On 04/03/2021 10:29, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> An attempt is made to warn the user about the fact that VCAP IS1 cannot
> offload keys matching on destination IP (at least given the current half
> key format), but sadly that warning fails miserably in practice, due to
> the fact that it operates on an uninitialized "match" variable. We must
> first decode the keys from the flow rule.
> 
> Fixes: 75944fda1dfe ("net: mscc: ocelot: offload ingress skbedit and vlan 
> actions to VCAP IS1")
> Reported-by: Colin Ian King 
> Signed-off-by: Vladimir Oltean 
> ---
>  drivers/net/ethernet/mscc/ocelot_flower.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c 
> b/drivers/net/ethernet/mscc/ocelot_flower.c
> index c3ac026f6aea..a41b458b1b3e 100644
> --- a/drivers/net/ethernet/mscc/ocelot_flower.c
> +++ b/drivers/net/ethernet/mscc/ocelot_flower.c
> @@ -540,13 +540,14 @@ ocelot_flower_parse_key(struct ocelot *ocelot, int 
> port, bool ingress,
>   return -EOPNOTSUPP;
>   }
>  
> + flow_rule_match_ipv4_addrs(rule, &match);
> +
>   if (filter->block_id == VCAP_IS1 && *(u32 *)&match.mask->dst) {
>   NL_SET_ERR_MSG_MOD(extack,
>  "Key type S1_NORMAL cannot match on 
> destination IP");
>   return -EOPNOTSUPP;
>   }
>  
> - flow_rule_match_ipv4_addrs(rule, &match);
>   tmp = &filter->key.ipv4.sip.value.addr[0];
>   memcpy(tmp, &match.key->src, 4);
>  
> 

Thanks, looks good to me.

Reviewed-by: Colin Ian King

Re: Re: [RFC v4 09/11] Documentation: Add documentation for VDUSE

2021-03-04 Thread Yongji Xie

On Thu, Mar 4, 2021 at 2:40 PM Jason Wang  wrote:
>
>
> On 2021/2/23 7:50 下午, Xie Yongji wrote:
> > VDUSE (vDPA Device in Userspace) is a framework to support
> > implementing software-emulated vDPA devices in userspace. This
> > document is intended to clarify the VDUSE design and usage.
> >
> > Signed-off-by: Xie Yongji 
> > ---
> >   Documentation/userspace-api/index.rst |   1 +
> >   Documentation/userspace-api/vduse.rst | 112 
> > ++
> >   2 files changed, 113 insertions(+)
> >   create mode 100644 Documentation/userspace-api/vduse.rst
> >
> > diff --git a/Documentation/userspace-api/index.rst 
> > b/Documentation/userspace-api/index.rst
> > index acd2cc2a538d..f63119130898 100644
> > --- a/Documentation/userspace-api/index.rst
> > +++ b/Documentation/userspace-api/index.rst
> > @@ -24,6 +24,7 @@ place where this information is gathered.
> >  ioctl/index
> >  iommu
> >  media/index
> > +   vduse
> >
> >   .. only::  subproject and html
> >
> > diff --git a/Documentation/userspace-api/vduse.rst 
> > b/Documentation/userspace-api/vduse.rst
> > new file mode 100644
> > index ..2a20e686bb59
> > --- /dev/null
> > +++ b/Documentation/userspace-api/vduse.rst
> > @@ -0,0 +1,112 @@
> > +==
> > +VDUSE - "vDPA Device in Userspace"
> > +==
> > +
> > +vDPA (virtio data path acceleration) device is a device that uses a
> > +datapath which complies with the virtio specifications with vendor
> > +specific control path. vDPA devices can be both physically located on
> > +the hardware or emulated by software. VDUSE is a framework that makes it
> > +possible to implement software-emulated vDPA devices in userspace.
> > +
> > +How VDUSE works
> > +
> > +Each userspace vDPA device is created by the VDUSE_CREATE_DEV ioctl on
> > +the character device (/dev/vduse/control). Then a device file with the
> > +specified name (/dev/vduse/$NAME) will appear, which can be used to
> > +implement the userspace vDPA device's control path and data path.
>
>
> It's better to mention that in order to le thte device to be registered
> on the bus, admin need to use the management API(netlink) to create the
> vDPA device.
>
> Some codes to demnonstrate how to create the device will be better.
>

OK.

>
> > +
> > +To implement control path, a message-based communication protocol and some
> > +types of control messages are introduced in the VDUSE framework:
> > +
> > +- VDUSE_SET_VQ_ADDR: Set the vring address of virtqueue.
> > +
> > +- VDUSE_SET_VQ_NUM: Set the size of virtqueue
> > +
> > +- VDUSE_SET_VQ_READY: Set ready status of virtqueue
> > +
> > +- VDUSE_GET_VQ_READY: Get ready status of virtqueue
> > +
> > +- VDUSE_SET_VQ_STATE: Set the state for virtqueue
> > +
> > +- VDUSE_GET_VQ_STATE: Get the state for virtqueue
> > +
> > +- VDUSE_SET_FEATURES: Set virtio features supported by the driver
> > +
> > +- VDUSE_GET_FEATURES: Get virtio features supported by the device
> > +
> > +- VDUSE_SET_STATUS: Set the device status
> > +
> > +- VDUSE_GET_STATUS: Get the device status
> > +
> > +- VDUSE_SET_CONFIG: Write to device specific configuration space
> > +
> > +- VDUSE_GET_CONFIG: Read from device specific configuration space
> > +
> > +- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in 
> > device IOTLB
> > +
> > +Those control messages are mostly based on the vdpa_config_ops in
> > +include/linux/vdpa.h which defines a unified interface to control
> > +different types of vdpa device. Userspace needs to read()/write()
> > +on the VDUSE device file to receive/reply those control messages
> > +from/to VDUSE kernel module as follows:
> > +
> > +.. code-block:: c
> > +
> > + static int vduse_message_handler(int dev_fd)
> > + {
> > + int len;
> > + struct vduse_dev_request req;
> > + struct vduse_dev_response resp;
> > +
> > + len = read(dev_fd, &req, sizeof(req));
> > + if (len != sizeof(req))
> > + return -1;
> > +
> > + resp.request_id = req.unique;
> > +
> > + switch (req.type) {
> > +
> > + /* handle different types of message */
> > +
> > + }
> > +
> > + len = write(dev_fd, &resp, sizeof(resp));
> > + if (len != sizeof(resp))
> > + return -1;
> > +
> > + return 0;
> > + }
> > +
> > +In the deta path, vDPA device's iova regions will be mapped into userspace
> > +with the help of VDUSE_IOTLB_GET_FD ioctl on the VDUSE device file:
> > +
> > +- VDUSE_IOTLB_GET_FD: get the file descriptor to iova region. Userspace can
> > +  access this iova region by passing the fd to mmap().
>
>
> It would be better to have codes to explain how it is expected to work here.
>

OK.

>
> > +
> > +Besides, the following ioctls on the VDUSE device file are provided to 
> > support
> > +interrupt injection and setting up e

[PATCH net 1/2] net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10

2021-03-04 Thread Vladimir Oltean

From: Vladimir Oltean 

When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
read before resetting the switch so it can be reprogrammed afterwards.
This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
therefore that last branch is dead code.

Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
remove the check for SPEED_10, let it fall into the default case.

Fixes: ffe10e679cec ("net: dsa: sja1105: Add support for the SGMII port")
Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index 7692338730df..c1982615c631 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1922,7 +1922,7 @@ int sja1105_static_config_reload(struct sja1105_private 
*priv,
speed = SPEED_1000;
else if (bmcr & BMCR_SPEED100)
speed = SPEED_100;
-   else if (bmcr & BMCR_SPEED10)
+   else
speed = SPEED_10;
 
sja1105_sgmii_pcs_force_speed(priv, speed);
-- 
2.25.1

[PATCH net 2/2] net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled

2021-03-04 Thread Vladimir Oltean

From: Vladimir Oltean 

In the blamed patch I managed to introduce a bug while moving code
around: the same logic is applied to the ucast_egress_floods and
bcast_egress_floods variables both on the "if" and the "else" branches.

This is clearly an unintended change compared to how the code used to be
prior to that bugfix, so restore it.

Fixes: 7f7ccdea8c73 ("net: dsa: sja1105: fix leakage of flooded frames outside 
bridging domain")
Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index c1982615c631..51ea104c63bb 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -3369,14 +3369,14 @@ static int sja1105_port_ucast_bcast_flood(struct 
sja1105_private *priv, int to,
if (flags.val & BR_FLOOD)
priv->ucast_egress_floods |= BIT(to);
else
-   priv->ucast_egress_floods |= BIT(to);
+   priv->ucast_egress_floods &= ~BIT(to);
}
 
if (flags.mask & BR_BCAST_FLOOD) {
if (flags.val & BR_BCAST_FLOOD)
priv->bcast_egress_floods |= BIT(to);
else
-   priv->bcast_egress_floods |= BIT(to);
+   priv->bcast_egress_floods &= ~BIT(to);
}
 
return sja1105_manage_flood_domains(priv);
-- 
2.25.1

Re: cleanup unused or almost unused IOMMU APIs and the FSL PAMU driver

2021-03-04 Thread Joerg Roedel

On Mon, Mar 01, 2021 at 09:42:40AM +0100, Christoph Hellwig wrote:
> Diffstat:
>  arch/powerpc/include/asm/fsl_pamu_stash.h   |   12 
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |2 
>  drivers/iommu/amd/iommu.c   |   23 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   85 ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  122 +---
>  drivers/iommu/dma-iommu.c   |8 
>  drivers/iommu/fsl_pamu.c|  264 --
>  drivers/iommu/fsl_pamu.h|   10 
>  drivers/iommu/fsl_pamu_domain.c |  694 
> ++--
>  drivers/iommu/fsl_pamu_domain.h |   46 -
>  drivers/iommu/intel/iommu.c |   55 --
>  drivers/iommu/iommu.c   |   75 ---
>  drivers/soc/fsl/qbman/qman_portal.c |   56 --
>  drivers/vfio/vfio_iommu_type1.c |   31 -
>  drivers/vhost/vdpa.c|   10 
>  include/linux/iommu.h   |   81 ---
>  16 files changed, 214 insertions(+), 1360 deletions(-)

Nice cleanup, thanks. The fsl_pamu driver and interface has always been
a little bit of an alien compared to other IOMMU drivers. I am inclined
to merge this after -rc3 is out, given some reviews. Can you also please
add changelogs to the last three patches?

Thanks,

Joerg

[PATCH 01/18] thunderbolt: Disable retry logic for intra-domain control packets

2021-03-04 Thread Mika Westerberg

In most cases the response packet is lost because the router in question
was disconnected by the user. Resending the control packet in that case
just adds unnecessary delays, so disable that for intra-domain control
packets. For inter-domain (XDomain) packets we continue retrying.

This also aligns the driver better what the Intel connection manager
firmware is doing.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/ctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index f1aeaff9f368..875922133782 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -17,7 +17,7 @@
 
 
 #define TB_CTL_RX_PKG_COUNT10
-#define TB_CTL_RETRIES 4
+#define TB_CTL_RETRIES 1
 
 /**
  * struct tb_ctl - Thunderbolt control channel
-- 
2.30.1

[PATCH 04/18] Documentation / thunderbolt: Drop speed/lanes entries for XDomain

2021-03-04 Thread Mika Westerberg

These are actually not needed as we already have similar entries that
apply to all devices on the Thunderbolt bus.

Cc: Isaac Hazan 
Signed-off-by: Mika Westerberg 
---
 .../ABI/testing/sysfs-bus-thunderbolt | 28 ---
 1 file changed, 28 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-thunderbolt 
b/Documentation/ABI/testing/sysfs-bus-thunderbolt
index d7f09d011b6d..bfa4ca6f3fc1 100644
--- a/Documentation/ABI/testing/sysfs-bus-thunderbolt
+++ b/Documentation/ABI/testing/sysfs-bus-thunderbolt
@@ -1,31 +1,3 @@
-What:  /sys/bus/thunderbolt/devices//rx_speed
-Date:  Feb 2021
-KernelVersion: 5.11
-Contact:   Isaac Hazan 
-Description:   This attribute reports the XDomain RX speed per lane.
-   All RX lanes run at the same speed.
-
-What:  /sys/bus/thunderbolt/devices//rx_lanes
-Date:  Feb 2021
-KernelVersion: 5.11
-Contact:   Isaac Hazan 
-Description:   This attribute reports the number of RX lanes the XDomain
-   is using simultaneously through its upstream port.
-
-What:  /sys/bus/thunderbolt/devices//tx_speed
-Date:  Feb 2021
-KernelVersion: 5.11
-Contact:   Isaac Hazan 
-Description:   This attribute reports the XDomain TX speed per lane.
-   All TX lanes run at the same speed.
-
-What:  /sys/bus/thunderbolt/devices//tx_lanes
-Date:  Feb 2021
-KernelVersion: 5.11
-Contact:   Isaac Hazan 
-Description:   This attribute reports number of TX lanes the XDomain
-   is using simultaneously through its upstream port.
-
 What: /sys/bus/thunderbolt/devices/.../domainX/boot_acl
 Date:  Jun 2018
 KernelVersion: 4.17
-- 
2.30.1

[PATCH 05/18] thunderbolt: Add more logging to XDomain connections

2021-03-04 Thread Mika Westerberg

Currently the driver is pretty quiet when another host is connected
which makes debugging possible issues harder. For this reason add more
logging on debug level that can be turned on as needed.

While there log the host-to-host connection on info level analogous to
routers and retimers.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/xdomain.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index 7cf8b9c85ab7..584bb5ec06f8 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -591,6 +591,8 @@ static void tb_xdp_handle_request(struct work_struct *work)
 
finalize_property_block();
 
+   tb_dbg(tb, "%llx: received XDomain request %#x\n", route, pkg->type);
+
switch (pkg->type) {
case PROPERTIES_REQUEST:
ret = tb_xdp_properties_response(tb, ctl, route, sequence, uuid,
@@ -1002,9 +1004,12 @@ static void tb_xdomain_get_uuid(struct work_struct *work)
uuid_t uuid;
int ret;
 
+   dev_dbg(&xd->dev, "requesting remote UUID\n");
+
ret = tb_xdp_uuid_request(tb->ctl, xd->route, xd->uuid_retries, &uuid);
if (ret < 0) {
if (xd->uuid_retries-- > 0) {
+   dev_dbg(&xd->dev, "failed to request UUID, retrying\n");
queue_delayed_work(xd->tb->wq, &xd->get_uuid_work,
   msecs_to_jiffies(100));
} else {
@@ -1013,6 +1018,8 @@ static void tb_xdomain_get_uuid(struct work_struct *work)
return;
}
 
+   dev_dbg(&xd->dev, "got remote UUID %pUb\n", &uuid);
+
if (uuid_equal(&uuid, xd->local_uuid))
dev_dbg(&xd->dev, "intra-domain loop detected\n");
 
@@ -1052,11 +1059,15 @@ static void tb_xdomain_get_properties(struct 
work_struct *work)
u32 gen = 0;
int ret;
 
+   dev_dbg(&xd->dev, "requesting remote properties\n");
+
ret = tb_xdp_properties_request(tb->ctl, xd->route, xd->local_uuid,
xd->remote_uuid, xd->properties_retries,
&block, &gen);
if (ret < 0) {
if (xd->properties_retries-- > 0) {
+   dev_dbg(&xd->dev,
+   "failed to request remote properties, 
retrying\n");
queue_delayed_work(xd->tb->wq, &xd->get_properties_work,
   msecs_to_jiffies(1000));
} else {
@@ -1123,6 +1134,11 @@ static void tb_xdomain_get_properties(struct work_struct 
*work)
dev_err(&xd->dev, "failed to add XDomain device\n");
return;
}
+   dev_info(&xd->dev, "new host found, vendor=%#x device=%#x\n",
+xd->vendor, xd->device);
+   if (xd->vendor_name && xd->device_name)
+   dev_info(&xd->dev, "%s %s\n", xd->vendor_name,
+xd->device_name);
} else {
kobject_uevent(&xd->dev.kobj, KOBJ_CHANGE);
}
@@ -1143,13 +1159,19 @@ static void tb_xdomain_properties_changed(struct 
work_struct *work)
 properties_changed_work.work);
int ret;
 
+   dev_dbg(&xd->dev, "sending properties changed notification\n");
+
ret = tb_xdp_properties_changed_request(xd->tb->ctl, xd->route,
xd->properties_changed_retries, xd->local_uuid);
if (ret) {
-   if (xd->properties_changed_retries-- > 0)
+   if (xd->properties_changed_retries-- > 0) {
+   dev_dbg(&xd->dev,
+   "failed to send properties changed 
notification, retrying\n");
queue_delayed_work(xd->tb->wq,
   &xd->properties_changed_work,
   msecs_to_jiffies(1000));
+   }
+   dev_err(&xd->dev, "failed to send properties changed 
notification\n");
return;
}
 
@@ -1390,6 +1412,10 @@ struct tb_xdomain *tb_xdomain_alloc(struct tb *tb, 
struct device *parent,
xd->dev.groups = xdomain_attr_groups;
dev_set_name(&xd->dev, "%u-%llx", tb->index, route);
 
+   dev_dbg(&xd->dev, "local UUID %pUb\n", local_uuid);
+   if (remote_uuid)
+   dev_dbg(&xd->dev, "remote UUID %pUb\n", remote_uuid);
+
/*
 * This keeps the DMA powered on as long as we have active
 * connection to another host.
@@ -1452,10 +1478,12 @@ void tb_xdomain_remove(struct tb_xdomain *xd)
pm_runtime_put_noidle(&xd->dev);
pm_runtime_set_suspended(&xd->dev);
 
-   if (!device_is_registered(&xd->dev))
+   if (!device_is_registered(&xd->dev)) {

[PATCH 03/18] thunderbolt: Decrease control channel timeout for software connection manager

2021-03-04 Thread Mika Westerberg

When the firmware connection manager is not proxying between the
software and the hardware we can decrease the timeout for control
packets significantly. The USB4 spec recommends 10 ms +- 1 ms but we use
slightly larger value (100 ms) which is recommendation from Intel
Thunderbolt firmware folks. When firmware connection manager is running
then we keep using the existing 5000 ms.

To implement this we move the control channel allocation to
tb_domain_alloc(), and pass the timeout from that function to the
tb_ctl_alloc(). Then make both connection manager implementations pass
the timeout when they alloc the domain structure.

While there update kernel-doc of struct tb_ctl to match the reality.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/ctl.c| 15 +---
 drivers/thunderbolt/ctl.h|  5 ++-
 drivers/thunderbolt/domain.c | 66 +---
 drivers/thunderbolt/icm.c|  2 +-
 drivers/thunderbolt/tb.c |  4 ++-
 drivers/thunderbolt/tb.h |  2 +-
 6 files changed, 49 insertions(+), 45 deletions(-)

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index b79be1f02d92..0fb5e04191e2 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -29,6 +29,7 @@
  * @request_queue_lock: Lock protecting @request_queue
  * @request_queue: List of outstanding requests
  * @running: Is the control channel running at the moment
+ * @timeout_msec: Default timeout for non-raw control messages
  * @callback: Callback called when hotplug message is received
  * @callback_data: Data passed to @callback
  */
@@ -43,6 +44,7 @@ struct tb_ctl {
struct list_head request_queue;
bool running;
 
+   int timeout_msec;
event_cb callback;
void *callback_data;
 };
@@ -613,6 +615,7 @@ struct tb_cfg_result tb_cfg_request_sync(struct tb_ctl *ctl,
 /**
  * tb_ctl_alloc() - allocate a control channel
  * @nhi: Pointer to NHI
+ * @timeout_msec: Default timeout used with non-raw control messages
  * @cb: Callback called for plug events
  * @cb_data: Data passed to @cb
  *
@@ -620,13 +623,15 @@ struct tb_cfg_result tb_cfg_request_sync(struct tb_ctl 
*ctl,
  *
  * Return: Returns a pointer on success or NULL on failure.
  */
-struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, event_cb cb, void *cb_data)
+struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int timeout_msec, event_cb cb,
+   void *cb_data)
 {
int i;
struct tb_ctl *ctl = kzalloc(sizeof(*ctl), GFP_KERNEL);
if (!ctl)
return NULL;
ctl->nhi = nhi;
+   ctl->timeout_msec = timeout_msec;
ctl->callback = cb;
ctl->callback_data = cb_data;
 
@@ -829,7 +834,7 @@ struct tb_cfg_result tb_cfg_reset(struct tb_ctl *ctl, u64 
route)
req->response_size = sizeof(reply);
req->response_type = TB_CFG_PKG_RESET;
 
-   res = tb_cfg_request_sync(ctl, req, TB_CFG_DEFAULT_TIMEOUT);
+   res = tb_cfg_request_sync(ctl, req, ctl->timeout_msec);
 
tb_cfg_request_put(req);
 
@@ -1005,7 +1010,7 @@ int tb_cfg_read(struct tb_ctl *ctl, void *buffer, u64 
route, u32 port,
enum tb_cfg_space space, u32 offset, u32 length)
 {
struct tb_cfg_result res = tb_cfg_read_raw(ctl, buffer, route, port,
-   space, offset, length, TB_CFG_DEFAULT_TIMEOUT);
+   space, offset, length, ctl->timeout_msec);
switch (res.err) {
case 0:
/* Success */
@@ -1031,7 +1036,7 @@ int tb_cfg_write(struct tb_ctl *ctl, const void *buffer, 
u64 route, u32 port,
 enum tb_cfg_space space, u32 offset, u32 length)
 {
struct tb_cfg_result res = tb_cfg_write_raw(ctl, buffer, route, port,
-   space, offset, length, TB_CFG_DEFAULT_TIMEOUT);
+   space, offset, length, ctl->timeout_msec);
switch (res.err) {
case 0:
/* Success */
@@ -1069,7 +1074,7 @@ int tb_cfg_get_upstream_port(struct tb_ctl *ctl, u64 
route)
u32 dummy;
struct tb_cfg_result res = tb_cfg_read_raw(ctl, &dummy, route, 0,
   TB_CFG_SWITCH, 0, 1,
-  TB_CFG_DEFAULT_TIMEOUT);
+  ctl->timeout_msec);
if (res.err == 1)
return -EIO;
if (res.err)
diff --git a/drivers/thunderbolt/ctl.h b/drivers/thunderbolt/ctl.h
index 2eafbfea5dff..e8c64898dfce 100644
--- a/drivers/thunderbolt/ctl.h
+++ b/drivers/thunderbolt/ctl.h
@@ -21,15 +21,14 @@ struct tb_ctl;
 typedef bool (*event_cb)(void *data, enum tb_cfg_pkg_type type,
 const void *buf, size_t size);
 
-struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, event_cb cb, void *cb_data);
+struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int timeout_msec, event_cb cb,
+   void *cb_data);
 void tb_ctl_start(struct tb_ctl *ctl

[PATCH 07/18] thunderbolt: Use pseudo-random number as initial property block generation

2021-03-04 Thread Mika Westerberg

As recommended by USB4 inter-domain service spec use pseudo-random value
instead of zero as initial XDomain property block generation value.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/xdomain.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index a1657663a95e..cfe6fa7e84f4 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1880,6 +1881,7 @@ int tb_xdomain_init(void)
tb_property_add_immediate(xdomain_property_dir, "deviceid", 0x1);
tb_property_add_immediate(xdomain_property_dir, "devicerv", 0x8100);
 
+   xdomain_property_block_gen = prandom_u32();
return 0;
 }
 
-- 
2.30.1

[PATCH 13/18] thunderbolt: Allow multiple DMA tunnels over a single XDomain connection

2021-03-04 Thread Mika Westerberg

Currently we have had an artificial limitation of a single DMA tunnel
per XDomain connection. However, hardware wise there is no such limit
and software based connection manager can take advantage of all the DMA
rings available on the host to establish tunnels.

For this reason make the tb_xdomain_[enable|disable]_paths() to take the
DMA ring and HopID as parameter instead of storing them in the struct
tb_xdomain. We also add API functions to allocate input and output
HopIDs of the XDomain connection that the service drivers can use
instead of hard-coding.

Also convert the two existing service drivers over to this API.

Signed-off-by: Mika Westerberg 
---
 drivers/net/thunderbolt.c  |  49 +---
 drivers/thunderbolt/dma_test.c |  35 -
 drivers/thunderbolt/domain.c   |  24 --
 drivers/thunderbolt/icm.c  |  32 +---
 drivers/thunderbolt/tb.c   |  48 +++-
 drivers/thunderbolt/tb.h   |  16 +++-
 drivers/thunderbolt/tunnel.c   |  82 ---
 drivers/thunderbolt/tunnel.h   |   8 +-
 drivers/thunderbolt/xdomain.c  | 139 ++---
 include/linux/thunderbolt.h|  32 +---
 10 files changed, 340 insertions(+), 125 deletions(-)

diff --git a/drivers/net/thunderbolt.c b/drivers/net/thunderbolt.c
index ed3743dc62b9..5c9ec91b6e78 100644
--- a/drivers/net/thunderbolt.c
+++ b/drivers/net/thunderbolt.c
@@ -28,7 +28,6 @@
 #define TBNET_LOGOUT_TIMEOUT   100
 
 #define TBNET_RING_SIZE256
-#define TBNET_LOCAL_PATH   0xf
 #define TBNET_LOGIN_RETRIES60
 #define TBNET_LOGOUT_RETRIES   5
 #define TBNET_MATCH_FRAGS_ID   BIT(1)
@@ -154,8 +153,8 @@ struct tbnet_ring {
  * @login_sent: ThunderboltIP login message successfully sent
  * @login_received: ThunderboltIP login message received from the remote
  * host
- * @transmit_path: HopID the other end needs to use building the
- *opposite side path.
+ * @local_transmit_path: HopID we are using to send out packets
+ * @remote_transmit_path: HopID the other end is using to send packets to us
  * @connection_lock: Lock serializing access to @login_sent,
  *  @login_received and @transmit_path.
  * @login_retries: Number of login retries currently done
@@ -184,7 +183,8 @@ struct tbnet {
atomic_t command_id;
bool login_sent;
bool login_received;
-   u32 transmit_path;
+   int local_transmit_path;
+   int remote_transmit_path;
struct mutex connection_lock;
int login_retries;
struct delayed_work login_work;
@@ -257,7 +257,7 @@ static int tbnet_login_request(struct tbnet *net, u8 
sequence)
  atomic_inc_return(&net->command_id));
 
request.proto_version = TBIP_LOGIN_PROTO_VERSION;
-   request.transmit_path = TBNET_LOCAL_PATH;
+   request.transmit_path = net->local_transmit_path;
 
return tb_xdomain_request(xd, &request, sizeof(request),
  TB_CFG_PKG_XDOMAIN_RESP, &reply,
@@ -364,10 +364,10 @@ static void tbnet_tear_down(struct tbnet *net, bool 
send_logout)
mutex_lock(&net->connection_lock);
 
if (net->login_sent && net->login_received) {
-   int retries = TBNET_LOGOUT_RETRIES;
+   int ret, retries = TBNET_LOGOUT_RETRIES;
 
while (send_logout && retries-- > 0) {
-   int ret = tbnet_logout_request(net);
+   ret = tbnet_logout_request(net);
if (ret != -ETIMEDOUT)
break;
}
@@ -377,8 +377,16 @@ static void tbnet_tear_down(struct tbnet *net, bool 
send_logout)
tbnet_free_buffers(&net->rx_ring);
tbnet_free_buffers(&net->tx_ring);
 
-   if (tb_xdomain_disable_paths(net->xd))
+   ret = tb_xdomain_disable_paths(net->xd,
+  net->local_transmit_path,
+  net->rx_ring.ring->hop,
+  net->remote_transmit_path,
+  net->tx_ring.ring->hop);
+   if (ret)
netdev_warn(net->dev, "failed to disable DMA paths\n");
+
+   tb_xdomain_release_in_hopid(net->xd, net->remote_transmit_path);
+   net->remote_transmit_path = 0;
}
 
net->login_retries = 0;
@@ -424,7 +432,7 @@ static int tbnet_handle_packet(const void *buf, size_t 
size, void *data)
if (!ret) {
mutex_lock(&net->connection_lock);
net->login_received = true;
-   net->transmit_path = pkg->transmit_path;
+   net->remote_transmit_path = pkg->transmit_path;
 
/* If we reached the number of max retries or
 * previous logout, schedule another round of
@@ -597

[PATCH 00/18] thunderbolt: Align with USB4 inter-domain and DROM specs

2021-03-04 Thread Mika Westerberg

Hi all,

The latest USB4 spec [1] also includes inter-domain (peer-to-peer, XDomain)
and DROM (per-device ROM) specs. There are sligth differences between what
the driver is doing now and what the spec say so this series tries to align
the driver(s) with that. We also improve the "service" stack so that it is
possible to run multiple DMA tunnels over a single XDomain connection, and
update the two existing service drivers accordingly.

We also decrease the control channel timeout when software based connection
manager is used.

The USB4 DROM spec adds a new product descriptor that includes the device
and IDs instead of the generic entries in the Thunderbotl 1-3 DROMs. This
series updates the driver to parse this descriptor too.

[1] https://www.usb.org/document-library/usb4tm-specification

Mika Westerberg (18):
  thunderbolt: Disable retry logic for intra-domain control packets
  thunderbolt: Do not pass timeout for tb_cfg_reset()
  thunderbolt: Decrease control channel timeout for software connection manager
  Documentation / thunderbolt: Drop speed/lanes entries for XDomain
  thunderbolt: Add more logging to XDomain connections
  thunderbolt: Do not re-establish XDomain DMA paths automatically
  thunderbolt: Use pseudo-random number as initial property block generation
  thunderbolt: Align XDomain protocol timeouts with the spec
  thunderbolt: Add tb_property_copy_dir()
  thunderbolt: Add support for maxhopid XDomain property
  thunderbolt: Use dedicated flow control for DMA tunnels
  thunderbolt: Drop unused tb_port_set_initial_credits()
  thunderbolt: Allow multiple DMA tunnels over a single XDomain connection
  net: thunderbolt: Align the driver to the USB4 networking spec
  thunderbolt: Add KUnit tests for XDomain properties
  thunderbolt: Add KUnit tests for DMA tunnels
  thunderbolt: Check quirks in tb_switch_add()
  thunderbolt: Add support for USB4 DROM

 .../ABI/testing/sysfs-bus-thunderbolt |  35 +-
 drivers/net/thunderbolt.c |  56 +-
 drivers/thunderbolt/ctl.c |  21 +-
 drivers/thunderbolt/ctl.h |   8 +-
 drivers/thunderbolt/dma_test.c|  35 +-
 drivers/thunderbolt/domain.c  |  90 ++--
 drivers/thunderbolt/eeprom.c  | 105 +++-
 drivers/thunderbolt/icm.c |  34 +-
 drivers/thunderbolt/property.c|  71 +++
 drivers/thunderbolt/switch.c  |  26 +-
 drivers/thunderbolt/tb.c  |  52 +-
 drivers/thunderbolt/tb.h  |  19 +-
 drivers/thunderbolt/test.c| 492 ++
 drivers/thunderbolt/tunnel.c  | 102 +++-
 drivers/thunderbolt/tunnel.h  |   8 +-
 drivers/thunderbolt/xdomain.c | 416 +--
 include/linux/thunderbolt.h   |  54 +-
 17 files changed, 1220 insertions(+), 404 deletions(-)

-- 
2.30.1

[PATCH 10/18] thunderbolt: Add support for maxhopid XDomain property

2021-03-04 Thread Mika Westerberg

USB4 inter-domain spec mandates that the compatible hosts expose a new
property "maxhopid" that tells the connection manager on the other side
what is the maximum supported input HopID over the connection. Since
this is depend on the lane adapter the cable is connected it needs to be
filled in dynamically.

For this reason we take a copy of the global properties and fill then
for each XDomain connection upon first connect, and then keep updating
it if the generation changes as services are being added/removed. We
also take advantage of this copy to fill in the hostname.

We also expose this maxhopid as an attribute under each XDomain device.

While there drop kernel-doc entry for property_lock which seems to be
left there when the structure was originally introduced.

Signed-off-by: Mika Westerberg 
---
 .../ABI/testing/sysfs-bus-thunderbolt |   7 +
 drivers/thunderbolt/xdomain.c | 206 ++
 include/linux/thunderbolt.h   |  19 +-
 3 files changed, 138 insertions(+), 94 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-thunderbolt 
b/Documentation/ABI/testing/sysfs-bus-thunderbolt
index bfa4ca6f3fc1..c41c68f64693 100644
--- a/Documentation/ABI/testing/sysfs-bus-thunderbolt
+++ b/Documentation/ABI/testing/sysfs-bus-thunderbolt
@@ -134,6 +134,13 @@ Contact:   thunderbolt-softw...@lists.01.org
 Description:   This attribute contains name of this device extracted from
the device DROM.
 
+What:  /sys/bus/thunderbolt/devices/.../maxhopid
+Date:  Jul 2021
+KernelVersion: 5.13
+Contact:   Mika Westerberg 
+Description:   Only set for XDomains. The maximum HopID the other host
+   supports as its input HopID.
+
 What:  /sys/bus/thunderbolt/devices/.../rx_speed
 Date:  Jan 2020
 KernelVersion: 5.5
diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index ffa9cc9e0e7d..ab56757d7c24 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -24,6 +24,7 @@
 #define XDOMAIN_PROPERTIES_RETRIES 10
 #define XDOMAIN_PROPERTIES_CHANGED_RETRIES 10
 #define XDOMAIN_BONDING_WAIT   100  /* ms */
+#define XDOMAIN_DEFAULT_MAX_HOPID  15
 
 struct xdomain_request_work {
struct work_struct work;
@@ -35,13 +36,15 @@ static bool tb_xdomain_enabled = true;
 module_param_named(xdomain, tb_xdomain_enabled, bool, 0444);
 MODULE_PARM_DESC(xdomain, "allow XDomain protocol (default: true)");
 
-/* Serializes access to the properties and protocol handlers below */
+/*
+ * Serializes access to the properties and protocol handlers below. If
+ * you need to take both this lock and the struct tb_xdomain lock, take
+ * this one first.
+ */
 static DEFINE_MUTEX(xdomain_lock);
 
 /* Properties exposed to the remote domains */
 static struct tb_property_dir *xdomain_property_dir;
-static u32 *xdomain_property_block;
-static u32 xdomain_property_block_len;
 static u32 xdomain_property_block_gen;
 
 /* Additional protocol handlers */
@@ -386,8 +389,7 @@ static int tb_xdp_properties_request(struct tb_ctl *ctl, 
u64 route,
 }
 
 static int tb_xdp_properties_response(struct tb *tb, struct tb_ctl *ctl,
-   u64 route, u8 sequence, const uuid_t *src_uuid,
-   const struct tb_xdp_properties *req)
+   struct tb_xdomain *xd, u8 sequence, const struct tb_xdp_properties *req)
 {
struct tb_xdp_properties_response *res;
size_t total_size;
@@ -399,39 +401,39 @@ static int tb_xdp_properties_response(struct tb *tb, 
struct tb_ctl *ctl,
 * protocol supports forwarding, though which we might add
 * support later on.
 */
-   if (!uuid_equal(src_uuid, &req->dst_uuid)) {
-   tb_xdp_error_response(ctl, route, sequence,
+   if (!uuid_equal(xd->local_uuid, &req->dst_uuid)) {
+   tb_xdp_error_response(ctl, xd->route, sequence,
  ERROR_UNKNOWN_DOMAIN);
return 0;
}
 
-   mutex_lock(&xdomain_lock);
+   mutex_lock(&xd->lock);
 
-   if (req->offset >= xdomain_property_block_len) {
-   mutex_unlock(&xdomain_lock);
+   if (req->offset >= xd->local_property_block_len) {
+   mutex_unlock(&xd->lock);
return -EINVAL;
}
 
-   len = xdomain_property_block_len - req->offset;
+   len = xd->local_property_block_len - req->offset;
len = min_t(u16, len, TB_XDP_PROPERTIES_MAX_DATA_LENGTH);
total_size = sizeof(*res) + len * 4;
 
res = kzalloc(total_size, GFP_KERNEL);
if (!res) {
-   mutex_unlock(&xdomain_lock);
+   mutex_unlock(&xd->lock);
return -ENOMEM;
}
 
-   tb_xdp_fill_header(&res->hdr, route, sequence, PROPERTIES_RESPONSE,
+   tb_xdp_fill_header(&res->hdr, xd->route, sequence, PROPERTIES_RESPONSE,
   total_size);
-   res->generation = xdomai

[PATCH 15/18] thunderbolt: Add KUnit tests for XDomain properties

2021-03-04 Thread Mika Westerberg

This adds KUnit tests for parsing, formatting and copying of XDomain
properties.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/test.c | 252 +
 1 file changed, 252 insertions(+)

diff --git a/drivers/thunderbolt/test.c b/drivers/thunderbolt/test.c
index 464c2d37b992..4e1e7ae2d90d 100644
--- a/drivers/thunderbolt/test.c
+++ b/drivers/thunderbolt/test.c
@@ -1594,6 +1594,255 @@ static void tb_test_tunnel_port_on_path(struct kunit 
*test)
tb_tunnel_free(dp_tunnel);
 }
 
+static const u32 root_directory[] = {
+   0x55584401, /* "UXD" v1 */
+   0x0018, /* Root directory length */
+   0x76656e64, /* "vend" */
+   0x6f726964, /* "orid" */
+   0x7601, /* "v" R 1 */
+   0x0a27, /* Immediate value, ! Vendor ID */
+   0x76656e64, /* "vend" */
+   0x6f726964, /* "orid" */
+   0x7403, /* "t" R 3 */
+   0x001a, /* Text leaf offset, (“Apple Inc.”) */
+   0x64657669, /* "devi" */
+   0x63656964, /* "ceid" */
+   0x7601, /* "v" R 1 */
+   0x000a, /* Immediate value, ! Device ID */
+   0x64657669, /* "devi" */
+   0x63656964, /* "ceid" */
+   0x7403, /* "t" R 3 */
+   0x001d, /* Text leaf offset, (“Macintosh”) */
+   0x64657669, /* "devi" */
+   0x63657276, /* "cerv" */
+   0x7601, /* "v" R 1 */
+   0x8100, /* Immediate value, Device Revision */
+   0x6e657477, /* "netw" */
+   0x6f726b00, /* "ork" */
+   0x4414, /* "D" R 20 */
+   0x0021, /* Directory data offset, (Network Directory) */
+   0x4170706c, /* "Appl" */
+   0x6520496e, /* "e In" */
+   0x632e, /* "c." ! */
+   0x4d616369, /* "Maci" */
+   0x6e746f73, /* "ntos" */
+   0x6800, /* "h" */
+   0x, /* padding */
+   0xca8961c6, /* Directory UUID, Network Directory */
+   0x9541ce1c, /* Directory UUID, Network Directory */
+   0x5949b8bd, /* Directory UUID, Network Directory */
+   0x4f5a5f2e, /* Directory UUID, Network Directory */
+   0x70727463, /* "prtc" */
+   0x6964, /* "id" */
+   0x7601, /* "v" R 1 */
+   0x0001, /* Immediate value, Network Protocol ID */
+   0x70727463, /* "prtc" */
+   0x76657273, /* "vers" */
+   0x7601, /* "v" R 1 */
+   0x0001, /* Immediate value, Network Protocol Version */
+   0x70727463, /* "prtc" */
+   0x72657673, /* "revs" */
+   0x7601, /* "v" R 1 */
+   0x0001, /* Immediate value, Network Protocol Revision */
+   0x70727463, /* "prtc" */
+   0x73746e73, /* "stns" */
+   0x7601, /* "v" R 1 */
+   0x, /* Immediate value, Network Protocol Settings */
+};
+
+static const uuid_t network_dir_uuid =
+   UUID_INIT(0xc66189ca, 0x1cce, 0x4195,
+ 0xbd, 0xb8, 0x49, 0x59, 0x2e, 0x5f, 0x5a, 0x4f);
+
+static void tb_test_property_parse(struct kunit *test)
+{
+   struct tb_property_dir *dir, *network_dir;
+   struct tb_property *p;
+
+   dir = tb_property_parse_dir(root_directory, ARRAY_SIZE(root_directory));
+   KUNIT_ASSERT_TRUE(test, dir != NULL);
+
+   p = tb_property_find(dir, "foo", TB_PROPERTY_TYPE_TEXT);
+   KUNIT_ASSERT_TRUE(test, !p);
+
+   p = tb_property_find(dir, "vendorid", TB_PROPERTY_TYPE_TEXT);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+   KUNIT_EXPECT_STREQ(test, p->value.text, "Apple Inc.");
+
+   p = tb_property_find(dir, "vendorid", TB_PROPERTY_TYPE_VALUE);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+   KUNIT_EXPECT_EQ(test, p->value.immediate, (u32)0xa27);
+
+   p = tb_property_find(dir, "deviceid", TB_PROPERTY_TYPE_TEXT);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+   KUNIT_EXPECT_STREQ(test, p->value.text, "Macintosh");
+
+   p = tb_property_find(dir, "deviceid", TB_PROPERTY_TYPE_VALUE);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+   KUNIT_EXPECT_EQ(test, p->value.immediate, (u32)0xa);
+
+   p = tb_property_find(dir, "missing", TB_PROPERTY_TYPE_DIRECTORY);
+   KUNIT_ASSERT_TRUE(test, !p);
+
+   p = tb_property_find(dir, "network", TB_PROPERTY_TYPE_DIRECTORY);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+
+   network_dir = p->value.dir;
+   KUNIT_EXPECT_TRUE(test, uuid_equal(network_dir->uuid, 
&network_dir_uuid));
+
+   p = tb_property_find(network_dir, "prtcid", TB_PROPERTY_TYPE_VALUE);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+   KUNIT_EXPECT_EQ(test, p->value.immediate, (u32)0x1);
+
+   p = tb_property_find(network_dir, "prtcvers", TB_PROPERTY_TYPE_VALUE);
+   KUNIT_ASSERT_TRUE(test, p != NULL);
+   KUNIT_EXPECT_EQ(test, p->value.immediate, (u32)0x1);
+
+   p = tb_property_find(network_dir, "prtcrevs", TB_PROPERTY_TYPE_

[PATCH 11/18] thunderbolt: Use dedicated flow control for DMA tunnels

2021-03-04 Thread Mika Westerberg

The USB4 inter-domain service spec recommends using dedicated flow
control scheme so update the driver accordingly.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/tunnel.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
index 6557b6e07009..2e7ec037a73e 100644
--- a/drivers/thunderbolt/tunnel.c
+++ b/drivers/thunderbolt/tunnel.c
@@ -794,24 +794,14 @@ static u32 tb_dma_credits(struct tb_port *nhi)
return min(max_credits, 13U);
 }
 
-static int tb_dma_activate(struct tb_tunnel *tunnel, bool active)
-{
-   struct tb_port *nhi = tunnel->src_port;
-   u32 credits;
-
-   credits = active ? tb_dma_credits(nhi) : 0;
-   return tb_port_set_initial_credits(nhi, credits);
-}
-
-static void tb_dma_init_path(struct tb_path *path, unsigned int isb,
-unsigned int efc, u32 credits)
+static void tb_dma_init_path(struct tb_path *path, unsigned int efc, u32 
credits)
 {
int i;
 
path->egress_fc_enable = efc;
path->ingress_fc_enable = TB_PATH_ALL;
path->egress_shared_buffer = TB_PATH_NONE;
-   path->ingress_shared_buffer = isb;
+   path->ingress_shared_buffer = TB_PATH_NONE;
path->priority = 5;
path->weight = 1;
path->clear_fc = true;
@@ -856,7 +846,6 @@ struct tb_tunnel *tb_tunnel_alloc_dma(struct tb *tb, struct 
tb_port *nhi,
if (!tunnel)
return NULL;
 
-   tunnel->activate = tb_dma_activate;
tunnel->src_port = nhi;
tunnel->dst_port = dst;
 
@@ -869,8 +858,7 @@ struct tb_tunnel *tb_tunnel_alloc_dma(struct tb *tb, struct 
tb_port *nhi,
tb_tunnel_free(tunnel);
return NULL;
}
-   tb_dma_init_path(path, TB_PATH_NONE, TB_PATH_SOURCE | 
TB_PATH_INTERNAL,
-credits);
+   tb_dma_init_path(path, TB_PATH_SOURCE | TB_PATH_INTERNAL, 
credits);
tunnel->paths[i++] = path;
}
 
@@ -881,7 +869,7 @@ struct tb_tunnel *tb_tunnel_alloc_dma(struct tb *tb, struct 
tb_port *nhi,
tb_tunnel_free(tunnel);
return NULL;
}
-   tb_dma_init_path(path, TB_PATH_SOURCE, TB_PATH_ALL, credits);
+   tb_dma_init_path(path, TB_PATH_ALL, credits);
tunnel->paths[i++] = path;
}
 
-- 
2.30.1

[PATCH 08/18] thunderbolt: Align XDomain protocol timeouts with the spec

2021-03-04 Thread Mika Westerberg

The USB4 inter-domain service spec has slightly different recommended
timeouts for the XDomain protocol so align the driver with those.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/xdomain.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index cfe6fa7e84f4..ffa9cc9e0e7d 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -19,9 +19,9 @@
 
 #include "tb.h"
 
-#define XDOMAIN_DEFAULT_TIMEOUT5000 /* ms */
+#define XDOMAIN_DEFAULT_TIMEOUT1000 /* ms */
 #define XDOMAIN_UUID_RETRIES   10
-#define XDOMAIN_PROPERTIES_RETRIES 60
+#define XDOMAIN_PROPERTIES_RETRIES 10
 #define XDOMAIN_PROPERTIES_CHANGED_RETRIES 10
 #define XDOMAIN_BONDING_WAIT   100  /* ms */
 
-- 
2.30.1

[PATCH 09/18] thunderbolt: Add tb_property_copy_dir()

2021-03-04 Thread Mika Westerberg

This function takes a deep copy of the properties. We need this in order
to support more dynamic properties per XDomain connection as required by
the USB4 inter-domain service spec.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/property.c | 71 ++
 include/linux/thunderbolt.h|  1 +
 2 files changed, 72 insertions(+)

diff --git a/drivers/thunderbolt/property.c b/drivers/thunderbolt/property.c
index d5b0cdb8f0b1..dc555cda98e6 100644
--- a/drivers/thunderbolt/property.c
+++ b/drivers/thunderbolt/property.c
@@ -501,6 +501,77 @@ ssize_t tb_property_format_dir(const struct 
tb_property_dir *dir, u32 *block,
return ret < 0 ? ret : 0;
 }
 
+/**
+ * tb_property_copy_dir() - Take a deep copy of directory
+ * @dir: Directory to copy
+ *
+ * This function takes a deep copy of @dir and returns back the copy. In
+ * case of error returns %NULL. The resulting directory needs to be
+ * released by calling tb_property_free_dir().
+ */
+struct tb_property_dir *tb_property_copy_dir(const struct tb_property_dir *dir)
+{
+   struct tb_property *property, *p = NULL;
+   struct tb_property_dir *d;
+
+   if (!dir)
+   return NULL;
+
+   d = tb_property_create_dir(dir->uuid);
+   if (!d)
+   return NULL;
+
+   list_for_each_entry(property, &dir->properties, list) {
+   struct tb_property *p;
+
+   p = tb_property_alloc(property->key, property->type);
+   if (!p)
+   goto err_free;
+
+   p->length = property->length;
+
+   switch (property->type) {
+   case TB_PROPERTY_TYPE_DIRECTORY:
+   p->value.dir = 
tb_property_copy_dir(property->value.dir);
+   if (!p->value.dir)
+   goto err_free;
+   break;
+
+   case TB_PROPERTY_TYPE_DATA:
+   p->value.data = kmemdup(property->value.data,
+   property->length * 4,
+   GFP_KERNEL);
+   if (!p->value.data)
+   goto err_free;
+   break;
+
+   case TB_PROPERTY_TYPE_TEXT:
+   p->value.text = kzalloc(p->length * 4, GFP_KERNEL);
+   if (!p->value.text)
+   goto err_free;
+   strcpy(p->value.text, property->value.text);
+   break;
+
+   case TB_PROPERTY_TYPE_VALUE:
+   p->value.immediate = property->value.immediate;
+   break;
+
+   default:
+   break;
+   }
+
+   list_add_tail(&p->list, &d->properties);
+   }
+
+   return d;
+
+err_free:
+   kfree(p);
+   tb_property_free_dir(d);
+
+   return NULL;
+}
+
 /**
  * tb_property_add_immediate() - Add immediate property to directory
  * @parent: Directory to add the property
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 7ec977161f5c..003a9ad29168 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -146,6 +146,7 @@ struct tb_property_dir *tb_property_parse_dir(const u32 
*block,
  size_t block_len);
 ssize_t tb_property_format_dir(const struct tb_property_dir *dir, u32 *block,
   size_t block_len);
+struct tb_property_dir *tb_property_copy_dir(const struct tb_property_dir 
*dir);
 struct tb_property_dir *tb_property_create_dir(const uuid_t *uuid);
 void tb_property_free_dir(struct tb_property_dir *dir);
 int tb_property_add_immediate(struct tb_property_dir *parent, const char *key,
-- 
2.30.1

[PATCH 14/18] net: thunderbolt: Align the driver to the USB4 networking spec

2021-03-04 Thread Mika Westerberg

The USB4 networking spec (USB4NET) recommends different timeouts, and
also suggest that the driver sets the 64k frame support flag in the
properties block. Make the networking driver to honor this.

Signed-off-by: Mika Westerberg 
---
 drivers/net/thunderbolt.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/thunderbolt.c b/drivers/net/thunderbolt.c
index 5c9ec91b6e78..9a6a8353e192 100644
--- a/drivers/net/thunderbolt.c
+++ b/drivers/net/thunderbolt.c
@@ -25,12 +25,13 @@
 /* Protocol timeouts in ms */
 #define TBNET_LOGIN_DELAY  4500
 #define TBNET_LOGIN_TIMEOUT500
-#define TBNET_LOGOUT_TIMEOUT   100
+#define TBNET_LOGOUT_TIMEOUT   1000
 
 #define TBNET_RING_SIZE256
 #define TBNET_LOGIN_RETRIES60
-#define TBNET_LOGOUT_RETRIES   5
+#define TBNET_LOGOUT_RETRIES   10
 #define TBNET_MATCH_FRAGS_ID   BIT(1)
+#define TBNET_64K_FRAMES   BIT(2)
 #define TBNET_MAX_MTU  SZ_64K
 #define TBNET_FRAME_SIZE   SZ_4K
 #define TBNET_MAX_PAYLOAD_SIZE \
@@ -1367,7 +1368,7 @@ static int __init tbnet_init(void)
 * the moment.
 */
tb_property_add_immediate(tbnet_dir, "prtcstns",
- TBNET_MATCH_FRAGS_ID);
+ TBNET_MATCH_FRAGS_ID | TBNET_64K_FRAMES);
 
ret = tb_register_property_dir("network", tbnet_dir);
if (ret) {
-- 
2.30.1

[PATCH 02/18] thunderbolt: Do not pass timeout for tb_cfg_reset()

2021-03-04 Thread Mika Westerberg

There is only one user for this function and it passes the default
timeout to it anyway, so remove the parameter completely. This is also
needed in the subsequent patch where we allow connection manager
implementations to use different timeout for non-raw control channel
messages.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/ctl.c| 6 ++
 drivers/thunderbolt/ctl.h| 3 +--
 drivers/thunderbolt/switch.c | 2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index 875922133782..b79be1f02d92 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -802,14 +802,12 @@ static bool tb_cfg_copy(struct tb_cfg_request *req, const 
struct ctl_pkg *pkg)
  * tb_cfg_reset() - send a reset packet and wait for a response
  * @ctl: Control channel pointer
  * @route: Router string for the router to send reset
- * @timeout_msec: Timeout in ms how long to wait for the response
  *
  * If the switch at route is incorrectly configured then we will not receive a
  * reply (even though the switch will reset). The caller should check for
  * -ETIMEDOUT and attempt to reconfigure the switch.
  */
-struct tb_cfg_result tb_cfg_reset(struct tb_ctl *ctl, u64 route,
- int timeout_msec)
+struct tb_cfg_result tb_cfg_reset(struct tb_ctl *ctl, u64 route)
 {
struct cfg_reset_pkg request = { .header = tb_cfg_make_header(route) };
struct tb_cfg_result res = { 0 };
@@ -831,7 +829,7 @@ struct tb_cfg_result tb_cfg_reset(struct tb_ctl *ctl, u64 
route,
req->response_size = sizeof(reply);
req->response_type = TB_CFG_PKG_RESET;
 
-   res = tb_cfg_request_sync(ctl, req, timeout_msec);
+   res = tb_cfg_request_sync(ctl, req, TB_CFG_DEFAULT_TIMEOUT);
 
tb_cfg_request_put(req);
 
diff --git a/drivers/thunderbolt/ctl.h b/drivers/thunderbolt/ctl.h
index 97cb03b38953..2eafbfea5dff 100644
--- a/drivers/thunderbolt/ctl.h
+++ b/drivers/thunderbolt/ctl.h
@@ -124,8 +124,7 @@ static inline struct tb_cfg_header tb_cfg_make_header(u64 
route)
 }
 
 int tb_cfg_ack_plug(struct tb_ctl *ctl, u64 route, u32 port, bool unplug);
-struct tb_cfg_result tb_cfg_reset(struct tb_ctl *ctl, u64 route,
- int timeout_msec);
+struct tb_cfg_result tb_cfg_reset(struct tb_ctl *ctl, u64 route);
 struct tb_cfg_result tb_cfg_read_raw(struct tb_ctl *ctl, void *buffer,
 u64 route, u32 port,
 enum tb_cfg_space space, u32 offset,
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index 2a95b4ce06c0..218869c6ee21 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -1331,7 +1331,7 @@ int tb_switch_reset(struct tb_switch *sw)
  TB_CFG_SWITCH, 2, 2);
if (res.err)
return res.err;
-   res = tb_cfg_reset(sw->tb->ctl, tb_route(sw), TB_CFG_DEFAULT_TIMEOUT);
+   res = tb_cfg_reset(sw->tb->ctl, tb_route(sw));
if (res.err > 0)
return -EIO;
return res.err;
-- 
2.30.1

[PATCH 16/18] thunderbolt: Add KUnit tests for DMA tunnels

2021-03-04 Thread Mika Westerberg

Add a couple of tests to check DMA tunneling functionality.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/test.c | 240 +
 1 file changed, 240 insertions(+)

diff --git a/drivers/thunderbolt/test.c b/drivers/thunderbolt/test.c
index 4e1e7ae2d90d..5ff5a03bc9ce 100644
--- a/drivers/thunderbolt/test.c
+++ b/drivers/thunderbolt/test.c
@@ -119,6 +119,7 @@ static struct tb_switch *alloc_host(struct kunit *test)
sw->ports[7].config.type = TB_TYPE_NHI;
sw->ports[7].config.max_in_hop_id = 11;
sw->ports[7].config.max_out_hop_id = 11;
+   sw->ports[7].config.nfc_credits = 0x4180;
 
sw->ports[8].config.type = TB_TYPE_PCIE_DOWN;
sw->ports[8].config.max_in_hop_id = 8;
@@ -1594,6 +1595,240 @@ static void tb_test_tunnel_port_on_path(struct kunit 
*test)
tb_tunnel_free(dp_tunnel);
 }
 
+static void tb_test_tunnel_dma(struct kunit *test)
+{
+   struct tb_port *nhi, *port;
+   struct tb_tunnel *tunnel;
+   struct tb_switch *host;
+
+   /*
+* Create DMA tunnel from NHI to port 1 and back.
+*
+*   [Host 1]
+*1 ^ In HopID 1 -> Out HopID 8
+*  |
+*  v In HopID 8 -> Out HopID 1
+*  Domain border
+*  |
+*   [Host 2]
+*/
+   host = alloc_host(test);
+   nhi = &host->ports[7];
+   port = &host->ports[1];
+
+   tunnel = tb_tunnel_alloc_dma(NULL, nhi, port, 8, 1, 8, 1);
+   KUNIT_ASSERT_TRUE(test, tunnel != NULL);
+   KUNIT_EXPECT_EQ(test, tunnel->type, (enum tb_tunnel_type)TB_TUNNEL_DMA);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->src_port, nhi);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->dst_port, port);
+   KUNIT_ASSERT_EQ(test, tunnel->npaths, (size_t)2);
+   /* RX path */
+   KUNIT_ASSERT_EQ(test, tunnel->paths[0]->path_length, 1);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[0]->hops[0].in_port, port);
+   KUNIT_EXPECT_EQ(test, tunnel->paths[0]->hops[0].in_hop_index, 8);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[0]->hops[0].out_port, nhi);
+   KUNIT_EXPECT_EQ(test, tunnel->paths[0]->hops[0].next_hop_index, 1);
+   /* TX path */
+   KUNIT_ASSERT_EQ(test, tunnel->paths[1]->path_length, 1);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[1]->hops[0].in_port, nhi);
+   KUNIT_EXPECT_EQ(test, tunnel->paths[1]->hops[0].in_hop_index, 1);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[1]->hops[0].out_port, port);
+   KUNIT_EXPECT_EQ(test, tunnel->paths[1]->hops[0].next_hop_index, 8);
+
+   tb_tunnel_free(tunnel);
+}
+
+static void tb_test_tunnel_dma_rx(struct kunit *test)
+{
+   struct tb_port *nhi, *port;
+   struct tb_tunnel *tunnel;
+   struct tb_switch *host;
+
+   /*
+* Create DMA RX tunnel from port 1 to NHI.
+*
+*   [Host 1]
+*1 ^
+*  |
+*  | In HopID 15 -> Out HopID 2
+*  Domain border
+*  |
+*   [Host 2]
+*/
+   host = alloc_host(test);
+   nhi = &host->ports[7];
+   port = &host->ports[1];
+
+   tunnel = tb_tunnel_alloc_dma(NULL, nhi, port, -1, -1, 15, 2);
+   KUNIT_ASSERT_TRUE(test, tunnel != NULL);
+   KUNIT_EXPECT_EQ(test, tunnel->type, (enum tb_tunnel_type)TB_TUNNEL_DMA);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->src_port, nhi);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->dst_port, port);
+   KUNIT_ASSERT_EQ(test, tunnel->npaths, (size_t)1);
+   /* RX path */
+   KUNIT_ASSERT_EQ(test, tunnel->paths[0]->path_length, 1);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[0]->hops[0].in_port, port);
+   KUNIT_EXPECT_EQ(test, tunnel->paths[0]->hops[0].in_hop_index, 15);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[0]->hops[0].out_port, nhi);
+   KUNIT_EXPECT_EQ(test, tunnel->paths[0]->hops[0].next_hop_index, 2);
+
+   tb_tunnel_free(tunnel);
+}
+
+static void tb_test_tunnel_dma_tx(struct kunit *test)
+{
+   struct tb_port *nhi, *port;
+   struct tb_tunnel *tunnel;
+   struct tb_switch *host;
+
+   /*
+* Create DMA TX tunnel from NHI to port 1.
+*
+*   [Host 1]
+*1 | In HopID 2 -> Out HopID 15
+*  |
+*  v
+*  Domain border
+*  |
+*   [Host 2]
+*/
+   host = alloc_host(test);
+   nhi = &host->ports[7];
+   port = &host->ports[1];
+
+   tunnel = tb_tunnel_alloc_dma(NULL, nhi, port, 15, 2, -1, -1);
+   KUNIT_ASSERT_TRUE(test, tunnel != NULL);
+   KUNIT_EXPECT_EQ(test, tunnel->type, (enum tb_tunnel_type)TB_TUNNEL_DMA);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->src_port, nhi);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->dst_port, port);
+   KUNIT_ASSERT_EQ(test, tunnel->npaths, (size_t)1);
+   /* TX path */
+   KUNIT_ASSERT_EQ(test, tunnel->paths[0]->path_length, 1);
+   KUNIT_EXPECT_PTR_EQ(test, tunnel->paths[0]->

[PATCH 17/18] thunderbolt: Check quirks in tb_switch_add()

2021-03-04 Thread Mika Westerberg

This makes it more visible on the main path of adding router.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/eeprom.c | 1 -
 drivers/thunderbolt/switch.c | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c
index dd03d3096653..aecb0b9f0c75 100644
--- a/drivers/thunderbolt/eeprom.c
+++ b/drivers/thunderbolt/eeprom.c
@@ -610,7 +610,6 @@ int tb_drom_read(struct tb_switch *sw)
sw->uid = header->uid;
sw->vendor = header->vendor_id;
sw->device = header->model_id;
-   tb_check_quirks(sw);
 
crc = tb_crc32(sw->drom + TB_DROM_DATA_START, header->data_len);
if (crc != header->data_crc32) {
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index 7ac37a1f95e1..72b43c7c0651 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -2520,6 +2520,8 @@ int tb_switch_add(struct tb_switch *sw)
}
tb_sw_dbg(sw, "uid: %#llx\n", sw->uid);
 
+   tb_check_quirks(sw);
+
ret = tb_switch_set_uuid(sw);
if (ret) {
dev_err(&sw->dev, "failed to set UUID\n");
-- 
2.30.1

[PATCH 12/18] thunderbolt: Drop unused tb_port_set_initial_credits()

2021-03-04 Thread Mika Westerberg

This function is not used anymore in the driver so we can remove it.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/switch.c | 22 --
 drivers/thunderbolt/tb.h |  1 -
 2 files changed, 23 deletions(-)

diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index 218869c6ee21..7ac37a1f95e1 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -626,28 +626,6 @@ int tb_port_add_nfc_credits(struct tb_port *port, int 
credits)
 TB_CFG_PORT, ADP_CS_4, 1);
 }
 
-/**
- * tb_port_set_initial_credits() - Set initial port link credits allocated
- * @port: Port to set the initial credits
- * @credits: Number of credits to to allocate
- *
- * Set initial credits value to be used for ingress shared buffering.
- */
-int tb_port_set_initial_credits(struct tb_port *port, u32 credits)
-{
-   u32 data;
-   int ret;
-
-   ret = tb_port_read(port, &data, TB_CFG_PORT, ADP_CS_5, 1);
-   if (ret)
-   return ret;
-
-   data &= ~ADP_CS_5_LCA_MASK;
-   data |= (credits << ADP_CS_5_LCA_SHIFT) & ADP_CS_5_LCA_MASK;
-
-   return tb_port_write(port, &data, TB_CFG_PORT, ADP_CS_5, 1);
-}
-
 /**
  * tb_port_clear_counter() - clear a counter in TB_CFG_COUNTER
  * @port: Port whose counters to clear
diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index d6ad45686488..ec8cdbc761fa 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -860,7 +860,6 @@ static inline bool tb_switch_tmu_is_enabled(const struct 
tb_switch *sw)
 
 int tb_wait_for_port(struct tb_port *port, bool wait_if_unplugged);
 int tb_port_add_nfc_credits(struct tb_port *port, int credits);
-int tb_port_set_initial_credits(struct tb_port *port, u32 credits);
 int tb_port_clear_counter(struct tb_port *port, int counter);
 int tb_port_unlock(struct tb_port *port);
 int tb_port_enable(struct tb_port *port);
-- 
2.30.1

[PATCH 18/18] thunderbolt: Add support for USB4 DROM

2021-03-04 Thread Mika Westerberg

USB4 router DROM differs sligthly from Thunderbolt 1-3 DROM. For
instance it does not include UID and CRC8 in the header section, and it
has product descriptor genereric entry to describe the product IDs and
related information. If the "Version" field in the DROM header section
reads 3 it means the router only has USB4 DROM and if it reads 1 it
means the router supports TBT3 compatible DROM.

For this reason, update the DROM parsing code to support "pure" USB4
DROMs too.

While there drop the extra empty line at the end of tb_drom_read().

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/eeprom.c | 104 +++
 1 file changed, 80 insertions(+), 24 deletions(-)

diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c
index aecb0b9f0c75..46d0906a3070 100644
--- a/drivers/thunderbolt/eeprom.c
+++ b/drivers/thunderbolt/eeprom.c
@@ -277,6 +277,16 @@ struct tb_drom_entry_port {
u8 unknown4:2;
 } __packed;
 
+/* USB4 product descriptor */
+struct tb_drom_entry_desc {
+   struct tb_drom_entry_header header;
+   u16 bcdUSBSpec;
+   u16 idVendor;
+   u16 idProduct;
+   u16 bcdProductFWRevision;
+   u32 TID;
+   u8 productHWRevision;
+};
 
 /**
  * tb_drom_read_uid_only() - Read UID directly from DROM
@@ -329,6 +339,16 @@ static int tb_drom_parse_entry_generic(struct tb_switch 
*sw,
if (!sw->device_name)
return -ENOMEM;
break;
+   case 9: {
+   const struct tb_drom_entry_desc *desc =
+   (const struct tb_drom_entry_desc *)entry;
+
+   if (!sw->vendor && !sw->device) {
+   sw->vendor = desc->idVendor;
+   sw->device = desc->idProduct;
+   }
+   break;
+   }
}
 
return 0;
@@ -521,6 +541,51 @@ static int tb_drom_read_n(struct tb_switch *sw, u16 
offset, u8 *val,
return tb_eeprom_read_n(sw, offset, val, count);
 }
 
+static int tb_drom_parse(struct tb_switch *sw)
+{
+   const struct tb_drom_header *header =
+   (const struct tb_drom_header *)sw->drom;
+   u32 crc;
+
+   crc = tb_crc8((u8 *) &header->uid, 8);
+   if (crc != header->uid_crc8) {
+   tb_sw_warn(sw,
+   "DROM UID CRC8 mismatch (expected: %#x, got: %#x), 
aborting\n",
+   header->uid_crc8, crc);
+   return -EINVAL;
+   }
+   if (!sw->uid)
+   sw->uid = header->uid;
+   sw->vendor = header->vendor_id;
+   sw->device = header->model_id;
+
+   crc = tb_crc32(sw->drom + TB_DROM_DATA_START, header->data_len);
+   if (crc != header->data_crc32) {
+   tb_sw_warn(sw,
+   "DROM data CRC32 mismatch (expected: %#x, got: %#x), 
continuing\n",
+   header->data_crc32, crc);
+   }
+
+   return tb_drom_parse_entries(sw);
+}
+
+static int usb4_drom_parse(struct tb_switch *sw)
+{
+   const struct tb_drom_header *header =
+   (const struct tb_drom_header *)sw->drom;
+   u32 crc;
+
+   crc = tb_crc32(sw->drom + TB_DROM_DATA_START, header->data_len);
+   if (crc != header->data_crc32) {
+   tb_sw_warn(sw,
+  "DROM data CRC32 mismatch (expected: %#x, got: %#x), 
aborting\n",
+  header->data_crc32, crc);
+   return -EINVAL;
+   }
+
+   return tb_drom_parse_entries(sw);
+}
+
 /**
  * tb_drom_read() - Copy DROM to sw->drom and parse it
  * @sw: Router whose DROM to read and parse
@@ -534,7 +599,6 @@ static int tb_drom_read_n(struct tb_switch *sw, u16 offset, 
u8 *val,
 int tb_drom_read(struct tb_switch *sw)
 {
u16 size;
-   u32 crc;
struct tb_drom_header *header;
int res, retries = 1;
 
@@ -599,30 +663,21 @@ int tb_drom_read(struct tb_switch *sw)
goto err;
}
 
-   crc = tb_crc8((u8 *) &header->uid, 8);
-   if (crc != header->uid_crc8) {
-   tb_sw_warn(sw,
-   "drom uid crc8 mismatch (expected: %#x, got: %#x), 
aborting\n",
-   header->uid_crc8, crc);
-   goto err;
-   }
-   if (!sw->uid)
-   sw->uid = header->uid;
-   sw->vendor = header->vendor_id;
-   sw->device = header->model_id;
+   tb_sw_dbg(sw, "DROM version: %d\n", header->device_rom_revision);
 
-   crc = tb_crc32(sw->drom + TB_DROM_DATA_START, header->data_len);
-   if (crc != header->data_crc32) {
-   tb_sw_warn(sw,
-   "drom data crc32 mismatch (expected: %#x, got: %#x), 
continuing\n",
-   header->data_crc32, crc);
+   switch (header->device_rom_revision) {
+   case 3:
+   res = usb4_drom_parse(sw);
+   break;
+   default:
+   tb_sw_warn(sw, "DROM device_rom_revision %#x unknown\n",
+

[PATCH 06/18] thunderbolt: Do not re-establish XDomain DMA paths automatically

2021-03-04 Thread Mika Westerberg

This step is actually not needed. The service drivers themselves will
handle this once they have negotiated the service up and running again
with the remote side. Also dropping this makes it easier to add support
for multiple DMA tunnels over a single XDomain connection.

Signed-off-by: Mika Westerberg 
---
 drivers/thunderbolt/xdomain.c | 35 ++-
 include/linux/thunderbolt.h   |  2 --
 2 files changed, 2 insertions(+), 35 deletions(-)

diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index 584bb5ec06f8..a1657663a95e 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -946,19 +946,6 @@ static int populate_properties(struct tb_xdomain *xd,
return 0;
 }
 
-/* Called with @xd->lock held */
-static void tb_xdomain_restore_paths(struct tb_xdomain *xd)
-{
-   if (!xd->resume)
-   return;
-
-   xd->resume = false;
-   if (xd->transmit_path) {
-   dev_dbg(&xd->dev, "re-establishing DMA path\n");
-   tb_domain_approve_xdomain_paths(xd->tb, xd);
-   }
-}
-
 static inline struct tb_switch *tb_xdomain_parent(struct tb_xdomain *xd)
 {
return tb_to_switch(xd->dev.parent);
@@ -1084,16 +1071,8 @@ static void tb_xdomain_get_properties(struct work_struct 
*work)
mutex_lock(&xd->lock);
 
/* Only accept newer generation properties */
-   if (xd->properties && gen <= xd->property_block_gen) {
-   /*
-* On resume it is likely that the properties block is
-* not changed (unless the other end added or removed
-* services). However, we need to make sure the existing
-* DMA paths are restored properly.
-*/
-   tb_xdomain_restore_paths(xd);
+   if (xd->properties && gen <= xd->property_block_gen)
goto err_free_block;
-   }
 
dir = tb_property_parse_dir(block, ret);
if (!dir) {
@@ -1118,8 +1097,6 @@ static void tb_xdomain_get_properties(struct work_struct 
*work)
 
tb_xdomain_update_link_attributes(xd);
 
-   tb_xdomain_restore_paths(xd);
-
mutex_unlock(&xd->lock);
 
kfree(block);
@@ -1332,15 +1309,7 @@ static int __maybe_unused tb_xdomain_suspend(struct 
device *dev)
 
 static int __maybe_unused tb_xdomain_resume(struct device *dev)
 {
-   struct tb_xdomain *xd = tb_to_xdomain(dev);
-
-   /*
-* Ask tb_xdomain_get_properties() restore any existing DMA
-* paths after properties are re-read.
-*/
-   xd->resume = true;
-   start_handshake(xd);
-
+   start_handshake(tb_to_xdomain(dev));
return 0;
 }
 
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 659a0a810fa1..7ec977161f5c 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -185,7 +185,6 @@ void tb_unregister_property_dir(const char *key, struct 
tb_property_dir *dir);
  * @link_speed: Speed of the link in Gb/s
  * @link_width: Width of the link (1 or 2)
  * @is_unplugged: The XDomain is unplugged
- * @resume: The XDomain is being resumed
  * @needs_uuid: If the XDomain does not have @remote_uuid it will be
  * queried first
  * @transmit_path: HopID which the remote end expects us to transmit
@@ -231,7 +230,6 @@ struct tb_xdomain {
unsigned int link_speed;
unsigned int link_width;
bool is_unplugged;
-   bool resume;
bool needs_uuid;
u16 transmit_path;
u16 transmit_ring;
-- 
2.30.1

Re: [regression] Kernel panic on resume from sleep

2021-03-04 Thread Zbynek Michl

Looks good so far, but need to wait some more time as the issue was irregular.

Do you have any explanation why the calls disorder caused the panic
just occasionally?

Also, the same (wrong) order I can see in the 3.16 kernel code, but it
has worked fine with this kernel in all cases. So what is different in
5.10?

Thanks
Zbynek

On Wed, Mar 3, 2021 at 2:44 AM Jakub Kicinski  wrote:
>
> On Mon, 1 Mar 2021 23:11:05 +0100 Zbynek Michl wrote:
> > Hello,
> >
> > Can anybody help me with the following kernel issue?
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
> >
> > Do I understand it correctly that the kernel crashes due to the bug in
> > the alx driver?
>
> Order of calls on resume looks suspicious, can you give this a try?
>
> diff --git a/drivers/net/ethernet/atheros/alx/main.c 
> b/drivers/net/ethernet/atheros/alx/main.c
> index 9b7f1af5f574..9e02f8864593 100644
> --- a/drivers/net/ethernet/atheros/alx/main.c
> +++ b/drivers/net/ethernet/atheros/alx/main.c
> @@ -1894,13 +1894,16 @@ static int alx_resume(struct device *dev)
>
> if (!netif_running(alx->dev))
> return 0;
> -   netif_device_attach(alx->dev);
>
> rtnl_lock();
> err = __alx_open(alx, true);
> rtnl_unlock();
> +   if (err)
> +   return err;
>
> -   return err;
> +   netif_device_attach(alx->dev);
> +
> +   return 0;
>  }
>
>  static SIMPLE_DEV_PM_OPS(alx_pm_ops, alx_suspend, alx_resume);

stmmac driver timeout issue

2021-03-04 Thread Joakim Zhang



Hello Andrew, Hello Jakub,

You may can give some suggestions based on your great networking knowledge, 
thanks in advance!

I found that add vlan id hw filter (stmmac_vlan_rx_add_vid) have possibility 
timeout when accessing VLAN Filter registers during ifup/ifdown stress test, 
and restore vlan id hw filter (stmmac_restore_hw_vlan_rx_fltr) always timeout 
when access VLAN Filter registers. 

My hardware is i.MX8MP (drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c, RGMII 
interface, RTL8211FDI-CG PHY), it needs fix mac speed(imx_dwmac_fix_speed), it 
indirectly involved in phylink_link_up. After debugging, if phylink_link_up is 
called later than adding vlan id hw filter, it will report timeout, so I guess 
we need fix mac speed before accessing VLAN Filter registers. Error like below:
[  106.389879] 8021q: adding VLAN 0 to HW filter on device eth1
[  106.395644] imx-dwmac 30bf.ethernet eth1: Timeout accessing 
MAC_VLAN_Tag_Filter
[  108.160734] imx-dwmac 30bf.ethernet eth1: Link is Up - 
100Mbps/Full - flow control rx/tx   ->->-> which means accessing VLAN Filter 
registers before phylink_link_up is called.

Same case when system resume back, 
[ 1763.842294] imx-dwmac 30bf.ethernet eth1: configuring for 
phy/rgmii-id link mode
[ 1763.853084] imx-dwmac 30bf.ethernet eth1: No Safety Features 
support found
[ 1763.853186] imx-dwmac 30bf.ethernet eth1: Timeout accessing 
MAC_VLAN_Tag_Filter
[ 1763.873465] usb usb1: root hub lost power or was reset
[ 1763.873469] usb usb2: root hub lost power or was reset
[ 1764.090321] PM: resume devices took 0.248 seconds
[ 1764.257381] OOM killer enabled.
[ 1764.260518] Restarting tasks ... done.
[ 1764.265229] PM: suspend exit
===
suspend 12 times
===
[ 1765.887915] imx-dwmac 30bf.ethernet eth1: Link is Up - 
100Mbps/Full - flow control rx/tx  ->->-> which means accessing VLAN Filter 
registers before phylink_link_up is called.

My question is that some MAC controllers need RXC clock from RGMII interface to 
reset DAM or access to some registers. If there is any way to ensure 
phylink_link_up is invoked synchronously when we need it. I am not sure this 
timeout caused by a fix mac speed is needed before accessing VLAN Filter 
registers, is there ang hints, thanks a lot! We have another board i.MX8DXL 
which don't need fix mac speed attach to AR8031 PHY, can't reproduce this issue.

Best Regards,
Joakim Zhang

[PATCH 1/1] net: usb: qmi_wwan: allow qmimux add/del with master up

2021-03-04 Thread Daniele Palmas

There's no reason for preventing the creation and removal
of qmimux network interfaces when the underlying interface
is up.

This makes qmi_wwan mux implementation more similar to the
rmnet one, simplifying userspace management of the same
logical interfaces.

Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Reported-by: Aleksander Morgado 
Signed-off-by: Daniele Palmas 
---
 drivers/net/usb/qmi_wwan.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 17a050521b86..6700f1970b24 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -429,13 +429,6 @@ static ssize_t add_mux_store(struct device *d,  struct 
device_attribute *attr, c
goto err;
}
 
-   /* we don't want to modify a running netdev */
-   if (netif_running(dev->net)) {
-   netdev_err(dev->net, "Cannot change a running device\n");
-   ret = -EBUSY;
-   goto err;
-   }
-
ret = qmimux_register_device(dev->net, mux_id);
if (!ret) {
info->flags |= QMI_WWAN_FLAG_MUX;
@@ -465,13 +458,6 @@ static ssize_t del_mux_store(struct device *d,  struct 
device_attribute *attr, c
if (!rtnl_trylock())
return restart_syscall();
 
-   /* we don't want to modify a running netdev */
-   if (netif_running(dev->net)) {
-   netdev_err(dev->net, "Cannot change a running device\n");
-   ret = -EBUSY;
-   goto err;
-   }
-
del_dev = qmimux_find_dev(dev, mux_id);
if (!del_dev) {
netdev_err(dev->net, "mux_id not present\n");
-- 
2.17.1

Re: [PATCH net 1/2] net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10

2021-03-04 Thread Andrew Lunn

On Thu, Mar 04, 2021 at 12:56:53PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
> read before resetting the switch so it can be reprogrammed afterwards.
> This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
> because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
> therefore that last branch is dead code.
> 
> Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
> remove the check for SPEED_10, let it fall into the default case.
> 
> Fixes: ffe10e679cec ("net: dsa: sja1105: Add support for the SGMII port")
> Signed-off-by: Vladimir Oltean 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net 2/2] net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled

2021-03-04 Thread Andrew Lunn

On Thu, Mar 04, 2021 at 12:56:54PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> In the blamed patch I managed to introduce a bug while moving code
> around: the same logic is applied to the ucast_egress_floods and
> bcast_egress_floods variables both on the "if" and the "else" branches.

Some static analysers will report this.
 
> This is clearly an unintended change compared to how the code used to be
> prior to that bugfix, so restore it.
> 
> Fixes: 7f7ccdea8c73 ("net: dsa: sja1105: fix leakage of flooded frames 
> outside bridging domain")
> Signed-off-by: Vladimir Oltean 

Reviewed-by: Andrew Lunn 

Andrew

Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-04 Thread Cornelia Huck

On Thu, 4 Mar 2021 16:24:16 +0800
Jason Wang  wrote:

> On 2021/3/3 4:29 下午, Cornelia Huck wrote:
> > On Wed, 3 Mar 2021 12:01:01 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/3/2 8:08 下午, Cornelia Huck wrote:  
> >>> On Mon, 1 Mar 2021 11:51:08 +0800
> >>> Jason Wang  wrote:
> >>> 
>  On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:  
> > On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:  
> >> On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:  
> >>> Confused. What is wrong with the above? It never reads the
> >>> field unless the feature has been offered by device.  
> >> So the spec said:
> >>
> >> "
> >>
> >> The following driver-read-only field, max_virtqueue_pairs only exists 
> >> if
> >> VIRTIO_NET_F_MQ is set.
> >>
> >> "
> >>
> >> If I read this correctly, there will be no max_virtqueue_pairs field 
> >> if the
> >> VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() 
> >> violates
> >> what spec said.
> >>
> >> Thanks  
> > I think that's a misunderstanding. This text was never intended to
> > imply that field offsets change beased on feature bits.
> > We had this pain with legacy and we never wanted to go back there.
> >
> > This merely implies that without VIRTIO_NET_F_MQ the field
> > should not be accessed. Exists in the sense "is accessible to driver".
> >
> > Let's just clarify that in the spec, job done.  
>  Ok, agree. That will make things more eaiser.  
> >>> Yes, that makes much more sense.
> >>>
> >>> What about adding the following to the "Basic Facilities of a Virtio
> >>> Device/Device Configuration Space" section of the spec:
> >>>
> >>> "If an optional configuration field does not exist, the corresponding
> >>> space is still present, but reserved."  
> >>
> >> This became interesting after re-reading some of the qemu codes.
> >>
> >> E.g in virtio-net.c we had:
> >>
> >> *static VirtIOFeature feature_sizes[] = {
> >>       {.flags = 1ULL << VIRTIO_NET_F_MAC,
> >>    .end = endof(struct virtio_net_config, mac)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_STATUS,
> >>    .end = endof(struct virtio_net_config, status)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_MQ,
> >>    .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_MTU,
> >>    .end = endof(struct virtio_net_config, mtu)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
> >>    .end = endof(struct virtio_net_config, duplex)},
> >>       {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL <<
> >> VIRTIO_NET_F_HASH_REPORT),
> >>    .end = endof(struct virtio_net_config, supported_hash_types)},
> >>       {}
> >> };*
> >>
> >> *It has a implict dependency chain. E.g MTU doesn't presnet if
> >> DUPLEX/RSS is not offered ...
> >> *  
> > But I think it covers everything up to the relevant field, no? So MTU
> > is included if we have the feature bit, even if we don't have
> > DUPLEX/RSS.
> >
> > Given that a config space may be shorter (but must not collapse
> > non-existing fields), maybe a better wording would be:
> >
> > "If an optional configuration field does not exist, the corresponding
> > space will still be present if it is not at the end of the
> > configuration space (i.e., further configuration fields exist.)  
> 
> 
> This should work but I think we need to define the end of configuration 
> space first?

What about sidestepping this:

"...the corresponding space will still be present, unless no further
configuration fields exist."

?

> 
> > This
> > implies that a given field, if it exists, is always at the same offset
> > from the beginning of the configuration space."

Re: [PATCH/v5] bpf: add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_ENCAP_L2_ETH

2021-03-04 Thread Willem de Bruijn

On Thu, Mar 4, 2021 at 1:41 AM Xuesen Huang  wrote:
>
> From: Xuesen Huang 
>
> bpf_skb_adjust_room sets the inner_protocol as skb->protocol for packets
> encapsulation. But that is not appropriate when pushing Ethernet header.
>
> Add an option to further specify encap L2 type and set the inner_protocol
> as ETH_P_TEB.
>
> Suggested-by: Willem de Bruijn 
> Signed-off-by: Xuesen Huang 
> Signed-off-by: Zhiyong Cheng 
> Signed-off-by: Li Wang 

Acked-by: Willem de Bruijn

Re: [PATCH] selftests_bpf: extend test_tc_tunnel test with vxlan

2021-03-04 Thread Willem de Bruijn

On Thu, Mar 4, 2021 at 1:42 AM Xuesen Huang  wrote:
>
> From: Xuesen Huang 
>
> Add BPF_F_ADJ_ROOM_ENCAP_L2_ETH flag to the existing tests which
> encapsulates the ethernet as the inner l2 header.
>
> Update a vxlan encapsulation test case.
>
> Signed-off-by: Xuesen Huang 
> Signed-off-by: Li Wang 
> Signed-off-by: Willem de Bruijn 

Please mark patch target: [PATCH bpf-next]

> ---
>  tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 113 
> ++---
>  tools/testing/selftests/bpf/test_tc_tunnel.sh  |  15 ++-
>  2 files changed, 111 insertions(+), 17 deletions(-)


> -static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 
> encap_proto,
> - __u16 l2_proto)
> +static __always_inline int __encap_ipv4(struct __sk_buff *skb, __u8 
> encap_proto,
> +   __u16 l2_proto, __u16 ext_proto)
>  {
> __u16 udp_dst = UDP_PORT;
> struct iphdr iph_inner;
> struct v4hdr h_outer;
> struct tcphdr tcph;
> int olen, l2_len;
> +   __u8 *l2_hdr = NULL;
> int tcp_off;
> __u64 flags;
>
> @@ -141,7 +157,11 @@ static __always_inline int encap_ipv4(struct __sk_buff 
> *skb, __u8 encap_proto,
> break;
> case ETH_P_TEB:
> l2_len = ETH_HLEN;
> -   udp_dst = ETH_OVER_UDP_PORT;
> +   if (ext_proto & EXTPROTO_VXLAN) {
> +   udp_dst = VXLAN_UDP_PORT;
> +   l2_len += sizeof(struct vxlanhdr);
> +   } else
> +   udp_dst = ETH_OVER_UDP_PORT;
> break;
> }
> flags |= BPF_F_ADJ_ROOM_ENCAP_L2(l2_len);
> @@ -171,14 +191,26 @@ static __always_inline int encap_ipv4(struct __sk_buff 
> *skb, __u8 encap_proto,
> }
>
> /* add L2 encap (if specified) */
> +   l2_hdr = (__u8 *)&h_outer + olen;
> switch (l2_proto) {
> case ETH_P_MPLS_UC:
> -   *((__u32 *)((__u8 *)&h_outer + olen)) = mpls_label;
> +   *(__u32 *)l2_hdr = mpls_label;
> break;
> case ETH_P_TEB:
> -   if (bpf_skb_load_bytes(skb, 0, (__u8 *)&h_outer + olen,
> -  ETH_HLEN))
> +   flags |= BPF_F_ADJ_ROOM_ENCAP_L2_ETH;
> +
> +   if (ext_proto & EXTPROTO_VXLAN) {
> +   struct vxlanhdr *vxlan_hdr = (struct vxlanhdr 
> *)l2_hdr;
> +
> +   vxlan_hdr->vx_flags = VXLAN_FLAGS;
> +   vxlan_hdr->vx_vni = bpf_htonl((VXLAN_VNI & 
> VXLAN_VNI_MASK) << 8);
> +
> +   l2_hdr += sizeof(struct vxlanhdr);

should this be l2_len? (here and ipv6 below)

> +SEC("encap_vxlan_eth")
> +int __encap_vxlan_eth(struct __sk_buff *skb)
> +{
> +   if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
> +   return __encap_ipv4(skb, IPPROTO_UDP,
> +   ETH_P_TEB,
> +   EXTPROTO_VXLAN);

non-standard indentation: align with the opening parenthesis. (here
and ipv6 below)

[PATCH] net: mellanox: mlx5: fix error return code in mlx5_fpga_device_start()

2021-03-04 Thread Jia-Ju Bai

When mlx5_is_fpga_lookaside() returns a non-zero value, no error 
return code is assigned.
To fix this bug, err is assigned with -EINVAL as error return code.

Reported-by: TOTE Robot 
Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c
index 2ce4241459ce..c9e6da97126f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c
@@ -198,8 +198,10 @@ int mlx5_fpga_device_start(struct mlx5_core_dev *mdev)
mlx5_fpga_info(fdev, "FPGA card %s:%u\n", mlx5_fpga_name(fpga_id), 
fpga_id);
 
/* No QPs if FPGA does not participate in net processing */
-   if (mlx5_is_fpga_lookaside(fpga_id))
+   if (mlx5_is_fpga_lookaside(fpga_id)) {
+   err = -EINVAL;
goto out;
+   }
 
mlx5_fpga_info(fdev, "%s(%d): image, version %u; SBU %06x:%04x version 
%d\n",
   mlx5_fpga_image_name(fdev->last_oper_image),
-- 
2.17.1

Re: [PATCH 10/11] pragma once: delete few backslashes

2021-03-04 Thread Edward Cree

On 28/02/2021 17:05, Alexey Dobriyan wrote:
> From 251ca5673886b5bb0a42004944290b9d2b267a4a Mon Sep 17 00:00:00 2001
> From: Alexey Dobriyan 
> Date: Fri, 19 Feb 2021 13:37:24 +0300
> Subject: [PATCH 10/11] pragma once: delete few backslashes
> 
> Some macros contain one backslash too many and end up being the last
> macro in a header file. When #pragma once conversion script truncates
> the last #endif and whitespace before it, such backslash triggers
> a warning about "OMG file ends up in a backslash-newline".
> 
> Needless to say I don't want to handle another case in my script,
> so delete useless backslashes instead.
> 
> Signed-off-by: Alexey Dobriyan 
> ---
>  arch/arc/include/asm/cacheflush.h  | 2 +-
>  drivers/net/ethernet/mellanox/mlxsw/item.h | 2 +-
>  include/linux/once.h   | 2 +-
>  include/media/drv-intf/exynos-fimc.h   | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arc/include/asm/cacheflush.h 
> b/arch/arc/include/asm/cacheflush.h
> index e201b4b1655a..46704c341b17 100644
> --- a/arch/arc/include/asm/cacheflush.h
> +++ b/arch/arc/include/asm/cacheflush.h
> @@ -112,6 +112,6 @@ do {  
> \
>  } while (0)
>  
>  #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
> - memcpy(dst, src, len);  \
> + memcpy(dst, src, len)
>  This changebar also removes a semicolon.
It looks plausibly correct, but the commit message ought to mention it.

-ed

Re: [PATCH] net: mac802154: Fix null pointer dereference

2021-03-04 Thread Alexander Aring

Hi,

On Thu, 4 Mar 2021 at 04:23, Pavel Skripkin  wrote:
...
> >
> > I think this need to be:
> >
> > if (!IS_ERR_OR_NULL(key->tfm[i]))
> >
> > otherwise we still run into issues for the current iterator when
> > key->tfm[i] is in range of IS_ERR().
>
> Oh... I got it completly wrong, I'm sorry. If it's still not fixed,
> I'll send rigth patch for that.
>

please resend your patch. We will review again.

- Alex

Re: WARNING in carl9170_usb_submit_cmd_urb/usb_submit_urb

2021-03-04 Thread syzbot

syzbot has bisected this issue to:

commit 6a66a7ded12baa6ebbb2e3e82f8cb91382814839
Author: zhangyi (F) 
Date:   Thu Feb 13 06:38:20 2020 +

jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14bd498ed0
start commit:   f69d02e3 Merge tag 'misc-5.12-2021-03-02' of git://git.ker..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=16bd498ed0
console output: https://syzkaller.appspot.com/x/log.txt?x=12bd498ed0
kernel config:  https://syzkaller.appspot.com/x/.config?x=e0da2d01cc636e2c
dashboard link: https://syzkaller.appspot.com/bug?extid=9468df99cb63a4a4c4e1
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=11770346d0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17be69ccd0

Reported-by: syzbot+9468df99cb63a4a4c...@syzkaller.appspotmail.com
Fixes: 6a66a7ded12b ("jbd2: move the clearing of b_modified flag to the 
journal_unmap_buffer()")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Re: [PATCH v4 0/6] can: c_can: add support to 64 message objects

2021-03-04 Thread Marc Kleine-Budde

On 02.03.2021 22:54:29, Dario Binacchi wrote:
> 
> The D_CAN controller supports up to 128 messages. Until now the driver
> only managed 32 messages although Sitara processors and DRA7 SOC can
> handle 64.
> 
> The series was tested on a beaglebone board.
> 
> Note:
> I have not changed the type of tx_field (belonging to the c_can_priv
> structure) to atomic64_t because I think the atomic_t type has size
> of at least 32 bits on x86 and arm, which is enough to handle 64
> messages.
> http://marc.info/?l=linux-can&m=139746476821294&w=2 reports the results
> of tests performed just on x86 and arm architectures.

Applied to linux-can-next/testing.

I've added some cleanup patches to the series. I'll send it around soon,
please test.

regards,
Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |


signature.asc
Description: PGP signature

Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-04 Thread Robin Murphy


On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this 
through the drivers at all? Seems like it would make more sense for the 
x86 drivers to reflect their private options back to iommu_dma_strict 
(and allow Intel's caching mode to override it as well), then have 
iommu_dma_init_domain just test !iommu_dma_strict && 
domain->ops->flush_iotlb_all.


Robin.


Also remove the now unused iommu_domain_get_attr functionality.

Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 ++---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++---
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 56 +
  drivers/iommu/dma-iommu.c   |  8 ++-
  drivers/iommu/intel/iommu.c | 27 ++
  drivers/iommu/iommu.c   | 19 +++
  include/linux/iommu.h   | 17 ++-
  7 files changed, 51 insertions(+), 146 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40d0..37a8e51db17656 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1771,24 +1771,11 @@ static struct iommu_group 
*amd_iommu_device_group(struct device *dev)
return acpihid_device_group(dev);
  }
  
-static int amd_iommu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain)
  {
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = !amd_iommu_unmap_flush;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return !amd_iommu_unmap_flush;
  }
  
  /*

@@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
-   .domain_get_attr = amd_iommu_domain_get_attr,
+   .dma_use_flush_queue = amd_iommu_dma_use_flush_queue,
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a8304375..bf96172e8c1f71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain)
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
  
-	switch (domain->type) {

-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return smmu_domain->non_strict;
+}
+
+
+static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain)
+{
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return;
+   to_smmu_domain(domain)->non_strict = true;
  }
  
  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

@@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
}
break;
case IOMMU_DOMAIN_DMA:
-   switch(attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
-   break;
-   default:
-   ret = -EN

[PATCH v2] net: mac802154: Fix general protection fault

2021-03-04 Thread Pavel Skripkin

syzbot found general protection fault in crypto_destroy_tfm()[1].
It was caused by wrong clean up loop in llsec_key_alloc().
If one of the tfm array members is in IS_ERR() range it will
cause general protection fault in clean up function [1].

Call Trace:
 crypto_free_aead include/crypto/aead.h:191 [inline] [1]
 llsec_key_alloc net/mac802154/llsec.c:156 [inline]
 mac802154_llsec_key_add+0x9e0/0xcc0 net/mac802154/llsec.c:249
 ieee802154_add_llsec_key+0x56/0x80 net/mac802154/cfg.c:338
 rdev_add_llsec_key net/ieee802154/rdev-ops.h:260 [inline]
 nl802154_add_llsec_key+0x3d3/0x560 net/ieee802154/nl802154.c:1584
 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
 genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Signed-off-by: Pavel Skripkin 
Reported-by: syzbot+9ec037722d2603a9f...@syzkaller.appspotmail.com
Change-Id: I29f7ac641a039096d63d1e6070bb32cb5a3beb07
---
 net/mac802154/llsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c
index 585d33144c33..0ead2ced 100644
--- a/net/mac802154/llsec.c
+++ b/net/mac802154/llsec.c
@@ -152,7 +152,7 @@ llsec_key_alloc(const struct ieee802154_llsec_key *template)
crypto_free_sync_skcipher(key->tfm0);
 err_tfm:
for (i = 0; i < ARRAY_SIZE(key->tfm); i++)
-   if (key->tfm[i])
+   if (!IS_ERR_OR_NULL(key->tfm[i]))
crypto_free_aead(key->tfm[i]);
 
kfree_sensitive(key);
-- 
2.25.1

Re: [PATCH v4 5/6] can: c_can: prepare to up the message objects number

2021-03-04 Thread Marc Kleine-Budde

On 02.03.2021 22:54:34, Dario Binacchi wrote:
> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
> index 77b9aee56154..0052ba5197e0 100644
> --- a/drivers/net/can/c_can/c_can.c
> +++ b/drivers/net/can/c_can/c_can.c
[...]
> -struct net_device *alloc_c_can_dev(void)
> +struct net_device *alloc_c_can_dev(int msg_obj_num)
>  {
>   struct net_device *dev;
>   struct c_can_priv *priv;
> + int msg_obj_tx_num = msg_obj_num / 2;
>  
> - dev = alloc_candev(sizeof(struct c_can_priv), C_CAN_MSG_OBJ_TX_NUM);
> + dev = alloc_candev(sizeof(*priv) + sizeof(u32) * msg_obj_tx_num,
> +msg_obj_tx_num);

I've converted this to make use of the struct_size() macro:

+   dev = alloc_candev(struct_size(priv, dlc, msg_obj_tx_num),
+  msg_obj_tx_num);

regards,
Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |


signature.asc
Description: PGP signature

Re: [Patch bpf-next] skmsg: add function doc for skb->_sk_redir

2021-03-04 Thread patchwork-bot+netdevbpf

Hello:

This patch was applied to bpf/bpf-next.git (refs/heads/master):

On Mon,  1 Mar 2021 10:48:05 -0800 you wrote:
> From: Cong Wang 
> 
> This should fix the following warning:
> 
> include/linux/skbuff.h:932: warning: Function parameter or member
> '_sk_redir' not described in 'sk_buff'
> 
> [...]

Here is the summary with links:
  - [bpf-next] skmsg: add function doc for skb->_sk_redir
https://git.kernel.org/bpf/bpf-next/c/6ed6e1c761f6

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Robin Murphy


On 2021-03-01 08:42, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 


Moreso than the previous patch, where the feature is at least relatively 
generic (note that there's a bunch of in-flight development around 
DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to 
bloat the generic iommu_ops structure with private driver-specific 
interfaces. The attribute interface is a great compromise for these 
kinds of things, and you can easily add type-checked wrappers around it 
for external callers (maybe even make the actual attributes internal 
between the IOMMU core and drivers) if that's your concern.


Robin.


---
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
  drivers/iommu/iommu.c   |  9 ++
  include/linux/iommu.h   |  9 +-
  4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
struct io_pgtable_domain_attr pgtbl_cfg;
  
  	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

-   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
+   iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
  }
  
  struct msm_gem_address_space *

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2e17d990d04481..2858999c86dfd1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
iommu_domain *domain)
return ret;
  }
  
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
  {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
  
-	mutex_lock(&smmu_domain->init_mutex);

-
-   switch(domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
-   struct io_pgtable_domain_attr *pgtbl_cfg = data;
-
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
  
-			smmu_domain->pgtbl_cfg = *pgtbl_cfg;

-   break;
-   }
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   ret = 0;
}
-out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
  }
  
@@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {

.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
.domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e9e058501a953..8490aefd4b41f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
*domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
  
+int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,

+   struct io_pgtable_domain_attr *pgtbl_cfg)
+{
+   if (!domain->ops->domain_set_pgtable_attr)
+   return -EINVAL;
+   return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr);
+
  void iommu_get_resv_regions(struct device *dev, struct list_head *list)
  {
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aed88aa3bd3edf..39d3ed4d2700ac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
  struct notifier_block;
  struct iommu_sva;
  struct iommu_fault_event;
+stru

Re: [PATCH bpf-next] selftests/bpf: Fix test_attach_probe for powerpc uprobes

2021-03-04 Thread Jiri Olsa

On Thu, Mar 04, 2021 at 11:46:27AM +1100, Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
> > On 2021/03/02 11:35AM, Jiri Olsa wrote:
> >> On Mon, Mar 01, 2021 at 02:58:53PM -0800, Yonghong Song wrote:
> >> > 
> >> > 
> >> > On 3/1/21 11:04 AM, Jiri Olsa wrote:
> >> > > When testing uprobes we the test gets GEP (Global Entry Point)
> >> > > address from kallsyms, but then the function is called locally
> >> > > so the uprobe is not triggered.
> >> > > 
> >> > > Fixing this by adjusting the address to LEP (Local Entry Point)
> >> > > for powerpc arch.
> >> > > 
> >> > > Signed-off-by: Jiri Olsa 
> >> > > ---
> >> > >   .../selftests/bpf/prog_tests/attach_probe.c| 18 
> >> > > +-
> >> > >   1 file changed, 17 insertions(+), 1 deletion(-)
> >> > > 
> >> > > diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c 
> >> > > b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> >> > > index a0ee87c8e1ea..c3cfb48d3ed0 100644
> >> > > --- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> >> > > +++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> >> > > @@ -2,6 +2,22 @@
> >> > >   #include 
> >> > >   #include "test_attach_probe.skel.h"
> >> > > +#if defined(__powerpc64__)
> >
> > This needs to be specific to ELF v2 ABI, so you'll need to check 
> > _CALL_ELF. See commit d5c2e2c17ae1d6 ("perf probe ppc64le: Prefer symbol 
> > table lookup over DWARF") for an example.
> >
> >> > > +/*
> >> > > + * We get the GEP (Global Entry Point) address from kallsyms,
> >> > > + * but then the function is called locally, so we need to adjust
> >> > > + * the address to get LEP (Local Entry Point).
> >> > 
> >> > Any documentation in the kernel about this behavior? This will
> >> > help to validate the change without trying with powerpc64 qemu...
> >
> > I don't think we have documented this in the kernel anywhere, but this 
> > is specific to the ELF v2 ABI and is described there:
> > - 2.3.2.1.  Function Prologue: 
> >   
> > http://cdn.openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655240___RefHeading___Toc377640597.html
> > - 3.4.1.  Symbol Values:
> >
> > http://cdn.openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_95185.html
> 
> There's a comment in ppc_function_entry(), but I don't think we have any
> actual "documentation".
> 
> static inline unsigned long ppc_function_entry(void *func)
> {
> #ifdef PPC64_ELF_ABI_v2
>   u32 *insn = func;
> 
>   /*
>* A PPC64 ABIv2 function may have a local and a global entry
>* point. We need to use the local entry point when patching
>* functions, so identify and step over the global entry point
>* sequence.

hm, so I need to do the instructions check below as well

>*
>* The global entry point sequence is always of the form:
>*
>* addis r2,r12,
>* addi  r2,r2,
>*
>* A linker optimisation may convert the addis to lis:
>*
>* lis   r2,
>* addi  r2,r2,
>*/
>   if *insn & OP_RT_RA_MASK) == ADDIS_R2_R12) ||
>((*insn & OP_RT_RA_MASK) == LIS_R2)) &&
>   ((*(insn+1) & OP_RT_RA_MASK) == ADDI_R2_R2))

is this check/instructions specific to kernel code?

In the test prog I see following instructions:

Dump of assembler code for function get_base_addr:
   0x10034cb0 <+0>: lis r2,4256
   0x10034cb4 <+4>: addir2,r2,31488
   ...

but first instruction does not match the check in kernel code above:

1.insn value:   0x3c4010a0
2.insn value:   0x38427b00

the used defines are:
#define OP_RT_RA_MASK   0xUL
#define LIS_R2  0x3c02UL
#define ADDIS_R2_R120x3c4cUL
#define ADDI_R2_R2  0x3842UL


maybe we could skip the check, and run the test twice: first on
kallsym address and if the uprobe is not hit we will run it again
on address + 8

thanks,
jirka

>   return (unsigned long)(insn + 2);
>   else
>   return (unsigned long)func;
> 
> 
> cheers
>

Re: [PATCH 3/3] PCI: Convert rtw88 power cycle quirk to shutdown quirk

2021-03-04 Thread Bjorn Helgaas

[+cc Rafael, linux-pm]

On Thu, Mar 04, 2021 at 02:07:18PM +0800, Kai-Heng Feng wrote:
> On Sat, Feb 27, 2021 at 2:17 AM Bjorn Helgaas  wrote:
> > On Fri, Feb 26, 2021 at 02:31:31PM +0100, Heiner Kallweit wrote:
> > > On 26.02.2021 13:18, Kai-Heng Feng wrote:
> > > > On Fri, Feb 26, 2021 at 8:10 PM Heiner Kallweit  
> > > > wrote:
> > > >>
> > > >> On 26.02.2021 08:12, Kalle Valo wrote:
> > > >>> Kai-Heng Feng  writes:
> > > >>>
> > >  Now we have a generic D3 shutdown quirk, so convert the original
> > >  approach to a PCI quirk.
> > > 
> > >  Signed-off-by: Kai-Heng Feng 
> > >  ---
> > >   drivers/net/wireless/realtek/rtw88/pci.c | 2 --
> > >   drivers/pci/quirks.c | 6 ++
> > >   2 files changed, 6 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> It would have been nice to CC linux-wireless also on patches 1-2. I 
> > > >>> only
> > > >>> saw patch 3 and had to search the rest of patches from lkml.
> > > >>>
> > > >>> I assume this goes via the PCI tree so:
> > > >>>
> > > >>> Acked-by: Kalle Valo 
> > > >>
> > > >> To me it looks odd to (mis-)use the quirk mechanism to set a device
> > > >> to D3cold on shutdown. As I see it the quirk mechanism is used to work
> > > >> around certain device misbehavior. And setting a device to a D3
> > > >> state on shutdown is a normal activity, and the shutdown() callback
> > > >> seems to be a good place for it.
> > > >> I miss an explanation what the actual benefit of the change is.
> > > >
> > > > To make putting device to D3 more generic, as there are more than one
> > > > device need the quirk.
> > > >
> > > > Here's the discussion:
> > > > https://lore.kernel.org/linux-usb/00de6927-3fa6-a9a3-2d65-2b4d4e8f0...@linux.intel.com/
> > > >
> > >
> > > Thanks for the link. For the AMD USB use case I don't have a strong 
> > > opinion,
> > > what's considered the better option may be a question of personal taste.
> > > For rtw88 however I'd still consider it over-engineering to replace a 
> > > simple
> > > call to pci_set_power_state() with a PCI quirk.
> > > I may be biased here because I find it sometimes bothering if I want to
> > > look up how a device is handled and in addition to checking the respective
> > > driver I also have to grep through quirks.c whether there's any special
> > > handling.
> >
> > I haven't looked at these patches carefully, but in general, I agree
> > that quirks should be used to work around hardware defects in the
> > device.  If the device behaves correctly per spec, we should use a
> > different mechanism so the code remains generic and all devices get
> > the benefit.
> >
> > If we do add quirks, the commit log should explain what the device
> > defect is.
> 
> So maybe it's reasonable to put all PCI devices to D3 at shutdown?

I don't know off-hand.  I added Rafael and linux-pm in case they do.

If not, I suggest working up a patch to do that and a commit log that
explains why that's a good idea and then we can have a discussion
about it.  This thread really doesn't have that justification.  It
says "putting device X in D3cold at shutdown saves 0.03w while in S5",
but doesn't explain why that's safe or desirable for all devices.

Bjorn

[PATCH net] ibmvnic: always store valid MAC address

2021-03-04 Thread Jiri Wiesner

The last change to ibmvnic_set_mac(), 8fc3672a8ad3, meant to prevent
users from setting an invalid MAC address on an ibmvnic interface
that has not been brought up yet. The change also prevented the
requested MAC address from being stored by the adapter object for an
ibmvnic interface when the state of the ibmvnic interface is
VNIC_PROBED - that is after probing has finished but before the
ibmvnic interface is brought up. The MAC address stored by the
adapter object is used and sent to the hypervisor for checking when
an ibmvnic interface is brought up.

The ibmvnic driver ignoring the requested MAC address when in
VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
than one slave to malfunction. The bonding code must be able to
change the MAC address of its slaves before they are brought up
during enslaving. The inability of kernels with 8fc3672a8ad3 to set
the MAC addresses of bonding slaves is observable in the output of
"ip address show". The MAC addresses of the slaves are the same as
the MAC address of the bond on a working system whereas the slaves
retain their original MAC addresses on a system with a malfunctioning
LACP bond.

Fixes: 8fc3672a8ad3 ("ibmvnic: fix ibmvnic_set_mac")
Signed-off-by: Jiri Wiesner 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 3bad762083c5..b6102ccf9b90 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1906,10 +1906,9 @@ static int ibmvnic_set_mac(struct net_device *netdev, 
void *p)
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
 
-   if (adapter->state != VNIC_PROBED) {
-   ether_addr_copy(adapter->mac_addr, addr->sa_data);
+   ether_addr_copy(adapter->mac_addr, addr->sa_data);
+   if (adapter->state != VNIC_PROBED)
rc = __ibmvnic_set_mac(netdev, addr->sa_data);
-   }
 
return rc;
 }
-- 
2.26.2

Re: [PATCH] scsi: target: vhost-scsi: remove redundant initialization of variable ret

2021-03-04 Thread Stefan Hajnoczi

On Wed, Mar 03, 2021 at 01:43:39PM +, Colin King wrote:
> From: Colin Ian King 
> 
> The variable ret is being initialized with a value that is never read
> and it is being updated later with a new value.  The initialization is
> redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/vhost/scsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Which kernel version is this patch based on?

If it's a fix for a patch that hasn't landed yet, please indicate this.
A "Fixes: ..." tag should be added to this patch as well.

I looked at linux.git/master commit f69d02e37a85645aa90d18cacfff36dba370f797 
and see this:

  static int __init vhost_scsi_init(void)
  {
  int ret = -ENOMEM;

  pr_debug("TCM_VHOST fabric module %s on %s/%s"
  " on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
  utsname()->machine);

  /*
   * Use our own dedicated workqueue for submitting I/O into
   * target core to avoid contention within system_wq.
   */
  vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0);
  if (!vhost_scsi_workqueue)
  goto out;

We need ret's initialization value here ^

  ret = vhost_scsi_register();
  if (ret < 0)
  goto out_destroy_workqueue;

  ret = target_register_template(&vhost_scsi_ops);
  if (ret < 0)
  goto out_vhost_scsi_deregister;

  return 0;

  out_vhost_scsi_deregister:
  vhost_scsi_deregister();
  out_destroy_workqueue:
  destroy_workqueue(vhost_scsi_workqueue);
  out:
  return ret;
  };


> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index d16c04dcc144..9129ab8187fd 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -2465,7 +2465,7 @@ static const struct target_core_fabric_ops 
> vhost_scsi_ops = {
>  
>  static int __init vhost_scsi_init(void)
>  {
> - int ret = -ENOMEM;
> + int ret;
>  
>   pr_debug("TCM_VHOST fabric module %s on %s/%s"
>   " on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
> -- 
> 2.30.0
> 


signature.asc
Description: PGP signature

Minor update to bridge-utils

2021-03-04 Thread Stephen Hemminger

Small changes to bridge-utils to address some minor issues.

1. The default branch is main not master

2. Fixed some compiler warnings because Gcc 10 and Clang now
   do checks for string overflow.

3. Made a backup repository mirror at github.

4. Fixed version string printed

This is not a required update, none of these are important
or critical to users.

Do not intend to do any new features or serious fixes to
bridge-utils; all users should convert to iproute2/bridge command.

Re: [PATCH 1/1] net: usb: qmi_wwan: allow qmimux add/del with master up

2021-03-04 Thread Bjørn Mork

Daniele Palmas  writes:

> There's no reason for preventing the creation and removal
> of qmimux network interfaces when the underlying interface
> is up.
>
> This makes qmi_wwan mux implementation more similar to the
> rmnet one, simplifying userspace management of the same
> logical interfaces.
>
> Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
> Reported-by: Aleksander Morgado 
> Signed-off-by: Daniele Palmas 

Acked-by: Bjørn Mork

Re: [regression] Kernel panic on resume from sleep

2021-03-04 Thread Jakub Kicinski

On Thu, 4 Mar 2021 13:50:01 +0100 Zbynek Michl wrote:
> Looks good so far, but need to wait some more time as the issue was irregular.
> 
> Do you have any explanation why the calls disorder caused the panic
> just occasionally?

Depends if kernel attempts to try to send a packet before __alx_open()
finishes. You can probably make it more likely by running trafgen, iperf
or such while suspending and resuming?

> Also, the same (wrong) order I can see in the 3.16 kernel code, but it
> has worked fine with this kernel in all cases. So what is different in
> 5.10?

At some point in between those versions the driver got modified to
allocate and free the NAPI structures dynamically.

I didn't look too closely to find out if things indeed worked 100%
correctly before, but now they will reliably crash on a NULL pointer
dereference if transmission comes before open is done.

[PATCH net] netdevsim: init u64 stats for 32bit hardware

2021-03-04 Thread Jakub Kicinski

From: Hillf Danton 

Init the u64 stats in order to avoid the lockdep prints on the 32bit
hardware like

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
 Hardware name: ARM-Versatile Express
 Backtrace:
 [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c 
arch/arm/kernel/traps.c:252)
 [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 
[inline])
 [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 
lib/dump_stack.c:120)
 [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key 
kernel/locking/lockdep.c:935 [inline])
 [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 
kernel/locking/lockdep.c:1247)
 [<802bef04>] (register_lock_class) from [<802baa2c>] 
(__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
 [<802ba9a8>] (__lock_acquire) from [<802be840>] 
(lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
 [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 
kernel/locking/lockdep.c:5415)
 [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access 
include/linux/seqlock.h:103 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin 
include/linux/u64_stats_sync.h:164 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin 
include/linux/u64_stats_sync.h:175 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 
drivers/net/netdevsim/netdev.c:70)
 [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 
net/core/dev.c:10405)
 [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 
net/core/rtnetlink.c:1211)
 [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] 
(rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
 [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] 
(rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event 
net/core/rtnetlink.c:3830 [inline])
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event 
net/core/rtnetlink.c:3821 [inline])
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] 
(rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
 [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c 
net/core/dev.c:10103)
 [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 
drivers/net/netdevsim/netdev.c:317)
 [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 
drivers/net/netdevsim/dev.c:941)
 [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all 
drivers/net/netdevsim/dev.c:990 [inline])
 [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] 
(nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
 [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 
drivers/net/netdevsim/bus.c:287)
 [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c 
drivers/base/dd.c:554)
 [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 
drivers/base/dd.c:740)
 [<811728cc>] (driver_probe_device) from [<81172fe4>] 
(__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
 [<81172f58>] (__device_attach_driver) from [<8116fee0>] 
(bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
 [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 
drivers/base/dd.c:914)
 [<81172b90>] (__device_attach) from [<8117305c>] 
(device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
 [<81173048>] (device_initial_probe) from [<81171358>] 
(bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
 [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 
drivers/base/core.c:3109)
 [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 
drivers/base/core.c:3182)
 [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new 
drivers/net/netdevsim/bus.c:336 [inline])
 [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 
drivers/net/netdevsim/bus.c:215)
 [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 
drivers/base/bus.c:122)
 [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 
fs/sysfs/file.c:139)
 [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] 
(kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter 
include/linux/fs.h:1901 [inline])
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write 
fs/read_write.c:518 [inline])
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c 
fs/read_write.c:605)
 [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec 
fs/read_write.c:658)
 [<804d259c>] (ksys_write)

[PATCH] net: sched: avoid duplicates in classes dump

2021-03-04 Thread Maximilian Heyne

This is a follow up of commit ea3274695353 ("net: sched: avoid
duplicates in qdisc dump") which has fixed the issue only for the qdisc
dump.

The duplicate printing also occurs when dumping the classes via
  tc class show dev eth0

Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
Signed-off-by: Maximilian Heyne 
---
 net/sched/sch_api.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index e2e4353db8a7..f87d07736a14 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -2168,7 +2168,7 @@ static int tc_dump_tclass_qdisc(struct Qdisc *q, struct 
sk_buff *skb,
 
 static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
   struct tcmsg *tcm, struct netlink_callback *cb,
-  int *t_p, int s_t)
+  int *t_p, int s_t, bool recur)
 {
struct Qdisc *q;
int b;
@@ -2179,7 +2179,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct 
sk_buff *skb,
if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
return -1;
 
-   if (!qdisc_dev(root))
+   if (!qdisc_dev(root) || !recur)
return 0;
 
if (tcm->tcm_parent) {
@@ -2214,13 +2214,13 @@ static int tc_dump_tclass(struct sk_buff *skb, struct 
netlink_callback *cb)
s_t = cb->args[0];
t = 0;
 
-   if (tc_dump_tclass_root(dev->qdisc, skb, tcm, cb, &t, s_t) < 0)
+   if (tc_dump_tclass_root(dev->qdisc, skb, tcm, cb, &t, s_t, true) < 0)
goto done;
 
dev_queue = dev_ingress_queue(dev);
if (dev_queue &&
tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb,
-   &t, s_t) < 0)
+   &t, s_t, false) < 0)
goto done;
 
 done:
-- 
2.16.6




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879

Re: [PATCH net] ethtool: Add indicator field for link_mode validity to link_ksettings

2021-03-04 Thread Jakub Kicinski

On Thu, 4 Mar 2021 11:09:33 +0200 Danielle Ratson wrote:
> Some drivers clear the 'ethtool_link_ksettings' struct in their
> get_link_ksettings() callback, before populating it with actual values.
> Such drivers will set the new 'link_mode' field to zero, resulting in
> user space receiving wrong link mode information given that zero is a
> valid value for the field.
> 
> Fix this by introducing a new boolean field ('link_mode_valid'), which
> indicates whether the 'link_mode' field is valid or not. Set it in mlxsw
> which is currently the only driver supporting the new API.
> 
> Another problem is that some drivers (notably tun) can report random
> values in the 'link_mode' field. This can result in a general protection
> fault when the field is used as an index to the 'link_mode_params' array
> [1].
> 
> This happens because such drivers implement their set_link_ksettings()
> callback by simply overwriting their private copy of
> 'ethtool_link_ksettings' struct with the one they get from the stack,
> which is not always properly initialized.
> 
> Fix this by making sure that the new 'link_mode_valid' field is always
> initialized to 'false' before invoking the set_link_ksettings()
> callback.
> 
> [1]
> general protection fault, probably for non-canonical address 
> 0xdc00f14cc32c:  [#1] PREEMPT SMP KASAN
> KASAN: probably user-memory-access in range 
> [0x00078a661960-0x00078a661967]
> CPU: 0 PID: 8452 Comm: syz-executor360 Not tainted 5.11.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:__ethtool_get_link_ksettings+0x1a3/0x3a0 net/ethtool/ioctl.c:446
> Code: b7 3e fa 83 fd ff 0f 84 30 01 00 00 e8 16 b0 3e fa 48 8d 3c ed 60 d5 69 
> 8a 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 
> 83 e0 07 83 c0 03
> +38 d0 7c 08 84 d2 0f 85 b9
> RSP: 0018:c900019df7a0 EFLAGS: 00010202
> RAX: dc00 RBX: 888026136008 RCX: 
> RDX: f14cc32c RSI: 873439ca RDI: 00078a661960
> RBP: 8880 R08:  R09: 88802613606f
> R10: 873439bc R11:  R12: 
> R13: 88802613606c R14: 888011d0c210 R15: 888011d0c210
> FS:  00749300() GS:8880b9c0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004b60f0 CR3: 185c2000 CR4: 001506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  linkinfo_prepare_data+0xfd/0x280 net/ethtool/linkinfo.c:37
>  ethnl_default_notify+0x1dc/0x630 net/ethtool/netlink.c:586
>  ethtool_notify+0xbd/0x1f0 net/ethtool/netlink.c:656
>  ethtool_set_link_ksettings+0x277/0x330 net/ethtool/ioctl.c:620
>  dev_ethtool+0x2b35/0x45d0 net/ethtool/ioctl.c:2842
>  dev_ioctl+0x463/0xb70 net/core/dev_ioctl.c:440
>  sock_do_ioctl+0x148/0x2d0 net/socket.c:1060
>  sock_ioctl+0x477/0x6a0 net/socket.c:1177
>  vfs_ioctl fs/ioctl.c:48 [inline]
>  __do_sys_ioctl fs/ioctl.c:753 [inline]
>  __se_sys_ioctl fs/ioctl.c:739 [inline]
>  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Fixes: c8907043c6ac9 ("ethtool: Get link mode in use instead of speed and 
> duplex parameters")
> Signed-off-by: Danielle Ratson 
> Reported-by: Eric Dumazet 
> Reviewed-by: Ido Schimmel 

Reviewed-by: Jakub Kicinski 

BTW shouldn't __ethtool_get_link_ksettings() be moved to common.c?

Re: [PATCH net 1/3] net: ethernet: ixgbe: don't propagate -ENODEV from ixgbe_mii_bus_init()

2021-03-04 Thread Jakub Kicinski

On Wed,  3 Mar 2021 17:06:47 -0800 Tony Nguyen wrote:
> From: Bartosz Golaszewski 
> 
> It's a valid use-case for ixgbe_mii_bus_init() to return -ENODEV - we
> still want to finalize the registration of the ixgbe device. Check the
> error code and don't bail out if err == -ENODEV.
> 
> This fixes an issue on C3000 family of SoCs where four ixgbe devices
> share a single MDIO bus and ixgbe_mii_bus_init() returns -ENODEV for
> three of them but we still want to register them.
> 
> Fixes: 09ef193fef7e ("net: ethernet: ixgbe: check the return value of 
> ixgbe_mii_bus_init()")
> Reported-by: Yongxin Liu 
> Signed-off-by: Bartosz Golaszewski 
> Tested-by: Tony Brelinski 
> Signed-off-by: Tony Nguyen 

Are you sure this is not already fixed upstream by:

bd7f14df9492 ("ixgbe: fix probing of multi-port devices with one MDIO")

?

Re: [PATCH bpf-next v4 1/5] bpf: consolidate shared test timing code

2021-03-04 Thread Andrii Nakryiko

On Wed, Mar 3, 2021 at 2:18 AM Lorenz Bauer  wrote:
>
> Share the timing / signal interruption logic between different
> implementations of PROG_TEST_RUN. There is a change in behaviour
> as well. We check the loop exit condition before checking for
> pending signals. This resolves an edge case where a signal
> arrives during the last iteration. Instead of aborting with
> EINTR we return the successful result to user space.
>
> Signed-off-by: Lorenz Bauer 
> ---

LGTM.

Acked-by: Andrii Nakryiko 

>  net/bpf/test_run.c | 141 +
>  1 file changed, 78 insertions(+), 63 deletions(-)
>

[...]

[RFC PATCH V2 net-next 0/5] ethtool: Extend module EEPROM dump API

2021-03-04 Thread Moshe Shemesh

Ethtool supports module EEPROM dumps via the `ethtool -m ` command.
But in current state its functionality is limited - offset and length
parameters, which are used to specify a linear desired region of EEPROM
data to dump, is not enough, considering emergence of complex module
EEPROM layouts such as CMIS 4.0.
Moreover, CMIS 4.0 extends the amount of pages that may be accessible by
introducing another parameter for page addressing - banks.

Besides, currently module EEPROM is represented as a chunk of
concatenated pages, where lower 128 bytes of all pages, except page 00h,
are omitted. Offset and length are used to address parts of this fake
linear memory. But in practice drivers, which implement
get_module_info() and get_module_eeprom() ethtool ops still calculate
page number and set I2C address on their own.

This series tackles these issues by adding ethtool op, which allows to
pass page number, bank number and I2C address in addition to offset and
length parameters to the driver, adds corresponding netlink
infrastructure and implements the new interface in mlx5 driver.

This allows to extend userspace 'ethtool -m' CLI by adding new
parameters - page, bank and i2c. New command line format:
 ethtool -m  [hex on|off] [raw on|off] [offset N] [length N] [page N] 
[bank N] [i2c N]

The consequence of this series is a possibility to dump arbitrary EEPROM
page at a time, in contrast to dumps of concatenated pages. Therefore,
offset and length change their semantics and may be used only to specify
a part of data within a page, which size is currently limited to 256
bytes.

As for backwards compatibility with get_module_info() and
get_module_eeprom() pair, the series addresses it as well by
implementing a fallback mechanism. As mentioned earlier, drivers derive
a page number from 'global' offset, so this can be done vice versa
without their involvement thanks to standardization. If kernel netlink
handler of 'ethtool -m' command detects that new ethtool op is not
supported by the driver, it calculates offset from given page number and
page offset and calls old ndos, if they are available.

Change log:
v1 -> v2:
- Limited i2c_address values by 127
- Added page bound check for offset and length
- Added defines for these two points
- Added extack to ndo parameters
- Moved ethnl_ops_begin(dev) and set error path accordingly



Vladyslav Tarasiuk (5):
  ethtool: Allow network drivers to dump arbitrary EEPROM data
  net/mlx5: Refactor module EEPROM query
  net/mlx5: Implement get_module_eeprom_data_by_page()
  net/mlx5: Add support for DSFP module EEPROM dumps
  ethtool: Add fallback to get_module_eeprom from netlink command

 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  42 +++
 .../net/ethernet/mellanox/mlx5/core/port.c| 101 ++--
 include/linux/ethtool.h   |   7 +-
 include/linux/mlx5/port.h |  12 +
 include/uapi/linux/ethtool.h  |  26 ++
 include/uapi/linux/ethtool_netlink.h  |  19 ++
 net/ethtool/Makefile  |   2 +-
 net/ethtool/eeprom.c  | 239 ++
 net/ethtool/netlink.c |  10 +
 net/ethtool/netlink.h |   2 +
 10 files changed, 430 insertions(+), 30 deletions(-)
 create mode 100644 net/ethtool/eeprom.c

-- 
2.18.2

[RFC PATCH V2 net-next 3/5] net/mlx5: Implement get_module_eeprom_data_by_page()

2021-03-04 Thread Moshe Shemesh

From: Vladyslav Tarasiuk 

Implement ethtool_ops::get_module_eeprom_data_by_page() to enable
support of new SFP standards.

Signed-off-by: Vladyslav Tarasiuk 
---
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 42 +++
 .../net/ethernet/mellanox/mlx5/core/port.c| 33 +++
 include/linux/mlx5/port.h |  2 +
 3 files changed, 77 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index abdf721bb264..6766ffb07c81 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1769,6 +1769,47 @@ static int mlx5e_get_module_eeprom(struct net_device 
*netdev,
return 0;
 }
 
+static int mlx5e_get_module_eeprom_data_by_page(struct net_device *netdev,
+   const struct 
ethtool_eeprom_data *page_data,
+   struct netlink_ext_ack *extack)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5_module_eeprom_query_params query;
+   struct mlx5_core_dev *mdev = priv->mdev;
+   u8 *data = page_data->data;
+   int size_read;
+   int i = 0;
+
+   if (!page_data->length)
+   return -EINVAL;
+
+   memset(data, 0, page_data->length);
+
+   query.offset = page_data->offset;
+   query.i2c_address = page_data->i2c_address;
+   query.bank = page_data->bank;
+   query.page = page_data->page;
+   while (i < page_data->length) {
+   query.size = page_data->length - i;
+   size_read = mlx5_query_module_eeprom_data(mdev, &query, data + 
i);
+
+   /* Done reading */
+   if (!size_read)
+   return 0;
+
+   if (size_read < 0) {
+   netdev_err(priv->netdev, "%s: 
mlx5_query_module_eeprom_data failed:0x%x\n",
+  __func__, size_read);
+   return 0;
+   }
+
+   i += size_read;
+   query.offset += size_read;
+   }
+
+   return 0;
+}
+
 int mlx5e_ethtool_flash_device(struct mlx5e_priv *priv,
   struct ethtool_flash *flash)
 {
@@ -2148,6 +2189,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.set_wol   = mlx5e_set_wol,
.get_module_info   = mlx5e_get_module_info,
.get_module_eeprom = mlx5e_get_module_eeprom,
+   .get_module_eeprom_data_by_page = mlx5e_get_module_eeprom_data_by_page,
.flash_device  = mlx5e_flash_device,
.get_priv_flags= mlx5e_get_priv_flags,
.set_priv_flags= mlx5e_set_priv_flags,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 9b9f870d67a4..f7a16fdfb8d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -428,6 +428,39 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_query_module_eeprom);
 
+int mlx5_query_module_eeprom_data(struct mlx5_core_dev *dev,
+ struct mlx5_module_eeprom_query_params 
*params,
+ u8 *data)
+{
+   u8 module_id;
+   int err;
+
+   err = mlx5_query_module_num(dev, ¶ms->module_number);
+   if (err)
+   return err;
+
+   err = mlx5_query_module_id(dev, params->module_number, &module_id);
+   if (err)
+   return err;
+
+   if (module_id != MLX5_MODULE_ID_SFP &&
+   module_id != MLX5_MODULE_ID_QSFP &&
+   module_id != MLX5_MODULE_ID_QSFP28 &&
+   module_id != MLX5_MODULE_ID_QSFP_PLUS) {
+   mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", 
module_id);
+   return -EINVAL;
+   }
+
+   if (params->i2c_address != MLX5_I2C_ADDR_HIGH &&
+   params->i2c_address != MLX5_I2C_ADDR_LOW) {
+   mlx5_core_err(dev, "I2C address not recognized: 0x%x\n", 
params->i2c_address);
+   return -EINVAL;
+   }
+
+   return mlx5_query_mcia(dev, params, data);
+}
+EXPORT_SYMBOL_GPL(mlx5_query_module_eeprom_data);
+
 static int mlx5_query_port_pvlc(struct mlx5_core_dev *dev, u32 *pvlc,
int pvlc_size,  u8 local_port)
 {
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 90b87aa82db3..887cd43b41e8 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -209,6 +209,8 @@ void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool 
*supported,
 bool *enabled);
 int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 u16 offset, u16 size, u8 *data);
+int mlx5_query_module_eeprom_data(struct mlx5_core_dev *dev,
+ struct mlx5_module_eeprom_query_params 
*par

[RFC PATCH V2 net-next 2/5] net/mlx5: Refactor module EEPROM query

2021-03-04 Thread Moshe Shemesh

From: Vladyslav Tarasiuk 

Prepare for ethtool_ops::get_module_eeprom_data() implementation by
extracting common part of mlx5_query_module_eeprom() into a separate
function.

Signed-off-by: Vladyslav Tarasiuk 
---
 .../net/ethernet/mellanox/mlx5/core/port.c| 79 +++
 include/linux/mlx5/port.h |  9 +++
 2 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 4bb219565c58..9b9f870d67a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -353,67 +353,78 @@ static void mlx5_sfp_eeprom_params_set(u16 *i2c_addr, int 
*page_num, u16 *offset
*offset -= MLX5_EEPROM_PAGE_LENGTH;
 }
 
-int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
-u16 offset, u16 size, u8 *data)
+static int mlx5_query_mcia(struct mlx5_core_dev *dev,
+  struct mlx5_module_eeprom_query_params *params, u8 
*data)
 {
-   int module_num, status, err, page_num = 0;
u32 in[MLX5_ST_SZ_DW(mcia_reg)] = {};
u32 out[MLX5_ST_SZ_DW(mcia_reg)];
-   u16 i2c_addr = 0;
-   u8 module_id;
+   int status, err;
void *ptr;
+   u16 size;
+
+   size = min_t(int, params->size, MLX5_EEPROM_MAX_BYTES);
+
+   MLX5_SET(mcia_reg, in, l, 0);
+   MLX5_SET(mcia_reg, in, size, size);
+   MLX5_SET(mcia_reg, in, module, params->module_number);
+   MLX5_SET(mcia_reg, in, device_address, params->offset);
+   MLX5_SET(mcia_reg, in, page_number, params->page);
+   MLX5_SET(mcia_reg, in, i2c_device_address, params->i2c_address);
 
-   err = mlx5_query_module_num(dev, &module_num);
+   err = mlx5_core_access_reg(dev, in, sizeof(in), out,
+  sizeof(out), MLX5_REG_MCIA, 0, 0);
if (err)
return err;
 
-   err = mlx5_query_module_id(dev, module_num, &module_id);
+   status = MLX5_GET(mcia_reg, out, status);
+   if (status) {
+   mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n",
+ status);
+   return -EIO;
+   }
+
+   ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0);
+   memcpy(data, ptr, size);
+
+   return size;
+}
+
+int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
+u16 offset, u16 size, u8 *data)
+{
+   struct mlx5_module_eeprom_query_params query = {0};
+   u8 module_id;
+   int err;
+
+   err = mlx5_query_module_num(dev, &query.module_number);
+   if (err)
+   return err;
+
+   err = mlx5_query_module_id(dev, query.module_number, &module_id);
if (err)
return err;
 
switch (module_id) {
case MLX5_MODULE_ID_SFP:
-   mlx5_sfp_eeprom_params_set(&i2c_addr, &page_num, &offset);
+   mlx5_sfp_eeprom_params_set(&query.i2c_address, &query.page, 
&query.offset);
break;
case MLX5_MODULE_ID_QSFP:
case MLX5_MODULE_ID_QSFP_PLUS:
case MLX5_MODULE_ID_QSFP28:
-   mlx5_qsfp_eeprom_params_set(&i2c_addr, &page_num, &offset);
+   mlx5_qsfp_eeprom_params_set(&query.i2c_address, &query.page, 
&query.offset);
break;
default:
mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", 
module_id);
return -EINVAL;
}
 
-   if (offset + size > MLX5_EEPROM_PAGE_LENGTH)
+   if (query.offset + size > MLX5_EEPROM_PAGE_LENGTH)
/* Cross pages read, read until offset 256 in low page */
size -= offset + size - MLX5_EEPROM_PAGE_LENGTH;
 
-   size = min_t(int, size, MLX5_EEPROM_MAX_BYTES);
+   query.size = size;
 
-   MLX5_SET(mcia_reg, in, l, 0);
-   MLX5_SET(mcia_reg, in, module, module_num);
-   MLX5_SET(mcia_reg, in, i2c_device_address, i2c_addr);
-   MLX5_SET(mcia_reg, in, page_number, page_num);
-   MLX5_SET(mcia_reg, in, device_address, offset);
-   MLX5_SET(mcia_reg, in, size, size);
-
-   err = mlx5_core_access_reg(dev, in, sizeof(in), out,
-  sizeof(out), MLX5_REG_MCIA, 0, 0);
-   if (err)
-   return err;
-
-   status = MLX5_GET(mcia_reg, out, status);
-   if (status) {
-   mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n",
- status);
-   return -EIO;
-   }
-
-   ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0);
-   memcpy(data, ptr, size);
-
-   return size;
+   return mlx5_query_mcia(dev, &query, data);
 }
 EXPORT_SYMBOL_GPL(mlx5_query_module_eeprom);
 
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 23edd2db4803..90b87aa82db3 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -62,6 +62,15 @@ enum mlx5_an_status {
 #def

[RFC PATCH V2 net-next 1/5] ethtool: Allow network drivers to dump arbitrary EEPROM data

2021-03-04 Thread Moshe Shemesh

From: Vladyslav Tarasiuk 

Define get_module_eeprom_data_by_page() ethtool callback and implement
netlink infrastructure.

get_module_eeprom_data_by_page() allows network drivers to dump a part
of module's EEPROM specified by page and bank numbers along with offset
and length. It is effectively a netlink replacement for
get_module_info() and get_module_eeprom() pair, which is needed due to
emergence of complex non-linear EEPROM layouts.

Signed-off-by: Vladyslav Tarasiuk 
---
 include/linux/ethtool.h  |   7 +-
 include/uapi/linux/ethtool.h |  26 +
 include/uapi/linux/ethtool_netlink.h |  19 
 net/ethtool/Makefile |   2 +-
 net/ethtool/eeprom.c | 157 +++
 net/ethtool/netlink.c|  10 ++
 net/ethtool/netlink.h|   2 +
 7 files changed, 221 insertions(+), 2 deletions(-)
 create mode 100644 net/ethtool/eeprom.c

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index ec4cd3921c67..2f65aae5f492 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -81,6 +81,7 @@ enum {
 #define ETH_RSS_HASH_NO_CHANGE 0
 
 struct net_device;
+struct netlink_ext_ack;
 
 /* Some generic methods drivers may use in their ethtool_ops */
 u32 ethtool_op_get_link(struct net_device *dev);
@@ -410,6 +411,8 @@ struct ethtool_pause_stats {
  * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
  * This is only useful if the device maintains PHY statistics and
  * cannot use the standard PHY library helpers.
+ * @get_module_eeprom_data_by_page: Get a region of plug-in module EEPROM data
+ * from specified page. Returns a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -515,6 +518,9 @@ struct ethtool_ops {
   const struct ethtool_tunable *, void *);
int (*set_phy_tunable)(struct net_device *,
   const struct ethtool_tunable *, const void 
*);
+   int (*get_module_eeprom_data_by_page)(struct net_device *dev,
+ const struct 
ethtool_eeprom_data *page,
+ struct netlink_ext_ack 
*extack);
 };
 
 int ethtool_check_ops(const struct ethtool_ops *ops);
@@ -538,7 +544,6 @@ int ethtool_virtdev_set_link_ksettings(struct net_device 
*dev,
   const struct ethtool_link_ksettings *cmd,
   u32 *dev_speed, u8 *dev_duplex);
 
-struct netlink_ext_ack;
 struct phy_device;
 struct phy_tdr_config;
 
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index cde753bb2093..2459571fc1d1 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -340,6 +340,28 @@ struct ethtool_eeprom {
__u8data[0];
 };
 
+/**
+ * struct ethtool_eeprom_data - EEPROM dump from specified page
+ * @offset: Offset within the specified EEPROM page to begin read, in bytes.
+ * @length: Number of bytes to read.
+ * @page: Page number to read from.
+ * @bank: Page bank number to read from, if applicable by EEPROM spec.
+ * @i2c_address: I2C address of a page. Value less than 0x7f expected. Most
+ * EEPROMs use 0x50 or 0x51.
+ * @data: Pointer to buffer with EEPROM data of @length size.
+ *
+ * This can be used to manage pages during EEPROM dump in ethtool and pass
+ * required information to the driver.
+ */
+struct ethtool_eeprom_data {
+   __u32   offset;
+   __u32   length;
+   __u32   page;
+   __u32   bank;
+   __u32   i2c_address;
+   __u8*data;
+};
+
 /**
  * struct ethtool_eee - Energy Efficient Ethernet information
  * @cmd: ETHTOOL_{G,S}EEE
@@ -1865,6 +1887,10 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define ETH_MODULE_SFF_8636_MAX_LEN 640
 #define ETH_MODULE_SFF_8436_MAX_LEN 640
 
+#define ETH_MODULE_EEPROM_MAX_LEN  640
+#define ETH_MODULE_EEPROM_PAGE_LEN 256
+#define ETH_MODULE_MAX_I2C_ADDRESS 0x7f
+
 /* Reset flags */
 /* The reset() operation must clear the flags for the components which
  * were actually reset.  On successful return, the flags indicate the
diff --git a/include/uapi/linux/ethtool_netlink.h 
b/include/uapi/linux/ethtool_netlink.h
index a286635ac9b8..60dd848d0b54 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -42,6 +42,7 @@ enum {
ETHTOOL_MSG_CABLE_TEST_ACT,
ETHTOOL_MSG_CABLE_TEST_TDR_ACT,
ETHTOOL_MSG_TUNNEL_INFO_GET,
+   ETHTOOL_MSG_EEPROM_DATA_GET,
 
/* add new constants above here */
__ETHTOOL_MSG_USER_CNT,
@@ -80,6 +81,7 @@ enum {
ETHTOOL_MSG_CABLE_TEST_NTF,
ETHTOOL_MSG_CABLE_TEST_TDR_NTF,
ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
+   ETHTOOL_MSG_EEPROM_DATA_GET_REPLY,
 
/*

[RFC PATCH V2 net-next 4/5] net/mlx5: Add support for DSFP module EEPROM dumps

2021-03-04 Thread Moshe Shemesh

From: Vladyslav Tarasiuk 

Allow the driver to recognise DSFP transceiver module ID and therefore
allow its EEPROM dumps using ethtool.

Signed-off-by: Vladyslav Tarasiuk 
---
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 3 ++-
 include/linux/mlx5/port.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index f7a16fdfb8d3..3a7aa6b05198 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -446,7 +446,8 @@ int mlx5_query_module_eeprom_data(struct mlx5_core_dev *dev,
if (module_id != MLX5_MODULE_ID_SFP &&
module_id != MLX5_MODULE_ID_QSFP &&
module_id != MLX5_MODULE_ID_QSFP28 &&
-   module_id != MLX5_MODULE_ID_QSFP_PLUS) {
+   module_id != MLX5_MODULE_ID_QSFP_PLUS &&
+   module_id != MLX5_MODULE_ID_DSFP) {
mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", 
module_id);
return -EINVAL;
}
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 887cd43b41e8..71b4373cb96c 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -45,6 +45,7 @@ enum mlx5_module_id {
MLX5_MODULE_ID_QSFP = 0xC,
MLX5_MODULE_ID_QSFP_PLUS= 0xD,
MLX5_MODULE_ID_QSFP28   = 0x11,
+   MLX5_MODULE_ID_DSFP = 0x1B,
 };
 
 enum mlx5_an_status {
-- 
2.18.2

[RFC PATCH V2 net-next 5/5] ethtool: Add fallback to get_module_eeprom from netlink command

2021-03-04 Thread Moshe Shemesh

From: Vladyslav Tarasiuk 

In case netlink get_module_eeprom_data_by_page() callback is not
implemented by the driver, try to call old get_module_info() and
get_module_eeprom() pair. Recalculate parameters to get_module_eeprom()
offset and len using page number and their sizes. Return error if
this can't be done.

Signed-off-by: Vladyslav Tarasiuk 
---
 net/ethtool/eeprom.c | 84 +++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/net/ethtool/eeprom.c b/net/ethtool/eeprom.c
index 2618a55b9a40..72c7714a9d37 100644
--- a/net/ethtool/eeprom.c
+++ b/net/ethtool/eeprom.c
@@ -26,6 +26,88 @@ struct eeprom_data_reply_data {
 #define EEPROM_DATA_REPDATA(__reply_base) \
container_of(__reply_base, struct eeprom_data_reply_data, base)
 
+static int fallback_set_params(struct eeprom_data_req_info *request,
+  struct ethtool_modinfo *modinfo,
+  struct ethtool_eeprom *eeprom)
+{
+   u32 offset = request->offset;
+   u32 length = request->length;
+
+   if (request->page)
+   offset = 128 + request->page * 128 + offset;
+
+   if (!length)
+   length = modinfo->eeprom_len;
+
+   if (offset >= modinfo->eeprom_len)
+   return -EINVAL;
+
+   if (modinfo->eeprom_len < offset + length)
+   length = modinfo->eeprom_len - offset;
+
+   eeprom->cmd = ETHTOOL_GMODULEEEPROM;
+   eeprom->len = length;
+   eeprom->offset = offset;
+
+   switch (modinfo->type) {
+   case ETH_MODULE_SFF_8079:
+   if (request->page > 1)
+   return -EINVAL;
+   break;
+   case ETH_MODULE_SFF_8472:
+   if (request->page > 3)
+   return -EINVAL;
+   break;
+   case ETH_MODULE_SFF_8436:
+   case ETH_MODULE_SFF_8636:
+   if (request->page > 3)
+   return -EINVAL;
+   break;
+   }
+   return 0;
+}
+
+static int eeprom_data_fallback(struct eeprom_data_req_info *request,
+   struct eeprom_data_reply_data *reply,
+   struct genl_info *info)
+{
+   struct net_device *dev = reply->base.dev;
+   struct ethtool_modinfo modinfo = {0};
+   struct ethtool_eeprom eeprom = {0};
+   u8 *data;
+   int err;
+
+   if ((!dev->ethtool_ops->get_module_info &&
+!dev->ethtool_ops->get_module_eeprom) ||
+   request->bank || request->i2c_address) {
+   return -EOPNOTSUPP;
+   }
+   modinfo.cmd = ETHTOOL_GMODULEINFO;
+   err = dev->ethtool_ops->get_module_info(dev, &modinfo);
+   if (err < 0)
+   return err;
+
+   err = fallback_set_params(request, &modinfo, &eeprom);
+   if (err < 0)
+   return err;
+
+   data = kmalloc(eeprom.len, GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+   err = dev->ethtool_ops->get_module_eeprom(dev, &eeprom, data);
+   if (err < 0)
+   goto err_out;
+
+   reply->data = data;
+   reply->length = eeprom.len;
+
+   return 0;
+
+err_out:
+   kfree(data);
+   return err;
+}
+
 static int eeprom_data_prepare_data(const struct ethnl_req_info *req_base,
struct ethnl_reply_data *reply_base,
struct genl_info *info)
@@ -37,7 +119,7 @@ static int eeprom_data_prepare_data(const struct 
ethnl_req_info *req_base,
int err;
 
if (!dev->ethtool_ops->get_module_eeprom_data_by_page)
-   return -EOPNOTSUPP;
+   return eeprom_data_fallback(request, reply, info);
 
page_data.offset = request->offset;
page_data.length = request->length;
-- 
2.18.2

Re: [PATCH net 1/3] net: ethernet: ixgbe: don't propagate -ENODEV from ixgbe_mii_bus_init()

2021-03-04 Thread Nguyen, Anthony L

On Thu, 2021-03-04 at 10:50 -0800, Jakub Kicinski wrote:
> On Wed,  3 Mar 2021 17:06:47 -0800 Tony Nguyen wrote:
> > From: Bartosz Golaszewski 
> > 
> > It's a valid use-case for ixgbe_mii_bus_init() to return -ENODEV -
> > we
> > still want to finalize the registration of the ixgbe device. Check
> > the
> > error code and don't bail out if err == -ENODEV.
> > 
> > This fixes an issue on C3000 family of SoCs where four ixgbe
> > devices
> > share a single MDIO bus and ixgbe_mii_bus_init() returns -ENODEV
> > for
> > three of them but we still want to register them.
> > 
> > Fixes: 09ef193fef7e ("net: ethernet: ixgbe: check the return value
> > of ixgbe_mii_bus_init()")
> > Reported-by: Yongxin Liu 
> > Signed-off-by: Bartosz Golaszewski 
> > Tested-by: Tony Brelinski 
> > Signed-off-by: Tony Nguyen 
> 
> Are you sure this is not already fixed upstream by:
> 
> bd7f14df9492 ("ixgbe: fix probing of multi-port devices with one
> MDIO")
> 
> ?

That looks to solve this issue. I'll drop this patch and resend the
series.

Thanks,
Tony

[BUG] hitting bug when running spinlock test

2021-03-04 Thread Jiri Olsa

hi,
I'm getting attached BUG/crash when running in parralel selftests, like:

  while :; do ./test_progs -t spinlock; done
  while :; do ./test_progs ; done

it's the latest bpf-next/master, I can send the .config if needed,
but I don't think there's anything special about it, because I saw
the bug on other servers with different generic configs

it looks like it's related to cgroup local storage, for some reason
the storage deref returns NULL

I'm bit lost in this code, so any help would be great ;-)

thanks,
jirka


---
...
[  382.324440] bpf_testmod: loading out-of-tree module taints kernel.
[  382.330670] bpf_testmod: module verification failed: signature and/or 
required key missing - tainting kernel
[  480.391667] perf: interrupt took too long (2540 > 2500), lowering 
kernel.perf_event_max_sample_rate to 78000
[  480.401730] perf: interrupt took too long (6860 > 6751), lowering 
kernel.perf_event_max_sample_rate to 29000
[  480.416172] perf: interrupt took too long (8602 > 8575), lowering 
kernel.perf_event_max_sample_rate to 23000
[  480.433053] BUG: kernel NULL pointer dereference, address: 
[  480.440014] #PF: supervisor read access in kernel mode
[  480.445153] #PF: error_code(0x) - not-present page
[  480.450294] PGD 800133a18067 P4D 800133a18067 PUD 10c019067 PMD 0 
[  480.457164] Oops:  [#1] PREEMPT SMP PTI
[  480.461350] CPU: 6 PID: 16689 Comm: test_progs Tainted: G  IOE 
5.11.0+ #11
[  480.469263] Hardware name: Dell Inc. PowerEdge R440/08CYF7, BIOS 1.7.0 
12/14/2018
[  480.476742] RIP: 0010:bpf_get_local_storage+0x13/0x50
[  480.481797] Code: e8 92 c5 8e 00 5d 89 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 
00 0f 1f 00 0f 1f 44 00 00 83 7f 18 15 74 10 65 48 8b 05 6d c6 e2 7e <48> 8b 00 
48 83 c0 10 c3 55 48 89 e5 53 65 48 8b 05 60 c6 e2 7e8
[  480.500540] RSP: 0018:c90001bd3ce0 EFLAGS: 00010293
[  480.505766] RAX:  RBX: 982a2595 RCX: 0018
[  480.512901] RDX: 0001 RSI:  RDI: 888149ccf000
[  480.520034] RBP: c90001bd3d20 R08: c90001bd3d04 R09: 888105121600
[  480.527164] R10: d3b93420 R11: 025c R12: 0734
[  480.534299] R13: 888149ccc710 R14:  R15: c9379048
[  480.541430] FS:  7f8f2357b640() GS:8897e098() 
knlGS:
[  480.549515] CS:  0010 DS:  ES:  CR0: 80050033
[  480.555262] CR2:  CR3: 00014e826006 CR4: 007706e0
[  480.562395] DR0:  DR1:  DR2: 
[  480.569527] DR3:  DR6: fffe0ff0 DR7: 0400
[  480.576660] PKRU: 5554
[  480.579372] Call Trace:
[  480.581829]  ? bpf_prog_c48154a736e5c014_bpf_sping_lock_test+0x2ba/0x860
[  480.588526]  bpf_test_run+0x127/0x2b0
[  480.592192]  ? __build_skb_around+0xb0/0xc0
[  480.596378]  bpf_prog_test_run_skb+0x32f/0x6b0
[  480.600824]  __do_sys_bpf+0xa94/0x2240
[  480.604577]  ? debug_smp_processor_id+0x17/0x20
[  480.609107]  ? __perf_event_task_sched_in+0x32/0x340
[  480.614077]  __x64_sys_bpf+0x1a/0x20
[  480.617653]  do_syscall_64+0x38/0x50
[  480.621233]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  480.626286] RIP: 0033:0x7f8f2467f55d
[  480.629865] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d eb 78 0c 00 f7 d8 64 89 018
[  480.648611] RSP: 002b:7f8f2357ad58 EFLAGS: 0206 ORIG_RAX: 
0141
[  480.656175] RAX: ffda RBX:  RCX: 7f8f2467f55d
[  480.663308] RDX: 0078 RSI: 7f8f2357ad60 RDI: 000a
[  480.670442] RBP: 7f8f2357ae28 R08:  R09: 0008
[  480.677574] R10:  R11: 0206 R12: 7f8f2357ae2c
[  480.684707] R13: 022df420 R14:  R15: 7f8f2357b640
[  480.691842] Modules linked in: bpf_testmod(OE) intel_rapl_msr 
intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
ipmi_ssif irqbypass rapl intel_cstate dell_smbios intel_uncore mei_]
[  480.739134] CR2: 
[  480.742452] ---[ end trace 807177cbb5e3b3da ]---
[  480.752174] RIP: 0010:bpf_get_local_storage+0x13/0x50
[  480.757230] Code: e8 92 c5 8e 00 5d 89 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 
00 0f 1f 00 0f 1f 44 00 00 83 7f 18 15 74 10 65 48 8b 05 6d c6 e2 7e <48> 8b 00 
48 83 c0 10 c3 55 48 89 e5 53 65 48 8b 05 60 c6 e2 7e8
[  480.775976] RSP: 0018:c90001bd3ce0 EFLAGS: 00010293
[  480.781202] RAX:  RBX: 982a2595 RCX: 0018
[  480.788335] RDX: 0001 RSI:  RDI: 888149ccf000
[  480.795466] RBP: c90001bd3d20 R08: c90001bd3d04 R09: 888105121600
[  480.802598] R10: d3b93420 R11: 025c R12: 0734
[  480.809730] R13: 888149ccc710 R14: 00

Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

2021-03-04 Thread Jakub Kicinski

On Thu, 4 Mar 2021 13:51:15 +0100 Eric Dumazet wrote:
> I think we are over thinking this really (especially if the fix needs
> a change in core networking or drivers)
> 
> We can reuse TSQ logic to have a chance to recover when the clone is
> eventually freed.
> This will be more generic, not only for the SYN+data of FastOpen.
> 
> Can you please test the following patch ?

#7 - Eric comes up with something much better :)


But so far doesn't seem to quite do it, I'm looking but maybe you'll
know right away (FWIW testing a v5.6 backport but I don't think TSQ
changed?):

On __tcp_retransmit_skb kretprobe:

==> Hit TFO case ret:-16 ca_state:0 skb:888fdb4bac00!

First hit:
__tcp_retransmit_skb+1
tcp_rcv_state_process+2488
tcp_v6_do_rcv+405
tcp_v6_rcv+2984
ip6_protocol_deliver_rcu+180
ip6_input_finish+17

Successful hit:
__tcp_retransmit_skb+1
tcp_retransmit_skb+18
tcp_retransmit_timer+716
tcp_write_timer_handler+136
tcp_write_timer+141
call_timer_fn+43

 skb:888fdb4bac00 --- delay:51642us bytes_acked:1

Re: [PATCH net] net: usb: cdc_ncm: don't spew notifications

2021-03-04 Thread Grant Grundler

On Wed, Jan 20, 2021 at 5:04 PM Jakub Kicinski  wrote:
>
> On Wed, 20 Jan 2021 03:38:32 + Hayes Wang wrote:
> > Grant Grundler 
> > > Sent: Wednesday, January 20, 2021 9:12 AM
> > > Subject: [PATCH net] net: usb: cdc_ncm: don't spew notifications
> > >
> > > RTL8156 sends notifications about every 32ms.
> > > Only display/log notifications when something changes.
> > >
> > > This issue has been reported by others:
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472
> > > https://lkml.org/lkml/2020/8/27/1083
> > >
> > > Chrome OS cannot support RTL8156 until this is fixed.
> > >
> > > Signed-off-by: Grant Grundler 
> >
> > Reviewed-by: Hayes Wang 
>
> Applied, thanks!
>
> net should be merged back into net-next by the end of the day, so
> if the other patches depend on this one to apply cleanly please keep
> an eye and post after that happens. If there is no conflict you can
> just post them with [PATCH net-next] now.

Jakub, sorry, I "dropped the ball" on this one. I'll repost with
"net-next" a bit later today.

cheers,
grant

[PATCH net v2 0/2][pull request] Intel Wired LAN Driver Updates 2021-03-04

2021-03-04 Thread Tony Nguyen

This series contains updates to ixgbe and ixgbevf drivers.

Antony Antony adds a check to fail for non transport mode SA with
offload as this is not supported for ixgbe and ixgbevf.

Dinghao Liu fixes a memory leak on failure to program a perfect filter
for ixgbe.

v2:
- Dropped patch 1

The following are changes since commit a9ecb0cbf03746b17a7c13bd8e3464e6789f73e8:
  rtnetlink: using dev_base_seq from target net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 10GbE

Antony Antony (1):
  ixgbe: fail to create xfrm offload of IPsec tunnel mode SA

Dinghao Liu (1):
  ixgbe: Fix memleak in ixgbe_configure_clsu32

 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 5 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 6 --
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 5 +
 3 files changed, 14 insertions(+), 2 deletions(-)

-- 
2.26.2

[PATCH net v2 2/2] ixgbe: Fix memleak in ixgbe_configure_clsu32

2021-03-04 Thread Tony Nguyen

From: Dinghao Liu 

When ixgbe_fdir_write_perfect_filter_82599() fails,
input allocated by kzalloc() has not been freed,
which leads to memleak.

Signed-off-by: Dinghao Liu 
Reviewed-by: Paul Menzel 
Tested-by: Tony Brelinski 
Signed-off-by: Tony Nguyen 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index fae84202d870..9f3f12e2ccf2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9565,8 +9565,10 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter 
*adapter,
ixgbe_atr_compute_perfect_hash_82599(&input->filter, mask);
err = ixgbe_fdir_write_perfect_filter_82599(hw, &input->filter,
input->sw_idx, queue);
-   if (!err)
-   ixgbe_update_ethtool_fdir_entry(adapter, input, input->sw_idx);
+   if (err)
+   goto err_out_w_lock;
+
+   ixgbe_update_ethtool_fdir_entry(adapter, input, input->sw_idx);
spin_unlock(&adapter->fdir_perfect_lock);
 
if ((uhtid != 0x800) && (adapter->jump_tables[uhtid]))
-- 
2.26.2

[PATCH net v2 1/2] ixgbe: fail to create xfrm offload of IPsec tunnel mode SA

2021-03-04 Thread Tony Nguyen

From: Antony Antony 

Based on talks and indirect references ixgbe IPsec offlod do not
support IPsec tunnel mode offload. It can only support IPsec transport
mode offload. Now explicitly fail when creating non transport mode SA
with offload to avoid false performance expectations.

Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Antony Antony 
Acked-by: Shannon Nelson 
Tested-by: Tony Brelinski 
Signed-off-by: Tony Nguyen 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 5 +
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index eca73526ac86..54d47265a7ac 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -575,6 +575,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
 
+   if (xs->props.mode != XFRM_MODE_TRANSPORT) {
+   netdev_err(dev, "Unsupported mode for ipsec offload\n");
+   return -EINVAL;
+   }
+
if (ixgbe_ipsec_check_mgmt_ip(xs)) {
netdev_err(dev, "IPsec IP addr clash with mgmt filters\n");
return -EINVAL;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c 
b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 5170dd9d8705..caaea2c920a6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -272,6 +272,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
 
+   if (xs->props.mode != XFRM_MODE_TRANSPORT) {
+   netdev_err(dev, "Unsupported mode for ipsec offload\n");
+   return -EINVAL;
+   }
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa rsa;
 
-- 
2.26.2

[PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash

2021-03-04 Thread michael-dev

From: Michael Braun 

When using jumbo packets and overrunning rx queue with napi enabled,
the following sequence is observed in gfar_add_rx_frag:

   | lstatus  |   | skb   |
t  | lstatus,  size, flags| first | len, data_len, *ptr   |
---+--+---+---+
13 | 18002348, 9032, INTERRUPT LAST   | 0 | 9600, 8000,  f554c12e |
12 | 1640, 1600, INTERRUPT| 0 | 8000, 6400,  f554c12e |
11 | 1640, 1600, INTERRUPT| 0 | 6400, 4800,  f554c12e |
10 | 1640, 1600, INTERRUPT| 0 | 4800, 3200,  f554c12e |
09 | 1640, 1600, INTERRUPT| 0 | 3200, 1600,  f554c12e |
08 | 14000640, 1600, INTERRUPT FIRST  | 0 | 1600, 0, f554c12e |
07 | 14000640, 1600, INTERRUPT FIRST  | 1 | 0,0, f554c12e |
06 | 1c80, 128,  INTERRUPT LAST FIRST | 1 | 0,0, abf3bd6e |
05 | 18002348, 9032, INTERRUPT LAST   | 0 | 8000, 6400,  c5a57780 |
04 | 1640, 1600, INTERRUPT| 0 | 6400, 4800,  c5a57780 |
03 | 1640, 1600, INTERRUPT| 0 | 4800, 3200,  c5a57780 |
02 | 1640, 1600, INTERRUPT| 0 | 3200, 1600,  c5a57780 |
01 | 1640, 1600, INTERRUPT| 0 | 1600, 0, c5a57780 |
00 | 14000640, 1600, INTERRUPT FIRST  | 1 | 0,0, c5a57780 |

So at t=7 a new packets is started but not finished, probably due to rx
overrun - but rx overrun is not indicated in the flags. Instead a new
packets starts at t=8. This results in skb->len to exceed size for the LAST
fragment at t=13 and thus a negative fragment size added to the skb.

This then crashes:

kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c

This patch fixes this by checking for computed LAST fragment size, so a
negative sized fragment is never added.
In order to prevent the newer rx frame from getting corrupted, the FIRST
flag is checked to discard the incomplete older frame.

Signed-off-by: Michael Braun 
---
 drivers/net/ethernet/freescale/gianfar.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 541de32ea662..1cf8ef717453 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, 
u32 lstatus,
if (lstatus & BD_LFLAG(RXBD_LAST))
size -= skb->len;
 
+   WARN(size < 0, "gianfar: rx fragment size underflow");
+   if (size < 0)
+   return false;
+
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
rxb->page_offset + RXBUF_ALIGNMENT,
size, GFAR_RXB_TRUESIZE);
@@ -2552,6 +2556,17 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q 
*rx_queue,
if (lstatus & BD_LFLAG(RXBD_EMPTY))
break;
 
+   /* lost RXBD_LAST descriptor due to overrun */
+   if (skb &&
+   (lstatus & BD_LFLAG(RXBD_FIRST))) {
+   /* discard faulty buffer */
+   dev_kfree_skb(skb);
+   skb = NULL;
+   rx_queue->stats.rx_dropped++;
+
+   /* can continue normally */
+   }
+
/* order rx buffer descriptor reads */
rmb();
 
-- 
2.20.1

Re: [PATCH] net: mellanox: mlx5: fix error return code in mlx5_fpga_device_start()

2021-03-04 Thread Heiner Kallweit

On 04.03.2021 15:18, Jia-Ju Bai wrote:
> When mlx5_is_fpga_lookaside() returns a non-zero value, no error 
> return code is assigned.
> To fix this bug, err is assigned with -EINVAL as error return code.
> 
To me it looks like the current behavior is intentional.
Did you verify that it's actually an error condition if the
function returns true? Please don't blindly trust such code checkers.

> Reported-by: TOTE Robot 
> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c
> index 2ce4241459ce..c9e6da97126f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c
> @@ -198,8 +198,10 @@ int mlx5_fpga_device_start(struct mlx5_core_dev *mdev)
>   mlx5_fpga_info(fdev, "FPGA card %s:%u\n", mlx5_fpga_name(fpga_id), 
> fpga_id);
>  
>   /* No QPs if FPGA does not participate in net processing */
> - if (mlx5_is_fpga_lookaside(fpga_id))
> + if (mlx5_is_fpga_lookaside(fpga_id)) {
> + err = -EINVAL;
>   goto out;
> + }
>  
>   mlx5_fpga_info(fdev, "%s(%d): image, version %u; SBU %06x:%04x version 
> %d\n",
>  mlx5_fpga_image_name(fdev->last_oper_image),
>

Re: [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash

2021-03-04 Thread Vladimir Oltean

Hi Michael,

On Thu, Mar 04, 2021 at 08:52:52PM +0100, michael-...@fami-braun.de wrote:
> From: Michael Braun 
>
> When using jumbo packets and overrunning rx queue with napi enabled,
> the following sequence is observed in gfar_add_rx_frag:
>
>| lstatus  |   | skb   |
> t  | lstatus,  size, flags| first | len, data_len, *ptr   |
> ---+--+---+---+
> 13 | 18002348, 9032, INTERRUPT LAST   | 0 | 9600, 8000,  f554c12e |
> 12 | 1640, 1600, INTERRUPT| 0 | 8000, 6400,  f554c12e |
> 11 | 1640, 1600, INTERRUPT| 0 | 6400, 4800,  f554c12e |
> 10 | 1640, 1600, INTERRUPT| 0 | 4800, 3200,  f554c12e |
> 09 | 1640, 1600, INTERRUPT| 0 | 3200, 1600,  f554c12e |
> 08 | 14000640, 1600, INTERRUPT FIRST  | 0 | 1600, 0, f554c12e |
> 07 | 14000640, 1600, INTERRUPT FIRST  | 1 | 0,0, f554c12e |
> 06 | 1c80, 128,  INTERRUPT LAST FIRST | 1 | 0,0, abf3bd6e |
> 05 | 18002348, 9032, INTERRUPT LAST   | 0 | 8000, 6400,  c5a57780 |
> 04 | 1640, 1600, INTERRUPT| 0 | 6400, 4800,  c5a57780 |
> 03 | 1640, 1600, INTERRUPT| 0 | 4800, 3200,  c5a57780 |
> 02 | 1640, 1600, INTERRUPT| 0 | 3200, 1600,  c5a57780 |
> 01 | 1640, 1600, INTERRUPT| 0 | 1600, 0, c5a57780 |
> 00 | 14000640, 1600, INTERRUPT FIRST  | 1 | 0,0, c5a57780 |
>
> So at t=7 a new packets is started but not finished, probably due to rx
> overrun - but rx overrun is not indicated in the flags. Instead a new
> packets starts at t=8. This results in skb->len to exceed size for the LAST
> fragment at t=13 and thus a negative fragment size added to the skb.
>
> This then crashes:
>
> kernel BUG at include/linux/skbuff.h:2277!
> Oops: Exception in kernel mode, sig: 5 [#1]
> ...
> NIP [c04689f4] skb_pull+0x2c/0x48
> LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
> Call Trace:
> [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
> [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
> [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
> [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
> [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
> [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
> [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
> [ec4bff08] [c0062718] kthread+0x140/0x144
> [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
>
> This patch fixes this by checking for computed LAST fragment size, so a
> negative sized fragment is never added.
> In order to prevent the newer rx frame from getting corrupted, the FIRST
> flag is checked to discard the incomplete older frame.
>
> Signed-off-by: Michael Braun 
> ---

Just for my understanding, do you have a reproducer for the issue?
I notice you haven't answered Claudiu's questions posted on v1.
On LS1021A I cannot trigger this apparent hardware issue even if I force
RX overruns (by reducing the ring size). Judging from the "NIP" register
from your stack trace, this is a PowerPC device, which one is it?

Re: [PATCH AUTOSEL 5.10 050/217] rsi: Fix TX EAPOL packet handling against iwlwifi AP

2021-03-04 Thread Sasha Levin


On Tue, Mar 02, 2021 at 08:25:49PM +0100, Marek Vasut wrote:

On 12/23/20 3:13 AM, Sasha Levin wrote:

Hello Sasha,


From: Marek Vasut 

[ Upstream commit 65277100caa2f2c62b6f3c4648b90d6f0435f3bc ]

In case RSI9116 SDIO WiFi operates in STA mode against Intel 9260 in AP mode,
the association fails. The former is using wpa_supplicant during association,
the later is set up using hostapd:


[...]

Was this patch possibly missed from 5.10.y ?


I'm not sure what happened there, but I can queue it up.

Also, while at it, I think it might make sense to pick the following 
two patches as well, they dramatically reduce interrupt rate of the 
RSI WiFi device, so it stops overloading lower-end devices:

287431463e786 ("rsi: Move card interrupt handling to RX thread")


And this one too.


abd131a19f6b8 ("rsi: Clean up loop in the interrupt handler")


But not this one, it looks like just a cleanup. Why is it needed?

--
Thanks,
Sasha

Re: [PATCH] net: sched: avoid duplicates in classes dump

2021-03-04 Thread Cong Wang

On Thu, Mar 4, 2021 at 6:44 AM Maximilian Heyne  wrote:
>
> This is a follow up of commit ea3274695353 ("net: sched: avoid
> duplicates in qdisc dump") which has fixed the issue only for the qdisc
> dump.
>
> The duplicate printing also occurs when dumping the classes via
>   tc class show dev eth0
>
> Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
> Signed-off-by: Maximilian Heyne 

Seems reasonable. If you can show the difference of tc class dump
before and after this patch in your changelog, it would be helpful to
understand the bug here.

Thanks.

[PATCH] netlink.7: note not reliable if NETLINK_NO_ENOBUFS

2021-03-04 Thread Alexander Aring

This patch adds a note to the netlink manpage that if NETLINK_NO_ENOBUFS
is set there is no additional handling to make netlink reliable. It just
disables the error notification. The used word "avoid" receiving ENOBUFS
errors can be interpreted that netlink tries to do some additional queue
handling to avoid that such scenario occurs at all, e.g. like zerocopy
which tries to avoid memory copy. However disable is not the right word
here as well that in some cases ENOBUFS can be still received. This
patch makes clear that there will no additional handling to put netlink
in a more reliable mode.

Signed-off-by: Alexander Aring 
---
 man7/netlink.7 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man7/netlink.7 b/man7/netlink.7
index c69bb62bf..2cb0d1a55 100644
--- a/man7/netlink.7
+++ b/man7/netlink.7
@@ -478,7 +478,7 @@ errors.
 .\"Author: Pablo Neira Ayuso 
 This flag can be used by unicast and broadcast listeners to avoid receiving
 .B ENOBUFS
-errors.
+errors. Note it does not turn netlink into any kind of more reliable mode.
 .TP
 .BR NETLINK_LISTEN_ALL_NSID " (since Linux 4.2)"
 .\"commit 59324cf35aba5336b611074028777838a963d03b
-- 
2.26.2

[PATCH resend] netlink.7: note not reliable if NETLINK_NO_ENOBUFS

2021-03-04 Thread Alexander Aring

This patch adds a note to the netlink manpage that if NETLINK_NO_ENOBUFS
is set there is no additional handling to make netlink reliable. It just
disables the error notification. The used word "avoid" receiving ENOBUFS
errors can be interpreted that netlink tries to do some additional queue
handling to avoid that such scenario occurs at all, e.g. like zerocopy
which tries to avoid memory copy. However disable is not the right word
here as well that in some cases ENOBUFS can be still received. This
patch makes clear that there will no additional handling to put netlink
in a more reliable mode.

Signed-off-by: Alexander Aring 
---
resend:
 - forgot linux-man mailinglist in cc, sorry.

 man7/netlink.7 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man7/netlink.7 b/man7/netlink.7
index c69bb62bf..2cb0d1a55 100644
--- a/man7/netlink.7
+++ b/man7/netlink.7
@@ -478,7 +478,7 @@ errors.
 .\"Author: Pablo Neira Ayuso 
 This flag can be used by unicast and broadcast listeners to avoid receiving
 .B ENOBUFS
-errors.
+errors. Note it does not turn netlink into any kind of more reliable mode.
 .TP
 .BR NETLINK_LISTEN_ALL_NSID " (since Linux 4.2)"
 .\"commit 59324cf35aba5336b611074028777838a963d03b
-- 
2.26.2

Re: [PATCH AUTOSEL 5.10 050/217] rsi: Fix TX EAPOL packet handling against iwlwifi AP

2021-03-04 Thread Marek Vasut


On 3/4/21 9:47 PM, Sasha Levin wrote:

On Tue, Mar 02, 2021 at 08:25:49PM +0100, Marek Vasut wrote:

On 12/23/20 3:13 AM, Sasha Levin wrote:

Hello Sasha,


From: Marek Vasut 

[ Upstream commit 65277100caa2f2c62b6f3c4648b90d6f0435f3bc ]

In case RSI9116 SDIO WiFi operates in STA mode against Intel 9260 in 
AP mode,
the association fails. The former is using wpa_supplicant during 
association,

the later is set up using hostapd:


[...]

Was this patch possibly missed from 5.10.y ?


I'm not sure what happened there, but I can queue it up.


Thank you

Also, while at it, I think it might make sense to pick the following 
two patches as well, they dramatically reduce interrupt rate of the 
RSI WiFi device, so it stops overloading lower-end devices:

287431463e786 ("rsi: Move card interrupt handling to RX thread")


And this one too.


Thanks


abd131a19f6b8 ("rsi: Clean up loop in the interrupt handler")


But not this one, it looks like just a cleanup. Why is it needed?


Now I got confused, yes, please skip abd131a19f6b8, thanks for spotting 
it. (I still have one more patch for the RSI wifi which I need to send 
out, but that's for later)

Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

2021-03-04 Thread Jonathan Lemon

On Thu, Mar 04, 2021 at 08:41:45PM +0100, Eric Dumazet wrote:
> On Thu, Mar 4, 2021 at 8:06 PM Jakub Kicinski  wrote:
> >
> > On Thu, 4 Mar 2021 13:51:15 +0100 Eric Dumazet wrote:
> > > I think we are over thinking this really (especially if the fix needs
> > > a change in core networking or drivers)
> > >
> > > We can reuse TSQ logic to have a chance to recover when the clone is
> > > eventually freed.
> > > This will be more generic, not only for the SYN+data of FastOpen.
> > >
> > > Can you please test the following patch ?
> >
> > #7 - Eric comes up with something much better :)
> >
> >
> > But so far doesn't seem to quite do it, I'm looking but maybe you'll
> > know right away (FWIW testing a v5.6 backport but I don't think TSQ
> > changed?):
> >
> > On __tcp_retransmit_skb kretprobe:
> >
> > ==> Hit TFO case ret:-16 ca_state:0 skb:888fdb4bac00!
> >
> > First hit:
> > __tcp_retransmit_skb+1
> > tcp_rcv_state_process+2488
> > tcp_v6_do_rcv+405
> > tcp_v6_rcv+2984
> > ip6_protocol_deliver_rcu+180
> > ip6_input_finish+17
> >
> > Successful hit:
> > __tcp_retransmit_skb+1
> > tcp_retransmit_skb+18
> > tcp_retransmit_timer+716
> > tcp_write_timer_handler+136
> > tcp_write_timer+141
> > call_timer_fn+43
> >
> >  skb:888fdb4bac00 --- delay:51642us bytes_acked:1
> 
> 
> Humm maybe one of the conditions used in tcp_tsq_write() does not hold...
> 
> if (tp->lost_out > tp->retrans_out &&
> tp->snd_cwnd > tcp_packets_in_flight(tp)) {
> tcp_mstamp_refresh(tp);
> tcp_xmit_retransmit_queue(sk);
> }
> 
> Maybe FastOpen case is 'special' and tp->lost_out is wrong.


Something like this?  (completely untested)
-- 
Jonathan

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 69a545db80d2..92bc9b0f4955 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5995,8 +5995,10 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, 
struct sk_buff *synack,
else
tp->fastopen_client_fail = TFO_DATA_NOT_ACKED;
skb_rbtree_walk_from(data) {
+   tcp_mark_skb_lost(sk, data);
if (__tcp_retransmit_skb(sk, data, 1))
break;
+   tp->retrans_out += tcp_skb_pcount(data);
}
tcp_rearm_rto(sk);
NET_INC_STATS(sock_net(sk),

1 2 3 >

1 - 100 of 231 matches

Mail list logo