Re: [PATCH bpf 2/2] bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro

2019-06-02 Thread Song Liu



> On Jun 1, 2019, at 6:09 PM, Martin Lau  wrote:
> 
> On Sat, Jun 01, 2019 at 04:54:46PM -0700, Song Liu wrote:
>> 
>> 
>>> On May 31, 2019, at 3:29 PM, Martin KaFai Lau  wrote:
>>> 
>>> When the commit a6024562ffd7 ("udp: Add GRO functions to UDP socket")
>>> added udp[46]_lib_lookup_skb to the udp_gro code path, it broke
>>> the reuseport_select_sock() assumption that skb->data is pointing
>>> to the transport header.
>>> 
>>> This patch follows an earlier __udp6_lib_err() fix by
>>> passing a NULL skb to avoid calling the reuseport's bpf_prog.
>>> 
>>> Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
>>> Cc: Tom Herbert 
>>> Signed-off-by: Martin KaFai Lau 
>>> ---
>>> net/ipv4/udp.c | 6 +-
>>> net/ipv6/udp.c | 2 +-
>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>>> index 8fb250ed53d4..85db0e3d7f3f 100644
>>> --- a/net/ipv4/udp.c
>>> +++ b/net/ipv4/udp.c
>>> @@ -503,7 +503,11 @@ static inline struct sock 
>>> *__udp4_lib_lookup_skb(struct sk_buff *skb,
> Note that this patch is changing the below "udp4_lib_lookup_skb()"
> instead of the above "__udp4_lib_lookup_skb()".
> 
>>> struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
>>>  __be16 sport, __be16 dport)
>>> {
>>> -   return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table);
>>> +   const struct iphdr *iph = ip_hdr(skb);
>>> +
>>> +   return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport,
>>> +iph->daddr, dport, inet_iif(skb),
>>> +inet_sdif(skb), &udp_table, NULL);
>> 
>> I think we can now remove the last argument of __udp4_lib_lookup()?
> The last arg of __udp4_lib_lookup() is skb.
> __udp4_lib_lookup_skb(), which is not changed in this patch, is still
> calling __udp4_lib_lookup() with a skb and the skb is used by the
> reuseport's bpf_prog.  Hence, it cannot be removed.

I see. I somehow missed this path. Thanks for the explanation. 

Acked-by: Song Liu 
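
Editorial sketch of the point settled above: whether the reuseport program runs depends on whether the caller hands over packet data. This is a toy model, not kernel code. Every `toy_` name is hypothetical, and the modulo stands in for whatever hash scaling the kernel really uses.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_sock { int id; };

/* Hypothetical stand-in for an attached reuseport BPF program. */
typedef struct toy_sock *(*toy_prog_fn)(const void *pkt);

struct toy_reuseport {
	struct toy_sock *socks[4];
	unsigned int num_socks;
	toy_prog_fn prog;	/* NULL when no program is attached */
};

/* A sample "program" that always steers to one pinned socket. */
static struct toy_sock toy_pinned = { 99 };

static struct toy_sock *toy_prog_pinned(const void *pkt)
{
	(void)pkt;
	return &toy_pinned;
}

static struct toy_sock *toy_select_sock(const struct toy_reuseport *reuse,
					uint32_t hash, const void *pkt)
{
	struct toy_sock *sk = NULL;

	if (reuse->num_socks == 0)
		return NULL;

	/* The program only runs when the caller supplies packet data. */
	if (reuse->prog && pkt)
		sk = reuse->prog(pkt);

	/* No packet, or no pick by the program: fall back to the hash. */
	if (!sk)
		sk = reuse->socks[hash % reuse->num_socks];

	return sk;
}
```

In this model, passing NULL as `pkt` is what the GRO path does after the patch, while a caller that still passes packet data (like __udp4_lib_lookup_skb() in the thread) keeps getting the program's choice.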

> 
>> 
>> 
>>> }
>>> EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb);
>>> 
>>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>>> index 133e6370f89c..4e52c37bb836 100644
>>> --- a/net/ipv6/udp.c
>>> +++ b/net/ipv6/udp.c
>>> @@ -243,7 +243,7 @@ struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
>>> 
>>> return __udp6_lib_lookup(dev_net(skb->dev), &iph->saddr, sport,
>>>  &iph->daddr, dport, inet6_iif(skb),
>>> -inet6_sdif(skb), &udp_table, skb);
>>> +inet6_sdif(skb), &udp_table, NULL);
>>> }
>>> EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb);
>>> 
>>> -- 
>>> 2.17.1
>>> 
>> 



Re: [PATCH RFC iproute2-next v2] tc: add support for act ctinfo

2019-06-02 Thread Kevin 'ldir' Darbyshire-Bryant
Please ignore this patch.  Have just realised I’ve sent completely
the wrong thing.  Somehow managed to send the kernel space patch
again which is already accepted.  I will send a v3 of the user space
patch shortly.

Apologies.

Kevin

> On 31 May 2019, at 09:10, ldir@icloud.com wrote:
> 
> From: Kevin Darbyshire-Bryant 
> 
> ctinfo is an action restoring data stored in conntrack marks to various
> fields.  At present it has two independent modes of operation,
> restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
> marks into packet skb marks.
> 
> It understands a number of parameters specific to this action in
> addition to the usual action syntax.  Each operating mode is
> independent of the other so all options are err, optional, however not
> specifying at least one mode is a bit pointless.
> 
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
> [CONTROL] [index ]
> 
> DSCP mode
> 
> dscp enables copying of a DSCP stored in the conntrack mark into the
> ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
> in the conntrack mark the DSCP value is stored.  It must be 6 contiguous
> bits long, e.g. 0xfc000000 would restore the DSCP from the upper 6 bits
> of the conntrack mark.
> 
> The DSCP copying may be optionally controlled by a statemask.  The
> statemask is a 32bit field, usually with a single bit set and must not
> overlap the dscp mask.  The DSCP restore operation will only take place
> if the corresponding bit/s in conntrack mark yield a non zero result.
> 
> e.g. dscp 0xfc000000/0x01000000 would retrieve the DSCP from the top 6
> bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
> example.
> 
> CPMARK mode
> 
> cpmark enables copying of the conntrack mark to the packet skb mark.  In
> this mode it is completely equivalent to the existing act_connmark.
> Additional functionality is provided by the optional mask parameter,
> whereby the stored conntrack mark is logically anded with the cpmark
> mask before being stored into skb mark.  This allows shared usage of the
> conntrack mark between applications.
> 
> e.g. cpmark 0x00ffffff would restore only the lower 24 bits of the
> conntrack mark, thus may be useful in the event that the upper 8 bits
> are used by the DSCP function.
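
Editorial sketch of the two modes described above, as plain bit arithmetic. It assumes a DSCP mask covering the upper six bits (0xfc000000), a statemask of 0x01000000 and a cpmark mask of 0x00ffffff, and it assumes that a zero statemask means "always restore". The `toy_` helpers are hypothetical and are not part of the kernel or of tc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of zero bits below the lowest set bit of the mask. */
static unsigned int toy_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	if (mask == 0)
		return 0;
	while (!(mask & 1u)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/*
 * DSCP mode: extract the DSCP stored under dscp_mask, but only when the
 * statemask is zero or selects a non-zero part of the conntrack mark.
 * Returns true and fills *dscp when a restore would take place.
 */
static bool toy_ctinfo_dscp(uint32_t ctmark, uint32_t dscp_mask,
			    uint32_t statemask, uint8_t *dscp)
{
	if (statemask && !(ctmark & statemask))
		return false;

	*dscp = (uint8_t)((ctmark & dscp_mask) >> toy_mask_shift(dscp_mask));
	return true;
}

/* CPMARK mode: the skb mark is the conntrack mark ANDed with the mask. */
static uint32_t toy_ctinfo_cpmark(uint32_t ctmark, uint32_t cpmark_mask)
{
	return ctmark & cpmark_mask;
}
```

With a conntrack mark of 0xb9001234, this model yields DSCP 46 (the flag bit is set) and an skb mark of 0x00001234. Clearing the flag bit (0xb8001234) makes the DSCP restore a no-op.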
> 
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
> [CONTROL] [index ]
> where :
>   dscp MASK is the bitmask to restore DSCP
>STATEMASK is the bitmask to determine conditional restoring
>   cpmark MASK mask applied to restored packet mark
>   ZONE is the conntrack zone
>   CONTROL := reclassify | pipe | drop | continue | ok |
>  goto chain 
> 
> Signed-off-by: Kevin Darbyshire-Bryant 
> ---
> v2 - fix whitespace issue in pkt_cls
> fix most warnings from checkpatch - some lines still over 80 chars
> due to long TLV names.
> include/uapi/linux/pkt_cls.h  |   1 +
> include/uapi/linux/tc_act/tc_ctinfo.h |  34 
> tc/Makefile   |   1 +
> tc/m_ctinfo.c | 251 ++
> 4 files changed, 287 insertions(+)
> create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
> create mode 100644 tc/m_ctinfo.c
> 
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 51a0496f..a93680fc 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -105,6 +105,7 @@ enum tca_id {
>   TCA_ID_IFE = TCA_ACT_IFE,
>   TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
>   /* other actions go here */
> + TCA_ID_CTINFO,
>   __TCA_ID_MAX = 255
> };
> 
> diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h 
> b/include/uapi/linux/tc_act/tc_ctinfo.h
> new file mode 100644
> index ..da803e05
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_ctinfo.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef __UAPI_TC_CTINFO_H
> +#define __UAPI_TC_CTINFO_H
> +
> +#include 
> +#include 
> +
> +struct tc_ctinfo {
> + tc_gen;
> +};
> +
> +enum {
> + TCA_CTINFO_UNSPEC,
> + TCA_CTINFO_PAD,
> + TCA_CTINFO_TM,
> + TCA_CTINFO_ACT,
> + TCA_CTINFO_ZONE,
> + TCA_CTINFO_PARMS_DSCP_MASK,
> + TCA_CTINFO_PARMS_DSCP_STATEMASK,
> + TCA_CTINFO_PARMS_CPMARK_MASK,
> + TCA_CTINFO_STATS_DSCP_SET,
> + TCA_CTINFO_STATS_DSCP_ERROR,
> + TCA_CTINFO_STATS_CPMARK_SET,
> + __TCA_CTINFO_MAX
> +};
> +
> +#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
> +
> +enum {
> + CTINFO_MODE_DSCP= BIT(0),
> + CTINFO_MODE_CPMARK  = BIT(1)
> +};
> +
> +#endif
> diff --git a/tc/Makefile b/tc/Makefile
> index 1a305cf4..60abddee 100644
> --- a/tc/Makefile
> +++ b/tc/Makefile
> @@ -48,6 +48,7 @@ TCMODULES += m_csum.o
> TCMODULES += m_simple.o
> TCMODULES += m_vlan.o
> TCMODULES += m_connmark.o
> +TCMODULES += m_ctinfo.o
> TCMODULES += m_bpf.o
> TCMODULES += m_tunnel_key.o

Re: [PATCH v3 bpf-next 2/2] libbpf: remove qidconf and better support external bpf programs.

2019-06-02 Thread Song Liu



> On Jun 1, 2019, at 9:18 PM, Jonathan Lemon  wrote:
> 
> 
> 
> On 1 Jun 2019, at 16:05, Song Liu wrote:
> 
>>> On May 31, 2019, at 11:57 AM, Jonathan Lemon  
>>> wrote:
>>> 
>>> Use the recent change to XSKMAP bpf_map_lookup_elem() to test if
>>> there is a xsk present in the map instead of duplicating the work
>>> with qidconf.
>>> 
>>> Fix things so callers using XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD
>>> bypass any internal bpf maps, so xsk_socket__{create|delete} works
>>> properly.
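
Editorial sketch of the idea in the paragraph above: one map lookup answers both "is a socket bound to this queue?" and "where should the packet go?", so a separate qidconf flag array is redundant. This is a userspace toy, not BPF. The `toy_` types are hypothetical, and the two action values are chosen to mirror XDP_PASS and XDP_REDIRECT.

```c
#include <stddef.h>

enum toy_xdp_action { TOY_XDP_PASS = 2, TOY_XDP_REDIRECT = 4 };

#define TOY_MAX_QUEUES 8

struct toy_xsk { int fd; };

/* One slot per RX queue; NULL means no AF_XDP socket is bound. */
struct toy_xskmap { struct toy_xsk *entries[TOY_MAX_QUEUES]; };

static struct toy_xsk *toy_xskmap_lookup(const struct toy_xskmap *map,
					 unsigned int index)
{
	if (index >= TOY_MAX_QUEUES)
		return NULL;
	return map->entries[index];
}

/* Mirrors the shape of the C program quoted in the patch comment. */
static enum toy_xdp_action toy_xdp_sock_prog(const struct toy_xskmap *map,
					     unsigned int rx_queue_index)
{
	if (toy_xskmap_lookup(map, rx_queue_index))
		return TOY_XDP_REDIRECT;

	return TOY_XDP_PASS;
}
```

Deleting a socket's entry from the map (as xsk_delete_bpf_maps() does in the patch) is then enough to make packets on that queue fall through to the pass case.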
>>> 
>>> Signed-off-by: Jonathan Lemon 
>>> ---
>>> tools/lib/bpf/xsk.c | 79 +
>>> 1 file changed, 16 insertions(+), 63 deletions(-)
>>> 
>>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
>>> index 38667b62f1fe..7ce7494b5b50 100644
>>> --- a/tools/lib/bpf/xsk.c
>>> +++ b/tools/lib/bpf/xsk.c
>>> @@ -60,10 +60,8 @@ struct xsk_socket {
>>> struct xsk_umem *umem;
>>> struct xsk_socket_config config;
>>> int fd;
>>> -   int xsks_map;
>>> int ifindex;
>>> int prog_fd;
>>> -   int qidconf_map_fd;
>>> int xsks_map_fd;
>>> __u32 queue_id;
>>> char ifname[IFNAMSIZ];
>>> @@ -265,15 +263,11 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
>>> /* This is the C-program:
>>>  * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
>>>  * {
>>> -* int *qidconf, index = ctx->rx_queue_index;
>>> +* int index = ctx->rx_queue_index;
>>>  *
>>>  * // A set entry here means that the correspnding queue_id
>>>  * // has an active AF_XDP socket bound to it.
>>> -* qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
>>> -* if (!qidconf)
>>> -* return XDP_ABORTED;
>>> -*
>>> -* if (*qidconf)
>>> +* if (bpf_map_lookup_elem(&xsks_map, &index))
>>>  * return bpf_redirect_map(&xsks_map, index, 0);
>>>  *
>>>  * return XDP_PASS;
>>> @@ -286,15 +280,10 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
>>> BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
>>> BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>>> BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
>>> -   BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
>>> +   BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
>>> BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
>>> BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
>>> -   BPF_MOV32_IMM(BPF_REG_0, 0),
>>> -   /* if r1 == 0 goto +8 */
>>> -   BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
>>> BPF_MOV32_IMM(BPF_REG_0, 2),
>>> -   /* r1 = *(u32 *)(r1 + 0) */
>>> -   BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
>>> /* if r1 == 0 goto +5 */
>>> BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
>>> /* r2 = *(u32 *)(r10 - 4) */
>>> @@ -366,18 +355,11 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
>>> if (max_queues < 0)
>>> return max_queues;
>>> 
>>> -   fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
>>> +   fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
>>>  sizeof(int), sizeof(int), max_queues, 0);
>>> if (fd < 0)
>>> return fd;
>>> -   xsk->qidconf_map_fd = fd;
>>> 
>>> -   fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
>>> -sizeof(int), sizeof(int), max_queues, 0);
>>> -   if (fd < 0) {
>>> -   close(xsk->qidconf_map_fd);
>>> -   return fd;
>>> -   }
>>> xsk->xsks_map_fd = fd;
>>> 
>>> return 0;
>>> @@ -385,10 +367,8 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
>>> 
>>> static void xsk_delete_bpf_maps(struct xsk_socket *xsk)
>>> {
>>> -   close(xsk->qidconf_map_fd);
>>> +   bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
>>> close(xsk->xsks_map_fd);
>>> -   xsk->qidconf_map_fd = -1;
>>> -   xsk->xsks_map_fd = -1;
>>> }
>>> 
>>> static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
>>> @@ -417,10 +397,9 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
>>> if (err)
>>> goto out_map_ids;
>>> 
>>> -   for (i = 0; i < prog_info.nr_map_ids; i++) {
>>> -   if (xsk->qidconf_map_fd != -1 && xsk->xsks_map_fd != -1)
>>> -   break;
>>> +   xsk->xsks_map_fd = -1;
>>> 
>>> +   for (i = 0; i < prog_info.nr_map_ids; i++) {
>>> fd = bpf_map_get_fd_by_id(map_ids[i]);
>>> if (fd < 0)
>>> continue;
>>> @@ -431,11 +410,6 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
>>> continue;
>>> }
>>> 
>>> -   if (!strcmp(map_info.name, "qidconf_map")) {
>>> -   xsk->qidconf_map_fd = fd;
>>> -   continue;
>>> -   }
>>> -
>>> if (!strcmp(map_info.name, "xsks_map")) {
>>> xsk->xsks_map_fd = fd;
>>> continue;
>>> @@ -445,40 +419,18 @@ static int xsk_lookup_bpf_maps

Re: [PATCH net-next 01/13] net: axienet: Fixed 64-bit compile, enable build on X86 and ARM

2019-06-02 Thread kbuild test robot
Hi Robert,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Robert-Hancock/Xilinx-axienet-driver-updates/20190602-124146
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: 
>> cast to restricted __be32
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: 
>> cast to restricted __be32
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: 
>> cast to restricted __be32
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: 
>> cast to restricted __be32
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: 
>> cast to restricted __be32
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: 
>> cast to restricted __be32
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:35: sparse: sparse: 
>> incorrect type in assignment (different base types) @@expected 
>> restricted __wsum [usertype] csum @@got  [usertype] csum @@
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:35: sparse:
>> expected restricted __wsum [usertype] csum
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:35: sparse:got 
>> unsigned int

vim +778 drivers/net/ethernet/xilinx/xilinx_axienet_main.c

8a3b7a25 Daniel Borkmann   2012-01-19  728  
8a3b7a25 Daniel Borkmann   2012-01-19  729  /**
8a3b7a25 Daniel Borkmann   2012-01-19  730   * axienet_recv - Is called from 
Axi DMA Rx Isr to complete the received
8a3b7a25 Daniel Borkmann   2012-01-19  731   *BD processing.
8a3b7a25 Daniel Borkmann   2012-01-19  732   * @ndev:   Pointer to net_device 
structure.
8a3b7a25 Daniel Borkmann   2012-01-19  733   *
8a3b7a25 Daniel Borkmann   2012-01-19  734   * This function is invoked from 
the Axi DMA Rx isr to process the Rx BDs. It
8a3b7a25 Daniel Borkmann   2012-01-19  735   * does minimal processing and 
invokes "netif_rx" to complete further
8a3b7a25 Daniel Borkmann   2012-01-19  736   * processing.
8a3b7a25 Daniel Borkmann   2012-01-19  737   */
8a3b7a25 Daniel Borkmann   2012-01-19  738  static void axienet_recv(struct 
net_device *ndev)
8a3b7a25 Daniel Borkmann   2012-01-19  739  {
8a3b7a25 Daniel Borkmann   2012-01-19  740  u32 length;
8a3b7a25 Daniel Borkmann   2012-01-19  741  u32 csumstatus;
8a3b7a25 Daniel Borkmann   2012-01-19  742  u32 size = 0;
8a3b7a25 Daniel Borkmann   2012-01-19  743  u32 packets = 0;
38e96b35 Peter Crosthwaite 2015-05-05  744  dma_addr_t tail_p = 0;
8a3b7a25 Daniel Borkmann   2012-01-19  745  struct axienet_local *lp = 
netdev_priv(ndev);
8a3b7a25 Daniel Borkmann   2012-01-19  746  struct sk_buff *skb, *new_skb;
8a3b7a25 Daniel Borkmann   2012-01-19  747  struct axidma_bd *cur_p;
8a3b7a25 Daniel Borkmann   2012-01-19  748  
8a3b7a25 Daniel Borkmann   2012-01-19  749  cur_p = 
&lp->rx_bd_v[lp->rx_bd_ci];
8a3b7a25 Daniel Borkmann   2012-01-19  750  
8a3b7a25 Daniel Borkmann   2012-01-19  751  while ((cur_p->status & 
XAXIDMA_BD_STS_COMPLETE_MASK)) {
38e96b35 Peter Crosthwaite 2015-05-05  752  tail_p = lp->rx_bd_p + 
sizeof(*lp->rx_bd_v) * lp->rx_bd_ci;
8a3b7a25 Daniel Borkmann   2012-01-19  753  
8a3b7a25 Daniel Borkmann   2012-01-19  754  
dma_unmap_single(ndev->dev.parent, cur_p->phys,
8a3b7a25 Daniel Borkmann   2012-01-19  755   
lp->max_frm_size,
8a3b7a25 Daniel Borkmann   2012-01-19  756   
DMA_FROM_DEVICE);
8a3b7a25 Daniel Borkmann   2012-01-19  757  
2f148c6d Robert Hancock2019-05-31  758  skb = cur_p->skb;
2f148c6d Robert Hancock2019-05-31  759  cur_p->skb = NULL;
2f148c6d Robert Hancock2019-05-31  760  length = cur_p->app4 & 
0x;
2f148c6d Robert Hancock2019-05-31  761  
8a3b7a25 Daniel Borkmann   2012-01-19  762  skb_put(skb, length);
8a3b7a25 Daniel Borkmann   2012-01-19  763  skb->protocol = 
eth_type_trans(skb, ndev);
8a3b7a25 Daniel Borkmann   2012-01-19  764  
/*skb_checksum_none_assert(skb);*/
8a3b7a25 Daniel Borkmann   2012-01-19  765  skb->ip_summed = 
CHECKSUM_NONE;
8a3b7a25 Daniel Borkmann   2012-01-19  766  
8a3b7a25 Daniel Borkmann   2012-01-19  767  /* if we're doing Rx 
csum offload, set it up */
8a3b7a25 Daniel Borkmann   2012-01-19  7

[PATCH net-next] r8169: use paged versions of phylib MDIO access functions

2019-06-02 Thread Heiner Kallweit
Use paged versions of phylib MDIO access functions to simplify
the code.
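
Editorial sketch of what a "paged" access folds together: select a page through register 0x1f, touch the register, put the previous page back. It is a toy register file written for illustration, not phylib. The `toy_` names and the page-count limit are assumptions, and the modify semantics `(old & ~mask) | set` are assumed to match the usual phylib convention.

```c
#include <stdint.h>

#define TOY_PAGE_REG	0x1f
#define TOY_NUM_PAGES	0x1000

struct toy_phy {
	uint16_t page;				/* currently selected page */
	uint16_t regs[TOY_NUM_PAGES][32];	/* per-page register file */
};

static uint16_t toy_phy_read(struct toy_phy *phy, unsigned int reg)
{
	if (reg == TOY_PAGE_REG)
		return phy->page;
	return phy->regs[phy->page][reg];
}

static void toy_phy_write(struct toy_phy *phy, unsigned int reg, uint16_t val)
{
	if (reg == TOY_PAGE_REG)
		phy->page = val % TOY_NUM_PAGES;
	else
		phy->regs[phy->page][reg] = val;
}

/* Read one register on a given page, then restore the previous page. */
static uint16_t toy_phy_read_paged(struct toy_phy *phy, uint16_t page,
				   unsigned int reg)
{
	uint16_t old_page = toy_phy_read(phy, TOY_PAGE_REG);
	uint16_t val;

	toy_phy_write(phy, TOY_PAGE_REG, page);
	val = toy_phy_read(phy, reg);
	toy_phy_write(phy, TOY_PAGE_REG, old_page);
	return val;
}

/* Clear the bits in mask, set the bits in set, on a given page. */
static void toy_phy_modify_paged(struct toy_phy *phy, uint16_t page,
				 unsigned int reg, uint16_t mask, uint16_t set)
{
	uint16_t old_page = toy_phy_read(phy, TOY_PAGE_REG);
	uint16_t val;

	toy_phy_write(phy, TOY_PAGE_REG, page);
	val = toy_phy_read(phy, reg);
	toy_phy_write(phy, reg, (uint16_t)((val & ~mask) | set));
	toy_phy_write(phy, TOY_PAGE_REG, old_page);
}
```

In this model, `toy_phy_modify_paged(&phy, 0x0a43, 0x11, 0, 1 << 4)` corresponds to the three-call sequence that rtl8168g_config_eee_phy() used before the patch, with the difference that the page is restored to whatever was selected before rather than hard-coded.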

Signed-off-by: Heiner Kallweit 
---
 drivers/net/ethernet/realtek/r8169.c | 105 +--
 1 file changed, 33 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 2705eb510..53a4e3a73 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1969,9 +1969,7 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
ret = phy_read_mmd(phydev, MDIO_MMD_PCS, MDIO_PCS_EEE_ABLE);
break;
case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-   phy_write(phydev, 0x1f, 0x0a5c);
-   ret = phy_read(phydev, 0x12);
-   phy_write(phydev, 0x1f, 0x);
+   ret = phy_read_paged(phydev, 0x0a5c, 0x12);
break;
default:
ret = -EPROTONOSUPPORT;
@@ -1994,9 +1992,7 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_LPABLE);
break;
case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-   phy_write(phydev, 0x1f, 0x0a5d);
-   ret = phy_read(phydev, 0x11);
-   phy_write(phydev, 0x1f, 0x);
+   ret = phy_read_paged(phydev, 0x0a5d, 0x11);
break;
default:
ret = -EPROTONOSUPPORT;
@@ -2019,9 +2015,7 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV);
break;
case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-   phy_write(phydev, 0x1f, 0x0a5d);
-   ret = phy_read(phydev, 0x10);
-   phy_write(phydev, 0x1f, 0x);
+   ret = phy_read_paged(phydev, 0x0a5d, 0x10);
break;
default:
ret = -EPROTONOSUPPORT;
@@ -2044,9 +2038,7 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, 
int val)
ret = phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, val);
break;
case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-   phy_write(phydev, 0x1f, 0x0a5d);
-   phy_write(phydev, 0x10, val);
-   phy_write(phydev, 0x1f, 0x);
+   phy_write_paged(phydev, 0x0a5d, 0x10, val);
break;
default:
ret = -EPROTONOSUPPORT;
@@ -2582,9 +2574,7 @@ static void rtl8168f_config_eee_phy(struct 
rtl8169_private *tp)
 
 static void rtl8168g_config_eee_phy(struct rtl8169_private *tp)
 {
-   phy_write(tp->phydev, 0x1f, 0x0a43);
-   phy_set_bits(tp->phydev, 0x11, BIT(4));
-   phy_write(tp->phydev, 0x1f, 0x);
+   phy_modify_paged(tp->phydev, 0x0a43, 0x11, 0, BIT(4));
 }
 
 static void rtl8169s_hw_phy_config(struct rtl8169_private *tp)
@@ -3483,20 +3473,15 @@ static void rtl8411_hw_phy_config(struct 
rtl8169_private *tp)
 
 static void rtl8168g_disable_aldps(struct rtl8169_private *tp)
 {
-   phy_write(tp->phydev, 0x1f, 0x0a43);
-   phy_clear_bits(tp->phydev, 0x10, BIT(2));
+   phy_modify_paged(tp->phydev, 0x0a43, 0x10, BIT(2), 0);
 }
 
 static void rtl8168g_phy_adjust_10m_aldps(struct rtl8169_private *tp)
 {
struct phy_device *phydev = tp->phydev;
 
-   phy_write(phydev, 0x1f, 0x0bcc);
-   phy_clear_bits(phydev, 0x14, BIT(8));
-
-   phy_write(phydev, 0x1f, 0x0a44);
-   phy_set_bits(phydev, 0x11, BIT(7) | BIT(6));
-
+   phy_modify_paged(phydev, 0x0bcc, 0x14, BIT(8), 0);
+   phy_modify_paged(phydev, 0x0a44, 0x11, 0, BIT(7) | BIT(6));
phy_write(phydev, 0x1f, 0x0a43);
phy_write(phydev, 0x13, 0x8084);
phy_clear_bits(phydev, 0x14, BIT(14) | BIT(13));
@@ -3507,43 +3492,36 @@ static void rtl8168g_phy_adjust_10m_aldps(struct 
rtl8169_private *tp)
 
 static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp)
 {
+   int ret;
+
rtl_apply_firmware(tp);
 
-   rtl_writephy(tp, 0x1f, 0x0a46);
-   if (rtl_readphy(tp, 0x10) & 0x0100) {
-   rtl_writephy(tp, 0x1f, 0x0bcc);
-   rtl_w0w1_phy(tp, 0x12, 0x, 0x8000);
-   } else {
-   rtl_writephy(tp, 0x1f, 0x0bcc);
-   rtl_w0w1_phy(tp, 0x12, 0x8000, 0x);
-   }
+   ret = phy_read_paged(tp->phydev, 0x0a46, 0x10);
+   if (ret & BIT(8))
+   phy_modify_paged(tp->phydev, 0x0bcc, 0x12, BIT(15), 0);
+   else
+   phy_modify_paged(tp->phydev, 0x0bcc, 0x12, 0, BIT(15));
 
-   rtl_writephy(tp, 0x1f, 0x0a46);
-   if (rtl_readphy(tp, 0x13) & 0x0100) {
-   rtl_writephy(tp, 0x1f, 0x0c41);
-   rtl_w0w1_phy(tp, 0x15, 0x0002, 0x);
-   } else {
-   rtl_writephy(tp, 0x1f, 0x0c41);
-   rtl_w0w1_phy(tp, 0x15, 0x, 0x0002);
-   }
+   ret = phy_read_paged(tp->phydev, 0x

Re: [PATCH 3/8] dt-bindings: net: bluetooth: Add rtl8723bs-bluetooth

2019-06-02 Thread Luca Weiss
On Dienstag, 19. Februar 2019 15:14:01 CEST Rob Herring wrote:
> > > How is this used?
> > 
> > rtl8723bs-bt needs 2 firmware binaries -- one is actual firmware,
> > another is firmware config which is specific to the board. If
> > firmware-postfix is specified, driver appends it to the name of config
> > and requests board-specific config while loading firmware. I.e. if
> > 'pine64' is specified as firmware-postfix driver will load
> > rtl8723bs_config-pine64.bin.
> 
> We already have 'firmware-name' defined and I'd prefer not to have
> another way to do things. The difference is just you have to give the
> full filename.
> 

Hi Rob,

I'm working on a v2 for this patchset and I've looked at how using
"firmware-name" with the full filename would be possible, but as David Summers
has already written [1], the existing code [2] takes this "postfix" as a
parameter and basically fills it into a filename template
("${CFG_NAME}-${POSTFIX}.bin"). So either we stay with the "firmware-postfix"
property or the existing code would have to be modified to accommodate the full
filename; if using firmware-postfix is unacceptable, I can rework the existing
code.
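
Editorial sketch of the filename template mentioned above, assuming the config name and postfix are joined with a dash and a ".bin" suffix as the email describes. `toy_cfg_filename` is a hypothetical helper for illustration and is not the function in btrtl.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<cfg_name>-<postfix>.bin" when a postfix is given, otherwise
 * "<cfg_name>.bin". Returns what snprintf() returns.
 */
static int toy_cfg_filename(char *buf, size_t len, const char *cfg_name,
			    const char *postfix)
{
	if (postfix && postfix[0] != '\0')
		return snprintf(buf, len, "%s-%s.bin", cfg_name, postfix);

	return snprintf(buf, len, "%s.bin", cfg_name);
}
```

With cfg_name "rtl8723bs_config" and postfix "pine64" this produces "rtl8723bs_config-pine64.bin", the name given earlier in the thread. A "firmware-name" property would instead carry that whole string and skip the template.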

Luca

[1] https://lore.kernel.org/netdev/d06e3c30-a34a-bd84-9cdf-535f25384...@davidjohnsummers.uk/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bluetooth/btrtl.c#n566



Re: [RFC PATCH 6/6] seg6: Add support to rearrange SRH for AH ICV calculation

2019-06-02 Thread Ahmed Abdelsalam
On Fri, 31 May 2019 10:34:03 -0700
Tom Herbert  wrote:

> On Fri, May 31, 2019 at 10:07 AM Ahmed Abdelsalam
>  wrote:
> >
> > On Fri, 31 May 2019 09:48:40 -0700
> > Tom Herbert  wrote:
> >
> > > Mutable fields related to segment routing are: destination address,
> > > segments left, and modifiable TLVs (those whose high order bit is set).
> > >
> > > Add support to rearrange a segment routing (type 4) routing header to
> > > handle these mutability requirements. This is described in
> > > draft-herbert-ipv6-srh-ah-00.
> >
> > Hi Tom,
> > Assuming that the IETF process needs to be fixed, then, IMO, that should
> > not come at the cost of breaking the kernel process here.
> 
> Ahmed,
> 
> I do not see how this is in any way breaking the kernel process. The
> kernel is beholden to the needs of users to provide robust and secure
> implementations, not to some baroque IETF or other SDO processes. When
> those are in conflict, the needs of our users should prevail.
> 
> > Let us add to the kernel things that have been reviewed and reached some 
> > consensus.
> 
> By that argument, segment routing should never have been added to the
> kernel since consensus has not been reached on it yet or at least
> portions of it. In fact, if you look at this patch set, most of the
> changes are actually bug fixes to bring the implementation into
> conformance with a later version of the draft. For instance, there was
> never consensus reached on the HMAC flag; now it's gone and we need to
> remove it from the implementation.
> 
> > For new features that still need to be reviewed we can have them outside 
> > the kernel tree for community to use.
> > This way the community does not get blocked by IETF process but also keep 
> > the kernel tree stable.
> 
> In any case, that does not address the issue of a user using both
> segment routing and authentication which leads to adverse behaviors.
> AFAICT, the kernel does not prevent this today. So I ask again: what
> is your alternative to address this?
> 
> Thanks,
> Tom

Tom,
Yes, the needs of users should prevail. But it’s not Tom or Ahmed alone who
should define users' needs.
The comparison between "draft-herbert-ipv6-srh-ah-00" and
"draft-ietf-6man-segment-routing-header" is missing some facts. The first patch
of the SRH implementation was submitted to the kernel two years after the SRH
draft was released. By that time, the draft had been adopted by the working
group and was co-authored by several vendors, operators and academia. Please
refer to the first SRH patch submitted to the kernel
(https://patchwork.ozlabs.org/patch/663176/). I still don’t see the point of
rushing to upstream something that was defined a couple of days ago. Besides,
nothing prevents anyone from "innovating" in their own private kernel tree.

-- 
Ahmed Abdelsalam 


Re: [net-next PATCH] net: rtnetlink: Enslave device before bringing it up

2019-06-02 Thread Phil Sutter
Hi David,

On Fri, May 31, 2019 at 02:26:15PM -0700, David Miller wrote:
> From: Phil Sutter 
> Date: Wed, 29 May 2019 15:51:20 +0200
> 
> > Unlike with bridges, one can't add an interface to a bond and set it up
> > at the same time:
> > 
> > | # ip link set dummy0 down
> > | # ip link set dummy0 master bond0 up
> > | Error: Device can not be enslaved while up.
> > 
> > Of all drivers with ndo_add_slave callback, bond and team decline if
> > IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
> > immediately up again) and the others just don't care.
> > 
> > Support the common notion of setting the interface up after enslaving it
> > by sorting the operations accordingly.
> > 
> > Signed-off-by: Phil Sutter 
> 
> What about other flags like IFF_PROMISCUITY?

Crap, that's the crux: Upon enslaving, team driver propagates
IFF_PROMISC and IFF_ALLMULTI flags from master to slave. With my change,
these propagations roll back by accident. So please disregard this
patch, I'll have to find a different solution.

Thanks, Phil


[PATCH net] selftests: set sysctl bc_forwarding properly in router_broadcast.sh

2019-06-02 Thread Xin Long
sysctl setting bc_forwarding for $rp2 is needed when ping_test_from h2,
otherwise the bc packets from $rp2 won't be forwarded. This patch is to
add this setting for $rp2.

Also, as ping_test_from does grep "$from" only, which could match some
unexpected output, some test cases don't really work, like:

  # ping_test_from $h2 198.51.200.255 198.51.200.2
PING 198.51.200.255 from 198.51.100.2 veth3: 56(84) bytes of data.
64 bytes from 198.51.100.1: icmp_seq=1 ttl=64 time=0.336 ms

When doing grep $from (198.51.200.2), the output could still match, since
the PING header line contains 198.51.200.255. So change to
grep "bytes from $from" instead.

Fixes: 40f98b9af943 ("selftests: add a selftest for directed broadcast 
forwarding")
Signed-off-by: Xin Long 
---
 tools/testing/selftests/net/forwarding/router_broadcast.sh | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/router_broadcast.sh 
b/tools/testing/selftests/net/forwarding/router_broadcast.sh
index 9a678ec..4eac0a0 100755
--- a/tools/testing/selftests/net/forwarding/router_broadcast.sh
+++ b/tools/testing/selftests/net/forwarding/router_broadcast.sh
@@ -145,16 +145,19 @@ bc_forwarding_disable()
 {
sysctl_set net.ipv4.conf.all.bc_forwarding 0
sysctl_set net.ipv4.conf.$rp1.bc_forwarding 0
+   sysctl_set net.ipv4.conf.$rp2.bc_forwarding 0
 }
 
 bc_forwarding_enable()
 {
sysctl_set net.ipv4.conf.all.bc_forwarding 1
sysctl_set net.ipv4.conf.$rp1.bc_forwarding 1
+   sysctl_set net.ipv4.conf.$rp2.bc_forwarding 1
 }
 
 bc_forwarding_restore()
 {
+   sysctl_restore net.ipv4.conf.$rp2.bc_forwarding
sysctl_restore net.ipv4.conf.$rp1.bc_forwarding
sysctl_restore net.ipv4.conf.all.bc_forwarding
 }
@@ -171,7 +174,7 @@ ping_test_from()
log_info "ping $dip, expected reply from $from"
ip vrf exec $(master_name_get $oif) \
$PING -I $oif $dip -c 10 -i 0.1 -w $PING_TIMEOUT -b 2>&1 \
-   | grep $from &> /dev/null
+   | grep "bytes from $from" > /dev/null
check_err_fail $fail $?
 }
 
-- 
2.1.0
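
Editorial sketch of why the grep pattern had to be tightened, using the two output lines quoted in the commit message. Plain substring matching stands in for grep here (an approximation, since grep treats the pattern as a regular expression), and the `toy_` helpers are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Old check: any occurrence of the expected address counts as a hit. */
static int toy_match_loose(const char *line, const char *from)
{
	return strstr(line, from) != NULL;
}

/* New check: the address must follow "bytes from ", as in a reply line. */
static int toy_match_strict(const char *line, const char *from)
{
	char needle[128];

	snprintf(needle, sizeof(needle), "bytes from %s", from);
	return strstr(line, needle) != NULL;
}
```

Against the header line "PING 198.51.200.255 from 198.51.100.2 veth3: 56(84) bytes of data." the loose check reports a hit for 198.51.200.2 even though no reply came from that address, while the strict check does not. The strict form is still a prefix match, so it would also accept a reply from an address that merely starts with the expected one.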



[PATCH net] ipv4: not do cache for local delivery if bc_forwarding is enabled

2019-06-02 Thread Xin Long
With the topo:

h1 ---| rp1|
  | route  rp3 |--- h3 (192.168.200.1)
h2 ---| rp2|

If bc_forwarding is set on rp1 but not on rp2, then after doing
"ping 192.168.200.255" on h1 and then on h2, the packets from h2 can
still be forwarded.

This issue was caused by the input route cache. It should only do
the cache for either bc forwarding or local delivery. Otherwise,
local delivery can use the route cache for bc forwarding of other
interfaces.

This patch is to fix it by not doing cache for local delivery if
all.bc_forwarding is enabled.

Note that we don't fix it by checking route cache local flag after
rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
common route code shouldn't be touched for bc_forwarding.

Fixes: 5cbf777cfdf6 ("route: add support for directed broadcast forwarding")
Reported-by: Jianlin Shi 
Signed-off-by: Xin Long 
---
 net/ipv4/route.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 11ddc27..91bf75b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1985,7 +1985,7 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
u32 itag = 0;
struct rtable   *rth;
struct flowi4   fl4;
-   bool do_cache;
+   bool do_cache = true;
 
/* IP on this device is disabled. */
 
@@ -2062,6 +2062,9 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
if (res->type == RTN_BROADCAST) {
if (IN_DEV_BFORWARD(in_dev))
goto make_route;
+   /* not do cache if bc_forwarding is enabled */
+   if (IPV4_DEVCONF_ALL(net, BC_FORWARDING))
+   do_cache = false;
goto brd_input;
}
 
@@ -2099,18 +2102,15 @@ out:return err;
RT_CACHE_STAT_INC(in_brd);
 
 local_input:
-   do_cache = false;
-   if (res->fi) {
-   if (!itag) {
-   struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+   do_cache &= res->fi && !itag;
+   if (do_cache) {
+   struct fib_nh_common *nhc = FIB_RES_NHC(*res);
 
-   rth = rcu_dereference(nhc->nhc_rth_input);
-   if (rt_cache_valid(rth)) {
-   skb_dst_set_noref(skb, &rth->dst);
-   err = 0;
-   goto out;
-   }
-   do_cache = true;
+   rth = rcu_dereference(nhc->nhc_rth_input);
+   if (rt_cache_valid(rth)) {
+   skb_dst_set_noref(skb, &rth->dst);
+   err = 0;
+   goto out;
}
}
 
-- 
2.1.0
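
Editorial sketch of the decision the patch makes for a broadcast destination, reduced to booleans. It is a model of the control flow visible in the diff, not the routing code itself. The `toy_` names are hypothetical, and caching on the forwarding path is deliberately left out.

```c
#include <stdbool.h>

enum toy_brd_path { TOY_MAKE_ROUTE, TOY_BRD_INPUT };

struct toy_decision {
	enum toy_brd_path path;
	bool do_cache;	/* only meaningful for TOY_BRD_INPUT */
};

/*
 * in_dev_bforward:   result of IN_DEV_BFORWARD(in_dev) in the patch
 * all_bc_forwarding: result of IPV4_DEVCONF_ALL(net, BC_FORWARDING)
 * has_fib_info/itag: stand-ins for res->fi and itag
 */
static struct toy_decision toy_broadcast_decision(bool in_dev_bforward,
						  bool all_bc_forwarding,
						  bool has_fib_info,
						  unsigned int itag)
{
	struct toy_decision d = { TOY_BRD_INPUT, true };

	if (in_dev_bforward) {
		/* Forwarding path; its caching is not modelled here. */
		d.path = TOY_MAKE_ROUTE;
		d.do_cache = false;
		return d;
	}

	/* Local delivery must not share a cache entry with forwarding. */
	if (all_bc_forwarding)
		d.do_cache = false;

	/* Same condition as "do_cache &= res->fi && !itag" in the patch. */
	d.do_cache = d.do_cache && has_fib_info && !itag;
	return d;
}
```

For the topology in the commit message, h2's packet arrives on rp2 with the per-device setting off but all.bc_forwarding on, so this model takes the local-delivery path without touching the cache that rp1's forwarding may have filled.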



[PATCH net] ipv6: fix the check before getting the cookie in rt6_get_cookie

2019-06-02 Thread Xin Long
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.

It's caused by commit 93531c674315 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can only be obtained when
'c1' (see below) holds at the time dst_cookie is set, whereas rt6_check() is
called to check dst_cookie when 'c1' does not hold, as we can see in
ip6_dst_check().

Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.

c1:
  (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))

Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based 
routes")
Reported-by: Jianlin Shi 
Signed-off-by: Xin Long 
---
 include/net/ip6_fib.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 525f701..d6d936c 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -263,8 +263,7 @@ static inline u32 rt6_get_cookie(const struct rt6_info *rt)
rcu_read_lock();
 
from = rcu_dereference(rt->from);
-   if (from && (rt->rt6i_flags & RTF_PCPU ||
-   unlikely(!list_empty(&rt->rt6i_uncached))))
+   if (from)
fib6_get_cookie_safe(from, &cookie);
 
rcu_read_unlock();
-- 
2.1.0



[PATCH net] netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments

2019-06-02 Thread Guillaume Nault
With commit 997dd9647164 ("net: IP6 defrag: use rbtrees in
nf_conntrack_reasm.c"), nf_ct_frag6_reasm() is now called from
nf_ct_frag6_queue(). With this change, nf_ct_frag6_queue() can fail
after the skb has been added to the fragment queue and
nf_ct_frag6_gather() was adapted to handle this case.

But nf_ct_frag6_queue() can still fail before the fragment has been
queued. nf_ct_frag6_gather() can't handle this case anymore, because it
has no way to know if nf_ct_frag6_queue() queued the fragment before
failing. If it didn't, the skb is lost as the error code is overwritten
with -EINPROGRESS.

Fix this by setting -EINPROGRESS directly in nf_ct_frag6_queue(), so
that nf_ct_frag6_gather() can propagate the error as is.

Fixes: 997dd9647164 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
Signed-off-by: Guillaume Nault 
---
Not sure if this should go to the net or nf tree (as the original patch
went through net). Anyway this patch applies cleanly to both.

 net/ipv6/netfilter/nf_conntrack_reasm.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 3de0e9b0a482..5b3f65e29b6f 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -293,7 +293,11 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
skb->_skb_refdst = 0UL;
err = nf_ct_frag6_reasm(fq, skb, prev, dev);
skb->_skb_refdst = orefdst;
-   return err;
+
+   /* After queue has assumed skb ownership, only 0 or
+* -EINPROGRESS must be returned.
+*/
+   return err ? -EINPROGRESS : 0;
}
 
skb_dst_drop(skb);
@@ -480,12 +484,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
ret = 0;
}
 
-   /* after queue has assumed skb ownership, only 0 or -EINPROGRESS
-* must be returned.
-*/
-   if (ret)
-   ret = -EINPROGRESS;
-
spin_unlock_bh(&fq->q.lock);
inet_frag_put(&fq->q);
return ret;
-- 
2.20.1
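
Illustration (not part of the patch): the ownership rule described in the
commit message can be modelled in a few lines of user-space C. This is only
a sketch under stated assumptions: struct frag, reasm(), queue_frag() and
gather() are hypothetical stand-ins for the skb, nf_ct_frag6_reasm(),
nf_ct_frag6_queue() and nf_ct_frag6_gather(), and the -EPROTO and -ENOMEM
values are chosen arbitrarily for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fragment; not the kernel's sk_buff. */
struct frag {
	bool valid;   /* passes the checks done before queueing */
	bool queued;  /* set once the queue has taken ownership */
};

/* Hypothetical reassembly step that may fail after the fragment is queued. */
static int reasm(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Errors before queueing are returned unchanged, so the caller knows it
 * still owns the fragment. After queueing, only 0 or -EINPROGRESS is
 * returned. */
static int queue_frag(struct frag *f, bool reasm_fails)
{
	int err;

	assert(f);
	if (!f->valid)
		return -EPROTO;      /* not queued: caller still owns f */

	f->queued = true;            /* the queue now owns f */
	err = reasm(reasm_fails);
	return err ? -EINPROGRESS : 0;
}

/* The caller no longer rewrites the error; it propagates it as is. */
static int gather(struct frag *f, bool reasm_fails)
{
	return queue_frag(f, reasm_fails);
}
```

With this split, a failure before queueing reaches the caller with its real
error code, while a failure after queueing is reported as -EINPROGRESS.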



[PATCH] netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather

2019-06-02 Thread wenxu
From: wenxu 

When CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set, the build fails with:

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot 
Signed-off-by: wenxu 
---
 net/ipv6/netfilter.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 9530cc2..96d7abf 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,8 +238,10 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
.route_input= ip6_route_input,
.fragment   = ip6_fragment,
.reroute= nf_ip6_reroute,
-#if IS_MODULE(CONFIG_IPV6)
+#if IS_MODULE(CONFIG_IPV6) && IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
.br_defrag  = nf_ct_frag6_gather,
+#endif
+#if IS_MODULE(CONFIG_IPV6)
.br_fragment= br_ip6_fragment,
 #endif
 };
-- 
1.8.3.1



[PATCH net-next v2] netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather

2019-06-02 Thread wenxu
From: wenxu 

When CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set, the build fails with:

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot 
Signed-off-by: wenxu 
---
v2: Forgot to include "net-next"

 net/ipv6/netfilter.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 9530cc2..96d7abf 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,8 +238,10 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
.route_input= ip6_route_input,
.fragment   = ip6_fragment,
.reroute= nf_ip6_reroute,
-#if IS_MODULE(CONFIG_IPV6)
+#if IS_MODULE(CONFIG_IPV6) && IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
.br_defrag  = nf_ct_frag6_gather,
+#endif
+#if IS_MODULE(CONFIG_IPV6)
.br_fragment= br_ip6_fragment,
 #endif
 };
-- 
1.8.3.1



Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56

2019-06-02 Thread Marc Haber
On Thu, May 30, 2019 at 10:12:57AM +0200, Marc Haber wrote:
> on my primary notebook, a Lenovo X260, with an Intel Wireless 8260
> (8086:24f3), running Debian unstable, I have started to see network
> hangs since upgrading to kernel 5.1. In this situation, I cannot
> restart Network-Manager (the call just hangs), I can log out of X, but
> the system does not cleanly shut down and I need to Magic SysRq myself
> out of the running system. This happens about once every two days.

The issue is also present in 5.1.5 and 5.1.6.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


[PATCH] net: phylink: avoid reducing support mask

2019-06-02 Thread Russell King
Avoid reducing the support mask as a result of the interface type
selected for SFP modules, or when setting the link settings through
ethtool - this should only change when the supported link modes of
the hardware combination change.

Signed-off-by: Russell King 
---
 drivers/net/phy/phylink.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9044b95d2afe..4c0616ba314d 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1073,6 +1073,7 @@ EXPORT_SYMBOL_GPL(phylink_ethtool_ksettings_get);
 int phylink_ethtool_ksettings_set(struct phylink *pl,
  const struct ethtool_link_ksettings *kset)
 {
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(support);
struct ethtool_link_ksettings our_kset;
struct phylink_link_state config;
int ret;
@@ -1083,11 +1084,12 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
kset->base.autoneg != AUTONEG_ENABLE)
return -EINVAL;
 
+   linkmode_copy(support, pl->supported);
config = pl->link_config;
 
/* Mask out unsupported advertisements */
linkmode_and(config.advertising, kset->link_modes.advertising,
-pl->supported);
+support);
 
/* FIXME: should we reject autoneg if phy/mac does not support it? */
if (kset->base.autoneg == AUTONEG_DISABLE) {
@@ -1097,7 +1099,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
 * duplex.
 */
s = phy_lookup_setting(kset->base.speed, kset->base.duplex,
-  pl->supported, false);
+  support, false);
if (!s)
return -EINVAL;
 
@@ -1126,7 +1128,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
__set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, config.advertising);
}
 
-   if (phylink_validate(pl, pl->supported, &config))
+   if (phylink_validate(pl, support, &config))
return -EINVAL;
 
/* If autonegotiation is enabled, we must have an advertisement */
@@ -1576,6 +1578,7 @@ static int phylink_sfp_module_insert(void *upstream,
 {
struct phylink *pl = upstream;
__ETHTOOL_DECLARE_LINK_MODE_MASK(support) = { 0, };
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(support1);
struct phylink_link_state config;
phy_interface_t iface;
int ret = 0;
@@ -1603,6 +1606,8 @@ static int phylink_sfp_module_insert(void *upstream,
return ret;
}
 
+   linkmode_copy(support1, support);
+
iface = sfp_select_interface(pl->sfp_bus, id, config.advertising);
if (iface == PHY_INTERFACE_MODE_NA) {
netdev_err(pl->netdev,
@@ -1612,7 +1617,7 @@ static int phylink_sfp_module_insert(void *upstream,
}
 
config.interface = iface;
-   ret = phylink_validate(pl, support, &config);
+   ret = phylink_validate(pl, support1, &config);
if (ret) {
netdev_err(pl->netdev, "validation of %s/%s with support %*pb failed: %d\n",
   phylink_an_mode_str(MLO_AN_INBAND),
-- 
2.7.4
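
Illustration (not part of the patch): the idea of validating against a
scratch copy, so that the stored support mask is never reduced, can be
sketched in user-space C. The sketch assumes that the validation step may
narrow the mask it is handed; struct link, validate() and try_config() are
hypothetical names, and a plain 64-bit integer stands in for the link mode
bitmap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the phylink instance. */
struct link {
	uint64_t supported;   /* modes the hardware combination supports */
};

/* Hypothetical validator: narrows *support to the modes allowed by
 * iface_modes and fails if nothing is left. */
static int validate(uint64_t *support, uint64_t iface_modes)
{
	*support &= iface_modes;
	return *support ? 0 : -1;
}

/* Validate a requested configuration against a copy of the mask, so
 * that l->supported keeps its full value whatever validate() does. */
static int try_config(const struct link *l, uint64_t iface_modes)
{
	uint64_t support;

	assert(l);
	support = l->supported;   /* plays the role of linkmode_copy() */
	return validate(&support, iface_modes);
}
```

Whether validation succeeds or fails, l->supported is unchanged afterwards,
which is the property the patch is after.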



[PATCH] net: sfp: read eeprom in maximum 16 byte increments

2019-06-02 Thread Russell King
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time.  This behaviour is not specified
in the SFP MSAs, which specify:

 "The serial interface uses the 2-wire serial CMOS E2PROM protocol
  defined for the ATMEL AT24C01A/02/04 family of components."

and

 "As long as the SFP+ receives an acknowledge, it shall serially clock
  out sequential data words. The sequence is terminated when the host
  responds with a NACK and a STOP instead of an acknowledge."

We must avoid breaking a read across a 16-bit quantity in the diagnostic
page; thankfully, all 16-bit quantities in that page are naturally
aligned.

Signed-off-by: Russell King 
---
 drivers/net/phy/sfp.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index d4635c2178d1..71812be0ac64 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -281,6 +281,7 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
 {
struct i2c_msg msgs[2];
u8 bus_addr = a2 ? 0x51 : 0x50;
+   size_t this_len;
int ret;
 
msgs[0].addr = bus_addr;
@@ -292,11 +293,26 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
msgs[1].len = len;
msgs[1].buf = buf;
 
-   ret = i2c_transfer(sfp->i2c, msgs, ARRAY_SIZE(msgs));
-   if (ret < 0)
-   return ret;
+   while (len) {
+   this_len = len;
+   if (this_len > 16)
+   this_len = 16;
 
-   return ret == ARRAY_SIZE(msgs) ? len : 0;
+   msgs[1].len = this_len;
+
+   ret = i2c_transfer(sfp->i2c, msgs, ARRAY_SIZE(msgs));
+   if (ret < 0)
+   return ret;
+
+   if (ret != ARRAY_SIZE(msgs))
+   break;
+
+   msgs[1].buf += this_len;
+   dev_addr += this_len;
+   len -= this_len;
+   }
+
+   return msgs[1].buf - (u8 *)buf;
 }
 
 static int sfp_i2c_write(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
-- 
2.7.4
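
Illustration (not part of the patch): the chunking loop can be tried outside
the kernel with a small user-space model. This is a sketch, not the driver
code: fake_eeprom, fake_transfer() and chunked_read() are hypothetical
stand-ins for the module EEPROM, i2c_transfer() and sfp_i2c_read(). Only the
16-byte limit is taken from the commit message, and, unlike the patch, the
sketch returns -1 on any failed transfer instead of a partial length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_MAX 16               /* per-read limit from the commit message */

static uint8_t fake_eeprom[256];   /* hypothetical module EEPROM contents */
static size_t longest_read;        /* longest single transfer seen so far */

/* Hypothetical stand-in for i2c_transfer(): 0 on success, -1 on error. */
static int fake_transfer(uint8_t dev_addr, uint8_t *buf, size_t len)
{
	if ((size_t)dev_addr + len > sizeof(fake_eeprom))
		return -1;
	if (len > longest_read)
		longest_read = len;
	memcpy(buf, &fake_eeprom[dev_addr], len);
	return 0;
}

/* Read len bytes starting at dev_addr, never asking for more than
 * CHUNK_MAX bytes per transfer. Returns the number of bytes read,
 * or -1 if any transfer fails. */
static long chunked_read(uint8_t dev_addr, void *buf, size_t len)
{
	uint8_t *p = buf;

	assert(buf);
	while (len) {
		size_t this_len = len > CHUNK_MAX ? CHUNK_MAX : len;

		if (fake_transfer(dev_addr, p, this_len) < 0)
			return -1;

		p += this_len;
		dev_addr = (uint8_t)(dev_addr + this_len);
		len -= this_len;
	}
	return (long)(p - (uint8_t *)buf);
}
```

A 40-byte read is split into transfers of 16, 16 and 8 bytes. Because every
chunk but the last is 16 bytes long, a read that starts on an even address
never splits a naturally aligned 16-bit quantity.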



Re: [PATCH net-next] net: phy: phylink: add fallback from SGMII to 1000BaseX

2019-06-02 Thread Russell King - ARM Linux admin
On Fri, May 31, 2019 at 06:17:51PM -0600, Robert Hancock wrote:
> Our device is mainly intended for fiber modules, which is why 1000BaseX
> is being used. The variant of fiber modules we are using (for example,
> Finisar FCLF8520P2BTL) are set up for 1000BaseX, and seem like they are
> kind of a hack to allow using copper on devices which only support
> 1000BaseX mode (in fact that particular one is extra hacky since you
> have to disable 1000BaseX autonegotiation on the host side). This patch
> is basically intended to allow that particular case to work.

Looking at the data sheet for FCLF8520P2BTL, it explicitly states:

PRODUCT SELECTION
Part Number     Link Indicator    1000BASE-X auto-negotiation
                on RX_LOS Pin     enabled by default
FCLF8520P2BTL   Yes               No
FCLF8521P2BTL   No                Yes
FCLF8522P2BTL   Yes               Yes

The idea being, you buy the correct one according to what the host
equipment requires, rather than just picking one and hoping it works.

The data sheet goes on to mention that the module uses a Marvell
88e PHY, which seems to be quite common for copper SFPs from
multiple manufacturers (but not all) and is very flexible in how it
can be configured.

If we detect a PHY on the SFP module, we check whether it is
an 88e PHY, and then read out its configured link type.  We don't
have a way to deal with the difference between FCLF8520P2BTL and
FCLF8521P2BTL, but at least we'll be able to tell whether we should
be in 1000Base-X mode for these modules, rather than SGMII.

For a SFP cage meant to support fiber, I would recommend using the
FCLF8521P2BTL or FCLF8522P2BTL since those will behave more like an
802.3z standards-compliant gigabit fiber connection.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 6/6] seg6: Add support to rearrange SRH for AH ICV calculation

2019-06-02 Thread Tom Herbert
On Sun, Jun 2, 2019 at 2:54 AM Ahmed Abdelsalam  wrote:
>
> On Fri, 31 May 2019 10:34:03 -0700
> Tom Herbert  wrote:
>
> > On Fri, May 31, 2019 at 10:07 AM Ahmed Abdelsalam
> >  wrote:
> > >
> > > On Fri, 31 May 2019 09:48:40 -0700
> > > Tom Herbert  wrote:
> > >
> > > > Mutable fields related to segment routing are: destination address,
> > > > segments left, and modifiable TLVs (those whose high order bit is set).
> > > >
> > > > Add support to rearrange a segment routing (type 4) routing header to
> > > > handle these mutability requirements. This is described in
> > > > draft-herbert-ipv6-srh-ah-00.
> > >
> > > Hi Tom,
> > > Assuming that IETF process needs to be fixed, then, IMO, should not be on 
> > > the cost of breaking the kernel process here.
> >
> > Ahmed,
> >
> > I do not see how this is any way breaking the kernel process. The
> > kernel is beholden to the needs of users provide a robust and secure
> > implementations, not to some baroque IETF or other SDO processes. When
> > those are in conflict, the needs of our users should prevail.
> >
> > > Let us add to the kernel things that have been reviewed and reached some 
> > > consensus.
> >
> > By that argument, segment routing should never have been added to the
> > kernel since consensus has not be reached on it yet or at least
> > portions of it. In fact, if you look at this patch set, most of the
> > changes are actually bug fixes to bring the implementation into
> > conformance with a later version of the draft. For instance, there was
> > never consensus reached on the HMAC flag; now it's gone and we need to
> > remove it from the implementation.
> >
> > > For new features that still need to be reviewed we can have them outside 
> > > the kernel tree for community to use.
> > > This way the community does not get blocked by IETF process but also keep 
> > > the kernel tree stable.
> >
> > In any case, that does not address the issue of a user using both
> > segment routing and authentication which leads to adverse behaviors.
> > AFAICT, the kernel does not prevent this today. So I ask again: what
> > is your alternative to address this?
> >
> > Thanks,
> > Tom
>
> Tom,
> Yes, the needs for users should prevail. But it’s not Tom or Ahmed alone who 
> should define users needs.
> The comparison between "draft-herbert-ipv6-srh-ah-00" and 
> "draft-ietf-6man-segment-routing-header" is
> missing some facts. The first patch of the SRH implementation was submitted 
> to the kernel two years after
> releasing the SRH draft. By this time, the draft was a working group adopted 
> and co-authored by several
> vendors, operators and academia. Please refer to the first SRH patch 
> submitted to the kernel
> (https://patchwork.ozlabs.org/patch/663176/). I still don’t see the point of 
> rushing to upstream something
> that has been defined couple of days ago. Plus there is nothing that prevents 
> anyone to "innovate" in his
> own private kernel tree.

Ahmed,

While you seem to think that this was just defined and came out of the
blue a few days ago, in fact this has been in discussion for many months. The
simultaneous use of segment routing and authentication header was not
defined-- but it is defined for other routing types and extension
headers. The primary drivers of segment routing (the academics,
operators, and vendors you refer to) were reluctant to do this. These
are mostly routing vendors who don't care about
preserving end-to-end host functionality like AH. In order to define
an interoperable protocol, the mutability of fields needs to be
defined. They were unwilling to commit to defining what is mutable in
their protocol, and it took an intervening action of the working group
chairs to force them to clarify the requirements so now we have
something.

IMO, this is a straightforward bug fix. If you want to say that we need
to wait for IETF to take action, okay, but then I strongly suggest
that you actively participate in the process (i.e. send to 6man list
what you think about the draft), as opposed to just passively
deferring to it and assuming others will do the right thing.

Tom

>
> --
> Ahmed Abdelsalam 


[PATCH net-next] net: fix use-after-free in kfree_skb_list

2019-06-02 Thread Eric Dumazet
syzbot reported nasty use-after-free [1]

Let's remove the frag_list field from structs ip_fraglist_iter
and ip6_fraglist_iter. It does not seem to be needed anyway.

[1] :
BUG: KASAN: use-after-free in kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
Read of size 8 at addr 888085a3cbc0 by task syz-executor303/8947

CPU: 0 PID: 8947 Comm: syz-executor303 Not tainted 5.2.0-rc2+ #12
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
 ip6_fragment+0x1ef4/0x2680 net/ipv6/ip6_output.c:882
 __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
 ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
 NF_HOOK_COND include/linux/netfilter.h:294 [inline]
 ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
 dst_output include/net/dst.h:433 [inline]
 ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
 ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
 ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
 rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
 rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:671
 ___sys_sendmsg+0x803/0x920 net/socket.c:2292
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
 __do_sys_sendmsg net/socket.c:2339 [inline]
 __se_sys_sendmsg net/socket.c:2337 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44add9
Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f826f33bce8 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 006e7a18 RCX: 0044add9
RDX:  RSI: 2240 RDI: 0005
RBP: 006e7a10 R08:  R09: 
R10:  R11: 0246 R12: 006e7a1c
R13: 7ffcec4f7ebf R14: 7f826f33c9c0 R15: 20c49ba5e353f7cf

Allocated by task 8947:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
 slab_post_alloc_hook mm/slab.h:437 [inline]
 slab_alloc_node mm/slab.c:3269 [inline]
 kmem_cache_alloc_node+0x131/0x710 mm/slab.c:3579
 __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:199
 alloc_skb include/linux/skbuff.h:1058 [inline]
 __ip6_append_data.isra.0+0x2a24/0x3640 net/ipv6/ip6_output.c:1519
 ip6_append_data+0x1e5/0x320 net/ipv6/ip6_output.c:1688
 rawv6_sendmsg+0x1467/0x35e0 net/ipv6/raw.c:940
 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:671
 ___sys_sendmsg+0x803/0x920 net/socket.c:2292
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
 __do_sys_sendmsg net/socket.c:2339 [inline]
 __se_sys_sendmsg net/socket.c:2337 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 8947:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kmem_cache_free+0x86/0x260 mm/slab.c:3698
 kfree_skbmem net/core/skbuff.c:625 [inline]
 kfree_skbmem+0xc5/0x150 net/core/skbuff.c:619
 __kfree_skb net/core/skbuff.c:682 [inline]
 kfree_skb net/core/skbuff.c:699 [inline]
 kfree_skb+0xf0/0x390 net/core/skbuff.c:693
 kfree_skb_list+0x44/0x60 net/core/skbuff.c:708
 __dev_xmit_skb net/core/dev.c:3551 [inline]
 __dev_queue_xmit+0x3034/0x36b0 net/core/dev.c:3850
 dev_queue_xmit+0x18/0x20 net/core/dev.c:3914
 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1532
 neigh_output include/net/neighbour.h:511 [inline]
 ip6_finish_output2+0x1034/0x2550 net/ipv6/ip6_output.c:120
 ip6_fragment+0x1ebb/0x2680 net/ipv6/ip6_output.c:863
 __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
 ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
 NF_HOOK_COND include/linux/netfilter.h:294 [inline]
 ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
 dst_output include/net/dst.h:433 [inline]
 ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
 ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
 ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
 rawv6_

[PATCH RFC iproute2-next v3] tc: add support for action act_ctinfo

2019-06-02 Thread Kevin Darbyshire-Bryant
ctinfo is an action restoring data stored in conntrack marks to various
fields.  At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.

It understands a number of parameters specific to this action in
addition to the usual action syntax.  Each operating mode is
independent of the other so all options are optional; however, not
specifying at least one mode is a bit pointless.

Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
  [CONTROL] [index ]

DSCP mode

dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is stored.  It must be 6 contiguous
bits long, e.g. 0xfc000000 would restore the DSCP from the upper 6 bits
of the conntrack mark.

The DSCP copying may be optionally controlled by a statemask.  The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask.  The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark yield a non zero result.

eg. dscp 0xfc000000/0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
example.

CPMARK mode

cpmark enables copying of the conntrack mark to the packet skb mark.  In
this mode it is completely equivalent to the existing act_connmark.
Additional functionality is provided by the optional mask parameter,
whereby the stored conntrack mark is logically anded with the cpmark
mask before being stored into skb mark.  This allows shared usage of the
conntrack mark between applications.

eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.

Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
  [CONTROL] [index ]
where :
dscp MASK is the bitmask to restore DSCP
 STATEMASK is the bitmask to determine conditional restoring
cpmark MASK mask applied to restored packet mark
ZONE is the conntrack zone
CONTROL := reclassify | pipe | drop | continue | ok |
   goto chain 
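
Illustration (not part of the patch): the mask arithmetic of the two modes
described above can be sketched in plain user-space C. This is a minimal
sketch and not the kernel action; mask_shift(), ctmark_to_dscp() and
ctmark_to_skbmark() are hypothetical helper names, and returning -1 for
"no restore" is a convention chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the lowest set bit of a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* DSCP mode: returns the DSCP stored under dscpmask in ctmark, or -1
 * when a statemask is given and none of its bits are set in ctmark
 * (in which case no restore takes place). */
static int ctmark_to_dscp(uint32_t ctmark, uint32_t dscpmask,
			  uint32_t statemask)
{
	if (statemask && !(ctmark & statemask))
		return -1;
	return (int)((ctmark & dscpmask) >> mask_shift(dscpmask));
}

/* CPMARK mode: the conntrack mark ANDed with the cpmark mask. */
static uint32_t ctmark_to_skbmark(uint32_t ctmark, uint32_t cpmask)
{
	return ctmark & cpmask;
}
```

For a 32-bit mark, a DSCP mask covering the upper 6 bits is 0xfc000000 and
a mask covering the lower 24 bits is 0x00ffffff, so the two modes can share
one conntrack mark without overlapping.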

Signed-off-by: Kevin Darbyshire-Bryant 

---
v2 - fix whitespace issue in pkt_cls
 fix most warnings from checkpatch - some lines still over 80 chars
 due to long TLV names.
v3 - fix some dangling else warnings.
 refactor stats printing to please checkpatch.
 send zone TLV even if default '0' zone.
 now checkpatch clean even though I think some of the formatting
 is horrible :-)
 sending via google's smtp 'cos MS' exchange office365 appears
 to mangle patches from git send-email.

 include/uapi/linux/pkt_cls.h  |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h |  34 
 tc/Makefile   |   1 +
 tc/m_ctinfo.c | 262 ++
 4 files changed, 298 insertions(+)
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 tc/m_ctinfo.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f..a93680fc 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
TCA_ID_IFE = TCA_ACT_IFE,
TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
/* other actions go here */
+   TCA_ID_CTINFO,
__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index ..da803e05
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include 
+#include 
+
+struct tc_ctinfo {
+   tc_gen;
+};
+
+enum {
+   TCA_CTINFO_UNSPEC,
+   TCA_CTINFO_PAD,
+   TCA_CTINFO_TM,
+   TCA_CTINFO_ACT,
+   TCA_CTINFO_ZONE,
+   TCA_CTINFO_PARMS_DSCP_MASK,
+   TCA_CTINFO_PARMS_DSCP_STATEMASK,
+   TCA_CTINFO_PARMS_CPMARK_MASK,
+   TCA_CTINFO_STATS_DSCP_SET,
+   TCA_CTINFO_STATS_DSCP_ERROR,
+   TCA_CTINFO_STATS_CPMARK_SET,
+   __TCA_CTINFO_MAX
+};
+
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+   CTINFO_MODE_DSCP= BIT(0),
+   CTINFO_MODE_CPMARK  = BIT(1)
+};
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index 1a305cf4..60abddee 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -48,6 +48,7 @@ TCMODULES += m_csum.o
 TCMODULES += m_simple.o
 TCMODULES += m_vlan.o
 TCMODULES += m_connmark.o
+TCMODULES += m_ctinfo.o
 TCMODULES += m_bpf.o
 TCMODULES += m_tunnel_key.o
 TCMODULES += m_sample.o
diff --git a/tc/m_ctinfo.c b/tc/m_ctinfo.c
new file mode 100644
index ..af5102bf
--- /dev/null
+++ b/tc/m_ctinfo.c
@@ -0,0 +1,262 @@
+/* SPDX-License-Ident

Re: [PATCH net-next] selftests: Add test cases for nexthop objects

2019-06-02 Thread David Miller
From: David Ahern 
Date: Thu, 30 May 2019 12:06:36 -0700

> From: David Ahern 
> 
> Add functional test cases for nexthop objects.
> 
> Signed-off-by: David Ahern 

Applied, thanks.


Re: [PATCH net-next] cxgb4: Set initial IRQ affinity hints

2019-06-02 Thread David Miller
From: Nirranjan Kirubaharan 
Date: Thu, 30 May 2019 23:14:28 -0700

> + while (--ethqidx >= 0) {
> + --msi_index;

It is more canonical to use "msi_index--;" here.


Re: [PATCH RFC iproute2-next v3] tc: add support for action act_ctinfo

2019-06-02 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

> ctinfo is an action restoring data stored in conntrack marks to various
> fields.  At present it has two independent modes of operation,
> restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
> marks into packet skb marks.
>
> It understands a number of parameters specific to this action in
> additional to the usual action syntax.  Each operating mode is
> independent of the other so all options are optional, however not
> specifying at least one mode is a bit pointless.
>
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
> [CONTROL] [index ]

Yay, bikeshedding time! :)

As I said in reply to the kernel patch, the "X/Y" syntax usually means
"/", where here they are just two
semi-related mask values. So I think it would be better to just make
'statemask' its own parameter.


Other than that, just a few nits, below...

> DSCP mode
>
> dscp enables copying of a DSCP store in the conntrack mark into the
> ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
> in the conntrack mark the DSCP value is stored.  It must be 6 contiguous
> bits long, e.g. 0xfc000000 would restore the DSCP from the upper 6 bits
> of the conntrack mark.
>
> The DSCP copying may be optionally controlled by a statemask.  The
> statemask is a 32bit field, usually with a single bit set and must not
> overlap the dscp mask.  The DSCP restore operation will only take place
> if the corresponding bit/s in conntrack mark yield a non zero result.
>
> eg. dscp 0xfc000000/0x01000000 would retrieve the DSCP from the top 6
> bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
> example.
>
> CPMARK mode
>
> cpmark enables copying of the conntrack mark to the packet skb mark.  In
> this mode it is completely equivalent to the existing act_connmark.
> Additional functionality is provided by the optional mask parameter,
> whereby the stored conntrack mark is logically anded with the cpmark
> mask before being stored into skb mark.  This allows shared usage of the
> conntrack mark between applications.
>
> eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
> conntrack mark, thus may be useful in the event that the upper 8 bits
> are used by the DSCP function.
>
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
> [CONTROL] [index ]
> where :
>   dscp MASK is the bitmask to restore DSCP
>STATEMASK is the bitmask to determine conditional restoring
>   cpmark MASK mask applied to restored packet mark
>   ZONE is the conntrack zone
>   CONTROL := reclassify | pipe | drop | continue | ok |
>  goto chain 
>
> Signed-off-by: Kevin Darbyshire-Bryant 
>
> ---
> v2 - fix whitespace issue in pkt_cls
>  fix most warnings from checkpatch - some lines still over 80 chars
>  due to long TLV names.
> v3 - fix some dangling else warnings.
>  refactor stats printing to please checkpatch.
>  send zone TLV even if default '0' zone.
>  now checkpatch clean even though I think some of the formatting
>  is horrible :-)
>  sending via google's smtp 'cos MS' exchange office365 appears
>  to mangle patches from git send-email.

Ah, so it wasn't just me having problems ;)

>  include/uapi/linux/pkt_cls.h  |   1 +
>  include/uapi/linux/tc_act/tc_ctinfo.h |  34 
>  tc/Makefile   |   1 +
>  tc/m_ctinfo.c | 262 ++
>  4 files changed, 298 insertions(+)
>  create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
>  create mode 100644 tc/m_ctinfo.c
>
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 51a0496f..a93680fc 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -105,6 +105,7 @@ enum tca_id {
>   TCA_ID_IFE = TCA_ACT_IFE,
>   TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
>   /* other actions go here */
> + TCA_ID_CTINFO,
>   __TCA_ID_MAX = 255
>  };
>  
> diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
> new file mode 100644
> index ..da803e05
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_ctinfo.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef __UAPI_TC_CTINFO_H
> +#define __UAPI_TC_CTINFO_H
> +
> +#include 
> +#include 
> +
> +struct tc_ctinfo {
> + tc_gen;
> +};
> +
> +enum {
> + TCA_CTINFO_UNSPEC,
> + TCA_CTINFO_PAD,
> + TCA_CTINFO_TM,
> + TCA_CTINFO_ACT,
> + TCA_CTINFO_ZONE,
> + TCA_CTINFO_PARMS_DSCP_MASK,
> + TCA_CTINFO_PARMS_DSCP_STATEMASK,
> + TCA_CTINFO_PARMS_CPMARK_MASK,
> + TCA_CTINFO_STATS_DSCP_SET,
> + TCA_CTINFO_STATS_DSCP_ERROR,
> + TCA_CTINFO_STATS_CPMARK_SET,
> + __TCA_CTINFO_MAX
> +};
> +
> +#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
> +
> +enum {
> + CTINFO_MODE_DSCP= BIT(0),
> +  

Re: [PATCH net-next] Update my email address

2019-06-02 Thread David Miller
From: Wei Liu 
Date: Fri, 31 May 2019 08:31:02 +0100

> Signed-off-by: Wei Liu 

Applied.


[PATCH net-next 03/11] net: dsa: sja1105: Add missing L2 Forwarding Table definitions for P/Q/R/S

2019-06-02 Thread Vladimir Oltean
This appends to the L2 Forwarding and L2 Forwarding Parameters tables
(originally added for first-generation switches) the bits that are new
in the second generation.

Signed-off-by: Vladimir Oltean 
---
 .../net/dsa/sja1105/sja1105_static_config.c   | 18 ++---
 .../net/dsa/sja1105/sja1105_static_config.h   | 26 +++
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.c 
b/drivers/net/dsa/sja1105/sja1105_static_config.c
index 7e90e62da389..6d65a7b09395 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.c
@@ -236,10 +236,20 @@ size_t sja1105pqrs_l2_lookup_entry_packing(void *buf, 
void *entry_ptr,
const size_t size = SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY;
struct sja1105_l2_lookup_entry *entry = entry_ptr;
 
-   /* These are static L2 lookup entries, so the structure
-* should match UM11040 Table 16/17 definitions when
-* LOCKEDS is 1.
-*/
+   if (entry->lockeds) {
+   sja1105_packing(buf, &entry->tsreg,159, 159, size, op);
+   sja1105_packing(buf, &entry->mirrvlan, 158, 147, size, op);
+   sja1105_packing(buf, &entry->takets,   146, 146, size, op);
+   sja1105_packing(buf, &entry->mirr, 145, 145, size, op);
+   sja1105_packing(buf, &entry->retag,144, 144, size, op);
+   } else {
+   sja1105_packing(buf, &entry->touched,  159, 159, size, op);
+   sja1105_packing(buf, &entry->age,  158, 144, size, op);
+   }
+   sja1105_packing(buf, &entry->mask_iotag,   143, 143, size, op);
+   sja1105_packing(buf, &entry->mask_vlanid,  142, 131, size, op);
+   sja1105_packing(buf, &entry->mask_macaddr, 130,  83, size, op);
+   sja1105_packing(buf, &entry->iotag, 82,  82, size, op);
sja1105_packing(buf, &entry->vlanid,81,  70, size, op);
sja1105_packing(buf, &entry->macaddr,   69,  22, size, op);
sja1105_packing(buf, &entry->destports, 21,  17, size, op);
diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.h 
b/drivers/net/dsa/sja1105/sja1105_static_config.h
index 069ca8fd059c..d513b1c91b98 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.h
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.h
@@ -122,9 +122,35 @@ struct sja1105_l2_lookup_entry {
u64 destports;
u64 enfport;
u64 index;
+   /* P/Q/R/S only */
+   u64 mask_iotag;
+   u64 mask_vlanid;
+   u64 mask_macaddr;
+   u64 iotag;
+   bool lockeds;
+   union {
+   /* LOCKEDS=1: Static FDB entries */
+   struct {
+   u64 tsreg;
+   u64 mirrvlan;
+   u64 takets;
+   u64 mirr;
+   u64 retag;
+   };
+   /* LOCKEDS=0: Dynamically learned FDB entries */
+   struct {
+   u64 touched;
+   u64 age;
+   };
+   };
 };
 
 struct sja1105_l2_lookup_params_entry {
+   u64 start_dynspc;/* P/Q/R/S only */
+   u64 drpnolearn;  /* P/Q/R/S only */
+   u64 use_static;  /* P/Q/R/S only */
+   u64 owr_dyn; /* P/Q/R/S only */
+   u64 learn_once;  /* P/Q/R/S only */
u64 maxage;  /* Shared */
u64 dyn_tbsz;/* E/T only */
u64 poly;/* E/T only */
-- 
2.17.1
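
The LOCKEDS-dependent layout above reuses bits 159..144 for two different sets of fields. The following stand-alone sketch illustrates that overlap. It is only a model under stated assumptions: put_bits() and get_bits() are hypothetical helpers with a simple "bit 0 is the LSB of buf[0]" layout, which is not the byte layout used by the real sja1105_packing(); only the bit ranges are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_BITS 160

/* Hypothetical helper: store "value" into bits [start:end] (start >= end)
 * of a buffer where bit 0 is the least significant bit of buf[0].
 * The real sja1105_packing() uses a different byte layout.
 */
static void put_bits(uint8_t *buf, uint64_t value, int start, int end)
{
	assert(start >= end && start < ENTRY_BITS && end >= 0);
	for (int bit = end; bit <= start; bit++) {
		uint8_t mask = (uint8_t)(1u << (bit % 8));

		if ((value >> (bit - end)) & 1)
			buf[bit / 8] |= mask;
		else
			buf[bit / 8] &= (uint8_t)~mask;
	}
}

/* Hypothetical helper: read bits [start:end] back as an integer. */
static uint64_t get_bits(const uint8_t *buf, int start, int end)
{
	uint64_t value = 0;

	assert(start >= end && start < ENTRY_BITS && end >= 0);
	for (int bit = start; bit >= end; bit--)
		value = (value << 1) | ((buf[bit / 8] >> (bit % 8)) & 1);
	return value;
}

/* Pack two fields of the static (LOCKEDS=1) view, then read the same
 * bits through the dynamic (LOCKEDS=0) "age" field at 158..144.
 */
static uint64_t age_seen_for_static_entry(uint64_t mirrvlan, bool retag)
{
	uint8_t buf[ENTRY_BITS / 8];

	memset(buf, 0, sizeof(buf));
	put_bits(buf, mirrvlan, 158, 147); /* MIRRVLAN, 12 bits */
	put_bits(buf, retag, 144, 144);    /* RETAG, 1 bit */
	return get_bits(buf, 158, 144);    /* AGE, 15 bits */
}
```

Because the two views share the same bits, an entry has to be unpacked according to its LOCKEDS value, which is presumably why the patch branches on entry->lockeds before packing.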



[PATCH net-next 00/11] FDB updates for SJA1105 DSA driver

2019-06-02 Thread Vladimir Oltean
This patch series adds:

- FDB switchdev support for the second generation of switches (P/Q/R/S).
  I could test/code these now that I got a board with a SJA1105Q.

- Management route support for SJA1105 P/Q/R/S. This is needed to send
  PTP/STP/management frames over the CPU port.

- Logic to hide private DSA VLANs from the 'bridge fdb' commands.

The new FDB code was also tested and still works on SJA1105T.

Vladimir Oltean (11):
  net: dsa: sja1105: Shim declaration of struct sja1105_dyn_cmd
  net: dsa: sja1105: Fix bit offsets of index field from L2 lookup
entries
  net: dsa: sja1105: Add missing L2 Forwarding Table definitions for
P/Q/R/S
  net: dsa: sja1105: Plug in support for TCAM searches via the dynamic
interface
  net: dsa: sja1105: Make room for P/Q/R/S FDB operations
  net: dsa: sja1105: Add P/Q/R/S support for dynamic L2 lookup
operations
  net: dsa: sja1105: Make dynamic_config_read return -ENOENT if not
found
  net: dsa: sja1105: Add P/Q/R/S management route support via dynamic
interface
  net: dsa: sja1105: Add FDB operations for P/Q/R/S series
  net: dsa: sja1105: Unset port from forwarding mask unconditionally on
fdb_del
  net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb
command

 drivers/net/dsa/sja1105/sja1105.h |  20 +-
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 144 +-
 .../net/dsa/sja1105/sja1105_dynamic_config.h  |  11 +-
 drivers/net/dsa/sja1105/sja1105_main.c| 186 --
 drivers/net/dsa/sja1105/sja1105_spi.c |  12 ++
 .../net/dsa/sja1105/sja1105_static_config.c   |  18 +-
 .../net/dsa/sja1105/sja1105_static_config.h   |  26 +++
 7 files changed, 379 insertions(+), 38 deletions(-)

-- 
2.17.1



[PATCH net-next 04/11] net: dsa: sja1105: Plug in support for TCAM searches via the dynamic interface

2019-06-02 Thread Vladimir Oltean
Only a single dynamic configuration table of the SJA1105 P/Q/R/S
supports this operation: the FDB.

To keep the existing structure in place (sja1105_dynamic_config_read and
sja1105_dynamic_config_write) and not introduce any new function, a
convention is made for sja1105_dynamic_config_read that a negative index
argument denotes a search for the entry provided as argument.

Signed-off-by: Vladimir Oltean 
---
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 36 ++-
 .../net/dsa/sja1105/sja1105_dynamic_config.h  |  3 ++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 0023b03a010d..7e7efc2e8ee4 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -36,6 +36,7 @@
SJA1105PQRS_SIZE_MAC_CONFIG_DYN_CMD
 
 struct sja1105_dyn_cmd {
+   bool search;
u64 valid;
u64 rdwrset;
u64 errors;
@@ -248,6 +249,7 @@ sja1105et_general_params_entry_packing(void *buf, void 
*entry_ptr,
 #define OP_READ    BIT(0)
 #define OP_WRITE   BIT(1)
 #define OP_DEL BIT(2)
+#define OP_SEARCH  BIT(3)
 
 /* SJA1105E/T: First generation */
 struct sja1105_dynamic_table_ops sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = {
@@ -367,6 +369,24 @@ struct sja1105_dynamic_table_ops 
sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = {
[BLK_IDX_XMII_PARAMS] = {0},
 };
 
+/* Provides read access to the settings through the dynamic interface
+ * of the switch.
+ * @blk_idx is used as key to select from the sja1105_dynamic_table_ops.
+ * The selection is limited by the hardware in respect to which
+ * configuration blocks can be read through the dynamic interface.
+ * @index  is used to retrieve a particular table entry. If negative,
+ * (and if the @blk_idx supports the searching operation) a search
+ * is performed by the @entry parameter.
+ * @entry  Type-casted to an unpacked structure that holds a table entry
+ * of the type specified in @blk_idx.
+ * Usually an output argument. If @index is negative, then this
+ * argument is used as input/output: it should be pre-populated
+ * with the element to search for. Entries which support the
+ * search operation will have an "index" field (not the @index
+ * argument to this function) and that is where the found index
+ * will be returned (or left unmodified - thus negative - if not
+ * found).
+ */
 int sja1105_dynamic_config_read(struct sja1105_private *priv,
enum sja1105_blk_idx blk_idx,
int index, void *entry)
@@ -385,6 +405,8 @@ int sja1105_dynamic_config_read(struct sja1105_private 
*priv,
 
if (index >= ops->max_entry_count)
return -ERANGE;
+   if (index < 0 && !(ops->access & OP_SEARCH))
+   return -EOPNOTSUPP;
if (!(ops->access & OP_READ))
return -EOPNOTSUPP;
if (ops->packed_size > SJA1105_MAX_DYN_CMD_SIZE)
@@ -396,9 +418,19 @@ int sja1105_dynamic_config_read(struct sja1105_private 
*priv,
 
cmd.valid = true; /* Trigger action on table entry */
cmd.rdwrset = SPI_READ; /* Action is read */
-   cmd.index = index;
+   if (index < 0) {
+   /* Avoid copying a signed negative number to an u64 */
+   cmd.index = 0;
+   cmd.search = true;
+   } else {
+   cmd.index = index;
+   cmd.search = false;
+   }
ops->cmd_packing(packed_buf, &cmd, PACK);
 
+   if (cmd.search)
+   ops->entry_packing(packed_buf, entry, PACK);
+
/* Send SPI write operation: read config table entry */
rc = sja1105_spi_send_packed_buf(priv, SPI_WRITE, ops->addr,
 packed_buf, ops->packed_size);
@@ -456,6 +488,8 @@ int sja1105_dynamic_config_write(struct sja1105_private 
*priv,
 
if (index >= ops->max_entry_count)
return -ERANGE;
+   if (index < 0)
+   return -ERANGE;
if (!(ops->access & OP_WRITE))
return -EOPNOTSUPP;
if (!keep && !(ops->access & OP_DEL))
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
index 49c611eb02cb..740dadf43f01 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
@@ -7,6 +7,9 @@
 #include "sja1105.h"
 #include 
 
+/* Special index that can be used for sja1105_dynamic_config_read */
+#define SJA1105_SEARCH -1
+
 struct sja1105_dyn_cmd;
 
 struct sja1105_dynamic_table_ops {
-- 
2.17.1
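
The negative-index convention described in this patch can be modelled outside the kernel. The sketch below is not the driver code: table_read(), struct entry and the in-memory table are hypothetical stand-ins for the SPI access, and the -ENOENT return value for a missing entry is borrowed from patch 07 of this series. Only the idea that a negative index means "search for the entry passed in" comes from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8
#define SEARCH (-1) /* mirrors SJA1105_SEARCH from the patch */

/* Hypothetical unpacked table entry; "index" is filled in by a search. */
struct entry {
	uint64_t key;
	int index;
	bool valid;
};

/* Hypothetical stand-in for the hardware table. */
static struct entry table[TABLE_SIZE];

/* Read by position when index >= 0; search by entry->key when index < 0.
 * On a failed search, *entry is left unmodified.
 */
static int table_read(int index, struct entry *entry, bool can_search)
{
	assert(entry != NULL);

	if (index >= TABLE_SIZE)
		return -ERANGE;
	if (index < 0 && !can_search)
		return -EOPNOTSUPP;

	if (index < 0) {
		for (int i = 0; i < TABLE_SIZE; i++) {
			if (table[i].valid && table[i].key == entry->key) {
				*entry = table[i];
				entry->index = i;
				return 0;
			}
		}
		return -ENOENT;
	}

	if (!table[index].valid)
		return -ENOENT;
	*entry = table[index];
	entry->index = index;
	return 0;
}
```

A caller pre-populates the entry with the key to look for, passes SEARCH as the index, and reads the position back from entry->index on success.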



[PATCH net-next 01/11] net: dsa: sja1105: Shim declaration of struct sja1105_dyn_cmd

2019-06-02 Thread Vladimir Oltean
This structure is merely an implementation detail and should be hidden
from the sja1105_dynamic_config.h header, which provides the rest of
the driver with abstract access to the dynamic configuration interface
of the switch.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 8 
 drivers/net/dsa/sja1105/sja1105_dynamic_config.h | 8 +---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index e73ab28bf632..c981c12eb181 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -35,6 +35,14 @@
 #define SJA1105_MAX_DYN_CMD_SIZE   \
SJA1105PQRS_SIZE_MAC_CONFIG_DYN_CMD
 
+struct sja1105_dyn_cmd {
+   u64 valid;
+   u64 rdwrset;
+   u64 errors;
+   u64 valident;
+   u64 index;
+};
+
 static void
 sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd,
  enum packing_op op)
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
index 77be59546a55..49c611eb02cb 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
@@ -7,13 +7,7 @@
 #include "sja1105.h"
 #include 
 
-struct sja1105_dyn_cmd {
-   u64 valid;
-   u64 rdwrset;
-   u64 errors;
-   u64 valident;
-   u64 index;
-};
+struct sja1105_dyn_cmd;
 
 struct sja1105_dynamic_table_ops {
/* This returns size_t just to keep same prototype as the
-- 
2.17.1



[PATCH net-next 02/11] net: dsa: sja1105: Fix bit offsets of index field from L2 lookup entries

2019-06-02 Thread Vladimir Oltean
This was inadvertently copied from the SJA1105 E/T structure and not
tested.  Cross-checking with the P/Q/R/S documentation (UM11040) makes
it immediately obvious what the correct bit offsets for this field are.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index c981c12eb181..0023b03a010d 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -62,7 +62,7 @@ sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct 
sja1105_dyn_cmd *cmd,
 * such that our API doesn't need to ask for a full-blown entry
 * structure when e.g. a delete is requested.
 */
-   sja1105_packing(buf, &cmd->index, 29, 20,
+   sja1105_packing(buf, &cmd->index, 15, 6,
SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY, op);
/* TODO hostcmd */
 }
-- 
2.17.1



[PATCH net-next 05/11] net: dsa: sja1105: Make room for P/Q/R/S FDB operations

2019-06-02 Thread Vladimir Oltean
The DSA callbacks were written with the E/T (first generation) in mind,
which is quite different.

For P/Q/R/S completely new implementations need to be provided, which
are held as function pointers in the priv->info structure.  We are
taking a slightly roundabout way for this (a function from
sja1105_main.c reads a structure defined in sja1105_spi.c that
points to a function defined in sja1105_main.c), but it is what it is.

The FDB dump callback works for both families, hence no function pointer
for that.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105.h | 15 -
 .../net/dsa/sja1105/sja1105_dynamic_config.c  |  2 +-
 drivers/net/dsa/sja1105/sja1105_main.c| 56 ++-
 drivers/net/dsa/sja1105/sja1105_spi.c | 12 
 4 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h 
b/drivers/net/dsa/sja1105/sja1105.h
index b043bfc408f2..f55e95d1b731 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -55,6 +55,11 @@ struct sja1105_info {
const struct sja1105_regs *regs;
int (*reset_cmd)(const void *ctx, const void *data);
int (*setup_rgmii_delay)(const void *ctx, int port);
+   /* Prototypes from include/net/dsa.h */
+   int (*fdb_add_cmd)(struct dsa_switch *ds, int port,
+  const unsigned char *addr, u16 vid);
+   int (*fdb_del_cmd)(struct dsa_switch *ds, int port,
+  const unsigned char *addr, u16 vid);
const char *name;
 };
 
@@ -142,7 +147,15 @@ int sja1105_dynamic_config_write(struct sja1105_private 
*priv,
 enum sja1105_blk_idx blk_idx,
 int index, void *entry, bool keep);
 
-u8 sja1105_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid);
+u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid);
+int sja1105et_fdb_add(struct dsa_switch *ds, int port,
+ const unsigned char *addr, u16 vid);
+int sja1105et_fdb_del(struct dsa_switch *ds, int port,
+ const unsigned char *addr, u16 vid);
+int sja1105pqrs_fdb_add(struct dsa_switch *ds, int port,
+   const unsigned char *addr, u16 vid);
+int sja1105pqrs_fdb_del(struct dsa_switch *ds, int port,
+   const unsigned char *addr, u16 vid);
 
 /* Common implementations for the static and dynamic configs */
 size_t sja1105_l2_forwarding_entry_packing(void *buf, void *entry_ptr,
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 7e7efc2e8ee4..3a8b0d0ab330 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -552,7 +552,7 @@ static u8 sja1105_crc8_add(u8 crc, u8 byte, u8 poly)
  * is also received as argument in the Koopman notation that the switch
  * hardware stores it in.
  */
-u8 sja1105_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid)
+u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid)
 {
struct sja1105_l2_lookup_params_entry *l2_lookup_params =
priv->static_config.tables[BLK_IDX_L2_LOOKUP_PARAMS].entries;
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index cfdefd9f1905..c78d2def52f1 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -786,10 +786,10 @@ static inline int sja1105et_fdb_index(int bin, int way)
return bin * SJA1105ET_FDB_BIN_SIZE + way;
 }
 
-static int sja1105_is_fdb_entry_in_bin(struct sja1105_private *priv, int bin,
-  const u8 *addr, u16 vid,
-  struct sja1105_l2_lookup_entry *match,
-  int *last_unused)
+static int sja1105et_is_fdb_entry_in_bin(struct sja1105_private *priv, int bin,
+const u8 *addr, u16 vid,
+struct sja1105_l2_lookup_entry *match,
+int *last_unused)
 {
int way;
 
@@ -818,8 +818,8 @@ static int sja1105_is_fdb_entry_in_bin(struct 
sja1105_private *priv, int bin,
return -1;
 }
 
-static int sja1105_fdb_add(struct dsa_switch *ds, int port,
-  const unsigned char *addr, u16 vid)
+int sja1105et_fdb_add(struct dsa_switch *ds, int port,
+ const unsigned char *addr, u16 vid)
 {
struct sja1105_l2_lookup_entry l2_lookup = {0};
struct sja1105_private *priv = ds->priv;
@@ -827,10 +827,10 @@ static int sja1105_fdb_add(struct dsa_switch *ds, int 
port,
int last_unused = -1;
int bin, way;
 
-   bin = sja1105_fdb_hash(priv, addr, vid);
+   bin = sja1105et_fdb_hash(priv, addr, vid);
 
-   way = sja1105_is_fdb_entry_in_bin
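
The per-family dispatch described in this patch follows a common pattern: a generic entry point calls through a function pointer stored in a per-chip info structure. The sketch below shows that pattern in isolation. All names (struct chip_info, et_info, pqrs_info, fdb_add) are hypothetical and reduced for illustration; they are not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced version of a per-chip info structure. */
struct chip_info {
	const char *name;
	int (*fdb_add_cmd)(int port, uint16_t vid);
};

/* Stand-in implementations; the return values only identify the callee. */
static int et_fdb_add(int port, uint16_t vid)
{
	(void)port;
	(void)vid;
	return 1;
}

static int pqrs_fdb_add(int port, uint16_t vid)
{
	(void)port;
	(void)vid;
	return 2;
}

static const struct chip_info et_info = {
	.name = "first generation (E/T)",
	.fdb_add_cmd = et_fdb_add,
};

static const struct chip_info pqrs_info = {
	.name = "second generation (P/Q/R/S)",
	.fdb_add_cmd = pqrs_fdb_add,
};

/* Generic entry point: dispatches through the info structure. */
static int fdb_add(const struct chip_info *info, int port, uint16_t vid)
{
	assert(info != NULL && info->fdb_add_cmd != NULL);
	return info->fdb_add_cmd(port, vid);
}
```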

[PATCH net-next 06/11] net: dsa: sja1105: Add P/Q/R/S support for dynamic L2 lookup operations

2019-06-02 Thread Vladimir Oltean
These are needed in order to implement the switchdev FDB callbacks.

Compared to the E/T generation, not only is the ABI (bit offsets)
different, but there is also a new HOSTCMD field which permits an O(1)
TCAM search for an FDB entry.  Make use of the newly introduced
OP_SEARCH to permit that.  It will be used while adding and deleting an
FDB entry (to see whether it exists or not).

Signed-off-by: Vladimir Oltean 
---
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 54 +--
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 3a8b0d0ab330..7db1f8258287 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -44,17 +44,63 @@ struct sja1105_dyn_cmd {
u64 index;
 };
 
+enum sja1105_hostcmd {
+   SJA1105_HOSTCMD_SEARCH = 1,
+   SJA1105_HOSTCMD_READ = 2,
+   SJA1105_HOSTCMD_WRITE = 3,
+   SJA1105_HOSTCMD_INVALIDATE = 4,
+};
+
 static void
 sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd,
  enum packing_op op)
 {
u8 *p = buf + SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY;
const int size = SJA1105_SIZE_DYN_CMD;
+   u64 lockeds = 0;
+   u64 hostcmd;
 
sja1105_packing(p, &cmd->valid,31, 31, size, op);
sja1105_packing(p, &cmd->rdwrset,  30, 30, size, op);
sja1105_packing(p, &cmd->errors,   29, 29, size, op);
+   sja1105_packing(p, &lockeds,   28, 28, size, op);
sja1105_packing(p, &cmd->valident, 27, 27, size, op);
+
+   /* VALIDENT is supposed to indicate "keep or not", but in SJA1105 E/T,
+* using it to delete a management route was unsupported. UM10944
+* said about it:
+*
+*   In case of a write access with the MGMTROUTE flag set,
+*   the flag will be ignored. It will always be found cleared
+*   for read accesses with the MGMTROUTE flag set.
+*
+* SJA1105 P/Q/R/S keeps the same behavior w.r.t. VALIDENT, but there
+* is now another flag called HOSTCMD which does more stuff (quoting
+* from UM11040):
+*
+*   A write request is accepted only when HOSTCMD is set to write host
+*   or invalid. A read request is accepted only when HOSTCMD is set to
+*   search host or read host.
+*
+* So it is possible to translate a RDWRSET/VALIDENT combination into
+* HOSTCMD so that we keep the dynamic command API in place, and
+* at the same time achieve compatibility with the management route
+* command structure.
+*/
+   if (cmd->rdwrset == SPI_READ) {
+   if (cmd->search)
+   hostcmd = SJA1105_HOSTCMD_SEARCH;
+   else
+   hostcmd = SJA1105_HOSTCMD_READ;
+   } else {
+   /* SPI_WRITE */
+   if (cmd->valident)
+   hostcmd = SJA1105_HOSTCMD_WRITE;
+   else
+   hostcmd = SJA1105_HOSTCMD_INVALIDATE;
+   }
+   sja1105_packing(p, &hostcmd, 25, 23, size, op);
+
/* Hack - The hardware takes the 'index' field within
 * struct sja1105_l2_lookup_entry as the index on which this command
 * will operate. However it will ignore everything else, so 'index'
@@ -65,7 +111,6 @@ sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct 
sja1105_dyn_cmd *cmd,
 */
sja1105_packing(buf, &cmd->index, 15, 6,
SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY, op);
-   /* TODO hostcmd */
 }
 
 static void
@@ -319,9 +364,9 @@ struct sja1105_dynamic_table_ops 
sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = {
[BLK_IDX_L2_LOOKUP] = {
.entry_packing = sja1105pqrs_l2_lookup_entry_packing,
.cmd_packing = sja1105pqrs_l2_lookup_cmd_packing,
-   .access = (OP_READ | OP_WRITE | OP_DEL),
+   .access = (OP_READ | OP_WRITE | OP_DEL | OP_SEARCH),
.max_entry_count = SJA1105_MAX_L2_LOOKUP_COUNT,
-   .packed_size = SJA1105ET_SIZE_L2_LOOKUP_DYN_CMD,
+   .packed_size = SJA1105PQRS_SIZE_L2_LOOKUP_DYN_CMD,
.addr = 0x24,
},
[BLK_IDX_L2_POLICING] = {0},
@@ -403,7 +448,7 @@ int sja1105_dynamic_config_read(struct sja1105_private 
*priv,
 
ops = &priv->info->dyn_ops[blk_idx];
 
-   if (index >= ops->max_entry_count)
+   if (index >= 0 && index >= ops->max_entry_count)
return -ERANGE;
if (index < 0 && !(ops->access & OP_SEARCH))
return -EOPNOTSUPP;
@@ -426,6 +471,7 @@ int sja1105_dynamic_config_read(struct sja1105_private 
*priv,
cmd.index = index;
cmd.search = false;
}
+   cmd.valident = true;
ops->cmd_packing(packed_buf, &cmd, PACK);
 
if (cmd.sear
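
The translation from RDWRSET/VALIDENT/search into HOSTCMD that this patch performs can be written as a small pure function. The sketch below restates that mapping on its own. The HOSTCMD values are copied from the patch; the function name to_hostcmd() and the numeric values of SPI_READ and SPI_WRITE are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; only the read/write distinction matters here. */
enum spi_rw {
	SPI_READ = 0,
	SPI_WRITE = 1,
};

/* Values as listed in enum sja1105_hostcmd in the patch. */
enum hostcmd {
	HOSTCMD_SEARCH = 1,
	HOSTCMD_READ = 2,
	HOSTCMD_WRITE = 3,
	HOSTCMD_INVALIDATE = 4,
};

/* Hypothetical helper: map a generic dynamic command onto HOSTCMD.
 * Reads become SEARCH or READ; writes become WRITE (keep the entry)
 * or INVALIDATE (delete it), depending on VALIDENT.
 */
static enum hostcmd to_hostcmd(enum spi_rw rdwrset, bool search, bool valident)
{
	assert(rdwrset == SPI_READ || rdwrset == SPI_WRITE);

	if (rdwrset == SPI_READ)
		return search ? HOSTCMD_SEARCH : HOSTCMD_READ;
	return valident ? HOSTCMD_WRITE : HOSTCMD_INVALIDATE;
}
```

Keeping the mapping in one place means callers can continue to use the existing read/write/keep arguments without knowing about HOSTCMD.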

[PATCH net-next 07/11] net: dsa: sja1105: Make dynamic_config_read return -ENOENT if not found

2019-06-02 Thread Vladimir Oltean
Conceptually, if an entry is not found in the requested hardware table,
it is not an invalid request - so change the error returned
appropriately.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 2 +-
 drivers/net/dsa/sja1105/sja1105_main.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 7db1f8258287..02a67df4437e 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -502,7 +502,7 @@ int sja1105_dynamic_config_read(struct sja1105_private 
*priv,
 * So don't error out in that case.
 */
if (!cmd.valident && blk_idx != BLK_IDX_MGMT_ROUTE)
-   return -EINVAL;
+   return -ENOENT;
cpu_relax();
} while (cmd.valid && --retries);
 
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index c78d2def52f1..dc9803efdbbd 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -948,7 +948,7 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port,
rc = sja1105_dynamic_config_read(priv, BLK_IDX_L2_LOOKUP,
 i, &l2_lookup);
/* No fdb entry at i, not an issue */
-   if (rc == -EINVAL)
+   if (rc == -ENOENT)
continue;
if (rc) {
dev_err(dev, "Failed to dump FDB: %d\n", rc);
-- 
2.17.1
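
The benefit of a distinct "not found" code shows up in callers that walk the whole table. The sketch below is a simplified model of such a loop, not the driver's sja1105_fdb_dump(): the read_fn callback type, count_entries() and the two example readers are hypothetical. It only illustrates that -ENOENT can be skipped while every other error aborts the walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reader callback: returns 0 for a populated slot,
 * -ENOENT for an empty slot, or another negative errno on failure.
 */
typedef int (*read_fn)(int index);

/* Count populated slots; empty slots are skipped, real errors abort. */
static int count_entries(read_fn read, int max_entries)
{
	int count = 0;

	assert(read != NULL);
	for (int i = 0; i < max_entries; i++) {
		int rc = read(i);

		if (rc == -ENOENT)
			continue;
		if (rc)
			return rc;
		count++;
	}
	return count;
}

/* Example reader: only even slots are populated. */
static int sparse_reader(int index)
{
	return (index % 2) ? -ENOENT : 0;
}

/* Example reader: slot 3 fails with an I/O error. */
static int failing_reader(int index)
{
	return index == 3 ? -EIO : 0;
}
```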



[PATCH net-next 08/11] net: dsa: sja1105: Add P/Q/R/S management route support via dynamic interface

2019-06-02 Thread Vladimir Oltean
Management routes are one-shot FDB rules installed on the CPU port for
sending link-local traffic.  They are a prerequisite for STP, PTP etc to
work.

Also make a note that removing a management route was not supported on
the previous generation of switches.

Signed-off-by: Vladimir Oltean 
---
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 40 ++-
 drivers/net/dsa/sja1105/sja1105_main.c|  2 +
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c 
b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 02a67df4437e..352bb6e89297 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -161,6 +161,36 @@ static size_t sja1105et_mgmt_route_entry_packing(void 
*buf, void *entry_ptr,
return size;
 }
 
+static void
+sja1105pqrs_mgmt_route_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd,
+  enum packing_op op)
+{
+   u8 *p = buf + SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY;
+   u64 mgmtroute = 1;
+
+   sja1105pqrs_l2_lookup_cmd_packing(buf, cmd, op);
+   if (op == PACK)
+   sja1105_pack(p, &mgmtroute, 26, 26, SJA1105_SIZE_DYN_CMD);
+}
+
+static size_t sja1105pqrs_mgmt_route_entry_packing(void *buf, void *entry_ptr,
+  enum packing_op op)
+{
+   const size_t size = SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY;
+   struct sja1105_mgmt_entry *entry = entry_ptr;
+
+   /* In P/Q/R/S, enfport got renamed to mgmtvalid, but its purpose
+* is the same (driver uses it to confirm that frame was sent).
+* So just keep the name from E/T.
+*/
+   sja1105_packing(buf, &entry->tsreg, 71, 71, size, op);
+   sja1105_packing(buf, &entry->takets,70, 70, size, op);
+   sja1105_packing(buf, &entry->macaddr,   69, 22, size, op);
+   sja1105_packing(buf, &entry->destports, 21, 17, size, op);
+   sja1105_packing(buf, &entry->enfport,   16, 16, size, op);
+   return size;
+}
+
 /* In E/T, entry is at addresses 0x27-0x28. There is a 4 byte gap at 0x29,
  * and command is at 0x2a. Similarly in P/Q/R/S there is a 1 register gap
  * between entry (0x2d, 0x2e) and command (0x30).
@@ -359,7 +389,7 @@ struct sja1105_dynamic_table_ops 
sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = {
[BLK_IDX_XMII_PARAMS] = {0},
 };
 
-/* SJA1105P/Q/R/S: Second generation: TODO */
+/* SJA1105P/Q/R/S: Second generation */
 struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = {
[BLK_IDX_L2_LOOKUP] = {
.entry_packing = sja1105pqrs_l2_lookup_entry_packing,
@@ -369,6 +399,14 @@ struct sja1105_dynamic_table_ops 
sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = {
.packed_size = SJA1105PQRS_SIZE_L2_LOOKUP_DYN_CMD,
.addr = 0x24,
},
+   [BLK_IDX_MGMT_ROUTE] = {
+   .entry_packing = sja1105pqrs_mgmt_route_entry_packing,
+   .cmd_packing = sja1105pqrs_mgmt_route_cmd_packing,
+   .access = (OP_READ | OP_WRITE | OP_DEL | OP_SEARCH),
+   .max_entry_count = SJA1105_NUM_PORTS,
+   .packed_size = SJA1105PQRS_SIZE_L2_LOOKUP_DYN_CMD,
+   .addr = 0x24,
+   },
[BLK_IDX_L2_POLICING] = {0},
[BLK_IDX_VLAN_LOOKUP] = {
.entry_packing = sja1105_vlan_lookup_entry_packing,
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index dc9803efdbbd..f9bbc780f835 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1475,6 +1475,8 @@ static int sja1105_mgmt_xmit(struct dsa_switch *ds, int 
port, int slot,
if (!timeout) {
/* Clean up the management route so that a follow-up
 * frame may not match on it by mistake.
+* This is only hardware supported on P/Q/R/S - on E/T it is
+* a no-op and we are silently discarding the -EOPNOTSUPP.
 */
sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE,
 slot, &mgmt_route, false);
-- 
2.17.1



[PATCH net-next 09/11] net: dsa: sja1105: Add FDB operations for P/Q/R/S series

2019-06-02 Thread Vladimir Oltean
This adds support for manipulating the L2 forwarding database (dump,
add, delete) for the second generation of NXP SJA1105 switches.

At the moment only FDB entries installed statically through 'bridge fdb'
are visible in the dump callback - the dynamically learned ones are
still under investigation.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105.h  |  5 ++
 drivers/net/dsa/sja1105/sja1105_main.c | 89 +-
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h 
b/drivers/net/dsa/sja1105/sja1105.h
index f55e95d1b731..61d00682de60 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -147,6 +147,11 @@ int sja1105_dynamic_config_write(struct sja1105_private 
*priv,
 enum sja1105_blk_idx blk_idx,
 int index, void *entry, bool keep);
 
+enum sja1105_iotag {
+   SJA1105_C_TAG = 0, /* Inner VLAN header */
+   SJA1105_S_TAG = 1, /* Outer VLAN header */
+};
+
 u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid);
 int sja1105et_fdb_add(struct dsa_switch *ds, int port,
  const unsigned char *addr, u16 vid);
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index f9bbc780f835..46e2cc7b9ddc 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -210,6 +210,8 @@ static int sja1105_init_l2_lookup_params(struct 
sja1105_private *priv)
.maxage = SJA1105_AGEING_TIME_MS(30),
/* All entries within a FDB bin are available for learning */
.dyn_tbsz = SJA1105ET_FDB_BIN_SIZE,
+   /* And the P/Q/R/S equivalent setting: */
+   .start_dynspc = 0,
/* 2^8 + 2^5 + 2^3 + 2^2 + 2^1 + 1 in Koopman notation */
.poly = 0x97,
/* This selects between Independent VLAN Learning (IVL) and
@@ -225,6 +227,13 @@ static int sja1105_init_l2_lookup_params(struct 
sja1105_private *priv)
 * Maybe correlate with no_linklocal_learn from bridge driver?
 */
.no_mgmt_learn = true,
+   /* P/Q/R/S only */
+   .use_static = true,
+   /* Dynamically learned FDB entries can overwrite other (older)
+* dynamic FDB entries
+*/
+   .owr_dyn = true,
+   .drpnolearn = true,
};
 
table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP_PARAMS];
@@ -908,13 +917,89 @@ int sja1105et_fdb_del(struct dsa_switch *ds, int port,
 int sja1105pqrs_fdb_add(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid)
 {
-   return -EOPNOTSUPP;
+   struct sja1105_l2_lookup_entry l2_lookup = {0};
+   struct sja1105_private *priv = ds->priv;
+   int rc, i;
+
+   /* Search for an existing entry in the FDB table */
+   l2_lookup.macaddr = ether_addr_to_u64(addr);
+   l2_lookup.vlanid = vid;
+   l2_lookup.iotag = SJA1105_S_TAG;
+   l2_lookup.mask_macaddr = GENMASK_ULL(ETH_ALEN * 8 - 1, 0);
+   l2_lookup.mask_vlanid = VLAN_VID_MASK;
+   l2_lookup.mask_iotag = BIT(0);
+   l2_lookup.destports = BIT(port);
+
+   rc = sja1105_dynamic_config_read(priv, BLK_IDX_L2_LOOKUP,
+SJA1105_SEARCH, &l2_lookup);
+   if (rc == 0) {
+   /* Found and this port is already in the entry's
+* port mask => job done
+*/
+   if (l2_lookup.destports & BIT(port))
+   return 0;
+   /* l2_lookup.index is populated by the switch in case it
+* found something.
+*/
+   l2_lookup.destports |= BIT(port);
+   goto skip_finding_an_index;
+   }
+
+   /* Not found, so try to find an unused spot in the FDB.
+* This is slightly inefficient because the strategy is knock-knock at
+* every possible position from 0 to 1023.
+*/
+   for (i = 0; i < SJA1105_MAX_L2_LOOKUP_COUNT; i++) {
+   rc = sja1105_dynamic_config_read(priv, BLK_IDX_L2_LOOKUP,
+i, NULL);
+   if (rc < 0)
+   break;
+   }
+   if (i == SJA1105_MAX_L2_LOOKUP_COUNT) {
+   dev_err(ds->dev, "FDB is full, cannot add entry.\n");
+   return -EINVAL;
+   }
+   l2_lookup.index = i;
+
+skip_finding_an_index:
+   return sja1105_dynamic_config_write(priv, BLK_IDX_L2_LOOKUP,
+   l2_lookup.index, &l2_lookup,
+   true);
 }
 
 int sja1105pqrs_fdb_del(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid)
 {
-   return -EOPNOTSUPP;

[PATCH net-next 10/11] net: dsa: sja1105: Unset port from forwarding mask unconditionally on fdb_del

2019-06-02 Thread Vladimir Oltean
This is a cosmetic patch that simplifies the code by removing a
redundant check. Clearing a bit with a bitwise AND when that bit is
already zero leaves it zero.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c 
b/drivers/net/dsa/sja1105/sja1105_main.c
index 46e2cc7b9ddc..8343dcf48384 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -903,8 +903,8 @@ int sja1105et_fdb_del(struct dsa_switch *ds, int port,
 * need to completely evict the FDB entry.
 * Otherwise we just write it back.
 */
-   if (l2_lookup.destports & BIT(port))
-   l2_lookup.destports &= ~BIT(port);
+   l2_lookup.destports &= ~BIT(port);
+
if (l2_lookup.destports)
keep = true;
else
-- 
2.17.1
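
The equivalence this patch relies on can be checked exhaustively for a 5-bit port mask. The sketch below is a stand-alone illustration with hypothetical helper names (clear_port_checked(), clear_port(), variants_agree()); BIT() is redefined locally so that the example compiles outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (UINT64_C(1) << (n))
#define NUM_PORTS 5

/* Old form: test the bit before clearing it. */
static uint64_t clear_port_checked(uint64_t destports, int port)
{
	assert(port >= 0 && port < NUM_PORTS);
	if (destports & BIT(port))
		destports &= ~BIT(port);
	return destports;
}

/* New form: clear the bit unconditionally. */
static uint64_t clear_port(uint64_t destports, int port)
{
	assert(port >= 0 && port < NUM_PORTS);
	return destports & ~BIT(port);
}

/* Returns 1 if both forms agree for every 5-bit mask and every port. */
static int variants_agree(void)
{
	for (uint64_t mask = 0; mask < BIT(NUM_PORTS); mask++)
		for (int port = 0; port < NUM_PORTS; port++)
			if (clear_port_checked(mask, port) !=
			    clear_port(mask, port))
				return 0;
	return 1;
}
```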



[PATCH net-next 11/11] net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command

2019-06-02 Thread Vladimir Oltean
TX VLANs and RX VLANs are an internal implementation detail of DSA for
frame tagging.  They work by installing special VLANs on switch ports in
the operating modes where no behavior change w.r.t. VLANs can be
observed by the user.

Therefore it makes sense to hide these VLANs in the 'bridge fdb'
command, as well as translate the pvid into the RX VID and TX VID on
'bridge fdb add' and 'bridge fdb del' commands.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_main.c | 37 ++
 1 file changed, 37 insertions(+)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 8343dcf48384..b151a8fafb9e 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1006,7 +1006,21 @@ static int sja1105_fdb_add(struct dsa_switch *ds, int port,
 			   const unsigned char *addr, u16 vid)
 {
 	struct sja1105_private *priv = ds->priv;
+	int rc;
+
+	/* Since we make use of VLANs even when the bridge core doesn't tell us
+	 * to, translate these FDB entries into the correct dsa_8021q ones.
+	 */
+	if (!dsa_port_is_vlan_filtering(&ds->ports[port])) {
+		unsigned int upstream = dsa_upstream_port(priv->ds, port);
+		u16 tx_vid = dsa_8021q_tx_vid(ds, port);
+		u16 rx_vid = dsa_8021q_rx_vid(ds, port);
 
+		rc = priv->info->fdb_add_cmd(ds, port, addr, tx_vid);
+		if (rc < 0)
+			return rc;
+		return priv->info->fdb_add_cmd(ds, upstream, addr, rx_vid);
+	}
 	return priv->info->fdb_add_cmd(ds, port, addr, vid);
 }
 
@@ -1014,7 +1028,21 @@ static int sja1105_fdb_del(struct dsa_switch *ds, int port,
 			   const unsigned char *addr, u16 vid)
 {
 	struct sja1105_private *priv = ds->priv;
+	int rc;
 
+	/* Since we make use of VLANs even when the bridge core doesn't tell us
+	 * to, translate these FDB entries into the correct dsa_8021q ones.
+	 */
+	if (!dsa_port_is_vlan_filtering(&ds->ports[port])) {
+		unsigned int upstream = dsa_upstream_port(priv->ds, port);
+		u16 tx_vid = dsa_8021q_tx_vid(ds, port);
+		u16 rx_vid = dsa_8021q_rx_vid(ds, port);
+
+		rc = priv->info->fdb_del_cmd(ds, port, addr, tx_vid);
+		if (rc < 0)
+			return rc;
+		return priv->info->fdb_del_cmd(ds, upstream, addr, rx_vid);
+	}
 	return priv->info->fdb_del_cmd(ds, port, addr, vid);
 }
 
@@ -1049,6 +1077,15 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port,
 		if (!(l2_lookup.destports & BIT(port)))
 			continue;
 		u64_to_ether_addr(l2_lookup.macaddr, macaddr);
+
+		/* We need to hide the dsa_8021q VLAN from the user.
+		 * Convert the TX VID into the pvid that is active in
+		 * standalone and non-vlan_filtering modes, aka 1.
+		 * The RX VID is applied on the CPU port, which is not seen by
+		 * the bridge core anyway, so there's nothing to hide.
+		 */
+		if (!dsa_port_is_vlan_filtering(&ds->ports[port]))
+			l2_lookup.vlanid = 1;
 		cb(macaddr, l2_lookup.vlanid, false, data);
 	}
 	return 0;
-- 
2.17.1



Re: [PATCH RFC iproute2-next v3] tc: add support for action act_ctinfo

2019-06-02 Thread Kevin 'ldir' Darbyshire-Bryant


> On 2 Jun 2019, at 21:39, Toke Høiland-Jørgensen  wrote:
> 
> Kevin Darbyshire-Bryant  writes:
> 
>> ctinfo is an action restoring data stored in conntrack marks to various
>> fields.  At present it has two independent modes of operation,
>> restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
>> marks into packet skb marks.
>> 
>> It understands a number of parameters specific to this action in
>> addition to the usual action syntax.  Each operating mode is
>> independent of the other so all options are optional, however not
>> specifying at least one mode is a bit pointless.
>> 
>> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
>>[CONTROL] [index ]
> 
> Yay, bikeshedding time! :)

I see your bikeshed and raise you… a bus shelter :-)

> 
> As I said in reply to the kernel patch, the "X/Y" syntax usually means
> "/", where here they are just two
> semi-related mask values. So I think it would be better to just make
> 'statemask' its own parameter.

Instead of creating another keyword, how about we drop the '/' and
make it a space-separated optional parameter to 'dscp'? e.g.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] blah blah


> Other than that, just a few nits, below...
> 
>> DSCP mode
>> 
>> dscp enables copying of a DSCP stored in the conntrack mark into the
>> ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
>> in the conntrack mark the DSCP value is stored.  It must be 6 contiguous
>> bits long, e.g. 0xfc000000 would restore the DSCP from the upper 6 bits
>> of the conntrack mark.
>> 
>> The DSCP copying may be optionally controlled by a statemask.  The
>> statemask is a 32bit field, usually with a single bit set and must not
>> overlap the dscp mask.  The DSCP restore operation will only take place
>> if the corresponding bit/s in conntrack mark yield a non zero result.
>> 
>> eg. dscp 0xfc000000/0x01000000 would retrieve the DSCP from the top 6
>> bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
>> example.
>> 
>> CPMARK mode
>> 
>> cpmark enables copying of the conntrack mark to the packet skb mark.  In
>> this mode it is completely equivalent to the existing act_connmark.
>> Additional functionality is provided by the optional mask parameter,
>> whereby the stored conntrack mark is logically anded with the cpmark
>> mask before being stored into skb mark.  This allows shared usage of the
>> conntrack mark between applications.
>> 
>> eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
>> conntrack mark, thus may be useful in the event that the upper 8 bits
>> are used by the DSCP function.
>> 
>> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
>>[CONTROL] [index ]
>> where :
>>  dscp MASK is the bitmask to restore DSCP
>>   STATEMASK is the bitmask to determine conditional restoring
>>  cpmark MASK mask applied to restored packet mark
>>  ZONE is the conntrack zone
>>  CONTROL := reclassify | pipe | drop | continue | ok |
>> goto chain 
>> 
>> Signed-off-by: Kevin Darbyshire-Bryant 
>> 
>> ---
>> v2 - fix whitespace issue in pkt_cls
>> fix most warnings from checkpatch - some lines still over 80 chars
>> due to long TLV names.
>> v3 - fix some dangling else warnings.
>> refactor stats printing to please checkpatch.
>> send zone TLV even if default '0' zone.
>> now checkpatch clean even though I think some of the formatting
>> is horrible :-)
>> sending via google's smtp 'cos MS' exchange office365 appears
>> to mangle patches from git send-email.
> 
> Ah, so it wasn't just me having problems ;)

No, though I’m still not clear what’s going on or when Microsoft
improved(tm) it :-/

> 
>> include/uapi/linux/pkt_cls.h  |   1 +
>> include/uapi/linux/tc_act/tc_ctinfo.h |  34 
>> tc/Makefile   |   1 +
>> tc/m_ctinfo.c | 262 ++
>> 4 files changed, 298 insertions(+)
>> create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
>> create mode 100644 tc/m_ctinfo.c
>> 
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 51a0496f..a93680fc 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -105,6 +105,7 @@ enum tca_id {
>>  TCA_ID_IFE = TCA_ACT_IFE,
>>  TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
>>  /* other actions go here */
>> +TCA_ID_CTINFO,
>>  __TCA_ID_MAX = 255
>> };
>> 
>> diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h 
>> b/include/uapi/linux/tc_act/tc_ctinfo.h
>> new file mode 100644
>> index ..da803e05
>> --- /dev/null
>> +++ b/include/uapi/linux/tc_act/tc_ctinfo.h
>> @@ -0,0 +1,34 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +#ifndef __UAPI_TC_CTINFO_H
>> +#define __UAPI_TC_CTINFO_H
>> +
>> +#include 
>> +#include 
>> +
>> +struct tc_ctinfo {
>> +tc_gen;
>
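
The two modes described in the quoted commit message come down to a few mask operations on the 32-bit conntrack mark. The following is a minimal user-space sketch under that reading; it is not the act_ctinfo implementation, and every function name in it is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Position of the lowest set bit of a non-zero mask. */
static unsigned int sketch_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* DSCP mode: extract the DSCP from the conntrack mark.
 * Returns false when the dscp mask is empty, or when a statemask is
 * given and none of its bits are set in the mark (no restore).
 */
static bool sketch_dscp_from_ctmark(uint32_t ctmark, uint32_t dscpmask,
				    uint32_t statemask, uint8_t *dscp)
{
	if (!dscpmask)
		return false;
	if (statemask && !(ctmark & statemask))
		return false;

	*dscp = (uint8_t)((ctmark & dscpmask) >> sketch_mask_shift(dscpmask));
	return true;
}

/* CPMARK mode: the conntrack mark ANDed with the cpmark mask. */
static uint32_t sketch_cpmark(uint32_t ctmark, uint32_t mask)
{
	return ctmark & mask;
}
```

With a dscp mask of 0xfc000000 and a statemask of 0x01000000, a conntrack mark of 0xb9000000 would yield DSCP 46, while 0xb8000000 would leave the packet untouched because the flag bit is clear.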

[PATCH v2 net 1/1] net: dsa: sja1105: Fix link speed not working at 100 Mbps and below

2019-06-02 Thread Vladimir Oltean
The hardware values for link speed are held in the sja1105_speed_t enum.
However they do not increase in the order that sja1105_get_speed_cfg was
iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
SJA1105_SPEED_1000MBPS - 1 - skipping the other two).

Another bug is that the code in sja1105_adjust_port_config relies on the
fact that an invalid link speed is detected by sja1105_get_speed_cfg and
returned as -EINVAL.  However storing this into an enum that only has
positive members will cast it into an unsigned value, and it will miss
the negative check.

So take the simplest approach and remove the sja1105_get_speed_cfg
function and replace it with a simple switch-case statement.

Fixes: 8aa9ebccae87 ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Signed-off-by: Vladimir Oltean 
Suggested-by: Andrew Lunn 
---
 drivers/net/dsa/sja1105/sja1105_main.c | 32 +-
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 5412c3551bcc..25bb64ce0432 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -710,16 +710,6 @@ static int sja1105_speed[] = {
 	[SJA1105_SPEED_1000MBPS] = 1000,
 };
 
-static sja1105_speed_t sja1105_get_speed_cfg(unsigned int speed_mbps)
-{
-	int i;
-
-	for (i = SJA1105_SPEED_AUTO; i <= SJA1105_SPEED_1000MBPS; i++)
-		if (sja1105_speed[i] == speed_mbps)
-			return i;
-	return -EINVAL;
-}
-
 /* Set link speed and enable/disable traffic I/O in the MAC configuration
  * for a specific port.
  *
@@ -742,8 +732,21 @@ static int sja1105_adjust_port_config(struct sja1105_private *priv, int port,
 	mii = priv->static_config.tables[BLK_IDX_XMII_PARAMS].entries;
 	mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
 
-	speed = sja1105_get_speed_cfg(speed_mbps);
-	if (speed_mbps && speed < 0) {
+	switch (speed_mbps) {
+	case 0:
+		/* No speed update requested */
+		speed = SJA1105_SPEED_AUTO;
+		break;
+	case 10:
+		speed = SJA1105_SPEED_10MBPS;
+		break;
+	case 100:
+		speed = SJA1105_SPEED_100MBPS;
+		break;
+	case 1000:
+		speed = SJA1105_SPEED_1000MBPS;
+		break;
+	default:
 		dev_err(dev, "Invalid speed %iMbps\n", speed_mbps);
 		return -EINVAL;
 	}
@@ -753,10 +756,7 @@ static int sja1105_adjust_port_config(struct sja1105_private *priv, int port,
 	 * and we no longer need to store it in the static config (already told
 	 * hardware we want auto during upload phase).
 	 */
-	if (speed_mbps)
-		mac[port].speed = speed;
-	else
-		mac[port].speed = SJA1105_SPEED_AUTO;
+	mac[port].speed = speed;
 
 	/* On P/Q/R/S, one can read from the device via the MAC reconfiguration
 	 * tables. On E/T, MAC reconfig tables are not readable, only writable.
-- 
2.17.1
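
The second bug in the commit message comes from carrying an error code inside an enum. One way to sidestep it, sketched below in user space, is to return the error as a plain int and hand the enum back through a pointer, so a negative value is never stored in the enum at all. The enum and function names are hypothetical; the values for AUTO (0) and 1000MBPS (1) follow the commit message, while the other two are assumed for illustration.

```c
#include <errno.h>

/* Stand-in for sja1105_speed_t. Only the AUTO and 1000MBPS values are
 * taken from the commit message; the remaining two are assumptions.
 */
enum sketch_speed {
	SKETCH_SPEED_AUTO = 0,
	SKETCH_SPEED_1000MBPS = 1,
	SKETCH_SPEED_100MBPS = 2,	/* assumed value */
	SKETCH_SPEED_10MBPS = 3,	/* assumed value */
};

/* Map a speed in Mbps to the hardware encoding.
 * Returns 0 on success or -EINVAL for an unsupported speed. The error
 * travels in the int return value, never in the enum.
 */
static int sketch_speed_from_mbps(unsigned int speed_mbps,
				  enum sketch_speed *speed)
{
	switch (speed_mbps) {
	case 0:
		/* No speed update requested */
		*speed = SKETCH_SPEED_AUTO;
		return 0;
	case 10:
		*speed = SKETCH_SPEED_10MBPS;
		return 0;
	case 100:
		*speed = SKETCH_SPEED_100MBPS;
		return 0;
	case 1000:
		*speed = SKETCH_SPEED_1000MBPS;
		return 0;
	default:
		return -EINVAL;
	}
}
```

The patch itself reaches the same result by returning -EINVAL directly from the default case of the switch.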



[PATCH v2 net 0/1] Fix link speed handling for SJA1105 DSA driver

2019-06-02 Thread Vladimir Oltean
This patchset avoids two bugs in the logic handling of the enum
sja1105_speed_t which caused link speeds of 10 and 100 Mbps to not be
interpreted correctly and thus not be applied to the switch MACs.

v1 patchset can be found at:
https://www.spinics.net/lists/netdev/msg574477.html

Changes from v1:
Applied Andrew Lunn's suggestion of removing the sja1105_get_speed_cfg
function altogether instead of trying to fix it.

Vladimir Oltean (1):
  net: dsa: sja1105: Fix link speed not working at 100 Mbps and below

 drivers/net/dsa/sja1105/sja1105_main.c | 32 +-
 1 file changed, 16 insertions(+), 16 deletions(-)

-- 
2.17.1



Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF

2019-06-02 Thread Jakub Kicinski
On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:
> On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev  wrote:
> > On 05/31, Andrii Nakryiko wrote:  
> > > This patch adds support for a new way to define BPF maps. It relies on
> > > BTF to describe mandatory and optional attributes of a map, as well as
> > > captures type information of key and value naturally. This eliminates
> > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > always in sync with the key/value type.  
> > My 2c: this is too magical and relies on me knowing the expected fields.
> > (also, the compiler won't be able to help with the misspellings).  

I have mixed feelings, too.  Especially the key and value fields are
very non-idiomatic for C :(  They never hold any value or data, while
the other fields do.  That feels so awkward.  I'm no compiler expert,
but even something like:

struct map_def {
	void *key_type_ref;
} mamap = {
	.key_type_ref = &(struct key_xyz){},
};

Would feel like less of a hack to me, and then map_def doesn't have to
be different for every map.  But yeah, IDK if it's easy to (a) resolve
the type of what key_type_ref points to, or (b) do this for scalar
types.

> I don't think it's really worse than current bpf_map_def approach. In
> typical scenario, there are only two fields you need to remember: type
> and max_entries (notice, they are called exactly the same as in
> bpf_map_def, so this knowledge is transferrable). Then you'll have
> key/value, using which you are describing both type (using field's
> type) and size (calculated from the type).
> 
> I can relate a bit to that with bpf_map_def you can find definition
> and see all possible fields, but one can also find a lot of examples
> for new map definitions as well.
> 
> One big advantage of this scheme, though, is that you get that type
> association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> with no chance of having a mismatch, etc. This is less duplication (no
> need to do sizeof(struct my_struct) and struct my_struct as an arg to
> that macro) and there is no need to go and ping people to add those
> annotations to improve introspection of BPF maps.

> > > Relying on BTF, this approach allows for both forward and backward
> > > compatibility w.r.t. extending supported map definition features. Old
> > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > implementations will parse and recognize new optional attributes.  
> > I also don't know how to feel about old libbpf ignoring some attributes.
> > In the kernel we require that the unknown fields are zeroed.
> > We probably need to do something like that here? What do you think
> > would be a good example of an optional attribute?  
> 
> Ignoring is required for forward-compatibility, where old libbpf will
> be used to load newer user BPF programs. We can decide not to do it,
> in that case it's just a question of erroring out on first unknown
> field. This RFC was posted exactly to discuss all these issues with
> more general community, as there is no single true way to do this.
> 
> As for examples of when it can be used. It's any feature that can be
> considered optional or a hint, so if old libbpf doesn't do that, it's
> still not the end of the world (and we can live with that, or can
> correct using direct libbpf API calls).

On forward compatibility my 0.02c would be - if we want to go there
and silently ignore fields it'd be good to have some form of "hard
required" bit.  For TLV ABIs it can be a "you have to understand
this one" bit, for libbpf perhaps we could add a "min libbpf version
required" section?  That kind of ties our ELF formats to libbpf
specifics (the libbpf version presumably would imply support for
features), but I think we want to go there, anyway.
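
To make the "you have to understand this one" idea concrete, here is a minimal sketch of a TLV walker that skips unknown optional records but refuses unknown records carrying a must-understand flag. The record layout, flag bit, type numbers and all names are hypothetical; this is not an existing libbpf or kernel format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: top bit of 'type' means "must understand". */
#define SKETCH_TLV_F_MUST_UNDERSTAND	0x8000u
#define SKETCH_TLV_TYPE_MASK		0x7fffu

struct tlv_hdr {
	uint16_t type;	/* type number plus optional flag bit */
	uint16_t len;	/* payload length in bytes */
};

/* This sketch "knows" only types 1 and 2. */
static int sketch_tlv_known(uint16_t type)
{
	return type == 1 || type == 2;
}

/* Walk a buffer of TLVs. Returns 0 on success, -1 if the buffer is
 * malformed or contains an unknown record flagged must-understand.
 */
static int sketch_tlv_parse(const uint8_t *buf, size_t size,
			    unsigned int *known, unsigned int *skipped)
{
	size_t off = 0;

	while (off < size) {
		struct tlv_hdr hdr;

		if (size - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);
		if (hdr.len > size - off)
			return -1;

		if (sketch_tlv_known(hdr.type & SKETCH_TLV_TYPE_MASK))
			(*known)++;
		else if (hdr.type & SKETCH_TLV_F_MUST_UNDERSTAND)
			return -1;	/* old parser must not guess */
		else
			(*skipped)++;	/* optional hint, safe to ignore */

		off += hdr.len;
	}
	return 0;
}
```

An old parser built this way keeps loading objects that only add optional hints, and fails loudly on objects whose new attributes change semantics.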


Re: [PATCH v2 net 1/1] net: dsa: sja1105: Fix link speed not working at 100 Mbps and below

2019-06-02 Thread Andrew Lunn
On Mon, Jun 03, 2019 at 02:31:37AM +0300, Vladimir Oltean wrote:
> The hardware values for link speed are held in the sja1105_speed_t enum.
> However they do not increase in the order that sja1105_get_speed_cfg was
> iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
> SJA1105_SPEED_1000MBPS - 1 - skipping the other two).
> 
> Another bug is that the code in sja1105_adjust_port_config relies on the
> fact that an invalid link speed is detected by sja1105_get_speed_cfg and
> returned as -EINVAL.  However storing this into an enum that only has
> positive members will cast it into an unsigned value, and it will miss
> the negative check.
> 
> So take the simplest approach and remove the sja1105_get_speed_cfg
> function and replace it with a simple switch-case statement.
> 
> Fixes: 8aa9ebccae87 ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
> Signed-off-by: Vladimir Oltean 
> Suggested-by: Andrew Lunn 
> ---
>  drivers/net/dsa/sja1105/sja1105_main.c | 32 +-
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
> index 5412c3551bcc..25bb64ce0432 100644
> --- a/drivers/net/dsa/sja1105/sja1105_main.c
> +++ b/drivers/net/dsa/sja1105/sja1105_main.c
> @@ -710,16 +710,6 @@ static int sja1105_speed[] = {
>   [SJA1105_SPEED_1000MBPS] = 1000,
>  };
>  
> -static sja1105_speed_t sja1105_get_speed_cfg(unsigned int speed_mbps)
> -{
> - int i;
> -
> - for (i = SJA1105_SPEED_AUTO; i <= SJA1105_SPEED_1000MBPS; i++)
> - if (sja1105_speed[i] == speed_mbps)
> - return i;
> - return -EINVAL;
> -}
> -
>  /* Set link speed and enable/disable traffic I/O in the MAC configuration
>   * for a specific port.
>   *
> @@ -742,8 +732,21 @@ static int sja1105_adjust_port_config(struct sja1105_private *priv, int port,
>   mii = priv->static_config.tables[BLK_IDX_XMII_PARAMS].entries;
>   mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
>  
> - speed = sja1105_get_speed_cfg(speed_mbps);
> - if (speed_mbps && speed < 0) {
> + switch (speed_mbps) {
> + case 0:
> + /* No speed update requested */
> + speed = SJA1105_SPEED_AUTO;
> + break;
> + case 10:
> + speed = SJA1105_SPEED_10MBPS;
> + break;
> + case 100:
> + speed = SJA1105_SPEED_100MBPS;
> + break;
> + case 1000:
> + speed = SJA1105_SPEED_1000MBPS;
> + break;
> + default:
>   dev_err(dev, "Invalid speed %iMbps\n", speed_mbps);
>   return -EINVAL;
>   }

Thanks for the re-write. This looks more obviously correct. One minor
nit-pick. We have SPEED_10, SPEED_100, SPEED_1000, etc. It would be
good to use them.

With that change

Reviewed-by: Andrew Lunn 

Andrew
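
For reference, the suggested constants could be used in the switch roughly as follows. This is a user-space sketch only: SPEED_10, SPEED_100 and SPEED_1000 are redefined locally with the values they have in include/uapi/linux/ethtool.h, and the helper, which merely returns the name of the enum member that would be selected, is hypothetical.

```c
#include <stddef.h>

/* Same values as in include/uapi/linux/ethtool.h, redefined here so the
 * sketch builds outside the kernel tree.
 */
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000

/* Hypothetical helper: name of the speed setting chosen for a given
 * link speed, or NULL for an unsupported speed.
 */
static const char *sketch_speed_name(unsigned int speed_mbps)
{
	switch (speed_mbps) {
	case 0:
		return "SJA1105_SPEED_AUTO";
	case SPEED_10:
		return "SJA1105_SPEED_10MBPS";
	case SPEED_100:
		return "SJA1105_SPEED_100MBPS";
	case SPEED_1000:
		return "SJA1105_SPEED_1000MBPS";
	default:
		return NULL;
	}
}
```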


Re: [PATCH net] packet: unconditionally free po->rollover

2019-06-02 Thread David Miller
From: Willem de Bruijn 
Date: Fri, 31 May 2019 12:37:23 -0400

> From: Willem de Bruijn 
> 
> Rollover used to use a complex RCU mechanism for assignment, which had
> a race condition. The below patch fixed the bug and greatly simplified
> the logic.
> 
> The feature depends on fanout, but the state is private to the socket.
> Fanout_release returns f only when the last member leaves and the
> fanout struct is to be freed.
> 
> Destroy rollover unconditionally, regardless of fanout state.
> 
> Fixes: 57f015f5eccf2 ("packet: fix crash in fanout_demux_rollover()")
> Reported-by: syzbot 
> Diagnosed-by: Dmitry Vyukov 
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable.


Re: [PATCH net-next v3] net: add rcu annotations for ifa_list

2019-06-02 Thread David Miller
From: Florian Westphal 
Date: Fri, 31 May 2019 18:27:02 +0200

> v3: fix typo in patch1 commit message
> All other patches are unchanged.
> v2: remove ifa_list iteration in afs instead of conversion
> 
> Eric Dumazet reported following problem:
> 
>   It looks that unless RTNL is held, accessing ifa_list needs proper RCU
>   protection.  indev->ifa_list can be changed under us by another cpu
>   (which owns RTNL) [..]
> 
>   A proper rcu_dereference() with an happy sparse support would require
>   adding __rcu attribute.
> 
> This patch series does that: add __rcu to the ifa_list pointers.
> That makes sparse complain, so the series also adds the required
> rcu_assign_pointer/dereference helpers where needed.
> 
> All patches except the last one are preparation work.
> Two new macros are introduced for in_ifaddr walks.
> 
> Last patch adds the __rcu annotations and the assign_pointer/dereference
> helper calls.
> 
> This patch is a bit large, but I found no better way -- other
> approaches (annotate-first or add helpers-first) all result in
> mid-series sparse warnings.
> 
> This series is submitted vs. net-next rather than net for several
> reasons:
> 
> 1. It's (mostly) compile-tested only
> 2. 3rd patch changes behaviour wrt. secondary addresses
>(see changelog)
> 3. The problem exists for a very long time (2004), so it doesn't
>seem to be urgent to fix this -- rcu use to free ifa_list
>predates the git era.

Series applied, thanks Florian.


Re: [PATCH net-next] net: ethernet: improve eth_platform_get_mac_address

2019-06-02 Thread David Miller
From: Heiner Kallweit 
Date: Fri, 31 May 2019 19:14:44 +0200

> pci_device_to_OF_node(to_pci_dev(dev)) is the same as dev->of_node,
> so we can simplify the code. In addition add an empty line before
> the return statement.
> 
> Signed-off-by: Heiner Kallweit 

Applied.


Re: [PATCH net-next] r8169: improve r8169_csum_workaround

2019-06-02 Thread David Miller
From: Heiner Kallweit 
Date: Fri, 31 May 2019 19:17:15 +0200

> Use helper skb_is_gso() and simplify access to tx_dropped.
> 
> Signed-off-by: Heiner Kallweit 

Applied.


Re: [PATCH net-next] nexthop: Add entry to MAINTAINERS

2019-06-02 Thread David Miller
From: David Ahern 
Date: Fri, 31 May 2019 12:44:09 -0600

> From: David Ahern 
> 
> Add entry to MAINTAINERS file for new nexthop code.
> 
> Signed-off-by: David Ahern 

Applied.


Re: [PATCH net-next 0/3] r8169: replace several function pointers with direct calls

2019-06-02 Thread David Miller
From: Heiner Kallweit 
Date: Fri, 31 May 2019 19:52:24 +0200

> This series removes most function pointers from struct rtl8169_private
> and uses direct calls instead. This simplifies the code and avoids
> the penalty of indirect calls in times of retpoline.

Series applied, thanks.


Re: [PATCH 1/1] net: rds: add per rds connection cache statistics

2019-06-02 Thread santosh.shilim...@oracle.com

On 6/1/19 12:54 AM, Zhu Yanjun wrote:

The variable cache_allocs is to indicate how many frags (KiB) are in one
rds connection frag cache.
The command "rds-info -Iv" will output the rds connection cache
statistics as below:
"
RDS IB Connections:
   LocalAddr RemoteAddr Tos SL  LocalDevRemoteDev
   1.1.1.14 1.1.1.14   58 255  fe80::2:c903:a:7a31 fe80::2:c903:a:7a31
   send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
   rdma_mr_size=257, cache_allocs=12
"
This means that there are about 12KiB frag in this rds connection frag
  cache.

Tested-by: RDS CI 

Please add a valid email ID or drop the above. It's expected
that with a SOB, patches are tested before posting.


Signed-off-by: Zhu Yanjun 
---
  include/uapi/linux/rds.h | 2 ++
  net/rds/ib.c | 2 ++
  2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 5d0f76c..fd6b5f6 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
 	__u32		rdma_mr_max;
 	__u32		rdma_mr_size;
 	__u8		tos;
+	__u32		cache_allocs;

With this header file change, how is backward compatibility with
tooling taken care of? This was one of the reasons that not all
the fields were updated.

Regards,
Santosh


Re: KASAN: user-memory-access Read in ip6_hold_safe (3)

2019-06-02 Thread David Ahern
On 6/1/19 12:05 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    dfb569f2 net: ll_temac: Fix compile error

just an FYI: this is before any of my IPv6 changes in 5.2-next that are
relevant. At this commit the only IPv6 changes of mine are:

19a3b7eea424 ipv6: export function to send route updates
cdaa16a4f70c ipv6: Add hook to bump sernum for a route to stubs
68a9b13d9219 ipv6: Add delete route hook to stubs

which are function exports - unused at commit dfb569f2.


> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10afcb8aa0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fc045131472947d7
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=a5b6e01ec8116d046842
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a5b6e01ec8116d046...@syzkaller.appspotmail.com
> 
> ==
> BUG: KASAN: user-memory-access in atomic_read
> include/asm-generic/atomic-instrumented.h:26 [inline]
> BUG: KASAN: user-memory-access in atomic_fetch_add_unless
> include/linux/atomic-fallback.h:1086 [inline]
> BUG: KASAN: user-memory-access in atomic_add_unless
> include/linux/atomic-fallback.h: [inline]
> BUG: KASAN: user-memory-access in atomic_inc_not_zero
> include/linux/atomic-fallback.h:1127 [inline]
> BUG: KASAN: user-memory-access in dst_hold_safe include/net/dst.h:297
> [inline]
> BUG: KASAN: user-memory-access in ip6_hold_safe+0xad/0x380
> net/ipv6/route.c:1050
> Read of size 4 at addr 1ec4 by task syz-executor.0/10106

0xc1ec4 is not a valid address for an allocated rt6_info.

> 
> CPU: 0 PID: 10106 Comm: syz-executor.0 Not tainted 5.2.0-rc1+ #5
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
>  __kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
>  kasan_report+0x12/0x20 mm/kasan/common.c:614
>  check_memory_region_inline mm/kasan/generic.c:185 [inline]
>  check_memory_region+0x123/0x190 mm/kasan/generic.c:191
>  kasan_check_read+0x11/0x20 mm/kasan/common.c:94
>  atomic_read include/asm-generic/atomic-instrumented.h:26 [inline]
>  atomic_fetch_add_unless include/linux/atomic-fallback.h:1086 [inline]
>  atomic_add_unless include/linux/atomic-fallback.h: [inline]
>  atomic_inc_not_zero include/linux/atomic-fallback.h:1127 [inline]
>  dst_hold_safe include/net/dst.h:297 [inline]
>  ip6_hold_safe+0xad/0x380 net/ipv6/route.c:1050
>  rt6_get_pcpu_route net/ipv6/route.c:1277 [inline]

My hunch is that this is memory corruption in the pcpu memory space.

In a fib6_info, rt6i_pcpu is non-NULL for ALL fib6_info except
fib6_null_entry for which pcpu routes are never generated.

rt6i_pcpu is allocated via pcpu_alloc which means this memory space is
amongst other pcpu users and easily stepped on by other pcpu users. The
entries stored in rt6_pcpu are kmem_cache entries for the ipv6 dst cache
and either a valid allocated memory address or NULL.

Past issues with pcpu routes were the 'from' (the fib6_info used to
generate the rt6_info) being NULL (several), the fib entry getting
released more than it should (0e2338749192) or not getting freed at all
(61fb0d016807).


Re: general protection fault in tcp_v6_connect

2019-06-02 Thread David Ahern
On 6/1/19 12:05 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    f4aa8012 cxgb4: Make t4_get_tp_e2c_map static
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1662cb12a0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d137eb988ffd93c3
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=5ee26b4e30c45930bd3c
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+5ee26b4e30c45930b...@syzkaller.appspotmail.com
> 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 17324 Comm: syz-executor.5 Not tainted 5.2.0-rc1+ #2
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> RIP: 0010:rt6_get_cookie include/net/ip6_fib.h:264 [inline]
> RIP: 0010:ip6_dst_store include/net/ip6_route.h:213 [inline]
> RIP: 0010:tcp_v6_connect+0xfd0/0x20a0 net/ipv6/tcp_ipv6.c:298
> Code: 89 e6 e8 83 a2 48 fb 45 84 e4 0f 84 90 09 00 00 e8 35 a1 48 fb 49
> 8d 7e 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02
> 00 0f 85 57 0e 00 00 4d 8b 66 70 e8 4d 88 35 fb 31 ff 89
> RSP: 0018:888066547800 EFLAGS: 00010207
> RAX: dc00 RBX: 888064e839f0 RCX: c90010e49000
> RDX: 002b RSI: 8628033b RDI: 015f
> RBP: 888066547980 R08: 8880a9412080 R09: ed1015d26be0

This one is not so obvious.

The error has to be a bad dst from ip6_dst_lookup_flow called by
tcp_v6_connect which then is attempted to be stored in the socket via
ip6_dst_store. ip6_dst_store calls rt6_get_cookie with dst as the
argument. RDI (first arg for x86) shows 0x15f which is not a valid
address and would cause a fault.

None of the ip6_dst_* functions in net/ipv6/ip6_output.c have changed
recently (5.2-next definitely, but I believe this is true for many
releases prior). Further, all of the FIB lookup functions (called by
ip6_dst_lookup_flow) always return a non-NULL dst.

If my hunch about the other splat is correct (pcpu corruption) that
could explain this one: FIB lookup is fine and finds an entry, the entry
has a pcpu cache entry so it is returned. If the pcpu entry was stomped
on then it would be invalid and the above would result.


[PATCH v2 net-next 4/7] ipv6: Plumb support for nexthop object in a fib6_info

2019-06-02 Thread David Ahern
From: David Ahern 

Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
referencing a nexthop object can not have 'sibling' entries (the old way
of doing multipath routes), the nh_list is a union with fib6_siblings.

Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
using a nexthop instance. Update __remove_nexthop_fib to walk f6i_list
and delete fib entries using the nexthop.

Add a few nexthop helpers for use when a nexthop is added to fib6_info:
- nexthop_fib6_nh - return first fib6_nh in a nexthop object
- fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
  if the fib6_info references a nexthop object
- nexthop_path_fib6_result - similar to ipv4, select a path within a
  multipath nexthop object. If the nexthop is a blackhole, set
  fib6_result type to RTN_BLACKHOLE, and set the REJECT flag

Update the fib6_info references to check for nh and take a different path
as needed:
- rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
  be coalesced with other fib entries into a multipath route
- rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
  a nexthop
- addrconf (host routes), RA's and info entries (anything configured via
  ndisc) do not use nexthop objects
- fib6_info_destroy_rcu - put reference to nexthop object
- fib6_purge_rt - drop fib6_info from f6i_list
- fib6_select_path - update to use the new nexthop_path_fib6_result when
  fib entry uses a nexthop object
- rt6_device_match - update to catch use of nexthop object as a blackhole
  and set fib6_type and flags.
- ip6_pol_route - detect the REJECT flag getting set for blackhole nexthop
  and jump to ip6_create_rt_rcu
- ip6_route_info_create - don't add space for fib6_nh if fib entry is
  going to reference a nexthop object, take a reference to nexthop object,
  disallow use of source routing
- rt6_nlmsg_size - add space for RTA_NH_ID
- add rt6_fill_node_nexthop to add nexthop data on a dump

As with ipv4, most of the changes push existing code into the else branch
of whether the fib entry uses a nexthop object.

Update the nexthop code to walk f6i_list when a nexthop is deleted and
remove the fib entries referencing it.

Signed-off-by: David Ahern 
---
 include/net/ip6_fib.h   |  11 ++--
 include/net/ip6_route.h |  13 +++-
 include/net/nexthop.h   |  50 
 net/ipv4/nexthop.c  |  44 ++
 net/ipv6/addrconf.c |   5 ++
 net/ipv6/ip6_fib.c  |  22 +--
 net/ipv6/ndisc.c|   3 +-
 net/ipv6/route.c| 156 +---
 8 files changed, 268 insertions(+), 36 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index ebe5d65f97e0..1a8acd51b277 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -146,7 +146,10 @@ struct fib6_info {
 	 * destination, but not the same gateway. nsiblings is just a cache
 	 * to speed up lookup.
 	 */
-	struct list_head		fib6_siblings;
+	union {
+		struct list_head	fib6_siblings;
+		struct list_head	nh_list;
+	};
 	unsigned int			fib6_nsiblings;
 
 	refcount_t			fib6_ref;
@@ -170,6 +173,7 @@ struct fib6_info {
 					unused:3;
 
 	struct rcu_head			rcu;
+	struct nexthop			*nh;
 	struct fib6_nh			fib6_nh[0];
 };
 
@@ -441,11 +445,6 @@ void rt6_get_prefsrc(const struct rt6_info *rt, struct in6_addr *addr)
 	rcu_read_unlock();
 }
 
-static inline struct net_device *fib6_info_nh_dev(const struct fib6_info *f6i)
-{
-	return f6i->fib6_nh->fib_nh_dev;
-}
-
 int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
 		 struct fib6_config *cfg, gfp_t gfp_flags,
 		 struct netlink_ext_ack *extack);
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index a6ce6ea856b9..7375a165fd98 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -27,6 +27,7 @@ struct route_info {
 #include 
 #include 
 #include 
+#include 
 
 #define RT6_LOOKUP_F_IFACE 0x0001
 #define RT6_LOOKUP_F_REACHABLE 0x0002
@@ -66,10 +67,13 @@ static inline bool rt6_need_strict(const struct in6_addr 
*daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | 
IPV6_ADDR_LOOPBACK);
 }
 
+/* fib entries using a nexthop object can not be coalesced into
+ * a multipath route
+ */
 static inline bool rt6_qualify_for_ecmp(const struct fib6_info *f6i)
 {
/* the RTF_ADDRCONF flag filters out RA's */
-   return !(f6i->fib6_flags & RTF_ADDRCONF) &&
+   return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh &&
f6i->fib6_nh->fib_nh_gw_family;
 }
 
@@ -275,8 +279,13 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info 
*rt,
 
 static i

[PATCH v2 net-next 1/7] ipv4: Use accessors for fib_info nexthop data

2019-06-02 Thread David Ahern
From: David Ahern 

Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
fib_dev macro which is an alias for the first nexthop. Replacements:

  fi->fib_dev    --> fib_info_nh(fi, 0)->fib_nh_dev
  fi->fib_nh     --> fib_info_nh(fi, 0)
  fi->fib_nh[i]  --> fib_info_nh(fi, i)
  fi->fib_nhs    --> fib_info_num_path(fi)

where fib_info_nh(fi, i) returns a pointer to fi->fib_nh[i] and
fib_info_num_path returns fi->fib_nhs.
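
For readers skimming the archive, the accessor pattern described above can
be sketched in userspace C as follows. This is only an illustration under
stated assumptions: the structures are cut-down stand-ins rather than the
kernel definitions, the bounds assert and the fib_info_uses_dev() caller
are hypothetical additions, and the real helpers live in
include/net/nexthop.h.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed to show the accessors are kept).
 */
struct net_device {
	int ifindex;
};

struct fib_nh {
	struct net_device *fib_nh_dev;
};

struct fib_info {
	int fib_nhs;             /* number of valid entries in fib_nh[] */
	struct fib_nh fib_nh[2]; /* fixed size here; trailing array in the kernel */
};

/* Number of paths in the entry; replaces open-coded fi->fib_nhs. */
static inline unsigned int fib_info_num_path(const struct fib_info *fi)
{
	return fi->fib_nhs;
}

/* Pointer to the nhsel'th nexthop; replaces open-coded fi->fib_nh[nhsel].
 * The bounds assert is an illustration-only addition.
 */
static inline struct fib_nh *fib_info_nh(struct fib_info *fi, int nhsel)
{
	assert(nhsel >= 0 && nhsel < fi->fib_nhs);
	return &fi->fib_nh[nhsel];
}

/* Hypothetical caller: does any path of the entry use the given device? */
static int fib_info_uses_dev(struct fib_info *fi, const struct net_device *dev)
{
	unsigned int i;

	for (i = 0; i < fib_info_num_path(fi); i++) {
		if (fib_info_nh(fi, i)->fib_nh_dev == dev)
			return 1;
	}
	return 0;
}
```

Because callers only go through the two helpers, a later change can make
them consult a nexthop object without touching each call site.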

Move the existing fib_info_nhc to nexthop.h and define the new ones
there. A later patch adds a check if a fib_info uses a nexthop object,
and defining the helpers in nexthop.h avoids circular header
dependencies.

After this all remaining open coded references to fi->fib_nhs and
fi->fib_nh are in:
- fib_create_info and helpers used to lookup an existing fib_info
  entry, and
- the netdev event functions fib_sync_down_dev and fib_sync_up.

The latter two will not be reused for nexthops, and the fib_create_info
will be updated to handle a nexthop in a fib_info.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c   | 29 ++
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 19 ---
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 25 +---
 include/net/ip_fib.h   |  6 --
 include/net/nexthop.h  | 15 +
 net/core/filter.c  |  3 +-
 net/ipv4/fib_frontend.c| 11 ++--
 net/ipv4/fib_lookup.h  |  1 +
 net/ipv4/fib_rules.c   |  8 ++-
 net/ipv4/fib_semantics.c   | 66 --
 net/ipv4/fib_trie.c| 26 +
 net/ipv4/route.c   |  3 +-
 12 files changed, 132 insertions(+), 80 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
index 8212bfd05733..2cbfaa8da7fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include 
+#include 
 #include "lag.h"
 #include "lag_mp.h"
 #include "mlx5_core.h"
@@ -110,6 +111,8 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev,
 struct fib_info *fi)
 {
struct lag_mp *mp = &ldev->lag_mp;
+   struct fib_nh *fib_nh0, *fib_nh1;
+   unsigned int nhs;
 
/* Handle delete event */
if (event == FIB_EVENT_ENTRY_DEL) {
@@ -120,9 +123,11 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev,
}
 
/* Handle add/replace event */
-   if (fi->fib_nhs == 1) {
+   nhs = fib_info_num_path(fi);
+   if (nhs == 1) {
if (__mlx5_lag_is_active(ldev)) {
-   struct net_device *nh_dev = fi->fib_nh[0].fib_nh_dev;
+   struct fib_nh *nh = fib_info_nh(fi, 0);
+   struct net_device *nh_dev = nh->fib_nh_dev;
int i = mlx5_lag_dev_get_netdev_idx(ldev, nh_dev);
 
mlx5_lag_set_port_affinity(ldev, ++i);
@@ -130,14 +135,16 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag 
*ldev,
return;
}
 
-   if (fi->fib_nhs != 2)
+   if (nhs != 2)
return;
 
/* Verify next hops are ports of the same hca */
-   if (!(fi->fib_nh[0].fib_nh_dev == ldev->pf[0].netdev &&
- fi->fib_nh[1].fib_nh_dev == ldev->pf[1].netdev) &&
-   !(fi->fib_nh[0].fib_nh_dev == ldev->pf[1].netdev &&
- fi->fib_nh[1].fib_nh_dev == ldev->pf[0].netdev)) {
+   fib_nh0 = fib_info_nh(fi, 0);
+   fib_nh1 = fib_info_nh(fi, 1);
+   if (!(fib_nh0->fib_nh_dev == ldev->pf[0].netdev &&
+ fib_nh1->fib_nh_dev == ldev->pf[1].netdev) &&
+   !(fib_nh0->fib_nh_dev == ldev->pf[1].netdev &&
+ fib_nh1->fib_nh_dev == ldev->pf[0].netdev)) {
mlx5_core_warn(ldev->pf[0].dev, "Multipath offload require two 
ports of the same HCA\n");
return;
}
@@ -174,7 +181,7 @@ static void mlx5_lag_fib_nexthop_event(struct mlx5_lag 
*ldev,
mlx5_lag_set_port_affinity(ldev, i);
}
} else if (event == FIB_EVENT_NH_ADD &&
-  fi->fib_nhs == 2) {
+  fib_info_num_path(fi) == 2) {
mlx5_lag_set_port_affinity(ldev, 0);
}
 }
@@ -238,6 +245,7 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
struct mlx5_fib_event_work *fib_work;
struct fib_entry_notifier_info *fen_info;
struct fib_nh_notifier_info *fnh_info;
+   struct net_device *fib_dev;
struct fib_info *fi;
 
if (info->family != AF_INET)
@@ -254,8 +262,9 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
fen_info = container

[PATCH v2 net-next 6/7] mlx5: Fail attempts to use routes with nexthop objects

2019-06-02 Thread David Ahern
From: David Ahern 

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
index 2cbfaa8da7fc..e69766393990 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
@@ -262,6 +262,10 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
fen_info = container_of(info, struct fib_entry_notifier_info,
info);
fi = fen_info->fi;
+   if (fi->nh) {
+   NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route with 
nexthop objects is not supported");
+   return notifier_from_errno(-EINVAL);
+   }
fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev;
if (fib_dev != ldev->pf[0].netdev &&
fib_dev != ldev->pf[1].netdev) {
-- 
2.11.0



[PATCH v2 net-next 5/7] mlxsw: Fail attempts to use routes with nexthop objects

2019-06-02 Thread David Ahern
From: David Ahern 

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern 
Reviewed-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 4f781358aef1..23f17ea52061 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -6122,6 +6122,20 @@ static int mlxsw_sp_router_fib_event(struct 
notifier_block *nb,
NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway 
with IPv4 route is not supported");
return notifier_from_errno(-EINVAL);
}
+   if (fen_info->fi->nh) {
+   NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route 
with nexthop objects is not supported");
+   return notifier_from_errno(-EINVAL);
+   }
+   } else if (info->family == AF_INET6) {
+   struct fib6_entry_notifier_info *fen6_info;
+
+   fen6_info = container_of(info,
+struct 
fib6_entry_notifier_info,
+info);
+   if (fen6_info->rt->nh) {
+   NL_SET_ERR_MSG_MOD(info->extack, "IPv6 route 
with nexthop objects is not supported");
+   return notifier_from_errno(-EINVAL);
+   }
}
break;
}
-- 
2.11.0



[PATCH v2 net-next 7/7] rocker: Fail attempts to use routes with nexthop objects

2019-06-02 Thread David Ahern
From: David Ahern 

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/rocker/rocker_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c 
b/drivers/net/ethernet/rocker/rocker_main.c
index 7ae6c124bfe9..45b3325c3a38 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2214,6 +2214,10 @@ static int rocker_router_fib_event(struct notifier_block 
*nb,
NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway 
with IPv4 route is not supported");
return notifier_from_errno(-EINVAL);
}
+   if (fen_info->fi->nh) {
+   NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route 
with nexthop objects is not supported");
+   return notifier_from_errno(-EINVAL);
+   }
}
 
memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
-- 
2.11.0



Re: [PATCH 1/1] net: rds: add per rds connection cache statistics

2019-06-02 Thread Yanjun Zhu



On 2019/6/3 11:03, santosh.shilim...@oracle.com wrote:

On 6/1/19 12:54 AM, Zhu Yanjun wrote:

The variable cache_allocs indicates how many frags (in KiB) are in one
rds connection frag cache.
The command "rds-info -Iv" will output the rds connection cache
statistics as below:
"
RDS IB Connections:
   LocalAddr RemoteAddr Tos SL  LocalDev RemoteDev
   1.1.1.14 1.1.1.14   58 255  fe80::2:c903:a:7a31 fe80::2:c903:a:7a31

   send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
   rdma_mr_size=257, cache_allocs=12
"
This means that there are about 12 KiB of frags in this rds connection
frag cache.

Tested-by: RDS CI 

Please add a valid email id or drop the above. It's expected
that, with an SOB, patches have been tested before posting.


Thanks for review.

OK. I will remove this in V2.




Signed-off-by: Zhu Yanjun 
---
  include/uapi/linux/rds.h | 2 ++
  net/rds/ib.c | 2 ++
  2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 5d0f76c..fd6b5f6 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
  __u32    rdma_mr_max;
  __u32    rdma_mr_size;
  __u8    tos;
+    __u32    cache_allocs;

This changes the header file; how is backward compatibility
with tooling taken care of?


Just now I made tests with rds-tools.

In this commit

"

commit 6c03b61e9097098d35b4c2be16d0f0f9f8357112
Author: Santosh Shilimkar 
Date:   Wed Mar 9 04:30:48 2016 -0800

    rds-tools: sync up sources with 2.0.7-1.16
"

cache_allocs is added into rds-tools. The diff is as below.

"

@@ -176,6 +191,9 @@ struct rds_info_rdma_connection {
    uint32_t    max_send_sge;
    uint32_t    rdma_mr_max;
    uint32_t    rdma_mr_size;
+   uint8_t tos;
+   uint8_t sl;
+   uint32_t    cache_allocs;
 };
"
So cache_allocs does not exist in rds-tools 2.0.5 and rds-tools 2.0.6.


I made tests with 2.0.5 and 2.0.6

"

rds-info -V
rds-info: Invalid option '-V'
rds-info version 2.0.5

[root@ca-dev14 rds-tools]# rds-info -Iv

RDS IB Connections:
  LocalAddr  RemoteAddr LocalDev    RemoteDev
   1.1.1.14    1.1.1.14 fe80::2:c903:a:7a31  
fe80::2:c903:a:7a31  send_wr=256, recv_wr=1024, send_sge=8, 
rdma_mr_max=4096, rdma_mr_size=257

"

"

[root@ca-dev14 rds-tools]# rds-info -V
rds-info: Invalid option '-V'
rds-info version 2.0.6

[root@ca-dev14 rds-tools]# rds-info -Iv

RDS IB Connections:
  LocalAddr  RemoteAddr LocalDev    RemoteDev
   1.1.1.14    1.1.1.14 fe80::2:c903:a:7a31  
fe80::2:c903:a:7a31  send_wr=256, recv_wr=1024, send_sge=8, 
rdma_mr_max=4096, rdma_mr_size=257

"

In the output of rds-tools 2.0.5 and 2.0.6, cache_allocs does not appear
because it does not exist in their struct rds_info_rdma_connection.


But in rds-tools 2.0.7, cache_allocs exists in struct 
rds_info_rdma_connection.


"

[root@ca-dev14 rds-tools]# rds-info -V
rds-info: invalid option -- 'V'

rds-info version 2.0.7

[root@ca-dev14 rds-tools]# rds-info -Iv

RDS IB Connections:
  LocalAddr  RemoteAddr  Tos  SL 
LocalDev    RemoteDev
   1.1.1.14    1.1.1.14    5 255 
fe80::2:c903:a:7a31  fe80::2:c903:a:7a31  send_wr=256, 
recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257, 
cache_allocs=12

"

So do not worry about backward compatibility. This commit will work
well with the older rds-tools 2.0.5 and 2.0.6.


I will send V2 soon.

Thanks

Zhu Yanjun


This was one of the reasons why
not all of the fields are updated.

Regards,
Santosh


[PATCH v2 net-next 0/7] net: add struct nexthop to fib{6}_info

2019-06-02 Thread David Ahern
From: David Ahern 

This set adds 'struct nexthop' to fib_info and fib6_info. IPv4
already handles multiple fib_nh entries in a single fib_info, so
the conversion to use a nexthop struct is fairly mechanical. IPv6
using a nexthop struct with a fib6_info impacts a lot of core logic
which is built around the assumption of a single, builtin fib6_nh
per fib6_info. To make this easier to review, this set adds
nexthop to fib6_info and adds checks in most places fib6_info is
used. The next set finishes the IPv6 conversion, walking through
the places that need to consider all fib6_nh within a nexthop struct.

Offload drivers - mlx5, mlxsw and rocker - are changed to fail FIB
entries using nexthop objects. That limitation can be removed once
the drivers are updated to properly support separate nexthops.

This set starts by adding accessors for fib_nh and fib_nhs in a
fib_info. This makes it easier to extract the number of nexthops
in the fib entry and a specific fib_nh once the entry references
a struct nexthop. Patch 2 converts more of IPv4 code to use
fib_nh_common allowing a struct nexthop to use a fib6_nh with an
IPv4 entry.

Patches 3 and 4 add 'struct nexthop' to fib{6}_info and update
references to both take a different path when it is set. New
exported functions are added to the nexthop code to validate a
nexthop struct when configured for use with a fib entry. IPv4
is allowed to use a nexthop with either v4 or v6 entries. IPv6
is limited to v6 entries only. In both cases list_heads track
the fib entries using a nexthop struct for fast correlation on
events (e.g., device events or nexthop events like delete or
replace).

The last 3 patches add hooks to drivers listening for FIB
notifications. All 3 of them reject the routes as unsupported,
returning an error message to the user via extack. For mlxsw
at least this is a stop gap measure until the driver is updated for
proper support.

Functional tests for nexthops have already been committed. Those tests
will be active after the next patch set which makes the code paths
created by this set and the next one live.

Existing code paths moved to the else branch of 'if (f{6}i->nh)' checks
are covered by existing tests under selftests/net.

v2
- no code changes from v1
- commit messages for first 4 patches updated

David Ahern (7):
  ipv4: Use accessors for fib_info nexthop data
  ipv4: Prepare for fib6_nh from a nexthop object
  ipv4: Plumb support for nexthop object in a fib_info
  ipv6: Plumb support for nexthop object in a fib6_info
  mlxsw: Fail attempts to use routes with nexthop objects
  mlx5: Fail attempts to use routes with nexthop objects
  rocker: Fail attempts to use routes with nexthop objects

 drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c   |  33 ++-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  33 ++-
 drivers/net/ethernet/rocker/rocker_main.c  |   4 +
 drivers/net/ethernet/rocker/rocker_ofdpa.c |  25 +-
 include/net/ip6_fib.h  |  11 +-
 include/net/ip6_route.h|  13 +-
 include/net/ip_fib.h   |  25 +-
 include/net/nexthop.h  | 113 +
 net/core/filter.c  |   3 +-
 net/ipv4/fib_frontend.c|  15 +-
 net/ipv4/fib_lookup.h  |   1 +
 net/ipv4/fib_rules.c   |   8 +-
 net/ipv4/fib_semantics.c   | 257 ++---
 net/ipv4/fib_trie.c|  38 ++-
 net/ipv4/nexthop.c | 111 -
 net/ipv4/route.c   |   5 +-
 net/ipv6/addrconf.c|   5 +
 net/ipv6/ip6_fib.c |  22 +-
 net/ipv6/ndisc.c   |   3 +-
 net/ipv6/route.c   | 156 +++--
 20 files changed, 706 insertions(+), 175 deletions(-)

-- 
2.11.0



[PATCH v2 net-next 2/7] ipv4: Prepare for fib6_nh from a nexthop object

2019-06-02 Thread David Ahern
From: David Ahern 

Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
to use a fib6_nh based nexthop. In the end, only code not using a
nexthop object in a fib_info should directly access fib_nh in a fib_info
without checking the family and going through fib_nh_common. Those
functions will be marked when it is not directly evident.
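
The family check described above can be illustrated with a small userspace
sketch. It is modeled on the fib_combine_itag() hunk in this patch, but it
is not the kernel code: the structures are reduced to a few fields, the
container_of() macro is re-implemented locally, and nhc_itag() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Local re-implementation of the kernel macro for this sketch. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Fields shared by the IPv4 and IPv6 nexthop structures (simplified). */
struct fib_nh_common {
	unsigned char nhc_family; /* AF_INET or AF_INET6 */
	int nhc_oif;
};

struct fib_nh {
	struct fib_nh_common nh_common;
	uint32_t nh_tclassid; /* IPv4-only data */
};

struct fib6_nh {
	struct fib_nh_common nh_common;
};

/* Hypothetical helper: only touch IPv4-specific data after checking the
 * family stored in the common part; any other family yields 0.
 */
static uint32_t nhc_itag(struct fib_nh_common *nhc)
{
	assert(nhc != NULL);

	if (nhc->nhc_family == AF_INET) {
		struct fib_nh *nh;

		nh = container_of(nhc, struct fib_nh, nh_common);
		return nh->nh_tclassid << 16;
	}
	return 0;
}
```

Code that only needs the device or oif can stay on fib_nh_common and never
needs the family check; the container_of() step is required only where
IPv4-specific fields such as nh_tclassid are read.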

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h | 15 +
 net/ipv4/fib_frontend.c  | 12 +--
 net/ipv4/fib_rules.c |  4 ++--
 net/ipv4/fib_semantics.c | 55 +---
 net/ipv4/fib_trie.c  | 15 +++--
 net/ipv4/nexthop.c   |  3 ++-
 net/ipv4/route.c |  2 +-
 7 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 42b1a806f6f5..7da8ea784029 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -195,8 +195,8 @@ struct fib_result_nl {
 #define FIB_TABLE_HASHSZ 2
 #endif
 
-__be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh,
-   unsigned char scope);
+__be32 fib_info_update_nhc_saddr(struct net *net, struct fib_nh_common *nhc,
+unsigned char scope);
 __be32 fib_result_prefsrc(struct net *net, struct fib_result *res);
 
 #define FIB_RES_NHC(res)   ((res).nhc)
@@ -455,11 +455,18 @@ static inline void fib_combine_itag(u32 *itag, const 
struct fib_result *res)
 {
 #ifdef CONFIG_IP_ROUTE_CLASSID
struct fib_nh_common *nhc = res->nhc;
-   struct fib_nh *nh = container_of(nhc, struct fib_nh, nh_common);
 #ifdef CONFIG_IP_MULTIPLE_TABLES
u32 rtag;
 #endif
-   *itag = nh->nh_tclassid << 16;
+   if (nhc->nhc_family == AF_INET) {
+   struct fib_nh *nh;
+
+   nh = container_of(nhc, struct fib_nh, nh_common);
+   *itag = nh->nh_tclassid << 16;
+   } else {
+   *itag = 0;
+   }
+
 #ifdef CONFIG_IP_MULTIPLE_TABLES
rtag = res->tclassid;
if (*itag == 0)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ab369959ce0b..8e49baa00d20 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -235,9 +235,9 @@ static inline unsigned int __inet_dev_addr_type(struct net 
*net,
if (table) {
ret = RTN_UNICAST;
if (!fib_table_lookup(table, &fl4, &res, FIB_LOOKUP_NOREF)) {
-   struct fib_nh *nh = fib_info_nh(res.fi, 0);
+   struct fib_nh_common *nhc = fib_info_nhc(res.fi, 0);
 
-   if (!dev || dev == nh->fib_nh_dev)
+   if (!dev || dev == nhc->nhc_dev)
ret = res.type;
}
}
@@ -325,18 +325,18 @@ bool fib_info_nh_uses_dev(struct fib_info *fi, const 
struct net_device *dev)
int ret;
 
for (ret = 0; ret < fib_info_num_path(fi); ret++) {
-   const struct fib_nh *nh = fib_info_nh(fi, ret);
+   const struct fib_nh_common *nhc = fib_info_nhc(fi, ret);
 
-   if (nh->fib_nh_dev == dev) {
+   if (nhc->nhc_dev == dev) {
dev_match = true;
break;
-   } else if (l3mdev_master_ifindex_rcu(nh->fib_nh_dev) == 
dev->ifindex) {
+   } else if (l3mdev_master_ifindex_rcu(nhc->nhc_dev) == 
dev->ifindex) {
dev_match = true;
break;
}
}
 #else
-   if (fib_info_nh(fi, 0)->fib_nh_dev == dev)
+   if (fib_info_nhc(fi, 0)->nhc_dev == dev)
dev_match = true;
 #endif
 
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index ab06fd73b343..88807c138df4 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -147,9 +147,9 @@ static bool fib4_rule_suppress(struct fib_rule *rule, 
struct fib_lookup_arg *arg
struct net_device *dev = NULL;
 
if (result->fi) {
-   struct fib_nh *nh = fib_info_nh(result->fi, 0);
+   struct fib_nh_common *nhc = fib_info_nhc(result->fi, 0);
 
-   dev = nh->fib_nh_dev;
+   dev = nhc->nhc_dev;
}
 
/* do not accept result if the route does
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index a37ff07718a8..4a12c69f7fa1 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -61,6 +61,9 @@ static unsigned int fib_info_cnt;
 #define DEVINDEX_HASHSIZE (1U << DEVINDEX_HASHBITS)
 static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];
 
+/* for_nexthops and change_nexthops only used when nexthop object
+ * is not set in a fib_info. The logic within can reference fib_nh.
+ */
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 
 #define for_nexthops(fi) { \
@@ -402,20 +405,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
 
/* each nexthop is packed in a

[PATCH v2 net-next 3/7] ipv4: Plumb support for nexthop object in a fib_info

2019-06-02 Thread David Ahern
From: David Ahern 

Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
fib_info side of the nexthop <-> fib_info relationship.

Add fi_list list_head to 'struct nexthop' to track fib_info entries
using a nexthop instance. Add __remove_nexthop_fib and add it to
__remove_nexthop to walk the new list_head and mark those fib entries
as dead when the nexthop is deleted.
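
As a rough userspace sketch of the list relationship described above (not
the kernel implementation): each fib_info links itself onto the nexthop's
fi_list through its nh_list member, and deleting the nexthop walks that
list and flags every entry. The list helpers are minimal local
re-implementations, and FIB_F_DEAD, fib_info_link_nexthop() and
remove_nexthop_fib() are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define FIB_F_DEAD 0x1 /* hypothetical flag value for this sketch */

struct nexthop {
	struct list_head fi_list; /* fib entries using this nexthop */
};

struct fib_info {
	struct list_head nh_list; /* linkage on nexthop->fi_list */
	struct nexthop *nh;
	unsigned int fib_flags;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Record that a fib entry uses the given nexthop. */
static void fib_info_link_nexthop(struct fib_info *fi, struct nexthop *nh)
{
	fi->nh = nh;
	list_add_tail(&fi->nh_list, &nh->fi_list);
}

/* On nexthop removal, mark every fib entry on fi_list as dead.
 * Returns the number of entries that were marked.
 */
static unsigned int remove_nexthop_fib(struct nexthop *nh)
{
	struct list_head *pos;
	unsigned int count = 0;

	for (pos = nh->fi_list.next; pos != &nh->fi_list; pos = pos->next) {
		struct fib_info *fi;

		fi = container_of(pos, struct fib_info, nh_list);
		fi->fib_flags |= FIB_F_DEAD;
		count++;
	}
	return count;
}
```

Keeping the list on the nexthop means the delete path touches only the
entries that reference it, with no scan of the whole FIB.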

Add a few nexthop helpers for use when a nexthop is added to fib_info:
- nexthop_cmp to determine if 2 nexthops are the same
- nexthop_path_fib_result to select a path for a multipath
  'struct nexthop'
- nexthop_fib_nhc to select a specific fib_nh_common within a
  multipath 'struct nexthop'

Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
case.

Update the fib_info functions to check for fi->nh and take a different
path as needed:
- free_fib_info_rcu - put the nexthop object reference
- fib_release_info - remove the fib_info from the nexthop's fi_list
- nh_comp - use nexthop_cmp when either fib_info references a nexthop
  object
- fib_info_hashfn - use the nexthop id for the hashing vs the oif of
  each fib_nh in a fib_info
- fib_nlmsg_size - add space for the RTA_NH_ID attribute
- fib_create_info - verify nexthop reference can be taken, verify
  nexthop spec is valid for fib entry, and add fib_info to fi_list for
  a nexthop
- fib_select_multipath - use the new nexthop_path_fib_result to select a
  path when nexthop objects are used
- fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
  it the same as a fib entry using 'blackhole'

The bulk of the changes are in fib_semantics.c and most of that is
moving the existing change_nexthops into an else branch.

Update the nexthop code to walk fi_list when a nexthop is deleted and
remove the fib entries referencing it.

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h |   4 ++
 include/net/nexthop.h|  48 
 net/ipv4/fib_semantics.c | 142 +++
 net/ipv4/fib_trie.c  |   7 +++
 net/ipv4/nexthop.c   |  64 +
 5 files changed, 229 insertions(+), 36 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 7da8ea784029..071d280de389 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -129,9 +129,12 @@ struct fib_nh {
  * This structure contains data shared by many of routes.
  */
 
+struct nexthop;
+
 struct fib_info {
struct hlist_node   fib_hash;
struct hlist_node   fib_lhash;
+   struct list_headnh_list;
struct net  *fib_net;
int fib_treeref;
refcount_t  fib_clntref;
@@ -151,6 +154,7 @@ struct fib_info {
int fib_nhs;
boolfib_nh_is_v6;
boolnh_updated;
+   struct nexthop  *nh;
struct rcu_head rcu;
struct fib_nh   fib_nh[0];
 };
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index e501d77b82c8..2912a2d7a515 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -77,6 +77,7 @@ struct nh_group {
 
 struct nexthop {
struct rb_node  rb_node;/* entry on netns rbtree */
+   struct list_headfi_list;/* v4 entries using nh */
struct list_headgrp_list;   /* nh group entries using this nh */
struct net  *net;
 
@@ -110,6 +111,12 @@ static inline void nexthop_put(struct nexthop *nh)
call_rcu(&nh->rcu, nexthop_free_rcu);
 }
 
+static inline bool nexthop_cmp(const struct nexthop *nh1,
+  const struct nexthop *nh2)
+{
+   return nh1 == nh2;
+}
+
 static inline bool nexthop_is_multipath(const struct nexthop *nh)
 {
if (nh->is_group) {
@@ -193,18 +200,59 @@ static inline bool nexthop_is_blackhole(const struct 
nexthop *nh)
return nhi->reject_nh;
 }
 
+static inline void nexthop_path_fib_result(struct fib_result *res, int hash)
+{
+   struct nh_info *nhi;
+   struct nexthop *nh;
+
+   nh = nexthop_select_path(res->fi->nh, hash);
+   nhi = rcu_dereference(nh->nh_info);
+   res->nhc = &nhi->fib_nhc;
+}
+
+/* called with rcu read lock or rtnl held */
+static inline
+struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel)
+{
+   struct nh_info *nhi;
+
+   BUILD_BUG_ON(offsetof(struct fib_nh, nh_common) != 0);
+   BUILD_BUG_ON(offsetof(struct fib6_nh, nh_common) != 0);
+
+   if (nexthop_is_multipath(nh)) {
+   nh = nexthop_mpath_select(nh, nhsel);
+   if (!nh)
+   return NULL;
+   }
+
+   nhi = rcu_dereference_rtnl(nh->nh_info);
+   return &nhi->fib_nhc;
+}
+
 static inline unsigned int fib_info_num_path(const struct fib_info *fi)
 {
+   if (unlikely(fi->nh))
+   return ne

[PATCHv2 1/1] net: rds: add per rds connection cache statistics

2019-06-02 Thread Zhu Yanjun
The variable cache_allocs indicates how many frags (in KiB) are in one
rds connection frag cache.
The command "rds-info -Iv" will output the rds connection cache
statistics as below:
"
RDS IB Connections:
  LocalAddr RemoteAddr Tos SL  LocalDevRemoteDev
  1.1.1.14 1.1.1.14   58 255  fe80::2:c903:a:7a31 fe80::2:c903:a:7a31
  send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
  rdma_mr_size=257, cache_allocs=12
"
This means that there are about 12 KiB of frags in this rds connection
frag cache.
Since rds.h in rds-tools is not related to the kernel rds.h, the change
in kernel rds.h does not affect rds-tools.
rds-info in rds-tools 2.0.5 and 2.0.6 was tested with this commit and
works well.
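
A layout-only illustration of what appending a field does, as a userspace
sketch. The two structs below are reduced, hypothetical stand-ins (three
existing fields plus the new one), not the real struct
rds_info_rdma_connection. The sketch shows only that the offsets of the
earlier fields are unchanged while the record size grows; whether a given
tool tolerates the larger record depends on how it walks the returned
data, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "before" layout: last three fields of the record only. */
struct conn_info_old {
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t tos;
};

/* Hypothetical "after" layout: same fields with one appended at the end. */
struct conn_info_new {
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t tos;
	uint32_t cache_allocs;
};

/* Returns 1 when every pre-existing field keeps its offset. */
static int prefix_layout_unchanged(void)
{
	return offsetof(struct conn_info_old, rdma_mr_max) ==
	       offsetof(struct conn_info_new, rdma_mr_max) &&
	       offsetof(struct conn_info_old, rdma_mr_size) ==
	       offsetof(struct conn_info_new, rdma_mr_size) &&
	       offsetof(struct conn_info_old, tos) ==
	       offsetof(struct conn_info_new, tos);
}

/* A reader built against the old layout: copy only the bytes it knows
 * about (up to and including tos) out of a new-layout record.
 */
static struct conn_info_old read_as_old(const struct conn_info_new *rec)
{
	struct conn_info_old out;
	size_t known = offsetof(struct conn_info_old, tos) + sizeof(out.tos);

	memset(&out, 0, sizeof(out));
	memcpy(&out, rec, known);
	return out;
}
```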

Signed-off-by: Zhu Yanjun 
---
V1->V2: RDS CI is removed. 
---
 include/uapi/linux/rds.h | 2 ++
 net/rds/ib.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 5d0f76c..fd6b5f6 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
__u32   rdma_mr_max;
__u32   rdma_mr_size;
__u8tos;
+   __u32   cache_allocs;
 };
 
 struct rds6_info_rdma_connection {
@@ -264,6 +265,7 @@ struct rds6_info_rdma_connection {
__u32   rdma_mr_max;
__u32   rdma_mr_size;
__u8tos;
+   __u32   cache_allocs;
 };
 
 /* RDS message Receive Path Latency points */
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 2da9b75..f9baf2d 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -318,6 +318,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection 
*conn,
iinfo->max_recv_wr = ic->i_recv_ring.w_nr;
iinfo->max_send_sge = rds_ibdev->max_sge;
rds_ib_get_mr_info(rds_ibdev, iinfo);
+   iinfo->cache_allocs = atomic_read(&ic->i_cache_allocs);
}
return 1;
 }
@@ -351,6 +352,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection 
*conn,
iinfo6->max_recv_wr = ic->i_recv_ring.w_nr;
iinfo6->max_send_sge = rds_ibdev->max_sge;
rds6_ib_get_mr_info(rds_ibdev, iinfo6);
+   iinfo6->cache_allocs = atomic_read(&ic->i_cache_allocs);
}
return 1;
 }
-- 
2.7.4



Re: [PATCH] devlink: fix libc and kernel headers collision

2019-06-02 Thread Jiri Pirko
Thu, May 30, 2019 at 05:32:27PM CEST, bar...@tkos.co.il wrote:
>Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
>use the sys/sysinfo.h header for the sysinfo(2) system call. But since
>iproute2 carries a local version of the kernel struct sysinfo, this
>causes a collision with libcs that do not rely on the kernel-defined
>sysinfo, such as musl libc:
>
>In file included from devlink.c:25:0:
>.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct 
>sysinfo'
> struct sysinfo {
>^~~
>In file included from ../include/uapi/linux/kernel.h:5:0,
> from ../include/uapi/linux/netlink.h:5,
> from ../include/uapi/linux/genetlink.h:6,
> from devlink.c:21:
>../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
> struct sysinfo {
>   ^~~
>
>Rely on the kernel header alone to avoid colliding definitions between
>the kernel and userspace headers.
>
>Cc: Aya Levin 
>Cc: Moshe Shemesh 
>Signed-off-by: Baruch Siach 

Acked-by: Jiri Pirko 


Re: general protection fault in tcp_v6_connect

2019-06-02 Thread Dmitry Vyukov
On Mon, Jun 3, 2019 at 5:29 AM David Ahern  wrote:
>
> On 6/1/19 12:05 AM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:f4aa8012 cxgb4: Make t4_get_tp_e2c_map static
> > git tree:   net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1662cb12a0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=d137eb988ffd93c3
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=5ee26b4e30c45930bd3c
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+5ee26b4e30c45930b...@syzkaller.appspotmail.com
> >
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault:  [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 17324 Comm: syz-executor.5 Not tainted 5.2.0-rc1+ #2
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> > RIP: 0010:rt6_get_cookie include/net/ip6_fib.h:264 [inline]
> > RIP: 0010:ip6_dst_store include/net/ip6_route.h:213 [inline]
> > RIP: 0010:tcp_v6_connect+0xfd0/0x20a0 net/ipv6/tcp_ipv6.c:298
> > Code: 89 e6 e8 83 a2 48 fb 45 84 e4 0f 84 90 09 00 00 e8 35 a1 48 fb 49
> > 8d 7e 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02
> > 00 0f 85 57 0e 00 00 4d 8b 66 70 e8 4d 88 35 fb 31 ff 89
> > RSP: 0018:888066547800 EFLAGS: 00010207
> > RAX: dc00 RBX: 888064e839f0 RCX: c90010e49000
> > RDX: 002b RSI: 8628033b RDI: 015f
> > RBP: 888066547980 R08: 8880a9412080 R09: ed1015d26be0
>
> This one is not so obvious.
>
> The error has to be a bad dst from ip6_dst_lookup_flow called by
> tcp_v6_connect, which is then attempted to be stored in the socket via
> ip6_dst_store. ip6_dst_store calls rt6_get_cookie with dst as the
> argument. RDI (first arg for x86) shows 0x15f, which is not a valid
> pointer and would cause a fault.
>
> None of the ip6_dst_* functions in net/ipv6/ip6_output.c have changed
> recently (5.2-next definitely, but I believe this is true for many
> releases prior). Further, all of the FIB lookup functions (called by
> ip6_dst_lookup_flow) always return a non-NULL dst.
>
> If my hunch about the other splat is correct (pcpu corruption) that
> could explain this one: FIB lookup is fine and finds an entry, the entry
> has a pcpu cache entry so it is returned. If the pcpu entry was stomped
> on then it would be invalid and the above would result.


This happened only once so far, so it may be the result of a previous
silent memory corruption.

This may also be related to "KASAN: user-memory-access Read in
ip6_hold_safe (3)":
https://syzkaller.appspot.com/bug?extid=a5b6e01ec8116d046842
because that one seems to be a race in the involved code.
So this one may be a rare incarnation of the other crash.
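
As a side note on the register values quoted above, the arithmetic can be
checked with a short sketch. It assumes the generic x86_64 KASAN mapping
(shadow = (addr >> 3) + 0xdffffc0000000000) and my reading of the Code:
bytes, namely lea 0x70(%r14),%rdi followed by shr $3,%rdx and a byte
compare at (%rdx,%rax). Both are assumptions rather than something the
report states. The function names are hypothetical, and the register dump
in this archive is truncated, so only RDI=0x15f and the low bits of RDX
are taken from it.

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 generic KASAN parameters (assumed for this sketch). */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/* Index into shadow memory for an address: one shadow byte per 8 bytes. */
static uint64_t kasan_shadow_index(uint64_t addr)
{
	return addr >> KASAN_SHADOW_SCALE_SHIFT;
}

/* Address of the shadow byte that the inline check reads. */
static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
	return kasan_shadow_index(addr) + KASAN_SHADOW_OFFSET;
}

/* Recover the base pointer from the address of a field at offset off. */
static uint64_t base_from_field(uint64_t field_addr, uint64_t off)
{
	return field_addr - off;
}

/* With 48-bit virtual addresses, bits 63..47 must be all 0 or all 1. */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

With RDI = 0x15f the shadow index is 0x2b, which matches the RDX value in
the dump; the shadow address 0xdffffc000000002b is non-canonical, which is
consistent with a general protection fault rather than a page fault; and
0x15f - 0x70 = 0xef would be the bogus dst pointer if the 0x70 offset
reading is right.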