Re: [PATCH bpf 2/2] bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
> On Jun 1, 2019, at 6:09 PM, Martin Lau wrote:
>
> On Sat, Jun 01, 2019 at 04:54:46PM -0700, Song Liu wrote:
>>
>>
>>> On May 31, 2019, at 3:29 PM, Martin KaFai Lau wrote:
>>>
>>> When the commit a6024562ffd7 ("udp: Add GRO functions to UDP socket")
>>> added udp[46]_lib_lookup_skb to the udp_gro code path, it broke
>>> the reuseport_select_sock() assumption that skb->data is pointing
>>> to the transport header.
>>>
>>> This patch follows an earlier __udp6_lib_err() fix by
>>> passing a NULL skb to avoid calling the reuseport's bpf_prog.
>>>
>>> Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
>>> Cc: Tom Herbert
>>> Signed-off-by: Martin KaFai Lau
>>> ---
>>> net/ipv4/udp.c | 6 +++++-
>>> net/ipv6/udp.c | 2 +-
>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>>> index 8fb250ed53d4..85db0e3d7f3f 100644
>>> --- a/net/ipv4/udp.c
>>> +++ b/net/ipv4/udp.c
>>> @@ -503,7 +503,11 @@ static inline struct sock
>>> *__udp4_lib_lookup_skb(struct sk_buff *skb,
> Note that this patch is changing the below "udp4_lib_lookup_skb()"
> instead of the above "__udp4_lib_lookup_skb()".
>
>>> struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
>>>                                  __be16 sport, __be16 dport)
>>> {
>>> -       return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table);
>>> +       const struct iphdr *iph = ip_hdr(skb);
>>> +
>>> +       return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport,
>>> +                                iph->daddr, dport, inet_iif(skb),
>>> +                                inet_sdif(skb), &udp_table, NULL);
>>
>> I think we can now remove the last argument of __udp4_lib_lookup()?
> The last arg of __udp4_lib_lookup() is skb.
> __udp4_lib_lookup_skb(), which is not changed in this patch, is still
> calling __udp4_lib_lookup() with a skb and the skb is used by the
> reuseport's bpf_prog. Hence, it cannot be removed.

I see. I somehow missed this path. Thanks for the explanation.

Acked-by: Song Liu

>
>>
>>
>>> }
>>> EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb);
>>>
>>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>>> index 133e6370f89c..4e52c37bb836 100644
>>> --- a/net/ipv6/udp.c
>>> +++ b/net/ipv6/udp.c
>>> @@ -243,7 +243,7 @@ struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
>>>
>>>         return __udp6_lib_lookup(dev_net(skb->dev), &iph->saddr, sport,
>>>                                  &iph->daddr, dport, inet6_iif(skb),
>>> -                                inet6_sdif(skb), &udp_table, skb);
>>> +                                inet6_sdif(skb), &udp_table, NULL);
>>> }
>>> EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb);
>>>
>>> --
>>> 2.17.1
>>>
>>
Re: [PATCH RFC iproute2-next v2] tc: add support for act ctinfo
Please ignore this patch. Have just realised I’ve sent completely the wrong thing. Somehow managed to send the kernel space patch again which is already accepted. I will send a v3 of the user space patch shortly. Apologies. Kevin > On 31 May 2019, at 09:10, ldir@icloud.com wrote: > > From: Kevin Darbyshire-Bryant > > ctinfo is an action restoring data stored in conntrack marks to various > fields. At present it has two independent modes of operation, > restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack > marks into packet skb marks. > > It understands a number of parameters specific to this action in > additional to the usual action syntax. Each operating mode is > independent of the other so all options are err, optional, however not > specifying at least one mode is a bit pointless. > > Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] > [CONTROL] [index ]\n" > > DSCP mode > > dscp enables copying of a DSCP store in the conntrack mark into the > ipv4/v6 diffserv field. The mask is a 32bit field and specifies where > in the conntrack mark the DSCP value is stored. It must be 6 contiguous > bits long, e.g. 0xfc00 would restore the DSCP from the upper 6 bits > of the conntrack mark. > > The DSCP copying may be optionally controlled by a statemask. The > statemask is a 32bit field, usually with a single bit set and must not > overlap the dscp mask. The DSCP restore operation will only take place > if the corresponding bit/s in conntrack mark yield a non zero result. > > eg. dscp 0xfc00/0x0100 would retrieve the DSCP from the top 6 > bits, whilst using bit 25 as a flag to do so. Bit 26 is unused in this > example. > > CPMARK mode > > cpmark enables copying of the conntrack mark to the packet skb mark. In > this mode it is completely equivalent to the existing act_connmark. 
> Additional functionality is provided by the optional mask parameter, > whereby the stored conntrack mark is logically anded with the cpmark > mask before being stored into skb mark. This allows shared usage of the > conntrack mark between applications. > > eg. cpmark 0x00ff would restore only the lower 24 bits of the > conntrack mark, thus may be useful in the event that the upper 8 bits > are used by the DSCP function. > > Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] > [CONTROL] [index ] > where : > dscp MASK is the bitmask to restore DSCP >STATEMASK is the bitmask to determine conditional restoring > cpmark MASK mask applied to restored packet mark > ZONE is the conntrack zone > CONTROL := reclassify | pipe | drop | continue | ok | > goto chain > > Signed-off-by: Kevin Darbyshire-Bryant > --- > v2 - fix whitespace issue in pkt_cls > fix most warnings from checkpatch - some lines still over 80 chars > due to long TLV names. > include/uapi/linux/pkt_cls.h | 1 + > include/uapi/linux/tc_act/tc_ctinfo.h | 34 > tc/Makefile | 1 + > tc/m_ctinfo.c | 251 ++ > 4 files changed, 287 insertions(+) > create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h > create mode 100644 tc/m_ctinfo.c > > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h > index 51a0496f..a93680fc 100644 > --- a/include/uapi/linux/pkt_cls.h > +++ b/include/uapi/linux/pkt_cls.h > @@ -105,6 +105,7 @@ enum tca_id { > TCA_ID_IFE = TCA_ACT_IFE, > TCA_ID_SAMPLE = TCA_ACT_SAMPLE, > /* other actions go here */ > + TCA_ID_CTINFO, > __TCA_ID_MAX = 255 > }; > > diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h > b/include/uapi/linux/tc_act/tc_ctinfo.h > new file mode 100644 > index ..da803e05 > --- /dev/null > +++ b/include/uapi/linux/tc_act/tc_ctinfo.h > @@ -0,0 +1,34 @@ > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ > +#ifndef __UAPI_TC_CTINFO_H > +#define __UAPI_TC_CTINFO_H > + > +#include > +#include > + > +struct tc_ctinfo { > + tc_gen; > +}; 
> + > +enum { > + TCA_CTINFO_UNSPEC, > + TCA_CTINFO_PAD, > + TCA_CTINFO_TM, > + TCA_CTINFO_ACT, > + TCA_CTINFO_ZONE, > + TCA_CTINFO_PARMS_DSCP_MASK, > + TCA_CTINFO_PARMS_DSCP_STATEMASK, > + TCA_CTINFO_PARMS_CPMARK_MASK, > + TCA_CTINFO_STATS_DSCP_SET, > + TCA_CTINFO_STATS_DSCP_ERROR, > + TCA_CTINFO_STATS_CPMARK_SET, > + __TCA_CTINFO_MAX > +}; > + > +#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1) > + > +enum { > + CTINFO_MODE_DSCP= BIT(0), > + CTINFO_MODE_CPMARK = BIT(1) > +}; > + > +#endif > diff --git a/tc/Makefile b/tc/Makefile > index 1a305cf4..60abddee 100644 > --- a/tc/Makefile > +++ b/tc/Makefile > @@ -48,6 +48,7 @@ TCMODULES += m_csum.o > TCMODULES += m_simple.o > TCMODULES += m_vlan.o > TCMODULES += m_connmark.o > +TCMODULES += m_ctinfo.o > TCMODULES += m_bpf.o > TCMODULES += m_tunnel_key.o
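The mask rules in the commit message above can be sketched in plain userspace C. This is only an illustration of the arithmetic, not the act_ctinfo implementation: the function names are hypothetical, a statemask of 0 is assumed to mean "no condition", and the example values in the note below (0xfc000000 for the DSCP mask, 0x01000000 for the state flag, 0x00ffffff for cpmark) are assumptions chosen to fit the "top 6 bits", "bit 25" and "lower 24 bits" wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Position of the lowest set bit; the mask must be non-zero. */
unsigned int mask_shift(uint32_t mask)
{
    unsigned int shift = 0;

    assert(mask != 0);
    while ((mask & 1u) == 0) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/* True if the mask consists of exactly six adjacent set bits. */
bool dscp_mask_valid(uint32_t mask)
{
    if (mask == 0)
        return false;
    return (mask >> mask_shift(mask)) == 0x3fu;
}

/*
 * Extract the DSCP from a conntrack mark.
 * Returns false (and leaves *dscp alone) when the masks are unusable or
 * when a statemask is given and none of its bits are set in the mark.
 * Assumption: statemask == 0 means the restore is unconditional.
 */
bool dscp_from_ctmark(uint32_t ctmark, uint32_t mask, uint32_t statemask,
                      uint8_t *dscp)
{
    if (!dscp_mask_valid(mask) || (mask & statemask))
        return false;
    if (statemask && !(ctmark & statemask))
        return false;
    *dscp = (uint8_t)((ctmark & mask) >> mask_shift(mask));
    return true;
}

/* CPMARK mode: the conntrack mark ANDed with the cpmark mask. */
uint32_t cpmark_restore(uint32_t ctmark, uint32_t cpmark_mask)
{
    return ctmark & cpmark_mask;
}
```

With a conntrack mark of 0xb9000000, the assumed masks give DSCP 46 (the flag bit is set), while cpmark_restore() with 0x00ffffff keeps only the low 24 bits, so the two functions can share one mark.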
Re: [PATCH v3 bpf-next 2/2] libbpf: remove qidconf and better support external bpf programs.
> On Jun 1, 2019, at 9:18 PM, Jonathan Lemon wrote: > > > > On 1 Jun 2019, at 16:05, Song Liu wrote: > >>> On May 31, 2019, at 11:57 AM, Jonathan Lemon >>> wrote: >>> >>> Use the recent change to XSKMAP bpf_map_lookup_elem() to test if >>> there is a xsk present in the map instead of duplicating the work >>> with qidconf. >>> >>> Fix things so callers using XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD >>> bypass any internal bpf maps, so xsk_socket__{create|delete} works >>> properly. >>> >>> Signed-off-by: Jonathan Lemon >>> --- >>> tools/lib/bpf/xsk.c | 79 + >>> 1 file changed, 16 insertions(+), 63 deletions(-) >>> >>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c >>> index 38667b62f1fe..7ce7494b5b50 100644 >>> --- a/tools/lib/bpf/xsk.c >>> +++ b/tools/lib/bpf/xsk.c >>> @@ -60,10 +60,8 @@ struct xsk_socket { >>> struct xsk_umem *umem; >>> struct xsk_socket_config config; >>> int fd; >>> - int xsks_map; >>> int ifindex; >>> int prog_fd; >>> - int qidconf_map_fd; >>> int xsks_map_fd; >>> __u32 queue_id; >>> char ifname[IFNAMSIZ]; >>> @@ -265,15 +263,11 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) >>> /* This is the C-program: >>> * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx) >>> * { >>> -* int *qidconf, index = ctx->rx_queue_index; >>> +* int index = ctx->rx_queue_index; >>> * >>> * // A set entry here means that the correspnding queue_id >>> * // has an active AF_XDP socket bound to it. 
>>> -* qidconf = bpf_map_lookup_elem(&qidconf_map, &index); >>> -* if (!qidconf) >>> -* return XDP_ABORTED; >>> -* >>> -* if (*qidconf) >>> +* if (bpf_map_lookup_elem(&xsks_map, &index)) >>> * return bpf_redirect_map(&xsks_map, index, 0); >>> * >>> * return XDP_PASS; >>> @@ -286,15 +280,10 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) >>> BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4), >>> BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), >>> BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), >>> - BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd), >>> + BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), >>> BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), >>> BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), >>> - BPF_MOV32_IMM(BPF_REG_0, 0), >>> - /* if r1 == 0 goto +8 */ >>> - BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8), >>> BPF_MOV32_IMM(BPF_REG_0, 2), >>> - /* r1 = *(u32 *)(r1 + 0) */ >>> - BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0), >>> /* if r1 == 0 goto +5 */ >>> BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5), >>> /* r2 = *(u32 *)(r10 - 4) */ >>> @@ -366,18 +355,11 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk) >>> if (max_queues < 0) >>> return max_queues; >>> >>> - fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map", >>> + fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map", >>> sizeof(int), sizeof(int), max_queues, 0); >>> if (fd < 0) >>> return fd; >>> - xsk->qidconf_map_fd = fd; >>> >>> - fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map", >>> -sizeof(int), sizeof(int), max_queues, 0); >>> - if (fd < 0) { >>> - close(xsk->qidconf_map_fd); >>> - return fd; >>> - } >>> xsk->xsks_map_fd = fd; >>> >>> return 0; >>> @@ -385,10 +367,8 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk) >>> >>> static void xsk_delete_bpf_maps(struct xsk_socket *xsk) >>> { >>> - close(xsk->qidconf_map_fd); >>> + bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id); >>> close(xsk->xsks_map_fd); >>> - xsk->qidconf_map_fd = -1; >>> - xsk->xsks_map_fd = -1; >>> } >>> >>> static int 
xsk_lookup_bpf_maps(struct xsk_socket *xsk) >>> @@ -417,10 +397,9 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) >>> if (err) >>> goto out_map_ids; >>> >>> - for (i = 0; i < prog_info.nr_map_ids; i++) { >>> - if (xsk->qidconf_map_fd != -1 && xsk->xsks_map_fd != -1) >>> - break; >>> + xsk->xsks_map_fd = -1; >>> >>> + for (i = 0; i < prog_info.nr_map_ids; i++) { >>> fd = bpf_map_get_fd_by_id(map_ids[i]); >>> if (fd < 0) >>> continue; >>> @@ -431,11 +410,6 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) >>> continue; >>> } >>> >>> - if (!strcmp(map_info.name, "qidconf_map")) { >>> - xsk->qidconf_map_fd = fd; >>> - continue; >>> - } >>> - >>> if (!strcmp(map_info.name, "xsks_map")) { >>> xsk->xsks_map_fd = fd; >>> continue; >>> @@ -445,40 +419,18 @@ static int xsk_lookup_bpf_maps
Re: [PATCH net-next 01/13] net: axienet: Fixed 64-bit compile, enable build on X86 and ARM
Hi Robert, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Robert-Hancock/Xilinx-axienet-driver-updates/20190602-124146 reproduce: # apt-get install sparse # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty make ARCH=x86_64 allmodconfig make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' If you fix the issue, kindly add following tag Reported-by: kbuild test robot sparse warnings: (new ones prefixed by >>) >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: >> cast to restricted __be32 >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: >> cast to restricted __be32 >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: >> cast to restricted __be32 >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: >> cast to restricted __be32 >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: >> cast to restricted __be32 >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:37: sparse: sparse: >> cast to restricted __be32 >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:35: sparse: sparse: >> incorrect type in assignment (different base types) @@expected >> restricted __wsum [usertype] csum @@got [usertype] csum @@ >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:35: sparse: >> expected restricted __wsum [usertype] csum >> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:778:35: sparse:got >> unsigned int vim +778 drivers/net/ethernet/xilinx/xilinx_axienet_main.c 8a3b7a25 Daniel Borkmann 2012-01-19 728 8a3b7a25 Daniel Borkmann 2012-01-19 729 /** 8a3b7a25 Daniel Borkmann 2012-01-19 730 * axienet_recv - Is called from Axi DMA Rx Isr to complete the received 8a3b7a25 Daniel Borkmann 2012-01-19 731 *BD processing. 8a3b7a25 Daniel Borkmann 2012-01-19 732 * @ndev: Pointer to net_device structure. 
8a3b7a25 Daniel Borkmann 2012-01-19 733 * 8a3b7a25 Daniel Borkmann 2012-01-19 734 * This function is invoked from the Axi DMA Rx isr to process the Rx BDs. It 8a3b7a25 Daniel Borkmann 2012-01-19 735 * does minimal processing and invokes "netif_rx" to complete further 8a3b7a25 Daniel Borkmann 2012-01-19 736 * processing. 8a3b7a25 Daniel Borkmann 2012-01-19 737 */ 8a3b7a25 Daniel Borkmann 2012-01-19 738 static void axienet_recv(struct net_device *ndev) 8a3b7a25 Daniel Borkmann 2012-01-19 739 { 8a3b7a25 Daniel Borkmann 2012-01-19 740 u32 length; 8a3b7a25 Daniel Borkmann 2012-01-19 741 u32 csumstatus; 8a3b7a25 Daniel Borkmann 2012-01-19 742 u32 size = 0; 8a3b7a25 Daniel Borkmann 2012-01-19 743 u32 packets = 0; 38e96b35 Peter Crosthwaite 2015-05-05 744 dma_addr_t tail_p = 0; 8a3b7a25 Daniel Borkmann 2012-01-19 745 struct axienet_local *lp = netdev_priv(ndev); 8a3b7a25 Daniel Borkmann 2012-01-19 746 struct sk_buff *skb, *new_skb; 8a3b7a25 Daniel Borkmann 2012-01-19 747 struct axidma_bd *cur_p; 8a3b7a25 Daniel Borkmann 2012-01-19 748 8a3b7a25 Daniel Borkmann 2012-01-19 749 cur_p = &lp->rx_bd_v[lp->rx_bd_ci]; 8a3b7a25 Daniel Borkmann 2012-01-19 750 8a3b7a25 Daniel Borkmann 2012-01-19 751 while ((cur_p->status & XAXIDMA_BD_STS_COMPLETE_MASK)) { 38e96b35 Peter Crosthwaite 2015-05-05 752 tail_p = lp->rx_bd_p + sizeof(*lp->rx_bd_v) * lp->rx_bd_ci; 8a3b7a25 Daniel Borkmann 2012-01-19 753 8a3b7a25 Daniel Borkmann 2012-01-19 754 dma_unmap_single(ndev->dev.parent, cur_p->phys, 8a3b7a25 Daniel Borkmann 2012-01-19 755 lp->max_frm_size, 8a3b7a25 Daniel Borkmann 2012-01-19 756 DMA_FROM_DEVICE); 8a3b7a25 Daniel Borkmann 2012-01-19 757 2f148c6d Robert Hancock2019-05-31 758 skb = cur_p->skb; 2f148c6d Robert Hancock2019-05-31 759 cur_p->skb = NULL; 2f148c6d Robert Hancock2019-05-31 760 length = cur_p->app4 & 0x; 2f148c6d Robert Hancock2019-05-31 761 8a3b7a25 Daniel Borkmann 2012-01-19 762 skb_put(skb, length); 8a3b7a25 Daniel Borkmann 2012-01-19 763 skb->protocol = eth_type_trans(skb, 
ndev); 8a3b7a25 Daniel Borkmann 2012-01-19 764 /*skb_checksum_none_assert(skb);*/ 8a3b7a25 Daniel Borkmann 2012-01-19 765 skb->ip_summed = CHECKSUM_NONE; 8a3b7a25 Daniel Borkmann 2012-01-19 766 8a3b7a25 Daniel Borkmann 2012-01-19 767 /* if we're doing Rx csum offload, set it up */ 8a3b7a25 Daniel Borkmann 2012-01-19 7
[PATCH net-next] r8169: use paged versions of phylib MDIO access functions
Use paged versions of phylib MDIO access functions to simplify the code. Signed-off-by: Heiner Kallweit --- drivers/net/ethernet/realtek/r8169.c | 105 +-- 1 file changed, 33 insertions(+), 72 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 2705eb510..53a4e3a73 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1969,9 +1969,7 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp) ret = phy_read_mmd(phydev, MDIO_MMD_PCS, MDIO_PCS_EEE_ABLE); break; case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51: - phy_write(phydev, 0x1f, 0x0a5c); - ret = phy_read(phydev, 0x12); - phy_write(phydev, 0x1f, 0x); + ret = phy_read_paged(phydev, 0x0a5c, 0x12); break; default: ret = -EPROTONOSUPPORT; @@ -1994,9 +1992,7 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp) ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_LPABLE); break; case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51: - phy_write(phydev, 0x1f, 0x0a5d); - ret = phy_read(phydev, 0x11); - phy_write(phydev, 0x1f, 0x); + ret = phy_read_paged(phydev, 0x0a5d, 0x11); break; default: ret = -EPROTONOSUPPORT; @@ -2019,9 +2015,7 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp) ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV); break; case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51: - phy_write(phydev, 0x1f, 0x0a5d); - ret = phy_read(phydev, 0x10); - phy_write(phydev, 0x1f, 0x); + ret = phy_read_paged(phydev, 0x0a5d, 0x10); break; default: ret = -EPROTONOSUPPORT; @@ -2044,9 +2038,7 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val) ret = phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, val); break; case RTL_GIGA_MAC_VER_40 ... 
RTL_GIGA_MAC_VER_51: - phy_write(phydev, 0x1f, 0x0a5d); - phy_write(phydev, 0x10, val); - phy_write(phydev, 0x1f, 0x); + phy_write_paged(phydev, 0x0a5d, 0x10, val); break; default: ret = -EPROTONOSUPPORT; @@ -2582,9 +2574,7 @@ static void rtl8168f_config_eee_phy(struct rtl8169_private *tp) static void rtl8168g_config_eee_phy(struct rtl8169_private *tp) { - phy_write(tp->phydev, 0x1f, 0x0a43); - phy_set_bits(tp->phydev, 0x11, BIT(4)); - phy_write(tp->phydev, 0x1f, 0x); + phy_modify_paged(tp->phydev, 0x0a43, 0x11, 0, BIT(4)); } static void rtl8169s_hw_phy_config(struct rtl8169_private *tp) @@ -3483,20 +3473,15 @@ static void rtl8411_hw_phy_config(struct rtl8169_private *tp) static void rtl8168g_disable_aldps(struct rtl8169_private *tp) { - phy_write(tp->phydev, 0x1f, 0x0a43); - phy_clear_bits(tp->phydev, 0x10, BIT(2)); + phy_modify_paged(tp->phydev, 0x0a43, 0x10, BIT(2), 0); } static void rtl8168g_phy_adjust_10m_aldps(struct rtl8169_private *tp) { struct phy_device *phydev = tp->phydev; - phy_write(phydev, 0x1f, 0x0bcc); - phy_clear_bits(phydev, 0x14, BIT(8)); - - phy_write(phydev, 0x1f, 0x0a44); - phy_set_bits(phydev, 0x11, BIT(7) | BIT(6)); - + phy_modify_paged(phydev, 0x0bcc, 0x14, BIT(8), 0); + phy_modify_paged(phydev, 0x0a44, 0x11, 0, BIT(7) | BIT(6)); phy_write(phydev, 0x1f, 0x0a43); phy_write(phydev, 0x13, 0x8084); phy_clear_bits(phydev, 0x14, BIT(14) | BIT(13)); @@ -3507,43 +3492,36 @@ static void rtl8168g_phy_adjust_10m_aldps(struct rtl8169_private *tp) static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp) { + int ret; + rtl_apply_firmware(tp); - rtl_writephy(tp, 0x1f, 0x0a46); - if (rtl_readphy(tp, 0x10) & 0x0100) { - rtl_writephy(tp, 0x1f, 0x0bcc); - rtl_w0w1_phy(tp, 0x12, 0x, 0x8000); - } else { - rtl_writephy(tp, 0x1f, 0x0bcc); - rtl_w0w1_phy(tp, 0x12, 0x8000, 0x); - } + ret = phy_read_paged(tp->phydev, 0x0a46, 0x10); + if (ret & BIT(8)) + phy_modify_paged(tp->phydev, 0x0bcc, 0x12, BIT(15), 0); + else + phy_modify_paged(tp->phydev, 0x0bcc, 
0x12, 0, BIT(15)); - rtl_writephy(tp, 0x1f, 0x0a46); - if (rtl_readphy(tp, 0x13) & 0x0100) { - rtl_writephy(tp, 0x1f, 0x0c41); - rtl_w0w1_phy(tp, 0x15, 0x0002, 0x); - } else { - rtl_writephy(tp, 0x1f, 0x0c41); - rtl_w0w1_phy(tp, 0x15, 0x, 0x0002); - } + ret = phy_read_paged(tp->phydev, 0x
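What the paged helpers save the driver from can be modelled in userspace C. The sketch below is not phylib code: struct fake_phy and the fake_* functions are hypothetical stand-ins. It assumes that register 0x1f selects the page (as in the removed lines of the diff), that the paged helpers restore the previously selected page afterwards, and that the modify helper computes (old & ~mask) | set.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SELECT_REG 0x1f    /* page-select register used in the diff */
#define NUM_PAGES       0x1000  /* model size only, not a hardware limit */
#define NUM_REGS        32

/* Hypothetical in-memory PHY: one register bank per page. */
struct fake_phy {
    uint16_t page;                      /* currently selected page */
    uint16_t regs[NUM_PAGES][NUM_REGS];
};

uint16_t fake_phy_read(struct fake_phy *phy, unsigned int reg)
{
    assert(reg < NUM_REGS);
    if (reg == PAGE_SELECT_REG)
        return phy->page;
    return phy->regs[phy->page][reg];
}

void fake_phy_write(struct fake_phy *phy, unsigned int reg, uint16_t val)
{
    assert(reg < NUM_REGS);
    if (reg == PAGE_SELECT_REG) {
        assert(val < NUM_PAGES);
        phy->page = val;
        return;
    }
    phy->regs[phy->page][reg] = val;
}

/* Select the page, read the register, then restore the old page. */
uint16_t fake_read_paged(struct fake_phy *phy, uint16_t page, unsigned int reg)
{
    uint16_t old_page = fake_phy_read(phy, PAGE_SELECT_REG);
    uint16_t val;

    fake_phy_write(phy, PAGE_SELECT_REG, page);
    val = fake_phy_read(phy, reg);
    fake_phy_write(phy, PAGE_SELECT_REG, old_page);
    return val;
}

/* Select the page, write the register, then restore the old page. */
void fake_write_paged(struct fake_phy *phy, uint16_t page, unsigned int reg,
                      uint16_t val)
{
    uint16_t old_page = fake_phy_read(phy, PAGE_SELECT_REG);

    fake_phy_write(phy, PAGE_SELECT_REG, page);
    fake_phy_write(phy, reg, val);
    fake_phy_write(phy, PAGE_SELECT_REG, old_page);
}

/* Clear the bits in mask, set the bits in set, all on the given page. */
void fake_modify_paged(struct fake_phy *phy, uint16_t page, unsigned int reg,
                       uint16_t mask, uint16_t set)
{
    uint16_t old_page = fake_phy_read(phy, PAGE_SELECT_REG);
    uint16_t val;

    fake_phy_write(phy, PAGE_SELECT_REG, page);
    val = fake_phy_read(phy, reg);
    fake_phy_write(phy, reg, (uint16_t)((val & ~mask) | set));
    fake_phy_write(phy, PAGE_SELECT_REG, old_page);
}
```

In this model, fake_modify_paged(&phy, 0x0a43, 0x11, 0, 1u << 4) plays the role of the three removed lines in rtl8168g_config_eee_phy(): select page 0x0a43, set bit 4 of register 0x11, switch the page back. Callers no longer need to remember the final page write, which is the step the old rtl8168g_disable_aldps() omitted.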
Re: [PATCH 3/8] dt-bindings: net: bluetooth: Add rtl8723bs-bluetooth
On Tuesday, 19 February 2019 15:14:01 CEST Rob Herring wrote:
> > > How is this used?
> >
> > rtl8723bs-bt needs 2 firmware binaries -- one is actual firmware,
> > another is firmware config which is specific to the board. If
> > firmware-postfix is specified, driver appends it to the name of config
> > and requests board-specific config while loading firmware. I.e. if
> > 'pine64' is specified as firmware-postfix driver will load
> > rtl8723bs_config-pine64.bin.
>
> We already have 'firmware-name' defined and I'd prefer not to have
> another way to do things. The difference is just you have to give the
> full filename.
>

Hi Rob,

I'm working on a v2 for this patchset and I've looked at how using
"firmware-name" with the full filename would be possible. As David Summers
has already written [1], the existing code [2] takes this "postfix" as a
parameter and basically fills it into a filename template
("${CFG_NAME}-${POSTFIX}.bin").

So either we stay with the "firmware-postfix" property, or the existing
code would have to be modified to accommodate the full filename. If using
firmware-postfix is unacceptable, I can rework the existing code.

Luca

[1] https://lore.kernel.org/netdev/d06e3c30-a34a-bd84-9cdf-535f25384...@davidjohnsummers.uk/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bluetooth/btrtl.c#n566
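The filename template mentioned in the reply can be illustrated with a small userspace helper. This is a sketch, not the btrtl.c code: build_config_name() is a hypothetical name, and falling back to "<cfg_name>.bin" when no postfix is given is an assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill buf with "<cfg_name>-<postfix>.bin", or "<cfg_name>.bin" when the
 * postfix is NULL or empty (assumed fallback).
 * Returns 0 on success, -1 if the name does not fit into buf.
 */
int build_config_name(char *buf, size_t len, const char *cfg_name,
                      const char *postfix)
{
    int n;

    if (postfix && strlen(postfix) > 0)
        n = snprintf(buf, len, "%s-%s.bin", cfg_name, postfix);
    else
        n = snprintf(buf, len, "%s.bin", cfg_name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "rtl8723bs_config" and "pine64", the helper yields the rtl8723bs_config-pine64.bin name quoted earlier in the thread. A 'firmware-name' property would bypass the template and pass the complete filename instead.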
Re: [RFC PATCH 6/6] seg6: Add support to rearrange SRH for AH ICV calculation
On Fri, 31 May 2019 10:34:03 -0700
Tom Herbert wrote:

> On Fri, May 31, 2019 at 10:07 AM Ahmed Abdelsalam wrote:
> >
> > On Fri, 31 May 2019 09:48:40 -0700
> > Tom Herbert wrote:
> >
> > > Mutable fields related to segment routing are: destination address,
> > > segments left, and modifiable TLVs (those whose high order bit is set).
> > >
> > > Add support to rearrange a segment routing (type 4) routing header to
> > > handle these mutability requirements. This is described in
> > > draft-herbert-ipv6-srh-ah-00.
> >
> > Hi Tom,
> > Assuming that the IETF process needs to be fixed, then, IMO, it should
> > not come at the cost of breaking the kernel process here.
>
> Ahmed,
>
> I do not see how this is in any way breaking the kernel process. The
> kernel is beholden to the needs of users to provide robust and secure
> implementations, not to some baroque IETF or other SDO processes. When
> those are in conflict, the needs of our users should prevail.
>
> > Let us add to the kernel things that have been reviewed and reached some
> > consensus.
>
> By that argument, segment routing should never have been added to the
> kernel since consensus has not been reached on it yet or at least
> portions of it. In fact, if you look at this patch set, most of the
> changes are actually bug fixes to bring the implementation into
> conformance with a later version of the draft. For instance, there was
> never consensus reached on the HMAC flag; now it's gone and we need to
> remove it from the implementation.
>
> > For new features that still need to be reviewed we can have them outside
> > the kernel tree for community to use.
> > This way the community does not get blocked by IETF process but also keep
> > the kernel tree stable.
>
> In any case, that does not address the issue of a user using both
> segment routing and authentication which leads to adverse behaviors.
> AFAICT, the kernel does not prevent this today. So I ask again: what
> is your alternative to address this?
>
> Thanks,
> Tom

Tom,

Yes, the needs of users should prevail. But it's not Tom or Ahmed alone
who should define users' needs.

The comparison between "draft-herbert-ipv6-srh-ah-00" and
"draft-ietf-6man-segment-routing-header" is missing some facts.

The first patch of the SRH implementation was submitted to the kernel two
years after the SRH draft was released. By that time, the draft had been
adopted by a working group and was co-authored by several vendors,
operators and academia. Please refer to the first SRH patch submitted to
the kernel (https://patchwork.ozlabs.org/patch/663176/).

I still don't see the point of rushing to upstream something that was
defined a couple of days ago.

Plus there is nothing that prevents anyone from "innovating" in their own
private kernel tree.

--
Ahmed Abdelsalam
Re: [net-next PATCH] net: rtnetlink: Enslave device before bringing it up
Hi David,

On Fri, May 31, 2019 at 02:26:15PM -0700, David Miller wrote:
> From: Phil Sutter
> Date: Wed, 29 May 2019 15:51:20 +0200
>
> > Unlike with bridges, one can't add an interface to a bond and set it up
> > at the same time:
> >
> > | # ip link set dummy0 down
> > | # ip link set dummy0 master bond0 up
> > | Error: Device can not be enslaved while up.
> >
> > Of all drivers with ndo_add_slave callback, bond and team decline if
> > IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
> > immediately up again) and the others just don't care.
> >
> > Support the common notion of setting the interface up after enslaving it
> > by sorting the operations accordingly.
> >
> > Signed-off-by: Phil Sutter
>
> What about other flags like IFF_PROMISCUITY?

Crap, that's the crux: Upon enslaving, team driver propagates IFF_PROMISC
and IFF_ALLMULTI flags from master to slave. With my change, these
propagations roll back by accident.

So please disregard this patch, I'll have to find a different solution.

Thanks, Phil
[PATCH net] selftests: set sysctl bc_forwarding properly in router_broadcast.sh
sysctl setting bc_forwarding for $rp2 is needed when ping_test_from h2,
otherwise the bc packets from $rp2 won't be forwarded. This patch is to
add this setting for $rp2.

Also, as ping_test_from does grep "$from" only, which could match some
unexpected output, some test case doesn't really work, like:

  # ping_test_from $h2 198.51.200.255 198.51.200.2
    PING 198.51.200.255 from 198.51.100.2 veth3: 56(84) bytes of data.
    64 bytes from 198.51.100.1: icmp_seq=1 ttl=64 time=0.336 ms

When doing grep $from (198.51.200.2), the output still matches, because
198.51.200.2 is a prefix of the 198.51.200.255 in the PING header line.
So change to grep "bytes from $from" instead.

Fixes: 40f98b9af943 ("selftests: add a selftest for directed broadcast forwarding")
Signed-off-by: Xin Long
---
 tools/testing/selftests/net/forwarding/router_broadcast.sh | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/router_broadcast.sh b/tools/testing/selftests/net/forwarding/router_broadcast.sh
index 9a678ec..4eac0a0 100755
--- a/tools/testing/selftests/net/forwarding/router_broadcast.sh
+++ b/tools/testing/selftests/net/forwarding/router_broadcast.sh
@@ -145,16 +145,19 @@ bc_forwarding_disable()
 {
        sysctl_set net.ipv4.conf.all.bc_forwarding 0
        sysctl_set net.ipv4.conf.$rp1.bc_forwarding 0
+       sysctl_set net.ipv4.conf.$rp2.bc_forwarding 0
 }
 
 bc_forwarding_enable()
 {
        sysctl_set net.ipv4.conf.all.bc_forwarding 1
        sysctl_set net.ipv4.conf.$rp1.bc_forwarding 1
+       sysctl_set net.ipv4.conf.$rp2.bc_forwarding 1
 }
 
 bc_forwarding_restore()
 {
+       sysctl_restore net.ipv4.conf.$rp2.bc_forwarding
        sysctl_restore net.ipv4.conf.$rp1.bc_forwarding
        sysctl_restore net.ipv4.conf.all.bc_forwarding
 }
@@ -171,7 +174,7 @@ ping_test_from()
        log_info "ping $dip, expected reply from $from"
        ip vrf exec $(master_name_get $oif) \
        $PING -I $oif $dip -c 10 -i 0.1 -w $PING_TIMEOUT -b 2>&1 \
-               | grep $from &> /dev/null
+               | grep "bytes from $from" > /dev/null
        check_err_fail $fail $?
 }
 
-- 
2.1.0
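The false match described in the commit message can be reproduced outside the test script. The C sketch below is only an illustration: the function names are hypothetical, and strstr() does a literal substring search whereas grep treats its pattern as a regular expression. The literal search is enough to show the effect.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Old check: the address may appear anywhere in the line. */
bool matches_address(const char *line, const char *from)
{
    return strstr(line, from) != NULL;
}

/* New check: the address must directly follow "bytes from ". */
bool matches_reply_from(const char *line, const char *from)
{
    char pattern[128];
    int n = snprintf(pattern, sizeof(pattern), "bytes from %s", from);

    if (n < 0 || (size_t)n >= sizeof(pattern))
        return false;   /* address too long for this sketch */
    return strstr(line, pattern) != NULL;
}
```

For the PING header line quoted above, matches_address() with 198.51.200.2 succeeds even though no reply came from that address, while matches_reply_from() only succeeds on an actual "64 bytes from ..." reply line. Like the grep in the patch, it remains a substring match.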
[PATCH net] ipv4: not do cache for local delivery if bc_forwarding is enabled
With the topo:

    h1 ---| rp1            |
          |     route  rp3 |--- h3 (192.168.200.1)
    h2 ---| rp2            |

If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after
doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on
h2, and the packets can still be forwarded.

This issue was caused by the input route cache. It should only do
the cache for either bc forwarding or local delivery. Otherwise,
local delivery can use the route cache for bc forwarding of other
interfaces.

This patch is to fix it by not doing cache for local delivery if
all.bc_forwarding is enabled.

Note that we don't fix it by checking route cache local flag after
rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
common route code shouldn't be touched for bc_forwarding.

Fixes: 5cbf777cfdf6 ("route: add support for directed broadcast forwarding")
Reported-by: Jianlin Shi
Signed-off-by: Xin Long
---
 net/ipv4/route.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 11ddc27..91bf75b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1985,7 +1985,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
        u32             itag = 0;
        struct rtable   *rth;
        struct flowi4   fl4;
-       bool do_cache;
+       bool do_cache = true;
 
        /* IP on this device is disabled. */
 
@@ -2062,6 +2062,9 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
        if (res->type == RTN_BROADCAST) {
                if (IN_DEV_BFORWARD(in_dev))
                        goto make_route;
+               /* not do cache if bc_forwarding is enabled */
+               if (IPV4_DEVCONF_ALL(net, BC_FORWARDING))
+                       do_cache = false;
                goto brd_input;
        }
 
@@ -2099,18 +2102,15 @@ out:    return err;
        RT_CACHE_STAT_INC(in_brd);
 
 local_input:
-       do_cache = false;
-       if (res->fi) {
-               if (!itag) {
-                       struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+       do_cache &= res->fi && !itag;
+       if (do_cache) {
+               struct fib_nh_common *nhc = FIB_RES_NHC(*res);
 
-                       rth = rcu_dereference(nhc->nhc_rth_input);
-                       if (rt_cache_valid(rth)) {
-                               skb_dst_set_noref(skb, &rth->dst);
-                               err = 0;
-                               goto out;
-                       }
-                       do_cache = true;
+               rth = rcu_dereference(nhc->nhc_rth_input);
+               if (rt_cache_valid(rth)) {
+                       skb_dst_set_noref(skb, &rth->dst);
+                       err = 0;
+                       goto out;
                }
        }
 
-- 
2.1.0
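The caching decision after this patch can be condensed into a small truth function. The C sketch below is a simplified userspace model, not kernel code: struct input_state and its field names are hypothetical stand-ins for the conditions named in the diff, and only the local delivery path is modelled.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the conditions tested in the diff. */
struct input_state {
    bool is_broadcast;      /* res->type == RTN_BROADCAST */
    bool dev_bc_forward;    /* IN_DEV_BFORWARD(in_dev) */
    bool all_bc_forwarding; /* IPV4_DEVCONF_ALL(net, BC_FORWARDING) */
    bool has_fib_info;      /* res->fi != NULL */
    bool has_itag;          /* itag != 0 */
};

/*
 * Whether local delivery may use the cached input route.
 * A broadcast packet on a device with bc_forwarding set is forwarded
 * (make_route) and never reaches this decision in the real code.
 */
bool local_input_do_cache(const struct input_state *s)
{
    bool do_cache = true;

    /* Broadcast delivered locally while all.bc_forwarding is on:
     * skip the cache, it may hold another interface's forwarding route.
     */
    if (s->is_broadcast && !s->dev_bc_forward && s->all_bc_forwarding)
        do_cache = false;

    return do_cache && s->has_fib_info && !s->has_itag;
}
```

In the topology above, a broadcast arriving on rp2 (bc_forwarding off) with all.bc_forwarding on gives false, so the forwarding route cached for rp1 is not reused for rp2.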
[PATCH net] ipv6: fix the check before getting the cookie in rt6_get_cookie
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.

It's caused by Commit 93531c674315 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1' (see below) holds for setting dst_cookie, whereas rt6_check() is
called when !'c1' for checking dst_cookie, as we can see in
ip6_dst_check().

Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.

c1:
  (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))

Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi
Signed-off-by: Xin Long
---
 include/net/ip6_fib.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 525f701..d6d936c 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -263,8 +263,7 @@ static inline u32 rt6_get_cookie(const struct rt6_info *rt)
        rcu_read_lock();
 
        from = rcu_dereference(rt->from);
-       if (from && (rt->rt6i_flags & RTF_PCPU ||
-           unlikely(!list_empty(&rt->rt6i_uncached))))
+       if (from)
                fib6_get_cookie_safe(from, &cookie);
 
        rcu_read_unlock();
-- 
2.1.0
[PATCH net] netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
With commit 997dd9647164 ("net: IP6 defrag: use rbtrees in
nf_conntrack_reasm.c"), nf_ct_frag6_reasm() is now called from
nf_ct_frag6_queue(). With this change, nf_ct_frag6_queue() can fail
after the skb has been added to the fragment queue and
nf_ct_frag6_gather() was adapted to handle this case.

But nf_ct_frag6_queue() can still fail before the fragment has been
queued. nf_ct_frag6_gather() can't handle this case anymore, because it
has no way to know if nf_ct_frag6_queue() queued the fragment before
failing. If it didn't, the skb is lost as the error code is overwritten
with -EINPROGRESS.

Fix this by setting -EINPROGRESS directly in nf_ct_frag6_queue(), so
that nf_ct_frag6_gather() can propagate the error as is.

Fixes: 997dd9647164 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
Signed-off-by: Guillaume Nault
---
Not sure if this should go to the net or nf tree (as the original patch
went through net). Anyway this patch applies cleanly to both.

 net/ipv6/netfilter/nf_conntrack_reasm.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 3de0e9b0a482..5b3f65e29b6f 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -293,7 +293,11 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
                skb->_skb_refdst = 0UL;
                err = nf_ct_frag6_reasm(fq, skb, prev, dev);
                skb->_skb_refdst = orefdst;
-               return err;
+
+               /* After queue has assumed skb ownership, only 0 or
+                * -EINPROGRESS must be returned.
+                */
+               return err ? -EINPROGRESS : 0;
        }
 
        skb_dst_drop(skb);
@@ -480,12 +484,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
                ret = 0;
        }
 
-       /* after queue has assumed skb ownership, only 0 or -EINPROGRESS
-        * must be returned.
-        */
-       if (ret)
-               ret = -EINPROGRESS;
-
        spin_unlock_bh(&fq->q.lock);
        inet_frag_put(&fq->q);
        return ret;
-- 
2.20.1
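The change in error-code ownership can be shown with two tiny functions. This is a userspace sketch with hypothetical names, not the netfilter code. It assumes, as the commit message describes, that -EINPROGRESS tells the caller the skb has been taken over, while any other error means the caller still owns it.

```c
#include <errno.h>
#include <stdbool.h>

/* Before the patch: the gather step rewrote every failure. */
int old_gather_result(int queue_err)
{
    return queue_err ? -EINPROGRESS : 0;
}

/* After the patch: the queue step decides, because only it knows
 * whether the skb had already been queued when the error occurred.
 */
int new_queue_result(bool skb_queued, int err)
{
    if (!skb_queued)
        return err;             /* caller still owns the skb */
    return err ? -EINPROGRESS : 0;
}
```

With an early failure such as -EINVAL, old_gather_result() reports -EINPROGRESS and the unqueued skb is never freed, whereas new_queue_result(false, -EINVAL) passes -EINVAL through so the caller can drop the skb itself.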
[PATCH] netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather
From: wenxu

CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot
Signed-off-by: wenxu
---
 net/ipv6/netfilter.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 9530cc2..96d7abf 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,8 +238,10 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	.route_input	= ip6_route_input,
 	.fragment	= ip6_fragment,
 	.reroute	= nf_ip6_reroute,
-#if IS_MODULE(CONFIG_IPV6)
+#if IS_MODULE(CONFIG_IPV6) && IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 	.br_defrag	= nf_ct_frag6_gather,
+#endif
+#if IS_MODULE(CONFIG_IPV6)
 	.br_fragment	= br_ip6_fragment,
 #endif
 };
-- 
1.8.3.1
[PATCH net-next v2] netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather
From: wenxu

CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot
Signed-off-by: wenxu
---
v2: Forgot to include "net-next"

 net/ipv6/netfilter.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 9530cc2..96d7abf 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,8 +238,10 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	.route_input	= ip6_route_input,
 	.fragment	= ip6_fragment,
 	.reroute	= nf_ip6_reroute,
-#if IS_MODULE(CONFIG_IPV6)
+#if IS_MODULE(CONFIG_IPV6) && IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 	.br_defrag	= nf_ct_frag6_gather,
+#endif
+#if IS_MODULE(CONFIG_IPV6)
 	.br_fragment	= br_ip6_fragment,
 #endif
 };
-- 
1.8.3.1
Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56
On Thu, May 30, 2019 at 10:12:57AM +0200, Marc Haber wrote:
> on my primary notebook, a Lenovo X260, with an Intel Wireless 8260
> (8086:24f3), running Debian unstable, I have started to see network
> hangs since upgrading to kernel 5.1. In this situation, I cannot
> restart Network-Manager (the call just hangs), I can log out of X, but
> the system does not cleanly shut down and I need to Magic SysRq myself
> out of the running system. This happens about once every two days.

The issue is also present in 5.1.5 and 5.1.6.

Greetings
Marc

-- 
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
[PATCH] net: phylink: avoid reducing support mask
Avoid reducing the support mask as a result of the interface type
selected for SFP modules, or when setting the link settings through
ethtool - this should only change when the supported link modes of
the hardware combination change.

Signed-off-by: Russell King
---
 drivers/net/phy/phylink.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9044b95d2afe..4c0616ba314d 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1073,6 +1073,7 @@ EXPORT_SYMBOL_GPL(phylink_ethtool_ksettings_get);
 int phylink_ethtool_ksettings_set(struct phylink *pl,
 				  const struct ethtool_link_ksettings *kset)
 {
+	__ETHTOOL_DECLARE_LINK_MODE_MASK(support);
 	struct ethtool_link_ksettings our_kset;
 	struct phylink_link_state config;
 	int ret;
@@ -1083,11 +1084,12 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
 	    kset->base.autoneg != AUTONEG_ENABLE)
 		return -EINVAL;
 
+	linkmode_copy(support, pl->supported);
 	config = pl->link_config;
 
 	/* Mask out unsupported advertisements */
 	linkmode_and(config.advertising, kset->link_modes.advertising,
-		     pl->supported);
+		     support);
 
 	/* FIXME: should we reject autoneg if phy/mac does not support it? */
 	if (kset->base.autoneg == AUTONEG_DISABLE) {
@@ -1097,7 +1099,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
 		 * duplex.
 		 */
 		s = phy_lookup_setting(kset->base.speed, kset->base.duplex,
-				       pl->supported, false);
+				       support, false);
 		if (!s)
 			return -EINVAL;
 
@@ -1126,7 +1128,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
 		__set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, config.advertising);
 	}
 
-	if (phylink_validate(pl, pl->supported, &config))
+	if (phylink_validate(pl, support, &config))
 		return -EINVAL;
 
 	/* If autonegotiation is enabled, we must have an advertisement */
@@ -1576,6 +1578,7 @@ static int phylink_sfp_module_insert(void *upstream,
 {
 	struct phylink *pl = upstream;
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(support) = { 0, };
+	__ETHTOOL_DECLARE_LINK_MODE_MASK(support1);
 	struct phylink_link_state config;
 	phy_interface_t iface;
 	int ret = 0;
@@ -1603,6 +1606,8 @@ static int phylink_sfp_module_insert(void *upstream,
 		return ret;
 	}
 
+	linkmode_copy(support1, support);
+
 	iface = sfp_select_interface(pl->sfp_bus, id, config.advertising);
 	if (iface == PHY_INTERFACE_MODE_NA) {
 		netdev_err(pl->netdev,
@@ -1612,7 +1617,7 @@ static int phylink_sfp_module_insert(void *upstream,
 	}
 
 	config.interface = iface;
-	ret = phylink_validate(pl, support, &config);
+	ret = phylink_validate(pl, support1, &config);
 	if (ret) {
 		netdev_err(pl->netdev, "validation of %s/%s with support %*pb failed: %d\n",
 			   phylink_an_mode_str(MLO_AN_INBAND),
-- 
2.7.4
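The pattern in this patch, validating against a scratch copy so the stored mask is left alone, can be sketched outside the kernel as follows. This is only an illustration under simplifying assumptions: the link-mode mask is reduced to a single uint64_t, and demo_link, demo_validate(), demo_set_in_place() and demo_set_with_copy() are hypothetical names, not phylink API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the phylink instance. */
struct demo_link {
	uint64_t supported;	/* modes the hardware combination supports */
};

/*
 * Hypothetical validate step: it narrows the mask it is handed to what
 * the chosen interface can do, and fails if nothing is left.
 */
static int demo_validate(uint64_t *supported, uint64_t iface_modes)
{
	*supported &= iface_modes;
	return *supported ? 0 : -1;
}

/* Validating in place: the stored mask shrinks as a side effect. */
int demo_set_in_place(struct demo_link *pl, uint64_t iface_modes)
{
	return demo_validate(&pl->supported, iface_modes);
}

/* Validating a scratch copy: the stored mask is left untouched. */
int demo_set_with_copy(struct demo_link *pl, uint64_t iface_modes)
{
	uint64_t support = pl->supported;	/* plays the role of linkmode_copy() */

	return demo_validate(&support, iface_modes);
}
```

In the in-place variant a later request for a mode that was masked out earlier would fail even though the hardware supports it; the copy variant avoids that.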
[PATCH] net: sfp: read eeprom in maximum 16 byte increments
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time. This behaviour is not specified
in the SFP MSAs, which specify:

 "The serial interface uses the 2-wire serial CMOS E2PROM protocol
  defined for the ATMEL AT24C01A/02/04 family of components."

and

 "As long as the SFP+ receives an acknowledge, it shall serially clock
  out sequential data words. The sequence is terminated when the host
  responds with a NACK and a STOP instead of an acknowledge."

We must avoid breaking a read across a 16-bit quantity in the diagnostic
page; thankfully, all 16-bit quantities in that page are naturally
aligned.

Signed-off-by: Russell King
---
 drivers/net/phy/sfp.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index d4635c2178d1..71812be0ac64 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -281,6 +281,7 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
 {
 	struct i2c_msg msgs[2];
 	u8 bus_addr = a2 ? 0x51 : 0x50;
+	size_t this_len;
 	int ret;
 
 	msgs[0].addr = bus_addr;
@@ -292,11 +293,26 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
 	msgs[1].len = len;
 	msgs[1].buf = buf;
 
-	ret = i2c_transfer(sfp->i2c, msgs, ARRAY_SIZE(msgs));
-	if (ret < 0)
-		return ret;
+	while (len) {
+		this_len = len;
+		if (this_len > 16)
+			this_len = 16;
 
-	return ret == ARRAY_SIZE(msgs) ? len : 0;
+		msgs[1].len = this_len;
+
+		ret = i2c_transfer(sfp->i2c, msgs, ARRAY_SIZE(msgs));
+		if (ret < 0)
+			return ret;
+
+		if (ret != ARRAY_SIZE(msgs))
+			break;
+
+		msgs[1].buf += this_len;
+		dev_addr += this_len;
+		len -= this_len;
+	}
+
+	return msgs[1].buf - (u8 *)buf;
 }
 
 static int sfp_i2c_write(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
-- 
2.7.4
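The chunking loop can be tried outside the kernel with a small sketch like the one below. It is a simplified model, not the driver code: demo_xfer_fn, demo_chunked_read(), demo_fake_eeprom and demo_fake_xfer() are hypothetical names, the transfer callback stands in for i2c_transfer(), and the "short transfer" case of the real function is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_CHUNK 16	/* largest single read, as in the patch */

/* Hypothetical transfer callback: 0 on success, negative on error. */
typedef int (*demo_xfer_fn)(void *ctx, uint8_t addr, uint8_t *buf, size_t len);

/* Read len bytes starting at addr, never asking for more than 16 at once. */
long demo_chunked_read(demo_xfer_fn xfer, void *ctx, uint8_t addr,
		       void *buf, size_t len)
{
	uint8_t *p = buf;

	while (len) {
		size_t this_len = len > DEMO_MAX_CHUNK ? DEMO_MAX_CHUNK : len;
		int ret = xfer(ctx, addr, p, this_len);

		if (ret < 0)
			return ret;

		p += this_len;
		addr = (uint8_t)(addr + this_len);
		len -= this_len;
	}

	return (long)(p - (uint8_t *)buf);	/* bytes actually read */
}

/* Fake 256-byte EEPROM that records how it was accessed. */
struct demo_fake_eeprom {
	uint8_t mem[256];
	size_t max_len_seen;
	unsigned int calls;
};

int demo_fake_xfer(void *ctx, uint8_t addr, uint8_t *buf, size_t len)
{
	struct demo_fake_eeprom *e = ctx;

	if ((size_t)addr + len > sizeof(e->mem))
		return -1;
	if (len > e->max_len_seen)
		e->max_len_seen = len;
	e->calls++;
	memcpy(buf, e->mem + addr, len);
	return 0;
}
```

A 40-byte read through this helper is issued as three transfers of 16, 16 and 8 bytes. Starting at an even address with an even chunk size also keeps every naturally aligned 16-bit quantity inside one transfer.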
Re: [PATCH net-next] net: phy: phylink: add fallback from SGMII to 1000BaseX
On Fri, May 31, 2019 at 06:17:51PM -0600, Robert Hancock wrote:
> Our device is mainly intended for fiber modules, which is why 1000BaseX
> is being used. The variant of fiber modules we are using (for example,
> Finisar FCLF8520P2BTL) are set up for 1000BaseX, and seem like they are
> kind of a hack to allow using copper on devices which only support
> 1000BaseX mode (in fact that particular one is extra hacky since you
> have to disable 1000BaseX autonegotiation on the host side). This patch
> is basically intended to allow that particular case to work.

Looking at the data sheet for FCLF8520P2BTL, it explicitly states:

PRODUCT SELECTION
Part Number     Link Indicator    1000BASE-X auto-negotiation
                on RX_LOS Pin     enabled by default
FCLF8520P2BTL   Yes               No
FCLF8521P2BTL   No                Yes
FCLF8522P2BTL   Yes               Yes

The idea being, you buy the correct one according to what the host
equipment requires, rather than just picking one and hoping it works.

The data sheet goes on to mention that the module uses a Marvell 88e
PHY, which seems to be quite common for copper SFPs from multiple
manufacturers (but not all) and is very flexible in how it can be
configured.

If we detect a PHY on the SFP module, we check whether it is an 88e
PHY, and then read out its configured link type. We don't have a way
to deal with the difference between FCLF8520P2BTL and FCLF8521P2BTL,
but at least we'll be able to tell whether we should be in 1000Base-X
mode for these modules, rather than SGMII.

For an SFP cage meant to support fiber, I would recommend using the
FCLF8521P2BTL or FCLF8522P2BTL since those will behave more like an
802.3z standards-compliant gigabit fiber connection.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Re: [RFC PATCH 6/6] seg6: Add support to rearrange SRH for AH ICV calculation
On Sun, Jun 2, 2019 at 2:54 AM Ahmed Abdelsalam wrote:
>
> On Fri, 31 May 2019 10:34:03 -0700
> Tom Herbert wrote:
>
> > On Fri, May 31, 2019 at 10:07 AM Ahmed Abdelsalam
> > wrote:
> > >
> > > On Fri, 31 May 2019 09:48:40 -0700
> > > Tom Herbert wrote:
> > >
> > > > Mutable fields related to segment routing are: destination address,
> > > > segments left, and modifiable TLVs (those whose high order bit is set).
> > > >
> > > > Add support to rearrange a segment routing (type 4) routing header to
> > > > handle these mutability requirements. This is described in
> > > > draft-herbert-ipv6-srh-ah-00.
> > > >
> > > Hi Tom,
> > > Assuming that IETF process needs to be fixed, then, IMO, should not be on
> > > the cost of breaking the kernel process here.
> >
> > Ahmed,
> >
> > I do not see how this is any way breaking the kernel process. The
> > kernel is beholden to the needs of users provide a robust and secure
> > implementations, not to some baroque IETF or other SDO processes. When
> > those are in conflict, the needs of our users should prevail.
> >
> > > Let us add to the kernel things that have been reviewed and reached some
> > > consensus.
> >
> > By that argument, segment routing should never have been added to the
> > kernel since consensus has not be reached on it yet or at least
> > portions of it. In fact, if you look at this patch set, most of the
> > changes are actually bug fixes to bring the implementation into
> > conformance with a later version of the draft. For instance, there was
> > never consensus reached on the HMAC flag; now it's gone and we need to
> > remove it from the implementation.
> >
> > > For new features that still need to be reviewed we can have them outside
> > > the kernel tree for community to use.
> > > This way the community does not get blocked by IETF process but also keep
> > > the kernel tree stable.
> >
> > In any case, that does not address the issue of a user using both
> > segment routing and authentication which leads to adverse behaviors.
> > AFAICT, the kernel does not prevent this today. So I ask again: what
> > is your alternative to address this?
> >
> > Thanks,
> > Tom
>
> Tom,
> Yes, the needs for users should prevail. But it’s not Tom or Ahmed alone who
> should define users needs.
> The comparison between "draft-herbert-ipv6-srh-ah-00" and
> "draft-ietf-6man-segment-routing-header" is
> missing some facts. The first patch of the SRH implementation was submitted
> to the kernel two years after
> releasing the SRH draft. By this time, the draft was a working group adopted
> and co-authored by several
> vendors, operators and academia. Please refer to the first SRH patch
> submitted to the kernel
> (https://patchwork.ozlabs.org/patch/663176/). I still don’t see the point of
> rushing to upstream something
> that has been defined couple of days ago. Plus there is nothing that prevents
> anyone to "innovate" in his
> own private kernel tree.

Ahmed,

While you seem to think that this was just defined and came out of the
blue a few days ago, in fact this has been in discussion for many
months. The simultaneous use of segment routing and authentication
header was not defined, but it is defined for other routing types and
extension headers. The primary drivers of segment routing (the
academics, operators, and vendors you refer to) were reluctant to do
this. For the most part, these are routing vendors who don't care
about preserving end-to-end host functionality like AH. In order to
define an interoperable protocol, the mutability of fields needs to be
defined. They were unwilling to commit to defining what is mutable in
their protocol, and it took an intervening action of the working group
chairs to force them to clarify the requirements so now we have
something. IMO, this is a straightforward bug fix.

If you want to say that we need to wait for IETF to take action, okay,
but then I strongly suggest that you actively participate in the
process (i.e. send to 6man list what you think about the draft), as
opposed to just passively deferring to it and assuming others will do
the right thing.

Tom

>
> --
> Ahmed Abdelsalam
[PATCH net-next] net: fix use-after-free in kfree_skb_list
syzbot reported a nasty use-after-free [1].

Let's remove the frag_list field from structs ip_fraglist_iter
and ip6_fraglist_iter. It seems not to be needed anyway.

[1] :
BUG: KASAN: use-after-free in kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
Read of size 8 at addr 888085a3cbc0 by task syz-executor303/8947

CPU: 0 PID: 8947 Comm: syz-executor303 Not tainted 5.2.0-rc2+ #12
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
 ip6_fragment+0x1ef4/0x2680 net/ipv6/ip6_output.c:882
 __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
 ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
 NF_HOOK_COND include/linux/netfilter.h:294 [inline]
 ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
 dst_output include/net/dst.h:433 [inline]
 ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
 ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
 ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
 rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
 rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:671
 ___sys_sendmsg+0x803/0x920 net/socket.c:2292
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
 __do_sys_sendmsg net/socket.c:2339 [inline]
 __se_sys_sendmsg net/socket.c:2337 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44add9
Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f826f33bce8 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 006e7a18 RCX: 0044add9
RDX: RSI: 2240 RDI: 0005
RBP: 006e7a10 R08: R09:
R10: R11: 0246 R12: 006e7a1c
R13: 7ffcec4f7ebf R14: 7f826f33c9c0 R15: 20c49ba5e353f7cf

Allocated by task 8947:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
 slab_post_alloc_hook mm/slab.h:437 [inline]
 slab_alloc_node mm/slab.c:3269 [inline]
 kmem_cache_alloc_node+0x131/0x710 mm/slab.c:3579
 __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:199
 alloc_skb include/linux/skbuff.h:1058 [inline]
 __ip6_append_data.isra.0+0x2a24/0x3640 net/ipv6/ip6_output.c:1519
 ip6_append_data+0x1e5/0x320 net/ipv6/ip6_output.c:1688
 rawv6_sendmsg+0x1467/0x35e0 net/ipv6/raw.c:940
 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:671
 ___sys_sendmsg+0x803/0x920 net/socket.c:2292
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
 __do_sys_sendmsg net/socket.c:2339 [inline]
 __se_sys_sendmsg net/socket.c:2337 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 8947:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kmem_cache_free+0x86/0x260 mm/slab.c:3698
 kfree_skbmem net/core/skbuff.c:625 [inline]
 kfree_skbmem+0xc5/0x150 net/core/skbuff.c:619
 __kfree_skb net/core/skbuff.c:682 [inline]
 kfree_skb net/core/skbuff.c:699 [inline]
 kfree_skb+0xf0/0x390 net/core/skbuff.c:693
 kfree_skb_list+0x44/0x60 net/core/skbuff.c:708
 __dev_xmit_skb net/core/dev.c:3551 [inline]
 __dev_queue_xmit+0x3034/0x36b0 net/core/dev.c:3850
 dev_queue_xmit+0x18/0x20 net/core/dev.c:3914
 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1532
 neigh_output include/net/neighbour.h:511 [inline]
 ip6_finish_output2+0x1034/0x2550 net/ipv6/ip6_output.c:120
 ip6_fragment+0x1ebb/0x2680 net/ipv6/ip6_output.c:863
 __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
 ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
 NF_HOOK_COND include/linux/netfilter.h:294 [inline]
 ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
 dst_output include/net/dst.h:433 [inline]
 ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
 ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
 ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
 rawv6_
[PATCH RFC iproute2-next v3] tc: add support for action act_ctinfo
ctinfo is an action restoring data stored in conntrack marks to various
fields. At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.

It understands a number of parameters specific to this action in
addition to the usual action syntax. Each operating mode is
independent of the other so all options are optional, however not
specifying at least one mode is a bit pointless.

Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index ]

DSCP mode

dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field. The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is stored. It must be 6 contiguous
bits long, e.g. 0xfc000000 would restore the DSCP from the upper 6 bits
of the conntrack mark.

The DSCP copying may be optionally controlled by a statemask. The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask. The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark yield a non zero result.

eg. dscp 0xfc000000/0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so. Bit 26 is unused in this
example.

CPMARK mode

cpmark enables copying of the conntrack mark to the packet skb mark. In
this mode it is completely equivalent to the existing act_connmark.
Additional functionality is provided by the optional mask parameter,
whereby the stored conntrack mark is logically anded with the cpmark
mask before being stored into skb mark. This allows shared usage of the
conntrack mark between applications.

eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.

Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index ]
where :
	dscp MASK is the bitmask to restore DSCP
	     STATEMASK is the bitmask to determine conditional restoring
	cpmark MASK mask applied to restored packet mark
	ZONE is the conntrack zone
	CONTROL := reclassify | pipe | drop | continue | ok |
		   goto chain

Signed-off-by: Kevin Darbyshire-Bryant

---
v2 - fix whitespace issue in pkt_cls
     fix most warnings from checkpatch - some lines still over 80 chars
     due to long TLV names.
v3 - fix some dangling else warnings.
     refactor stats printing to please checkpatch.
     send zone TLV even if default '0' zone.
     now checkpatch clean even though I think some of the formatting
     is horrible :-)
     sending via google's smtp 'cos MS' exchange office365 appears
     to mangle patches from git send-email.

 include/uapi/linux/pkt_cls.h          |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h |  34 
 tc/Makefile                           |   1 +
 tc/m_ctinfo.c                         | 262 ++
 4 files changed, 298 insertions(+)
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 tc/m_ctinfo.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f..a93680fc 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CTINFO,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index ..da803e05
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include
+#include
+
+struct tc_ctinfo {
+	tc_gen;
+};
+
+enum {
+	TCA_CTINFO_UNSPEC,
+	TCA_CTINFO_PAD,
+	TCA_CTINFO_TM,
+	TCA_CTINFO_ACT,
+	TCA_CTINFO_ZONE,
+	TCA_CTINFO_PARMS_DSCP_MASK,
+	TCA_CTINFO_PARMS_DSCP_STATEMASK,
+	TCA_CTINFO_PARMS_CPMARK_MASK,
+	TCA_CTINFO_STATS_DSCP_SET,
+	TCA_CTINFO_STATS_DSCP_ERROR,
+	TCA_CTINFO_STATS_CPMARK_SET,
+	__TCA_CTINFO_MAX
+};
+
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+	CTINFO_MODE_DSCP	= BIT(0),
+	CTINFO_MODE_CPMARK	= BIT(1)
+};
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index 1a305cf4..60abddee 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -48,6 +48,7 @@ TCMODULES += m_csum.o
 TCMODULES += m_simple.o
 TCMODULES += m_vlan.o
 TCMODULES += m_connmark.o
+TCMODULES += m_ctinfo.o
 TCMODULES += m_bpf.o
 TCMODULES += m_tunnel_key.o
 TCMODULES += m_sample.o
diff --git a/tc/m_ctinfo.c b/tc/m_ctinfo.c
new file mode 100644
index ..af5102bf
--- /dev/null
+++ b/tc/m_ctinfo.c
@@ -0,0 +1,262 @@
+/* SPDX-License-Ident
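To make the mask semantics from the description above concrete, here is a small user-space sketch of the arithmetic. It is not the act_ctinfo implementation: all demo_* names are hypothetical, and the example masks are read as the 32-bit values 0xfc000000 (DSCP in the top 6 bits), 0x01000000 (state flag) and 0x00ffffff (lower 24 bits), which is an assumption based on the wording of the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of trailing zero bits in a non-zero mask. */
static unsigned int demo_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (mask && !(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* dscp mask: exactly 6 contiguous bits, not overlapping the statemask. */
bool demo_masks_valid(uint32_t dscpmask, uint32_t statemask)
{
	if (!dscpmask)
		return false;
	if ((dscpmask >> demo_mask_shift(dscpmask)) != 0x3f)
		return false;
	return (dscpmask & statemask) == 0;
}

/*
 * DSCP mode: returns true and stores the 6-bit DSCP when a restore
 * should happen, i.e. no statemask was given or its bits are set in
 * the conntrack mark.
 */
bool demo_dscp_from_mark(uint32_t ctmark, uint32_t dscpmask,
			 uint32_t statemask, uint8_t *dscp)
{
	if (statemask && !(ctmark & statemask))
		return false;

	*dscp = (uint8_t)((ctmark & dscpmask) >> demo_mask_shift(dscpmask));
	return true;
}

/* CPMARK mode: the conntrack mark ANDed with the cpmark mask. */
uint32_t demo_cpmark(uint32_t ctmark, uint32_t mask)
{
	return ctmark & mask;
}
```

With these masks the two modes can share one conntrack mark: the DSCP and its flag live in the upper 8 bits, and cpmark restores only the lower 24.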
Re: [PATCH net-next] selftests: Add test cases for nexthop objects
From: David Ahern
Date: Thu, 30 May 2019 12:06:36 -0700

> From: David Ahern
>
> Add functional test cases for nexthop objects.
>
> Signed-off-by: David Ahern

Applied, thanks.
Re: [PATCH net-next] cxgb4: Set initial IRQ affinity hints
From: Nirranjan Kirubaharan
Date: Thu, 30 May 2019 23:14:28 -0700

> +	while (--ethqidx >= 0) {
> +		--msi_index;

It is more canonical to use "msi_index--;" here.
Re: [PATCH RFC iproute2-next v3] tc: add support for action act_ctinfo
Kevin Darbyshire-Bryant writes: > ctinfo is an action restoring data stored in conntrack marks to various > fields. At present it has two independent modes of operation, > restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack > marks into packet skb marks. > > It understands a number of parameters specific to this action in > additional to the usual action syntax. Each operating mode is > independent of the other so all options are optional, however not > specifying at least one mode is a bit pointless. > > Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] > [CONTROL] [index ] Yay, bikeshedding time! :) As I said in reply to the kernel patch, the "X/Y" syntax usually means "/", where here they are just two semi-related mask values. So I think it would be better to just make 'statemask' its own parameter. Other than that, just a few nits, below... > DSCP mode > > dscp enables copying of a DSCP store in the conntrack mark into the > ipv4/v6 diffserv field. The mask is a 32bit field and specifies where > in the conntrack mark the DSCP value is stored. It must be 6 contiguous > bits long, e.g. 0xfc00 would restore the DSCP from the upper 6 bits > of the conntrack mark. > > The DSCP copying may be optionally controlled by a statemask. The > statemask is a 32bit field, usually with a single bit set and must not > overlap the dscp mask. The DSCP restore operation will only take place > if the corresponding bit/s in conntrack mark yield a non zero result. > > eg. dscp 0xfc00/0x0100 would retrieve the DSCP from the top 6 > bits, whilst using bit 25 as a flag to do so. Bit 26 is unused in this > example. > > CPMARK mode > > cpmark enables copying of the conntrack mark to the packet skb mark. In > this mode it is completely equivalent to the existing act_connmark. 
> Additional functionality is provided by the optional mask parameter, > whereby the stored conntrack mark is logically anded with the cpmark > mask before being stored into skb mark. This allows shared usage of the > conntrack mark between applications. > > eg. cpmark 0x00ff would restore only the lower 24 bits of the > conntrack mark, thus may be useful in the event that the upper 8 bits > are used by the DSCP function. > > Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] > [CONTROL] [index ] > where : > dscp MASK is the bitmask to restore DSCP >STATEMASK is the bitmask to determine conditional restoring > cpmark MASK mask applied to restored packet mark > ZONE is the conntrack zone > CONTROL := reclassify | pipe | drop | continue | ok | > goto chain > > Signed-off-by: Kevin Darbyshire-Bryant > > --- > v2 - fix whitespace issue in pkt_cls > fix most warnings from checkpatch - some lines still over 80 chars > due to long TLV names. > v3 - fix some dangling else warnings. > refactor stats printing to please checkpatch. > send zone TLV even if default '0' zone. > now checkpatch clean even though I think some of the formatting > is horrible :-) > sending via google's smtp 'cos MS' exchange office365 appears > to mangle patches from git send-email. 
Ah, so it wasn't just me having problems ;) > include/uapi/linux/pkt_cls.h | 1 + > include/uapi/linux/tc_act/tc_ctinfo.h | 34 > tc/Makefile | 1 + > tc/m_ctinfo.c | 262 ++ > 4 files changed, 298 insertions(+) > create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h > create mode 100644 tc/m_ctinfo.c > > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h > index 51a0496f..a93680fc 100644 > --- a/include/uapi/linux/pkt_cls.h > +++ b/include/uapi/linux/pkt_cls.h > @@ -105,6 +105,7 @@ enum tca_id { > TCA_ID_IFE = TCA_ACT_IFE, > TCA_ID_SAMPLE = TCA_ACT_SAMPLE, > /* other actions go here */ > + TCA_ID_CTINFO, > __TCA_ID_MAX = 255 > }; > > diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h > b/include/uapi/linux/tc_act/tc_ctinfo.h > new file mode 100644 > index ..da803e05 > --- /dev/null > +++ b/include/uapi/linux/tc_act/tc_ctinfo.h > @@ -0,0 +1,34 @@ > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ > +#ifndef __UAPI_TC_CTINFO_H > +#define __UAPI_TC_CTINFO_H > + > +#include > +#include > + > +struct tc_ctinfo { > + tc_gen; > +}; > + > +enum { > + TCA_CTINFO_UNSPEC, > + TCA_CTINFO_PAD, > + TCA_CTINFO_TM, > + TCA_CTINFO_ACT, > + TCA_CTINFO_ZONE, > + TCA_CTINFO_PARMS_DSCP_MASK, > + TCA_CTINFO_PARMS_DSCP_STATEMASK, > + TCA_CTINFO_PARMS_CPMARK_MASK, > + TCA_CTINFO_STATS_DSCP_SET, > + TCA_CTINFO_STATS_DSCP_ERROR, > + TCA_CTINFO_STATS_CPMARK_SET, > + __TCA_CTINFO_MAX > +}; > + > +#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1) > + > +enum { > + CTINFO_MODE_DSCP= BIT(0), > +
Re: [PATCH net-next] Update my email address
From: Wei Liu
Date: Fri, 31 May 2019 08:31:02 +0100

> Signed-off-by: Wei Liu

Applied.
[PATCH net-next 03/11] net: dsa: sja1105: Add missing L2 Forwarding Table definitions for P/Q/R/S
This appends to the L2 Forwarding and L2 Forwarding Parameters tables
(originally added for first-generation switches) the bits that are new
in the second generation.

Signed-off-by: Vladimir Oltean
---
 .../net/dsa/sja1105/sja1105_static_config.c | 18 ++---
 .../net/dsa/sja1105/sja1105_static_config.h | 26 +++
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.c b/drivers/net/dsa/sja1105/sja1105_static_config.c
index 7e90e62da389..6d65a7b09395 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.c
@@ -236,10 +236,20 @@ size_t sja1105pqrs_l2_lookup_entry_packing(void *buf, void *entry_ptr,
 	const size_t size = SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY;
 	struct sja1105_l2_lookup_entry *entry = entry_ptr;
 
-	/* These are static L2 lookup entries, so the structure
-	 * should match UM11040 Table 16/17 definitions when
-	 * LOCKEDS is 1.
-	 */
+	if (entry->lockeds) {
+		sja1105_packing(buf, &entry->tsreg,    159, 159, size, op);
+		sja1105_packing(buf, &entry->mirrvlan, 158, 147, size, op);
+		sja1105_packing(buf, &entry->takets,   146, 146, size, op);
+		sja1105_packing(buf, &entry->mirr,     145, 145, size, op);
+		sja1105_packing(buf, &entry->retag,    144, 144, size, op);
+	} else {
+		sja1105_packing(buf, &entry->touched,  159, 159, size, op);
+		sja1105_packing(buf, &entry->age,      158, 144, size, op);
+	}
+	sja1105_packing(buf, &entry->mask_iotag,   143, 143, size, op);
+	sja1105_packing(buf, &entry->mask_vlanid,  142, 131, size, op);
+	sja1105_packing(buf, &entry->mask_macaddr, 130,  83, size, op);
+	sja1105_packing(buf, &entry->iotag,         82,  82, size, op);
 	sja1105_packing(buf, &entry->vlanid,        81,  70, size, op);
 	sja1105_packing(buf, &entry->macaddr,       69,  22, size, op);
 	sja1105_packing(buf, &entry->destports,     21,  17, size, op);
diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.h b/drivers/net/dsa/sja1105/sja1105_static_config.h
index 069ca8fd059c..d513b1c91b98 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.h
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.h
@@ -122,9 +122,35 @@ struct sja1105_l2_lookup_entry {
 	u64 destports;
 	u64 enfport;
 	u64 index;
+	/* P/Q/R/S only */
+	u64 mask_iotag;
+	u64 mask_vlanid;
+	u64 mask_macaddr;
+	u64 iotag;
+	bool lockeds;
+	union {
+		/* LOCKEDS=1: Static FDB entries */
+		struct {
+			u64 tsreg;
+			u64 mirrvlan;
+			u64 takets;
+			u64 mirr;
+			u64 retag;
+		};
+		/* LOCKEDS=0: Dynamically learned FDB entries */
+		struct {
+			u64 touched;
+			u64 age;
+		};
+	};
 };
 
 struct sja1105_l2_lookup_params_entry {
+	u64 start_dynspc;    /* P/Q/R/S only */
+	u64 drpnolearn;      /* P/Q/R/S only */
+	u64 use_static;      /* P/Q/R/S only */
+	u64 owr_dyn;         /* P/Q/R/S only */
+	u64 learn_once;      /* P/Q/R/S only */
 	u64 maxage;          /* Shared */
 	u64 dyn_tbsz;        /* E/T only */
 	u64 poly;            /* E/T only */
-- 
2.17.1
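The diff above gives bits 159..144 of the P/Q/R/S L2 lookup entry two meanings, selected by LOCKEDS. A rough stand-alone sketch of that overlap follows. It uses plain shifts on a 16-bit word instead of the driver's sja1105_packing(), ignores the buffer byte layout entirely, and demo_l2_entry and demo_pack_top16() are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down, hypothetical version of the entry: only the overlapping part. */
struct demo_l2_entry {
	bool lockeds;
	union {
		/* LOCKEDS=1: static FDB entry */
		struct {
			uint64_t tsreg;		/* bit 159 */
			uint64_t mirrvlan;	/* bits 158..147 */
			uint64_t takets;	/* bit 146 */
			uint64_t mirr;		/* bit 145 */
			uint64_t retag;		/* bit 144 */
		};
		/* LOCKEDS=0: dynamically learned FDB entry */
		struct {
			uint64_t touched;	/* bit 159 */
			uint64_t age;		/* bits 158..144 */
		};
	};
};

/* Pack bits 159..144 of the entry into a 16-bit word (bit 144 -> bit 0). */
uint16_t demo_pack_top16(const struct demo_l2_entry *e)
{
	if (e->lockeds)
		return (uint16_t)(((e->tsreg & 0x1) << 15) |
				  ((e->mirrvlan & 0xfff) << 3) |
				  ((e->takets & 0x1) << 2) |
				  ((e->mirr & 0x1) << 1) |
				  (e->retag & 0x1));

	return (uint16_t)(((e->touched & 0x1) << 15) | (e->age & 0x7fff));
}
```

Because both views share the same 16 bits, the same raw word decodes differently depending on LOCKEDS, which is why the packing function branches on it.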
[PATCH net-next 00/11] FDB updates for SJA1105 DSA driver
This patch series adds:

- FDB switchdev support for the second generation of switches (P/Q/R/S).
  I could test/code these now that I got a board with a SJA1105Q.

- Management route support for SJA1105 P/Q/R/S. This is needed to send
  PTP/STP/management frames over the CPU port.

- Logic to hide private DSA VLANs from the 'bridge fdb' commands.

The new FDB code was also tested and still works on SJA1105T.

Vladimir Oltean (11):
  net: dsa: sja1105: Shim declaration of struct sja1105_dyn_cmd
  net: dsa: sja1105: Fix bit offsets of index field from L2 lookup
    entries
  net: dsa: sja1105: Add missing L2 Forwarding Table definitions for
    P/Q/R/S
  net: dsa: sja1105: Plug in support for TCAM searches via the dynamic
    interface
  net: dsa: sja1105: Make room for P/Q/R/S FDB operations
  net: dsa: sja1105: Add P/Q/R/S support for dynamic L2 lookup
    operations
  net: dsa: sja1105: Make dynamic_config_read return -ENOENT if not
    found
  net: dsa: sja1105: Add P/Q/R/S management route support via dynamic
    interface
  net: dsa: sja1105: Add FDB operations for P/Q/R/S series
  net: dsa: sja1105: Unset port from forwarding mask unconditionally on
    fdb_del
  net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb
    command

 drivers/net/dsa/sja1105/sja1105.h             |  20 +-
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 144 +-
 .../net/dsa/sja1105/sja1105_dynamic_config.h  |  11 +-
 drivers/net/dsa/sja1105/sja1105_main.c        | 186 --
 drivers/net/dsa/sja1105/sja1105_spi.c         |  12 ++
 .../net/dsa/sja1105/sja1105_static_config.c   |  18 +-
 .../net/dsa/sja1105/sja1105_static_config.h   |  26 +++
 7 files changed, 379 insertions(+), 38 deletions(-)

-- 
2.17.1
[PATCH net-next 04/11] net: dsa: sja1105: Plug in support for TCAM searches via the dynamic interface
Only a single dynamic configuration table of the SJA1105 P/Q/R/S supports this operation: the FDB. To keep the existing structure in place (sja1105_dynamic_config_read and sja1105_dynamic_config_write) and not introduce any new function, a convention is made for sja1105_dynamic_config_read that a negative index argument denotes a search for the entry provided as argument. Signed-off-by: Vladimir Oltean --- .../net/dsa/sja1105/sja1105_dynamic_config.c | 36 ++- .../net/dsa/sja1105/sja1105_dynamic_config.h | 3 ++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c index 0023b03a010d..7e7efc2e8ee4 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c @@ -36,6 +36,7 @@ SJA1105PQRS_SIZE_MAC_CONFIG_DYN_CMD struct sja1105_dyn_cmd { + bool search; u64 valid; u64 rdwrset; u64 errors; @@ -248,6 +249,7 @@ sja1105et_general_params_entry_packing(void *buf, void *entry_ptr, #define OP_READBIT(0) #define OP_WRITE BIT(1) #define OP_DEL BIT(2) +#define OP_SEARCH BIT(3) /* SJA1105E/T: First generation */ struct sja1105_dynamic_table_ops sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = { @@ -367,6 +369,24 @@ struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = { [BLK_IDX_XMII_PARAMS] = {0}, }; +/* Provides read access to the settings through the dynamic interface + * of the switch. + * @blk_idxis used as key to select from the sja1105_dynamic_table_ops. + * The selection is limited by the hardware in respect to which + * configuration blocks can be read through the dynamic interface. + * @index is used to retrieve a particular table entry. If negative, + * (and if the @blk_idx supports the searching operation) a search + * is performed by the @entry parameter. + * @entry Type-casted to an unpacked structure that holds a table entry + * of the type specified in @blk_idx. + * Usually an output argument. 
If @index is negative, then this + * argument is used as input/output: it should be pre-populated + * with the element to search for. Entries which support the + * search operation will have an "index" field (not the @index + * argument to this function) and that is where the found index + * will be returned (or left unmodified - thus negative - if not + * found). + */ int sja1105_dynamic_config_read(struct sja1105_private *priv, enum sja1105_blk_idx blk_idx, int index, void *entry) @@ -385,6 +405,8 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv, if (index >= ops->max_entry_count) return -ERANGE; + if (index < 0 && !(ops->access & OP_SEARCH)) + return -EOPNOTSUPP; if (!(ops->access & OP_READ)) return -EOPNOTSUPP; if (ops->packed_size > SJA1105_MAX_DYN_CMD_SIZE) @@ -396,9 +418,19 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv, cmd.valid = true; /* Trigger action on table entry */ cmd.rdwrset = SPI_READ; /* Action is read */ - cmd.index = index; + if (index < 0) { + /* Avoid copying a signed negative number to an u64 */ + cmd.index = 0; + cmd.search = true; + } else { + cmd.index = index; + cmd.search = false; + } ops->cmd_packing(packed_buf, &cmd, PACK); + if (cmd.search) + ops->entry_packing(packed_buf, entry, PACK); + /* Send SPI write operation: read config table entry */ rc = sja1105_spi_send_packed_buf(priv, SPI_WRITE, ops->addr, packed_buf, ops->packed_size); @@ -456,6 +488,8 @@ int sja1105_dynamic_config_write(struct sja1105_private *priv, if (index >= ops->max_entry_count) return -ERANGE; + if (index < 0) + return -ERANGE; if (!(ops->access & OP_WRITE)) return -EOPNOTSUPP; if (!keep && !(ops->access & OP_DEL)) diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h index 49c611eb02cb..740dadf43f01 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h @@ -7,6 +7,9 @@ #include "sja1105.h" #include +/* Special 
index that can be used for sja1105_dynamic_config_read */ +#define SJA1105_SEARCH -1 + struct sja1105_dyn_cmd; struct sja1105_dynamic_table_ops { -- 2.17.1
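The negative-index convention described in the commit message can be sketched in plain C, outside the driver. This is only an illustration: demo_entry, demo_table, demo_table_read and DEMO_SEARCH are hypothetical names, the linear loop stands in for the hardware TCAM search, and -ENOENT is assumed as the "not found" code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SEARCH	(-1)	/* hypothetical, plays the role of SJA1105_SEARCH */
#define DEMO_TABLE_SIZE	8

struct demo_entry {
	uint64_t macaddr;
	uint16_t vlanid;
	int index;	/* written back by a successful search */
	int valid;
};

static struct demo_entry demo_table[DEMO_TABLE_SIZE];

/* index >= 0: read that slot into *entry.
 * index <  0: treat *entry as the search key; on a match, copy the
 *             stored entry back and report where it was found in
 *             entry->index. On a miss, *entry is left untouched.
 */
static int demo_table_read(int index, struct demo_entry *entry)
{
	int i;

	assert(entry != NULL);

	if (index >= DEMO_TABLE_SIZE)
		return -ERANGE;

	if (index >= 0) {
		if (!demo_table[index].valid)
			return -ENOENT;
		*entry = demo_table[index];
		entry->index = index;
		return 0;
	}

	/* Software stand-in for the hardware search */
	for (i = 0; i < DEMO_TABLE_SIZE; i++) {
		if (demo_table[i].valid &&
		    demo_table[i].macaddr == entry->macaddr &&
		    demo_table[i].vlanid == entry->vlanid) {
			*entry = demo_table[i];
			entry->index = i;
			return 0;
		}
	}
	return -ENOENT;
}
```

A caller would pre-populate the key fields, set index to a negative value, call demo_table_read(DEMO_SEARCH, &key), and on success read the slot number from key.index.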
[PATCH net-next 01/11] net: dsa: sja1105: Shim declaration of struct sja1105_dyn_cmd
This structure is merely an implementation detail and should be hidden
from the sja1105_dynamic_config.h header, which provides the rest of the
driver with abstract access to the dynamic configuration interface of
the switch.

Signed-off-by: Vladimir Oltean
---
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 8
 drivers/net/dsa/sja1105/sja1105_dynamic_config.h | 8 +---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index e73ab28bf632..c981c12eb181 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -35,6 +35,14 @@
 #define SJA1105_MAX_DYN_CMD_SIZE \
 	SJA1105PQRS_SIZE_MAC_CONFIG_DYN_CMD
 
+struct sja1105_dyn_cmd {
+	u64 valid;
+	u64 rdwrset;
+	u64 errors;
+	u64 valident;
+	u64 index;
+};
+
 static void
 sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd,
 				  enum packing_op op)
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
index 77be59546a55..49c611eb02cb 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.h
@@ -7,13 +7,7 @@
 #include "sja1105.h"
 #include
 
-struct sja1105_dyn_cmd {
-	u64 valid;
-	u64 rdwrset;
-	u64 errors;
-	u64 valident;
-	u64 index;
-};
+struct sja1105_dyn_cmd;
 
 struct sja1105_dynamic_table_ops {
 	/* This returns size_t just to keep same prototype as the
--
2.17.1
[PATCH net-next 02/11] net: dsa: sja1105: Fix bit offsets of index field from L2 lookup entries
This was inadvertently copied from the SJA1105 E/T structure and not
tested. Cross-checking with the P/Q/R/S documentation (UM11040) makes it
immediately obvious what the correct bit offsets for this field are.

Signed-off-by: Vladimir Oltean
---
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index c981c12eb181..0023b03a010d 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -62,7 +62,7 @@ sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd,
 	 * such that our API doesn't need to ask for a full-blown entry
 	 * structure when e.g. a delete is requested.
 	 */
-	sja1105_packing(buf, &cmd->index, 29, 20,
+	sja1105_packing(buf, &cmd->index, 15, 6,
 			SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY, op);
 	/* TODO hostcmd */
 }
--
2.17.1
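To see why the wrong offsets matter, here is a small standalone sketch of placing a 10-bit field at bits 29:20 versus bits 15:6 of a word. It is not the driver's sja1105_packing(), which works on a packed byte buffer; demo_field_mask, demo_field_set and demo_field_get are hypothetical helpers that operate on a single 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of (startbit - endbit + 1) low-order ones, e.g. 15:6 -> 0x3ff */
static uint64_t demo_field_mask(unsigned int startbit, unsigned int endbit)
{
	unsigned int width;

	assert(startbit < 64 && endbit <= startbit);
	width = startbit - endbit + 1;
	return (width == 64) ? ~0ULL : ((1ULL << width) - 1);
}

/* Write value into bits startbit:endbit of word, leaving other bits alone */
static uint64_t demo_field_set(uint64_t word, uint64_t value,
			       unsigned int startbit, unsigned int endbit)
{
	uint64_t mask = demo_field_mask(startbit, endbit);

	return (word & ~(mask << endbit)) | ((value & mask) << endbit);
}

/* Read back bits startbit:endbit of word */
static uint64_t demo_field_get(uint64_t word,
			       unsigned int startbit, unsigned int endbit)
{
	return (word >> endbit) & demo_field_mask(startbit, endbit);
}
```

An index written at 29:20 and read back at 15:6 comes out as a different number, which is the kind of mismatch between driver and hardware that the one-line fix removes.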
[PATCH net-next 05/11] net: dsa: sja1105: Make room for P/Q/R/S FDB operations
The DSA callbacks were written with the E/T (first generation) in mind, which is quite different. For P/Q/R/S completely new implementations need to be provided, which are held as function pointers in the priv->info structure. We are taking a slightly roundabout way for this (a function from sja1105_main.c reads a structure defined in sja1105_spi.c that points to a function defined in sja1105_main.c), but it is what it is. The FDB dump callback works for both families, hence no function pointer for that. Signed-off-by: Vladimir Oltean --- drivers/net/dsa/sja1105/sja1105.h | 15 - .../net/dsa/sja1105/sja1105_dynamic_config.c | 2 +- drivers/net/dsa/sja1105/sja1105_main.c| 56 ++- drivers/net/dsa/sja1105/sja1105_spi.c | 12 4 files changed, 69 insertions(+), 16 deletions(-) diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h index b043bfc408f2..f55e95d1b731 100644 --- a/drivers/net/dsa/sja1105/sja1105.h +++ b/drivers/net/dsa/sja1105/sja1105.h @@ -55,6 +55,11 @@ struct sja1105_info { const struct sja1105_regs *regs; int (*reset_cmd)(const void *ctx, const void *data); int (*setup_rgmii_delay)(const void *ctx, int port); + /* Prototypes from include/net/dsa.h */ + int (*fdb_add_cmd)(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid); + int (*fdb_del_cmd)(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid); const char *name; }; @@ -142,7 +147,15 @@ int sja1105_dynamic_config_write(struct sja1105_private *priv, enum sja1105_blk_idx blk_idx, int index, void *entry, bool keep); -u8 sja1105_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid); +u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid); +int sja1105et_fdb_add(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid); +int sja1105et_fdb_del(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid); +int sja1105pqrs_fdb_add(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 
vid); +int sja1105pqrs_fdb_del(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid); /* Common implementations for the static and dynamic configs */ size_t sja1105_l2_forwarding_entry_packing(void *buf, void *entry_ptr, diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c index 7e7efc2e8ee4..3a8b0d0ab330 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c @@ -552,7 +552,7 @@ static u8 sja1105_crc8_add(u8 crc, u8 byte, u8 poly) * is also received as argument in the Koopman notation that the switch * hardware stores it in. */ -u8 sja1105_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid) +u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid) { struct sja1105_l2_lookup_params_entry *l2_lookup_params = priv->static_config.tables[BLK_IDX_L2_LOOKUP_PARAMS].entries; diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index cfdefd9f1905..c78d2def52f1 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -786,10 +786,10 @@ static inline int sja1105et_fdb_index(int bin, int way) return bin * SJA1105ET_FDB_BIN_SIZE + way; } -static int sja1105_is_fdb_entry_in_bin(struct sja1105_private *priv, int bin, - const u8 *addr, u16 vid, - struct sja1105_l2_lookup_entry *match, - int *last_unused) +static int sja1105et_is_fdb_entry_in_bin(struct sja1105_private *priv, int bin, +const u8 *addr, u16 vid, +struct sja1105_l2_lookup_entry *match, +int *last_unused) { int way; @@ -818,8 +818,8 @@ static int sja1105_is_fdb_entry_in_bin(struct sja1105_private *priv, int bin, return -1; } -static int sja1105_fdb_add(struct dsa_switch *ds, int port, - const unsigned char *addr, u16 vid) +int sja1105et_fdb_add(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid) { struct sja1105_l2_lookup_entry l2_lookup = {0}; struct 
sja1105_private *priv = ds->priv; @@ -827,10 +827,10 @@ static int sja1105_fdb_add(struct dsa_switch *ds, int port, int last_unused = -1; int bin, way; - bin = sja1105_fdb_hash(priv, addr, vid); + bin = sja1105et_fdb_hash(priv, addr, vid); - way = sja1105_is_fdb_entry_in_bin
[PATCH net-next 06/11] net: dsa: sja1105: Add P/Q/R/S support for dynamic L2 lookup operations
These are needed in order to implement the switchdev FDB callbacks. Compared to the E/T generation, not only is the ABI (bit offsets) different, but there is also a new HOSTCMD field, which permits an O(1) TCAM search for an FDB entry. Make use of the newly introduced OP_SEARCH to permit that. It will be used while adding and deleting an FDB entry (to see whether it exists or not). Signed-off-by: Vladimir Oltean --- .../net/dsa/sja1105/sja1105_dynamic_config.c | 54 +-- 1 file changed, 50 insertions(+), 4 deletions(-) diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c index 3a8b0d0ab330..7db1f8258287 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c @@ -44,17 +44,63 @@ struct sja1105_dyn_cmd { u64 index; }; +enum sja1105_hostcmd { + SJA1105_HOSTCMD_SEARCH = 1, + SJA1105_HOSTCMD_READ = 2, + SJA1105_HOSTCMD_WRITE = 3, + SJA1105_HOSTCMD_INVALIDATE = 4, +}; + static void sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd, enum packing_op op) { u8 *p = buf + SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY; const int size = SJA1105_SIZE_DYN_CMD; + u64 lockeds = 0; + u64 hostcmd; sja1105_packing(p, &cmd->valid,31, 31, size, op); sja1105_packing(p, &cmd->rdwrset, 30, 30, size, op); sja1105_packing(p, &cmd->errors, 29, 29, size, op); + sja1105_packing(p, &lockeds, 28, 28, size, op); sja1105_packing(p, &cmd->valident, 27, 27, size, op); + + /* VALIDENT is supposed to indicate "keep or not", but in SJA1105 E/T, +* using it to delete a management route was unsupported. UM10944 +* said about it: +* +* In case of a write access with the MGMTROUTE flag set, +* the flag will be ignored. It will always be found cleared +* for read accesses with the MGMTROUTE flag set. +* +* SJA1105 P/Q/R/S keeps the same behavior w.r.t.
VALIDENT, but there +* is now another flag called HOSTCMD which does more stuff (quoting +* from UM11040): +* +* A write request is accepted only when HOSTCMD is set to write host +* or invalid. A read request is accepted only when HOSTCMD is set to +* search host or read host. +* +* So it is possible to translate a RDWRSET/VALIDENT combination into +* HOSTCMD so that we keep the dynamic command API in place, and +* at the same time achieve compatibility with the management route +* command structure. +*/ + if (cmd->rdwrset == SPI_READ) { + if (cmd->search) + hostcmd = SJA1105_HOSTCMD_SEARCH; + else + hostcmd = SJA1105_HOSTCMD_READ; + } else { + /* SPI_WRITE */ + if (cmd->valident) + hostcmd = SJA1105_HOSTCMD_WRITE; + else + hostcmd = SJA1105_HOSTCMD_INVALIDATE; + } + sja1105_packing(p, &hostcmd, 25, 23, size, op); + /* Hack - The hardware takes the 'index' field within * struct sja1105_l2_lookup_entry as the index on which this command * will operate. However it will ignore everything else, so 'index' @@ -65,7 +111,6 @@ sja1105pqrs_l2_lookup_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd, */ sja1105_packing(buf, &cmd->index, 15, 6, SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY, op); - /* TODO hostcmd */ } static void @@ -319,9 +364,9 @@ struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = { [BLK_IDX_L2_LOOKUP] = { .entry_packing = sja1105pqrs_l2_lookup_entry_packing, .cmd_packing = sja1105pqrs_l2_lookup_cmd_packing, - .access = (OP_READ | OP_WRITE | OP_DEL), + .access = (OP_READ | OP_WRITE | OP_DEL | OP_SEARCH), .max_entry_count = SJA1105_MAX_L2_LOOKUP_COUNT, - .packed_size = SJA1105ET_SIZE_L2_LOOKUP_DYN_CMD, + .packed_size = SJA1105PQRS_SIZE_L2_LOOKUP_DYN_CMD, .addr = 0x24, }, [BLK_IDX_L2_POLICING] = {0}, @@ -403,7 +448,7 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv, ops = &priv->info->dyn_ops[blk_idx]; - if (index >= ops->max_entry_count) + if (index >= 0 && index >= ops->max_entry_count) return -ERANGE; if (index < 0 && 
!(ops->access & OP_SEARCH)) return -EOPNOTSUPP; @@ -426,6 +471,7 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv, cmd.index = index; cmd.search = false; } + cmd.valident = true; ops->cmd_packing(packed_buf, &cmd, PACK); if (cmd.sear
[PATCH net-next 07/11] net: dsa: sja1105: Make dynamic_config_read return -ENOENT if not found
Conceptually, if an entry is not found in the requested hardware table,
it is not an invalid request - so change the error returned
appropriately.

Signed-off-by: Vladimir Oltean
---
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 2 +-
 drivers/net/dsa/sja1105/sja1105_main.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 7db1f8258287..02a67df4437e 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -502,7 +502,7 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv,
 		 * So don't error out in that case.
 		 */
 		if (!cmd.valident && blk_idx != BLK_IDX_MGMT_ROUTE)
-			return -EINVAL;
+			return -ENOENT;
 		cpu_relax();
 	} while (cmd.valid && --retries);
 
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index c78d2def52f1..dc9803efdbbd 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -948,7 +948,7 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port,
 		rc = sja1105_dynamic_config_read(priv, BLK_IDX_L2_LOOKUP,
 						 i, &l2_lookup);
 		/* No fdb entry at i, not an issue */
-		if (rc == -EINVAL)
+		if (rc == -ENOENT)
 			continue;
 		if (rc) {
 			dev_err(dev, "Failed to dump FDB: %d\n", rc);
--
2.17.1
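The distinction matters to callers that walk the whole table: an empty slot should be skipped, while any other error should abort the walk. A minimal userspace sketch of that pattern follows; demo_read_slot and demo_dump are hypothetical stand-ins for the read and dump functions touched above, with an int array faking the hardware table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Pretend hardware read: valid[i] != 0 means slot i holds an entry */
static int demo_read_slot(const int *valid, int count, int i)
{
	if (i < 0 || i >= count)
		return -ERANGE;		/* the request itself is bad */
	if (!valid[i])
		return -ENOENT;		/* valid request, nothing stored there */
	return 0;
}

/* Walk every slot; empty ones are skipped, real errors are propagated */
static int demo_dump(const int *valid, int count, int *visited)
{
	int i, rc;

	assert(visited != NULL);

	for (i = 0; i < count; i++) {
		rc = demo_read_slot(valid, count, i);
		if (rc == -ENOENT)
			continue;	/* no entry at i, not an issue */
		if (rc)
			return rc;
		(*visited)++;
	}
	return 0;
}
```

With a dedicated "not found" code, the skip condition in the loop can no longer swallow a genuinely malformed request by accident.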
[PATCH net-next 08/11] net: dsa: sja1105: Add P/Q/R/S management route support via dynamic interface
Management routes are one-shot FDB rules installed on the CPU port for sending link-local traffic. They are a prerequisite for STP, PTP etc to work. Also make a note that removing a management route was not supported on the previous generation of switches. Signed-off-by: Vladimir Oltean --- .../net/dsa/sja1105/sja1105_dynamic_config.c | 40 ++- drivers/net/dsa/sja1105/sja1105_main.c| 2 + 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c index 02a67df4437e..352bb6e89297 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c @@ -161,6 +161,36 @@ static size_t sja1105et_mgmt_route_entry_packing(void *buf, void *entry_ptr, return size; } +static void +sja1105pqrs_mgmt_route_cmd_packing(void *buf, struct sja1105_dyn_cmd *cmd, + enum packing_op op) +{ + u8 *p = buf + SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY; + u64 mgmtroute = 1; + + sja1105pqrs_l2_lookup_cmd_packing(buf, cmd, op); + if (op == PACK) + sja1105_pack(p, &mgmtroute, 26, 26, SJA1105_SIZE_DYN_CMD); +} + +static size_t sja1105pqrs_mgmt_route_entry_packing(void *buf, void *entry_ptr, + enum packing_op op) +{ + const size_t size = SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY; + struct sja1105_mgmt_entry *entry = entry_ptr; + + /* In P/Q/R/S, enfport got renamed to mgmtvalid, but its purpose +* is the same (driver uses it to confirm that frame was sent). +* So just keep the name from E/T. +*/ + sja1105_packing(buf, &entry->tsreg, 71, 71, size, op); + sja1105_packing(buf, &entry->takets,70, 70, size, op); + sja1105_packing(buf, &entry->macaddr, 69, 22, size, op); + sja1105_packing(buf, &entry->destports, 21, 17, size, op); + sja1105_packing(buf, &entry->enfport, 16, 16, size, op); + return size; +} + /* In E/T, entry is at addresses 0x27-0x28. There is a 4 byte gap at 0x29, * and command is at 0x2a. 
Similarly in P/Q/R/S there is a 1 register gap * between entry (0x2d, 0x2e) and command (0x30). @@ -359,7 +389,7 @@ struct sja1105_dynamic_table_ops sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = { [BLK_IDX_XMII_PARAMS] = {0}, }; -/* SJA1105P/Q/R/S: Second generation: TODO */ +/* SJA1105P/Q/R/S: Second generation */ struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = { [BLK_IDX_L2_LOOKUP] = { .entry_packing = sja1105pqrs_l2_lookup_entry_packing, @@ -369,6 +399,14 @@ struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = { .packed_size = SJA1105PQRS_SIZE_L2_LOOKUP_DYN_CMD, .addr = 0x24, }, + [BLK_IDX_MGMT_ROUTE] = { + .entry_packing = sja1105pqrs_mgmt_route_entry_packing, + .cmd_packing = sja1105pqrs_mgmt_route_cmd_packing, + .access = (OP_READ | OP_WRITE | OP_DEL | OP_SEARCH), + .max_entry_count = SJA1105_NUM_PORTS, + .packed_size = SJA1105PQRS_SIZE_L2_LOOKUP_DYN_CMD, + .addr = 0x24, + }, [BLK_IDX_L2_POLICING] = {0}, [BLK_IDX_VLAN_LOOKUP] = { .entry_packing = sja1105_vlan_lookup_entry_packing, diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index dc9803efdbbd..f9bbc780f835 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -1475,6 +1475,8 @@ static int sja1105_mgmt_xmit(struct dsa_switch *ds, int port, int slot, if (!timeout) { /* Clean up the management route so that a follow-up * frame may not match on it by mistake. +* This is only hardware supported on P/Q/R/S - on E/T it is +* a no-op and we are silently discarding the -EOPNOTSUPP. */ sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, slot, &mgmt_route, false); -- 2.17.1
[PATCH net-next 09/11] net: dsa: sja1105: Add FDB operations for P/Q/R/S series
This adds support for manipulating the L2 forwarding database (dump, add, delete) for the second generation of NXP SJA1105 switches. At the moment only FDB entries installed statically through 'bridge fdb' are visible in the dump callback - the dynamically learned ones are still under investigation. Signed-off-by: Vladimir Oltean --- drivers/net/dsa/sja1105/sja1105.h | 5 ++ drivers/net/dsa/sja1105/sja1105_main.c | 89 +- 2 files changed, 92 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h index f55e95d1b731..61d00682de60 100644 --- a/drivers/net/dsa/sja1105/sja1105.h +++ b/drivers/net/dsa/sja1105/sja1105.h @@ -147,6 +147,11 @@ int sja1105_dynamic_config_write(struct sja1105_private *priv, enum sja1105_blk_idx blk_idx, int index, void *entry, bool keep); +enum sja1105_iotag { + SJA1105_C_TAG = 0, /* Inner VLAN header */ + SJA1105_S_TAG = 1, /* Outer VLAN header */ +}; + u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid); int sja1105et_fdb_add(struct dsa_switch *ds, int port, const unsigned char *addr, u16 vid); diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index f9bbc780f835..46e2cc7b9ddc 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -210,6 +210,8 @@ static int sja1105_init_l2_lookup_params(struct sja1105_private *priv) .maxage = SJA1105_AGEING_TIME_MS(30), /* All entries within a FDB bin are available for learning */ .dyn_tbsz = SJA1105ET_FDB_BIN_SIZE, + /* And the P/Q/R/S equivalent setting: */ + .start_dynspc = 0, /* 2^8 + 2^5 + 2^3 + 2^2 + 2^1 + 1 in Koopman notation */ .poly = 0x97, /* This selects between Independent VLAN Learning (IVL) and @@ -225,6 +227,13 @@ static int sja1105_init_l2_lookup_params(struct sja1105_private *priv) * Maybe correlate with no_linklocal_learn from bridge driver? 
*/ .no_mgmt_learn = true, + /* P/Q/R/S only */ + .use_static = true, + /* Dynamically learned FDB entries can overwrite other (older) +* dynamic FDB entries +*/ + .owr_dyn = true, + .drpnolearn = true, }; table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP_PARAMS]; @@ -908,13 +917,89 @@ int sja1105et_fdb_del(struct dsa_switch *ds, int port, int sja1105pqrs_fdb_add(struct dsa_switch *ds, int port, const unsigned char *addr, u16 vid) { - return -EOPNOTSUPP; + struct sja1105_l2_lookup_entry l2_lookup = {0}; + struct sja1105_private *priv = ds->priv; + int rc, i; + + /* Search for an existing entry in the FDB table */ + l2_lookup.macaddr = ether_addr_to_u64(addr); + l2_lookup.vlanid = vid; + l2_lookup.iotag = SJA1105_S_TAG; + l2_lookup.mask_macaddr = GENMASK_ULL(ETH_ALEN * 8 - 1, 0); + l2_lookup.mask_vlanid = VLAN_VID_MASK; + l2_lookup.mask_iotag = BIT(0); + l2_lookup.destports = BIT(port); + + rc = sja1105_dynamic_config_read(priv, BLK_IDX_L2_LOOKUP, +SJA1105_SEARCH, &l2_lookup); + if (rc == 0) { + /* Found and this port is already in the entry's +* port mask => job done +*/ + if (l2_lookup.destports & BIT(port)) + return 0; + /* l2_lookup.index is populated by the switch in case it +* found something. +*/ + l2_lookup.destports |= BIT(port); + goto skip_finding_an_index; + } + + /* Not found, so try to find an unused spot in the FDB. +* This is slightly inefficient because the strategy is knock-knock at +* every possible position from 0 to 1023. 
+*/ + for (i = 0; i < SJA1105_MAX_L2_LOOKUP_COUNT; i++) { + rc = sja1105_dynamic_config_read(priv, BLK_IDX_L2_LOOKUP, +i, NULL); + if (rc < 0) + break; + } + if (i == SJA1105_MAX_L2_LOOKUP_COUNT) { + dev_err(ds->dev, "FDB is full, cannot add entry.\n"); + return -EINVAL; + } + l2_lookup.index = i; + +skip_finding_an_index: + return sja1105_dynamic_config_write(priv, BLK_IDX_L2_LOOKUP, + l2_lookup.index, &l2_lookup, + true); } int sja1105pqrs_fdb_del(struct dsa_switch *ds, int port, const unsigned char *addr, u16 vid) { - return -EOPNOTSUPP;
[PATCH net-next 10/11] net: dsa: sja1105: Unset port from forwarding mask unconditionally on fdb_del
This is a cosmetic patch that simplifies the code by removing a
redundant check. Clearing a bit that is already clear leaves the mask
unchanged, so the AND with ~BIT(port) can be done unconditionally.

Signed-off-by: Vladimir Oltean
---
 drivers/net/dsa/sja1105/sja1105_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 46e2cc7b9ddc..8343dcf48384 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -903,8 +903,8 @@ int sja1105et_fdb_del(struct dsa_switch *ds, int port,
 	 * need to completely evict the FDB entry.
 	 * Otherwise we just write it back.
 	 */
-	if (l2_lookup.destports & BIT(port))
-		l2_lookup.destports &= ~BIT(port);
+	l2_lookup.destports &= ~BIT(port);
+
 	if (l2_lookup.destports)
 		keep = true;
 	else
--
2.17.1
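The equivalence of the guarded and the unconditional form can be checked exhaustively for a 5-port mask. This is a standalone sketch: DEMO_BIT, clear_checked, clear_unconditional and demo_forms_agree are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BIT(n)	(1ULL << (n))
#define DEMO_NUM_PORTS	5

/* Old form: only clear the bit if it is currently set */
static uint64_t clear_checked(uint64_t destports, int port)
{
	assert(port >= 0 && port < 64);
	if (destports & DEMO_BIT(port))
		destports &= ~DEMO_BIT(port);
	return destports;
}

/* New form: clear the bit regardless of its current state */
static uint64_t clear_unconditional(uint64_t destports, int port)
{
	assert(port >= 0 && port < 64);
	return destports & ~DEMO_BIT(port);
}

/* Returns 1 if both forms agree for every 5-port mask and every port */
static int demo_forms_agree(void)
{
	uint64_t mask;
	int port;

	for (mask = 0; mask < DEMO_BIT(DEMO_NUM_PORTS); mask++)
		for (port = 0; port < DEMO_NUM_PORTS; port++)
			if (clear_checked(mask, port) !=
			    clear_unconditional(mask, port))
				return 0;
	return 1;
}
```

Since both forms always produce the same mask, the follow-up test on whether any destination ports remain behaves the same either way.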
[PATCH net-next 11/11] net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command
TX VLANs and RX VLANs are an internal implementation detail of DSA for frame tagging. They work by installing special VLANs on switch ports in the operating modes where no behavior change w.r.t. VLANs can be observed by the user. Therefore it makes sense to hide these VLANs in the 'bridge fdb' command, as well as translate the pvid into the RX VID and TX VID on 'bridge fdb add' and 'bridge fdb del' commands. Signed-off-by: Vladimir Oltean --- drivers/net/dsa/sja1105/sja1105_main.c | 37 ++ 1 file changed, 37 insertions(+) diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 8343dcf48384..b151a8fafb9e 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -1006,7 +1006,21 @@ static int sja1105_fdb_add(struct dsa_switch *ds, int port, const unsigned char *addr, u16 vid) { struct sja1105_private *priv = ds->priv; + int rc; + + /* Since we make use of VLANs even when the bridge core doesn't tell us +* to, translate these FDB entries into the correct dsa_8021q ones. +*/ + if (!dsa_port_is_vlan_filtering(&ds->ports[port])) { + unsigned int upstream = dsa_upstream_port(priv->ds, port); + u16 tx_vid = dsa_8021q_tx_vid(ds, port); + u16 rx_vid = dsa_8021q_rx_vid(ds, port); + rc = priv->info->fdb_add_cmd(ds, port, addr, tx_vid); + if (rc < 0) + return rc; + return priv->info->fdb_add_cmd(ds, upstream, addr, rx_vid); + } return priv->info->fdb_add_cmd(ds, port, addr, vid); } @@ -1014,7 +1028,21 @@ static int sja1105_fdb_del(struct dsa_switch *ds, int port, const unsigned char *addr, u16 vid) { struct sja1105_private *priv = ds->priv; + int rc; + /* Since we make use of VLANs even when the bridge core doesn't tell us +* to, translate these FDB entries into the correct dsa_8021q ones. 
+*/ + if (!dsa_port_is_vlan_filtering(&ds->ports[port])) { + unsigned int upstream = dsa_upstream_port(priv->ds, port); + u16 tx_vid = dsa_8021q_tx_vid(ds, port); + u16 rx_vid = dsa_8021q_rx_vid(ds, port); + + rc = priv->info->fdb_del_cmd(ds, port, addr, tx_vid); + if (rc < 0) + return rc; + return priv->info->fdb_del_cmd(ds, upstream, addr, rx_vid); + } return priv->info->fdb_del_cmd(ds, port, addr, vid); } @@ -1049,6 +1077,15 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port, if (!(l2_lookup.destports & BIT(port))) continue; u64_to_ether_addr(l2_lookup.macaddr, macaddr); + + /* We need to hide the dsa_8021q VLAN from the user. +* Convert the TX VID into the pvid that is active in +* standalone and non-vlan_filtering modes, aka 1. +* The RX VID is applied on the CPU port, which is not seen by +* the bridge core anyway, so there's nothing to hide. +*/ + if (!dsa_port_is_vlan_filtering(&ds->ports[port])) + l2_lookup.vlanid = 1; cb(macaddr, l2_lookup.vlanid, false, data); } return 0; -- 2.17.1
Re: [PATCH RFC iproute2-next v3] tc: add support for action act_ctinfo
> On 2 Jun 2019, at 21:39, Toke Høiland-Jørgensen wrote: > > Kevin Darbyshire-Bryant writes: > >> ctinfo is an action restoring data stored in conntrack marks to various >> fields. At present it has two independent modes of operation, >> restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack >> marks into packet skb marks. >> >> It understands a number of parameters specific to this action in >> additional to the usual action syntax. Each operating mode is >> independent of the other so all options are optional, however not >> specifying at least one mode is a bit pointless. >> >> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] >>[CONTROL] [index ] > > Yay, bikeshedding time! :) I see your bikeshed and raise you… a bus shelter :-) > > As I said in reply to the kernel patch, the "X/Y" syntax usually means > "/", where here they are just two > semi-related mask values. So I think it would be better to just make > 'statemask' its own parameter. Instead of creating another keyword how about we drop the ‘/‘ and make it a space separated optional parameter to ‘dscp’? eg. Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] blah blah > Other than that, just a few nits, below... > >> DSCP mode >> >> dscp enables copying of a DSCP store in the conntrack mark into the >> ipv4/v6 diffserv field. The mask is a 32bit field and specifies where >> in the conntrack mark the DSCP value is stored. It must be 6 contiguous >> bits long, e.g. 0xfc00 would restore the DSCP from the upper 6 bits >> of the conntrack mark. >> >> The DSCP copying may be optionally controlled by a statemask. The >> statemask is a 32bit field, usually with a single bit set and must not >> overlap the dscp mask. The DSCP restore operation will only take place >> if the corresponding bit/s in conntrack mark yield a non zero result. >> >> eg. dscp 0xfc00/0x0100 would retrieve the DSCP from the top 6 >> bits, whilst using bit 25 as a flag to do so. 
Bit 26 is unused in this >> example. >> >> CPMARK mode >> >> cpmark enables copying of the conntrack mark to the packet skb mark. In >> this mode it is completely equivalent to the existing act_connmark. >> Additional functionality is provided by the optional mask parameter, >> whereby the stored conntrack mark is logically anded with the cpmark >> mask before being stored into skb mark. This allows shared usage of the >> conntrack mark between applications. >> >> eg. cpmark 0x00ff would restore only the lower 24 bits of the >> conntrack mark, thus may be useful in the event that the upper 8 bits >> are used by the DSCP function. >> >> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] >>[CONTROL] [index ] >> where : >> dscp MASK is the bitmask to restore DSCP >> STATEMASK is the bitmask to determine conditional restoring >> cpmark MASK mask applied to restored packet mark >> ZONE is the conntrack zone >> CONTROL := reclassify | pipe | drop | continue | ok | >> goto chain >> >> Signed-off-by: Kevin Darbyshire-Bryant >> >> --- >> v2 - fix whitespace issue in pkt_cls >> fix most warnings from checkpatch - some lines still over 80 chars >> due to long TLV names. >> v3 - fix some dangling else warnings. >> refactor stats printing to please checkpatch. >> send zone TLV even if default '0' zone. >> now checkpatch clean even though I think some of the formatting >> is horrible :-) >> sending via google's smtp 'cos MS' exchange office365 appears >> to mangle patches from git send-email. 
> > Ah, so it wasn't just me having problems ;) No, though I’m still not clear what’s going on or when Microsoft improved(tm) it :-/ > >> include/uapi/linux/pkt_cls.h | 1 + >> include/uapi/linux/tc_act/tc_ctinfo.h | 34 >> tc/Makefile | 1 + >> tc/m_ctinfo.c | 262 ++ >> 4 files changed, 298 insertions(+) >> create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h >> create mode 100644 tc/m_ctinfo.c >> >> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h >> index 51a0496f..a93680fc 100644 >> --- a/include/uapi/linux/pkt_cls.h >> +++ b/include/uapi/linux/pkt_cls.h >> @@ -105,6 +105,7 @@ enum tca_id { >> TCA_ID_IFE = TCA_ACT_IFE, >> TCA_ID_SAMPLE = TCA_ACT_SAMPLE, >> /* other actions go here */ >> +TCA_ID_CTINFO, >> __TCA_ID_MAX = 255 >> }; >> >> diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h >> b/include/uapi/linux/tc_act/tc_ctinfo.h >> new file mode 100644 >> index ..da803e05 >> --- /dev/null >> +++ b/include/uapi/linux/tc_act/tc_ctinfo.h >> @@ -0,0 +1,34 @@ >> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ >> +#ifndef __UAPI_TC_CTINFO_H >> +#define __UAPI_TC_CTINFO_H >> + >> +#include >> +#include >> + >> +struct tc_ctinfo { >> +tc_gen; >
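The DSCP mask and statemask rules quoted above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the act_ctinfo implementation: the helper names are hypothetical, and the test values use the full 32-bit forms 0xfc000000 and 0x01000000 that "top 6 bits" implies for a 32-bit conntrack mark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if mask is exactly 6 contiguous set bits. */
static bool dscp_mask_valid(uint32_t mask)
{
	if (mask == 0)
		return false;
	/* Shift the lowest set bit down to bit 0, then compare with 0x3f. */
	while (!(mask & 1))
		mask >>= 1;
	return mask == 0x3f;
}

/* Hypothetical helper: position of the lowest set bit of a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/*
 * Sketch of the restore decision: returns true and writes the 6-bit DSCP
 * to *dscp when the masks are sane and the statemask (if any) matches
 * the conntrack mark.
 */
static bool ctinfo_dscp_from_mark(uint32_t ctmark, uint32_t dscpmask,
				  uint32_t statemask, uint8_t *dscp)
{
	if (!dscp_mask_valid(dscpmask) || (dscpmask & statemask))
		return false;	/* bad mask, or the two masks overlap */
	if (statemask && !(ctmark & statemask))
		return false;	/* conditional restore, flag bit not set */
	*dscp = (ctmark & dscpmask) >> mask_shift(dscpmask);
	return true;
}
```

With a conntrack mark of 0xb9000000 the flag bit is set and the extracted DSCP is 46; with 0xb8000000 the flag is clear and nothing is restored.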
[PATCH v2 net 1/1] net: dsa: sja1105: Fix link speed not working at 100 Mbps and below
The hardware values for link speed are held in the sja1105_speed_t enum.
However they do not increase in the order that sja1105_get_speed_cfg was
iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
SJA1105_SPEED_1000MBPS - 1 - skipping the other two).

Another bug is that the code in sja1105_adjust_port_config relies on the
fact that an invalid link speed is detected by sja1105_get_speed_cfg and
returned as -EINVAL. However storing this into an enum that only has
positive members will cast it into an unsigned value, and it will miss
the negative check.

So take the simplest approach and remove the sja1105_get_speed_cfg
function and replace it with a simple switch-case statement.

Fixes: 8aa9ebccae87 ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Signed-off-by: Vladimir Oltean
Suggested-by: Andrew Lunn
---
 drivers/net/dsa/sja1105/sja1105_main.c | 32 +-
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 5412c3551bcc..25bb64ce0432 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -710,16 +710,6 @@ static int sja1105_speed[] = {
 	[SJA1105_SPEED_1000MBPS] = 1000,
 };
 
-static sja1105_speed_t sja1105_get_speed_cfg(unsigned int speed_mbps)
-{
-	int i;
-
-	for (i = SJA1105_SPEED_AUTO; i <= SJA1105_SPEED_1000MBPS; i++)
-		if (sja1105_speed[i] == speed_mbps)
-			return i;
-	return -EINVAL;
-}
-
 /* Set link speed and enable/disable traffic I/O in the MAC configuration
  * for a specific port.
  *
@@ -742,8 +732,21 @@ static int sja1105_adjust_port_config(struct sja1105_private *priv, int port,
 	mii = priv->static_config.tables[BLK_IDX_XMII_PARAMS].entries;
 	mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
 
-	speed = sja1105_get_speed_cfg(speed_mbps);
-	if (speed_mbps && speed < 0) {
+	switch (speed_mbps) {
+	case 0:
+		/* No speed update requested */
+		speed = SJA1105_SPEED_AUTO;
+		break;
+	case 10:
+		speed = SJA1105_SPEED_10MBPS;
+		break;
+	case 100:
+		speed = SJA1105_SPEED_100MBPS;
+		break;
+	case 1000:
+		speed = SJA1105_SPEED_1000MBPS;
+		break;
+	default:
 		dev_err(dev, "Invalid speed %iMbps\n", speed_mbps);
 		return -EINVAL;
 	}
@@ -753,10 +756,7 @@ static int sja1105_adjust_port_config(struct sja1105_private *priv, int port,
 	 * and we no longer need to store it in the static config (already told
 	 * hardware we want auto during upload phase).
 	 */
-	if (speed_mbps)
-		mac[port].speed = speed;
-	else
-		mac[port].speed = SJA1105_SPEED_AUTO;
+	mac[port].speed = speed;
 
 	/* On P/Q/R/S, one can read from the device via the MAC reconfiguration
 	 * tables. On E/T, MAC reconfig tables are not readable, only writable.
-- 
2.17.1
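The second bug described in the commit message (an error code stored in a type that cannot be negative) can be reproduced outside the driver. The sketch below is an illustration only: whether a C enum with no negative enumerators is unsigned is up to the compiler, so the storage is modelled here with an explicit unsigned int to keep the behaviour deterministic. All names and the placeholder value 1 are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for an enum that the compiler happens to store as unsigned. */
typedef unsigned int speed_storage_t;

/* Buggy pattern: the error code is squeezed through an unsigned type. */
static speed_storage_t lookup_speed_buggy(unsigned int speed_mbps)
{
	switch (speed_mbps) {
	case 10:
	case 100:
	case 1000:
		return 1;	/* placeholder "valid" code */
	default:
		return -EINVAL;	/* wraps around to a large positive value */
	}
}

static bool buggy_check_detects_error(unsigned int speed_mbps)
{
	speed_storage_t speed = lookup_speed_buggy(speed_mbps);

	/* Never true for an unsigned type, so the error goes unnoticed. */
	return speed < 0;
}

/* One possible fix: the status travels as int, the value via a pointer. */
static int lookup_speed_fixed(unsigned int speed_mbps, speed_storage_t *out)
{
	switch (speed_mbps) {
	case 10:
	case 100:
	case 1000:
		*out = 1;	/* placeholder "valid" code */
		return 0;
	default:
		return -EINVAL;
	}
}
```

The patch itself sidesteps the problem differently, by removing the lookup function and returning -EINVAL directly from the switch statement's default branch.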
[PATCH v2 net 0/1] Fix link speed handling for SJA1105 DSA driver
This patchset avoids two bugs in the logic handling of the enum
sja1105_speed_t which caused link speeds of 10 and 100 Mbps to not be
interpreted correctly and thus not be applied to the switch MACs.

v1 patchset can be found at:
https://www.spinics.net/lists/netdev/msg574477.html

Changes from v1: Applied Andrew Lunn's suggestion of removing the
sja1105_get_speed_cfg function altogether instead of trying to fix it.

Vladimir Oltean (1):
  net: dsa: sja1105: Fix link speed not working at 100 Mbps and below

 drivers/net/dsa/sja1105/sja1105_main.c | 32 +-
 1 file changed, 16 insertions(+), 16 deletions(-)

-- 
2.17.1
Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:
> On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev wrote:
> > On 05/31, Andrii Nakryiko wrote:
> > > This patch adds support for a new way to define BPF maps. It relies on
> > > BTF to describe mandatory and optional attributes of a map, as well as
> > > captures type information of key and value naturally. This eliminates
> > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > always in sync with the key/value type.
> > My 2c: this is too magical and relies on me knowing the expected fields.
> > (also, the compiler won't be able to help with the misspellings).

I have mixed feelings, too. Especially the key and value fields are very
non-idiomatic for C :( They never hold any value or data, while the
other fields do. That feels so awkward. I'm no compiler expert, but even
something like:

struct map_def {
	void *key_type_ref;
} mamap = {
	.key_type_ref = &(struct key_xyz){},
};

Would feel like less of a hack to me, and then map_def doesn't have to
be different for every map. But yea, IDK if it's easy to (a) resolve the
type of what key_type_ref points to, or (b) how to do this for scalar
types.

> I don't think it's really worse than current bpf_map_def approach. In
> typical scenario, there are only two fields you need to remember: type
> and max_entries (notice, they are called exactly the same as in
> bpf_map_def, so this knowledge is transferable). Then you'll have
> key/value, using which you are describing both type (using field's
> type) and size (calculated from the type).
>
> I can relate a bit to that with bpf_map_def you can find definition
> and see all possible fields, but one can also find a lot of examples
> for new map definitions as well.
>
> One big advantage of this scheme, though, is that you get that type
> association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> with no chance of having a mismatch, etc. This is less duplication (no
> need to do sizeof(struct my_struct) and struct my_struct as an arg to
> that macro) and there is no need to go and ping people to add those
> annotations to improve introspection of BPF maps.
>
> > > Relying on BTF, this approach allows for both forward and backward
> > > compatibility w.r.t. extending supported map definition features. Old
> > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > implementations will parse and recognize new optional attributes.
> > I also don't know how to feel about old libbpf ignoring some attributes.
> > In the kernel we require that the unknown fields are zeroed.
> > We probably need to do something like that here? What do you think
> > would be a good example of an optional attribute?
>
> Ignoring is required for forward-compatibility, where old libbpf will
> be used to load newer user BPF programs. We can decide not to do it,
> in that case it's just a question of erroring out on first unknown
> field. This RFC was posted exactly to discuss all these issues with
> more general community, as there is no single true way to do this.
>
> As for examples of when it can be used. It's any feature that can be
> considered optional or a hint, so if old libbpf doesn't do that, it's
> still not the end of the world (and we can live with that, or can
> correct using direct libbpf API calls).

On forward compatibility my 0.02c would be - if we want to go there and
silently ignore fields it'd be good to have some form of "hard required"
bit. For TLV ABIs it can be a "you have to understand this one" bit,
for libbpf perhaps we could add a "min libbpf version required" section?
That kind of ties our ELF formats to libbpf specifics (the libbpf version
presumably would imply support for features), but I think we want to go
there, anyway.
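The "you have to understand this one" bit for TLV-style ABIs can be illustrated with a small parser. Everything in this sketch is hypothetical (the attribute ids, the bit position, the struct names); it is not libbpf code. It only shows the policy under discussion: unknown optional attributes are skipped, unknown required ones make parsing fail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_F_REQUIRED	0x8000u	/* hypothetical "must understand" bit */
#define ATTR_TYPE_MASK	0x7fffu

enum { ATTR_MAP_TYPE = 1, ATTR_MAX_ENTRIES = 2 };	/* known to this parser */

struct attr_hdr {
	uint16_t type;	/* attribute id, optionally ORed with ATTR_F_REQUIRED */
	uint16_t len;	/* payload length in bytes */
};

struct map_cfg {
	uint32_t map_type;
	uint32_t max_entries;
};

/* Copy a 32-bit payload, rejecting any other payload length. */
static int read_u32(const uint8_t *p, uint16_t len, uint32_t *out)
{
	if (len != sizeof(*out))
		return -1;
	memcpy(out, p, sizeof(*out));
	return 0;
}

/* Returns 0 on success, -1 on malformed input or unknown required attr. */
static int parse_attrs(const uint8_t *buf, size_t size, struct map_cfg *cfg)
{
	size_t off = 0;

	while (off + sizeof(struct attr_hdr) <= size) {
		struct attr_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);
		if (hdr.len > size - off)
			return -1;	/* truncated payload */

		switch (hdr.type & ATTR_TYPE_MASK) {
		case ATTR_MAP_TYPE:
			if (read_u32(buf + off, hdr.len, &cfg->map_type))
				return -1;
			break;
		case ATTR_MAX_ENTRIES:
			if (read_u32(buf + off, hdr.len, &cfg->max_entries))
				return -1;
			break;
		default:
			if (hdr.type & ATTR_F_REQUIRED)
				return -1;	/* cannot be safely ignored */
			break;		/* optional hint: skip it */
		}
		off += hdr.len;
	}
	return off == size ? 0 : -1;
}

/* Test helper: append one attribute with a 32-bit payload. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct attr_hdr hdr = { .type = type, .len = sizeof(val) };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + sizeof(hdr) + sizeof(val);
}
```

A "min libbpf version required" section, the other option raised above, would move the same decision from per-attribute granularity to a single up-front check.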
Re: [PATCH v2 net 1/1] net: dsa: sja1105: Fix link speed not working at 100 Mbps and below
On Mon, Jun 03, 2019 at 02:31:37AM +0300, Vladimir Oltean wrote:
> The hardware values for link speed are held in the sja1105_speed_t enum.
> However they do not increase in the order that sja1105_get_speed_cfg was
> iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
> SJA1105_SPEED_1000MBPS - 1 - skipping the other two).
>
> Another bug is that the code in sja1105_adjust_port_config relies on the
> fact that an invalid link speed is detected by sja1105_get_speed_cfg and
> returned as -EINVAL. However storing this into an enum that only has
> positive members will cast it into an unsigned value, and it will miss
> the negative check.
>
> So take the simplest approach and remove the sja1105_get_speed_cfg
> function and replace it with a simple switch-case statement.
>
> Fixes: 8aa9ebccae87 ("net: dsa: Introduce driver for NXP SJA1105 5-port L2
> switch")
> Signed-off-by: Vladimir Oltean
> Suggested-by: Andrew Lunn
> ---
>  drivers/net/dsa/sja1105/sja1105_main.c | 32 +-
>  1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/dsa/sja1105/sja1105_main.c
> b/drivers/net/dsa/sja1105/sja1105_main.c
> index 5412c3551bcc..25bb64ce0432 100644
> --- a/drivers/net/dsa/sja1105/sja1105_main.c
> +++ b/drivers/net/dsa/sja1105/sja1105_main.c
> @@ -710,16 +710,6 @@ static int sja1105_speed[] = {
>  	[SJA1105_SPEED_1000MBPS] = 1000,
>  };
>
> -static sja1105_speed_t sja1105_get_speed_cfg(unsigned int speed_mbps)
> -{
> -	int i;
> -
> -	for (i = SJA1105_SPEED_AUTO; i <= SJA1105_SPEED_1000MBPS; i++)
> -		if (sja1105_speed[i] == speed_mbps)
> -			return i;
> -	return -EINVAL;
> -}
> -
>  /* Set link speed and enable/disable traffic I/O in the MAC configuration
>   * for a specific port.
>   *
> @@ -742,8 +732,21 @@ static int sja1105_adjust_port_config(struct
> sja1105_private *priv, int port,
>  	mii = priv->static_config.tables[BLK_IDX_XMII_PARAMS].entries;
>  	mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
>
> -	speed = sja1105_get_speed_cfg(speed_mbps);
> -	if (speed_mbps && speed < 0) {
> +	switch (speed_mbps) {
> +	case 0:
> +		/* No speed update requested */
> +		speed = SJA1105_SPEED_AUTO;
> +		break;
> +	case 10:
> +		speed = SJA1105_SPEED_10MBPS;
> +		break;
> +	case 100:
> +		speed = SJA1105_SPEED_100MBPS;
> +		break;
> +	case 1000:
> +		speed = SJA1105_SPEED_1000MBPS;
> +		break;
> +	default:
>  		dev_err(dev, "Invalid speed %iMbps\n", speed_mbps);
>  		return -EINVAL;
>  	}

Thanks for the re-write. This looks more obviously correct.

One minor nit-pick. We have SPEED_10, SPEED_100, SPEED_1000, etc. It
would be good to use them.

With that change

Reviewed-by: Andrew Lunn

    Andrew
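Andrew's nit-pick could look roughly like the sketch below. It is a standalone illustration, not the follow-up patch: the SPEED_* macros are defined locally with the values used in include/uapi/linux/ethtool.h, the helper name speed_mbps_to_cfg is hypothetical, and only the enum values for AUTO (0) and 1000MBPS (1) come from the thread; the values for 100 and 10 Mbps are assumptions.

```c
#include <errno.h>

/* Local copies so the sketch builds without kernel headers. */
#ifndef SPEED_10
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#endif

typedef enum {
	SJA1105_SPEED_AUTO = 0,		/* value stated in the thread */
	SJA1105_SPEED_1000MBPS = 1,	/* value stated in the thread */
	SJA1105_SPEED_100MBPS = 2,	/* assumed for this sketch */
	SJA1105_SPEED_10MBPS = 3,	/* assumed for this sketch */
} sja1105_speed_t;

/* Hypothetical helper: map a speed in Mbps to the hardware encoding. */
static int speed_mbps_to_cfg(int speed_mbps, sja1105_speed_t *speed)
{
	switch (speed_mbps) {
	case 0:
		/* No speed update requested */
		*speed = SJA1105_SPEED_AUTO;
		return 0;
	case SPEED_10:
		*speed = SJA1105_SPEED_10MBPS;
		return 0;
	case SPEED_100:
		*speed = SJA1105_SPEED_100MBPS;
		return 0;
	case SPEED_1000:
		*speed = SJA1105_SPEED_1000MBPS;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Using the named constants keeps the case labels greppable alongside the rest of the PHY and ethtool code.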
Re: [PATCH net] packet: unconditionally free po->rollover
From: Willem de Bruijn
Date: Fri, 31 May 2019 12:37:23 -0400

> From: Willem de Bruijn
>
> Rollover used to use a complex RCU mechanism for assignment, which had
> a race condition. The below patch fixed the bug and greatly simplified
> the logic.
>
> The feature depends on fanout, but the state is private to the socket.
> Fanout_release returns f only when the last member leaves and the
> fanout struct is to be freed.
>
> Destroy rollover unconditionally, regardless of fanout state.
>
> Fixes: 57f015f5eccf2 ("packet: fix crash in fanout_demux_rollover()")
> Reported-by: syzbot
> Diagnosed-by: Dmitry Vyukov
> Signed-off-by: Willem de Bruijn

Applied and queued up for -stable.
Re: [PATCH net-next v3] net: add rcu annotations for ifa_list
From: Florian Westphal
Date: Fri, 31 May 2019 18:27:02 +0200

> v3: fix typo in patch1 commit message
> All other patches are unchanged.
> v2: remove ifa_list iteration in afs instead of conversion
>
> Eric Dumazet reported following problem:
>
> It looks that unless RTNL is held, accessing ifa_list needs proper RCU
> protection. indev->ifa_list can be changed under us by another cpu
> (which owns RTNL) [..]
>
> A proper rcu_dereference() with an happy sparse support would require
> adding __rcu attribute.
>
> This patch series does that: add __rcu to the ifa_list pointers.
> That makes sparse complain, so the series also adds the required
> rcu_assign_pointer/dereference helpers where needed.
>
> All patches except the last one are preparation work.
> Two new macros are introduced for in_ifaddr walks.
>
> Last patch adds the __rcu annotations and the assign_pointer/dereference
> helper calls.
>
> This patch is a bit large, but I found no better way -- other
> approaches (annotate-first or add helpers-first) all result in
> mid-series sparse warnings.
>
> This series is submitted vs. net-next rather than net for several
> reasons:
>
> 1. It's (mostly) compile-tested only
> 2. 3rd patch changes behaviour wrt. secondary addresses
>    (see changelog)
> 3. The problem exists for a very long time (2004), so it doesn't
>    seem to be urgent to fix this -- rcu use to free ifa_list
>    predates the git era.

Series applied, thanks Florian.
Re: [PATCH net-next] net: ethernet: improve eth_platform_get_mac_address
From: Heiner Kallweit
Date: Fri, 31 May 2019 19:14:44 +0200

> pci_device_to_OF_node(to_pci_dev(dev)) is the same as dev->of_node,
> so we can simplify the code. In addition add an empty line before
> the return statement.
>
> Signed-off-by: Heiner Kallweit

Applied.
Re: [PATCH net-next] r8169: improve r8169_csum_workaround
From: Heiner Kallweit
Date: Fri, 31 May 2019 19:17:15 +0200

> Use helper skb_is_gso() and simplify access to tx_dropped.
>
> Signed-off-by: Heiner Kallweit

Applied.
Re: [PATCH net-next] nexthop: Add entry to MAINTAINERS
From: David Ahern
Date: Fri, 31 May 2019 12:44:09 -0600

> From: David Ahern
>
> Add entry to MAINTAINERS file for new nexthop code.
>
> Signed-off-by: David Ahern

Applied.
Re: [PATCH net-next 0/3] r8169: replace several function pointers with direct calls
From: Heiner Kallweit
Date: Fri, 31 May 2019 19:52:24 +0200

> This series removes most function pointers from struct rtl8169_private
> and uses direct calls instead. This simplifies the code and avoids
> the penalty of indirect calls in times of retpoline.

Series applied, thanks.
Re: [PATCH 1/1] net: rds: add per rds connection cache statistics
On 6/1/19 12:54 AM, Zhu Yanjun wrote:
> The variable cache_allocs is to indicate how many frags (KiB) are in one
> rds connection frag cache.
> The command "rds-info -Iv" will output the rds connection cache
> statistics as below:
> "
> RDS IB Connections:
>   LocalAddr  RemoteAddr  Tos  SL   LocalDev             RemoteDev
>   1.1.1.14   1.1.1.14    58   255  fe80::2:c903:a:7a31  fe80::2:c903:a:7a31
>   send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
>   rdma_mr_size=257, cache_allocs=12
> "
> This means that there are about 12KiB frag in this rds connection frag
> cache.
>
> Tested-by: RDS CI

Please add some valid email id or drop above. It's expected that with
SOB, patches are tested before posting.

> Signed-off-by: Zhu Yanjun
> ---
>  include/uapi/linux/rds.h | 2 ++
>  net/rds/ib.c             | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
> index 5d0f76c..fd6b5f6 100644
> --- a/include/uapi/linux/rds.h
> +++ b/include/uapi/linux/rds.h
> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>  	__u32		rdma_mr_max;
>  	__u32		rdma_mr_size;
>  	__u8		tos;
> +	__u32		cache_allocs;

For header file changes like this, how is backward compatibility with
tooling taken care of? This was one of the reasons not all the fields
were updated.

Regards,
Santosh
Re: KASAN: user-memory-access Read in ip6_hold_safe (3)
On 6/1/19 12:05 AM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: dfb569f2 net: ll_temac: Fix compile error just an FYI: this is before any of my IPv6 changes in 5.2-next that are relevant. At this commit the only IPv6 changes of mine are: 19a3b7eea424 ipv6: export function to send route updates cdaa16a4f70c ipv6: Add hook to bump sernum for a route to stubs 68a9b13d9219 ipv6: Add delete route hook to stubs which are function exports - unused at commit dfb569f2. > git tree: net-next > console output: https://syzkaller.appspot.com/x/log.txt?x=10afcb8aa0 > kernel config: https://syzkaller.appspot.com/x/.config?x=fc045131472947d7 > dashboard link: > https://syzkaller.appspot.com/bug?extid=a5b6e01ec8116d046842 > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+a5b6e01ec8116d046...@syzkaller.appspotmail.com > > == > BUG: KASAN: user-memory-access in atomic_read > include/asm-generic/atomic-instrumented.h:26 [inline] > BUG: KASAN: user-memory-access in atomic_fetch_add_unless > include/linux/atomic-fallback.h:1086 [inline] > BUG: KASAN: user-memory-access in atomic_add_unless > include/linux/atomic-fallback.h: [inline] > BUG: KASAN: user-memory-access in atomic_inc_not_zero > include/linux/atomic-fallback.h:1127 [inline] > BUG: KASAN: user-memory-access in dst_hold_safe include/net/dst.h:297 > [inline] > BUG: KASAN: user-memory-access in ip6_hold_safe+0xad/0x380 > net/ipv6/route.c:1050 > Read of size 4 at addr 1ec4 by task syz-executor.0/10106 0xc1ec4 is not a valid address for an allocated rt6_info. 
> > CPU: 0 PID: 10106 Comm: syz-executor.0 Not tainted 5.2.0-rc1+ #5 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x172/0x1f0 lib/dump_stack.c:113 > __kasan_report.cold+0x5/0x40 mm/kasan/report.c:321 > kasan_report+0x12/0x20 mm/kasan/common.c:614 > check_memory_region_inline mm/kasan/generic.c:185 [inline] > check_memory_region+0x123/0x190 mm/kasan/generic.c:191 > kasan_check_read+0x11/0x20 mm/kasan/common.c:94 > atomic_read include/asm-generic/atomic-instrumented.h:26 [inline] > atomic_fetch_add_unless include/linux/atomic-fallback.h:1086 [inline] > atomic_add_unless include/linux/atomic-fallback.h: [inline] > atomic_inc_not_zero include/linux/atomic-fallback.h:1127 [inline] > dst_hold_safe include/net/dst.h:297 [inline] > ip6_hold_safe+0xad/0x380 net/ipv6/route.c:1050 > rt6_get_pcpu_route net/ipv6/route.c:1277 [inline] My hunch is that this is memory corruption in the pcpu memory space. In a fib6_info, rt6i_pcpu is non-NULL for ALL fib6_info except fib6_null_entry for which pcpu routes are never generated. rt6i_pcpu is allocated via pcpu_alloc which means this memory space is amongst other pcpu users and easily stepped on by other pcpu users. The entries stored in rt6_pcpu are kmem_cache entries for the ipv6 dst cache and either a valid allocated memory address or NULL. Past issues with pcpu routes was the 'from' (the fib6_info used to generate the rt6_info) being NULL (several), the fib entry getting released more than it should (0e2338749192) or not getting freed at all (61fb0d016807).
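The trace above ends in dst_hold_safe(), which takes a reference only if the count has not already dropped to zero. A userspace approximation of that "increment unless zero" idea with C11 atomics is sketched here; the struct and function names are hypothetical and this is not the kernel's atomic_inc_not_zero() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a dst entry. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the object is still live (refcnt != 0). */
static bool obj_hold_safe(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

Note that the very first step is a read through the object pointer. That fits the report: if the pcpu slot holds a stomped-on value rather than NULL or a valid entry, the read itself faults before any refcount logic runs.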
Re: general protection fault in tcp_v6_connect
On 6/1/19 12:05 AM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: f4aa8012 cxgb4: Make t4_get_tp_e2c_map static > git tree: net-next > console output: https://syzkaller.appspot.com/x/log.txt?x=1662cb12a0 > kernel config: https://syzkaller.appspot.com/x/.config?x=d137eb988ffd93c3 > dashboard link: > https://syzkaller.appspot.com/bug?extid=5ee26b4e30c45930bd3c > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+5ee26b4e30c45930b...@syzkaller.appspotmail.com > > kasan: CONFIG_KASAN_INLINE enabled > kasan: GPF could be caused by NULL-ptr deref or user memory access > general protection fault: [#1] PREEMPT SMP KASAN > CPU: 1 PID: 17324 Comm: syz-executor.5 Not tainted 5.2.0-rc1+ #2 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline] > RIP: 0010:rt6_get_cookie include/net/ip6_fib.h:264 [inline] > RIP: 0010:ip6_dst_store include/net/ip6_route.h:213 [inline] > RIP: 0010:tcp_v6_connect+0xfd0/0x20a0 net/ipv6/tcp_ipv6.c:298 > Code: 89 e6 e8 83 a2 48 fb 45 84 e4 0f 84 90 09 00 00 e8 35 a1 48 fb 49 > 8d 7e 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 > 00 0f 85 57 0e 00 00 4d 8b 66 70 e8 4d 88 35 fb 31 ff 89 > RSP: 0018:888066547800 EFLAGS: 00010207 > RAX: dc00 RBX: 888064e839f0 RCX: c90010e49000 > RDX: 002b RSI: 8628033b RDI: 015f > RBP: 888066547980 R08: 8880a9412080 R09: ed1015d26be0 This one is not so obvious. The error has to be a bad dst from ip6_dst_lookup_flow called by tcp_v6_connect which then is attempted to be stored in the socket via ip6_dst_store. ip6_dst_store calls rt6_get_cookie with dst as the argument. RDI (first arg for x86) shows 0x15f which is not a valid and would cause a fault. 
None of the ip6_dst_* functions in net/ipv6/ip6_output.c have changed recently (5.2-next definitely but I believe this true for many releases prior). Further, all of the FIB lookup functions (called by ip6_dst_lookup_flow) always return a non-NULL dst. If my hunch about the other splat is correct (pcpu corruption) that could explain this one: FIB lookup is fine and finds an entry, the entry has a pcpu cache entry so it is returned. If the pcpu entry was stomped on then it would be invalid and the above would result.
[PATCH v2 net-next 4/7] ipv6: Plumb support for nexthop object in a fib6_info
From: David Ahern Add struct nexthop and nh_list list_head to fib6_info. nh_list is the fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info referencing a nexthop object can not have 'sibling' entries (the old way of doing multipath routes), the nh_list is a union with fib6_siblings. Add f6i_list list_head to 'struct nexthop' to track fib6_info entries using a nexthop instance. Update __remove_nexthop_fib to walk f6_list and delete fib entries using the nexthop. Add a few nexthop helpers for use when a nexthop is added to fib6_info: - nexthop_fib6_nh - return first fib6_nh in a nexthop object - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh if the fib6_info references a nexthop object - nexthop_path_fib6_result - similar to ipv4, select a path within a multipath nexthop object. If the nexthop is a blackhole, set fib6_result type to RTN_BLACKHOLE, and set the REJECT flag Update the fib6_info references to check for nh and take a different path as needed: - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT be coalesced with other fib entries into a multipath route - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references a nexthop - addrconf (host routes), RA's and info entries (anything configured via ndisc) does not use nexthop objects - fib6_info_destroy_rcu - put reference to nexthop object - fib6_purge_rt - drop fib6_info from f6i_list - fib6_select_path - update to use the new nexthop_path_fib6_result when fib entry uses a nexthop object - rt6_device_match - update to catch use of nexthop object as a blackhole and set fib6_type and flags. 
- ip6_pol_route - detect the REJECT flag getting set for blackhole nexthop and jump to ip6_create_rt_rcu - ip6_route_info_create - don't add space for fib6_nh if fib entry is going to reference a nexthop object, take a reference to nexthop object, disallow use of source routing - rt6_nlmsg_size - add space for RTA_NH_ID - add rt6_fill_node_nexthop to add nexthop data on a dump As with ipv4, most of the changes push existing code into the else branch of whether the fib entry uses a nexthop object. Update the nexthop code to walk f6i_list on a nexthop deleted to remove fib entries referencing it. Signed-off-by: David Ahern --- include/net/ip6_fib.h | 11 ++-- include/net/ip6_route.h | 13 +++- include/net/nexthop.h | 50 net/ipv4/nexthop.c | 44 ++ net/ipv6/addrconf.c | 5 ++ net/ipv6/ip6_fib.c | 22 +-- net/ipv6/ndisc.c| 3 +- net/ipv6/route.c| 156 +--- 8 files changed, 268 insertions(+), 36 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index ebe5d65f97e0..1a8acd51b277 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -146,7 +146,10 @@ struct fib6_info { * destination, but not the same gateway. nsiblings is just a cache * to speed up lookup. 
*/ - struct list_headfib6_siblings; + union { + struct list_headfib6_siblings; + struct list_headnh_list; + }; unsigned intfib6_nsiblings; refcount_t fib6_ref; @@ -170,6 +173,7 @@ struct fib6_info { unused:3; struct rcu_head rcu; + struct nexthop *nh; struct fib6_nh fib6_nh[0]; }; @@ -441,11 +445,6 @@ void rt6_get_prefsrc(const struct rt6_info *rt, struct in6_addr *addr) rcu_read_unlock(); } -static inline struct net_device *fib6_info_nh_dev(const struct fib6_info *f6i) -{ - return f6i->fib6_nh->fib_nh_dev; -} - int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh, struct fib6_config *cfg, gfp_t gfp_flags, struct netlink_ext_ack *extack); diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index a6ce6ea856b9..7375a165fd98 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -27,6 +27,7 @@ struct route_info { #include #include #include +#include #define RT6_LOOKUP_F_IFACE 0x0001 #define RT6_LOOKUP_F_REACHABLE 0x0002 @@ -66,10 +67,13 @@ static inline bool rt6_need_strict(const struct in6_addr *daddr) (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK); } +/* fib entries using a nexthop object can not be coalesced into + * a multipath route + */ static inline bool rt6_qualify_for_ecmp(const struct fib6_info *f6i) { /* the RTF_ADDRCONF flag filters out RA's */ - return !(f6i->fib6_flags & RTF_ADDRCONF) && + return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh && f6i->fib6_nh->fib_nh_gw_family; } @@ -275,8 +279,13 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, static i
[PATCH v2 net-next 1/7] ipv4: Use accessors for fib_info nexthop data
From: David Ahern Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the fib_dev macro which is an alias for the first nexthop. Replacements: fi->fib_dev--> fib_info_nh(fi, 0)->fib_nh_dev fi->fib_nh --> fib_info_nh(fi, 0) fi->fib_nh[i] --> fib_info_nh(fi, i) fi->fib_nhs--> fib_info_num_path(fi) where fib_info_nh(fi, i) returns fi->fib_nh[nhsel] and fib_info_num_path returns fi->fib_nhs. Move the existing fib_info_nhc to nexthop.h and define the new ones there. A later patch adds a check if a fib_info uses a nexthop object, and defining the helpers in nexthop.h avoid circular header dependencies. After this all remaining open coded references to fi->fib_nhs and fi->fib_nh are in: - fib_create_info and helpers used to lookup an existing fib_info entry, and - the netdev event functions fib_sync_down_dev and fib_sync_up. The latter two will not be reused for nexthops, and the fib_create_info will be updated to handle a nexthop in a fib_info. Signed-off-by: David Ahern --- drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 29 ++ .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 19 --- drivers/net/ethernet/rocker/rocker_ofdpa.c | 25 +--- include/net/ip_fib.h | 6 -- include/net/nexthop.h | 15 + net/core/filter.c | 3 +- net/ipv4/fib_frontend.c| 11 ++-- net/ipv4/fib_lookup.h | 1 + net/ipv4/fib_rules.c | 8 ++- net/ipv4/fib_semantics.c | 66 -- net/ipv4/fib_trie.c| 26 + net/ipv4/route.c | 3 +- 12 files changed, 132 insertions(+), 80 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c index 8212bfd05733..2cbfaa8da7fc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c @@ -2,6 +2,7 @@ /* Copyright (c) 2019 Mellanox Technologies. 
*/ #include +#include #include "lag.h" #include "lag_mp.h" #include "mlx5_core.h" @@ -110,6 +111,8 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, struct fib_info *fi) { struct lag_mp *mp = &ldev->lag_mp; + struct fib_nh *fib_nh0, *fib_nh1; + unsigned int nhs; /* Handle delete event */ if (event == FIB_EVENT_ENTRY_DEL) { @@ -120,9 +123,11 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, } /* Handle add/replace event */ - if (fi->fib_nhs == 1) { + nhs = fib_info_num_path(fi); + if (nhs == 1) { if (__mlx5_lag_is_active(ldev)) { - struct net_device *nh_dev = fi->fib_nh[0].fib_nh_dev; + struct fib_nh *nh = fib_info_nh(fi, 0); + struct net_device *nh_dev = nh->fib_nh_dev; int i = mlx5_lag_dev_get_netdev_idx(ldev, nh_dev); mlx5_lag_set_port_affinity(ldev, ++i); @@ -130,14 +135,16 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, return; } - if (fi->fib_nhs != 2) + if (nhs != 2) return; /* Verify next hops are ports of the same hca */ - if (!(fi->fib_nh[0].fib_nh_dev == ldev->pf[0].netdev && - fi->fib_nh[1].fib_nh_dev == ldev->pf[1].netdev) && - !(fi->fib_nh[0].fib_nh_dev == ldev->pf[1].netdev && - fi->fib_nh[1].fib_nh_dev == ldev->pf[0].netdev)) { + fib_nh0 = fib_info_nh(fi, 0); + fib_nh1 = fib_info_nh(fi, 1); + if (!(fib_nh0->fib_nh_dev == ldev->pf[0].netdev && + fib_nh1->fib_nh_dev == ldev->pf[1].netdev) && + !(fib_nh0->fib_nh_dev == ldev->pf[1].netdev && + fib_nh1->fib_nh_dev == ldev->pf[0].netdev)) { mlx5_core_warn(ldev->pf[0].dev, "Multipath offload require two ports of the same HCA\n"); return; } @@ -174,7 +181,7 @@ static void mlx5_lag_fib_nexthop_event(struct mlx5_lag *ldev, mlx5_lag_set_port_affinity(ldev, i); } } else if (event == FIB_EVENT_NH_ADD && - fi->fib_nhs == 2) { + fib_info_num_path(fi) == 2) { mlx5_lag_set_port_affinity(ldev, 0); } } @@ -238,6 +245,7 @@ static int mlx5_lag_fib_event(struct notifier_block *nb, struct mlx5_fib_event_work *fib_work; struct fib_entry_notifier_info *fen_info; struct 
fib_nh_notifier_info *fnh_info; + struct net_device *fib_dev; struct fib_info *fi; if (info->family != AF_INET) @@ -254,8 +262,9 @@ static int mlx5_lag_fib_event(struct notifier_block *nb, fen_info = container
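The accessor pattern from the commit message can be shown in isolation. The structs below are heavily simplified stand-ins (the real fib_info and fib_nh have many more fields, and the fixed-size array exists only for this sketch); the two helper names come from the patch description, while same_two_ports is a hypothetical caller modelled on the mlx5 hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct net_device {
	const char *name;
};

struct fib_nh {
	struct net_device *fib_nh_dev;
};

struct fib_info {
	int fib_nhs;			/* number of nexthops */
	struct fib_nh fib_nh[2];	/* fixed size only for this sketch */
};

/* Accessor for the path count, replacing open-coded fi->fib_nhs. */
static inline unsigned int fib_info_num_path(const struct fib_info *fi)
{
	return fi->fib_nhs;
}

/* Accessor for the nhsel'th nexthop, replacing fi->fib_nh[nhsel]. */
static inline struct fib_nh *fib_info_nh(struct fib_info *fi, int nhsel)
{
	return &fi->fib_nh[nhsel];
}

/* Hypothetical caller: do both nexthops use the two given ports? */
static bool same_two_ports(struct fib_info *fi, struct net_device *p0,
			   struct net_device *p1)
{
	struct fib_nh *nh0, *nh1;

	if (fib_info_num_path(fi) != 2)
		return false;
	nh0 = fib_info_nh(fi, 0);
	nh1 = fib_info_nh(fi, 1);
	return (nh0->fib_nh_dev == p0 && nh1->fib_nh_dev == p1) ||
	       (nh0->fib_nh_dev == p1 && nh1->fib_nh_dev == p0);
}
```

Routing every access through helpers like these is what lets a later patch redirect them to a nexthop object without touching the callers again.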
[PATCH v2 net-next 6/7] mlx5: Fail attempts to use routes with nexthop objects
From: David Ahern

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern
---
 drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
index 2cbfaa8da7fc..e69766393990 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
@@ -262,6 +262,10 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
 		fen_info = container_of(info, struct fib_entry_notifier_info,
 					info);
 		fi = fen_info->fi;
+		if (fi->nh) {
+			NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route with nexthop objects is not supported");
+			return notifier_from_errno(-EINVAL);
+		}
 		fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev;
 		if (fib_dev != ldev->pf[0].netdev &&
 		    fib_dev != ldev->pf[1].netdev) {
-- 
2.11.0
[PATCH v2 net-next 5/7] mlxsw: Fail attempts to use routes with nexthop objects
From: David Ahern

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern
Reviewed-by: Ido Schimmel
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 14
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 4f781358aef1..23f17ea52061 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -6122,6 +6122,20 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 				NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway with IPv4 route is not supported");
 				return notifier_from_errno(-EINVAL);
 			}
+			if (fen_info->fi->nh) {
+				NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route with nexthop objects is not supported");
+				return notifier_from_errno(-EINVAL);
+			}
+		} else if (info->family == AF_INET6) {
+			struct fib6_entry_notifier_info *fen6_info;
+
+			fen6_info = container_of(info,
+						 struct fib6_entry_notifier_info,
+						 info);
+			if (fen6_info->rt->nh) {
+				NL_SET_ERR_MSG_MOD(info->extack, "IPv6 route with nexthop objects is not supported");
+				return notifier_from_errno(-EINVAL);
+			}
 		}
 		break;
 	}
-- 
2.11.0
[PATCH v2 net-next 7/7] rocker: Fail attempts to use routes with nexthop objects
From: David Ahern

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern
---
 drivers/net/ethernet/rocker/rocker_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 7ae6c124bfe9..45b3325c3a38 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2214,6 +2214,10 @@ static int rocker_router_fib_event(struct notifier_block *nb,
 				NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway with IPv4 route is not supported");
 				return notifier_from_errno(-EINVAL);
 			}
+			if (fen_info->fi->nh) {
+				NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route with nexthop objects is not supported");
+				return notifier_from_errno(-EINVAL);
+			}
 		}
 
 		memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
-- 
2.11.0
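For readers unfamiliar with how a FIB notifier turns an errno into a notifier return value, here is a minimal userspace sketch of the rejection pattern the three driver patches share. The NOTIFY_* values and the two helpers are copied from my reading of include/linux/notifier.h; mock_fib_info and mock_fib_event are hypothetical stand-ins and are not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values as I understand them from include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Userspace copy of the kernel helper: fold a negative errno into a
 * notifier return value that also stops the notifier chain. */
static inline int notifier_from_errno(int err)
{
    if (err)
        return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
    return NOTIFY_OK;
}

/* Userspace copy of the inverse helper: recover the errno. */
static inline int notifier_to_errno(int ret)
{
    ret &= ~NOTIFY_STOP_MASK;
    return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical stand-in for struct fib_info: only the nh pointer matters. */
struct mock_fib_info {
    const void *nh;
};

/* Hypothetical handler mirroring the check added by the patches:
 * reject any route that references a nexthop object. */
static int mock_fib_event(const struct mock_fib_info *fi,
                          const char **extack_msg)
{
    if (fi->nh) {
        *extack_msg = "IPv4 route with nexthop objects is not supported";
        return notifier_from_errno(-EINVAL);
    }
    return NOTIFY_DONE;
}
```

The caller of the notifier chain can then pass the return value through notifier_to_errno() to get -EINVAL back, while the message travels separately via extack.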
Re: [PATCH 1/1] net: rds: add per rds connection cache statistics
On 2019/6/3 11:03, santosh.shilim...@oracle.com wrote: On 6/1/19 12:54 AM, Zhu Yanjun wrote: The variable cache_allocs is to indicate how many frags (KiB) are in one rds connection frag cache. The command "rds-info -Iv" will output the rds connection cache statistics as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 1.1.1.14 1.1.1.14 58 255 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31 send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257, cache_allocs=12 " This means that there are about 12KiB frag in this rds connection frag cache. Tested-by: RDS CI Please add some valid email id or drop above. Its expected that with SOB, patches are tested before testing. Thanks for review. OK. I will remove this in V2. Signed-off-by: Zhu Yanjun --- include/uapi/linux/rds.h | 2 ++ net/rds/ib.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h index 5d0f76c..fd6b5f6 100644 --- a/include/uapi/linux/rds.h +++ b/include/uapi/linux/rds.h @@ -250,6 +250,7 @@ struct rds_info_rdma_connection { __u32 rdma_mr_max; __u32 rdma_mr_size; __u8 tos; + __u32 cache_allocs; Some of this header file changes, how is taking care of backward compatibility with tooling ? Just now I made tests with rds-tools. In this commit " commit 6c03b61e9097098d35b4c2be16d0f0f9f8357112 Author: Santosh Shilimkar Date: Wed Mar 9 04:30:48 2016 -0800 rds-tools: sync up sources with 2.0.7-1.16 " cache_allocs is added into rds-tools. The diff is as below. " @@ -176,6 +191,9 @@ struct rds_info_rdma_connection { uint32_t max_send_sge; uint32_t rdma_mr_max; uint32_t rdma_mr_size; + uint8_t tos; + uint8_t sl; + uint32_t cache_allocs; }; " Then this cache_allocs does not exist in rds-tools 2.0.6 and rds-tools 2.0.5. 
I made tests with 2.0.5 and 2.0.6 " rds-info -V rds-info: Invalid option '-V' rds-info version 2.0.5 [root@ca-dev14 rds-tools]# rds-info -Iv RDS IB Connections: LocalAddr RemoteAddr LocalDev RemoteDev 1.1.1.14 1.1.1.14 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31 send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257 " " [root@ca-dev14 rds-tools]# rds-info -V rds-info: Invalid option '-V' rds-info version 2.0.6 [root@ca-dev14 rds-tools]# rds-info -Iv RDS IB Connections: LocalAddr RemoteAddr LocalDev RemoteDev 1.1.1.14 1.1.1.14 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31 send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257 " From output of rds-tools 2.0.5 and 2.0.6, cache_allocs does not appear since cache_allocs does not exist in struct rds_info_rdma_connection. But in rds-tools 2.0.7, cache_allocs exists in struct rds_info_rdma_connection. " [root@ca-dev14 rds-tools]# rds-info -V rds-info: invalid option -- 'V' rds-info version 2.0.7 [root@ca-dev14 rds-tools]# rds-info -Iv RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 1.1.1.14 1.1.1.14 5 255 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31 send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257, cache_allocs=12 " So do not worry about backward compatibility. This commit will work well with older rds-tools2.0.5 and 2.0.6. I will send V2 soon. Thanks Zhu Yanjun This was one of the reason, the all the fields are not updated. Regards, Santosh
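The compatibility argument in this thread rests on the new member being appended at the end of the struct. A minimal sketch of that property follows, using trimmed hypothetical structs: info_old, info_new and read_as_old are illustrative names and do not appear in the kernel rds.h or in rds-tools, and whether rds-info actually reads the data this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed, hypothetical mock of the tail of the struct before the patch. */
struct info_old {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
};

/* The same mock with the new member appended at the end. */
struct info_new {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint32_t cache_allocs;
};

/* Appending a member leaves the offsets of the existing members unchanged. */
static_assert(offsetof(struct info_old, rdma_mr_size) ==
              offsetof(struct info_new, rdma_mr_size),
              "rdma_mr_size moved");
static_assert(offsetof(struct info_old, tos) ==
              offsetof(struct info_new, tos),
              "tos moved");

/* Hypothetical reader built against the old layout: it copies only the
 * prefix it knows about and never looks at cache_allocs. */
static struct info_old read_as_old(const struct info_new *src)
{
    struct info_old out;

    memcpy(&out, src, sizeof(out));
    return out;
}
```

A reader compiled against the old layout still sees the same values for the members it knows; only the total size of the record grows.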
[PATCH v2 net-next 0/7] net: add struct nexthop to fib{6}_info
From: David Ahern

This set adds 'struct nexthop' to fib_info and fib6_info. IPv4 already handles multiple fib_nh entries in a single fib_info, so the conversion to use a nexthop struct is fairly mechanical. IPv6 using a nexthop struct with a fib6_info impacts a lot of core logic which is built around the assumption of a single, builtin fib6_nh per fib6_info. To make this easier to review, this set adds nexthop to fib6_info and adds checks in most places fib6_info is used. The next set finishes the IPv6 conversion, walking through the places that need to consider all fib6_nh within a nexthop struct.

Offload drivers - mlx5, mlxsw and rocker - are changed to fail FIB entries using nexthop objects. That limitation can be removed once the drivers are updated to properly support separate nexthops.

This set starts by adding accessors for fib_nh and fib_nhs in a fib_info. This makes it easier to extract the number of nexthops in the fib entry and a specific fib_nh once the entry references a struct nexthop.

Patch 2 converts more of IPv4 code to use fib_nh_common allowing a struct nexthop to use a fib6_nh with an IPv4 entry.

Patches 3 and 4 add 'struct nexthop' to fib{6}_info and update references to both take a different path when it is set. New exported functions are added to the nexthop code to validate a nexthop struct when configured for use with a fib entry. IPv4 is allowed to use a nexthop with either v4 or v6 entries. IPv6 is limited to v6 entries only. In both cases list_heads track the fib entries using a nexthop struct for fast correlation on events (e.g., device events or nexthop events like delete or replace).

The last 3 patches add hooks to drivers listening for FIB notifications. All 3 of them reject the routes as unsupported, returning an error message to the user via extack. For mlxsw at least this is a stop-gap measure until the driver is updated for proper support.

Functional tests for nexthops have already been committed.
Those tests will be active after the next patch set which makes the code paths created by this set and the next one live. Existing code paths moved to the else branch of 'if (f{6}i->nh)' checks are covered by existing tests under selftests/net. v2 - no code changes from v1 - commit messages for first 4 patches updated David Ahern (7): ipv4: Use accessors for fib_info nexthop data ipv4: Prepare for fib6_nh from a nexthop object ipv4: Plumb support for nexthop object in a fib_info ipv6: Plumb support for nexthop object in a fib6_info mlxsw: Fail attempts to use routes with nexthop objects mlx5: Fail attempts to use routes with nexthop objects rocker: Fail attempts to use routes with nexthop objects drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 33 ++- .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 33 ++- drivers/net/ethernet/rocker/rocker_main.c | 4 + drivers/net/ethernet/rocker/rocker_ofdpa.c | 25 +- include/net/ip6_fib.h | 11 +- include/net/ip6_route.h| 13 +- include/net/ip_fib.h | 25 +- include/net/nexthop.h | 113 + net/core/filter.c | 3 +- net/ipv4/fib_frontend.c| 15 +- net/ipv4/fib_lookup.h | 1 + net/ipv4/fib_rules.c | 8 +- net/ipv4/fib_semantics.c | 257 ++--- net/ipv4/fib_trie.c| 38 ++- net/ipv4/nexthop.c | 111 - net/ipv4/route.c | 5 +- net/ipv6/addrconf.c| 5 + net/ipv6/ip6_fib.c | 22 +- net/ipv6/ndisc.c | 3 +- net/ipv6/route.c | 156 +++-- 20 files changed, 706 insertions(+), 175 deletions(-) -- 2.11.0
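As a rough illustration of the accessor idea described in the cover letter (callers ask the fib entry for its path count and for a specific path, and the accessor takes a different branch when a nexthop object is set), here is a small userspace sketch. All mock_* types and functions are hypothetical simplifications, not the kernel definitions, and the bounds check is my own addition for safety in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib_nh_common. */
struct mock_nhc {
    int oif;
};

/* Hypothetical stand-in for a (possibly multipath) struct nexthop. */
struct mock_nexthop {
    int num_paths;
    struct mock_nhc paths[4];
};

/* Hypothetical stand-in for struct fib_info: either nh is set, or the
 * builtin fib_nh array with fib_nhs entries is used. */
struct mock_fib_info {
    struct mock_nexthop *nh;
    int fib_nhs;
    struct mock_nhc fib_nh[2];
};

/* Number of paths, regardless of where the nexthop data lives. */
static int mock_fib_info_num_path(const struct mock_fib_info *fi)
{
    if (fi->nh)
        return fi->nh->num_paths;
    return fi->fib_nhs;
}

/* Select path nhsel; NULL when out of range (check added for the sketch). */
static const struct mock_nhc *
mock_fib_info_nhc(const struct mock_fib_info *fi, int nhsel)
{
    if (nhsel < 0 || nhsel >= mock_fib_info_num_path(fi))
        return NULL;
    if (fi->nh)
        return &fi->nh->paths[nhsel];
    return &fi->fib_nh[nhsel];
}
```

With this shape, a loop such as "for each path up to mock_fib_info_num_path(fi)" works unchanged for both kinds of entry, which is the point of introducing accessors before the nexthop pointer is added.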
[PATCH v2 net-next 2/7] ipv4: Prepare for fib6_nh from a nexthop object
From: David Ahern

Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
to use a fib6_nh based nexthop. In the end, only code not using a
nexthop object in a fib_info should directly access fib_nh in a fib_info
without checking the family and going through fib_nh_common. Those
functions will be marked when it is not directly evident.

Signed-off-by: David Ahern
---
 include/net/ip_fib.h     | 15 +
 net/ipv4/fib_frontend.c  | 12 +--
 net/ipv4/fib_rules.c     |  4 ++--
 net/ipv4/fib_semantics.c | 55 +---
 net/ipv4/fib_trie.c      | 15 +++--
 net/ipv4/nexthop.c       |  3 ++-
 net/ipv4/route.c         |  2 +-
 7 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 42b1a806f6f5..7da8ea784029 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -195,8 +195,8 @@ struct fib_result_nl {
 #define FIB_TABLE_HASHSZ 2
 #endif
 
-__be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh,
-				unsigned char scope);
+__be32 fib_info_update_nhc_saddr(struct net *net, struct fib_nh_common *nhc,
+				 unsigned char scope);
 __be32 fib_result_prefsrc(struct net *net, struct fib_result *res);
 
 #define FIB_RES_NHC(res) ((res).nhc)
@@ -455,11 +455,18 @@ static inline void fib_combine_itag(u32 *itag, const struct fib_result *res)
 {
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	struct fib_nh_common *nhc = res->nhc;
-	struct fib_nh *nh = container_of(nhc, struct fib_nh, nh_common);
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	u32 rtag;
 #endif
-	*itag = nh->nh_tclassid << 16;
+	if (nhc->nhc_family == AF_INET) {
+		struct fib_nh *nh;
+
+		nh = container_of(nhc, struct fib_nh, nh_common);
+		*itag = nh->nh_tclassid << 16;
+	} else {
+		*itag = 0;
+	}
+
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	rtag = res->tclassid;
 	if (*itag == 0)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ab369959ce0b..8e49baa00d20 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -235,9 +235,9 @@ static inline unsigned int __inet_dev_addr_type(struct net *net,
 	if (table) {
ret = RTN_UNICAST; if (!fib_table_lookup(table, &fl4, &res, FIB_LOOKUP_NOREF)) { - struct fib_nh *nh = fib_info_nh(res.fi, 0); + struct fib_nh_common *nhc = fib_info_nhc(res.fi, 0); - if (!dev || dev == nh->fib_nh_dev) + if (!dev || dev == nhc->nhc_dev) ret = res.type; } } @@ -325,18 +325,18 @@ bool fib_info_nh_uses_dev(struct fib_info *fi, const struct net_device *dev) int ret; for (ret = 0; ret < fib_info_num_path(fi); ret++) { - const struct fib_nh *nh = fib_info_nh(fi, ret); + const struct fib_nh_common *nhc = fib_info_nhc(fi, ret); - if (nh->fib_nh_dev == dev) { + if (nhc->nhc_dev == dev) { dev_match = true; break; - } else if (l3mdev_master_ifindex_rcu(nh->fib_nh_dev) == dev->ifindex) { + } else if (l3mdev_master_ifindex_rcu(nhc->nhc_dev) == dev->ifindex) { dev_match = true; break; } } #else - if (fib_info_nh(fi, 0)->fib_nh_dev == dev) + if (fib_info_nhc(fi, 0)->nhc_dev == dev) dev_match = true; #endif diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c index ab06fd73b343..88807c138df4 100644 --- a/net/ipv4/fib_rules.c +++ b/net/ipv4/fib_rules.c @@ -147,9 +147,9 @@ static bool fib4_rule_suppress(struct fib_rule *rule, struct fib_lookup_arg *arg struct net_device *dev = NULL; if (result->fi) { - struct fib_nh *nh = fib_info_nh(result->fi, 0); + struct fib_nh_common *nhc = fib_info_nhc(result->fi, 0); - dev = nh->fib_nh_dev; + dev = nhc->nhc_dev; } /* do not accept result if the route does diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index a37ff07718a8..4a12c69f7fa1 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -61,6 +61,9 @@ static unsigned int fib_info_cnt; #define DEVINDEX_HASHSIZE (1U << DEVINDEX_HASHBITS) static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE]; +/* for_nexthops and change_nexthops only used when nexthop object + * is not set in a fib_info. The logic within can reference fib_nh. 
+ */ #ifdef CONFIG_IP_ROUTE_MULTIPATH #define for_nexthops(fi) { \ @@ -402,20 +405,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi) /* each nexthop is packed in a
[PATCH v2 net-next 3/7] ipv4: Plumb support for nexthop object in a fib_info
From: David Ahern Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the fib_info side of the nexthop <-> fib_info relationship. Add fi_list list_head to 'struct nexthop' to track fib_info entries using a nexthop instance. Add __remove_nexthop_fib and add it to __remove_nexthop to walk the new list_head and mark those fib entries as dead when the nexthop is deleted. Add a few nexthop helpers for use when a nexthop is added to fib_info: - nexthop_cmp to determine if 2 nexthops are the same - nexthop_path_fib_result to select a path for a multipath 'struct nexthop' - nexthop_fib_nhc to select a specific fib_nh_common within a multipath 'struct nexthop' Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop case. Update the fib_info functions to check for fi->nh and take a different path as needed: - free_fib_info_rcu - put the nexthop object reference - fib_release_info - remove the fib_info from the nexthop's fi_list - nh_comp - use nexthop_cmp when either fib_info references a nexthop object - fib_info_hashfn - use the nexthop id for the hashing vs the oif of each fib_nh in a fib_info - fib_nlmsg_size - add space for the RTA_NH_ID attribute - fib_create_info - verify nexthop reference can be taken, verify nexthop spec is valid for fib entry, and add fib_info to fi_list for a nexthop - fib_select_multipath - use the new nexthop_path_fib_result to select a path when nexthop objects are used - fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat it the same as a fib entry using 'blackhole' The bulk of the changes are in fib_semantics.c and most of that is moving the existing change_nexthops into an else branch. Update the nexthop code to walk fi_list on a nexthop deleted to remove fib entries referencing it. 
Signed-off-by: David Ahern --- include/net/ip_fib.h | 4 ++ include/net/nexthop.h| 48 net/ipv4/fib_semantics.c | 142 +++ net/ipv4/fib_trie.c | 7 +++ net/ipv4/nexthop.c | 64 + 5 files changed, 229 insertions(+), 36 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 7da8ea784029..071d280de389 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -129,9 +129,12 @@ struct fib_nh { * This structure contains data shared by many of routes. */ +struct nexthop; + struct fib_info { struct hlist_node fib_hash; struct hlist_node fib_lhash; + struct list_headnh_list; struct net *fib_net; int fib_treeref; refcount_t fib_clntref; @@ -151,6 +154,7 @@ struct fib_info { int fib_nhs; boolfib_nh_is_v6; boolnh_updated; + struct nexthop *nh; struct rcu_head rcu; struct fib_nh fib_nh[0]; }; diff --git a/include/net/nexthop.h b/include/net/nexthop.h index e501d77b82c8..2912a2d7a515 100644 --- a/include/net/nexthop.h +++ b/include/net/nexthop.h @@ -77,6 +77,7 @@ struct nh_group { struct nexthop { struct rb_node rb_node;/* entry on netns rbtree */ + struct list_headfi_list;/* v4 entries using nh */ struct list_headgrp_list; /* nh group entries using this nh */ struct net *net; @@ -110,6 +111,12 @@ static inline void nexthop_put(struct nexthop *nh) call_rcu(&nh->rcu, nexthop_free_rcu); } +static inline bool nexthop_cmp(const struct nexthop *nh1, + const struct nexthop *nh2) +{ + return nh1 == nh2; +} + static inline bool nexthop_is_multipath(const struct nexthop *nh) { if (nh->is_group) { @@ -193,18 +200,59 @@ static inline bool nexthop_is_blackhole(const struct nexthop *nh) return nhi->reject_nh; } +static inline void nexthop_path_fib_result(struct fib_result *res, int hash) +{ + struct nh_info *nhi; + struct nexthop *nh; + + nh = nexthop_select_path(res->fi->nh, hash); + nhi = rcu_dereference(nh->nh_info); + res->nhc = &nhi->fib_nhc; +} + +/* called with rcu read lock or rtnl held */ +static inline +struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, 
int nhsel) +{ + struct nh_info *nhi; + + BUILD_BUG_ON(offsetof(struct fib_nh, nh_common) != 0); + BUILD_BUG_ON(offsetof(struct fib6_nh, nh_common) != 0); + + if (nexthop_is_multipath(nh)) { + nh = nexthop_mpath_select(nh, nhsel); + if (!nh) + return NULL; + } + + nhi = rcu_dereference_rtnl(nh->nh_info); + return &nhi->fib_nhc; +} + static inline unsigned int fib_info_num_path(const struct fib_info *fi) { + if (unlikely(fi->nh)) + return ne
[PATCHv2 1/1] net: rds: add per rds connection cache statistics
The variable cache_allocs is to indicate how many frags (KiB) are in one
rds connection frag cache.
The command "rds-info -Iv" will output the rds connection cache
statistics as below:
"
RDS IB Connections:
      LocalAddr RemoteAddr Tos SL  LocalDev            RemoteDev
      1.1.1.14  1.1.1.14   58  255 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31
      send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
      rdma_mr_size=257, cache_allocs=12
"
This means that there are about 12KiB frag in this rds connection frag
cache.
Since rds.h in rds-tools is not related with the kernel rds.h, the change
in kernel rds.h does not affect rds-tools.
rds-info in rds-tools 2.0.5 and 2.0.6 is tested with this commit. It works
well.

Signed-off-by: Zhu Yanjun
---
V1->V2: RDS CI is removed.
---
 include/uapi/linux/rds.h | 2 ++
 net/rds/ib.c             | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 5d0f76c..fd6b5f6 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
 	__u32 rdma_mr_max;
 	__u32 rdma_mr_size;
 	__u8 tos;
+	__u32 cache_allocs;
 };
 
 struct rds6_info_rdma_connection {
@@ -264,6 +265,7 @@ struct rds6_info_rdma_connection {
 	__u32 rdma_mr_max;
 	__u32 rdma_mr_size;
 	__u8 tos;
+	__u32 cache_allocs;
 };
 
 /* RDS message Receive Path Latency points */
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 2da9b75..f9baf2d 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -318,6 +318,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
 		iinfo->max_recv_wr = ic->i_recv_ring.w_nr;
 		iinfo->max_send_sge = rds_ibdev->max_sge;
 		rds_ib_get_mr_info(rds_ibdev, iinfo);
+		iinfo->cache_allocs = atomic_read(&ic->i_cache_allocs);
 	}
 	return 1;
 }
@@ -351,6 +352,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 		iinfo6->max_recv_wr = ic->i_recv_ring.w_nr;
 		iinfo6->max_send_sge = rds_ibdev->max_sge;
 		rds6_ib_get_mr_info(rds_ibdev, iinfo6);
+		iinfo6->cache_allocs = atomic_read(&ic->i_cache_allocs);
 	}
 	return 1;
 }
-- 
2.7.4
Re: [PATCH] devlink: fix libc and kernel headers collision
Thu, May 30, 2019 at 05:32:27PM CEST, bar...@tkos.co.il wrote:
>Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
>use the sys/sysinfo.h header for the sysinfo(2) system call. But since
>iproute2 carries a local version of the kernel struct sysinfo, this
>causes a collision with libc that do not rely on kernel defined sysinfo
>like musl libc:
>
>In file included from devlink.c:25:0:
>.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
> struct sysinfo {
>        ^~~
>In file included from ../include/uapi/linux/kernel.h:5:0,
>                 from ../include/uapi/linux/netlink.h:5,
>                 from ../include/uapi/linux/genetlink.h:6,
>                 from devlink.c:21:
>../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
> struct sysinfo {
>        ^~~
>
>Rely on the kernel header alone to avoid kernel and userspace headers
>collision of definitions.
>
>Cc: Aya Levin
>Cc: Moshe Shemesh
>Signed-off-by: Baruch Siach

Acked-by: Jiri Pirko
Re: general protection fault in tcp_v6_connect
On Mon, Jun 3, 2019 at 5:29 AM David Ahern wrote:
>
> On 6/1/19 12:05 AM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: f4aa8012 cxgb4: Make t4_get_tp_e2c_map static
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1662cb12a0
> > kernel config: https://syzkaller.appspot.com/x/.config?x=d137eb988ffd93c3
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=5ee26b4e30c45930bd3c
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+5ee26b4e30c45930b...@syzkaller.appspotmail.com
> >
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 17324 Comm: syz-executor.5 Not tainted 5.2.0-rc1+ #2
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> > RIP: 0010:rt6_get_cookie include/net/ip6_fib.h:264 [inline]
> > RIP: 0010:ip6_dst_store include/net/ip6_route.h:213 [inline]
> > RIP: 0010:tcp_v6_connect+0xfd0/0x20a0 net/ipv6/tcp_ipv6.c:298
> > Code: 89 e6 e8 83 a2 48 fb 45 84 e4 0f 84 90 09 00 00 e8 35 a1 48 fb 49
> > 8d 7e 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02
> > 00 0f 85 57 0e 00 00 4d 8b 66 70 e8 4d 88 35 fb 31 ff 89
> > RSP: 0018:888066547800 EFLAGS: 00010207
> > RAX: dc00 RBX: 888064e839f0 RCX: c90010e49000
> > RDX: 002b RSI: 8628033b RDI: 015f
> > RBP: 888066547980 R08: 8880a9412080 R09: ed1015d26be0
>
> This one is not so obvious.
>
> The error has to be a bad dst from ip6_dst_lookup_flow called by
> tcp_v6_connect which then is attempted to be stored in the socket via
> ip6_dst_store. ip6_dst_store calls rt6_get_cookie with dst as the
> argument. RDI (first arg for x86) shows 0x15f which is not a valid and
> would cause a fault.
>
> None of the ip6_dst_* functions in net/ipv6/ip6_output.c have changed
> recently (5.2-next definitely but I believe this true for many releases
> prior). Further, all of the FIB lookup functions (called by
> ip6_dst_lookup_flow) always return a non-NULL dst.
>
> If my hunch about the other splat is correct (pcpu corruption) that
> could explain this one: FIB lookup is fine and finds an entry, the entry
> has a pcpu cache entry so it is returned. If the pcpu entry was stomped
> on then it would be invalid and the above would result.

This happened only once so far, so may be a previous silent memory
corruption.
This also may be related to "KASAN: user-memory-access Read in
ip6_hold_safe (3)":
https://syzkaller.appspot.com/bug?extid=a5b6e01ec8116d046842
because that one seems to be a race in involved code. So this one may be a
rare incarnation of the other crash.
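As a side note on reading the register dump: the Code bytes around the trapping instruction appear to be the inline KASAN check (48 b8 ... fc ff df loads 0xdffffc0000000000, 48 89 fa copies RDI to RDX, 48 c1 ea 03 shifts RDX right by 3, and the marked <80> 3c 02 00 is a byte compare at RDX + RAX). A minimal sketch of that arithmetic follows; the shadow offset is taken from those bytes, and kasan_shadow_addr is an illustrative helper name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* One shadow byte covers 8 bytes of memory, hence the shift by 3. */
#define KASAN_SHADOW_SCALE_SHIFT 3
/* Shadow offset as it appears in the movabs immediate of the Code bytes. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Illustrative helper: shadow byte address checked for a given address. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

For the reported RDI of 0x15f, the shift gives 0x2b, which matches the RDX value in the dump, and the shadow address 0xdffffc000000002b is non-canonical on x86-64, which is consistent with a general protection fault rather than a page fault.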