RE: [PATCH V2 0/5] patches for stmmac
> -----Original Message-----
> From: Jakub Kicinski
> Sent: 6 December 2020, 5:40
> To: Joakim Zhang
> Cc: peppe.cavall...@st.com; alexandre.tor...@st.com;
> joab...@synopsys.com; da...@davemloft.net; netdev@vger.kernel.org;
> dl-linux-imx
> Subject: Re: [PATCH V2 0/5] patches for stmmac
>
> On Fri, 4 Dec 2020 10:46:33 +0800 Joakim Zhang wrote:
> > A patch set for stmmac, fix some driver issues.
>
> These don't apply cleanly to the net tree where fixes go:
>
> https://patchwork.kernel.org/project/netdevbpf/list/?delegate=netdev&param=1&order=date
>
> Please rebase / retest / repost.

Hi Jakub,

I will rebase to the latest net tree, thanks.

Hi all,

I would also like to report a stmmac driver issue here; others may be suffering from it as well. After hundreds of suspend/resume stress-test iterations, I hit the netdev watchdog timeout below: a Tx queue times out and the adapter is then reset.
=== suspend 1000 times ===
Test < suspend_quick_auto.sh > ended
root@imx8mpevk:/unit_tests/Power_Management#
[ 1347.976688] imx-dwmac 30bf.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 1358.022784] ------------[ cut here ]------------
[ 1358.027430] NETDEV WATCHDOG: eth0 (imx-dwmac): transmit queue 0 timed out
[ 1358.035469] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:450 dev_watchdog+0x2fc/0x30c
[ 1358.043736] Modules linked in:
[ 1358.046798] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W  5.8.0-rc5-next-20200717-7-g30d24ae22e81-dirty #333
[ 1358.058011] Hardware name: NXP i.MX8MPlus EVK board (DT)
[ 1358.063324] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--)
[ 1358.068898] pc : dev_watchdog+0x2fc/0x30c
[ 1358.072908] lr : dev_watchdog+0x2fc/0x30c
[ 1358.076915] sp : 800011c5bd90
[ 1358.080228] x29: 800011c5bd90 x28: 0001767f1940
[ 1358.085542] x27: 0004 x26: 000176e88440
[ 1358.090857] x25: 0140 x24:
[ 1358.096171] x23: 000176e8839c x22: 0002
[ 1358.101484] x21: 8000119f6000 x20: 000176e88000
[ 1358.106799] x19: x18: 0030
[ 1358.112112] x17: 0001 x16: 0018bf1a354e
[ 1358.117426] x15: 0001760eae70 x14:
[ 1358.122740] x13: 800091c5ba77 x12: 800011c5ba80
[ 1358.128054] x11: x10: 00017f38b7c0
[ 1358.133368] x9 : 000c x8 : 6928203068746520
[ 1358.138682] x7 : 3a474f4448435441 x6 : 0003
[ 1358.143996] x5 : x4 :
[ 1358.149310] x3 : 0004 x2 : 0100
[ 1358.154624] x1 : b54950db346c9600 x0 :
[ 1358.159939] Call trace:
[ 1358.162389]  dev_watchdog+0x2fc/0x30c
[ 1358.166055]  call_timer_fn.constprop.0+0x24/0x80
[ 1358.170673]  expire_timers+0x98/0xc4
[ 1358.174249]  run_timer_softirq+0xd0/0x200
[ 1358.178261]  efi_header_end+0x124/0x284
[ 1358.182098]  irq_exit+0xdc/0xfc
[ 1358.185241]  __handle_domain_irq+0x80/0xe0
[ 1358.189338]  gic_handle_irq+0xc8/0x170
[ 1358.193087]  el1_irq+0xbc/0x180
[ 1358.196230]  arch_cpu_idle+0x14/0x20
[ 1358.199807]  cpu_startup_entry+0x24/0x80
[ 1358.203732]  secondary_start_kernel+0x138/0x184
[ 1358.208262] ---[ end trace b422761fd811b2a7 ]---
[ 1358.213588]
imx-dwmac 30bf.ethernet eth0: Reset adapter.
[ 1358.228037] imx-dwmac 30bf.ethernet eth0: PHY [stmmac-1:01] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 1358.246815] imx-dwmac 30bf.ethernet eth0: No Safety Features support found
[ 1358.254062] imx-dwmac 30bf.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[ 1358.264130] imx-dwmac 30bf.ethernet eth0: registered PTP clock
[ 1358.270374] imx-dwmac 30bf.ethernet eth0: configuring for phy/rgmii-id link mode
[ 1358.279481] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1360.328695] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1360.335007] imx-dwmac 30bf.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx

I first saw this issue on the latest 5.10, and I confirmed it does not occur on 5.4. After some time digging into the driver commit history, I found nothing. It appears to be related to the stmmac core driver, not the platform driver, so it should be reproducible on other platforms as well.

Could you please point me to how to debug this issue? I don't know how to look into it further, as I only took over the Ethernet driver a short time ago. Any feedback would be appreciated!

Joakim Zhang
RE: [PATCH] net: stmmac: implement .set_intf_mode() callback for imx8dxl
> -----Original Message-----
> From: Jakub Kicinski
> Sent: 6 December 2020, 3:58
> To: Joakim Zhang
> Cc: peppe.cavall...@st.com; alexandre.tor...@st.com;
> joab...@synopsys.com; da...@davemloft.net; dl-linux-imx;
> netdev@vger.kernel.org
> Subject: Re: [PATCH] net: stmmac: implement .set_intf_mode() callback for
> imx8dxl
>
> On Thu, 3 Dec 2020 12:10:38 +0800 Joakim Zhang wrote:
> > From: Fugang Duan
> >
> > Implement .set_intf_mode() callback for imx8dxl.
> >
> > Signed-off-by: Fugang Duan
> > Signed-off-by: Joakim Zhang
>
> A couple of minor issues.
>
> > @@ -86,7 +88,37 @@ imx8dxl_set_intf_mode(struct plat_stmmacenet_data *plat_dat)
> > {
> > 	int ret = 0;
> >
> > -	/* TBD: depends on imx8dxl scu interfaces to be upstreamed */
> > +	struct imx_sc_ipc *ipc_handle;
> > +	int val;
>
> Looks like you're going to have an empty line in the middle of the
> variable declarations? Please remove it and order the variable lines
> longest to shortest.
>
> > +
> > +	ret = imx_scu_get_handle(&ipc_handle);
> > +	if (ret)
> > +		return ret;
> > +
> > +	switch (plat_dat->interface) {
> > +	case PHY_INTERFACE_MODE_MII:
> > +		val = GPR_ENET_QOS_INTF_SEL_MII;
> > +		break;
> > +	case PHY_INTERFACE_MODE_RMII:
> > +		val = GPR_ENET_QOS_INTF_SEL_RMII;
> > +		break;
> > +	case PHY_INTERFACE_MODE_RGMII:
> > +	case PHY_INTERFACE_MODE_RGMII_ID:
> > +	case PHY_INTERFACE_MODE_RGMII_RXID:
> > +	case PHY_INTERFACE_MODE_RGMII_TXID:
> > +		val = GPR_ENET_QOS_INTF_SEL_RGMII;
> > +		break;
> > +	default:
> > +		pr_debug("imx dwmac doesn't support %d interface\n",
> > +			 plat_dat->interface);
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = imx_sc_misc_set_control(ipc_handle, IMX_SC_R_ENET_1,
> > +				      IMX_SC_C_INTF_SEL, val >> 16);
> > +	ret |= imx_sc_misc_set_control(ipc_handle, IMX_SC_R_ENET_1,
> > +				       IMX_SC_C_CLK_GEN_EN, 0x1);
> > 	return ret;
>
> These calls may return different errors AFAICT. You can't just OR the
> errno values together; the result will be meaningless.
> Please use the normal flow, and return the result of the second call
> directly:
>
> 	ret = func1();
> 	if (ret)
> 		return ret;
>
> 	return func2();
>
> Please also CC the maintainers of the Ethernet PHY subsystem on v2, to
> make sure there is nothing wrong with the patch from their PoV.

Thanks Jakub for your kind review, I will improve the patch following your comments.

Best Regards,
Joakim Zhang

> Thanks!
[PATCH bpf-next] xsk: Validate socket state in xsk_recvmsg, prior touching socket members
From: Björn Töpel

In AF_XDP the socket state needs to be checked prior to touching the
members of the socket. This was not the case for the recvmsg
implementation. Fix that by moving the xsk_is_bound() call.

Reported-by: kernel test robot
Fixes: 45a86681844e ("xsk: Add support for recvmsg()")
Signed-off-by: Björn Töpel
---
 net/xdp/xsk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 56c46e5f57bc..e28c6825e089 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -554,12 +554,12 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
+	if (unlikely(!xsk_is_bound(xs)))
+		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
 	if (unlikely(!xs->rx))
 		return -ENOBUFS;
-	if (unlikely(!xsk_is_bound(xs)))
-		return -ENXIO;
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;

base-commit: 34da87213d3ddd26643aa83deff7ffc6463da0fc
-- 
2.27.0
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
Herbert Xu wrote:
> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permit a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller. Maybe there is a way to reuse that here.
> > Herbert?
>
> Yes, this was one of the reasons I was pursuing the continuation
> work. It should allow us to kill the special case for CTS in the
> krb5 code.
>
> Hopefully I can get some time to restart work on this soon.

In the krb5 case, we know in advance how much data we're going to be
dealing with, if that helps.

David
[PATCH] net: tipc: prevent possible null deref of link
`tipc_node_apply_property` does a NULL check on a `tipc_link_entry`
pointer but also accesses the same pointer outside the NULL-check block.
This triggers a warning in the Coverity static analyzer because we are
implying that `e->link` can be NULL.

Move the "Update MTU for node link entry" line into the if block to make
sure we never reach a state where `e->link` is dereferenced while NULL.

Signed-off-by: Cengiz Can
---
 net/tipc/node.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index c95d037fde51..83978d5dae59 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -2181,9 +2181,11 @@ void tipc_node_apply_property(struct net *net, struct tipc_bearer *b,
 						&xmitq);
 		else if (prop == TIPC_NLA_PROP_MTU)
 			tipc_link_set_mtu(e->link, b->mtu);
+
+		/* Update MTU for node link entry */
+		e->mtu = tipc_link_mss(e->link);
 	}
-	/* Update MTU for node link entry */
-	e->mtu = tipc_link_mss(e->link);
+
 	tipc_node_write_unlock(n);
 	tipc_bearer_xmit(net, bearer_id, &xmitq, &e->maddr, NULL);
 }
-- 
2.29.2
Re: [PATCH net] udp: fix the proto value passed to ip_protocol_deliver_rcu for the segments
From: Xin Long
Date: Mon, 7 Dec 2020 15:55:40 +0800

> Guillaume noticed that: for segments udp_queue_rcv_one_skb() returns
> the proto, and it should pass "ret" unmodified to
> ip_protocol_deliver_rcu(). Otherwise, with a negative value passed, it
> will underflow inet_protos.
>
> This can be reproduced with IPIP FOU:
>
>   # ip fou add port ipproto 4
>   # ethtool -K eth1 rx-gro-list on
>
> Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
> Reported-by: Guillaume Nault
> Signed-off-by: Xin Long

Applied and queued up for -stable, thanks!
Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
On Sun, 2020-12-06 at 09:08 -0800, Richard Cochran wrote:
> On Sun, Dec 06, 2020 at 03:37:47PM +0200, Eran Ben Elisha wrote:
> > Adding a new enum to the ioctl means we have to add
> > (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY for example) all the way -
> > drivers, kernel ptp, user space ptp, ethtool.

Not exactly:

1) The flag name should be HWTSTAMP_TX_PTP_EVENTS, similar to what we
already have on the RX side, which will mean: HW-stamp all PTP events,
don't care about the rest.

2) There is no need to add it to drivers from the get-go; only drivers
that are interested may implement it, and I am sure there are plenty
that would like to have this flag if their HW timestamping
implementation is slow! Other drivers will just keep doing what they
are doing, timestamping all traffic even if the user requested this
flag, again exactly like many other drivers do for the RX flags
(hwtstamp_rx_filters).

> > My concerns are:
> >
> > 1. Timestamp applications (like ptp4l or similar) will have to add
> > support for configuring the driver to use
> > HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY, if supported, via ioctl prior to
> > packet transmit. From the application point of view, the dual-mode
> > (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY, HWTSTAMP_TX_ON) support is
> > redundant, as it offers nothing new.
>
> Well said.

I disagree; it is not a dual mode, it just allows the user better
granularity over what the HW stamps, exactly like what we have on RX.
We are not adding any new mechanism.

> > 2. Other vendors will have to support it as well, and it is not
> > clear what is expected from them if they cannot improve accuracy
> > between the modes.
>
> If there were multiple different devices out there with this kind of
> implementation (different levels of accuracy with increasing run time
> performance cost), then we could consider such a flag. However, to my
> knowledge, this feature is unique to your device.
I agree, but I never meant to have a flag that indicates two different
levels of accuracy; that would be a very wild mistake for sure! The new
flag will be about selecting the granularity of what gets a HW stamp and
what doesn't, aligning with the RX filter API.

> > This feature is just an internal enhancement, and as such it should
> > be added only as a vendor-private configuration flag. We are not
> > offering any standard here for others to follow.
>
> +1

Our driver feature is an internal enhancement, yes, but the suggested
flag is very far from indicating any internal enhancement. It is
actually an enhancement to the current API, and a very simple extension
with a wide range of improvements to all layers. Our driver can optimize
accuracy when this flag is set; other drivers might be happy to
implement it, since if they already have slow HW this flag would let
them get better TCP/UDP performance while still performing PTP HW
stamping; and some admins/apps will use it to avoid stamping all traffic
on TX. Win, win, win.
Re: [PATCH 1/1] ice: fix array overflow on receiving too many fragments for a packet
Hi Xiaohui,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tnguy-next-queue/dev-queue]
[also build test WARNING on v5.10-rc7 next-20201204]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Xiaohui-Zhang/ice-fix-array-overflow-on-receiving-too-many-fragments-for-a-packet/20201207-141033
base: https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
config: riscv-allyesconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b3906f69dcad641195cbf1ce9af3e9105a6f72e1
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xiaohui-Zhang/ice-fix-array-overflow-on-receiving-too-many-fragments-for-a-packet/20201207-141033
        git checkout b3906f69dcad641195cbf1ce9af3e9105a6f72e1
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

   In file included from include/vdso/processor.h:10,
                    from arch/riscv/include/asm/processor.h:11,
                    from include/linux/prefetch.h:15,
                    from drivers/net/ethernet/intel/ice/ice_txrx.c:6:
   arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
   arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
      14 |  barrier();
         |  ^~~
   drivers/net/ethernet/intel/ice/ice_txrx.c: In function 'ice_add_rx_frag':
>> drivers/net/ethernet/intel/ice/ice_txrx.c:828:2: warning: ISO C90 forbids mixed declarations
   and code [-Wdeclaration-after-statement]
     828 |  struct skb_shared_info *shinfo = skb_shinfo(skb);
         |  ^~
>> drivers/net/ethernet/intel/ice/ice_txrx.c:831:24: warning: passing argument 2 of 'skb_add_rx_frag' makes integer from pointer without a cast [-Wint-conversion]
     831 |   skb_add_rx_frag(skb, shinfo, rx_buf->page,
         |                        ^~
         |                        |
         |                        struct skb_shared_info *
   In file included from include/net/net_namespace.h:39,
                    from include/linux/netdevice.h:37,
                    from include/trace/events/xdp.h:8,
                    from include/linux/bpf_trace.h:5,
                    from drivers/net/ethernet/intel/ice/ice_txrx.c:8:
   include/linux/skbuff.h:2187:47: note: expected 'int' but argument is of type 'struct skb_shared_info *'
    2187 | void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
         |                                           ^
   cc1: some warnings being treated as errors

vim +828 drivers/net/ethernet/intel/ice/ice_txrx.c

   825
   826		if (!size)
   827			return;
 > 828		struct skb_shared_info *shinfo = skb_shinfo(skb);
   829
   830		if (shinfo->nr_frags < ARRAY_SIZE(shinfo->frags)) {
 > 831			skb_add_rx_frag(skb, shinfo, rx_buf->page,
   832					rx_buf->page_offset, size, truesize);
   833		}
   834
   835		/* page is being used so we must update the page offset */
   836		ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
   837	}
   838

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH bpf-next] xsk: Validate socket state in xsk_recvmsg, prior touching socket members
On Mon, Dec 7, 2020 at 9:22 AM Björn Töpel wrote:
>
> From: Björn Töpel
>
> In AF_XDP the socket state needs to be checked prior to touching the
> members of the socket. This was not the case for the recvmsg
> implementation. Fix that by moving the xsk_is_bound() call.
>
> Reported-by: kernel test robot
> Fixes: 45a86681844e ("xsk: Add support for recvmsg()")
> Signed-off-by: Björn Töpel
> ---
>  net/xdp/xsk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Magnus Karlsson

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 56c46e5f57bc..e28c6825e089 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -554,12 +554,12 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
> 	struct sock *sk = sock->sk;
> 	struct xdp_sock *xs = xdp_sk(sk);
>
> +	if (unlikely(!xsk_is_bound(xs)))
> +		return -ENXIO;
> 	if (unlikely(!(xs->dev->flags & IFF_UP)))
> 		return -ENETDOWN;
> 	if (unlikely(!xs->rx))
> 		return -ENOBUFS;
> -	if (unlikely(!xsk_is_bound(xs)))
> -		return -ENXIO;
> 	if (unlikely(need_wait))
> 		return -EOPNOTSUPP;
>
>
> base-commit: 34da87213d3ddd26643aa83deff7ffc6463da0fc
> --
> 2.27.0
>
[PATCH 1/1] xdp: avoid calling kfree twice
From: Zhu Yanjun

In xdp_umem_pin_pages(), if npgs != umem->npgs and npgs >= 0,
xdp_umem_unpin_pages() is called. That function calls kfree() on
umem->pgs; then, back in xdp_umem_pin_pages(), kfree() is called on
umem->pgs again. As a result, umem->pgs is freed twice.

Signed-off-by: Zhu Yanjun
---
 net/xdp/xdp_umem.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 56a28a686988..ff5173f72920 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -97,7 +97,6 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 {
 	unsigned int gup_flags = FOLL_WRITE;
 	long npgs;
-	int err;
 
 	umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
 			    GFP_KERNEL | __GFP_NOWARN);
@@ -112,20 +111,14 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 	if (npgs != umem->npgs) {
 		if (npgs >= 0) {
 			umem->npgs = npgs;
-			err = -ENOMEM;
-			goto out_pin;
+			xdp_umem_unpin_pages(umem);
+			return -ENOMEM;
 		}
-		err = npgs;
-		goto out_pgs;
+		kfree(umem->pgs);
+		umem->pgs = NULL;
+		return npgs;
 	}
 	return 0;
-
-out_pin:
-	xdp_umem_unpin_pages(umem);
-out_pgs:
-	kfree(umem->pgs);
-	umem->pgs = NULL;
-	return err;
 }
 
 static int xdp_umem_account_pages(struct xdp_umem *umem)
-- 
2.18.4
Re: [PATCH] net: stmmac: dwmac-meson8b: fix mask definition of the m250_sel mux
On Sat 05 Dec 2020 at 22:32, Martin Blumenstingl wrote:

> The m250_sel mux clock uses bit 4 in the PRG_ETH0 register. Fix this by
> shifting the PRG_ETH0_CLK_M250_SEL_MASK accordingly, as the "mask" in
> struct clk_mux expects the mask relative to the "shift" field in the
> same struct.
>
> While here, get rid of the PRG_ETH0_CLK_M250_SEL_SHIFT macro and use
> __ffs() to determine it from the existing PRG_ETH0_CLK_M250_SEL_MASK
> macro.
>
> Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
> Signed-off-by: Martin Blumenstingl

Reviewed-by: Jerome Brunet

> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> index dc0b8b6d180d..459ae715b33d 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> @@ -30,7 +30,6 @@
>  #define PRG_ETH0_EXT_RMII_MODE		4
>  
>  /* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
> -#define PRG_ETH0_CLK_M250_SEL_SHIFT	4
>  #define PRG_ETH0_CLK_M250_SEL_MASK	GENMASK(4, 4)
>  
>  /* TX clock delay in ns = "8ns / 4 * tx_dly_val" (where 8ns are exactly one
> @@ -155,8 +154,9 @@ static int meson8b_init_rgmii_tx_clk(struct meson8b_dwmac *dwmac)
>  		return -ENOMEM;
>  
>  	clk_configs->m250_mux.reg = dwmac->regs + PRG_ETH0;
> -	clk_configs->m250_mux.shift = PRG_ETH0_CLK_M250_SEL_SHIFT;
> -	clk_configs->m250_mux.mask = PRG_ETH0_CLK_M250_SEL_MASK;
> +	clk_configs->m250_mux.shift = __ffs(PRG_ETH0_CLK_M250_SEL_MASK);
> +	clk_configs->m250_mux.mask = PRG_ETH0_CLK_M250_SEL_MASK >>
> +				     clk_configs->m250_mux.shift;
>  	clk = meson8b_dwmac_register_clk(dwmac, "m250_sel", mux_parents,
>  					 ARRAY_SIZE(mux_parents), &clk_mux_ops,
>  					 &clk_configs->m250_mux.hw);
Re: [PATCH net] net: openvswitch: fix TTL decrement exception action execution
On 5 Dec 2020, at 1:30, Jakub Kicinski wrote:

> On Fri, 4 Dec 2020 07:16:23 -0500 Eelco Chaudron wrote:
>> Currently, the exception actions are not processed correctly as the
>> wrong dataset is passed. This change fixes this, including the
>> misleading comment.
>>
>> In addition, a check was added to make sure we work on an IPv4
>> packet, and not just assume that if it's not IPv6 it's IPv4.
>>
>> A small cleanup removes an unnecessary parameter from the
>> dec_ttl_exception_handler() function.
>
> No cleanups in fixes, please. Especially when we're at -rc6. You can
> clean this up in net-next within a week after the trees merge.

Ack, will undo the parameter removal and send out a v2.

>> Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action
>> netlink message format")
>
> :( and please add some info on how these changes are tested.

Will add the details to v2.
pull request (net): ipsec 2020-12-07
1) syzbot-reported fixes for the new 64/32-bit compat layer.
   From Dmitry Safonov.

2) Fix a memory leak in xfrm_user_policy that was introduced
   by adding the 64/32-bit compat layer. From Yu Kuai.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 4e0396c59559264442963b349ab71f66e471f84d:

  net: marvell: prestera: fix compilation with CONFIG_BRIDGE=m (2020-11-07 12:43:26 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to 48f486e13ffdb49fbb9b38c21d0e108ed60ab1a2:

  net: xfrm: fix memory leak in xfrm_user_policy() (2020-11-10 09:14:25 +0100)

----------------------------------------------------------------
Dmitry Safonov (3):
      xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
      xfrm/compat: memset(0) 64-bit padding at right place
      xfrm/compat: Don't allocate memory with __GFP_ZERO

Steffen Klassert (1):
      Merge branch 'xfrm/compat: syzbot-found fixes'

Yu Kuai (1):
      net: xfrm: fix memory leak in xfrm_user_policy()

 net/xfrm/xfrm_compat.c | 5 +++--
 net/xfrm/xfrm_state.c  | 4 +++-
 2 files changed, 6 insertions(+), 3 deletions(-)
[PATCH 3/4] xfrm/compat: Don't allocate memory with __GFP_ZERO
From: Dmitry Safonov

The 32-bit to 64-bit message translator zeroes the needed paddings
during the translation; the rest is the actual payload. Don't allocate
zeroed pages, as they are not needed.

Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Signed-off-by: Dmitry Safonov
Signed-off-by: Steffen Klassert
---
 net/xfrm/xfrm_compat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_compat.c b/net/xfrm/xfrm_compat.c
index 556e9f33b815..d8e8a11ca845 100644
--- a/net/xfrm/xfrm_compat.c
+++ b/net/xfrm/xfrm_compat.c
@@ -564,7 +564,7 @@ static struct nlmsghdr *xfrm_user_rcv_msg_compat(const struct nlmsghdr *h32,
 		return NULL;
 	len += NLMSG_HDRLEN;
 
-	h64 = kvmalloc(len, GFP_KERNEL | __GFP_ZERO);
+	h64 = kvmalloc(len, GFP_KERNEL);
 	if (!h64)
 		return ERR_PTR(-ENOMEM);
-- 
2.25.1
Re: [PATCH ipsec-next] xfrm: interface: support collect metadata mode
On Fri, Nov 27, 2020 at 02:32:44PM +0200, Eyal Birger wrote:
> Hi Steffen,
>
> On Fri, Nov 27, 2020 at 11:44 AM Steffen Klassert wrote:
> >
> > On Sat, Nov 21, 2020 at 04:28:23PM +0200, Eyal Birger wrote:
> > > This commit adds support for 'collect_md' mode on xfrm interfaces.
> > >
> > > Each net can have one collect_md device, created by providing the
> > > IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be
> > > altered and has no if_id or link device attributes.
> > >
> > > On transmit to this device, the if_id is fetched from the attached
> > > dst metadata on the skb. The dst metadata type used is
> > > METADATA_IP_TUNNEL, since the only needed property is the if_id
> > > stored in the tun_id member of the ip_tunnel_info->key.
> >
> > Can we please have a separate metadata type for xfrm interfaces?
> >
> > Sharing such structures already turned out to be a bad idea on vti
> > interfaces; let's try to avoid that mistake with xfrm interfaces.
>
> My initial thought was to do that, but it looks like most of the
> constructs surrounding this facility - tc, nft, ovs, ebpf, ip routing -
> are built around struct ip_tunnel_info and don't regard other possible
> metadata types.

That is likely because most objects that have a collect_md mode are
tunnels. We already have a second metadata type, and I don't see why we
can't have a third one. Maybe we can create something more generic so
that it can have other users too.

> For xfrm interfaces, the only metadata used is the if_id, which is
> stored in the metadata tun_id, so I think that other than naming
> considerations, the use of struct ip_tunnel_info does not imply
> tunneling and does not limit the use of xfrmi to a specific mode of
> operation.

I agree that this can work, but it is a first step in the wrong
direction. Using a __be64 field of a completely unrelated structure as a
u32 if_id is bad style IMO.
> On the other hand, adding a new metadata type would require changing
> all the other places to regard the new metadata type, with a large
> number of userspace-visible changes.

I admit that this might have some disadvantages too, but I'm not
convinced that this justifies the 'ip_tunnel_info' hack.
[PATCH 1/4] xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
From: Dmitry Safonov

xfrm_xlate32() translates a 64-bit message provided by the kernel to be
sent to a 32-bit listener (acknowledge or monitor). The translator code
doesn't expect an XFRMA_UNSPEC attribute, as it doesn't know its
payload. The kernel never attaches such an attribute, but a user can.

I've searched whether any open-source project does this, and the answer
is no. Nothing turns up on GitHub, and Google finds only tfcproject,
which has such code commented out.

What will happen if a user sends a netlink message with an XFRMA_UNSPEC
attribute? The ipsec code ignores this attribute. But if there is a
monitor process, or a 32-bit user requested an ack, the kernel will try
to translate such a message and will hit WARN_ONCE() in
xfrm_xlate64_attr().

Deal with XFRMA_UNSPEC by copying the attribute payload with
xfrm_nla_cpy(). As a result, the default switch case in
xfrm_xlate64_attr() becomes unused code. Leave those 3 lines in case a
new xfrm attribute is ever added.

Fixes: 5461fc0c8d9f ("xfrm/compat: Add 64=>32-bit messages translator")
Reported-by: syzbot+a7e701c8385bd8543...@syzkaller.appspotmail.com
Signed-off-by: Dmitry Safonov
Signed-off-by: Steffen Klassert
---
 net/xfrm/xfrm_compat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_compat.c b/net/xfrm/xfrm_compat.c
index e28f0c9ecd6a..17edbf935e35 100644
--- a/net/xfrm/xfrm_compat.c
+++ b/net/xfrm/xfrm_compat.c
@@ -234,6 +234,7 @@ static int xfrm_xlate64_attr(struct sk_buff *dst, const struct nlattr *src)
 	case XFRMA_PAD:
 		/* Ignore */
 		return 0;
+	case XFRMA_UNSPEC:
 	case XFRMA_ALG_AUTH:
 	case XFRMA_ALG_CRYPT:
 	case XFRMA_ALG_COMP:
-- 
2.25.1
[PATCH 2/4] xfrm/compat: memset(0) 64-bit padding at right place
From: Dmitry Safonov

32-bit messages translated by xfrm_compat can have attributes attached.
For all but XFRMA_SA and XFRMA_POLICY, the size of the payload is the
same in the 32-bit UABI and the 64-bit UABI. For XFRMA_SA (struct
xfrm_usersa_info) and XFRMA_POLICY (struct xfrm_userpolicy_info) it is
only tail padding that is present in the 64-bit payload but not in the
32-bit one.

The proper size for the destination nlattr is already calculated by
xfrm_user_rcv_calculate_len64() and allocated with kvmalloc().
xfrm_attr_cpy32() copies the 32-bit copy_len into the 64-bit translated
attribute payload, zero-filling the possible padding for SA/POLICY.

Due to a typo, *pos already holds the 64-bit payload size, and as a
result the following memset(0) is called on the memory after the
translated attribute, not on its tail padding.

Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Reported-by: syzbot+c43831072e7df506a...@syzkaller.appspotmail.com
Signed-off-by: Dmitry Safonov
Signed-off-by: Steffen Klassert
---
 net/xfrm/xfrm_compat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_compat.c b/net/xfrm/xfrm_compat.c
index 17edbf935e35..556e9f33b815 100644
--- a/net/xfrm/xfrm_compat.c
+++ b/net/xfrm/xfrm_compat.c
@@ -388,7 +388,7 @@ static int xfrm_attr_cpy32(void *dst, size_t *pos, const struct nlattr *src,
 	memcpy(nla, src, nla_attr_size(copy_len));
 	nla->nla_len = nla_attr_size(payload);
-	*pos += nla_attr_size(payload);
+	*pos += nla_attr_size(copy_len);
 	nlmsg->nlmsg_len += nla->nla_len;
 
 	memset(dst + *pos, 0, payload - copy_len);
-- 
2.25.1
[PATCH 4/4] net: xfrm: fix memory leak in xfrm_user_policy()
From: Yu Kuai

If xfrm_get_translator() fails, xfrm_user_policy() returns without
freeing 'data', which was allocated in memdup_sockptr().

Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
Reported-by: Hulk Robot
Signed-off-by: Yu Kuai
Signed-off-by: Steffen Klassert
---
 net/xfrm/xfrm_state.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index a77da7aae6fe..2f1517827995 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2382,8 +2382,10 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
 	if (in_compat_syscall()) {
 		struct xfrm_translator *xtr = xfrm_get_translator();
 
-		if (!xtr)
+		if (!xtr) {
+			kfree(data);
 			return -EOPNOTSUPP;
+		}
 
 		err = xtr->xlate_user_policy_sockptr(&data, optlen);
 		xfrm_put_translator(xtr);
-- 
2.25.1
Re: [PATCH v1 1/5] Bluetooth: advmon offload MSFT add rssi support
Hi Archie,

> MSFT needs the rssi parameters for monitoring advertisement packets,
> therefore we should supply them from mgmt.
>
> Signed-off-by: Archie Pusaka
> Reviewed-by: Miao-chen Chou
> Reviewed-by: Yun-Hao Chung

I don't need any Reviewed-by if they are not catching an obvious user
API breakage.

> ---
>
>  include/net/bluetooth/hci_core.h | 9 +++++++++
>  include/net/bluetooth/mgmt.h     | 9 +++++++++
>  net/bluetooth/mgmt.c             | 8 ++++++++
>  3 files changed, 26 insertions(+)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index 9873e1c8cd16..42d446417817 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -246,8 +246,17 @@ struct adv_pattern {
>  	__u8 value[HCI_MAX_AD_LENGTH];
>  };
>  
> +struct adv_rssi_thresholds {
> +	__s8 low_threshold;
> +	__s8 high_threshold;
> +	__u16 low_threshold_timeout;
> +	__u16 high_threshold_timeout;
> +	__u8 sampling_period;
> +};
> +
>  struct adv_monitor {
>  	struct list_head patterns;
> +	struct adv_rssi_thresholds rssi;
>  	bool active;
>  	__u16 handle;
>  };
>
> diff --git a/include/net/bluetooth/mgmt.h b/include/net/bluetooth/mgmt.h
> index d8367850e8cd..dc534837be0e 100644
> --- a/include/net/bluetooth/mgmt.h
> +++ b/include/net/bluetooth/mgmt.h
> @@ -763,9 +763,18 @@ struct mgmt_adv_pattern {
>  	__u8 value[31];
>  } __packed;
>  
> +struct mgmt_adv_rssi_thresholds {
> +	__s8 high_threshold;
> +	__le16 high_threshold_timeout;
> +	__s8 low_threshold;
> +	__le16 low_threshold_timeout;
> +	__u8 sampling_period;
> +} __packed;
> +
>  #define MGMT_OP_ADD_ADV_PATTERNS_MONITOR	0x0052
>  struct mgmt_cp_add_adv_patterns_monitor {
>  	__u8 pattern_count;
> +	struct mgmt_adv_rssi_thresholds rssi;
>  	struct mgmt_adv_pattern patterns[];
>  } __packed;

This is something we can not do. It breaks a userspace-facing API.

Is the mgmt opcode 0x0052 in an already released kernel?

>>> Yes, the opcode does exist in an already released kernel.
>>> The DBus method which accesses this API is put behind the experimental
>>> flag, therefore we expect they are flexible enough to support changes.
>>> Previously, we already had a discussion in an email thread with the
>>> title "Offload RSSI tracking to controller", and the outcome supports
>>> this change.
>>>
>>> Here is an excerpt of the discussion.
>>
>> It doesn't matter. This is fixed API now and so we can not just change
>> it. The argument above is void. What matters is whether it is in an
>> already released kernel.
>
> If that is the case, do you have a suggestion to allow RSSI to be
> considered when monitoring advertisement? Would a new MGMT opcode with
> these parameters suffice?

It's the only way.

Regards

Marcel
Re: [PATCH net-next v2 1/4] vm_sockets: Include flags field in the vsock address data structure
On Fri, Dec 04, 2020 at 07:02:32PM +0200, Andra Paraschiv wrote: vsock enables communication between virtual machines and the host they are running on. With the multi transport support (guest->host and host->guest), nested VMs can also use vsock channels for communication. In addition to this, by default, all the vsock packets are forwarded to the host, if no host->guest transport is loaded. This behavior can be implicitly used for enabling vsock communication between sibling VMs. Add a flags field in the vsock address data structure that can be used to explicitly mark the vsock connection as being targeted for a certain type of communication. This way, can distinguish between different use cases such as nested VMs and sibling VMs. Use the already available "svm_reserved1" field and mark it as a flags field instead. This field can be set when initializing the vsock address variable used for the connect() call. Changelog v1 -> v2 * Update the field name to "svm_flags". * Split the current patch in 2 patches. Usually the changelog goes after the 3 dashes, but I'm not sure there is a strict rule :-) Anyway the patch LGTM: Reviewed-by: Stefano Garzarella Signed-off-by: Andra Paraschiv --- include/uapi/linux/vm_sockets.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index fd0ed7221645d..46735376a57a8 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -145,7 +145,7 @@ struct sockaddr_vm { __kernel_sa_family_t svm_family; - unsigned short svm_reserved1; + unsigned short svm_flags; unsigned int svm_port; unsigned int svm_cid; unsigned char svm_zero[sizeof(struct sockaddr) - -- 2.20.1 (Apple Git-117) Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.
Re: [PATCH net-next v2 2/4] vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
On Fri, Dec 04, 2020 at 07:02:33PM +0200, Andra Paraschiv wrote: Add VMADDR_FLAG_TO_HOST vsock flag that is used to setup a vsock connection where all the packets are forwarded to the host. Then, using this type of vsock channel, vsock communication between sibling VMs can be built on top of it. Changelog v1 -> v2 * New patch in v2, it was split from the first patch in the series. * Remove the default value for the vsock flags field. * Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST". Signed-off-by: Andra Paraschiv --- include/uapi/linux/vm_sockets.h | 15 +++ 1 file changed, 15 insertions(+) diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index 46735376a57a8..72e1a3d05682d 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -114,6 +114,21 @@ #define VMADDR_CID_HOST 2 +/* The current default use case for the vsock channel is the following: + * local vsock communication between guest and host and nested VMs setup. + * In addition to this, implicitly, the vsock packets are forwarded to the host + * if no host->guest vsock transport is set. + * + * Set this flag value in the sockaddr_vm corresponding field if the vsock + * packets need to be always forwarded to the host. Using this behavior, + * vsock communication between sibling VMs can be setup. Maybe we can add a sentence saying that this flag is set on the remote peer address for an incoming connection when it is routed from the host (local CID and remote CID > VMADDR_CID_HOST). + * + * This way can explicitly distinguish between vsock channels created for + * different use cases, such as nested VMs (or local communication between + * guest and host) and sibling VMs. + */ +#define VMADDR_FLAG_TO_HOST 0x0001 + /* Invalid vSockets version. */ #define VM_SOCKETS_INVALID_VERSION -1U -- 2.20.1 (Apple Git-117) Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. 
Re: [PATCH net-next v2 4/4] af_vsock: Assign the vsock transport considering the vsock address flags
On Fri, Dec 04, 2020 at 07:02:35PM +0200, Andra Paraschiv wrote: The vsock flags field can be set in the connect and (listen) receive paths. When the vsock transport is assigned, the remote CID is used to distinguish between types of connection. Use the vsock flags value (in addition to the CID) from the remote address to decide which vsock transport to assign. For the sibling VMs use case, all the vsock packets need to be forwarded to the host, so always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag is set. For the other use cases, the vsock transport assignment logic is not changed. Changelog v1 -> v2 * Use bitwise operator to check the vsock flag. * Use the updated "VMADDR_FLAG_TO_HOST" flag naming. * Merge the checks for the g2h transport assignment in one "if" block. Signed-off-by: Andra Paraschiv --- net/vmw_vsock/af_vsock.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 83d035eab0b05..66e643c3b5f85 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -421,7 +421,8 @@ static void vsock_deassign_transport(struct vsock_sock *vsk) * The vsk->remote_addr is used to decide which transport to use: * - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if *g2h is not loaded, will use local transport; - * - remote CID <= VMADDR_CID_HOST will use guest->host transport; + * - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field + *includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport; * - remote CID > VMADDR_CID_HOST will use host->guest transport; */ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) @@ -429,6 +430,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) const struct vsock_transport *new_transport; struct sock *sk = sk_vsock(vsk); unsigned int remote_cid = vsk->remote_addr.svm_cid; + unsigned short remote_flags; int ret; /* If the 
packet is coming with the source and destination CIDs higher @@ -443,6 +445,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) vsk->remote_addr.svm_cid > VMADDR_CID_HOST) vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST; + remote_flags = vsk->remote_addr.svm_flags; + switch (sk->sk_type) { case SOCK_DGRAM: new_transport = transport_dgram; @@ -450,7 +454,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) case SOCK_STREAM: if (vsock_use_local_transport(remote_cid)) new_transport = transport_local; - else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g) + else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g || +(remote_flags & VMADDR_FLAG_TO_HOST) == VMADDR_FLAG_TO_HOST) Maybe "remote_flags & VMADDR_FLAG_TO_HOST" should be enough, but the patch is okay: Reviewed-by: Stefano Garzarella new_transport = transport_g2h; else new_transport = transport_h2g; -- 2.20.1 (Apple Git-117) Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.
Re: [PATCH net-next v2 3/4] af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
On Fri, Dec 04, 2020 at 07:02:34PM +0200, Andra Paraschiv wrote: The vsock flags can be set during the connect() setup logic, when initializing the vsock address data structure variable. Then the vsock transport is assigned, also considering this flags field. The vsock transport is also assigned on the (listen) receive path. The flags field needs to be set considering the use case. Set the value of the vsock flags of the remote address to the one targeted for packets forwarding to the host, if the following conditions are met: * The source CID of the packet is higher than VMADDR_CID_HOST. * The destination CID of the packet is higher than VMADDR_CID_HOST. Changelog v1 -> v2 * Set the vsock flag on the receive path in the vsock transport assignment logic. * Use bitwise operator for the vsock flag setup. * Use the updated "VMADDR_FLAG_TO_HOST" flag naming. Signed-off-by: Andra Paraschiv --- net/vmw_vsock/af_vsock.c | 12 1 file changed, 12 insertions(+) Reviewed-by: Stefano Garzarella diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index d10916ab45267..83d035eab0b05 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -431,6 +431,18 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) unsigned int remote_cid = vsk->remote_addr.svm_cid; int ret; + /* If the packet is coming with the source and destination CIDs higher +* than VMADDR_CID_HOST, then a vsock channel where all the packets are +* forwarded to the host should be established. Then the host will +* need to forward the packets to the guest. +* +* The flag is set on the (listen) receive path (psk is not NULL). On +* the connect path the flag can be set by the user space application. 
+*/ + if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST && + vsk->remote_addr.svm_cid > VMADDR_CID_HOST) + vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST; + switch (sk->sk_type) { case SOCK_DGRAM: new_transport = transport_dgram; -- 2.20.1 (Apple Git-117) Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.
Re: [PATCH net-next v2 0/4] vsock: Add flags field in the vsock address
Hi Andra, On Fri, Dec 04, 2020 at 07:02:31PM +0200, Andra Paraschiv wrote: vsock enables communication between virtual machines and the host they are running on. Nested VMs can be setup to use vsock channels, as the multi transport support has been available in the mainline since the v5.5 Linux kernel has been released. Implicitly, if no host->guest vsock transport is loaded, all the vsock packets are forwarded to the host. This behavior can be used to setup communication channels between sibling VMs that are running on the same host. One example can be the vsock channels that can be established within AWS Nitro Enclaves (see Documentation/virt/ne_overview.rst). To be able to explicitly mark a connection as being used for a certain use case, add a flags field in the vsock address data structure. The "svm_reserved1" field has been repurposed to be the flags field. The value of the flags will then be taken into consideration when the vsock transport is assigned. This way can distinguish between different use cases, such as nested VMs / local communication and sibling VMs. the series seems in a good shape, I left some minor comments. I run my test suite (vsock_test, iperf3, nc) with nested VMs (QEMU/KVM), and everything looks good. Note: I'll be offline today and tomorrow, so I may miss followups. Thanks, Stefano
[PATCH net v2] net: openvswitch: fix TTL decrement exception action execution
Currently, the exception actions are not processed correctly as the
wrong dataset is passed. This change fixes this, including the
misleading comment.

In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume that if it's not IPv6 it's IPv4.

This was all tested using OVS with the patch
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639
applied, sending packets with a TTL of 1 (and 0), both with IPv4 and
IPv6.

Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron
---
v2: - Undid unnecessary parameter removal from dec_ttl_exception_handler()
    - Updated commit message to include testing information.

 net/openvswitch/actions.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 5829a020b81c..ace69777cb29 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -956,16 +956,13 @@ static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
 				     struct sw_flow_key *key,
 				     const struct nlattr *attr, bool last)
 {
-	/* The first action is always 'OVS_DEC_TTL_ATTR_ARG'. */
-	struct nlattr *dec_ttl_arg = nla_data(attr);
+	/* The first attribute is always 'OVS_DEC_TTL_ATTR_ACTION'. */
+	struct nlattr *actions = nla_data(attr);
 
-	if (nla_len(dec_ttl_arg)) {
-		struct nlattr *actions = nla_data(dec_ttl_arg);
+	if (nla_len(actions))
+		return clone_execute(dp, skb, key, 0, nla_data(actions),
+				     nla_len(actions), last, false);
 
-		if (actions)
-			return clone_execute(dp, skb, key, 0, nla_data(actions),
-					     nla_len(actions), last, false);
-	}
 	consume_skb(skb);
 	return 0;
 }
@@ -1209,7 +1206,7 @@ static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
 			return -EHOSTUNREACH;
 
 		key->ip.ttl = --nh->hop_limit;
-	} else {
+	} else if (skb->protocol == htons(ETH_P_IP)) {
 		struct iphdr *nh;
 		u8 old_ttl;
Re: [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions
Hi Marcelo, On 12/1/2020 11:41 PM, Saeed Mahameed wrote: On Fri, 2020-11-27 at 11:01 -0300, Marcelo Ricardo Leitner wrote: On Wed, Sep 23, 2020 at 03:48:18PM -0700, sa...@kernel.org wrote: From: Oz Shlomo Sorry for reviving this one, but seemed better for the context. A connection is represented by two 5-tuple entries, one for each direction. Currently, each direction allocates its own hw counter, which is inefficient as ct aging is managed per connection. Share the counter that was allocated for the original direction with the reverse direction. Yes, aging is done per connection, but the stats are not. With this patch, with netperf TCP_RR test, I get this: (mangled for readability) # grep 172.0.0.4 /proc/net/nf_conntrack ipv4 2 tcp 6 src=172.0.0.3 dst=172.0.0.4 sport=34018 dport=33396 packets=3941992 bytes=264113427 src=172.0.0.4 dst=172.0.0.3 sport=33396 dport=34018 packets=4 bytes=218 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3 while without it (594e31bceb + act_ct patch to enable it posted yesterday + revert), I get: # grep 172.0.0.4 /proc/net/nf_conntrack ipv4 2 tcp 6 src=172.0.0.3 dst=172.0.0.4 sport=41856 dport=32776 packets=1876763 bytes=125743084 src=172.0.0.4 dst=172.0.0.3 sport=32776 dport=41856 packets=1876761 bytes=125742951 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3 The same is visible on 'ovs-appctl dpctl/dump-conntrack -s' then. Summing both directions in one like this is at least very misleading. Seems this change was motivated only by hw resources constrains. That said, I'm wondering, can this change be reverted somehow? Marcelo Hi Marcelo, thanks for the report, Sorry i am not familiar with this /procfs Oz, Ariel, Roi, what is your take on this, it seems that we changed the behavior of stats incorrectly. Indeed we overlooked the CT accounting extension. We will submit a driver fix. Thanks, Saeed.
Re: pull-request: wireless-drivers-next-2020-12-03
Jakub Kicinski writes:

> On Thu, 3 Dec 2020 18:57:32 +0000 (UTC) Kalle Valo wrote:
>> wireless-drivers-next patches for v5.11
>>
>> First set of patches for v5.11. rtw88 getting improvements to work
>> better with Bluetooth, and other drivers also getting some new
>> features. The mhi-ath11k-immutable branch was pulled from the mhi
>> tree to avoid conflicts with the mhi tree.
>
> Pulled, but there are a lot of fixes in here which look like they
> should have been part of the other PR, if you ask me.

Yeah, I'm actually on purpose keeping the bar high for patches going to
wireless-drivers (i.e. the fixes going to -rc releases). This is just to
keep things simple for me and to reduce the number of conflicts between
the trees.

> There's also a patch which looks like it renames a module parameter.
> Module parameters are considered uAPI.

Ah, I have actually been wondering whether they are part of the user
space API or not, good to know that they are. I'll keep an eye on this
in the future so that we don't break the uAPI with module parameter
changes.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCH V3 0/5] patches for stmmac
A patch set for stmmac, fixing some driver issues.

ChangeLogs:
V1->V2:
	* add Fixes tag.
	* add patch 5/5 into this patch set.
V2->V3:
	* rebase to latest net tree where fixes go.

Fugang Duan (5):
  net: stmmac: increase the timeout for dma reset
  net: stmmac: start phylink instance before stmmac_hw_setup()
  net: stmmac: free tx skb buffer in stmmac_resume()
  net: stmmac: delete the eee_ctrl_timer after napi disabled
  net: stmmac: overwrite the dma_cap.addr64 according to HW design

 .../net/ethernet/stmicro/stmmac/dwmac-imx.c   |  9 +---
 .../net/ethernet/stmicro/stmmac/dwmac4_lib.c  |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 51 +++
 include/linux/stmmac.h                        |  1 +
 4 files changed, 43 insertions(+), 20 deletions(-)

-- 
2.17.1
[PATCH V3 1/5] net: stmmac: increase the timeout for dma reset
From: Fugang Duan

The current timeout value is not enough for the gmac5 dma reset on the
i.MX8MP platform, so increase the timeout range.

Signed-off-by: Fugang Duan
Signed-off-by: Joakim Zhang
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
index 6e30d7eb4983..0b4ee2dbb691 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
@@ -22,7 +22,7 @@ int dwmac4_dma_reset(void __iomem *ioaddr)
 
 	return readl_poll_timeout(ioaddr + DMA_BUS_MODE, value,
 				  !(value & DMA_BUS_MODE_SFT_RESET),
-				  1, 10);
+				  1, 100);
 }
 
 void dwmac4_set_rx_tail_ptr(void __iomem *ioaddr, u32 tail_ptr, u32 chan)
-- 
2.17.1
[PATCH V3 3/5] net: stmmac: free tx skb buffer in stmmac_resume()
From: Fugang Duan

When doing suspend/resume tests, there is a WARN_ON() log dump from the
stmmac_xmit() function. The code logic:

	entry = tx_q->cur_tx;
	first_entry = entry;
	WARN_ON(tx_q->tx_skbuff[first_entry]);

In the normal case, tx_q->tx_skbuff[txq->cur_tx] should be NULL because
the skb should be handled and freed in stmmac_tx_clean().

But stmmac_resume() resets the queue parameters as below, so skb buffers
may not be freed:

	tx_q->cur_tx = 0;
	tx_q->dirty_tx = 0;

So free the tx skb buffers in stmmac_resume() to avoid the warning and
a memory leak.

log:
[ 46.139824] [ cut here ]
[ 46.144453] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3235 stmmac_xmit+0x7a0/0x9d0
[ 46.154969] Modules linked in: crct10dif_ce vvcam(O) flexcan can_dev
[ 46.161328] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.4.24-2.1.0+g2ad925d15481 #1
[ 46.170369] Hardware name: NXP i.MX8MPlus EVK board (DT)
[ 46.175677] pstate: 8005 (Nzcv daif -PAN -UAO)
[ 46.180465] pc : stmmac_xmit+0x7a0/0x9d0
[ 46.184387] lr : dev_hard_start_xmit+0x94/0x158
[ 46.188913] sp : 800010003cc0
[ 46.192224] x29: 800010003cc0 x28: 000177e2a100
[ 46.197533] x27: 000176ef0840 x26: 000176ef0090
[ 46.202842] x25: x24:
[ 46.208151] x23: 0003 x22: 8000119ddd30
[ 46.213460] x21: 00017636f000 x20: 000176ef0cc0
[ 46.218769] x19: 0003 x18:
[ 46.224078] x17: x16:
[ 46.229386] x15: 0079 x14:
[ 46.234695] x13: 0003 x12: 0003
[ 46.240003] x11: 0010 x10: 0010
[ 46.245312] x9 : 00017002b140 x8 :
[ 46.250621] x7 : 00017636f000 x6 : 0010
[ 46.255930] x5 : 0001 x4 : 000176ef
[ 46.261238] x3 : 0003 x2 :
[ 46.266547] x1 : 000177e2a000 x0 :
[ 46.271856] Call trace:
[ 46.274302] stmmac_xmit+0x7a0/0x9d0
[ 46.277874] dev_hard_start_xmit+0x94/0x158
[ 46.282056] sch_direct_xmit+0x11c/0x338
[ 46.285976] __qdisc_run+0x118/0x5f0
[ 46.289549] net_tx_action+0x110/0x198
[ 46.293297] __do_softirq+0x120/0x23c
[ 46.296958] irq_exit+0xb8/0xd8
[ 46.300098] __handle_domain_irq+0x64/0xb8
[ 46.304191] gic_handle_irq+0x5c/0x148
[ 46.307936] el1_irq+0xb8/0x180
[ 46.311076] cpuidle_enter_state+0x84/0x360
[ 46.315256] cpuidle_enter+0x34/0x48
[ 46.318829] call_cpuidle+0x18/0x38
[ 46.322314] do_idle+0x1e0/0x280
[ 46.325539] cpu_startup_entry+0x24/0x40
[ 46.329460] rest_init+0xd4/0xe0
[ 46.332687] arch_call_rest_init+0xc/0x14
[ 46.336695] start_kernel+0x420/0x44c
[ 46.340353] ---[ end trace bc1ee695123cbacd ]---

Fixes: 47dd7a540b8a0 ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Fugang Duan
Signed-off-by: Joakim Zhang
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 0cef414f1289..7452f3c1cab9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1533,6 +1533,19 @@ static void dma_free_tx_skbufs(struct stmmac_priv *priv, u32 queue)
 		stmmac_free_tx_buffer(priv, queue, i);
 }
 
+/**
+ * stmmac_free_tx_skbufs - free TX skb buffers
+ * @priv: private structure
+ */
+static void stmmac_free_tx_skbufs(struct stmmac_priv *priv)
+{
+	u32 tx_queue_cnt = priv->plat->tx_queues_to_use;
+	u32 queue;
+
+	for (queue = 0; queue < tx_queue_cnt; queue++)
+		dma_free_tx_skbufs(priv, queue);
+}
+
 /**
  * free_dma_rx_desc_resources - free RX dma desc resources
  * @priv: private structure
@@ -5260,6 +5273,7 @@ int stmmac_resume(struct device *dev)
 
 	stmmac_reset_queues_param(priv);
 
+	stmmac_free_tx_skbufs(priv);
 	stmmac_clear_descriptors(priv);
 
 	stmmac_hw_setup(ndev, false);
-- 
2.17.1
[PATCH V3 2/5] net: stmmac: start phylink instance before stmmac_hw_setup()
From: Fugang Duan

Start the phylink instance and resume the PHY, so that it supplies the
RX clock to the MAC, before the MAC layer initialization in
stmmac_hw_setup(), since the DMA reset depends on the RX clock;
otherwise the DMA reset takes the maximum timeout value and then
finally times out.

Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic")
Signed-off-by: Fugang Duan
Signed-off-by: Joakim Zhang
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ba45fe237512..0cef414f1289 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5247,6 +5247,14 @@ int stmmac_resume(struct device *dev)
 			return ret;
 	}
 
+	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
+		rtnl_lock();
+		phylink_start(priv->phylink);
+		/* We may have called phylink_speed_down before */
+		phylink_speed_up(priv->phylink);
+		rtnl_unlock();
+	}
+
 	rtnl_lock();
 	mutex_lock(&priv->lock);
 
@@ -5265,14 +5273,6 @@ int stmmac_resume(struct device *dev)
 	mutex_unlock(&priv->lock);
 	rtnl_unlock();
 
-	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
-		rtnl_lock();
-		phylink_start(priv->phylink);
-		/* We may have called phylink_speed_down before */
-		phylink_speed_up(priv->phylink);
-		rtnl_unlock();
-	}
-
 	phylink_mac_change(priv->phylink, true);
 
 	netif_device_attach(ndev);
-- 
2.17.1
[PATCH V3 5/5] net: stmmac: overwrite the dma_cap.addr64 according to HW design
From: Fugang Duan

The current IP register MAC_HW_Feature1[ADDR64] only defines 32/40/64
bit widths, but some SoCs support other widths: for example, i.MX8MP
supports 34 bits, which maps to the 40-bit width in
MAC_HW_Feature1[ADDR64]. So overwrite dma_cap.addr64 according to the
real HW design.

Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
Signed-off-by: Fugang Duan
Signed-off-by: Joakim Zhang
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c   | 9 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8
 include/linux/stmmac.h                            | 1 +
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c
index efef5476a577..223f69da7e95 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c
@@ -246,13 +246,7 @@ static int imx_dwmac_probe(struct platform_device *pdev)
 		goto err_parse_dt;
 	}
 
-	ret = dma_set_mask_and_coherent(&pdev->dev,
-					DMA_BIT_MASK(dwmac->ops->addr_width));
-	if (ret) {
-		dev_err(&pdev->dev, "DMA mask set failed\n");
-		goto err_dma_mask;
-	}
-
+	plat_dat->addr64 = dwmac->ops->addr_width;
 	plat_dat->init = imx_dwmac_init;
 	plat_dat->exit = imx_dwmac_exit;
 	plat_dat->fix_mac_speed = imx_dwmac_fix_speed;
@@ -272,7 +266,6 @@ static int imx_dwmac_probe(struct platform_device *pdev)
 err_dwmac_init:
 err_drv_probe:
 	imx_dwmac_exit(pdev, plat_dat->bsp_priv);
-err_dma_mask:
 err_parse_dt:
 err_match_data:
 	stmmac_remove_config_dt(pdev, plat_dat);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d2521ebb8217..c33db79cdd0a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4945,6 +4945,14 @@ int stmmac_dvr_probe(struct device *device,
 		dev_info(priv->device, "SPH feature enabled\n");
 	}
 
+	/* The current IP register MAC_HW_Feature1[ADDR64] only defines
+	 * 32/40/64 bit widths, but some SoCs support others, like i.MX8MP,
+	 * which supports 34 bits but maps to the 40-bit width in
+	 * MAC_HW_Feature1[ADDR64]. So overwrite dma_cap.addr64 according
+	 * to the real HW design.
+	 */
+	if (priv->plat->addr64)
+		priv->dma_cap.addr64 = priv->plat->addr64;
+
 	if (priv->dma_cap.addr64) {
 		ret = dma_set_mask_and_coherent(device,
 						DMA_BIT_MASK(priv->dma_cap.addr64));
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 628e28903b8b..15ca6b4167cc 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -170,6 +170,7 @@ struct plat_stmmacenet_data {
 	int unicast_filter_entries;
 	int tx_fifo_size;
 	int rx_fifo_size;
+	u32 addr64;
 	u32 rx_queues_to_use;
 	u32 tx_queues_to_use;
 	u8 rx_sched_algorithm;
-- 
2.17.1
[PATCH V3 4/5] net: stmmac: delete the eee_ctrl_timer after napi disabled
From: Fugang Duan

There is a chance that the eee_ctrl_timer is re-enabled and fired in the
napi callback after it has been deleted in stmmac_release(). The timer
function then accesses EEE registers after the clocks are disabled,
which causes a system hang. This issue was found when doing
suspend/resume and reboot stress tests.

It is safe to delete the timer after napi is disabled and LPI mode is
disabled.

Fixes: d765955d2ae0b ("stmmac: add the Energy Efficient Ethernet support")
Signed-off-by: Fugang Duan
Signed-off-by: Joakim Zhang
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7452f3c1cab9..d2521ebb8217 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2908,9 +2908,6 @@ static int stmmac_release(struct net_device *dev)
 	struct stmmac_priv *priv = netdev_priv(dev);
 	u32 chan;
 
-	if (priv->eee_enabled)
-		del_timer_sync(&priv->eee_ctrl_timer);
-
 	if (device_may_wakeup(priv->device))
 		phylink_speed_down(priv->phylink, false);
 	/* Stop and disconnect the PHY */
@@ -2929,6 +2926,11 @@ static int stmmac_release(struct net_device *dev)
 	if (priv->lpi_irq > 0)
 		free_irq(priv->lpi_irq, dev);
 
+	if (priv->eee_enabled) {
+		priv->tx_path_in_lpi_mode = false;
+		del_timer_sync(&priv->eee_ctrl_timer);
+	}
+
 	/* Stop TX/RX DMA and clear the descriptors */
 	stmmac_stop_all_dma(priv);
 
@@ -5155,6 +5157,11 @@ int stmmac_suspend(struct device *dev)
 	for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
 		del_timer_sync(&priv->tx_queue[chan].txtimer);
 
+	if (priv->eee_enabled) {
+		priv->tx_path_in_lpi_mode = false;
+		del_timer_sync(&priv->eee_ctrl_timer);
+	}
+
 	/* Stop TX/RX DMA */
 	stmmac_stop_all_dma(priv);
-- 
2.17.1
Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
On 12/7/2020 10:37 AM, Saeed Mahameed wrote: On Sun, 2020-12-06 at 09:08 -0800, Richard Cochran wrote: On Sun, Dec 06, 2020 at 03:37:47PM +0200, Eran Ben Elisha wrote: Adding new enum to the ioctl means we have add (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY for example) all the way - drivers, kernel ptp, user space ptp, ethtool. Not exactly, 1) the flag name should be HWTSTAMP_TX_PTP_EVENTS, similar to what we already have in RX, which will mean: HW stamp all PTP events, don't care about the rest. 2) no need to add it to drivers from the get go, only drivers who are interested may implement it, and i am sure there are tons who would like to have this flag if their hw timestamping implementation is slow ! other drivers will just keep doing what they are doing, timestamp all traffic even if user requested this flag, again exactly like many other drivers do for RX flags (hwtstamp_rx_filters). My concerns are: 1. Timestamp applications (like ptp4l or similar) will have to add support for configuring the driver to use HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY if supported via ioctl prior to packets transmit. From application point of view, the dual-modes (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY , HWTSTAMP_TX_ON) support is redundant, as it offers nothing new. Well said. disagree, it is not a dual mode, just allow the user to have better granularity for what hw stamps, exactly like what we have in rx. we are not adding any new mechanism. 2. Other vendors will have to support it as well, when not sure what is the expectation from them if they cannot improve accuracy between them. If there were multiple different devices out there with this kind of implementation (different levels of accuracy with increasing run time performance cost), then we could consider such a flag. However, to my knowledge, this feature is unique to your device. I agree, but i never meant to have a flag that indicate two different levels of accuracy, that would be a very wild mistake for sure! 
The new flag will be about selecting the granularity of what gets a hw
stamp and what doesn't, aligning with the RX filter API.

This feature is just an internal enhancement, and as such it should be
added only as a vendor private configuration flag. We are not offering
here any standard for others to follow.

+1

Our driver feature is an internal enhancement, yes, but the suggested
flag is very far from indicating any internal enhancement; it is
actually an enhancement to the current API, and a very simple extension
with a wide range of improvements to all layers. Our driver can optimize
accuracy when this flag is set; other drivers might be happy to
implement it since they already have slow hw, and this flag would allow
them to run better TCP/UDP performance while still performing ptp hw
stamping; some admins/apps will use it to avoid stamping all traffic on
tx. Win win win.

Seems interesting. I can form such V2 patches soon.
[PATCH net-next] nfc: s3fwrn5: Change irqflags
From: Bongsu Jeon

Change the irqflags from IRQF_TRIGGER_HIGH to IRQF_TRIGGER_RISING for
stable interrupt handling on Samsung's NFC chip.

Signed-off-by: Bongsu Jeon
---
 drivers/nfc/s3fwrn5/i2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
index e1bdde105f24..016f6b6df849 100644
--- a/drivers/nfc/s3fwrn5/i2c.c
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -213,7 +213,7 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client,
 		return ret;
 
 	ret = devm_request_threaded_irq(&client->dev, phy->i2c_dev->irq, NULL,
-		s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+		s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_RISING | IRQF_ONESHOT,
 		S3FWRN5_I2C_DRIVER_NAME, phy);
 	if (ret)
 		s3fwrn5_remove(phy->common.ndev);
-- 
2.17.1
[PATCH RFC] ethernet: stmmac: clean up the code for release/suspend/resume function
commit 1c35cc9cf6a0 ("net: stmmac: remove redundant null check before clk_disable_unprepare()") did not clean up all of the redundant NULL checks on clock parameters; this patch finishes the job.

commit e8377e7a29efb ("net: stmmac: only call pmt() during suspend/resume if HW enables PMT"): after this patch, we use
	if (device_may_wakeup(priv->device) && priv->plat->pmt) to check MAC wakeup
	if (device_may_wakeup(priv->device)) to check PHY wakeup
Add a one-line comment for readability.

commit 77b2898394e3b ("net: stmmac: Speed down the PHY if WoL to save energy"): slow down the PHY speed when releasing the net device under any condition.

Slightly adjust the order of the code so that suspend/resume look more symmetrical; generally speaking they should appear symmetrically.

Signed-off-by: Joakim Zhang
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +--
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c33db79cdd0a..a46e865c4acc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2908,8 +2908,7 @@ static int stmmac_release(struct net_device *dev)
 	struct stmmac_priv *priv = netdev_priv(dev);
 	u32 chan;

-	if (device_may_wakeup(priv->device))
-		phylink_speed_down(priv->phylink, false);
+	phylink_speed_down(priv->phylink, false);
 	/* Stop and disconnect the PHY */
 	phylink_stop(priv->phylink);
 	phylink_disconnect_phy(priv->phylink);
@@ -5183,6 +5182,7 @@ int stmmac_suspend(struct device *dev)
 	} else {
 		mutex_unlock(&priv->lock);
 		rtnl_lock();
+		/* For PHY wakeup case */
 		if (device_may_wakeup(priv->device))
 			phylink_speed_down(priv->phylink, false);
 		phylink_stop(priv->phylink);
@@ -5260,11 +5260,17 @@ int stmmac_resume(struct device *dev)
 		/* enable the clk previously disabled */
 		clk_prepare_enable(priv->plat->stmmac_clk);
 		clk_prepare_enable(priv->plat->pclk);
-		if (priv->plat->clk_ptp_ref)
-			clk_prepare_enable(priv->plat->clk_ptp_ref);
+		clk_prepare_enable(priv->plat->clk_ptp_ref);
 		/* reset the phy so that it's ready */
 		if (priv->mii)
 			stmmac_mdio_reset(priv->mii);
+
+		rtnl_lock();
+		phylink_start(priv->phylink);
+		/* We may have called phylink_speed_down before */
+		if (device_may_wakeup(priv->device))
+			phylink_speed_up(priv->phylink);
+		rtnl_unlock();
 	}

 	if (priv->plat->serdes_powerup) {
@@ -5275,14 +5281,6 @@ int stmmac_resume(struct device *dev)
 		return ret;
 	}

-	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
-		rtnl_lock();
-		phylink_start(priv->phylink);
-		/* We may have called phylink_speed_down before */
-		phylink_speed_up(priv->phylink);
-		rtnl_unlock();
-	}
-
 	rtnl_lock();
 	mutex_lock(&priv->lock);
--
2.17.1
[PATCH net] tcp: fix receive buffer autotuning to trigger for any valid advertised MSS
Previously, receiver buffer auto-tuning started after receiving one advertised window amount of data. After the initial receiver buffer was raised by commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB"), it may take too long for TCP autotuning to start raising the receiver buffer size.

commit 041a14d26715 ("tcp: start receiver buffer autotuning sooner") tried to decrease the threshold at which TCP auto-tuning starts, but it doesn't work well in some environments where the receiver has a large MTU (9001), especially on high-RTT connections. In these environments rcvq_space.space will be the same as rcv_wnd, so TCP autotuning will never start because the sender can't send more than rcv_wnd size in one round trip.

To address this issue, this patch decreases the initial rcvq_space.space so TCP autotuning kicks in whenever the sender is able to send more than 5360 bytes in one round trip, regardless of the receiver's configured MTU.

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Signed-off-by: Hazem Mohamed Abuelfotoh
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 389d1b340248..f0ffac9e937b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -504,13 +504,14 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)
 static void tcp_init_buffer_space(struct sock *sk)
 {
 	int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win;
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	int maxwin;

 	if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
 		tcp_sndbuf_expand(sk);

-	tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
+	tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * icsk->icsk_ack.rcv_mss);
 	tcp_mstamp_refresh(tp);
 	tp->rcvq_space.time = tp->tcp_mstamp;
 	tp->rcvq_space.seq = tp->copied_seq;
--
2.16.6

Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
BUG: unable to handle kernel paging request in bpf_lru_populate
Hello, syzbot found the following issue on: HEAD commit:bcd684aa net/nfc/nci: Support NCI 2.x initial sequence git tree: net-next console output: https://syzkaller.appspot.com/x/log.txt?x=12001bd350 kernel config: https://syzkaller.appspot.com/x/.config?x=3cb098ab0334059f dashboard link: https://syzkaller.appspot.com/bug?extid=ec2234240c96fdd26b93 compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11f7f2ef50 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=103833f750 The issue was bisected to: commit b93ef089d35c3386dd197e85afb6399bbd54cfb3 Author: Martin KaFai Lau Date: Mon Nov 16 20:01:13 2020 + bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1103b83750 final oops: https://syzkaller.appspot.com/x/report.txt?x=1303b83750 console output: https://syzkaller.appspot.com/x/log.txt?x=1503b83750 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+ec2234240c96fdd26...@syzkaller.appspotmail.com Fixes: b93ef089d35c ("bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage") BUG: unable to handle page fault for address: f5200471266c #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 23fff2067 P4D 23fff2067 PUD 101a4067 PMD 32e3a067 PTE 0 Oops: [#1] PREEMPT SMP KASAN CPU: 1 PID: 8503 Comm: syz-executor608 Not tainted 5.10.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bpf_common_lru_populate kernel/bpf/bpf_lru_list.c:569 [inline] RIP: 0010:bpf_lru_populate+0xd8/0x5e0 kernel/bpf/bpf_lru_list.c:614 Code: 03 4d 01 e7 48 01 d8 48 89 4c 24 10 4d 89 fe 48 89 44 24 08 e8 99 23 eb ff 49 8d 7e 12 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <0f> b6 04 18 38 d0 7f 08 84 c0 0f 85 80 04 00 00 49 8d 7e 13 41 c6 RSP: 0018:c9000126fc20 EFLAGS: 00010202 RAX: 19200471266c RBX: dc00 RCX: 8184e3e2 RDX: 0002 RSI: 
8184e2e7 RDI: c90023893362 RBP: 00bc R08: 107c R09: R10: 107c R11: R12: 0001 R13: 107c R14: c90023893350 R15: c900234832f0 FS: 00fe0880() GS:8880b9f0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: f5200471266c CR3: 1ba62000 CR4: 001506e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: prealloc_init kernel/bpf/hashtab.c:319 [inline] htab_map_alloc+0xf6e/0x1230 kernel/bpf/hashtab.c:507 find_and_alloc_map kernel/bpf/syscall.c:123 [inline] map_create kernel/bpf/syscall.c:829 [inline] __do_sys_bpf+0xa81/0x5170 kernel/bpf/syscall.c:4374 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4402e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7ffe77af23b8 EFLAGS: 0246 ORIG_RAX: 0141 RAX: ffda RBX: 004002c8 RCX: 004402e9 RDX: 0040 RSI: 2000 RDI: 0d00 RBP: 006ca018 R08: R09: R10: R11: 0246 R12: 00401af0 R13: 00401b80 R14: R15: Modules linked in: CR2: f5200471266c ---[ end trace 4f3928bacde7b3ed ]--- RIP: 0010:bpf_common_lru_populate kernel/bpf/bpf_lru_list.c:569 [inline] RIP: 0010:bpf_lru_populate+0xd8/0x5e0 kernel/bpf/bpf_lru_list.c:614 Code: 03 4d 01 e7 48 01 d8 48 89 4c 24 10 4d 89 fe 48 89 44 24 08 e8 99 23 eb ff 49 8d 7e 12 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <0f> b6 04 18 38 d0 7f 08 84 c0 0f 85 80 04 00 00 49 8d 7e 13 41 c6 RSP: 0018:c9000126fc20 EFLAGS: 00010202 RAX: 19200471266c RBX: dc00 RCX: 8184e3e2 RDX: 0002 RSI: 8184e2e7 RDI: c90023893362 RBP: 00bc R08: 107c R09: R10: 107c R11: R12: 0001 R13: 107c R14: c90023893350 R15: c900234832f0 FS: 00fe0880() GS:8880b9f0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: f5200471266c CR3: 1ba62000 CR4: 001506e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. 
syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue.
Re: [PATCH net-next] nfc: s3fwrn5: Change irqflags
On Mon, Dec 07, 2020 at 08:38:27PM +0900, Bongsu Jeon wrote:
> From: Bongsu Jeon
>
> change irqflags from IRQF_TRIGGER_HIGH to IRQF_TRIGGER_RISING for stable
> Samsung's nfc interrupt handling.

1. Describe the change in the commit title/subject. Just the words "change irqflags" are not enough.
2. Describe in the commit message what you are trying to fix. Was it not stable before? The "for stable interrupt handling" is a little bit vague.
3. This is contradictory to the bindings and current DTS. I think the driver should not force the specific trigger type because I could imagine some configuration where the actual interrupt to the CPU is routed differently. Instead, how about removing the trigger flags here and fixing the DTS and bindings example?

Best regards,
Krzysztof

>
> Signed-off-by: Bongsu Jeon
> ---
> drivers/nfc/s3fwrn5/i2c.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
> index e1bdde105f24..016f6b6df849 100644
> --- a/drivers/nfc/s3fwrn5/i2c.c
> +++ b/drivers/nfc/s3fwrn5/i2c.c
> @@ -213,7 +213,7 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client,
> 		return ret;
>
> 	ret = devm_request_threaded_irq(&client->dev, phy->i2c_dev->irq, NULL,
> -		s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
> +		s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_RISING | IRQF_ONESHOT,
> 		S3FWRN5_I2C_DRIVER_NAME, phy);
> 	if (ret)
> 		s3fwrn5_remove(phy->common.ndev);
> --
> 2.17.1
>
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
On Fri, 4 Dec 2020 23:19:55 +0100 Daniel Borkmann wrote: > On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote: > > Daniel Borkmann writes: > [...] > >> We tried to standardize on a minimum guaranteed amount, but unfortunately > >> not > >> everyone seems to implement it, but I think it would be very useful to > >> query > >> this from application side, for example, consider that an app inserts a BPF > >> prog at XDP doing custom encap shortly before XDP_TX so it would be useful > >> to > >> know which of the different encaps it implements are realistically > >> possible on > >> the underlying XDP supported dev. > > > > How many distinct values are there in reality? Enough to express this in > > a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need > > an additional field to get the exact value? If we implement the latter > > we also run the risk of people actually implementing all sorts of weird > > values, whereas if we constrain it to a few distinct values it's easier > > to push back against adding new values (as it'll be obvious from the > > addition of new flags). > > It's not everywhere straight forward to determine unfortunately, see also > [0,1] > as some data points where Jesper looked into in the past, so in some cases it > might differ depending on the build/runtime config.. > >[0] > https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/ >[1] > https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/ Yes, unfortunately drivers have already gotten creative in this area, and variations have sneaked in. I remember that we were forced to allow SFC driver to use 128 bytes headroom, to avoid a memory corruption. I tried hard to have the minimum 192 bytes as it is 3 cachelines, but I failed to enforce this. 
It might be valuable to expose info on the driver's headroom size, as this will allow end-users to take advantage of it (instead of having to use the lowest common headroom) and up-front in userspace reject loading on e.g. SFC, which has this annoying limitation.

BUT thinking about what the driver's headroom size MEANS to userspace, I'm not sure it is wise to give this info to userspace. The XDP-headroom is used for several kernel-internal things that limit the available space for growing packet-headroom. E.g. (1) xdp_frame is something that we likely need to grow (even though I'm pushing back), e.g. (2) the metadata area which Saeed is looking to populate from driver code (also reducing packet-headroom for encap-headers). So, userspace cannot use the XDP-headroom size too much...

--
Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [EXT] Re: [PATCH v5 6/9] task_isolation: arch/arm64: enable task isolation functionality
On Fri, Dec 04, 2020 at 12:37:32AM +0000, Alex Belits wrote:
> On Wed, 2020-12-02 at 13:59 +0000, Mark Rutland wrote:
> > On Mon, Nov 23, 2020 at 05:58:06PM +0000, Alex Belits wrote:
> > As a heads-up, the arm64 entry code is changing, as we found that
> > our lockdep, RCU, and context-tracking management wasn't quite
> > right. I have a series of patches:
> >
> > https://lore.kernel.org/r/20201130115950.22492-1-mark.rutl...@arm.com
> >
> > ... which are queued in the arm64 for-next/fixes branch. I intend to
> > have some further rework ready for the next cycle.
> > That was quite obviously broken if PROVE_LOCKING and NO_HZ_FULL were
> > chosen and context tracking was in use (e.g. with
> > CONTEXT_TRACKING_FORCE),
>
> I am not yet sure about TRACE_IRQFLAGS, however NO_HZ_FULL and
> CONTEXT_TRACKING have to be enabled for it to do anything.
>
> I will check it with PROVE_LOCKING and your patches.

Thanks. In future, please do test this functionality with PROVE_LOCKING, because otherwise bugs with RCU and IRQ state management will easily be missed (as has been the case until very recently).

Testing with all those debug options enabled (and stating that you have done so) will give reviewers much greater confidence that this works, and if that does start spewing errors it saves everyone the time identifying that.

> Entry code only adds an inline function that, if task isolation is
> enabled, uses raw_local_irq_save() / raw_local_irq_restore(), low-level
> operations and accesses per-CPU variables by offset, so at the very
> least it should not add any problems. Even raw_local_irq_save() /
> raw_local_irq_restore() probably should be removed, however I wanted to
> have something that can be safely called if by whatever reason
> interrupts were enabled before the kernel was fully entered.

Sure. In the new flows we have new enter_from_*() and exit_to_*() functions where these calls should be able to live (and so we should be able to ensure a more consistent environment).

The near-term plan for arm64 is to migrate more of the exception triage assembly to C, then to rework the arm64 entry code and generic entry code to be more similar, then to migrate as much as possible to the generic entry code. So please bear in mind that anything that adds to the differences between the two is going to be problematic.

> > so I'm assuming that this series has not been
> > tested in that configuration. What sort of testing has this seen?
>
> On various available arm64 hardware, with enabled
>
> CONFIG_TASK_ISOLATION
> CONFIG_NO_HZ_FULL
> CONFIG_HIGH_RES_TIMERS
>
> and disabled:
>
> CONFIG_HZ_PERIODIC
> CONFIG_NO_HZ_IDLE
> CONFIG_NO_HZ

Ok. I'd recommend looking at the various debug options under the "kernel hacking" section in kconfig, and enabling some of those. At the very least PROVE_LOCKING, ideally also using the lockup detectors and anything else for debugging RCU, etc.

[...]

> > > Functions called from there:
> > > asm_nmi_enter() -> nmi_enter() -> task_isolation_kernel_enter()
> > > asm_nmi_exit() -> nmi_exit() -> task_isolation_kernel_return()
> > >
> > > Handlers:
> > > do_serror() -> nmi_enter() -> task_isolation_kernel_enter()
> > > or task_isolation_kernel_enter()
> > > el1_sync_handler() -> task_isolation_kernel_enter()
> > > el0_sync_handler() -> task_isolation_kernel_enter()
> > > el0_sync_compat_handler() -> task_isolation_kernel_enter()
> > >
> > > handle_arch_irq() is irqchip-specific, most call handle_domain_irq()
> > > There is a separate patch for irqchips that do not follow this rule.
> > >
> > > handle_domain_irq() -> task_isolation_kernel_enter()
> > > do_handle_IPI() -> task_isolation_kernel_enter() (may be redundant)
> > > nmi_enter() -> task_isolation_kernel_enter()
> >
> > The IRQ cases look very odd to me. With the rework I've just done
> > for arm64, we'll do the regular context tracking accounting before
> > we ever get into handle_domain_irq() or similar, so I suspect that's
> > not necessary at all?

> The goal is to call task_isolation_kernel_enter() before anything that
> depends on a CPU state, including pipeline, that could remain
> unsynchronized when the rest of the kernel was sending synchronization
> IPIs. Similarly task_isolation_kernel_return() should be called when it
> is safe to turn off synchronization. If rework allows it to be done
> earlier, there is no need to touch more specific functions.

Sure; I think that's sorted as a result of the changes I made recently.

> > > --- a/arch/arm64/include/asm/barrier.h
> > > +++ b/arch/arm64/include/asm/barrier.h
> > > @@ -49,6 +49,7 @@
> > >  #define dma_rmb()	dmb(oshld)
> > >  #define dma_wmb()	dmb(oshst)
> > >
> > > +#define instr_sync() isb()
> >
> > I think I've asked on prior versions of the patchset, but what is
> > this for? Where is it going to be used, and what is the expected
> > semantics? I'm wary of exposing this outside of arch code because
> > there aren't stron
Re: [EXT] Re: [PATCH v5 7/9] task_isolation: don't interrupt CPUs with tick_nohz_full_kick_cpu()
On Fri, Dec 04, 2020 at 12:54:29AM +0000, Alex Belits wrote:
> On Wed, 2020-12-02 at 14:20 +0000, Mark Rutland wrote:
> > On Mon, Nov 23, 2020 at 05:58:22PM +0000, Alex Belits wrote:
> > > From: Yuri Norov
> > >
> > > For nohz_full CPUs the desirable behavior is to receive interrupts
> > > generated by tick_nohz_full_kick_cpu(). But for hard isolation it's
> > > obviously not desirable because it breaks isolation.
> > >
> > > This patch adds a check for it.
> > >
> > > Signed-off-by: Yuri Norov
> > > [abel...@marvell.com: updated, only exclude CPUs running isolated tasks]
> > > Signed-off-by: Alex Belits
> > > ---
> > > kernel/time/tick-sched.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index a213952541db..6c8679e200f0 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -20,6 +20,7 @@
> > >  #include
> > >  #include
> > >  #include
> > > +#include
> > >  #include
> > >  #include
> > >  #include
> > > @@ -268,7 +269,8 @@ static void tick_nohz_full_kick(void)
> > >   */
> > >  void tick_nohz_full_kick_cpu(int cpu)
> > >  {
> > > -	if (!tick_nohz_full_cpu(cpu))
> > > +	smp_rmb();
> >
> > What does this barrier pair with? The commit message doesn't mention
> > it, and it's not clear in-context.
>
> With barriers in task_isolation_kernel_enter()
> and task_isolation_exit_to_user_mode().

Please add a comment in the code as to what it pairs with.

Thanks,
Mark.
[PATCH -next] net/mlx5_core: remove unused including
Remove an include that is not needed.

Signed-off-by: Zou Wei
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 989c70c..82ecc161 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -30,7 +30,6 @@
  * SOFTWARE.
  */

-#include
 #include
 #include
 #include
--
2.6.2
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
Ard Biesheuvel wrote: > > Yeah - the problem with that is that for sunrpc, we might be dealing with > > 1MB > > plus bits of non-contiguous pages, requiring >8K of scatterlist elements > > (admittedly, we can chain them, but we may have to do one or more large > > allocations). > > > > > However, I would recommend against it: > > > > Sorry, recommend against what? > > > > Recommend against the current approach of manipulating the input like > this and feeding it into the skcipher piecemeal. Right. I understand the problem, but as I mentioned above, the scatterlist itself becomes a performance issue as it may exceed two pages in size. Double that as there may need to be separate input and output scatterlists. > Herbert recently made some changes for MSG_MORE support in the AF_ALG > code, which permits a skcipher encryption to be split into several > invocations of the skcipher layer without the need for this complexity > on the side of the caller. Maybe there is a way to reuse that here. > Herbert? I wonder if it would help if the input buffer and output buffer didn't have to correspond exactly in usage - ie. the output buffer could be used at a slower rate than the input to allow for buffering inside the crypto algorithm. > > Can you also do SHA at the same time in the same loop? > > SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD. > The former doesn't really fit the current API so we'd have to invent > something for it. The hashes corresponding to the kerberos enctypes I'm supporting are: HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96. HMAC-SHA256 for aes128-cts-hmac-sha256-128 HMAC-SHA384 for aes256-cts-hmac-sha384-192 CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac I'm not sure you can support all of those with the instructions available. David
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
Daniel Borkmann writes: > On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote: >> Daniel Borkmann writes: > [...] >>> We tried to standardize on a minimum guaranteed amount, but unfortunately >>> not >>> everyone seems to implement it, but I think it would be very useful to query >>> this from application side, for example, consider that an app inserts a BPF >>> prog at XDP doing custom encap shortly before XDP_TX so it would be useful >>> to >>> know which of the different encaps it implements are realistically possible >>> on >>> the underlying XDP supported dev. >> >> How many distinct values are there in reality? Enough to express this in >> a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need >> an additional field to get the exact value? If we implement the latter >> we also run the risk of people actually implementing all sorts of weird >> values, whereas if we constrain it to a few distinct values it's easier >> to push back against adding new values (as it'll be obvious from the >> addition of new flags). > > It's not everywhere straight forward to determine unfortunately, see also > [0,1] > as some data points where Jesper looked into in the past, so in some cases it > might differ depending on the build/runtime config.. > >[0] > https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/ >[1] > https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/ Right, well in that case maybe we should just expose the actual headroom as a separate netlink attribute? Although I suppose that would require another round of driver changes since Jesper's patch you linked above only puts this into xdp_buff at XDP program runtime. Jesper, WDYT? -Toke
Re: [PATCH v2 bpf 0/5] New netdev feature flags for XDP
Jakub Kicinski writes: > On Fri, 04 Dec 2020 18:26:10 +0100 Toke Høiland-Jørgensen wrote: >> Jakub Kicinski writes: >> >> > On Fri, 4 Dec 2020 11:28:56 +0100 alar...@gmail.com wrote: >> >> * Extend ethtool netlink interface in order to get access to the XDP >> >>bitmap (XDP_PROPERTIES_GET). [Toke] >> > >> > That's a good direction, but I don't see why XDP caps belong in ethtool >> > at all? We use rtnetlink to manage the progs... >> >> You normally use ethtool to get all the other features a device support, >> don't you? > > Not really, please take a look at all the IFLA attributes. There's > a bunch of capabilities there. Ah, right, TIL. Well, putting this new property in rtnetlink instead of ethtool is fine by me as well :) -Toke
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
Jesper Dangaard Brouer writes: > On Fri, 4 Dec 2020 23:19:55 +0100 > Daniel Borkmann wrote: > >> On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote: >> > Daniel Borkmann writes: >> [...] >> >> We tried to standardize on a minimum guaranteed amount, but unfortunately >> >> not >> >> everyone seems to implement it, but I think it would be very useful to >> >> query >> >> this from application side, for example, consider that an app inserts a >> >> BPF >> >> prog at XDP doing custom encap shortly before XDP_TX so it would be >> >> useful to >> >> know which of the different encaps it implements are realistically >> >> possible on >> >> the underlying XDP supported dev. >> > >> > How many distinct values are there in reality? Enough to express this in >> > a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need >> > an additional field to get the exact value? If we implement the latter >> > we also run the risk of people actually implementing all sorts of weird >> > values, whereas if we constrain it to a few distinct values it's easier >> > to push back against adding new values (as it'll be obvious from the >> > addition of new flags). >> >> It's not everywhere straight forward to determine unfortunately, see also >> [0,1] >> as some data points where Jesper looked into in the past, so in some cases it >> might differ depending on the build/runtime config.. >> >>[0] >> https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/ >>[1] >> https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/ > > Yes, unfortunately drivers have already gotten creative in this area, > and variations have sneaked in. I remember that we were forced to > allow SFC driver to use 128 bytes headroom, to avoid a memory > corruption. I tried hard to have the minimum 192 bytes as it is 3 > cachelines, but I failed to enforce this. 
> It might be valuable to expose info on the driver's headroom size, as
> this will allow end-users to take advantage of it (instead of having
> to use the lowest common headroom) and up-front in userspace reject
> loading on e.g. SFC, which has this annoying limitation.
>
> BUT thinking about what the driver's headroom size MEANS to userspace,
> I'm not sure it is wise to give this info to userspace. The
> XDP-headroom is used for several kernel-internal things that limit the
> available space for growing packet-headroom. E.g. (1) xdp_frame is
> something that we likely need to grow (even though I'm pushing back),
> e.g. (2) the metadata area which Saeed is looking to populate from
> driver code (also reducing packet-headroom for encap-headers). So,
> userspace cannot use the XDP-headroom size too much...

(Ah, you had already replied, sorry - seems I missed that).

Can we calculate a number from the headroom that is meaningful for userspace? I suppose that would be "total number of bytes available for metadata+packet extension"? Even with growing data structures, any particular kernel should be able to inform userspace of the current value, no?

-Toke
Re: [PATCH v8 3/4] phy: Add Sparx5 ethernet serdes PHY driver
On 04.12.2020 15:16, Alexandre Belloni wrote:
> On 03/12/2020 22:52:53+0100, Andrew Lunn wrote:
> > > + if (macro->serdestype == SPX5_SDT_6G) {
> > > + value = sdx5_rd(priv, SD6G_LANE_LANE_DF(macro->stpidx));
> > > + analog_sd = SD6G_LANE_LANE_DF_PMA2PCS_RXEI_FILTERED_GET(value);
> > > + } else if (macro->serdestype == SPX5_SDT_10G) {
> > > + value = sdx5_rd(priv, SD10G_LANE_LANE_DF(macro->stpidx));
> > > + analog_sd = SD10G_LANE_LANE_DF_PMA2PCS_RXEI_FILTERED_GET(value);
> > > + } else {
> > > + value = sdx5_rd(priv, SD25G_LANE_LANE_DE(macro->stpidx));
> > > + analog_sd = SD25G_LANE_LANE_DE_LN_PMA_RXEI_GET(value);
> > > + }
> > > + /* Link up is when analog_sd == 0 */
> > > + return analog_sd;
> > > +}
> >
> > What i have not yet seen is how this code plugs together with
> > phylink_pcs_ops? Can this hardware also be used for SATA, USB? As far
> > as i understand, the Marvell Comphy is multi-purpose, it is used for
> > networking, USB, and SATA, etc. Making it a generic PHY then makes
> > sense, because different subsystems need to use it. But it looks like
> > this is for networking only? So i'm wondering if it belongs in
> > driver/net/pcs and it should be accessed using phylink_pcs_ops?
>
> Ocelot had PCIe on the phys, doesn't Sparx5 have it?
>
> --
> Alexandre Belloni, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

Yes Ocelot has that, but on Sparx5 the PCIe is separate...

BR
Steen

---
Steen Hegelund
steen.hegel...@microchip.com
[PATCH] bpf: propagate __user annotations properly
__htab_map_lookup_and_delete_batch() stores a user pointer in the local variable ubatch and uses that in copy_{from,to}_user(), but ubatch lacks a __user annotation. So, sparse warns in the various assignments and uses of ubatch: kernel/bpf/hashtab.c:1415:24: warning: incorrect type in initializer (different address spaces) kernel/bpf/hashtab.c:1415:24: expected void *ubatch kernel/bpf/hashtab.c:1415:24: got void [noderef] __user * kernel/bpf/hashtab.c:1444:46: warning: incorrect type in argument 2 (different address spaces) kernel/bpf/hashtab.c:1444:46: expected void const [noderef] __user *from kernel/bpf/hashtab.c:1444:46: got void *ubatch kernel/bpf/hashtab.c:1608:16: warning: incorrect type in assignment (different address spaces) kernel/bpf/hashtab.c:1608:16: expected void *ubatch kernel/bpf/hashtab.c:1608:16: got void [noderef] __user * kernel/bpf/hashtab.c:1609:26: warning: incorrect type in argument 1 (different address spaces) kernel/bpf/hashtab.c:1609:26: expected void [noderef] __user *to kernel/bpf/hashtab.c:1609:26: got void *ubatch Add the __user annotation to repair this chain of propagating __user annotations in __htab_map_lookup_and_delete_batch(). Signed-off-by: Lukas Bulwahn --- applies cleanly on current master (v5.10-rc7) and next-20201204 BPF maintainers, please pick this minor non-urgent clean-up patch.
kernel/bpf/hashtab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index fe7a0733a63a..76c791def033 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1412,7 +1412,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val; void __user *uvalues = u64_to_user_ptr(attr->batch.values); void __user *ukeys = u64_to_user_ptr(attr->batch.keys); - void *ubatch = u64_to_user_ptr(attr->batch.in_batch); + void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); u32 batch, max_count, size, bucket_size; struct htab_elem *node_to_free = NULL; u64 elem_map_flags, map_flags; -- 2.17.1
Re: [PATCH 1/1] xdp: avoid calling kfree twice
On 2020-12-08 07:50, Zhu Yanjun wrote: From: Zhu Yanjun In the function xdp_umem_pin_pages, if npgs != umem->npgs and npgs >= 0, the function xdp_umem_unpin_pages is called. In this function, kfree is called to handle umem->pgs, and then in the function xdp_umem_pin_pages, kfree is called again to handle umem->pgs. Eventually, umem->pgs is freed twice. Hi Zhu, Thanks for the cleanup! kfree(NULL) is valid, so this is not a double-free, but still a nice cleanup! Signed-off-by: Zhu Yanjun --- net/xdp/xdp_umem.c | 17 + 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 56a28a686988..ff5173f72920 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -97,7 +97,6 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address) { unsigned int gup_flags = FOLL_WRITE; long npgs; - int err; umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL | __GFP_NOWARN); @@ -112,20 +111,14 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address) if (npgs != umem->npgs) { if (npgs >= 0) { umem->npgs = npgs; - err = -ENOMEM; - goto out_pin; + xdp_umem_unpin_pages(umem); + return -ENOMEM; } - err = npgs; - goto out_pgs; + kfree(umem->pgs); + umem->pgs = NULL; + return npgs; I'd like an explicit cast "(int)" here (-Wconversion). Please spin a v2 with the cast, with my: Acked-by: Björn Töpel added. Cheers! Björn } return 0; - -out_pin: - xdp_umem_unpin_pages(umem); -out_pgs: - kfree(umem->pgs); - umem->pgs = NULL; - return err; } static int xdp_umem_account_pages(struct xdp_umem *umem)
WARNING: ODEBUG bug in slave_kobj_release
Hello, syzbot found the following issue on: HEAD commit:34816d20 Merge tag 'gfs2-v5.10-rc5-fixes' of git://git.ker.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=153f779d50 kernel config: https://syzkaller.appspot.com/x/.config?x=e49433cfed49b7d9 dashboard link: https://syzkaller.appspot.com/bug?extid=7bce4c2f7e1768ec3fe0 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+7bce4c2f7e1768ec3...@syzkaller.appspotmail.com kobject_add_internal failed for bonding_slave (error: -12 parent: veth213) [ cut here ] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0 WARNING: CPU: 1 PID: 22707 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 1 PID: 22707 Comm: syz-executor.4 Not tainted 5.10.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 20 a2 9d 89 4c 89 ee 48 c7 c7 20 96 9d 89 e8 1e 0e f2 04 <0f> 0b 83 05 a5 87 32 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:c9000e37e9a0 EFLAGS: 00010082 RAX: RBX: 0005 RCX: RDX: 0004 RSI: 8158c855 RDI: f52001c6fd26 RBP: 0001 R08: 0001 R09: 8880b9f2011b R10: R11: R12: 894d3be0 R13: 899d9ca0 R14: 815f15f0 R15: 192001c6fd3f FS: 7fc5d258d700() GS:8880b9f0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00749138 CR3: 52e81000 CR4: 00350ee0 Call Trace: debug_object_assert_init lib/debugobjects.c:890 [inline] debug_object_assert_init+0x1f4/0x2e0 lib/debugobjects.c:861 debug_timer_assert_init kernel/time/timer.c:737 [inline] debug_assert_init kernel/time/timer.c:782 [inline] del_timer+0x6d/0x110 kernel/time/timer.c:1202 try_to_grab_pending+0x6d/0xd0 kernel/workqueue.c:1252 
__cancel_work_timer+0xa6/0x520 kernel/workqueue.c:3095 slave_kobj_release+0x48/0xe0 drivers/net/bonding/bond_main.c:1468 kobject_cleanup lib/kobject.c:705 [inline] kobject_release lib/kobject.c:736 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x1c8/0x540 lib/kobject.c:753 bond_kobj_init drivers/net/bonding/bond_main.c:1489 [inline] bond_alloc_slave drivers/net/bonding/bond_main.c:1506 [inline] bond_enslave+0x2488/0x4bf0 drivers/net/bonding/bond_main.c:1708 do_set_master+0x1c8/0x220 net/core/rtnetlink.c:2517 do_setlink+0x911/0x3a70 net/core/rtnetlink.c:2713 __rtnl_newlink+0xc1c/0x1740 net/core/rtnetlink.c:3374 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5562 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:671 sys_sendmsg+0x6e8/0x810 net/socket.c:2353 ___sys_sendmsg+0xf3/0x170 net/socket.c:2407 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45deb9 Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7fc5d258cc78 EFLAGS: 0246 ORIG_RAX: 002e RAX: ffda RBX: 0002e740 RCX: 0045deb9 RDX: RSI: 2080 RDI: 0005 RBP: 7fc5d258cca0 R08: R09: R10: R11: 0246 R12: 0009 R13: 7ffdcf6b003f R14: 7fc5d258d9c0 R15: 0119bf2c --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. 
See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
[PATCH net-next] net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr
This gets us compile-time size checking. Signed-off-by: Julian Wiedmann --- net/iucv/af_iucv.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c index db7d888914fa..882f028992c3 100644 --- a/net/iucv/af_iucv.c +++ b/net/iucv/af_iucv.c @@ -587,7 +587,7 @@ static void __iucv_auto_name(struct iucv_sock *iucv) static int iucv_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_len) { - struct sockaddr_iucv *sa = (struct sockaddr_iucv *) addr; + DECLARE_SOCKADDR(struct sockaddr_iucv *, sa, addr); char uid[sizeof(sa->siucv_user_id)]; struct sock *sk = sock->sk; struct iucv_sock *iucv; @@ -691,7 +691,7 @@ static int iucv_sock_autobind(struct sock *sk) static int afiucv_path_connect(struct socket *sock, struct sockaddr *addr) { - struct sockaddr_iucv *sa = (struct sockaddr_iucv *) addr; + DECLARE_SOCKADDR(struct sockaddr_iucv *, sa, addr); struct sock *sk = sock->sk; struct iucv_sock *iucv = iucv_sk(sk); unsigned char user_data[16]; @@ -738,7 +738,7 @@ static int afiucv_path_connect(struct socket *sock, struct sockaddr *addr) static int iucv_sock_connect(struct socket *sock, struct sockaddr *addr, int alen, int flags) { - struct sockaddr_iucv *sa = (struct sockaddr_iucv *) addr; + DECLARE_SOCKADDR(struct sockaddr_iucv *, sa, addr); struct sock *sk = sock->sk; struct iucv_sock *iucv = iucv_sk(sk); int err; @@ -874,7 +874,7 @@ static int iucv_sock_accept(struct socket *sock, struct socket *newsock, static int iucv_sock_getname(struct socket *sock, struct sockaddr *addr, int peer) { - struct sockaddr_iucv *siucv = (struct sockaddr_iucv *) addr; + DECLARE_SOCKADDR(struct sockaddr_iucv *, siucv, addr); struct sock *sk = sock->sk; struct iucv_sock *iucv = iucv_sk(sk); -- 2.17.1
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
On Fri, 4 Dec 2020 16:21:08 +0100 Daniel Borkmann wrote: > On 12/4/20 1:46 PM, Maciej Fijalkowski wrote: > > On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote: > >> alar...@gmail.com writes: > >>> From: Marek Majtyka > >>> > >>> Implement support for checking what kind of xdp functionality a netdev > >>> supports. Previously, there was no way to do this other than to try > >>> to create an AF_XDP socket on the interface or load an XDP program and see > >>> if it worked. This commit changes this by adding a new variable which > >>> describes all xdp supported functions on pretty detailed level: > >> > >> I like the direction this is going! :) (Me too, don't get discouraged by our nitpicking, keep working on this! :-)) > >> > >>> - aborted > >>> - drop > >>> - pass > >>> - tx > > I strongly think we should _not_ merge any native XDP driver patchset > that does not support/implement the above return codes. I agree with the above statement. > Could we instead group them together and call this something like > XDP_BASE functionality to not give a wrong impression? I disagree. I can accept that XDP_BASE includes aborted+drop+pass. I think we need to keep the XDP_TX action separate, because I think that there are use-cases where we want to disable XDP_TX due to end-user policy or hardware limitations. Use-case(1): A cloud provider wants to give customers (running VMs) the ability to load an XDP program for DDoS protection (only), but doesn't want to allow the customer to use XDP_TX (which could implement LB or cheat their VM isolation policy). Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue resources, as the use-case is only DDoS. Today we have this problem with the ixgbe hardware, which cannot load XDP programs on systems with more than 192 CPUs. > If this is properly documented that these are basic must-have > _requirements_, then users and driver developers both know what the > expectations are. 
We can still document that XDP_TX is a must-have requirement, when a driver implements XDP. > >>> - redirect > >> -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
[PATCH v5 5/6] net: dsa: microchip: Add Microchip KSZ8863 SPI based driver support
Add KSZ88X3 driver support. We add support for the KSZ88X3 three-port switches using the SPI interface. Reviewed-by: Florian Fainelli Signed-off-by: Michael Grzeschik --- v1 -> v2: - this glue was not implemented v2 -> v3: - this glue was part of previous bigger patch v3 -> v4: - this glue was moved to this separate patch v4 -> v5: - added reviewed by from f.fainelli - using device_get_match_data instead of own matching code --- drivers/net/dsa/microchip/ksz8795_spi.c | 44 ++--- 1 file changed, 32 insertions(+), 12 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz8795_spi.c b/drivers/net/dsa/microchip/ksz8795_spi.c index 45420c07c99fc..708f8daaedbc2 100644 --- a/drivers/net/dsa/microchip/ksz8795_spi.c +++ b/drivers/net/dsa/microchip/ksz8795_spi.c @@ -14,34 +14,52 @@ #include #include +#include "ksz8.h" #include "ksz_common.h" -#define SPI_ADDR_SHIFT 12 -#define SPI_ADDR_ALIGN 3 -#define SPI_TURNAROUND_SHIFT 1 +#define KSZ8795_SPI_ADDR_SHIFT 12 +#define KSZ8795_SPI_ADDR_ALIGN 3 +#define KSZ8795_SPI_TURNAROUND_SHIFT 1 -KSZ_REGMAP_TABLE(ksz8795, 16, SPI_ADDR_SHIFT, -SPI_TURNAROUND_SHIFT, SPI_ADDR_ALIGN); +#define KSZ8863_SPI_ADDR_SHIFT 8 +#define KSZ8863_SPI_ADDR_ALIGN 8 +#define KSZ8863_SPI_TURNAROUND_SHIFT 0 + +KSZ_REGMAP_TABLE(ksz8795, 16, KSZ8795_SPI_ADDR_SHIFT, +KSZ8795_SPI_TURNAROUND_SHIFT, KSZ8795_SPI_ADDR_ALIGN); + +KSZ_REGMAP_TABLE(ksz8863, 16, KSZ8863_SPI_ADDR_SHIFT, +KSZ8863_SPI_TURNAROUND_SHIFT, KSZ8863_SPI_ADDR_ALIGN); static int ksz8795_spi_probe(struct spi_device *spi) { + const struct regmap_config *regmap_config; + struct device *ddev = &spi->dev; + struct ksz8 *ksz8; struct regmap_config rc; struct ksz_device *dev; - int i, ret; + int i, ret = 0; - dev = ksz_switch_alloc(&spi->dev, spi); + ksz8 = devm_kzalloc(&spi->dev, sizeof(struct ksz8), GFP_KERNEL); + ksz8->priv = spi; + + dev = ksz_switch_alloc(&spi->dev, ksz8); if (!dev) return -ENOMEM; + regmap_config = device_get_match_data(ddev); + if (!regmap_config) + return -EINVAL; + for (i = 0; i 
< ARRAY_SIZE(ksz8795_regmap_config); i++) { - rc = ksz8795_regmap_config[i]; + rc = regmap_config[i]; rc.lock_arg = &dev->regmap_mutex; dev->regmap[i] = devm_regmap_init_spi(spi, &rc); if (IS_ERR(dev->regmap[i])) { ret = PTR_ERR(dev->regmap[i]); dev_err(&spi->dev, "Failed to initialize regmap%i: %d\n", - ksz8795_regmap_config[i].val_bits, ret); + regmap_config[i].val_bits, ret); return ret; } } @@ -85,9 +103,11 @@ static void ksz8795_spi_shutdown(struct spi_device *spi) } static const struct of_device_id ksz8795_dt_ids[] = { - { .compatible = "microchip,ksz8765" }, - { .compatible = "microchip,ksz8794" }, - { .compatible = "microchip,ksz8795" }, + { .compatible = "microchip,ksz8765", .data = &ksz8795_regmap_config }, + { .compatible = "microchip,ksz8794", .data = &ksz8795_regmap_config }, + { .compatible = "microchip,ksz8795", .data = &ksz8795_regmap_config }, + { .compatible = "microchip,ksz8863", .data = &ksz8863_regmap_config }, + { .compatible = "microchip,ksz8873", .data = &ksz8863_regmap_config }, {}, }; MODULE_DEVICE_TABLE(of, ksz8795_dt_ids); -- 2.29.2
[PATCH v5 3/6] net: dsa: microchip: ksz8795: move register offsets and shifts to separate struct
In order to get this driver used with other switches the functions need to use different offsets and register shifts. This patch changes the direct use of the register defines to register description structures, which can be set depending on the chips register layout. Signed-off-by: Michael Grzeschik --- v1 -> v4: - extracted this change from bigger previous patch v4 -> v5: - added missing variables in ksz8_r_vlan_entries - moved shifts, masks and registers to arrays indexed by enums - using unsigned types where possible --- drivers/net/dsa/microchip/ksz8.h| 69 +++ drivers/net/dsa/microchip/ksz8795.c | 261 +--- drivers/net/dsa/microchip/ksz8795_reg.h | 85 3 files changed, 253 insertions(+), 162 deletions(-) create mode 100644 drivers/net/dsa/microchip/ksz8.h diff --git a/drivers/net/dsa/microchip/ksz8.h b/drivers/net/dsa/microchip/ksz8.h new file mode 100644 index 0..d3e89c27e22aa --- /dev/null +++ b/drivers/net/dsa/microchip/ksz8.h @@ -0,0 +1,69 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Microchip KSZ8XXX series register access + * + * Copyright (C) 2019 Pengutronix, Michael Grzeschik + */ + +#ifndef __KSZ8XXX_H +#define __KSZ8XXX_H +#include + +enum ksz_regs { + REG_IND_CTRL_0, + REG_IND_DATA_8, + REG_IND_DATA_CHECK, + REG_IND_DATA_HI, + REG_IND_DATA_LO, + REG_IND_MIB_CHECK, + P_FORCE_CTRL, + P_LINK_STATUS, + P_LOCAL_CTRL, + P_NEG_RESTART_CTRL, + P_REMOTE_STATUS, + P_SPEED_STATUS, + S_TAIL_TAG_CTRL, +}; + +enum ksz_masks { + PORT_802_1P_REMAPPING, + SW_TAIL_TAG_ENABLE, + MIB_COUNTER_OVERFLOW, + MIB_COUNTER_VALID, + VLAN_TABLE_FID, + VLAN_TABLE_MEMBERSHIP, + VLAN_TABLE_VALID, + STATIC_MAC_TABLE_VALID, + STATIC_MAC_TABLE_USE_FID, + STATIC_MAC_TABLE_FID, + STATIC_MAC_TABLE_OVERRIDE, + STATIC_MAC_TABLE_FWD_PORTS, + DYNAMIC_MAC_TABLE_ENTRIES_H, + DYNAMIC_MAC_TABLE_MAC_EMPTY, + DYNAMIC_MAC_TABLE_NOT_READY, + DYNAMIC_MAC_TABLE_ENTRIES, + DYNAMIC_MAC_TABLE_FID, + DYNAMIC_MAC_TABLE_SRC_PORT, + DYNAMIC_MAC_TABLE_TIMESTAMP, +}; + +enum ksz_shifts { + 
VLAN_TABLE_MEMBERSHIP_S, + VLAN_TABLE, + STATIC_MAC_FWD_PORTS, + STATIC_MAC_FID, + DYNAMIC_MAC_ENTRIES_H, + DYNAMIC_MAC_ENTRIES, + DYNAMIC_MAC_FID, + DYNAMIC_MAC_TIMESTAMP, + DYNAMIC_MAC_SRC_PORT, +}; + +struct ksz8 { + const u8 *regs; + const u32 *masks; + const u8 *shifts; + void *priv; +}; + +#endif diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 53cb41087a594..127498a9b8f72 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -20,6 +20,57 @@ #include "ksz_common.h" #include "ksz8795_reg.h" +#include "ksz8.h" + +static const u8 ksz8795_regs[] = { + [REG_IND_CTRL_0]= 0x6E, + [REG_IND_DATA_8]= 0x70, + [REG_IND_DATA_CHECK]= 0x72, + [REG_IND_DATA_HI] = 0x71, + [REG_IND_DATA_LO] = 0x75, + [REG_IND_MIB_CHECK] = 0x74, + [P_FORCE_CTRL] = 0x0C, + [P_LINK_STATUS] = 0x0E, + [P_LOCAL_CTRL] = 0x07, + [P_NEG_RESTART_CTRL]= 0x0D, + [P_REMOTE_STATUS] = 0x08, + [P_SPEED_STATUS]= 0x09, + [S_TAIL_TAG_CTRL] = 0x0C, +}; + +static const u32 ksz8795_masks[] = { + [PORT_802_1P_REMAPPING] = BIT(7), + [SW_TAIL_TAG_ENABLE]= BIT(1), + [MIB_COUNTER_OVERFLOW] = BIT(6), + [MIB_COUNTER_VALID] = BIT(5), + [VLAN_TABLE_FID]= GENMASK(6, 0), + [VLAN_TABLE_MEMBERSHIP] = GENMASK(11, 7), + [VLAN_TABLE_VALID] = BIT(12), + [STATIC_MAC_TABLE_VALID]= BIT(21), + [STATIC_MAC_TABLE_USE_FID] = BIT(23), + [STATIC_MAC_TABLE_FID] = GENMASK(30, 24), + [STATIC_MAC_TABLE_OVERRIDE] = BIT(26), + [STATIC_MAC_TABLE_FWD_PORTS]= GENMASK(24, 20), + [DYNAMIC_MAC_TABLE_ENTRIES_H] = GENMASK(6, 0), + [DYNAMIC_MAC_TABLE_MAC_EMPTY] = BIT(8), + [DYNAMIC_MAC_TABLE_NOT_READY] = BIT(7), + [DYNAMIC_MAC_TABLE_ENTRIES] = GENMASK(31, 29), + [DYNAMIC_MAC_TABLE_FID] = GENMASK(26, 20), + [DYNAMIC_MAC_TABLE_SRC_PORT]= GENMASK(26, 24), + [DYNAMIC_MAC_TABLE_TIMESTAMP] = GENMASK(28, 27), +}; + +static const u8 ksz8795_shifts[] = { + [VLAN_TABLE_MEMBERSHIP] = 7, + [VLAN_TABLE]= 16, + [STATIC_MAC_FWD_PORTS] = 16, + [STATIC_MAC_FID]= 24, + 
[DYNAMIC_MAC_ENTRIES_H] = 3, + [DYNAMIC_MAC_ENTRIES]
[PATCH v5 1/6] net: dsa: microchip: ksz8795: change drivers prefix to be generic
The driver can be used on other chips of this type. To reflect this we rename the drivers prefix from ksz8795 to ksz8. Signed-off-by: Michael Grzeschik --- v1 -> v4: - extracted this change from bigger previous patch v4 -> v5: - removed extra unavailable variables in ksz8_r_vlan_entries --- drivers/net/dsa/microchip/ksz8795.c | 222 drivers/net/dsa/microchip/ksz8795_spi.c | 2 +- drivers/net/dsa/microchip/ksz_common.h | 2 +- 3 files changed, 110 insertions(+), 116 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index c973db101b729..1d12597a1c8a4 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -74,7 +74,7 @@ static void ksz_port_cfg(struct ksz_device *dev, int port, int offset, u8 bits, bits, set ? bits : 0); } -static int ksz8795_reset_switch(struct ksz_device *dev) +static int ksz8_reset_switch(struct ksz_device *dev) { /* reset switch */ ksz_write8(dev, REG_POWER_MANAGEMENT_1, @@ -117,8 +117,7 @@ static void ksz8795_set_prio_queue(struct ksz_device *dev, int port, int queue) true); } -static void ksz8795_r_mib_cnt(struct ksz_device *dev, int port, u16 addr, - u64 *cnt) +static void ksz8_r_mib_cnt(struct ksz_device *dev, int port, u16 addr, u64 *cnt) { u16 ctrl_addr; u32 data; @@ -148,8 +147,8 @@ static void ksz8795_r_mib_cnt(struct ksz_device *dev, int port, u16 addr, mutex_unlock(&dev->alu_mutex); } -static void ksz8795_r_mib_pkt(struct ksz_device *dev, int port, u16 addr, - u64 *dropped, u64 *cnt) +static void ksz8_r_mib_pkt(struct ksz_device *dev, int port, u16 addr, + u64 *dropped, u64 *cnt) { u16 ctrl_addr; u32 data; @@ -195,7 +194,7 @@ static void ksz8795_r_mib_pkt(struct ksz_device *dev, int port, u16 addr, mutex_unlock(&dev->alu_mutex); } -static void ksz8795_freeze_mib(struct ksz_device *dev, int port, bool freeze) +static void ksz8_freeze_mib(struct ksz_device *dev, int port, bool freeze) { /* enable the port for flush/freeze function */ if (freeze) @@ -207,7 
+206,7 @@ static void ksz8795_freeze_mib(struct ksz_device *dev, int port, bool freeze) ksz_cfg(dev, REG_SW_CTRL_6, BIT(port), false); } -static void ksz8795_port_init_cnt(struct ksz_device *dev, int port) +static void ksz8_port_init_cnt(struct ksz_device *dev, int port) { struct ksz_port_mib *mib = &dev->ports[port].mib; @@ -235,8 +234,7 @@ static void ksz8795_port_init_cnt(struct ksz_device *dev, int port) memset(mib->counters, 0, dev->mib_cnt * sizeof(u64)); } -static void ksz8795_r_table(struct ksz_device *dev, int table, u16 addr, - u64 *data) +static void ksz8_r_table(struct ksz_device *dev, int table, u16 addr, u64 *data) { u16 ctrl_addr; @@ -248,8 +246,7 @@ static void ksz8795_r_table(struct ksz_device *dev, int table, u16 addr, mutex_unlock(&dev->alu_mutex); } -static void ksz8795_w_table(struct ksz_device *dev, int table, u16 addr, - u64 data) +static void ksz8_w_table(struct ksz_device *dev, int table, u16 addr, u64 data) { u16 ctrl_addr; @@ -261,7 +258,7 @@ static void ksz8795_w_table(struct ksz_device *dev, int table, u16 addr, mutex_unlock(&dev->alu_mutex); } -static int ksz8795_valid_dyn_entry(struct ksz_device *dev, u8 *data) +static int ksz8_valid_dyn_entry(struct ksz_device *dev, u8 *data) { int timeout = 100; @@ -284,9 +281,9 @@ static int ksz8795_valid_dyn_entry(struct ksz_device *dev, u8 *data) return 0; } -static int ksz8795_r_dyn_mac_table(struct ksz_device *dev, u16 addr, - u8 *mac_addr, u8 *fid, u8 *src_port, - u8 *timestamp, u16 *entries) +static int ksz8_r_dyn_mac_table(struct ksz_device *dev, u16 addr, + u8 *mac_addr, u8 *fid, u8 *src_port, + u8 *timestamp, u16 *entries) { u32 data_hi, data_lo; u16 ctrl_addr; @@ -298,7 +295,7 @@ static int ksz8795_r_dyn_mac_table(struct ksz_device *dev, u16 addr, mutex_lock(&dev->alu_mutex); ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr); - rc = ksz8795_valid_dyn_entry(dev, &data); + rc = ksz8_valid_dyn_entry(dev, &data); if (rc == -EAGAIN) { if (addr == 0) *entries = 0; @@ -341,13 +338,13 @@ static int 
ksz8795_r_dyn_mac_table(struct ksz_device *dev, u16 addr, return rc; } -static int ksz8795_r_sta_mac_table(struct ksz_device *dev, u16 addr, - struct alu_struct *alu) +static int ksz8_r_sta_mac_table(struct ksz_device *dev, u16 addr, + struct alu_struct *alu) {
[PATCH v5 2/6] net: dsa: microchip: ksz8795: move cpu_select_interface to extra function
This patch moves the cpu interface selection code to an individual function specific to the ksz8795. It will make it simpler to customize the code path for different switches supported by this driver. Signed-off-by: Michael Grzeschik --- v1 -> v5: - extracted this from previous bigger patch --- drivers/net/dsa/microchip/ksz8795.c | 92 - 1 file changed, 50 insertions(+), 42 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 1d12597a1c8a4..53cb41087a594 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -911,10 +911,58 @@ static void ksz8_port_mirror_del(struct dsa_switch *ds, int port, PORT_MIRROR_SNIFFER, false); } +static void ksz8795_cpu_interface_select(struct ksz_device *dev, int port) +{ + struct ksz_port *p = &dev->ports[port]; + u8 data8; + + if (!p->interface && dev->compat_interface) { + dev_warn(dev->dev, +"Using legacy switch \"phy-mode\" property, because it is missing on port %d node. " +"Please update your device tree.\n", +port); + p->interface = dev->compat_interface; + } + + /* Configure MII interface for proper network communication. 
*/ + ksz_read8(dev, REG_PORT_5_CTRL_6, &data8); + data8 &= ~PORT_INTERFACE_TYPE; + data8 &= ~PORT_GMII_1GPS_MODE; + switch (p->interface) { + case PHY_INTERFACE_MODE_MII: + p->phydev.speed = SPEED_100; + break; + case PHY_INTERFACE_MODE_RMII: + data8 |= PORT_INTERFACE_RMII; + p->phydev.speed = SPEED_100; + break; + case PHY_INTERFACE_MODE_GMII: + data8 |= PORT_GMII_1GPS_MODE; + data8 |= PORT_INTERFACE_GMII; + p->phydev.speed = SPEED_1000; + break; + default: + data8 &= ~PORT_RGMII_ID_IN_ENABLE; + data8 &= ~PORT_RGMII_ID_OUT_ENABLE; + if (p->interface == PHY_INTERFACE_MODE_RGMII_ID || + p->interface == PHY_INTERFACE_MODE_RGMII_RXID) + data8 |= PORT_RGMII_ID_IN_ENABLE; + if (p->interface == PHY_INTERFACE_MODE_RGMII_ID || + p->interface == PHY_INTERFACE_MODE_RGMII_TXID) + data8 |= PORT_RGMII_ID_OUT_ENABLE; + data8 |= PORT_GMII_1GPS_MODE; + data8 |= PORT_INTERFACE_RGMII; + p->phydev.speed = SPEED_1000; + break; + } + ksz_write8(dev, REG_PORT_5_CTRL_6, data8); + p->phydev.duplex = 1; +} + static void ksz8_port_setup(struct ksz_device *dev, int port, bool cpu_port) { struct ksz_port *p = &dev->ports[port]; - u8 data8, member; + u8 member; /* enable broadcast storm limit */ ksz_port_cfg(dev, port, P_BCAST_STORM_CTRL, PORT_BROADCAST_STORM, true); @@ -931,47 +979,7 @@ static void ksz8_port_setup(struct ksz_device *dev, int port, bool cpu_port) ksz_port_cfg(dev, port, P_PRIO_CTRL, PORT_802_1P_ENABLE, true); if (cpu_port) { - if (!p->interface && dev->compat_interface) { - dev_warn(dev->dev, -"Using legacy switch \"phy-mode\" property, because it is missing on port %d node. " -"Please update your device tree.\n", -port); - p->interface = dev->compat_interface; - } - - /* Configure MII interface for proper network communication. 
*/ - ksz_read8(dev, REG_PORT_5_CTRL_6, &data8); - data8 &= ~PORT_INTERFACE_TYPE; - data8 &= ~PORT_GMII_1GPS_MODE; - switch (p->interface) { - case PHY_INTERFACE_MODE_MII: - p->phydev.speed = SPEED_100; - break; - case PHY_INTERFACE_MODE_RMII: - data8 |= PORT_INTERFACE_RMII; - p->phydev.speed = SPEED_100; - break; - case PHY_INTERFACE_MODE_GMII: - data8 |= PORT_GMII_1GPS_MODE; - data8 |= PORT_INTERFACE_GMII; - p->phydev.speed = SPEED_1000; - break; - default: - data8 &= ~PORT_RGMII_ID_IN_ENABLE; - data8 &= ~PORT_RGMII_ID_OUT_ENABLE; - if (p->interface == PHY_INTERFACE_MODE_RGMII_ID || - p->interface == PHY_INTERFACE_MODE_RGMII_RXID) - data8 |= PORT_RGMII_ID_IN_ENABLE; - if (p->interface == PHY_INTERFACE_MODE_RGMII_ID || - p->interface == PHY_INTERFACE_MODE_RGMII_TXID) -
[PATCH v5 0/6] microchip: add support for ksz88x3 driver family
This series adds support for the ksz88x3 driver family to the DSA-based ksz drivers. The series takes the already available ksz8795 driver and turns it into a generic driver for the ksz8-based chips, which have similar functions but a totally different register layout. This branch is to be rebased on net-next/master. The mainlining discussion history of this branch: v1: https://lore.kernel.org/netdev/20191107110030.25199-1-m.grzesc...@pengutronix.de/ v2: https://lore.kernel.org/netdev/20191218200831.13796-1-m.grzesc...@pengutronix.de/ v3: https://lore.kernel.org/netdev/20200508154343.6074-1-m.grzesc...@pengutronix.de/ v4: https://lore.kernel.org/netdev/20200803054442.20089-1-m.grzesc...@pengutronix.de/ Michael Grzeschik (6): net: dsa: microchip: ksz8795: change drivers prefix to be generic net: dsa: microchip: ksz8795: move cpu_select_interface to extra function net: dsa: microchip: ksz8795: move register offsets and shifts to separate struct net: dsa: microchip: ksz8795: add support for ksz88xx chips net: dsa: microchip: Add Microchip KSZ8863 SPI based driver support dt-bindings: net: dsa: document additional Microchip KSZ8863/8873 switch .../bindings/net/dsa/microchip,ksz.yaml | 2 + drivers/net/dsa/microchip/ksz8.h | 69 ++ drivers/net/dsa/microchip/ksz8795.c | 888 -- drivers/net/dsa/microchip/ksz8795_reg.h | 125 +-- drivers/net/dsa/microchip/ksz8795_spi.c | 46 +- drivers/net/dsa/microchip/ksz_common.h| 3 +- 6 files changed, 730 insertions(+), 403 deletions(-) create mode 100644 drivers/net/dsa/microchip/ksz8.h -- 2.29.2
[PATCH v5 6/6] dt-bindings: net: dsa: document additional Microchip KSZ8863/8873 switch
It is a 3-Port 10/100 Ethernet Switch. One CPU-Port and two Switch-Ports. Cc: devicet...@vger.kernel.org Reviewed-by: Andrew Lunn Acked-by: Rob Herring Reviewed-by: Florian Fainelli Signed-off-by: Michael Grzeschik --- v1 -> v3: - nothing changes - already Acked-by Rob Herring v1 -> v4: - nothing changes v4 -> v5: - nothing changes --- Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml index 9f7d131bbcef0..84985f53bffd4 100644 --- a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml +++ b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml @@ -21,6 +21,8 @@ properties: - microchip,ksz8765 - microchip,ksz8794 - microchip,ksz8795 + - microchip,ksz8863 + - microchip,ksz8873 - microchip,ksz9477 - microchip,ksz9897 - microchip,ksz9896 -- 2.29.2
[PATCH v5 4/6] net: dsa: microchip: ksz8795: add support for ksz88xx chips
We add support for the ksz8863 and ksz8873 chips which are using the same register patterns but other offsets as the ksz8795. Signed-off-by: Michael Grzeschik --- v1 -> v4: - extracted this change from bigger previous patch v4 -> v5: - added clear of reset bit for ksz8863 reset code - using extra device flag IS_KSZ88x3 instead of is_ksz8795 function - using DSA_TAG_PROTO_KSZ9893 protocol for ksz88x3 instead --- drivers/net/dsa/microchip/ksz8795.c | 345 +++- drivers/net/dsa/microchip/ksz8795_reg.h | 40 ++- drivers/net/dsa/microchip/ksz_common.h | 1 + 3 files changed, 299 insertions(+), 87 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 127498a9b8f72..9484667a29a35 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -22,6 +22,9 @@ #include "ksz8795_reg.h" #include "ksz8.h" +/* Used with variable features to indicate capabilities. */ +#define IS_88X3BIT(0) + static const u8 ksz8795_regs[] = { [REG_IND_CTRL_0]= 0x6E, [REG_IND_DATA_8]= 0x70, @@ -72,9 +75,60 @@ static const u8 ksz8795_shifts[] = { [DYNAMIC_MAC_SRC_PORT] = 24, }; -static const struct { +static const u8 ksz8863_regs[] = { + [REG_IND_CTRL_0]= 0x79, + [REG_IND_DATA_8]= 0x7B, + [REG_IND_DATA_CHECK]= 0x7B, + [REG_IND_DATA_HI] = 0x7C, + [REG_IND_DATA_LO] = 0x80, + [REG_IND_MIB_CHECK] = 0x80, + [P_FORCE_CTRL] = 0x0C, + [P_LINK_STATUS] = 0x0E, + [P_LOCAL_CTRL] = 0x0C, + [P_NEG_RESTART_CTRL]= 0x0D, + [P_REMOTE_STATUS] = 0x0E, + [P_SPEED_STATUS]= 0x0F, + [S_TAIL_TAG_CTRL] = 0x03, +}; + +static const u32 ksz8863_masks[] = { + [PORT_802_1P_REMAPPING] = BIT(3), + [SW_TAIL_TAG_ENABLE]= BIT(6), + [MIB_COUNTER_OVERFLOW] = BIT(7), + [MIB_COUNTER_VALID] = BIT(6), + [VLAN_TABLE_FID]= GENMASK(15, 12), + [VLAN_TABLE_MEMBERSHIP] = GENMASK(18, 16), + [VLAN_TABLE_VALID] = BIT(19), + [STATIC_MAC_TABLE_VALID]= BIT(19), + [STATIC_MAC_TABLE_USE_FID] = BIT(21), + [STATIC_MAC_TABLE_FID] = GENMASK(29, 26), + [STATIC_MAC_TABLE_OVERRIDE] = 
BIT(20), + [STATIC_MAC_TABLE_FWD_PORTS]= GENMASK(18, 16), + [DYNAMIC_MAC_TABLE_ENTRIES_H] = GENMASK(5, 0), + [DYNAMIC_MAC_TABLE_MAC_EMPTY] = BIT(7), + [DYNAMIC_MAC_TABLE_NOT_READY] = BIT(7), + [DYNAMIC_MAC_TABLE_ENTRIES] = GENMASK(31, 28), + [DYNAMIC_MAC_TABLE_FID] = GENMASK(19, 16), + [DYNAMIC_MAC_TABLE_SRC_PORT]= GENMASK(21, 20), + [DYNAMIC_MAC_TABLE_TIMESTAMP] = GENMASK(23, 22), +}; + +static u8 ksz8863_shifts[] = { + [VLAN_TABLE_MEMBERSHIP] = 16, + [STATIC_MAC_FWD_PORTS] = 16, + [STATIC_MAC_FID]= 22, + [DYNAMIC_MAC_ENTRIES_H] = 3, + [DYNAMIC_MAC_ENTRIES] = 24, + [DYNAMIC_MAC_FID] = 16, + [DYNAMIC_MAC_TIMESTAMP] = 24, + [DYNAMIC_MAC_SRC_PORT] = 20, +}; + +struct mib_names { char string[ETH_GSTRING_LEN]; -} mib_names[] = { +}; + +static const struct mib_names ksz87xx_mib_names[] = { { "rx_hi" }, { "rx_undersize" }, { "rx_fragments" }, @@ -113,6 +167,43 @@ static const struct { { "tx_discards" }, }; +static const struct mib_names ksz88xx_mib_names[] = { + { "rx" }, + { "rx_hi" }, + { "rx_undersize" }, + { "rx_fragments" }, + { "rx_oversize" }, + { "rx_jabbers" }, + { "rx_symbol_err" }, + { "rx_crc_err" }, + { "rx_align_err" }, + { "rx_mac_ctrl" }, + { "rx_pause" }, + { "rx_bcast" }, + { "rx_mcast" }, + { "rx_ucast" }, + { "rx_64_or_less" }, + { "rx_65_127" }, + { "rx_128_255" }, + { "rx_256_511" }, + { "rx_512_1023" }, + { "rx_1024_1522" }, + { "tx" }, + { "tx_hi" }, + { "tx_late_col" }, + { "tx_pause" }, + { "tx_bcast" }, + { "tx_mcast" }, + { "tx_ucast" }, + { "tx_deferred" }, + { "tx_total_col" }, + { "tx_exc_col" }, + { "tx_single_col" }, + { "tx_mult_col" }, + { "rx_discards" }, + { "tx_discards" }, +}; + static void ksz_cfg(struct ksz_device *dev, u32 addr, u8 bits, bool set) { regmap_update_bits(dev->regmap[0], addr, bits, set ? bits : 0); @@ -127,10 +218,18 @@ static void ksz_port_cfg(struct ksz_device *dev, int port, int offset, u8 bits, static int ksz8_reset_switch(struct ksz_device *dev)
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
On Mon, 7 Dec 2020 at 13:02, David Howells wrote: > > Ard Biesheuvel wrote: > > > > Yeah - the problem with that is that for sunrpc, we might be dealing with > > > 1MB > > > plus bits of non-contiguous pages, requiring >8K of scatterlist elements > > > (admittedly, we can chain them, but we may have to do one or more large > > > allocations). > > > > > > > However, I would recommend against it: > > > > > > Sorry, recommend against what? > > > > > > > Recommend against the current approach of manipulating the input like > > this and feeding it into the skcipher piecemeal. > > Right. I understand the problem, but as I mentioned above, the scatterlist > itself becomes a performance issue as it may exceed two pages in size. Double > that as there may need to be separate input and output scatterlists. > I wasn't aware that Herbert's work hadn't been merged yet. So that means it is entirely reasonable to split the input like this and feed the first part into a cbc(aes) skcipher and the last part into a cts(cbc(aes)) skcipher, provided that you ensure that the last part covers the final two blocks (one full block and one block that is either full or partial) With Herbert's changes, you will be able to use the same skcipher, and pass a flag to all but the final part that more data is coming. But for lack of that, the current approach is optimal for cases where having to cover the entire input with a single scatterlist is undesirable. > > Herbert recently made some changes for MSG_MORE support in the AF_ALG > > code, which permits a skcipher encryption to be split into several > > invocations of the skcipher layer without the need for this complexity > > on the side of the caller. Maybe there is a way to reuse that here. > > Herbert? > > I wonder if it would help if the input buffer and output buffer didn't have to > correspond exactly in usage - ie. the output buffer could be used at a slower > rate than the input to allow for buffering inside the crypto algorithm. 
> I don't follow - how could one be used at a slower rate? > > > Can you also do SHA at the same time in the same loop? > > > > SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD. > > The former doesn't really fit the current API so we'd have to invent > > something for it. > > The hashes corresponding to the kerberos enctypes I'm supporting are: > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96. > > HMAC-SHA256 for aes128-cts-hmac-sha256-128 > > HMAC-SHA384 for aes256-cts-hmac-sha384-192 > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac > > I'm not sure you can support all of those with the instructions available. > It depends on whether the caller can make use of the authenc() pattern, which is a type of AEAD we support. There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)), including h/w accelerated ones, but none that implement ciphertext stealing. So that means that, even if you manage to use the AEAD layer to perform both at the same time, the generic authenc() template will perform the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes, respectively, which won't give you any benefit until accelerated implementations turn up that perform the whole operation in one pass over the input. And even then, I don't think the performance benefit will be worth it.
[PATCH net-next 2/6] s390/ccwgroup: use bus->dev_groups for bus-based sysfs attributes
Bus drivers have their own way of describing the sysfs attributes that all
devices on a bus should provide. Switch ccwgroup_attr_groups over to use
bus->dev_groups, and thus free up dev->groups for usage by the ccwgroup
device drivers.

While adjusting the attribute naming, use ATTRIBUTE_GROUPS() to get rid of
some boilerplate code.

Signed-off-by: Julian Wiedmann
Acked-by: Heiko Carstens
---
 drivers/s390/cio/ccwgroup.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/cio/ccwgroup.c b/drivers/s390/cio/ccwgroup.c
index 483a9ecfcbb1..444385da5792 100644
--- a/drivers/s390/cio/ccwgroup.c
+++ b/drivers/s390/cio/ccwgroup.c
@@ -210,18 +210,12 @@ static ssize_t ccwgroup_ungroup_store(struct device *dev,
 static DEVICE_ATTR(ungroup, 0200, NULL, ccwgroup_ungroup_store);
 static DEVICE_ATTR(online, 0644, ccwgroup_online_show, ccwgroup_online_store);
 
-static struct attribute *ccwgroup_attrs[] = {
+static struct attribute *ccwgroup_dev_attrs[] = {
 	&dev_attr_online.attr,
 	&dev_attr_ungroup.attr,
 	NULL,
 };
-static struct attribute_group ccwgroup_attr_group = {
-	.attrs = ccwgroup_attrs,
-};
-static const struct attribute_group *ccwgroup_attr_groups[] = {
-	&ccwgroup_attr_group,
-	NULL,
-};
+ATTRIBUTE_GROUPS(ccwgroup_dev);
 
 static void ccwgroup_ungroup_workfn(struct work_struct *work)
 {
@@ -384,7 +378,6 @@ int ccwgroup_create_dev(struct device *parent, struct ccwgroup_driver *gdrv,
 	}
 	dev_set_name(&gdev->dev, "%s", dev_name(&gdev->cdev[0]->dev));
-	gdev->dev.groups = ccwgroup_attr_groups;
 
 	if (gdrv) {
 		gdev->dev.driver = &gdrv->driver;
@@ -487,6 +480,7 @@ static void ccwgroup_shutdown(struct device *dev)
 
 static struct bus_type ccwgroup_bus_type = {
 	.name = "ccwgroup",
+	.dev_groups = ccwgroup_dev_groups,
 	.remove = ccwgroup_remove,
 	.shutdown = ccwgroup_shutdown,
 };
-- 
2.17.1
[PATCH net-next 5/6] s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state
Reuse the QETH_QDIO_BUF_EMPTY state to indicate that a TX buffer has been
completed with a QAOB notification, and may be cleaned up by
qeth_cleanup_handled_pending().

Signed-off-by: Julian Wiedmann
---
 drivers/s390/net/qeth_core.h      | 2 --
 drivers/s390/net/qeth_core_main.c | 5 ++---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index d150da95d073..6f5ddc3eab8c 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -424,8 +424,6 @@ enum qeth_qdio_out_buffer_state {
 	/* Received QAOB notification on CQ: */
 	QETH_QDIO_BUF_QAOB_OK,
 	QETH_QDIO_BUF_QAOB_ERROR,
-	/* Handled via transfer pending / completion queue. */
-	QETH_QDIO_BUF_HANDLED_DELAYED,
 };
 
 struct qeth_qdio_out_buffer {
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 869694217450..da27ef451d05 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -477,8 +477,7 @@ static void qeth_cleanup_handled_pending(struct qeth_qdio_out_q *q, int bidx,
 	while (c) {
 		if (forced_cleanup ||
-		    atomic_read(&c->state) ==
-		    QETH_QDIO_BUF_HANDLED_DELAYED) {
+		    atomic_read(&c->state) == QETH_QDIO_BUF_EMPTY) {
 			struct qeth_qdio_out_buffer *f = c;
 
 			QETH_CARD_TEXT(f->q->card, 5, "fp");
@@ -549,7 +548,7 @@ static void qeth_qdio_handle_aob(struct qeth_card *card,
 			kmem_cache_free(qeth_core_header_cache, data);
 		}
 
-		atomic_set(&buffer->state, QETH_QDIO_BUF_HANDLED_DELAYED);
+		atomic_set(&buffer->state, QETH_QDIO_BUF_EMPTY);
 		break;
 	default:
 		WARN_ON_ONCE(1);
-- 
2.17.1
[PATCH net-next 3/6] s390/qeth: use dev->groups for common sysfs attributes
All qeth devices have a minimum set of sysfs attributes, and non-OSN devices share a group of additional attributes. Depending on whether the device is forced to use a specific discipline, the device_type then specifies further attributes. Shift the common attributes into dev->groups, so that the device_type only contains the discipline-specific attributes. This avoids exposing the common attributes to the disciplines, and nicely cleans up our sysfs code. While replacing the qeth_l*_*_device_attributes() helpers, switch from sysfs_*_groups() to the more generic device_*_groups(). Signed-off-by: Julian Wiedmann --- drivers/s390/net/qeth_core.h | 6 ++--- drivers/s390/net/qeth_core_main.c | 7 -- drivers/s390/net/qeth_core_sys.c | 41 ++- drivers/s390/net/qeth_l2.h| 2 -- drivers/s390/net/qeth_l2_main.c | 4 +-- drivers/s390/net/qeth_l2_sys.c| 19 -- drivers/s390/net/qeth_l3.h| 2 -- drivers/s390/net/qeth_l3_main.c | 4 +-- drivers/s390/net/qeth_l3_sys.c| 21 9 files changed, 30 insertions(+), 76 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 69b474f8735e..d150da95d073 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -1063,10 +1063,8 @@ extern const struct qeth_discipline qeth_l2_discipline; extern const struct qeth_discipline qeth_l3_discipline; extern const struct ethtool_ops qeth_ethtool_ops; extern const struct ethtool_ops qeth_osn_ethtool_ops; -extern const struct attribute_group *qeth_generic_attr_groups[]; -extern const struct attribute_group *qeth_osn_attr_groups[]; -extern const struct attribute_group qeth_device_attr_group; -extern const struct attribute_group qeth_device_blkt_group; +extern const struct attribute_group *qeth_dev_groups[]; +extern const struct attribute_group *qeth_osn_dev_groups[]; extern const struct device_type qeth_generic_devtype; const char *qeth_get_cardname_short(struct qeth_card *); diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c 
index 8171b9d3a70e..05d0b16bd7d6 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -6375,13 +6375,11 @@ void qeth_core_free_discipline(struct qeth_card *card) const struct device_type qeth_generic_devtype = { .name = "qeth_generic", - .groups = qeth_generic_attr_groups, }; EXPORT_SYMBOL_GPL(qeth_generic_devtype); static const struct device_type qeth_osn_devtype = { .name = "qeth_osn", - .groups = qeth_osn_attr_groups, }; #define DBF_NAME_LEN 20 @@ -6561,6 +6559,11 @@ static int qeth_core_probe_device(struct ccwgroup_device *gdev) if (rc) goto err_chp_desc; + if (IS_OSN(card)) + gdev->dev.groups = qeth_osn_dev_groups; + else + gdev->dev.groups = qeth_dev_groups; + enforced_disc = qeth_enforce_discipline(card); switch (enforced_disc) { case QETH_DISCIPLINE_UNDETERMINED: diff --git a/drivers/s390/net/qeth_core_sys.c b/drivers/s390/net/qeth_core_sys.c index 4441b3393eaf..a0f777f76f66 100644 --- a/drivers/s390/net/qeth_core_sys.c +++ b/drivers/s390/net/qeth_core_sys.c @@ -640,23 +640,17 @@ static struct attribute *qeth_blkt_device_attrs[] = { &dev_attr_inter_jumbo.attr, NULL, }; -const struct attribute_group qeth_device_blkt_group = { + +static const struct attribute_group qeth_dev_blkt_group = { .name = "blkt", .attrs = qeth_blkt_device_attrs, }; -EXPORT_SYMBOL_GPL(qeth_device_blkt_group); -static struct attribute *qeth_device_attrs[] = { - &dev_attr_state.attr, - &dev_attr_chpid.attr, - &dev_attr_if_name.attr, - &dev_attr_card_type.attr, +static struct attribute *qeth_dev_extended_attrs[] = { &dev_attr_inbuf_size.attr, &dev_attr_portno.attr, &dev_attr_portname.attr, &dev_attr_priority_queueing.attr, - &dev_attr_buffer_count.attr, - &dev_attr_recover.attr, &dev_attr_performance_stats.attr, &dev_attr_layer2.attr, &dev_attr_isolation.attr, @@ -664,18 +658,12 @@ static struct attribute *qeth_device_attrs[] = { &dev_attr_switch_attrs.attr, NULL, }; -const struct attribute_group qeth_device_attr_group = { - .attrs = qeth_device_attrs, 
-}; -EXPORT_SYMBOL_GPL(qeth_device_attr_group); -const struct attribute_group *qeth_generic_attr_groups[] = { - &qeth_device_attr_group, - &qeth_device_blkt_group, - NULL, +static const struct attribute_group qeth_dev_extended_group = { + .attrs = qeth_dev_extended_attrs, }; -static struct attribute *qeth_osn_device_attrs[] = { +static struct attribute *qeth_dev_attrs[] = { &dev_attr_state.attr, &dev_attr_chpid.attr, &dev_attr_if_name.attr, @@ -684,10 +672,19 @@ static struct attribute *qeth_osn_device_attrs[] = { &dev_attr_re
[PATCH net-next 4/6] s390/qeth: don't replace a fully completed async TX buffer
For TX buffers that require an additional async notification via QAOB, the TX completion code can now manage all the necessary processing if the notification has already occurred (or is occurring concurrently). In such cases we can avoid replacing the metadata that is associated with the buffer's slot on the ring, and just keep using the current one. As qeth_clear_output_buffer() will also handle any kmem cache-allocated memory that was mapped into the TX buffer, qeth_qdio_handle_aob() doesn't need to worry about it. While at it, also remove the unneeded forward declaration for qeth_init_qdio_out_buf(). Signed-off-by: Julian Wiedmann --- drivers/s390/net/qeth_core_main.c | 89 ++- 1 file changed, 51 insertions(+), 38 deletions(-) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 05d0b16bd7d6..869694217450 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -75,7 +75,6 @@ static void qeth_notify_skbs(struct qeth_qdio_out_q *queue, enum iucv_tx_notify notification); static void qeth_tx_complete_buf(struct qeth_qdio_out_buffer *buf, bool error, int budget); -static int qeth_init_qdio_out_buf(struct qeth_qdio_out_q *, int); static void qeth_close_dev_handler(struct work_struct *work) { @@ -517,18 +516,6 @@ static void qeth_qdio_handle_aob(struct qeth_card *card, buffer = (struct qeth_qdio_out_buffer *) aob->user1; QETH_CARD_TEXT_(card, 5, "%lx", aob->user1); - /* Free dangling allocations. The attached skbs are handled by -* qeth_cleanup_handled_pending(). 
-*/ - for (i = 0; -i < aob->sb_count && i < QETH_MAX_BUFFER_ELEMENTS(card); -i++) { - void *data = phys_to_virt(aob->sba[i]); - - if (data && buffer->is_header[i]) - kmem_cache_free(qeth_core_header_cache, data); - } - if (aob->aorc) { QETH_CARD_TEXT_(card, 2, "aorc%02X", aob->aorc); new_state = QETH_QDIO_BUF_QAOB_ERROR; @@ -536,10 +523,9 @@ static void qeth_qdio_handle_aob(struct qeth_card *card, switch (atomic_xchg(&buffer->state, new_state)) { case QETH_QDIO_BUF_PRIMED: - /* Faster than TX completion code. */ - notification = qeth_compute_cq_notification(aob->aorc, 0); - qeth_notify_skbs(buffer->q, buffer, notification); - atomic_set(&buffer->state, QETH_QDIO_BUF_HANDLED_DELAYED); + /* Faster than TX completion code, let it handle the async +* completion for us. +*/ break; case QETH_QDIO_BUF_PENDING: /* TX completion code is active and will handle the async @@ -550,6 +536,19 @@ static void qeth_qdio_handle_aob(struct qeth_card *card, /* TX completion code is already finished. */ notification = qeth_compute_cq_notification(aob->aorc, 1); qeth_notify_skbs(buffer->q, buffer, notification); + + /* Free dangling allocations. The attached skbs are handled by +* qeth_cleanup_handled_pending(). 
+*/ + for (i = 0; +i < aob->sb_count && i < QETH_MAX_BUFFER_ELEMENTS(card); +i++) { + void *data = phys_to_virt(aob->sba[i]); + + if (data && buffer->is_header[i]) + kmem_cache_free(qeth_core_header_cache, data); + } + atomic_set(&buffer->state, QETH_QDIO_BUF_HANDLED_DELAYED); break; default: @@ -6078,9 +6077,13 @@ static void qeth_iqd_tx_complete(struct qeth_qdio_out_q *queue, QDIO_OUTBUF_STATE_FLAG_PENDING)) { WARN_ON_ONCE(card->options.cq != QETH_CQ_ENABLED); - if (atomic_cmpxchg(&buffer->state, QETH_QDIO_BUF_PRIMED, - QETH_QDIO_BUF_PENDING) == - QETH_QDIO_BUF_PRIMED) { + QETH_CARD_TEXT_(card, 5, "pel%u", bidx); + + switch (atomic_cmpxchg(&buffer->state, + QETH_QDIO_BUF_PRIMED, + QETH_QDIO_BUF_PENDING)) { + case QETH_QDIO_BUF_PRIMED: + /* We have initial ownership, no QAOB (yet): */ qeth_notify_skbs(queue, buffer, TX_NOTIFY_PENDING); /* Handle race with qeth_qdio_handle_aob(): */ @@ -6088,39 +6091,49 @@ static void qeth_iqd_tx_complete(struct qeth_qdio_out_q *queue, QETH_QDIO_BUF_NEED_QAOB)) { case QETH_QDIO_BUF_PENDING: /* No concurrent QAOB notification. */ - break; + +
[PATCH net-next 0/6] s390/qeth: updates 2020-12-07
Hi Jakub, please apply the following patch series for qeth to netdev's net-next tree. Some sysfs cleanups (with the prep work in ccwgroup acked by Heiko), and a few improvements to the code that deals with async TX completion notifications for IQD devices. This also brings the missing patch from the previous net-next submission. Thanks, Julian Julian Wiedmann (6): s390/qeth: don't call INIT_LIST_HEAD() on iob's list entry s390/ccwgroup: use bus->dev_groups for bus-based sysfs attributes s390/qeth: use dev->groups for common sysfs attributes s390/qeth: don't replace a fully completed async TX buffer s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state s390/qeth: make qeth_qdio_handle_aob() more robust drivers/s390/cio/ccwgroup.c | 12 +--- drivers/s390/net/qeth_core.h | 10 +-- drivers/s390/net/qeth_core_main.c | 111 +- drivers/s390/net/qeth_core_sys.c | 41 +-- drivers/s390/net/qeth_l2.h| 2 - drivers/s390/net/qeth_l2_main.c | 4 +- drivers/s390/net/qeth_l2_sys.c| 19 - drivers/s390/net/qeth_l3.h| 2 - drivers/s390/net/qeth_l3_main.c | 4 +- drivers/s390/net/qeth_l3_sys.c| 21 -- 10 files changed, 92 insertions(+), 134 deletions(-) -- 2.17.1
[PATCH net-next 6/6] s390/qeth: make qeth_qdio_handle_aob() more robust
When qeth_qdio_handle_aob() frees dangling allocations in the notified TX
buffer, there are rare tear-down cases where qeth_drain_output_queue()
would later call qeth_clear_output_buffer() for the same buffer - and thus
end up walking the buffer a second time to check for dangling kmem_cache
allocations.

Luckily current code previously scrubs such a buffer, so
qeth_clear_output_buffer() would find buf->buffer->element[i].addr as NULL
and not do anything. But this is fragile, and we can easily improve it by
consistently clearing the ->is_header flag after freeing the allocation.

Signed-off-by: Julian Wiedmann
---
 drivers/s390/net/qeth_core_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index da27ef451d05..f4b60294a969 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -546,6 +546,7 @@ static void qeth_qdio_handle_aob(struct qeth_card *card,
 			if (data && buffer->is_header[i])
 				kmem_cache_free(qeth_core_header_cache, data);
+			buffer->is_header[i] = 0;
 		}
 
 		atomic_set(&buffer->state, QETH_QDIO_BUF_EMPTY);
-- 
2.17.1
[PATCH net-next 1/6] s390/qeth: don't call INIT_LIST_HEAD() on iob's list entry
INIT_LIST_HEAD() only needs to be called on actual list heads. While at it clarify the naming of the field. Suggested-by: Vasily Gorbik Signed-off-by: Julian Wiedmann --- drivers/s390/net/qeth_core.h | 2 +- drivers/s390/net/qeth_core_main.c | 9 - 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 0e9af2fbaa76..69b474f8735e 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -624,7 +624,7 @@ struct qeth_reply { }; struct qeth_cmd_buffer { - struct list_head list; + struct list_head list_entry; struct completion done; spinlock_t lock; unsigned int length; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 319190824cd2..8171b9d3a70e 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -615,7 +615,7 @@ static void qeth_enqueue_cmd(struct qeth_card *card, struct qeth_cmd_buffer *iob) { spin_lock_irq(&card->lock); - list_add_tail(&iob->list, &card->cmd_waiter_list); + list_add_tail(&iob->list_entry, &card->cmd_waiter_list); spin_unlock_irq(&card->lock); } @@ -623,7 +623,7 @@ static void qeth_dequeue_cmd(struct qeth_card *card, struct qeth_cmd_buffer *iob) { spin_lock_irq(&card->lock); - list_del(&iob->list); + list_del(&iob->list_entry); spin_unlock_irq(&card->lock); } @@ -977,7 +977,7 @@ static void qeth_clear_ipacmd_list(struct qeth_card *card) QETH_CARD_TEXT(card, 4, "clipalst"); spin_lock_irqsave(&card->lock, flags); - list_for_each_entry(iob, &card->cmd_waiter_list, list) + list_for_each_entry(iob, &card->cmd_waiter_list, list_entry) qeth_notify_cmd(iob, -ECANCELED); spin_unlock_irqrestore(&card->lock, flags); } @@ -1047,7 +1047,6 @@ struct qeth_cmd_buffer *qeth_alloc_cmd(struct qeth_channel *channel, init_completion(&iob->done); spin_lock_init(&iob->lock); - INIT_LIST_HEAD(&iob->list); refcount_set(&iob->ref_count, 1); iob->channel = channel; iob->timeout = timeout; @@ -1094,7 +1093,7 @@ 
static void qeth_issue_next_read_cb(struct qeth_card *card, /* match against pending cmd requests */ spin_lock_irqsave(&card->lock, flags); - list_for_each_entry(tmp, &card->cmd_waiter_list, list) { + list_for_each_entry(tmp, &card->cmd_waiter_list, list_entry) { if (tmp->match && tmp->match(tmp, iob)) { request = tmp; /* take the object outside the lock */ -- 2.17.1
[PATCH v2 bpf-next 00/13] Socket migration for SO_REUSEPORT.
The SO_REUSEPORT option allows sockets to listen on the same port and to
accept connections evenly. However, there is a defect in the current
implementation[1]. When a SYN packet is received, the connection is tied
to a listening socket. Accordingly, when the listener is closed, in-flight
requests during the three-way handshake and child sockets in the accept
queue are dropped even if other listeners on the same port could accept
such connections.

This situation can happen when various server management tools restart
server (such as nginx) processes. For instance, when we change nginx
configurations and restart it, it spins up new workers that respect the
new configuration and closes all listeners on the old workers, so the
in-flight ACKs of the 3WHS are answered with RST.

The SO_REUSEPORT option is excellent for improving scalability. On the
other hand, as a trade-off, users have to know in detail how the kernel
handles SYN packets and implement connection draining with eBPF[2]:

  1. Stop routing SYN packets to the listener by eBPF.
  2. Wait for all timers to expire to complete requests.
  3. Accept connections until EAGAIN, then close the listener.

or

  1. Start counting SYN packets and accept() syscalls using an eBPF map.
  2. Stop routing SYN packets.
  3. Accept connections up to the count, then close the listener.

Either way, we cannot close a listener immediately. Ideally, however, the
application should not need to drain the not-yet-accepted sockets, because
the 3WHS and the tying of a connection to a listener are just kernel
behaviour. The root cause is within the kernel, so the issue should be
addressed in kernel space and should not be visible to user space. This
patchset fixes it so that users need not take care of the kernel
implementation or connection draining. With this patchset, the kernel
redistributes requests and connections from a listener to other listeners
in the same reuseport group at/after close() or shutdown() syscalls.
Although some software does connection draining, there are still merits in migration. For some security reasons such as replacing TLS certificates, we may want to apply new settings as soon as possible and/or we may not be able to wait for connection draining. The sockets in the accept queue have not started application sessions yet. So, if we do not drain such sockets, they can be handled by the newer listeners and could have a longer lifetime. It is difficult to drain all connections in every case, but we can decrease such aborted connections by migration. In that sense, migration is always better than draining. Moreover, auto-migration simplifies userspace logic and also works well in a case where we cannot modify and build a server program to implement the workaround. Note that the source and destination listeners MUST have the same settings at the socket API level; otherwise, applications may face inconsistency and cause errors. In such a case, we have to use eBPF program to select a specific listener or to cancel migration. 
Link: [1] The SO_REUSEPORT socket option https://lwn.net/Articles/542629/ [2] Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode https://lore.kernel.org/netdev/1458828813.10868.65.ca...@edumazet-glaptop3.roam.corp.google.com/ Changelog: v2: * Do not save closed sockets in socks[] * Revert 607904c357c61adf20b8fd18af765e501d61a385 * Extract inet_csk_reqsk_queue_migrate() into a single patch * Change the spin_lock order to avoid lockdep warning * Add static to __reuseport_select_sock * Use refcount_inc_not_zero() in reuseport_select_migrated_sock() * Set the default attach type in bpf_prog_load_check_attach() * Define new proto of BPF_FUNC_get_socket_cookie * Fix test to be compiled successfully * Update commit messages v1: https://lore.kernel.org/netdev/20201201144418.35045-1-kun...@amazon.co.jp/ * Remove the sysctl option * Enable migration if eBPF progam is not attached * Add expected_attach_type to check if eBPF program can migrate sockets * Add a field to tell migration type to eBPF program * Support BPF_FUNC_get_socket_cookie to get the cookie of sk * Allocate an empty skb if skb is NULL * Pass req_to_sk(req)->sk_hash because listener's hash is zero * Update commit messages and coverletter RFC: https://lore.kernel.org/netdev/20201117094023.3685-1-kun...@amazon.co.jp/ Kuniyuki Iwashima (13): tcp: Allow TCP_CLOSE sockets to hold the reuseport group. bpf: Define migration types for SO_REUSEPORT. Revert "locking/spinlocks: Remove the unused spin_lock_bh_nested() API" tcp: Introduce inet_csk_reqsk_queue_migrate(). tcp: Set the new listener to migrated TFO requests. tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues. tcp: Migrate TCP_NEW_SYN_RECV requests. bpf: Introduce two attach types for BPF_PROG_TYPE_SK_REUSEPORT. libbpf: Set expected_attach_type for BPF_PROG_TYPE_SK_REUSEPORT. bpf: Add migration to sk_reuseport_(kern|md). bpf: Support BPF
[PATCH v2 bpf-next 01/13] tcp: Allow TCP_CLOSE sockets to hold the reuseport group.
This patch is a preparation for migrating incoming connections in the
later commits; it adds a field (num_closed_socks) to struct sock_reuseport
to allow TCP_CLOSE sockets to keep access to the reuseport group.

When we close a listening socket and want to migrate its connections to
another listener in the same reuseport group, we have to handle two kinds
of child sockets: those the listening socket has a reference to, and those
it does not. The former are the TCP_ESTABLISHED/TCP_SYN_RECV sockets
sitting in the accept queue of their listening socket, so we can pop them
out and push them into another listener's queue at close() or shutdown()
syscalls. The latter, TCP_NEW_SYN_RECV sockets, are still in the three-way
handshake and not in the accept queue; we cannot reach them at close() or
shutdown(), and so have to migrate such immature sockets after their
listening socket has been closed.

Currently, once their listening socket has been closed, TCP_NEW_SYN_RECV
sockets are freed when the final ACK is received or SYN+ACKs are
retransmitted. At that point, if we could select a new listener from the
same reuseport group, no connection would be aborted. However, this is
impossible today because reuseport_detach_sock() sets sk_reuseport_cb to
NULL and forbids closed sockets from accessing the reuseport group.

This patch allows TCP_CLOSE sockets to hold sk_reuseport_cb while any
child sockets still reference them. The point is that
reuseport_detach_sock() is called twice, from inet_unhash() and from
sk_destruct(). The first call decrements num_socks and increments
num_closed_socks. Later, when all migrated connections have been accepted,
the second call decrements num_closed_socks and sets sk_reuseport_cb to
NULL. With this change, closed sockets can keep sk_reuseport_cb until all
child requests have been freed or accepted.
Consequently calling listen() after shutdown() can cause EADDRINUSE or EBUSY in reuseport_add_sock() or inet_csk_bind_conflict() which expect that such sockets should not have the reuseport group. Therefore, this patch also loosens such validation rules so that the socket can listen again if it has the same reuseport group with other listening sockets. Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- include/net/sock_reuseport.h| 5 +++-- net/core/sock_reuseport.c | 39 +++-- net/ipv4/inet_connection_sock.c | 7 -- 3 files changed, 35 insertions(+), 16 deletions(-) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 505f1e18e9bf..0e558ca7afbf 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -13,8 +13,9 @@ extern spinlock_t reuseport_lock; struct sock_reuseport { struct rcu_head rcu; - u16 max_socks; /* length of socks */ - u16 num_socks; /* elements in socks */ + u16 max_socks; /* length of socks */ + u16 num_socks; /* elements in socks */ + u16 num_closed_socks; /* closed elements in socks */ /* The last synq overflow event timestamp of this * reuse->socks[] group. 
*/ diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index bbdd3c7b6cb5..c26f4256ff41 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -98,14 +98,15 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) return NULL; more_reuse->num_socks = reuse->num_socks; + more_reuse->num_closed_socks = reuse->num_closed_socks; more_reuse->prog = reuse->prog; more_reuse->reuseport_id = reuse->reuseport_id; more_reuse->bind_inany = reuse->bind_inany; more_reuse->has_conns = reuse->has_conns; + more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts); memcpy(more_reuse->socks, reuse->socks, reuse->num_socks * sizeof(struct sock *)); - more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts); for (i = 0; i < reuse->num_socks; ++i) rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, @@ -152,8 +153,10 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) reuse = rcu_dereference_protected(sk2->sk_reuseport_cb, lockdep_is_held(&reuseport_lock)); old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb, -lockdep_is_held(&reuseport_lock)); - if (old_reuse && old_reuse->num_socks != 1) { + lockdep_is_held(&reuseport_lock)); + if (old_reuse == reuse) { + reuse->num_closed_socks--; + } else if (old_reuse && old_reuse->num_socks != 1) { spin_unlock_bh(&reuseport_lock); retu
Re: [PATCH 3/7] net: macb: unprepare clocks in case of failure
Hi Andrew,

On 05.12.2020 16:30, Andrew Lunn wrote:
> On Fri, Dec 04, 2020 at 02:34:17PM +0200, Claudiu Beznea wrote:
>> Unprepare clocks in case of any failure in fu540_c000_clk_init().
>
> Hi Claudiu
>
> Nice patchset. Simple to understand.
>
>> +err_disable_clocks:
>> +	clk_disable_unprepare(*tx_clk);
>> +	clk_disable_unprepare(*hclk);
>> +	clk_disable_unprepare(*pclk);
>> +	clk_disable_unprepare(*rx_clk);
>> +	clk_disable_unprepare(*tsu_clk);
>
> This looks correct, but it would be more symmetrical to add a
>
> macb_clk_uninit()
>
> function for the four main clocks. I'm surprised it does not already
> exist.

I was torn between adding it and not adding it, given that the existing
disable/unprepare paths do not handle all the clocks in the same way
everywhere. Anyway, I will add one function for the main clocks, as you
proposed, in the next version.

Thank you for your review,
Claudiu

>
> Andrew
>
[PATCH v2 bpf-next 02/13] bpf: Define migration types for SO_REUSEPORT.
As noted in the preceding commit, there are two migration types. In addition to that, the kernel will run the same eBPF program to select a listener for SYN packets. This patch defines three types to signal the kernel and the eBPF program if it is receiving a new request or migrating ESTABLISHED/SYN_RECV sockets in the accept queue or NEW_SYN_RECV socket during 3WHS. Signed-off-by: Kuniyuki Iwashima --- include/uapi/linux/bpf.h | 14 ++ tools/include/uapi/linux/bpf.h | 14 ++ 2 files changed, 28 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 1233f14f659f..7a48e0055500 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4423,6 +4423,20 @@ struct sk_msg_md { __bpf_md_ptr(struct bpf_sock *, sk); /* current socket */ }; +/* Migration type for SO_REUSEPORT enabled TCP sockets. + * + * BPF_SK_REUSEPORT_MIGRATE_NO : Select a listener for SYN packets. + * BPF_SK_REUSEPORT_MIGRATE_QUEUE : Migrate ESTABLISHED and SYN_RECV sockets in + *the accept queue at close() or shutdown(). + * BPF_SK_REUSEPORT_MIGRATE_REQUEST : Migrate NEW_SYN_RECV socket at receiving the + *final ACK of 3WHS or retransmitting SYN+ACKs. + */ +enum { + BPF_SK_REUSEPORT_MIGRATE_NO, + BPF_SK_REUSEPORT_MIGRATE_QUEUE, + BPF_SK_REUSEPORT_MIGRATE_REQUEST, +}; + struct sk_reuseport_md { /* * Start of directly accessible data. It begins from diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 1233f14f659f..7a48e0055500 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4423,6 +4423,20 @@ struct sk_msg_md { __bpf_md_ptr(struct bpf_sock *, sk); /* current socket */ }; +/* Migration type for SO_REUSEPORT enabled TCP sockets. + * + * BPF_SK_REUSEPORT_MIGRATE_NO : Select a listener for SYN packets. + * BPF_SK_REUSEPORT_MIGRATE_QUEUE : Migrate ESTABLISHED and SYN_RECV sockets in + *the accept queue at close() or shutdown(). 
+ * BPF_SK_REUSEPORT_MIGRATE_REQUEST : Migrate NEW_SYN_RECV socket at receiving the + *final ACK of 3WHS or retransmitting SYN+ACKs. + */ +enum { + BPF_SK_REUSEPORT_MIGRATE_NO, + BPF_SK_REUSEPORT_MIGRATE_QUEUE, + BPF_SK_REUSEPORT_MIGRATE_REQUEST, +}; + struct sk_reuseport_md { /* * Start of directly accessible data. It begins from -- 2.17.2 (Apple Git-113)
[PATCH v2 bpf-next 03/13] Revert "locking/spinlocks: Remove the unused spin_lock_bh_nested() API"
This reverts commit 607904c357c61adf20b8fd18af765e501d61a385 to use spin_lock_bh_nested() in the next commit. Link: https://lore.kernel.org/netdev/9d290a57-49e1-04cd-2487-262b0d7c5...@gmail.com/ Signed-off-by: Kuniyuki Iwashima CC: Waiman Long --- include/linux/spinlock.h | 8 include/linux/spinlock_api_smp.h | 2 ++ include/linux/spinlock_api_up.h | 1 + kernel/locking/spinlock.c| 8 4 files changed, 19 insertions(+) diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h index 79897841a2cc..c020b375a071 100644 --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -227,6 +227,8 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock) #ifdef CONFIG_DEBUG_LOCK_ALLOC # define raw_spin_lock_nested(lock, subclass) \ _raw_spin_lock_nested(lock, subclass) +# define raw_spin_lock_bh_nested(lock, subclass) \ + _raw_spin_lock_bh_nested(lock, subclass) # define raw_spin_lock_nest_lock(lock, nest_lock) \ do { \ @@ -242,6 +244,7 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock) # define raw_spin_lock_nested(lock, subclass) \ _raw_spin_lock(((void)(subclass), (lock))) # define raw_spin_lock_nest_lock(lock, nest_lock) _raw_spin_lock(lock) +# define raw_spin_lock_bh_nested(lock, subclass) _raw_spin_lock_bh(lock) #endif #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) @@ -369,6 +372,11 @@ do { \ raw_spin_lock_nested(spinlock_check(lock), subclass); \ } while (0) +#define spin_lock_bh_nested(lock, subclass)\ +do { \ + raw_spin_lock_bh_nested(spinlock_check(lock), subclass);\ +} while (0) + #define spin_lock_nest_lock(lock, nest_lock) \ do { \ raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock); \ diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h index 19a9be9d97ee..d565fb6304f2 100644 --- a/include/linux/spinlock_api_smp.h +++ b/include/linux/spinlock_api_smp.h @@ -22,6 +22,8 @@ int in_lock_functions(unsigned long addr); void __lockfunc _raw_spin_lock(raw_spinlock_t 
*lock) __acquires(lock); void __lockfunc _raw_spin_lock_nested(raw_spinlock_t *lock, int subclass) __acquires(lock); +void __lockfunc _raw_spin_lock_bh_nested(raw_spinlock_t *lock, int subclass) + __acquires(lock); void __lockfunc _raw_spin_lock_nest_lock(raw_spinlock_t *lock, struct lockdep_map *map) __acquires(lock); diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h index d0d188861ad6..d3afef9d8dbe 100644 --- a/include/linux/spinlock_api_up.h +++ b/include/linux/spinlock_api_up.h @@ -57,6 +57,7 @@ #define _raw_spin_lock(lock) __LOCK(lock) #define _raw_spin_lock_nested(lock, subclass) __LOCK(lock) +#define _raw_spin_lock_bh_nested(lock, subclass) __LOCK(lock) #define _raw_read_lock(lock) __LOCK(lock) #define _raw_write_lock(lock) __LOCK(lock) #define _raw_spin_lock_bh(lock)__LOCK_BH(lock) diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c index 0ff08380f531..48e99ed1bdd8 100644 --- a/kernel/locking/spinlock.c +++ b/kernel/locking/spinlock.c @@ -363,6 +363,14 @@ void __lockfunc _raw_spin_lock_nested(raw_spinlock_t *lock, int subclass) } EXPORT_SYMBOL(_raw_spin_lock_nested); +void __lockfunc _raw_spin_lock_bh_nested(raw_spinlock_t *lock, int subclass) +{ + __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET); + spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_); + LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock); +} +EXPORT_SYMBOL(_raw_spin_lock_bh_nested); + unsigned long __lockfunc _raw_spin_lock_irqsave_nested(raw_spinlock_t *lock, int subclass) { -- 2.17.2 (Apple Git-113)
[PATCH v2 bpf-next 04/13] tcp: Introduce inet_csk_reqsk_queue_migrate().
This patch defines a new function to migrate ESTABLISHED/SYN_RECV sockets. Listening sockets hold incoming connections as a linked list of struct request_sock in the accept queue, and each request has a reference to its full socket and listener. In inet_csk_reqsk_queue_migrate(), we only unlink the requests from the closing listener's queue and relink them to the head of the new listener's queue. We do not process each request and its reference to the listener, so the migration completes in O(1) time complexity. Moreover, if TFO requests caused an RST before the 3WHS completed, they are held in the listener's TFO queue to prevent a DDoS attack. Thus, we also migrate the requests in the TFO queue in the same way. After the 3WHS has completed, there are three access patterns to incoming sockets: (1) access to the full socket instead of request_sock, (2) access to request_sock from the accept queue, (3) access to request_sock from the TFO queue. In the first case, the full socket does not have a reference to its request socket and listener, so we do not need the correct listener set in the request socket. In the second case, we always have the correct listener and currently do not use req->rsk_listener. However, in the third case of TCP_SYN_RECV sockets, we take special care in the next commit. 
Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- include/net/inet_connection_sock.h | 1 + net/ipv4/inet_connection_sock.c| 68 ++ 2 files changed, 69 insertions(+) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 7338b3865a2a..2ea2d743f8fc 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -260,6 +260,7 @@ struct dst_entry *inet_csk_route_child_sock(const struct sock *sk, struct sock *inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req, struct sock *child); +void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk); void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, unsigned long timeout); struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child, diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 1451aa9712b0..5da38a756e4c 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -992,6 +992,74 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk, } EXPORT_SYMBOL(inet_csk_reqsk_queue_add); +void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk) +{ + struct request_sock_queue *old_accept_queue, *new_accept_queue; + struct fastopen_queue *old_fastopenq, *new_fastopenq; + spinlock_t *l1, *l2, *l3, *l4; + + old_accept_queue = &inet_csk(sk)->icsk_accept_queue; + new_accept_queue = &inet_csk(nsk)->icsk_accept_queue; + old_fastopenq = &old_accept_queue->fastopenq; + new_fastopenq = &new_accept_queue->fastopenq; + + l1 = &old_accept_queue->rskq_lock; + l2 = &new_accept_queue->rskq_lock; + l3 = &old_fastopenq->lock; + l4 = &new_fastopenq->lock; + + /* sk is never selected as the new listener from reuse->socks[], +* so inversion deadlock does not happen here, +* but change the order to avoid the warning of lockdep. 
+*/ + if (sk < nsk) { + swap(l1, l2); + swap(l3, l4); + } + + spin_lock(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); + + if (old_accept_queue->rskq_accept_head) { + if (new_accept_queue->rskq_accept_head) + old_accept_queue->rskq_accept_tail->dl_next = + new_accept_queue->rskq_accept_head; + else + new_accept_queue->rskq_accept_tail = old_accept_queue->rskq_accept_tail; + + new_accept_queue->rskq_accept_head = old_accept_queue->rskq_accept_head; + old_accept_queue->rskq_accept_head = NULL; + old_accept_queue->rskq_accept_tail = NULL; + + WRITE_ONCE(nsk->sk_ack_backlog, nsk->sk_ack_backlog + sk->sk_ack_backlog); + WRITE_ONCE(sk->sk_ack_backlog, 0); + } + + spin_unlock(l2); + spin_unlock(l1); + + spin_lock_bh(l3); + spin_lock_bh_nested(l4, SINGLE_DEPTH_NESTING); + + new_fastopenq->qlen += old_fastopenq->qlen; + old_fastopenq->qlen = 0; + + if (old_fastopenq->rskq_rst_head) { + if (new_fastopenq->rskq_rst_head) + old_fastopenq->rskq_rst_tail->dl_next = new_fastopenq->rskq_rst_head; + else + old_fastopenq->rskq_rst_tail = new_fastopenq->rskq_rst_tail; + + new_fastopenq->rskq_rst_head = old_fastopenq->rskq_rst_head; + old_fastopenq->rskq_rst_head
[PATCH v2 bpf-next 05/13] tcp: Set the new listener to migrated TFO requests.
A TFO request socket is only freed after BOTH 3WHS has completed (or aborted) and the child socket has been accepted (or its listener has been closed). Hence, depending on the order, there can be two kinds of request sockets in the accept queue. 3WHS -> accept : TCP_ESTABLISHED accept -> 3WHS : TCP_SYN_RECV Unlike TCP_ESTABLISHED socket, accept() does not free the request socket for TCP_SYN_RECV socket. It is freed later at reqsk_fastopen_remove(). Also, it accesses request_sock.rsk_listener. So, in order to complete TFO socket migration, we have to set the current listener to it at accept() before reqsk_fastopen_remove(). Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- net/ipv4/inet_connection_sock.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 5da38a756e4c..143590858c2e 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -500,6 +500,16 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) tcp_rsk(req)->tfo_listener) { spin_lock_bh(&queue->fastopenq.lock); if (tcp_rsk(req)->tfo_listener) { + if (req->rsk_listener != sk) { + /* TFO request was migrated to another listener so +* the new listener must be used in reqsk_fastopen_remove() +* to hold requests which cause RST. +*/ + sock_put(req->rsk_listener); + sock_hold(sk); + req->rsk_listener = sk; + } + /* We are still waiting for the final ACK from 3WHS * so can't free req now. Instead, we set req->sk to * NULL to signify that the child socket is taken @@ -954,7 +964,6 @@ static void inet_child_forget(struct sock *sk, struct request_sock *req, if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(req)->tfo_listener) { BUG_ON(rcu_access_pointer(tcp_sk(child)->fastopen_rsk) != req); - BUG_ON(sk != req->rsk_listener); /* Paranoid, to prevent race condition if * an inbound pkt destined for child is -- 2.17.2 (Apple Git-113)
[PATCH v2 bpf-next 06/13] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.
This patch lets reuseport_detach_sock() return a struct sock pointer, which is used only by inet_unhash(). If it is not NULL, inet_csk_reqsk_queue_migrate() migrates TCP_ESTABLISHED/TCP_SYN_RECV sockets from the closing listener to the selected one. By default, the kernel selects a new listener randomly. In order to pick out a different socket every time, we select the last element of socks[] as the new listener. This behaviour is based on how the kernel moves sockets in socks[]. (See also [1]) Basically, in order to redistribute sockets evenly, we have to use an eBPF program called in a later commit, but as a side effect of this default selection, the kernel can redistribute old requests evenly to new listeners for the specific case where the application replaces listeners by generations. For example, we call listen() for four sockets (A, B, C, D), and close() the first two in turn. The sockets move in socks[] like below.

  socks[0] : A <-.      socks[0] : D          socks[0] : D
  socks[1] : B   |  =>  socks[1] : B <-.  =>  socks[1] : C
  socks[2] : C   |      socks[2] : C --'
  socks[3] : D --'

Then, if C and D have newer settings than A and B, and each socket has a request (a, b, c, d) in its accept queue, we can redistribute old requests evenly to new listeners.

  socks[0] : A (a) <-.      socks[0] : D (a + d)          socks[0] : D (a + d)
  socks[1] : B (b)   |  =>  socks[1] : B (b) <-.      =>  socks[1] : C (b + c)
  socks[2] : C (c)   |      socks[2] : C (c) --'
  socks[3] : D (d) --'

Here, (A, D) or (B, C) can have different application settings, but they MUST have the same settings at the socket API level; otherwise, unexpected errors may happen. For instance, if only the new listeners have TCP_SAVE_SYN, old requests do not hold SYN data, so the application will face inconsistency and hit an error. Therefore, if there are different kinds of sockets, we must attach an eBPF program described in later commits. 
Link: https://lore.kernel.org/netdev/CAEfhGiyG8Y_amDZ2C8dQoQqjZJMHjTY76b=KBkTKcBtA=dh...@mail.gmail.com/ Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- include/net/sock_reuseport.h | 2 +- net/core/sock_reuseport.c| 16 +--- net/ipv4/inet_hashtables.c | 9 +++-- 3 files changed, 21 insertions(+), 6 deletions(-) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 0e558ca7afbf..09a1b1539d4c 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -31,7 +31,7 @@ struct sock_reuseport { extern int reuseport_alloc(struct sock *sk, bool bind_inany); extern int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany); -extern void reuseport_detach_sock(struct sock *sk); +extern struct sock *reuseport_detach_sock(struct sock *sk); extern struct sock *reuseport_select_sock(struct sock *sk, u32 hash, struct sk_buff *skb, diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index c26f4256ff41..2de42f8103ea 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -184,9 +184,11 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) } EXPORT_SYMBOL(reuseport_add_sock); -void reuseport_detach_sock(struct sock *sk) +struct sock *reuseport_detach_sock(struct sock *sk) { struct sock_reuseport *reuse; + struct bpf_prog *prog; + struct sock *nsk = NULL; int i; spin_lock_bh(&reuseport_lock); @@ -215,17 +217,25 @@ void reuseport_detach_sock(struct sock *sk) reuse->num_socks--; reuse->socks[i] = reuse->socks[reuse->num_socks]; + prog = rcu_dereference_protected(reuse->prog, + lockdep_is_held(&reuseport_lock)); + + if (sk->sk_protocol == IPPROTO_TCP) { + if (reuse->num_socks && !prog) + nsk = i == reuse->num_socks ? 
reuse->socks[i - 1] : reuse->socks[i]; - if (sk->sk_protocol == IPPROTO_TCP) reuse->num_closed_socks++; - else + } else { rcu_assign_pointer(sk->sk_reuseport_cb, NULL); + } } if (reuse->num_socks + reuse->num_closed_socks == 0) call_rcu(&reuse->rcu, reuseport_free_rcu); spin_unlock_bh(&reuseport_lock); + + return nsk; } EXPORT_SYMBOL(reuseport_detach_sock); diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 45fb450b4522..545538a6bfac 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -681,6 +681,7 @@ void inet_unhash(struct sock *sk) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; struct inet_listen_hashbucket *ilb = NULL; + struct sock *nsk; spinlock_t *lock
[PATCH v2 bpf-next 07/13] tcp: Migrate TCP_NEW_SYN_RECV requests.
This patch renames reuseport_select_sock() to __reuseport_select_sock() and adds two wrapper function of it to pass the migration type defined in the previous commit. reuseport_select_sock : BPF_SK_REUSEPORT_MIGRATE_NO reuseport_select_migrated_sock : BPF_SK_REUSEPORT_MIGRATE_REQUEST As mentioned before, we have to select a new listener for TCP_NEW_SYN_RECV requests at receiving the final ACK or sending a SYN+ACK. Therefore, this patch also changes the code to call reuseport_select_migrated_sock() even if the listening socket is TCP_CLOSE. If we can pick out a listening socket from the reuseport group, we rewrite request_sock.rsk_listener and resume processing the request. Link: https://lore.kernel.org/bpf/202012020136.bf0z4guu-...@intel.com/ Reported-by: kernel test robot Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- include/net/inet_connection_sock.h | 11 include/net/request_sock.h | 13 ++ include/net/sock_reuseport.h | 8 +++--- net/core/sock_reuseport.c | 40 -- net/ipv4/inet_connection_sock.c| 13 -- net/ipv4/tcp_ipv4.c| 9 +-- net/ipv6/tcp_ipv6.c| 9 +-- 7 files changed, 86 insertions(+), 17 deletions(-) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 2ea2d743f8fc..d8c3be31e987 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -272,6 +272,17 @@ static inline void inet_csk_reqsk_queue_added(struct sock *sk) reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue); } +static inline void inet_csk_reqsk_queue_migrated(struct sock *sk, +struct sock *nsk, +struct request_sock *req) +{ + reqsk_queue_migrated(&inet_csk(sk)->icsk_accept_queue, +&inet_csk(nsk)->icsk_accept_queue, +req); + sock_put(sk); + req->rsk_listener = nsk; +} + static inline int inet_csk_reqsk_queue_len(const struct sock *sk) { return reqsk_queue_len(&inet_csk(sk)->icsk_accept_queue); diff --git a/include/net/request_sock.h b/include/net/request_sock.h index 29e41ff3ec93..d18ba0b857cc 100644 --- 
a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -226,6 +226,19 @@ static inline void reqsk_queue_added(struct request_sock_queue *queue) atomic_inc(&queue->qlen); } +static inline void reqsk_queue_migrated(struct request_sock_queue *old_accept_queue, + struct request_sock_queue *new_accept_queue, + const struct request_sock *req) +{ + atomic_dec(&old_accept_queue->qlen); + atomic_inc(&new_accept_queue->qlen); + + if (req->num_timeout == 0) { + atomic_dec(&old_accept_queue->young); + atomic_inc(&new_accept_queue->young); + } +} + static inline int reqsk_queue_len(const struct request_sock_queue *queue) { return atomic_read(&queue->qlen); diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 09a1b1539d4c..a48259a974be 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -32,10 +32,10 @@ extern int reuseport_alloc(struct sock *sk, bool bind_inany); extern int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany); extern struct sock *reuseport_detach_sock(struct sock *sk); -extern struct sock *reuseport_select_sock(struct sock *sk, - u32 hash, - struct sk_buff *skb, - int hdr_len); +extern struct sock *reuseport_select_sock(struct sock *sk, u32 hash, + struct sk_buff *skb, int hdr_len); +extern struct sock *reuseport_select_migrated_sock(struct sock *sk, u32 hash, + struct sk_buff *skb); extern int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog); extern int reuseport_detach_prog(struct sock *sk); diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index 2de42f8103ea..1011c3756c92 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -170,7 +170,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) } reuse->socks[reuse->num_socks] = sk; - /* paired with smp_rmb() in reuseport_select_sock() */ + /* paired with smp_rmb() in __reuseport_select_sock() */ smp_wmb(); reuse->num_socks++; 
rcu_assign_pointer(sk->sk_reuseport_cb, reuse); @@ -277,12 +277,13 @@ static struct sock *run_bpf_filter(struct sock_reuseport *reuse, u16 socks, * @hdr_len: BPF filter expects skb data pointer at payload data. If *the skb d
[PATCH v2 bpf-next 09/13] libbpf: Set expected_attach_type for BPF_PROG_TYPE_SK_REUSEPORT.
This commit introduces a new section (sk_reuseport/migrate) and sets the expected_attach_type for each of the two sections of the BPF_PROG_TYPE_SK_REUSEPORT program. Signed-off-by: Kuniyuki Iwashima --- tools/lib/bpf/libbpf.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 9be88a90a4aa..ba64c891a5e7 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -8471,7 +8471,10 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec, static const struct bpf_sec_def section_defs[] = { BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), - BPF_PROG_SEC("sk_reuseport",BPF_PROG_TYPE_SK_REUSEPORT), + BPF_EAPROG_SEC("sk_reuseport/migrate", BPF_PROG_TYPE_SK_REUSEPORT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE), + BPF_EAPROG_SEC("sk_reuseport", BPF_PROG_TYPE_SK_REUSEPORT, + BPF_SK_REUSEPORT_SELECT), SEC_DEF("kprobe/", KPROBE, .attach_fn = attach_kprobe), BPF_PROG_SEC("uprobe/", BPF_PROG_TYPE_KPROBE), -- 2.17.2 (Apple Git-113)
[PATCH v2 bpf-next 10/13] bpf: Add migration to sk_reuseport_(kern|md).
This patch adds u8 migration field to sk_reuseport_kern and sk_reuseport_md to signal the eBPF program if the kernel calls it for selecting a listener for SYN or migrating sockets in the accept queue or an immature socket during 3WHS. Note that this field is accessible only if the attached type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6t...@kafai-mbp.dhcp.thefacebook.com/ Suggested-by: Martin KaFai Lau Signed-off-by: Kuniyuki Iwashima --- include/linux/bpf.h| 1 + include/linux/filter.h | 4 ++-- include/uapi/linux/bpf.h | 1 + net/core/filter.c | 15 --- net/core/sock_reuseport.c | 2 +- tools/include/uapi/linux/bpf.h | 1 + 6 files changed, 18 insertions(+), 6 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d05e75ed8c1b..cdeb27f4ad63 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1914,6 +1914,7 @@ struct sk_reuseport_kern { u32 hash; u32 reuseport_id; bool bind_inany; + u8 migration; }; bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type, struct bpf_insn_access_aux *info); diff --git a/include/linux/filter.h b/include/linux/filter.h index 1b62397bd124..15d5bf13a905 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -967,12 +967,12 @@ void bpf_warn_invalid_xdp_action(u32 act); #ifdef CONFIG_INET struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, struct bpf_prog *prog, struct sk_buff *skb, - u32 hash); + u32 hash, u8 migration); #else static inline struct sock * bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, struct bpf_prog *prog, struct sk_buff *skb, -u32 hash) +u32 hash, u8 migration) { return NULL; } diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c7f6848c0226..cf518e83df5c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4462,6 +4462,7 @@ struct sk_reuseport_md { __u32 ip_protocol; /* IP protocol. e.g. 
IPPROTO_TCP, IPPROTO_UDP */ __u32 bind_inany; /* Is sock bound to an INANY address? */ __u32 hash; /* A hash of the packet 4 tuples */ + __u8 migration; /* Migration type */ }; #define BPF_TAG_SIZE 8 diff --git a/net/core/filter.c b/net/core/filter.c index 77001a35768f..7bdf62f24044 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -9860,7 +9860,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf, static void bpf_init_reuseport_kern(struct sk_reuseport_kern *reuse_kern, struct sock_reuseport *reuse, struct sock *sk, struct sk_buff *skb, - u32 hash) + u32 hash, u8 migration) { reuse_kern->skb = skb; reuse_kern->sk = sk; @@ -9869,16 +9869,17 @@ static void bpf_init_reuseport_kern(struct sk_reuseport_kern *reuse_kern, reuse_kern->hash = hash; reuse_kern->reuseport_id = reuse->reuseport_id; reuse_kern->bind_inany = reuse->bind_inany; + reuse_kern->migration = migration; } struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, struct bpf_prog *prog, struct sk_buff *skb, - u32 hash) + u32 hash, u8 migration) { struct sk_reuseport_kern reuse_kern; enum sk_action action; - bpf_init_reuseport_kern(&reuse_kern, reuse, sk, skb, hash); + bpf_init_reuseport_kern(&reuse_kern, reuse, sk, skb, hash, migration); action = BPF_PROG_RUN(prog, &reuse_kern); if (action == SK_PASS) @@ -10017,6 +10018,10 @@ sk_reuseport_is_valid_access(int off, int size, case offsetof(struct sk_reuseport_md, hash): return size == size_default; + case bpf_ctx_range(struct sk_reuseport_md, migration): + return prog->expected_attach_type == BPF_SK_REUSEPORT_SELECT_OR_MIGRATE && + size == sizeof(__u8); + /* Fields that allow narrowing */ case bpf_ctx_range(struct sk_reuseport_md, eth_protocol): if (size < sizeof_field(struct sk_buff, protocol)) @@ -10089,6 +10094,10 @@ static u32 sk_reuseport_convert_ctx_access(enum bpf_access_type type, case offsetof(struct sk_reuseport_md, bind_inany): SK_REUSEPORT_LOAD_FIELD(bind_inany); break; + + case offsetof(struct 
sk_reuseport_md, migration): + SK_REUSEPORT_LOAD_FIELD(migration); + break; } return insn - insn_buf; diff
[PATCH v2 bpf-next 11/13] bpf: Support BPF_FUNC_get_socket_cookie() for BPF_PROG_TYPE_SK_REUSEPORT.
We will call sock_reuseport.prog for socket migration in the next commit, so the eBPF program has to know which listener is closing in order to select the new listener. Currently, we can get a unique ID for each listener in the userspace by calling bpf_map_lookup_elem() for BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map. This patch makes the sk pointer available in sk_reuseport_md so that we can get the ID by BPF_FUNC_get_socket_cookie() in the eBPF program. Link: https://lore.kernel.org/netdev/20201119001154.kapwihc2plp4f...@kafai-mbp.dhcp.thefacebook.com/ Suggested-by: Martin KaFai Lau Signed-off-by: Kuniyuki Iwashima --- include/uapi/linux/bpf.h | 8 net/core/filter.c | 22 ++ tools/include/uapi/linux/bpf.h | 8 3 files changed, 38 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index cf518e83df5c..a688a7a4fe85 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1655,6 +1655,13 @@ union bpf_attr { * A 8-byte long non-decreasing number on success, or 0 if the * socket field is missing inside *skb*. * + * u64 bpf_get_socket_cookie(struct bpf_sock *sk) + * Description + * Equivalent to bpf_get_socket_cookie() helper that accepts + * *skb*, but gets socket from **struct bpf_sock** context. + * Return + * A 8-byte long non-decreasing number. + * * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts @@ -4463,6 +4470,7 @@ struct sk_reuseport_md { __u32 bind_inany; /* Is sock bound to an INANY address? 
*/ __u32 hash; /* A hash of the packet 4 tuples */ __u8 migration; /* Migration type */ + __bpf_md_ptr(struct bpf_sock *, sk); /* Current listening socket */ }; #define BPF_TAG_SIZE 8 diff --git a/net/core/filter.c b/net/core/filter.c index 7bdf62f24044..9f7018e3f545 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4631,6 +4631,18 @@ static const struct bpf_func_proto bpf_get_socket_cookie_sock_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +BPF_CALL_1(bpf_get_socket_pointer_cookie, struct sock *, sk) +{ + return __sock_gen_cookie(sk); +} + +static const struct bpf_func_proto bpf_get_socket_pointer_cookie_proto = { + .func = bpf_get_socket_pointer_cookie, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SOCKET, +}; + BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx) { return __sock_gen_cookie(ctx->sk); @@ -9989,6 +10001,8 @@ sk_reuseport_func_proto(enum bpf_func_id func_id, return &sk_reuseport_load_bytes_proto; case BPF_FUNC_skb_load_bytes_relative: return &sk_reuseport_load_bytes_relative_proto; + case BPF_FUNC_get_socket_cookie: + return &bpf_get_socket_pointer_cookie_proto; default: return bpf_base_func_proto(func_id); } @@ -10022,6 +10036,10 @@ sk_reuseport_is_valid_access(int off, int size, return prog->expected_attach_type == BPF_SK_REUSEPORT_SELECT_OR_MIGRATE && size == sizeof(__u8); + case offsetof(struct sk_reuseport_md, sk): + info->reg_type = PTR_TO_SOCKET; + return size == sizeof(__u64); + /* Fields that allow narrowing */ case bpf_ctx_range(struct sk_reuseport_md, eth_protocol): if (size < sizeof_field(struct sk_buff, protocol)) @@ -10098,6 +10116,10 @@ static u32 sk_reuseport_convert_ctx_access(enum bpf_access_type type, case offsetof(struct sk_reuseport_md, migration): SK_REUSEPORT_LOAD_FIELD(migration); break; + + case offsetof(struct sk_reuseport_md, sk): + SK_REUSEPORT_LOAD_FIELD(sk); + break; } return insn - insn_buf; diff --git a/tools/include/uapi/linux/bpf.h 
b/tools/include/uapi/linux/bpf.h index cf518e83df5c..a688a7a4fe85 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1655,6 +1655,13 @@ union bpf_attr { * A 8-byte long non-decreasing number on success, or 0 if the * socket field is missing inside *skb*. * + * u64 bpf_get_socket_cookie(struct bpf_sock *sk) + * Description + * Equivalent to bpf_get_socket_cookie() helper that accepts + * *skb*, but gets socket from **struct bpf_sock** context. + * Return + * A 8-byte long non-decreasing number. + * * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts @@ -4463,6 +4470,7 @@ struct sk_reuseport_md { __u32 bind_inany; /* Is sock bound to an INANY address? */
[PATCH v2 bpf-next 12/13] bpf: Call bpf_run_sk_reuseport() for socket migration.
This patch supports socket migration by eBPF. If the attached type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, we can select a new listener by BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning SK_DROP. This feature is useful when listeners have different settings at the socket API level or when we want to free resources as soon as possible. There are two noteworthy points. The first is that we select a listening socket in reuseport_detach_sock() and __reuseport_select_sock(), but we do not have struct skb at closing a listener or retransmitting a SYN+ACK. However, some helper functions do not expect skb is NULL (e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer() in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb temporarily before running the eBPF program. The second is that we do not have struct request_sock in unhash path, and the sk_hash of the listener is always zero. So we pass zero as hash to bpf_run_sk_reuseport(). Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- net/core/filter.c | 19 +++ net/core/sock_reuseport.c | 21 +++-- net/ipv4/inet_hashtables.c | 2 +- 3 files changed, 31 insertions(+), 11 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 9f7018e3f545..53fa3bcbf00f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -9890,10 +9890,29 @@ struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, { struct sk_reuseport_kern reuse_kern; enum sk_action action; + bool allocated = false; + + if (migration) { + /* cancel migration for possibly incapable eBPF program */ + if (prog->expected_attach_type != BPF_SK_REUSEPORT_SELECT_OR_MIGRATE) + return ERR_PTR(-ENOTSUPP); + + if (!skb) { + allocated = true; + skb = alloc_skb(0, GFP_ATOMIC); + if (!skb) + return ERR_PTR(-ENOMEM); + } + } else if (!skb) { + return NULL; /* fall back to select by hash */ + } bpf_init_reuseport_kern(&reuse_kern, reuse, sk, skb, hash, migration); action 
= BPF_PROG_RUN(prog, &reuse_kern); + if (allocated) + kfree_skb(skb); + if (action == SK_PASS) return reuse_kern.selected_sk; else diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index b877c8e552d2..2358e8896199 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -221,8 +221,15 @@ struct sock *reuseport_detach_sock(struct sock *sk) lockdep_is_held(&reuseport_lock)); if (sk->sk_protocol == IPPROTO_TCP) { - if (reuse->num_socks && !prog) - nsk = i == reuse->num_socks ? reuse->socks[i - 1] : reuse->socks[i]; + if (reuse->num_socks) { + if (prog) + nsk = bpf_run_sk_reuseport(reuse, sk, prog, NULL, 0, + BPF_SK_REUSEPORT_MIGRATE_QUEUE); + + if (!nsk) + nsk = i == reuse->num_socks ? + reuse->socks[i - 1] : reuse->socks[i]; + } reuse->num_closed_socks++; } else { @@ -306,15 +313,9 @@ static struct sock *__reuseport_select_sock(struct sock *sk, u32 hash, if (!prog) goto select_by_hash; - if (migration) - goto out; - - if (!skb) - goto select_by_hash; - if (prog->type == BPF_PROG_TYPE_SK_REUSEPORT) sk2 = bpf_run_sk_reuseport(reuse, sk, prog, skb, hash, migration); - else + else if (!skb) sk2 = run_bpf_filter(reuse, socks, prog, skb, hdr_len); select_by_hash: @@ -352,7 +353,7 @@ struct sock *reuseport_select_migrated_sock(struct sock *sk, u32 hash, struct sock *nsk; nsk = __reuseport_select_sock(sk, hash, skb, 0, BPF_SK_REUSEPORT_MIGRATE_REQUEST); - if (nsk && likely(refcount_inc_not_zero(&nsk->sk_refcnt))) + if (!IS_ERR_OR_NULL(nsk) && likely(refcount_inc_not_zero(&nsk->sk_refcnt))) return nsk; return NULL; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 545538a6bfac..59f58740c20d 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -699,7 +699,7 @@ void inet_unhash(struct sock *sk) if (rcu_access_pointer(sk->sk_reuseport_cb)) { nsk = reuseport_detach_sock(sk); - if
[PATCH v2 bpf-next 08/13] bpf: Introduce two attach types for BPF_PROG_TYPE_SK_REUSEPORT.
This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to check if the attached eBPF program is capable of migrating sockets. When the eBPF program is attached, the kernel runs it for socket migration only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. The kernel will change the behaviour depending on the returned value: - SK_PASS with selected_sk, select it as a new listener - SK_PASS with selected_sk NULL, fall back to the random selection - SK_DROP, cancel the migration Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6t...@kafai-mbp.dhcp.thefacebook.com/ Suggested-by: Martin KaFai Lau Signed-off-by: Kuniyuki Iwashima --- include/uapi/linux/bpf.h | 2 ++ kernel/bpf/syscall.c | 13 + tools/include/uapi/linux/bpf.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7a48e0055500..c7f6848c0226 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -241,6 +241,8 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_REUSEPORT_SELECT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 0cd3cc2af9c1..0737673c727c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1920,6 +1920,11 @@ static void bpf_prog_load_fixup_attach_type(union bpf_attr *attr) attr->expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE; break; + case BPF_PROG_TYPE_SK_REUSEPORT: + if (!attr->expected_attach_type) + attr->expected_attach_type = + BPF_SK_REUSEPORT_SELECT; + break; } } @@ -2003,6 +2008,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, if (expected_attach_type == BPF_SK_LOOKUP) return 0; return -EINVAL; + case BPF_PROG_TYPE_SK_REUSEPORT: + switch (expected_attach_type) { + case BPF_SK_REUSEPORT_SELECT: + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: + return 0; + default: + return -EINVAL; + } case BPF_PROG_TYPE_EXT: if (expected_attach_type) return 
-EINVAL; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 7a48e0055500..c7f6848c0226 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -241,6 +241,8 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_REUSEPORT_SELECT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, __MAX_BPF_ATTACH_TYPE }; -- 2.17.2 (Apple Git-113)
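[Editor's note: for readers following along, a minimal sketch of an eBPF program using the new BPF_SK_REUSEPORT_SELECT_OR_MIGRATE attach type might look as follows. The map layout and names are illustrative only, not taken from this series; it needs a clang BPF build against a patched kernel, so it is a sketch rather than something runnable here.]

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} reuseport_map SEC(".maps");

SEC("sk_reuseport")
int select_or_migrate(struct sk_reuseport_md *md)
{
	__u32 index = 0;

	/* Pick the listener stored at slot 0 of the sockarray.
	 * Returning SK_DROP here instead would cancel the
	 * migration, per the semantics described above. */
	if (bpf_sk_select_reuseport(md, &reuseport_map, &index, 0))
		return SK_DROP;

	return SK_PASS;
}

char _license[] SEC("license") = "GPL";
```

The program would be loaded with expected_attach_type set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE and attached via SO_ATTACH_REUSEPORT_EBPF, as the selftest later in this series demonstrates.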
Re: [PATCH v3 0/7] Improve s0ix flows for systems i219LM
Hi, On 12/4/20 9:09 PM, Mario Limonciello wrote: > commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME > systems") > disabled s0ix flows for systems that have various incarnations of the > i219-LM ethernet controller. This was done because of some regressions > caused by an earlier > commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case") > with i219-LM controller. > > Performing suspend to idle with these ethernet controllers requires a properly > configured system. To make enabling such systems easier, this patch > series allows determining if enabled and turning on using ethtool. > > The flows have also been confirmed to be configured correctly on Dell's > Latitude > and Precision CML systems containing the i219-LM controller, when the kernel > also > contains the fix for s0i3.2 entry previously submitted here and now part of > this > series. > https://marc.info/?l=linux-netdev&m=160677194809564&w=2 > > Patches 4 through 7 will turn the behavior on by default for some of Dell's > CML and TGL systems. First of all thank you for working on this. I must say though that I don't like the approach taken here very much. This is not so much a criticism of this series as it is a criticism of the earlier decision to simply disable s0ix on all devices with the i219-LM + an active ME. AFAIK there was a perfectly acceptable patch to work around those broken devices, which increased a timeout: https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200323191639.48826-1-aaron...@canonical.com/ That patch was nacked because it increased the resume time *on broken devices*. So it seems to me that we have a simple choice here: 1. Longer resume time on devices with an improperly configured ME 2. Higher power-consumption on all non-buggy devices Your patches 4-7 try to work around 2. but IMHO those are just bandaids for getting the initial priorities *very* wrong. 
Instead of penalizing non-buggy devices with a higher power-consumption, we should default to penalizing the buggy devices with a higher resume time. And if it is decided that the higher resume time is a worse problem than the higher power-consumption, then there should be a list of broken devices and s0ix can be disabled on those. The current allow-list approach is simply never going to work well, leading to too high power-consumption on countless devices. This is going to be an endless game of whack-a-mole and as such really is a bad idea. A deny-list for broken devices is a much better approach, esp. since missing devices on that list will still work fine, they will just have a somewhat larger resume time. So what needs to happen IMHO is: 1. Merge your fix from patch 1 of this set 2. Merge "e1000e: bump up timeout to wait when ME un-configure ULP mode" 3. Drop the e1000e_check_me check. Then we also do not need the new "s0ix-enabled" ethtool flag because we do not need userspace to work around us doing the wrong thing by default. Note a while ago I had access to one of the devices having suspend/resume issues caused by the S0ix support (a Lenovo Thinkpad X1 Carbon gen 7) and I can confirm that the "e1000e: bump up timeout to wait when ME un-configure ULP mode" patch fixes the suspend/resume problem without any noticeable negative side-effects. Regards, Hans > > Changes from v2 to v3: > - Correct some grammar and spelling issues caught by Bjorn H. >* s/s0ix/S0ix/ in all commit messages >* Fix a typo in commit message >* Fix capitalization of proper nouns > - Add more pre-release systems that pass > - Re-order the series to add systems only at the end of the series > - Add Fixes tag to a patch in series. 
> > Changes from v1 to v2: > - Directly incorporate Vitaly's dependency patch in the series > - Split out s0ix code into it's own file > - Adjust from DMI matching to PCI subsystem vendor ID/device matching > - Remove module parameter and sysfs, use ethtool flag instead. > - Export s0ix flag to ethtool private flags > - Include more people and lists directly in this submission chain. > > Mario Limonciello (6): > e1000e: Move all S0ix related code into its own source file > e1000e: Export S0ix flags to ethtool > e1000e: Add Dell's Comet Lake systems into S0ix heuristics > e1000e: Add more Dell CML systems into S0ix heuristics > e1000e: Add Dell TGL desktop systems into S0ix heuristics > e1000e: Add another Dell TGL notebook system into S0ix heuristics > > Vitaly Lifshits (1): > e1000e: fix S0ix flow to allow S0i3.2 subset entry > > drivers/net/ethernet/intel/e1000e/Makefile | 2 +- > drivers/net/ethernet/intel/e1000e/e1000.h | 4 + > drivers/net/ethernet/intel/e1000e/ethtool.c | 40 +++ > drivers/net/ethernet/intel/e1000e/netdev.c | 272 + > drivers/net/ethernet/intel/e1000e/s0ix.c| 311 > 5 files changed, 361 insertions(+), 268 deletions(-) > create mode 100644 drivers/net/ethernet/intel/e1000e/
[PATCH v2 bpf-next 13/13] bpf: Test BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
This patch adds a test for BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- .../bpf/prog_tests/select_reuseport_migrate.c | 173 ++ .../bpf/progs/test_select_reuseport_migrate.c | 53 ++ 2 files changed, 226 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/select_reuseport_migrate.c create mode 100644 tools/testing/selftests/bpf/progs/test_select_reuseport_migrate.c diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport_migrate.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport_migrate.c new file mode 100644 index ..814b1e3a4c56 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport_migrate.c @@ -0,0 +1,173 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Check if we can migrate child sockets. + * + * 1. call listen() for 5 server sockets. + * 2. update a map to migrate all child socket + *to the last server socket (migrate_map[cookie] = 4) + * 3. call connect() for 25 client sockets. + * 4. call close() for first 4 server sockets. + * 5. call accept() for the last server socket. 
+ * + * Author: Kuniyuki Iwashima + */ + +#include +#include + +#include "test_progs.h" +#include "test_select_reuseport_migrate.skel.h" + +#define ADDRESS "127.0.0.1" +#define PORT 80 +#define NUM_SERVERS 5 +#define NUM_CLIENTS (NUM_SERVERS * 5) + + +static int test_listen(struct test_select_reuseport_migrate *skel, int server_fds[]) +{ + int i, err, optval = 1, migrated_to = NUM_SERVERS - 1; + int prog_fd, reuseport_map_fd, migrate_map_fd; + struct sockaddr_in addr; + socklen_t addr_len; + __u64 value; + + prog_fd = bpf_program__fd(skel->progs.prog_select_reuseport_migrate); + reuseport_map_fd = bpf_map__fd(skel->maps.reuseport_map); + migrate_map_fd = bpf_map__fd(skel->maps.migrate_map); + + addr_len = sizeof(addr); + addr.sin_family = AF_INET; + addr.sin_port = htons(PORT); + inet_pton(AF_INET, ADDRESS, &addr.sin_addr.s_addr); + + for (i = 0; i < NUM_SERVERS; i++) { + server_fds[i] = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (CHECK_FAIL(server_fds[i] == -1)) + return -1; + + err = setsockopt(server_fds[i], SOL_SOCKET, SO_REUSEPORT, +&optval, sizeof(optval)); + if (CHECK_FAIL(err == -1)) + return -1; + + if (i == 0) { + err = setsockopt(server_fds[i], SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, +&prog_fd, sizeof(prog_fd)); + if (CHECK_FAIL(err == -1)) + return -1; + } + + err = bind(server_fds[i], (struct sockaddr *)&addr, addr_len); + if (CHECK_FAIL(err == -1)) + return -1; + + err = listen(server_fds[i], 32); + if (CHECK_FAIL(err == -1)) + return -1; + + err = bpf_map_update_elem(reuseport_map_fd, &i, &server_fds[i], BPF_NOEXIST); + if (CHECK_FAIL(err == -1)) + return -1; + + err = bpf_map_lookup_elem(reuseport_map_fd, &i, &value); + if (CHECK_FAIL(err == -1)) + return -1; + + err = bpf_map_update_elem(migrate_map_fd, &value, &migrated_to, BPF_NOEXIST); + if (CHECK_FAIL(err == -1)) + return -1; + } + + return 0; +} + +static int test_connect(int client_fds[]) +{ + struct sockaddr_in addr; + socklen_t addr_len; + int i, err; + + addr_len = sizeof(addr); + 
addr.sin_family = AF_INET; + addr.sin_port = htons(PORT); + inet_pton(AF_INET, ADDRESS, &addr.sin_addr.s_addr); + + for (i = 0; i < NUM_CLIENTS; i++) { + client_fds[i] = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (CHECK_FAIL(client_fds[i] == -1)) + return -1; + + err = connect(client_fds[i], (struct sockaddr *)&addr, addr_len); + if (CHECK_FAIL(err == -1)) + return -1; + } + + return 0; +} + +static void test_close(int server_fds[], int num) +{ + int i; + + for (i = 0; i < num; i++) + if (server_fds[i] > 0) + close(server_fds[i]); +} + +static int test_accept(int server_fd) +{ + struct sockaddr_in addr; + socklen_t addr_len; + int cnt, client_fd; + + fcntl(server_fd, F_SETFL, O_NONBLOCK); + addr_len = sizeof(addr); + + for (cnt = 0; cnt < NUM_CLIENTS; cnt++) { + client_fd = accept(server_fd, (struct sockaddr *)&addr, &addr_len); + if (CHECK_FAIL(client_fd == -1)) + return -1; + } +
Re: [RFC PATCH 2/3] net: sparx5: Add Sparx5 switchdev driver
Mon, Nov 30, 2020 at 02:13:35PM CET, steen.hegel...@microchip.com wrote: >On 27.11.2020 18:15, Andrew Lunn wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the >> content is safe >> >> This is a very large driver, which is going to make it slow to review. >Hi Andrew, > >Yes I am aware of that, but I think that what is available with this >series, makes for a nice package that can be tested by us, and used by >our customers. Could you perhaps cut it into multiple patches for easier review? Like the basics, host delivery, fwd offload, etc?
RE: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path
> -Original Message- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Monday, December 7, 2020 11:54 AM > To: wangyunjian ; m...@redhat.com > Cc: virtualizat...@lists.linux-foundation.org; netdev@vger.kernel.org; Lilijun > (Jerry) ; xudingke > Subject: Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path > > > On 2020/12/4 下午6:22, wangyunjian wrote: > >> -Original Message- > >> From: Jason Wang [mailto:jasow...@redhat.com] > >> Sent: Friday, December 4, 2020 2:11 PM > >> To: wangyunjian ; m...@redhat.com > >> Cc: virtualizat...@lists.linux-foundation.org; netdev@vger.kernel.org; > Lilijun > >> (Jerry) ; xudingke > >> Subject: Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error > >> path > >> > >> > >> On 2020/12/3 下午4:00, wangyunjian wrote: > >>> From: Yunjian Wang > >>> > >>> After setting callback for ubuf_info of skb, the callback > >>> (vhost_net_zerocopy_callback) will be called to decrease the refcount > >>> when freeing skb. But when an exception occurs afterwards, the error > >>> handling in vhost handle_tx() will try to decrease the same refcount > >>> again. This is wrong and fix this by clearing ubuf_info when meeting > >>> errors. 
> >>> > >>> Fixes: 4477138fa0ae ("tun: properly test for IFF_UP") > >>> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP > >>> driver") > >>> > >>> Signed-off-by: Yunjian Wang > >>> --- > >>>drivers/net/tun.c | 11 +++ > >>>1 file changed, 11 insertions(+) > >>> > >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index > >>> 2dc1988a8973..3614bb1b6d35 100644 > >>> --- a/drivers/net/tun.c > >>> +++ b/drivers/net/tun.c > >>> @@ -1861,6 +1861,12 @@ static ssize_t tun_get_user(struct tun_struct > >> *tun, struct tun_file *tfile, > >>> if (unlikely(!(tun->dev->flags & IFF_UP))) { > >>> err = -EIO; > >>> rcu_read_unlock(); > >>> + if (zerocopy) { > >>> + skb_shinfo(skb)->destructor_arg = NULL; > >>> + skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY; > >>> + skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG; > >>> + } > >>> + > >>> goto drop; > >>> } > >>> > >>> @@ -1874,6 +1880,11 @@ static ssize_t tun_get_user(struct tun_struct > >>> *tun, struct tun_file *tfile, > >>> > >>> if (unlikely(headlen > skb_headlen(skb))) { > >>> atomic_long_inc(&tun->dev->rx_dropped); > >>> + if (zerocopy) { > >>> + skb_shinfo(skb)->destructor_arg = NULL; > >>> + skb_shinfo(skb)->tx_flags &= > ~SKBTX_DEV_ZEROCOPY; > >>> + skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG; > >>> + } > >>> napi_free_frags(&tfile->napi); > >>> rcu_read_unlock(); > >>> mutex_unlock(&tfile->napi_mutex); > >> > >> It looks to me then we miss the failure feedback. > >> > >> The issues comes from the inconsistent error handling in tun. > >> > >> I wonder whether we can simply do uarg->callback(uarg, false) if necessary > on > >> every failture path on tun_get_user(). > > How about this? 
> > > > --- > > drivers/net/tun.c | 29 ++--- > > 1 file changed, 18 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index 2dc1988a8973..36a8d8eacd7b 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -1637,6 +1637,19 @@ static struct sk_buff *tun_build_skb(struct > tun_struct *tun, > > return NULL; > > } > > > > +/* copy ubuf_info for callback when skb has no error */ > > +inline static tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, void > *msg_control) > > +{ > > + if (zerocopy) { > > + skb_shinfo(skb)->destructor_arg = msg_control; > > + skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; > > + skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; > > + } else if (msg_control) { > > + struct ubuf_info *uarg = msg_control; > > + uarg->callback(uarg, false); > > + } > > +} > > + > > /* Get packet from user space buffer */ > > static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file > > *tfile, > > void *msg_control, struct iov_iter *from, > > @@ -1812,16 +1825,6 @@ static ssize_t tun_get_user(struct tun_struct > *tun, struct tun_file *tfile, > > break; > > } > > > > - /* copy skb_ubuf_info for callback when skb has no error */ > > - if (zerocopy) { > > - skb_shinfo(skb)->destructor_arg = msg_control; > > - skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; > > - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; > > - } else if (msg_control) { > > - struct ubuf_info *uarg = msg_control; > > - uarg->callback(uarg, false); > > - } >
Re: [PATCH net-next] nfc: s3fwrn5: Change irqflags
On Mon, Dec 7, 2020 at 8:51 PM Krzysztof Kozlowski wrote: > > On Mon, Dec 07, 2020 at 08:38:27PM +0900, Bongsu Jeon wrote: > > From: Bongsu Jeon > > > > change irqflags from IRQF_TRIGGER_HIGH to IRQF_TRIGGER_RISING for stable > > Samsung's nfc interrupt handling. > > 1. Describe in commit title/subject the change. Just a word "change irqflags" > is >not enough. > Ok. I'll update it. > 2. Describe in commit message what you are trying to fix. Before was not >stable? The "for stable interrupt handling" is a little bit vauge. > Usually, Samsung's NFC Firmware sends an i2c frame as below. 1. NFC Firmware sets the gpio(interrupt pin) high when there is an i2c frame to send. 2. If the CPU's I2C master has received the i2c frame, NFC F/W sets the gpio low. NFC driver's i2c interrupt handler would be called in the abnormal case as the NFC F/W task of number 2 is delayed because of other high priority tasks. In that case, NFC driver will try to receive the i2c frame but there isn't any i2c frame to send in NFC. It would cause an I2C communication problem. This case would hardly happen. But, I changed the interrupt as a defense code. If Driver uses the TRIGGER_RISING not LEVEL trigger, there would be no problem even if the NFC F/W task is delayed. > 3. This is contradictory to the bindings and current DTS. I think the >driver should not force the specific trigger type because I could >imagine some configuration that the actual interrupt to the CPU is >routed differently. > >Instead, how about removing the trigger flags here and fixing the DTS >and bindings example? > As I mentioned before, I changed this code because of Samsung NFC's I2C Communication way. So, I think that it is okay for the nfc driver to force the specific trigger type( EDGE_RISING). What do you think about it? 
> Best regards, > Krzysztof > > > > > Signed-off-by: Bongsu Jeon > > --- > > drivers/nfc/s3fwrn5/i2c.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c > > index e1bdde105f24..016f6b6df849 100644 > > --- a/drivers/nfc/s3fwrn5/i2c.c > > +++ b/drivers/nfc/s3fwrn5/i2c.c > > @@ -213,7 +213,7 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client, > > return ret; > > > > ret = devm_request_threaded_irq(&client->dev, phy->i2c_dev->irq, NULL, > > - s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT, > > + s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_RISING | IRQF_ONESHOT, > > S3FWRN5_I2C_DRIVER_NAME, phy); > > if (ret) > > s3fwrn5_remove(phy->common.ndev); > > -- > > 2.17.1 > >
[PATCH v2] xfrm: interface: Don't hide plain packets from netfilter
With an IPsec tunnel without dedicated interface, netfilter sees locally generated packets twice as they exit the physical interface: Once as "the inner packet" with IPsec context attached and once as the encrypted (ESP) packet. With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT hook anymore, making it impossible to match on both inner header values and associated IPsec data from that hook. Fix this by looping packets transmitted from xfrm_interface through NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes behaviour consistent again from netfilter's point of view. Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Phil Sutter --- Changes since v1: - Extend recipients list, no code changes. --- net/xfrm/xfrm_interface.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c index aa4cdcf69d471..24af61c95b4d4 100644 --- a/net/xfrm/xfrm_interface.c +++ b/net/xfrm/xfrm_interface.c @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) skb_dst_set(skb, dst); skb->dev = tdev; - err = dst_output(xi->net, skb->sk, skb); + err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net, + skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output); if (net_xmit_eval(err) == 0) { struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats); -- 2.28.0
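[Editor's note: with the hook in place, locally generated plaintext packets traverse NF_INET_LOCAL_OUT again, so a single rule can match both inner header fields and the associated IPsec policy — for instance with iptables' policy match. The addresses below are placeholders; this is a configuration sketch, not a tested ruleset.]

```sh
# Match locally generated plaintext packets that will be
# IPsec-transformed on their way out of an xfrm interface.
iptables -A OUTPUT -m policy --dir out --pol ipsec \
         -d 192.0.2.0/24 -j ACCEPT
```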
Re: [PATCH bpf-next] bpf: return -EOPNOTSUPP when attaching to non-kernel BTF
Alexei Starovoitov writes: > On Fri, Dec 4, 2020 at 7:11 PM Andrii Nakryiko wrote: >> + return -EOPNOTSUPP; > > $ cd kernel/bpf > $ git grep ENOTSUPP|wc -l > 46 > $ git grep EOPNOTSUPP|wc -l > 11 But also $ cd kernel/include/uapi $ git grep ENOTSUPP | wc -l 0 $ git grep EOPNOTSUPP | wc -l 8 (i.e., ENOTSUPP is not defined in userspace headers at all) -Toke
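[Editor's note: the asymmetry Toke points out is easy to demonstrate from user space. A minimal sketch — the numeric value 524 mentioned in a comment is the kernel-internal ENOTSUPP, given here only for illustration:]

```c
#include <assert.h>
#include <errno.h>

/* EOPNOTSUPP is part of the UAPI errno set, so user space can
 * name and handle it; ENOTSUPP (524) is kernel-internal, has no
 * symbol in <errno.h>, and surfaces to applications as an
 * unrecognisable "Unknown error 524". */
int uapi_has_eopnotsupp(void)
{
#ifdef EOPNOTSUPP
	return 1;
#else
	return 0;
#endif
}

int uapi_has_enotsupp(void)
{
#ifdef ENOTSUPP
	return 1;
#else
	return 0;
#endif
}
```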
Re: [PATCH v5 3/6] net: dsa: microchip: ksz8795: move register offsets and shifts to separate struct
Hi Michael, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] [also build test WARNING on next-20201207] [cannot apply to net/master ipvs/master linus/master v5.10-rc7] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Michael-Grzeschik/microchip-add-support-for-ksz88x3-driver-family/20201207-205945 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git af3f4a85d90218bb59315d591bd2bffa5e646466 config: arc-allyesconfig (attached as .config) compiler: arceb-elf-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/db1f7322c8fa2c28587f13ab3eebbb6ee02874b1 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Michael-Grzeschik/microchip-add-support-for-ksz88x3-driver-family/20201207-205945 git checkout db1f7322c8fa2c28587f13ab3eebbb6ee02874b1 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> drivers/net/dsa/microchip/ksz8795.c:69:27: warning: initialized field >> overwritten [-Woverride-init] 69 | [DYNAMIC_MAC_ENTRIES] = 29, | ^~ drivers/net/dsa/microchip/ksz8795.c:69:27: note: (near initialization for 'ksz8795_shifts[5]') vim +69 drivers/net/dsa/microchip/ksz8795.c 62 63 static const u8 ksz8795_shifts[] = { 64 [VLAN_TABLE_MEMBERSHIP] = 7, 65 [VLAN_TABLE]= 16, 66 [STATIC_MAC_FWD_PORTS] = 16, 67 [STATIC_MAC_FID]= 24, 68 [DYNAMIC_MAC_ENTRIES_H] = 3, > 69 [DYNAMIC_MAC_ENTRIES] = 29, 70 [DYNAMIC_MAC_FID] = 16, 71 
[DYNAMIC_MAC_TIMESTAMP] = 27, 72 [DYNAMIC_MAC_SRC_PORT] = 24, 73 }; 74 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH net-next] nfc: s3fwrn5: Change irqflags
On Mon, Dec 07, 2020 at 10:39:01PM +0900, Bongsu Jeon wrote: > On Mon, Dec 7, 2020 at 8:51 PM Krzysztof Kozlowski wrote: > > > > On Mon, Dec 07, 2020 at 08:38:27PM +0900, Bongsu Jeon wrote: > > > From: Bongsu Jeon > > > > > > change irqflags from IRQF_TRIGGER_HIGH to IRQF_TRIGGER_RISING for stable > > > Samsung's nfc interrupt handling. > > > > 1. Describe in commit title/subject the change. Just a word "change > > irqflags" is > >not enough. > > > Ok. I'll update it. > > > 2. Describe in commit message what you are trying to fix. Before was not > >stable? The "for stable interrupt handling" is a little bit vauge. > > > Usually, Samsung's NFC Firmware sends an i2c frame as below. > > 1. NFC Firmware sets the gpio(interrupt pin) high when there is an i2c > frame to send. > 2. If the CPU's I2C master has received the i2c frame, NFC F/W sets > the gpio low. > > NFC driver's i2c interrupt handler would be called in the abnormal case > as the NFC F/W task of number 2 is delayed because of other high > priority tasks. > In that case, NFC driver will try to receive the i2c frame but there > isn't any i2c frame > to send in NFC. It would cause an I2C communication problem. > This case would hardly happen. > But, I changed the interrupt as a defense code. > If Driver uses the TRIGGER_RISING not LEVEL trigger, there would be no problem > even if the NFC F/W task is delayed. All this should be explained in commit message, not in the email. > > > 3. This is contradictory to the bindings and current DTS. I think the > >driver should not force the specific trigger type because I could > >imagine some configuration that the actual interrupt to the CPU is > >routed differently. > > > >Instead, how about removing the trigger flags here and fixing the DTS > >and bindings example? > > > > As I mentioned before, > I changed this code because of Samsung NFC's I2C Communication way. > So, I think that it is okay for the nfc driver to force the specific > trigger type( EDGE_RISING). 
> > What do you think about it? Some different chip or some different hardware implementation could have the signal inverted, e.g. edge falling, not rising. This is rather a theoretical scenario but still such change makes the code more generic, configurable with DTS. Therefore trigger mode should be configured via DTS, not enforced by the driver. Best regards, Krzysztof
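[Editor's note: for illustration, the DTS-driven configuration Krzysztof suggests would look roughly like this. The bus node, unit address, and GPIO specifier are made up; only the interrupt properties matter here.]

```dts
&i2c4 {
	nfc@27 {
		compatible = "samsung,s3fwrn5-i2c";
		reg = <0x27>;
		interrupt-parent = <&gpa1>;
		/* The trigger type is chosen here, in the device tree,
		 * rather than being hard-coded in the driver. */
		interrupts = <3 IRQ_TYPE_EDGE_RISING>;
	};
};
```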
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
Ard Biesheuvel wrote: > > I wonder if it would help if the input buffer and output buffer didn't > > have to correspond exactly in usage - ie. the output buffer could be used > > at a slower rate than the input to allow for buffering inside the crypto > > algorithm. > > > > I don't follow - how could one be used at a slower rate? I mean that the crypto algorithm might need to buffer the last part of the input until it has a block's worth before it can write to the output. > > The hashes corresponding to the kerberos enctypes I'm supporting are: > > > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96. > > > > HMAC-SHA256 for aes128-cts-hmac-sha256-128 > > > > HMAC-SHA384 for aes256-cts-hmac-sha384-192 > > > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac > > > > I'm not sure you can support all of those with the instructions available. > > It depends on whether the caller can make use of the authenc() > pattern, which is a type of AEAD we support. Interesting. I didn't realise AEAD was an API. > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)), > including h/w accelerated ones, but none that implement ciphertext > stealing. So that means that, even if you manage to use the AEAD layer to > perform both at the same time, the generic authenc() template will perform > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes, > respectively, which won't give you any benefit until accelerated > implementations turn up that perform the whole operation in one pass over > the input. And even then, I don't think the performance benefit will be > worth it. Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext, not the plaintext. For the moment, it's probably not worth worrying about, then. If I can manage to abstract the sunrpc bits out into a krb5 library, we can improve the library later. David
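[Editor's note: for context, the authenc() pattern mentioned above is instantiated by name through the kernel crypto API. A rough in-kernel sketch — illustrative only, error handling elided, not runnable outside the kernel:]

```c
#include <crypto/aead.h>

/* Ask the crypto layer for a combined encrypt-and-authenticate
 * AEAD transform. An accelerated one-pass implementation is
 * picked if registered; otherwise the generic authenc template
 * glues together separate hmac(sha1) and cbc(aes) transforms. */
static struct crypto_aead *krb5_alloc_authenc(void)
{
	return crypto_alloc_aead("authenc(hmac(sha1),cbc(aes))", 0, 0);
}
```

Note that the kerberos enctypes discussed above would actually need the ciphertext-stealing variant, authenc(hmac(sha1),cts(cbc(aes))), and as Ard points out no accelerated implementation of that combination currently exists.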
Re: [PATCH net-next] nfc: s3fwrn5: Change irqflags
On Mon, Dec 7, 2020 at 11:13 PM Krzysztof Kozlowski wrote: > > On Mon, Dec 07, 2020 at 10:39:01PM +0900, Bongsu Jeon wrote: > > On Mon, Dec 7, 2020 at 8:51 PM Krzysztof Kozlowski wrote: > > > > > > On Mon, Dec 07, 2020 at 08:38:27PM +0900, Bongsu Jeon wrote: > > > > From: Bongsu Jeon > > > > > > > > change irqflags from IRQF_TRIGGER_HIGH to IRQF_TRIGGER_RISING for stable > > > > Samsung's nfc interrupt handling. > > > > > > 1. Describe in commit title/subject the change. Just a word "change > > > irqflags" is > > >not enough. > > > > > Ok. I'll update it. > > > > > 2. Describe in commit message what you are trying to fix. Before was not > > >stable? The "for stable interrupt handling" is a little bit vauge. > > > > > Usually, Samsung's NFC Firmware sends an i2c frame as below. > > > > 1. NFC Firmware sets the gpio(interrupt pin) high when there is an i2c > > frame to send. > > 2. If the CPU's I2C master has received the i2c frame, NFC F/W sets > > the gpio low. > > > > NFC driver's i2c interrupt handler would be called in the abnormal case > > as the NFC F/W task of number 2 is delayed because of other high > > priority tasks. > > In that case, NFC driver will try to receive the i2c frame but there > > isn't any i2c frame > > to send in NFC. It would cause an I2C communication problem. > > This case would hardly happen. > > But, I changed the interrupt as a defense code. > > If Driver uses the TRIGGER_RISING not LEVEL trigger, there would be no > > problem > > even if the NFC F/W task is delayed. > > All this should be explained in commit message, not in the email. > Okay. I will > > > > > 3. This is contradictory to the bindings and current DTS. I think the > > >driver should not force the specific trigger type because I could > > >imagine some configuration that the actual interrupt to the CPU is > > >routed differently. > > > > > >Instead, how about removing the trigger flags here and fixing the DTS > > >and bindings example? 
> > > > > > > As I mentioned before, > > I changed this code because of Samsung NFC's I2C Communication way. > > So, I think that it is okay for the nfc driver to force the specific > > trigger type( EDGE_RISING). > > > > What do you think about it? > > Some different chip or some different hardware implementation could have > the signal inverted, e.g. edge falling, not rising. This is rather > a theoretical scenario but still such change makes the code more > generic, configurable with DTS. Therefore trigger mode should be > configured via DTS, not enforced by the driver. > Okay. I understand it. > Best regards, > Krzysztof
[PATCH][next] seg6: fix unintentional integer overflow on left shift
From: Colin Ian King

Shifting the integer value 1 is evaluated using 32-bit arithmetic, and
using the result in an expression that expects an unsigned long value
leads to a potential integer overflow. Fix this by using the BIT macro
to perform the shift, avoiding the overflow.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 964adce526a4 ("seg6: improve management of behavior attributes")
Signed-off-by: Colin Ian King
---
 net/ipv6/seg6_local.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index b07f7c1c82a4..d68de8cd1207 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -1366,7 +1366,7 @@ static void __destroy_attrs(unsigned long parsed_attrs, int max_parsed,
 	 * attribute; otherwise, we call the destroy() callback.
 	 */
 	for (i = 0; i < max_parsed; ++i) {
-		if (!(parsed_attrs & (1 << i)))
+		if (!(parsed_attrs & BIT(i)))
 			continue;

 		param = &seg6_action_params[i];
--
2.29.2
Re: [PATCH net-next 0/6] s390/qeth: updates 2020-12-07
From: Julian Wiedmann
Date: Mon, 7 Dec 2020 14:12:27 +0100

> Hi Jakub,
>
> please apply the following patch series for qeth to netdev's net-next tree.
>
> Some sysfs cleanups (with the prep work in ccwgroup acked by Heiko), and
> a few improvements to the code that deals with async TX completion
> notifications for IQD devices.
>
> This also brings the missing patch from the previous net-next submission.

Series applied, thanks Julian!
Re: [PATCH v5 3/6] net: dsa: microchip: ksz8795: move register offsets and shifts to separate struct
From: Michael Grzeschik
Date: Mon, 7 Dec 2020 13:56:24 +0100

> @@ -991,13 +1090,16 @@ static void ksz8_port_setup(struct ksz_device *dev, int port, bool cpu_port)
>  static void ksz8_config_cpu_port(struct dsa_switch *ds)
>  {
>  	struct ksz_device *dev = ds->priv;
> +	struct ksz8 *ksz8 = dev->priv;
> +	const u8 *regs = ksz8->regs;
> +	const u32 *masks = ksz8->masks;
>  	struct ksz_port *p;
>  	u8 remote;
>  	int i;
>

Please use reverse christmas tree ordering for local variables.

Thank you.
Re: [PATCH] dpaa2-mac: Add a missing of_node_put after of_device_is_available
On Sun, Dec 06, 2020 at 04:13:39PM +0100, Christophe JAILLET wrote:
> Add an 'of_node_put()' call when a tested device node is not available.
>
> Fixes: 94ae899b2096 ("dpaa2-mac: add PCS support through the Lynx module")
> Signed-off-by: Christophe JAILLET

Reviewed-by: Ioana Ciornei

Thanks!

> ---
>  drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
> index 90cd243070d7..828c177df03d 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
> @@ -269,6 +269,7 @@ static int dpaa2_pcs_create(struct dpaa2_mac *mac,
>
>  	if (!of_device_is_available(node)) {
>  		netdev_err(mac->net_dev, "pcs-handle node not available\n");
> +		of_node_put(node);
>  		return -ENODEV;
>  	}
>
> --
> 2.27.0
>
pull request: bluetooth-next 2020-12-07
Hi Dave, Jakub,

Here's the main bluetooth-next pull request for the 5.11 kernel.

 - Updated Bluetooth entries in MAINTAINERS to include Luiz von Dentz
 - Added support for Realtek 8822CE and 8852A devices
 - Added support for MediaTek MT7615E device
 - Improved workarounds for fake CSR devices
 - Fix Bluetooth qualification test case L2CAP/COS/CFD/BV-14-C
 - Fixes for LL Privacy support
 - Enforce 16 byte encryption key size for FIPS security level
 - Added new mgmt commands for extended advertising support
 - Multiple other smaller fixes & improvements

Please let me know if there are any issues pulling.

Thanks.

Johan

---
The following changes since commit bff6f1db91e330d7fba56f815cdbc412c75fe163:

  stmmac: intel: change all EHL/TGL to auto detect phy addr (2020-11-07 16:11:54 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to 02be5f13aacba2100f1486d3ad16c26b6dede1ce:

  MAINTAINERS: Update Bluetooth entries (2020-12-07 17:02:01 +0200)

----------------------------------------------------------------
Abhishek Pandit-Subedi (2):
      Bluetooth: btqca: Add valid le states quirk
      Bluetooth: Set missing suspend task bits

Anant Thazhemadam (2):
      Bluetooth: hci_h5: close serdev device and free hu in h5_close
      Bluetooth: hci_h5: fix memory leak in h5_close

Anmol Karn (1):
      Bluetooth: Fix null pointer dereference in hci_event_packet()

Archie Pusaka (1):
      Bluetooth: Enforce key size of 16 bytes on FIPS level

Balakrishna Godavarthi (1):
      Bluetooth: hci_qca: Enhance retry logic in qca_setup

Cadel Watson (1):
      Bluetooth: btusb: Support 0bda:c123 Realtek 8822CE device

Chris Chiu (1):
      Bluetooth: btusb: Add support for 13d3:3560 MediaTek MT7615E device

Claire Chang (1):
      Bluetooth: Move force_bredr_smp debugfs into hci_debugfs_create_bredr

Colin Ian King (1):
      Bluetooth: btrtl: fix incorrect skb allocation failure check

Daniel Winkler (6):
      Bluetooth: Resume advertising after LE connection
      Bluetooth: Add helper to set adv data
      Bluetooth: Break add adv into two mgmt commands
      Bluetooth: Use intervals and tx power from mgmt cmds
      Bluetooth: Query LE tx power on startup
      Bluetooth: Change MGMT security info CMD to be more generic

Edward Vear (1):
      Bluetooth: Fix attempting to set RPA timeout when unsupported

Hans de Goede (4):
      Bluetooth: revert: hci_h5: close serdev device and free hu in h5_close
      Bluetooth: hci_h5: Add OBDA0623 ACPI HID
      Bluetooth: btusb: Fix detection of some fake CSR controllers with a bcdDevice val of 0x0134
      Bluetooth: btusb: Add workaround for remote-wakeup issues with Barrot 8041a02 fake CSR controllers

Howard Chung (6):
      Bluetooth: Replace BT_DBG with bt_dev_dbg in HCI request
      Bluetooth: Interleave with allowlist scan
      Bluetooth: Handle system suspend resume case
      Bluetooth: Handle active scan case
      Bluetooth: Refactor read default sys config for various types
      Bluetooth: Add toggle to switch off interleave scan

Jimmy Wahlberg (1):
      Bluetooth: Fix for Bluetooth SIG test L2CAP/COS/CFD/BV-14-C

Jing Xiangfeng (2):
      Bluetooth: btusb: Add the missed release_firmware() in btusb_mtk_setup_firmware()
      Bluetooth: btmtksdio: Add the missed release_firmware() in mtk_setup_firmware()

Julian Pidancet (1):
      Bluetooth: btusb: Add support for 1358:c123 Realtek 8822CE device

Kai-Heng Feng (1):
      Bluetooth: btrtl: Ask 8821C to drop old firmware

Kiran K (5):
      Bluetooth: btintel: Fix endianness issue for TLV version information
      Bluetooth: btusb: Add *setup* function for new generation Intel controllers
      Bluetooth: btusb: Define a function to construct firmware filename
      Bluetooth: btusb: Helper function to download firmware to Intel adapters
      Bluetooth: btusb: Map Typhoon peak controller to BTUSB_INTEL_NEWGEN

Luiz Augusto von Dentz (2):
      Bluetooth: Fix not sending Set Extended Scan Response
      Bluetooth: Rename get_adv_instance_scan_rsp

Marcel Holtmann (2):
      Bluetooth: Increment management interface revision
      MAINTAINERS: Update Bluetooth entries

Max Chou (3):
      Bluetooth: btusb: Add the more support IDs for Realtek RTL8822CE
      Bluetooth: btrtl: Refine the ic_id_table for clearer and more regular
      Bluetooth: btusb: btrtl: Add support for RTL8852A

Nigel Christian (1):
      Bluetooth: hci_qca: resolve various warnings

Ole Bjørn Midtbø (1):
      Bluetooth: hidp: use correct wait queue when removing ctrl_wait

Peilin Ye (1):
      Bluetooth: Fix slab-out-of-bounds read in hci_le_direct_adv_report_evt()

Reo Shiseki (1):
      Bluetooth: fix typo in struct name

Sathish Narasimman (1):
      Bluetooth: Fix: LL PRivacy BLE device fails to connect

Sergey Shtylyov (1):
      Bluetooth: consolidate error paths in hci_phy_link_complete_evt()

Tim Jiang (1)