[PATCH net] ipv6: Fix suspicious RCU usage warning in ip6mr
From: Madhuparna Bhowmik This patch fixes the following warning: = WARNING: suspicious RCU usage 5.7.0-rc4-next-20200507-syzkaller #0 Not tainted - net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!! ipmr_new_table() returns an existing table, but there is no table at init. Therefore the condition used is: either RTNL is held or the list is empty. Fixes: d13fee049f ("Default enable RCU list lockdep debugging with ..") Reported-by: kernel test robot Suggested-by: Jakub Kicinski Signed-off-by: Madhuparna Bhowmik --- net/ipv6/ip6mr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 65a54d74acc1..fbe282bb8036 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -98,7 +98,7 @@ static void ipmr_expire_process(struct timer_list *t); #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES #define ip6mr_for_each_table(mrt, net) \ list_for_each_entry_rcu(mrt, &net->ipv6.mr6_tables, list, \ - lockdep_rtnl_is_held()) + lockdep_rtnl_is_held() || list_empty(&net->ipv6.mr6_tables)) static struct mr_table *ip6mr_mr_table_iter(struct net *net, struct mr_table *mrt) -- 2.17.1
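The fourth argument that the patch adds to list_for_each_entry_rcu() is a lockdep-only annotation: it is evaluated solely for the RCU-list debugging check and does not change how the list is walked. Below is a minimal sketch of the lookup pattern the new condition covers; it only illustrates the idea and is not code copied from net/ipv6/ip6mr.c.

    static struct mr_table *ip6mr_lookup_sketch(struct net *net, u32 id)
    {
            struct mr_table *mrt;

            /* Callers either hold RTNL (table setup/teardown paths) or run
             * under rcu_read_lock(). At init time the list is still empty,
             * so no traversal happens and "RTNL held || list empty" is
             * enough to silence the false positive.
             */
            list_for_each_entry_rcu(mrt, &net->ipv6.mr6_tables, list,
                                    lockdep_rtnl_is_held() ||
                                    list_empty(&net->ipv6.mr6_tables)) {
                    if (mrt->id == id)
                            return mrt;
            }

            return NULL;
    }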
Re: [PATCH] Fix suspicious RCU usage warning
On Wed, May 13, 2020 at 12:00:10PM -0700, David Miller wrote: > From: madhuparnabhowmi...@gmail.com > Date: Wed, 13 May 2020 11:46:10 +0530 > > > From: Madhuparna Bhowmik > > > > This patch fixes the following warning: > > > > = > > WARNING: suspicious RCU usage > > 5.7.0-rc4-next-20200507-syzkaller #0 Not tainted > > - > > net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!! > > > > ipmr_new_table() returns an existing table, but there is no table at > > init. Therefore the condition: either holding rtnl or the list is empty > > is used. > > > > Suggested-by: Jakub Kicinski > > Signed-off-by: Madhuparna Bhowmik > > > > Signed-off-by: Madhuparna Bhowmik > > Please only provide one signoff line. > > Please provide a proper Fixes: tag for this bug fix. > > And finally, please make your Subject line more appropriate. It must > first state the target tree inside of the "[PATCH]" area, the two choices > are "[PATCH net]" and "[PATCH net-next]" and it depends upon which tree > this patch is targetting. > > Then your Subject line should also be more descriptive about exactly the > subsystem and area the change is being made to, for this change for > example you could use something like: > > ipv6: Fix suspicious RCU usage warning in ip6mr. > > Also, obviously, there are also syzkaller tags you can add to the > commit message as well. Sorry for this malformed patch, I have sent a patch with all these corrections. Thank you, Madhuparna
RE: [EXT] Re: signal quality and cable diagnostic
On Tue, May 12, 2020 at 10:22:01AM +0200, Oleksij Rempel wrote: > So I think we should pass raw SQI value to user space, at least in the > first implementation. > What do you think about this? Hi Oleksij, I had a look at the background of this SQI metric. The table you reference with concrete SNR values is informative only and not a requirement. The requirements are rather loose. This is from OA: - Only for SQI=0 shall a link loss occur. - The indicated signal quality shall increase/decrease monotonically with the noise level. - It shall be indicated in the datasheet at which level a BER<10^-10 (better than 10^-10) is achieved (e.g. "from SQI=3 to SQI=7 the link has a BER<10^-10 (better than 10^-10)") I.e. SQI does not need to have a direct correlation with SNR. The fundamental underlying metric is the BER. You can report the raw SQI level and users would have to look up what it means in the respective data sheet. There is no guaranteed relation between SQI levels of different devices, i.e. SQI 5 can have a lower BER than SQI 6 on another device. Alternatively, you could report BER < x for the different SQI levels. However, this requires the information to be available. While I could provide these values for NXP, they might not be easily available for other vendors. If reporting raw SQI, at least the SQI level for BER<10^-10 should be presented to give any meaning to the value. Regards, Christian
Re: [bpf-next PATCH 2/3] bpf: sk_msg helpers for probe_* and *current_task*
On 5/13/20 12:24 PM, John Fastabend wrote: Often it is useful when applying policy to know something about the task. If the administrator has CAP_SYS_ADMIN rights then they can use kprobe + sk_msg and link the two programs together to accomplish this. However, this is a bit clunky and also means we have to call sk_msg program and kprobe program when we could just use a single program and avoid passing metadata through sk_msg/skb, socket, etc. To accomplish this add probe_* helpers to sk_msg programs guarded by a CAP_SYS_ADMIN check. New supported helpers are the following, BPF_FUNC_get_current_task BPF_FUNC_current_task_under_cgroup BPF_FUNC_probe_read_user BPF_FUNC_probe_read_kernel BPF_FUNC_probe_read BPF_FUNC_probe_read_user_str BPF_FUNC_probe_read_kernel_str BPF_FUNC_probe_read_str I think this is a good idea. But this will require bpf program to be GPLed, probably it will be okay. Currently, for capabilities, it is CAP_SYS_ADMIN now, in the future, it may be CAP_PERFMON. Also, do we want to remove BPF_FUNC_probe_read and BPF_FUNC_probe_read_str from the list? Since we introduce helpers to new program types, we can deprecate these two helpers right away. The new helpers will be subject to new security lockdown rules which may have impact on networking bpf programs on particular setup. Signed-off-by: John Fastabend --- kernel/trace/bpf_trace.c | 16 net/core/filter.c| 34 ++ 2 files changed, 42 insertions(+), 8 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index d961428..abe6721 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -147,7 +147,7 @@ BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, return ret; } -static const struct bpf_func_proto bpf_probe_read_user_proto = { +const struct bpf_func_proto bpf_probe_read_user_proto = { .func = bpf_probe_read_user, .gpl_only = true, .ret_type = RET_INTEGER, @@ -167,7 +167,7 @@ BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size, return ret; } -static const struct bpf_func_proto bpf_probe_read_user_str_proto = { +const struct bpf_func_proto bpf_probe_read_user_str_proto = { .func = bpf_probe_read_user_str, .gpl_only = true, .ret_type = RET_INTEGER, @@ -198,7 +198,7 @@ BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false); } -static const struct bpf_func_proto bpf_probe_read_kernel_proto = { +const struct bpf_func_proto bpf_probe_read_kernel_proto = { .func = bpf_probe_read_kernel, .gpl_only = true, .ret_type = RET_INTEGER, @@ -213,7 +213,7 @@ BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size, return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, true); } -static const struct bpf_func_proto bpf_probe_read_compat_proto = { +const struct bpf_func_proto bpf_probe_read_compat_proto = { .func = bpf_probe_read_compat, .gpl_only = true, .ret_type = RET_INTEGER, @@ -253,7 +253,7 @@ BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size, return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false); } -static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = { +const struct bpf_func_proto bpf_probe_read_kernel_str_proto = { .func = bpf_probe_read_kernel_str, .gpl_only = true, .ret_type = RET_INTEGER, @@ -268,7 +268,7 @@ BPF_CALL_3(bpf_probe_read_compat_str, void *, dst, u32, size, return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, true); } -static const struct bpf_func_proto bpf_probe_read_compat_str_proto = { +const struct bpf_func_proto bpf_probe_read_compat_str_proto = 
{ .func = bpf_probe_read_compat_str, .gpl_only = true, .ret_type = RET_INTEGER, @@ -874,7 +874,7 @@ BPF_CALL_0(bpf_get_current_task) return (long) current; } -static const struct bpf_func_proto bpf_get_current_task_proto = { +const struct bpf_func_proto bpf_get_current_task_proto = { .func = bpf_get_current_task, .gpl_only = true, .ret_type = RET_INTEGER, @@ -895,7 +895,7 @@ BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx) return task_under_cgroup_hierarchy(current, cgrp); } -static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = { +const struct bpf_func_proto bpf_current_task_under_cgroup_proto = { .func = bpf_current_task_under_cgroup, .gpl_only = false, .ret_type = RET_INTEGER, diff --git a/net/core/filter.c b/net/core/filter.c index 45b4a16..d1c4739 100644 --- a/net/core/filter.c +++ b/net/core/filter.c
Re: [bpf-next PATCH 3/3] bpf: sk_msg add get socket storage helpers
On 5/13/20 12:24 PM, John Fastabend wrote: Add helpers to use local socket storage. Signed-off-by: John Fastabend --- include/uapi/linux/bpf.h |2 ++ net/core/filter.c| 15 +++ 2 files changed, 17 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index bfb31c1..3ca7cfd 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3607,6 +3607,8 @@ struct sk_msg_md { __u32 remote_port; /* Stored in network byte order */ __u32 local_port; /* stored in host byte order */ __u32 size; /* Total size of sk_msg */ + + __bpf_md_ptr(struct bpf_sock *, sk); /* current socket */ }; Sync changes to tools/include/uapi/linux/bpf.h? For this patch and previous patches, it would be good we got some selftests to exercise some newly-added helpers. struct sk_reuseport_md { diff --git a/net/core/filter.c b/net/core/filter.c index d1c4739..c42adc8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6395,6 +6395,10 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_get_current_uid_gid_proto; case BPF_FUNC_get_current_pid_tgid: return &bpf_get_current_pid_tgid_proto; + case BPF_FUNC_sk_storage_get: + return &bpf_sk_storage_get_proto; + case BPF_FUNC_sk_storage_delete: + return &bpf_sk_storage_delete_proto; #ifdef CONFIG_CGROUPS case BPF_FUNC_get_current_cgroup_id: return &bpf_get_current_cgroup_id_proto; @@ -7243,6 +7247,11 @@ static bool sk_msg_is_valid_access(int off, int size, if (size != sizeof(__u64)) return false; break; + case offsetof(struct sk_msg_md, sk): + if (size != sizeof(__u64)) + return false; + info->reg_type = PTR_TO_SOCKET; + break; case bpf_ctx_range(struct sk_msg_md, family): case bpf_ctx_range(struct sk_msg_md, remote_ip4): case bpf_ctx_range(struct sk_msg_md, local_ip4): @@ -8577,6 +8586,12 @@ static u32 sk_msg_convert_ctx_access(enum bpf_access_type type, si->dst_reg, si->src_reg, offsetof(struct sk_msg_sg, size)); break; + + case offsetof(struct sk_msg_md, sk): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_msg, sk), + si->dst_reg, si->src_reg, + offsetof(struct sk_msg, sk)); + break; } return insn - insn_buf;
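Since the review above asks for selftests exercising the newly added helpers, here is a rough sketch of what such an SK_MSG program could look like. The map definition style and all names are illustrative assumptions and are not taken from this series.

    /* Sketch of a selftest-style SK_MSG program: counts messages per
     * socket in local socket storage via the newly exposed msg->sk.
     */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct msg_count {
            __u64 msgs;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_SK_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);
            __type(value, struct msg_count);
    } sk_stg_map SEC(".maps");

    SEC("sk_msg")
    int count_msgs(struct sk_msg_md *msg)
    {
            struct msg_count *cnt;

            if (!msg->sk)
                    return SK_PASS;

            /* Create the per-socket entry on first use, then bump it. */
            cnt = bpf_sk_storage_get(&sk_stg_map, msg->sk, 0,
                                     BPF_SK_STORAGE_GET_F_CREATE);
            if (cnt)
                    __sync_fetch_and_add(&cnt->msgs, 1);

            return SK_PASS;
    }

    char _license[] SEC("license") = "GPL";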
[PATCH net-next 0/2] Fixing compilation warnings and errors
Patch 1: Fixes the warnings seen when compiling with the sparse tool. Patch 2: Fixes a cocci check error introduced after commit 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). Ayush Sawal (2): Crypto/chcr: Fixes compilation warnings Crypto/chcr: Fixes a cocci check error drivers/crypto/chelsio/chcr_algo.c | 9 + drivers/crypto/chelsio/chcr_ipsec.c | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) -- 2.26.0.rc1.11.g30e9940
[PATCH net-next 1/2] Crypto/chcr: Fixes compilation warnings
This patch fixes the compilation warnings displayed by sparse tool for chcr driver. Signed-off-by: Ayush Sawal --- drivers/crypto/chelsio/chcr_algo.c | 8 drivers/crypto/chelsio/chcr_ipsec.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c index b8c1c4dd3ef0..1aed0e8d6558 100644 --- a/drivers/crypto/chelsio/chcr_algo.c +++ b/drivers/crypto/chelsio/chcr_algo.c @@ -256,7 +256,7 @@ static void get_aes_decrypt_key(unsigned char *dec_key, return; } for (i = 0; i < nk; i++) - w_ring[i] = be32_to_cpu(*(u32 *)&key[4 * i]); + w_ring[i] = be32_to_cpu(*(__be32 *)&key[4 * i]); i = 0; temp = w_ring[nk - 1]; @@ -275,7 +275,7 @@ static void get_aes_decrypt_key(unsigned char *dec_key, } i--; for (k = 0, j = i % nk; k < nk; k++) { - *((u32 *)dec_key + k) = htonl(w_ring[j]); + *((__be32 *)dec_key + k) = htonl(w_ring[j]); j--; if (j < 0) j += nk; @@ -2926,7 +2926,7 @@ static int ccm_format_packet(struct aead_request *req, memcpy(ivptr, req->iv, 16); } if (assoclen) - *((unsigned short *)(reqctx->scratch_pad + 16)) = + *((__be16 *)(reqctx->scratch_pad + 16)) = htons(assoclen); rc = generate_b0(req, ivptr, op_type); @@ -3201,7 +3201,7 @@ static struct sk_buff *create_gcm_wr(struct aead_request *req, } else { memcpy(ivptr, req->iv, GCM_AES_IV_SIZE); } - *((unsigned int *)(ivptr + 12)) = htonl(0x01); + *((__be32 *)(ivptr + 12)) = htonl(0x01); ulptx = (struct ulptx_sgl *)(ivptr + 16); diff --git a/drivers/crypto/chelsio/chcr_ipsec.c b/drivers/crypto/chelsio/chcr_ipsec.c index d25689837b26..3a10f51ad6fd 100644 --- a/drivers/crypto/chelsio/chcr_ipsec.c +++ b/drivers/crypto/chelsio/chcr_ipsec.c @@ -403,7 +403,7 @@ inline void *copy_esn_pktxt(struct sk_buff *skb, xo = xfrm_offload(skb); aadiv->spi = (esphdr->spi); - seqlo = htonl(esphdr->seq_no); + seqlo = ntohl(esphdr->seq_no); seqno = cpu_to_be64(seqlo + ((u64)xo->seq.hi << 32)); memcpy(aadiv->seq_no, &seqno, 8); iv = skb_transport_header(skb) + sizeof(struct ip_esp_hdr); -- 2.26.0.rc1.11.g30e9940
[PATCH net-next 2/2] Crypto/chcr: Fixes a cocci check error
This fixes an error observed after running coccicheck. drivers/crypto/chelsio/chcr_algo.c:1462:5-8: Unneeded variable: "err". Return "0" on line 1480 This assignment was missed in commit 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). Fixes: 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests") Signed-off-by: Ayush Sawal --- drivers/crypto/chelsio/chcr_algo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c index 1aed0e8d6558..c90b68aebe65 100644 --- a/drivers/crypto/chelsio/chcr_algo.c +++ b/drivers/crypto/chelsio/chcr_algo.c @@ -1462,6 +1462,7 @@ static int chcr_device_init(struct chcr_context *ctx) int err = 0, rxq_perchan; if (!ctx->dev) { + err = -ENXIO; u_ctx = assign_chcr_device(); if (!u_ctx) { pr_err("chcr device assignment fails\n"); -- 2.26.0.rc1.11.g30e9940
Re: [PATCH] KVM: MIPS/TLB: Remove Unneeded semicolon in tlb.c
On Tue, Apr 28, 2020 at 02:32:45PM +0800, Jason Yan wrote: > Fix the following coccicheck warning: > > arch/mips/kvm/tlb.c:472:2-3: Unneeded semicolon > arch/mips/kvm/tlb.c:489:2-3: Unneeded semicolon > > Signed-off-by: Jason Yan > --- > arch/mips/kvm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) applied to mips-next. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea.[ RFC1925, 2.3 ]
[PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
From: Bartosz Golaszewski This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC family. For now we only support full-duplex. Signed-off-by: Bartosz Golaszewski --- drivers/net/ethernet/mediatek/Kconfig |6 + drivers/net/ethernet/mediatek/Makefile |1 + drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1578 +++ 3 files changed, 1585 insertions(+) create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig index 5079b8090f16..5c3793076765 100644 --- a/drivers/net/ethernet/mediatek/Kconfig +++ b/drivers/net/ethernet/mediatek/Kconfig @@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC This driver supports the gigabit ethernet MACs in the MediaTek SoC family. +config NET_MEDIATEK_MAC + tristate "MediaTek Ethernet MAC support" + select PHYLIB + help + This driver supports the ethernet IP on MediaTek MT85** SoCs. + endif #NET_VENDOR_MEDIATEK diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile index 3362fb7ef859..f7f5638943a0 100644 --- a/drivers/net/ethernet/mediatek/Makefile +++ b/drivers/net/ethernet/mediatek/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o +obj-$(CONFIG_NET_MEDIATEK_MAC) += mtk_eth_mac.o diff --git a/drivers/net/ethernet/mediatek/mtk_eth_mac.c b/drivers/net/ethernet/mediatek/mtk_eth_mac.c new file mode 100644 index ..6fbe49e861d6 --- /dev/null +++ b/drivers/net/ethernet/mediatek/mtk_eth_mac.c @@ -0,0 +1,1578 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2020 MediaTek Corporation + * Copyright (c) 2020 BayLibre SAS + * + * Author: Bartosz Golaszewski + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MTK_MAC_DRVNAME"mtk_eth_mac" + +#define MTK_MAC_WAIT_TIMEOUT 300 +#define MTK_MAC_MAX_FRAME_SIZE 1514 +#define MTK_MAC_SKB_ALIGNMENT 16 +#define MTK_MAC_NAPI_WEIGHT64 +#define MTK_MAC_HASHTABLE_MC_LIMIT 256 +#define MTK_MAC_HASHTABLE_SIZE_MAX 512 + +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but + * this IP doesn't work without this alignment being equal to 2. 
+ */ +#ifdef NET_IP_ALIGN +#undef NET_IP_ALIGN +#endif +#define NET_IP_ALIGN 2 + +static const char *const mtk_mac_clk_names[] = { "core", "reg", "trans" }; +#define MTK_MAC_NCLKS ARRAY_SIZE(mtk_mac_clk_names) + +/* PHY Control Register 0 */ +#define MTK_MAC_REG_PHY_CTRL0 0x +#define MTK_MAC_BIT_PHY_CTRL0_WTCMDBIT(13) +#define MTK_MAC_BIT_PHY_CTRL0_RDCMDBIT(14) +#define MTK_MAC_BIT_PHY_CTRL0_RWOK BIT(15) +#define MTK_MAC_MSK_PHY_CTRL0_PREG GENMASK(12, 8) +#define MTK_MAC_OFF_PHY_CTRL0_PREG 8 +#define MTK_MAC_MSK_PHY_CTRL0_RWDATA GENMASK(31, 16) +#define MTK_MAC_OFF_PHY_CTRL0_RWDATA 16 + +/* PHY Control Register 1 */ +#define MTK_MAC_REG_PHY_CTRL1 0x0004 +#define MTK_MAC_BIT_PHY_CTRL1_LINK_ST BIT(0) +#define MTK_MAC_BIT_PHY_CTRL1_AN_ENBIT(8) +#define MTK_MAC_OFF_PHY_CTRL1_FORCE_SPD9 +#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_10M0x00 +#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_100M 0x01 +#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_1000M 0x02 +#define MTK_MAC_BIT_PHY_CTRL1_FORCE_DPXBIT(11) +#define MTK_MAC_BIT_PHY_CTRL1_FORCE_FC_RX BIT(12) +#define MTK_MAC_BIT_PHY_CTRL1_FORCE_FC_TX BIT(13) + +/* MAC Configuration Register */ +#define MTK_MAC_REG_MAC_CFG0x0008 +#define MTK_MAC_OFF_MAC_CFG_IPG10 +#define MTK_MAC_VAL_MAC_CFG_IPG_96BIT GENMASK(4, 0) +#define MTK_MAC_BIT_MAC_CFG_MAXLEN_1522BIT(16) +#define MTK_MAC_BIT_MAC_CFG_AUTO_PAD BIT(19) +#define MTK_MAC_BIT_MAC_CFG_CRC_STRIP BIT(20) +#define MTK_MAC_BIT_MAC_CFG_VLAN_STRIP BIT(22) +#define MTK_MAC_BIT_MAC_CFG_NIC_PD BIT(31) + +/* Flow-Control Configuration Register */ +#define MTK_MAC_REG_FC_CFG 0x000c +#define MTK_MAC_BIT_FC_CFG_BP_EN BIT(7) +#define MTK_MAC_BIT_FC_CFG_UC_PAUSE_DIRBIT(8) +#define MTK_MAC_OFF_FC_CFG_SEND_PAUSE_TH 16 +#define MTK_MAC_MSK_FC_CFG_SEND_PAUSE_TH GENMASK(27, 16) +#define MTK_MAC_VAL_FC_CFG_SEND_PAUSE_TH_2K0x800 + +/* ARL Configuration Register */ +#define MTK_MAC_REG_ARL_CFG0x0010 +#define MTK_MAC_BIT_ARL_CFG_HASH_ALG BIT(0) +#define MTK_MAC_BIT_ARL_CFG_MISC_MODE BIT(4) + +/* MAC High and Low Bytes Registers */ +#define MT
[PATCH v3 11/15] ARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi
From: Bartosz Golaszewski This adds support for the PERICFG register range as a syscon. This will soon be used by the MediaTek Ethernet MAC driver for NIC configuration. Signed-off-by: Bartosz Golaszewski --- arch/arm64/boot/dts/mediatek/mt8516.dtsi | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8516.dtsi b/arch/arm64/boot/dts/mediatek/mt8516.dtsi index 2f8adf042195..8cedaf74ae86 100644 --- a/arch/arm64/boot/dts/mediatek/mt8516.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8516.dtsi @@ -191,6 +191,11 @@ infracfg: infracfg@10001000 { #clock-cells = <1>; }; + pericfg: pericfg@10003050 { + compatible = "mediatek,mt8516-pericfg", "syscon"; + reg = <0 0x10003050 0 0x1000>; + }; + apmixedsys: apmixedsys@10018000 { compatible = "mediatek,mt8516-apmixedsys", "syscon"; reg = <0 0x10018000 0 0x710>; -- 2.25.0
[PATCH v3 15/15] ARM64: dts: mediatek: enable ethernet on pumpkin boards
From: Bartosz Golaszewski Add remaining properties to the ethernet node and enable it. Signed-off-by: Bartosz Golaszewski --- .../boot/dts/mediatek/pumpkin-common.dtsi | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi b/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi index 4b1d5f69aba6..dfceffe6950a 100644 --- a/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi +++ b/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi @@ -167,6 +167,24 @@ &uart0 { status = "okay"; }; +ðernet { + pinctrl-names = "default"; + pinctrl-0 = <ðernet_pins_default>; + phy-handle = <ð_phy>; + phy-mode = "rmii"; + mac-address = [00 00 00 00 00 00]; + status = "okay"; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + eth_phy: ethernet-phy@0 { + reg = <0>; + }; + }; +}; + &usb0 { status = "okay"; dr_mode = "peripheral"; -- 2.25.0
[PATCH v3 07/15] net: move devres helpers into a separate source file
From: Bartosz Golaszewski There's currently only a single devres helper in net/ - the devm variant of alloc_etherdev. Let's move it to net/devres.c with the intention of adding a second one: devm_register_netdev(). This new routine will need to know the address of the release function of devm_alloc_etherdev() so that it can verify (using devres_find()) that the struct net_device that's being passed to it is also resource managed. Signed-off-by: Bartosz Golaszewski --- net/Makefile | 2 +- net/devres.c | 36 net/ethernet/eth.c | 28 3 files changed, 37 insertions(+), 29 deletions(-) create mode 100644 net/devres.c diff --git a/net/Makefile b/net/Makefile index 07ea48160874..5744bf1997fd 100644 --- a/net/Makefile +++ b/net/Makefile @@ -6,7 +6,7 @@ # Rewritten to use lists instead of if-statements. # -obj-$(CONFIG_NET) := socket.o core/ +obj-$(CONFIG_NET) := devres.o socket.o core/ tmp-$(CONFIG_COMPAT) := compat.o obj-$(CONFIG_NET) += $(tmp-y) diff --git a/net/devres.c b/net/devres.c new file mode 100644 index ..c1465d9f9019 --- /dev/null +++ b/net/devres.c @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * This file contains all networking devres helpers. + */ + +#include +#include +#include + +static void devm_free_netdev(struct device *dev, void *res) +{ + free_netdev(*(struct net_device **)res); +} + +struct net_device *devm_alloc_etherdev_mqs(struct device *dev, int sizeof_priv, + unsigned int txqs, unsigned int rxqs) +{ + struct net_device **dr; + struct net_device *netdev; + + dr = devres_alloc(devm_free_netdev, sizeof(*dr), GFP_KERNEL); + if (!dr) + return NULL; + + netdev = alloc_etherdev_mqs(sizeof_priv, txqs, rxqs); + if (!netdev) { + devres_free(dr); + return NULL; + } + + *dr = netdev; + devres_add(dev, dr); + + return netdev; +} +EXPORT_SYMBOL(devm_alloc_etherdev_mqs); diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index c8b903302ff2..dac65180c4ef 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -400,34 +400,6 @@ struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs, } EXPORT_SYMBOL(alloc_etherdev_mqs); -static void devm_free_netdev(struct device *dev, void *res) -{ - free_netdev(*(struct net_device **)res); -} - -struct net_device *devm_alloc_etherdev_mqs(struct device *dev, int sizeof_priv, - unsigned int txqs, unsigned int rxqs) -{ - struct net_device **dr; - struct net_device *netdev; - - dr = devres_alloc(devm_free_netdev, sizeof(*dr), GFP_KERNEL); - if (!dr) - return NULL; - - netdev = alloc_etherdev_mqs(sizeof_priv, txqs, rxqs); - if (!netdev) { - devres_free(dr); - return NULL; - } - - *dr = netdev; - devres_add(dev, dr); - - return netdev; -} -EXPORT_SYMBOL(devm_alloc_etherdev_mqs); - ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len) { return scnprintf(buf, PAGE_SIZE, "%*phC\n", len, addr); -- 2.25.0
[PATCH v3 09/15] net: devres: provide devm_register_netdev()
From: Bartosz Golaszewski Provide devm_register_netdev() - a device resource managed variant of register_netdev(). This new helper will only work for net_device structs that are also already managed by devres. Signed-off-by: Bartosz Golaszewski --- .../driver-api/driver-model/devres.rst| 1 + include/linux/netdevice.h | 2 + net/devres.c | 55 +++ 3 files changed, 58 insertions(+) diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst index 50df28d20fa7..fc242ed4bde5 100644 --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -375,6 +375,7 @@ MUX NET devm_alloc_etherdev() devm_alloc_etherdev_mqs() + devm_register_netdev() PER-CPU MEM devm_alloc_percpu() diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 130a668049ab..c4ad728993dd 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4208,6 +4208,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, int register_netdev(struct net_device *dev); void unregister_netdev(struct net_device *dev); +int devm_register_netdev(struct device *dev, struct net_device *ndev); + /* General hardware address lists handling functions */ int __hw_addr_sync(struct netdev_hw_addr_list *to_list, struct netdev_hw_addr_list *from_list, int addr_len); diff --git a/net/devres.c b/net/devres.c index b97b0c5a8216..57a6a88d11f6 100644 --- a/net/devres.c +++ b/net/devres.c @@ -38,3 +38,58 @@ struct net_device *devm_alloc_etherdev_mqs(struct device *dev, int sizeof_priv, return dr->ndev; } EXPORT_SYMBOL(devm_alloc_etherdev_mqs); + +static void devm_netdev_release(struct device *dev, void *this) +{ + struct net_device_devres *res = this; + + unregister_netdev(res->ndev); +} + +static int netdev_devres_match(struct device *dev, void *this, void *match_data) +{ + struct net_device_devres *res = this; + struct net_device *ndev = match_data; + + return ndev == res->ndev; +} + +/** + * devm_register_netdev - resource managed variant of register_netdev() + * @dev: managing device for this netdev - usually the parent device + * @ndev: device to register + * + * This is a devres variant of register_netdev() for which the unregister + * function will be called automatically when the managing device is + * detached. Note: the net_device used must also be resource managed by + * the same struct device. + */ +int devm_register_netdev(struct device *dev, struct net_device *ndev) +{ + struct net_device_devres *dr; + int ret; + + /* struct net_device must itself be managed. For now a managed netdev +* can only be allocated by devm_alloc_etherdev_mqs() so the check is +* straightforward. +*/ + if (WARN_ON(!devres_find(dev, devm_free_netdev, +netdev_devres_match, ndev))) + return -EINVAL; + + dr = devres_alloc(devm_netdev_release, sizeof(*dr), GFP_KERNEL); + if (!dr) + return -ENOMEM; + + ret = register_netdev(ndev); + if (ret) { + devres_free(dr); + return ret; + } + + dr->ndev = ndev; + devres_add(ndev->dev.parent, dr); + + return 0; +} +EXPORT_SYMBOL(devm_register_netdev); -- 2.25.0
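With devm_alloc_etherdev_mqs() and devm_register_netdev() combined, a driver's probe path can drop all netdev cleanup from its error paths and remove() callback. A minimal sketch is shown below; struct foo_priv and foo_hw_init() are placeholders for illustration and are not part of this series.

    static int foo_probe(struct platform_device *pdev)
    {
            struct device *dev = &pdev->dev;
            struct net_device *ndev;
            struct foo_priv *priv;
            int ret;

            /* Freed automatically when 'dev' is detached. */
            ndev = devm_alloc_etherdev_mqs(dev, sizeof(*priv), 1, 1);
            if (!ndev)
                    return -ENOMEM;

            priv = netdev_priv(ndev);
            SET_NETDEV_DEV(ndev, dev);

            ret = foo_hw_init(ndev, priv);  /* hypothetical hardware setup */
            if (ret)
                    return ret;

            /* Returns -EINVAL if 'ndev' is not itself devres-managed;
             * unregistered automatically on detach.
             */
            return devm_register_netdev(dev, ndev);
    }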
[PATCH v3 13/15] ARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards
From: Bartosz Golaszewski Add the ethernet0 alias for ethernet so that u-boot can find this node and fill in the MAC address. Signed-off-by: Bartosz Golaszewski --- arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi b/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi index a31093d7142b..97d9b000c37e 100644 --- a/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi +++ b/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi @@ -9,6 +9,7 @@ / { aliases { serial0 = &uart0; + ethernet0 = ðernet; }; chosen { -- 2.25.0
[PATCH v3 12/15] ARM64: dts: mediatek: add the ethernet node to mt8516.dtsi
From: Bartosz Golaszewski Add the Ethernet MAC node to mt8516.dtsi. This defines parameters common to all the boards based on this SoC. Signed-off-by: Bartosz Golaszewski --- arch/arm64/boot/dts/mediatek/mt8516.dtsi | 12 1 file changed, 12 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8516.dtsi b/arch/arm64/boot/dts/mediatek/mt8516.dtsi index 8cedaf74ae86..89af661e7f63 100644 --- a/arch/arm64/boot/dts/mediatek/mt8516.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8516.dtsi @@ -406,6 +406,18 @@ mmc2: mmc@1117 { status = "disabled"; }; + ethernet: ethernet@1118 { + compatible = "mediatek,mt8516-eth"; + reg = <0 0x1118 0 0x1000>; + mediatek,pericfg = <&pericfg>; + interrupts = ; + clocks = <&topckgen CLK_TOP_RG_ETH>, +<&topckgen CLK_TOP_66M_ETH>, +<&topckgen CLK_TOP_133M_ETH>; + clock-names = "core", "reg", "trans"; + status = "disabled"; + }; + rng: rng@1020c000 { compatible = "mediatek,mt8516-rng", "mediatek,mt7623-rng"; -- 2.25.0
[PATCH v3 06/15] Documentation: devres: add a missing section for networking helpers
From: Bartosz Golaszewski Add a new section for networking devres helpers to devres.rst and list the two existing devm functions. Signed-off-by: Bartosz Golaszewski --- Documentation/driver-api/driver-model/devres.rst | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst index 46c13780994c..50df28d20fa7 100644 --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -372,6 +372,10 @@ MUX devm_mux_chip_register() devm_mux_control_get() +NET + devm_alloc_etherdev() + devm_alloc_etherdev_mqs() + PER-CPU MEM devm_alloc_percpu() devm_free_percpu() -- 2.25.0
[PATCH v3 08/15] net: devres: define a separate devres structure for devm_alloc_etherdev()
From: Bartosz Golaszewski Not using a proxy structure to store struct net_device doesn't save anything in terms of compiled code size or memory usage but significantly decreases the readability of the code with all the pointer casting. Define struct net_device_devres and use it in devm_alloc_etherdev_mqs(). Signed-off-by: Bartosz Golaszewski --- net/devres.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/net/devres.c b/net/devres.c index c1465d9f9019..b97b0c5a8216 100644 --- a/net/devres.c +++ b/net/devres.c @@ -7,30 +7,34 @@ #include #include -static void devm_free_netdev(struct device *dev, void *res) +struct net_device_devres { + struct net_device *ndev; +}; + +static void devm_free_netdev(struct device *dev, void *this) { - free_netdev(*(struct net_device **)res); + struct net_device_devres *res = this; + + free_netdev(res->ndev); } struct net_device *devm_alloc_etherdev_mqs(struct device *dev, int sizeof_priv, unsigned int txqs, unsigned int rxqs) { - struct net_device **dr; - struct net_device *netdev; + struct net_device_devres *dr; dr = devres_alloc(devm_free_netdev, sizeof(*dr), GFP_KERNEL); if (!dr) return NULL; - netdev = alloc_etherdev_mqs(sizeof_priv, txqs, rxqs); - if (!netdev) { + dr->ndev = alloc_etherdev_mqs(sizeof_priv, txqs, rxqs); + if (!dr->ndev) { devres_free(dr); return NULL; } - *dr = netdev; devres_add(dev, dr); - return netdev; + return dr->ndev; } EXPORT_SYMBOL(devm_alloc_etherdev_mqs); -- 2.25.0
[PATCH v3 05/15] net: ethernet: mediatek: remove unnecessary spaces from Makefile
From: Bartosz Golaszewski The Makefile formatting in the kernel tree usually doesn't use tabs, so remove them before we add a second driver. Signed-off-by: Bartosz Golaszewski --- drivers/net/ethernet/mediatek/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile index 2d8362f9341b..3362fb7ef859 100644 --- a/drivers/net/ethernet/mediatek/Makefile +++ b/drivers/net/ethernet/mediatek/Makefile @@ -3,5 +3,5 @@ # Makefile for the Mediatek SoCs built-in ethernet macs # -obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o +obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o -- 2.25.0
[PATCH v3 04/15] net: ethernet: mediatek: rename Kconfig prompt
From: Bartosz Golaszewski We'll soon be adding a second MediaTek Ethernet driver, so modify the Kconfig prompt. Signed-off-by: Bartosz Golaszewski --- drivers/net/ethernet/mediatek/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig index 4968352ba188..5079b8090f16 100644 --- a/drivers/net/ethernet/mediatek/Kconfig +++ b/drivers/net/ethernet/mediatek/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only config NET_VENDOR_MEDIATEK - bool "MediaTek ethernet driver" + bool "MediaTek devices" depends on ARCH_MEDIATEK || SOC_MT7621 || SOC_MT7620 ---help--- If you have a Mediatek SoC with ethernet, say Y. -- 2.25.0
[PATCH v3 14/15] ARM64: dts: mediatek: add ethernet pins for pumpkin boards
From: Bartosz Golaszewski Setup the pin control for the Ethernet MAC. Signed-off-by: Bartosz Golaszewski --- arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi | 15 +++ 1 file changed, 15 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi b/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi index 97d9b000c37e..4b1d5f69aba6 100644 --- a/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi +++ b/arch/arm64/boot/dts/mediatek/pumpkin-common.dtsi @@ -219,4 +219,19 @@ gpio_mux_int_n_pin { bias-pull-up; }; }; + + ethernet_pins_default: ethernet { + pins_ethernet { + pinmux = , +, +, +, +, +, +, +, +, +; + }; + }; }; -- 2.25.0
[PATCH v3 03/15] dt-bindings: net: add a binding document for MediaTek Ethernet MAC
From: Bartosz Golaszewski This adds yaml DT bindings for the MediaTek Ethernet MAC present on the mt8* family of SoCs. Signed-off-by: Bartosz Golaszewski --- .../bindings/net/mediatek,eth-mac.yaml| 89 +++ 1 file changed, 89 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml diff --git a/Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml b/Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml new file mode 100644 index ..8ffd0b762c0f --- /dev/null +++ b/Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml @@ -0,0 +1,89 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/mediatek,eth-mac.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek Ethernet MAC Controller + +maintainers: + - Bartosz Golaszewski + +description: + This Ethernet MAC is used on the MT8* family of SoCs from MediaTek. + It's compliant with 802.3 standards and supports half- and full-duplex + modes with flow-control as well as CRC offloading and VLAN tags. + +allOf: + - $ref: "ethernet-controller.yaml#" + +properties: + compatible: +enum: + - mediatek,mt8516-eth + - mediatek,mt8518-eth + - mediatek,mt8175-eth + + reg: +maxItems: 1 + + interrupts: +maxItems: 1 + + clocks: +minItems: 3 +maxItems: 3 + + clock-names: +additionalItems: false +items: + - const: core + - const: reg + - const: trans + + mediatek,pericfg: +$ref: /schemas/types.yaml#definitions/phandle +description: + Phandle to the device containing the PERICFG register range. This is used + to control the MII mode. + + mdio: +type: object +description: + Creates and registers an MDIO bus. + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + - mediatek,pericfg + - phy-handle + +examples: + - | +#include +#include + +ethernet: ethernet@1118 { +compatible = "mediatek,mt8516-eth"; +reg = <0x1118 0x1000>; +mediatek,pericfg = <&pericfg>; +interrupts = ; +clocks = <&topckgen CLK_TOP_RG_ETH>, + <&topckgen CLK_TOP_66M_ETH>, + <&topckgen CLK_TOP_133M_ETH>; +clock-names = "core", "reg", "trans"; +phy-handle = <ð_phy>; +phy-mode = "rmii"; + +mdio { +#address-cells = <1>; +#size-cells = <0>; + +eth_phy: ethernet-phy@0 { +reg = <0>; +}; +}; +}; -- 2.25.0
[PATCH v3 00/15] mediatek: add support for MediaTek Ethernet MAC
From: Bartosz Golaszewski This adds support for the Ethernet Controller present on MediaTeK SoCs from the MT8* family. First we convert the existing DT bindings for the PERICFG controller to YAML and add a new compatible string for mt8516 variant of it. Then we add the DT bindings for the MAC. Next we do some cleanup of the mediatek ethernet drivers directory and update the devres documentation with existing networking devres helpers. The following patches introduce a resource managed variant of register_netdev() and move all networking devres helpers into a separate .c file. The largest patch in the series adds the actual new driver. The rest of the patches add DT fixups for the boards already supported upstream. v1 -> v2: - add a generic helper for retrieving the net_device associated with given private data - fix several typos in commit messages - remove MTK_MAC_VERSION and don't set the driver version - use NET_IP_ALIGN instead of a magic number (2) but redefine it as it defaults to 0 on arm64 - don't manually turn the carrier off in mtk_mac_enable() - process TX cleanup in napi poll callback - configure pause in the adjust_link callback - use regmap_read_poll_timeout() instead of handcoding the polling - use devres_find() to verify that struct net_device is managed by devres in devm_register_netdev() - add a patch moving all networking devres helpers into net/devres.c - tweak the dma barriers: remove where unnecessary and add comments to the remaining barriers - don't reset internal counters when enabling the NIC - set the net_device's mtu size instead of checking the framesize in ndo_start_xmit() callback - fix a race condition in waking up the netif queue - don't emit log messages on OOM errors - use dma_set_mask_and_coherent() - use eth_hw_addr_random() - rework the receive callback so that we reuse the previous skb if unmapping fails, like we already do if skb allocation fails - rework hash table operations: add proper timeout handling and clear bits when appropriate v2 -> v3: - drop the patch adding priv_to_netdev() and store the netdev pointer in the driver private data - add an additional dma_wmb() after reseting the descriptor in mtk_mac_ring_pop_tail() - check the return value of dma_set_mask_and_coherent() - improve the DT bindings for mtk-eth-mac: make the reg property in the example use single-cell address and size, extend the description of the PERICFG phandle and document the mdio sub-node - add a patch converting the old .txt bindings for PERICFG to yaml - limit reading the DMA memory by storing the mapped addresses in the driver private structure - add a patch documenting the existing networking devres helpers Bartosz Golaszewski (15): dt-bindings: convert the binding document for mediatek PERICFG to yaml dt-bindings: add new compatible to mediatek,pericfg dt-bindings: net: add a binding document for MediaTek Ethernet MAC net: ethernet: mediatek: rename Kconfig prompt net: ethernet: mediatek: remove unnecessary spaces from Makefile Documentation: devres: add a missing section for networking helpers net: move devres helpers into a separate source file net: devres: define a separate devres structure for devm_alloc_etherdev() net: devres: provide devm_register_netdev() net: ethernet: mtk-eth-mac: new driver ARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi ARM64: dts: mediatek: add the ethernet node to mt8516.dtsi ARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards ARM64: dts: mediatek: add ethernet pins for pumpkin boards ARM64: dts: mediatek: 
enable ethernet on pumpkin boards .../arm/mediatek/mediatek,pericfg.txt | 36 - .../arm/mediatek/mediatek,pericfg.yaml| 64 + .../bindings/net/mediatek,eth-mac.yaml| 89 + .../driver-api/driver-model/devres.rst|5 + arch/arm64/boot/dts/mediatek/mt8516.dtsi | 17 + .../boot/dts/mediatek/pumpkin-common.dtsi | 34 + drivers/net/ethernet/mediatek/Kconfig |8 +- drivers/net/ethernet/mediatek/Makefile|3 +- drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1578 + include/linux/netdevice.h |2 + net/Makefile |2 +- net/devres.c | 95 + net/ethernet/eth.c| 28 - 13 files changed, 1894 insertions(+), 67 deletions(-) delete mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml create mode 100644 Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c create mode 100644 net/devres.c -- 2.25.0
[PATCH v3 02/15] dt-bindings: add new compatible to mediatek,pericfg
From: Bartosz Golaszewski The PERICFG controller is present on the MT8516 SoC. Add an appropriate compatible variant. Signed-off-by: Bartosz Golaszewski --- .../devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml index 1340c6288024..55209a2baedc 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml @@ -25,6 +25,7 @@ properties: - mediatek,mt8135-pericfg - mediatek,mt8173-pericfg - mediatek,mt8183-pericfg + - mediatek,mt8516-pericfg - const: syscon - items: # Special case for mt7623 for backward compatibility -- 2.25.0
[PATCH v3 01/15] dt-bindings: convert the binding document for mediatek PERICFG to yaml
From: Bartosz Golaszewski Convert the DT binding .txt file for MediaTek's peripheral configuration controller to YAML. There's one special case where the compatible has three positions. Otherwise, it's a pretty normal syscon. Signed-off-by: Bartosz Golaszewski --- .../arm/mediatek/mediatek,pericfg.txt | 36 --- .../arm/mediatek/mediatek,pericfg.yaml| 63 +++ 2 files changed, 63 insertions(+), 36 deletions(-) delete mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt deleted file mode 100644 index ecf027a9003a.. --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt +++ /dev/null @@ -1,36 +0,0 @@ -Mediatek pericfg controller -=== - -The Mediatek pericfg controller provides various clocks and reset -outputs to the system. - -Required Properties: - -- compatible: Should be one of: - - "mediatek,mt2701-pericfg", "syscon" - - "mediatek,mt2712-pericfg", "syscon" - - "mediatek,mt7622-pericfg", "syscon" - - "mediatek,mt7623-pericfg", "mediatek,mt2701-pericfg", "syscon" - - "mediatek,mt7629-pericfg", "syscon" - - "mediatek,mt8135-pericfg", "syscon" - - "mediatek,mt8173-pericfg", "syscon" - - "mediatek,mt8183-pericfg", "syscon" -- #clock-cells: Must be 1 -- #reset-cells: Must be 1 - -The pericfg controller uses the common clk binding from -Documentation/devicetree/bindings/clock/clock-bindings.txt -The available clocks are defined in dt-bindings/clock/mt*-clk.h. -Also it uses the common reset controller binding from -Documentation/devicetree/bindings/reset/reset.txt. -The available reset outputs are defined in -dt-bindings/reset/mt*-resets.h - -Example: - -pericfg: power-controller@10003000 { - compatible = "mediatek,mt8173-pericfg", "syscon"; - reg = <0 0x10003000 0 0x1000>; - #clock-cells = <1>; - #reset-cells = <1>; -}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml new file mode 100644 index ..1340c6288024 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml @@ -0,0 +1,63 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/arm/mediatek/mediatek,pericfg.yaml#"; +$schema: "http://devicetree.org/meta-schemas/core.yaml#"; + +title: MediaTek Peripheral Configuration Controller + +maintainers: + - Bartosz Golaszewski + +description: + The Mediatek pericfg controller provides various clocks and reset outputs + to the system. 
+ +properties: + compatible: +oneOf: + - items: +- enum: + - mediatek,mt2701-pericfg + - mediatek,mt2712-pericfg + - mediatek,mt7622-pericfg + - mediatek,mt7629-pericfg + - mediatek,mt8135-pericfg + - mediatek,mt8173-pericfg + - mediatek,mt8183-pericfg +- const: syscon + - items: +# Special case for mt7623 for backward compatibility +- const: mediatek,mt7623-pericfg +- const: mediatek,mt2701-pericfg +- const: syscon + + reg: +maxItems: 1 + + '#clock-cells': +const: 1 + + '#reset-cells': +const: 1 + +required: + - compatible + - reg + +examples: + - | +pericfg@10003000 { +compatible = "mediatek,mt8173-pericfg", "syscon"; +reg = <0x10003000 0x1000>; +#clock-cells = <1>; +#reset-cells = <1>; +}; + + - | +pericfg@10003000 { +compatible = "mediatek,mt7623-pericfg", "mediatek,mt2701-pericfg", "syscon"; +reg = <0x10003000 0x1000>; +#clock-cells = <1>; +#reset-cells = <1>; +}; -- 2.25.0
Re: [bpf-next PATCH 2/3] bpf: sk_msg helpers for probe_* and *current_task*
On 5/13/20 9:24 PM, John Fastabend wrote: Often it is useful when applying policy to know something about the task. If the administrator has CAP_SYS_ADMIN rights then they can use kprobe + sk_msg and link the two programs together to accomplish this. However, this is a bit clunky and also means we have to call sk_msg program and kprobe program when we could just use a single program and avoid passing metadata through sk_msg/skb, socket, etc. To accomplish this add probe_* helpers to sk_msg programs guarded by a CAP_SYS_ADMIN check. New supported helpers are the following, BPF_FUNC_get_current_task BPF_FUNC_current_task_under_cgroup BPF_FUNC_probe_read_user BPF_FUNC_probe_read_kernel BPF_FUNC_probe_read BPF_FUNC_probe_read_user_str BPF_FUNC_probe_read_kernel_str BPF_FUNC_probe_read_str Given the current discussion in the other thread with Linus et al, please don't add more users for BPF_FUNC_probe_read and BPF_FUNC_probe_read_str as I'm cooking up a patch to disable them on non-x86, and cleanups from Christoph would make them less efficient than the *_user/_kernel{,_str}() versions anyway, so lets only add the latter. Thanks, Daniel
Re: [PATCH 7/9] bpf: Compile the BTF id whitelist data in vmlinux
On Wed, May 13, 2020 at 11:29:40AM -0700, Alexei Starovoitov wrote: SNIP > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh > > index d09ab4afbda4..dee91c6bf450 100755 > > --- a/scripts/link-vmlinux.sh > > +++ b/scripts/link-vmlinux.sh > > @@ -130,16 +130,26 @@ gen_btf() > > info "BTF" ${2} > > LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1} > > > > - # Create ${2} which contains just .BTF section but no symbols. Add > > + # Create object which contains just .BTF section but no symbols. Add > > # SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all > > # deletes all symbols including __start_BTF and __stop_BTF, which will > > # be redefined in the linker script. Add 2>/dev/null to suppress GNU > > # objcopy warnings: "empty loadable segment detected at ..." > > ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \ > > - --strip-all ${1} ${2} 2>/dev/null > > - # Change e_type to ET_REL so that it can be used to link final vmlinux. > > - # Unlike GNU ld, lld does not allow an ET_EXEC input. > > - printf '\1' | dd of=${2} conv=notrunc bs=1 seek=16 status=none > > + --strip-all ${1} 2>/dev/null > > + > > + # Create object that contains just .BTF_whitelist_* sections generated > > + # by bpfwl. Same as BTF section, BTF_whitelist_* data will be part of > > + # the vmlinux image, hence SHF_ALLOC. > > + whitelist=.btf.vmlinux.whitelist > > + > > + ${BPFWL} ${1} kernel/bpf/helpers-whitelist > ${whitelist}.c > > + ${CC} -c -o ${whitelist}.o ${whitelist}.c > > + ${OBJCOPY} --only-section=.BTF_whitelist* --set-section-flags > > .BTF=alloc,readonly \ > > +--strip-all ${whitelist}.o 2>/dev/null > > + > > + # Link BTF and BTF_whitelist objects together > > + ${LD} -r -o ${2} ${1} ${whitelist}.o > Thank you for working on it! > Looks great to me overall. In the next rev please drop RFC tag. > > My only concern is this extra linking step. How many extra seconds does it add? I did not measure it, but I haven't noticed any noticeable delay; I'll add measurements to the next post. > > Also in patch 3: > + func = func__find(str); > + if (func) > + func->id = id; > which means that if somebody mistyped the name or that kernel function > got renamed there will be no warnings or errors. > I think it needs to fail the build instead. It fails later on, when generating the array: if (!func->id) { fprintf(stderr, "FAILED: '%s' function not found in BTF data\n", func->name); return -1; } but it can clearly fail before that. I'll change that. > > If additional linking step takes another 20 seconds it could be a reason > to move the search to run-time. > We already have that with struct bpf_func_proto->btf_id[]. > Whitelist could be something similar. > I think this mechanism will be reused for unstable helpers and other > func->btf_id mappings, so 'bpfwl' name would change eventually. > It's not white list specific. It generates a mapping of names to btf_ids. > Doing it at build time vs run-time is a trade off and it doesn't have > an obvious answer. I was thinking of putting the names in an __init section and generating the BTF ids at kernel start, but the build-time generation seemed more convenient. Let's see the linking times with a 'real size' whitelist and we can reconsider. thanks, jirka
RE: [PATCH 27/33] sctp: export sctp_setsockopt_bindx
From: Marcelo Ricardo Leitner > Sent: 13 May 2020 19:01 > On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote: > > And call it directly from dlm instead of going through kernel_setsockopt. > > The advantage on using kernel_setsockopt here is that sctp module will > only be loaded if dlm actually creates a SCTP socket. With this > change, sctp will be loaded on setups that may not be actually using > it. It's a quite big module and might expose the system. > > I'm okay with the SCTP changes, but I'll defer to DLM folks to whether > that's too bad or what for DLM. I didn't see these sneak through. There is a big long list of SCTP socket options that are needed to make anything work. They all need exporting. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [EXT] Re: signal quality and cable diagnostic
Hi Christian, On Thu, May 14, 2020 at 07:13:30AM +, Christian Herber wrote: > On Tue, May 12, 2020 at 10:22:01AM +0200, Oleksij Rempel wrote: > > > So I think we should pass raw SQI value to user space, at least in the > > > first implementation. > > > What do you think about this? > > Hi Oleksij, > > I had a look at the background of this SQI metric. The table you reference > with concrete SNR values is informative only and not a requirement. The > requirements are rather loose. > > This is from OA: > - Only for SQI=0 shall a link loss occur. > - The indicated signal quality shall increase/decrease monotonically with > the noise level. > - It shall be indicated in the datasheet at which level a BER<10^-10 (better > than 10^-10) is achieved (e.g. "from SQI=3 to SQI=7 the link has a BER<10^-10 > (better than 10^-10)") > > I.e. SQI does not need to have a direct correlation with SNR. The fundamental > underlying metric is the BER. > You can report the raw SQI level and users would have to look up what it > means in the respective data sheet. There is no guaranteed relation between > SQI levels of different devices, i.e. SQI 5 can have a lower BER than SQI 6 on > another device. > Alternatively, you could report BER < x for the different SQI levels. > However, this requires the information to be available. While I could provide > these values for NXP, they might not be easily available for other vendors. > If reporting raw SQI, at least the SQI level for BER<10^-10 should be > presented to give any meaning to the value. So the question is, which values to provide via KAPI to user space? - SQI The PHY can probably measure the SNR quite fast and has some internal function or lookup table to derive the SQI from the measured SNR. If I understand you correctly, we can only compare SQI values of the same PHY, as different PHYs give different SQIs for the same link characteristics (=SNR). - SNR range We read the SQI from the PHY, look up the SNR range for that value in the data sheet, and provide that range to user space. This gives a better description of the quality of the link. - "guesstimated" BER The manufacturer of the PHY has probably done some extensive testing that a measured SNR can be correlated to some BER. This value may be provided in the data sheet, too. The SNR seems to be the most universal value when it comes to comparing different situations (different links and different PHYs). The resolution of BER is not that detailed; for the NXP PHY it says only "BER below 1e-10" or not. > While I could provide these values for NXP, they might not be easily available > for other vendors. It would be great if you could provide this information. It may force other vendors to do the same :) The actual procedure to measure the BER is the following testing strategy suggested by opensig[1]: Procedure: 1. Configure the DUT as MASTER. 2. Connect the packet monitoring station to the automotive cable. 3. Connect the DUT to the automotive cable. 4. Send 2,470,000 1,518-byte packets (for a 10^-10 BER) and the monitor will count the number of packet errors. 5. Repeat step 4 for the remaining automotive cables. 6. Repeat steps 4-5 with the DUT configured as SLAVE. [1] http://www.opensig.org/download/document/225/Open_Alliance_100BASE-T1_PMA_Test_Suite_v1.0-dec.pdf Regards, Oleksij & Marc -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0| Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- |
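The "SNR range" option above effectively means shipping a small per-PHY translation table taken from the datasheet. Purely as a hypothetical illustration (the values, structure and names below are invented and do not come from any datasheet or existing kernel interface):

    /* Hypothetical per-PHY table translating a raw SQI reading into the
     * datasheet's SNR range and BER statement; all numbers are made up.
     */
    struct foo_phy_sqi_entry {
            u8 sqi;                 /* raw value read from the PHY */
            u8 snr_min_db;          /* lower bound of the SNR range in dB */
            u8 snr_max_db;          /* upper bound of the SNR range in dB */
            bool ber_below_1e10;    /* datasheet states BER < 10^-10 */
    };

    static const struct foo_phy_sqi_entry foo_phy_sqi_table[] = {
            { .sqi = 0, .snr_min_db =  0, .snr_max_db =  9, .ber_below_1e10 = false },
            { .sqi = 1, .snr_min_db = 10, .snr_max_db = 12, .ber_below_1e10 = false },
            { .sqi = 2, .snr_min_db = 13, .snr_max_db = 15, .ber_below_1e10 = false },
            { .sqi = 3, .snr_min_db = 16, .snr_max_db = 18, .ber_below_1e10 = true  },
            /* ... entries up to SQI = 7 ... */
    };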
RE: remove kernel_setsockopt and kernel_getsockopt
From: Joe Perches > Sent: 13 May 2020 18:39 > On Wed, 2020-05-13 at 08:26 +0200, Christoph Hellwig wrote: > > this series removes the kernel_setsockopt and kernel_getsockopt > > functions, and instead switches their users to small functions that > > implement setting (or in one case getting) a sockopt directly using > > a normal kernel function call with type safety and all the other > > benefits of not having a function call. > > > > In some cases these functions seem pretty heavy handed as they do > > a lock_sock even for just setting a single variable, but this mirrors > > the real setsockopt implementation - counter to that a few kernel > > drivers just set the fields directly already. > > > > Nevertheless the diffstat looks quite promising: > > > > 42 files changed, 721 insertions(+), 799 deletions(-) I missed this patch going through. Massive NACK. You need to export functions that do most of the socket options for all protocols. As well as REUSADDR and NODELAY SCTP has loads because a lot of stuff that should have been extra system calls got piled into setsockopt. An alternate solution would be to move the copy_to/from_user() into a wrapper function so that the kernel_[sg]etsockopt() functions would bypass them completely. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
[PATCH bpf-next v2 00/14] Introduce AF_XDP buffer allocation API
Overview
========

Driver adoption for AF_XDP has been slow. The amount of code required to properly support AF_XDP is substantial, and the driver/core APIs are vague or even non-existent. Drivers have to manually adjust data offsets and update AF_XDP handles differently for the different modes (aligned/unaligned).

This series attempts to improve the situation by introducing an AF_XDP buffer allocation API. The implementation is based on a single-core (single producer/consumer) buffer pool for the AF_XDP UMEM.

A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated from the pool, e.g. when a buffer is passed to an AF_XDP socket, the buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and is not visible to the driver.

Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type, which will supersede the MEM_TYPE_ZERO_COPY type.

The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames. DMA mapping/syncing is folded into the buffer handling as well.

@JeffK The Intel driver changes should go through the bpf-next tree, and not your regular Intel tree, since multiple (non-Intel) drivers are affected.

The outline of the series is as follows:

Patches 1 to 3 are restructures/cleanups. The XSKMAP implementation is moved to net/xdp/. Functions/defines/enums that are only used by the AF_XDP internals are moved from the global include/net/xdp_sock.h to net/xdp/xsk.h. We are also introducing a new "driver include file", include/net/xdp_sock_drv.h, which is the only file NIC driver developers adding AF_XDP zero-copy support should care about.

Patch 4 adds the new API, and migrates the "copy-mode"/skb-mode AF_XDP path to the new API.

Patches 5 to 10 migrate the existing zero-copy drivers to the new API.

Patch 11 removes the MEM_TYPE_ZERO_COPY memory type, and the "handle" member of struct xdp_buff.

Patch 12 simplifies the xdp_return_{frame,frame_rx_napi,buff} functions.

Patch 13 is a performance patch, where some functions are inlined.

Finally, patch 14 updates the MAINTAINERS file to correctly mirror the new file layout.

Note that this series removes the "handle" member from struct xdp_buff, which reduces the xdp_buff size.

After this series, the diff stat of drivers/net/ is:

  27 files changed, 388 insertions(+), 1259 deletions(-)

This series is a first step towards simplifying the driver side of AF_XDP. I think more of the AF_XDP logic can be moved from the drivers to the AF_XDP core, e.g. the "need wakeup" set/clear functionality. Statistics for allocation failures can now be added to the socket statistics via the XDP_STATISTICS getsockopt(); this will be added in a follow-up series.

Performance
===========

As a nice side effect, performance is up a bit as well (40 GbE, 64B packets, i40e):

  rxdrop, zero-copy, aligned:   baseline: 20.4  new API: 21.3
  rxdrop, zero-copy, unaligned: baseline: 19.5  new API: 21.2

Changelog
=========

v1->v2:
  * mlx5: Fix DMA address handling, set XDP metadata to invalid. (Maxim)
  * ixgbe: Fixed xdp_buff data_end update. (Björn)
  * Swapped SoBs in patch 4. (Maxim)

rfc->v1:
  * Fixed build errors/warnings for m68k and riscv. (kbuild test robot)
  * Added headroom/chunk size getter. (Maxim/Björn)
  * mlx5: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim)
  * Fixed spelling in commit message. (Björn)
  * Make sure xp_validate_desc() is inlined for Tx perf. (Maxim)
  * Sorted file entries. (Joe)
  * Added xdp_return_{frame,frame_rx_napi,buff} simplification. (Björn)

Thanks for all the comments/input/help!

Cheers,
Björn

Björn Töpel (13):
  xsk: move xskmap.c to net/xdp/
  xsk: move defines only used by AF_XDP internals to xsk.h
  xsk: introduce AF_XDP buffer allocation API
  i40e: refactor rx_bi accesses
  i40e: separate kernel allocated rx_bi rings from AF_XDP rings
  i40e, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
  ice, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
  ixgbe, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
  mlx5, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
  xsk: remove MEM_TYPE_ZERO_COPY and corresponding code
  xdp: simplify xdp_return_{frame,frame_rx_napi,buff}
  xsk: explicitly inline functions and move definitions
  MAINTAINERS, xsk: update AF_XDP section after moves/adds

Magnus Karlsson (1):
  xsk: move driver interface to xdp_sock_drv.h

 MAINTAINERS                                   |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  28 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 134 +++
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  17 +-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  40 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h   |   5 +-
 drivers/net/
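For readers new to the series, here is a minimal, illustrative sketch of the Rx-ring setup a zero-copy driver ends up with under the new memory type, distilled from the i40e/ice conversions later in the series. The my_ring structure and function name are hypothetical stand-ins; only the xsk_*/xdp_* calls come from this series.

  #include <net/xdp_sock_drv.h>

  /* Sketch only: register the new memory model and size the Rx buffers
   * from the UMEM, as the i40e/ice patches below do. 'struct my_ring' is
   * a made-up driver ring structure, not a real one.
   */
  static int my_ring_setup_zc(struct my_ring *ring, struct xdp_umem *umem)
  {
          int err;

          /* Frame size the pool hands out: chunk size minus headroom. */
          ring->rx_buf_len = xsk_umem_get_rx_frame_size(umem);

          err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
                                           MEM_TYPE_XSK_BUFF_POOL, NULL);
          if (err)
                  return err;

          /* Tell the buffer pool which rxq its xdp_buffs belong to. */
          xsk_buff_set_rxq_info(umem, &ring->xdp_rxq);
          return 0;
  }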
[PATCH bpf-next v2 01/14] xsk: move xskmap.c to net/xdp/
From: Björn Töpel The XSKMAP is partly implemented by net/xdp/xsk.c. Move xskmap.c from kernel/bpf/ to net/xdp/, which is the logical place for AF_XDP related code. Also, move AF_XDP struct definitions, and function declarations only used by AF_XDP internals into net/xdp/xsk.h. Signed-off-by: Björn Töpel --- include/net/xdp_sock.h | 20 kernel/bpf/Makefile | 3 --- net/xdp/Makefile | 2 +- net/xdp/xsk.h| 16 {kernel/bpf => net/xdp}/xskmap.c | 2 ++ 5 files changed, 19 insertions(+), 24 deletions(-) rename {kernel/bpf => net/xdp}/xskmap.c (99%) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 67191ccaab85..a26d6c80e43d 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -65,22 +65,12 @@ struct xdp_umem { struct list_head xsk_tx_list; }; -/* Nodes are linked in the struct xdp_sock map_list field, and used to - * track which maps a certain socket reside in. - */ - struct xsk_map { struct bpf_map map; spinlock_t lock; /* Synchronize map updates */ struct xdp_sock *xsk_map[]; }; -struct xsk_map_node { - struct list_head node; - struct xsk_map *map; - struct xdp_sock **map_entry; -}; - struct xdp_sock { /* struct sock must be the first member of struct xdp_sock */ struct sock sk; @@ -114,7 +104,6 @@ struct xdp_sock { struct xdp_buff; #ifdef CONFIG_XDP_SOCKETS int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); -bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs); /* Used from netdev driver */ bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt); bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr); @@ -133,10 +122,6 @@ void xsk_clear_rx_need_wakeup(struct xdp_umem *umem); void xsk_clear_tx_need_wakeup(struct xdp_umem *umem); bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem); -void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs, -struct xdp_sock **map_entry); -int xsk_map_inc(struct xsk_map *map); -void xsk_map_put(struct xsk_map *map); int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp); void __xsk_map_flush(void); @@ -242,11 +227,6 @@ static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) return -ENOTSUPP; } -static inline bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) -{ - return false; -} - static inline bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt) { return false; diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 37b2d8620153..375b933010dd 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -12,9 +12,6 @@ obj-$(CONFIG_BPF_JIT) += dispatcher.o ifeq ($(CONFIG_NET),y) obj-$(CONFIG_BPF_SYSCALL) += devmap.o obj-$(CONFIG_BPF_SYSCALL) += cpumap.o -ifeq ($(CONFIG_XDP_SOCKETS),y) -obj-$(CONFIG_BPF_SYSCALL) += xskmap.o -endif obj-$(CONFIG_BPF_SYSCALL) += offload.o endif ifeq ($(CONFIG_PERF_EVENTS),y) diff --git a/net/xdp/Makefile b/net/xdp/Makefile index 71e2bdafb2ce..90b5460d6166 100644 --- a/net/xdp/Makefile +++ b/net/xdp/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o xsk_queue.o +obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o xsk_queue.o xskmap.o obj-$(CONFIG_XDP_SOCKETS_DIAG) += xsk_diag.o diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h index 4cfd106bdb53..d6a0979050e6 100644 --- a/net/xdp/xsk.h +++ b/net/xdp/xsk.h @@ -17,9 +17,25 @@ struct xdp_mmap_offsets_v1 { struct xdp_ring_offset_v1 cr; }; +/* Nodes are linked in the struct xdp_sock map_list field, and used to + * track which maps a certain socket reside in. 
+ */ + +struct xsk_map_node { + struct list_head node; + struct xsk_map *map; + struct xdp_sock **map_entry; +}; + static inline struct xdp_sock *xdp_sk(struct sock *sk) { return (struct xdp_sock *)sk; } +bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs); +void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs, +struct xdp_sock **map_entry); +int xsk_map_inc(struct xsk_map *map); +void xsk_map_put(struct xsk_map *map); + #endif /* XSK_H_ */ diff --git a/kernel/bpf/xskmap.c b/net/xdp/xskmap.c similarity index 99% rename from kernel/bpf/xskmap.c rename to net/xdp/xskmap.c index 2cc5c8f4c800..1dc7208c71ba 100644 --- a/kernel/bpf/xskmap.c +++ b/net/xdp/xskmap.c @@ -9,6 +9,8 @@ #include #include +#include "xsk.h" + int xsk_map_inc(struct xsk_map *map) { bpf_map_inc(&map->map); -- 2.25.1
[PATCH bpf-next v2 03/14] xsk: move defines only used by AF_XDP internals to xsk.h
From: Björn Töpel Move the XSK_NEXT_PG_CONTIG_{MASK,SHIFT}, and XDP_UMEM_USES_NEED_WAKEUP defines from xdp_sock.h to the AF_XDP internal xsk.h file. Also, start using the BIT{,_ULL} macro instead of explicit shifts. Signed-off-by: Björn Töpel --- include/net/xdp_sock.h | 14 -- net/xdp/xsk.h | 14 ++ net/xdp/xsk_queue.h| 2 ++ 3 files changed, 16 insertions(+), 14 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 6a986dcbc336..fb7fe3060175 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -17,13 +17,6 @@ struct net_device; struct xsk_queue; struct xdp_buff; -/* Masks for xdp_umem_page flags. - * The low 12-bits of the addr will be 0 since this is the page address, so we - * can use them for flags. - */ -#define XSK_NEXT_PG_CONTIG_SHIFT 0 -#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT) - struct xdp_umem_page { void *addr; dma_addr_t dma; @@ -35,13 +28,6 @@ struct xdp_umem_fq_reuse { u64 handles[]; }; -/* Flags for the umem flags field. - * - * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public - * flags. See inlude/uapi/include/linux/if_xdp.h. - */ -#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1) - struct xdp_umem { struct xsk_queue *fq; struct xsk_queue *cq; diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h index d6a0979050e6..455ddd480f3d 100644 --- a/net/xdp/xsk.h +++ b/net/xdp/xsk.h @@ -4,6 +4,20 @@ #ifndef XSK_H_ #define XSK_H_ +/* Masks for xdp_umem_page flags. + * The low 12-bits of the addr will be 0 since this is the page address, so we + * can use them for flags. + */ +#define XSK_NEXT_PG_CONTIG_SHIFT 0 +#define XSK_NEXT_PG_CONTIG_MASK BIT_ULL(XSK_NEXT_PG_CONTIG_SHIFT) + +/* Flags for the umem flags field. + * + * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public + * flags. See inlude/uapi/include/linux/if_xdp.h. + */ +#define XDP_UMEM_USES_NEED_WAKEUP BIT(1) + struct xdp_ring_offset_v1 { __u64 producer; __u64 consumer; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 648733ec24ac..a322a7dac58c 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -10,6 +10,8 @@ #include #include +#include "xsk.h" + struct xdp_ring { u32 producer cacheline_aligned_in_smp; u32 consumer cacheline_aligned_in_smp; -- 2.25.1
[PATCH bpf-next v2 02/14] xsk: move driver interface to xdp_sock_drv.h
From: Magnus Karlsson Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it more clear for NIC driver implementors to know what functions to use for zero-copy support. Signed-off-by: Magnus Karlsson --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c| 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 2 +- .../ethernet/mellanox/mlx5/core/en/xsk/rx.h | 2 +- .../ethernet/mellanox/mlx5/core/en/xsk/tx.h | 2 +- .../ethernet/mellanox/mlx5/core/en/xsk/umem.c | 2 +- include/net/xdp_sock.h| 203 + include/net/xdp_sock_drv.h| 207 ++ net/ethtool/channels.c| 2 +- net/ethtool/ioctl.c | 2 +- net/xdp/xdp_umem.h| 2 +- net/xdp/xsk.c | 2 +- 14 files changed, 227 insertions(+), 207 deletions(-) create mode 100644 include/net/xdp_sock_drv.h diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 2a037ec244b9..d6b2db4f2c65 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -11,7 +11,7 @@ #include "i40e_diag.h" #include "i40e_xsk.h" #include -#include +#include /* All i40e tracepoints are defined by the include below, which * must be included exactly once across the whole kernel with * CREATE_TRACE_POINTS defined diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 0b7d29192b2c..452bba7bc4ff 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -2,7 +2,7 @@ /* Copyright(c) 2018 Intel Corporation. */ #include -#include +#include #include #include "i40e.h" diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 8279db15e870..955b0fbb7c9a 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -2,7 +2,7 @@ /* Copyright (c) 2019, Intel Corporation. */ #include -#include +#include #include #include "ice.h" #include "ice_base.h" diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 74b540ebb3dc..5b6edbd8a4ed 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -2,7 +2,7 @@ /* Copyright(c) 2018 Intel Corporation. 
*/ #include -#include +#include #include #include "ixgbe.h" diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index c4a7fb4ecd14..b04b99396f65 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -31,7 +31,7 @@ */ #include -#include +#include #include "en/xdp.h" #include "en/params.h" diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h index cab0e93497ae..a8e11adbf426 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h @@ -5,7 +5,7 @@ #define __MLX5_EN_XSK_RX_H__ #include "en.h" -#include +#include /* RX data path */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h index 79b487d89757..39fa0a705856 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h @@ -5,7 +5,7 @@ #define __MLX5_EN_XSK_TX_H__ #include "en.h" -#include +#include /* TX data path */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c index 4baaa5788320..5e49fdb564b3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB /* Copyright (c) 2019 Mellanox Technologies. */ -#include +#include #include "umem.h" #include "setup.h" #include "en/params.h" diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index a26d6c80e43d..6a986dcbc336 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -15,6 +15,7 @@ struct net_device; struct xsk_queue; +struct xdp_buff; /* Masks for xdp_umem_page flags. * The low 12-bits of the addr will be 0 since this is the page address, so we @@ -101,27 +102,9 @@ struct xdp_sock { spinlock_t map_list_lock; }; -struct xdp_buff; #ifdef CONFIG_XDP_SOCKETS -int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); -/* Used
[PATCH bpf-next v2 07/14] i40e, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
From: Björn Töpel Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. The AF_XDP zero-copy rx_bi ring is now simply a struct xdp_buff pointer. Cc: intel-wired-...@lists.osuosl.org Signed-off-by: Björn Töpel --- drivers/net/ethernet/intel/i40e/i40e_main.c | 19 +- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 9 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 350 ++-- drivers/net/ethernet/intel/i40e/i40e_xsk.h | 1 - 4 files changed, 47 insertions(+), 332 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 3e1695bb8262..ea7395b391e5 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3266,21 +3266,19 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) ret = i40e_alloc_rx_bi_zc(ring); if (ret) return ret; - ring->rx_buf_len = ring->xsk_umem->chunk_size_nohr - - XDP_PACKET_HEADROOM; + ring->rx_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. */ chain_len = 1; - ring->zca.free = i40e_zca_free; ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, -MEM_TYPE_ZERO_COPY, -&ring->zca); +MEM_TYPE_XSK_BUFF_POOL, +NULL); if (ret) return ret; dev_info(&vsi->back->pdev->dev, -"Registered XDP mem model MEM_TYPE_ZERO_COPY on Rx ring %d\n", +"Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->queue_index); } else { @@ -3351,9 +3349,12 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q); writel(0, ring->tail); - ok = ring->xsk_umem ? -i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring)) : -!i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring)); + if (ring->xsk_umem) { + xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); + ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring)); + } else { + ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring)); + } if (!ok) { /* Log this in case the user has forgotten to give the kernel * any buffers, even later in the application. 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index d343498e8de5..5c255977fd58 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -301,12 +301,6 @@ struct i40e_rx_buffer { __u16 pagecnt_bias; }; -struct i40e_rx_buffer_zc { - dma_addr_t dma; - void *addr; - u64 handle; -}; - struct i40e_queue_stats { u64 packets; u64 bytes; @@ -356,7 +350,7 @@ struct i40e_ring { union { struct i40e_tx_buffer *tx_bi; struct i40e_rx_buffer *rx_bi; - struct i40e_rx_buffer_zc *rx_bi_zc; + struct xdp_buff **rx_bi_zc; }; DECLARE_BITMAP(state, __I40E_RING_STATE_NBITS); u16 queue_index;/* Queue number of ring */ @@ -418,7 +412,6 @@ struct i40e_ring { struct i40e_channel *ch; struct xdp_rxq_info xdp_rxq; struct xdp_umem *xsk_umem; - struct zero_copy_allocator zca; /* ZC allocator anchor */ } cacheline_internodealigned_in_smp; static inline bool ring_uses_build_skb(struct i40e_ring *ring) diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 4fce057f1eec..460f5052e1db 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -23,68 +23,11 @@ void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring) sizeof(*rx_ring->rx_bi_zc) * rx_ring->count); } -static struct i40e_rx_buffer_zc *i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx) +static struct xdp_buff **i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx) { return &rx_ring->rx_bi_zc[idx]; } -/** - * i40e_xsk_umem_dma_map - DMA maps all UMEM memory for the netdev - * @vsi: Current VSI - * @umem: UMEM to DMA map - * - * Returns 0 on success, <0 on failure - **/ -static int i40e_xsk_umem_dma_map(struct i40e_vsi *vsi, struct xdp_umem *umem) -{ - struct i40e_pf *pf = vsi->back; - struct device *dev; - unsigned i
[PATCH bpf-next v2 06/14] i40e: separate kernel allocated rx_bi rings from AF_XDP rings
From: Björn Töpel Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP zero-copy/sk_buff rx_bi rings are now separate. Functions to properly allocate the different rings are added as well. Cc: intel-wired-...@lists.osuosl.org Signed-off-by: Björn Töpel --- drivers/net/ethernet/intel/i40e/i40e_main.c | 7 ++ drivers/net/ethernet/intel/i40e/i40e_txrx.c | 119 +++--- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 22 ++-- .../ethernet/intel/i40e/i40e_txrx_common.h| 40 +- drivers/net/ethernet/intel/i40e/i40e_type.h | 5 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c| 74 ++- drivers/net/ethernet/intel/i40e/i40e_xsk.h| 2 + 7 files changed, 142 insertions(+), 127 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index d6b2db4f2c65..3e1695bb8262 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3260,8 +3260,12 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) if (ring->vsi->type == I40E_VSI_MAIN) xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); + kfree(ring->rx_bi); ring->xsk_umem = i40e_xsk_umem(ring); if (ring->xsk_umem) { + ret = i40e_alloc_rx_bi_zc(ring); + if (ret) + return ret; ring->rx_buf_len = ring->xsk_umem->chunk_size_nohr - XDP_PACKET_HEADROOM; /* For AF_XDP ZC, we disallow packets to span on @@ -3280,6 +3284,9 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) ring->queue_index); } else { + ret = i40e_alloc_rx_bi(ring); + if (ret) + return ret; ring->rx_buf_len = vsi->rx_buf_len; if (ring->vsi->type == I40E_VSI_MAIN) { ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 58daba8fabc8..f063df623443 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -521,28 +521,29 @@ int i40e_add_del_fdir(struct i40e_vsi *vsi, /** * i40e_fd_handle_status - check the Programming Status for FD * @rx_ring: the Rx ring for this descriptor - * @rx_desc: the Rx descriptor for programming Status, not a packet descriptor. + * @qword0_raw: qword0 + * @qword1: qword1 after le_to_cpu * @prog_id: the id originally used for programming * * This is used to verify if the FD programming or invalidation * requested by SW to the HW is successful or not and take actions accordingly. 
**/ -void i40e_fd_handle_status(struct i40e_ring *rx_ring, - union i40e_rx_desc *rx_desc, u8 prog_id) +void i40e_fd_handle_status(struct i40e_ring *rx_ring, u64 qword0_raw, + u64 qword1, u8 prog_id) { struct i40e_pf *pf = rx_ring->vsi->back; struct pci_dev *pdev = pf->pdev; + struct i40e_32b_rx_wb_qw0 *qw0; u32 fcnt_prog, fcnt_avail; u32 error; - u64 qw; - qw = le64_to_cpu(rx_desc->wb.qword1.status_error_len); - error = (qw & I40E_RX_PROG_STATUS_DESC_QW1_ERROR_MASK) >> + qw0 = (struct i40e_32b_rx_wb_qw0 *)&qword0_raw; + error = (qword1 & I40E_RX_PROG_STATUS_DESC_QW1_ERROR_MASK) >> I40E_RX_PROG_STATUS_DESC_QW1_ERROR_SHIFT; if (error == BIT(I40E_RX_PROG_STATUS_DESC_FD_TBL_FULL_SHIFT)) { - pf->fd_inv = le32_to_cpu(rx_desc->wb.qword0.hi_dword.fd_id); - if ((rx_desc->wb.qword0.hi_dword.fd_id != 0) || + pf->fd_inv = le32_to_cpu(qw0->hi_dword.fd_id); + if (qw0->hi_dword.fd_id != 0 || (I40E_DEBUG_FD & pf->hw.debug_mask)) dev_warn(&pdev->dev, "ntuple filter loc = %d, could not be added\n", pf->fd_inv); @@ -560,7 +561,7 @@ void i40e_fd_handle_status(struct i40e_ring *rx_ring, /* store the current atr filter count */ pf->fd_atr_cnt = i40e_get_current_atr_cnt(pf); - if ((rx_desc->wb.qword0.hi_dword.fd_id == 0) && + if (qw0->hi_dword.fd_id == 0 && test_bit(__I40E_FD_SB_AUTO_DISABLED, pf->state)) { /* These set_bit() calls aren't atomic with the * test_bit() here, but worse case we potentially @@ -589,7 +590,7 @@ void i40e_fd_handle_status(struct i40e_ring *rx_ring, } else if (error == BIT(I40E_RX_PROG_STATUS_DESC_NO_FD_ENTRY_SHIFT)) { if (I40E_DEBUG_FD & pf->hw.debug_mask) dev_info(&pdev->dev, "ntuple filter fd_id = %d, could not be removed\n", -rx_desc->wb.qword0.hi_dword.fd_id); +
[PATCH bpf-next v2 08/14] ice, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
From: Björn Töpel Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. Cc: intel-wired-...@lists.osuosl.org Signed-off-by: Maciej Fijalkowski Signed-off-by: Björn Töpel --- drivers/net/ethernet/intel/ice/ice_base.c | 16 +- drivers/net/ethernet/intel/ice/ice_txrx.h | 8 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 372 +++--- drivers/net/ethernet/intel/ice/ice_xsk.h | 13 +- 4 files changed, 54 insertions(+), 355 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index a19cd6f5436b..433eb72b1c85 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2019, Intel Corporation. */ +#include #include "ice_base.h" #include "ice_dcb_lib.h" @@ -308,24 +309,23 @@ int ice_setup_rx_ctx(struct ice_ring *ring) if (ring->xsk_umem) { xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); - ring->rx_buf_len = ring->xsk_umem->chunk_size_nohr - - XDP_PACKET_HEADROOM; + ring->rx_buf_len = + xsk_umem_get_rx_frame_size(ring->xsk_umem); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. */ chain_len = 1; - ring->zca.free = ice_zca_free; err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, -MEM_TYPE_ZERO_COPY, -&ring->zca); +MEM_TYPE_XSK_BUFF_POOL, +NULL); if (err) return err; + xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); - dev_info(ice_pf_to_dev(vsi->back), "Registered XDP mem model MEM_TYPE_ZERO_COPY on Rx ring %d\n", + dev_info(ice_pf_to_dev(vsi->back), "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->q_index); } else { - ring->zca.free = NULL; if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) /* coverity[check_return] */ xdp_rxq_info_reg(&ring->xdp_rxq, @@ -426,7 +426,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) writel(0, ring->tail); err = ring->xsk_umem ? 
- ice_alloc_rx_bufs_slow_zc(ring, ICE_DESC_UNUSED(ring)) : + ice_alloc_rx_bufs_zc(ring, ICE_DESC_UNUSED(ring)) : ice_alloc_rx_bufs(ring, ICE_DESC_UNUSED(ring)); if (err) dev_info(ice_pf_to_dev(vsi->back), "Failed allocate some buffers on %sRx ring %d (pf_q %d)\n", diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index 7ee00a128663..d0fd2173854f 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -155,17 +155,16 @@ struct ice_tx_offload_params { }; struct ice_rx_buf { - struct sk_buff *skb; - dma_addr_t dma; union { struct { + struct sk_buff *skb; + dma_addr_t dma; struct page *page; unsigned int page_offset; u16 pagecnt_bias; }; struct { - void *addr; - u64 handle; + struct xdp_buff *xdp; }; }; }; @@ -289,7 +288,6 @@ struct ice_ring { struct rcu_head rcu;/* to avoid race on free */ struct bpf_prog *xdp_prog; struct xdp_umem *xsk_umem; - struct zero_copy_allocator zca; /* CL3 - 3rd cacheline starts here */ struct xdp_rxq_info xdp_rxq; /* CLX - the below items are only accessed infrequently and should be diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 955b0fbb7c9a..da89589c3137 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -279,28 +279,6 @@ static int ice_xsk_alloc_umems(struct ice_vsi *vsi) return 0; } -/** - * ice_xsk_add_umem - add a UMEM region for XDP sockets - * @vsi: VSI to which the UMEM will be added - * @umem: pointer to a requested UMEM region - * @qid: queue ID - * - * Returns 0 on success, negative on error - */ -static int ice_xsk_add_umem(struct ice_vsi *vsi, struct xdp_um
[PATCH bpf-next v2 05/14] i40e: refactor rx_bi accesses
From: Björn Töpel As a first step to migrate i40e to the new MEM_TYPE_XSK_BUFF_POOL APIs, code that accesses the rx_bi (SW/shadow ring) is refactored to use an accessor function. Cc: intel-wired-...@lists.osuosl.org Signed-off-by: Björn Töpel --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 17 +++-- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 18 -- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index b8496037ef7f..58daba8fabc8 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1195,6 +1195,11 @@ static void i40e_update_itr(struct i40e_q_vector *q_vector, rc->total_packets = 0; } +static struct i40e_rx_buffer *i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx) +{ + return &rx_ring->rx_bi[idx]; +} + /** * i40e_reuse_rx_page - page flip buffer and store it back on the ring * @rx_ring: rx descriptor ring to store buffers on @@ -1208,7 +1213,7 @@ static void i40e_reuse_rx_page(struct i40e_ring *rx_ring, struct i40e_rx_buffer *new_buff; u16 nta = rx_ring->next_to_alloc; - new_buff = &rx_ring->rx_bi[nta]; + new_buff = i40e_rx_bi(rx_ring, nta); /* update, and store next to alloc */ nta++; @@ -1272,7 +1277,7 @@ struct i40e_rx_buffer *i40e_clean_programming_status( ntc = rx_ring->next_to_clean; /* fetch, update, and store next to clean */ - rx_buffer = &rx_ring->rx_bi[ntc++]; + rx_buffer = i40e_rx_bi(rx_ring, ntc++); ntc = (ntc < rx_ring->count) ? ntc : 0; rx_ring->next_to_clean = ntc; @@ -1361,7 +1366,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring) /* Free all the Rx ring sk_buffs */ for (i = 0; i < rx_ring->count; i++) { - struct i40e_rx_buffer *rx_bi = &rx_ring->rx_bi[i]; + struct i40e_rx_buffer *rx_bi = i40e_rx_bi(rx_ring, i); if (!rx_bi->page) continue; @@ -1576,7 +1581,7 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count) return false; rx_desc = I40E_RX_DESC(rx_ring, ntu); - bi = &rx_ring->rx_bi[ntu]; + bi = i40e_rx_bi(rx_ring, ntu); do { if (!i40e_alloc_mapped_page(rx_ring, bi)) @@ -1598,7 +1603,7 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count) ntu++; if (unlikely(ntu == rx_ring->count)) { rx_desc = I40E_RX_DESC(rx_ring, 0); - bi = rx_ring->rx_bi; + bi = i40e_rx_bi(rx_ring, 0); ntu = 0; } @@ -1965,7 +1970,7 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring, { struct i40e_rx_buffer *rx_buffer; - rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean]; + rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); prefetchw(rx_buffer->page); /* we are reusing so sync this buffer for CPU use */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 452bba7bc4ff..8d29477bb0b6 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -9,6 +9,11 @@ #include "i40e_txrx_common.h" #include "i40e_xsk.h" +static struct i40e_rx_buffer *i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx) +{ + return &rx_ring->rx_bi[idx]; +} + /** * i40e_xsk_umem_dma_map - DMA maps all UMEM memory for the netdev * @vsi: Current VSI @@ -321,7 +326,7 @@ __i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count, bool ok = true; rx_desc = I40E_RX_DESC(rx_ring, ntu); - bi = &rx_ring->rx_bi[ntu]; + bi = i40e_rx_bi(rx_ring, ntu); do { if (!alloc(rx_ring, bi)) { ok = false; @@ -340,7 +345,7 @@ __i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count, if (unlikely(ntu == 
rx_ring->count)) { rx_desc = I40E_RX_DESC(rx_ring, 0); - bi = rx_ring->rx_bi; + bi = i40e_rx_bi(rx_ring, 0); ntu = 0; } @@ -402,7 +407,7 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer_zc(struct i40e_ring *rx_ring, { struct i40e_rx_buffer *bi; - bi = &rx_ring->rx_bi[rx_ring->next_to_clean]; + bi = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); /* we are reusing so sync this buffer for CPU use */ dma_sync_single_range_for_cpu(rx_ring->dev, @@ -424,7 +429,8 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer_zc(struct i40e_ring *rx_ring, static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring, struct i40e_rx_buffer *old_bi) { - struct i4
[PATCH bpf-next v2 09/14] ixgbe, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
From: Björn Töpel Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. v1->v2: Fixed xdp_buff data_end update. (Björn) Cc: intel-wired-...@lists.osuosl.org Signed-off-by: Björn Töpel --- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 9 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 15 +- .../ethernet/intel/ixgbe/ixgbe_txrx_common.h | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 305 +++--- 4 files changed, 62 insertions(+), 269 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index 2833e4f041ce..5ddfc83a1e46 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -224,17 +224,17 @@ struct ixgbe_tx_buffer { }; struct ixgbe_rx_buffer { - struct sk_buff *skb; - dma_addr_t dma; union { struct { + struct sk_buff *skb; + dma_addr_t dma; struct page *page; __u32 page_offset; __u16 pagecnt_bias; }; struct { - void *addr; - u64 handle; + bool discard; + struct xdp_buff *xdp; }; }; }; @@ -351,7 +351,6 @@ struct ixgbe_ring { }; struct xdp_rxq_info xdp_rxq; struct xdp_umem *xsk_umem; - struct zero_copy_allocator zca; /* ZC allocator anchor */ u16 ring_idx; /* {rx,tx,xdp}_ring back reference idx */ u16 rx_buf_len; } cacheline_internodealigned_in_smp; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 718931d951bc..da7b8042901f 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -35,7 +35,7 @@ #include #include #include -#include +#include #include #include "ixgbe.h" @@ -3726,8 +3726,7 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter, /* configure the packet buffer length */ if (rx_ring->xsk_umem) { - u32 xsk_buf_len = rx_ring->xsk_umem->chunk_size_nohr - - XDP_PACKET_HEADROOM; + u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_umem); /* If the MAC support setting RXDCTL.RLPML, the * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and @@ -4074,11 +4073,10 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); ring->xsk_umem = ixgbe_xsk_umem(adapter, ring); if (ring->xsk_umem) { - ring->zca.free = ixgbe_zca_free; WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, - MEM_TYPE_ZERO_COPY, - &ring->zca)); - + MEM_TYPE_XSK_BUFF_POOL, + NULL)); + xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); } else { WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL)); @@ -4134,8 +4132,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, } if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) { - u32 xsk_buf_len = ring->xsk_umem->chunk_size_nohr - - XDP_PACKET_HEADROOM; + u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem); rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK | IXGBE_RXDCTL_RLPML_EN); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h index 6d01700b46bc..7887ae4aaf4f 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h @@ -35,7 +35,7 @@ int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem, void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle); -void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count); +bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count); int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector 
*q_vector, struct ixgbe_ring *rx_ring, const int budget); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 5b6edbd8a4ed..86add9fbd36c 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -20,54 +20,11 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter, ret
[PATCH bpf-next v2 12/14] xdp: simplify xdp_return_{frame,frame_rx_napi,buff}
From: Björn Töpel The xdp_return_{frame,frame_rx_napi,buff} function are never used, except in xdp_convert_zc_to_xdp_frame(), by the MEM_TYPE_XSK_BUFF_POOL memory type. To simplify and reduce code, change so that xdp_convert_zc_to_xdp_frame() calls xsk_buff_free() directly since the type is know, and remove MEM_TYPE_XSK_BUFF_POOL from the switch statement in __xdp_return() function. Suggested-by: Maxim Mikityanskiy Signed-off-by: Björn Töpel --- net/core/xdp.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/net/core/xdp.c b/net/core/xdp.c index 11273c976e19..7ab1f9014c5e 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -334,10 +334,11 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); * scenarios (e.g. queue full), it is possible to return the xdp_frame * while still leveraging this protection. The @napi_direct boolean * is used for those calls sites. Thus, allowing for faster recycling - * of xdp_frames/pages in those cases. + * of xdp_frames/pages in those cases. This path is never used by the + * MEM_TYPE_XSK_BUFF_POOL memory type, so it's explicitly not part of + * the switch-statement. */ -static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, -struct xdp_buff *xdp) +static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct) { struct xdp_mem_allocator *xa; struct page *page; @@ -359,33 +360,29 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, page = virt_to_page(data); /* Assumes order0 page*/ put_page(page); break; - case MEM_TYPE_XSK_BUFF_POOL: - /* NB! Only valid from an xdp_buff! */ - xsk_buff_free(xdp); - break; default: /* Not possible, checked in xdp_rxq_info_reg_mem_model() */ + WARN(1, "Incorrect XDP memory type (%d) usage", mem->type); break; } } void xdp_return_frame(struct xdp_frame *xdpf) { - __xdp_return(xdpf->data, &xdpf->mem, false, NULL); + __xdp_return(xdpf->data, &xdpf->mem, false); } EXPORT_SYMBOL_GPL(xdp_return_frame); void xdp_return_frame_rx_napi(struct xdp_frame *xdpf) { - __xdp_return(xdpf->data, &xdpf->mem, true, NULL); + __xdp_return(xdpf->data, &xdpf->mem, true); } EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi); void xdp_return_buff(struct xdp_buff *xdp) { - __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp); + __xdp_return(xdp->data, &xdp->rxq->mem, true); } -EXPORT_SYMBOL_GPL(xdp_return_buff); /* Only called for MEM_TYPE_PAGE_POOL see xdp.h */ void __xdp_release_frame(void *data, struct xdp_mem_info *mem) @@ -466,7 +463,7 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp) xdpf->metasize = metasize; xdpf->mem.type = MEM_TYPE_PAGE_ORDER0; - xdp_return_buff(xdp); + xsk_buff_free(xdp); return xdpf; } EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame); -- 2.25.1
[PATCH bpf-next v2 04/14] xsk: introduce AF_XDP buffer allocation API
From: Björn Töpel In order to simplify AF_XDP zero-copy enablement for NIC driver developers, a new AF_XDP buffer allocation API is added. The implementation is based on a single core (single producer/consumer) buffer pool for the AF_XDP UMEM. A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated with the pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and not visible to the driver. Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type. The API is defined in net/xdp_sock_drv.h. The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames. In addition to introducing the API and implementations, the AF_XDP core is migrated to use the new APIs. rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test robot) Added headroom/chunk size getter. (Maxim/Björn) v1->v2: Swapped SoBs. (Maxim) Signed-off-by: Björn Töpel Signed-off-by: Maxim Mikityanskiy --- include/net/xdp.h | 4 +- include/net/xdp_sock.h | 2 + include/net/xdp_sock_drv.h | 152 include/net/xsk_buff_pool.h | 54 + include/trace/events/xdp.h | 3 +- net/core/xdp.c | 14 +- net/xdp/Makefile| 1 + net/xdp/xdp_umem.c | 19 +- net/xdp/xsk.c | 147 +--- net/xdp/xsk_buff_pool.c | 462 net/xdp/xsk_diag.c | 2 +- net/xdp/xsk_queue.h | 59 +++-- 12 files changed, 800 insertions(+), 119 deletions(-) create mode 100644 include/net/xsk_buff_pool.h create mode 100644 net/xdp/xsk_buff_pool.c diff --git a/include/net/xdp.h b/include/net/xdp.h index 3cc6d5d84aa4..83173e4d306c 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -38,6 +38,7 @@ enum xdp_mem_type { MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */ MEM_TYPE_PAGE_POOL, MEM_TYPE_ZERO_COPY, + MEM_TYPE_XSK_BUFF_POOL, MEM_TYPE_MAX, }; @@ -101,7 +102,8 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) int metasize; int headroom; - if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) + if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY || + xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) return xdp_convert_zc_to_xdp_frame(xdp); /* Assure headroom is available for storing info */ diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index fb7fe3060175..6e7265f63c04 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -31,11 +31,13 @@ struct xdp_umem_fq_reuse { struct xdp_umem { struct xsk_queue *fq; struct xsk_queue *cq; + struct xsk_buff_pool *pool; struct xdp_umem_page *pages; u64 chunk_mask; u64 size; u32 headroom; u32 chunk_size_nohr; + u32 chunk_size; struct user_struct *user; refcount_t users; struct work_struct work; diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 98dd6962e6d4..5a0970d4c44c 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -7,6 +7,7 @@ #define _LINUX_XDP_SOCK_DRV_H #include +#include #ifdef CONFIG_XDP_SOCKETS @@ -96,6 +97,87 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address, return address + offset; } +static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) +{ + return XDP_PACKET_HEADROOM + umem->headroom; +} + +static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) +{ + return umem->chunk_size; +} + +static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) +{ + return 
xsk_umem_get_chunk_size(umem) - xsk_umem_get_headroom(umem); +} + +static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, +struct xdp_rxq_info *rxq) +{ + xp_set_rxq_info(umem->pool, rxq); +} + +static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, + unsigned long attrs) +{ + xp_dma_unmap(umem->pool, attrs); +} + +static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, + unsigned long attrs) +{ + return xp_dma_map(umem->pool, dev, attrs, umem->pgs, umem->npgs); +} + +static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp) +{ + struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + + return xp_get_dma(xskb); +} + +static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) +{ + return xp_alloc(umem->pool); +} + +static inline bool xsk_buff_can_alloc(struct x
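To complement the (truncated) header diff above, here is a hedged sketch of how a driver Rx refill path looks with the new calls, modelled on the i40e/ice/ixgbe conversions in this series. The descriptor and ring types are invented for illustration; the xsk_buff_* calls are the ones this patch introduces.

  /* Sketch only: per-descriptor refill using the new allocation API.
   * 'struct my_ring' and 'struct my_rx_desc' are hypothetical stand-ins.
   */
  static bool my_refill_one(struct my_ring *ring, struct my_rx_desc *desc,
                            struct xdp_umem *umem, u16 idx)
  {
          struct xdp_buff *xdp;

          xdp = xsk_buff_alloc(umem);     /* NULL when the fill ring is empty */
          if (!xdp)
                  return false;

          ring->rx_bi_zc[idx] = xdp;      /* shadow ring entry */
          desc->addr = cpu_to_le64(xsk_buff_xdp_get_dma(xdp));
          return true;
  }

  /* Buffers that are never handed to the stack are simply returned to the
   * pool with xsk_buff_free(xdp) on ring teardown.
   */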
[PATCH bpf-next v2 14/14] MAINTAINERS, xsk: update AF_XDP section after moves/adds
From: Björn Töpel Update MAINTAINERS to correctly mirror the current AF_XDP socket file layout. Also, add the AF_XDP files of libbpf. rfc->v1: Sorted file entries. (Joe) Cc: Joe Perches Signed-off-by: Björn Töpel --- MAINTAINERS | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index db7a6d462dff..79e2bb1280e6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -18451,8 +18451,12 @@ R: Jonathan Lemon L: netdev@vger.kernel.org L: b...@vger.kernel.org S: Maintained -F: kernel/bpf/xskmap.c +F: include/net/xdp_sock* +F: include/net/xsk_buffer_pool.h +F: include/uapi/linux/if_xdp.h F: net/xdp/ +F: samples/bpf/xdpsock* +F: tools/lib/bpf/xsk* XEN BLOCK SUBSYSTEM M: Konrad Rzeszutek Wilk -- 2.25.1
[PATCH bpf-next v2 11/14] xsk: remove MEM_TYPE_ZERO_COPY and corresponding code
From: Björn Töpel There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding code, including the "handle" member of struct xdp_buff. rfc->v1: Fixed spelling in commit message. (Björn) Signed-off-by: Björn Töpel --- drivers/net/hyperv/netvsc_bpf.c | 1 - include/net/xdp.h | 9 +-- include/net/xdp_sock.h | 45 --- include/net/xdp_sock_drv.h | 139 include/trace/events/xdp.h | 1 - net/core/xdp.c | 42 ++ net/xdp/xdp_umem.c | 56 + net/xdp/xsk.c | 48 +-- net/xdp/xsk_buff_pool.c | 7 ++ net/xdp/xsk_queue.c | 62 -- net/xdp/xsk_queue.h | 105 11 files changed, 15 insertions(+), 500 deletions(-) diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c index b86611041db6..9f78f774041b 100644 --- a/drivers/net/hyperv/netvsc_bpf.c +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -49,7 +49,6 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, xdp_set_data_meta_invalid(xdp); xdp->data_end = xdp->data + len; xdp->rxq = &nvchan->xdp_rxq; - xdp->handle = 0; memcpy(xdp->data, data, len); diff --git a/include/net/xdp.h b/include/net/xdp.h index 83173e4d306c..1495ffb7a642 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -37,7 +37,6 @@ enum xdp_mem_type { MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */ MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */ MEM_TYPE_PAGE_POOL, - MEM_TYPE_ZERO_COPY, MEM_TYPE_XSK_BUFF_POOL, MEM_TYPE_MAX, }; @@ -53,10 +52,6 @@ struct xdp_mem_info { struct page_pool; -struct zero_copy_allocator { - void (*free)(struct zero_copy_allocator *zca, unsigned long handle); -}; - struct xdp_rxq_info { struct net_device *dev; u32 queue_index; @@ -69,7 +64,6 @@ struct xdp_buff { void *data_end; void *data_meta; void *data_hard_start; - unsigned long handle; struct xdp_rxq_info *rxq; }; @@ -102,8 +96,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) int metasize; int headroom; - if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY || - xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) + if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) return xdp_convert_zc_to_xdp_frame(xdp); /* Assure headroom is available for storing info */ diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 6e7265f63c04..96bfc5f5f24e 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -17,26 +17,12 @@ struct net_device; struct xsk_queue; struct xdp_buff; -struct xdp_umem_page { - void *addr; - dma_addr_t dma; -}; - -struct xdp_umem_fq_reuse { - u32 nentries; - u32 length; - u64 handles[]; -}; - struct xdp_umem { struct xsk_queue *fq; struct xsk_queue *cq; struct xsk_buff_pool *pool; - struct xdp_umem_page *pages; - u64 chunk_mask; u64 size; u32 headroom; - u32 chunk_size_nohr; u32 chunk_size; struct user_struct *user; refcount_t users; @@ -48,7 +34,6 @@ struct xdp_umem { u8 flags; int id; struct net_device *dev; - struct xdp_umem_fq_reuse *fq_reuse; bool zc; spinlock_t xsk_tx_list_lock; struct list_head xsk_tx_list; @@ -109,21 +94,6 @@ static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, return xs; } -static inline u64 xsk_umem_extract_addr(u64 addr) -{ - return addr & XSK_UNALIGNED_BUF_ADDR_MASK; -} - -static inline u64 xsk_umem_extract_offset(u64 addr) -{ - return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT; -} - -static inline u64 xsk_umem_add_offset_to_addr(u64 addr) -{ - return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr); -} - #else static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) @@ -146,21 +116,6 @@ static inline struct xdp_sock *__xsk_map_lookup_elem(struct 
bpf_map *map, return NULL; } -static inline u64 xsk_umem_extract_addr(u64 addr) -{ - return 0; -} - -static inline u64 xsk_umem_extract_offset(u64 addr) -{ - return 0; -} - -static inline u64 xsk_umem_add_offset_to_addr(u64 addr) -{ - return 0; -} - #endif /* CONFIG_XDP_SOCKETS */ #endif /* _LINUX_XDP_SOCK_H */ diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 5a0970d4c44c..533ee0ce43de 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -11,16 +11,9 @@ #ifdef CONFIG_XDP_SOCKETS -bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt); -bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr); -void xsk_umem_release_addr(struct xdp_umem *umem); void xsk_umem_comple
[PATCH bpf-next v2 10/14] mlx5, xsk: migrate to new MEM_TYPE_XSK_BUFF_POOL
From: Björn Töpel Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in mlx5e. It allows to drop a lot of code from the driver (which is now common in AF_XDP core and was related to XSK RX frame allocation, DMA mapping, etc.) and slightly improve performance. rfc->v1: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim) v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim) Signed-off-by: Björn Töpel Signed-off-by: Maxim Mikityanskiy --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 7 +- .../ethernet/mellanox/mlx5/core/en/params.c | 13 +- .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 30 ++--- .../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +- .../ethernet/mellanox/mlx5/core/en/xsk/rx.c | 113 -- .../ethernet/mellanox/mlx5/core/en/xsk/rx.h | 23 +++- .../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 6 +- .../ethernet/mellanox/mlx5/core/en/xsk/umem.c | 49 +--- .../net/ethernet/mellanox/mlx5/core/en_main.c | 15 +-- .../net/ethernet/mellanox/mlx5/core/en_rx.c | 33 - 10 files changed, 94 insertions(+), 197 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 0864b76ca2c0..526e59029beb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -429,10 +429,7 @@ struct mlx5e_dma_info { dma_addr_t addr; union { struct page *page; - struct { - u64 handle; - void *data; - } xsk; + struct xdp_buff *xsk; }; }; @@ -650,7 +647,6 @@ struct mlx5e_rq { } mpwqe; }; struct { - u16umem_headroom; u16headroom; u8 map_dir; /* dma map direction */ } buff; @@ -682,7 +678,6 @@ struct mlx5e_rq { struct page_pool *page_pool; /* AF_XDP zero-copy */ - struct zero_copy_allocator zca; struct xdp_umem *umem; struct work_struct recover_work; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c index eb2e1f2138e4..38e4f19d69f8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c @@ -12,15 +12,16 @@ static inline bool mlx5e_rx_is_xdp(struct mlx5e_params *params, u16 mlx5e_get_linear_rq_headroom(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk) { - u16 headroom = NET_IP_ALIGN; + u16 headroom; - if (mlx5e_rx_is_xdp(params, xsk)) { + if (xsk) + return xsk->headroom; + + headroom = NET_IP_ALIGN; + if (mlx5e_rx_is_xdp(params, xsk)) headroom += XDP_PACKET_HEADROOM; - if (xsk) - headroom += xsk->headroom; - } else { + else headroom += MLX5_RX_HEADROOM; - } return headroom; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index b04b99396f65..a2a194525b15 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -71,7 +71,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq, xdptxd.data = xdpf->data; xdptxd.len = xdpf->len; - if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) { + if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) { /* The xdp_buff was in the UMEM and was copied into a newly * allocated page. 
The UMEM page was returned via the ZCA, and * this new page has to be mapped at this point and has to be @@ -119,49 +119,33 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq, /* returns true if packet was consumed by xdp */ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di, - void *va, u16 *rx_headroom, u32 *len, bool xsk) + u32 *len, struct xdp_buff *xdp) { struct bpf_prog *prog = READ_ONCE(rq->xdp_prog); - struct xdp_umem *umem = rq->umem; - struct xdp_buff xdp; u32 act; int err; if (!prog) return false; - xdp.data = va + *rx_headroom; - xdp_set_data_meta_invalid(&xdp); - xdp.data_end = xdp.data + *len; - xdp.data_hard_start = va; - if (xsk) - xdp.handle = di->xsk.handle; - xdp.rxq = &rq->xdp_rxq; - - act = bpf_prog_run_xdp(prog, &xdp); - if (xsk) { - u64 off = xdp.data - xdp.data_hard_start; - - xdp.handle = xsk_umem_adjust_offset(umem, xdp.handle, off); - } + act = bpf_prog_run_xdp(prog
[PATCH bpf-next v2 13/14] xsk: explicitly inline functions and move definitions
From: Björn Töpel In order to reduce the number of function calls, the struct xsk_buff_pool definition is moved to xsk_buff_pool.h. The functions xp_get_dma(), xp_dma_sync_for_cpu(), xp_dma_sync_for_device(), xp_validate_desc() and various helper functions are explicitly inlined. Further, move xp_get_handle() and xp_release() to xsk.c, to allow for the compiler to perform inlining. rfc->v1: Make sure xp_validate_desc() is inlined for Tx perf. (Maxim) Signed-off-by: Björn Töpel --- include/net/xsk_buff_pool.h | 92 +-- net/xdp/xsk.c | 15 net/xdp/xsk_buff_pool.c | 142 ++-- net/xdp/xsk_queue.h | 45 4 files changed, 151 insertions(+), 143 deletions(-) diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 9abef166441d..029522696ccb 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -4,6 +4,7 @@ #ifndef XSK_BUFF_POOL_H_ #define XSK_BUFF_POOL_H_ +#include #include #include #include @@ -24,6 +25,27 @@ struct xdp_buff_xsk { struct list_head free_list_node; }; +struct xsk_buff_pool { + struct xsk_queue *fq; + struct list_head free_list; + dma_addr_t *dma_pages; + struct xdp_buff_xsk *heads; + u64 chunk_mask; + u64 addrs_cnt; + u32 free_list_cnt; + u32 dma_pages_cnt; + u32 heads_cnt; + u32 free_heads_cnt; + u32 headroom; + u32 chunk_size; + u32 frame_len; + bool cheap_dma; + bool unaligned; + void *addrs; + struct device *dev; + struct xdp_buff_xsk *free_heads[]; +}; + /* AF_XDP core. */ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, u32 chunk_size, u32 headroom, u64 size, @@ -31,8 +53,6 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq); void xp_destroy(struct xsk_buff_pool *pool); void xp_release(struct xdp_buff_xsk *xskb); -u64 xp_get_handle(struct xdp_buff_xsk *xskb); -bool xp_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc); /* AF_XDP, and XDP core. */ void xp_free(struct xdp_buff_xsk *xskb); @@ -46,9 +66,69 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool); bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count); void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr); dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr); -dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb); -void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb); -void xp_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, - size_t size); +static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb) +{ + return xskb->dma; +} + +void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb); +static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb) +{ + if (xskb->pool->cheap_dma) + return; + + xp_dma_sync_for_cpu_slow(xskb); +} + +void xp_dma_sync_for_device_slow(struct xsk_buff_pool *pool, dma_addr_t dma, +size_t size); +static inline void xp_dma_sync_for_device(struct xsk_buff_pool *pool, + dma_addr_t dma, size_t size) +{ + if (pool->cheap_dma) + return; + + xp_dma_sync_for_device_slow(pool, dma, size); +} + +/* Masks for xdp_umem_page flags. + * The low 12-bits of the addr will be 0 since this is the page address, so we + * can use them for flags. 
+ */ +#define XSK_NEXT_PG_CONTIG_SHIFT 0 +#define XSK_NEXT_PG_CONTIG_MASK BIT_ULL(XSK_NEXT_PG_CONTIG_SHIFT) + +static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool, +u64 addr, u32 len) +{ + bool cross_pg = (addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE; + + if (pool->dma_pages_cnt && cross_pg) { + return !(pool->dma_pages[addr >> PAGE_SHIFT] & +XSK_NEXT_PG_CONTIG_MASK); + } + return false; +} + +static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr) +{ + return addr & pool->chunk_mask; +} + +static inline u64 xp_unaligned_extract_addr(u64 addr) +{ + return addr & XSK_UNALIGNED_BUF_ADDR_MASK; +} + +static inline u64 xp_unaligned_extract_offset(u64 addr) +{ + return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT; +} + +static inline u64 xp_unaligned_add_offset_to_addr(u64 addr) +{ + return xp_unaligned_extract_addr(addr) + + xp_unaligned_extract_offset(addr); +} #endif /* XSK_BUFF_POOL_H_ */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 3f2ab732ab8b..b6c0f08bd80d 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -99,6 +99,21 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_umem_uses_need_wakeup); +void xp_release(struct xdp_bu
Re: signal quality and cable diagnostic
On Tue, May 14, 2020 at 08:28:00AM +, Oleksij Rempel wrote: > On Thu, May 14, 2020 at 07:13:30AM +, Christian Herber wrote: > > On Tue, May 12, 2020 at 10:22:01AM +0200, Oleksij Rempel wrote: > > > > > So I think we should pass raw SQI value to user space, at least in the > > > first implementation. > > > > > What do you think about this? > > > > Hi Oleksij, > > > > I had a check about the background of this SQI thing. The table you > > reference with concrete SNR values is informative only and not a > > requirement. The requirements are rather loose. > > > > This is from OA: > > - Only for SQI=0 a link loss shall occur. > > - The indicated signal quality shall monotonic increasing /decreasing with > > noise level. > > - It shall be indicated in the datasheet at which level a BER<10^-10 > > (better than 10^-10) is achieved (e.g. "from SQI=3 to SQI=7 the link has a > > BER<10^-10 (better than 10^-10)") > > > > I.e. SQI does not need to have a direct correlation with SNR. The > > fundamental underlying metric is the BER. > > You can report the raw SQI level and users would have to look up what it > > means in the respective data sheet. There is no guaranteed relation between > > SQI levels of different devices, i.e. SQI 5 can have lower BER than SQI 6 > > on another device. > > Alternatively, you could report BER < x for the different SQI levels. > > However, this requires the information to be available. While I could > > provide these for NXP, it might not be easily available for other vendors. > > If reporting raw SQI, at least the SQI level for BER<10^-10 should be > > presented to give any meaning to the value. > So the question is, which values to provide via KAPI to user space? > > - SQI > The PHY can probably measure the SNR quite fast and has some internal > function or lookup table to deduct the SQI from the measured SNR. > > If I understand you correctly, we can only compare SQI values of the > same PHY, as different PHYs give different SQIs for the same link > characteristics (=SNR). > - SNR range > We read the SQI from the PHY look up the SNR range for that value from > the data sheet and provide that value to use space. This gives a > better description of the quality of the link. > - "guestimated" BER > The manufacturer of the PHY has probably done some extensive testing > that a measured SNR can be correlated to some BER. This value may be > provided in the data sheet, too. > > The SNR seems to be most universal value, when it comes to comparing > different situations (different links and different PHYs). The > resolution of BER is not that detailed, for the NXP PHY is says only > "BER below 1e-10" or not. The point I was trying to make is that SQI is intentionally called SQI and NOT SNR, because it is not a measure for SNR. The standard only suggest a mapping of SNR to SQI, but vendors do not need to comply to that or report that. The only mandatory requirement is linking to BER. BER is also what would be required by a user, as this is the metric that determines what happens to your traffic, not the SNR. So when it comes to KAPI parameters, I see the following options - SQI only - SQI + plus indication of SQI level at which BER<10^-10 (this is the only required and standardized information) - SQI + BER range (best for users, but requires input from the silicon vendors) SNR in my opinion is neither an option nor helpful. Regards, Christian
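To make the second option above concrete, here is a purely hypothetical sketch of what such a report could carry; none of these names exist in the kernel or ethtool today. It only illustrates "raw SQI plus the SQI level at which the datasheet guarantees BER < 10^-10".

  /* Hypothetical sketch only -- not an existing kernel/ethtool structure. */
  struct phy_link_quality {
          u8 sqi;         /* raw SQI as reported by the PHY */
          u8 sqi_max;     /* highest SQI level the PHY can report */
          u8 sqi_ber_ok;  /* lowest SQI with datasheet BER < 1e-10 */
  };

  /* User space would then only need to check sqi >= sqi_ber_ok instead of
   * interpreting vendor-specific SQI steps.
   */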
Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
On Wed, 13 May 2020 19:43:24 -0700 Linus Torvalds wrote: > On Wed, May 13, 2020 at 6:00 PM Masami Hiramatsu wrote: > > > > > But we should likely at least disallow it entirely on platforms where > > > we really can't - or pick one hardcoded choice. On sparc, you really > > > _have_ to specify one or the other. > > > > OK. BTW, is there any way to detect the kernel/user space overlap on > > memory layout statically? If there, I can do it. (I don't like > > "if (CONFIG_X86)" thing) > > Or, maybe we need CONFIG_ARCH_OVERLAP_ADDRESS_SPACE? > > I think it would be better to have a CONFIG variable that > architectures can just 'select' to show that they are ok with separate > kernel and user addresses. > > Because I don't think we have any way to say that right now as-is. You > can probably come up with hacky ways to approximate it, ie something > like > > if (TASK_SIZE_MAX > PAGE_OFFSET) > they overlap .. > > which would almost work, but.. It seems TASK_SIZE_MAX is defined only on x86 and s390, what about comparing STACK_TOP_MAX with PAGE_OFFSET ? Anyway, I agree that the best way is introducing a CONFIG. Thank you, -- Masami Hiramatsu
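A rough sketch of such an opt-in knob, with the symbol name and wording purely illustrative rather than a final proposal: an arch-selectable bool plus a check in the generic probe helpers.

config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
	bool
	help
	  Selected by architectures where an address is never simultaneously
	  a valid user and a valid kernel address, so generic code may infer
	  the address space from the pointer value alone.

/* generic maccess code (sketch): refuse to guess on other architectures */
if (!IS_ENABLED(CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE))
	return -EOPNOTSUPP;	/* caller must use an explicit _user/_kernel variant */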
RE: [PATCH 32/33] sctp: add sctp_sock_get_primary_addr
From: Marcelo Ricardo Leitner > Sent: 13 May 2020 19:03 > > On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote: > > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel > > space without going through a fake uaccess. > > Same comment as on the other dlm/sctp patch. Wouldn't it be best to write sctp_[gs]etsockopt() that use a kernel buffer and then implement the user-space calls using a wrapper that does the copies to an on-stack (or malloced if big) buffer? That will also simplify the code by removing all the copies and -EFAULT returns. Only the size checks will be needed and the code can assume the buffer is at least the size of the on-stack buffer. Our SCTP code uses SO_REUSEADDR, SCTP_EVENTS, SCTP_NODELAY, SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD and SO_LINGER. David
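A kernel-style sketch of the layering being suggested here, with the uaccess copy confined to the user-facing wrapper. Names, buffer size and signatures are illustrative only, not the actual patch:

/* Sketch: kernel callers pass a kernel buffer directly, no uaccess. */
static int sctp_setsockopt_kern(struct sock *sk, int optname,
				void *kopt, unsigned int optlen)
{
	/* per-option handling; only size checks, no copy_from_user()/-EFAULT */
	return 0;
}

/* The userspace entry point becomes a thin copying wrapper. */
static int sctp_setsockopt(struct sock *sk, int level, int optname,
			   char __user *optval, unsigned int optlen)
{
	u8 stack_buf[128];		/* illustrative on-stack size */
	void *kopt = stack_buf;
	int ret;

	if (optlen > sizeof(stack_buf)) {
		kopt = memdup_user(optval, optlen);
		if (IS_ERR(kopt))
			return PTR_ERR(kopt);
	} else if (copy_from_user(kopt, optval, optlen)) {
		return -EFAULT;
	}

	ret = sctp_setsockopt_kern(sk, optname, kopt, optlen);

	if (kopt != stack_buf)
		kfree(kopt);
	return ret;
}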
[PATCH v2 net-next 07/11] net: qede: optional hw recovery procedure
The driver has the ability to initiate a recovery process as a reaction to detected errors, but the codepath (recovery_process) was disabled and never active. Here we add an ethtool private flag to let the user activate the recovery procedure. We still do not enable this by default though, since in some configurations this is not desirable, e.g. it may impact other PFs/VFs. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- .../net/ethernet/qlogic/qede/qede_ethtool.c | 24 +++ 1 file changed, 24 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c index 812c7766e096..24cc68391ac4 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c +++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c @@ -190,12 +190,14 @@ static const struct { enum { QEDE_PRI_FLAG_CMT, QEDE_PRI_FLAG_SMART_AN_SUPPORT, /* MFW supports SmartAN */ + QEDE_PRI_FLAG_RECOVER_ON_ERROR, QEDE_PRI_FLAG_LEN, }; static const char qede_private_arr[QEDE_PRI_FLAG_LEN][ETH_GSTRING_LEN] = { "Coupled-Function", "SmartAN capable", + "Recover on error", }; enum qede_ethtool_tests { @@ -417,9 +419,30 @@ static u32 qede_get_priv_flags(struct net_device *dev) if (edev->dev_info.common.smart_an) flags |= BIT(QEDE_PRI_FLAG_SMART_AN_SUPPORT); + if (edev->err_flags & BIT(QEDE_ERR_IS_RECOVERABLE)) + flags |= BIT(QEDE_PRI_FLAG_RECOVER_ON_ERROR); + return flags; } +static int qede_set_priv_flags(struct net_device *dev, u32 flags) +{ + struct qede_dev *edev = netdev_priv(dev); + u32 cflags = qede_get_priv_flags(dev); + u32 dflags = flags ^ cflags; + + /* can only change RECOVER_ON_ERROR flag */ + if (dflags & ~BIT(QEDE_PRI_FLAG_RECOVER_ON_ERROR)) + return -EINVAL; + + if (flags & BIT(QEDE_PRI_FLAG_RECOVER_ON_ERROR)) + set_bit(QEDE_ERR_IS_RECOVERABLE, &edev->err_flags); + else + clear_bit(QEDE_ERR_IS_RECOVERABLE, &edev->err_flags); + + return 0; +} + struct qede_link_mode_mapping { u32 qed_link_mode; u32 ethtool_link_mode; @@ -2098,6 +2121,7 @@ static const struct ethtool_ops qede_ethtool_ops = { .set_phys_id = qede_set_phys_id, .get_ethtool_stats = qede_get_ethtool_stats, .get_priv_flags = qede_get_priv_flags, + .set_priv_flags = qede_set_priv_flags, .get_sset_count = qede_get_sset_count, .get_rxnfc = qede_get_rxnfc, .set_rxnfc = qede_set_rxnfc, -- 2.17.1
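For reference, the new flag is exposed through the standard ethtool private-flags interface, so (interface name illustrative) it can be inspected with "ethtool --show-priv-flags eth0" and enabled with: ethtool --set-priv-flags eth0 "Recover on error" on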
[PATCH v2 net-next 06/11] net: qed: attention clearing properties
On different hardware events we have to respond differently, on some of hardware indications hw attention (error condition) should be cleared by the driver to continue normal functioning. Here we introduce attention clear flags, and put them on some important events (in aeu_descs). Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed.h| 3 +++ drivers/net/ethernet/qlogic/qed/qed_int.c| 22 drivers/net/ethernet/qlogic/qed/qed_int.h| 11 ++ drivers/net/ethernet/qlogic/qed/qed_main.c | 7 ++- drivers/net/ethernet/qlogic/qede/qede_main.c | 6 ++ include/linux/qed/qed_if.h | 9 6 files changed, 53 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 07f6ef930b52..66ed39d6f357 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -838,6 +838,9 @@ struct qed_dev { /* Recovery */ bool recov_in_prog; + /* Indicates whether should prevent attentions from being reasserted */ + bool attn_clr_en; + /* LLH info */ u8 ppfid_bitmap; struct qed_llh_info *p_llh_info; diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c index 1b1447b2f059..b7b974f0ef21 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.c +++ b/drivers/net/ethernet/qlogic/qed/qed_int.c @@ -96,6 +96,7 @@ struct aeu_invert_reg_bit { #define ATTENTION_BB(value) (value << ATTENTION_BB_SHIFT) #define ATTENTION_BB_DIFFERENT BIT(23) +#define ATTENTION_CLEAR_ENABLE BIT(28) unsigned int flags; /* Callback to call if attention will be triggered */ @@ -371,6 +372,13 @@ static int qed_fw_assertion(struct qed_hwfn *p_hwfn) return -EINVAL; } +static int qed_general_attention_35(struct qed_hwfn *p_hwfn) +{ + DP_INFO(p_hwfn, "General attention 35!\n"); + + return 0; +} + #define QED_DORQ_ATTENTION_REASON_MASK (0xf) #define QED_DORQ_ATTENTION_OPAQUE_MASK (0x) #define QED_DORQ_ATTENTION_OPAQUE_SHIFT (0x0) @@ -613,14 +621,15 @@ static struct aeu_invert_reg aeu_descs[NUM_ATTN_REGS] = { { { /* After Invert 4 */ - {"General Attention 32", ATTENTION_SINGLE, -qed_fw_assertion, + {"General Attention 32", ATTENTION_SINGLE | +ATTENTION_CLEAR_ENABLE, qed_fw_assertion, MAX_BLOCK_ID}, {"General Attention %d", (2 << ATTENTION_LENGTH_SHIFT) | (33 << ATTENTION_OFFSET_SHIFT), NULL, MAX_BLOCK_ID}, - {"General Attention 35", ATTENTION_SINGLE, -NULL, MAX_BLOCK_ID}, + {"General Attention 35", ATTENTION_SINGLE | +ATTENTION_CLEAR_ENABLE, qed_general_attention_35, +MAX_BLOCK_ID}, {"NWS Parity", ATTENTION_PAR | ATTENTION_BB_DIFFERENT | ATTENTION_BB(AEU_INVERT_REG_SPECIAL_CNIG_0), @@ -2361,6 +2370,11 @@ void qed_int_disable_post_isr_release(struct qed_dev *cdev) cdev->hwfns[i].b_int_requested = false; } +void qed_int_attn_clr_enable(struct qed_dev *cdev, bool clr_enable) +{ + cdev->attn_clr_en = clr_enable; +} + int qed_int_set_timer_res(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt, u8 timer_res, u16 sb_id, bool tx) { diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.h b/drivers/net/ethernet/qlogic/qed/qed_int.h index 9ad568d93ae6..e09db3386367 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.h +++ b/drivers/net/ethernet/qlogic/qed/qed_int.h @@ -190,6 +190,17 @@ void qed_int_get_num_sbs(struct qed_hwfn *p_hwfn, */ void qed_int_disable_post_isr_release(struct qed_dev *cdev); +/** + * @brief qed_int_attn_clr_enable - sets whether the general behavior is + *preventing attentions from being reasserted, or following the + *attributes of the specific 
attention. + * + * @param cdev + * @param clr_enable + * + */ +void qed_int_attn_clr_enable(struct qed_dev *cdev, bool clr_enable); + /** * @brief - Doorbell Recovery handler. * Run doorbell recovery in case of PF overflow (and flush DORQ if diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d7c9d94e4c59..83e798d4eebb 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -2491,10 +2491,14 @@ void qed_hw_error_occurred(struct qed_hwfn *p_hwfn, DP_NOTICE(p_hwfn, "HW error occurred [%s]\n", err_str); - /* Call the HW error handler of the protocol driver + /* C
[PATCH v2 net-next 04/11] net: qed: critical err reporting to management firmware
On various critical errors, notification handler should also report the err information into the management firmware. MFW can interact with server/motherboard backend agents - these are used by server manufacturers to monitor server HW health. Thus, it is important for driver to report on any faulty conditions Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 19 drivers/net/ethernet/qlogic/qed/qed_hw.c | 3 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 125 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 15 +++ 4 files changed, 162 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index 4597015b8bff..21d53b00c2e6 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -12492,6 +12492,8 @@ struct public_drv_mb { #define DRV_MSG_CODE_GET_ENGINE_CONFIG 0x0037 #define DRV_MSG_CODE_GET_PPFID_BITMAP 0x4300 +#define DRV_MSG_CODE_DEBUG_DATA_SEND 0xc004 + #define RESOURCE_CMD_REQ_RESC_MASK 0x001F #define RESOURCE_CMD_REQ_RESC_SHIFT0 #define RESOURCE_CMD_REQ_OPCODE_MASK 0x00E0 @@ -12626,6 +12628,17 @@ struct public_drv_mb { #define DRV_MB_PARAM_FEATURE_SUPPORT_PORT_EEE 0x0002 #define DRV_MB_PARAM_FEATURE_SUPPORT_FUNC_VLINK0x0001 +/* DRV_MSG_CODE_DEBUG_DATA_SEND parameters */ +#define DRV_MSG_CODE_DEBUG_DATA_SEND_SIZE_OFFSET 0 +#define DRV_MSG_CODE_DEBUG_DATA_SEND_SIZE_MASK 0xFF + +/* Driver attributes params */ +#define DRV_MB_PARAM_ATTRIBUTE_KEY_OFFSET 0 +#define DRV_MB_PARAM_ATTRIBUTE_KEY_MASK0x00FF +#define DRV_MB_PARAM_ATTRIBUTE_CMD_OFFSET 24 +#define DRV_MB_PARAM_ATTRIBUTE_CMD_MASK0xFF00 + +#define DRV_MB_PARAM_NVM_CFG_OPTION_ID_OFFSET 0 #define DRV_MB_PARAM_NVM_CFG_OPTION_ID_SHIFT 0 #define DRV_MB_PARAM_NVM_CFG_OPTION_ID_MASK0x #define DRV_MB_PARAM_NVM_CFG_OPTION_ALL_SHIFT 16 @@ -12678,6 +12691,12 @@ struct public_drv_mb { #define FW_MSG_CODE_DRV_CFG_PF_VFS_MSIX_DONE 0x0087 #define FW_MSG_SEQ_NUMBER_MASK 0x +#define FW_MSG_CODE_DEBUG_DATA_SEND_INV_ARG0xb007 +#define FW_MSG_CODE_DEBUG_DATA_SEND_BUF_FULL 0xb008 +#define FW_MSG_CODE_DEBUG_DATA_SEND_NO_BUF 0xb009 +#define FW_MSG_CODE_DEBUG_NOT_ENABLED 0xb00a +#define FW_MSG_CODE_DEBUG_DATA_SEND_OK 0xb00b + u32 fw_mb_param; #define FW_MB_PARAM_RESOURCE_ALLOC_VERSION_MAJOR_MASK 0x #define FW_MB_PARAM_RESOURCE_ALLOC_VERSION_MAJOR_SHIFT 16 diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c b/drivers/net/ethernet/qlogic/qed/qed_hw.c index 2d176e1b508c..5fa251489536 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hw.c +++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c @@ -868,6 +868,9 @@ void qed_hw_err_notify(struct qed_hwfn *p_hwfn, } qed_hw_error_occurred(p_hwfn, err_type); + + if (fmt) + qed_mcp_send_raw_debug_data(p_hwfn, p_ptt, buf, len); } int qed_dmae_sanity(struct qed_hwfn *p_hwfn, diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 46653afc385c..62be13d49dd8 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -3821,3 +3821,128 @@ int qed_mcp_nvm_set_cfg(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt, DRV_MSG_CODE_SET_NVM_CFG_OPTION, mb_param, &resp, ¶m, len, (u32 *)p_buf); } + +#define QED_MCP_DBG_DATA_MAX_SIZE MCP_DRV_NVM_BUF_LEN +#define QED_MCP_DBG_DATA_MAX_HEADER_SIZEsizeof(u32) +#define QED_MCP_DBG_DATA_MAX_PAYLOAD_SIZE \ + (QED_MCP_DBG_DATA_MAX_SIZE - QED_MCP_DBG_DATA_MAX_HEADER_SIZE) + +static int +__qed_mcp_send_debug_data(struct qed_hwfn 
*p_hwfn, + struct qed_ptt *p_ptt, u8 *p_buf, u8 size) +{ + struct qed_mcp_mb_params mb_params; + int rc; + + if (size > QED_MCP_DBG_DATA_MAX_SIZE) { + DP_ERR(p_hwfn, + "Debug data size is %d while it should not exceed %d\n", + size, QED_MCP_DBG_DATA_MAX_SIZE); + return -EINVAL; + } + + memset(&mb_params, 0, sizeof(mb_params)); + mb_params.cmd = DRV_MSG_CODE_DEBUG_DATA_SEND; + SET_MFW_FIELD(mb_params.param, DRV_MSG_CODE_DEBUG_DATA_SEND_SIZE, size); + mb_params.p_data_src = p_buf; + mb_params.data_src_size = size; + rc = qed_mcp_cmd_and_union(p_hwfn, p_ptt, &mb_params); + if (rc) + return rc; + + if (mb_params.mcp_resp == FW_MSG_CODE_UNSUPPORTED) { + DP_INFO(p_hwfn, +
[PATCH v2 net-next 08/11] net: qede: Implement ndo_tx_timeout
From: Denis Bolotin Upon tx timeout detection we do disable carrier and print TX queue info on TX timeout. We then raise hw error condition and trigger service task to handle this. This handler will capture extra debug info and then optionally trigger recovery procedure to try restore function. Signed-off-by: Denis Bolotin Signed-off-by: Ariel Elior Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qede/qede.h | 1 - drivers/net/ethernet/qlogic/qede/qede_main.c | 46 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 695d645d9ba9..8857da1208d7 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -533,7 +533,6 @@ u16 qede_select_queue(struct net_device *dev, struct sk_buff *skb, netdev_features_t qede_features_check(struct sk_buff *skb, struct net_device *dev, netdev_features_t features); -void qede_tx_log_print(struct qede_dev *edev, struct qede_fastpath *fp); int qede_alloc_rx_buffer(struct qede_rx_queue *rxq, bool allow_lazy); int qede_free_tx_pkt(struct qede_dev *edev, struct qede_tx_queue *txq, int *len); diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index ee7662da6413..f50d9a9b76be 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -539,6 +539,51 @@ static int qede_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) return 0; } +static void qede_tx_log_print(struct qede_dev *edev, struct qede_tx_queue *txq) +{ + DP_NOTICE(edev, + "Txq[%d]: FW cons [host] %04x, SW cons %04x, SW prod %04x [Jiffies %lu]\n", + txq->index, le16_to_cpu(*txq->hw_cons_ptr), + qed_chain_get_cons_idx(&txq->tx_pbl), + qed_chain_get_prod_idx(&txq->tx_pbl), + jiffies); +} + +static void qede_tx_timeout(struct net_device *dev, unsigned int txqueue) +{ + struct qede_dev *edev = netdev_priv(dev); + struct qede_tx_queue *txq; + int cos; + + netif_carrier_off(dev); + DP_NOTICE(edev, "TX timeout on queue %u!\n", txqueue); + + if (!(edev->fp_array[txqueue].type & QEDE_FASTPATH_TX)) + return; + + for_each_cos_in_txq(edev, cos) { + txq = &edev->fp_array[txqueue].txq[cos]; + + if (qed_chain_get_cons_idx(&txq->tx_pbl) != + qed_chain_get_prod_idx(&txq->tx_pbl)) + qede_tx_log_print(edev, txq); + } + + if (IS_VF(edev)) + return; + + if (test_and_set_bit(QEDE_ERR_IS_HANDLED, &edev->err_flags) || + edev->state == QEDE_STATE_RECOVERY) { + DP_INFO(edev, + "Avoid handling a Tx timeout while another HW error is being handled\n"); + return; + } + + set_bit(QEDE_ERR_GET_DBG_INFO, &edev->err_flags); + set_bit(QEDE_SP_HW_ERR, &edev->sp_flags); + schedule_delayed_work(&edev->sp_task, 0); +} + static int qede_setup_tc(struct net_device *ndev, u8 num_tc) { struct qede_dev *edev = netdev_priv(ndev); @@ -626,6 +671,7 @@ static const struct net_device_ops qede_netdev_ops = { .ndo_validate_addr = eth_validate_addr, .ndo_change_mtu = qede_change_mtu, .ndo_do_ioctl = qede_ioctl, + .ndo_tx_timeout = qede_tx_timeout, #ifdef CONFIG_QED_SRIOV .ndo_set_vf_mac = qede_set_vf_mac, .ndo_set_vf_vlan = qede_set_vf_vlan, -- 2.17.1
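For context on when this callback runs: the core TX watchdog (dev_watchdog()) invokes .ndo_tx_timeout for a queue that has remained stopped longer than the device's watchdog timeout. A generic sketch of the registration side, not taken from the qede sources (if the timeout is left at zero, the core falls back to a default once .ndo_tx_timeout is present):

	/* sketch: at netdev setup time */
	ndev->netdev_ops = &qede_netdev_ops;
	ndev->watchdog_timeo = 5 * HZ;	/* grace period before .ndo_tx_timeout fires */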
[PATCH v2 net-next 00/11] net: qed/qede: critical hw error handling
FastLinQ devices, as complex systems, may observe various hardware level error conditions, both severe and recoverable. The driver is able to detect and report these, but so far it only did trace/dmesg based reporting. Here we implement extended hw error detection; a service task handler captures a dump for later analysis. I also resubmit a patch from Denis Bolotin on the tx timeout handler, addressing David's comment by adding the recovery procedure as an extra reaction to this event. v2: Removing the patch with ethtool dump and udev magic. It's quite isolated; I'm working on devlink based logic for this separately. v1: https://patchwork.ozlabs.org/project/netdev/cover/cover.1588758463.git.irussk...@marvell.com/ Denis Bolotin (1): net: qede: Implement ndo_tx_timeout Igor Russkikh (10): net: qed: adding hw_err states and handling net: qede: add hw err scheduled handler net: qed: invoke err notify on critical areas net: qed: critical err reporting to management firmware net: qed: cleanup debug related declarations net: qed: attention clearing properties net: qede: optional hw recovery procedure net: qed: introduce critical fan failure handler net: qed: introduce critical hardware error handler net: qed: fix bad formatting drivers/net/ethernet/qlogic/qed/qed.h | 16 +- drivers/net/ethernet/qlogic/qed/qed_debug.c | 26 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 4 +- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 49 +++- drivers/net/ethernet/qlogic/qed/qed_hw.c | 42 ++- drivers/net/ethernet/qlogic/qed/qed_hw.h | 15 ++ drivers/net/ethernet/qlogic/qed/qed_int.c | 40 ++- drivers/net/ethernet/qlogic/qed/qed_int.h | 11 + drivers/net/ethernet/qlogic/qed/qed_main.c| 34 +++ drivers/net/ethernet/qlogic/qed/qed_mcp.c | 254 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 28 ++ drivers/net/ethernet/qlogic/qed/qed_spq.c | 16 +- drivers/net/ethernet/qlogic/qede/qede.h | 14 +- .../net/ethernet/qlogic/qede/qede_ethtool.c | 24 ++ drivers/net/ethernet/qlogic/qede/qede_main.c | 147 +- include/linux/qed/qed_if.h| 26 +- 16 files changed, 700 insertions(+), 46 deletions(-) -- 2.17.1
[PATCH v2 net-next 03/11] net: qed: invoke err notify on critical areas
In a number of critical places not only debug trace should be printed, but the appropriate hw error condition should be raised and error handling/recovery should start. Introduce our new qed_hw_err_notify invocation in these places to record and indicate critical error conditions in hardware. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed_dev.c | 4 +++- drivers/net/ethernet/qlogic/qed/qed_hw.c | 7 --- drivers/net/ethernet/qlogic/qed/qed_int.c | 20 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 2 ++ drivers/net/ethernet/qlogic/qed/qed_spq.c | 16 ++-- 5 files changed, 35 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 7119a18af19e..6e857468e993 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -3085,7 +3085,9 @@ int qed_hw_init(struct qed_dev *cdev, struct qed_hw_init_params *p_params) rc = qed_final_cleanup(p_hwfn, p_hwfn->p_main_ptt, p_hwfn->rel_pf_id, false); if (rc) { - DP_NOTICE(p_hwfn, "Final cleanup failed\n"); + qed_hw_err_notify(p_hwfn, p_hwfn->p_main_ptt, + QED_HW_ERR_RAMROD_FAIL, + "Final cleanup failed\n"); goto load_err; } } diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c b/drivers/net/ethernet/qlogic/qed/qed_hw.c index 90b777019cf5..2d176e1b508c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hw.c +++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c @@ -762,9 +762,10 @@ static int qed_dmae_execute_command(struct qed_hwfn *p_hwfn, dst_type, length_cur); if (qed_status) { - DP_NOTICE(p_hwfn, - "qed_dmae_execute_sub_operation Failed with error 0x%x. source_addr 0x%llx, destination addr 0x%llx, size_in_dwords 0x%x\n", - qed_status, src_addr, dst_addr, length_cur); + qed_hw_err_notify(p_hwfn, p_ptt, QED_HW_ERR_DMAE_FAIL, + "qed_dmae_execute_sub_operation Failed with error 0x%x. 
source_addr 0x%llx, destination addr 0x%llx, size_in_dwords 0x%x\n", + qed_status, src_addr, + dst_addr, length_cur); break; } } diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c index 9f5113639eaf..1b1447b2f059 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.c +++ b/drivers/net/ethernet/qlogic/qed/qed_int.c @@ -363,6 +363,14 @@ static int qed_pglueb_rbc_attn_cb(struct qed_hwfn *p_hwfn) return qed_pglueb_rbc_attn_handler(p_hwfn, p_hwfn->p_dpc_ptt); } +static int qed_fw_assertion(struct qed_hwfn *p_hwfn) +{ + qed_hw_err_notify(p_hwfn, p_hwfn->p_dpc_ptt, QED_HW_ERR_FW_ASSERT, + "FW assertion!\n"); + + return -EINVAL; +} + #define QED_DORQ_ATTENTION_REASON_MASK (0xf) #define QED_DORQ_ATTENTION_OPAQUE_MASK (0x) #define QED_DORQ_ATTENTION_OPAQUE_SHIFT (0x0) @@ -606,7 +614,8 @@ static struct aeu_invert_reg aeu_descs[NUM_ATTN_REGS] = { { { /* After Invert 4 */ {"General Attention 32", ATTENTION_SINGLE, -NULL, MAX_BLOCK_ID}, +qed_fw_assertion, +MAX_BLOCK_ID}, {"General Attention %d", (2 << ATTENTION_LENGTH_SHIFT) | (33 << ATTENTION_OFFSET_SHIFT), NULL, MAX_BLOCK_ID}, @@ -927,9 +936,12 @@ qed_int_deassertion_aeu_bit(struct qed_hwfn *p_hwfn, qed_int_attn_print(p_hwfn, p_aeu->block_index, ATTN_TYPE_INTERRUPT, !b_fatal); - - /* If the attention is benign, no need to prevent it */ - if (!rc) + /* Reach assertion if attention is fatal */ + if (b_fatal) + qed_hw_err_notify(p_hwfn, p_hwfn->p_dpc_ptt, QED_HW_ERR_HW_ATTN, + "`%s': Fatal attention\n", + p_bit_name); + else /* If the attention is benign, no need to prevent it */ goto out; /* Prevent this Attention from being asserted in the future */ diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 280527cc0578..46653afc385c 100644 --- a/drivers/net/
[PATCH v2 net-next 02/11] net: qede: add hw err scheduled handler
qede (ethernet level driver) registers a callback handler. This handler maintains eth dev state flags/bits to track error processing. It implements in place processing part for nonsleeping context (WARN_ON trigger), and a deferred (delayed work) part which triggers recovery process for recoverable errors. In later patches this atomic handler will come with more meat. We introduce err_flags on ethdevice structure, its being used to record error handling properties. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qede/qede.h | 13 ++- drivers/net/ethernet/qlogic/qede/qede_main.c | 95 +++- 2 files changed, 106 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index f6f0b51620ab..695d645d9ba9 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -278,6 +278,14 @@ struct qede_dev { struct qede_rdma_devrdma_info; struct bpf_prog *xdp_prog; + + unsigned long err_flags; +#define QEDE_ERR_IS_HANDLED31 +#define QEDE_ERR_ATTN_CLR_EN 0 +#define QEDE_ERR_GET_DBG_INFO 1 +#define QEDE_ERR_IS_RECOVERABLE2 +#define QEDE_ERR_WARN 3 + struct qede_dump_info dump_info; }; @@ -485,12 +493,15 @@ struct qede_fastpath { #define QEDE_SP_RECOVERY 0 #define QEDE_SP_RX_MODE1 +#define QEDE_SP_RSVD1 2 +#define QEDE_SP_RSVD2 3 +#define QEDE_SP_HW_ERR 4 +#define QEDE_SP_ARFS_CONFIG 5 #define QEDE_SP_AER7 #ifdef CONFIG_RFS_ACCEL int qede_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb, u16 rxq_index, u32 flow_id); -#define QEDE_SP_ARFS_CONFIG4 #define QEDE_SP_TASK_POLL_DELAY(5 * HZ) #endif diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 300405369c37..e67d5da23792 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -139,10 +139,12 @@ static void qede_shutdown(struct pci_dev *pdev); static void qede_link_update(void *dev, struct qed_link_output *link); static void qede_schedule_recovery_handler(void *dev); static void qede_recovery_handler(struct qede_dev *edev); +static void qede_schedule_hw_err_handler(void *dev, +enum qed_hw_err_type err_type); static void qede_get_eth_tlv_data(void *edev, void *data); static void qede_get_generic_tlv_data(void *edev, struct qed_generic_tlvs *data); - +static void qede_generic_hw_err_handler(struct qede_dev *edev); #ifdef CONFIG_QED_SRIOV static int qede_set_vf_vlan(struct net_device *ndev, int vf, u16 vlan, u8 qos, __be16 vlan_proto) @@ -230,6 +232,7 @@ static struct qed_eth_cb_ops qede_ll_ops = { #endif .link_update = qede_link_update, .schedule_recovery_handler = qede_schedule_recovery_handler, + .schedule_hw_err_handler = qede_schedule_hw_err_handler, .get_generic_tlv_data = qede_get_generic_tlv_data, .get_protocol_tlv_data = qede_get_eth_tlv_data, }, @@ -1009,6 +1012,8 @@ static void qede_sp_task(struct work_struct *work) qede_process_arfs_filters(edev, false); } #endif + if (test_and_clear_bit(QEDE_SP_HW_ERR, &edev->sp_flags)) + qede_generic_hw_err_handler(edev); __qede_unlock(edev); if (test_and_clear_bit(QEDE_SP_AER, &edev->sp_flags)) { @@ -2509,6 +2514,94 @@ static void qede_recovery_handler(struct qede_dev *edev) qede_recovery_failed(edev); } +static void qede_atomic_hw_err_handler(struct qede_dev *edev) +{ + DP_NOTICE(edev, + "Generic non-sleepable HW error handling started - err_flags 0x%lx\n", + edev->err_flags); + + /* Get a call trace of the flow 
that led to the error */ + WARN_ON(test_bit(QEDE_ERR_WARN, &edev->err_flags)); + + DP_NOTICE(edev, "Generic non-sleepable HW error handling is done\n"); +} + +static void qede_generic_hw_err_handler(struct qede_dev *edev) +{ + struct qed_dev *cdev = edev->cdev; + + DP_NOTICE(edev, + "Generic sleepable HW error handling started - err_flags 0x%lx\n", + edev->err_flags); + + /* Trigger a recovery process. +* This is placed in the sleep requiring section just to make +* sure it is the last one, and that all the other operations +* were completed. +*/ + if (test_bit(QEDE_ERR_IS_RECOVERABLE, &edev->err_flags)) + edev->ops->common->recovery_process(cdev); + + clear_bit(QEDE_ERR_IS_HANDLED, &edev->err_flags); +
[PATCH v2 net-next 01/11] net: qed: adding hw_err states and handling
Here we introduce qed device error tracking flags and error types. qed_hw_err_notify is an entrace point to report errors. It'll notify higher level drivers (qede/qedr/etc) to handle and recover the error. List of posible errors comes from hardware interfaces, but could be extended in future. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed.h | 2 ++ drivers/net/ethernet/qlogic/qed/qed_hw.c | 32 ++ drivers/net/ethernet/qlogic/qed/qed_hw.h | 15 ++ drivers/net/ethernet/qlogic/qed/qed_main.c | 29 include/linux/qed/qed_if.h | 12 5 files changed, 90 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index fa41bf08a589..12c40ce3d876 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -1020,6 +1020,8 @@ u32 qed_unzip_data(struct qed_hwfn *p_hwfn, u32 input_len, u8 *input_buf, u32 max_size, u8 *unzip_buf); void qed_schedule_recovery_handler(struct qed_hwfn *p_hwfn); +void qed_hw_error_occurred(struct qed_hwfn *p_hwfn, + enum qed_hw_err_type err_type); void qed_get_protocol_stats(struct qed_dev *cdev, enum qed_mcp_protocol_type type, union qed_mcp_protocol_stats *stats); diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c b/drivers/net/ethernet/qlogic/qed/qed_hw.c index 4ab8cfaf63d1..90b777019cf5 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hw.c +++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c @@ -837,6 +837,38 @@ int qed_dmae_host2host(struct qed_hwfn *p_hwfn, return rc; } +void qed_hw_err_notify(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + enum qed_hw_err_type err_type, char *fmt, ...) +{ + char buf[QED_HW_ERR_MAX_STR_SIZE]; + va_list vl; + int len; + + if (fmt) { + va_start(vl, fmt); + len = vsnprintf(buf, QED_HW_ERR_MAX_STR_SIZE, fmt, vl); + va_end(vl); + + if (len > QED_HW_ERR_MAX_STR_SIZE - 1) + len = QED_HW_ERR_MAX_STR_SIZE - 1; + + DP_NOTICE(p_hwfn, "%s", buf); + } + + /* Fan failure cannot be masked by handling of another HW error */ + if (p_hwfn->cdev->recov_in_prog && + err_type != QED_HW_ERR_FAN_FAIL) { + DP_VERBOSE(p_hwfn, + NETIF_MSG_DRV, + "Recovery is in progress. Avoid notifying about HW error %d.\n", + err_type); + return; + } + + qed_hw_error_occurred(p_hwfn, err_type); +} + int qed_dmae_sanity(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt, const char *phase) { diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.h b/drivers/net/ethernet/qlogic/qed/qed_hw.h index 505e94db939d..f5b109b04b66 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hw.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hw.h @@ -315,4 +315,19 @@ int qed_init_fw_data(struct qed_dev *cdev, int qed_dmae_sanity(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt, const char *phase); +#define QED_HW_ERR_MAX_STR_SIZE 256 + +/** + * @brief qed_hw_err_notify - Notify upper layer driver and management FW + * about a HW error. + * + * @param p_hwfn + * @param p_ptt + * @param err_type + * @param fmt - debug data buffer to send to the MFW + * @param ... 
- buffer format args + */ +void qed_hw_err_notify(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + enum qed_hw_err_type err_type, char *fmt, ...); #endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 38a1d26ca9db..d7c9d94e4c59 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -2468,6 +2468,35 @@ void qed_schedule_recovery_handler(struct qed_hwfn *p_hwfn) ops->schedule_recovery_handler(cookie); } +char *qed_hw_err_type_descr[] = { + [QED_HW_ERR_FAN_FAIL] = "Fan Failure", + [QED_HW_ERR_MFW_RESP_FAIL] = "MFW Response Failure", + [QED_HW_ERR_HW_ATTN]= "HW Attention", + [QED_HW_ERR_DMAE_FAIL] = "DMAE Failure", + [QED_HW_ERR_RAMROD_FAIL]= "Ramrod Failure", + [QED_HW_ERR_FW_ASSERT] = "FW Assertion", + [QED_HW_ERR_LAST] = "Unknown", +}; + +void qed_hw_error_occurred(struct qed_hwfn *p_hwfn, + enum qed_hw_err_type err_type) +{ + struct qed_common_cb_ops *ops = p_hwfn->cdev->protocol_ops.common; + void *cookie = p_hwfn->cdev->ops_cookie; + char *err_str; + + if (err_type > QED_HW_ERR_LAST) + err_type =
[PATCH v2 net-next 09/11] net: qed: introduce critical fan failure handler
Fan failure is sent by firmware, driver reacts on this error with newly introduced notification path. It will collect dump and shut down the device to prevent physical breakage Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 2 +- drivers/net/ethernet/qlogic/qed/qed_mcp.c | 14 ++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index 21d53b00c2e6..ab042b835797 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -12761,7 +12761,7 @@ enum MFW_DRV_MSG_TYPE { MFW_DRV_MSG_GET_FCOE_STATS, MFW_DRV_MSG_GET_ISCSI_STATS, MFW_DRV_MSG_GET_RDMA_STATS, - MFW_DRV_MSG_BW_UPDATE10, + MFW_DRV_MSG_FAILURE_DETECTED, MFW_DRV_MSG_TRANSCEIVER_STATE_CHANGE, MFW_DRV_MSG_BW_UPDATE11, MFW_DRV_MSG_RESERVED, diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 62be13d49dd8..0058e804efc3 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1706,6 +1706,17 @@ static void qed_mcp_update_stag(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) &resp, ¶m); } +static void qed_mcp_handle_fan_failure(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt) +{ + /* A single notification should be sent to upper driver in CMT mode */ + if (p_hwfn != QED_LEADING_HWFN(p_hwfn->cdev)) + return; + + qed_hw_err_notify(p_hwfn, p_ptt, QED_HW_ERR_FAN_FAIL, + "Fan failure was detected on the network interface card and it's going to be shut down.\n"); +} + void qed_mcp_read_ufp_config(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) { struct public_func shmem_info; @@ -1852,6 +1863,9 @@ int qed_mcp_handle_events(struct qed_hwfn *p_hwfn, case MFW_DRV_MSG_S_TAG_UPDATE: qed_mcp_update_stag(p_hwfn, p_ptt); break; + case MFW_DRV_MSG_FAILURE_DETECTED: + qed_mcp_handle_fan_failure(p_hwfn, p_ptt); + break; case MFW_DRV_MSG_GET_TLV_REQ: qed_mfw_tlv_req(p_hwfn); break; -- 2.17.1
[PATCH v2 net-next 11/11] net: qed: fix bad formatting
On some adjacent code, fix bad code formatting Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- include/linux/qed/qed_if.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 978e91e9ab65..48325d7790f8 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -821,12 +821,11 @@ enum qed_nvm_flash_cmd { struct qed_common_cb_ops { void (*arfs_filter_op)(void *dev, void *fltr, u8 fw_rc); - void(*link_update)(void *dev, - struct qed_link_output *link); + void (*link_update)(void *dev, struct qed_link_output *link); void (*schedule_recovery_handler)(void *dev); void (*schedule_hw_err_handler)(void *dev, enum qed_hw_err_type err_type); - void(*dcbx_aen)(void *dev, struct qed_dcbx_get *get, u32 mib_type); + void (*dcbx_aen)(void *dev, struct qed_dcbx_get *get, u32 mib_type); void (*get_generic_tlv_data)(void *dev, struct qed_generic_tlvs *data); void (*get_protocol_tlv_data)(void *dev, void *data); }; -- 2.17.1
[PATCH v2 net-next 10/11] net: qed: introduce critical hardware error handler
MCP may signal driver about generic critical failure. Driver has to collect mdump information (get_retain), it pushes that to logs and triggers generic notification on "hardware attention" event. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 28 +- drivers/net/ethernet/qlogic/qed/qed_mcp.c | 113 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 13 +++ 3 files changed, 153 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index ab042b835797..f00460d00cab 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -12400,6 +12400,13 @@ struct load_rsp_stc { #define LOAD_RSP_FLAGS0_DRV_EXISTS (0x1 << 0) }; +struct mdump_retain_data_stc { + u32 valid; + u32 epoch; + u32 pf; + u32 status; +}; + union drv_union_data { u32 ver_str[MCP_DRV_VER_STR_SIZE_DWORD]; struct mcp_mac wol_mac; @@ -12488,6 +12495,8 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 #define DRV_MSG_CODE_RESOURCE_CMD 0x0023 +/* Send crash dump commands with param[3:0] - opcode */ +#define DRV_MSG_CODE_MDUMP_CMD 0x0025 #define DRV_MSG_CODE_GET_TLV_DONE 0x002f #define DRV_MSG_CODE_GET_ENGINE_CONFIG 0x0037 #define DRV_MSG_CODE_GET_PPFID_BITMAP 0x4300 @@ -12519,6 +12528,21 @@ struct public_drv_mb { #define RESOURCE_DUMP 0 +/* DRV_MSG_CODE_MDUMP_CMD parameters */ +#define MDUMP_DRV_PARAM_OPCODE_MASK 0x000f +#define DRV_MSG_CODE_MDUMP_ACK 0x01 +#define DRV_MSG_CODE_MDUMP_SET_VALUES 0x02 +#define DRV_MSG_CODE_MDUMP_TRIGGER 0x03 +#define DRV_MSG_CODE_MDUMP_GET_CONFIG 0x04 +#define DRV_MSG_CODE_MDUMP_SET_ENABLE 0x05 +#define DRV_MSG_CODE_MDUMP_CLEAR_LOGS 0x06 +#define DRV_MSG_CODE_MDUMP_GET_RETAIN 0x07 +#define DRV_MSG_CODE_MDUMP_CLR_RETAIN 0x08 + +#define DRV_MSG_CODE_HW_DUMP_TRIGGER0x0a +#define DRV_MSG_CODE_MDUMP_GEN_MDUMP2 0x0b +#define DRV_MSG_CODE_MDUMP_FREE_MDUMP2 0x0c + #define DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL 0x002b #define DRV_MSG_CODE_OS_WOL0x002e @@ -12697,6 +12721,8 @@ struct public_drv_mb { #define FW_MSG_CODE_DEBUG_NOT_ENABLED 0xb00a #define FW_MSG_CODE_DEBUG_DATA_SEND_OK 0xb00b +#define FW_MSG_CODE_MDUMP_INVALID_CMD 0x0003 + u32 fw_mb_param; #define FW_MB_PARAM_RESOURCE_ALLOC_VERSION_MAJOR_MASK 0x #define FW_MB_PARAM_RESOURCE_ALLOC_VERSION_MAJOR_SHIFT 16 @@ -12763,7 +12789,7 @@ enum MFW_DRV_MSG_TYPE { MFW_DRV_MSG_GET_RDMA_STATS, MFW_DRV_MSG_FAILURE_DETECTED, MFW_DRV_MSG_TRANSCEIVER_STATE_CHANGE, - MFW_DRV_MSG_BW_UPDATE11, + MFW_DRV_MSG_CRITICAL_ERROR_OCCURRED, MFW_DRV_MSG_RESERVED, MFW_DRV_MSG_GET_TLV_REQ, MFW_DRV_MSG_OEM_CFG_UPDATE, diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 0058e804efc3..8a0bbc7d4b24 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1717,6 +1717,116 @@ static void qed_mcp_handle_fan_failure(struct qed_hwfn *p_hwfn, "Fan failure was detected on the network interface card and it's going to be shut down.\n"); } +struct qed_mdump_cmd_params { + u32 cmd; + void *p_data_src; + u8 data_src_size; + void *p_data_dst; + u8 data_dst_size; + u32 mcp_resp; +}; + +static int +qed_mcp_mdump_cmd(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + struct qed_mdump_cmd_params *p_mdump_cmd_params) +{ + struct qed_mcp_mb_params mb_params; + int rc; + + memset(&mb_params, 0, sizeof(mb_params)); + mb_params.cmd = DRV_MSG_CODE_MDUMP_CMD; + 
mb_params.param = p_mdump_cmd_params->cmd; + mb_params.p_data_src = p_mdump_cmd_params->p_data_src; + mb_params.data_src_size = p_mdump_cmd_params->data_src_size; + mb_params.p_data_dst = p_mdump_cmd_params->p_data_dst; + mb_params.data_dst_size = p_mdump_cmd_params->data_dst_size; + rc = qed_mcp_cmd_and_union(p_hwfn, p_ptt, &mb_params); + if (rc) + return rc; + + p_mdump_cmd_params->mcp_resp = mb_params.mcp_resp; + + if (p_mdump_cmd_params->mcp_resp == FW_MSG_CODE_MDUMP_INVALID_CMD) { + DP_INFO(p_hwfn, + "The mdump sub command is unsupported by the MFW [mdump_cmd 0x%x]\n", + p_mdump_cmd_params->cmd); + rc = -EOPNOTSUPP; + } else if (p_mdump
[PATCH v2 net-next 05/11] net: qed: cleanup debug related declarations
Thats probably a legacy code had double declaration of some fields. Cleanup this, removing copy and fixing references. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: Igor Russkikh --- drivers/net/ethernet/qlogic/qed/qed.h | 11 +++-- drivers/net/ethernet/qlogic/qed/qed_debug.c | 26 ++--- 2 files changed, 16 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 12c40ce3d876..07f6ef930b52 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -740,12 +740,6 @@ struct qed_dbg_feature { u32 dumped_dwords; }; -struct qed_dbg_params { - struct qed_dbg_feature features[DBG_FEATURE_NUM]; - u8 engine_for_debug; - bool print_data; -}; - struct qed_dev { u32 dp_module; u8 dp_level; @@ -872,17 +866,18 @@ struct qed_dev { } protocol_ops; void*ops_cookie; - struct qed_dbg_params dbg_params; - #ifdef CONFIG_QED_LL2 struct qed_cb_ll2_info *ll2; u8 ll2_mac_address[ETH_ALEN]; #endif struct qed_dbg_feature dbg_features[DBG_FEATURE_NUM]; + u8 engine_for_debug; bool disable_ilt_dump; DECLARE_HASHTABLE(connections, 10); const struct firmware *firmware; + bool print_dbg_data; + u32 rdma_max_sge; u32 rdma_max_inline; u32 rdma_max_srq_sge; diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c index f4eebaabb6d0..57a0dab88431 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c @@ -7453,7 +7453,7 @@ static enum dbg_status format_feature(struct qed_hwfn *p_hwfn, enum qed_dbg_features feature_idx) { struct qed_dbg_feature *feature = - &p_hwfn->cdev->dbg_params.features[feature_idx]; + &p_hwfn->cdev->dbg_features[feature_idx]; u32 text_size_bytes, null_char_pos, i; enum dbg_status rc; char *text_buf; @@ -7502,7 +7502,7 @@ static enum dbg_status format_feature(struct qed_hwfn *p_hwfn, text_buf[i] = '\n'; /* Dump printable feature to log */ - if (p_hwfn->cdev->dbg_params.print_data) + if (p_hwfn->cdev->print_dbg_data) qed_dbg_print_feature(text_buf, text_size_bytes); /* Free the old dump_buf and point the dump_buf to the newly allocagted @@ -7523,7 +7523,7 @@ static enum dbg_status qed_dbg_dump(struct qed_hwfn *p_hwfn, enum qed_dbg_features feature_idx) { struct qed_dbg_feature *feature = - &p_hwfn->cdev->dbg_params.features[feature_idx]; + &p_hwfn->cdev->dbg_features[feature_idx]; u32 buf_size_dwords; enum dbg_status rc; @@ -7648,7 +7648,7 @@ static int qed_dbg_nvm_image(struct qed_dev *cdev, void *buffer, enum qed_nvm_images image_id) { struct qed_hwfn *p_hwfn = - &cdev->hwfns[cdev->dbg_params.engine_for_debug]; + &cdev->hwfns[cdev->engine_for_debug]; u32 len_rounded, i; __be32 val; int rc; @@ -7780,7 +7780,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) { u8 cur_engine, omit_engine = 0, org_engine; struct qed_hwfn *p_hwfn = - &cdev->hwfns[cdev->dbg_params.engine_for_debug]; + &cdev->hwfns[cdev->engine_for_debug]; struct dbg_tools_data *dev_data = &p_hwfn->dbg_info; int grc_params[MAX_DBG_GRC_PARAMS], i; u32 offset = 0, feature_size; @@ -8000,7 +8000,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer) int qed_dbg_all_data_size(struct qed_dev *cdev) { struct qed_hwfn *p_hwfn = - &cdev->hwfns[cdev->dbg_params.engine_for_debug]; + &cdev->hwfns[cdev->engine_for_debug]; u32 regs_len = 0, image_len = 0, ilt_len = 0, total_ilt_len = 0; u8 cur_engine, org_engine; @@ -8059,9 +8059,9 @@ int qed_dbg_feature(struct qed_dev *cdev, void *buffer, enum qed_dbg_features 
feature, u32 *num_dumped_bytes) { struct qed_hwfn *p_hwfn = - &cdev->hwfns[cdev->dbg_params.engine_for_debug]; + &cdev->hwfns[cdev->engine_for_debug]; struct qed_dbg_feature *qed_feature = - &cdev->dbg_params.features[feature]; + &cdev->dbg_features[feature]; enum dbg_status dbg_rc; struct qed_ptt *p_ptt; int rc = 0; @@ -8084,7 +8084,7 @@ int qed_dbg_feature(struct qed_dev *cdev, void *buffer, DP_VERBOSE(cdev, QED_MSG_DEBUG, "copying debugfs feature to external buffer\n"); memcpy(buffer, qed_feature->dump_buf, qed_feature->buf_size); -
RE: [PATCH 11/18] maccess: remove strncpy_from_unsafe
From: Daniel Borkmann > Sent: 14 May 2020 00:59 > On 5/14/20 1:28 AM, Al Viro wrote: > > On Thu, May 14, 2020 at 12:36:28AM +0200, Daniel Borkmann wrote: > > > >>> So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway > >>> try the user copy first, which seems odd. > >>> > >>> I'd really like to here from the bpf folks what the expected use case > >>> is here, and if the typical argument is kernel or user memory. > >> > >> It's used for both. Given this is enabled on pretty much all program > >> types, my > >> assumption would be that usage is still more often on kernel memory than > >> user one. > > > > Then it needs an argument telling it which one to use. Look at sparc64. > > Or s390. Or parisc. Et sodding cetera. > > > > The underlying model is that the kernel lives in a separate address space. > > Yes, on x86 it's actually sharing the page tables with userland, but that's > > not universal. The same address can be both a valid userland one _and_ > > a valid kernel one. You need to tell which one do you want. > > Yes, see also 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and > probe_read_{user, > kernel}_str helpers"), and my other reply wrt bpf_trace_printk() on how to > address > this. All I'm trying to say is that both bpf_probe_read() and > bpf_trace_printk() do > exist in this form since early [e]bpf days for ~5yrs now and while broken on > non-x86 > there are a lot of users on x86 for this in the wild, so they need to have a > chance > to migrate over to the new facilities before they are fully removed. If it's not a stupid question, why is a BPF program allowed to get into a situation where it might have an invalid kernel address? It all stinks of a hole that allows all of kernel memory to be read and copied to userspace. Now you might want to do something special so that BPF programs just abort on OOPS instead of possibly panicking the kernel. But that is different from a copy that expects to be passed garbage. David
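For completeness, the split helpers referenced above already let a program name the address space explicitly. A minimal libbpf-style sketch; the kprobe target and the traced argument are illustrative, not a recommendation:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/do_sys_openat2")
int BPF_KPROBE(trace_openat2, int dfd, const char *filename)
{
	char fname[64] = {};

	/* filename is a __user pointer: say so explicitly. */
	bpf_probe_read_user_str(fname, sizeof(fname), filename);

	/* Kernel structures would use the _kernel variants instead, e.g.:
	 * bpf_probe_read_kernel(&val, sizeof(val), &some_kernel_struct->field);
	 */
	bpf_printk("openat2: %s", fname);
	return 0;
}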
[PATCH net] pppoe: only process PADT targeted at local interfaces
We don't want to disconnect a session because of a stray PADT arriving while the interface is in promiscuous mode. Furthermore, multicast and broadcast packets make no sense here, so only PACKET_HOST is accepted. Reported-by: David Balažic Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault --- drivers/net/ppp/pppoe.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c index d760a36db28c..beedaad08255 100644 --- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -490,6 +490,9 @@ static int pppoe_disc_rcv(struct sk_buff *skb, struct net_device *dev, if (!skb) goto out; + if (skb->pkt_type != PACKET_HOST) + goto abort; + if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr))) goto abort; -- 2.21.1
Re: remove kernel_setsockopt and kernel_getsockopt
On Thu, May 14, 2020 at 08:29:30AM +, David Laight wrote: > You need to export functions that do most of the socket options > for all protocols. Only for those where we have users, and all those are covered.
Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
On 5/14/20 12:01 PM, David Laight wrote: [...] If it's not a stupid question why is a BPF program allowed to get into a situation where it might have an invalid kernel address. It all stinks of a hole that allows all of kernel memory to be read and copied to userspace. Now you might want to something special so that BPF programs just abort on OOPS instead of possibly paniking the kernel. But that is different from a copy that expects to be passed garbage. I suggest you read up on probe_kernel_read() and its uses in tracing in general, looks like you haven't done that.
RE: remove kernel_setsockopt and kernel_getsockopt
From: Christoph Hellwig > Only for those where we have users, and all those are covered. What do we tell all our users when our kernel SCTP code no longer works? It uses SO_REUSEADDR, SCTP_EVENTS, SCTP_NODELAY, SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD and SO_LINGER. We should probably use the CONNECTX function as well. I doubt we are the only company with out-of-tree drivers that use the kernel_socket interface. David
Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
On 5/14/20 11:44 AM, Masami Hiramatsu wrote: On Wed, 13 May 2020 19:43:24 -0700 Linus Torvalds wrote: On Wed, May 13, 2020 at 6:00 PM Masami Hiramatsu wrote: But we should likely at least disallow it entirely on platforms where we really can't - or pick one hardcoded choice. On sparc, you really _have_ to specify one or the other. OK. BTW, is there any way to detect the kernel/user space overlap on memory layout statically? If there, I can do it. (I don't like "if (CONFIG_X86)" thing) Or, maybe we need CONFIG_ARCH_OVERLAP_ADDRESS_SPACE? I think it would be better to have a CONFIG variable that architectures can just 'select' to show that they are ok with separate kernel and user addresses. Because I don't think we have any way to say that right now as-is. You can probably come up with hacky ways to approximate it, ie something like if (TASK_SIZE_MAX > PAGE_OFFSET) they overlap .. which would almost work, but.. It seems TASK_SIZE_MAX is defined only on x86 and s390, what about comparing STACK_TOP_MAX with PAGE_OFFSET ? Anyway, I agree that the best way is introducing a CONFIG. Agree, CONFIG knob that archs can select feels cleanest. Fwiw, I've cooked up fixes for bpf side locally here and finishing up testing, will push out later today. Thanks, Daniel
Re: [PATCH stable-5.4.y] net: dsa: Do not make user port errors fatal
On Wed, May 13, 2020 at 12:55:46PM -0700, David Miller wrote: > From: Florian Fainelli > Date: Wed, 13 May 2020 10:41:45 -0700 > > > commit 86f8b1c01a0a537a73d2996615133be63cdf75db upstream > > > > Prior to 1d27732f411d ("net: dsa: setup and teardown ports"), we would > > not treat failures to set-up an user port as fatal, but after this > > commit we would, which is a regression for some systems where interfaces > > may be declared in the Device Tree, but the underlying hardware may not > > be present (pluggable daughter cards for instance). > > > > Fixes: 1d27732f411d ("net: dsa: setup and teardown ports") > > Signed-off-by: Florian Fainelli > > Reviewed-by: Andrew Lunn > > Signed-off-by: David S. Miller > > Greg, please queue this up. Now queued up, thanks. greg k-h
Re: [PATCH 29/33] rxrpc_sock_set_min_security_level
On Wed, May 13, 2020 at 02:13:07PM +0100, David Howells wrote: > Christoph Hellwig wrote: > > > +int rxrpc_sock_set_min_security_level(struct sock *sk, unsigned int val); > > + > > Looks good - but you do need to add this to Documentation/networking/rxrpc.txt > also, thanks. That file doesn't exist; instead we now have Documentation/networking/rxrpc.rst in weird markup. Where do you want this to be added, and with what text? Remember I don't really know what this thing does, I just provide a shortcut.
[PATCH 1/3] net: stmmac: gmac3: add auxiliary snapshot support
From: Artem Panfilov This patch adds support for time stamping external inputs for GMAC3. The documentation defines 4 auxiliary snapshots ATSEN0 to ATSEN3 which can be toggled by writing the Timestamp Control Register. When the gmac detects a pps rising edge on one of it's auxiliary inputs, an isr of type GMAC_INT_STATUS_TSTAMP will be triggered. We use this isr to generate a ptp clock event of type PTP_CLOCK_EXTTS with the following content: - Time of which the event occurred in ns. - All the extts for which the event was generated ( - ) Note from the documentation: "When more than one bit is set at the same time, it means that corresponding auxiliary triggers were sampled at the same clock" When the GMAC writes it's auxiliary snapshots on it's fifo and that fifo is full, it will discard any new auxiliary snapshot until we read the fifo. By reading on each isr, it is unlikely that we will loose the 1pps external timestamps. Events for one auxiliary input can be requested through the PTP_EXTTS_REQUEST ioctl and read as already implemented in the uapi. This patch introduces 2 functions: stmmac_set_hw_tstamping and stmmac_get_hw_tstamping Each time we initialize the timestamping, we read the current value of PTP_TCR and patch with new configuration without setting the ATSENX flags again, which are set independently by the user through the ioctl. This allows to not loose the activated external events between each initialization of the timestamping, and not force the user to redo ioctl. Signed-off-by: Olivier Dautricourt Signed-off-by: Artem Panfilov --- .../net/ethernet/stmicro/stmmac/dwmac1000.h | 3 +- .../ethernet/stmicro/stmmac/dwmac1000_core.c | 24 ++ drivers/net/ethernet/stmicro/stmmac/hwif.h| 9 ++-- .../ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 10 - .../net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +-- .../net/ethernet/stmicro/stmmac/stmmac_ptp.c | 44 +++ .../net/ethernet/stmicro/stmmac/stmmac_ptp.h | 20 + 7 files changed, 107 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h index b70d44ac0990..5cff6c100258 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h @@ -41,8 +41,7 @@ #defineGMAC_INT_DISABLE_PCS(GMAC_INT_DISABLE_RGMII | \ GMAC_INT_DISABLE_PCSLINK | \ GMAC_INT_DISABLE_PCSAN) -#defineGMAC_INT_DEFAULT_MASK (GMAC_INT_DISABLE_TIMESTAMP | \ -GMAC_INT_DISABLE_PCS) +#defineGMAC_INT_DEFAULT_MASK GMAC_INT_DISABLE_PCS /* PMT Control and Status */ #define GMAC_PMT 0x002c diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index efc6ec1b8027..3895fe9396e5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -20,6 +20,7 @@ #include "stmmac.h" #include "stmmac_pcs.h" #include "dwmac1000.h" +#include "stmmac_ptp.h" static void dwmac1000_core_init(struct mac_device_info *hw, struct net_device *dev) @@ -300,9 +301,29 @@ static void dwmac1000_rgsmii(void __iomem *ioaddr, struct stmmac_extra_stats *x) } } +static void dwmac1000_ptp_isr(struct stmmac_priv *priv) +{ + struct ptp_clock_event event; + u32 reg_value; + u64 ns; + + reg_value = readl(priv->ioaddr + PTP_GMAC3_TSR); + + ns = readl(priv->ioaddr + PTP_GMAC3_AUXTSLO); + ns += readl(priv->ioaddr + PTP_GMAC3_AUXTSHI) * 10ULL; + + event.timestamp = ns; + event.type = PTP_CLOCK_EXTTS; + event.index = (reg_value & PTP_GMAC3_ATSSTN_MASK) >> + 
PTP_GMAC3_ATSSTN_SHIFT; + ptp_clock_event(priv->ptp_clock, &event); +} + static int dwmac1000_irq_status(struct mac_device_info *hw, struct stmmac_extra_stats *x) { + struct stmmac_priv *priv = + container_of(x, struct stmmac_priv, xstats); void __iomem *ioaddr = hw->pcsr; u32 intr_status = readl(ioaddr + GMAC_INT_STATUS); u32 intr_mask = readl(ioaddr + GMAC_INT_MASK); @@ -324,6 +345,9 @@ static int dwmac1000_irq_status(struct mac_device_info *hw, x->irq_receive_pmt_irq_n++; } + if (intr_status & GMAC_INT_STATUS_TSTAMP) + dwmac1000_ptp_isr(priv); + /* MAC tx/rx EEE LPI entry/exit interrupts */ if (intr_status & GMAC_INT_STATUS_LPIIS) { /* Clean LPI interrupt by reading the Reg 12 */ diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h index ffe2d63389b8..8fa63d059231 100644 --- a/drivers/net/ethernet/stmicro/stmmac/hwif.h +++ b/drivers/net/ethernet/stmicro
[PATCH 0/3] Patch series for a PTP Grandmaster use case using stmmac/gmac3 ptp clock
This patch series covers a use case where an embedded system is disciplining an internal clock to a GNSS signal, which provides a stable frequency, and wants to act as a PTP Grandmaster by disciplining a ptp clock to this internal clock.

In our setup a 10 MHz oscillator is frequency adjusted so that a derived pps from that oscillator is in phase with the pps generated by a GNSS receiver. Another derived clock from the same disciplined oscillator is used as ptp_clock for the ethernet MAC. The internal pps of the system is forwarded to one of the auxiliary inputs of the MAC. Initially the MAC time registers are considered random. We want the MAC nanosecond field to be 0 on the auxiliary pps input edge.

PATCH 1/3: The stmmac gmac3 version used in the setup is patched to retrieve a timestamp at the rising edge of the aux input and to forward it to userspace. What matters here is that we get the subsecond offset between the aux edge and the edge of the PHC's pps.

PATCH 2,3/3: We want the ptp clock to be in time with our aux pps input. Since the ptp clock is derived from the system oscillator, we don't want to do frequency adjustments. The stmmac driver is patched to allow setting the coarse correction mode, which avoids continuously adjusting the frequency of the MAC (the default behavior) and instead makes a single time adjustment. We calculate the time difference between the MAC and the internal clock, and adjust the ptp clock time with the clock_adjtime syscall.

To summarize this in a user-space program:

#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/timex.h>
#include <linux/sockios.h>
#include <linux/net_tstamp.h>
#include <linux/ptp_clock.h>

#define NS_PER_SEC 1000000000LL
#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) \
	((clockid_t) ((((unsigned int) ~fd) << 3) | CLOCKFD))

static inline int clock_adjtime(clockid_t id, struct timex *tx)
{
	return syscall(__NR_clock_adjtime, id, tx);
}

int main(void)
{
	int fd;
	struct timex tx = {0};
	struct ifreq ifreq = {0};
	struct hwtstamp_config cfg = {0};
	struct ptp_extts_event event = {0};
	struct ptp_extts_request extts_request = {
		.index = 0,
		.flags = PTP_RISING_EDGE | PTP_ENABLE_FEATURE
	};
	const char *iface = "eth0";
	const char *ptp_dev = "/dev/ptp2";

	strncpy(ifreq.ifr_name, iface, sizeof(ifreq.ifr_name) - 1);
	ifreq.ifr_data = (void *) &cfg;

	fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (fd < 0)
		return 1;

	if (ioctl(fd, SIOCGHWTSTAMP, &ifreq) < 0)
		return 1;

	// Activate coarse mode for stmmac
	cfg.flags |= HWTSTAMP_FLAGS_ADJ_COARSE;
	cfg.flags &= ~HWTSTAMP_FLAGS_ADJ_FINE;

	if (ioctl(fd, SIOCSHWTSTAMP, &ifreq) < 0)
		return 1;

	fd = open(ptp_dev, O_RDWR);
	if (fd < 0)
		return 1;

	// Enable extts input index 0
	if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request) < 0)
		return 1;

	// Read extts
	if (read(fd, &event, sizeof(event)) != sizeof(event))
		return 1;

	// Correct phc time subsecond: note that this does not correct the phc
	// second count for concision. The delta is (event.t.nsec - NS_PER_SEC).
	tx.modes = ADJ_SETOFFSET | ADJ_NANO;
	tx.time.tv_sec = -1;
	tx.time.tv_usec = event.t.nsec;
	if (clock_adjtime(FD_TO_CLOCKID(fd), &tx))
		return 1;

	// Disable extts index 0
	extts_request.index = 0;
	extts_request.flags = 0;
	if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request) < 0)
		return 1;

	return 0;
}

Artem Panfilov (1):
  net: stmmac: GMAC3: add auxiliary snapshot support

Olivier Dautricourt (2):
  net: uapi: Add HWTSTAMP_FLAGS_ADJ_FINE/ADJ_COARSE
  net: stmmac: Support coarse mode through ioctl

 .../net/ethernet/stmicro/stmmac/dwmac1000.h   |  3 +-
 .../ethernet/stmicro/stmmac/dwmac1000_core.c  | 24 ++
 drivers/net/ethernet/stmicro/stmmac/hwif.h    |  9 ++--
 .../ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 10 +++-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 21 ++---
 .../net/ethernet/stmicro/stmmac/stmmac_ptp.c  | 47 +++
 .../net/ethernet/stmicro/stmmac/stmmac_ptp.h  | 20
 include/uapi/linux/net_tstamp.h               | 12 +
 net/core/dev_ioctl.c                          |  3 --
 9 files changed, 133 insertions(+), 16 deletions(-)

-- 
2.17.1
[PATCH 3/3] net: stmmac: Support coarse mode through ioctl
This commit enables the coarse correction mode for the stmmac driver. The coarse mode allows the system time to be updated in a single operation. The required time adjustment is written in the Timestamp Update registers while the Sub-second increment register is programmed with the period of the clock, which is the precision of our correction. The fine adjustment mode remains the default behavior of the driver. One should use the HWTSTAMP_FLAGS_ADJ_COARSE flag while calling the SIOCSHWTSTAMP ioctl to enable coarse mode for the stmmac driver. Signed-off-by: Olivier Dautricourt --- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 17 + .../net/ethernet/stmicro/stmmac/stmmac_ptp.c| 3 +++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index c39fafe69b12..f46503b086f4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -541,9 +541,12 @@ static int stmmac_hwtstamp_set(struct net_device *dev, struct ifreq *ifr) netdev_dbg(priv->dev, "%s config flags:0x%x, tx_type:0x%x, rx_filter:0x%x\n", __func__, config.flags, config.tx_type, config.rx_filter); - /* reserved for future extensions */ - if (config.flags) - return -EINVAL; + if (config.flags != HWTSTAMP_FLAGS_ADJ_COARSE) { + /* Defaulting to fine adjustment for compatibility */ + netdev_dbg(priv->dev, "%s defaulting to fine adjustment mode\n", + __func__); + config.flags = HWTSTAMP_FLAGS_ADJ_FINE; + } if (config.tx_type != HWTSTAMP_TX_OFF && config.tx_type != HWTSTAMP_TX_ON) @@ -689,10 +692,16 @@ static int stmmac_hwtstamp_set(struct net_device *dev, struct ifreq *ifr) stmmac_set_hw_tstamping(priv, priv->ptpaddr, 0); else { stmmac_get_hw_tstamping(priv, priv->ptpaddr, &value); - value |= (PTP_TCR_TSENA | PTP_TCR_TSCFUPDT | PTP_TCR_TSCTRLSSR | + value |= (PTP_TCR_TSENA | PTP_TCR_TSCTRLSSR | tstamp_all | ptp_v2 | ptp_over_ethernet | ptp_over_ipv6_udp | ptp_over_ipv4_udp | ts_event_en | ts_master_en | snap_type_sel); + + if (config.flags == HWTSTAMP_FLAGS_ADJ_FINE) + value |= PTP_TCR_TSCFUPDT; + else + value &= ~PTP_TCR_TSCFUPDT; + stmmac_set_hw_tstamping(priv, priv->ptpaddr, value); /* program Sub Second Increment reg */ diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c index 920f0f3ebbca..7fb318441015 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c @@ -27,6 +27,9 @@ static int stmmac_adjust_freq(struct ptp_clock_info *ptp, s32 ppb) int neg_adj = 0; u64 adj; + if (priv->tstamp_config.flags != HWTSTAMP_FLAGS_ADJ_FINE) + return -EPERM; + if (ppb < 0) { neg_adj = 1; ppb = -ppb; -- 2.17.1
[PATCH 2/3] net: uapi: Add HWTSTAMP_FLAGS_ADJ_FINE/ADJ_COARSE
This commit allows a user to specify a flag value for configuring timestamping through the hwtstamp_config structure. New flags introduced:

HWTSTAMP_FLAGS_NONE = 0
  No flag specified: as it is of value 0, this selects the default behavior for all existing drivers and should not break existing userland programs.

HWTSTAMP_FLAGS_ADJ_FINE = 1
  Use the fine adjustment mode. Fine adjustment mode is usually used for precise frequency adjustments.

HWTSTAMP_FLAGS_ADJ_COARSE = 2
  Use the coarse adjustment mode. Coarse adjustment mode is usually used for direct phase correction.

Signed-off-by: Olivier Dautricourt --- include/uapi/linux/net_tstamp.h | 12 net/core/dev_ioctl.c| 3 --- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index 7ed0b3d1c00a..0cfcd490228f 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -65,6 +65,18 @@ struct hwtstamp_config { int rx_filter; }; +/* possible values for hwtstamp_config->flags */ +enum hwtstamp_flags { + /* No special flag specified */ + HWTSTAMP_FLAGS_NONE, + + /* Enable fine adjustment mode if the driver supports it */ + HWTSTAMP_FLAGS_ADJ_FINE, + + /* Enable coarse adjustment mode if the driver supports it */ + HWTSTAMP_FLAGS_ADJ_COARSE, +}; + /* possible values for hwtstamp_config->tx_type */ enum hwtstamp_tx_types { /* diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 547b587c1950..017671545d45 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -177,9 +177,6 @@ static int net_hwtstamp_validate(struct ifreq *ifr) if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg))) return -EFAULT; - if (cfg.flags) /* reserved for future extensions */ - return -EINVAL; - tx_type = cfg.tx_type; rx_filter = cfg.rx_filter; -- 2.17.1
Re: [PATCH 20/33] ipv4: add ip_sock_set_recverr
On Wed, May 13, 2020 at 02:00:43PM -0700, Joe Perches wrote: > On Wed, 2020-05-13 at 08:26 +0200, Christoph Hellwig wrote: > > Add a helper to directly set the IP_RECVERR sockopt from kernel space > > without going through a fake uaccess. > > This seems used only with true as the second arg. > Is there reason to have that argument at all? Mostly to keep it symmetric with the sockopt. I could probably remove a few arguments in the series if we want to be strict.
Re: remove kernel_setsockopt and kernel_getsockopt
On Thu, May 14, 2020 at 10:26:41AM +, David Laight wrote: > From: Christoph Hellwig > > Only for those were we have users, and all those are covered. > > What do we tell all our users when our kernel SCTP code > no longer works? We only care about in-tree modules, just like for every other interface in the kernel.
is it ok to always pull in sctp for dlm, was: Re: [PATCH 27/33] sctp: export sctp_setsockopt_bindx
On Wed, May 13, 2020 at 03:00:58PM -0300, Marcelo Ricardo Leitner wrote: > On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote: > > And call it directly from dlm instead of going through kernel_setsockopt. > > The advantage on using kernel_setsockopt here is that sctp module will > only be loaded if dlm actually creates a SCTP socket. With this > change, sctp will be loaded on setups that may not be actually using > it. It's a quite big module and might expose the system. > > I'm okay with the SCTP changes, but I'll defer to DLM folks to whether > that's too bad or what for DLM. So for ipv6 I could just move the helpers inline as they were trivial and avoid that issue. But some of the sctp stuff really is way too big for that, so the only other option would be to use symbol_get.
[PATCH net-next v4 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
Use hole in struct xdp_frame, when adding member frame_sz, which keeps same sizeof struct (32 bytes) Drivers ixgbe and sfc had bug cases where the necessary/expected tailroom was not reserved. This can lead to some hard to catch memory corruption issues. Having the drivers frame_sz this can be detected when packet length/end via xdp->data_end exceed the xdp_data_hard_end pointer, which accounts for the reserved the tailroom. When detecting this driver issue, simply fail the conversion with NULL, which results in feedback to driver (failing xdp_do_redirect()) causing driver to drop packet. Given the lack of consistent XDP stats, this can be hard to troubleshoot. And given this is a driver bug, we want to generate some more noise in form of a WARN stack dump (to ID the driver code that inlined convert_to_xdp_frame). Inlining the WARN macro is problematic, because it adds an asm instruction (on Intel CPUs ud2) what influence instruction cache prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this and at the same time make identifying the function and line of this inlined function easier. Signed-off-by: Jesper Dangaard Brouer Acked-by: Toke Høiland-Jørgensen --- include/net/xdp.h | 14 +- net/core/xdp.c|8 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index a764af4ae0ea..3094fccf5a88 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -89,7 +89,8 @@ struct xdp_frame { void *data; u16 len; u16 headroom; - u16 metasize; + u32 metasize:8; + u32 frame_sz:24; /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time, * while mem info is valid on remote CPU. */ @@ -104,6 +105,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame) frame->dev_rx = NULL; } +/* Avoids inlining WARN macro in fast-path */ +void xdp_warn(const char *msg, const char *func, const int line); +#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__) + struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp); /* Convert xdp_buff to xdp_frame */ @@ -124,6 +129,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) if (unlikely((headroom - metasize) < sizeof(*xdp_frame))) return NULL; + /* Catch if driver didn't reserve tailroom for skb_shared_info */ + if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) { + XDP_WARN("Driver BUG: missing reserved tailroom"); + return NULL; + } + /* Store info in top of packet */ xdp_frame = xdp->data_hard_start; @@ -131,6 +142,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) xdp_frame->len = xdp->data_end - xdp->data; xdp_frame->headroom = headroom - sizeof(*xdp_frame); xdp_frame->metasize = metasize; + xdp_frame->frame_sz = xdp->frame_sz; /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */ xdp_frame->mem = xdp->rxq->mem; diff --git a/net/core/xdp.c b/net/core/xdp.c index 4c7ea85486af..490b8f5fa8ee 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -496,3 +497,10 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp) return xdpf; } EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame); + +/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */ +void xdp_warn(const char *msg, const char *func, const int line) +{ + WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg); +}; +EXPORT_SYMBOL_GPL(xdp_warn);
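As a reading aid, the size invariant stated above ("keeps same sizeof struct (32 bytes)") could be documented with a compile-time guard like the one below; this guard is hypothetical and not part of the patch, and the 32-byte figure applies to 64-bit builds:

/* Hypothetical guard, not in the patch: metasize shrinks from u16 to an
 * 8-bit bitfield and frame_sz:24 fills the former 2-byte hole, so the
 * struct stays 32 bytes on 64-bit. */
BUILD_BUG_ON(sizeof(struct xdp_frame) != 32);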
[PATCH net-next v4 00/33] XDP extend with knowledge of frame size
(Patchset based on net-next due to all the driver updates) V4: - Fixup checkpatch.pl issues - Collected more ACKs V3: - Fix issue on virtio_net patch spotted by Jason Wang - Adjust name for variable in mlx5 patch - Collected more ACKs V2: - Fix bug in mlx5 for XDP_PASS case - Collected nitpicks and ACKs from mailing list V1: - Fix bug in dpaa2 XDP have evolved to support several frame sizes, but xdp_buff was not updated with this information. This have caused the side-effect that XDP frame data hard end is unknown. This have limited the BPF-helper bpf_xdp_adjust_tail to only shrink the packet. This patchset address this and add packet tail extend/grow. The purpose of the patchset is ALSO to reserve a memory area that can be used for storing extra information, specifically for extending XDP with multi-buffer support. One proposal is to use same layout as skb_shared_info, which is why this area is currently 320 bytes. When converting xdp_frame to SKB (veth and cpumap), the full tailroom area can now be used and SKB truesize is now correct. For most drivers this result in a much larger tailroom in SKB "head" data area. The network stack can now take advantage of this when doing SKB coalescing. Thus, a good driver test is to use xdp_redirect_cpu from samples/bpf/ and do some TCP stream testing. Use-cases for tail grow/extend: (1) IPsec / XFRM needs a tail extend[1][2]. (2) DNS-cache responses in XDP. (3) HAProxy ALOHA would need it to convert to XDP. (4) Add tail info e.g. timestamp and collect via tcpdump [1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf [2] http://vger.kernel.org/netconf2019.html Examples on howto access the tail area of an XDP packet is shown in the XDP-tutorial example[3]. [3] https://github.com/xdp-project/xdp-tutorial/blob/master/experiment01-tailgrow/ --- Ilias Apalodimas (1): net: netsec: Add support for XDP frame size Jesper Dangaard Brouer (32): xdp: add frame size to xdp_buff bnxt: add XDP frame size to driver sfc: add XDP frame size mvneta: add XDP frame size to driver net: XDP-generic determining XDP frame size xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame xdp: cpumap redirect use frame_sz and increase skb_tailroom veth: adjust hard_start offset on redirect XDP frames veth: xdp using frame_sz in veth driver dpaa2-eth: add XDP frame size hv_netvsc: add XDP frame size to driver qlogic/qede: add XDP frame size to driver net: ethernet: ti: add XDP frame size to driver cpsw ena: add XDP frame size to amazon NIC driver mlx4: add XDP frame size and adjust max XDP MTU net: thunderx: add XDP frame size nfp: add XDP frame size to netronome driver tun: add XDP frame size vhost_net: also populate XDP frame size virtio_net: add XDP frame size in two code paths ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K ixgbe: add XDP frame size to driver ixgbevf: add XDP frame size to VF driver i40e: add XDP frame size to driver ice: add XDP frame size to driver xdp: for Intel AF_XDP drivers add XDP frame_sz mlx5: rx queue setup time determine frame_sz for XDP xdp: allow bpf_xdp_adjust_tail() to grow packet size xdp: clear grow memory in bpf_xdp_adjust_tail() bpf: add xdp.frame_sz in bpf_prog_test_run_xdp(). 
selftests/bpf: adjust BPF selftest for xdp_adjust_tail selftests/bpf: xdp_adjust_tail add grow tail tests drivers/net/ethernet/amazon/ena/ena_netdev.c |1 drivers/net/ethernet/amazon/ena/ena_netdev.h |5 - drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |1 drivers/net/ethernet/cavium/thunder/nicvf_main.c |1 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |7 + drivers/net/ethernet/intel/i40e/i40e_txrx.c| 30 - drivers/net/ethernet/intel/i40e/i40e_xsk.c |2 drivers/net/ethernet/intel/ice/ice_txrx.c | 34 -- drivers/net/ethernet/intel/ice/ice_xsk.c |2 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 33 - drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |2 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 34 -- drivers/net/ethernet/marvell/mvneta.c | 25 ++-- drivers/net/ethernet/mellanox/mlx4/en_netdev.c |3 drivers/net/ethernet/mellanox/mlx4/en_rx.c |1 drivers/net/ethernet/mellanox/mlx5/core/en.h |1 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |1 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |6 + drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|2 .../net/ethernet/netronome/nfp/nfp_net_common.c|6 + drivers/net/ethernet/qlogic/qede/qede_fp.c |1 drivers/net/ethernet/qlogic/qede/qede_main.c |2 drivers/net/ethernet/sfc/rx.c |1 drivers/net/ethernet/socionext/netsec.c| 30 +
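As a rough illustration of the tail-grow capability this series enables (a minimal sketch, not part of the series, assuming a standard libbpf build environment), an XDP program can request extra tailroom and simply fall back when the reserved tailroom is exhausted:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_grow_tail(struct xdp_md *ctx)
{
	/* Request 32 extra bytes at the tail; with this series the helper
	 * can grow the packet as long as data_end stays below the
	 * reserved skb_shared_info area (xdp_data_hard_end). */
	if (bpf_xdp_adjust_tail(ctx, 32))
		return XDP_PASS;	/* no tailroom left, leave packet as-is */

	/* ctx->data / ctx->data_end must be re-read after the helper;
	 * previously derived packet pointers are invalidated. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";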
[PATCH net-next v4 05/33] net: netsec: Add support for XDP frame size
From: Ilias Apalodimas This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that can help reduce the number of cache-lines that need to be flushed when doing DMA sync for_device. Due to xdp_adjust_tail can grow the area accessible to the by the CPU (can possibly write into), then max sync length *after* bpf_prog_run_xdp() needs to be taken into account. For XDP_TX action the driver is smart and does DMA-sync. When growing tail this is still safe, because page_pool have DMA-mapped the entire page size. Signed-off-by: Ilias Apalodimas Signed-off-by: Jesper Dangaard Brouer Acked-by: Lorenzo Bianconi --- drivers/net/ethernet/socionext/netsec.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c index a5a0fb60193a..e1f4be4b3d69 100644 --- a/drivers/net/ethernet/socionext/netsec.c +++ b/drivers/net/ethernet/socionext/netsec.c @@ -884,23 +884,28 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog, struct xdp_buff *xdp) { struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX]; - unsigned int len = xdp->data_end - xdp->data; + unsigned int sync, len = xdp->data_end - xdp->data; u32 ret = NETSEC_XDP_PASS; + struct page *page; int err; u32 act; act = bpf_prog_run_xdp(prog, xdp); + /* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */ + sync = xdp->data_end - xdp->data_hard_start - NETSEC_RXBUF_HEADROOM; + sync = max(sync, len); + switch (act) { case XDP_PASS: ret = NETSEC_XDP_PASS; break; case XDP_TX: ret = netsec_xdp_xmit_back(priv, xdp); - if (ret != NETSEC_XDP_TX) - page_pool_put_page(dring->page_pool, - virt_to_head_page(xdp->data), len, - true); + if (ret != NETSEC_XDP_TX) { + page = virt_to_head_page(xdp->data); + page_pool_put_page(dring->page_pool, page, sync, true); + } break; case XDP_REDIRECT: err = xdp_do_redirect(priv->ndev, xdp, prog); @@ -908,9 +913,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog, ret = NETSEC_XDP_REDIR; } else { ret = NETSEC_XDP_CONSUMED; - page_pool_put_page(dring->page_pool, - virt_to_head_page(xdp->data), len, - true); + page = virt_to_head_page(xdp->data); + page_pool_put_page(dring->page_pool, page, sync, true); } break; default: @@ -921,8 +925,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog, /* fall through -- handle aborts by dropping packet */ case XDP_DROP: ret = NETSEC_XDP_CONSUMED; - page_pool_put_page(dring->page_pool, - virt_to_head_page(xdp->data), len, true); + page = virt_to_head_page(xdp->data); + page_pool_put_page(dring->page_pool, page, sync, true); break; } @@ -936,10 +940,14 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget) struct netsec_rx_pkt_info rx_info; enum dma_data_direction dma_dir; struct bpf_prog *xdp_prog; + struct xdp_buff xdp; u16 xdp_xmit = 0; u32 xdp_act = 0; int done = 0; + xdp.rxq = &dring->xdp_rxq; + xdp.frame_sz = PAGE_SIZE; + rcu_read_lock(); xdp_prog = READ_ONCE(priv->xdp_prog); dma_dir = page_pool_get_dma_dir(dring->page_pool); @@ -953,7 +961,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget) struct sk_buff *skb = NULL; u16 pkt_len, desc_len; dma_addr_t dma_handle; - struct xdp_buff xdp; void *buf_addr; if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) { @@ -1002,7 +1009,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget) xdp.data = desc->addr + NETSEC_RXBUF_HEADROOM; xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + pkt_len; - xdp.rxq = 
&dring->xdp_rxq; if (xdp_prog) { xdp_result = netsec_run_xdp(priv, xdp_prog, &xdp);
[PATCH net-next v4 06/33] net: XDP-generic determining XDP frame size
The SKB "head" pointer points to the data area that contains skb_shared_info, that can be found via skb_end_pointer(). Given xdp->data_hard_start have been established (basically pointing to skb->head), frame size is between skb_end_pointer() and data_hard_start, plus the size reserved to skb_shared_info. Change the bpf_xdp_adjust_tail offset adjust of skb->len, to be a positive offset number on grow, and negative number on shrink. As this seems more natural when reading the code. Signed-off-by: Jesper Dangaard Brouer Acked-by: Toke Høiland-Jørgensen --- net/core/dev.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 4c91de39890a..f937a3ff668d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4617,6 +4617,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb, xdp->data_meta = xdp->data; xdp->data_end = xdp->data + hlen; xdp->data_hard_start = skb->data - skb_headroom(skb); + + /* SKB "head" area always have tailroom for skb_shared_info */ + xdp->frame_sz = (void *)skb_end_pointer(skb) - xdp->data_hard_start; + xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + orig_data_end = xdp->data_end; orig_data = xdp->data; eth = (struct ethhdr *)xdp->data; @@ -4640,14 +4645,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb, skb_reset_network_header(skb); } - /* check if bpf_xdp_adjust_tail was used. it can only "shrink" -* pckt. -*/ - off = orig_data_end - xdp->data_end; + /* check if bpf_xdp_adjust_tail was used */ + off = xdp->data_end - orig_data_end; if (off != 0) { skb_set_tail_pointer(skb, xdp->data_end - xdp->data); - skb->len -= off; - + skb->len += off; /* positive on grow, negative on shrink */ } /* check if XDP changed eth hdr such SKB needs update */
[PATCH net-next v4 03/33] sfc: add XDP frame size
This driver uses RX page-split when possible. It was recently fixed in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to add needed tailroom for XDP-redirect. After the fix efx->rx_page_buf_step is the frame size, with enough head and tail-room for XDP-redirect. Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/sfc/rx.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c index 260352d97d9d..68c47a8c71df 100644 --- a/drivers/net/ethernet/sfc/rx.c +++ b/drivers/net/ethernet/sfc/rx.c @@ -308,6 +308,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + rx_buf->len; xdp.rxq = &rx_queue->xdp_rxq_info; + xdp.frame_sz = efx->rx_page_buf_step; xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp); rcu_read_unlock();
[PATCH net-next v4 02/33] bnxt: add XDP frame size to driver
This driver uses full PAGE_SIZE pages when XDP is enabled. In case of XDP uses driver uses __bnxt_alloc_rx_page which does full page DMA-map. Thus, xdp_adjust_tail grow is DMA compliant for XDP_TX action that does DMA-sync. Cc: Michael Chan Cc: Andy Gospodarek Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Andy Gospodarek --- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index c6f6f2033880..5e3b4a3b69ea 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, xdp_set_data_meta_invalid(&xdp); xdp.data_end = *data_ptr + *len; xdp.rxq = &rxr->xdp_rxq; + xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */ orig_data = xdp.data; rcu_read_lock();
This driver uses full PAGE_SIZE pages when XDP is enabled. When XDP is in use, the driver uses __bnxt_alloc_rx_page, which does a full page DMA-map. Thus, xdp_adjust_tail grow is DMA compliant for the XDP_TX action that does DMA-sync. Cc: Michael Chan Cc: Andy Gospodarek Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Andy Gospodarek --- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index c6f6f2033880..5e3b4a3b69ea 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, xdp_set_data_meta_invalid(&xdp); xdp.data_end = *data_ptr + *len; xdp.rxq = &rxr->xdp_rxq; + xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */ orig_data = xdp.data; rcu_read_lock();
[PATCH net-next v4 09/33] veth: adjust hard_start offset on redirect XDP frames
When native XDP redirect into a veth device, the frame arrives in the xdp_frame structure. It is then processed in veth_xdp_rcv_one(), which can run a new XDP bpf_prog on the packet. Doing so requires converting xdp_frame to xdp_buff, but the tricky part is that xdp_frame memory area is located in the top (data_hard_start) memory area that xdp_buff will point into. The current code tried to protect the xdp_frame area, by assigning xdp_buff.data_hard_start past this memory. This results in 32 bytes less headroom to expand into via BPF-helper bpf_xdp_adjust_head(). This protect step is actually not needed, because BPF-helper bpf_xdp_adjust_head() already reserve this area, and don't allow BPF-prog to expand into it. Thus, it is safe to point data_hard_start directly at xdp_frame memory area. Cc: Toshiaki Makita Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") Reported-by: Mao Wenan Signed-off-by: Jesper Dangaard Brouer Acked-by: Toshiaki Makita Acked-by: Toke Høiland-Jørgensen --- drivers/net/veth.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index aece0e5eec8c..d5691bb84448 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -564,13 +564,15 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, struct veth_stats *stats) { void *hard_start = frame->data - frame->headroom; - void *head = hard_start - sizeof(struct xdp_frame); int len = frame->len, delta = 0; struct xdp_frame orig_frame; struct bpf_prog *xdp_prog; unsigned int headroom; struct sk_buff *skb; + /* bpf_xdp_adjust_head() assures BPF cannot access xdp_frame area */ + hard_start -= sizeof(struct xdp_frame); + rcu_read_lock(); xdp_prog = rcu_dereference(rq->xdp_prog); if (likely(xdp_prog)) { @@ -592,7 +594,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, break; case XDP_TX: orig_frame = *frame; - xdp.data_hard_start = head; xdp.rxq->mem = frame->mem; if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) { trace_xdp_exception(rq->dev, xdp_prog, act); @@ -605,7 +606,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, goto xdp_xmit; case XDP_REDIRECT: orig_frame = *frame; - xdp.data_hard_start = head; xdp.rxq->mem = frame->mem; if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { frame = &orig_frame; @@ -629,7 +629,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, rcu_read_unlock(); headroom = sizeof(struct xdp_frame) + frame->headroom - delta; - skb = veth_build_skb(head, headroom, len, 0); + skb = veth_build_skb(hard_start, headroom, len, 0); if (!skb) { xdp_return_frame(frame); stats->rx_drops++;
[PATCH net-next v4 04/33] mvneta: add XDP frame size to driver
This marvell driver mvneta uses PAGE_SIZE frames, which makes it really easy to convert. Driver updates rxq and now frame_sz once per NAPI call. This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that can help reduce the number of cache-lines that need to be flushed when doing DMA sync for_device. Due to xdp_adjust_tail can grow the area accessible to the by the CPU (can possibly write into), then max sync length *after* bpf_prog_run_xdp() needs to be taken into account. For XDP_TX action the driver is smart and does DMA-sync. When growing tail this is still safe, because page_pool have DMA-mapped the entire page size. Cc: thomas.petazz...@bootlin.com Acked-by: Lorenzo Bianconi Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/marvell/mvneta.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 51889770958d..37947949345c 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -2148,12 +2148,17 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq, struct bpf_prog *prog, struct xdp_buff *xdp, struct mvneta_stats *stats) { - unsigned int len; + unsigned int len, sync; + struct page *page; u32 ret, act; len = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction; act = bpf_prog_run_xdp(prog, xdp); + /* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */ + sync = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction; + sync = max(sync, len); + switch (act) { case XDP_PASS: stats->xdp_pass++; @@ -2164,9 +2169,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq, err = xdp_do_redirect(pp->dev, xdp, prog); if (unlikely(err)) { ret = MVNETA_XDP_DROPPED; - page_pool_put_page(rxq->page_pool, - virt_to_head_page(xdp->data), len, - true); + page = virt_to_head_page(xdp->data); + page_pool_put_page(rxq->page_pool, page, sync, true); } else { ret = MVNETA_XDP_REDIR; stats->xdp_redirect++; @@ -2175,10 +2179,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq, } case XDP_TX: ret = mvneta_xdp_xmit_back(pp, xdp); - if (ret != MVNETA_XDP_TX) - page_pool_put_page(rxq->page_pool, - virt_to_head_page(xdp->data), len, - true); + if (ret != MVNETA_XDP_TX) { + page = virt_to_head_page(xdp->data); + page_pool_put_page(rxq->page_pool, page, sync, true); + } break; default: bpf_warn_invalid_xdp_action(act); @@ -2187,8 +2191,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq, trace_xdp_exception(pp->dev, prog, act); /* fall through */ case XDP_DROP: - page_pool_put_page(rxq->page_pool, - virt_to_head_page(xdp->data), len, true); + page = virt_to_head_page(xdp->data); + page_pool_put_page(rxq->page_pool, page, sync, true); ret = MVNETA_XDP_DROPPED; stats->xdp_drop++; break; @@ -2320,6 +2324,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi, rcu_read_lock(); xdp_prog = READ_ONCE(pp->xdp_prog); xdp_buf.rxq = &rxq->xdp_rxq; + xdp_buf.frame_sz = PAGE_SIZE; /* Fairness NAPI loop */ while (rx_proc < budget && rx_proc < rx_todo) {
[PATCH net-next v4 01/33] xdp: add frame size to xdp_buff
XDP have evolved to support several frame sizes, but xdp_buff was not updated with this information. The frame size (frame_sz) member of xdp_buff is introduced to know the real size of the memory the frame is delivered in. When introducing this also make it clear that some tailroom is reserved/required when creating SKBs using build_skb(). It would also have been an option to introduce a pointer to data_hard_end (with reserved offset). The advantage with frame_sz is that (like rxq) drivers only need to setup/assign this value once per NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to store frame_sz inside xdp_rxq_info, because it's varies per packet as it can be based/depend on packet length. V2: nitpick: deduct -> deduce Signed-off-by: Jesper Dangaard Brouer Acked-by: Toke Høiland-Jørgensen --- include/net/xdp.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/net/xdp.h b/include/net/xdp.h index 3cc6d5d84aa4..a764af4ae0ea 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -6,6 +6,8 @@ #ifndef __LINUX_NET_XDP_H__ #define __LINUX_NET_XDP_H__ +#include /* skb_shared_info */ + /** * DOC: XDP RX-queue information * @@ -70,8 +72,19 @@ struct xdp_buff { void *data_hard_start; unsigned long handle; struct xdp_rxq_info *rxq; + u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/ }; +/* Reserve memory area at end-of data area. + * + * This macro reserves tailroom in the XDP buffer by limiting the + * XDP/BPF data access to data_hard_end. Notice same area (and size) + * is used for XDP_PASS, when constructing the SKB via build_skb(). + */ +#define xdp_data_hard_end(xdp) \ + ((xdp)->data_hard_start + (xdp)->frame_sz - \ +SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + struct xdp_frame { void *data; u16 len;
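For readers, a minimal sketch (not part of the patch; the helper name is made up for illustration) of how a consumer of xdp_buff could derive the remaining tailroom from the new member and macro:

/* Illustration only: room left for bpf_xdp_adjust_tail() to grow into,
 * i.e. the gap between the current data_end and the reserved
 * skb_shared_info area at the end of the frame. */
static inline unsigned int xdp_buff_tailroom(struct xdp_buff *xdp)
{
	return xdp_data_hard_end(xdp) - xdp->data_end;
}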
[PATCH net-next v4 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom
Knowing the memory size backing the packet/xdp_frame data area, and knowing it already have reserved room for skb_shared_info, simplifies using build_skb significantly. With this change we no-longer lie about the SKB truesize, but more importantly a significant larger skb_tailroom is now provided, e.g. when drivers uses a full PAGE_SIZE. This extra tailroom (in linear area) can be used by the network stack when coalescing SKBs (e.g. in skb_try_coalesce, see TCP cases where tcp_queue_rcv() can 'eat' skb). Signed-off-by: Jesper Dangaard Brouer Acked-by: Toke Høiland-Jørgensen --- kernel/bpf/cpumap.c | 21 +++-- 1 file changed, 3 insertions(+), 18 deletions(-) diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 3fe0b006d2d2..a71790dab12d 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -162,25 +162,10 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu, /* Part of headroom was reserved to xdpf */ hard_start_headroom = sizeof(struct xdp_frame) + xdpf->headroom; - /* build_skb need to place skb_shared_info after SKB end, and -* also want to know the memory "truesize". Thus, need to -* know the memory frame size backing xdp_buff. -* -* XDP was designed to have PAGE_SIZE frames, but this -* assumption is not longer true with ixgbe and i40e. It -* would be preferred to set frame_size to 2048 or 4096 -* depending on the driver. -* frame_size = 2048; -* frame_len = frame_size - sizeof(*xdp_frame); -* -* Instead, with info avail, skb_shared_info in placed after -* packet len. This, unfortunately fakes the truesize. -* Another disadvantage of this approach, the skb_shared_info -* is not at a fixed memory location, with mixed length -* packets, which is bad for cache-line hotness. + /* Memory size backing xdp_frame data already have reserved +* room for build_skb to place skb_shared_info in tailroom. */ - frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + frame_size = xdpf->frame_sz; pkt_data_start = xdpf->data - hard_start_headroom; skb = build_skb_around(skb, pkt_data_start, frame_size);
[PATCH net-next v4 10/33] veth: xdp using frame_sz in veth driver
The veth driver can run XDP in "native" mode in it's own NAPI handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") packets can come in two forms either xdp_frame or skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb(). For packets to arrive in xdp_frame format, they will have been redirected from an XDP native driver. In case of XDP_PASS or no XDP-prog attached, the veth driver will allocate and create an SKB. The current code in veth_xdp_rcv_one() xdp_frame case, had to guess the frame truesize of the incoming xdp_frame, when using veth_build_skb(). With xdp_frame->frame_sz this is not longer necessary. Calculating the frame_sz in veth_xdp_rcv_skb() skb case, is done similar to the XDP-generic handling code in net/core/dev.c. Cc: Toshiaki Makita Reviewed-by: Lorenzo Bianconi Signed-off-by: Jesper Dangaard Brouer Acked-by: Toke Høiland-Jørgensen Acked-by: Toshiaki Makita --- drivers/net/veth.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index d5691bb84448..b586d2fa5551 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -405,10 +405,6 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, { struct sk_buff *skb; - if (!buflen) { - buflen = SKB_DATA_ALIGN(headroom + len) + -SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - } skb = build_skb(head, buflen); if (!skb) return NULL; @@ -583,6 +579,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, xdp.data = frame->data; xdp.data_end = frame->data + frame->len; xdp.data_meta = frame->data - frame->metasize; + xdp.frame_sz = frame->frame_sz; xdp.rxq = &rq->xdp_rxq; act = bpf_prog_run_xdp(xdp_prog, &xdp); @@ -629,7 +626,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, rcu_read_unlock(); headroom = sizeof(struct xdp_frame) + frame->headroom - delta; - skb = veth_build_skb(hard_start, headroom, len, 0); + skb = veth_build_skb(hard_start, headroom, len, frame->frame_sz); if (!skb) { xdp_return_frame(frame); stats->rx_drops++; @@ -695,9 +692,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, goto drop; } - nskb = veth_build_skb(head, - VETH_XDP_HEADROOM + mac_len, skb->len, - PAGE_SIZE); + nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len, + skb->len, PAGE_SIZE); if (!nskb) { page_frag_free(head); goto drop; @@ -715,6 +711,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, xdp.data_end = xdp.data + pktlen; xdp.data_meta = xdp.data; xdp.rxq = &rq->xdp_rxq; + + /* SKB "head" area always have tailroom for skb_shared_info */ + xdp.frame_sz = (void *)skb_end_pointer(skb) - xdp.data_hard_start; + xdp.frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + orig_data = xdp.data; orig_data_end = xdp.data_end; @@ -758,6 +759,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, } rcu_read_unlock(); + /* check if bpf_xdp_adjust_head was used */ delta = orig_data - xdp.data; off = mac_len + delta; if (off > 0) @@ -765,9 +767,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, else if (off < 0) __skb_pull(skb, -off); skb->mac_header -= delta; + + /* check if bpf_xdp_adjust_tail was used */ off = xdp.data_end - orig_data_end; if (off != 0) - __skb_put(skb, off); + __skb_put(skb, off); /* positive on grow, negative on shrink */ skb->protocol = eth_type_trans(skb, rq->dev); metalen = xdp.data - xdp.data_meta;
[PATCH net-next v4 13/33] qlogic/qede: add XDP frame size to driver
The driver qede uses a full page, when XDP is enabled. The drivers value in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when an XDP bpf_prog is attached. Cc: Ariel Elior Cc: gr-everest-linux...@marvell.com Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/qlogic/qede/qede_fp.c |1 + drivers/net/ethernet/qlogic/qede/qede_main.c |2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c index c6c20776b474..7598ebe0962a 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_fp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c @@ -1066,6 +1066,7 @@ static bool qede_rx_xdp(struct qede_dev *edev, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + *len; xdp.rxq = &rxq->xdp_rxq; + xdp.frame_sz = rxq->rx_buf_seg_size; /* PAGE_SIZE when XDP enabled */ /* Queues always have a full reset currently, so for the time * being until there's atomic program replace just mark read diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 300405369c37..194bff3ae813 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1425,7 +1425,7 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq) if (rxq->rx_buf_size + size > PAGE_SIZE) rxq->rx_buf_size = PAGE_SIZE - size; - /* Segment size to spilt a page in multiple equal parts , + /* Segment size to split a page in multiple equal parts, * unless XDP is used in which case we'd use the entire page. */ if (!edev->xdp_prog) {
[PATCH net-next v4 15/33] ena: add XDP frame size to amazon NIC driver
Frame size ENA_PAGE_SIZE is limited to 16K on systems with larger PAGE_SIZE than 16K. Change ENA_XDP_MAX_MTU to also take into account the reserved tailroom. Cc: Arthur Kiyanovski Acked-by: Sameeh Jubran Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/amazon/ena/ena_netdev.c |1 + drivers/net/ethernet/amazon/ena/ena_netdev.h |5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 2818965427e9..85b87ed02dd5 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1606,6 +1606,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, "%s qid %d\n", __func__, rx_ring->qid); res_budget = budget; xdp.rxq = &rx_ring->xdp_rxq; + xdp.frame_sz = ENA_PAGE_SIZE; do { xdp_verdict = XDP_PASS; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 7df67bf09b93..680099afcccf 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -151,8 +151,9 @@ * The buffer size we share with the device is defined to be ENA_PAGE_SIZE */ -#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \ - VLAN_HLEN - XDP_PACKET_HEADROOM) +#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \ +VLAN_HLEN - XDP_PACKET_HEADROOM - \ +SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) #define ENA_IS_XDP_INDEX(adapter, index) (((index) >= (adapter)->xdp_first_ring) && \ ((index) < (adapter)->xdp_first_ring + (adapter)->xdp_num_queues))
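For a concrete feel of the new tailroom term (illustration only, assuming a 4K PAGE_SIZE system where the aligned skb_shared_info is the 320 bytes mentioned in the cover letter):

/* ENA_XDP_MAX_MTU on such a system:
 *   4096 (ENA_PAGE_SIZE)
 *   -  14 (ETH_HLEN) - 4 (ETH_FCS_LEN) - 4 (VLAN_HLEN)
 *   - 256 (XDP_PACKET_HEADROOM)
 *   - 320 (SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 *   = 3498 bytes
 */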
[PATCH net-next v4 12/33] hv_netvsc: add XDP frame size to driver
The hyperv NIC driver does memory allocation and copy even without XDP. In XDP mode it will allocate a new page for each packet and copy over the payload, before invoking the XDP BPF-prog. The positive thing it that its easy to determine the xdp.frame_sz. The XDP implementation for hv_netvsc transparently passes xdp_prog to the associated VF NIC. Many of the Azure VMs are using SRIOV, so majority of the data are actually processed directly on the VF driver's XDP path. So the overhead of the synthetic data path (hv_netvsc) is minimal. Then XDP is enabled on this driver, XDP_PASS and XDP_TX will create the SKB via build_skb (based on the newly allocated page). Now using XDP frame_sz this will provide more skb_tailroom, which netstack can use for SKB coalescing (e.g tcp_try_coalesce -> skb_try_coalesce). V3: Adjust patch desc to be more positive. Cc: Wei Liu Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Signed-off-by: Jesper Dangaard Brouer --- drivers/net/hyperv/netvsc_bpf.c |1 + drivers/net/hyperv/netvsc_drv.c |2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c index b86611041db6..1e0c024b0a93 100644 --- a/drivers/net/hyperv/netvsc_bpf.c +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -49,6 +49,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, xdp_set_data_meta_invalid(xdp); xdp->data_end = xdp->data + len; xdp->rxq = &nvchan->xdp_rxq; + xdp->frame_sz = PAGE_SIZE; xdp->handle = 0; memcpy(xdp->data, data, len); diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 5de57fc3ec60..6267f706e8ee 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -795,7 +795,7 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, if (xbuf) { unsigned int hdroom = xdp->data - xdp->data_hard_start; unsigned int xlen = xdp->data_end - xdp->data; - unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen); + unsigned int frag_size = xdp->frame_sz; skb = build_skb(xbuf, frag_size);
The hyperv NIC driver does memory allocation and copy even without XDP. In XDP mode it will allocate a new page for each packet and copy over the payload, before invoking the XDP BPF-prog. The positive thing is that it's easy to determine the xdp.frame_sz. The XDP implementation for hv_netvsc transparently passes xdp_prog to the associated VF NIC. Many of the Azure VMs are using SRIOV, so the majority of the data is actually processed directly on the VF driver's XDP path. So the overhead of the synthetic data path (hv_netvsc) is minimal. When XDP is enabled on this driver, XDP_PASS and XDP_TX will create the SKB via build_skb (based on the newly allocated page). Now, using XDP frame_sz, this will provide more skb_tailroom, which the netstack can use for SKB coalescing (e.g. tcp_try_coalesce -> skb_try_coalesce). V3: Adjust patch desc to be more positive. Cc: Wei Liu Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Signed-off-by: Jesper Dangaard Brouer --- drivers/net/hyperv/netvsc_bpf.c |1 + drivers/net/hyperv/netvsc_drv.c |2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c index b86611041db6..1e0c024b0a93 100644 --- a/drivers/net/hyperv/netvsc_bpf.c +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -49,6 +49,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, xdp_set_data_meta_invalid(xdp); xdp->data_end = xdp->data + len; xdp->rxq = &nvchan->xdp_rxq; + xdp->frame_sz = PAGE_SIZE; xdp->handle = 0; memcpy(xdp->data, data, len); diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 5de57fc3ec60..6267f706e8ee 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -795,7 +795,7 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, if (xbuf) { unsigned int hdroom = xdp->data - xdp->data_hard_start; unsigned int xlen = xdp->data_end - xdp->data; - unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen); + unsigned int frag_size = xdp->frame_sz; skb = build_skb(xbuf, frag_size);
[PATCH net-next v4 20/33] vhost_net: also populate XDP frame size
In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start) which contains the buffer length 'buflen' (with tailroom for skb_shared_info). Also storing this buflen in xdp->frame_sz, does not obsolete struct tun_xdp_hdr, as it also contains a struct virtio_net_hdr with other information. Cc: Jason Wang Signed-off-by: Jesper Dangaard Brouer Acked-by: Michael S. Tsirkin Acked-by: Jason Wang --- drivers/vhost/net.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2927f02cc7e1..516519dcc8ff 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -747,6 +747,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, xdp->data = buf + pad; xdp->data_end = xdp->data + len; hdr->buflen = buflen; + xdp->frame_sz = buflen; --net->refcnt_bias; alloc_frag->offset += buflen;
[PATCH net-next v4 17/33] net: thunderx: add XDP frame size
To help reviewers these are the defines related to RCV_FRAG_LEN #define DMA_BUFFER_LEN 1536 /* In multiples of 128bytes */ #define RCV_FRAG_LEN (SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) Cc: Sunil Goutham Cc: Robert Richter Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/cavium/thunder/nicvf_main.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index b4b33368698f..2ba0ce115e63 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -552,6 +552,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; xdp.rxq = &rq->xdp_rxq; + xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM; orig_data = xdp.data; rcu_read_lock();
[PATCH net-next v4 21/33] virtio_net: add XDP frame size in two code paths
The virtio_net driver is running inside the guest-OS. There are two XDP receive code-paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP. In receive_small() the frame size is available in buflen. The buffer backing these frames are allocated in add_recvbuf_small() with same size, except for the headroom, but tailroom have reserved room for skb_shared_info. The headroom is encoded in ctx pointer as a value. In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) buffer size is based on a exponentially weighted moving average (see DECLARE_EWMA) of packet length. Or (2) in case virtnet_get_headroom() have any headroom then buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values; the buffer len "truesize" and headroom. In case (1) if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of buffer area). If that happens the XDP path does a xdp_linearize_page operation. V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang. The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(), and 'buf' ptr is advanced this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple: static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; } As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable(): int offset = buf - page_address(page); [...] data = page_address(xdp_page) + offset; xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be page-start address plus vi->hdr_len. Given this xdp.frame_sz need to be reduced with vi->hdr_len size. IMHO a followup patch should cleanup this code to make it easier to maintain and understand, but it is outside the scope of this patchset. Cc: Jason Wang Signed-off-by: Jesper Dangaard Brouer Acked-by: Michael S. 
Tsirkin Acked-by: Jason Wang --- drivers/net/virtio_net.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 11f722460513..9e1b5d748586 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev, xdp.data_end = xdp.data + len; xdp.data_meta = xdp.data; xdp.rxq = &rq->xdp_rxq; + xdp.frame_sz = buflen; orig_data = xdp.data; act = bpf_prog_run_xdp(xdp_prog, &xdp); stats->xdp_packets++; @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, int offset = buf - page_address(page); struct sk_buff *head_skb, *curr_skb; struct bpf_prog *xdp_prog; - unsigned int truesize; + unsigned int truesize = mergeable_ctx_to_truesize(ctx); unsigned int headroom = mergeable_ctx_to_headroom(ctx); - int err; unsigned int metasize = 0; + unsigned int frame_sz; + int err; head_skb = NULL; stats->bytes += len - vi->hdr_len; @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, if (unlikely(hdr->hdr.gso_type)) goto err_xdp; + /* Buffers with headroom use PAGE_SIZE as alloc size, +* see add_recvbuf_mergeable() + get_mergeable_buf_len() +*/ + frame_sz = headroom ? PAGE_SIZE : truesize; + /* This happens when rx buffer size is underestimated * or headroom is not enough because of the buffer * was refilled before XDP is set. This should only @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, page, offset, VIRTIO_XDP_HEADROOM, &len); + frame_sz = PAGE_SIZE; + if (!xdp_page) goto err_xdp; offset = VIRTIO_XDP_HEADROOM; @@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, xdp.data_end = xdp.data + (len - vi->hdr_len); xdp.data_meta = xdp.data; xdp.rxq = &rq->xdp_rxq; + xdp.frame_sz = frame_sz - vi->hdr_len; act = bpf_prog_run_xdp(xd
[PATCH net-next v4 19/33] tun: add XDP frame size
The tun driver have two code paths for running XDP (bpf_prog_run_xdp). In both cases 'buflen' contains enough tailroom for skb_shared_info. Cc: Jason Wang Signed-off-by: Jesper Dangaard Brouer Acked-by: Michael S. Tsirkin Acked-by: Jason Wang --- drivers/net/tun.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 44889eba1dbc..c54f967e2c66 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; xdp.rxq = &tfile->xdp_rxq; + xdp.frame_sz = buflen; act = bpf_prog_run_xdp(xdp_prog, &xdp); if (act == XDP_REDIRECT || act == XDP_TX) { @@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun, } xdp_set_data_meta_invalid(xdp); xdp->rxq = &tfile->xdp_rxq; + xdp->frame_sz = buflen; act = bpf_prog_run_xdp(xdp_prog, xdp); err = tun_xdp_act(tun, xdp_prog, xdp, act);
The tun driver has two code paths for running XDP (bpf_prog_run_xdp). In both cases 'buflen' contains enough tailroom for skb_shared_info. Cc: Jason Wang Signed-off-by: Jesper Dangaard Brouer Acked-by: Michael S. Tsirkin Acked-by: Jason Wang --- drivers/net/tun.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 44889eba1dbc..c54f967e2c66 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; xdp.rxq = &tfile->xdp_rxq; + xdp.frame_sz = buflen; act = bpf_prog_run_xdp(xdp_prog, &xdp); if (act == XDP_REDIRECT || act == XDP_TX) { @@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun, } xdp_set_data_meta_invalid(xdp); xdp->rxq = &tfile->xdp_rxq; + xdp->frame_sz = buflen; act = bpf_prog_run_xdp(xdp_prog, xdp); err = tun_xdp_act(tun, xdp_prog, xdp, act);
[PATCH net-next v4 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
The ixgbe driver have another memory model when compiled on archs with PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in two halves, but instead increment rx_buffer->page_offset by truesize of packet (which include headroom and tailroom for skb_shared_info). This is done correctly in ixgbe_build_skb(), but in ixgbe_rx_buffer_flip which is currently only called on XDP_TX and XDP_REDIRECT, it forgets to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT, for veth and cpumap. Fix by adding size of skb_shared_info tailroom. Maintainers notice: This fix have been queued to Jeff. Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect") Cc: Jeff Kirsher Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 718931d951bc..ea6834bae04c 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2254,7 +2254,8 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring, rx_buffer->page_offset ^= truesize; #else unsigned int truesize = ring_uses_build_skb(rx_ring) ? - SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) : + SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : SKB_DATA_ALIGN(size); rx_buffer->page_offset += truesize;
[PATCH net-next v4 11/33] dpaa2-eth: add XDP frame size
The dpaa2-eth driver reserve some headroom used for hardware and software annotation area in RX/TX buffers. Thus, xdp.data_hard_start doesn't start at page boundary. When XDP is configured the area reserved via dpaa2_fd_get_offset(fd) is 448 bytes of which XDP have reserved 256 bytes. As frame_sz is calculated as an offset from xdp_buff.data_hard_start, an adjust from the full PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE. When doing XDP_REDIRECT, the driver doesn't need this reserved headroom any-longer and allows xdp_do_redirect() to use it. This is an advantage for the drivers own ndo-xdp_xmit, as it uses part of this headroom for itself. Patch also adjust frame_sz in this case. The driver cannot support XDP data_meta, because it uses the headroom just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64), when transmitting the packet. When transmitting a xdp_frame in dpaa2_eth_xdp_xmit_frame (call via ndo_xdp_xmit) is uses this area to store a pointer to xdp_frame and dma_size, which is used in TX completion (free_tx_fd) to return frame via xdp_return_frame(). Cc: Ioana Radulescu Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 0f3e842a4fd6..8c8d95aa1dfd 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -331,6 +331,9 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv, xdp_set_data_meta_invalid(&xdp); xdp.rxq = &ch->xdp_rxq; + xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE - + (dpaa2_fd_get_offset(fd) - XDP_PACKET_HEADROOM); + xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp); /* xdp.data pointer may have changed */ @@ -366,7 +369,11 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv, dma_unmap_page(priv->net_dev->dev.parent, addr, DPAA2_ETH_RX_BUF_SIZE, DMA_BIDIRECTIONAL); ch->buf_count--; + + /* Allow redirect use of full headroom */ xdp.data_hard_start = vaddr; + xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE; + err = xdp_do_redirect(priv->net_dev, &xdp, xdp_prog); if (unlikely(err)) ch->stats.xdp_drop++;
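Using the numbers from the description above (illustration only, assuming 4K pages so DPAA2_ETH_RX_BUF_RAW_SIZE is 4096):

/* Normal RX path:  frame_sz = 4096 - (448 - 256) = 3904 bytes
 * XDP_REDIRECT:    frame_sz = 4096 (the reserved headroom is handed
 *                  back to xdp_do_redirect(), see hunk above)
 */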
[PATCH net-next v4 14/33] net: ethernet: ti: add XDP frame size to driver cpsw
The driver code cpsw.c and cpsw_new.c both use page_pool with default order-0 pages for their RX pages. Cc: Grygorii Strashko Cc: Ilias Apalodimas Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Grygorii Strashko --- drivers/net/ethernet/ti/cpsw.c |1 + drivers/net/ethernet/ti/cpsw_new.c |1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 09f98fa2fb4e..ce0645ada6e7 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -406,6 +406,7 @@ static void cpsw_rx_handler(void *token, int len, int status) xdp.data_hard_start = pa; xdp.rxq = &priv->xdp_rxq[ch]; + xdp.frame_sz = PAGE_SIZE; port = priv->emac_port + cpsw->data.dual_emac; ret = cpsw_run_xdp(priv, ch, &xdp, page, port); diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c index dce49311d3d3..1247d35d42ef 100644 --- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -348,6 +348,7 @@ static void cpsw_rx_handler(void *token, int len, int status) xdp.data_hard_start = pa; xdp.rxq = &priv->xdp_rxq[ch]; + xdp.frame_sz = PAGE_SIZE; ret = cpsw_run_xdp(priv, ch, &xdp, page, priv->emac_port); if (ret != CPSW_XDP_PASS)
[PATCH net-next v4 16/33] mlx4: add XDP frame size and adjust max XDP MTU
The mlx4 drivers size of memory backing the RX packet is stored in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096). For normal mode frag_stride is 2048. Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account. Cc: Tariq Toukan Cc: Saeed Mahameed Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Tariq Toukan --- drivers/net/ethernet/mellanox/mlx4/en_netdev.c |3 ++- drivers/net/ethernet/mellanox/mlx4/en_rx.c |1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 43dcbd8214c6..5bd3cd37d50f 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -51,7 +51,8 @@ #include "en_port.h" #define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \ - XDP_PACKET_HEADROOM)) + XDP_PACKET_HEADROOM - \ + SKB_DATA_ALIGN(sizeof(struct skb_shared_info int mlx4_en_setup_tc(struct net_device *dev, u8 up) { diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 787139219813..8a10285b0e10 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -683,6 +683,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud rcu_read_lock(); xdp_prog = rcu_dereference(ring->xdp_prog); xdp.rxq = &ring->xdp_rxq; + xdp.frame_sz = priv->frag_info[0].frag_stride; doorbell_pending = 0; /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
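Worked out for illustration only (4K pages, 64-bit, 320-byte aligned skb_shared_info):

/* MLX4_EN_MAX_XDP_MTU = 4096 - 14 (ETH_HLEN) - 8 (2 * VLAN_HLEN)
 *                     - 256 (XDP_PACKET_HEADROOM)
 *                     - 320 (aligned skb_shared_info tailroom)
 *                     = 3498 bytes
 * With XDP enabled frag_stride, and thus xdp.frame_sz, is PAGE_SIZE;
 * in normal mode frag_stride is 2048, as noted above.
 */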
[PATCH net-next v4 18/33] nfp: add XDP frame size to netronome driver
The netronome nfp driver use PAGE_SIZE when xdp_prog is set, but xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM. Thus, adjust for this when setting xdp.frame_sz, as it counts from data_hard_start. When doing XDP_TX this driver is smart and instead of a full DMA-map does a DMA-sync on with packet length. As xdp_adjust_tail can now grow packet length, add checks to make sure that grow size is within the DMA-mapped size. Cc: Jakub Kicinski Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Jakub Kicinski --- .../net/ethernet/netronome/nfp/nfp_net_common.c|6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 9bfb3b077bc1..0e0cc3d58bdc 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -1741,10 +1741,15 @@ nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring, struct nfp_net_rx_buf *rxbuf, unsigned int dma_off, unsigned int pkt_len, bool *completed) { + unsigned int dma_map_sz = dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA; struct nfp_net_tx_buf *txbuf; struct nfp_net_tx_desc *txd; int wr_idx; + /* Reject if xdp_adjust_tail grow packet beyond DMA area */ + if (pkt_len + dma_off > dma_map_sz) + return false; + if (unlikely(nfp_net_tx_full(tx_ring, 1))) { if (!*completed) { nfp_net_xdp_complete(tx_ring); @@ -1817,6 +1822,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget) rcu_read_lock(); xdp_prog = READ_ONCE(dp->xdp_prog); true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz; + xdp.frame_sz = PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM; xdp.rxq = &rx_ring->xdp_rxq; tx_ring = r_vec->xdp_ring;
[PATCH net-next v4 24/33] ixgbevf: add XDP frame size to VF driver
This patch mirrors the changes to ixgbe in the previous patch. This VF driver doesn't support XDP_REDIRECT, but correct tailroom is still necessary for BPF-helper xdp_adjust_tail. In legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we accept that xdp_adjust_tail shrink doesn't work. Cc: intel-wired-...@lists.osuosl.org Cc: Jeff Kirsher Cc: Alexander Duyck Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 34 + 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index 4622c4ea2e46..a39e2cb384dd 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -1095,19 +1095,31 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter, return ERR_PTR(-result); } +static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring, + unsigned int size) +{ + unsigned int truesize; + +#if (PAGE_SIZE < 8192) + truesize = ixgbevf_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */ +#else + truesize = ring_uses_build_skb(rx_ring) ? + SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : + SKB_DATA_ALIGN(size); +#endif + return truesize; +} + static void ixgbevf_rx_buffer_flip(struct ixgbevf_ring *rx_ring, struct ixgbevf_rx_buffer *rx_buffer, unsigned int size) { -#if (PAGE_SIZE < 8192) - unsigned int truesize = ixgbevf_rx_pg_size(rx_ring) / 2; + unsigned int truesize = ixgbevf_rx_frame_truesize(rx_ring, size); +#if (PAGE_SIZE < 8192) rx_buffer->page_offset ^= truesize; #else - unsigned int truesize = ring_uses_build_skb(rx_ring) ? - SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) : - SKB_DATA_ALIGN(size); - rx_buffer->page_offset += truesize; #endif } @@ -1125,6 +1137,11 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector, xdp.rxq = &rx_ring->xdp_rxq; + /* Frame size depend on rx_ring setup when PAGE_SIZE=4K */ +#if (PAGE_SIZE < 8192) + xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, 0); +#endif + while (likely(total_rx_packets < budget)) { struct ixgbevf_rx_buffer *rx_buffer; union ixgbe_adv_rx_desc *rx_desc; @@ -1157,7 +1174,10 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector, xdp.data_hard_start = xdp.data - ixgbevf_rx_offset(rx_ring); xdp.data_end = xdp.data + size; - +#if (PAGE_SIZE > 4096) + /* At larger PAGE_SIZE, frame_sz depend on len size */ + xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, size); +#endif skb = ixgbevf_run_xdp(adapter, rx_ring, &xdp); }
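A note on the size-0 call before the NAPI loop: at 4K pages the truesize helper ignores its size argument entirely (the half-page split is a constant), which is why frame_sz can be pre-computed there, while at larger PAGE_SIZE it has to be refreshed per packet. Roughly, and assuming an order-0 4K page for the first case plus the usual 320-byte aligned skb_shared_info (illustrative values, not from the patch), the three cases evaluate to:

  PAGE_SIZE < 8192:              truesize = page size / 2 = 2048      (size unused)
  PAGE_SIZE >= 8192, build_skb:  truesize = SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) + 320
  PAGE_SIZE >= 8192, legacy-rx:  truesize = SKB_DATA_ALIGN(size)      (no tailroom reserved)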
[PATCH net-next v4 23/33] ixgbe: add XDP frame size to driver
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU the frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still uses page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, the driver instead advances its rx_buffer->page_offset by the frame size "truesize". For XDP frame size calculations, this means that in the PAGE_SIZE-larger-than-4K mode frame_sz changes on a per-packet basis. For the page-split 4K PAGE_SIZE mode, xdp.frame_sz is constant and can be set once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in the RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). This mode has zero headroom, while headroom is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and the xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Cc: intel-wired-...@lists.osuosl.org Cc: Jeff Kirsher Cc: Alexander Duyck Signed-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 34 +++-- 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index ea6834bae04c..eab5934b04f5 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2244,20 +2244,30 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter, return ERR_PTR(-result); } +static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring, + unsigned int size) +{ + unsigned int truesize; + +#if (PAGE_SIZE < 8192) + truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */ +#else + truesize = ring_uses_build_skb(rx_ring) ? + SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : + SKB_DATA_ALIGN(size); +#endif + return truesize; +} + static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring, struct ixgbe_rx_buffer *rx_buffer, unsigned int size) { + unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size); #if (PAGE_SIZE < 8192) - unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2; - rx_buffer->page_offset ^= truesize; #else - unsigned int truesize = ring_uses_build_skb(rx_ring) ?
- SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : - SKB_DATA_ALIGN(size); - rx_buffer->page_offset += truesize; #endif } @@ -2291,6 +2301,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, xdp.rxq = &rx_ring->xdp_rxq; + /* Frame size depend on rx_ring setup when PAGE_SIZE=4K */ +#if (PAGE_SIZE < 8192) + xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0); +#endif + while (likely(total_rx_packets < budget)) { union ixgbe_adv_rx_desc *rx_desc; struct ixgbe_rx_buffer *rx_buffer; @@ -2324,7 +2339,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, xdp.data_hard_start = xdp.data - ixgbe_rx_offset(rx_ring); xdp.data_end = xdp.data + size; - +#if (PAGE_SIZE > 4096) + /* At larger PAGE_SIZE, frame_sz depend on len size */ + xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size); +#endif skb = ixgbe_run_xdp(adapter, rx_ring, &xdp); }
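To make the two legacy-rx caveats concrete, here is an approximation of the generic checks involved, paraphrased rather than quoted, so treat the exact expressions as an assumption: the redirect path needs headroom in front of xdp->data for the metadata that convert_to_xdp_frame() stores there, which zero headroom cannot provide, and the grow-capable bpf_xdp_adjust_tail() bounds the new data_end by a data_hard_end derived from frame_sz, so a frame_sz that reserves no tailroom makes the helper return an error:

/* Paraphrased sketch, not verbatim kernel code. */

/* Redirect path: convert_to_xdp_frame() needs to place struct xdp_frame
 * in the headroom before xdp->data; with legacy-rx's zero headroom this
 * fails and xdp_do_redirect() drops the frame. */

/* Tail-adjust path: data_end may only move up to data_hard_end, which
 * reserves room for skb_shared_info at the end of the frame: */
	void *data_hard_end = xdp->data_hard_start + xdp->frame_sz -
			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

	if (xdp->data_end + offset > data_hard_end)
		return -EINVAL;	/* legacy-rx on >4K pages leaves no such room */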