Re: Bug or misconfiguration for mlx5e lag and multipath
Hi,

I can get the right log from dmesg:

    mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:2

Debugging the driver, I find the rule is added on mlx_pf0vf0 and the peer
one on pf1, so I think esw0 and esw1 both have the rule. The test case is
based on the master branch of the net git tree.

On 2019/5/23 23:15, Roi Dayan wrote:
>
> On 20/05/2019 04:53, wenxu wrote:
>> Hi Roi & Saeed,
>>
>> I just tested the mlx5e lag and multipath feature. There are some
>> situations where the outgoing traffic can't be offloaded.
>>
>> The ovs configuration is as follows:
>>
>> # ovs-vsctl show
>> dfd71dfb-6e22-423e-b088-d2022103af6b
>>     Bridge "br0"
>>         Port "mlx_pf0vf0"
>>             Interface "mlx_pf0vf0"
>>         Port gre
>>             Interface gre
>>                 type: gre
>>                 options: {key="1000", local_ip="172.168.152.75",
>>                           remote_ip="172.168.152.241"}
>>         Port "br0"
>>             Interface "br0"
>>                 type: internal
>>
>> Set up the mlx5e driver:
>>
>> modprobe mlx5_core
>> echo 0 > /sys/class/net/eth2/device/sriov_numvfs
>> echo 0 > /sys/class/net/eth3/device/sriov_numvfs
>> echo 2 > /sys/class/net/eth2/device/sriov_numvfs
>> echo 2 > /sys/class/net/eth3/device/sriov_numvfs
>> lspci -nn | grep Mellanox
>> echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
>> echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
>> echo 0000:81:03.6 > /sys/bus/pci/drivers/mlx5_core/unbind
>> echo 0000:81:03.7 > /sys/bus/pci/drivers/mlx5_core/unbind
>>
>> devlink dev eswitch set pci/0000:81:00.0 mode switchdev encap enable
>> devlink dev eswitch set pci/0000:81:00.1 mode switchdev encap enable
>>
>> modprobe bonding mode=802.3ad miimon=100 lacp_rate=1
>> ip l del dev bond0
>> ifconfig mlx_p0 down
>> ifconfig mlx_p1 down
>> ip l add dev bond0 type bond mode 802.3ad
>> ifconfig bond0 172.168.152.75/24 up
>> echo 1 > /sys/class/net/bond0/bonding/xmit_hash_policy
>> ip l set dev mlx_p0 master bond0
>> ip l set dev mlx_p1 master bond0
>> ifconfig mlx_p0 up
>> ifconfig mlx_p1 up
>>
>> systemctl start openvswitch
>> ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>> systemctl restart openvswitch
>>
>> mlx_pf0vf0 is assigned to a vm. The tc rule shows in_hw:
>>
>> # tc filter ls dev mlx_pf0vf0 ingress
>> filter protocol ip pref 2 flower
>> filter protocol ip pref 2 flower handle 0x1
>>   dst_mac 8e:c0:bd:bf:72:c3
>>   src_mac 52:54:00:00:12:75
>>   eth_type ipv4
>>   ip_tos 0/3
>>   ip_flags nofrag
>>   in_hw
>>         action order 1: tunnel_key set
>>         src_ip 172.168.152.75
>>         dst_ip 172.168.152.241
>>         key_id 1000 pipe
>>         index 2 ref 1 bind 1
>>
>>         action order 2: mirred (Egress Redirect to device gre_sys) stolen
>>         index 2 ref 1 bind 1
>>
>> In the vm the mlx5e driver enables xps by default (by the way, I think
>> it is better not to enable xps by default, so the kernel can select a
>> queue per flow); in lag mode different vf queues are associated with
>> different hw PFs.
>>
>> With the command "taskset -c 2 ping 10.0.0.241" the packet can be
>> offloaded; the outgoing pf is mlx_p0.
>>
>> But with the command "taskset -c 1 ping 10.0.0.241" the packet can't be
>> offloaded; I can capture the packet on mlx_pf0vf0, and the outgoing pf
>> is mlx_p1, although the tc flower rule shows in_hw.
>>
>> I checked with the driver that both mlx_pf0vf0 and the peer (mlx_p1)
>> install the tc rule successfully.
>>
>> I think it's a problem of the lag mode. Or did I miss some
>> configuration?
>>
>> BR
>> wenxu
>>
> Hi,
>
> we need to verify the driver detected it is in lag mode and duplicated
> the offload rule to both eswitches.
> do you see lag map messages in dmesg?
> something like "lag map port 1:1 port 2:2"
> this is to make sure the driver is actually in lag mode.
> in this mode a rule added to mlx_pf0vf0 will be added to the esw of pf0
> and the esw of pf1.
> then when you send a packet it could be handled in esw0 or esw1.
> if the rule is not in esw1 then it won't be offloaded when using pf1.
>
> thanks,
> Roi
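For reference, a quick way to run the two checks Roi mentions (a sketch
using the names from this setup; the dmesg string is the one quoted above):

    # confirm the driver really entered lag mode
    dmesg | grep 'lag map'
    # expected: mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:2

    # confirm the 802.3ad bond over mlx_p0/mlx_p1 is actually up
    cat /proc/net/bonding/bond0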
[PATCH net-next] netfilter: ipv6: fix compile error: unknown fields br_defrag and br_fragment
From: wenxu

When CONFIG_IPV6 is built in (not as a module) and
CONFIG_NF_CONNTRACK_BRIDGE=m, there is a compile error:

net/ipv6/netfilter.c:242:2: error: unknown field 'br_defrag' specified in initializer
  .br_defrag = nf_ct_frag6_gather,
net/ipv6/netfilter.c:243:2: error: unknown field 'br_fragment' specified in initializer
  .br_fragment = br_ip6_fragment,

Fixes: 764dd163ac92 ("netfilter: nf_conntrack_bridge: add support for IPv6")
Signed-off-by: wenxu
---
 net/ipv6/netfilter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index c666538..9530cc2 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,7 +238,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
         .route_input    = ip6_route_input,
         .fragment       = ip6_fragment,
         .reroute        = nf_ip6_reroute,
-#if IS_MODULE(CONFIG_NF_CONNTRACK_BRIDGE)
+#if IS_MODULE(CONFIG_IPV6)
         .br_defrag      = nf_ct_frag6_gather,
         .br_fragment    = br_ip6_fragment,
 #endif
--
1.8.3.1
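For context, IS_MODULE(CONFIG_FOO) is true only for CONFIG_FOO=m, while
IS_ENABLED() also covers =y, so an initializer's guard has to mirror the
guard around the field declaration. A minimal illustration (an invented
example struct, not the real nf_ipv6_ops definition):

    #include <linux/kconfig.h>

    struct example_ops {
    #if IS_MODULE(CONFIG_IPV6)      /* field exists only when ipv6 is modular */
            int (*br_defrag)(struct net *net, struct sk_buff *skb, u32 user);
    #endif
    };

    static const struct example_ops ops = {
    #if IS_MODULE(CONFIG_IPV6)      /* must match the declaration guard above */
            .br_defrag = nf_ct_frag6_gather,
    #endif
    };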
Re: [PATCH net-next,v2] netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y
Signed-off-by: wenxu

On 5/31/2019 5:15 PM, Pablo Neira Ayuso wrote:
> This patch fixes a few problems with CONFIG_IPV6=y and
> CONFIG_NF_CONNTRACK_BRIDGE=m:
>
> In file included from net/netfilter/utils.c:5:
> include/linux/netfilter_ipv6.h: In function 'nf_ipv6_br_defrag':
> include/linux/netfilter_ipv6.h:110:9: error: implicit declaration of function
> 'nf_ct_frag6_gather'; did you mean 'nf_ct_attach'?
> [-Werror=implicit-function-declaration]
>
> And these too:
>
> net/ipv6/netfilter.c:242:2: error: unknown field 'br_defrag' specified in
> initializer
> net/ipv6/netfilter.c:243:2: error: unknown field 'br_fragment' specified in
> initializer
>
> This patch includes an original chunk from wenxu.
>
> Fixes: 764dd163ac92 ("netfilter: nf_conntrack_bridge: add support for IPv6")
> Reported-by: Stephen Rothwell
> Reported-by: Yuehaibing
> Reported-by: kbuild test robot
> Reported-by: wenxu
> Signed-off-by: Pablo Neira Ayuso
> ---
> v2: Forgot to include "net-next" and added Reported-by to all people that
> have reported problems.
>
>  include/linux/netfilter_ipv6.h | 2 ++
>  net/ipv6/netfilter.c           | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/netfilter_ipv6.h b/include/linux/netfilter_ipv6.h
> index a21b8c9623ee..3a3dc4b1f0e7 100644
> --- a/include/linux/netfilter_ipv6.h
> +++ b/include/linux/netfilter_ipv6.h
> @@ -96,6 +96,8 @@ static inline int nf_ip6_route(struct net *net, struct dst_entry **dst,
>  #endif
>  }
>
> +#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
> +
>  static inline int nf_ipv6_br_defrag(struct net *net, struct sk_buff *skb,
>                                      u32 user)
>  {
> diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
> index c6665382acb5..9530cc280953 100644
> --- a/net/ipv6/netfilter.c
> +++ b/net/ipv6/netfilter.c
> @@ -238,7 +238,7 @@ static const struct nf_ipv6_ops ipv6ops = {
>          .route_input    = ip6_route_input,
>          .fragment       = ip6_fragment,
>          .reroute        = nf_ip6_reroute,
> -#if IS_MODULE(CONFIG_NF_CONNTRACK_BRIDGE)
> +#if IS_MODULE(CONFIG_IPV6)
>          .br_defrag      = nf_ct_frag6_gather,
>          .br_fragment    = br_ip6_fragment,
>  #endif
[PATCH] netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather
From: wenxu

With CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 not set:

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot
Signed-off-by: wenxu
---
 net/ipv6/netfilter.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 9530cc2..96d7abf 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,8 +238,10 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
         .route_input    = ip6_route_input,
         .fragment       = ip6_fragment,
         .reroute        = nf_ip6_reroute,
-#if IS_MODULE(CONFIG_IPV6)
+#if IS_MODULE(CONFIG_IPV6) && IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
         .br_defrag      = nf_ct_frag6_gather,
+#endif
+#if IS_MODULE(CONFIG_IPV6)
         .br_fragment    = br_ip6_fragment,
 #endif
 };
--
1.8.3.1
[PATCH net-next v2] netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather
From: wenxu

With CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 not set:

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot
Signed-off-by: wenxu
---
v2: Forgot to include "net-next"

 net/ipv6/netfilter.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 9530cc2..96d7abf 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -238,8 +238,10 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
         .route_input    = ip6_route_input,
         .fragment       = ip6_fragment,
         .reroute        = nf_ip6_reroute,
-#if IS_MODULE(CONFIG_IPV6)
+#if IS_MODULE(CONFIG_IPV6) && IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
         .br_defrag      = nf_ct_frag6_gather,
+#endif
+#if IS_MODULE(CONFIG_IPV6)
         .br_fragment    = br_ip6_fragment,
 #endif
 };
--
1.8.3.1
[PATCH] netfilter: nft_payload: add base type NFT_PAYLOAD_LL_HEADER_NO_TAG
From: wenxu

nft add rule bridge firewall rule-100-ingress ip protocol icmp drop

With a rule like the above ("ip protocol icmp"), the packet is not
matched: the rule translates to base=NFT_PAYLOAD_LL_HEADER off=12 &&
base=NFT_PAYLOAD_NETWORK_HEADER off=11 if the packet carries a vlan tag.
But the user doesn't care about the vlan tag.

Signed-off-by: wenxu
---
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/nft_payload.c              | 10 +++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 505393c..345787f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -673,11 +673,13 @@ enum nft_dynset_attributes {
  * @NFT_PAYLOAD_LL_HEADER: link layer header
  * @NFT_PAYLOAD_NETWORK_HEADER: network header
  * @NFT_PAYLOAD_TRANSPORT_HEADER: transport header
+ * @NFT_PAYLOAD_LL_HEADER_NO_TAG: link layer header, ignoring the vlan tag
  */
 enum nft_payload_bases {
         NFT_PAYLOAD_LL_HEADER,
         NFT_PAYLOAD_NETWORK_HEADER,
         NFT_PAYLOAD_TRANSPORT_HEADER,
+        NFT_PAYLOAD_LL_HEADER_NO_TAG,
 };
 
 /**
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 1465b7d..3cc7398 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -93,6 +93,12 @@ void nft_payload_eval(const struct nft_expr *expr,
                 }
                 offset = skb_mac_header(skb) - skb->data;
                 break;
+        case NFT_PAYLOAD_LL_HEADER_NO_TAG:
+                if (!skb_mac_header_was_set(skb))
+                        goto err;
+
+                offset = skb_mac_header(skb) - skb->data;
+                break;
         case NFT_PAYLOAD_NETWORK_HEADER:
                 offset = skb_network_offset(skb);
                 break;
@@ -403,6 +409,7 @@ static int nft_payload_set_dump(struct sk_buff *skb, const struct nft_expr *expr
         case NFT_PAYLOAD_LL_HEADER:
         case NFT_PAYLOAD_NETWORK_HEADER:
         case NFT_PAYLOAD_TRANSPORT_HEADER:
+        case NFT_PAYLOAD_LL_HEADER_NO_TAG:
                 break;
         default:
                 return ERR_PTR(-EOPNOTSUPP);
@@ -421,7 +428,8 @@ static int nft_payload_set_dump(struct sk_buff *skb, const struct nft_expr *expr
         len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
 
         if (len <= 4 && is_power_of_2(len) && IS_ALIGNED(offset, len) &&
-            base != NFT_PAYLOAD_LL_HEADER)
+            base != NFT_PAYLOAD_LL_HEADER &&
+            base != NFT_PAYLOAD_LL_HEADER_NO_TAG)
                 return &nft_payload_fast_ops;
         else
                 return &nft_payload_ops;
--
1.8.3.1
rtnetlink dump operations also share the rtnl_mutex
Hi all,

When a NETLINK_ROUTE socket is created, nlk->cb_mutex is taken from
nl_table[NETLINK_ROUTE].cb_mutex, which is the rtnl_mutex. So all
NETLINK_ROUTE dump operations contend with each other (ip l, ip a, ip r,
tc ls), and they also contend with the RTM_NEW*/RTM_DEL* operations.

So would it be a good idea for each msgtype to have its own mutex for
dump operations?

BR
wenxu
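For reference, this binding is set up when the NETLINK_ROUTE kernel
socket is created; roughly as below (a sketch based on
net/core/rtnetlink.c of this era, with error handling and the bind
callback trimmed):

    static int __net_init rtnetlink_net_init(struct net *net)
    {
            struct netlink_kernel_cfg cfg = {
                    .groups   = RTNLGRP_MAX,
                    .input    = rtnetlink_rcv,
                    .cb_mutex = &rtnl_mutex, /* all NETLINK_ROUTE dumps serialize on this */
                    .flags    = NL_CFG_F_NONROOT_RECV,
            };
            struct sock *sk;

            sk = netlink_kernel_create(net, NETLINK_ROUTE, &cfg);
            if (!sk)
                    return -ENOMEM;
            net->rtnl = sk;
            return 0;
    }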
[PATCH net-next] bridge: Set the pvid for untagged packets before prerouting
From: wenxu

bridge vlan add dev veth1 vid 200 pvid untagged
bridge vlan add dev veth2 vid 200 pvid untagged

nft add table bridge firewall
nft add chain bridge firewall zones { type filter hook prerouting priority - 300 \; }
nft add rule bridge firewall zones counter ct zone set vlan id map { 100 : 1, 200 : 2 }

With the bridge ports configured with a pvid as above, a received packet
carrying no vlan tag should belong to vlan 200 through the pvid. The user
can then do conntrack based on the vlan id and map the vlan id to a zone
id in the prerouting hook.

Signed-off-by: wenxu
---
 net/bridge/br_input.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 21b74e7..31b44bc 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -341,6 +341,13 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
         }
 
 forward:
+        if (br_opt_get(p->br, BROPT_VLAN_ENABLED) && !skb_vlan_tag_present(skb)) {
+                u16 pvid = br_get_pvid(nbp_vlan_group_rcu(p));
+
+                if (pvid)
+                        __vlan_hwaccel_put_tag(skb, p->br->vlan_proto, pvid);
+        }
+
         switch (p->state) {
         case BR_STATE_FORWARDING:
         case BR_STATE_LEARNING:
--
1.8.3.1
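For completeness, one way to see the mapping take effect (a sketch;
assumes conntrack-tools is installed and untagged traffic is arriving on
veth1):

    # entries created through the prerouting chain above should carry
    # the zone mapped from the pvid-derived vlan id (200 -> zone 2)
    conntrack -L | grep 'zone=2'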
Re: [PATCH net-next] netfilter: nf_tables_offload: Fix zero prio of flow_cls_common_offload
On 7/25/2019 7:51 AM, Marcelo Ricardo Leitner wrote:
> On Thu, Jul 11, 2019 at 04:03:30PM +0800, we...@ucloud.cn wrote:
>> From: wenxu
>>
>> The flow_cls_common_offload prio should not be zero;
>> a zero prio leads to an invalid table prio in hw.
>>
>> # nft add table netdev firewall
>> # nft add chain netdev firewall acl { type filter hook ingress device
>> mlx_pf0vf0 priority - 300 \; }
>> # nft add rule netdev firewall acl ip daddr 1.1.1.7 drop
>> Error: Could not process rule: Invalid argument
>>
>> kernel log:
>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
>> (table prio: 65535, level: 0, size: 4194304)
>>
>> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
>> Signed-off-by: wenxu
>> ---
>>  net/netfilter/nf_tables_offload.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
>> index 2c33028..01d8133 100644
>> --- a/net/netfilter/nf_tables_offload.c
>> +++ b/net/netfilter/nf_tables_offload.c
>> @@ -7,6 +7,8 @@
>>  #include
>>  #include
>>
>> +#define FLOW_OFFLOAD_DEFAULT_PRIO 1U
>> +
>>  static struct nft_flow_rule *nft_flow_rule_alloc(int num_actions)
>>  {
>>          struct nft_flow_rule *flow;
>> @@ -107,6 +109,7 @@ static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
>>                                           struct netlink_ext_ack *extack)
>>  {
>>          common->protocol = proto;
>> +        common->prio = TC_H_MAKE(FLOW_OFFLOAD_DEFAULT_PRIO << 16, 0);
> Note that tc semantics for this is to auto-generate a priority in such
> cases, instead of using a default.
>
> @tc_new_tfilter():
>         if (prio == 0) {
>                 /* If no priority is provided by the user,
>                  * we allocate one.
>                  */
>                 if (n->nlmsg_flags & NLM_F_CREATE) {
>                         prio = TC_H_MAKE(0x80000000U, 0U);
>                         prio_allocate = true;
>         ...
>         if (prio_allocate)
>                 prio = tcf_auto_prio(tcf_chain_tp_prev(chain, &chain_info));

Yes, tc auto-generates a priority. But if there is no previous tcf_proto,
the priority is also set to a default. In nftables the rules have no
priority relative to each other, so it is enough to set a default value,
similar to what tc does:

static inline u32 tcf_auto_prio(struct tcf_proto *tp)
{
        u32 first = TC_H_MAKE(0xC0000000U, 0U);

        if (tp)
                first = tp->prio - 1;

        return TC_H_MAJ(first);
}
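For reference, TC_H_MAKE() just packs the major/minor halves of a handle
(from include/uapi/linux/pkt_sched.h), so the default above encodes
priority 1 in the upper 16 bits, the same part of the handle where tc
keeps a filter's prio:

    #define TC_H_MAJ_MASK (0xFFFF0000U)
    #define TC_H_MIN_MASK (0x0000FFFFU)
    #define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
    #define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)
    #define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))

    /* TC_H_MAKE(1U << 16, 0) == 0x00010000, i.e. prio 1 */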
[PATCH net-next 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu

Move tc indirect block to flow_offload.c so that nf_tables can use the
indr block architecture.

Signed-off-by: wenxu
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  10 +-
 include/net/flow_offload.h                       |  40
 include/net/pkt_cls.h                            |  35
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 189 +
 net/sched/cls_api.c                              | 225 ++---
 7 files changed, 254 insertions(+), 258 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..074573b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 {
         int err;
 
-        err = __tc_indr_block_cb_register(netdev, rpriv,
-                                          mlx5e_rep_indr_setup_tc_cb,
-                                          rpriv);
+        err = __flow_indr_block_cb_register(netdev, rpriv,
+                                            mlx5e_rep_indr_setup_tc_cb,
+                                            rpriv);
         if (err) {
                 struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
 
@@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
                                             struct net_device *netdev)
 {
-        __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
-                                      rpriv);
+        __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
+                                        rpriv);
 }
 
 static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f15..6a0f034 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1479,16 +1479,16 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
                 return NOTIFY_OK;
 
         if (event == NETDEV_REGISTER) {
-                err = __tc_indr_block_cb_register(netdev, app,
-                                                  nfp_flower_indr_setup_tc_cb,
-                                                  app);
+                err = __flow_indr_block_cb_register(netdev, app,
+                                                    nfp_flower_indr_setup_tc_cb,
+                                                    app);
                 if (err)
                         nfp_flower_cmsg_warn(app,
                                              "Indirect block reg failed - %s\n",
                                              netdev->name);
         } else if (event == NETDEV_UNREGISTER) {
-                __tc_indr_block_cb_unregister(netdev,
-                                              nfp_flower_indr_setup_tc_cb, app);
+                __flow_indr_block_cb_unregister(netdev,
+                                                nfp_flower_indr_setup_tc_cb, app);
         }
 
         return NOTIFY_OK;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index b16d216..373028e 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include <linux/rhashtable.h>
 
 struct flow_match {
         struct flow_dissector *dissector;
@@ -347,4 +348,43 @@ static inline void flow_block_init(struct flow_block *flow_block)
         INIT_LIST_HEAD(&flow_block->cb_list);
 }
 
+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+                                      enum tc_setup_type type, void *type_data);
+
+struct flow_indr_block_cb {
+        struct list_head list;
+        void *cb_priv;
+        flow_indr_block_bind_cb_t *cb;
+        void *cb_ident;
+};
+
+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, void *block,
+                                       struct flow_indr_block_cb *indr_block_cb,
+                                       enum flow_block_command command);
+
+struct flow_indr_block_dev {
+        struct rhash_head ht_node;
+        struct net_device *dev;
+        unsigned int refcnt;
+        struct list_head cb_list;
+        flow_indr_block_ing_cmd_t *cmd_cb;
+        void *block;
+};
+
+struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
+
+int flow_indr_rhashtable_init(void);
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+                                  flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+void __flow_indr_block_cb_unregister
[PATCH net-next 3/3] netfilter: nf_tables_offload: support indr block call
From: wenxu

Add indr-block call support to nf_tables. This lets nftables offload
rules on vlan and tunnel devices.

Signed-off-by: wenxu
---
 net/netfilter/nf_tables_api.c     |   6 ++
 net/netfilter/nf_tables_offload.c | 137 ++
 2 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c6dc173..20daf87 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7623,8 +7623,14 @@ static int __init nf_tables_module_init(void)
         if (err < 0)
                 goto err5;
 
+        err = flow_indr_rhashtable_init();
+        if (err)
+                goto err6;
+
         nft_chain_route_init();
         return err;
+err6:
+        nfnetlink_subsys_unregister(&nf_tables_subsys);
 err5:
         rhltable_destroy(&nft_objname_ht);
 err4:
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 3e1a1a8..be050f4 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -176,24 +176,125 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
         return 0;
 }
 
+static int nft_block_setup(struct nft_base_chain *basechain,
+                           struct flow_block_offload *bo,
+                           enum flow_block_command cmd)
+{
+        int err;
+
+        switch (cmd) {
+        case FLOW_BLOCK_BIND:
+                err = nft_flow_offload_bind(bo, basechain);
+                break;
+        case FLOW_BLOCK_UNBIND:
+                err = nft_flow_offload_unbind(bo, basechain);
+                break;
+        default:
+                WARN_ON_ONCE(1);
+                err = -EOPNOTSUPP;
+        }
+
+        return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+                                 struct net_device *dev,
+                                 enum flow_block_command cmd)
+{
+        struct netlink_ext_ack extack = {};
+        struct flow_block_offload bo = {};
+        int err;
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        rtnl_lock();
+        err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+        if (err < 0) {
+                rtnl_unlock();
+                return err;
+        }
+        rtnl_unlock();
+
+        return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev, void *block,
+                                   struct flow_indr_block_cb *indr_block_cb,
+                                   enum flow_block_command cmd)
+{
+        struct nft_base_chain *chain = (struct nft_base_chain *)block;
+        struct netlink_ext_ack extack = {};
+        struct flow_block_offload bo = {};
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        if (!block)
+                return;
+
+        rtnl_lock();
+        indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
+        rtnl_unlock();
+
+        nft_block_setup(chain, &bo, cmd);
+}
+
+static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
+                                      struct net_device *dev,
+                                      enum flow_block_command cmd)
+{
+        struct flow_indr_block_cb *indr_block_cb;
+        struct flow_indr_block_dev *indr_dev;
+        struct flow_block_offload bo = {};
+        struct netlink_ext_ack extack = {};
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        indr_dev = flow_indr_block_dev_lookup(dev);
+        if (!indr_dev)
+                return -EOPNOTSUPP;
+
+        indr_dev->block = cmd == FLOW_BLOCK_BIND ? chain : NULL;
+        indr_dev->cmd_cb = cmd == FLOW_BLOCK_BIND ? nft_indr_block_ing_cmd : NULL;
+
+        rtnl_lock();
+        list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+                indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+                                  &bo);
+        rtnl_unlock();
+
+        return nft_block_setup(chain, &bo, cmd);
+}
+
 #define FLOW_SETUP_BLOCK TC_SETUP_BLOCK
 
 static int nft_flow_offload_chain(struct nft_trans *trans,
                                   enum flow_block_command cmd)
 {
         struct nft_chain *chain = trans->ctx.chain;
-        struct netlink_ext_ack extack = {};
-        struct flow_block_offload bo = {};
         struct nft_base_chain *basechain;
[PATCH net-next 2/3] flow_offload: Support get tcf block immediately
From: wenxu

Provide a callback to find the tcf block in flow_indr_block_dev_get().

Signed-off-by: wenxu
---
 include/net/flow_offload.h |  4 ++++
 net/core/flow_offload.c    | 12 ++++++++++++
 net/sched/cls_api.c        | 31 +++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 373028e..0ebb7e1 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -371,6 +371,10 @@ struct flow_indr_block_dev {
         void *block;
 };
 
+typedef void flow_indr_get_default_block_t(struct flow_indr_block_dev *indr_dev);
+
+void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb);
+
 struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
 
 int flow_indr_rhashtable_init(void);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 77e18dc..6aa02b5 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -298,6 +298,14 @@ struct flow_indr_block_dev *
 }
 EXPORT_SYMBOL(flow_indr_block_dev_lookup);
 
+static flow_indr_get_default_block_t *flow_indr_get_default_block;
+
+void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb)
+{
+        flow_indr_get_default_block = cb;
+}
+EXPORT_SYMBOL(flow_indr_set_default_block_cb);
+
 static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *dev)
 {
         struct flow_indr_block_dev *indr_dev;
@@ -312,6 +320,10 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de
         INIT_LIST_HEAD(&indr_dev->cb_list);
         indr_dev->dev = dev;
+
+        if (flow_indr_get_default_block)
+                flow_indr_get_default_block(indr_dev);
+
         if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
                                    flow_indr_setup_block_ht_params)) {
                 kfree(indr_dev);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 359d92f..e64a0d2 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -571,6 +571,35 @@ static void tc_indr_block_ing_cmd(struct net_device *dev, void *block,
         tcf_block_setup(t_block, &bo);
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+        const struct Qdisc_class_ops *cops;
+        struct Qdisc *qdisc;
+
+        if (!dev_ingress_queue(dev))
+                return NULL;
+
+        qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+        if (!qdisc)
+                return NULL;
+
+        cops = qdisc->ops->cl_ops;
+        if (!cops)
+                return NULL;
+
+        if (!cops->tcf_block)
+                return NULL;
+
+        return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+        indr_dev->block = tc_dev_ingress_block(indr_dev->dev);
+        if (indr_dev->block)
+                indr_dev->cmd_cb = tc_indr_block_ing_cmd;
+}
+
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
                                struct tcf_block_ext_info *ei,
                                enum flow_block_command command,
@@ -3143,6 +3172,8 @@ static int __init tc_filter_init(void)
         if (err)
                 goto err_rhash_setup_block_ht;
 
+        flow_indr_set_default_block_cb(tc_indr_get_default_block);
+
         rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
                       RTNL_FLAG_DOIT_UNLOCKED);
         rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
--
1.8.3.1
Re: [PATCH net-next 1/3] flow_offload: move tc indirect block to flow offload
On 7/25/2019 6:22 PM, Florian Westphal wrote:
> we...@ucloud.cn wrote:
>> From: wenxu
>>
>> move tc indirect block to flow_offload.c. The nf_tables
>> can use the indr block architecture.
> ... to do what? Can you please illustrate how this is going to be
> used/useful?

This is used to offload tunnel packets. The decap rule is set on the
tunnel device, not on the hardware device.
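For illustration, a decap rule of that kind (a sketch with assumed
device names — gre1 as the software tunnel device and mlx_pf0vf0 as the
representor from the earlier thread):

    # the rule lives on the software tunnel device; only an indirect
    # block registration lets the hardware driver see and offload it
    tc filter add dev gre1 ingress protocol ip flower \
        enc_key_id 1000 enc_dst_ip 172.168.152.75 \
        action tunnel_key unset \
        action mirred egress redirect dev mlx_pf0vf0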
Re: [PATCH net-next 2/3] flow_offload: Support get tcf block immediately
In the tc_indr_block_dev_get() function:

static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
{
        struct tc_indr_block_dev *indr_dev;

        indr_dev = tc_indr_block_dev_lookup(dev);
        if (indr_dev)
                goto inc_ref;

        indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
        if (!indr_dev)
                return NULL;

        INIT_LIST_HEAD(&indr_dev->cb_list);
        indr_dev->dev = dev;
        indr_dev->block = tc_dev_ingress_block(dev);

When the indr device registers, __tc_indr_block_cb_register() calls
tc_indr_block_dev_get(), which can fill in indr_dev->block immediately
through tc_dev_ingress_block(). But once indr_block_dev_get() is moved
into the common flow_offload code, it can no longer access
tc_dev_ingress_block() directly.

On 7/25/2019 6:24 PM, Florian Westphal wrote:
> we...@ucloud.cn wrote:
>> From: wenxu
>>
>> It provide a callback to find the tcf block in
>> the flow_indr_block_dev_get
> Can you explain why you're making this change?
> This will help us understand the concept/idea of your series.
>
> The above describes what the patch does, but it should
> explain why this callback is added.
[PATCH] net/mlx5e: Fix zero table prio set by user.
From: wenxu

The flow_cls_common_offload prio is zero, which leads to an invalid
table prio in hw:

Error: Could not process rule: Invalid argument

kernel log:
mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22 (table
prio: 65535, level: 0, size: 4194304)

table_prio = (chain * FDB_MAX_PRIO) + prio - 1;

With chain == 0 and prio == 0 this expression underflows (that is where
the 65535 in the log comes from), so we should check that
(chain * FDB_MAX_PRIO) + prio is not 0.

Signed-off-by: wenxu
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 089ae4d..64ca90f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
                 flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
                           MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
 
-        table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
+        table_prio = (chain * FDB_MAX_PRIO) + prio;
+        if (table_prio)
+                table_prio = table_prio - 1;
 
         /* create earlier levels for correct fs_core lookup when
          * connecting tables
--
1.8.3.1
[PATCH net-next v2 2/3] flow_offload: Support get tcf block immediately
From: wenxu

Because the new flow-indr-block can't get the tcf_block directly,
provide a callback to find the tcf block immediately when the device
registers and contains an ingress block.

Signed-off-by: wenxu
---
v2: make use of flow_block

 include/net/flow_offload.h |  4 ++++
 net/core/flow_offload.c    | 12 ++++++++++++
 net/sched/cls_api.c        | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index b5ef5be..bfe3a34 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -391,6 +391,10 @@ struct flow_indr_block_dev {
         struct flow_block *flow_block;
 };
 
+typedef void flow_indr_get_default_block_t(struct flow_indr_block_dev *indr_dev);
+
+void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb);
+
 struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
 
 int flow_indr_rhashtable_init(void);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index a6785df..eeff99f 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -298,6 +298,14 @@ struct flow_indr_block_dev *
 }
 EXPORT_SYMBOL(flow_indr_block_dev_lookup);
 
+static flow_indr_get_default_block_t *flow_indr_get_default_block;
+
+void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb)
+{
+        flow_indr_get_default_block = cb;
+}
+EXPORT_SYMBOL(flow_indr_set_default_block_cb);
+
 static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *dev)
 {
         struct flow_indr_block_dev *indr_dev;
@@ -312,6 +320,10 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de
         INIT_LIST_HEAD(&indr_dev->cb_list);
         indr_dev->dev = dev;
+
+        if (flow_indr_get_default_block)
+                flow_indr_get_default_block(indr_dev);
+
         if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
                                    flow_indr_setup_block_ht_params)) {
                 kfree(indr_dev);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d370c52..8d4d7f0 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -576,6 +576,38 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
         tcf_block_setup(block, &bo);
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+        const struct Qdisc_class_ops *cops;
+        struct Qdisc *qdisc;
+
+        if (!dev_ingress_queue(dev))
+                return NULL;
+
+        qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+        if (!qdisc)
+                return NULL;
+
+        cops = qdisc->ops->cl_ops;
+        if (!cops)
+                return NULL;
+
+        if (!cops->tcf_block)
+                return NULL;
+
+        return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+        struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev);
+
+        if (block) {
+                indr_dev->flow_block = &block->flow_block;
+                indr_dev->ing_cmd_cb = tc_indr_block_ing_cmd;
+        }
+}
+
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
                                struct tcf_block_ext_info *ei,
                                enum flow_block_command command,
@@ -3172,6 +3204,8 @@ static int __init tc_filter_init(void)
         if (err)
                 goto err_rhash_setup_block_ht;
 
+        flow_indr_set_default_block_cb(tc_indr_get_default_block);
+
         rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
                       RTNL_FLAG_DOIT_UNLOCKED);
         rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
--
1.8.3.1
[PATCH net-next v2 0/3] flow_offload: add indr-block in nf_table_offload
From: wenxu

This patch series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first patch moves the tc indirect block to flow offload and renames
it to flow-indr-block.

Because the new flow-indr-block can't get the tcf_block directly, the
second patch provides a callback to get the tcf_block immediately when
the device registers and contains an ingress block.

The third patch makes nf_tables_offload support the flow-indr-block.

wenxu (3):
  flow_offload: move tc indirect block to flow offload
  flow_offload: Support get tcf block immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  10 +-
 include/net/flow_offload.h                       |  45
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 202 +
 net/netfilter/nf_tables_api.c                    |   6 +
 net/netfilter/nf_tables_offload.c                | 128 +--
 net/sched/cls_api.c                              | 243 -
 9 files changed, 410 insertions(+), 272 deletions(-)
--
1.8.3.1
[PATCH net-next v2 3/3] netfilter: nf_tables_offload: support indr block call
From: wenxu

Add indr-block call support to nf_tables. This lets nftables offload
rules on vlan and tunnel devices:

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0

nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu
---
v2: make use of flow_block

 net/netfilter/nf_tables_api.c     |   6 ++
 net/netfilter/nf_tables_offload.c | 128 +++
 2 files changed, 110 insertions(+), 24 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 605a7cf..a8ab3e9 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7623,8 +7623,14 @@ static int __init nf_tables_module_init(void)
         if (err < 0)
                 goto err5;
 
+        err = flow_indr_rhashtable_init();
+        if (err)
+                goto err6;
+
         nft_chain_route_init();
         return err;
+err6:
+        nfnetlink_subsys_unregister(&nf_tables_subsys);
 err5:
         rhltable_destroy(&nft_objname_ht);
 err4:
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5..09a5efe 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -171,24 +171,120 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
         return 0;
 }
 
+static int nft_block_setup(struct nft_base_chain *basechain,
+                           struct flow_block_offload *bo,
+                           enum flow_block_command cmd)
+{
+        int err;
+
+        switch (cmd) {
+        case FLOW_BLOCK_BIND:
+                err = nft_flow_offload_bind(bo, basechain);
+                break;
+        case FLOW_BLOCK_UNBIND:
+                err = nft_flow_offload_unbind(bo, basechain);
+                break;
+        default:
+                WARN_ON_ONCE(1);
+                err = -EOPNOTSUPP;
+        }
+
+        return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+                                 struct net_device *dev,
+                                 enum flow_block_command cmd)
+{
+        struct netlink_ext_ack extack = {};
+        struct flow_block_offload bo = {};
+        int err;
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+        if (err < 0)
+                return err;
+
+        return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev,
+                                   struct flow_block *flow_block,
+                                   struct flow_indr_block_cb *indr_block_cb,
+                                   enum flow_block_command cmd)
+{
+        struct netlink_ext_ack extack = {};
+        struct flow_block_offload bo = {};
+        struct nft_base_chain *chain;
+
+        if (!flow_block)
+                return;
+
+        chain = container_of(flow_block, struct nft_base_chain, flow_block);
+
+        bo.net = dev_net(dev);
+        bo.block = flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
+
+        nft_block_setup(chain, &bo, cmd);
+}
+
+static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
+                                      struct net_device *dev,
+                                      enum flow_block_command cmd)
+{
+        struct flow_indr_block_cb *indr_block_cb;
+        struct flow_indr_block_dev *indr_dev;
+        struct flow_block_offload bo = {};
+        struct netlink_ext_ack extack = {};
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        indr_dev = flow_indr_block_dev_lookup(dev);
+        if (!indr_dev)
+                return -EOPNOTSUPP;
+
+        indr_dev->flow_block = cmd == FLOW_BLOCK_BIND ? &chain->flow_block : NULL;
+        indr_dev->ing_cmd_cb = cmd == FLOW_BLOCK_BIND ? nft_indr_block_ing_cmd : NULL;
+
+        list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+                indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+                                  &bo);
+
+        return nft_block_setup(chain, &bo
[PATCH net-next v2 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu

Move tc indirect block to flow_offload and rename it to flow indirect
block, so that nf_tables can use the indr block architecture.

Signed-off-by: wenxu
---
v2: make use of flow_block from Pablo;
    flow_indr_rhashtable_init advised by jakub.kicinski

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  10 +-
 include/net/flow_offload.h                       |  41
 include/net/pkt_cls.h                            |  35
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 190 +
 net/sched/cls_api.c                              | 231 ++---
 7 files changed, 261 insertions(+), 259 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..074573b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 {
         int err;
 
-        err = __tc_indr_block_cb_register(netdev, rpriv,
-                                          mlx5e_rep_indr_setup_tc_cb,
-                                          rpriv);
+        err = __flow_indr_block_cb_register(netdev, rpriv,
+                                            mlx5e_rep_indr_setup_tc_cb,
+                                            rpriv);
         if (err) {
                 struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
 
@@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
                                             struct net_device *netdev)
 {
-        __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
-                                      rpriv);
+        __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
+                                        rpriv);
 }
 
 static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f15..6a0f034 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1479,16 +1479,16 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
                 return NOTIFY_OK;
 
         if (event == NETDEV_REGISTER) {
-                err = __tc_indr_block_cb_register(netdev, app,
-                                                  nfp_flower_indr_setup_tc_cb,
-                                                  app);
+                err = __flow_indr_block_cb_register(netdev, app,
+                                                    nfp_flower_indr_setup_tc_cb,
+                                                    app);
                 if (err)
                         nfp_flower_cmsg_warn(app,
                                              "Indirect block reg failed - %s\n",
                                              netdev->name);
         } else if (event == NETDEV_UNREGISTER) {
-                __tc_indr_block_cb_unregister(netdev,
-                                              nfp_flower_indr_setup_tc_cb, app);
+                __flow_indr_block_cb_unregister(netdev,
+                                                nfp_flower_indr_setup_tc_cb, app);
         }
 
         return NOTIFY_OK;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00b9aab..b5ef5be 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include <linux/rhashtable.h>
 
 struct flow_match {
         struct flow_dissector *dissector;
@@ -366,4 +367,44 @@ static inline void flow_block_init(struct flow_block *flow_block)
         INIT_LIST_HEAD(&flow_block->cb_list);
 }
 
+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+                                      enum tc_setup_type type, void *type_data);
+
+struct flow_indr_block_cb {
+        struct list_head list;
+        void *cb_priv;
+        flow_indr_block_bind_cb_t *cb;
+        void *cb_ident;
+};
+
+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
+                                       struct flow_block *flow_block,
+                                       struct flow_indr_block_cb *indr_block_cb,
+                                       enum flow_block_command command);
+
+struct flow_indr_block_dev {
+        struct rhash_head ht_node;
+        struct net_device *dev;
+        unsigned int refcnt;
+        struct list_head cb_list;
+        flow_indr_block_ing_cmd_t *ing_cmd_cb;
+        struct flow_block *flow_block;
+};
+
+struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
+
+int flow_indr_rhasht
Re: [PATCH] net/mlx5e: Fix zero table prio set by user.
On 2019/7/26 20:19, Or Gerlitz wrote:
> On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed wrote:
>> On Thu, 2019-07-25 at 19:24 +0800, we...@ucloud.cn wrote:
>>> From: wenxu
>>>
>>> The flow_cls_common_offload prio is zero, which leads to an invalid
>>> table prio in hw.
>>>
>>> Error: Could not process rule: Invalid argument
>>>
>>> kernel log:
>>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
>>> (table prio: 65535, level: 0, size: 4194304)
>>>
>>> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>> should check (chain * FDB_MAX_PRIO) + prio is not 0
>>>
>>> Signed-off-by: wenxu
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git
>>> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> index 089ae4d..64ca90f 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
>> this piece of code isn't in this function, weird how it got to the
>> diff, patch applies correctly though!
>>
>>> mlx5_eswitch *esw)
>>>          flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
>>>                    MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>>>
>>> -        table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>> +        table_prio = (chain * FDB_MAX_PRIO) + prio;
>>> +        if (table_prio)
>>> +                table_prio = table_prio - 1;
>>>
>> This is black magic, even before this fix.
>> this -1 seems to be needed in order to call
>> create_next_size_table(table_prio) with the previous "table prio"
>> (table_prio - 1)?
>>
>> The whole thing looks wrong to me since when prio is 0 and chain is 0,
>> there is no such thing as table_prio - 1.
>>
>> mlnx eswitch guys in the cc, please advise.
> basically, prio 0 is not something we ever get in the driver, since if
> user space specifies 0, the kernel generates some random non-zero prio,
> and we support only prios 1-16 -- Wenxu -- what do you run to get this
> error?

I run the offload with nftables (not tc); there is no prio for each rule,
and the prio of flow_cls_common_offload is initialized to 0:

static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
                                         __be16 proto,
                                         struct netlink_ext_ack *extack)
{
        common->protocol = proto;
        common->extack = extack;
}
[PATCH net-next v3 0/3] flow_offload: add indr-block in nf_table_offload
From: wenxu

This patch series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first patch moves the tc indirect block to flow offload and renames
it to flow-indr-block.

Because the new flow-indr-block can't get the tcf_block directly, the
second patch provides a callback to get the tcf_block immediately when
the device registers and contains an ingress block.

The third patch makes nf_tables_offload support the flow-indr-block.

wenxu (3):
  flow_offload: move tc indirect block to flow offload
  flow_offload: support get tcf block immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  10 +-
 include/net/flow_offload.h                       |  43
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 191
 net/netfilter/nf_tables_offload.c                | 128 +--
 net/sched/cls_api.c                              | 245 -
 8 files changed, 389 insertions(+), 276 deletions(-)
--
1.8.3.1
[PATCH net-next v3 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu

Move tc indirect block to flow_offload and rename it to flow indirect
block, so that nf_tables can use the indr block architecture.

Signed-off-by: wenxu
---
v3: subsys_initcall for init_flow_indr_rhashtable

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  10 +-
 include/net/flow_offload.h                       |  39
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 179
 net/sched/cls_api.c                              | 235 ++---
 7 files changed, 247 insertions(+), 264 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..074573b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 {
         int err;
 
-        err = __tc_indr_block_cb_register(netdev, rpriv,
-                                          mlx5e_rep_indr_setup_tc_cb,
-                                          rpriv);
+        err = __flow_indr_block_cb_register(netdev, rpriv,
+                                            mlx5e_rep_indr_setup_tc_cb,
+                                            rpriv);
         if (err) {
                 struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
 
@@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
                                             struct net_device *netdev)
 {
-        __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
-                                      rpriv);
+        __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
+                                        rpriv);
 }
 
 static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f15..6a0f034 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1479,16 +1479,16 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
                 return NOTIFY_OK;
 
         if (event == NETDEV_REGISTER) {
-                err = __tc_indr_block_cb_register(netdev, app,
-                                                  nfp_flower_indr_setup_tc_cb,
-                                                  app);
+                err = __flow_indr_block_cb_register(netdev, app,
+                                                    nfp_flower_indr_setup_tc_cb,
+                                                    app);
                 if (err)
                         nfp_flower_cmsg_warn(app,
                                              "Indirect block reg failed - %s\n",
                                              netdev->name);
         } else if (event == NETDEV_UNREGISTER) {
-                __tc_indr_block_cb_unregister(netdev,
-                                              nfp_flower_indr_setup_tc_cb, app);
+                __flow_indr_block_cb_unregister(netdev,
+                                                nfp_flower_indr_setup_tc_cb, app);
         }
 
         return NOTIFY_OK;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00b9aab..66f89bc 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include <linux/rhashtable.h>
 
 struct flow_match {
         struct flow_dissector *dissector;
@@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
         INIT_LIST_HEAD(&flow_block->cb_list);
 }
 
+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+                                      enum tc_setup_type type, void *type_data);
+
+struct flow_indr_block_cb {
+        struct list_head list;
+        void *cb_priv;
+        flow_indr_block_bind_cb_t *cb;
+        void *cb_ident;
+};
+
+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
+                                       struct flow_block *flow_block,
+                                       struct flow_indr_block_cb *indr_block_cb,
+                                       enum flow_block_command command);
+
+struct flow_indr_block_dev {
+        struct rhash_head ht_node;
+        struct net_device *dev;
+        unsigned int refcnt;
+        struct list_head cb_list;
+        flow_indr_block_ing_cmd_t *ing_cmd_cb;
+        struct flow_block *flow_block;
+};
+
+struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
+
+int __flow_indr_block_cb_register(struct net_dev
[PATCH net-next v3 2/3] flow_offload: support get tcf block immediately
From: wenxu

Because the new flow-indr-block can't get the tcf_block directly,
provide a callback to find the tcf block immediately when the device
registers and contains an ingress block.

Signed-off-by: wenxu
---
v3: no change

 include/net/flow_offload.h |  4 ++++
 net/core/flow_offload.c    | 12 ++++++++++++
 net/sched/cls_api.c        | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 66f89bc..3b2e848 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -391,6 +391,10 @@ struct flow_indr_block_dev {
         struct flow_block *flow_block;
 };
 
+typedef void flow_indr_get_default_block_t(struct flow_indr_block_dev *indr_dev);
+
+void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb);
+
 struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
 
 int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 9f1ae67..db8469d 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -298,6 +298,14 @@ struct flow_indr_block_dev *
 }
 EXPORT_SYMBOL(flow_indr_block_dev_lookup);
 
+static flow_indr_get_default_block_t *flow_indr_get_default_block;
+
+void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb)
+{
+        flow_indr_get_default_block = cb;
+}
+EXPORT_SYMBOL(flow_indr_set_default_block_cb);
+
 static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *dev)
 {
         struct flow_indr_block_dev *indr_dev;
@@ -312,6 +320,10 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de
         INIT_LIST_HEAD(&indr_dev->cb_list);
         indr_dev->dev = dev;
+
+        if (flow_indr_get_default_block)
+                flow_indr_get_default_block(indr_dev);
+
         if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
                                    flow_indr_setup_block_ht_params)) {
                 kfree(indr_dev);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d551c56..7c715a8 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -576,6 +576,38 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
         tcf_block_setup(block, &bo);
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+        const struct Qdisc_class_ops *cops;
+        struct Qdisc *qdisc;
+
+        if (!dev_ingress_queue(dev))
+                return NULL;
+
+        qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+        if (!qdisc)
+                return NULL;
+
+        cops = qdisc->ops->cl_ops;
+        if (!cops)
+                return NULL;
+
+        if (!cops->tcf_block)
+                return NULL;
+
+        return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+        struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev);
+
+        if (block) {
+                indr_dev->flow_block = &block->flow_block;
+                indr_dev->ing_cmd_cb = tc_indr_block_ing_cmd;
+        }
+}
+
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
                                struct tcf_block_ext_info *ei,
                                enum flow_block_command command,
@@ -3168,6 +3200,8 @@ static int __init tc_filter_init(void)
         if (err)
                 goto err_register_pernet_subsys;
 
+        flow_indr_set_default_block_cb(tc_indr_get_default_block);
+
         rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
                       RTNL_FLAG_DOIT_UNLOCKED);
         rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
--
1.8.3.1
[PATCH net-next v3 3/3] netfilter: nf_tables_offload: support indr block call
From: wenxu

Add indr-block call support to nf_tables. This lets nftables offload
rules on vlan and tunnel devices:

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0

nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu
---
v3: subsys_initcall for init_flow_indr_rhashtable

 net/netfilter/nf_tables_offload.c | 128 +++
 1 file changed, 104 insertions(+), 24 deletions(-)

diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5..09a5efe 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -171,24 +171,120 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
         return 0;
 }
 
+static int nft_block_setup(struct nft_base_chain *basechain,
+                           struct flow_block_offload *bo,
+                           enum flow_block_command cmd)
+{
+        int err;
+
+        switch (cmd) {
+        case FLOW_BLOCK_BIND:
+                err = nft_flow_offload_bind(bo, basechain);
+                break;
+        case FLOW_BLOCK_UNBIND:
+                err = nft_flow_offload_unbind(bo, basechain);
+                break;
+        default:
+                WARN_ON_ONCE(1);
+                err = -EOPNOTSUPP;
+        }
+
+        return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+                                 struct net_device *dev,
+                                 enum flow_block_command cmd)
+{
+        struct netlink_ext_ack extack = {};
+        struct flow_block_offload bo = {};
+        int err;
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+        if (err < 0)
+                return err;
+
+        return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev,
+                                   struct flow_block *flow_block,
+                                   struct flow_indr_block_cb *indr_block_cb,
+                                   enum flow_block_command cmd)
+{
+        struct netlink_ext_ack extack = {};
+        struct flow_block_offload bo = {};
+        struct nft_base_chain *chain;
+
+        if (!flow_block)
+                return;
+
+        chain = container_of(flow_block, struct nft_base_chain, flow_block);
+
+        bo.net = dev_net(dev);
+        bo.block = flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
+
+        nft_block_setup(chain, &bo, cmd);
+}
+
+static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
+                                      struct net_device *dev,
+                                      enum flow_block_command cmd)
+{
+        struct flow_indr_block_cb *indr_block_cb;
+        struct flow_indr_block_dev *indr_dev;
+        struct flow_block_offload bo = {};
+        struct netlink_ext_ack extack = {};
+
+        bo.net = dev_net(dev);
+        bo.block = &chain->flow_block;
+        bo.command = cmd;
+        bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+        bo.extack = &extack;
+        INIT_LIST_HEAD(&bo.cb_list);
+
+        indr_dev = flow_indr_block_dev_lookup(dev);
+        if (!indr_dev)
+                return -EOPNOTSUPP;
+
+        indr_dev->flow_block = cmd == FLOW_BLOCK_BIND ? &chain->flow_block : NULL;
+        indr_dev->ing_cmd_cb = cmd == FLOW_BLOCK_BIND ? nft_indr_block_ing_cmd : NULL;
+
+        list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+                indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+                                  &bo);
+
+        return nft_block_setup(chain, &bo, cmd);
+}
+
 #define FLOW_SETUP_BLOCK TC_SETUP_BLOCK
 
 static int nft_flow_offload_chain(struct nft_trans *trans,
                                   enum flow_block_command cmd)
 {
         struct nft_chain *chain = trans->ctx.chain;
-        struct netlink_ext_ack extack = {};
-        struct flow_block_offload bo = {};
         struct nft_base_chain *basechain;
         struct net_device *dev;
-        int err;
 
         if (!nft_is_base_chain(chain))
                 return -EOPNOTSUPP;
 
         basechain = nft_base_chain(chain);
         dev = basechain->ops.dev;
-        if (!dev || !dev->netdev_ops->ndo_setu
Re: [PATCH net-next v3 2/3] flow_offload: support get tcf block immediately
On 2019/7/27 8:52, Jakub Kicinski wrote:
> On Fri, 26 Jul 2019 21:34:06 +0800, we...@ucloud.cn wrote:
>> From: wenxu
>>
>> Because the new flow-indr-block can't get the tcf_block
>> directly, provide a callback to find the tcf block immediately
>> when the device registers and contains an ingress block.
>>
>> Signed-off-by: wenxu
> Please CC people who gave you feedback on your subsequent submissions.
>
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 66f89bc..3b2e848 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -391,6 +391,10 @@ struct flow_indr_block_dev {
>>          struct flow_block *flow_block;
>>  };
>>
>> +typedef void flow_indr_get_default_block_t(struct flow_indr_block_dev *indr_dev);
>> +
>> +void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb);
>> +
>>  struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
>>
>>  int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
>> diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
>> index 9f1ae67..db8469d 100644
>> --- a/net/core/flow_offload.c
>> +++ b/net/core/flow_offload.c
>> @@ -298,6 +298,14 @@ struct flow_indr_block_dev *
>>  }
>>  EXPORT_SYMBOL(flow_indr_block_dev_lookup);
>>
>> +static flow_indr_get_default_block_t *flow_indr_get_default_block;
> This static variable which can only be set to the TC's callback really
> is not a great API design :/

So any advice? Just call the function in the tc system, guarded with
#ifdef NET_CLSXXX?
Re: [PATCH net-next v3 1/3] flow_offload: move tc indirect block to flow offload
On 2019/7/27 8:56, Jakub Kicinski wrote:
> On Fri, 26 Jul 2019 21:34:05 +0800, we...@ucloud.cn wrote:
>> From: wenxu
>>
>> Move tc indirect block to flow_offload and rename
>> it to flow indirect block, so that nf_tables can use the
>> indr block architecture.
>>
>> Signed-off-by: wenxu
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 00b9aab..66f89bc 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -4,6 +4,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/list.h>
>>  #include <net/flow_dissector.h>
>> +#include <linux/rhashtable.h>
>>
>>  struct flow_match {
>>  	struct flow_dissector *dissector;
>> @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
>>  	INIT_LIST_HEAD(&flow_block->cb_list);
>>  }
>>
>> +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
>> +				      enum tc_setup_type type, void *type_data);
>> +
>> +struct flow_indr_block_cb {
>> +	struct list_head list;
>> +	void *cb_priv;
>> +	flow_indr_block_bind_cb_t *cb;
>> +	void *cb_ident;
>> +};
>> +
>> +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
>> +				       struct flow_block *flow_block,
>> +				       struct flow_indr_block_cb *indr_block_cb,
>> +				       enum flow_block_command command);
>> +
>> +struct flow_indr_block_dev {
>> +	struct rhash_head ht_node;
>> +	struct net_device *dev;
>> +	unsigned int refcnt;
>> +	struct list_head cb_list;
>> +	flow_indr_block_ing_cmd_t *ing_cmd_cb;
>> +	struct flow_block *flow_block;
> TC can only have one block per device. Now with nftables offload we can
> have multiple blocks. Could you elaborate how this is solved?
>
>> +};

The nftables offload only works on a netdev base chain, and each device is limited to one netdev base chain.
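[That one-block-per-device invariant also shows up structurally in patch 3/3: the offloaded base chain embeds its flow_block, so the indirect path can recover the chain from the block pointer alone. A one-line illustration, assuming the nft_base_chain layout used in the patch:

static struct nft_base_chain *nft_chain_of(struct flow_block *block)
{
	/* valid precisely because one netdev base chain owns one embedded
	 * flow_block, and one device has at most one such chain */
	return container_of(block, struct nft_base_chain, flow_block);
}
]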
[PATCH net] net/mlx5e: Fix unnecessary flow_block_cb_is_busy call
From: wenxu

At the point where flow_block_cb_is_busy() is called, indr_priv is
guaranteed to be a NULL pointer (the lookup just above returned nothing),
so there is no need to call flow_block_cb_is_busy() at all.

Fixes: 0d4fd02e7199 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Signed-off-by: wenxu
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..496d303 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -722,10 +722,6 @@ static void mlx5e_rep_indr_tc_block_unbind(void *cb_priv)
 	if (indr_priv)
 		return -EEXIST;
 
-	if (flow_block_cb_is_busy(mlx5e_rep_indr_setup_block_cb,
-				  indr_priv, &mlx5e_block_cb_list))
-		return -EBUSY;
-
 	indr_priv = kmalloc(sizeof(*indr_priv), GFP_KERNEL);
 	if (!indr_priv)
 		return -ENOMEM;
-- 
1.8.3.1
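[The argument, restated as the control flow around the removed hunk; a paraphrase of the bind path in en_rep.c, not a verbatim excerpt:

	/* a previous registration would have been found by the lookup */
	indr_priv = mlx5e_rep_indr_block_priv_lookup(rpriv, netdev);
	if (indr_priv)
		return -EEXIST;

	/* indr_priv == NULL from here on, so a busy check keyed on it,
	 * flow_block_cb_is_busy(cb, indr_priv, &mlx5e_block_cb_list),
	 * can never report true and is dead code */
	indr_priv = kmalloc(sizeof(*indr_priv), GFP_KERNEL);
	if (!indr_priv)
		return -ENOMEM;
]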
[PATCH net-next v4 0/3] flow_offload: add indr-block in nf_table_offload
From: wenxu

This series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first patch moves the tc indr block to flow offload and renames it
to flow-indr-block.

Because the new flow-indr-block can't get the tcf_block directly, the
second patch provides a callback to get the tcf_block immediately when
the device registers and contains an ingress block.

The third patch makes nf_tables_offload support flow-indr-block.

wenxu (3):
  flow_offload: move tc indirect block to flow offload
  flow_offload: Support get default block from tc immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  10 +-
 include/net/flow_offload.h                       |  39
 include/net/pkt_cls.h                            |  42 +---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 181 +++
 net/netfilter/nf_tables_offload.c                | 131 +--
 net/sched/cls_api.c                              | 246 -
 8 files changed, 385 insertions(+), 277 deletions(-)

-- 
1.8.3.1
[PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
From: wenxu When thre indr device register, it can get the default block from tc immediately if the block is exist. Signed-off-by: wenxu --- v3: no change v4: get tc default block without callback include/net/pkt_cls.h | 7 +++ net/core/flow_offload.c | 2 ++ net/sched/cls_api.c | 33 + 3 files changed, 42 insertions(+) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 0790a4e..77c3a42 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -54,6 +54,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, struct tcf_block_ext_info *ei); +void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev); + static inline bool tcf_block_shared(struct tcf_block *block) { return block->index; @@ -74,6 +76,11 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res, bool compat_mode); #else +static inline +void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev) +{ +} + static inline bool tcf_block_shared(struct tcf_block *block) { return false; diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c index 9f1ae67..0ca3d51 100644 --- a/net/core/flow_offload.c +++ b/net/core/flow_offload.c @@ -3,6 +3,7 @@ #include #include #include +#include struct flow_rule *flow_rule_alloc(unsigned int num_actions) { @@ -312,6 +313,7 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; + tc_indr_get_default_block(indr_dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, flow_indr_setup_block_ht_params)) { kfree(indr_dev); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index d551c56..59e9572 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -576,6 +576,39 @@ static void tc_indr_block_ing_cmd(struct net_device *dev, tcf_block_setup(block, &bo); } +static struct tcf_block *tc_dev_ingress_block(struct net_device *dev) +{ + const struct Qdisc_class_ops *cops; + struct Qdisc *qdisc; + + if (!dev_ingress_queue(dev)) + return NULL; + + qdisc = dev_ingress_queue(dev)->qdisc_sleeping; + if (!qdisc) + return NULL; + + cops = qdisc->ops->cl_ops; + if (!cops) + return NULL; + + if (!cops->tcf_block) + return NULL; + + return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL); +} + +void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev) +{ + struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev); + + if (block) { + indr_dev->flow_block = &block->flow_block; + indr_dev->ing_cmd_cb = tc_indr_block_ing_cmd; + } +} +EXPORT_SYMBOL(tc_indr_get_default_block); + static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, struct tcf_block_ext_info *ei, enum flow_block_command command, -- 1.8.3.1
[PATCH net-next v4 3/3] netfilter: nf_tables_offload: support indr block call
From: wenxu nftable support indr-block call. It makes nftable an offload vlan and tunnel device. nft add table netdev firewall nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; } nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; } nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 Signed-off-by: wenxu --- v3: subsys_initcall for init_flow_indr_rhashtable v4: guarantee only one offload base chain used per indr dev. If the indr_block_cmd bind fail return unsupported. net/netfilter/nf_tables_offload.c | 131 +++--- 1 file changed, 107 insertions(+), 24 deletions(-) diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c index 64f5fd5..19214ad 100644 --- a/net/netfilter/nf_tables_offload.c +++ b/net/netfilter/nf_tables_offload.c @@ -171,24 +171,123 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo, return 0; } +static int nft_block_setup(struct nft_base_chain *basechain, + struct flow_block_offload *bo, + enum flow_block_command cmd) +{ + int err; + + switch (cmd) { + case FLOW_BLOCK_BIND: + err = nft_flow_offload_bind(bo, basechain); + break; + case FLOW_BLOCK_UNBIND: + err = nft_flow_offload_unbind(bo, basechain); + break; + default: + WARN_ON_ONCE(1); + err = -EOPNOTSUPP; + } + + return err; +} + +static int nft_block_offload_cmd(struct nft_base_chain *chain, +struct net_device *dev, +enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + int err; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; + bo.extack = &extack; + INIT_LIST_HEAD(&bo.cb_list); + + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo); + if (err < 0) + return err; + + return nft_block_setup(chain, &bo, cmd); +} + +static void nft_indr_block_ing_cmd(struct net_device *dev, + struct flow_block *flow_block, + struct flow_indr_block_cb *indr_block_cb, + enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + struct nft_base_chain *chain; + + if (flow_block) + return; + + chain = container_of(flow_block, struct nft_base_chain, flow_block); + + bo.net = dev_net(dev); + bo.block = flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; + bo.extack = &extack; + INIT_LIST_HEAD(&bo.cb_list); + + indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo); + + nft_block_setup(chain, &bo, cmd); +} + +static int nft_indr_block_offload_cmd(struct nft_base_chain *chain, + struct net_device *dev, + enum flow_block_command cmd) +{ + struct flow_indr_block_cb *indr_block_cb; + struct flow_indr_block_dev *indr_dev; + struct flow_block_offload bo = {}; + struct netlink_ext_ack extack = {}; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; + bo.extack = &extack; + INIT_LIST_HEAD(&bo.cb_list); + + indr_dev = flow_indr_block_dev_lookup(dev); + if (!indr_dev) + return -EOPNOTSUPP; + + indr_dev->flow_block = cmd == FLOW_BLOCK_BIND ? &chain->flow_block : NULL; + indr_dev->ing_cmd_cb = cmd == FLOW_BLOCK_BIND ? 
nft_indr_block_ing_cmd : NULL; + + list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) + indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, + &bo); + + if (list_empty(&bo.cb_list)) + return -EOPNOTSUPP; + + return nft_block_setup(chain, &bo, cmd); +} + #define FLOW_SETUP_BLOCK TC_SETUP_BLOCK static int nft_flow_offload_chain(struct nft_trans *trans, enum flow_block_command cmd) { struct nft_chain *chain = trans->ctx.chain; - struct netlink_ext_ack extack = {}; - struct flow_block_offload bo = {}; struct nft_base_chain *basechain; struct net_device *dev; - int err;
[PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu --- v3: subsys_initcall for init_flow_indr_rhashtable v4: no change drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +- .../net/ethernet/netronome/nfp/flower/offload.c| 10 +- include/net/flow_offload.h | 39 include/net/pkt_cls.h | 35 --- include/net/sch_generic.h | 3 - net/core/flow_offload.c| 179 net/sched/cls_api.c| 235 ++--- 7 files changed, 247 insertions(+), 264 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 7f747cb..074573b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, { int err; - err = __tc_indr_block_cb_register(netdev, rpriv, - mlx5e_rep_indr_setup_tc_cb, - rpriv); + err = __flow_indr_block_cb_register(netdev, rpriv, + mlx5e_rep_indr_setup_tc_cb, + rpriv); if (err) { struct mlx5e_priv *priv = netdev_priv(rpriv->netdev); @@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv, struct net_device *netdev) { - __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, - rpriv); + __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, + rpriv); } static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb, diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c index e209f15..6a0f034 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -1479,16 +1479,16 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app, return NOTIFY_OK; if (event == NETDEV_REGISTER) { - err = __tc_indr_block_cb_register(netdev, app, - nfp_flower_indr_setup_tc_cb, - app); + err = __flow_indr_block_cb_register(netdev, app, + nfp_flower_indr_setup_tc_cb, + app); if (err) nfp_flower_cmsg_warn(app, "Indirect block reg failed - %s\n", netdev->name); } else if (event == NETDEV_UNREGISTER) { - __tc_indr_block_cb_unregister(netdev, - nfp_flower_indr_setup_tc_cb, app); + __flow_indr_block_cb_unregister(netdev, + nfp_flower_indr_setup_tc_cb, app); } return NOTIFY_OK; diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 00b9aab..66f89bc 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -4,6 +4,7 @@ #include #include #include +#include struct flow_match { struct flow_dissector *dissector; @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block) INIT_LIST_HEAD(&flow_block->cb_list); } +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv, + enum tc_setup_type type, void *type_data); + +struct flow_indr_block_cb { + struct list_head list; + void *cb_priv; + flow_indr_block_bind_cb_t *cb; + void *cb_ident; +}; + +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, + struct flow_block *flow_block, + struct flow_indr_block_cb *indr_block_cb, + enum flow_block_command command); + +struct flow_indr_block_dev { + struct rhash_head ht_node; + struct net_device *dev; + unsigned int refcnt; + struct list_head cb_list; + flow_indr_block_ing_cmd_t *ing_cmd_cb; + struct flow_block *flow_block; +}; + +struct flow_indr_block_dev 
*flow_indr_block_dev_lookup(struct net_device *dev); + +int __flow_indr_block_cb_register(struct net_dev
Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
On 7/29/2019 4:16 AM, Jakub Kicinski wrote:
> .
> The TC default block is there because the indirect registration may
> happen _after_ the block is installed and populated. It's the device
> driver that usually does the indirect registration, the tunnel device
> and its rules may already be set when device driver is loaded or
> reloaded.

Yes, I know this scenario.

> I don't know the nft code, but it seems unlikely it wouldn't have the
> same problem/need..

nft doesn't have the same problem. An offload rule can only be attached to an offload base chain, and the offload base chain is created after the device driver is loaded (i.e. the device exists).
>
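[For comparison, the tc side of this series covers the late-registration case by resolving the block and replaying a bind at registration time. A condensed paraphrase of the flow from patches 4/6 and 5/6 later in this thread, not a verbatim excerpt:

int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
				  flow_indr_block_bind_cb_t *cb,
				  void *cb_ident)
{
	struct flow_indr_block_dev *indr_dev;

	/* creates the entry and, via flow_get_default_block(), captures
	 * an ingress block that was installed before this driver loaded */
	indr_dev = flow_indr_block_dev_get(dev);
	if (!indr_dev)
		return -ENOMEM;

	/* ...callback bookkeeping elided... */

	/* replay: bind the pre-existing block to the new driver */
	if (indr_dev->ing_cmd_cb)
		indr_dev->ing_cmd_cb(dev, indr_dev->flow_block,
				     cb, cb_priv, FLOW_BLOCK_BIND);
	return 0;
}
]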
Re: [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
Hi Pablo,

Any suggestion for this case? The 0 prio value is an invalid priority for
the driver, so what should we do here? Currently there is no prio for
individual nft rules.

BR
wenxu

On 7/25/2019 11:45 AM, Marcelo Ricardo Leitner wrote:
> On Thu, Jul 25, 2019 at 11:03:52AM +0800, wenxu wrote:
>> On 7/25/2019 7:51 AM, Marcelo Ricardo Leitner wrote:
>>> On Thu, Jul 11, 2019 at 04:03:30PM +0800, we...@ucloud.cn wrote:
>>>> From: wenxu
>>>>
>>>> The flow_cls_common_offload prio should not be zero;
>>>> it leads to an invalid table prio in hw.
>>>>
>>>> # nft add table netdev firewall
>>>> # nft add chain netdev firewall acl { type filter hook ingress device mlx_pf0vf0 priority - 300 \; }
>>>> # nft add rule netdev firewall acl ip daddr 1.1.1.7 drop
>>>> Error: Could not process rule: Invalid argument
>>>>
>>>> kernel log
>>>> mlx5_core :81:00.0: E-Switch: Failed to create FDB Table err -22 (table prio: 65535, level: 0, size: 4194304)
>>>>
>>>> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
>>>> Signed-off-by: wenxu
>>>> ---
>>>>  net/netfilter/nf_tables_offload.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
>>>> index 2c33028..01d8133 100644
>>>> --- a/net/netfilter/nf_tables_offload.c
>>>> +++ b/net/netfilter/nf_tables_offload.c
>>>> @@ -7,6 +7,8 @@
>>>>  #include
>>>>  #include
>>>>
>>>> +#define FLOW_OFFLOAD_DEFAUT_PRIO 1U
>>>> +
>>>>  static struct nft_flow_rule *nft_flow_rule_alloc(int num_actions)
>>>>  {
>>>>  	struct nft_flow_rule *flow;
>>>> @@ -107,6 +109,7 @@ static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
>>>>  	struct netlink_ext_ack *extack)
>>>>  {
>>>>  	common->protocol = proto;
>>>> +	common->prio = TC_H_MAKE(FLOW_OFFLOAD_DEFAUT_PRIO << 16, 0);
>>> Note that tc semantics for this is to auto-generate a priority in such
>>> cases, instead of using a default.
>>>
>>> @tc_new_tfilter():
>>>         if (prio == 0) {
>>>                 /* If no priority is provided by the user,
>>>                  * we allocate one.
>>>                  */
>>>                 if (n->nlmsg_flags & NLM_F_CREATE) {
>>>                         prio = TC_H_MAKE(0x80000000U, 0U);
>>>                         prio_allocate = true;
>>> ...
>>>         if (prio_allocate)
>>>                 prio = tcf_auto_prio(tcf_chain_tp_prev(chain, &chain_info));
>> Yes, the tc auto-generates a priority. But if there is no previous
>> tcf_proto, the priority is also set to a default.
> After the first filter, there will be a tcf_proto. Please see the test below.
>
>> In nftables the rules have no priority relative to each other, so it is
>> enough to set a default value, similar to what tc does.
> Yep, maybe it works for nftables. I'm just highlighting this because
> it is reusing tc infrastructure and will expose a different behavior
> to the user. But if nftables already has this defined, that probably
> takes precedence by now and all that is left to do is to make sure any
> documentation on it is updated. Pablo?
>
>> static inline u32 tcf_auto_prio(struct tcf_proto *tp)
>> {
>> 	u32 first = TC_H_MAKE(0xC0000000U, 0U);
> base default prio, 0xC000 = 49152
>
>> 	if (tp)
>> 		first = tp->prio - 1;
>>
>> 	return TC_H_MAJ(first);
>> }
> # tc qdisc add dev veth1 ingress
> # tc filter add dev veth1 ingress proto ip flower src_mac ec:13:db:00:00:00 action drop
>                                        1st filter --^^
> # tc filter add dev veth1 ingress proto ip flower src_mac ec:13:db:00:00:01 action drop
>                                        2nd filter --^^
> # tc filter add dev veth1 ingress proto ip flower src_mac ec:13:db:00:00:02 action drop
>
> With no 'prio X' parameter, it uses 0 as default, and when dumped:
>
> # tc filter show dev veth1 ingress
> filter protocol ip pref 4915
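[As a cross-check on those constants: the pref that "tc filter show" prints is just the upper 16 bits of the kernel's u32 prio, so tcf_auto_prio() yields pref 49152, 49151, ... for auto-allocated filters. A small userspace sketch reproducing the arithmetic, with the macros copied from include/uapi/linux/pkt_sched.h:

#include <stdio.h>

#define TC_H_MAJ_MASK (0xFFFF0000U)
#define TC_H_MIN_MASK (0x0000FFFFU)
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))

int main(void)
{
	/* tcf_auto_prio(NULL): no previous tcf_proto in the chain */
	unsigned int prio = TC_H_MAJ(TC_H_MAKE(0xC0000000U, 0U));

	printf("1st auto pref: %u\n", prio >> 16);                /* 49152 */
	printf("2nd auto pref: %u\n", TC_H_MAJ(prio - 1) >> 16);  /* 49151 */
	/* the temporary value tc_new_tfilter() uses while allocating */
	printf("tmp pref: %u\n", TC_H_MAKE(0x80000000U, 0U) >> 16); /* 32768 */
	return 0;
}
]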
Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
On 7/29/2019 12:42 PM, Jakub Kicinski wrote:
> On Mon, 29 Jul 2019 10:43:56 +0800, wenxu wrote:
>> On 7/29/2019 4:16 AM, Jakub Kicinski wrote:
>>> I don't know the nft code, but it seems unlikely it wouldn't have the
>>> same problem/need..
>> nft doesn't have the same problem. The offload rule can only be attached
>> to an offload base chain.
>>
>> The offload base chain is created after the device driver is loaded (the
>> device exists).
> For indirect blocks the block is on the tunnel device and the offload
> target is another device. E.g. you offload rules from a VXLAN device
> onto the ASIC. The ASICs driver does not have to be loaded when VXLAN
> device is created.
>
> So I feel like either the chain somehow directly references the offload
> target (in which case the indirect infrastructure with hash lookup etc
> is not needed for nft), or indirect infra is needed, and we need to take
> care of replays.

So you mean the case is: there are two cards A and B that can both offload vxlan. First the vxlan device is offloaded through A, then the B driver is loaded; should the rules be replayed to device B as well?
Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
On 7/29/2019 12:42 PM, Jakub Kicinski wrote:
> On Mon, 29 Jul 2019 10:43:56 +0800, wenxu wrote:
>> On 7/29/2019 4:16 AM, Jakub Kicinski wrote:
>>> I don't know the nft code, but it seems unlikely it wouldn't have the
>>> same problem/need..
>> nft doesn't have the same problem. The offload rule can only be attached
>> to an offload base chain.
>>
>> The offload base chain is created after the device driver is loaded (the
>> device exists).
> For indirect blocks the block is on the tunnel device and the offload
> target is another device. E.g. you offload rules from a VXLAN device
> onto the ASIC. The ASICs driver does not have to be loaded when VXLAN
> device is created.
>
> So I feel like either the chain somehow directly references the offload
> target (in which case the indirect infrastructure with hash lookup etc
> is not needed for nft), or indirect infra is needed, and we need to take
> care of replays.

I think nft is different from tc here.

In the tc case we can create a vxlan device and add an ingress qdisc with a block successfully. Then, when the ASIC driver is loaded, it registers the vxlan indr-dev, gets the block and replays it to hardware.

But in the nft case, the base chain is flagged with offload. Creating an offload netdev base chain on a vxlan device will fail if there is no indr device to offload to.
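[That failure mode is explicit in the v4 patch 3/3 above: the indirect bind path reports -EOPNOTSUPP when either no indirect entry exists for the device or no registered driver actually attached a callback, so the offload base chain cannot be created without an offloader present. Condensed from nft_indr_block_offload_cmd() in v4:

	indr_dev = flow_indr_block_dev_lookup(dev);
	if (!indr_dev)
		return -EOPNOTSUPP;	/* no ASIC driver registered for dev */

	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
		indr_block_cb->cb(dev, indr_block_cb->cb_priv,
				  TC_SETUP_BLOCK, &bo);

	if (list_empty(&bo.cb_list))
		return -EOPNOTSUPP;	/* nobody took the block: chain creation fails */
]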
Re: [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
On 2019/7/29 19:13, Jiri Pirko wrote:
> Sun, Jul 28, 2019 at 08:52:47AM CEST, we...@ucloud.cn wrote:
>> From: wenxu
>>
>> Move tc indirect block to flow_offload and rename
>> it to flow indirect block, so that nf_tables can use the
>> indr block architecture.
>>
>> Signed-off-by: wenxu
>> ---
>> v3: subsys_initcall for init_flow_indr_rhashtable
>> v4: no change
>>
> [...]
>
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 00b9aab..66f89bc 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -4,6 +4,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/list.h>
>>  #include <net/flow_dissector.h>
>> +#include <linux/rhashtable.h>
>>
>>  struct flow_match {
>>  	struct flow_dissector *dissector;
>> @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
>>  	INIT_LIST_HEAD(&flow_block->cb_list);
>>  }
>>
>> +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
>> +				      enum tc_setup_type type, void *type_data);
>> +
>> +struct flow_indr_block_cb {
>> +	struct list_head list;
>> +	void *cb_priv;
>> +	flow_indr_block_bind_cb_t *cb;
>> +	void *cb_ident;
>> +};
> I don't understand why are you pushing this struct out of the c file to
> the header. Please don't.
>
>> +
>> +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
>> +				       struct flow_block *flow_block,
>> +				       struct flow_indr_block_cb *indr_block_cb,
>> +				       enum flow_block_command command);
>> +
>> +struct flow_indr_block_dev {
>> +	struct rhash_head ht_node;
>> +	struct net_device *dev;
>> +	unsigned int refcnt;
>> +	struct list_head cb_list;
>> +	flow_indr_block_ing_cmd_t *ing_cmd_cb;
>> +	struct flow_block *flow_block;
> I don't understand why are you pushing this struct out of the c file to
> the header. Please don't.

struct flow_indr_block_dev and struct flow_indr_block_cb are in the header file because they are used by tc_indr_block_ing_cmd() in cls_api.c.

>> -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
>> -				  struct tc_indr_block_cb *indr_block_cb,
>> +static void tc_indr_block_ing_cmd(struct net_device *dev,
> I don't understand why you change struct tc_indr_block_dev * to
> struct net_device * here. If you want to do that, please do that in a
> separate patch, not in this one where only "the move" should happen.
>
Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
On 2019/7/30 0:55, Jakub Kicinski wrote:
> On Mon, 29 Jul 2019 15:18:03 +0800, wenxu wrote:
>> On 7/29/2019 12:42 PM, Jakub Kicinski wrote:
>>> On Mon, 29 Jul 2019 10:43:56 +0800, wenxu wrote:
>>>> On 7/29/2019 4:16 AM, Jakub Kicinski wrote:
>>>>> I don't know the nft code, but it seems unlikely it wouldn't have the
>>>>> same problem/need..
>>>> nft doesn't have the same problem. The offload rule can only be attached
>>>> to an offload base chain.
>>>>
>>>> The offload base chain is created after the device driver is loaded (the
>>>> device exists).
>>> For indirect blocks the block is on the tunnel device and the offload
>>> target is another device. E.g. you offload rules from a VXLAN device
>>> onto the ASIC. The ASICs driver does not have to be loaded when VXLAN
>>> device is created.
>>>
>>> So I feel like either the chain somehow directly references the offload
>>> target (in which case the indirect infrastructure with hash lookup etc
>>> is not needed for nft), or indirect infra is needed, and we need to take
>>> care of replays.
>> I think nft is different from tc here.
>>
>> In the tc case we can create a vxlan device and add an ingress qdisc with
>> a block successfully. Then, when the ASIC driver is loaded, it registers
>> the vxlan indr-dev, gets the block and replays it to hardware.
>>
>> But in the nft case, the base chain is flagged with offload. Creating an
>> offload netdev base chain on a vxlan device will fail if there is no indr
>> device to offload to.
> Can you show us the offload chain spec? Does it specify offload to the
> vxlan device or the ASIC device?

nft add chain netdev firewall aclout { type filter hook ingress offload device vxlan0 priority - 300 \; }

> Indir-devs can come and go, how do you handle a situation where offload
> chain was installed with indir listener present, but then the ASIC
> driver got removed?

Yes, I think nft also needs to get the default block in the indr-register callback, to handle the case where the driver goes away and is loaded again.
>
[PATCH net-next 0/6] flow_offload: add indr-block in nf_table_offload
From: wenxu

This series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first four patches move the tc indr block to flow offload and rename
it to flow-indr-block.

Because the new flow-indr-block can't get the tcf_block directly, the
fifth patch provides a callback list so each subsystem can hand over its
flow_block immediately when the device registers and contains a block.

The last patch makes nf_tables_offload support flow-indr-block.

wenxu (6):
  cls_api: modify the tc_indr_block_ing_cmd parameters.
  cls_api: replace block with flow_block in tc_indr_block_dev
  cls_api: add flow_indr_block_call function
  flow_offload: move tc indirect block to flow offload
  flow_offload: support get flow_block immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  11 +-
 include/net/flow_offload.h                       |  48
 include/net/netfilter/nf_tables_offload.h        |   2 +
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 251
 net/netfilter/nf_tables_api.c                    |   7 +
 net/netfilter/nf_tables_offload.c                | 156 +++--
 net/sched/cls_api.c                              | 255 -
 10 files changed, 497 insertions(+), 281 deletions(-)

-- 
1.8.3.1
[PATCH net-next 4/6] flow_offload: move tc indirect block to flow offload
From: wenxu move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +- .../net/ethernet/netronome/nfp/flower/offload.c| 11 +- include/net/flow_offload.h | 31 +++ include/net/pkt_cls.h | 35 --- include/net/sch_generic.h | 3 - net/core/flow_offload.c| 218 +++ net/sched/cls_api.c| 241 + 7 files changed, 265 insertions(+), 284 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 7f747cb..074573b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, { int err; - err = __tc_indr_block_cb_register(netdev, rpriv, - mlx5e_rep_indr_setup_tc_cb, - rpriv); + err = __flow_indr_block_cb_register(netdev, rpriv, + mlx5e_rep_indr_setup_tc_cb, + rpriv); if (err) { struct mlx5e_priv *priv = netdev_priv(rpriv->netdev); @@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv, struct net_device *netdev) { - __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, - rpriv); + __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, + rpriv); } static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb, diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c index e209f15..7b490db 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -1479,16 +1479,17 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app, return NOTIFY_OK; if (event == NETDEV_REGISTER) { - err = __tc_indr_block_cb_register(netdev, app, - nfp_flower_indr_setup_tc_cb, - app); + err = __flow_indr_block_cb_register(netdev, app, + nfp_flower_indr_setup_tc_cb, + app); if (err) nfp_flower_cmsg_warn(app, "Indirect block reg failed - %s\n", netdev->name); } else if (event == NETDEV_UNREGISTER) { - __tc_indr_block_cb_unregister(netdev, - nfp_flower_indr_setup_tc_cb, app); + __flow_indr_block_cb_unregister(netdev, + nfp_flower_indr_setup_tc_cb, + app); } return NOTIFY_OK; diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 00b9aab..c8d60a6 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -4,6 +4,7 @@ #include #include #include +#include struct flow_match { struct flow_dissector *dissector; @@ -366,4 +367,34 @@ static inline void flow_block_init(struct flow_block *flow_block) INIT_LIST_HEAD(&flow_block->cb_list); } +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv, + enum tc_setup_type type, void *type_data); + +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, + struct flow_block *flow_block, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command); + +int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, + void *cb_ident); + +void __flow_indr_block_cb_unregister(struct net_device *dev, +flow_indr_block_bind_cb_t *cb, +void *cb_ident); + +int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, v
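[From a driver's point of view the rename is mechanical; the mlx5e and nfp hunks above both reduce to the following notifier pattern. A sketch with hypothetical example_* names standing in for the driver's own callback and private context:

static void *example_cb_priv;	/* hypothetical driver context */
static flow_indr_block_bind_cb_t example_indr_setup_tc_cb; /* hypothetical */

static int example_netdevice_event(struct notifier_block *nb,
				   unsigned long event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);

	switch (event) {
	case NETDEV_REGISTER:
		/* start receiving TC_SETUP_BLOCK for blocks on netdev */
		if (__flow_indr_block_cb_register(netdev, example_cb_priv,
						  example_indr_setup_tc_cb,
						  example_cb_priv))
			pr_warn("indirect block reg failed: %s\n",
				netdev->name);
		break;
	case NETDEV_UNREGISTER:
		__flow_indr_block_cb_unregister(netdev,
						example_indr_setup_tc_cb,
						example_cb_priv);
		break;
	}
	return NOTIFY_OK;
}
]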
[PATCH net-next 2/6] cls_api: replace block with flow_block in tc_indr_block_dev
From: wenxu This patch make tc_indr_block_dev can separate from tc subsystem Signed-off-by: wenxu --- net/sched/cls_api.c | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 2e3b58d..f9643fa 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -574,7 +574,7 @@ struct tc_indr_block_dev { struct net_device *dev; unsigned int refcnt; struct list_head cb_list; - struct tcf_block *block; + struct flow_block *flow_block; }; struct tc_indr_block_cb { @@ -597,6 +597,14 @@ struct tc_indr_block_cb { tc_indr_setup_block_ht_params); } +static void tc_indr_get_default_block(struct tc_indr_block_dev *indr_dev) +{ + struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev); + + if (block) + indr_dev->flow_block = &block->flow_block; +} + static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev) { struct tc_indr_block_dev *indr_dev; @@ -611,7 +619,7 @@ static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev) INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; - indr_dev->block = tc_dev_ingress_block(dev); + tc_indr_get_default_block(indr_dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, tc_indr_setup_block_ht_params)) { kfree(indr_dev); @@ -678,11 +686,14 @@ static int tcf_block_setup(struct tcf_block *block, struct flow_block_offload *bo); static void tc_indr_block_ing_cmd(struct net_device *dev, - struct tcf_block *block, + struct flow_block *flow_block, tc_indr_block_bind_cb_t *cb, void *cb_priv, enum flow_block_command command) { + struct tcf_block *block = flow_block ? container_of(flow_block, + struct tcf_block, + flow_block) : NULL; struct flow_block_offload bo = { .command= command, .binder_type= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS, @@ -694,7 +705,7 @@ static void tc_indr_block_ing_cmd(struct net_device *dev, if (!block) return; - bo.block = &block->flow_block; + bo.block = flow_block; cb(dev, cb_priv, TC_SETUP_BLOCK, &bo); @@ -717,7 +728,7 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, + tc_indr_block_ing_cmd(dev, indr_dev->flow_block, cb, cb_priv, FLOW_BLOCK_BIND); return 0; @@ -750,13 +761,14 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, if (!indr_dev) return; - indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident); + indr_block_cb = tc_indr_block_cb_lookup(indr_dev, indr_block_cb->cb, + indr_block_cb->cb_ident); if (!indr_block_cb) return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, - FLOW_BLOCK_UNBIND); + tc_indr_block_ing_cmd(dev, indr_dev->flow_block, indr_block_cb->cb, + indr_block_cb->cb_priv, FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } @@ -792,7 +804,8 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, if (!indr_dev) return; - indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL; + indr_dev->flow_block = command == FLOW_BLOCK_BIND ? + &block->flow_block : NULL; list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, -- 1.8.3.1
[PATCH net-next 1/6] cls_api: modify the tc_indr_block_ing_cmd parameters.
From: wenxu This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu --- net/sched/cls_api.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 3565d9a..2e3b58d 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -677,26 +677,28 @@ static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb) static int tcf_block_setup(struct tcf_block *block, struct flow_block_offload *bo); -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev, - struct tc_indr_block_cb *indr_block_cb, +static void tc_indr_block_ing_cmd(struct net_device *dev, + struct tcf_block *block, + tc_indr_block_bind_cb_t *cb, + void *cb_priv, enum flow_block_command command) { struct flow_block_offload bo = { .command= command, .binder_type= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS, - .net= dev_net(indr_dev->dev), - .block_shared = tcf_block_non_null_shared(indr_dev->block), + .net= dev_net(dev), + .block_shared = tcf_block_non_null_shared(block), }; INIT_LIST_HEAD(&bo.cb_list); - if (!indr_dev->block) + if (!block) return; - bo.block = &indr_dev->block->flow_block; + bo.block = &block->flow_block; - indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - tcf_block_setup(indr_dev->block, &bo); + cb(dev, cb_priv, TC_SETUP_BLOCK, &bo); + + tcf_block_setup(block, &bo); } int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, @@ -715,7 +717,8 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, + FLOW_BLOCK_BIND); return 0; err_dev_put: @@ -752,7 +755,8 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, + FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } -- 1.8.3.1
[PATCH net-next 6/6] netfilter: nf_tables_offload: support indr block call
From: wenxu

nftables supports the indr-block call. This makes nftables able to offload
rules to vlan and tunnel devices.

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu
---
 include/net/netfilter/nf_tables_offload.h |   2 +
 net/netfilter/nf_tables_api.c             |   7 ++
 net/netfilter/nf_tables_offload.c         | 156 +-
 3 files changed, 141 insertions(+), 24 deletions(-)

diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h
index 3196663..ac69087 100644
--- a/include/net/netfilter/nf_tables_offload.h
+++ b/include/net/netfilter/nf_tables_offload.h
@@ -63,6 +63,8 @@ struct nft_flow_rule {
 struct nft_flow_rule *nft_flow_rule_create(const struct nft_rule *rule);
 void nft_flow_rule_destroy(struct nft_flow_rule *flow);
 int nft_flow_rule_offload_commit(struct net *net);
+bool nft_indr_get_default_block(struct net_device *dev,
+				struct flow_indr_block_info *info);
 
 #define NFT_OFFLOAD_MATCH(__key, __base, __field, __len, __reg)	\
 	(__reg)->base_offset	=					\
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 605a7cf..6a1d0b2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7593,6 +7593,11 @@ static void __net_exit nf_tables_exit_net(struct net *net)
 	.exit	= nf_tables_exit_net,
 };
 
+static struct flow_indr_get_block_entry get_block_entry = {
+	.get_block_cb = nft_indr_get_default_block,
+	.list = LIST_HEAD_INIT(get_block_entry.list),
+};
+
 static int __init nf_tables_module_init(void)
 {
 	int err;
@@ -7624,6 +7629,7 @@ static int __init nf_tables_module_init(void)
 		goto err5;
 
 	nft_chain_route_init();
+	flow_indr_add_default_block_cb(&get_block_entry);
 	return err;
 err5:
 	rhltable_destroy(&nft_objname_ht);
@@ -7640,6 +7646,7 @@ static int __init nf_tables_module_init(void)
 
 static void __exit nf_tables_module_exit(void)
 {
+	flow_indr_del_default_block_cb(&get_block_entry);
 	nfnetlink_subsys_unregister(&nf_tables_subsys);
 	unregister_netdevice_notifier(&nf_tables_flowtable_notifier);
 	nft_chain_filter_fini();
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5..59c9629 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -171,24 +171,114 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
 	return 0;
 }
 
+static int nft_block_setup(struct nft_base_chain *basechain,
+			   struct flow_block_offload *bo,
+			   enum flow_block_command cmd)
+{
+	int err;
+
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		err = nft_flow_offload_bind(bo, basechain);
+		break;
+	case FLOW_BLOCK_UNBIND:
+		err = nft_flow_offload_unbind(bo, basechain);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		err = -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+				 struct net_device *dev,
+				 enum flow_block_command cmd)
+{
+	struct netlink_ext_ack extack = {};
+	struct flow_block_offload bo = {};
+	int err;
+
+	bo.net = dev_net(dev);
+	bo.block = &chain->flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+	if (err < 0)
+		return err;
+
+	return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev,
+				   struct flow_block *flow_block,
+				   flow_indr_block_bind_cb_t *cb,
+				   void *cb_priv,
+				   enum flow_block_command cmd)
+{
+	struct netlink_ext_ack extack = {};
+	struct flow_block_offload bo = {};
+	struct nft_base_chain *chain;
+
+	if (!flow_block)
+		return;
+
+	chain = container_of(flow_block, struct nft_base_chain, flow_block);
+
+	bo.net = dev_net(dev);
+	bo.block = flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+
[PATCH net-next 5/6] flow_offload: support get flow_block immediately
From: wenxu The new flow-indr-block can't get the tcf_block directly. It provide a callback list to find the flow_block immediately when the device register and contain a ingress block. Signed-off-by: wenxu --- include/net/flow_offload.h | 17 + net/core/flow_offload.c| 33 + net/sched/cls_api.c| 44 3 files changed, 94 insertions(+) diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index c8d60a6..db04e3f 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -376,6 +376,23 @@ typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, void *cb_priv, enum flow_block_command command); +struct flow_indr_block_info { + struct flow_block *flow_block; + flow_indr_block_ing_cmd_t *ing_cmd_cb; +}; + +typedef bool flow_indr_get_default_block_t(struct net_device *dev, + struct flow_indr_block_info *info); + +struct flow_indr_get_block_entry { + flow_indr_get_default_block_t *get_block_cb; + struct list_headlist; +}; + +void flow_indr_add_default_block_cb(struct flow_indr_get_block_entry *entry); + +void flow_indr_del_default_block_cb(struct flow_indr_get_block_entry *entry); + int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, flow_indr_block_bind_cb_t *cb, void *cb_ident); diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c index a1fdfa4..8ff7a75b 100644 --- a/net/core/flow_offload.c +++ b/net/core/flow_offload.c @@ -282,6 +282,8 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f, } EXPORT_SYMBOL(flow_block_cb_setup_simple); +static LIST_HEAD(get_default_block_cb_list); + static struct rhashtable indr_setup_block_ht; struct flow_indr_block_cb { @@ -313,6 +315,24 @@ struct flow_indr_block_dev { flow_indr_setup_block_ht_params); } +static void flow_get_default_block(struct flow_indr_block_dev *indr_dev) +{ + struct flow_indr_get_block_entry *entry_cb; + struct flow_indr_block_info info; + + rcu_read_lock(); + + list_for_each_entry_rcu(entry_cb, &get_default_block_cb_list, list) { + if (entry_cb->get_block_cb(indr_dev->dev, &info)) { + indr_dev->flow_block = info.flow_block; + indr_dev->ing_cmd_cb = info.ing_cmd_cb; + break; + } + } + + rcu_read_unlock(); +} + static struct flow_indr_block_dev * flow_indr_block_dev_get(struct net_device *dev) { @@ -328,6 +348,7 @@ struct flow_indr_block_dev { INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; + flow_get_default_block(indr_dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, flow_indr_setup_block_ht_params)) { kfree(indr_dev); @@ -492,6 +513,18 @@ void flow_indr_block_call(struct flow_block *flow_block, } EXPORT_SYMBOL_GPL(flow_indr_block_call); +void flow_indr_add_default_block_cb(struct flow_indr_get_block_entry *entry) +{ + list_add_tail_rcu(&entry->list, &get_default_block_cb_list); +} +EXPORT_SYMBOL_GPL(flow_indr_add_default_block_cb); + +void flow_indr_del_default_block_cb(struct flow_indr_get_block_entry *entry) +{ + list_del_rcu(&entry->list); +} +EXPORT_SYMBOL_GPL(flow_indr_del_default_block_cb); + static int __init init_flow_indr_rhashtable(void) { return rhashtable_init(&indr_setup_block_ht, diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index bd5e591..8bf918c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -576,6 +576,43 @@ static void tc_indr_block_ing_cmd(struct net_device *dev, tcf_block_setup(block, &bo); } +static struct tcf_block *tc_dev_ingress_block(struct net_device *dev) +{ + const struct Qdisc_class_ops *cops; + struct Qdisc *qdisc; + + if (!dev_ingress_queue(dev)) + return NULL; + + qdisc = 
dev_ingress_queue(dev)->qdisc_sleeping; + if (!qdisc) + return NULL; + + cops = qdisc->ops->cl_ops; + if (!cops) + return NULL; + + if (!cops->tcf_block) + return NULL; + + return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL); +} + +static bool tc_indr_get_default_block(struct net_device *dev, + struct flow_indr_block_info *info) +{ + struct tcf_block *block = tc_dev_ingress_block(dev); + + if (block) { + info->flow_block = &block->flow_block; + info->ing_cmd_cb = tc_indr_block_ing_cmd; + + return
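[Patch 6/6 below consumes these hooks from nftables' module init/exit. The registration shape for any subsystem looks like the following sketch, where the example_* names are hypothetical (nftables uses nft_indr_get_default_block):

static bool example_get_default_block(struct net_device *dev,
				      struct flow_indr_block_info *info)
{
	/* fill info->flow_block / info->ing_cmd_cb if this subsystem
	 * already owns an offloadable block on dev; return true if so */
	return false;
}

static struct flow_indr_get_block_entry example_entry = {
	.get_block_cb	= example_get_default_block,
	.list		= LIST_HEAD_INIT(example_entry.list),
};

static int __init example_init(void)
{
	flow_indr_add_default_block_cb(&example_entry);
	return 0;
}

static void __exit example_exit(void)
{
	flow_indr_del_default_block_cb(&example_entry);
}
]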
[PATCH net-next 3/6] cls_api: add flow_indr_block_call function
From: wenxu This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu --- net/sched/cls_api.c | 33 - 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index f9643fa..617b098 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -783,13 +783,30 @@ void tc_indr_block_cb_unregister(struct net_device *dev, } EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister); +static void flow_indr_block_call(struct flow_block *flow_block, +struct net_device *dev, +struct flow_block_offload *bo, +enum flow_block_command command) +{ + struct tc_indr_block_cb *indr_block_cb; + struct tc_indr_block_dev *indr_dev; + + indr_dev = tc_indr_block_dev_lookup(dev); + if (!indr_dev) + return; + + indr_dev->flow_block = command == FLOW_BLOCK_BIND ? flow_block : NULL; + + list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) + indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, + bo); +} + static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, struct tcf_block_ext_info *ei, enum flow_block_command command, struct netlink_ext_ack *extack) { - struct tc_indr_block_cb *indr_block_cb; - struct tc_indr_block_dev *indr_dev; struct flow_block_offload bo = { .command= command, .binder_type= ei->binder_type, @@ -800,17 +817,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, }; INIT_LIST_HEAD(&bo.cb_list); - indr_dev = tc_indr_block_dev_lookup(dev); - if (!indr_dev) - return; - - indr_dev->flow_block = command == FLOW_BLOCK_BIND ? - &block->flow_block : NULL; - - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) - indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - + flow_indr_block_call(&block->flow_block, dev, &bo, command); tcf_block_setup(block, &bo); } -- 1.8.3.1
[PATCH net-next v5 1/6] cls_api: modify the tc_indr_block_ing_cmd parameters.
From: wenxu This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu --- v5: new patch net/sched/cls_api.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 3565d9a..2e3b58d 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -677,26 +677,28 @@ static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb) static int tcf_block_setup(struct tcf_block *block, struct flow_block_offload *bo); -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev, - struct tc_indr_block_cb *indr_block_cb, +static void tc_indr_block_ing_cmd(struct net_device *dev, + struct tcf_block *block, + tc_indr_block_bind_cb_t *cb, + void *cb_priv, enum flow_block_command command) { struct flow_block_offload bo = { .command= command, .binder_type= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS, - .net= dev_net(indr_dev->dev), - .block_shared = tcf_block_non_null_shared(indr_dev->block), + .net= dev_net(dev), + .block_shared = tcf_block_non_null_shared(block), }; INIT_LIST_HEAD(&bo.cb_list); - if (!indr_dev->block) + if (!block) return; - bo.block = &indr_dev->block->flow_block; + bo.block = &block->flow_block; - indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - tcf_block_setup(indr_dev->block, &bo); + cb(dev, cb_priv, TC_SETUP_BLOCK, &bo); + + tcf_block_setup(block, &bo); } int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, @@ -715,7 +717,8 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, + FLOW_BLOCK_BIND); return 0; err_dev_put: @@ -752,7 +755,8 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, + FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } -- 1.8.3.1
[PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
From: wenxu The new flow-indr-block can't get the tcf_block directly. It provide a callback list to find the flow_block immediately when the device register and contain a ingress block. Signed-off-by: wenxu --- v5: add get_block_cb_list for both nft and tc include/net/flow_offload.h | 17 + net/core/flow_offload.c| 33 + net/sched/cls_api.c| 44 3 files changed, 94 insertions(+) diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index c8d60a6..db04e3f 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -376,6 +376,23 @@ typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, void *cb_priv, enum flow_block_command command); +struct flow_indr_block_info { + struct flow_block *flow_block; + flow_indr_block_ing_cmd_t *ing_cmd_cb; +}; + +typedef bool flow_indr_get_default_block_t(struct net_device *dev, + struct flow_indr_block_info *info); + +struct flow_indr_get_block_entry { + flow_indr_get_default_block_t *get_block_cb; + struct list_headlist; +}; + +void flow_indr_add_default_block_cb(struct flow_indr_get_block_entry *entry); + +void flow_indr_del_default_block_cb(struct flow_indr_get_block_entry *entry); + int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, flow_indr_block_bind_cb_t *cb, void *cb_ident); diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c index a1fdfa4..8ff7a75b 100644 --- a/net/core/flow_offload.c +++ b/net/core/flow_offload.c @@ -282,6 +282,8 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f, } EXPORT_SYMBOL(flow_block_cb_setup_simple); +static LIST_HEAD(get_default_block_cb_list); + static struct rhashtable indr_setup_block_ht; struct flow_indr_block_cb { @@ -313,6 +315,24 @@ struct flow_indr_block_dev { flow_indr_setup_block_ht_params); } +static void flow_get_default_block(struct flow_indr_block_dev *indr_dev) +{ + struct flow_indr_get_block_entry *entry_cb; + struct flow_indr_block_info info; + + rcu_read_lock(); + + list_for_each_entry_rcu(entry_cb, &get_default_block_cb_list, list) { + if (entry_cb->get_block_cb(indr_dev->dev, &info)) { + indr_dev->flow_block = info.flow_block; + indr_dev->ing_cmd_cb = info.ing_cmd_cb; + break; + } + } + + rcu_read_unlock(); +} + static struct flow_indr_block_dev * flow_indr_block_dev_get(struct net_device *dev) { @@ -328,6 +348,7 @@ struct flow_indr_block_dev { INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; + flow_get_default_block(indr_dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, flow_indr_setup_block_ht_params)) { kfree(indr_dev); @@ -492,6 +513,18 @@ void flow_indr_block_call(struct flow_block *flow_block, } EXPORT_SYMBOL_GPL(flow_indr_block_call); +void flow_indr_add_default_block_cb(struct flow_indr_get_block_entry *entry) +{ + list_add_tail_rcu(&entry->list, &get_default_block_cb_list); +} +EXPORT_SYMBOL_GPL(flow_indr_add_default_block_cb); + +void flow_indr_del_default_block_cb(struct flow_indr_get_block_entry *entry) +{ + list_del_rcu(&entry->list); +} +EXPORT_SYMBOL_GPL(flow_indr_del_default_block_cb); + static int __init init_flow_indr_rhashtable(void) { return rhashtable_init(&indr_setup_block_ht, diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index bd5e591..8bf918c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -576,6 +576,43 @@ static void tc_indr_block_ing_cmd(struct net_device *dev, tcf_block_setup(block, &bo); } +static struct tcf_block *tc_dev_ingress_block(struct net_device *dev) +{ + const struct Qdisc_class_ops *cops; + struct Qdisc *qdisc; + + if 
(!dev_ingress_queue(dev)) + return NULL; + + qdisc = dev_ingress_queue(dev)->qdisc_sleeping; + if (!qdisc) + return NULL; + + cops = qdisc->ops->cl_ops; + if (!cops) + return NULL; + + if (!cops->tcf_block) + return NULL; + + return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL); +} + +static bool tc_indr_get_default_block(struct net_device *dev, + struct flow_indr_block_info *info) +{ + struct tcf_block *block = tc_dev_ingress_block(dev); + + if (block) { + info->flow_block = &block->flow_block; + info->ing_cmd_cb = tc_in
[PATCH net-next v5 0/6] flow_offload: add indr-block in nf_table_offload
From: wenxu

This series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first four patches move the tc indr block to flow offload and rename
it to flow-indr-block.

Because the new flow-indr-block can't get the tcf_block directly, the
fifth patch provides a callback list so each subsystem can hand over its
flow_block immediately when the device registers and contains a block.

The last patch makes nf_tables_offload support flow-indr-block.

wenxu (6):
  cls_api: modify the tc_indr_block_ing_cmd parameters.
  cls_api: replace block with flow_block in tc_indr_block_dev
  cls_api: add flow_indr_block_call function
  flow_offload: move tc indirect block to flow offload
  flow_offload: support get flow_block immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  11 +-
 include/net/flow_offload.h                       |  48
 include/net/netfilter/nf_tables_offload.h        |   2 +
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 251
 net/netfilter/nf_tables_api.c                    |   7 +
 net/netfilter/nf_tables_offload.c                | 156 +++--
 net/sched/cls_api.c                              | 255 -
 10 files changed, 497 insertions(+), 281 deletions(-)

-- 
1.8.3.1
[PATCH net-next v5 4/6] flow_offload: move tc indirect block to flow offload
From: wenxu move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu --- v5: make flow_indr_block_cb/dev in c file drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +- .../net/ethernet/netronome/nfp/flower/offload.c| 11 +- include/net/flow_offload.h | 31 +++ include/net/pkt_cls.h | 35 --- include/net/sch_generic.h | 3 - net/core/flow_offload.c| 218 +++ net/sched/cls_api.c| 241 + 7 files changed, 265 insertions(+), 284 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 7f747cb..074573b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, { int err; - err = __tc_indr_block_cb_register(netdev, rpriv, - mlx5e_rep_indr_setup_tc_cb, - rpriv); + err = __flow_indr_block_cb_register(netdev, rpriv, + mlx5e_rep_indr_setup_tc_cb, + rpriv); if (err) { struct mlx5e_priv *priv = netdev_priv(rpriv->netdev); @@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv, struct net_device *netdev) { - __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, - rpriv); + __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, + rpriv); } static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb, diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c index e209f15..7b490db 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -1479,16 +1479,17 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app, return NOTIFY_OK; if (event == NETDEV_REGISTER) { - err = __tc_indr_block_cb_register(netdev, app, - nfp_flower_indr_setup_tc_cb, - app); + err = __flow_indr_block_cb_register(netdev, app, + nfp_flower_indr_setup_tc_cb, + app); if (err) nfp_flower_cmsg_warn(app, "Indirect block reg failed - %s\n", netdev->name); } else if (event == NETDEV_UNREGISTER) { - __tc_indr_block_cb_unregister(netdev, - nfp_flower_indr_setup_tc_cb, app); + __flow_indr_block_cb_unregister(netdev, + nfp_flower_indr_setup_tc_cb, + app); } return NOTIFY_OK; diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 00b9aab..c8d60a6 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -4,6 +4,7 @@ #include #include #include +#include struct flow_match { struct flow_dissector *dissector; @@ -366,4 +367,34 @@ static inline void flow_block_init(struct flow_block *flow_block) INIT_LIST_HEAD(&flow_block->cb_list); } +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv, + enum tc_setup_type type, void *type_data); + +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, + struct flow_block *flow_block, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command); + +int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, + void *cb_ident); + +void __flow_indr_block_cb_unregister(struct net_device *dev, +flow_indr_block_bind_cb_t *cb, +void *cb_ident); + +int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + fl
[PATCH net-next v5 2/6] cls_api: replace block with flow_block in tc_indr_block_dev
From: wenxu This patch make tc_indr_block_dev can separate from tc subsystem Signed-off-by: wenxu --- v5: new patch net/sched/cls_api.c | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 2e3b58d..f9643fa 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -574,7 +574,7 @@ struct tc_indr_block_dev { struct net_device *dev; unsigned int refcnt; struct list_head cb_list; - struct tcf_block *block; + struct flow_block *flow_block; }; struct tc_indr_block_cb { @@ -597,6 +597,14 @@ struct tc_indr_block_cb { tc_indr_setup_block_ht_params); } +static void tc_indr_get_default_block(struct tc_indr_block_dev *indr_dev) +{ + struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev); + + if (block) + indr_dev->flow_block = &block->flow_block; +} + static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev) { struct tc_indr_block_dev *indr_dev; @@ -611,7 +619,7 @@ static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev) INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; - indr_dev->block = tc_dev_ingress_block(dev); + tc_indr_get_default_block(indr_dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, tc_indr_setup_block_ht_params)) { kfree(indr_dev); @@ -678,11 +686,14 @@ static int tcf_block_setup(struct tcf_block *block, struct flow_block_offload *bo); static void tc_indr_block_ing_cmd(struct net_device *dev, - struct tcf_block *block, + struct flow_block *flow_block, tc_indr_block_bind_cb_t *cb, void *cb_priv, enum flow_block_command command) { + struct tcf_block *block = flow_block ? container_of(flow_block, + struct tcf_block, + flow_block) : NULL; struct flow_block_offload bo = { .command= command, .binder_type= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS, @@ -694,7 +705,7 @@ static void tc_indr_block_ing_cmd(struct net_device *dev, if (!block) return; - bo.block = &block->flow_block; + bo.block = flow_block; cb(dev, cb_priv, TC_SETUP_BLOCK, &bo); @@ -717,7 +728,7 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, + tc_indr_block_ing_cmd(dev, indr_dev->flow_block, cb, cb_priv, FLOW_BLOCK_BIND); return 0; @@ -750,13 +761,14 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, if (!indr_dev) return; - indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident); + indr_block_cb = tc_indr_block_cb_lookup(indr_dev, indr_block_cb->cb, + indr_block_cb->cb_ident); if (!indr_block_cb) return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, - FLOW_BLOCK_UNBIND); + tc_indr_block_ing_cmd(dev, indr_dev->flow_block, indr_block_cb->cb, + indr_block_cb->cb_priv, FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } @@ -792,7 +804,8 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, if (!indr_dev) return; - indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL; + indr_dev->flow_block = command == FLOW_BLOCK_BIND ? + &block->flow_block : NULL; list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, -- 1.8.3.1
[PATCH net-next 3/6] cls_api: add flow_indr_block_call function
From: wenxu This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu --- v5: new patch net/sched/cls_api.c | 33 - 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index f9643fa..617b098 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -783,13 +783,30 @@ void tc_indr_block_cb_unregister(struct net_device *dev, } EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister); +static void flow_indr_block_call(struct flow_block *flow_block, +struct net_device *dev, +struct flow_block_offload *bo, +enum flow_block_command command) +{ + struct tc_indr_block_cb *indr_block_cb; + struct tc_indr_block_dev *indr_dev; + + indr_dev = tc_indr_block_dev_lookup(dev); + if (!indr_dev) + return; + + indr_dev->flow_block = command == FLOW_BLOCK_BIND ? flow_block : NULL; + + list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) + indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, + bo); +} + static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, struct tcf_block_ext_info *ei, enum flow_block_command command, struct netlink_ext_ack *extack) { - struct tc_indr_block_cb *indr_block_cb; - struct tc_indr_block_dev *indr_dev; struct flow_block_offload bo = { .command= command, .binder_type= ei->binder_type, @@ -800,17 +817,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, }; INIT_LIST_HEAD(&bo.cb_list); - indr_dev = tc_indr_block_dev_lookup(dev); - if (!indr_dev) - return; - - indr_dev->flow_block = command == FLOW_BLOCK_BIND ? - &block->flow_block : NULL; - - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) - indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - + flow_indr_block_call(&block->flow_block, dev, &bo, command); tcf_block_setup(block, &bo); } -- 1.8.3.1
[PATCH net-next v5 6/6] netfilter: nf_tables_offload: support indr block call
From: wenxu nftable support indr-block call. It makes nftable an offload vlan and tunnel device. nft add table netdev firewall nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; } nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; } nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 Signed-off-by: wenxu --- v5: add nft_get_default_block include/net/netfilter/nf_tables_offload.h | 2 + net/netfilter/nf_tables_api.c | 7 ++ net/netfilter/nf_tables_offload.c | 156 +- 3 files changed, 141 insertions(+), 24 deletions(-) diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h index 3196663..ac69087 100644 --- a/include/net/netfilter/nf_tables_offload.h +++ b/include/net/netfilter/nf_tables_offload.h @@ -63,6 +63,8 @@ struct nft_flow_rule { struct nft_flow_rule *nft_flow_rule_create(const struct nft_rule *rule); void nft_flow_rule_destroy(struct nft_flow_rule *flow); int nft_flow_rule_offload_commit(struct net *net); +bool nft_indr_get_default_block(struct net_device *dev, + struct flow_indr_block_info *info); #define NFT_OFFLOAD_MATCH(__key, __base, __field, __len, __reg) \ (__reg)->base_offset= \ diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 605a7cf..6a1d0b2 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7593,6 +7593,11 @@ static void __net_exit nf_tables_exit_net(struct net *net) .exit = nf_tables_exit_net, }; +static struct flow_indr_get_block_entry get_block_entry = { + .get_block_cb = nft_indr_get_default_block, + .list = LIST_HEAD_INIT(get_block_entry.list), +}; + static int __init nf_tables_module_init(void) { int err; @@ -7624,6 +7629,7 @@ static int __init nf_tables_module_init(void) goto err5; nft_chain_route_init(); + flow_indr_add_default_block_cb(&get_block_entry); return err; err5: rhltable_destroy(&nft_objname_ht); @@ -7640,6 +7646,7 @@ static int __init nf_tables_module_init(void) static void __exit nf_tables_module_exit(void) { + flow_indr_del_default_block_cb(&get_block_entry); nfnetlink_subsys_unregister(&nf_tables_subsys); unregister_netdevice_notifier(&nf_tables_flowtable_notifier); nft_chain_filter_fini(); diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c index 64f5fd5..59c9629 100644 --- a/net/netfilter/nf_tables_offload.c +++ b/net/netfilter/nf_tables_offload.c @@ -171,24 +171,114 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo, return 0; } +static int nft_block_setup(struct nft_base_chain *basechain, + struct flow_block_offload *bo, + enum flow_block_command cmd) +{ + int err; + + switch (cmd) { + case FLOW_BLOCK_BIND: + err = nft_flow_offload_bind(bo, basechain); + break; + case FLOW_BLOCK_UNBIND: + err = nft_flow_offload_unbind(bo, basechain); + break; + default: + WARN_ON_ONCE(1); + err = -EOPNOTSUPP; + } + + return err; +} + +static int nft_block_offload_cmd(struct nft_base_chain *chain, +struct net_device *dev, +enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + int err; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; + bo.extack = &extack; + INIT_LIST_HEAD(&bo.cb_list); + + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo); + if (err < 0) + return err; + + return 
nft_block_setup(chain, &bo, cmd); +} + +static void nft_indr_block_ing_cmd(struct net_device *dev, + struct flow_block *flow_block, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + struct nft_base_chain *chain; + + if (!flow_block) + return; + + chain = container_of(flow_block, struct nft_base_chain, flow_block); + + bo.net = dev_net(dev); + bo.block = flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRE
Re: [PATCH net-next v5 6/6] netfilter: nf_tables_offload: support indr block call
On 8/1/2019 11:58 AM, Yunsheng Lin wrote: > On 2019/8/1 11:03, we...@ucloud.cn wrote: >> From: wenxu >> >> nftable support indr-block call. It makes nftable an offload vlan >> and tunnel device. >> >> nft add table netdev firewall >> nft add chain netdev firewall aclout { type filter hook ingress offload >> device mlx_pf0vf0 priority - 300 \; } >> nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 >> nft add chain netdev firewall aclin { type filter hook ingress device vlan0 >> priority - 300 \; } >> nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 >> >> Signed-off-by: wenxu >> --- >> v5: add nft_get_default_block >> >> include/net/netfilter/nf_tables_offload.h | 2 + >> net/netfilter/nf_tables_api.c | 7 ++ >> net/netfilter/nf_tables_offload.c | 156 >> +- >> 3 files changed, 141 insertions(+), 24 deletions(-) >> >> diff --git a/include/net/netfilter/nf_tables_offload.h >> b/include/net/netfilter/nf_tables_offload.h >> index 3196663..ac69087 100644 >> --- a/include/net/netfilter/nf_tables_offload.h >> +++ b/include/net/netfilter/nf_tables_offload.h >> @@ -63,6 +63,8 @@ struct nft_flow_rule { >> struct nft_flow_rule *nft_flow_rule_create(const struct nft_rule *rule); >> void nft_flow_rule_destroy(struct nft_flow_rule *flow); >> int nft_flow_rule_offload_commit(struct net *net); >> +bool nft_indr_get_default_block(struct net_device *dev, >> +struct flow_indr_block_info *info); >> >> #define NFT_OFFLOAD_MATCH(__key, __base, __field, __len, __reg) >> \ >> (__reg)->base_offset= \ >> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c >> index 605a7cf..6a1d0b2 100644 >> --- a/net/netfilter/nf_tables_api.c >> +++ b/net/netfilter/nf_tables_api.c >> @@ -7593,6 +7593,11 @@ static void __net_exit nf_tables_exit_net(struct net >> *net) >> .exit = nf_tables_exit_net, >> }; >> >> +static struct flow_indr_get_block_entry get_block_entry = { >> +.get_block_cb = nft_indr_get_default_block, >> +.list = LIST_HEAD_INIT(get_block_entry.list), >> +}; >> + >> static int __init nf_tables_module_init(void) >> { >> int err; >> @@ -7624,6 +7629,7 @@ static int __init nf_tables_module_init(void) >> goto err5; >> >> nft_chain_route_init(); >> +flow_indr_add_default_block_cb(&get_block_entry); >> return err; >> err5: >> rhltable_destroy(&nft_objname_ht); >> @@ -7640,6 +7646,7 @@ static int __init nf_tables_module_init(void) >> >> static void __exit nf_tables_module_exit(void) >> { >> +flow_indr_del_default_block_cb(&get_block_entry); >> nfnetlink_subsys_unregister(&nf_tables_subsys); >> unregister_netdevice_notifier(&nf_tables_flowtable_notifier); >> nft_chain_filter_fini(); >> diff --git a/net/netfilter/nf_tables_offload.c >> b/net/netfilter/nf_tables_offload.c >> index 64f5fd5..59c9629 100644 >> --- a/net/netfilter/nf_tables_offload.c >> +++ b/net/netfilter/nf_tables_offload.c >> @@ -171,24 +171,114 @@ static int nft_flow_offload_unbind(struct >> flow_block_offload *bo, >> return 0; >> } >> >> +static int nft_block_setup(struct nft_base_chain *basechain, >> + struct flow_block_offload *bo, >> + enum flow_block_command cmd) >> +{ >> +int err; >> + >> +switch (cmd) { >> +case FLOW_BLOCK_BIND: >> +err = nft_flow_offload_bind(bo, basechain); >> +break; >> +case FLOW_BLOCK_UNBIND: >> +err = nft_flow_offload_unbind(bo, basechain); >> +break; >> +default: >> +WARN_ON_ONCE(1); >> +err = -EOPNOTSUPP; >> +} >> + >> +return err; >> +} >> + >> +static int nft_block_offload_cmd(struct nft_base_chain *chain, >> + struct net_device *dev, >> + enum flow_block_command cmd) >> 
+{ >> +struct netlink_ext_ack extack = {}; >> +struct flow_block_offload bo = {}; >> +int err; >> + >> +bo.net = dev_net(dev); >> +bo.block = &chain->flow_block; >> +bo.command = cmd; >> +bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_ING
Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
On 8/2/2019 7:11 AM, Jakub Kicinski wrote:
> On Thu, 1 Aug 2019 11:03:46 +0800, we...@ucloud.cn wrote:
>> From: wenxu
>>
>> The new flow-indr-block can't get the tcf_block
>> directly. It provide a callback list to find the flow_block immediately
>> when the device register and contain a ingress block.
>>
>> Signed-off-by: wenxu
> First of all thanks for splitting the series up into more patches,
> it is easier to follow the logic now!
>
>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>>
>> 	INIT_LIST_HEAD(&indr_dev->cb_list);
>> 	indr_dev->dev = dev;
>> +	flow_get_default_block(indr_dev);
>> 	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>> 				   flow_indr_setup_block_ht_params)) {
>> 		kfree(indr_dev);
> I wonder if it's still practical to keep the block information in the
> indr_dev structure at all. The way this used to work was:
>
>
>  [hash table of devices]       --------------
>        |                       |   netdev   |
>        |                       |   refcnt   |
>  indir_dev[tun0] ------------> | cached block | ------ [ TC block ]
>        |                       |  callbacks |      .
>        |                       --------------       \__ [cb, cb_priv, cb_ident]
>        |                                                [cb, cb_priv, cb_ident]
>        |                       --------------
>        |                       |   netdev   |
>        |                       |   refcnt   |
>  indir_dev[tun1] ------------> | cached block | ------ [ TC block ]
>        |                       |  callbacks |      .
>        -                       --------------       \__ [cb, cb_priv, cb_ident]
>                                                         [cb, cb_priv, cb_ident]
>
>
> In the example above we have two tunnels tun0 and tun1, each one has a
> indr_dev structure allocated, and for each one of them two drivers
> registered for callbacks (hence the callbacks list has two entries).
>
> We used to cache the TC block in the indr_dev structure, but now that
> there are multiple subsytems using the indr_dev we either have to have
> a list of cached blocks (with entries for each subsystem) or just always
> iterate over the subsystems :(
>
> After all the same device may have both a TC block and a NFT block.

Only one subsystem can be used for the same device, for both indr-dev
and hw-dev; flow_block_cb_is_busy() avoids the situation you mentioned.

> I think always iterating would be easier:
>
> The indr_dev struct would no longer have the block pointer, instead
> when new driver registers for the callback instead of:
>
> 	if (indr_dev->ing_cmd_cb)
> 		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
> 				     indr_block_cb->cb, indr_block_cb->cb_priv,
> 				     FLOW_BLOCK_BIND);
>
> We'd have something like the loop in flow_get_default_block():
>
> 	for each (subsystem)
> 		subsystem->handle_new_indir_cb(indr_dev, cb);
>
> And then per-subsystem logic would actually call the cb. Or:
>
> 	for each (subsystem)
> 		block = get_default_block(indir_dev)
> 		indr_dev->ing_cmd_cb(...)
>
> I hope this makes sense.
>
Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
On 8/2/2019 7:11 AM, Jakub Kicinski wrote:
> On Thu, 1 Aug 2019 11:03:46 +0800, we...@ucloud.cn wrote:
>> From: wenxu
>>
>> The new flow-indr-block can't get the tcf_block
>> directly. It provide a callback list to find the flow_block immediately
>> when the device register and contain a ingress block.
>>
>> Signed-off-by: wenxu
> First of all thanks for splitting the series up into more patches,
> it is easier to follow the logic now!
>
>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>>
>> 	INIT_LIST_HEAD(&indr_dev->cb_list);
>> 	indr_dev->dev = dev;
>> +	flow_get_default_block(indr_dev);
>> 	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>> 				   flow_indr_setup_block_ht_params)) {
>> 		kfree(indr_dev);
> I wonder if it's still practical to keep the block information in the
> indr_dev structure at all. The way this used to work was:
>
>
>  [hash table of devices]       --------------
>        |                       |   netdev   |
>        |                       |   refcnt   |
>  indir_dev[tun0] ------------> | cached block | ------ [ TC block ]
>        |                       |  callbacks |      .
>        |                       --------------       \__ [cb, cb_priv, cb_ident]
>        |                                                [cb, cb_priv, cb_ident]
>        |                       --------------
>        |                       |   netdev   |
>        |                       |   refcnt   |
>  indir_dev[tun1] ------------> | cached block | ------ [ TC block ]
>        |                       |  callbacks |      .
>        -                       --------------       \__ [cb, cb_priv, cb_ident]
>                                                         [cb, cb_priv, cb_ident]
>
>
> In the example above we have two tunnels tun0 and tun1, each one has a
> indr_dev structure allocated, and for each one of them two drivers
> registered for callbacks (hence the callbacks list has two entries).
>
> We used to cache the TC block in the indr_dev structure, but now that
> there are multiple subsytems using the indr_dev we either have to have
> a list of cached blocks (with entries for each subsystem) or just always
> iterate over the subsystems :(
>
> After all the same device may have both a TC block and a NFT block.
>
> I think always iterating would be easier:
>
> The indr_dev struct would no longer have the block pointer, instead
> when new driver registers for the callback instead of:
>
> 	if (indr_dev->ing_cmd_cb)
> 		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
> 				     indr_block_cb->cb, indr_block_cb->cb_priv,
> 				     FLOW_BLOCK_BIND);
>
> We'd have something like the loop in flow_get_default_block():
>
> 	for each (subsystem)
> 		subsystem->handle_new_indir_cb(indr_dev, cb);
>
> And then per-subsystem logic would actually call the cb. Or:
>
> 	for each (subsystem)
> 		block = get_default_block(indir_dev)
> 		indr_dev->ing_cmd_cb(...)

The nft netdev chain is also based on register_netdevice_notifier. So in
the unregister case, the base chain (block) of nft may be deleted before
__tc_indr_block_cb_unregister(), right?

So maybe we can cache the blocks as a list of all the subsystems in
indr_dev?

> I hope this makes sense.
>
>
> Also please double check nft unload logic has an RCU synchronization
> point, I'm not 100% confident rcu_barrier() implies synchronize_rcu().
> Perhaps someone more knowledgeable can chime in :)
>
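The always-iterate approach sketched above is what the series adopts in
v6; a condensed rendering of what flow_block_ing_cmd() becomes in patch
5/6 of v6 below (not new code, just the loop spelled out for this
discussion):

	#include <linux/rculist.h>
	#include <net/flow_offload.h>

	/* No cached block pointer: walk every registered subsystem and
	 * let each one resolve its own block for the device. */
	static LIST_HEAD(block_ing_cb_list);

	static void flow_block_ing_cmd(struct net_device *dev,
				       flow_indr_block_bind_cb_t *cb,
				       void *cb_priv,
				       enum flow_block_command command)
	{
		struct flow_indr_block_ing_entry *entry;

		rcu_read_lock();
		list_for_each_entry_rcu(entry, &block_ing_cb_list, list)
			entry->cb(dev, cb, cb_priv, command);
		rcu_read_unlock();
	}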
Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
On 2019/8/2 18:45, wenxu wrote:
> On 8/2/2019 7:11 AM, Jakub Kicinski wrote:
>> On Thu, 1 Aug 2019 11:03:46 +0800, we...@ucloud.cn wrote:
>>> From: wenxu
>>>
>>> The new flow-indr-block can't get the tcf_block
>>> directly. It provide a callback list to find the flow_block immediately
>>> when the device register and contain a ingress block.
>>>
>>> Signed-off-by: wenxu
>> First of all thanks for splitting the series up into more patches,
>> it is easier to follow the logic now!
>>
>>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>>>
>>> 	INIT_LIST_HEAD(&indr_dev->cb_list);
>>> 	indr_dev->dev = dev;
>>> +	flow_get_default_block(indr_dev);
>>> 	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>>> 				   flow_indr_setup_block_ht_params)) {
>>> 		kfree(indr_dev);
>> I wonder if it's still practical to keep the block information in the
>> indr_dev structure at all. The way this used to work was:
>>
>>
>>  [hash table of devices]       --------------
>>        |                       |   netdev   |
>>        |                       |   refcnt   |
>>  indir_dev[tun0] ------------> | cached block | ------ [ TC block ]
>>        |                       |  callbacks |      .
>>        |                       --------------       \__ [cb, cb_priv, cb_ident]
>>        |                                                [cb, cb_priv, cb_ident]
>>        |                       --------------
>>        |                       |   netdev   |
>>        |                       |   refcnt   |
>>  indir_dev[tun1] ------------> | cached block | ------ [ TC block ]
>>        |                       |  callbacks |      .
>>        -                       --------------       \__ [cb, cb_priv, cb_ident]
>>                                                         [cb, cb_priv, cb_ident]
>>
>>
>> In the example above we have two tunnels tun0 and tun1, each one has a
>> indr_dev structure allocated, and for each one of them two drivers
>> registered for callbacks (hence the callbacks list has two entries).
>>
>> We used to cache the TC block in the indr_dev structure, but now that
>> there are multiple subsytems using the indr_dev we either have to have
>> a list of cached blocks (with entries for each subsystem) or just always
>> iterate over the subsystems :(
>>
>> After all the same device may have both a TC block and a NFT block.
>>
>> I think always iterating would be easier:
>>
>> The indr_dev struct would no longer have the block pointer, instead
>> when new driver registers for the callback instead of:
>>
>> 	if (indr_dev->ing_cmd_cb)
>> 		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
>> 				     indr_block_cb->cb, indr_block_cb->cb_priv,
>> 				     FLOW_BLOCK_BIND);
>>
>> We'd have something like the loop in flow_get_default_block():
>>
>> 	for each (subsystem)
>> 		subsystem->handle_new_indir_cb(indr_dev, cb);
>>
>> And then per-subsystem logic would actually call the cb. Or:
>>
>> 	for each (subsystem)
>> 		block = get_default_block(indir_dev)
>> 		indr_dev->ing_cmd_cb(...)
> The nft netdev chain is also based on register_netdevice_notifier. So in
> the unregister case, the base chain (block) of nft may be deleted before
> __tc_indr_block_cb_unregister(), right?
>
> So maybe we can cache the blocks as a list of all the subsystems in
> indr_dev?

When the device is unregistered, the nft netdev chain related to this
device will also be deleted through the netdevice notifier. So in the
unregister case, the base chain (block) of nft may be deleted before
__tc_indr_block_cb_unregister().

Caching the block does not work because the chain has already been
deleted and freed. Maybe raising the priority of rep_netdev_event can
help with this?

>
Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
On 2019/8/3 2:02, Jakub Kicinski wrote:
> On Fri, 2 Aug 2019 21:09:03 +0800, wenxu wrote:
>>>> We'd have something like the loop in flow_get_default_block():
>>>>
>>>> 	for each (subsystem)
>>>> 		subsystem->handle_new_indir_cb(indr_dev, cb);
>>>>
>>>> And then per-subsystem logic would actually call the cb. Or:
>>>>
>>>> 	for each (subsystem)
>>>> 		block = get_default_block(indir_dev)
>>>> 		indr_dev->ing_cmd_cb(...)
>>> The nft netdev chain is also based on register_netdevice_notifier. So
>>> in the unregister case, the base chain (block) of nft may be deleted
>>> before __tc_indr_block_cb_unregister(), right?
>>>
>>> So maybe we can cache the blocks as a list of all the subsystems in
>>> indr_dev?
>>
>> When the device is unregistered, the nft netdev chain related to this
>> device will also be deleted through the netdevice notifier.
>>
>> So in the unregister case, the base chain (block) of nft may be deleted
>> before __tc_indr_block_cb_unregister().
> Hm, but I don't think that should be an issue. The ordering should be
> like one of the following two:
>
> device unregister:
>   - driver notifier callback
>     - unregister flow cb
>       - UNBIND cb
>         - free driver's block state
>     - free driver's device state
>   - nft block destroy
>     # doesn't see driver's CB any more
>
> Or:
>
> device unregister:
>   - nft block destroy
>     - UNBIND cb
>       - free driver's block state
>   - driver notifier callback
>     - free driver's state
>
> No?

For the second case, maybe the cb can't be unbound, because the nft
block is destroyed? There is no way to find the block (chain) in nft.

>
>> Caching the block does not work because the chain has already been
>> deleted and freed. Maybe raising the priority of
>> rep_netdev_event can help with this?
> In theory the cache should work in a similar way as drivers, because
> once the indr_dev is created and the initial block is found, the cached
> value is just recorded in BIND/UNBIND calls. So if BIND/UNBIND works for
> drivers it will also put the right info in the cache.
>
Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
On 2019/8/3 8:21, Jakub Kicinski wrote:
> On Sat, 3 Aug 2019 07:19:31 +0800, wenxu wrote:
>>> Or:
>>>
>>> device unregister:
>>>   - nft block destroy
>>>     - UNBIND cb
>>>       - free driver's block state
>>>   - driver notifier callback
>>>     - free driver's state
>>>
>>> No?
>> For the second case, maybe the cb can't be unbound, because the nft
>> block is destroyed? There is no way to find the block (chain) in nft.
> But before the block is destroyed doesn't nft send an UNBIND event to
> the drivers, always?

You are correct, an UNBIND event will be sent when the block is
destroyed.
[PATCH net-next v6 0/6] flow_offload: add indr-block in nf_table_offload
From: wenxu

This patch series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first four patches move the tc indr block to flow offload and rename
it to flow-indr-block. Because the new flow-indr-block can't get the
tcf_block directly, the fifth patch provides a callback list to get the
flow_block of each subsystem immediately when the device registers and
contains a block. The last patch makes nf_tables_offload support
flow-indr-block.

wenxu (6):
  cls_api: modify the tc_indr_block_ing_cmd parameters.
  cls_api: remove the tcf_block cache
  cls_api: add flow_indr_block_call function
  flow_offload: move tc indirect block to flow offload
  flow_offload: support get multi-subsystem block
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  11 +-
 include/net/flow_offload.h                       |  37 +++
 include/net/netfilter/nf_tables_offload.h        |   4 +
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 236 +++
 net/netfilter/nf_tables_api.c                    |   7 +
 net/netfilter/nf_tables_offload.c                | 148 ++--
 net/sched/cls_api.c                              | 254 -
 10 files changed, 460 insertions(+), 285 deletions(-)

--
1.8.3.1
[PATCH net-next v6 3/6] cls_api: add flow_indr_block_call function
From: wenxu This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu --- v6: no change net/sched/cls_api.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 654da8c..ebbc1e0 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -773,13 +773,27 @@ void tc_indr_block_cb_unregister(struct net_device *dev, } EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister); +static void flow_indr_block_call(struct net_device *dev, +struct flow_block_offload *bo, +enum flow_block_command command) +{ + struct tc_indr_block_cb *indr_block_cb; + struct tc_indr_block_dev *indr_dev; + + indr_dev = tc_indr_block_dev_lookup(dev); + if (!indr_dev) + return; + + list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) + indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, + bo); +} + static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, struct tcf_block_ext_info *ei, enum flow_block_command command, struct netlink_ext_ack *extack) { - struct tc_indr_block_cb *indr_block_cb; - struct tc_indr_block_dev *indr_dev; struct flow_block_offload bo = { .command= command, .binder_type= ei->binder_type, @@ -790,14 +804,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, }; INIT_LIST_HEAD(&bo.cb_list); - indr_dev = tc_indr_block_dev_lookup(dev); - if (!indr_dev) - return; - - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) - indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - + flow_indr_block_call(dev, &bo, command); tcf_block_setup(block, &bo); } -- 1.8.3.1
[PATCH net-next v6 5/6] flow_offload: support get multi-subsystem block
From: wenxu It provide a callback list to find the blocks of tc and nft subsystems Signed-off-by: wenxu --- v6: new patch include/net/flow_offload.h | 10 +- net/core/flow_offload.c| 47 +- net/sched/cls_api.c| 9 - 3 files changed, 51 insertions(+), 15 deletions(-) diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 8f1a7b8..6022dd0 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -375,6 +375,15 @@ typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, void *cb_priv, enum flow_block_command command); +struct flow_indr_block_ing_entry { + flow_indr_block_ing_cmd_t *cb; + struct list_headlist; +}; + +void flow_indr_add_block_ing_cb(struct flow_indr_block_ing_entry *entry); + +void flow_indr_del_block_ing_cb(struct flow_indr_block_ing_entry *entry); + int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, flow_indr_block_bind_cb_t *cb, void *cb_ident); @@ -391,7 +400,6 @@ void flow_indr_block_cb_unregister(struct net_device *dev, void *cb_ident); void flow_indr_block_call(struct net_device *dev, - flow_indr_block_ing_cmd_t *cb, struct flow_block_offload *bo, enum flow_block_command command); diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c index 4cc18e4..0e84537 100644 --- a/net/core/flow_offload.c +++ b/net/core/flow_offload.c @@ -282,6 +282,8 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f, } EXPORT_SYMBOL(flow_block_cb_setup_simple); +static LIST_HEAD(block_ing_cb_list); + static struct rhashtable indr_setup_block_ht; struct flow_indr_block_cb { @@ -295,7 +297,6 @@ struct flow_indr_block_dev { struct rhash_head ht_node; struct net_device *dev; unsigned int refcnt; - flow_indr_block_ing_cmd_t *block_ing_cmd_cb; struct list_head cb_list; }; @@ -389,6 +390,22 @@ static void flow_indr_block_cb_del(struct flow_indr_block_cb *indr_block_cb) kfree(indr_block_cb); } +static void flow_block_ing_cmd(struct net_device *dev, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command) +{ + struct flow_indr_block_ing_entry *entry; + + rcu_read_lock(); + + list_for_each_entry_rcu(entry, &block_ing_cb_list, list) { + entry->cb(dev, cb, cb_priv, command); + } + + rcu_read_unlock(); +} + int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, flow_indr_block_bind_cb_t *cb, void *cb_ident) @@ -406,10 +423,8 @@ int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - if (indr_dev->block_ing_cmd_cb) - indr_dev->block_ing_cmd_cb(dev, indr_block_cb->cb, - indr_block_cb->cb_priv, - FLOW_BLOCK_BIND); + flow_block_ing_cmd(dev, indr_block_cb->cb, indr_block_cb->cb_priv, + FLOW_BLOCK_BIND); return 0; @@ -448,10 +463,8 @@ void __flow_indr_block_cb_unregister(struct net_device *dev, if (!indr_block_cb) return; - if (indr_dev->block_ing_cmd_cb) - indr_dev->block_ing_cmd_cb(dev, indr_block_cb->cb, - indr_block_cb->cb_priv, - FLOW_BLOCK_UNBIND); + flow_block_ing_cmd(dev, indr_block_cb->cb, indr_block_cb->cb_priv, + FLOW_BLOCK_UNBIND); flow_indr_block_cb_del(indr_block_cb); flow_indr_block_dev_put(indr_dev); @@ -469,7 +482,6 @@ void flow_indr_block_cb_unregister(struct net_device *dev, EXPORT_SYMBOL_GPL(flow_indr_block_cb_unregister); void flow_indr_block_call(struct net_device *dev, - flow_indr_block_ing_cmd_t cb, struct flow_block_offload *bo, enum flow_block_command command) { @@ -480,15 +492,24 @@ void flow_indr_block_call(struct net_device *dev, if (!indr_dev) return; - indr_dev->block_ing_cmd_cb = command == FLOW_BLOCK_BIND -? 
cb : NULL; - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, bo); } EXPORT_SYMBOL_GPL(flow_indr_block_call); +void
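Taken together, the consumer side of the list added above looks roughly
like this: a sketch mirroring what patch 6/6 of this v6 series does for
nf_tables, with the sample_* names as placeholders for a subsystem's
own handlers:

	#include <linux/init.h>
	#include <net/flow_offload.h>

	/* Stand-in for the subsystem's "find my block on this dev, then
	 * bind/unbind" handler (nft_indr_block_get_and_ing_cmd in 6/6). */
	static void sample_get_and_ing_cmd(struct net_device *dev,
					   flow_indr_block_bind_cb_t *cb,
					   void *cb_priv,
					   enum flow_block_command command)
	{
		/* Look up this subsystem's ingress block on dev; if one
		 * exists, fill a struct flow_block_offload and invoke
		 * cb(dev, cb_priv, TC_SETUP_BLOCK, &bo). */
	}

	static struct flow_indr_block_ing_entry sample_ing_entry = {
		.cb   = sample_get_and_ing_cmd,
		.list = LIST_HEAD_INIT(sample_ing_entry.list),
	};

	static int __init sample_init(void)
	{
		flow_indr_add_block_ing_cb(&sample_ing_entry);
		return 0;
	}

	static void __exit sample_exit(void)
	{
		flow_indr_del_block_ing_cb(&sample_ing_entry);
	}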
[PATCH net-next v6 1/6] cls_api: modify the tc_indr_block_ing_cmd parameters.
From: wenxu This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu --- v6: no change net/sched/cls_api.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 3565d9a..2e3b58d 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -677,26 +677,28 @@ static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb) static int tcf_block_setup(struct tcf_block *block, struct flow_block_offload *bo); -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev, - struct tc_indr_block_cb *indr_block_cb, +static void tc_indr_block_ing_cmd(struct net_device *dev, + struct tcf_block *block, + tc_indr_block_bind_cb_t *cb, + void *cb_priv, enum flow_block_command command) { struct flow_block_offload bo = { .command= command, .binder_type= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS, - .net= dev_net(indr_dev->dev), - .block_shared = tcf_block_non_null_shared(indr_dev->block), + .net= dev_net(dev), + .block_shared = tcf_block_non_null_shared(block), }; INIT_LIST_HEAD(&bo.cb_list); - if (!indr_dev->block) + if (!block) return; - bo.block = &indr_dev->block->flow_block; + bo.block = &block->flow_block; - indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - tcf_block_setup(indr_dev->block, &bo); + cb(dev, cb_priv, TC_SETUP_BLOCK, &bo); + + tcf_block_setup(block, &bo); } int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, @@ -715,7 +717,8 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, + FLOW_BLOCK_BIND); return 0; err_dev_put: @@ -752,7 +755,8 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, + FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } -- 1.8.3.1
[PATCH net-next v6 2/6] cls_api: remove the tcf_block cache
From: wenxu Remove the tcf_block in the tc_indr_block_dev for muti-subsystem support. Signed-off-by: wenxu --- v6: new patch net/sched/cls_api.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 2e3b58d..654da8c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -574,7 +574,6 @@ struct tc_indr_block_dev { struct net_device *dev; unsigned int refcnt; struct list_head cb_list; - struct tcf_block *block; }; struct tc_indr_block_cb { @@ -611,7 +610,6 @@ static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev) INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; - indr_dev->block = tc_dev_ingress_block(dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, tc_indr_setup_block_ht_params)) { kfree(indr_dev); @@ -706,6 +704,7 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, { struct tc_indr_block_cb *indr_block_cb; struct tc_indr_block_dev *indr_dev; + struct tcf_block *block; int err; indr_dev = tc_indr_block_dev_get(dev); @@ -717,8 +716,9 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, - FLOW_BLOCK_BIND); + block = tc_dev_ingress_block(dev); + tc_indr_block_ing_cmd(dev, block, indr_block_cb->cb, + indr_block_cb->cb_priv, FLOW_BLOCK_BIND); return 0; err_dev_put: @@ -745,6 +745,7 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, { struct tc_indr_block_cb *indr_block_cb; struct tc_indr_block_dev *indr_dev; + struct tcf_block *block; indr_dev = tc_indr_block_dev_lookup(dev); if (!indr_dev) @@ -755,8 +756,9 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, - FLOW_BLOCK_UNBIND); + block = tc_dev_ingress_block(dev); + tc_indr_block_ing_cmd(dev, block, indr_block_cb->cb, + indr_block_cb->cb_priv, FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } @@ -792,8 +794,6 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, if (!indr_dev) return; - indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL; - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo); -- 1.8.3.1
[PATCH net-next v6 6/6] netfilter: nf_tables_offload: support indr block call
From: wenxu nftable support indr-block call. It makes nftable an offload vlan and tunnel device. nft add table netdev firewall nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; } nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; } nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 Signed-off-by: wenxu --- v6: support the new callback list include/net/netfilter/nf_tables_offload.h | 4 + net/netfilter/nf_tables_api.c | 7 ++ net/netfilter/nf_tables_offload.c | 148 +- 3 files changed, 135 insertions(+), 24 deletions(-) diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h index 3196663..bffd51a 100644 --- a/include/net/netfilter/nf_tables_offload.h +++ b/include/net/netfilter/nf_tables_offload.h @@ -63,6 +63,10 @@ struct nft_flow_rule { struct nft_flow_rule *nft_flow_rule_create(const struct nft_rule *rule); void nft_flow_rule_destroy(struct nft_flow_rule *flow); int nft_flow_rule_offload_commit(struct net *net); +void nft_indr_block_get_and_ing_cmd(struct net_device *dev, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command); #define NFT_OFFLOAD_MATCH(__key, __base, __field, __len, __reg) \ (__reg)->base_offset= \ diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 605a7cf..fe3b7b0 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7593,6 +7593,11 @@ static void __net_exit nf_tables_exit_net(struct net *net) .exit = nf_tables_exit_net, }; +static struct flow_indr_block_ing_entry block_ing_entry = { + .cb = nft_indr_block_get_and_ing_cmd, + .list = LIST_HEAD_INIT(block_ing_entry.list), +}; + static int __init nf_tables_module_init(void) { int err; @@ -7624,6 +7629,7 @@ static int __init nf_tables_module_init(void) goto err5; nft_chain_route_init(); + flow_indr_add_block_ing_cb(&block_ing_entry); return err; err5: rhltable_destroy(&nft_objname_ht); @@ -7640,6 +7646,7 @@ static int __init nf_tables_module_init(void) static void __exit nf_tables_module_exit(void) { + flow_indr_del_block_ing_cb(&block_ing_entry); nfnetlink_subsys_unregister(&nf_tables_subsys); unregister_netdevice_notifier(&nf_tables_flowtable_notifier); nft_chain_filter_fini(); diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c index 64f5fd5..d3c4c9c 100644 --- a/net/netfilter/nf_tables_offload.c +++ b/net/netfilter/nf_tables_offload.c @@ -171,24 +171,110 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo, return 0; } +static int nft_block_setup(struct nft_base_chain *basechain, + struct flow_block_offload *bo, + enum flow_block_command cmd) +{ + int err; + + switch (cmd) { + case FLOW_BLOCK_BIND: + err = nft_flow_offload_bind(bo, basechain); + break; + case FLOW_BLOCK_UNBIND: + err = nft_flow_offload_unbind(bo, basechain); + break; + default: + WARN_ON_ONCE(1); + err = -EOPNOTSUPP; + } + + return err; +} + +static int nft_block_offload_cmd(struct nft_base_chain *chain, +struct net_device *dev, +enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + int err; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; + bo.extack = &extack; + INIT_LIST_HEAD(&bo.cb_list); + + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, 
&bo); + if (err < 0) + return err; + + return nft_block_setup(chain, &bo, cmd); +} + +static void nft_indr_block_ing_cmd(struct net_device *dev, + struct nft_base_chain *chain, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + + if (!chain) + return; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSAC
[PATCH net-next v6 4/6] flow_offload: move tc indirect block to flow offload
From: wenxu move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu --- v6: add a block_get_and_ing_cmd callback drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +- .../net/ethernet/netronome/nfp/flower/offload.c| 11 +- include/net/flow_offload.h | 29 +++ include/net/pkt_cls.h | 35 --- include/net/sch_generic.h | 3 - net/core/flow_offload.c| 215 ++ net/sched/cls_api.c| 240 +++-- 7 files changed, 280 insertions(+), 263 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 6edf0ae..a820915 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -781,9 +781,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, { int err; - err = __tc_indr_block_cb_register(netdev, rpriv, - mlx5e_rep_indr_setup_tc_cb, - rpriv); + err = __flow_indr_block_cb_register(netdev, rpriv, + mlx5e_rep_indr_setup_tc_cb, + rpriv); if (err) { struct mlx5e_priv *priv = netdev_priv(rpriv->netdev); @@ -796,8 +796,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv, struct net_device *netdev) { - __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, - rpriv); + __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, + rpriv); } static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb, diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c index e209f15..7b490db 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -1479,16 +1479,17 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app, return NOTIFY_OK; if (event == NETDEV_REGISTER) { - err = __tc_indr_block_cb_register(netdev, app, - nfp_flower_indr_setup_tc_cb, - app); + err = __flow_indr_block_cb_register(netdev, app, + nfp_flower_indr_setup_tc_cb, + app); if (err) nfp_flower_cmsg_warn(app, "Indirect block reg failed - %s\n", netdev->name); } else if (event == NETDEV_UNREGISTER) { - __tc_indr_block_cb_unregister(netdev, - nfp_flower_indr_setup_tc_cb, app); + __flow_indr_block_cb_unregister(netdev, + nfp_flower_indr_setup_tc_cb, + app); } return NOTIFY_OK; diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 00b9aab..8f1a7b8 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -4,6 +4,7 @@ #include #include #include +#include struct flow_match { struct flow_dissector *dissector; @@ -366,4 +367,32 @@ static inline void flow_block_init(struct flow_block *flow_block) INIT_LIST_HEAD(&flow_block->cb_list); } +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv, + enum tc_setup_type type, void *type_data); + +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command); + +int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, + void *cb_ident); + +void __flow_indr_block_cb_unregister(struct net_device *dev, +flow_indr_block_bind_cb_t *cb, +void *cb_ident); + +int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, void *cb_ident); + +void flow_indr_blo
Re: [PATCH net-next 3/6] cls_api: add flow_indr_block_call function
v5 contains this patch but with a non-versioned tag; I used
--subject-prefix in git-format-patch. I am sorry, I made a mistake when
modifying the commit log. So should I repost the v6?

On 8/5/2019 2:02 PM, Jiri Pirko wrote:
> Re subject. You don't have "v5" in this patch. I don't understand how
> that happened. Do you use --subject-prefix in git-format-patch?
>
Re: [PATCH net-next v6 5/6] flow_offload: support get multi-subsystem block
On 2019/8/7 0:10, Pablo Neira Ayuso wrote:
> On Sun, Aug 04, 2019 at 09:24:00PM +0800, we...@ucloud.cn wrote:
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 8f1a7b8..6022dd0 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
> [...]
>> @@ -282,6 +282,8 @@ int flow_block_cb_setup_simple(struct flow_block_offload
>> *f,
>> }
>> EXPORT_SYMBOL(flow_block_cb_setup_simple);
>>
>> +static LIST_HEAD(block_ing_cb_list);
>> +
>> static struct rhashtable indr_setup_block_ht;
>>
>> struct flow_indr_block_cb {
>> @@ -295,7 +297,6 @@ struct flow_indr_block_dev {
>> 	struct rhash_head ht_node;
>> 	struct net_device *dev;
>> 	unsigned int refcnt;
>> -	flow_indr_block_ing_cmd_t *block_ing_cmd_cb;
>> 	struct list_head cb_list;
>> };
>>
>> @@ -389,6 +390,22 @@ static void flow_indr_block_cb_del(struct
>> flow_indr_block_cb *indr_block_cb)
>> 	kfree(indr_block_cb);
>> }
>>
>> +static void flow_block_ing_cmd(struct net_device *dev,
>> +			       flow_indr_block_bind_cb_t *cb,
>> +			       void *cb_priv,
>> +			       enum flow_block_command command)
>> +{
>> +	struct flow_indr_block_ing_entry *entry;
>> +
>> +	rcu_read_lock();
>> +
> unnecessary empty line.
>
>> +	list_for_each_entry_rcu(entry, &block_ing_cb_list, list) {
>> +		entry->cb(dev, cb, cb_priv, command);
>> +	}
>> +
>> +	rcu_read_unlock();
> OK, there's rcu_read_lock here...
>
>> +}
>> +
>> int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
>> 				  flow_indr_block_bind_cb_t *cb,
>> 				  void *cb_ident)
>> @@ -406,10 +423,8 @@ int __flow_indr_block_cb_register(struct net_device
>> *dev, void *cb_priv,
>> 	if (err)
>> 		goto err_dev_put;
>>
>> -	if (indr_dev->block_ing_cmd_cb)
>> -		indr_dev->block_ing_cmd_cb(dev, indr_block_cb->cb,
>> -					   indr_block_cb->cb_priv,
>> -					   FLOW_BLOCK_BIND);
>> +	flow_block_ing_cmd(dev, indr_block_cb->cb, indr_block_cb->cb_priv,
>> +			   FLOW_BLOCK_BIND);
>>
>> 	return 0;
>>
>> @@ -448,10 +463,8 @@ void __flow_indr_block_cb_unregister(struct net_device
>> *dev,
>> 	if (!indr_block_cb)
>> 		return;
>>
>> -	if (indr_dev->block_ing_cmd_cb)
>> -		indr_dev->block_ing_cmd_cb(dev, indr_block_cb->cb,
>> -					   indr_block_cb->cb_priv,
>> -					   FLOW_BLOCK_UNBIND);
>> +	flow_block_ing_cmd(dev, indr_block_cb->cb, indr_block_cb->cb_priv,
>> +			   FLOW_BLOCK_UNBIND);
>>
>> 	flow_indr_block_cb_del(indr_block_cb);
>> 	flow_indr_block_dev_put(indr_dev);
>> @@ -469,7 +482,6 @@ void flow_indr_block_cb_unregister(struct net_device
>> *dev,
>> EXPORT_SYMBOL_GPL(flow_indr_block_cb_unregister);
>>
>> void flow_indr_block_call(struct net_device *dev,
>> -			  flow_indr_block_ing_cmd_t cb,
>> 			  struct flow_block_offload *bo,
>> 			  enum flow_block_command command)
>> {
>> @@ -480,15 +492,24 @@ void flow_indr_block_call(struct net_device *dev,
>> 	if (!indr_dev)
>> 		return;
>>
>> -	indr_dev->block_ing_cmd_cb = command == FLOW_BLOCK_BIND
>> -				     ? cb : NULL;
>> -
>> 	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
>> 		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
>> 				  bo);
>> }
>> EXPORT_SYMBOL_GPL(flow_indr_block_call);
>>
>> +void flow_indr_add_block_ing_cb(struct flow_indr_block_ing_entry *entry)
>> +{
> ... but registration does not protect the list with a mutex.
>
>> +	list_add_tail_rcu(&entry->list, &block_ing_cb_list);
>> +}
>> +EXPORT_SYMBOL_GPL(flow_indr_add_block_ing_cb);

flow_indr_add_block_ing_cb is called from tc and from nft in a different
order: subsys_initcall(tc_filter_init) vs. nf_tables_module_init. Can
they be called at the same time? And nft also needs
flow_indr_del_block_ing_cb; does it need the lock as well?
Re: [PATCH net-next v6 5/6] flow_offload: support get multi-subsystem block
On 2019/8/7 0:10, Pablo Neira Ayuso wrote:
>
>> +void flow_indr_add_block_ing_cb(struct flow_indr_block_ing_entry *entry)
>> +{
> ... but registration does not protect the list with a mutex.
>
>> +	list_add_tail_rcu(&entry->list, &block_ing_cb_list);
>> +}
>> +EXPORT_SYMBOL_GPL(flow_indr_add_block_ing_cb);

Yes, I think flow_indr_add_block_ing_cb and flow_indr_del_block_ing_cb
may be used by more subsystems in the future. Both of them should take a
mutex lock.
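For concreteness, a minimal sketch of what that locking could look like,
assuming the approach v7 takes (the v7 cover letter below states a mutex
was added for add/del): writers serialized by a mutex, readers of the
list staying under rcu_read_lock() as in flow_block_ing_cmd. The lock
name is illustrative:

	#include <linux/mutex.h>
	#include <linux/rculist.h>
	#include <net/flow_offload.h>

	static LIST_HEAD(block_ing_cb_list);
	/* Serializes add/del; readers of block_ing_cb_list use RCU. */
	static DEFINE_MUTEX(flow_indr_block_ing_cb_lock);

	void flow_indr_add_block_ing_cb(struct flow_indr_block_ing_entry *entry)
	{
		mutex_lock(&flow_indr_block_ing_cb_lock);
		list_add_tail_rcu(&entry->list, &block_ing_cb_list);
		mutex_unlock(&flow_indr_block_ing_cb_lock);
	}

	void flow_indr_del_block_ing_cb(struct flow_indr_block_ing_entry *entry)
	{
		mutex_lock(&flow_indr_block_ing_cb_lock);
		list_del_rcu(&entry->list);
		mutex_unlock(&flow_indr_block_ing_cb_lock);
	}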
[PATCH net-next v7 6/6] netfilter: nf_tables_offload: support indr block call
From: wenxu nftable support indr-block call. It makes nftable an offload vlan and tunnel device. nft add table netdev firewall nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; } nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; } nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 Signed-off-by: wenxu Acked-by: Jakub Kicinski --- v7: no change include/net/netfilter/nf_tables_offload.h | 4 + net/netfilter/nf_tables_api.c | 7 ++ net/netfilter/nf_tables_offload.c | 148 +- 3 files changed, 135 insertions(+), 24 deletions(-) diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h index 3196663..bffd51a 100644 --- a/include/net/netfilter/nf_tables_offload.h +++ b/include/net/netfilter/nf_tables_offload.h @@ -63,6 +63,10 @@ struct nft_flow_rule { struct nft_flow_rule *nft_flow_rule_create(const struct nft_rule *rule); void nft_flow_rule_destroy(struct nft_flow_rule *flow); int nft_flow_rule_offload_commit(struct net *net); +void nft_indr_block_get_and_ing_cmd(struct net_device *dev, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command); #define NFT_OFFLOAD_MATCH(__key, __base, __field, __len, __reg) \ (__reg)->base_offset= \ diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 605a7cf..fe3b7b0 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7593,6 +7593,11 @@ static void __net_exit nf_tables_exit_net(struct net *net) .exit = nf_tables_exit_net, }; +static struct flow_indr_block_ing_entry block_ing_entry = { + .cb = nft_indr_block_get_and_ing_cmd, + .list = LIST_HEAD_INIT(block_ing_entry.list), +}; + static int __init nf_tables_module_init(void) { int err; @@ -7624,6 +7629,7 @@ static int __init nf_tables_module_init(void) goto err5; nft_chain_route_init(); + flow_indr_add_block_ing_cb(&block_ing_entry); return err; err5: rhltable_destroy(&nft_objname_ht); @@ -7640,6 +7646,7 @@ static int __init nf_tables_module_init(void) static void __exit nf_tables_module_exit(void) { + flow_indr_del_block_ing_cb(&block_ing_entry); nfnetlink_subsys_unregister(&nf_tables_subsys); unregister_netdevice_notifier(&nf_tables_flowtable_notifier); nft_chain_filter_fini(); diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c index 64f5fd5..d3c4c9c 100644 --- a/net/netfilter/nf_tables_offload.c +++ b/net/netfilter/nf_tables_offload.c @@ -171,24 +171,110 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo, return 0; } +static int nft_block_setup(struct nft_base_chain *basechain, + struct flow_block_offload *bo, + enum flow_block_command cmd) +{ + int err; + + switch (cmd) { + case FLOW_BLOCK_BIND: + err = nft_flow_offload_bind(bo, basechain); + break; + case FLOW_BLOCK_UNBIND: + err = nft_flow_offload_unbind(bo, basechain); + break; + default: + WARN_ON_ONCE(1); + err = -EOPNOTSUPP; + } + + return err; +} + +static int nft_block_offload_cmd(struct nft_base_chain *chain, +struct net_device *dev, +enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + int err; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; + bo.extack = &extack; + INIT_LIST_HEAD(&bo.cb_list); + + err = dev->netdev_ops->ndo_setup_tc(dev, 
TC_SETUP_BLOCK, &bo); + if (err < 0) + return err; + + return nft_block_setup(chain, &bo, cmd); +} + +static void nft_indr_block_ing_cmd(struct net_device *dev, + struct nft_base_chain *chain, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command cmd) +{ + struct netlink_ext_ack extack = {}; + struct flow_block_offload bo = {}; + + if (!chain) + return; + + bo.net = dev_net(dev); + bo.block = &chain->flow_block; + bo.command = cmd; + bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSAC
[PATCH net-next v7 3/6] cls_api: add flow_indr_block_call function
From: wenxu This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu Acked-by: Jakub Kicinski --- v7: no change net/sched/cls_api.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 12eaa6c9..7c34fc6 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -773,13 +773,27 @@ void tc_indr_block_cb_unregister(struct net_device *dev, } EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister); +static void flow_indr_block_call(struct net_device *dev, +struct flow_block_offload *bo, +enum flow_block_command command) +{ + struct tc_indr_block_cb *indr_block_cb; + struct tc_indr_block_dev *indr_dev; + + indr_dev = tc_indr_block_dev_lookup(dev); + if (!indr_dev) + return; + + list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) + indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, + bo); +} + static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, struct tcf_block_ext_info *ei, enum flow_block_command command, struct netlink_ext_ack *extack) { - struct tc_indr_block_cb *indr_block_cb; - struct tc_indr_block_dev *indr_dev; struct flow_block_offload bo = { .command= command, .binder_type= ei->binder_type, @@ -790,14 +804,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, }; INIT_LIST_HEAD(&bo.cb_list); - indr_dev = tc_indr_block_dev_lookup(dev); - if (!indr_dev) - return; - - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) - indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - + flow_indr_block_call(dev, &bo, command); tcf_block_setup(block, &bo); } -- 1.8.3.1
[PATCH net-next v7 4/6] flow_offload: move tc indirect block to flow offload
From: wenxu move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu Acked-by: Jakub Kicinski --- v7: no change drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +- .../net/ethernet/netronome/nfp/flower/offload.c| 11 +- include/net/flow_offload.h | 29 +++ include/net/pkt_cls.h | 35 --- include/net/sch_generic.h | 3 - net/core/flow_offload.c| 215 ++ net/sched/cls_api.c| 240 +++-- 7 files changed, 280 insertions(+), 263 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index bf6f483..1ebbd63 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -781,9 +781,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, { int err; - err = __tc_indr_block_cb_register(netdev, rpriv, - mlx5e_rep_indr_setup_tc_cb, - rpriv); + err = __flow_indr_block_cb_register(netdev, rpriv, + mlx5e_rep_indr_setup_tc_cb, + rpriv); if (err) { struct mlx5e_priv *priv = netdev_priv(rpriv->netdev); @@ -796,8 +796,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv, static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv, struct net_device *netdev) { - __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, - rpriv); + __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb, + rpriv); } static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb, diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c index ff8a9f1..3a4f4f0 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -1649,16 +1649,17 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app, return NOTIFY_OK; if (event == NETDEV_REGISTER) { - err = __tc_indr_block_cb_register(netdev, app, - nfp_flower_indr_setup_tc_cb, - app); + err = __flow_indr_block_cb_register(netdev, app, + nfp_flower_indr_setup_tc_cb, + app); if (err) nfp_flower_cmsg_warn(app, "Indirect block reg failed - %s\n", netdev->name); } else if (event == NETDEV_UNREGISTER) { - __tc_indr_block_cb_unregister(netdev, - nfp_flower_indr_setup_tc_cb, app); + __flow_indr_block_cb_unregister(netdev, + nfp_flower_indr_setup_tc_cb, + app); } return NOTIFY_OK; diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index d3b12bc..46b8777 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -4,6 +4,7 @@ #include #include #include +#include struct flow_match { struct flow_dissector *dissector; @@ -370,4 +371,32 @@ static inline void flow_block_init(struct flow_block *flow_block) INIT_LIST_HEAD(&flow_block->cb_list); } +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv, + enum tc_setup_type type, void *type_data); + +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command); + +int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, + void *cb_ident); + +void __flow_indr_block_cb_unregister(struct net_device *dev, +flow_indr_block_bind_cb_t *cb, +void *cb_ident); + +int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, + flow_indr_block_bind_cb_t *cb, void *cb_ident); + +void flow_indr_blo
[PATCH net-next v7 2/6] cls_api: remove the tcf_block cache
From: wenxu Remove the tcf_block in the tc_indr_block_dev for muti-subsystem support. Signed-off-by: wenxu Acked-by: Jakub Kicinski --- v7: no change net/sched/cls_api.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 1dd210d..12eaa6c9 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -574,7 +574,6 @@ struct tc_indr_block_dev { struct net_device *dev; unsigned int refcnt; struct list_head cb_list; - struct tcf_block *block; }; struct tc_indr_block_cb { @@ -611,7 +610,6 @@ static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev) INIT_LIST_HEAD(&indr_dev->cb_list); indr_dev->dev = dev; - indr_dev->block = tc_dev_ingress_block(dev); if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, tc_indr_setup_block_ht_params)) { kfree(indr_dev); @@ -706,6 +704,7 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, { struct tc_indr_block_cb *indr_block_cb; struct tc_indr_block_dev *indr_dev; + struct tcf_block *block; int err; indr_dev = tc_indr_block_dev_get(dev); @@ -717,8 +716,9 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, - FLOW_BLOCK_BIND); + block = tc_dev_ingress_block(dev); + tc_indr_block_ing_cmd(dev, block, indr_block_cb->cb, + indr_block_cb->cb_priv, FLOW_BLOCK_BIND); return 0; err_dev_put: @@ -745,6 +745,7 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, { struct tc_indr_block_cb *indr_block_cb; struct tc_indr_block_dev *indr_dev; + struct tcf_block *block; indr_dev = tc_indr_block_dev_lookup(dev); if (!indr_dev) @@ -755,8 +756,9 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, - FLOW_BLOCK_UNBIND); + block = tc_dev_ingress_block(dev); + tc_indr_block_ing_cmd(dev, block, indr_block_cb->cb, + indr_block_cb->cb_priv, FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } @@ -792,8 +794,6 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev, if (!indr_dev) return; - indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL; - list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list) indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo); -- 1.8.3.1
[PATCH net-next v7 0/6] flow_offload: add indr-block in nf_table_offload
From: wenxu

This series makes nftables offload support vlan and tunnel device
offload through the indr-block architecture.

The first four patches move the tc indr block into flow_offload and
rename it to flow-indr-block.

Because the new flow-indr-block can't get the tcf_block directly,
the fifth patch provides a callback list so the flow_block of each
subsystem can be found immediately when the device registers and
contains a block.

The last patch makes nf_tables_offload support flow-indr-block.

This version adds a mutex lock for add/del of flow_indr_block_ing_cb.

wenxu (6):
  cls_api: modify the tc_indr_block_ing_cmd parameters.
  cls_api: remove the tcf_block cache
  cls_api: add flow_indr_block_call function
  flow_offload: move tc indirect block to flow offload
  flow_offload: support get multi-subsystem block
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c  |  11 +-
 include/net/flow_offload.h                       |  37 +++
 include/net/netfilter/nf_tables_offload.h        |   4 +
 include/net/pkt_cls.h                            |  35 ---
 include/net/sch_generic.h                        |   3 -
 net/core/flow_offload.c                          | 240 +++
 net/netfilter/nf_tables_api.c                    |   7 +
 net/netfilter/nf_tables_offload.c                | 148 ++--
 net/sched/cls_api.c                              | 254 -
 10 files changed, 464 insertions(+), 285 deletions(-)

-- 
1.8.3.1
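For context, the mutex mentioned in the changelog serializes writers of the ing-callback list while readers stay under RCU. A sketch of what that looks like; only the entry struct and the add/del prototypes are from the 5/6 patch in this series, the bodies here are an assumption:

	static LIST_HEAD(block_ing_cb_list);
	static DEFINE_MUTEX(flow_indr_block_ing_cb_lock);

	void flow_indr_add_block_ing_cb(struct flow_indr_block_ing_entry *entry)
	{
		mutex_lock(&flow_indr_block_ing_cb_lock);
		list_add_tail_rcu(&entry->list, &block_ing_cb_list);
		mutex_unlock(&flow_indr_block_ing_cb_lock);
	}

	void flow_indr_del_block_ing_cb(struct flow_indr_block_ing_entry *entry)
	{
		mutex_lock(&flow_indr_block_ing_cb_lock);
		list_del_rcu(&entry->list);
		mutex_unlock(&flow_indr_block_ing_cb_lock);
	}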
[PATCH net-next v7 5/6] flow_offload: support get multi-subsystem block
From: wenxu It provide a callback list to find the blocks of tc and nft subsystems Signed-off-by: wenxu Acked-by: Jakub Kicinski --- v7: add a mutex lock for add/del flow_indr_block_ing_cb include/net/flow_offload.h | 10 - net/core/flow_offload.c| 51 ++ net/sched/cls_api.c| 9 +++- 3 files changed, 55 insertions(+), 15 deletions(-) diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 46b8777..e8069b6 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -379,6 +379,15 @@ typedef void flow_indr_block_ing_cmd_t(struct net_device *dev, void *cb_priv, enum flow_block_command command); +struct flow_indr_block_ing_entry { + flow_indr_block_ing_cmd_t *cb; + struct list_headlist; +}; + +void flow_indr_add_block_ing_cb(struct flow_indr_block_ing_entry *entry); + +void flow_indr_del_block_ing_cb(struct flow_indr_block_ing_entry *entry); + int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, flow_indr_block_bind_cb_t *cb, void *cb_ident); @@ -395,7 +404,6 @@ void flow_indr_block_cb_unregister(struct net_device *dev, void *cb_ident); void flow_indr_block_call(struct net_device *dev, - flow_indr_block_ing_cmd_t *cb, struct flow_block_offload *bo, enum flow_block_command command); diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c index 4cc18e4..64c3d4d 100644 --- a/net/core/flow_offload.c +++ b/net/core/flow_offload.c @@ -3,6 +3,7 @@ #include #include #include +#include struct flow_rule *flow_rule_alloc(unsigned int num_actions) { @@ -282,6 +283,8 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f, } EXPORT_SYMBOL(flow_block_cb_setup_simple); +static LIST_HEAD(block_ing_cb_list); + static struct rhashtable indr_setup_block_ht; struct flow_indr_block_cb { @@ -295,7 +298,6 @@ struct flow_indr_block_dev { struct rhash_head ht_node; struct net_device *dev; unsigned int refcnt; - flow_indr_block_ing_cmd_t *block_ing_cmd_cb; struct list_head cb_list; }; @@ -389,6 +391,20 @@ static void flow_indr_block_cb_del(struct flow_indr_block_cb *indr_block_cb) kfree(indr_block_cb); } +static void flow_block_ing_cmd(struct net_device *dev, + flow_indr_block_bind_cb_t *cb, + void *cb_priv, + enum flow_block_command command) +{ + struct flow_indr_block_ing_entry *entry; + + rcu_read_lock(); + list_for_each_entry_rcu(entry, &block_ing_cb_list, list) { + entry->cb(dev, cb, cb_priv, command); + } + rcu_read_unlock(); +} + int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, flow_indr_block_bind_cb_t *cb, void *cb_ident) @@ -406,10 +422,8 @@ int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - if (indr_dev->block_ing_cmd_cb) - indr_dev->block_ing_cmd_cb(dev, indr_block_cb->cb, - indr_block_cb->cb_priv, - FLOW_BLOCK_BIND); + flow_block_ing_cmd(dev, indr_block_cb->cb, indr_block_cb->cb_priv, + FLOW_BLOCK_BIND); return 0; @@ -448,10 +462,8 @@ void __flow_indr_block_cb_unregister(struct net_device *dev, if (!indr_block_cb) return; - if (indr_dev->block_ing_cmd_cb) - indr_dev->block_ing_cmd_cb(dev, indr_block_cb->cb, - indr_block_cb->cb_priv, - FLOW_BLOCK_UNBIND); + flow_block_ing_cmd(dev, indr_block_cb->cb, indr_block_cb->cb_priv, + FLOW_BLOCK_UNBIND); flow_indr_block_cb_del(indr_block_cb); flow_indr_block_dev_put(indr_dev); @@ -469,7 +481,6 @@ void flow_indr_block_cb_unregister(struct net_device *dev, EXPORT_SYMBOL_GPL(flow_indr_block_cb_unregister); void flow_indr_block_call(struct net_device *dev, - flow_indr_block_ing_cmd_t cb, struct flow_block_offload *bo, enum 
flow_block_command command) { @@ -480,15 +491,29 @@ void flow_indr_block_call(struct net_device *dev, if (!indr_dev) return; - indr_dev->block_ing_cmd_cb = command == FLOW_BLOCK_BIND -? cb : NULL; - list_for_each_entry(indr_block_cb, &indr_dev->
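To show how a subsystem consumes this, a sketch of registering an ing entry. The my_subsys_* names are hypothetical; flow_indr_add_block_ing_cb(), the entry struct, and the flow_indr_block_bind_cb_t typedef are from the diffs above:

	/* Called for every indirect cb register/unregister so the subsystem
	 * can look up its own flow_block for dev and bind/unbind cb on it. */
	static void my_subsys_block_ing_cmd(struct net_device *dev,
					    flow_indr_block_bind_cb_t *cb,
					    void *cb_priv,
					    enum flow_block_command command)
	{
		/* subsystem-specific block lookup + bind/unbind */
	}

	static struct flow_indr_block_ing_entry my_subsys_ing_entry = {
		.cb = my_subsys_block_ing_cmd,
		.list = LIST_HEAD_INIT(my_subsys_ing_entry.list),
	};

	static int __init my_subsys_init(void)
	{
		flow_indr_add_block_ing_cb(&my_subsys_ing_entry);
		return 0;
	}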
[PATCH net-next v7 1/6] cls_api: modify the tc_indr_block_ing_cmd parameters.
From: wenxu This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu Acked-by: Jakub Kicinski --- v7: no change net/sched/cls_api.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 9d85d32..1dd210d 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -677,26 +677,28 @@ static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb) static int tcf_block_setup(struct tcf_block *block, struct flow_block_offload *bo); -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev, - struct tc_indr_block_cb *indr_block_cb, +static void tc_indr_block_ing_cmd(struct net_device *dev, + struct tcf_block *block, + tc_indr_block_bind_cb_t *cb, + void *cb_priv, enum flow_block_command command) { struct flow_block_offload bo = { .command= command, .binder_type= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS, - .net= dev_net(indr_dev->dev), - .block_shared = tcf_block_non_null_shared(indr_dev->block), + .net= dev_net(dev), + .block_shared = tcf_block_non_null_shared(block), }; INIT_LIST_HEAD(&bo.cb_list); - if (!indr_dev->block) + if (!block) return; - bo.block = &indr_dev->block->flow_block; + bo.block = &block->flow_block; - indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, - &bo); - tcf_block_setup(indr_dev->block, &bo); + cb(dev, cb_priv, TC_SETUP_BLOCK, &bo); + + tcf_block_setup(block, &bo); } int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, @@ -715,7 +717,8 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv, if (err) goto err_dev_put; - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv, + FLOW_BLOCK_BIND); return 0; err_dev_put: @@ -752,7 +755,8 @@ void __tc_indr_block_cb_unregister(struct net_device *dev, return; /* Send unbind message if required to free any block cbs. */ - tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND); + tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv, + FLOW_BLOCK_UNBIND); tc_indr_block_cb_del(indr_block_cb); tc_indr_block_dev_put(indr_dev); } -- 1.8.3.1
kTLS RX offload in CX6 of Mellanox
Hi Mellanox team,

I tested the kTLS offload feature on a CX6 Dx with the net-next tree
and found that the driver already contains the kTLS RX offload code.
My firmware is the latest version, 22.29.1016.

According to the document:
https://docs.mellanox.com/display/OFEDv521040/Kernel+Transport+Layer+Security+%28kTLS%29+Offloads
the RX offload should be supported with the 22.29.1016 FW.

# lspci | grep Ether
b3:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
b3:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

# ethtool -i net2
driver: mlx5_core
version: 5.0-0
firmware-version: 22.29.1016 (MT_000430)
expansion-rom-version:
bus-info: :b3:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

# ethtool -k net2 | grep tls-hw
tls-hw-tx-offload: on
tls-hw-rx-offload: off [fixed]

But the RX offload is not actually supported at the moment:
mlx5_accel_is_ktls_rx(mdev) returns false, which leaves the feature
disabled. So does this mean the current FW still does not support RX
offload?

BR
wenxu
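For anyone trying to reproduce this, the toggles involved look roughly like the following (assuming a kernel with CONFIG_TLS_DEVICE; whether the rx toggle can succeed depends on mlx5_accel_is_ktls_rx() as described above):

	# modprobe tls
	# ethtool -K net2 tls-hw-tx-offload on
	# ethtool -K net2 tls-hw-rx-offload on

The last command cannot succeed while the feature is reported as "off [fixed]", i.e. while the driver does not advertise kTLS RX for this FW/device combination.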
[PATCH net-next] net/sched: cls_flower add CT_FLAGS_INVALID flag support
From: wenxu This patch add the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to match the ct_state with invalid for conntrack. Signed-off-by: wenxu --- include/linux/skbuff.h | 4 ++-- include/net/sch_generic.h| 1 + include/uapi/linux/pkt_cls.h | 1 + net/core/dev.c | 2 ++ net/core/flow_dissector.c| 13 + net/sched/act_ct.c | 1 + net/sched/cls_flower.c | 6 +- 7 files changed, 21 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index c9568cf..e22ccf0 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1353,8 +1353,8 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, skb_flow_dissect_ct(const struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, - u16 *ctinfo_map, - size_t mapsize); + u16 *ctinfo_map, size_t mapsize, + bool post_ct); void skb_flow_dissect_tunnel_info(const struct sk_buff *skb, struct flow_dissector *flow_dissector, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 639e465..e7bee99 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -388,6 +388,7 @@ struct qdisc_skb_cb { #define QDISC_CB_PRIV_LEN 20 unsigned char data[QDISC_CB_PRIV_LEN]; u16 mru; + boolpost_ct; }; typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv); diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index ee95f42..709668e 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -591,6 +591,7 @@ enum { TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */ TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */ TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */ + TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */ }; enum { diff --git a/net/core/dev.c b/net/core/dev.c index bae35c1..9dce3f7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3878,6 +3878,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. 
*/ qdisc_skb_cb(skb)->mru = 0; + qdisc_skb_cb(skb)->post_ct = false; mini_qdisc_bstats_cpu_update(miniq, skb); switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) { @@ -4960,6 +4961,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) qdisc_skb_cb(skb)->pkt_len = skb->len; qdisc_skb_cb(skb)->mru = 0; + qdisc_skb_cb(skb)->post_ct = false; skb->tc_at_ingress = 1; mini_qdisc_bstats_cpu_update(miniq, skb); diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 2d70ded..c565c7a 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -237,9 +237,8 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, void skb_flow_dissect_ct(const struct sk_buff *skb, struct flow_dissector *flow_dissector, - void *target_container, - u16 *ctinfo_map, - size_t mapsize) + void *target_container, u16 *ctinfo_map, + size_t mapsize, bool post_ct) { #if IS_ENABLED(CONFIG_NF_CONNTRACK) struct flow_dissector_key_ct *key; @@ -251,13 +250,19 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, return; ct = nf_ct_get(skb, &ctinfo); - if (!ct) + if (!ct && !post_ct) return; key = skb_flow_dissector_target(flow_dissector, FLOW_DISSECTOR_KEY_CT, target_container); + if (!ct) { + key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | + TCA_FLOWER_KEY_CT_FLAGS_INVALID; + return; + } + if (ctinfo < mapsize) key->ct_state = ctinfo_map[ctinfo]; #if IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 83a5c67..b344207 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1030,6 +1030,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, out: tcf_action_update_bstats(&c->common, skb); + qdisc_skb_cb(skb)->post_ct = true; if (defrag) qdisc_skb_cb(skb)->pkt_len = skb->len; return retval; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 1319986..e8653d9 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c
Re: [PATCH net-next] net/sched: cls_flower add CT_FLAGS_INVALID flag support
On 1/19/2021 2:21 AM, Marcelo Ricardo Leitner wrote:
> On Mon, Jan 18, 2021 at 01:18:47PM +0800, we...@ucloud.cn wrote:
> ...
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -305,6 +305,9 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>>  	struct fl_flow_key skb_key;
>>  	struct fl_flow_mask *mask;
>>  	struct cls_fl_filter *f;
>> +	bool post_ct;
>> +
>> +	post_ct = qdisc_skb_cb(skb)->post_ct;
> Patch-wise, only here I think you could initialize post_ct right on
> the declaration. No need for the extra line/block of lines here.
>
> But I'm missing the iproute2 changes for flower, with a man page
> update as well. Not sure if you planned to post them later on or not,
> but it's nice to always have them paired together.

Will do. Thanks.

>
> Thanks,
> Marcelo
>
[PATCH v2 net-next ] net/sched: cls_flower add CT_FLAGS_INVALID flag support
From: wenxu This patch add the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to match the ct_state with invalid for conntrack. Signed-off-by: wenxu --- v2: initialize post_ct right on the declaration include/linux/skbuff.h | 4 ++-- include/net/sch_generic.h| 1 + include/uapi/linux/pkt_cls.h | 1 + net/core/dev.c | 2 ++ net/core/flow_dissector.c| 13 + net/sched/act_ct.c | 1 + net/sched/cls_flower.c | 4 +++- 7 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index c9568cf..e22ccf0 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1353,8 +1353,8 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, skb_flow_dissect_ct(const struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, - u16 *ctinfo_map, - size_t mapsize); + u16 *ctinfo_map, size_t mapsize, + bool post_ct); void skb_flow_dissect_tunnel_info(const struct sk_buff *skb, struct flow_dissector *flow_dissector, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 639e465..e7bee99 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -388,6 +388,7 @@ struct qdisc_skb_cb { #define QDISC_CB_PRIV_LEN 20 unsigned char data[QDISC_CB_PRIV_LEN]; u16 mru; + boolpost_ct; }; typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv); diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index ee95f42..709668e 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -591,6 +591,7 @@ enum { TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */ TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */ TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */ + TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */ }; enum { diff --git a/net/core/dev.c b/net/core/dev.c index bae35c1..9dce3f7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3878,6 +3878,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. 
*/ qdisc_skb_cb(skb)->mru = 0; + qdisc_skb_cb(skb)->post_ct = false; mini_qdisc_bstats_cpu_update(miniq, skb); switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) { @@ -4960,6 +4961,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) qdisc_skb_cb(skb)->pkt_len = skb->len; qdisc_skb_cb(skb)->mru = 0; + qdisc_skb_cb(skb)->post_ct = false; skb->tc_at_ingress = 1; mini_qdisc_bstats_cpu_update(miniq, skb); diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 2d70ded..c565c7a 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -237,9 +237,8 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, void skb_flow_dissect_ct(const struct sk_buff *skb, struct flow_dissector *flow_dissector, - void *target_container, - u16 *ctinfo_map, - size_t mapsize) + void *target_container, u16 *ctinfo_map, + size_t mapsize, bool post_ct) { #if IS_ENABLED(CONFIG_NF_CONNTRACK) struct flow_dissector_key_ct *key; @@ -251,13 +250,19 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, return; ct = nf_ct_get(skb, &ctinfo); - if (!ct) + if (!ct && !post_ct) return; key = skb_flow_dissector_target(flow_dissector, FLOW_DISSECTOR_KEY_CT, target_container); + if (!ct) { + key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | + TCA_FLOWER_KEY_CT_FLAGS_INVALID; + return; + } + if (ctinfo < mapsize) key->ct_state = ctinfo_map[ctinfo]; #if IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 83a5c67..b344207 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1030,6 +1030,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, out: tcf_action_update_bstats(&c->common, skb); + qdisc_skb_cb(skb)->post_ct = true; if (defrag) qdisc_skb_cb(skb)->pkt_len = skb->len; return retval; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 1319986..0dcb5a0 100644 --- a/net/sched
[PATCH iproute2-next] tc: flower: add tc conntrack inv ct_state support
From: wenxu Matches on conntrack inv ct_state. Signed-off-by: wenxu --- include/uapi/linux/pkt_cls.h | 1 + man/man8/tc-flower.8 | 2 ++ tc/f_flower.c| 1 + 3 files changed, 4 insertions(+) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index ee95f42..709668e 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -591,6 +591,7 @@ enum { TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */ TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */ TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */ + TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */ }; enum { diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8 index 1a76b37..8de68d1 100644 --- a/man/man8/tc-flower.8 +++ b/man/man8/tc-flower.8 @@ -387,6 +387,8 @@ new - New connection. .TP est - Established connection. .TP +inv - The packet is associated with no known connection. +.TP Example: +trk+est .RE .TP diff --git a/tc/f_flower.c b/tc/f_flower.c index 1fe0ef4..489c0d7 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -345,6 +345,7 @@ static struct flower_ct_states { { "trk", TCA_FLOWER_KEY_CT_FLAGS_TRACKED }, { "new", TCA_FLOWER_KEY_CT_FLAGS_NEW }, { "est", TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED }, + { "inv", TCA_FLOWER_KEY_CT_FLAGS_INVALID}, }; static int flower_parse_ct_state(char *str, struct nlmsghdr *n) -- 1.8.3.1
[PATCH iproute2-next v2] tc: flower: add tc conntrack inv ct_state support
From: wenxu Matches on conntrack inv ct_state. Signed-off-by: wenxu --- v2: change the description include/uapi/linux/pkt_cls.h | 1 + man/man8/tc-flower.8 | 2 ++ tc/f_flower.c| 1 + 3 files changed, 4 insertions(+) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index 449a639..e8f2aed 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -563,6 +563,7 @@ enum { TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */ TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */ TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */ + TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */ }; enum { diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8 index eb9eb5f..f90117b 100644 --- a/man/man8/tc-flower.8 +++ b/man/man8/tc-flower.8 @@ -312,6 +312,8 @@ new - New connection. .TP est - Established connection. .TP +inv - The state is invalid. The packet couldn't be associated to a connection. +.TP Example: +trk+est .RE .TP diff --git a/tc/f_flower.c b/tc/f_flower.c index 9d59d71..7d2df9d 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -340,6 +340,7 @@ static struct flower_ct_states { { "trk", TCA_FLOWER_KEY_CT_FLAGS_TRACKED }, { "new", TCA_FLOWER_KEY_CT_FLAGS_NEW }, { "est", TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED }, + { "inv", TCA_FLOWER_KEY_CT_FLAGS_INVALID}, }; static int flower_parse_ct_state(char *str, struct nlmsghdr *n) -- 1.8.3.1
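Combined with the kernel patch, the new flag lets a ruleset tell "never tracked" (-trk) apart from "tracked but invalid" (+trk+inv, reported when act_ct ran but the conntrack lookup failed). A usage sketch, with hypothetical device and chain numbers:

	# tc qdisc add dev eth0 clsact
	# tc filter add dev eth0 ingress chain 0 proto ip flower \
		ct_state -trk action ct pipe action goto chain 1
	# tc filter add dev eth0 ingress chain 1 proto ip flower \
		ct_state +trk+inv action drop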
Re: [PATCH v2 net-next ] net/sched: cls_flower add CT_FLAGS_INVALID flag support
On 1/21/2021 9:09 AM, Cong Wang wrote:
> On Wed, Jan 20, 2021 at 3:40 PM Marcelo Ricardo Leitner wrote:
>> On Wed, Jan 20, 2021 at 02:18:41PM -0800, Cong Wang wrote:
>>> On Tue, Jan 19, 2021 at 12:33 AM wrote:
>>>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>>>> index 2d70ded..c565c7a 100644
>>>> --- a/net/core/flow_dissector.c
>>>> +++ b/net/core/flow_dissector.c
>>>> @@ -237,9 +237,8 @@ void skb_flow_dissect_meta(const struct sk_buff *skb,
>>>>  void
>>>>  skb_flow_dissect_ct(const struct sk_buff *skb,
>>>>  		    struct flow_dissector *flow_dissector,
>>>> -		    void *target_container,
>>>> -		    u16 *ctinfo_map,
>>>> -		    size_t mapsize)
>>>> +		    void *target_container, u16 *ctinfo_map,
>>>> +		    size_t mapsize, bool post_ct)
>>> Why do you pass this boolean as a parameter when you
>>> can just read it from qdisc_skb_cb(skb)?
>> In this case, yes, but this way skb_flow_dissect_ct() can/is able to
>> not care about what the ->cb actually is. It could be called from
>> somewhere else too.
> This sounds reasonable, it is in net/core/ directory anyway,
> so should be independent of tc even though cls_flower is its
> only caller.

Yes, this is what I think as well.

>
> Thanks.
>
Re: [PATCH v2 net-next 3/3] net/sched: sch_frag: add generic packet fragment support.
On 2020/11/18 15:00, Cong Wang wrote:
> On Tue, Nov 17, 2020 at 5:37 PM wrote:
>> From: wenxu
>>
>> Currently kernel tc subsystem can do conntrack in cat_ct. But when several
>> fragment packets go through the act_ct, function tcf_ct_handle_fragments
>> will defrag the packets to a big one. But the last action will redirect
>> mirred to a device which maybe lead the reassembly big packet over the mtu
>> of target device.
>>
>> This patch add support for a xmit hook to mirred, that gets executed before
>> xmiting the packet. Then, when act_ct gets loaded, it configs that hook.
>> The frag xmit hook maybe reused by other modules.
>>
>> Signed-off-by: wenxu
>> ---
>> v2: make act_frag just buildin for tc core but not a module
>>     return an error code from tcf_fragment
>>     depends on INET for ip_do_fragment
> Much better now.
>
>
>> +#ifdef CONFIG_INET
>> +	ret = ip_do_fragment(net, skb->sk, skb, sch_frag_xmit);
>> +#endif
>
> Doesn't the whole sch_frag need to be put under CONFIG_INET?
> I don't think fragmentation could work without CONFIG_INET.

I have already tested this: it does work without CONFIG_INET, since
only the ip_do_fragment call depends on CONFIG_INET.

>
> Thanks.
>
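For reference, the v3 respin below resolves this point by compiling the whole file only when INET is enabled, via net/sched/Makefile:

	obj-$(CONFIG_INET)	+= sch_frag.o

so the #ifdef inside the fragmentation path can go away.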
[PATCH v3 net-next 2/3] net/sched: act_mirred: refactor the handle of xmit
From: wenxu This one is prepare for the next patch. Signed-off-by: wenxu --- v3: no change include/net/sch_generic.h | 5 - net/sched/act_mirred.c| 21 +++-- 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d8fd867..dd74f06 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -1281,9 +1281,4 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc, void mini_qdisc_pair_block_init(struct mini_Qdisc_pair *miniqp, struct tcf_block *block); -static inline int skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res) -{ - return res->ingress ? netif_receive_skb(skb) : dev_queue_xmit(skb); -} - #endif diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index e24b7e2..17d0095 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -205,6 +205,18 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, return err; } +static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb) +{ + int err; + + if (!want_ingress) + err = dev_queue_xmit(skb); + else + err = netif_receive_skb(skb); + + return err; +} + static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { @@ -287,18 +299,15 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, /* let's the caller reinsert the packet, if possible */ if (use_reinsert) { res->ingress = want_ingress; - if (skb_tc_reinsert(skb, res)) + err = tcf_mirred_forward(res->ingress, skb); + if (err) tcf_action_inc_overlimit_qstats(&m->common); __this_cpu_dec(mirred_rec_level); return TC_ACT_CONSUMED; } } - if (!want_ingress) - err = dev_queue_xmit(skb2); - else - err = netif_receive_skb(skb2); - + err = tcf_mirred_forward(want_ingress, skb2); if (err) { out: tcf_action_inc_overlimit_qstats(&m->common); -- 1.8.3.1
[PATCH v3 net-next 0/3] net/sched: fix over mtu packet of defrag in
From: wenxu

Currently the kernel tc subsystem can do conntrack in act_ct. But when
several fragment packets go through act_ct, tcf_ct_handle_fragments
will defrag them into one big packet, and the last action may then
redirect (mirred) to a device whose MTU the reassembled packet exceeds.

The first patch fixes the missing init of qdisc_skb_cb->mru.

The second one refactors the handling of xmit in act_mirred and
prepares for the third one.

The last one adds implicit packet fragmentation support to fix the
over-MTU case after defrag in act_ct.

wenxu (3):
  net/sched: fix miss init the mru in qdisc_skb_cb
  net/sched: act_mirred: refactor the handle of xmit
  net/sched: sch_frag: add generic packet fragment support.

 include/net/act_api.h     |   8 +++
 include/net/sch_generic.h |   5 +-
 net/core/dev.c            |   2 +
 net/sched/Makefile        |   1 +
 net/sched/act_api.c       |  44 ++
 net/sched/act_ct.c        |   7 +++
 net/sched/act_mirred.c    |  21 +--
 net/sched/sch_frag.c      | 150 ++
 8 files changed, 228 insertions(+), 10 deletions(-)
 create mode 100644 net/sched/sch_frag.c

-- 
1.8.3.1
[PATCH v3 net-next 3/3] net/sched: sch_frag: add generic packet fragment support.
From: wenxu Currently kernel tc subsystem can do conntrack in cat_ct. But when several fragment packets go through the act_ct, function tcf_ct_handle_fragments will defrag the packets to a big one. But the last action will redirect mirred to a device which maybe lead the reassembly big packet over the mtu of target device. This patch add support for a xmit hook to mirred, that gets executed before xmiting the packet. Then, when act_ct gets loaded, it configs that hook. The frag xmit hook maybe reused by other modules. Signed-off-by: wenxu --- v2: make act_frag just buildin for tc core but not a module return an error code from tcf_fragment depends on INET for ip_do_fragment v3: put the whole sch_frag.c under CONFIG_INET include/net/act_api.h | 8 +++ include/net/sch_generic.h | 2 + net/sched/Makefile| 1 + net/sched/act_api.c | 44 ++ net/sched/act_ct.c| 7 +++ net/sched/act_mirred.c| 2 +- net/sched/sch_frag.c | 150 ++ 7 files changed, 213 insertions(+), 1 deletion(-) create mode 100644 net/sched/sch_frag.c diff --git a/include/net/act_api.h b/include/net/act_api.h index 8721492..decb6de 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -239,6 +239,14 @@ int tcf_action_check_ctrlact(int action, struct tcf_proto *tp, struct netlink_ext_ack *newchain); struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, struct tcf_chain *newchain); + +typedef int xmit_hook_func(struct sk_buff *skb, + int (*xmit)(struct sk_buff *skb)); + +int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)); +int tcf_set_xmit_hook(xmit_hook_func *xmit_hook); +void tcf_clear_xmit_hook(void); + #endif /* CONFIG_NET_CLS_ACT */ static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index dd74f06..162ed62 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -1281,4 +1281,6 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc, void mini_qdisc_pair_block_init(struct mini_Qdisc_pair *miniqp, struct tcf_block *block); +int sch_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)); + #endif diff --git a/net/sched/Makefile b/net/sched/Makefile index 66bbf9a..dd14ef4 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -5,6 +5,7 @@ obj-y := sch_generic.o sch_mq.o +obj-$(CONFIG_INET) += sch_frag.o obj-$(CONFIG_NET_SCHED)+= sch_api.o sch_blackhole.o obj-$(CONFIG_NET_CLS) += cls_api.o obj-$(CONFIG_NET_CLS_ACT) += act_api.o diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 60e1572..fbb35a8 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -22,6 +22,50 @@ #include #include +static xmit_hook_func __rcu *tcf_xmit_hook; +static DEFINE_SPINLOCK(tcf_xmit_hook_lock); +static u16 tcf_xmit_hook_count; + +int tcf_set_xmit_hook(xmit_hook_func *xmit_hook) +{ + spin_lock(&tcf_xmit_hook_lock); + if (!tcf_xmit_hook_count) { + rcu_assign_pointer(tcf_xmit_hook, xmit_hook); + } else if (xmit_hook != rcu_access_pointer(tcf_xmit_hook)) { + spin_unlock(&tcf_xmit_hook_lock); + return -EBUSY; + } + + tcf_xmit_hook_count++; + spin_unlock(&tcf_xmit_hook_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(tcf_set_xmit_hook); + +void tcf_clear_xmit_hook(void) +{ + spin_lock(&tcf_xmit_hook_lock); + if (--tcf_xmit_hook_count == 0) + rcu_assign_pointer(tcf_xmit_hook, NULL); + spin_unlock(&tcf_xmit_hook_lock); + + synchronize_rcu(); +} +EXPORT_SYMBOL_GPL(tcf_clear_xmit_hook); + +int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct 
sk_buff *skb)) +{ + xmit_hook_func *xmit_hook; + + xmit_hook = rcu_dereference(tcf_xmit_hook); + if (xmit_hook) + return xmit_hook(skb, xmit); + else + return xmit(skb); +} +EXPORT_SYMBOL_GPL(tcf_dev_queue_xmit); + static void tcf_action_goto_chain_exec(const struct tc_action *a, struct tcf_result *res) { diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index aba3cd8..f82dc65 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1541,8 +1541,14 @@ static int __init ct_init_module(void) if (err) goto err_register; + err = tcf_set_xmit_hook(sch_frag_xmit_hook); + if (err) + goto err_action; + return 0; +err_action: + tcf_unregister_action(&act_ct_ops, &ct_net_ops); err_register: tcf_ct_flow_tables_uninit(); err_tbl_init: @@ -1552,6 +1558,7 @@ static int __init ct_init_module
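A rough sketch of the hook contract in this version; my_hook and my_fragment are hypothetical, while tcf_set_xmit_hook(), tcf_clear_xmit_hook() and tcf_dev_queue_xmit() are from the diff above:

	/* A hook sees every mirred egress packet before the real transmit
	 * function and may call xmit() one or more times (per fragment). */
	static int my_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb))
	{
		if (skb->len <= skb->dev->mtu)
			return xmit(skb);
		return my_fragment(skb, xmit);	/* hypothetical: split, xmit each */
	}

	/* module init: claim the single hook slot; -EBUSY is returned if
	 * another module already installed a different hook */
	err = tcf_set_xmit_hook(my_hook);

	/* module exit: drop the refcount; the hook is cleared at zero */
	tcf_clear_xmit_hook();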
[PATCH v3 net-next 1/3] net/sched: fix miss init the mru in qdisc_skb_cb
From: wenxu

The mru in the qdisc_skb_cb should be initialized to 0. Only
defragmented packets in act_ct will set the value.

Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Signed-off-by: wenxu
---
v3: no change

 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 60d325b..d0efa98 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3867,6 +3867,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return skb;
 
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
+	qdisc_skb_cb(skb)->mru = 0;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
@@ -4954,6 +4955,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 	}
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
+	qdisc_skb_cb(skb)->mru = 0;
 	skb->tc_at_ingress = 1;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
-- 
1.8.3.1
Re: [PATCH v3 net-next 3/3] net/sched: sch_frag: add generic packet fragment support.
On 2020/11/25 3:24, Jakub Kicinski wrote:
> On Fri, 20 Nov 2020 07:38:36 +0800 we...@ucloud.cn wrote:
>> +int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb))
>> +{
>> +	xmit_hook_func *xmit_hook;
>> +
>> +	xmit_hook = rcu_dereference(tcf_xmit_hook);
>> +	if (xmit_hook)
>> +		return xmit_hook(skb, xmit);
>> +	else
>> +		return xmit(skb);
>> +}
>> +EXPORT_SYMBOL_GPL(tcf_dev_queue_xmit);
> I'm concerned about the performance impact of these indirect calls.
>
> Did you check what code compiler will generate? What the impact with
> retpolines enabled is going to be?
>
> Now that sch_frag is no longer a module this could be simplified.
>
> First of all - xmit_hook can only be sch_frag_xmit_hook, so please use
> that directly.
>
> if (READ_ONCE(tcf_xmit_hook_count))
> 	sch_frag_xmit_hook(...
> else
> 	dev_queue_xmit(...
>
> The abstraction is costly and not necessary right now IMO.
>
> Then probably the counter should be:
>
> u32 __read_mostly tcf_xmit_hook_count;
>
> To avoid byte loads and having it be places in an unlucky cache line.

Maybe a static key replacing tcf_xmit_hook_count would be even simpler?

DEFINE_STATIC_KEY_FALSE(tcf_xmit_hook_in_use);
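For reference, the v4 respin below takes this suggestion and uses a static branch, so the common no-act_ct path has neither an indirect call nor a memory load. Condensed from the v4 3/3 diff (the CONFIG_INET guards are omitted here):

	DEFINE_STATIC_KEY_FALSE(tcf_frag_xmit_count);

	int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb))
	{
		if (static_branch_unlikely(&tcf_frag_xmit_count))
			return sch_frag_xmit_hook(skb, xmit);
		return xmit(skb);
	}

	/* flipped by act_ct: static_branch_inc() in ct_init_module(),
	 * static_branch_dec() in ct_cleanup_module() */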
[PATCH v4 net-next 0/3] net/sched: fix over mtu packet of defrag in
From: wenxu

Currently the kernel tc subsystem can do conntrack in act_ct. But when
several fragment packets go through act_ct, tcf_ct_handle_fragments
will defrag them into one big packet, and the last action may then
redirect (mirred) to a device whose MTU the reassembled packet exceeds.

The first patch fixes the missing init of qdisc_skb_cb->mru.

The second one refactors the handling of xmit in act_mirred and
prepares for the third one.

The last one adds implicit packet fragmentation support to fix the
over-MTU case after defrag in act_ct.

wenxu (3):
  net/sched: fix miss init the mru in qdisc_skb_cb
  net/sched: act_mirred: refactor the handle of xmit
  net/sched: sch_frag: add generic packet fragment support.

 include/net/act_api.h     |   6 ++
 include/net/sch_generic.h |   5 +-
 net/core/dev.c            |   2 +
 net/sched/Makefile        |   1 +
 net/sched/act_api.c       |  16 +
 net/sched/act_ct.c        |   3 +
 net/sched/act_mirred.c    |  21 +--
 net/sched/sch_frag.c      | 150 ++
 8 files changed, 194 insertions(+), 10 deletions(-)
 create mode 100644 net/sched/sch_frag.c

-- 
1.8.3.1
[PATCH v4 net-next 2/3] net/sched: act_mirred: refactor the handle of xmit
From: wenxu This one is prepare for the next patch. Signed-off-by: wenxu --- v4: no change include/net/sch_generic.h | 5 - net/sched/act_mirred.c| 21 +++-- 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d8fd867..dd74f06 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -1281,9 +1281,4 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc, void mini_qdisc_pair_block_init(struct mini_Qdisc_pair *miniqp, struct tcf_block *block); -static inline int skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res) -{ - return res->ingress ? netif_receive_skb(skb) : dev_queue_xmit(skb); -} - #endif diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index e24b7e2..17d0095 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -205,6 +205,18 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, return err; } +static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb) +{ + int err; + + if (!want_ingress) + err = dev_queue_xmit(skb); + else + err = netif_receive_skb(skb); + + return err; +} + static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { @@ -287,18 +299,15 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, /* let's the caller reinsert the packet, if possible */ if (use_reinsert) { res->ingress = want_ingress; - if (skb_tc_reinsert(skb, res)) + err = tcf_mirred_forward(res->ingress, skb); + if (err) tcf_action_inc_overlimit_qstats(&m->common); __this_cpu_dec(mirred_rec_level); return TC_ACT_CONSUMED; } } - if (!want_ingress) - err = dev_queue_xmit(skb2); - else - err = netif_receive_skb(skb2); - + err = tcf_mirred_forward(want_ingress, skb2); if (err) { out: tcf_action_inc_overlimit_qstats(&m->common); -- 1.8.3.1
[PATCH v4 net-next 1/3] net/sched: fix miss init the mru in qdisc_skb_cb
From: wenxu

The mru in the qdisc_skb_cb should be initialized to 0. Only
defragmented packets in act_ct will set the value.

Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Signed-off-by: wenxu
---
v4: no change

 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 60d325b..d0efa98 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3867,6 +3867,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return skb;
 
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
+	qdisc_skb_cb(skb)->mru = 0;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
@@ -4954,6 +4955,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 	}
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
+	qdisc_skb_cb(skb)->mru = 0;
 	skb->tc_at_ingress = 1;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
-- 
1.8.3.1
[PATCH v4 net-next 3/3] net/sched: sch_frag: add generic packet fragment support.
From: wenxu Currently kernel tc subsystem can do conntrack in cat_ct. But when several fragment packets go through the act_ct, function tcf_ct_handle_fragments will defrag the packets to a big one. But the last action will redirect mirred to a device which maybe lead the reassembly big packet over the mtu of target device. This patch add support for a xmit hook to mirred, that gets executed before xmiting the packet. Then, when act_ct gets loaded, it configs that hook. The frag xmit hook maybe reused by other modules. Signed-off-by: wenxu --- v2: make act_frag just buildin for tc core but not a module return an error code from tcf_fragment depends on INET for ip_do_fragment v3: put the whole sch_frag.c under CONFIG_INET v4: remove the abstraction for xmit_hook include/net/act_api.h | 6 ++ include/net/sch_generic.h | 2 + net/sched/Makefile| 1 + net/sched/act_api.c | 16 + net/sched/act_ct.c| 3 + net/sched/act_mirred.c| 2 +- net/sched/sch_frag.c | 150 ++ 7 files changed, 179 insertions(+), 1 deletion(-) create mode 100644 net/sched/sch_frag.c diff --git a/include/net/act_api.h b/include/net/act_api.h index 8721492..55dab60 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -239,6 +239,12 @@ int tcf_action_check_ctrlact(int action, struct tcf_proto *tp, struct netlink_ext_ack *newchain); struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, struct tcf_chain *newchain); + +#ifdef CONFIG_INET +DECLARE_STATIC_KEY_FALSE(tcf_frag_xmit_count); +#endif + +int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)); #endif /* CONFIG_NET_CLS_ACT */ static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index dd74f06..162ed62 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -1281,4 +1281,6 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc, void mini_qdisc_pair_block_init(struct mini_Qdisc_pair *miniqp, struct tcf_block *block); +int sch_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)); + #endif diff --git a/net/sched/Makefile b/net/sched/Makefile index 66bbf9a..dd14ef4 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -5,6 +5,7 @@ obj-y := sch_generic.o sch_mq.o +obj-$(CONFIG_INET) += sch_frag.o obj-$(CONFIG_NET_SCHED)+= sch_api.o sch_blackhole.o obj-$(CONFIG_NET_CLS) += cls_api.o obj-$(CONFIG_NET_CLS_ACT) += act_api.o diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 60e1572..34fe743 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -22,6 +22,22 @@ #include #include +#ifdef CONFIG_INET +DEFINE_STATIC_KEY_FALSE(tcf_frag_xmit_count); +EXPORT_SYMBOL_GPL(tcf_frag_xmit_count); +#endif + +int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)) +{ +#ifdef CONFIG_INET + if (static_branch_unlikely(&tcf_frag_xmit_count)) + return sch_frag_xmit_hook(skb, xmit); +#endif + + return xmit(skb); +} +EXPORT_SYMBOL_GPL(tcf_dev_queue_xmit); + static void tcf_action_goto_chain_exec(const struct tc_action *a, struct tcf_result *res) { diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index aba3cd8..61092c5 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1541,6 +1541,8 @@ static int __init ct_init_module(void) if (err) goto err_register; + static_branch_inc(&tcf_frag_xmit_count); + return 0; err_register: @@ -1552,6 +1554,7 @@ static int __init ct_init_module(void) static void __exit ct_cleanup_module(void) { + 
static_branch_dec(&tcf_frag_xmit_count); tcf_unregister_action(&act_ct_ops, &ct_net_ops); tcf_ct_flow_tables_uninit(); destroy_workqueue(act_ct_wq); diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 17d0095..7153c67 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -210,7 +210,7 @@ static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb) int err; if (!want_ingress) - err = dev_queue_xmit(skb); + err = tcf_dev_queue_xmit(skb, dev_queue_xmit); else err = netif_receive_skb(skb); diff --git a/net/sched/sch_frag.c b/net/sched/sch_frag.c new file mode 100644 index 000..e1e77d3 --- /dev/null +++ b/net/sched/sch_frag.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +#include +#include +#include +#include +#include + +struct sch_frag_data { + unsigned long dst; + struct qdisc_skb_cb cb; + __be16 inner_protocol; + u16 vlan_tci; +
Re: [PATCH net-next] net/sched: act_ct: enable stats for HW offloaded entries
On 11/27/2020 2:40 AM, Marcelo Ricardo Leitner wrote: > By setting NF_FLOWTABLE_COUNTER. Otherwise, the updates added by > commit ef803b3cf96a ("netfilter: flowtable: add counter support in HW > offload") are not effective when using act_ct. > > While at it, now that we have the flag set, protect the call to > nf_ct_acct_update() by commit beb97d3a3192 ("net/sched: act_ct: update > nf_conn_acct for act_ct SW offload in flowtable") with the check on > NF_FLOWTABLE_COUNTER, as also done on other places. > > Note that this shouldn't impact performance as these stats are only > enabled when net.netfilter.nf_conntrack_acct is enabled. > > Signed-off-by: Marcelo Ricardo Leitner > --- > net/sched/act_ct.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c > index > aba3cd85f284f3e49add31fe65e3b791f2386fa1..bb1ef3b8e77fb6fd6a74b88a65322baea2dc1ed5 > 100644 > --- a/net/sched/act_ct.c > +++ b/net/sched/act_ct.c > @@ -296,7 +296,8 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params > *params) > goto err_insert; > > ct_ft->nf_ft.type = &flowtable_ct; > - ct_ft->nf_ft.flags |= NF_FLOWTABLE_HW_OFFLOAD; > + ct_ft->nf_ft.flags |= NF_FLOWTABLE_HW_OFFLOAD | > + NF_FLOWTABLE_COUNTER; > err = nf_flow_table_init(&ct_ft->nf_ft); > if (err) > goto err_init; > @@ -540,7 +541,8 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params > *p, > flow_offload_refresh(nf_ft, flow); > nf_conntrack_get(&ct->ct_general); > nf_ct_set(skb, ct, ctinfo); > - nf_ct_acct_update(ct, dir, skb->len); > + if (nf_ft->flags & NF_FLOWTABLE_COUNTER) > + nf_ct_acct_update(ct, dir, skb->len); > > return true; > } Acked-by: wenxu BR wenxu
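As the commit message notes, these counters only tick when conntrack accounting is on; a quick way to verify (conntrack-tools assumed):

	# sysctl -w net.netfilter.nf_conntrack_acct=1
	# conntrack -L | head

With accounting enabled, offloaded entries should now carry the packets=/bytes= counters updated by the flowtable HW stats.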
Re: [PATCH v5 net-next 3/3] net/sched: act_frag: add implicit packet fragment support.
在 2020/11/9 21:24, Vlad Buslov 写道: > On Sun 08 Nov 2020 at 01:30, we...@ucloud.cn wrote: >> From: wenxu >> >> Currently kernel tc subsystem can do conntrack in act_ct. But when several >> fragment packets go through the act_ct, function tcf_ct_handle_fragments >> will defrag the packets to a big one. But the last action will redirect >> mirred to a device which maybe lead the reassembly big packet over the mtu >> of target device. >> >> This patch add support for a xmit hook to mirred, that gets executed before >> xmiting the packet. Then, when act_ct gets loaded, it configs that hook. >> The frag xmit hook maybe reused by other modules. >> >> Signed-off-by: wenxu >> --- >> v2: Fix the crash for act_frag module without load >> v3: modify the kconfig describe and put tcf_xmit_hook_is_enabled >> in the tcf_dev_queue_xmit, and xchg atomic for tcf_xmit_hook >> v4: using skb_protocol and fix line length exceeds 80 columns >> v5: no change >> >> include/net/act_api.h | 16 + >> net/sched/Kconfig | 13 >> net/sched/Makefile | 1 + >> net/sched/act_api.c| 51 +++ >> net/sched/act_ct.c | 7 +++ >> net/sched/act_frag.c | 164 >> + >> net/sched/act_mirred.c | 2 +- >> 7 files changed, 253 insertions(+), 1 deletion(-) >> create mode 100644 net/sched/act_frag.c >> >> diff --git a/include/net/act_api.h b/include/net/act_api.h >> index 8721492..403a618 100644 >> --- a/include/net/act_api.h >> +++ b/include/net/act_api.h >> @@ -239,6 +239,22 @@ int tcf_action_check_ctrlact(int action, struct >> tcf_proto *tp, >> struct netlink_ext_ack *newchain); >> struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, >> struct tcf_chain *newchain); >> + >> +int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff >> *skb)); >> +int tcf_set_xmit_hook(int (*xmit_hook)(struct sk_buff *skb, >> + int (*xmit)(struct sk_buff *skb))); >> +void tcf_clear_xmit_hook(void); >> + >> +#if IS_ENABLED(CONFIG_NET_ACT_FRAG) >> +int tcf_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff >> *skb)); >> +#else >> +static inline int tcf_frag_xmit_hook(struct sk_buff *skb, >> + int (*xmit)(struct sk_buff *skb)) >> +{ >> +return 0; >> +} >> +#endif >> + >> #endif /* CONFIG_NET_CLS_ACT */ >> >> static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes, >> diff --git a/net/sched/Kconfig b/net/sched/Kconfig >> index a3b37d8..9a240c7 100644 >> --- a/net/sched/Kconfig >> +++ b/net/sched/Kconfig >> @@ -974,9 +974,22 @@ config NET_ACT_TUNNEL_KEY >>To compile this code as a module, choose M here: the >>module will be called act_tunnel_key. >> >> +config NET_ACT_FRAG >> +tristate "Packet fragmentation" >> +depends on NET_CLS_ACT >> +help >> + Say Y here to allow fragmenting big packets when outputting >> + with the mirred action. >> + >> + If unsure, say N. >> + >> + To compile this code as a module, choose M here: the >> + module will be called act_frag. >> + > Just wondering, what is the motivation for putting the frag code into > standalone module? It doesn't implement usual act_* interface and is not > user-configurable. To me it looks like functionality that belongs to > act_api. Am I missing something? The fragment operation is an single L3 action. So we put in an single modules. Maybe it is not proper to put in the act_api directly. >> config NET_ACT_CT >> tristate "connection tracking tc action" >> depends on NET_CLS_ACT && NF_CONNTRACK && NF_NAT && NF_FLOW_TABLE >> +depends on NET_ACT_FRAG >> help >>Say Y here to allow sending the packets to conntrack module. 
>> >> diff --git a/net/sched/Makefile b/net/sched/Makefile >> index 66bbf9a..c146186 100644 >> --- a/net/sched/Makefile >> +++ b/net/sched/Makefile >> @@ -29,6 +29,7 @@ obj-$(CONFIG_NET_IFE_SKBMARK) += act_meta_mark.o >> obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o >> obj-$(CONFIG_NET_IFE_SKBTCINDEX)+= act_meta_skbtcindex.o >> obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o >> +obj-$(CONFIG_NET_ACT_FRAG) += act_frag.o >> obj-$(CO
[PATCH v6 net-next 0/3] net/sched: fix over mtu packet of defrag in
From: wenxu

Currently the kernel tc subsystem can do conntrack in act_ct. But when
several fragment packets go through act_ct, tcf_ct_handle_fragments
will defrag them into one big packet, and the last action may then
redirect (mirred) to a device whose MTU the reassembled packet exceeds.

The first patch fixes the missing init of qdisc_skb_cb->mru.

The second one refactors the handling of xmit in act_mirred and
prepares for the third one.

The last one adds implicit packet fragmentation support to fix the
over-MTU case after defrag in act_ct.

wenxu (3):
  net/sched: fix miss init the mru in qdisc_skb_cb
  net/sched: act_mirred: refactor the handle of xmit
  net/sched: act_frag: add implicit packet fragment support.

 include/net/act_api.h     |  16 +
 include/net/sch_generic.h |   5 --
 net/core/dev.c            |   2 +
 net/sched/Kconfig         |  13 
 net/sched/Makefile        |   1 +
 net/sched/act_api.c       |  47 +
 net/sched/act_ct.c        |   7 ++
 net/sched/act_frag.c      | 164 ++
 net/sched/act_mirred.c    |  21 --
 9 files changed, 265 insertions(+), 11 deletions(-)
 create mode 100644 net/sched/act_frag.c

-- 
1.8.3.1
[PATCH v6 net-next 1/3] net/sched: fix miss init the mru in qdisc_skb_cb
From: wenxu

The mru in the qdisc_skb_cb should be initialized to 0. Only
defragmented packets in act_ct will set the value.

Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Signed-off-by: wenxu
---
v5: new patch
v6: no change

 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 751e526..a40de66 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3865,6 +3865,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return skb;
 
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
+	qdisc_skb_cb(skb)->mru = 0;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
@@ -4950,6 +4951,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 	}
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
+	qdisc_skb_cb(skb)->mru = 0;
 	skb->tc_at_ingress = 1;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
-- 
1.8.3.1
[PATCH v6 net-next 2/3] net/sched: act_mirred: refactor the handle of xmit
From: wenxu This one is prepare for the next patch. Signed-off-by: wenxu --- v6: no change include/net/sch_generic.h | 5 - net/sched/act_mirred.c| 21 +++-- 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d8fd867..dd74f06 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -1281,9 +1281,4 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc, void mini_qdisc_pair_block_init(struct mini_Qdisc_pair *miniqp, struct tcf_block *block); -static inline int skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res) -{ - return res->ingress ? netif_receive_skb(skb) : dev_queue_xmit(skb); -} - #endif diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index e24b7e2..17d0095 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -205,6 +205,18 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, return err; } +static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb) +{ + int err; + + if (!want_ingress) + err = dev_queue_xmit(skb); + else + err = netif_receive_skb(skb); + + return err; +} + static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { @@ -287,18 +299,15 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, /* let's the caller reinsert the packet, if possible */ if (use_reinsert) { res->ingress = want_ingress; - if (skb_tc_reinsert(skb, res)) + err = tcf_mirred_forward(res->ingress, skb); + if (err) tcf_action_inc_overlimit_qstats(&m->common); __this_cpu_dec(mirred_rec_level); return TC_ACT_CONSUMED; } } - if (!want_ingress) - err = dev_queue_xmit(skb2); - else - err = netif_receive_skb(skb2); - + err = tcf_mirred_forward(want_ingress, skb2); if (err) { out: tcf_action_inc_overlimit_qstats(&m->common); -- 1.8.3.1
[PATCH v6 net-next 3/3] net/sched: act_frag: add implicit packet fragment support.
From: wenxu

Currently the kernel tc subsystem can do conntrack in act_ct. But when several fragmented packets go through act_ct, tcf_ct_handle_fragments() reassembles them into one big packet. If the last action then redirects via mirred to another device, the reassembled packet may exceed the MTU of the target device.

This patch adds support for an xmit hook to mirred, which gets executed before transmitting the packet. When act_ct gets loaded, it configures that hook. The frag xmit hook may be reused by other modules.

Signed-off-by: wenxu
---
v2: fix the crash when the act_frag module is not loaded
v3: reword the Kconfig description, move tcf_xmit_hook_is_enabled() into
    tcf_dev_queue_xmit(), and use an atomic xchg for tcf_xmit_hook
v4: use skb_protocol() and fix lines exceeding 80 columns
v5: no change
v6: protect tcf_xmit_hook with the RCU lock

 include/net/act_api.h  |  16 +++++
 net/sched/Kconfig      |  13 ++++
 net/sched/Makefile     |   1 +
 net/sched/act_api.c    |  47 +++++++++++
 net/sched/act_ct.c     |   7 ++
 net/sched/act_frag.c   | 164 ++++++++++++++++++++++++++++++++++++++
 net/sched/act_mirred.c |   2 +-
 7 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/act_frag.c

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8721492..403a618 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -239,6 +239,22 @@ int tcf_action_check_ctrlact(int action, struct tcf_proto *tp,
                              struct netlink_ext_ack *newchain);
 struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
                                          struct tcf_chain *newchain);
+
+int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
+int tcf_set_xmit_hook(int (*xmit_hook)(struct sk_buff *skb,
+                                       int (*xmit)(struct sk_buff *skb)));
+void tcf_clear_xmit_hook(void);
+
+#if IS_ENABLED(CONFIG_NET_ACT_FRAG)
+int tcf_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
+#else
+static inline int tcf_frag_xmit_hook(struct sk_buff *skb,
+                                     int (*xmit)(struct sk_buff *skb))
+{
+        return 0;
+}
+#endif
+
 #endif /* CONFIG_NET_CLS_ACT */
 
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d8..9a240c7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -974,9 +974,22 @@ config NET_ACT_TUNNEL_KEY
           To compile this code as a module, choose M here: the
           module will be called act_tunnel_key.
 
+config NET_ACT_FRAG
+        tristate "Packet fragmentation"
+        depends on NET_CLS_ACT
+        help
+          Say Y here to allow fragmenting big packets when outputting
+          with the mirred action.
+
+          If unsure, say N.
+
+          To compile this code as a module, choose M here: the
+          module will be called act_frag.
+
 config NET_ACT_CT
         tristate "connection tracking tc action"
         depends on NET_CLS_ACT && NF_CONNTRACK && NF_NAT && NF_FLOW_TABLE
+        depends on NET_ACT_FRAG
         help
           Say Y here to allow sending the packets to conntrack module.
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 66bbf9a..c146186 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_NET_IFE_SKBMARK)   += act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)   += act_meta_skbprio.o
 obj-$(CONFIG_NET_IFE_SKBTCINDEX)        += act_meta_skbtcindex.o
 obj-$(CONFIG_NET_ACT_TUNNEL_KEY)        += act_tunnel_key.o
+obj-$(CONFIG_NET_ACT_FRAG)      += act_frag.o
 obj-$(CONFIG_NET_ACT_CT)        += act_ct.o
 obj-$(CONFIG_NET_ACT_GATE)      += act_gate.o
 obj-$(CONFIG_NET_SCH_FIFO)      += sch_fifo.o
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f66417d..5b9aa3a 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -22,6 +22,53 @@
 #include
 #include
 
+static int (*tcf_xmit_hook)(struct sk_buff *skb,
+                            int (*xmit)(struct sk_buff *skb));
+static DEFINE_SPINLOCK(tcf_xmit_hook_lock);
+static u16 tcf_xmit_hook_count;
+
+int tcf_set_xmit_hook(int (*xmit_hook)(struct sk_buff *skb,
+                                       int (*xmit)(struct sk_buff *skb)))
+{
+        spin_lock(&tcf_xmit_hook_lock);
+        if (!tcf_xmit_hook_count) {
+                rcu_assign_pointer(tcf_xmit_hook, xmit_hook);
+        } else if (xmit_hook != tcf_xmit_hook) {
+                spin_unlock(&tcf_xmit_hook_lock);
+                return -EBUSY;
+        }
+
+        tcf_xmit_hook_count++;
+        spin_unlock(&tcf_xmit_hook_lock);
+
+        return 0;
+}
+EXPORT_SYMBOL_GPL(tcf_set_xmit_hook);
+
+void tcf_clear_xmit_hook(void)
+{
+        spin_lock(&tcf_xmit_hook_lock);
+        if (--tcf_xmit_hook_count == 0)
+                rcu_assign_pointer(tcf_xmit_hook, NULL);
+        spin_unlock(&tcf_xmit_hook_lock);
+
[PATCH v7 net-next 2/3] net/sched: act_mirred: refactor the handling of xmit
From: wenxu

This one prepares for the next patch.

Signed-off-by: wenxu
---
v7: no change

 include/net/sch_generic.h |  5 -----
 net/sched/act_mirred.c    | 21 +++++++++++++++------
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d8fd867..dd74f06 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1281,9 +1281,4 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 void mini_qdisc_pair_block_init(struct mini_Qdisc_pair *miniqp,
                                 struct tcf_block *block);
 
-static inline int skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res)
-{
-        return res->ingress ? netif_receive_skb(skb) : dev_queue_xmit(skb);
-}
-
 #endif
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index e24b7e2..17d0095 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -205,6 +205,18 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
         return err;
 }
 
+static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+{
+        int err;
+
+        if (!want_ingress)
+                err = dev_queue_xmit(skb);
+        else
+                err = netif_receive_skb(skb);
+
+        return err;
+}
+
 static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
                           struct tcf_result *res)
 {
@@ -287,18 +299,15 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
                 /* let's the caller reinsert the packet, if possible */
                 if (use_reinsert) {
                         res->ingress = want_ingress;
-                        if (skb_tc_reinsert(skb, res))
+                        err = tcf_mirred_forward(res->ingress, skb);
+                        if (err)
                                 tcf_action_inc_overlimit_qstats(&m->common);
                         __this_cpu_dec(mirred_rec_level);
                         return TC_ACT_CONSUMED;
                 }
         }
 
-        if (!want_ingress)
-                err = dev_queue_xmit(skb2);
-        else
-                err = netif_receive_skb(skb2);
-
+        err = tcf_mirred_forward(want_ingress, skb2);
         if (err) {
 out:
                 tcf_action_inc_overlimit_qstats(&m->common);
-- 
1.8.3.1
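To make the refactor concrete, here is a minimal user-space model of the new tcf_mirred_forward() helper. The reduced struct sk_buff and the two stub device functions are hypothetical stand-ins for the kernel's dev_queue_xmit() and netif_receive_skb(); only the dispatch logic mirrors the patch.

#include <stdbool.h>
#include <stdio.h>

struct sk_buff { int len; };    /* hypothetical reduced sk_buff */

static int dev_queue_xmit(struct sk_buff *skb)
{
        printf("egress: xmit %d bytes\n", skb->len);
        return 0;
}

static int netif_receive_skb(struct sk_buff *skb)
{
        printf("ingress: receive %d bytes\n", skb->len);
        return 0;
}

/* single dispatch point, as introduced by the patch */
static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
{
        if (!want_ingress)
                return dev_queue_xmit(skb);
        return netif_receive_skb(skb);
}

int main(void)
{
        struct sk_buff skb = { .len = 1500 };

        tcf_mirred_forward(false, &skb);        /* redirect to a device's egress */
        tcf_mirred_forward(true, &skb);         /* redirect to a device's ingress */
        return 0;
}

The point of the helper is that tcf_mirred_act() now has exactly one place that maps want_ingress to the delivery function, which the next patch can wrap with the xmit hook.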
[PATCH v7 net-next 3/3] net/sched: act_frag: add implicit packet fragment support
From: wenxu

Currently the kernel tc subsystem can do conntrack in act_ct. But when several fragmented packets go through act_ct, tcf_ct_handle_fragments() reassembles them into one big packet. If the last action then redirects via mirred to another device, the reassembled packet may exceed the MTU of the target device.

This patch adds support for an xmit hook to mirred, which gets executed before transmitting the packet. When act_ct gets loaded, it configures that hook. The frag xmit hook may be reused by other modules.

Signed-off-by: wenxu
---
v2: fix the crash when the act_frag module is not loaded
v3: reword the Kconfig description, move tcf_xmit_hook_is_enabled() into
    tcf_dev_queue_xmit(), and use an atomic xchg for tcf_xmit_hook
v4: use skb_protocol() and fix lines exceeding 80 columns
v5: no change
v6: protect tcf_xmit_hook with the RCU lock
v7: add the missing __rcu annotation for tcf_xmit_hook

 include/net/act_api.h  |  16 +++++
 net/sched/Kconfig      |  13 ++++
 net/sched/Makefile     |   1 +
 net/sched/act_api.c    |  47 +++++++++++
 net/sched/act_ct.c     |   7 ++
 net/sched/act_frag.c   | 164 ++++++++++++++++++++++++++++++++++++++
 net/sched/act_mirred.c |   2 +-
 7 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/act_frag.c

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8721492..403a618 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -239,6 +239,22 @@ int tcf_action_check_ctrlact(int action, struct tcf_proto *tp,
                              struct netlink_ext_ack *newchain);
 struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
                                          struct tcf_chain *newchain);
+
+int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
+int tcf_set_xmit_hook(int (*xmit_hook)(struct sk_buff *skb,
+                                       int (*xmit)(struct sk_buff *skb)));
+void tcf_clear_xmit_hook(void);
+
+#if IS_ENABLED(CONFIG_NET_ACT_FRAG)
+int tcf_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
+#else
+static inline int tcf_frag_xmit_hook(struct sk_buff *skb,
+                                     int (*xmit)(struct sk_buff *skb))
+{
+        return 0;
+}
+#endif
+
 #endif /* CONFIG_NET_CLS_ACT */
 
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d8..9a240c7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -974,9 +974,22 @@ config NET_ACT_TUNNEL_KEY
           To compile this code as a module, choose M here: the
           module will be called act_tunnel_key.
 
+config NET_ACT_FRAG
+        tristate "Packet fragmentation"
+        depends on NET_CLS_ACT
+        help
+          Say Y here to allow fragmenting big packets when outputting
+          with the mirred action.
+
+          If unsure, say N.
+
+          To compile this code as a module, choose M here: the
+          module will be called act_frag.
+
 config NET_ACT_CT
         tristate "connection tracking tc action"
         depends on NET_CLS_ACT && NF_CONNTRACK && NF_NAT && NF_FLOW_TABLE
+        depends on NET_ACT_FRAG
         help
           Say Y here to allow sending the packets to conntrack module.
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 66bbf9a..c146186 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_NET_IFE_SKBMARK)   += act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)   += act_meta_skbprio.o
 obj-$(CONFIG_NET_IFE_SKBTCINDEX)        += act_meta_skbtcindex.o
 obj-$(CONFIG_NET_ACT_TUNNEL_KEY)        += act_tunnel_key.o
+obj-$(CONFIG_NET_ACT_FRAG)      += act_frag.o
 obj-$(CONFIG_NET_ACT_CT)        += act_ct.o
 obj-$(CONFIG_NET_ACT_GATE)      += act_gate.o
 obj-$(CONFIG_NET_SCH_FIFO)      += sch_fifo.o
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f66417d..8a8a6a5 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -22,6 +22,53 @@
 #include
 #include
 
+static int (__rcu *tcf_xmit_hook)(struct sk_buff *skb,
+                                  int (*xmit)(struct sk_buff *skb));
+static DEFINE_SPINLOCK(tcf_xmit_hook_lock);
+static u16 tcf_xmit_hook_count;
+
+int tcf_set_xmit_hook(int (*xmit_hook)(struct sk_buff *skb,
+                                       int (*xmit)(struct sk_buff *skb)))
+{
+        spin_lock(&tcf_xmit_hook_lock);
+        if (!tcf_xmit_hook_count) {
+                rcu_assign_pointer(tcf_xmit_hook, xmit_hook);
+        } else if (xmit_hook != tcf_xmit_hook) {
+                spin_unlock(&tcf_xmit_hook_lock);
+                return -EBUSY;
+        }
+
+        tcf_xmit_hook_count++;
+        spin_unlock(&tcf_xmit_hook_lock);
+
+        return 0;
+}
+EXPORT_SYMBOL_GPL(tcf_set_xmit_hook);
+
+void tcf_clear_xmit_hook(void)
+{
+        spin_lock(&tcf_xmit_hook_lock);
+        if (--tcf_xmit_hook_count == 0)
+                rcu_assign_pointer(tcf_xmit_hook, NULL);
+
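The quoted patch is cut off before the body of tcf_dev_queue_xmit(), so the following user-space sketch only models how the declared API plausibly fits together: a plain pointer read stands in for rcu_read_lock()/rcu_dereference(), and toy_frag_hook() is an invented example of the kind of hook act_frag would register, not the real tcf_frag_xmit_hook().

#include <stdio.h>
#include <stddef.h>

struct sk_buff { size_t len; size_t mtu; };    /* hypothetical reduced sk_buff */

typedef int (*xmit_fn)(struct sk_buff *skb);
typedef int (*xmit_hook_fn)(struct sk_buff *skb, xmit_fn xmit);

static xmit_hook_fn tcf_xmit_hook;    /* __rcu in the kernel */

static int tcf_dev_queue_xmit(struct sk_buff *skb, xmit_fn xmit)
{
        xmit_hook_fn hook = tcf_xmit_hook;    /* rcu_dereference() in the kernel */

        if (!hook)
                return xmit(skb);    /* fast path: no hook registered */
        return hook(skb, xmit);      /* e.g. fragment, then xmit each part */
}

static int fake_dev_xmit(struct sk_buff *skb)
{
        printf("xmit %zu bytes\n", skb->len);
        return 0;
}

static int toy_frag_hook(struct sk_buff *skb, xmit_fn xmit)
{
        /* split an over-MTU packet into MTU-sized pieces */
        while (skb->len > skb->mtu) {
                struct sk_buff part = { .len = skb->mtu, .mtu = skb->mtu };
                int err = xmit(&part);

                if (err)
                        return err;
                skb->len -= skb->mtu;
        }
        return xmit(skb);
}

int main(void)
{
        struct sk_buff big = { .len = 3000, .mtu = 1500 };

        tcf_xmit_hook = toy_frag_hook;    /* what tcf_set_xmit_hook() arranges */
        return tcf_dev_queue_xmit(&big, fake_dev_xmit);
}

Under these assumptions an over-MTU packet is split before reaching the device, while the no-hook case degenerates to a direct xmit() call.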
[PATCH v7 net-next 0/3] net/sched: fix over-MTU packets of defrag in act_ct
From: wenxu

Currently the kernel tc subsystem can do conntrack in act_ct. But when several fragmented packets go through act_ct, tcf_ct_handle_fragments() reassembles them into one big packet. If the last action then redirects via mirred to another device, the reassembled packet may exceed the MTU of the target device.

The first patch fixes the missed initialization of qdisc_skb_cb->mru.

The second one refactors the handling of xmit in act_mirred and prepares for the third one.

The last one adds implicit packet fragmentation support to fix the over-MTU packets left by defrag in act_ct.

wenxu (3):
  net/sched: fix the missed init of the mru in qdisc_skb_cb
  net/sched: act_mirred: refactor the handling of xmit
  net/sched: act_frag: add implicit packet fragment support

 include/net/act_api.h     |  16 +++++
 include/net/sch_generic.h |   5 --
 net/core/dev.c            |   2 +
 net/sched/Kconfig         |  13 ++++
 net/sched/Makefile        |   1 +
 net/sched/act_api.c       |  47 ++++++++++
 net/sched/act_ct.c        |   7 ++
 net/sched/act_frag.c      | 164 ++++++++++++++++++++++++++++++++++
 net/sched/act_mirred.c    |  21 ++++---
 9 files changed, 265 insertions(+), 11 deletions(-)
 create mode 100644 net/sched/act_frag.c

-- 
1.8.3.1
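As a closing illustration, the registration scheme from the quoted act_api.c hunk (the first module installs the hook, later users only take a reference, and a conflicting hook is refused with -EBUSY) can be modeled in user space as below. The pthread mutex stands in for the spinlock, a plain store for rcu_assign_pointer(), and hook_a()/hook_b() are made-up hooks for the demo.

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

typedef int (*xmit_hook_fn)(void);    /* signature reduced for the sketch */

static xmit_hook_fn tcf_xmit_hook;
static pthread_mutex_t tcf_xmit_hook_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned short tcf_xmit_hook_count;

/* first caller installs the hook, later callers only bump the refcount,
 * and a different hook is refused, mirroring the quoted kernel logic */
static int tcf_set_xmit_hook(xmit_hook_fn hook)
{
        int err = 0;

        pthread_mutex_lock(&tcf_xmit_hook_lock);
        if (!tcf_xmit_hook_count)
                tcf_xmit_hook = hook;    /* rcu_assign_pointer() in the kernel */
        else if (hook != tcf_xmit_hook)
                err = -EBUSY;
        if (!err)
                tcf_xmit_hook_count++;
        pthread_mutex_unlock(&tcf_xmit_hook_lock);

        return err;
}

static void tcf_clear_xmit_hook(void)
{
        pthread_mutex_lock(&tcf_xmit_hook_lock);
        if (--tcf_xmit_hook_count == 0)
                tcf_xmit_hook = NULL;    /* last user drops the hook */
        pthread_mutex_unlock(&tcf_xmit_hook_lock);
}

static int hook_a(void) { return 0; }
static int hook_b(void) { return 0; }

int main(void)
{
        printf("set hook_a: %d\n", tcf_set_xmit_hook(hook_a));  /* 0 */
        printf("set hook_a: %d\n", tcf_set_xmit_hook(hook_a));  /* 0, refcount 2 */
        printf("set hook_b: %d\n", tcf_set_xmit_hook(hook_b));  /* -EBUSY */
        tcf_clear_xmit_hook();
        tcf_clear_xmit_hook();    /* refcount hits 0, hook cleared */
        return 0;
}

This refcounted design lets several actions (or several instances of act_ct) share one global hook without module load-order problems, while still preventing two different modules from fighting over it.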