Re: [net-next V3 6/8] net: sched: convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Signed-off-by: Romain Perier
Signed-off-by: Allen Pais
---
 net/sched/sch_atm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 1c281cc81f57..390d972bb2f0 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -466,10 +466,10 @@ drop: __maybe_unused
  * non-ATM interfaces.
  */

-static void sch_atm_dequeue(unsigned long data)
+static void sch_atm_dequeue(struct tasklet_struct *t)
 {
-	struct Qdisc *sch = (struct Qdisc *)data;
-	struct atm_qdisc_data *p = qdisc_priv(sch);
+	struct atm_qdisc_data *p = from_tasklet(p, t, task);
+	struct Qdisc *sch = (struct Qdisc *)((char *)p - sizeof(struct Qdisc));

Hmm... I think I prefer not burying implementation details in
net/sched/sch_atm.c and instead define a helper in include/net/pkt_sched.h

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 4ed32e6b020145afb015c3c07d2ec3a613f1311d..15b1b30f454e4837cd1fc07bb3ff6b4f178b1d39 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -24,6 +24,11 @@ static inline void *qdisc_priv(struct Qdisc *q)
 	return &q->privdata;
 }

+static inline struct Qdisc *qdisc_from_priv(void *priv)
+{
+	return container_of(priv, struct Qdisc, privdata);
+}
+
 /* Timer resolution MUST BE < 10% of min_schedulable_packet_size/bandwidth

Sure, I will have it updated and resent.

Thanks.
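For readers following the qdisc_from_priv() suggestion, here is a minimal
userspace sketch of the idea. It is not the kernel code: struct fake_qdisc,
its fields, and the fake_* helpers are hypothetical stand-ins, and the
container_of() below is a simplified macro without the kernel's type checks.
It only illustrates why going from the private area back to the enclosing
object via the member offset is more robust than subtracting
sizeof(struct Qdisc) by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for struct Qdisc. */
struct fake_qdisc {
	unsigned int flags;
	unsigned int limit;
	/* Private area that follows the generic fields. */
	long privdata[4];
};

/* Mirrors qdisc_priv(): generic object -> private area. */
static inline void *fake_qdisc_priv(struct fake_qdisc *q)
{
	return &q->privdata;
}

/* Mirrors the proposed qdisc_from_priv(): private area -> generic object. */
static inline struct fake_qdisc *fake_qdisc_from_priv(void *priv)
{
	return container_of(priv, struct fake_qdisc, privdata);
}

/* Returns 1 when the two helpers are inverses of each other for q. */
static int fake_qdisc_round_trip_ok(struct fake_qdisc *q)
{
	return fake_qdisc_from_priv(fake_qdisc_priv(q)) == q;
}
```

Because the offset comes from offsetof(), the helper keeps working if padding
or alignment is ever added before the private area, which a hard-coded
sizeof() subtraction would not.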
[PATCH net v2] net: openvswitch: silence suspicious RCU usage warning
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().

In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.

Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e85780...@syzkaller.appspotmail.com
Signed-off-by: Eelco Chaudron
---
v2: - Moved local variable initialization above lock
    - Renamed jump label to indicate unlocking

 net/openvswitch/datapath.c   | 14 +++++++-------
 net/openvswitch/flow_table.c |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 832f898edb6a..9d6ef6cb9b26 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1703,13 +1703,13 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];

-	err = ovs_dp_change(dp, a);
-	if (err)
-		goto err_destroy_meters;
-
 	/* So far only local changes have been made, now need the lock. */
 	ovs_lock();

+	err = ovs_dp_change(dp, a);
+	if (err)
+		goto err_unlock_and_destroy_meters;
+
 	vport = new_vport(&parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
@@ -1725,8 +1725,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 			ovs_dp_reset_user_features(skb, info);
 		}

-		ovs_unlock();
-		goto err_destroy_meters;
+		goto err_unlock_and_destroy_meters;
 	}

 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
@@ -1741,7 +1740,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	ovs_notify(&dp_datapath_genl_family, reply, info);
 	return 0;

-err_destroy_meters:
+err_unlock_and_destroy_meters:
+	ovs_unlock();
 	ovs_meters_exit(dp);
 err_destroy_ports:
 	kfree(dp->ports);
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index f3486a37361a..c89c8da99f1a 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -390,7 +390,7 @@ static struct mask_cache *tbl_mask_cache_alloc(u32 size)
 }
 int ovs_flow_tbl_masks_cache_resize(struct flow_table *table, u32 size)
 {
-	struct mask_cache *mc = rcu_dereference(table->mask_cache);
+	struct mask_cache *mc = rcu_dereference_ovsl(table->mask_cache);
 	struct mask_cache *new;

 	if (size == mc->cache_size)
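The control-flow change in ovs_dp_cmd_new() can be pictured with a small
userspace sketch. Everything here is hypothetical (the demo_* names, the
counter used as a "lock", the failure flags); it only shows the shape of the
fix: every step that touches shared state runs after the lock is taken, and
every failure after that point leaves through one label that unlocks first.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock: a depth counter, so an imbalance is observable. */
static int demo_lock_depth;
/* Counts how often the shared error path ran. */
static int demo_cleanups;

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/* Stand-in for a step that must run under the lock (cf. ovs_dp_change()). */
static int demo_configure(int fail)
{
	assert(demo_lock_depth == 1);	/* caller must hold the lock */
	return fail ? -EINVAL : 0;
}

/* Stand-in for a later locked step (cf. new_vport()). */
static int demo_create_port(int fail)
{
	assert(demo_lock_depth == 1);
	return fail ? -ENOMEM : 0;
}

static int demo_new(int fail_configure, int fail_port)
{
	int err;

	/* Only local setup happened so far; take the lock for shared state. */
	demo_lock();

	err = demo_configure(fail_configure);
	if (err)
		goto err_unlock_and_cleanup;

	err = demo_create_port(fail_port);
	if (err)
		goto err_unlock_and_cleanup;

	demo_unlock();
	return 0;

err_unlock_and_cleanup:
	/* Single exit for every failure after the lock was taken. */
	demo_unlock();
	demo_cleanups++;
	return err;
}
```

Folding the unlock into the label is what makes it safe to add further
locked steps later without having to remember a separate unlock at each
error site.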
Re: [PATCH net] net: openvswitch: silence suspicious RCU usage warning
On 2 Nov 2020, at 20:51, Jakub Kicinski wrote:

> On Mon, 02 Nov 2020 09:52:19 +0100 Eelco Chaudron wrote:
>> On 30 Oct 2020, at 22:28, Jakub Kicinski wrote:
>>>> @@ -1695,6 +1695,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>>>  	if (err)
>>>>  		goto err_destroy_ports;
>>>>
>>>> +	/* So far only local changes have been made, now need the lock. */
>>>> +	ovs_lock();
>>>
>>> Should we move the lock below assignments to param?
>>>
>>> Looks a little strange to protect stack variables with a global lock.
>>
>> You are right, I should have moved it down after the assignment. I will
>> send out a v2.
>>
>>> Let's update the name of the label.
>>
>> Guess now it is, unlock and destroy meters, so what label are you
>> looking for?
>>
>> err_unlock_and_destroy_meters: which looks a bit long, or just
>> err_unlock:
>
> I feel like I saw some names like err_unlock_and_destroy_meters in OvS
> code, but can't find them in this file right now.
>
> I'd personally go for just err_unlock, or maybe err_unlock_ovs as is
> used in other functions in this file.
>
> But as long as it starts with err_unlock it's fine by me :)

Ack, sent out a v2.
Re: [PATCHv3 iproute2-next 0/5] iproute2: add libbpf support
On Mon, 2 Nov 2020 22:58:06 -0800, Andrii Nakryiko wrote:
> But I don't think I got a real answer as to what's the exact reason
> against the submodule. Like what "inappropriate" even means in this
> case? Jesper's security argument so far was the only objective
> criterion, as far as I can tell.

It's the fundamental objection. Distributions in general have a "no
bundled libraries" policy. It is sometimes annoying, but it helps to
understand that the policy is not a whim of distros; it comes from years
of experience with package maintenance for security and stability.

> But I also see that using libbpf through submodule gives iproute2
> exact control over which version of libbpf is being used. And that
> does not depend at all on any specific Linux distribution, its
> version, LTS vs non-LTS, etc. iproute2 will just work the same across
> all of them. So it matches your stated goals very directly and
> explicitly.

If you take this route, the end result would be all dependencies for all
projects being included as submodules and bundled. At first sight, this
sounds easier for the developers. Why bother with dynamic linking at all?
Everything can be linked statically.

The result would be a nightmare for both distros and users. No timely
security updates possible, critical bugs not being fixed in some programs,
etc. There is enough experience with this kind of setup to conclude it is
not the right way to go.

Yes, dynamic linking is initially more work for developers of both apps
and libraries. However, it pays off over time: there's no need to keep
track of security and other important fixes in the dependencies, since
they come for free from the distro work.

Btw, taking the bundling to the extreme, every app could bundle its own
well tested and compatible kernel version and be run in a VM. This might
sound far-fetched but there were actual attempts to do that. It didn't
take off; I think part of the reason was that the Linux kernel is very
good at keeping its APIs stable.

And I'm convinced this is the way to go for libraries, too: put an
emphasis on API stability. Make it easy to get consumed and updated under
the hood. Everybody wins this way.

 Jiri
Re: [PATCH 41/41] realtek: rtw88: pci: Add prototypes for .probe, .remove and .shutdown
On Mon, 02 Nov 2020, Brian Norris wrote:

> On Mon, Nov 2, 2020 at 3:25 AM Lee Jones wrote:
> > --- a/drivers/net/wireless/realtek/rtw88/pci.h
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.h
> > @@ -212,6 +212,10 @@ struct rtw_pci {
> >  	void __iomem *mmap;
> >  };
> >
> > +int rtw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id);
> > +void rtw_pci_remove(struct pci_dev *pdev);
> > +void rtw_pci_shutdown(struct pci_dev *pdev);
> > +
>
> These definitions are already in 4 other header files:
>
> drivers/net/wireless/realtek/rtw88/rtw8723de.h
> drivers/net/wireless/realtek/rtw88/rtw8821ce.h
> drivers/net/wireless/realtek/rtw88/rtw8822be.h
> drivers/net/wireless/realtek/rtw88/rtw8822ce.h
>
> Seems like you should be moving them, not just adding yet another duplicate.

I followed the current convention.

Happy to optimise if that's what is required.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCHv3 iproute2-next 0/5] iproute2: add libbpf support
On 11/3/20 7:58 AM, Andrii Nakryiko wrote:
> On Mon, Nov 2, 2020 at 7:47 AM David Ahern wrote:
>> On 10/29/20 9:11 AM, Hangbin Liu wrote:
>>> This series converts iproute2 to use libbpf for loading and attaching
>>> BPF programs when it is available. This means that iproute2 will
>>> correctly process BTF information and support the new-style BTF-defined
>>> maps, while keeping compatibility with the old internal map definition
>>> syntax.
>>>
>>> This is achieved by checking for libbpf at './configure' time, and using
>>> it if available. By default the system libbpf will be used, but static
>>> linking against a custom libbpf version can be achieved by passing
>>> LIBBPF_DIR to configure. FORCE_LIBBPF can be set to force configure to
>>> abort if no suitable libbpf is found (useful for automatic packaging
>>> that wants to enforce the dependency).
>>>
>>> The old iproute2 bpf code is kept and will be used if no suitable libbpf
>>> is available. When using libbpf, wrapper code ensures that iproute2 will
>>> still understand the old map definition format, including populating
>>> map-in-map and tail call maps before load.
>>>
>>> The examples in bpf/examples are kept, and a separate set of examples
>>> is added with BTF-based map definitions for those examples where this
>>> is possible (libbpf doesn't currently support declaratively populating
>>> tail call maps).
>>>
>>> Finally, thanks a lot to Toke for his help on this patch set.
>>
>> In regards to comments from v2 of the series:
>>
>> iproute2 is a stable, production package that requires minimal support
>> from external libraries. The external packages it does require are also
>> stable with few to no relevant changes.
>>
>> bpf and libbpf on the other hand are under active development and
>> rapidly changing month over month. The git submodule approach has its
>> conveniences for rapid development but is inappropriate for a package
>> like iproute2 and will not be considered.

I thought last time this discussion came up there was consensus that the
submodule could be an explicit opt-in for the configure script at least?
[PATCH v8 0/4] Add support for mv88e6393x family of Marvell
Updated patchset with the following changes:
 - Add kerneldoc for 5GBASER phy interface
 - Remove lane param initialization wherever it is not needed.

Pavana Sharma (4):
  dt-bindings: net: Add 5GBASER phy interface mode
  net: phy: Add 5GBASER interface mode
  net: dsa: mv88e6xxx: Change serdes lane parameter from u8 type to int
  net: dsa: mv88e6xxx: Add support for mv88e6393x family of Marvell

 .../bindings/net/ethernet-controller.yaml |   2 +
 drivers/net/dsa/mv88e6xxx/chip.c          | 164 +-
 drivers/net/dsa/mv88e6xxx/chip.h          |  20 +-
 drivers/net/dsa/mv88e6xxx/global1.h       |   2 +
 drivers/net/dsa/mv88e6xxx/global2.h       |   8 +
 drivers/net/dsa/mv88e6xxx/port.c          | 240 +-
 drivers/net/dsa/mv88e6xxx/port.h          |  43 ++-
 drivers/net/dsa/mv88e6xxx/serdes.c        | 295 +++---
 drivers/net/dsa/mv88e6xxx/serdes.h        |  91 --
 include/linux/phy.h                       |   5 +
 10 files changed, 781 insertions(+), 89 deletions(-)

-- 
2.17.1
[PATCH v8 1/4] dt-bindings: net: Add 5GBASER phy interface mode
Add 5gbase-r PHY interface mode.

Signed-off-by: Pavana Sharma
---
 Documentation/devicetree/bindings/net/ethernet-controller.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
index fdf709817218..aa6ae7851de9 100644
--- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
@@ -89,6 +89,8 @@ properties:
       - trgmii
       - 1000base-x
       - 2500base-x
+      # 5GBASE-R
+      - 5gbase-r
       - rxaui
       - xaui
-- 
2.17.1
[PATCH v8 3/4] net: dsa: mv88e6xxx: Change serdes lane parameter from u8 type to int
Returning 0 is no more an error case with MV88E6393 family which has serdes lane numbers 0, 9 or 10. So with this change .serdes_get_lane will return lane number or error (-ENODEV). Signed-off-by: Pavana Sharma --- drivers/net/dsa/mv88e6xxx/chip.c | 28 +-- drivers/net/dsa/mv88e6xxx/chip.h | 16 +++ drivers/net/dsa/mv88e6xxx/port.c | 6 +-- drivers/net/dsa/mv88e6xxx/serdes.c | 74 +++--- drivers/net/dsa/mv88e6xxx/serdes.h | 50 ++-- 5 files changed, 87 insertions(+), 87 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index bd297ae7cf9e..d32731a7c658 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -485,12 +485,12 @@ static int mv88e6xxx_serdes_pcs_get_state(struct dsa_switch *ds, int port, struct phylink_link_state *state) { struct mv88e6xxx_chip *chip = ds->priv; - u8 lane; + int lane; int err; mv88e6xxx_reg_lock(chip); lane = mv88e6xxx_serdes_get_lane(chip, port); - if (lane && chip->info->ops->serdes_pcs_get_state) + if ((lane >= 0) && chip->info->ops->serdes_pcs_get_state) err = chip->info->ops->serdes_pcs_get_state(chip, port, lane, state); else @@ -506,11 +506,11 @@ static int mv88e6xxx_serdes_pcs_config(struct mv88e6xxx_chip *chip, int port, const unsigned long *advertise) { const struct mv88e6xxx_ops *ops = chip->info->ops; - u8 lane; + int lane; if (ops->serdes_pcs_config) { lane = mv88e6xxx_serdes_get_lane(chip, port); - if (lane) + if (lane >= 0) return ops->serdes_pcs_config(chip, port, lane, mode, interface, advertise); } @@ -522,15 +522,15 @@ static void mv88e6xxx_serdes_pcs_an_restart(struct dsa_switch *ds, int port) { struct mv88e6xxx_chip *chip = ds->priv; const struct mv88e6xxx_ops *ops; + int lane; int err = 0; - u8 lane; ops = chip->info->ops; if (ops->serdes_pcs_an_restart) { mv88e6xxx_reg_lock(chip); lane = mv88e6xxx_serdes_get_lane(chip, port); - if (lane) + if (lane >= 0) err = ops->serdes_pcs_an_restart(chip, port, lane); mv88e6xxx_reg_unlock(chip); @@ -544,11 
+544,11 @@ static int mv88e6xxx_serdes_pcs_link_up(struct mv88e6xxx_chip *chip, int port, int speed, int duplex) { const struct mv88e6xxx_ops *ops = chip->info->ops; - u8 lane; + int lane; if (!phylink_autoneg_inband(mode) && ops->serdes_pcs_link_up) { lane = mv88e6xxx_serdes_get_lane(chip, port); - if (lane) + if (lane >= 0) return ops->serdes_pcs_link_up(chip, port, lane, speed, duplex); } @@ -2422,11 +2422,11 @@ static irqreturn_t mv88e6xxx_serdes_irq_thread_fn(int irq, void *dev_id) struct mv88e6xxx_chip *chip = mvp->chip; irqreturn_t ret = IRQ_NONE; int port = mvp->port; - u8 lane; + int lane; mv88e6xxx_reg_lock(chip); lane = mv88e6xxx_serdes_get_lane(chip, port); - if (lane) + if (lane >= 0) ret = mv88e6xxx_serdes_irq_status(chip, port, lane); mv88e6xxx_reg_unlock(chip); @@ -2434,7 +2434,7 @@ static irqreturn_t mv88e6xxx_serdes_irq_thread_fn(int irq, void *dev_id) } static int mv88e6xxx_serdes_irq_request(struct mv88e6xxx_chip *chip, int port, - u8 lane) + int lane) { struct mv88e6xxx_port *dev_id = &chip->ports[port]; unsigned int irq; @@ -2463,7 +2463,7 @@ static int mv88e6xxx_serdes_irq_request(struct mv88e6xxx_chip *chip, int port, } static int mv88e6xxx_serdes_irq_free(struct mv88e6xxx_chip *chip, int port, -u8 lane) +int lane) { struct mv88e6xxx_port *dev_id = &chip->ports[port]; unsigned int irq = dev_id->serdes_irq; @@ -2488,11 +2488,11 @@ static int mv88e6xxx_serdes_irq_free(struct mv88e6xxx_chip *chip, int port, static int mv88e6xxx_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on) { - u8 lane; + int lane; int err; lane = mv88e6xxx_serdes_get_lane(chip, port); - if (!lane) + if (lane < 0) return 0; if (on) { diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h index 81c244fc0419..d81f586d67e8 100644 --- a/dri
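The motivation for moving from u8 to int can be sketched in a few lines of
standalone C. The demo_* functions are hypothetical, and the mapping used
here (ports 0, 9 and 10 own a lane whose number equals the port) is an
assumption made for illustration; the patch description only says that lane
numbers 0, 9 or 10 occur on this family. The point is that once 0 is a valid
lane, "no lane" has to be signalled out of band, here with a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Old convention: an unsigned lane where 0 doubles as "no lane". */
static int demo_old_has_lane(uint8_t lane)
{
	return lane != 0;
}

/* New convention: any non-negative value is a lane, negative is an error. */
static int demo_new_has_lane(int lane)
{
	return lane >= 0;
}

/* Hypothetical lookup; assumes lane number == port for ports 0, 9, 10. */
static int demo_serdes_get_lane(int port)
{
	switch (port) {
	case 0:
	case 9:
	case 10:
		return port;	/* lane 0 is a legitimate answer */
	default:
		return -ENODEV;	/* this port has no SERDES lane */
	}
}

/* Returns 1 if the lane would be powered, 0 if the port has no lane. */
static int demo_serdes_power(int port)
{
	int lane = demo_serdes_get_lane(port);

	if (lane < 0)
		return 0;
	return 1;
}
```

With the old test, port 0 in this sketch would be silently skipped even
though it has a lane, which is the case the conversion is meant to avoid.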
[PATCH v8 2/4] net: phy: Add 5GBASER interface mode
Add 5GBASE-R phy interface mode.

Signed-off-by: Pavana Sharma
---
 include/linux/phy.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index eb3cb1a98b45..71e280059ec5 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -106,6 +106,7 @@ extern const int phy_10gbit_features_array[1];
  * @PHY_INTERFACE_MODE_TRGMII: Turbo RGMII
  * @PHY_INTERFACE_MODE_1000BASEX: 1000 BaseX
  * @PHY_INTERFACE_MODE_2500BASEX: 2500 BaseX
+ * @PHY_INTERFACE_MODE_5GBASER: 5G BaseR
  * @PHY_INTERFACE_MODE_RXAUI: Reduced XAUI
  * @PHY_INTERFACE_MODE_XAUI: 10 Gigabit Attachment Unit Interface
  * @PHY_INTERFACE_MODE_10GBASER: 10G BaseR
@@ -137,6 +138,8 @@ typedef enum {
 	PHY_INTERFACE_MODE_TRGMII,
 	PHY_INTERFACE_MODE_1000BASEX,
 	PHY_INTERFACE_MODE_2500BASEX,
+	/* 5GBASE-R mode */
+	PHY_INTERFACE_MODE_5GBASER,
 	PHY_INTERFACE_MODE_RXAUI,
 	PHY_INTERFACE_MODE_XAUI,
 	/* 10GBASE-R, XFI, SFI - single lane 10G Serdes */
@@ -215,6 +218,8 @@ static inline const char *phy_modes(phy_interface_t interface)
 		return "1000base-x";
 	case PHY_INTERFACE_MODE_2500BASEX:
 		return "2500base-x";
+	case PHY_INTERFACE_MODE_5GBASER:
+		return "5gbase-r";
 	case PHY_INTERFACE_MODE_RXAUI:
 		return "rxaui";
 	case PHY_INTERFACE_MODE_XAUI:
-- 
2.17.1
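As a rough illustration of why the enum value and the "5gbase-r" string are
added together, the sketch below pairs a reduced enum with a name table and
a reverse lookup. The demo_* names and the reverse lookup are hypothetical
and are not the kernel's implementation; the sketch only shows that the
string returned for the new mode is the same token the binding accepts, so
a lookup by name can round-trip.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced version of the interface-mode enum. */
typedef enum {
	DEMO_MODE_NA,
	DEMO_MODE_2500BASEX,
	DEMO_MODE_5GBASER,
	DEMO_MODE_RXAUI,
	DEMO_MODE_MAX,
} demo_interface_t;

/* Enum -> name, in the style of phy_modes(). */
static const char *demo_modes(demo_interface_t mode)
{
	switch (mode) {
	case DEMO_MODE_NA:
		return "";
	case DEMO_MODE_2500BASEX:
		return "2500base-x";
	case DEMO_MODE_5GBASER:
		return "5gbase-r";
	case DEMO_MODE_RXAUI:
		return "rxaui";
	default:
		return "unknown";
	}
}

/* Name -> enum by scanning the table; DEMO_MODE_NA means "not found". */
static demo_interface_t demo_mode_from_string(const char *name)
{
	int i;

	for (i = DEMO_MODE_NA + 1; i < DEMO_MODE_MAX; i++)
		if (!strcmp(name, demo_modes((demo_interface_t)i)))
			return (demo_interface_t)i;

	return DEMO_MODE_NA;
}
```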
Re: [PATCH 05/41] rtl8192cu: trx: Demote clear abuse of kernel-doc format
On Mon, 02 Nov 2020, Larry Finger wrote:

> On 11/2/20 5:23 AM, Lee Jones wrote:
> > Fixes the following W=1 kernel build warning(s):
> >
> >  drivers/net/wireless/realtek/rtlwifi/rtl8192cu/trx.c:455: warning:
> >  Function parameter or member 'txdesc' not described in
> >  '_rtl_tx_desc_checksum'
> >
> > Cc: Ping-Ke Shih
> > Cc: Kalle Valo
> > Cc: "David S. Miller"
> > Cc: Jakub Kicinski
> > Cc: Larry Finger
> > Cc: linux-wirel...@vger.kernel.org
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Lee Jones
> > ---
> >  drivers/net/wireless/realtek/rtlwifi/rtl8192cu/trx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/trx.c
> > b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/trx.c
> > index 1ad0cf37f60bb..87f959d5d861d 100644
> > --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/trx.c
> > +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/trx.c
> > @@ -448,7 +448,7 @@ static void _rtl_fill_usb_tx_desc(__le32 *txdesc)
> >  	set_tx_desc_first_seg(txdesc, 1);
> >  }
> > -/**
> > +/*
> >   *For HW recovery information
> >   */
> >  static void _rtl_tx_desc_checksum(__le32 *txdesc)
> >
>
> Did you check this patch with checkpatch.pl?

Yes.

> I think you substituted one
> warning for another. The wireless-testing trees previously did not accept a
> bare "/*", which is why "/**" was present.

I don't see a problem.

$ git format-patch -n1 --stdout 8cd8b929e0458 | ./scripts/checkpatch.pl
total: 0 errors, 0 warnings, 0 checks, 8 lines checked

"[PATCH 1/1] rtl8192cu: trx: Demote clear abuse of kernel-doc format" has
no obvious style problems and is ready for submission.

> This particular instance should have
> /* For HW recovery information */
> as the comment.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
[PATCH v8 4/4] net: dsa: mv88e6xxx: Add support for mv88e6393x family of Marvell
The Marvell 88E6393X device is a single-chip integration of an 11-port
Ethernet switch with eight integrated Gigabit Ethernet (GbE) transceivers
and three 10-Gigabit interfaces.

This patch adds functionality specific to the mv88e6393x family (88E6393X,
88E6193X and 88E6191X).

Co-developed-by: Ashkan Boldaji
Signed-off-by: Ashkan Boldaji
Signed-off-by: Pavana Sharma
---
Changes in v2:
 - Fix a warning (Reported-by: kernel test robot)
Changes in v3:
 - Fix 'unused function' warning
Changes in v4-v8:
 - Incorporated feedback from maintainers.
---
 drivers/net/dsa/mv88e6xxx/chip.c    | 136
 drivers/net/dsa/mv88e6xxx/chip.h    |   4 +
 drivers/net/dsa/mv88e6xxx/global1.h |   2 +
 drivers/net/dsa/mv88e6xxx/global2.h |   8 +
 drivers/net/dsa/mv88e6xxx/port.c    | 234
 drivers/net/dsa/mv88e6xxx/port.h    |  43 -
 drivers/net/dsa/mv88e6xxx/serdes.c  | 225 +-
 drivers/net/dsa/mv88e6xxx/serdes.h  |  41 -
 8 files changed, 689 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d32731a7c658..bfcbe70affa3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -635,6 +635,24 @@ static void mv88e6390x_phylink_validate(struct mv88e6xxx_chip *chip, int port,
 	mv88e6390_phylink_validate(chip, port, mask, state);
 }

+static void mv88e6393x_phylink_validate(struct mv88e6xxx_chip *chip, int port,
+					unsigned long *mask,
+					struct phylink_link_state *state)
+{
+	if (port == 0 || port == 9 || port == 10) {
+		phylink_set(mask, 10000baseT_Full);
+		phylink_set(mask, 10000baseKR_Full);
+		phylink_set(mask, 5000baseT_Full);
+		phylink_set(mask, 2500baseX_Full);
+		phylink_set(mask, 2500baseT_Full);
+	}
+
+	phylink_set(mask, 1000baseT_Full);
+	phylink_set(mask, 1000baseX_Full);
+
+	mv88e6065_phylink_validate(chip, port, mask, state);
+}
+
 static void mv88e6xxx_validate(struct dsa_switch *ds, int port,
 			       unsigned long *supported,
 			       struct phylink_link_state *state)
@@ -3906,6 +3924,55 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
 	.phylink_validate = mv88e6390_phylink_validate,
 };

+static const struct mv88e6xxx_ops mv88e6393x_ops = {
+	/* MV88E6XXX_FAMILY_6393 */
+	.setup_errata = mv88e6393x_setup_errata,
+	.irl_init_all = mv88e6390_g2_irl_init_all,
+	.get_eeprom = mv88e6xxx_g2_get_eeprom8,
+	.set_eeprom = mv88e6xxx_g2_set_eeprom8,
+	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
+	.phy_read = mv88e6xxx_g2_smi_phy_read,
+	.phy_write = mv88e6xxx_g2_smi_phy_write,
+	.port_set_link = mv88e6xxx_port_set_link,
+	.port_set_speed_duplex = mv88e6393x_port_set_speed_duplex,
+	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
+	.port_tag_remap = mv88e6390_port_tag_remap,
+	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
+	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
+	.port_set_ether_type = mv88e6393x_port_set_ether_type,
+	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
+	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
+	.port_pause_limit = mv88e6390_port_pause_limit,
+	.port_set_cmode = mv88e6393x_port_set_cmode,
+	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
+	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
+	.port_get_cmode = mv88e6352_port_get_cmode,
+	.stats_snapshot = mv88e6390_g1_stats_snapshot,
+	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
+	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
+	.stats_get_strings = mv88e6320_stats_get_strings,
+	.stats_get_stats = mv88e6390_stats_get_stats,
+	.set_cpu_port = mv88e6393x_port_set_cpu_dest,
+	.set_egress_port = mv88e6393x_set_egress_port,
+	.watchdog_ops = &mv88e6390_watchdog_ops,
+	.mgmt_rsvd2cpu = mv88e6393x_port_mgmt_rsvd2cpu,
+	.pot_clear = mv88e6xxx_g2_pot_clear,
+	.reset = mv88e6352_g1_reset,
+	.rmu_disable = mv88e6390_g1_rmu_disable,
+	.vtu_getnext = mv88e6390_g1_vtu_getnext,
+	.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+	.serdes_power = mv88e6393x_serdes_power,
+	.serdes_get_lane = mv88e6393x_serdes_get_lane,
+	.serdes_pcs_get_state = mv88e6390_serdes_pcs_get_state,
+	.serdes_irq_mapping = mv88e6390_serdes_irq_mapping,
+	.serdes_irq_enable = mv88e6393x_serdes_irq_enable,
+	.serdes_irq_status = mv88e6393x_serdes_irq_status,
+	.gpio_ops = &mv88e6352_gpio_ops,
+	.avb_ops = &mv88e6390_avb_ops,
+	.ptp_ops = &mv88e6352_ptp_ops,
+	.phylink_validate = mv88e6393x_phylink_validate,
+
Re: [PATCH] vhost/vsock: add IOTLB API support
On 2020/11/3 上午1:11, Stefano Garzarella wrote: On Fri, Oct 30, 2020 at 07:44:43PM +0800, Jason Wang wrote: On 2020/10/30 下午6:54, Stefano Garzarella wrote: On Fri, Oct 30, 2020 at 06:02:18PM +0800, Jason Wang wrote: On 2020/10/30 上午1:43, Stefano Garzarella wrote: This patch enables the IOTLB API support for vhost-vsock devices, allowing the userspace to emulate an IOMMU for the guest. These changes were made following vhost-net, in details this patch: - exposes VIRTIO_F_ACCESS_PLATFORM feature and inits the iotlb device if the feature is acked - implements VHOST_GET_BACKEND_FEATURES and VHOST_SET_BACKEND_FEATURES ioctls - calls vq_meta_prefetch() before vq processing to prefetch vq metadata address in IOTLB - provides .read_iter, .write_iter, and .poll callbacks for the chardev; they are used by the userspace to exchange IOTLB messages This patch was tested with QEMU and a patch applied [1] to fix a simple issue: $ qemu -M q35,accel=kvm,kernel-irqchip=split \ -drive file=fedora.qcow2,format=qcow2,if=virtio \ -device intel-iommu,intremap=on \ -device vhost-vsock-pci,guest-cid=3,iommu_platform=on Patch looks good, but a question: It looks to me you don't enable ATS which means vhost won't get any invalidation request or did I miss anything? You're right, I didn't see invalidation requests, only miss and updates. Now I have tried to enable 'ats' and 'device-iotlb' but I still don't see any invalidation. How can I test it? (Sorry but I don't have much experience yet with vIOMMU) I guess it's because the batched unmap. Maybe you can try to use "intel_iommu=strict" in guest kernel command line to see if it works. Btw, make sure the qemu contains the patch [1]. Otherwise ATS won't be enabled for recent Linux Kernel in the guest. The problem was my kernel, it was built with a tiny configuration. Using fedora stock kernel I can see the 'invalidate' requests, but I also had the following issues. Do they make you ring any bells? 
$ ./qemu -m 4G -smp 4 -M q35,accel=kvm,kernel-irqchip=split \ -drive file=fedora.qcow2,format=qcow2,if=virtio \ -device intel-iommu,intremap=on,device-iotlb=on \ -device vhost-vsock-pci,guest-cid=6,iommu_platform=on,ats=on,id=v1 qemu-system-x86_64: vtd_iova_to_slpte: detected IOVA overflow (iova=0x1d4030c0) It's a hint that IOVA exceeds the AW. It might be worth to check whether the missed IOVA reported from IOTLB is legal. Thanks qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=00:03:00, iova=0x1d4030c0) qemu-system-x86_64: New fault is not recorded due to compression of faults Guest kernel messages: [ 44.940872] DMAR: DRHD: handling fault status reg 2 [ 44.941989] DMAR: [DMA Read] Request device [00:03.0] PASID fault addr 88W [ 49.785884] DMAR: DRHD: handling fault status reg 2 [ 49.788874] DMAR: [DMA Read] Request device [00:03.0] PASID fault addr 88W QEMU: b149dea55c Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging Linux guest: 5.8.16-200.fc32.x86_64 Thanks, Stefano
Re: [PATCH v9 2/2] net: Add mhi-net driver
Hi Jakub, On Mon, 2 Nov 2020 at 23:40, Jakub Kicinski wrote: > > On Fri, 30 Oct 2020 11:48:15 +0100 Loic Poulain wrote: > > This patch adds a new network driver implementing MHI transport for > > network packets. Packets can be in any format, though QMAP (rmnet) > > is the usual protocol (flow control + PDN mux). > > > > It support two MHI devices, IP_HW0 which is, the path to the IPA > > (IP accelerator) on qcom modem, And IP_SW0 which is the software > > driven IP path (to modem CPU). > > > > Signed-off-by: Loic Poulain > > Reviewed-by: Manivannan Sadhasivam > > > +static int mhi_ndo_stop(struct net_device *ndev) > > +{ > > + struct mhi_net_dev *mhi_netdev = netdev_priv(ndev); > > + > > + netif_stop_queue(ndev); > > + netif_carrier_off(ndev); > > + cancel_delayed_work_sync(&mhi_netdev->rx_refill); > > Where do you free the allocated skbs? Does > mhi_unprepare_from_transfer() do that? When a buffer is queued, it is owned by the device until the transfer callback (ul_cb/dl_cb) is called. mhi_unprepare_from_transfer() causes the MHI channels to be reset which in turn leads to releasing the buffers, for each buffer the MHI core will call the mhi-net transfer callback with -ENOTCONN status, and we free it from here. > > The skbs should be freed somehow in .ndo_stop(). The skbs are released in remove() (mhi_unprepare_from_transfer), I do not do prepare/unprepare in ndo_open/ndo_stop because we need to have channels started during the whole life of the interface. That's because it set up kind of internal routing of on the device/modem side. Indeed, if channels are not started, configuration of the modem (via out-of-band qmi, at commands, etc) is not possible. 
> > > + return 0; > > +} > > + > > +static int mhi_ndo_xmit(struct sk_buff *skb, struct net_device *ndev) > > +{ > > + struct mhi_net_dev *mhi_netdev = netdev_priv(ndev); > > + struct mhi_device *mdev = mhi_netdev->mdev; > > + int err; > > + > > + err = mhi_queue_skb(mdev, DMA_TO_DEVICE, skb, skb->len, MHI_EOT); > > + if (unlikely(err)) { > > + net_err_ratelimited("%s: Failed to queue TX buf (%d)\n", > > + ndev->name, err); > > + > > + u64_stats_update_begin(&mhi_netdev->stats.tx_syncp); > > + u64_stats_inc(&mhi_netdev->stats.tx_dropped); > > + u64_stats_update_end(&mhi_netdev->stats.tx_syncp); > > + > > + /* drop the packet */ > > + kfree_skb(skb); > > dev_kfree_skb_any() > > > + } > > + > > + if (mhi_queue_is_full(mdev, DMA_TO_DEVICE)) > > + netif_stop_queue(ndev); > > + > > + return NETDEV_TX_OK; > > +} > > > +static void mhi_net_dl_callback(struct mhi_device *mhi_dev, > > + struct mhi_result *mhi_res) > > +{ > > + struct mhi_net_dev *mhi_netdev = dev_get_drvdata(&mhi_dev->dev); > > + struct sk_buff *skb = mhi_res->buf_addr; > > + int remaining; > > + > > + remaining = atomic_dec_return(&mhi_netdev->stats.rx_queued); > > + > > + if (unlikely(mhi_res->transaction_status)) { > > + u64_stats_update_begin(&mhi_netdev->stats.rx_syncp); > > + u64_stats_inc(&mhi_netdev->stats.rx_errors); > > + u64_stats_update_end(&mhi_netdev->stats.rx_syncp); > > + > > + kfree_skb(skb); > > Are you sure this never runs with irqs disabled or from irq context? > > Otherwise dev_kfree_skb_any(). Yes will fix that. 
> > > + > > + /* MHI layer resetting the DL channel */ > > + if (mhi_res->transaction_status == -ENOTCONN) > > + return; > > + } else { > > + u64_stats_update_begin(&mhi_netdev->stats.rx_syncp); > > + u64_stats_inc(&mhi_netdev->stats.rx_packets); > > + u64_stats_add(&mhi_netdev->stats.rx_bytes, > > mhi_res->bytes_xferd); > > + u64_stats_update_end(&mhi_netdev->stats.rx_syncp); > > + > > + skb->protocol = htons(ETH_P_MAP); > > + skb_put(skb, mhi_res->bytes_xferd); > > + netif_rx(skb); > > + } > > + > > + /* Refill if RX buffers queue becomes low */ > > + if (remaining <= mhi_netdev->rx_queue_sz / 2) > > + schedule_delayed_work(&mhi_netdev->rx_refill, 0); > > +} > > + > > +static void mhi_net_ul_callback(struct mhi_device *mhi_dev, > > + struct mhi_result *mhi_res) > > +{ > > + struct mhi_net_dev *mhi_netdev = dev_get_drvdata(&mhi_dev->dev); > > + struct net_device *ndev = mhi_netdev->ndev; > > + struct sk_buff *skb = mhi_res->buf_addr; > > + > > + /* Hardware has consumed the buffer, so free the skb (which is not > > + * freed by the MHI stack) and perform accounting. > > + */ > > + consume_skb(skb); > > ditto > > > + u64_stats_update_begin(&mhi_netdev->stats.tx_syncp); > > + if (unlikely(mhi_res->transaction_status)) { > > + u64_stats_inc(&mhi_netdev->stats.tx_errors);
[net-next v4 0/8]net: convert tasklets to use new tasklet_setup API
From: Allen Pais

Commit 12cc923f1ccc ("tasklet: Introduce new initialization API")
introduced a new tasklet initialization API. This series converts all the
net/* drivers to use the new tasklet_setup() API.

The following series is based on net-next (9faebeb2d)

v3:
  introduce qdisc_from_priv, suggested by Eric Dumazet.
v2:
  get rid of QDISC_ALIGN()
v1:
  fix kerneldoc

Allen Pais (8):
  net: dccp: convert tasklets to use new tasklet_setup() API
  net: ipv4: convert tasklets to use new tasklet_setup() API
  net: mac80211: convert tasklets to use new tasklet_setup() API
  net: mac802154: convert tasklets to use new tasklet_setup() API
  net: rds: convert tasklets to use new tasklet_setup() API
  net: sched: convert tasklets to use new tasklet_setup() API
  net: smc: convert tasklets to use new tasklet_setup() API
  net: xfrm: convert tasklets to use new tasklet_setup() API

 include/net/pkt_sched.h    |  5 +
 net/dccp/timer.c           | 12 ++--
 net/ipv4/tcp_output.c      |  8 +++-
 net/mac80211/ieee80211_i.h |  4 ++--
 net/mac80211/main.c        | 14 +-
 net/mac80211/tx.c          |  5 +++--
 net/mac80211/util.c        |  5 +++--
 net/mac802154/main.c       |  8 +++-
 net/rds/ib_cm.c            | 14 ++
 net/sched/sch_atm.c        |  8
 net/smc/smc_cdc.c          |  6 +++---
 net/smc/smc_wr.c           | 14 ++
 net/xfrm/xfrm_input.c      |  7 +++
 13 files changed, 52 insertions(+), 58 deletions(-)

-- 
2.25.1
[net-next v4 1/8] net: dccp: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/dccp/timer.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/dccp/timer.c b/net/dccp/timer.c index a934d2932373..db768f223ef7 100644 --- a/net/dccp/timer.c +++ b/net/dccp/timer.c @@ -215,13 +215,14 @@ static void dccp_delack_timer(struct timer_list *t) /** * dccp_write_xmitlet - Workhorse for CCID packet dequeueing interface - * @data: Socket to act on + * @t: pointer to the tasklet associated with this handler * * See the comments above %ccid_dequeueing_decision for supported modes. */ -static void dccp_write_xmitlet(unsigned long data) +static void dccp_write_xmitlet(struct tasklet_struct *t) { - struct sock *sk = (struct sock *)data; + struct dccp_sock *dp = from_tasklet(dp, t, dccps_xmitlet); + struct sock *sk = &dp->dccps_inet_connection.icsk_inet.sk; bh_lock_sock(sk); if (sock_owned_by_user(sk)) @@ -235,16 +236,15 @@ static void dccp_write_xmitlet(unsigned long data) static void dccp_write_xmit_timer(struct timer_list *t) { struct dccp_sock *dp = from_timer(dp, t, dccps_xmit_timer); - struct sock *sk = &dp->dccps_inet_connection.icsk_inet.sk; - dccp_write_xmitlet((unsigned long)sk); + dccp_write_xmitlet(&dp->dccps_xmitlet); } void dccp_init_xmit_timers(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); - tasklet_init(&dp->dccps_xmitlet, dccp_write_xmitlet, (unsigned long)sk); + tasklet_setup(&dp->dccps_xmitlet, dccp_write_xmitlet); timer_setup(&dp->dccps_xmit_timer, dccp_write_xmit_timer, 0); inet_csk_init_xmit_timers(sk, &dccp_write_timer, &dccp_delack_timer, &dccp_keepalive_timer); -- 2.25.1
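For readers following the conversion, here is a minimal userspace sketch of the pattern this patch applies: the callback receives the tasklet pointer and recovers its enclosing structure from it, instead of receiving an opaque unsigned long. The struct tasklet_struct, tasklet_setup() and from_tasklet() below are simplified stand-ins written for illustration, not the kernel definitions, and struct demo_sock, demo_xmitlet() and demo_run_once() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel type (assumption). */
struct tasklet_struct {
	void (*callback)(struct tasklet_struct *t);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified from_tasklet(): the result type is taken from 'var'. */
#define from_tasklet(var, t, field) \
	container_of(t, __typeof__(*var), field)

/* Simplified tasklet_setup(): only records the callback. */
static void tasklet_setup(struct tasklet_struct *t,
			  void (*callback)(struct tasklet_struct *))
{
	t->callback = callback;
}

/* Hypothetical driver state with an embedded tasklet. */
struct demo_sock {
	int xmit_count;
	struct tasklet_struct xmitlet;
};

/* New-style callback: no cast from unsigned long is needed. */
static void demo_xmitlet(struct tasklet_struct *t)
{
	struct demo_sock *dp = from_tasklet(dp, t, xmitlet);

	dp->xmit_count++;
}

/* Set up the tasklet and invoke it once, as the softirq core would. */
static int demo_run_once(struct demo_sock *dp)
{
	assert(dp != NULL);
	tasklet_setup(&dp->xmitlet, demo_xmitlet);
	dp->xmitlet.callback(&dp->xmitlet);
	return dp->xmit_count;
}
```

Because the tasklet is embedded in the structure, the data pointer no longer has to be stored separately, which is why the third argument of tasklet_init() disappears in the diff.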
[net-next v4 8/8] net: xfrm: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/xfrm/xfrm_input.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 37456d022cfa..be6351e3f3cd 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -760,9 +760,9 @@ int xfrm_input_resume(struct sk_buff *skb, int nexthdr) } EXPORT_SYMBOL(xfrm_input_resume); -static void xfrm_trans_reinject(unsigned long data) +static void xfrm_trans_reinject(struct tasklet_struct *t) { - struct xfrm_trans_tasklet *trans = (void *)data; + struct xfrm_trans_tasklet *trans = from_tasklet(trans, t, tasklet); struct sk_buff_head queue; struct sk_buff *skb; @@ -818,7 +818,6 @@ void __init xfrm_input_init(void) trans = &per_cpu(xfrm_trans_tasklet, i); __skb_queue_head_init(&trans->queue); - tasklet_init(&trans->tasklet, xfrm_trans_reinject, -(unsigned long)trans); + tasklet_setup(&trans->tasklet, xfrm_trans_reinject); } } -- 2.25.1
[net-next v4 6/8] net: sched: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- include/net/pkt_sched.h | 5 + net/sched/sch_atm.c | 8 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 4ed32e6b0201..15b1b30f454e 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -24,6 +24,11 @@ static inline void *qdisc_priv(struct Qdisc *q) return &q->privdata; } +static inline struct Qdisc *qdisc_from_priv(void *priv) +{ + return container_of(priv, struct Qdisc, privdata); +} + /* Timer resolution MUST BE < 10% of min_schedulable_packet_size/bandwidth diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c index 1c281cc81f57..007bd2d9f1ff 100644 --- a/net/sched/sch_atm.c +++ b/net/sched/sch_atm.c @@ -466,10 +466,10 @@ drop: __maybe_unused * non-ATM interfaces. */ -static void sch_atm_dequeue(unsigned long data) +static void sch_atm_dequeue(struct tasklet_struct *t) { - struct Qdisc *sch = (struct Qdisc *)data; - struct atm_qdisc_data *p = qdisc_priv(sch); + struct atm_qdisc_data *p = from_tasklet(p, t, task); + struct Qdisc *sch = qdisc_from_priv(p); struct atm_flow_data *flow; struct sk_buff *skb; @@ -563,7 +563,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt, if (err) return err; - tasklet_init(&p->task, sch_atm_dequeue, (unsigned long)sch); + tasklet_setup(&p->task, sch_atm_dequeue); return 0; } -- 2.25.1
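The qdisc_from_priv() helper added here is the inverse of qdisc_priv(). A rough userspace sketch of that round trip follows, assuming a cut-down struct demo_qdisc whose private area is a trailing flexible array member named privdata, as in the hunk above. All demo_* names are hypothetical and the layout is not the real struct Qdisc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-in for struct Qdisc (assumption, not the kernel layout). */
struct demo_qdisc {
	unsigned int flags;
	long privdata[];	/* private area follows the header */
};

/* Header -> private area, mirroring qdisc_priv(). */
static void *demo_qdisc_priv(struct demo_qdisc *q)
{
	return &q->privdata;
}

/* Private area -> header, mirroring qdisc_from_priv(). */
static struct demo_qdisc *demo_qdisc_from_priv(void *priv)
{
	return container_of(priv, struct demo_qdisc, privdata);
}

/* Allocate a header plus priv_size bytes of private data. */
static struct demo_qdisc *demo_qdisc_alloc(size_t priv_size)
{
	return calloc(1, sizeof(struct demo_qdisc) + priv_size);
}

/* Returns 1 if the two helpers are inverses for a fresh allocation. */
static int demo_roundtrip_ok(size_t priv_size)
{
	struct demo_qdisc *q = demo_qdisc_alloc(priv_size);
	int ok;

	if (!q)
		return 0;
	ok = demo_qdisc_from_priv(demo_qdisc_priv(q)) == q;
	free(q);
	return ok;
}
```

Keeping the offset arithmetic in one helper next to qdisc_priv() means sch_atm.c does not need to know how the private area is laid out relative to the header.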
[net-next v4 7/8] net: smc: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/smc/smc_cdc.c | 6 +++--- net/smc/smc_wr.c | 14 ++ 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c index b1ce6ccbfaec..f23f558054a7 100644 --- a/net/smc/smc_cdc.c +++ b/net/smc/smc_cdc.c @@ -389,9 +389,9 @@ static void smc_cdc_msg_recv(struct smc_sock *smc, struct smc_cdc_msg *cdc) * Context: * - tasklet context */ -static void smcd_cdc_rx_tsklet(unsigned long data) +static void smcd_cdc_rx_tsklet(struct tasklet_struct *t) { - struct smc_connection *conn = (struct smc_connection *)data; + struct smc_connection *conn = from_tasklet(conn, t, rx_tsklet); struct smcd_cdc_msg *data_cdc; struct smcd_cdc_msg cdc; struct smc_sock *smc; @@ -411,7 +411,7 @@ static void smcd_cdc_rx_tsklet(unsigned long data) */ void smcd_cdc_rx_init(struct smc_connection *conn) { - tasklet_init(&conn->rx_tsklet, smcd_cdc_rx_tsklet, (unsigned long)conn); + tasklet_setup(&conn->rx_tsklet, smcd_cdc_rx_tsklet); } /* init, exit, misc **/ diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c index 1e23cdd41eb1..cbc73a7e4d59 100644 --- a/net/smc/smc_wr.c +++ b/net/smc/smc_wr.c @@ -131,9 +131,9 @@ static inline void smc_wr_tx_process_cqe(struct ib_wc *wc) wake_up(&link->wr_tx_wait); } -static void smc_wr_tx_tasklet_fn(unsigned long data) +static void smc_wr_tx_tasklet_fn(struct tasklet_struct *t) { - struct smc_ib_device *dev = (struct smc_ib_device *)data; + struct smc_ib_device *dev = from_tasklet(dev, t, send_tasklet); struct ib_wc wc[SMC_WR_MAX_POLL_CQE]; int i = 0, rc; int polled = 0; @@ -435,9 +435,9 @@ static inline void smc_wr_rx_process_cqes(struct ib_wc wc[], int num) } } -static void smc_wr_rx_tasklet_fn(unsigned long data) +static void smc_wr_rx_tasklet_fn(struct 
tasklet_struct *t) { - struct smc_ib_device *dev = (struct smc_ib_device *)data; + struct smc_ib_device *dev = from_tasklet(dev, t, recv_tasklet); struct ib_wc wc[SMC_WR_MAX_POLL_CQE]; int polled = 0; int rc; @@ -698,10 +698,8 @@ void smc_wr_remove_dev(struct smc_ib_device *smcibdev) void smc_wr_add_dev(struct smc_ib_device *smcibdev) { - tasklet_init(&smcibdev->recv_tasklet, smc_wr_rx_tasklet_fn, -(unsigned long)smcibdev); - tasklet_init(&smcibdev->send_tasklet, smc_wr_tx_tasklet_fn, -(unsigned long)smcibdev); + tasklet_setup(&smcibdev->recv_tasklet, smc_wr_rx_tasklet_fn); + tasklet_setup(&smcibdev->send_tasklet, smc_wr_tx_tasklet_fn); } int smc_wr_create_link(struct smc_link *lnk) -- 2.25.1
[net-next v4 2/8] net: ipv4: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/ipv4/tcp_output.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index bf48cd73e967..6e998d428ceb 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1038,9 +1038,9 @@ static void tcp_tsq_handler(struct sock *sk) * transferring tsq->head because tcp_wfree() might * interrupt us (non NAPI drivers) */ -static void tcp_tasklet_func(unsigned long data) +static void tcp_tasklet_func(struct tasklet_struct *t) { - struct tsq_tasklet *tsq = (struct tsq_tasklet *)data; + struct tsq_tasklet *tsq = from_tasklet(tsq, t, tasklet); LIST_HEAD(list); unsigned long flags; struct list_head *q, *n; @@ -1125,9 +1125,7 @@ void __init tcp_tasklet_init(void) struct tsq_tasklet *tsq = &per_cpu(tsq_tasklet, i); INIT_LIST_HEAD(&tsq->head); - tasklet_init(&tsq->tasklet, -tcp_tasklet_func, -(unsigned long)tsq); + tasklet_setup(&tsq->tasklet, tcp_tasklet_func); } } -- 2.25.1
[net-next v4 4/8] net: mac802154: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Acked-by: Stefan Schmidt Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/mac802154/main.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/net/mac802154/main.c b/net/mac802154/main.c index 06ea0f8bfd5c..520cedc594e1 100644 --- a/net/mac802154/main.c +++ b/net/mac802154/main.c @@ -20,9 +20,9 @@ #include "ieee802154_i.h" #include "cfg.h" -static void ieee802154_tasklet_handler(unsigned long data) +static void ieee802154_tasklet_handler(struct tasklet_struct *t) { - struct ieee802154_local *local = (struct ieee802154_local *)data; + struct ieee802154_local *local = from_tasklet(local, t, tasklet); struct sk_buff *skb; while ((skb = skb_dequeue(&local->skb_queue))) { @@ -91,9 +91,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) INIT_LIST_HEAD(&local->interfaces); mutex_init(&local->iflist_mtx); - tasklet_init(&local->tasklet, -ieee802154_tasklet_handler, -(unsigned long)local); + tasklet_setup(&local->tasklet, ieee802154_tasklet_handler); skb_queue_head_init(&local->skb_queue); -- 2.25.1
[net-next v4 5/8] net: rds: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/rds/ib_cm.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c index b36b60668b1d..d06398be4b80 100644 --- a/net/rds/ib_cm.c +++ b/net/rds/ib_cm.c @@ -314,9 +314,9 @@ static void poll_scq(struct rds_ib_connection *ic, struct ib_cq *cq, } } -static void rds_ib_tasklet_fn_send(unsigned long data) +static void rds_ib_tasklet_fn_send(struct tasklet_struct *t) { - struct rds_ib_connection *ic = (struct rds_ib_connection *)data; + struct rds_ib_connection *ic = from_tasklet(ic, t, i_send_tasklet); struct rds_connection *conn = ic->conn; rds_ib_stats_inc(s_ib_tasklet_call); @@ -354,9 +354,9 @@ static void poll_rcq(struct rds_ib_connection *ic, struct ib_cq *cq, } } -static void rds_ib_tasklet_fn_recv(unsigned long data) +static void rds_ib_tasklet_fn_recv(struct tasklet_struct *t) { - struct rds_ib_connection *ic = (struct rds_ib_connection *)data; + struct rds_ib_connection *ic = from_tasklet(ic, t, i_recv_tasklet); struct rds_connection *conn = ic->conn; struct rds_ib_device *rds_ibdev = ic->rds_ibdev; struct rds_ib_ack_state state; @@ -1219,10 +1219,8 @@ int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp) } INIT_LIST_HEAD(&ic->ib_node); - tasklet_init(&ic->i_send_tasklet, rds_ib_tasklet_fn_send, -(unsigned long)ic); - tasklet_init(&ic->i_recv_tasklet, rds_ib_tasklet_fn_recv, -(unsigned long)ic); + tasklet_setup(&ic->i_send_tasklet, rds_ib_tasklet_fn_send); + tasklet_setup(&ic->i_recv_tasklet, rds_ib_tasklet_fn_recv); mutex_init(&ic->i_recv_mutex); #ifndef KERNEL_HAS_ATOMIC64 spin_lock_init(&ic->i_ack_lock); -- 2.25.1
[net-next v4 3/8] net: mac80211: convert tasklets to use new tasklet_setup() API
From: Allen Pais In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Reviewed-by: Johannes Berg Signed-off-by: Romain Perier Signed-off-by: Allen Pais --- net/mac80211/ieee80211_i.h | 4 ++-- net/mac80211/main.c| 14 +- net/mac80211/tx.c | 5 +++-- net/mac80211/util.c| 5 +++-- 4 files changed, 13 insertions(+), 15 deletions(-) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 2a21226fb518..2a3b0ee65637 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -1795,7 +1795,7 @@ static inline bool ieee80211_sdata_running(struct ieee80211_sub_if_data *sdata) /* tx handling */ void ieee80211_clear_tx_pending(struct ieee80211_local *local); -void ieee80211_tx_pending(unsigned long data); +void ieee80211_tx_pending(struct tasklet_struct *t); netdev_tx_t ieee80211_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev); netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb, @@ -2146,7 +2146,7 @@ void ieee80211_txq_remove_vlan(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata); void ieee80211_fill_txq_stats(struct cfg80211_txq_stats *txqstats, struct txq_info *txqi); -void ieee80211_wake_txqs(unsigned long data); +void ieee80211_wake_txqs(struct tasklet_struct *t); void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata, u16 transaction, u16 auth_alg, u16 status, const u8 *extra, size_t extra_len, const u8 *bssid, diff --git a/net/mac80211/main.c b/net/mac80211/main.c index 523380aed92e..48ab05186610 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -220,9 +220,9 @@ u32 ieee80211_reset_erp_info(struct ieee80211_sub_if_data *sdata) BSS_CHANGED_ERP_SLOT; } -static void ieee80211_tasklet_handler(unsigned long data) +static void ieee80211_tasklet_handler(struct tasklet_struct *t) { - struct ieee80211_local *local = (struct ieee80211_local *) 
data; + struct ieee80211_local *local = from_tasklet(local, t, tasklet); struct sk_buff *skb; while ((skb = skb_dequeue(&local->skb_queue)) || @@ -733,16 +733,12 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t priv_data_len, skb_queue_head_init(&local->pending[i]); atomic_set(&local->agg_queue_stop[i], 0); } - tasklet_init(&local->tx_pending_tasklet, ieee80211_tx_pending, -(unsigned long)local); + tasklet_setup(&local->tx_pending_tasklet, ieee80211_tx_pending); if (ops->wake_tx_queue) - tasklet_init(&local->wake_txqs_tasklet, ieee80211_wake_txqs, -(unsigned long)local); + tasklet_setup(&local->wake_txqs_tasklet, ieee80211_wake_txqs); - tasklet_init(&local->tasklet, -ieee80211_tasklet_handler, -(unsigned long) local); + tasklet_setup(&local->tasklet, ieee80211_tasklet_handler); skb_queue_head_init(&local->skb_queue); skb_queue_head_init(&local->skb_queue_unreliable); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 8ba10a48ded4..a50c0edb1153 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -4406,9 +4406,10 @@ static bool ieee80211_tx_pending_skb(struct ieee80211_local *local, /* * Transmit all pending packets. Called from tasklet. 
*/ -void ieee80211_tx_pending(unsigned long data) +void ieee80211_tx_pending(struct tasklet_struct *t) { - struct ieee80211_local *local = (struct ieee80211_local *)data; + struct ieee80211_local *local = from_tasklet(local, t, +tx_pending_tasklet); unsigned long flags; int i; bool txok; diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 49342060490f..a25e47750ed9 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -386,9 +386,10 @@ _ieee80211_wake_txqs(struct ieee80211_local *local, unsigned long *flags) rcu_read_unlock(); } -void ieee80211_wake_txqs(unsigned long data) +void ieee80211_wake_txqs(struct tasklet_struct *t) { - struct ieee80211_local *local = (struct ieee80211_local *)data; + struct ieee80211_local *local = from_tasklet(local, t, +wake_txqs_tasklet); unsigned long flags; spin_lock_irqsave(&local->queue_stop_reason_lock, flags); -- 2.25.1
Re: [PATCH net-next v2] net/usb/r8153_ecm: support ECM mode for RTL8153
On Mon, Nov 02, 2020 at 11:47:18AM -0800, Jakub Kicinski wrote: > On Mon, 2 Nov 2020 07:20:15 + Hayes Wang wrote: > > Jakub Kicinski > > > Can you describe the use case in more detail? > > > > > > AFAICT r8152 defines a match for the exact same device. > > > Does it not mean that which driver is used will be somewhat random > > > if both are built? > > > > I export rtl_get_version() from r8152. It would return a nonzero > > value if r8152 could support this device. Both r8152 and r8153_ecm > > would check the return value of rtl_get_version() in probe(). > > Therefore, if rtl_get_version() returns a nonzero value, the r8152 > > is used for the device with vendor mode. Otherwise, the r8153_ecm > > is used for the device with ECM mode. > > Oh, I see, I missed that the rtl_get_version() checking is the inverse > of r8152. > > > > > +/* Define these values to match your device */ > > > > +#define VENDOR_ID_REALTEK 0x0bda > > > > +#define VENDOR_ID_MICROSOFT 0x045e > > > > +#define VENDOR_ID_SAMSUNG 0x04e8 > > > > +#define VENDOR_ID_LENOVO 0x17ef > > > > +#define VENDOR_ID_LINKSYS 0x13b1 > > > > +#define VENDOR_ID_NVIDIA 0x0955 > > > > +#define VENDOR_ID_TPLINK 0x2357 > > > > > > $ git grep 0x2357 | grep -i tplink > > > drivers/net/usb/cdc_ether.c:#define TPLINK_VENDOR_ID 0x2357 > > > drivers/net/usb/r8152.c:#define VENDOR_ID_TPLINK 0x2357 > > > drivers/usb/serial/option.c:#define TPLINK_VENDOR_ID > > > 0x2357 > > > > > > $ git grep 0x17ef | grep -i lenovo > > > drivers/hid/hid-ids.h:#define USB_VENDOR_ID_LENOVO 0x17ef > > > drivers/hid/wacom.h:#define USB_VENDOR_ID_LENOVO 0x17ef > > > drivers/net/usb/cdc_ether.c:#define LENOVO_VENDOR_ID 0x17ef > > > drivers/net/usb/r8152.c:#define VENDOR_ID_LENOVO 0x17ef > > > > > > Time to consolidate those vendor id defines perhaps? > > > > It seems that there is no such header file which I could include > > or add the new vendor IDs. > > Please create one.
(Adding Greg KH to the recipients, in case there is > a reason that USB subsystem doesn't have a common vendor id header.) There is a reason, it's a nightmare to maintain and handle merges for, just don't do it. Read the comments at the top of the pci_ids.h file if you are curious why we don't even do this for PCI device ids anymore for the past 10+ years. So no, please do not create such a common file, it is not needed or a good idea. thanks, greg k-h
[PATCH bpf 2/2] libbpf: fix possible use after free in xsk_socket__delete
From: Magnus Karlsson Fix a possible use after free in xsk_socket__delete that will happen if xsk_put_ctx() frees the ctx. To fix, save the umem reference taken from the context and just use that instead. Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices") Signed-off-by: Magnus Karlsson --- tools/lib/bpf/xsk.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index 504b7a8..9bc537d 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -892,6 +892,7 @@ void xsk_socket__delete(struct xsk_socket *xsk) { size_t desc_sz = sizeof(struct xdp_desc); struct xdp_mmap_offsets off; + struct xsk_umem *umem; struct xsk_ctx *ctx; int err; @@ -899,6 +900,7 @@ void xsk_socket__delete(struct xsk_socket *xsk) return; ctx = xsk->ctx; + umem = ctx->umem; if (ctx->prog_fd != -1) { xsk_delete_bpf_maps(xsk); close(ctx->prog_fd); @@ -918,11 +920,11 @@ void xsk_socket__delete(struct xsk_socket *xsk) xsk_put_ctx(ctx); - ctx->umem->refcount--; + umem->refcount--; /* Do not close an fd that also has an associated umem connected * to it. */ - if (xsk->fd != ctx->umem->fd) + if (xsk->fd != umem->fd) close(xsk->fd); free(xsk); } -- 2.7.4
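The ordering issue fixed above can be shown with a small sketch: once the put may free the context, anything reached through the context has to be read beforehand. The demo_* types and functions are hypothetical stand-ins, and the assumption that the put frees the context when its refcount drops to zero is made only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the umem and context objects. */
struct demo_umem {
	int refcount;
	int fd;
};

struct demo_ctx {
	int refcount;
	struct demo_umem *umem;
};

/* Drop one reference; frees the context when it was the last one. */
static void demo_put_ctx(struct demo_ctx *ctx)
{
	if (--ctx->refcount == 0)
		free(ctx);
}

/*
 * Release a context and its umem reference.
 * Returns the umem refcount after the release.
 */
static int demo_release(struct demo_ctx *ctx)
{
	/* Save the umem pointer while ctx is still guaranteed valid. */
	struct demo_umem *umem = ctx->umem;

	demo_put_ctx(ctx);	/* ctx must not be touched after this */

	umem->refcount--;
	return umem->refcount;
}
```

Reading ctx->umem after demo_put_ctx() would be the pattern the patch removes; the local umem variable plays the same role as the one added to xsk_socket__delete().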
[PATCH bpf 1/2] libbpf: fix null dereference in xsk_socket__delete
From: Magnus Karlsson Fix a possible null pointer dereference in xsk_socket__delete that will occur if a null pointer is fed into the function. Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices") Reported-by: Andrii Nakryiko Signed-off-by: Magnus Karlsson --- tools/lib/bpf/xsk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index e3c98c0..504b7a8 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -891,13 +891,14 @@ int xsk_umem__delete(struct xsk_umem *umem) void xsk_socket__delete(struct xsk_socket *xsk) { size_t desc_sz = sizeof(struct xdp_desc); - struct xsk_ctx *ctx = xsk->ctx; struct xdp_mmap_offsets off; + struct xsk_ctx *ctx; int err; if (!xsk) return; + ctx = xsk->ctx; if (ctx->prog_fd != -1) { xsk_delete_bpf_maps(xsk); close(ctx->prog_fd); -- 2.7.4
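As a small illustration of why the initializer had to move, the sketch below declares the context pointer first and only assigns it after the NULL check. The demo_* names are hypothetical, and the int return value exists only so the behaviour can be observed; the real function returns void.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the socket and its context. */
struct demo_ctx {
	int prog_fd;
};

struct demo_socket {
	struct demo_ctx *ctx;
};

/* Returns 0 if there was nothing to delete, 1 if the socket was freed. */
static int demo_socket_delete(struct demo_socket *xsk)
{
	struct demo_ctx *ctx;	/* declared, but not initialised from xsk yet */

	if (!xsk)
		return 0;

	ctx = xsk->ctx;		/* dereference only after the NULL check */
	free(ctx);
	free(xsk);
	return 1;
}
```

With the original form, "struct xsk_ctx *ctx = xsk->ctx;" in the declaration, the dereference ran before the "if (!xsk)" test could protect it.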
[PATCH bpf 0/2] libbpf: fix two bugs in xsk_socket__delete
This small series fixes two bugs in xsk_socket__delete. Details can be found in the individual commit messages, but a brief summary follows: Patch 1: fix null pointer dereference in xsk_socket__delete Patch 2: fix possible use after free in xsk_socket__delete This patch has been applied against commit 7a078d2d1880 ("libbpf, hashmap: Fix undefined behavior in hash_bits") Thanks: Magnus Magnus Karlsson (2): libbpf: fix null dereference in xsk_socket__delete libbpf: fix possible use after free in xsk_socket__delete tools/lib/bpf/xsk.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) -- 2.7.4
[PATCH net-next v3 2/2] net/usb/r8153_ecm: support ECM mode for RTL8153
Support ECM mode based on cdc_ether with relative mii functions, when CONFIG_USB_RTL8152 is not set, or the device is not supported by the r8152 driver. Both r8152 and r8153_ecm would check the return value of rtl8152_get_version() in probe(). If rtl8152_get_version() returns a nonzero value, the r8152 is used for the device with vendor mode. Otherwise, the r8153_ecm is used for the device with ECM mode. Signed-off-by: Hayes Wang --- drivers/net/usb/Makefile| 2 +- drivers/net/usb/r8152.c | 22 + drivers/net/usb/r8153_ecm.c | 162 include/linux/usb/r8152.h | 30 +++ 4 files changed, 197 insertions(+), 19 deletions(-) create mode 100644 drivers/net/usb/r8153_ecm.c create mode 100644 include/linux/usb/r8152.h diff --git a/drivers/net/usb/Makefile b/drivers/net/usb/Makefile index 99fd12be2111..99381e6bea78 100644 --- a/drivers/net/usb/Makefile +++ b/drivers/net/usb/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_USB_LAN78XX) += lan78xx.o obj-$(CONFIG_USB_NET_AX8817X) += asix.o asix-y := asix_devices.o asix_common.o ax88172a.o obj-$(CONFIG_USB_NET_AX88179_178A) += ax88179_178a.o -obj-$(CONFIG_USB_NET_CDCETHER) += cdc_ether.o +obj-$(CONFIG_USB_NET_CDCETHER) += cdc_ether.o r8153_ecm.o obj-$(CONFIG_USB_NET_CDC_EEM) += cdc_eem.o obj-$(CONFIG_USB_NET_DM9601) += dm9601.o obj-$(CONFIG_USB_NET_SR9700) += sr9700.o diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index d8ae89aa470c..41b803729996 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -26,7 +26,7 @@ #include #include #include -#include +#include /* Information for net-next */ #define NETNEXT_VERSION"11" @@ -654,18 +654,6 @@ enum rtl_register_content { #define INTR_LINK 0x0004 -#define RTL8152_REQT_READ 0xc0 -#define RTL8152_REQT_WRITE 0x40 -#define RTL8152_REQ_GET_REGS 0x05 -#define RTL8152_REQ_SET_REGS 0x05 - -#define BYTE_EN_DWORD 0xff -#define BYTE_EN_WORD 0x33 -#define BYTE_EN_BYTE 0x11 -#define BYTE_EN_SIX_BYTES 0x3f -#define BYTE_EN_START_MASK 0x0f -#define BYTE_EN_END_MASK 0xf0 - #define
RTL8153_MAX_PACKET 9216 /* 9K */ #define RTL8153_MAX_MTU(RTL8153_MAX_PACKET - VLAN_ETH_HLEN - \ ETH_FCS_LEN) @@ -693,9 +681,6 @@ enum rtl8152_flags { #define DEVICE_ID_THINKPAD_THUNDERBOLT3_DOCK_GEN2 0x3082 #define DEVICE_ID_THINKPAD_USB_C_DOCK_GEN2 0xa387 -#define MCU_TYPE_PLA 0x0100 -#define MCU_TYPE_USB 0x - struct tally_counter { __le64 tx_packets; __le64 rx_packets; @@ -6607,7 +6592,7 @@ static int rtl_fw_init(struct r8152 *tp) return 0; } -static u8 rtl_get_version(struct usb_interface *intf) +u8 rtl8152_get_version(struct usb_interface *intf) { struct usb_device *udev = interface_to_usbdev(intf); u32 ocp_data = 0; @@ -6665,12 +6650,13 @@ static u8 rtl_get_version(struct usb_interface *intf) return version; } +EXPORT_SYMBOL_GPL(rtl8152_get_version); static int rtl8152_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_device *udev = interface_to_usbdev(intf); - u8 version = rtl_get_version(intf); + u8 version = rtl8152_get_version(intf); struct r8152 *tp; struct net_device *netdev; int ret; diff --git a/drivers/net/usb/r8153_ecm.c b/drivers/net/usb/r8153_ecm.c new file mode 100644 index ..13eba7a72633 --- /dev/null +++ b/drivers/net/usb/r8153_ecm.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include +#include +#include +#include +#include +#include +#include + +#define OCP_BASE 0xe86c + +static int pla_read_word(struct usbnet *dev, u16 index) +{ + u16 byen = BYTE_EN_WORD; + u8 shift = index & 2; + __le32 tmp; + int ret; + + if (shift) + byen <<= shift; + + index &= ~3; + + ret = usbnet_read_cmd(dev, RTL8152_REQ_GET_REGS, RTL8152_REQT_READ, index, + MCU_TYPE_PLA | byen, &tmp, sizeof(tmp)); + if (ret < 0) + goto out; + + ret = __le32_to_cpu(tmp); + ret >>= (shift * 8); + ret &= 0x; + +out: + return ret; +} + +static int pla_write_word(struct usbnet *dev, u16 index, u32 data) +{ + u32 mask = 0x; + u16 byen = BYTE_EN_WORD; + u8 shift = index & 2; + __le32 tmp; + int ret; + + data &= mask; + + if (shift) { + byen 
<<= shift; + mask <<= (shift * 8); + data <<= (shift * 8); + } + + index &= ~3; + + ret = usbnet_read_cmd(dev, RTL8152_REQ_GET_REGS, RTL8152_REQT_READ, index, + MCU_TYPE_PLA | byen, &tmp, sizeof(tmp)); + + if (ret < 0) + goto out; + + data |
[PATCH net-next v3 1/2] include/linux/usb: new header file for the vendor ID of USB devices
Add a new header file usb_vendor_id.h to consolidate the definitions of the vendor ID of USB devices which may be used by cdc_ether and r8152 driver. Signed-off-by: Hayes Wang --- drivers/net/usb/cdc_ether.c | 139 +- drivers/net/usb/r8152.c | 48 +-- include/linux/usb/usb_vendor_id.h | 51 +++ 3 files changed, 133 insertions(+), 105 deletions(-) create mode 100644 include/linux/usb/usb_vendor_id.h diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index 8c1d61c2cbac..1f6d9b46883a 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -17,6 +17,7 @@ #include #include #include +#include #if IS_ENABLED(CONFIG_USB_NET_RNDIS_HOST) @@ -540,22 +541,6 @@ static const struct driver_info wwan_info = { /*-*/ -#define HUAWEI_VENDOR_ID 0x12D1 -#define NOVATEL_VENDOR_ID 0x1410 -#define ZTE_VENDOR_ID 0x19D2 -#define DELL_VENDOR_ID 0x413C -#define REALTEK_VENDOR_ID 0x0bda -#define SAMSUNG_VENDOR_ID 0x04e8 -#define LENOVO_VENDOR_ID 0x17ef -#define LINKSYS_VENDOR_ID 0x13b1 -#define NVIDIA_VENDOR_ID 0x0955 -#define HP_VENDOR_ID 0x03f0 -#define MICROSOFT_VENDOR_ID0x045e -#define UBLOX_VENDOR_ID0x1546 -#define TPLINK_VENDOR_ID 0x2357 -#define AQUANTIA_VENDOR_ID 0x2eca -#define ASIX_VENDOR_ID 0x0b95 - static const struct usb_device_id products[] = { /* BLACKLIST !! 
* @@ -661,49 +646,49 @@ static const struct usb_device_id products[] = { /* Novatel USB551L and MC551 - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(NOVATEL_VENDOR_ID, 0xB001, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_NOVATEL, 0xB001, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, /* Novatel E362 - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(NOVATEL_VENDOR_ID, 0x9010, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_NOVATEL, 0x9010, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, /* Dell Wireless 5800 (Novatel E362) - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(DELL_VENDOR_ID, 0x8195, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_DELL, 0x8195, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, /* Dell Wireless 5800 (Novatel E362) - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(DELL_VENDOR_ID, 0x8196, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_DELL, 0x8196, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, /* Dell Wireless 5804 (Novatel E371) - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(DELL_VENDOR_ID, 0x819b, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_DELL, 0x819b, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, /* Novatel Expedite E371 - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(NOVATEL_VENDOR_ID, 0x9011, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_NOVATEL, 0x9011, USB_CLASS_COMM, + 
USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, /* HP lt2523 (Novatel E371) - handled by qmi_wwan */ { - USB_DEVICE_AND_INTERFACE_INFO(HP_VENDOR_ID, 0x421d, USB_CLASS_COMM, + USB_DEVICE_AND_INTERFACE_INFO(USB_VENDOR_ID_HP, 0x421d, USB_CLASS_COMM, USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), .driver_info = 0, }, @@ -717,127 +702,127 @@ static const struct usb_device_id products[] = { /* Huawei E1820 - handled by qmi_wwan */ { - USB_DEVICE_INTERFACE_NUMBER(HUAWEI_VENDOR_ID, 0x14ac, 1), + USB_DEVICE_INTERFACE_NUMBER(USB_VENDOR_ID_HUAWEI, 0x14ac, 1), .driver_info = 0, }, /* Realtek RTL8152 Based USB 2.0 Ethernet Adapters */ { - USB_DEVICE_AND_INTERFACE_INFO(REALTEK_VENDOR_ID, 0x8152, USB_CLASS_COMM, - USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + USB_DEVICE_AND_INTER
[PATCH net-next v3 0/2] drivers/net/usb: support ECM mode for RTL8153
v3: Move original patch to #2. And add a new patch #1 to consolidate vendor ID of USB devices. v2: Add include/linux/usb/r8152.h to avoid the warning about no previous prototype for rtl8152_get_version. Hayes Wang (2): include/linux/usb: new header file for the vendor ID of USB devices net/usb/r8153_ecm: support ECM mode for RTL8153 drivers/net/usb/Makefile | 2 +- drivers/net/usb/cdc_ether.c | 93 +++-- drivers/net/usb/r8152.c | 68 + drivers/net/usb/r8153_ecm.c | 162 ++ include/linux/usb/r8152.h | 30 ++ include/linux/usb/usb_vendor_id.h | 51 ++ 6 files changed, 306 insertions(+), 100 deletions(-) create mode 100644 drivers/net/usb/r8153_ecm.c create mode 100644 include/linux/usb/r8152.h create mode 100644 include/linux/usb/usb_vendor_id.h -- 2.26.2
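The pla_read_word() helper in patch 2/2 reads a 16-bit register through a 4-byte-aligned 32-bit access. A rough sketch of that index arithmetic follows. BYTE_EN_WORD is the value shown in the r8152 defines, while the 16-bit result mask (DEMO_WORD_MASK, 0xffff) is an assumption, since the constant is cut off in the archived patch. The demo_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BYTE_EN_WORD	0x33	/* from the r8152 defines in the patch */
#define DEMO_WORD_MASK	0xffffu	/* assumption: value is elided in the archive */

/* Byte-enable bits for the 16-bit word at 'index' within its dword. */
static uint16_t demo_byte_enable(uint16_t index)
{
	uint16_t byen = BYTE_EN_WORD;
	uint8_t shift = index & 2;	/* 0 for the low word, 2 for the high word */

	if (shift)
		byen <<= shift;
	return byen;
}

/* Register index rounded down to the enclosing 4-byte boundary. */
static uint16_t demo_aligned_index(uint16_t index)
{
	return index & ~3;
}

/* Pick the requested 16-bit word out of the 32-bit value that was read. */
static uint16_t demo_extract_word(uint32_t dword, uint16_t index)
{
	uint8_t shift = index & 2;

	return (dword >> (shift * 8)) & DEMO_WORD_MASK;
}
```

For an index ending in 2 or 6 (for example 0xe86e), the access is issued at the aligned index with the byte enable shifted to the upper half, and the result is taken from the upper 16 bits of the dword.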
RE: [PATCH net-next v2] net/usb/r8153_ecm: support ECM mode for RTL8153
Greg Kroah-Hartman > Sent: Tuesday, November 3, 2020 5:33 PM [...] > There is a reason, it's a nightmare to maintain and handle merges for, > just don't do it. > > Read the comments at the top of the pci_ids.h file if you are curious > why we don't even do this for PCI device ids anymore for the past 10+ > years. > > So no, please do not create such a common file, it is not needed or a > good idea. Oops. I have sent it.
Re: [PATCH net-next v3 1/2] include/linux/usb: new header file for the vendor ID of USB devices
On Tue, Nov 03, 2020 at 05:46:37PM +0800, Hayes Wang wrote: > diff --git a/include/linux/usb/usb_vendor_id.h > b/include/linux/usb/usb_vendor_id.h > new file mode 100644 > index ..23b6e6849515 > --- /dev/null > +++ b/include/linux/usb/usb_vendor_id.h > @@ -0,0 +1,51 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > + No, this is not ok, sorry. Please see the top of the pci_ids.h file for why we do not do this. There is nothing wrong with putting the individual ids in the different drivers; we don't want one single huge file that is a pain for merges and builds. We learn from our past mistakes, please do not fail to learn from history :) thanks, greg k-h
Re: [PATCH v7 0/6] CTU CAN FD open-source IP core SocketCAN driver, PCI, platform integration and documentation
Hello Marc,

thanks for the response.

On Saturday 31 of October 2020 12:35:11 Marc Kleine-Budde wrote:
> On 10/30/20 11:19 PM, Pavel Pisa wrote:
> > This driver adds support for the CTU CAN FD open-source IP core.
>
> Please fix the following checkpatch warnings/errors:

Yes, I will recheck with the current checkpatch. I used the one from 5.4, so something may have been overlooked during the last updates.

> -
> drivers/net/can/ctucanfd/ctucanfd_frame.h
> -
> CHECK: Please don't use multiple blank lines
> #46: FILE: drivers/net/can/ctucanfd/ctucanfd_frame.h:46:

OK, we will find the reason for this blank line in the header generator.

> CHECK: Prefer kernel type 'u32' over 'uint32_t'
> #49: FILE: drivers/net/can/ctucanfd/ctucanfd_frame.h:49:
> + uint32_t u32;

In this case, please confirm that your personal opinion is also against uint32_t in headers and that you request the change. uint32_t is used in many kernel headers, and in this case it allows our tooling to use the headers to cross-check that the HDL design matches the HW access in C. If the reasons to remove uint32_t prevail, we need to separate the Linux generator from the one used for other purposes. When we add a Linux mode, we can revamp the headers even more, and in that case we can even invest the time to switch from structure bitfields to plain bitmask defines.

It is quite a lot of work and takes some time, but if there is consensus I will do it during the next weeks. I would like to see what the preferred way to define register bitfields is. I personally like the RTEMS approach, for which we prepared a generator from parsed PDFs when we added the BSP for TMS570:

https://git.rtems.org/rtems/tree/bsps/arm/tms570/include/bsp/ti_herc/reg_dcan.h#n152

The other solution I like (biased, because I designed it myself) is

    #define __val2mfld(mask,val) (((mask)&~((mask)<<1))*(val)&(mask))
    #define __mfld2val(mask,val) (((val)&(mask))/((mask)&~((mask)<<1)))

https://gitlab.com/pikron/sw-base/sysless/-/blob/master/arch/arm/generic/defines/cpu_def.h#L314

which allows the use of simple masks, i.e.

    #define SSP_CR0_DSS_m 0x000f /* Data Size Select (num bits - 1) */
    #define SSP_CR0_FRF_m 0x0030 /* Frame Format: 0 SPI, 1 TI, 2 Microwire */
    #define SSP_CR0_CPOL_m 0x0040 /* SPI Clock Polarity. 0 low between frames, 1 high */

https://gitlab.com/pikron/sw-base/sysless/-/blob/master/libs4c/spi/spi_lpcssp.c#L46

and in the sources

    lpcssp_drv->ssp_regs->CR0 =
            __val2mfld(SSP_CR0_DSS_m, lpcssp_drv->data16_fl ? 16 - 1 : 8 - 1) |
            __val2mfld(SSP_CR0_FRF_m, 0) |
            (msg->size_mode & SPI_MODE_CPOL ? SSP_CR0_CPOL_m : 0) |
            (msg->size_mode & SPI_MODE_CPHA ? SSP_CR0_CPHA_m : 0) |
            __val2mfld(SSP_CR0_SCR_m, rate);

https://gitlab.com/pikron/sw-base/sysless/-/blob/master/libs4c/spi/spi_lpcssp.c#L217

If you have a preferred Linux style, then please send us pointers. In fact, Ondrej Ille based his structure bitfield style on another driver included in the Linux kernel, and it seems to be a problem now. So when I invest my time, I want to use a style that pleases both me and others.

Thanks for the support and best wishes,

Pavel Pisa
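The mask-based field macros proposed in the message above can be tried in a small standalone sketch. The arithmetic is the same as in __val2mfld()/__mfld2val(), but the macro names, the unsigned suffixes on the masks and the ssp_cr0_pack() helper are illustrative assumptions, and the approach assumes each mask is one contiguous run of bits.

```c
#include <stdint.h>

/* Lowest set bit of a contiguous mask, e.g. 0x0030 -> 0x0010. */
#define FIELD_LSB(mask)      ((mask) & ~((mask) << 1))

/* Place a field value under its mask (same arithmetic as __val2mfld). */
#define VAL2MFLD(mask, val)  ((FIELD_LSB(mask) * (val)) & (mask))

/* Extract a field value from a register word (same arithmetic as __mfld2val). */
#define MFLD2VAL(mask, reg)  (((reg) & (mask)) / FIELD_LSB(mask))

/* Mask values taken from the message; the 'u' suffix is added here. */
#define SSP_CR0_DSS_m   0x000fu /* Data Size Select (num bits - 1) */
#define SSP_CR0_FRF_m   0x0030u /* Frame Format */
#define SSP_CR0_CPOL_m  0x0040u /* SPI Clock Polarity */

/* Hypothetical helper: build a CR0 word from its individual fields. */
static inline uint32_t ssp_cr0_pack(unsigned int bits, unsigned int frame_format,
                                    int cpol)
{
    return VAL2MFLD(SSP_CR0_DSS_m, bits - 1) |
           VAL2MFLD(SSP_CR0_FRF_m, frame_format) |
           (cpol ? SSP_CR0_CPOL_m : 0u);
}
```

Because the shift amount is derived from the mask itself, only one define per field is needed; the cost is a multiply and a divide by a constant, which a compiler can normally fold into shifts.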
Re: [PATCH v2 0/8] slab: provide and use krealloc_array()
On Tue, Nov 3, 2020 at 5:14 AM Joe Perches wrote: > > On Mon, 2020-11-02 at 16:20 +0100, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski > > > > Andy brought to my attention the fact that users allocating an array of > > equally sized elements should check if the size multiplication doesn't > > overflow. This is why we have helpers like kmalloc_array(). > > > > However we don't have krealloc_array() equivalent and there are many > > users who do their own multiplication when calling krealloc() for arrays. > > > > This series provides krealloc_array() and uses it in a couple places. > > My concern about this is a possible assumption that __GFP_ZERO will > work, and as far as I know, it will not. > Yeah so I had this concern for devm_krealloc() and even sent a patch that extended it to honor __GFP_ZERO before I noticed that regular krealloc() silently ignores __GFP_ZERO. I'm not sure if this is on purpose. Maybe we should either make krealloc() honor __GFP_ZERO or explicitly state in its documentation that it ignores it? This concern isn't really related to this patch as such - it's more of a general krealloc() inconsistency. Bartosz
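To make the two points in this exchange concrete, here is a userspace sketch, not the kernel implementation: realloc_array() only illustrates the overflow check that an array-style reallocation helper adds, and realloc_array_zero() is a hypothetical variant showing why zeroing the grown tail needs the old element count, which a plain reallocation call does not have.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of an overflow-checked array reallocation.
 * Returns NULL, leaving the old buffer untouched, if new_n * size would
 * overflow or if realloc() fails.
 */
static void *realloc_array(void *p, size_t new_n, size_t size)
{
    if (size != 0 && new_n > SIZE_MAX / size)
        return NULL;
    return realloc(p, new_n * size);
}

/*
 * Hypothetical variant that also zeroes the newly added elements.
 * The caller has to pass the old element count explicitly.
 */
static void *realloc_array_zero(void *p, size_t old_n, size_t new_n, size_t size)
{
    void *q = realloc_array(p, new_n, size);

    if (q && new_n > old_n)
        memset((char *)q + old_n * size, 0, (new_n - old_n) * size);
    return q;
}
```

The extra old_n parameter is the design cost of zeroing on growth: without it the helper cannot know where the previously valid data ends.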
RE: [PATCH net-next 6/7] drivers: net: smc911x: Fix cast from pointer to integer of different size
From: Jakub Kicinski > Sent: 02 November 2020 23:48 > > On Sat, 31 Oct 2020 01:49:57 +0100 Andrew Lunn wrote: > > drivers/net/ethernet/smsc/smc911x.c: In function > > ‘smc911x_hardware_send_pkt’: > > drivers/net/ethernet/smsc/smc911x.c:471:11: warning: cast from pointer to > > integer of different size > [-Wpointer-to-int-cast] > > 471 | cmdA = (((u32)skb->data & 0x3) << 16) | > > > > When built on 64bit targets, the skb->data pointer cannot be cast to a > > u32 in a meaningful way. Use long instead. > > > > Signed-off-by: Andrew Lunn > > --- > > drivers/net/ethernet/smsc/smc911x.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/ethernet/smsc/smc911x.c > > b/drivers/net/ethernet/smsc/smc911x.c > > index 4ec292563f38..f37832540364 100644 > > --- a/drivers/net/ethernet/smsc/smc911x.c > > +++ b/drivers/net/ethernet/smsc/smc911x.c > > @@ -466,9 +466,9 @@ static void smc911x_hardware_send_pkt(struct net_device > > *dev) > > TX_CMD_A_INT_FIRST_SEG_ | TX_CMD_A_INT_LAST_SEG_ | > > skb->len; > > #else > > - buf = (char*)((u32)skb->data & ~0x3); > > - len = (skb->len + 3 + ((u32)skb->data & 3)) & ~0x3; > > - cmdA = (((u32)skb->data & 0x3) << 16) | > > + buf = (char *)((long)skb->data & ~0x3); > > + len = (skb->len + 3 + ((long)skb->data & 3)) & ~0x3; > > + cmdA = (((long)skb->data & 0x3) << 16) | > > Probably best if you swap the (long) for something unsigned here as > well. It would be much clearer with a temporary variable: offset = (unsigned long)skb->data & 3; buf = skb->data - offset; len = skb->len + offset; cmdA = offset << 16 | ... David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
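The temporary-variable suggestion can be sketched outside the driver as follows. This is an illustration under assumptions: struct tx_align and tx_align_compute() are hypothetical names, uintptr_t stands in for the unsigned long cast that kernel code would use, and the round-up to a multiple of four is kept from the original expression in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, not part of the driver. */
struct tx_align {
    const unsigned char *buf; /* data pointer rounded down to 4 bytes */
    size_t len;               /* length incl. leading offset, rounded up to 4 */
    uint32_t cmd_offset;      /* offset shifted into bits 16..17 */
};

static struct tx_align tx_align_compute(const unsigned char *data, size_t data_len)
{
    struct tx_align a;
    /* An unsigned, pointer-sized integer type avoids the truncating cast. */
    unsigned int offset = (unsigned int)((uintptr_t)data & 3);

    a.buf = data - offset;
    a.len = (data_len + offset + 3) & ~(size_t)3;
    a.cmd_offset = (uint32_t)offset << 16;
    return a;
}
```

Computing the offset once means the pointer is converted to an integer in a single place, so the choice of integer type only has to be right there.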
RE: [PATCH net-next] drivers: net: sky2: Fix -Wstringop-truncation with W=1
From: Jakub Kicinski > Sent: 03 November 2020 00:01 > > On Sat, 31 Oct 2020 18:40:28 +0100 Andrew Lunn wrote: > > In function ‘strncpy’, > > inlined from ‘sky2_name’ at drivers/net/ethernet/marvell/sky2.c:4903:3, > > inlined from ‘sky2_probe’ at drivers/net/ethernet/marvell/sky2.c:5049:2: > > ./include/linux/string.h:297:30: warning: ‘__builtin_strncpy’ specified > > bound 16 equals destination > size [-Wstringop-truncation] > > > > None of the device names are 16 characters long, so it was never an > > issue, but reduce the length of the buffer size by one to avoid the > > warning. > > > > Signed-off-by: Andrew Lunn > > --- > > drivers/net/ethernet/marvell/sky2.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/marvell/sky2.c > > b/drivers/net/ethernet/marvell/sky2.c > > index 25981a7a43b5..35b0ec5afe13 100644 > > --- a/drivers/net/ethernet/marvell/sky2.c > > +++ b/drivers/net/ethernet/marvell/sky2.c > > @@ -4900,7 +4900,7 @@ static const char *sky2_name(u8 chipid, char *buf, > > int sz) > > }; > > > > if (chipid >= CHIP_ID_YUKON_XL && chipid <= CHIP_ID_YUKON_OP_2) > > - strncpy(buf, name[chipid - CHIP_ID_YUKON_XL], sz); > > + strncpy(buf, name[chipid - CHIP_ID_YUKON_XL], sz - 1); > > Hm. This irks the eye a little. AFAIK the idiomatic code would be: > > strncpy(buf, name..., sz - 1); > buf[sz - 1] = '\0'; > > Perhaps it's easier to convert to strscpy()/strscpy_pad()? > > > else > > snprintf(buf, sz, "(chip %#x)", chipid); > > return buf; Is the pad needed? It isn't present in the 'else' branch. David
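The idiom quoted in this reply can be checked in userspace with a small sketch. copy_name() is a hypothetical helper name; it is not the driver code and not the kernel's strscpy().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace helper: copy src into buf of size sz, truncating
 * if necessary and always NUL-terminating. Returns buf.
 */
static char *copy_name(char *buf, size_t sz, const char *src)
{
    if (sz == 0)
        return buf;
    strncpy(buf, src, sz - 1); /* copies at most sz - 1 bytes, NUL-pads the rest */
    buf[sz - 1] = '\0';        /* guarantees termination when src is too long */
    return buf;
}
```

On the padding question: strncpy() fills the remainder of the destination with NUL bytes when the source is shorter, whereas snprintf() writes only the formatted string and one terminator, so the two branches differ in padding but both leave a terminated string.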
[PATCH net-next v2 00/15] net/smc: extend diagnostic netlink interface
Please apply the following patch series for smc to netdev's net-next tree. This patch series refactors the current netlink API in smc_diag module which is used for diagnostic purposes and extends the netlink API in a backward compatible way so that the extended API can provide information about SMC linkgroups, links and devices (both for SMC-R and SMC-D) and can still work with the legacy netlink API. Please note that patch 9 triggers a checkpatch warning because a comment line was added using the style of the already existing comment block. v2: in patch 10, add missing include to uapi header smc_diag.h Guvenc Gulce (14): net/smc: Use active link of the connection net/smc: Add connection counters for links net/smc: Add link counters for IB device ports net/smc: Add diagnostic information to smc ib-device net/smc: Add diagnostic information to link structure net/smc: Refactor the netlink reply processing routine net/smc: Add ability to work with extended SMC netlink API net/smc: Introduce SMCR get linkgroup command net/smc: Introduce SMCR get link command net/smc: Add SMC-D Linkgroup diagnostic support net/smc: Add support for obtaining SMCD device list net/smc: Add support for obtaining SMCR device list net/smc: Refactor smc ism v2 capability handling net/smc: Add support for obtaining system information Karsten Graul (1): net/smc: use helper smc_conn_abort() in listen processing include/net/smc.h | 2 +- include/uapi/linux/smc.h | 8 + include/uapi/linux/smc_diag.h | 109 + net/smc/af_smc.c | 29 +- net/smc/smc.h | 5 +- net/smc/smc_clc.c | 6 + net/smc/smc_clc.h | 1 + net/smc/smc_core.c| 32 +- net/smc/smc_core.h| 32 +- net/smc/smc_diag.c| 766 +- net/smc/smc_ib.c | 49 +++ net/smc/smc_ib.h | 4 +- net/smc/smc_ism.c | 12 +- net/smc/smc_ism.h | 5 +- net/smc/smc_pnet.c| 3 + 15 files changed, 939 insertions(+), 124 deletions(-) -- 2.17.1
[PATCH net-next v2 10/15] net/smc: Introduce SMCR get link command
From: Guvenc Gulce Introduce get link command which loops through all available links of all available link groups. It uses the SMC-R linkgroup list as entry point, not the socket list, which makes linkgroup diagnosis possible, in case linkgroup does not contain active connections anymore. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/uapi/linux/smc_diag.h | 8 + net/smc/smc_diag.c| 62 ++- 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index 6ae028344b6d..a57df0296aa4 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -79,6 +80,7 @@ enum { /* SMC_DIAG_GET_LGR_INFO command extensions */ enum { SMC_DIAG_LGR_INFO_SMCR = 1, + SMC_DIAG_LGR_INFO_SMCR_LINK, }; #define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1) @@ -129,6 +131,12 @@ struct smc_diag_linkinfo { __u8 ibport;/* RDMA device port number */ __u8 gid[40]; /* local GID */ __u8 peer_gid[40]; /* peer GID */ + /* Fields above used by legacy v1 code */ + __u32 conn_cnt; + __u8 netdev[IFNAMSIZ]; /* ethernet device name */ + __u8 link_uid[4]; /* unique link id */ + __u8 peer_link_uid[4]; /* unique peer link id */ + __u32 link_state; /* link state */ }; struct smc_diag_lgrinfo { diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index c53904b3f350..6885814b6e4f 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -20,6 +20,7 @@ #include #include "smc.h" +#include "smc_ib.h" #include "smc_core.h" struct smc_diag_dump_ctx { @@ -203,6 +204,54 @@ static bool smc_diag_fill_dmbinfo(struct sock *sk, struct sk_buff *skb) return true; } +static int smc_diag_fill_lgr_link(struct smc_link_group *lgr, + struct smc_link *link, + struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_req_v2 *req) +{ + struct smc_diag_linkinfo link_info; + int dummy = 0, rc = 0; + struct nlmsghdr *nlh; + + nlh = nlmsg_put(skb, 
NETLINK_CB(cb->skb).portid, MAGIC_SEQ_V2_ACK, + cb->nlh->nlmsg_type, 0, NLM_F_MULTI); + + memset(&link_info, 0, sizeof(link_info)); + link_info.link_state = link->state; + link_info.link_id = link->link_id; + link_info.conn_cnt = atomic_read(&link->conn_cnt); + link_info.ibport = link->ibport; + + memcpy(link_info.link_uid, link->link_uid, + sizeof(link_info.link_uid)); + snprintf(link_info.ibname, sizeof(link_info.ibname), "%s", +link->ibname); + snprintf(link_info.netdev, sizeof(link_info.netdev), "%s", +link->ndevname); + memcpy(link_info.peer_link_uid, link->peer_link_uid, + sizeof(link_info.peer_link_uid)); + + smc_gid_be16_convert(link_info.gid, +link->gid); + smc_gid_be16_convert(link_info.peer_gid, +link->peer_gid); + + /* Just a command place holder to signal back the command reply type */ + if (nla_put(skb, SMC_DIAG_GET_LGR_INFO, sizeof(dummy), &dummy) < 0) + goto errout; + if (nla_put(skb, SMC_DIAG_LGR_INFO_SMCR_LINK, + sizeof(link_info), &link_info) < 0) + goto errout; + + nlmsg_end(skb, nlh); + return rc; + +errout: + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; +} + static int smc_diag_fill_lgr(struct smc_link_group *lgr, struct sk_buff *skb, struct netlink_callback *cb, @@ -238,7 +287,7 @@ static int smc_diag_handle_lgr(struct smc_link_group *lgr, struct smc_diag_req_v2 *req) { struct nlmsghdr *nlh; - int rc = 0; + int i, rc = 0; nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, MAGIC_SEQ_V2_ACK, cb->nlh->nlmsg_type, 0, NLM_F_MULTI); @@ -250,6 +299,17 @@ static int smc_diag_handle_lgr(struct smc_link_group *lgr, goto errout; nlmsg_end(skb, nlh); + + if ((req->cmd_ext & (1 << (SMC_DIAG_LGR_INFO_SMCR_LINK - 1)))) { + for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { + if (!smc_link_usable(&lgr->lnk[i])) + continue; + rc = smc_diag_fill_lgr_link(lgr, &lgr->lnk[i], skb, + cb, req); + if (rc < 0) + goto errout; + } + } return rc; errout: -- 2.17.1
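The request handling in this patch tests one bit of cmd_ext per command extension, with extension number n mapped to bit n - 1. A minimal sketch of that mapping follows; the enum and function names are illustrative and are not the uapi definitions.

```c
#include <stdint.h>

/* Illustrative extension numbers; like the enum in the patch they start at 1. */
enum {
    EXT_SMCR = 1,
    EXT_SMCR_LINK,
    EXT_SMCD,
};

/* Extension n is requested through bit (n - 1) of the mask. */
static inline uint32_t ext_bit(unsigned int ext)
{
    return UINT32_C(1) << (ext - 1);
}

/* Non-zero if the request mask asks for the given extension. */
static inline int ext_requested(uint32_t cmd_ext, unsigned int ext)
{
    return (cmd_ext & ext_bit(ext)) != 0;
}
```

Starting the numbering at 1 keeps 0 free as "no extension", at the cost of the "- 1" in every bit computation.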
[PATCH net-next v2 05/15] net/smc: Add diagnostic information to smc ib-device
From: Guvenc Gulce During smc ib-device creation, add network device name to smc ib-device structure. Register for netdevice name changes and update ib-device accordingly. This is needed for diagnostic purposes. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- net/smc/smc_ib.c | 47 ++ net/smc/smc_ib.h | 2 ++ net/smc/smc_pnet.c | 3 +++ 3 files changed, 52 insertions(+) diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c index 1c314dbdc7fa..c4a04e868bf0 100644 --- a/net/smc/smc_ib.c +++ b/net/smc/smc_ib.c @@ -557,6 +557,52 @@ static void smc_ib_cleanup_per_ibdev(struct smc_ib_device *smcibdev) static struct ib_client smc_ib_client; +static void smc_copy_netdev_name(struct smc_ib_device *smcibdev, int port) +{ + struct ib_device *ibdev = smcibdev->ibdev; + struct net_device *ndev; + + if (ibdev->ops.get_netdev) { + ndev = ibdev->ops.get_netdev(ibdev, port + 1); + if (ndev) { + snprintf((char *)&smcibdev->netdev[port], +sizeof(smcibdev->netdev[port]), +"%s", ndev->name); + dev_put(ndev); + } + } +} + +void smc_ib_ndev_name_change(struct net_device *ndev) +{ + struct smc_ib_device *smcibdev; + struct ib_device *libdev; + struct net_device *lndev; + u8 port_cnt; + int i; + + mutex_lock(&smc_ib_devices.mutex); + list_for_each_entry(smcibdev, &smc_ib_devices.list, list) { + port_cnt = smcibdev->ibdev->phys_port_cnt; + for (i = 0; +i < min_t(size_t, port_cnt, SMC_MAX_PORTS); +i++) { + libdev = smcibdev->ibdev; + if (libdev->ops.get_netdev) { + lndev = libdev->ops.get_netdev(libdev, i + 1); + if (lndev) + dev_put(lndev); + if (lndev == ndev) { + snprintf((char *)&smcibdev->netdev[i], +sizeof(smcibdev->netdev[i]), +"%s", ndev->name); + } + } + } + } + mutex_unlock(&smc_ib_devices.mutex); +} + /* callback function for ib_register_client() */ static int smc_ib_add_dev(struct ib_device *ibdev) { @@ -596,6 +642,7 @@ static int smc_ib_add_dev(struct ib_device *ibdev) if (smc_pnetid_by_dev_port(ibdev->dev.parent, i, smcibdev->pnetid[i])) smc_pnetid_by_table_ib(smcibdev, i 
+ 1); + smc_copy_netdev_name(smcibdev, i); pr_warn_ratelimited("smc:ib device %s port %d has pnetid " "%.16s%s\n", smcibdev->ibdev->name, i + 1, diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h index 3e6bfeddd53b..b0868146b46b 100644 --- a/net/smc/smc_ib.h +++ b/net/smc/smc_ib.h @@ -54,11 +54,13 @@ struct smc_ib_device { /* ib-device infos for smc */ wait_queue_head_t lnks_deleted; /* wait 4 removal of all links*/ struct mutexmutex; /* protect dev setup+cleanup */ atomic_tlnk_cnt_by_port[SMC_MAX_PORTS];/*#lnk per port*/ + u8 netdev[SMC_MAX_PORTS][IFNAMSIZ];/* ndev names */ }; struct smc_buf_desc; struct smc_link; +void smc_ib_ndev_name_change(struct net_device *ndev); int smc_ib_register_client(void) __init; void smc_ib_unregister_client(void); bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport); diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index f3c18b991d35..b0f40d73afd6 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -828,6 +828,9 @@ static int smc_pnet_netdev_event(struct notifier_block *this, case NETDEV_UNREGISTER: smc_pnet_remove_by_ndev(event_dev); return NOTIFY_OK; + case NETDEV_CHANGENAME: + smc_ib_ndev_name_change(event_dev); + return NOTIFY_OK; case NETDEV_REGISTER: smc_pnet_add_by_ndev(event_dev); return NOTIFY_OK; -- 2.17.1
[PATCH net-next v2 04/15] net/smc: Add link counters for IB device ports
From: Guvenc Gulce Add link counters to the structure of the smc ib device, one counter per ib port. Increase/decrease the counters as needed in the corresponding routines. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- net/smc/smc_core.c | 3 +++ net/smc/smc_ib.h | 1 + 2 files changed, 4 insertions(+) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 6e2077161267..da94725deb09 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -316,6 +316,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, lnk->link_idx = link_idx; lnk->smcibdev = ini->ib_dev; lnk->ibport = ini->ib_port; + atomic_inc(&ini->ib_dev->lnk_cnt_by_port[ini->ib_port - 1]); lnk->path_mtu = ini->ib_dev->pattr[ini->ib_port - 1].active_mtu; atomic_set(&lnk->conn_cnt, 0); smc_llc_link_set_uid(lnk); @@ -360,6 +361,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, smc_llc_link_clear(lnk, false); out: put_device(&ini->ib_dev->ibdev->dev); + atomic_dec(&ini->ib_dev->lnk_cnt_by_port[ini->ib_port - 1]); memset(lnk, 0, sizeof(struct smc_link)); lnk->state = SMC_LNK_UNUSED; if (!atomic_dec_return(&ini->ib_dev->lnk_cnt)) @@ -750,6 +752,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log) smc_ib_dealloc_protection_domain(lnk); smc_wr_free_link_mem(lnk); put_device(&lnk->smcibdev->ibdev->dev); + atomic_dec(&lnk->smcibdev->lnk_cnt_by_port[lnk->ibport - 1]); smcibdev = lnk->smcibdev; memset(lnk, 0, sizeof(struct smc_link)); lnk->state = SMC_LNK_UNUSED; diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h index 2ce481187dd0..3e6bfeddd53b 100644 --- a/net/smc/smc_ib.h +++ b/net/smc/smc_ib.h @@ -53,6 +53,7 @@ struct smc_ib_device {/* ib-device infos for smc */ atomic_tlnk_cnt;/* number of links on ibdev */ wait_queue_head_t lnks_deleted; /* wait 4 removal of all links*/ struct mutexmutex; /* protect dev setup+cleanup */ + atomic_tlnk_cnt_by_port[SMC_MAX_PORTS];/*#lnk per port*/ }; struct smc_buf_desc; -- 2.17.1
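The counting pattern in this patch (increment when a link is initialised, decrement on the error path and when the link is cleared) can be sketched with C11 atomics. The structure and function names below are hypothetical stand-ins, not the SMC code, and MAX_PORTS only mirrors the role of SMC_MAX_PORTS.

```c
#include <stdatomic.h>

#define MAX_PORTS 2 /* plays the role of SMC_MAX_PORTS */

/* Hypothetical stand-in for the per-device structure. */
struct dev_counters {
    atomic_int lnk_cnt_by_port[MAX_PORTS];
};

/* Ports are numbered from 1, so port N uses slot N - 1. */
static void link_get(struct dev_counters *d, int port)
{
    atomic_fetch_add(&d->lnk_cnt_by_port[port - 1], 1);
}

static void link_put(struct dev_counters *d, int port)
{
    atomic_fetch_sub(&d->lnk_cnt_by_port[port - 1], 1);
}

/* Hypothetical init that may fail part-way and must undo its increment. */
static int link_init(struct dev_counters *d, int port, int fail)
{
    link_get(d, port);
    if (fail) {
        link_put(d, port); /* error path: restore the counter */
        return -1;
    }
    return 0;
}
```

Every successful link_init() needs exactly one matching link_put() at teardown; the error path inside link_init() is the only other place where the counter is decremented.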
[PATCH net-next v2 12/15] net/smc: Add support for obtaining SMCD device list
From: Guvenc Gulce Deliver SMCD device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/uapi/linux/smc.h | 2 + include/uapi/linux/smc_diag.h | 20 + net/smc/smc_core.h| 27 + net/smc/smc_diag.c| 76 +++ net/smc/smc_ib.h | 1 - 5 files changed, 125 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h index 635e2c2aeac5..736e8b98c8a5 100644 --- a/include/uapi/linux/smc.h +++ b/include/uapi/linux/smc.h @@ -38,4 +38,6 @@ enum {/* SMC PNET Table commands */ #define SMC_LGR_ID_SIZE4 #define SMC_MAX_HOSTNAME_LEN 32 /* Max length of hostname */ #define SMC_MAX_EID_LEN32 /* Max length of eid */ +#define SMC_MAX_PORTS 2 /* Max # of ports per ib device */ +#define SMC_PCI_ID_STR_LEN 16 /* Max length of pci id string */ #endif /* _UAPI_LINUX_SMC_H */ diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index 5a80172df757..ab8f76bdd1a4 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -74,6 +74,7 @@ enum { /* V2 Commands */ enum { SMC_DIAG_GET_LGR_INFO = SMC_DIAG_EXTS_PER_CMD, + SMC_DIAG_GET_DEV_INFO, __SMC_DIAG_EXT_MAX, }; @@ -84,6 +85,11 @@ enum { SMC_DIAG_LGR_INFO_SMCD, }; +/* SMC_DIAG_GET_DEV_INFO command extensions */ +enum { + SMC_DIAG_DEV_INFO_SMCD = 1, +}; + #define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1) #define SMC_DIAG_EXT_MAX (__SMC_DIAG_EXT_MAX - 1) @@ -164,6 +170,20 @@ struct smcd_diag_dmbinfo { /* SMC-D Socket internals */ struct smc_diag_v2_lgr_info v2_lgr_info; /* SMCv2 info */ }; +struct smc_diag_dev_info { + /* Pnet ID per device port */ + __u8pnet_id[SMC_MAX_PORTS][SMC_MAX_PNETID_LEN]; + /* whether pnetid is set by user */ + __u8pnetid_by_user[SMC_MAX_PORTS]; + __u32 use_cnt;/* Number of linkgroups */ + __u8is_critical;/* Is device critical */ + __u32 pci_fid;/* PCI FID */ + __u16 pci_pchid; /* PCI CHID */ + __u16 pci_vendor; /* PCI Vendor */ + __u16 pci_device; /* PCI Device Vendor ID */ + 
__u8pci_id[SMC_PCI_ID_STR_LEN]; /* PCI ID */ +}; + struct smc_diag_lgr { __u8lgr_id[SMC_LGR_ID_SIZE]; /* Linkgroup identifier */ __u8lgr_role; /* Linkgroup role */ diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 639c7565b302..0f966a21c223 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -13,6 +13,7 @@ #define _SMC_CORE_H #include +#include #include #include "smc.h" @@ -366,6 +367,32 @@ static inline bool smc_link_active(struct smc_link *lnk) return lnk->state == SMC_LNK_ACTIVE; } +struct smc_pci_dev { + __u32 pci_fid; + __u16 pci_pchid; + __u16 pci_vendor; + __u16 pci_device; + __u8pci_id[SMC_PCI_ID_STR_LEN]; +}; + +static inline void smc_set_pci_values(struct pci_dev *pci_dev, + struct smc_pci_dev *smc_dev) +{ + smc_dev->pci_vendor = pci_dev->vendor; + smc_dev->pci_device = pci_dev->device; + snprintf(smc_dev->pci_id, sizeof(smc_dev->pci_id), "%s", +pci_name(pci_dev)); +#if IS_ENABLED(CONFIG_S390) + { + struct zpci_dev *zdev; + + zdev = to_zpci(pci_dev); + smc_dev->pci_fid = zdev->fid; + smc_dev->pci_pchid = zdev->pchid; + } +#endif +} + struct smc_sock; struct smc_clc_msg_accept_confirm; struct smc_clc_msg_local; diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index fcff07a9ea47..252aae0b11d9 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -448,6 +448,78 @@ static int smc_diag_fill_smcd_dev(struct smcd_dev_list *dev_list, return rc; } +static int smc_diag_handle_smcd_dev(struct smcd_dev *smcd, + struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_req_v2 *req) +{ + struct smc_diag_dev_info smc_diag_dev; + struct smc_pci_dev smc_pci_dev; + struct nlmsghdr *nlh; + int dummy = 0; + int rc = 0; + + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, MAGIC_SEQ_V2_ACK, + cb->nlh->nlmsg_type, 0, NLM_F_MULTI); + if (!nlh) + return -EMSGSIZE; + + memset(&smc_diag_dev, 0, sizeof(smc_diag_dev)); + memset(&smc_pci_dev, 0, sizeof(smc_pci_dev)); + smc_diag_dev.use_cnt = atomic_read(&smcd->lgr_cnt); + smc_diag_
[PATCH net-next v2 06/15] net/smc: Add diagnostic information to link structure
From: Guvenc Gulce During link creation add network and ib-device name to link structure. This is needed for diagnostic purposes. When diagnostic information is gathered, we need to traverse device, linkgroup and link structures, to be able to do that we need to hold a spinlock for the linkgroup list, without this diagnostic information in link structure, another device list mutex holding would be necessary to dereference the device pointer in the link structure which would be impossible when holding a spinlock already. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- net/smc/smc_core.c | 10 ++ net/smc/smc_core.h | 3 +++ 2 files changed, 13 insertions(+) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index da94725deb09..28fc583d9033 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -303,6 +303,15 @@ static u8 smcr_next_link_id(struct smc_link_group *lgr) return link_id; } +static inline void smcr_copy_dev_info_to_link(struct smc_link *link) +{ + struct smc_ib_device *smcibdev = link->smcibdev; + + memcpy(link->ibname, smcibdev->ibdev->name, sizeof(link->ibname)); + memcpy(link->ndevname, smcibdev->netdev[link->ibport - 1], + sizeof(link->ndevname)); +} + int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, u8 link_idx, struct smc_init_info *ini) { @@ -317,6 +326,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, lnk->smcibdev = ini->ib_dev; lnk->ibport = ini->ib_port; atomic_inc(&ini->ib_dev->lnk_cnt_by_port[ini->ib_port - 1]); + smcr_copy_dev_info_to_link(lnk); lnk->path_mtu = ini->ib_dev->pattr[ini->ib_port - 1].active_mtu; atomic_set(&lnk->conn_cnt, 0); smc_llc_link_set_uid(lnk); diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 83a88a4635db..bd16d63c5222 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -124,6 +124,9 @@ struct smc_link { u8 link_is_asym; /* is link asymmetric? 
*/ struct smc_link_group *lgr; /* parent link group */ struct work_struct link_down_wrk; /* wrk to bring link down */ + /* Diagnostic relevant link information */ + u8 ibname[IB_DEVICE_NAME_MAX];/* ib device name */ + u8 ndevname[IFNAMSIZ];/* network device name */ enum smc_link_state state; /* state of link */ struct delayed_work llc_testlink_wrk; /* testlink worker */ -- 2.17.1
[PATCH net-next v2 14/15] net/smc: Refactor smc ism v2 capability handling
From: Guvenc Gulce Encapsulate the smc ism v2 capability boolean value in a function for better information hiding. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- net/smc/af_smc.c | 12 ++-- net/smc/smc_ism.c | 9 - net/smc/smc_ism.h | 5 ++--- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index bc3e45289771..850e6df47a59 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -668,7 +668,7 @@ static int smc_find_proposal_devices(struct smc_sock *smc, ini->smc_type_v1 = SMC_TYPE_N; } /* else RDMA is supported for this connection */ } - if (smc_ism_v2_capable && smc_find_ism_v2_device_clnt(smc, ini)) + if (smc_ism_is_v2_capable() && smc_find_ism_v2_device_clnt(smc, ini)) ini->smc_type_v2 = SMC_TYPE_N; /* if neither ISM nor RDMA are supported, fallback */ @@ -920,7 +920,7 @@ static int smc_connect_check_aclc(struct smc_init_info *ini, /* perform steps before actually connecting */ static int __smc_connect(struct smc_sock *smc) { - u8 version = smc_ism_v2_capable ? SMC_V2 : SMC_V1; + u8 version = smc_ism_is_v2_capable() ? SMC_V2 : SMC_V1; struct smc_clc_msg_accept_confirm_v2 *aclc2; struct smc_clc_msg_accept_confirm *aclc; struct smc_init_info *ini = NULL; @@ -945,9 +945,9 @@ static int __smc_connect(struct smc_sock *smc) version); ini->smcd_version = SMC_V1; - ini->smcd_version |= smc_ism_v2_capable ? SMC_V2 : 0; + ini->smcd_version |= smc_ism_is_v2_capable() ? SMC_V2 : 0; ini->smc_type_v1 = SMC_TYPE_B; - ini->smc_type_v2 = smc_ism_v2_capable ? SMC_TYPE_D : SMC_TYPE_N; + ini->smc_type_v2 = smc_ism_is_v2_capable() ? 
SMC_TYPE_D : SMC_TYPE_N; /* get vlan id from IP device */ if (smc_vlan_by_tcpsk(smc->clcsock, ini)) { @@ -1354,7 +1354,7 @@ static int smc_listen_v2_check(struct smc_sock *new_smc, rc = SMC_CLC_DECL_PEERNOSMC; goto out; } - if (!smc_ism_v2_capable) { + if (!smc_ism_is_v2_capable()) { ini->smcd_version &= ~SMC_V2; rc = SMC_CLC_DECL_NOISM2SUPP; goto out; @@ -1680,7 +1680,7 @@ static void smc_listen_work(struct work_struct *work) { struct smc_sock *new_smc = container_of(work, struct smc_sock, smc_listen_work); - u8 version = smc_ism_v2_capable ? SMC_V2 : SMC_V1; + u8 version = smc_ism_is_v2_capable() ? SMC_V2 : SMC_V1; struct socket *newclcsock = new_smc->clcsock; struct smc_clc_msg_accept_confirm *cclc; struct smc_clc_msg_proposal_area *buf; diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 5bb2c7fb4ea8..2a2571637bc6 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -22,7 +22,7 @@ struct smcd_dev_list smcd_dev_list = { }; EXPORT_SYMBOL_GPL(smcd_dev_list); -bool smc_ism_v2_capable; +static bool smc_ism_v2_capable; /* Test if an ISM communication is possible - same CPC */ int smc_ism_cantalk(u64 peer_gid, unsigned short vlan_id, struct smcd_dev *smcd) @@ -53,6 +53,13 @@ u16 smc_ism_get_chid(struct smcd_dev *smcd) } EXPORT_SYMBOL_GPL(smc_ism_get_chid); +/* HW supports ISM V2 and thus System EID is defined */ +bool smc_ism_is_v2_capable(void) +{ + return smc_ism_v2_capable; +} +EXPORT_SYMBOL_GPL(smc_ism_is_v2_capable); + /* Set a connection using this DMBE. 
*/ void smc_ism_set_conn(struct smc_connection *conn) { diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h index 8048e09ddcf8..481a4b7df30b 100644 --- a/net/smc/smc_ism.h +++ b/net/smc/smc_ism.h @@ -10,6 +10,7 @@ #define SMCD_ISM_H #include +#include #include #include "smc.h" @@ -20,9 +21,6 @@ struct smcd_dev_list {/* List of SMCD devices */ }; extern struct smcd_dev_listsmcd_dev_list; /* list of smcd devices */ -extern boolsmc_ism_v2_capable; /* HW supports ISM V2 and thus -* System EID is defined -*/ struct smc_ism_vlanid {/* VLAN id set on ISM device */ struct list_head list; @@ -52,5 +50,6 @@ int smc_ism_write(struct smcd_dev *dev, const struct smc_ism_position *pos, int smc_ism_signal_shutdown(struct smc_link_group *lgr); void smc_ism_get_system_eid(struct smcd_dev *dev, u8 **eid); u16 smc_ism_get_chid(struct smcd_dev *dev); +bool smc_ism_is_v2_capable(void); void smc_ism_init(void); #endif -- 2.17.1
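The refactoring in this patch follows a common C pattern: make the variable file-local and expose a read-only accessor. A single-file sketch under assumed names (ism_is_v2_capable() and ism_init() are illustrative, not the SMC functions):

```c
#include <stdbool.h>

/* File-local state: other translation units cannot name it directly. */
static bool v2_capable;

/* Read-only accessor that other files would call instead of the variable. */
bool ism_is_v2_capable(void)
{
    return v2_capable;
}

/* Hypothetical init hook: the only writer lives in the same file as the flag. */
void ism_init(bool hw_supports_v2)
{
    v2_capable = hw_supports_v2;
}
```

With this layout a stray assignment to the flag from another file becomes a link-time error rather than a silent state change.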
[PATCH net-next v2 07/15] net/smc: Refactor the netlink reply processing routine
From: Guvenc Gulce Refactor the netlink reply processing routine so that it provides sub functions for specific parts of the processing. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- net/smc/smc_diag.c | 218 +++-- 1 file changed, 133 insertions(+), 85 deletions(-) diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index c2225231f679..44be723c97fe 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -69,35 +69,25 @@ static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk) } } -static int smc_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb, - struct smc_diag_msg *r, - struct user_namespace *user_ns) +static bool smc_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb, + struct smc_diag_msg *r, + struct user_namespace *user_ns) { - if (nla_put_u8(skb, SMC_DIAG_SHUTDOWN, sk->sk_shutdown)) - return 1; + if (nla_put_u8(skb, SMC_DIAG_SHUTDOWN, sk->sk_shutdown) < 0) + return false; r->diag_uid = from_kuid_munged(user_ns, sock_i_uid(sk)); r->diag_inode = sock_i_ino(sk); - return 0; + return true; } -static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, - struct netlink_callback *cb, - const struct smc_diag_req *req, - struct nlattr *bc) +static bool smc_diag_fill_base_struct(struct sock *sk, struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_msg *r) { struct smc_sock *smc = smc_sk(sk); - struct smc_diag_fallback fallback; struct user_namespace *user_ns; - struct smc_diag_msg *r; - struct nlmsghdr *nlh; - nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, - cb->nlh->nlmsg_type, sizeof(*r), NLM_F_MULTI); - if (!nlh) - return -EMSGSIZE; - - r = nlmsg_data(nlh); smc_diag_msg_common_fill(r, sk); r->diag_state = sk->sk_state; if (smc->use_fallback) @@ -107,89 +97,148 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, else r->diag_mode = SMC_DIAG_MODE_SMCR; user_ns = sk_user_ns(NETLINK_CB(cb->skb).sk); - if (smc_diag_msg_attrs_fill(sk, skb, r, user_ns)) - 
goto errout; + if (!smc_diag_msg_attrs_fill(sk, skb, r, user_ns)) + return false; + return true; +} + +static bool smc_diag_fill_fallback(struct sock *sk, struct sk_buff *skb) +{ + struct smc_diag_fallback fallback; + struct smc_sock *smc = smc_sk(sk); + + memset(&fallback, 0, sizeof(fallback)); fallback.reason = smc->fallback_rsn; fallback.peer_diagnosis = smc->peer_diagnosis; if (nla_put(skb, SMC_DIAG_FALLBACK, sizeof(fallback), &fallback) < 0) + return false; + + return true; +} + +static bool smc_diag_fill_conninfo(struct sock *sk, struct sk_buff *skb) +{ + struct smc_host_cdc_msg *local_tx, *local_rx; + struct smc_diag_conninfo cinfo; + struct smc_connection *conn; + struct smc_sock *smc; + + smc = smc_sk(sk); + conn = &smc->conn; + local_tx = &conn->local_tx_ctrl; + local_rx = &conn->local_rx_ctrl; + memset(&cinfo, 0, sizeof(cinfo)); + cinfo.token = conn->alert_token_local; + cinfo.sndbuf_size = conn->sndbuf_desc ? conn->sndbuf_desc->len : 0; + cinfo.rmbe_size = conn->rmb_desc ? conn->rmb_desc->len : 0; + cinfo.peer_rmbe_size = conn->peer_rmbe_size; + + cinfo.rx_prod.wrap = local_rx->prod.wrap; + cinfo.rx_prod.count = local_rx->prod.count; + cinfo.rx_cons.wrap = local_rx->cons.wrap; + cinfo.rx_cons.count = local_rx->cons.count; + + cinfo.tx_prod.wrap = local_tx->prod.wrap; + cinfo.tx_prod.count = local_tx->prod.count; + cinfo.tx_cons.wrap = local_tx->cons.wrap; + cinfo.tx_cons.count = local_tx->cons.count; + + cinfo.tx_prod_flags = *(u8 *)&local_tx->prod_flags; + cinfo.tx_conn_state_flags = *(u8 *)&local_tx->conn_state_flags; + cinfo.rx_prod_flags = *(u8 *)&local_rx->prod_flags; + cinfo.rx_conn_state_flags = *(u8 *)&local_rx->conn_state_flags; + + cinfo.tx_prep.wrap = conn->tx_curs_prep.wrap; + cinfo.tx_prep.count = conn->tx_curs_prep.count; + cinfo.tx_sent.wrap = conn->tx_curs_sent.wrap; + cinfo.tx_sent.count = conn->tx_curs_sent.count; + cinfo.tx_fin.wrap = conn->tx_curs_fin.wrap; + cinfo.tx_fin.count = conn->tx_curs_fin.count; + + if (nla_put(skb, 
SMC_DIAG_CONNINFO, sizeof(cinfo), &cinfo) < 0) + return false; + + return true; +} + +static bool smc_diag_fill_lgrinfo(struct
[PATCH net-next v2 02/15] net/smc: Use active link of the connection
From: Guvenc Gulce

Use active link of the connection directly and not via linkgroup array
structure when obtaining link data of the connection.

Signed-off-by: Guvenc Gulce
Signed-off-by: Karsten Graul
---
 net/smc/smc_diag.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index f15fca59b4b2..c2225231f679 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -160,17 +160,17 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 	    !list_empty(&smc->conn.lgr->list)) {
 		struct smc_diag_lgrinfo linfo = {
 			.role = smc->conn.lgr->role,
-			.lnk[0].ibport = smc->conn.lgr->lnk[0].ibport,
-			.lnk[0].link_id = smc->conn.lgr->lnk[0].link_id,
+			.lnk[0].ibport = smc->conn.lnk->ibport,
+			.lnk[0].link_id = smc->conn.lnk->link_id,
 		};
 
 		memcpy(linfo.lnk[0].ibname,
 		       smc->conn.lgr->lnk[0].smcibdev->ibdev->name,
-		       sizeof(smc->conn.lgr->lnk[0].smcibdev->ibdev->name));
+		       sizeof(smc->conn.lnk->smcibdev->ibdev->name));
 		smc_gid_be16_convert(linfo.lnk[0].gid,
-				     smc->conn.lgr->lnk[0].gid);
+				     smc->conn.lnk->gid);
 		smc_gid_be16_convert(linfo.lnk[0].peer_gid,
-				     smc->conn.lgr->lnk[0].peer_gid);
+				     smc->conn.lnk->peer_gid);
 
 		if (nla_put(skb, SMC_DIAG_LGRINFO, sizeof(linfo), &linfo) < 0)
 			goto errout;
-- 
2.17.1
[PATCH net-next v2 03/15] net/smc: Add connection counters for links
From: Guvenc Gulce

Add connection counters to the structure of the link.
Increase/decrease the counters as needed in the corresponding routines.

Signed-off-by: Guvenc Gulce
Signed-off-by: Karsten Graul
---
 net/smc/smc_core.c | 16 ++--
 net/smc/smc_core.h |  1 +
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 2b19863f7171..6e2077161267 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -139,6 +139,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
 	}
 	if (!conn->lnk)
 		return SMC_CLC_DECL_NOACTLINK;
+	atomic_inc(&conn->lnk->conn_cnt);
 	return 0;
 }
 
@@ -180,6 +181,8 @@ static void __smc_lgr_unregister_conn(struct smc_connection *conn)
 	struct smc_link_group *lgr = conn->lgr;
 
 	rb_erase(&conn->alert_node, &lgr->conns_all);
+	if (conn->lnk)
+		atomic_dec(&conn->lnk->conn_cnt);
 	lgr->conns_num--;
 	conn->alert_token_local = 0;
 	sock_put(&smc->sk); /* sock_hold in smc_lgr_register_conn() */
@@ -314,6 +317,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	lnk->smcibdev = ini->ib_dev;
 	lnk->ibport = ini->ib_port;
 	lnk->path_mtu = ini->ib_dev->pattr[ini->ib_port - 1].active_mtu;
+	atomic_set(&lnk->conn_cnt, 0);
 	smc_llc_link_set_uid(lnk);
 	INIT_WORK(&lnk->link_down_wrk, smc_link_down_work);
 	if (!ini->ib_dev->initialized) {
@@ -526,6 +530,14 @@ static int smc_switch_cursor(struct smc_sock *smc, struct smc_cdc_tx_pend *pend,
 	return rc;
 }
 
+static inline void smc_switch_link_and_count(struct smc_connection *conn,
+					     struct smc_link *to_lnk)
+{
+	atomic_dec(&conn->lnk->conn_cnt);
+	conn->lnk = to_lnk;
+	atomic_inc(&conn->lnk->conn_cnt);
+}
+
 struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 				  struct smc_link *from_lnk, bool is_dev_err)
 {
@@ -574,7 +586,7 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 		    smc->sk.sk_state == SMC_PEERABORTWAIT ||
 		    smc->sk.sk_state == SMC_PROCESSABORT) {
 			spin_lock_bh(&conn->send_lock);
-			conn->lnk = to_lnk;
+			smc_switch_link_and_count(conn, to_lnk);
 			spin_unlock_bh(&conn->send_lock);
 			continue;
 		}
@@ -588,7 +600,7 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 		}
 		/* avoid race with smcr_tx_sndbuf_nonempty() */
 		spin_lock_bh(&conn->send_lock);
-		conn->lnk = to_lnk;
+		smc_switch_link_and_count(conn, to_lnk);
 		rc = smc_switch_cursor(smc, pend, wr_buf);
 		spin_unlock_bh(&conn->send_lock);
 		sock_put(&smc->sk);
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 9aee54a6bcba..83a88a4635db 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -129,6 +129,7 @@ struct smc_link {
 	struct delayed_work	llc_testlink_wrk; /* testlink worker */
 	struct completion	llc_testlink_resp; /* wait for rx of testlink */
 	int			llc_testlink_time; /* testlink interval */
+	atomic_t		conn_cnt;
 };
 
 /* For now we just allow one parallel link per link group. The SMC protocol
-- 
2.17.1
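For readers who want to see the counting pattern from this patch in isolation, here is a minimal userspace sketch using C11 atomics. The struct and function names are hypothetical stand-ins, not the kernel's, and the locking that the patch relies on (conn->send_lock) is assumed to be handled by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct smc_link and struct smc_connection. */
struct link {
	atomic_int conn_cnt;	/* connections currently assigned to this link */
};

struct connection {
	struct link *lnk;	/* active link, NULL if none was assigned */
};

/* Mirrors atomic_set(&lnk->conn_cnt, 0) at link initialization. */
static void link_init(struct link *lnk)
{
	atomic_init(&lnk->conn_cnt, 0);
}

/* Assign a link to a connection and count it. */
static void conn_assign_link(struct connection *conn, struct link *lnk)
{
	conn->lnk = lnk;
	atomic_fetch_add(&lnk->conn_cnt, 1);
}

/*
 * Move a connection to another link: drop the count on the old link,
 * switch the pointer, then count on the new link. The caller is assumed
 * to serialize this against senders.
 */
static void conn_switch_link(struct connection *conn, struct link *to_lnk)
{
	atomic_fetch_sub(&conn->lnk->conn_cnt, 1);
	conn->lnk = to_lnk;
	atomic_fetch_add(&conn->lnk->conn_cnt, 1);
}

/* Drop the count when the connection goes away, if a link was assigned. */
static void conn_unregister(struct connection *conn)
{
	if (conn->lnk)
		atomic_fetch_sub(&conn->lnk->conn_cnt, 1);
}
```

Like the helper in the patch, conn_switch_link() assumes the connection already has a link; only the unregister path checks for NULL.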
[PATCH net-next v2 09/15] net/smc: Introduce SMCR get linkgroup command
From: Guvenc Gulce Introduce get linkgroup command which loops through all available SMCR linkgroups. It uses the SMC-R linkgroup list as entry point, not the socket list, which makes linkgroup diagnosis possible, in case linkgroup does not contain active connections anymore. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/net/smc.h | 2 +- include/uapi/linux/smc.h | 5 ++ include/uapi/linux/smc_diag.h | 43 + net/smc/smc.h | 5 +- net/smc/smc_core.c| 3 +- net/smc/smc_core.h| 1 - net/smc/smc_diag.c| 88 +++ 7 files changed, 141 insertions(+), 6 deletions(-) diff --git a/include/net/smc.h b/include/net/smc.h index e441aa97ad61..59d25dcb8e92 100644 --- a/include/net/smc.h +++ b/include/net/smc.h @@ -10,8 +10,8 @@ */ #ifndef _SMC_H #define _SMC_H +#include -#define SMC_MAX_PNETID_LEN 16 /* Max. length of PNET id */ struct smc_hashinfo { rwlock_t lock; diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h index 0e11ca421ca4..635e2c2aeac5 100644 --- a/include/uapi/linux/smc.h +++ b/include/uapi/linux/smc.h @@ -3,6 +3,7 @@ * Shared Memory Communications over RDMA (SMC-R) and RoCE * * Definitions for generic netlink based configuration of an SMC-R PNET table + * Definitions for SMC Linkgroup and Devices. * * Copyright IBM Corp. 2016 * @@ -33,4 +34,8 @@ enum {/* SMC PNET Table commands */ #define SMCR_GENL_FAMILY_NAME "SMC_PNETID" #define SMCR_GENL_FAMILY_VERSION 1 +#define SMC_MAX_PNETID_LEN 16 /* Max. 
length of PNET id */ +#define SMC_LGR_ID_SIZE4 +#define SMC_MAX_HOSTNAME_LEN 32 /* Max length of hostname */ +#define SMC_MAX_EID_LEN32 /* Max length of eid */ #endif /* _UAPI_LINUX_SMC_H */ diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index 236c1c52d562..6ae028344b6d 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -4,8 +4,10 @@ #include #include +#include #include +#define SMC_DIAG_EXTS_PER_CMD 16 /* Sequence numbers */ enum { MAGIC_SEQ = 123456, @@ -21,6 +23,17 @@ struct smc_diag_req { struct inet_diag_sockid id; }; +/* Request structure v2 */ +struct smc_diag_req_v2 { + __u8diag_family; + __u8pad[2]; + __u8diag_ext; /* Query extended information */ + struct inet_diag_sockid id; + __u32 cmd; + __u32 cmd_ext; + __u8cmd_val[8]; +}; + /* Base info structure. It contains socket identity (addrs/ports/cookie) based * on the internal clcsock, and more SMC-related socket data */ @@ -57,7 +70,19 @@ enum { __SMC_DIAG_MAX, }; +/* V2 Commands */ +enum { + SMC_DIAG_GET_LGR_INFO = SMC_DIAG_EXTS_PER_CMD, + __SMC_DIAG_EXT_MAX, +}; + +/* SMC_DIAG_GET_LGR_INFO command extensions */ +enum { + SMC_DIAG_LGR_INFO_SMCR = 1, +}; + #define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1) +#define SMC_DIAG_EXT_MAX (__SMC_DIAG_EXT_MAX - 1) /* SMC_DIAG_CONNINFO */ @@ -88,6 +113,14 @@ struct smc_diag_conninfo { struct smc_diag_cursor tx_fin; /* confirmed sent cursor */ }; +struct smc_diag_v2_lgr_info { + __u8smc_version;/* SMC Version */ + __u8peer_smc_release; /* Peer SMC Version */ + __u8peer_os;/* Peer operating system */ + __u8negotiated_eid[SMC_MAX_EID_LEN]; /* Negotiated EID */ + __u8peer_hostname[SMC_MAX_HOSTNAME_LEN]; /* Peer host */ +}; + /* SMC_DIAG_LINKINFO */ struct smc_diag_linkinfo { @@ -116,4 +149,14 @@ struct smcd_diag_dmbinfo { /* SMC-D Socket internals */ __aligned_u64 peer_token; /* Token of remote DMBE */ }; +struct smc_diag_lgr { + __u8lgr_id[SMC_LGR_ID_SIZE]; /* Linkgroup identifier */ + __u8lgr_role; /* Linkgroup role */ 
+ __u8lgr_type; /* Linkgroup type */ + __u8pnet_id[SMC_MAX_PNETID_LEN]; /* Linkgroup pnet id */ + __u8vlan_id;/* Linkgroup vland id */ + __u32 conns_num; /* Number of connections */ + __u8reserved; /* Reserved for future use */ + struct smc_diag_v2_lgr_info v2_lgr_info; /* SMCv2 info */ +}; #endif /* _UAPI_SMC_DIAG_H_ */ diff --git a/net/smc/smc.h b/net/smc/smc.h index d65e15f0c944..447cf9be979d 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -14,6 +14,7 @@ #include #include #include /* __aligned */ +#include #include #include "smc_ib.h" @@ -29,11 +30,9 @@ * devices */ -#define SMC_MAX_HOSTNAME_LEN 32 -#define SMC_MAX_E
[PATCH net-next v2 13/15] net/smc: Add support for obtaining SMCR device list
From: Guvenc Gulce Deliver SMCR device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/uapi/linux/smc_diag.h | 6 ++ net/smc/smc_diag.c| 133 ++ net/smc/smc_ib.c | 2 + 3 files changed, 141 insertions(+) diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index ab8f76bdd1a4..4c6332785533 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -88,6 +88,7 @@ enum { /* SMC_DIAG_GET_DEV_INFO command extensions */ enum { SMC_DIAG_DEV_INFO_SMCD = 1, + SMC_DIAG_DEV_INFO_SMCR, }; #define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1) @@ -182,6 +183,11 @@ struct smc_diag_dev_info { __u16 pci_vendor; /* PCI Vendor */ __u16 pci_device; /* PCI Device Vendor ID */ __u8pci_id[SMC_PCI_ID_STR_LEN]; /* PCI ID */ + __u8dev_name[IB_DEVICE_NAME_MAX]; /* IB Device name */ + __u8netdev[SMC_MAX_PORTS][IFNAMSIZ]; /* Netdev name(s) */ + __u8port_state[SMC_MAX_PORTS]; /* IB Port State */ + __u8port_valid[SMC_MAX_PORTS]; /* Is IB Port valid */ + __u32 lnk_cnt_by_port[SMC_MAX_PORTS]; /* # lnks per port */ }; struct smc_diag_lgr { diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 252aae0b11d9..58bfbe0bef4d 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -365,6 +365,34 @@ static int smc_diag_handle_lgr(struct smc_link_group *lgr, return rc; } +static bool smcr_diag_is_dev_critical(struct smc_lgr_list *smc_lgr, + struct smc_ib_device *smcibdev) +{ + struct smc_link_group *lgr; + bool rc = false; + int i; + + spin_lock_bh(&smc_lgr->lock); + list_for_each_entry(lgr, &smc_lgr->list, list) { + if (lgr->is_smcd) + continue; + for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { + if (lgr->lnk[i].state == SMC_LNK_UNUSED) + continue; + if (lgr->lnk[i].smcibdev == smcibdev) { + if (lgr->type == SMC_LGR_SINGLE || + lgr->type == SMC_LGR_ASYMMETRIC_LOCAL) { + rc = true; + goto out; + } + } + } + } +out: + spin_unlock_bh(&smc_lgr->lock); + return rc; +} + static int 
smc_diag_fill_lgr_list(struct smc_lgr_list *smc_lgr, struct sk_buff *skb, struct netlink_callback *cb, @@ -520,6 +548,108 @@ static int smc_diag_prep_smcd_dev(struct smcd_dev_list *dev_list, return rc; } +static inline void smc_diag_handle_dev_port(struct smc_diag_dev_info *smc_diag_dev, + struct ib_device *ibdev, + struct smc_ib_device *smcibdev, + int port) +{ + unsigned char port_state; + + smc_diag_dev->port_valid[port] = 1; + snprintf((char *)&smc_diag_dev->netdev[port], +sizeof(smc_diag_dev->netdev[port]), +"%s", (char *)&smcibdev->netdev[port]); + snprintf((char *)&smc_diag_dev->pnet_id[port], +sizeof(smc_diag_dev->pnet_id[port]), "%s", +(char *)&smcibdev->pnetid[port]); + smc_diag_dev->pnetid_by_user[port] = smcibdev->pnetid_by_user[port]; + port_state = smc_ib_port_active(smcibdev, port + 1); + smc_diag_dev->port_state[port] = port_state; + smc_diag_dev->lnk_cnt_by_port[port] = + atomic_read(&smcibdev->lnk_cnt_by_port[port]); +} + +static int smc_diag_handle_smcr_dev(struct smc_ib_device *smcibdev, + struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_req_v2 *req) +{ + struct smc_diag_dev_info smc_dev; + struct smc_pci_dev smc_pci_dev; + struct pci_dev *pci_dev; + unsigned char is_crit; + struct nlmsghdr *nlh; + int dummy = 0; + int i, rc = 0; + + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, MAGIC_SEQ_V2_ACK, + cb->nlh->nlmsg_type, 0, NLM_F_MULTI); + if (!nlh) + return -EMSGSIZE; + + memset(&smc_dev, 0, sizeof(smc_dev)); + memset(&smc_pci_dev, 0, sizeof(smc_pci_dev)); + for (i = 1; i <= SMC_MAX_PORTS; i++) { + if (rdma_is_port_valid(smcibdev->ibdev, i)) { + smc_diag_handle_dev_port(&smc_dev, smcibdev->ibdev, +smcibdev, i - 1); +
[PATCH net-next v2 01/15] net/smc: use helper smc_conn_abort() in listen processing
The helper smc_connect_abort() can be used by the listen processing functions, too. And rename this helper to smc_conn_abort() to make the purpose clearer. No functional change. Signed-off-by: Karsten Graul --- net/smc/af_smc.c | 17 + 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 527185af7bf3..bc3e45289771 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -552,8 +552,7 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code, return smc_connect_fallback(smc, reason_code); } -/* abort connecting */ -static void smc_connect_abort(struct smc_sock *smc, int local_first) +static void smc_conn_abort(struct smc_sock *smc, int local_first) { if (local_first) smc_lgr_cleanup_early(&smc->conn); @@ -814,7 +813,7 @@ static int smc_connect_rdma(struct smc_sock *smc, return 0; connect_abort: - smc_connect_abort(smc, ini->first_contact_local); + smc_conn_abort(smc, ini->first_contact_local); mutex_unlock(&smc_client_lgr_pending); smc->connect_nonblock = 0; @@ -893,7 +892,7 @@ static int smc_connect_ism(struct smc_sock *smc, return 0; connect_abort: - smc_connect_abort(smc, ini->first_contact_local); + smc_conn_abort(smc, ini->first_contact_local); mutex_unlock(&smc_server_lgr_pending); smc->connect_nonblock = 0; @@ -1320,10 +1319,7 @@ static void smc_listen_decline(struct smc_sock *new_smc, int reason_code, int local_first, u8 version) { /* RDMA setup failed, switch back to TCP */ - if (local_first) - smc_lgr_cleanup_early(&new_smc->conn); - else - smc_conn_free(&new_smc->conn); + smc_conn_abort(new_smc, local_first); if (reason_code < 0) { /* error, no fallback possible */ smc_listen_out_err(new_smc); return; @@ -1429,10 +1425,7 @@ static int smc_listen_ism_init(struct smc_sock *new_smc, /* Create send and receive buffers */ rc = smc_buf_create(new_smc, true); if (rc) { - if (ini->first_contact_local) - smc_lgr_cleanup_early(&new_smc->conn); - else - smc_conn_free(&new_smc->conn); + 
smc_conn_abort(new_smc, ini->first_contact_local); return (rc == -ENOSPC) ? SMC_CLC_DECL_MAX_DMB : SMC_CLC_DECL_MEM; } -- 2.17.1
[PATCH net-next v2 08/15] net/smc: Add ability to work with extended SMC netlink API
From: Guvenc Gulce smc_diag module should be able to work with legacy and extended netlink api. This is done by using the sequence field of the netlink message header. Sequence field is optional and was filled with a constant value MAGIC_SEQ in the current implementation. New constant values MAGIC_SEQ_V2 and MAGIC_SEQ_V2_ACK are used to signal the usage of the new Netlink API between userspace and kernel. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/uapi/linux/smc_diag.h | 7 +++ net/smc/smc_diag.c| 21 + 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index 8cb3a6fef553..236c1c52d562 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -6,6 +6,13 @@ #include #include +/* Sequence numbers */ +enum { + MAGIC_SEQ = 123456, + MAGIC_SEQ_V2, + MAGIC_SEQ_V2_ACK, +}; + /* Request structure */ struct smc_diag_req { __u8diag_family; diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 44be723c97fe..bc2b616524ff 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -293,19 +293,24 @@ static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } +static int smc_diag_dump_ext(struct sk_buff *skb, struct netlink_callback *cb) +{ + return skb->len; +} + static int smc_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h) { struct net *net = sock_net(skb->sk); - + struct netlink_dump_control c = { + .min_dump_alloc = SKB_WITH_OVERHEAD(32768), + }; if (h->nlmsg_type == SOCK_DIAG_BY_FAMILY && h->nlmsg_flags & NLM_F_DUMP) { - { - struct netlink_dump_control c = { - .dump = smc_diag_dump, - .min_dump_alloc = SKB_WITH_OVERHEAD(32768), - }; - return netlink_dump_start(net->diag_nlsk, skb, h, &c); - } + if (h->nlmsg_seq >= MAGIC_SEQ_V2) + c.dump = smc_diag_dump_ext; + else + c.dump = smc_diag_dump; + return netlink_dump_start(net->diag_nlsk, skb, h, &c); } return 0; } -- 2.17.1
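The dispatch rule in smc_diag_handler_dump() can be tried out in userspace with the sketch below. The enum values are copied from the patch; the two handler functions and select_dump() are hypothetical and only illustrate the comparison on the sequence field.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence numbers, as defined by the patch in smc_diag.h. */
enum {
	MAGIC_SEQ = 123456,
	MAGIC_SEQ_V2,
	MAGIC_SEQ_V2_ACK,
};

typedef int (*dump_fn)(void);

/* Hypothetical handlers; the return value only identifies which one ran. */
static int dump_legacy(void) { return 1; }
static int dump_ext(void) { return 2; }

/* Pick the dump handler the same way the patch does: seq >= MAGIC_SEQ_V2. */
static dump_fn select_dump(uint32_t nlmsg_seq)
{
	return nlmsg_seq >= MAGIC_SEQ_V2 ? dump_ext : dump_legacy;
}
```

One consequence of the >= comparison, if I read it right: any request whose sequence number happens to be at or above MAGIC_SEQ_V2 is routed to the extended handler, so legacy userspace is only safe as long as it keeps sending MAGIC_SEQ or a smaller value.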
[PATCH net-next v2 11/15] net/smc: Add SMC-D Linkgroup diagnostic support
From: Guvenc Gulce Deliver SMCD Linkgroup information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/uapi/linux/smc_diag.h | 7 +++ net/smc/smc_diag.c| 108 ++ net/smc/smc_ism.c | 2 + 3 files changed, 117 insertions(+) diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index a57df0296aa4..5a80172df757 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -81,6 +81,7 @@ enum { enum { SMC_DIAG_LGR_INFO_SMCR = 1, SMC_DIAG_LGR_INFO_SMCR_LINK, + SMC_DIAG_LGR_INFO_SMCD, }; #define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1) @@ -155,6 +156,12 @@ struct smcd_diag_dmbinfo { /* SMC-D Socket internals */ __aligned_u64 my_gid; /* My GID */ __aligned_u64 token; /* Token of DMB */ __aligned_u64 peer_token; /* Token of remote DMBE */ + /* Fields above used by legacy v1 code */ + __u8pnet_id[SMC_MAX_PNETID_LEN]; /* Pnet ID */ + __u32 conns_num; /* Number of connections */ + __u16 chid; /* Linkgroup CHID */ + __u8vlan_id;/* Linkgroup vlan id */ + struct smc_diag_v2_lgr_info v2_lgr_info; /* SMCv2 info */ }; struct smc_diag_lgr { diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 6885814b6e4f..fcff07a9ea47 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -21,6 +21,7 @@ #include "smc.h" #include "smc_ib.h" +#include "smc_ism.h" #include "smc_core.h" struct smc_diag_dump_ctx { @@ -252,6 +253,53 @@ static int smc_diag_fill_lgr_link(struct smc_link_group *lgr, return -EMSGSIZE; } +static int smc_diag_fill_smcd_lgr(struct smc_link_group *lgr, + struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_req_v2 *req) +{ + struct smcd_diag_dmbinfo smcd_lgr; + struct nlmsghdr *nlh; + int dummy = 0; + int rc = 0; + + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, MAGIC_SEQ_V2_ACK, + cb->nlh->nlmsg_type, 0, NLM_F_MULTI); + if (!nlh) + return -EMSGSIZE; + + memset(&smcd_lgr, 0, sizeof(smcd_lgr)); + memcpy(&smcd_lgr.linkid, lgr->id, sizeof(lgr->id)); + 
smcd_lgr.conns_num = lgr->conns_num; + smcd_lgr.vlan_id = lgr->vlan_id; + smcd_lgr.peer_gid = lgr->peer_gid; + smcd_lgr.my_gid = lgr->smcd->local_gid; + smcd_lgr.chid = smc_ism_get_chid(lgr->smcd); + memcpy(&smcd_lgr.v2_lgr_info.negotiated_eid, lgr->negotiated_eid, + sizeof(smcd_lgr.v2_lgr_info.negotiated_eid)); + memcpy(&smcd_lgr.v2_lgr_info.peer_hostname, lgr->peer_hostname, + sizeof(smcd_lgr.v2_lgr_info.peer_hostname)); + smcd_lgr.v2_lgr_info.peer_os = lgr->peer_os; + smcd_lgr.v2_lgr_info.peer_smc_release = lgr->peer_smc_release; + smcd_lgr.v2_lgr_info.smc_version = lgr->smc_version; + snprintf(smcd_lgr.pnet_id, sizeof(smcd_lgr.pnet_id), "%s", +lgr->smcd->pnetid); + + /* Just a command place holder to signal back the command reply type */ + if (nla_put(skb, SMC_DIAG_GET_LGR_INFO, sizeof(dummy), &dummy) < 0) + goto errout; + + if (nla_put(skb, SMC_DIAG_LGR_INFO_SMCD, + sizeof(smcd_lgr), &smcd_lgr) < 0) + goto errout; + + nlmsg_end(skb, nlh); + return rc; +errout: + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; +} + static int smc_diag_fill_lgr(struct smc_link_group *lgr, struct sk_buff *skb, struct netlink_callback *cb, @@ -343,6 +391,63 @@ static int smc_diag_fill_lgr_list(struct smc_lgr_list *smc_lgr, return rc; } +static int smc_diag_handle_smcd_lgr(struct smcd_dev *dev, + struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_req_v2 *req) +{ + struct smc_diag_dump_ctx *cb_ctx = smc_dump_context(cb); + struct smc_link_group *lgr; + int snum = cb_ctx->pos[1]; + int rc = 0, num = 0; + + spin_lock_bh(&dev->lgr_lock); + list_for_each_entry(lgr, &dev->lgr_list, list) { + if (lgr->is_smcd) { + if (num < snum) + goto next; + rc = smc_diag_fill_smcd_lgr(lgr, skb, cb, req); + if (rc < 0) + goto errout; +next: + num++; + } + } +errout: + spin_unlock_bh(&dev->lgr_lock); + cb_ctx->pos[1] = num; + return rc; +} + +static int smc_diag_fill_smcd_dev(struct smcd_dev_list *dev_list, +
[PATCH net-next v2 15/15] net/smc: Add support for obtaining system information
From: Guvenc Gulce Add new netlink command to obtain system information of the smc module. Signed-off-by: Guvenc Gulce Signed-off-by: Karsten Graul --- include/uapi/linux/smc.h | 1 + include/uapi/linux/smc_diag.h | 18 ++ net/smc/smc_clc.c | 6 net/smc/smc_clc.h | 1 + net/smc/smc_diag.c| 62 +++ net/smc/smc_ism.c | 1 + 6 files changed, 89 insertions(+) diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h index 736e8b98c8a5..04385a98037a 100644 --- a/include/uapi/linux/smc.h +++ b/include/uapi/linux/smc.h @@ -38,6 +38,7 @@ enum {/* SMC PNET Table commands */ #define SMC_LGR_ID_SIZE4 #define SMC_MAX_HOSTNAME_LEN 32 /* Max length of hostname */ #define SMC_MAX_EID_LEN32 /* Max length of eid */ +#define SMC_MAX_EID8 /* Max number of eids */ #define SMC_MAX_PORTS 2 /* Max # of ports per ib device */ #define SMC_PCI_ID_STR_LEN 16 /* Max length of pci id string */ #endif /* _UAPI_LINUX_SMC_H */ diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index 4c6332785533..7409e7a854df 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -75,6 +75,7 @@ enum { enum { SMC_DIAG_GET_LGR_INFO = SMC_DIAG_EXTS_PER_CMD, SMC_DIAG_GET_DEV_INFO, + SMC_DIAG_GET_SYS_INFO, __SMC_DIAG_EXT_MAX, }; @@ -91,6 +92,11 @@ enum { SMC_DIAG_DEV_INFO_SMCR, }; +/* SMC_DIAG_GET_SYS_INFO command extensions */ +enum { + SMC_DIAG_SYS_INFO = 1, +}; + #define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1) #define SMC_DIAG_EXT_MAX (__SMC_DIAG_EXT_MAX - 1) @@ -131,6 +137,18 @@ struct smc_diag_v2_lgr_info { __u8peer_hostname[SMC_MAX_HOSTNAME_LEN]; /* Peer host */ }; + +struct smc_system_info { + __u8smc_version;/* SMC Version */ + __u8smc_release;/* SMC Release */ + __u8ueid_count; /* Number of UEIDs */ + __u8smc_ism_is_v2; /* Is ISM SMC v2 capable */ + __u32 reserved; /* Reserved for future use */ + __u8local_hostname[SMC_MAX_HOSTNAME_LEN]; /* Hostnames */ + __u8seid[SMC_MAX_EID_LEN]; /* System EID */ + __u8ueid[SMC_MAX_EID][SMC_MAX_EID_LEN]; /* User EIDs */ +}; + /* 
SMC_DIAG_LINKINFO */ struct smc_diag_linkinfo { diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index 696d89c2dce4..ca887ee6b249 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -772,6 +772,12 @@ int smc_clc_send_accept(struct smc_sock *new_smc, bool srv_first_contact, return len > 0 ? 0 : len; } +void smc_clc_get_hostname(u8 **host) +{ + *host = &smc_hostname[0]; +} +EXPORT_SYMBOL_GPL(smc_clc_get_hostname); + void __init smc_clc_init(void) { struct new_utsname *u; diff --git a/net/smc/smc_clc.h b/net/smc/smc_clc.h index e7ab05683bc9..9ed9eb3abe46 100644 --- a/net/smc/smc_clc.h +++ b/net/smc/smc_clc.h @@ -334,5 +334,6 @@ int smc_clc_send_confirm(struct smc_sock *smc, bool clnt_first_contact, int smc_clc_send_accept(struct smc_sock *smc, bool srv_first_contact, u8 version); void smc_clc_init(void) __init; +void smc_clc_get_hostname(u8 **host); #endif diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 58bfbe0bef4d..a69a401329ab 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -23,6 +23,7 @@ #include "smc_ib.h" #include "smc_ism.h" #include "smc_core.h" +#include "smc_clc.h" struct smc_diag_dump_ctx { int pos[2]; @@ -650,6 +651,63 @@ static int smc_diag_prep_smcr_dev(struct smc_ib_devices *dev_list, return rc; } +static int smc_diag_prep_sys_info(struct smcd_dev_list *dev_list, + struct sk_buff *skb, + struct netlink_callback *cb, + struct smc_diag_req_v2 *req) +{ + struct smc_diag_dump_ctx *cb_ctx = smc_dump_context(cb); + struct smc_system_info smc_sys_info; + int dummy = 0, rc = 0, num = 0; + struct smcd_dev *smcd_dev; + int snum = cb_ctx->pos[0]; + struct nlmsghdr *nlh; + u8 *seid = NULL; + u8 *host = NULL; + + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, MAGIC_SEQ_V2_ACK, + cb->nlh->nlmsg_type, 0, NLM_F_MULTI); + if (!nlh) + return -EMSGSIZE; + + if (snum > num) + goto errout; + + memset(&smc_sys_info, 0, sizeof(smc_sys_info)); + smc_sys_info.smc_ism_is_v2 = smc_ism_is_v2_capable(); + smc_sys_info.smc_version = SMC_V2; + 
smc_sys_info.smc_release = SMC_RELEASE; + smc_clc_get_hostname(&host); + + if (host) + memcpy(smc_sys_info.local_hos
[PATCH] IPv6: Set SIT tunnel hard_header_len to zero
Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c, the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.

As SIT tunnels do not have a header themselves other than their
transport (L3, L2) headers, we're leaving hard_header_len set to zero,
as tnl_update_pmtu is already taking care of the transport header
sizes.

This will also help avoid unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels with more than
net.ipv6.route.gc_thresh flows.

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms
---
 net/ipv6/sit.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 5e2c34c0ac97..5e7983cb6154 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1128,7 +1128,6 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	if (tdev && !netif_is_l3_master(tdev)) {
 		int t_hlen = tunnel->hlen + sizeof(struct iphdr);
 
-		dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
 		dev->mtu = tdev->mtu - t_hlen;
 		if (dev->mtu < IPV6_MIN_MTU)
 			dev->mtu = IPV6_MIN_MTU;
@@ -1426,7 +1425,6 @@ static void ipip6_tunnel_setup(struct net_device *dev)
 	dev->priv_destructor	= ipip6_dev_free;
 
 	dev->type		= ARPHRD_SIT;
-	dev->hard_header_len	= LL_MAX_HEADER + t_hlen;
 	dev->mtu		= ETH_DATA_LEN - t_hlen;
 	dev->min_mtu		= IPV6_MIN_MTU;
 	dev->max_mtu		= IP6_MAX_MTU - t_hlen;
-- 
2.25.1
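To make the arithmetic that stays in ipip6_tunnel_bind_dev() concrete, here is a small userspace sketch of the device MTU computation. The function name sit_dev_mtu() is hypothetical, and the 20-byte IPv4 header assumes no IP options; IPV6_MIN_MTU is 1280 as in the kernel.

```c
#include <assert.h>

#define IPV4_HDR_LEN	20	/* sizeof(struct iphdr), assuming no options */
#define IPV6_MIN_MTU	1280	/* minimum link MTU required by IPv6 */

/*
 * Device MTU of a SIT tunnel bound to an underlying device:
 * subtract the tunnel overhead (t_hlen) and clamp to the IPv6 minimum.
 * tunnel_hlen stands in for tunnel->hlen.
 */
static int sit_dev_mtu(int tdev_mtu, int tunnel_hlen)
{
	int t_hlen = tunnel_hlen + IPV4_HDR_LEN;
	int mtu = tdev_mtu - t_hlen;

	return mtu < IPV6_MIN_MTU ? IPV6_MIN_MTU : mtu;
}
```

With a 1500-byte underlying device and tunnel_hlen of 0 this gives 1480. The point of the patch, as I understand it, is that this overhead is already accounted for here and in tnl_update_pmtu, so adding it again via hard_header_len counts it twice.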
[PATCH net] net/tls: Fix kernel panic when socket is in TLS ULP
A user can initialize the TLS ULP with a setsockopt() call on a socket
before listen() in the tls-toe case (TLS_HW_RECORD), and with the same
setsockopt() call on a connected socket in the kernel TLS case (TLS_SW).
In the presence of tls-toe devices, the TLS ULP is initialized, a TLS
context is allocated per listen socket, and the socket listens at the
adapter as well as in the kernel TCP stack.

Now consider the scenario where connections are established in the
kernel stack. Every close of such a connection clears the TLS context
that was created on the listen socket, causing a kernel panic.

Address the issue by setting the child socket to base (non TLS ULP)
when the TLS ULP is initialized on the parent (listen) socket.

Fixes: 76f7164d02d4 ("net/tls: free ctx in sock destruct")
Signed-off-by: Vinay Kumar Yadav
---
 .../chelsio/inline_crypto/chtls/chtls_cm.c    |  3 +++
 net/tls/tls_main.c                            | 23 ++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
index 63aacc184f68..c56cd9c1e40c 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
@@ -1206,6 +1206,9 @@ static struct sock *chtls_recv_sock(struct sock *lsk,
 	sk_setup_caps(newsk, dst);
 	ctx = tls_get_ctx(lsk);
 	newsk->sk_destruct = ctx->sk_destruct;
+	newsk->sk_prot = lsk->sk_prot;
+	inet_csk(newsk)->icsk_ulp_ops = inet_csk(lsk)->icsk_ulp_ops;
+	rcu_assign_pointer(inet_csk(newsk)->icsk_ulp_data, ctx);
 	csk->sk = newsk;
 	csk->passive_reap_next = oreq;
 	csk->tx_chan = cxgb4_port_chan(ndev);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 8d93cea99f2c..9682dacae30c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -715,7 +715,7 @@ static int tls_init(struct sock *sk)
 	tls_build_proto(sk);
 
 #ifdef CONFIG_TLS_TOE
-	if (tls_toe_bypass(sk))
+	if (sk->sk_state == TCP_CLOSE && tls_toe_bypass(sk))
 		return 0;
 #endif
 
@@ -744,6 +744,24 @@ static int tls_init(struct sock *sk)
 	return rc;
 }
 
+#ifdef CONFIG_TLS_TOE
+static void tls_clone(const struct request_sock *req,
+		      struct sock *newsk, const gfp_t priority)
+{
+	struct tls_context *ctx = tls_get_ctx(newsk);
+	struct inet_connection_sock *icsk = inet_csk(newsk);
+
+	/* In presence of TLS TOE devices, TLS ulp is initialized on listen
+	 * socket so lets child socket back to non tls ULP mode because tcp
+	 * connections can happen in non TLS TOE mode.
+	 */
+	newsk->sk_prot = ctx->sk_proto;
+	newsk->sk_destruct = ctx->sk_destruct;
+	icsk->icsk_ulp_ops = NULL;
+	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
+}
+#endif
+
 static void tls_update(struct sock *sk, struct proto *p,
 		       void (*write_space)(struct sock *sk))
 {
@@ -857,6 +875,9 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
 	.update			= tls_update,
 	.get_info		= tls_get_info,
 	.get_info_size		= tls_get_info_size,
+#ifdef CONFIG_TLS_TOE
+	.clone			= tls_clone
+#endif
 };
 
 static int __init tls_register(void)
-- 
2.18.1
Re: [PATCH v3 net-next 09/12] net: dsa: tag_brcm: let DSA core deal with TX reallocation
On Mon, Nov 02, 2020 at 12:34:11PM -0800, Florian Fainelli wrote:
> On 11/1/2020 11:16 AM, Vladimir Oltean wrote:
> > Now that we have a central TX reallocation procedure that accounts for
> > the tagger's needed headroom in a generic way, we can remove the
> > skb_cow_head call.
> >
> > Cc: Florian Fainelli
> > Signed-off-by: Vladimir Oltean
>
> Reviewed-by: Florian Fainelli

Florian, I just noticed that tag_brcm.c has an __skb_put_padto call,
even though it is not a tail tagger. This comes from commit:

commit bf08c34086d159edde5c54902dfa2caa4d9fbd8c
Author: Florian Fainelli
Date:   Wed Jan 3 22:13:00 2018 -0800

    net: dsa: Move padding into Broadcom tagger

    Instead of having the different master network device drivers
    potentially used by DSA/Broadcom tags, move the padding necessary for
    the switches to accept short packets where it makes most sense: within
    tag_brcm.c. This avoids multiplying the number of similar commits to
    e.g: bgmac, bcmsysport, etc.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

Do you remember why this was needed? As far as I understand, either the
DSA master driver or the MAC itself should pad frames automatically. Is
that not happening on Broadcom SoCs, or why do you need to pad from DSA?

How should we deal with this? Having tag_brcm.c still do some potential
reallocation defeats the purpose of doing it centrally, in a way. I was
trying to change the prototype of struct dsa_device_ops::xmit to stop
returning a struct sk_buff *, and I stumbled upon this. Should we just
go ahead and pad everything unconditionally in DSA?
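For anyone following the padding question without the kernel source at hand, this is roughly what "pad short frames in the tagger" amounts to, sketched in userspace on a plain byte buffer. pad_frame() and TAG_LEN are hypothetical; the assumption that the minimum length is ETH_ZLEN plus the tag length reflects my reading of tag_brcm.c and is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN	60	/* minimum Ethernet frame length, without FCS */
#define TAG_LEN		4	/* assumed switch tag length, illustration only */

/*
 * Zero-pad a frame in place up to min_len bytes.
 * Returns the resulting frame length, or 0 if the buffer (cap bytes)
 * is too small to hold the padded frame. A real skb-based version may
 * have to reallocate at this point, which is the concern raised above.
 */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap,
			size_t min_len)
{
	if (len >= min_len)
		return len;	/* long enough, nothing to do */
	if (min_len > cap)
		return 0;	/* no tailroom for the padding */
	memset(buf + len, 0, min_len - len);
	return min_len;
}
```

The "return 0" branch is where the reallocation would happen in an skb world, so padding unconditionally in the DSA core would mean budgeting that tailroom centrally as well.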
Re: [PATCH v2 0/8] slab: provide and use krealloc_array()
On Tue, Nov 3, 2020 at 12:13 PM Bartosz Golaszewski wrote:
> On Tue, Nov 3, 2020 at 5:14 AM Joe Perches wrote:
> > On Mon, 2020-11-02 at 16:20 +0100, Bartosz Golaszewski wrote:
> > > From: Bartosz Golaszewski

> Yeah so I had this concern for devm_krealloc() and even sent a patch
> that extended it to honor __GFP_ZERO before I noticed that regular
> krealloc() silently ignores __GFP_ZERO. I'm not sure if this is on
> purpose. Maybe we should either make krealloc() honor __GFP_ZERO or
> explicitly state in its documentation that it ignores it?

My vote here is to ignore it, for the same reasons: it respects
realloc(3) semantics, and it makes common sense given the idea of
REallocating (capital letters on purpose).

-- 
With Best Regards,
Andy Shevchenko
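As a userspace illustration of the two points in this sub-thread, the sketch below shows an overflow-checked array reallocation in the spirit of krealloc_array(), plus a separate helper that zeroes the newly grown tail explicitly, since realloc(3) leaves it uninitialized. Both function names are made up for this example and are not kernel or libc APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Resize an array to new_n elements of 'size' bytes each.
 * Returns NULL without touching p if new_n * size would overflow.
 */
static void *realloc_array(void *p, size_t new_n, size_t size)
{
	if (size != 0 && new_n > SIZE_MAX / size)
		return NULL;
	return realloc(p, new_n * size);
}

/*
 * Same as above, but zero the elements between old_n and new_n.
 * The caller has to pass old_n, because the allocator interface does
 * not expose the previous length.
 */
static void *realloc_array_zero(void *p, size_t old_n, size_t new_n,
				size_t size)
{
	void *q = realloc_array(p, new_n, size);

	if (q && new_n > old_n)
		memset((char *)q + old_n * size, 0, (new_n - old_n) * size);
	return q;
}
```

The extra old_n parameter is the design cost of zeroing: a plain realloc-style interface has no portable way to know how much of the buffer was valid before the call.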
Re: [PATCH] fsl/fman: add missing put_devcie() call in fman_port_probe()
On 2020/11/03 9:30, Jakub Kicinski wrote:
> On Sat, 31 Oct 2020 18:54:18 +0800 Yu Kuai wrote:
> > if of_find_device_by_node() succeed, fman_port_probe() doesn't have a
> > corresponding put_device(). Thus add jump target to fix the exception
> > handling for this function implementation.
> >
> > Fixes: 0572054617f3 ("fsl/fman: fix dereference null return value")
> > Signed-off-by: Yu Kuai
>
> > diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
> > index d9baac0dbc7d..576ce6df3fce 100644
> > --- a/drivers/net/ethernet/freescale/fman/fman_port.c
> > +++ b/drivers/net/ethernet/freescale/fman/fman_port.c
> > @@ -1799,13 +1799,13 @@ static int fman_port_probe(struct platform_device *of_dev)
> >  	of_node_put(fm_node);
> >  	if (!fm_pdev) {
> >  		err = -EINVAL;
> > -		goto return_err;
> > +		goto put_device;
> >  	}
>
> > @@ -1898,6 +1898,8 @@ static int fman_port_probe(struct platform_device *of_dev)
> >  return_err:
> >  	of_node_put(port_node);
> > +put_device:
> > +	put_device(&fm_pdev->dev);
> >  free_port:
> >  	kfree(port);
> >  	return err;
>
> This does not look right. You're jumping to put_device() when fm_pdev
> is NULL?

Hi,

Oops, it's a silly mistake. Will fix it in the V2 patch.

Thanks,
Yu Kuai

> The order of error handling should be the reverse of the order of
> execution of the function.
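The "reverse order" rule from the review comment can be shown with a small standalone sketch. The three resources A, B and C and the probe() function are hypothetical; the log only records the order in which the unwind labels release things.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOG 8

static char release_log[MAX_LOG];	/* NUL-terminated record of releases */
static int release_len;

static void release(char what)
{
	if (release_len < MAX_LOG - 1)
		release_log[release_len++] = what;
}

/*
 * Acquire A, then B, then C. fail_at selects after which acquisition an
 * error occurs (0 means success). Each label undoes exactly one step and
 * falls through to the labels for the steps taken before it, so a jump
 * never releases something that was not acquired yet.
 */
static int probe(int fail_at)
{
	int err = -1;

	memset(release_log, 0, sizeof(release_log));
	release_len = 0;

	/* acquire A */
	if (fail_at == 1)
		goto undo_a;
	/* acquire B */
	if (fail_at == 2)
		goto undo_b;
	/* acquire C */
	if (fail_at == 3)
		goto undo_c;
	return 0;

undo_c:
	release('C');
undo_b:
	release('B');
undo_a:
	release('A');
	return err;
}
```

Applied to the patch under review, the NULL fm_pdev case corresponds to a failure before the device reference was taken, so it has to jump to a label below the one that calls put_device().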
[PATCH V2] fsl/fman: add missing put_devcie() call in fman_port_probe()
if of_find_device_by_node() succeed, fman_port_probe() doesn't have a corresponding put_device(). Thus add jump target to fix the exception handling for this function implementation. Fixes: 0572054617f3 ("fsl/fman: fix dereference null return value") Signed-off-by: Yu Kuai --- .../net/ethernet/freescale/fman/fman_port.c | 23 +-- 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c index d9baac0dbc7d..4ae5d844d1f5 100644 --- a/drivers/net/ethernet/freescale/fman/fman_port.c +++ b/drivers/net/ethernet/freescale/fman/fman_port.c @@ -1792,20 +1792,21 @@ static int fman_port_probe(struct platform_device *of_dev) if (!fm_node) { dev_err(port->dev, "%s: of_get_parent() failed\n", __func__); err = -ENODEV; - goto return_err; + goto free_port; } + of_node_put(port_node); fm_pdev = of_find_device_by_node(fm_node); of_node_put(fm_node); if (!fm_pdev) { err = -EINVAL; - goto return_err; + goto free_port; } fman = dev_get_drvdata(&fm_pdev->dev); if (!fman) { err = -EINVAL; - goto return_err; + goto put_device; } err = of_property_read_u32(port_node, "cell-index", &val); @@ -1813,7 +1814,7 @@ static int fman_port_probe(struct platform_device *of_dev) dev_err(port->dev, "%s: reading cell-index for %pOF failed\n", __func__, port_node); err = -EINVAL; - goto return_err; + goto put_device; } port_id = (u8)val; port->dts_params.id = port_id; @@ -1847,7 +1848,7 @@ static int fman_port_probe(struct platform_device *of_dev) } else { dev_err(port->dev, "%s: Illegal port type\n", __func__); err = -EINVAL; - goto return_err; + goto put_device; } port->dts_params.type = port_type; @@ -1861,7 +1862,7 @@ static int fman_port_probe(struct platform_device *of_dev) dev_err(port->dev, "%s: incorrect qman-channel-id\n", __func__); err = -EINVAL; - goto return_err; + goto put_device; } port->dts_params.qman_channel_id = qman_channel_id; } @@ -1871,20 +1872,18 @@ static int 
fman_port_probe(struct platform_device *of_dev) dev_err(port->dev, "%s: of_address_to_resource() failed\n", __func__); err = -ENOMEM; - goto return_err; + goto put_device; } port->dts_params.fman = fman; - of_node_put(port_node); - dev_res = __devm_request_region(port->dev, &res, res.start, resource_size(&res), "fman-port"); if (!dev_res) { dev_err(port->dev, "%s: __devm_request_region() failed\n", __func__); err = -EINVAL; - goto free_port; + goto put_device; } port->dts_params.base_addr = devm_ioremap(port->dev, res.start, @@ -1896,8 +1895,8 @@ static int fman_port_probe(struct platform_device *of_dev) return 0; -return_err: - of_node_put(port_node); +put_device: + put_device(&fm_pdev->dev); free_port: kfree(port); return err; -- 2.25.4
RE: [PATCH v2 net-next 0/3] fsl/qbman: in_interrupt() cleanup.
> -Original Message- > From: Sebastian Andrzej Siewior > Sent: 02 November 2020 01:23 > To: netdev@vger.kernel.org > Cc: Horia Geanta ; Aymen Sghaier > ; Herbert Xu ; David S. > Miller ; Madalin Bucur ; Jakub > Kicinski ; Leo Li ; Thomas Gleixner > ; Sebastian Andrzej Siewior > Subject: [PATCH v2 net-next 0/3] fsl/qbman: in_interrupt() cleanup. > > This is the in_interrupt() clean for FSL DPAA framework and the two > users. > > The `napi' parameter has been renamed to `sched_napi', the other parts > are same as in the previous post [0]. > > https://lore.kernel.org/linux-arm-kernel/20201027225454.3492351-1-bige...@linutronix.de/ > > Sebastian For the series, Reviewed-by: Madalin Bucur
[PATCH net-next 0/2] net: axienet: Dynamically enable MDIO interface
This patchset dynamically enables the MDIO interface. The background for
this change comes from the Cadence GEM controller (macb), in which MDC is
active only during MDIO read or write operations, while the PHY registers
are read or written. There it is implemented as an IP feature.

For axiethernet, dynamic MDC enable/disable is not supported in hardware,
so we are implementing it in software. This change doesn't affect any
existing functionality.

Clayton Rayment (1):
  net: xilinx: axiethernet: Enable dynamic MDIO MDC

Radhey Shyam Pandey (1):
  net: xilinx: axiethernet: Introduce helper functions for MDC enable/disable

 drivers/net/ethernet/xilinx/xilinx_axienet.h      |  2 +
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 21 ++---
 drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 56 ++-
 3 files changed, 51 insertions(+), 28 deletions(-)

--
2.7.4
[PATCH net-next 1/2] net: xilinx: axiethernet: Introduce helper functions for MDC enable/disable
Introduce helper functions to enable/disable MDIO interface clock. This change serves a preparatory patch for the coming feature to dynamically control the management bus clock. Signed-off-by: Radhey Shyam Pandey --- drivers/net/ethernet/xilinx/xilinx_axienet.h | 2 ++ drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 29 +++ 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h index 7326ad4..a03c3ca 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h @@ -378,6 +378,7 @@ struct axidma_bd { * @dev: Pointer to device structure * @phy_node: Pointer to device node structure * @mii_bus: Pointer to MII bus structure + * @mii_clk_div: MII bus clock divider value * @regs_start: Resource start for axienet device addresses * @regs: Base address for the axienet_local device address space * @dma_regs: Base address for the axidma device address space @@ -427,6 +428,7 @@ struct axienet_local { /* MDIO bus data */ struct mii_bus *mii_bus;/* MII bus reference */ + u8 mii_clk_div; /* MII bus clock divider value */ /* IO registers, dma functions and IRQs */ resource_size_t regs_start; diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c index 435ed30..84d06bf 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c @@ -30,6 +30,23 @@ static int axienet_mdio_wait_until_ready(struct axienet_local *lp) 1, 2); } +/* Enable the MDIO MDC. Called prior to a read/write operation */ +static void axienet_mdio_mdc_enable(struct axienet_local *lp) +{ + axienet_iow(lp, XAE_MDIO_MC_OFFSET, + ((u32)lp->mii_clk_div | XAE_MDIO_MC_MDIOEN_MASK)); +} + +/* Disable the MDIO MDC. 
Called after a read/write operation*/ +static void axienet_mdio_mdc_disable(struct axienet_local *lp) +{ + u32 mc_reg; + + mc_reg = axienet_ior(lp, XAE_MDIO_MC_OFFSET); + axienet_iow(lp, XAE_MDIO_MC_OFFSET, + (mc_reg & ~XAE_MDIO_MC_MDIOEN_MASK)); +} + /** * axienet_mdio_read - MDIO interface read function * @bus: Pointer to mii bus structure @@ -124,7 +141,9 @@ static int axienet_mdio_write(struct mii_bus *bus, int phy_id, int reg, **/ int axienet_mdio_enable(struct axienet_local *lp) { - u32 clk_div, host_clock; + u32 host_clock; + + lp->mii_clk_div = 0; if (lp->clk) { host_clock = clk_get_rate(lp->clk); @@ -176,19 +195,19 @@ int axienet_mdio_enable(struct axienet_local *lp) * "clock-frequency" from the CPU */ - clk_div = (host_clock / (MAX_MDIO_FREQ * 2)) - 1; + lp->mii_clk_div = (host_clock / (MAX_MDIO_FREQ * 2)) - 1; /* If there is any remainder from the division of * fHOST / (MAX_MDIO_FREQ * 2), then we need to add * 1 to the clock divisor or we will surely be above 2.5 MHz */ if (host_clock % (MAX_MDIO_FREQ * 2)) - clk_div++; + lp->mii_clk_div++; netdev_dbg(lp->ndev, "Setting MDIO clock divisor to %u/%u Hz host clock.\n", - clk_div, host_clock); + lp->mii_clk_div, host_clock); - axienet_iow(lp, XAE_MDIO_MC_OFFSET, clk_div | XAE_MDIO_MC_MDIOEN_MASK); + axienet_iow(lp, XAE_MDIO_MC_OFFSET, lp->mii_clk_div | XAE_MDIO_MC_MDIOEN_MASK); return axienet_mdio_wait_until_ready(lp); } -- 2.7.4
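The divisor arithmetic in axienet_mdio_enable() can be tried out in userspace. The sketch below mirrors the calculation from the diff; the value 2500000 for MAX_MDIO_FREQ is taken from the "2.5 MHz" comment, and the relation MDC = host_clock / ((clk_div + 1) * 2) is an assumption inferred from how the divisor is derived, not something the patch states. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MDIO_FREQ 2500000u	/* 2.5 MHz, per the comment in the patch */

/* Same arithmetic as the patch: divide, subtract one, round up on remainder. */
static uint32_t mdio_clk_div(uint32_t host_clock)
{
	uint32_t clk_div;

	/* Sketch-only precondition so the subtraction cannot wrap. */
	assert(host_clock >= MAX_MDIO_FREQ * 2);

	clk_div = host_clock / (MAX_MDIO_FREQ * 2) - 1;

	/* Any remainder would leave MDC above 2.5 MHz, so bump the divisor. */
	if (host_clock % (MAX_MDIO_FREQ * 2))
		clk_div++;

	return clk_div;
}

/* Assumed relation between divisor and resulting MDC frequency, in Hz. */
static uint32_t mdio_mdc_hz(uint32_t host_clock, uint32_t clk_div)
{
	return host_clock / ((clk_div + 1) * 2);
}
```

With a 100 MHz host clock this gives a divisor of 19 and exactly 2.5 MHz; with 156.25 MHz the remainder pushes the divisor to 31, which keeps MDC just below the limit.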
[PATCH net-next 2/2] net: xilinx: axiethernet: Enable dynamic MDIO MDC
From: Clayton Rayment MDIO spec does not require an MDC at all times, only when MDIO transactions are occurring. This patch allows the xilinx_axienet driver to disable the MDC when not in use, and re-enable it when needed. It also simplifies the driver by removing MDC disable and enable in device reset sequence. Signed-off-by: Clayton Rayment Signed-off-by: Radhey Shyam Pandey --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 21 -- drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 27 ++- 2 files changed, 25 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 529c167..6fea980 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -1049,20 +1049,13 @@ static int axienet_open(struct net_device *ndev) dev_dbg(&ndev->dev, "axienet_open()\n"); - /* Disable the MDIO interface till Axi Ethernet Reset is completed. -* When we do an Axi Ethernet reset, it resets the complete core -* including the MDIO. MDIO must be disabled before resetting -* and re-enabled afterwards. + /* When we do an Axi Ethernet reset, it resets the complete core +* including the MDIO. MDIO must be disabled before resetting. * Hold MDIO bus lock to avoid MDIO accesses during the reset. 
*/ mutex_lock(&lp->mii_bus->mdio_lock); - axienet_mdio_disable(lp); ret = axienet_device_reset(ndev); - if (ret == 0) - ret = axienet_mdio_enable(lp); mutex_unlock(&lp->mii_bus->mdio_lock); - if (ret < 0) - return ret; ret = phylink_of_phy_connect(lp->phylink, lp->dev->of_node, 0); if (ret) { @@ -1156,9 +1149,7 @@ static int axienet_stop(struct net_device *ndev) /* Do a reset to ensure DMA is really stopped */ mutex_lock(&lp->mii_bus->mdio_lock); - axienet_mdio_disable(lp); __axienet_device_reset(lp); - axienet_mdio_enable(lp); mutex_unlock(&lp->mii_bus->mdio_lock); cancel_work_sync(&lp->dma_err_task); @@ -1669,16 +1660,12 @@ static void axienet_dma_err_handler(struct work_struct *work) axienet_setoptions(ndev, lp->options & ~(XAE_OPTION_TXEN | XAE_OPTION_RXEN)); - /* Disable the MDIO interface till Axi Ethernet Reset is completed. -* When we do an Axi Ethernet reset, it resets the complete core -* including the MDIO. MDIO must be disabled before resetting -* and re-enabled afterwards. + /* When we do an Axi Ethernet reset, it resets the complete core +* including the MDIO. MDIO must be disabled before resetting. * Hold MDIO bus lock to avoid MDIO accesses during the reset. 
*/ mutex_lock(&lp->mii_bus->mdio_lock); - axienet_mdio_disable(lp); __axienet_device_reset(lp); - axienet_mdio_enable(lp); mutex_unlock(&lp->mii_bus->mdio_lock); for (i = 0; i < lp->tx_bd_num; i++) { diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c index 84d06bf..9c014ce 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c @@ -65,9 +65,13 @@ static int axienet_mdio_read(struct mii_bus *bus, int phy_id, int reg) int ret; struct axienet_local *lp = bus->priv; + axienet_mdio_mdc_enable(lp); + ret = axienet_mdio_wait_until_ready(lp); - if (ret < 0) + if (ret < 0) { + axienet_mdio_mdc_disable(lp); return ret; + } axienet_iow(lp, XAE_MDIO_MCR_OFFSET, (((phy_id << XAE_MDIO_MCR_PHYAD_SHIFT) & @@ -78,14 +82,17 @@ static int axienet_mdio_read(struct mii_bus *bus, int phy_id, int reg) XAE_MDIO_MCR_OP_READ_MASK)); ret = axienet_mdio_wait_until_ready(lp); - if (ret < 0) + if (ret < 0) { + axienet_mdio_mdc_disable(lp); return ret; + } rc = axienet_ior(lp, XAE_MDIO_MRD_OFFSET) & 0x; dev_dbg(lp->dev, "axienet_mdio_read(phy_id=%i, reg=%x) == %x\n", phy_id, reg, rc); + axienet_mdio_mdc_disable(lp); return rc; } @@ -111,9 +118,13 @@ static int axienet_mdio_write(struct mii_bus *bus, int phy_id, int reg, dev_dbg(lp->dev, "axienet_mdio_write(phy_id=%i, reg=%x, val=%x)\n", phy_id, reg, val); + axienet_mdio_mdc_enable(lp); + ret = axienet_mdio_wait_until_ready(lp); - if (ret < 0) + if (ret < 0) { + axienet_mdio_mdc_disable(lp); return ret; + } axienet_iow(lp, XAE_MDIO_MWD_OFFSET, (u32) val); axienet_iow(lp, XAE_MDIO_MCR_OFFSET, @@ -125,8 +136,11 @@ static i
Re: [PATCH v7 0/6] CTU CAN FD open-source IP core SocketCAN driver, PCI, platform integration and documentation
On 11/3/20 11:00 AM, Pavel Pisa wrote:
> On Saturday 31 of October 2020 12:35:11 Marc Kleine-Budde wrote:
>> On 10/30/20 11:19 PM, Pavel Pisa wrote:
>>> This driver adds support for the CTU CAN FD open-source IP core.
>>
>> Please fix the following checkpatch warnings/errors:
>
> Yes I recheck with actual checkpatch, I have used 5.4 one
> and may it be overlooked something during last upadates.

I used the latest one from linus/master :)

>> -
>> drivers/net/can/ctucanfd/ctucanfd_frame.h
>> -
>> CHECK: Please don't use multiple blank lines
>> #46: FILE: drivers/net/can/ctucanfd/ctucanfd_frame.h:46:
>
> OK, we find a reason for this blank line in header generator.
>
>> CHECK: Prefer kernel type 'u32' over 'uint32_t'
>> #49: FILE: drivers/net/can/ctucanfd/ctucanfd_frame.h:49:
>> +uint32_t u32;
>
> In this case, please confirm that even your personal opinion
> is against uint32_t in headers, you request the change.

confirmed :)

> uint32_t is used in many kernel headers and in this case
> allows our tooling to use headers for mutual test of HDL
> design match with HW access in the C.

It's probably historically related :)

> If the reasons to remove uint32_t prevails, we need to
> separate Linux generator from the one used for other
> purposes. When we add Linux mode then we can revamp
> headers even more and in such case we can even invest
> time to switch from structure bitfields to plain bitmask
> defines.

This is another point I wanted to address. Obviously checkpatch doesn't
complain about bitfields, but they are frowned upon.

> It is quite lot of work and takes some time,
> but if there is consensus I do it during next weeks,
> I would like to see what is preferred way to define
> registers bitfields. I personally like RTEMS approach
> for which we have prepared generator from parsed PDFs
> when we added BSP for TMS570
>
>
> https://git.rtems.org/rtems/tree/bsps/arm/tms570/include/bsp/ti_herc/reg_dcan.h#n152

The current Linux way is to define a bitmask with GENMASK() and a
single-bit mask with BIT(). For example the mcp251xfd driver:

First the register offset:
> #define MCP251XFD_REG_CON 0x00

Then a bitmask:
> #define MCP251XFD_REG_CON_TXBWS_MASK GENMASK(31, 28)

And a single bit:
> #define MCP251XFD_REG_CON_ABAT BIT(27)

see:
https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/net/can/spi/mcp251xfd/mcp251xfd.h#L24

The masks are used with FIELD_GET, FIELD_PREP. For example:
https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c#L1386

> Other solution I like (biased, because I have even designed it)
> is
>
> #define __val2mfld(mask,val) (((mask)&~((mask)<<1))*(val)&(mask))
> #define __mfld2val(mask,val) (((val)&(mask))/((mask)&~((mask)<<1)))
>
> https://gitlab.com/pikron/sw-base/sysless/-/blob/master/arch/arm/generic/defines/cpu_def.h#L314
>
> Which allows to use simple masks, i.e.
> #define SSP_CR0_DSS_m 0x000f /* Data Size Select (num bits - 1) */
> #define SSP_CR0_FRF_m 0x0030 /* Frame Format: 0 SPI, 1 TI, 2 Microwire */
> #define SSP_CR0_CPOL_m 0x0040 /* SPI Clock Polarity. 0 low between frames,
> 1 high */ #
>
>
> https://gitlab.com/pikron/sw-base/sysless/-/blob/master/libs4c/spi/spi_lpcssp.c#L46
>
> in the sources
>
> lpcssp_drv->ssp_regs->CR0 =
> __val2mfld(SSP_CR0_DSS_m, lpcssp_drv->data16_fl? 16 - 1 :
> 8 - 1) |
> __val2mfld(SSP_CR0_FRF_m, 0) |
> (msg->size_mode & SPI_MODE_CPOL? SSP_CR0_CPOL_m: 0) |
> (msg->size_mode & SPI_MODE_CPHA? SSP_CR0_CPHA_m: 0) |
> __val2mfld(SSP_CR0_SCR_m, rate);
>
>
> https://gitlab.com/pikron/sw-base/sysless/-/blob/master/libs4c/spi/spi_lpcssp.c#L217
>
> If you have some preferred Linux style then please send us pointers.
> In the fact, Ondrej Ille has based his structure bitfileds style
> on the other driver included in the Linux kernel and it seems
> to be a problem now. So when I invest my time, I want to use style
> which pleases me and others.

Hope that helps,
Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-     |
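For comparison, the two mask styles discussed in this mail can be modelled in a few lines of userspace C. This is a sketch only: BIT32(), GENMASK32(), field_prep32() and field_get32() are hypothetical 32-bit stand-ins written for this illustration, not the kernel's BIT(), GENMASK(), FIELD_PREP() and FIELD_GET() macros. The two helper functions scale by the lowest set bit of the mask, which is the same idea as the quoted __val2mfld()/__mfld2val() macros for contiguous masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-ins modelled on BIT() and GENMASK(h, l). */
#define BIT32(n)	(UINT32_C(1) << (n))
#define GENMASK32(h, l)	((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

/* Lowest set bit of a mask, e.g. 0x0030 -> 0x0010. */
static uint32_t mask_lsb(uint32_t mask)
{
	assert(mask != 0);
	return mask & (~mask + 1);
}

/* Place a field value into its position within a register word. */
static uint32_t field_prep32(uint32_t mask, uint32_t val)
{
	return (val * mask_lsb(mask)) & mask;
}

/* Extract a field value from a register word. */
static uint32_t field_get32(uint32_t mask, uint32_t reg)
{
	return (reg & mask) / mask_lsb(mask);
}
```

With these, field_prep32(GENMASK32(31, 28), 0xA) yields 0xA0000000 and field_get32(0x0030, 0x0020) yields 2, so either way of writing the mask ends up driving the same shift-free arithmetic.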
[PATCH -next] dpaa_eth: use false and true for bool variables
Fix coccicheck warnings: ./dpaa_eth.c:2549:2-22: WARNING: Assignment of 0/1 to bool variable ./dpaa_eth.c:2562:2-22: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot Signed-off-by: Zou Wei --- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index d9c2859..31407c1 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2546,7 +2546,7 @@ static void dpaa_eth_napi_enable(struct dpaa_priv *priv) for_each_online_cpu(i) { percpu_priv = per_cpu_ptr(priv->percpu_priv, i); - percpu_priv->np.down = 0; + percpu_priv->np.down = false; napi_enable(&percpu_priv->np.napi); } } @@ -2559,7 +2559,7 @@ static void dpaa_eth_napi_disable(struct dpaa_priv *priv) for_each_online_cpu(i) { percpu_priv = per_cpu_ptr(priv->percpu_priv, i); - percpu_priv->np.down = 1; + percpu_priv->np.down = true; napi_disable(&percpu_priv->np.napi); } } -- 2.6.2
Re: [PATCH ipsec] xfrm: Pass template address family to xfrm_state_look_at
On Mon, Nov 02, 2020 at 06:32:19PM -0800, Anthony DeRossi wrote:
> This fixes a regression where valid selectors are incorrectly skipped
> when xfrm_state_find is called with a non-matching address family (e.g.
> when using IPv6-in-IPv4 ESP in transport mode).
>
> The state's address family is matched against the template's family
> (encap_family) in xfrm_state_find before checking the selector in
> xfrm_state_look_at. The template's family should also be used for
> selector matching, otherwise valid selectors may be skipped.
>
> Fixes: e94ee171349d ("xfrm: Use correct address family in xfrm_state_find")
> Signed-off-by: Anthony DeRossi
> ---
>  net/xfrm/xfrm_state.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Your patch reintroduces the same bug that my patch was trying to fix,
namely that when you do the comparison on flow you must use the
original family and not some other value.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH ipsec] xfrm: Pass template address family to xfrm_state_look_at
On Mon, Nov 02, 2020 at 06:32:19PM -0800, Anthony DeRossi wrote:
> This fixes a regression where valid selectors are incorrectly skipped
> when xfrm_state_find is called with a non-matching address family (e.g.
> when using IPv6-in-IPv4 ESP in transport mode).

Why are we even allowing v6-over-v4 in transport mode? Isn't that
the whole point of BEET mode?

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
RE: [PATCH -next] dpaa_eth: use false and true for bool variables
> -Original Message- > From: Zou Wei > Sent: 03 November 2020 14:05 > To: Madalin Bucur ; da...@davemloft.net; > k...@kernel.org > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Zou Wei > > Subject: [PATCH -next] dpaa_eth: use false and true for bool variables > > Fix coccicheck warnings: > > ./dpaa_eth.c:2549:2-22: WARNING: Assignment of 0/1 to bool variable > ./dpaa_eth.c:2562:2-22: WARNING: Assignment of 0/1 to bool variable > > Reported-by: Hulk Robot > Signed-off-by: Zou Wei > --- > drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c > b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c > index d9c2859..31407c1 100644 > --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c > +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c > @@ -2546,7 +2546,7 @@ static void dpaa_eth_napi_enable(struct dpaa_priv > *priv) > for_each_online_cpu(i) { > percpu_priv = per_cpu_ptr(priv->percpu_priv, i); > > - percpu_priv->np.down = 0; > + percpu_priv->np.down = false; > napi_enable(&percpu_priv->np.napi); > } > } > @@ -2559,7 +2559,7 @@ static void dpaa_eth_napi_disable(struct dpaa_priv > *priv) > for_each_online_cpu(i) { > percpu_priv = per_cpu_ptr(priv->percpu_priv, i); > > - percpu_priv->np.down = 1; > + percpu_priv->np.down = true; > napi_disable(&percpu_priv->np.napi); > } > } > -- > 2.6.2 Acked-by: Madalin Bucur
RE: [PATCH net v3 0/2] dpaa_eth: buffer layout fixes
> -Original Message- > From: Camelia Groza > Sent: 02 November 2020 20:35 > To: willemdebruijn.ker...@gmail.com; Madalin Bucur (OSS) > ; da...@davemloft.net; k...@kernel.org > Cc: netdev@vger.kernel.org; Camelia Alexandra Groza > Subject: [PATCH net v3 0/2] dpaa_eth: buffer layout fixes > > The patches are related to the software workaround for the A050385 erratum. > The first patch ensures optimal buffer usage for non-erratum scenarios. > The > second patch fixes a currently inconsequential discrepancy between the > FMan and Ethernet drivers. > > Changes in v3: > - refactor defines for clarity in 1/2 > - add more details on the user impact in 1/2 > - remove unnecessary inline identifier in 2/2 > > Changes in v2: > - make the returned value for TX ports explicit in 2/2 > - simplify the buf_layout reference in 2/2 > > Camelia Groza (2): > dpaa_eth: update the buffer layout for non-A050385 erratum scenarios > dpaa_eth: fix the RX headroom size alignment > > drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 28 + > - > 1 file changed, 18 insertions(+), 10 deletions(-) > > -- > 1.9.1 For the series, Acked-by: Madalin Bucur
Re: [PATCH bpf-next 0/5] selftests/xsk: xsk selftests
On Mon, 2 Nov 2020 at 23:08, Daniel Borkmann wrote: > > On 10/30/20 1:13 PM, Weqaar Janjua wrote: > > This patch set adds AF_XDP selftests based on veth to selftests/xsk/. > > > > # Topology: > > # - > > # --- > > # _ | Process | _ > > # / --- \ > > # /|\ > > #/ | \ > > # --- | --- > > # | Thread1 | | | Thread2 | > > # --- | --- > > # | | | > > # --- | --- > > # | xskX | | | xskY | > > # --- | --- > > # | | | > > # --- | -- > > # | vethX | - | vethY | > > # --- peer-- > > # | | | > > # namespaceX | namespaceY > > > > These selftests test AF_XDP SKB and Native/DRV modes using veth Virtual > > Ethernet interfaces. > > > > The test program contains two threads, each thread is single socket with > > a unique UMEM. It validates in-order packet delivery and packet content > > by sending packets to each other. > > > > Prerequisites setup by script TEST_PREREQUISITES.sh: > > > > Set up veth interfaces as per the topology shown ^^: > > * setup two veth interfaces and one namespace > > ** veth in root namespace > > ** veth in af_xdp namespace > > ** namespace af_xdp > > * create a spec file veth.spec that includes this run-time configuration > > that is read by test scripts - filenames prefixed with TEST_XSK > > *** and are randomly generated 4 digit numbers used to avoid > > conflict with any existing interface. > > > > The following tests are provided: > > > > 1. AF_XDP SKB mode > > Generic mode XDP is driver independent, used when the driver does > > not have support for XDP. Works on any netdevice using sockets and > > generic XDP path. XDP hook from netif_receive_skb(). > > a. nopoll - soft-irq processing > > b. poll - using poll() syscall > > c. Socket Teardown > >Create a Tx and a Rx socket, Tx from one socket, Rx on another. > >Destroy both sockets, then repeat multiple times. Only nopoll mode > > is used > > d. Bi-directional Sockets > >Configure sockets as bi-directional tx/rx sockets, sets up fill > > and completion rings on each socket, tx/rx in both directions. 
> > Only nopoll mode is used > > > > 2. AF_XDP DRV/Native mode > > Works on any netdevice with XDP_REDIRECT support, driver dependent. > > Processes packets before SKB allocation. Provides better performance > > than SKB. Driver hook available just after DMA of buffer descriptor. > > a. nopoll > > b. poll > > c. Socket Teardown > > d. Bi-directional Sockets > > * Only copy mode is supported because veth does not currently support > > zero-copy mode > > > > Total tests: 8. > > > > Flow: > > * Single process spawns two threads: Tx and Rx > > * Each of these two threads attach to a veth interface within their > >assigned namespaces > > * Each thread creates one AF_XDP socket connected to a unique umem > >for each veth interface > > * Tx thread transmits 10k packets from veth to veth > > * Rx thread verifies if all 10k packets were received and delivered > >in-order, and have the right content > > > > Structure of the patch set: > > > > Patch 1: This patch adds XSK Selftests framework under > > tools/testing/selftests/xsk, and README > > Patch 2: Adds tests: SKB poll and nopoll mode, mac-ip-udp debug, > > and README updates > > Patch 3: Adds tests: DRV poll and nopoll mode, and README updates > > Patch 4: Adds tests: SKB and DRV Socket Teardown, and README updates > > Patch 5: Adds tests: SKB and DRV Bi-directional Sockets, and README > > updates > > > > Thanks: Weqaar > > > > Weqaar Janjua (5): > >selftests/xsk: xsk selftests framework > >selftests/xsk: xsk selftests - SKB POLL, NOPOLL > >selftests/xsk: xsk selftests - DRV POLL, NOPOLL > >selftests/xsk: xsk selftests - Socket Teardown - SKB, DRV > >selftests/xsk: xsk selftests - Bi-directional Sockets - SKB, DRV > > Thanks a lot for adding the selftests, Weqaar! Given this needs to copy quite > a bit of BPF selftest base infra e.g. from Makefiles I'd prefer if you could > place these under selftests/bpf/ instead to avoid duplicating changes into two > locations. 
I understand that these tests don't integrate well into test_progs, > but for example see test_tc_redirect.sh or test_tc_edt.sh for stand-alone > tests > which could be done similarly with the xsk ones. Would be great if you could > integrate them and spin a v2 with that. > > Thanks, > Daniel Hi Daniel, Appreciate the pointers and suggestions which I will re-evaluate against merg
RE: [PATCH v2 net-next 0/3] fsl/qbman: in_interrupt() cleanup.
> -Original Message- > From: Sebastian Andrzej Siewior > Sent: Monday, November 2, 2020 01:23 > To: netdev@vger.kernel.org > Cc: Horia Geanta ; Aymen Sghaier > ; Herbert Xu ; > David S. Miller ; Madalin Bucur > ; Jakub Kicinski ; Leo Li > ; Thomas Gleixner ; Sebastian > Andrzej Siewior > Subject: [PATCH v2 net-next 0/3] fsl/qbman: in_interrupt() cleanup. > > This is the in_interrupt() clean for FSL DPAA framework and the two > users. > > The `napi' parameter has been renamed to `sched_napi', the other parts > are same as in the previous post [0]. > > [0] > https://lore.kernel.org/linux-arm-kernel/20201027225454.3492351-1-bige...@linutronix.de/ > > Sebastian Tested-by: Camelia Groza
Re: lan78xx: /sys/class/net/eth0/carrier stuck at 1
On Fri, 23 Oct 2020 15:05:19 +0200 Andrew Lunn wrote:
> On Fri, Oct 23, 2020 at 08:29:59AM +0200, Juerg Haefliger wrote:
> > On Wed, 21 Oct 2020 21:35:48 +0200
> > Andrew Lunn wrote:
> >
> > > On Wed, Oct 21, 2020 at 05:00:53PM +0200, Juerg Haefliger wrote:
> > > > Hi,
> > > >
> > > > If the lan78xx driver is compiled into the kernel and the network cable is
> > > > plugged in at boot, /sys/class/net/eth0/carrier is stuck at 1 and doesn't
> > > > toggle if the cable is unplugged and replugged.
> > > >
> > > > If the network cable is *not* plugged in at boot, all seems to work fine.
> > > > I.e., post-boot cable plugs and unplugs toggle the carrier flag.
> > > >
> > > > Also, everything seems to work fine if the driver is compiled as a module.
> > > >
> > > > There's an older ticket for the raspi kernel [1] but I've just tested this
> > > > with a 5.8 kernel on a Pi 3B+ and still see that behavior.
> > >
> > > Hi Jürg
> >
> > Hi Andrew,
> >
> > > Could you check if a different PHY driver is being used when it is
> > > built and broken vs module or built in and working.
> > >
> > > Look at /sys/class/net/eth0/phydev/driver
> >
> > There's no such file.
>
> I _think_ that means it is using genphy, the generic PHY driver, not a
> specific vendor PHY driver? What does
>
> /sys/class/net/eth0/phydev/phy_id contain.

There is no directory /sys/class/net/eth0/phydev.

$ ls /sys/class/net/eth0/
addr_assign_type    dev_id             link_mode         proto_down
addr_len            dev_port           mtu               queues
address             device             name_assign_type  speed
broadcast           dormant            netdev_group      statistics
carrier             duplex             operstate         subsystem
carrier_changes     flags              phys_port_id      tx_queue_len
carrier_down_count  gro_flush_timeout  phys_port_name    type
carrier_up_count    ifalias            phys_switch_id    uevent
                    ifindex            power
                    iflink

> > Given that all works fine as long as the cable is unplugged at boot points
> > more towards a race at boot or incorrect initialization sequence or
> > something.
>
> Could be. Could you run
>
> mii-tool -vv eth0

Hrm. Running that command unlocks the carrier flag and it starts toggling on
cable unplug/plug.

First invocation:
$ sudo mii-tool -vv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-FD flow-control, link ok
  registers for MII PHY 1:
    1040 79ed 0007 c132 05e1 cde1 000f 0200 0800 3000 0088 3200 0004 0040 a000 a000 a035
  product info: vendor 00:01:f0, model 19 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

Subsequent invocation:
$ sudo mii-tool -vv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-FD flow-control, link ok
  registers for MII PHY 1:
    1040 79ed 0007 c132 05e1 cde1 000d 0200 0800 3000 0088 3200 0004 0040 a000 a035
  product info: vendor 00:01:f0, model 19 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

In the first invocation, register 0x1a shows a pending link-change interrupt
(0xa000) which wasn't serviced (and cleared) for some reason. Dumping the
registers cleared that interrupt bit and things start working correctly
afterwards. Not sure yet why that first interrupt is ignored.

...Juerg

> in the good and bad case.
>
>    Andrew
[net-next,v1,0/5] seg6: add support for SRv6 End.DT4 behavior
This patchset provides support for the SRv6 End.DT4 behavior. The SRv6 End.DT4 is used to implement multi-tenant IPv4 L3 VPN. It decapsulates the received packets and performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device. The SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. - Patch 1/5 is needed to solve a pre-existing issue with tunneled packets when a sniffer is attached; - Patch 2/5 improves the management of the seg6local attributes used by the SRv6 behaviors; - Patch 3/5 introduces two callbacks used for customizing the creation/destruction of a SRv6 behavior; - Patch 4/5 is the core patch that adds support for the SRv6 End.DT4 behavior; - Patch 5/5 adds the selftest for SRv6 End.DT4 behavior. I would like to thank David Ahern for his support during the development of this patch set. Comments, suggestions and improvements are very welcome! Thanks, Andrea Mayer v1 improve comments; add new patch 2/5 titled: seg6: improve management of behavior attributes seg6: add support for the SRv6 End.DT4 behavior - remove the inline keyword in the definition of fib6_config_get_net(). selftests: add selftest for the SRv6 End.DT4 behavior - add check for the vrf sysctl [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming Andrea Mayer (5): vrf: add mac header for tunneled packets when sniffer is attached seg6: improve management of behavior attributes seg6: add callbacks for customizing the creation/destruction of a behavior seg6: add support for the SRv6 End.DT4 behavior selftests: add selftest for the SRv6 End.DT4 behavior drivers/net/vrf.c | 78 ++- net/ipv6/seg6_local.c | 370 - .../selftests/net/srv6_end_dt4_l3vpn_test.sh | 494 ++ 3 files changed, 927 insertions(+), 15 deletions(-) create mode 100755 tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh -- 2.20.1
[net-next,v1,1/5] vrf: add mac header for tunneled packets when sniffer is attached
Before this patch, a sniffer attached to a VRF used as the receiving interface of L3 tunneled packets detects them as malformed packets and it complains about that (i.e.: tcpdump shows bogus packets). The reason is that a tunneled L3 packet does not carry any L2 information and when the VRF is set as the receiving interface of a decapsulated L3 packet, no mac header is currently set or valid. Therefore, the purpose of this patch consists of adding a MAC header to any packet which is directly received on the VRF interface ONLY IF: i) a sniffer is attached on the VRF and ii) the mac header is not set. In this case, the mac address of the VRF is copied in both the destination and the source address of the ethernet header. The protocol type is set either to IPv4 or IPv6, depending on which L3 packet is received. Signed-off-by: Andrea Mayer --- drivers/net/vrf.c | 78 +++ 1 file changed, 72 insertions(+), 6 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 60c1aadece89..26f2ed02a5c1 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1263,6 +1263,61 @@ static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev, skb_dst_set(skb, &rt6->dst); } +static int vrf_prepare_mac_header(struct sk_buff *skb, + struct net_device *vrf_dev, u16 proto) +{ + struct ethhdr *eth; + int err; + + /* in general, we do not know if there is enough space in the head of +* the packet for hosting the mac header. +*/ + err = skb_cow_head(skb, LL_RESERVED_SPACE(vrf_dev)); + if (unlikely(err)) + /* no space in the skb head */ + return -ENOBUFS; + + __skb_push(skb, ETH_HLEN); + eth = (struct ethhdr *)skb->data; + + skb_reset_mac_header(skb); + + /* we set the ethernet destination and the source addresses to the +* address of the VRF device. 
+*/ + ether_addr_copy(eth->h_dest, vrf_dev->dev_addr); + ether_addr_copy(eth->h_source, vrf_dev->dev_addr); + eth->h_proto = htons(proto); + + /* the destination address of the Ethernet frame corresponds to the +* address set on the VRF interface; therefore, the packet is intended +* to be processed locally. +*/ + skb->protocol = eth->h_proto; + skb->pkt_type = PACKET_HOST; + + skb_postpush_rcsum(skb, skb->data, ETH_HLEN); + + skb_pull_inline(skb, ETH_HLEN); + + return 0; +} + +/* prepare and add the mac header to the packet if it was not set previously. + * In this way, packet sniffers such as tcpdump can parse the packet correctly. + * If the mac header was already set, the original mac header is left + * untouched and the function returns immediately. + */ +static int vrf_add_mac_header_if_unset(struct sk_buff *skb, + struct net_device *vrf_dev, + u16 proto) +{ + if (skb_mac_header_was_set(skb)) + return 0; + + return vrf_prepare_mac_header(skb, vrf_dev, proto); +} + static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, struct sk_buff *skb) { @@ -1289,9 +1344,15 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, skb->skb_iif = vrf_dev->ifindex; if (!list_empty(&vrf_dev->ptype_all)) { - skb_push(skb, skb->mac_len); - dev_queue_xmit_nit(skb, vrf_dev); - skb_pull(skb, skb->mac_len); + int err; + + err = vrf_add_mac_header_if_unset(skb, vrf_dev, + ETH_P_IPV6); + if (likely(!err)) { + skb_push(skb, skb->mac_len); + dev_queue_xmit_nit(skb, vrf_dev); + skb_pull(skb, skb->mac_len); + } } IP6CB(skb)->flags |= IP6SKB_L3SLAVE; @@ -1334,9 +1395,14 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, vrf_rx_stats(vrf_dev, skb->len); if (!list_empty(&vrf_dev->ptype_all)) { - skb_push(skb, skb->mac_len); - dev_queue_xmit_nit(skb, vrf_dev); - skb_pull(skb, skb->mac_len); + int err; + + err = vrf_add_mac_header_if_unset(skb, vrf_dev, ETH_P_IP); + if (likely(!err)) { + skb_push(skb, skb->mac_len); + dev_queue_xmit_nit(skb, vrf_dev); + 
skb_pull(skb, skb->mac_len); + } } skb = vrf_rcv_nfhook(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, vrf_dev); -- 2.20.1
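For readers following along outside the kernel, the header that this patch prepends can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct sketch_ethhdr and sketch_fill_mac_header() are hypothetical stand-ins for struct ethhdr and vrf_prepare_mac_header(), and none of the skb handling (skb_cow_head(), push/pull, checksum update) is modelled. It shows just the point made in the commit message: the VRF address is copied into both the destination and the source field, and the EtherType reflects the L3 protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6   /* bytes in a MAC address */
#define SKETCH_ETH_HLEN 14  /* bytes in an Ethernet header */

/* Hypothetical user-space stand-in for the kernel's struct ethhdr. */
struct sketch_ethhdr {
	uint8_t h_dest[SKETCH_ETH_ALEN];
	uint8_t h_source[SKETCH_ETH_ALEN];
	uint8_t h_proto[2];               /* EtherType, big-endian on the wire */
};

/* Use the device address for both source and destination, as the patch
 * does for the VRF device, and store the EtherType in network byte order.
 */
static void sketch_fill_mac_header(struct sketch_ethhdr *eth,
				   const uint8_t *dev_addr, uint16_t proto)
{
	assert(eth != NULL && dev_addr != NULL);

	memcpy(eth->h_dest, dev_addr, SKETCH_ETH_ALEN);
	memcpy(eth->h_source, dev_addr, SKETCH_ETH_ALEN);
	eth->h_proto[0] = (uint8_t)(proto >> 8);
	eth->h_proto[1] = (uint8_t)(proto & 0xff);
}
```

Calling it with 0x0800 or 0x86DD as proto corresponds to the ETH_P_IP and ETH_P_IPV6 cases in vrf_ip_rcv() and vrf_ip6_rcv().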
[net-next,v1,2/5] seg6: improve management of behavior attributes
Depending on the attribute (e.g. SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc), the parse() callback performs some validity checks on the provided input and updates the tunnel state (slwt) with the result of the parsing operation. However, an attribute may also need to reserve some additional resources (e.g. allocating memory or setting up an eBPF program) in the parse() callback to complete the parsing operation. The parse() callbacks are invoked by parse_nla_action() for each attribute belonging to a specific behavior. Given a behavior with N attributes, if the parsing of the i-th attribute fails, parse_nla_action() returns immediately with an error. Nonetheless, the resources acquired during the parsing of the first i-1 attributes are not freed by parse_nla_action(). Attributes which acquire resources must release them *in an explicit way* in both the seg6_local_{build/destroy}_state(). However, adding a new attribute of this type requires changes to seg6_local_{build/destroy}_state() to release the resources correctly. The seg6local infrastructure still lacks a simple and structured way to release the resources acquired in the parse() operations. We introduced a new callback in the struct seg6_action_param named destroy(). This callback releases any resource which may have been acquired in the parse() counterpart. Each attribute may or may not implement the destroy() callback depending on whether it needs to free some acquired resources. The destroy() callback comes with several advantages: 1) we can have as many attributes as we want for a given behavior with no need to explicitly free the taken resources; 2) as in the case of seg6_local_build_state(), seg6_local_destroy_state() does not need to handle the release of resources directly. Indeed, it calls the destroy_attrs() function which is in charge of calling the destroy() callback for every set attribute. 
We do not need to patch seg6_local_{build/destroy}_state() anymore as we add new attributes; 3) the code is more readable and better structured. Indeed, all the information needed to handle a given attribute are contained in only one place; 4) it facilitates the integration with new features introduced in further patches. Signed-off-by: Andrea Mayer --- net/ipv6/seg6_local.c | 103 ++ 1 file changed, 93 insertions(+), 10 deletions(-) diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index eba23279912d..63a82e2fdea9 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -710,6 +710,12 @@ static int cmp_nla_srh(struct seg6_local_lwt *a, struct seg6_local_lwt *b) return memcmp(a->srh, b->srh, len); } +static void destroy_attr_srh(struct seg6_local_lwt *slwt) +{ + kfree(slwt->srh); + slwt->srh = NULL; +} + static int parse_nla_table(struct nlattr **attrs, struct seg6_local_lwt *slwt) { slwt->table = nla_get_u32(attrs[SEG6_LOCAL_TABLE]); @@ -901,16 +907,33 @@ static int cmp_nla_bpf(struct seg6_local_lwt *a, struct seg6_local_lwt *b) return strcmp(a->bpf.name, b->bpf.name); } +static void destroy_attr_bpf(struct seg6_local_lwt *slwt) +{ + kfree(slwt->bpf.name); + if (slwt->bpf.prog) + bpf_prog_put(slwt->bpf.prog); + + slwt->bpf.name = NULL; + slwt->bpf.prog = NULL; +} + struct seg6_action_param { int (*parse)(struct nlattr **attrs, struct seg6_local_lwt *slwt); int (*put)(struct sk_buff *skb, struct seg6_local_lwt *slwt); int (*cmp)(struct seg6_local_lwt *a, struct seg6_local_lwt *b); + + /* optional destroy() callback useful for releasing resources which +* have been previously acquired in the corresponding parse() +* function. 
+*/ + void (*destroy)(struct seg6_local_lwt *slwt); }; static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = { [SEG6_LOCAL_SRH]= { .parse = parse_nla_srh, .put = put_nla_srh, - .cmp = cmp_nla_srh }, + .cmp = cmp_nla_srh, + .destroy = destroy_attr_srh }, [SEG6_LOCAL_TABLE] = { .parse = parse_nla_table, .put = put_nla_table, @@ -934,13 +957,68 @@ static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = { [SEG6_LOCAL_BPF]= { .parse = parse_nla_bpf, .put = put_nla_bpf, - .cmp = cmp_nla_bpf }, + .cmp = cmp_nla_bpf, + .destroy = destroy_attr_bpf }, }; +/* call the destroy() callback (if available) for each set attribute in + * @parsed_attrs, starting from attribute index @start up to @end excluded. + */ +static void __destroy_attrs(unsigned long parsed_attrs, int start, int end, +
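The destroy() mechanism described in this patch can be sketched in user-space C. The body of __destroy_attrs() is cut off in this archive, so the loop below is an assumption about its general shape, based only on the comment above it (walk the set bits of parsed_attrs from start up to end excluded and call destroy() where one is registered). All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ATTR_MAX 8

/* Hypothetical per-tunnel state: counts how often each attribute was freed. */
struct sketch_state {
	int released[SKETCH_ATTR_MAX + 1];
};

/* Hypothetical per-attribute descriptor with an optional destroy() hook. */
struct sketch_param {
	void (*destroy)(struct sketch_state *st);
};

static void sketch_destroy_attr1(struct sketch_state *st) { st->released[1]++; }
static void sketch_destroy_attr3(struct sketch_state *st) { st->released[3]++; }

/* Attributes 1 and 3 own resources; every other attribute has no hook. */
static const struct sketch_param sketch_params[SKETCH_ATTR_MAX + 1] = {
	[1] = { .destroy = sketch_destroy_attr1 },
	[3] = { .destroy = sketch_destroy_attr3 },
};

/* Call destroy() for each attribute whose bit is set in parsed_attrs,
 * scanning indexes start .. end - 1. Attributes without a hook are skipped.
 */
static void sketch_destroy_attrs(unsigned long parsed_attrs, int start, int end,
				 struct sketch_state *st)
{
	int i;

	assert(start >= 0 && end <= SKETCH_ATTR_MAX + 1);

	for (i = start; i < end; i++) {
		if (!(parsed_attrs & (1UL << i)))
			continue;
		if (sketch_params[i].destroy)
			sketch_params[i].destroy(st);
	}
}
```

With a range argument, a parser that fails on attribute i can release only what was acquired for attributes before i, while the final teardown can pass the full range.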
[net-next,v1,5/5] selftests: add selftest for the SRv6 End.DT4 behavior
this selftest is designed for evaluating the new SRv6 End.DT4 behavior used, in this example, for implementing IPv4 L3 VPN use cases. Signed-off-by: Andrea Mayer --- .../selftests/net/srv6_end_dt4_l3vpn_test.sh | 494 ++ 1 file changed, 494 insertions(+) create mode 100755 tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh diff --git a/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh new file mode 100755 index ..a5547fed5048 --- /dev/null +++ b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh @@ -0,0 +1,494 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# author: Andrea Mayer + +# This test is designed for evaluating the new SRv6 End.DT4 behavior used for +# implementing IPv4 L3 VPN use cases. +# +# Hereafter a network diagram is shown, where two different tenants (named 100 +# and 200) offer IPv4 L3 VPN services allowing hosts to communicate with each +# other across an IPv6 network. +# +# Only hosts belonging to the same tenant (and to the same VPN) can communicate +# with each other. Instead, the communication among hosts of different tenants +# is forbidden. +# In other words, hosts hs-t100-1 and hs-t100-2 are connected through the IPv4 +# L3 VPN of tenant 100 while hs-t200-3 and hs-t200-4 are connected using the +# IPv4 L3 VPN of tenant 200. Cross connection between tenant 100 and tenant 200 +# is forbidden and thus, for example, hs-t100-1 cannot reach hs-t200-3 and vice +# versa. +# +# Routers rt-1 and rt-2 implement IPv4 L3 VPN services leveraging the SRv6 +# architecture. The key components for such VPNs are: a) SRv6 Encap behavior, +# b) SRv6 End.DT4 behavior and c) VRF. +# +# To explain how an IPv4 L3 VPN based on SRv6 works, let us briefly consider an +# example where, within the same domain of tenant 100, the host hs-t100-1 pings +# the host hs-t100-2. 
+# +# First of all, L2 reachability of the host hs-t100-2 is taken into account by +# the router rt-1 which acts as an arp proxy. +# +# When the host hs-t100-1 sends an IPv4 packet destined to hs-t100-2, the +# router rt-1 receives the packet on the internal veth-t100 interface. Such +# interface is enslaved to the VRF vrf-100 whose associated table contains the +# SRv6 Encap route for encapsulating any IPv4 packet in a IPv6 plus the Segment +# Routing Header (SRH) packet. This packet is sent through the (IPv6) core +# network up to the router rt-2 that receives it on veth0 interface. +# +# The rt-2 router uses the 'localsid' routing table to process incoming +# IPv6+SRH packets which belong to the VPN of the tenant 100. For each of these +# packets, the SRv6 End.DT4 behavior removes the outer IPv6+SRH headers and +# performs the lookup on the vrf-100 table using the destination address of +# the decapsulated IPv4 packet. Afterwards, the packet is sent to the host +# hs-t100-2 through the veth-t100 interface. +# +# The ping response follows the same processing but this time the role of rt-1 +# and rt-2 are swapped. +# +# Of course, the IPv4 L3 VPN for tenant 200 works exactly as the IPv4 L3 VPN +# for tenant 100. In this case, only hosts hs-t200-3 and hs-t200-4 are able to +# connect with each other. +# +# +# +---+ +---+ +# | | | | +# | hs-t100-1 netns | | hs-t100-2 netns | +# | | | | +# | +-+ | | +-+ | +# | |veth0| | | |veth0| | +# | | 10.0.0.1/24 | | | | 10.0.0.2/24 | | +# | +-+ | | +-+ | +# |. | | . | +# +---+ +---+ +# .. +# .. +# .. +# +---+ +---+ +# |. | | . | +# | +---+ | | + | +# | | veth-t100 | | | | veth-t100 | | +# | | 10.0.0.254/24 |+--+ | | +--+| 10.0.0.254/24 | | +# | +---+---+| localsid | | | | localsid |+---+ | +# | || table | | | | table || | +# |+++ +--+ | | +--+ +++| +# || vrf-100 || || vrf-100 || +# |
[net-next,v1,4/5] seg6: add support for the SRv6 End.DT4 behavior
SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenant environments. It decapsulates the received packets and performs an IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. This constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e.: $ sysctl -wq net.vrf.strict_mode=1. At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing support for the SRv6 End.DT4 behavior. The iproute2 counterpart required for configuring the SRv6 End.DT4 behavior is already implemented along with the other supported SRv6 behaviors [3]. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6 [3] https://patchwork.ozlabs.org/patch/799837/ Signed-off-by: Andrea Mayer --- net/ipv6/seg6_local.c | 205 ++ 1 file changed, 205 insertions(+) diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 4b0f155d641d..a41074acd43e 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -57,6 +57,14 @@ struct bpf_lwt_prog { char *name; }; +struct seg6_end_dt4_info { + struct net *net; + /* VRF device associated to the routing table used by the SRv6 End.DT4 +* behavior for routing IPv4 packets. 
+*/ + int vrf_ifindex; +}; + struct seg6_local_lwt { int action; struct ipv6_sr_hdr *srh; @@ -66,6 +74,7 @@ struct seg6_local_lwt { int iif; int oif; struct bpf_lwt_prog bpf; + struct seg6_end_dt4_info dt4_info; int headroom; struct seg6_action_desc *desc; @@ -413,6 +422,194 @@ static int input_action_end_dx4(struct sk_buff *skb, return -EINVAL; } +#ifdef CONFIG_NET_L3_MASTER_DEV + +static struct net *fib6_config_get_net(const struct fib6_config *fib6_cfg) +{ + const struct nl_info *nli = &fib6_cfg->fc_nlinfo; + + return nli->nl_net; +} + +static int seg6_end_dt4_build(struct seg6_local_lwt *slwt, const void *cfg, + struct netlink_ext_ack *extack) +{ + struct seg6_end_dt4_info *info = &slwt->dt4_info; + int vrf_ifindex; + struct net *net; + + net = fib6_config_get_net(cfg); + + vrf_ifindex = l3mdev_ifindex_lookup_by_table_id(L3MDEV_TYPE_VRF, net, + slwt->table); + if (vrf_ifindex < 0) { + if (vrf_ifindex == -EPERM) { + NL_SET_ERR_MSG(extack, + "Strict mode for VRF is disabled"); + } else if (vrf_ifindex == -ENODEV) { + NL_SET_ERR_MSG(extack, "No such device"); + } else { + NL_SET_ERR_MSG(extack, "Unknown error"); + + pr_debug("seg6local: SRv6 End.DT4 creation error=%d\n", +vrf_ifindex); + } + + return vrf_ifindex; + } + + info->net = net; + info->vrf_ifindex = vrf_ifindex; + + return 0; +} + +/* The SRv6 End.DT4 behavior extracts the inner (IPv4) packet and routes the + * IPv4 packet by looking at the configured routing table. + * + * In the SRv6 End.DT4 use case, we can receive traffic (IPv6+Segment Routing + * Header packets) from several interfaces and the IPv6 destination address (DA) + * is used for retrieving the specific instance of the End.DT4 behavior that + * should process the packets. + * + * However, the inner IPv4 packet is not really bound to any receiving + * interface and thus the End.DT4 sets the VRF (associated with the + * corresponding routing table) as the *receiving* interface. 
+ * In other words, the End.DT4 processes a packet as if it has been received + * directly by the VRF (and not by one of its slave devices, if any). + * In this way, the VRF interface is used for routing the IPv4 packet in + * according to the routing table configured by the End.DT4 instance. + * + * This design allows you to get some interesting features like: + * 1) the statistics on rx packets; + * 2) the possibility to install a packet sniffer on the receiving interface + * (the VRF one) for looking at the incoming packets; + * 3) the possibility to leverage the netfilter prerouting hook for the inner + * IPv4 packet. + * + * This function returns: + * - the sk_buff* when the VRF rcv handler has processed the packet correctly; + * - NULL when the skb is consumed by the VRF rcv
[net-next,v1,3/5] seg6: add callbacks for customizing the creation/destruction of a behavior
We introduce two callbacks used for customizing the creation/destruction of a SRv6 behavior. Such callbacks are defined in the new struct seg6_local_lwtunnel_ops and hereafter we provide a brief description of them: - build_state(...): used for calling the custom constructor of the behavior during its initialization phase and after all the attributes have been parsed successfully; - destroy_state(...): used for calling the custom destructor of the behavior before it is completely destroyed. Signed-off-by: Andrea Mayer --- net/ipv6/seg6_local.c | 64 +++ 1 file changed, 64 insertions(+) diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 63a82e2fdea9..4b0f155d641d 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -33,11 +33,23 @@ struct seg6_local_lwt; +typedef int (*slwt_build_state_t)(struct seg6_local_lwt *slwt, const void *cfg, + struct netlink_ext_ack *extack); +typedef void (*slwt_destroy_state_t)(struct seg6_local_lwt *slwt); + +/* callbacks used for customizing the creation and destruction of a behavior */ +struct seg6_local_lwtunnel_ops { + slwt_build_state_t build_state; + slwt_destroy_state_t destroy_state; +}; + struct seg6_action_desc { int action; unsigned long attrs; int (*input)(struct sk_buff *skb, struct seg6_local_lwt *slwt); int static_headroom; + + struct seg6_local_lwtunnel_ops slwt_ops; }; struct bpf_lwt_prog { @@ -1015,6 +1027,45 @@ static void destroy_attrs(struct seg6_local_lwt *slwt) __destroy_attrs(attrs, 0, SEG6_LOCAL_MAX + 1, slwt); } +/* call the custom constructor of the behavior during its initialization phase + * and after that all its attributes have been parsed successfully. 
+ */ +static int +seg6_local_lwtunnel_build_state(struct seg6_local_lwt *slwt, const void *cfg, + struct netlink_ext_ack *extack) +{ + slwt_build_state_t build_func; + struct seg6_action_desc *desc; + int err = 0; + + desc = slwt->desc; + if (!desc) + return -EINVAL; + + build_func = desc->slwt_ops.build_state; + if (build_func) + err = build_func(slwt, cfg, extack); + + return err; +} + +/* call the custom destructor of the behavior which is invoked before the + * tunnel is going to be destroyed. + */ +static void seg6_local_lwtunnel_destroy_state(struct seg6_local_lwt *slwt) +{ + slwt_destroy_state_t destroy_func; + struct seg6_action_desc *desc; + + desc = slwt->desc; + if (!desc) + return; + + destroy_func = desc->slwt_ops.destroy_state; + if (destroy_func) + destroy_func(slwt); +} + static int parse_nla_action(struct nlattr **attrs, struct seg6_local_lwt *slwt) { struct seg6_action_param *param; @@ -1090,8 +1141,16 @@ static int seg6_local_build_state(struct net *net, struct nlattr *nla, err = parse_nla_action(tb, slwt); if (err < 0) + /* In case of error, the parse_nla_action() takes care of +* releasing resources which have been acquired during the +* processing of attributes. +*/ goto out_free; + err = seg6_local_lwtunnel_build_state(slwt, cfg, extack); + if (err < 0) + goto free_attrs; + newts->type = LWTUNNEL_ENCAP_SEG6_LOCAL; newts->flags = LWTUNNEL_STATE_INPUT_REDIRECT; newts->headroom = slwt->headroom; @@ -1100,6 +1159,9 @@ static int seg6_local_build_state(struct net *net, struct nlattr *nla, return 0; +free_attrs: + destroy_attrs(slwt); + out_free: kfree(newts); return err; @@ -1109,6 +1171,8 @@ static void seg6_local_destroy_state(struct lwtunnel_state *lwt) { struct seg6_local_lwt *slwt = seg6_local_lwtunnel(lwt); + seg6_local_lwtunnel_destroy_state(slwt); + destroy_attrs(slwt); return; -- 2.20.1
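The control flow this patch adds to seg6_local_build_state() and seg6_local_destroy_state() can be sketched in user-space C as follows. It is a simplified model under assumptions: the sketch_* names are hypothetical, attribute parsing is reduced to a single flag, and only the ordering is kept (parse attributes, then call the optional constructor, and release the attributes again if the constructor fails).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunnel state: attrs_held models resources taken by parsing. */
struct sketch_lwt {
	int attrs_held;
	int built;
};

typedef int (*sketch_build_t)(struct sketch_lwt *lwt);
typedef void (*sketch_destroy_t)(struct sketch_lwt *lwt);

/* Both callbacks are optional, like build_state/destroy_state in the patch. */
struct sketch_ops {
	sketch_build_t build_state;
	sketch_destroy_t destroy_state;
};

static int sketch_build_ok(struct sketch_lwt *lwt) { lwt->built = 1; return 0; }
static int sketch_build_fail(struct sketch_lwt *lwt) { (void)lwt; return -22; }
static void sketch_teardown(struct sketch_lwt *lwt) { lwt->built = 0; }

/* Parse first, then run the custom constructor; undo the parse on failure. */
static int sketch_create(struct sketch_lwt *lwt, const struct sketch_ops *ops)
{
	int err = 0;

	assert(lwt != NULL && ops != NULL);

	lwt->attrs_held = 1;            /* stands in for parse_nla_action() */

	if (ops->build_state)
		err = ops->build_state(lwt);
	if (err < 0) {
		lwt->attrs_held = 0;    /* stands in for the free_attrs label */
		return err;
	}

	return 0;
}

/* Run the custom destructor before the attributes are released. */
static void sketch_destroy(struct sketch_lwt *lwt, const struct sketch_ops *ops)
{
	if (ops->destroy_state)
		ops->destroy_state(lwt);
	lwt->attrs_held = 0;            /* stands in for destroy_attrs() */
}
```

A behavior that registers no callbacks goes through the same path unchanged, which matches the NULL checks on build_func and destroy_func in the patch.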
Re: [PATCH] cfg80211: make wifi driver probe
Kelvin Cheung writes: > We are preparing the Wi-Fi driver for Unisoc WCN chips. Please ignore > this draft version. There will be a formal version soon. Ok, I'll drop this. But please don't use HTML in mails, more info in the wiki page below. I recommend reading it all very carefully. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: KASAN: use-after-free Read in decode_session6
On Sun, Nov 1, 2020 at 1:40 PM syzbot wrote: > > syzbot has bisected this issue to: > > commit bcd623d8e9fa5f82bbd8cd464dc418d24139157b > Author: Xin Long > Date: Thu Oct 29 07:05:05 2020 + > > sctp: call sk_setup_caps in sctp_packet_transmit instead > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14df9cb850 > start commit: 68bb4665 Merge branch 'l2-multicast-forwarding-for-ocelot-.. > git tree: net-next > final oops: https://syzkaller.appspot.com/x/report.txt?x=16df9cb850 > console output: https://syzkaller.appspot.com/x/log.txt?x=12df9cb850 > kernel config: https://syzkaller.appspot.com/x/.config?x=eac680ae76558a0e > dashboard link: https://syzkaller.appspot.com/bug?extid=5be8aebb1b7dfa90ef31 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1128639850 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11bbf39850 > > Reported-by: syzbot+5be8aebb1b7dfa90e...@syzkaller.appspotmail.com > Fixes: bcd623d8e9fa ("sctp: call sk_setup_caps in sctp_packet_transmit > instead") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection I'm looking into this, Thanks.
[PATCH net-next] net: emaclite: Add error handling for of_address_ and phy read functions
From: Shravya Kumbham Add a ret variable, checks on the return value, and the corresponding error paths for the of_address_to_resource() and phy_read() functions. Addresses-Coverity: Event check_return value. Signed-off-by: Shravya Kumbham Signed-off-by: Radhey Shyam Pandey --- drivers/net/ethernet/xilinx/xilinx_emaclite.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c index 0c26f5b..fc5ccd1 100644 --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c @@ -820,7 +820,7 @@ static int xemaclite_mdio_write(struct mii_bus *bus, int phy_id, int reg, static int xemaclite_mdio_setup(struct net_local *lp, struct device *dev) { struct mii_bus *bus; - int rc; + int rc, ret; struct resource res; struct device_node *np = of_get_parent(lp->phy_node); struct device_node *npp; @@ -834,7 +834,13 @@ static int xemaclite_mdio_setup(struct net_local *lp, struct device *dev) } npp = of_get_parent(np); - of_address_to_resource(npp, 0, &res); + ret = of_address_to_resource(npp, 0, &res); + if (ret) { + dev_err(dev, "%s resource error!\n", + dev->of_node->full_name); + of_node_put(lp->phy_node); + return ret; + } if (lp->ndev->mem_start != res.start) { struct phy_device *phydev; phydev = of_phy_find_device(lp->phy_node); @@ -923,7 +929,7 @@ static int xemaclite_open(struct net_device *dev) xemaclite_disable_interrupts(lp); if (lp->phy_node) { - u32 bmcr; + int bmcr; lp->phy_dev = of_phy_connect(lp->ndev, lp->phy_node, xemaclite_adjust_link, 0, @@ -945,6 +951,13 @@ static int xemaclite_open(struct net_device *dev) /* Restart auto negotiation */ bmcr = phy_read(lp->phy_dev, MII_BMCR); + if (bmcr < 0) { + dev_err(&lp->ndev->dev, "phy_read failed\n"); + phy_disconnect(lp->phy_dev); + lp->phy_dev = NULL; + + return bmcr; + } bmcr |= (BMCR_ANENABLE | BMCR_ANRESTART); phy_write(lp->phy_dev, MII_BMCR, bmcr); -- 2.7.4
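The change of bmcr from u32 to int matters because phy_read() reports failure as a negative errno, and a negative value stored in an unsigned variable can no longer be tested with "< 0". A minimal user-space sketch of the pattern follows. sketch_phy_read() and sketch_restart_aneg() are hypothetical stand-ins, not the driver's code, and the -5 error value is only an example. The two bit values are the ones linux/mii.h uses for BMCR_ANENABLE and BMCR_ANRESTART.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BMCR_ANENABLE  0x1000 /* value of BMCR_ANENABLE in linux/mii.h */
#define SKETCH_BMCR_ANRESTART 0x0200 /* value of BMCR_ANRESTART in linux/mii.h */

/* Hypothetical stand-in for phy_read(): a 16-bit register value on success,
 * a negative error code on failure.
 */
static int sketch_phy_read(int simulate_failure)
{
	return simulate_failure ? -5 : 0x0140;
}

/* Read BMCR into a signed int so the error can be detected, then set the
 * autonegotiation bits. Returns 0 on success or the negative error code.
 */
static int sketch_restart_aneg(int simulate_failure, uint16_t *new_bmcr)
{
	int bmcr;

	assert(new_bmcr);

	bmcr = sketch_phy_read(simulate_failure);
	if (bmcr < 0)
		return bmcr;            /* propagate the error, write nothing */

	bmcr |= SKETCH_BMCR_ANENABLE | SKETCH_BMCR_ANRESTART;
	*new_bmcr = (uint16_t)bmcr;

	return 0;
}
```

On the failure path the output value is left untouched, which mirrors the patch returning before phy_write() is reached.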
[PATCH v6] lib: optimize cpumask_local_spread()
From: Yuqi Jin On multi-processor NUMA systems, an I/O driver looks for the CPU cores to which its IRQs will be bound. When the CPU cores in the local NUMA node have all been used, it is better for performance to pick the node closest to the local NUMA node instead of immediately choosing any online CPU. Currently, Intel DDIO affects only local sockets, so its performance improvement is due to the relative difference in performance between local socket I/O and remote socket I/O. To ensure that Intel DDIO’s benefits are available to applications where they are most useful, the IRQ can be pinned to particular sockets using Intel DDIO. This arrangement is called socket affinity. So this patch can help Intel DDIO work. The same holds for the I/O stash function on most processors. On a Huawei Kunpeng 920 server, there are 4 NUMA nodes (0 - 3) in the 2-CPU system (0 - 1). The topology of this server is as follows: available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 node 0 size: 63379 MB node 0 free: 61899 MB node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 node 1 size: 64509 MB node 1 free: 63942 MB node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 node 2 size: 64509 MB node 2 free: 63056 MB node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 node 3 size: 63997 MB node 3 free: 63420 MB node distances: node 0 1 2 3 0: 10 16 32 33 1: 16 10 25 32 2: 32 25 10 16 3: 33 32 16 10 We performed a PS (parameter server) business test, in which the client initiates a request through the network card and the server responds to the request after calculation. When two PS processes run on node2 and node3 separately and the network card is located on 'node2' which is in cpu1, the performance of node2 (26W QPS) and node3 (22W QPS) is different. 
It is better that the NIC queues are bound to the cpu1 cores in turn, then XPS will also be properly initialized, while cpumask_local_spread only considers the local node. When the number of NIC queues exceeds the number of cores in the local node, it returns to the online core directly. So when PS runs on node3 sending a calculated request, the performance is not as good as the node2. The IRQ from 369-392 will be bound from NUMA node0 to NUMA node3 with this patch, before the patch: Euler:/sys/bus/pci # cat /proc/irq/369/smp_affinity_list 0 Euler:/sys/bus/pci # cat /proc/irq/370/smp_affinity_list 1 ... Euler:/sys/bus/pci # cat /proc/irq/391/smp_affinity_list 22 Euler:/sys/bus/pci # cat /proc/irq/392/smp_affinity_list 23 After the patch: Euler:/sys/bus/pci # cat /proc/irq/369/smp_affinity_list 72 Euler:/sys/bus/pci # cat /proc/irq/370/smp_affinity_list 73 ... Euler:/sys/bus/pci # cat /proc/irq/391/smp_affinity_list 94 Euler:/sys/bus/pci # cat /proc/irq/392/smp_affinity_list 95 So the performance of the node3 is the same as node2 that is 26W QPS when the network card is still in 'node2' with the patch. It is considered that the NIC and other I/O devices shall initialize the interrupt binding, if the cores of the local node are used up, it is reasonable to return the node closest to it. Let's optimize it and find the nearest node through NUMA distance for the non-local NUMA nodes. Cc: Rusty Russell Cc: Andrew Morton Cc: Juergen Gross Cc: Paul Burton Cc: Michal Hocko Cc: Michael Ellerman Cc: Mike Rapoport Cc: Anshuman Khandual Signed-off-by: Yuqi Jin Signed-off-by: Shaokun Zhang --- Hi Andrew, I rebased this patch later following this thread [1] ChangeLog from v5: 1. Rebase to 5.10-rc2 ChangeLog from v4: 1. Rebase to 5.6-rc3 ChangeLog from v3: 1. Make spread_lock local to cpumask_local_spread(); 2. Add more descriptions on the affinities change in log; ChangeLog from v2: 1. Change the variables as static and use spinlock to protect; 2. 
Give more explanation on test and performance; [1] https://lkml.org/lkml/2020/6/30/1300 lib/cpumask.c | 66 +-- 1 file changed, 55 insertions(+), 11 deletions(-) diff --git a/lib/cpumask.c b/lib/cpumask.c index 85da6ab4fbb5..baecaf271770 100644 --- a/lib/cpumask.c +++ b/lib/cpumask.c @@ -193,6 +193,38 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask) } #endif +static void calc_node_distance(int *node_dist, int node) +{ + int i; + + for (i = 0; i < nr_node_ids; i++) + node_dist[i] = node_distance(node, i); +} + +static int find_nearest_node(int *node_dist, bool *used) +{ + int i, min_dist = node_dist[0], node_id = -1; + + /* Choose the first unused node to compare */ + for (i = 0; i < nr_node_ids; i++) { + if (used[i] == 0) { + min_dist = node_dist[i]; + node_id = i; + break; + } + } + + /* Compare and return the neares
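The nearest-node selection that the commit message describes can be tried out in user space with the distance table quoted from the Kunpeng 920 server. The patch's own find_nearest_node() is cut off in this archive, so the function below is not a copy of it. It is a sketch of the same idea (among the nodes not yet used, pick the one with the smallest distance to the local node), and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_NODES 4

/* Node distance table as printed in the commit message above. */
static const int sketch_distance[SKETCH_NR_NODES][SKETCH_NR_NODES] = {
	{ 10, 16, 32, 33 },
	{ 16, 10, 25, 32 },
	{ 32, 25, 10, 16 },
	{ 33, 32, 16, 10 },
};

/* Return the unused node with the smallest distance from node "from",
 * or -1 when every node has already been used.
 */
static int sketch_find_nearest_node(int from, const bool *used)
{
	int i, best = -1;

	assert(from >= 0 && from < SKETCH_NR_NODES);

	for (i = 0; i < SKETCH_NR_NODES; i++) {
		if (used[i])
			continue;
		if (best < 0 ||
		    sketch_distance[from][i] < sketch_distance[from][best])
			best = i;
	}

	return best;
}
```

Starting from node 2 (where the NIC sits in the test above) and marking each returned node as used, repeated calls visit the nodes in the order 2, 3, 1, 0, which follows the distances 10, 16, 25, 32 in row 2 of the table.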
Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
> > > static int rvu_devlink_info_get(struct devlink *devlink, struct > > devlink_info_req *req, > > > struct netlink_ext_ack *extack) { @@ > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu) > > > rvu_dl->dl = dl; > > > rvu_dl->rvu = rvu; > > > rvu->rvu_dl = rvu_dl; > > > - return 0; > > > + > > > + return rvu_health_reporters_create(rvu); > > > > when would this be called with rvu->rvu_dl == NULL? > > During initialization. This is the only caller, and it is only reached if rvu_dl is non-zero.
Re: KASAN: use-after-free Read in decode_session6
On Tue, Nov 3, 2020 at 9:14 PM Xin Long wrote: > > On Sun, Nov 1, 2020 at 1:40 PM syzbot > wrote: > > > > syzbot has bisected this issue to: > > > > commit bcd623d8e9fa5f82bbd8cd464dc418d24139157b > > Author: Xin Long > > Date: Thu Oct 29 07:05:05 2020 + > > > > sctp: call sk_setup_caps in sctp_packet_transmit instead > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14df9cb850 > > start commit: 68bb4665 Merge branch 'l2-multicast-forwarding-for-ocelot-.. > > git tree: net-next > > final oops: https://syzkaller.appspot.com/x/report.txt?x=16df9cb850 > > console output: https://syzkaller.appspot.com/x/log.txt?x=12df9cb850 > > kernel config: https://syzkaller.appspot.com/x/.config?x=eac680ae76558a0e > > dashboard link: https://syzkaller.appspot.com/bug?extid=5be8aebb1b7dfa90ef31 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1128639850 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11bbf39850 > > > > Reported-by: syzbot+5be8aebb1b7dfa90e...@syzkaller.appspotmail.com > > Fixes: bcd623d8e9fa ("sctp: call sk_setup_caps in sctp_packet_transmit > > instead") > > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > I'm looking into this, Thanks. This was actually caused by: commit a1dd2cf2f1aedabc2ca9bb4f90231a521c52d8eb Author: Xin Long Date: Thu Oct 29 15:05:03 2020 +0800 sctp: allow changing transport encap_port by peer packets where the IP6CB was overwritten by SCTP_INPUT_CB. I will fix it by bringing inet6_skb_parm back to sctp_input_cb: struct sctp_input_cb { + union { + struct inet_skb_parm h4; +#if IS_ENABLED(CONFIG_IPV6) + struct inet6_skb_parm h6; +#endif + } header; + __be16 encap_port; struct sctp_chunk *chunk; struct sctp_af *af; - __be16 encap_port; }; Will post it soon, Thanks.
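The reason the union has to come first can be illustrated with a small layout check in user-space C. This is a sketch under assumptions, not the kernel structures: struct sketch_ip6_parm is a made-up stand-in for inet6_skb_parm, the two sketch_cb_* layouts only mimic the shape of sctp_input_cb without and with the leading header, and 48 is the size of the cb[] scratch area in struct sk_buff. Both layers view the same bytes, so any SCTP field that starts inside the first sizeof(header) bytes would overwrite what the IPv6 layer stored there.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CB_SIZE 48   /* size of the skb control block scratch area */

/* Made-up stand-in for the IPv6 layer's per-packet control data. */
struct sketch_ip6_parm {
	int iif;
	unsigned short flags;
};

/* Layout without a reserved header: SCTP fields start at offset 0. */
struct sketch_cb_without_header {
	unsigned short encap_port;
	void *chunk;
};

/* Layout with the header union first: SCTP fields start after it. */
struct sketch_cb_with_header {
	union {
		struct sketch_ip6_parm h6;
	} header;
	unsigned short encap_port;
	void *chunk;
};

/* Nonzero when a field at byte offset "off" lies inside the first
 * hdr_len bytes, i.e. inside the region the IP layer also uses.
 */
static int sketch_overlaps_header(size_t off, size_t hdr_len)
{
	assert(hdr_len <= SKETCH_CB_SIZE);
	return off < hdr_len;
}
```

Checking offsetof(..., encap_port) against sizeof(struct sketch_ip6_parm) for the two layouts shows the overlap in the first case and none in the second.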
[PATCH net-next] enetc: Remove Tx checksumming offload code
Tx checksumming has been defeatured and completely removed from the h/w reference manual. Made a little cleanup for the TSE case as this is complementary code. Signed-off-by: Claudiu Manoil --- drivers/net/ethernet/freescale/enetc/enetc.c | 51 ++- drivers/net/ethernet/freescale/enetc/enetc.h | 5 +- .../net/ethernet/freescale/enetc/enetc_hw.h | 47 - .../net/ethernet/freescale/enetc/enetc_pf.c | 10 +--- .../net/ethernet/freescale/enetc/enetc_vf.c | 10 +--- 5 files changed, 32 insertions(+), 91 deletions(-) diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 52be6e315752..01089c30b462 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -47,40 +47,6 @@ netdev_tx_t enetc_xmit(struct sk_buff *skb, struct net_device *ndev) return NETDEV_TX_OK; } -static bool enetc_tx_csum(struct sk_buff *skb, union enetc_tx_bd *txbd) -{ - int l3_start, l3_hsize; - u16 l3_flags, l4_flags; - - if (skb->ip_summed != CHECKSUM_PARTIAL) - return false; - - switch (skb->csum_offset) { - case offsetof(struct tcphdr, check): - l4_flags = ENETC_TXBD_L4_TCP; - break; - case offsetof(struct udphdr, check): - l4_flags = ENETC_TXBD_L4_UDP; - break; - default: - skb_checksum_help(skb); - return false; - } - - l3_start = skb_network_offset(skb); - l3_hsize = skb_network_header_len(skb); - - l3_flags = 0; - if (skb->protocol == htons(ETH_P_IPV6)) - l3_flags = ENETC_TXBD_L3_IPV6; - - /* write BD fields */ - txbd->l3_csoff = enetc_txbd_l3_csoff(l3_start, l3_hsize, l3_flags); - txbd->l4_csoff = l4_flags; - - return true; -} - static void enetc_unmap_tx_buff(struct enetc_bdr *tx_ring, struct enetc_tx_swbd *tx_swbd) { @@ -146,22 +112,16 @@ static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb, if (do_vlan || do_tstamp) flags |= ENETC_TXBD_FLAGS_EX; - if (enetc_tx_csum(skb, &temp_bd)) - flags |= ENETC_TXBD_FLAGS_CSUM | ENETC_TXBD_FLAGS_L4CS; - else if 
(tx_ring->tsd_enable) + if (tx_ring->tsd_enable) flags |= ENETC_TXBD_FLAGS_TSE | ENETC_TXBD_FLAGS_TXSTART; /* first BD needs frm_len and offload flags set */ temp_bd.frm_len = cpu_to_le16(skb->len); temp_bd.flags = flags; - if (flags & ENETC_TXBD_FLAGS_TSE) { - u32 temp; - - temp = (skb->skb_mstamp_ns >> 5 & ENETC_TXBD_TXSTART_MASK) - | (flags << ENETC_TXBD_FLAGS_OFFSET); - temp_bd.txstart = cpu_to_le32(temp); - } + if (flags & ENETC_TXBD_FLAGS_TSE) + temp_bd.txstart = enetc_txbd_set_tx_start(skb->skb_mstamp_ns, + flags); if (flags & ENETC_TXBD_FLAGS_EX) { u8 e_flags = 0; @@ -1897,8 +1857,7 @@ static void enetc_kfree_si(struct enetc_si *si) static void enetc_detect_errata(struct enetc_si *si) { if (si->pdev->revision == ENETC_REV1) - si->errata = ENETC_ERR_TXCSUM | ENETC_ERR_VLAN_ISOL | -ENETC_ERR_UCMCSWP; + si->errata = ENETC_ERR_VLAN_ISOL | ENETC_ERR_UCMCSWP; } int enetc_pci_probe(struct pci_dev *pdev, const char *name, int sizeof_priv) diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h index dd0fb0c066d7..8532d23b54f5 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.h +++ b/drivers/net/ethernet/freescale/enetc/enetc.h @@ -147,9 +147,8 @@ struct enetc_msg_swbd { #define ENETC_REV1 0x1 enum enetc_errata { - ENETC_ERR_TXCSUM= BIT(0), - ENETC_ERR_VLAN_ISOL = BIT(1), - ENETC_ERR_UCMCSWP = BIT(2), + ENETC_ERR_VLAN_ISOL = BIT(0), + ENETC_ERR_UCMCSWP = BIT(1), }; #define ENETC_SI_F_QBV BIT(0) diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h index 17cf7c94fdb5..68ef4f959982 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h @@ -374,8 +374,7 @@ union enetc_tx_bd { __le16 frm_len; union { struct { - __le16 l3_csoff; - u8 l4_csoff; + u8 reserved[3]; u8 flags; }; /* default layout */ __le32 txstart; @@ -398,41 +397,37 @@ union enetc_tx_bd { } wb; /* writeback descriptor */ }; -#define 
ENETC_TXBD_FLAGS_L4CS BIT(0) -#define ENETC_TXBD_FLAGS_TSE
Re: [RFC PATCH ethtool] ethtool: Improve compatibility between netlink and ioctl interfaces
On Mon, Nov 02, 2020 at 11:58:03PM +0100, Michal Kubecek wrote: > On Mon, Nov 02, 2020 at 08:40:36PM +0200, Ido Schimmel wrote: > > +static int linkmodes_reply_adver_all_cb(const struct nlmsghdr *nlhdr, > > ^ advert? > > > + void *data) > > +{ > > + const struct nlattr *bitset_tb[ETHTOOL_A_BITSET_MAX + 1] = {}; > > + const struct nlattr *tb[ETHTOOL_A_LINKMODES_MAX + 1] = {}; > > + DECLARE_ATTR_TB_INFO(bitset_tb); > > + struct nl_context *nlctx = data; > > + struct nl_msg_buff *msgbuff; > > + DECLARE_ATTR_TB_INFO(tb); > > + struct nl_socket *nlsk; > > + struct nlattr *nest; > > + int ret; > > + > > + ret = mnl_attr_parse(nlhdr, GENL_HDRLEN, attr_cb, &tb_info); > > + if (ret < 0) > > + return ret; > > + if (!tb[ETHTOOL_A_LINKMODES_OURS]) > > + return -EINVAL; > > + > > + ret = mnl_attr_parse_nested(tb[ETHTOOL_A_LINKMODES_OURS], attr_cb, > > + &bitset_tb_info); > > + if (ret < 0) > > + return ret; > > + if (!bitset_tb[ETHTOOL_A_BITSET_SIZE] || > > + !bitset_tb[ETHTOOL_A_BITSET_VALUE] || > > + !bitset_tb[ETHTOOL_A_BITSET_MASK]) > > + return -EINVAL; > > + > > + ret = netlink_init_ethnl2_socket(nlctx); > > + if (ret < 0) > > + return ret; > > + > > + nlsk = nlctx->ethnl2_socket; > > + msgbuff = &nlsk->msgbuff; > > + > > + ret = msg_init(nlctx, msgbuff, ETHTOOL_MSG_LINKMODES_SET, > > + NLM_F_REQUEST | NLM_F_ACK); > > + if (ret < 0) > > + return ret; > > + if (ethnla_fill_header(msgbuff, ETHTOOL_A_LINKMODES_HEADER, > > + nlctx->devname, 0)) > > + return -EMSGSIZE; > > + > > + if (ethnla_put_u8(msgbuff, ETHTOOL_A_LINKMODES_AUTONEG, AUTONEG_ENABLE)) > > + return -EMSGSIZE; > > + > > + /* Use the size and mask from the reply and set the value to the mask, > > +* so that all supported link modes will be advertised. 
> > +*/ > > + ret = -EMSGSIZE; > > + nest = ethnla_nest_start(msgbuff, ETHTOOL_A_LINKMODES_OURS); > > + if (!nest) > > + return -EMSGSIZE; > > + > > + if (ethnla_put_u32(msgbuff, ETHTOOL_A_BITSET_SIZE, > > + mnl_attr_get_u32(bitset_tb[ETHTOOL_A_BITSET_SIZE]))) > > + goto err; > > + > > + if (ethnla_put(msgbuff, ETHTOOL_A_BITSET_VALUE, > > + > > mnl_attr_get_payload_len(bitset_tb[ETHTOOL_A_BITSET_MASK]), > > + mnl_attr_get_payload(bitset_tb[ETHTOOL_A_BITSET_MASK]))) > > + goto err; > > + > > + if (ethnla_put(msgbuff, ETHTOOL_A_BITSET_MASK, > > + > > mnl_attr_get_payload_len(bitset_tb[ETHTOOL_A_BITSET_MASK]), > > + mnl_attr_get_payload(bitset_tb[ETHTOOL_A_BITSET_MASK]))) > > + goto err; > > + > > + ethnla_nest_end(msgbuff, nest); > > To fully replicate ioctl code behaviour, we should only set the bits > corresponding to "real" link modes, not "special" ones (e.g. > ETHTOOL_LINK_MODE_TP_BIT). Michal, I have the changes you requested here: https://github.com/idosch/ethtool/commit/b34d15839f2662808c566c04eda726113e20ee59 Do you want to integrate it with your nl_parse() rework or should I? Thanks
Re: [PATCH net-next 0/5] net: add and use dev_get_tstats64
On 02.11.2020 23:36, Saeed Mahameed wrote:
> On Sun, 2020-11-01 at 13:33 +0100, Heiner Kallweit wrote:
>> It's a frequent pattern to use netdev->stats for the less frequently
>> accessed counters and per-cpu counters for the frequently accessed
>> counters (rx/tx bytes/packets). Add a default ndo_get_stats64()
>> implementation for this use case. Subsequently switch more drivers
>> to use this pattern.
>>
>> Heiner Kallweit (5):
>>   net: core: add dev_get_tstats64 as a ndo_get_stats64 implementation
>>   net: make ip_tunnel_get_stats64 an alias for dev_get_tstats64
>>   ip6_tunnel: use ip_tunnel_get_stats64 as ndo_get_stats64 callback
>>   net: dsa: use net core stats64 handling
>>   tun: switch to net core provided statistics counters
>>
>
> not many left,
>
> $ git grep dev_fetch_sw_netstats drivers/
>
> drivers/infiniband/hw/hfi1/ipoib_main.c: dev_fetch_sw_netstats(storage, priv->netstats);
> drivers/net/macsec.c: dev_fetch_sw_netstats(s, dev->tstats);
> drivers/net/usb/qmi_wwan.c: dev_fetch_sw_netstats(stats, priv->stats64);
> drivers/net/usb/usbnet.c: dev_fetch_sw_netstats(stats, dev->stats64);
> drivers/net/wireless/quantenna/qtnfmac/core.c: dev_fetch_sw_netstats(stats, vif->stats64);
>
> Why not convert them as well ?
> macsec has a different implementation, but all others can be converted.
>

OK, I can do this. Then the series becomes somewhat bigger.
@Jakub: Would it be ok to apply the current series and I provide the
additionally requested migrations as follow-up series?
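
For readers following the thread, the pattern the cover letter describes (rarely updated counters kept in one place, rx/tx bytes and packets kept per CPU and summed on demand) can be modelled in plain user-space C. This is only a sketch under stated assumptions: the struct and function names are made up for illustration, it is not the kernel's dev_get_tstats64() or dev_fetch_sw_netstats(), and it leaves out the synchronization the kernel needs when reading per-CPU counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's fast-path software counters. */
struct model_sw_netstats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Hypothetical stand-in for the 64-bit stats reported to user space. */
struct model_stats64 {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
	uint64_t rx_errors;	/* example of a rarely updated counter */
	uint64_t tx_errors;
};

/* Add every CPU's fast-path counters on top of what is already in *total. */
static void model_fetch_sw_netstats(struct model_stats64 *total,
				    const struct model_sw_netstats *percpu,
				    size_t ncpus)
{
	size_t cpu;

	assert(total != NULL && percpu != NULL);

	for (cpu = 0; cpu < ncpus; cpu++) {
		total->rx_packets += percpu[cpu].rx_packets;
		total->rx_bytes   += percpu[cpu].rx_bytes;
		total->tx_packets += percpu[cpu].tx_packets;
		total->tx_bytes   += percpu[cpu].tx_bytes;
	}
}

/*
 * Model of a default "get stats" callback: start from the device-wide
 * slow counters, then fold in the per-CPU fast counters.
 */
static void model_get_tstats64(struct model_stats64 *out,
			       const struct model_stats64 *dev_stats,
			       const struct model_sw_netstats *percpu,
			       size_t ncpus)
{
	*out = *dev_stats;
	model_fetch_sw_netstats(out, percpu, ncpus);
}
```

A driver converted to the shared callback would, in this model, no longer carry its own copy of the summing loop; it would only update its per-CPU counters on the fast path.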
Re: [PATCH ipsec] xfrm: Pass template address family to xfrm_state_look_at
On Tue, Nov 3, 2020 at 4:05 AM Herbert Xu wrote:
>
> On Mon, Nov 02, 2020 at 06:32:19PM -0800, Anthony DeRossi wrote:
> > This fixes a regression where valid selectors are incorrectly skipped
> > when xfrm_state_find is called with a non-matching address family (e.g.
> > when using IPv6-in-IPv4 ESP in transport mode).
> >
> > The state's address family is matched against the template's family
> > (encap_family) in xfrm_state_find before checking the selector in
> > xfrm_state_look_at. The template's family should also be used for
> > selector matching, otherwise valid selectors may be skipped.
> >
> > Fixes: e94ee171349d ("xfrm: Use correct address family in xfrm_state_find")
> > Signed-off-by: Anthony DeRossi
> > ---
> >  net/xfrm/xfrm_state.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Your patch reintroduces the same bug that my patch was trying to
> fix, namely that when you do the comparison on flow you must use
> the original family and not some other value.

My mistake, I misunderstood the original bug.

Anthony
[PATCH v5 5/5] ARM: defconfig: Enable ax88796c driver
Enable ax88796c driver for the ethernet chip on Exynos3250-based
ARTIK5 boards.

Signed-off-by: Łukasz Stelmach
---
 arch/arm/configs/exynos_defconfig   | 2 ++
 arch/arm/configs/multi_v7_defconfig | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
index cf82c9d23a08..1ee902d01eef 100644
--- a/arch/arm/configs/exynos_defconfig
+++ b/arch/arm/configs/exynos_defconfig
@@ -107,6 +107,8 @@ CONFIG_MD=y
 CONFIG_BLK_DEV_DM=y
 CONFIG_DM_CRYPT=m
 CONFIG_NETDEVICES=y
+CONFIG_NET_VENDOR_ASIX=y
+CONFIG_SPI_AX88796C=y
 CONFIG_SMSC911X=y
 CONFIG_USB_RTL8150=m
 CONFIG_USB_RTL8152=y
diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index e731cdf7c88c..dad53846f58f 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -243,6 +243,8 @@ CONFIG_SATA_HIGHBANK=y
 CONFIG_SATA_MV=y
 CONFIG_SATA_RCAR=y
 CONFIG_NETDEVICES=y
+CONFIG_NET_VENDOR_ASIX=y
+CONFIG_SPI_AX88796C=m
 CONFIG_VIRTIO_NET=y
 CONFIG_B53_SPI_DRIVER=m
 CONFIG_B53_MDIO_DRIVER=m
--
2.26.2
[PATCH v5 0/5] AX88796C SPI Ethernet Adapter
This is a driver for AX88796C Ethernet Adapter connected in SPI mode as
found on ARTIK5 evaluation board. The driver has been ported from a
v3.10.9 vendor kernel for ARTIK5 board.

Changes in v5:
  - coding style (local variable declarations)
  - added spi0 node in the DT binding example and removed
    interrupt-parent
  - removed comp module parameter
  - added CONFIG_SPI_AX88796C_COMPRESSION option to set the initial
    state of SPI compression
  - introduced new ethtool tunable "spi-compression" to control SPI
    transfer compression
  - removed unused fields in struct ax88796c_device
  - switched from using buffers allocated on stack for SPI transfers
    to DMA safe ones embedded in struct ax_spi and allocated with
    kmalloc()

Changes in v4:
  - fixed compilation problems in asix,ax88796c.yaml and in
    ax88796c_main.c introduced in v3

Changes in v3:
  - modify vendor-prefixes.yaml in a separate patch
  - fix several problems in the dt binding
    - removed unnecessary descriptions and properties
    - changed the order of entries
    - fixed problems with missing defines in the example
  - change (1 << N) to BIT(N), left a few (0 << N)
  - replace ax88796c_get_link(), ax88796c_get_link_ksettings(),
    ax88796c_set_link_ksettings(), ax88796c_nway_reset(),
    ax88796c_set_mac_address() with appropriate kernel functions.
  - disable PHY auto-polling in MAC and use PHYLIB to track the state
    of PHY and configure MAC
  - propagate return values instead of returning constants in several
    places
  - add WARN_ON() for unlocked mutex
  - remove local work queue and use the system_wq
  - replace phy_connect_direct() with phy_connect() and move
    devm_register_netdev() to the end of ax88796c_probe()
    (Unlike phy_connect_direct() phy_connect() does not crash if the
    network device isn't registered yet.)
  - remove error messages on ENOMEM
  - move free_irq() to the end of ax88796c_close() to avoid race
    condition
  - implement flow-control

Changes in v2:
  - use phylib
  - added DT bindings
  - moved #includes to *.c files
  - used mutex instead of a semaphore for locking
  - renamed some constants
  - added error propagation for several functions
  - used ethtool for dumping registers
  - added control over checksum offloading
  - remove vendor specific PM
  - removed macaddr module parameter and added support for reading a MAC
    address from platform data (e.g. DT)
  - removed dependency on SPI from NET_VENDOR_ASIX
  - added an entry in the MAINTAINERS file
  - simplified logging with appropriate netif_* and netdev_* helpers
  - lots of style fixes

Łukasz Stelmach (5):
  dt-bindings: vendor-prefixes: Add asix prefix
  dt-bindings: net: Add bindings for AX88796C SPI Ethernet Adapter
  net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver
  ARM: dts: exynos: Add Ethernet to Artik 5 board
  ARM: defconfig: Enable ax88796c driver

 .../bindings/net/asix,ax88796c.yaml           |   73 ++
 .../devicetree/bindings/vendor-prefixes.yaml  |    2 +
 MAINTAINERS                                   |    6 +
 arch/arm/boot/dts/exynos3250-artik5-eval.dts  |   29 +
 arch/arm/configs/exynos_defconfig             |    2 +
 arch/arm/configs/multi_v7_defconfig           |    2 +
 drivers/net/ethernet/Kconfig                  |    1 +
 drivers/net/ethernet/Makefile                 |    1 +
 drivers/net/ethernet/asix/Kconfig             |   35 +
 drivers/net/ethernet/asix/Makefile            |    6 +
 drivers/net/ethernet/asix/ax88796c_ioctl.c    |  235
 drivers/net/ethernet/asix/ax88796c_ioctl.h    |   26 +
 drivers/net/ethernet/asix/ax88796c_main.c     | 1132 +
 drivers/net/ethernet/asix/ax88796c_main.h     |  561
 drivers/net/ethernet/asix/ax88796c_spi.c      |  109 ++
 drivers/net/ethernet/asix/ax88796c_spi.h      |   70 +
 include/uapi/linux/ethtool.h                  |    1 +
 net/ethtool/common.c                          |    1 +
 18 files changed, 2292 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/asix,ax88796c.yaml
 create mode 100644 drivers/net/ethernet/asix/Kconfig
 create mode 100644 drivers/net/ethernet/asix/Makefile
 create mode 100644 drivers/net/ethernet/asix/ax88796c_ioctl.c
 create mode 100644 drivers/net/ethernet/asix/ax88796c_ioctl.h
 create mode 100644 drivers/net/ethernet/asix/ax88796c_main.c
 create mode 100644 drivers/net/ethernet/asix/ax88796c_main.h
 create mode 100644 drivers/net/ethernet/asix/ax88796c_spi.c
 create mode 100644 drivers/net/ethernet/asix/ax88796c_spi.h
--
2.26.2
[PATCH v5 1/5] dt-bindings: vendor-prefixes: Add asix prefix
Add the prefix for ASIX Electronics Corporation.

Signed-off-by: Łukasz Stelmach
Reviewed-by: Krzysztof Kozlowski
Acked-by: Rob Herring
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index 2735be1a8470..ce3b3f6c9728 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -117,6 +117,8 @@ patternProperties:
     description: Asahi Kasei Corp.
   "^asc,.*":
     description: All Sensors Corporation
+  "^asix,.*":
+    description: ASIX Electronics Corporation
   "^aspeed,.*":
     description: ASPEED Technology Inc.
   "^asus,.*":
--
2.26.2
[PATCH v5 4/5] ARM: dts: exynos: Add Ethernet to Artik 5 board
Add node for ax88796c ethernet chip. Signed-off-by: Łukasz Stelmach --- arch/arm/boot/dts/exynos3250-artik5-eval.dts | 29 1 file changed, 29 insertions(+) diff --git a/arch/arm/boot/dts/exynos3250-artik5-eval.dts b/arch/arm/boot/dts/exynos3250-artik5-eval.dts index 20446a846a98..a91e09a7d3fa 100644 --- a/arch/arm/boot/dts/exynos3250-artik5-eval.dts +++ b/arch/arm/boot/dts/exynos3250-artik5-eval.dts @@ -37,3 +37,32 @@ &mshc_2 { &serial_2 { status = "okay"; }; + +&spi_0 { + status = "okay"; + cs-gpios = <&gpx3 4 GPIO_ACTIVE_LOW>, <0>; + + assigned-clocks = <&cmu CLK_MOUT_MPLL>, <&cmu CLK_DIV_MPLL_PRE>, + <&cmu CLK_MOUT_SPI0>, <&cmu CLK_DIV_SPI0>, + <&cmu CLK_DIV_SPI0_PRE>, <&cmu CLK_SCLK_SPI0>; + assigned-clock-parents = + <&cmu CLK_FOUT_MPLL>,/* for: CLK_MOUT_MPLL */ + <&cmu CLK_MOUT_MPLL>,/* for: CLK_DIV_MPLL_PRE */ + <&cmu CLK_DIV_MPLL_PRE>, /* for: CLK_MOUT_SPI0 */ + <&cmu CLK_MOUT_SPI0>,/* for: CLK_DIV_SPI0 */ + <&cmu CLK_DIV_SPI0>, /* for: CLK_DIV_SPI0_PRE */ + <&cmu CLK_DIV_SPI0_PRE>; /* for: CLK_SCLK_SPI0 */ + + ethernet@0 { + compatible = "asix,ax88796c"; + reg = <0x0>; + local-mac-address = [00 00 00 00 00 00]; /* Filled in by a boot-loader */ + interrupt-parent = <&gpx2>; + interrupts = <0 IRQ_TYPE_LEVEL_LOW>; + spi-max-frequency = <4000>; + reset-gpios = <&gpe0 2 GPIO_ACTIVE_LOW>; + controller-data { + samsung,spi-feedback-delay = <2>; + }; + }; +}; -- 2.26.2
Re: [PATCH ipsec] xfrm: Pass template address family to xfrm_state_look_at
On Tue, Nov 3, 2020 at 4:08 AM Herbert Xu wrote:
>
> On Mon, Nov 02, 2020 at 06:32:19PM -0800, Anthony DeRossi wrote:
> > This fixes a regression where valid selectors are incorrectly skipped
> > when xfrm_state_find is called with a non-matching address family (e.g.
> > when using IPv6-in-IPv4 ESP in transport mode).
>
> Why are we even allowing v6-over-v4 in transport mode? Isn't that
> the whole point of BEET mode?

I'm not sure. This is the outgoing policy that strongSwan creates for an
IPv6-in-IPv4 tunnel when compression is enabled:

src fd02::/16 dst fd02::2/128
        dir out priority 326271 ptype main
        tmpl src 10.0.0.8 dst 192.168.1.231
                proto comp spi 0xd00e reqid 1 mode tunnel
        tmpl src 0.0.0.0 dst 0.0.0.0
                proto esp spi 0xc543e950 reqid 1 mode transport

After your patch, outgoing IPv6 packets fail to match the associated state:

src 10.0.0.8 dst 192.168.1.231
        proto esp spi 0xc543e950 reqid 1 mode transport
        replay-window 0
        auth-trunc hmac(sha256) 0x143b570f59b23eaa560905f19a922451c6dfa5694ba2e45e1b065bb1863421aa 128
        enc cbc(aes) 0x526ed144ca087125ce30e36c8f20d972
        encap type espinudp sport 4501 dport 4500 addr 0.0.0.0
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x
        sel src 0.0.0.0/0 dst 0.0.0.0/0

Is this an invalid configuration?

Anthony
[PATCH v5 2/5] dt-bindings: net: Add bindings for AX88796C SPI Ethernet Adapter
Add bindings for AX88796C SPI Ethernet Adapter. Signed-off-by: Łukasz Stelmach --- .../bindings/net/asix,ax88796c.yaml | 73 +++ 1 file changed, 73 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/asix,ax88796c.yaml diff --git a/Documentation/devicetree/bindings/net/asix,ax88796c.yaml b/Documentation/devicetree/bindings/net/asix,ax88796c.yaml new file mode 100644 index ..699ebf452479 --- /dev/null +++ b/Documentation/devicetree/bindings/net/asix,ax88796c.yaml @@ -0,0 +1,73 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/asix,ax88796c.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: ASIX AX88796C SPI Ethernet Adapter + +maintainers: + - Łukasz Stelmach + +description: | + ASIX AX88796C is an Ethernet controller with a built in PHY. This + describes SPI mode of the chip. + + The node for this driver must be a child node of an SPI controller, + hence all mandatory properties described in + ../spi/spi-controller.yaml must be specified. + +allOf: + - $ref: ethernet-controller.yaml# + +properties: + compatible: +const: asix,ax88796c + + reg: +maxItems: 1 + + spi-max-frequency: +maximum: 4000 + + interrupts: +maxItems: 1 + + reset-gpios: +description: + A GPIO line handling reset of the chip. As the line is active low, + it should be marked GPIO_ACTIVE_LOW. +maxItems: 1 + + local-mac-address: true + + mac-address: true + +required: + - compatible + - reg + - spi-max-frequency + - interrupts + - reset-gpios + +additionalProperties: false + +examples: + # Artik5 eval board + - | +#include +#include +spi0 { +#address-cells = <1>; +#size-cells = <0>; + +ethernet@0 { +compatible = "asix,ax88796c"; +reg = <0x0>; +local-mac-address = [00 00 00 00 00 00]; /* Filled in by a bootloader */ +interrupt-parent = <&gpx2>; +interrupts = <0 IRQ_TYPE_LEVEL_LOW>; +spi-max-frequency = <4000>; +reset-gpios = <&gpe0 2 GPIO_ACTIVE_LOW>; +}; +}; -- 2.26.2
[PATCH v5 3/5] net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver
ASIX AX88796[1] is a versatile ethernet adapter chip, that can be connected to a CPU with a 8/16-bit bus or with an SPI. This driver supports SPI connection. The driver has been ported from the vendor kernel for ARTIK5[2] boards. Several changes were made to adapt it to the current kernel which include: + updated DT configuration, + clock configuration moved to DT, + new timer, ethtool and gpio APIs, + dev_* instead of pr_* and custom printk() wrappers, + removed awkward vendor power managemtn. + introduced ethtool tunable to control SPI compression [1] https://www.asix.com.tw/products.php?op=pItemdetail&PItemID=104;65;86&PLine=65 [2] https://git.tizen.org/cgit/profile/common/platform/kernel/linux-3.10-artik/ The other ax88796 driver is for NE2000 compatible AX88796L chip. These chips are not compatible. Hence, two separate drivers are required. Signed-off-by: Łukasz Stelmach --- MAINTAINERS|6 + drivers/net/ethernet/Kconfig |1 + drivers/net/ethernet/Makefile |1 + drivers/net/ethernet/asix/Kconfig | 35 + drivers/net/ethernet/asix/Makefile |6 + drivers/net/ethernet/asix/ax88796c_ioctl.c | 235 drivers/net/ethernet/asix/ax88796c_ioctl.h | 26 + drivers/net/ethernet/asix/ax88796c_main.c | 1132 drivers/net/ethernet/asix/ax88796c_main.h | 561 ++ drivers/net/ethernet/asix/ax88796c_spi.c | 109 ++ drivers/net/ethernet/asix/ax88796c_spi.h | 70 ++ include/uapi/linux/ethtool.h |1 + net/ethtool/common.c |1 + 13 files changed, 2184 insertions(+) create mode 100644 drivers/net/ethernet/asix/Kconfig create mode 100644 drivers/net/ethernet/asix/Makefile create mode 100644 drivers/net/ethernet/asix/ax88796c_ioctl.c create mode 100644 drivers/net/ethernet/asix/ax88796c_ioctl.h create mode 100644 drivers/net/ethernet/asix/ax88796c_main.c create mode 100644 drivers/net/ethernet/asix/ax88796c_main.h create mode 100644 drivers/net/ethernet/asix/ax88796c_spi.c create mode 100644 drivers/net/ethernet/asix/ax88796c_spi.h diff --git a/MAINTAINERS b/MAINTAINERS index 14b8ec0bb58b..930dc859d4f7 
100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2812,6 +2812,12 @@ S: Maintained F: Documentation/hwmon/asc7621.rst F: drivers/hwmon/asc7621.c +ASIX AX88796C SPI ETHERNET ADAPTER +M: Łukasz Stelmach +S: Maintained +F: Documentation/devicetree/bindings/net/asix,ax99706c-spi.yaml +F: drivers/net/ethernet/asix/ax88796c_* + ASPEED PINCTRL DRIVERS M: Andrew Jeffery L: linux-asp...@lists.ozlabs.org (moderated for non-subscribers) diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig index de50e8b9e656..f3b218e45ea5 100644 --- a/drivers/net/ethernet/Kconfig +++ b/drivers/net/ethernet/Kconfig @@ -32,6 +32,7 @@ source "drivers/net/ethernet/apm/Kconfig" source "drivers/net/ethernet/apple/Kconfig" source "drivers/net/ethernet/aquantia/Kconfig" source "drivers/net/ethernet/arc/Kconfig" +source "drivers/net/ethernet/asix/Kconfig" source "drivers/net/ethernet/atheros/Kconfig" source "drivers/net/ethernet/aurora/Kconfig" source "drivers/net/ethernet/broadcom/Kconfig" diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile index f8f38dcb5f8a..9eb368d93607 100644 --- a/drivers/net/ethernet/Makefile +++ b/drivers/net/ethernet/Makefile @@ -18,6 +18,7 @@ obj-$(CONFIG_NET_XGENE) += apm/ obj-$(CONFIG_NET_VENDOR_APPLE) += apple/ obj-$(CONFIG_NET_VENDOR_AQUANTIA) += aquantia/ obj-$(CONFIG_NET_VENDOR_ARC) += arc/ +obj-$(CONFIG_NET_VENDOR_ASIX) += asix/ obj-$(CONFIG_NET_VENDOR_ATHEROS) += atheros/ obj-$(CONFIG_NET_VENDOR_AURORA) += aurora/ obj-$(CONFIG_NET_VENDOR_CADENCE) += cadence/ diff --git a/drivers/net/ethernet/asix/Kconfig b/drivers/net/ethernet/asix/Kconfig new file mode 100644 index ..6211814b0446 --- /dev/null +++ b/drivers/net/ethernet/asix/Kconfig @@ -0,0 +1,35 @@ +# +# Asix network device configuration +# + +config NET_VENDOR_ASIX + bool "Asix devices" + default y + help + If you have a network (Ethernet, non-USB, not NE2000 compatible) + interface based on a chip from ASIX, say Y. 
+
+if NET_VENDOR_ASIX
+
+config SPI_AX88796C
+	tristate "Asix AX88796C-SPI support"
+	select PHYLIB
+	depends on SPI
+	depends on GPIOLIB
+	help
+	  Say Y here if you intend to use ASIX AX88796C attached in SPI mode.
+
+config SPI_AX88796C_COMPRESSION
+	bool "SPI transfer compression"
+	default n
+	depends on SPI_AX88796C
+	help
+	  Say Y here to enable SPI transfer compression. It saves up
+	  to 24 dummy cycles during each transfer which may noticeably
+	  speed up short transfers. This sets the default value that is
+	  inherited by network interfaces during probe. It can be
+	  changed at run time via the spi-compression ethtool tunable.
+
+
Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
On Mon, 2020-11-02 at 14:54 -0800, Jakub Kicinski wrote:
> On Fri, 30 Oct 2020 12:16:00 +0100 Paolo Abeni wrote:
> > Real-time setups try hard to ensure proper isolation between time
> > critical applications and e.g. network processing performed by the
> > network stack in softirq and RPS is used to move the softirq
> > activity away from the isolated core.
> >
> > If the network configuration is dynamic, with netns and devices
> > routinely created at run-time, enforcing the correct RPS setting
> > on each newly created device allowing to transient bad configuration
> > became complex.
> >
> > These series try to address the above, introducing a new
> > sysctl knob: rps_default_mask. The new sysctl entry allows
> > configuring a systemwide RPS mask, to be enforced since receive
> > queue creation time without any fourther per device configuration
> > required.
> >
> > Additionally, a simple self-test is introduced to check the
> > rps_default_mask behavior.
>
> RPS is disabled by default, the processing is going to happen wherever
> the IRQ is mapped, and one would hope that the IRQ is not mapped to the
> core where the critical processing runs.
>
> Would you mind elaborating further on the use case?

On Mon, 2020-11-02 at 15:27 -0800, Saeed Mahameed wrote:
> The whole thing can be replaced with a user daemon scripts that
> monitors all newly created devices and assign to them whatever rps mask
> (call it default).
>
> So why do we need this special logic in kernel ?
>
> I am not sure about this, but if rps queues sysfs are available before
> the netdev is up, then you can also use udevd to assign the rps masks
> before such devices are even brought up, so you would avoid the race
> conditions that you described, which are not really clear to me to be
> honest.

Thank you for the feedback. Please allow me to answer you both here, as
your questions are related.

The relevant use case is a host running containers (with the related
orchestration tools) in an RT environment. Virtual devices (veths, ovs
ports, etc.) are created by the orchestration tools at run-time.
Critical processes are allowed to send packets/generate outgoing
network traffic - but any interrupt is moved away from the related
cores, so that usual incoming network traffic processing does not
happen there.

Still, an xmit operation on a virtual device may be transmitted via ovs
or veth, with the relevant forwarding operation happening in a softirq
on the same CPU that originated the packet. RPS is configured (even) on
such virtual devices to move the forwarding away from the relevant CPUs.

As Saeed noted, such configuration could possibly be performed via some
user-space daemon monitoring the creation of network devices and
network namespaces. That will anyway be prone to some race: the
orchestration tool may create and enable the netns and virtual devices
before the daemon has properly set the RPS mask.

In the latter scenario some packet forwarding could still slip onto the
relevant CPU, causing measurable latency. In all non-RT scenarios the
above will likely be irrelevant, but in the RT context that is not
acceptable - e.g. in real environments it causes latency above the
defined limits, while the proposed patches avoid the issue.

Do you see any other simple way to avoid the above race?

Please let me know if the above answers your doubts,

Paolo
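
To make the user-space alternative discussed above a little more concrete, here is a minimal sketch of the arithmetic such a daemon (or an administrator choosing a default mask) would need: build a CPU bitmap that leaves out the isolated cores and print it as hex, the form in which per-queue RPS masks are written to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. The helper names are hypothetical, the sketch assumes at most 64 CPUs, and it does not touch sysfs or the proposed rps_default_mask sysctl at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return a bitmap with one bit per CPU in [0, ncpus), minus the CPUs
 * set in "isolated". Hypothetical helper; limited to 64 CPUs.
 */
static uint64_t model_rps_mask_excluding(unsigned int ncpus, uint64_t isolated)
{
	uint64_t all;

	assert(ncpus >= 1 && ncpus <= 64);

	all = (ncpus == 64) ? UINT64_MAX : ((UINT64_C(1) << ncpus) - 1);
	return all & ~isolated;
}

/* Format the mask as lowercase hex without a "0x" prefix. */
static int model_format_rps_mask(char *buf, size_t len, uint64_t mask)
{
	return snprintf(buf, len, "%llx", (unsigned long long)mask);
}
```

The race Paolo describes sits between device creation and the moment a daemon writes such a string for the new device; the computation itself is the easy part.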
Re: [PATCH 30/33] docs: ABI: cleanup several ABI documents
On Wed 28 Oct 09:23 CDT 2020, Mauro Carvalho Chehab wrote:
[..]
> .../ABI/testing/sysfs-class-remoteproc | 14 +-

for this:

Acked-by: Bjorn Andersson

Thanks,
Bjorn
Re: lan78xx: /sys/class/net/eth0/carrier stuck at 1
On Tue, Nov 03, 2020 at 01:47:12PM +0100, Juerg Haefliger wrote: > On Fri, 23 Oct 2020 15:05:19 +0200 > Andrew Lunn wrote: > > > On Fri, Oct 23, 2020 at 08:29:59AM +0200, Juerg Haefliger wrote: > > > On Wed, 21 Oct 2020 21:35:48 +0200 > > > Andrew Lunn wrote: > > > > > > > On Wed, Oct 21, 2020 at 05:00:53PM +0200, Juerg Haefliger wrote: > > > > > Hi, > > > > > > > > > > If the lan78xx driver is compiled into the kernel and the network > > > > > cable is > > > > > plugged in at boot, /sys/class/net/eth0/carrier is stuck at 1 and > > > > > doesn't > > > > > toggle if the cable is unplugged and replugged. > > > > > > > > > > If the network cable is *not* plugged in at boot, all seems to work > > > > > fine. > > > > > I.e., post-boot cable plugs and unplugs toggle the carrier flag. > > > > > > > > > > Also, everything seems to work fine if the driver is compiled as a > > > > > module. > > > > > > > > > > There's an older ticket for the raspi kernel [1] but I've just tested > > > > > this > > > > > with a 5.8 kernel on a Pi 3B+ and still see that behavior. > > > > > > > > Hi Jürg > > > > > > Hi Andrew, > > > > > > > > > > Could you check if a different PHY driver is being used when it is > > > > built and broken vs module or built in and working. > > > > > > > > Look at /sys/class/net/eth0/phydev/driver > > > > > > There's no such file. > > > > I _think_ that means it is using genphy, the generic PHY driver, not a > > specific vendor PHY driver? What does > > > > /sys/class/net/eth0/phydev/phy_id contain. > > There is no directory /sys/class/net/eth0/phydev. [Goes and looks at the code] The symbolic link is only created if the PHY is connected to the MAC if the MAC has been registered with the core first. 
lan78xx does it the other way around: ret = lan78xx_phy_init(dev); if (ret < 0) goto out4; ret = register_netdev(netdev); if (ret != 0) { netif_err(dev, probe, netdev, "couldn't register the device\n"); goto out5; } The register dump you show below indicates an ID of 007c132, which fits the drivers drivers/net/phy/microchip.c : "Microchip LAN88xx". Any mention of that in dmesg, do you see the module loaded? > > > > Given that all works fine as long as the cable is unplugged at boot points > > > more towards a race at boot or incorrect initialization sequence or > > > something. > > > > Could be. Could you run > > > > mii-tool -vv eth0 > > Hrm. Running that command unlocks the carrier flag and it starts toggling on > cable unplug/plug. First invocation: > > $ sudo mii-tool -vv eth0 > Using SIOCGMIIPHY=0x8947 > eth0: negotiated 1000baseT-FD flow-control, link ok > registers for MII PHY 1: > 1040 79ed 0007 c132 05e1 cde1 000f > 0200 0800 3000 > 0088 3200 0004 > 0040 a000 a000 a035 > product info: vendor 00:01:f0, model 19 rev 2 > basic mode: autonegotiation enabled > basic status: autonegotiation complete, link ok > capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD > advertising: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD > flow-control > link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD > flow-control > > Subsequent invocation: > > $ sudo mii-tool -vv eth0 > Using SIOCGMIIPHY=0x8947 > eth0: negotiated 1000baseT-FD flow-control, link ok > registers for MII PHY 1: > 1040 79ed 0007 c132 05e1 cde1 000d > 0200 0800 3000 > 0088 3200 0004 > 0040 a000 a035 > product info: vendor 00:01:f0, model 19 rev 2 > basic mode: autonegotiation enabled > basic status: autonegotiation complete, link ok > capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD > advertising: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD > flow-control > link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 
10baseT-FD 10baseT-HD > flow-control > > In the first invocation, register 0x1a shows a pending link-change interrupt > (0xa000) which wasn't serviced (and cleared) for some reason. Dumping the > registers cleared that interrupt bit and things start working correctly > afterwards. Nor sure yet why that first interrupt is ignored. So, 0x1a is the interrupt status, and 0x19 is the interrupt mask. This should really be interpreted as a level interrupt. But it appears the hardware the interrupt is connected to is actually doing edge. And the edge has been missed, and so the interrupt is never serviced. I think the call sequence goes something like this, if i'm reading the code correct: lan78xx_probe() calls lan78xx_bind: lan78xx_bind() registers an interrupt domain. This allows USB status messages indicating an interrupt to be dispatched using the normal interrupt mechanism
Re: [bpf-next PATCH v2 5/5] selftest/bpf: Use global variables instead of maps for test_tcpbpf_kern
On Mon, Nov 2, 2020 at 5:26 PM Martin KaFai Lau wrote: > > On Sat, Oct 31, 2020 at 11:52:37AM -0700, Alexander Duyck wrote: > [ ... ] > > > +struct tcpbpf_globals global = { 0 }; > > int _version SEC("version") = 1; > > > > SEC("sockops") > > @@ -105,29 +72,15 @@ int bpf_testcb(struct bpf_sock_ops *skops) > > > > op = (int) skops->op; > > > > - update_event_map(op); > > + global.event_map |= (1 << op); > > > > switch (op) { > > case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: > > /* Test failure to set largest cb flag (assumes not defined) > > */ > > - bad_call_rv = bpf_sock_ops_cb_flags_set(skops, 0x80); > > + global.bad_cb_test_rv = bpf_sock_ops_cb_flags_set(skops, > > 0x80); > > /* Set callback */ > > - good_call_rv = bpf_sock_ops_cb_flags_set(skops, > > + global.good_cb_test_rv = bpf_sock_ops_cb_flags_set(skops, > >BPF_SOCK_OPS_STATE_CB_FLAG); > > - /* Update results */ > > - { > > - __u32 key = 0; > > - struct tcpbpf_globals g, *gp; > > - > > - gp = bpf_map_lookup_elem(&global_map, &key); > > - if (!gp) > > - break; > > - g = *gp; > > - g.bad_cb_test_rv = bad_call_rv; > > - g.good_cb_test_rv = good_call_rv; > > - bpf_map_update_elem(&global_map, &key, &g, > > - BPF_ANY); > > - } > > break; > > case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: > > skops->sk_txhash = 0x12345f; > > @@ -143,10 +96,8 @@ int bpf_testcb(struct bpf_sock_ops *skops) > > > > thdr = (struct tcphdr *)(header + offset); > > v = thdr->syn; > > - __u32 key = 1; > > > > - bpf_map_update_elem(&sockopt_results, &key, > > &v, > > - BPF_ANY); > > + global.tcp_saved_syn = v; > > } > > } > > break; > > @@ -156,25 +107,16 @@ int bpf_testcb(struct bpf_sock_ops *skops) > > break; > > case BPF_SOCK_OPS_STATE_CB: > > if (skops->args[1] == BPF_TCP_CLOSE) { > > - __u32 key = 0; > > - struct tcpbpf_globals g, *gp; > > - > > - gp = bpf_map_lookup_elem(&global_map, &key); > > - if (!gp) > > - break; > > - g = *gp; > > if (skops->args[0] == BPF_TCP_LISTEN) { > > - g.num_listen++; > > + global.num_listen++; > > } else { > > - 
g.total_retrans = skops->total_retrans; > > - g.data_segs_in = skops->data_segs_in; > > - g.data_segs_out = skops->data_segs_out; > > - g.bytes_received = skops->bytes_received; > > - g.bytes_acked = skops->bytes_acked; > > + global.total_retrans = skops->total_retrans; > > + global.data_segs_in = skops->data_segs_in; > > + global.data_segs_out = skops->data_segs_out; > > + global.bytes_received = skops->bytes_received; > > + global.bytes_acked = skops->bytes_acked; > > } > > - g.num_close_events++; > > - bpf_map_update_elem(&global_map, &key, &g, > > - BPF_ANY);
> It is interesting that there is no race in the original "g.num_close_events++"
> followed by the bpf_map_update_elem(). It seems quite fragile though.

How would it race with the current code, though? At this point we are controlling the sockets from a single thread, so the close events should already be serialized, shouldn't they? This may have been a problem with the old code, but even then there were only two sockets, and since the two were linked anyway I don't think there was much risk of them racing against each other.

> > + global.num_close_events++;
> There is __sync_fetch_and_add().
>
> not sure about the global.event_map though, may be use an individual
> variable for each _CB. Thoughts?

I think this may be overkill for what we actually need. Since we are closing the sockets in a single-threaded application, there isn't much risk of the sockets racing against each other on close, is there?
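For reference, the atomic alternative raised in the review can be sketched in plain user-space C with the GCC/Clang `__sync` builtins. This is only an illustration under stated assumptions: `tcpbpf_globals_sketch`, `record_event()` and `count_close_event()` are hypothetical names, only the two fields discussed above are modelled, and nothing here claims which of these builtins a given BPF toolchain accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the test's global state; only the two
 * fields discussed in the thread are modelled here. */
struct tcpbpf_globals_sketch {
	uint32_t event_map;
	uint32_t num_close_events;
};

static struct tcpbpf_globals_sketch global_sketch;

/* Record that callback number 'op' fired. The OR is a single atomic
 * read-modify-write, so concurrent callers cannot lose each other's bit. */
static inline void record_event(int op)
{
	assert(op >= 0 && op < 32);	/* the map is a 32-bit mask */
	__sync_fetch_and_or(&global_sketch.event_map, 1u << op);
}

/* Atomically bump the close counter and return its previous value. */
static inline uint32_t count_close_event(void)
{
	return __sync_fetch_and_add(&global_sketch.num_close_events, 1);
}
```

With a single thread driving both sockets, as argued above, the plain `global.num_close_events++` gives the same result; the atomic form only matters once callbacks can run concurrently.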
Re: [bpf-next PATCH v2 4/5] selftests/bpf: Migrate tcpbpf_user.c to use BPF skeleton
On Mon, Nov 2, 2020 at 4:55 PM Martin KaFai Lau wrote: > > On Sat, Oct 31, 2020 at 11:52:31AM -0700, Alexander Duyck wrote: > > From: Alexander Duyck > > > > Update tcpbpf_user.c to make use of the BPF skeleton. Doing this we can > > simplify test_tcpbpf_user and reduce the overhead involved in setting up > > the test. > > > > In addition we can clean up the remaining bits such as the one remaining > > CHECK_FAIL at the end of test_tcpbpf_user so that the function only makes > > use of CHECK as needed. > > > > Acked-by: Andrii Nakryiko > > Signed-off-by: Alexander Duyck > Acked-by: Martin KaFai Lau > > > --- > > .../testing/selftests/bpf/prog_tests/tcpbpf_user.c | 48 > > > > 1 file changed, 18 insertions(+), 30 deletions(-) > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c > > b/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c > > index d96f4084d2f5..c7a61b0d616a 100644 > > --- a/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c > > +++ b/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c > > @@ -4,6 +4,7 @@ > > #include > > > > #include "test_tcpbpf.h" > > +#include "test_tcpbpf_kern.skel.h" > > > > #define LO_ADDR6 "::1" > > #define CG_NAME "/tcpbpf-user-test" > > @@ -133,44 +134,31 @@ static void run_test(int map_fd, int sock_map_fd) > > > > void test_tcpbpf_user(void) > > { > > - const char *file = "test_tcpbpf_kern.o"; > > - int prog_fd, map_fd, sock_map_fd; > > - int error = EXIT_FAILURE; > > - struct bpf_object *obj; > > + struct test_tcpbpf_kern *skel; > > + int map_fd, sock_map_fd; > > int cg_fd = -1; > > - int rv; > > - > > - cg_fd = test__join_cgroup(CG_NAME); > > - if (cg_fd < 0) > > - goto err; > > > > - if (bpf_prog_load(file, BPF_PROG_TYPE_SOCK_OPS, &obj, &prog_fd)) { > > - fprintf(stderr, "FAILED: load_bpf_file failed for: %s\n", > > file); > > - goto err; > > - } > > + skel = test_tcpbpf_kern__open_and_load(); > > + if (CHECK(!skel, "open and load skel", "failed")) > > + return; > > > > - rv = 
bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_SOCK_OPS, 0); > > - if (rv) { > > - fprintf(stderr, "FAILED: bpf_prog_attach: %d (%s)\n", > > -errno, strerror(errno)); > > - goto err; > > - } > > + cg_fd = test__join_cgroup(CG_NAME); > > + if (CHECK(cg_fd < 0, "test__join_cgroup(" CG_NAME ")", > > + "cg_fd:%d errno:%d", cg_fd, errno)) > > + goto cleanup_skel; > > > > - map_fd = bpf_find_map(__func__, obj, "global_map"); > > - if (map_fd < 0) > > - goto err; > > + map_fd = bpf_map__fd(skel->maps.global_map); > > + sock_map_fd = bpf_map__fd(skel->maps.sockopt_results); > > > > - sock_map_fd = bpf_find_map(__func__, obj, "sockopt_results"); > > - if (sock_map_fd < 0) > > - goto err; > > + skel->links.bpf_testcb = > > bpf_program__attach_cgroup(skel->progs.bpf_testcb, cg_fd); > > + if (ASSERT_OK_PTR(skel->links.bpf_testcb, > > "attach_cgroup(bpf_testcb)")) > > + goto cleanup_namespace; > > > > run_test(map_fd, sock_map_fd); > > > > - error = 0; > > -err: > > - bpf_prog_detach(cg_fd, BPF_CGROUP_SOCK_OPS); > > +cleanup_namespace:
> nit.
>
> may be "cleanup_cgroup" instead?
>
> or only have one jump label to handle failure since "cg_fd != -1" has been
> tested already.

Good point. I can go through and just drop the second label and simplify this. Will fix for v3.
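The single-label cleanup suggested in the review might look roughly like the sketch below. It is a self-contained user-space mock, not the selftest itself: every `fake_*` helper, the `live_skels` and `open_fds` counters, and `run_with_single_label()` are hypothetical stand-ins, assumed only so the control flow can be compiled and exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the skeleton and cgroup handles used by the
 * test; none of these names come from libbpf or the selftests. */
struct fake_skel { int attached; };

static int live_skels;	/* skeletons opened but not yet destroyed */
static int open_fds;	/* descriptors opened but not yet closed */

static struct fake_skel *fake_skel_open(void)
{
	struct fake_skel *skel = calloc(1, sizeof(*skel));

	if (skel)
		live_skels++;
	return skel;
}

static void fake_skel_destroy(struct fake_skel *skel)
{
	if (!skel)
		return;
	live_skels--;
	free(skel);
}

static int fake_join_cgroup(int should_fail)
{
	if (should_fail)
		return -1;
	open_fds++;
	return 3;	/* arbitrary descriptor number */
}

static void fake_close(int fd)
{
	assert(fd >= 0);
	open_fds--;
}

/* Returns 0 on success, -1 if any setup step fails. */
static int run_with_single_label(int fail_cgroup, int fail_attach)
{
	struct fake_skel *skel;
	int cg_fd = -1;
	int err = -1;

	skel = fake_skel_open();
	if (!skel)
		return -1;

	cg_fd = fake_join_cgroup(fail_cgroup);
	if (cg_fd < 0)
		goto cleanup;

	if (fail_attach)
		goto cleanup;
	skel->attached = 1;

	err = 0;	/* the test body would run here */

cleanup:
	/* One label is enough because cg_fd starts at -1 and is checked here. */
	if (cg_fd != -1)
		fake_close(cg_fd);
	fake_skel_destroy(skel);
	return err;
}
```

Initialising `cg_fd` to -1 is what allows every failure path to share the same label: the cleanup code decides for itself what actually needs releasing.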
Re: [PATCH mlx5-next v1 06/11] vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus
On Sun, Nov 01, 2020 at 10:15:37PM +0200, Leon Romanovsky wrote: > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index 6c218b47b9f1..5316e51e72d4 100644 > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1,18 +1,27 @@ > // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB > /* Copyright (c) 2020 Mellanox Technologies Ltd. */ > > +#include > #include > +#include > +#include > #include > #include > +#include > +#include > #include > #include > +#include > #include > #include > -#include > #include > -#include "mlx5_vnet.h" > #include "mlx5_vdpa.h" > > +MODULE_AUTHOR("Eli Cohen "); > +MODULE_DESCRIPTION("Mellanox VDPA driver"); > +MODULE_LICENSE("Dual BSD/GPL"); > + > +#define to_mlx5_vdpa_ndev(__mvdev) container_of(__mvdev, struct > mlx5_vdpa_net, mvdev) > #define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev) > > #define VALID_FEATURES_MASK > \ > @@ -159,6 +168,11 @@ static bool mlx5_vdpa_debug; > mlx5_vdpa_info(mvdev, "%s\n", #_status); >\ > } while (0) > > +static inline u32 mlx5_vdpa_max_qps(int max_vqs) > +{ > + return max_vqs / 2; > +} > + > static void print_status(struct mlx5_vdpa_dev *mvdev, u8 status, bool set) > { > if (status & ~VALID_STATUS_MASK) > @@ -1928,8 +1942,11 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev) > } > } > > -void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev) > +static int mlx5v_probe(struct auxiliary_device *adev, > +const struct auxiliary_device_id *id) > { > + struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev); > + struct mlx5_core_dev *mdev = madev->mdev; > struct virtio_net_config *config; > struct mlx5_vdpa_dev *mvdev; > struct mlx5_vdpa_net *ndev; > @@ -1943,7 +1960,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev) > ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, > mdev->device, &mlx5_vdpa_ops, >2 * mlx5_vdpa_max_qps(max_vqs)); > if (IS_ERR(ndev)) > - return ndev; > + return PTR_ERR(ndev); > > ndev->mvdev.max_vqs = max_vqs; > 
mvdev = &ndev->mvdev; > @@ -1972,7 +1989,8 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev) > if (err) > goto err_reg; > > - return ndev; > + dev_set_drvdata(&adev->dev, ndev); > + return 0; > > err_reg: > free_resources(ndev); > @@ -1981,10 +1999,29 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev) > err_mtu: > mutex_destroy(&ndev->reslock); > put_device(&mvdev->vdev.dev); > - return ERR_PTR(err); > + return err; > } > > -void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev) > +static int mlx5v_remove(struct auxiliary_device *adev) > { > + struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev); > + > vdpa_unregister_device(&mvdev->vdev); > + return 0; > } > + > +static const struct auxiliary_device_id mlx5v_id_table[] = { > + { .name = MLX5_ADEV_NAME ".vnet", }, > + {}, > +}; > + > +MODULE_DEVICE_TABLE(auxiliary, mlx5v_id_table); > + > +static struct auxiliary_driver mlx5v_driver = { > + .name = "vnet", > + .probe = mlx5v_probe, > + .remove = mlx5v_remove, > + .id_table = mlx5v_id_table, > +};

It is hard to see from the diff, but with this patch applied the vdpa module looks the way I imagined things would look with the auxiliary bus. It is very similar in structure to a PCI driver, with the probe() function cleanly registering with its subsystem. This is what I'd like to see from the new Intel RDMA driver.

Greg, I think this patch is the cleanest usage example.

I've looked over this series and it has the right idea and parts. There is definitely more that can be done to improve mlx5 in this area, but this series is well scoped and cleans up a good part of it.

Jason
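The probe()/remove() shape being praised here can be illustrated outside the kernel with a small mock. This is a sketch under loud assumptions: every `*_sketch` type and function below is a hypothetical, heavily reduced stand-in, not the auxiliary bus API; it only shows probe() recovering its parent with a container_of-style macro and handing state to remove() through driver data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel's container_of(), written with offsetof(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct device_sketch { void *drvdata; };
struct aux_device_sketch { struct device_sketch dev; };
struct core_dev_sketch { int id; };

struct parent_adev_sketch {
	struct core_dev_sketch *mdev;	/* owning core device */
	struct aux_device_sketch adev;	/* embedded auxiliary device */
};

struct net_dev_sketch { int registered; int core_id; };

/* probe(): recover the parent from the embedded member, allocate the
 * per-device state, and stash it as driver data for remove(). */
static int probe_sketch(struct aux_device_sketch *adev)
{
	struct parent_adev_sketch *madev =
		container_of_sketch(adev, struct parent_adev_sketch, adev);
	struct net_dev_sketch *ndev = calloc(1, sizeof(*ndev));

	if (!ndev)
		return -1;

	ndev->core_id = madev->mdev->id;
	ndev->registered = 1;	/* stands in for subsystem registration */
	adev->dev.drvdata = ndev;
	return 0;
}

/* remove(): fetch the state stored by probe() and tear it down. */
static void remove_sketch(struct aux_device_sketch *adev)
{
	struct net_dev_sketch *ndev = adev->dev.drvdata;

	assert(ndev && ndev->registered);
	adev->dev.drvdata = NULL;
	free(ndev);
}
```

The point of the pattern is that the driver never needs a global pointer to its parent: everything is reachable from the device handed to probe().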