Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic
Den 21. sep. 2017 16:21, skrev Andrew Lunn: Hi Egil +static void lan9303_bridge_ports(struct lan9303 *chip) +{ + /* ports bridged: remove mirroring */ + lan9303_write_switch_reg(chip, LAN9303_SWE_PORT_MIRROR, 0); +} Could you replace the 0 with something symbolic which makes this easier to understand. #define LAN9303_SWE_PORT_MIRROR_DISABLED 0 OK + static int lan9303_handle_reset(struct lan9303 *chip) { if (!chip->reset_gpio) @@ -844,6 +866,69 @@ static void lan9303_port_disable(struct dsa_switch *ds, int port, } } +static int lan9303_port_bridge_join(struct dsa_switch *ds, int port, + struct net_device *br) +{ + struct lan9303 *chip = ds->priv; + + dev_dbg(chip->dev, "%s(port %d)\n", __func__, port); + if (ds->ports[1].bridge_dev == ds->ports[2].bridge_dev) { + lan9303_bridge_ports(chip); + chip->is_bridged = true; /* unleash stp_state_set() */ + } + + return 0; +} + +static void lan9303_port_bridge_leave(struct dsa_switch *ds, int port, + struct net_device *br) +{ + struct lan9303 *chip = ds->priv; + + dev_dbg(chip->dev, "%s(port %d)\n", __func__, port); + if (chip->is_bridged) { + lan9303_separate_ports(chip); + chip->is_bridged = false; + } +} + +static void lan9303_port_stp_state_set(struct dsa_switch *ds, int port, + u8 state) +{ + int portmask, portstate; + struct lan9303 *chip = ds->priv; + + dev_dbg(chip->dev, "%s(port %d, state %d)\n", + __func__, port, state); + if (!chip->is_bridged) + return; /* touching SWE_PORT_STATE will break port separation */ I'm wondering how this is supposed to work. Please add a good comment here, since the hardware is forcing you to do something odd. Maybe it would be a good idea to save the STP state in chip. And then when chip->is_bridged is set true, change the state in the hardware to the saved value? What happens when port 0 is added to the bridge, there is then a minute pause and then port 1 is added? I would expect that as soon as port 0 is added, the STP state machine for port 0 will start and move into listening and then forwarding. Due to hardware limitations it looks like you cannot do this. So what state is the hardware effectively in? Blocking? Forwarding? Then port 1 is added. You can then can respect the states. port 1 will do blocking->listening->forwarding, but what about port 0? The calls won't get repeated? How does it transition to forwarding? Andrew I see your point with the "minute pause" argument. Although a bit contrived use case, it is easy to fix by caching the STP state, as you suggest. So I can do that. The port separation method is from the original version of the driver, not by me. I have read through the datasheet to see if there are other ways to make port separation, but I could not find anything. If anybody care to read through the 300+ page lan9303.pdf and prove me wrong, I am happy to do it differently. How does other DSA HW chips handle port separation? Knowing that could perhaps help me know what to look for. Egil ' + + switch (state) { + case BR_STATE_DISABLED: + portstate = LAN9303_SWE_PORT_STATE_DISABLED_PORT0; + break; + case BR_STATE_BLOCKING: + case BR_STATE_LISTENING: + portstate = LAN9303_SWE_PORT_STATE_BLOCKING_PORT0; + break; + case BR_STATE_LEARNING: + portstate = LAN9303_SWE_PORT_STATE_LEARNING_PORT0; + break; + case BR_STATE_FORWARDING: + portstate = LAN9303_SWE_PORT_STATE_FORWARDING_PORT0; + break; + default: + portstate = LAN9303_SWE_PORT_STATE_DISABLED_PORT0; + dev_err(chip->dev, "unknown stp state: port %d, state %d\n", + port, state); + } + + portmask = 0x3 << (port * 2); + portstate <<= (port * 2); + lan9303_write_switch_reg_mask(chip, LAN9303_SWE_PORT_STATE, + portstate, portmask); +}
Re: linux-next: build failure after merge of the net-next tree
On Thu, 2017-09-21 at 18:37 -0700, David Miller wrote: > From: Stephen Rothwell > Date: Fri, 22 Sep 2017 11:03:55 +1000 > > > After merging the net-next tree, today's linux-next build (arm > > multi_v7_defconfig) failed like this: > > > > net/ipv4/fib_frontend.c: In function 'fib_validate_source': > > net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member > > named 'fib_has_custom_local_routes' > >if (net->ipv4.fib_has_custom_local_routes) > > ^ > > net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute': > > net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member > > named 'fib_has_custom_local_routes' > >net->ipv4.fib_has_custom_local_routes = true; > > ^ > > > > Caused by commit > > > > 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.") > > Paolo, it seems this struct member should be placed outside of > IP_MULTIPLE_TABLES protection, since users can insert custom > local routes even without that set. > > So I'm installing the following fix for this: > > > [PATCH] ipv4: Move fib_has_custom_local_routes outside of IP_MULTIPLE_TABLES. > > > net/ipv4/fib_frontend.c: In function 'fib_validate_source': > > net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member > > named 'fib_has_custom_local_routes' > >if (net->ipv4.fib_has_custom_local_routes) > > ^ > > net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute': > > net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member > > named 'fib_has_custom_local_routes' > >net->ipv4.fib_has_custom_local_routes = true; > > ^ > > Fixes: 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is > disabled.") > Reported-by: Stephen Rothwell > Signed-off-by: David S. Miller > --- > include/net/netns/ipv4.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > index 20720721da4b..8387f099115e 100644 > --- a/include/net/netns/ipv4.h > +++ b/include/net/netns/ipv4.h > @@ -49,10 +49,10 @@ struct netns_ipv4 { > #ifdef CONFIG_IP_MULTIPLE_TABLES > struct fib_rules_ops*rules_ops; > boolfib_has_custom_rules; > - boolfib_has_custom_local_routes; > struct fib_table __rcu *fib_main; > struct fib_table __rcu *fib_default; > #endif > + boolfib_has_custom_local_routes; > #ifdef CONFIG_IP_ROUTE_CLASSID > int fib_num_tclassid_users; > #endif > -- > 2.13.5 My fault, I should have fuzzed the configuration before posting. Fix looks good, thanks David! Paolo
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On Thu, 21 Sep 2017, Colin King wrote: > From: Colin Ian King > > Don't populate const array ac_to_fifo on the stack in an inlined > function, instead make it static. Makes the object code smaller > by over 800 bytes: > >text data bss dec hex filename > 159029 331541216 193399 2f377 4965-mac.o > >text data bss dec hex filename > 158122 332501216 192588 2f04c 4965-mac.o > > (gcc version 7.2.0 x86_64) Gcc 7 must be much more aggressive about inlining than gcc 4. Here is the change that I got on this file: text data bss dec hex filename - 76346 1494 152 77992 130a8 drivers/net/wireless/intel/iwlegacy/4965-mac.o + 76298 1494 152 77944 13078 drivers/net/wireless/intel/iwlegacy/4965-mac.o decrease of 48 julia > > Signed-off-by: Colin Ian King > --- > drivers/net/wireless/intel/iwlegacy/4965-mac.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/wireless/intel/iwlegacy/4965-mac.c > b/drivers/net/wireless/intel/iwlegacy/4965-mac.c > index de9b6522c43f..65eba2c24292 100644 > --- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c > +++ b/drivers/net/wireless/intel/iwlegacy/4965-mac.c > @@ -1480,7 +1480,7 @@ il4965_get_ac_from_tid(u16 tid) > static inline int > il4965_get_fifo_from_tid(u16 tid) > { > - const u8 ac_to_fifo[] = { > + static const u8 ac_to_fifo[] = { > IL_TX_FIFO_VO, > IL_TX_FIFO_VI, > IL_TX_FIFO_BE, > -- > 2.14.1 > > -- > To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic
Den 21. sep. 2017 16:26, skrev Vivien Didelot: Hi Egil, Egil Hjelmeland writes: When both user ports are joined to the same bridge, the normal HW MAC learning is enabled. This means that unicast traffic is forwarded in HW. If one of the user ports leave the bridge, the ports goes back to the initial separated operation. Port separation relies on disabled HW MAC learning. Hence the condition that both ports must join same bridge. Add brigde methods port_bridge_join, port_bridge_leave and port_stp_state_set. Signed-off-by: Egil Hjelmeland Styling nitpicks below, other than that, the patch LGTM: Reviewed-by: Vivien Didelot #include #include #include +#include It's nice to order header inclusions alphabetically. #include "lan9303.h" @@ -146,6 +147,7 @@ # define LAN9303_SWE_PORT_STATE_FORWARDING_PORT0 (0) # define LAN9303_SWE_PORT_STATE_LEARNING_PORT0 BIT(1) # define LAN9303_SWE_PORT_STATE_BLOCKING_PORT0 BIT(0) +# define LAN9303_SWE_PORT_STATE_DISABLED_PORT0 (3) #define LAN9303_SWE_PORT_MIRROR 0x1846 # define LAN9303_SWE_PORT_MIRROR_SNIFF_ALL BIT(8) # define LAN9303_SWE_PORT_MIRROR_SNIFFER_PORT2 BIT(7) @@ -431,6 +433,20 @@ static int lan9303_read_switch_reg(struct lan9303 *chip, u16 regnum, u32 *val) return ret; } +static int lan9303_write_switch_reg_mask( + struct lan9303 *chip, u16 regnum, u32 val, u32 mask) That is unlikely to respect the Kernel Coding Style. Please fill the line as much as possible and align with the opening parenthesis: static int lan9303_write_switch_reg_mask(struct lan9303 *chip, u16 regnum, u32 val, u32 mask) OK. Probably this function will go away in v2 due to Andrews comment. +{ + int ret; + u32 reg; + + ret = lan9303_read_switch_reg(chip, regnum, ®); + if (ret) + return ret; + reg = (reg & ~mask) | val; + + return lan9303_write_switch_reg(chip, regnum, reg); +} ... + + portmask = 0x3 << (port * 2); + portstate <<= (port * 2); Unnecessary alignment, please remove the extra space characters. I think the alignment makes the logic more clear. + lan9303_write_switch_reg_mask(chip, LAN9303_SWE_PORT_STATE, + portstate, portmask); +} + static const struct dsa_switch_ops lan9303_switch_ops = { .get_tag_protocol = lan9303_get_tag_protocol, .setup = lan9303_setup, @@ -855,6 +940,9 @@ static const struct dsa_switch_ops lan9303_switch_ops = { .get_sset_count = lan9303_get_sset_count, .port_enable = lan9303_port_enable, .port_disable = lan9303_port_disable, + .port_bridge_join = lan9303_port_bridge_join, + .port_bridge_leave = lan9303_port_bridge_leave, + .port_stp_state_set = lan9303_port_stp_state_set, Same here, alignment unnecessary, especially since the rest of the structure does not do that. }; static int lan9303_register_switch(struct lan9303 *chip) diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h index 4d8be555ff4d..5be246f05965 100644 --- a/drivers/net/dsa/lan9303.h +++ b/drivers/net/dsa/lan9303.h @@ -21,6 +21,7 @@ struct lan9303 { struct dsa_switch *ds; struct mutex indirect_mutex; /* protect indexed register access */ const struct lan9303_phy_ops *ops; + bool is_bridged; /* true if port 1 and 2 is bridged */ }; extern const struct regmap_access_table lan9303_register_set; Please use the checkpatch.pl script to ensure your patch respects the kernel conventions before submitting, it can spot nice stuffs! I have checked _every_ patch with checkpatch.pl and weeded all warnings before I submitted them. I use a Git alias(*) to check a commit which does basically this: git format-patch --stdout -1 | ./scripts/checkpatch.pl - (*) in details, especially convenient during interactive rebases: $ git config alias.checkcommit '!f () { git format-patch --stdout -1 ${1:-HEAD} | ./scripts/checkpatch.pl - ; }; f' $ git checkcommit # i.e. current one $ git checkcommit HEAD^^ $ git checkcommit d329ac88eb21 ... Thanks, Vivien
Re: [PATCH net 1/4] net:ethernet:aquantia: Setup max_mtu in ndev to enable jumbo frames
>> self->ndev->mtu = aq_nic_cfg->mtu - ETH_HLEN; >> +self->ndev->min_mtu = ETH_MIN_MTU; > This is not required. It will default to ETH_MIN_MTU. Thanks Andrew, true. >> +self->ndev->max_mtu = self->aq_hw_caps.mtu - ETH_FCS_LEN - ETH_HLEN; >> >> return 0; >> } >> @@ -695,7 +696,7 @@ int aq_nic_set_mtu(struct aq_nic_s *self, int new_mtu) >> { >> int err = 0; >> >> -if (new_mtu > self->aq_hw_caps.mtu) { >> +if (new_mtu + ETH_FCS_LEN > self->aq_hw_caps.mtu) { > If you have set max_mtu correctly, this cannot happen. But it seems > odd you don't have ETH_HLEN here, where as when setting max_mtu you > do. I agree, thats an extra check. Will remove that. Regarding ETH_HLEN - the baseline code invokes this function with (new_mtu + ETH_HLEN) - thats why this check does not add it. In general, that extra level of storage (aq_nic_cfg.mtu field) is almost useless, I'll cleanup this under a separate patchset. -- BR, Igor
Re: tg3 pxe weirdness
On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote: > On Thu, Sep 21, 2017 at 7:53 PM, Berend De Schouwer > wrote: > > Hi, > > > > I've got a machine with a Broadcom bcm5762c, using the tg3 driver, > > that > > fails to receive network packets under some very specific > > conditions. > > > > > > Berend > > Can you please share below details? > 1) Model and Manufacturer of the system HP EliteDesk 705 G3 SFF lspci -n on the network card: 01:00.0 0200: 14e4:1687 (rev 10) dmesg from tg3: tg3 :01:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24 tg3 :01:00.0: setting latency timer to 64 tg3 :01:00.0: eth0: Tigon3 [partno(none) rev 5762100] (PCI Express) MAC address 70:5a:0f:3d:9f:4f tg3 :01:00.0: eth0: attached PHY is 5762C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1]) tg3 :01:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1] tg3 :01:00.0: eth0: dma_rwctrl[0001] dma_mask[64-bit] The same motherboard with add-on network cards does boot using PXE. It's an EFI board but Secureboot is currently disabled. > 2) Linux distro/kernel used? CentOS 6 + updates. I've tried all the CentOS 6 kernels up to 2.6.32- 696.10.2.el6.x86_64 I've tried updating the tg3 driver from 3.137 to both 3.137h and 3.137o, with the same result. I'm currently in the process of making 4.13.2 boot.
Re: [PATCH net-next] net: mvpp2: phylink support
Hi Antoine, You can add Tested-by: Marcin Wojtas Best regards, Marcin 2017-09-21 15:45 GMT+02:00 Antoine Tenart : > Convert the PPv2 driver to use phylink, which models the MAC to PHY > link. The phylink support is made such a way the GoP link IRQ can still > be used: the two modes are incompatible and the GoP link IRQ will be > used if no PHY is described in the device tree. This is the same > behaviour as before. > > Signed-off-by: Antoine Tenart > --- > drivers/net/ethernet/marvell/Kconfig | 1 + > drivers/net/ethernet/marvell/mvpp2.c | 502 > +-- > 2 files changed, 303 insertions(+), 200 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/Kconfig > b/drivers/net/ethernet/marvell/Kconfig > index da6fb825afea..991138b8dfba 100644 > --- a/drivers/net/ethernet/marvell/Kconfig > +++ b/drivers/net/ethernet/marvell/Kconfig > @@ -86,6 +86,7 @@ config MVPP2 > depends on ARCH_MVEBU || COMPILE_TEST > depends on HAS_DMA > select MVMDIO > + select PHYLINK > ---help--- > This driver supports the network interface units in the > Marvell ARMADA 375, 7K and 8K SoCs. > diff --git a/drivers/net/ethernet/marvell/mvpp2.c > b/drivers/net/ethernet/marvell/mvpp2.c > index 8041d692db3c..5fb7e76ee128 100644 > --- a/drivers/net/ethernet/marvell/mvpp2.c > +++ b/drivers/net/ethernet/marvell/mvpp2.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -341,6 +342,7 @@ > #define MVPP2_GMAC_FORCE_LINK_PASS BIT(1) > #define MVPP2_GMAC_IN_BAND_AUTONEG BIT(2) > #define MVPP2_GMAC_IN_BAND_AUTONEG_BYPASS BIT(3) > +#define MVPP2_GMAC_IN_BAND_RESTART_AN BIT(4) > #define MVPP2_GMAC_CONFIG_MII_SPEEDBIT(5) > #define MVPP2_GMAC_CONFIG_GMII_SPEED BIT(6) > #define MVPP2_GMAC_AN_SPEED_EN BIT(7) > @@ -350,6 +352,12 @@ > #define MVPP2_GMAC_AN_DUPLEX_ENBIT(13) > #define MVPP2_GMAC_STATUS0 0x10 > #define MVPP2_GMAC_STATUS0_LINK_UP BIT(0) > +#defineMVPP2_GMAC_STATUS0_GMII_SPEED BIT(1) > +#defineMVPP2_GMAC_STATUS0_MII_SPEEDBIT(2) > +#defineMVPP2_GMAC_STATUS0_FULL_DUPLEX BIT(3) > +#define MVPP2_GMAC_STATUS0_RX_PAUSEBIT(6) > +#define MVPP2_GMAC_STATUS0_TX_PAUSEBIT(7) > +#define MVPP2_GMAC_STATUS0_AN_COMPLETE BIT(11) > #define MVPP2_GMAC_PORT_FIFO_CFG_1_REG 0x1c > #define MVPP2_GMAC_TX_FIFO_MIN_TH_OFFS 6 > #define MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK 0x1fc0 > @@ -878,12 +886,11 @@ struct mvpp2_port { > u16 rx_ring_size; > struct mvpp2_pcpu_stats __percpu *stats; > > + struct device_node *of_node; > + > phy_interface_t phy_interface; > - struct device_node *phy_node; > + struct phylink *phylink; > struct phy *comphy; > - unsigned int link; > - unsigned int duplex; > - unsigned int speed; > > struct mvpp2_bm_pool *pool_long; > struct mvpp2_bm_pool *pool_short; > @@ -4716,13 +4723,14 @@ static void mvpp2_port_periodic_xon_disable(struct > mvpp2_port *port) > } > > /* Configure loopback port */ > -static void mvpp2_port_loopback_set(struct mvpp2_port *port) > +static void mvpp2_port_loopback_set(struct mvpp2_port *port, > + const struct phylink_link_state *state) > { > u32 val; > > val = readl(port->base + MVPP2_GMAC_CTRL_1_REG); > > - if (port->speed == 1000) > + if (state->speed == 1000) > val |= MVPP2_GMAC_GMII_LB_EN_MASK; > else > val &= ~MVPP2_GMAC_GMII_LB_EN_MASK; > @@ -4778,10 +4786,6 @@ static void mvpp2_defaults_set(struct mvpp2_port *port) > int tx_port_num, val, queue, ptxq, lrxq; > > if (port->priv->hw_version == MVPP21) { > - /* Configure port to loopback if needed */ > - if (port->flags & MVPP2_F_LOOPBACK) > - mvpp2_port_loopback_set(port); > - > /* Update TX FIFO MIN Threshold */ > val = readl(port->base + MVPP2_GMAC_PORT_FIFO_CFG_1_REG); > val &= ~MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK; > @@ -5860,111 +5864,6 @@ static irqreturn_t mvpp2_link_status_isr(int irq, > void *dev_id) > return IRQ_HANDLED; > } > > -static void mvpp2_gmac_set_autoneg(struct mvpp2_port *port, > - struct phy_device *phydev) > -{ > - u32 val; > - > - if (port->phy_interface != PHY_INTERFACE_MODE_RGMII && > - port->phy_interface != PHY_INTERFACE_MODE_RGMII_ID && > - port->phy_interface != PHY_INTERFACE_MODE_RGMII_RXID && > - port->phy_interface != PHY_INTERFACE_MODE_RGMII_TXID && > - port->phy_interface != PHY_INTERFACE_MODE_SGMII) > - return; > - > -
[PATCH net-next RFC 4/5] vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH
Signed-off-by: Jason Wang --- drivers/vhost/net.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 58585ec..c89640e 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -87,7 +87,7 @@ struct vhost_net_ubuf_ref { struct vhost_virtqueue *vq; }; -#define VHOST_RX_BATCH 64 +#define VHOST_NET_BATCH 64 struct vhost_net_buf { struct sk_buff **queue; int tail; @@ -159,7 +159,7 @@ static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq) rxq->head = 0; rxq->tail = skb_array_consume_batched(nvq->rx_array, rxq->queue, - VHOST_RX_BATCH); + VHOST_NET_BATCH); return rxq->tail; } @@ -912,7 +912,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) return -ENOMEM; } - queue = kmalloc_array(VHOST_RX_BATCH, sizeof(struct sk_buff *), + queue = kmalloc_array(VHOST_NET_BATCH, sizeof(struct sk_buff *), GFP_KERNEL); if (!queue) { kfree(vqs); -- 2.7.4
[PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched processing
This patch implements basic batched processing of tx virtqueue by prefetching desc indices and updating used ring in a batch. For non-zerocopy case, vq->heads were used for storing the prefetched indices and updating used ring. It is also a requirement for doing more batching on top. For zerocopy case and for simplicity, batched processing were simply disabled by only fetching and processing one descriptor at a time, this could be optimized in the future. XDP_DROP (without touching skb) on tun (with Moongen in guest) with zercopy disabled: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz: Before: 3.20Mpps After: 3.90Mpps (+22%) No differences were seen with zerocopy enabled. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 215 -- drivers/vhost/vhost.c | 2 +- 2 files changed, 121 insertions(+), 96 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index c89640e..c439892 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -408,27 +408,25 @@ static int vhost_net_enable_vq(struct vhost_net *n, return vhost_poll_start(poll, sock->file); } -static int vhost_net_tx_get_vq_desc(struct vhost_net *net, - struct vhost_virtqueue *vq, - struct iovec iov[], unsigned int iov_size, - unsigned int *out_num, unsigned int *in_num) +static bool vhost_net_tx_avail(struct vhost_net *net, + struct vhost_virtqueue *vq) { unsigned long uninitialized_var(endtime); - int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), - out_num, in_num, NULL, NULL); - if (r == vq->num && vq->busyloop_timeout) { - preempt_disable(); - endtime = busy_clock() + vq->busyloop_timeout; - while (vhost_can_busy_poll(vq->dev, endtime) && - vhost_vq_avail_empty(vq->dev, vq)) - cpu_relax(); - preempt_enable(); - r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), - out_num, in_num, NULL, NULL); - } + if (!vq->busyloop_timeout) + return false; - return r; + if (!vhost_vq_avail_empty(vq->dev, vq)) + return true; + + preempt_disable(); + endtime = busy_clock() + vq->busyloop_timeout; + while (vhost_can_busy_poll(vq->dev, endtime) && + vhost_vq_avail_empty(vq->dev, vq)) + cpu_relax(); + preempt_enable(); + + return !vhost_vq_avail_empty(vq->dev, vq); } static bool vhost_exceeds_maxpend(struct vhost_net *net) @@ -446,8 +444,9 @@ static void handle_tx(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; + struct vring_used_elem used, *heads = vq->heads; unsigned out, in; - int head; + int avails, head; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, @@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net) struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); bool zcopy, zcopy_used; + int i, batched = VHOST_NET_BATCH; mutex_lock(&vq->mutex); sock = vq->private_data; @@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net) hdr_size = nvq->vhost_hlen; zcopy = nvq->ubufs; + /* Disable zerocopy batched fetching for simplicity */ + if (zcopy) { + heads = &used; + batched = 1; + } + for (;;) { /* Release DMAs done buffers first */ if (zcopy) @@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net) if (unlikely(vhost_exceeds_maxpend(net))) break; - head = vhost_net_tx_get_vq_desc(net, vq, vq->iov, - ARRAY_SIZE(vq->iov), - &out, &in); + avails = vhost_prefetch_desc_indices(vq, heads, batched, !zcopy); /* On error, stop handling until the next kick. */ - if (unlikely(head < 0)) + if (unlikely(avails < 0)) break; - /* Nothing new? Wait for eventfd to tell us they refilled. */ - if (head == vq->num) { + /* Nothing new? Busy poll for a while or wait for +* eventfd to tell us they refilled. */ + if (!avails) { + if (vhost_net_tx_avail(net, vq)) + continue; if (unlikely(vhost_enable_notify(&net->dev, vq))) { vhost_disable_notify(&net->dev, vq);
[PATCH net-next RFC 1/5] vhost: split out ring head fetching logic
This patch splits out ring head fetching logic and leave the descriptor fetching and translation logic. This makes it is possible to batch fetching the descriptor indices. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 75 +-- drivers/vhost/vhost.h | 5 2 files changed, 54 insertions(+), 26 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d6dbb28..f87ec75 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1984,41 +1984,23 @@ static int get_indirect(struct vhost_virtqueue *vq, return 0; } -/* This looks in the virtqueue and for the first available buffer, and converts - * it to an iovec for convenient access. Since descriptors consist of some - * number of output then some number of input descriptors, it's actually two - * iovecs, but we pack them into one and note how many of each there were. - * - * This function returns the descriptor number found, or vq->num (which is - * never a valid descriptor number) if none was found. A negative code is - * returned on error. */ -int vhost_get_vq_desc(struct vhost_virtqueue *vq, - struct iovec iov[], unsigned int iov_size, - unsigned int *out_num, unsigned int *in_num, - struct vhost_log *log, unsigned int *log_num) +static unsigned int vhost_get_vq_head(struct vhost_virtqueue *vq, int *err) { - struct vring_desc desc; - unsigned int i, head, found = 0; - u16 last_avail_idx; - __virtio16 avail_idx; - __virtio16 ring_head; - int ret, access; - - /* Check it isn't doing very strange things with descriptor numbers. */ - last_avail_idx = vq->last_avail_idx; + u16 last_avail_idx = vq->last_avail_idx; + __virtio16 avail_idx, ring_head; if (vq->avail_idx == vq->last_avail_idx) { if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) { vq_err(vq, "Failed to access avail idx at %p\n", &vq->avail->idx); - return -EFAULT; + goto err; } vq->avail_idx = vhost16_to_cpu(vq, avail_idx); if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) { vq_err(vq, "Guest moved used index from %u to %u", last_avail_idx, vq->avail_idx); - return -EFAULT; + goto err; } /* If there's nothing new since last we looked, return @@ -2040,13 +2022,35 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, vq_err(vq, "Failed to read head: idx %d address %p\n", last_avail_idx, &vq->avail->ring[last_avail_idx % vq->num]); - return -EFAULT; + goto err; } - head = vhost16_to_cpu(vq, ring_head); + return vhost16_to_cpu(vq, ring_head); +err: + *err = -EFAULT; + return 0; +} + +/* This looks in the virtqueue and for the first available buffer, and converts + * it to an iovec for convenient access. Since descriptors consist of some + * number of output then some number of input descriptors, it's actually two + * iovecs, but we pack them into one and note how many of each there were. + * + * This function returns the descriptor number found, or vq->num (which is + * never a valid descriptor number) if none was found. A negative code is + * returned on error. */ +int __vhost_get_vq_desc(struct vhost_virtqueue *vq, + struct iovec iov[], unsigned int iov_size, + unsigned int *out_num, unsigned int *in_num, + struct vhost_log *log, unsigned int *log_num, + __virtio16 head) +{ + struct vring_desc desc; + unsigned int i, found = 0; + int ret = 0, access; /* If their number is silly, that's an error. */ - if (unlikely(head >= vq->num)) { + if (unlikely(head > vq->num)) { vq_err(vq, "Guest says index %u > %u is available", head, vq->num); return -EINVAL; @@ -2133,6 +2137,25 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY)); return head; } +EXPORT_SYMBOL_GPL(__vhost_get_vq_desc); + +int vhost_get_vq_desc(struct vhost_virtqueue *vq, + struct iovec iov[], unsigned int iov_size, + unsigned int *out_num, unsigned int *in_num, + struct vhost_log *log, unsigned int *log_num) +{ + int ret = 0; + unsigned int head = vhost_get_vq_head(vq, &ret); + + if (ret) + return ret; + + if (head == vq->num) + return head; + + return __vhost_get_vq_desc(vq, iov, iov_size, out_num, in_num, +
[PATCH net-next RFC 0/5] batched tx processing in vhost_net
Hi: This series tries to implement basic tx batched processing. This is done by prefetching descriptor indices and update used ring in a batch. This intends to speed up used ring updating and improve the cache utilization. Test shows about ~22% improvement in tx pss. Please review. Jason Wang (5): vhost: split out ring head fetching logic vhost: introduce helper to prefetch desc index vhost: introduce vhost_add_used_idx() vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH vhost_net: basic tx virtqueue batched processing drivers/vhost/net.c | 221 -- drivers/vhost/vhost.c | 165 +++-- drivers/vhost/vhost.h | 9 ++ 3 files changed, 270 insertions(+), 125 deletions(-) -- 2.7.4
[PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
This patch introduces a helper which just increase the used idx. This will be used in pair with vhost_prefetch_desc_indices() by batching code. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 33 + drivers/vhost/vhost.h | 1 + 2 files changed, 34 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 8424166d..6532cda 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len) } EXPORT_SYMBOL_GPL(vhost_add_used); +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n) +{ + u16 old, new; + + old = vq->last_used_idx; + new = (vq->last_used_idx += n); + /* If the driver never bothers to signal in a very long while, +* used index might wrap around. If that happens, invalidate +* signalled_used index we stored. TODO: make sure driver +* signals at least once in 2^16 and remove this. +*/ + if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old))) + vq->signalled_used_valid = false; + + /* Make sure buffer is written before we update index. */ + smp_wmb(); + if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx), + &vq->used->idx)) { + vq_err(vq, "Failed to increment used idx"); + return -EFAULT; + } + if (unlikely(vq->log_used)) { + /* Log used index update. */ + log_write(vq->log_base, + vq->log_addr + offsetof(struct vring_used, idx), + sizeof(vq->used->idx)); + if (vq->log_ctx) + eventfd_signal(vq->log_ctx, 1); + } + return 0; +} +EXPORT_SYMBOL_GPL(vhost_add_used_idx); + static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 16c2cb6..5dd6c05 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq, void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); int vhost_vq_init_access(struct vhost_virtqueue *); +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n); int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count); -- 2.7.4
[PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index
This patch introduces vhost_prefetch_desc_indices() which could batch descriptor indices fetching and used ring updating. This intends to reduce the cache misses of indices fetching and updating and reduce cache line bounce when virtqueue is almost full. copy_to_user() was used in order to benefit from modern cpus that support fast string copy. Batched virtqueue processing will be the first user. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 55 +++ drivers/vhost/vhost.h | 3 +++ 2 files changed, 58 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index f87ec75..8424166d 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev, } EXPORT_SYMBOL_GPL(vhost_dequeue_msg); +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq, + struct vring_used_elem *heads, + u16 num, bool used_update) +{ + int ret, ret2; + u16 last_avail_idx, last_used_idx, total, copied; + __virtio16 avail_idx; + struct vring_used_elem __user *used; + int i; + + if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) { + vq_err(vq, "Failed to access avail idx at %p\n", + &vq->avail->idx); + return -EFAULT; + } + last_avail_idx = vq->last_avail_idx & (vq->num - 1); + vq->avail_idx = vhost16_to_cpu(vq, avail_idx); + total = vq->avail_idx - vq->last_avail_idx; + ret = total = min(total, num); + + for (i = 0; i < ret; i++) { + ret2 = vhost_get_avail(vq, heads[i].id, + &vq->avail->ring[last_avail_idx]); + if (unlikely(ret2)) { + vq_err(vq, "Failed to get descriptors\n"); + return -EFAULT; + } + last_avail_idx = (last_avail_idx + 1) & (vq->num - 1); + } + + if (!used_update) + return ret; + + last_used_idx = vq->last_used_idx & (vq->num - 1); + while (total) { + copied = min((u16)(vq->num - last_used_idx), total); + ret2 = vhost_copy_to_user(vq, + &vq->used->ring[last_used_idx], + &heads[ret - total], + copied * sizeof(*used)); + + if (unlikely(ret2)) { + vq_err(vq, "Failed to update used ring!\n"); + return -EFAULT; + } + + last_used_idx = 0; + total -= copied; + } + + /* Only get avail ring entries after they have been exposed by guest. */ + smp_rmb(); + return ret; +} +EXPORT_SYMBOL(vhost_prefetch_desc_indices); static int __init vhost_init(void) { diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 39ff897..16c2cb6 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -228,6 +228,9 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to, ssize_t vhost_chr_write_iter(struct vhost_dev *dev, struct iov_iter *from); int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled); +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq, + struct vring_used_elem *heads, + u16 num, bool used_update); #define vq_err(vq, fmt, ...) do { \ pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \ -- 2.7.4
Re: [PATCH net-next RFC 1/5] vhost: split out ring head fetching logic
On Fri, Sep 22, 2017 at 04:02:31PM +0800, Jason Wang wrote: > +/* This looks in the virtqueue and for the first available buffer, and > converts > + * it to an iovec for convenient access. Since descriptors consist of some > + * number of output then some number of input descriptors, it's actually two > + * iovecs, but we pack them into one and note how many of each there were. > + * > + * This function returns the descriptor number found, or vq->num (which is > + * never a valid descriptor number) if none was found. A negative code is > + * returned on error. */ > +int __vhost_get_vq_desc(struct vhost_virtqueue *vq, > + struct iovec iov[], unsigned int iov_size, > + unsigned int *out_num, unsigned int *in_num, > + struct vhost_log *log, unsigned int *log_num, > + __virtio16 head) [...] > +int vhost_get_vq_desc(struct vhost_virtqueue *vq, > + struct iovec iov[], unsigned int iov_size, > + unsigned int *out_num, unsigned int *in_num, > + struct vhost_log *log, unsigned int *log_num) Please document vhost_get_vq_desc(). Please also explain the difference between __vhost_get_vq_desc() and vhost_get_vq_desc() in the documentation.
Re: [patch net-next 07/12] mlxsw: spectrum: Add the multicast routing offloading logic
On 09/21/2017 06:26 PM, Andrew Lunn wrote: >> +static void mlxsw_sp_mr_route_stats_update(struct mlxsw_sp *mlxsw_sp, >> + struct mlxsw_sp_mr_route *mr_route) >> +{ >> +struct mlxsw_sp_mr *mr = mlxsw_sp->mr; >> +u64 packets, bytes; >> + >> +if (mr_route->route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP) >> +return; >> + >> +mr->mr_ops->route_stats(mlxsw_sp, mr_route->route_priv, &packets, >> +&bytes); >> + >> +switch (mr_route->mr_table->proto) { >> +case MLXSW_SP_L3_PROTO_IPV4: >> +mr_route->mfc4->mfc_un.res.pkt = packets; >> +mr_route->mfc4->mfc_un.res.bytes = bytes; > What about wrong_if and lastuse? wronf_if is updated by ipmr as it is trapped to the CPU. We did not address lastuse currently, though it can be easily addressed here. > > Is an mfc with iif on the host, not the switch, not offloaded? I am not sure I followed. What do you mean MFC with iif on the host? you mean MFC with iif that is an external NIC which is not part of the spectrum ASIC? in this case, the route will not be offloaded and all traffic will pass in slowpath. > >Andrew
Re: [PATCH 2/2] ip_tunnel: add mpls over gre encapsulation
On 09/22/2017 12:35 AM, Roopa Prabhu wrote: > diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c > index 36ea2ad..060ed07 100644 > --- a/net/mpls/af_mpls.c > +++ b/net/mpls/af_mpls.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #if IS_ENABLED(CONFIG_IPV6) > #include > @@ -39,6 +40,40 @@ static int one = 1; > static int label_limit = (1 << 20) - 1; > static int ttl_max = 255; > > +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e) > +{ > + return sizeof(struct mpls_shim_hdr); > +} > + > +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, > + u8 *protocol, struct flowi4 *fl4) > +{ > + return 0; > +} any reason you are supporting only rx ? Tx path doesn't need changes, all gre hdr fields remain the same except for the protocol type which is loaded from skb->protocol this last is set by the mpls stack before entering gre device.
Re: [net-next 1/2] dummy: add device MTU validation check
2017-09-21, 08:02:18 -0700, Eric Dumazet wrote: > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote: > > Currently, any mtu value can be assigned when adding a new dummy device: > > [~]# ip link add name dummy1 mtu 10 type dummy > > [~]# ip link show dummy1 > > 15: dummy1: mtu 10 qdisc noop state DOWN mode DEFAULT > > group default qlen 1000 > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff > > > > This patch adds device MTU validation check. > > What is wrong with big MTU on dummy ? It looks like the "centralize MTU checking" series broke that, but only for changing the MTU on an existing dummy device. Commit a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy uses, but there was no MTU check in dummy prior to that commit. > If this is a generic rule, this check should belong in core network > stack. > > > > > Signed-off-by: Zhang Shengju > > --- > > drivers/net/dummy.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c > > index e31ab3b..0276b2b 100644 > > --- a/drivers/net/dummy.c > > +++ b/drivers/net/dummy.c > > @@ -365,6 +365,14 @@ static int dummy_validate(struct nlattr *tb[], struct > > nlattr *data[], > > if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) > > return -EADDRNOTAVAIL; > > } > > + > > + if (tb[IFLA_MTU]) { > > + u32 mtu = nla_get_u32(tb[IFLA_MTU]); > > You do not verify/validate nla_len(tb[IFLA_MTU]). I think ifla_policy already performs that check: static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [...] [IFLA_MTU] = { .type = NLA_U32 }, static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { [...] err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy, extack); -- Sabrina
Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index
On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote: > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index f87ec75..8424166d 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct > vhost_dev *dev, > } > EXPORT_SYMBOL_GPL(vhost_dequeue_msg); > > +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq, > + struct vring_used_elem *heads, > + u16 num, bool used_update) Missing doc comment. > +{ > + int ret, ret2; > + u16 last_avail_idx, last_used_idx, total, copied; > + __virtio16 avail_idx; > + struct vring_used_elem __user *used; > + int i; The following variable names are a little confusing: last_avail_idx vs vq->last_avail_idx. last_avail_idx is a wrapped avail->ring[] index, vq->last_avail_idx is a free-running counter. The same for last_used_idx vs vq->last_used_idx. num argument vs vq->num. The argument could be called nheads instead to make it clear that this is heads[] and not the virtqueue size. Not a bug but it took me a while to figure out what was going on.
Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote: > This patch introduces a helper which just increase the used idx. This > will be used in pair with vhost_prefetch_desc_indices() by batching > code. > > Signed-off-by: Jason Wang > --- > drivers/vhost/vhost.c | 33 + > drivers/vhost/vhost.h | 1 + > 2 files changed, 34 insertions(+) Reviewed-by: Stefan Hajnoczi
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On Thu, Sep 21, 2017 at 11:56:30PM +0100, Colin King wrote: > From: Colin Ian King > > Don't populate const array ac_to_fifo on the stack in an inlined > function, instead make it static. Makes the object code smaller > by over 800 bytes: > >text data bss dec hex filename > 159029 331541216 193399 2f377 4965-mac.o > >text data bss dec hex filename > 158122 332501216 192588 2f04c 4965-mac.o > > (gcc version 7.2.0 x86_64) > > Signed-off-by: Colin Ian King Content type information was added at the end of the topic, but I think Kalle can fix that when he will be committing the patch. Acked-by: Stanislaw Gruszka
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote: > > On Thu, 21 Sep 2017, Colin King wrote: > > > From: Colin Ian King > > > > Don't populate const array ac_to_fifo on the stack in an inlined > > function, instead make it static. Makes the object code smaller > > by over 800 bytes: > > > >textdata bss dec hex filename > > 159029 331541216 193399 2f377 4965-mac.o > > > >textdata bss dec hex filename > > 158122 332501216 192588 2f04c 4965-mac.o > > > > (gcc version 7.2.0 x86_64) > > Gcc 7 must be much more aggressive about inlining than gcc 4. Here is the > change that I got on this file: > > text data bss dec hex filename > - 76346 1494 152 77992 130a8 > drivers/net/wireless/intel/iwlegacy/4965-mac.o > + 76298 1494 152 77944 13078 > drivers/net/wireless/intel/iwlegacy/4965-mac.o > decrease of 48 More likely different CONFIG options. e.g. allyesconfig vs defconfig with wireless
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On 22/09/17 11:03, Joe Perches wrote: > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote: >> >> On Thu, 21 Sep 2017, Colin King wrote: >> >>> From: Colin Ian King >>> >>> Don't populate const array ac_to_fifo on the stack in an inlined >>> function, instead make it static. Makes the object code smaller >>> by over 800 bytes: >>> >>>textdata bss dec hex filename >>> 159029 331541216 193399 2f377 4965-mac.o >>> >>>textdata bss dec hex filename >>> 158122 332501216 192588 2f04c 4965-mac.o >>> >>> (gcc version 7.2.0 x86_64) >> >> Gcc 7 must be much more aggressive about inlining than gcc 4. Here is the >> change that I got on this file: >> >> text data bss dec hex filename >> - 76346 1494 152 77992 130a8 >> drivers/net/wireless/intel/iwlegacy/4965-mac.o >> + 76298 1494 152 77944 13078 >> drivers/net/wireless/intel/iwlegacy/4965-mac.o >> decrease of 48 > > More likely different CONFIG options. > > e.g. allyesconfig vs defconfig with wireless > yup, I used allyesconfig
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On Fri, 22 Sep 2017, Colin Ian King wrote: > On 22/09/17 11:03, Joe Perches wrote: > > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote: > >> > >> On Thu, 21 Sep 2017, Colin King wrote: > >> > >>> From: Colin Ian King > >>> > >>> Don't populate const array ac_to_fifo on the stack in an inlined > >>> function, instead make it static. Makes the object code smaller > >>> by over 800 bytes: > >>> > >>>text data bss dec hex filename > >>> 159029 331541216 193399 2f377 4965-mac.o > >>> > >>>text data bss dec hex filename > >>> 158122 332501216 192588 2f04c 4965-mac.o > >>> > >>> (gcc version 7.2.0 x86_64) > >> > >> Gcc 7 must be much more aggressive about inlining than gcc 4. Here is the > >> change that I got on this file: > >> > >> text data bss dec hex filename > >> - 76346 1494 152 77992 130a8 > >> drivers/net/wireless/intel/iwlegacy/4965-mac.o > >> + 76298 1494 152 77944 13078 > >> drivers/net/wireless/intel/iwlegacy/4965-mac.o > >> decrease of 48 > > > > More likely different CONFIG options. > > > > e.g. allyesconfig vs defconfig with wireless > > > yup, I used allyesconfig Mine is also allyesconfig. julia > > -- > To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
tools: selftests: psock_tpacket: skip un-supported tpacket_v3 test
The TPACKET_V3 test of PACKET_TX_RING will fail with kernel version lower than v4.11. Supported code of tx ring was add with commit id <7f953ab2ba46: af_packet: TX_RING support for TPACKET_V3> at Jan. 3 of 2017. So skip this item test instead of reporting failing for old kernels. Signed-off-by: Orson Zhai --- tools/testing/selftests/net/psock_tpacket.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c index 7f6cd9fdacf3..f0cfc18c3726 100644 --- a/tools/testing/selftests/net/psock_tpacket.c +++ b/tools/testing/selftests/net/psock_tpacket.c @@ -57,6 +57,7 @@ #include #include #include +#include #include "psock_lib.h" @@ -676,7 +677,7 @@ static void __v3_fill(struct ring *ring, unsigned int blocks, int type) ring->flen = ring->req3.tp_block_size; } -static void setup_ring(int sock, struct ring *ring, int version, int type) +static int setup_ring(int sock, struct ring *ring, int version, int type) { int ret = 0; unsigned int blocks = 256; @@ -703,7 +704,11 @@ static void setup_ring(int sock, struct ring *ring, int version, int type) if (ret == -1) { perror("setsockopt"); - exit(1); + if (errno == EINVAL) { + printf("[SKIP] This type seems un-supported in current kernel, skipped.\n"); + return -1; + } else + exit(1); } ring->rd_len = ring->rd_num * sizeof(*ring->rd); @@ -715,6 +720,7 @@ static void setup_ring(int sock, struct ring *ring, int version, int type) total_packets = 0; total_bytes = 0; + return 0; } static void mmap_ring(int sock, struct ring *ring) @@ -830,7 +836,12 @@ static int test_tpacket(int version, int type) sock = pfsocket(version); memset(&ring, 0, sizeof(ring)); - setup_ring(sock, &ring, version, type); + if(setup_ring(sock, &ring, version, type)) { + /* skip test when error of invalid argument */ + close(sock); + return 0; + } + mmap_ring(sock, &ring); bind_ring(sock, &ring); walk_ring(sock, &ring); -- 2.12.2
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
Stanislaw Gruszka writes: > On Thu, Sep 21, 2017 at 11:56:30PM +0100, Colin King wrote: >> From: Colin Ian King >> >> Don't populate const array ac_to_fifo on the stack in an inlined >> function, instead make it static. Makes the object code smaller >> by over 800 bytes: >> >>text data bss dec hex filename >> 159029331541216 193399 2f377 4965-mac.o >> >>text data bss dec hex filename >> 158122332501216 192588 2f04c 4965-mac.o >> >> (gcc version 7.2.0 x86_64) >> >> Signed-off-by: Colin Ian King > > Content type information was added at the end of the topic, but > I think Kalle can fix that when he will be committing the patch. Yeah, I'll fix that when I commit this. But very good that you pointed it out, I might miss stuff like this. I'll also remove the "wireless:" prefix from the title. -- Kalle Valo
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On Fri, 2017-09-22 at 12:06 +0200, Julia Lawall wrote: > > On Fri, 22 Sep 2017, Colin Ian King wrote: > > > On 22/09/17 11:03, Joe Perches wrote: > > > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote: > > > > > > > > On Thu, 21 Sep 2017, Colin King wrote: > > > > > > > > > From: Colin Ian King > > > > > > > > > > Don't populate const array ac_to_fifo on the stack in an inlined > > > > > function, instead make it static. Makes the object code smaller > > > > > by over 800 bytes: > > > > > > > > > >text data bss dec hex filename > > > > > 159029 331541216 193399 2f377 4965-mac.o > > > > > > > > > >text data bss dec hex filename > > > > > 158122 332501216 192588 2f04c 4965-mac.o > > > > > > > > > > (gcc version 7.2.0 x86_64) > > > > > > > > Gcc 7 must be much more aggressive about inlining than gcc 4. Here is > > > > the > > > > change that I got on this file: > > > > > > > > text data bss dec hex filename > > > > - 76346 1494 152 77992 130a8 > > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o > > > > + 76298 1494 152 77944 13078 > > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o > > > > decrease of 48 > > > > > > More likely different CONFIG options. > > > > > > e.g. allyesconfig vs defconfig with wireless > > > > > > > yup, I used allyesconfig > > Mine is also allyesconfig. Odd. Here is what I get: (Sorry, no gcc-7) gcc4: gcc-4.9 (Ubuntu 4.9.4-2ubuntu1) 4.9.4 gcc5: gcc-5 (Ubuntu 5.4.1-8ubuntu1) 5.4.1 20170304 gcc6: gcc-6 (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406 $ size drivers/net/wireless/intel/iwlegacy/4965-mac.o* text data bss dec hex filename 118559 1766 152 120477 1d69d drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.new 118623 1766 152 120541 1d6dd drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.old 51595 1156 0 52751 ce0f drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.new 51659 1156 0 52815 ce4f drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.old 141956 29790 1216 172962 2a3a2 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.new 142671 29702 1216 173589 2a615 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.old 51733 1156 0 52889 ce99 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.new 51813 1156 0 52969 cee9 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.old 152991 29790 1216 183997 2cebd drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.new 153722 29702 1216 184640 2d140 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.old 51813 1156 0 52969 cee9 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.new 51893 1156 0 53049 cf39 drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.old
Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
On Fri, 22 Sep 2017, Joe Perches wrote: > On Fri, 2017-09-22 at 12:06 +0200, Julia Lawall wrote: > > > > On Fri, 22 Sep 2017, Colin Ian King wrote: > > > > > On 22/09/17 11:03, Joe Perches wrote: > > > > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote: > > > > > > > > > > On Thu, 21 Sep 2017, Colin King wrote: > > > > > > > > > > > From: Colin Ian King > > > > > > > > > > > > Don't populate const array ac_to_fifo on the stack in an inlined > > > > > > function, instead make it static. Makes the object code smaller > > > > > > by over 800 bytes: > > > > > > > > > > > >textdata bss dec hex filename > > > > > > 159029 331541216 193399 2f377 4965-mac.o > > > > > > > > > > > >textdata bss dec hex filename > > > > > > 158122 332501216 192588 2f04c 4965-mac.o > > > > > > > > > > > > (gcc version 7.2.0 x86_64) > > > > > > > > > > Gcc 7 must be much more aggressive about inlining than gcc 4. Here > > > > > is the > > > > > change that I got on this file: > > > > > > > > > > text data bss dec hex filename > > > > > - 76346 1494 152 77992 130a8 > > > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o > > > > > + 76298 1494 152 77944 13078 > > > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o > > > > > decrease of 48 > > > > > > > > More likely different CONFIG options. > > > > > > > > e.g. allyesconfig vs defconfig with wireless > > > > > > > > > > yup, I used allyesconfig > > > > Mine is also allyesconfig. > > Odd. > > Here is what I get: (Sorry, no gcc-7) > > gcc4: gcc-4.9 (Ubuntu 4.9.4-2ubuntu1) 4.9.4 > gcc5: gcc-5 (Ubuntu 5.4.1-8ubuntu1) 5.4.1 20170304 > gcc6: gcc-6 (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406 > > $ size drivers/net/wireless/intel/iwlegacy/4965-mac.o* > text data bss dec hex filename > 118559 1766 152 120477 1d69d > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.new > 118623 1766 152 120541 1d6dd > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.old My gcc is 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) You get a savings of 64 and I got a savings of 48, but it's not that big a difference, compared to what the later versions give. julia > 51595 1156 0 52751 ce0f > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.new > 51659 1156 0 52815 ce4f > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.old > 141956 29790 1216 172962 2a3a2 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.new > 142671 29702 1216 173589 2a615 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.old > 51733 1156 0 52889 ce99 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.new > 51813 1156 0 52969 cee9 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.old > 152991 29790 1216 183997 2cebd > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.new > 153722 29702 1216 184640 2d140 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.old > 51813 1156 0 52969 cee9 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.new > 51893 1156 0 53049 cf39 > drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.old > > > -- > To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
[PATCH 0/5] use setup_timer() helper function.
This series uses setup_timer() helper function. The series addresses the files under net/*. Allen Pais (5): net: nfc: hci: use setup_timer() helper. net: nfc: hci: llc_shdlc: use setup_timer() helper. net: af_packet: use setup_timer() helper. net: nfc: core: use setup_timer() helper. net: nfc: llcp_core: use setup_timer() helper. net/nfc/core.c | 5 ++--- net/nfc/hci/core.c | 5 ++--- net/nfc/hci/llc_shdlc.c | 15 ++- net/nfc/llcp_core.c | 10 -- net/packet/af_packet.c | 4 +--- 5 files changed, 15 insertions(+), 24 deletions(-) -- 2.7.4
[PATCH 1/5] net: nfc: hci: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais --- net/nfc/hci/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c index b740fef..a8a6e78 100644 --- a/net/nfc/hci/core.c +++ b/net/nfc/hci/core.c @@ -1004,9 +1004,8 @@ int nfc_hci_register_device(struct nfc_hci_dev *hdev) INIT_WORK(&hdev->msg_tx_work, nfc_hci_msg_tx_work); - init_timer(&hdev->cmd_timer); - hdev->cmd_timer.data = (unsigned long)hdev; - hdev->cmd_timer.function = nfc_hci_cmd_timeout; + setup_timer(&hdev->cmd_timer, nfc_hci_cmd_timeout, + (unsigned long)hdev); skb_queue_head_init(&hdev->rx_hcp_frags); -- 2.7.4
[PATCH 2/5] net: nfc: hci: llc_shdlc: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais --- net/nfc/hci/llc_shdlc.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/net/nfc/hci/llc_shdlc.c b/net/nfc/hci/llc_shdlc.c index 17e59a0..58df37e 100644 --- a/net/nfc/hci/llc_shdlc.c +++ b/net/nfc/hci/llc_shdlc.c @@ -763,17 +763,14 @@ static void *llc_shdlc_init(struct nfc_hci_dev *hdev, xmit_to_drv_t xmit_to_drv, mutex_init(&shdlc->state_mutex); shdlc->state = SHDLC_DISCONNECTED; - init_timer(&shdlc->connect_timer); - shdlc->connect_timer.data = (unsigned long)shdlc; - shdlc->connect_timer.function = llc_shdlc_connect_timeout; + setup_timer(&shdlc->connect_timer, llc_shdlc_connect_timeout, + (unsigned long)shdlc); - init_timer(&shdlc->t1_timer); - shdlc->t1_timer.data = (unsigned long)shdlc; - shdlc->t1_timer.function = llc_shdlc_t1_timeout; + setup_timer(&shdlc->t1_timer, llc_shdlc_t1_timeout, + (unsigned long)shdlc); - init_timer(&shdlc->t2_timer); - shdlc->t2_timer.data = (unsigned long)shdlc; - shdlc->t2_timer.function = llc_shdlc_t2_timeout; + setup_timer(&shdlc->t2_timer, llc_shdlc_t2_timeout, + (unsigned long)shdlc); shdlc->w = SHDLC_MAX_WINDOW; shdlc->srej_support = SHDLC_SREJ_SUPPORT; -- 2.7.4
[PATCH 5/5] net: nfc: llcp_core: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais --- net/nfc/llcp_core.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c index 02eef5c..7988185 100644 --- a/net/nfc/llcp_core.c +++ b/net/nfc/llcp_core.c @@ -1573,9 +1573,8 @@ int nfc_llcp_register_device(struct nfc_dev *ndev) INIT_LIST_HEAD(&local->list); kref_init(&local->ref); mutex_init(&local->sdp_lock); - init_timer(&local->link_timer); - local->link_timer.data = (unsigned long) local; - local->link_timer.function = nfc_llcp_symm_timer; + setup_timer(&local->link_timer, nfc_llcp_symm_timer, + (unsigned long)local); skb_queue_head_init(&local->tx_queue); INIT_WORK(&local->tx_work, nfc_llcp_tx_work); @@ -1601,9 +1600,8 @@ int nfc_llcp_register_device(struct nfc_dev *ndev) mutex_init(&local->sdreq_lock); INIT_HLIST_HEAD(&local->pending_sdreqs); - init_timer(&local->sdreq_timer); - local->sdreq_timer.data = (unsigned long) local; - local->sdreq_timer.function = nfc_llcp_sdreq_timer; + setup_timer(&local->sdreq_timer, nfc_llcp_sdreq_timer, + (unsigned long)local); INIT_WORK(&local->sdreq_timeout_work, nfc_llcp_sdreq_timeout_work); list_add(&local->list, &llcp_devices); -- 2.7.4
[PATCH 4/5] net: nfc: core: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais --- net/nfc/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/nfc/core.c b/net/nfc/core.c index 5cf33df..e5e23c2 100644 --- a/net/nfc/core.c +++ b/net/nfc/core.c @@ -1094,9 +1094,8 @@ struct nfc_dev *nfc_allocate_device(struct nfc_ops *ops, dev->targets_generation = 1; if (ops->check_presence) { - init_timer(&dev->check_pres_timer); - dev->check_pres_timer.data = (unsigned long)dev; - dev->check_pres_timer.function = nfc_check_pres_timeout; + setup_timer(&dev->check_pres_timer, nfc_check_pres_timeout, + (unsigned long)dev); INIT_WORK(&dev->check_pres_work, nfc_check_pres_work); } -- 2.7.4
[PATCH 3/5] net: af_packet: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais --- net/packet/af_packet.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index c261729..1d3e3ce 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -544,9 +544,7 @@ static void prb_init_blk_timer(struct packet_sock *po, struct tpacket_kbdq_core *pkc, void (*func) (unsigned long)) { - init_timer(&pkc->retire_blk_timer); - pkc->retire_blk_timer.data = (long)po; - pkc->retire_blk_timer.function = func; + setup_timer(&pkc->retire_blk_timer, func, (long)po); pkc->retire_blk_timer.expires = jiffies; } -- 2.7.4
Re: [net-next 1/2] dummy: add device MTU validation check
On Fri, 2017-09-22 at 10:56 +0200, Sabrina Dubroca wrote: > 2017-09-21, 08:02:18 -0700, Eric Dumazet wrote: > > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote: > > > Currently, any mtu value can be assigned when adding a new dummy device: > > > [~]# ip link add name dummy1 mtu 10 type dummy > > > [~]# ip link show dummy1 > > > 15: dummy1: mtu 10 qdisc noop state DOWN mode > > > DEFAULT group default qlen 1000 > > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff > > > > > > This patch adds device MTU validation check. > > > > What is wrong with big MTU on dummy ? > > It looks like the "centralize MTU checking" series broke that, but > only for changing the MTU on an existing dummy device. Commit > a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy uses, > but there was no MTU check in dummy prior to that commit. > It looks like we accept big mtu on loopback, right ? lpaa23:~# ifconfig lo mtu 10 lpaa23:~# ifconfig lo loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:10 Metric:1 RX packets:3823 errors:0 dropped:0 overruns:0 frame:0 TX packets:3823 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:759159 (759.1 KB) TX bytes:759159 (759.1 KB) Also we accept very small MTU as well (although this automatically removes IP addresses, as one would expect) lpaa23:~# ifconfig lo mtu 50 lpaa23:~# ifconfig lo loLink encap:Local Loopback UP LOOPBACK RUNNING MTU:50 Metric:1 RX packets:4052 errors:0 dropped:0 overruns:0 frame:0 TX packets:4052 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:806274 (806.2 KB) TX bytes:806274 (806.2 KB) So, why dummy devices would not accept bit MTU ? Do we have some fundamental assumption in the stack ? If yes, we need to fix loopback urgently, it is more important than dummy. Thanks.
Re: [PATCH net-next] net: mvpp2: phylink support
On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote: > Convert the PPv2 driver to use phylink, which models the MAC to PHY > link. The phylink support is made such a way the GoP link IRQ can still > be used: the two modes are incompatible and the GoP link IRQ will be > used if no PHY is described in the device tree. This is the same > behaviour as before. This makes no sense. The point of phylink is to be able to support SFP cages, and SFP cages do not have a PHY described in DT. So, when you want to use phylink because of SFP, you can't, because if you omit the PHY the driver avoids using phylink. > +static void mvpp2_phylink_validate(struct net_device *dev, > +unsigned long *supported, > +struct phylink_link_state *state) > +{ > + __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, }; > + > + phylink_set_port_modes(mask); > + > + phylink_set(mask, Autoneg); > + phylink_set(mask, Pause); > + phylink_set(mask, Asym_Pause); > + > + phylink_set(mask, 10baseT_Half); > + phylink_set(mask, 10baseT_Full); > + phylink_set(mask, 100baseT_Half); > + phylink_set(mask, 100baseT_Full); > + phylink_set(mask, 1000baseT_Half); > + phylink_set(mask, 1000baseT_Full); > + phylink_set(mask, 1000baseX_Full); > + phylink_set(mask, 1baseKR_Full); > + > + bitmap_and(supported, supported, mask, __ETHTOOL_LINK_MODE_MASK_NBITS); > + bitmap_and(state->advertising, state->advertising, mask, > +__ETHTOOL_LINK_MODE_MASK_NBITS); > +} I don't think you've understood this despite the comments in the header file. What you describe above basically means you don't support any kind of copper connection at 10G speeds, or any fiber modes at 10G speeds either. You need to set 1baseT_Full for copper, 1baseSR_Full, 1baseLR_Full, 1baseLRM_Full etc for fiber. Had you looked at my modifications for Marvell's mvpp2x driver you'd have spotted this... > + > +static int mvpp2_phylink_mac_link_state(struct net_device *dev, > + struct phylink_link_state *state) > +{ > + struct mvpp2_port *port = netdev_priv(dev); > + u32 val; > + > + if (!phy_interface_mode_is_rgmii(port->phy_interface) && > + port->phy_interface != PHY_INTERFACE_MODE_SGMII) > + return 0; You're blocking this for 1000base-X and 10G connections, which is not correct. The expectation is that this function returns the current MAC state irrespective of the interface mode. > + > + val = readl(port->base + MVPP2_GMAC_STATUS0); > + > + state->an_complete = !!(val & MVPP2_GMAC_STATUS0_AN_COMPLETE); > + state->link = !!(val & MVPP2_GMAC_STATUS0_LINK_UP); > + state->duplex = !!(val & MVPP2_GMAC_STATUS0_FULL_DUPLEX); > + > + if (val & MVPP2_GMAC_STATUS0_GMII_SPEED) > + state->speed = SPEED_1000; > + else > + state->speed = (val & MVPP2_GMAC_STATUS0_MII_SPEED) ? > +SPEED_100 : SPEED_10; > + > + state->pause = 0; > + if (val & MVPP2_GMAC_STATUS0_RX_PAUSE) > + state->pause |= MLO_PAUSE_RX; > + if (val & MVPP2_GMAC_STATUS0_TX_PAUSE) > + state->pause |= MLO_PAUSE_TX; > + > + return 1; > +} > + > +static void mvpp2_mac_an_restart(struct net_device *dev) > +{ > + struct mvpp2_port *port = netdev_priv(dev); > + u32 val; > + > + if (!phy_interface_mode_is_rgmii(port->phy_interface) && > + port->phy_interface != PHY_INTERFACE_MODE_SGMII) > + return; This prevents AN restart in 1000base-X mode, which is exactly the mode that you need to do this. SGMII doesn't care, and RGMII doesn't have inband AN. > + > + val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG); > + val |= MVPP2_GMAC_IN_BAND_RESTART_AN; > + writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG); > +} > + > +static void mvpp2_mac_config(struct net_device *dev, unsigned int mode, > + const struct phylink_link_state *state) > +{ > + struct mvpp2_port *port = netdev_priv(dev); > + u32 val; > + > + /* disable current port for reconfiguration */ > + mvpp2_interrupts_disable(port); > + netif_carrier_off(port->dev); > + mvpp2_port_disable(port); > + phy_power_off(port->comphy); > + > + /* comphy reconfiguration */ > + port->phy_interface = state->interface; > + mvpp22_comphy_init(port); > + > + /* gop/mac reconfiguration */ > + mvpp22_gop_init(port); > + mvpp2_port_mii_set(port); > + > + if (!phy_interface_mode_is_rgmii(port->phy_interface) && > + port->phy_interface != PHY_INTERFACE_MODE_SGMII) > + return; Again, 1000base-X is excluded, which will break it. You do need to avoid touching the GMAC for 10G connections however. > + > + val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG); > + val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED | > + MVPP2
[PATCH] net: stmmac: Meet alignment requirements for DMA
According to Documentation/DMA-API.txt: Warnings: Memory coherency operates at a granularity called the cache line width. In order for memory mapped by this API to operate correctly, the mapped region must begin exactly on a cache line boundary and end exactly on one (to prevent two separately mapped regions from sharing a single cache line). Since the cache line size may not be known at compile time, the API will not enforce this requirement. Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries). On some systems where DMA is non-coherent and requires software writeback / invalidate of the caches, we must ensure that dma_(un)map_single is called with a cacheline aligned buffer and a length of a whole number of cachelines. To address the alignment requirements of DMA buffers, keep a separate entry in stmmac_rx_queue for the aligned skbuff head. Use this for dma_map_single, such that the address meets the cacheline alignment requirents. Use skb_headroom() to convert between rx_skbuff_head, the aligned head of the buffer, and the packet data, rx_skbuff_dma. Tested on a Creator Ci40 with Pistachio SoC. Signed-off-by: Matt Redfearn --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 50 --- 2 files changed, 36 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index a916e13624eb..dd26a724dee7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -67,6 +67,7 @@ struct stmmac_rx_queue { struct dma_desc *dma_rx cacheline_aligned_in_smp; struct sk_buff **rx_skbuff; dma_addr_t *rx_skbuff_dma; + dma_addr_t *rx_skbuff_head; unsigned int cur_rx; unsigned int dirty_rx; u32 rx_zeroc_thresh; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 1763e48c84e2..da68eeff2a1c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1132,14 +1132,16 @@ static int stmmac_init_rx_buffers(struct stmmac_priv *priv, struct dma_desc *p, return -ENOMEM; } rx_q->rx_skbuff[i] = skb; - rx_q->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data, - priv->dma_buf_sz, - DMA_FROM_DEVICE); - if (dma_mapping_error(priv->device, rx_q->rx_skbuff_dma[i])) { + rx_q->rx_skbuff_head[i] = dma_map_single(priv->device, skb->head, +skb_headroom(skb) + +priv->dma_buf_sz, +DMA_FROM_DEVICE); + if (dma_mapping_error(priv->device, rx_q->rx_skbuff_head[i])) { netdev_err(priv->dev, "%s: DMA mapping error\n", __func__); dev_kfree_skb_any(skb); return -EINVAL; } + rx_q->rx_skbuff_dma[i] = rx_q->rx_skbuff_head[i] + skb_headroom(skb); if (priv->synopsys_id >= DWMAC_CORE_4_00) p->des0 = cpu_to_le32(rx_q->rx_skbuff_dma[i]); @@ -1164,7 +1166,8 @@ static void stmmac_free_rx_buffer(struct stmmac_priv *priv, u32 queue, int i) struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue]; if (rx_q->rx_skbuff[i]) { - dma_unmap_single(priv->device, rx_q->rx_skbuff_dma[i], + dma_unmap_single(priv->device, rx_q->rx_skbuff_head[i], +skb_headroom(rx_q->rx_skbuff[i]) + priv->dma_buf_sz, DMA_FROM_DEVICE); dev_kfree_skb_any(rx_q->rx_skbuff[i]); } @@ -1438,6 +1441,7 @@ static void free_dma_rx_desc_resources(struct stmmac_priv *priv) rx_q->dma_erx, rx_q->dma_rx_phy); kfree(rx_q->rx_skbuff_dma); + kfree(rx_q->rx_skbuff_head); kfree(rx_q->rx_skbuff); } } @@ -1500,6 +1504,12 @@ static int alloc_dma_rx_desc_resources(struct stmmac_priv *priv) if (!rx_q->rx_skbuff_dma) goto err_dma; + rx_q->rx_skbuff_head = kmalloc_array(DMA_RX_SIZE, +sizeof(dma_addr_t), +GFP_KERNEL); + if (!rx_q->rx_skbuff_head) + goto err_dma; + rx_q->rx_skbuff = kmalloc_array(DMA_RX_SIZE, sizeof(struct sk_buff *),
[PATCH iproute2 v2] man: fix documentation for range of route table ID
Signed-off-by: Thomas Haller --- Changes in v2: - "0" is not a valid table ID. man/man8/ip-route.8.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in index 803de3b9..705ceb20 100644 --- a/man/man8/ip-route.8.in +++ b/man/man8/ip-route.8.in @@ -322,7 +322,7 @@ normal routing tables. .P .B Route tables: Linux-2.x can pack routes into several routing tables identified -by a number in the range from 1 to 2^31 or by name from the file +by a number in the range from 1 to 2^32-1 or by name from the file .B @SYSCONFDIR@/rt_tables By default all normal routes are inserted into the .B main -- 2.13.5
Re: [PATCH] e1000: avoid null pointer dereference on invalid stat type
On Thu, Sep 21, 2017 at 11:01:58PM +0100, Colin King wrote: > @@ -1837,12 +1838,13 @@ static void e1000_get_ethtool_stats(struct net_device > *netdev, > p = (char *)adapter + stat->stat_offset; > break; > default: > + p = NULL; > WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n", > stat->type, i); > break; > } > > - if (stat->sizeof_stat == sizeof(u64)) > + if (p && stat->sizeof_stat == sizeof(u64)) > data[i] = *(u64 *)p; > else > data[i] = *(u32 *)p; The else side will still crash. regards, dan carpenter
Re: [PATCH] e1000: avoid null pointer dereference on invalid stat type
On 22/09/17 12:50, Dan Carpenter wrote: > On Thu, Sep 21, 2017 at 11:01:58PM +0100, Colin King wrote: >> @@ -1837,12 +1838,13 @@ static void e1000_get_ethtool_stats(struct >> net_device *netdev, >> p = (char *)adapter + stat->stat_offset; >> break; >> default: >> +p = NULL; >> WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n", >>stat->type, i); >> break; >> } >> >> -if (stat->sizeof_stat == sizeof(u64)) >> +if (p && stat->sizeof_stat == sizeof(u64)) >> data[i] = *(u64 *)p; >> else >> data[i] = *(u32 *)p; > > > The else side will still crash. > > regards, > dan carpenter > Thanks, stupid me. I'll fix that up.
Re: [net-next 1/2] dummy: add device MTU validation check
2017-09-22, 04:05:09 -0700, Eric Dumazet wrote: > On Fri, 2017-09-22 at 10:56 +0200, Sabrina Dubroca wrote: > > 2017-09-21, 08:02:18 -0700, Eric Dumazet wrote: > > > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote: > > > > Currently, any mtu value can be assigned when adding a new dummy device: > > > > [~]# ip link add name dummy1 mtu 10 type dummy > > > > [~]# ip link show dummy1 > > > > 15: dummy1: mtu 10 qdisc noop state DOWN mode > > > > DEFAULT group default qlen 1000 > > > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff > > > > > > > > This patch adds device MTU validation check. > > > > > > What is wrong with big MTU on dummy ? > > > > It looks like the "centralize MTU checking" series broke that, but > > only for changing the MTU on an existing dummy device. Commit > > a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy uses, > > but there was no MTU check in dummy prior to that commit. > > > > It looks like we accept big mtu on loopback, right ? Yes. I only meant that before commit a52ad514fdf3, there was no range check on dummy's MTU. Commit 25e3e84b183a ("dummy: expend mtu range for dummy device") and 8b1efc0f83f1 ("net: remove MTU limits on a few ether_setup callers") fixed that only partially. It's the same with ifb, btw, it didn't have any check before a52ad514fdf3, so we should set min_mtu = max_mtu = 0. -- Sabrina
[PATCH] MAINTAINERS: update git tree locations for ieee802154 subsystem
Patches for ieee802154 will go through my new trees towards netdev from now on. The 6LoWPAN subsystem will stay as is (shared between ieee802154 and bluetooth) and go through the bluetooth tree as usual. Signed-off-by: Stefan Schmidt --- MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2281af4b41b6..fc7313daf4b0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6643,8 +6643,8 @@ M:Alexander Aring M: Stefan Schmidt L: linux-w...@vger.kernel.org W: http://wpan.cakelab.org/ -T: git git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git -T: git git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next.git S: Maintained F: net/ieee802154/ F: net/mac802154/ -- 2.13.5
Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
Thu, Sep 21, 2017 at 01:21:53PM CEST, linyunsh...@huawei.com wrote: >When using tc qdisc to configure DCB parameter, dcb_ops->setup_tc >is used to tell hclge_dcb module to do the setup. >When using lldptool to configure DCB parameter, hclge_dcb module >call the client_ops->setup_tc to tell network stack which queue >and priority is using for specific tc. > >Signed-off-by: Yunsheng Lin [...] >-static int hns3_setup_tc(struct net_device *netdev, u8 tc) >+static int hns3_setup_tc(struct net_device *netdev, u8 tc, u8 *prio_tc) > { > struct hns3_nic_priv *priv = netdev_priv(netdev); > struct hnae3_handle *h = priv->ae_handle; > struct hnae3_knic_private_info *kinfo = &h->kinfo; >+ bool if_running = netif_running(netdev); > unsigned int i; > int ret; > > if (tc > HNAE3_MAX_TC) > return -EINVAL; > >- if (kinfo->num_tc == tc) >- return 0; >- > if (!netdev) > return -EINVAL; > >- if (!tc) { >+ if (if_running) { >+ (void)hns3_nic_net_stop(netdev); >+ msleep(100); >+ } >+ >+ ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ? >+ kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP; This is most odd. Why do you call dcb_ops from ndo_setup_tc callback? Why are you mixing this together? prio->tc mapping can be done directly in dcbnl
Re: tg3 pxe weirdness
On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote: > On Thu, Sep 21, 2017 at 7:53 PM, Berend De Schouwer > wrote: > > Hi, > > > > I've got a machine with a Broadcom bcm5762c, using the tg3 driver, > > that > > fails to receive network packets under some very specific > > conditions. > > > > > > Berend > > Can you please share below details? > 1) Model and Manufacturer of the system > 2) Linux distro/kernel used? 4.13.3 mainline locks up, but network dumps confirm it gets further. The second DHCP query completes, and the first NFS mount completes. I have to rely on network dumps since on PXE and SATA boot it breaks VGA output, so I can't see what it's doing, or why it stops.
RE: [net-next 1/2] dummy: add device MTU validation check
> -Original Message- > From: Sabrina Dubroca [mailto:s...@queasysnail.net] > Sent: 2017年9月22日 20:23 > To: Eric Dumazet > Cc: Jarod Wilson ; Zhang Shengju > ; da...@davemloft.net; > will...@google.com; step...@networkplumber.org; > netdev@vger.kernel.org > Subject: Re: [net-next 1/2] dummy: add device MTU validation check > > 2017-09-22, 04:05:09 -0700, Eric Dumazet wrote: > > On Fri, 2017-09-22 at 10:56 +0200, Sabrina Dubroca wrote: > > > 2017-09-21, 08:02:18 -0700, Eric Dumazet wrote: > > > > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote: > > > > > Currently, any mtu value can be assigned when adding a new dummy > device: > > > > > [~]# ip link add name dummy1 mtu 10 type dummy [~]# ip link > > > > > show dummy1 > > > > > 15: dummy1: mtu 10 qdisc noop state > DOWN mode DEFAULT group default qlen 1000 > > > > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff > > > > > > > > > > This patch adds device MTU validation check. > > > > > > > > What is wrong with big MTU on dummy ? > > > > > > It looks like the "centralize MTU checking" series broke that, but > > > only for changing the MTU on an existing dummy device. Commit > > > a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy > > > uses, but there was no MTU check in dummy prior to that commit. > > > > > > > It looks like we accept big mtu on loopback, right ? > > Yes. I only meant that before commit a52ad514fdf3, there was no range check > on dummy's MTU. Commit 25e3e84b183a ("dummy: expend mtu range for > dummy device") and 8b1efc0f83f1 ("net: remove MTU limits on a few > ether_setup callers") fixed that only partially. It's the same with ifb, btw, > it > didn't have any check before a52ad514fdf3, so we should set min_mtu = > max_mtu = 0. > > -- > Sabrina I agree, dummy and ifb device should not have any limit on mtu, just like loopback device. I will send v2 patch, and set min/max_mtu to zero for dummy and ifb device, thanks. ZSJ
Re: tg3 pxe weirdness
On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote: > On Thu, Sep 21, 2017 at 7:53 PM, Berend De Schouwer > wrote: > > Hi, > > > > I've got a machine with a Broadcom bcm5762c, using the tg3 driver, > > that > > fails to receive network packets under some very specific > > conditions. > > > > > > Berend > > Can you please share below details? > 1) Model and Manufacturer of the system > 2) Linux distro/kernel used? 3.10.107 behaves like CentOS' kernel.
Re: [patch net-next 07/12] mlxsw: spectrum: Add the multicast routing offloading logic
On Fri, Sep 22, 2017 at 11:36:59AM +0300, Yotam Gigi wrote: > On 09/21/2017 06:26 PM, Andrew Lunn wrote: > >> +static void mlxsw_sp_mr_route_stats_update(struct mlxsw_sp *mlxsw_sp, > >> + struct mlxsw_sp_mr_route *mr_route) > >> +{ > >> + struct mlxsw_sp_mr *mr = mlxsw_sp->mr; > >> + u64 packets, bytes; > >> + > >> + if (mr_route->route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP) > >> + return; > >> + > >> + mr->mr_ops->route_stats(mlxsw_sp, mr_route->route_priv, &packets, > >> + &bytes); > >> + > >> + switch (mr_route->mr_table->proto) { > >> + case MLXSW_SP_L3_PROTO_IPV4: > >> + mr_route->mfc4->mfc_un.res.pkt = packets; > >> + mr_route->mfc4->mfc_un.res.bytes = bytes; > > What about wrong_if and lastuse? Hi Yotam > wronf_if is updated by ipmr as it is trapped to the CPU. Great. > We did not address lastuse currently, though it can be easily > addressed here. Please do. I've written multicast routing daemons, where i use it to flush out MFCs which are no longer in use. Having it always 0 is going to break daemons. > > Is an mfc with iif on the host, not the switch, not offloaded? > > > I am not sure I followed. What do you mean MFC with iif on the host? you mean > MFC with iif that is an external NIC which is not part of the spectrum ASIC? Yes. We probably have different perspectives on the world. To Mellanox, everything is a switch in a box. In the DSA world, we tend to think of having a general purpose machine which also has a switch connected. Think of a wireless access point, set top box, passenger entertainment system. We have applications on the general purpose computer, we have wifi interfaces, cable modems, etc. Think about all the different packages in LEDE. We might have a multicast video stream, coming from the cable modem being sent over ports of the switch to clients. So when i look at these patches, i try to make sure the general use cases works, not just the plain boring Ethernet switch box use cases :-) > in this case, the route will not be offloaded and all traffic will > pass in slowpath. O.K. I was just thinking if those counters need to be +=, not =. But either the iif is on the host, or it is in the switch. It cannot be both. So = is O.K. Thanks Andrew
[PATCH net 1/2] l2tp: ensure sessions are freed after their PPPOL2TP socket
If l2tp_tunnel_delete() or l2tp_tunnel_closeall() deletes a session right after pppol2tp_release() orphaned its socket, then the 'sock' variable of the pppol2tp_session_close() callback is NULL. Yet the session is still used by pppol2tp_release(). Therefore we need to take an extra reference in any case, to prevent l2tp_tunnel_delete() or l2tp_tunnel_closeall() from freeing the session. Since the pppol2tp_session_close() callback is only set if the session is associated to a PPPOL2TP socket and that both l2tp_tunnel_delete() and l2tp_tunnel_closeall() hold the PPPOL2TP socket before calling pppol2tp_session_close(), we're sure that pppol2tp_session_close() and pppol2tp_session_destruct() are paired and called in the right order. So the reference taken by the former will be released by the later. Signed-off-by: Guillaume Nault --- net/l2tp/l2tp_ppp.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 50e3ee9a9d61..bc6e8bfc5be4 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -437,11 +437,11 @@ static void pppol2tp_session_close(struct l2tp_session *session) BUG_ON(session->magic != L2TP_SESSION_MAGIC); - if (sock) { + if (sock) inet_shutdown(sock, SEND_SHUTDOWN); - /* Don't let the session go away before our socket does */ - l2tp_session_inc_refcount(session); - } + + /* Don't let the session go away before our socket does */ + l2tp_session_inc_refcount(session); } /* Really kill the session socket. (Called from sock_put() if -- 2.14.1
[PATCH net 2/2] l2tp: fix race between l2tp_session_delete() and l2tp_tunnel_closeall()
There are several ways to remove L2TP sessions: * deleting a session explicitly using the netlink interface (with L2TP_CMD_SESSION_DELETE), * deleting the session's parent tunnel (either by closing the tunnel's file descriptor or using the netlink interface), * closing the PPPOL2TP file descriptor of a PPP pseudo-wire. In some cases, when these methods are used concurrently on the same session, the session can be removed twice, leading to use-after-free bugs. This patch adds a 'dead' flag, used by l2tp_session_delete() and l2tp_tunnel_closeall() to prevent them from stepping on each other's toes. The session deletion path used when closing a PPPOL2TP file descriptor doesn't need to be adapted. It already has to ensure that a session remains valid for the lifetime of its PPPOL2TP file descriptor. So it takes an extra reference on the session in the ->session_close() callback (pppol2tp_session_close()), which is eventually dropped in the ->sk_destruct() callback of the PPPOL2TP socket (pppol2tp_session_destruct()). Still, __l2tp_session_unhash() and l2tp_session_queue_purge() can be called twice and even concurrently for a given session, but thanks to proper locking and re-initialisation of list fields, this is not an issue. Signed-off-by: Guillaume Nault --- net/l2tp/l2tp_core.c | 6 ++ net/l2tp/l2tp_core.h | 1 + 2 files changed, 7 insertions(+) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index ee485df73ccd..d8c2a89a76e1 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1314,6 +1314,9 @@ void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel) hlist_del_init(&session->hlist); + if (test_and_set_bit(0, &session->dead)) + goto again; + if (session->ref != NULL) (*session->ref)(session); @@ -1750,6 +1753,9 @@ EXPORT_SYMBOL_GPL(__l2tp_session_unhash); */ int l2tp_session_delete(struct l2tp_session *session) { + if (test_and_set_bit(0, &session->dead)) + return 0; + if (session->ref) (*session->ref)(session); __l2tp_session_unhash(session); diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h index a305e0c5925a..70a12df40a5f 100644 --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -76,6 +76,7 @@ struct l2tp_session_cfg { struct l2tp_session { int magic; /* should be * L2TP_SESSION_MAGIC */ + longdead; struct l2tp_tunnel *tunnel;/* back pointer to tunnel * context */ -- 2.14.1
[PATCH net 0/2] l2tp: fix some races in session deletion
L2TP provides several interfaces for deleting sessions. Using two of them concurrently can lead to use-after-free bugs. Patch #2 uses a flag to prevent double removal of L2TP sessions. Patch #1 fixes a bug found in the way. Fixing this bug is also necessary for patch #2 to handle all cases. This issue is similar to the tunnel deletion bug being worked on by Sabrina: https://patchwork.ozlabs.org/patch/814173/ Guillaume Nault (2): l2tp: ensure sessions are freed after their PPPOL2TP socket l2tp: fix race between l2tp_session_delete() and l2tp_tunnel_closeall() net/l2tp/l2tp_core.c | 6 ++ net/l2tp/l2tp_core.h | 1 + net/l2tp/l2tp_ppp.c | 8 3 files changed, 11 insertions(+), 4 deletions(-) -- 2.14.1
[PATCH][V2] e1000: avoid null pointer dereference on invalid stat type
From: Colin Ian King Currently if the stat type is invalid then data[i] is being set either by dereferencing a null pointer p, or it is reading from an incorrect previous location if we had a valid stat type previously. Fix this by nullify pointer p if a stat type is invalid and only setting data if p is not null. Detected by CoverityScan, CID#113385 ("Explicit null dereferenced") Signed-off-by: Colin Ian King --- drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c index ec8aa4562cc9..724c93a6ea92 100644 --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c @@ -1824,7 +1824,7 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, { struct e1000_adapter *adapter = netdev_priv(netdev); int i; - char *p = NULL; + char *p; const struct e1000_stats *stat = e1000_gstrings_stats; e1000_update_stats(adapter); @@ -1837,16 +1837,18 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, p = (char *)adapter + stat->stat_offset; break; default: + p = NULL; WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n", stat->type, i); break; } - if (stat->sizeof_stat == sizeof(u64)) - data[i] = *(u64 *)p; - else - data[i] = *(u32 *)p; - + if (p) { + if (stat->sizeof_stat == sizeof(u64)) + data[i] = *(u64 *)p; + else + data[i] = *(u32 *)p; + } stat++; } /* BUG_ON(i != E1000_STATS_LEN); */ -- 2.14.1
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 22/09/17 00:11, Y Song wrote: > On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree wrote: >> On 21/09/17 20:44, Alexei Starovoitov wrote: >>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: More intuitive, but agree on the from_be/le. Maybe we should just drop the "to_" prefix altogether, and leave the rest as is since it's not surrounded by braces, it's also not a cast but rather an op. >> That works for me. >>> 'be16 r4' is ambiguous regarding upper bits. >>> >>> what about my earlier suggestion: >>> r4 = (be16) (u16) r4 >>> r4 = (le64) (u64) r4 >>> >>> It will be pretty clear what instruction is doing (that upper bits become >>> zero). >> Trouble with that is that's very *not* what C will do with those casts >> and it doesn't really capture the bidirectional/symmetry thing. The >> closest I could see with that is something like `r4 = (be16/u16) r4`, >> but that's quite an ugly mongrel. >> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the >> cleanest and clearest. Should it be >> r4 = be16 r4 >> or just >> be16 r4 >> ? Personally I incline towards the latter, but admit it doesn't really >> match the syntax of other opcodes. > I did some quick prototyping in llvm to make sure we have a syntax > llvm is happy. Apparently, llvm does not like the syntax >r4 = be16 r4orr4 = (be16) (u16) r4. > > In llvm:utils/TableGen/AsmMatcherEmitter.cpp: > > // Verify that any operand is only mentioned once. Wait, how do you deal with (totally legal) r4 += r4? Or r4 = *(r4 +0)? Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
[PATCH v4 2/3] ipv4: Namespaceify tcp_fastopen_key knob
Different namespace application might require different tcp_fastopen_key independently of the host. David Miller pointed out there is a leak without releasing the context of tcp_fastopen_key during netns teardown. So add the release action in exit_batch path. Tested: 1. Container namespace: # cat /proc/sys/net/ipv4/tcp_fastopen_key: 2817fff2-f803cf97-eadfd1f3-78c0992b cookie key in tcp syn packets: Fast Open Cookie Kind: TCP Fast Open Cookie (34) Length: 10 Fast Open Cookie: 1e5dd82a8c492ca9 2. Host: # cat /proc/sys/net/ipv4/tcp_fastopen_key: 107d7c5f-68eb2ac7-02fb06e6-ed341702 cookie key in tcp syn packets: Fast Open Cookie Kind: TCP Fast Open Cookie (34) Length: 10 Fast Open Cookie: e213c02bf0afbc8a Signed-off-by: Haishuang Yan --- include/net/netns/ipv4.h | 4 +++ include/net/tcp.h | 6 ++--- net/ipv4/af_inet.c | 2 +- net/ipv4/sysctl_net_ipv4.c | 26 +-- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_fastopen.c| 64 +++--- net/ipv4/tcp_ipv4.c| 6 + 7 files changed, 70 insertions(+), 40 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index ce6dde0..66b8335 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -36,6 +36,8 @@ struct inet_timewait_death_row { int sysctl_max_tw_buckets; }; +struct tcp_fastopen_context; + struct netns_ipv4 { #ifdef CONFIG_SYSCTL struct ctl_table_header *forw_hdr; @@ -128,6 +130,8 @@ struct netns_ipv4 { struct inet_timewait_death_row tcp_death_row; int sysctl_max_syn_backlog; int sysctl_tcp_fastopen; + struct tcp_fastopen_context __rcu *tcp_fastopen_ctx; + spinlock_t tcp_fastopen_ctx_lock; #ifdef CONFIG_NET_L3_MASTER_DEV int sysctl_udp_l3mdev_accept; diff --git a/include/net/tcp.h b/include/net/tcp.h index f628967..e27bd18 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1556,13 +1556,13 @@ struct tcp_fastopen_request { }; void tcp_free_fastopen_req(struct tcp_sock *tp); -extern struct tcp_fastopen_context __rcu *tcp_fastopen_ctx; -int tcp_fastopen_reset_cipher(void *key, unsigned int len); +void tcp_fastopen_ctx_destroy(struct net *net); +int tcp_fastopen_reset_cipher(struct net *net, void *key, unsigned int len); void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb); struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct tcp_fastopen_cookie *foc); -void tcp_fastopen_init_key_once(bool publish); +void tcp_fastopen_init_key_once(struct net *net); bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss, struct tcp_fastopen_cookie *cookie); bool tcp_fastopen_defer_connect(struct sock *sk, int *err); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index ddd126d..43a1bbe 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -222,7 +222,7 @@ int inet_listen(struct socket *sock, int backlog) (tcp_fastopen & TFO_SERVER_ENABLE) && !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) { fastopen_queue_tune(sk, backlog); - tcp_fastopen_init_key_once(true); + tcp_fastopen_init_key_once(sock_net(sk)); } err = inet_csk_listen_start(sk, backlog); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index e31e853c..20e19fe 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -251,10 +251,12 @@ static int proc_allowed_congestion_control(struct ctl_table *ctl, return ret; } -static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, +static int proc_tcp_fastopen_key(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { + struct net *net = container_of(table->data, struct net, + ipv4.sysctl_tcp_fastopen); struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) }; struct tcp_fastopen_context *ctxt; int ret; @@ -265,7 +267,7 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, return -ENOMEM; rcu_read_lock(); - ctxt = rcu_dereference(tcp_fastopen_ctx); + ctxt = rcu_dereference(net->ipv4.tcp_fastopen_ctx); if (ctxt) memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); else @@ -282,12 +284,7 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, ret = -EINVAL; goto bad_key; } - /* Generate a dummy secret but don't publish it. This -* is needed so we don't regenerate a new key on the -* first invocation of t
[PATCH v4 3/3] ipv4: Namespaceify tcp_fastopen_blackhole_timeout knob
Different namespace application might require different time period in second to disable Fastopen on active TCP sockets. Tested: Simulate following similar situation that the server's data gets dropped after 3WHS. C syn-data ---> S C <--- syn/ack - S C ack > S S (accept & write) C? X <- data -- S [retry and timeout] And then print netstat of TCPFastOpenBlackhole, the counter increased as expected when the firewall blackhole issue is detected and active TFO is disabled. # cat /proc/net/netstat | awk '{print $91}' TCPFastOpenBlackhole 1 Signed-off-by: Haishuang Yan --- include/net/netns/ipv4.h | 3 +++ net/ipv4/sysctl_net_ipv4.c | 20 +++- net/ipv4/tcp_fastopen.c| 30 +++--- net/ipv4/tcp_ipv4.c| 2 ++ 4 files changed, 27 insertions(+), 28 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 66b8335..d76edde 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -132,6 +132,9 @@ struct netns_ipv4 { int sysctl_tcp_fastopen; struct tcp_fastopen_context __rcu *tcp_fastopen_ctx; spinlock_t tcp_fastopen_ctx_lock; + unsigned int sysctl_tcp_fastopen_blackhole_timeout; + atomic_t tfo_active_disable_times; + unsigned long tfo_active_disable_stamp; #ifdef CONFIG_NET_L3_MASTER_DEV int sysctl_udp_l3mdev_accept; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 20e19fe..cac8dd3 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -355,11 +355,13 @@ static int proc_tfo_blackhole_detect_timeout(struct ctl_table *table, void __user *buffer, size_t *lenp, loff_t *ppos) { + struct net *net = container_of(table->data, struct net, + ipv4.sysctl_tcp_fastopen_blackhole_timeout); int ret; ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); if (write && ret == 0) - tcp_fastopen_active_timeout_reset(); + atomic_set(&net->ipv4.tfo_active_disable_times, 0); return ret; } @@ -398,14 +400,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl, .proc_handler = proc_dointvec }, { - .procname = "tcp_fastopen_blackhole_timeout_sec", - .data = &sysctl_tcp_fastopen_blackhole_timeout, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_tfo_blackhole_detect_timeout, - .extra1 = &zero, - }, - { .procname = "tcp_abort_on_overflow", .data = &sysctl_tcp_abort_on_overflow, .maxlen = sizeof(int), @@ -1083,6 +1077,14 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl, .maxlen = ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10), .proc_handler = proc_tcp_fastopen_key, }, + { + .procname = "tcp_fastopen_blackhole_timeout_sec", + .data = &init_net.ipv4.sysctl_tcp_fastopen_blackhole_timeout, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_tfo_blackhole_detect_timeout, + .extra1 = &zero, + }, #ifdef CONFIG_IP_ROUTE_MULTIPATH { .procname = "fib_multipath_use_neigh", diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 4eae44a..de470e7 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -422,25 +422,16 @@ bool tcp_fastopen_defer_connect(struct sock *sk, int *err) * TFO connection with data exchanges. */ -/* Default to 1hr */ -unsigned int sysctl_tcp_fastopen_blackhole_timeout __read_mostly = 60 * 60; -static atomic_t tfo_active_disable_times __read_mostly = ATOMIC_INIT(0); -static unsigned long tfo_active_disable_stamp __read_mostly; - /* Disable active TFO and record current jiffies and * tfo_active_disable_times */ void tcp_fastopen_active_disable(struct sock *sk) { - atomic_inc(&tfo_active_disable_times); - tfo_active_disable_stamp = jiffies; - NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENBLACKHOLE); -} + struct net *net = sock_net(sk); -/* Reset tfo_active_disable_times to 0 */ -void tcp_fastopen_active_timeout_reset(void) -{ - atomic_set(&tfo_active_disable_times, 0); + atomic_inc(&net->ipv4.tfo_active_disable_times); + net->ipv4.tfo_active_disable_stamp = jiffies; + NET_INC_STATS(net, LINUX_MIB_TCPFASTOPENBLACKHOLE); } /* Calculate timeout for tfo active disable @@ -449,17 +440,18 @@ void tcp_fastopen_active_timeout_reset(void) */ bool tcp_fastopen_active_should_disable(struct sock *sk) { - int tfo_da_times = atomic_read(&tfo_active_
[PATCH v4 1/3] ipv4: Namespaceify tcp_fastopen knob
Different namespace application might require enable TCP Fast Open feature independently of the host. This patch series continues making more of the TCP Fast Open related sysctl knobs be per net-namespace. Reported-by: Luca BRUNO Signed-off-by: Haishuang Yan --- Change since v4: * Fix potential leak issue * Fix typo error * Split the patches to be per-sysctl * Remove unrelated change by mistake --- include/net/netns/ipv4.h | 1 + include/net/tcp.h | 1 - net/ipv4/af_inet.c | 7 --- net/ipv4/sysctl_net_ipv4.c | 14 +++--- net/ipv4/tcp.c | 4 ++-- net/ipv4/tcp_fastopen.c| 11 +-- net/ipv4/tcp_ipv4.c| 2 ++ 7 files changed, 21 insertions(+), 19 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 20d061c..ce6dde0 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -127,6 +127,7 @@ struct netns_ipv4 { int sysctl_tcp_timestamps; struct inet_timewait_death_row tcp_death_row; int sysctl_max_syn_backlog; + int sysctl_tcp_fastopen; #ifdef CONFIG_NET_L3_MASTER_DEV int sysctl_udp_l3mdev_accept; diff --git a/include/net/tcp.h b/include/net/tcp.h index b510f28..f628967 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -240,7 +240,6 @@ /* sysctl variables for tcp */ -extern int sysctl_tcp_fastopen; extern int sysctl_tcp_retrans_collapse; extern int sysctl_tcp_stdurg; extern int sysctl_tcp_rfc1337; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index e31108e..ddd126d 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -195,7 +195,7 @@ int inet_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk; unsigned char old_state; - int err; + int err, tcp_fastopen; lock_sock(sk); @@ -217,8 +217,9 @@ int inet_listen(struct socket *sock, int backlog) * because the socket was in TCP_LISTEN state previously but * was shutdown() rather than close(). */ - if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) && - (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) && + tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen; + if ((tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) && + (tcp_fastopen & TFO_SERVER_ENABLE) && !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) { fastopen_queue_tune(sk, backlog); tcp_fastopen_init_key_once(true); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 0d3c038..e31e853c 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -401,13 +401,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl, .proc_handler = proc_dointvec }, { - .procname = "tcp_fastopen", - .data = &sysctl_tcp_fastopen, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { .procname = "tcp_fastopen_key", .mode = 0600, .maxlen = ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10), @@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl, .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "tcp_fastopen", + .data = &init_net.ipv4.sysctl_tcp_fastopen, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, #ifdef CONFIG_IP_ROUTE_MULTIPATH { .procname = "fib_multipath_use_neigh", diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5091402..dac56c4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1126,7 +1126,7 @@ static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, struct sockaddr *uaddr = msg->msg_name; int err, flags; - if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) || + if (!(sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) || (uaddr && msg->msg_namelen >= sizeof(uaddr->sa_family) && uaddr->sa_family == AF_UNSPEC)) return -EOPNOTSUPP; @@ -2759,7 +2759,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, case TCP_FASTOPEN_CONNECT: if (val > 1 || val < 0) { err = -EINVAL; - } else if (sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) { + } else if (net->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) { if (sk->sk_state == TCP_CLOSE) tp->fastopen_connect = val; else diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_
[PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size
From: Colin Ian King Don't populate const array ucode_ofdm_rates on the stack, instead make it static. Makes the object code smaller by 100 bytes: Before: textdata bss dec hex filename 39482 564 0 400469c6e phy_cmn.o After textdata bss dec hex filename 39326 620 0 399469c0a phy_cmn.o (gcc 6.3.0, x86-64) Signed-off-by: Colin Ian King --- drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c index 1c4e9dd57960..3a13d176b221 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c @@ -1916,7 +1916,7 @@ void wlc_phy_txpower_update_shm(struct brcms_phy *pi) pi->hwpwr_txcur); for (j = TXP_FIRST_OFDM; j <= TXP_LAST_OFDM; j++) { - const u8 ucode_ofdm_rates[] = { + static const u8 ucode_ofdm_rates[] = { 0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c }; offset = wlapi_bmac_rate_shm_offset( -- 2.14.1
Re: [pktgen script v2 0/2] Add a pktgen sample script of NUMA awareness
On Mon, 18 Sep 2017 11:06:21 +0200 Jesper Dangaard Brouer wrote: > On Sun, 17 Sep 2017 20:36:36 +0800 Robert Hoo > wrote: > > > Change log > > v2: > > Rebased to > > https://github.com/netoptimizer/network-testing/tree/master/pktgen > > Hi Robert, > > Thank you for submitting this against my git tree[1]. I skimmed the > patches and they looked okay. I'll give them a test run, before I > accept them into my tree. > > Later I'll synchronize my pktgen scripts/git-tree with the kernel via > regular patches against DaveM's net-next tree[2] (and I'll try to > remember to give you author credit). > > [1] https://github.com/netoptimizer/network-testing/tree/master/pktgen > [2] > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/samples/pktgen FYI, I've applied and pushed these patches to my tree. https://github.com/netoptimizer/network-testing/commits?author=robert-hoo https://github.com/netoptimizer/network-testing/commit/1b9b4b797a4f112 https://github.com/netoptimizer/network-testing/commit/65efc2352f63dde https://github.com/netoptimizer/network-testing/commit/54eb5178aaf4031 I fixed the description a bit, and only made one simple change: https://github.com/netoptimizer/network-testing/commit/9ff58568b3f8c91 Thanks for working on improving the pktgen scripts :-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file > *file, struct ifreq *ifr) > if (tfile->detached) > return -EINVAL; > > + if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN)) > + return -EPERM; > + This should perhaps be moved into the !dev branch, directly below the ns_capable check. > dev = __dev_get_by_name(net, ifr->ifr_name); > if (dev) { > if (ifr->ifr_flags & IFF_TUN_EXCL) > @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file > *file, struct ifreq *ifr) > tun->flags = (tun->flags & ~TUN_FEATURES) | > (ifr->ifr_flags & TUN_FEATURES); > > + if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != > IFF_TAP) > + tun->flags = tun->flags & ~IFF_NAPI_FRAGS; > + Similarly, this check only need to be performed in that branch. Instead of reverting to non-frags mode, a tun_set_iff with the wrong set of flags should probably fail hard.
Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
On Thu, Sep 21, 2017 at 10:17 PM, Petar Penkov wrote: > Changes TUN driver to use napi_gro_receive() upon receiving packets > rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these > changes and operation is not affected if the flag is disabled. SKBs > are constructed upon packet arrival and are queued to be processed > later. > > The new path was evaluated with a benchmark with the following setup: > Open two tap devices and a receiver thread that reads in a loop for > each device. Start one sender thread and pin all threads to different > CPUs. Send 1M minimum UDP packets to each device and measure sending > time for each of the sending methods: > napi_gro_receive(): 4.90s > netif_rx_ni(): 4.90s > netif_receive_skb():7.20s > > Signed-off-by: Petar Penkov > Cc: Eric Dumazet > Cc: Mahesh Bandewar > Cc: Willem de Bruijn > Cc: da...@davemloft.net > Cc: ppen...@stanford.edu Acked-by: Willem de Bruijn Thanks, Petar.
[PATCH] i40e: make const array patterns static, reduces object code size
From: Colin Ian King Don't populate const array patterns on the stack, instead make it static. Makes the object code smaller by over 60 bytes: Before: textdata bss dec hex filename 1953 496 02449 991 i40e_diag.o After: textdata bss dec hex filename 1798 584 02382 94e i40e_diag.o (gcc 6.3.0, x86-64) Signed-off-by: Colin Ian King --- drivers/net/ethernet/intel/i40e/i40e_diag.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.c b/drivers/net/ethernet/intel/i40e/i40e_diag.c index f141e78d409e..76ed56641864 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_diag.c +++ b/drivers/net/ethernet/intel/i40e/i40e_diag.c @@ -36,7 +36,9 @@ static i40e_status i40e_diag_reg_pattern_test(struct i40e_hw *hw, u32 reg, u32 mask) { - const u32 patterns[] = {0x5A5A5A5A, 0xA5A5A5A5, 0x, 0x}; + static const u32 patterns[] = { + 0x5A5A5A5A, 0xA5A5A5A5, 0x, 0x + }; u32 pat, val, orig_val; int i; -- 2.14.1
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree wrote: > On 22/09/17 00:11, Y Song wrote: >> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree wrote: >>> On 21/09/17 20:44, Alexei Starovoitov wrote: On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: > More intuitive, but agree on the from_be/le. Maybe we should > just drop the "to_" prefix altogether, and leave the rest as is since > it's not surrounded by braces, it's also not a cast but rather an op. >>> That works for me. 'be16 r4' is ambiguous regarding upper bits. what about my earlier suggestion: r4 = (be16) (u16) r4 r4 = (le64) (u64) r4 It will be pretty clear what instruction is doing (that upper bits become zero). >>> Trouble with that is that's very *not* what C will do with those casts >>> and it doesn't really capture the bidirectional/symmetry thing. The >>> closest I could see with that is something like `r4 = (be16/u16) r4`, >>> but that's quite an ugly mongrel. >>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the >>> cleanest and clearest. Should it be >>> r4 = be16 r4 >>> or just >>> be16 r4 >>> ? Personally I incline towards the latter, but admit it doesn't really >>> match the syntax of other opcodes. >> I did some quick prototyping in llvm to make sure we have a syntax >> llvm is happy. Apparently, llvm does not like the syntax >>r4 = be16 r4orr4 = (be16) (u16) r4. >> >> In llvm:utils/TableGen/AsmMatcherEmitter.cpp: >> >> // Verify that any operand is only mentioned once. > Wait, how do you deal with (totally legal) r4 += r4? > Or r4 = *(r4 +0)? > Even jumps can have src_reg == dst_reg, though it doesn't seem useful. We are talking about dag node here. The above "r4", although using the same register, will be different dag nodes. So it will be okay. The "r4 = be16 r4" tries to use the *same* dag node as both source and destination in the asm output which is prohibited.
[PATCH 1/1] forcedeth: optimize the xmit/rx with unlikely
In the xmit/rx fastpath, the function dma_map_single rarely fails. Therefore, add an unlikely() optimization to this error check conditional. Signed-off-by: Zhu Yanjun --- drivers/net/ethernet/nvidia/forcedeth.c | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c index 0a7ba3a..63a9e1e 100644 --- a/drivers/net/ethernet/nvidia/forcedeth.c +++ b/drivers/net/ethernet/nvidia/forcedeth.c @@ -1816,8 +1816,8 @@ static int nv_alloc_rx(struct net_device *dev) skb->data, skb_tailroom(skb), DMA_FROM_DEVICE); - if (dma_mapping_error(&np->pci_dev->dev, - np->put_rx_ctx->dma)) { + if (unlikely(dma_mapping_error(&np->pci_dev->dev, + np->put_rx_ctx->dma))) { kfree_skb(skb); goto packet_dropped; } @@ -1857,8 +1857,8 @@ static int nv_alloc_rx_optimized(struct net_device *dev) skb->data, skb_tailroom(skb), DMA_FROM_DEVICE); - if (dma_mapping_error(&np->pci_dev->dev, - np->put_rx_ctx->dma)) { + if (unlikely(dma_mapping_error(&np->pci_dev->dev, + np->put_rx_ctx->dma))) { kfree_skb(skb); goto packet_dropped; } @@ -2224,8 +2224,8 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev) np->put_tx_ctx->dma = dma_map_single(&np->pci_dev->dev, skb->data + offset, bcnt, DMA_TO_DEVICE); - if (dma_mapping_error(&np->pci_dev->dev, - np->put_tx_ctx->dma)) { + if (unlikely(dma_mapping_error(&np->pci_dev->dev, + np->put_tx_ctx->dma))) { /* on DMA mapping error - drop the packet */ dev_kfree_skb_any(skb); u64_stats_update_begin(&np->swstats_tx_syncp); @@ -2265,7 +2265,8 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev) frag, offset, bcnt, DMA_TO_DEVICE); - if (dma_mapping_error(&np->pci_dev->dev, np->put_tx_ctx->dma)) { + if (unlikely(dma_mapping_error(&np->pci_dev->dev, + np->put_tx_ctx->dma))) { /* Unwind the mapped fragments */ do { @@ -2373,8 +2374,8 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb, np->put_tx_ctx->dma = dma_map_single(&np->pci_dev->dev, skb->data + offset, bcnt, DMA_TO_DEVICE); - if (dma_mapping_error(&np->pci_dev->dev, - np->put_tx_ctx->dma)) { + if (unlikely(dma_mapping_error(&np->pci_dev->dev, + np->put_tx_ctx->dma))) { /* on DMA mapping error - drop the packet */ dev_kfree_skb_any(skb); u64_stats_update_begin(&np->swstats_tx_syncp); @@ -2415,7 +2416,8 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb, bcnt, DMA_TO_DEVICE); - if (dma_mapping_error(&np->pci_dev->dev, np->put_tx_ctx->dma)) { + if (unlikely(dma_mapping_error(&np->pci_dev->dev, + np->put_tx_ctx->dma))) { /* Unwind the mapped fragments */ do { @@ -5070,8 +5072,8 @@ static int nv_loopback_test(struct net_device *dev) test_dma_addr = dma_map_single(&np->pci_dev->dev, tx_skb->data, skb_tailroom(tx_skb), DMA_FROM_DEVICE); - if (dma_mapping_error(&n
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 7:11 AM, Y Song wrote: > On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree wrote: >> On 22/09/17 00:11, Y Song wrote: >>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree wrote: On 21/09/17 20:44, Alexei Starovoitov wrote: > On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: >> More intuitive, but agree on the from_be/le. Maybe we should >> just drop the "to_" prefix altogether, and leave the rest as is since >> it's not surrounded by braces, it's also not a cast but rather an op. That works for me. > 'be16 r4' is ambiguous regarding upper bits. > > what about my earlier suggestion: > r4 = (be16) (u16) r4 > r4 = (le64) (u64) r4 > > It will be pretty clear what instruction is doing (that upper bits become > zero). Trouble with that is that's very *not* what C will do with those casts and it doesn't really capture the bidirectional/symmetry thing. The closest I could see with that is something like `r4 = (be16/u16) r4`, but that's quite an ugly mongrel. I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the cleanest and clearest. Should it be r4 = be16 r4 or just be16 r4 ? Personally I incline towards the latter, but admit it doesn't really match the syntax of other opcodes. >>> I did some quick prototyping in llvm to make sure we have a syntax >>> llvm is happy. Apparently, llvm does not like the syntax >>>r4 = be16 r4orr4 = (be16) (u16) r4. >>> >>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp: >>> >>> // Verify that any operand is only mentioned once. >> Wait, how do you deal with (totally legal) r4 += r4? >> Or r4 = *(r4 +0)? >> Even jumps can have src_reg == dst_reg, though it doesn't seem useful. > > We are talking about dag node here. The above "r4", although using the same > register, will be different dag nodes. So it will be okay. > > The "r4 = be16 r4" tries to use the *same* dag node as both source and > destination > in the asm output which is prohibited. With second thought, we may allow "r4 = be16 r4" by using different dag nodes. (I need to do experiment for this.) But we do have constraints that the two "r4" must be the same register. "r5 = be16 r4" is not allowed. So from that perspective, referencing "r4" only once is a good idea and less confusing.
Re: [Intel-wired-lan] [PATCH][V2] e1000: avoid null pointer dereference on invalid stat type
On Fri, Sep 22, 2017 at 6:41 AM, Colin King wrote: > From: Colin Ian King > > Currently if the stat type is invalid then data[i] is being set > either by dereferencing a null pointer p, or it is reading from > an incorrect previous location if we had a valid stat type > previously. Fix this by nullify pointer p if a stat type is > invalid and only setting data if p is not null. > > Detected by CoverityScan, CID#113385 ("Explicit null dereferenced") > > Signed-off-by: Colin Ian King > --- > drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > index ec8aa4562cc9..724c93a6ea92 100644 > --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > @@ -1824,7 +1824,7 @@ static void e1000_get_ethtool_stats(struct net_device > *netdev, > { > struct e1000_adapter *adapter = netdev_priv(netdev); > int i; > - char *p = NULL; > + char *p; > const struct e1000_stats *stat = e1000_gstrings_stats; > > e1000_update_stats(adapter); I was honestly happier with the portion of the first patch that moved this into the loop. > @@ -1837,16 +1837,18 @@ static void e1000_get_ethtool_stats(struct net_device > *netdev, > p = (char *)adapter + stat->stat_offset; > break; > default: > + p = NULL; > WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n", > stat->type, i); > break; > } > > - if (stat->sizeof_stat == sizeof(u64)) > - data[i] = *(u64 *)p; > - else > - data[i] = *(u32 *)p; > - > + if (p) { > + if (stat->sizeof_stat == sizeof(u64)) > + data[i] = *(u64 *)p; > + else > + data[i] = *(u32 *)p; > + } > stat++; > } Instead of doing all this why not just call out a "continue;" instead of a "break" if the type isn't recognized, and move the stat++ into the loop declaration and process it at the same time as i++? That would clean most of this up and we don't have to worry about any loop carried variables and the like. > /* BUG_ON(i != E1000_STATS_LEN); */ > -- > 2.14.1 > > ___ > Intel-wired-lan mailing list > intel-wired-...@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 07:27:29AM -0700, Y Song wrote: > On Fri, Sep 22, 2017 at 7:11 AM, Y Song wrote: > > On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree wrote: > >> On 22/09/17 00:11, Y Song wrote: > >>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree > >>> wrote: > On 21/09/17 20:44, Alexei Starovoitov wrote: > > On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: > >> More intuitive, but agree on the from_be/le. Maybe we should > >> just drop the "to_" prefix altogether, and leave the rest as is since > >> it's not surrounded by braces, it's also not a cast but rather an op. > That works for me. > > 'be16 r4' is ambiguous regarding upper bits. > > > > what about my earlier suggestion: > > r4 = (be16) (u16) r4 > > r4 = (le64) (u64) r4 > > > > It will be pretty clear what instruction is doing (that upper bits > > become zero). > Trouble with that is that's very *not* what C will do with those casts > and it doesn't really capture the bidirectional/symmetry thing. The > closest I could see with that is something like `r4 = (be16/u16) r4`, > but that's quite an ugly mongrel. > I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the > cleanest and clearest. Should it be > r4 = be16 r4 > or just > be16 r4 > ? Personally I incline towards the latter, but admit it doesn't really > match the syntax of other opcodes. > >>> I did some quick prototyping in llvm to make sure we have a syntax > >>> llvm is happy. Apparently, llvm does not like the syntax > >>>r4 = be16 r4orr4 = (be16) (u16) r4. > >>> > >>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp: > >>> > >>> // Verify that any operand is only mentioned once. > >> Wait, how do you deal with (totally legal) r4 += r4? > >> Or r4 = *(r4 +0)? > >> Even jumps can have src_reg == dst_reg, though it doesn't seem useful. > > > > We are talking about dag node here. The above "r4", although using the same > > register, will be different dag nodes. So it will be okay. > > > > The "r4 = be16 r4" tries to use the *same* dag node as both source and > > destination > > in the asm output which is prohibited. > > With second thought, we may allow "r4 = be16 r4" by using different dag nodes. > (I need to do experiment for this.) But we do have constraints that > the two "r4" must > be the same register. "r5 = be16 r4" is not allowed. So from that > perspective, referencing > "r4" only once is a good idea and less confusing. looks like we're converging on "be16/be32/be64/le16/le32/le64 #register" for BPF_END. I guess it can live with that. I would prefer more C like syntax to match the rest, but llvm parsing point is a strong one. For BPG_NEG I prefer to do it in C syntax like interpreter does: ALU_NEG: DST = (u32) -DST; ALU64_NEG: DST = -DST; Yonghong, does it mean that asmparser will equally suffer?
Re: tg3 pxe weirdness
On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote: > > > Can you please share below details? > 1) Model and Manufacturer of the system > 2) Linux distro/kernel used? 4.13.3 gets a little further, but after some more data is transferred the tg3 driver still crashes. This is unfortunately before I've got a writeable filesystem. The last line is: tg3 :01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2 I've got some ideas to get the full dmesg. As with the other kernels it works OK on 1Gbps, but not slower switches.
[PATCH] b43: make const arrays static, reduces object code size
From: Colin Ian King Don't populate const arrays on the stack, instead make them static. Makes the object code smaller by over 60 bytes: Before: textdata bss dec hex filename 148161296 0 161123ef0 b43/phy_ht.o After: textdata bss dec hex filename 145511496 0 160473eaf b43/phy_ht.o (gcc 6.3.0, x86-64) Signed-off-by: Colin Ian King --- drivers/net/wireless/broadcom/b43/phy_ht.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/broadcom/b43/phy_ht.c b/drivers/net/wireless/broadcom/b43/phy_ht.c index 718c90e81696..c3158d085c2b 100644 --- a/drivers/net/wireless/broadcom/b43/phy_ht.c +++ b/drivers/net/wireless/broadcom/b43/phy_ht.c @@ -119,7 +119,7 @@ static void b43_radio_2059_rcal(struct b43_wldev *dev) /* Calibrate the internal RC oscillator? */ static void b43_radio_2057_rccal(struct b43_wldev *dev) { - const u16 radio_values[3][2] = { + static const u16 radio_values[3][2] = { { 0x61, 0xE9 }, { 0x69, 0xD5 }, { 0x73, 0x99 }, }; int i; @@ -154,7 +154,7 @@ static void b43_radio_2059_init_pre(struct b43_wldev *dev) static void b43_radio_2059_init(struct b43_wldev *dev) { - const u16 routing[] = { R2059_C1, R2059_C2, R2059_C3 }; + static const u16 routing[] = { R2059_C1, R2059_C2, R2059_C3 }; int i; /* Prepare (reset?) radio */ @@ -263,7 +263,7 @@ static void b43_phy_ht_reset_cca(struct b43_wldev *dev) static void b43_phy_ht_zero_extg(struct b43_wldev *dev) { u8 i, j; - u16 base[] = { 0x40, 0x60, 0x80 }; + static const u16 base[] = { 0x40, 0x60, 0x80 }; for (i = 0; i < ARRAY_SIZE(base); i++) { for (j = 0; j < 4; j++) -- 2.14.1
[PATCH] hv_netvsc: make const array ver_list static, reduces object code size
From: Colin Ian King Don't populate const array ver_list on the stack, instead make it static. Makes the object code smaller by over 400 bytes: Before: textdata bss dec hex filename 184443168 320 2193255ac drivers/net/hyperv/netvsc.o After: textdata bss dec hex filename 179503224 320 2149453f6 drivers/net/hyperv/netvsc.o (gcc 6.3.0, x86-64) Signed-off-by: Colin Ian King --- drivers/net/hyperv/netvsc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 8d5077fb0492..b0d323e24978 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -484,7 +484,7 @@ static int netvsc_connect_vsp(struct hv_device *device, struct netvsc_device *net_device, const struct netvsc_device_info *device_info) { - const u32 ver_list[] = { + static const u32 ver_list[] = { NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2, NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5 }; -- 2.14.1
[net-next] net: remove MTU limits for dummy and ifb device
These two drivers (dummy and ifb) call ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. These two devices should not have limits on MTU. This patch set their min_mtu/max_mtu to 0. So that dev_set_mtu() will not check the mtu range, and can be set with any value. CC: Eric Dumazet CC: Sabrina Dubroca Signed-off-by: Zhang Shengju --- drivers/net/dummy.c | 2 +- drivers/net/ifb.c | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c index e31ab3b..58483af 100644 --- a/drivers/net/dummy.c +++ b/drivers/net/dummy.c @@ -353,7 +353,7 @@ static void dummy_setup(struct net_device *dev) eth_hw_addr_random(dev); dev->min_mtu = 0; - dev->max_mtu = ETH_MAX_MTU; + dev->max_mtu = 0; } static int dummy_validate(struct nlattr *tb[], struct nlattr *data[], diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 8870bd2..0008da7 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -231,6 +231,9 @@ static void ifb_setup(struct net_device *dev) eth_hw_addr_random(dev); dev->needs_free_netdev = true; dev->priv_destructor = ifb_dev_free; + + dev->min_mtu = 0; + dev->max_mtu = 0; } static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev) -- 1.8.3.1
RE: [PATCH] hv_netvsc: make const array ver_list static, reduces object code size
> -Original Message- > From: Colin King [mailto:colin.k...@canonical.com] > Sent: Friday, September 22, 2017 8:50 AM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen Hemminger ; > de...@linuxdriverproject.org; netdev@vger.kernel.org > Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: [PATCH] hv_netvsc: make const array ver_list static, reduces > object code size > > From: Colin Ian King > > Don't populate const array ver_list on the stack, instead make it > static. Makes the object code smaller by over 400 bytes: > > Before: >text data bss dec hex filename > 18444 3168 320 2193255ac > drivers/net/hyperv/netvsc.o > > After: >text data bss dec hex filename > 17950 3224 320 2149453f6 > drivers/net/hyperv/netvsc.o > > (gcc 6.3.0, x86-64) > > Signed-off-by: Colin Ian King > --- Reviewed-by: Haiyang Zhang
Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsh...@huawei.com wrote: >Hi, Jiri > >>>- if (!tc) { >>>+ if (if_running) { >>>+ (void)hns3_nic_net_stop(netdev); >>>+ msleep(100); >>>+ } >>>+ >>>+ ret = (kinfo->dcb_ops && kinfo->dcb_ops->>setup_tc) ? >>>+ kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : ->EOPNOTSUPP; > >>This is most odd. Why do you call dcb_ops from >ndo_setup_tc callback? >>Why are you mixing this together? prio->tc mapping >can be done >>directly in dcbnl > >Here is what we do in dcb_ops->setup_tc: >Firstly, if current tc num is different from the tc num >that user provide, then we setup the queues for each >tc. > >Secondly, we tell hardware the pri to tc mapping that >the stack is using. In rx direction, our hardware need >that mapping to put different packet into different tc' >queues according to the priority of the packet, then >rss decides which specific queue in the tc should the >packet goto. > >By mixing, I suppose you meant why we need the >pri to tc infomation? by mixing, I mean what I wrote. You are calling dcb_ops callback from ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for all? >I hope I did not misunderstand your question, thanks >for your time reviewing.
URGENT RESPOND NEEDED FROM YOU BY DANIEL MMINELE THE DEPUTY GOVERNOR OF SOUTH AFRICAN RESERVE BANK
Hello Dear, My Name is Dr. Daniel Mminele the deputy governor of south African reserve bank, you can also check on our Bank website here for your assurance , http://www.whoswhosa.co.za/south-african-reserve-bank-20991 I am a citizen of South African. I am a computer analyst long time working with reserve Bank of South African. (R.B.) My Dear, I came across your payment file which was Marked X and your Fund Release Hard Disk Painted Red, I took my time to study your payment file and found out that you have really attempted to receive the funds without success till now. But I am using another person email address to send you this because I don’t want my bank to notify me alright The most annoying thing is that the directors and top officials of this nation could not tell you the truth that on no account will they ever release the fund to you; instead they will let you spend money unnecessarily. I swear with my life, I can release this funds to you if you can certify me of my security and how I can run away from this country to meet with you. This is like a Mafia Setting here, if I do this for you, I do not intend to work here all the days of my life and I cannot run back to my country as this Bank has my home address , and if I don't run away after I making the transfer to your bank account, I may be in trouble or my life will be in danger. The only thing I will need to release this fund is a Special Hard Disk, we call it HD120 GIG. I will buy two of it, recopy your information, destroy the previous one, and punch the computer to make the fund reflect in your bank account within 24 banking hours. I will clean up the tracer and destroy your file, after which I will run away from here to meet with you in your country if you are interested. But you must assure me the absolute confidentiality of this deal before I can do anything further. Do get in touch with me immediately, You should send to me your convenient Tell phone number /fax numbers for easy communications and also re confirm your banking details, so that there won't be any mistake. For phone conversation, please call me on +27011745039 or +27843464227, And you must contact me though this my Email address which is, danielmmine...@yahoo.com SPECIAL INFORMATION: YOU WILL SEND THE FEE FOR THE HARD DISK FIRST BEFORE I MAKE YOUR TRANSFER OR YOU BUY THE HARD DISK IN YOUR COUNTRY AND SEND IT TO ME,DON'T CONTACT ME IF YOU CAN NOT SEND THE HARD DISK FEE FIRST OR THE HARD DISK. AS SOON AS I RECEIVED YOUR EMAIL I WILL LET YOU KNOW HOW MUCH THE DISK WILL COST YOU. Regards, Dr. Daniel Mminele the deputy governor of south African reserve bank
Re: [PATCH net v2] l2tp: fix race condition in l2tp_tunnel_delete
2017-09-19, 18:43:37 +0200, Guillaume Nault wrote: > On Tue, Sep 19, 2017 at 03:40:40PM +0200, Sabrina Dubroca wrote: > > If we try to delete the same tunnel twice, the first delete operation > > does a lookup (l2tp_tunnel_get), finds the tunnel, calls > > l2tp_tunnel_delete, which queues it for deletion by > > l2tp_tunnel_del_work. > > > > The second delete operation also finds the tunnel and calls > > l2tp_tunnel_delete. If the workqueue has already fired and started > > running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the > > same tunnel a second time, and try to free the socket again. > > > > Add a dead flag to prevent firing the workqueue twice. Then we can > > remove the check of queue_work's result that was meant to prevent that > > race but doesn't. > > > > Also check the flag in the tunnel lookup functions, to avoid returning a > > tunnel that is already scheduled for destruction. > > > > Reproducer: > > > > ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 > > remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000 > > ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 > > peer_session_id 2000 > > ip link set l2tp1 up > > ip l2tp del tunnel tunnel_id 3000 > > ip l2tp del tunnel tunnel_id 3000 > > > > Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue") > > Reported-by: Jianlin Shi > > Signed-off-by: Sabrina Dubroca > > --- > > v2: as Tom Parkin explained, we can't remove the tunnel from the > > per-net list from netlink. v2 uses only a dead flag, and adds > > corresponding checks during lookups > > > > net/l2tp/l2tp_core.c | 18 +- > > net/l2tp/l2tp_core.h | 5 - > > 2 files changed, 13 insertions(+), 10 deletions(-) > > > > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > > index ee485df73ccd..3891f0260f2b 100644 > > --- a/net/l2tp/l2tp_core.c > > +++ b/net/l2tp/l2tp_core.c > > @@ -203,7 +203,8 @@ struct l2tp_tunnel *l2tp_tunnel_get(const struct net > > *net, u32 tunnel_id) > > > > rcu_read_lock_bh(); > > list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) { > > - if (tunnel->tunnel_id == tunnel_id) { > > + if (tunnel->tunnel_id == tunnel_id && > > + !test_bit(0, &tunnel->dead)) { > > l2tp_tunnel_inc_refcount(tunnel); > > rcu_read_unlock_bh(); > > > > @@ -390,7 +391,8 @@ struct l2tp_tunnel *l2tp_tunnel_find(const struct net > > *net, u32 tunnel_id) > > > > rcu_read_lock_bh(); > > list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) { > > - if (tunnel->tunnel_id == tunnel_id) { > > + if (tunnel->tunnel_id == tunnel_id && > > + !test_bit(0, &tunnel->dead)) { > > rcu_read_unlock_bh(); > > return tunnel; > > } > > @@ -409,7 +411,7 @@ struct l2tp_tunnel *l2tp_tunnel_find_nth(const struct > > net *net, int nth) > > > > rcu_read_lock_bh(); > > list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) { > > - if (++count > nth) { > > + if (++count > nth && !test_bit(0, &tunnel->dead)) { > > rcu_read_unlock_bh(); > > return tunnel; > > } > > > I don't get why you're checking the dead flag in l2tp_tunnel_{get,find}*(). > Since it can be set concurrently right after test_bit(), it doesn't > protect the caller from getting a tunnel that is being removed by > l2tp_tunnel_delete(). > Or have I missed something? You're right. Then I would try going back to essentially v1, but keeping code to remove the tunnel from the list in l2tp_tunnel_destruct if it's not dead yet. What do you think? 8< diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index ee485df73ccd..63cd1f30ac7d 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1234,6 +1234,23 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len } EXPORT_SYMBOL_GPL(l2tp_xmit_skb); +static bool __l2tp_tunnel_delete(struct l2tp_tunnel *tunnel) +{ + struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net); + bool ret = false; + + spin_lock_bh(&pn->l2tp_tunnel_list_lock); + if (!tunnel->dead) { + tunnel->dead = 1; + list_del_rcu(&tunnel->list); + atomic_dec(&l2tp_tunnel_count); + ret = true; + } + spin_unlock_bh(&pn->l2tp_tunnel_list_lock); + + return ret; +} + /* * Tinnel and session create/destroy. */ @@ -1245,7 +1262,6 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_skb); static void l2tp_tunnel_destruct(struct sock *sk) { struct l2tp_tunnel *tunnel = l2tp_tunnel(sk); - struct l2tp_net *pn; if (tunnel == NULL)
[PATCH] Switch to use the new hashtable implementation. This reduces the code and need for yet another hashtable implementation.
Signed-off-by: Aaron Wood --- net/9p/error.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/net/9p/error.c b/net/9p/error.c index 126fd0dceea2..2e966fcc5cbb 100644 --- a/net/9p/error.c +++ b/net/9p/error.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -50,8 +51,8 @@ struct errormap { struct hlist_node list; }; -#define ERRHASHSZ 32 -static struct hlist_head hash_errmap[ERRHASHSZ]; +#define ERR_HASH_BITS 5 +static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS); /* FixMe - reduce to a reasonable size */ static struct errormap errmap[] = { @@ -193,18 +194,14 @@ static struct errormap errmap[] = { int p9_error_init(void) { struct errormap *c; - int bucket; - - /* initialize hash table */ - for (bucket = 0; bucket < ERRHASHSZ; bucket++) - INIT_HLIST_HEAD(&hash_errmap[bucket]); + int key; /* load initial error map into hash table */ for (c = errmap; c->name != NULL; c++) { c->namelen = strlen(c->name); - bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ; + key = jhash(c->name, c->namelen, 0); INIT_HLIST_NODE(&c->list); - hlist_add_head(&c->list, &hash_errmap[bucket]); + hash_add(hash_errmap, &c->list, key); } return 1; @@ -222,12 +219,12 @@ int p9_errstr2errno(char *errstr, int len) { int errno; struct errormap *c; - int bucket; + int key; errno = 0; c = NULL; - bucket = jhash(errstr, len, 0) % ERRHASHSZ; - hlist_for_each_entry(c, &hash_errmap[bucket], list) { + key = jhash(errstr, len, 0); + hash_for_each_possible(hash_errmap, c, list, key) { if (c->namelen == len && !memcmp(c->name, errstr, len)) { errno = c->val; break; -- 2.11.0
[PATCH net-next 4/4] net: dsa: add port enable and disable helpers
Provide dsa_port_enable and dsa_port_disable helpers to respectively enable and disable a switch port. This makes the dsa_port_set_state_now helper static. Signed-off-by: Vivien Didelot --- net/dsa/dsa_priv.h | 3 ++- net/dsa/port.c | 31 ++- net/dsa/slave.c| 19 +-- 3 files changed, 37 insertions(+), 16 deletions(-) diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 9803952a5b40..6bfff19d1615 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -117,7 +117,8 @@ void dsa_master_ethtool_restore(struct net_device *dev); /* port.c */ int dsa_port_set_state(struct dsa_port *dp, u8 state, struct switchdev_trans *trans); -void dsa_port_set_state_now(struct dsa_port *dp, u8 state); +int dsa_port_enable(struct dsa_port *dp); +void dsa_port_disable(struct dsa_port *dp); int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br); void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br); int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering, diff --git a/net/dsa/port.c b/net/dsa/port.c index 76d43a82d397..50749339e252 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -56,7 +56,7 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state, return 0; } -void dsa_port_set_state_now(struct dsa_port *dp, u8 state) +static void dsa_port_set_state_now(struct dsa_port *dp, u8 state) { int err; @@ -65,6 +65,35 @@ void dsa_port_set_state_now(struct dsa_port *dp, u8 state) pr_err("DSA: failed to set STP state %u (%d)\n", state, err); } +int dsa_port_enable(struct dsa_port *dp) +{ + u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING; + struct dsa_switch *ds = dp->ds; + int port = dp->index; + int err; + + if (ds->ops->port_enable) { + err = ds->ops->port_enable(ds, port); + if (err) + return err; + } + + dsa_port_set_state_now(dp, stp_state); + + return 0; +} + +void dsa_port_disable(struct dsa_port *dp) +{ + struct dsa_switch *ds = dp->ds; + int port = dp->index; + + dsa_port_set_state_now(dp, BR_STATE_DISABLED); + + if (ds->ops->port_disable) + ds->ops->port_disable(ds, port); +} + int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br) { struct dsa_notifier_bridge_info info = { diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 235a5c95dfcc..e40623939323 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -74,9 +74,7 @@ static int dsa_slave_open(struct net_device *dev) struct dsa_slave_priv *p = netdev_priv(dev); struct phy_device *phy = p->phy; struct dsa_port *dp = p->dp; - struct dsa_switch *ds = dp->ds; struct net_device *master = dsa_master_netdev(p); - u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING; int err; if (!(master->flags & IFF_UP)) @@ -99,13 +97,9 @@ static int dsa_slave_open(struct net_device *dev) goto clear_allmulti; } - if (ds->ops->port_enable) { - err = ds->ops->port_enable(ds, p->dp->index); - if (err) - goto clear_promisc; - } - - dsa_port_set_state_now(p->dp, stp_state); + err = dsa_port_enable(dp); + if (err) + goto clear_promisc; if (phy) { /* If phy_stop() has been called before, phy will be in @@ -139,15 +133,12 @@ static int dsa_slave_close(struct net_device *dev) { struct dsa_slave_priv *p = netdev_priv(dev); struct net_device *master = dsa_master_netdev(p); - struct dsa_switch *ds = p->dp->ds; + struct dsa_port *dp = p->dp; if (p->phy) phy_stop(p->phy); - dsa_port_set_state_now(p->dp, BR_STATE_DISABLED); - - if (ds->ops->port_disable) - ds->ops->port_disable(ds, p->dp->index); + dsa_port_disable(dp); dev_mc_unsync(master, dev); dev_uc_unsync(master, dev); -- 2.14.1
[PATCH net-next 0/4] net: dsa: simplify port enabling
This patchset removes the unnecessary PHY device argument in port enable/disable switch operations, makes slave open and close symmetrical and finally provides helpers for enabling or disabling a DSA port. Vivien Didelot (4): net: dsa: move up phy enabling in core net: dsa: remove phy arg from port enable/disable net: dsa: make slave close symmetrical to open net: dsa: add port enable and disable helpers drivers/net/dsa/b53/b53_common.c | 6 +++--- drivers/net/dsa/b53/b53_priv.h | 4 ++-- drivers/net/dsa/bcm_sf2.c | 32 -- drivers/net/dsa/lan9303-core.c | 6 ++ drivers/net/dsa/microchip/ksz_common.c | 6 ++ drivers/net/dsa/mt7530.c | 8 +++- drivers/net/dsa/mv88e6xxx/chip.c | 6 ++ drivers/net/dsa/qca8k.c| 6 ++ include/net/dsa.h | 6 ++ net/dsa/dsa_priv.h | 3 ++- net/dsa/port.c | 31 - net/dsa/slave.c| 36 ++ 12 files changed, 77 insertions(+), 73 deletions(-) -- 2.14.1
[PATCH net-next 3/4] net: dsa: make slave close symmetrical to open
The DSA slave open function configures the unicast MAC addresses on the master device, enable the switch port, change its STP state, then start the PHY device. Make the close function symmetric, by first stopping the PHY device, then changing the STP state, disabling the switch port and restore the master device. Signed-off-by: Vivien Didelot --- net/dsa/slave.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 6290741e496a..235a5c95dfcc 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -144,6 +144,11 @@ static int dsa_slave_close(struct net_device *dev) if (p->phy) phy_stop(p->phy); + dsa_port_set_state_now(p->dp, BR_STATE_DISABLED); + + if (ds->ops->port_disable) + ds->ops->port_disable(ds, p->dp->index); + dev_mc_unsync(master, dev); dev_uc_unsync(master, dev); if (dev->flags & IFF_ALLMULTI) @@ -154,11 +159,6 @@ static int dsa_slave_close(struct net_device *dev) if (!ether_addr_equal(dev->dev_addr, master->dev_addr)) dev_uc_del(master, dev->dev_addr); - if (ds->ops->port_disable) - ds->ops->port_disable(ds, p->dp->index); - - dsa_port_set_state_now(p->dp, BR_STATE_DISABLED); - return 0; } -- 2.14.1
[PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
The .port_enable and .port_disable functions are meant to deal with the switch ports only, and no driver is using the phy argument anyway. Remove it. Signed-off-by: Vivien Didelot --- drivers/net/dsa/b53/b53_common.c | 6 +++--- drivers/net/dsa/b53/b53_priv.h | 4 ++-- drivers/net/dsa/bcm_sf2.c | 16 +++- drivers/net/dsa/lan9303-core.c | 6 ++ drivers/net/dsa/microchip/ksz_common.c | 6 ++ drivers/net/dsa/mt7530.c | 8 +++- drivers/net/dsa/mv88e6xxx/chip.c | 6 ++ drivers/net/dsa/qca8k.c| 6 ++ include/net/dsa.h | 6 ++ net/dsa/slave.c| 4 ++-- 10 files changed, 27 insertions(+), 41 deletions(-) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index d4ce092def83..e46eb29d29f0 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -502,7 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port) } EXPORT_SYMBOL(b53_imp_vlan_setup); -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) +int b53_enable_port(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv; unsigned int cpu_port = dev->cpu_port; @@ -531,7 +531,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) } EXPORT_SYMBOL(b53_enable_port); -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy) +void b53_disable_port(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv; u8 reg; @@ -874,7 +874,7 @@ static int b53_setup(struct dsa_switch *ds) if (dsa_is_cpu_port(ds, port)) b53_enable_cpu_port(dev, port); else if (!(BIT(port) & ds->enabled_port_mask)) - b53_disable_port(ds, port, NULL); + b53_disable_port(ds, port); } return ret; diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h index 603c66d240d8..688d02ee6155 100644 --- a/drivers/net/dsa/b53/b53_priv.h +++ b/drivers/net/dsa/b53/b53_priv.h @@ -311,8 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror, bool ingress); void b53_mirror_del(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror); -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy); -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy); +int b53_enable_port(struct dsa_switch *ds, int port); +void b53_disable_port(struct dsa_switch *ds, int port); void b53_brcm_hdr_setup(struct dsa_switch *ds, int port); void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable); int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy); diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index ad96b9725a2c..77e0c43f973b 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -159,8 +159,7 @@ static inline void bcm_sf2_port_intr_disable(struct bcm_sf2_priv *priv, intrl2_1_writel(priv, P_IRQ_MASK(off), INTRL2_CPU_CLEAR); } -static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, - struct phy_device *phy) +static int bcm_sf2_port_setup(struct dsa_switch *ds, int port) { struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); unsigned int i; @@ -191,11 +190,10 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, if (port == priv->moca_port) bcm_sf2_port_intr_enable(priv, port); - return b53_enable_port(ds, port, phy); + return b53_enable_port(ds, port); } -static void bcm_sf2_port_disable(struct dsa_switch *ds, int port, -struct phy_device *phy) +static void bcm_sf2_port_disable(struct dsa_switch *ds, int port) { struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); u32 off, reg; @@ -214,7 +212,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int port, else off = CORE_G_PCTL_PORT(port); - b53_disable_port(ds, port, phy); + b53_disable_port(ds, port); /* Power down the port memory */ reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL); @@ -613,7 +611,7 @@ static int bcm_sf2_sw_suspend(struct dsa_switch *ds) for (port = 0; port < DSA_MAX_PORTS; port++) { if ((1 << port) & ds->enabled_port_mask || dsa_is_cpu_port(ds, port)) - bcm_sf2_port_disable(ds, port, NULL); + bcm_sf2_port_disable(ds, port); } return 0; @@ -636,7 +634,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds) for (port = 0; port < DSA_MAX_PORTS; port++) { if ((1 << port) & ds->enabled_port_mask) -
[PATCH net-next 1/4] net: dsa: move up phy enabling in core
bcm_sf2 is currently the only driver using the phy argument passed to .port_enable. It resets the state machine if the phy has been hard reset. This check is generic and can be moved to DSA core. Signed-off-by: Vivien Didelot --- drivers/net/dsa/bcm_sf2.c | 16 +--- net/dsa/slave.c | 15 +-- 2 files changed, 14 insertions(+), 17 deletions(-) diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index 898d5642b516..ad96b9725a2c 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -184,22 +184,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port)); /* Re-enable the GPHY and re-apply workarounds */ - if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) { + if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) bcm_sf2_gphy_enable_set(ds, true); - if (phy) { - /* if phy_stop() has been called before, phy -* will be in halted state, and phy_start() -* will call resume. -* -* the resume path does not configure back -* autoneg settings, and since we hard reset -* the phy manually here, we need to reset the -* state machine also. -*/ - phy->state = PHY_READY; - phy_init_hw(phy); - } - } /* Enable MoCA port interrupts to get notified */ if (port == priv->moca_port) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 02ace7d462c4..606812160fd5 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -72,6 +72,7 @@ static int dsa_slave_get_iflink(const struct net_device *dev) static int dsa_slave_open(struct net_device *dev) { struct dsa_slave_priv *p = netdev_priv(dev); + struct phy_device *phy = p->phy; struct dsa_port *dp = p->dp; struct dsa_switch *ds = dp->ds; struct net_device *master = dsa_master_netdev(p); @@ -106,8 +107,18 @@ static int dsa_slave_open(struct net_device *dev) dsa_port_set_state_now(p->dp, stp_state); - if (p->phy) - phy_start(p->phy); + if (phy) { + /* If phy_stop() has been called before, phy will be in +* halted state, and phy_start() will call resume. +* +* The resume path does not configure back autoneg +* settings, and since the internal phy may have been +* hard reset, we need to reset the state machine also. +*/ + phy->state = PHY_READY; + phy_init_hw(phy); + phy_start(phy); + } return 0; -- 2.14.1
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 22/09/17 16:16, Alexei Starovoitov wrote: > looks like we're converging on > "be16/be32/be64/le16/le32/le64 #register" for BPF_END. > I guess it can live with that. I would prefer more C like syntax > to match the rest, but llvm parsing point is a strong one. Yep, agreed. I'll post a v2 once we've settled BPF_NEG. > For BPG_NEG I prefer to do it in C syntax like interpreter does: > ALU_NEG: > DST = (u32) -DST; > ALU64_NEG: > DST = -DST; > Yonghong, does it mean that asmparser will equally suffer? Correction to my earlier statements: verifier will currently disassemble neg as: (87) r0 neg 0 (84) (u32) r0 neg (u32) 0 because it pretends 'neg' is a compound-assignment operator like +=. The analogy with be16 and friends would be to use neg64 r0 neg32 r0 whereas the analogy with everything else would be r0 = -r0 r0 = (u32) -r0 as Alexei says. I'm happy to go with Alexei's version if it doesn't cause problems for llvm.
Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core
On Fri, Sep 22, 2017 at 12:17:50PM -0400, Vivien Didelot wrote: > bcm_sf2 is currently the only driver using the phy argument passed to > .port_enable. It resets the state machine if the phy has been hard > reset. This check is generic and can be moved to DSA core. > > dsa_port_set_state_now(p->dp, stp_state); > > - if (p->phy) > - phy_start(p->phy); > + if (phy) { > + /* If phy_stop() has been called before, phy will be in > + * halted state, and phy_start() will call resume. > + * > + * The resume path does not configure back autoneg > + * settings, and since the internal phy may have been > + * hard reset, we need to reset the state machine also. > + */ > + phy->state = PHY_READY; > + phy_init_hw(phy); > + phy_start(phy); > + } Hi Vivien If this is generic, why is it needed at all here? Shouldn't this actually by in phylib? Florian ? Andrew
Re: [PATCH net-next 3/4] net: dsa: make slave close symmetrical to open
On Fri, Sep 22, 2017 at 12:17:52PM -0400, Vivien Didelot wrote: > The DSA slave open function configures the unicast MAC addresses on the > master device, enable the switch port, change its STP state, then start > the PHY device. > > Make the close function symmetric, by first stopping the PHY device, > then changing the STP state, disabling the switch port and restore the > master device. > > Signed-off-by: Vivien Didelot Reviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next 4/4] net: dsa: add port enable and disable helpers
On Fri, Sep 22, 2017 at 12:17:53PM -0400, Vivien Didelot wrote: > Provide dsa_port_enable and dsa_port_disable helpers to respectively > enable and disable a switch port. This makes the dsa_port_set_state_now > helper static. > > Signed-off-by: Vivien Didelot Reviewed-by: Andrew Lunn Andrew
Re: [Intel-wired-lan] [PATCH] i40e: make const array patterns static, reduces object code size
On Fri, 22 Sep 2017 15:11:38 +0100 Colin King wrote: > From: Colin Ian King > > Don't populate const array patterns on the stack, instead make it > static. Makes the object code smaller by over 60 bytes: > > Before: >text data bss dec hex filename >1953 496 02449 991 i40e_diag.o > > After: >text data bss dec hex filename >1798 584 02382 94e i40e_diag.o > > (gcc 6.3.0, x86-64) > > Signed-off-by: Colin Ian King Looks good, thanks Colin! Acked-by: Jesse Brandeburg
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn wrote: >> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file >> *file, struct ifreq *ifr) >> if (tfile->detached) >> return -EINVAL; >> >> + if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN)) >> + return -EPERM; >> + > > This should perhaps be moved into the !dev branch, directly below the > ns_capable check. > Hmm, does that mean fail only on creation but allow to attach if exists? That would be wrong, isn't it? Correct me if I'm wrong but we want to prevent both these scenarios if user does not have sufficient privileges (i.e. NET_ADMIN in init-ns). >> dev = __dev_get_by_name(net, ifr->ifr_name); >> if (dev) { >> if (ifr->ifr_flags & IFF_TUN_EXCL) >> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file >> *file, struct ifreq *ifr) >> tun->flags = (tun->flags & ~TUN_FEATURES) | >> (ifr->ifr_flags & TUN_FEATURES); >> >> + if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != >> IFF_TAP) >> + tun->flags = tun->flags & ~IFF_NAPI_FRAGS; >> + > > Similarly, this check only need to be performed in that branch. > Instead of reverting to non-frags mode, a tun_set_iff with the wrong > set of flags should probably fail hard. Yes, agree, wrong set of flags should fail hard and probably be done before attach or open, no?
Re: [PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size
Please use 'brcmsmac:' as prefix instead of 'brcm80211:'. On 22-09-17 16:03, Colin King wrote: From: Colin Ian King Don't populate const array ucode_ofdm_rates on the stack, instead make it static. Makes the object code smaller by 100 bytes: Before: text data bss dec hex filename 39482564 0 400469c6e phy_cmn.o After text data bss dec hex filename 39326620 0 399469c0a phy_cmn.o (gcc 6.3.0, x86-64) Acked-by: Arend van Spriel Signed-off-by: Colin Ian King --- drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c index 1c4e9dd57960..3a13d176b221 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c @@ -1916,7 +1916,7 @@ void wlc_phy_txpower_update_shm(struct brcms_phy *pi) pi->hwpwr_txcur); for (j = TXP_FIRST_OFDM; j <= TXP_LAST_OFDM; j++) { - const u8 ucode_ofdm_rates[] = { + static const u8 ucode_ofdm_rates[] = { 0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c }; offset = wlapi_bmac_rate_shm_offset(
Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core
On 09/22/2017 09:17 AM, Vivien Didelot wrote: > bcm_sf2 is currently the only driver using the phy argument passed to > .port_enable. It resets the state machine if the phy has been hard > reset. This check is generic and can be moved to DSA core. This is completely specific to bcm_sf2 because it does call bcm_sf2_gphy_enable_set() which performs a HW reset of the PHY, you can't move this to the generic portion of net/dsa/slave.c. NACK. > > Signed-off-by: Vivien Didelot > --- > drivers/net/dsa/bcm_sf2.c | 16 +--- > net/dsa/slave.c | 15 +-- > 2 files changed, 14 insertions(+), 17 deletions(-) > > diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c > index 898d5642b516..ad96b9725a2c 100644 > --- a/drivers/net/dsa/bcm_sf2.c > +++ b/drivers/net/dsa/bcm_sf2.c > @@ -184,22 +184,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int > port, > core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port)); > > /* Re-enable the GPHY and re-apply workarounds */ > - if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) { > + if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) > bcm_sf2_gphy_enable_set(ds, true); > - if (phy) { > - /* if phy_stop() has been called before, phy > - * will be in halted state, and phy_start() > - * will call resume. > - * > - * the resume path does not configure back > - * autoneg settings, and since we hard reset > - * the phy manually here, we need to reset the > - * state machine also. > - */ > - phy->state = PHY_READY; > - phy_init_hw(phy); > - } > - } > > /* Enable MoCA port interrupts to get notified */ > if (port == priv->moca_port) > diff --git a/net/dsa/slave.c b/net/dsa/slave.c > index 02ace7d462c4..606812160fd5 100644 > --- a/net/dsa/slave.c > +++ b/net/dsa/slave.c > @@ -72,6 +72,7 @@ static int dsa_slave_get_iflink(const struct net_device > *dev) > static int dsa_slave_open(struct net_device *dev) > { > struct dsa_slave_priv *p = netdev_priv(dev); > + struct phy_device *phy = p->phy; > struct dsa_port *dp = p->dp; > struct dsa_switch *ds = dp->ds; > struct net_device *master = dsa_master_netdev(p); > @@ -106,8 +107,18 @@ static int dsa_slave_open(struct net_device *dev) > > dsa_port_set_state_now(p->dp, stp_state); > > - if (p->phy) > - phy_start(p->phy); > + if (phy) { > + /* If phy_stop() has been called before, phy will be in > + * halted state, and phy_start() will call resume. > + * > + * The resume path does not configure back autoneg > + * settings, and since the internal phy may have been > + * hard reset, we need to reset the state machine also. > + */ > + phy->state = PHY_READY; > + phy_init_hw(phy); > + phy_start(phy); > + } > > return 0; > > -- Florian
Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core
On 09/22/2017 09:32 AM, Andrew Lunn wrote: > On Fri, Sep 22, 2017 at 12:17:50PM -0400, Vivien Didelot wrote: >> bcm_sf2 is currently the only driver using the phy argument passed to >> .port_enable. It resets the state machine if the phy has been hard >> reset. This check is generic and can be moved to DSA core. >> >> dsa_port_set_state_now(p->dp, stp_state); >> >> -if (p->phy) >> -phy_start(p->phy); >> +if (phy) { >> +/* If phy_stop() has been called before, phy will be in >> + * halted state, and phy_start() will call resume. >> + * >> + * The resume path does not configure back autoneg >> + * settings, and since the internal phy may have been >> + * hard reset, we need to reset the state machine also. >> + */ >> +phy->state = PHY_READY; >> +phy_init_hw(phy); >> +phy_start(phy); >> +} > > Hi Vivien > > If this is generic, why is it needed at all here? Shouldn't this > actually by in phylib? This does not belong in the core logic within net/dsa/slave.c. The reason why this is necessary here is because we are doing a HW-based reset of the PHY, as the comment explains this is specific to how the HW works. There may be a cleaner solution to this problem, but in any case, I don't think other drivers should inherit that logic. -- Florian
usb/wireless/rsi_91x: use-after-free write in __run_timers
Hi! I've got the following report while fuzzing the kernel with syzkaller. On commit 6e80ecdddf4ea6f3cd84e83720f3d852e6624a68 (Sep 21). == BUG: KASAN: use-after-free in __run_timers+0xc0e/0xd40 Write of size 8 at addr 880069f701b8 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1-42311-g6e80ecdddf4e #234 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 print_address_description+0x78/0x280 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 kasan_report+0x22f/0x340 mm/kasan/report.c:409 __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435 collect_expired_timers ./include/linux/list.h:729 __run_timers+0xc0e/0xd40 kernel/time/timer.c:1616 run_timer_softirq+0x83/0x140 kernel/time/timer.c:1646 __do_softirq+0x305/0xc2d kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x171/0x1a0 kernel/softirq.c:405 exiting_irq ./arch/x86/include/asm/apic.h:638 smp_apic_timer_interrupt+0x2b9/0x8d0 arch/x86/kernel/apic/apic.c:1048 apic_timer_interrupt+0x9d/0xb0 RIP: 0010:native_safe_halt+0x6/0x10 ./arch/x86/include/asm/irqflags.h:53 RSP: 0018:86607958 EFLAGS: 0282 ORIG_RAX: ff10 RAX: dc20 RBX: 10cc0f2f RCX: RDX: RSI: 0001 RDI: 8662ea64 RBP: 86607958 R08: 813d3501 R09: R10: R11: R12: 10cc0f3b R13: 86607a98 R14: 86fc1628 R15: arch_safe_halt ./arch/x86/include/asm/paravirt.h:93 default_idle+0x127/0x690 arch/x86/kernel/process.c:341 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:332 default_idle_call+0x3b/0x60 kernel/sched/idle.c:98 cpuidle_idle_call kernel/sched/idle.c:156 do_idle+0x35c/0x440 kernel/sched/idle.c:246 cpu_startup_entry+0x1d/0x20 kernel/sched/idle.c:351 rest_init+0xf3/0x100 init/main.c:435 start_kernel+0x782/0x7b0 init/main.c:710 x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:377 x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:358 secondary_startup_64+0xa5/0xa5 arch/x86/kernel/head_64.S:235 Allocated by task 1845: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772 kmalloc ./include/linux/slab.h:493 kzalloc ./include/linux/slab.h:666 rsi_91x_init+0x98/0x510 drivers/net/wireless/rsi/rsi_91x_main.c:203 rsi_probe+0xb6/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:665 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457 hub_port_connect drivers/usb/core/hub.c:4903 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x3a1/0x470 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Freed by task 1845: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524 slab_free_hook mm/slub.c:1390 slab_free_freelist_hook mm/slub.c:1412 slab_free mm/slub.c:2988 kfree+0xf6/0x2f0 mm/slub.c:3919 rsi_91x_deinit+0x1e8/0x250 drivers/net/wireless/rsi/rsi_91x_main.c:268 rsi_probe+0xed1/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:709 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_d
Re: [PATCH iproute2 master 0/2] BPF/XDP json follow-up
On Thu, 21 Sep 2017 10:42:27 +0200 Daniel Borkmann wrote: > After merging net-next branch into master, Stephen asked to > fix up json dump for XDP as there were some merge conflicts, > so here it is. > > Thanks! > > Daniel Borkmann (2): > json: move json printer to common library > bpf: properly output json for xdp > > include/json_print.h | 71 > ip/Makefile | 2 +- > ip/ip_common.h | 65 ++ > ip/ip_print.c| 233 > --- > ip/iplink_xdp.c | 74 +--- > lib/Makefile | 2 +- > lib/bpf.c| 19 +++-- > lib/json_print.c | 231 ++ > 8 files changed, 369 insertions(+), 328 deletions(-) > create mode 100644 include/json_print.h > delete mode 100644 ip/ip_print.c > create mode 100644 lib/json_print.c > Applied.
[PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs
Signed-off-by: Bernd Edlinger --- drivers/net/phy/Kconfig| 5 + drivers/net/phy/Makefile | 1 + drivers/net/phy/uPD60620.c | 226 + 3 files changed, 232 insertions(+) create mode 100644 drivers/net/phy/uPD60620.c diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index a9d16a3..25089f0 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -287,6 +287,11 @@ config DP83867_PHY ---help--- Currently supports the DP83867 PHY. +config RENESAS_PHY + tristate "Driver for Renesas PHYs" + ---help--- + Supports the uPD60620 and uPD60620A PHYs. + config FIXED_PHY tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs" depends on PHYLIB diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index 416df92..1404ad3 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY) += mscc.o obj-$(CONFIG_NATIONAL_PHY)+= national.o obj-$(CONFIG_QSEMI_PHY) += qsemi.o obj-$(CONFIG_REALTEK_PHY) += realtek.o +obj-$(CONFIG_RENESAS_PHY) += uPD60620.o obj-$(CONFIG_ROCKCHIP_PHY)+= rockchip.o obj-$(CONFIG_SMSC_PHY)+= smsc.o obj-$(CONFIG_STE10XP) += ste10Xp.o diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c new file mode 100644 index 000..b3d900c --- /dev/null +++ b/drivers/net/phy/uPD60620.c @@ -0,0 +1,226 @@ +/* + * Driver for the Renesas PHY uPD60620. + * + * Copyright (C) 2015 Softing Industrial Automation GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#include +#include +#include + +#define UPD60620_PHY_ID0xb8242824 + +/* Extended Registers and values */ +/* PHY Special Control/Status*/ +#define PHY_PHYSCR 0x1F /* PHY.31 */ +#define PHY_PHYSCR_10MB0x0004/* PHY speed = 10mb */ +#define PHY_PHYSCR_100MB 0x0008/* PHY speed = 100mb */ +#define PHY_PHYSCR_DUPLEX 0x0010/* PHY Duplex */ +#define PHY_PHYSCR_RSVD5 0x0020/* Reserved Bit 5 */ +#define PHY_PHYSCR_MIIMOD 0x0040/* Enable 4B5B MII mode */ +#define PHY_PHYSCR_RSVD7 0x0080/* Reserved Bit 7 */ +#define PHY_PHYSCR_RSVD8 0x0100/* Reserved Bit 8 */ +#define PHY_PHYSCR_RSVD9 0x0200/* Reserved Bit 9 */ +#define PHY_PHYSCR_RSVD10 0x0400/* Reserved Bit 10 */ +#define PHY_PHYSCR_RSVD11 0x0800/* Reserved Bit 11 */ +#define PHY_PHYSCR_ANDONE 0x1000/* Auto negotiation done */ +#define PHY_PHYSCR_RSVD13 0x2000/* Reserved Bit 13 */ +#define PHY_PHYSCR_RSVD14 0x4000/* Reserved Bit 14 */ +#define PHY_PHYSCR_RSVD15 0x8000/* Reserved Bit 15 */ + +/* PHY Global Config Mapping */ +#define PHY_GLOBAL_CONFIG 0x07 +/* PHY GPIO Config Register 1 */ +#define PHY_GPIO_CONFIG1 0x01 /* PHY 7.1 */ +#define PHY_GPIO4_INT0 0x000d /* GPIO4 configuration */ +#define PHY_GPIO5_INT1 0x00d0 /* GPIO5 configuration */ + +/* PHY Interrupt Control Register */ +#define PHY_ICR0x1e /* PHY.30 */ +#define PHY_ICR_RSVD0 0x0001/* Reserved bit 0 */ +#define PHY_ICR_ANCPRRN0x0002/* Auto negotiation paged received */ +#define PHY_ICR_PDFEN 0x0004/* Parallel detection fault */ +#define PHY_ICR_ANCLPAEN 0x0008/* Auto negotiation last page ack */ +#define PHY_ICR_LNKINTEN 0x0010/* Link down */ +#define PHY_ICR_REMFD 0x0020/* Remote fault detected */ +#define PHY_ICR_ANCINTEN 0x0040/* Auto negotiation complete */ +#define PHY_ICR_EOEN 0x0080/* Energy on generated */ +#define PHY_ICR_RSVD8 0x0100/* Reserved bit 8 */ +#define PHY_ICR_FEQTRGEN 0x0200/* FEQ Trigger */ +#define PHY_ICR_BERTRGEN 0x0400/* BER Counter Trigger */ +#define PHY_ICR_MLINTEN0x0800/* Maxlvl */ +#define PHY_ICR_CLPINTEN 0x1000/* Clipping */ +#define PHY_ICR_RSVD13 0x2000/* Reserved bit 13 */ +#define PHY_ICR_RSVD14 0x4000/* Reserved bit 14 */ +#define PHY_ICR_RSVD15 0x8000/* Reserved bit 15 */ + +/* PHY Interrupt Status Register */ +#define PHY_ISR0x1d /* PHY.29 */ +#define PHY_ISR_DUPINT 0x/* Placeholder for Duplex/Speed intr */ +#define PHY_ISR_RSVD0 0x0001/* Reserved bit 0 */ +#define PHY_ISR_ANCPR 0x0002/* Auto negotiation paged received */ +#define PHY_ISR_PDF0x0004/* Parallel detection fault */ +#define PHY_ISR_ANCLPA 0x0008/* Auto negotiation last page ack */ +#define PHY_ISR_LNKINT 0x0010/* Link down */ +#define PHY_ISR_REMFD 0x0020/* Remote fault detected */ +#define PHY_ISR_ANCINT 0x0040/* Auto negotiation complete */ +#define PHY_ISR_EO 0x0080/* Energy on generate
Re: [PATCH iproute2 v2] man: fix documentation for range of route table ID
On Fri, 22 Sep 2017 13:28:54 +0200 Thomas Haller wrote: > Signed-off-by: Thomas Haller > --- > Changes in v2: > - "0" is not a valid table ID. > > man/man8/ip-route.8.in | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in > index 803de3b9..705ceb20 100644 > --- a/man/man8/ip-route.8.in > +++ b/man/man8/ip-route.8.in > @@ -322,7 +322,7 @@ normal routing tables. > .P > .B Route tables: > Linux-2.x can pack routes into several routing tables identified > -by a number in the range from 1 to 2^31 or by name from the file > +by a number in the range from 1 to 2^32-1 or by name from the file > .B @SYSCONFDIR@/rt_tables > By default all normal routes are inserted into the > .B main Applied
Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
> #ifdef CONFIG_TUN_VNET_CROSS_LE > static inline bool tun_legacy_is_little_endian(struct tun_struct *tun) > { > @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool > clean) > > tun = rtnl_dereference(tfile->tun); > > + if (tun && clean) { > + tun_napi_disable(tun, tfile); are we missing synchronize_net() separating disable and del calls? > + tun_napi_del(tun, tfile); > + } > + > if (tun && !tfile->detached) { > u16 index = tfile->queue_index; > BUG_ON(index >= tun->numqueues);
[PATCH][V3] e1000: avoid null pointer dereference on invalid stat type
From: Colin Ian King Currently if the stat type is invalid then data[i] is being set either by dereferencing a null pointer p, or it is reading from an incorrect previous location if we had a valid stat type previously. Fix this by skipping over the read of p on an invalid stat type. Detected by CoverityScan, CID#113385 ("Explicit null dereferenced") Signed-off-by: Colin Ian King --- drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c index ec8aa4562cc9..3b3983a1ffbb 100644 --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c @@ -1824,11 +1824,12 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, { struct e1000_adapter *adapter = netdev_priv(netdev); int i; - char *p = NULL; const struct e1000_stats *stat = e1000_gstrings_stats; e1000_update_stats(adapter); - for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) { + for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++, stat++) { + char *p; + switch (stat->type) { case NETDEV_STATS: p = (char *)netdev + stat->stat_offset; @@ -1839,15 +1840,13 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, default: WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n", stat->type, i); - break; + continue; } if (stat->sizeof_stat == sizeof(u64)) data[i] = *(u64 *)p; else data[i] = *(u32 *)p; - - stat++; } /* BUG_ON(i != E1000_STATS_LEN); */ } -- 2.14.1
Re: [PATCH net-next 3/4] net: dsa: make slave close symmetrical to open
On 09/22/2017 09:17 AM, Vivien Didelot wrote: > The DSA slave open function configures the unicast MAC addresses on the > master device, enable the switch port, change its STP state, then start > the PHY device. > > Make the close function symmetric, by first stopping the PHY device, > then changing the STP state, disabling the switch port and restore the > master device. > > Signed-off-by: Vivien Didelot Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
On 09/22/2017 09:17 AM, Vivien Didelot wrote: > The .port_enable and .port_disable functions are meant to deal with the > switch ports only, and no driver is using the phy argument anyway. > Remove it. I don't think this makes sense, there are perfectly legit reasons why a switch driver may have something to do with the PHY device attached to its per-port network interface, we should definitively keep that around, unless you think we should be accessing the PHY within the switch drivers by doing: struct phy_device *phydev = ds->ports[port].netdev->phydev? > > Signed-off-by: Vivien Didelot > --- > drivers/net/dsa/b53/b53_common.c | 6 +++--- > drivers/net/dsa/b53/b53_priv.h | 4 ++-- > drivers/net/dsa/bcm_sf2.c | 16 +++- > drivers/net/dsa/lan9303-core.c | 6 ++ > drivers/net/dsa/microchip/ksz_common.c | 6 ++ > drivers/net/dsa/mt7530.c | 8 +++- > drivers/net/dsa/mv88e6xxx/chip.c | 6 ++ > drivers/net/dsa/qca8k.c| 6 ++ > include/net/dsa.h | 6 ++ > net/dsa/slave.c| 4 ++-- > 10 files changed, 27 insertions(+), 41 deletions(-) > > diff --git a/drivers/net/dsa/b53/b53_common.c > b/drivers/net/dsa/b53/b53_common.c > index d4ce092def83..e46eb29d29f0 100644 > --- a/drivers/net/dsa/b53/b53_common.c > +++ b/drivers/net/dsa/b53/b53_common.c > @@ -502,7 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int > cpu_port) > } > EXPORT_SYMBOL(b53_imp_vlan_setup); > > -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) > +int b53_enable_port(struct dsa_switch *ds, int port) > { > struct b53_device *dev = ds->priv; > unsigned int cpu_port = dev->cpu_port; > @@ -531,7 +531,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, > struct phy_device *phy) > } > EXPORT_SYMBOL(b53_enable_port); > > -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device > *phy) > +void b53_disable_port(struct dsa_switch *ds, int port) > { > struct b53_device *dev = ds->priv; > u8 reg; > @@ -874,7 +874,7 @@ static int b53_setup(struct dsa_switch *ds) > if (dsa_is_cpu_port(ds, port)) > b53_enable_cpu_port(dev, port); > else if (!(BIT(port) & ds->enabled_port_mask)) > - b53_disable_port(ds, port, NULL); > + b53_disable_port(ds, port); > } > > return ret; > diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h > index 603c66d240d8..688d02ee6155 100644 > --- a/drivers/net/dsa/b53/b53_priv.h > +++ b/drivers/net/dsa/b53/b53_priv.h > @@ -311,8 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port, > struct dsa_mall_mirror_tc_entry *mirror, bool ingress); > void b53_mirror_del(struct dsa_switch *ds, int port, > struct dsa_mall_mirror_tc_entry *mirror); > -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy); > -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device > *phy); > +int b53_enable_port(struct dsa_switch *ds, int port); > +void b53_disable_port(struct dsa_switch *ds, int port); > void b53_brcm_hdr_setup(struct dsa_switch *ds, int port); > void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable); > int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy); > diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c > index ad96b9725a2c..77e0c43f973b 100644 > --- a/drivers/net/dsa/bcm_sf2.c > +++ b/drivers/net/dsa/bcm_sf2.c > @@ -159,8 +159,7 @@ static inline void bcm_sf2_port_intr_disable(struct > bcm_sf2_priv *priv, > intrl2_1_writel(priv, P_IRQ_MASK(off), INTRL2_CPU_CLEAR); > } > > -static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, > - struct phy_device *phy) > +static int bcm_sf2_port_setup(struct dsa_switch *ds, int port) > { > struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); > unsigned int i; > @@ -191,11 +190,10 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, > int port, > if (port == priv->moca_port) > bcm_sf2_port_intr_enable(priv, port); > > - return b53_enable_port(ds, port, phy); > + return b53_enable_port(ds, port); > } > > -static void bcm_sf2_port_disable(struct dsa_switch *ds, int port, > - struct phy_device *phy) > +static void bcm_sf2_port_disable(struct dsa_switch *ds, int port) > { > struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); > u32 off, reg; > @@ -214,7 +212,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, > int port, > else > off = CORE_G_PCTL_PORT(port); > > - b53_disable_port(ds, port, phy); > + b53_disable_port(ds, port); > > /* Power down the port memory */ > reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL); > @@
Re: [PATCH net-next 4/4] net: dsa: add port enable and disable helpers
On 09/22/2017 09:17 AM, Vivien Didelot wrote: > Provide dsa_port_enable and dsa_port_disable helpers to respectively > enable and disable a switch port. This makes the dsa_port_set_state_now > helper static. > > Signed-off-by: Vivien Didelot Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
On Fri, Sep 22, 2017 at 9:51 AM, Mahesh Bandewar (महेश बंडेवार) wrote: > On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn > wrote: >>> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file >>> *file, struct ifreq *ifr) >>> if (tfile->detached) >>> return -EINVAL; >>> >>> + if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN)) >>> + return -EPERM; >>> + >> >> This should perhaps be moved into the !dev branch, directly below the >> ns_capable check. >> > Hmm, does that mean fail only on creation but allow to attach if > exists? That would be wrong, isn't it? Correct me if I'm wrong but we > want to prevent both these scenarios if user does not have sufficient > privileges (i.e. NET_ADMIN in init-ns). > My understanding is we want to protect both scenarios. >>> dev = __dev_get_by_name(net, ifr->ifr_name); >>> if (dev) { >>> if (ifr->ifr_flags & IFF_TUN_EXCL) >>> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file >>> *file, struct ifreq *ifr) >>> tun->flags = (tun->flags & ~TUN_FEATURES) | >>> (ifr->ifr_flags & TUN_FEATURES); >>> >>> + if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != >>> IFF_TAP) >>> + tun->flags = tun->flags & ~IFF_NAPI_FRAGS; >>> + >> >> Similarly, this check only need to be performed in that branch. >> Instead of reverting to non-frags mode, a tun_set_iff with the wrong >> set of flags should probably fail hard. > Yes, agree, wrong set of flags should fail hard and probably be done > before attach or open, no? Agreed, in v3 I will push this check before the conditional so both branches can be rejected with EINVAL.
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
On Fri, Sep 22, 2017 at 1:48 PM, Petar Penkov wrote: > On Fri, Sep 22, 2017 at 9:51 AM, Mahesh Bandewar (महेश बंडेवार) > wrote: >> On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn >> wrote: @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) if (tfile->detached) return -EINVAL; + if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN)) + return -EPERM; + >>> >>> This should perhaps be moved into the !dev branch, directly below the >>> ns_capable check. >>> >> Hmm, does that mean fail only on creation but allow to attach if >> exists? That would be wrong, isn't it? Correct me if I'm wrong but we >> want to prevent both these scenarios if user does not have sufficient >> privileges (i.e. NET_ADMIN in init-ns). Ok. >> > My understanding is we want to protect both scenarios. dev = __dev_get_by_name(net, ifr->ifr_name); if (dev) { if (ifr->ifr_flags & IFF_TUN_EXCL) @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) tun->flags = (tun->flags & ~TUN_FEATURES) | (ifr->ifr_flags & TUN_FEATURES); + if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != IFF_TAP) + tun->flags = tun->flags & ~IFF_NAPI_FRAGS; + >>> >>> Similarly, this check only need to be performed in that branch. >>> Instead of reverting to non-frags mode, a tun_set_iff with the wrong >>> set of flags should probably fail hard. >> Yes, agree, wrong set of flags should fail hard and probably be done >> before attach or open, no? > Agreed, in v3 I will push this check before the conditional so both > branches can be rejected with EINVAL. Sounds great.
Re: [PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs
On Fri, Sep 22, 2017 at 05:08:45PM +, Bernd Edlinger wrote: > Signed-off-by: Bernd Edlinger > --- > drivers/net/phy/Kconfig| 5 + > drivers/net/phy/Makefile | 1 + > drivers/net/phy/uPD60620.c | 226 > + > 3 files changed, 232 insertions(+) > create mode 100644 drivers/net/phy/uPD60620.c > > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig > index a9d16a3..25089f0 100644 > --- a/drivers/net/phy/Kconfig > +++ b/drivers/net/phy/Kconfig > @@ -287,6 +287,11 @@ config DP83867_PHY > ---help--- > Currently supports the DP83867 PHY. > > +config RENESAS_PHY > + tristate "Driver for Renesas PHYs" > + ---help--- > + Supports the uPD60620 and uPD60620A PHYs. > + Hi Bernd Please call this "Reneseas PHYs" and place in it alphabetical order. > config FIXED_PHY > tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs" > depends on PHYLIB > diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile > index 416df92..1404ad3 100644 > --- a/drivers/net/phy/Makefile > +++ b/drivers/net/phy/Makefile > @@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY) += mscc.o > obj-$(CONFIG_NATIONAL_PHY) += national.o > obj-$(CONFIG_QSEMI_PHY) += qsemi.o > obj-$(CONFIG_REALTEK_PHY) += realtek.o > +obj-$(CONFIG_RENESAS_PHY)+= uPD60620.o > obj-$(CONFIG_ROCKCHIP_PHY) += rockchip.o > obj-$(CONFIG_SMSC_PHY) += smsc.o > obj-$(CONFIG_STE10XP) += ste10Xp.o > diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c > new file mode 100644 > index 000..b3d900c > --- /dev/null > +++ b/drivers/net/phy/uPD60620.c > @@ -0,0 +1,226 @@ > +/* > + * Driver for the Renesas PHY uPD60620. > + * > + * Copyright (C) 2015 Softing Industrial Automation GmbH > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + */ > + > +#include > +#include > +#include > + > +#define UPD60620_PHY_ID0xb8242824 > + > +/* Extended Registers and values */ > +/* PHY Special Control/Status*/ > +#define PHY_PHYSCR 0x1F /* PHY.31 */ > +#define PHY_PHYSCR_10MB0x0004/* PHY speed = 10mb */ > +#define PHY_PHYSCR_100MB 0x0008/* PHY speed = 100mb */ > +#define PHY_PHYSCR_DUPLEX 0x0010/* PHY Duplex */ > +#define PHY_PHYSCR_RSVD5 0x0020/* Reserved Bit 5 */ > +#define PHY_PHYSCR_MIIMOD 0x0040/* Enable 4B5B MII mode */ Are any of these comments actually useful. It seems like the defines are pretty obvious. > +#define PHY_PHYSCR_RSVD7 0x0080/* Reserved Bit 7 */ > +#define PHY_PHYSCR_RSVD8 0x0100/* Reserved Bit 8 */ > +#define PHY_PHYSCR_RSVD9 0x0200/* Reserved Bit 9 */ > +#define PHY_PHYSCR_RSVD10 0x0400/* Reserved Bit 10 */ > +#define PHY_PHYSCR_RSVD11 0x0800/* Reserved Bit 11 */ > +#define PHY_PHYSCR_ANDONE 0x1000/* Auto negotiation done */ > +#define PHY_PHYSCR_RSVD13 0x2000/* Reserved Bit 13 */ > +#define PHY_PHYSCR_RSVD14 0x4000/* Reserved Bit 14 */ > +#define PHY_PHYSCR_RSVD15 0x8000/* Reserved Bit 15 */ It looks like the only register you use is SCR and SPM. Maybe delete all the rest? Or do you plan to add more features making use of these registers? > +/* Init PHY */ > + > +static int upd60620_config_init(struct phy_device *phydev) > +{ > + /* Enable support for passive HUBs (could be a strap option) */ > + /* PHYMODE: All speeds, HD in parallel detect */ > + return phy_write(phydev, PHY_SPM, 0x0180 | phydev->mdio.addr); > +} > + > +/* Get PHY status from common registers */ > + > +static int upd60620_read_status(struct phy_device *phydev) > +{ > + int phy_state; > + > + /* Read negotiated state */ > + phy_state = phy_read(phydev, MII_BMSR); > + if (phy_state < 0) > + return phy_state; > + > + phydev->link = 0; > + phydev->lp_advertising = 0; > + phydev->pause = 0; > + phydev->asym_pause = 0; > + > + if (phy_state & BMSR_ANEGCOMPLETE) { It is worth comparing this against genphy_read_status() which is the reference implementation. You would normally check if auto negotiation is enabled, not if it has completed. If it is enabled you read the current negotiated state, even if it is not completed. > + phy_state = phy_read(phydev, PHY_PHYSCR); > + if (phy_state < 0) > + return phy_state; > + > + if (phy_state & (PHY_PHYSCR_10MB | PHY_PHYSCR_100MB)) { > + phydev->link = 1; > + phydev->speed = SPEED_10; > + phydev->duplex = DUPLEX_HALF; > + > + if (phy_state & PHY_PHYSCR_100MB) > + phydev->speed = SPEED_100; > + if (phy_state
Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
On Fri, Sep 22, 2017 at 1:11 PM, Mahesh Bandewar (महेश बंडेवार) wrote: >> #ifdef CONFIG_TUN_VNET_CROSS_LE >> static inline bool tun_legacy_is_little_endian(struct tun_struct *tun) >> { >> @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool >> clean) >> >> tun = rtnl_dereference(tfile->tun); >> >> + if (tun && clean) { >> + tun_napi_disable(tun, tfile); > are we missing synchronize_net() separating disable and del calls? That is not needed here. napi_disable has its own mechanism for waiting until a napi struct is no longer run. netif_napi_del will call synchronize_net if needed. These two calls are made one after the other in quite a few drivers.