date:20170922

Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic

2017-09-22 Thread Egil Hjelmeland


Den 21. sep. 2017 16:21, skrev Andrew Lunn:

Hi Egil


+static void lan9303_bridge_ports(struct lan9303 *chip)
+{
+   /* ports bridged: remove mirroring */
+   lan9303_write_switch_reg(chip, LAN9303_SWE_PORT_MIRROR, 0);
+}


Could you replace the 0 with something symbolic which makes this
easier to understand.

#define LAN9303_SWE_PORT_MIRROR_DISABLED 0



OK


+
  static int lan9303_handle_reset(struct lan9303 *chip)
  {
if (!chip->reset_gpio)
@@ -844,6 +866,69 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
int port,
}
  }
  
+static int lan9303_port_bridge_join(struct dsa_switch *ds, int port,

+   struct net_device *br)
+{
+   struct lan9303 *chip = ds->priv;
+
+   dev_dbg(chip->dev, "%s(port %d)\n", __func__, port);
+   if (ds->ports[1].bridge_dev ==  ds->ports[2].bridge_dev) {
+   lan9303_bridge_ports(chip);
+   chip->is_bridged = true;  /* unleash stp_state_set() */
+   }
+
+   return 0;
+}
+
+static void lan9303_port_bridge_leave(struct dsa_switch *ds, int port,
+ struct net_device *br)
+{
+   struct lan9303 *chip = ds->priv;
+
+   dev_dbg(chip->dev, "%s(port %d)\n", __func__, port);
+   if (chip->is_bridged) {
+   lan9303_separate_ports(chip);
+   chip->is_bridged = false;
+   }
+}
+
+static void lan9303_port_stp_state_set(struct dsa_switch *ds, int port,
+  u8 state)
+{
+   int portmask, portstate;
+   struct lan9303 *chip = ds->priv;
+
+   dev_dbg(chip->dev, "%s(port %d, state %d)\n",
+   __func__, port, state);
+   if (!chip->is_bridged)
+   return; /* touching SWE_PORT_STATE will break port separation */


I'm wondering how this is supposed to work. Please add a good comment
here, since the hardware is forcing you to do something odd.

Maybe it would be a good idea to save the STP state in chip.  And then
when chip->is_bridged is set true, change the state in the hardware to
the saved value?

What happens when port 0 is added to the bridge, there is then a
minute pause and then port 1 is added? I would expect that as soon as
port 0 is added, the STP state machine for port 0 will start and move
into listening and then forwarding. Due to hardware limitations it
looks like you cannot do this. So what state is the hardware
effectively in? Blocking? Forwarding?

Then port 1 is added. You can then can respect the states. port 1 will
do blocking->listening->forwarding, but what about port 0? The calls
won't get repeated? How does it transition to forwarding?

   Andrew



I see your point with the "minute pause" argument. Although a bit
contrived use case, it is easy to fix by caching the STP state, as
you suggest. So I can do that.

The port separation method is from the original version of the driver,
not by me.

I have read through the datasheet to see if there are other ways
to make port separation, but I could not find anything.
If anybody care to read through the 300+ page lan9303.pdf and prove
me wrong, I am happy to do it differently.

How does other DSA HW chips handle port separation? Knowing that
could perhaps help me know what to look for.

Egil
'


+
+   switch (state) {
+   case BR_STATE_DISABLED:
+   portstate = LAN9303_SWE_PORT_STATE_DISABLED_PORT0;
+   break;
+   case BR_STATE_BLOCKING:
+   case BR_STATE_LISTENING:
+   portstate = LAN9303_SWE_PORT_STATE_BLOCKING_PORT0;
+   break;
+   case BR_STATE_LEARNING:
+   portstate = LAN9303_SWE_PORT_STATE_LEARNING_PORT0;
+   break;
+   case BR_STATE_FORWARDING:
+   portstate = LAN9303_SWE_PORT_STATE_FORWARDING_PORT0;
+   break;
+   default:
+   portstate = LAN9303_SWE_PORT_STATE_DISABLED_PORT0;
+   dev_err(chip->dev, "unknown stp state: port %d, state %d\n",
+   port, state);
+   }
+
+   portmask = 0x3 << (port * 2);
+   portstate <<= (port * 2);
+   lan9303_write_switch_reg_mask(chip, LAN9303_SWE_PORT_STATE,
+ portstate, portmask);
+}

Re: linux-next: build failure after merge of the net-next tree

2017-09-22 Thread Paolo Abeni

On Thu, 2017-09-21 at 18:37 -0700, David Miller wrote:
> From: Stephen Rothwell 
> Date: Fri, 22 Sep 2017 11:03:55 +1000
> 
> > After merging the net-next tree, today's linux-next build (arm
> > multi_v7_defconfig) failed like this:
> > 
> > net/ipv4/fib_frontend.c: In function 'fib_validate_source':
> > net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member 
> > named 'fib_has_custom_local_routes'
> >if (net->ipv4.fib_has_custom_local_routes)
> > ^
> > net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute':
> > net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member 
> > named 'fib_has_custom_local_routes'
> >net->ipv4.fib_has_custom_local_routes = true;
> > ^
> > 
> > Caused by commit
> > 
> >   6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.")
> 
> Paolo, it seems this struct member should be placed outside of
> IP_MULTIPLE_TABLES protection, since users can insert custom
> local routes even without that set.
> 
> So I'm installing the following fix for this:
> 
> 
> [PATCH] ipv4: Move fib_has_custom_local_routes outside of IP_MULTIPLE_TABLES.
> 
> > net/ipv4/fib_frontend.c: In function 'fib_validate_source':
> > net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member 
> > named 'fib_has_custom_local_routes'
> >if (net->ipv4.fib_has_custom_local_routes)
> > ^
> > net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute':
> > net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member 
> > named 'fib_has_custom_local_routes'
> >net->ipv4.fib_has_custom_local_routes = true;
> > ^
> 
> Fixes: 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is 
> disabled.")
> Reported-by: Stephen Rothwell 
> Signed-off-by: David S. Miller 
> ---
>  include/net/netns/ipv4.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 20720721da4b..8387f099115e 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -49,10 +49,10 @@ struct netns_ipv4 {
>  #ifdef CONFIG_IP_MULTIPLE_TABLES
> struct fib_rules_ops*rules_ops;
> boolfib_has_custom_rules;
> -   boolfib_has_custom_local_routes;
> struct fib_table __rcu  *fib_main;
> struct fib_table __rcu  *fib_default;
>  #endif
> +   boolfib_has_custom_local_routes;
>  #ifdef CONFIG_IP_ROUTE_CLASSID
> int fib_num_tclassid_users;
>  #endif
> -- 
> 2.13.5

My fault, I should have fuzzed the configuration before posting.

Fix looks good, thanks David!

Paolo

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Julia Lawall



On Thu, 21 Sep 2017, Colin King wrote:

> From: Colin Ian King 
>
> Don't populate const array ac_to_fifo on the stack in an inlined
> function, instead make it static.  Makes the object code smaller
> by over 800 bytes:
>
>text  data bss dec hex filename
>  159029 331541216  193399   2f377 4965-mac.o
>
>text  data bss dec hex filename
>  158122 332501216  192588   2f04c 4965-mac.o
>
> (gcc version 7.2.0 x86_64)

Gcc 7 must be much more aggressive about inlining than gcc 4.  Here is the
change that I got on this file:

 text  data bss dec hex filename
-   76346  1494 152   77992   130a8 
drivers/net/wireless/intel/iwlegacy/4965-mac.o
+   76298  1494 152   77944   13078 
drivers/net/wireless/intel/iwlegacy/4965-mac.o
decrease of 48

julia

>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/wireless/intel/iwlegacy/4965-mac.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/intel/iwlegacy/4965-mac.c 
> b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
> index de9b6522c43f..65eba2c24292 100644
> --- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c
> +++ b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
> @@ -1480,7 +1480,7 @@ il4965_get_ac_from_tid(u16 tid)
>  static inline int
>  il4965_get_fifo_from_tid(u16 tid)
>  {
> - const u8 ac_to_fifo[] = {
> + static const u8 ac_to_fifo[] = {
>   IL_TX_FIFO_VO,
>   IL_TX_FIFO_VI,
>   IL_TX_FIFO_BE,
> --
> 2.14.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic

2017-09-22 Thread Egil Hjelmeland


Den 21. sep. 2017 16:26, skrev Vivien Didelot:

Hi Egil,

Egil Hjelmeland  writes:


When both user ports are joined to the same bridge, the normal
HW MAC learning is enabled. This means that unicast traffic is forwarded
in HW.

If one of the user ports leave the bridge,
the ports goes back to the initial separated operation.

Port separation relies on disabled HW MAC learning. Hence the condition
that both ports must join same bridge.

Add brigde methods port_bridge_join, port_bridge_leave and
port_stp_state_set.

Signed-off-by: Egil Hjelmeland 


Styling nitpicks below, other than that, the patch LGTM:

Reviewed-by: Vivien Didelot 


  #include 
  #include 
  #include 
+#include 


It's nice to order header inclusions alphabetically.

  
  #include "lan9303.h"
  
@@ -146,6 +147,7 @@

  # define LAN9303_SWE_PORT_STATE_FORWARDING_PORT0 (0)
  # define LAN9303_SWE_PORT_STATE_LEARNING_PORT0 BIT(1)
  # define LAN9303_SWE_PORT_STATE_BLOCKING_PORT0 BIT(0)
+# define LAN9303_SWE_PORT_STATE_DISABLED_PORT0 (3)
  #define LAN9303_SWE_PORT_MIRROR 0x1846
  # define LAN9303_SWE_PORT_MIRROR_SNIFF_ALL BIT(8)
  # define LAN9303_SWE_PORT_MIRROR_SNIFFER_PORT2 BIT(7)
@@ -431,6 +433,20 @@ static int lan9303_read_switch_reg(struct lan9303 *chip, 
u16 regnum, u32 *val)
return ret;
  }
  
+static int lan9303_write_switch_reg_mask(

+   struct lan9303 *chip, u16 regnum, u32 val, u32 mask)


That is unlikely to respect the Kernel Coding Style. Please fill the
line as much as possible and align with the opening parenthesis:

 static int lan9303_write_switch_reg_mask(struct lan9303 *chip, u16 regnum,
  u32 val, u32 mask)


OK. Probably this function will go away in v2 due to Andrews comment.


+{
+   int ret;
+   u32 reg;
+
+   ret = lan9303_read_switch_reg(chip, regnum, ®);
+   if (ret)
+   return ret;
+   reg = (reg & ~mask) | val;
+
+   return lan9303_write_switch_reg(chip, regnum, reg);
+}


...


+
+   portmask = 0x3 << (port * 2);
+   portstate <<= (port * 2);


Unnecessary alignment, please remove the extra space characters.



I think the alignment makes the logic more clear.



+   lan9303_write_switch_reg_mask(chip, LAN9303_SWE_PORT_STATE,
+ portstate, portmask);
+}
+
  static const struct dsa_switch_ops lan9303_switch_ops = {
.get_tag_protocol = lan9303_get_tag_protocol,
.setup = lan9303_setup,
@@ -855,6 +940,9 @@ static const struct dsa_switch_ops lan9303_switch_ops = {
.get_sset_count = lan9303_get_sset_count,
.port_enable = lan9303_port_enable,
.port_disable = lan9303_port_disable,
+   .port_bridge_join   = lan9303_port_bridge_join,
+   .port_bridge_leave  = lan9303_port_bridge_leave,
+   .port_stp_state_set = lan9303_port_stp_state_set,


Same here, alignment unnecessary, especially since the rest of the
structure does not do that.


  };
  
  static int lan9303_register_switch(struct lan9303 *chip)

diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h
index 4d8be555ff4d..5be246f05965 100644
--- a/drivers/net/dsa/lan9303.h
+++ b/drivers/net/dsa/lan9303.h
@@ -21,6 +21,7 @@ struct lan9303 {
struct dsa_switch *ds;
struct mutex indirect_mutex; /* protect indexed register access */
const struct lan9303_phy_ops *ops;
+   bool is_bridged; /* true if port 1 and 2 is bridged */
  };
  
  extern const struct regmap_access_table lan9303_register_set;


Please use the checkpatch.pl script to ensure your patch respects the
kernel conventions before submitting, it can spot nice stuffs!


I have checked _every_ patch with checkpatch.pl and weeded all warnings
before I submitted them.



I use a Git alias(*) to check a commit which does basically this:

 git format-patch --stdout -1 | ./scripts/checkpatch.pl -


(*) in details, especially convenient during interactive rebases:

 $ git config alias.checkcommit '!f () { git format-patch --stdout -1 
${1:-HEAD} | ./scripts/checkpatch.pl - ; }; f'
 $ git checkcommit # i.e. current one
 $ git checkcommit HEAD^^
 $ git checkcommit d329ac88eb21
 ...


Thanks,

 Vivien

Re: [PATCH net 1/4] net:ethernet:aquantia: Setup max_mtu in ndev to enable jumbo frames

2017-09-22 Thread Igor Russkikh


>>  self->ndev->mtu = aq_nic_cfg->mtu - ETH_HLEN;
>> +self->ndev->min_mtu = ETH_MIN_MTU;
> This is not required. It will default to ETH_MIN_MTU.

Thanks Andrew, true.

>> +self->ndev->max_mtu = self->aq_hw_caps.mtu - ETH_FCS_LEN - ETH_HLEN;
>>  
>>  return 0;
>>  }
>> @@ -695,7 +696,7 @@ int aq_nic_set_mtu(struct aq_nic_s *self, int new_mtu)
>>  {
>>  int err = 0;
>>  
>> -if (new_mtu > self->aq_hw_caps.mtu) {
>> +if (new_mtu + ETH_FCS_LEN > self->aq_hw_caps.mtu) {
> If you have set max_mtu correctly, this cannot happen.  But it seems
> odd you don't have ETH_HLEN here, where as when setting max_mtu you
> do.

I agree, thats an extra check. Will remove that.
Regarding ETH_HLEN - the baseline code invokes this function with (new_mtu + 
ETH_HLEN)  - thats why this check does not add it. In general, that extra level 
of storage (aq_nic_cfg.mtu field) is almost useless, I'll cleanup this under a 
separate patchset.

-- 
BR,
  Igor

Re: tg3 pxe weirdness

2017-09-22 Thread Berend De Schouwer

On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
> On Thu, Sep 21, 2017 at 7:53 PM, Berend De Schouwer
>  wrote:
> > Hi,
> > 
> > I've got a machine with a Broadcom bcm5762c, using the tg3 driver,
> > that
> > fails to receive network packets under some very specific
> > conditions.
> > 
> > 
> > Berend
> 
> Can you please share below details?
> 1) Model and Manufacturer of the system

HP EliteDesk 705 G3 SFF
lspci -n on the network card: 01:00.0 0200: 14e4:1687 (rev 10)

dmesg from tg3: 
tg3 :01:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
tg3 :01:00.0: setting latency timer to 64
tg3 :01:00.0: eth0: Tigon3 [partno(none) rev 5762100] (PCI Express)
MAC address 70:5a:0f:3d:9f:4f
tg3 :01:00.0: eth0: attached PHY is 5762C (10/100/1000Base-T
Ethernet) (WireSpeed[1], EEE[1])
tg3 :01:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0]
TSOcap[1]
tg3 :01:00.0: eth0: dma_rwctrl[0001] dma_mask[64-bit]

The same motherboard with add-on network cards does boot using PXE. 
It's an EFI board but Secureboot is currently disabled.


> 2) Linux distro/kernel used?

CentOS 6 + updates.  I've tried all the CentOS 6 kernels up to 2.6.32-
696.10.2.el6.x86_64

I've tried updating the tg3 driver from 3.137 to both 3.137h and
3.137o, with the same result.

I'm currently in the process of making 4.13.2 boot.

Re: [PATCH net-next] net: mvpp2: phylink support

2017-09-22 Thread Marcin Wojtas

Hi Antoine,

You can add
Tested-by: Marcin Wojtas 

Best regards,
Marcin

2017-09-21 15:45 GMT+02:00 Antoine Tenart :
> Convert the PPv2 driver to use phylink, which models the MAC to PHY
> link. The phylink support is made such a way the GoP link IRQ can still
> be used: the two modes are incompatible and the GoP link IRQ will be
> used if no PHY is described in the device tree. This is the same
> behaviour as before.
>
> Signed-off-by: Antoine Tenart 
> ---
>  drivers/net/ethernet/marvell/Kconfig |   1 +
>  drivers/net/ethernet/marvell/mvpp2.c | 502 
> +--
>  2 files changed, 303 insertions(+), 200 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/Kconfig 
> b/drivers/net/ethernet/marvell/Kconfig
> index da6fb825afea..991138b8dfba 100644
> --- a/drivers/net/ethernet/marvell/Kconfig
> +++ b/drivers/net/ethernet/marvell/Kconfig
> @@ -86,6 +86,7 @@ config MVPP2
> depends on ARCH_MVEBU || COMPILE_TEST
> depends on HAS_DMA
> select MVMDIO
> +   select PHYLINK
> ---help---
>   This driver supports the network interface units in the
>   Marvell ARMADA 375, 7K and 8K SoCs.
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
> b/drivers/net/ethernet/marvell/mvpp2.c
> index 8041d692db3c..5fb7e76ee128 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -341,6 +342,7 @@
>  #define MVPP2_GMAC_FORCE_LINK_PASS BIT(1)
>  #define MVPP2_GMAC_IN_BAND_AUTONEG BIT(2)
>  #define MVPP2_GMAC_IN_BAND_AUTONEG_BYPASS  BIT(3)
> +#define MVPP2_GMAC_IN_BAND_RESTART_AN  BIT(4)
>  #define MVPP2_GMAC_CONFIG_MII_SPEEDBIT(5)
>  #define MVPP2_GMAC_CONFIG_GMII_SPEED   BIT(6)
>  #define MVPP2_GMAC_AN_SPEED_EN BIT(7)
> @@ -350,6 +352,12 @@
>  #define MVPP2_GMAC_AN_DUPLEX_ENBIT(13)
>  #define MVPP2_GMAC_STATUS0 0x10
>  #define MVPP2_GMAC_STATUS0_LINK_UP BIT(0)
> +#defineMVPP2_GMAC_STATUS0_GMII_SPEED   BIT(1)
> +#defineMVPP2_GMAC_STATUS0_MII_SPEEDBIT(2)
> +#defineMVPP2_GMAC_STATUS0_FULL_DUPLEX  BIT(3)
> +#define MVPP2_GMAC_STATUS0_RX_PAUSEBIT(6)
> +#define MVPP2_GMAC_STATUS0_TX_PAUSEBIT(7)
> +#define MVPP2_GMAC_STATUS0_AN_COMPLETE BIT(11)
>  #define MVPP2_GMAC_PORT_FIFO_CFG_1_REG 0x1c
>  #define MVPP2_GMAC_TX_FIFO_MIN_TH_OFFS 6
>  #define MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK 0x1fc0
> @@ -878,12 +886,11 @@ struct mvpp2_port {
> u16 rx_ring_size;
> struct mvpp2_pcpu_stats __percpu *stats;
>
> +   struct device_node *of_node;
> +
> phy_interface_t phy_interface;
> -   struct device_node *phy_node;
> +   struct phylink *phylink;
> struct phy *comphy;
> -   unsigned int link;
> -   unsigned int duplex;
> -   unsigned int speed;
>
> struct mvpp2_bm_pool *pool_long;
> struct mvpp2_bm_pool *pool_short;
> @@ -4716,13 +4723,14 @@ static void mvpp2_port_periodic_xon_disable(struct 
> mvpp2_port *port)
>  }
>
>  /* Configure loopback port */
> -static void mvpp2_port_loopback_set(struct mvpp2_port *port)
> +static void mvpp2_port_loopback_set(struct mvpp2_port *port,
> +   const struct phylink_link_state *state)
>  {
> u32 val;
>
> val = readl(port->base + MVPP2_GMAC_CTRL_1_REG);
>
> -   if (port->speed == 1000)
> +   if (state->speed == 1000)
> val |= MVPP2_GMAC_GMII_LB_EN_MASK;
> else
> val &= ~MVPP2_GMAC_GMII_LB_EN_MASK;
> @@ -4778,10 +4786,6 @@ static void mvpp2_defaults_set(struct mvpp2_port *port)
> int tx_port_num, val, queue, ptxq, lrxq;
>
> if (port->priv->hw_version == MVPP21) {
> -   /* Configure port to loopback if needed */
> -   if (port->flags & MVPP2_F_LOOPBACK)
> -   mvpp2_port_loopback_set(port);
> -
> /* Update TX FIFO MIN Threshold */
> val = readl(port->base + MVPP2_GMAC_PORT_FIFO_CFG_1_REG);
> val &= ~MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK;
> @@ -5860,111 +5864,6 @@ static irqreturn_t mvpp2_link_status_isr(int irq, 
> void *dev_id)
> return IRQ_HANDLED;
>  }
>
> -static void mvpp2_gmac_set_autoneg(struct mvpp2_port *port,
> -  struct phy_device *phydev)
> -{
> -   u32 val;
> -
> -   if (port->phy_interface != PHY_INTERFACE_MODE_RGMII &&
> -   port->phy_interface != PHY_INTERFACE_MODE_RGMII_ID &&
> -   port->phy_interface != PHY_INTERFACE_MODE_RGMII_RXID &&
> -   port->phy_interface != PHY_INTERFACE_MODE_RGMII_TXID &&
> -   port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> -   return;
> -
> -

[PATCH net-next RFC 4/5] vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH

2017-09-22 Thread Jason Wang

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 58585ec..c89640e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -87,7 +87,7 @@ struct vhost_net_ubuf_ref {
struct vhost_virtqueue *vq;
 };
 
-#define VHOST_RX_BATCH 64
+#define VHOST_NET_BATCH 64
 struct vhost_net_buf {
struct sk_buff **queue;
int tail;
@@ -159,7 +159,7 @@ static int vhost_net_buf_produce(struct vhost_net_virtqueue 
*nvq)
 
rxq->head = 0;
rxq->tail = skb_array_consume_batched(nvq->rx_array, rxq->queue,
- VHOST_RX_BATCH);
+ VHOST_NET_BATCH);
return rxq->tail;
 }
 
@@ -912,7 +912,7 @@ static int vhost_net_open(struct inode *inode, struct file 
*f)
return -ENOMEM;
}
 
-   queue = kmalloc_array(VHOST_RX_BATCH, sizeof(struct sk_buff *),
+   queue = kmalloc_array(VHOST_NET_BATCH, sizeof(struct sk_buff *),
  GFP_KERNEL);
if (!queue) {
kfree(vqs);
-- 
2.7.4

[PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched processing

2017-09-22 Thread Jason Wang

This patch implements basic batched processing of tx virtqueue by
prefetching desc indices and updating used ring in a batch. For
non-zerocopy case, vq->heads were used for storing the prefetched
indices and updating used ring. It is also a requirement for doing
more batching on top. For zerocopy case and for simplicity, batched
processing were simply disabled by only fetching and processing one
descriptor at a time, this could be optimized in the future.

XDP_DROP (without touching skb) on tun (with Moongen in guest) with
zercopy disabled:

Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:
Before: 3.20Mpps
After:  3.90Mpps (+22%)

No differences were seen with zerocopy enabled.

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   | 215 --
 drivers/vhost/vhost.c |   2 +-
 2 files changed, 121 insertions(+), 96 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c89640e..c439892 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -408,27 +408,25 @@ static int vhost_net_enable_vq(struct vhost_net *n,
return vhost_poll_start(poll, sock->file);
 }
 
-static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
-   struct vhost_virtqueue *vq,
-   struct iovec iov[], unsigned int iov_size,
-   unsigned int *out_num, unsigned int *in_num)
+static bool vhost_net_tx_avail(struct vhost_net *net,
+  struct vhost_virtqueue *vq)
 {
unsigned long uninitialized_var(endtime);
-   int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- out_num, in_num, NULL, NULL);
 
-   if (r == vq->num && vq->busyloop_timeout) {
-   preempt_disable();
-   endtime = busy_clock() + vq->busyloop_timeout;
-   while (vhost_can_busy_poll(vq->dev, endtime) &&
-  vhost_vq_avail_empty(vq->dev, vq))
-   cpu_relax();
-   preempt_enable();
-   r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- out_num, in_num, NULL, NULL);
-   }
+   if (!vq->busyloop_timeout)
+   return false;
 
-   return r;
+   if (!vhost_vq_avail_empty(vq->dev, vq))
+   return true;
+
+   preempt_disable();
+   endtime = busy_clock() + vq->busyloop_timeout;
+   while (vhost_can_busy_poll(vq->dev, endtime) &&
+   vhost_vq_avail_empty(vq->dev, vq))
+   cpu_relax();
+   preempt_enable();
+
+   return !vhost_vq_avail_empty(vq->dev, vq);
 }
 
 static bool vhost_exceeds_maxpend(struct vhost_net *net)
@@ -446,8 +444,9 @@ static void handle_tx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = &nvq->vq;
+   struct vring_used_elem used, *heads = vq->heads;
unsigned out, in;
-   int head;
+   int avails, head;
struct msghdr msg = {
.msg_name = NULL,
.msg_namelen = 0,
@@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net)
struct socket *sock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
bool zcopy, zcopy_used;
+   int i, batched = VHOST_NET_BATCH;
 
mutex_lock(&vq->mutex);
sock = vq->private_data;
@@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net)
hdr_size = nvq->vhost_hlen;
zcopy = nvq->ubufs;
 
+   /* Disable zerocopy batched fetching for simplicity */
+   if (zcopy) {
+   heads = &used;
+   batched = 1;
+   }
+
for (;;) {
/* Release DMAs done buffers first */
if (zcopy)
@@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net)
if (unlikely(vhost_exceeds_maxpend(net)))
break;
 
-   head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
-   ARRAY_SIZE(vq->iov),
-   &out, &in);
+   avails = vhost_prefetch_desc_indices(vq, heads, batched, 
!zcopy);
/* On error, stop handling until the next kick. */
-   if (unlikely(head < 0))
+   if (unlikely(avails < 0))
break;
-   /* Nothing new?  Wait for eventfd to tell us they refilled. */
-   if (head == vq->num) {
+   /* Nothing new?  Busy poll for a while or wait for
+* eventfd to tell us they refilled. */
+   if (!avails) {
+   if (vhost_net_tx_avail(net, vq))
+   continue;
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
vhost_disable_notify(&net->dev, vq);

[PATCH net-next RFC 1/5] vhost: split out ring head fetching logic

2017-09-22 Thread Jason Wang

This patch splits out ring head fetching logic and leave the
descriptor fetching and translation logic. This makes it is possible
to batch fetching the descriptor indices.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 75 +--
 drivers/vhost/vhost.h |  5 
 2 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d6dbb28..f87ec75 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1984,41 +1984,23 @@ static int get_indirect(struct vhost_virtqueue *vq,
return 0;
 }
 
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access.  Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
- struct iovec iov[], unsigned int iov_size,
- unsigned int *out_num, unsigned int *in_num,
- struct vhost_log *log, unsigned int *log_num)
+static unsigned int vhost_get_vq_head(struct vhost_virtqueue *vq, int *err)
 {
-   struct vring_desc desc;
-   unsigned int i, head, found = 0;
-   u16 last_avail_idx;
-   __virtio16 avail_idx;
-   __virtio16 ring_head;
-   int ret, access;
-
-   /* Check it isn't doing very strange things with descriptor numbers. */
-   last_avail_idx = vq->last_avail_idx;
+   u16 last_avail_idx = vq->last_avail_idx;
+   __virtio16 avail_idx, ring_head;
 
if (vq->avail_idx == vq->last_avail_idx) {
if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
vq_err(vq, "Failed to access avail idx at %p\n",
&vq->avail->idx);
-   return -EFAULT;
+   goto err;
}
vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
 
if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
vq_err(vq, "Guest moved used index from %u to %u",
last_avail_idx, vq->avail_idx);
-   return -EFAULT;
+   goto err;
}
 
/* If there's nothing new since last we looked, return
@@ -2040,13 +2022,35 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
vq_err(vq, "Failed to read head: idx %d address %p\n",
   last_avail_idx,
   &vq->avail->ring[last_avail_idx % vq->num]);
-   return -EFAULT;
+   goto err;
}
 
-   head = vhost16_to_cpu(vq, ring_head);
+   return vhost16_to_cpu(vq, ring_head);
+err:
+   *err = -EFAULT;
+   return 0;
+}
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error. */
+int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
+   struct iovec iov[], unsigned int iov_size,
+   unsigned int *out_num, unsigned int *in_num,
+   struct vhost_log *log, unsigned int *log_num,
+   __virtio16 head)
+{
+   struct vring_desc desc;
+   unsigned int i, found = 0;
+   int ret = 0, access;
 
/* If their number is silly, that's an error. */
-   if (unlikely(head >= vq->num)) {
+   if (unlikely(head > vq->num)) {
vq_err(vq, "Guest says index %u > %u is available",
   head, vq->num);
return -EINVAL;
@@ -2133,6 +2137,25 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
return head;
 }
+EXPORT_SYMBOL_GPL(__vhost_get_vq_desc);
+
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+ struct iovec iov[], unsigned int iov_size,
+ unsigned int *out_num, unsigned int *in_num,
+ struct vhost_log *log, unsigned int *log_num)
+{
+   int ret = 0;
+   unsigned int head = vhost_get_vq_head(vq, &ret);
+
+   if (ret)
+   return ret;
+
+   if (head == vq->num)
+   return head;
+
+   return __vhost_get_vq_desc(vq, iov, iov_size, out_num, in_num,
+

[PATCH net-next RFC 0/5] batched tx processing in vhost_net

2017-09-22 Thread Jason Wang

Hi:

This series tries to implement basic tx batched processing. This is
done by prefetching descriptor indices and update used ring in a
batch. This intends to speed up used ring updating and improve the
cache utilization. Test shows about ~22% improvement in tx pss.

Please review.

Jason Wang (5):
  vhost: split out ring head fetching logic
  vhost: introduce helper to prefetch desc index
  vhost: introduce vhost_add_used_idx()
  vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH
  vhost_net: basic tx virtqueue batched processing

 drivers/vhost/net.c   | 221 --
 drivers/vhost/vhost.c | 165 +++--
 drivers/vhost/vhost.h |   9 ++
 3 files changed, 270 insertions(+), 125 deletions(-)

-- 
2.7.4

[PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()

2017-09-22 Thread Jason Wang

This patch introduces a helper which just increase the used idx. This
will be used in pair with vhost_prefetch_desc_indices() by batching
code.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 33 +
 drivers/vhost/vhost.h |  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8424166d..6532cda 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned 
int head, int len)
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
+int vhost_add_used_idx(struct vhost_virtqueue *vq, int n)
+{
+   u16 old, new;
+
+   old = vq->last_used_idx;
+   new = (vq->last_used_idx += n);
+   /* If the driver never bothers to signal in a very long while,
+* used index might wrap around. If that happens, invalidate
+* signalled_used index we stored. TODO: make sure driver
+* signals at least once in 2^16 and remove this.
+*/
+   if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old)))
+   vq->signalled_used_valid = false;
+
+   /* Make sure buffer is written before we update index. */
+   smp_wmb();
+   if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
+  &vq->used->idx)) {
+   vq_err(vq, "Failed to increment used idx");
+   return -EFAULT;
+   }
+   if (unlikely(vq->log_used)) {
+   /* Log used index update. */
+   log_write(vq->log_base,
+ vq->log_addr + offsetof(struct vring_used, idx),
+ sizeof(vq->used->idx));
+   if (vq->log_ctx)
+   eventfd_signal(vq->log_ctx, 1);
+   }
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_add_used_idx);
+
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 16c2cb6..5dd6c05 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
 int vhost_vq_init_access(struct vhost_virtqueue *);
+int vhost_add_used_idx(struct vhost_virtqueue *vq, int n);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
 int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
 unsigned count);
-- 
2.7.4

[PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index

2017-09-22 Thread Jason Wang

This patch introduces vhost_prefetch_desc_indices() which could batch
descriptor indices fetching and used ring updating. This intends to
reduce the cache misses of indices fetching and updating and reduce
cache line bounce when virtqueue is almost full. copy_to_user() was
used in order to benefit from modern cpus that support fast string
copy. Batched virtqueue processing will be the first user.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 55 +++
 drivers/vhost/vhost.h |  3 +++
 2 files changed, 58 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f87ec75..8424166d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct 
vhost_dev *dev,
 }
 EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
 
+int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
+   struct vring_used_elem *heads,
+   u16 num, bool used_update)
+{
+   int ret, ret2;
+   u16 last_avail_idx, last_used_idx, total, copied;
+   __virtio16 avail_idx;
+   struct vring_used_elem __user *used;
+   int i;
+
+   if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
+   vq_err(vq, "Failed to access avail idx at %p\n",
+  &vq->avail->idx);
+   return -EFAULT;
+   }
+   last_avail_idx = vq->last_avail_idx & (vq->num - 1);
+   vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+   total = vq->avail_idx - vq->last_avail_idx;
+   ret = total = min(total, num);
+
+   for (i = 0; i < ret; i++) {
+   ret2 = vhost_get_avail(vq, heads[i].id,
+ &vq->avail->ring[last_avail_idx]);
+   if (unlikely(ret2)) {
+   vq_err(vq, "Failed to get descriptors\n");
+   return -EFAULT;
+   }
+   last_avail_idx = (last_avail_idx + 1) & (vq->num - 1);
+   }
+
+   if (!used_update)
+   return ret;
+
+   last_used_idx = vq->last_used_idx & (vq->num - 1);
+   while (total) {
+   copied = min((u16)(vq->num - last_used_idx), total);
+   ret2 = vhost_copy_to_user(vq,
+ &vq->used->ring[last_used_idx],
+ &heads[ret - total],
+ copied * sizeof(*used));
+
+   if (unlikely(ret2)) {
+   vq_err(vq, "Failed to update used ring!\n");
+   return -EFAULT;
+   }
+
+   last_used_idx = 0;
+   total -= copied;
+   }
+
+   /* Only get avail ring entries after they have been exposed by guest. */
+   smp_rmb();
+   return ret;
+}
+EXPORT_SYMBOL(vhost_prefetch_desc_indices);
 
 static int __init vhost_init(void)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 39ff897..16c2cb6 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -228,6 +228,9 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct 
iov_iter *to,
 ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
 struct iov_iter *from);
 int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
+int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
+   struct vring_used_elem *heads,
+   u16 num, bool used_update);
 
 #define vq_err(vq, fmt, ...) do {  \
pr_debug(pr_fmt(fmt), ##__VA_ARGS__);   \
-- 
2.7.4

Re: [PATCH net-next RFC 1/5] vhost: split out ring head fetching logic

2017-09-22 Thread Stefan Hajnoczi

On Fri, Sep 22, 2017 at 04:02:31PM +0800, Jason Wang wrote:
> +/* This looks in the virtqueue and for the first available buffer, and 
> converts
> + * it to an iovec for convenient access.  Since descriptors consist of some
> + * number of output then some number of input descriptors, it's actually two
> + * iovecs, but we pack them into one and note how many of each there were.
> + *
> + * This function returns the descriptor number found, or vq->num (which is
> + * never a valid descriptor number) if none was found.  A negative code is
> + * returned on error. */
> +int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
> + struct iovec iov[], unsigned int iov_size,
> + unsigned int *out_num, unsigned int *in_num,
> + struct vhost_log *log, unsigned int *log_num,
> + __virtio16 head)
[...]
> +int vhost_get_vq_desc(struct vhost_virtqueue *vq,
> +   struct iovec iov[], unsigned int iov_size,
> +   unsigned int *out_num, unsigned int *in_num,
> +   struct vhost_log *log, unsigned int *log_num)

Please document vhost_get_vq_desc().

Please also explain the difference between __vhost_get_vq_desc() and
vhost_get_vq_desc() in the documentation.

Re: [patch net-next 07/12] mlxsw: spectrum: Add the multicast routing offloading logic

2017-09-22 Thread Yotam Gigi

On 09/21/2017 06:26 PM, Andrew Lunn wrote:
>> +static void mlxsw_sp_mr_route_stats_update(struct mlxsw_sp *mlxsw_sp,
>> +   struct mlxsw_sp_mr_route *mr_route)
>> +{
>> +struct mlxsw_sp_mr *mr = mlxsw_sp->mr;
>> +u64 packets, bytes;
>> +
>> +if (mr_route->route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP)
>> +return;
>> +
>> +mr->mr_ops->route_stats(mlxsw_sp, mr_route->route_priv, &packets,
>> +&bytes);
>> +
>> +switch (mr_route->mr_table->proto) {
>> +case MLXSW_SP_L3_PROTO_IPV4:
>> +mr_route->mfc4->mfc_un.res.pkt = packets;
>> +mr_route->mfc4->mfc_un.res.bytes = bytes;
> What about wrong_if and lastuse? 

wronf_if is updated by ipmr as it is trapped to the CPU. We did not
address lastuse currently, though it can be easily addressed here.

>
> Is an mfc with iif on the host, not the switch, not offloaded?

I am not sure I followed. What do you mean MFC with iif on the host? you mean
MFC with iif that is an external NIC which is not part of the spectrum ASIC? in
this case, the route will not be offloaded and all traffic will pass in 
slowpath.

>
>Andrew

Re: [PATCH 2/2] ip_tunnel: add mpls over gre encapsulation

2017-09-22 Thread Amine Kherbouche




On 09/22/2017 12:35 AM, Roopa Prabhu wrote:

> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 36ea2ad..060ed07 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include 
> @@ -39,6 +40,40 @@ static int one = 1;
>  static int label_limit = (1 << 20) - 1;
>  static int ttl_max = 255;
>
> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
> +{
> +   return sizeof(struct mpls_shim_hdr);
> +}
> +
> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
> +   u8 *protocol, struct flowi4 *fl4)
> +{
> +   return 0;
> +}


any reason you are supporting only rx ?


Tx path doesn't need changes, all gre hdr fields remain the same except 
for the protocol type which is loaded from skb->protocol this last is 
set by the mpls stack before entering gre device.

Re: [net-next 1/2] dummy: add device MTU validation check

2017-09-22 Thread Sabrina Dubroca

2017-09-21, 08:02:18 -0700, Eric Dumazet wrote:
> On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote:
> > Currently, any mtu value can be assigned when adding a new dummy device:
> > [~]# ip link add name dummy1 mtu 10 type dummy
> > [~]# ip link show dummy1
> > 15: dummy1:  mtu 10 qdisc noop state DOWN mode DEFAULT 
> > group default qlen 1000
> > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff
> > 
> > This patch adds device MTU validation check.
> 
> What is wrong with big MTU on dummy ?

It looks like the "centralize MTU checking" series broke that, but
only for changing the MTU on an existing dummy device. Commit
a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy uses,
but there was no MTU check in dummy prior to that commit.


> If this is a generic rule, this check should belong in core network
> stack.
> 
> > 
> > Signed-off-by: Zhang Shengju 
> > ---
> >  drivers/net/dummy.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
> > index e31ab3b..0276b2b 100644
> > --- a/drivers/net/dummy.c
> > +++ b/drivers/net/dummy.c
> > @@ -365,6 +365,14 @@ static int dummy_validate(struct nlattr *tb[], struct 
> > nlattr *data[],
> > if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
> > return -EADDRNOTAVAIL;
> > }
> > +
> > +   if (tb[IFLA_MTU]) {
> > +   u32 mtu = nla_get_u32(tb[IFLA_MTU]);
> 
> You do not verify/validate nla_len(tb[IFLA_MTU]).

I think ifla_policy already performs that check:


static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[...]
[IFLA_MTU]  = { .type = NLA_U32 },


static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
[...]
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy, extack);


-- 
Sabrina

Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index

2017-09-22 Thread Stefan Hajnoczi

On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote:
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f87ec75..8424166d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct 
> vhost_dev *dev,
>  }
>  EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
>  
> +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
> + struct vring_used_elem *heads,
> + u16 num, bool used_update)

Missing doc comment.

> +{
> + int ret, ret2;
> + u16 last_avail_idx, last_used_idx, total, copied;
> + __virtio16 avail_idx;
> + struct vring_used_elem __user *used;
> + int i;

The following variable names are a little confusing:

last_avail_idx vs vq->last_avail_idx.  last_avail_idx is a wrapped
avail->ring[] index, vq->last_avail_idx is a free-running counter.  The
same for last_used_idx vs vq->last_used_idx.

num argument vs vq->num.  The argument could be called nheads instead to
make it clear that this is heads[] and not the virtqueue size.

Not a bug but it took me a while to figure out what was going on.

Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()

2017-09-22 Thread Stefan Hajnoczi

On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote:
> This patch introduces a helper which just increase the used idx. This
> will be used in pair with vhost_prefetch_desc_indices() by batching
> code.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vhost.c | 33 +
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 34 insertions(+)

Reviewed-by: Stefan Hajnoczi

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Stanislaw Gruszka

On Thu, Sep 21, 2017 at 11:56:30PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Don't populate const array ac_to_fifo on the stack in an inlined
> function, instead make it static.  Makes the object code smaller
> by over 800 bytes:
> 
>text  data bss dec hex filename
>  159029 331541216  193399   2f377 4965-mac.o
> 
>text  data bss dec hex filename
>  158122 332501216  192588   2f04c 4965-mac.o
> 
> (gcc version 7.2.0 x86_64)
> 
> Signed-off-by: Colin Ian King 

Content type information was added at the end of the topic, but
I think Kalle can fix that when he will be committing the patch.

Acked-by: Stanislaw Gruszka

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Joe Perches

On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote:
> 
> On Thu, 21 Sep 2017, Colin King wrote:
> 
> > From: Colin Ian King 
> > 
> > Don't populate const array ac_to_fifo on the stack in an inlined
> > function, instead make it static.  Makes the object code smaller
> > by over 800 bytes:
> > 
> >textdata bss dec hex filename
> >  159029   331541216  193399   2f377 4965-mac.o
> > 
> >textdata bss dec hex filename
> >  158122   332501216  192588   2f04c 4965-mac.o
> > 
> > (gcc version 7.2.0 x86_64)
> 
> Gcc 7 must be much more aggressive about inlining than gcc 4.  Here is the
> change that I got on this file:
> 
>  text  data bss dec hex filename
> -   76346  1494 152   77992   130a8 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o
> +   76298  1494 152   77944   13078 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o
> decrease of 48

More likely different CONFIG options.

e.g. allyesconfig vs defconfig with wireless

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Colin Ian King

On 22/09/17 11:03, Joe Perches wrote:
> On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote:
>>
>> On Thu, 21 Sep 2017, Colin King wrote:
>>
>>> From: Colin Ian King 
>>>
>>> Don't populate const array ac_to_fifo on the stack in an inlined
>>> function, instead make it static.  Makes the object code smaller
>>> by over 800 bytes:
>>>
>>>textdata bss dec hex filename
>>>  159029   331541216  193399   2f377 4965-mac.o
>>>
>>>textdata bss dec hex filename
>>>  158122   332501216  192588   2f04c 4965-mac.o
>>>
>>> (gcc version 7.2.0 x86_64)
>>
>> Gcc 7 must be much more aggressive about inlining than gcc 4.  Here is the
>> change that I got on this file:
>>
>>  text  data bss dec hex filename
>> -   76346  1494 152   77992   130a8 
>> drivers/net/wireless/intel/iwlegacy/4965-mac.o
>> +   76298  1494 152   77944   13078 
>> drivers/net/wireless/intel/iwlegacy/4965-mac.o
>> decrease of 48
> 
> More likely different CONFIG options.
> 
> e.g. allyesconfig vs defconfig with wireless
> 
yup, I used allyesconfig

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Julia Lawall



On Fri, 22 Sep 2017, Colin Ian King wrote:

> On 22/09/17 11:03, Joe Perches wrote:
> > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote:
> >>
> >> On Thu, 21 Sep 2017, Colin King wrote:
> >>
> >>> From: Colin Ian King 
> >>>
> >>> Don't populate const array ac_to_fifo on the stack in an inlined
> >>> function, instead make it static.  Makes the object code smaller
> >>> by over 800 bytes:
> >>>
> >>>text  data bss dec hex filename
> >>>  159029 331541216  193399   2f377 4965-mac.o
> >>>
> >>>text  data bss dec hex filename
> >>>  158122 332501216  192588   2f04c 4965-mac.o
> >>>
> >>> (gcc version 7.2.0 x86_64)
> >>
> >> Gcc 7 must be much more aggressive about inlining than gcc 4.  Here is the
> >> change that I got on this file:
> >>
> >>  text  data bss dec hex filename
> >> -   76346  1494 152   77992   130a8 
> >> drivers/net/wireless/intel/iwlegacy/4965-mac.o
> >> +   76298  1494 152   77944   13078 
> >> drivers/net/wireless/intel/iwlegacy/4965-mac.o
> >> decrease of 48
> >
> > More likely different CONFIG options.
> >
> > e.g. allyesconfig vs defconfig with wireless
> >
> yup, I used allyesconfig

Mine is also allyesconfig.

julia

>
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

tools: selftests: psock_tpacket: skip un-supported tpacket_v3 test

2017-09-22 Thread Orson Zhai

The TPACKET_V3 test of PACKET_TX_RING will fail with kernel version
lower than v4.11. Supported code of tx ring was add with commit id
<7f953ab2ba46: af_packet: TX_RING support for TPACKET_V3> at Jan. 3
of 2017.

So skip this item test instead of reporting failing for old kernels.

Signed-off-by: Orson Zhai 
---
 tools/testing/selftests/net/psock_tpacket.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/psock_tpacket.c 
b/tools/testing/selftests/net/psock_tpacket.c
index 7f6cd9fdacf3..f0cfc18c3726 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "psock_lib.h"
 
@@ -676,7 +677,7 @@ static void __v3_fill(struct ring *ring, unsigned int 
blocks, int type)
ring->flen = ring->req3.tp_block_size;
 }
 
-static void setup_ring(int sock, struct ring *ring, int version, int type)
+static int setup_ring(int sock, struct ring *ring, int version, int type)
 {
int ret = 0;
unsigned int blocks = 256;
@@ -703,7 +704,11 @@ static void setup_ring(int sock, struct ring *ring, int 
version, int type)
 
if (ret == -1) {
perror("setsockopt");
-   exit(1);
+   if (errno == EINVAL) {
+   printf("[SKIP] This type seems un-supported in current 
kernel, skipped.\n");
+   return -1;
+   } else
+   exit(1);
}
 
ring->rd_len = ring->rd_num * sizeof(*ring->rd);
@@ -715,6 +720,7 @@ static void setup_ring(int sock, struct ring *ring, int 
version, int type)
 
total_packets = 0;
total_bytes = 0;
+   return 0;
 }
 
 static void mmap_ring(int sock, struct ring *ring)
@@ -830,7 +836,12 @@ static int test_tpacket(int version, int type)
 
sock = pfsocket(version);
memset(&ring, 0, sizeof(ring));
-   setup_ring(sock, &ring, version, type);
+   if(setup_ring(sock, &ring, version, type)) {
+   /* skip test when error of invalid argument */
+   close(sock);
+   return 0;
+   }
+
mmap_ring(sock, &ring);
bind_ring(sock, &ring);
walk_ring(sock, &ring);
-- 
2.12.2

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Kalle Valo

Stanislaw Gruszka  writes:

> On Thu, Sep 21, 2017 at 11:56:30PM +0100, Colin King wrote:
>> From: Colin Ian King 
>> 
>> Don't populate const array ac_to_fifo on the stack in an inlined
>> function, instead make it static.  Makes the object code smaller
>> by over 800 bytes:
>> 
>>text data bss dec hex filename
>>  159029331541216  193399   2f377 4965-mac.o
>> 
>>text data bss dec hex filename
>>  158122332501216  192588   2f04c 4965-mac.o
>> 
>> (gcc version 7.2.0 x86_64)
>> 
>> Signed-off-by: Colin Ian King 
>
> Content type information was added at the end of the topic, but
> I think Kalle can fix that when he will be committing the patch.

Yeah, I'll fix that when I commit this. But very good that you pointed
it out, I might miss stuff like this.

I'll also remove the "wireless:" prefix from the title.

-- 
Kalle Valo

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Joe Perches

On Fri, 2017-09-22 at 12:06 +0200, Julia Lawall wrote:
> 
> On Fri, 22 Sep 2017, Colin Ian King wrote:
> 
> > On 22/09/17 11:03, Joe Perches wrote:
> > > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote:
> > > > 
> > > > On Thu, 21 Sep 2017, Colin King wrote:
> > > > 
> > > > > From: Colin Ian King 
> > > > > 
> > > > > Don't populate const array ac_to_fifo on the stack in an inlined
> > > > > function, instead make it static.  Makes the object code smaller
> > > > > by over 800 bytes:
> > > > > 
> > > > >text  data bss dec hex filename
> > > > >  159029 331541216  193399   2f377 4965-mac.o
> > > > > 
> > > > >text  data bss dec hex filename
> > > > >  158122 332501216  192588   2f04c 4965-mac.o
> > > > > 
> > > > > (gcc version 7.2.0 x86_64)
> > > > 
> > > > Gcc 7 must be much more aggressive about inlining than gcc 4.  Here is 
> > > > the
> > > > change that I got on this file:
> > > > 
> > > >  text  data bss dec hex filename
> > > > -   76346  1494 152   77992   130a8 
> > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o
> > > > +   76298  1494 152   77944   13078 
> > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o
> > > > decrease of 48
> > > 
> > > More likely different CONFIG options.
> > > 
> > > e.g. allyesconfig vs defconfig with wireless
> > > 
> > 
> > yup, I used allyesconfig
> 
> Mine is also allyesconfig.

Odd.

Here is what I get: (Sorry, no gcc-7)

gcc4: gcc-4.9 (Ubuntu 4.9.4-2ubuntu1) 4.9.4
gcc5: gcc-5 (Ubuntu 5.4.1-8ubuntu1) 5.4.1 20170304
gcc6: gcc-6 (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406

$ size drivers/net/wireless/intel/iwlegacy/4965-mac.o*
   text    data bss dec hex filename
 118559    1766 152  120477   1d69d 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.new
 118623    1766 152  120541   1d6dd 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.old
  51595    1156   0   52751    ce0f 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.new
  51659    1156   0   52815    ce4f 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.old
 141956   29790    1216  172962   2a3a2 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.new
 142671   29702    1216  173589   2a615 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.old
  51733    1156   0   52889    ce99 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.new
  51813    1156   0   52969    cee9 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.old
 152991   29790    1216  183997   2cebd 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.new
 153722   29702    1216  184640   2d140 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.old
  51813    1156   0   52969    cee9 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.new
  51893    1156   0   53049    cf39 
drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.old

Re: [PATCH] wireless: iwlegacy: make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit

2017-09-22 Thread Julia Lawall



On Fri, 22 Sep 2017, Joe Perches wrote:

> On Fri, 2017-09-22 at 12:06 +0200, Julia Lawall wrote:
> >
> > On Fri, 22 Sep 2017, Colin Ian King wrote:
> >
> > > On 22/09/17 11:03, Joe Perches wrote:
> > > > On Fri, 2017-09-22 at 09:23 +0200, Julia Lawall wrote:
> > > > >
> > > > > On Thu, 21 Sep 2017, Colin King wrote:
> > > > >
> > > > > > From: Colin Ian King 
> > > > > >
> > > > > > Don't populate const array ac_to_fifo on the stack in an inlined
> > > > > > function, instead make it static.  Makes the object code smaller
> > > > > > by over 800 bytes:
> > > > > >
> > > > > >textdata bss dec hex filename
> > > > > >  159029   331541216  193399   2f377 4965-mac.o
> > > > > >
> > > > > >textdata bss dec hex filename
> > > > > >  158122   332501216  192588   2f04c 4965-mac.o
> > > > > >
> > > > > > (gcc version 7.2.0 x86_64)
> > > > >
> > > > > Gcc 7 must be much more aggressive about inlining than gcc 4.  Here 
> > > > > is the
> > > > > change that I got on this file:
> > > > >
> > > > >  text  data bss dec hex filename
> > > > > -   76346  1494 152   77992   130a8 
> > > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o
> > > > > +   76298  1494 152   77944   13078 
> > > > > drivers/net/wireless/intel/iwlegacy/4965-mac.o
> > > > > decrease of 48
> > > >
> > > > More likely different CONFIG options.
> > > >
> > > > e.g. allyesconfig vs defconfig with wireless
> > > >
> > >
> > > yup, I used allyesconfig
> >
> > Mine is also allyesconfig.
>
> Odd.
>
> Here is what I get: (Sorry, no gcc-7)
>
> gcc4: gcc-4.9 (Ubuntu 4.9.4-2ubuntu1) 4.9.4
> gcc5: gcc-5 (Ubuntu 5.4.1-8ubuntu1) 5.4.1 20170304
> gcc6: gcc-6 (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
>
> $ size drivers/net/wireless/intel/iwlegacy/4965-mac.o*
>    text      data bss dec hex filename
>  118559      1766 152  120477   1d69d 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.new
>  118623      1766 152  120541   1d6dd 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.allyesconfig.old

My gcc is 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)

You get a savings of 64 and I got a savings of 48, but it's not that big a
difference, compared to what the later versions give.

julia

>   51595      1156   0   52751    ce0f 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.new
>   51659      1156   0   52815    ce4f 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc4.defconfig.old
>  141956     29790    1216  172962   2a3a2 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.new
>  142671     29702    1216  173589   2a615 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.allyesconfig.old
>   51733      1156   0   52889    ce99 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.new
>   51813      1156   0   52969    cee9 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc5.defconfig.old
>  152991     29790    1216  183997   2cebd 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.new
>  153722     29702    1216  184640   2d140 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.allyesconfig.old
>   51813      1156   0   52969    cee9 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.new
>   51893      1156   0   53049    cf39 
> drivers/net/wireless/intel/iwlegacy/4965-mac.o.gcc6.defconfig.old
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

[PATCH 0/5] use setup_timer() helper function.

2017-09-22 Thread Allen Pais

This series uses setup_timer() helper function. The series
addresses the files under net/*.



Allen Pais (5):
  net: nfc: hci: use setup_timer() helper.
  net: nfc: hci: llc_shdlc: use setup_timer() helper.
  net: af_packet: use setup_timer() helper.
  net: nfc: core: use setup_timer() helper.
  net: nfc: llcp_core: use setup_timer() helper.

 net/nfc/core.c  |  5 ++---
 net/nfc/hci/core.c  |  5 ++---
 net/nfc/hci/llc_shdlc.c | 15 ++-
 net/nfc/llcp_core.c | 10 --
 net/packet/af_packet.c  |  4 +---
 5 files changed, 15 insertions(+), 24 deletions(-)

-- 
2.7.4

[PATCH 1/5] net: nfc: hci: use setup_timer() helper.

2017-09-22 Thread Allen Pais

   Use setup_timer function instead of initializing timer with the
   function and data fields.

Signed-off-by: Allen Pais 
---
 net/nfc/hci/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
index b740fef..a8a6e78 100644
--- a/net/nfc/hci/core.c
+++ b/net/nfc/hci/core.c
@@ -1004,9 +1004,8 @@ int nfc_hci_register_device(struct nfc_hci_dev *hdev)
 
INIT_WORK(&hdev->msg_tx_work, nfc_hci_msg_tx_work);
 
-   init_timer(&hdev->cmd_timer);
-   hdev->cmd_timer.data = (unsigned long)hdev;
-   hdev->cmd_timer.function = nfc_hci_cmd_timeout;
+   setup_timer(&hdev->cmd_timer, nfc_hci_cmd_timeout,
+   (unsigned long)hdev);
 
skb_queue_head_init(&hdev->rx_hcp_frags);
 
-- 
2.7.4

[PATCH 2/5] net: nfc: hci: llc_shdlc: use setup_timer() helper.

2017-09-22 Thread Allen Pais

   Use setup_timer function instead of initializing timer with the
   function and data fields.

Signed-off-by: Allen Pais 
---
 net/nfc/hci/llc_shdlc.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/nfc/hci/llc_shdlc.c b/net/nfc/hci/llc_shdlc.c
index 17e59a0..58df37e 100644
--- a/net/nfc/hci/llc_shdlc.c
+++ b/net/nfc/hci/llc_shdlc.c
@@ -763,17 +763,14 @@ static void *llc_shdlc_init(struct nfc_hci_dev *hdev, 
xmit_to_drv_t xmit_to_drv,
mutex_init(&shdlc->state_mutex);
shdlc->state = SHDLC_DISCONNECTED;
 
-   init_timer(&shdlc->connect_timer);
-   shdlc->connect_timer.data = (unsigned long)shdlc;
-   shdlc->connect_timer.function = llc_shdlc_connect_timeout;
+   setup_timer(&shdlc->connect_timer, llc_shdlc_connect_timeout,
+   (unsigned long)shdlc);
 
-   init_timer(&shdlc->t1_timer);
-   shdlc->t1_timer.data = (unsigned long)shdlc;
-   shdlc->t1_timer.function = llc_shdlc_t1_timeout;
+   setup_timer(&shdlc->t1_timer, llc_shdlc_t1_timeout,
+   (unsigned long)shdlc);
 
-   init_timer(&shdlc->t2_timer);
-   shdlc->t2_timer.data = (unsigned long)shdlc;
-   shdlc->t2_timer.function = llc_shdlc_t2_timeout;
+   setup_timer(&shdlc->t2_timer, llc_shdlc_t2_timeout,
+   (unsigned long)shdlc);
 
shdlc->w = SHDLC_MAX_WINDOW;
shdlc->srej_support = SHDLC_SREJ_SUPPORT;
-- 
2.7.4

[PATCH 5/5] net: nfc: llcp_core: use setup_timer() helper.

2017-09-22 Thread Allen Pais

   Use setup_timer function instead of initializing timer with the
   function and data fields.

Signed-off-by: Allen Pais 
---
 net/nfc/llcp_core.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index 02eef5c..7988185 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -1573,9 +1573,8 @@ int nfc_llcp_register_device(struct nfc_dev *ndev)
INIT_LIST_HEAD(&local->list);
kref_init(&local->ref);
mutex_init(&local->sdp_lock);
-   init_timer(&local->link_timer);
-   local->link_timer.data = (unsigned long) local;
-   local->link_timer.function = nfc_llcp_symm_timer;
+   setup_timer(&local->link_timer, nfc_llcp_symm_timer,
+   (unsigned long)local);
 
skb_queue_head_init(&local->tx_queue);
INIT_WORK(&local->tx_work, nfc_llcp_tx_work);
@@ -1601,9 +1600,8 @@ int nfc_llcp_register_device(struct nfc_dev *ndev)
 
mutex_init(&local->sdreq_lock);
INIT_HLIST_HEAD(&local->pending_sdreqs);
-   init_timer(&local->sdreq_timer);
-   local->sdreq_timer.data = (unsigned long) local;
-   local->sdreq_timer.function = nfc_llcp_sdreq_timer;
+   setup_timer(&local->sdreq_timer, nfc_llcp_sdreq_timer,
+   (unsigned long)local);
INIT_WORK(&local->sdreq_timeout_work, nfc_llcp_sdreq_timeout_work);
 
list_add(&local->list, &llcp_devices);
-- 
2.7.4

[PATCH 4/5] net: nfc: core: use setup_timer() helper.

2017-09-22 Thread Allen Pais

   Use setup_timer function instead of initializing timer with the
   function and data fields.

Signed-off-by: Allen Pais 
---
 net/nfc/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/nfc/core.c b/net/nfc/core.c
index 5cf33df..e5e23c2 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -1094,9 +1094,8 @@ struct nfc_dev *nfc_allocate_device(struct nfc_ops *ops,
dev->targets_generation = 1;
 
if (ops->check_presence) {
-   init_timer(&dev->check_pres_timer);
-   dev->check_pres_timer.data = (unsigned long)dev;
-   dev->check_pres_timer.function = nfc_check_pres_timeout;
+   setup_timer(&dev->check_pres_timer, nfc_check_pres_timeout,
+   (unsigned long)dev);
 
INIT_WORK(&dev->check_pres_work, nfc_check_pres_work);
}
-- 
2.7.4

[PATCH 3/5] net: af_packet: use setup_timer() helper.

2017-09-22 Thread Allen Pais

   Use setup_timer function instead of initializing timer with the
   function and data fields.

Signed-off-by: Allen Pais 
---
 net/packet/af_packet.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c261729..1d3e3ce 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -544,9 +544,7 @@ static void prb_init_blk_timer(struct packet_sock *po,
struct tpacket_kbdq_core *pkc,
void (*func) (unsigned long))
 {
-   init_timer(&pkc->retire_blk_timer);
-   pkc->retire_blk_timer.data = (long)po;
-   pkc->retire_blk_timer.function = func;
+   setup_timer(&pkc->retire_blk_timer, func, (long)po);
pkc->retire_blk_timer.expires = jiffies;
 }
 
-- 
2.7.4

Re: [net-next 1/2] dummy: add device MTU validation check

2017-09-22 Thread Eric Dumazet

On Fri, 2017-09-22 at 10:56 +0200, Sabrina Dubroca wrote:
> 2017-09-21, 08:02:18 -0700, Eric Dumazet wrote:
> > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote:
> > > Currently, any mtu value can be assigned when adding a new dummy device:
> > > [~]# ip link add name dummy1 mtu 10 type dummy
> > > [~]# ip link show dummy1
> > > 15: dummy1:  mtu 10 qdisc noop state DOWN mode 
> > > DEFAULT group default qlen 1000
> > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff
> > > 
> > > This patch adds device MTU validation check.
> > 
> > What is wrong with big MTU on dummy ?
> 
> It looks like the "centralize MTU checking" series broke that, but
> only for changing the MTU on an existing dummy device. Commit
> a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy uses,
> but there was no MTU check in dummy prior to that commit.
> 

It looks like we accept big mtu on loopback, right ?

lpaa23:~# ifconfig lo mtu 10
lpaa23:~# ifconfig lo
loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:10  Metric:1
  RX packets:3823 errors:0 dropped:0 overruns:0 frame:0
  TX packets:3823 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:759159 (759.1 KB)  TX bytes:759159 (759.1 KB)

Also we accept very small MTU as well (although this automatically
removes IP addresses, as one would expect)

lpaa23:~# ifconfig lo mtu 50
lpaa23:~# ifconfig lo
loLink encap:Local Loopback  
  UP LOOPBACK RUNNING  MTU:50  Metric:1
  RX packets:4052 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4052 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:806274 (806.2 KB)  TX bytes:806274 (806.2 KB)



So, why dummy devices would not accept bit MTU ?

Do we have some fundamental assumption in the stack ?

If yes, we need to fix loopback urgently, it is more important than
dummy.

Thanks.

Re: [PATCH net-next] net: mvpp2: phylink support

2017-09-22 Thread Russell King - ARM Linux

On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote:
> Convert the PPv2 driver to use phylink, which models the MAC to PHY
> link. The phylink support is made such a way the GoP link IRQ can still
> be used: the two modes are incompatible and the GoP link IRQ will be
> used if no PHY is described in the device tree. This is the same
> behaviour as before.

This makes no sense.  The point of phylink is to be able to support SFP
cages, and SFP cages do not have a PHY described in DT.  So, when you
want to use phylink because of SFP, you can't, because if you omit
the PHY the driver avoids using phylink.

> +static void mvpp2_phylink_validate(struct net_device *dev,
> +unsigned long *supported,
> +struct phylink_link_state *state)
> +{
> + __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> +
> + phylink_set_port_modes(mask);
> +
> + phylink_set(mask, Autoneg);
> + phylink_set(mask, Pause);
> + phylink_set(mask, Asym_Pause);
> +
> + phylink_set(mask, 10baseT_Half);
> + phylink_set(mask, 10baseT_Full);
> + phylink_set(mask, 100baseT_Half);
> + phylink_set(mask, 100baseT_Full);
> + phylink_set(mask, 1000baseT_Half);
> + phylink_set(mask, 1000baseT_Full);
> + phylink_set(mask, 1000baseX_Full);
> + phylink_set(mask, 1baseKR_Full);
> +
> + bitmap_and(supported, supported, mask, __ETHTOOL_LINK_MODE_MASK_NBITS);
> + bitmap_and(state->advertising, state->advertising, mask,
> +__ETHTOOL_LINK_MODE_MASK_NBITS);
> +}

I don't think you've understood this despite the comments in the header
file.  What you describe above basically means you don't support any
kind of copper connection at 10G speeds, or any fiber modes at 10G
speeds either.

You need to set 1baseT_Full for copper, 1baseSR_Full,
1baseLR_Full, 1baseLRM_Full etc for fiber.  Had you looked at
my modifications for Marvell's mvpp2x driver you'd have spotted this...

> +
> +static int mvpp2_phylink_mac_link_state(struct net_device *dev,
> + struct phylink_link_state *state)
> +{
> + struct mvpp2_port *port = netdev_priv(dev);
> + u32 val;
> +
> + if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> + port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> + return 0;

You're blocking this for 1000base-X and 10G connections, which is not
correct.  The expectation is that this function returns the current
MAC state irrespective of the interface mode.

> +
> + val = readl(port->base + MVPP2_GMAC_STATUS0);
> +
> + state->an_complete = !!(val & MVPP2_GMAC_STATUS0_AN_COMPLETE);
> + state->link = !!(val & MVPP2_GMAC_STATUS0_LINK_UP);
> + state->duplex = !!(val & MVPP2_GMAC_STATUS0_FULL_DUPLEX);
> +
> + if (val & MVPP2_GMAC_STATUS0_GMII_SPEED)
> + state->speed = SPEED_1000;
> + else
> + state->speed = (val & MVPP2_GMAC_STATUS0_MII_SPEED) ?
> +SPEED_100 : SPEED_10;
> +
> + state->pause = 0;
> + if (val & MVPP2_GMAC_STATUS0_RX_PAUSE)
> + state->pause |= MLO_PAUSE_RX;
> + if (val & MVPP2_GMAC_STATUS0_TX_PAUSE)
> + state->pause |= MLO_PAUSE_TX;
> +
> + return 1;
> +}
> +
> +static void mvpp2_mac_an_restart(struct net_device *dev)
> +{
> + struct mvpp2_port *port = netdev_priv(dev);
> + u32 val;
> +
> + if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> + port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> + return;

This prevents AN restart in 1000base-X mode, which is exactly the
mode that you need to do this.  SGMII doesn't care, and RGMII doesn't
have inband AN.

> +
> + val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> + val |= MVPP2_GMAC_IN_BAND_RESTART_AN;
> + writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> +}
> +
> +static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
> +  const struct phylink_link_state *state)
> +{
> + struct mvpp2_port *port = netdev_priv(dev);
> + u32 val;
> +
> + /* disable current port for reconfiguration */
> + mvpp2_interrupts_disable(port);
> + netif_carrier_off(port->dev);
> + mvpp2_port_disable(port);
> + phy_power_off(port->comphy);
> +
> + /* comphy reconfiguration */
> + port->phy_interface = state->interface;
> + mvpp22_comphy_init(port);
> +
> + /* gop/mac reconfiguration */
> + mvpp22_gop_init(port);
> + mvpp2_port_mii_set(port);
> +
> + if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> + port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> + return;

Again, 1000base-X is excluded, which will break it.  You do need
to avoid touching the GMAC for 10G connections however.

> +
> + val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> + val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED |
> +  MVPP2

[PATCH] net: stmmac: Meet alignment requirements for DMA

2017-09-22 Thread Matt Redfearn

According to Documentation/DMA-API.txt:
 Warnings:  Memory coherency operates at a granularity called the cache
 line width.  In order for memory mapped by this API to operate
 correctly, the mapped region must begin exactly on a cache line
 boundary and end exactly on one (to prevent two separately mapped
 regions from sharing a single cache line).  Since the cache line size
 may not be known at compile time, the API will not enforce this
 requirement.  Therefore, it is recommended that driver writers who
 don't take special care to determine the cache line size at run time
 only map virtual regions that begin and end on page boundaries (which
 are guaranteed also to be cache line boundaries).

On some systems where DMA is non-coherent and requires software
writeback / invalidate of the caches, we must ensure that
dma_(un)map_single is called with a cacheline aligned buffer and a
length of a whole number of cachelines.

To address the alignment requirements of DMA buffers, keep a separate
entry in stmmac_rx_queue for the aligned skbuff head. Use this for
dma_map_single, such that the address meets the cacheline alignment
requirents. Use skb_headroom() to convert between rx_skbuff_head, the
aligned head of the buffer, and the packet data, rx_skbuff_dma.

Tested on a Creator Ci40 with Pistachio SoC.

Signed-off-by: Matt Redfearn 
---

 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 50 ---
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index a916e13624eb..dd26a724dee7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -67,6 +67,7 @@ struct stmmac_rx_queue {
struct dma_desc *dma_rx cacheline_aligned_in_smp;
struct sk_buff **rx_skbuff;
dma_addr_t *rx_skbuff_dma;
+   dma_addr_t *rx_skbuff_head;
unsigned int cur_rx;
unsigned int dirty_rx;
u32 rx_zeroc_thresh;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1763e48c84e2..da68eeff2a1c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1132,14 +1132,16 @@ static int stmmac_init_rx_buffers(struct stmmac_priv 
*priv, struct dma_desc *p,
return -ENOMEM;
}
rx_q->rx_skbuff[i] = skb;
-   rx_q->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data,
-   priv->dma_buf_sz,
-   DMA_FROM_DEVICE);
-   if (dma_mapping_error(priv->device, rx_q->rx_skbuff_dma[i])) {
+   rx_q->rx_skbuff_head[i] = dma_map_single(priv->device, skb->head,
+skb_headroom(skb) +
+priv->dma_buf_sz,
+DMA_FROM_DEVICE);
+   if (dma_mapping_error(priv->device, rx_q->rx_skbuff_head[i])) {
netdev_err(priv->dev, "%s: DMA mapping error\n", __func__);
dev_kfree_skb_any(skb);
return -EINVAL;
}
+   rx_q->rx_skbuff_dma[i] = rx_q->rx_skbuff_head[i] + skb_headroom(skb);
 
if (priv->synopsys_id >= DWMAC_CORE_4_00)
p->des0 = cpu_to_le32(rx_q->rx_skbuff_dma[i]);
@@ -1164,7 +1166,8 @@ static void stmmac_free_rx_buffer(struct stmmac_priv 
*priv, u32 queue, int i)
struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
 
if (rx_q->rx_skbuff[i]) {
-   dma_unmap_single(priv->device, rx_q->rx_skbuff_dma[i],
+   dma_unmap_single(priv->device, rx_q->rx_skbuff_head[i],
+skb_headroom(rx_q->rx_skbuff[i]) +
 priv->dma_buf_sz, DMA_FROM_DEVICE);
dev_kfree_skb_any(rx_q->rx_skbuff[i]);
}
@@ -1438,6 +1441,7 @@ static void free_dma_rx_desc_resources(struct stmmac_priv 
*priv)
  rx_q->dma_erx, rx_q->dma_rx_phy);
 
kfree(rx_q->rx_skbuff_dma);
+   kfree(rx_q->rx_skbuff_head);
kfree(rx_q->rx_skbuff);
}
 }
@@ -1500,6 +1504,12 @@ static int alloc_dma_rx_desc_resources(struct 
stmmac_priv *priv)
if (!rx_q->rx_skbuff_dma)
goto err_dma;
 
+   rx_q->rx_skbuff_head = kmalloc_array(DMA_RX_SIZE,
+sizeof(dma_addr_t),
+GFP_KERNEL);
+   if (!rx_q->rx_skbuff_head)
+   goto err_dma;
+
rx_q->rx_skbuff = kmalloc_array(DMA_RX_SIZE,
sizeof(struct sk_buff *),

[PATCH iproute2 v2] man: fix documentation for range of route table ID

2017-09-22 Thread Thomas Haller

Signed-off-by: Thomas Haller 
---
Changes in v2:
  - "0" is not a valid table ID.

 man/man8/ip-route.8.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index 803de3b9..705ceb20 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -322,7 +322,7 @@ normal routing tables.
 .P
 .B Route tables:
 Linux-2.x can pack routes into several routing tables identified
-by a number in the range from 1 to 2^31 or by name from the file
+by a number in the range from 1 to 2^32-1 or by name from the file
 .B @SYSCONFDIR@/rt_tables
 By default all normal routes are inserted into the
 .B main
-- 
2.13.5

Re: [PATCH] e1000: avoid null pointer dereference on invalid stat type

2017-09-22 Thread Dan Carpenter

On Thu, Sep 21, 2017 at 11:01:58PM +0100, Colin King wrote:
> @@ -1837,12 +1838,13 @@ static void e1000_get_ethtool_stats(struct net_device 
> *netdev,
>   p = (char *)adapter + stat->stat_offset;
>   break;
>   default:
> + p = NULL;
>   WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
> stat->type, i);
>   break;
>   }
>  
> - if (stat->sizeof_stat == sizeof(u64))
> + if (p && stat->sizeof_stat == sizeof(u64))
>   data[i] = *(u64 *)p;
>   else
>   data[i] = *(u32 *)p;
   

The else side will still crash.

regards,
dan carpenter

Re: [PATCH] e1000: avoid null pointer dereference on invalid stat type

2017-09-22 Thread Colin Ian King

On 22/09/17 12:50, Dan Carpenter wrote:
> On Thu, Sep 21, 2017 at 11:01:58PM +0100, Colin King wrote:
>> @@ -1837,12 +1838,13 @@ static void e1000_get_ethtool_stats(struct 
>> net_device *netdev,
>>  p = (char *)adapter + stat->stat_offset;
>>  break;
>>  default:
>> +p = NULL;
>>  WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
>>stat->type, i);
>>  break;
>>  }
>>  
>> -if (stat->sizeof_stat == sizeof(u64))
>> +if (p && stat->sizeof_stat == sizeof(u64))
>>  data[i] = *(u64 *)p;
>>  else
>>  data[i] = *(u32 *)p;
>
> 
> The else side will still crash.
> 
> regards,
> dan carpenter
> 
Thanks, stupid me. I'll fix that up.

Re: [net-next 1/2] dummy: add device MTU validation check

2017-09-22 Thread Sabrina Dubroca

2017-09-22, 04:05:09 -0700, Eric Dumazet wrote:
> On Fri, 2017-09-22 at 10:56 +0200, Sabrina Dubroca wrote:
> > 2017-09-21, 08:02:18 -0700, Eric Dumazet wrote:
> > > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote:
> > > > Currently, any mtu value can be assigned when adding a new dummy device:
> > > > [~]# ip link add name dummy1 mtu 10 type dummy
> > > > [~]# ip link show dummy1
> > > > 15: dummy1:  mtu 10 qdisc noop state DOWN mode 
> > > > DEFAULT group default qlen 1000
> > > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff
> > > > 
> > > > This patch adds device MTU validation check.
> > > 
> > > What is wrong with big MTU on dummy ?
> > 
> > It looks like the "centralize MTU checking" series broke that, but
> > only for changing the MTU on an existing dummy device. Commit
> > a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy uses,
> > but there was no MTU check in dummy prior to that commit.
> > 
> 
> It looks like we accept big mtu on loopback, right ?

Yes. I only meant that before commit a52ad514fdf3, there was no range
check on dummy's MTU. Commit 25e3e84b183a ("dummy: expend mtu range
for dummy device") and 8b1efc0f83f1 ("net: remove MTU limits on a few
ether_setup callers") fixed that only partially. It's the same with
ifb, btw, it didn't have any check before a52ad514fdf3, so we should
set min_mtu = max_mtu = 0.

-- 
Sabrina

[PATCH] MAINTAINERS: update git tree locations for ieee802154 subsystem

2017-09-22 Thread Stefan Schmidt

Patches for ieee802154 will go through my new trees towards netdev from
now on. The 6LoWPAN subsystem will stay as is (shared between ieee802154
and bluetooth) and go through the bluetooth tree as usual.

Signed-off-by: Stefan Schmidt 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2281af4b41b6..fc7313daf4b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6643,8 +6643,8 @@ M:Alexander Aring 
 M: Stefan Schmidt 
 L: linux-w...@vger.kernel.org
 W: http://wpan.cakelab.org/
-T: git 
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git
-T: git 
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next.git
 S: Maintained
 F: net/ieee802154/
 F: net/mac802154/
-- 
2.13.5

Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack

2017-09-22 Thread Jiri Pirko

Thu, Sep 21, 2017 at 01:21:53PM CEST, linyunsh...@huawei.com wrote:
>When using tc qdisc to configure DCB parameter, dcb_ops->setup_tc
>is used to tell hclge_dcb module to do the setup.
>When using lldptool to configure DCB parameter, hclge_dcb module
>call the client_ops->setup_tc to tell network stack which queue
>and priority is using for specific tc.
>
>Signed-off-by: Yunsheng Lin 

[...]



>-static int hns3_setup_tc(struct net_device *netdev, u8 tc)
>+static int hns3_setup_tc(struct net_device *netdev, u8 tc, u8 *prio_tc)
> {
>   struct hns3_nic_priv *priv = netdev_priv(netdev);
>   struct hnae3_handle *h = priv->ae_handle;
>   struct hnae3_knic_private_info *kinfo = &h->kinfo;
>+  bool if_running = netif_running(netdev);
>   unsigned int i;
>   int ret;
> 
>   if (tc > HNAE3_MAX_TC)
>   return -EINVAL;
> 
>-  if (kinfo->num_tc == tc)
>-  return 0;
>-
>   if (!netdev)
>   return -EINVAL;
> 
>-  if (!tc) {
>+  if (if_running) {
>+  (void)hns3_nic_net_stop(netdev);
>+  msleep(100);
>+  }
>+
>+  ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
>+  kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;

This is most odd. Why do you call dcb_ops from ndo_setup_tc callback?
Why are you mixing this together? prio->tc mapping can be done
directly in dcbnl

Re: tg3 pxe weirdness

2017-09-22 Thread Berend De Schouwer

On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
> On Thu, Sep 21, 2017 at 7:53 PM, Berend De Schouwer
>  wrote:
> > Hi,
> > 
> > I've got a machine with a Broadcom bcm5762c, using the tg3 driver,
> > that
> > fails to receive network packets under some very specific
> > conditions.
> > 
> > 
> > Berend
> 
> Can you please share below details?
> 1) Model and Manufacturer of the system
> 2) Linux distro/kernel used?

4.13.3 mainline locks up, but network dumps confirm it gets further. 
The second DHCP query completes, and the first NFS mount completes.

I have to rely on network dumps since on PXE and SATA boot it breaks
VGA output, so I can't see what it's doing, or why it stops.

RE: [net-next 1/2] dummy: add device MTU validation check

2017-09-22 Thread 张胜举

> -Original Message-
> From: Sabrina Dubroca [mailto:s...@queasysnail.net]
> Sent: 2017年9月22日 20:23
> To: Eric Dumazet 
> Cc: Jarod Wilson ; Zhang Shengju
> ; da...@davemloft.net;
> will...@google.com; step...@networkplumber.org;
> netdev@vger.kernel.org
> Subject: Re: [net-next 1/2] dummy: add device MTU validation check
> 
> 2017-09-22, 04:05:09 -0700, Eric Dumazet wrote:
> > On Fri, 2017-09-22 at 10:56 +0200, Sabrina Dubroca wrote:
> > > 2017-09-21, 08:02:18 -0700, Eric Dumazet wrote:
> > > > On Thu, 2017-09-21 at 21:32 +0800, Zhang Shengju wrote:
> > > > > Currently, any mtu value can be assigned when adding a new dummy
> device:
> > > > > [~]# ip link add name dummy1 mtu 10 type dummy [~]# ip link
> > > > > show dummy1
> > > > > 15: dummy1:  mtu 10 qdisc noop state
> DOWN mode DEFAULT group default qlen 1000
> > > > > link/ether 0a:61:6b:16:14:ce brd ff:ff:ff:ff:ff:ff
> > > > >
> > > > > This patch adds device MTU validation check.
> > > >
> > > > What is wrong with big MTU on dummy ?
> > >
> > > It looks like the "centralize MTU checking" series broke that, but
> > > only for changing the MTU on an existing dummy device. Commit
> > > a52ad514fdf3 defined min_mtu/max_mtu in ether_setup, which dummy
> > > uses, but there was no MTU check in dummy prior to that commit.
> > >
> >
> > It looks like we accept big mtu on loopback, right ?
> 
> Yes. I only meant that before commit a52ad514fdf3, there was no range check
> on dummy's MTU. Commit 25e3e84b183a ("dummy: expend mtu range for
> dummy device") and 8b1efc0f83f1 ("net: remove MTU limits on a few
> ether_setup callers") fixed that only partially. It's the same with ifb, btw, 
> it
> didn't have any check before a52ad514fdf3, so we should set min_mtu =
> max_mtu = 0.
> 
> --
> Sabrina

I agree, dummy and ifb device should not have any limit on mtu, just like 
loopback device.
I will send v2 patch, and set min/max_mtu to zero for dummy and ifb device, 
thanks.

ZSJ

Re: tg3 pxe weirdness

2017-09-22 Thread Berend De Schouwer

On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
> On Thu, Sep 21, 2017 at 7:53 PM, Berend De Schouwer
>  wrote:
> > Hi,
> > 
> > I've got a machine with a Broadcom bcm5762c, using the tg3 driver,
> > that
> > fails to receive network packets under some very specific
> > conditions.
> > 
> > 
> > Berend
> 
> Can you please share below details?
> 1) Model and Manufacturer of the system
> 2) Linux distro/kernel used?

3.10.107 behaves like CentOS' kernel.

Re: [patch net-next 07/12] mlxsw: spectrum: Add the multicast routing offloading logic

2017-09-22 Thread Andrew Lunn

On Fri, Sep 22, 2017 at 11:36:59AM +0300, Yotam Gigi wrote:
> On 09/21/2017 06:26 PM, Andrew Lunn wrote:
> >> +static void mlxsw_sp_mr_route_stats_update(struct mlxsw_sp *mlxsw_sp,
> >> + struct mlxsw_sp_mr_route *mr_route)
> >> +{
> >> +  struct mlxsw_sp_mr *mr = mlxsw_sp->mr;
> >> +  u64 packets, bytes;
> >> +
> >> +  if (mr_route->route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP)
> >> +  return;
> >> +
> >> +  mr->mr_ops->route_stats(mlxsw_sp, mr_route->route_priv, &packets,
> >> +  &bytes);
> >> +
> >> +  switch (mr_route->mr_table->proto) {
> >> +  case MLXSW_SP_L3_PROTO_IPV4:
> >> +  mr_route->mfc4->mfc_un.res.pkt = packets;
> >> +  mr_route->mfc4->mfc_un.res.bytes = bytes;
> > What about wrong_if and lastuse? 

Hi Yotam

> wronf_if is updated by ipmr as it is trapped to the CPU.

Great.

> We did not address lastuse currently, though it can be easily
> addressed here.

Please do. I've written multicast routing daemons, where i use it to
flush out MFCs which are no longer in use. Having it always 0 is going
to break daemons.

> > Is an mfc with iif on the host, not the switch, not offloaded?
> 
> 
> I am not sure I followed. What do you mean MFC with iif on the host? you mean
> MFC with iif that is an external NIC which is not part of the spectrum ASIC?

Yes. We probably have different perspectives on the world. To
Mellanox, everything is a switch in a box. In the DSA world, we tend
to think of having a general purpose machine which also has a switch
connected. Think of a wireless access point, set top box, passenger
entertainment system. We have applications on the general purpose
computer, we have wifi interfaces, cable modems, etc. Think about all
the different packages in LEDE. We might have a multicast video
stream, coming from the cable modem being sent over ports of the
switch to clients.

So when i look at these patches, i try to make sure the general use
cases works, not just the plain boring Ethernet switch box use cases
:-)

> in this case, the route will not be offloaded and all traffic will
> pass in slowpath.

O.K. I was just thinking if those counters need to be +=, not =.  But
either the iif is on the host, or it is in the switch. It cannot be
both. So = is O.K.

Thanks
Andrew

[PATCH net 1/2] l2tp: ensure sessions are freed after their PPPOL2TP socket

2017-09-22 Thread Guillaume Nault

If l2tp_tunnel_delete() or l2tp_tunnel_closeall() deletes a session
right after pppol2tp_release() orphaned its socket, then the 'sock'
variable of the pppol2tp_session_close() callback is NULL. Yet the
session is still used by pppol2tp_release().

Therefore we need to take an extra reference in any case, to prevent
l2tp_tunnel_delete() or l2tp_tunnel_closeall() from freeing the session.

Since the pppol2tp_session_close() callback is only set if the session
is associated to a PPPOL2TP socket and that both l2tp_tunnel_delete()
and l2tp_tunnel_closeall() hold the PPPOL2TP socket before calling
pppol2tp_session_close(), we're sure that pppol2tp_session_close() and
pppol2tp_session_destruct() are paired and called in the right order.
So the reference taken by the former will be released by the later.

Signed-off-by: Guillaume Nault 
---
 net/l2tp/l2tp_ppp.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 50e3ee9a9d61..bc6e8bfc5be4 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -437,11 +437,11 @@ static void pppol2tp_session_close(struct l2tp_session 
*session)
 
BUG_ON(session->magic != L2TP_SESSION_MAGIC);
 
-   if (sock) {
+   if (sock)
inet_shutdown(sock, SEND_SHUTDOWN);
-   /* Don't let the session go away before our socket does */
-   l2tp_session_inc_refcount(session);
-   }
+
+   /* Don't let the session go away before our socket does */
+   l2tp_session_inc_refcount(session);
 }
 
 /* Really kill the session socket. (Called from sock_put() if
-- 
2.14.1

[PATCH net 2/2] l2tp: fix race between l2tp_session_delete() and l2tp_tunnel_closeall()

2017-09-22 Thread Guillaume Nault

There are several ways to remove L2TP sessions:

  * deleting a session explicitly using the netlink interface (with
L2TP_CMD_SESSION_DELETE),
  * deleting the session's parent tunnel (either by closing the
tunnel's file descriptor or using the netlink interface),
  * closing the PPPOL2TP file descriptor of a PPP pseudo-wire.

In some cases, when these methods are used concurrently on the same
session, the session can be removed twice, leading to use-after-free
bugs.

This patch adds a 'dead' flag, used by l2tp_session_delete() and
l2tp_tunnel_closeall() to prevent them from stepping on each other's
toes.

The session deletion path used when closing a PPPOL2TP file descriptor
doesn't need to be adapted. It already has to ensure that a session
remains valid for the lifetime of its PPPOL2TP file descriptor.
So it takes an extra reference on the session in the ->session_close()
callback (pppol2tp_session_close()), which is eventually dropped
in the ->sk_destruct() callback of the PPPOL2TP socket
(pppol2tp_session_destruct()).
Still, __l2tp_session_unhash() and l2tp_session_queue_purge() can be
called twice and even concurrently for a given session, but thanks to
proper locking and re-initialisation of list fields, this is not an
issue.

Signed-off-by: Guillaume Nault 
---
 net/l2tp/l2tp_core.c | 6 ++
 net/l2tp/l2tp_core.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index ee485df73ccd..d8c2a89a76e1 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1314,6 +1314,9 @@ void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 
hlist_del_init(&session->hlist);
 
+   if (test_and_set_bit(0, &session->dead))
+   goto again;
+
if (session->ref != NULL)
(*session->ref)(session);
 
@@ -1750,6 +1753,9 @@ EXPORT_SYMBOL_GPL(__l2tp_session_unhash);
  */
 int l2tp_session_delete(struct l2tp_session *session)
 {
+   if (test_and_set_bit(0, &session->dead))
+   return 0;
+
if (session->ref)
(*session->ref)(session);
__l2tp_session_unhash(session);
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index a305e0c5925a..70a12df40a5f 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -76,6 +76,7 @@ struct l2tp_session_cfg {
 struct l2tp_session {
int magic;  /* should be
 * L2TP_SESSION_MAGIC */
+   longdead;
 
struct l2tp_tunnel  *tunnel;/* back pointer to tunnel
 * context */
-- 
2.14.1

[PATCH net 0/2] l2tp: fix some races in session deletion

2017-09-22 Thread Guillaume Nault

L2TP provides several interfaces for deleting sessions. Using two of
them concurrently can lead to use-after-free bugs.

Patch #2 uses a flag to prevent double removal of L2TP sessions.
Patch #1 fixes a bug found in the way. Fixing this bug is also
necessary for patch #2 to handle all cases.


This issue is similar to the tunnel deletion bug being worked on by
Sabrina: https://patchwork.ozlabs.org/patch/814173/

Guillaume Nault (2):
  l2tp: ensure sessions are freed after their PPPOL2TP socket
  l2tp: fix race between l2tp_session_delete() and
l2tp_tunnel_closeall()

 net/l2tp/l2tp_core.c | 6 ++
 net/l2tp/l2tp_core.h | 1 +
 net/l2tp/l2tp_ppp.c  | 8 
 3 files changed, 11 insertions(+), 4 deletions(-)

-- 
2.14.1

[PATCH][V2] e1000: avoid null pointer dereference on invalid stat type

2017-09-22 Thread Colin King

From: Colin Ian King 

Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously.  Fix this by nullify pointer p if a stat type is
invalid and only setting data if p is not null.

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index ec8aa4562cc9..724c93a6ea92 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1824,7 +1824,7 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
int i;
-   char *p = NULL;
+   char *p;
const struct e1000_stats *stat = e1000_gstrings_stats;
 
e1000_update_stats(adapter);
@@ -1837,16 +1837,18 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
p = (char *)adapter + stat->stat_offset;
break;
default:
+   p = NULL;
WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
  stat->type, i);
break;
}
 
-   if (stat->sizeof_stat == sizeof(u64))
-   data[i] = *(u64 *)p;
-   else
-   data[i] = *(u32 *)p;
-
+   if (p) {
+   if (stat->sizeof_stat == sizeof(u64))
+   data[i] = *(u64 *)p;
+   else
+   data[i] = *(u32 *)p;
+   }
stat++;
}
 /* BUG_ON(i != E1000_STATS_LEN); */
-- 
2.14.1

Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Edward Cree

On 22/09/17 00:11, Y Song wrote:
> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree  wrote:
>> On 21/09/17 20:44, Alexei Starovoitov wrote:
>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
 More intuitive, but agree on the from_be/le. Maybe we should
 just drop the "to_" prefix altogether, and leave the rest as is since
 it's not surrounded by braces, it's also not a cast but rather an op.
>> That works for me.
>>> 'be16 r4' is ambiguous regarding upper bits.
>>>
>>> what about my earlier suggestion:
>>> r4 = (be16) (u16) r4
>>> r4 = (le64) (u64) r4
>>>
>>> It will be pretty clear what instruction is doing (that upper bits become 
>>> zero).
>> Trouble with that is that's very *not* what C will do with those casts
>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>  but that's quite an ugly mongrel.
>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>  cleanest and clearest.  Should it be
>> r4 = be16 r4
>>  or just
>> be16 r4
>> ?  Personally I incline towards the latter, but admit it doesn't really
>>  match the syntax of other opcodes.
> I did some quick prototyping in llvm to make sure we have a syntax
> llvm is happy. Apparently, llvm does not like the syntax
>r4 = be16 r4orr4 = (be16) (u16) r4.
>
> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>
> // Verify that any operand is only mentioned once.
Wait, how do you deal with (totally legal) r4 += r4?
Or r4 = *(r4 +0)?
Even jumps can have src_reg == dst_reg, though it doesn't seem useful.

[PATCH v4 2/3] ipv4: Namespaceify tcp_fastopen_key knob

2017-09-22 Thread Haishuang Yan

Different namespace application might require different tcp_fastopen_key
independently of the host.

David Miller pointed out there is a leak without releasing the context
of tcp_fastopen_key during netns teardown. So add the release action in
exit_batch path.

Tested:
1. Container namespace:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
2817fff2-f803cf97-eadfd1f3-78c0992b

cookie key in tcp syn packets:
Fast Open Cookie
Kind: TCP Fast Open Cookie (34)
Length: 10
Fast Open Cookie: 1e5dd82a8c492ca9

2. Host:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
107d7c5f-68eb2ac7-02fb06e6-ed341702

cookie key in tcp syn packets:
Fast Open Cookie
Kind: TCP Fast Open Cookie (34)
Length: 10
Fast Open Cookie: e213c02bf0afbc8a

Signed-off-by: Haishuang Yan 
---
 include/net/netns/ipv4.h   |  4 +++
 include/net/tcp.h  |  6 ++---
 net/ipv4/af_inet.c |  2 +-
 net/ipv4/sysctl_net_ipv4.c | 26 +--
 net/ipv4/tcp.c |  2 +-
 net/ipv4/tcp_fastopen.c| 64 +++---
 net/ipv4/tcp_ipv4.c|  6 +
 7 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index ce6dde0..66b8335 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -36,6 +36,8 @@ struct inet_timewait_death_row {
int sysctl_max_tw_buckets;
 };
 
+struct tcp_fastopen_context;
+
 struct netns_ipv4 {
 #ifdef CONFIG_SYSCTL
struct ctl_table_header *forw_hdr;
@@ -128,6 +130,8 @@ struct netns_ipv4 {
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
int sysctl_tcp_fastopen;
+   struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
+   spinlock_t tcp_fastopen_ctx_lock;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f628967..e27bd18 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1556,13 +1556,13 @@ struct tcp_fastopen_request {
 };
 void tcp_free_fastopen_req(struct tcp_sock *tp);
 
-extern struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
-int tcp_fastopen_reset_cipher(void *key, unsigned int len);
+void tcp_fastopen_ctx_destroy(struct net *net);
+int tcp_fastopen_reset_cipher(struct net *net, void *key, unsigned int len);
 void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb);
 struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
  struct request_sock *req,
  struct tcp_fastopen_cookie *foc);
-void tcp_fastopen_init_key_once(bool publish);
+void tcp_fastopen_init_key_once(struct net *net);
 bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss,
 struct tcp_fastopen_cookie *cookie);
 bool tcp_fastopen_defer_connect(struct sock *sk, int *err);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ddd126d..43a1bbe 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -222,7 +222,7 @@ int inet_listen(struct socket *sock, int backlog)
(tcp_fastopen & TFO_SERVER_ENABLE) &&
!inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) {
fastopen_queue_tune(sk, backlog);
-   tcp_fastopen_init_key_once(true);
+   tcp_fastopen_init_key_once(sock_net(sk));
}
 
err = inet_csk_listen_start(sk, backlog);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e31e853c..20e19fe 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -251,10 +251,12 @@ static int proc_allowed_congestion_control(struct 
ctl_table *ctl,
return ret;
 }
 
-static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
+static int proc_tcp_fastopen_key(struct ctl_table *table, int write,
 void __user *buffer, size_t *lenp,
 loff_t *ppos)
 {
+   struct net *net = container_of(table->data, struct net,
+   ipv4.sysctl_tcp_fastopen);
struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) };
struct tcp_fastopen_context *ctxt;
int ret;
@@ -265,7 +267,7 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int 
write,
return -ENOMEM;
 
rcu_read_lock();
-   ctxt = rcu_dereference(tcp_fastopen_ctx);
+   ctxt = rcu_dereference(net->ipv4.tcp_fastopen_ctx);
if (ctxt)
memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH);
else
@@ -282,12 +284,7 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, 
int write,
ret = -EINVAL;
goto bad_key;
}
-   /* Generate a dummy secret but don't publish it. This
-* is needed so we don't regenerate a new key on the
-* first invocation of t

[PATCH v4 3/3] ipv4: Namespaceify tcp_fastopen_blackhole_timeout knob

2017-09-22 Thread Haishuang Yan

Different namespace application might require different time period in
second to disable Fastopen on active TCP sockets.

Tested:
Simulate following similar situation that the server's data gets dropped
after 3WHS.
C  syn-data ---> S
C <--- syn/ack - S
C  ack > S
S (accept & write)
C?  X <- data -- S
[retry and timeout]

And then print netstat of TCPFastOpenBlackhole, the counter increased as
expected when the firewall blackhole issue is detected and active TFO is
disabled.
# cat /proc/net/netstat | awk '{print $91}'
TCPFastOpenBlackhole
1

Signed-off-by: Haishuang Yan 
---
 include/net/netns/ipv4.h   |  3 +++
 net/ipv4/sysctl_net_ipv4.c | 20 +++-
 net/ipv4/tcp_fastopen.c| 30 +++---
 net/ipv4/tcp_ipv4.c|  2 ++
 4 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 66b8335..d76edde 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -132,6 +132,9 @@ struct netns_ipv4 {
int sysctl_tcp_fastopen;
struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
spinlock_t tcp_fastopen_ctx_lock;
+   unsigned int sysctl_tcp_fastopen_blackhole_timeout;
+   atomic_t tfo_active_disable_times;
+   unsigned long tfo_active_disable_stamp;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
int sysctl_udp_l3mdev_accept;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 20e19fe..cac8dd3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -355,11 +355,13 @@ static int proc_tfo_blackhole_detect_timeout(struct 
ctl_table *table,
 void __user *buffer,
 size_t *lenp, loff_t *ppos)
 {
+   struct net *net = container_of(table->data, struct net,
+   ipv4.sysctl_tcp_fastopen_blackhole_timeout);
int ret;
 
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (write && ret == 0)
-   tcp_fastopen_active_timeout_reset();
+   atomic_set(&net->ipv4.tfo_active_disable_times, 0);
 
return ret;
 }
@@ -398,14 +400,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
.proc_handler   = proc_dointvec
},
{
-   .procname   = "tcp_fastopen_blackhole_timeout_sec",
-   .data   = &sysctl_tcp_fastopen_blackhole_timeout,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_tfo_blackhole_detect_timeout,
-   .extra1 = &zero,
-   },
-   {
.procname   = "tcp_abort_on_overflow",
.data   = &sysctl_tcp_abort_on_overflow,
.maxlen = sizeof(int),
@@ -1083,6 +1077,14 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
.maxlen = ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
.proc_handler   = proc_tcp_fastopen_key,
},
+   {
+   .procname   = "tcp_fastopen_blackhole_timeout_sec",
+   .data   = 
&init_net.ipv4.sysctl_tcp_fastopen_blackhole_timeout,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_tfo_blackhole_detect_timeout,
+   .extra1 = &zero,
+   },
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
{
.procname   = "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4eae44a..de470e7 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -422,25 +422,16 @@ bool tcp_fastopen_defer_connect(struct sock *sk, int *err)
  * TFO connection with data exchanges.
  */
 
-/* Default to 1hr */
-unsigned int sysctl_tcp_fastopen_blackhole_timeout __read_mostly = 60 * 60;
-static atomic_t tfo_active_disable_times __read_mostly = ATOMIC_INIT(0);
-static unsigned long tfo_active_disable_stamp __read_mostly;
-
 /* Disable active TFO and record current jiffies and
  * tfo_active_disable_times
  */
 void tcp_fastopen_active_disable(struct sock *sk)
 {
-   atomic_inc(&tfo_active_disable_times);
-   tfo_active_disable_stamp = jiffies;
-   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENBLACKHOLE);
-}
+   struct net *net = sock_net(sk);
 
-/* Reset tfo_active_disable_times to 0 */
-void tcp_fastopen_active_timeout_reset(void)
-{
-   atomic_set(&tfo_active_disable_times, 0);
+   atomic_inc(&net->ipv4.tfo_active_disable_times);
+   net->ipv4.tfo_active_disable_stamp = jiffies;
+   NET_INC_STATS(net, LINUX_MIB_TCPFASTOPENBLACKHOLE);
 }
 
 /* Calculate timeout for tfo active disable
@@ -449,17 +440,18 @@ void tcp_fastopen_active_timeout_reset(void)
  */
 bool tcp_fastopen_active_should_disable(struct sock *sk)
 {
-   int tfo_da_times = atomic_read(&tfo_active_

[PATCH v4 1/3] ipv4: Namespaceify tcp_fastopen knob

2017-09-22 Thread Haishuang Yan

Different namespace application might require enable TCP Fast Open
feature independently of the host.

This patch series continues making more of the TCP Fast Open related
sysctl knobs be per net-namespace.

Reported-by: Luca BRUNO 
Signed-off-by: Haishuang Yan 

---
Change since v4:
  * Fix potential leak issue
  * Fix typo error
  * Split the patches to be per-sysctl
  * Remove unrelated change by mistake
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h  |  1 -
 net/ipv4/af_inet.c |  7 ---
 net/ipv4/sysctl_net_ipv4.c | 14 +++---
 net/ipv4/tcp.c |  4 ++--
 net/ipv4/tcp_fastopen.c| 11 +--
 net/ipv4/tcp_ipv4.c|  2 ++
 7 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..ce6dde0 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
int sysctl_tcp_timestamps;
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
+   int sysctl_tcp_fastopen;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..f628967 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -240,7 +240,6 @@
 
 
 /* sysctl variables for tcp */
-extern int sysctl_tcp_fastopen;
 extern int sysctl_tcp_retrans_collapse;
 extern int sysctl_tcp_stdurg;
 extern int sysctl_tcp_rfc1337;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e31108e..ddd126d 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -195,7 +195,7 @@ int inet_listen(struct socket *sock, int backlog)
 {
struct sock *sk = sock->sk;
unsigned char old_state;
-   int err;
+   int err, tcp_fastopen;
 
lock_sock(sk);
 
@@ -217,8 +217,9 @@ int inet_listen(struct socket *sock, int backlog)
 * because the socket was in TCP_LISTEN state previously but
 * was shutdown() rather than close().
 */
-   if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
-   (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
+   tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
+   if ((tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
+   (tcp_fastopen & TFO_SERVER_ENABLE) &&
!inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) {
fastopen_queue_tune(sk, backlog);
tcp_fastopen_init_key_once(true);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..e31e853c 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -401,13 +401,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
.proc_handler   = proc_dointvec
},
{
-   .procname   = "tcp_fastopen",
-   .data   = &sysctl_tcp_fastopen,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
.procname   = "tcp_fastopen_key",
.mode   = 0600,
.maxlen = ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
.mode   = 0644,
.proc_handler   = proc_dointvec
},
+   {
+   .procname   = "tcp_fastopen",
+   .data   = &init_net.ipv4.sysctl_tcp_fastopen,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
{
.procname   = "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..dac56c4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1126,7 +1126,7 @@ static int tcp_sendmsg_fastopen(struct sock *sk, struct 
msghdr *msg,
struct sockaddr *uaddr = msg->msg_name;
int err, flags;
 
-   if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) ||
+   if (!(sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) ||
(uaddr && msg->msg_namelen >= sizeof(uaddr->sa_family) &&
 uaddr->sa_family == AF_UNSPEC))
return -EOPNOTSUPP;
@@ -2759,7 +2759,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_FASTOPEN_CONNECT:
if (val > 1 || val < 0) {
err = -EINVAL;
-   } else if (sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) {
+   } else if (net->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) {
if (sk->sk_state == TCP_CLOSE)
tp->fastopen_connect = val;
else
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_

[PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size

2017-09-22 Thread Colin King

From: Colin Ian King 

Don't populate const array ucode_ofdm_rates on the stack, instead make it
static. Makes the object code smaller by 100 bytes:

Before:
   textdata bss dec hex filename
  39482 564   0   400469c6e phy_cmn.o

After
   textdata bss dec hex filename
  39326 620   0   399469c0a phy_cmn.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
index 1c4e9dd57960..3a13d176b221 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
@@ -1916,7 +1916,7 @@ void wlc_phy_txpower_update_shm(struct brcms_phy *pi)
 pi->hwpwr_txcur);
 
for (j = TXP_FIRST_OFDM; j <= TXP_LAST_OFDM; j++) {
-   const u8 ucode_ofdm_rates[] = {
+   static const u8 ucode_ofdm_rates[] = {
0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c
};
offset = wlapi_bmac_rate_shm_offset(
-- 
2.14.1

Re: [pktgen script v2 0/2] Add a pktgen sample script of NUMA awareness

2017-09-22 Thread Jesper Dangaard Brouer

On Mon, 18 Sep 2017 11:06:21 +0200
Jesper Dangaard Brouer  wrote:

> On Sun, 17 Sep 2017 20:36:36 +0800 Robert Hoo  
> wrote:
> 
> > Change log
> > v2:
> >  Rebased to 
> > https://github.com/netoptimizer/network-testing/tree/master/pktgen  
> 
> Hi Robert,
> 
> Thank you for submitting this against my git tree[1]. I skimmed the
> patches and they looked okay.  I'll give them a test run, before I
> accept them into my tree.
> 
> Later I'll synchronize my pktgen scripts/git-tree with the kernel via
> regular patches against DaveM's net-next tree[2] (and I'll try to
> remember to give you author credit).
> 
> [1] https://github.com/netoptimizer/network-testing/tree/master/pktgen
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/samples/pktgen

FYI, I've applied and pushed these patches to my tree.
 https://github.com/netoptimizer/network-testing/commits?author=robert-hoo
 https://github.com/netoptimizer/network-testing/commit/1b9b4b797a4f112
 https://github.com/netoptimizer/network-testing/commit/65efc2352f63dde
 https://github.com/netoptimizer/network-testing/commit/54eb5178aaf4031

I fixed the description a bit, and only made one simple change:
 https://github.com/netoptimizer/network-testing/commit/9ff58568b3f8c91

Thanks for working on improving the pktgen scripts :-)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver

2017-09-22 Thread Willem de Bruijn

> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file 
> *file, struct ifreq *ifr)
> if (tfile->detached)
> return -EINVAL;
>
> +   if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
> +   return -EPERM;
> +

This should perhaps be moved into the !dev branch, directly below the
ns_capable check.

> dev = __dev_get_by_name(net, ifr->ifr_name);
> if (dev) {
> if (ifr->ifr_flags & IFF_TUN_EXCL)
> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file 
> *file, struct ifreq *ifr)
> tun->flags = (tun->flags & ~TUN_FEATURES) |
> (ifr->ifr_flags & TUN_FEATURES);
>
> +   if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != 
> IFF_TAP)
> +   tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
> +

Similarly, this check only need to be performed in that branch.
Instead of reverting to non-frags mode, a tun_set_iff with the wrong
set of flags should probably fail hard.

Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver

2017-09-22 Thread Willem de Bruijn

On Thu, Sep 21, 2017 at 10:17 PM, Petar Penkov  wrote:
> Changes TUN driver to use napi_gro_receive() upon receiving packets
> rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these
> changes and operation is not affected if the flag is disabled.  SKBs
> are constructed upon packet arrival and are queued to be processed
> later.
>
> The new path was evaluated with a benchmark with the following setup:
> Open two tap devices and a receiver thread that reads in a loop for
> each device. Start one sender thread and pin all threads to different
> CPUs. Send 1M minimum UDP packets to each device and measure sending
> time for each of the sending methods:
> napi_gro_receive(): 4.90s
> netif_rx_ni():  4.90s
> netif_receive_skb():7.20s
>
> Signed-off-by: Petar Penkov 
> Cc: Eric Dumazet 
> Cc: Mahesh Bandewar 
> Cc: Willem de Bruijn 
> Cc: da...@davemloft.net
> Cc: ppen...@stanford.edu

Acked-by: Willem de Bruijn 

Thanks, Petar.

[PATCH] i40e: make const array patterns static, reduces object code size

2017-09-22 Thread Colin King

From: Colin Ian King 

Don't populate const array patterns on the stack, instead make it
static. Makes the object code smaller by over 60 bytes:

Before:
   textdata bss dec hex filename
   1953 496   02449 991 i40e_diag.o

After:
   textdata bss dec hex filename
   1798 584   02382 94e i40e_diag.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/intel/i40e/i40e_diag.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.c 
b/drivers/net/ethernet/intel/i40e/i40e_diag.c
index f141e78d409e..76ed56641864 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_diag.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_diag.c
@@ -36,7 +36,9 @@
 static i40e_status i40e_diag_reg_pattern_test(struct i40e_hw *hw,
u32 reg, u32 mask)
 {
-   const u32 patterns[] = {0x5A5A5A5A, 0xA5A5A5A5, 0x, 0x};
+   static const u32 patterns[] = {
+   0x5A5A5A5A, 0xA5A5A5A5, 0x, 0x
+   };
u32 pat, val, orig_val;
int i;
 
-- 
2.14.1

Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Y Song

On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree  wrote:
> On 22/09/17 00:11, Y Song wrote:
>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree  wrote:
>>> On 21/09/17 20:44, Alexei Starovoitov wrote:
 On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
> More intuitive, but agree on the from_be/le. Maybe we should
> just drop the "to_" prefix altogether, and leave the rest as is since
> it's not surrounded by braces, it's also not a cast but rather an op.
>>> That works for me.
 'be16 r4' is ambiguous regarding upper bits.

 what about my earlier suggestion:
 r4 = (be16) (u16) r4
 r4 = (le64) (u64) r4

 It will be pretty clear what instruction is doing (that upper bits become 
 zero).
>>> Trouble with that is that's very *not* what C will do with those casts
>>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>>  but that's quite an ugly mongrel.
>>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>>  cleanest and clearest.  Should it be
>>> r4 = be16 r4
>>>  or just
>>> be16 r4
>>> ?  Personally I incline towards the latter, but admit it doesn't really
>>>  match the syntax of other opcodes.
>> I did some quick prototyping in llvm to make sure we have a syntax
>> llvm is happy. Apparently, llvm does not like the syntax
>>r4 = be16 r4orr4 = (be16) (u16) r4.
>>
>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>>
>> // Verify that any operand is only mentioned once.
> Wait, how do you deal with (totally legal) r4 += r4?
> Or r4 = *(r4 +0)?
> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.

We are talking about dag node here. The above "r4", although using the same
register, will be different dag nodes. So it will be okay.

The "r4 = be16 r4" tries to use the *same* dag node as both source and
destination
in the asm output which is prohibited.

[PATCH 1/1] forcedeth: optimize the xmit/rx with unlikely

2017-09-22 Thread Zhu Yanjun

In the xmit/rx fastpath, the function dma_map_single rarely fails.
Therefore, add an unlikely() optimization to this error check
conditional.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/nvidia/forcedeth.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c 
b/drivers/net/ethernet/nvidia/forcedeth.c
index 0a7ba3a..63a9e1e 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1816,8 +1816,8 @@ static int nv_alloc_rx(struct net_device *dev)
 skb->data,
 skb_tailroom(skb),
 DMA_FROM_DEVICE);
-   if (dma_mapping_error(&np->pci_dev->dev,
- np->put_rx_ctx->dma)) {
+   if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+  np->put_rx_ctx->dma))) {
kfree_skb(skb);
goto packet_dropped;
}
@@ -1857,8 +1857,8 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 skb->data,
 skb_tailroom(skb),
 DMA_FROM_DEVICE);
-   if (dma_mapping_error(&np->pci_dev->dev,
- np->put_rx_ctx->dma)) {
+   if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+  np->put_rx_ctx->dma))) {
kfree_skb(skb);
goto packet_dropped;
}
@@ -2224,8 +2224,8 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, 
struct net_device *dev)
np->put_tx_ctx->dma = dma_map_single(&np->pci_dev->dev,
 skb->data + offset, bcnt,
 DMA_TO_DEVICE);
-   if (dma_mapping_error(&np->pci_dev->dev,
- np->put_tx_ctx->dma)) {
+   if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+  np->put_tx_ctx->dma))) {
/* on DMA mapping error - drop the packet */
dev_kfree_skb_any(skb);
u64_stats_update_begin(&np->swstats_tx_syncp);
@@ -2265,7 +2265,8 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, 
struct net_device *dev)
frag, offset,
bcnt,
DMA_TO_DEVICE);
-   if (dma_mapping_error(&np->pci_dev->dev, 
np->put_tx_ctx->dma)) {
+   if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+  np->put_tx_ctx->dma))) {
 
/* Unwind the mapped fragments */
do {
@@ -2373,8 +2374,8 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff 
*skb,
np->put_tx_ctx->dma = dma_map_single(&np->pci_dev->dev,
 skb->data + offset, bcnt,
 DMA_TO_DEVICE);
-   if (dma_mapping_error(&np->pci_dev->dev,
- np->put_tx_ctx->dma)) {
+   if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+  np->put_tx_ctx->dma))) {
/* on DMA mapping error - drop the packet */
dev_kfree_skb_any(skb);
u64_stats_update_begin(&np->swstats_tx_syncp);
@@ -2415,7 +2416,8 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff 
*skb,
bcnt,
DMA_TO_DEVICE);
 
-   if (dma_mapping_error(&np->pci_dev->dev, 
np->put_tx_ctx->dma)) {
+   if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+  np->put_tx_ctx->dma))) {
 
/* Unwind the mapped fragments */
do {
@@ -5070,8 +5072,8 @@ static int nv_loopback_test(struct net_device *dev)
test_dma_addr = dma_map_single(&np->pci_dev->dev, tx_skb->data,
   skb_tailroom(tx_skb),
   DMA_FROM_DEVICE);
-   if (dma_mapping_error(&n

Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Y Song

On Fri, Sep 22, 2017 at 7:11 AM, Y Song  wrote:
> On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree  wrote:
>> On 22/09/17 00:11, Y Song wrote:
>>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree  wrote:
 On 21/09/17 20:44, Alexei Starovoitov wrote:
> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>> More intuitive, but agree on the from_be/le. Maybe we should
>> just drop the "to_" prefix altogether, and leave the rest as is since
>> it's not surrounded by braces, it's also not a cast but rather an op.
 That works for me.
> 'be16 r4' is ambiguous regarding upper bits.
>
> what about my earlier suggestion:
> r4 = (be16) (u16) r4
> r4 = (le64) (u64) r4
>
> It will be pretty clear what instruction is doing (that upper bits become 
> zero).
 Trouble with that is that's very *not* what C will do with those casts
  and it doesn't really capture the bidirectional/symmetry thing.  The
  closest I could see with that is something like `r4 = (be16/u16) r4`,
  but that's quite an ugly mongrel.
 I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
  cleanest and clearest.  Should it be
 r4 = be16 r4
  or just
 be16 r4
 ?  Personally I incline towards the latter, but admit it doesn't really
  match the syntax of other opcodes.
>>> I did some quick prototyping in llvm to make sure we have a syntax
>>> llvm is happy. Apparently, llvm does not like the syntax
>>>r4 = be16 r4orr4 = (be16) (u16) r4.
>>>
>>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>>>
>>> // Verify that any operand is only mentioned once.
>> Wait, how do you deal with (totally legal) r4 += r4?
>> Or r4 = *(r4 +0)?
>> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
>
> We are talking about dag node here. The above "r4", although using the same
> register, will be different dag nodes. So it will be okay.
>
> The "r4 = be16 r4" tries to use the *same* dag node as both source and
> destination
> in the asm output which is prohibited.

With second thought, we may allow "r4 = be16 r4" by using different dag nodes.
(I need to do experiment for this.) But we do have constraints that
the two "r4" must
be the same register.  "r5 = be16 r4"  is not allowed. So from that
perspective, referencing
"r4" only once is a good idea and less confusing.

Re: [Intel-wired-lan] [PATCH][V2] e1000: avoid null pointer dereference on invalid stat type

2017-09-22 Thread Alexander Duyck

On Fri, Sep 22, 2017 at 6:41 AM, Colin King  wrote:
> From: Colin Ian King 
>
> Currently if the stat type is invalid then data[i] is being set
> either by dereferencing a null pointer p, or it is reading from
> an incorrect previous location if we had a valid stat type
> previously.  Fix this by nullify pointer p if a stat type is
> invalid and only setting data if p is not null.
>
> Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
> b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> index ec8aa4562cc9..724c93a6ea92 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> @@ -1824,7 +1824,7 @@ static void e1000_get_ethtool_stats(struct net_device 
> *netdev,
>  {
> struct e1000_adapter *adapter = netdev_priv(netdev);
> int i;
> -   char *p = NULL;
> +   char *p;
> const struct e1000_stats *stat = e1000_gstrings_stats;
>
> e1000_update_stats(adapter);

I was honestly happier with the portion of the first patch that moved
this into the loop.

> @@ -1837,16 +1837,18 @@ static void e1000_get_ethtool_stats(struct net_device 
> *netdev,
> p = (char *)adapter + stat->stat_offset;
> break;
> default:
> +   p = NULL;
> WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
>   stat->type, i);
> break;
> }
>
> -   if (stat->sizeof_stat == sizeof(u64))
> -   data[i] = *(u64 *)p;
> -   else
> -   data[i] = *(u32 *)p;
> -
> +   if (p) {
> +   if (stat->sizeof_stat == sizeof(u64))
> +   data[i] = *(u64 *)p;
> +   else
> +   data[i] = *(u32 *)p;
> +   }
> stat++;
> }

Instead of doing all this why not just call out a "continue;" instead
of a "break" if the type isn't recognized, and move the stat++ into
the loop declaration and process it at the same time as i++? That
would clean most of this up and we don't have to worry about any loop
carried variables and the like.

>  /* BUG_ON(i != E1000_STATS_LEN); */
> --
> 2.14.1
>
> ___
> Intel-wired-lan mailing list
> intel-wired-...@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Alexei Starovoitov

On Fri, Sep 22, 2017 at 07:27:29AM -0700, Y Song wrote:
> On Fri, Sep 22, 2017 at 7:11 AM, Y Song  wrote:
> > On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree  wrote:
> >> On 22/09/17 00:11, Y Song wrote:
> >>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree  
> >>> wrote:
>  On 21/09/17 20:44, Alexei Starovoitov wrote:
> > On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
> >> More intuitive, but agree on the from_be/le. Maybe we should
> >> just drop the "to_" prefix altogether, and leave the rest as is since
> >> it's not surrounded by braces, it's also not a cast but rather an op.
>  That works for me.
> > 'be16 r4' is ambiguous regarding upper bits.
> >
> > what about my earlier suggestion:
> > r4 = (be16) (u16) r4
> > r4 = (le64) (u64) r4
> >
> > It will be pretty clear what instruction is doing (that upper bits 
> > become zero).
>  Trouble with that is that's very *not* what C will do with those casts
>   and it doesn't really capture the bidirectional/symmetry thing.  The
>   closest I could see with that is something like `r4 = (be16/u16) r4`,
>   but that's quite an ugly mongrel.
>  I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>   cleanest and clearest.  Should it be
>  r4 = be16 r4
>   or just
>  be16 r4
>  ?  Personally I incline towards the latter, but admit it doesn't really
>   match the syntax of other opcodes.
> >>> I did some quick prototyping in llvm to make sure we have a syntax
> >>> llvm is happy. Apparently, llvm does not like the syntax
> >>>r4 = be16 r4orr4 = (be16) (u16) r4.
> >>>
> >>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
> >>>
> >>> // Verify that any operand is only mentioned once.
> >> Wait, how do you deal with (totally legal) r4 += r4?
> >> Or r4 = *(r4 +0)?
> >> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
> >
> > We are talking about dag node here. The above "r4", although using the same
> > register, will be different dag nodes. So it will be okay.
> >
> > The "r4 = be16 r4" tries to use the *same* dag node as both source and
> > destination
> > in the asm output which is prohibited.
> 
> With second thought, we may allow "r4 = be16 r4" by using different dag nodes.
> (I need to do experiment for this.) But we do have constraints that
> the two "r4" must
> be the same register.  "r5 = be16 r4"  is not allowed. So from that
> perspective, referencing
> "r4" only once is a good idea and less confusing.

looks like we're converging on
"be16/be32/be64/le16/le32/le64 #register" for BPF_END.
I guess it can live with that. I would prefer more C like syntax
to match the rest, but llvm parsing point is a strong one.

For BPG_NEG I prefer to do it in C syntax like interpreter does:
ALU_NEG:
DST = (u32) -DST;
ALU64_NEG:
DST = -DST;
Yonghong, does it mean that asmparser will equally suffer?

Re: tg3 pxe weirdness

2017-09-22 Thread Berend De Schouwer

On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
> 
> 
> Can you please share below details?
> 1) Model and Manufacturer of the system
> 2) Linux distro/kernel used?

4.13.3 gets a little further, but after some more data is transferred
the tg3 driver still crashes.  This is unfortunately before I've got a
writeable filesystem.

The last line is:
tg3 :01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2

I've got some ideas to get the full dmesg.

As with the other kernels it works OK on 1Gbps, but not slower
switches.

[PATCH] b43: make const arrays static, reduces object code size

2017-09-22 Thread Colin King

From: Colin Ian King 

Don't populate const arrays on the stack, instead make them static.
Makes the object code smaller by over 60 bytes:

Before:
   textdata bss dec hex filename
  148161296   0   161123ef0 b43/phy_ht.o

After:
   textdata bss dec hex filename
  145511496   0   160473eaf b43/phy_ht.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/broadcom/b43/phy_ht.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/broadcom/b43/phy_ht.c 
b/drivers/net/wireless/broadcom/b43/phy_ht.c
index 718c90e81696..c3158d085c2b 100644
--- a/drivers/net/wireless/broadcom/b43/phy_ht.c
+++ b/drivers/net/wireless/broadcom/b43/phy_ht.c
@@ -119,7 +119,7 @@ static void b43_radio_2059_rcal(struct b43_wldev *dev)
 /* Calibrate the internal RC oscillator? */
 static void b43_radio_2057_rccal(struct b43_wldev *dev)
 {
-   const u16 radio_values[3][2] = {
+   static const u16 radio_values[3][2] = {
{ 0x61, 0xE9 }, { 0x69, 0xD5 }, { 0x73, 0x99 },
};
int i;
@@ -154,7 +154,7 @@ static void b43_radio_2059_init_pre(struct b43_wldev *dev)
 
 static void b43_radio_2059_init(struct b43_wldev *dev)
 {
-   const u16 routing[] = { R2059_C1, R2059_C2, R2059_C3 };
+   static const u16 routing[] = { R2059_C1, R2059_C2, R2059_C3 };
int i;
 
/* Prepare (reset?) radio */
@@ -263,7 +263,7 @@ static void b43_phy_ht_reset_cca(struct b43_wldev *dev)
 static void b43_phy_ht_zero_extg(struct b43_wldev *dev)
 {
u8 i, j;
-   u16 base[] = { 0x40, 0x60, 0x80 };
+   static const u16 base[] = { 0x40, 0x60, 0x80 };
 
for (i = 0; i < ARRAY_SIZE(base); i++) {
for (j = 0; j < 4; j++)
-- 
2.14.1

[PATCH] hv_netvsc: make const array ver_list static, reduces object code size

2017-09-22 Thread Colin King

From: Colin Ian King 

Don't populate const array ver_list on the stack, instead make it
static. Makes the object code smaller by over 400 bytes:

Before:
   textdata bss dec hex filename
  184443168 320   2193255ac drivers/net/hyperv/netvsc.o

After:
   textdata bss dec hex filename
  179503224 320   2149453f6 drivers/net/hyperv/netvsc.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King 
---
 drivers/net/hyperv/netvsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8d5077fb0492..b0d323e24978 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -484,7 +484,7 @@ static int netvsc_connect_vsp(struct hv_device *device,
  struct netvsc_device *net_device,
  const struct netvsc_device_info *device_info)
 {
-   const u32 ver_list[] = {
+   static const u32 ver_list[] = {
NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5
};
-- 
2.14.1

[net-next] net: remove MTU limits for dummy and ifb device

2017-09-22 Thread Zhang Shengju

These two drivers (dummy and ifb) call ether_setup(), after commit
61e84623ace3 ("net: centralize net_device min/max MTU checking"), the
range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.

These two devices should not have limits on MTU. This patch set their
min_mtu/max_mtu to 0. So that dev_set_mtu() will not check the mtu range,
and can be set with any value.

CC: Eric Dumazet 
CC: Sabrina Dubroca 
Signed-off-by: Zhang Shengju 
---
 drivers/net/dummy.c | 2 +-
 drivers/net/ifb.c   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index e31ab3b..58483af 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -353,7 +353,7 @@ static void dummy_setup(struct net_device *dev)
eth_hw_addr_random(dev);
 
dev->min_mtu = 0;
-   dev->max_mtu = ETH_MAX_MTU;
+   dev->max_mtu = 0;
 }
 
 static int dummy_validate(struct nlattr *tb[], struct nlattr *data[],
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 8870bd2..0008da7 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -231,6 +231,9 @@ static void ifb_setup(struct net_device *dev)
eth_hw_addr_random(dev);
dev->needs_free_netdev = true;
dev->priv_destructor = ifb_dev_free;
+
+   dev->min_mtu = 0;
+   dev->max_mtu = 0;
 }
 
 static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
1.8.3.1

RE: [PATCH] hv_netvsc: make const array ver_list static, reduces object code size

2017-09-22 Thread Haiyang Zhang



> -Original Message-
> From: Colin King [mailto:colin.k...@canonical.com]
> Sent: Friday, September 22, 2017 8:50 AM
> To: KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger ;
> de...@linuxdriverproject.org; netdev@vger.kernel.org
> Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] hv_netvsc: make const array ver_list static, reduces
> object code size
> 
> From: Colin Ian King 
> 
> Don't populate const array ver_list on the stack, instead make it
> static. Makes the object code smaller by over 400 bytes:
> 
> Before:
>text  data bss dec hex filename
>   18444  3168 320   2193255ac
>   drivers/net/hyperv/netvsc.o
> 
> After:
>text  data bss dec hex filename
>   17950  3224 320   2149453f6
>   drivers/net/hyperv/netvsc.o
> 
> (gcc 6.3.0, x86-64)
> 
> Signed-off-by: Colin Ian King 
> ---

Reviewed-by: Haiyang Zhang

Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack

2017-09-22 Thread Jiri Pirko

Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsh...@huawei.com wrote:
>Hi, Jiri
>
>>>- if (!tc) {
>>>+ if (if_running) {
>>>+ (void)hns3_nic_net_stop(netdev);
>>>+ msleep(100);
>>>+ }
>>>+
>>>+ ret = (kinfo->dcb_ops && kinfo->dcb_ops->>setup_tc) ?
>>>+ kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : ->EOPNOTSUPP;
>
>>This is most odd. Why do you call dcb_ops from >ndo_setup_tc callback?
>>Why are you mixing this together? prio->tc mapping >can be done
>>directly in dcbnl
>
>Here is what we do in dcb_ops->setup_tc:
>Firstly, if current tc num is different from the tc num
>that user provide, then we setup the queues for each
>tc.
>
>Secondly, we tell hardware the pri to tc mapping that
>the stack is using. In rx direction, our hardware need
>that mapping to put different packet into different tc'
>queues according to the priority of the packet, then
>rss decides which specific queue in the tc should the
>packet goto.
>
>By mixing, I suppose you meant why we need the
>pri to tc infomation?

by mixing, I mean what I wrote. You are calling dcb_ops callback from
ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC
subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for
all?



>I hope I did not misunderstand your question, thanks
>for your time reviewing.

URGENT RESPOND NEEDED FROM YOU BY DANIEL MMINELE THE DEPUTY GOVERNOR OF SOUTH AFRICAN RESERVE BANK

2017-09-22 Thread Mrs, Suran Yoda

Hello Dear,

My Name is Dr. Daniel Mminele the deputy governor of south African
reserve bank, you can also check  on our Bank website here  for your
assurance , http://www.whoswhosa.co.za/south-african-reserve-bank-20991

I am a citizen of South African. I am a computer analyst long time
working with reserve Bank of  South African. (R.B.) My Dear, I came
across your payment file which was Marked X and your Fund Release Hard
Disk Painted Red, I took my time to study your payment file and found
out that you have really attempted to receive the funds without
success till now. But I am using another person email address to send
you this because I don’t want my bank to notify  me alright

The most annoying thing is that the directors and top officials of
this nation could not tell you the truth that on no account will they
ever release the fund to you; instead they will let you spend money
unnecessarily. I swear with my life, I can release this funds to you
if you can certify me of my security and how I can run away from this
country to meet with you. This is like a Mafia Setting here,  if I do
this for you, I do not intend to work here all the days of my life and
I cannot run back to my country  as this Bank has my home address ,
and if I don't run away after I making the transfer to your bank
account, I may be in trouble or my life will be in danger.

The only thing I will need to release this fund is a Special Hard
Disk, we call it HD120 GIG. I will buy two of it, recopy your
information, destroy the previous one, and punch the computer to make
the fund reflect in your bank account within 24 banking hours. I will
clean up the tracer and destroy your file, after which I will run away
from here to meet with you in your country if you are interested. But
you must assure me the absolute confidentiality of this deal before I
can do anything further. Do get in touch with me immediately, You
should send to me your convenient
Tell phone  number /fax numbers for easy communications and also re
confirm your banking
details, so that there won't be any mistake.
For phone conversation, please call me on +27011745039 or
+27843464227, And you must contact me though  this my Email address
which is, danielmmine...@yahoo.com

SPECIAL INFORMATION:


YOU WILL SEND THE FEE FOR THE HARD DISK FIRST BEFORE I MAKE YOUR
TRANSFER OR YOU BUY
THE HARD DISK IN YOUR COUNTRY AND SEND IT TO ME,DON'T CONTACT ME IF
YOU CAN NOT SEND
THE HARD DISK FEE FIRST OR THE HARD DISK. AS SOON AS I RECEIVED YOUR
EMAIL I WILL LET
YOU KNOW HOW MUCH THE DISK WILL COST YOU.

Regards, Dr. Daniel Mminele the deputy governor of south African reserve bank

Re: [PATCH net v2] l2tp: fix race condition in l2tp_tunnel_delete

2017-09-22 Thread Sabrina Dubroca

2017-09-19, 18:43:37 +0200, Guillaume Nault wrote:
> On Tue, Sep 19, 2017 at 03:40:40PM +0200, Sabrina Dubroca wrote:
> > If we try to delete the same tunnel twice, the first delete operation
> > does a lookup (l2tp_tunnel_get), finds the tunnel, calls
> > l2tp_tunnel_delete, which queues it for deletion by
> > l2tp_tunnel_del_work.
> > 
> > The second delete operation also finds the tunnel and calls
> > l2tp_tunnel_delete. If the workqueue has already fired and started
> > running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
> > same tunnel a second time, and try to free the socket again.
> > 
> > Add a dead flag to prevent firing the workqueue twice. Then we can
> > remove the check of queue_work's result that was meant to prevent that
> > race but doesn't.
> > 
> > Also check the flag in the tunnel lookup functions, to avoid returning a
> > tunnel that is already scheduled for destruction.
> > 
> > Reproducer:
> > 
> > ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 
> > remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
> > ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 
> > peer_session_id 2000
> > ip link set l2tp1 up
> > ip l2tp del tunnel tunnel_id 3000
> > ip l2tp del tunnel tunnel_id 3000
> > 
> > Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue")
> > Reported-by: Jianlin Shi 
> > Signed-off-by: Sabrina Dubroca 
> > ---
> > v2: as Tom Parkin explained, we can't remove the tunnel from the
> > per-net list from netlink. v2 uses only a dead flag, and adds
> > corresponding checks during lookups
> > 
> >  net/l2tp/l2tp_core.c | 18 +-
> >  net/l2tp/l2tp_core.h |  5 -
> >  2 files changed, 13 insertions(+), 10 deletions(-)
> > 
> > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> > index ee485df73ccd..3891f0260f2b 100644
> > --- a/net/l2tp/l2tp_core.c
> > +++ b/net/l2tp/l2tp_core.c
> > @@ -203,7 +203,8 @@ struct l2tp_tunnel *l2tp_tunnel_get(const struct net 
> > *net, u32 tunnel_id)
> >  
> > rcu_read_lock_bh();
> > list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
> > -   if (tunnel->tunnel_id == tunnel_id) {
> > +   if (tunnel->tunnel_id == tunnel_id &&
> > +   !test_bit(0, &tunnel->dead)) {
> > l2tp_tunnel_inc_refcount(tunnel);
> > rcu_read_unlock_bh();
> >  
> > @@ -390,7 +391,8 @@ struct l2tp_tunnel *l2tp_tunnel_find(const struct net 
> > *net, u32 tunnel_id)
> >  
> > rcu_read_lock_bh();
> > list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
> > -   if (tunnel->tunnel_id == tunnel_id) {
> > +   if (tunnel->tunnel_id == tunnel_id &&
> > +   !test_bit(0, &tunnel->dead)) {
> > rcu_read_unlock_bh();
> > return tunnel;
> > }
> > @@ -409,7 +411,7 @@ struct l2tp_tunnel *l2tp_tunnel_find_nth(const struct 
> > net *net, int nth)
> >  
> > rcu_read_lock_bh();
> > list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
> > -   if (++count > nth) {
> > +   if (++count > nth && !test_bit(0, &tunnel->dead)) {
> > rcu_read_unlock_bh();
> > return tunnel;
> > }
> > 
> I don't get why you're checking the dead flag in l2tp_tunnel_{get,find}*().
> Since it can be set concurrently right after test_bit(), it doesn't
> protect the caller from getting a tunnel that is being removed by
> l2tp_tunnel_delete().
> Or have I missed something?

You're right.

Then I would try going back to essentially v1, but keeping code to
remove the tunnel from the list in l2tp_tunnel_destruct if it's not
dead yet.

What do you think?


 8< 

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index ee485df73ccd..63cd1f30ac7d 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1234,6 +1234,23 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct 
sk_buff *skb, int hdr_len
 }
 EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
 
+static bool __l2tp_tunnel_delete(struct l2tp_tunnel *tunnel)
+{
+   struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
+   bool ret = false;
+
+   spin_lock_bh(&pn->l2tp_tunnel_list_lock);
+   if (!tunnel->dead) {
+   tunnel->dead = 1;
+   list_del_rcu(&tunnel->list);
+   atomic_dec(&l2tp_tunnel_count);
+   ret = true;
+   }
+   spin_unlock_bh(&pn->l2tp_tunnel_list_lock);
+
+   return ret;
+}
+
 /*
  * Tinnel and session create/destroy.
  */
@@ -1245,7 +1262,6 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
 static void l2tp_tunnel_destruct(struct sock *sk)
 {
struct l2tp_tunnel *tunnel = l2tp_tunnel(sk);
-   struct l2tp_net *pn;
 
if (tunnel == NULL)

[PATCH] Switch to use the new hashtable implementation. This reduces the code and need for yet another hashtable implementation.

2017-09-22 Thread Aaron Wood

Signed-off-by: Aaron Wood 
---
 net/9p/error.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/9p/error.c b/net/9p/error.c
index 126fd0dceea2..2e966fcc5cbb 100644
--- a/net/9p/error.c
+++ b/net/9p/error.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -50,8 +51,8 @@ struct errormap {
struct hlist_node list;
 };
 
-#define ERRHASHSZ  32
-static struct hlist_head hash_errmap[ERRHASHSZ];
+#define ERR_HASH_BITS  5
+static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS);
 
 /* FixMe - reduce to a reasonable size */
 static struct errormap errmap[] = {
@@ -193,18 +194,14 @@ static struct errormap errmap[] = {
 int p9_error_init(void)
 {
struct errormap *c;
-   int bucket;
-
-   /* initialize hash table */
-   for (bucket = 0; bucket < ERRHASHSZ; bucket++)
-   INIT_HLIST_HEAD(&hash_errmap[bucket]);
+   int key;
 
/* load initial error map into hash table */
for (c = errmap; c->name != NULL; c++) {
c->namelen = strlen(c->name);
-   bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ;
+   key = jhash(c->name, c->namelen, 0);
INIT_HLIST_NODE(&c->list);
-   hlist_add_head(&c->list, &hash_errmap[bucket]);
+   hash_add(hash_errmap, &c->list, key);
}
 
return 1;
@@ -222,12 +219,12 @@ int p9_errstr2errno(char *errstr, int len)
 {
int errno;
struct errormap *c;
-   int bucket;
+   int key;
 
errno = 0;
c = NULL;
-   bucket = jhash(errstr, len, 0) % ERRHASHSZ;
-   hlist_for_each_entry(c, &hash_errmap[bucket], list) {
+   key = jhash(errstr, len, 0);
+   hash_for_each_possible(hash_errmap, c, list, key) {
if (c->namelen == len && !memcmp(c->name, errstr, len)) {
errno = c->val;
break;
-- 
2.11.0

[PATCH net-next 4/4] net: dsa: add port enable and disable helpers

2017-09-22 Thread Vivien Didelot

Provide dsa_port_enable and dsa_port_disable helpers to respectively
enable and disable a switch port. This makes the dsa_port_set_state_now
helper static.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c | 31 ++-
 net/dsa/slave.c| 19 +--
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 9803952a5b40..6bfff19d1615 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -117,7 +117,8 @@ void dsa_master_ethtool_restore(struct net_device *dev);
 /* port.c */
 int dsa_port_set_state(struct dsa_port *dp, u8 state,
   struct switchdev_trans *trans);
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state);
+int dsa_port_enable(struct dsa_port *dp);
+void dsa_port_disable(struct dsa_port *dp);
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 76d43a82d397..50749339e252 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -56,7 +56,7 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state,
return 0;
 }
 
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
+static void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 {
int err;
 
@@ -65,6 +65,35 @@ void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
pr_err("DSA: failed to set STP state %u (%d)\n", state, err);
 }
 
+int dsa_port_enable(struct dsa_port *dp)
+{
+   u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
+   struct dsa_switch *ds = dp->ds;
+   int port = dp->index;
+   int err;
+
+   if (ds->ops->port_enable) {
+   err = ds->ops->port_enable(ds, port);
+   if (err)
+   return err;
+   }
+
+   dsa_port_set_state_now(dp, stp_state);
+
+   return 0;
+}
+
+void dsa_port_disable(struct dsa_port *dp)
+{
+   struct dsa_switch *ds = dp->ds;
+   int port = dp->index;
+
+   dsa_port_set_state_now(dp, BR_STATE_DISABLED);
+
+   if (ds->ops->port_disable)
+   ds->ops->port_disable(ds, port);
+}
+
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 {
struct dsa_notifier_bridge_info info = {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 235a5c95dfcc..e40623939323 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -74,9 +74,7 @@ static int dsa_slave_open(struct net_device *dev)
struct dsa_slave_priv *p = netdev_priv(dev);
struct phy_device *phy = p->phy;
struct dsa_port *dp = p->dp;
-   struct dsa_switch *ds = dp->ds;
struct net_device *master = dsa_master_netdev(p);
-   u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
int err;
 
if (!(master->flags & IFF_UP))
@@ -99,13 +97,9 @@ static int dsa_slave_open(struct net_device *dev)
goto clear_allmulti;
}
 
-   if (ds->ops->port_enable) {
-   err = ds->ops->port_enable(ds, p->dp->index);
-   if (err)
-   goto clear_promisc;
-   }
-
-   dsa_port_set_state_now(p->dp, stp_state);
+   err = dsa_port_enable(dp);
+   if (err)
+   goto clear_promisc;
 
if (phy) {
/* If phy_stop() has been called before, phy will be in
@@ -139,15 +133,12 @@ static int dsa_slave_close(struct net_device *dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
struct net_device *master = dsa_master_netdev(p);
-   struct dsa_switch *ds = p->dp->ds;
+   struct dsa_port *dp = p->dp;
 
if (p->phy)
phy_stop(p->phy);
 
-   dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
-   if (ds->ops->port_disable)
-   ds->ops->port_disable(ds, p->dp->index);
+   dsa_port_disable(dp);
 
dev_mc_unsync(master, dev);
dev_uc_unsync(master, dev);
-- 
2.14.1

[PATCH net-next 0/4] net: dsa: simplify port enabling

2017-09-22 Thread Vivien Didelot

This patchset removes the unnecessary PHY device argument in port
enable/disable switch operations, makes slave open and close symmetrical
and finally provides helpers for enabling or disabling a DSA port.

Vivien Didelot (4):
  net: dsa: move up phy enabling in core
  net: dsa: remove phy arg from port enable/disable
  net: dsa: make slave close symmetrical to open
  net: dsa: add port enable and disable helpers

 drivers/net/dsa/b53/b53_common.c   |  6 +++---
 drivers/net/dsa/b53/b53_priv.h |  4 ++--
 drivers/net/dsa/bcm_sf2.c  | 32 --
 drivers/net/dsa/lan9303-core.c |  6 ++
 drivers/net/dsa/microchip/ksz_common.c |  6 ++
 drivers/net/dsa/mt7530.c   |  8 +++-
 drivers/net/dsa/mv88e6xxx/chip.c   |  6 ++
 drivers/net/dsa/qca8k.c|  6 ++
 include/net/dsa.h  |  6 ++
 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c | 31 -
 net/dsa/slave.c| 36 ++
 12 files changed, 77 insertions(+), 73 deletions(-)

-- 
2.14.1

[PATCH net-next 3/4] net: dsa: make slave close symmetrical to open

2017-09-22 Thread Vivien Didelot

The DSA slave open function configures the unicast MAC addresses on the
master device, enable the switch port, change its STP state, then start
the PHY device.

Make the close function symmetric, by first stopping the PHY device,
then changing the STP state, disabling the switch port and restore the
master device.

Signed-off-by: Vivien Didelot 
---
 net/dsa/slave.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6290741e496a..235a5c95dfcc 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -144,6 +144,11 @@ static int dsa_slave_close(struct net_device *dev)
if (p->phy)
phy_stop(p->phy);
 
+   dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
+
+   if (ds->ops->port_disable)
+   ds->ops->port_disable(ds, p->dp->index);
+
dev_mc_unsync(master, dev);
dev_uc_unsync(master, dev);
if (dev->flags & IFF_ALLMULTI)
@@ -154,11 +159,6 @@ static int dsa_slave_close(struct net_device *dev)
if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
dev_uc_del(master, dev->dev_addr);
 
-   if (ds->ops->port_disable)
-   ds->ops->port_disable(ds, p->dp->index);
-
-   dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
return 0;
 }
 
-- 
2.14.1

[PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable

2017-09-22 Thread Vivien Didelot

The .port_enable and .port_disable functions are meant to deal with the
switch ports only, and no driver is using the phy argument anyway.
Remove it.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/b53/b53_common.c   |  6 +++---
 drivers/net/dsa/b53/b53_priv.h |  4 ++--
 drivers/net/dsa/bcm_sf2.c  | 16 +++-
 drivers/net/dsa/lan9303-core.c |  6 ++
 drivers/net/dsa/microchip/ksz_common.c |  6 ++
 drivers/net/dsa/mt7530.c   |  8 +++-
 drivers/net/dsa/mv88e6xxx/chip.c   |  6 ++
 drivers/net/dsa/qca8k.c|  6 ++
 include/net/dsa.h  |  6 ++
 net/dsa/slave.c|  4 ++--
 10 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index d4ce092def83..e46eb29d29f0 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -502,7 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 }
 EXPORT_SYMBOL(b53_imp_vlan_setup);
 
-int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
+int b53_enable_port(struct dsa_switch *ds, int port)
 {
struct b53_device *dev = ds->priv;
unsigned int cpu_port = dev->cpu_port;
@@ -531,7 +531,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, struct 
phy_device *phy)
 }
 EXPORT_SYMBOL(b53_enable_port);
 
-void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
+void b53_disable_port(struct dsa_switch *ds, int port)
 {
struct b53_device *dev = ds->priv;
u8 reg;
@@ -874,7 +874,7 @@ static int b53_setup(struct dsa_switch *ds)
if (dsa_is_cpu_port(ds, port))
b53_enable_cpu_port(dev, port);
else if (!(BIT(port) & ds->enabled_port_mask))
-   b53_disable_port(ds, port, NULL);
+   b53_disable_port(ds, port);
}
 
return ret;
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 603c66d240d8..688d02ee6155 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -311,8 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port,
   struct dsa_mall_mirror_tc_entry *mirror, bool ingress);
 void b53_mirror_del(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
-int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy);
-void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy);
+int b53_enable_port(struct dsa_switch *ds, int port);
+void b53_disable_port(struct dsa_switch *ds, int port);
 void b53_brcm_hdr_setup(struct dsa_switch *ds, int port);
 void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable);
 int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy);
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index ad96b9725a2c..77e0c43f973b 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -159,8 +159,7 @@ static inline void bcm_sf2_port_intr_disable(struct 
bcm_sf2_priv *priv,
intrl2_1_writel(priv, P_IRQ_MASK(off), INTRL2_CPU_CLEAR);
 }
 
-static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
- struct phy_device *phy)
+static int bcm_sf2_port_setup(struct dsa_switch *ds, int port)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
unsigned int i;
@@ -191,11 +190,10 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
if (port == priv->moca_port)
bcm_sf2_port_intr_enable(priv, port);
 
-   return b53_enable_port(ds, port, phy);
+   return b53_enable_port(ds, port);
 }
 
-static void bcm_sf2_port_disable(struct dsa_switch *ds, int port,
-struct phy_device *phy)
+static void bcm_sf2_port_disable(struct dsa_switch *ds, int port)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
u32 off, reg;
@@ -214,7 +212,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int 
port,
else
off = CORE_G_PCTL_PORT(port);
 
-   b53_disable_port(ds, port, phy);
+   b53_disable_port(ds, port);
 
/* Power down the port memory */
reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL);
@@ -613,7 +611,7 @@ static int bcm_sf2_sw_suspend(struct dsa_switch *ds)
for (port = 0; port < DSA_MAX_PORTS; port++) {
if ((1 << port) & ds->enabled_port_mask ||
dsa_is_cpu_port(ds, port))
-   bcm_sf2_port_disable(ds, port, NULL);
+   bcm_sf2_port_disable(ds, port);
}
 
return 0;
@@ -636,7 +634,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds)
 
for (port = 0; port < DSA_MAX_PORTS; port++) {
if ((1 << port) & ds->enabled_port_mask)
-

[PATCH net-next 1/4] net: dsa: move up phy enabling in core

2017-09-22 Thread Vivien Didelot

bcm_sf2 is currently the only driver using the phy argument passed to
.port_enable. It resets the state machine if the phy has been hard
reset. This check is generic and can be moved to DSA core.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/bcm_sf2.c | 16 +---
 net/dsa/slave.c   | 15 +--
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 898d5642b516..ad96b9725a2c 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -184,22 +184,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
 
/* Re-enable the GPHY and re-apply workarounds */
-   if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) {
+   if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1)
bcm_sf2_gphy_enable_set(ds, true);
-   if (phy) {
-   /* if phy_stop() has been called before, phy
-* will be in halted state, and phy_start()
-* will call resume.
-*
-* the resume path does not configure back
-* autoneg settings, and since we hard reset
-* the phy manually here, we need to reset the
-* state machine also.
-*/
-   phy->state = PHY_READY;
-   phy_init_hw(phy);
-   }
-   }
 
/* Enable MoCA port interrupts to get notified */
if (port == priv->moca_port)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 02ace7d462c4..606812160fd5 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -72,6 +72,7 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 static int dsa_slave_open(struct net_device *dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
+   struct phy_device *phy = p->phy;
struct dsa_port *dp = p->dp;
struct dsa_switch *ds = dp->ds;
struct net_device *master = dsa_master_netdev(p);
@@ -106,8 +107,18 @@ static int dsa_slave_open(struct net_device *dev)
 
dsa_port_set_state_now(p->dp, stp_state);
 
-   if (p->phy)
-   phy_start(p->phy);
+   if (phy) {
+   /* If phy_stop() has been called before, phy will be in
+* halted state, and phy_start() will call resume.
+*
+* The resume path does not configure back autoneg
+* settings, and since the internal phy may have been
+* hard reset, we need to reset the state machine also.
+*/
+   phy->state = PHY_READY;
+   phy_init_hw(phy);
+   phy_start(phy);
+   }
 
return 0;
 
-- 
2.14.1

Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Edward Cree

On 22/09/17 16:16, Alexei Starovoitov wrote:
> looks like we're converging on
> "be16/be32/be64/le16/le32/le64 #register" for BPF_END.
> I guess it can live with that. I would prefer more C like syntax
> to match the rest, but llvm parsing point is a strong one.
Yep, agreed.  I'll post a v2 once we've settled BPF_NEG.
> For BPG_NEG I prefer to do it in C syntax like interpreter does:
> ALU_NEG:
> DST = (u32) -DST;
> ALU64_NEG:
> DST = -DST;
> Yonghong, does it mean that asmparser will equally suffer?
Correction to my earlier statements: verifier will currently disassemble
 neg as:
(87) r0 neg 0
(84) (u32) r0 neg (u32) 0
 because it pretends 'neg' is a compound-assignment operator like +=.
The analogy with be16 and friends would be to use
neg64 r0
neg32 r0
 whereas the analogy with everything else would be
r0 = -r0
r0 = (u32) -r0
 as Alexei says.
I'm happy to go with Alexei's version if it doesn't cause problems for llvm.

Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core

2017-09-22 Thread Andrew Lunn

On Fri, Sep 22, 2017 at 12:17:50PM -0400, Vivien Didelot wrote:
> bcm_sf2 is currently the only driver using the phy argument passed to
> .port_enable. It resets the state machine if the phy has been hard
> reset. This check is generic and can be moved to DSA core.
>  
>   dsa_port_set_state_now(p->dp, stp_state);
>  
> - if (p->phy)
> - phy_start(p->phy);
> + if (phy) {
> + /* If phy_stop() has been called before, phy will be in
> +  * halted state, and phy_start() will call resume.
> +  *
> +  * The resume path does not configure back autoneg
> +  * settings, and since the internal phy may have been
> +  * hard reset, we need to reset the state machine also.
> +  */
> + phy->state = PHY_READY;
> + phy_init_hw(phy);
> + phy_start(phy);
> + }

Hi Vivien

If this is generic, why is it needed at all here? Shouldn't this
actually by in phylib?

Florian ?

   Andrew

Re: [PATCH net-next 3/4] net: dsa: make slave close symmetrical to open

2017-09-22 Thread Andrew Lunn

On Fri, Sep 22, 2017 at 12:17:52PM -0400, Vivien Didelot wrote:
> The DSA slave open function configures the unicast MAC addresses on the
> master device, enable the switch port, change its STP state, then start
> the PHY device.
> 
> Make the close function symmetric, by first stopping the PHY device,
> then changing the STP state, disabling the switch port and restore the
> master device.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 4/4] net: dsa: add port enable and disable helpers

2017-09-22 Thread Andrew Lunn

On Fri, Sep 22, 2017 at 12:17:53PM -0400, Vivien Didelot wrote:
> Provide dsa_port_enable and dsa_port_disable helpers to respectively
> enable and disable a switch port. This makes the dsa_port_set_state_now
> helper static.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [Intel-wired-lan] [PATCH] i40e: make const array patterns static, reduces object code size

2017-09-22 Thread Jesse Brandeburg

On Fri, 22 Sep 2017 15:11:38 +0100
Colin King  wrote:

> From: Colin Ian King 
> 
> Don't populate const array patterns on the stack, instead make it
> static. Makes the object code smaller by over 60 bytes:
> 
> Before:
>text  data bss dec hex filename
>1953   496   02449 991 i40e_diag.o
> 
> After:
>text  data bss dec hex filename
>1798   584   02382 94e i40e_diag.o
> 
> (gcc 6.3.0, x86-64)
> 
> Signed-off-by: Colin Ian King 

Looks good, thanks Colin!

Acked-by: Jesse Brandeburg

Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver

2017-09-22 Thread महेश बंडेवार

On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn
 wrote:
>> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file 
>> *file, struct ifreq *ifr)
>> if (tfile->detached)
>> return -EINVAL;
>>
>> +   if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
>> +   return -EPERM;
>> +
>
> This should perhaps be moved into the !dev branch, directly below the
> ns_capable check.
>
Hmm, does that mean fail only on creation but allow to attach if
exists? That would be wrong, isn't it? Correct me if I'm wrong but we
want to prevent both these scenarios if user does not have sufficient
privileges (i.e. NET_ADMIN in init-ns).

>> dev = __dev_get_by_name(net, ifr->ifr_name);
>> if (dev) {
>> if (ifr->ifr_flags & IFF_TUN_EXCL)
>> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file 
>> *file, struct ifreq *ifr)
>> tun->flags = (tun->flags & ~TUN_FEATURES) |
>> (ifr->ifr_flags & TUN_FEATURES);
>>
>> +   if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != 
>> IFF_TAP)
>> +   tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
>> +
>
> Similarly, this check only need to be performed in that branch.
> Instead of reverting to non-frags mode, a tun_set_iff with the wrong
> set of flags should probably fail hard.
Yes, agree, wrong set of flags should fail hard and probably be done
before attach or open, no?

Re: [PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size

2017-09-22 Thread Arend van Spriel


Please use 'brcmsmac:' as prefix instead of 'brcm80211:'.

On 22-09-17 16:03, Colin King wrote:

From: Colin Ian King 

Don't populate const array ucode_ofdm_rates on the stack, instead make it
static. Makes the object code smaller by 100 bytes:

Before:
text   data bss dec hex filename
   39482564   0   400469c6e phy_cmn.o

After
text   data bss dec hex filename
   39326620   0   399469c0a phy_cmn.o

(gcc 6.3.0, x86-64)


Acked-by: Arend van Spriel 

Signed-off-by: Colin Ian King 
---
  drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
index 1c4e9dd57960..3a13d176b221 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
@@ -1916,7 +1916,7 @@ void wlc_phy_txpower_update_shm(struct brcms_phy *pi)
 pi->hwpwr_txcur);
  
  		for (j = TXP_FIRST_OFDM; j <= TXP_LAST_OFDM; j++) {

-   const u8 ucode_ofdm_rates[] = {
+   static const u8 ucode_ofdm_rates[] = {
0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c
};
offset = wlapi_bmac_rate_shm_offset(

Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core

2017-09-22 Thread Florian Fainelli

On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> bcm_sf2 is currently the only driver using the phy argument passed to
> .port_enable. It resets the state machine if the phy has been hard
> reset. This check is generic and can be moved to DSA core.

This is completely specific to bcm_sf2 because it does call
bcm_sf2_gphy_enable_set() which performs a HW reset of the PHY, you
can't move this to the generic portion of net/dsa/slave.c. NACK.

> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/bcm_sf2.c | 16 +---
>  net/dsa/slave.c   | 15 +--
>  2 files changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
> index 898d5642b516..ad96b9725a2c 100644
> --- a/drivers/net/dsa/bcm_sf2.c
> +++ b/drivers/net/dsa/bcm_sf2.c
> @@ -184,22 +184,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
> port,
>   core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
>  
>   /* Re-enable the GPHY and re-apply workarounds */
> - if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) {
> + if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1)
>   bcm_sf2_gphy_enable_set(ds, true);
> - if (phy) {
> - /* if phy_stop() has been called before, phy
> -  * will be in halted state, and phy_start()
> -  * will call resume.
> -  *
> -  * the resume path does not configure back
> -  * autoneg settings, and since we hard reset
> -  * the phy manually here, we need to reset the
> -  * state machine also.
> -  */
> - phy->state = PHY_READY;
> - phy_init_hw(phy);
> - }
> - }
>  
>   /* Enable MoCA port interrupts to get notified */
>   if (port == priv->moca_port)
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 02ace7d462c4..606812160fd5 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -72,6 +72,7 @@ static int dsa_slave_get_iflink(const struct net_device 
> *dev)
>  static int dsa_slave_open(struct net_device *dev)
>  {
>   struct dsa_slave_priv *p = netdev_priv(dev);
> + struct phy_device *phy = p->phy;
>   struct dsa_port *dp = p->dp;
>   struct dsa_switch *ds = dp->ds;
>   struct net_device *master = dsa_master_netdev(p);
> @@ -106,8 +107,18 @@ static int dsa_slave_open(struct net_device *dev)
>  
>   dsa_port_set_state_now(p->dp, stp_state);
>  
> - if (p->phy)
> - phy_start(p->phy);
> + if (phy) {
> + /* If phy_stop() has been called before, phy will be in
> +  * halted state, and phy_start() will call resume.
> +  *
> +  * The resume path does not configure back autoneg
> +  * settings, and since the internal phy may have been
> +  * hard reset, we need to reset the state machine also.
> +  */
> + phy->state = PHY_READY;
> + phy_init_hw(phy);
> + phy_start(phy);
> + }
>  
>   return 0;
>  
> 


-- 
Florian

Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core

2017-09-22 Thread Florian Fainelli

On 09/22/2017 09:32 AM, Andrew Lunn wrote:
> On Fri, Sep 22, 2017 at 12:17:50PM -0400, Vivien Didelot wrote:
>> bcm_sf2 is currently the only driver using the phy argument passed to
>> .port_enable. It resets the state machine if the phy has been hard
>> reset. This check is generic and can be moved to DSA core.
>>  
>>  dsa_port_set_state_now(p->dp, stp_state);
>>  
>> -if (p->phy)
>> -phy_start(p->phy);
>> +if (phy) {
>> +/* If phy_stop() has been called before, phy will be in
>> + * halted state, and phy_start() will call resume.
>> + *
>> + * The resume path does not configure back autoneg
>> + * settings, and since the internal phy may have been
>> + * hard reset, we need to reset the state machine also.
>> + */
>> +phy->state = PHY_READY;
>> +phy_init_hw(phy);
>> +phy_start(phy);
>> +}
> 
> Hi Vivien
> 
> If this is generic, why is it needed at all here? Shouldn't this
> actually by in phylib?

This does not belong in the core logic within net/dsa/slave.c. The
reason why this is necessary here is because we are doing a HW-based
reset of the PHY, as the comment explains this is specific to how the HW
works. There may be a cleaner solution to this problem, but in any case,
I don't think other drivers should inherit that logic.
-- 
Florian

usb/wireless/rsi_91x: use-after-free write in __run_timers

2017-09-22 Thread Andrey Konovalov

Hi!

I've got the following report while fuzzing the kernel with syzkaller.

On commit 6e80ecdddf4ea6f3cd84e83720f3d852e6624a68 (Sep 21).

==
BUG: KASAN: use-after-free in __run_timers+0xc0e/0xd40
Write of size 8 at addr 880069f701b8 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1-42311-g6e80ecdddf4e #234
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x292/0x395 lib/dump_stack.c:52
 print_address_description+0x78/0x280 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351
 kasan_report+0x22f/0x340 mm/kasan/report.c:409
 __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435
 collect_expired_timers ./include/linux/list.h:729
 __run_timers+0xc0e/0xd40 kernel/time/timer.c:1616
 run_timer_softirq+0x83/0x140 kernel/time/timer.c:1646
 __do_softirq+0x305/0xc2d kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364
 irq_exit+0x171/0x1a0 kernel/softirq.c:405
 exiting_irq ./arch/x86/include/asm/apic.h:638
 smp_apic_timer_interrupt+0x2b9/0x8d0 arch/x86/kernel/apic/apic.c:1048
 apic_timer_interrupt+0x9d/0xb0
 
RIP: 0010:native_safe_halt+0x6/0x10 ./arch/x86/include/asm/irqflags.h:53
RSP: 0018:86607958 EFLAGS: 0282 ORIG_RAX: ff10
RAX: dc20 RBX: 10cc0f2f RCX: 
RDX:  RSI: 0001 RDI: 8662ea64
RBP: 86607958 R08: 813d3501 R09: 
R10:  R11:  R12: 10cc0f3b
R13: 86607a98 R14: 86fc1628 R15: 
 arch_safe_halt ./arch/x86/include/asm/paravirt.h:93
 default_idle+0x127/0x690 arch/x86/kernel/process.c:341
 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:332
 default_idle_call+0x3b/0x60 kernel/sched/idle.c:98
 cpuidle_idle_call kernel/sched/idle.c:156
 do_idle+0x35c/0x440 kernel/sched/idle.c:246
 cpu_startup_entry+0x1d/0x20 kernel/sched/idle.c:351
 rest_init+0xf3/0x100 init/main.c:435
 start_kernel+0x782/0x7b0 init/main.c:710
 x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:377
 x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:358
 secondary_startup_64+0xa5/0xa5 arch/x86/kernel/head_64.S:235

Allocated by task 1845:
 save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772
 kmalloc ./include/linux/slab.h:493
 kzalloc ./include/linux/slab.h:666
 rsi_91x_init+0x98/0x510 drivers/net/wireless/rsi/rsi_91x_main.c:203
 rsi_probe+0xb6/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:665
 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
 device_add+0xd0b/0x1660 drivers/base/core.c:1835
 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
 device_add+0xd0b/0x1660 drivers/base/core.c:1835
 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
 hub_port_connect drivers/usb/core/hub.c:4903
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 worker_thread+0x221/0x1850 kernel/workqueue.c:2253
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Freed by task 1845:
 save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459
 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524
 slab_free_hook mm/slub.c:1390
 slab_free_freelist_hook mm/slub.c:1412
 slab_free mm/slub.c:2988
 kfree+0xf6/0x2f0 mm/slub.c:3919
 rsi_91x_deinit+0x1e8/0x250 drivers/net/wireless/rsi/rsi_91x_main.c:268
 rsi_probe+0xed1/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:709
 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_d

Re: [PATCH iproute2 master 0/2] BPF/XDP json follow-up

2017-09-22 Thread Stephen Hemminger

On Thu, 21 Sep 2017 10:42:27 +0200
Daniel Borkmann  wrote:

> After merging net-next branch into master, Stephen asked to
> fix up json dump for XDP as there were some merge conflicts,
> so here it is.
> 
> Thanks!
> 
> Daniel Borkmann (2):
>   json: move json printer to common library
>   bpf: properly output json for xdp
> 
>  include/json_print.h |  71 
>  ip/Makefile  |   2 +-
>  ip/ip_common.h   |  65 ++
>  ip/ip_print.c| 233 
> ---
>  ip/iplink_xdp.c  |  74 +---
>  lib/Makefile |   2 +-
>  lib/bpf.c|  19 +++--
>  lib/json_print.c | 231 ++
>  8 files changed, 369 insertions(+), 328 deletions(-)
>  create mode 100644 include/json_print.h
>  delete mode 100644 ip/ip_print.c
>  create mode 100644 lib/json_print.c
> 

Applied.

[PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs

2017-09-22 Thread Bernd Edlinger

Signed-off-by: Bernd Edlinger 
---
  drivers/net/phy/Kconfig|   5 +
  drivers/net/phy/Makefile   |   1 +
  drivers/net/phy/uPD60620.c | 226 
+
  3 files changed, 232 insertions(+)
  create mode 100644 drivers/net/phy/uPD60620.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index a9d16a3..25089f0 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -287,6 +287,11 @@ config DP83867_PHY
---help---
  Currently supports the DP83867 PHY.

+config RENESAS_PHY
+   tristate "Driver for Renesas PHYs"
+   ---help---
+ Supports the uPD60620 and uPD60620A PHYs.
+
  config FIXED_PHY
tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs"
depends on PHYLIB
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 416df92..1404ad3 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY)   += mscc.o
  obj-$(CONFIG_NATIONAL_PHY)+= national.o
  obj-$(CONFIG_QSEMI_PHY)   += qsemi.o
  obj-$(CONFIG_REALTEK_PHY) += realtek.o
+obj-$(CONFIG_RENESAS_PHY)  += uPD60620.o
  obj-$(CONFIG_ROCKCHIP_PHY)+= rockchip.o
  obj-$(CONFIG_SMSC_PHY)+= smsc.o
  obj-$(CONFIG_STE10XP) += ste10Xp.o
diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c
new file mode 100644
index 000..b3d900c
--- /dev/null
+++ b/drivers/net/phy/uPD60620.c
@@ -0,0 +1,226 @@
+/*
+ * Driver for the Renesas PHY uPD60620.
+ *
+ * Copyright (C) 2015 Softing Industrial Automation GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+#define UPD60620_PHY_ID0xb8242824
+
+/* Extended Registers and values */
+/* PHY Special Control/Status*/
+#define PHY_PHYSCR 0x1F  /* PHY.31 */
+#define PHY_PHYSCR_10MB0x0004/* PHY speed = 10mb */
+#define PHY_PHYSCR_100MB   0x0008/* PHY speed = 100mb */
+#define PHY_PHYSCR_DUPLEX  0x0010/* PHY Duplex */
+#define PHY_PHYSCR_RSVD5   0x0020/* Reserved Bit 5 */
+#define PHY_PHYSCR_MIIMOD  0x0040/* Enable 4B5B MII mode */
+#define PHY_PHYSCR_RSVD7   0x0080/* Reserved Bit 7 */
+#define PHY_PHYSCR_RSVD8   0x0100/* Reserved Bit 8 */
+#define PHY_PHYSCR_RSVD9   0x0200/* Reserved Bit 9 */
+#define PHY_PHYSCR_RSVD10  0x0400/* Reserved Bit 10 */
+#define PHY_PHYSCR_RSVD11  0x0800/* Reserved Bit 11 */
+#define PHY_PHYSCR_ANDONE  0x1000/* Auto negotiation done */
+#define PHY_PHYSCR_RSVD13  0x2000/* Reserved Bit 13 */
+#define PHY_PHYSCR_RSVD14  0x4000/* Reserved Bit 14 */
+#define PHY_PHYSCR_RSVD15  0x8000/* Reserved Bit 15 */
+
+/* PHY Global Config Mapping */
+#define PHY_GLOBAL_CONFIG  0x07
+/* PHY GPIO Config Register 1 */
+#define PHY_GPIO_CONFIG1   0x01 /* PHY 7.1 */
+#define PHY_GPIO4_INT0 0x000d   /* GPIO4 configuration */
+#define PHY_GPIO5_INT1 0x00d0   /* GPIO5 configuration */
+
+/* PHY Interrupt Control Register */
+#define PHY_ICR0x1e  /* PHY.30 */
+#define PHY_ICR_RSVD0  0x0001/* Reserved bit 0 */
+#define PHY_ICR_ANCPRRN0x0002/* Auto negotiation paged received */
+#define PHY_ICR_PDFEN  0x0004/* Parallel detection fault */
+#define PHY_ICR_ANCLPAEN   0x0008/* Auto negotiation last page ack */
+#define PHY_ICR_LNKINTEN   0x0010/* Link down */
+#define PHY_ICR_REMFD  0x0020/* Remote fault detected */
+#define PHY_ICR_ANCINTEN   0x0040/* Auto negotiation complete */
+#define PHY_ICR_EOEN   0x0080/* Energy on generated */
+#define PHY_ICR_RSVD8  0x0100/* Reserved bit 8 */
+#define PHY_ICR_FEQTRGEN   0x0200/* FEQ Trigger */
+#define PHY_ICR_BERTRGEN   0x0400/* BER Counter Trigger */
+#define PHY_ICR_MLINTEN0x0800/* Maxlvl */
+#define PHY_ICR_CLPINTEN   0x1000/* Clipping */
+#define PHY_ICR_RSVD13 0x2000/* Reserved bit 13 */
+#define PHY_ICR_RSVD14 0x4000/* Reserved bit 14 */
+#define PHY_ICR_RSVD15 0x8000/* Reserved bit 15 */
+
+/* PHY Interrupt Status Register */
+#define PHY_ISR0x1d  /* PHY.29 */
+#define PHY_ISR_DUPINT 0x/* Placeholder for Duplex/Speed 
intr */
+#define PHY_ISR_RSVD0  0x0001/* Reserved bit 0 */
+#define PHY_ISR_ANCPR  0x0002/* Auto negotiation paged received */
+#define PHY_ISR_PDF0x0004/* Parallel detection fault */
+#define PHY_ISR_ANCLPA 0x0008/* Auto negotiation last page ack */
+#define PHY_ISR_LNKINT 0x0010/* Link down */
+#define PHY_ISR_REMFD  0x0020/* Remote fault detected */
+#define PHY_ISR_ANCINT 0x0040/* Auto negotiation complete */
+#define PHY_ISR_EO 0x0080/* Energy on generate

Re: [PATCH iproute2 v2] man: fix documentation for range of route table ID

2017-09-22 Thread Stephen Hemminger

On Fri, 22 Sep 2017 13:28:54 +0200
Thomas Haller  wrote:

> Signed-off-by: Thomas Haller 
> ---
> Changes in v2:
>   - "0" is not a valid table ID.
> 
>  man/man8/ip-route.8.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
> index 803de3b9..705ceb20 100644
> --- a/man/man8/ip-route.8.in
> +++ b/man/man8/ip-route.8.in
> @@ -322,7 +322,7 @@ normal routing tables.
>  .P
>  .B Route tables:
>  Linux-2.x can pack routes into several routing tables identified
> -by a number in the range from 1 to 2^31 or by name from the file
> +by a number in the range from 1 to 2^32-1 or by name from the file
>  .B @SYSCONFDIR@/rt_tables
>  By default all normal routes are inserted into the
>  .B main

Applied

Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver

2017-09-22 Thread महेश बंडेवार

>  #ifdef CONFIG_TUN_VNET_CROSS_LE
>  static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
>  {
> @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool 
> clean)
>
> tun = rtnl_dereference(tfile->tun);
>
> +   if (tun && clean) {
> +   tun_napi_disable(tun, tfile);
are we missing synchronize_net() separating disable and del calls?
> +   tun_napi_del(tun, tfile);
> +   }
> +
> if (tun && !tfile->detached) {
> u16 index = tfile->queue_index;
> BUG_ON(index >= tun->numqueues);

[PATCH][V3] e1000: avoid null pointer dereference on invalid stat type

2017-09-22 Thread Colin King

From: Colin Ian King 

Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously.  Fix this by skipping over the read of p on an invalid
stat type.

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index ec8aa4562cc9..3b3983a1ffbb 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1824,11 +1824,12 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
int i;
-   char *p = NULL;
const struct e1000_stats *stat = e1000_gstrings_stats;
 
e1000_update_stats(adapter);
-   for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) {
+   for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++, stat++) {
+   char *p;
+
switch (stat->type) {
case NETDEV_STATS:
p = (char *)netdev + stat->stat_offset;
@@ -1839,15 +1840,13 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
default:
WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
  stat->type, i);
-   break;
+   continue;
}
 
if (stat->sizeof_stat == sizeof(u64))
data[i] = *(u64 *)p;
else
data[i] = *(u32 *)p;
-
-   stat++;
}
 /* BUG_ON(i != E1000_STATS_LEN); */
 }
-- 
2.14.1

Re: [PATCH net-next 3/4] net: dsa: make slave close symmetrical to open

2017-09-22 Thread Florian Fainelli

On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> The DSA slave open function configures the unicast MAC addresses on the
> master device, enable the switch port, change its STP state, then start
> the PHY device.
> 
> Make the close function symmetric, by first stopping the PHY device,
> then changing the STP state, disabling the switch port and restore the
> master device.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable

2017-09-22 Thread Florian Fainelli

On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> The .port_enable and .port_disable functions are meant to deal with the
> switch ports only, and no driver is using the phy argument anyway.
> Remove it.

I don't think this makes sense, there are perfectly legit reasons why a
switch driver may have something to do with the PHY device attached to
its per-port network interface, we should definitively keep that around,
unless you think we should be accessing the PHY within the switch
drivers by doing:

struct phy_device *phydev = ds->ports[port].netdev->phydev?

> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/b53/b53_common.c   |  6 +++---
>  drivers/net/dsa/b53/b53_priv.h |  4 ++--
>  drivers/net/dsa/bcm_sf2.c  | 16 +++-
>  drivers/net/dsa/lan9303-core.c |  6 ++
>  drivers/net/dsa/microchip/ksz_common.c |  6 ++
>  drivers/net/dsa/mt7530.c   |  8 +++-
>  drivers/net/dsa/mv88e6xxx/chip.c   |  6 ++
>  drivers/net/dsa/qca8k.c|  6 ++
>  include/net/dsa.h  |  6 ++
>  net/dsa/slave.c|  4 ++--
>  10 files changed, 27 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/net/dsa/b53/b53_common.c 
> b/drivers/net/dsa/b53/b53_common.c
> index d4ce092def83..e46eb29d29f0 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -502,7 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int 
> cpu_port)
>  }
>  EXPORT_SYMBOL(b53_imp_vlan_setup);
>  
> -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
> +int b53_enable_port(struct dsa_switch *ds, int port)
>  {
>   struct b53_device *dev = ds->priv;
>   unsigned int cpu_port = dev->cpu_port;
> @@ -531,7 +531,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, 
> struct phy_device *phy)
>  }
>  EXPORT_SYMBOL(b53_enable_port);
>  
> -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device 
> *phy)
> +void b53_disable_port(struct dsa_switch *ds, int port)
>  {
>   struct b53_device *dev = ds->priv;
>   u8 reg;
> @@ -874,7 +874,7 @@ static int b53_setup(struct dsa_switch *ds)
>   if (dsa_is_cpu_port(ds, port))
>   b53_enable_cpu_port(dev, port);
>   else if (!(BIT(port) & ds->enabled_port_mask))
> - b53_disable_port(ds, port, NULL);
> + b53_disable_port(ds, port);
>   }
>  
>   return ret;
> diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
> index 603c66d240d8..688d02ee6155 100644
> --- a/drivers/net/dsa/b53/b53_priv.h
> +++ b/drivers/net/dsa/b53/b53_priv.h
> @@ -311,8 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port,
>  struct dsa_mall_mirror_tc_entry *mirror, bool ingress);
>  void b53_mirror_del(struct dsa_switch *ds, int port,
>   struct dsa_mall_mirror_tc_entry *mirror);
> -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy);
> -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device 
> *phy);
> +int b53_enable_port(struct dsa_switch *ds, int port);
> +void b53_disable_port(struct dsa_switch *ds, int port);
>  void b53_brcm_hdr_setup(struct dsa_switch *ds, int port);
>  void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable);
>  int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy);
> diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
> index ad96b9725a2c..77e0c43f973b 100644
> --- a/drivers/net/dsa/bcm_sf2.c
> +++ b/drivers/net/dsa/bcm_sf2.c
> @@ -159,8 +159,7 @@ static inline void bcm_sf2_port_intr_disable(struct 
> bcm_sf2_priv *priv,
>   intrl2_1_writel(priv, P_IRQ_MASK(off), INTRL2_CPU_CLEAR);
>  }
>  
> -static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
> -   struct phy_device *phy)
> +static int bcm_sf2_port_setup(struct dsa_switch *ds, int port)
>  {
>   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
>   unsigned int i;
> @@ -191,11 +190,10 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, 
> int port,
>   if (port == priv->moca_port)
>   bcm_sf2_port_intr_enable(priv, port);
>  
> - return b53_enable_port(ds, port, phy);
> + return b53_enable_port(ds, port);
>  }
>  
> -static void bcm_sf2_port_disable(struct dsa_switch *ds, int port,
> -  struct phy_device *phy)
> +static void bcm_sf2_port_disable(struct dsa_switch *ds, int port)
>  {
>   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
>   u32 off, reg;
> @@ -214,7 +212,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, 
> int port,
>   else
>   off = CORE_G_PCTL_PORT(port);
>  
> - b53_disable_port(ds, port, phy);
> + b53_disable_port(ds, port);
>  
>   /* Power down the port memory */
>   reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL);
> @@

Re: [PATCH net-next 4/4] net: dsa: add port enable and disable helpers

2017-09-22 Thread Florian Fainelli

On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> Provide dsa_port_enable and dsa_port_disable helpers to respectively
> enable and disable a switch port. This makes the dsa_port_set_state_now
> helper static.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver

2017-09-22 Thread Petar Penkov

On Fri, Sep 22, 2017 at 9:51 AM, Mahesh Bandewar (महेश बंडेवार)
 wrote:
> On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn
>  wrote:
>>> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file 
>>> *file, struct ifreq *ifr)
>>> if (tfile->detached)
>>> return -EINVAL;
>>>
>>> +   if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
>>> +   return -EPERM;
>>> +
>>
>> This should perhaps be moved into the !dev branch, directly below the
>> ns_capable check.
>>
> Hmm, does that mean fail only on creation but allow to attach if
> exists? That would be wrong, isn't it? Correct me if I'm wrong but we
> want to prevent both these scenarios if user does not have sufficient
> privileges (i.e. NET_ADMIN in init-ns).
>
My understanding is we want to protect both scenarios.
>>> dev = __dev_get_by_name(net, ifr->ifr_name);
>>> if (dev) {
>>> if (ifr->ifr_flags & IFF_TUN_EXCL)
>>> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file 
>>> *file, struct ifreq *ifr)
>>> tun->flags = (tun->flags & ~TUN_FEATURES) |
>>> (ifr->ifr_flags & TUN_FEATURES);
>>>
>>> +   if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != 
>>> IFF_TAP)
>>> +   tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
>>> +
>>
>> Similarly, this check only need to be performed in that branch.
>> Instead of reverting to non-frags mode, a tun_set_iff with the wrong
>> set of flags should probably fail hard.
> Yes, agree, wrong set of flags should fail hard and probably be done
> before attach or open, no?
Agreed, in v3 I will push this check before the conditional so both
branches can be rejected with EINVAL.

Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver

2017-09-22 Thread Willem de Bruijn

On Fri, Sep 22, 2017 at 1:48 PM, Petar Penkov  wrote:
> On Fri, Sep 22, 2017 at 9:51 AM, Mahesh Bandewar (महेश बंडेवार)
>  wrote:
>> On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn
>>  wrote:
 @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file 
 *file, struct ifreq *ifr)
 if (tfile->detached)
 return -EINVAL;

 +   if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
 +   return -EPERM;
 +
>>>
>>> This should perhaps be moved into the !dev branch, directly below the
>>> ns_capable check.
>>>
>> Hmm, does that mean fail only on creation but allow to attach if
>> exists? That would be wrong, isn't it? Correct me if I'm wrong but we
>> want to prevent both these scenarios if user does not have sufficient
>> privileges (i.e. NET_ADMIN in init-ns).

Ok.

>>
> My understanding is we want to protect both scenarios.
 dev = __dev_get_by_name(net, ifr->ifr_name);
 if (dev) {
 if (ifr->ifr_flags & IFF_TUN_EXCL)
 @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file 
 *file, struct ifreq *ifr)
 tun->flags = (tun->flags & ~TUN_FEATURES) |
 (ifr->ifr_flags & TUN_FEATURES);

 +   if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != 
 IFF_TAP)
 +   tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
 +
>>>
>>> Similarly, this check only need to be performed in that branch.
>>> Instead of reverting to non-frags mode, a tun_set_iff with the wrong
>>> set of flags should probably fail hard.
>> Yes, agree, wrong set of flags should fail hard and probably be done
>> before attach or open, no?
> Agreed, in v3 I will push this check before the conditional so both
> branches can be rejected with EINVAL.

Sounds great.

Re: [PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs

2017-09-22 Thread Andrew Lunn

On Fri, Sep 22, 2017 at 05:08:45PM +, Bernd Edlinger wrote:
> Signed-off-by: Bernd Edlinger 
> ---
>   drivers/net/phy/Kconfig|   5 +
>   drivers/net/phy/Makefile   |   1 +
>   drivers/net/phy/uPD60620.c | 226 
> +
>   3 files changed, 232 insertions(+)
>   create mode 100644 drivers/net/phy/uPD60620.c
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index a9d16a3..25089f0 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -287,6 +287,11 @@ config DP83867_PHY
>   ---help---
> Currently supports the DP83867 PHY.
> 
> +config RENESAS_PHY
> + tristate "Driver for Renesas PHYs"
> + ---help---
> +   Supports the uPD60620 and uPD60620A PHYs.
> +

Hi Bernd

Please call this "Reneseas PHYs" and place in it alphabetical order.

>   config FIXED_PHY
>   tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs"
>   depends on PHYLIB
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 416df92..1404ad3 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY) += mscc.o
>   obj-$(CONFIG_NATIONAL_PHY)  += national.o
>   obj-$(CONFIG_QSEMI_PHY) += qsemi.o
>   obj-$(CONFIG_REALTEK_PHY)   += realtek.o
> +obj-$(CONFIG_RENESAS_PHY)+= uPD60620.o
>   obj-$(CONFIG_ROCKCHIP_PHY)  += rockchip.o
>   obj-$(CONFIG_SMSC_PHY)  += smsc.o
>   obj-$(CONFIG_STE10XP)   += ste10Xp.o
> diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c
> new file mode 100644
> index 000..b3d900c
> --- /dev/null
> +++ b/drivers/net/phy/uPD60620.c
> @@ -0,0 +1,226 @@
> +/*
> + * Driver for the Renesas PHY uPD60620.
> + *
> + * Copyright (C) 2015 Softing Industrial Automation GmbH
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#define UPD60620_PHY_ID0xb8242824
> +
> +/* Extended Registers and values */
> +/* PHY Special Control/Status*/
> +#define PHY_PHYSCR 0x1F  /* PHY.31 */
> +#define PHY_PHYSCR_10MB0x0004/* PHY speed = 10mb */
> +#define PHY_PHYSCR_100MB   0x0008/* PHY speed = 100mb */
> +#define PHY_PHYSCR_DUPLEX  0x0010/* PHY Duplex */
> +#define PHY_PHYSCR_RSVD5   0x0020/* Reserved Bit 5 */
> +#define PHY_PHYSCR_MIIMOD  0x0040/* Enable 4B5B MII mode */

Are any of these comments actually useful. It seems like the defines
are pretty obvious.

> +#define PHY_PHYSCR_RSVD7   0x0080/* Reserved Bit 7 */
> +#define PHY_PHYSCR_RSVD8   0x0100/* Reserved Bit 8 */
> +#define PHY_PHYSCR_RSVD9   0x0200/* Reserved Bit 9 */
> +#define PHY_PHYSCR_RSVD10  0x0400/* Reserved Bit 10 */
> +#define PHY_PHYSCR_RSVD11  0x0800/* Reserved Bit 11 */
> +#define PHY_PHYSCR_ANDONE  0x1000/* Auto negotiation done */
> +#define PHY_PHYSCR_RSVD13  0x2000/* Reserved Bit 13 */
> +#define PHY_PHYSCR_RSVD14  0x4000/* Reserved Bit 14 */
> +#define PHY_PHYSCR_RSVD15  0x8000/* Reserved Bit 15 */

It looks like the only register you use is SCR and SPM. Maybe delete
all the rest? Or do you plan to add more features making use of these
registers?

> +/* Init PHY */
> +
> +static int upd60620_config_init(struct phy_device *phydev)
> +{
> + /* Enable support for passive HUBs (could be a strap option) */
> + /* PHYMODE: All speeds, HD in parallel detect */
> + return phy_write(phydev, PHY_SPM, 0x0180 | phydev->mdio.addr);
> +}
> +
> +/* Get PHY status from common registers */
> +
> +static int upd60620_read_status(struct phy_device *phydev)
> +{
> + int phy_state;
> +
> + /* Read negotiated state */
> + phy_state = phy_read(phydev, MII_BMSR);
> + if (phy_state < 0)
> + return phy_state;
> +
> + phydev->link = 0;
> + phydev->lp_advertising = 0;
> + phydev->pause = 0;
> + phydev->asym_pause = 0;
> +
> + if (phy_state & BMSR_ANEGCOMPLETE) {

It is worth comparing this against genphy_read_status() which is the
reference implementation. You would normally check if auto negotiation
is enabled, not if it has completed. If it is enabled you read the
current negotiated state, even if it is not completed.

> + phy_state = phy_read(phydev, PHY_PHYSCR);
> + if (phy_state < 0)
> + return phy_state;
> +
> + if (phy_state & (PHY_PHYSCR_10MB | PHY_PHYSCR_100MB)) {
> + phydev->link = 1;
> + phydev->speed = SPEED_10;
> + phydev->duplex = DUPLEX_HALF;
> +
> + if (phy_state & PHY_PHYSCR_100MB)
> + phydev->speed = SPEED_100;
> + if (phy_state

Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver

2017-09-22 Thread Willem de Bruijn

On Fri, Sep 22, 2017 at 1:11 PM, Mahesh Bandewar (महेश बंडेवार)
 wrote:
>>  #ifdef CONFIG_TUN_VNET_CROSS_LE
>>  static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
>>  {
>> @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool 
>> clean)
>>
>> tun = rtnl_dereference(tfile->tun);
>>
>> +   if (tun && clean) {
>> +   tun_napi_disable(tun, tfile);
> are we missing synchronize_net() separating disable and del calls?

That is not needed here. napi_disable has its own mechanism for waiting
until a napi struct is no longer run. netif_napi_del will call synchronize_net
if needed. These two calls are made one after the other in quite a few drivers.

1 2 >

1 - 100 of 172 matches

Mail list logo