date:20201215

RE: [PATCH net 2/2] vhost_net: fix high cpu load when sendmsg fails

2020-12-15 Thread wangyunjian



> -----Original Message-----
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Tuesday, December 15, 2020 12:10 PM
> To: wangyunjian ; netdev@vger.kernel.org;
> m...@redhat.com; willemdebruijn.ker...@gmail.com
> Cc: virtualizat...@lists.linux-foundation.org; Lilijun (Jerry)
> ; chenchanghu ;
> xudingke ; huangbin (J)
> 
> Subject: Re: [PATCH net 2/2] vhost_net: fix high cpu load when sendmsg fails
> 
> 
> On 2020/12/15 上午9:48, wangyunjian wrote:
> > From: Yunjian Wang 
> >
> > Currently we break the loop and wake up the vhost_worker when sendmsg
> > fails. When the worker wakes up again, it hits the same error, which
> > causes high CPU load. To fix this, skip the descriptor by ignoring
> > the error. When sndbuf is exceeded, sendmsg returns -EAGAIN; in that
> > case we neither skip the descriptor nor drop the packet.
> >
> > Signed-off-by: Yunjian Wang 
> > ---
> >   drivers/vhost/net.c | 21 +
> >   1 file changed, 9 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index
> > c8784dfafdd7..f966592d8900 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net,
> struct socket *sock)
> > msg.msg_flags &= ~MSG_MORE;
> > }
> >
> > -   /* TODO: Check specific error and bomb out unless ENOBUFS? */
> > err = sock->ops->sendmsg(sock, &msg, len);
> > -   if (unlikely(err < 0)) {
> > +   if (unlikely(err == -EAGAIN)) {
> > vhost_discard_vq_desc(vq, 1);
> > vhost_net_enable_vq(net, vq);
> > break;
> > -   }
> 
> 
> As I pointed out on the last version: if you don't discard the descriptor,
> you probably need to add the head to the used ring. Otherwise this
> descriptor will always be in flight, which may confuse drivers.

Sorry for missing the comment.

After removing the discard and the break, processing continues exactly as
it does after a successful sendmsg(): vhost_zerocopy_signal_used() or
vhost_add_used_and_signal() is called to add the head to the used ring.

Thanks
> 
> 
> > -   if (err != len)
> > -   pr_debug("Truncated TX packet: len %d != %zd\n",
> > -err, len);
> > +   } else if (unlikely(err < 0 || err != len))
> 
> 
> It looks to me err != len covers err < 0.

OK
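
The point that err != len already covers err < 0 can be illustrated with a minimal standalone sketch. It assumes, as in the quoted hunk, that err is an int and len is a size_t; the helper name tx_incomplete() is hypothetical and is not part of vhost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of vhost: true when a send did not
 * transmit exactly len bytes. */
static bool tx_incomplete(int err, size_t len)
{
	/* err is converted to size_t for the comparison, so a negative
	 * errno becomes a very large unsigned value that cannot equal a
	 * realistic packet length. */
	return (size_t)err != len;
}
```

Under that assumption a single comparison catches both a negative return value and a short write, so a separate err < 0 test adds nothing.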

> 
> Thanks
> 
> 
> > +   vq_err(vq, "Fail to sending packets err : %d, len : 
> > %zd\n", err,
> > +len);
> >   done:
> > vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> > vq->heads[nvq->done_idx].len = 0;
> > @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net
> *net, struct socket *sock)
> > msg.msg_flags &= ~MSG_MORE;
> > }
> >
> > -   /* TODO: Check specific error and bomb out unless ENOBUFS? */
> > err = sock->ops->sendmsg(sock, &msg, len);
> > if (unlikely(err < 0)) {
> > if (zcopy_used) {
> > @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net
> *net, struct socket *sock)
> > nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> > % UIO_MAXIOV;
> > }
> > -   vhost_discard_vq_desc(vq, 1);
> > -   vhost_net_enable_vq(net, vq);
> > -   break;
> > +   if (err == -EAGAIN) {
> > +   vhost_discard_vq_desc(vq, 1);
> > +   vhost_net_enable_vq(net, vq);
> > +   break;
> > +   }
> > }
> > if (err != len)
> > -   pr_debug("Truncated TX packet: "
> > -" len %d != %zd\n", err, len);
> > +   vq_err(vq, "Fail to sending packets err : %d, len : 
> > %zd\n", err,
> > +len);
> > if (!zcopy_used)
> > vhost_add_used_and_signal(&net->dev, vq, head, 0);
> > else

Re: [RFC] net: stmmac: Problem with adding the native GPIOs support

2020-12-15 Thread Serge Semin

Hello Alexandre,

Thanks for the response. My comments are below.

On Mon, Dec 14, 2020 at 11:52:14AM +0100, Alexandre Torgue wrote:
> Hi Serge,
> 

> Sorry, I have never used the GPIOs provided by the DWMAC IP. Obviously, I
> think it is too late for you to use the GPIOs provided by your SoC directly.
> Unfortunately, it seems to be a "perfect" chicken-and-egg problem :(.

If you mean the problem that the PHY gets reset together with the MAC
reset, then to some extent it is indeed a chicken-and-egg problem, but
it affects the STMMAC driver only because of how stmmac_reset() is
implemented: it waits for the SWR flag to clear right in the same
method, but the flag won't clear until all the clocks are ready, which
isn't possible until the PHY reset is released, so the DMA reset times
out. The solution to that is simple. If we first performed the reset
procedure, then initialized/attached the PHY, and only after that made
sure the DMA_BUS_MODE.SFT_RESET flag was cleared, the problem wouldn't
even be noticeable. But that would solve just a part of the problem.
The driver would still perform the MAC reset in the PM resume()
callback, which in my case automatically resets the PHY, while the PHY
subsystem doesn't expect that.
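
The ordering issue described above can be sketched as a toy model. This is not driver code: every name below is hypothetical, and the model simply assumes that the reset flag clears only once the PHY is out of reset and that the PHY can be released while the flag is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the STMMAC driver. */
struct toy_gmac {
	bool swr;		/* software-reset flag, set while reset is pending */
	bool phy_in_reset;	/* PHY #RST is driven by the MAC's GPO */
};

/* Starting a MAC reset also clears the GPO, which puts the PHY in reset. */
static void toy_start_reset(struct toy_gmac *m)
{
	m->swr = true;
	m->phy_in_reset = true;
}

/* Re-initializing the PHY releases #RST, so its Rx clock runs again. */
static void toy_release_phy(struct toy_gmac *m)
{
	m->phy_in_reset = false;
}

/* Poll the flag; the modeled hardware clears it only once the clocks run. */
static int toy_wait_swr_clear(struct toy_gmac *m, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (!m->phy_in_reset)
			m->swr = false;
		if (!m->swr)
			return 0;
	}
	return -1;	/* stands in for the DMA-reset timeout */
}

/* Current order: reset and wait in one step. */
static int toy_reset_then_wait(void)
{
	struct toy_gmac m = { false, false };

	toy_start_reset(&m);
	return toy_wait_swr_clear(&m, 100);
}

/* Proposed order: reset, bring the PHY back, then wait. */
static int toy_reset_phy_then_wait(void)
{
	struct toy_gmac m = { false, false };

	toy_start_reset(&m);
	toy_release_phy(&m);
	return toy_wait_swr_clear(&m, 100);
}
```

In this model toy_reset_then_wait() always times out, while toy_reset_phy_then_wait() succeeds, which mirrors the first half of the problem only; the resume() path is not modeled.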

So in order to make the driver work properly in every situation we
either need to take the possible PHY reset into account in both the
open() and PM resume() callbacks, or get rid of the reset there
completely.

The perfect solution would be not to reset the MAC every time on the
network device open and resume procedures. In that case we could reset
the controller just once in stmmac_dvr_probe(), then register the GPIO
interface and use it for the MDIO bus or whatever else with no
problems. What do you think of that? Is that even possible, seeing
that, for example, the AMD xGBE driver doesn't reset the MAC on network
device open? Yeah, the GMAC manual states that the DMA initialization
needs to start with the GMAC reset, but do we really need to do that
every time on device open/resume? Wouldn't it be enough to reset the
device just on probe?

> 
> Do you have the possibility to "play" with the GPIO settings? I mean, change
> their configuration (at least for the reset one) before performing a DMA reset:
> if you have a pull-up on the RST line and you could "disconnect" the GPO inside
> the GMAC, then your PHY should remain on during the DMA reset phase.

Alas, no. It is impossible to do anything with the hardware now. We need
to deal with what we currently have. The GPO line is externally pulled
down to GND on all the Baikal-T1 SoC-based hardware, and this is not a
single type of device but several of them, which have been produced for
more than three years now. We also can't detach/disconnect the GPO
inside the GMAC or anywhere else, because the SoC has already been
synthesized with no such feature. So when the GPIO register is reset or
the GPIO.GPO field is cleared, the PHY enters the reset state, and that
concerns all the devices. :(

-Sergey

> 
> regards
> Alex
> 
> On 12/14/20 10:25 AM, Serge Semin wrote:
> > Hello folks,
> > 
> > I've got a problem, which has been blowing my head up for more than three
> > weeks now, and I desperately need your help in that matter. See, our
> > Baikal-T1 SoC is created with two DW GMAC v3.73a IP-cores. Each core
> > has been synthesized with two GPIOs: one as GPI and another as GPO. Multiple
> > Baikal-T1-based devices have been created so far with active
> > GMAC interface usage, and each of them has been designed like this:
> > 
> >   +------------------------+
> >   | Baikal-T1 +------------+   +------------+
> >   |   SoC     | DW GMAC    |   |   Some PHY |
> >   |           |      Rx-clk+<--+Rx-clk      |
> >   |           |            |   |            |
> >   |           |         GPI+<--+#IRQ        |
> >   |           |            |   |            |
> >   |           |       RGMII+<->+RGMII       |
> >   |           |        MDIO+<->+MDIO        |
> >   |           |            |   |            |
> >   |           |         GPO+-->+#RST        |
> >   |           |            |   |            |
> >   |           |      Tx-clk+-->+Tx-clk      |
> >   |           |            |   |            |
> >   |           +------------+   +------------+
> >   +------------------------+
> > 
> > Each of such devices has got an external RGMII-PHY attached, configured via
> > the MDIO bus, with Rx-clock supplied by the PHY and Tx-clock consumed by it.
> > The main peculiarity of such a configuration is that the DW GMAC GPIOs have
> > been used to catch the PHY IRQs and to reset the PHY. Seeing the GPIOs support
> > hasn't been added to the STMMAC driver, it's the very first setup for now which
> > has been using them. Anyway, the hardware setup depicted above doesn't seem
> > problematic at first glance, but in fact it is. See, the DW *MAC driver
> > (STMMAC ethernet driver) is doi

Re: [RFC] net: stmmac: Problem with adding the native GPIOs support

2020-12-15 Thread Serge Semin

Hello Andrew,

On Mon, Dec 14, 2020 at 04:31:43PM +0100, Andrew Lunn wrote:
> On Mon, Dec 14, 2020 at 12:25:16PM +0300, Serge Semin wrote:
> > Hello folks,
> > 
> > I've got a problem, which has been blowing my head up for more than three
> > weeks now, and I desperately need your help in that matter. See, our
> > Baikal-T1 SoC is created with two DW GMAC v3.73a IP-cores. Each core
> > has been synthesized with two GPIOs: one as GPI and another as GPO. Multiple
> > Baikal-T1-based devices have been created so far with active
> > GMAC interface usage, and each of them has been designed like this:
> > 
> >  +------------------------+
> >  | Baikal-T1 +------------+   +------------+
> >  |   SoC     | DW GMAC    |   |   Some PHY |
> >  |           |      Rx-clk+<--+Rx-clk      |
> >  |           |            |   |            |
> >  |           |         GPI+<--+#IRQ        |
> >  |           |            |   |            |
> >  |           |       RGMII+<->+RGMII       |
> >  |           |        MDIO+<->+MDIO        |
> >  |           |            |   |            |
> >  |           |         GPO+-->+#RST        |
> >  |           |            |   |            |
> >  |           |      Tx-clk+-->+Tx-clk      |
> >  |           |            |   |            |
> >  |           +------------+   +------------+
> >  +------------------------+
> > 
> > Each of such devices has got an external RGMII-PHY attached, configured via
> > the MDIO bus, with Rx-clock supplied by the PHY and Tx-clock consumed by it.
> > The main peculiarity of such a configuration is that the DW GMAC GPIOs have
> > been used to catch the PHY IRQs and to reset the PHY. Seeing the GPIOs support
> > hasn't been added to the STMMAC driver, it's the very first setup for now which
> > has been using them.
> 

> It sounds like you need to cleanly implement a GPIO controller within
> the stmmac driver. But you probably want to make it conditional on a
> DT property. For example, look to see if there is the
> 'gpio-controller;'

Yep, that's what I have already done. The problem is that the GPO
state gets reset together with the MAC reset. So we don't have full
control over the GPO state when the MAC gets reset.

> 
> > Anyway the hardware setup depicted above doesn't seem
> > problematic at the first glance, but in fact it is. See, the DW *MAC driver
> > (STMMAC ethernet driver) is doing the MAC reset each time it performs the
> > device open or resume by means of the call-chain:
> > 
> >   stmmac_open()---+
> >                   +->stmmac_hw_setup()->stmmac_init_dma_engine()->stmmac_reset().
> >   stmmac_resume()-+
> > 
> > Such a reset causes the whole interface reset: MAC, DMA and, what is more
> > important, GPIOs, as they are exposed as part of the MAC registers. That
> > in our case automatically causes the external PHY reset, which neither
> > the STMMAC driver nor the PHY subsystem expects at all.
> 

> Is the reset of the GPIO sub block under software control? When you
> have a GPIO controller implemented, you would want to disable this.

Not sure I've fully understood your question. The GPIO sub-block of
the MAC gets reset together with the MAC. So when we reset the MAC,
the GPO state gets reset too. Since the STMMAC driver performs the
reset in the open() and resume() callbacks, the GPIOs get reset
synchronously there too. That's the main problem. We can't change the
MAC reset behavior. So we either get rid of the reset or somehow take
the results of the reset into account in software (like reinitializing
the PHY after it).

> 
> Once you have a GPIO controller, you can make use of the standard PHY
> DT properties to allow the PHY driver to make use of the interrupt,
> and to control the reset of the PHY.

Yeah, that's what I initially intended to implement. If only the
GPIO-control register weren't reset on the MAC reset, I wouldn't even
have asked the question.

-Sergey

> 
>  Andrew

[PATCH net-next] ibmvnic: merge do_change_param_reset into do_reset

2020-12-15 Thread Lijun Pan

Commit b27507bb59ed ("net/ibmvnic: unlock rtnl_lock in reset so
linkwatch_event can run") introduced the do_change_param_reset function
to solve the rtnl lock issue. The majority of the code in
do_change_param_reset duplicates do_reset, and the rtnl lock issue can
be handled in do_reset itself. Hence merge do_change_param_reset back
into do_reset to clean up the code.
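
The locking idea can be shown with a small standalone sketch. It is a toy model rather than driver code: the toy_* names are hypothetical, the counter stands in for the rtnl lock, and releasing the lock under the same condition as acquiring it is an assumption, since that part of the patch is not visible here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; the real driver uses rtnl_lock()/rtnl_unlock(). */
enum toy_reset_reason { TOY_RESET_FAILOVER, TOY_RESET_CHANGE_PARAM };

struct toy_lock { int depth; };	/* times held; must never exceed 1 */

static void toy_lock_acquire(struct toy_lock *l) { l->depth++; }
static void toy_lock_release(struct toy_lock *l) { l->depth--; }

/* Returns the lock depth observed while the reset work would run. */
static int toy_do_reset(struct toy_lock *l, enum toy_reset_reason reason)
{
	/* The change-param requestor already holds the lock. */
	bool take_lock = reason != TOY_RESET_CHANGE_PARAM;
	int depth_during_reset;

	if (take_lock)
		toy_lock_acquire(l);

	depth_during_reset = l->depth;	/* reset work would run here */

	if (take_lock)
		toy_lock_release(l);

	return depth_during_reset;
}
```

Either way the reset body runs with the lock held exactly once, which is what allows a single function to serve both callers.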

Signed-off-by: Lijun Pan 
---
This patch was accepted into net-next as 16b5f5ce351f but was reverted
in 9f32c27eb4fc to yield to other under-testing patches. Since those
bug fix patches are already accepted, resubmit this one.

 drivers/net/ethernet/ibm/ibmvnic.c | 154 +
 1 file changed, 44 insertions(+), 110 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index f302504faa8a..f6d3b20a5361 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1924,92 +1924,6 @@ static int ibmvnic_set_mac(struct net_device *netdev, void *p)
return rc;
 }
 
-/**
- * do_change_param_reset returns zero if we are able to keep processing reset
- * events, or non-zero if we hit a fatal error and must halt.
- */
-static int do_change_param_reset(struct ibmvnic_adapter *adapter,
-struct ibmvnic_rwi *rwi,
-u32 reset_state)
-{
-   struct net_device *netdev = adapter->netdev;
-   int i, rc;
-
-   netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n",
-  rwi->reset_reason);
-
-   netif_carrier_off(netdev);
-   adapter->reset_reason = rwi->reset_reason;
-
-   ibmvnic_cleanup(netdev);
-
-   if (reset_state == VNIC_OPEN) {
-   rc = __ibmvnic_close(netdev);
-   if (rc)
-   goto out;
-   }
-
-   release_resources(adapter);
-   release_sub_crqs(adapter, 1);
-   release_crq_queue(adapter);
-
-   adapter->state = VNIC_PROBED;
-
-   rc = init_crq_queue(adapter);
-
-   if (rc) {
-   netdev_err(adapter->netdev,
-  "Couldn't initialize crq. rc=%d\n", rc);
-   return rc;
-   }
-
-   rc = ibmvnic_reset_init(adapter, true);
-   if (rc) {
-   rc = IBMVNIC_INIT_FAILED;
-   goto out;
-   }
-
-   /* If the adapter was in PROBE state prior to the reset,
-* exit here.
-*/
-   if (reset_state == VNIC_PROBED)
-   goto out;
-
-   rc = ibmvnic_login(netdev);
-   if (rc) {
-   goto out;
-   }
-
-   rc = init_resources(adapter);
-   if (rc)
-   goto out;
-
-   ibmvnic_disable_irqs(adapter);
-
-   adapter->state = VNIC_CLOSED;
-
-   if (reset_state == VNIC_CLOSED)
-   return 0;
-
-   rc = __ibmvnic_open(netdev);
-   if (rc) {
-   rc = IBMVNIC_OPEN_FAILED;
-   goto out;
-   }
-
-   /* refresh device's multicast list */
-   ibmvnic_set_multi(netdev);
-
-   /* kick napi */
-   for (i = 0; i < adapter->req_rx_queues; i++)
-   napi_schedule(&adapter->napi[i]);
-
-out:
-   if (rc)
-   adapter->state = reset_state;
-   return rc;
-}
-
 /**
  * do_reset returns zero if we are able to keep processing reset events, or
  * non-zero if we hit a fatal error and must halt.
@@ -2027,7 +1941,11 @@ static int do_reset(struct ibmvnic_adapter *adapter,
   adapter->state, adapter->failover_pending,
   rwi->reset_reason, reset_state);
 
-   rtnl_lock();
+   adapter->reset_reason = rwi->reset_reason;
+   /* requestor of VNIC_RESET_CHANGE_PARAM already has the rtnl lock */
+   if (!(adapter->reset_reason == VNIC_RESET_CHANGE_PARAM))
+   rtnl_lock();
+
/*
 * Now that we have the rtnl lock, clear any pending failover.
 * This will ensure ibmvnic_open() has either completed or will
@@ -2037,7 +1955,6 @@ static int do_reset(struct ibmvnic_adapter *adapter,
adapter->failover_pending = false;
 
netif_carrier_off(netdev);
-   adapter->reset_reason = rwi->reset_reason;
 
old_num_rx_queues = adapter->req_rx_queues;
old_num_tx_queues = adapter->req_tx_queues;
@@ -2049,25 +1966,37 @@ static int do_reset(struct ibmvnic_adapter *adapter,
if (reset_state == VNIC_OPEN &&
adapter->reset_reason != VNIC_RESET_MOBILITY &&
adapter->reset_reason != VNIC_RESET_FAILOVER) {
-   adapter->state = VNIC_CLOSING;
+   if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
+   rc = __ibmvnic_close(netdev);
+   if (rc)
+   goto out;
+   } else {
+   adapter->state = VNIC_CLOSING;
 
-   /* Release the RTNL lock before link state change and
-* re-acquire after the link

[PATCH] net: phy: fix kernel-doc for .config_intr()

2020-12-15 Thread Ioana Ciornei

From: Ioana Ciornei 

Fix the kernel-doc for .config_intr() so that we do not trigger a
warning like below.

include/linux/phy.h:869: warning: Function parameter or member 'config_intr' not described in 'phy_driver'

Fixes: 6527b938426f ("net: phy: remove the .did_interrupt() and .ack_interrupt() callback")
Signed-off-by: Ioana Ciornei 
---
 include/linux/phy.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 381a95732b6a..9effb511acde 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -743,7 +743,8 @@ struct phy_driver {
/** @read_status: Determines the negotiated speed and duplex */
int (*read_status)(struct phy_device *phydev);
 
-   /** @config_intr: Enables or disables interrupts.
+   /**
+* @config_intr: Enables or disables interrupts.
 * It should also clear any pending interrupts prior to enabling the
 * IRQs and after disabling them.
 */
-- 
2.28.0

Re: [PATCH] net: phy: fix kernel-doc for .config_intr()

2020-12-15 Thread Ioana Ciornei

On Tue, Dec 15, 2020 at 10:37:51AM +0200, Ioana Ciornei wrote:
> From: Ioana Ciornei 
> 
> Fix the kernel-doc for .config_intr() so that we do not trigger a
> warning like below.
> 
> include/linux/phy.h:869: warning: Function parameter or member 'config_intr' not described in 'phy_driver'
> 
> Fixes: 6527b938426f ("net: phy: remove the .did_interrupt() and .ack_interrupt() callback")
> Signed-off-by: Ioana Ciornei 

Sorry, I just realized that Jakub already sent a fix for this:

https://lore.kernel.org/netdev/20201215063750.3120976-1-k...@kernel.org/

Ioana


> ---
>  include/linux/phy.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 381a95732b6a..9effb511acde 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -743,7 +743,8 @@ struct phy_driver {
>   /** @read_status: Determines the negotiated speed and duplex */
>   int (*read_status)(struct phy_device *phydev);
>  
> - /** @config_intr: Enables or disables interrupts.
> + /**
> +  * @config_intr: Enables or disables interrupts.
>* It should also clear any pending interrupts prior to enabling the
>* IRQs and after disabling them.
>*/
> -- 
> 2.28.0
>

Re: [PATCH 04/25] dt-bindings: net: dwmac: Refactor snps,*-config properties

2020-12-15 Thread Serge Semin

Hello Rob,

On Mon, Dec 14, 2020 at 08:30:06AM -0600, Rob Herring wrote:
> On Mon, Dec 14, 2020 at 12:15:54PM +0300, Serge Semin wrote:
> > Currently the "snps,axi-config", "snps,mtl-rx-config" and
> > "snps,mtl-tx-config" properties are declared as a single phandle reference
> > to a node with corresponding parameters defined. That's not good for
> > several reasons. First of all scattering around a device tree some
> > particular device-specific configs with no visual relation to that device
> > isn't suitable from maintainability point of view. That leads to a
> > disturbed representation of the actual device tree mixing actual device
> > nodes and some vendor-specific configs. Secondly using the same configs
> > set for several device nodes doesn't represent well the devices structure,
> > since the interfaces these configs describe in hardware belong to
> > different devices and may actually differ. In the latter case having the
> > configs node separated from the corresponding device nodes becomes
> > unjustified altogether.
> > 
> > So instead of having a separate DW *MAC configs nodes we suggest to
> > define them as sub-nodes of the device nodes, which interfaces they
> > actually describe. By doing so we'll make the DW *MAC nodes visually
> > correct describing all the aspects of the IP-core configuration. Thus
> > we'll be able to describe the configs sub-nodes bindings right in the
> > snps,dwmac.yaml file.
> > 
> > Note the former "snps,axi-config", "snps,mtl-rx-config" and
> > "snps,mtl-tx-config" bindings have been marked as deprecated.
> > 
> > Signed-off-by: Serge Semin 
> > 
> > ---
> > 
> > Note the current DT schema tool requires the vendor-specific properties to
> > be defined in accordance with the schema:
> > dtschema/meta-schemas/vendor-props.yaml
> > It means the property can be:
> > - boolean,
> > - string,
> > - defined with $ref and additional constraints,
> > - defined with allOf: [ $ref ] and additional constraints.
> > 
> > The modification provided by this commit needs that definition extended so
> > that the DT schema tool parses this schema correctly. That is, we need to
> > let the vendor-specific properties also accept the oneOf-based combined
> > sub-schema. Like this:
> > 
> > --- a/dtschema/meta-schemas/vendor-props.yaml
> > +++ b/dtschema/meta-schemas/vendor-props.yaml
> > @@ -48,15 +48,24 @@
> >- properties:   # A property with a type and additional constraints
> >$ref:
> >  pattern: "types.yaml#[\/]{0,1}definitions\/.*"
> > -  allOf:
> > -items:
> > -  - properties:
> > +
> > +if:
> > +  not:
> > +required:
> > +  - $ref
> > +then:
> > +  patternProperties:
> > +"^(all|one)Of$":
> > +  contains:
> > +properties:
> >$ref:
> >  pattern: "types.yaml#[\/]{0,1}definitions\/.*"
> >  required:
> >- $ref
> > -oneOf:
> > +
> > +anyOf:
> >- required: [ $ref ]
> >- required: [ allOf ]
> > +  - required: [ oneOf ]
> > 
> >  ...
> > ---
> >  .../devicetree/bindings/net/snps,dwmac.yaml   | 380 +-
> >  1 file changed, 288 insertions(+), 92 deletions(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
> > index 0dd543c6c08e..44aa88151cba 100644
> > --- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
> > +++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
> > @@ -150,69 +150,251 @@ properties:
> >in a different mode than the PHY in order to function.
> >  
> >snps,axi-config:
> > -$ref: /schemas/types.yaml#definitions/phandle
> > -description:
> > -  AXI BUS Mode parameters. Phandle to a node that can contain the
> > -  following properties
> > -* snps,lpi_en, enable Low Power Interface
> > -* snps,xit_frm, unlock on WoL
> > -* snps,wr_osr_lmt, max write outstanding req. limit
> > -* snps,rd_osr_lmt, max read outstanding req. limit
> > -* snps,kbbe, do not cross 1KiB boundary.
> > -* snps,blen, this is a vector of supported burst length.
> > -* snps,fb, fixed-burst
> > -* snps,mb, mixed-burst
> > -* snps,rb, rebuild INCRx Burst
> > +description: AXI BUS Mode parameters
> > +oneOf:
> > +  - deprecated: true
> > +$ref: /schemas/types.yaml#definitions/phandle
> > +  - type: object
> > +properties:
> 

> Anywhere we have the same node/property string meaning 2 different
> things is a pain, let's not create another one.

IIUC you meant that having a node and a property with the same name
isn't OK. Right? If so, could you explain why not? Especially seeing
the property is expected to be set with a phandle reference to that
node. That seemed like a perfect so

Re: [PATCH] net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function

2020-12-15 Thread Maxime Ripard

Hi,

On Mon, Dec 14, 2020 at 09:21:17PM +0100, Christophe JAILLET wrote:
> 'irq_of_parse_and_map()' should be balanced by a corresponding
> 'irq_dispose_mapping()' call. Otherwise, some resources leak.

Do you have a source to back that? It's not clear at all from the
documentation for those functions, and I couldn't find any user calling
it in the ten-or-so random picks I took.

Maxime


[net-next v5 00/15] Add mlx5 subfunction support

2020-12-15 Thread Saeed Mahameed

From: Saeed Mahameed 

Hi Dave, Jakub, Jason,

This series from Parav was the theme of this mlx5 release cycle.
We've been waiting anxiously for the auxbus infrastructure to make it into
the kernel, and now that the auxbus is in and all the stars are aligned, I
can finally submit this V2 of the devlink and mlx5 subfunction support.

Subfunctions came to solve the scaling issue of virtualization
and switchdev environments, where SRIOV failed to deliver and users ran
out of VFs very quickly, as SRIOV demands a huge amount of physical
resources in both the servers and the NIC.

Subfunctions provide the same functionality as SRIOV but in a very
lightweight manner. Please see the thorough and detailed
documentation from Parav below, in the commit messages and the
Networking documentation patches at the end of this series.

Sending V4 as a continuation to V1 that was sent last month [0],
[0] https://lore.kernel.org/linux-rdma/20201112192424.2742-1-pa...@nvidia.com/

---
Changelog:
v4->v5:
 - Fix some typos in the documentation
 
v3->v4:
 - Fix 32bit compilation issue

v2->v3:
 - added header file sf/priv.h to cmd.c to avoid missing prototype warning
 - made mlx5_sf_table_disable a static function as it's used only in one file

v1->v2:
 - added documentation for subfunction and its mlx5 implementation
 - add MLX5_SF config option documentation
 - rebased
 - dropped devlink global lock improvement patch as mlx5 doesn't support
   reload while SFs are allocated
 - dropped devlink reload lock patch as mlx5 doesn't support reload
   when SFs are allocated
 - using updated vhca event from device to add remove auxiliary device
 - split sf devlink port allocation and sf hardware context allocation

Parav Pandit Says:
=

This patchset introduces support for mlx5 subfunction (SF).

A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. mlx5 subfunction has its own function capabilities
and its own resources. This means a subfunction has its own dedicated
queues(txq, rxq, cq, eq). These queues are neither shared nor stolen from
the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table and RDMA
resources, neither shared with nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that
all class devices of the subfunction access only the assigned PCI BAR space.

A subfunction supports eswitch representation, through which it supports tc
offloads. The user must configure the eswitch to send/receive packets
from/to the subfunction port.

Subfunctions share PCI-level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Patch summary:
--
Patch 1 to 4 prepare devlink
Patch 5 to 7 add mlx5 SF device support
Patch 8 to 11 add mlx5 SF devlink port support
Patch 12 to 14 add documentation

Patch-1 prepares code to handle multiple port function attributes
Patch-2 introduces devlink pcisf port flavour similar to pcipf and pcivf
Patch-3 adds port add and delete driver callbacks
Patch-4 adds port function state get and set callbacks
Patch-5 mlx5 vhca event notifier support to distribute subfunction
state change notification
Patch-6 adds SF auxiliary device
Patch-7 adds SF auxiliary driver
Patch-8 prepares eswitch to handle SF vport
Patch-9 adds eswitch helpers to add/remove SF vport
Patch-10 implements devlink port add/del callbacks
Patch-11 implements devlink port function get/set callbacks
Patch-12 to 14 adds documentation
Patch-12 adds mlx5 port function documentation
Patch-13 adds subfunction documentation
Patch-14 adds mlx5 subfunction documentation

Subfunction support is discussed in detail in RFC [1] and [2].
RFC [1] and extension [2] describe the requirements, design and proposed
plumbing using devlink, auxiliary bus and sysfs for systemd/udev
support. The functionality of this patchset is best explained using the
real examples further below.

overview:

A subfunction can be created and deleted by a user using devlink port
add/delete interface.

A subfunction can be configured using devlink port function attributes
before it is activated.

When a subfunction is activated, it results in an auxiliary device on
the host PCI device where it is deployed. A driver binds to the
auxiliary device that further creates supported class devices.

example subfunction usage sequence:
---
Change device to switchdev mode:
$ devlink dev eswitch set pci/:06:00.0 mode switchdev

Add a devlink port of subfunction flavour:
$ devlink port add pci/:06:00.0 flavour pcisf pfnum 0 sfnum 88

Configure mac address of the port function:
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88

Now activate the function:
$ devlink port function set ens2f0npf0sf88 state active

Now use the auxiliary device and class devices:
$ devlink dev show
pci/:06:00

[net-next v5 03/15] devlink: Introduce PCI SF port flavour and port attribute

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

A PCI sub-function (SF) represents a portion of the device similar
to PCI VF.

In an eswitch, PCI SF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with SF,
and its representor netdevice, introduce a PCI SF port flavour.

When devlink port flavour is PCI SF, fill up PCI SF attributes of the
port.

Extend port name creation using the PCI PF and SF number scheme on a
best-effort basis, so that vendor drivers can skip defining their own
scheme.
This is done as cApfNsfM, where A, N and M are the controller, PCI PF and
PCI SF numbers respectively.
This is similar to the existing naming for PCI PF and PCI VF ports.
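
As an illustration of the naming scheme only, a minimal standalone sketch follows. The helper toy_sf_port_name() is hypothetical and is not the devlink implementation; emitting the controller prefix only for external ports is an assumption, chosen because the example netdev name ens2f0npf0sf88 (external false) carries no such prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "c<A>pf<N>sf<M>" into buf.
 * Assumption: the "c<A>" prefix is emitted only for external ports.
 * Returns the name length, or -1 if buf is too small. */
static int toy_sf_port_name(char *buf, size_t len, unsigned int controller,
			    unsigned int pf, unsigned int sf, bool external)
{
	int n = 0;
	int m;

	if (external) {
		n = snprintf(buf, len, "c%u", controller);
		if (n < 0 || (size_t)n >= len)
			return -1;	/* prefix did not fit */
	}

	m = snprintf(buf + n, len - n, "pf%usf%u", pf, sf);
	if (m < 0 || (size_t)m >= len - n)
		return -1;	/* PF/SF part did not fit */

	return n + m;
}
```

With controller 0, PF 0, SF 88 and external false this yields "pf0sf88"; with controller 1 and external true it yields "c1pf0sf88".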

An example view of a PCI SF port:

$ devlink port show pci/:06:00.0/32768
pci/:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
hw_addr 00:00:00:00:88:88 state active opstate attached

$ devlink port show pci/:06:00.0/32768 -jp
{
"port": {
"pci/:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:88:88",
"state": "active",
"opstate": "attached"
}
}
}
}

Signed-off-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 include/net/devlink.h| 17 +
 include/uapi/linux/devlink.h |  5 
 net/core/devlink.c   | 46 
 3 files changed, 68 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f466819cc477..5bd43f0a79a8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -93,6 +93,20 @@ struct devlink_port_pci_vf_attrs {
u8 external:1;
 };
 
+/**
+ * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes
+ * @controller: Associated controller number
+ * @pf: Associated PCI PF number for this port.
+ * @sf: Associated PCI SF for of the PCI PF for this port.
+ * @external: when set, indicates if a port is for an external controller
+ */
+struct devlink_port_pci_sf_attrs {
+   u32 controller;
+   u16 pf;
+   u32 sf;
+   u8 external:1;
+};
+
 /**
  * struct devlink_port_attrs - devlink port object
  * @flavour: flavour of the port
@@ -114,6 +128,7 @@ struct devlink_port_attrs {
struct devlink_port_phys_attrs phys;
struct devlink_port_pci_pf_attrs pci_pf;
struct devlink_port_pci_vf_attrs pci_vf;
+   struct devlink_port_pci_sf_attrs pci_sf;
};
 };
 
@@ -1404,6 +1419,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
   u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
   u16 pf, u16 vf, bool external);
+void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller,
+  u16 pf, u32 sf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
u32 size, u16 ingress_pools_count,
u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5203f54a2be1..6fe00f10eb3f 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -200,6 +200,10 @@ enum devlink_port_flavour {
DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but
  * is not used in any way.
  */
+   DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port
+ * for the PCI SF. It is an internal
+ * port that faces the PCI SF.
+ */
 };
 
 enum devlink_param_cmode {
@@ -529,6 +533,7 @@ enum devlink_attr {
DEVLINK_ATTR_RELOAD_ACTION_INFO,/* nested */
DEVLINK_ATTR_RELOAD_ACTION_STATS,   /* nested */
 
+   DEVLINK_ATTR_PORT_PCI_SF_NUMBER,/* u32 */
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 13e0de80c4f9..08eac247f200 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -690,6 +690,15 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, 
attrs->pci_vf.external))
return -EMSGSIZE;
break;
+   case DEVLINK_PORT_FLAVOUR_PCI_SF:
+

[net-next v5 01/15] net/mlx5: Fix compilation warning for 32-bit platform

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

The MLX5_GENERAL_OBJECT_TYPES bitfield is a 64-bit field.

Defining an enum for such bit fields with BIT() on a 32-bit platform results
in the warning below.

./include/vdso/bits.h:7:26: warning: left shift count >= width of type [-Wshift-count-overflow]
 ^
./include/linux/mlx5/mlx5_ifc.h:10716:46: note: in expansion of macro ‘BIT’
 MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
 ^~~
Use 32-bit friendly left shift.
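
The difference between the two shifts can be illustrated outside the kernel.
The sketch below is a standalone model, not kernel code: bit32_model_is_valid()
and bit64() are hypothetical helper names, and the 32-bit case is modelled by a
range check because shifting a 32-bit value by 32 or more is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of BIT(pos) on a platform where unsigned long is 32 bits wide.
 * The shift itself is not performed here; the helper only reports
 * whether the bit position would fit in a 32-bit operand.
 */
static int bit32_model_is_valid(unsigned int pos)
{
	return pos < 32;
}

/*
 * 64-bit shift, as used by the patch: the left operand is unsigned long
 * long, so positions 0..63 are valid regardless of the width of long.
 */
static uint64_t bit64(unsigned int pos)
{
	assert(pos < 64);
	return 1ULL << pos;
}
```

With these helpers, position 0x20 (the SAMPLER bit) is rejected by the 32-bit
model, while bit64(0x20) yields a value with only bit 32 set.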

Fixes: 2a2970891647 ("net/mlx5: Add sample offload hardware bits and structures")
Signed-off-by: Parav Pandit 
Reported-by: Stephen Rothwell 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/mlx5_ifc.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 0d6e287d614f..b9f15935dfe5 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -10711,9 +10711,9 @@ struct mlx5_ifc_affiliated_event_header_bits {
 };
 
 enum {
-   MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = BIT(0xc),
-   MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = BIT(0x13),
-   MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
+   MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 1ULL << 0xc,
+   MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = 1ULL << 0x13,
+   MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = 1ULL << 0x20,
 };
 
 enum {
-- 
2.26.2

[net-next v5 07/15] net/mlx5: SF, Add auxiliary device support

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Introduce an API to add and delete an auxiliary device for an SF.
Each SF has its own dedicated window in PCI BAR 2.

An SF device is similar to a PCI PF or VF in that it supports multiple classes
of devices such as net, rdma and vdpa.

An SF device will be added or removed by a subsequent patch during the SF
devlink port function state change command.

A subfunction device exposes the user-supplied subfunction number, which
systemd/udev will use to derive deterministic names for its netdevice and
rdma device.

An mlx5 subfunction auxiliary device example:

$ devlink dev eswitch set pci/:06:00.0 mode switchdev

$ devlink port show
pci/:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 
splittable false

$ devlink port add pci/:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show ens2f0npf0sf88
pci/:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 
0 pfnum 0 sfnum 88 external false splittable false
  function:
hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state 
active

On activation, the auxiliary device appears in sysfs:

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> 
../../../devices/pci:00/:00:03.0/:06:00.0/mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
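
A userspace consumer of the sfnum attribute might parse it as follows. This is
a minimal sketch, assuming the attribute holds a decimal number optionally
followed by a newline, as the `cat` output above suggests; parse_sfnum() is a
hypothetical helper and is not part of this patch or of udev.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the text read from an sfnum attribute into a 32-bit value.
 * Returns 0 on success and -1 on malformed or out-of-range input.
 */
static int parse_sfnum(const char *text, uint32_t *sfnum)
{
	char *end;
	unsigned long val;

	assert(text && sfnum);

	/* Reject signs and leading whitespace, which strtoul() would accept. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (errno)
		return -1;
	/* Only a trailing newline or end of string may follow the digits. */
	if (*end != '\n' && *end != '\0')
		return -1;
	if (val > UINT32_MAX)
		return -1;

	*sfnum = (uint32_t)val;
	return 0;
}
```

A caller would read the attribute contents into a buffer first and then pass
that buffer to parse_sfnum().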

Signed-off-by: Parav Pandit 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 .../device_drivers/ethernet/mellanox/mlx5.rst |   5 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/main.c|   4 +
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.c  | 261 ++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.h  |  35 +++
 include/linux/mlx5/driver.h   |   2 +
 6 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst 
b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index e9b65035cd47..a5eb22793bb9 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -97,6 +97,11 @@ Enabling the driver and kconfig options
 
 |   Provides low-level InfiniBand/RDMA and `RoCE 
`_
 support.
 
+**CONFIG_MLX5_SF=(y/n)**
+
+|   Build support for subfunctions.
+|   Subfunctions are more lightweight than PCI SRIOV VFs. Choosing this option
+|   will enable support for creating subfunction devices.
 
 **External options** ( Choose if the corresponding mlx5 feature is required )
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 292c02c4828c..2aefbca404c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -88,4 +88,4 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o 
steering/dr_table.o
 #
 # SF device
 #
-mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o
+mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 6e67ad11c713..292c30e71d7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -74,6 +74,7 @@
 #include "lib/hv_vhca.h"
 #include "diag/rsc_dump.h"
 #include "sf/vhca_event.h"
+#include "sf/dev/dev.h"
 
 MODULE_AUTHOR("Eli Cohen ");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) 
core driver");
@@ -1155,6 +1156,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
goto err_sriov;
}
 
+   mlx5_sf_dev_table_create(dev);
+
return 0;
 
 err_sriov:
@@ -1186,6 +1189,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 
 static void mlx5_unload(struct mlx5_core_dev *dev)
 {
+   mlx5_sf_dev_table_destroy(dev);
mlx5_sriov_detach(dev);
mlx5_ec_cleanup(dev);
mlx5_vhca_event_stop(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c 
b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
new file mode 100644
index ..6562bf63afaa
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -0,0 +1,261 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#include 
+#include 
+#include "mlx5_core.h"
+#include "dev.h"
+#include "sf/vhca_event.h"
+#include "sf/sf.h"
+#include "sf/mlx5_ifc_vhca_event.h"
+#include "ecpf.h"
+
+struct mlx5_sf_dev_table {
+   struct xarray devices;
+   unsigned int max_sfs;
+   phys_addr_t base_address;
+   u64 sf_bar_length;
+   struct notifier_bloc

[net-next v5 06/15] net/mlx5: Introduce vhca state event notifier

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

vhca state events indicate a change in the state of the vhca that may
occur due to SF allocation, deallocation, or enabling/disabling of the
SF HCA.

Introduce a vhca state event handler, which will be used by the SF devlink
port manager and the SF hardware id allocator in subsequent patches
to act on the event.

This enables a single entity to subscribe to, query and rearm the event
for a function.

Signed-off-by: Parav Pandit 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |   9 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   3 +
 .../net/ethernet/mellanox/mlx5/core/events.c  |   7 +
 .../net/ethernet/mellanox/mlx5/core/main.c|  16 ++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 .../mlx5/core/sf/mlx5_ifc_vhca_event.h|  82 
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  45 +
 .../mellanox/mlx5/core/sf/vhca_event.c| 189 ++
 .../mellanox/mlx5/core/sf/vhca_event.h|  57 ++
 include/linux/mlx5/driver.h   |   4 +
 12 files changed, 422 insertions(+)
 create mode 100644 
drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 6e4d7bb7fea2..d6c48582e7a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -203,3 +203,12 @@ config MLX5_SW_STEERING
default y
help
Build support for software-managed steering in the NIC.
+
+config MLX5_SF
+   bool "Mellanox Technologies subfunction device support using auxiliary device"
+   depends on MLX5_CORE && MLX5_CORE_EN
+   default n
+   help
+   Build support for subfunction device in the NIC. A Mellanox subfunction
+   device can support RDMA, netdevice and vdpa device.
+   It is similar to a SRIOV VF but it doesn't require SRIOV support.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 77961643d5a9..292c02c4828c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -85,3 +85,7 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o 
steering/dr_table.o
steering/dr_ste.o steering/dr_send.o \
steering/dr_cmd.o steering/dr_fw.o \
steering/dr_action.o steering/fs_dr.o
+#
+# SF device
+#
+mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 50c7b9ee80c3..47dcc3ac2cf0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -464,6 +464,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_ALLOC_MEMIC:
case MLX5_CMD_OP_MODIFY_XRQ:
case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
+   case MLX5_CMD_OP_QUERY_VHCA_STATE:
+   case MLX5_CMD_OP_MODIFY_VHCA_STATE:
*status = MLX5_DRIVER_STATUS_ABORTED;
*synd = MLX5_DRIVER_SYND;
return -EIO;
@@ -657,6 +659,8 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(DESTROY_UMEM);
MLX5_COMMAND_STR_CASE(RELEASE_XRQ_ERROR);
MLX5_COMMAND_STR_CASE(MODIFY_XRQ);
+   MLX5_COMMAND_STR_CASE(QUERY_VHCA_STATE);
+   MLX5_COMMAND_STR_CASE(MODIFY_VHCA_STATE);
default: return "unknown command opcode";
}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index fc0afa03d407..421febebc658 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -595,6 +595,9 @@ static void gather_async_events_mask(struct mlx5_core_dev 
*dev, u64 mask[4])
async_event_mask |=
(1ull << MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED);
 
+   if (MLX5_CAP_GEN_MAX(dev, vhca_state))
+   async_event_mask |= (1ull << MLX5_EVENT_TYPE_VHCA_STATE_CHANGE);
+
mask[0] = async_event_mask;
 
if (MLX5_CAP_GEN(dev, event_cap))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c 
b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index 3ce17c3d7a00..5523d218e5fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -110,6 +110,8 @@ static const char

[net-next v5 15/15] net/mlx5: Add devlink subfunction port documentation

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Add documentation for subfunction management using devlink
port.

Signed-off-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 .../device_drivers/ethernet/mellanox/mlx5.rst | 204 ++
 1 file changed, 204 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst 
b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index a5eb22793bb9..07e38c044355 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -12,6 +12,8 @@ Contents
 - `Enabling the driver and kconfig options`_
 - `Devlink info`_
 - `Devlink parameters`_
+- `mlx5 subfunction`_
+- `mlx5 port function`_
 - `Devlink health reporters`_
 - `mlx5 tracepoints`_
 
@@ -181,6 +183,208 @@ User command examples:
   values:
  cmode driverinit value true
 
+mlx5 subfunction
+
+mlx5 supports subfunction management using the devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
+
+A subfunction has its own function capabilities and its own resources. This
+means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These queues
+are neither shared nor stolen from the parent PCI function.
+
+When a subfunction is RDMA capable, it has its own QP1, GID table and rdma
+resources neither shared nor stolen from the parent PCI function.
+
+A subfunction has a dedicated window in PCI BAR space that is not shared
+with the other subfunctions or the parent PCI function. This ensures that all
+class devices of the subfunction access only the assigned PCI BAR space.
+
+A subfunction supports eswitch representation through which it supports tc
+offloads. The user must configure the eswitch to send/receive packets from/to
+the subfunction port.
+
+Subfunctions share PCI level resources such as PCI MSI-X IRQs with
+the other subfunctions and/or with their parent PCI function.
+Example mlx5 software, system and device view::
+
+   ___
+  | admin |
+  | user  |--
+  |___| |
+  | |
+  |   __|___
+ | | | |  | |
+ | devlink | | tc tool |  |user |
+ | tool| |_|  | applications|
+ |_| ||_|
+   | |   |  |
+   | |   |  | Userspace
+ +-|-|---|--|+
+   | |   +--+   +--+   Kernel
+   | |   |  netdev  |   | rdma dev |
+   | |   +--+   +--+
+   (devlink port add/del |  ^   ^
+port function set)   |  |   |
+   | |  +---|
+  _|___  |  |___|___
+ | | |  |   | mlx5 class|
+ | devlink |   ++   |   |   drivers |
+ | kernel  |   | rep netdev |   |   |(mlx5_core,ib) |
+ |_|   ++   |   |___|
+   | |  |   ^
+   (devlink ops) |  |  (probe/remove)
+  _| |  |   |
+ | subfunction  || +---+   | subfunction |
+ | management driver|- | subfunction   |---|  driver |
+ | (mlx5_core)  |  | auxiliary dev |   | (mlx5_core) |
+ |__|  +---+   |_|
+   |^
+  (sf add/del, vhca events) |
+   |  (device add/del)
+  _||
+ |  |  | subfunction |
+ |  PCI NIC | activate/deactive events>| host driver |
+ |__|  | (mlx5_core) |
+   |_|
+
+Subfunction is created using devlink port interface.
+
+- Change device to switchdev mode::
+
+$ devlink dev eswitch set pci/:06:00.0 mode switchdev
+
+- Add a devlink port of subfunction flavour::
+
+$ devlink port add pci/:06:00.0 flavour pcisf pfnum 0 sfnum 88
+
+- Show a devlink port of the subfunction::
+
+$ devlink port show pci/:06:00.0/32768
+pci/:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 
0 sfnum 88
+  function:
+hw_addr 00:00:00:00:00:00
+
+- Delete a devlink port of subfunction after use::
+
+$ devlink port del pci/:06:00.0 fl

[net-next v5 11/15] net/mlx5: SF, Add port add delete functionality

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

To handle SF port management outside of the eswitch as an independent
software layer, introduce eswitch notifier APIs so that an upper layer that
wishes to support SF port management in switchdev mode can perform its
task whenever the eswitch mode is set to switchdev or before the eswitch is
disabled.

Initialize the SF port table on such an eswitch event.

Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when the eswitch is disabled.
Expose SF port add and delete to the user via devlink commands.

$ devlink dev eswitch set pci/:06:00.0 mode switchdev

$ devlink port show
pci/:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 
splittable false

$ devlink port add pci/:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show ens2f0npf0sf88
pci/:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 
0 pfnum 0 sfnum 88 external false splittable false
  function:
hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:88:88",
"state": "inactive",
"opstate": "detached"
}
}
}
}
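
The add/delete rules stated in this commit message (ports can be added only in
switchdev mode, and all SF ports are destroyed when the eswitch is disabled)
can be modelled in a few lines. This is a standalone sketch under stated
assumptions, not the driver code: struct sf_table_model, the sf_table_*()
helpers, the fixed table size and the chosen error codes are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SF_MODEL_MAX_PORTS 8	/* arbitrary size for the sketch */

struct sf_table_model {
	bool switchdev;				/* eswitch is in switchdev mode */
	bool used[SF_MODEL_MAX_PORTS];		/* slot holds an SF port */
	unsigned int sfnum[SF_MODEL_MAX_PORTS];	/* user-supplied SF number */
};

/* Add an SF port; allowed only while the eswitch is in switchdev mode. */
static int sf_table_add(struct sf_table_model *t, unsigned int sfnum)
{
	int free_slot = -1;
	int i;

	assert(t);
	if (!t->switchdev)
		return -EOPNOTSUPP;

	for (i = 0; i < SF_MODEL_MAX_PORTS; i++) {
		if (t->used[i] && t->sfnum[i] == sfnum)
			return -EEXIST;
		if (!t->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOSPC;

	t->used[free_slot] = true;
	t->sfnum[free_slot] = sfnum;
	return 0;
}

/* Delete the SF port with the given SF number, if present. */
static int sf_table_del(struct sf_table_model *t, unsigned int sfnum)
{
	int i;

	for (i = 0; i < SF_MODEL_MAX_PORTS; i++) {
		if (t->used[i] && t->sfnum[i] == sfnum) {
			t->used[i] = false;
			return 0;
		}
	}
	return -ENOENT;
}

/* Eswitch mode change: leaving switchdev mode destroys all SF ports. */
static void sf_table_eswitch_mode_set(struct sf_table_model *t, bool switchdev)
{
	if (!switchdev)
		memset(t->used, 0, sizeof(t->used));
	t->switchdev = switchdev;
}
```

In this model the mode-change helper plays the role of the eswitch notifier
callback described above.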

Signed-off-by: Parav Pandit 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   4 +
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   5 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  25 ++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  12 +
 .../net/ethernet/mellanox/mlx5/core/main.c|  18 +
 .../net/ethernet/mellanox/mlx5/core/sf/cmd.c  |  27 ++
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  | 312 ++
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 125 +++
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |  17 +
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  28 ++
 include/linux/mlx5/driver.h   |   6 +
 12 files changed, 584 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index efa95d6dd112..957d5d9cfb36 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -89,3 +89,8 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o 
steering/dr_table.o
 # SF device
 #
 mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o sf/dev/driver.o
+
+#
+# SF manager
+#
+mlx5_core-$(CONFIG_MLX5_SF_MANAGER) += sf/cmd.o sf/hw_table.o sf/devlink.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 47dcc3ac2cf0..e8cecd50558d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -333,6 +333,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_DEALLOC_MEMIC:
case MLX5_CMD_OP_PAGE_FAULT_RESUME:
case MLX5_CMD_OP_QUERY_ESW_FUNCTIONS:
+   case MLX5_CMD_OP_DEALLOC_SF:
return MLX5_CMD_STAT_OK;
 
case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -466,6 +467,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
case MLX5_CMD_OP_QUERY_VHCA_STATE:
case MLX5_CMD_OP_MODIFY_VHCA_STATE:
+   case MLX5_CMD_OP_ALLOC_SF:
*status = MLX5_DRIVER_STATUS_ABORTED;
*synd = MLX5_DRIVER_SYND;
return -EIO;
@@ -661,6 +663,8 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(MODIFY_XRQ);
MLX5_COMMAND_STR_CASE(QUERY_VHCA_STATE);
MLX5_COMMAND_STR_CASE(MODIFY_VHCA_STATE);
+   MLX5_COMMAND_STR_CASE(ALLOC_SF);
+   MLX5_COMMAND_STR_CASE(DEALLOC_SF);
default: return "unknown command opcode";
}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 9afe918c5827..d4c0cdf5edd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -8,6 +8,7 @@
 #include "fs_core.h"
 #include "eswitch.h"
 #include "sf/dev/dev.h"
+#include "sf/sf.h"
 
 static int mlx5_devlink_flash_update(struct devlink *devlink,
 struct devlink_fla

[net-next v5 12/15] net/mlx5: SF, Port function state change support

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Support changing the state of the SF port's function through devlink.
When activating the SF port's function, enable the vHCA in the device
followed by adding its auxiliary device.
When deactivating the SF port's function, delete its auxiliary device
followed by disabling the vHCA.
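
The ordering described above can be sketched as a small standalone model.
Everything here is an assumption made for illustration: struct sf_model and
the sf_model_*() helpers are hypothetical and do not correspond to the
driver's data structures or firmware commands.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one SF port function. */
struct sf_model {
	bool hca_enabled;	/* vHCA is enabled in the device */
	bool aux_dev_added;	/* auxiliary device is registered */
};

static void sf_model_enable_hca(struct sf_model *sf)
{
	sf->hca_enabled = true;
}

static void sf_model_add_aux_dev(struct sf_model *sf)
{
	/* The auxiliary device is added only once the vHCA is enabled. */
	assert(sf->hca_enabled);
	sf->aux_dev_added = true;
}

static void sf_model_del_aux_dev(struct sf_model *sf)
{
	sf->aux_dev_added = false;
}

static void sf_model_disable_hca(struct sf_model *sf)
{
	/* The vHCA is disabled only after the auxiliary device is gone. */
	assert(!sf->aux_dev_added);
	sf->hca_enabled = false;
}

/* Activation: enable the vHCA, then add the auxiliary device. */
static void sf_model_activate(struct sf_model *sf)
{
	sf_model_enable_hca(sf);
	sf_model_add_aux_dev(sf);
}

/* Deactivation: the reverse order of activation. */
static void sf_model_deactivate(struct sf_model *sf)
{
	sf_model_del_aux_dev(sf);
	sf_model_disable_hca(sf);
}
```

The assertions encode the ordering constraint: swapping the two calls in
either sf_model_activate() or sf_model_deactivate() would trip them.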

Port function attribute get/set callbacks are invoked with the devlink
instance lock held. Such callbacks need to synchronize with the SF port
table being disabled, for example via the sriov sysfs callback. They
synchronize with the table disable context by holding a table refcount.

$ devlink dev eswitch set pci/:06:00.0 mode switchdev

$ devlink port show
pci/:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 
splittable false

$ devlink port add pci/:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show ens2f0npf0sf88
pci/:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 
0 pfnum 0 sfnum 88 external false splittable false
  function:
hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/:06:00.0/32768 hw_addr 00:00:00:00:88:88 
state active

$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:88:88",
"state": "active",
"opstate": "attached"
}
}
}
}

On port function activation, an auxiliary device is created, as shown in
the example below.

$ devlink dev show
devlink dev show auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 
splittable false

Signed-off-by: Parav Pandit 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/main.c|  10 +
 .../net/ethernet/mellanox/mlx5/core/sf/cmd.c  |  22 ++
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  | 284 --
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 116 ++-
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |   4 +
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  19 ++
 7 files changed, 431 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index d4c0cdf5edd9..75d950d95fcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -195,6 +195,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 #ifdef CONFIG_MLX5_SF_MANAGER
.port_new = mlx5_devlink_sf_port_new,
.port_del = mlx5_devlink_sf_port_del,
+   .port_function_state_get = mlx5_devlink_sf_port_fn_state_get,
+   .port_function_state_set = mlx5_devlink_sf_port_fn_state_set,
 #endif
.flash_update = mlx5_devlink_flash_update,
.info_get = mlx5_devlink_info_get,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 435323088ce0..f6b885fdd5c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -75,6 +75,7 @@
 #include "diag/rsc_dump.h"
 #include "sf/vhca_event.h"
 #include "sf/dev/dev.h"
+#include "sf/sf.h"
 
 MODULE_AUTHOR("Eli Cohen ");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) 
core driver");
@@ -1161,6 +1162,12 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 
mlx5_vhca_event_start(dev);
 
+   err = mlx5_sf_hw_table_create(dev);
+   if (err) {
+   mlx5_core_err(dev, "sf table create failed %d\n", err);
+   goto err_vhca;
+   }
+
err = mlx5_ec_init(dev);
if (err) {
mlx5_core_err(dev, "Failed to init embedded CPU\n");
@@ -1180,6 +1187,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 err_sriov:
mlx5_ec_cleanup(dev);
 err_ec:
+   mlx5_sf_hw_table_destroy(dev);
+err_vhca:
mlx5_vhca_event_stop(dev);
mlx5_cleanup_fs(dev);
 err_fs:
@@ -1209,6 +1218,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
mlx5_sf_dev_table_destroy(dev);
mlx5_sriov_detach(dev);
mlx5_ec_cleanup(dev);
+   mlx5_sf_hw_table_destroy(dev);
mlx5_vhca_event_stop(dev);
mlx5_cleanup_fs(dev);
mlx5_accel_ipsec_cleanup(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
index 0bc3075f34fa..a8d75c2f0275 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
@@ -25,3 +25,25 @@ int mlx5_cmd_dealloc_sf(struct mlx5_core_d

[net-next v5 14/15] devlink: Extend devlink port documentation for subfunctions

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Add devlink port documentation for subfunction management.

Signed-off-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 Documentation/driver-api/auxiliary_bus.rst|  2 +
 .../networking/devlink/devlink-port.rst   | 89 ++-
 2 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/Documentation/driver-api/auxiliary_bus.rst 
b/Documentation/driver-api/auxiliary_bus.rst
index 2312506b0674..fff96c7ba7a8 100644
--- a/Documentation/driver-api/auxiliary_bus.rst
+++ b/Documentation/driver-api/auxiliary_bus.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0-only
 
+.. _auxiliary_bus:
+
 =
 Auxiliary Bus
 =
diff --git a/Documentation/networking/devlink/devlink-port.rst 
b/Documentation/networking/devlink/devlink-port.rst
index 4c910dbb01ca..c6924e7a341e 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -34,6 +34,9 @@ Devlink port flavours are described below.
* - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
  - This indicates an eswitch port representing a port of PCI
virtual function (VF).
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_SF``
+ - This indicates an eswitch port representing a port of PCI
+   subfunction (SF).
* - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
  - This indicates a virtual port for the PCI virtual function.
 
@@ -57,9 +60,9 @@ Devlink port can have a different type based on the link 
layer described below.
 PCI controllers
 ---
 In most cases a PCI device has only one controller. A controller consists of
-potentially multiple physical and virtual functions. Such PCI function consists
-of one or more ports. This port of the function is represented by the devlink
-eswitch port.
+potentially multiple physical functions, virtual functions and subfunctions.
+Such PCI function consists of one or more ports. This port of the function
+is represented by the devlink eswitch port.
 
 A PCI Device connected to multiple CPUs or multiple PCI root complexes or
 SmartNIC, however, may have multiple controllers. For a device with multiple
@@ -112,7 +115,85 @@ PCI function. Usually it means, user should configure port 
function attribute
 before a bus specific device for the function is created. However, when
 SRIOV is enabled, virtual function devices are created on the PCI bus.
 Hence, function attribute should be configured before binding virtual
-function device to the driver.
+function device to the driver. For subfunctions, this means user should
+configure port function attribute before activating the port function.
 
 User may set the hardware address of the function represented by the devlink
 port function. For Ethernet port function this means a MAC address.
+
+Subfunctions
+
+
+Subfunctions are lightweight functions that have a parent PCI function on which
+they are deployed. Subfunctions are created and deployed in units of 1. Unlike
+SRIOV VFs, they don't require their own PCI virtual function. They communicate
+with the hardware through the parent PCI function. Subfunctions can possibly
+scale better.
+
+To use a subfunction, a 3-step setup sequence is followed.
+(1) create - create a subfunction;
+(2) configure - configure subfunction attributes;
+(3) deploy - deploy the subfunction;
+
+Subfunction management is done using the devlink port user interface.
+The user performs setup on the subfunction management device.
+
+(1) Create
+--
+A subfunction is created using a devlink port interface. The user adds the
+subfunction by adding a devlink port of subfunction flavour. The devlink
+kernel code calls down to the subfunction management driver (devlink op) and asks
+it to create a subfunction devlink port. The driver then instantiates the
+subfunction port and any associated objects such as health reporters and
+representor netdevice.
+
+(2) Configure
+-
+A subfunction devlink port is created but it is not active yet. That means the
+entities are created on the devlink side and the e-switch port representor is
+created, but the subfunction device itself is not created. The user might use
+the e-switch port representor to do settings, putting it into a bridge, adding
+TC rules, etc. The user might as well configure the hardware address (such as
+MAC address) of the subfunction while the subfunction is inactive.
+
+(3) Deploy
+--
+Once a subfunction is configured, the user must activate it to use it. Upon
+activation, the subfunction management driver asks the subfunction management
+device to instantiate the actual subfunction device on a particular PCI function.
+A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`. At this point a matching
+subfunction driver binds to the subfunction's auxiliary device.
+
+Terms and Definitions
+=
+
+.. list-table:: Terms and Definitions
+   :widths: 22 90
+
+   * - Term
+ - Definitions
+   * - ``PCI device``
+ - A physical PCI device having one or more PCI b

[net-next v5 13/15] devlink: Add devlink port documentation

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Added documentation for devlink port and port function related commands.

Signed-off-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Reviewed-by: Jacob Keller 
Signed-off-by: Saeed Mahameed 
---
 .../networking/devlink/devlink-port.rst   | 118 ++
 Documentation/networking/devlink/index.rst|   1 +
 2 files changed, 119 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-port.rst

diff --git a/Documentation/networking/devlink/devlink-port.rst 
b/Documentation/networking/devlink/devlink-port.rst
new file mode 100644
index ..4c910dbb01ca
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _devlink_port:
+
+
+Devlink Port
+
+
+``devlink-port`` is a port that exists on the device. It has a logically
+separate ingress/egress point of the device. A devlink port can be any one
+of many flavours. A devlink port flavour along with port attributes
+describe what a port represents.
+
+A device driver that intends to publish a devlink port sets the
+devlink port attributes and registers the devlink port.
+
+Devlink port flavours are described below.
+
+.. list-table:: List of devlink port flavours
+   :widths: 33 90
+
+   * - Flavour
+ - Description
+   * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL``
+ - Any kind of physical port. This can be an eswitch physical port or any
+   other physical port on the device.
+   * - ``DEVLINK_PORT_FLAVOUR_DSA``
+ - This indicates a DSA interconnect port.
+   * - ``DEVLINK_PORT_FLAVOUR_CPU``
+ - This indicates a CPU port applicable only to DSA.
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_PF``
+ - This indicates an eswitch port representing a port of PCI
+   physical function (PF).
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
+ - This indicates an eswitch port representing a port of PCI
+   virtual function (VF).
+   * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
+ - This indicates a virtual port for the PCI virtual function.
+
+Devlink port can have a different type based on the link layer described below.
+
+.. list-table:: List of devlink port types
+   :widths: 23 90
+
+   * - Type
+     - Description
+   * - ``DEVLINK_PORT_TYPE_ETH``
+     - Driver should set this port type when a link layer of the port is
+       Ethernet.
+   * - ``DEVLINK_PORT_TYPE_IB``
+     - Driver should set this port type when a link layer of the port is
+       InfiniBand.
+   * - ``DEVLINK_PORT_TYPE_AUTO``
+     - This type is indicated by the user when driver should detect the port
+       type automatically.
+
+PCI controllers
+---------------
+In most cases a PCI device has only one controller. A controller consists of
+potentially multiple physical and virtual functions. Such PCI function consists
+of one or more ports. This port of the function is represented by the devlink
+eswitch port.
+
+A PCI Device connected to multiple CPUs or multiple PCI root complexes or
+SmartNIC, however, may have multiple controllers. For a device with multiple
+controllers, each controller is distinguished by a unique controller number.
+An eswitch on the PCI device supports ports of multiple controllers.
+
+An example view of a system with two controllers::
+
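+                 ---------------------------------------------------------
+                 |                                                       |
+                 |           --------- ---------         ------- ------- |
+    -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
+    | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
+    | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
+    | connect |  | -------                       -------                 |
+    -----------  |     | controller_num=1 (no eswitch)                   |
+                 ------|--------------------------------------------------
+                 (internal wire)
+                       |
+                 ---------------------------------------------------------
+                 | devlink eswitch ports and reps                        |
+                 | ----------------------------------------------------- |
+                 | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
+                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
+                 | ----------------------------------------------------- |
+                 | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
+                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
+                 | ----------------------------------------------------- |
+                 |                                                       |
+                 |                                                       |
+    -----------  |           --------- ---------         ------- ------- |
+    | smartNIC|  |           | vf(s) | | sf(s) |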

[net-next v5 10/15] net/mlx5: E-switch, Add eswitch helpers for SF vport

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Add helpers to enable/disable eswitch port, register its devlink port and
load its representor.

Signed-off-by: Vu Pham 
Signed-off-by: Parav Pandit 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../mellanox/mlx5/core/esw/devlink_port.c | 41 +++
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 12 +++---
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 16 
 .../mellanox/mlx5/core/eswitch_offloads.c | 36 +++-
 4 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index 11baa3d0..4b7e9f783789 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -122,3 +122,44 @@ struct devlink_port *mlx5_esw_offloads_devlink_port(struct 
mlx5_eswitch *esw, u1
vport = mlx5_eswitch_get_vport(esw, vport_num);
return vport->dl_port;
 }
+
+int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct 
devlink_port *dl_port,
+ u16 vport_num, u32 sfnum)
+{
+   struct mlx5_core_dev *dev = esw->dev;
+   struct netdev_phys_item_id ppid = {};
+   unsigned int dl_port_index;
+   struct mlx5_vport *vport;
+   struct devlink *devlink;
+   u16 pfnum;
+   int err;
+
+   vport = mlx5_eswitch_get_vport(esw, vport_num);
+   if (IS_ERR(vport))
+   return PTR_ERR(vport);
+
+   pfnum = PCI_FUNC(dev->pdev->devfn);
+   mlx5_esw_get_port_parent_id(dev, &ppid);
+   memcpy(dl_port->attrs.switch_id.id, &ppid.id[0], ppid.id_len);
+   dl_port->attrs.switch_id.id_len = ppid.id_len;
+   devlink_port_attrs_pci_sf_set(dl_port, 0, pfnum, sfnum, false);
+   devlink = priv_to_devlink(dev);
+   dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num);
+   err = devlink_port_register(devlink, dl_port, dl_port_index);
+   if (err)
+   return err;
+
+   vport->dl_port = dl_port;
+   return 0;
+}
+
+void mlx5_esw_devlink_sf_port_unregister(struct mlx5_eswitch *esw, u16 
vport_num)
+{
+   struct mlx5_vport *vport;
+
+   vport = mlx5_eswitch_get_vport(esw, vport_num);
+   if (IS_ERR(vport))
+   return;
+   devlink_port_unregister(vport->dl_port);
+   vport->dl_port = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d75247a8ce55..d06e7a5f15de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1273,8 +1273,8 @@ static void esw_vport_cleanup(struct mlx5_eswitch *esw, 
struct mlx5_vport *vport
esw_vport_cleanup_acl(esw, vport);
 }
 
-static int esw_enable_vport(struct mlx5_eswitch *esw, u16 vport_num,
-   enum mlx5_eswitch_vport_event enabled_events)
+int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, u16 vport_num,
+ enum mlx5_eswitch_vport_event enabled_events)
 {
struct mlx5_vport *vport;
int ret;
@@ -1310,7 +1310,7 @@ static int esw_enable_vport(struct mlx5_eswitch *esw, u16 
vport_num,
return ret;
 }
 
-static void esw_disable_vport(struct mlx5_eswitch *esw, u16 vport_num)
+void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, u16 vport_num)
 {
struct mlx5_vport *vport;
 
@@ -1432,7 +1432,7 @@ int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 
vport_num,
 {
int err;
 
-   err = esw_enable_vport(esw, vport_num, enabled_events);
+   err = mlx5_esw_vport_enable(esw, vport_num, enabled_events);
if (err)
return err;
 
@@ -1443,14 +1443,14 @@ int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, 
u16 vport_num,
return err;
 
 err_rep:
-   esw_disable_vport(esw, vport_num);
+   mlx5_esw_vport_disable(esw, vport_num);
return err;
 }
 
 void mlx5_eswitch_unload_vport(struct mlx5_eswitch *esw, u16 vport_num)
 {
esw_offloads_unload_rep(esw, vport_num);
-   esw_disable_vport(esw, vport_num);
+   mlx5_esw_vport_disable(esw, vport_num);
 }
 
 void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 4e3ed878ff03..54514b04808d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -688,6 +688,10 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 enum mlx5_eswitch_vport_event enabled_events);
 void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw);
 
+int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, u16 vport_num,
+ enum mlx5_eswitch_vport_event enabled_events);
+void mlx5_esw_

[net-next v5 09/15] net/mlx5: E-switch, Prepare eswitch to handle SF vport

2020-12-15 Thread Saeed Mahameed

From: Vu Pham 

Prepare eswitch to handle SF vports when
(a) querying eswitch functions
(b) creating egress ACLs
(c) calculating the total number of vports

Assign a dedicated placeholder for SF vports and their representors.
They are placed after VF vports and before the ECPF vport, as below:
[PF,VF0,...,VFn,SF0,...,SFm,ECPF,UPLINK].

Change functions to map SF's vport numbers to indices when
accessing the vports or representors arrays, and vice versa.
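
The layout above can be pictured with a small user-space sketch. It is not
the mlx5 code: the struct, the function names and the `sf_base_vport` field
(the first vport number used for SFs) are hypothetical, and the only thing
taken from the description is the ordering PF, VFs, SFs, ECPF, UPLINK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the array layout [PF, VFs, SFs, ECPF, UPLINK]. */
struct esw_layout_sketch {
	unsigned int max_vfs;       /* number of VF slots after the PF slot */
	unsigned int max_sfs;       /* number of SF slots after the VF slots */
	unsigned int sf_base_vport; /* assumed first vport number used for SFs */
};

/* Map an SF vport number to its array index, or -1 if it is not an SF. */
static int sf_vport_to_index(const struct esw_layout_sketch *l,
			     unsigned int vport_num)
{
	assert(l != NULL);
	if (vport_num < l->sf_base_vport ||
	    vport_num - l->sf_base_vport >= l->max_sfs)
		return -1;
	/* One slot for the PF, then the VF slots, then the SF slots. */
	return (int)(1 + l->max_vfs + (vport_num - l->sf_base_vport));
}

/* Reverse mapping: array index back to SF vport number, or -1. */
static int index_to_sf_vport(const struct esw_layout_sketch *l,
			     unsigned int index)
{
	unsigned int first;

	assert(l != NULL);
	first = 1 + l->max_vfs;
	if (index < first || index - first >= l->max_sfs)
		return -1;
	return (int)(l->sf_base_vport + (index - first));
}
```

Keeping both directions next to each other makes it easy to check that they
are inverses of one another over the SF range and reject everything else.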

Signed-off-by: Vu Pham 
Signed-off-by: Parav Pandit 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/Kconfig   | 10 
 .../mellanox/mlx5/core/esw/acl/egress_ofld.c  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 11 +++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 50 +++
 .../mellanox/mlx5/core/eswitch_offloads.c | 11 
 .../net/ethernet/mellanox/mlx5/core/vport.c   |  3 +-
 6 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index d6c48582e7a8..ad45d20f9d44 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -212,3 +212,13 @@ config MLX5_SF
Build support for subfuction device in the NIC. A Mellanox subfunction
device can support RDMA, netdevice and vdpa device.
It is similar to a SRIOV VF but it doesn't require SRIOV support.
+
+config MLX5_SF_MANAGER
+   bool
+   depends on MLX5_SF && MLX5_ESWITCH
+   default y
+   help
+   Build support for subfunction port in the NIC. A Mellanox subfunction
+   port is managed through devlink.  A subfunction supports RDMA, netdevice
+   and vdpa device. It is similar to a SRIOV VF but it doesn't require
+   SRIOV support.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c 
b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
index 4c74e2690d57..26b37a0f8762 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
@@ -150,7 +150,7 @@ static void esw_acl_egress_ofld_groups_destroy(struct 
mlx5_vport *vport)
 
 static bool esw_acl_egress_needed(const struct mlx5_eswitch *esw, u16 
vport_num)
 {
-   return mlx5_eswitch_is_vf_vport(esw, vport_num);
+   return mlx5_eswitch_is_vf_vport(esw, vport_num) || 
mlx5_esw_is_sf_vport(esw, vport_num);
 }
 
 int esw_acl_egress_ofld_setup(struct mlx5_eswitch *esw, struct mlx5_vport 
*vport)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index da901e364656..d75247a8ce55 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1366,9 +1366,15 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev 
*dev)
 {
int outlen = MLX5_ST_SZ_BYTES(query_esw_functions_out);
u32 in[MLX5_ST_SZ_DW(query_esw_functions_in)] = {};
+   u16 max_sf_vports;
u32 *out;
int err;
 
+   max_sf_vports = mlx5_sf_max_functions(dev);
+   /* Device interface is array of 64-bits */
+   if (max_sf_vports)
+   outlen += DIV_ROUND_UP(max_sf_vports, BITS_PER_TYPE(__be64)) * 
sizeof(__be64);
+
out = kvzalloc(outlen, GFP_KERNEL);
if (!out)
return ERR_PTR(-ENOMEM);
@@ -1376,7 +1382,7 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev 
*dev)
MLX5_SET(query_esw_functions_in, in, opcode,
 MLX5_CMD_OP_QUERY_ESW_FUNCTIONS);
 
-   err = mlx5_cmd_exec_inout(dev, query_esw_functions, in, out);
+   err = mlx5_cmd_exec(dev, in, sizeof(in), out, outlen);
if (!err)
return out;
 
@@ -1899,7 +1905,8 @@ static bool
 is_port_function_supported(const struct mlx5_eswitch *esw, u16 vport_num)
 {
return vport_num == MLX5_VPORT_PF ||
-  mlx5_eswitch_is_vf_vport(esw, vport_num);
+  mlx5_eswitch_is_vf_vport(esw, vport_num) ||
+  mlx5_esw_is_sf_vport(esw, vport_num);
 }
 
 int mlx5_devlink_port_function_hw_addr_get(struct devlink *devlink,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index cf87de94418f..4e3ed878ff03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -43,6 +43,7 @@
 #include 
 #include "lib/mpfs.h"
 #include "lib/fs_chains.h"
+#include "sf/sf.h"
 #include "en/tc_ct.h"
 
 #ifdef CONFIG_MLX5_ESWITCH
@@ -499,6 +500,40 @@ static inline u16 mlx5_eswitch_first_host_vport_num(struct 
mlx5_core_dev *dev)
MLX5_VPORT_PF : MLX5_VPORT_FIRST_VF;
 }
 
+static inline int mlx5_esw_sf_start_idx(const struct mlx5_eswitch *esw)
+{
+   /* PF and VF vports indices start from 0 to max_vfs */
+   return MLX5_VPORT

[net-next v5 05/15] devlink: Support get and set state of port function

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

A devlink port function can be in active or inactive state.
Allow users to get and set the port function's state.

When the port function is activated, its operational state may change
after a while, once the device is created and the driver binds to it.
The deactivation flow behaves similarly.

To clearly describe the state of the port function and its device's
operational state in the host system, define state and opstate
attributes.
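
A minimal user-space sketch of the two attributes, assuming they are tracked
independently: the names below are hypothetical and are not the devlink
enums. `fn_safe_to_delete()` is only one reading of the intended flow, in
which the user deactivates the function and then waits for the driver to
detach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fn_state_sketch   { FN_STATE_INACTIVE, FN_STATE_ACTIVE };
enum fn_opstate_sketch { FN_OPSTATE_DETACHED, FN_OPSTATE_ATTACHED };

struct port_fn_sketch {
	enum fn_state_sketch state;     /* what the user requested */
	enum fn_opstate_sketch opstate; /* whether a driver is bound right now */
};

/* User request: changes only the administrative state. */
static void fn_set_state(struct port_fn_sketch *fn, enum fn_state_sketch state)
{
	assert(fn != NULL);
	fn->state = state; /* opstate is deliberately left untouched */
}

/* Asynchronous event: a driver bound to or unbound from the function. */
static void fn_driver_bound(struct port_fn_sketch *fn, bool bound)
{
	assert(fn != NULL);
	fn->opstate = bound ? FN_OPSTATE_ATTACHED : FN_OPSTATE_DETACHED;
}

/* Assumed tear-down rule: inactive and no driver attached any more. */
static bool fn_safe_to_delete(const struct port_fn_sketch *fn)
{
	assert(fn != NULL);
	return fn->state == FN_STATE_INACTIVE &&
	       fn->opstate == FN_OPSTATE_DETACHED;
}
```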

Example of a PCI SF port which supports a port function:
Create a device with ID=10 and one physical port.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 
splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 
0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 
state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 include/net/devlink.h| 23 +
 include/uapi/linux/devlink.h | 21 +
 net/core/devlink.c   | 90 +++-
 3 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f8cff3e402da..18a7e66b7982 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1374,6 +1374,29 @@ struct devlink_ops {
int (*port_function_hw_addr_set)(struct devlink *devlink, struct 
devlink_port *port,
 const u8 *hw_addr, int hw_addr_len,
 struct netlink_ext_ack *extack);
+   /**
+* @port_function_state_get: Port function's state get function.
+*
+* Should be used by device drivers to report the state of a function
+* managed by the devlink port. Driver should return -EOPNOTSUPP if it
+* doesn't support port function handling for a particular port.
+*/
+   int (*port_function_state_get)(struct devlink *devlink,
+  struct devlink_port *port,
+  enum devlink_port_function_state *state,
+  enum devlink_port_function_opstate 
*opstate,
+  struct netlink_ext_ack *extack);
+   /**
+* @port_function_state_set: Port function's state set function.
+*
+* Should be used by device drivers to set the state of a function
+* managed by the devlink port. Driver should return -EOPNOTSUPP if it
+* doesn't support port function handling for a particular port.
+*/
+   int (*port_function_state_set)(struct devlink *devlink,
+  struct devlink_port *port,
+  enum devlink_port_function_state state,
+  struct netlink_ext_ack *extack);
/**
 * @port_new: Port add function.
 *
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6fe00f10eb3f..beeb30bb6b20 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -583,9 +583,30 @@ enum devlink_resource_unit {
 enum devlink_port_function_attr {
DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, /* binary */
+   DEVLINK_PORT_FUNCTION_ATTR_STATE,   /* u8 */
+   DEVLINK_PORT_FUNCTION_ATTR_OPSTATE, /* u8 */
 
__DEVLINK_PORT_FUNCTION_ATTR_MAX,
DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
 };
 
+enum devlink_port_function_state {
+   DEVLINK_PORT_FUNCTION_STATE_INACTIVE,
+   DEVLINK_PORT_FUNCTION_STATE_ACTIVE,
+};
+
+/**
+ * enum devlink_port_function_opstate - indicates operational state of port 
function
+ * @DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED: Driver is attached to the function 
of port, for
+ * graceful tear down of the 
function, after
+ * inactivation of the port function, 
user should wait
+ * for operational state to turn 
D

[net-next v5 08/15] net/mlx5: SF, Add auxiliary device driver

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Add auxiliary device driver for mlx5 subfunction auxiliary device.

A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.

As a result, when the mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created. They appear as:

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> 
../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 
splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 
sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid :00ff:fe00: 
sys_image_guid 248a:0703:00b3:d112

In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

Signed-off-by: Parav Pandit 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  12 +++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/main.c|  12 ++-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  10 ++
 .../net/ethernet/mellanox/mlx5/core/pci_irq.c |  20 
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.c  |  10 ++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.h  |  20 
 .../mellanox/mlx5/core/sf/dev/driver.c| 101 ++
 include/linux/mlx5/driver.h   |   4 +-
 10 files changed, 187 insertions(+), 6 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 2aefbca404c3..efa95d6dd112 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -88,4 +88,4 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o 
steering/dr_table.o
 #
 # SF device
 #
-mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o
+mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o sf/dev/driver.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 3261d0dc1104..9afe918c5827 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -7,6 +7,7 @@
 #include "fw_reset.h"
 #include "fs_core.h"
 #include "eswitch.h"
+#include "sf/dev/dev.h"
 
 static int mlx5_devlink_flash_update(struct devlink *devlink,
 struct devlink_flash_update_params *params,
@@ -127,6 +128,17 @@ static int mlx5_devlink_reload_down(struct devlink 
*devlink, bool netns_change,
struct netlink_ext_ack *extack)
 {
struct mlx5_core_dev *dev = devlink_priv(devlink);
+   bool sf_dev_allocated;
+
+   sf_dev_allocated = mlx5_sf_dev_allocated(dev);
+   if (sf_dev_allocated) {
+   /* Reload results in deleting SF device which further results in
+* unregistering devlink instance while holding devlink_mutex.
+* Hence, do not support reload.
+*/
+   NL_SET_ERR_MSG_MOD(extack, "reload is unsupported when SFs are 
allocated\n");
+   return -EOPNOTSUPP;
+   }
 
switch (action) {
case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 421febebc658..174dfbc996c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -467,7 +467,7 @@ int mlx5_eq_table_init(struct mlx5_core_dev *dev)
for (i = 0; i < MLX5_EVENT_TYPE_MAX; i++)
ATOMIC_INIT_NOTIFIER_HEAD(&eq_table->nh[i]);
 
-   eq_table->irq_table = dev->priv.irq_table;
+   eq_table->irq_table = mlx5_irq_table_get(dev);
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 292c30e71d7f..932a280a56a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -84,7 +84,6 @@ unsigned int mlx5_core_debug_mask;
 module_param_named(debug_mask, mlx5_core_debug_mask, uint, 0644);
 MODULE_PARM_DESC(debug_mask, "debug mask: 1 = dump cmd data, 2 = dump cmd exec 
time, 3 = both. D

[PATCH net] ethtool: fix error paths in ethnl_set_channels()

2020-12-15 Thread Ivan Vecera

Fix two error paths in ethnl_set_channels() to avoid lock-up caused
by unreleased RTNL.
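
The shape of the fix can be shown with a minimal user-space sketch. It
assumes a plain counter as a stand-in for the lock; `fake_lock()`,
`fake_unlock()` and `set_channels_sketch()` are hypothetical names and not
part of ethtool. The point is that every failure after the lock is taken
leaves through a label that drops it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the lock: a simple depth counter. */
static int fake_lock_depth;

static void fake_lock(void)
{
	fake_lock_depth++;
}

static void fake_unlock(void)
{
	assert(fake_lock_depth > 0);
	fake_lock_depth--;
}

/* Reject a request whose channel count does not exceed the index in use. */
static int set_channels_sketch(unsigned int requested, unsigned int max_in_use)
{
	int ret;

	fake_lock();

	if (requested <= max_in_use) {
		ret = -EINVAL;
		goto out_unlock; /* a bare "return -EINVAL" would leak the lock */
	}

	ret = 0; /* the real work would happen here */

out_unlock:
	fake_unlock();
	return ret;
}
```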

Fixes: e19c591eafad ("ethtool: set device channel counts with CHANNELS_SET 
request")
Cc: Michal Kubecek 
Reported-by: LiLiang 
Signed-off-by: Ivan Vecera 
---
 net/ethtool/channels.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
index 5635604cb9ba..25a9e566ef5c 100644
--- a/net/ethtool/channels.c
+++ b/net/ethtool/channels.c
@@ -194,8 +194,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
genl_info *info)
if (netif_is_rxfh_configured(dev) &&
!ethtool_get_max_rxfh_channel(dev, &max_rx_in_use) &&
(channels.combined_count + channels.rx_count) <= max_rx_in_use) {
+   ret = -EINVAL;
GENL_SET_ERR_MSG(info, "requested channel counts are too low 
for existing indirection table settings");
-   return -EINVAL;
+   goto out_ops;
}
 
/* Disabling channels, query zero-copy AF_XDP sockets */
@@ -203,8 +204,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
genl_info *info)
   min(channels.rx_count, channels.tx_count);
for (i = from_channel; i < old_total; i++)
if (xsk_get_pool_from_qid(dev, i)) {
+   ret = -EINVAL;
GENL_SET_ERR_MSG(info, "requested channel counts are 
too low for existing zerocopy AF_XDP sockets");
-   return -EINVAL;
+   goto out_ops;
}
 
ret = dev->ethtool_ops->set_channels(dev, &channels);
-- 
2.26.2

[net-next v5 04/15] devlink: Support add and delete devlink port

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Extend the devlink interface so that the user can add and delete a port.
Extend devlink to route such user requests to the driver, which adds or
deletes the port in the device.

When driver routines are invoked, the devlink instance lock is not held.
This enables the driver to register and unregister several devlink objects
(port, health reporter, resource etc.) by using existing devlink APIs.
It also lets the same code handle port unregistration during driver unload
and during port deletion initiated by the user.

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 
splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev eth0 flavour pcisf controller 0 pfnum 0 
sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ udevadm test-builtin net_id /sys/class/net/eth0
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=enp6s0f0npf0sf88
ID_NET_NAME_SLOT=ens2f0npf0sf88
Unload module index
Unloaded link configuration context.

Signed-off-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 include/net/devlink.h | 39 
 net/core/devlink.c| 71 +++
 2 files changed, 110 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5bd43f0a79a8..f8cff3e402da 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -153,6 +153,17 @@ struct devlink_port {
struct mutex reporters_lock; /* Protects reporter_list */
 };
 
+struct devlink_port_new_attrs {
+   enum devlink_port_flavour flavour;
+   unsigned int port_index;
+   u32 controller;
+   u32 sfnum;
+   u16 pfnum;
+   u8 port_index_valid:1,
+  controller_valid:1,
+  sfnum_valid:1;
+};
+
 struct devlink_sb_pool_info {
enum devlink_sb_pool_type pool_type;
u32 size;
@@ -1363,6 +1374,34 @@ struct devlink_ops {
int (*port_function_hw_addr_set)(struct devlink *devlink, struct 
devlink_port *port,
 const u8 *hw_addr, int hw_addr_len,
 struct netlink_ext_ack *extack);
+   /**
+* @port_new: Port add function.
+*
+* Should be used by device driver to let caller add new port of a
+* specified flavour with optional attributes.
+* Driver should return -EOPNOTSUPP if it doesn't support port addition
+* of a specified flavour or specified attributes. Driver should set
+* extack error message in case of fail to add the port. Devlink core
+* does not hold a devlink instance lock when this callback is invoked.
+* Driver must ensure synchronization when adding or deleting a port.
+* Driver must register a port with devlink core.
+*/
+   int (*port_new)(struct devlink *devlink,
+   const struct devlink_port_new_attrs *attrs,
+   struct netlink_ext_ack *extack);
+   /**
+* @port_del: Port delete function.
+*
+* Should be used by device driver to let caller delete port which was
+* previously created using port_new() callback.
+* Driver should return -EOPNOTSUPP if it doesn't support port deletion.
+* Driver should set extack error message in case of fail to delete the
+* port. Devlink core does not hold a devlink instance lock when this
+* callback is invoked. Driver must ensure synchronization when adding
+* or deleting a port. Driver must register a port with devlink core.
+*/
+   int (*port_del)(struct devlink *devlink, unsigned int port_index,
+   struct netlink_ext_ack *extack);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 08eac247f200..11043707f63f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1146,6 +1146,61 @@ static int devlink_nl_cmd_port_unsplit_doit(struct 
sk_buff *skb,
return devlink_port_unsplit(devlink, port_index, info->extack);
 }
 
+static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb,
+   struct genl_info *info)
+{
+   struct netlink_ext_ack *extack = info->extack;
+   struct devlink_port_new_attrs new_attrs = {};
+   struct devlink *devlink = info->user_ptr[0];
+
+   if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] ||
+   !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) {
+   NL_SET

[net-next v5 02/15] devlink: Prepare code to fill multiple port function attributes

2020-12-15 Thread Saeed Mahameed

From: Parav Pandit 

Prepare code to fill zero or more port function optional attributes.
A subsequent patch makes use of this to fill more port function
attributes.
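
The pattern can be sketched in user space as follows. This is an
illustration under assumptions, not the devlink code: `struct msg_sketch`
and all function names are hypothetical, and the netlink nest is modelled
by two counters. Each fill helper reports through `msg_updated` whether it
added anything, and the nest is kept only if no helper failed and at least
one of them added an attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message model: open nests and attributes written so far. */
struct msg_sketch {
	int nests;
	int attrs;
};

typedef int (*fill_fn_sketch)(struct msg_sketch *msg, bool *msg_updated);

/* Helper that successfully adds one attribute. */
static int fill_hw_addr_sketch(struct msg_sketch *msg, bool *msg_updated)
{
	msg->attrs++;
	*msg_updated = true;
	return 0;
}

/* Helper for an optional attribute that is not supported: not an error. */
static int fill_unsupported_sketch(struct msg_sketch *msg, bool *msg_updated)
{
	(void)msg;
	(void)msg_updated;
	return 0;
}

/* Helper that fails, for example because the message is full. */
static int fill_failing_sketch(struct msg_sketch *msg, bool *msg_updated)
{
	(void)msg;
	(void)msg_updated;
	return -EMSGSIZE;
}

static int function_attrs_put_sketch(struct msg_sketch *msg,
				     const fill_fn_sketch *fills, size_t n)
{
	bool msg_updated = false;
	int saved_attrs;
	int err = 0;
	size_t i;

	assert(msg != NULL);
	saved_attrs = msg->attrs;
	msg->nests++; /* open the nest */

	for (i = 0; i < n && !err; i++)
		err = fills[i](msg, &msg_updated);

	if (err || !msg_updated) {
		/* Cancel the nest: roll back everything written inside it. */
		msg->attrs = saved_attrs;
		msg->nests--;
	}
	return err;
}
```

Adding another optional attribute then only means adding one more helper to
the list; the cancel-or-keep decision stays in one place.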

Signed-off-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Reviewed-by: Vu Pham 
Signed-off-by: Saeed Mahameed 
---
 net/core/devlink.c | 63 +++---
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index ee828e4b1007..13e0de80c4f9 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -712,6 +712,31 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
return 0;
 }
 
+static int
+devlink_port_function_hw_addr_fill(struct devlink *devlink, const struct 
devlink_ops *ops,
+  struct devlink_port *port, struct sk_buff 
*msg,
+  struct netlink_ext_ack *extack, bool 
*msg_updated)
+{
+   u8 hw_addr[MAX_ADDR_LEN];
+   int hw_addr_len;
+   int err;
+
+   if (!ops->port_function_hw_addr_get)
+   return 0;
+
+   err = ops->port_function_hw_addr_get(devlink, port, hw_addr, 
&hw_addr_len, extack);
+   if (err) {
+   if (err == -EOPNOTSUPP)
+   return 0;
+   return err;
+   }
+   err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, 
hw_addr);
+   if (err)
+   return err;
+   *msg_updated = true;
+   return 0;
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port 
*port,
   struct netlink_ext_ack *extack)
@@ -719,36 +744,16 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, 
struct devlink_port *por
struct devlink *devlink = port->devlink;
const struct devlink_ops *ops;
struct nlattr *function_attr;
-   bool empty_nest = true;
-   int err = 0;
+   bool msg_updated = false;
+   int err;
 
function_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PORT_FUNCTION);
if (!function_attr)
return -EMSGSIZE;
 
ops = devlink->ops;
-   if (ops->port_function_hw_addr_get) {
-   int hw_addr_len;
-   u8 hw_addr[MAX_ADDR_LEN];
-
-   err = ops->port_function_hw_addr_get(devlink, port, hw_addr, 
&hw_addr_len, extack);
-   if (err == -EOPNOTSUPP) {
-   /* Port function attributes are optional for a port. If 
port doesn't
-* support function attribute, returning -EOPNOTSUPP is 
not an error.
-*/
-   err = 0;
-   goto out;
-   } else if (err) {
-   goto out;
-   }
-   err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, 
hw_addr_len, hw_addr);
-   if (err)
-   goto out;
-   empty_nest = false;
-   }
-
-out:
-   if (err || empty_nest)
+   err = devlink_port_function_hw_addr_fill(devlink, ops, port, msg, 
extack, &msg_updated);
+   if (err || !msg_updated)
nla_nest_cancel(msg, function_attr);
else
nla_nest_end(msg, function_attr);
@@ -986,7 +991,6 @@ devlink_port_function_hw_addr_set(struct devlink *devlink, 
struct devlink_port *
const struct devlink_ops *ops;
const u8 *hw_addr;
int hw_addr_len;
-   int err;
 
hw_addr = nla_data(attr);
hw_addr_len = nla_len(attr);
@@ -1011,12 +1015,7 @@ devlink_port_function_hw_addr_set(struct devlink 
*devlink, struct devlink_port *
return -EOPNOTSUPP;
}
 
-   err = ops->port_function_hw_addr_set(devlink, port, hw_addr, 
hw_addr_len, extack);
-   if (err)
-   return err;
-
-   devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);
-   return 0;
+   return ops->port_function_hw_addr_set(devlink, port, hw_addr, 
hw_addr_len, extack);
 }
 
 static int
@@ -1037,6 +1036,8 @@ devlink_port_function_set(struct devlink *devlink, struct 
devlink_port *port,
if (attr)
err = devlink_port_function_hw_addr_set(devlink, port, attr, 
extack);
 
+   if (!err)
+   devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);
return err;
 }
 
-- 
2.26.2

[PATCH net-next 1/1] net/smc: fix access to parent of an ib device

2020-12-15 Thread Karsten Graul

The parent of an ib device is used to retrieve the PCI device
attributes. It turns out that there are possible cases when an ib device
has no parent set in the device structure, which may lead to page
faults when trying to access this memory.
Fix that by checking the parent pointer and consolidating the PCI device
specific processing in a new function.
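
A minimal user-space sketch of the guard, assuming simplified stand-in
types: `struct ib_dev_sketch`, `struct pci_info_sketch` and
`report_dev_sketch()` are hypothetical and do not mirror the smc
structures. The PCI attributes are reported only when a parent exists,
while the remaining attributes are reported either way.

```c
#include <assert.h>
#include <stddef.h>

struct pci_info_sketch {
	unsigned short vendor;
	unsigned short device;
};

struct ib_dev_sketch {
	const struct pci_info_sketch *parent; /* may be NULL */
	const char *name;
};

struct report_sketch {
	int has_pci;
	unsigned short vendor;
	unsigned short device;
	const char *name;
};

static void report_dev_sketch(const struct ib_dev_sketch *dev,
			      struct report_sketch *out)
{
	assert(dev != NULL && out != NULL);

	out->has_pci = 0;
	out->vendor = 0;
	out->device = 0;

	/* Dereference the parent only when it is actually set. */
	if (dev->parent) {
		out->has_pci = 1;
		out->vendor = dev->parent->vendor;
		out->device = dev->parent->device;
	}

	/* Attributes that do not depend on the parent are always filled. */
	out->name = dev->name;
}
```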

Fixes: a3db10efcc4c ("net/smc: Add support for obtaining SMCR device list")
Reported-by: syzbot+600fef7c414ee7e2d...@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul 
---
 net/smc/smc_ib.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 89ea10675a7d..ddd7fac98b1d 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -394,6 +394,22 @@ static int smc_nl_handle_dev_port(struct sk_buff *skb,
return -EMSGSIZE;
 }
 
+static bool smc_nl_handle_pci_values(const struct smc_pci_dev *smc_pci_dev,
+struct sk_buff *skb)
+{
+   if (nla_put_u32(skb, SMC_NLA_DEV_PCI_FID, smc_pci_dev->pci_fid))
+   return false;
+   if (nla_put_u16(skb, SMC_NLA_DEV_PCI_CHID, smc_pci_dev->pci_pchid))
+   return false;
+   if (nla_put_u16(skb, SMC_NLA_DEV_PCI_VENDOR, smc_pci_dev->pci_vendor))
+   return false;
+   if (nla_put_u16(skb, SMC_NLA_DEV_PCI_DEVICE, smc_pci_dev->pci_device))
+   return false;
+   if (nla_put_string(skb, SMC_NLA_DEV_PCI_ID, smc_pci_dev->pci_id))
+   return false;
+   return true;
+}
+
 static int smc_nl_handle_smcr_dev(struct smc_ib_device *smcibdev,
  struct sk_buff *skb,
  struct netlink_callback *cb)
@@ -417,19 +433,13 @@ static int smc_nl_handle_smcr_dev(struct smc_ib_device 
*smcibdev,
is_crit = smcr_diag_is_dev_critical(&smc_lgr_list, smcibdev);
if (nla_put_u8(skb, SMC_NLA_DEV_IS_CRIT, is_crit))
goto errattr;
-   memset(&smc_pci_dev, 0, sizeof(smc_pci_dev));
-   pci_dev = to_pci_dev(smcibdev->ibdev->dev.parent);
-   smc_set_pci_values(pci_dev, &smc_pci_dev);
-   if (nla_put_u32(skb, SMC_NLA_DEV_PCI_FID, smc_pci_dev.pci_fid))
-   goto errattr;
-   if (nla_put_u16(skb, SMC_NLA_DEV_PCI_CHID, smc_pci_dev.pci_pchid))
-   goto errattr;
-   if (nla_put_u16(skb, SMC_NLA_DEV_PCI_VENDOR, smc_pci_dev.pci_vendor))
-   goto errattr;
-   if (nla_put_u16(skb, SMC_NLA_DEV_PCI_DEVICE, smc_pci_dev.pci_device))
-   goto errattr;
-   if (nla_put_string(skb, SMC_NLA_DEV_PCI_ID, smc_pci_dev.pci_id))
-   goto errattr;
+   if (smcibdev->ibdev->dev.parent) {
+   memset(&smc_pci_dev, 0, sizeof(smc_pci_dev));
+   pci_dev = to_pci_dev(smcibdev->ibdev->dev.parent);
+   smc_set_pci_values(pci_dev, &smc_pci_dev);
+   if (!smc_nl_handle_pci_values(&smc_pci_dev, skb))
+   goto errattr;
+   }
snprintf(smc_ibname, sizeof(smc_ibname), "%s", smcibdev->ibdev->name);
if (nla_put_string(skb, SMC_NLA_DEV_IB_NAME, smc_ibname))
goto errattr;
-- 
2.17.1
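
The guard this patch adds can be illustrated outside the kernel. The following is a minimal user-space sketch, not the SMC code itself: every demo_* name is a hypothetical stand-in, and the only point it shows is that the parent pointer is checked before anything is read through it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct demo_device {
	struct demo_device *parent;	/* may be NULL, as in the reported case */
	unsigned int vendor;
};

struct demo_pci_values {
	unsigned int vendor;
};

/* Returns true if the PCI values were filled in, false if there is no parent. */
bool demo_fill_pci_values(const struct demo_device *dev,
			  struct demo_pci_values *out)
{
	memset(out, 0, sizeof(*out));
	if (!dev->parent)	/* the kind of check the patch introduces */
		return false;
	out->vendor = dev->parent->vendor;
	return true;
}
```

A caller that gets false simply skips the PCI attributes, which mirrors how the patched function skips the PCI block when no parent is set.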

Re: [PATCH v2 net-next 2/2] nfc: s3fwrn5: Remove unused NCI prop commands

2020-12-15 Thread Krzysztof Kozlowski

On Tue, Dec 15, 2020 at 03:54:01PM +0900, Bongsu Jeon wrote:
> From: Bongsu Jeon 
> 
> Remove the unused NCI prop commands that s3fwrn5 driver doesn't use.
> 
> Signed-off-by: Bongsu Jeon 
> ---
>  drivers/nfc/s3fwrn5/nci.c | 25 -
>  drivers/nfc/s3fwrn5/nci.h | 22 --
>  2 files changed, 47 deletions(-)
> 

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof

[PATCH net-next 0/1] net/smc: fix access to parent of an ib device

2020-12-15 Thread Karsten Graul

Please apply the following patch for smc to netdev's net-next tree.

The patch fixes an access to the parent of an ib device which might be NULL.

I am sending this fix to net-next because the fixed code is still in this
tree only.

Karsten Graul (1):
  net/smc: fix access to parent of an ib device

 net/smc/smc_ib.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

-- 
2.17.1

Re: [PATCH] net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function

2020-12-15 Thread Dan Carpenter

On Tue, Dec 15, 2020 at 09:56:55AM +0100, Maxime Ripard wrote:
> Hi,
> 
> On Mon, Dec 14, 2020 at 09:21:17PM +0100, Christophe JAILLET wrote:
> > 'irq_of_parse_and_map()' should be balanced by a corresponding
> > 'irq_dispose_mapping()' call. Otherwise, some resources leak.
> 
> Do you have a source to back that? It's not clear at all from the
> documentation for those functions, and couldn't find any user calling it
> from the ten-or-so random picks I took.

It looks like irq_create_of_mapping() needs to be freed with
irq_dispose_mapping() so this is correct.

regards,
dan carpenter

Re: [PATCH v2 net-next 1/2] nfc: s3fwrn5: Remove the delay for NFC sleep

2020-12-15 Thread Krzysztof Kozlowski

On Tue, Dec 15, 2020 at 03:54:00PM +0900, Bongsu Jeon wrote:
> From: Bongsu Jeon 
> 
> Remove the delay for NFC sleep because the delay is only needed to
> guarantee that the NFC is awake.
> 
> Signed-off-by: Bongsu Jeon 
> ---
>  drivers/nfc/s3fwrn5/phy_common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof

Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

2020-12-15 Thread Jesper Dangaard Brouer

On Mon, 14 Dec 2020 17:29:06 -0800
Ivan Babrou  wrote:

> Without this change the driver tries to allocate too many queues,
> breaching the number of available msi-x interrupts on machines
> with many logical cpus and default adapter settings:
> 
> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> 
> Which in turn triggers EINVAL on XDP processing:
> 
> sfc :86:00.0 ext0: XDP TX failed (-22)

I have a similar QA report with XDP_REDIRECT:
  sfc :05:00.0 ens1f0np0: XDP redirect failed (-22)

Here we are back to the issue we discussed with ixgbe: the NIC's MSI-X
interrupt hardware resources are not enough on machines with many
logical CPUs.

After this fix, what will happen if (cpu >= efx->xdp_tx_queue_count) ?
(Copied efx_xdp_tx_buffers code below signature)

That leads to the question: does this driver need a fallback mechanism when
HW resources or the system's logical CPU count break the one-TX-queue-per-CPU
assumption?

> Signed-off-by: Ivan Babrou 
> ---
>  drivers/net/ethernet/sfc/efx_channels.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index a4a626e9cd9a..1bfeee283ea9 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -17,6 +17,7 @@
>  #include "rx_common.h"
>  #include "nic.h"
>  #include "sriov.h"
> +#include "workarounds.h"
>  
>  /* This is the first interrupt mode to try out of:
>   * 0 => MSI-X
> @@ -137,6 +138,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
>  {
>   unsigned int n_channels = parallelism;
>   int vec_count;
> + int tx_per_ev;
>   int n_xdp_tx;
>   int n_xdp_ev;
>  
> @@ -149,9 +151,9 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
>* multiple tx queues, assuming tx and ev queues are both
>* maximum size.
>*/
> -
> + tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
>   n_xdp_tx = num_possible_cpus();
> - n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, EFX_MAX_TXQ_PER_CHANNEL);
> + n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
>  
>   vec_count = pci_msix_vec_count(efx->pci_dev);
>   if (vec_count < 0)
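
The arithmetic in the quoted hunk can be checked with a small user-space sketch. The constants below are assumptions chosen for illustration (an event queue of 16384 entries, a TX queue of 2048 entries, 4 TX queues per channel); they are not read from the driver. The 48-CPU figure is likewise an assumption, picked because it is consistent with the "12 XDP event queues" in the quoted error message.

```c
/* Round-up integer division, equivalent in effect to the kernel's DIV_ROUND_UP. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Illustrative values; assumptions, not taken from the sfc driver. */
enum {
	DEMO_MAX_EVQ_SIZE = 16384,
	DEMO_TXQ_MAX_ENT = 2048,
	DEMO_MAX_TXQ_PER_CHANNEL = 4,
};

/* Event queues requested before the change: one per 4 TX queues. */
unsigned int demo_xdp_ev_old(unsigned int n_cpus)
{
	return DEMO_DIV_ROUND_UP(n_cpus, DEMO_MAX_TXQ_PER_CHANNEL);
}

/* Event queues requested after the change: as many TX queues per
 * event queue as the queue sizes allow.
 */
unsigned int demo_xdp_ev_new(unsigned int n_cpus)
{
	unsigned int tx_per_ev = DEMO_MAX_EVQ_SIZE / DEMO_TXQ_MAX_ENT;

	return DEMO_DIV_ROUND_UP(n_cpus, tx_per_ev);
}
```

Under these assumptions, 48 CPUs need 12 event queues with the old formula and 6 with the new one, so 24 other channels plus 6 would fit under a limit of 32.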

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

/* Transmit a packet from an XDP buffer
 *
 * Returns number of packets sent on success, error code otherwise.
 * Runs in NAPI context, either in our poll (for XDP TX) or a different NIC
 * (for XDP redirect).
 */
int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
		       bool flush)
{
	struct efx_tx_buffer *tx_buffer;
	struct efx_tx_queue *tx_queue;
	struct xdp_frame *xdpf;
	dma_addr_t dma_addr;
	unsigned int len;
	int space;
	int cpu;
	int i;

	cpu = raw_smp_processor_id();

	if (!efx->xdp_tx_queue_count ||
	    unlikely(cpu >= efx->xdp_tx_queue_count))
		return -EINVAL;

	tx_queue = efx->xdp_tx_queues[cpu];
	if (unlikely(!tx_queue))
		return -EINVAL;

	if (unlikely(n && !xdpfs))
		return -EINVAL;

	if (!n)
		return 0;

	/* Check for available space. We should never need multiple
	 * descriptors per frame.
	 */
	space = efx->txq_entries +
		tx_queue->read_count - tx_queue->insert_count;

	for (i = 0; i < n; i++) {
		xdpf = xdpfs[i];

		if (i >= space)
			break;

		/* We'll want a descriptor for this tx. */
		prefetchw(__efx_tx_queue_get_insert_buffer(tx_queue));

		len = xdpf->len;

		/* Map for DMA. */
		dma_addr = dma_map_single(&efx->pci_dev->dev,
					  xdpf->data, len,
					  DMA_TO_DEVICE);
		if (dma_mapping_error(&efx->pci_dev->dev, dma_addr))
			break;

		/*  Create descriptor and set up for unmapping DMA. */
		tx_buffer = efx_tx_map_chunk(tx_queue, dma_addr, len);
		tx_buffer->xdpf = xdpf;
		tx_buffer->flags = EFX_TX_BUF_XDP |
				   EFX_TX_BUF_MAP_SINGLE;
		tx_buffer->dma_offset = 0;
		tx_buffer->unmap_len = len;
		tx_queue->tx_packets++;
	}

	/* Pass mapped frames to hardware. */
	if (flush && i > 0)
		efx_nic_push_buffers(tx_queue);

	if (i == 0)
		return -EIO;

	efx_xdp_return_frames(n - i, xdpfs + i);

	return i;
}
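
The function above returns -EINVAL when the current CPU number is not below xdp_tx_queue_count. One possible shape for the fallback asked about in this mail is to fold CPU numbers onto the available queues. The sketch below is purely hypothetical and is not what the sfc driver does; demo_xdp_queue_for_cpu is an invented name, and a real implementation would also need per-queue locking, because several CPUs would then share one TX queue.

```c
/* Hypothetical fallback: map a CPU number onto a smaller set of TX queues.
 * Returns the queue index, or -1 when no XDP TX queues exist at all.
 */
int demo_xdp_queue_for_cpu(unsigned int cpu, unsigned int queue_count)
{
	if (queue_count == 0)
		return -1;
	/* CPUs beyond queue_count wrap around and share a queue. */
	return (int)(cpu % queue_count);
}
```

With 8 queues, CPU 5 and CPU 13 would both land on queue 5, which is exactly the sharing that makes locking necessary.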

general protection fault in taprio_dequeue_soft

2020-12-15 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:7f376f19 Merge tag 'mtd/fixes-for-5.10-rc8' of git://git.k..
git tree:   net
console output: https://syzkaller.appspot.com/x/log.txt?x=1384228750
kernel config:  https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
dashboard link: https://syzkaller.appspot.com/bug?extid=8971da381fb5a31f542d
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=128c574550
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17a1f12350

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8971da381fb5a31f5...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 
0xdc00:  [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x-0x0007]
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:taprio_dequeue_soft+0x22e/0xa40 net/sched/sch_taprio.c:544
Code: 24 18 e8 d5 3e 4c fa 48 8b 44 24 10 80 38 00 0f 85 4c 07 00 00 48 8b 93 
c0 02 00 00 49 63 c5 4c 8d 24 c2 4c 89 e0 48 c1 e8 03 <80> 3c 28 00 0f 85 3c 07 
00 00 4d 8b 24 24 4d 85 e4 0f 84 87 03 00
RSP: 0018:c9d90e08 EFLAGS: 00010246
RAX:  RBX: 8880282e3800 RCX: 8723c557
RDX:  RSI: 8723c59b RDI: 0005
RBP: dc00 R08: 0001 R09: 8ebaf667
R10:  R11: 0001 R12: 
R13:  R14: 0401 R15: 88801917e000
FS:  () GS:8880b9f0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2600 CR3: 13cdc000 CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 dequeue_skb net/sched/sch_generic.c:263 [inline]
 qdisc_restart net/sched/sch_generic.c:366 [inline]
 __qdisc_run+0x1ae/0x15e0 net/sched/sch_generic.c:384
 qdisc_run include/net/pkt_sched.h:131 [inline]
 qdisc_run include/net/pkt_sched.h:123 [inline]
 net_tx_action+0x4b9/0xbf0 net/core/dev.c:4915
 __do_softirq+0x2a0/0x9f6 kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:393 [inline]
 __irq_exit_rcu kernel/softirq.c:423 [inline]
 irq_exit_rcu+0x132/0x200 kernel/softirq.c:435
 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:631
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:517
Code: 5d 07 88 f8 84 db 75 ac e8 44 0f 88 f8 e8 bf cd 8d f8 e9 0c 00 00 00 e8 
35 0f 88 f8 0f 00 2d 9e 86 c0 00 e8 29 0f 88 f8 fb f4 <9c> 5b 81 e3 00 02 00 00 
fa 31 ff 48 89 de e8 84 07 88 f8 48 85 db
RSP: 0018:c9d27d18 EFLAGS: 0293
RAX:  RBX:  RCX: 119d8e91
RDX: 888010d98000 RSI: 88e7f547 RDI: 
RBP: 888014e50064 R08: 0001 R09: 0001
R10:  R11: 0001 R12: 0001
R13: 888014e5 R14: 888014e50064 R15: 88801747a804
 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:648
 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
 call_cpuidle kernel/sched/idle.c:158 [inline]
 cpuidle_idle_call kernel/sched/idle.c:239 [inline]
 do_idle+0x3e1/0x590 kernel/sched/idle.c:299
 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:395
 start_secondary+0x266/0x340 arch/x86/kernel/smpboot.c:266
 secondary_startup_64_no_verify+0xb0/0xbb
Modules linked in:
---[ end trace 86b7dd17b9a0a261 ]---
RIP: 0010:taprio_dequeue_soft+0x22e/0xa40 net/sched/sch_taprio.c:544
Code: 24 18 e8 d5 3e 4c fa 48 8b 44 24 10 80 38 00 0f 85 4c 07 00 00 48 8b 93 
c0 02 00 00 49 63 c5 4c 8d 24 c2 4c 89 e0 48 c1 e8 03 <80> 3c 28 00 0f 85 3c 07 
00 00 4d 8b 24 24 4d 85 e4 0f 84 87 03 00
RSP: 0018:c9d90e08 EFLAGS: 00010246
RAX:  RBX: 8880282e3800 RCX: 8723c557
RDX:  RSI: 8723c59b RDI: 0005
RBP: dc00 R08: 0001 R09: 8ebaf667
R10:  R11: 0001 R12: 
R13:  R14: 0401 R15: 88

[MPTCP][PATCH net-next] mptcp: clear use_ack and use_map when dropping other suboptions

2020-12-15 Thread Geliang Tang

This patch clears use_ack and use_map when dropping other suboptions to
fix the following syzkaller BUG:

[   15.223006] BUG: unable to handle page fault for address: 00223b10
[   15.223700] #PF: supervisor read access in kernel mode
[   15.224209] #PF: error_code(0x) - not-present page
[   15.224724] PGD b8d5067 P4D b8d5067 PUD c0a5067 PMD 0
[   15.225237] Oops:  [#1] SMP
[   15.225556] CPU: 0 PID: 7747 Comm: syz-executor Not tainted 5.10.0-rc6+ #24
[   15.226281] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[   15.227292] RIP: 0010:skb_release_data+0x89/0x1e0
[   15.227816] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a 
ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 
08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
[   15.229669] RSP: 0018:c900019c7c08 EFLAGS: 00010293
[   15.230188] RAX: 88800daad900 RBX: 00223b08 RCX: 0006
[   15.230895] RDX:  RSI: 818e06c5 RDI: 88807f6dc700
[   15.231593] RBP: 88807f71a4c0 R08: 0001 R09: 0001
[   15.232299] R10: c900019c7c18 R11:  R12: 88807f71a4f0
[   15.233007] R13:  R14: 88807f6dc700 R15: 0002
[   15.233714] FS:  7f65d9b5f700() GS:88807c40() 
knlGS:
[   15.234509] CS:  0010 DS:  ES:  CR0: 80050033
[   15.235081] CR2: 00223b10 CR3: 0b883000 CR4: 06f0
[   15.235788] Call Trace:
[   15.236042]  skb_release_all+0x28/0x30
[   15.236419]  __kfree_skb+0x11/0x20
[   15.236768]  tcp_data_queue+0x270/0x1240
[   15.237161]  ? tcp_urg+0x50/0x2a0
[   15.237496]  tcp_rcv_established+0x39a/0x890
[   15.237997]  ? mark_held_locks+0x49/0x70
[   15.238467]  tcp_v4_do_rcv+0xb9/0x270
[   15.238915]  __release_sock+0x8a/0x160
[   15.239365]  release_sock+0x32/0xd0
[   15.239793]  __inet_stream_connect+0x1d2/0x400
[   15.240313]  ? do_wait_intr_irq+0x80/0x80
[   15.240791]  inet_stream_connect+0x36/0x50
[   15.241275]  mptcp_stream_connect+0x69/0x1b0
[   15.241787]  __sys_connect+0x122/0x140
[   15.242236]  ? syscall_enter_from_user_mode+0x17/0x50
[   15.242836]  ? lockdep_hardirqs_on_prepare+0xd4/0x170
[   15.243436]  __x64_sys_connect+0x1a/0x20
[   15.243924]  do_syscall_64+0x33/0x40
[   15.244313]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   15.244821] RIP: 0033:0x7f65d946e469
[   15.245183] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
[   15.247019] RSP: 002b:7f65d9b5eda8 EFLAGS: 0246 ORIG_RAX: 
002a
[   15.247770] RAX: ffda RBX: 0049bf00 RCX: 7f65d946e469
[   15.248471] RDX: 0010 RSI: 20c0 RDI: 0005
[   15.249205] RBP: 0049bf00 R08:  R09: 
[   15.249908] R10:  R11: 0246 R12: 0049bf0c
[   15.250603] R13: 7fffe8a25cef R14: 7f65d9b3f000 R15: 0003
[   15.251312] Modules linked in:
[   15.251626] CR2: 00223b10
[   15.251965] BUG: kernel NULL pointer dereference, address: 0048
[   15.252005] ---[ end trace f5c51fe19123c773 ]---
[   15.252822] #PF: supervisor read access in kernel mode
[   15.252823] #PF: error_code(0x) - not-present page
[   15.252825] PGD c6c6067 P4D c6c6067 PUD c0d8067
[   15.253294] RIP: 0010:skb_release_data+0x89/0x1e0
[   15.253910] PMD 0
[   15.253914] Oops:  [#2] SMP
[   15.253917] CPU: 1 PID: 7746 Comm: syz-executor Tainted: G  D   
5.10.0-rc6+ #24
[   15.253920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[   15.254435] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a 
ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 
08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
[   15.254899] RIP: 0010:skb_release_data+0x89/0x1e0
[   15.254902] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a 
ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 
08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
[   15.254905] RSP: 0018:c900019bfc08 EFLAGS: 00010293
[   15.255376] RSP: 0018:c900019c7c08 EFLAGS: 00010293
[   15.255580]
[   15.255583] RAX: 888004a7ac80 RBX: 0040 RCX: 
[   15.255912]
[   15.256724] RDX:  RSI: 818e06c5 RDI: 88807f6ddd00
[   15.257620] RAX: 88800daad900 RBX: 00223b08 RCX: 0006
[   15.259817] RBP: 88800e9006c0 R08:  R09: 
[   15.259818] R10:  R11:  R12: 88800e9006f0
[   15.259820] R13:  R14: 88807f6ddd00 R15: 0002
[   15.259822] F

Re: [PATCH net] ethtool: fix error paths in ethnl_set_channels()

2020-12-15 Thread Michal Kubecek

On Tue, Dec 15, 2020 at 10:08:10AM +0100, Ivan Vecera wrote:
> Fix two error paths in ethnl_set_channels() to avoid lock-up caused
> by unreleased RTNL.
> 
> Fixes: e19c591eafad ("ethtool: set device channel counts with CHANNELS_SET request")
> Cc: Michal Kubecek 
> Reported-by: LiLiang 
> Signed-off-by: Ivan Vecera 
> ---
>  net/ethtool/channels.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
> index 5635604cb9ba..25a9e566ef5c 100644
> --- a/net/ethtool/channels.c
> +++ b/net/ethtool/channels.c
> @@ -194,8 +194,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info)
>   if (netif_is_rxfh_configured(dev) &&
>   !ethtool_get_max_rxfh_channel(dev, &max_rx_in_use) &&
>   (channels.combined_count + channels.rx_count) <= max_rx_in_use) {
> + ret = -EINVAL;
>   GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing indirection table settings");
> - return -EINVAL;
> + goto out_ops;
>   }
>  
>   /* Disabling channels, query zero-copy AF_XDP sockets */
> @@ -203,8 +204,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info)
>  min(channels.rx_count, channels.tx_count);
>   for (i = from_channel; i < old_total; i++)
>   if (xsk_get_pool_from_qid(dev, i)) {
> + ret = -EINVAL;
>   GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets");
> - return -EINVAL;
> + goto out_ops;
>   }
>  
>   ret = dev->ethtool_ops->set_channels(dev, &channels);

Oh, the joys of mindless copy and paste... :-(

Reviewed-by: Michal Kubecek 
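
The bug class fixed here, an early return that skips the unlock, can be shown with a small user-space sketch. The lock below is a hypothetical counter standing in for RTNL, and all demo_* names are invented for illustration. The real function unwinds through more than one label, which this sketch does not reproduce.

```c
#include <errno.h>

/* Hypothetical stand-in for RTNL: a simple depth counter. */
static int demo_lock_depth;

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/* Lets a caller check that the lock was released on every path. */
int demo_lock_held(void)
{
	return demo_lock_depth;
}

int demo_set_channels(int requested, int in_use)
{
	int ret;

	demo_lock();

	if (requested <= in_use) {
		ret = -EINVAL;
		/* A bare "return -EINVAL;" here would leave the lock held. */
		goto out_unlock;
	}

	ret = 0;	/* the actual update would happen here */

out_unlock:
	demo_unlock();
	return ret;
}
```

Setting ret and jumping to a shared exit label keeps every error path on the same unlock sequence, which is the pattern the patch restores.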



Re: [PATCH net] nfp: do not send control messages during cleanup

2020-12-15 Thread Simon Horman

On Mon, Dec 14, 2020 at 06:26:50PM -0800, Jakub Kicinski wrote:
> On Fri, 11 Dec 2020 10:27:38 +0100 Simon Horman wrote:
> > On cleanup the txbufs are freed before app cleanup. But app clean-up may
> > result in control messages due to use of common control paths. There is no
> > need to clean-up the NIC in such cases so simply discard requests. Without
> > such a check a NULL pointer dereference occurs.
> > 
> > Fixes: a1db217861f3 ("net: flow_offload: fix flow_indr_dev_unregister path")
> > Cc: wenxu 
> > Signed-off-by: Simon Horman 
> > Signed-off-by: Louis Peens 
> 
> Hm. We can apply this as a quick fix, but I'd think that app->stop
> (IIRC that's the callback) is responsible for making sure that
> everything gets shut down and no more cmsgs can be generated after
> ctrl vNIC goes down. Perhaps some code needs to be reshuffled between
> init/clean and start/stop for flower? WDYT?

Thanks Jakub,

I was a bit concerned with fragility in the clean-up path, which is why I
had opted for this simple solution. However, looking at your suggestion
above it seems simple to move the cleanup to app->stop. I'll work on
posting a patch to implement your suggestion.

WARNING: suspicious RCU usage in nf_ct_iterate_cleanup

2020-12-15 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:33dc9614 Merge tag 'ktest-v5.10-rc6' of git://git.kernel.o..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1200a46b50
kernel config:  https://syzkaller.appspot.com/x/.config?x=5ed9af1b47477866
dashboard link: https://syzkaller.appspot.com/bug?extid=dced7c2d89dde957f7dd
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dced7c2d89dde957f...@syzkaller.appspotmail.com

=
WARNING: suspicious RCU usage
5.10.0-rc7-syzkaller #0 Not tainted
-
kernel/sched/core.c:7270 Illegal context switch in RCU-bh read-side critical 
section!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 0
2 locks held by kworker/1:8/18355:
 #0: 888010063d38 ((wq_completion)events){+.+.}-{0:0}, at: 
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 888010063d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set 
include/asm-generic/atomic-instrumented.h:856 [inline]
 #0: 888010063d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set 
include/asm-generic/atomic-long.h:41 [inline]
 #0: 888010063d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data 
kernel/workqueue.c:616 [inline]
 #0: 888010063d38 ((wq_completion)events){+.+.}-{0:0}, at: 
set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
 #0: 888010063d38 ((wq_completion)events){+.+.}-{0:0}, at: 
process_one_work+0x821/0x15a0 kernel/workqueue.c:2243
 #1: c90002a6fda8 ((work_completion)(&w->work)#2){+.+.}-{0:0}, at: 
process_one_work+0x854/0x15a0 kernel/workqueue.c:2247

stack backtrace:
CPU: 1 PID: 18355 Comm: kworker/1:8 Not tainted 5.10.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Workqueue: events iterate_cleanup_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:118
 ___might_sleep+0x220/0x2b0 kernel/sched/core.c:7270
 get_next_corpse net/netfilter/nf_conntrack_core.c: [inline]
 nf_ct_iterate_cleanup+0x132/0x400 net/netfilter/nf_conntrack_core.c:2244
 nf_ct_iterate_cleanup_net net/netfilter/nf_conntrack_core.c:2329 [inline]
 nf_ct_iterate_cleanup_net+0x113/0x170 net/netfilter/nf_conntrack_core.c:2314
 iterate_cleanup_work+0x45/0x130 net/netfilter/nf_nat_masquerade.c:216
 process_one_work+0x933/0x15a0 kernel/workqueue.c:2272
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2418
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

[PATCH net-next] devlink: use _BITUL() macro instead of BIT() in the UAPI header

2020-12-15 Thread Tobias Klauser

The BIT() macro is not available for the UAPI headers. Moreover, it can
be defined differently in user space headers. Thus, replace its usage
with the _BITUL() macro which is already used in other macro definitions
in .

Fixes: dc64cc7c6310 ("devlink: Add devlink reload limit option")
Signed-off-by: Tobias Klauser 
---
 include/uapi/linux/devlink.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5203f54a2be1..cf89c318f2ac 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -322,7 +322,7 @@ enum devlink_reload_limit {
DEVLINK_RELOAD_LIMIT_MAX = __DEVLINK_RELOAD_LIMIT_MAX - 1
 };
 
-#define DEVLINK_RELOAD_LIMITS_VALID_MASK (BIT(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
+#define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
 
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
-- 
2.27.0
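
For readers unfamiliar with the macro, the effect of the change can be sketched in user space. DEMO_BITUL below is a local equivalent written for this example so it builds without kernel headers, and the enum is a hypothetical mirror with two limit values; neither is copied from the devlink header.

```c
/* Local equivalent of a "bit as unsigned long" helper, defined here so the
 * sketch does not depend on kernel or UAPI headers.
 */
#define DEMO_BITUL(x) (1UL << (x))

/* Hypothetical mirror of a limit enum with two values and an end marker. */
enum demo_reload_limit {
	DEMO_RELOAD_LIMIT_UNSPEC,
	DEMO_RELOAD_LIMIT_NO_RESET,
	DEMO_RELOAD_LIMIT_END,
	DEMO_RELOAD_LIMIT_MAX = DEMO_RELOAD_LIMIT_END - 1
};

/* All bits below the end marker are valid. */
#define DEMO_RELOAD_LIMITS_VALID_MASK (DEMO_BITUL(DEMO_RELOAD_LIMIT_END) - 1)

/* Returns nonzero when no bit outside the valid mask is set. */
int demo_limits_valid(unsigned long limits)
{
	return (limits & ~DEMO_RELOAD_LIMITS_VALID_MASK) == 0;
}
```

With two values before the end marker the mask covers bits 0 and 1, so any request with bit 2 or higher set is rejected.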

Re: [PATCH] net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function

2020-12-15 Thread Maxime Ripard

On Tue, Dec 15, 2020 at 12:11:53PM +0300, Dan Carpenter wrote:
> On Tue, Dec 15, 2020 at 09:56:55AM +0100, Maxime Ripard wrote:
> > Hi,
> > 
> > On Mon, Dec 14, 2020 at 09:21:17PM +0100, Christophe JAILLET wrote:
> > > 'irq_of_parse_and_map()' should be balanced by a corresponding
> > > > 'irq_dispose_mapping()' call. Otherwise, some resources leak.
> > 
> > Do you have a source to back that? It's not clear at all from the
> > documentation for those functions, and couldn't find any user calling it
> > from the ten-or-so random picks I took.
> 
> It looks like irq_create_of_mapping() needs to be freed with
> irq_dispose_mapping() so this is correct.

The doc should be updated first to make that clear then, otherwise we're
going to fix one user while multiple others will have popped up.

Maxime



Re: [PATCH v2 3/3] net: mhi: Add dedicated alloc thread

2020-12-15 Thread Loic Poulain

Hi Jakub,

On Mon, 14 Dec 2020 at 20:47, Jakub Kicinski  wrote:
>
> On Mon, 14 Dec 2020 10:19:07 +0100 Loic Poulain wrote:
> > On Sat, 12 Dec 2020 at 21:55, Jakub Kicinski  wrote:
> > > On Thu, 10 Dec 2020 12:15:51 +0100 Loic Poulain wrote:
> > > > The buffer allocation for RX path is currently done by a work executed
> > > > in the system workqueue. The work to do is quite simple and consists
> > > > mostly of allocating and queueing as many buffers as possible to the MHI
> > > > RX channel.
> > > >
> > > > It appears that using a dedicated kthread would be more appropriate to
> > > > prevent
> > > > 1. RX allocation latency introduced by the system queue
> > >
> > > System work queue should not add much latency, you can also create your
> > > own workqueue. Did you intend to modify the priority of the thread you
> > > create?
> >
> > No, and I don't, since I assume there is no reason to prioritize
> > network over other loads. I've considered the dedicated workqueue, but
> > since there is only one task to run as a while loop, I thought using a
> > kthread was more appropriate (and slightly lighter), but I can move to
> > that solution if you recommend it.
>
> Not sure what to recommend TBH, if thread works better for you that's
> fine. I don't understand why the thread would work better, tho. I was
> just checking if there is any extra tuning that happens.
>
> > > > 2. Unbounded work execution, the work only returning when queue is
> > > > full, it can possibly monopolise the workqueue thread on slower systems.
> > >
> > > Is this something you observed in practice?
> >
> > No, I've just observed that work duration is inconsistent, queuing from
> > a few buffers to several hundred. This unbounded behavior makes me
> > feel that the shared system workqueue is probably not the
> > right place for it. I've not tested on a slower machine though.
>
> I think long running work should not be an issue for the cmwq
> implementation we have in the kernel.
>
> Several hundred buffers means it's running concurrently with RX, right?
> Since the NIC queue is 128 buffers.

Exactly, buffers can be completed by the hardware before we have even
finished filling the MHI ring buffer; that is why the loop can
queue more than 128 buffers.

> > > > This patch replaces the system work with a simple kthread that loops on
> > > > buffer allocation and sleeps when queue is full. Moreover it gets rid
> > > > of the local rx_queued variable (to track buffer count), and instead,
> > > > relies on the new mhi_get_free_desc_count helper.
> > >
> > > Seems unrelated, should probably be a separate patch.
> >
> > I can do that.
> >
> > >
> > > > After practical testing on an x86_64 machine, this change improves
> > > > - Peak throughput (slightly, by a few Mbps)
> > > > - Throughput stability when concurrent loads are running (stress)
> > > > - CPU usage, less CPU cycles dedicated to the task
> > >
> > > Do you have an explanation why the CPU cycles are lower?
> >
> > For CPU cycles, TBH, not really, this is just observational.
>
> Is the IRQ pinned? I wonder how often work runs on the same CPU as IRQ
> processing and how often does the thread do.
>
> > Regarding throughput stability, it's certainly because the work can
> > consume all its dedicated kthread time.
>
> Meaning workqueue implementation doesn't get enough CPU? Strange.
>
> > > > Below is the powertop output for RX allocation task before and
> > > > after this change, when performing UDP download at 6Gbps. Mostly
> > > > to highlight the improvement in term of CPU usage.
> > > >
> > > > older (system workqueue):
> > > > Usage   Events/sCategory   Description
> > > > 63,2 ms/s 134,0kWork  mhi_net_rx_refill_work
> > > > 62,8 ms/s 134,3kWork  mhi_net_rx_refill_work
> > > > 60,8 ms/s 141,4kWork  mhi_net_rx_refill_work
> > > >
> > > > newer (dedicated kthread):
> > > > Usage   Events/sCategory   Description
> > > > 20,7 ms/s 155,6Process[PID 3360] [mhi-net-rx]
> > > > 22,2 ms/s 169,6Process[PID 3360] [mhi-net-rx]
> > > > 22,3 ms/s 150,2Process[PID 3360] [mhi-net-rx]
> > > >
> > > > Signed-off-by: Loic Poulain 
>
> > > > + skb = netdev_alloc_skb(ndev, size);
> > > > + if (unlikely(!skb)) {
> > > > + /* No memory, retry later */
> > > > +
> > > > schedule_timeout_interruptible(msecs_to_jiffies(250));
> > >
> > > You should have a counter for this, at least for your testing. If
> > > this condition is hit it'll probably have a large impact on the
> > > performance.
> >
> > Indeed, going to do that, what about a ratelimited error? I assume if
> > it happens, the system is really in bad shape.
>
> It's not that uncommon to run out of memory for a 2k allocation in an
> atomic context (note that netdev_alloc_skb() uses GFP_ATOMIC).
> You can add a rate-limited print if you want, tho.
>
> > > > +

Re: [PATCH v4 0/4] Improve s0ix flows for systems i219LM

2020-12-15 Thread Hans de Goede

Hi,

On 12/14/20 8:36 PM, Limonciello, Mario wrote:
>> Hi All,
>>
>> Sasha (and the other intel-wired-lan folks), thank you for investigating this
>> further and for coming up with a better solution.
>>
>> Mario, thank you for implementing the new scheme.
>>
> 
> Sure.
> 
>> I've tested this patch set on a Lenovo X1C8 with vPRO and AMT enabled in the
>> BIOS
>> (the previous issues were seen on a X1C7).
>>
>> I have good and bad news:
>>
>> The good news is that after reverting the
>> "e1000e: disable s0ix entry and exit flows for ME systems"
>> I can reproduce the original issue on the X1C8 (I no longer have
>> a X1C7 to test on).
>>
>> The bad news is that increasing the timeout to 1 second does
>> not fix the issue. Suspend/resume is still broken after one
>> suspend/resume cycle, as described in the original bug-report:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1865570
>>
>> More good news though, bumping the timeout to 250 poll iterations
>> (approx 2.5 seconds) as done in Aaron Ma's original patch for
>> this fixes this on the X1C8 just as it did on the X1C7
>> (it takes 2 seconds for ULP_CONFIG_DONE to clear).
>>
>> I've run some extra tests and the poll loop succeeds on its
>> first iteration when an ethernet-cable is connected. It seems
>> that Lenovo's variant of the ME firmware waits up to 2 seconds
>> for a link, causing the long wait for ULP_CONFIG_DONE to clear.
>>
>> I think that for now the best fix would be to increase the timeout
>> to 2.5 seconds as done in  Aaron Ma's original patch. Combined
>> with a broken-firmware warning when we waited longer than 1 second,
>> to make it clear that there is a firmware issue here and that
>> the long wait / slow resume is not the fault of the driver.
>>
> 
> OK.  I've submitted v5 with this suggestion.
> 
>> ###
>>
>> I've added Mark Pearson from Lenovo to the Cc so that Lenovo
>> can investigate this issue further.
>>
>> Mark, this thread is about an issue with enabling S0ix support for
>> e1000e (i219lm) controllers. This was enabled in the kernel a
>> while ago, but then got disabled again on vPro / AMT enabled
>> systems because on some systems (Lenovo X1C7 and now also X1C8)
>> this led to suspend/resume issues.
>>
>> When AMT is active then there is a handover handshake for the
>> OS to get access to the ethernet controller from the ME. The
>> Intel folks have checked and the Windows driver is using a timeout
>> of 1 second for this handshake, yet on Lenovo systems this is
>> taking 2 seconds. This likely has something to do with the
>> ME firmware on these Lenovo models, can you get the firmware
>> team at Lenovo to investigate this further ?
>>
> 
> Please be very careful with nomenclature.  AMT active, or AMT capable?
> The goal for this series is to support AMT capable systems with an i219LM
> where AMT has not been provisioned by the end user or organization.
> OEMs do not ship systems with AMT provisioned.

Ah, sorry about that. What I meant by "active" is that AMT is set to
"Enabled" in the BIOS.

Also FWIW I just tried disabling AMT in the BIOS (using the "Disabled"
option, not the "Permanently Disabled" option) on the Lenovo X1 Carbon
8th gen, but that does not make a difference.

It still takes 2 seconds for ULP_CONFIG_DONE to clear even with AMT
set to "Disabled" in the BIOS :|

Regards,

Hans

Re: [PATCH v2] net/mlx4: Use true,false for bool variable

2020-12-15 Thread Vasyl

Hi,

Oh, this was fixed recently in net-next.
Sorry, I missed that.
Thanks for clarifying the submission policy; I will adapt to it.

Thanks

On Tue, Dec 15, 2020 at 7:18 AM Leon Romanovsky  wrote:
>
> On Mon, Dec 14, 2020 at 09:37:34PM -0800, Joe Perches wrote:
> > On Tue, 2020-12-15 at 07:18 +0200, Leon Romanovsky wrote:
> > > On Mon, Dec 14, 2020 at 11:15:01AM -0800, Joe Perches wrote:
> > > > I prefer revisions to single patches (as opposed to large patch series)
> > > > in the same thread.
> > >
> > > It depends which side you are in that game. From the reviewer point of
> > > view, such submission breaks flow very badly. It unfolds the already
> > > reviewed thread, messes with the order and many more little annoying
> > > things.
> >
> > This is where I disagree with you.  I am a reviewer here.
>
> It is ok, different people have different views.
>
> >
> > Not having context to be able to inspect vN -> vN+1 is made
> > more difficult not having the original patch available and
> > having to search history for it.
>
> I'm following after specific subsystems and see all patches there,
> so for me and Jakub context already exists.
>
> Bottom line, it depends on the workflow.
>
> >
> > Almost no one adds URL links to older submissions below the ---.
>
> Too bad, maybe it is time to enforce it.
>
> >
> > Were that a standard mechanism below the --- line, then it would
> > be OK.
>
> So let me summarize: we (RDMA and netdev subsystems) would like to ask
> you not to submit new patch revisions as replies to the previous thread.
>
> Thanks



-- 
Good day to you.

Re: [PATCH v3 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-15 Thread Maciej Fijalkowski

On Sat, Dec 12, 2020 at 06:41:49PM +0100, Lorenzo Bianconi wrote:
> Introduce xdp_prepare_buff utility routine to initialize per-descriptor
> xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
> all XDP capable drivers.
> 
> Signed-off-by: Lorenzo Bianconi 
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  |  5 ++---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  4 +---
>  drivers/net/ethernet/cavium/thunder/nicvf_main.c  |  7 ---
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c|  6 ++
>  drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c  | 13 +
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 12 ++--
>  drivers/net/ethernet/intel/ice/ice_txrx.c | 11 ++-
>  drivers/net/ethernet/intel/igb/igb_main.c | 12 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 ++--
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 12 ++--
>  drivers/net/ethernet/marvell/mvneta.c |  6 ++
>  drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c   |  7 +++
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c|  5 ++---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  4 +---
>  .../net/ethernet/netronome/nfp/nfp_net_common.c   |  8 
>  drivers/net/ethernet/qlogic/qede/qede_fp.c|  4 +---
>  drivers/net/ethernet/sfc/rx.c |  6 ++
>  drivers/net/ethernet/socionext/netsec.c   |  5 ++---
>  drivers/net/ethernet/ti/cpsw.c| 15 +--
>  drivers/net/ethernet/ti/cpsw_new.c| 15 +--
>  drivers/net/hyperv/netvsc_bpf.c   |  4 +---
>  drivers/net/tun.c |  4 +---
>  drivers/net/veth.c|  6 +-
>  drivers/net/virtio_net.c  | 12 
>  drivers/net/xen-netfront.c|  4 +---
>  include/net/xdp.h | 12 
>  net/bpf/test_run.c|  5 +
>  net/core/dev.c| 10 --
>  28 files changed, 96 insertions(+), 130 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 338dce73927e..1cfd0c98677e 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -1519,10 +1519,9 @@ static int ena_xdp_handle_buff(struct ena_ring 
> *rx_ring, struct xdp_buff *xdp)
>   int ret;
>  
>   rx_info = &rx_ring->rx_buffer_info[rx_ring->ena_bufs[0].req_id];
> - xdp->data = page_address(rx_info->page) + rx_info->page_offset;
> + xdp_prepare_buff(xdp, page_address(rx_info->page),
> +  rx_info->page_offset, rx_ring->ena_bufs[0].len);
>   xdp_set_data_meta_invalid(xdp);
> - xdp->data_hard_start = page_address(rx_info->page);
> - xdp->data_end = xdp->data + rx_ring->ena_bufs[0].len;
>   /* If for some reason we received a bigger packet than
>* we expect, then we simply drop it
>*/
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index b7942c3440c0..e1664b86a7b8 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -134,10 +134,8 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct 
> bnxt_rx_ring_info *rxr, u16 cons,
>  
>   txr = rxr->bnapi->tx_ring;
>   xdp_init_buff(&xdp, PAGE_SIZE, &rxr->xdp_rxq);
> - xdp.data_hard_start = *data_ptr - offset;
> - xdp.data = *data_ptr;
> + xdp_prepare_buff(&xdp, *data_ptr - offset, offset, *len);
>   xdp_set_data_meta_invalid(&xdp);
> - xdp.data_end = *data_ptr + *len;
>   orig_data = xdp.data;
>  
>   rcu_read_lock();
> diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
> b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> index 9fc672f075f2..9bdac04359c6 100644
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> @@ -530,6 +530,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
> bpf_prog *prog,
>   struct cqe_rx_t *cqe_rx, struct snd_queue *sq,
>   struct rcv_queue *rq, struct sk_buff **skb)
>  {
> + unsigned char *hard_start, *data;
>   struct xdp_buff xdp;
>   struct page *page;
>   u32 action;
> @@ -549,10 +550,10 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, 
> struct bpf_prog *prog,
>  
>   xdp_init_buff(&xdp, RCV_FRAG_LEN + XDP_PACKET_HEADROOM,
> &rq->xdp_rxq);
> - xdp.data_hard_start = page_address(page);
> - xdp.data = (void *)cpu_addr;
> + hard_start = page_address(page);
> + data = (unsigned char *)cpu_addr;
> + xdp_prepare_buff(&xdp, hard_start, data - hard_start
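
[Editorial sketch, not part of the patch] Judging only from the call sites
in the quoted diff, xdp_prepare_buff() takes a hard-start pointer, a
headroom and a data length, and derives the three buffer pointers from
them. A minimal userspace illustration of that idea follows; the names
struct xdp_buff_sketch and xdp_prepare_buff_sketch() are hypothetical and
this is not the kernel definition from include/net/xdp.h.

```c
#include <stddef.h>

/* Simplified stand-in for the three xdp_buff pointers the diff touches. */
struct xdp_buff_sketch {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/* Derive all three pointers from one base, a headroom and a length,
 * mirroring the open-coded assignments that the diff removes.
 */
static void xdp_prepare_buff_sketch(struct xdp_buff_sketch *xdp,
				    unsigned char *hard_start,
				    size_t headroom, size_t data_len)
{
	xdp->data_hard_start = hard_start;
	xdp->data = hard_start + headroom;
	xdp->data_end = xdp->data + data_len;
}
```

With this shape, a driver that previously set data_hard_start, data and
data_end separately only has to work out the headroom once, which appears
to be what the nicvf hunk above does with "data - hard_start".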

Re: [PATCH v5 4/4] e1000e: Export S0ix flags to ethtool

2020-12-15 Thread Hans de Goede

Hi,

On 12/14/20 8:29 PM, Mario Limonciello wrote:
> This flag can be used by an end user to disable S0ix flows on a
> buggy system or by an OEM for development purposes.
> 
> If you need this flag to be persisted across reboots, it's suggested
> to use a udev rule to adjust it until the kernel includes your
> configuration in a disallow list.
> 
> Signed-off-by: Mario Limonciello 

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede 

Regards,

Hans

> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h   |  1 +
>  drivers/net/ethernet/intel/e1000e/ethtool.c | 46 +
>  drivers/net/ethernet/intel/e1000e/netdev.c  |  9 ++--
>  3 files changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
> b/drivers/net/ethernet/intel/e1000e/e1000.h
> index ba7a0f8f6937..5b2143f4b1f8 100644
> --- a/drivers/net/ethernet/intel/e1000e/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -436,6 +436,7 @@ s32 e1000e_get_base_timinca(struct e1000_adapter 
> *adapter, u32 *timinca);
>  #define FLAG2_DFLT_CRC_STRIPPING  BIT(12)
>  #define FLAG2_CHECK_RX_HWTSTAMP   BIT(13)
>  #define FLAG2_CHECK_SYSTIM_OVERFLOW   BIT(14)
> +#define FLAG2_ENABLE_S0IX_FLOWS   BIT(15)
>  
>  #define E1000_RX_DESC_PS(R, i)   \
>   (&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
> diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c 
> b/drivers/net/ethernet/intel/e1000e/ethtool.c
> index 03215b0aee4b..06442e6bef73 100644
> --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> @@ -23,6 +23,13 @@ struct e1000_stats {
>   int stat_offset;
>  };
>  
> +static const char e1000e_priv_flags_strings[][ETH_GSTRING_LEN] = {
> +#define E1000E_PRIV_FLAGS_S0IX_ENABLED   BIT(0)
> + "s0ix-enabled",
> +};
> +
> +#define E1000E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(e1000e_priv_flags_strings)
> +
>  #define E1000_STAT(str, m) { \
>   .stat_string = str, \
>   .type = E1000_STATS, \
> @@ -1776,6 +1783,8 @@ static int e1000e_get_sset_count(struct net_device 
> __always_unused *netdev,
>   return E1000_TEST_LEN;
>   case ETH_SS_STATS:
>   return E1000_STATS_LEN;
> + case ETH_SS_PRIV_FLAGS:
> + return E1000E_PRIV_FLAGS_STR_LEN;
>   default:
>   return -EOPNOTSUPP;
>   }
> @@ -2097,6 +2106,10 @@ static void e1000_get_strings(struct net_device 
> __always_unused *netdev,
>   p += ETH_GSTRING_LEN;
>   }
>   break;
> + case ETH_SS_PRIV_FLAGS:
> + memcpy(data, e1000e_priv_flags_strings,
> +E1000E_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
> + break;
>   }
>  }
>  
> @@ -2305,6 +2318,37 @@ static int e1000e_get_ts_info(struct net_device 
> *netdev,
>   return 0;
>  }
>  
> +static u32 e1000e_get_priv_flags(struct net_device *netdev)
> +{
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + u32 priv_flags = 0;
> +
> + if (adapter->flags2 & FLAG2_ENABLE_S0IX_FLOWS)
> + priv_flags |= E1000E_PRIV_FLAGS_S0IX_ENABLED;
> +
> + return priv_flags;
> +}
> +
> +static int e1000e_set_priv_flags(struct net_device *netdev, u32 priv_flags)
> +{
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + unsigned int flags2 = adapter->flags2;
> +
> + flags2 &= ~FLAG2_ENABLE_S0IX_FLOWS;
> + if (priv_flags & E1000E_PRIV_FLAGS_S0IX_ENABLED) {
> + struct e1000_hw *hw = &adapter->hw;
> +
> + if (hw->mac.type < e1000_pch_cnp)
> + return -EINVAL;
> + flags2 |= FLAG2_ENABLE_S0IX_FLOWS;
> + }
> +
> + if (flags2 != adapter->flags2)
> + adapter->flags2 = flags2;
> +
> + return 0;
> +}
> +
>  static const struct ethtool_ops e1000_ethtool_ops = {
>   .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS,
>   .get_drvinfo= e1000_get_drvinfo,
> @@ -2336,6 +2380,8 @@ static const struct ethtool_ops e1000_ethtool_ops = {
>   .set_eee= e1000e_set_eee,
>   .get_link_ksettings = e1000_get_link_ksettings,
>   .set_link_ksettings = e1000_set_link_ksettings,
> + .get_priv_flags = e1000e_get_priv_flags,
> + .set_priv_flags = e1000e_set_priv_flags,
>  };
>  
>  void e1000e_set_ethtool_ops(struct net_device *netdev)
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b9800ba2006c..e9b82c209c2d 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6923,7 +6923,6 @@ static __maybe_unused int e1000e_pm_suspend(struct 
> device *dev)
>   struct net_device *netdev = pci_get_drvdata(to_pci_dev(dev));
>   struct e1000_adapter *adapter = netdev_priv(netdev);
>   struct pci_dev *pdev = to_pci_dev(dev);
> - stru
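
[Editorial sketch, not part of the patch] The get/set pair in the quoted
diff translates between the driver-internal flags2 bit and the private
flag exposed through ethtool, and refuses to enable S0ix on hardware older
than Cannon Point. The logic can be tried out in userspace as below. The
two bit values are copied from the patch; struct adapter_sketch, enum
mac_type_sketch and the *_sketch function names are hypothetical stand-ins
for the real driver types.

```c
#include <errno.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Bit values copied from the quoted patch. */
#define FLAG2_ENABLE_S0IX_FLOWS        BIT(15)
#define E1000E_PRIV_FLAGS_S0IX_ENABLED BIT(0)

/* Hypothetical stand-ins for the driver's MAC generation enum and adapter. */
enum mac_type_sketch { MAC_PCH_SPT, MAC_PCH_CNP, MAC_PCH_TGP };

struct adapter_sketch {
	enum mac_type_sketch mac_type;
	uint32_t flags2;
};

/* Report the internal flag as an ethtool private flag. */
static uint32_t get_priv_flags_sketch(const struct adapter_sketch *a)
{
	return (a->flags2 & FLAG2_ENABLE_S0IX_FLOWS) ?
		E1000E_PRIV_FLAGS_S0IX_ENABLED : 0;
}

/* Apply a private-flag request; enabling is rejected on older hardware. */
static int set_priv_flags_sketch(struct adapter_sketch *a, uint32_t priv_flags)
{
	uint32_t flags2 = a->flags2 & ~FLAG2_ENABLE_S0IX_FLOWS;

	if (priv_flags & E1000E_PRIV_FLAGS_S0IX_ENABLED) {
		if (a->mac_type < MAC_PCH_CNP)
			return -EINVAL;
		flags2 |= FLAG2_ENABLE_S0IX_FLOWS;
	}

	a->flags2 = flags2;
	return 0;
}
```

From userspace the flag would presumably be toggled with ethtool's
--set-priv-flags option, using the "s0ix-enabled" string from the patch.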

Re: [PATCH v5 0/4] Improve s0ix flows for systems i219LM

2020-12-15 Thread Hans de Goede

Hi,

On 12/14/20 8:29 PM, Mario Limonciello wrote:
> commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME 
> systems")
> disabled s0ix flows for systems that have various incarnations of the
> i219-LM ethernet controller.  This was done because of some regressions
> caused by an earlier
> commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case")
> with i219-LM controller.
> 
> Per discussion with Intel architecture team this direction should be changed 
> and
> allow S0ix flows to be used by default.  This patch series includes 
> directional
> changes for their conclusions in https://lkml.org/lkml/2020/12/13/15.
> 
> Changes from v4 to v5:
>  - If setting S0ix to enabled in ethtool examine the hardware generation.
>If running on hardware older than Cannon Point return an error.
>  - Increase ULP timeout to 2.5 seconds, but show a warning after 1 second.

Thank you. I've given v5 a test on a Lenovo X1 Carbon 8th gen (AMT capable)
and things work fine there with v5:

Tested-by: Hans de Goede 

Regards,

Hans




> Changes from v3 to v4:
>  - Drop patch 1 for proper s0i3.2 entry, it was separated and is now merged 
> in kernel
>  - Add patch to only run S0ix flows if shutdown succeeded which was suggested 
> in
>thread
>  - Adjust series for guidance from https://lkml.org/lkml/2020/12/13/15
>* Revert i219-LM disallow-list.
>* Drop all patches for systems tested by Dell in an allow list
>* Increase ULP timeout to 1000ms
> Changes from v2 to v3:
>  - Correct some grammar and spelling issues caught by Bjorn H.
>* s/s0ix/S0ix/ in all commit messages
>* Fix a typo in commit message
>* Fix capitalization of proper nouns
>  - Add more pre-release systems that pass
>  - Re-order the series to add systems only at the end of the series
>  - Add Fixes tag to a patch in series.
> 
> Changes from v1 to v2:
>  - Directly incorporate Vitaly's dependency patch in the series
>  - Split out s0ix code into its own file
>  - Adjust from DMI matching to PCI subsystem vendor ID/device matching
>  - Remove module parameter and sysfs, use ethtool flag instead.
>  - Export s0ix flag to ethtool private flags
>  - Include more people and lists directly in this submission chain.
> 
> 
> Mario Limonciello (4):
>   e1000e: Only run S0ix flows if shutdown succeeded
>   e1000e: bump up timeout to wait when ME un-configures ULP mode
>   Revert "e1000e: disable s0ix entry and exit flows for ME systems"
>   e1000e: Export S0ix flags to ethtool
> 
>  drivers/net/ethernet/intel/e1000e/e1000.h   |  1 +
>  drivers/net/ethernet/intel/e1000e/ethtool.c | 46 
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 16 --
>  drivers/net/ethernet/intel/e1000e/netdev.c  | 59 -
>  4 files changed, 70 insertions(+), 52 deletions(-)
> 
> --
> 2.25.1
>

Re: [PATCH v5 2/4] e1000e: bump up timeout to wait when ME un-configures ULP mode

2020-12-15 Thread Hans de Goede

Hi,

On 12/14/20 8:29 PM, Mario Limonciello wrote:
> Per guidance from Intel ethernet architecture team, it may take
> up to 1 second for unconfiguring ULP mode.
> 
> However in practice this seems to be taking up to 2 seconds on
> some Lenovo machines.  Detect scenarios that take more than 1 second
> but less than 2.5 seconds and emit a warning on resume for those
> scenarios.
> 
> Suggested-by: Aaron Ma 
> Suggested-by: Sasha Netfin 
> Suggested-by: Hans de Goede 
> CC: Mark Pearson 
> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
> BugLink: https://bugs.launchpad.net/bugs/1865570
> Link: 
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200323191639.48826-1-aaron...@canonical.com/
> Link: https://lkml.org/lkml/2020/12/13/15
> Link: https://lkml.org/lkml/2020/12/14/708
> Signed-off-by: Mario Limonciello 

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede 

Regards,

Hans

> ---
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index 9aa6fad8ed47..fdf23d20c954 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1240,6 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw 
> *hw, bool force)
>   return 0;
>  
>   if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> + struct e1000_adapter *adapter = hw->adapter;
> + bool firmware_bug = false;
> +
>   if (force) {
>   /* Request ME un-configure ULP mode in the PHY */
>   mac_reg = er32(H2ME);
> @@ -1248,16 +1251,23 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw 
> *hw, bool force)
>   ew32(H2ME, mac_reg);
>   }
>  
> - /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
> + /* Poll up to 2.5 seconds for ME to clear ULP_CFG_DONE.
> +  * If this takes more than 1 second, show a warning indicating 
> a firmware
> +  * bug */
>   while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
> - if (i++ == 30) {
> + if (i++ == 250) {
>   ret_val = -E1000_ERR_PHY;
>   goto out;
>   }
> + if (i > 100 && !firmware_bug)
> + firmware_bug = true;
>  
>   usleep_range(10000, 11000);
>   }
> - e_dbg("ULP_CONFIG_DONE cleared after %dmsec\n", i * 10);
> + if (firmware_bug)
> + e_warn("ULP_CONFIG_DONE took %dmsec.  This is a 
> firmware bug\n", i * 10);
> + else
> + e_dbg("ULP_CONFIG_DONE cleared after %dmsec\n", i * 10);
>  
>   if (force) {
>   mac_reg = er32(H2ME);
>
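
[Editorial sketch, not part of the patch] The timing logic in the hunk
above is: poll in steps of roughly 10 ms, give up after 250 steps (about
2.5 seconds), and remember whether more than 100 steps (about 1 second)
were needed so that a firmware warning can be printed. A userspace model
of that loop is shown below. The busy() callback is a hypothetical
replacement for the er32(FWSM) & E1000_FWSM_ULP_CFG_DONE test, the
countdown helper exists only to drive the example, and no real sleeping is
done.

```c
#include <stdbool.h>

#define POLL_STEP_MS     10   /* one ~10 ms sleep per iteration in the driver */
#define POLL_MAX_ITERS   250  /* give up after roughly 2.5 seconds */
#define POLL_WARN_ITERS  100  /* beyond ~1 second, flag a firmware bug */

struct poll_result_sketch {
	int ret;            /* 0 on success, -1 on timeout */
	int waited_ms;      /* iterations * POLL_STEP_MS, as in the e_dbg/e_warn */
	bool firmware_bug;  /* true if more than POLL_WARN_ITERS were needed */
};

/* Same loop shape as the patched driver code, minus the actual sleep. */
static struct poll_result_sketch poll_ulp_sketch(bool (*busy)(void *), void *ctx)
{
	struct poll_result_sketch r = { 0, 0, false };
	int i = 0;

	while (busy(ctx)) {
		if (i++ == POLL_MAX_ITERS) {
			r.ret = -1;
			break;
		}
		if (i > POLL_WARN_ITERS)
			r.firmware_bug = true;
		/* the driver sleeps about 10 ms here */
	}

	r.waited_ms = i * POLL_STEP_MS;
	return r;
}

/* Example-only helper: reports "busy" for *ctx more polls. */
static bool busy_countdown_sketch(void *ctx)
{
	int *left = ctx;

	return (*left)-- > 0;
}
```

Under this model, the 2 second delay reported for the Lenovo machines
corresponds to about 200 iterations: inside the 250 iteration limit, but
past the 100 iteration threshold that triggers the warning.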

Re: [PATCH v5 3/4] Revert "e1000e: disable s0ix entry and exit flows for ME systems"

2020-12-15 Thread Hans de Goede

Hi,

On 12/14/20 8:29 PM, Mario Limonciello wrote:
> commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME 
> systems")
> disabled s0ix flows for systems that have various incarnations of the
> i219-LM ethernet controller.  This change caused power consumption 
> regressions
> on the following shipping Dell Comet Lake based laptops:
> * Latitude 5310
> * Latitude 5410
> * Latitude 5510
> * Precision 3550
> * Latitude 5411
> * Latitude 5511
> * Precision 3551
> * Precision 7550
> * Precision 7750
> 
> This commit was introduced because of some regressions on certain Thinkpad
> laptops.  Those regressions were potentially caused by an earlier
> commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case"),
> or possibly by a system not meeting platform architectural
> requirements for low power consumption.  Other changes made in the driver
> with extended timeouts are expected to make the driver more impervious to
> platform firmware behavior.
> 
> Fixes: e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME 
> systems")
> Reviewed-by: Alexander Duyck 
> Signed-off-by: Mario Limonciello 

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede 

Regards,

Hans

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 45 +-
>  1 file changed, 2 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 6588f5d4a2be..b9800ba2006c 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -103,45 +103,6 @@ static const struct e1000_reg_info e1000_reg_info_tbl[] 
> = {
>   {0, NULL}
>  };
>  
> -struct e1000e_me_supported {
> - u16 device_id;  /* supported device ID */
> -};
> -
> -static const struct e1000e_me_supported me_supported[] = {
> - {E1000_DEV_ID_PCH_LPT_I217_LM},
> - {E1000_DEV_ID_PCH_LPTLP_I218_LM},
> - {E1000_DEV_ID_PCH_I218_LM2},
> - {E1000_DEV_ID_PCH_I218_LM3},
> - {E1000_DEV_ID_PCH_SPT_I219_LM},
> - {E1000_DEV_ID_PCH_SPT_I219_LM2},
> - {E1000_DEV_ID_PCH_LBG_I219_LM3},
> - {E1000_DEV_ID_PCH_SPT_I219_LM4},
> - {E1000_DEV_ID_PCH_SPT_I219_LM5},
> - {E1000_DEV_ID_PCH_CNP_I219_LM6},
> - {E1000_DEV_ID_PCH_CNP_I219_LM7},
> - {E1000_DEV_ID_PCH_ICP_I219_LM8},
> - {E1000_DEV_ID_PCH_ICP_I219_LM9},
> - {E1000_DEV_ID_PCH_CMP_I219_LM10},
> - {E1000_DEV_ID_PCH_CMP_I219_LM11},
> - {E1000_DEV_ID_PCH_CMP_I219_LM12},
> - {E1000_DEV_ID_PCH_TGP_I219_LM13},
> - {E1000_DEV_ID_PCH_TGP_I219_LM14},
> - {E1000_DEV_ID_PCH_TGP_I219_LM15},
> - {0}
> -};
> -
> -static bool e1000e_check_me(u16 device_id)
> -{
> - struct e1000e_me_supported *id;
> -
> - for (id = (struct e1000e_me_supported *)me_supported;
> -  id->device_id; id++)
> - if (device_id == id->device_id)
> - return true;
> -
> - return false;
> -}
> -
>  /**
>   * __ew32_prepare - prepare to write to MAC CSR register on certain parts
>   * @hw: pointer to the HW structure
> @@ -6974,8 +6935,7 @@ static __maybe_unused int e1000e_pm_suspend(struct 
> device *dev)
>   e1000e_pm_thaw(dev);
>   } else {
>   /* Introduce S0ix implementation */
> - if (hw->mac.type >= e1000_pch_cnp &&
> - !e1000e_check_me(hw->adapter->pdev->device))
> + if (hw->mac.type >= e1000_pch_cnp)
>   e1000e_s0ix_entry_flow(adapter);
>   }
>  
> @@ -6991,8 +6951,7 @@ static __maybe_unused int e1000e_pm_resume(struct 
> device *dev)
>   int rc;
>  
>   /* Introduce S0ix implementation */
> - if (hw->mac.type >= e1000_pch_cnp &&
> - !e1000e_check_me(hw->adapter->pdev->device))
> + if (hw->mac.type >= e1000_pch_cnp)
>   e1000e_s0ix_exit_flow(adapter);
>  
>   rc = __e1000_resume(pdev);
>

Re: [PATCH 1/3] PCI/ASPM: Use the path max in L1 ASPM latency check

2020-12-15 Thread Ian Kumlien

On Tue, Dec 15, 2020 at 1:40 AM Bjorn Helgaas  wrote:
>
> On Mon, Dec 14, 2020 at 11:56:31PM +0100, Ian Kumlien wrote:
> > On Mon, Dec 14, 2020 at 8:19 PM Bjorn Helgaas  wrote:
>
> > > If you're interested, you could probably unload the Realtek drivers,
> > > remove the devices, and set the PCI_EXP_LNKCTL_LD (Link Disable) bit
> > > in 02:04.0, e.g.,
> > >
> > >   # RT=/sys/devices/pci:00/:00:01.2/:01:00.0/:02:04.0
> > >   # echo 1 > $RT/:04:00.0/remove
> > >   # echo 1 > $RT/:04:00.1/remove
> > >   # echo 1 > $RT/:04:00.2/remove
> > >   # echo 1 > $RT/:04:00.4/remove
> > >   # echo 1 > $RT/:04:00.7/remove
> > >   # setpci -s02:04.0 CAP_EXP+0x10.w=0x0010
> > >
> > > That should take 04:00.x out of the picture.
> >
> > Didn't actually change the behaviour, I'm suspecting an errata for AMD 
> > pcie...
> >
> > So did this, with unpatched kernel:
> > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > [  5]   0.00-1.00   sec  4.56 MBytes  38.2 Mbits/sec    0   67.9 KBytes
> > > [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   96.2 KBytes
> > > [  5]   2.00-3.00   sec  4.85 MBytes  40.7 Mbits/sec    0   50.9 KBytes
> > > [  5]   3.00-4.00   sec  4.23 MBytes  35.4 Mbits/sec    0   70.7 KBytes
> > > [  5]   4.00-5.00   sec  4.23 MBytes  35.4 Mbits/sec    0   48.1 KBytes
> > > [  5]   5.00-6.00   sec  4.23 MBytes  35.4 Mbits/sec    0   45.2 KBytes
> > > [  5]   6.00-7.00   sec  4.23 MBytes  35.4 Mbits/sec    0   36.8 KBytes
> > > [  5]   7.00-8.00   sec  3.98 MBytes  33.4 Mbits/sec    0   36.8 KBytes
> > > [  5]   8.00-9.00   sec  4.23 MBytes  35.4 Mbits/sec    0   36.8 KBytes
> > > [  5]   9.00-10.00  sec  4.23 MBytes  35.4 Mbits/sec    0   48.1 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  43.2 MBytes  36.2 Mbits/sec    0   sender
> > > [  5]   0.00-10.00  sec  42.7 MBytes  35.8 Mbits/sec        receiver
> >
> > and:
> > echo 0 > /sys/devices/pci:00/:00:01.2/:01:00.0/link/l1_aspm
>
> BTW, thanks a lot for testing out the "l1_aspm" sysfs file.  I'm very
> pleased that it seems to be working as intended.

It was nice to find it for easy disabling :)

> > and:
> > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > [  5]   0.00-1.00   sec   113 MBytes   951 Mbits/sec  153    772 KBytes
> > > [  5]   1.00-2.00   sec   109 MBytes   912 Mbits/sec  276    550 KBytes
> > > [  5]   2.00-3.00   sec   111 MBytes   933 Mbits/sec  123    625 KBytes
> > > [  5]   3.00-4.00   sec   111 MBytes   933 Mbits/sec   31    687 KBytes
> > > [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec    0    679 KBytes
> > > [  5]   5.00-6.00   sec   110 MBytes   923 Mbits/sec  136    577 KBytes
> > > [  5]   6.00-7.00   sec   110 MBytes   923 Mbits/sec  214    645 KBytes
> > > [  5]   7.00-8.00   sec   110 MBytes   923 Mbits/sec   32    628 KBytes
> > > [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec   81    537 KBytes
> > > [  5]   9.00-10.00  sec   110 MBytes   923 Mbits/sec   10    577 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  1.08 GBytes   927 Mbits/sec  1056  sender
> > > [  5]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec        receiver
> >
> > > But this only confirms that the fix I experience is a side effect.
> >
> > The original code is still wrong :)
>
> What exactly is this machine?  Brand, model, config?  Maybe you could
> add this and a dmesg log to the bugzilla?  It seems like other people
> should be seeing the same problem, so I'm hoping to grub around on the
> web to see if there are similar reports involving these devices.

ASUS Pro WS X570-ACE with AMD Ryzen 9 3900X

> https://bugzilla.kernel.org/show_bug.cgi?id=209725
>
> Here's one that is superficially similar:
> https://linux-hardware.org/index.php?probe=e5f24075e5&log=lspci_all
> in that it has a RP -- switch -- I211 path.  Interestingly, the switch
> here advertises <64us L1 exit latency instead of the <32us latency
> your switch advertises.  Of course, I can't tell if it's exactly the
> same switch.

Same chipset it seems

I'm running bios version:
Version: 2206
Release Date: 08/13/2020

And the latest is:
Version 3003
2020/12/07

Will test upgrading that as well, but it could be that they report the
incorrect latency of the switch - I don't know how many things AGESA
changes but... It's been updated twice since my upgrade.

> Bjorn
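
[Editorial sketch, not part of the thread] The setpci command quoted
earlier in this message writes 0x0010 to offset 0x10 of the PCIe
capability, i.e. the Link Control register, where bit 4 is Link Disable.
The two constants below match the kernel's include/uapi/linux/pci_regs.h;
the helper names are hypothetical. The helpers show the bit manipulation
as a read-modify-write, which keeps the other Link Control bits (for
example the ASPM control field) intact, whereas the quoted command
overwrites the whole 16-bit word.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCTL     0x10    /* Link Control, offset within the PCIe cap */
#define PCI_EXP_LNKCTL_LD  0x0010  /* Link Disable */

/* Return the Link Control value with Link Disable set, other bits kept. */
static uint16_t lnkctl_set_link_disable_sketch(uint16_t lnkctl)
{
	return lnkctl | PCI_EXP_LNKCTL_LD;
}

/* Return the Link Control value with Link Disable cleared again. */
static uint16_t lnkctl_clear_link_disable_sketch(uint16_t lnkctl)
{
	return lnkctl & (uint16_t)~PCI_EXP_LNKCTL_LD;
}

/* Non-zero if the Link Disable bit is set in the given register value. */
static int lnkctl_link_disabled_sketch(uint16_t lnkctl)
{
	return (lnkctl & PCI_EXP_LNKCTL_LD) != 0;
}
```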

Re: Fw: [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems i219LM

2020-12-15 Thread Neftin, Sasha


On 12/14/2020 20:40, Mark Pearson wrote:

Thanks Hans

On 14/12/2020 13:31, Mark Pearson wrote:




*From:* Hans de Goede 
*Sent:* December 14, 2020 13:24
*To:* Mario Limonciello ; Jeff Kirsher
; Tony Nguyen ;
intel-wired-...@lists.osuosl.org ;
David Miller ; Aaron Ma ;
Mark Pearson 
*Cc:* linux-ker...@vger.kernel.org ;
Netdev ; Alexander Duyck
; Jakub Kicinski ; Sasha
Netfin ; Aaron Brown ;
Stefan Assmann ; darc...@redhat.com
; yijun.s...@dell.com ;
perry.y...@dell.com ; anthony.w...@canonical.com

*Subject:* [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems
i219LM
  
Hi All,






###

I've added Mark Pearson from Lenovo to the Cc so that Lenovo
can investigate this issue further.

Mark, this thread is about an issue with enabling S0ix support for
e1000e (i219lm) controllers. This was enabled in the kernel a
while ago, but then got disabled again on vPro / AMT enabled
systems because on some systems (Lenovo X1C7 and now also X1C8)
this led to suspend/resume issues.

When AMT is active then there is a handover handshake for the
OS to get access to the ethernet controller from the ME. The
Intel folks have checked and the Windows driver is using a timeout
of 1 second for this handshake, yet on Lenovo systems this is
taking 2 seconds. This likely has something to do with the
ME firmware on these Lenovo models, can you get the firmware
team at Lenovo to investigate this further ?

Absolutely - I'll ask them to look into this again.

We need to explain why Windows systems require 1 s while Linux systems
need up to 2.5 s; otherwise this is not a reliable approach, and you will
encounter other buggy systems.

(ME not being POR on the Linux systems is only one possible answer.)

We did try to make progress with this previously - but it got a bit
stuck and hence the need for these patches... but I believe things may
have changed a bit so it's worth trying again.

Mark


Sasha

Re: [PATCH v5 bpf-next 13/14] bpf: add new frame_length field to the XDP ctx

2020-12-15 Thread Eelco Chaudron





On 9 Dec 2020, at 13:07, Eelco Chaudron wrote:


On 9 Dec 2020, at 12:10, Maciej Fijalkowski wrote:





+
+   ctx_reg = (si->src_reg == si->dst_reg) ? scratch_reg - 1 :
si->src_reg;
+   while (dst_reg == ctx_reg || scratch_reg == ctx_reg)
+   ctx_reg--;
+
+   /* Save scratch registers */
+   if (ctx_reg != si->src_reg) {
+   *insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, ctx_reg,
+ offsetof(struct xdp_buff,
+  tmp_reg[1]));
+
+   *insn++ = BPF_MOV64_REG(ctx_reg, si->src_reg);
+   }
+
+   *insn++ = BPF_STX_MEM(BPF_DW, ctx_reg, scratch_reg,
+ offsetof(struct xdp_buff, tmp_reg[0]));


Why don't you push regs to stack, use it and then pop it back? That way
I suppose you could avoid polluting xdp_buff with tmp_reg[2].


There is no “real” stack in eBPF, only a read-only frame pointer, and
as we are replacing a single instruction, we have no info on what we
can use as scratch space.


Uhm, what? You use R10 for stack operations. Verifier tracks the stack
depth used by programs and then it is passed down to JIT so that native
asm will create a properly sized stack frame.

Off the top of my head, I would let xdp_convert_ctx_access() know the
current stack depth and use it for R10 stores, so your scratch space
would be R10 + (stack_depth + 8), R10 + (stack_depth + 16).


Other instances do exactly the same, i.e. put some scratch registers 
in the underlying data structure, so I reused this approach. From the 
current information in the callback, I was not able to determine the 
current stack_depth. With "real" stack above, I meant having a 
pop/push like instruction.


I do not know the verifier code well enough, but are you suggesting I 
can get the current stack_depth from the verifier in the 
xdp_convert_ctx_access() callback? If so any pointers?


Maciej any feedback on the above, i.e. getting the stack_depth in 
xdp_convert_ctx_access()?


Problem with that would be the fact that convert_ctx_accesses() happens
to be called after the check_max_stack_depth(), so probably stack_depth
of a prog that has frame_length accesses would have to be adjusted
earlier.


Ack, need to learn more on the verifier part…

Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout

2020-12-15 Thread Cambda Zhu



> On Dec 15, 2020, at 19:06, Eric Dumazet  wrote:
> 
> On Tue, Dec 15, 2020 at 3:08 AM Jakub Kicinski  wrote:
>> 
>> On Sun, 13 Dec 2020 21:59:45 +0800 Cambda Zhu wrote:
>>>> On Dec 13, 2020, at 06:32, Jakub Kicinski  wrote:
>>>> On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
>>>>> For each TCP zero window probe, the icsk_backoff is increased by one and
>>>>> its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
>>>>> probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the
>>>>> shift count would be masked to range 0 to 63. And on ARMv7 the result is
>>>>> zero. If the shift count is masked, only several probes will be sent
>>>>> with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it
>>>>> needs tcp_retries2 times probes to end this false timeout. Besides,
>>>>> bitwise shift greater than or equal to the width is an undefined
>>>>> behavior.
>>>>
>>>> If icsk_backoff can reach 64, can it not also reach 256 and wrap?
>>> 
>>> If tcp_retries2 is set greater than 255, it can be wrapped. But for TCP 
>>> probe0,
>>> it seems to be not a serious problem. The timeout will be icsk_rto and 
>>> backoff
>>> again. And considering icsk_backoff is 8 bits, not only it may always be 
>>> lesser
>>> than tcp_retries2, but also may always be lesser than tcp_orphan_retries. 
>>> And
>>> the icsk_probes_out is 8 bits too. So if the max_probes is greater than 255,
>>> the connection won’t abort even if it’s an orphan sock in some cases.
>>> 
>>> We can change the type of icsk_backoff/icsk_probes_out to fix these 
>>> problems.
>>> But I think maybe the retries greater than 255 have no sense indeed and the 
>>> RFC
>>> only requires the timeout(R2) greater than 100s at least. Could it be 
>>> better to
>>> limit the min/max ranges of their sysctls?
>> 
>> All right, I think the patch is good as is, applied for 5.11, thank you!
> 
> It looks like we can remove the (u64) casts then.
> 
> Also if we _really_ care about icsk_backoff approaching 63, we also
> need to change inet_csk_rto_backoff() ?

Yes, we do. But the socket can close after tcp_orphan_retries probes even
if alive is always true. There are also a few things I'm not clear about
yet:
1) inet_csk_rto_backoff() may not be used only for TCP, and RFC 6298
   requires the maximum RTO to be at least 60 seconds. So what's the
   proper shift limit?
2) If max_probes is greater than 255, icsk_probes_out can never exceed
   max_probes, so the connection may never close. This looks more serious.

> 
> Was your patch based on a real world use, or some fuzzer UBSAN report ?
> 

I did not find this issue on TCP. I'm developing a private protocol, in
which I implemented something like probe0 with a max RTO of less than
120 seconds. I found similar issues in testing and then found that Linux
TCP has the same issues. So it's not a real-world use case for TCP, and
it may be OK to ignore the issues.

> diff --git a/include/net/inet_connection_sock.h
> b/include/net/inet_connection_sock.h
> index 
> 7338b3865a2a3d278dc27c0167bba1b966bbda9f..a2a145e3b062c0230935c293fc1900df095937d4
> 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -242,9 +242,10 @@ static inline unsigned long
> inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
> unsigned long max_when)
> {
> -u64 when = (u64)icsk->icsk_rto << icsk->icsk_backoff;
> +   u8 backoff = min_t(u8, 32U, icsk->icsk_backoff);
> +   u64 when = (u64)icsk->icsk_rto << backoff;
> 
> -return (unsigned long)min_t(u64, when, max_when);
> +   return (unsigned long)min_t(u64, when, max_when);
> }
> 
> struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern);
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 
> 78d13c88720fda50e3f1880ac741cea1985ef3e9..fc6e4d40fd94a717d24ebd8aef7f7930a4551fe9
> 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1328,9 +1328,8 @@ static inline unsigned long
> tcp_probe0_when(const struct sock *sk,
> {
>u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
>   inet_csk(sk)->icsk_backoff);
> -   u64 when = (u64)tcp_probe0_base(sk) << backoff;
> 
> -   return (unsigned long)min_t(u64, when, max_when);
> +   return min(tcp_probe0_base(sk) << backoff, max_when);
> }
> 
> static inline void tcp_check_probe_timer(struct sock *sk)

Re: [PATCH v1 net-next 05/15] nvme-tcp: Add DDP offload control path

2020-12-15 Thread Shai Malin

On 12/14/2020 08:38, Boris Pismenny wrote:
> On 10/12/2020 19:15, Shai Malin wrote:
> > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 
> > c0c33320fe65..ef96e4a02bbd 100644
> > --- a/drivers/nvme/host/tcp.c
> > +++ b/drivers/nvme/host/tcp.c
> > @@ -14,6 +14,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "nvme.h"
> >  #include "fabrics.h"
> > @@ -62,6 +63,7 @@ enum nvme_tcp_queue_flags {
> >   NVME_TCP_Q_ALLOCATED= 0,
> >   NVME_TCP_Q_LIVE = 1,
> >   NVME_TCP_Q_POLLING  = 2,
> > + NVME_TCP_Q_OFFLOADS = 3,
> >  };
> >
> > The same comment from the previous version - we are concerned that perhaps
> > the generic term "offload" for both the transport type (for the Marvell 
> > work)
> > and for the DDP and CRC offload queue (for the Mellanox work) may be
> > misleading and confusing to developers and to users.
> >
> > As suggested by Sagi, we can call this NVME_TCP_Q_DDP.
> >
>
> While I don't mind changing the naming here, I wonder why not call the
> TOE you use TOE and not TCP_OFFLOAD, so that offload is free for this?

Thanks - please do change the name to NVME_TCP_Q_DDP.
The Marvell nvme-tcp-offload patch series introduces offloading of both the
TCP layer and the NVMe/TCP layer, so it is not TOE.

>
> Moreover, the most common use of offload in the kernel is for partial offloads
> like this one, and not for full offloads (such as toe).

Because each vendor might implement a different partial offload, I suggest
naming it after the specific technique that is used, as was suggested:
NVME_TCP_Q_DDP.

[PATCH net 2/2] net: mvpp2: disable force link UP during port init procedure

2020-12-15 Thread stefanc

From: Stefan Chulski 

Force link UP can be enabled by the bootloader during tftpboot,
which breaks NFS support.
Disable force link UP during the port init procedure.

Signed-off-by: Stefan Chulski 
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index d2b0506..0ad3177 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -5479,7 +5479,7 @@ static int mvpp2_port_init(struct mvpp2_port *port)
struct mvpp2 *priv = port->priv;
struct mvpp2_txq_pcpu *txq_pcpu;
unsigned int thread;
-   int queue, err;
+   int queue, err, val;
 
/* Checks for hardware constraints */
if (port->first_rxq + port->nrxqs >
@@ -5493,6 +5493,18 @@ static int mvpp2_port_init(struct mvpp2_port *port)
mvpp2_egress_disable(port);
mvpp2_port_disable(port);
 
+   if (mvpp2_is_xlg(port->phy_interface)) {
+   val = readl(port->base + MVPP22_XLG_CTRL0_REG);
+   val &= ~MVPP22_XLG_CTRL0_FORCE_LINK_PASS;
+   val |= MVPP22_XLG_CTRL0_FORCE_LINK_DOWN;
+   writel(val, port->base + MVPP22_XLG_CTRL0_REG);
+   } else {
+   val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+   val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
+   val |= MVPP2_GMAC_FORCE_LINK_DOWN;
+   writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+   }
+
port->tx_time_coal = MVPP2_TXDONE_COAL_USEC;
 
port->txqs = devm_kcalloc(dev, port->ntxqs, sizeof(*port->txqs),
-- 
1.9.1
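
For readers following along, the read-modify-write step in the hunk above can be tried out in userspace. This is only a sketch: the EX_ bit positions are illustrative placeholders (the real values live in mvpp2.h), and a plain integer stands in for the readl()/writel() pair.

```c
#include <stdint.h>

/* Illustrative bit positions only; not taken from the driver headers. */
#define EX_FORCE_LINK_DOWN	(1u << 0)
#define EX_FORCE_LINK_PASS	(1u << 1)

/* Same sequence as the patch, applied to a plain value: drop "force link
 * pass", set "force link down", and leave every other bit untouched.
 */
static uint32_t ex_disable_force_link_up(uint32_t val)
{
	val &= ~EX_FORCE_LINK_PASS;
	val |= EX_FORCE_LINK_DOWN;
	return val;
}
```

The operation is idempotent, so it is harmless when the bootloader never enabled force link UP in the first place.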

[PATCH net 1/2] net: mvpp2: Fix GoP port 3 Networking Complex Control configurations

2020-12-15 Thread stefanc

From: Stefan Chulski 

While configuring the Networking Complex Control mode of operation for
GoP port 2, the mode of operation of GoP port 3 was wrongly set as well.
Remove that configuration.
Also fix the GENCONF_CTRL0_PORTX naming.

Signed-off-by: Stefan Chulski 
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h  | 6 +++---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 6bd7e40..39c4e5c 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -651,9 +651,9 @@
 #define GENCONF_PORT_CTRL1_EN(p)   BIT(p)
 #define GENCONF_PORT_CTRL1_RESET(p)(BIT(p) << 28)
 #define GENCONF_CTRL0  0x1120
-#define GENCONF_CTRL0_PORT0_RGMII  BIT(0)
-#define GENCONF_CTRL0_PORT1_RGMII_MII  BIT(1)
-#define GENCONF_CTRL0_PORT1_RGMII  BIT(2)
+#define GENCONF_CTRL0_PORT2_RGMII  BIT(0)
+#define GENCONF_CTRL0_PORT3_RGMII_MII  BIT(1)
+#define GENCONF_CTRL0_PORT3_RGMII  BIT(2)
 
 /* Various constants */
 
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index d64dc12..d2b0506 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -1231,9 +1231,9 @@ static void mvpp22_gop_init_rgmii(struct mvpp2_port *port)
 
regmap_read(priv->sysctrl_base, GENCONF_CTRL0, &val);
if (port->gop_id == 2)
-   val |= GENCONF_CTRL0_PORT0_RGMII | GENCONF_CTRL0_PORT1_RGMII;
+   val |= GENCONF_CTRL0_PORT2_RGMII;
else if (port->gop_id == 3)
-   val |= GENCONF_CTRL0_PORT1_RGMII_MII;
+   val |= GENCONF_CTRL0_PORT3_RGMII_MII;
regmap_write(priv->sysctrl_base, GENCONF_CTRL0, val);
 }
 
@@ -1250,9 +1250,9 @@ static void mvpp22_gop_init_sgmii(struct mvpp2_port *port)
if (port->gop_id > 1) {
regmap_read(priv->sysctrl_base, GENCONF_CTRL0, &val);
if (port->gop_id == 2)
-   val &= ~GENCONF_CTRL0_PORT0_RGMII;
+   val &= ~GENCONF_CTRL0_PORT2_RGMII;
else if (port->gop_id == 3)
-   val &= ~GENCONF_CTRL0_PORT1_RGMII_MII;
+   val &= ~GENCONF_CTRL0_PORT3_RGMII_MII;
regmap_write(priv->sysctrl_base, GENCONF_CTRL0, val);
}
 }
-- 
1.9.1
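
The effect of the RGMII hunk can be checked with plain bit arithmetic. The sketch below reuses the macro names and bit numbers from the diff; EX_BIT and the two ex_rgmii_* helpers are hypothetical and only model the "val |= ..." lines, not the regmap access.

```c
#include <stdint.h>

#define EX_BIT(n)			(1u << (n))
/* Names and bit numbers as they appear after the patch. */
#define GENCONF_CTRL0_PORT2_RGMII	EX_BIT(0)
#define GENCONF_CTRL0_PORT3_RGMII_MII	EX_BIT(1)
#define GENCONF_CTRL0_PORT3_RGMII	EX_BIT(2)

/* Before the patch: setting up GoP port 2 also set BIT(2), a port 3 bit. */
static uint32_t ex_rgmii_old(uint32_t val, int gop_id)
{
	if (gop_id == 2)
		val |= EX_BIT(0) | EX_BIT(2);
	else if (gop_id == 3)
		val |= EX_BIT(1);
	return val;
}

/* After the patch: each port only touches its own bit. */
static uint32_t ex_rgmii_new(uint32_t val, int gop_id)
{
	if (gop_id == 2)
		val |= GENCONF_CTRL0_PORT2_RGMII;
	else if (gop_id == 3)
		val |= GENCONF_CTRL0_PORT3_RGMII_MII;
	return val;
}
```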

Re: [PATCH RFC v2 1/5] dt-bindings: net: dwmac-meson: use picoseconds for the RGMII RX delay

2020-12-15 Thread Rob Herring

On Sun, Dec 13, 2020 at 05:59:05PM +0100, Martin Blumenstingl wrote:
> Hi Rob,
> 
> On Mon, Dec 7, 2020 at 8:17 PM Rob Herring  wrote:
> >
> > On Sun, Nov 15, 2020 at 07:52:06PM +0100, Martin Blumenstingl wrote:
> > > Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX
> > > delay register which allows picoseconds precision. Deprecate the old
> > > "amlogic,rx-delay-ns" in favour of a new "amlogic,rgmii-rx-delay-ps"
> > > property.
> > >
> > > For older SoCs the only known supported values were 0ns and 2ns. The new
> > > SoCs have 200ps precision and support RGMII RX delays between 0ps and
> > > 3000ps.
> > >
> > > While here, also update the description of the RX delay to indicate
> > > that:
> > > - with "rgmii" or "rgmii-id" the RX delay should be specified
> > > - with "rgmii-id" or "rgmii-rxid" the RX delay is added by the PHY so
> > >   any configuration on the MAC side is ignored
> > > - with "rmii" the RX delay is not applicable and any configuration is
> > >   ignored
> > >
> > > Signed-off-by: Martin Blumenstingl 
> > > ---
> > >  .../bindings/net/amlogic,meson-dwmac.yaml | 61 +--
> > >  1 file changed, 56 insertions(+), 5 deletions(-)
> >
> > Don't we have common properties for this now?
> I did a quick:
> $ grep -R rx-delay Documentation/devicetree/bindings/net/
> 
> I could find "rx-delay" without vendor prefix, but that's not using
> any unit in the name (ns, ps, ...)
> Please let me know if you are aware of any "generic" property for the RX
> delay in picosecond precision

{rx,tx}-internal-delay-ps in ethernet-controller.yaml and 
ethernet-phy.yaml.
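
As a side note on the range quoted above (200ps steps between 0ps and 3000ps on the newer SoCs), a driver-side sanity check could look roughly like the following. This is a sketch under that stated range only; ex_rx_delay_ps_valid is a hypothetical helper, not something from the dwmac-meson driver.

```c
#include <stdbool.h>

#define EX_RX_DELAY_STEP_PS	200u
#define EX_RX_DELAY_MAX_PS	3000u

/* Accept only delays the hardware can express: a multiple of the 200ps
 * step and not above the 3000ps maximum.
 */
static bool ex_rx_delay_ps_valid(unsigned int delay_ps)
{
	return delay_ps <= EX_RX_DELAY_MAX_PS &&
	       delay_ps % EX_RX_DELAY_STEP_PS == 0;
}
```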

Re: [PATCH v3 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-15 Thread Lorenzo Bianconi

[...]
> > xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
> > b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > index 4dbbbd49c389..fcd1ca3343fb 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > @@ -2393,12 +2393,12 @@ static int i40e_clean_rx_irq(struct i40e_ring 
> > *rx_ring, int budget)
> >  
> > /* retrieve a buffer from the ring */
> > if (!skb) {
> > -   xdp.data = page_address(rx_buffer->page) +
> > -  rx_buffer->page_offset;
> > -   xdp.data_meta = xdp.data;
> > -   xdp.data_hard_start = xdp.data -
> > - i40e_rx_offset(rx_ring);
> > -   xdp.data_end = xdp.data + size;
> > +   unsigned int offset = i40e_rx_offset(rx_ring);
> 
> I now see that we could call the i40e_rx_offset() once per napi, so can
> you pull this variable out and have it initialized a single time? Applies
> to other intel drivers as well.

ack, fine. I will fix in v4.

Regards,
Lorenzo

> 
> I also feel like it's sub-optimal for drivers that are calculating the
> data_hard_start out of data (intel, bnxt, sfc and mlx4 have this approach)
> due to additional add, but I don't have a solution for that. Would be
> weird to have another helper. Not sure what other people think, but I have
> in mind a "death by 1000 cuts" phrase :)
> 
> > +   unsigned char *hard_start;
> > +
> > +   hard_start = page_address(rx_buffer->page) +
> > +rx_buffer->page_offset - offset;
> > +   xdp_prepare_buff(&xdp, hard_start, offset, size);
> >  #if (PAGE_SIZE > 4096)
> > /* At larger PAGE_SIZE, frame_sz depend on len size */
> > xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c 
> > b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > index d52d98d56367..a7a00060f520 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > @@ -1094,8 +1094,9 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int 
> > budget)
> > while (likely(total_rx_pkts < (unsigned int)budget)) {
> > union ice_32b_rx_flex_desc *rx_desc;
> > struct ice_rx_buf *rx_buf;
> > +   unsigned int size, offset;
> > +   unsigned char *hard_start;
> > struct sk_buff *skb;
> > -   unsigned int size;
> > u16 stat_err_bits;
> > u16 vlan_tag = 0;
> > u8 rx_ptype;
> > @@ -1138,10 +1139,10 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int 
> > budget)
> > goto construct_skb;
> > }
> >  
> > -   xdp.data = page_address(rx_buf->page) + rx_buf->page_offset;
> > -   xdp.data_hard_start = xdp.data - ice_rx_offset(rx_ring);
> > -   xdp.data_meta = xdp.data;
> > -   xdp.data_end = xdp.data + size;
> > +   offset = ice_rx_offset(rx_ring);
> > +   hard_start = page_address(rx_buf->page) + rx_buf->page_offset -
> > +offset;
> > +   xdp_prepare_buff(&xdp, hard_start, offset, size);
> >  #if (PAGE_SIZE > 4096)
> > /* At larger PAGE_SIZE, frame_sz depend on len size */
> > xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 365dfc0e3b65..070b2bb4e9ca 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -8700,12 +8700,12 @@ static int igb_clean_rx_irq(struct igb_q_vector 
> > *q_vector, const int budget)
> >  
> > /* retrieve a buffer from the ring */
> > if (!skb) {
> > -   xdp.data = page_address(rx_buffer->page) +
> > -  rx_buffer->page_offset;
> > -   xdp.data_meta = xdp.data;
> > -   xdp.data_hard_start = xdp.data -
> > - igb_rx_offset(rx_ring);
> > -   xdp.data_end = xdp.data + size;
> > +   unsigned int offset = igb_rx_offset(rx_ring);
> > +   unsigned char *hard_start;
> > +
> > +   hard_start = page_address(rx_buffer->page) +
> > +rx_buffer->page_offset - offset;
> > +   xdp_prepare_buff(&xdp, hard_start, offset, size);
> >  #if (PAGE_SIZE > 4096)
> > /* At larger PAGE_SIZE, frame_sz depend on len size */
> > xdp.frame_sz = igb_rx_frame_truesize(rx_ring, size);
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> >
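
For reference, the pointer arithmetic the Intel hunks above converge on can be modelled in userspace. This is a reduced sketch: struct ex_xdp_buff carries only the three pointers discussed here, and both helper names are hypothetical stand-ins for xdp_prepare_buff() and the per-driver call site.

```c
/* Reduced stand-in for struct xdp_buff. */
struct ex_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/* Derive data and data_end from the start of the buffer, the headroom
 * and the frame length.
 */
static void ex_prepare_buff(struct ex_xdp_buff *xdp,
			    unsigned char *hard_start,
			    int headroom, int data_len)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
}

/* Driver-side pattern: the rx buffer points at the packet data, so the
 * ring's rx offset is subtracted to find hard_start and then passed back
 * in as the headroom. rx_offset is per ring, so a caller can compute it
 * once per poll instead of once per frame.
 */
static void ex_fill_from_rx_buffer(struct ex_xdp_buff *xdp,
				   unsigned char *page_addr,
				   unsigned int page_offset,
				   unsigned int rx_offset,
				   unsigned int size)
{
	unsigned char *hard_start = page_addr + page_offset - rx_offset;

	ex_prepare_buff(xdp, hard_start, (int)rx_offset, (int)size);
}
```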

Re: [RFC] net: stmmac: Problem with adding the native GPIOs support

2020-12-15 Thread Andrew Lunn

> > > Anyway the hardware setup depicted above doesn't seem
> > > problematic at the first glance, but in fact it is. See, the DW *MAC 
> > > driver
> > > (STMMAC ethernet driver) is doing the MAC reset each time it performs the
> > > device open or resume by means of the call-chain:
> > > 
> > >   stmmac_open()---+
> > >                   +->stmmac_hw_setup()->stmmac_init_dma_engine()->stmmac_reset().
> > >   stmmac_resume()-+
> > > 
> > > Such reset causes the whole interface reset: MAC, DMA and, what is more
> > > important, GPIOs as being exposed as part of the MAC registers. That
> > > in our case automatically causes the external PHY reset, which neither
> > > the STMMAC driver nor the PHY subsystem expect at all.
> > 
> 
> > Is the reset of the GPIO sub block under software control? When you
> > have a GPIO controller implemented, you would want to disable this.
> 
> Not sure I've fully understood your question. The GPIO sub-block of
> the MAC is getting reset together with the MAC.

And my question is, is that under software control, or is the hardware
synthesised so that the GPIO controller is reset as part of the MAC
reset?

From what you are saying, it sounds like from software you cannot
independently control the GPIO controller reset?

This is something I would be asking the hardware people. Look at the
VHDL, etc.

  Andrew

RE: [PATCH v1 net-next 07/15] nvme-tcp : Recalculate crc in the end of the capsule

2020-12-15 Thread Shai Malin



> crc offload of the nvme capsule. Check if all the skb bits are on, and if not
> recalculate the crc in SW and check it.
> 
> This patch reworks the receive-side crc calculation to always run at the end,
> so as to keep a single flow for both offload and non-offload. This change
> simplifies the code, but it may degrade performance for non-offload crc
> calculation.
> 
> Signed-off-by: Boris Pismenny 
> Signed-off-by: Ben Ben-Ishay 
> Signed-off-by: Or Gerlitz 
> Signed-off-by: Yoray Zack 
> ---
>  drivers/nvme/host/tcp.c | 111 
> 
>  1 file changed, 91 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index
> 534fd5c00f33..3c10c8876036 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -69,6 +69,7 @@ enum nvme_tcp_queue_flags {
>   NVME_TCP_Q_LIVE = 1,
>   NVME_TCP_Q_POLLING  = 2,
>   NVME_TCP_Q_OFFLOADS = 3,
> + NVME_TCP_Q_OFF_CRC_RX   = 4,

Because only the data digest is offloaded, and not the header digest, 
in order to avoid confusion, I suggest replacing the term 
NVME_TCP_Q_OFF_CRC_RX with NVME_TCP_Q_OFF_DDGST_RX.

>  };

Re: [PATCH 04/25] dt-bindings: net: dwmac: Refactor snps,*-config properties

2020-12-15 Thread Rob Herring

On Tue, Dec 15, 2020 at 2:54 AM Serge Semin
 wrote:
>
> Hello Rob,
>
> On Mon, Dec 14, 2020 at 08:30:06AM -0600, Rob Herring wrote:
> > On Mon, Dec 14, 2020 at 12:15:54PM +0300, Serge Semin wrote:
> > > Currently the "snps,axi-config", "snps,mtl-rx-config" and
> > > "snps,mtl-tx-config" properties are declared as a single phandle reference
> > > to a node with corresponding parameters defined. That's not good for
> > > several reasons. First of all scattering around a device tree some
> > > particular device-specific configs with no visual relation to that device
> > > isn't suitable from maintainability point of view. That leads to a
> > > disturbed representation of the actual device tree mixing actual device
> > > nodes and some vendor-specific configs. Secondly using the same configs
> > > set for several device nodes doesn't represent well the devices structure,
> > > since the interfaces these configs describe in hardware belong to
> > > different devices and may actually differ. In the latter case having the
> > > configs node separated from the corresponding device nodes is even
> > > less justified.
> > >
> > > So instead of having a separate DW *MAC configs nodes we suggest to
> > > define them as sub-nodes of the device nodes, which interfaces they
> > > actually describe. By doing so we'll make the DW *MAC nodes visually
> > > correct describing all the aspects of the IP-core configuration. Thus
> > > we'll be able to describe the configs sub-nodes bindings right in the
> > > snps,dwmac.yaml file.
> > >
> > > Note the former "snps,axi-config", "snps,mtl-rx-config" and
> > > "snps,mtl-tx-config" bindings have been marked as deprecated.
> > >
> > > Signed-off-by: Serge Semin 
> > >
> > > ---
> > >
> > > Note the current DT schema tool requires the vendor-specific properties
> > > to be defined in accordance with the schema:
> > > dtschema/meta-schemas/vendor-props.yaml
> > > It means the property can be:
> > > - boolean,
> > > - string,
> > > - defined with $ref and additional constraints,
> > > - defined with allOf: [ $ref ] and additional constraints.
> > >
> > > The modification provided by this commit needs to extend that definition
> > > to make the DT schema tool correctly parse this schema. That is, we need
> > > to let the vendor-specific properties also accept the oneOf-based
> > > combined sub-schema. Like this:
> > >
> > > --- a/dtschema/meta-schemas/vendor-props.yaml
> > > +++ b/dtschema/meta-schemas/vendor-props.yaml
> > > @@ -48,15 +48,24 @@
> > >- properties:   # A property with a type and additional constraints
> > >$ref:
> > >  pattern: "types.yaml#[\/]{0,1}definitions\/.*"
> > > -  allOf:
> > > -items:
> > > -  - properties:
> > > +
> > > +if:
> > > +  not:
> > > +required:
> > > +  - $ref
> > > +then:
> > > +  patternProperties:
> > > +"^(all|one)Of$":
> > > +  contains:
> > > +properties:
> > >$ref:
> > >  pattern: "types.yaml#[\/]{0,1}definitions\/.*"
> > >  required:
> > >- $ref
> > > -oneOf:
> > > +
> > > +anyOf:
> > >- required: [ $ref ]
> > >- required: [ allOf ]
> > > +  - required: [ oneOf ]
> > >
> > >  ...
> > > ---
> > >  .../devicetree/bindings/net/snps,dwmac.yaml   | 380 +-
> > >  1 file changed, 288 insertions(+), 92 deletions(-)
> > >
> > > diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml 
> > > b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
> > > index 0dd543c6c08e..44aa88151cba 100644
> > > --- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
> > > +++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
> > > @@ -150,69 +150,251 @@ properties:
> > >in a different mode than the PHY in order to function.
> > >
> > >snps,axi-config:
> > > -$ref: /schemas/types.yaml#definitions/phandle
> > > -description:
> > > -  AXI BUS Mode parameters. Phandle to a node that can contain the
> > > -  following properties
> > > -* snps,lpi_en, enable Low Power Interface
> > > -* snps,xit_frm, unlock on WoL
> > > -* snps,wr_osr_lmt, max write outstanding req. limit
> > > -* snps,rd_osr_lmt, max read outstanding req. limit
> > > -* snps,kbbe, do not cross 1KiB boundary.
> > > -* snps,blen, this is a vector of supported burst length.
> > > -* snps,fb, fixed-burst
> > > -* snps,mb, mixed-burst
> > > -* snps,rb, rebuild INCRx Burst
> > > +description: AXI BUS Mode parameters
> > > +oneOf:
> > > +  - deprecated: true
> > > +$ref: /schemas/types.yaml#definitions/phandle
> > > +  - type: object
> > > +properties:
> >
>
> > Anywhere have have the same node/property string meaning 2 different
> > thin

[PATCH] atm: ambassador: remove h from printk format specifier

2020-12-15 Thread trix

From: Tom Rix 

See Documentation/core-api/printk-formats.rst.
h should no longer be used in the format specifier for printk.

Signed-off-by: Tom Rix 
---
 drivers/atm/ambassador.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c
index c039b8a4fefe..6b0fff8c0141 100644
--- a/drivers/atm/ambassador.c
+++ b/drivers/atm/ambassador.c
@@ -2169,7 +2169,7 @@ static void setup_pci_dev(struct pci_dev *pci_dev)
pci_lat = (lat < MIN_PCI_LATENCY) ? MIN_PCI_LATENCY : lat;
 
if (lat != pci_lat) {
-   PRINTK (KERN_INFO, "Changing PCI latency timer from %hu to %hu",
+   PRINTK (KERN_INFO, "Changing PCI latency timer from %u to %u",
lat, pci_lat);
pci_write_config_byte(pci_dev, PCI_LATENCY_TIMER, pci_lat);
}
@@ -2300,7 +2300,7 @@ static void __init amb_check_args (void) {
   unsigned int max_rx_size;
   
 #ifdef DEBUG_AMBASSADOR
-  PRINTK (KERN_NOTICE, "debug bitmap is %hx", debug &= DBG_MASK);
+  PRINTK (KERN_NOTICE, "debug bitmap is %x", debug &= DBG_MASK);
 #else
   if (debug)
 PRINTK (KERN_NOTICE, "no debugging support");
-- 
2.27.0
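
The reason %u is enough here is the C default argument promotions: an argument narrower than int is widened to int before it reaches a variadic function, so the h modifier adds nothing. A userspace sketch with snprintf() standing in for PRINTK (ex_format_latency is a hypothetical helper):

```c
#include <stdio.h>
#include <string.h>

/* Formats two 8-bit latency values with plain %u, as the patched PRINTK
 * call does. Both arguments are promoted to int when passed through "...".
 */
static int ex_format_latency(char *buf, size_t len,
			     unsigned char lat, unsigned char pci_lat)
{
	return snprintf(buf, len,
			"Changing PCI latency timer from %u to %u",
			lat, pci_lat);
}
```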

[PATCH] atm: horizon: remove h from printk format specifier

2020-12-15 Thread trix

From: Tom Rix 

See Documentation/core-api/printk-formats.rst.
h should no longer be used in the format specifier for printk.

Signed-off-by: Tom Rix 
---
 drivers/atm/horizon.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/atm/horizon.c b/drivers/atm/horizon.c
index 4f2951cbe69c..e110c305fc9c 100644
--- a/drivers/atm/horizon.c
+++ b/drivers/atm/horizon.c
@@ -1609,7 +1609,7 @@ static int hrz_send (struct atm_vcc * atm_vcc, struct 
sk_buff * skb) {
 if (*s++ == 'D') {
for (i = 0; i < 4; ++i)
d = (d << 4) | hex_to_bin(*s++);
-  PRINTK (KERN_INFO, "debug bitmap is now %hx", debug = d);
+  PRINTK (KERN_INFO, "debug bitmap is now %x", debug = d);
 }
   }
 #endif
@@ -2675,7 +2675,7 @@ static int hrz_probe(struct pci_dev *pci_dev,
   "changing", lat, pci_lat);
pci_write_config_byte(pci_dev, PCI_LATENCY_TIMER, pci_lat);
} else if (lat < MIN_PCI_LATENCY) {
-   PRINTK(KERN_INFO, "%s PCI latency timer from %hu to %hu",
+   PRINTK(KERN_INFO, "%s PCI latency timer from %u to %u",
   "increasing", lat, MIN_PCI_LATENCY);
pci_write_config_byte(pci_dev, PCI_LATENCY_TIMER, 
MIN_PCI_LATENCY);
}
@@ -2777,7 +2777,7 @@ static void hrz_remove_one(struct pci_dev *pci_dev)
 
 static void __init hrz_check_args (void) {
 #ifdef DEBUG_HORIZON
-  PRINTK (KERN_NOTICE, "debug bitmap is %hx", debug &= DBG_MASK);
+  PRINTK (KERN_NOTICE, "debug bitmap is %x", debug &= DBG_MASK);
 #else
   if (debug)
 PRINTK (KERN_NOTICE, "no debug support in this image");
-- 
2.27.0

Re: [RFC PATCH net-next 01/16] net: mscc: ocelot: offload bridge port flags to device

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:47+0200, Vladimir Oltean wrote:
> We should not be unconditionally enabling address learning, since doing
> that is actively detrimental when a port is standalone and not offloading
> a bridge. Namely, if a port in the switch is standalone and others are
> offloading the bridge, then we could enter a situation where we learn an
> address towards the standalone port, but the bridged ports could not
> forward the packet there, because the CPU is the only path between the
> standalone and the bridged ports. The solution of course is to not
> enable address learning unless the bridge asks for it. Currently this is
> the only bridge port flag we are looking at. The others (flooding etc)
> are TBD.
> 
> Signed-off-by: Vladimir Oltean 
Reviewed-by: Alexandre Belloni 
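
For anyone skimming the thread, the rule described above boils down to the following decision. This is a userspace sketch with made-up names and flag values (EX_BR_LEARNING, enum ex_stp_state); it only models the switch statement touched by the diff below, not the register write.

```c
#include <stdbool.h>

/* Illustrative flag value; the real BR_LEARNING is defined by the kernel. */
#define EX_BR_LEARNING	(1ul << 0)

enum ex_stp_state { EX_BLOCKING, EX_LEARNING, EX_FORWARDING };

/* Learning is enabled only when the STP state allows it AND the bridge
 * asked for it through the port flags. A standalone port has its flags
 * cleared, so it never learns.
 */
static bool ex_learn_enabled(enum ex_stp_state state,
			     unsigned long brport_flags)
{
	switch (state) {
	case EX_FORWARDING:
	case EX_LEARNING:
		return (brport_flags & EX_BR_LEARNING) != 0;
	default:
		return false;
	}
}
```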

> ---
>  drivers/net/ethernet/mscc/ocelot.c | 21 -
>  drivers/net/ethernet/mscc/ocelot.h |  3 +++
>  drivers/net/ethernet/mscc/ocelot_net.c |  4 
>  include/soc/mscc/ocelot.h  |  2 ++
>  4 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot.c 
> b/drivers/net/ethernet/mscc/ocelot.c
> index b9626eec8db6..7a5c534099d3 100644
> --- a/drivers/net/ethernet/mscc/ocelot.c
> +++ b/drivers/net/ethernet/mscc/ocelot.c
> @@ -883,6 +883,7 @@ EXPORT_SYMBOL(ocelot_get_ts_info);
>  
>  void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
>  {
> + struct ocelot_port *ocelot_port = ocelot->ports[port];
>   u32 port_cfg;
>   int p, i;
>  
> @@ -896,7 +897,8 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, 
> int port, u8 state)
>   ocelot->bridge_fwd_mask |= BIT(port);
>   fallthrough;
>   case BR_STATE_LEARNING:
> - port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
> + if (ocelot_port->brport_flags & BR_LEARNING)
> + port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
>   break;
>  
>   default:
> @@ -1178,6 +1180,7 @@ EXPORT_SYMBOL(ocelot_port_bridge_join);
>  int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
>struct net_device *bridge)
>  {
> + struct ocelot_port *ocelot_port = ocelot->ports[port];
>   struct ocelot_vlan pvid = {0}, native_vlan = {0};
>   struct switchdev_trans trans;
>   int ret;
> @@ -1200,6 +1203,10 @@ int ocelot_port_bridge_leave(struct ocelot *ocelot, 
> int port,
>   ocelot_port_set_pvid(ocelot, port, pvid);
>   ocelot_port_set_native_vlan(ocelot, port, native_vlan);
>  
> + ocelot_port->brport_flags = 0;
> + ocelot_rmw_gix(ocelot, 0, ANA_PORT_PORT_CFG_LEARN_ENA,
> +ANA_PORT_PORT_CFG, port);
> +
>   return 0;
>  }
>  EXPORT_SYMBOL(ocelot_port_bridge_leave);
> @@ -1391,6 +1398,18 @@ int ocelot_get_max_mtu(struct ocelot *ocelot, int port)
>  }
>  EXPORT_SYMBOL(ocelot_get_max_mtu);
>  
> +void ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
> +   unsigned long flags,
> +   struct switchdev_trans *trans)
> +{
> + struct ocelot_port *ocelot_port = ocelot->ports[port];
> +
> + if (switchdev_trans_ph_prepare(trans))
> + return;
> +
> + ocelot_port->brport_flags = flags;
> +}
> +
>  void ocelot_init_port(struct ocelot *ocelot, int port)
>  {
>   struct ocelot_port *ocelot_port = ocelot->ports[port];
> diff --git a/drivers/net/ethernet/mscc/ocelot.h 
> b/drivers/net/ethernet/mscc/ocelot.h
> index 291d39d49c4e..739bd201d951 100644
> --- a/drivers/net/ethernet/mscc/ocelot.h
> +++ b/drivers/net/ethernet/mscc/ocelot.h
> @@ -102,6 +102,9 @@ struct ocelot_multicast {
>   struct ocelot_pgid *pgid;
>  };
>  
> +void ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
> +   unsigned long flags,
> +   struct switchdev_trans *trans);
>  int ocelot_port_fdb_do_dump(const unsigned char *addr, u16 vid,
>   bool is_static, void *data);
>  int ocelot_mact_learn(struct ocelot *ocelot, int port,
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
> b/drivers/net/ethernet/mscc/ocelot_net.c
> index 9ba7e2b166e9..93ecd5274156 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -882,6 +882,10 @@ static int ocelot_port_attr_set(struct net_device *dev,
>   case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
>   ocelot_port_attr_mc_set(ocelot, port, !attr->u.mc_disabled);
>   break;
> + case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
> + ocelot_port_bridge_flags(ocelot, port, attr->u.brport_flags,
> +  trans);
> + break;
>   default:
>   err = -EOPNOTSUPP;
>   break;
> diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> index 2f4cd3288bcc..50514c087231 100644
> --- a/include/soc/mscc/ocelot.h
> +++ b/include/soc/mscc/ocel

Re: [PATCH][next] netfilter: nftables: fix incorrect increment of loop counter

2020-12-15 Thread Pablo Neira Ayuso

Hi,

On Mon, Dec 14, 2020 at 11:40:15PM +, Colin King wrote:
> From: Colin Ian King 
> 
> The intention of the err_expr cleanup path is to iterate over the
> allocated expr_array objects and free them, starting from i - 1 and
> working down to the start of the array. Currently the loop counter
> is being incremented instead of decremented and also the index i is
> being used instead of k, repeatedly destroying the same expr_array
> element.  Fix this by decrementing k and using k as the index into
> expr_array.
> 
> Addresses-Coverity: ("Infinite loop")
> Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions 
> support")
> Signed-off-by: Colin Ian King 

Reviewed-by: Pablo Neira Ayuso 

@Jakub: Would you please take this one into net-next? Thanks!

> ---
>  net/netfilter/nf_tables_api.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 8d5aa0ac45f4..4186b1e52d58 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -5254,8 +5254,8 @@ static int nft_set_elem_expr_clone(const struct nft_ctx 
> *ctx,
>   return 0;
>  
>  err_expr:
> - for (k = i - 1; k >= 0; k++)
> - nft_expr_destroy(ctx, expr_array[i]);
> + for (k = i - 1; k >= 0; k--)
> + nft_expr_destroy(ctx, expr_array[k]);
>  
>   return -ENOMEM;
>  }
> -- 
> 2.29.2
>
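
The corrected unwind loop is easy to verify in isolation. In the sketch below, ex_unwind is a hypothetical stand-in for the err_expr path: instead of calling nft_expr_destroy() it counts how often each slot would be released.

```c
/* Walks slots [0, i) from the top down, as the fixed loop does.
 * destroyed[k] is incremented once for every release of slot k; the
 * return value is the number of loop iterations.
 */
static int ex_unwind(int i, int *destroyed)
{
	int k, calls = 0;

	for (k = i - 1; k >= 0; k--) {
		destroyed[k]++;
		calls++;
	}
	return calls;
}
```

Each allocated slot is released exactly once and slot i itself, which was never allocated, is left alone. With the original k++ and expr_array[i], the loop would not terminate as intended and would keep destroying the same element.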

Re: [RFC PATCH net-next 02/16] net: mscc: ocelot: allow offloading of bridge on top of LAG

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:48+0200, Vladimir Oltean wrote:
> Commit 7afb3e575e5a ("net: mscc: ocelot: don't handle netdev events for
> other netdevs") was too aggressive, and it made ocelot_netdevice_event
> react only to network interface events emitted for the ocelot switch
> ports.
> 
> In fact, only the PRECHANGEUPPER should have had that check.
> 
> When we ignore all events that are not for us, we miss the fact that the
> upper of the LAG changes, and the bonding interface gets enslaved to a
> bridge. This is an operation we could offload under certain conditions.
> 
> Signed-off-by: Vladimir Oltean 
Reviewed-by: Alexandre Belloni 

> ---
>  drivers/net/ethernet/mscc/ocelot_net.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
> b/drivers/net/ethernet/mscc/ocelot_net.c
> index 93ecd5274156..6fb2a813e694 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -1047,10 +1047,8 @@ static int ocelot_netdevice_event(struct 
> notifier_block *unused,
>   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>   int ret = 0;
>  
> - if (!ocelot_netdevice_dev_check(dev))
> - return 0;
> -
>   if (event == NETDEV_PRECHANGEUPPER &&
> + ocelot_netdevice_dev_check(dev) &&
>   netif_is_lag_master(info->upper_dev)) {
>   struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
>   struct netlink_ext_ack *extack;
> -- 
> 2.25.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[PATCH] net/mlx5: fix spelling mistake in Kconfig "accelaration" -> "acceleration"

2020-12-15 Thread Colin King

From: Colin Ian King 

There are some spelling mistakes in the Kconfig. Fix these.

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 6e4d7bb7fea2..bcdee9dc4aa2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -149,14 +149,14 @@ config MLX5_IPSEC
IPsec support for the Connect-X family.
 
 config MLX5_EN_IPSEC
-   bool "IPSec XFRM cryptography-offload accelaration"
+   bool "IPSec XFRM cryptography-offload acceleration"
depends on MLX5_CORE_EN
depends on XFRM_OFFLOAD
depends on INET_ESP_OFFLOAD || INET6_ESP_OFFLOAD
depends on MLX5_FPGA_IPSEC || MLX5_IPSEC
default n
help
- Build support for IPsec cryptography-offload accelaration in the NIC.
+ Build support for IPsec cryptography-offload acceleration in the NIC.
  Note: Support for hardware with this capability needs to be selected
  for this option to become available.
 
@@ -192,7 +192,7 @@ config MLX5_TLS
 config MLX5_EN_TLS
bool
help
-   Build support for TLS cryptography-offload accelaration in the NIC.
+   Build support for TLS cryptography-offload acceleration in the NIC.
Note: Support for hardware with this capability needs to be selected
for this option to become available.
 
-- 
2.29.2

Re: [PATCH v3 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-15 Thread Daniel Borkmann


On 12/15/20 2:47 PM, Lorenzo Bianconi wrote:
[...]

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 329397c60d84..61d3f5f8b7f3 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -866,10 +866,8 @@ static u32 xennet_run_xdp(struct netfront_queue *queue, struct page *pdata,
  
  	xdp_init_buff(xdp, XEN_PAGE_SIZE - XDP_PACKET_HEADROOM,
  		      &queue->xdp_rxq);
-   xdp->data_hard_start = page_address(pdata);
-   xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
+   xdp_prepare_buff(xdp, page_address(pdata), XDP_PACKET_HEADROOM, len);
xdp_set_data_meta_invalid(xdp);
-   xdp->data_end = xdp->data + len;
  
  	act = bpf_prog_run_xdp(prog, xdp);

switch (act) {
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3fb3a9aa1b71..66d8a4b317a3 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -83,6 +83,18 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
xdp->rxq = rxq;
  }
  
+static inline void


nit: maybe __always_inline


+xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
+int headroom, int data_len)
+{
+   unsigned char *data = hard_start + headroom;
+
+   xdp->data_hard_start = hard_start;
+   xdp->data = data;
+   xdp->data_end = data + data_len;
+   xdp->data_meta = data;
+}
+
  /* Reserve memory area at end-of data area.
   *


For the drivers with xdp_set_data_meta_invalid(), we're basically setting
xdp->data_meta twice unless compiler is smart enough to optimize the first
one away (did you double check?). Given this is supposed to be a cleanup,
why not integrate this logic as well so the xdp_set_data_meta_invalid()
doesn't get extra treatment?

Thanks,
Daniel

RE: [PATCH v5 4/4] e1000e: Export S0ix flags to ethtool

2020-12-15 Thread Shen, Yijun

> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Tuesday, December 15, 2020 3:30 AM
> To: Jeff Kirsher; Tony Nguyen; intel-wired-...@lists.osuosl.org
> Cc: linux-ker...@vger.kernel.org; Netdev; Alexander Duyck; Jakub Kicinski;
> Sasha Netfin; Aaron Brown; Stefan Assmann; David Miller;
> darc...@redhat.com; Shen, Yijun; Yuan, Perry;
> anthony.w...@canonical.com; Hans de Goede; Limonciello, Mario
> Subject: [PATCH v5 4/4] e1000e: Export S0ix flags to ethtool
> 
> This flag can be used by an end user to disable S0ix flows on a buggy system
> or by an OEM for development purposes.
> 
> If you need this flag to be persisted across reboots, it's suggested to use a
> udev rule to adjust it until the kernel can have your configuration in a
> disallow list.
> 
> Signed-off-by: Mario Limonciello 

Verified this patch series on Dell systems.

Tested-by: Yijun Shen 

> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h   |  1 +
>  drivers/net/ethernet/intel/e1000e/ethtool.c | 46 +
> drivers/net/ethernet/intel/e1000e/netdev.c  |  9 ++--
>  3 files changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h
> b/drivers/net/ethernet/intel/e1000e/e1000.h
> index ba7a0f8f6937..5b2143f4b1f8 100644
> --- a/drivers/net/ethernet/intel/e1000e/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -436,6 +436,7 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
> *adapter, u32 *timinca);
>  #define FLAG2_DFLT_CRC_STRIPPING  BIT(12)
>  #define FLAG2_CHECK_RX_HWTSTAMP   BIT(13)
>  #define FLAG2_CHECK_SYSTIM_OVERFLOW   BIT(14)
> +#define FLAG2_ENABLE_S0IX_FLOWS   BIT(15)
> 
>  #define E1000_RX_DESC_PS(R, i)   \
>   (&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
> diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
> index 03215b0aee4b..06442e6bef73 100644
> --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> @@ -23,6 +23,13 @@ struct e1000_stats {
>   int stat_offset;
>  };
> 
> +static const char e1000e_priv_flags_strings[][ETH_GSTRING_LEN] = {
> +#define E1000E_PRIV_FLAGS_S0IX_ENABLED   BIT(0)
> + "s0ix-enabled",
> +};
> +
> +#define E1000E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(e1000e_priv_flags_strings)
> +
>  #define E1000_STAT(str, m) { \
>   .stat_string = str, \
>   .type = E1000_STATS, \
> @@ -1776,6 +1783,8 @@ static int e1000e_get_sset_count(struct net_device
> __always_unused *netdev,
>   return E1000_TEST_LEN;
>   case ETH_SS_STATS:
>   return E1000_STATS_LEN;
> + case ETH_SS_PRIV_FLAGS:
> + return E1000E_PRIV_FLAGS_STR_LEN;
>   default:
>   return -EOPNOTSUPP;
>   }
> @@ -2097,6 +2106,10 @@ static void e1000_get_strings(struct net_device
> __always_unused *netdev,
>   p += ETH_GSTRING_LEN;
>   }
>   break;
> + case ETH_SS_PRIV_FLAGS:
> + memcpy(data, e1000e_priv_flags_strings,
> +E1000E_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
> + break;
>   }
>  }
> 
> @@ -2305,6 +2318,37 @@ static int e1000e_get_ts_info(struct net_device
> *netdev,
>   return 0;
>  }
> 
> +static u32 e1000e_get_priv_flags(struct net_device *netdev)
> +{
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + u32 priv_flags = 0;
> +
> + if (adapter->flags2 & FLAG2_ENABLE_S0IX_FLOWS)
> + priv_flags |= E1000E_PRIV_FLAGS_S0IX_ENABLED;
> +
> + return priv_flags;
> +}
> +
> +static int e1000e_set_priv_flags(struct net_device *netdev, u32 priv_flags)
> +{
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + unsigned int flags2 = adapter->flags2;
> +
> + flags2 &= ~FLAG2_ENABLE_S0IX_FLOWS;
> + if (priv_flags & E1000E_PRIV_FLAGS_S0IX_ENABLED) {
> + struct e1000_hw *hw = &adapter->hw;
> +
> + if (hw->mac.type < e1000_pch_cnp)
> + return -EINVAL;
> + flags2 |= FLAG2_ENABLE_S0IX_FLOWS;
> + }
> +
> + if (flags2 != adapter->flags2)
> + adapter->flags2 = flags2;
> +
> + return 0;
> +}
> +
>  static const struct ethtool_ops e1000_ethtool_ops = {
>   .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS,
>   .get_drvinfo= e1000_get_drvinfo,
> @@ -2336,6 +2380,8 @@ static const struct ethtool_ops e1000_ethtool_ops
> = {
>   .set_eee= e1000e_set_eee,
>   .get_link_ksettings = e1000_get_link_ksettings,
>   .set_link_ksettings = e1000_set_link_ksettings,
> + .get_priv_flags = e1000e_get_priv_flags,
> + .set_priv_flags = e1000e_set_priv_flags,
>  };
> 
>  void e1000e_set_ethtool_ops(struct net_device *netdev)
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b98

Re: [RFC] net: stmmac: Problem with adding the native GPIOs support

2020-12-15 Thread Serge Semin

On Tue, Dec 15, 2020 at 02:58:37PM +0100, Andrew Lunn wrote:
> > > > Anyway the hardware setup depicted above doesn't seem
> > > > problematic at the first glance, but in fact it is. See, the DW *MAC
> > > > driver (STMMAC ethernet driver) is doing the MAC reset each time it
> > > > performs the device open or resume by means of the call-chain:
> > > > 
> > > >   stmmac_open()---+
> > > >                   +->stmmac_hw_setup()->stmmac_init_dma_engine()->stmmac_reset().
> > > >   stmmac_resume()-+
> > > > 
> > > > Such reset causes the whole interface reset: MAC, DMA and, what is more
> > > > important, GPIOs as being exposed as part of the MAC registers. That
> > > > in our case automatically causes the external PHY reset, which neither
> > > > the STMMAC driver nor the PHY subsystem expect at all.
> > > 
> > 
> > > Is the reset of the GPIO sub block under software control? When you
> > > have a GPIO controller implemented, you would want to disable this.
> > 
> > Not sure I've fully understood your question. The GPIO sub-block of
> > the MAC is getting reset together with the MAC.
> 

> And my question is, is that under software control, or is the hardware
> synthesised so that the GPIO controller is reset as part of the MAC
> reset?

Alas the SoC has already been synthesized and multiple devices have
already been produced as I described in the initial message. So we can't
change the way the MAC reset works.

> 
> From what you are saying, it sounds like from software you cannot
> independently control the GPIO controller reset?

No. The hardware implements the default MAC reset behavior. So the
GPIO controller gets reset synchronously with the MAC reset and that
can't be changed.

> 
> This is something i would be asking the hardware people. Look at the
> VHDL, etc.

Alas it's too late. I have to fix it in software somehow. As I see it
the only possible ways to bypass the problem are either to re-init the
PHY each time the reset happens or somehow to get rid of the MAC
reset. That's why I have sent this RFC to ask the driver maintainers
whether my suggestions are correct or whether there is a better idea to
work around the problem.

-Sergey

> 
>   Andrew

RE: [PATCH v5 3/4] Revert "e1000e: disable s0ix entry and exit flows for ME systems"

2020-12-15 Thread Shen, Yijun

> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Tuesday, December 15, 2020 3:30 AM
> To: Jeff Kirsher; Tony Nguyen; intel-wired-...@lists.osuosl.org
> Cc: linux-ker...@vger.kernel.org; Netdev; Alexander Duyck; Jakub Kicinski;
> Sasha Netfin; Aaron Brown; Stefan Assmann; David Miller;
> darc...@redhat.com; Shen, Yijun; Yuan, Perry;
> anthony.w...@canonical.com; Hans de Goede; Limonciello, Mario
> Subject: [PATCH v5 3/4] Revert "e1000e: disable s0ix entry and exit flows for
> ME systems"
> 
> commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME
> systems") disabled s0ix flows for systems that have various incarnations of
> the i219-LM ethernet controller.  This change caused power consumption
> regressions on the following shipping Dell Comet Lake based laptops:
> * Latitude 5310
> * Latitude 5410
> * Latitude 5410
> * Latitude 5510
> * Precision 3550
> * Latitude 5411
> * Latitude 5511
> * Precision 3551
> * Precision 7550
> * Precision 7750
> 
> That commit was introduced because of some regressions on certain
> Thinkpad laptops.  Those regressions were potentially caused by an earlier
> commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case"),
> or possibly by a system not meeting platform architectural
> requirements for low power consumption.  Other changes made in the driver
> with extended timeouts are expected to make the driver more impervious to
> platform firmware behavior.
> 
> Fixes: e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME
> systems")
> Reviewed-by: Alexander Duyck 
> Signed-off-by: Mario Limonciello 

Verified this patch series on Dell systems.

Tested-by: Yijun Shen 

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 45 +-
>  1 file changed, 2 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 6588f5d4a2be..b9800ba2006c 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -103,45 +103,6 @@ static const struct e1000_reg_info
> e1000_reg_info_tbl[] = {
>   {0, NULL}
>  };
> 
> -struct e1000e_me_supported {
> - u16 device_id;  /* supported device ID */
> -};
> -
> -static const struct e1000e_me_supported me_supported[] = {
> - {E1000_DEV_ID_PCH_LPT_I217_LM},
> - {E1000_DEV_ID_PCH_LPTLP_I218_LM},
> - {E1000_DEV_ID_PCH_I218_LM2},
> - {E1000_DEV_ID_PCH_I218_LM3},
> - {E1000_DEV_ID_PCH_SPT_I219_LM},
> - {E1000_DEV_ID_PCH_SPT_I219_LM2},
> - {E1000_DEV_ID_PCH_LBG_I219_LM3},
> - {E1000_DEV_ID_PCH_SPT_I219_LM4},
> - {E1000_DEV_ID_PCH_SPT_I219_LM5},
> - {E1000_DEV_ID_PCH_CNP_I219_LM6},
> - {E1000_DEV_ID_PCH_CNP_I219_LM7},
> - {E1000_DEV_ID_PCH_ICP_I219_LM8},
> - {E1000_DEV_ID_PCH_ICP_I219_LM9},
> - {E1000_DEV_ID_PCH_CMP_I219_LM10},
> - {E1000_DEV_ID_PCH_CMP_I219_LM11},
> - {E1000_DEV_ID_PCH_CMP_I219_LM12},
> - {E1000_DEV_ID_PCH_TGP_I219_LM13},
> - {E1000_DEV_ID_PCH_TGP_I219_LM14},
> - {E1000_DEV_ID_PCH_TGP_I219_LM15},
> - {0}
> -};
> -
> -static bool e1000e_check_me(u16 device_id)
> -{
> - struct e1000e_me_supported *id;
> -
> - for (id = (struct e1000e_me_supported *)me_supported;
> -  id->device_id; id++)
> - if (device_id == id->device_id)
> - return true;
> -
> - return false;
> -}
> -
>  /**
>   * __ew32_prepare - prepare to write to MAC CSR register on certain parts
>   * @hw: pointer to the HW structure
> @@ -6974,8 +6935,7 @@ static __maybe_unused int
> e1000e_pm_suspend(struct device *dev)
>   e1000e_pm_thaw(dev);
>   } else {
>   /* Introduce S0ix implementation */
> - if (hw->mac.type >= e1000_pch_cnp &&
> - !e1000e_check_me(hw->adapter->pdev->device))
> + if (hw->mac.type >= e1000_pch_cnp)
>   e1000e_s0ix_entry_flow(adapter);
>   }
> 
> @@ -6991,8 +6951,7 @@ static __maybe_unused int
> e1000e_pm_resume(struct device *dev)
>   int rc;
> 
>   /* Introduce S0ix implementation */
> - if (hw->mac.type >= e1000_pch_cnp &&
> - !e1000e_check_me(hw->adapter->pdev->device))
> + if (hw->mac.type >= e1000_pch_cnp)
>   e1000e_s0ix_exit_flow(adapter);
> 
>   rc = __e1000_resume(pdev);
> --
> 2.25.1

RE: [PATCH v5 2/4] e1000e: bump up timeout to wait when ME un-configures ULP mode

2020-12-15 Thread Shen, Yijun

> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Tuesday, December 15, 2020 3:30 AM
> To: Jeff Kirsher; Tony Nguyen; intel-wired-...@lists.osuosl.org
> Cc: linux-ker...@vger.kernel.org; Netdev; Alexander Duyck; Jakub Kicinski;
> Sasha Netfin; Aaron Brown; Stefan Assmann; David Miller;
> darc...@redhat.com; Shen, Yijun; Yuan, Perry;
> anthony.w...@canonical.com; Hans de Goede; Limonciello, Mario; Aaron
> Ma; Mark Pearson
> Subject: [PATCH v5 2/4] e1000e: bump up timeout to wait when ME un-configures ULP mode
> 
> Per guidance from Intel ethernet architecture team, it may take up to 1
> second for unconfiguring ULP mode.
> 
> However in practice this seems to be taking up to 2 seconds on some Lenovo
> machines.  Detect scenarios that take more than 1 second but less than 2.5
> seconds and emit a warning on resume for those scenarios.
> 
> Suggested-by: Aaron Ma 
> Suggested-by: Sasha Netfin 
> Suggested-by: Hans de Goede 
> CC: Mark Pearson 
> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
> BugLink: https://bugs.launchpad.net/bugs/1865570
> Link: https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200323191639.48826-1-aaron...@canonical.com/
> Link: https://lkml.org/lkml/2020/12/13/15
> Link: https://lkml.org/lkml/2020/12/14/708
> Signed-off-by: Mario Limonciello 

Verified this patch series on Dell systems.

Tested-by: Yijun Shen 

> ---
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index 9aa6fad8ed47..fdf23d20c954 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1240,6 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
> e1000_hw *hw, bool force)
>   return 0;
> 
>   if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> + struct e1000_adapter *adapter = hw->adapter;
> + bool firmware_bug = false;
> +
>   if (force) {
>   /* Request ME un-configure ULP mode in the PHY */
>   mac_reg = er32(H2ME);
> @@ -1248,16 +1251,23 @@ static s32 e1000_disable_ulp_lpt_lp(struct
> e1000_hw *hw, bool force)
>   ew32(H2ME, mac_reg);
>   }
> 
> - /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
> + /* Poll up to 2.5 seconds for ME to clear ULP_CFG_DONE.
> +  * If this takes more than 1 second, show a warning indicating a firmware
> +  * bug */
>   while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
> - if (i++ == 30) {
> + if (i++ == 250) {
>   ret_val = -E1000_ERR_PHY;
>   goto out;
>   }
> + if (i > 100 && !firmware_bug)
> + firmware_bug = true;
> 
>   usleep_range(10000, 11000);
>   }
> - e_dbg("ULP_CONFIG_DONE cleared after %dmsec\n", i * 10);
> + if (firmware_bug)
> + e_warn("ULP_CONFIG_DONE took %dmsec.  This is a firmware bug\n", i * 10);
> + else
> + e_dbg("ULP_CONFIG_DONE cleared after %dmsec\n", i * 10);
> 
>   if (force) {
>   mac_reg = er32(H2ME);
> --
> 2.25.1
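
The polling logic in the hunk above can be sketched in userspace as follows. This is only an illustration under stated assumptions: poll_ulp_cfg_done(), cfg_done_fn and fake_still_set() are hypothetical names, the register read is replaced by a callback, and the sleep is omitted so that only the iteration counting is shown.

```c
#include <stdbool.h>

#define POLL_STEP_MSEC   10   /* one loop iteration stands for roughly 10 msec */
#define POLL_MAX_ITERS   250  /* give up after 2.5 seconds */
#define POLL_WARN_ITERS  100  /* beyond 1 second, flag a firmware bug */

/* Hypothetical stand-in for reading the ULP_CFG_DONE bit from FWSM. */
typedef bool (*cfg_done_fn)(void *ctx);

/*
 * Returns the elapsed time in msec once the bit clears, or -1 on timeout.
 * *firmware_bug is set when clearing took longer than one second.
 */
static int poll_ulp_cfg_done(cfg_done_fn still_set, void *ctx,
			     bool *firmware_bug)
{
	int i = 0;

	*firmware_bug = false;
	while (still_set(ctx)) {
		if (i++ == POLL_MAX_ITERS)
			return -1;
		if (i > POLL_WARN_ITERS)
			*firmware_bug = true;
		/* the driver sleeps here; the sketch only counts iterations */
	}
	return i * POLL_STEP_MSEC;
}

/* Test helper (assumption): the bit stays set for a fixed number of reads. */
static bool fake_still_set(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

With this shape, a wait of up to 1 second is reported normally, a wait between 1 and 2.5 seconds is reported with the firmware-bug flag set, and anything longer is treated as a failure.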

RE: [PATCH v5 1/4] e1000e: Only run S0ix flows if shutdown succeeded

2020-12-15 Thread Shen, Yijun

> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Tuesday, December 15, 2020 3:30 AM
> To: Jeff Kirsher; Tony Nguyen; intel-wired-...@lists.osuosl.org
> Cc: linux-ker...@vger.kernel.org; Netdev; Alexander Duyck; Jakub Kicinski;
> Sasha Netfin; Aaron Brown; Stefan Assmann; David Miller;
> darc...@redhat.com; Shen, Yijun; Yuan, Perry;
> anthony.w...@canonical.com; Hans de Goede; Limonciello, Mario
> Subject: [PATCH v5 1/4] e1000e: Only run S0ix flows if shutdown succeeded
> 
> If the shutdown failed, the part will be thawed and running S0ix flows will
> put it into an undefined state.
> 
> Reported-by: Alexander Duyck 
> Reviewed-by: Alexander Duyck 
> Signed-off-by: Mario Limonciello 

Verified this patch series on Dell systems.

Tested-by: Yijun Shen 

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 128ab6898070..6588f5d4a2be 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6970,13 +6970,14 @@ static __maybe_unused int
> e1000e_pm_suspend(struct device *dev)
>   e1000e_pm_freeze(dev);
> 
>   rc = __e1000_shutdown(pdev, false);
> - if (rc)
> + if (rc) {
>   e1000e_pm_thaw(dev);
> -
> - /* Introduce S0ix implementation */
> - if (hw->mac.type >= e1000_pch_cnp &&
> - !e1000e_check_me(hw->adapter->pdev->device))
> - e1000e_s0ix_entry_flow(adapter);
> + } else {
> + /* Introduce S0ix implementation */
> + if (hw->mac.type >= e1000_pch_cnp &&
> + !e1000e_check_me(hw->adapter->pdev->device))
> + e1000e_s0ix_entry_flow(adapter);
> + }
> 
>   return rc;
>  }
> --
> 2.25.1

Re: [RFC PATCH net-next 03/16] net: mscc: ocelot: rename ocelot_netdevice_port_event to ocelot_netdevice_changeupper

2020-12-15 Thread Alexandre Belloni

Hi,

On 08/12/2020 14:07:49+0200, Vladimir Oltean wrote:
> -static int ocelot_netdevice_port_event(struct net_device *dev,
> -unsigned long event,
> -struct netdev_notifier_changeupper_info *info)
> +static int ocelot_netdevice_changeupper(struct net_device *dev,
> + struct netdev_notifier_changeupper_info *info)

[...]

> - netdev_for_each_lower_dev(dev, slave, iter) {
> - ret = ocelot_netdevice_port_event(slave, event, info);
> - if (ret)
> - goto notify;
> + netdev_for_each_lower_dev(dev, slave, iter) {
> + ret = ocelot_netdevice_changeupper(slave, event, info);
> + if (ret)
> + goto notify;
> + }
> + } else {
> + ret = ocelot_netdevice_changeupper(dev, event, info);

Does that compile? Shouldn't event be dropped?


-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [stable-rc 5.9] sched: core.c:7270 Illegal context switch in RCU-bh read-side critical section!

2020-12-15 Thread Paul E. McKenney

On Tue, Dec 15, 2020 at 07:50:31AM +0530, Naresh Kamboju wrote:
> There are two warnings "WARNING: suspicious RCU usage" noticed on arm64
> juno-r2 device while running selftest bpf test_tc_edt.sh and net:
> udpgro_bench.sh. These warnings are occurring intermittently.
> These warnings are occurring intermittently.
> 
> metadata:
>   git branch: linux-5.9.y
>   git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>   git describe: v5.9.14-106-g609d95a95925
>   make_kernelversion: 5.9.15-rc1
>   kernel-config:
> http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-stable-rc-5.9/58/config
> 
> 
> Steps to reproduce:
> --
> Not easy to reproduce.
> 
> Crash log:
> --
> # selftests: bpf: test_tc_edt.sh
> [  503.796362]
> [  503.797960] =
> [  503.802131] WARNING: suspicious RCU usage
> [  503.806232] 5.9.15-rc1 #1 Tainted: GW
> [  503.811358] -
> [  503.815444] /usr/src/kernel/kernel/sched/core.c:7270 Illegal
> context switch in RCU-bh read-side critical section!
> [  503.825858]
> [  503.825858] other info that might help us debug this:
> [  503.825858]
> [  503.833998]
> [  503.833998] rcu_scheduler_active = 2, debug_locks = 1
> [  503.840981] 3 locks held by kworker/u12:1/157:
> [  503.845514]  #0: 0009754ed538
> ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x208/0x768
> [  503.855048]  #1: 800013e63df0 (net_cleanup_work){+.+.}-{0:0},
> at: process_one_work+0x208/0x768
> [  503.864201]  #2: 8000129fe3f0 (pernet_ops_rwsem){}-{3:3},
> at: cleanup_net+0x64/0x3b8
> [  503.872786]
> [  503.872786] stack backtrace:
> [  503.877229] CPU: 1 PID: 157 Comm: kworker/u12:1 Tainted: GW
> 5.9.15-rc1 #1
> [  503.885433] Hardware name: ARM Juno development board (r2) (DT)
> [  503.891382] Workqueue: netns cleanup_net
> [  503.895324] Call trace:
> [  503.897786]  dump_backtrace+0x0/0x1f8
> [  503.901464]  show_stack+0x2c/0x38
> [  503.904796]  dump_stack+0xec/0x158
> [  503.908215]  lockdep_rcu_suspicious+0xd4/0xf8
> [  503.912591]  ___might_sleep+0x1e4/0x208

You really are forbidden to invoke ___might_sleep() while in a BH-disable
region of code, whether due to rcu_read_lock_bh(), local_bh_disable(),
or whatever else.

I do see the cond_resched() in inet_twsk_purge(), but I don't immediately
see a BH-disable region of code.  Maybe someone more familiar with this
code would have some ideas.

Or you could place checks for being in a BH-disable region further up in
the code.  Or build with CONFIG_DEBUG_INFO=y to allow more precise
interpretation of this stack trace.

Thanx, Paul

> [  503.916444]  inet_twsk_purge+0x144/0x378
> [  503.920384]  tcpv6_net_exit_batch+0x20/0x28
> [  503.924585]  ops_exit_list.isra.10+0x78/0x88
> [  503.928872]  cleanup_net+0x248/0x3b8
> [  503.932462]  process_one_work+0x2b0/0x768
> [  503.936487]  worker_thread+0x48/0x498
> [  503.940166]  kthread+0x158/0x168
> [  503.943409]  ret_from_fork+0x10/0x1c
> [  504.165891] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
> [  504.459624] audit: type=1334 audit(1607978673.070:40866):
> prog-id=20436 op=LOAD
> <>
> [  879.304684]
> [  879.306200] =
> [  879.310314] WARNING: suspicious RCU usage
> [  879.314420] 5.9.15-rc1 #1 Tainted: GW
> [  879.319554] -
> [  879.323644] /usr/src/kernel/kernel/sched/core.c:7270 Illegal
> context switch in RCU-sched read-side critical section!
> [  879.334259]
> [  879.334259] other info that might help us debug this:
> [  879.334259]
> [  879.342345]
> [  879.342345] rcu_scheduler_active = 2, debug_locks = 1
> [  879.348958] 3 locks held by kworker/u12:8/248:
> [  879.353483]  #0: 0009754ed538
> ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x208/0x768
> [  879.362910]  #1: 800013bc3df0 (net_cleanup_work){+.+.}-{0:0},
> at: process_one_work+0x208/0x768
> [  879.371984]  #2: 8000129fe3f0 (pernet_ops_rwsem){}-{3:3},
> at: cleanup_net+0x64/0x3b8
> [  879.380540]
> [  879.380540] stack backtrace:
> [  879.384998] CPU: 1 PID: 248 Comm: kworker/u12:8 Tainted: GW
> 5.9.15-rc1 #1
> [  879.393201] Hardware name: ARM Juno development board (r2) (DT)
> [  879.399147] Workqueue: netns cleanup_net
> [  879.403089] Call trace:
> [  879.405550]  dump_backtrace+0x0/0x1f8
> [  879.409228]  show_stack+0x2c/0x38
> [  879.412561]  dump_stack+0xec/0x158
> # ud[  879.415980]  lockdep_rcu_suspicious+0xd4/0xf8
> [  879.420691]  ___might_sleep+0x1ac/0x208
> p tx: 32 MB/s  546 calls/[  879.424570]
> nf_ct_iterate_cleanup+0x1b8/0x2d8 [nf_conntrack]
> s546 msg/s[  879.433190]  nf_conntrack_cleanup_net_list+0x58/0x100
> [nf_conntrack]
> 
> [  879.440765]  nf_conntrack_pernet_exit+0xa8/0xb8 [nf_conntrack]
> [  879.446755]  ops_exit_list.isra.10+0x78/0x88
> [  879.451043]  cleanup_net+0x248/0x3b8
> [  879.454635]  process_one_wor

UBSAN: shift-out-of-bounds in xprt_calc_majortimeo

2020-12-15 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:14240d4c Add linux-next specific files for 20201210
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1321cf1750
kernel config:  https://syzkaller.appspot.com/x/.config?x=6dbe20fdaa5aaebe
dashboard link: https://syzkaller.appspot.com/bug?extid=ba2e91df8f74809417fa
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=174ecb9b50
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ff941350

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ba2e91df8f7480941...@syzkaller.appspotmail.com


UBSAN: shift-out-of-bounds in net/sunrpc/xprt.c:658:14
shift exponent 536871232 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 8494 Comm: syz-executor211 Not tainted 
5.10.0-rc7-next-20201210-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 xprt_calc_majortimeo.isra.0.cold+0x17/0x46 net/sunrpc/xprt.c:658
 xprt_init_majortimeo net/sunrpc/xprt.c:686 [inline]
 xprt_request_init+0x486/0x9e0 net/sunrpc/xprt.c:1805
 xprt_do_reserve net/sunrpc/xprt.c:1815 [inline]
 xprt_reserve+0x18f/0x280 net/sunrpc/xprt.c:1836
 __rpc_execute+0x21d/0x1360 net/sunrpc/sched.c:891
 rpc_execute+0x230/0x350 net/sunrpc/sched.c:967
 rpc_run_task+0x5d0/0x8f0 net/sunrpc/clnt.c:1140
 rpc_call_sync+0xc6/0x1a0 net/sunrpc/clnt.c:1169
 rpc_ping net/sunrpc/clnt.c:2682 [inline]
 rpc_create_xprt+0x3f1/0x4a0 net/sunrpc/clnt.c:477
 rpc_create+0x354/0x670 net/sunrpc/clnt.c:593
 nfs_create_rpc_client+0x4eb/0x680 fs/nfs/client.c:536
 nfs_init_client fs/nfs/client.c:653 [inline]
 nfs_init_client+0x6d/0x100 fs/nfs/client.c:640
 nfs_get_client+0xcd7/0x1020 fs/nfs/client.c:430
 nfs_init_server.isra.0+0x2c0/0xed0 fs/nfs/client.c:692
 nfs_create_server+0x18f/0x650 fs/nfs/client.c:996
 nfs_try_get_tree+0x181/0x9f0 fs/nfs/super.c:939
 nfs_get_tree+0xaa1/0x1520 fs/nfs/fs_context.c:1350
 vfs_get_tree+0x89/0x2f0 fs/super.c:1496
 do_new_mount fs/namespace.c:2896 [inline]
 path_mount+0x12ae/0x1e70 fs/namespace.c:3227
 do_mount fs/namespace.c:3240 [inline]
 __do_sys_mount fs/namespace.c:3448 [inline]
 __se_sys_mount fs/namespace.c:3425 [inline]
 __x64_sys_mount+0x27f/0x300 fs/namespace.c:3425
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x440419
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffe282dde28 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 0030656c69662f2e RCX: 00440419
RDX: 20fb5ffc RSI: 20343ff8 RDI: 2100
RBP: 006ca018 R08: 2000a000 R09: 
R10:  R11: 0246 R12: 00401c20
R13: 00401cb0 R14:  R15: 



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
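
The report above is about a left shift whose exponent (536871232) is far wider than the 64-bit operand. As a generic illustration only, and not the fix adopted for net/sunrpc/xprt.c, one way to guard such a shift is to saturate when the exponent is out of range or the result would overflow. shl_saturating() and its parameters are hypothetical names.

```c
#include <limits.h>

/*
 * Hypothetical helper: left-shift 'val' by 'shift', saturating to 'maxval'
 * instead of invoking undefined behaviour when the shift is too wide or
 * the result would overflow.
 */
static unsigned long shl_saturating(unsigned long val, unsigned int shift,
				    unsigned long maxval)
{
	const unsigned int width = sizeof(unsigned long) * CHAR_BIT;

	if (val == 0)
		return 0;
	/* the width check must come first so the right shift stays defined */
	if (shift >= width || val > (ULONG_MAX >> shift))
		return maxval;
	val <<= shift;
	return val > maxval ? maxval : val;
}
```

Whether saturating, rejecting the value at configuration time, or something else is appropriate depends on where the exponent comes from, which this sketch does not try to answer.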

Re: [PATCH v3 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-15 Thread Lorenzo Bianconi

> On 12/15/20 2:47 PM, Lorenzo Bianconi wrote:
> [...]
> > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > index 329397c60d84..61d3f5f8b7f3 100644
> > > > --- a/drivers/net/xen-netfront.c
> > > > +++ b/drivers/net/xen-netfront.c
> > > > @@ -866,10 +866,8 @@ static u32 xennet_run_xdp(struct netfront_queue *queue, struct page *pdata,
> > > > xdp_init_buff(xdp, XEN_PAGE_SIZE - XDP_PACKET_HEADROOM,
> > > >   &queue->xdp_rxq);
> > > > -   xdp->data_hard_start = page_address(pdata);
> > > > -   xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
> > > > +   xdp_prepare_buff(xdp, page_address(pdata), XDP_PACKET_HEADROOM, len);
> > > > xdp_set_data_meta_invalid(xdp);
> > > > -   xdp->data_end = xdp->data + len;
> > > > act = bpf_prog_run_xdp(prog, xdp);
> > > > switch (act) {
> > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > index 3fb3a9aa1b71..66d8a4b317a3 100644
> > > > --- a/include/net/xdp.h
> > > > +++ b/include/net/xdp.h
> > > > @@ -83,6 +83,18 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
> > > > xdp->rxq = rxq;
> > > >   }
> > > > +static inline void
> 
> nit: maybe __always_inline

ack, I will add in v4

> 
> > > > +xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
> > > > +int headroom, int data_len)
> > > > +{
> > > > +   unsigned char *data = hard_start + headroom;
> > > > +
> > > > +   xdp->data_hard_start = hard_start;
> > > > +   xdp->data = data;
> > > > +   xdp->data_end = data + data_len;
> > > > +   xdp->data_meta = data;
> > > > +}
> > > > +
> > > >   /* Reserve memory area at end-of data area.
> > > >*
> 
> For the drivers with xdp_set_data_meta_invalid(), we're basically setting
> xdp->data_meta twice unless compiler is smart enough to optimize the first
> one away (did you double check?). Given this is supposed to be a cleanup,
> why not integrate this logic as well so the xdp_set_data_meta_invalid()
> doesn't get extra treatment?

we discussed it before, but I am fine to add it in v4. Something like:

static __always_inline void
xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
		 int headroom, int data_len, bool meta_valid)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	xdp->data_meta = meta_valid ? data : data + 1;
}
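
To make the effect of the meta_valid argument concrete, here is a userspace sketch of the same idea. It is only an illustration: struct xdp_buff_sketch is a cut-down stand-in rather than the kernel structure, and the _sketch helper names are hypothetical.

```c
#include <stdbool.h>

/* Userspace stand-in with only the fields used here; not the kernel struct. */
struct xdp_buff_sketch {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_meta;
	unsigned char *data_hard_start;
};

static inline void
xdp_prepare_buff_sketch(struct xdp_buff_sketch *xdp, unsigned char *hard_start,
			int headroom, int data_len, bool meta_valid)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	/* data_meta > data is the "no metadata" marker in this sketch */
	xdp->data_meta = meta_valid ? data : data + 1;
}

/* Metadata is usable only when data_meta does not point past data. */
static inline bool xdp_meta_is_valid_sketch(const struct xdp_buff_sketch *xdp)
{
	return xdp->data_meta <= xdp->data;
}
```

A driver that currently calls xdp_set_data_meta_invalid() right after preparing the buffer would pass false and write data_meta only once.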

Regards,
Lorenzo

> 
> Thanks,
> Daniel
> 



Re: [RFC PATCH 0/7] Share events between metrics

2020-12-15 Thread Paul A. Clarke

On Thu, May 07, 2020 at 10:43:43PM -0700, Ian Rogers wrote:
> On Thu, May 7, 2020 at 2:47 PM Andi Kleen  wrote:
> >
> > > > - without this change events within a metric may get scheduled
> > > >   together, after they may appear as part of a larger group and be
> > > >   multiplexed at different times, lowering accuracy - however, less
> > > >   multiplexing may compensate for this.

Does multiplexing somewhat related events at different times actually reduce
accuracy, or is it just more likely to give that appearance?

It seems that perf measurements are only useful if the workload is in a
fairly steady state.  If there is some wobbling, then measuring at the
same time is more accurate for the periods where the events are being
measured simultaneously, but may be far off for when they are not being
measured at all.  Spreading them out over a longer duration may actually
increase accuracy by sampling over more varied intervals.

Or, is the concern more about trying to time-slice the results in a 
fairly granular way and expecting accurate results then?

(Or, maybe my ignorance is showing again.  :-)

PC
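
For context, a multiplexed count is commonly extrapolated by the ratio of the time the event was enabled to the time it was actually counting (the time_enabled and time_running values described in the perf_event_open documentation). A minimal sketch of that arithmetic, where scale_count() is a hypothetical helper name and floating point is used only for brevity:

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the full-interval count of a multiplexed
 * event from the raw count and the enabled/running times.
 */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
			    uint64_t time_running)
{
	if (time_running == 0)
		return 0;	/* never scheduled: no estimate possible */
	return (uint64_t)((double)raw * (double)time_enabled /
			  (double)time_running);
}
```

The estimate is only as good as the assumption that the event rate while the counter was off the PMU matches the rate while it was on, which is the steady-state concern raised above.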

UBSAN: shift-out-of-bounds in hash_ipmark_create

2020-12-15 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:15ac8fdb Add linux-next specific files for 20201207
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=156c845b50
kernel config:  https://syzkaller.appspot.com/x/.config?x=3696b8138207d24d
dashboard link: https://syzkaller.appspot.com/bug?extid=d81819ac03d8c36e3974
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=14960f9b50
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12be080f50

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d81819ac03d8c36e3...@syzkaller.appspotmail.com


UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 8473 Comm: syz-executor542 Not tainted 
5.10.0-rc6-next-20201207-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
 hash_ipmark_create.cold+0x96/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
 ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
 nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 sys_sendmsg+0x6e8/0x810 net/socket.c:2345
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2399
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x440419
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffdadcbeb88 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 004002c8 RCX: 00440419
RDX:  RSI: 20c0 RDI: 0003
RBP: 006ca018 R08: 0005 R09: 004002c8
R10: 0001 R11: 0246 R12: 00401c20
R13: 00401cb0 R14:  R15: 



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
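
As a general illustration of the class of bug reported above (not the ipset code itself): shifting a 32-bit value by 32 or more is undefined behaviour in C, so a helper that computes a power of two or a table-size exponent has to guard the shift count or widen the operand. The mock_ helpers below are hypothetical names written for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Return 2^bits as a 32-bit value, or 0 when the result does not fit. */
static uint32_t mock_pow2_u32(unsigned int bits)
{
	if (bits >= 32)
		return 0;	/* (uint32_t)1 << 32 would be undefined */
	return (uint32_t)1 << bits;
}

/*
 * Smallest exponent such that 2^bits >= size. The comparison is done in
 * 64 bits, so the shift count never reaches the width of the operand.
 */
static unsigned int mock_table_bits(uint32_t size)
{
	unsigned int bits = 0;

	assert(size > 0);
	while (bits < 32 && ((uint64_t)1 << bits) < size)
		bits++;
	return bits;
}
```

A size above 2^31 legitimately needs an exponent of 32 here, which is exactly the value that cannot be fed back into a 32-bit shift; a caller would have to reject or clamp such sizes before shifting.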

Re: [PATCH v3 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-15 Thread Maciej Fijalkowski

On Tue, Dec 15, 2020 at 04:06:20PM +0100, Lorenzo Bianconi wrote:
> > On 12/15/20 2:47 PM, Lorenzo Bianconi wrote:
> > [...]
> > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > index 329397c60d84..61d3f5f8b7f3 100644
> > > > > --- a/drivers/net/xen-netfront.c
> > > > > +++ b/drivers/net/xen-netfront.c
> > > > > @@ -866,10 +866,8 @@ static u32 xennet_run_xdp(struct netfront_queue 
> > > > > *queue, struct page *pdata,
> > > > >   xdp_init_buff(xdp, XEN_PAGE_SIZE - XDP_PACKET_HEADROOM,
> > > > > &queue->xdp_rxq);
> > > > > - xdp->data_hard_start = page_address(pdata);
> > > > > - xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
> > > > > + xdp_prepare_buff(xdp, page_address(pdata), XDP_PACKET_HEADROOM, 
> > > > > len);
> > > > >   xdp_set_data_meta_invalid(xdp);
> > > > > - xdp->data_end = xdp->data + len;
> > > > >   act = bpf_prog_run_xdp(prog, xdp);
> > > > >   switch (act) {
> > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > > index 3fb3a9aa1b71..66d8a4b317a3 100644
> > > > > --- a/include/net/xdp.h
> > > > > +++ b/include/net/xdp.h
> > > > > @@ -83,6 +83,18 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, 
> > > > > struct xdp_rxq_info *rxq)
> > > > >   xdp->rxq = rxq;
> > > > >   }
> > > > > +static inline void
> > 
> > nit: maybe __always_inline
> 
> ack, I will add in v4
> 
> > 
> > > > > +xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
> > > > > +  int headroom, int data_len)
> > > > > +{
> > > > > + unsigned char *data = hard_start + headroom;
> > > > > +
> > > > > + xdp->data_hard_start = hard_start;
> > > > > + xdp->data = data;
> > > > > + xdp->data_end = data + data_len;
> > > > > + xdp->data_meta = data;
> > > > > +}
> > > > > +
> > > > >   /* Reserve memory area at end-of data area.
> > > > >*
> > 
> > For the drivers with xdp_set_data_meta_invalid(), we're basically setting 
> > xdp->data_meta
> > twice unless compiler is smart enough to optimize the first one away (did 
> > you double check?).
> > Given this is supposed to be a cleanup, why not integrate this logic as 
> > well so the
> > xdp_set_data_meta_invalid() doesn't get extra treatment?

That's what I was trying to say previously.

> 
> we discussed it before, but I am fine to add it in v4. Something like:
> 
> static __always_inline void
> xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
>int headroom, int data_len, bool meta_valid)
> {
>   unsigned char *data = hard_start + headroom;
>   
>   xdp->data_hard_start = hard_start;
>   xdp->data = data;
>   xdp->data_end = data + data_len;
>   xdp->data_meta = meta_valid ? data : data + 1;

This will introduce a branch, so for Intel drivers we're getting the
overhead of one add and a branch. I'm still opting for a separate helper.

static __always_inline void
xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
 int headroom, int data_len)
{
unsigned char *data = hard_start + headroom;

xdp->data_hard_start = hard_start;
xdp->data = data;
xdp->data_end = data + data_len;
xdp_set_data_meta_invalid(xdp);
}

static __always_inline void
xdp_prepare_buff_meta(struct xdp_buff *xdp, unsigned char *hard_start,
  int headroom, int data_len)
{
unsigned char *data = hard_start + headroom;

xdp->data_hard_start = hard_start;
xdp->data = data;
xdp->data_end = data + data_len;
xdp->data_meta = data;
}

> }
> 
> Regards,
> Lorenzo
> 
> > 
> > Thanks,
> > Daniel
> >
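
For readers following the thread, the two-helper variant proposed above can be modeled in userspace as below. This is only a sketch: struct mock_xdp_buff and the mock_ functions are hypothetical stand-ins rather than the kernel definitions. The one convention borrowed from the kernel is that data_meta > data marks metadata as unsupported, which is what storing data + 1 achieves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the pointer fields of struct xdp_buff. */
struct mock_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_meta;
};

/* Variant for drivers without metadata support: no branch, meta invalid. */
static inline void mock_prepare_buff(struct mock_xdp_buff *xdp,
				     unsigned char *hard_start,
				     int headroom, int data_len)
{
	unsigned char *data = hard_start + headroom;

	assert(headroom >= 0 && data_len >= 0);
	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	xdp->data_meta = data + 1;	/* data_meta > data means "unsupported" */
}

/* Variant for drivers with metadata support: meta starts out empty. */
static inline void mock_prepare_buff_meta(struct mock_xdp_buff *xdp,
					  unsigned char *hard_start,
					  int headroom, int data_len)
{
	unsigned char *data = hard_start + headroom;

	assert(headroom >= 0 && data_len >= 0);
	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	xdp->data_meta = data;		/* zero-length metadata area */
}

static inline bool mock_meta_unsupported(const struct mock_xdp_buff *xdp)
{
	return xdp->data_meta > xdp->data;
}
```

Each caller picks one helper at compile time, so neither path carries the meta_valid test; the cost is two nearly identical functions to keep in sync.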

Re: [RFC PATCH net-next 03/16] net: mscc: ocelot: rename ocelot_netdevice_port_event to ocelot_netdevice_changeupper

2020-12-15 Thread Vladimir Oltean

On Tue, Dec 15, 2020 at 04:01:32PM +0100, Alexandre Belloni wrote:
> Hi,
> 
> On 08/12/2020 14:07:49+0200, Vladimir Oltean wrote:
> > -static int ocelot_netdevice_port_event(struct net_device *dev,
> > -  unsigned long event,
> > -  struct netdev_notifier_changeupper_info 
> > *info)
> > +static int ocelot_netdevice_changeupper(struct net_device *dev,
> > +   struct netdev_notifier_changeupper_info 
> > *info)
> 
> [...]
> 
> > -   netdev_for_each_lower_dev(dev, slave, iter) {
> > -   ret = ocelot_netdevice_port_event(slave, event, info);
> > -   if (ret)
> > -   goto notify;
> > +   netdev_for_each_lower_dev(dev, slave, iter) {
> > +   ret = ocelot_netdevice_changeupper(slave, 
> > event, info);
> > +   if (ret)
> > +   goto notify;
> > +   }
> > +   } else {
> > +   ret = ocelot_netdevice_changeupper(dev, event, info);
> 
> Does that compile?

No it doesn't.

> Shouldn't event be dropped?

It is, but in the next patch. I'll fix it, thanks.

Re: [PATCH net v3] lan743x: fix rx_napi_poll/interrupt ping-pong

2020-12-15 Thread Sven Van Asbroeck

Hi Jakub,

On Fri, Dec 11, 2020 at 9:38 AM Sven Van Asbroeck  wrote:
>
> From: Sven Van Asbroeck 
>
> Even if there is more rx data waiting on the chip, the rx napi poll fn
> will never run more than once - it will always read a few buffers, then
> bail out and re-arm interrupts. Which results in ping-pong between napi
> and interrupt.
>
> This defeats the purpose of napi, and is bad for performance.
>
> Fix by making the rx napi poll behave identically to other ethernet
> drivers:

I was wondering if maybe you had any lingering doubts about this patch?
Is there anything I can do to address these?

Re: [RFC PATCH net-next 04/16] net: mscc: ocelot: use a switch-case statement in ocelot_netdevice_event

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:50+0200, Vladimir Oltean wrote:
> Make ocelot's net device event handler more streamlined by structuring
> it in a similar way with others. The inspiration here was
> dsa_slave_netdevice_event.
> 
> Signed-off-by: Vladimir Oltean 
> ---
>  drivers/net/ethernet/mscc/ocelot_net.c | 68 +-
>  1 file changed, 45 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
> b/drivers/net/ethernet/mscc/ocelot_net.c
> index 50765a3b1c44..47b620967156 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -1030,49 +1030,71 @@ static int ocelot_netdevice_changeupper(struct 
> net_device *dev,
> info->upper_dev);
>   }
>  
> - return err;
> + return notifier_from_errno(err);
> +}
> +
> +static int
> +ocelot_netdevice_lag_changeupper(struct net_device *dev,
> +  struct netdev_notifier_changeupper_info *info)
> +{
> + struct net_device *lower;
> + struct list_head *iter;
> + int err = NOTIFY_DONE;
> +
> + netdev_for_each_lower_dev(dev, lower, iter) {
> + err = ocelot_netdevice_changeupper(lower, info);
> + if (err)
> + return notifier_from_errno(err);
> + }
> +
> + return NOTIFY_DONE;
>  }
>  
>  static int ocelot_netdevice_event(struct notifier_block *unused,
> unsigned long event, void *ptr)
>  {
> - struct netdev_notifier_changeupper_info *info = ptr;
>   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> - int ret = 0;
>  
> - if (event == NETDEV_PRECHANGEUPPER &&
> - ocelot_netdevice_dev_check(dev) &&
> - netif_is_lag_master(info->upper_dev)) {
> - struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
> + switch (event) {
> + case NETDEV_PRECHANGEUPPER: {
> + struct netdev_notifier_changeupper_info *info = ptr;
> + struct netdev_lag_upper_info *lag_upper_info;
>   struct netlink_ext_ack *extack;
>  
> + if (!ocelot_netdevice_dev_check(dev))
> + break;
> +
> + if (!netif_is_lag_master(info->upper_dev))
> + break;
> +
> + lag_upper_info = info->upper_info;
> +
>   if (lag_upper_info &&
>   lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) {
>   extack = netdev_notifier_info_to_extack(&info->info);
>   NL_SET_ERR_MSG_MOD(extack, "LAG device using 
> unsupported Tx type");
>  
> - ret = -EINVAL;
> - goto notify;
> + return NOTIFY_BAD;

This changes the return value in case of error, I'm not sure how
important this is.

>   }
> +
> + break;
>   }
> + case NETDEV_CHANGEUPPER: {
> + struct netdev_notifier_changeupper_info *info = ptr;
>  
> - if (event == NETDEV_CHANGEUPPER) {
> - if (netif_is_lag_master(dev)) {
> - struct net_device *slave;
> - struct list_head *iter;
> + if (ocelot_netdevice_dev_check(dev))
> + return ocelot_netdevice_changeupper(dev, info);
>  
> - netdev_for_each_lower_dev(dev, slave, iter) {
> - ret = ocelot_netdevice_changeupper(slave, 
> event, info);
> - if (ret)
> - goto notify;
> - }
> - } else {
> - ret = ocelot_netdevice_changeupper(dev, event, info);
> - }
> + if (netif_is_lag_master(dev))
> + return ocelot_netdevice_lag_changeupper(dev, info);
> +
> + break;
> + }
> + default:
> + break;
>   }
>  
> -notify:
> - return notifier_from_errno(ret);
> + return NOTIFY_DONE;

This changes the return value from NOTIFY_OK to NOTIFY_DONE but this is
probably what we want.

>  }
>  
>  struct notifier_block ocelot_netdevice_nb __read_mostly = {
> -- 
> 2.25.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
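
The two return-value remarks above are easier to see with the notifier encoding written out. The sketch below is a userspace copy of how I understand the helpers in include/linux/notifier.h to work; the MOCK_/mock_ prefixes mark it as a re-implementation for illustration, so check the header before relying on the exact values.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_NOTIFY_DONE	0x0000	/* don't care */
#define MOCK_NOTIFY_OK		0x0001	/* suits me */
#define MOCK_NOTIFY_STOP_MASK	0x8000	/* don't call further */
#define MOCK_NOTIFY_BAD		(MOCK_NOTIFY_STOP_MASK | 0x0002)

/* Encode 0 or a negative errno into a notifier return value. */
static int mock_notifier_from_errno(int err)
{
	assert(err <= 0);
	if (err)
		return MOCK_NOTIFY_STOP_MASK | (MOCK_NOTIFY_OK - err);
	return MOCK_NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int mock_notifier_to_errno(int ret)
{
	ret &= ~MOCK_NOTIFY_STOP_MASK;
	return ret > MOCK_NOTIFY_OK ? MOCK_NOTIFY_OK - ret : 0;
}
```

Under this encoding, a handler that used to return notifier_from_errno(-EINVAL) and now returns NOTIFY_BAD reports -EPERM instead of -EINVAL once decoded, and a handler that used to return notifier_from_errno(0) returned NOTIFY_OK rather than NOTIFY_DONE. Both decode to "no error", which is presumably why the second change is harmless.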

Re: [RFC PATCH net-next 05/16] net: mscc: ocelot: don't refuse bonding interfaces we can't offload

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:51+0200, Vladimir Oltean wrote:
> Since switchdev/DSA exposes network interfaces that fulfill many of the
> same user space expectations that dedicated NICs do, it makes sense to
> not deny bonding interfaces with a bonding policy that we cannot offload,
> but instead allow the bonding driver to select the egress interface in
> software.
> 
> Signed-off-by: Vladimir Oltean 
Reviewed-by: Alexandre Belloni 

> ---
>  drivers/net/ethernet/mscc/ocelot_net.c | 38 ++
>  1 file changed, 15 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
> b/drivers/net/ethernet/mscc/ocelot_net.c
> index 47b620967156..77957328722a 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -1022,6 +1022,15 @@ static int ocelot_netdevice_changeupper(struct 
> net_device *dev,
>   }
>   }
>   if (netif_is_lag_master(info->upper_dev)) {
> + struct netdev_lag_upper_info *lag_upper_info;
> +
> + lag_upper_info = info->upper_info;
> +
> + /* Only offload what we can */
> + if (lag_upper_info &&
> + lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
> + return NOTIFY_DONE;
> +
>   if (info->linking)
>   err = ocelot_port_lag_join(ocelot, port,
>  info->upper_dev);
> @@ -1037,10 +1046,16 @@ static int
>  ocelot_netdevice_lag_changeupper(struct net_device *dev,
>struct netdev_notifier_changeupper_info *info)
>  {
> + struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
>   struct net_device *lower;
>   struct list_head *iter;
>   int err = NOTIFY_DONE;
>  
> + /* Can't offload LAG => also do bridging in software */
> + if (lag_upper_info &&
> + lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
> + return NOTIFY_DONE;
> +
>   netdev_for_each_lower_dev(dev, lower, iter) {
>   err = ocelot_netdevice_changeupper(lower, info);
>   if (err)
> @@ -1056,29 +1071,6 @@ static int ocelot_netdevice_event(struct 
> notifier_block *unused,
>   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>  
>   switch (event) {
> - case NETDEV_PRECHANGEUPPER: {
> - struct netdev_notifier_changeupper_info *info = ptr;
> - struct netdev_lag_upper_info *lag_upper_info;
> - struct netlink_ext_ack *extack;
> -
> - if (!ocelot_netdevice_dev_check(dev))
> - break;
> -
> - if (!netif_is_lag_master(info->upper_dev))
> - break;
> -
> - lag_upper_info = info->upper_info;
> -
> - if (lag_upper_info &&
> - lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) {
> - extack = netdev_notifier_info_to_extack(&info->info);
> - NL_SET_ERR_MSG_MOD(extack, "LAG device using 
> unsupported Tx type");
> -
> - return NOTIFY_BAD;
> - }
> -
> - break;
> - }
>   case NETDEV_CHANGEUPPER: {
>   struct netdev_notifier_changeupper_info *info = ptr;
>  
> -- 
> 2.25.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [RFC PATCH net-next 04/16] net: mscc: ocelot: use a switch-case statement in ocelot_netdevice_event

2020-12-15 Thread Alexandre Belloni

On 15/12/2020 16:52:26+0100, Alexandre Belloni wrote:
> On 08/12/2020 14:07:50+0200, Vladimir Oltean wrote:
> > Make ocelot's net device event handler more streamlined by structuring
> > it in a similar way with others. The inspiration here was
> > dsa_slave_netdevice_event.
> > 
> > Signed-off-by: Vladimir Oltean 
> > ---
> >  drivers/net/ethernet/mscc/ocelot_net.c | 68 +-
> >  1 file changed, 45 insertions(+), 23 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
> > b/drivers/net/ethernet/mscc/ocelot_net.c
> > index 50765a3b1c44..47b620967156 100644
> > --- a/drivers/net/ethernet/mscc/ocelot_net.c
> > +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> > @@ -1030,49 +1030,71 @@ static int ocelot_netdevice_changeupper(struct 
> > net_device *dev,
> >   info->upper_dev);
> > }
> >  
> > -   return err;
> > +   return notifier_from_errno(err);
> > +}
> > +
> > +static int
> > +ocelot_netdevice_lag_changeupper(struct net_device *dev,
> > +struct netdev_notifier_changeupper_info *info)
> > +{
> > +   struct net_device *lower;
> > +   struct list_head *iter;
> > +   int err = NOTIFY_DONE;
> > +
> > +   netdev_for_each_lower_dev(dev, lower, iter) {
> > +   err = ocelot_netdevice_changeupper(lower, info);
> > +   if (err)
> > +   return notifier_from_errno(err);
> > +   }
> > +
> > +   return NOTIFY_DONE;
> >  }
> >  
> >  static int ocelot_netdevice_event(struct notifier_block *unused,
> >   unsigned long event, void *ptr)
> >  {
> > -   struct netdev_notifier_changeupper_info *info = ptr;
> > struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> > -   int ret = 0;
> >  
> > -   if (event == NETDEV_PRECHANGEUPPER &&
> > -   ocelot_netdevice_dev_check(dev) &&
> > -   netif_is_lag_master(info->upper_dev)) {
> > -   struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
> > +   switch (event) {
> > +   case NETDEV_PRECHANGEUPPER: {
> > +   struct netdev_notifier_changeupper_info *info = ptr;
> > +   struct netdev_lag_upper_info *lag_upper_info;
> > struct netlink_ext_ack *extack;
> >  
> > +   if (!ocelot_netdevice_dev_check(dev))
> > +   break;
> > +
> > +   if (!netif_is_lag_master(info->upper_dev))
> > +   break;
> > +
> > +   lag_upper_info = info->upper_info;
> > +
> > if (lag_upper_info &&
> > lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) {
> > extack = netdev_notifier_info_to_extack(&info->info);
> > NL_SET_ERR_MSG_MOD(extack, "LAG device using 
> > unsupported Tx type");
> >  
> > -   ret = -EINVAL;
> > -   goto notify;
> > +   return NOTIFY_BAD;
> 
> This changes the return value in case of error, I'm not sure how
> important this is.
> 

Ok, this is removed anyway, so

Reviewed-by: Alexandre Belloni 


> > }
> > +
> > +   break;
> > }
> > +   case NETDEV_CHANGEUPPER: {
> > +   struct netdev_notifier_changeupper_info *info = ptr;
> >  
> > -   if (event == NETDEV_CHANGEUPPER) {
> > -   if (netif_is_lag_master(dev)) {
> > -   struct net_device *slave;
> > -   struct list_head *iter;
> > +   if (ocelot_netdevice_dev_check(dev))
> > +   return ocelot_netdevice_changeupper(dev, info);
> >  
> > -   netdev_for_each_lower_dev(dev, slave, iter) {
> > -   ret = ocelot_netdevice_changeupper(slave, 
> > event, info);
> > -   if (ret)
> > -   goto notify;
> > -   }
> > -   } else {
> > -   ret = ocelot_netdevice_changeupper(dev, event, info);
> > -   }
> > +   if (netif_is_lag_master(dev))
> > +   return ocelot_netdevice_lag_changeupper(dev, info);
> > +
> > +   break;
> > +   }
> > +   default:
> > +   break;
> > }
> >  
> > -notify:
> > -   return notifier_from_errno(ret);
> > +   return NOTIFY_DONE;
> 
> This changes the return value from NOTIFY_OK to NOTIFY_DONE but this is
> probably what we want.
> 
> >  }
> >  
> >  struct notifier_block ocelot_netdevice_nb __read_mostly = {
> > -- 
> > 2.25.1
> > 
> 
> -- 
> Alexandre Belloni, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [RFC PATCH net-next 06/16] net: mscc: ocelot: use ipv6 in the aggregation code

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:52+0200, Vladimir Oltean wrote:
> IPv6 header information is not currently part of the entropy source for
> the 4-bit aggregation code used for LAG offload, even though it could be.
> The hardware reference manual says about these fields:
> 
> ANA::AGGR_CFG.AC_IP6_TCPUDP_PORT_ENA
> Use IPv6 TCP/UDP port when calculating aggregation code. Configure
> identically for all ports. Recommended value is 1.
> 
> ANA::AGGR_CFG.AC_IP6_FLOW_LBL_ENA
> Use IPv6 flow label when calculating AC. Configure identically for all
> ports. Recommended value is 1.
> 
> Integration with the xmit_hash_policy of the bonding interface is TBD.
> 
> Signed-off-by: Vladimir Oltean 
Reviewed-by: Alexandre Belloni 

> ---
>  drivers/net/ethernet/mscc/ocelot.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot.c 
> b/drivers/net/ethernet/mscc/ocelot.c
> index 7a5c534099d3..13e86dd71e5a 100644
> --- a/drivers/net/ethernet/mscc/ocelot.c
> +++ b/drivers/net/ethernet/mscc/ocelot.c
> @@ -1557,7 +1557,10 @@ int ocelot_init(struct ocelot *ocelot)
>   ocelot_write(ocelot, ANA_AGGR_CFG_AC_SMAC_ENA |
>ANA_AGGR_CFG_AC_DMAC_ENA |
>ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA |
> -  ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA, ANA_AGGR_CFG);
> +  ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA |
> +  ANA_AGGR_CFG_AC_IP6_FLOW_LBL_ENA |
> +  ANA_AGGR_CFG_AC_IP6_TCPUDP_ENA,
> +  ANA_AGGR_CFG);
>  
>   /* Set MAC age time to default value. The entry is aged after
>* 2*AGE_PERIOD
> -- 
> 2.25.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[PATCH net-next] r8169: facilitate adding new chip versions

2020-12-15 Thread Heiner Kallweit

Add a constant RTL_GIGA_MAC_MAX and use it if all new chip versions
handle a feature in a specific way. As a result we have to touch fewer
places when adding support for a new chip version.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/ethernet/realtek/r8169.h  |  3 ++-
 drivers/net/ethernet/realtek/r8169_main.c | 14 +++---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.h 
b/drivers/net/ethernet/realtek/r8169.h
index 7be86ef5a..4a924920e 100644
--- a/drivers/net/ethernet/realtek/r8169.h
+++ b/drivers/net/ethernet/realtek/r8169.h
@@ -66,7 +66,8 @@ enum mac_version {
RTL_GIGA_MAC_VER_60,
RTL_GIGA_MAC_VER_61,
RTL_GIGA_MAC_VER_63,
-   RTL_GIGA_MAC_NONE
+   RTL_GIGA_MAC_MAX,
+   RTL_GIGA_MAC_NONE = RTL_GIGA_MAC_MAX
 };
 
 struct rtl8169_private;
diff --git a/drivers/net/ethernet/realtek/r8169_main.c 
b/drivers/net/ethernet/realtek/r8169_main.c
index 46d8510b2..01087d3c0 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -962,7 +962,7 @@ static void rtl_writephy(struct rtl8169_private *tp, int 
location, int val)
case RTL_GIGA_MAC_VER_31:
r8168dp_2_mdio_write(tp, location, val);
break;
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_MAX:
r8168g_mdio_write(tp, location, val);
break;
default:
@@ -979,7 +979,7 @@ static int rtl_readphy(struct rtl8169_private *tp, int 
location)
case RTL_GIGA_MAC_VER_28:
case RTL_GIGA_MAC_VER_31:
return r8168dp_2_mdio_read(tp, location);
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_MAX:
return r8168g_mdio_read(tp, location);
default:
return r8169_mdio_read(tp, location);
@@ -1383,7 +1383,7 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, 
u32 wolopts)
break;
case RTL_GIGA_MAC_VER_34:
case RTL_GIGA_MAC_VER_37:
-   case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_MAX:
options = RTL_R8(tp, Config2) & ~PME_SIGNAL;
if (wolopts)
options |= PME_SIGNAL;
@@ -2182,7 +2182,7 @@ static void rtl_wol_suspend_quirk(struct rtl8169_private 
*tp)
case RTL_GIGA_MAC_VER_32:
case RTL_GIGA_MAC_VER_33:
case RTL_GIGA_MAC_VER_34:
-   case RTL_GIGA_MAC_VER_37 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_37 ... RTL_GIGA_MAC_MAX:
RTL_W32(tp, RxConfig, RTL_R32(tp, RxConfig) |
AcceptBroadcast | AcceptMulticast | AcceptMyPhys);
break;
@@ -2216,7 +2216,7 @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_46:
case RTL_GIGA_MAC_VER_47:
case RTL_GIGA_MAC_VER_48:
-   case RTL_GIGA_MAC_VER_50 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_50 ... RTL_GIGA_MAC_MAX:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) & ~0x80);
break;
case RTL_GIGA_MAC_VER_40:
@@ -2244,7 +2244,7 @@ static void rtl_pll_power_up(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_46:
case RTL_GIGA_MAC_VER_47:
case RTL_GIGA_MAC_VER_48:
-   case RTL_GIGA_MAC_VER_50 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_50 ... RTL_GIGA_MAC_MAX:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) | 0xc0);
break;
case RTL_GIGA_MAC_VER_40:
@@ -3950,7 +3950,7 @@ static void rtl8169_cleanup(struct rtl8169_private *tp, 
bool going_down)
RTL_W8(tp, ChipCmd, RTL_R8(tp, ChipCmd) | StopReq);
rtl_loop_wait_high(tp, &rtl_txcfg_empty_cond, 100, 666);
break;
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_63:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_MAX:
rtl_enable_rxdvgate(tp);
fsleep(2000);
break;
-- 
2.29.2
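
A reduced illustration of the idea in this patch, using a made-up enum rather than the real r8169 version list: the upper bound of each case range is a sentinel that sits after the newest chip version, so inserting a new enumerator before it needs no change to the switch statements. Case ranges are a GNU C extension, as used in the driver; all MOCK_/mock_ names are hypothetical.

```c
#include <assert.h>

/* Hypothetical, shortened version list; not the real enum mac_version. */
enum mock_mac_version {
	MOCK_VER_39,
	MOCK_VER_40,
	MOCK_VER_61,
	MOCK_VER_63,
	/* a future MOCK_VER_64 would be inserted here */
	MOCK_VER_MAX,			/* one past the newest version */
	MOCK_VER_NONE = MOCK_VER_MAX	/* "unknown chip" shares the value */
};

/* Returns 1 for versions that take the newer code path, 0 otherwise. */
static int mock_uses_new_mdio(enum mock_mac_version ver)
{
	assert(ver <= MOCK_VER_MAX);

	switch (ver) {
	case MOCK_VER_40 ... MOCK_VER_MAX:	/* GNU case range */
		return 1;
	default:
		return 0;
	}
}
```

One consequence visible in the sketch: because the "none" value equals the sentinel, it also falls inside every range that ends at the sentinel. Whether that matters depends on whether the functions can ever be called with an unidentified chip.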

Re: [net-next v4 00/15] Add mlx5 subfunction support

2020-12-15 Thread Alexander Duyck

On Mon, Dec 14, 2020 at 6:44 PM David Ahern  wrote:
>
> On 12/14/20 6:53 PM, Alexander Duyck wrote:
> >> example subfunction usage sequence:
> >> ---
> >> Change device to switchdev mode:
> >> $ devlink dev eswitch set pci/:06:00.0 mode switchdev
> >>
> >> Add a devlink port of subfunction flaovur:
> >> $ devlink port add pci/:06:00.0 flavour pcisf pfnum 0 sfnum 88
> >
> > Typo in your description. Also I don't know if you want to stick with
> > "flavour" or just shorten it to the U.S. spelling which is "flavor".
>
> The term exists in devlink today (since 2018). When support was added to
> iproute2 I decided there was no reason to require the US spelling over
> the British spelling, so I accepted the patch.

Okay. The only reason why I noticed is because "flaovur" is definitely
a wrong spelling. If it is already in the interface then no need to
change it.

[PATCH net v4] lan743x: fix rx_napi_poll/interrupt ping-pong

2020-12-15 Thread Sven Van Asbroeck

From: Sven Van Asbroeck 

Even if there is more rx data waiting on the chip, the rx napi poll fn
will never run more than once - it will always read a few buffers, then
bail out and re-arm interrupts. Which results in ping-pong between napi
and interrupt.

This defeats the purpose of napi, and is bad for performance.

Fix by making the rx napi poll behave identically to other ethernet
drivers:
1. initialize rx napi polling with an arbitrary budget (64).
2. in the polling fn, return full weight if rx queue is not depleted,
   this tells the napi core to "keep polling".
3. update the rx tail ("ring the doorbell") once for every 8 processed
   rx ring buffers.
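
The three points above can be sketched in userspace as follows. This is a simplified model, not the driver code: struct mock_rx, the ring size and the mock_ functions are invented for illustration, and setting irq_armed stands in for completing the poll and re-enabling the rx interrupt.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_RING_SIZE 64

/* Hypothetical stand-in for a driver's rx ring state. */
struct mock_rx {
	int pending;	/* packets waiting on the "chip" */
	int next_index;	/* next ring slot to refill */
	int doorbells;	/* number of tail register writes */
	bool irq_armed;	/* true once polling finished and irqs are back on */
};

/* Write the tail only after every 8th descriptor has been refilled. */
static void mock_update_tail(struct mock_rx *rx, int index)
{
	if ((index & 7) == 7)
		rx->doorbells++;
}

/* Consume one packet and refill its slot; false if nothing was waiting. */
static bool mock_process_packet(struct mock_rx *rx)
{
	if (rx->pending == 0)
		return false;
	rx->pending--;
	mock_update_tail(rx, rx->next_index);
	rx->next_index = (rx->next_index + 1) % MOCK_RING_SIZE;
	return true;
}

/* Poll callback: returning the full weight means "poll me again". */
static int mock_napi_poll(struct mock_rx *rx, int weight)
{
	int count;

	assert(weight > 0);
	for (count = 0; count < weight; count++) {
		if (!mock_process_packet(rx))
			break;
	}
	if (count == weight)
		return weight;	/* budget exhausted, stay in polling mode */

	rx->irq_armed = true;	/* queue drained: complete and re-arm */
	return count;
}
```

With 100 packets waiting and a budget of 64, the first call returns 64 and leaves interrupts off, and the second call returns 36 and re-arms them, so the burst costs one interrupt instead of one per handful of buffers.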

Thanks to Jakub Kicinski, Eric Dumazet and Andrew Lunn for their expert
opinions and suggestions.

Tested with 20 seconds of full bandwidth receive (iperf3):
        rx irqs      softirqs(NET_RX)
        -----------------------------
before  23827        33620
after   129          4081

Tested-by: Sven Van Asbroeck  # lan7430
Fixes: 23f0703c125be ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck 
---

v3 -> v4:
- eliminate potential undefined behaviour in corner case
  (if weight == 0)

v2 -> v3:
- use NAPI_POLL_WEIGHT
  (Heiner Kallweit)

v1 -> v2:
- make napi rx polling behave identically to existing ethernet drivers
  (Jakub Kicinski, Eric Dumazet, Andrew Lunn)

Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git # 7f376f1917d7

To: Bryan Whitehead 
To: Microchip Linux Driver Support 
To: "David S. Miller" 
To: Jakub Kicinski 
Cc: Andrew Lunn 
Cc: Eric Dumazet 
Cc: Heiner Kallweit 
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org

 drivers/net/ethernet/microchip/lan743x_main.c | 43 ++-
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index b319c22c211c..8947c3a62810 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1962,6 +1962,14 @@ static struct sk_buff *lan743x_rx_allocate_skb(struct 
lan743x_rx *rx)
  length, GFP_ATOMIC | GFP_DMA);
 }
 
+static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
+{
+   /* update the tail once per 8 descriptors */
+   if ((index & 7) == 7)
+   lan743x_csr_write(rx->adapter, RX_TAIL(rx->channel_number),
+ index);
+}
+
 static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
struct sk_buff *skb)
 {
@@ -1992,6 +2000,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx 
*rx, int index,
descriptor->data0 = (RX_DESC_DATA0_OWN_ |
(length & RX_DESC_DATA0_BUF_LENGTH_MASK_));
skb_reserve(buffer_info->skb, RX_HEAD_PADDING);
+   lan743x_rx_update_tail(rx, index);
 
return 0;
 }
@@ -2010,6 +2019,7 @@ static void lan743x_rx_reuse_ring_element(struct 
lan743x_rx *rx, int index)
descriptor->data0 = (RX_DESC_DATA0_OWN_ |
((buffer_info->buffer_length) &
RX_DESC_DATA0_BUF_LENGTH_MASK_));
+   lan743x_rx_update_tail(rx, index);
 }
 
 static void lan743x_rx_release_ring_element(struct lan743x_rx *rx, int index)
@@ -2220,6 +2230,7 @@ static int lan743x_rx_napi_poll(struct napi_struct *napi, 
int weight)
 {
struct lan743x_rx *rx = container_of(napi, struct lan743x_rx, napi);
struct lan743x_adapter *adapter = rx->adapter;
+   int result = RX_PROCESS_RESULT_NOTHING_TO_DO;
u32 rx_tail_flags = 0;
int count;
 
@@ -2228,27 +2239,19 @@ static int lan743x_rx_napi_poll(struct napi_struct 
*napi, int weight)
lan743x_csr_write(adapter, DMAC_INT_STS,
  DMAC_INT_BIT_RXFRM_(rx->channel_number));
}
-   count = 0;
-   while (count < weight) {
-   int rx_process_result = lan743x_rx_process_packet(rx);
-
-   if (rx_process_result == RX_PROCESS_RESULT_PACKET_RECEIVED) {
-   count++;
-   } else if (rx_process_result ==
-   RX_PROCESS_RESULT_NOTHING_TO_DO) {
+   for (count = 0; count < weight; count++) {
+   result = lan743x_rx_process_packet(rx);
+   if (result == RX_PROCESS_RESULT_NOTHING_TO_DO)
break;
-   } else if (rx_process_result ==
-   RX_PROCESS_RESULT_PACKET_DROPPED) {
-   continue;
-   }
}
rx->frame_count += count;
-   if (count == weight)
-   goto done;
+   if (count == weight || result == RX_PROCESS_RESULT_PACKET_RECEIVED)
+   return weight;
 
if (!napi_complete_done(napi, count))
-   goto done;
+

Re: [PATCH][next] netfilter: nftables: fix incorrect increment of loop counter

2020-12-15 Thread Pablo Neira Ayuso

On Tue, Dec 15, 2020 at 03:38:30PM +0100, Pablo Neira Ayuso wrote:
> Hi,
> 
> On Mon, Dec 14, 2020 at 11:40:15PM +, Colin King wrote:
> > From: Colin Ian King 
> > 
> > The intention of the err_expr cleanup path is to iterate over the
> > allocated expr_array objects and free them, starting from i - 1 and
> > working down to the start of the array. Currently the loop counter
> > is being incremented instead of decremented and also the index i is
> > being used instead of k, repeatedly destroying the same expr_array
> > element.  Fix this by decrementing k and using k as the index into
> > expr_array.
> > 
> > Addresses-Coverity: ("Infinite loop")
> > Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions 
> > support")
> > Signed-off-by: Colin Ian King 
> 
> Reviewed-by: Pablo Neira Ayuso 
> 
> @Jakub: Would you please take this one into net-next? Thanks!

You marked as "Awaiting Upstream", I'll take care of it.

Thanks.
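
For reference, the cleanup pattern described in the commit message looks roughly like this in isolation. The sketch is generic userspace C with hypothetical mock_ names, not the nftables code: on failure at index i, a second counter k walks down from i - 1 to 0 and each element is destroyed exactly once.

```c
#include <assert.h>
#include <stdlib.h>

static int mock_live;	/* number of currently allocated elements */

static int *mock_expr_alloc(void)
{
	int *e = malloc(sizeof(*e));

	if (e)
		mock_live++;
	return e;
}

static void mock_expr_destroy(int *e)
{
	free(e);
	mock_live--;
}

/*
 * Fill arr[0..n-1]; allocation at index fail_at is forced to fail
 * (pass fail_at == n for no forced failure). Returns 0 or -1.
 */
static int mock_build(int *arr[], int n, int fail_at)
{
	int i, k;

	assert(n >= 0 && fail_at >= 0);
	for (i = 0; i < n; i++) {
		arr[i] = (i == fail_at) ? NULL : mock_expr_alloc();
		if (!arr[i])
			goto err_expr;
	}
	return 0;

err_expr:
	/* Decrement k and index with k, not i. */
	for (k = i - 1; k >= 0; k--) {
		mock_expr_destroy(arr[k]);
		arr[k] = NULL;
	}
	return -1;
}
```

Incrementing k, or indexing with i inside the loop, would either never terminate or keep touching the same slot, which is the behaviour the fix removes.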

[PATCH] net: remove disc_data_lock in ppp line discipline

2020-12-15 Thread Gao Yan

The tty layer provides the tty->ldisc_sem lock to protect tty->disc_data.
For example, while cpu A is running ppp_synctty_ioctl and holds
tty->ldisc_sem, a call to ppp_synctty_close on cpu B will wait until
cpu A releases tty->ldisc_sem. So I think the disc_data_lock is
unnecessary.

        cpu A                           cpu B
tty_ioctl                       tty_reopen
 ->hold tty->ldisc_sem           ->hold tty->ldisc_sem(write), failed
 ->ld->ops->ioctl                ->wait...
 ->release tty->ldisc_sem        ->wait...OK,hold tty->ldisc_sem
                                   ->tty_ldisc_reinit
                                     ->tty_ldisc_close
                                       ->ld->ops->close

Signed-off-by: Gao Yan 
---
 drivers/net/ppp/ppp_async.c   | 5 -
 drivers/net/ppp/ppp_synctty.c | 5 -
 2 files changed, 10 deletions(-)

diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 29a0917a8..f8cb591d6 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -127,17 +127,14 @@ static const struct ppp_channel_ops async_ops = {
  * FIXME: this is no longer true. The _close path for the ldisc is
  * now guaranteed to be sane.
  */
-static DEFINE_RWLOCK(disc_data_lock);
 
 static struct asyncppp *ap_get(struct tty_struct *tty)
 {
struct asyncppp *ap;
 
-   read_lock(&disc_data_lock);
ap = tty->disc_data;
if (ap != NULL)
refcount_inc(&ap->refcnt);
-   read_unlock(&disc_data_lock);
return ap;
 }
 
@@ -216,10 +213,8 @@ ppp_asynctty_close(struct tty_struct *tty)
 {
struct asyncppp *ap;
 
-   write_lock_irq(&disc_data_lock);
ap = tty->disc_data;
tty->disc_data = NULL;
-   write_unlock_irq(&disc_data_lock);
if (!ap)
return;
 
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 0f338752c..8cdf7268c 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -129,17 +129,14 @@ ppp_print_buffer (const char *name, const __u8 *buf, int 
count)
  *
  * FIXME: Fixed in tty_io nowadays.
  */
-static DEFINE_RWLOCK(disc_data_lock);
 
 static struct syncppp *sp_get(struct tty_struct *tty)
 {
struct syncppp *ap;
 
-   read_lock(&disc_data_lock);
ap = tty->disc_data;
if (ap != NULL)
refcount_inc(&ap->refcnt);
-   read_unlock(&disc_data_lock);
return ap;
 }
 
@@ -215,10 +212,8 @@ ppp_sync_close(struct tty_struct *tty)
 {
struct syncppp *ap;
 
-   write_lock_irq(&disc_data_lock);
ap = tty->disc_data;
tty->disc_data = NULL;
-   write_unlock_irq(&disc_data_lock);
if (!ap)
return;
 
-- 
2.17.1

Re: [PATCH v5 3/6] net: dsa: microchip: ksz8795: move register offsets and shifts to separate struct

2020-12-15 Thread Michael Grzeschik


Gentle ping. Did you find time to look into the other patches of the
series? I would really like to send the next version.

Thanks!

On Mon, Dec 07, 2020 at 10:44:15PM +0100, Michael Grzeschik wrote:

On Mon, Dec 07, 2020 at 08:02:57PM +, tristram...@microchip.com wrote:

In order to get this driver used with other switches the functions need
to use different offsets and register shifts. This patch changes the
direct use of the register defines to register description structures,
which can be set depending on the chips register layout.

Signed-off-by: Michael Grzeschik 

---
v1 -> v4: - extracted this change from bigger previous patch
v4 -> v5: - added missing variables in ksz8_r_vlan_entries
 - moved shifts, masks and registers to arrays indexed by enums
 - using unsigned types where possible
---
drivers/net/dsa/microchip/ksz8.h|  69 +++
drivers/net/dsa/microchip/ksz8795.c | 261 +---
drivers/net/dsa/microchip/ksz8795_reg.h |  85 
3 files changed, 253 insertions(+), 162 deletions(-)
create mode 100644 drivers/net/dsa/microchip/ksz8.h


Sorry for not responding to these patches sooner.

There are 3 older KSZ switch families: KSZ8863/73, KSZ8895/64, and KSZ8795/94.
The newer KSZ8795 is not considered an upgrade for KSZ8895, so some of
these switch registers are moved around and some features are dropped.

It is best to have one driver to support all 3 switches, but some operations are
incompatible, so it may be better to keep the drivers separate for now.

For basic operations those issues may not occur so it seems simple to have
one driver handling all 3 switches.  I will come up with a list of those
incompatibilities.


Look into the next patch. I handled many special cases for the ksz8863
in the "net: dsa: microchip: ksz8795: add support for ksz88xx chips".
These cases, including the VLAN, Tagging ... are handled by checking if
the feature IS_88X3 is set. This can be extended to other types as well.

My first version of the patches was an RFC series that was mentioning
that it is based on your RFC series for the ksz8895.

8863 RFC: 
https://patchwork.ozlabs.org/project/netdev/cover/20190508211330.19328-1-m.grzesc...@pengutronix.de/

8895 RFC: https://patchwork.ozlabs.org/patch/822712/

I remember reading the datasheets of all three chips,
8895, 8863 and 8795. After the 8795 series went mainline, the
obvious next step was to get the 8863 into the 8795 code. The result
is this series.

So the obvious question is, how far does your 8895 series differ
from the 8863 switches?


The tail tag format of KSZ8863 is different from KSZ8895 and KSZ8795, but
because of the DSA driver implementation that issue never comes up.


Right. In the first four series I kept an extra tail tag patch. But
after cleaning up I figured that the implementation matched the
one for the KSZ9893. Therefore I reused the tag code.


-static void ksz8_from_vlan(u16 vlan, u8 *fid, u8 *member, u8 *valid)
+static void ksz8_from_vlan(struct ksz_device *dev, u32 vlan, u8 *fid,
+  u8 *member, u8 *valid)
{
-   *fid = vlan & VLAN_TABLE_FID;
-   *member = (vlan & VLAN_TABLE_MEMBERSHIP) >>
VLAN_TABLE_MEMBERSHIP_S;
-   *valid = !!(vlan & VLAN_TABLE_VALID);
+   struct ksz8 *ksz8 = dev->priv;
+   const u32 *masks = ksz8->masks;
+   const u8 *shifts = ksz8->shifts;
+
+   *fid = vlan & masks[VLAN_TABLE_FID];
+   *member = (vlan & masks[VLAN_TABLE_MEMBERSHIP]) >>
+   shifts[VLAN_TABLE_MEMBERSHIP_S];
+   *valid = !!(vlan & masks[VLAN_TABLE_VALID]);
}

-static void ksz8_to_vlan(u8 fid, u8 member, u8 valid, u16 *vlan)
+static void ksz8_to_vlan(struct ksz_device *dev, u8 fid, u8 member, u8
valid,
+u32 *vlan)
{
+   struct ksz8 *ksz8 = dev->priv;
+   const u32 *masks = ksz8->masks;
+   const u8 *shifts = ksz8->shifts;
+
   *vlan = fid;
-   *vlan |= (u16)member << VLAN_TABLE_MEMBERSHIP_S;
+   *vlan |= (u16)member << shifts[VLAN_TABLE_MEMBERSHIP_S];
   if (valid)
-   *vlan |= VLAN_TABLE_VALID;
+   *vlan |= masks[VLAN_TABLE_VALID];
}

static void ksz8_r_vlan_entries(struct ksz_device *dev, u16 addr)
{
+   struct ksz8 *ksz8 = dev->priv;
+   const u8 *shifts = ksz8->shifts;
   u64 data;
   int i;

@@ -418,7 +509,7 @@ static void ksz8_r_vlan_entries(struct ksz_device
*dev, u16 addr)
   addr *= dev->phy_port_cnt;
   for (i = 0; i < dev->phy_port_cnt; i++) {
   dev->vlan_cache[addr + i].table[0] = (u16)data;
-   data >>= VLAN_TABLE_S;
+   data >>= shifts[VLAN_TABLE];
   }
}

@@ -454,6 +545,8 @@ static void ksz8_w_vlan_table(struct ksz_device *dev,
u16 vid, u32 vlan)



The VLAN table operation in KSZ8863 is completely different from KSZ8795.


-/**
- * VLAN_TABLE_FID  00-007F007F-007F007F
- * VLAN_TABLE_MEMBERSHIP   00-0F800F80-0F800F80
- * VLAN_TABLE_VA

Re: [PATCH net-next v2 2/4] sch_htb: Hierarchical QoS hardware offload

2020-12-15 Thread Jamal Hadi Salim


On 2020-12-14 3:30 p.m., Maxim Mikityanskiy wrote:

On 2020-12-14 21:35, Cong Wang wrote:
On Mon, Dec 14, 2020 at 7:13 AM Maxim Mikityanskiy 
 wrote:


On 2020-12-11 21:16, Cong Wang wrote:
On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy 
 wrote:







Interesting, please explain how your HTB offload still has a global rate
limit and borrowing across queues?


Sure, I will explain that.


I simply can't see it, all I can see
is you offload HTB into each queue in ->attach(),


In the non-offload mode, the same HTB instance would be attached to all 
queues. In the offload mode, HTB behaves like MQ: there is a root 
instance of HTB, but each queue gets a separate simple qdisc (pfifo). 
Only the root qdisc (HTB) gets offloaded, and when that happens, the NIC 
creates an object for the QoS root.


Then all configuration changes are sent to the driver, and it issues the 
corresponding firmware commands to replicate the whole hierarchy in the 
NIC. Leaf classes correspond to queue groups (in this implementation 
queue groups contain only one queue, but it can be extended),



FWIW, it is very valuable to be able to abstract HTB if the hardware
can emulate it (users don't have to learn new abstractions).
Since you are expressing a limitation above:
How does the user discover if they over-provisioned, i.e. the single
queue example above? If there are too many corner cases it may
make sense to just create a new qdisc.

and inner 
classes correspond to entities called TSARs.


The information about rate limits is stored inside TSARs and queue 
groups. Queues know what groups they belong to, and groups and TSARs 
know what TSAR is their parent. A queue is picked in ndo_select_queue by 
looking at the classification result of clsact. So, when a packet is put 
onto a queue, the NIC can track the whole hierarchy and do the HTB 
algorithm.
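
The parent-linked hierarchy described above can be pictured with a small user-space sketch. This is not the driver or firmware data model: toy_sched_node and its helpers are hypothetical, rates are in arbitrary units, and taking the minimum ceiling along the path to the root is a simplification of HTB's borrowing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A queue group (leaf) or an inner scheduling node; each knows its parent. */
struct toy_sched_node {
	const struct toy_sched_node *parent; /* NULL for the root */
	uint64_t ceil;                       /* upper rate limit, arbitrary units */
};

/* Walk from a node to the root and return the tightest ceiling on the path. */
static uint64_t toy_effective_ceil(const struct toy_sched_node *n)
{
	uint64_t ceil = UINT64_MAX;

	for (; n; n = n->parent)
		if (n->ceil < ceil)
			ceil = n->ceil;
	return ceil;
}

/* Number of ancestors between a node and the root. */
static unsigned int toy_depth(const struct toy_sched_node *n)
{
	unsigned int depth = 0;

	for (n = n->parent; n; n = n->parent)
		depth++;
	return depth;
}
```

With a root capped at 1000, an inner node at 600 and a leaf at 800, the leaf is effectively limited to 600, because the walk towards the root finds the tighter inner ceiling.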




Same question above:
Is there a limit to the number of classes that can be created?
IOW, if someone just created an arbitrary number of queues, do they
get errored out if it doesn't make sense for the hardware?
If such limits exist, it may make sense to provide a knob to query
(maybe ethtool) and if such limits can be adjusted it may be worth
looking at providing interfaces via devlink.

cheers,
jamal



Re: [RFC PATCH net-next 07/16] net: mscc: ocelot: set up the bonding mask in a way that avoids a net_device

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:53+0200, Vladimir Oltean wrote:
> Since this code should be called from pure switchdev as well as from
> DSA, we must find a way to determine the bonding mask not by looking
> directly at the net_device lowers of the bonding interface, since those
> could have different private structures.
> 
> We keep a pointer to the bonding upper interface, if present, in struct
> ocelot_port. Then the bonding mask becomes the bitwise OR of all ports
> that have the same bonding upper interface. This adds a duplication of
> functionality with the current "lags" array, but the duplication will be
> short-lived, since further patches will remove the latter completely.
> 
> Signed-off-by: Vladimir Oltean 
Reviewed-by: Alexandre Belloni 

> ---
>  drivers/net/ethernet/mscc/ocelot.c | 29 ++---
>  include/soc/mscc/ocelot.h  |  2 ++
>  2 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot.c 
> b/drivers/net/ethernet/mscc/ocelot.c
> index 13e86dd71e5a..30dee1f957d1 100644
> --- a/drivers/net/ethernet/mscc/ocelot.c
> +++ b/drivers/net/ethernet/mscc/ocelot.c
> @@ -881,6 +881,24 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port,
>  }
>  EXPORT_SYMBOL(ocelot_get_ts_info);
>  
> +static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device 
> *bond)
> +{
> + u32 bond_mask = 0;
> + int port;
> +
> + for (port = 0; port < ocelot->num_phys_ports; port++) {
> + struct ocelot_port *ocelot_port = ocelot->ports[port];
> +
> + if (!ocelot_port)
> + continue;
> +
> + if (ocelot_port->bond == bond)
> + bond_mask |= BIT(port);
> + }
> +
> + return bond_mask;
> +}
> +
>  void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
>  {
>   struct ocelot_port *ocelot_port = ocelot->ports[port];
> @@ -1272,17 +1290,12 @@ static void ocelot_setup_lag(struct ocelot *ocelot, 
> int lag)
>  int ocelot_port_lag_join(struct ocelot *ocelot, int port,
>struct net_device *bond)
>  {
> - struct net_device *ndev;
>   u32 bond_mask = 0;
>   int lag, lp;
>  
> - rcu_read_lock();
> - for_each_netdev_in_bond_rcu(bond, ndev) {
> - struct ocelot_port_private *priv = netdev_priv(ndev);
> + ocelot->ports[port]->bond = bond;
>  
> - bond_mask |= BIT(priv->chip_port);
> - }
> - rcu_read_unlock();
> + bond_mask = ocelot_get_bond_mask(ocelot, bond);
>  
>   lp = __ffs(bond_mask);
>  
> @@ -1315,6 +1328,8 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int 
> port,
>   u32 port_cfg;
>   int i;
>  
> + ocelot->ports[port]->bond = NULL;
> +
>   /* Remove port from any lag */
>   for (i = 0; i < ocelot->num_phys_ports; i++)
>   ocelot->lags[i] &= ~BIT(port);
> diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> index 50514c087231..b812bdff1da1 100644
> --- a/include/soc/mscc/ocelot.h
> +++ b/include/soc/mscc/ocelot.h
> @@ -597,6 +597,8 @@ struct ocelot_port {
>   phy_interface_t phy_mode;
>  
>   u8  *xmit_template;
> +
> + struct net_device   *bond;
>  };
>  
>  struct ocelot {
> -- 
> 2.25.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[net-next PATCH v2 00/14] ACPI support for dpaa2 driver

2020-12-15 Thread Calvin Johnson



This patch set provides ACPI support to DPAA2 network drivers.

It also introduces new fwnode-based APIs to support the phylink and phy
layers. The following functions are defined:
  phylink_fwnode_phy_connect()
  fwnode_mdiobus_register_phy()
  fwnode_mdiobus_register()
  fwnode_get_phy_id()
  fwnode_phy_find_device()
  device_phy_find_device()
  fwnode_get_phy_node()
  fwnode_mdio_find_device()
  fwnode_get_id()

The first one helps in connecting a phy to a phylink instance.
The next three help in getting the phy_id and registering a phy to the mdiobus.
The next two help in finding a phy on an mdiobus.
The next one helps in getting the phy_node from a fwnode.
The last one is used to get the fwnode ID.

Corresponding OF functions are refactored.
END


Changes in v2:
- Updated with more description in document
- use reverse christmas tree ordering for local variables
- Refactor OF functions to use fwnode functions

Calvin Johnson (14):
  Documentation: ACPI: DSD: Document MDIO PHY
  net: phy: Introduce phy related fwnode functions
  of: mdio: Refactor of_phy_find_device()
  net: phy: Introduce fwnode_get_phy_id()
  of: mdio: Refactor of_get_phy_id()
  net: mdiobus: Introduce fwnode_mdiobus_register_phy()
  of: mdio: Refactor of_mdiobus_register_phy()
  net: mdiobus: Introduce fwnode_mdiobus_register()
  net/fsl: Use fwnode_mdiobus_register()
  device property: Introduce fwnode_get_id()
  phylink: introduce phylink_fwnode_phy_connect()
  net: phylink: Refactor phylink_of_phy_connect()
  net: phy: Introduce fwnode_mdio_find_device()
  net: dpaa2-mac: Add ACPI support for DPAA2 MAC driver

 Documentation/firmware-guide/acpi/dsd/phy.rst | 129 ++
 drivers/base/property.c   |  26 
 .../net/ethernet/freescale/dpaa2/dpaa2-mac.c  |  86 +++-
 drivers/net/ethernet/freescale/xgmac_mdio.c   |  14 +-
 drivers/net/mdio/of_mdio.c|  79 +--
 drivers/net/phy/mdio_bus.c| 116 
 drivers/net/phy/phy_device.c  | 108 +++
 drivers/net/phy/phylink.c |  49 ---
 include/linux/mdio.h  |   2 +
 include/linux/of_mdio.h   |   6 +-
 include/linux/phy.h   |  32 +
 include/linux/phylink.h   |   3 +
 include/linux/property.h  |   1 +
 13 files changed, 519 insertions(+), 132 deletions(-)
 create mode 100644 Documentation/firmware-guide/acpi/dsd/phy.rst

-- 
2.17.1

[net-next PATCH v2 01/14] Documentation: ACPI: DSD: Document MDIO PHY

2020-12-15 Thread Calvin Johnson

Introduce an ACPI mechanism to register the PHYs on an MDIO bus and
make them available for connection to a MAC.

Describe properties "phy-handle" and "phy-mode".

Signed-off-by: Calvin Johnson 
---

Changes in v2:
- Updated with more description in document

 Documentation/firmware-guide/acpi/dsd/phy.rst | 129 ++
 1 file changed, 129 insertions(+)
 create mode 100644 Documentation/firmware-guide/acpi/dsd/phy.rst

diff --git a/Documentation/firmware-guide/acpi/dsd/phy.rst 
b/Documentation/firmware-guide/acpi/dsd/phy.rst
new file mode 100644
index 000000000000..a2e4fdcdbf53
--- /dev/null
+++ b/Documentation/firmware-guide/acpi/dsd/phy.rst
@@ -0,0 +1,129 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+MDIO bus and PHYs in ACPI
+=========================
+
+The PHYs on an MDIO bus [1] are probed and registered using
+fwnode_mdiobus_register_phy().
+Later, for connecting these PHYs to MAC, the PHYs registered on the
+mdiobus have to be referenced.
+
+UUID given below should be used as mentioned in the "Device Properties
+UUID For _DSD" [2] document.
+   - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
+
+This document introduces two _DSD properties that are to be used
+for PHYs on the MDIO bus.[3]
+
+phy-handle
+----------
+For each MAC node, a device property "phy-handle" is used to reference
+the PHY that is registered on an MDIO bus. This is mandatory for
+network interfaces that have PHYs connected to MAC via MDIO bus.
+
+During the MDIO bus driver initialization, PHYs on this bus are probed
+using the _ADR object as shown below and are registered on the mdio bus.
+
+::
+  Scope(\_SB.MDI0)
+  {
+Device(PHY1) {
+  Name (_ADR, 0x1)
+} // end of PHY1
+
+Device(PHY2) {
+  Name (_ADR, 0x2)
+} // end of PHY2
+  }
+
+Later, during the MAC driver initialization, the registered PHY devices
+have to be retrieved from the mdio bus. For this, MAC driver needs
+reference to the previously registered PHYs which are provided
+using reference to the device as {\_SB.MDI0.PHY1}.
+
+phy-mode
+--------
+The "phy-mode" _DSD property is used to describe the connection to
+the PHY. The valid values for "phy-mode" are defined in [4].
+
+
+An ASL example of this is shown below.
+
+DSDT entry for MDIO node
+------------------------
+The MDIO bus has an SoC component(mdio controller) and a platform
+component(PHYs on the mdiobus).
+
+a) Silicon Component
+This node describes the MDIO controller,MDI0
+--------------------------------------------
+::
+   Scope(_SB)
+   {
+ Device(MDI0) {
+   Name(_HID, "NXP0006")
+   Name(_CCA, 1)
+   Name(_UID, 0)
+   Name(_CRS, ResourceTemplate() {
+ Memory32Fixed(ReadWrite, MDI0_BASE, MDI_LEN)
+ Interrupt(ResourceConsumer, Level, ActiveHigh, Shared)
+  {
+MDI0_IT
+  }
+   }) // end of _CRS for MDI0
+ } // end of MDI0
+   }
+
+b) Platform Component
+This node defines the PHYs that are connected to the MDIO bus, MDI0
+-------------------------------------------------------------------
+::
+   Scope(\_SB.MDI0)
+   {
+ Device(PHY1) {
+   Name (_ADR, 0x1)
+ } // end of PHY1
+
+ Device(PHY2) {
+   Name (_ADR, 0x2)
+ } // end of PHY2
+   }
+
+
+Below are the MAC nodes where PHY nodes are referenced.
+phy-mode and phy-handle are used as explained earlier.
+------------------------------------------------------
+::
+   Scope(\_SB.MCE0.PR17)
+   {
+ Name (_DSD, Package () {
+ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
+Package () {
+Package (2) {"phy-mode", "rgmii-id"},
+Package (2) {"phy-handle", \_SB.MDI0.PHY1}
+ }
+  })
+   }
+
+   Scope(\_SB.MCE0.PR18)
+   {
+ Name (_DSD, Package () {
+   ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
+   Package () {
+   Package (2) {"phy-mode", "rgmii-id"},
+   Package (2) {"phy-handle", \_SB.MDI0.PHY2}
+   }
+ })
+   }
+
+References
+==========
+
+[1] Documentation/networking/phy.rst
+
+[2] 
https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
+
+[3] Documentation/firmware-guide/acpi/DSD-properties-rules.rst
+
+[4] Documentation/devicetree/bindings/net/ethernet-controller.yaml
-- 
2.17.1

[net-next PATCH v2 02/14] net: phy: Introduce phy related fwnode functions

2020-12-15 Thread Calvin Johnson

Define fwnode_phy_find_device() to iterate an mdiobus and find the
phy device of the provided phy fwnode. Additionally define
device_phy_find_device() to find phy device of provided device.

Define fwnode_get_phy_node() to get phy_node using named reference.
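
The lookup order that fwnode_get_phy_node() implements in the diff below can be summarised with a small user-space model. The toy_node and toy_ref types and the integer "target" are hypothetical stand-ins for fwnode handles; -1 plays the role of an error pointer. It is a sketch of the fallback logic only, not of the fwnode API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A named reference from a node to some target (hypothetical model). */
struct toy_ref {
	const char *name;
	int target;
};

struct toy_node {
	bool is_acpi;
	const struct toy_ref *refs;
	size_t nrefs;
};

/* Return the target of the named reference, or -1 if it is absent. */
static int toy_find_reference(const struct toy_node *node, const char *name)
{
	size_t i;

	for (i = 0; i < node->nrefs; i++)
		if (strcmp(node->refs[i].name, name) == 0)
			return node->refs[i].target;
	return -1;
}

/*
 * Try "phy-handle" first. ACPI nodes stop there, found or not.
 * Other nodes fall back to "phy" and then to "phy-device".
 */
static int toy_get_phy_node(const struct toy_node *node)
{
	int target = toy_find_reference(node, "phy-handle");

	if (node->is_acpi || target >= 0)
		return target;
	target = toy_find_reference(node, "phy");
	if (target < 0)
		target = toy_find_reference(node, "phy-device");
	return target;
}
```

A DT-style node that only carries "phy-device" still resolves, while an ACPI-style node that only carries "phy" does not.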

Signed-off-by: Calvin Johnson 
---

Changes in v2:
- use reverse christmas tree ordering for local variables

 drivers/net/phy/phy_device.c | 64 
 include/linux/phy.h  | 20 +++
 2 files changed, 84 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 80c2e646c093..c153273606c1 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -9,6 +9,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -2829,6 +2830,69 @@ static bool phy_drv_supports_irq(struct phy_driver 
*phydrv)
return phydrv->config_intr && phydrv->handle_interrupt;
 }
 
+/**
+ * fwnode_phy_find_device - Find phy_device on the mdiobus for the provided
+ * phy_fwnode.
+ * @phy_fwnode: Pointer to the phy's fwnode.
+ *
+ * If successful, returns a pointer to the phy_device with the embedded
+ * struct device refcount incremented by one, or NULL on failure.
+ */
+struct phy_device *fwnode_phy_find_device(struct fwnode_handle *phy_fwnode)
+{
+   struct mdio_device *mdiodev;
+   struct device *d;
+
+   if (!phy_fwnode)
+   return NULL;
+
+   d = bus_find_device_by_fwnode(&mdio_bus_type, phy_fwnode);
+   if (d) {
+   mdiodev = to_mdio_device(d);
+   if (mdiodev->flags & MDIO_DEVICE_FLAG_PHY)
+   return to_phy_device(d);
+   put_device(d);
+   }
+
+   return NULL;
+}
+EXPORT_SYMBOL(fwnode_phy_find_device);
+
+/**
+ * device_phy_find_device - For the given device, get the phy_device
+ * @dev: Pointer to the given device
+ *
+ * Refer return conditions of fwnode_phy_find_device().
+ */
+struct phy_device *device_phy_find_device(struct device *dev)
+{
+   return fwnode_phy_find_device(dev_fwnode(dev));
+}
+EXPORT_SYMBOL_GPL(device_phy_find_device);
+
+/**
+ * fwnode_get_phy_node - Get the phy_node using the named reference.
+ * @fwnode: Pointer to fwnode from which phy_node has to be obtained.
+ *
+ * Refer return conditions of fwnode_find_reference().
+ * For ACPI, only "phy-handle" is supported. DT supports all the three
+ * named references to the phy node.
+ */
+struct fwnode_handle *fwnode_get_phy_node(struct fwnode_handle *fwnode)
+{
+   struct fwnode_handle *phy_node;
+
+   /* Only phy-handle is used for ACPI */
+   phy_node = fwnode_find_reference(fwnode, "phy-handle", 0);
+   if (is_acpi_node(fwnode) || !IS_ERR(phy_node))
+   return phy_node;
+   phy_node = fwnode_find_reference(fwnode, "phy", 0);
+   if (IS_ERR(phy_node))
+   phy_node = fwnode_find_reference(fwnode, "phy-device", 0);
+   return phy_node;
+}
+EXPORT_SYMBOL_GPL(fwnode_get_phy_node);
+
 /**
  * phy_probe - probe and init a PHY device
  * @dev: device to probe and init
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 381a95732b6a..7790a9a56d0f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1341,10 +1341,30 @@ struct phy_device *phy_device_create(struct mii_bus 
*bus, int addr, u32 phy_id,
 bool is_c45,
 struct phy_c45_device_ids *c45_ids);
 #if IS_ENABLED(CONFIG_PHYLIB)
+struct phy_device *fwnode_phy_find_device(struct fwnode_handle *phy_fwnode);
+struct phy_device *device_phy_find_device(struct device *dev);
+struct fwnode_handle *fwnode_get_phy_node(struct fwnode_handle *fwnode);
 struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45);
 int phy_device_register(struct phy_device *phy);
 void phy_device_free(struct phy_device *phydev);
 #else
+static inline
+struct phy_device *fwnode_phy_find_device(struct fwnode_handle *phy_fwnode)
+{
+   return NULL;
+}
+
+static inline struct phy_device *device_phy_find_device(struct device *dev)
+{
+   return NULL;
+}
+
+static inline
+struct fwnode_handle *fwnode_get_phy_node(struct fwnode_handle *fwnode)
+{
+   return NULL;
+}
+
 static inline
 struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
 {
-- 
2.17.1

[net-next PATCH v2 10/14] device property: Introduce fwnode_get_id()

2020-12-15 Thread Calvin Johnson

Using fwnode_get_id(), get the reg property value for DT node
and get the _ADR object value for ACPI node.

Signed-off-by: Calvin Johnson 
---

Changes in v2: None

 drivers/base/property.c  | 26 ++
 include/linux/property.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 4c43d30145c6..1c50e17ae879 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -580,6 +580,32 @@ const char *fwnode_get_name_prefix(const struct 
fwnode_handle *fwnode)
return fwnode_call_ptr_op(fwnode, get_name_prefix);
 }
 
+/**
+ * fwnode_get_id - Get the id of a fwnode.
+ * @fwnode: firmware node
+ * @id: id of the fwnode
+ *
+ * Returns 0 on success or a negative errno.
+ */
+int fwnode_get_id(struct fwnode_handle *fwnode, u32 *id)
+{
+   unsigned long long adr;
+   acpi_status status;
+
+   if (is_of_node(fwnode)) {
+   return of_property_read_u32(to_of_node(fwnode), "reg", id);
+   } else if (is_acpi_node(fwnode)) {
+   status = acpi_evaluate_integer(ACPI_HANDLE_FWNODE(fwnode),
+  METHOD_NAME__ADR, NULL, &adr);
+   if (ACPI_FAILURE(status))
+   return -ENODATA;
+   *id = (u32)adr;
+   return 0;
+   }
+   return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(fwnode_get_id);
+
 /**
  * fwnode_get_parent - Return parent firwmare node
  * @fwnode: Firmware whose parent is retrieved
diff --git a/include/linux/property.h b/include/linux/property.h
index 2d4542629d80..92d405cf2b07 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -82,6 +82,7 @@ struct fwnode_handle *fwnode_find_reference(const struct 
fwnode_handle *fwnode,
 
 const char *fwnode_get_name(const struct fwnode_handle *fwnode);
 const char *fwnode_get_name_prefix(const struct fwnode_handle *fwnode);
+int fwnode_get_id(struct fwnode_handle *fwnode, u32 *id);
 struct fwnode_handle *fwnode_get_parent(const struct fwnode_handle *fwnode);
 struct fwnode_handle *fwnode_get_next_parent(
struct fwnode_handle *fwnode);
-- 
2.17.1

[net-next PATCH v2 08/14] net: mdiobus: Introduce fwnode_mdiobus_register()

2020-12-15 Thread Calvin Johnson

Introduce fwnode_mdiobus_register() to register PHYs on the mdiobus.
If the fwnode is a DT node, then call of_mdiobus_register().
If it is an ACPI node, then:
- disable auto probing of mdiobus
- register mdiobus
- save fwnode to mdio structure
- loop over child nodes & register a phy_device for each PHY
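
The ACPI branch listed above can be sketched in user space as follows. The toy_bus and toy_child types are hypothetical, a bitmap stands in for actual PHY registration, and the step of saving the fwnode is left out. TOY_PHY_MAX_ADDR mirrors the kernel's PHY_MAX_ADDR of 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PHY_MAX_ADDR 32

/* A child node; has_adr is false when evaluating _ADR would fail. */
struct toy_child {
	bool has_adr;
	unsigned long long adr;
};

struct toy_bus {
	uint32_t phy_mask;   /* addresses excluded from auto probing */
	uint32_t registered; /* bitmap of addresses that got a PHY */
};

static int toy_mdiobus_register(struct toy_bus *bus,
				const struct toy_child *children, size_t n)
{
	size_t i;

	/* Mask out every address so the bus does not auto probe. */
	bus->phy_mask = ~(uint32_t)0;
	bus->registered = 0;

	for (i = 0; i < n; i++) {
		/* Skip children without a usable address. */
		if (!children[i].has_adr)
			continue;
		if (children[i].adr >= TOY_PHY_MAX_ADDR)
			continue;
		bus->registered |= (uint32_t)1 << children[i].adr;
	}
	return 0;
}
```

Children at addresses 1 and 2 end up registered; a child without an address or with an out-of-range address is skipped without failing the whole bus.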

Signed-off-by: Calvin Johnson 
---

Changes in v2: None

 drivers/net/phy/mdio_bus.c | 50 ++
 include/linux/phy.h|  1 +
 2 files changed, 51 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 3361a1a86e97..e7ad34908936 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -8,6 +8,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -567,6 +568,55 @@ static int mdiobus_create_device(struct mii_bus *bus,
return ret;
 }
 
+/**
+ * fwnode_mdiobus_register - Register mii_bus and create PHYs from fwnode
+ * @mdio: pointer to mii_bus structure
+ * @fwnode: pointer to fwnode of MDIO bus.
+ *
+ * This function registers the mii_bus structure and registers a phy_device
+ * for each child node of @fwnode.
+ */
+int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode)
+{
+   struct fwnode_handle *child;
+   unsigned long long addr;
+   acpi_status status;
+   int ret;
+
+   if (is_of_node(fwnode)) {
+   return of_mdiobus_register(mdio, to_of_node(fwnode));
+   } else if (is_acpi_node(fwnode)) {
+   /* Mask out all PHYs from auto probing. */
+   mdio->phy_mask = ~0;
+   ret = mdiobus_register(mdio);
+   if (ret)
+   return ret;
+
+   mdio->dev.fwnode = fwnode;
+   /* Loop over the child nodes and register a phy_device for each PHY */
+   fwnode_for_each_child_node(fwnode, child) {
+   status = 
acpi_evaluate_integer(ACPI_HANDLE_FWNODE(child),
+  "_ADR", NULL, &addr);
+   if (ACPI_FAILURE(status)) {
+   pr_debug("_ADR returned %d\n", status);
+   continue;
+   }
+
+   if (addr < 0 || addr >= PHY_MAX_ADDR)
+   continue;
+
+   ret = fwnode_mdiobus_register_phy(mdio, child, addr);
+   if (ret == -ENODEV)
+   dev_err(&mdio->dev,
+   "MDIO device at address %lld is 
missing.\n",
+   addr);
+   }
+   return 0;
+   }
+   return -EINVAL;
+}
+EXPORT_SYMBOL(fwnode_mdiobus_register);
+
 /**
  * __mdiobus_register - bring up all the PHYs on a given bus and attach them 
to bus
  * @bus: target mii_bus
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 10a66b65a008..67ea4ca6f76f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -383,6 +383,7 @@ static inline struct mii_bus *mdiobus_alloc(void)
return mdiobus_alloc_size(0);
 }
 
+int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle 
*fwnode);
 int __mdiobus_register(struct mii_bus *bus, struct module *owner);
 int __devm_mdiobus_register(struct device *dev, struct mii_bus *bus,
struct module *owner);
-- 
2.17.1

[net-next PATCH v2 11/14] phylink: introduce phylink_fwnode_phy_connect()

2020-12-15 Thread Calvin Johnson

Define phylink_fwnode_phy_connect() to connect phy specified by
a fwnode to a phylink instance.

Signed-off-by: Calvin Johnson 
---

Changes in v2: None

 drivers/net/phy/phylink.c | 54 +++
 include/linux/phylink.h   |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 84f6e197f965..389dc3ec165e 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -5,6 +5,7 @@
  *
  * Copyright (C) 2015 Russell King
  */
+#include 
 #include 
 #include 
 #include 
@@ -1120,6 +1121,59 @@ int phylink_of_phy_connect(struct phylink *pl, struct 
device_node *dn,
 }
 EXPORT_SYMBOL_GPL(phylink_of_phy_connect);
 
+/**
+ * phylink_fwnode_phy_connect() - connect the PHY specified in the fwnode.
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ * @fwnode: a pointer to a &struct fwnode_handle.
+ * @flags: PHY-specific flags to communicate to the PHY device driver
+ *
+ * Connect the phy specified @fwnode to the phylink instance specified
+ * by @pl.
+ *
+ * Returns 0 on success or a negative errno.
+ */
+int phylink_fwnode_phy_connect(struct phylink *pl,
+  struct fwnode_handle *fwnode,
+  u32 flags)
+{
+   struct fwnode_handle *phy_fwnode;
+   struct phy_device *phy_dev;
+   int ret;
+
+   if (is_of_node(fwnode)) {
+   /* Fixed links and 802.3z are handled without needing a PHY */
+   if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
+   (pl->cfg_link_an_mode == MLO_AN_INBAND &&
+phy_interface_mode_is_8023z(pl->link_interface)))
+   return 0;
+   }
+
+   phy_fwnode = fwnode_get_phy_node(fwnode);
+   if (IS_ERR(phy_fwnode)) {
+   if (pl->cfg_link_an_mode == MLO_AN_PHY)
+   return -ENODEV;
+   return 0;
+   }
+
+   phy_dev = fwnode_phy_find_device(phy_fwnode);
+   /* We're done with the phy_node handle */
+   fwnode_handle_put(phy_fwnode);
+   if (!phy_dev)
+   return -ENODEV;
+
+   ret = phy_attach_direct(pl->netdev, phy_dev, flags,
+   pl->link_interface);
+   if (ret)
+   return ret;
+
+   ret = phylink_bringup_phy(pl, phy_dev, pl->link_config.interface);
+   if (ret)
+   phy_detach(phy_dev);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(phylink_fwnode_phy_connect);
+
 /**
  * phylink_disconnect_phy() - disconnect any PHY attached to the phylink
  *   instance.
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index d81a714cfbbd..75d4f99090fd 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -439,6 +439,9 @@ void phylink_destroy(struct phylink *);
 
 int phylink_connect_phy(struct phylink *, struct phy_device *);
 int phylink_of_phy_connect(struct phylink *, struct device_node *, u32 flags);
+int phylink_fwnode_phy_connect(struct phylink *pl,
+  struct fwnode_handle *fwnode,
+  u32 flags);
 void phylink_disconnect_phy(struct phylink *);
 
 void phylink_mac_change(struct phylink *, bool up);
-- 
2.17.1

[net-next PATCH v2 12/14] net: phylink: Refactor phylink_of_phy_connect()

2020-12-15 Thread Calvin Johnson

Refactor phylink_of_phy_connect() to use phylink_fwnode_phy_connect().

Signed-off-by: Calvin Johnson 
---

Changes in v2: None

 drivers/net/phy/phylink.c | 39 +--
 1 file changed, 1 insertion(+), 38 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 389dc3ec165e..26f014f0ad42 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1080,44 +1080,7 @@ EXPORT_SYMBOL_GPL(phylink_connect_phy);
 int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn,
   u32 flags)
 {
-   struct device_node *phy_node;
-   struct phy_device *phy_dev;
-   int ret;
-
-   /* Fixed links and 802.3z are handled without needing a PHY */
-   if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
-   (pl->cfg_link_an_mode == MLO_AN_INBAND &&
-phy_interface_mode_is_8023z(pl->link_interface)))
-   return 0;
-
-   phy_node = of_parse_phandle(dn, "phy-handle", 0);
-   if (!phy_node)
-   phy_node = of_parse_phandle(dn, "phy", 0);
-   if (!phy_node)
-   phy_node = of_parse_phandle(dn, "phy-device", 0);
-
-   if (!phy_node) {
-   if (pl->cfg_link_an_mode == MLO_AN_PHY)
-   return -ENODEV;
-   return 0;
-   }
-
-   phy_dev = of_phy_find_device(phy_node);
-   /* We're done with the phy_node handle */
-   of_node_put(phy_node);
-   if (!phy_dev)
-   return -ENODEV;
-
-   ret = phy_attach_direct(pl->netdev, phy_dev, flags,
-   pl->link_interface);
-   if (ret)
-   return ret;
-
-   ret = phylink_bringup_phy(pl, phy_dev, pl->link_config.interface);
-   if (ret)
-   phy_detach(phy_dev);
-
-   return ret;
+   return phylink_fwnode_phy_connect(pl, of_fwnode_handle(dn), flags);
 }
 EXPORT_SYMBOL_GPL(phylink_of_phy_connect);
 
-- 
2.17.1

[net-next PATCH v2 14/14] net: dpaa2-mac: Add ACPI support for DPAA2 MAC driver

2020-12-15 Thread Calvin Johnson

Modify dpaa2_mac_connect() to support ACPI along with DT.
Modify dpaa2_mac_get_node() to get the dpmac fwnode from either
DT or ACPI.

Replace of_get_phy_mode with fwnode_get_phy_mode to get
phy-mode for a dpmac_node.

Use helper function phylink_fwnode_phy_connect() to find phy_dev and
connect to mac->phylink.

Signed-off-by: Calvin Johnson 
---

Changes in v2:
- Refactor OF functions to use fwnode functions
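
The child lookup that dpaa2_mac_get_node() performs after this change can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the driver code: struct mock_node and mock_find_dpmac() are
hypothetical names, and the array stands in for the children that
fwnode_for_each_child_node() would visit under either a DT or an ACPI parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one firmware child node (DT or ACPI). */
struct mock_node {
	int has_id;	/* 0 models fwnode_get_id() returning an error */
	uint32_t id;	/* value fwnode_get_id() would report on success */
};

/*
 * Walk the children and return the first one whose ID matches dpmac_id.
 * Children whose ID cannot be read are skipped rather than treated as
 * a failure, mirroring the "continue" in the loop of the patch.
 */
static const struct mock_node *
mock_find_dpmac(const struct mock_node *children, size_t n, uint16_t dpmac_id)
{
	size_t i;

	assert(children != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!children[i].has_id)
			continue;	/* unreadable ID: try the next child */
		if (children[i].id == dpmac_id)
			return &children[i];
	}

	return NULL;	/* no child carries the requested ID */
}
```

The sketch leaves out reference counting; in the patch the "dpmacs" node is
released with of_node_put() on the DT path only.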

 .../net/ethernet/freescale/dpaa2/dpaa2-mac.c  | 86 +++
 1 file changed, 50 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
index 828c177df03d..c242d5c2a9ed 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
 /* Copyright 2019 NXP */
 
+#include 
+#include 
+
 #include "dpaa2-eth.h"
 #include "dpaa2-mac.h"
 
@@ -34,39 +37,47 @@ static int phy_mode(enum dpmac_eth_if eth_if, phy_interface_t *if_mode)
return 0;
 }
 
-/* Caller must call of_node_put on the returned value */
-static struct device_node *dpaa2_mac_get_node(u16 dpmac_id)
+static struct fwnode_handle *dpaa2_mac_get_node(struct device *dev,
+   u16 dpmac_id)
 {
-   struct device_node *dpmacs, *dpmac = NULL;
-   u32 id;
+   struct device_node *dpmacs = NULL;
+   struct fwnode_handle *parent, *child  = NULL;
int err;
+   u32 id;
 
-   dpmacs = of_find_node_by_name(NULL, "dpmacs");
-   if (!dpmacs)
-   return NULL;
+   if (is_of_node(dev->parent->fwnode)) {
+   dpmacs = of_find_node_by_name(NULL, "dpmacs");
+   if (!dpmacs)
+   return NULL;
+   parent = of_fwnode_handle(dpmacs);
+   } else if (is_acpi_node(dev->parent->fwnode)) {
+   parent = dev->parent->fwnode;
+   }
 
-   while ((dpmac = of_get_next_child(dpmacs, dpmac)) != NULL) {
-   err = of_property_read_u32(dpmac, "reg", &id);
-   if (err)
+   fwnode_for_each_child_node(parent, child) {
+   err = fwnode_get_id(child, &id);
+   if (err) {
continue;
-   if (id == dpmac_id)
-   break;
+   } else if (id == dpmac_id) {
+   if (is_of_node(dev->parent->fwnode))
+   of_node_put(dpmacs);
+   return child;
+   }
}
-
-   of_node_put(dpmacs);
-
-   return dpmac;
+   if (is_of_node(dev->parent->fwnode))
+   of_node_put(dpmacs);
+   return NULL;
 }
 
-static int dpaa2_mac_get_if_mode(struct device_node *node,
+static int dpaa2_mac_get_if_mode(struct fwnode_handle *dpmac_node,
 struct dpmac_attr attr)
 {
phy_interface_t if_mode;
int err;
 
-   err = of_get_phy_mode(node, &if_mode);
-   if (!err)
-   return if_mode;
+   err = fwnode_get_phy_mode(dpmac_node);
+   if (err > 0)
+   return err;
 
err = phy_mode(attr.eth_if, &if_mode);
if (!err)
@@ -255,26 +266,27 @@ bool dpaa2_mac_is_type_fixed(struct fsl_mc_device *dpmac_dev,
 }
 
 static int dpaa2_pcs_create(struct dpaa2_mac *mac,
-   struct device_node *dpmac_node, int id)
+   struct fwnode_handle *dpmac_node,
+   int id)
 {
struct mdio_device *mdiodev;
-   struct device_node *node;
+   struct fwnode_handle *node;
 
-   node = of_parse_phandle(dpmac_node, "pcs-handle", 0);
-   if (!node) {
+   node = fwnode_find_reference(dpmac_node, "pcs-handle", 0);
+   if (IS_ERR(node)) {
/* do not error out on old DTS files */
netdev_warn(mac->net_dev, "pcs-handle node not found\n");
return 0;
}
 
-   if (!of_device_is_available(node)) {
+   if (!of_device_is_available(to_of_node(node))) {
netdev_err(mac->net_dev, "pcs-handle node not available\n");
-   of_node_put(node);
+   of_node_put(to_of_node(node));
return -ENODEV;
}
 
-   mdiodev = of_mdio_find_device(node);
-   of_node_put(node);
+   mdiodev = fwnode_mdio_find_device(node);
+   fwnode_handle_put(node);
if (!mdiodev)
return -EPROBE_DEFER;
 
@@ -304,7 +316,7 @@ int dpaa2_mac_connect(struct dpaa2_mac *mac)
 {
struct fsl_mc_device *dpmac_dev = mac->mc_dev;
struct net_device *net_dev = mac->net_dev;
-   struct device_node *dpmac_node;
+   struct fwnode_handle *dpmac_node = NULL;
struct phylink *phylink;
struct dpmac_attr attr;
int err;
@@ -324,7 +336,7 @@ int dpaa2_mac_connect(struct dpaa2_mac *mac)
 
mac->if_link_type = attr.lin

Re: [RFC PATCH net-next 08/16] net: mscc: ocelot: avoid unneeded "lp" variable in LAG join

2020-12-15 Thread Alexandre Belloni

On 08/12/2020 14:07:54+0200, Vladimir Oltean wrote:
> The index of the LAG is equal to the logical port ID that all the
> physical port members have, which is further equal to the index of the
> first physical port that is a member of the LAG.
> 
> The code gets a bit carried away with logic like this:
> 
>   if (a == b)
>   c = a;
>   else
>   c = b;
> 
> which can be simplified, of course, into:
> 
>   c = b;
> 
> (with a being port, b being lp, c being lag)
> 
> This further makes the "lp" variable redundant, since we can use "lag"
> everywhere where "lp" (logical port) was used. So instead of a "c = b"
> assignment, we can do a complete deletion of b. Only one comment here:
> 
>   if (bond_mask) {
>   lp = __ffs(bond_mask);
>   ocelot->lags[lp] = 0;
>   }
> 
> lp was clobbered before, because it was used as a temporary variable to
> hold the new smallest port ID from the bond. Now that we don't have "lp"
> any longer, we'll just avoid the temporary variable and zeroize the
> bonding mask directly.
> 
> Signed-off-by: Vladimir Oltean 
Reviewed-by: Alexandre Belloni 
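
The equivalence claimed in the commit message can be checked with a small
userspace sketch. It is not the driver code: lowest_set_bit() is a
hypothetical stand-in for the kernel's __ffs(), the lags table is a plain
array, NUM_PORTS is an arbitrary size chosen for the illustration, and
lag_join_old(), lag_join_new() and lag_join_variants_agree() are made-up
names for the logic before the patch, after it, and a comparison of the two.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 32			/* assumption: illustration only */
#define BIT(n) (UINT32_C(1) << (n))

/* Stand-in for __ffs(): index of the lowest set bit, undefined for 0. */
static int lowest_set_bit(uint32_t mask)
{
	int i = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		i++;
	}
	return i;
}

/* Logic before the patch, with the separate "lp" variable. */
static int lag_join_old(uint32_t lags[NUM_PORTS], uint32_t bond_mask, int port)
{
	int lag, lp;

	lp = lowest_set_bit(bond_mask);
	if (port == lp) {
		lag = port;
		lags[port] = bond_mask;
		bond_mask &= ~BIT(port);
		if (bond_mask) {
			lp = lowest_set_bit(bond_mask);
			lags[lp] = 0;
		}
	} else {
		lag = lp;
		lags[lp] |= BIT(port);
	}
	return lag;
}

/* Logic after the patch: "lag" alone carries the logical port. */
static int lag_join_new(uint32_t lags[NUM_PORTS], uint32_t bond_mask, int port)
{
	int lag = lowest_set_bit(bond_mask);

	if (port == lag) {
		lags[port] = bond_mask;
		bond_mask &= ~BIT(port);
		if (bond_mask)
			lags[lowest_set_bit(bond_mask)] = 0;
	} else {
		lags[lag] |= BIT(port);
	}
	return lag;
}

/* Returns 1 if both variants pick the same LAG and leave the same table. */
static int lag_join_variants_agree(uint32_t bond_mask, int port)
{
	uint32_t a[NUM_PORTS], b[NUM_PORTS];
	int i;

	/* Seed both tables identically so the zeroing step is visible. */
	for (i = 0; i < NUM_PORTS; i++)
		a[i] = b[i] = BIT(i);

	if (lag_join_old(a, bond_mask, port) != lag_join_new(b, bond_mask, port))
		return 0;

	for (i = 0; i < NUM_PORTS; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}
```

bond_mask is expected to already include the joining port, as it does in
ocelot_port_lag_join() where the bond is assigned before the mask is read.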

> ---
>  drivers/net/ethernet/mscc/ocelot.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
> index 30dee1f957d1..080fd4ce37ea 100644
> --- a/drivers/net/ethernet/mscc/ocelot.c
> +++ b/drivers/net/ethernet/mscc/ocelot.c
> @@ -1291,28 +1291,24 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port,
>struct net_device *bond)
>  {
>   u32 bond_mask = 0;
> - int lag, lp;
> + int lag;
>  
>   ocelot->ports[port]->bond = bond;
>  
>   bond_mask = ocelot_get_bond_mask(ocelot, bond);
>  
> - lp = __ffs(bond_mask);
> + lag = __ffs(bond_mask);
>  
>   /* If the new port is the lowest one, use it as the logical port from
>* now on
>*/
> - if (port == lp) {
> - lag = port;
> + if (port == lag) {
>   ocelot->lags[port] = bond_mask;
>   bond_mask &= ~BIT(port);
> - if (bond_mask) {
> - lp = __ffs(bond_mask);
> - ocelot->lags[lp] = 0;
> - }
> + if (bond_mask)
> + ocelot->lags[__ffs(bond_mask)] = 0;
>   } else {
> - lag = lp;
> - ocelot->lags[lp] |= BIT(port);
> + ocelot->lags[lag] |= BIT(port);
>   }
>  
>   ocelot_setup_lag(ocelot, lag);
> -- 
> 2.25.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
