Re: [PATCH net-next 1/2] dt-bindings: net: nfc: s3fwrn5: Support a UART interface

2020-11-23 Thread k...@kernel.org
On Mon, Nov 23, 2020 at 04:55:26PM +0900, Bongsu Jeon wrote:
> Starting with the S3FWRN82 NFC chip, a UART interface can be used.
> S3FWRN82 supports both I2C and UART interfaces.
> 
> Signed-off-by: Bongsu Jeon 
> ---
>  .../bindings/net/nfc/samsung,s3fwrn5.yaml | 28 +--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml 
> b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
> index cb0b8a560282..37b3e5ae5681 100644
> --- a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
> +++ b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
> @@ -13,6 +13,7 @@ maintainers:
>  properties:
>compatible:
>  const: samsung,s3fwrn5-i2c
> +const: samsung,s3fwrn82-uart

This does not work, you need to use enum. Did you run at least
dt_bindings_check?

The compatible should be just "samsung,s3fwrn82". I think it was a
mistake in the first s3fwrn5 submission to add an interface to the
compatible.
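
A hedged sketch of the enum form the review asks for (the
"samsung,s3fwrn82" string is the reviewer's suggested name, not
something already present in the binding):

```yaml
# Sketch only: a single "enum" lists every allowed compatible,
# instead of two conflicting "const" entries.
properties:
  compatible:
    enum:
      - samsung,s3fwrn5-i2c
      - samsung,s3fwrn82    # reviewer-suggested, no interface suffix
```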

>  
>en-gpios:
>  maxItems: 1
> @@ -47,10 +48,19 @@ additionalProperties: false
>  required:
>- compatible
>- en-gpios
> -  - interrupts
> -  - reg
>- wake-gpios
>  
> +allOf:
> +  - if:
> +      properties:
> +        compatible:
> +          contains:
> +            const: samsung,s3fwrn5-i2c
> +    then:
> +      required:
> +        - interrupts
> +        - reg
> +
>  examples:
>- |
>  #include 
> @@ -71,3 +81,17 @@ examples:
>  wake-gpios = <&gpj0 2 GPIO_ACTIVE_HIGH>;
>  };
>  };
> +  # UART example on Raspberry Pi
> +  - |
> +&uart0 {
> +    status = "okay";
> +
> +    s3fwrn82_uart {

Just "bluetooth" to follow Devicetree specification.

Best regards,
Krzysztof


Re: [PATCH net-next v2] compat: always include linux/compat.h from net/compat.h

2020-11-23 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH net-next] net: page_pool: Add page_pool_put_page_bulk() to page_pool.rst

2020-11-23 Thread Ilias Apalodimas
On Fri, Nov 20, 2020 at 11:19:34PM +0100, Lorenzo Bianconi wrote:
> Introduce page_pool_put_page_bulk() entry into the API section of
> page_pool.rst
> 
> Signed-off-by: Lorenzo Bianconi 
> ---
>  Documentation/networking/page_pool.rst | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/networking/page_pool.rst 
> b/Documentation/networking/page_pool.rst
> index 43088ddf95e4..e848f5b995b8 100644
> --- a/Documentation/networking/page_pool.rst
> +++ b/Documentation/networking/page_pool.rst
> @@ -97,6 +97,14 @@ a page will cause no race conditions is enough.
>  
>  * page_pool_get_dma_dir(): Retrieve the stored DMA direction.
>  
> +* page_pool_put_page_bulk(): It tries to refill a bulk of count pages into 
> the

Tries to refill a number of pages sounds better?

> +  ptr_ring cache holding ptr_ring producer lock. If the ptr_ring is full,
> +  page_pool_put_page_bulk() will release leftover pages to the page 
> allocator.
> +  page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx
> +  completion loop for the XDP_REDIRECT use case.
> +  Please consider the caller must not use data area after running

s/consider/note/

> +  page_pool_put_page_bulk(), as this function overwrites it.
> +
>  Coding examples
>  ===
>  
> -- 
> 2.28.0
> 


Other than that 
Acked-by: Ilias Apalodimas 


Re: [PATCH net] bonding: fix feature flag setting at init time

2020-11-23 Thread Ivan Vecera
On Sun, 22 Nov 2020 22:17:16 -0500
Jarod Wilson  wrote:

> Have run into a case where bond_option_mode_set() gets called before
> hw_features has been filled in, and very bad things happen when
> netdev_change_features() then gets called, because the empty hw_features
> wipes out almost all features. Further reading of netdev feature flag
> documentation suggests drivers aren't supposed to touch wanted_features,
> so this changes bond_option_mode_set() to use netdev_increment_features()
> and &= ~BOND_XFRM_FEATURES on mode changes and then only calling
> netdev_features_change() if there was actually a change of features. This
> specifically fixes bonding on top of mlxsw interfaces, and has been
> regression-tested with ixgbe interfaces. This change also simplifies the
> xfrm-specific code in bond_setup() a little bit as well.

Hi Jarod,

the reasoning is not correct... The problem is not an empty ->hw_features but
an empty ->wanted_features.
During bond device creation, bond_newlink() is called. It calls bond_changelink()
first and register_netdevice() only afterwards. The problem is that ->wanted_features
is initialized in register_netdevice(), so during the bond_changelink() call
->wanted_features is 0. So...

bond_newlink()
-> bond_changelink()
   -> __bond_opt_set()
  -> bond_option_mode_set()
 -> netdev_change_features()
-> __netdev_update_features()
   features = netdev_get_wanted_features()
  { (dev->features & ~dev->hw_features) | dev->wanted_features }

dev->wanted_features is zero here, so the rest of the expression clears a bunch of
bits from dev->features...

In the case of mlxsw it is important that the NETIF_F_HW_VLAN_CTAG_FILTER bit is
cleared in the bonding device, because then vlan_add_rx_filter_info() does not call
bond_vlan_rx_add_vid(), so mlxsw_sp_port_add_vid() is not called either.

Later this causes a WARN in mlxsw_sp_inetaddr_port_vlan_event() because the
mlxsw_sp_port_vlan instance does not exist, as mlxsw_sp_port_add_vid() was not
called.

Btw, wouldn't it be enough to call the existing snippet in bond_option_mode_set()
only when the device is already registered?

E.g.:
diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..ca4913fee5a9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,13 @@ static int bond_option_mode_set(struct bonding *bond,
bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-   if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-   else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
+   if (bond->dev->reg_state == NETREG_REGISTERED) {
+   if (newval->value == BOND_MODE_ACTIVEBACKUP)
+   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
+   netdev_change_features(bond->dev);
+   }
 #endif /* CONFIG_XFRM_OFFLOAD */


Thanks,
Ivan



Re: [PATCH net-next 2/2] net: nfc: s3fwrn5: Support a UART interface

2020-11-23 Thread k...@kernel.org
On Mon, Nov 23, 2020 at 04:56:58PM +0900, Bongsu Jeon wrote:
> Starting with the S3FWRN82 NFC chip, a UART interface can be used.
> S3FWRN82 uses the NCI protocol and supports both I2C and UART interfaces.
> 
> Signed-off-by: Bongsu Jeon 

Please start sending emails properly, e.g. with git send-email, so that all
patches in the series reference the first patch.

> ---
>  drivers/nfc/s3fwrn5/Kconfig  |  12 ++
>  drivers/nfc/s3fwrn5/Makefile |   2 +
>  drivers/nfc/s3fwrn5/uart.c   | 250 +++
>  3 files changed, 264 insertions(+)
>  create mode 100644 drivers/nfc/s3fwrn5/uart.c
> 
> diff --git a/drivers/nfc/s3fwrn5/Kconfig b/drivers/nfc/s3fwrn5/Kconfig
> index 3f8b6da58280..6f88737769e1 100644
> --- a/drivers/nfc/s3fwrn5/Kconfig
> +++ b/drivers/nfc/s3fwrn5/Kconfig
> @@ -20,3 +20,15 @@ config NFC_S3FWRN5_I2C
> To compile this driver as a module, choose m here. The module will
> be called s3fwrn5_i2c.ko.
> Say N if unsure.
> +
> +config NFC_S3FWRN82_UART
> + tristate "Samsung S3FWRN82 UART support"
> + depends on NFC_NCI && SERIAL_DEV_BUS

What about SERIAL_DEV_BUS as module? Shouldn't this be
SERIAL_DEV_BUS || !SERIAL_DEV_BUS?
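
For reference, the idiom being suggested is normally written as in this
sketch (not the actual patch): `X || !X` evaluates to y when X is y or n
and to m when X is m, so the driver cannot be built-in while the serdev
core is a module. Note it also leaves the symbol selectable when
SERIAL_DEV_BUS=n, which only makes sense if the code compiles without
serdev:

```kconfig
config NFC_S3FWRN82_UART
	tristate "Samsung S3FWRN82 UART support"
	# Sketch: "SERIAL_DEV_BUS || !SERIAL_DEV_BUS" blocks y here
	# whenever SERIAL_DEV_BUS=m, so built-in code never calls into
	# a modular serdev core.
	depends on NFC_NCI && (SERIAL_DEV_BUS || !SERIAL_DEV_BUS)
	select NFC_S3FWRN5
```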

> + select NFC_S3FWRN5
> + help
> +   This module adds support for a UART interface to the S3FWRN82 chip.
> +   Select this if your platform is using the UART bus.
> +
> +   To compile this driver as a module, choose m here. The module will
> +   be called s3fwrn82_uart.ko.
> +   Say N if unsure.
> diff --git a/drivers/nfc/s3fwrn5/Makefile b/drivers/nfc/s3fwrn5/Makefile
> index d0ffa35f50e8..d1902102060b 100644
> --- a/drivers/nfc/s3fwrn5/Makefile
> +++ b/drivers/nfc/s3fwrn5/Makefile
> @@ -5,6 +5,8 @@
>  
>  s3fwrn5-objs = core.o firmware.o nci.o
>  s3fwrn5_i2c-objs = i2c.o
> +s3fwrn82_uart-objs = uart.o
>  
>  obj-$(CONFIG_NFC_S3FWRN5) += s3fwrn5.o
>  obj-$(CONFIG_NFC_S3FWRN5_I2C) += s3fwrn5_i2c.o
> +obj-$(CONFIG_NFC_S3FWRN82_UART) += s3fwrn82_uart.o
> diff --git a/drivers/nfc/s3fwrn5/uart.c b/drivers/nfc/s3fwrn5/uart.c
> new file mode 100644
> index ..b3c36a5b28d3
> --- /dev/null
> +++ b/drivers/nfc/s3fwrn5/uart.c
> @@ -0,0 +1,250 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * UART Link Layer for S3FWRN82 NCI based Driver
> + *
> + * Copyright (C) 2020 Samsung Electronics
> + * Author: Bongsu Jeon 

You copied a lot from the existing i2c.c. Please also keep the original
copyrights.

> + * All rights reserved.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "s3fwrn5.h"
> +
> +#define S3FWRN82_UART_DRIVER_NAME "s3fwrn82_uart"

Remove the define, it is used only once.

> +#define S3FWRN82_NCI_HEADER 3
> +#define S3FWRN82_NCI_IDX 2
> +#define S3FWRN82_EN_WAIT_TIME 20
> +#define NCI_SKB_BUFF_LEN 258
> +
> +struct s3fwrn82_uart_phy {
> + struct serdev_device *ser_dev;
> + struct nci_dev *ndev;
> + struct sk_buff *recv_skb;
> +
> + unsigned int gpio_en;
> + unsigned int gpio_fw_wake;
> +
> + /* mutex is used to synchronize */

Please do not write obvious comments. A mutex is always used to
synchronize; what else would it be for? Instead, describe what exactly
the mutex protects.

> + struct mutex mutex;
> + enum s3fwrn5_mode mode;
> +};
> +
> +static void s3fwrn82_uart_set_wake(void *phy_id, bool wake)
> +{
> + struct s3fwrn82_uart_phy *phy = phy_id;
> +
> + mutex_lock(&phy->mutex);
> + gpio_set_value(phy->gpio_fw_wake, wake);
> + msleep(S3FWRN82_EN_WAIT_TIME);
> + mutex_unlock(&phy->mutex);
> +}
> +
> +static void s3fwrn82_uart_set_mode(void *phy_id, enum s3fwrn5_mode mode)
> +{
> + struct s3fwrn82_uart_phy *phy = phy_id;
> +
> + mutex_lock(&phy->mutex);
> + if (phy->mode == mode)
> + goto out;
> + phy->mode = mode;
> + gpio_set_value(phy->gpio_en, 1);
> + gpio_set_value(phy->gpio_fw_wake, 0);
> + if (mode == S3FWRN5_MODE_FW)
> + gpio_set_value(phy->gpio_fw_wake, 1);
> + if (mode != S3FWRN5_MODE_COLD) {
> + msleep(S3FWRN82_EN_WAIT_TIME);
> + gpio_set_value(phy->gpio_en, 0);
> + msleep(S3FWRN82_EN_WAIT_TIME);
> + }
> +out:
> + mutex_unlock(&phy->mutex);
> +}
> +
> +static enum s3fwrn5_mode s3fwrn82_uart_get_mode(void *phy_id)
> +{
> + struct s3fwrn82_uart_phy *phy = phy_id;
> + enum s3fwrn5_mode mode;
> +
> + mutex_lock(&phy->mutex);
> + mode = phy->mode;
> + mutex_unlock(&phy->mutex);
> + return mode;
> +}

All this duplicates the I2C version. You need to start reusing common
blocks.

> +
> +static int s3fwrn82_uart_write(void *phy_id, struct sk_buff *out)
> +{
> + struct s3fwrn82_uart_phy *phy = phy_id;
> + int err;
> +
> + err = serdev_device_write(phy->ser_dev,
> +   out->data, out->len,
> +   MAX_SCHEDULE_TIMEOUT);
> + if (err < 0)
> +

Re: [PATCH bpf] xsk: fix incorrect netdev reference count

2020-11-23 Thread Magnus Karlsson
On Fri, Nov 20, 2020 at 4:17 PM  wrote:
>
> From: Marek Majtyka 
>
> Fix incorrect netdev reference count in xsk_bind operation. Incorrect
> reference count of the device appears when a user calls bind with the
> XDP_ZEROCOPY flag on an interface which does not support zero-copy.
> In such a case, an error is returned but the reference count is not
> decreased. This change fixes the fault, by decreasing the reference count
> in case of such an error.
>
> The problem being corrected first appeared in commit '162c820ed896',
> and the code was moved to its new location over time by commit
> 'c2d3d6a47462'. This patch applies to all versions starting
> from 'c2d3d6a47462'. The same fix should be applied, but in a different
> file (net/xdp/xdp_umem.c) and function (xdp_umem_assign_dev), for versions
> from '162c820ed896' up to, but excluding, 'c2d3d6a47462'.
>
> Fixes: 162c820ed896 ("xdp: hold device for umem regardless of zero- ...")
> Signed-off-by: Marek Majtyka 
> ---
>  net/xdp/xsk_buff_pool.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 8a3bf4e1318e..46d09bfb1923 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -185,8 +185,10 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
>  err_unreg_pool:
> if (!force_zc)
> err = 0; /* fallback to copy mode */
> -   if (err)
> +   if (err) {
> xsk_clear_pool_at_qid(netdev, queue_id);
> +   dev_put(netdev);
> +   }
> return err;
>  }

Thank you Marek for spotting and fixing this!

Acked-by: Magnus Karlsson 

> --
> 2.27.0
>


Re: [PATCH net-next v4 2/5] net/lapb: support netdev events

2020-11-23 Thread Xie He
On Sun, Nov 22, 2020 at 10:55 PM Martin Schiller  wrote:
>
> No, they aren't independent. The carrier can only be up if the device /
> interface is UP. And as far as I can see a NETDEV_CHANGE event will also
> only be generated on interfaces that are UP.
>
> So you can be sure, that if there is a NETDEV_CHANGE event then the
> device is UP.

OK. Thanks for your explanation!

> I removed the NETDEV_UP handling because I don't think it makes sense
> to implicitly try to establish layer2 (LAPB) if there is no carrier.

As I understand it, when the device goes up, the carrier can be either
down or up. Right?

If this is true, when a device goes up and the carrier then comes up
afterwards, L2 will automatically connect; but if a device goes up and
the carrier is already up, L2 will not automatically connect. I think
it might be better to eliminate this difference in handling: make it
automatically connect in both situations, or in neither.

If you want to go with the second way (auto-connect in neither
situation), the next (3rd) patch of this series might also not be
needed.

I just want to make the behavior of LAPB more consistent. I think we
should either make LAPB auto-connect in all situations, or make LAPB
wait for L3's instruction to connect in all situations.

> And with the first X.25 connection request on that interface, it will
> be established anyway by x25_transmit_link().
>
> I've tested it here with an HDLC WAN Adapter and it works as expected.
>
> These are also the ideal conditions for the already mentioned "on
> demand" scenario. The only necessary change would be to call
> x25_terminate_link() on an interface after clearing the last X.25
> session.
>
> > On NETDEV_GOING_DOWN, we can also check the carrier status first and
> > if it is down, we don't need to call lapb_disconnect_request.
>
> This is not necessary because lapb_disconnect_request() checks the
> current state. And if the carrier is DOWN then the state should also be
> LAPB_STATE_0 and so lapb_disconnect_request() does nothing.

Yes, I understand. I just thought adding this check might make the
code cleaner. But you are right.


Re: [PATCH net-next] net: page_pool: Add page_pool_put_page_bulk() to page_pool.rst

2020-11-23 Thread Lorenzo Bianconi
> On Fri, Nov 20, 2020 at 11:19:34PM +0100, Lorenzo Bianconi wrote:
> > Introduce page_pool_put_page_bulk() entry into the API section of
> > page_pool.rst
> > 
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> >  Documentation/networking/page_pool.rst | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/Documentation/networking/page_pool.rst 
> > b/Documentation/networking/page_pool.rst
> > index 43088ddf95e4..e848f5b995b8 100644
> > --- a/Documentation/networking/page_pool.rst
> > +++ b/Documentation/networking/page_pool.rst
> > @@ -97,6 +97,14 @@ a page will cause no race conditions is enough.
> >  
> >  * page_pool_get_dma_dir(): Retrieve the stored DMA direction.
> >  
> > +* page_pool_put_page_bulk(): It tries to refill a bulk of count pages into 
> > the
> 
> Tries to refill a number of pages sounds better?

ack, will fix it in v2

> 
> > +  ptr_ring cache holding ptr_ring producer lock. If the ptr_ring is full,
> > +  page_pool_put_page_bulk() will release leftover pages to the page 
> > allocator.
> > +  page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx
> > +  completion loop for the XDP_REDIRECT use case.
> > +  Please consider the caller must not use data area after running
> 
> s/consider/note/

ack, will fix it in v2

Regards,
Lorenzo

> 
> > +  page_pool_put_page_bulk(), as this function overwrites it.
> > +
> >  Coding examples
> >  ===
> >  
> > -- 
> > 2.28.0
> > 
> 
> 
> Other than that 
> Acked-by: Ilias Apalodimas 
> 




Re: [PATCH net-next v4 2/5] net/lapb: support netdev events

2020-11-23 Thread Martin Schiller

On 2020-11-23 09:31, Xie He wrote:

On Sun, Nov 22, 2020 at 10:55 PM Martin Schiller  wrote:


No, they aren't independent. The carrier can only be up if the device /
interface is UP. And as far as I can see a NETDEV_CHANGE event will also
only be generated on interfaces that are UP.

So you can be sure that if there is a NETDEV_CHANGE event then the
device is UP.


OK. Thanks for your explanation!


I removed the NETDEV_UP handling because I don't think it makes sense
to implicitly try to establish layer2 (LAPB) if there is no carrier.


As I understand, when the device goes up, the carrier can be either
down or up. Right?

If this is true, when a device goes up and the carrier then goes up
after that, L2 will automatically connect, but if a device goes up and
the carrier is already up, L2 will not automatically connect. I think
it might be better to eliminate this difference in handling. It might
be better to make it automatically connect in both situations, or in
neither.


AFAIK the carrier can't be up before the device is up. Therefore, there
will be a NETDEV_CHANGE event after the NETDEV_UP event.

This is what I can see in my tests (with the HDLC interface).

Is the behaviour different for e.g. lapbether?



If you want to go with the second way (auto-connect in neither
situation), the next (3rd) patch of this series might also not be
needed.

I just want to make the behavior of LAPB more consistent. I think we
should either make LAPB auto-connect in all situations, or make LAPB
wait for L3's instruction to connect in all situations.


And with the first X.25 connection request on that interface, it will
be established anyway by x25_transmit_link().

I've tested it here with an HDLC WAN Adapter and it works as expected.

These are also the ideal conditions for the already mentioned "on
demand" scenario. The only necessary change would be to call
x25_terminate_link() on an interface after clearing the last X.25
session.

> On NETDEV_GOING_DOWN, we can also check the carrier status first and
> if it is down, we don't need to call lapb_disconnect_request.

This is not necessary because lapb_disconnect_request() checks the
current state. And if the carrier is DOWN then the state should also be
LAPB_STATE_0 and so lapb_disconnect_request() does nothing.


Yes, I understand. I just thought adding this check might make the
code cleaner. But you are right.


[PATCH v4] dt-bindings: misc: convert fsl,qoriq-mc from txt to YAML

2020-11-23 Thread Laurentiu Tudor
From: Ionut-robert Aron 

Convert fsl,qoriq-mc to YAML in order to automate the verification
process of dts files. In addition, update MAINTAINERS accordingly
and, while at it, add some missing files.

Signed-off-by: Ionut-robert Aron 
[laurentiu.tu...@nxp.com: update MAINTAINERS, updates & fixes in schema]
Signed-off-by: Laurentiu Tudor 
---
Changes in v4:
 - use $ref to point to fsl,qoriq-mc-dpmac binding

Changes in v3:
 - dropped duplicated "fsl,qoriq-mc-dpmac" schema and replaced with
   reference to it
 - fixed a dt_binding_check warning

Changes in v2:
 - fixed errors reported by yamllint
 - dropped multiple unnecessary quotes
 - used schema instead of text in description
 - added constraints on dpmac reg property

 .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 196 --
 .../bindings/misc/fsl,qoriq-mc.yaml   | 186 +
 .../ethernet/freescale/dpaa2/overview.rst |   5 +-
 MAINTAINERS   |   4 +-
 4 files changed, 193 insertions(+), 198 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
 create mode 100644 Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml

diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt 
b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
deleted file mode 100644
index 7b486d4985dc..
--- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
+++ /dev/null
@@ -1,196 +0,0 @@
-* Freescale Management Complex
-
-The Freescale Management Complex (fsl-mc) is a hardware resource
-manager that manages specialized hardware objects used in
-network-oriented packet processing applications. After the fsl-mc
-block is enabled, pools of hardware resources are available, such as
-queues, buffer pools, I/O interfaces. These resources are building
-blocks that can be used to create functional hardware objects/devices
-such as network interfaces, crypto accelerator instances, L2 switches,
-etc.
-
-For an overview of the DPAA2 architecture and fsl-mc bus see:
-Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
-
-As described in the above overview, all DPAA2 objects in a DPRC share the
-same hardware "isolation context" and a 10-bit value called an ICID
-(isolation context id) is expressed by the hardware to identify
-the requester.
-
-The generic 'iommus' property is insufficient to describe the relationship
-between ICIDs and IOMMUs, so an iommu-map property is used to define
-the set of possible ICIDs under a root DPRC and how they map to
-an IOMMU.
-
-For generic IOMMU bindings, see
-Documentation/devicetree/bindings/iommu/iommu.txt.
-
-For arm-smmu binding, see:
-Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
-
-The MSI writes are accompanied by sideband data which is derived from the ICID.
-The msi-map property is used to associate the devices with both the ITS
-controller and the sideband data which accompanies the writes.
-
-For generic MSI bindings, see
-Documentation/devicetree/bindings/interrupt-controller/msi.txt.
-
-For GICv3 and GIC ITS bindings, see:
-Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
-
-Required properties:
-
-- compatible
-Value type: <string>
-Definition: Must be "fsl,qoriq-mc".  A Freescale Management Complex
-compatible with this binding must have Block Revision
-Registers BRR1 and BRR2 at offset 0x0BF8 and 0x0BFC in
-the MC control register region.
-
-- reg
-Value type: <prop-encoded-array>
-Definition: A standard property.  Specifies one or two regions
-defining the MC's registers:
-
-   -the first region is the command portal for the
-this machine and must always be present
-
-   -the second region is the MC control registers. This
-region may not be present in some scenarios, such
-as in the device tree presented to a virtual machine.
-
-- ranges
-Value type: <prop-encoded-array>
-Definition: A standard property.  Defines the mapping between the child
-MC address space and the parent system address space.
-
-The MC address space is defined by 3 components:
-        <region type> <offset hi> <offset lo>
-
-Valid values for region type are
-   0x0 - MC portals
-   0x1 - QBMAN portals
-
-- #address-cells
-Value type: <u32>
-Definition: Must be 3.  (see definition in 'ranges' property)
-
-- #size-cells
-Value type: <u32>
-Definition: Must be 1.
-
-Sub-nodes:
-
-The fsl-mc node may optionally have dpmac sub-nodes that describe
-the relationship between the Ethernet MACs which belong to the MC
-and the Ethernet PHYs on the system board.
-
-The dpmac nodes must be under a node named "dpmacs" which contains
-the fo

Re: [PATCH net-next 3/3] net: phy: mscc: use new PTP_MSGTYPE_* defines

2020-11-23 Thread Antoine Tenart
Hello Christian,

Quoting Christian Eggers (2020-11-22 09:26:36)
> Use recently introduced PTP_MSGTYPE_SYNC and PTP_MSGTYPE_DELAY_REQ
> defines instead of a driver internal enumeration.
> 
> Signed-off-by: Christian Eggers 

Reviewed-by: Antoine Tenart 

Thanks!
Antoine

> Cc: Quentin Schulz 
> Cc: Antoine Tenart 
> Cc: Antoine Tenart 
> ---
>  drivers/net/phy/mscc/mscc_ptp.c | 14 +++---
>  drivers/net/phy/mscc/mscc_ptp.h |  5 -
>  2 files changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/phy/mscc/mscc_ptp.c b/drivers/net/phy/mscc/mscc_ptp.c
> index d8a61456d1ce..924ed5b034a4 100644
> --- a/drivers/net/phy/mscc/mscc_ptp.c
> +++ b/drivers/net/phy/mscc/mscc_ptp.c
> @@ -506,9 +506,9 @@ static int vsc85xx_ptp_cmp_init(struct phy_device 
> *phydev, enum ts_blk blk)
>  {
> struct vsc8531_private *vsc8531 = phydev->priv;
> bool base = phydev->mdio.addr == vsc8531->ts_base_addr;
> -   enum vsc85xx_ptp_msg_type msgs[] = {
> -   PTP_MSG_TYPE_SYNC,
> -   PTP_MSG_TYPE_DELAY_REQ
> +   u8 msgs[] = {
> +   PTP_MSGTYPE_SYNC,
> +   PTP_MSGTYPE_DELAY_REQ
> };
> u32 val;
> u8 i;
> @@ -847,9 +847,9 @@ static int vsc85xx_ts_ptp_action_flow(struct phy_device 
> *phydev, enum ts_blk blk
>  static int vsc85xx_ptp_conf(struct phy_device *phydev, enum ts_blk blk,
> bool one_step, bool enable)
>  {
> -   enum vsc85xx_ptp_msg_type msgs[] = {
> -   PTP_MSG_TYPE_SYNC,
> -   PTP_MSG_TYPE_DELAY_REQ
> +   u8 msgs[] = {
> +   PTP_MSGTYPE_SYNC,
> +   PTP_MSGTYPE_DELAY_REQ
> };
> u32 val;
> u8 i;
> @@ -858,7 +858,7 @@ static int vsc85xx_ptp_conf(struct phy_device *phydev, 
> enum ts_blk blk,
> if (blk == INGRESS)
> vsc85xx_ts_ptp_action_flow(phydev, blk, msgs[i],
>PTP_WRITE_NS);
> -   else if (msgs[i] == PTP_MSG_TYPE_SYNC && one_step)
> +   else if (msgs[i] == PTP_MSGTYPE_SYNC && one_step)
> /* no need to know Sync t when sending in one_step */
> vsc85xx_ts_ptp_action_flow(phydev, blk, msgs[i],
>PTP_WRITE_1588);
> diff --git a/drivers/net/phy/mscc/mscc_ptp.h b/drivers/net/phy/mscc/mscc_ptp.h
> index 3ea163af0f4f..da3465360e90 100644
> --- a/drivers/net/phy/mscc/mscc_ptp.h
> +++ b/drivers/net/phy/mscc/mscc_ptp.h
> @@ -436,11 +436,6 @@ enum ptp_cmd {
> PTP_SAVE_IN_TS_FIFO = 11, /* invalid when writing in reg */
>  };
>  
> -enum vsc85xx_ptp_msg_type {
> -   PTP_MSG_TYPE_SYNC,
> -   PTP_MSG_TYPE_DELAY_REQ,
> -};
> -
>  struct vsc85xx_ptphdr {
> u8 tsmt; /* transportSpecific | messageType */
> u8 ver;  /* reserved0 | versionPTP */
> -- 
> Christian Eggers
> Embedded software developer
> 


Re: [PATCH net-next] ip_gre: remove CRC flag from dev features in gre_gso_segment

2020-11-23 Thread Xin Long
On Sat, Nov 21, 2020 at 12:10 AM Alexander Duyck
 wrote:
>
> On Fri, Nov 20, 2020 at 2:23 AM Xin Long  wrote:
> >
> > On Fri, Nov 20, 2020 at 1:24 AM Alexander Duyck
> >  wrote:
> > >
> > > On Wed, Nov 18, 2020 at 9:53 PM Xin Long  wrote:
> > > >
> > > > On Thu, Nov 19, 2020 at 4:35 AM Alexander Duyck
> > > >  wrote:
> > > > >
> > > > > On Mon, Nov 16, 2020 at 1:17 AM Xin Long  wrote:
> > > > > >
> > > > > > This patch is to let it always do CRC checksum in sctp_gso_segment()
> > > > > > by removing CRC flag from the dev features in gre_gso_segment() for
> > > > > > SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support
> > > > > > sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.
> > > > > > It could set csum/csum_start in GSO CB properly in 
> > > > > > sctp_gso_segment()
> > > > > > after that commit, so it would do checksum with gso_make_checksum()
> > > > > > in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute
> > > > > > gre csum for sctp over gre tunnels") can be reverted now.
> > > > > >
> > > > > > Signed-off-by: Xin Long 
> > > > > > ---
> > > > > >  net/ipv4/gre_offload.c | 14 +++---
> > > > > >  1 file changed, 3 insertions(+), 11 deletions(-)
> > > > > >
> > > > > > diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> > > > > > index e0a2465..a5935d4 100644
> > > > > > --- a/net/ipv4/gre_offload.c
> > > > > > +++ b/net/ipv4/gre_offload.c
> > > > > > @@ -15,12 +15,12 @@ static struct sk_buff *gre_gso_segment(struct 
> > > > > > sk_buff *skb,
> > > > > >netdev_features_t features)
> > > > > >  {
> > > > > > int tnl_hlen = skb_inner_mac_header(skb) - 
> > > > > > skb_transport_header(skb);
> > > > > > -   bool need_csum, need_recompute_csum, gso_partial;
> > > > > > struct sk_buff *segs = ERR_PTR(-EINVAL);
> > > > > > u16 mac_offset = skb->mac_header;
> > > > > > __be16 protocol = skb->protocol;
> > > > > > u16 mac_len = skb->mac_len;
> > > > > > int gre_offset, outer_hlen;
> > > > > > +   bool need_csum, gso_partial;
> > > > > >
> > > > > > if (!skb->encapsulation)
> > > > > > goto out;
> > > > > > @@ -41,10 +41,10 @@ static struct sk_buff *gre_gso_segment(struct 
> > > > > > sk_buff *skb,
> > > > > > skb->protocol = skb->inner_protocol;
> > > > > >
> > > > > > need_csum = !!(skb_shinfo(skb)->gso_type & 
> > > > > > SKB_GSO_GRE_CSUM);
> > > > > > -   need_recompute_csum = skb->csum_not_inet;
> > > > > > skb->encap_hdr_csum = need_csum;
> > > > > >
> > > > > > features &= skb->dev->hw_enc_features;
> > > > > > +   features &= ~NETIF_F_SCTP_CRC;
> > > > > >
> > > > > > /* segment inner packet. */
> > > > > > segs = skb_mac_gso_segment(skb, features);
> > > > >
> > > > > Why just blindly strip NETIF_F_SCTP_CRC? It seems like it would make
> > > > > more sense if there was an explanation as to why you are stripping the
> > > > > offload. I know there are many NICs that could very easily perform
> > > > > SCTP CRC offload on the inner data as long as they didn't have to
> > > > > offload the outer data. For example the Intel NICs should be able to
> > > > > do it, although when I wrote the code up enabling their offloads I
> > > > > think it is only looking at the outer headers so that might require
> > > > > updating to get it to not use the software fallback.
> > > > >
> > > > > It really seems like we should only be clearing NETIF_F_SCTP_CRC if
> > > > > need_csum is true since we must compute the CRC before we can compute
> > > > > the GRE checksum.
> > > > Right, it's also what Jakub commented, thanks.
> > > >
> > > > >
> > > > > > @@ -99,15 +99,7 @@ static struct sk_buff *gre_gso_segment(struct 
> > > > > > sk_buff *skb,
> > > > > > }
> > > > > >
> > > > > > *(pcsum + 1) = 0;
> > > > > > -   if (need_recompute_csum && !skb_is_gso(skb)) {
> > > > > > -   __wsum csum;
> > > > > > -
> > > > > > -   csum = skb_checksum(skb, gre_offset,
> > > > > > -   skb->len - gre_offset, 
> > > > > > 0);
> > > > > > -   *pcsum = csum_fold(csum);
> > > > > > -   } else {
> > > > > > -   *pcsum = gso_make_checksum(skb, 0);
> > > > > > -   }
> > > > > > +   *pcsum = gso_make_checksum(skb, 0);
> > > > > > } while ((skb = skb->next));
> > > > > >  out:
> > > > > > return segs;
> > > > >
> > > > > This change doesn't make much sense to me. How are we expecting
> > > > > gso_make_checksum to be able to generate a valid checksum when we are
> > > > > dealing with a SCTP frame? From what I can tell it looks like it is
> > > > > just setting the checksum to ~0 and checksum start to the transport
> > > > > header which isn't true because SCTP is using a CRC, not a 1's
> > > > > complement checksum.

[PATCH net-next v2 0/2] Add support for DSFP transceiver type

2020-11-23 Thread Moshe Shemesh
Add support for new cable module type DSFP (Dual Small Form-Factor Pluggable
transceiver). DSFP EEPROM memory layout is compatible with CMIS 4.0 spec. Add
CMIS 4.0 module type to UAPI and implement DSFP EEPROM dump in mlx5.

Change log:
v1 -> v2
- Added comments on accessing only the mandatory part of passive and
  active cables.

Vladyslav Tarasiuk (2):
  ethtool: Add CMIS 4.0 module type to UAPI
  net/mlx5e: Add DSFP EEPROM dump support to ethtool

 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 12 -
 .../net/ethernet/mellanox/mlx5/core/port.c| 52 ---
 include/linux/mlx5/port.h |  1 +
 include/uapi/linux/ethtool.h  |  3 ++
 4 files changed, 60 insertions(+), 8 deletions(-)

-- 
2.18.2



[PATCH net-next v2 2/2] net/mlx5e: Add DSFP EEPROM dump support to ethtool

2020-11-23 Thread Moshe Shemesh
From: Vladyslav Tarasiuk 

DSFP is a new cable module type whose EEPROM uses the memory layout
described in the CMIS 4.0 document. Use the corresponding standard value
so that userspace ethtool can distinguish DSFP's layout from older
standards.

Add the DSFP module ID in accordance with SFF-8024.

DSFP module memory can be flat or paged, as indicated by the flat_mem
bit. In the first case, only page 00 is available; in the second,
multiple pages are: 00h, 01h, 02h, 10h and 11h. These five pages in bank
zero include the mandatory part for passive and active cables.

Signed-off-by: Vladyslav Tarasiuk 
Reviewed-by: Moshe Shemesh 
---
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 12 -
 .../net/ethernet/mellanox/mlx5/core/port.c| 52 ---
 include/linux/mlx5/port.h |  1 +
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 42e61dc28ead..e6e80f1b0e94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1659,8 +1659,8 @@ static int mlx5e_get_module_info(struct net_device 
*netdev,
int size_read = 0;
u8 data[4] = {0};
 
-   size_read = mlx5_query_module_eeprom(dev, 0, 2, data);
-   if (size_read < 2)
+   size_read = mlx5_query_module_eeprom(dev, 0, 3, data);
+   if (size_read < 3)
return -EIO;
 
/* data[0] = identifier byte */
@@ -1680,6 +1680,14 @@ static int mlx5e_get_module_info(struct net_device 
*netdev,
modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
}
break;
+   case MLX5_MODULE_ID_DSFP:
+   modinfo->type = ETH_MODULE_CMIS_4;
+   /* check flat_mem bit, zero indicates paged memory */
+   if (data[2] & 0x80)
+   modinfo->eeprom_len = ETH_MODULE_CMIS_4_LEN;
+   else
+   modinfo->eeprom_len = ETH_MODULE_CMIS_4_MAX_LEN;
+   break;
case MLX5_MODULE_ID_SFP:
modinfo->type   = ETH_MODULE_SFF_8472;
modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 4bb219565c58..df8e3d024479 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -311,13 +311,9 @@ static int mlx5_query_module_id(struct mlx5_core_dev *dev, 
int module_num,
return 0;
 }
 
-static int mlx5_qsfp_eeprom_page(u16 offset)
+static int mlx5_eeprom_high_page_num(u16 offset)
 {
-   if (offset < MLX5_EEPROM_PAGE_LENGTH)
-   /* Addresses between 0-255 - page 00 */
-   return 0;
-
-   /* Addresses between 256 - 639 belongs to pages 01, 02 and 03
+   /* Addresses 256 and higher belong to pages 01, 02, etc.
 * For example, offset = 400 belongs to page 02:
 * 1 + ((400 - 256)/128) = 2
 */
@@ -325,6 +321,16 @@ static int mlx5_qsfp_eeprom_page(u16 offset)
MLX5_EEPROM_HIGH_PAGE_LENGTH);
 }
 
+static int mlx5_qsfp_eeprom_page(u16 offset)
+{
+   if (offset < MLX5_EEPROM_PAGE_LENGTH)
+   /* Addresses between 0-255 - page 00 */
+   return 0;
+
+   /* Addresses between 256 - 639 belong to pages 01, 02 and 03 */
+   return mlx5_eeprom_high_page_num(offset);
+}
+
 static int mlx5_qsfp_eeprom_high_page_offset(int page_num)
 {
if (!page_num) /* Page 0 always start from low page */
@@ -341,6 +347,37 @@ static void mlx5_qsfp_eeprom_params_set(u16 *i2c_addr, int 
*page_num, u16 *offse
*offset -=  mlx5_qsfp_eeprom_high_page_offset(*page_num);
 }
 
+static int mlx5_dsfp_eeprom_high_page_offset(int page_num)
+{
+   if (!page_num)
+   return 0;
+
+   return (page_num < 0x10 ? page_num : page_num - 13) * 
MLX5_EEPROM_HIGH_PAGE_LENGTH;
+}
+
+static int mlx5_dsfp_eeprom_page(u16 offset)
+{
+   if (offset < MLX5_EEPROM_PAGE_LENGTH)
+   return 0;
+
+   if (offset < MLX5_EEPROM_PAGE_LENGTH + (MLX5_EEPROM_HIGH_PAGE_LENGTH * 
2))
+   /* Addresses 0 - 511 - pages 00, 01 and 02 */
+   return mlx5_eeprom_high_page_num(offset);
+
+   /* Offsets 512 - 767 belong to pages 10h and 11h.
+* For example, offset = 700 belongs to page 11:
+* 13 + 1 + ((700 - 256) / 128) = 17 = 0x11
+*/
+   return 13 + mlx5_eeprom_high_page_num(offset);
+}
+
+static void mlx5_dsfp_eeprom_params_set(u16 *i2c_addr, int *page_num, u16 
*offset)
+{
+   *i2c_addr = MLX5_I2C_ADDR_LOW;
+   *page_num = mlx5_dsfp_eeprom_page(*offset);
+   *offset -= mlx5_dsfp_eeprom_high_page_offset(*page_num);
+}
+
 static void mlx5_sfp_eeprom_params_set(u16 *i2c_addr, int *page_num, u16 
*offset)
 {
*i2c_addr = MLX5_I2C_ADDR_LOW;
@@ -380,6 +417,9 @

[PATCH net-next v2 1/2] ethtool: Add CMIS 4.0 module type to UAPI

2020-11-23 Thread Moshe Shemesh
From: Vladyslav Tarasiuk 

The CMIS 4.0 document describes a universal EEPROM memory layout, which
is used by modules such as DSFP, OSFP and QSFP-DD. In order to
distinguish them in userspace from existing standards, add the
corresponding values.

CMIS 4.0 EEPROM memory includes mandatory and optional pages; the
maximum read length of 768 bytes covers the mandatory pages for passive
and active cables.

Signed-off-by: Vladyslav Tarasiuk 
Reviewed-by: Moshe Shemesh 
---
 include/uapi/linux/ethtool.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 9ca87bc73c44..0ec4c0ea3235 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1861,9 +1861,12 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define ETH_MODULE_SFF_8636_LEN256
 #define ETH_MODULE_SFF_84360x4
 #define ETH_MODULE_SFF_8436_LEN256
+#define ETH_MODULE_CMIS_4  0x5
+#define ETH_MODULE_CMIS_4_LEN  256
 
 #define ETH_MODULE_SFF_8636_MAX_LEN 640
 #define ETH_MODULE_SFF_8436_MAX_LEN 640
+#define ETH_MODULE_CMIS_4_MAX_LEN  768
 
 /* Reset flags */
 /* The reset() operation must clear the flags for the components which
-- 
2.18.2



RE: [PATCH v15 0/9] Enable ptp_kvm for arm/arm64

2020-11-23 Thread Jianyong Wu
Hi,
Ping ...
Any comments?

> -Original Message-
> From: Jianyong Wu 
> Sent: Wednesday, November 11, 2020 2:22 PM
> To: netdev@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> m...@kernel.org; richardcoch...@gmail.com; Mark Rutland
> ; w...@kernel.org; Suzuki Poulose
> ; Andre Przywara ;
> Steven Price 
> Cc: linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> kvm...@lists.cs.columbia.edu; k...@vger.kernel.org; Steve Capper
> ; Justin He ; Jianyong Wu
> ; nd 
> Subject: [PATCH v15 0/9] Enable ptp_kvm for arm/arm64
> 
> Currently, we often use NTP (which syncs time with a remote network clock) to
> sync time in a VM. But the precision of NTP is subject to network delay, so
> it is difficult to sync time with high precision.
> 
> kvm virtual ptp clock (ptp_kvm) offers another way to sync time in VM, as
> the remote clock locates in the host instead of remote network clock.
> It aims to sync time between guest and host in a virtualization environment,
> and in this way we can keep the time of all the VMs running on the same host
> in sync. In general, the communication delay between host and guest is
> quite small, so ptp_kvm can offer time sync precision on the order of
> nanoseconds. Please keep in mind that ptp_kvm limits itself to being a
> channel that transmits the remote clock from host to guest and leaves the
> time sync job to an application, e.g. chrony, in userspace in the VM.
> 
> How ptp_kvm works:
> After ptp_kvm is initialized, there will be a new device node under /dev
> called ptp%d. A guest userspace service, like chrony, can use this device to
> get the host walltime, and sometimes also the counter cycle, depending on the
> call it makes. Then this guest userspace service can use those data to do the
> time sync for the guest.
> Here is a rough sketch to show how the kvm ptp clock works.
> 
> ||  |--|
> |   guest userspace  |  |  host|
> |ioctl -> /dev/ptp%d |  |  |
> |   ^   ||  |  |
> ||  |  |
> |   |   | guest kernel   |  |  |
> |   |   V  (get host walltime/counter cycle)   |
> |  ptp_kvm -> hypercall - - - - - - - - - - ->hypercall service|
> | <- - - - - - - - - - - - |
> ||  |--|
> 
> 1. The time sync service in guest userspace calls the ptp device through
> /dev/ptp%d.
> 2. The ptp_kvm module in the guest receives this request, then invokes a
> hypercall to route into the host kernel and request the host walltime/counter
> cycle.
> 3. The ptp_kvm hypercall service in the host responds to the request and
> sends data back.
> 4. ptp (not ptp_kvm) in the guest copies the data to userspace.
> 
> This ptp_kvm implementation focuses on steps 2 and 3; step 2 works
> in the guest while step 3 works in the host kernel.
> 
> change log:
> 
> from v14 to v15
> (1) enable ptp_kvm on arm32 guest, also ptp_kvm has been tested on
> both arm64 and arm32 guest running on arm64 kvm host.
> (2) move arch-agnostic part of ptp_kvm.rst into timekeeping.rst.
> (3) rename KVM_CAP_ARM_PTP_KVM to KVM_CAP_PTP_KVM as it
> should be arch agnostic.
> (4) add description for KVM_CAP_PTP_KVM in
> Documentation/virt/kvm/api.rst.
> (5) adjust dependency in Kconfig for ptp_kvm.
> (6) refine multi-arch process in driver/ptp/Makefile.
> (7) fix make pdfdocs htmldocs issue for ptp_kvm doc.
> (8) address other issues from comments in v14.
> (9) fold hypercall service of ptp_kvm as a function.
> (10) rebase to 5.10-rc3.
> 
> from v13 to v14
> (1) rebase code on 5.9-rc3.
> (2) add a document to introduce implementation of PTP_KVM on arm64.
> (3) fix comments issue in hypercall.c.
> (4) export arm_smccc_1_1_get_conduit using EXPORT_SYMBOL_GPL.
> (5) fix make issue on x86 reported by kernel test robot.
> 
> from v12 to v13:
> (1) rebase code on 5.8-rc1.
> (2) this patch set is based on 2 patches (1/8 and 2/8) from Will Deacon.
> (3) remove the change to ptp device code of extend getcrosststamp.
> (4) remove the mechanism of letting user choose the counter type in
> ptp_kvm for arm64.
> (5) add virtual counter option in ptp_kvm service to let user choose 
> the
> specific counter explicitly.
> 
> from v11 to v12:
> (1) rebase code on 5.7-rc6 and rebase 2 patches from Will Deacon,
> including 1/11 and 2/11, as these patches introduce the discovery
> mechanism of the vendor SMCCC service.
> (2) rebase ptp_kvm hypercall service from standard smccc to vendor
> smccc and add ptp_kvm to v

Re: [PATCH net-next v4 2/5] net/lapb: support netdev events

2020-11-23 Thread Xie He
On Mon, Nov 23, 2020 at 1:00 AM Martin Schiller  wrote:
>
> AFAIK the carrier can't be up before the device is up. Therefore, there
> will be a NETDEV_CHANGE event after the NETDEV_UP event.
>
> This is what I can see in my tests (with the HDLC interface).
>
> Is the behaviour different for e.g. lapbether?

Some drivers don't support carrier status and will never change it.
Their carrier status will always be UP. There will not be a
NETDEV_CHANGE event.

lapbether doesn't change carrier status. I also have my own virtual
HDLC WAN driver (for testing) which also doesn't change carrier
status.

I just tested with lapbether. When I bring up the interface, there
will only be NETDEV_PRE_UP and then NETDEV_UP. There will not be
NETDEV_CHANGE. The carrier status is always UP.

I haven't tested whether a device can receive NETDEV_CHANGE when it is
down. It's possible for a device driver to call netif_carrier_on when
the interface is down. Do you know what will happen if a device driver
calls netif_carrier_on when the interface is down?


Re: [PATCH] dpaa2-eth: Fix compile error due to missing devlink support

2020-11-23 Thread Ioana Ciornei


Hi Ezequiel,

Thanks a lot for the fix, I overlooked this when adding devlink support.

On Sat, Nov 21, 2020 at 09:23:36PM -0300, Ezequiel Garcia wrote:
> The dpaa2 driver depends on devlink, so it should select
> NET_DEVLINK in order to fix compile errors, such as:
>
> drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function 
> `dpaa2_eth_rx_err':
> dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report'
> drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function 
> `dpaa2_eth_dl_info_get':
> dpaa2-eth-devlink.c:(.text+0x160): undefined reference to 
> `devlink_info_driver_name_put'
>

What tree is this intended for?

Maybe add a fixes tag and send this towards the net tree?

Ioana

> Signed-off-by: Ezequiel Garcia 
> ---
>  drivers/net/ethernet/freescale/dpaa2/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa2/Kconfig 
> b/drivers/net/ethernet/freescale/dpaa2/Kconfig
> index cfd369cf4c8c..aee59ead7250 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/Kconfig
> +++ b/drivers/net/ethernet/freescale/dpaa2/Kconfig
> @@ -2,6 +2,7 @@
>  config FSL_DPAA2_ETH
>   tristate "Freescale DPAA2 Ethernet"
>   depends on FSL_MC_BUS && FSL_MC_DPIO
> + select NET_DEVLINK
>   select PHYLINK
>   select PCS_LYNX
>   help
> --
> 2.27.0
>

Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-11-23 Thread Jiri Pirko
Thu, Nov 19, 2020 at 09:38:34PM CET, edwin.p...@broadcom.com wrote:
>On Sat, Oct 10, 2020 at 3:54 PM Ido Schimmel  wrote:
>
>> Add 'ETHTOOL_A_LINKMODES_LANES' attribute and expand 'struct
>> ethtool_link_settings' with lanes field in order to implement a new
>> lanes-selector that will enable the user to advertise a specific number
>> of lanes as well.
>
>Why can't this be implied by port break-out configuration? For higher
>speed signalling modes like PAM4, what's the difference between a
>port with unused lanes vs the same port split into multiple logical
>ports? In essence, the driver could then always choose the slowest

There is a crucial difference. A split port is always configured by the user.
Each split port has a devlink instance and a netdevice associated with it.
It is one level above the lanes.


>signalling mode that utilizes all the available lanes.
>
>Regards,
>Edwin Peer




Re: [PATCH] libbpf: add support for canceling cached_cons advance

2020-11-23 Thread Magnus Karlsson
On Sun, Nov 22, 2020 at 2:21 PM Li RongQing  wrote:
>
> It is possible to fail receiving packets after calling
> xsk_ring_cons__peek, at this condition, cached_cons has
> been advanced, should be cancelled.

Thanks RongQing,

I have needed this myself in various situations, so I think we should
add this. But your motivation in the commit message is somewhat
confusing. How about something like this?

Add a new function for returning descriptors the user received after
an xsk_ring_cons__peek call. After the application has gotten a number
of descriptors from a ring, it might not be able to or want to process
them all for various reasons. Therefore, it would be useful to have an
interface for returning or cancelling a number of them so that they
are returned to the ring. This patch adds a new function called
xsk_ring_cons__cancel that performs this operation on nb descriptors
counted from the end of the batch of descriptors that was received
through the peek call.

Replace your commit message with this, fix the bug below, send a v2
and then I am happy to ack this.

/Magnus

> Signed-off-by: Li RongQing 
> ---
>  tools/lib/bpf/xsk.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 1069c46364ff..4128215c246b 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -153,6 +153,12 @@ static inline size_t xsk_ring_cons__peek(struct 
> xsk_ring_cons *cons,
> return entries;
>  }
>
> +static inline void xsk_ring_cons__cancel(struct xsk_ring_cons *cons,
> +size_t nb)
> +{
> +   rx->cached_cons -= nb;

cons-> not rx->. Please make sure the v2 compiles and passes checkpatch.

> +}
> +
>  static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t 
> nb)
>  {
> /* Make sure data has been read before indicating we are done
> --
> 2.17.3
>


RE: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-11-23 Thread Danielle Ratson



> -Original Message-
> From: Michal Kubecek 
> Sent: Thursday, October 22, 2020 7:28 PM
> To: Danielle Ratson 
> Cc: Jiri Pirko ; Andrew Lunn ; Jakub 
> Kicinski ; Ido Schimmel
> ; netdev@vger.kernel.org; da...@davemloft.net; Jiri Pirko 
> ; f.faine...@gmail.com; mlxsw
> ; Ido Schimmel ; 
> johan...@sipsolutions.net
> Subject: Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI 
> with lanes
> 
> On Thu, Oct 22, 2020 at 06:15:48AM +, Danielle Ratson wrote:
> > > -Original Message-
> > > From: Michal Kubecek 
> > > Sent: Wednesday, October 21, 2020 11:48 AM
> > >
> > > Ah, right, it does. But as you extend struct ethtool_link_ksettings
> > > and drivers will need to be updated to provide this information,
> > > wouldn't it be more useful to let the driver provide link mode in
> > > use instead (and derive number of lanes from it)?
> >
> > This is the way it is done with the speed parameter, so I have aligned
> > it to it. Why the lanes should be done differently comparing to the
> > speed?
> 
> Speed and duplex have worked this way since ages and the interface was 
> probably introduced back in times when combination of
> speed and duplex was sufficient to identify the link mode. This is no longer 
> the case and even adding number of lanes wouldn't make
> the combination unique. So if we are going to extend the interface now and 
> update drivers to provide extra information, I believe it
> would be more useful to provide full information.
> 
> Michal

Hi Michal,

What do you think of passing the link modes you have suggested as a bitmask, 
similar to "supported", that contains only one positive bit?
Something like that:

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index afae2beacbc3..dd946c88daa3 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -127,6 +127,7 @@ struct ethtool_link_ksettings {
__ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising);
__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising);
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(chosen);
} link_modes;
u32 lanes;
 };

Do you have perhaps a better suggestion?

And the speed and duplex parameters should be removed from being passed like
this as well, right?

Thanks,
Danielle


BUG: receive list entry not found for dev vcan0, id 001, mask C00007FF

2020-11-23 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:b9ad3e9f bonding: wait for sysfs kobject destruction befor..
git tree:   net
console output: https://syzkaller.appspot.com/x/log.txt?x=1195c5cd50
kernel config:  https://syzkaller.appspot.com/x/.config?x=330f3436df12fd44
dashboard link: https://syzkaller.appspot.com/bug?extid=d0ddd88c9a7432f041e6
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13c409cd50
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1349ced150

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d0ddd88c9a7432f04...@syzkaller.appspotmail.com

RAX: ffda RBX: 7fffc0827800 RCX: 00443749
RDX: 0018 RSI: 2300 RDI: 0004
RBP:  R08: 0001 R09: 01bb
R10:  R11: 0246 R12: 
R13: 0005 R14:  R15: 
[ cut here ]
BUG: receive list entry not found for dev vcan0, id 001, mask C00007FF
WARNING: CPU: 0 PID: 8495 at net/can/af_can.c:546 can_rx_unregister+0x5a4/0x700 
net/can/af_can.c:546
Modules linked in:
CPU: 0 PID: 8495 Comm: syz-executor608 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:can_rx_unregister+0x5a4/0x700 net/can/af_can.c:546
Code: 8b 7c 24 78 44 8b 64 24 68 49 c7 c5 a0 ae 56 8a e8 11 58 97 f9 44 89 f9 
44 89 e2 4c 89 ee 48 c7 c7 e0 ae 56 8a e8 76 ab d3 00 <0f> 0b 48 8b 7c 24 28 e8 
90 22 0f 01 e9 54 fb ff ff e8 06 cf d8 f9
RSP: 0018:c9000182f9f0 EFLAGS: 00010282
RAX:  RBX:  RCX: 
RDX: 88801ffe8000 RSI: 8158f3c5 RDI: f52000305f30
RBP: 0118 R08: 0001 R09: 8880b9e30627
R10:  R11:  R12: 0001
R13: 88801ab0 R14: 192000305f45 R15: c7ff
FS:  () GS:8880b9e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004c8928 CR3: 0b08e000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 isotp_notifier+0x2a7/0x540 net/can/isotp.c:1303
 call_netdevice_notifier net/core/dev.c:1735 [inline]
 call_netdevice_unregister_notifiers+0x156/0x1c0 net/core/dev.c:1763
 call_netdevice_unregister_net_notifiers net/core/dev.c:1791 [inline]
 unregister_netdevice_notifier+0xcd/0x170 net/core/dev.c:1870
 isotp_release+0x136/0x600 net/can/isotp.c:1011
 __sock_release+0xcd/0x280 net/socket.c:596
 sock_close+0x18/0x20 net/socket.c:1277
 __fput+0x285/0x920 fs/file_table.c:281
 task_work_run+0xdd/0x190 kernel/task_work.c:151
 exit_task_work include/linux/task_work.h:30 [inline]
 do_exit+0xb64/0x29b0 kernel/exit.c:809
 do_group_exit+0x125/0x310 kernel/exit.c:906
 __do_sys_exit_group kernel/exit.c:917 [inline]
 __se_sys_exit_group kernel/exit.c:915 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:915
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x442388
Code: Unable to access opcode bytes at RIP 0x44235e.
RSP: 002b:7fffc0827768 EFLAGS: 0246 ORIG_RAX: 00e7
RAX: ffda RBX: 0001 RCX: 00442388
RDX: 0001 RSI: 003c RDI: 0001
RBP: 004c88f0 R08: 00e7 R09: ffd0
R10:  R11: 0246 R12: 0001
R13: 006dd240 R14:  R15: 


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


inconsistent lock state in io_file_data_ref_zero

2020-11-23 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:27bba9c5 Merge tag 'scsi-fixes' of git://git.kernel.org/pu..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11041f1e50
kernel config:  https://syzkaller.appspot.com/x/.config?x=330f3436df12fd44
dashboard link: https://syzkaller.appspot.com/bug?extid=1f4ba1e5520762c523c6
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d9b77550
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=157e4f7550

The issue was bisected to:

commit dcd479e10a0510522a5d88b29b8f79ea3467d501
Author: Johannes Berg 
Date:   Fri Oct 9 12:17:11 2020 +

mac80211: always wind down STA state

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=130299a950
final oops: https://syzkaller.appspot.com/x/report.txt?x=108299a950
console output: https://syzkaller.appspot.com/x/log.txt?x=170299a950

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1f4ba1e5520762c52...@syzkaller.appspotmail.com
Fixes: dcd479e10a05 ("mac80211: always wind down STA state")


WARNING: inconsistent lock state
5.10.0-rc4-syzkaller #0 Not tainted

inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
8880125202a8 (&file_data->lock){+.?.}-{2:2}, at: spin_lock 
include/linux/spinlock.h:354 [inline]
8880125202a8 (&file_data->lock){+.?.}-{2:2}, at: 
io_file_data_ref_zero+0x75/0x480 fs/io_uring.c:7361
{SOFTIRQ-ON-W} state was registered at:
  lock_acquire kernel/locking/lockdep.c:5435 [inline]
  lock_acquire+0x2a3/0x8c0 kernel/locking/lockdep.c:5400
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  io_sqe_files_register fs/io_uring.c:7496 [inline]
  __io_uring_register fs/io_uring.c:9660 [inline]
  __do_sys_io_uring_register+0x343a/0x40d0 fs/io_uring.c:9750
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
irq event stamp: 131582
hardirqs last  enabled at (131582): [] 
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
hardirqs last  enabled at (131582): [] 
_raw_spin_unlock_irqrestore+0x42/0x50 kernel/locking/spinlock.c:191
hardirqs last disabled at (131581): [] 
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (131581): [] 
_raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:159
softirqs last  enabled at (131566): [] 
irq_enter_rcu+0xcf/0xf0 kernel/softirq.c:360
softirqs last disabled at (131567): [] 
asm_call_irq_on_stack+0xf/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&file_data->lock);
  <Interrupt>
    lock(&file_data->lock);

 *** DEADLOCK ***

2 locks held by swapper/0/0:
 #0: 8b337700 (rcu_callback){}-{0:0}, at: rcu_do_batch 
kernel/rcu/tree.c:2466 [inline]
 #0: 8b337700 (rcu_callback){}-{0:0}, at: rcu_core+0x576/0xe80 
kernel/rcu/tree.c:2711
 #1: 8b337820 (rcu_read_lock){}-{1:2}, at: 
percpu_ref_put_many.constprop.0+0x0/0x250 net/netfilter/xt_cgroup.c:62

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:118
 print_usage_bug kernel/locking/lockdep.c:3738 [inline]
 valid_state kernel/locking/lockdep.c:3749 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3952 [inline]
 mark_lock.cold+0x32/0x74 kernel/locking/lockdep.c:4409
 mark_usage kernel/locking/lockdep.c:4304 [inline]
 __lock_acquire+0x11b1/0x5c00 kernel/locking/lockdep.c:4784
 lock_acquire kernel/locking/lockdep.c:5435 [inline]
 lock_acquire+0x2a3/0x8c0 kernel/locking/lockdep.c:5400
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:354 [inline]
 io_file_data_ref_zero+0x75/0x480 fs/io_uring.c:7361
 percpu_ref_put_many.constprop.0+0x217/0x250 include/linux/percpu-refcount.h:322
 rcu_do_batch kernel/rcu/tree.c:2476 [inline]
 rcu_core+0x5df/0xe80 kernel/rcu/tree.c:2711
 __do_softirq+0x2a0/0x9f6 kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:393 [inline]
 __irq_exit_rcu kernel/softirq.c:423 [inline]
 irq_exit_rcu+0x132/0x200 kernel/softirq.c:435
 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/inc

general protection fault in ieee80211_subif_start_xmit

2020-11-23 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:a349e4c6 Merge tag 'xfs-5.10-fixes-7' of git://git.kernel...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1427b22550
kernel config:  https://syzkaller.appspot.com/x/.config?x=330f3436df12fd44
dashboard link: https://syzkaller.appspot.com/bug?extid=d7a3b15976bf7de2238a
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=164652f550

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d7a3b15976bf7de22...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 
0xdc34:  [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x01a0-0x01a7]
CPU: 0 PID: 10156 Comm: syz-executor.4 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:ieee80211_multicast_to_unicast net/mac80211/tx.c:4070 [inline]
RIP: 0010:ieee80211_subif_start_xmit+0x24e/0xee0 net/mac80211/tx.c:4154
Code: 03 80 3c 02 00 0f 85 83 0c 00 00 49 8b 9f 50 17 00 00 48 b8 00 00 00 00 
00 fc ff df 48 8d bb a4 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 
e2 07 38 d0 7f 08 84 c0 0f 85 58 0c 00 00
RSP: 0018:c9007588 EFLAGS: 00010203
RAX: dc00 RBX:  RCX: 8851c61d
RDX: 0034 RSI: 8851c6ad RDI: 01a4
RBP: 88801b850280 R08:  R09: 8cecb9cf
R10: 0004 R11:  R12: 8a61f1e0
R13: 888012f07042 R14: 005a R15: 8880284b
FS:  7f1159678700() GS:8880b9e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 016a9e60 CR3: 2ca99000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 __netdev_start_xmit include/linux/netdevice.h:4718 [inline]
 netdev_start_xmit include/linux/netdevice.h:4732 [inline]
 xmit_one net/core/dev.c:3564 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3580
 sch_direct_xmit+0x2e1/0xbd0 net/sched/sch_generic.c:313
 qdisc_restart net/sched/sch_generic.c:376 [inline]
 __qdisc_run+0x4ba/0x15e0 net/sched/sch_generic.c:384
 qdisc_run include/net/pkt_sched.h:131 [inline]
 qdisc_run include/net/pkt_sched.h:123 [inline]
 __dev_xmit_skb net/core/dev.c:3755 [inline]
 __dev_queue_xmit+0x1453/0x2da0 net/core/dev.c:4108
 neigh_hh_output include/net/neighbour.h:499 [inline]
 neigh_output include/net/neighbour.h:508 [inline]
 ip6_finish_output2+0x8db/0x16c0 net/ipv6/ip6_output.c:117
 __ip6_finish_output net/ipv6/ip6_output.c:143 [inline]
 __ip6_finish_output+0x447/0xab0 net/ipv6/ip6_output.c:128
 ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153
 NF_HOOK_COND include/linux/netfilter.h:290 [inline]
 ip6_output+0x1db/0x520 net/ipv6/ip6_output.c:176
 dst_output include/net/dst.h:443 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 NF_HOOK include/linux/netfilter.h:295 [inline]
 mld_sendpack+0x92a/0xdb0 net/ipv6/mcast.c:1679
 mld_send_cr net/ipv6/mcast.c:1975 [inline]
 mld_ifc_timer_expire+0x60a/0xf10 net/ipv6/mcast.c:2474
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1410
 expire_timers kernel/time/timer.c:1455 [inline]
 __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1747
 __run_timers kernel/time/timer.c:1728 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1760
 __do_softirq+0x2a0/0x9f6 kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:393 [inline]
 __irq_exit_rcu kernel/softirq.c:423 [inline]
 irq_exit_rcu+0x132/0x200 kernel/softirq.c:435
 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:631
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:85 [inline]
RIP: 0010:lock_acquire kernel/locking/lockdep.c:5438 [inline]
RIP: 0010:lock_acquire+0x2cd/0x8c0 kernel/locking/lockdep.c:5400
Code: 48 c7 c7 c0 5e 4b 89 48 83 c4 20 e8 dd 68 8f 07 b8 ff ff ff ff 65 0f c1 
05 c0 b2 ab 7e 83 f8 01 0f 85 09 04 00 00 ff 34 24 9d  37 fe ff ff 65 ff 05 
67 a1 ab 7e 48 8b 05 a0 ab 82 0b e8 6b 5d
RSP: 0018:c9000aaf73e0 EFLAGS: 0246
RAX: 0001 RBX: 19200155ee7e RCX: 8155f384
RDX: 111004e58121 RSI: 0001 RDI: 
RBP: 0001 R08:  R09: 8ebb166f
R10: fbfff1d762cd R11:  R12: 
R13: 88803eff20a8 R14:  R15: 
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inl

Re: [PATCH net-next v4 2/5] net/lapb: support netdev events

2020-11-23 Thread Xie He
On Mon, Nov 23, 2020 at 1:36 AM Xie He  wrote:
>
> Some drivers don't support carrier status and will never change it.
> Their carrier status will always be UP. There will not be a
> NETDEV_CHANGE event.
>
> lapbether doesn't change carrier status. I also have my own virtual
> HDLC WAN driver (for testing) which also doesn't change carrier
> status.
>
> I just tested with lapbether. When I bring up the interface, there
> will only be NETDEV_PRE_UP and then NETDEV_UP. There will not be
> NETDEV_CHANGE. The carrier status is always UP.
>
> I haven't tested whether a device can receive NETDEV_CHANGE when it is
> down. It's possible for a device driver to call netif_carrier_on when
> the interface is down. Do you know what will happen if a device driver
> calls netif_carrier_on when the interface is down?

I just did a test on lapbether and saw there would be no NETDEV_CHANGE
event when the netif is down, even if netif_carrier_on/off is called.
So we can rest assured of this part.


Re: [PATCH] net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled

2020-11-23 Thread Tariq Toukan




On 11/22/2020 11:12 PM, Randy Dunlap wrote:

Fix build when CONFIG_IPV6 is not enabled by making a function
be built conditionally.

Fixes these build errors and warnings:

../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c: In function 
'accel_fs_tcp_set_ipv6_flow':
../include/net/sock.h:380:34: error: 'struct sock_common' has no member named 
'skc_v6_daddr'; did you mean 'skc_daddr'?
   380 | #define sk_v6_daddr  __sk_common.skc_v6_daddr
   |  ^~~~
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:55:14: note: in 
expansion of macro 'sk_v6_daddr'
55 | &sk->sk_v6_daddr, 16);
   |  ^~~
At top level:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:47:13: warning: 
'accel_fs_tcp_set_ipv6_flow' defined but not used [-Wunused-function]
47 | static void accel_fs_tcp_set_ipv6_flow(struct mlx5_flow_spec *spec, 
struct sock *sk)

Fixes: 5229a96e59ec ("net/mlx5e: Accel, Expose flow steering API for rules 
add/del")
Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Cc: Saeed Mahameed 
Cc: Boris Pismenny 
Cc: Tariq Toukan 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
---


Reviewed-by: Tariq Toukan 

Thanks for your patch.


Re: [PATCH 1/1] xdp: compact the function xsk_map_inc

2020-11-23 Thread Magnus Karlsson
On Sun, Nov 22, 2020 at 10:07 AM Zhu Yanjun  wrote:
>
> From: Zhu Yanjun 
>
> The function xsk_map_inc always returns zero. As such, change the
> return type to void and remove the test code.
>
> Signed-off-by: Zhu Yanjun 
> Signed-off-by: Zhu Yanjun 
> ---
>  net/xdp/xsk.c|1 -
>  net/xdp/xsk.h|2 +-
>  net/xdp/xskmap.c |   10 ++
>  3 files changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index cfbec39..c1b8a88 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -548,7 +548,6 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
> node);
> if (node) {
> -   WARN_ON(xsk_map_inc(node->map));
> map = node->map;
> *map_entry = node->map_entry;
> }
> diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> index b9e896c..766b9e2 100644
> --- a/net/xdp/xsk.h
> +++ b/net/xdp/xsk.h
> @@ -41,7 +41,7 @@ struct xsk_map_node {
>
>  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
>  struct xdp_sock **map_entry);
> -int xsk_map_inc(struct xsk_map *map);
> +void xsk_map_inc(struct xsk_map *map);
>  void xsk_map_put(struct xsk_map *map);
>  void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
>  int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> index 49da2b8..c7dd94a 100644
> --- a/net/xdp/xskmap.c
> +++ b/net/xdp/xskmap.c
> @@ -11,10 +11,9 @@
>
>  #include "xsk.h"
>
> -int xsk_map_inc(struct xsk_map *map)
> +void xsk_map_inc(struct xsk_map *map)
>  {
> bpf_map_inc(&map->map);
> -   return 0;
>  }

Thank you Yanjun for your cleanup. I think we can take this one step
further and remove the function xsk_map_inc completely and use
bpf_map_inc directly in the code. Could you please do this and submit
a v2?

>  void xsk_map_put(struct xsk_map *map)
> @@ -26,17 +25,12 @@ void xsk_map_put(struct xsk_map *map)
>struct xdp_sock **map_entry)
>  {
> struct xsk_map_node *node;
> -   int err;
>
> node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
> if (!node)
> return ERR_PTR(-ENOMEM);
>
> -   err = xsk_map_inc(map);
> -   if (err) {
> -   kfree(node);
> -   return ERR_PTR(err);
> -   }
> +   xsk_map_inc(map);
>
> node->map = map;
> node->map_entry = map_entry;
> --
> 1.7.1
>


[PATCH net-next v2] net/nfc/nci: Support NCI 2.x initial sequence

2020-11-23 Thread Bongsu Jeon
Implement the NCI 2.x initial sequence to support an NCI 2.x NFCC.
Since NCI 2.0, the CORE_RESET and CORE_INIT sequences have changed.
If the NFCEE supports NCI 2.x, the NCI 2.x initial sequence will be used.

In NCI 1.0, Initial sequence and payloads are as below:
(DH) (NFCC)
 |  -- CORE_RESET_CMD --> |
 |  <-- CORE_RESET_RSP -- |
 |  -- CORE_INIT_CMD -->  |
 |  <-- CORE_INIT_RSP --  |
 CORE_RESET_RSP payloads are Status, NCI version, Configuration Status.
 CORE_INIT_CMD payloads are empty.
 CORE_INIT_RSP payloads are Status, NFCC Features,
Number of Supported RF Interfaces, Supported RF Interface,
Max Logical Connections, Max Routing table Size,
Max Control Packet Payload Size, Max Size for Large Parameters,
Manufacturer ID, Manufacturer Specific Information.

In NCI 2.0, Initial Sequence and Parameters are as below:
(DH) (NFCC)
 |  -- CORE_RESET_CMD --> |
 |  <-- CORE_RESET_RSP -- |
 |  <-- CORE_RESET_NTF -- |
 |  -- CORE_INIT_CMD -->  |
 |  <-- CORE_INIT_RSP --  |
 CORE_RESET_RSP payloads are Status.
 CORE_RESET_NTF payloads are Reset Trigger,
Configuration Status, NCI Version, Manufacturer ID,
Manufacturer Specific Information Length,
Manufacturer Specific Information.
 CORE_INIT_CMD payloads are Feature1, Feature2.
 CORE_INIT_RSP payloads are Status, NFCC Features,
Max Logical Connections, Max Routing Table Size,
Max Control Packet Payload Size,
Max Data Packet Payload Size of the Static HCI Connection,
Number of Credits of the Static HCI Connection,
Max NFC-V RF Frame Size, Number of Supported RF Interfaces,
Supported RF Interfaces.

Signed-off-by: Bongsu Jeon 
---
 Changes in v2:
  - fix the type casting warning.
  - changed the __u8 type to unsigned char.

 include/net/nfc/nci.h | 39 ++
 net/nfc/nci/core.c| 23 +++--
 net/nfc/nci/ntf.c | 21 
 net/nfc/nci/rsp.c | 75 +--
 4 files changed, 146 insertions(+), 12 deletions(-)

diff --git a/include/net/nfc/nci.h b/include/net/nfc/nci.h
index 0550e0380b8d..decc89803d4b 100644
--- a/include/net/nfc/nci.h
+++ b/include/net/nfc/nci.h
@@ -25,6 +25,8 @@
 #define NCI_MAX_PARAM_LEN  251
 #define NCI_MAX_PAYLOAD_SIZE   255
 #define NCI_MAX_PACKET_SIZE258
+#define NCI_MAX_LARGE_PARAMS_NCI_v2 15
+#define NCI_VER_2_MASK 0x20
 
 /* NCI Status Codes */
 #define NCI_STATUS_OK  0x00
@@ -131,6 +133,9 @@
 #define NCI_LF_CON_BITR_F_212  0x02
 #define NCI_LF_CON_BITR_F_424  0x04
 
+/* NCI 2.x Feature Enable Bit */
+#define NCI_FEATURE_DISABLE 0x00
+
 /* NCI Reset types */
 #define NCI_RESET_TYPE_KEEP_CONFIG 0x00
 #define NCI_RESET_TYPE_RESET_CONFIG 0x01
@@ -220,6 +225,11 @@ struct nci_core_reset_cmd {
 } __packed;
 
 #define NCI_OP_CORE_INIT_CMD   nci_opcode_pack(NCI_GID_CORE, 0x01)
+/* To support NCI 2.x */
+struct nci_core_init_v2_cmd {
+   unsigned char   feature1;
+   unsigned char   feature2;
+} __packed;
 
 #define NCI_OP_CORE_SET_CONFIG_CMD nci_opcode_pack(NCI_GID_CORE, 0x02)
 struct set_config_param {
@@ -316,6 +326,11 @@ struct nci_core_reset_rsp {
 __u8 config_status;
 } __packed;
 
+/* To support NCI ver 2.x */
+struct nci_core_reset_rsp_nci_ver2 {
+   unsigned char   status;
+} __packed;
+
 #define NCI_OP_CORE_INIT_RSP   nci_opcode_pack(NCI_GID_CORE, 0x01)
 struct nci_core_init_rsp_1 {
 __u8 status;
@@ -334,6 +349,20 @@ struct nci_core_init_rsp_2 {
__le32  manufact_specific_info;
 } __packed;
 
+/* To support NCI ver 2.x */
+struct nci_core_init_rsp_nci_ver2 {
+   unsigned char   status;
+   __le32  nfcc_features;
+   unsigned char   max_logical_connections;
+   __le16  max_routing_table_size;
+   unsigned char   max_ctrl_pkt_payload_len;
+   unsigned char   max_data_pkt_hci_payload_len;
+   unsigned char   number_of_hci_credit;
+   __le16  max_nfc_v_frame_size;
+   unsigned char   num_supported_rf_interfaces;
+   unsigned char   supported_rf_interfaces[];
+} __packed;
+
 #define NCI_OP_CORE_SET_CONFIG_RSP nci_opcode_pack(NCI_GID_CORE, 0x02)
 struct nci_core_set_config_rsp {
 __u8 status;
@@ -372,6 +401,16 @@ struct nci_nfcee_discover_rsp {
 /* --- */
 /*  NCI Notifications  */
 /* --- */
+#define NCI_OP_CORE_RESET_NTF  nci_opcode_pack(NCI_GID_CORE, 0x00)
+struct nci_core_reset_ntf {
+   unsigned char   reset_trigger;
+   unsigned char   config_status;
+   unsigned char   nci_ver;
+   unsigned char   manufact_id;
+   unsigned char   ma

Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-23 Thread Jiri Pirko
Mon, Nov 23, 2020 at 03:49:06AM CET, gcher...@marvell.com wrote:
>
>
>> -Original Message-
>> From: Jiri Pirko 
>> Sent: Saturday, November 21, 2020 7:44 PM
>> To: George Cherian 
>> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
>> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
>> ; Linu Cherian ;
>> Geethasowjanya Akula ; masahi...@kernel.org;
>> willemdebruijn.ker...@gmail.com; sa...@kernel.org
>> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
>> reporters for NPA
>> 
>> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
>> >Add health reporters for RVU NPA block.
>> >NPA Health reporters handle following HW event groups
>> > - GENERAL events
>> > - ERROR events
>> > - RAS events
>> > - RVU event
>> >An event counter per event is maintained in SW.
>> >
>> >Output:
>> > # devlink health
>> > pci/0002:01:00.0:
>> >   reporter npa
>> > state healthy error 0 recover 0
>> > # devlink  health dump show pci/0002:01:00.0 reporter npa
>> > NPA_AF_GENERAL:
>> >Unmap PF Error: 0
>> >Free Disabled for NIX0 RX: 0
>> >Free Disabled for NIX0 TX: 0
>> >Free Disabled for NIX1 RX: 0
>> >Free Disabled for NIX1 TX: 0
>> 
>> This is for 2 ports if I'm not mistaken. Then you need to have this reporter
>> per-port. Register ports and have reporter for each.
>> 
>No, these are not port specific reports.
>NIX is the Network Interface Controller co-processor block.
>There are (max of) 2 such co-processor blocks per SoC.

Ah. I see. In that case, could you please structure the json
differently. Don't concatenate the number with the string. Instead of
that, please have 2 subtrees, one for each NIX.


>
>Moreover, this is an NPA (Network Pool/Buffer Allocator co- processor) 
>reporter.
>This tells whether a free or alloc operation is skipped due to the 
>configurations set by
>other co-processor blocks (NIX,SSO,TIM etc).
>
>https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/marvell/octeontx2.html
>> NAK.


Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-23 Thread George Cherian
Hi Jiri,

> -Original Message-
> From: Jiri Pirko 
> Sent: Monday, November 23, 2020 3:52 PM
> To: George Cherian 
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> Mon, Nov 23, 2020 at 03:49:06AM CET, gcher...@marvell.com wrote:
> >
> >
> >> -Original Message-
> >> From: Jiri Pirko 
> >> Sent: Saturday, November 21, 2020 7:44 PM
> >> To: George Cherian 
> >> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> >> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> >> ; Linu Cherian ;
> >> Geethasowjanya Akula ; masahi...@kernel.org;
> >> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> >> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> >> reporters for NPA
> >>
> >> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
> >> >Add health reporters for RVU NPA block.
> >> >NPA Health reporters handle following HW event groups
> >> > - GENERAL events
> >> > - ERROR events
> >> > - RAS events
> >> > - RVU event
> >> >An event counter per event is maintained in SW.
> >> >
> >> >Output:
> >> > # devlink health
> >> > pci/0002:01:00.0:
> >> >   reporter npa
> >> > state healthy error 0 recover 0  # devlink  health dump show
> >> >pci/0002:01:00.0 reporter npa
> >> > NPA_AF_GENERAL:
> >> >Unmap PF Error: 0
> >> >Free Disabled for NIX0 RX: 0
> >> >Free Disabled for NIX0 TX: 0
> >> >Free Disabled for NIX1 RX: 0
> >> >Free Disabled for NIX1 TX: 0
> >>
> >> This is for 2 ports if I'm not mistaken. Then you need to have this
> >> reporter per-port. Register ports and have reporter for each.
> >>
> >No, these are not port specific reports.
> >NIX is the Network Interface Controller co-processor block.
> >There are (max of) 2 such co-processor blocks per SoC.
> 
> Ah. I see. In that case, could you please structure the json differently. 
> Don't
> concatenate the number with the string. Instead of that, please have 2
> subtrees, one for each NIX.
> 
NPA_AF_GENERAL:
    Unmap PF Error: 0
    Free Disabled for NIX0:
        RX: 0
        TX: 0
    Free Disabled for NIX1:
        RX: 0
        TX: 0

Something like this?

Regards,
-George
> 
> >
> >Moreover, this is an NPA (Network Pool/Buffer Allocator co- processor)
> reporter.
> >This tells whether a free or alloc operation is skipped due to the
> >configurations set by other co-processor blocks (NIX,SSO,TIM etc).
> >
> >https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__www.kernel.org_doc
> >_html_latest_networking_device-
> 5Fdrivers_ethernet_marvell_octeontx2.htm
> >l&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=npgTSgHrUSLmXpBZJKVhk0
> lE_XNvtVDl8
> >ZA2zBvBqPw&m=FNPm6lB8fRvGYvMqQWer6S9WI6rZIlMmDCqbM8xrnxM
> &s=B47zBTfDlIdM
> >xUmK0hmQkuoZnsGZYSzkvbZUloevT0A&e=
> >> NAK.



[PATCH v2] ath10k: qmi: Skip host capability request for Xiaomi Poco F1

2020-11-23 Thread Amit Pundir
Workaround to get WiFi working on Xiaomi Poco F1 (sdm845)
phone. We get a non-fatal QMI_ERR_MALFORMED_MSG_V01 error
message in ath10k_qmi_host_cap_send_sync(), but we can still
bring up WiFi services successfully on AOSP if we ignore it.

We suspect either the host cap is not implemented or there
may be firmware specific issues. Firmware version is
QC_IMAGE_VERSION_STRING=WLAN.HL.2.0.c3-00257-QCAHLSWMTPLZ-1

qcom,snoc-host-cap-8bit-quirk didn't help. If I use this
quirk, then the host capability request does get accepted,
but we run into fatal "msa info req rejected" error and
WiFi interface doesn't come up.

Attempts are being made to debug the failure reasons but no
luck so far. Hence this device specific workaround instead
of checking for QMI_ERR_MALFORMED_MSG_V01 error message.
Tried ath10k/WCN3990/hw1.0/wlanmdsp.mbn from the upstream
linux-firmware project but it didn't help and neither did
building board-2.bin file from stock bdwlan* files.

This workaround will be removed once we have a viable fix.
Thanks to postmarketOS guys for catching this.

Signed-off-by: Amit Pundir 
---
We dropped this workaround last time in favor of
a generic dts quirk to skip the host cap check. But that
has been under discussion for a while now,
https://lkml.org/lkml/2020/9/25/1119, so resending
this short term workaround for the time being.

v2: ath10k-check complained about a too long line last
time, so moved the comment to a new line.

 drivers/net/wireless/ath/ath10k/qmi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c 
b/drivers/net/wireless/ath/ath10k/qmi.c
index ae6b1f402adf..1c58b0ff1d29 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -653,7 +653,9 @@ static int ath10k_qmi_host_cap_send_sync(struct ath10k_qmi 
*qmi)
 
/* older FW didn't support this request, which is not fatal */
if (resp.resp.result != QMI_RESULT_SUCCESS_V01 &&
-   resp.resp.error != QMI_ERR_NOT_SUPPORTED_V01) {
+   resp.resp.error != QMI_ERR_NOT_SUPPORTED_V01 &&
+   /* Xiaomi Poco F1 workaround */
+   !of_machine_is_compatible("xiaomi,beryllium")) {
ath10k_err(ar, "host capability request rejected: %d\n", 
resp.resp.error);
ret = -EINVAL;
goto out;
-- 
2.7.4



Re: Is test_offload.py supposed to work?

2020-11-23 Thread Toke Høiland-Jørgensen
Jakub Kicinski  writes:

> On Fri, 20 Nov 2020 16:46:51 +0100 Toke Høiland-Jørgensen wrote:
>> Hi Jakub and Jiri
>> 
>> I am investigating an error with XDP offload mode, and figured I'd run
>> 'test_offload.py' from selftests. However, I'm unable to get it to run
>> successfully; am I missing some config options, or has it simply
>> bit-rotted to the point where it no longer works?
>
> Yeah it must have bit rotted, there are no config options to get
> wrong there AFAIK.
>
> It shouldn't be too hard to fix tho, it's just a python script...

Right, I'll take a stab at fixing it, just wanted to make sure I wasn't
missing something obvious; thanks!

-Toke



Re: Is test_offload.py supposed to work?

2020-11-23 Thread Toke Høiland-Jørgensen
Andrii Nakryiko  writes:

> On Fri, Nov 20, 2020 at 7:49 AM Toke Høiland-Jørgensen  
> wrote:
>>
>> Hi Jakub and Jiri
>>
>> I am investigating an error with XDP offload mode, and figured I'd run
>> 'test_offload.py' from selftests. However, I'm unable to get it to run
>> successfully; am I missing some config options, or has it simply
>> bit-rotted to the point where it no longer works?
>>
>
> See also discussion in [0]
>
>   [0] https://www.spinics.net/lists/netdev/msg697523.html

Ah, right, thanks for the pointer :)

-Toke



Re: [PATCH] stmmac: pci: Add support for LS7A bridge chip

2020-11-23 Thread Jiaxun Yang

Hi Lizhi,

You didn't send the patch to any mailing list; is this intentional?

On 2020/11/23 18:03, lizhi01 wrote:

Add gmac driver to support LS7A bridge chip.

Signed-off-by: lizhi01 
---
  arch/mips/configs/loongson3_defconfig  |   4 +-
  drivers/net/ethernet/stmicro/stmmac/Kconfig|   8 +
  drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
  .../net/ethernet/stmicro/stmmac/dwmac-loongson.c   | 194 +
  4 files changed, 206 insertions(+), 1 deletion(-)
  create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c

diff --git a/arch/mips/configs/loongson3_defconfig 
b/arch/mips/configs/loongson3_defconfig
index 38a817e..2e8d2be 100644
--- a/arch/mips/configs/loongson3_defconfig
+++ b/arch/mips/configs/loongson3_defconfig
@@ -225,7 +225,9 @@ CONFIG_R8169=y
  # CONFIG_NET_VENDOR_SILAN is not set
  # CONFIG_NET_VENDOR_SIS is not set
  # CONFIG_NET_VENDOR_SMSC is not set
-# CONFIG_NET_VENDOR_STMICRO is not set
+CONFIG_NET_VENDOR_STMICRO=y
+CONFIG_STMMAC_ETH=y
+CONFIG_DWMAC_LOONGSON=y
  # CONFIG_NET_VENDOR_SUN is not set
  # CONFIG_NET_VENDOR_TEHUTI is not set
  # CONFIG_NET_VENDOR_TI is not set
diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig 
b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 53f14c5..30117cb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -230,6 +230,14 @@ config DWMAC_INTEL
  This selects the Intel platform specific bus support for the
  stmmac driver. This driver is used for Intel Quark/EHL/TGL.
  
+config DWMAC_LOONGSON

+   tristate "Intel GMAC support"
+   depends on STMMAC_ETH && PCI
+   depends on COMMON_CLK
+   help
+ This selects the Intel platform specific bus support for the
+ stmmac driver.


Intel ???


+
  config STMMAC_PCI
tristate "STMMAC PCI bus support"
depends on STMMAC_ETH && PCI
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 24e6145..11ea4569 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -34,4 +34,5 @@ dwmac-altr-socfpga-objs := altr_tse_pcs.o dwmac-socfpga.o
  
  obj-$(CONFIG_STMMAC_PCI)	+= stmmac-pci.o

  obj-$(CONFIG_DWMAC_INTEL) += dwmac-intel.o
+obj-$(CONFIG_DWMAC_LOONGSON)   += dwmac-loongson.o
  stmmac-pci-objs:= stmmac_pci.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
new file mode 100644
index 000..765412e
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Loongson Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "stmmac.h"
+
+struct stmmac_pci_info {
+   int (*setup)(struct pci_dev *pdev, struct plat_stmmacenet_data *plat);
+};
+
+static void common_default_data(struct plat_stmmacenet_data *plat)
+{
+   plat->clk_csr = 2;
+   plat->has_gmac = 1;
+   plat->force_sf_dma_mode = 1;
+   
+   plat->mdio_bus_data->needs_reset = true;
+
+   plat->multicast_filter_bins = HASH_TABLE_SIZE;
+
+   plat->unicast_filter_entries = 1;
+
+   plat->maxmtu = JUMBO_LEN;
+
+   plat->tx_queues_to_use = 1;
+   plat->rx_queues_to_use = 1;
+
+   plat->tx_queues_cfg[0].use_prio = false;
+   plat->rx_queues_cfg[0].use_prio = false;
+
+   plat->rx_queues_cfg[0].pkt_route = 0x0;
+}
+
+static int loongson_default_data(struct pci_dev *pdev, struct 
plat_stmmacenet_data *plat)
+{
+   common_default_data(plat);
+   
+   plat->bus_id = pci_dev_id(pdev);
+   plat->phy_addr = -1;
+   plat->interface = PHY_INTERFACE_MODE_GMII;
+
+   plat->dma_cfg->pbl = 32;
+   plat->dma_cfg->pblx8 = true;
+
+   plat->multicast_filter_bins = 256;
+
+   return 0;   
+}



You can merge common and Loongson config as the driver is solely used by 
Loongson.


The callback is not necessary either...



+
+static const struct stmmac_pci_info loongson_pci_info = {
+   .setup = loongson_default_data,
+};
+
+static int loongson_gmac_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
+{
+   struct stmmac_pci_info *info = (struct stmmac_pci_info 
*)id->driver_data;
+   struct plat_stmmacenet_data *plat;
+   struct stmmac_resources res;
+   int ret, i, lpi_irq;
+   struct device_node *np; 
+   
+   plat = devm_kzalloc(&pdev->dev, sizeof(struct plat_stmmacenet_data), 
GFP_KERNEL);
+   if (!plat)
+   return -ENOMEM;
+
+   plat->mdio_bus_data = devm_kzalloc(&pdev->dev, sizeof(struct 
stmmac_mdio_bus_data), GFP_KERNEL);
+   if (!plat->mdio_bus_data) {
+   kfree(plat);
+   return -ENOMEM;
+   }
+
+   plat->dma_cfg = devm_kzalloc(&pdev->dev, sizeof(struct stmmac_dma_cfg), 
GFP_KERNEL);
+   if (!plat->dma_

[PATCH][next] net: hns3: fix spelling mistake "memroy" -> "memory"

2020-11-23 Thread Colin King
From: Colin Ian King 

There are spelling mistakes in two dev_err messages. Fix them.

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 500cc19225f3..ca668a47121e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -9924,7 +9924,7 @@ static int hclge_dev_mem_map(struct hclge_dev *hdev)
   pci_resource_start(pdev, HCLGE_MEM_BAR),
   pci_resource_len(pdev, HCLGE_MEM_BAR));
if (!hw->mem_base) {
-   dev_err(&pdev->dev, "failed to map device memroy\n");
+   dev_err(&pdev->dev, "failed to map device memory\n");
return -EFAULT;
}
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 5d6b419b8a78..5b2f9a56f1d8 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -2904,7 +2904,7 @@ static int hclgevf_dev_mem_map(struct hclgevf_dev *hdev)
  HCLGEVF_MEM_BAR),
   pci_resource_len(pdev, HCLGEVF_MEM_BAR));
if (!hw->mem_base) {
-   dev_err(&pdev->dev, "failed to map device memroy\n");
+   dev_err(&pdev->dev, "failed to map device memory\n");
return -EFAULT;
}
 
-- 
2.28.0



Re: [PATCH net-next v4 2/5] net/lapb: support netdev events

2020-11-23 Thread Martin Schiller

On 2020-11-23 11:08, Xie He wrote:

On Mon, Nov 23, 2020 at 1:36 AM Xie He  wrote:


Some drivers don't support carrier status and will never change it.
Their carrier status will always be UP. There will not be a
NETDEV_CHANGE event.


Well, one could argue that we would have to repair these drivers, but I
don't think that will get us anywhere.

From this point of view, it would be best to handle NETDEV_UP in
the lapb event handler and establish the link analogously to the
NETDEV_CHANGE event if the carrier is UP.



lapbether doesn't change carrier status. I also have my own virtual
HDLC WAN driver (for testing) which also doesn't change carrier
status.

I just tested with lapbether. When I bring up the interface, there
will only be NETDEV_PRE_UP and then NETDEV_UP. There will not be
NETDEV_CHANGE. The carrier status is always UP.

I haven't tested whether a device can receive NETDEV_CHANGE when it is
down. It's possible for a device driver to call netif_carrier_on when
the interface is down. Do you know what will happen if a device driver
calls netif_carrier_on when the interface is down?


I just did a test on lapbether and saw there would be no NETDEV_CHANGE
event when the netif is down, even if netif_carrier_on/off is called.
So we can rest assured of this part.


Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.

2020-11-23 Thread Marc Zyngier

On 2020-11-11 06:22, Jianyong Wu wrote:

ptp_kvm will get this service through SMCC call.
The service offers wall time and cycle count of host to guest.
The caller must specify whether they want the host cycle count
or the difference between host cycle count and cntvoff.

Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 61 +
 include/linux/arm-smccc.h   | 17 +++
 2 files changed, 78 insertions(+)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b9d8607083eb..f7d189563f3d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -9,6 +9,51 @@
 #include 
 #include 

+static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
+{
+   struct system_time_snapshot systime_snapshot;
+   u64 cycles = ~0UL;
+   u32 feature;
+
+   /*
+* The system time and counter value must be captured at the
+* same time to keep consistency and precision.
+*/
+   ktime_get_snapshot(&systime_snapshot);
+
+   // binding ptp_kvm clocksource to arm_arch_counter
+   if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+   return;
+
+   val[0] = upper_32_bits(systime_snapshot.real);
+   val[1] = lower_32_bits(systime_snapshot.real);


What is the endianness of these values? I can't see it defined
anywhere, and this is likely not to work if guest and hypervisor
don't align.


+
+   /*
+* Which of the virtual or the physical counter is being
+* asked for is decided by the r1 value of the SMCCC
+* call. If no valid r1 value is offered, the default cycle
+* value (-1) will be returned.
+* Note: keep in mind that feature is u32 while smccc_get_arg1
+* returns u64, so an implicit cast happens here.
+*/
+   feature = smccc_get_arg1(vcpu);
+   switch (feature) {
+   case ARM_PTP_VIRT_COUNTER:
+		cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, 
CNTVOFF_EL2);

+   break;
+   case ARM_PTP_PHY_COUNTER:
+   cycles = systime_snapshot.cycles;
+   break;
+   case ARM_PTP_NONE_COUNTER:


What is this "NONE" counter?


+   break;
+   default:
+   val[0] = SMCCC_RET_NOT_SUPPORTED;
+   break;
+   }
+   val[2] = upper_32_bits(cycles);
+   val[3] = lower_32_bits(cycles);


Same problem as above.


+}
+
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
@@ -79,6 +124,22 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+   val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
+   break;
+   /*
+* This serves virtual kvm_ptp.
+* Four values will be passed back.
+* reg0 stores high 32-bits of host ktime;
+* reg1 stores low 32-bits of host ktime;
+* For ARM_PTP_VIRT_COUNTER:
+* reg2 stores high 32-bits of difference of host cycles and cntvoff;
+* reg3 stores low 32-bits of difference of host cycles and cntvoff.
+* For ARM_PTP_PHY_COUNTER:
+* reg2 stores the high 32-bits of host cycles;
+* reg3 stores the low 32-bits of host cycles.
+*/
+   case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+   kvm_ptp_get_time(vcpu, val);
break;
default:
return kvm_psci_call(vcpu);
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index d75408141137..a03c5dd409d3 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -103,6 +103,7 @@

 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES 0
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP 1


I think having KVM once in the name is enough.


 #define ARM_SMCCC_KVM_FUNC_FEATURES_2  127
 #define ARM_SMCCC_KVM_NUM_FUNCS 128

@@ -114,6 +115,22 @@

 #define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED   1

+/*
+ * ptp_kvm is a feature used for time sync between vm and host.
+ * ptp_kvm module in guest kernel will get service from host using
+ * this hypercall ID.
+ */
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID   \
+   ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+  ARM_SMCCC_SMC_32,\
+  ARM_SMCCC_OWNER_VENDOR_HYP,  \
+  ARM_SMCCC_KVM_FUNC_KVM_PTP)
+
+/* ptp_kvm counter type ID */
+#define ARM_PTP_VIRT_COUNTER   0
+#define ARM_PTP_PHY_COUNTER 1
+#define ARM_PTP_NONE_COUNTER   2


The architecture definitely doesn't have this last counter.


+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_TIME_FEATURES  \
  

Re: [PATCH v15 7/9] ptp: arm/arm64: Enable ptp_kvm for arm/arm64

2020-11-23 Thread Marc Zyngier

On 2020-11-11 06:22, Jianyong Wu wrote:
Currently, there is no mechanism to keep time sync between guest and 
host
in arm/arm64 virtualization environment. Time in guest will drift 
compared

with host after boot up as they may both use third party time sources
to correct their time respectively. The time deviation will be in order
of milliseconds. But in some scenarios,like in cloud envirenment, we 
ask


environment


for higher time precision.

kvm ptp clock, which chooses the host clock source as a reference
clock to sync time between guest and host, has been adopted by x86
which takes the time sync order from milliseconds to nanoseconds.

This patch enables kvm ptp clock for arm/arm64 and improves clock sync 
precison


precision


significantly.

Test result comparisons with and without the kvm ptp clock in arm/arm64
are as follows. This test is derived from the output of the command 'chronyc
sources'. We should pay particular attention to the last sample column, which
shows the offset between the local clock and the source at the last
measurement.


no kvm ptp in guest:
MS Name/IP address   Stratum Poll Reach LastRx Last sample

^* dns1.synet.edu.cn  2   6   37713  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37721  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37729  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37737  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37745  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37753  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37761  +1040us[+1581us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   377 4   -130us[ +796us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37712   -130us[ +796us] +/-   
21ms
^* dns1.synet.edu.cn  2   6   37720   -130us[ +796us] +/-   
21ms


in host:
MS Name/IP address   Stratum Poll Reach LastRx Last sample

^* 120.25.115.20  2   7   37772   -470us[ -603us] +/-   
18ms
^* 120.25.115.20  2   7   37792   -470us[ -603us] +/-   
18ms
^* 120.25.115.20  2   7   377   112   -470us[ -603us] +/-   
18ms
^* 120.25.115.20  2   7   377 2   +872ns[-6808ns] +/-   
17ms
^* 120.25.115.20  2   7   37722   +872ns[-6808ns] +/-   
17ms
^* 120.25.115.20  2   7   37743   +872ns[-6808ns] +/-   
17ms
^* 120.25.115.20  2   7   37763   +872ns[-6808ns] +/-   
17ms
^* 120.25.115.20  2   7   37783   +872ns[-6808ns] +/-   
17ms
^* 120.25.115.20  2   7   377   103   +872ns[-6808ns] +/-   
17ms
^* 120.25.115.20  2   7   377   123   +872ns[-6808ns] +/-   
17ms


The dns1.synet.edu.cn is the network reference clock for guest and
120.25.115.20 is the network reference clock for host. we can't get the
clock error between guest and host directly, but a roughly estimated 
value

will be in order of hundreds of us to ms.

with kvm ptp in guest:
chrony has been disabled in the host to remove any disturbance from the
network clock.


MS Name/IP address         Stratum Poll Reach LastRx Last sample

* PHC0                           0    3   377      8    -7ns[   +1ns] +/-    3ns
* PHC0                           0    3   377      8    +1ns[  +16ns] +/-    3ns
* PHC0                           0    3   377      6    -4ns[   -0ns] +/-    6ns
* PHC0                           0    3   377      6    -8ns[  -12ns] +/-    5ns
* PHC0                           0    3   377      5    +2ns[   +4ns] +/-    4ns
* PHC0                           0    3   377     13    +2ns[   +4ns] +/-    4ns
* PHC0                           0    3   377     12    -4ns[   -6ns] +/-    4ns
* PHC0                           0    3   377     11    -8ns[  -11ns] +/-    6ns
* PHC0                           0    3   377     10   -14ns[  -20ns] +/-    4ns
* PHC0                           0    3   377      8    +4ns[   +5ns] +/-    4ns


PHC0 is the PTP clock that uses the host clock as its source. We can see
that the clock difference between host and guest is on the order of
nanoseconds.
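The 'Last sample' offsets quoted in these tables can also be compared
programmatically. The sketch below is an illustration only (not part of the
patch): it converts a chronyc offset token such as '+1040us' or '-7ns' into
nanoseconds, making the three orders of magnitude between the scenarios easy
to see.

```python
import re

# Scale factors for the unit suffixes chronyc uses in the "Last sample" column.
_UNITS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

def offset_ns(token):
    """Convert a chronyc offset token such as '+1040us' or '-7ns' to ns."""
    m = re.fullmatch(r"([+-]?\d+)(ns|us|ms|s)", token)
    if m is None:
        raise ValueError("unrecognized offset token: %r" % token)
    return int(m.group(1)) * _UNITS[m.group(2)]

# Without kvm_ptp the guest offset is around a millisecond; with it, a few ns.
print(offset_ns("+1040us"))  # 1040000
print(offset_ns("-7ns"))     # -7
```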

Signed-off-by: Jianyong Wu 
---
 drivers/clocksource/arm_arch_timer.c | 28 ++
 drivers/ptp/Kconfig  |  2 +-
 drivers/ptp/Makefile |  1 +
 drivers/ptp/ptp_kvm_arm.c| 44 
 4 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ptp/ptp_kvm_arm.c

diff --git a/drivers/clocksource/arm_arch_timer.c
b/drivers/clocksource/arm_arch_timer.c
index d55acffb0b90..b33c5a663d30 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -1650,3 +1651,30 @@ static int __init arch_timer_acpi_init(struct
acpi_table_header *table)
 }
 TIMER

Re: [PATCH v15 8/9] doc: add ptp_kvm introduction for arm64 support

2020-11-23 Thread Marc Zyngier

On 2020-11-11 06:22, Jianyong Wu wrote:

PTP_KVM implementation depends on hypercall using SMCCC. So we
introduce a new SMCCC service ID. This doc explains how the ID is
defined and how PTP_KVM works on arm/arm64.

Signed-off-by: Jianyong Wu 
---
 Documentation/virt/kvm/api.rst |  9 +++
 Documentation/virt/kvm/arm/index.rst   |  1 +
 Documentation/virt/kvm/arm/ptp_kvm.rst | 29 +
 Documentation/virt/kvm/timekeeping.rst | 35 ++
 4 files changed, 74 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst

diff --git a/Documentation/virt/kvm/api.rst 
b/Documentation/virt/kvm/api.rst

index 36d5f1f3c6dd..9843dbcbf770 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6391,3 +6391,12 @@ When enabled, KVM will disable paravirtual
features provided to the
 guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x4001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
+
+8.27 KVM_CAP_PTP_KVM
+
+
+:Architectures: arm64
+
+This capability indicates that KVM virtual PTP service is supported in 
host.
+It must company with the implementation of KVM virtual PTP service in 
host
+so VMM can probe if there is the service in host by checking this 
capability.

diff --git a/Documentation/virt/kvm/arm/index.rst
b/Documentation/virt/kvm/arm/index.rst
index 3e2b2aba90fc..78a9b670aafe 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -10,3 +10,4 @@ ARM
hyp-abi
psci
pvtime
+   ptp_kvm
diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst
b/Documentation/virt/kvm/arm/ptp_kvm.rst
new file mode 100644
index ..bb1e6cfefe44
--- /dev/null
+++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PTP_KVM support for arm/arm64
+=
+
+PTP_KVM is used for time sync between guest and host in a high 
precision.

+It needs to get the wall time and counter value from the host and
transfer these
+to guest via hypercall service. So one more hypercall service has been 
added.

+
+This new SMCCC hypercall is defined as:
+
+* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x8601
+
+As both 32 and 64-bits ptp_kvm client should be supported, we choose
SMC32/HVC32
+calling convention.
+
+ARM_SMCCC_HYP_KVM_PTP_FUNC_ID:
+
+=====
+Function ID: (uint32)  0x8601
+Arguments:  (uint32)  ARM_PTP_PHY_COUNTER(1) or
ARM_PTP_VIRT_COUNTER(0)
+   which indicate acquiring physical 
counter or

+   virtual counter respectively.
+return value:(uint32)  NOT_SUPPORTED(-1) or val0 and val1 
represent
+   wall clock time and val2 and val3 
represent

+   counter cycle.


This needs a lot more description:

- Which word contains what part of the data (upper/lower part of the 
64bit data)

- The endianness of the data returned

M.
--
Jazz is not dead. It just smells funny...
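The val0/val1 packing the thread discusses — the 64-bit wall-clock time split
into an upper and a lower 32-bit return word via the kernel's
upper_32_bits()/lower_32_bits() helpers — can be sketched outside the kernel.
This is an illustration of the word split only; the precise word assignment
and byte order are exactly what Marc is asking to have documented.

```python
MASK32 = 0xFFFFFFFF

def split64(value):
    """Split a 64-bit value into (upper, lower) 32-bit words, mirroring
    the kernel's upper_32_bits()/lower_32_bits() helpers."""
    return (value >> 32) & MASK32, value & MASK32

def join64(upper, lower):
    """Guest-side reassembly of two 32-bit hypercall return words."""
    return ((upper & MASK32) << 32) | (lower & MASK32)

wall_ns = 1_605_072_000_123_456_789  # an example wall-clock time in ns
val0, val1 = split64(wall_ns)        # val0 = upper word, val1 = lower word
assert join64(val0, val1) == wall_ns
```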


RE: [PATCH] libbpf: add support for canceling cached_cons advance

2020-11-23 Thread Li,Rongqing


> -Original Message-
> From: Magnus Karlsson [mailto:magnus.karls...@gmail.com]
> Sent: Monday, November 23, 2020 5:40 PM
> To: Li,Rongqing 
> Cc: Network Development ; bpf
> 
> Subject: Re: [PATCH] libbpf: add support for canceling cached_cons advance
> 
> On Sun, Nov 22, 2020 at 2:21 PM Li RongQing  wrote:
> >
> > It is possible to fail receiving packets after calling
> > xsk_ring_cons__peek, at this condition, cached_cons has been advanced,
> > should be cancelled.
> 
> Thanks RongQing,
> 
> I have needed this myself in various situations, so I think we should add 
> this.
> But your motivation in the commit message is somewhat confusing. How about
> something like this?
> 
> Add a new function for returning descriptors the user received after an
> xsk_ring_cons__peek call. After the application has gotten a number of
> descriptors from a ring, it might not be able to or want to process them all 
> for
> various reasons. Therefore, it would be useful to have an interface for 
> returning
> or cancelling a number of them so that they are returned to the ring. This 
> patch
> adds a new function called xsk_ring_cons__cancel that performs this operation
> on nb descriptors counted from the end of the batch of descriptors that was
> received through the peek call.
> 
> Replace your commit message with this, fix the bug below, send a v2 and then I
> am happy to ack this.


Thank you very much
> 
> /Magnus
> 
> > Signed-off-by: Li RongQing 
> > ---
> >  tools/lib/bpf/xsk.h | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h index
> > 1069c46364ff..4128215c246b 100644
> > --- a/tools/lib/bpf/xsk.h
> > +++ b/tools/lib/bpf/xsk.h
> > @@ -153,6 +153,12 @@ static inline size_t xsk_ring_cons__peek(struct
> xsk_ring_cons *cons,
> > return entries;
> >  }
> >
> > +static inline void xsk_ring_cons__cancel(struct xsk_ring_cons *cons,
> > +size_t nb) {
> > +   rx->cached_cons -= nb;
> 
> cons-> not rx->. Please make sure the v2 compiles and passes checkpatch.
> 

Sorry for building error
I will send V2

Thanks 

-Li


> > +}
> > +
> >  static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons,
> > size_t nb)  {
> > /* Make sure data has been read before indicating we are done
> > --
> > 2.17.3
> >
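The semantics being added — peek advances the consumer's cached index, cancel
rewinds it — can be modelled outside the kernel. The sketch below is a
simplified Python mock of libbpf's cached_cons bookkeeping; the method names
mirror the C API, but the ring itself is mocked, so this only illustrates the
index arithmetic.

```python
class ConsRing:
    """Minimal mock of libbpf's xsk_ring_cons cached-index bookkeeping."""

    def __init__(self, entries):
        self.cached_cons = 0        # consumer's local (cached) index
        self.cached_prod = entries  # pretend the producer filled the ring

    def peek(self, nb):
        """Like xsk_ring_cons__peek(): advance cached_cons by up to nb
        and return how many descriptors were reserved."""
        n = min(nb, self.cached_prod - self.cached_cons)
        self.cached_cons += n
        return n

    def cancel(self, nb):
        """Like the proposed xsk_ring_cons__cancel(): hand nb of the
        peeked descriptors back by rewinding cached_cons."""
        self.cached_cons -= nb

ring = ConsRing(entries=8)
got = ring.peek(4)   # reserve 4 descriptors; cached_cons is now 4
ring.cancel(2)       # only 2 could be processed; return the other 2
assert (got, ring.cached_cons) == (4, 2)
```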


Re: netconsole deadlock with virtnet

2020-11-23 Thread Leon Romanovsky
On Wed, Nov 18, 2020 at 09:12:57AM -0500, Steven Rostedt wrote:
>
> [ Adding netdev as perhaps someone there knows ]
>
> On Wed, 18 Nov 2020 12:09:59 +0800
> Jason Wang  wrote:
>
> > > This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is
> > > basically
> > >   soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk()
> > >
> > > Then CPU1 spins on xmit, which is owned by CPU0, CPU0 spins on
> > > console_owner, which is owned by CPU1?
>
> It still looks to me that the target_list_lock is taken in IRQ, (which can
> be the case because printk calls write_msg() which takes that lock). And
> someplace there's a:
>
>   lock(target_list_lock)
>   lock(xmit_lock)
>
> which means you can remove the console lock from this scenario completely,
> and you still have a possible deadlock between target_list_lock and
> xmit_lock.
>
> >
> >
> > If this is true, it looks not a virtio-net specific issue but somewhere
> > else.
> >
> > I think all network driver will synchronize through bh instead of hardirq.
>
> I think the issue is where target_list_lock is held when we take xmit_lock.
> Is there anywhere in netconsole.c that can end up taking xmit_lock while
> holding the target_list_lock? If so, that's the problem. As
> target_list_lock is something that can be taken in IRQ context, which means
> *any* other lock that is taking while holding the target_list_lock must
> also protect against interrupts from happening while it they are held.

I increased the printk buffer as Petr suggested and the splat is below.
It doesn't happen on x86, but it does on arm64 and ppc64.

 [   10.027975] =
 [   10.027976] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
 [   10.027976] 5.10.0-rc4_for_upstream_min_debug_2020_11_22_19_37 #1 Not 
tainted
 [   10.027977] -
 [   10.027978] modprobe/638 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [   10.027979] c9f63c98 (_xmit_ETHER#2){+.-.}-{2:2}, at: 
virtnet_poll_tx+0x84/0x120
 [   10.027982]
 [   10.027982] and this task is already holding:
 [   10.027983] 89007018 (target_list_lock){}-{2:2}, at: 
write_msg+0x6c/0x120 [netconsole]
 [   10.027985] which would create a new lock dependency:
 [   10.027985]  (target_list_lock){}-{2:2} -> (_xmit_ETHER#2){+.-.}-{2:2}
 [   10.027989]
 [   10.027989] but this new dependency connects a HARDIRQ-irq-safe lock:
 [   10.027990]  (console_owner){-...}-{0:0}
 [   10.027991]
 [   10.027992] ... which became HARDIRQ-irq-safe at:
 [   10.027992]   __lock_acquire+0xa78/0x1a94
 [   10.027993]   lock_acquire.part.0+0x170/0x360
 [   10.027993]   lock_acquire+0x68/0x8c
 [   10.027994]   console_unlock+0x1e8/0x6a4
 [   10.027994]   vprintk_emit+0x1c4/0x3c4
 [   10.027995]   vprintk_default+0x40/0x4c
 [   10.027995]   vprintk_func+0x10c/0x220
 [   10.027995]   printk+0x68/0x90
 [   10.027996]   crng_fast_load+0x1bc/0x1c0
 [   10.027997]   add_interrupt_randomness+0x280/0x290
 [   10.027997]   handle_irq_event+0x80/0x120
 [   10.027997]   handle_fasteoi_irq+0xac/0x200
 [   10.027998]   __handle_domain_irq+0x84/0xf0
 [   10.027999]   gic_handle_irq+0xd4/0x320
 [   10.027999]   el1_irq+0xd0/0x180
 [   10.028000]   arch_cpu_idle+0x24/0x44
 [   10.028000]   default_idle_call+0x48/0xa0
 [   10.028001]   do_idle+0x260/0x300
 [   10.028001]   cpu_startup_entry+0x30/0x6c
 [   10.028001]   rest_init+0x1b4/0x288
 [   10.028002]   arch_call_rest_init+0x18/0x24
 [   10.028002]   start_kernel+0x5cc/0x608
 [   10.028003]
 [   10.028003] to a HARDIRQ-irq-unsafe lock:
 [   10.028004]  (_xmit_ETHER#2){+.-.}-{2:2}
 [   10.028005]
 [   10.028006] ... which became HARDIRQ-irq-unsafe at:
 [   10.028006] ...  __lock_acquire+0x8bc/0x1a94
 [   10.028007]   lock_acquire.part.0+0x170/0x360
 [   10.028007]   lock_acquire+0x68/0x8c
 [   10.028008]   _raw_spin_trylock+0x80/0xd0
 [   10.028008]   virtnet_poll+0xac/0x360
 [   10.028009]   net_rx_action+0x1b0/0x4e0
 [   10.028010]   __do_softirq+0x1f4/0x638
 [   10.028010]   do_softirq+0xb8/0xcc
 [   10.028010]   __local_bh_enable_ip+0x18c/0x200
 [   10.028011]   virtnet_napi_enable+0xc0/0xd4
 [   10.028011]   virtnet_open+0x98/0x1c0
 [   10.028012]   __dev_open+0x12c/0x200
 [   10.028013]   __dev_change_flags+0x1a0/0x220
 [   10.028013]   dev_change_flags+0x2c/0x70
 [   10.028014]   do_setlink+0x214/0xe20
 [   10.028014]   __rtnl_newlink+0x514/0x820
 [   10.028015]   rtnl_newlink+0x58/0x84
 [   10.028015]   rtnetlink_rcv_msg+0x184/0x4b4
 [   10.028016]   netlink_rcv_skb+0x60/0x124
 [   10.028016]   rtnetlink_rcv+0x20/0x30
 [   10.028017]   netlink_unicast+0x1b4/0x270
 [   10.028017]   netlink_sendmsg+0x1f0/0x400
 [   10.028018]   sock_sendmsg+0x5c/0x70
 [   10.028018]   sys_sendmsg+0x24c/0x280
 [   10.028019]   ___sys_sendmsg+0x88/0xd0
 [   10.028019]   __sys_sendmsg+0x70/0xd0
 [   10.028020]   __arm64_sys_sendmsg+0x2c/0x40
 [   10.028021]   el0_svc_common.constprop.0+0x84/0x200
 [   10.0280

[arm64] kernel BUG at kernel/seccomp.c:1309!

2020-11-23 Thread Naresh Kamboju
While booting the arm64 kernel, the following kernel BUG was noticed on
several arm64 devices running the linux-next 20201123 tag kernel.


$ git log --oneline next-20201120..next-20201123 -- kernel/seccomp.c
5c5c5fa055ea Merge remote-tracking branch 'seccomp/for-next/seccomp'
bce6a8cba7bf Merge branch 'linus'
7ef95e3dbcee Merge branch 'for-linus/seccomp' into for-next/seccomp
fab686eb0307 seccomp: Remove bogus __user annotations
0d831528 seccomp/cache: Report cache data through /proc/pid/seccomp_cache
8e01b51a31a1 seccomp/cache: Add "emulator" to check if filter is constant allow
f9d480b6ffbe seccomp/cache: Lookup syscall allowlist bitmap for fast path
23d67a54857a seccomp: Migrate to use SYSCALL_WORK flag


Please find these easy steps to reproduce the kernel build and boot.

step to reproduce:
# please install tuxmake
# sudo pip3 install -U tuxmake
# cd linux-next
# tuxmake --runtime docker --target-arch arm --toolchain gcc-9
--kconfig defconfig --kconfig-add
https://builds.tuxbuild.com/1kgWN61pS5M35vjnVfDSvOOPd38/config

# Boot the arm64 on any arm64 devices.
# you will notice the below BUG

crash log details:
---
[6.941012] [ cut here ]
Found device  /dev/ttyAMA3.
[6.947587] lima f408.gpu: mod rate = 5
[6.955422] kernel BUG at kernel/seccomp.c:1309!
[6.955430] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[6.955437] Modules linked in: cec rfkill wlcore_sdio(+) kirin_drm
dw_drm_dsi lima(+) drm_kms_helper gpu_sched drm fuse
[6.955481] CPU: 2 PID: 291 Comm: systemd-udevd Not tainted
5.10.0-rc4-next-20201123 #2
[6.955485] Hardware name: HiKey Development Board (DT)
[6.955493] pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[6.955510] pc : __secure_computing+0xe0/0xe8
[6.958171] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
req 40Hz, actual 40HZ div = 31)
[6.965975] [drm] Initialized lima 1.1.0 20191231 for f408.gpu on minor 0
[6.970176] lr : syscall_trace_enter+0x1cc/0x218
[6.970181] sp : 800012d8be10
[6.970185] x29: 800012d8be10 x28: 0092cb00
[6.970195] x27:  x26: 
[6.970203] x25:  x24: 
[6.970210] x23: 6000 x22: 0202
[7.011614] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
req 2500Hz, actual 2480HZ div = 0)
[7.016457]
[7.016461] x21: 0200 x20: 0092cb00
[7.016470] x19: 800012d8bec0 x18: 
[7.016478] x17:  x16: 
[7.016485] x15:  x14: 
[7.054116] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
req 40Hz, actual 40HZ div = 31)
[7.056715]
[7.103444] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
req 2500Hz, actual 2480HZ div = 0)
[7.105105] x13:  x12: 
[7.125849] x11:  x10: 
[7.125858] x9 : 80001001bcbc x8 : 
[7.125865] x7 :  x6 : 
[7.125871] x5 :  x4 : 
[7.125879] x3 :  x2 : 0092cb00
[7.125886] x1 :  x0 : 0116
[7.125896] Call trace:
] Found device /dev/ttyAMA2.
[7.125908]  __secure_computing+0xe0/0xe8
[7.125918]  syscall_trace_enter+0x1cc/0x218
[7.125927]  el0_svc_common.constprop.0+0x19c/0x1b8
[7.125933]  do_el0_svc+0x2c/0x98
[7.125940]  el0_sync_handler+0x180/0x188
[7.125946]  el0_sync+0x174/0x180
[7.125958] Code: d2800121 97ffd9a9 d2800120 97fbf1a9 (d421)
[7.199584] ---[ end trace 463debbc21f0c7b5 ]---
[7.204205] note: systemd-udevd[291] exited with preempt_count 1
[7.210733] [ cut here ]
[7.215451] WARNING: CPU: 2 PID: 0 at kernel/rcu/tree.c:632 
rcu_eqs_enter.isra.0+0x134/0x140
[7.223927] Modules linked in: cec rfkill wlcore_sdio kirin_drm
dw_drm_dsi lima drm_kms_helper gpu_sched drm fuse
[7.234295] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G  D
  5.10.0-rc4-next-20201123 #2
[7.243252] Hardware name: HiKey Development Board (DT)
[7.248561] pstate: 23c5 (nzCv DAIF -PAN -UAO -TCO BTYPE=--)
[7.254638] pc : rcu_eqs_enter.isra.0+0x134/0x140
[7.259350] lr : rcu_idle_enter+0x18/0x28
[7.263362] sp : 8000128e3e80
[7.266678] x29: 8000128e3e80 x28: 
[7.272001] x27:  x26: 01b79080
[7.277321] x25:  x24: 0001adc9b310
[7.282641] x23:  x22: 01b79080
[7.287970] x21: 77b24b00 x20: 01b79098
[7.287979] x19: 800011c7ab40 x18: 0010
[7.287986] x17:  x16: 
[7.287993] x15: 0092cf98 x14: 0720072007200720
[7.288001] x13: 0720072007200720 

Re: [PATCH net-next v4 2/5] net/lapb: support netdev events

2020-11-23 Thread Xie He
On Mon, Nov 23, 2020 at 2:38 AM Martin Schiller  wrote:
>
> Well, one could argue that we would have to repair these drivers, but I
> don't think that will get us anywhere.

Yeah... One problem I see with the Linux project is the lack of
docs/specs. Often we don't know what is right and what is wrong.

>  From this point of view it will be the best to handle the NETDEV_UP in
> the lapb event handler and establish the link analog to the
> NETDEV_CHANGE event if the carrier is UP.

Thanks! This way we can make sure LAPB would automatically connect in
all situations.

Since we'll have a netif_carrier_ok check in the NETDEV_UP handling, it
might make the code look prettier to also have a netif_carrier_ok
check in the NETDEV_GOING_DOWN handling (for symmetry). Just a suggestion.
You can do whatever looks good to you :)

Thanks!
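The symmetry discussed here — gating both the connect and disconnect events on
the carrier state — can be sketched as a toy event handler. This is a
hypothetical illustration only (the constants and the `link` dict are invented
for the example, not LAPB's real data structures).

```python
NETDEV_UP, NETDEV_GOING_DOWN, NETDEV_CHANGE = range(3)

def lapb_device_event(event, carrier_ok, link):
    """Sketch of the proposed symmetric handling: establish the LAPB link
    on NETDEV_UP/NETDEV_CHANGE only when the carrier is up, and tear it
    down on NETDEV_GOING_DOWN only when the carrier is up."""
    if event in (NETDEV_UP, NETDEV_CHANGE) and carrier_ok:
        link["established"] = True
    elif event == NETDEV_GOING_DOWN and carrier_ok:
        link["established"] = False
    return link

link = {"established": False}
lapb_device_event(NETDEV_UP, carrier_ok=True, link=link)
assert link["established"]
lapb_device_event(NETDEV_GOING_DOWN, carrier_ok=True, link=link)
assert not link["established"]
```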


Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-23 Thread Jiri Pirko
Mon, Nov 23, 2020 at 11:28:28AM CET, gcher...@marvell.com wrote:
>Hi Jiri,
>
>> -Original Message-
>> From: Jiri Pirko 
>> Sent: Monday, November 23, 2020 3:52 PM
>> To: George Cherian 
>> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
>> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
>> ; Linu Cherian ;
>> Geethasowjanya Akula ; masahi...@kernel.org;
>> willemdebruijn.ker...@gmail.com; sa...@kernel.org
>> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
>> reporters for NPA
>> 
>> Mon, Nov 23, 2020 at 03:49:06AM CET, gcher...@marvell.com wrote:
>> >
>> >
>> >> -Original Message-
>> >> From: Jiri Pirko 
>> >> Sent: Saturday, November 21, 2020 7:44 PM
>> >> To: George Cherian 
>> >> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
>> >> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
>> >> ; Linu Cherian ;
>> >> Geethasowjanya Akula ; masahi...@kernel.org;
>> >> willemdebruijn.ker...@gmail.com; sa...@kernel.org
>> >> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
>> >> reporters for NPA
>> >>
>> >> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
>> >> >Add health reporters for RVU NPA block.
>> >> >NPA Health reporters handle following HW event groups
>> >> > - GENERAL events
>> >> > - ERROR events
>> >> > - RAS events
>> >> > - RVU event
>> >> >An event counter per event is maintained in SW.
>> >> >
>> >> >Output:
>> >> > # devlink health
>> >> > pci/0002:01:00.0:
>> >> >   reporter npa
>> >> > state healthy error 0 recover 0  # devlink  health dump show
>> >> >pci/0002:01:00.0 reporter npa
>> >> > NPA_AF_GENERAL:
>> >> >Unmap PF Error: 0
>> >> >Free Disabled for NIX0 RX: 0
>> >> >Free Disabled for NIX0 TX: 0
>> >> >Free Disabled for NIX1 RX: 0
>> >> >Free Disabled for NIX1 TX: 0
>> >>
>> >> This is for 2 ports if I'm not mistaken. Then you need to have this
>> >> reporter per-port. Register ports and have reporter for each.
>> >>
>> >No, these are not port specific reports.
>> >NIX is the Network Interface Controller co-processor block.
>> >There are (max of) 2 such co-processor blocks per SoC.
>> 
>> Ah. I see. In that case, could you please structure the json differently. 
>> Don't
>> concatenate the number with the string. Instead of that, please have 2
>> subtrees, one for each NIX.
>> 
>NPA_AF_GENERAL:
>Unmap PF Error: 0
>Free Disabled for NIX0 
>   RX: 0
>   TX: 0
>Free Disabled for NIX1
>   RX: 0
>   TX: 0
>
>Something like this?

It should be a 2-member array; use devlink_fmsg_arr_pair_nest_start():
NIX {
0: {free disabled TX: 0, free disabled RX: 0,}
1: {free disabled TX: 0, free disabled RX: 0,}
}

something like this.
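The layout being asked for — one "NIX" array with an entry per co-processor
block, instead of key names with the index concatenated in — corresponds to
JSON like the following. This is a sketch of the desired output shape only,
not of the devlink_fmsg_* C calls themselves.

```python
import json

# Desired shape: one "NIX" array with an entry per co-processor block,
# instead of "Free Disabled for NIX0 RX"-style concatenated key names.
report = {
    "NPA_AF_GENERAL": {
        "Unmap PF Error": 0,
        "NIX": [
            {"free disabled RX": 0, "free disabled TX": 0},
            {"free disabled RX": 0, "free disabled TX": 0},
        ],
    }
}
print(json.dumps(report, indent=4))
```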


>
>Regards,
>-George
>> 
>> >
>> >Moreover, this is an NPA (Network Pool/Buffer Allocator co- processor)
>> reporter.
>> >This tells whether a free or alloc operation is skipped due to the
>> >configurations set by other co-processor blocks (NIX,SSO,TIM etc).
>> >
>> >https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__www.kernel.org_doc
>> >_html_latest_networking_device-
>> 5Fdrivers_ethernet_marvell_octeontx2.htm
>> >l&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=npgTSgHrUSLmXpBZJKVhk0
>> lE_XNvtVDl8
>> >ZA2zBvBqPw&m=FNPm6lB8fRvGYvMqQWer6S9WI6rZIlMmDCqbM8xrnxM
>> &s=B47zBTfDlIdM
>> >xUmK0hmQkuoZnsGZYSzkvbZUloevT0A&e=
>> >> NAK.
>


Re: [PATCH 0/3] Bluetooth: Power down controller when suspending

2020-11-23 Thread Marcel Holtmann
Hi Abhishek,

> This patch series adds support for a quirk that will power down the
> Bluetooth controller when suspending and power it back up when resuming.
> 
> On Marvell SDIO Bluetooth controllers (SD8897 and SD8997), we are seeing
> a large number of suspend failures with the following log messages:
> 
> [ 4764.773873] Bluetooth: hci_cmd_timeout() hci0 command 0x0c14 tx timeout
> [ 4767.777897] Bluetooth: btmrvl_enable_hs() Host sleep enable command failed
> [ 4767.777920] Bluetooth: btmrvl_sdio_suspend() HS not actived, suspend 
> failed!
> [ 4767.777946] dpm_run_callback(): pm_generic_suspend+0x0/0x48 returns -16
> [ 4767.777963] call mmc2:0001:2+ returned -16 after 4882288 usecs
> 
> The daily failure rate with this signature is quite significant and
> users are likely facing this at least once a day (and some unlucky users
> are likely facing it multiple times a day).
> 
> Given the severity, we'd like to power off the controller during suspend
> so the driver doesn't need to take any action (or block in any way) when
> suspending and power on during resume. This will break wake-on-bt for
> users but should improve the reliability of suspend.
> 
> We don't want to force all users of MVL8897 and MVL8997 to encounter
> this behavior if they're not affected (especially users that depend on
> Bluetooth for keyboard/mouse input) so the new behavior is enabled via
> module param. We are limiting this quirk to only Chromebooks (i.e.
> laptop). Chromeboxes will continue to have the old behavior since users
> may depend on BT HID to wake and use the system.

I don’t have a super great feeling with this change.

So historically only hciconfig hci0 up/down was doing a power cycle of the 
controller, and when adding the mgmt interface we moved that to the mgmt 
interface. In addition we added a special case of power up via hdev->setup. We 
never intended for the kernel to otherwise power the controller up/down as it 
pleases.

Can we ask Marvell first to investigate why this is fundamentally broken with 
their hardware? What you are proposing is a pretty heavy change that 
might have side effects. For example, the state machine for the mgmt interface 
has no concept of a power down/up from the kernel. It is all triggered by 
bluetoothd.

I am careful here since the whole power up/down path is already complicated 
enough.

Regards

Marcel



Re: [PATCH] Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option

2020-11-23 Thread Marcel Holtmann
Hi Wei,

> This commit add the invalid check for connected socket, without it will
> causes the following crash due to sco_pi(sk)->conn being NULL:
> 
> KASAN: null-ptr-deref in range [0x0050-0x0057]
> CPU: 3 PID: 4284 Comm: test_sco Not tainted 5.10.0-rc3+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 
> 04/01/2014
> RIP: 0010:sco_sock_getsockopt+0x45d/0x8e0
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 ca 03 00 00 49 8b 9d f8 04 00 00 48 b8 00
>  00 00 00 00 fc ff df 48 8d 7b 50 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
>  c0 74 08 3c 03 0f 8e b5 03 00 00 8b 43 50 48 8b 0c
> RSP: 0018:88801bb17d88 EFLAGS: 00010206
> RAX: dc00 RBX:  RCX: 83a4ecdf
> RDX: 000a RSI: c90002fce000 RDI: 0050
> RBP: 111003762fb4 R08: 0001 R09: 88810e1008c0
> R10: bd695dcf R11: fbfff7ad2bb9 R12: 
> R13: 888018ff1000 R14: dc00 R15: 000d
> FS:  7fb4f76c1700() GS:88811af8() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: e3b7a938 CR3: 0001117be001 CR4: 00770ee0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
> ? sco_skb_put_cmsg+0x80/0x80
> ? sco_skb_put_cmsg+0x80/0x80
> __sys_getsockopt+0x12a/0x220
> ? __ia32_sys_setsockopt+0x150/0x150
> ? syscall_enter_from_user_mode+0x18/0x50
> ? rcu_read_lock_bh_held+0xb0/0xb0
> __x64_sys_getsockopt+0xba/0x150
> ? syscall_enter_from_user_mode+0x1d/0x50
> do_syscall_64+0x33/0x40
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Fixes: 0fc1a726f897 ("Bluetooth: sco: new getsockopt options 
> BT_SNDMTU/BT_RCVMTU")
> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 

patch has been applied to bluetooth-next tree.

Regards

Marcel



Re: [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling

2020-11-23 Thread Eric Dumazet



On 11/19/20 8:46 PM, Mat Martineau wrote:
> From: Paolo Abeni 
> 
> Send timely MPTCP-level ack is somewhat difficult when
> the insertion into the msk receive level is performed
> by the worker.
> 
> It needs TCP-level dup-ack to notify the MPTCP-level
> ack_seq increase, as both the TCP-level ack seq and the
> rcv window are unchanged.
> 
> We can actually avoid processing incoming data with the
> worker, and let the subflow or recevmsg() send ack as needed.
> 
> When recvmsg() moves the skbs inside the msk receive queue,
> the msk space is still unchanged, so tcp_cleanup_rbuf() could
> end-up skipping TCP-level ack generation. Anyway, when
> __mptcp_move_skbs() is invoked, a known amount of bytes is
> going to be consumed soon: we update rcv wnd computation taking
> them in account.
> 
> Additionally we need to explicitly trigger tcp_cleanup_rbuf()
> when recvmsg() consumes a significant amount of the receive buffer.
> 
> Signed-off-by: Paolo Abeni 
> Signed-off-by: Mat Martineau 
> ---
>  net/mptcp/options.c  |   1 +
>  net/mptcp/protocol.c | 105 +--
>  net/mptcp/protocol.h |   8 
>  net/mptcp/subflow.c  |   4 +-
>  4 files changed, 61 insertions(+), 57 deletions(-)
> 
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 248e3930c0cb..8a59b3e44599 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -530,6 +530,7 @@ static bool mptcp_established_options_dss(struct sock 
> *sk, struct sk_buff *skb,
>   opts->ext_copy.ack64 = 0;
>   }
>   opts->ext_copy.use_ack = 1;
> + WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
>  
>   /* Add kind/length/subtype/flag overhead if mapping is not populated */
>   if (dss_size == 0)
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4ae2c4a30e44..748343f1a968 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -407,16 +407,42 @@ static void mptcp_set_timeout(const struct sock *sk, 
> const struct sock *ssk)
>   mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
>  }
>  
> -static void mptcp_send_ack(struct mptcp_sock *msk)
> +static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
> +{
> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> + /* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
> + if (subflow->request_join && !subflow->fully_established)
> + return false;
> +
> + /* only send if our side has not closed yet */
> + return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
> +}
> +
> +static void mptcp_send_ack(struct mptcp_sock *msk, bool force)
>  {
>   struct mptcp_subflow_context *subflow;
> + struct sock *pick = NULL;
>  
>   mptcp_for_each_subflow(msk, subflow) {
>   struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>  
> - lock_sock(ssk);
> - tcp_send_ack(ssk);
> - release_sock(ssk);
> + if (force) {
> + lock_sock(ssk);
> + tcp_send_ack(ssk);
> + release_sock(ssk);
> + continue;
> + }
> +
> + /* if the hintes ssk is still active, use it */
> + pick = ssk;
> + if (ssk == msk->ack_hint)
> + break;
> + }
> + if (!force && pick) {
> + lock_sock(pick);
> + tcp_cleanup_rbuf(pick, 1);

Calling tcp_cleanup_rbuf() on a socket that was never established is going to 
fail
with a divide by 0 (mss being 0)

AFAIK, mptcp_recvmsg() can be called right after a socket(AF_INET, SOCK_STREAM, 
IPPROTO_MPTCP)
call.

Probably, after a lock_sock(), you should double check socket state (same above 
before calling tcp_send_ack())



> + release_sock(pick);
>   }
>  }
>  




>  
> + /* be sure to advertise window change */
> + old_space = READ_ONCE(msk->old_wspace);
> + if ((tcp_space(sk) - old_space) >= old_space)
> + mptcp_send_ack(msk, false);
> +

Yes, if we call recvmsg() right after socket(), we will end up calling 
tcp_cleanup_rbuf(),
while no byte was ever copied/drained.
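The window-change check quoted above sends an MPTCP-level ack when the free
receive space has grown by at least the previously advertised space, i.e. the
usable window has at least doubled. Expressed as a standalone predicate (names
are hypothetical, lifted from the patch for illustration), it also exposes the
degenerate case Eric points out: with old_wspace == 0 on a never-established
socket the condition is trivially true.

```python
def should_send_ack(tcp_space, old_wspace):
    """True when the free receive space grew by at least old_wspace,
    i.e. (tcp_space - old_wspace) >= old_wspace."""
    return (tcp_space - old_wspace) >= old_wspace

assert should_send_ack(tcp_space=2048, old_wspace=1024)      # space doubled
assert not should_send_ack(tcp_space=1536, old_wspace=1024)  # grew < 2x
assert should_send_ack(tcp_space=0, old_wspace=0)            # degenerate case
```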



Re: [PATCH net] Bluetooth: Fix potential null pointer dereference in create_le_conn_complete

2020-11-23 Thread Marcel Holtmann
Hi Wang,

> The pointer 'conn' may be null. Before being used by
> hci_connect_le_scan_cleanup(), The pointer 'conn' must be
> checked whether it is null.
> 
> Fixes: 28a667c9c279 ("Bluetooth: advertisement handling in new connect 
> procedure")
> Reported-by: Hulk Robot 
> Signed-off-by: Wang Hai 
> ---
> net/bluetooth/hci_conn.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)

please send a version that applies cleanly against bluetooth-next tree.

Regards

Marcel



BUG: receive list entry not found for dev vxcan1, id 002, mask C00007FF

2020-11-23 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:c2e7554e Merge tag 'gfs2-v5.10-rc4-fixes' of git://git.ker..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=117f03ba50
kernel config:  https://syzkaller.appspot.com/x/.config?x=75292221eb79ace2
dashboard link: https://syzkaller.appspot.com/bug?extid=381d06e0c8eaacb8706f
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+381d06e0c8eaacb87...@syzkaller.appspotmail.com

[ cut here ]
BUG: receive list entry not found for dev vxcan1, id 002, mask C7FF
WARNING: CPU: 1 PID: 12946 at net/can/af_can.c:546 
can_rx_unregister+0x5a4/0x700 net/can/af_can.c:546
Modules linked in:
CPU: 1 PID: 12946 Comm: syz-executor.1 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:can_rx_unregister+0x5a4/0x700 net/can/af_can.c:546
Code: 8b 7c 24 78 44 8b 64 24 68 49 c7 c5 20 ac 56 8a e8 01 6c 97 f9 44 89 f9 
44 89 e2 4c 89 ee 48 c7 c7 60 ac 56 8a e8 66 af d3 00 <0f> 0b 48 8b 7c 24 28 e8 
b0 25 0f 01 e9 54 fb ff ff e8 26 e0 d8 f9
RSP: 0018:c90017e2fb38 EFLAGS: 00010286
RAX:  RBX:  RCX: 
RDX: 8880147a8000 RSI: 8158f3c5 RDI: f52002fc5f59
RBP: 0118 R08: 0001 R09: 8880b9f2011b
R10:  R11:  R12: 0002
R13: 8880254c R14: 192002fc5f6e R15: c7ff
FS:  01ddc940() GS:8880b9f0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 001b2f121000 CR3: 152c CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 isotp_notifier+0x2a7/0x540 net/can/isotp.c:1303
 call_netdevice_notifier net/core/dev.c:1735 [inline]
 call_netdevice_unregister_notifiers+0x156/0x1c0 net/core/dev.c:1763
 call_netdevice_unregister_net_notifiers net/core/dev.c:1791 [inline]
 unregister_netdevice_notifier+0xcd/0x170 net/core/dev.c:1870
 isotp_release+0x136/0x600 net/can/isotp.c:1011
 __sock_release+0xcd/0x280 net/socket.c:596
 sock_close+0x18/0x20 net/socket.c:1277
 __fput+0x285/0x920 fs/file_table.c:281
 task_work_run+0xdd/0x190 kernel/task_work.c:151
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
 exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
 syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x417811
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 a4 1a 00 00 c3 48 83 
ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 
53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0169fbf0 EFLAGS: 0293 ORIG_RAX: 0003
RAX:  RBX: 0004 RCX: 00417811
RDX:  RSI: 13b7 RDI: 0003
RBP: 0001 R08: acabb3b7 R09: acabb3bb
R10: 0169fcd0 R11: 0293 R12: 0118c9a0
R13: 0118c9a0 R14: 03e8 R15: 0118bf2c


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


Re: [PATCH 1/2] bluetooth: hci_event: consolidate error paths in hci_phy_link_complete_evt()

2020-11-23 Thread Marcel Holtmann
Hi Sergey,

>>> hci_phy_link_complete_evt() has several duplicate error paths -- consolidate
>>> them, using the *goto* statements.
>>> 
>>> Signed-off-by: Sergey Shtylyov 
>>> 
>>> ---
>>> net/bluetooth/hci_event.c |   16 ++--
>>> 1 file changed, 6 insertions(+), 10 deletions(-)
>> patch has been applied to bluetooth-next tree.
> 
>   What about the 2nd patch?

it must have slipped through somehow. Can you please re-send it against bluetooth-next?

Regards

Marcel



Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.

2020-11-23 Thread Marc Zyngier

On 2020-11-23 10:44, Marc Zyngier wrote:

On 2020-11-11 06:22, Jianyong Wu wrote:

ptp_kvm will get this service through an SMCCC call.
The service offers the host's wall time and cycle count to the guest.
The caller must specify whether they want the host cycle count
or the difference between the host cycle count and cntvoff.

Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 61 
+

 include/linux/arm-smccc.h   | 17 +++
 2 files changed, 78 insertions(+)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b9d8607083eb..f7d189563f3d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -9,6 +9,51 @@
 #include 
 #include 

+static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
+{
+   struct system_time_snapshot systime_snapshot;
+   u64 cycles = ~0UL;
+   u32 feature;
+
+   /*
+* The system time and counter value must be captured at the
+* same time to keep consistency and precision.
+*/
+   ktime_get_snapshot(&systime_snapshot);
+
+   // binding ptp_kvm clocksource to arm_arch_counter
+   if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+   return;
+
+   val[0] = upper_32_bits(systime_snapshot.real);
+   val[1] = lower_32_bits(systime_snapshot.real);


What is the endianness of these values? I can't see it defined
anywhere, and this is likely not to work if guest and hypervisor
don't align.


Scratch that. This is all passed via registers, so the endianness
of the data is irrelevant. Please discard any comment about endianness
I made in this review.

The documentation aspect still requires to be beefed up.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCHv2 1/1] xdp: remove the function xsk_map_inc

2020-11-23 Thread Zhu Yanjun
On Mon, Nov 23, 2020 at 8:05 PM  wrote:
>
> From: Zhu Yanjun 
>
> The function xsk_map_inc is a simple wrapper of bpf_map_inc and
> always returns zero. As such, replace this function with bpf_map_inc
> and remove the test code.
>
> Signed-off-by: Zhu Yanjun 


> ---
>  net/xdp/xsk.c|  1 -
>  net/xdp/xsk.h|  1 -
>  net/xdp/xskmap.c | 13 +
>  3 files changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index cfbec3989a76..c1b8a888591c 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -548,7 +548,6 @@ static struct xsk_map *xsk_get_map_list_entry(struct 
> xdp_sock *xs,
> node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
> node);
> if (node) {
> -   WARN_ON(xsk_map_inc(node->map));
> map = node->map;
> *map_entry = node->map_entry;
> }
> diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> index b9e896cee5bb..0aad25c0e223 100644
> --- a/net/xdp/xsk.h
> +++ b/net/xdp/xsk.h
> @@ -41,7 +41,6 @@ static inline struct xdp_sock *xdp_sk(struct sock *sk)
>
>  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
>  struct xdp_sock **map_entry);
> -int xsk_map_inc(struct xsk_map *map);
>  void xsk_map_put(struct xsk_map *map);
>  void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
>  int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> index 49da2b8ace8b..6b7e9a72b101 100644
> --- a/net/xdp/xskmap.c
> +++ b/net/xdp/xskmap.c
> @@ -11,12 +11,6 @@
>
>  #include "xsk.h"
>
> -int xsk_map_inc(struct xsk_map *map)
> -{
> -   bpf_map_inc(&map->map);
> -   return 0;
> -}

Hi, Magnus

The function xsk_map_inc is replaced with bpf_map_inc.

Zhu Yanjun

> -
>  void xsk_map_put(struct xsk_map *map)
>  {
> bpf_map_put(&map->map);
> @@ -26,17 +20,12 @@ static struct xsk_map_node *xsk_map_node_alloc(struct 
> xsk_map *map,
>struct xdp_sock **map_entry)
>  {
> struct xsk_map_node *node;
> -   int err;
>
> node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
> if (!node)
> return ERR_PTR(-ENOMEM);
>
> -   err = xsk_map_inc(map);
> -   if (err) {
> -   kfree(node);
> -   return ERR_PTR(err);
> -   }
> +   bpf_map_inc(&map->map);
>
> node->map = map;
> node->map_entry = map_entry;
> --
> 2.25.1
>


Re: [PATCH net-next] bridge: mrp: Implement LC mode for MRP

2020-11-23 Thread Nikolay Aleksandrov
On 23/11/2020 13:14, Horatiu Vultur wrote:
> Extend MRP to support LC mode (link check) for the interconnect port.
> This applies only to the interconnect ring.
> 
> Opposite to RC mode (ring check), the LC mode uses CFM frames to
> detect when the link goes up or down, and based on that the userspace
> will need to react.
> One advantage of the LC mode over RC mode is that there will be fewer
> frames in the normal rings, because RC mode generates InTest frames on
> all ports while LC mode sends CFM frames only on the interconnect port.
> 
> All 4 nodes that are part of the interconnect ring need to have the
> same mode, and it is not possible to have LC and RC mode running at
> the same time on a node.
> 
> Whenever the MIM starts, it needs to detect the status of the other 3
> nodes in the interconnect ring, so it sends a frame called
> InLinkStatus, to which the clients need to reply with their link
> status.
> 
> This patch adds the frame header for the frame InLinkStatus and
> extends existing rules on how to forward this frame.
> 
> Signed-off-by: Horatiu Vultur 
> ---
>  include/uapi/linux/mrp_bridge.h |  7 +++
>  net/bridge/br_mrp.c | 18 +++---
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 

Hi Horatiu,
The patch looks good overall, just one question below.

> diff --git a/include/uapi/linux/mrp_bridge.h b/include/uapi/linux/mrp_bridge.h
> index 6aeb13ef0b1e..450f6941a5a1 100644
> --- a/include/uapi/linux/mrp_bridge.h
> +++ b/include/uapi/linux/mrp_bridge.h
> @@ -61,6 +61,7 @@ enum br_mrp_tlv_header_type {
>   BR_MRP_TLV_HEADER_IN_TOPO = 0x7,
>   BR_MRP_TLV_HEADER_IN_LINK_DOWN = 0x8,
>   BR_MRP_TLV_HEADER_IN_LINK_UP = 0x9,
> + BR_MRP_TLV_HEADER_IN_LINK_STATUS = 0xa,
>   BR_MRP_TLV_HEADER_OPTION = 0x7f,
>  };
>  
> @@ -156,4 +157,10 @@ struct br_mrp_in_link_hdr {
>   __be16 interval;
>  };
>  
> +struct br_mrp_in_link_status_hdr {
> + __u8 sa[ETH_ALEN];
> + __be16 port_role;
> + __be16 id;
> +};
> +

I didn't see this struct used anywhere, am I missing anything?

Cheers,
 Nik

>  #endif
> diff --git a/net/bridge/br_mrp.c b/net/bridge/br_mrp.c
> index bb12fbf9aaf2..cec2c4e4561d 100644
> --- a/net/bridge/br_mrp.c
> +++ b/net/bridge/br_mrp.c
> @@ -858,7 +858,8 @@ static bool br_mrp_in_frame(struct sk_buff *skb)
>   if (hdr->type == BR_MRP_TLV_HEADER_IN_TEST ||
>   hdr->type == BR_MRP_TLV_HEADER_IN_TOPO ||
>   hdr->type == BR_MRP_TLV_HEADER_IN_LINK_DOWN ||
> - hdr->type == BR_MRP_TLV_HEADER_IN_LINK_UP)
> + hdr->type == BR_MRP_TLV_HEADER_IN_LINK_UP ||
> + hdr->type == BR_MRP_TLV_HEADER_IN_LINK_STATUS)
>   return true;
>  
>   return false;
> @@ -1126,9 +1127,9 @@ static int br_mrp_rcv(struct net_bridge_port *p,
>   goto no_forward;
>   }
>   } else {
> - /* MIM should forward IntLinkChange and
> + /* MIM should forward IntLinkChange/Status and
>* IntTopoChange between ring ports but MIM
> -  * should not forward IntLinkChange and
> +  * should not forward IntLinkChange/Status and
>* IntTopoChange if the frame was received at
>* the interconnect port
>*/
> @@ -1155,6 +1156,17 @@ static int br_mrp_rcv(struct net_bridge_port *p,
>in_type == BR_MRP_TLV_HEADER_IN_LINK_DOWN))
>   goto forward;
>  
> + /* MIC should forward IntLinkStatus frames only to
> +  * interconnect port if it was received on a ring port.
> +  * If it is received on interconnect port then, it
> +  * should be forward on both ring ports
> +  */
> + if (br_mrp_is_ring_port(p_port, s_port, p) &&
> + in_type == BR_MRP_TLV_HEADER_IN_LINK_STATUS) {
> + p_dst = NULL;
> + s_dst = NULL;
> + }
> +
>   /* Should forward the InTopo frames only between the
>* ring ports
>*/
> 



Re: [PATCHv2 1/1] xdp: remove the function xsk_map_inc

2020-11-23 Thread Magnus Karlsson
On Mon, Nov 23, 2020 at 1:11 PM Zhu Yanjun  wrote:
>
> On Mon, Nov 23, 2020 at 8:05 PM  wrote:
> >
> > From: Zhu Yanjun 
> >
> > The function xsk_map_inc is a simple wrapper of bpf_map_inc and
> > always returns zero. As such, replace this function with bpf_map_inc
> > and remove the test code.
> >
> > Signed-off-by: Zhu Yanjun 
>
>
> > ---
> >  net/xdp/xsk.c|  1 -
> >  net/xdp/xsk.h|  1 -
> >  net/xdp/xskmap.c | 13 +
> >  3 files changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index cfbec3989a76..c1b8a888591c 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -548,7 +548,6 @@ static struct xsk_map *xsk_get_map_list_entry(struct 
> > xdp_sock *xs,
> > node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
> > node);
> > if (node) {
> > -   WARN_ON(xsk_map_inc(node->map));

This should be bpf_map_inc(&node->map->map); Think you forgot to
convert this one.

> > map = node->map;
> > *map_entry = node->map_entry;
> > }
> > diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> > index b9e896cee5bb..0aad25c0e223 100644
> > --- a/net/xdp/xsk.h
> > +++ b/net/xdp/xsk.h
> > @@ -41,7 +41,6 @@ static inline struct xdp_sock *xdp_sk(struct sock *sk)
> >
> >  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
> >  struct xdp_sock **map_entry);
> > -int xsk_map_inc(struct xsk_map *map);
> >  void xsk_map_put(struct xsk_map *map);
> >  void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
> >  int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> > diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> > index 49da2b8ace8b..6b7e9a72b101 100644
> > --- a/net/xdp/xskmap.c
> > +++ b/net/xdp/xskmap.c
> > @@ -11,12 +11,6 @@
> >
> >  #include "xsk.h"
> >
> > -int xsk_map_inc(struct xsk_map *map)
> > -{
> > -   bpf_map_inc(&map->map);
> > -   return 0;
> > -}
>
> Hi, Magnus
>
> The function xsk_map_inc is replaced with bpf_map_inc.
>
> Zhu Yanjun
>
> > -
> >  void xsk_map_put(struct xsk_map *map)
> >  {
> > bpf_map_put(&map->map);
> > @@ -26,17 +20,12 @@ static struct xsk_map_node *xsk_map_node_alloc(struct 
> > xsk_map *map,
> >struct xdp_sock **map_entry)
> >  {
> > struct xsk_map_node *node;
> > -   int err;
> >
> > node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
> > if (!node)
> > return ERR_PTR(-ENOMEM);
> >
> > -   err = xsk_map_inc(map);
> > -   if (err) {
> > -   kfree(node);
> > -   return ERR_PTR(err);
> > -   }
> > +   bpf_map_inc(&map->map);
> >
> > node->map = map;
> > node->map_entry = map_entry;
> > --
> > 2.25.1
> >


Re: [PATCH bpf-next v2 0/5] selftests/bpf: xsk selftests

2020-11-23 Thread Björn Töpel

On 2020-11-21 01:31, Yonghong Song wrote:



On 11/20/20 5:00 AM, Weqaar Janjua wrote:

This patch set adds AF_XDP selftests based on veth to selftests/bpf.

# Topology:
# ---------
#             ---------
#          _ | Process | _
#         /   ---------   \
#        /        |        \
#       /         |         \
#  ---------      |      ---------
# | Thread1 |     |     | Thread2 |
#  ---------      |      ---------
#      |          |          |
#  ---------      |      ---------
# |  xskX   |     |     |  xskY   |
#  ---------      |      ---------
#      |          |          |
#  ---------      |      ----------
# |  vethX  | --------- |  vethY   |
#  ---------    peer     ----------
#      |          |          |
#  namespaceX     |     namespaceY

These selftests test AF_XDP SKB and Native/DRV modes using veth Virtual
Ethernet interfaces.

The test program contains two threads, each thread is single socket with
a unique UMEM. It validates in-order packet delivery and packet content
by sending packets to each other.

Prerequisites setup by script test_xsk_prerequisites.sh:

    Set up veth interfaces as per the topology shown ^^:
    * setup two veth interfaces and one namespace
    ** veth in root namespace
    ** veth in af_xdp namespace
    ** namespace af_xdp
    * create a spec file veth.spec that includes this run-time configuration
      that is read by test scripts - filenames prefixed with test_xsk_
    ***  and  are randomly generated 4 digit numbers used to avoid
        conflict with any existing interface

The following tests are provided:

1. AF_XDP SKB mode
    Generic mode XDP is driver independent, used when the driver does
    not have support for XDP. Works on any netdevice using sockets and
    generic XDP path. XDP hook from netif_receive_skb().
    a. nopoll - soft-irq processing
    b. poll - using poll() syscall
    c. Socket Teardown
   Create a Tx and a Rx socket, Tx from one socket, Rx on another.
   Destroy both sockets, then repeat multiple times. Only nopoll mode
  is used
    d. Bi-directional Sockets
   Configure sockets as bi-directional tx/rx sockets, sets up fill
  and completion rings on each socket, tx/rx in both directions.
  Only nopoll mode is used

2. AF_XDP DRV/Native mode
    Works on any netdevice with XDP_REDIRECT support, driver dependent.
    Processes packets before SKB allocation. Provides better performance
    than SKB. Driver hook available just after DMA of buffer descriptor.
    a. nopoll
    b. poll
    c. Socket Teardown
    d. Bi-directional Sockets
    * Only copy mode is supported because veth does not currently support
  zero-copy mode

Total tests: 8

Flow:
* Single process spawns two threads: Tx and Rx
* Each of these two threads attach to a veth interface within their
   assigned namespaces
* Each thread creates one AF_XDP socket connected to a unique umem
   for each veth interface
* Tx thread transmits 10k packets from veth to veth
* Rx thread verifies if all 10k packets were received and delivered
   in-order, and have the right content

v2 changes:
* Move selftests/xsk to selftests/bpf
* Remove Makefiles under selftests/xsk, and utilize 
selftests/bpf/Makefile


Structure of the patch set:

Patch 1: This patch adds XSK Selftests framework under selftests/bpf
Patch 2: Adds tests: SKB poll and nopoll mode, and mac-ip-udp debug
Patch 3: Adds tests: DRV poll and nopoll mode
Patch 4: Adds tests: SKB and DRV Socket Teardown
Patch 5: Adds tests: SKB and DRV Bi-directional Sockets


I just want to report that after applying the above 5 patches
on top of bpf-next commit 450d060e8f75 ("bpftool: Add {i,d}tlb_misses
support for bpftool profile"), I hit the following error with the
command sequences below:


  $ ./test_xsk_prerequisites.sh
  $ ./test_xsk_skb_poll.sh
# Interface found: ve1480
# Interface found: ve9258
# NS switched: af_xdp9258
1..1
# Interface [ve9258] vector [Rx]
# Interface [ve1480] vector [Tx]
# Sending 1 packets on interface ve1480
[  331.741244] ------------[ cut here ]------------
[  331.741741] kernel BUG at net/core/skbuff.c:1621!
[  331.742265] invalid opcode:  [#1] PREEMPT SMP PTI
[  331.742837] CPU: 0 PID: 1883 Comm: xdpxceiver Not tainted 5.10.0-rc3+ #1037
[  331.743468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[  331.744300] RIP: 0010:pskb_expand_head+0x27b/0x310


Ugh, looks like the tests are working. :-P

This is a BUG_ON(skb_shared(skb)) trigger, related to the skbuff 
refcount changes done recently in AF_XDP.


I'll cook a patch! Thanks for the report!


Björn


Re: [PATCH bpf] xsk: fix incorrect netdev reference count

2020-11-23 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to bpf/bpf.git (refs/heads/master):

On Fri, 20 Nov 2020 16:14:43 +0100 you wrote:
> From: Marek Majtyka 
> 
> Fix incorrect netdev reference count in xsk_bind operation. Incorrect
> reference count of the device appears when a user calls bind with the
> XDP_ZEROCOPY flag on an interface which does not support zero-copy.
> In such a case, an error is returned but the reference count is not
> decreased. This change fixes the fault, by decreasing the reference count
> in case of such an error.
> 
> [...]

Here is the summary with links:
  - [bpf] xsk: fix incorrect netdev reference count
https://git.kernel.org/bpf/bpf/c/178648916e73

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-23 Thread Gustavo A. R. Silva
On Sun, Nov 22, 2020 at 11:53:55AM -0800, James Bottomley wrote:
> On Sun, 2020-11-22 at 11:22 -0800, Joe Perches wrote:
> > On Sun, 2020-11-22 at 11:12 -0800, James Bottomley wrote:
> > > On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote:
> > > > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote:
> > > > > Please tell me our reward for all this effort isn't a single
> > > > > missing error print.
> > > > 
> > > > There were quite literally dozens of logical defects found
> > > > by the fallthrough additions.  Very few were logging only.
> > > 
> > > So can you give us the best examples (or indeed all of them if
> > > someone is keeping score)?  hopefully this isn't a US election
> > > situation ...
> > 
> > Gustavo?  Are you running for congress now?
> > 
> > https://lwn.net/Articles/794944/
> 
> That's 21 reported fixes, of which about 50% seem to produce no change
> in code behaviour at all, a quarter seem to have no user-visible effect,
> with the remaining quarter producing unexpected errors on obscure
> configuration parameters, which is why no-one really noticed them
> before.

The really important point here is the number of bugs this has prevented
and will prevent in the future. See an example of this, below:

https://lore.kernel.org/linux-iio/20190813135802.gb27...@kroah.com/

This work is still relevant, even if the total number of issues/bugs
we find in the process is zero (which is not the case).

"The sucky thing about doing hard work to deploy hardening is that the
result is totally invisible by definition (things not happening) [..]"
- Dmitry Vyukov

Thanks
--
Gustavo







[PATCH bpf] net, xsk: Avoid taking multiple skbuff references

2020-11-23 Thread Björn Töpel
From: Björn Töpel 

Commit 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
addressed the problem that packets were discarded from the Tx AF_XDP
ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was
bumping the skbuff reference count, so that the buffer would not be
freed by dev_direct_xmit(). A reference count larger than one means
that the skbuff is "shared", which is not the case.

If the "shared" skbuff is sent to the generic XDP receive path,
netif_receive_generic_xdp(), and pskb_expand_head() is entered the
BUG_ON(skb_shared(skb)) will trigger.

This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(),
where a user can select the skbuff free policy. This allows AF_XDP to
avoid bumping the reference count, but still keep the NETDEV_TX_BUSY
behavior.

Reported-by: Yonghong Song 
Fixes: 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
Signed-off-by: Björn Töpel 
---
 include/linux/netdevice.h | 1 +
 net/core/dev.c| 9 +++--
 net/xdp/xsk.c | 8 +---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964b494b0e8d..e7402fca7752 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2815,6 +2815,7 @@ u16 dev_pick_tx_cpu_id(struct net_device *dev, struct 
sk_buff *skb,
   struct net_device *sb_dev);
 int dev_queue_xmit(struct sk_buff *skb);
 int dev_queue_xmit_accel(struct sk_buff *skb, struct net_device *sb_dev);
+int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool free_on_busy);
 int dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
 int register_netdevice(struct net_device *dev);
 void unregister_netdevice_queue(struct net_device *dev, struct list_head 
*head);
diff --git a/net/core/dev.c b/net/core/dev.c
index 82dc6b48e45f..2af79a4253bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4180,7 +4180,7 @@ int dev_queue_xmit_accel(struct sk_buff *skb, struct 
net_device *sb_dev)
 }
 EXPORT_SYMBOL(dev_queue_xmit_accel);
 
-int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool free_on_busy)
 {
struct net_device *dev = skb->dev;
struct sk_buff *orig_skb = skb;
@@ -4211,7 +4211,7 @@ int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
 
local_bh_enable();
 
-   if (!dev_xmit_complete(ret))
+   if (free_on_busy && !dev_xmit_complete(ret))
kfree_skb(skb);
 
return ret;
@@ -4220,6 +4220,11 @@ int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
kfree_skb_list(skb);
return NET_XMIT_DROP;
 }
+
+int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+{
+   return __dev_direct_xmit(skb, queue_id, true);
+}
 EXPORT_SYMBOL(dev_direct_xmit);
 
 /*
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5a6cdf7b320d..c6ad31b374b7 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -411,11 +411,7 @@ static int xsk_generic_xmit(struct sock *sk)
skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
skb->destructor = xsk_destruct_skb;
 
-   /* Hinder dev_direct_xmit from freeing the packet and
-* therefore completing it in the destructor
-*/
-   refcount_inc(&skb->users);
-   err = dev_direct_xmit(skb, xs->queue_id);
+   err = __dev_direct_xmit(skb, xs->queue_id, false);
if  (err == NETDEV_TX_BUSY) {
/* Tell user-space to retry the send */
skb->destructor = sock_wfree;
@@ -429,12 +425,10 @@ static int xsk_generic_xmit(struct sock *sk)
/* Ignore NET_XMIT_CN as packet might have been sent */
if (err == NET_XMIT_DROP) {
/* SKB completed but not sent */
-   kfree_skb(skb);
err = -EBUSY;
goto out;
}
 
-   consume_skb(skb);
sent_frame = true;
}
 

base-commit: 178648916e73e00de83150eb0c90c0d3a977a46a
-- 
2.27.0



[PATCHv6 iproute2-next 0/5] iproute2: add libbpf support

2020-11-23 Thread Hangbin Liu
This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to on to force configure
to abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency), or set to off to disable the libbpf
check and build iproute2 with the legacy bpf code.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).
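The two map-definition styles being bridged here can be sketched as follows (a hedged illustration, not taken from the patch set; the map names and field values are made up — declarative fragments, not a standalone program):

```c
/* Legacy iproute2-style map definition, using struct bpf_elf_map
 * from iproute2's bpf_elf.h: */
struct bpf_elf_map SEC("maps") map_foo_legacy = {
	.type		= BPF_MAP_TYPE_ARRAY,
	.size_key	= sizeof(__u32),
	.size_value	= sizeof(__u32),
	.pinning	= PIN_GLOBAL_NS,
	.max_elem	= 1,
};

/* Equivalent new-style BTF-defined map, as understood by libbpf: */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
} map_foo SEC(".maps");
```

The wrapper code mentioned above lets object files written in the first style keep loading when iproute2 is built against libbpf, which only understands the second.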

Finally, thanks a lot to Toke for his help on this patch set.

v6:
a) print runtime libbpf version in ip -V and tc -V

v5:
a) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
   dest.
b) Fix bpf_prog_load_dev typo.
c) rebase to latest iproute2-next.

v4:
a) Make variable LIBBPF_FORCE able to control whether to build iproute2
   with libbpf or not.
b) Add new file bpf_glue.c to for libbpf/legacy mixed bpf calls.
c) Fix some build issues and shell compatibility error.

v3:
a) Update configure to Check function bpf_program__section_name() separately
b) Add a new function get_bpf_program__section_name() to choose whether to
use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33

v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.


Here are the test results with patched iproute2:
== Show libbpf version
# ip -V
ip utility, iproute2-5.9.0, libbpf 0.1.0
# tc -V
tc utility, iproute2-5.9.0, libbpf 0.1.0

== setup env
# clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
# clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
# clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
# rm -rf /sys/fs/bpf/xdp/globals
# /root/iproute2/ip/ip link add type veth
# /root/iproute2/ip/ip link set veth0 up
# /root/iproute2/ip/ip link set veth1 up


== Load objs
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1:  mtu 1500 xdp qdisc noqueue 
state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 4 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
loaded_at 2020-10-22T08:04:21-0400  uid 0
xlated 80B  jited 71B  memlock 4096B
btf_id 5
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1:  mtu 1500 xdp qdisc noqueue 
state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 8 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
loaded_at 2020-10-22T08:04:23-0400  uid 0
xlated 336B  jited 193B  memlock 4096B  map_ids 3
btf_id 10
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1:  mtu 1500 xdp qdisc noqueue 
state UP mode DEFAULT group default qlen 1000
link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
prog/xdp id 12 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef 
/sys/fs/b

[PATCHv6 iproute2-next 1/5] iproute2: add check_libbpf() and get_libbpf_version()

2020-11-23 Thread Hangbin Liu
This patch aims to add basic checking functions for later iproute2
libbpf support.

First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.

Another variable, LIBBPF_FORCE, controls whether to build iproute2
with libbpf. If set to on, force building with libbpf and exit if it
is not available. If set to off, force building without libbpf.

When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get the correct libbpf version at runtime.

Signed-off-by: Hangbin Liu 
---

v6:
1) Add a new helper get_libbpf_version() to get the runtime libbpf
   version, based on Toke's xdp-tools patch. The libbpf version will be
   printed when executing ip -V or tc -V.

v5:
1) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
   dest.

v4:
1) Remove duplicate LIBBPF_CFLAGS
2) Remove un-needed -L since using static libbpf.a
3) Fix == not supported in dash
4) Extend LIBBPF_FORCE to support on/off, when set to on, stop building when
   there is no libbpf support. If set to off, discard libbpf check.
5) Print libbpf version after checking

v3:
Check function bpf_program__section_name() separately and only use it
on higher libbpf version.

v2:
No update
---
 configure  | 113 +
 include/bpf_util.h |   3 ++
 ip/ip.c|  10 +++-
 lib/Makefile   |   2 +-
 lib/bpf_glue.c |  63 +
 tc/tc.c|  10 +++-
 6 files changed, 196 insertions(+), 5 deletions(-)
 create mode 100644 lib/bpf_glue.c

diff --git a/configure b/configure
index 307912aa..2c363d3b 100755
--- a/configure
+++ b/configure
@@ -2,6 +2,11 @@
 # SPDX-License-Identifier: GPL-2.0
 # This is not an autoconf generated configure
 #
+# Influential LIBBPF environment variables:
+#   LIBBPF_FORCE={on,off}   on: require link against libbpf;
+#   off: disable libbpf probing
+#   LIBBPF_DIR  Path to libbpf DESTDIR to use
+
 INCLUDE=${1:-"$PWD/include"}
 
 # Output file which is input to Makefile
@@ -240,6 +245,111 @@ check_elf()
 fi
 }
 
+have_libbpf_basic()
+{
+cat >$TMPDIR/libbpf_test.c <<EOF
+#include <bpf/libbpf.h>
+int main(int argc, char **argv) {
+bpf_program__set_autoload(NULL, false);
+bpf_map__ifindex(NULL);
+bpf_map__set_pin_path(NULL, NULL);
+bpf_object__open_file(NULL, NULL);
+return 0;
+}
+EOF
+
+$CC -o $TMPDIR/libbpf_test $TMPDIR/libbpf_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
+local ret=$?
+
+rm -f $TMPDIR/libbpf_test.c $TMPDIR/libbpf_test
+return $ret
+}
+
+have_libbpf_sec_name()
+{
+cat >$TMPDIR/libbpf_sec_test.c <<EOF
+#include <bpf/libbpf.h>
+int main(int argc, char **argv) {
+void *ptr;
+bpf_program__section_name(NULL);
+return 0;
+}
+EOF
+
+$CC -o $TMPDIR/libbpf_sec_test $TMPDIR/libbpf_sec_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
+local ret=$?
+
+rm -f $TMPDIR/libbpf_sec_test.c $TMPDIR/libbpf_sec_test
+return $ret
+}
+
+check_force_libbpf_on()
+{
+# if set LIBBPF_FORCE=on but no libbpf support, just exit the configure
+# process to make sure we don't build without libbpf.
+if [ "$LIBBPF_FORCE" = on ]; then
+echo " LIBBPF_FORCE=on set, but couldn't find a usable libbpf"
+exit 1
+fi
+}
+
+check_libbpf()
+{
+# if set LIBBPF_FORCE=off, disable libbpf entirely
+if [ "$LIBBPF_FORCE" = off ]; then
+echo "no"
+return
+fi
+
+if ! ${PKG_CONFIG} libbpf --exists && [ -z "$LIBBPF_DIR" ] ; then
+echo "no"
+check_force_libbpf_on
+return
+fi
+
+if [ $(uname -m) = x86_64 ]; then
+local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib64"
+else
+local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib"
+fi
+
+if [ -n "$LIBBPF_DIR" ]; then
+LIBBPF_CFLAGS="-I${LIBBPF_DIR}/usr/include"
+LIBBPF_LDLIBS="${LIBBPF_LIBDIR}/libbpf.a -lz -lelf"
+LIBBPF_VERSION=$(PKG_CONFIG_LIBDIR=${LIBBPF_LIBDIR}/pkgconfig ${PKG_CONFIG} libbpf --modversion)
+else
+LIBBPF_CFLAGS=$(${PKG_CONFIG} libbpf --cflags)
+LIBBPF_LDLIBS=$(${PKG_CONFIG} libbpf --libs)
+LIBBPF_VERSION=$(${PKG_CONFIG} libbpf --modversion)
+fi
+
+if ! have_libbpf_basic; then
+echo "no"
+echo " libbpf version $LIBBPF_VERSION is too low, please update it to at least 0.1.0"
+check_force_libbpf_on
+return
+else
+echo "HAVE_LIBBPF:=y" >> $CONFIG
+echo 'CFLAGS += -DHAVE_LIBBPF ' $LIBBPF_CFLAGS >> $CONFIG
+echo "CFLAGS += -DLIBBPF_VERSION=\\\"$LIBBPF_VERSION\\\"" >> $CONFIG
+echo 'LDLIBS += ' $LIBBPF_LDLIBS >> $CONFIG
+
+  

[PATCHv6 iproute2-next 2/5] lib: make ipvrf able to use libbpf and fix function name conflicts

2020-11-23 Thread Hangbin Liu
There are direct calls in libbpf for BPF program load/attach,
so we can just use two wrapper functions for ipvrf and convert
them with libbpf support.

The function bpf_prog_load() is removed as its name conflicts with
a libbpf function.

bpf.c is moved to bpf_legacy.c for later main libbpf support in
iproute2.

Reviewed-by: Toke Høiland-Jørgensen 
Signed-off-by: Hangbin Liu 
---
v6: bpf_glue.c is created in previous patch. So I changed the commit
description.
v5: Fix bpf_prog_load_dev typo.
v4: Add new file bpf_glue.c
v2-v3: no update
---
 include/bpf_util.h  | 10 +++---
 ip/ipvrf.c  |  6 +++---
 lib/Makefile|  2 +-
 lib/bpf_glue.c  | 23 +++
 lib/{bpf.c => bpf_legacy.c} | 15 +++
 5 files changed, 37 insertions(+), 19 deletions(-)
 rename lib/{bpf.c => bpf_legacy.c} (99%)

diff --git a/include/bpf_util.h b/include/bpf_util.h
index dee5bb02..3235c34e 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -274,12 +274,16 @@ int bpf_trace_pipe(void);
 
 void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
 
-int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
- size_t size_insns, const char *license, char *log,
- size_t size_log);
+int bpf_prog_load_dev(enum bpf_prog_type type, const struct bpf_insn *insns,
+ size_t size_insns, const char *license, __u32 ifindex,
+ char *log, size_t size_log);
+int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
+size_t size_insns, const char *license, char *log,
+size_t size_log);
 
 int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
 int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
+int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
 
 int bpf_dump_prog_info(FILE *f, uint32_t id);
 
diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index 28dd8e25..42779e5c 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -256,8 +256,8 @@ static int prog_load(int idx)
BPF_EXIT_INSN(),
};
 
-   return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
-"GPL", bpf_log_buf, sizeof(bpf_log_buf));
+   return bpf_program_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+   "GPL", bpf_log_buf, sizeof(bpf_log_buf));
 }
 
 static int vrf_configure_cgroup(const char *path, int ifindex)
@@ -288,7 +288,7 @@ static int vrf_configure_cgroup(const char *path, int 
ifindex)
goto out;
}
 
-   if (bpf_prog_attach_fd(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE)) {
+   if (bpf_program_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE)) {
fprintf(stderr, "Failed to attach prog to cgroup: '%s'\n",
strerror(errno));
goto out;
diff --git a/lib/Makefile b/lib/Makefile
index a02775a5..7c8a197c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -5,7 +5,7 @@ CFLAGS += -fPIC
 
 UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
inet_proto.o namespace.o json_writer.o json_print.o \
-   names.o color.o bpf.o bpf_glue.o exec.o fs.o cg_map.o
+   names.o color.o bpf_legacy.o bpf_glue.o exec.o fs.o cg_map.o
 
 NLOBJ=libgenl.o libnetlink.o mnl_utils.o
 
diff --git a/lib/bpf_glue.c b/lib/bpf_glue.c
index 67c41c22..fa609bfe 100644
--- a/lib/bpf_glue.c
+++ b/lib/bpf_glue.c
@@ -5,6 +5,29 @@
  *
  */
 #include "bpf_util.h"
+#ifdef HAVE_LIBBPF
+#include 
+#endif
+
+int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
+size_t size_insns, const char *license, char *log,
+size_t size_log)
+{
+#ifdef HAVE_LIBBPF
+   return bpf_load_program(type, insns, size_insns, license, 0, log, size_log);
+#else
+   return bpf_prog_load_dev(type, insns, size_insns, license, 0, log, size_log);
+#endif
+}
+
+int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
+{
+#ifdef HAVE_LIBBPF
+   return bpf_prog_attach(prog_fd, target_fd, type, 0);
+#else
+   return bpf_prog_attach_fd(prog_fd, target_fd, type);
+#endif
+}
 
 #ifdef HAVE_LIBBPF
 static const char *_libbpf_compile_version = LIBBPF_VERSION;
diff --git a/lib/bpf.c b/lib/bpf_legacy.c
similarity index 99%
rename from lib/bpf.c
rename to lib/bpf_legacy.c
index c7d45077..4246fb76 100644
--- a/lib/bpf.c
+++ b/lib/bpf_legacy.c
@@ -1087,10 +1087,9 @@ int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type)
return bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
 }
 
-static int bpf_prog_load_dev(enum bpf_prog_type type,
-const struct bpf_insn *insns, size_t size_insns,
-const char *license, __u32 ifindex,
-char *log, size_t size_log)
+int bpf_prog_load_dev(enum bpf_prog_type type, const

[PATCHv6 iproute2-next 3/5] lib: add libbpf support

2020-11-23 Thread Hangbin Liu
This patch converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available, which is started by Toke's
implementation[1]. With libbpf iproute2 could correctly process BTF
information and support the new-style BTF-defined maps, while keeping
compatibility with the old internal map definition syntax.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code in bpf_legacy.c ensures that
iproute2 will still understand the old map definition format, including
populating map-in-map and tail call maps before load.

In bpf_libbpf.c, we init iproute2 ctx and elf info first to check the
legacy bytes. When handling the legacy maps, for map-in-maps, we create
them manually and re-use the fd as they are associated with id/inner_id.
For pin maps, we only set the pin path and let libbpf handle them at load time.
For tail calls, we find it first and update the element after prog load.

Other maps/progs will be loaded by libbpf directly.

[1] https://lore.kernel.org/bpf/20190820114706.18546-1-t...@redhat.com/

Reviewed-by: Toke Høiland-Jørgensen 
Signed-off-by: Hangbin Liu 
---
v6: also make bpf_libbpf.c build depend on HAVE_ELF

v5: no update

v4:
Move ipvrf code to patch 02
Move HAVE_LIBBPF inside HAVE_ELF definition as libbpf depends on elf.

v3:
Add a new function get_bpf_program__section_name() to choose whether
use bpf_program__title() or not.

v2:
Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
Add ipvrf with libbpf support.
---
 include/bpf_util.h |  17 +++
 lib/Makefile   |   6 +
 lib/bpf_legacy.c   | 178 +++
 lib/bpf_libbpf.c   | 348 +
 4 files changed, 549 insertions(+)
 create mode 100644 lib/bpf_libbpf.c

diff --git a/include/bpf_util.h b/include/bpf_util.h
index 3235c34e..53acc410 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -291,6 +291,16 @@ int bpf_dump_prog_info(FILE *f, uint32_t id);
 int bpf_send_map_fds(const char *path, const char *obj);
 int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
 unsigned int entries);
+#ifdef HAVE_LIBBPF
+int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg);
+int iproute2_bpf_fetch_ancillary(void);
+int iproute2_get_root_path(char *root_path, size_t len);
+bool iproute2_is_pin_map(const char *libbpf_map_name, char *pathname);
+bool iproute2_is_map_in_map(const char *libbpf_map_name, struct bpf_elf_map *imap,
+   struct bpf_elf_map *omap, char *omap_name);
+int iproute2_find_map_name_by_id(unsigned int map_id, char *name);
+int iproute2_load_libbpf(struct bpf_cfg_in *cfg);
+#endif /* HAVE_LIBBPF */
 #else
 static inline int bpf_send_map_fds(const char *path, const char *obj)
 {
@@ -303,6 +313,13 @@ static inline int bpf_recv_map_fds(const char *path, int *fds,
 {
return -1;
 }
+#ifdef HAVE_LIBBPF
+static inline int iproute2_load_libbpf(struct bpf_cfg_in *cfg)
+{
+   fprintf(stderr, "No ELF library support compiled in.\n");
+   return -1;
+}
+#endif /* HAVE_LIBBPF */
 #endif /* HAVE_ELF */
 
 const char *get_libbpf_version(void);
diff --git a/lib/Makefile b/lib/Makefile
index 7c8a197c..e37585c6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -7,6 +7,12 @@ UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
inet_proto.o namespace.o json_writer.o json_print.o \
names.o color.o bpf_legacy.o bpf_glue.o exec.o fs.o cg_map.o
 
+ifeq ($(HAVE_ELF),y)
+ifeq ($(HAVE_LIBBPF),y)
+UTILOBJ += bpf_libbpf.o
+endif
+endif
+
 NLOBJ=libgenl.o libnetlink.o mnl_utils.o
 
 all: libnetlink.a libutil.a
diff --git a/lib/bpf_legacy.c b/lib/bpf_legacy.c
index 4246fb76..bc869c3f 100644
--- a/lib/bpf_legacy.c
+++ b/lib/bpf_legacy.c
@@ -940,6 +940,9 @@ static int bpf_do_parse(struct bpf_cfg_in *cfg, const bool *opt_tbl)
 static int bpf_do_load(struct bpf_cfg_in *cfg)
 {
if (cfg->mode == EBPF_OBJECT) {
+#ifdef HAVE_LIBBPF
+   return iproute2_load_libbpf(cfg);
+#endif
cfg->prog_fd = bpf_obj_open(cfg->object, cfg->type,
cfg->section, cfg->ifindex,
cfg->verbose);
@@ -3155,4 +3158,179 @@ int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
close(fd);
return ret;
 }
+
+#ifdef HAVE_LIBBPF
+/* The following functions are wrapper functions for libbpf code to be
+ * compatible with the legacy format. So all the functions have prefix
+ * with iproute2_
+ */
+int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg)
+{
+   struct bpf_elf_ctx *ctx = &__ctx;
+
+   return bpf_elf_ctx_init(ctx, cfg->object, cfg->type, cfg->ifindex, cfg->verbose);
+}
+
+int iproute2_bpf_fetch_ancillary(void)
+{
+   struct bpf_elf_ctx *ctx = &__ctx;
+   struct bpf_elf_sec_data data;
+   int i, ret = 0;
+
+   for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
+ 

[PATCHv6 iproute2-next 5/5] examples/bpf: add bpf examples with BTF defined maps

2020-11-23 Thread Hangbin Liu
Users should try use the new BTF defined maps instead of struct
bpf_elf_map defined maps. The tail call examples are not added yet
as libbpf doesn't currently support declaratively populating tail call
maps.

Reviewed-by: Toke Høiland-Jørgensen 
Signed-off-by: Hangbin Liu 
---
 examples/bpf/README   |  6 
 examples/bpf/bpf_graft.c  | 66 +++
 examples/bpf/bpf_map_in_map.c | 55 +
 examples/bpf/bpf_shared.c | 53 
 include/bpf_api.h | 13 +++
 5 files changed, 193 insertions(+)
 create mode 100644 examples/bpf/bpf_graft.c
 create mode 100644 examples/bpf/bpf_map_in_map.c
 create mode 100644 examples/bpf/bpf_shared.c

diff --git a/examples/bpf/README b/examples/bpf/README
index 732bcc83..b7261191 100644
--- a/examples/bpf/README
+++ b/examples/bpf/README
@@ -1,6 +1,12 @@
 eBPF toy code examples (running in kernel) to familiarize yourself
 with syntax and features:
 
+- BTF defined map examples
+ - bpf_graft.c -> Demo on altering runtime behaviour
+ - bpf_shared.c-> Ingress/egress map sharing example
+ - bpf_map_in_map.c-> Using map in map example
+
+- legacy struct bpf_elf_map defined map examples
  - legacy/bpf_shared.c -> Ingress/egress map sharing example
  - legacy/bpf_tailcall.c   -> Using tail call chains
  - legacy/bpf_cyclic.c -> Simple cycle as tail calls
diff --git a/examples/bpf/bpf_graft.c b/examples/bpf/bpf_graft.c
new file mode 100644
index ..8066dcce
--- /dev/null
+++ b/examples/bpf/bpf_graft.c
@@ -0,0 +1,66 @@
+#include "../../include/bpf_api.h"
+
+/* This example demonstrates how classifier run-time behaviour
+ * can be altered with tail calls. We start out with an empty
+ * jmp_tc array, then add section aaa to the array slot 0, and
+ * later on atomically replace it with section bbb. Note that
+ * as shown in other examples, the tc loader can prepopulate
+ * tail called sections, here we start out with an empty one
+ * on purpose to show it can also be done this way.
+ *
+ * tc filter add dev foo parent : bpf obj graft.o
+ * tc exec bpf dbg
+ *   [...]
+ *   Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
+ *   -0[001] ..s. 138993.202265: : fallthrough
+ *   Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
+ *   [...]
+ *
+ * tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
+ * tc exec bpf dbg
+ *   [...]
+ *   Socket Thread-19818 [002] ..s. 139012.053587: : aaa
+ *   -0[002] ..s. 139012.172359: : aaa
+ *   Socket Thread-19818 [001] ..s. 139012.173556: : aaa
+ *   [...]
+ *
+ * tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
+ * tc exec bpf dbg
+ *   [...]
+ *   Socket Thread-19818 [002] ..s. 139022.102967: : bbb
+ *   -0[002] ..s. 139022.155640: : bbb
+ *   Socket Thread-19818 [001] ..s. 139022.156730: : bbb
+ *   [...]
+ */
+
+struct {
+   __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+   __uint(key_size, sizeof(uint32_t));
+   __uint(value_size, sizeof(uint32_t));
+   __uint(max_entries, 1);
+   __uint(pinning, LIBBPF_PIN_BY_NAME);
+} jmp_tc __section(".maps");
+
+__section("aaa")
+int cls_aaa(struct __sk_buff *skb)
+{
+   printt("aaa\n");
+   return TC_H_MAKE(1, 42);
+}
+
+__section("bbb")
+int cls_bbb(struct __sk_buff *skb)
+{
+   printt("bbb\n");
+   return TC_H_MAKE(1, 43);
+}
+
+__section_cls_entry
+int cls_entry(struct __sk_buff *skb)
+{
+   tail_call(skb, &jmp_tc, 0);
+   printt("fallthrough\n");
+   return BPF_H_DEFAULT;
+}
+
+BPF_LICENSE("GPL");
diff --git a/examples/bpf/bpf_map_in_map.c b/examples/bpf/bpf_map_in_map.c
new file mode 100644
index ..39c86268
--- /dev/null
+++ b/examples/bpf/bpf_map_in_map.c
@@ -0,0 +1,55 @@
+#include "../../include/bpf_api.h"
+
+struct inner_map {
+   __uint(type, BPF_MAP_TYPE_ARRAY);
+   __uint(key_size, sizeof(uint32_t));
+   __uint(value_size, sizeof(uint32_t));
+   __uint(max_entries, 1);
+} map_inner __section(".maps");
+
+struct {
+   __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+   __uint(key_size, sizeof(uint32_t));
+   __uint(value_size, sizeof(uint32_t));
+   __uint(max_entries, 1);
+   __uint(pinning, LIBBPF_PIN_BY_NAME);
+   __array(values, struct inner_map);
+} map_outer __section(".maps") = {
+   .values = {
+   [0] = &map_inner,
+   },
+};
+
+__section("egress")
+int emain(struct __sk_buff *skb)
+{
+   struct bpf_elf_map *map_inner;
+   int key = 0, *val;
+
+   map_inner = map_lookup_elem(&map_outer, &key);
+   if (map_inner) {
+   val = map_lookup_elem(map_inner, &key);
+   if (val)
+   lock_xadd(val, 1);
+   }
+
+   return BPF_H_DEFAULT;
+}
+
+__section("ingress")
+int imain(struct __sk_buff *skb)
+{
+   struct bpf_elf_map *map_inner;
+   int key = 0, *val;
+
+   map_inner = map_lookup_elem(&map_

[PATCHv6 iproute2-next 4/5] examples/bpf: move struct bpf_elf_map defined maps to legacy folder

2020-11-23 Thread Hangbin Liu
Reviewed-by: Toke Høiland-Jørgensen 
Signed-off-by: Hangbin Liu 
---
 examples/bpf/README| 14 +-
 examples/bpf/{ => legacy}/bpf_cyclic.c |  2 +-
 examples/bpf/{ => legacy}/bpf_graft.c  |  2 +-
 examples/bpf/{ => legacy}/bpf_map_in_map.c |  2 +-
 examples/bpf/{ => legacy}/bpf_shared.c |  2 +-
 examples/bpf/{ => legacy}/bpf_tailcall.c   |  2 +-
 6 files changed, 14 insertions(+), 10 deletions(-)
 rename examples/bpf/{ => legacy}/bpf_cyclic.c (95%)
 rename examples/bpf/{ => legacy}/bpf_graft.c (97%)
 rename examples/bpf/{ => legacy}/bpf_map_in_map.c (96%)
 rename examples/bpf/{ => legacy}/bpf_shared.c (97%)
 rename examples/bpf/{ => legacy}/bpf_tailcall.c (98%)

diff --git a/examples/bpf/README b/examples/bpf/README
index 1bbdda3f..732bcc83 100644
--- a/examples/bpf/README
+++ b/examples/bpf/README
@@ -1,8 +1,12 @@
 eBPF toy code examples (running in kernel) to familiarize yourself
 with syntax and features:
 
- - bpf_shared.c-> Ingress/egress map sharing example
- - bpf_tailcall.c  -> Using tail call chains
- - bpf_cyclic.c-> Simple cycle as tail calls
- - bpf_graft.c -> Demo on altering runtime behaviour
- - bpf_map_in_map.c -> Using map in map example
+ - legacy/bpf_shared.c -> Ingress/egress map sharing example
+ - legacy/bpf_tailcall.c   -> Using tail call chains
+ - legacy/bpf_cyclic.c -> Simple cycle as tail calls
+ - legacy/bpf_graft.c  -> Demo on altering runtime behaviour
+ - legacy/bpf_map_in_map.c -> Using map in map example
+
+Note: Users should use the new BTF way to define maps; the examples
+in the legacy folder, which use struct bpf_elf_map defined maps, are
+not recommended.
diff --git a/examples/bpf/bpf_cyclic.c b/examples/bpf/legacy/bpf_cyclic.c
similarity index 95%
rename from examples/bpf/bpf_cyclic.c
rename to examples/bpf/legacy/bpf_cyclic.c
index 11d1c061..33590730 100644
--- a/examples/bpf/bpf_cyclic.c
+++ b/examples/bpf/legacy/bpf_cyclic.c
@@ -1,4 +1,4 @@
-#include "../../include/bpf_api.h"
+#include "../../../include/bpf_api.h"
 
 /* Cyclic dependency example to test the kernel's runtime upper
  * bound on loops. Also demonstrates on how to use direct-actions,
diff --git a/examples/bpf/bpf_graft.c b/examples/bpf/legacy/bpf_graft.c
similarity index 97%
rename from examples/bpf/bpf_graft.c
rename to examples/bpf/legacy/bpf_graft.c
index 07113d4a..f4c920cc 100644
--- a/examples/bpf/bpf_graft.c
+++ b/examples/bpf/legacy/bpf_graft.c
@@ -1,4 +1,4 @@
-#include "../../include/bpf_api.h"
+#include "../../../include/bpf_api.h"
 
 /* This example demonstrates how classifier run-time behaviour
  * can be altered with tail calls. We start out with an empty
diff --git a/examples/bpf/bpf_map_in_map.c b/examples/bpf/legacy/bpf_map_in_map.c
similarity index 96%
rename from examples/bpf/bpf_map_in_map.c
rename to examples/bpf/legacy/bpf_map_in_map.c
index ff0e623a..575f8812 100644
--- a/examples/bpf/bpf_map_in_map.c
+++ b/examples/bpf/legacy/bpf_map_in_map.c
@@ -1,4 +1,4 @@
-#include "../../include/bpf_api.h"
+#include "../../../include/bpf_api.h"
 
 #define MAP_INNER_ID   42
 
diff --git a/examples/bpf/bpf_shared.c b/examples/bpf/legacy/bpf_shared.c
similarity index 97%
rename from examples/bpf/bpf_shared.c
rename to examples/bpf/legacy/bpf_shared.c
index 21fe6f1e..05b2b9ef 100644
--- a/examples/bpf/bpf_shared.c
+++ b/examples/bpf/legacy/bpf_shared.c
@@ -1,4 +1,4 @@
-#include "../../include/bpf_api.h"
+#include "../../../include/bpf_api.h"
 
 /* Minimal, stand-alone toy map pinning example:
  *
diff --git a/examples/bpf/bpf_tailcall.c b/examples/bpf/legacy/bpf_tailcall.c
similarity index 98%
rename from examples/bpf/bpf_tailcall.c
rename to examples/bpf/legacy/bpf_tailcall.c
index 161eb606..8ebc554c 100644
--- a/examples/bpf/bpf_tailcall.c
+++ b/examples/bpf/legacy/bpf_tailcall.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#include "../../include/bpf_api.h"
+#include "../../../include/bpf_api.h"
 
 #define ENTRY_INIT 3
#define ENTRY_0 0
-- 
2.25.4



Re: [PATCH ipsec-next v5] xfrm: redact SA secret with lockdown confidentiality

2020-11-23 Thread Steffen Klassert
On Tue, Nov 17, 2020 at 05:47:23PM +0100, Antony Antony wrote:
> redact XFRM SA secret in the netlink response to xfrm_get_sa()
> or dumpall sa.
> Enable lockdown, confidentiality mode, at boot or at run time.
> 
> e.g. when enabled:
> cat /sys/kernel/security/lockdown
> none integrity [confidentiality]
> 
> ip xfrm state
> src 172.16.1.200 dst 172.16.1.100
>   proto esp spi 0x0002 reqid 2 mode tunnel
>   replay-window 0
>   aead rfc4106(gcm(aes)) 0x 96
> 
> note: the aead secret is redacted.
> Redacting secret is also a FIPS 140-2 requirement.
> 
> v1->v2
>  - add size checks before memset calls
> v2->v3
>  - replace spaces with tabs for consistency
> v3->v4
>  - use kernel lockdown instead of a /proc setting
> v4->v5
>  - remove kconfig option
> 
> Reviewed-by: Stephan Mueller 
> Signed-off-by: Antony Antony 
> ---
>  include/linux/security.h |  1 +
>  net/xfrm/xfrm_user.c | 74 
>  security/security.c  |  1 +
>  3 files changed, 69 insertions(+), 7 deletions(-)

I'm ok with this and I plan to apply it to ipsec-next if I do not see
objections from the LSM people.



Re: ESN, seqhi and out-of-order calls to advance()

2020-11-23 Thread Steffen Klassert
Hi,

I've Cced netdev, maybe other people have an opinion on this too.

On Thu, Nov 19, 2020 at 01:39:29PM -0800, Nic Dade wrote:
> I've been investigating a problem which happens when I use IPsec
> (strongswan in userspace), ESN, the default anti-replay window (32
> seqnums), on a multi-core CPU. The small window size isn't necessary, it
> just makes it easier to reproduce.

Using the same SA on multiple cores has synchronization problems anyway.
We curently try to solve that on the protocol level:

https://datatracker.ietf.org/doc/html/draft-pwouters-multi-sa-performance

> 
> It looks like it's the same problem which was found and patched in commit
> 3b59df46a4, but that commit only applies to async case, and doesn't quite
> solve the problem when it is the CPU cores which are causing it.
> 
> The trouble is that xfrm_replay_seqhi() is called twice for each packet,
> once to decide what seqhi value to use for authentication, and a second
> time to decide what seq num to advance to for anti-replay. Sometimes they
> are not the same value.
> 
> Here's what happens to cause this: two packets belonging to the same ESP SA
> are being received, and the packet's seqnums are > window apart. The
> packets pass through xfrm_input() which does the 1st call to
> xfrm_replay_seqhi(). Once the packet and seqhi have been authenticated,
> x->repl->advance() == xfrm_replay_advance_esn() is called. If the higher
> seqnum packet gets to advance() first, then it moves the auti-replay window
> forward. Then when the second packet arrives at advance(), the call
> xfrm_replay_advance_esn() makes to xfrm_replay_seqhi() thinks the seqnum <
> bottom means 32-bit seqnums wrapped, and it returns seqhi+1.
> xfrm_replay_advance_esn() stores that away in the xfrm_replay_state_esn for
> the future. From now until the ESP SA renews, or the sender gets to the
> point where seqhi really did increment, all packets will fail
> authentication.

Well analyzed! Yes, that can happen if the packets are processed on
different cpus.

> 
> This is b/c the seqhi which was authenticated != the seqhi which advance()
> believed to be correct.
> 
> I think it would be safer if the authenticated seqhi computed in
> xfrm_input() was passed to the advance() function as an argument, rather
> than having advance() recompute seqhi. That would also fix the async case.
> Or the non-async case also needs to recheck.

Right, using the authenticated seqhi in the advance() function makes
sense. We can do away with the recheck() if that fixes the problem entirely.
But I did not look into this problem for some years, so not absolutely
sure if that is sufficient.


> 
> Also it seems to be that calling xfrm_replay_seqhi() while not holding
> x->lock (as xfrm_input() does) is not safe, since both x->seq_hi and x->seq
> are used by xfrm_replay_seqhi(), and that can race with the updates to seq
> and seq_hi in xfrm_replay_advance_esn()

True, indeed.

> --
> 
> Note I still have some mysteries. I do know for sure (from instrumenting
> the code and reproducing) that the 2nd call to xfrm_replay_seqhi() does
> result in a different answer than the 1st call, and that this happens when
> the packets are processed simultaneously and out of order as described
> above. My mystery is that my instrumentation also indicates it's the same
> CPU core (by smp_processor_id()) processing the two packets, which I cannot
> explain.

That is odd. Should not happen if the packets are processed on the same cpu 
with a sync cipher.



Re: [PATCH bpf-next v2 0/5] selftests/bpf: xsk selftests

2020-11-23 Thread Björn Töpel

On 2020-11-23 13:20, Björn Töpel wrote:

On 2020-11-21 01:31, Yonghong Song wrote:



On 11/20/20 5:00 AM, Weqaar Janjua wrote:

This patch set adds AF_XDP selftests based on veth to selftests/bpf.

# Topology:
# ---------
#                -----------
#              _ | Process | _
#             /  -----------  \
#            /        |        \
#           /         |         \
#      -----------    |    -----------
#      | Thread1 |    |    | Thread2 |
#      -----------    |    -----------
#           |         |         |
#      -----------    |    -----------
#      |  xskX   |    |    |  xskY   |
#      -----------    |    -----------
#           |         |         |
#      -----------    |    -----------
#      |  vethX  | ------- |  vethY  |
#      -----------   peer  -----------
#           |         |         |
#      namespaceX     |    namespaceY

These selftests test AF_XDP SKB and Native/DRV modes using veth Virtual
Ethernet interfaces.

The test program contains two threads; each thread is a single socket with
a unique UMEM. It validates in-order packet delivery and packet content
by sending packets to each other.

Prerequisites setup by script test_xsk_prerequisites.sh:

    Set up veth interfaces as per the topology shown ^^:
    * setup two veth interfaces and one namespace
    ** veth in root namespace
    ** veth in af_xdp namespace
    ** namespace af_xdp
    * create a spec file veth.spec that includes this run-time configuration
      that is read by test scripts - filenames prefixed with test_xsk_
    ***  and  are randomly generated 4 digit numbers used to avoid
      conflict with any existing interface

The following tests are provided:

1. AF_XDP SKB mode
    Generic mode XDP is driver independent, used when the driver does
    not have support for XDP. Works on any netdevice using sockets and
    generic XDP path. XDP hook from netif_receive_skb().
    a. nopoll - soft-irq processing
    b. poll - using poll() syscall
    c. Socket Teardown
   Create a Tx and a Rx socket, Tx from one socket, Rx on another.
   Destroy both sockets, then repeat multiple times. Only nopoll mode
   is used
    d. Bi-directional Sockets
   Configure sockets as bi-directional tx/rx sockets, sets up fill
  and completion rings on each socket, tx/rx in both directions.
  Only nopoll mode is used

2. AF_XDP DRV/Native mode
    Works on any netdevice with XDP_REDIRECT support, driver dependent.
    Processes packets before SKB allocation. Provides better performance
    than SKB. Driver hook available just after DMA of buffer descriptor.
    a. nopoll
    b. poll
    c. Socket Teardown
    d. Bi-directional Sockets
    * Only copy mode is supported because veth does not currently
      support zero-copy mode

Total tests: 8

Flow:
* Single process spawns two threads: Tx and Rx
* Each of these two threads attach to a veth interface within their
   assigned namespaces
* Each thread creates one AF_XDP socket connected to a unique umem
   for each veth interface
* Tx thread transmits 10k packets from veth to veth
* Rx thread verifies if all 10k packets were received and delivered
   in-order, and have the right content

v2 changes:
* Move selftests/xsk to selftests/bpf
* Remove Makefiles under selftests/xsk, and utilize selftests/bpf/Makefile


Structure of the patch set:

Patch 1: This patch adds XSK Selftests framework under selftests/bpf
Patch 2: Adds tests: SKB poll and nopoll mode, and mac-ip-udp debug
Patch 3: Adds tests: DRV poll and nopoll mode
Patch 4: Adds tests: SKB and DRV Socket Teardown
Patch 5: Adds tests: SKB and DRV Bi-directional Sockets


I just want to report that after applying the above 5 patches
on top of bpf-next commit 450d060e8f75 ("bpftool: Add {i,d}tlb_misses 
support for bpftool profile"), I hit the following error with below 
command sequences:


  $ ./test_xsk_prerequisites.sh
  $ ./test_xsk_skb_poll.sh
# Interface found: ve1480
# Interface found: ve9258
# NS switched: af_xdp9258
1..1
# Interface [ve9258] vector [Rx]
# Interface [ve1480] vector [Tx]
# Sending 1 packets on interface ve1480
[  331.741244] [ cut here ]
[  331.741741] kernel BUG at net/core/skbuff.c:1621!
[  331.742265] invalid opcode:  [#1] PREEMPT SMP PTI
[  331.742837] CPU: 0 PID: 1883 Comm: xdpxceiver Not tainted 5.10.0-rc3+ #1037
[  331.743468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[  331.744300] RIP: 0010:pskb_expand_head+0x27b/0x310


Ugh, looks like the tests are working. :-P

This is a BUG_ON(skb_shared(skb)) trigger, related to the skbuff 
refcount changes done recently in AF_XDP.


I'll cook a patch! Thanks for the report!



Posted a fix [1].

Please not that it's for the bpf tree, so when Weqaar pushes the v3 of
the selftests to bpf-next, [1] needs to be pulled in.



Björn

[1] 
https://lore.kernel.org/bpf/20201123131215.136131-1-bjorn.to...@gma

Re: [PATCH net-next V7] net: Variable SLAAC: SLAAC with prefixes of arbitrary length in PIO

2020-11-23 Thread Hideaki Yoshifuji
Hi,

2020年11月20日(金) 18:28 Dmytro Shytyi :
>
> Variable SLAAC: SLAAC with prefixes of arbitrary length in PIO (randomly
> generated hostID or stable privacy + privacy extensions).
> The main problem is that SLAAC RA or PD allocates a /64 by the Wireless
> carrier 4G, 5G to a mobile hotspot, however segmentation of the /64 via
> SLAAC is required so that downstream interfaces can be further subnetted.
> Example: uCPE device (4G + WI-FI enabled) receives /64 via Wireless, and
> assigns /72 to VNF-Firewall, /72 to WIFI, /72 to VNF-Router, /72 to
> Load-Balancer and /72 to wired connected devices.
> IETF document that defines problem statement:
> draft-mishra-v6ops-variable-slaac-problem-stmt
> IETF document that specifies variable slaac:
> draft-mishra-6man-variable-slaac
>
> Signed-off-by: Dmytro Shytyi 
> ---
> diff -rupN net-next-5.10.0-rc2/net/ipv6/addrconf.c 
> net-next-patch-5.10.0-rc2/net/ipv6/addrconf.c
> --- net-next-5.10.0-rc2/net/ipv6/addrconf.c 2020-11-10 08:46:01.075193379 
> +0100
> +++ net-next-patch-5.10.0-rc2/net/ipv6/addrconf.c   2020-11-19 
> 21:26:39.770872898 +0100
> @@ -142,7 +142,6 @@ static int ipv6_count_addresses(const st
>  static int ipv6_generate_stable_address(struct in6_addr *addr,
> u8 dad_count,
> const struct inet6_dev *idev);
> -

Do not remove this line.
>  #define IN6_ADDR_HSIZE_SHIFT   8
>  #define IN6_ADDR_HSIZE (1 << IN6_ADDR_HSIZE_SHIFT)
>  /*
> @@ -1315,6 +1314,7 @@ static int ipv6_create_tempaddr(struct i
> struct ifa6_config cfg;
> long max_desync_factor;
> struct in6_addr addr;
> +   struct in6_addr temp;
> int ret = 0;
>
> write_lock_bh(&idev->lock);
> @@ -1340,9 +1340,16 @@ retry:
> goto out;
> }
> in6_ifa_hold(ifp);
> -   memcpy(addr.s6_addr, ifp->addr.s6_addr, 8);
> -   ipv6_gen_rnd_iid(&addr);
>
> +   if (ifp->prefix_len == 64) {
> +   memcpy(addr.s6_addr, ifp->addr.s6_addr, 8);
> +   ipv6_gen_rnd_iid(&addr);
> +   } else if (ifp->prefix_len > 0 && ifp->prefix_len <= 128) {
> +   memcpy(addr.s6_addr32, ifp->addr.s6_addr, 16);
> +   get_random_bytes(temp.s6_addr32, 16);
> +   ipv6_addr_prefix_copy(&temp, &addr, ifp->prefix_len);
> +   memcpy(addr.s6_addr, temp.s6_addr, 16);
> +   }

I do not understand why you are copying many times.
ipv6_addr_copy(), get_random_bytes(), and then ipv6_addr_prefix_copy
is enough, no?

> age = (now - ifp->tstamp) / HZ;
>
> regen_advance = idev->cnf.regen_max_retry *
> @@ -2569,6 +2576,41 @@ static bool is_addr_mode_generate_stable
>idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM;
>  }
>
> +static struct inet6_ifaddr *ipv6_cmp_rcvd_prsnt_prfxs(struct inet6_ifaddr *ifp,
> +   struct inet6_dev *in6_dev,
> +   struct net *net,
> +   const struct prefix_info *pinfo)
> +{
> +   struct inet6_ifaddr *result_base = NULL;
> +   struct inet6_ifaddr *result = NULL;
> +   struct in6_addr curr_net_prfx;
> +   struct in6_addr net_prfx;
> +   bool prfxs_equal;
> +
> +   result_base = result;
> +   rcu_read_lock();
> +   list_for_each_entry_rcu(ifp, &in6_dev->addr_list, if_list) {
> +   if (!net_eq(dev_net(ifp->idev->dev), net))
> +   continue;
> +   ipv6_addr_prefix_copy(&net_prfx, &pinfo->prefix, pinfo->prefix_len);
> +   ipv6_addr_prefix_copy(&curr_net_prfx, &ifp->addr, pinfo->prefix_len);
> +   prfxs_equal =
> +   ipv6_prefix_equal(&net_prfx, &curr_net_prfx, pinfo->prefix_len);
> +   if (prfxs_equal && pinfo->prefix_len == ifp->prefix_len) {
> +   result = ifp;
> +   in6_ifa_hold(ifp);
> +   break;
> +   }

I guess we can compare them with ipv6_prefix_equal()
directly because the code assumes "pinfo->prefix_len" and ifp->prefix_len are
equal.

> +   }
> +   rcu_read_unlock();
> +   if (result_base != result)
> +   ifp = result;
> +   else
> +   ifp = NULL;
> +
> +   return ifp;
> +}
> +
>  int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
>  const struct prefix_info *pinfo,
>  struct inet6_dev *in6_dev,
> @@ -2576,9 +2618,16 @@ int addrconf_prefix_rcv_add_addr(struct
>  u32 addr_flags, bool sllao, bool tokenized,
>  __u32 valid_lft, u32 prefered_lft)
>  {
> -   struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1);
> +   struct inet6_ifaddr *ifp = NULL;
> +   int plen = pi

Re: [PATCH bpf-next v7 00/34] bpf: switch to memcg-based memory accounting

2020-11-23 Thread Daniel Borkmann

On 11/19/20 6:37 PM, Roman Gushchin wrote:

Currently bpf is using the memlock rlimit for the memory accounting.
This approach has its downsides and over time has created a significant
amount of problems:

1) The limit is per-user, but because most bpf operations are performed
as root, the limit has little value.

2) It's hard to come up with a specific maximum value. Especially because
the counter is shared with non-bpf users (e.g. memlock() users).
Any specific value is either too low and creates false failures
or too high and useless.

3) Charging is not connected to the actual memory allocation. Bpf code
should manually calculate the estimated cost and precharge the counter,
and then take care of uncharging, including all fail paths.
It adds to the code complexity and makes it easy to leak a charge.

4) There is no simple way of getting the current value of the counter.
We've used drgn for it, but it's far from being convenient.

5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had
a function to "explain" this case for users.

In order to overcome these problems let's switch to the memcg-based
memory accounting of bpf objects. With the recent addition of the percpu
memory accounting, now it's possible to provide a comprehensive accounting
of the memory used by bpf programs and maps.

This approach has the following advantages:
1) The limit is per-cgroup and hierarchical. It's way more flexible and allows
a better control over memory usage by different workloads. Of course, it
requires enabled cgroups and kernel memory accounting and properly 
configured
cgroup tree, but it's a default configuration for a modern Linux system.

2) The actual memory consumption is taken into account. It happens automatically
at allocation time if the __GFP_ACCOUNT flag is passed. Uncharging is also
performed automatically on releasing the memory. So the code on the bpf side
becomes simpler and safer.

3) There is a simple way to get the current value and statistics.

In general, if a process performs a bpf operation (e.g. creates or updates
a map), its memory cgroup is charged. However, map updates performed from
an interrupt context are charged to the memory cgroup which contained
the process that created the map.

Providing a 1:1 replacement for the rlimit-based memory accounting is
a non-goal of this patchset. Users and memory cgroups are completely
orthogonal, so it's not possible even in theory.
Memcg-based memory accounting requires a properly configured cgroup tree
to be actually useful. However, it's the way how the memory is managed
on a modern Linux system.


The cover letter here only describes the advantages of this series, but leaves
out discussion of the disadvantages. They definitely must be part of the series
to provide a clear description of the semantic changes to readers. Last time we
discussed them, they were i) no mem limits in general on unprivileged users when
memory cgroups was not configured in the kernel, and ii) no mem limits by 
default
if not configured in the cgroup specifically. Did we make any progress on these
in the meantime? How do we want to address them? What is the concrete 
justification
to not address them?

Also I wonder what the risks of regressions are here, for example, if an existing
orchestrator has configured memory cgroup limits that are tailored to the 
application's
needs.. now, with kernel upgrade BPF will start to interfere, e.g. if a BPF 
program
attached to cgroups (e.g. connect/sendmsg/recvmsg or general cgroup skb egress 
hook)
starts charging to the process' memcg due to map updates?

  [0] 
https://lore.kernel.org/bpf/20200803190639.gd1020...@carbon.dhcp.thefacebook.com/


The patchset consists of the following parts:
1) 4 mm patches, which are already in the mm tree, but are required
to avoid a regression (otherwise vmallocs cannot be mapped to userspace).
2) memcg-based accounting for various bpf objects: progs and maps
3) removal of the rlimit-based accounting
4) removal of rlimit adjustments in userspace samples

First 4 patches are not supposed to be merged via the bpf tree. I'm including
them to make sure bpf tests will pass.

v7:
   - introduced bpf_map_kmalloc_node() and bpf_map_alloc_percpu(), by Alexei
   - switched allocations made from an interrupt context to new helpers,
 by Daniel
   - rebase and minor fixes


Re: [PATCH bpf-next v3 00/10] Introduce preferred busy-polling

2020-11-23 Thread Björn Töpel
On Thu, 19 Nov 2020 at 09:30, Björn Töpel  wrote:
>
> This series introduces three new features:
>
> 1. A new "heavy traffic" busy-polling variant that works in concert
>with the existing napi_defer_hard_irqs and gro_flush_timeout knobs.
>
> 2. A new socket option that lets a user change the busy-polling NAPI
>budget.
>
> 3. Allow busy-polling to be performed on XDP sockets.
>
> The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
> option or system-wide using the /proc/sys/net/core/busy_read knob, is
> opportunistic. That means that if the NAPI context is not
> scheduled, it will poll it. If, after busy-polling, the budget is
> exceeded, the busy-polling logic will schedule the NAPI onto the
> regular softirq handling.
>
> One implication of the behavior above is that a busy/heavy loaded NAPI
> context will never enter/allow for busy-polling. Some applications
> prefer that most NAPI processing would be done by busy-polling.
>
> This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
> in concert with the napi_defer_hard_irqs and gro_flush_timeout
> knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
> introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
> feature"), and allows for a user to defer interrupts to be enabled and
> instead schedule the NAPI context from a watchdog timer. When a user
> enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
> and the NAPI context is being processed by a softirq, the softirq NAPI
> processing will exit early to allow the busy-polling to be performed.
>
> If the application stops performing busy-polling via a system call,
> the watchdog timer defined by gro_flush_timeout will timeout, and
> regular softirq handling will resume.
>
> In summary; Heavy traffic applications that prefer busy-polling over
> softirq processing should use this option.
>
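
As a sketch of how an application would opt in: the option names below are the ones proposed by this series (and later present in asm-generic/socket.h); the numeric fallback defines are assumptions for building against older headers, and the call may fail with ENOPROTOOPT on kernels without this feature.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Fallback values as merged in asm-generic/socket.h (v5.11+); assumed
 * here only so the sketch compiles against older toolchain headers. */
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Opt a socket into preferred busy polling and set a NAPI budget.
 * Returns 0 on success, -1 if the kernel lacks support. Note that
 * large budgets may require CAP_NET_ADMIN. */
static int enable_prefer_busy_poll(int fd, int budget)
{
	int one = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
		       &one, sizeof(one)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
		       &budget, sizeof(budget)) < 0)
		return -1;
	return 0;
}
```

Combined with the napi_defer_hard_irqs and gro_flush_timeout knobs described above, this moves NAPI processing out of softirq and into the application's busy-polling syscalls.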

Eric/Jakub, any more thoughts/input? Tomatoes? :-P


Thank you,
Björn


Re: [PATCHv2 1/1] xdp: remove the function xsk_map_inc

2020-11-23 Thread Zhu Yanjun
On Mon, Nov 23, 2020 at 8:19 PM Magnus Karlsson
 wrote:
>
> On Mon, Nov 23, 2020 at 1:11 PM Zhu Yanjun  wrote:
> >
> > On Mon, Nov 23, 2020 at 8:05 PM  wrote:
> > >
> > > From: Zhu Yanjun 
> > >
> > > The function xsk_map_inc is a simple wrapper of bpf_map_inc and
> > > always returns zero. As such, replacing this function with bpf_map_inc
> > > and removing the test code.
> > >
> > > Signed-off-by: Zhu Yanjun 
> >
> >
> > > ---
> > >  net/xdp/xsk.c|  1 -
> > >  net/xdp/xsk.h|  1 -
> > >  net/xdp/xskmap.c | 13 +
> > >  3 files changed, 1 insertion(+), 14 deletions(-)
> > >
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index cfbec3989a76..c1b8a888591c 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -548,7 +548,6 @@ static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs,
> > > node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
> > > node);
> > > if (node) {
> > > -   WARN_ON(xsk_map_inc(node->map));
>
> This should be bpf_map_inc(&node->map->map); Think you forgot to
> convert this one.

In include/linux/bpf.h:
"
...
1213 void bpf_map_inc(struct bpf_map *map);
...
"

Zhu Yanjun
>
> > > map = node->map;
> > > *map_entry = node->map_entry;
> > > }
> > > diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> > > index b9e896cee5bb..0aad25c0e223 100644
> > > --- a/net/xdp/xsk.h
> > > +++ b/net/xdp/xsk.h
> > > @@ -41,7 +41,6 @@ static inline struct xdp_sock *xdp_sk(struct sock *sk)
> > >
> > >  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
> > >  struct xdp_sock **map_entry);
> > > -int xsk_map_inc(struct xsk_map *map);
> > >  void xsk_map_put(struct xsk_map *map);
> > >  void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
> > >  int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool 
> > > *pool,
> > > diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> > > index 49da2b8ace8b..6b7e9a72b101 100644
> > > --- a/net/xdp/xskmap.c
> > > +++ b/net/xdp/xskmap.c
> > > @@ -11,12 +11,6 @@
> > >
> > >  #include "xsk.h"
> > >
> > > -int xsk_map_inc(struct xsk_map *map)
> > > -{
> > > -   bpf_map_inc(&map->map);
> > > -   return 0;
> > > -}
> >
> > Hi, Magnus
> >
> > The function xsk_map_inc is replaced with bpf_map_inc.
> >
> > Zhu Yanjun
> >
> > > -
> > >  void xsk_map_put(struct xsk_map *map)
> > >  {
> > > bpf_map_put(&map->map);
> > > @@ -26,17 +20,12 @@ static struct xsk_map_node *xsk_map_node_alloc(struct 
> > > xsk_map *map,
> > >struct xdp_sock 
> > > **map_entry)
> > >  {
> > > struct xsk_map_node *node;
> > > -   int err;
> > >
> > > node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
> > > if (!node)
> > > return ERR_PTR(-ENOMEM);
> > >
> > > -   err = xsk_map_inc(map);
> > > -   if (err) {
> > > -   kfree(node);
> > > -   return ERR_PTR(err);
> > > -   }
> > > +   bpf_map_inc(&map->map);
> > >
> > > node->map = map;
> > > node->map_entry = map_entry;
> > > --
> > > 2.25.1
> > >


Re: [net v3] net/tls: missing received data after fast remote close

2020-11-23 Thread Vadim Fedorenko

On 20.11.2020 18:26, Jakub Kicinski wrote:

On Thu, 19 Nov 2020 18:59:48 +0300 Vadim Fedorenko wrote:

In the case when the TCP socket received a FIN after some data and the
parser hasn't started before the read, the caller will receive
an empty buffer. This behavior differs from a plain TCP socket and
leads to special treatment in user-space.
The flow that triggers the race is simple. Server sends small
amount of data right after the connection is configured to use TLS
and closes the connection. In this case receiver sees TLS Handshake
data, configures TLS socket right after Change Cipher Spec record.
While the configuration is in process, TCP socket receives small
Application Data record, Encrypted Alert record and FIN packet. So
the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with
SK_DONE bit set. The received data is not parsed upon arrival and is
never sent to user-space.

Patch unpauses parser directly if we have unparsed data in tcp
receive queue.

Signed-off-by: Vadim Fedorenko 

Applied, thanks!

Looks like I missed fixes tag to queue this patch to -stable.

Fixes: c46234ebb4d1 ("tls: RX path for ktls")



[PATCH net-next V2] MAINTAINERS: Update page pool entry

2020-11-23 Thread Jesper Dangaard Brouer
Add some file F: matches that are related to page_pool.

Acked-by: Ilias Apalodimas 
Signed-off-by: Jesper Dangaard Brouer 
---
 MAINTAINERS |2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f827f504251b..a607ff2156dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13177,7 +13177,9 @@ M:  Jesper Dangaard Brouer 
 M: Ilias Apalodimas 
 L: netdev@vger.kernel.org
 S: Supported
+F: Documentation/networking/page_pool.rst
 F: include/net/page_pool.h
+F: include/trace/events/page_pool.h
 F: net/core/page_pool.c
 
 PANASONIC LAPTOP ACPI EXTRAS DRIVER




Re: [arm64] kernel BUG at kernel/seccomp.c:1309!

2020-11-23 Thread Arnd Bergmann
On Mon, Nov 23, 2020 at 12:15 PM Naresh Kamboju
 wrote:
>
> While booting arm64 kernel the following kernel BUG noticed on several arm64
> devices running linux next 20201123 tag kernel.
>
>
> $ git log --oneline next-20201120..next-20201123 -- kernel/seccomp.c
> 5c5c5fa055ea Merge remote-tracking branch 'seccomp/for-next/seccomp'
> bce6a8cba7bf Merge branch 'linus'
> 7ef95e3dbcee Merge branch 'for-linus/seccomp' into for-next/seccomp
> fab686eb0307 seccomp: Remove bogus __user annotations
> 0d831528 seccomp/cache: Report cache data through /proc/pid/seccomp_cache
> 8e01b51a31a1 seccomp/cache: Add "emulator" to check if filter is constant 
> allow
> f9d480b6ffbe seccomp/cache: Lookup syscall allowlist bitmap for fast path
> 23d67a54857a seccomp: Migrate to use SYSCALL_WORK flag
>
>
> Please find these easy steps to reproduce the kernel build and boot.

Adding Gabriel Krisman Bertazi to Cc, as the last patch (23d67a54857a) here
seems suspicious: it changes

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 02aef2844c38..47763f3999f7 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -42,7 +42,7 @@ struct seccomp {
 extern int __secure_computing(const struct seccomp_data *sd);
 static inline int secure_computing(void)
 {
-   if (unlikely(test_thread_flag(TIF_SECCOMP)))
+   if (unlikely(test_syscall_work(SECCOMP)))
return  __secure_computing(NULL);
return 0;
 }

which is in the call chain directly before

int __secure_computing(const struct seccomp_data *sd)
{
   int mode = current->seccomp.mode;

...
switch (mode) {
case SECCOMP_MODE_STRICT:
__secure_computing_strict(this_syscall);  /* may call do_exit */
return 0;
case SECCOMP_MODE_FILTER:
return __seccomp_filter(this_syscall, sd, false);
default:
BUG();
}
}

Clearly, current->seccomp.mode is set to something other
than SECCOMP_MODE_STRICT or SECCOMP_MODE_FILTER
while the test_syscall_work(SECCOMP) returns true, and this
must have not been the case earlier.

 Arnd

>
> step to reproduce:
> # please install tuxmake
> # sudo pip3 install -U tuxmake
> # cd linux-next
> # tuxmake --runtime docker --target-arch arm --toolchain gcc-9
> --kconfig defconfig --kconfig-add
> https://builds.tuxbuild.com/1kgWN61pS5M35vjnVfDSvOOPd38/config
>
> # Boot the arm64 on any arm64 devices.
> # you will notice the below BUG
>
> crash log details:
> ---
> [6.941012] [ cut here ]
> Found device  /dev/ttyAMA3.
> [6.947587] lima f408.gpu: mod rate = 5
> [6.955422] kernel BUG at kernel/seccomp.c:1309!
> [6.955430] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [6.955437] Modules linked in: cec rfkill wlcore_sdio(+) kirin_drm
> dw_drm_dsi lima(+) drm_kms_helper gpu_sched drm fuse
> [6.955481] CPU: 2 PID: 291 Comm: systemd-udevd Not tainted
> 5.10.0-rc4-next-20201123 #2
> [6.955485] Hardware name: HiKey Development Board (DT)
> [6.955493] pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> [6.955510] pc : __secure_computing+0xe0/0xe8
> [6.958171] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
> req 40Hz, actual 40HZ div = 31)
> [6.965975] [drm] Initialized lima 1.1.0 20191231 for f408.gpu on 
> minor 0
> [6.970176] lr : syscall_trace_enter+0x1cc/0x218
> [6.970181] sp : 800012d8be10
> [6.970185] x29: 800012d8be10 x28: 0092cb00
> [6.970195] x27:  x26: 
> [6.970203] x25:  x24: 
> [6.970210] x23: 6000 x22: 0202
> [7.011614] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
> req 2500Hz, actual 2480HZ div = 0)
> [7.016457]
> [7.016461] x21: 0200 x20: 0092cb00
> [7.016470] x19: 800012d8bec0 x18: 
> [7.016478] x17:  x16: 
> [7.016485] x15:  x14: 
> [7.054116] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
> req 40Hz, actual 40HZ div = 31)
> [7.056715]
> [7.103444] mmc_host mmc2: Bus speed (slot 0) = 2480Hz (slot
> req 2500Hz, actual 2480HZ div = 0)
> [7.105105] x13:  x12: 
> [7.125849] x11:  x10: 
> [7.125858] x9 : 80001001bcbc x8 : 
> [7.125865] x7 :  x6 : 
> [7.125871] x5 :  x4 : 
> [7.125879] x3 :  x2 : 0092cb00
> [7.

Re: [PATCHv2 1/1] xdp: remove the function xsk_map_inc

2020-11-23 Thread Magnus Karlsson
On Mon, Nov 23, 2020 at 2:37 PM Zhu Yanjun  wrote:
>
> On Mon, Nov 23, 2020 at 8:19 PM Magnus Karlsson
>  wrote:
> >
> > On Mon, Nov 23, 2020 at 1:11 PM Zhu Yanjun  wrote:
> > >
> > > On Mon, Nov 23, 2020 at 8:05 PM  wrote:
> > > >
> > > > From: Zhu Yanjun 
> > > >
> > > > The function xsk_map_inc is a simple wrapper of bpf_map_inc and
> > > > always returns zero. As such, replace this function with bpf_map_inc
> > > > and remove the test code.
> > > >
> > > > Signed-off-by: Zhu Yanjun 
> > >
> > >
> > > > ---
> > > >  net/xdp/xsk.c|  1 -
> > > >  net/xdp/xsk.h|  1 -
> > > >  net/xdp/xskmap.c | 13 +
> > > >  3 files changed, 1 insertion(+), 14 deletions(-)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index cfbec3989a76..c1b8a888591c 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -548,7 +548,6 @@ static struct xsk_map 
> > > > *xsk_get_map_list_entry(struct xdp_sock *xs,
> > > > node = list_first_entry_or_null(&xs->map_list, struct 
> > > > xsk_map_node,
> > > > node);
> > > > if (node) {
> > > > -   WARN_ON(xsk_map_inc(node->map));
> >
> > This should be bpf_map_inc(&node->map->map); Think you forgot to
> > convert this one.
>
> In include/linux/bpf.h:
> "
> ...
> 1213 void bpf_map_inc(struct bpf_map *map);
> ...
> "

Sorry if I was not clear enough. What I meant is that you cannot just
remove WARN_ON(xsk_map_inc(node->map)). You need to replace it with
bpf_map_inc(&node->map->map), otherwise you will not make a map_inc
and the refcount will be wrong. Please send a v3 using git send-email
so it is nice and clean.

> Zhu Yanjun
> >
> > > > map = node->map;
> > > > *map_entry = node->map_entry;
> > > > }
> > > > diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> > > > index b9e896cee5bb..0aad25c0e223 100644
> > > > --- a/net/xdp/xsk.h
> > > > +++ b/net/xdp/xsk.h
> > > > @@ -41,7 +41,6 @@ static inline struct xdp_sock *xdp_sk(struct sock *sk)
> > > >
> > > >  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
> > > >  struct xdp_sock **map_entry);
> > > > -int xsk_map_inc(struct xsk_map *map);
> > > >  void xsk_map_put(struct xsk_map *map);
> > > >  void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
> > > >  int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool 
> > > > *pool,
> > > > diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
> > > > index 49da2b8ace8b..6b7e9a72b101 100644
> > > > --- a/net/xdp/xskmap.c
> > > > +++ b/net/xdp/xskmap.c
> > > > @@ -11,12 +11,6 @@
> > > >
> > > >  #include "xsk.h"
> > > >
> > > > -int xsk_map_inc(struct xsk_map *map)
> > > > -{
> > > > -   bpf_map_inc(&map->map);
> > > > -   return 0;
> > > > -}
> > >
> > > Hi, Magnus
> > >
> > > The function xsk_map_inc is replaced with bpf_map_inc.
> > >
> > > Zhu Yanjun
> > >
> > > > -
> > > >  void xsk_map_put(struct xsk_map *map)
> > > >  {
> > > > bpf_map_put(&map->map);
> > > > @@ -26,17 +20,12 @@ static struct xsk_map_node 
> > > > *xsk_map_node_alloc(struct xsk_map *map,
> > > >struct xdp_sock 
> > > > **map_entry)
> > > >  {
> > > > struct xsk_map_node *node;
> > > > -   int err;
> > > >
> > > > node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
> > > > if (!node)
> > > > return ERR_PTR(-ENOMEM);
> > > >
> > > > -   err = xsk_map_inc(map);
> > > > -   if (err) {
> > > > -   kfree(node);
> > > > -   return ERR_PTR(err);
> > > > -   }
> > > > +   bpf_map_inc(&map->map);
> > > >
> > > > node->map = map;
> > > > node->map_entry = map_entry;
> > > > --
> > > > 2.25.1
> > > >


[RFC 00/18] net: iosm: PCIe Driver for Intel M.2 Modem

2020-11-23 Thread M Chetan Kumar
The IOSM (IPC over Shared Memory) driver is a PCIe host driver implemented
for Linux and Chrome platforms for data exchange over the PCIe interface between
a host platform and an Intel M.2 modem. The driver exposes an interface conforming
to the MBIM protocol [1]. Any front-end application (e.g. ModemManager) could easily
manage the MBIM interface to enable data communication towards WWAN.

This driver is still a work in progress; below are the known things that
need to be addressed by the driver.
1. Usage of completion() inside deinit()
2. Clean-up wrappers around hr_timer
3. Usage of net stats inside driver struct

Kindly request to review and give your suggestions.

Below is the technical detail:-
The Intel M.2 modem uses 2 BAR regions. The first region is dedicated to the
doorbell registers for IRQs and the second region is used as a scratchpad area
for bookkeeping of modem execution stage details along with host system shared
memory region context details. The upper edge of the driver exposes the control
and data channels for user-space application interaction. At the lower edge,
these data and control channels are associated with pipes. The pipes are the
lowest-level interfaces used over PCIe as logical channels for message exchange.
A single channel maps to a UL and a DL pipe, which are initialized on device open.
On the UL path, the driver copies application-sent data into SKBs, associates
them with transfer descriptors and puts them onto the ring buffer for DMA
transfer. Once the information has been updated in the shared memory region,
the host rings a doorbell to the modem to perform DMA, and the modem uses MSI
to communicate back to the host. For receiving data on the DL path, SKBs are
pre-allocated during pipe open and transfer descriptors are given to the modem
for DMA transfer.

The driver exposes two types of ports, namely "wwanctrl", a char device node
which is used for MBIM control operations, and "INMx" (x = 0,1,2..7) network
interfaces for IP data communication.
1) MBIM Control Interface:
This node exposes an interface between the modem and an application using the
char device exposed by the "IOSM" driver to establish and manage the MBIM data
communication with PCIe-based Intel M.2 modems.

It also supports an IOCTL command, apart from the read and write methods. The
IOCTL command, "IOCTL_WDM_MAX_COMMAND", can be used by applications to fetch
the maximum command buffer length supported by the driver, which is restricted
to 4096 bytes.
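
A minimal user-space sketch of querying that limit follows. The fd is assumed to come from opening the "wwanctrl" node; the ioctl encoding below is the standard cdc-wdm one, reproduced here as a fallback in case the uapi header is not available.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

#ifndef IOCTL_WDM_MAX_COMMAND
/* Same encoding as include/uapi/linux/usb/cdc-wdm.h */
#define IOCTL_WDM_MAX_COMMAND _IOR('H', 0xA0, uint16_t)
#endif

/* Query the maximum MBIM control message size from an open "wwanctrl"
 * fd. Returns the limit in bytes, or -1 on error (e.g. bad fd). */
static int wwanctrl_max_command(int fd)
{
	uint16_t max_cmd = 0;

	if (ioctl(fd, IOCTL_WDM_MAX_COMMAND, &max_cmd) < 0)
		return -1;
	return (int)max_cmd;
}
```

An MBIM application would size its control message buffers from this value instead of hard-coding the 4096-byte cap.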

2) MBIM Data Interface:
The IOSM driver represents the MBIM data channel as a single root network
device of the "wwan0" type, which is mapped as the default IP session 0. Several
IP sessions (INMx) can be multiplexed over the single data channel using
sub-devices of the master wwanY device. The driver models such IP sessions as
802.1q VLAN devices, which are mapped to unique VLAN IDs.

M Chetan Kumar (18):
  net: iosm: entry point
  net: iosm: irq handling
  net: iosm: mmio scratchpad
  net: iosm: shared memory IPC interface
  net: iosm: shared memory I/O operations
  net: iosm: channel configuration
  net: iosm: char device for FW flash & coredump
  net: iosm: MBIM control device
  net: iosm: bottom half
  net: iosm: multiplex IP sessions
  net: iosm: encode or decode datagram
  net: iosm: power management
  net: iosm: shared memory protocol
  net: iosm: protocol operations
  net: iosm: uevent support
  net: iosm: net driver
  net: iosm: readme file
  net: iosm: infrastructure

 MAINTAINERS   |7 +
 drivers/net/Kconfig   |1 +
 drivers/net/Makefile  |1 +
 drivers/net/wwan/Kconfig  |   13 +
 drivers/net/wwan/Makefile |5 +
 drivers/net/wwan/iosm/Kconfig |   10 +
 drivers/net/wwan/iosm/Makefile|   27 +
 drivers/net/wwan/iosm/README  |  126 +++
 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c |   87 ++
 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.h |   57 +
 drivers/net/wwan/iosm/iosm_ipc_imem.c | 1466 +
 drivers/net/wwan/iosm/iosm_ipc_imem.h |  606 ++
 drivers/net/wwan/iosm/iosm_ipc_imem_ops.c |  779 +
 drivers/net/wwan/iosm/iosm_ipc_imem_ops.h |  102 ++
 drivers/net/wwan/iosm/iosm_ipc_irq.c  |   95 ++
 drivers/net/wwan/iosm/iosm_ipc_irq.h  |   35 +
 drivers/net/wwan/iosm/iosm_ipc_mbim.c |  205 
 drivers/net/wwan/iosm/iosm_ipc_mbim.h |   24 +
 drivers/net/wwan/iosm/iosm_ipc_mmio.c |  222 
 drivers/net/wwan/iosm/iosm_ipc_mmio.h |  192 
 drivers/net/wwan/iosm/iosm_ipc_mux.c  |  455 
 drivers/net/wwan/iosm/iosm_ipc_mux.h  |  344 ++
 drivers/net/wwan/iosm/iosm_ipc_mux_codec.c|  902 +++
 drivers/net/wwan/iosm/iosm_ipc_mux_codec.h|  194 
 drivers/net/wwan/iosm/iosm_ipc_pcie.c |  494 +
 drivers/net/wwan/iosm/iosm_ipc_pcie.h |  205 
 drivers/net/wwan/iosm/iosm_ipc_pm.c   |  334 ++
 drivers/net/ww

[RFC 01/18] net: iosm: entry point

2020-11-23 Thread M Chetan Kumar
1) Register IOSM driver with kernel to manage Intel WWAN PCIe
   device(PCI_VENDOR_ID_INTEL, INTEL_CP_DEVICE_7560_ID).
2) Exposes the EP PCIe device capability to Host PCIe core.
3) Initializes PCIe EP configuration and defines PCIe driver probe, remove
   and power management OPS.
4) Allocate and DMA-map skb memory for data communication from device to
   kernel and vice versa.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_pcie.c | 494 ++
 drivers/net/wwan/iosm/iosm_ipc_pcie.h | 205 ++
 2 files changed, 699 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_pcie.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_pcie.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_pcie.c 
b/drivers/net/wwan/iosm/iosm_ipc_pcie.c
new file mode 100644
index ..9457e889695a
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_pcie.c
@@ -0,0 +1,494 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+#include 
+
+#include "iosm_ipc_imem.h"
+#include "iosm_ipc_pcie.h"
+
+#define DRV_AUTHOR "Intel Corporation "
+
+MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_DESCRIPTION("IOSM Driver");
+MODULE_LICENSE("GPL v2");
+
+static void ipc_pcie_resources_release(struct iosm_pcie *ipc_pcie)
+{
+   /* Free the MSI resources. */
+   ipc_release_irq(ipc_pcie);
+
+   /* Free mapped doorbell scratchpad bus memory into CPU space. */
+   iounmap(ipc_pcie->scratchpad);
+   ipc_pcie->scratchpad = NULL;
+
+   /* Free mapped IPC_REGS bus memory into CPU space. */
+   iounmap(ipc_pcie->ipc_regs);
+   ipc_pcie->ipc_regs = NULL;
+
+   /* Releases all PCI I/O and memory resources previously reserved by a
+* successful call to pci_request_regions.  Call this function only
+* after all use of the PCI regions has ceased.
+*/
+   pci_release_regions(ipc_pcie->pci);
+}
+
+static void ipc_cleanup(struct iosm_pcie *ipc_pcie)
+{
+   struct pci_dev *pci;
+
+   pci = ipc_pcie->pci;
+
+   /* Free the shared memory resources. */
+   ipc_imem_cleanup(ipc_pcie->imem);
+
+   ipc_pcie_resources_release(ipc_pcie);
+
+   /* Signal to the system that the PCI device is not in use. */
+   if (ipc_pcie->pci)
+   pci_disable_device(pci);
+
+   /*dbg cleanup*/
+   ipc_pcie->dev = NULL;
+}
+
+static void ipc_pcie_deinit(struct iosm_pcie *ipc_pcie)
+{
+   if (ipc_pcie) {
+   kfree(ipc_pcie->imem);
+   kfree(ipc_pcie);
+   }
+}
+
+static void iosm_ipc_remove(struct pci_dev *pci)
+{
+   struct iosm_pcie *ipc_pcie = pci_get_drvdata(pci);
+
+   ipc_cleanup(ipc_pcie);
+
+   ipc_pcie_deinit(ipc_pcie);
+}
+
+static int ipc_pcie_resources_request(struct iosm_pcie *ipc_pcie)
+{
+   struct pci_dev *pci = ipc_pcie->pci;
+   u32 cap;
+   u32 ret;
+
+   /* Reserved PCI I/O and memory resources.
+* Mark all PCI regions associated with PCI device pci as
+* being reserved by owner IOSM_IPC.
+*/
+   ret = pci_request_regions(pci, "IOSM_IPC");
+   if (ret) {
+   dev_err(ipc_pcie->dev, "failed pci request regions");
+   goto pci_request_region_fail;
+   }
+
+   /* Reserve the doorbell IPC REGS memory resources.
+* Remap the memory into CPU space. Arrange for the physical address
+* (BAR) to be visible from this driver.
+* pci_ioremap_bar() ensures that the memory is marked uncachable.
+*/
+   ipc_pcie->ipc_regs = pci_ioremap_bar(pci, ipc_pcie->ipc_regs_bar_nr);
+
+   if (!ipc_pcie->ipc_regs) {
+   dev_err(ipc_pcie->dev, "IPC REGS ioremap error");
+   ret = -EBUSY;
+   goto ipc_regs_remap_fail;
+   }
+
+   /* Reserve the MMIO scratchpad memory resources.
+* Remap the memory into CPU space. Arrange for the physical address
+* (BAR) to be visible from this driver.
+* pci_ioremap_bar() ensures that the memory is marked uncachable.
+*/
+   ipc_pcie->scratchpad =
+   pci_ioremap_bar(pci, ipc_pcie->scratchpad_bar_nr);
+
+   if (!ipc_pcie->scratchpad) {
+   dev_err(ipc_pcie->dev, "doorbell scratchpad ioremap error");
+   ret = -EBUSY;
+   goto scratch_remap_fail;
+   }
+
+   /* Install the irq handler triggered by CP. */
+   ret = ipc_acquire_irq(ipc_pcie);
+   if (ret) {
+   dev_err(ipc_pcie->dev, "acquiring MSI irq failed!");
+   goto irq_acquire_fail;
+   }
+
+   /* Enable bus-mastering for the IOSM IPC device. */
+   pci_set_master(pci);
+
+   /* Enable LTR if possible
+* This is needed for L1.2!
+*/
+   pcie_capability_read_dword(ipc_pcie->pci, PCI_EXP_DEVCAP2, &cap);
+   if (cap & PCI_EXP_DEVCAP2_LTR)
+   pcie_capability_set_word(ipc_pcie->pci, PCI_EXP_DEVCTL2,
+   

[RFC 02/18] net: iosm: irq handling

2020-11-23 Thread M Chetan Kumar
1) Requests interrupt vectors and frees allocated resources.
2) Registers the IRQ handler.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_irq.c | 95 
 drivers/net/wwan/iosm/iosm_ipc_irq.h | 35 +
 2 files changed, 130 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_irq.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_irq.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_irq.c 
b/drivers/net/wwan/iosm/iosm_ipc_irq.c
new file mode 100644
index ..b9e1bc7959db
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_irq.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include "iosm_ipc_pcie.h"
+#include "iosm_ipc_protocol.h"
+
+/* Write to the specified register offset for doorbell interrupt */
+static inline void write_dbell_reg(struct iosm_pcie *ipc_pcie, int irq_n,
+  u32 data)
+{
+   void __iomem *write_reg;
+
+   /* Select the first doorbell register, which is only currently needed
+* by CP.
+*/
+   write_reg = (void __iomem *)((u8 __iomem *)ipc_pcie->ipc_regs +
+ipc_pcie->doorbell_write +
+(irq_n * ipc_pcie->doorbell_reg_offset));
+
+   /* Fire the doorbell irq by writing data on the doorbell write pointer
+* register.
+*/
+   iowrite32(data, write_reg);
+}
+
+void ipc_doorbell_fire(struct iosm_pcie *ipc_pcie, int irq_n, u32 data)
+{
+   if (!ipc_pcie || !ipc_pcie->ipc_regs)
+   return;
+
+   write_dbell_reg(ipc_pcie, irq_n, data);
+}
+
+/* Threaded Interrupt handler for MSI interrupts */
+static irqreturn_t ipc_msi_interrupt(int irq, void *dev_id)
+{
+   struct iosm_pcie *ipc_pcie = dev_id;
+   int instance = irq - ipc_pcie->pci->irq;
+
+   /* Shift the MSI irq actions to the IPC tasklet. IRQ_NONE means the
+* irq was not from the IPC device or could not be served.
+*/
+   if (instance >= ipc_pcie->nvec)
+   return IRQ_NONE;
+
+   ipc_imem_irq_process(ipc_pcie->imem, instance);
+
+   return IRQ_HANDLED;
+}
+
+void ipc_release_irq(struct iosm_pcie *ipc_pcie)
+{
+   struct pci_dev *pdev = ipc_pcie->pci;
+
+   if (pdev->msi_enabled) {
+   while (--ipc_pcie->nvec >= 0)
+   free_irq(pdev->irq + ipc_pcie->nvec, ipc_pcie);
+   }
+   pci_free_irq_vectors(pdev);
+}
+
+int ipc_acquire_irq(struct iosm_pcie *ipc_pcie)
+{
+   struct pci_dev *pdev = ipc_pcie->pci;
+   int i, rc = 0;
+
+   ipc_pcie->nvec = pci_alloc_irq_vectors(pdev, IPC_MSI_VECTORS,
+  IPC_MSI_VECTORS, PCI_IRQ_MSI);
+
+   if (ipc_pcie->nvec < 0)
+   return ipc_pcie->nvec;
+
+   if (!pdev->msi_enabled) {
+   rc = -1;
+   goto error;
+   }
+
+   for (i = 0; i < ipc_pcie->nvec; ++i) {
+   rc = request_threaded_irq(pdev->irq + i, NULL,
+ ipc_msi_interrupt, 0, KBUILD_MODNAME,
+ ipc_pcie);
+   if (rc) {
+   dev_err(ipc_pcie->dev, "unable to grab IRQ %d, rc=%d",
+   pdev->irq, rc);
+   ipc_pcie->nvec = i;
+   ipc_release_irq(ipc_pcie);
+   goto error;
+   }
+   }
+
+error:
+   return rc;
+}
diff --git a/drivers/net/wwan/iosm/iosm_ipc_irq.h 
b/drivers/net/wwan/iosm/iosm_ipc_irq.h
new file mode 100644
index ..db207cb95a8a
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_irq.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#ifndef IOSM_IPC_IRQ_H
+#define IOSM_IPC_IRQ_H
+
+#include "iosm_ipc_pcie.h"
+
+struct iosm_pcie;
+
+/**
+ * ipc_doorbell_fire - fire doorbell to CP
+ * @ipc_pcie:  Pointer to iosm_pcie
+ * @irq_n: Doorbell type
+ * @data:  ipc state
+ */
+void ipc_doorbell_fire(struct iosm_pcie *ipc_pcie, int irq_n, u32 data);
+
+/**
+ * ipc_release_irq - Remove the IRQ handler.
+ * @ipc_pcie:  Pointer to iosm_pcie struct
+ */
+void ipc_release_irq(struct iosm_pcie *ipc_pcie);
+
+/**
+ * ipc_acquire_irq - Install the IPC IRQ handler.
+ * @ipc_pcie:  Pointer to iosm_pcie struct
+ *
+ * Return: 0 on success and -1 on failure
+ */
+int ipc_acquire_irq(struct iosm_pcie *ipc_pcie);
+
+#endif
-- 
2.12.3



[RFC 03/18] net: iosm: mmio scratchpad

2020-11-23 Thread M Chetan Kumar
1) Initializes the Scratchpad region for Host-Device communication.
2) Exposes device capabilities like chip info and device execution
   stages.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_mmio.c | 222 ++
 drivers/net/wwan/iosm/iosm_ipc_mmio.h | 192 +
 2 files changed, 414 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mmio.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mmio.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_mmio.c 
b/drivers/net/wwan/iosm/iosm_ipc_mmio.c
new file mode 100644
index ..eb685f6c720d
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_mmio.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+#include 
+#include 
+
+#include "iosm_ipc_mmio.h"
+#include "iosm_ipc_sio.h"
+
+/* Definition of MMIO offsets
+ * note that MMIO_CI offsets are relative to end of chip info structure
+ */
+
+/* MMIO chip info size in bytes */
+#define MMIO_CHIP_INFO_SIZE 60
+
+/* CP execution stage */
+#define MMIO_OFFSET_EXECUTION_STAGE 0x00
+
+/* Boot ROM Chip Info struct */
+#define MMIO_OFFSET_CHIP_INFO 0x04
+
+#define MMIO_OFFSET_ROM_EXIT_CODE 0x40
+
+#define MMIO_OFFSET_PSI_ADDRESS 0x54
+
+#define MMIO_OFFSET_PSI_SIZE 0x5C
+
+#define MMIO_OFFSET_IPC_STATUS 0x60
+
+#define MMIO_OFFSET_CONTEXT_INFO 0x64
+
+#define MMIO_OFFSET_BASE_ADDR 0x6C
+
+#define MMIO_OFFSET_END_ADDR 0x74
+
+#define MMIO_OFFSET_CP_VERSION 0xF0
+
+#define MMIO_OFFSET_CP_CAPABILITIES 0xF4
+
+/* Timeout in 20 msec to wait for the modem boot code to write a valid
+ * execution stage into mmio area
+ */
+#define IPC_MMIO_EXEC_STAGE_TIMEOUT 50
+
+/* check if exec stage has one of the valid values */
+static bool ipc_mmio_is_valid_exec_stage(enum ipc_mem_exec_stage stage)
+{
+   switch (stage) {
+   case IPC_MEM_EXEC_STAGE_BOOT:
+   case IPC_MEM_EXEC_STAGE_PSI:
+   case IPC_MEM_EXEC_STAGE_EBL:
+   case IPC_MEM_EXEC_STAGE_RUN:
+   case IPC_MEM_EXEC_STAGE_CRASH:
+   case IPC_MEM_EXEC_STAGE_CD_READY:
+   return true;
+   default:
+   return false;
+   }
+}
+
+void ipc_mmio_update_cp_capability(struct iosm_mmio *ipc_mmio)
+{
+   u32 cp_cap;
+   unsigned int ver;
+
+   ver = ipc_mmio_get_cp_version(ipc_mmio);
+   cp_cap = readl(ipc_mmio->base + ipc_mmio->offset.cp_capability);
+
+   ipc_mmio->has_mux_lite = (ver >= IOSM_CP_VERSION) &&
+!(cp_cap & DL_AGGR) && !(cp_cap & UL_AGGR);
+
+   ipc_mmio->has_ul_flow_credit =
+   (ver >= IOSM_CP_VERSION) && (cp_cap & UL_FLOW_CREDIT);
+}
+
+struct iosm_mmio *ipc_mmio_init(void __iomem *mmio, struct device *dev)
+{
+   struct iosm_mmio *ipc_mmio = kzalloc(sizeof(*ipc_mmio), GFP_KERNEL);
+   int retries = IPC_MMIO_EXEC_STAGE_TIMEOUT;
+   enum ipc_mem_exec_stage stage;
+
+   if (!ipc_mmio)
+   return NULL;
+
+   ipc_mmio->dev = dev;
+
+   ipc_mmio->base = mmio;
+
+   ipc_mmio->offset.exec_stage = MMIO_OFFSET_EXECUTION_STAGE;
+
+   /* Check for a valid execution stage to make sure that the boot code
+* has correctly initialized the MMIO area.
+*/
+   do {
+   stage = ipc_mmio_get_exec_stage(ipc_mmio);
+   if (ipc_mmio_is_valid_exec_stage(stage))
+   break;
+
+   msleep(20);
+   } while (retries-- > 0);
+
+   if (!retries) {
+   dev_err(ipc_mmio->dev, "invalid exec stage %X", stage);
+   goto init_fail;
+   }
+
+   ipc_mmio->offset.chip_info = MMIO_OFFSET_CHIP_INFO;
+
+   /* read chip info size and version from chip info structure */
+   ipc_mmio->chip_info_version =
+   ioread8(ipc_mmio->base + ipc_mmio->offset.chip_info);
+
+   /* Increment of 2 is needed as the size value in the chip info
+* excludes the version and size field, which are always present
+*/
+   ipc_mmio->chip_info_size =
+   ioread8(ipc_mmio->base + ipc_mmio->offset.chip_info + 1) + 2;
+
+   if (ipc_mmio->chip_info_size != MMIO_CHIP_INFO_SIZE) {
+   dev_err(ipc_mmio->dev, "Unexpected Chip Info");
+   goto init_fail;
+   }
+
+   ipc_mmio->offset.rom_exit_code = MMIO_OFFSET_ROM_EXIT_CODE;
+
+   ipc_mmio->offset.psi_address = MMIO_OFFSET_PSI_ADDRESS;
+   ipc_mmio->offset.psi_size = MMIO_OFFSET_PSI_SIZE;
+   ipc_mmio->offset.ipc_status = MMIO_OFFSET_IPC_STATUS;
+   ipc_mmio->offset.context_info = MMIO_OFFSET_CONTEXT_INFO;
+   ipc_mmio->offset.ap_win_base = MMIO_OFFSET_BASE_ADDR;
+   ipc_mmio->offset.ap_win_end = MMIO_OFFSET_END_ADDR;
+
+   ipc_mmio->offset.cp_version = MMIO_OFFSET_CP_VERSION;
+   ipc_mmio->offset.cp_capability = MMIO_OFFSET_CP_CAPABILITIES;
+
+   return ipc_mmio;
+
+init_fail:
+   kfree(ipc_mmio);
+   return NULL;
+}
+
+enum

[RFC 04/18] net: iosm: shared memory IPC interface

2020-11-23 Thread M Chetan Kumar
1) Initializes shared memory for host-device communication.
2) Allocates resources required for control and data operations.
3) Transfers the device IRQ to the IPC execution thread.
4) Defines the timer callbacks for async events.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_imem.c | 1466 +
 drivers/net/wwan/iosm/iosm_ipc_imem.h |  606 ++
 2 files changed, 2072 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_imem.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_imem.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c 
b/drivers/net/wwan/iosm/iosm_ipc_imem.c
new file mode 100644
index ..7c26e2fdf77b
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c
@@ -0,0 +1,1466 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+
+#include "iosm_ipc_chnl_cfg.h"
+#include "iosm_ipc_imem.h"
+#include "iosm_ipc_mbim.h"
+#include "iosm_ipc_sio.h"
+#include "iosm_ipc_task_queue.h"
+
+/* Check if the given channel is the WWAN IP channel. */
+static inline int ipc_imem_check_wwan_ips(struct ipc_mem_channel *chnl)
+{
+   if (chnl)
+   return chnl->ctype == IPC_CTYPE_WWAN &&
+  chnl->vlan_id == IPC_MEM_MUX_IP_CH_VLAN_ID;
+   return false;
+}
+
+static int imem_msg_send_device_sleep(struct iosm_imem *ipc_imem, u32 state)
+{
+   union ipc_msg_prep_args prep_args = {
+   .sleep.target = 1,
+   .sleep.state = state,
+   };
+
+   ipc_imem->device_sleep = state;
+
+   return ipc_protocol_tq_msg_send(ipc_imem->ipc_protocol,
+   IPC_MSG_PREP_SLEEP, &prep_args, NULL);
+}
+
+static bool imem_dl_skb_alloc(struct iosm_imem *ipc_imem, struct ipc_pipe 
*pipe)
+{
+   /* limit max. nr of entries */
+   if (pipe->nr_of_queued_entries >= pipe->max_nr_of_queued_entries)
+   return false;
+
+   return ipc_protocol_dl_td_prepare(ipc_imem->ipc_protocol, pipe);
+}
+
+/* This timer handler will retry DL buffer allocation if a pipe has no free buf */
+static int imem_tq_td_alloc_timer(void *instance, int arg, void *msg,
+ size_t size)
+{
+   struct iosm_imem *ipc_imem = instance;
+   bool new_buffers_available = false;
+   bool retry_allocation = false;
+   int i;
+
+   for (i = 0; i < IPC_MEM_MAX_CHANNELS; i++) {
+   struct ipc_pipe *pipe = &ipc_imem->channels[i].dl_pipe;
+
+   if (!pipe->is_open || pipe->nr_of_queued_entries > 0)
+   continue;
+
+   while (imem_dl_skb_alloc(ipc_imem, pipe))
+   new_buffers_available = true;
+
+   if (pipe->nr_of_queued_entries == 0)
+   retry_allocation = true;
+   }
+
+   if (new_buffers_available)
+   ipc_protocol_doorbell_trigger(ipc_imem->ipc_protocol,
+ IPC_HP_DL_PROCESS);
+
+   if (retry_allocation)
+   imem_hrtimer_start(ipc_imem, &ipc_imem->td_alloc_timer,
+  IPC_TD_ALLOC_TIMER_PERIOD_MS * 1000);
+   return 0;
+}
+
+static enum hrtimer_restart imem_td_alloc_timer_cb(struct hrtimer *hr_timer)
+{
+   struct iosm_imem *ipc_imem =
+   container_of(hr_timer, struct iosm_imem, td_alloc_timer);
+   /* Post an async tasklet event to trigger HP update Doorbell */
+   ipc_task_queue_send_task(ipc_imem, imem_tq_td_alloc_timer, 0, NULL, 0,
+false);
+   return HRTIMER_NORESTART;
+}
+
+/* Fast update timer tasklet handler to trigger HP update */
+static int imem_tq_fast_update_timer_cb(void *instance, int arg, void *msg,
+   size_t size)
+{
+   struct iosm_imem *ipc_imem = instance;
+
+   ipc_protocol_doorbell_trigger(ipc_imem->ipc_protocol,
+ IPC_HP_FAST_TD_UPD_TMR);
+
+   return 0;
+}
+
+static enum hrtimer_restart imem_fast_update_timer_cb(struct hrtimer *hr_timer)
+{
+   struct iosm_imem *ipc_imem =
+   container_of(hr_timer, struct iosm_imem, fast_update_timer);
+   /* Post an async tasklet event to trigger HP update Doorbell */
+   ipc_task_queue_send_task(ipc_imem, imem_tq_fast_update_timer_cb, 0,
+NULL, 0, false);
+   return HRTIMER_NORESTART;
+}
+
+static void
+imem_hrtimer_init(struct hrtimer *hr_timer,
+ enum hrtimer_restart (*callback)(struct hrtimer *hr_timer))
+{
+   hrtimer_init(hr_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hr_timer->function = callback;
+}
+
+static int imem_setup_cp_mux_cap_init(struct iosm_imem *ipc_imem,
+ struct ipc_mux_config *cfg)
+{
+   ipc_mmio_update_cp_capability(ipc_imem->mmio);
+
+   if (!ipc_imem->mmio->has_mux_lite) {
+   dev_err(i

[RFC 06/18] net: iosm: channel configuration

2020-11-23 Thread M Chetan Kumar
Defines pipe and channel configurations such as channel type,
pipe mappings, number of transfer descriptors, and transfer
buffer size.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c | 87 +++
 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.h | 57 
 2 files changed, 144 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c 
b/drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c
new file mode 100644
index ..d1d239218494
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include "iosm_ipc_chnl_cfg.h"
+
+/* Max. sizes of downlink buffers */
+#define IPC_MEM_MAX_DL_FLASH_BUF_SIZE (16 * 1024)
+#define IPC_MEM_MAX_DL_LOOPBACK_SIZE (1 * 1024 * 1024)
+#define IPC_MEM_MAX_DL_AT_BUF_SIZE 2048
+#define IPC_MEM_MAX_DL_RPC_BUF_SIZE (32 * 1024)
+#define IPC_MEM_MAX_DL_MBIM_BUF_SIZE IPC_MEM_MAX_DL_RPC_BUF_SIZE
+
+/* Max. transfer descriptors for a pipe. */
+#define IPC_MEM_MAX_TDS_FLASH_DL 3
+#define IPC_MEM_MAX_TDS_FLASH_UL 6
+#define IPC_MEM_MAX_TDS_AT 4
+#define IPC_MEM_MAX_TDS_RPC 4
+#define IPC_MEM_MAX_TDS_MBIM IPC_MEM_MAX_TDS_RPC
+#define IPC_MEM_MAX_TDS_LOOPBACK 11
+
+/* Accumulation backoff usec */
+#define IRQ_ACC_BACKOFF_OFF 0
+
+/* MUX acc backoff 1ms */
+#define IRQ_ACC_BACKOFF_MUX 1000
+
+/* Modem channel configuration table
+ * Always reserve element zero for flash channel.
+ */
+static struct ipc_chnl_cfg modem_cfg[] = {
+   /* FLASH Channel */
+   { IPC_MEM_FLASH_CH_ID, IPC_MEM_PIPE_0, IPC_MEM_PIPE_1,
+ IPC_MEM_MAX_TDS_FLASH_UL, IPC_MEM_MAX_TDS_FLASH_DL,
+ IPC_MEM_MAX_DL_FLASH_BUF_SIZE },
+   /* MBIM Channel */
+   { IPC_MEM_MBIM_CTRL_CH_ID, IPC_MEM_PIPE_12, IPC_MEM_PIPE_13,
+ IPC_MEM_MAX_TDS_MBIM, IPC_MEM_MAX_TDS_MBIM,
+ IPC_MEM_MAX_DL_MBIM_BUF_SIZE },
+   /* RPC - 0 */
+   { IPC_WWAN_DSS_ID_0, IPC_MEM_PIPE_2, IPC_MEM_PIPE_3,
+ IPC_MEM_MAX_TDS_RPC, IPC_MEM_MAX_TDS_RPC,
+ IPC_MEM_MAX_DL_RPC_BUF_SIZE },
+   /* IAT0 */
+   { IPC_WWAN_DSS_ID_1, IPC_MEM_PIPE_4, IPC_MEM_PIPE_5, IPC_MEM_MAX_TDS_AT,
+ IPC_MEM_MAX_TDS_AT, IPC_MEM_MAX_DL_AT_BUF_SIZE },
+   /* IAT1 */
+   { IPC_WWAN_DSS_ID_2, IPC_MEM_PIPE_8, IPC_MEM_PIPE_9, IPC_MEM_MAX_TDS_AT,
+ IPC_MEM_MAX_TDS_AT, IPC_MEM_MAX_DL_AT_BUF_SIZE },
+   /* Loopback */
+   { IPC_WWAN_DSS_ID_3, IPC_MEM_PIPE_10, IPC_MEM_PIPE_11,
+ IPC_MEM_MAX_TDS_LOOPBACK, IPC_MEM_MAX_TDS_LOOPBACK,
+ IPC_MEM_MAX_DL_LOOPBACK_SIZE },
+   /* Trace */
+   { IPC_WWAN_DSS_ID_4, IPC_MEM_PIPE_6, IPC_MEM_PIPE_7, IPC_MEM_TDS_TRC,
+ IPC_MEM_TDS_TRC, IPC_MEM_MAX_DL_TRC_BUF_SIZE },
+   /* IP Mux */
+   { IPC_MEM_MUX_IP_CH_VLAN_ID, IPC_MEM_PIPE_0, IPC_MEM_PIPE_1,
+ IPC_MEM_MAX_TDS_MUX_LITE_UL, IPC_MEM_MAX_TDS_MUX_LITE_DL,
+ IPC_MEM_MAX_DL_MUX_LITE_BUF_SIZE },
+};
+
+int ipc_chnl_cfg_get(struct ipc_chnl_cfg *chnl_cfg, int index,
+enum ipc_mux_protocol mux_protocol)
+{
+   int array_size = ARRAY_SIZE(modem_cfg);
+
+   if (index >= array_size) {
+   pr_err("index: %d and array_size %d", index, array_size);
+   return -1;
+   }
+
+   if (index == IPC_MEM_MUX_IP_CH_VLAN_ID)
+   chnl_cfg->accumulation_backoff = IRQ_ACC_BACKOFF_MUX;
+   else
+   chnl_cfg->accumulation_backoff = IRQ_ACC_BACKOFF_OFF;
+
+   chnl_cfg->ul_nr_of_entries = modem_cfg[index].ul_nr_of_entries;
+   chnl_cfg->dl_nr_of_entries = modem_cfg[index].dl_nr_of_entries;
+   chnl_cfg->dl_buf_size = modem_cfg[index].dl_buf_size;
+   chnl_cfg->id = modem_cfg[index].id;
+   chnl_cfg->ul_pipe = modem_cfg[index].ul_pipe;
+   chnl_cfg->dl_pipe = modem_cfg[index].dl_pipe;
+
+   return 0;
+}
diff --git a/drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.h 
b/drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.h
new file mode 100644
index ..42ba4e4849bb
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (C) 2020 Intel Corporation
+ */
+
+#ifndef IOSM_IPC_CHNL_CFG_H
+#define IOSM_IPC_CHNL_CFG_H
+
+#include "iosm_ipc_mux.h"
+
+/* Number of TDs on the trace channel */
+#define IPC_MEM_TDS_TRC 32
+
+/* Trace channel TD buffer size. */
+#define IPC_MEM_MAX_DL_TRC_BUF_SIZE 8192
+
+/* Type of the WWAN ID */
+enum ipc_wwan_id {
+   IPC_WWAN_DSS_ID_0 = 257,
+   IPC_WWAN_DSS_ID_1,
+   IPC_WWAN_DSS_ID_2,
+   IPC_WWAN_DSS_ID_3,
+   IPC_WWAN_DSS_ID_4,
+};
+
+/**
+ * struct ipc_chnl_cfg - IPC channel configuration structure
+ * @id:VLAN ID
+ * @ul_pipe:   Uplink datastream
+ * @dl_pipe:   Downlink datast

[RFC 07/18] net: iosm: char device for FW flash & coredump

2020-11-23 Thread M Chetan Kumar
Implements a char device for flashing the modem FW image while the
device is in boot ROM phase and for collecting traces on modem crash.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_sio.c | 188 +++
 drivers/net/wwan/iosm/iosm_ipc_sio.h |  72 ++
 2 files changed, 260 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_sio.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_sio.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_sio.c 
b/drivers/net/wwan/iosm/iosm_ipc_sio.c
new file mode 100644
index ..c35e7c6face1
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_sio.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+#include 
+
+#include "iosm_ipc_sio.h"
+
+/* Open a shared memory device and initialize the head of the rx skbuf list. */
+static int ipc_sio_fop_open(struct inode *inode, struct file *filp)
+{
+   struct iosm_sio *ipc_sio =
+   container_of(filp->private_data, struct iosm_sio, misc);
+
+   if (test_and_set_bit(0, &ipc_sio->sio_is_open))
+   return -EBUSY;
+
+   ipc_sio->channel_id = imem_sys_sio_open(ipc_sio->imem_instance);
+
+   if (ipc_sio->channel_id < 0)
+   return -EIO;
+
+   return 0;
+}
+
+static int ipc_sio_fop_release(struct inode *inode, struct file *filp)
+{
+   struct iosm_sio *ipc_sio =
+   container_of(filp->private_data, struct iosm_sio, misc);
+
+   if (ipc_sio->channel_id < 0)
+   return -EINVAL;
+
+   imem_sys_sio_close(ipc_sio);
+
+   clear_bit(0, &ipc_sio->sio_is_open);
+
+   return 0;
+}
+
+/* Copy the data from skbuff to the user buffer */
+static ssize_t ipc_sio_fop_read(struct file *filp, char __user *buf,
+   size_t size, loff_t *l)
+{
+   struct sk_buff *skb = NULL;
+   struct iosm_sio *ipc_sio;
+   bool is_blocking;
+
+   if (!buf)
+   return -EINVAL;
+
+   ipc_sio = container_of(filp->private_data, struct iosm_sio, misc);
+
+   is_blocking = !(filp->f_flags & O_NONBLOCK);
+
+   /* only log in blocking mode to reduce flooding the log */
+   if (is_blocking)
+   dev_dbg(ipc_sio->dev, "sio read chid[%d] size=%zu",
+   ipc_sio->channel_id, size);
+
+   /* First provide the pending skbuf to the user. */
+   if (ipc_sio->rx_pending_buf) {
+   skb = ipc_sio->rx_pending_buf;
+   ipc_sio->rx_pending_buf = NULL;
+   }
+
+   /* Check rx queue until skb is available */
+   while (!skb) {
+   skb = skb_dequeue(&ipc_sio->rx_list);
+   if (skb)
+   break;
+
+   if (!is_blocking)
+   return -EAGAIN;
+   /* Suspend the user app and wait a certain time for data
+* from CP.
+*/
+   if (WAIT_FOR_TIMEOUT(&ipc_sio->read_sem, IPC_READ_TIMEOUT) <
+   0) {
+   return -ETIMEDOUT;
+   }
+   }
+
+   return imem_sys_sio_read(ipc_sio, buf, size, skb);
+}
+
+/* Route the user data to the shared memory layer. */
+static ssize_t ipc_sio_fop_write(struct file *filp, const char __user *buf,
+size_t size, loff_t *l)
+{
+   struct iosm_sio *ipc_sio;
+   bool is_blocking;
+
+   if (!buf)
+   return -EINVAL;
+
+   ipc_sio = container_of(filp->private_data, struct iosm_sio, misc);
+
+   is_blocking = !(filp->f_flags & O_NONBLOCK);
+
+   if (ipc_sio->channel_id < 0)
+   return -EPERM;
+
+   return imem_sys_sio_write(ipc_sio, buf, size, is_blocking);
+}
+
+/* poll for applications using nonblocking I/O */
+static __poll_t ipc_sio_fop_poll(struct file *filp, poll_table *wait)
+{
+   struct iosm_sio *ipc_sio =
+   container_of(filp->private_data, struct iosm_sio, misc);
+   __poll_t mask = EPOLLOUT | EPOLLWRNORM; /* writable */
+
+   /* Just registers wait_queue hook. This doesn't really wait. */
+   poll_wait(filp, &ipc_sio->poll_inq, wait);
+
+   /* Test the fill level of the skbuf rx queue. */
+   if (!skb_queue_empty(&ipc_sio->rx_list) || ipc_sio->rx_pending_buf)
+   mask |= EPOLLIN | EPOLLRDNORM; /* readable */
+
+   return mask;
+}
+
+struct iosm_sio *ipc_sio_init(struct iosm_imem *ipc_imem, const char *name)
+{
+   static const struct file_operations fops = {
+   .owner = THIS_MODULE,
+   .open = ipc_sio_fop_open,
+   .release = ipc_sio_fop_release,
+   .read = ipc_sio_fop_read,
+   .write = ipc_sio_fop_write,
+   .poll = ipc_sio_fop_poll,
+   };
+
+   struct iosm_sio *ipc_sio = kzalloc(sizeof(*ipc_sio), GFP_KERNEL);
+
+   if (!ipc_sio)
+   return NULL;
+
+   ipc_sio->dev = ipc_imem->

[RFC 09/18] net: iosm: bottom half

2020-11-23 Thread M Chetan Kumar
1) Bottom half (tasklet) for IRQ and task processing.
2) Tasks are processed asynchronously and synchronously.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_task_queue.c | 258 
 drivers/net/wwan/iosm/iosm_ipc_task_queue.h |  46 +
 2 files changed, 304 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_task_queue.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_task_queue.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_task_queue.c 
b/drivers/net/wwan/iosm/iosm_ipc_task_queue.c
new file mode 100644
index ..34f6783f7533
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_task_queue.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+
+#include "iosm_ipc_task_queue.h"
+
+/* Number of available element for the input message queue of the IPC
+ * ipc_task.
+ */
+#define IPC_THREAD_QUEUE_SIZE 256
+
+/**
+ * struct ipc_task_queue_args - Struct for Task queue arguments
+ * @instance:  Instance pointer for function to be called in tasklet context
+ * @msg:   Message argument for tasklet function. (optional, can be NULL)
+ * @completion:OS object used to wait for the tasklet function to 
finish for
+ * synchronous calls
+ * @func:  Function to be called in tasklet (tl) context
+ * @arg:   Generic integer argument for tasklet function (optional)
+ * @size:  Message size argument for tasklet function (optional)
+ * @response:  Return code of tasklet function for synchronous calls
+ * @is_copy:   Is true if msg contains a pointer to a copy of the original msg
+ * for async. calls that needs to be freed once the tasklet returns
+ */
+struct ipc_task_queue_args {
+   void *instance;
+   void *msg;
+   struct completion *completion;
+   int (*func)(void *instance, int arg, void *msg, size_t size);
+   int arg;
+   size_t size;
+   int response;
+   u8 is_copy : 1;
+};
+
+/**
+ * struct ipc_task_queue - Struct for Task queue
+ * @dev:   pointer to device structure
+ * @q_lock:Protect the message queue of the ipc ipc_task
+ * @args:  Message queue of the IPC ipc_task
+ * @q_rpos:First queue element to process.
+ * @q_wpos:First free element of the input queue.
+ */
+struct ipc_task_queue {
+   struct device *dev;
+   spinlock_t q_lock; /* for atomic operation on queue */
+   struct ipc_task_queue_args args[IPC_THREAD_QUEUE_SIZE];
+   unsigned int q_rpos;
+   unsigned int q_wpos;
+};
+
+/* Actual tasklet function, will be called whenever tasklet is scheduled.
+ * Calls event handler callback for each element in the message queue
+ */
+static void ipc_task_queue_handler(unsigned long data)
+{
+   struct ipc_task_queue *ipc_task = (struct ipc_task_queue *)data;
+   unsigned int q_rpos = ipc_task->q_rpos;
+
+   /* Loop over the input queue contents. */
+   while (q_rpos != ipc_task->q_wpos) {
+   /* Get the current first queue element. */
+   struct ipc_task_queue_args *args = &ipc_task->args[q_rpos];
+
+   /* Process the input message. */
+   if (args->func)
+   args->response = args->func(args->instance, args->arg,
+   args->msg, args->size);
+
+   /* Signal completion for synchronous calls */
+   if (args->completion)
+   complete(args->completion);
+
+   /* Free message if copy was allocated. */
+   if (args->is_copy)
+   kfree(args->msg);
+
+   /* Set invalid queue element. Technically
+* spin_lock_irqsave is not required here as
+* the array element has been processed already
+* so we can assume that immediately after processing
+* ipc_task element, queue will not rotate again to
+* ipc_task same element within such short time.
+*/
+   args->completion = NULL;
+   args->func = NULL;
+   args->msg = NULL;
+   args->size = 0;
+   args->is_copy = false;
+
+   /* calculate the new read ptr and update the volatile read
+* ptr
+*/
+   q_rpos = (q_rpos + 1) % IPC_THREAD_QUEUE_SIZE;
+   ipc_task->q_rpos = q_rpos;
+   }
+}
+
+/* Free memory allocations and trigger completions left in the queue during dealloc */
+static void ipc_task_queue_cleanup(struct ipc_task_queue *ipc_task)
+{
+   unsigned int q_rpos = ipc_task->q_rpos;
+
+   while (q_rpos != ipc_task->q_wpos) {
+   struct ipc_task_queue_args *args = &ipc_task->args[q_rpos];
+
+   if (args->completion) {
+   complete(args->completion);
+ 

[RFC 08/18] net: iosm: MBIM control device

2020-11-23 Thread M Chetan Kumar
Implements a char device for MBIM protocol communication and
provides a simple IOCTL for max transfer buffer size
configuration.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_mbim.c | 205 ++
 drivers/net/wwan/iosm/iosm_ipc_mbim.h |  24 
 2 files changed, 229 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mbim.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mbim.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_mbim.c 
b/drivers/net/wwan/iosm/iosm_ipc_mbim.c
new file mode 100644
index ..b263c77d6eb2
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_mbim.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+#include 
+
+#include "iosm_ipc_imem_ops.h"
+#include "iosm_ipc_mbim.h"
+#include "iosm_ipc_sio.h"
+
+#define IOCTL_WDM_MAX_COMMAND _IOR('H', 0xA0, __u16)
+#define WDM_MAX_SIZE 4096
+#define IPC_READ_TIMEOUT 500
+
+/* MBIM IOCTL for max buffer size. */
+static long ipc_mbim_fop_unlocked_ioctl(struct file *filp, unsigned int cmd,
+   unsigned long arg)
+{
+   struct iosm_sio *ipc_mbim =
+   container_of(filp->private_data, struct iosm_sio, misc);
+
+   if (cmd != IOCTL_WDM_MAX_COMMAND ||
+   !access_ok((void __user *)arg, sizeof(ipc_mbim->wmaxcommand)))
+   return -EINVAL;
+
+   if (copy_to_user((void __user *)arg, &ipc_mbim->wmaxcommand,
+sizeof(ipc_mbim->wmaxcommand)))
+   return -EFAULT;
+
+   return 0;
+}
+
+/* Open a shared memory device and initialize the head of the rx skbuf list. */
+static int ipc_mbim_fop_open(struct inode *inode, struct file *filp)
+{
+   struct iosm_sio *ipc_mbim =
+   container_of(filp->private_data, struct iosm_sio, misc);
+
+   if (test_and_set_bit(0, &ipc_mbim->mbim_is_open))
+   return -EBUSY;
+
+   ipc_mbim->channel_id = imem_sys_mbim_open(ipc_mbim->imem_instance);
+
+   if (ipc_mbim->channel_id < 0)
+   return -EIO;
+
+   return 0;
+}
+
+/* Close a shared memory control device and free the rx skbuf list. */
+static int ipc_mbim_fop_release(struct inode *inode, struct file *filp)
+{
+   struct iosm_sio *ipc_mbim =
+   container_of(filp->private_data, struct iosm_sio, misc);
+
+   if (ipc_mbim->channel_id < 0)
+   return -EINVAL;
+
+   imem_sys_sio_close(ipc_mbim);
+
+   clear_bit(0, &ipc_mbim->mbim_is_open);
+   return 0;
+}
+
+/* Copy the data from skbuff to the user buffer */
+static ssize_t ipc_mbim_fop_read(struct file *filp, char __user *buf,
+size_t size, loff_t *l)
+{
+   struct sk_buff *skb = NULL;
+   struct iosm_sio *ipc_mbim;
+   bool is_blocking;
+
+   if (!access_ok(buf, size))
+   return -EINVAL;
+
+   ipc_mbim = container_of(filp->private_data, struct iosm_sio, misc);
+
+   is_blocking = !(filp->f_flags & O_NONBLOCK);
+
+   /* First provide the pending skbuf to the user. */
+   if (ipc_mbim->rx_pending_buf) {
+   skb = ipc_mbim->rx_pending_buf;
+   ipc_mbim->rx_pending_buf = NULL;
+   }
+
+   /* Check rx queue until skb is available */
+   while (!skb) {
+   skb = skb_dequeue(&ipc_mbim->rx_list);
+   if (skb)
+   break;
+
+   if (!is_blocking)
+   return -EAGAIN;
+
+   /* Suspend the user app and wait a certain time for data
+* from CP.
+*/
+   if (WAIT_FOR_TIMEOUT(&ipc_mbim->read_sem, IPC_READ_TIMEOUT) < 0)
+   return -ETIMEDOUT;
+   }
+
+   return imem_sys_sio_read(ipc_mbim, buf, size, skb);
+}
+
+/* Route the user data to the shared memory layer. */
+static ssize_t ipc_mbim_fop_write(struct file *filp, const char __user *buf,
+ size_t size, loff_t *l)
+{
+   struct iosm_sio *ipc_mbim;
+   bool is_blocking;
+
+   if (!access_ok(buf, size))
+   return -EINVAL;
+
+   ipc_mbim = container_of(filp->private_data, struct iosm_sio, misc);
+
+   is_blocking = !(filp->f_flags & O_NONBLOCK);
+
+   if (ipc_mbim->channel_id < 0)
+   return -EPERM;
+
+   return imem_sys_sio_write(ipc_mbim, buf, size, is_blocking);
+}
+
+/* Poll mechanism for applications that use nonblocking IO */
+static __poll_t ipc_mbim_fop_poll(struct file *filp, poll_table *wait)
+{
+   struct iosm_sio *ipc_mbim =
+   container_of(filp->private_data, struct iosm_sio, misc);
+   __poll_t mask = EPOLLOUT | EPOLLWRNORM; /* writable */
+
+   /* Just registers wait_queue hook. This doesn't really wait. */
+   poll_wait(filp, &ipc_mbim->poll_inq, wait);
+
+   /* Test the fill level of the skbuf rx queue. */
+   if (!skb_queue_empty(&ipc_mbim->

[RFC 13/18] net: iosm: shared memory protocol

2020-11-23 Thread M Chetan Kumar
1) Defines the messaging protocol for handling transfer descriptors
   in both UL/DL directions.
2) Ring buffer management.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_protocol.c | 287 ++
 drivers/net/wwan/iosm/iosm_ipc_protocol.h | 219 +++
 2 files changed, 506 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_protocol.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_protocol.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_protocol.c 
b/drivers/net/wwan/iosm/iosm_ipc_protocol.c
new file mode 100644
index ..82d75d3d191c
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_protocol.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include "iosm_ipc_protocol.h"
+#include "iosm_ipc_task_queue.h"
+
+int ipc_protocol_tq_msg_send(struct iosm_protocol *ipc_protocol,
+enum ipc_msg_prep_type msg_type,
+union ipc_msg_prep_args *prep_args,
+struct ipc_rsp *response)
+{
+   int index = ipc_protocol_msg_prep(ipc_protocol, msg_type, prep_args);
+
+   /* Store reference towards caller specified response in response ring
+* and signal CP
+*/
+   if (index >= 0 && index < IPC_MEM_MSG_ENTRIES) {
+   ipc_protocol->rsp_ring[index] = response;
+   ipc_protocol_msg_hp_update(ipc_protocol);
+   }
+
+   return index;
+}
+
+/* Tasklet message send call back function */
+static int ipc_protocol_tq_msg_send_cb(void *instance, int arg, void *msg,
+  size_t size)
+{
+   struct ipc_call_msg_send_args *send_args = msg;
+   struct iosm_protocol *ipc_protocol =
+   ((struct iosm_imem *)instance)->ipc_protocol;
+
+   return ipc_protocol_tq_msg_send(ipc_protocol, send_args->msg_type,
+   send_args->prep_args,
+   send_args->response);
+}
+
+/* Remove reference to a response. This is typically used when a requestor 
timed
+ * out and is no longer interested in the response.
+ */
+static int ipc_protocol_tq_msg_remove(void *instance, int arg, void *msg,
+ size_t size)
+{
+   struct iosm_protocol *ipc_protocol =
+   ((struct iosm_imem *)instance)->ipc_protocol;
+
+   ipc_protocol->rsp_ring[arg] = NULL;
+   return 0;
+}
+
+int ipc_protocol_msg_send(struct iosm_protocol *ipc_protocol,
+ enum ipc_msg_prep_type prep,
+ union ipc_msg_prep_args *prep_args)
+{
+   struct ipc_call_msg_send_args send_args;
+   unsigned int exec_timeout;
+   struct ipc_rsp response;
+   int result = -1;
+   int index;
+
+   exec_timeout = (ipc_protocol_get_ap_exec_stage(ipc_protocol) ==
+   IPC_MEM_EXEC_STAGE_RUN ?
+   IPC_MSG_COMPLETE_RUN_DEFAULT_TIMEOUT :
+   IPC_MSG_COMPLETE_BOOT_DEFAULT_TIMEOUT);
+
+   /* Trap if called from non-preemptible context */
+   might_sleep();
+
+   response.status = IPC_MEM_MSG_CS_INVALID;
+   init_completion(&response.completion);
+
+   send_args.msg_type = prep;
+   send_args.prep_args = prep_args;
+   send_args.response = &response;
+
+   /* Allocate and prepare message to be sent in tasklet context.
+* A positive index returned form tasklet_call references the message
+* in case it needs to be cancelled when there is a timeout.
+*/
+   index = ipc_task_queue_send_task(ipc_protocol->imem,
+ipc_protocol_tq_msg_send_cb, 0,
+&send_args, 0, true);
+
+   if (index < 0) {
+   dev_err(ipc_protocol->dev, "msg %d failed", prep);
+   return index;
+   }
+
+   /* Wait for the device to respond to the message */
+   switch (wait_for_completion_timeout(&response.completion,
+   msecs_to_jiffies(exec_timeout))) {
+   case 0:
+   /* Timeout, there was no response from the device.
+* Remove the reference to the local response completion
+* object as we are no longer interested in the response.
+*/
+   ipc_task_queue_send_task(ipc_protocol->imem,
+ipc_protocol_tq_msg_remove, index,
+NULL, 0, true);
+   dev_err(ipc_protocol->dev, "msg timeout");
+   ipc_uevent_send(ipc_protocol->pcie->dev, UEVENT_MDM_TIMEOUT);
+   break;
+   default:
+   /* We got a response in time; check completion status: */
+   if (response.status == IPC_MEM_MSG_CS_SUCCESS)
+   resul

[RFC 11/18] net: iosm: encode or decode datagram

2020-11-23 Thread M Chetan Kumar
1) Encodes UL packets into datagrams.
2) Decodes DL datagrams and routes them to the network layer.
3) Supports credit-based flow control.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_mux_codec.c | 902 +
 drivers/net/wwan/iosm/iosm_ipc_mux_codec.h | 194 +++
 2 files changed, 1096 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mux_codec.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c 
b/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
new file mode 100644
index ..54437651704e
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
@@ -0,0 +1,902 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+
+#include "iosm_ipc_imem_ops.h"
+#include "iosm_ipc_mux_codec.h"
+#include "iosm_ipc_task_queue.h"
+
+/* Test the link power state and send a MUX command in blocking mode. */
+static int mux_tq_cmd_send(void *instance, int arg, void *msg, size_t size)
+{
+   struct iosm_mux *ipc_mux = ((struct iosm_imem *)instance)->mux;
+   const struct mux_acb *acb = msg;
+
+   skb_queue_tail(&ipc_mux->channel->ul_list, acb->skb);
+   imem_ul_send(ipc_mux->imem);
+
+   return 0;
+}
+
+static int mux_acb_send(struct iosm_mux *ipc_mux, bool blocking)
+{
+   struct completion *completion = &ipc_mux->channel->ul_sem;
+
+   if (ipc_task_queue_send_task(ipc_mux->imem, mux_tq_cmd_send, 0,
+&ipc_mux->acb, sizeof(ipc_mux->acb),
+false)) {
+   dev_err(ipc_mux->dev, "unable to send mux command");
+   return -1;
+   }
+
+   /* if blocking, suspend the app and wait for irq in the flash or
+* crash phase. Return -ETIMEDOUT on timeout to indicate failure.
+*/
+   if (blocking) {
+   u32 wait_time_milliseconds = IPC_MUX_CMD_RUN_DEFAULT_TIMEOUT;
+
+   reinit_completion(completion);
+
+   if (WAIT_FOR_TIMEOUT(completion, wait_time_milliseconds) == 0) {
+   dev_err(ipc_mux->dev, "ch[%d] timeout",
+   ipc_mux->channel_id);
+   ipc_uevent_send(ipc_mux->imem->dev, UEVENT_MDM_TIMEOUT);
+   return -ETIMEDOUT;
+   }
+   }
+
+   return 0;
+}
+
+/* Prepare mux Command */
+static struct mux_lite_cmdh *mux_lite_add_cmd(struct iosm_mux *ipc_mux, u32 
cmd,
+ struct mux_acb *acb, void *param,
+ u32 param_size)
+{
+   struct mux_lite_cmdh *cmdh = (struct mux_lite_cmdh *)acb->skb->data;
+
+   cmdh->signature = MUX_SIG_CMDH;
+   cmdh->command_type = cmd;
+   cmdh->if_id = acb->if_id;
+
+   acb->cmd = cmd;
+
+   cmdh->cmd_len = offsetof(struct mux_lite_cmdh, param) + param_size;
+   cmdh->transaction_id = ipc_mux->tx_transaction_id++;
+
+   if (param)
+   memcpy(&cmdh->param, param, param_size);
+
+   skb_put(acb->skb, cmdh->cmd_len);
+
+   return cmdh;
+}
+
+static int mux_acb_alloc(struct iosm_mux *ipc_mux)
+{
+   struct mux_acb *acb = &ipc_mux->acb;
+   struct sk_buff *skb;
+   dma_addr_t mapping;
+
+   /* Allocate skb memory for the uplink buffer. */
+   skb = ipc_pcie_alloc_skb(ipc_mux->pcie, MUX_MAX_UL_ACB_BUF_SIZE,
+GFP_ATOMIC, &mapping, DMA_TO_DEVICE, 0);
+   if (!skb)
+   return -ENOMEM;
+
+   /* Save the skb address. */
+   acb->skb = skb;
+
+   memset(skb->data, 0, MUX_MAX_UL_ACB_BUF_SIZE);
+
+   return 0;
+}
+
+int mux_dl_acb_send_cmds(struct iosm_mux *ipc_mux, u32 cmd_type, u8 if_id,
+u32 transaction_id, union mux_cmd_param *param,
+size_t res_size, bool blocking, bool respond)
+{
+   struct mux_acb *acb = &ipc_mux->acb;
+   struct mux_lite_cmdh *ack_lite;
+   int ret = 0;
+
+   acb->if_id = if_id;
+   ret = mux_acb_alloc(ipc_mux);
+   if (ret)
+   return ret;
+
+   ack_lite = mux_lite_add_cmd(ipc_mux, cmd_type, acb, param, res_size);
+   if (respond)
+   ack_lite->transaction_id = (u32)transaction_id;
+
+   ret = mux_acb_send(ipc_mux, blocking);
+
+   return ret;
+}
+
+void mux_netif_tx_flowctrl(struct mux_session *session, int idx, bool on)
+{
+   /* Inform the network interface to start/stop flow ctrl */
+   if (ipc_wwan_is_tx_stopped(session->wwan, idx) != on)
+   ipc_wwan_tx_flowctrl(session->wwan, idx, on);
+}
+
+static int mux_dl_cmdresps_decode_process(struct iosm_mux *ipc_mux,
+ struct mux_lite_cmdh *cmdh)
+{
+   struct mux_acb *acb = &ipc_mux->acb;
+
+   switch (cmdh->command_type) {
+   case MUX_CMD_OPEN_SESSION_RESP:
+   case MUX_CMD_CLOSE_SESSION_

[RFC 12/18] net: iosm: power management

2020-11-23 Thread M Chetan Kumar
Implement a state machine to handle host and device sleep.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_pm.c | 334 
 drivers/net/wwan/iosm/iosm_ipc_pm.h | 216 +++
 2 files changed, 550 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_pm.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_pm.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_pm.c 
b/drivers/net/wwan/iosm/iosm_ipc_pm.c
new file mode 100644
index ..662f8f309ec0
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_pm.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include "iosm_ipc_protocol.h"
+#include "iosm_ipc_task_queue.h"
+
+/* Timeout value in MS for the PM to wait for device to reach active state */
+#define IPC_PM_ACTIVE_TIMEOUT_MS (500)
+
+/* Value definitions for union ipc_pm_cond members.
+ *
+ * Note that here "active" has the value 1, as compared to the enums
+ * ipc_mem_host_pm_state or ipc_mem_dev_pm_state, where "active" is 0
+ */
+#define IPC_PM_SLEEP (0)
+#define IPC_PM_ACTIVE (1)
+
+/* Trigger the doorbell interrupt on cp to change the PM sleep/active status */
+#define ipc_cp_irq_sleep_control(ipc_pcie, data)   
\
+   ipc_doorbell_fire(ipc_pcie, IPC_DOORBELL_IRQ_SLEEP, data)
+
+/* Trigger the doorbell interrupt on CP to do hpda update */
+#define ipc_cp_irq_hpda_update(ipc_pcie, data) 
\
+   ipc_doorbell_fire(ipc_pcie, IPC_DOORBELL_IRQ_HPDA, 0xFF & (data))
+
+void ipc_pm_signal_hpda_doorbell(struct iosm_pm *ipc_pm, u32 identifier,
+bool host_slp_check)
+{
+   if (host_slp_check && ipc_pm->host_pm_state != IPC_MEM_HOST_PM_ACTIVE &&
+   ipc_pm->host_pm_state != IPC_MEM_HOST_PM_ACTIVE_WAIT) {
+   ipc_pm->pending_hpda_update = true;
+   dev_dbg(ipc_pm->dev,
+   "Pending HPDA update set. Host PM_State: %d 
identifier:%d",
+   ipc_pm->host_pm_state, identifier);
+   return;
+   }
+
+   if (!ipc_pm_trigger(ipc_pm, IPC_PM_UNIT_IRQ, true)) {
+   ipc_pm->pending_hpda_update = true;
+   dev_dbg(ipc_pm->dev, "Pending HPDA update set. identifier:%d",
+   identifier);
+   return;
+   }
+   ipc_pm->pending_hpda_update = false;
+
+   /* Trigger the irq towards CP */
+   ipc_cp_irq_hpda_update(ipc_pm->pcie, identifier);
+
+   ipc_pm_trigger(ipc_pm, IPC_PM_UNIT_IRQ, false);
+}
+
+/* Wake up the device if it is in low power mode. */
+static bool ipc_pm_link_activate(struct iosm_pm *ipc_pm)
+{
+   if (ipc_pm->cp_state == IPC_MEM_DEV_PM_ACTIVE)
+   return true;
+
+   if (ipc_pm->cp_state == IPC_MEM_DEV_PM_SLEEP) {
+   if (ipc_pm->ap_state == IPC_MEM_DEV_PM_SLEEP) {
+   /* Wake up the device. */
+   ipc_cp_irq_sleep_control(ipc_pm->pcie,
+IPC_MEM_DEV_PM_WAKEUP);
+   ipc_pm->ap_state = IPC_MEM_DEV_PM_ACTIVE_WAIT;
+
+   return false;
+   }
+
+   if (ipc_pm->ap_state == IPC_MEM_DEV_PM_ACTIVE_WAIT)
+   return false;
+
+   return true;
+   }
+
+   /* link is not ready */
+   return false;
+}
+
+void ipc_pm_host_slp_reinit_dev_active_completion(struct iosm_pm *ipc_pm)
+{
+   if (!ipc_pm)
+   return;
+
+   atomic_set(&ipc_pm->host_sleep_pend, 1);
+
+   reinit_completion(&ipc_pm->host_sleep_complete);
+}
+
+bool ipc_pm_wait_for_device_active(struct iosm_pm *ipc_pm)
+{
+   bool ret_val = false;
+
+   if (ipc_pm->ap_state != IPC_MEM_DEV_PM_ACTIVE)
+
+   /* Wait for IPC_PM_ACTIVE_TIMEOUT_MS for Device sleep state
+* machine to enter ACTIVE state.
+*/
+   if (!WAIT_FOR_TIMEOUT(&ipc_pm->host_sleep_complete,
+ IPC_PM_ACTIVE_TIMEOUT_MS)) {
+   dev_err(ipc_pm->dev,
+   "PM timeout. Expected State:%d. Actual: %d",
+   IPC_MEM_DEV_PM_ACTIVE, ipc_pm->ap_state);
+   goto  active_timeout;
+   }
+
+   ret_val = true;
+active_timeout:
+   /* Reset the atomic variable in any case as device sleep
+* state machine change is no longer of interest.
+*/
+   atomic_set(&ipc_pm->host_sleep_pend, 0);
+
+   return ret_val;
+}
+
+static void ipc_pm_on_link_sleep(struct iosm_pm *ipc_pm)
+{
+   /* pending sleep ack and all conditions are cleared
+* -> signal SLEEP__ACK to CP
+*/
+   ipc_pm->cp_state = IPC_MEM_DEV_PM_SLEEP;
+   ipc_pm->ap_state = IPC_MEM_DEV_PM_SLEEP;
+
+   ipc_cp_irq_sleep_control(ipc_pm->pcie, IPC_MEM_DEV_PM_SLEEP);
+

[RFC 05/18] net: iosm: shared memory I/O operations

2020-11-23 Thread M Chetan Kumar
1) Bind a logical channel between host and device for communication.
2) Implement device-specific (char/net) I/O operations.
3) Inject the primary bootloader FW image into the modem.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_imem_ops.c | 779 ++
 drivers/net/wwan/iosm/iosm_ipc_imem_ops.h | 102 
 2 files changed, 881 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_imem_ops.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c 
b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
new file mode 100644
index ..2e2f3f43e21c
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
@@ -0,0 +1,779 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+
+#include "iosm_ipc_chnl_cfg.h"
+#include "iosm_ipc_imem.h"
+#include "iosm_ipc_imem_ops.h"
+#include "iosm_ipc_sio.h"
+#include "iosm_ipc_task_queue.h"
+
+/* Open a packet data online channel between the network layer and CP. */
+int imem_sys_wwan_open(void *instance, int vlan_id)
+{
+   struct iosm_imem *ipc_imem = instance;
+
+   dev_dbg(ipc_imem->dev, "%s[vlan id:%d]",
+   ipc_ap_phase_get_string(ipc_imem->phase), vlan_id);
+
+   /* The network interface is only supported in the runtime phase. */
+   if (imem_ap_phase_update(ipc_imem) != IPC_P_RUN) {
+   dev_err(ipc_imem->dev, "[net:%d]: refused phase %s", vlan_id,
+   ipc_ap_phase_get_string(ipc_imem->phase));
+   return -1;
+   }
+
+   /* Check the VLAN tag:
+* if the tag is 1 to 8, create an IP MUX channel session;
+* if the tag is 257 to 511, create a DSS channel.
+* MUX sessions start at 0 while VLAN tags start at 1,
+* so map it to if_id = vlan_id - 1.
+*/
+   if (vlan_id > 0 && vlan_id <= ipc_mux_get_max_sessions(ipc_imem->mux)) {
+   return ipc_mux_open_session(ipc_imem->mux, vlan_id - 1);
+   } else if (vlan_id > 256 && vlan_id < 512) {
+   int ch_id =
+   imem_channel_alloc(ipc_imem, vlan_id, IPC_CTYPE_WWAN);
+
+   if (imem_channel_open(ipc_imem, ch_id, IPC_HP_NET_CHANNEL_INIT))
+   return ch_id;
+   }
+
+   return -1;
+}
+
+/* Release a net link to CP. */
+void imem_sys_wwan_close(void *instance, int vlan_id, int channel_id)
+{
+   struct iosm_imem *ipc_imem = instance;
+
+   if (ipc_imem->mux && vlan_id > 0 &&
+   vlan_id <= ipc_mux_get_max_sessions(ipc_imem->mux))
+   ipc_mux_close_session(ipc_imem->mux, vlan_id - 1);
+
+   else if ((vlan_id > 256 && vlan_id < 512))
+   imem_channel_close(ipc_imem, channel_id);
+}
+
+/* Tasklet call to do uplink transfer. */
+static int imem_tq_sio_write(void *instance, int arg, void *msg, size_t size)
+{
+   struct iosm_imem *ipc_imem = instance;
+
+   ipc_imem->ev_sio_write_pending = false;
+   imem_ul_send(ipc_imem);
+
+   return 0;
+}
+
+/* Through tasklet to do sio write. */
+static bool imem_call_sio_write(struct iosm_imem *ipc_imem)
+{
+   if (ipc_imem->ev_sio_write_pending)
+   return false;
+
+   ipc_imem->ev_sio_write_pending = true;
+
+   return (!ipc_task_queue_send_task(ipc_imem, imem_tq_sio_write, 0, NULL,
+ 0, false));
+}
+
+/* Add to the ul list skb */
+static int imem_wwan_transmit(struct iosm_imem *ipc_imem, int vlan_id,
+ int channel_id, struct sk_buff *skb)
+{
+   struct ipc_mem_channel *channel;
+
+   channel = &ipc_imem->channels[channel_id];
+
+   if (channel->state != IMEM_CHANNEL_ACTIVE) {
+   dev_err(ipc_imem->dev, "invalid state of channel %d",
+   channel_id);
+   return -1;
+   }
+
+   if (ipc_pcie_addr_map(ipc_imem->pcie, skb->data, skb->len,
+ &IPC_CB(skb)->mapping, DMA_TO_DEVICE)) {
+   dev_err(ipc_imem->dev, "failed to map skb");
+   IPC_CB(skb)->direction = DMA_TO_DEVICE;
+   IPC_CB(skb)->len = skb->len;
+   IPC_CB(skb)->op_type = UL_DEFAULT;
+   return -1;
+   }
+
+   /* Add skb to the uplink skbuf accumulator */
+   skb_queue_tail(&channel->ul_list, skb);
+   imem_call_sio_write(ipc_imem);
+
+   return 0;
+}
+
+/* Function to transfer UL data.
+ * The WWAN layer must free the packet if imem fails to transmit it.
+ * In case of success, the imem layer will free it.
+ */
+int imem_sys_wwan_transmit(void *instance, int vlan_id, int channel_id,
+  struct sk_buff *skb)
+{
+   struct iosm_imem *ipc_imem = instance;
+   int ret = -1;
+
+   if (!ipc_imem || channel_id < 0)
+   return -EINVAL;
+
+   /* Is CP Running? */
+   if (ipc_imem->phase != IPC_P_RUN) {
+   dev_

[RFC 15/18] net: iosm: uevent support

2020-11-23 Thread M Chetan Kumar
Report modem status via uevent.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_uevent.c | 47 +
 drivers/net/wwan/iosm/iosm_ipc_uevent.h | 41 
 2 files changed, 88 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_uevent.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_uevent.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_uevent.c 
b/drivers/net/wwan/iosm/iosm_ipc_uevent.c
new file mode 100644
index ..27542ca27613
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_uevent.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+
+#include "iosm_ipc_sio.h"
+#include "iosm_ipc_uevent.h"
+
+/* Update the uevent in work queue context */
+static void ipc_uevent_work(struct work_struct *data)
+{
+   struct ipc_uevent_info *info;
+   char *envp[2] = { NULL, NULL };
+
+   info = container_of(data, struct ipc_uevent_info, work);
+
+   envp[0] = info->uevent;
+
+   if (kobject_uevent_env(&info->dev->kobj, KOBJ_CHANGE, envp))
+   pr_err("uevent %s failed to send", info->uevent);
+
+   kfree(info);
+}
+
+void ipc_uevent_send(struct device *dev, char *uevent)
+{
+   struct ipc_uevent_info *info;
+
+   if (!uevent || !dev)
+   return;
+
+   info = kzalloc(sizeof(*info), GFP_ATOMIC);
+   if (!info)
+   return;
+
+   /* Initialize the kernel work queue */
+   INIT_WORK(&info->work, ipc_uevent_work);
+
+   /* Store the device and event information */
+   info->dev = dev;
+   snprintf(info->uevent, MAX_UEVENT_LEN, "%s: %s", dev_name(dev), uevent);
+
+   /* Schedule uevent in process context using work queue */
+   schedule_work(&info->work);
+}
diff --git a/drivers/net/wwan/iosm/iosm_ipc_uevent.h 
b/drivers/net/wwan/iosm/iosm_ipc_uevent.h
new file mode 100644
index ..422f64411c6e
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_uevent.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#ifndef IOSM_IPC_UEVENT_H
+#define IOSM_IPC_UEVENT_H
+
+/* Baseband event strings */
+#define UEVENT_MDM_NOT_READY "MDM_NOT_READY"
+#define UEVENT_ROM_READY "ROM_READY"
+#define UEVENT_MDM_READY "MDM_READY"
+#define UEVENT_CRASH "CRASH"
+#define UEVENT_CD_READY "CD_READY"
+#define UEVENT_CD_READY_LINK_DOWN "CD_READY_LINK_DOWN"
+#define UEVENT_MDM_TIMEOUT "MDM_TIMEOUT"
+
+/* Maximum length of user events */
+#define MAX_UEVENT_LEN 64
+
+/**
+ * struct ipc_uevent_info - Uevent information structure.
+ * @dev:   Pointer to device structure
+ * @uevent:Uevent information
+ * @work:  Uevent work struct
+ */
+struct ipc_uevent_info {
+   struct device *dev;
+   char uevent[MAX_UEVENT_LEN];
+   struct work_struct work;
+};
+
+/**
+ * ipc_uevent_send - Send modem event to user space.
+ * @dev:   Generic device pointer
+ * @uevent:Uevent information
+ *
+ */
+void ipc_uevent_send(struct device *dev, char *uevent);
+
+#endif
-- 
2.12.3



[RFC 14/18] net: iosm: protocol operations

2020-11-23 Thread M Chetan Kumar
1) Update UL/DL transfer descriptors in message ring.
2) Define message set for pipe/sleep protocol.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c | 563 ++
 drivers/net/wwan/iosm/iosm_ipc_protocol_ops.h | 358 
 2 files changed, 921 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_protocol_ops.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c 
b/drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c
new file mode 100644
index ..beca5e06203a
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c
@@ -0,0 +1,563 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include "iosm_ipc_protocol.h"
+#include "iosm_ipc_protocol_ops.h"
+
+/* Get the next free message element. */
+static union ipc_mem_msg_entry *
+ipc_protocol_free_msg_get(struct iosm_protocol *ipc_protocol, int *index)
+{
+   u32 head = ipc_protocol->p_ap_shm->msg_head;
+   u32 new_head = (head + 1) % IPC_MEM_MSG_ENTRIES;
+   union ipc_mem_msg_entry *msg;
+
+   if (new_head == ipc_protocol->p_ap_shm->msg_tail) {
+   dev_err(ipc_protocol->dev, "message ring is full");
+   return NULL;
+   }
+
+   /* Get the pointer to the next free message element,
+* reset the fields and mark it as invalid.
+*/
+   msg = &ipc_protocol->p_ap_shm->msg_ring[head];
+   memset(msg, 0, sizeof(*msg));
+
+   /* return index in message ring */
+   *index = head;
+
+   return msg;
+}
+
+/* Updates the message ring Head pointer */
+void ipc_protocol_msg_hp_update(void *instance)
+{
+   struct iosm_protocol *ipc_protocol = instance;
+   u32 head = ipc_protocol->p_ap_shm->msg_head;
+   u32 new_head = (head + 1) % IPC_MEM_MSG_ENTRIES;
+
+   /* Update head pointer and fire doorbell. */
+   ipc_protocol->p_ap_shm->msg_head = new_head;
+   ipc_protocol->old_msg_tail = ipc_protocol->p_ap_shm->msg_tail;
+
+   /* Host Sleep negotiation happens through Message Ring. So Host Sleep
+* check should be avoided by sending false as last argument.
+*/
+   ipc_pm_signal_hpda_doorbell(ipc_protocol->pm, IPC_HP_MR, false);
+}
+
+/* Allocate and prepare an OPEN_PIPE message.
+ * This also allocates the memory for the new TDR structure and
+ * updates the pipe structure referenced in the preparation arguments.
+ */
+static int ipc_protocol_msg_prepipe_open(struct iosm_protocol *ipc_protocol,
+union ipc_msg_prep_args *args)
+{
+   int index = -1;
+   union ipc_mem_msg_entry *msg =
+   ipc_protocol_free_msg_get(ipc_protocol, &index);
+   struct ipc_pipe *pipe = args->pipe_open.pipe;
+   struct ipc_protocol_td *tdr;
+   struct sk_buff **skbr;
+
+   if (!msg) {
+   dev_err(ipc_protocol->dev, "failed to get free message");
+   return -1;
+   }
+
+   /* Allocate the skbuf elements for the skbuf which are on the way.
+* SKB ring is internal memory allocation for driver. No need to
+* re-calculate the start and end addresses.
+*/
+   skbr = kcalloc(pipe->nr_of_entries, sizeof(*skbr), GFP_ATOMIC);
+   if (!skbr)
+   return -ENOMEM;
+
+   /* Allocate the transfer descriptors for the pipe. */
+   tdr = pci_alloc_consistent(ipc_protocol->pcie->pci,
+  pipe->nr_of_entries * sizeof(*tdr),
+  &pipe->phy_tdr_start);
+   if (!tdr) {
+   kfree(skbr);
+   dev_err(ipc_protocol->dev, "tdr alloc error");
+   return -ENOMEM;
+   }
+
+   pipe->max_nr_of_queued_entries = pipe->nr_of_entries - 1;
+   pipe->nr_of_queued_entries = 0;
+   pipe->tdr_start = tdr;
+   pipe->skbr_start = skbr;
+   pipe->old_tail = 0;
+
+   ipc_protocol->p_ap_shm->head_array[pipe->pipe_nr] = 0;
+
+   msg->open_pipe.type_of_message = IPC_MEM_MSG_OPEN_PIPE;
+   msg->open_pipe.pipe_nr = pipe->pipe_nr;
+   msg->open_pipe.tdr_addr = pipe->phy_tdr_start;
+   msg->open_pipe.tdr_entries = pipe->nr_of_entries;
+   msg->open_pipe.interrupt_moderation = pipe->irq_moderation;
+   msg->open_pipe.accumulation_backoff = pipe->accumulation_backoff;
+   msg->open_pipe.reliable = true;
+   msg->open_pipe.optimized_completion = true;
+   msg->open_pipe.irq_vector = pipe->irq;
+
+   return index;
+}
+
+static int ipc_protocol_msg_prepipe_close(struct iosm_protocol *ipc_protocol,
+ union ipc_msg_prep_args *args)
+{
+   int index = -1;
+   union ipc_mem_msg_entry *msg =
+   ipc_protocol_free_msg_get(ipc_protocol, &index);
+   struct ipc_pipe *pipe = args->pipe_close.pipe;
+
+   if (!msg)
+   return -1;
+
+   

[RFC 17/18] net: iosm: readme file

2020-11-23 Thread M Chetan Kumar
Document IOSM driver interface usage.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/README | 126 +++
 1 file changed, 126 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/README

diff --git a/drivers/net/wwan/iosm/README b/drivers/net/wwan/iosm/README
new file mode 100644
index ..4a489177ad96
--- /dev/null
+++ b/drivers/net/wwan/iosm/README
@@ -0,0 +1,126 @@
+IOSM Driver for PCIe based Intel M.2 Modems
+
+The IOSM (IPC over Shared Memory) driver is a PCIe host driver implemented
+for Linux and Chrome platforms for data exchange over the PCIe interface
+between the host platform and an Intel M.2 modem. The driver exposes an
+interface conforming to the MBIM protocol [1]. A front-end application
+(e.g. ModemManager) can manage the MBIM interface to enable data communication towards WWAN.
+
+Basic usage
+===
+MBIM functions are inactive when unmanaged. The IOSM driver only
+provides a userspace interface of a character device representing
+MBIM control channel and does not play any role in managing the
+functionality. It is the job of a userspace application to enumerate
+the port appropriately and enable MBIM functionality.
+
+Examples of such userspace applications are:
+ - mbimcli (included with the libmbim [2] library), and
+ - ModemManager [3]
+
+For establishing an MBIM IP session at least these actions are required by the
+management application:
+ - open the control channel
+ - configure network connection settings
+ - connect to network
+ - configure IP interface
+
+Management application development
+--
+The driver and userspace interfaces are described below. The MBIM
+control channel protocol is described in [1].
+
+MBIM control channel userspace ABI
+==
+
+/dev/wwanctrl character device
+--
+The driver exposes an interface to the MBIM function control channel using a
+char driver as a subdriver. The userspace end of the control channel pipe is a
+/dev/wwanctrl character device.
+
+The /dev/wwanctrl device is created as a subordinate character device under
+the IOSM driver. The character device associated with a specific MBIM function
+can be looked up using sysfs by matching the above device name.
+
+Control channel configuration
+-
+The wMaxControlMessage field of the MBIM functional descriptor
+limits the maximum control message size. The management application needs to
+negotiate the control message size as per the requirements.
+See also the ioctl documentation below.
+
+Fragmentation
+-
+The userspace application is responsible for all control message
+fragmentation and defragmentation as per MBIM.
+
+/dev/wwanctrl write()
+-
+The MBIM control messages from the management application must not
+exceed the negotiated control message size.
+
+/dev/wwanctrl read()
+
+The management application must accept control messages of up to the
+negotiated control message size.
+
+/dev/wwanctrl ioctl()
+
+IOCTL_WDM_MAX_COMMAND: Get Maximum Command Size
+This IOCTL command can be used by applications to fetch the maximum command
+buffer length supported by the driver, which is restricted to 4096 bytes.
+
+   #include 
+   #include 
+   #include 
+   #include 
+   int main()
+   {
+   __u16 max;
+   int fd = open("/dev/wwanctrl", O_RDWR);
+   if (!ioctl(fd, IOCTL_WDM_MAX_COMMAND, &max))
+   printf("wMaxControlMessage is %d\n", max);
+   }
+
+MBIM data channel userspace ABI
+===
+
+wwanY network device
+
+The IOSM driver represents the MBIM data channel as a single
+network device of the "wwan0" type. This network device is initially
+mapped to MBIM IP session 0.
+
+Multiplexed IP sessions (IPS)
+-
+The IOSM driver allows multiplexing of several IP sessions over the single
+network device of type wwan0. The driver models such IP sessions as 802.1q VLAN
+subdevices of the master wwanY device, mapping MBIM IP session M to VLAN ID M
+for all values of M greater than 0.
+
+The userspace management application is responsible for adding new VLAN links
+prior to establishing MBIM IP sessions where the SessionId is greater than 0.
+These links can be added by using the normal VLAN kernel interfaces.
+
+For example, adding a link for a MBIM IP session with SessionId 5:
+
+  ip link add link wwan0 name wwan0.5 type vlan id 5
+
+The driver will automatically map the "wwan0.5" network device to MBIM
+IP session 5.
+
+References
+==
+
+[1] "MBIM (Mobile Broadband Interface Model) Registry"
+   - http://compliance.usb.org/mbim/
+
+[2] libmbim - "a glib-based library for talking to WWAN modems and
+  devices which speak the Mobile Interface Broadban

[RFC 10/18] net: iosm: multiplex IP sessions

2020-11-23 Thread M Chetan Kumar
Establish IP sessions between host and device and handle session management.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_mux.c | 455 +++
 drivers/net/wwan/iosm/iosm_ipc_mux.h | 344 ++
 2 files changed, 799 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mux.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_mux.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_mux.c 
b/drivers/net/wwan/iosm/iosm_ipc_mux.c
new file mode 100644
index ..3b46ef98460d
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_mux.c
@@ -0,0 +1,455 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include "iosm_ipc_mux_codec.h"
+
+/* At the beginning of the runtime phase the IP MUX channel shall be created. */
+static int mux_channel_create(struct iosm_mux *ipc_mux)
+{
+   int channel_id;
+
+   channel_id = imem_channel_alloc(ipc_mux->imem, ipc_mux->instance_id,
+   IPC_CTYPE_WWAN);
+
+   if (channel_id < 0) {
+   dev_err(ipc_mux->dev,
+   "allocation of the MUX channel id failed");
+   ipc_mux->state = MUX_S_ERROR;
+   ipc_mux->event = MUX_E_NOT_APPLICABLE;
+   return channel_id; /* MUX channel is not available. */
+   }
+
+   /* Establish the MUX channel in blocking mode. */
+   ipc_mux->channel = imem_channel_open(ipc_mux->imem, channel_id,
+IPC_HP_NET_CHANNEL_INIT);
+
+   if (!ipc_mux->channel) {
+   dev_err(ipc_mux->dev, "imem_channel_open failed");
+   ipc_mux->state = MUX_S_ERROR;
+   ipc_mux->event = MUX_E_NOT_APPLICABLE;
+   return -1; /* MUX channel is not available. */
+   }
+
+   /* Define the MUX active state properties. */
+   ipc_mux->state = MUX_S_ACTIVE;
+   ipc_mux->event = MUX_E_NO_ORDERS;
+   return channel_id;
+}
+
+/* Reset the session/if id state. */
+static void mux_session_free(struct iosm_mux *ipc_mux, int if_id)
+{
+   struct mux_session *if_entry;
+
+   if_entry = &ipc_mux->session[if_id];
+   /* Reset the session state. */
+   if_entry->wwan = NULL;
+}
+
+/* Create and send the session open command. */
+static struct mux_cmd_open_session_resp *
+mux_session_open_send(struct iosm_mux *ipc_mux, int if_id)
+{
+   struct mux_cmd_open_session_resp *open_session_resp;
+   struct mux_acb *acb = &ipc_mux->acb;
+   union mux_cmd_param param;
+
+   /* open_session commands to one ACB and start transmission. */
+   param.open_session.flow_ctrl = 0;
+   param.open_session.reserved = 0;
+   param.open_session.ipv4v6_hints = 0;
+   param.open_session.reserved2 = 0;
+   param.open_session.dl_head_pad_len = IPC_MEM_DL_ETH_OFFSET;
+
+   /* Finish and transfer ACB. The user thread is suspended.
+* It is a blocking function call, until CP responds or timeout.
+*/
+   acb->wanted_response = MUX_CMD_OPEN_SESSION_RESP;
+   if (mux_dl_acb_send_cmds(ipc_mux, MUX_CMD_OPEN_SESSION, if_id, 0,
+¶m, sizeof(param.open_session), true,
+false) ||
+   acb->got_response != MUX_CMD_OPEN_SESSION_RESP) {
+   dev_err(ipc_mux->dev, "if_id %d: OPEN_SESSION send failed",
+   if_id);
+   return NULL;
+   }
+
+   open_session_resp = &ipc_mux->acb.got_param.open_session_resp;
+   if (open_session_resp->response != MUX_CMD_RESP_SUCCESS) {
+   dev_err(ipc_mux->dev,
+   "if_id %d,session open failed,response=%d", if_id,
+   (int)open_session_resp->response);
+   return NULL;
+   }
+
+   return open_session_resp;
+}
+
+/* Open the first IP session. */
+static bool mux_session_open(struct iosm_mux *ipc_mux,
+struct mux_session_open *session_open)
+{
+   struct mux_cmd_open_session_resp *open_session_resp;
+   int if_id;
+
+   /* Search for a free session interface id. */
+   if_id = session_open->if_id;
+   if (if_id < 0 || if_id >= ipc_mux->nr_sessions) {
+   dev_err(ipc_mux->dev, "invalid interface id=%d", if_id);
+   return false;
+   }
+
+   /* Create and send the session open command.
+* It is a blocking function call, until CP responds or timeout.
+*/
+   open_session_resp = mux_session_open_send(ipc_mux, if_id);
+   if (!open_session_resp) {
+   mux_session_free(ipc_mux, if_id);
+   session_open->if_id = -1;
+   return false;
+   }
+
+   /* Initialize the uplink skb accumulator. */
+   skb_queue_head_init(&ipc_mux->session[if_id].ul_list);
+
+   ipc_mux->session[if_id].dl_head_pad_len = IPC_MEM_DL_ETH_OFFSET;
+   ipc_mux->session[if_id].ul_head_

[RFC 18/18] net: iosm: infrastructure

2020-11-23 Thread M Chetan Kumar
1) Kconfig & Makefile changes for IOSM Driver compilation.
2) Modified driver/net Kconfig & Makefile for driver inclusion.
3) Modified MAINTAINER file for IOSM Driver addition.

Signed-off-by: M Chetan Kumar 
---
 MAINTAINERS|  7 +++
 drivers/net/Kconfig|  1 +
 drivers/net/Makefile   |  1 +
 drivers/net/wwan/Kconfig   | 13 +
 drivers/net/wwan/Makefile  |  5 +
 drivers/net/wwan/iosm/Kconfig  | 10 ++
 drivers/net/wwan/iosm/Makefile | 27 +++
 7 files changed, 64 insertions(+)
 create mode 100644 drivers/net/wwan/Kconfig
 create mode 100644 drivers/net/wwan/Makefile
 create mode 100644 drivers/net/wwan/iosm/Kconfig
 create mode 100644 drivers/net/wwan/iosm/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index a008b70f3c16..cb1fc8fabffd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9099,6 +9099,13 @@ M:   Mario Limonciello 
 S: Maintained
 F: drivers/platform/x86/intel-wmi-thunderbolt.c
 
+INTEL WWAN IOSM DRIVER
+M:  M Chetan Kumar 
+M:  Intel Corporation 
+L:  netdev@vger.kernel.org
+S:  Maintained
+F:  drivers/net/wwan/iosm/
+
 INTEL(R) TRACE HUB
 M: Alexander Shishkin 
 S: Supported
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c3dbe64e628e..e0f869a2c52f 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -593,4 +593,5 @@ config NET_FAILOVER
  a VM with direct attached VF by failing over to the paravirtual
  datapath when the VF is unplugged.
 
+source "drivers/net/wwan/Kconfig"
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 72e18d505d1a..025fb399d2af 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -84,3 +84,4 @@ thunderbolt-net-y += thunderbolt.o
 obj-$(CONFIG_USB4_NET) += thunderbolt-net.o
 obj-$(CONFIG_NETDEVSIM) += netdevsim/
 obj-$(CONFIG_NET_FAILOVER) += net_failover.o
+obj-$(CONFIG_WWAN)+= wwan/
diff --git a/drivers/net/wwan/Kconfig b/drivers/net/wwan/Kconfig
new file mode 100644
index ..715dfd0598f9
--- /dev/null
+++ b/drivers/net/wwan/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Wireless WAN device configuration
+#
+
+menuconfig WWAN
+   bool "Wireless WAN"
+   help
+ This section contains all Wireless WAN driver configurations.
+
+if WWAN
+source "drivers/net/wwan/iosm/Kconfig"
+endif # WWAN
diff --git a/drivers/net/wwan/Makefile b/drivers/net/wwan/Makefile
new file mode 100644
index ..a81ff28e6cd9
--- /dev/null
+++ b/drivers/net/wwan/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the Linux WWAN Device Drivers.
+#
+obj-$(CONFIG_IOSM)+= iosm/
diff --git a/drivers/net/wwan/iosm/Kconfig b/drivers/net/wwan/iosm/Kconfig
new file mode 100644
index ..fed382fc9cd7
--- /dev/null
+++ b/drivers/net/wwan/iosm/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: (GPL-2.0-only)
+#
+# IOSM Driver configuration
+#
+
+config IOSM
+   tristate "IOSM Driver"
+   depends on INTEL_IOMMU
+   help
+ This driver enables Intel M.2 WWAN Device communication.
diff --git a/drivers/net/wwan/iosm/Makefile b/drivers/net/wwan/iosm/Makefile
new file mode 100644
index ..153ae0360244
--- /dev/null
+++ b/drivers/net/wwan/iosm/Makefile
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: (GPL-2.0-only)
+#
+# Copyright (C) 2020 Intel Corporation.
+#
+
+iosm-y = \
+   iosm_ipc_task_queue.o   \
+   iosm_ipc_imem.o \
+   iosm_ipc_imem_ops.o \
+   iosm_ipc_mmio.o \
+   iosm_ipc_sio.o  \
+   iosm_ipc_mbim.o \
+   iosm_ipc_wwan.o \
+   iosm_ipc_uevent.o   \
+   iosm_ipc_pm.o   \
+   iosm_ipc_pcie.o \
+   iosm_ipc_irq.o  \
+   iosm_ipc_chnl_cfg.o \
+   iosm_ipc_protocol.o \
+   iosm_ipc_protocol_ops.o \
+   iosm_ipc_mux.o  \
+   iosm_ipc_mux_codec.o
+
+obj-$(CONFIG_IOSM) := iosm.o
+
+# compilation flags
+#ccflags-y += -DDEBUG
-- 
2.12.3



[RFC 16/18] net: iosm: net driver

2020-11-23 Thread M Chetan Kumar
1) Create net device for data/IP communication.
2) Bind VLAN ID to mux IP session.
3) Implement net device operations.

Signed-off-by: M Chetan Kumar 
---
 drivers/net/wwan/iosm/iosm_ipc_wwan.c | 674 ++
 drivers/net/wwan/iosm/iosm_ipc_wwan.h |  72 
 2 files changed, 746 insertions(+)
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_wwan.c
 create mode 100644 drivers/net/wwan/iosm/iosm_ipc_wwan.h

diff --git a/drivers/net/wwan/iosm/iosm_ipc_wwan.c 
b/drivers/net/wwan/iosm/iosm_ipc_wwan.c
new file mode 100644
index ..f14a971455bb
--- /dev/null
+++ b/drivers/net/wwan/iosm/iosm_ipc_wwan.c
@@ -0,0 +1,674 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ */
+
+#include 
+
+#include "iosm_ipc_chnl_cfg.h"
+#include "iosm_ipc_imem_ops.h"
+
+/* Minimum number of transmit queues per WWAN root device */
+#define WWAN_MIN_TXQ (1)
+/* Minimum number of receive queues per WWAN root device */
+#define WWAN_MAX_RXQ (1)
+/* Default transmit queue for WWAN root device */
+#define WWAN_DEFAULT_TXQ (0)
+/* VLAN tag for WWAN root device */
+#define WWAN_ROOT_VLAN_TAG (0)
+
+#define IPC_MEM_MIN_MTU_SIZE (68)
+#define IPC_MEM_MAX_MTU_SIZE (1024 * 1024)
+
+#define IPC_MEM_VLAN_TO_SESSION (1)
+
+/* Required alignment for TX in bytes (32 bit/4 bytes)*/
+#define IPC_WWAN_ALIGN (4)
+
+/**
+ * struct ipc_vlan_info - This structure includes information about VLAN device.
+ * @vlan_id:   VLAN tag of the VLAN device.
+ * @ch_id: IPC channel number for which VLAN device is created.
+ * @stats: Contains statistics of VLAN devices.
+ */
+struct ipc_vlan_info {
+   int vlan_id;
+   int ch_id;
+   struct net_device_stats stats;
+};
+
+/**
+ * struct iosm_wwan - This structure contains information about WWAN root device
+ *  and interface to the IPC layer.
+ * @vlan_devs: Contains information about VLAN devices created under
+ * WWAN root device.
+ * @netdev:Pointer to network interface device structure.
+ * @ops_instance:  Instance pointer for Callbacks
+ * @dev:   Pointer device structure
+ * @lock:  Spinlock to be used for atomic operations of the
+ * root device.
+ * @stats: Contains statistics of WWAN root device
+ * @vlan_devs_nr:  Number of VLAN devices.
+ * @if_mutex:  Mutex used for add and remove vlan-id
+ * @max_devs:  Maximum supported VLAN devs
+ * @max_ip_devs:   Maximum supported IP VLAN devs
+ * @is_registered: Registration status with netdev
+ */
+struct iosm_wwan {
+   struct ipc_vlan_info *vlan_devs;
+   struct net_device *netdev;
+   void *ops_instance;
+   struct device *dev;
+   spinlock_t lock; /* Used for atomic operations on root device */
+   struct net_device_stats stats;
+   int vlan_devs_nr;
+   struct mutex if_mutex; /* Mutex used for add and remove vlan-id */
+   int max_devs;
+   int max_ip_devs;
+   u8 is_registered : 1;
+};
+
+/* Get the array index of requested tag. */
+static int ipc_wwan_get_vlan_devs_nr(struct iosm_wwan *ipc_wwan, u16 tag)
+{
+   int i = 0;
+
+   if (!ipc_wwan->vlan_devs)
+   return -EINVAL;
+
+   for (i = 0; i < ipc_wwan->vlan_devs_nr; i++)
+   if (ipc_wwan->vlan_devs[i].vlan_id == tag)
+   return i;
+
+   return -EINVAL;
+}
+
+static int ipc_wwan_add_vlan(struct iosm_wwan *ipc_wwan, u16 vid)
+{
+   if (vid >= 512 || !ipc_wwan->vlan_devs)
+   return -EINVAL;
+
+   if (vid == WWAN_ROOT_VLAN_TAG)
+   return 0;
+
+   mutex_lock(&ipc_wwan->if_mutex);
+
+   /* get channel id */
+   ipc_wwan->vlan_devs[ipc_wwan->vlan_devs_nr].ch_id =
+   imem_sys_wwan_open(ipc_wwan->ops_instance, vid);
+
+   if (ipc_wwan->vlan_devs[ipc_wwan->vlan_devs_nr].ch_id < 0) {
+   dev_err(ipc_wwan->dev,
+   "cannot connect wwan0 & id %d to the IPC mem layer",
+   vid);
+   mutex_unlock(&ipc_wwan->if_mutex);
+   return -ENODEV;
+   }
+
+   /* save vlan id */
+   ipc_wwan->vlan_devs[ipc_wwan->vlan_devs_nr].vlan_id = vid;
+
+   dev_dbg(ipc_wwan->dev, "Channel id %d allocated to vlan id %d",
+   ipc_wwan->vlan_devs[ipc_wwan->vlan_devs_nr].ch_id,
+   ipc_wwan->vlan_devs[ipc_wwan->vlan_devs_nr].vlan_id);
+
+   ipc_wwan->vlan_devs_nr++;
+
+   mutex_unlock(&ipc_wwan->if_mutex);
+
+   return 0;
+}
+
+static int ipc_wwan_remove_vlan(struct iosm_wwan *ipc_wwan, u16 vid)
+{
+   int ch_nr = ipc_wwan_get_vlan_devs_nr(ipc_wwan, vid);
+   int i = 0;
+
+   if (ch_nr < 0) {
+   dev_err(ipc_wwan->dev, "vlan dev not found for vid = %d", vid);
+   return ch_nr;
+   }
+
+   if (ipc_wwan->vlan_devs[ch_nr].ch_id < 0) {
+   dev_err(ipc_wwan->dev, "in

Re: [PATCH bpf] net, xsk: Avoid taking multiple skbuff references

2020-11-23 Thread Daniel Borkmann

On 11/23/20 2:12 PM, Björn Töpel wrote:

From: Björn Töpel 

Commit 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
addressed the problem that packets were discarded from the Tx AF_XDP
ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was
bumping the skbuff reference count, so that the buffer would not be
freed by dev_direct_xmit(). A reference count larger than one means
that the skbuff is "shared", which is not the case.

If the "shared" skbuff is sent to the generic XDP receive path,
netif_receive_generic_xdp(), and pskb_expand_head() is entered the
BUG_ON(skb_shared(skb)) will trigger.

This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(),
where a user can select the skbuff free policy. This allows AF_XDP to
avoid bumping the reference count, but still keep the NETDEV_TX_BUSY
behavior.

Reported-by: Yonghong Song 
Fixes: 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
Signed-off-by: Björn Töpel 
---
  include/linux/netdevice.h | 1 +
  net/core/dev.c| 9 +++--
  net/xdp/xsk.c | 8 +---
  3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964b494b0e8d..e7402fca7752 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2815,6 +2815,7 @@ u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb,
   struct net_device *sb_dev);
  int dev_queue_xmit(struct sk_buff *skb);
  int dev_queue_xmit_accel(struct sk_buff *skb, struct net_device *sb_dev);
+int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool free_on_busy);
  int dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
  int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
diff --git a/net/core/dev.c b/net/core/dev.c
index 82dc6b48e45f..2af79a4253bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4180,7 +4180,7 @@ int dev_queue_xmit_accel(struct sk_buff *skb, struct net_device *sb_dev)
  }
  EXPORT_SYMBOL(dev_queue_xmit_accel);
  
-int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool free_on_busy)
  {
struct net_device *dev = skb->dev;
struct sk_buff *orig_skb = skb;
@@ -4211,7 +4211,7 @@ int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
  
  	local_bh_enable();
  
-	if (!dev_xmit_complete(ret))
+   if (free_on_busy && !dev_xmit_complete(ret))
kfree_skb(skb);
  
  	return ret;


Hm, but this way free_on_busy, even though constant, cannot be optimized away?
Can't you just move the dev_xmit_complete() check out into dev_direct_xmit()
instead? That way you can just drop the bool, and the below dev_direct_xmit()
should probably just become an __always_inline function in netdevice.h so you
avoid the double call.


@@ -4220,6 +4220,11 @@ int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
kfree_skb_list(skb);
return NET_XMIT_DROP;
  }
+
+int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+{
+   return __dev_direct_xmit(skb, queue_id, true);
+}
  EXPORT_SYMBOL(dev_direct_xmit);
  
  /*

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5a6cdf7b320d..c6ad31b374b7 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -411,11 +411,7 @@ static int xsk_generic_xmit(struct sock *sk)
skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
skb->destructor = xsk_destruct_skb;
  
-		/* Hinder dev_direct_xmit from freeing the packet and
-* therefore completing it in the destructor
-*/
-   refcount_inc(&skb->users);
-   err = dev_direct_xmit(skb, xs->queue_id);
+   err = __dev_direct_xmit(skb, xs->queue_id, false);
if  (err == NETDEV_TX_BUSY) {
/* Tell user-space to retry the send */
skb->destructor = sock_wfree;
@@ -429,12 +425,10 @@ static int xsk_generic_xmit(struct sock *sk)
/* Ignore NET_XMIT_CN as packet might have been sent */
if (err == NET_XMIT_DROP) {
/* SKB completed but not sent */
-   kfree_skb(skb);
err = -EBUSY;
goto out;
}
  
-		consume_skb(skb);
sent_frame = true;
}
  


base-commit: 178648916e73e00de83150eb0c90c0d3a977a46a





Re: [PATCH 0/3] xsk: fix for xsk_poll writeable

2020-11-23 Thread Magnus Karlsson
On Wed, Nov 18, 2020 at 9:25 AM Xuan Zhuo  wrote:
>
> I tried to combine cq available and tx writeable, but I found it very 
> difficult.
> Sometimes we pay attention to the status of "available" for both, but 
> sometimes,
> we may only pay attention to one, such as tx writeable, because we can use the
> item of fq to write to tx. And this kind of demand may be constantly changing,
> and it may be necessary to set it every time before entering xsk_poll, so
> setsockopt is not very convenient. I feel even more that using a new event may
> be a better solution, such as EPOLLPRI, I think it can be used here, after 
> all,
> xsk should not have OOB data ^_^.
>
> However, two other problems were discovered during the test:
>
> * The mask returned by datagram_poll always contains EPOLLOUT
> * It is not particularly reasonable to return EPOLLOUT based on tx not full
>
> After fixing these two problems, I found that when the process is awakened by
> EPOLLOUT, the process can always get the item from cq.
>
> Because the number of packets that the network card can send at a time is
> actually limited, suppose this value is "nic_num". Once the number of
> consumed items in the tx queue is greater than nic_num, this means that there
> must also be new recycled items in the cq queue from nic.
>
> In this way, as long as the tx configured by the user is larger, we won't have
> the situation that tx is already in the writeable state but cannot get the 
> item
> from cq.

I think the overall approach of tying this into poll() instead of
setsockopt() is the right way to go. But we need a more robust
solution. Your patch #3 also breaks backwards compatibility and that
is not allowed. Could you please post some simple code example of what
it is you would like to do in user space? So you would like to wake up
when there are entries in the cq that can be retrieved and the reason
you would like to do this is that you then know you can put some more
entries into the Tx ring and they will get sent as there now are free
slots in the cq. Correct me if wrong. Would an event that wakes you up
when there is both space in the Tx ring and space in the cq work? Is
there a case in which we would like to be woken up when only the Tx
ring is non-full? Maybe there are as it might be beneficial to fill
the Tx and while doing that some entries in the cq has been completed
and away the packets go. But it would be great if you could post some
simple example code, does not need to compile or anything. Can be
pseudo code.

It would also be good to know if your goal is max throughput, max
burst size, or something else.

Thanks: Magnus


> Xuan Zhuo (3):
>   xsk: replace datagram_poll by sock_poll_wait
>   xsk: change the tx writeable condition
>   xsk: set tx/rx the min entries
>
>  include/uapi/linux/if_xdp.h |  2 ++
>  net/xdp/xsk.c   | 26 ++
>  net/xdp/xsk_queue.h |  6 ++
>  3 files changed, 30 insertions(+), 4 deletions(-)
>
> --
> 1.8.3.1
>


Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-23 Thread Miguel Ojeda
On Sun, Nov 22, 2020 at 11:54 PM Finn Thain  wrote:
>
> We should also take into account optimism about future improvements in
> tooling.

Not sure what you mean here. There is no reliable way to guess what
the intention was with a missing fallthrough, even if you parsed
whitespace and indentation.

> It is if you want to spin it that way.

How is that a "spin"? It is a fact that we won't get *implicit*
fallthrough mistakes anymore (in particular if we make it a hard
error).

> But what we inevitably get is changes like this:
>
>  case 3:
> this();
> +   break;
>  case 4:
> hmmm();
>
> Why? Mainly to silence the compiler. Also because the patch author argued
> successfully that they had found a theoretical bug, often in mature code.

If someone changes control flow, that is on them. Every kernel
developer knows what `break` does.

> But is anyone keeping score of the regressions? If unreported bugs count,
> what about unreported regressions?

Introducing `fallthrough` does not change semantics. If you are really
keen, you can always compare the objects because the generated code
shouldn't change.

Cheers,
Miguel


Re: [PATCH bpf] net, xsk: Avoid taking multiple skbuff references

2020-11-23 Thread Björn Töpel

On 2020-11-23 14:53, Daniel Borkmann wrote:
[...]


Hm, but this way free_on_busy, even though constant, cannot be optimized away?
Can't you just move the dev_xmit_complete() check out into dev_direct_xmit()
instead? That way you can just drop the bool, and the below dev_direct_xmit()
should probably just become an __always_inline function in netdevice.h so you
avoid the double call.



Good suggestion! I'll spin a v2.


Björn


Re: [PATCH 1/3] xsk: replace datagram_poll by sock_poll_wait

2020-11-23 Thread Magnus Karlsson
On Wed, Nov 18, 2020 at 9:26 AM Xuan Zhuo  wrote:
>
> datagram_poll will judge the current socket status (EPOLLIN, EPOLLOUT)
> based on the traditional socket information (eg: sk_wmem_alloc), but
> this does not apply to xsk. So this patch uses sock_poll_wait instead of
> datagram_poll, and the mask is calculated by xsk_poll.
>
> Signed-off-by: Xuan Zhuo 
> ---
>  net/xdp/xsk.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index cfbec39..7f0353e 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -477,11 +477,13 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>  static __poll_t xsk_poll(struct file *file, struct socket *sock,
>  struct poll_table_struct *wait)
>  {
> -   __poll_t mask = datagram_poll(file, sock, wait);
> +   __poll_t mask = 0;

It would indeed be nice to not execute a number of tests in
datagram_poll that will never be triggered. It will speed up things
for sure. But we need to make sure that removing those flags that
datagram_poll sets do not have any bad effects in the code above this.
But let us tentatively keep this patch for the next version of the
patch set. Just need to figure out how to solve your problem in a nice
way first. See discussion in patch 0/3.

> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
> struct xsk_buff_pool *pool;
>
> +   sock_poll_wait(file, sock, wait);
> +
> if (unlikely(!xsk_is_bound(xs)))
> return mask;
>
> --
> 1.8.3.1
>


[PATCH net] netdevice.h: Fix unintentional disable of ALL_FOR_ALL features on upper device

2020-11-23 Thread Tariq Toukan
Calling netdev_increment_features() on upper/master device from
netdev_add_tso_features() implies unintentional clearance of ALL_FOR_ALL
features supported by all slaves.  Fix it by passing ALL_FOR_ALL in
addition to ALL_TSO.

Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master")
Signed-off-by: Tariq Toukan 
---
 include/linux/netdevice.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Hi,

I know that netdev_increment_features() does not set any feature that's
unmasked in the mask argument.
I wonder why it can clear them though, was it meant to be like this?
If not, then the proper fix should be in netdev_increment_features(), not
in netdev_add_tso_features().


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 18dec08439f9..a9d5e4bb829b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4748,7 +4748,7 @@ netdev_features_t netdev_increment_features(netdev_features_t all,
 static inline netdev_features_t netdev_add_tso_features(netdev_features_t features,
netdev_features_t mask)
 {
-   return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
+   return netdev_increment_features(features, NETIF_F_ALL_TSO | NETIF_F_ALL_FOR_ALL, mask);
 }
 
 int __netdev_update_features(struct net_device *dev);
-- 
2.21.0


