Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Leon Romanovsky
On Wed, Oct 07, 2020 at 11:32:11PM -0700, Dan Williams wrote:
> On Wed, Oct 7, 2020 at 10:21 PM Leon Romanovsky  wrote:
> >
> > On Wed, Oct 07, 2020 at 08:46:45PM +, Ertman, David M wrote:
> > > > -Original Message-
> > > > From: Parav Pandit 
> > > > Sent: Wednesday, October 7, 2020 1:17 PM
> > > > To: Leon Romanovsky ; Ertman, David M
> > > > 
> > > > Cc: Pierre-Louis Bossart ; alsa-
> > > > de...@alsa-project.org; pa...@mellanox.com; ti...@suse.de;
> > > > netdev@vger.kernel.org; ranjani.sridha...@linux.intel.com;
> > > > fred...@linux.intel.com; linux-r...@vger.kernel.org;
> > > > dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe
> > > > ; gre...@linuxfoundation.org; k...@kernel.org; 
> > > > Williams,
> > > > Dan J ; Saleem, Shiraz
> > > > ; da...@davemloft.net; Patil, Kiran
> > > > 
> > > > Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> > > >
> > > >
> > > > > From: Leon Romanovsky 
> > > > > Sent: Thursday, October 8, 2020 12:56 AM
> > > > >
> > > > > > > This API is partially obscures low level driver-core code and 
> > > > > > > needs
> > > > > > > to provide clear and proper abstractions without need to remember
> > > > > > > about put_device. There is already _add() interface why don't you 
> > > > > > > do
> > > > > > > put_device() in it?
> > > > > > >
> > > > > >
> > > > > > The pushback Pierre is referring to was during our mid-tier internal
> > > > > > review.  It was primarily a concern of Parav as I recall, so he can 
> > > > > > speak to
> > > > his
> > > > > reasoning.
> > > > > >
> > > > > > What we originally had was a single API call
> > > > > > (ancillary_device_register) that started with a call to
> > > > > > device_initialize(), and every error path out of the function 
> > > > > > performed a
> > > > > put_device().
> > > > > >
> > > > > > Is this the model you have in mind?
> > > > >
> > > > > I don't like this flow:
> > > > > ancillary_device_initialize()
> > > > > if (ancillary_ancillary_device_add()) {
> > > > >   put_device()
> > > > >   ancillary_device_unregister()
> > > > Calling device_unregister() is incorrect, because add() wasn't 
> > > > successful.
> > > > Only put_device() or a wrapper ancillary_device_put() is necessary.
> > > >
> > > > >   return err;
> > > > > }
> > > > >
> > > > > And prefer this flow:
> > > > > ancillary_device_initialize()
> > > > > if (ancillary_device_add()) {
> > > > >   ancillary_device_unregister()
> > > > This is incorrect and a clear deviation from the current core APIs that 
> > > > adds the
> > > > confusion.
> > > >
> > > > >   return err;
> > > > > }
> > > > >
> > > > > In this way, the ancillary users won't need to do non-intuitive 
> > > > > put_device();
> > > >
> > > > Below is most simple, intuitive and matching with core APIs for name and
> > > > design pattern wise.
> > > > init()
> > > > {
> > > > err = ancillary_device_initialize();
> > > > if (err)
> > > > return ret;
> > > >
> > > > err = ancillary_device_add();
> > > > if (ret)
> > > > goto err_unwind;
> > > >
> > > > err = some_foo();
> > > > if (err)
> > > > goto err_foo;
> > > > return 0;
> > > >
> > > > err_foo:
> > > > ancillary_device_del(adev);
> > > > err_unwind:
> > > > ancillary_device_put(adev->dev);
> > > > return err;
> > > > }
> > > >
> > > > cleanup()
> > > > {
> > > > ancillary_device_de(adev);
> > > > ancillary_device_put(adev);
> > > > /* It is common to have a one wrapper for this as
> > > > ancillary_device_unregister().
> > > >  * This will match with core device_unregister() that has precise
> > > > documentation.
> > > >  * but given fact that init() code need proper error unwinding, like
> > > > above,
> > > >  * it make sense to have two APIs, and no need to export another
> > > > symbol for unregister().
> > > >  * This pattern is very easy to audit and code.
> > > >  */
> > > > }
> > >
> > > I like this flow +1
> > >
> > > But ... since the init() function is performing both device_init and
> > > device_add - it should probably be called ancillary_device_register,
> > > and we are back to a single exported API for both register and
> > > unregister.
> > >
> > > At that point, do we need wrappers on the primitives init, add, del,
> > > and put?
> >
> > Let me summarize.
> > 1. You are not providing driver/core API but simplification and obfuscation
> > of basic primitives and structures. This is new layer. There is no room for
> > a claim that we must to follow internal API.
>
> Yes, this a driver core api, Greg even questioned why it was in
> drivers/bus instead of drivers/base which I think makes sense.

We can argue till death, but at the end, this is a bus.

>
> > 2. API should be symmetric. If you call to _register()/_add(), you will need
> > to call to _unregister()/_del(). Please don't add obscure _put().
>
> It's not obscure it's a long standing semantic for how to properly
> handle device_add() fa

RE: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Parav Pandit



> From: Leon Romanovsky 
> Sent: Thursday, October 8, 2020 10:56 AM
> 
> On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote:
> >
> >
> > > From: Pierre-Louis Bossart 
> > > Sent: Thursday, October 8, 2020 3:20 AM
> > >
> > >
> > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > >> -Original Message-
> > > >> From: Pierre-Louis Bossart 
> > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > >> To: Ertman, David M ; Parav Pandit
> > > >> ; Leon Romanovsky 
> > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com;
> > > >> ti...@suse.de; netdev@vger.kernel.org;
> > > >> ranjani.sridha...@linux.intel.com;
> > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org;
> > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe
> > > >> ; gre...@linuxfoundation.org; k...@kernel.org;
> > > >> Williams, Dan J ; Saleem, Shiraz
> > > >> ; da...@davemloft.net; Patil, Kiran
> > > >> 
> > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > >>
> > > >>
> > > >>
> > >  Below is most simple, intuitive and matching with core APIs for
> > >  name and design pattern wise.
> > >  init()
> > >  {
> > >   err = ancillary_device_initialize();
> > >   if (err)
> > >   return ret;
> > > 
> > >   err = ancillary_device_add();
> > >   if (ret)
> > >   goto err_unwind;
> > > 
> > >   err = some_foo();
> > >   if (err)
> > >   goto err_foo;
> > >   return 0;
> > > 
> > >  err_foo:
> > >   ancillary_device_del(adev);
> > >  err_unwind:
> > >   ancillary_device_put(adev->dev);
> > >   return err;
> > >  }
> > > 
> > >  cleanup()
> > >  {
> > >   ancillary_device_de(adev);
> > >   ancillary_device_put(adev);
> > >   /* It is common to have a one wrapper for this as
> > >  ancillary_device_unregister().
> > >    * This will match with core device_unregister() that has
> > >  precise documentation.
> > >    * but given fact that init() code need proper error
> > >  unwinding, like above,
> > >    * it make sense to have two APIs, and no need to export
> > >  another symbol for unregister().
> > >    * This pattern is very easy to audit and code.
> > >    */
> > >  }
> > > >>>
> > > >>> I like this flow +1
> > > >>>
> > > >>> But ... since the init() function is performing both device_init
> > > >>> and device_add - it should probably be called
> > > >>> ancillary_device_register, and we are back to a single exported
> > > >>> API for both register and unregister.
> > > >>
> > > >> Kind reminder that we introduced the two functions to allow the
> > > >> caller to know if it needed to free memory when initialize()
> > > >> fails, and it didn't need to free memory when add() failed since
> > > >> put_device() takes care of it. If you have a single init()
> > > >> function it's impossible to know which behavior to select on error.
> > > >>
> > > >> I also have a case with SoundWire where it's nice to first
> > > >> initialize, then set some data and then add.
> > > >>
> > > >
> > > > The flow as outlined by Parav above does an initialize as the
> > > > first step, so every error path out of the function has to do a
> > > > put_device(), so you would never need to manually free the memory
> > > > in
> > > the setup function.
> > > > It would be freed in the release call.
> > >
> > > err = ancillary_device_initialize(); if (err)
> > >   return ret;
> > >
> > > where is the put_device() here? if the release function does any
> > > sort of kfree, then you'd need to do it manually in this case.
> > Since device_initialize() failed, put_device() cannot be done here.
> > So yes, pseudo code should have shown, if (err) {
> > kfree(adev);
> > return err;
> > }
> >
> > If we just want to follow register(), unregister() pattern,
> >
> > Than,
> >
> > ancillar_device_register() should be,
> >
> > /**
> >  * ancillar_device_register() - register an ancillary device
> >  * NOTE: __never directly free @adev after calling this function, even
> > if it returned
> >  * an error. Always use ancillary_device_put() to give up the reference
> initialized by this function.
> >  * This note matches with the core and caller knows exactly what to be
> done.
> >  */
> > ancillary_device_register()
> > {
> > device_initialize(&adev->dev);
> > if (!dev->parent || !adev->name)
> > return -EINVAL;
> > if (!dev->release && !(dev->type && dev->type->release)) {
> > /* core is already capable and throws the warning when
> release callback is not set.
> >  * It is done at drivers/base/core.c:1798.
> >  * For NULL release it says, "does not have a release()
> function, it is broken and must be fixed"
> >  */
> > return -EINVAL;
> > }
> > err = dev_set_name(adev...);
> > if 

Re: [PATCH 2/2] usb: serial: option: add Cellient MPL200 card

2020-10-08 Thread Johan Hovold
On Thu, Oct 08, 2020 at 08:47:33AM +0200, Wilken Gottwalt wrote:
> On Tue, 6 Oct 2020 09:02:01 +0200
> Johan Hovold  wrote:
> 
> > On Mon, Oct 05, 2020 at 02:07:23PM +0200, Wilken Gottwalt wrote:
> > > On Mon, 5 Oct 2020 18:36:36 +0700 Lars Melin  wrote:

> > > > It is very likely that Cellient has replaced the VID with their own and 
> > > > kept the PID, it is something other mfgrs has done when buying modules 
> > > > from Qualcomm's series of devices with predefined composition.
> > > > 
> > > > The MS Windows driver for 05c6:9025 describes the interfaces as:
> > > > 
> > > > MI_00 Qualcomm HS-USB Diagnostics 9025
> > > > MI_01 Android Composite ADB Interface
> > > > MI_02 Qualcomm HS-USB Android Modem 9025
> > > > MI_03 Qualcomm HS-USB NMEA 9025
> > > > MI_04 Qualcomm Wireless HS-USB Ethernet Adapter 9025
> > > > MI_05 USB Mass Storage Device
> > > > 
> > > > where the net interface is for QMI/RMNET.
> > > > It fully matches the blacklisting Wilken has done for 2692:9025
> > > 
> > > Does your device have a GPS connector? Mine had not and I'm not sure
> > > if the description of MI_01 is actually correct. I remember looking at
> > > this port and seeing bogus NMEA data.
> > 
> > Well if it's NMEA then the interface shouldn't be blacklisted (even if
> > the values are bogus on your device), but if it's ADB it should be as
> > that is handled by userspace.
> > 
> > Here's some lsusb output from a Cellient MPL200 that still uses the
> > Qualcomm VID:
> > 
> > 
> > https://www.mail-archive.com/modemmanager-devel@lists.freedesktop.org/msg04523.html
> > 
> > which gives some support to Lars's hypothesis. I guess we'll just keep
> > the first interface reserved.
> 
> Lars and Johan are right here. I found an older external Gobi driver
> where I actually added comments saying interface 1 is ADB and interface 3
> is NMEA delivering only zeroed values because of the missing antenna
> connector, at least for the models I had access to.

Great, thanks for confirming.

Johan


Re: [PATCH v2 00/29] [Set 1,2,3] Rid W=1 warnings in Wireless

2020-10-08 Thread Kalle Valo
Luca Coelho  writes:

> On Tue, 2020-10-06 at 10:10 +0300, Kalle Valo wrote:
>> Lee Jones  writes:
>> 
>> > On Tue, 06 Oct 2020, Kalle Valo wrote:
>> > 
>> > > Lee Jones  writes:
>> > > 
>> > > > On Thu, 10 Sep 2020, Lee Jones wrote:
>> > > > 
>> > > > > This is a rebased/re-worked set of patches which have been
>> > > > > previously posted to the mailing list(s).
>> > > > > 
>> > > > > This set is part of a larger effort attempting to clean-up W=1
>> > > > > kernel builds, which are currently overwhelmingly riddled with
>> > > > > niggly little warnings.
>> > > > > 
>> > > > > There are quite a few W=1 warnings in the Wireless.  My plan
>> > > > > is to work through all of them over the next few weeks.
>> > > > > Hopefully it won't be too long before drivers/net/wireless
>> > > > > builds clean with W=1 enabled.
>> > > > > 
>> > > > > Lee Jones (29):
>> > > > >   iwlwifi: dvm: Demote non-compliant kernel-doc headers
>> > > > >   iwlwifi: rs: Demote non-compliant kernel-doc headers
>> > > > >   iwlwifi: dvm: tx: Demote non-compliant kernel-doc headers
>> > > > >   iwlwifi: dvm: lib: Demote non-compliant kernel-doc headers
>> > > > >   iwlwifi: calib: Demote seemingly unintentional kerneldoc header
>> > > > >   wil6210: Fix a couple of formatting issues in 
>> > > > > 'wil6210_debugfs_init'
>> > > > >   iwlwifi: dvm: sta: Demote a bunch of nonconformant kernel-doc 
>> > > > > headers
>> > > > >   iwlwifi: mvm: ops: Remove unused static struct 
>> > > > > 'iwl_mvm_debug_names'
>> > > > >   iwlwifi: dvm: Demote a couple of nonconformant kernel-doc headers
>> > > > >   iwlwifi: mvm: utils: Fix some doc-rot
>> > > > >   iwlwifi: dvm: scan: Demote a few nonconformant kernel-doc headers
>> > > > >   iwlwifi: dvm: rxon: Demote non-conformant kernel-doc headers
>> > > > >   iwlwifi: mvm: tx: Demote misuse of kernel-doc headers
>> > > > >   iwlwifi: dvm: devices: Fix function documentation formatting issues
>> > > > >   iwlwifi: iwl-drv: Provide descriptions debugfs dentries
>> > > > >   wil6210: wmi: Fix formatting and demote non-conforming function
>> > > > > headers
>> > > > >   wil6210: interrupt: Demote comment header which is clearly not
>> > > > > kernel-doc
>> > > > >   wil6210: txrx: Demote obvious abuse of kernel-doc
>> > > > >   wil6210: txrx_edma: Demote comments which are clearly not 
>> > > > > kernel-doc
>> > > > >   wil6210: pmc: Demote a few nonconformant kernel-doc function 
>> > > > > headers
>> > > > >   wil6210: wil_platform: Demote kernel-doc header to standard comment
>> > > > > block
>> > > > >   wil6210: wmi: Correct misnamed function parameter 'ptr_'
>> > > > >   ath6kl: wmi: Remove unused variable 'rate'
>> > > > >   ath9k: ar9002_initvals: Remove unused array
>> > > > > 'ar9280PciePhy_clkreq_off_L1_9280'
>> > > > >   ath9k: ar9001_initvals: Remove unused array 'ar5416Bank6_9100'
>> > > > >   ath9k: ar5008_initvals: Remove unused table entirely
>> > > > >   ath9k: ar5008_initvals: Move ar5416Bank{0,1,2,3,7} to where they 
>> > > > > are
>> > > > > used
>> > > > >   brcmsmac: phytbl_lcn: Remove unused array 'dot11lcn_gain_tbl_rev1'
>> > > > >   brcmsmac: phy_lcn: Remove unused variable
>> > > > > 'lcnphy_rx_iqcomp_table_rev0'
>> > > > 
>> > > > What's happening with all of these iwlwifi patches?
>> > > > 
>> > > > Looks like they are still not applied.
>> > > 
>> > > Luca (CCed) takes iwlwifi patches to his iwlwifi tree.
>> > 
>> > Thanks Kalle.
>> > 
>> > Luca,
>> > 
>> >   Do you know why these patches have not been applied yet?  Do you
>> > plan on applying them this week?  -rc1 is not due for release for
>> > nearly 3 weeks now that Linus tagged an -rc8.
>> 
>> I can also take Lee's patches directly to wireless-drivers-next, if
>> that's easier for Luca.
>
> Yes, please take these patches directly.  It simplifies things.

Ok, I have assigned all Lee's pending iwlwifi patches to me on
patchwork.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


[PATCH v2 0/2] add Cellient MPL200 card

2020-10-08 Thread Wilken Gottwalt
Add the Cellient MPL200 card to usb/qmi_wwan and serial/option. The
serial/option patch got already applied.

Changes v2:
Picked proper subject for qmi_wwan patch, moved the MPL200 device to
the section of the combined devices and changed the comment about the
device to be more precise.

Wilken Gottwalt (2):
  net: usb: qmi_wwan: add Cellient MPL200 card
  usb: serial: option: add Cellient MPL200 card

 drivers/net/usb/qmi_wwan.c  | 1 +
 drivers/usb/serial/option.c | 3 +++
 2 files changed, 4 insertions(+)

-- 
2.28.0



[PATCH v2 1/2] net: usb: qmi_wwan: add Cellient MPL200 card

2020-10-08 Thread Wilken Gottwalt
Add usb ids of the Cellient MPL200 card.

Signed-off-by: Wilken Gottwalt 
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 07c42c0719f5..5ca1356b8656 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1375,6 +1375,7 @@ static const struct usb_device_id products[] = {
{QMI_QUIRK_SET_DTR(0x2cb7, 0x0104, 4)}, /* Fibocom NL678 series */
{QMI_FIXED_INTF(0x0489, 0xe0b4, 0)},/* Foxconn T77W968 LTE */
{QMI_FIXED_INTF(0x0489, 0xe0b5, 0)},/* Foxconn T77W968 LTE with 
eSIM support*/
+   {QMI_FIXED_INTF(0x2692, 0x9025, 4)},/* Cellient MPL200 (rebranded 
Qualcomm 05c6:9025) */
 
/* 4. Gobi 1000 devices */
{QMI_GOBI1K_DEVICE(0x05c6, 0x9212)},/* Acer Gobi Modem Device */
-- 
2.28.0



[PATCH v2 2/2] usb: serial: option: add Cellient MPL200 card

2020-10-08 Thread Wilken Gottwalt
Add usb ids of the Cellient MPL200 card.

Signed-off-by: Wilken Gottwalt 
---
 drivers/usb/serial/option.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 0c6f160a214a..a65e620b2277 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -528,6 +528,7 @@ static void option_instat_callback(struct urb *urb);
 /* Cellient products */
 #define CELLIENT_VENDOR_ID 0x2692
 #define CELLIENT_PRODUCT_MEN2000x9005
+#define CELLIENT_PRODUCT_MPL2000x9025
 
 /* Hyundai Petatel Inc. products */
 #define PETATEL_VENDOR_ID  0x1ff4
@@ -1982,6 +1983,8 @@ static const struct usb_device_id option_ids[] = {
{ USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 
MEDIATEK_PRODUCT_DC_4COM2, 0xff, 0x02, 0x01) },
{ USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 
MEDIATEK_PRODUCT_DC_4COM2, 0xff, 0x00, 0x00) },
{ USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MEN200) },
+   { USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MPL200),
+ .driver_info = RSVD(1) | RSVD(4) },
{ USB_DEVICE(PETATEL_VENDOR_ID, PETATEL_PRODUCT_NP10T_600A) },
{ USB_DEVICE(PETATEL_VENDOR_ID, PETATEL_PRODUCT_NP10T_600E) },
{ USB_DEVICE_AND_INTERFACE_INFO(TPLINK_VENDOR_ID, TPLINK_PRODUCT_LTE, 
0xff, 0x00, 0x00) },  /* TP-Link LTE Module */
-- 
2.28.0



Re: [PATCH 0/7] wfx: move out from the staging area

2020-10-08 Thread Kalle Valo
Greg Kroah-Hartman  writes:

> On Wed, Oct 07, 2020 at 12:19:36PM +0200, Jerome Pouiller wrote:
>> From: Jérôme Pouiller 
>> 
>> I think the wfx driver is now mature enough to be accepted in the
>> drivers/net/wireless directory.
>> 
>> There is still one item on the TODO list. It is an idea to improve the rate
>> control in some particular cases[1]. However, the current performances of the
>> driver seem to satisfy everyone. In add, the suggested change is large 
>> enough.
>> So, I would prefer to implement it only if it really solves an issue. I 
>> think it
>> is not an obstacle to move the driver out of the staging area.
>> 
>> In order to comply with the last rules for the DT bindings, I have converted 
>> the
>> documentation to yaml. I am moderately happy with the result. Especially, for
>> the description of the binding. Any comments are welcome.
>> 
>> The series also update the copyrights dates of the files. I don't know 
>> exactly
>> how this kind of changes should be sent. It's a bit weird to change all the
>> copyrights in one commit, but I do not see any better way.
>> 
>> I also include a few fixes I have found these last weeks.
>> 
>> [1] https://lore.kernel.org/lkml/3099559.gv3Q75KnN1@pc-42
>
> I'll take the first 6 patches here, the last one you should work with
> the wireless maintainers to get reviewed.
>
> Maybe that might want to wait until after 5.10-rc1 is out, with all of
> these changes in it, making it an easier move.

Yes, the driver needs to be reviewed in linux-wireless list. I recommend
submitting the whole driver in a patchset with one file per patch, which
seems to be the easiest way to review a full driver. The final move will
be in just one commit moving the driver, just like patch 7 does here. As
an example see how wilc1000 review was done.

Device tree bindings needs to be reviewed by the DT maintainer so CC
devicetree on that patch.

But do note that there's currently one new driver in review queue, so it
will most likely take some time before wfx is reviewed.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


Re: [PATCH v2 2/2] usb: serial: option: add Cellient MPL200 card

2020-10-08 Thread Johan Hovold
On Thu, Oct 08, 2020 at 09:22:19AM +0200, Wilken Gottwalt wrote:
> Add usb ids of the Cellient MPL200 card.
> 
> Signed-off-by: Wilken Gottwalt 
> ---

So I had already applied this one (which didn't change since v1).

Thanks again.

Johan


[net] tipc: fix NULL pointer dereference in tipc_named_rcv

2020-10-08 Thread Hoang Huu Le
In the function node_lost_contact(), we call __skb_queue_purge() without
grabbing the list->lock. This can cause to a race-condition why processing
the list 'namedq' in calling path tipc_named_rcv()->tipc_named_dequeue().

[] BUG: kernel NULL pointer dereference, address: 
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x) - not-present page
[] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0
[] Oops:  [#1] SMP NOPTI
[] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G  O  5.9.0-rc6+ #2
[] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...]
[] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc]
[] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...]
[] RSP: 0018:c90a7c58 EFLAGS: 0282
[] RAX: 12ec RBX:  RCX: 88807bde1270
[] RDX: 2c7c RSI: 2c7c RDI: 88807b38f1a8
[] RBP: 88807b006288 R08: 88806a367800 R09: 88806a367900
[] R10: 88806a367a00 R11: 88806a367b00 R12: 88807b006258
[] R13: 88807b00628a R14: 888069334d00 R15: 88806a434600
[] FS:  () GS:88807948() knlGS:0[...]
[] CS:  0010 DS:  ES:  CR0: 80050033
[] CR2:  CR3: 7732 CR4: 06e0
[] Call Trace:
[]  ? tipc_bcast_rcv+0x9a/0x1a0 [tipc]
[]  tipc_rcv+0x40d/0x670 [tipc]
[]  ? _raw_spin_unlock+0xa/0x20
[]  tipc_l2_rcv_msg+0x55/0x80 [tipc]
[]  __netif_receive_skb_one_core+0x8c/0xa0
[]  process_backlog+0x98/0x140
[]  net_rx_action+0x13a/0x420
[]  __do_softirq+0xdb/0x316
[]  ? smpboot_thread_fn+0x2f/0x1e0
[]  ? smpboot_thread_fn+0x74/0x1e0
[]  ? smpboot_thread_fn+0x14e/0x1e0
[]  run_ksoftirqd+0x1a/0x40
[]  smpboot_thread_fn+0x149/0x1e0
[]  ? sort_range+0x20/0x20
[]  kthread+0x131/0x150
[]  ? kthread_unuse_mm+0xa0/0xa0
[]  ret_from_fork+0x22/0x30
[] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...]
[] CR2: 
[] ---[ end trace 65c276a8e2e2f310 ]---

To fix this, we need to grab the lock of the 'namedq' list on both
path calling.

Fixes: cad2929dc432 ("tipc: update a binding service via broadcast")
Acked-by: Jon Maloy 
Signed-off-by: Hoang Huu Le 
---
 net/tipc/name_distr.c | 10 +-
 net/tipc/node.c   |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index 2f9c148f17e2..fe4edce459ad 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -327,8 +327,13 @@ static struct sk_buff *tipc_named_dequeue(struct 
sk_buff_head *namedq,
struct tipc_msg *hdr;
u16 seqno;
 
+   spin_lock_bh(&namedq->lock);
skb_queue_walk_safe(namedq, skb, tmp) {
-   skb_linearize(skb);
+   if (unlikely(skb_linearize(skb))) {
+   __skb_unlink(skb, namedq);
+   kfree_skb(skb);
+   continue;
+   }
hdr = buf_msg(skb);
seqno = msg_named_seqno(hdr);
if (msg_is_last_bulk(hdr)) {
@@ -338,12 +343,14 @@ static struct sk_buff *tipc_named_dequeue(struct 
sk_buff_head *namedq,
 
if (msg_is_bulk(hdr) || msg_is_legacy(hdr)) {
__skb_unlink(skb, namedq);
+   spin_unlock_bh(&namedq->lock);
return skb;
}
 
if (*open && (*rcv_nxt == seqno)) {
(*rcv_nxt)++;
__skb_unlink(skb, namedq);
+   spin_unlock_bh(&namedq->lock);
return skb;
}
 
@@ -353,6 +360,7 @@ static struct sk_buff *tipc_named_dequeue(struct 
sk_buff_head *namedq,
continue;
}
}
+   spin_unlock_bh(&namedq->lock);
return NULL;
 }
 
diff --git a/net/tipc/node.c b/net/tipc/node.c
index cf4b239fc569..d269ebe382e1 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1496,7 +1496,7 @@ static void node_lost_contact(struct tipc_node *n,
 
/* Clean up broadcast state */
tipc_bcast_remove_peer(n->net, n->bc_entry.link);
-   __skb_queue_purge(&n->bc_entry.namedq);
+   skb_queue_purge(&n->bc_entry.namedq);
 
/* Abort any ongoing link failover */
for (i = 0; i < MAX_BEARERS; i++) {
-- 
2.25.1



Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Dan Williams
On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky  wrote:
[..]
> All stated above is my opinion, it can be different from yours.

Yes, but we need to converge to move this forward. Jason was involved
in the current organization for registration, Greg was angling for
this to be core functionality. I have use cases outside of RDMA and
netdev. Parav was ok with the current organization. The SOF folks
already have a proposed incorporation of it. The argument I am hearing
is that "this registration api seems hard for driver writers" when we
have several driver writers who have already taken a look and can make
it work. If you want to follow on with a simpler wrappers for your use
case, great, but I do not yet see anyone concurring with your opinion
that the current organization is irretrievably broken or too obscure
to use.


Re: [PATCH net-next v2 03/16] devlink: Add devlink reload limit option

2020-10-08 Thread Jiri Pirko
Wed, Oct 07, 2020 at 08:00:44AM CEST, mo...@mellanox.com wrote:
>Add reload limit to demand restrictions on reload actions.
>Reload limits supported:
>no_reset: No reset allowed, no down time allowed, no link flap and no
>  configuration is lost.
>
>By default reload limit is unspecified and so no constraints on reload
>actions are required.
>
>Some combinations of action and limit are invalid. For example, driver
>can not reinitialize its entities without any downtime.
>
>The no_reset reload limit will have usecase in this patchset to
>implement restricted fw_activate on mlx5.
>
>Have the uapi parameter of reload limit ready for future support of
>multiselection.
>
>Signed-off-by: Moshe Shemesh 

Reviewed-by: Jiri Pirko 


WARNING in ieee80211_ibss_csa_beacon

2020-10-08 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:c85fb28b Merge tag 'arm64-fixes' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15b2b40050
kernel config:  https://syzkaller.appspot.com/x/.config?x=de7f697da23057c7
dashboard link: https://syzkaller.appspot.com/bug?extid=b6c9fe29aefe68e4ad34
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b6c9fe29aefe68e4a...@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 11321 at net/mac80211/ibss.c:504 
ieee80211_ibss_csa_beacon+0x4e9/0x5a0 net/mac80211/ibss.c:504
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 11321 Comm: kworker/u4:0 Not tainted 5.9.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Workqueue: phy10 ieee80211_csa_finalize_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1d6/0x29e lib/dump_stack.c:118
 panic+0x2c0/0x800 kernel/panic.c:231
 __warn+0x227/0x250 kernel/panic.c:600
 report_bug+0x1b1/0x2e0 lib/bug.c:198
 handle_bug+0x42/0x80 arch/x86/kernel/traps.c:234
 exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:254
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:ieee80211_ibss_csa_beacon+0x4e9/0x5a0 net/mac80211/ibss.c:504
Code: e8 fc 29 8b f9 b8 f4 ff ff ff eb 0a e8 f0 29 8b f9 b8 00 01 00 00 48 83 
c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 d7 29 8b f9 <0f> 0b b8 ea ff ff ff eb 
e3 e8 c9 29 8b f9 0f 0b e9 88 fb ff ff 48
RSP: 0018:c9000ab5fbf8 EFLAGS: 00010293
RAX: 87e9d419 RBX: 88804cc20580 RCX: 888016268280
RDX:  RSI:  RDI: 
RBP: 88804cc23490 R08: dc00 R09: fbfff16c82b4
R10: fbfff16c82b4 R11:  R12: 8880a21a18c0
R13: 8880a21a18ba R14: 8880a21a0c00 R15: 8880a21a18e0
 ieee80211_set_after_csa_beacon net/mac80211/cfg.c:3043 [inline]
 __ieee80211_csa_finalize net/mac80211/cfg.c:3099 [inline]
 ieee80211_csa_finalize+0x46f/0x960 net/mac80211/cfg.c:3122
 ieee80211_csa_finalize_work+0xfb/0x140 net/mac80211/cfg.c:3147
 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269
 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415
 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Leon Romanovsky
On Thu, Oct 08, 2020 at 07:14:17AM +, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky 
> > Sent: Thursday, October 8, 2020 10:56 AM
> >
> > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote:
> > >
> > >
> > > > From: Pierre-Louis Bossart 
> > > > Sent: Thursday, October 8, 2020 3:20 AM
> > > >
> > > >
> > > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > > >> -Original Message-
> > > > >> From: Pierre-Louis Bossart 
> > > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > > >> To: Ertman, David M ; Parav Pandit
> > > > >> ; Leon Romanovsky 
> > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com;
> > > > >> ti...@suse.de; netdev@vger.kernel.org;
> > > > >> ranjani.sridha...@linux.intel.com;
> > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org;
> > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe
> > > > >> ; gre...@linuxfoundation.org; k...@kernel.org;
> > > > >> Williams, Dan J ; Saleem, Shiraz
> > > > >> ; da...@davemloft.net; Patil, Kiran
> > > > >> 
> > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > >>
> > > > >>
> > > > >>
> > > >  Below is most simple, intuitive and matching with core APIs for
> > > >  name and design pattern wise.
> > > >  init()
> > > >  {
> > > > err = ancillary_device_initialize();
> > > > if (err)
> > > > return ret;
> > > > 
> > > > err = ancillary_device_add();
> > > > if (ret)
> > > > goto err_unwind;
> > > > 
> > > > err = some_foo();
> > > > if (err)
> > > > goto err_foo;
> > > > return 0;
> > > > 
> > > >  err_foo:
> > > > ancillary_device_del(adev);
> > > >  err_unwind:
> > > > ancillary_device_put(adev->dev);
> > > > return err;
> > > >  }
> > > > 
> > > >  cleanup()
> > > >  {
> > > > ancillary_device_de(adev);
> > > > ancillary_device_put(adev);
> > > > /* It is common to have a one wrapper for this as
> > > >  ancillary_device_unregister().
> > > >  * This will match with core device_unregister() that has
> > > >  precise documentation.
> > > >  * but given fact that init() code need proper error
> > > >  unwinding, like above,
> > > >  * it make sense to have two APIs, and no need to export
> > > >  another symbol for unregister().
> > > >  * This pattern is very easy to audit and code.
> > > >  */
> > > >  }
> > > > >>>
> > > > >>> I like this flow +1
> > > > >>>
> > > > >>> But ... since the init() function is performing both device_init
> > > > >>> and device_add - it should probably be called
> > > > >>> ancillary_device_register, and we are back to a single exported
> > > > >>> API for both register and unregister.
> > > > >>
> > > > >> Kind reminder that we introduced the two functions to allow the
> > > > >> caller to know if it needed to free memory when initialize()
> > > > >> fails, and it didn't need to free memory when add() failed since
> > > > >> put_device() takes care of it. If you have a single init()
> > > > >> function it's impossible to know which behavior to select on error.
> > > > >>
> > > > >> I also have a case with SoundWire where it's nice to first
> > > > >> initialize, then set some data and then add.
> > > > >>
> > > > >
> > > > > The flow as outlined by Parav above does an initialize as the
> > > > > first step, so every error path out of the function has to do a
> > > > > put_device(), so you would never need to manually free the memory
> > > > > in
> > > > the setup function.
> > > > > It would be freed in the release call.
> > > >
> > > > err = ancillary_device_initialize(); if (err)
> > > > return ret;
> > > >
> > > > where is the put_device() here? if the release function does any
> > > > sort of kfree, then you'd need to do it manually in this case.
> > > Since device_initialize() failed, put_device() cannot be done here.
> > > So yes, pseudo code should have shown, if (err) {
> > >   kfree(adev);
> > >   return err;
> > > }
> > >
> > > If we just want to follow register(), unregister() pattern,
> > >
> > > Than,
> > >
> > > ancillar_device_register() should be,
> > >
> > > /**
> > >  * ancillar_device_register() - register an ancillary device
> > >  * NOTE: __never directly free @adev after calling this function, even
> > > if it returned
> > >  * an error. Always use ancillary_device_put() to give up the reference
> > initialized by this function.
> > >  * This note matches with the core and caller knows exactly what to be
> > done.
> > >  */
> > > ancillary_device_register()
> > > {
> > >   device_initialize(&adev->dev);
> > >   if (!dev->parent || !adev->name)
> > >   return -EINVAL;
> > >   if (!dev->release && !(dev->type && dev->type->release)) {
> > >   /* core is already capable and throws the warning when
> > release callback is not set.
> > >* 

Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread gre...@linuxfoundation.org
On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky  wrote:
> [..]
> > All stated above is my opinion, it can be different from yours.
> 
> Yes, but we need to converge to move this forward. Jason was involved
> in the current organization for registration, Greg was angling for
> this to be core functionality. I have use cases outside of RDMA and
> netdev. Parav was ok with the current organization. The SOF folks
> already have a proposed incorporation of it. The argument I am hearing
> is that "this registration api seems hard for driver writers" when we
> have several driver writers who have already taken a look and can make
> it work. If you want to follow on with a simpler wrappers for your use
> case, great, but I do not yet see anyone concurring with your opinion
> that the current organization is irretrievably broken or too obscure
> to use.

That's kind of because I tuned out of this thread a long time ago :)

I do agree with Leon that I think the current patch is not the correct
way to do this the easiest, but don't have a competing proposal to show
what I mean.

Yet.

Let's see what happens after 5.10-rc1 is out, it's too late now for any
of this for this next merge window so we can not worry about it for a
few weeks.

thanks,

greg k-h


Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Leon Romanovsky
On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky  wrote:
> [..]
> > All stated above is my opinion, it can be different from yours.
>
> Yes, but we need to converge to move this forward. Jason was involved
> in the current organization for registration, Greg was angling for
> this to be core functionality. I have use cases outside of RDMA and
> netdev. Parav was ok with the current organization. The SOF folks
> already have a proposed incorporation of it. The argument I am hearing
> is that "this registration api seems hard for driver writers" when we
> have several driver writers who have already taken a look and can make
> it work. If you want to follow on with a simpler wrappers for your use
> case, great, but I do not yet see anyone concurring with your opinion
> that the current organization is irretrievably broken or too obscure
> to use.

Can it be that I'm first one to use this bus for very large driver (>120K LOC)
that has 5 different ->probe() flows?

For example, this 
https://lore.kernel.org/linux-rdma/20201006172317.GN1874917@unreal/
hints to me that this bus wasn't used with anything complex as it was initially 
intended.

And regarding registration, I said many times that init()/add() scheme is ok, 
the inability
to call to uninit() after add() failure is not ok from my point of view.

Thanks


Re: [PATCH v4 bpf-next 09/13] bpf: introduce multibuff support to bpf_prog_test_run_xdp()

2020-10-08 Thread Shay Agroskin



Lorenzo Bianconi  writes:


Introduce the capability to allocate a xdp multi-buff in
bpf_prog_test_run_xdp routine. This is a preliminary patch to 
introduce

the selftests for new xdp multi-buff ebpf helpers

Signed-off-by: Lorenzo Bianconi 
---
 net/bpf/test_run.c | 51 
 ++

 1 file changed, 43 insertions(+), 8 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index bd291f5f539c..ec7286cd051b 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -617,44 +617,79 @@ int bpf_prog_test_run_xdp(struct bpf_prog 
*prog, const union bpf_attr *kattr,

 {
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct 
 skb_shared_info));

u32 headroom = XDP_PACKET_HEADROOM;
-   u32 size = kattr->test.data_size_in;
u32 repeat = kattr->test.repeat;
struct netdev_rx_queue *rxqueue;
+   struct skb_shared_info *sinfo;
struct xdp_buff xdp = {};
+   u32 max_data_sz, size;
u32 retval, duration;
-   u32 max_data_sz;
+   int i, ret, data_len;
void *data;
-   int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)

return -EINVAL;
 
-	/* XDP have extra tailroom as (most) drivers use full page 
*/

max_data_sz = 4096 - headroom - tailroom;


For the sake of consistency, can this 4096 be changed to PAGE_SIZE 
?

Same as in
data_len = min_t(int, kattr->test.data_size_in - size, 
PAGE_SIZE);


expression below


+   size = min_t(u32, kattr->test.data_size_in, max_data_sz);
+   data_len = size;
 
-	data = bpf_test_init(kattr, kattr->test.data_size_in,

-max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, size, max_data_sz, headroom, 
tailroom);

if (IS_ERR(data))
return PTR_ERR(data);
 
 	xdp.data_hard_start = data;

xdp.data = data + headroom;
xdp.data_meta = xdp.data;
-   xdp.data_end = xdp.data + size;
+   xdp.data_end = xdp.data + data_len;
xdp.frame_sz = headroom + max_data_sz + tailroom;
 
+	sinfo = xdp_get_shared_info_from_buff(&xdp);

+   if (unlikely(kattr->test.data_size_in > size)) {
+		void __user *data_in = 
u64_to_user_ptr(kattr->test.data_in);

+
+   while (size < kattr->test.data_size_in) {
+			skb_frag_t *frag = 
&sinfo->frags[sinfo->nr_frags];

+   struct page *page;
+   int data_len;
+
+   page = alloc_page(GFP_KERNEL);
+   if (!page) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   __skb_frag_set_page(frag, page);
+			data_len = min_t(int, 
kattr->test.data_size_in - size,

+PAGE_SIZE);
+   skb_frag_size_set(frag, data_len);
+			if (copy_from_user(page_address(page), 
data_in + size,

+  data_len)) {
+   ret = -EFAULT;
+   goto out;
+   }
+   sinfo->nr_frags++;
+   size += data_len;
+   }
+   xdp.mb = 1;
+   }
+
 	rxqueue = 
 __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 
 0);

xdp.rxq = &rxqueue->xdp_rxq;
bpf_prog_change_xdp(NULL, prog);
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, 
 true);

if (ret)
goto out;
+
 	if (xdp.data != data + headroom || xdp.data_end != 
 xdp.data + size)

-   size = xdp.data_end - xdp.data;
+   size += xdp.data_end - xdp.data - data_len;


Can we please drop the variable shadowing of data_len ? This is 
confusing since the initial value of data_len is correct in the 
`size` calculation, while its value inside the while loop it not.


This seem to be syntactically correct, but I think it's better 
practice to avoid shadowing here.



+
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, 
 retval, duration);

 out:
bpf_prog_change_xdp(prog, NULL);
+   for (i = 0; i < sinfo->nr_frags; i++)
+   __free_page(skb_frag_page(&sinfo->frags[i]));
kfree(data);
+
return ret;
 }




Re: [PATCH v2 1/2] net: usb: qmi_wwan: add Cellient MPL200 card

2020-10-08 Thread Bjørn Mork
Wilken Gottwalt  writes:

> Add usb ids of the Cellient MPL200 card.
>
> Signed-off-by: Wilken Gottwalt 
> ---
>  drivers/net/usb/qmi_wwan.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 07c42c0719f5..5ca1356b8656 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -1375,6 +1375,7 @@ static const struct usb_device_id products[] = {
>   {QMI_QUIRK_SET_DTR(0x2cb7, 0x0104, 4)}, /* Fibocom NL678 series */
>   {QMI_FIXED_INTF(0x0489, 0xe0b4, 0)},/* Foxconn T77W968 LTE */
>   {QMI_FIXED_INTF(0x0489, 0xe0b5, 0)},/* Foxconn T77W968 LTE with 
> eSIM support*/
> + {QMI_FIXED_INTF(0x2692, 0x9025, 4)},/* Cellient MPL200 (rebranded 
> Qualcomm 05c6:9025) */
>  
>   /* 4. Gobi 1000 devices */
>   {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)},/* Acer Gobi Modem Device */


Thanks.  Looks nice now.

Acked-by: Bjørn Mork 


Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook

2020-10-08 Thread Jesper Dangaard Brouer
On Wed, 07 Oct 2020 20:19:39 -0700
John Fastabend  wrote:

> Maciej Żenczykowski wrote:
> > On Wed, Oct 7, 2020 at 3:37 PM John Fastabend  
> > wrote:  
> > >
> > > Daniel Borkmann wrote:  
> > > > On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote:
> > > > [...]  
> > > > >   net/core/dev.c |   24 ++--
> > > > >   1 file changed, 22 insertions(+), 2 deletions(-)  
> > >
> > > Couple high-level comments. Whats the problem with just letting the driver
> > > consume the packet? I would chalk it up to a buggy BPF program that is
> > > sending these packets. The drivers really shouldn't panic or do anything
> > > horrible under this case because even today I don't think we can be
> > > 100% certain MTU on skb matches set MTU. Imagine the case where I change
> > > the MTU from 9kB->1500B there will be some skbs in-flight with the larger
> > > length and some with the shorter. If the drivers panic/fault or otherwise
> > > does something else horrible thats not going to be friendly in general 
> > > case
> > > regardless of what BPF does. And seeing this type of config is all done
> > > async its tricky (not practical) to flush any skbs in-flight.
> > >
> > > I've spent many hours debugging these types of feature flag, mtu
> > > change bugs on the driver side I'm not sure it can be resolved by
> > > the stack easily. Better to just build drivers that can handle it IMO.
> > >
> > > Do we know if sending >MTU size skbs to drivers causes problems in real
> > > cases? I haven't tried on the NICs I have here, but I expect they should
> > > be fine. Fine here being system keeps running as expected. Dropping the
> > > skb either on TX or RX side is expected. Even with this change though
> > > its possible for the skb to slip through if I configure MTU on a live
> > > system.  
> > 
> > I wholeheartedly agree with the above.
> > 
> > Ideally the only >mtu check should happen at driver admittance.
> > But again ideally it should happen in some core stack location not in
> > the driver itself.  
> 
> Ideally maybe, but IMO we should just let the skb go to the driver
> and let the driver sort it out. Even if this means pushing the packet
> onto the wire then the switch will drop it or the receiver, etc. A
> BPF program can do lots of horrible things that should never be
> on the wire otherwise. MTU is just one of them, but sending corrupted
> payloads, adding bogus headers, checksums etc. so I don't think we can
> reasonable protect against all of them.

That is a good point.

> Of course if the driver is going to hang/panic then something needs
> to be done. Perhaps a needs_mtu_check feature flag, although thats
> not so nice either so perhaps drivers just need to handle it themselves.
> Also even today the case could happen without BPF as best I can tell
> so the drivers should be prepared for it.

Yes, borderline cases do exist already today (like your reconf with
inflight packets), so I guess drivers should at-least not hang/panic
and be robust enough that we can drop this check.

I think you have convinced me that we can drop this check.  My only
concern is how people can troubleshoot this, but its not different from
current state.


> > However, due to both gso and vlan offload, even this is not trivial to do...
> > The mtu is L3, but drivers/hardware/the wire usually care about L2...

If net_device->mtu is L3 (1500) and XDP (and TC, right?) operate at L2,
that likely means that the "strict" bpf_mtu_check (in my BPF-helper) is
wrong, as XDP (and TC) length at this point include the 14 bytes
Ethernet header.  I will check and fix.

Is this accounted for via net_device->hard_header_len ?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Dan Williams
On Thu, Oct 8, 2020 at 1:00 AM Leon Romanovsky  wrote:
>
> On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky  wrote:
> > [..]
> > > All stated above is my opinion, it can be different from yours.
> >
> > Yes, but we need to converge to move this forward. Jason was involved
> > in the current organization for registration, Greg was angling for
> > this to be core functionality. I have use cases outside of RDMA and
> > netdev. Parav was ok with the current organization. The SOF folks
> > already have a proposed incorporation of it. The argument I am hearing
> > is that "this registration api seems hard for driver writers" when we
> > have several driver writers who have already taken a look and can make
> > it work. If you want to follow on with a simpler wrappers for your use
> > case, great, but I do not yet see anyone concurring with your opinion
> > that the current organization is irretrievably broken or too obscure
> > to use.
>
> Can it be that I'm first one to use this bus for very large driver (>120K LOC)
> that has 5 different ->probe() flows?
>
> For example, this 
> https://lore.kernel.org/linux-rdma/20201006172317.GN1874917@unreal/
> hints to me that this bus wasn't used with anything complex as it was 
> initially intended.

I missed that. Yes, I agree that's broken.

>
> And regarding registration, I said many times that init()/add() scheme is ok, 
> the inability
> to call to uninit() after add() failure is not ok from my point of view.

Ok, I got to the wrong conclusion about your position.


[PATCH ipsec] xfrm: interface: fix the priorities for ipip and ipv6 tunnels

2020-10-08 Thread Xin Long
As Nicolas noticed in his case, when xfrm_interface module is installed
the standard IP tunnels will break in receiving packets.

This is caused by the IP tunnel handlers with a higher priority in xfrm
interface processing incoming packets by xfrm_input(), which would drop
the packets and return 0 instead when anything wrong happens.

Rather than changing xfrm_input(), this patch is to adjust the priority
for the IP tunnel handlers in xfrm interface, so that the packets would
go to xfrmi's later than the others', as the others' would not drop the
packets when the handlers couldn't process them.

Note that IPCOMP also defines its own IPIP tunnel handler and it calls
xfrm_input() as well, so we must make its priority lower than xfrmi's,
which means having xfrmi loaded would still break IPCOMP. We may seek
another way to fix it in xfrm_input() in the future.

Reported-by: Nicolas Dichtel 
Tested-by: Nicolas Dichtel 
Fixes: da9bbf0598c9 ("xfrm: interface: support IPIP and IPIP6 tunnels 
processing with .cb_handler")
FIxes: d7b360c2869f ("xfrm: interface: support IP6IP6 and IP6IP tunnels 
processing with .cb_handler")
Signed-off-by: Xin Long 
---
 net/ipv4/xfrm4_tunnel.c   | 4 ++--
 net/ipv6/xfrm6_tunnel.c   | 4 ++--
 net/xfrm/xfrm_interface.c | 8 
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
index dc19aff..fb0648e 100644
--- a/net/ipv4/xfrm4_tunnel.c
+++ b/net/ipv4/xfrm4_tunnel.c
@@ -64,14 +64,14 @@ static int xfrm_tunnel_err(struct sk_buff *skb, u32 info)
 static struct xfrm_tunnel xfrm_tunnel_handler __read_mostly = {
.handler=   xfrm_tunnel_rcv,
.err_handler=   xfrm_tunnel_err,
-   .priority   =   3,
+   .priority   =   4,
 };
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct xfrm_tunnel xfrm64_tunnel_handler __read_mostly = {
.handler=   xfrm_tunnel_rcv,
.err_handler=   xfrm_tunnel_err,
-   .priority   =   2,
+   .priority   =   3,
 };
 #endif
 
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 25b7ebd..f696d46 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -303,13 +303,13 @@ static const struct xfrm_type xfrm6_tunnel_type = {
 static struct xfrm6_tunnel xfrm6_tunnel_handler __read_mostly = {
.handler= xfrm6_tunnel_rcv,
.err_handler= xfrm6_tunnel_err,
-   .priority   = 2,
+   .priority   = 3,
 };
 
 static struct xfrm6_tunnel xfrm46_tunnel_handler __read_mostly = {
.handler= xfrm6_tunnel_rcv,
.err_handler= xfrm6_tunnel_err,
-   .priority   = 2,
+   .priority   = 3,
 };
 
 static int __net_init xfrm6_tunnel_net_init(struct net *net)
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index a8f6611..0bb7963 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -830,14 +830,14 @@ static struct xfrm6_tunnel xfrmi_ipv6_handler 
__read_mostly = {
.handler=   xfrmi6_rcv_tunnel,
.cb_handler =   xfrmi_rcv_cb,
.err_handler=   xfrmi6_err,
-   .priority   =   -1,
+   .priority   =   2,
 };
 
 static struct xfrm6_tunnel xfrmi_ip6ip_handler __read_mostly = {
.handler=   xfrmi6_rcv_tunnel,
.cb_handler =   xfrmi_rcv_cb,
.err_handler=   xfrmi6_err,
-   .priority   =   -1,
+   .priority   =   2,
 };
 #endif
 
@@ -875,14 +875,14 @@ static struct xfrm_tunnel xfrmi_ipip_handler 
__read_mostly = {
.handler=   xfrmi4_rcv_tunnel,
.cb_handler =   xfrmi_rcv_cb,
.err_handler=   xfrmi4_err,
-   .priority   =   -1,
+   .priority   =   3,
 };
 
 static struct xfrm_tunnel xfrmi_ipip6_handler __read_mostly = {
.handler=   xfrmi4_rcv_tunnel,
.cb_handler =   xfrmi_rcv_cb,
.err_handler=   xfrmi4_err,
-   .priority   =   -1,
+   .priority   =   2,
 };
 #endif
 
-- 
2.1.0



Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook

2020-10-08 Thread Daniel Borkmann

On 10/8/20 10:07 AM, Jesper Dangaard Brouer wrote:
[...]

However, due to both gso and vlan offload, even this is not trivial to do...
The mtu is L3, but drivers/hardware/the wire usually care about L2...


If net_device->mtu is L3 (1500) and XDP (and TC, right?) operate at L2,
that likely means that the "strict" bpf_mtu_check (in my BPF-helper) is
wrong, as XDP (and TC) length at this point include the 14 bytes
Ethernet header.  I will check and fix.


Yes, both at L2 layer.


Is this accounted for via net_device->hard_header_len ?


It is, see also ether_setup().


Re: [PATCH net 2/2] IPv6: reply ICMP error if the first fragment don't include all headers

2020-10-08 Thread Hangbin Liu
Hi Eric,

Thanks for the comments. I should add "RFC" in subject next time for the
uncertain fix patch.

On Wed, Oct 07, 2020 at 11:35:41AM +0200, Eric Dumazet wrote:
> 
> 
> On 10/7/20 5:55 AM, Hangbin Liu wrote:
> 
> > kfree_skb(skb);
> > @@ -282,6 +285,21 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff 
> > *skb, struct net_device *dev,
> > }
> > }
> >  
> > +   /* RFC 8200, Section 4.5 Fragment Header:
> > +* If the first fragment does not include all headers through an
> > +* Upper-Layer header, then that fragment should be discarded and
> > +* an ICMP Parameter Problem, Code 3, message should be sent to
> > +* the source of the fragment, with the Pointer field set to zero.
> > +*/
> > +   nexthdr = hdr->nexthdr;
> > +   offset = ipv6_skip_exthdr(skb, skb_transport_offset(skb), &nexthdr, 
> > &frag_off);
> > +   if (frag_off == htons(IP6_MF) && !pskb_may_pull(skb, offset + 1)) {
> > +   __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> > +   icmpv6_param_prob(skb, ICMPV6_HDR_INCOMP, 0);
> > +   rcu_read_unlock();
> > +   return NULL;
> > +   }
> > +
> > rcu_read_unlock();
> >  
> > /* Must drop socket now because of tproxy. */
> > 
> 
> Ouch, this is quite a buggy patch.
> 
> I doubt we want to add yet another ipv6_skip_exthdr() call in IPv6 fast path.
> 
> Surely the presence of NEXTHDR_FRAGMENT is already tested elsewhere ?

Would you like to help point where NEXTHDR_FRAGMENT was tested before IPv6
defragment?

> 
> Also, ipv6_skip_exthdr() does not pull anything in skb->head, it would be 
> strange
> to force a pull of hundreds of bytes just because you want to check if an 
> extra byte is there,
> if the packet could be forwarded as is, without additional memory allocations.
> 
> Testing skb->len should be more than enough at this stage.

Ah, yes, I shouldn't call pskb_may_pull here.
> 
> Also ipv6_skip_exthdr() can return an error.

it returns -1 as error, If we tested it by (offset + 1 > skb->len), does
that count as an error handler?

Thanks
Hangbin


Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook

2020-10-08 Thread Jesper Dangaard Brouer
On Wed, 7 Oct 2020 23:37:00 +0200
Daniel Borkmann  wrote:

> On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote:
> [...]
> >   net/core/dev.c |   24 ++--
> >   1 file changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b433098896b2..19406013f93e 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, 
> > struct net_device *dev)
> > switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
> > case TC_ACT_OK:
> > case TC_ACT_RECLASSIFY:
> > +   *ret = NET_XMIT_SUCCESS;
> > skb->tc_index = TC_H_MIN(cl_res.classid);
> > break;
> > case TC_ACT_SHOT:
> > @@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, 
> > struct net_device *sb_dev)
> >   {
> > struct net_device *dev = skb->dev;
> > struct netdev_queue *txq;
> > +#ifdef CONFIG_NET_CLS_ACT
> > +   bool mtu_check = false;
> > +#endif
> > +   bool again = false;
> > struct Qdisc *q;
> > int rc = -ENOMEM;
> > -   bool again = false;
> >   
> > skb_reset_mac_header(skb);
> >   
> > @@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, 
> > struct net_device *sb_dev)
> >   
> > qdisc_pkt_len_init(skb);
> >   #ifdef CONFIG_NET_CLS_ACT
> > +   mtu_check = skb_is_redirected(skb);
> > skb->tc_at_ingress = 0;
> >   # ifdef CONFIG_NET_EGRESS
> > if (static_branch_unlikely(&egress_needed_key)) {
> > +   unsigned int len_orig = skb->len;
> > +
> > skb = sch_handle_egress(skb, &rc, dev);
> > if (!skb)
> > goto out;
> > +   /* BPF-prog ran and could have changed packet size beyond MTU */
> > +   if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
> > +   mtu_check = true;
> > }
> >   # endif
> > +   /* MTU-check only happens on "last" net_device in a redirect sequence
> > +* (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
> > +* either ingress or egress to another device).
> > +*/  
> 
> Hmm, quite some overhead in fast path. 

Not really, normal fast-path already call is_skb_forwardable(). And it
already happens in existing code, ingress redirect code, which I remove
calling in patch 6.

(I have considered inlining is_skb_forwardable as a optimization for
general netstack dev_forward_skb)

> Also, won't this be checked multiple times on stacked devices? :(

I don't think it will be checked multiple times, because we have a
skb_reset_redirect() in ingress path (just after sch_handle_ingress()).

> Moreover, this missed the fact that 'real' qdiscs can have
> filters attached too, this would come after this check. Can't this instead be 
> in
> driver layer for those that really need it? I would probably only drop the 
> check
> as done in 1/6 and allow the BPF prog to do the validation if needed.

See other reply, this is likely what we will end-up with.

> > +   if (mtu_check && !is_skb_forwardable(dev, skb)) {
> > +   rc = -EMSGSIZE;
> > +   goto drop;
> > +   }
> >   #endif
> > /* If device/qdisc don't need skb->dst, release it right now while
> >  * its hot in this cpu cache.
> > @@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, 
> > struct net_device *sb_dev)
> >   
> > rc = -ENETDOWN;
> > rcu_read_unlock_bh();
> > -
> > +#ifdef CONFIG_NET_CLS_ACT
> > +drop:
> > +#endif
> > atomic_long_inc(&dev->tx_dropped);
> > kfree_skb_list(skb);
> > return rc;
> >   
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping

2020-10-08 Thread Kurt Kanzenbach
On Wed Oct 07 2020, Vladimir Oltean wrote:
> On Wed, Oct 07, 2020 at 12:39:49PM +0200, Kurt Kanzenbach wrote:
>> For instance the hellcreek switch has actually three ptp hardware
>> clocks and the time stamping can be configured to use either one of
>> them.
>
> The sja1105 also has a corrected and an uncorrected PTP clock that can
> take timestamps. Initially I had thought I'd be going to spend some time
> figuring out multi-PHC support, but now I don't see any practical reason
> to use the uncorrected PHC for anything.

Just out of curiosity: How do you implement 802.1AS then? My
understanding is that the free-running clock has to be used for the
calculation of the peer delays and such meaning there should be a way to
get access to both PHCs or having some form of cross timestamping
available.

The hellcreek switch can take cross snapshots of all three ptp clocks in
hardware for that purpose.

>> > So when you'll poll for TX timestamps, you'll receive a TX
>> > timestamp from the PHY and another one from the switch, and those will
>> > be in a race with one another, so you won't know which one is which.
>> 
>> OK. So what happens if the driver will accept to disable hardware
>> timestamping? Is there anything else that needs to be implemented? Are
>> there (good) examples?
>
> It needs to not call skb_complete_tx_timestamp() and friends.
>
> For PHY timestamping, it also needs to invoke the correct methods for RX
> and for TX, where the PHY timestamping hooks will get called. I don't
> think that DSA is compatible yet with PHY timestamping, but it is
> probably a trivial modification.

Hmm? If DSA doesn't support PHY timestamping how are other DSA drivers
dealing with it then? I'm getting really confused.

Furthermore, there is no hellcreek hardware available with timestamping
capable PHYs. How am I supposed to even test this?

For now, until there is hardware available, PHY timestamping is not
supported with the hellcreek switch.

Thanks,
Kurt


signature.asc
Description: PGP signature


Re: [PATCH net 2/2] IPv6: reply ICMP error if the first fragment don't include all headers

2020-10-08 Thread Hangbin Liu
On Wed, Oct 07, 2020 at 07:58:07AM -0700, Jakub Kicinski wrote:
> On Wed,  7 Oct 2020 11:55:02 +0800 Hangbin Liu wrote:
> > Based on RFC 8200, Section 4.5 Fragment Header:
> > 
> >   -  If the first fragment does not include all headers through an
> >  Upper-Layer header, then that fragment should be discarded and
> >  an ICMP Parameter Problem, Code 3, message should be sent to
> >  the source of the fragment, with the Pointer field set to zero.
> > 
> > As the packet may be any kind of L4 protocol, I only checked if there
> > has Upper-Layer header by pskb_may_pull(skb, offset + 1).
> > 
> > As the 1st truncated fragment may also be ICMP message, I also add
> > a check in ICMP code is_ineligible() to let fragment packet with nexthdr
> > ICMP but no ICMP header return false.
> > 
> > Signed-off-by: Hangbin Liu 
> 
> net/ipv6/icmp.c:159:65: warning: incorrect type in argument 4 (different base 
> types)
> net/ipv6/icmp.c:159:65:expected unsigned short *fragoff
> net/ipv6/icmp.c:159:65:got restricted __be16 *

Ah, Thanks for pointing out this error, I will fix it.

Regards
Hangbin


[PATCH net] sctp: fix sctp_auth_init_hmacs() error path

2020-10-08 Thread Eric Dumazet
From: Eric Dumazet 

After freeing ep->auth_hmacs we have to clear the pointer
or risk use-after-free as reported by syzbot:

BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 
[inline]
BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 
[inline]
BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
Read of size 8 at addr 8880a8ff52c0 by task syz-executor941/6874

CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
 sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
 sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
 sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203
 sctp_endpoint_put net/sctp/endpointola.c:236 [inline]
 sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183
 sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981
 sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415
 sk_common_release+0x64/0x390 net/core/sock.c:3254
 sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
 __sock_release+0xcd/0x280 net/socket.c:596
 sock_close+0x18/0x20 net/socket.c:1277
 __fput+0x285/0x920 fs/file_table.c:281
 task_work_run+0xdd/0x190 kernel/task_work.c:141
 exit_task_work include/linux/task_work.h:25 [inline]
 do_exit+0xb7d/0x29f0 kernel/exit.c:806
 do_group_exit+0x125/0x310 kernel/exit.c:903
 __do_sys_exit_group kernel/exit.c:914 [inline]
 __se_sys_exit_group kernel/exit.c:912 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43f278
Code: Bad RIP value.
RSP: 002b:7fffe0995c38 EFLAGS: 0246 ORIG_RAX: 00e7
RAX: ffda RBX:  RCX: 0043f278
RDX:  RSI: 003c RDI: 
RBP: 004bf068 R08: 00e7 R09: ffd0
R10: 2000 R11: 0246 R12: 0001
R13: 006d1180 R14:  R15: 

Allocated by task 6874:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
 kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554
 kmalloc include/linux/slab.h:554 [inline]
 kmalloc_array include/linux/slab.h:593 [inline]
 kcalloc include/linux/slab.h:605 [inline]
 sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464
 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
 sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
 __sys_setsockopt+0x2db/0x610 net/socket.c:2132
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 6874:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
 __cache_free mm/slab.c:3422 [inline]
 kfree+0x10e/0x2b0 mm/slab.c:3760
 sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline]
 sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
 sctp_auth_init_hmacs net/sctp/auth.c:496 [inline]
 sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454
 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
 sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
 __sys_setsockopt+0x2db/0x610 net/socket.c:2132
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1f485649f529 ("[SCTP]: Implement SCTP-AUTH internals")
Signed-off-by: Eric Dumazet 
Cc: Vlad Yasevich 
Cc: Neil Horman 
Cc: Marcelo Ricardo Leitner 
---
 net/sctp/auth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 
9e289c770574f6009b1e854ee4b9b3d5f942d4d5..7e59d8a18f3e40368eb911b63ac9f514b1dcf095
 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -494,6 +494,7 @@ int sctp_auth_init_hmacs(struct sctp_endpoint *ep, gfp_t 
gfp)
 out_err:
/* Clean up any successful allocations */
sctp_auth_destroy_hmacs(ep->auth_hmacs);
+   ep->auth_hmacs = NULL;
return -ENOM

Re: [Patch net] tipc: fix the skb_unshare() in tipc_buf_append()

2020-10-08 Thread Xin Long
On Thu, Oct 8, 2020 at 12:12 PM Cong Wang  wrote:
>
> skb_unshare() drops a reference count on the old skb unconditionally,
> so in the failure case, we end up freeing the skb twice here.
> And because the skb is allocated in fclone and cloned by caller
> tipc_msg_reassemble(), the consequence is actually freeing the
> original skb too, thus triggered the UAF by syzbot.
Do you mean:
frag = skb_clone(skb, GFP_ATOMIC);
frag = skb_unshare(frag) will free the 'skb' too?

>
> Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy().
>
> Fixes: ff48b6222e65 ("tipc: use skb_unshare() instead in tipc_buf_append()")
> Reported-and-tested-by: syzbot+e96a7ba46281824cc...@syzkaller.appspotmail.com
> Cc: Xin Long 
> Cc: Jon Maloy 
> Cc: Ying Xue 
> Signed-off-by: Cong Wang 
> ---
>  net/tipc/msg.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index 52e93ba4d8e2..681224401871 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -150,7 +150,8 @@ int tipc_buf_append(struct sk_buff **headbuf, struct 
> sk_buff **buf)
> if (fragid == FIRST_FRAGMENT) {
> if (unlikely(head))
> goto err;
> -   frag = skb_unshare(frag, GFP_ATOMIC);
> +   if (skb_cloned(frag))
> +   frag = skb_copy(frag, GFP_ATOMIC);
> if (unlikely(!frag))
> goto err;
> head = *headbuf = frag;
> --
> 2.28.0
>


KMSAN: uninit-value in ieee80211_sta_tx_notify

2020-10-08 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:5edb1df2 kmsan: drop the _nosanitize string functions
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1279b1c050
kernel config:  https://syzkaller.appspot.com/x/.config?x=4991d22eb136035c
dashboard link: https://syzkaller.appspot.com/bug?extid=beae481026cb095addca
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+beae481026cb095ad...@syzkaller.appspotmail.com

=
BUG: KMSAN: uninit-value in ieee80211_ac_from_tid 
net/mac80211/ieee80211_i.h:2034 [inline]
BUG: KMSAN: uninit-value in ieee80211_sta_tx_wmm_ac_notify 
net/mac80211/mlme.c:2490 [inline]
BUG: KMSAN: uninit-value in ieee80211_sta_tx_notify+0x4be/0xda0 
net/mac80211/mlme.c:2522
CPU: 1 PID: 8613 Comm: kworker/u4:1 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Workqueue: phy18 ieee80211_beacon_connection_loss_work
Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:201
 ieee80211_ac_from_tid net/mac80211/ieee80211_i.h:2034 [inline]
 ieee80211_sta_tx_wmm_ac_notify net/mac80211/mlme.c:2490 [inline]
 ieee80211_sta_tx_notify+0x4be/0xda0 net/mac80211/mlme.c:2522
 __ieee80211_tx_status+0x2d3b/0x4450 net/mac80211/status.c:1013
 ieee80211_tx_status+0x223/0x270 net/mac80211/status.c:1129
 ieee80211_tasklet_handler+0x34e/0x3c0 net/mac80211/main.c:239
 tasklet_action_common+0x416/0x6a0 kernel/softirq.c:562
 tasklet_action+0x30/0x40 kernel/softirq.c:580
 __do_softirq+0x2ea/0x7f5 kernel/softirq.c:299
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:23 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:50 [inline]
 do_softirq_own_stack+0x7c/0xa0 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:344 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:196
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 __ieee80211_tx_skb_tid_band+0x28e/0x390 net/mac80211/tx.c:5352
 ieee80211_tx_skb_tid net/mac80211/ieee80211_i.h:2003 [inline]
 ieee80211_tx_skb net/mac80211/ieee80211_i.h:2012 [inline]
 ieee80211_send_nullfunc+0x59a/0x6d0 net/mac80211/mlme.c:1095
 ieee80211_mgd_probe_ap_send+0x897/0xb40 net/mac80211/mlme.c:2593
 ieee80211_mgd_probe_ap+0x6b3/0x760 net/mac80211/mlme.c:2669
 ieee80211_beacon_connection_loss_work+0x156/0x270 net/mac80211/mlme.c:2788
 process_one_work+0x1688/0x2140 kernel/workqueue.c:2269
 worker_thread+0x10bc/0x2730 kernel/workqueue.c:2415
 kthread+0x551/0x590 kernel/kthread.c:293
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Uninit was created at:
 kmsan_save_stack_with_flags+0x3c/0x90 mm/kmsan/kmsan.c:143
 kmsan_internal_alloc_meta_for_pages mm/kmsan/kmsan_shadow.c:268 [inline]
 kmsan_alloc_page+0xc5/0x1a0 mm/kmsan/kmsan_shadow.c:292
 __alloc_pages_nodemask+0xf34/0x1120 mm/page_alloc.c:4927
 __alloc_pages include/linux/gfp.h:509 [inline]
 __alloc_pages_node include/linux/gfp.h:522 [inline]
 alloc_pages_node include/linux/gfp.h:536 [inline]
 __page_frag_cache_refill mm/page_alloc.c:5002 [inline]
 page_frag_alloc+0x35b/0x880 mm/page_alloc.c:5032
 __netdev_alloc_skb+0xc3d/0xc90 net/core/skbuff.c:456
 netdev_alloc_skb include/linux/skbuff.h:2821 [inline]
 dev_alloc_skb include/linux/skbuff.h:2834 [inline]
 __ieee80211_beacon_get+0x37e3/0x4df0 net/mac80211/tx.c:4819
 ieee80211_beacon_get_tim+0x109/0x800 net/mac80211/tx.c:4933
 ieee80211_beacon_get include/net/mac80211.h:4845 [inline]
 mac80211_hwsim_beacon_tx+0x1c3/0xb80 drivers/net/wireless/mac80211_hwsim.c:1676
 __iterate_interfaces net/mac80211/util.c:737 [inline]
 ieee80211_iterate_active_interfaces_atomic+0x40a/0x610 net/mac80211/util.c:773
 mac80211_hwsim_beacon+0x11d/0x2e0 drivers/net/wireless/mac80211_hwsim.c:1717
 __run_hrtimer+0x7cd/0xf00 kernel/time/hrtimer.c:1524
 __hrtimer_run_queues kernel/time/hrtimer.c:1588 [inline]
 hrtimer_run_softirq+0x3bf/0x690 kernel/time/hrtimer.c:1605
 __do_softirq+0x2ea/0x7f5 kernel/softirq.c:299
=


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


[PATCH 1/6] net: tlan: Fix typo abitrary

2020-10-08 Thread Naoki Hayama
Fix comment typo.
s/abitrary/arbitrary/

Signed-off-by: Naoki Hayama 
---
 drivers/net/ethernet/ti/tlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/tlan.c b/drivers/net/ethernet/ti/tlan.c
index 76a342ea3797..e922258cae3f 100644
--- a/drivers/net/ethernet/ti/tlan.c
+++ b/drivers/net/ethernet/ti/tlan.c
@@ -2511,7 +2511,7 @@ static void tlan_phy_power_down(struct net_device *dev)
}
 
/* Wait for 50 ms and powerup
-* This is abitrary.  It is intended to make sure the
+* This is arbitrary.  It is intended to make sure the
 * transceiver settles.
 */
tlan_set_timer(dev, msecs_to_jiffies(50), TLAN_TIMER_PHY_PUP);
-- 
2.17.1


[PATCH 0/6] spelling: Fix typo related to "arbitrary"

2020-10-08 Thread Naoki Hayama
I found some typos related to "arbitrary".

s/abitrary/arbitrary/
s/arbitary/arbitrary/

This series fixes them.

These typos have been reported in the past in other codes, but
correction 'abitrary||arbitrary' wasn't added to scripts/spelling.txt.
Therefore, PATCH #6 adds it to spelling.txt.


Naoki Hayama (6):
  net: tlan: Fix typo abitrary
  dt-bindings: pinctrl: qcom: Fix typo abitrary
  dt-bindings: pinctrl: sirf: Fix typo abitrary
  ALSA: hdspm: Fix typo arbitary
  drm/etnaviv: Fix typo arbitary
  scripts/spelling.txt: Add arbitrary correction

 Documentation/devicetree/bindings/pinctrl/pinctrl-atlas7.txt| 2 +-
 .../devicetree/bindings/pinctrl/qcom,ipq4019-pinctrl.txt| 2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c   | 2 +-
 drivers/net/ethernet/ti/tlan.c  | 2 +-
 scripts/spelling.txt| 1 +
 sound/pci/rme9652/hdspm.c   | 2 +-
 6 files changed, 6 insertions(+), 5 deletions(-)

-- 
2.17.1


Re: [PATCH net-next v2 3/7] ethtool: trim policy tables

2020-10-08 Thread Eric Dumazet



On 10/6/20 12:07 AM, Jakub Kicinski wrote:
> Since ethtool uses strict attribute validation there's no need
> to initialize all attributes in policy tables. 0 is NLA_UNSPEC
> which is going to be rejected. Remove the NLA_REJECTs.
> 
> Similarly attributes above maxattrs are rejected, so there's
> no need to always size the policy tables to ETHTOOL_A_..._MAX.
> 

This implies that all policy tables must be 'complete'.

strset_stringsets_policy[] for example is :

static const struct nla_policy strset_stringsets_policy[] = {
[ETHTOOL_A_STRINGSETS_STRINGSET]= { .type = NLA_NESTED },
};

So when later strset_parse_request() does :

req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY];

We have an out-of-bound access since ETHTOOL_A_STRSET_COUNTS_ONLY > 
ETHTOOL_A_STRINGSETS_STRINGSET

Not sure what was the expected type for this attribute, the kernel
only looks at its presence, not its value.


Re: [PATCH net-next v2 3/7] ethtool: trim policy tables

2020-10-08 Thread Johannes Berg
On Thu, 2020-10-08 at 11:12 +0200, Eric Dumazet wrote:
> 
> On 10/6/20 12:07 AM, Jakub Kicinski wrote:
> > Since ethtool uses strict attribute validation there's no need
> > to initialize all attributes in policy tables. 0 is NLA_UNSPEC
> > which is going to be rejected. Remove the NLA_REJECTs.
> > 
> > Similarly attributes above maxattrs are rejected, so there's
> > no need to always size the policy tables to ETHTOOL_A_..._MAX.
> > 
> 
> This implies that all policy tables must be 'complete'.
> 
> strset_stringsets_policy[] for example is :
> 
> static const struct nla_policy strset_stringsets_policy[] = {
> [ETHTOOL_A_STRINGSETS_STRINGSET]= { .type = NLA_NESTED },
> };
> 
> So when later strset_parse_request() does :
> 
> req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY];
> 
> We have an out-of-bound access since ETHTOOL_A_STRSET_COUNTS_ONLY > 
> ETHTOOL_A_STRINGSETS_STRINGSET

Yeah, Leon Romanovsky reported actually running into this yesterday, and
I sent a fix :-)

> Not sure what was the expected type for this attribute, the kernel
> only looks at its presence, not its value.

It was NLA_FLAG, but never actually in the policy, so you could never
even successfully use it ...

Here was the fix
https://lore.kernel.org/netdev/20201007125348.a74389e18168.Ieab7a871e27b9698826e75dc9e825e4ddbc852b1@changeid/

johannes



Re: [PATCH net-next v2 3/7] ethtool: trim policy tables

2020-10-08 Thread Johannes Berg
On Thu, 2020-10-08 at 11:13 +0200, Johannes Berg wrote:

> > This implies that all policy tables must be 'complete'.

Also, yes they had to be complete already, perhaps *except* for NLA_FLAG
like this below use ...

> > So when later strset_parse_request() does :
> > 
> > req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY];
> > 
> Here was the fix
> https://lore.kernel.org/netdev/20201007125348.a74389e18168.Ieab7a871e27b9698826e75dc9e825e4ddbc852b1@changeid/

Sorry, wrong link

https://lore.kernel.org/netdev/20201007125348.a0b250308599.Ie9b429e276d064f28ce12db01fffa430e5c770e0@changeid/

johannes



[net-next PATCH v3] net: dsa: rtl8366rb: Roof MTU for switch

2020-10-08 Thread Linus Walleij
The MTU setting for this DSA switch is global so we need
to keep track of the MTU set for each port, then as soon
as any MTU changes, roof the MTU to the biggest common
denominator and poke that into the switch MTU setting.

To achieve this we need a per-chip-variant state container
for the RTL8366RB to use for the RTL8366RB-specific
stuff. Other SMI switches does seem to have per-port
MTU setting capabilities.

Fixes: 5f4a8ef384db ("net: dsa: rtl8366rb: Support setting MTU")
Signed-off-by: Linus Walleij 
---
ChangeLog v2->v3:
- Fix the reverse-christmas-tree properly by also making it
  compile :/
ChangeLog v1->v2:
- Fix a reverse-christmas-tree variable order issue.
---
 drivers/net/dsa/realtek-smi-core.c |  3 ++-
 drivers/net/dsa/realtek-smi-core.h |  2 ++
 drivers/net/dsa/rtl8366rb.c| 38 ++
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/realtek-smi-core.c 
b/drivers/net/dsa/realtek-smi-core.c
index fae188c60191..8e49d4f85d48 100644
--- a/drivers/net/dsa/realtek-smi-core.c
+++ b/drivers/net/dsa/realtek-smi-core.c
@@ -394,9 +394,10 @@ static int realtek_smi_probe(struct platform_device *pdev)
var = of_device_get_match_data(dev);
np = dev->of_node;
 
-   smi = devm_kzalloc(dev, sizeof(*smi), GFP_KERNEL);
+   smi = devm_kzalloc(dev, sizeof(*smi) + var->chip_data_sz, GFP_KERNEL);
if (!smi)
return -ENOMEM;
+   smi->chip_data = (void *)smi + sizeof(*smi);
smi->map = devm_regmap_init(dev, NULL, smi,
&realtek_smi_mdio_regmap_config);
if (IS_ERR(smi->map)) {
diff --git a/drivers/net/dsa/realtek-smi-core.h 
b/drivers/net/dsa/realtek-smi-core.h
index 6f2dab7e33d6..bc7bd47fb037 100644
--- a/drivers/net/dsa/realtek-smi-core.h
+++ b/drivers/net/dsa/realtek-smi-core.h
@@ -71,6 +71,7 @@ struct realtek_smi {
int vlan4k_enabled;
 
charbuf[4096];
+   void*chip_data; /* Per-chip extra variant data */
 };
 
 /**
@@ -111,6 +112,7 @@ struct realtek_smi_variant {
unsigned int clk_delay;
u8 cmd_read;
u8 cmd_write;
+   size_t chip_data_sz;
 };
 
 /* SMI core calls */
diff --git a/drivers/net/dsa/rtl8366rb.c b/drivers/net/dsa/rtl8366rb.c
index 053bf5041f8d..28f510a580be 100644
--- a/drivers/net/dsa/rtl8366rb.c
+++ b/drivers/net/dsa/rtl8366rb.c
@@ -311,6 +311,13 @@
 #define RTL8366RB_GREEN_FEATURE_TX BIT(0)
 #define RTL8366RB_GREEN_FEATURE_RX BIT(2)
 
+/**
+ * struct rtl8366rb - RTL8366RB-specific data
+ */
+struct rtl8366rb {
+   unsigned int max_mtu[RTL8366RB_NUM_PORTS];
+};
+
 static struct rtl8366_mib_counter rtl8366rb_mib_counters[] = {
{ 0,  0, 4, "IfInOctets"},
{ 0,  4, 4, "EtherStatsOctets"  },
@@ -712,6 +719,7 @@ static int rtl8366rb_setup(struct dsa_switch *ds)
 {
struct realtek_smi *smi = ds->priv;
const u16 *jam_table;
+   struct rtl8366rb *rb;
u32 chip_ver = 0;
u32 chip_id = 0;
int jam_size;
@@ -719,6 +727,8 @@ static int rtl8366rb_setup(struct dsa_switch *ds)
int ret;
int i;
 
+   rb = smi->chip_data;
+
ret = regmap_read(smi->map, RTL8366RB_CHIP_ID_REG, &chip_id);
if (ret) {
dev_err(smi->dev, "unable to read chip id\n");
@@ -871,6 +881,9 @@ static int rtl8366rb_setup(struct dsa_switch *ds)
 RTL8366RB_SGCR_MAX_LENGTH_1536);
if (ret)
return ret;
+   for (i = 0; i < RTL8366RB_NUM_PORTS; i++)
+   /* layer 2 size, see rtl8366rb_change_mtu() */
+   rb->max_mtu[i] = 1532;
 
/* Enable learning for all ports */
ret = regmap_write(smi->map, RTL8366RB_SSCR0, 0);
@@ -1112,20 +1125,36 @@ rtl8366rb_port_disable(struct dsa_switch *ds, int port)
 static int rtl8366rb_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
 {
struct realtek_smi *smi = ds->priv;
+   struct rtl8366rb *rb;
+   unsigned int max_mtu;
u32 len;
+   int i;
+
+   /* Cache the per-port MTU setting */
+   rb = smi->chip_data;
+   rb->max_mtu[port] = new_mtu;
 
-   /* The first setting, 1522 bytes, is max IP packet 1500 bytes,
+   /* Roof out the MTU for the entire switch to the greatest
+* common denominator: the biggest set for any one port will
+* be the biggest MTU for the switch.
+*
+* The first setting, 1522 bytes, is max IP packet 1500 bytes,
 * plus ethernet header, 1518 bytes, plus CPU tag, 4 bytes.
 * This function should consider the parameter an SDU, so the
 * MTU passed for this setting is 1518 bytes. The same logic
 * of subtracting the DSA tag of 4 bytes apply to the other
 * settings.
 */
-   if (new_mtu <= 1518)
+   max_mtu = 1518;
+   for (i = 0; i < RTL8

Re: [PATCH net-next 11/15] sctp: add udphdr to overhead when udp_port is set

2020-10-08 Thread Xin Long
On Tue, Oct 6, 2020 at 3:01 AM Marcelo Ricardo Leitner
 wrote:
>
> On Sat, Oct 03, 2020 at 08:24:34PM +0800, Xin Long wrote:
> > On Sat, Oct 3, 2020 at 7:23 PM Xin Long  wrote:
> > >
> > > On Sat, Oct 3, 2020 at 4:12 PM Xin Long  wrote:
> > > >
> > > > On Sat, Oct 3, 2020 at 12:08 PM Marcelo Ricardo Leitner
> > > >  wrote:
> > > > >
> > > > > On Wed, Sep 30, 2020 at 03:00:42AM +0800, kernel test robot wrote:
> > > > > > Hi Xin,
> > > > > >
> > > > > > Thank you for the patch! Yet something to improve:
> > > > >
> > > > > I wonder how are you planning to fix this. It is quite entangled.
> > > > > This is not performance critical. Maybe the cleanest way out is to
> > > > > move it to a .c file.
> > > > >
> > > > > Adding a
> > > > > #if defined(CONFIG_IP_SCTP) || defined(CONFIG_IP_SCTP_MODULE)
> > > > > in there doesn't seem good.
> > > > >
> > > > > >In file included from include/net/sctp/checksum.h:27,
> > > > > > from net/netfilter/nf_nat_proto.c:16:
> > > > > >include/net/sctp/sctp.h: In function 'sctp_mtu_payload':
> > > > > > >> include/net/sctp/sctp.h:583:31: error: 'struct net' has no 
> > > > > > >> member named 'sctp'; did you mean 'ct'?
> > > > > >  583 |   if (sock_net(&sp->inet.sk)->sctp.udp_port)
> > > > > >  |   ^~~~
> > > > > >  |   ct
> > > > > >
> > > > Here is actually another problem, I'm still thinking how to fix it.
> > > >
> > > > Now sctp_mtu_payload() returns different value depending on
> > > > net->sctp.udp_port. but net->sctp.udp_port can be changed by
> > > > "sysctl -w" anytime. so:
>
> Good point.
>
> > > >
> > > > In sctp_packet_config() it gets overhead/headroom by calling
> > > > sctp_mtu_payload(). When 'udp_port' is 0, it's IP+MAC header
> > > > size. Then if 'udp_port' is changed to 9899 by 'sysctl -w',
> > > > udphdr will also be added to the packet in sctp_v4_xmit(),
> > > > and later the headroom may not be enough for IP+MAC headers.
> > > >
> > > > I'm thinking to add sctp_sock->udp_port, and it'll be set when
> > > > the sock is created with net->udp_port. but not sure if we should
> > > > update sctp_sock->udp_port with  net->udp_port when sending packets?
>
> I don't think so,
>
> > > something like:
> ...
> > diff --git a/net/sctp/output.c b/net/sctp/output.c
> > index 6614c9fdc51e..c96b13ec72f4 100644
> > --- a/net/sctp/output.c
> > +++ b/net/sctp/output.c
> > @@ -91,6 +91,14 @@ void sctp_packet_config(struct sctp_packet *packet,
> > __u32 vtag,
> > if (asoc) {
> > sk = asoc->base.sk;
> > sp = sctp_sk(sk);
> > +
> > +   if (unlikely(sp->udp_port != sock_net(sk)->sctp.udp_port)) {
>
> RFC6951 has:
>
> 6.1.  Get or Set the Remote UDP Encapsulation Port Number
>   (SCTP_REMOTE_UDP_ENCAPS_PORT)
> ...
>sue_assoc_id:  This parameter is ignored for one-to-one style
>   sockets.  For one-to-many style sockets, the application may fill
>   in an association identifier or SCTP_FUTURE_ASSOC for this query.
>   It is an error to use SCTP_{CURRENT|ALL}_ASSOC in sue_assoc_id.
>
>sue_address:  This specifies which address is of interest.  If a
>   wildcard address is provided, it applies only to future paths.
>
> So I'm not seeing a reason to have a system wide knob that takes
> effect in run time like this.
> Enable, start apps, and they keep behaving as initially configured.
> Need to disable? Restart the apps/sockets.
>
> Thoughts?
Right, not to update it on tx path makes more sense. Thanks.

>
> > +   __u16 port = sock_net(sk)->sctp.udp_port;
> > +
> > +   if (!sp->udp_port || !port)
> > +   sctp_assoc_update_frag_point(asoc);
> > +   sp->udp_port = port;
> > +   }
> > }


Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping

2020-10-08 Thread Vladimir Oltean
On Thu, Oct 08, 2020 at 10:34:11AM +0200, Kurt Kanzenbach wrote:
> On Wed Oct 07 2020, Vladimir Oltean wrote:
> > On Wed, Oct 07, 2020 at 12:39:49PM +0200, Kurt Kanzenbach wrote:
> >> For instance the hellcreek switch has actually three ptp hardware
> >> clocks and the time stamping can be configured to use either one of
> >> them.
> >
> > The sja1105 also has a corrected and an uncorrected PTP clock that can
> > take timestamps. Initially I had thought I'd be going to spend some time
> > figuring out multi-PHC support, but now I don't see any practical reason
> > to use the uncorrected PHC for anything.
> 
> Just out of curiosity: How do you implement 802.1AS then? My
> understanding is that the free-running clock has to be used for the

Has to be? I couldn't find that wording in IEEE 802.1AS-2011.

> calculation of the peer delays and such meaning there should be a way to
> get access to both PHCs or having some form of cross timestamping
> available.
> 
> The hellcreek switch can take cross snapshots of all three ptp clocks in
> hardware for that purpose.

Well, at the end of the day, all the other TSN offloads (tc-taprio,
tc-gate) will still have to use the synchronized PTP clock, so what
we're doing is we're simply letting that clock be synchronized by ptp4l.

> >> > So when you'll poll for TX timestamps, you'll receive a TX
> >> > timestamp from the PHY and another one from the switch, and those will
> >> > be in a race with one another, so you won't know which one is which.
> >> 
> >> OK. So what happens if the driver will accept to disable hardware
> >> timestamping? Is there anything else that needs to be implemented? Are
> >> there (good) examples?
> >
> > It needs to not call skb_complete_tx_timestamp() and friends.
> >
> > For PHY timestamping, it also needs to invoke the correct methods for RX
> > and for TX, where the PHY timestamping hooks will get called. I don't
> > think that DSA is compatible yet with PHY timestamping, but it is
> > probably a trivial modification.
> 
> Hmm? If DSA doesn't support PHY timestamping how are other DSA drivers
> dealing with it then? I'm getting really confused.

They aren't dealing with it, of course.

> Furthermore, there is no hellcreek hardware available with timestamping
> capable PHYs. How am I supposed to even test this?
> 
> For now, until there is hardware available, PHY timestamping is not
> supported with the hellcreek switch.

I was just pointing out that this is something you'll certainly have to
change if somebody will want PHY timestamping.

Even without hardware, you _could_ probably test that DSA is doing the
right thing by simply adding the PTP timestamping ops to a PHY driver
that you own, and inject dummy timestamps. The expectation becomes that
user space gets those dummy timestamps, and not the ones emitted by your
switch.


RE: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Parav Pandit



> From: Leon Romanovsky 
> Sent: Thursday, October 8, 2020 1:15 PM
> 
> On Thu, Oct 08, 2020 at 07:14:17AM +, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky 
> > > Sent: Thursday, October 8, 2020 10:56 AM
> > >
> > > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Pierre-Louis Bossart
> > > > > 
> > > > > Sent: Thursday, October 8, 2020 3:20 AM
> > > > >
> > > > >
> > > > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > > > >> -Original Message-
> > > > > >> From: Pierre-Louis Bossart
> > > > > >> 
> > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > > > >> To: Ertman, David M ; Parav Pandit
> > > > > >> ; Leon Romanovsky 
> > > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com;
> > > > > >> ti...@suse.de; netdev@vger.kernel.org;
> > > > > >> ranjani.sridha...@linux.intel.com;
> > > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org;
> > > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe
> > > > > >> ; gre...@linuxfoundation.org;
> > > > > >> k...@kernel.org; Williams, Dan J ;
> > > > > >> Saleem, Shiraz ;
> > > > > >> da...@davemloft.net; Patil, Kiran 
> > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >  Below is most simple, intuitive and matching with core APIs
> > > > >  for name and design pattern wise.
> > > > >  init()
> > > > >  {
> > > > >   err = ancillary_device_initialize();
> > > > >   if (err)
> > > > >   return ret;
> > > > > 
> > > > >   err = ancillary_device_add();
> > > > >   if (ret)
> > > > >   goto err_unwind;
> > > > > 
> > > > >   err = some_foo();
> > > > >   if (err)
> > > > >   goto err_foo;
> > > > >   return 0;
> > > > > 
> > > > >  err_foo:
> > > > >   ancillary_device_del(adev);
> > > > >  err_unwind:
> > > > >   ancillary_device_put(adev->dev);
> > > > >   return err;
> > > > >  }
> > > > > 
> > > > >  cleanup()
> > > > >  {
> > > > >   ancillary_device_de(adev);
> > > > >   ancillary_device_put(adev);
> > > > >   /* It is common to have a one wrapper for this as
> > > > >  ancillary_device_unregister().
> > > > >    * This will match with core device_unregister() that has
> > > > >  precise documentation.
> > > > >    * but given fact that init() code need proper error
> > > > >  unwinding, like above,
> > > > >    * it make sense to have two APIs, and no need to export
> > > > >  another symbol for unregister().
> > > > >    * This pattern is very easy to audit and code.
> > > > >    */
> > > > >  }
> > > > > >>>
> > > > > >>> I like this flow +1
> > > > > >>>
> > > > > >>> But ... since the init() function is performing both
> > > > > >>> device_init and device_add - it should probably be called
> > > > > >>> ancillary_device_register, and we are back to a single
> > > > > >>> exported API for both register and unregister.
> > > > > >>
> > > > > >> Kind reminder that we introduced the two functions to allow
> > > > > >> the caller to know if it needed to free memory when
> > > > > >> initialize() fails, and it didn't need to free memory when
> > > > > >> add() failed since
> > > > > >> put_device() takes care of it. If you have a single init()
> > > > > >> function it's impossible to know which behavior to select on error.
> > > > > >>
> > > > > >> I also have a case with SoundWire where it's nice to first
> > > > > >> initialize, then set some data and then add.
> > > > > >>
> > > > > >
> > > > > > The flow as outlined by Parav above does an initialize as the
> > > > > > first step, so every error path out of the function has to do
> > > > > > a put_device(), so you would never need to manually free the
> > > > > > memory in
> > > > > the setup function.
> > > > > > It would be freed in the release call.
> > > > >
> > > > > err = ancillary_device_initialize(); if (err)
> > > > >   return ret;
> > > > >
> > > > > where is the put_device() here? if the release function does any
> > > > > sort of kfree, then you'd need to do it manually in this case.
> > > > Since device_initialize() failed, put_device() cannot be done here.
> > > > So yes, pseudo code should have shown, if (err) {
> > > > kfree(adev);
> > > > return err;
> > > > }
> > > >
> > > > If we just want to follow register(), unregister() pattern,
> > > >
> > > > Than,
> > > >
> > > > ancillar_device_register() should be,
> > > >
> > > > /**
> > > >  * ancillar_device_register() - register an ancillary device
> > > >  * NOTE: __never directly free @adev after calling this function,
> > > > even if it returned
> > > >  * an error. Always use ancillary_device_put() to give up the
> > > > reference
> > > initialized by this function.
> > > >  * This note matches with the core and caller knows exactly what
> > > > to be
> > > done.
> > > >  */
> > > >

Re: [PATCH net 2/2] IPv6: reply ICMP error if the first fragment don't include all headers

2020-10-08 Thread Eric Dumazet



On 10/8/20 10:30 AM, Hangbin Liu wrote:
> Hi Eric,
> 
> Thanks for the comments. I should add "RFC" in subject next time for the
> uncertain fix patch.
> 
> On Wed, Oct 07, 2020 at 11:35:41AM +0200, Eric Dumazet wrote:
>>
>>
>> On 10/7/20 5:55 AM, Hangbin Liu wrote:
>>
>>> kfree_skb(skb);
>>> @@ -282,6 +285,21 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff 
>>> *skb, struct net_device *dev,
>>> }
>>> }
>>>  
>>> +   /* RFC 8200, Section 4.5 Fragment Header:
>>> +* If the first fragment does not include all headers through an
>>> +* Upper-Layer header, then that fragment should be discarded and
>>> +* an ICMP Parameter Problem, Code 3, message should be sent to
>>> +* the source of the fragment, with the Pointer field set to zero.
>>> +*/
>>> +   nexthdr = hdr->nexthdr;
>>> +   offset = ipv6_skip_exthdr(skb, skb_transport_offset(skb), &nexthdr, 
>>> &frag_off);
>>> +   if (frag_off == htons(IP6_MF) && !pskb_may_pull(skb, offset + 1)) {
>>> +   __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>>> +   icmpv6_param_prob(skb, ICMPV6_HDR_INCOMP, 0);
>>> +   rcu_read_unlock();
>>> +   return NULL;
>>> +   }
>>> +
>>> rcu_read_unlock();
>>>  
>>> /* Must drop socket now because of tproxy. */
>>>
>>
>> Ouch, this is quite a buggy patch.
>>
>> I doubt we want to add yet another ipv6_skip_exthdr() call in IPv6 fast path.
>>
>> Surely the presence of NEXTHDR_FRAGMENT is already tested elsewhere ?
> 
> Would you like to help point where NEXTHDR_FRAGMENT was tested before IPv6
> defragment?
I think we have to ask the question : Should routers enforce the rule, or
only end points ?

End points must handle NEXTHDR_FRAGMENT, in ipv6_frag_rcv()


> 
>>
>> Also, ipv6_skip_exthdr() does not pull anything in skb->head, it would be 
>> strange
>> to force a pull of hundreds of bytes just because you want to check if an 
>> extra byte is there,
>> if the packet could be forwarded as is, without additional memory 
>> allocations.
>>
>> Testing skb->len should be more than enough at this stage.
> 
> Ah, yes, I shouldn't call pskb_may_pull here.
>>
>> Also ipv6_skip_exthdr() can return an error.
> 
> it returns -1 as error, If we tested it by (offset + 1 > skb->len), does
> that count as an error handler?

If there is an error, then you want to send the ICMP, right ?

The (offset + 1) expression will become 0, and surely the test will be false,
so you wont send the ICMP...

> 
> Thanks
> Hangbin
> 


[PATCHv2 net-next 02/17] udp6: move the mss check after udp gso tunnel processing

2020-10-08 Thread Xin Long
For some protocol's gso, like SCTP, it's using GSO_BY_FRAGS for
gso_size. When using UDP to encapsulate its packet, it will
return error in udp6_ufo_fragment() as skb->len < gso_size,
and it will never go to the gso tunnel processing.

So we should move this check after udp gso tunnel processing,
the same as udp4_ufo_fragment() does. While at it, also tidy
the variables up.

Signed-off-by: Xin Long 
---
 net/ipv6/udp_offload.c | 154 -
 1 file changed, 76 insertions(+), 78 deletions(-)

diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 584157a..3c5ec8e 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -17,96 +17,94 @@
 static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 netdev_features_t features)
 {
+   u8 nexthdr, frag_hdr_sz = sizeof(struct frag_hdr);
+   unsigned int unfrag_ip6hlen, unfrag_len, mss;
struct sk_buff *segs = ERR_PTR(-EINVAL);
-   unsigned int mss;
-   unsigned int unfrag_ip6hlen, unfrag_len;
-   struct frag_hdr *fptr;
+   const struct ipv6hdr *ipv6h;
u8 *packet_start, *prevhdr;
-   u8 nexthdr;
-   u8 frag_hdr_sz = sizeof(struct frag_hdr);
+   struct frag_hdr *fptr;
+   int tnl_hlen, err;
+   struct udphdr *uh;
__wsum csum;
-   int tnl_hlen;
-   int err;
 
-   mss = skb_shinfo(skb)->gso_size;
-   if (unlikely(skb->len <= mss))
+   if (skb->encapsulation &&
+   (skb_shinfo(skb)->gso_type &
+(SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) {
+   segs = skb_udp_tunnel_segment(skb, features, true);
goto out;
+   }
 
-   if (skb->encapsulation && skb_shinfo(skb)->gso_type &
-   (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
-   segs = skb_udp_tunnel_segment(skb, features, true);
-   else {
-   const struct ipv6hdr *ipv6h;
-   struct udphdr *uh;
+   if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4)))
+   goto out;
 
-   if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | 
SKB_GSO_UDP_L4)))
-   goto out;
+   if (!pskb_may_pull(skb, sizeof(struct udphdr)))
+   goto out;
 
-   if (!pskb_may_pull(skb, sizeof(struct udphdr)))
-   goto out;
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+   return __udp_gso_segment(skb, features);
 
-   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
-   return __udp_gso_segment(skb, features);
-
-   /* Do software UFO. Complete and fill in the UDP checksum as HW 
cannot
-* do checksum of UDP packets sent as multiple IP fragments.
-*/
-
-   uh = udp_hdr(skb);
-   ipv6h = ipv6_hdr(skb);
-
-   uh->check = 0;
-   csum = skb_checksum(skb, 0, skb->len, 0);
-   uh->check = udp_v6_check(skb->len, &ipv6h->saddr,
- &ipv6h->daddr, csum);
-   if (uh->check == 0)
-   uh->check = CSUM_MANGLED_0;
-
-   skb->ip_summed = CHECKSUM_UNNECESSARY;
-
-   /* If there is no outer header we can fake a checksum offload
-* due to the fact that we have already done the checksum in
-* software prior to segmenting the frame.
-*/
-   if (!skb->encap_hdr_csum)
-   features |= NETIF_F_HW_CSUM;
-
-   /* Check if there is enough headroom to insert fragment header. 
*/
-   tnl_hlen = skb_tnl_header_len(skb);
-   if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) {
-   if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz))
-   goto out;
-   }
+   mss = skb_shinfo(skb)->gso_size;
+   if (unlikely(skb->len <= mss))
+   goto out;
 
-   /* Find the unfragmentable header and shift it left by 
frag_hdr_sz
-* bytes to insert fragment header.
-*/
-   err = ip6_find_1stfragopt(skb, &prevhdr);
-   if (err < 0)
-   return ERR_PTR(err);
-   unfrag_ip6hlen = err;
-   nexthdr = *prevhdr;
-   *prevhdr = NEXTHDR_FRAGMENT;
-   unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +
-unfrag_ip6hlen + tnl_hlen;
-   packet_start = (u8 *) skb->head + SKB_GSO_CB(skb)->mac_offset;
-   memmove(packet_start-frag_hdr_sz, packet_start, unfrag_len);
-
-   SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz;
-   skb->mac_header -= frag_hdr_sz;
-   skb->network_header -= frag_hdr_sz;
-
-   fptr = (struct frag_hdr *)(skb_network_header(skb) + 

[PATCHv2 net-next 01/17] udp: check udp sock encap_type in __udp_lib_err

2020-10-08 Thread Xin Long
There is a chance that __udp4/6_lib_lookup() returns a udp encap
sock in __udp_lib_err(), like the udp encap listening sock may
use the same port as remote encap port, in which case it should
go to __udp4/6_lib_err_encap() for more validation before
processing the icmp packet.

This patch is to check encap_type in __udp_lib_err() for the
further validation for a encap sock.

Signed-off-by: Xin Long 
---
 net/ipv4/udp.c | 2 +-
 net/ipv6/udp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 09f0a23..ca04a8a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -702,7 +702,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct 
udp_table *udptable)
sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
   iph->saddr, uh->source, skb->dev->ifindex,
   inet_sdif(skb), udptable, NULL);
-   if (!sk) {
+   if (!sk || udp_sk(sk)->encap_type) {
/* No socket for error: try tunnels before discarding */
sk = ERR_PTR(-ENOENT);
if (static_branch_unlikely(&udp_encap_needed_key)) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 29d9691..cde9b88 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -560,7 +560,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct 
inet6_skb_parm *opt,
 
sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
   inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
-   if (!sk) {
+   if (!sk || udp_sk(sk)->encap_type) {
/* No socket for error: try tunnels before discarding */
sk = ERR_PTR(-ENOENT);
if (static_branch_unlikely(&udpv6_encap_needed_key)) {
-- 
2.1.0



[PATCHv2 net-next 04/17] udp: support sctp over udp in skb_udp_tunnel_segment

2020-10-08 Thread Xin Long
To call sctp_gso_segment() properly in skb_udp_tunnel_segment() for sctp
over udp packets, we need to set transport_header to sctp header. When
skb->ip_summed == CHECKSUM_PARTIAL, skb_crc32c_csum_help() should be
called for the inner sctp packet.

Cc: Tom Herbert 
Signed-off-by: Xin Long 
---
 net/ipv4/udp_offload.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index c0b010b..a484405 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -49,6 +49,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct 
sk_buff *skb,
__skb_pull(skb, tnl_hlen);
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb_inner_network_offset(skb));
+   skb_set_transport_header(skb, skb_inner_transport_offset(skb));
skb->mac_len = skb_inner_network_offset(skb);
skb->protocol = new_protocol;
 
@@ -131,8 +132,12 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct 
sk_buff *skb,
uh->check = ~csum_fold(csum_add(partial,
   (__force __wsum)htonl(len)));
 
-   if (skb->encapsulation)
-   skb_checksum_help(skb);
+   if (skb->encapsulation) {
+   if (skb->csum_not_inet)
+   skb_crc32c_csum_help(skb);
+   else
+   skb_checksum_help(skb);
+   }
 
if (offload_csum) {
skb->ip_summed = CHECKSUM_PARTIAL;
-- 
2.1.0



[PATCHv2 net-next 05/17] sctp: create udp4 sock and add its encap_rcv

2020-10-08 Thread Xin Long
This patch is to add the functions to create/release udp4 sock,
and set the sock's encap_rcv to process the incoming udp encap
sctp packets. In sctp_udp_rcv(), as we can see, all we need to
do is fix the transport header for sctp_rcv(), then it would
implement the part of rfc6951#section-5.4:

  "When an encapsulated packet is received, the UDP header is removed.
   Then, the generic lookup is performed, as done by an SCTP stack
   whenever a packet is received, to find the association for the
   received SCTP packet"

Note that these functions will be called in the last patch of
this patchset when enabling this feature.

v1->v2:
  - Add pr_err() when fails to create udp v4 sock.

Signed-off-by: Xin Long 
---
 include/net/netns/sctp.h |  5 +
 include/net/sctp/constants.h |  2 ++
 include/net/sctp/sctp.h  |  2 ++
 net/sctp/protocol.c  | 43 +++
 4 files changed, 52 insertions(+)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index d8d02e4..3d10bef 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -22,6 +22,11 @@ struct netns_sctp {
 */
struct sock *ctl_sock;
 
+   /* udp tunneling listening sock. */
+   struct sock *udp4_sock;
+   /* udp tunneling listening port. */
+   int udp_port;
+
/* This is the global local address list.
 * We actively maintain this complete list of addresses on
 * the system by catching address add/delete events.
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 122d9e2..b583166 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -286,6 +286,8 @@ enum { SCTP_MAX_GABS = 16 };
 * functions simpler to write.
 */
 
+#define SCTP_DEFAULT_UDP_PORT 9899 /* default udp tunneling port */
+
 /* These are the values for pf exposure, UNUSED is to keep compatible with old
  * applications by default.
  */
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 4fc747b..bfd87a0 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -84,6 +84,8 @@ int sctp_copy_local_addr_list(struct net *net, struct 
sctp_bind_addr *addr,
 struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
 int sctp_register_pf(struct sctp_pf *, sa_family_t);
 void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int);
+int sctp_udp_sock_start(struct net *net);
+void sctp_udp_sock_stop(struct net *net);
 
 /*
  * sctp/socket.c
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 2583323..2b7a3e1 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_SCTP_PORT_HASH_ENTRIES (64 * 1024)
 
@@ -840,6 +841,45 @@ static int sctp_ctl_sock_init(struct net *net)
return 0;
 }
 
+static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+   skb_set_transport_header(skb, sizeof(struct udphdr));
+   sctp_rcv(skb);
+   return 0;
+}
+
+int sctp_udp_sock_start(struct net *net)
+{
+   struct udp_tunnel_sock_cfg tuncfg = {NULL};
+   struct udp_port_cfg udp_conf = {0};
+   struct socket *sock;
+   int err;
+
+   udp_conf.family = AF_INET;
+   udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
+   udp_conf.local_udp_port = htons(net->sctp.udp_port);
+   err = udp_sock_create(net, &udp_conf, &sock);
+   if (err) {
+   pr_err("Failed to create the SCTP udp tunneling v4 sock\n");
+   return err;
+   }
+
+   tuncfg.encap_type = 1;
+   tuncfg.encap_rcv = sctp_udp_rcv;
+   setup_udp_tunnel_sock(net, sock, &tuncfg);
+   net->sctp.udp4_sock = sock->sk;
+
+   return 0;
+}
+
+void sctp_udp_sock_stop(struct net *net)
+{
+   if (net->sctp.udp4_sock) {
+   udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
+   net->sctp.udp4_sock = NULL;
+   }
+}
+
 /* Register address family specific functions. */
 int sctp_register_af(struct sctp_af *af)
 {
@@ -1271,6 +1311,9 @@ static int __net_init sctp_defaults_init(struct net *net)
/* Enable ECN by default. */
net->sctp.ecn_enable = 1;
 
+   /* Set udp tunneling listening port to default value */
+   net->sctp.udp_port = SCTP_DEFAULT_UDP_PORT;
+
/* Set SCOPE policy to enabled */
net->sctp.scope_policy = SCTP_SCOPE_POLICY_ENABLE;
 
-- 
2.1.0



[PATCHv2 net-next 00/17] sctp: Implement RFC6951: UDP Encapsulation of SCTP

2020-10-08 Thread Xin Long
Description From the RFC:

   The Main Reasons:

   o  To allow SCTP traffic to pass through legacy NATs, which do not
  provide native SCTP support as specified in [BEHAVE] and
  [NATSUPP].

   o  To allow SCTP to be implemented on hosts that do not provide
  direct access to the IP layer.  In particular, applications can
  use their own SCTP implementation if the operating system does not
  provide one.

   Implementation Notes:

   UDP-encapsulated SCTP is normally communicated between SCTP stacks
   using the IANA-assigned UDP port number 9899 (sctp-tunneling) on both
   ends.  There are circumstances where other ports may be used on
   either end, and it might be required to use ports other than the
   registered port.

   Each SCTP stack uses a single local UDP encapsulation port number as
   the destination port for all its incoming SCTP packets, this greatly
   simplifies implementation design.

   An SCTP implementation supporting UDP encapsulation MUST maintain a
   remote UDP encapsulation port number per destination address for each
   SCTP association.  Again, because the remote stack may be using ports
   other than the well-known port, each port may be different from each
   stack.  However, because of remapping of ports by NATs, the remote
   ports associated with different remote IP addresses may not be
   identical, even if they are associated with the same stack.

   Because the well-known port might not be used, implementations need
   to allow other port numbers to be specified as a local or remote UDP
   encapsulation port number through APIs.

Patches:

   This patchset is using the udp4/6 tunnel APIs to implement the UDP
   Encapsulation of SCTP with not much change in SCTP protocol stack
   and with all current SCTP features keeped in Linux Kernel.

   1 - 4: Fix some UDP issues that may be triggered by SCTP over UDP.
   5 - 7: Process incoming UDP encapsulated packets and ICMP packets.
   8 -10: Remote encap port's update by sysctl, sockopt and packets.
   11-14: Process outgoing pakects with UDP encapsulated and its GSO.
   15-16: Add the part from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
  17: Enable this feature.

Tests:

  - lksctp-tools/src/func_tests with UDP Encapsulation enabled/disabled:

  Both make v4test and v6test passed.

  - sctp-tests with UDP Encapsulation enabled/disabled:

  repeatability/procdumps/sctpdiag/gsomtuchange/extoverflow/
  sctphashtable passed. Others failed as expected due to those
  "iptables -p sctp" rules.

  - netperf on lo/netns/virtio_net, with gso enabled/disabled and
with ip_checksum enabled/disabled, with UDP Encapsulation
enabled/disabled:

  No clear performance dropped.

v1->v2:
  - Fix some incorrect code in the patches 5,6,8,10,11,13,14,17, suggested
by Marcelo.
  - Append two patches 15-16 to add the Additional Considerations for UDP
Encapsulation of SCTP from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03,
noticed by Michael.

Xin Long (17):
  udp: check udp sock encap_type in __udp_lib_err
  udp6: move the mss check after udp gso tunnel processing
  udp: do checksum properly in skb_udp_tunnel_segment
  udp: support sctp over udp in skb_udp_tunnel_segment
  sctp: create udp4 sock and add its encap_rcv
  sctp: create udp6 sock and set its encap_rcv
  sctp: add encap_err_lookup for udp encap socks
  sctp: add encap_port for netns sock asoc and transport
  sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt
  sctp: allow changing transport encap_port by peer packets
  sctp: add udphdr to overhead when udp_port is set
  sctp: call sk_setup_caps in sctp_packet_transmit instead
  sctp: support for sending packet over udp4 sock
  sctp: support for sending packet over udp6 sock
  sctp: add the error cause for new encapsulation port restart
  sctp: handle the init chunk matching an existing asoc
  sctp: enable udp tunneling socks

 include/linux/sctp.h |  20 ++
 include/net/netns/sctp.h |   8 +++
 include/net/sctp/constants.h |   2 +
 include/net/sctp/sctp.h  |   9 ++-
 include/net/sctp/sm.h|   4 ++
 include/net/sctp/structs.h   |  14 ++--
 include/uapi/linux/sctp.h|   7 ++
 net/ipv4/udp.c   |   2 +-
 net/ipv4/udp_offload.c   |  16 +++--
 net/ipv6/udp.c   |   2 +-
 net/ipv6/udp_offload.c   | 154 +--
 net/sctp/associola.c |   4 ++
 net/sctp/ipv6.c  |  44 +
 net/sctp/output.c|  22 +++
 net/sctp/protocol.c  | 148 +
 net/sctp/sm_make_chunk.c |  21 ++
 net/sctp/sm_statefuns.c  |  52 +++
 net/sctp/socket.c| 112 +++
 net/sctp/sysctl.c|  59 +
 19 files changed, 572 insertions(+), 128 deletions(-)

-- 
2.1.0



[PATCHv2 net-next 03/17] udp: do checksum properly in skb_udp_tunnel_segment

2020-10-08 Thread Xin Long
This patch fixes two things:

  When skb->ip_summed == CHECKSUM_PARTIAL, skb_checksum_help() should be
  called do the checksum, instead of gso_make_checksum(), which is used
  to do the checksum for current proto after calling skb_segment(), not
  after the inner proto's gso_segment().

  When offload_csum is disabled, the hardware will not do the checksum
  for the current proto, udp. So instead of calling gso_make_checksum(),
  it should calculate checksum for udp itself.

Cc: Tom Herbert 
Signed-off-by: Xin Long 
---
 net/ipv4/udp_offload.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index e67a66f..c0b010b 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -131,14 +131,15 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct 
sk_buff *skb,
uh->check = ~csum_fold(csum_add(partial,
   (__force __wsum)htonl(len)));
 
-   if (skb->encapsulation || !offload_csum) {
-   uh->check = gso_make_checksum(skb, ~uh->check);
-   if (uh->check == 0)
-   uh->check = CSUM_MANGLED_0;
-   } else {
+   if (skb->encapsulation)
+   skb_checksum_help(skb);
+
+   if (offload_csum) {
skb->ip_summed = CHECKSUM_PARTIAL;
skb->csum_start = skb_transport_header(skb) - skb->head;
skb->csum_offset = offsetof(struct udphdr, check);
+   } else {
+   uh->check = csum_fold(skb_checksum(skb, udp_offset, 
len, 0));
}
} while ((skb = skb->next));
 out:
-- 
2.1.0



[PATCHv2 net-next 06/17] sctp: create udp6 sock and set its encap_rcv

2020-10-08 Thread Xin Long
This patch is to add the udp6 sock part in sctp_udp_sock_start/stop().
udp_conf.use_udp6_rx_checksums is set to true, as:

   "The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP
checksum SHOULD be computed for IPv4 and IPv6"

says in rfc6951#section-5.3.

v1->v2:
  - Add pr_err() when fails to create udp v6 sock.
  - Add #if IS_ENABLED(CONFIG_IPV6) not to create v6 sock when ipv6 is
disabled.

Signed-off-by: Xin Long 
---
 include/net/netns/sctp.h |  1 +
 net/sctp/protocol.c  | 26 ++
 2 files changed, 27 insertions(+)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 3d10bef..f622945 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -24,6 +24,7 @@ struct netns_sctp {
 
/* udp tunneling listening sock. */
struct sock *udp4_sock;
+   struct sock *udp6_sock;
/* udp tunneling listening port. */
int udp_port;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 2b7a3e1..49b5d75 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -869,6 +869,28 @@ int sctp_udp_sock_start(struct net *net)
setup_udp_tunnel_sock(net, sock, &tuncfg);
net->sctp.udp4_sock = sock->sk;
 
+#if IS_ENABLED(CONFIG_IPV6)
+   memset(&udp_conf, 0, sizeof(udp_conf));
+
+   udp_conf.family = AF_INET6;
+   udp_conf.local_ip6 = in6addr_any;
+   udp_conf.local_udp_port = htons(net->sctp.udp_port);
+   udp_conf.use_udp6_rx_checksums = true;
+   udp_conf.ipv6_v6only = true;
+   err = udp_sock_create(net, &udp_conf, &sock);
+   if (err) {
+   pr_err("Failed to create the SCTP udp tunneling v6 sock\n");
+   udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
+   net->sctp.udp4_sock = NULL;
+   return err;
+   }
+
+   tuncfg.encap_type = 1;
+   tuncfg.encap_rcv = sctp_udp_rcv;
+   setup_udp_tunnel_sock(net, sock, &tuncfg);
+   net->sctp.udp6_sock = sock->sk;
+#endif
+
return 0;
 }
 
@@ -878,6 +900,10 @@ void sctp_udp_sock_stop(struct net *net)
udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
net->sctp.udp4_sock = NULL;
}
+   if (net->sctp.udp6_sock) {
+   udp_tunnel_sock_release(net->sctp.udp6_sock->sk_socket);
+   net->sctp.udp6_sock = NULL;
+   }
 }
 
 /* Register address family specific functions. */
-- 
2.1.0



[PATCHv2 net-next 10/17] sctp: allow changing transport encap_port by peer packets

2020-10-08 Thread Xin Long
As rfc6951#section-5.4 says:

  "After finding the SCTP association (which
   includes checking the verification tag), the UDP source port MUST be
   stored as the encapsulation port for the destination address the SCTP
   packet is received from (see Section 5.1).

   When a non-encapsulated SCTP packet is received by the SCTP stack,
   the encapsulation of outgoing packets belonging to the same
   association and the corresponding destination address MUST be
   disabled."

transport encap_port should be updated by a validated incoming packet's
udp src port.

We save the udp src port in sctp_input_cb->encap_port, and then update
the transport in two places:

  1. right after vtag is verified, which is required by RFC, and this
 allows the existent transports to be updated by the chunks that
 can only be processed on an asoc.

  2. right before processing the 'init' where the transports are added,
 and this allows building a sctp over udp connection by client with
 the server not knowing the remote encap port.

  3. when processing ootb_pkt and creating the temporary transport for
 the reply pkt.

Note that sctp_input_cb->header is removed, as it's not used any more
in sctp.

v1->v2:
  - Change encap_port as __be16 for sctp_input_cb.

Signed-off-by: Xin Long 
---
 include/net/sctp/sm.h  |  1 +
 include/net/sctp/structs.h |  7 +--
 net/sctp/ipv6.c|  1 +
 net/sctp/protocol.c| 11 ++-
 net/sctp/sm_make_chunk.c   |  1 +
 net/sctp/sm_statefuns.c|  2 ++
 6 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 5c491a3..a499341 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -380,6 +380,7 @@ sctp_vtag_verify(const struct sctp_chunk *chunk,
 if (ntohl(chunk->sctp_hdr->vtag) == asoc->c.my_vtag)
 return 1;
 
+   chunk->transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
return 0;
 }
 
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index aa98e7e..81464ae 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1120,14 +1120,9 @@ static inline void sctp_outq_cork(struct sctp_outq *q)
  * sctp_input_cb is currently used on rx and sock rx queue
  */
 struct sctp_input_cb {
-   union {
-   struct inet_skb_parmh4;
-#if IS_ENABLED(CONFIG_IPV6)
-   struct inet6_skb_parm   h6;
-#endif
-   } header;
struct sctp_chunk *chunk;
struct sctp_af *af;
+   __be16 encap_port;
 };
 #define SCTP_INPUT_CB(__skb)   ((struct sctp_input_cb *)&((__skb)->cb[0]))
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 8a58f42..a064bf2 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -1053,6 +1053,7 @@ static struct inet_protosw sctpv6_stream_protosw = {
 
 static int sctp6_rcv(struct sk_buff *skb)
 {
+   memset(skb->cb, 0, sizeof(skb->cb));
return sctp_rcv(skb) ? -1 : 0;
 }
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 5b74187..0d16e5e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -843,6 +843,9 @@ static int sctp_ctl_sock_init(struct net *net)
 
 static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb)
 {
+   memset(skb->cb, 0, sizeof(skb->cb));
+   SCTP_INPUT_CB(skb)->encap_port = udp_hdr(skb)->source;
+
skb_set_transport_header(skb, sizeof(struct udphdr));
sctp_rcv(skb);
return 0;
@@ -1139,9 +1142,15 @@ static struct inet_protosw sctp_stream_protosw = {
.flags  = SCTP_PROTOSW_FLAG
 };
 
+static int sctp4_rcv(struct sk_buff *skb)
+{
+   memset(skb->cb, 0, sizeof(skb->cb));
+   return sctp_rcv(skb);
+}
+
 /* Register with IP layer.  */
 static const struct net_protocol sctp_protocol = {
-   .handler = sctp_rcv,
+   .handler = sctp4_rcv,
.err_handler = sctp_v4_err,
.no_policy   = 1,
.netns_ok= 1,
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 9a56ae2..21d0ff1 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2321,6 +2321,7 @@ int sctp_process_init(struct sctp_association *asoc, 
struct sctp_chunk *chunk,
 * added as the primary transport.  The source address seems to
 * be a better choice than any of the embedded addresses.
 */
+   asoc->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
if (!sctp_assoc_add_peer(asoc, peer_addr, gfp, SCTP_ACTIVE))
goto nomem;
 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index c669f8b..8edab15 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -6268,6 +6268,8 @@ static struct sctp_packet *sctp_ootb_pkt_new(
if (!transport)
goto nomem;
 
+   transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
+
/* Cache a route for the transport with the chunk's destination as
 * the source address.
 */
-- 
2

[PATCHv2 net-next 07/17] sctp: add encap_err_lookup for udp encap socks

2020-10-08 Thread Xin Long
As it says in rfc6951#section-5.5:

  "When receiving ICMP or ICMPv6 response packets, there might not be
   enough bytes in the payload to identify the SCTP association that the
   SCTP packet triggering the ICMP or ICMPv6 packet belongs to.  If a
   received ICMP or ICMPv6 packet cannot be related to a specific SCTP
   association or the verification tag cannot be verified, it MUST be
   discarded silently.  In particular, this means that the SCTP stack
   MUST NOT rely on receiving ICMP or ICMPv6 messages.  Implementation
   constraints could prevent processing received ICMP or ICMPv6
   messages."

ICMP or ICMPv6 packets need to be handled, and this is implemented by
udp encap sock .encap_err_lookup function.

The .encap_err_lookup function is called in __udp(6)_lib_err_encap()
to confirm this path does need to be updated. For sctp, what we can
do here is check if the corresponding asoc and transport exist.

Note that icmp packet process for sctp over udp is done by udp sock
.encap_err_lookup(), and it means for now we can't do as much as
sctp_v4/6_err() does. Also we can't do the two mappings mentioned
in rfc6951#section-5.5.

Signed-off-by: Xin Long 
---
 net/sctp/protocol.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 49b5d75..dd2d9c4 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -848,6 +848,23 @@ static int sctp_udp_rcv(struct sock *sk, struct sk_buff 
*skb)
return 0;
 }
 
+static int sctp_udp_err_lookup(struct sock *sk, struct sk_buff *skb)
+{
+   struct sctp_association *asoc;
+   struct sctp_transport *t;
+   int family;
+
+   skb->transport_header += sizeof(struct udphdr);
+   family = (ip_hdr(skb)->version == 4) ? AF_INET : AF_INET6;
+   sk = sctp_err_lookup(dev_net(skb->dev), family, skb, sctp_hdr(skb),
+&asoc, &t);
+   if (!sk)
+   return -ENOENT;
+
+   sctp_err_finish(sk, t);
+   return 0;
+}
+
 int sctp_udp_sock_start(struct net *net)
 {
struct udp_tunnel_sock_cfg tuncfg = {NULL};
@@ -866,6 +883,7 @@ int sctp_udp_sock_start(struct net *net)
 
tuncfg.encap_type = 1;
tuncfg.encap_rcv = sctp_udp_rcv;
+   tuncfg.encap_err_lookup = sctp_udp_err_lookup;
setup_udp_tunnel_sock(net, sock, &tuncfg);
net->sctp.udp4_sock = sock->sk;
 
@@ -887,6 +905,7 @@ int sctp_udp_sock_start(struct net *net)
 
tuncfg.encap_type = 1;
tuncfg.encap_rcv = sctp_udp_rcv;
+   tuncfg.encap_err_lookup = sctp_udp_err_lookup;
setup_udp_tunnel_sock(net, sock, &tuncfg);
net->sctp.udp6_sock = sock->sk;
 #endif
-- 
2.1.0



[PATCHv2 net-next 09/17] sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt

2020-10-08 Thread Xin Long
This patch is to implement:

  rfc6951#section-6.1: Get or Set the Remote UDP Encapsulation Port Number

with the param of the struct:

  struct sctp_udpencaps {
sctp_assoc_t sue_assoc_id;
struct sockaddr_storage sue_address;
uint16_t sue_port;
  };

the encap_port of sock, assoc or transport can be changed by users,
which also means it allows the different transports of the same asoc
to have different encap_port value.

Signed-off-by: Xin Long 
---
 include/uapi/linux/sctp.h |   7 +++
 net/sctp/socket.c | 110 ++
 2 files changed, 117 insertions(+)

diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index 28ad40d..cb78e7a 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -140,6 +140,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_ECN_SUPPORTED 130
 #define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE   131
 #define SCTP_EXPOSE_PF_STATE   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE
+#define SCTP_REMOTE_UDP_ENCAPS_PORT132
 
 /* PR-SCTP policies */
 #define SCTP_PR_SCTP_NONE  0x
@@ -1197,6 +1198,12 @@ struct sctp_event {
uint8_t se_on;
 };
 
+struct sctp_udpencaps {
+   sctp_assoc_t sue_assoc_id;
+   struct sockaddr_storage sue_address;
+   uint16_t sue_port;
+};
+
 /* SCTP Stream schedulers */
 enum sctp_sched_type {
SCTP_SS_FCFS,
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 09b94cd..c9e86f5 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4417,6 +4417,53 @@ static int sctp_setsockopt_pf_expose(struct sock *sk,
return retval;
 }
 
+static int sctp_setsockopt_encap_port(struct sock *sk,
+ struct sctp_udpencaps *encap,
+ unsigned int optlen)
+{
+   struct sctp_association *asoc;
+   struct sctp_transport *t;
+
+   if (optlen != sizeof(*encap))
+   return -EINVAL;
+
+   /* If an address other than INADDR_ANY is specified, and
+* no transport is found, then the request is invalid.
+*/
+   if (!sctp_is_any(sk, (union sctp_addr *)&encap->sue_address)) {
+   t = sctp_addr_id2transport(sk, &encap->sue_address,
+  encap->sue_assoc_id);
+   if (!t)
+   return -EINVAL;
+
+   t->encap_port = encap->sue_port;
+   return 0;
+   }
+
+   /* Get association, if assoc_id != SCTP_FUTURE_ASSOC and the
+* socket is a one to many style socket, and an association
+* was not found, then the id was invalid.
+*/
+   asoc = sctp_id2assoc(sk, encap->sue_assoc_id);
+   if (!asoc && encap->sue_assoc_id != SCTP_FUTURE_ASSOC &&
+   sctp_style(sk, UDP))
+   return -EINVAL;
+
+   /* If changes are for association, also apply encap to each
+* transport.
+*/
+   if (asoc) {
+   list_for_each_entry(t, &asoc->peer.transport_addr_list,
+   transports)
+   t->encap_port = encap->sue_port;
+
+   return 0;
+   }
+
+   sctp_sk(sk)->encap_port = encap->sue_port;
+   return 0;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -4636,6 +4683,9 @@ static int sctp_setsockopt(struct sock *sk, int level, 
int optname,
case SCTP_EXPOSE_POTENTIALLY_FAILED_STATE:
retval = sctp_setsockopt_pf_expose(sk, kopt, optlen);
break;
+   case SCTP_REMOTE_UDP_ENCAPS_PORT:
+   retval = sctp_setsockopt_encap_port(sk, kopt, optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
@@ -7791,6 +7841,63 @@ static int sctp_getsockopt_pf_expose(struct sock *sk, 
int len,
return retval;
 }
 
+static int sctp_getsockopt_encap_port(struct sock *sk, int len,
+ char __user *optval, int __user *optlen)
+{
+   struct sctp_association *asoc;
+   struct sctp_udpencaps encap;
+   struct sctp_transport *t;
+
+   if (len < sizeof(encap))
+   return -EINVAL;
+
+   len = sizeof(encap);
+   if (copy_from_user(&encap, optval, len))
+   return -EFAULT;
+
+   /* If an address other than INADDR_ANY is specified, and
+* no transport is found, then the request is invalid.
+*/
+   if (!sctp_is_any(sk, (union sctp_addr *)&encap.sue_address)) {
+   t = sctp_addr_id2transport(sk, &encap.sue_address,
+  encap.sue_assoc_id);
+   if (!t) {
+   pr_debug("%s: failed no transport\n", __func__);
+   return -EINVAL;
+   }
+
+   encap.sue_port = t->encap_port;
+   goto out;
+   }
+
+   /* Get association, if assoc_id != SCTP_FU

[PATCHv2 net-next 11/17] sctp: add udphdr to overhead when udp_port is set

2020-10-08 Thread Xin Long
sctp_mtu_payload() is for calculating the frag size before making
chunks from a msg. So we should only add udphdr size to overhead
when udp socks are listening, as only then sctp can handle the
incoming sctp over udp packets and outgoing sctp over udp packets
will be possible.

Note that we can't do this according to transport->encap_port, as
different transports may be set to different values, while the
chunks were made before choosing the transport, we could not be
able to meet all rfc6951#section-5.6 recommends.

v1->v2:
  - Add udp_port for sctp_sock to avoid a potential race issue, it
will be used in xmit path in the next patch.

Signed-off-by: Xin Long 
---
 include/net/sctp/sctp.h| 7 +--
 include/net/sctp/structs.h | 1 +
 net/sctp/socket.c  | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index bfd87a0..86f74f2 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -578,10 +578,13 @@ static inline __u32 sctp_mtu_payload(const struct 
sctp_sock *sp,
 {
__u32 overhead = sizeof(struct sctphdr) + extra;
 
-   if (sp)
+   if (sp) {
overhead += sp->pf->af->net_header_len;
-   else
+   if (sp->udp_port)
+   overhead += sizeof(struct udphdr);
+   } else {
overhead += sizeof(struct ipv6hdr);
+   }
 
if (WARN_ON_ONCE(mtu && mtu <= overhead))
mtu = overhead;
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 81464ae..80f7149 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -178,6 +178,7 @@ struct sctp_sock {
 */
__u32 hbinterval;
 
+   __be16 udp_port;
__be16 encap_port;
 
/* This is the max_retrans value for new associations. */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c9e86f5..192ab9a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4926,6 +4926,7 @@ static int sctp_init_sock(struct sock *sk)
 * be modified via SCTP_PEER_ADDR_PARAMS
 */
sp->hbinterval  = net->sctp.hb_interval;
+   sp->udp_port= htons(net->sctp.udp_port);
sp->encap_port  = htons(net->sctp.encap_port);
sp->pathmaxrxt  = net->sctp.max_retrans_path;
sp->pf_retrans  = net->sctp.pf_retrans;
-- 
2.1.0



[PATCHv2 net-next 12/17] sctp: call sk_setup_caps in sctp_packet_transmit instead

2020-10-08 Thread Xin Long
sk_setup_caps() was originally called in Commit 90017accff61 ("sctp:
Add GSO support"), as:

  "We have to refresh this in case we are xmiting to more than one
   transport at a time"

This actually happens in the loop of sctp_outq_flush_transports(),
and it shouldn't be tied to gso, so move it out of gso part and
before sctp_packet_pack().

Signed-off-by: Xin Long 
---
 net/sctp/output.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 1441eaf..fb16500 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -508,12 +508,6 @@ static int sctp_packet_pack(struct sctp_packet *packet,
sizeof(struct inet6_skb_parm)));
skb_shinfo(head)->gso_segs = pkt_count;
skb_shinfo(head)->gso_size = GSO_BY_FRAGS;
-   rcu_read_lock();
-   if (skb_dst(head) != tp->dst) {
-   dst_hold(tp->dst);
-   sk_setup_caps(sk, tp->dst);
-   }
-   rcu_read_unlock();
goto chksum;
}
 
@@ -593,6 +587,13 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t 
gfp)
}
skb_dst_set(head, dst);
 
+   rcu_read_lock();
+   if (__sk_dst_get(sk) != tp->dst) {
+   dst_hold(tp->dst);
+   sk_setup_caps(sk, tp->dst);
+   }
+   rcu_read_unlock();
+
/* pack up chunks */
pkt_count = sctp_packet_pack(packet, head, gso, gfp);
if (!pkt_count) {
-- 
2.1.0



[PATCHv2 net-next 13/17] sctp: support for sending packet over udp4 sock

2020-10-08 Thread Xin Long
This patch does what the rfc6951#section-5.3 says for ipv4:

  "Within the UDP header, the source port MUST be the local UDP
   encapsulation port number of the SCTP stack, and the destination port
   MUST be the remote UDP encapsulation port number maintained for the
   association and the destination address to which the packet is sent
   (see Section 5.1).

   Because the SCTP packet is the UDP payload, the length of the UDP
   packet MUST be the length of the SCTP packet plus the size of the UDP
   header.

   The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP
   checksum SHOULD be computed for IPv4 and IPv6."

Some places need to be adjusted in sctp_packet_transmit():

  1. For non-gso packets, when transport's encap_port is set, sctp
 checksum has to be done in sctp_packet_pack(), as the outer
 udp will use ip_summed = CHECKSUM_PARTIAL to do the offload
 setting for checksum.

  2. Delay calling dst_clone() and skb_dst_set() for non-udp packets
 until sctp_v4_xmit(), as for udp packets, skb_dst_set() is not
 needed before calling udp_tunnel_xmit_skb().

then in sctp_v4_xmit():

  1. Go to udp_tunnel_xmit_skb() only when transport->encap_port and
 net->sctp.udp_port both are set, as these are one for dst port
 and another for src port.

  2. For gso packet, SKB_GSO_UDP_TUNNEL_CSUM is set for gso_type, and
 with this udp checksum can be done in __skb_udp_tunnel_segment()
 for each segments after the sctp gso.

  3. inner_mac_header and inner_transport_header are set, as these
 will be needed in __skb_udp_tunnel_segment() to find the right
 headers.

  4. df and ttl are calculated, as these are the required params by
 udp_tunnel_xmit_skb().

  5. nocheck param has to be false, as "the UDP checksum SHOULD be
 computed for IPv4 and IPv6", says in rfc6951#section-5.3.

v1->v2:
  - Use sp->udp_port instead in sctp_v4_xmit(), which is more safe.

Signed-off-by: Xin Long 
---
 net/sctp/output.c   |  9 +++--
 net/sctp/protocol.c | 41 ++---
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index fb16500..6614c9f 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -514,8 +514,8 @@ static int sctp_packet_pack(struct sctp_packet *packet,
if (sctp_checksum_disable)
return 1;
 
-   if (!(skb_dst(head)->dev->features & NETIF_F_SCTP_CRC) ||
-   dst_xfrm(skb_dst(head)) || packet->ipfragok) {
+   if (!(tp->dst->dev->features & NETIF_F_SCTP_CRC) ||
+   dst_xfrm(tp->dst) || packet->ipfragok || tp->encap_port) {
struct sctphdr *sh =
(struct sctphdr *)skb_transport_header(head);
 
@@ -542,7 +542,6 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t 
gfp)
struct sctp_association *asoc = tp->asoc;
struct sctp_chunk *chunk, *tmp;
int pkt_count, gso = 0;
-   struct dst_entry *dst;
struct sk_buff *head;
struct sctphdr *sh;
struct sock *sk;
@@ -579,13 +578,11 @@ int sctp_packet_transmit(struct sctp_packet *packet, 
gfp_t gfp)
sh->checksum = 0;
 
/* drop packet if no dst */
-   dst = dst_clone(tp->dst);
-   if (!dst) {
+   if (!tp->dst) {
IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
kfree_skb(head);
goto out;
}
-   skb_dst_set(head, dst);
 
rcu_read_lock();
if (__sk_dst_get(sk) != tp->dst) {
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 0d16e5e..be002b7 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1059,25 +1059,44 @@ static int sctp_inet_supported_addrs(const struct 
sctp_sock *opt,
 }
 
 /* Wrapper routine that calls the ip transmit routine. */
-static inline int sctp_v4_xmit(struct sk_buff *skb,
-  struct sctp_transport *transport)
+static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 {
-   struct inet_sock *inet = inet_sk(skb->sk);
+   struct dst_entry *dst = dst_clone(t->dst);
+   struct flowi4 *fl4 = &t->fl.u.ip4;
+   struct sock *sk = skb->sk;
+   struct inet_sock *inet = inet_sk(sk);
__u8 dscp = inet->tos;
+   __be16 df = 0;
 
pr_debug("%s: skb:%p, len:%d, src:%pI4, dst:%pI4\n", __func__, skb,
-skb->len, &transport->fl.u.ip4.saddr,
-&transport->fl.u.ip4.daddr);
+skb->len, &fl4->saddr, &fl4->daddr);
+
+   if (t->dscp & SCTP_DSCP_SET_MASK)
+   dscp = t->dscp & SCTP_DSCP_VAL_MASK;
 
-   if (transport->dscp & SCTP_DSCP_SET_MASK)
-   dscp = transport->dscp & SCTP_DSCP_VAL_MASK;
+   inet->pmtudisc = t->param_flags & SPP_PMTUD_ENABLE ? IP_PMTUDISC_DO
+  : IP_PMTUDISC_DONT;
+   SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
 
-   inet-

[PATCHv2 net-next 08/17] sctp: add encap_port for netns sock asoc and transport

2020-10-08 Thread Xin Long
encap_port is added as per netns/sock/assoc/transport, and the
latter one's encap_port inherits the former one's by default.
The transport's encap_port value would mostly decide if one
packet should go out with udp encapsulated or not.

This patch also allows users to set netns' encap_port by sysctl.

v1->v2:
  - Change to define encap_port as __be16 for sctp_sock, asoc and
transport.

Signed-off-by: Xin Long 
---
 include/net/netns/sctp.h   |  2 ++
 include/net/sctp/structs.h |  6 ++
 net/sctp/associola.c   |  4 
 net/sctp/protocol.c|  3 +++
 net/sctp/socket.c  |  1 +
 net/sctp/sysctl.c  | 10 ++
 6 files changed, 26 insertions(+)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index f622945..6af7a39 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -27,6 +27,8 @@ struct netns_sctp {
struct sock *udp6_sock;
/* udp tunneling listening port. */
int udp_port;
+   /* udp tunneling remote encap port. */
+   int encap_port;
 
/* This is the global local address list.
 * We actively maintain this complete list of addresses on
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 0bdff38..aa98e7e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -178,6 +178,8 @@ struct sctp_sock {
 */
__u32 hbinterval;
 
+   __be16 encap_port;
+
/* This is the max_retrans value for new associations. */
__u16 pathmaxrxt;
 
@@ -877,6 +879,8 @@ struct sctp_transport {
 */
unsigned long last_time_ecne_reduced;
 
+   __be16 encap_port;
+
/* This is the max_retrans value for the transport and will
 * be initialized from the assocs value.  This can be changed
 * using the SCTP_SET_PEER_ADDR_PARAMS socket option.
@@ -1790,6 +1794,8 @@ struct sctp_association {
 */
unsigned long hbinterval;
 
+   __be16 encap_port;
+
/* This is the max_retrans value for new transports in the
 * association.
 */
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index fdb69d4..336df4b 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -99,6 +99,8 @@ static struct sctp_association *sctp_association_init(
 */
asoc->hbinterval = msecs_to_jiffies(sp->hbinterval);
 
+   asoc->encap_port = sp->encap_port;
+
/* Initialize path max retrans value. */
asoc->pathmaxrxt = sp->pathmaxrxt;
 
@@ -624,6 +626,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct 
sctp_association *asoc,
 */
peer->hbinterval = asoc->hbinterval;
 
+   peer->encap_port = asoc->encap_port;
+
/* Set the path max_retrans.  */
peer->pathmaxrxt = asoc->pathmaxrxt;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index dd2d9c4..5b74187 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1359,6 +1359,9 @@ static int __net_init sctp_defaults_init(struct net *net)
/* Set udp tunneling listening port to default value */
net->sctp.udp_port = SCTP_DEFAULT_UDP_PORT;
 
+   /* Set remote encap port to 0 by default */
+   net->sctp.encap_port = 0;
+
/* Set SCOPE policy to enabled */
net->sctp.scope_policy = SCTP_SCOPE_POLICY_ENABLE;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 53d0a41..09b94cd 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4876,6 +4876,7 @@ static int sctp_init_sock(struct sock *sk)
 * be modified via SCTP_PEER_ADDR_PARAMS
 */
sp->hbinterval  = net->sctp.hb_interval;
+   sp->encap_port  = htons(net->sctp.encap_port);
sp->pathmaxrxt  = net->sctp.max_retrans_path;
sp->pf_retrans  = net->sctp.pf_retrans;
sp->ps_retrans  = net->sctp.ps_retrans;
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index c16c809..ecc1b5e 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -36,6 +36,7 @@ static int rto_alpha_max = 1000;
 static int rto_beta_max = 1000;
 static int pf_expose_max = SCTP_PF_EXPOSE_MAX;
 static int ps_retrans_max = SCTP_PS_RETRANS_MAX;
+static int udp_port_max = 65535;
 
 static unsigned long max_autoclose_min = 0;
 static unsigned long max_autoclose_max =
@@ -291,6 +292,15 @@ static struct ctl_table sctp_net_table[] = {
.proc_handler   = proc_dointvec,
},
{
+   .procname   = "encap_port",
+   .data   = &init_net.sctp.encap_port,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   .extra1 = SYSCTL_ZERO,
+   .extra2 = &udp_port_max,
+   },
+   {
.procname   = "addr_scope_policy",
.data   = &init_net.sctp.scope_policy,
.maxlen = sizeof(int),
-- 
2.1.0



[PATCHv2 net-next 14/17] sctp: support for sending packet over udp6 sock

2020-10-08 Thread Xin Long
This one basically does the similar things in sctp_v6_xmit as does for
udp4 sock in the last patch, just note that:

  1. label needs to be calculated, as it's the param of
 udp_tunnel6_xmit_skb().

  2. The 'nocheck' param of udp_tunnel6_xmit_skb() is false, as
 required by RFC.

v1->v2:
  - Use sp->udp_port instead in sctp_v6_xmit(), which is more safe.

Signed-off-by: Xin Long 
---
 net/sctp/ipv6.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index a064bf2..814754d 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -191,33 +192,53 @@ static int sctp_v6_err(struct sk_buff *skb, struct 
inet6_skb_parm *opt,
return ret;
 }
 
-static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
+static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 {
+   struct dst_entry *dst = dst_clone(t->dst);
+   struct flowi6 *fl6 = &t->fl.u.ip6;
struct sock *sk = skb->sk;
struct ipv6_pinfo *np = inet6_sk(sk);
-   struct flowi6 *fl6 = &transport->fl.u.ip6;
__u8 tclass = np->tclass;
-   int res;
+   __be32 label;
 
pr_debug("%s: skb:%p, len:%d, src:%pI6 dst:%pI6\n", __func__, skb,
 skb->len, &fl6->saddr, &fl6->daddr);
 
-   if (transport->dscp & SCTP_DSCP_SET_MASK)
-   tclass = transport->dscp & SCTP_DSCP_VAL_MASK;
+   if (t->dscp & SCTP_DSCP_SET_MASK)
+   tclass = t->dscp & SCTP_DSCP_VAL_MASK;
 
if (INET_ECN_is_capable(tclass))
IP6_ECN_flow_xmit(sk, fl6->flowlabel);
 
-   if (!(transport->param_flags & SPP_PMTUD_ENABLE))
+   if (!(t->param_flags & SPP_PMTUD_ENABLE))
skb->ignore_df = 1;
 
SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
 
-   rcu_read_lock();
-   res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt),
-  tclass, sk->sk_priority);
-   rcu_read_unlock();
-   return res;
+   if (!t->encap_port || !sctp_sk(sk)->udp_port) {
+   int res;
+
+   skb_dst_set(skb, dst);
+   rcu_read_lock();
+   res = ip6_xmit(sk, skb, fl6, sk->sk_mark,
+  rcu_dereference(np->opt),
+  tclass, sk->sk_priority);
+   rcu_read_unlock();
+   return res;
+   }
+
+   if (skb_is_gso(skb))
+   skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+
+   skb->encapsulation = 1;
+   skb_reset_inner_mac_header(skb);
+   skb_reset_inner_transport_header(skb);
+   skb_set_inner_ipproto(skb, IPPROTO_SCTP);
+   label = ip6_make_flowlabel(sock_net(sk), skb, fl6->flowlabel, true, 
fl6);
+
+   return udp_tunnel6_xmit_skb(dst, sk, skb, NULL, &fl6->saddr,
+   &fl6->daddr, tclass, ip6_dst_hoplimit(dst),
+   label, sctp_sk(sk)->udp_port, 
t->encap_port, false);
 }
 
 /* Returns the dst cache entry for the given source and destination ip
-- 
2.1.0



[PATCHv2 net-next 16/17] sctp: handle the init chunk matching an existing asoc

2020-10-08 Thread Xin Long
This is from Section 4 of draft-tuexen-tsvwg-sctp-udp-encaps-cons-03,
and it requires responding with an abort chunk with an error cause
when the udp source port of the received init chunk doesn't match the
encap port of the transport.

Signed-off-by: Xin Long 
---
 net/sctp/sm_statefuns.c | 50 +
 1 file changed, 50 insertions(+)

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 8edab15..244a5d8 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -87,6 +87,13 @@ static enum sctp_disposition sctp_sf_tabort_8_4_8(
const union sctp_subtype type,
void *arg,
struct sctp_cmd_seq *commands);
+static enum sctp_disposition sctp_sf_new_encap_port(
+   struct net *net,
+   const struct sctp_endpoint *ep,
+   const struct sctp_association *asoc,
+   const union sctp_subtype type,
+   void *arg,
+   struct sctp_cmd_seq *commands);
 static struct sctp_sackhdr *sctp_sm_pull_sack(struct sctp_chunk *chunk);
 
 static enum sctp_disposition sctp_stop_t1_and_abort(
@@ -1493,6 +1500,10 @@ static enum sctp_disposition sctp_sf_do_unexpected_init(
if (!sctp_chunk_length_valid(chunk, sizeof(struct sctp_init_chunk)))
return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
  commands);
+
+   if (SCTP_INPUT_CB(chunk->skb)->encap_port != 
chunk->transport->encap_port)
+   return sctp_sf_new_encap_port(net, ep, asoc, type, arg, 
commands);
+
/* Grab the INIT header.  */
chunk->subh.init_hdr = (struct sctp_inithdr *)chunk->skb->data;
 
@@ -3392,6 +3403,45 @@ static enum sctp_disposition sctp_sf_tabort_8_4_8(
 
sctp_packet_append_chunk(packet, abort);
 
+   sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT, SCTP_PACKET(packet));
+
+   SCTP_INC_STATS(net, SCTP_MIB_OUTCTRLCHUNKS);
+
+   sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
+   return SCTP_DISPOSITION_CONSUME;
+}
+
+/* Handling of SCTP Packets Containing an INIT Chunk Matching an
+ * Existing Associations when the udp encap port is incorrect.
+ *
+ * From Section 4 at draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
+ */
+static enum sctp_disposition sctp_sf_new_encap_port(
+   struct net *net,
+   const struct sctp_endpoint *ep,
+   const struct sctp_association *asoc,
+   const union sctp_subtype type,
+   void *arg,
+   struct sctp_cmd_seq *commands)
+{
+   struct sctp_packet *packet = NULL;
+   struct sctp_chunk *chunk = arg;
+   struct sctp_chunk *abort;
+
+   packet = sctp_ootb_pkt_new(net, asoc, chunk);
+   if (!packet)
+   return SCTP_DISPOSITION_NOMEM;
+
+   abort = sctp_make_new_encap_port(asoc, chunk);
+   if (!abort) {
+   sctp_ootb_pkt_free(packet);
+   return SCTP_DISPOSITION_NOMEM;
+   }
+
+   abort->skb->sk = ep->base.sk;
+
+   sctp_packet_append_chunk(packet, abort);
+
sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT,
SCTP_PACKET(packet));
 
-- 
2.1.0



Re: [PATCH 0/7] wfx: move out from the staging area

2020-10-08 Thread Kalle Valo
Kalle Valo  writes:

> Greg Kroah-Hartman  writes:
>
>> On Wed, Oct 07, 2020 at 12:19:36PM +0200, Jerome Pouiller wrote:
>>> From: Jérôme Pouiller 
>>> 
>>> I think the wfx driver is now mature enough to be accepted in the
>>> drivers/net/wireless directory.
>>> 
>>> There is still one item on the TODO list. It is an idea to improve the rate
>>> control in some particular cases[1]. However, the current performances of 
>>> the
>>> driver seem to satisfy everyone. In add, the suggested change is large 
>>> enough.
>>> So, I would prefer to implement it only if it really solves an issue. I 
>>> think it
>>> is not an obstacle to move the driver out of the staging area.
>>> 
>>> In order to comply with the last rules for the DT bindings, I have 
>>> converted the
>>> documentation to yaml. I am moderately happy with the result. Especially, 
>>> for
>>> the description of the binding. Any comments are welcome.
>>> 
>>> The series also update the copyrights dates of the files. I don't know 
>>> exactly
>>> how this kind of changes should be sent. It's a bit weird to change all the
>>> copyrights in one commit, but I do not see any better way.
>>> 
>>> I also include a few fixes I have found these last weeks.
>>> 
>>> [1] https://lore.kernel.org/lkml/3099559.gv3Q75KnN1@pc-42
>>
>> I'll take the first 6 patches here, the last one you should work with
>> the wireless maintainers to get reviewed.
>>
>> Maybe that might want to wait until after 5.10-rc1 is out, with all of
>> these changes in it, making it an easier move.
>
> Yes, the driver needs to be reviewed in linux-wireless list. I recommend
> submitting the whole driver in a patchset with one file per patch, which
> seems to be the easiest way to review a full driver. The final move will
> be in just one commit moving the driver, just like patch 7 does here. As
> an example see how wilc1000 review was done.
>
> Device tree bindings needs to be reviewed by the DT maintainer so CC
> devicetree on that patch.

BTW, I wrote some instructions for new wireless drivers:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#new_driver

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


[PATCHv2 net-next 15/17] sctp: add the error cause for new encapsulation port restart

2020-10-08 Thread Xin Long
This patch is to add the function to make the abort chunk with
the error cause for new encapsulation port restart, defined
on Section 4.4 in draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.

Signed-off-by: Xin Long 
---
 include/linux/sctp.h | 20 
 include/net/sctp/sm.h|  3 +++
 net/sctp/sm_make_chunk.c | 20 
 3 files changed, 43 insertions(+)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 7673123..bb19265 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -482,11 +482,13 @@ enum sctp_error {
 *  11  Restart of an association with new addresses
 *  12  User Initiated Abort
 *  13  Protocol Violation
+*  14  Restart of an Association with New Encapsulation Port
 */
 
SCTP_ERROR_RESTART = cpu_to_be16(0x0b),
SCTP_ERROR_USER_ABORT  = cpu_to_be16(0x0c),
SCTP_ERROR_PROTO_VIOLATION = cpu_to_be16(0x0d),
+   SCTP_ERROR_NEW_ENCAP_PORT  = cpu_to_be16(0x0e),
 
/* ADDIP Section 3.3  New Error Causes
 *
@@ -793,4 +795,22 @@ enum {
SCTP_FLOWLABEL_VAL_MASK = 0xf
 };
 
+/* UDP Encapsulation
+ * draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.html#section-4-4
+ *
+ *   The error cause indicating an "Restart of an Association with
+ *   New Encapsulation Port"
+ *
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Cause Code = 14|   Cause Length = 8|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   Current Encapsulation Port  | New Encapsulation Port|
+ * +---+---+
+ */
+struct sctp_new_encap_port_hdr {
+   __be16 cur_port;
+   __be16 new_port;
+};
+
 #endif /* __LINUX_SCTP_H__ */
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index a499341..fd223c9 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -221,6 +221,9 @@ struct sctp_chunk *sctp_make_violation_paramlen(
 struct sctp_chunk *sctp_make_violation_max_retrans(
const struct sctp_association *asoc,
const struct sctp_chunk *chunk);
+struct sctp_chunk *sctp_make_new_encap_port(
+   const struct sctp_association *asoc,
+   const struct sctp_chunk *chunk);
 struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
   const struct sctp_transport *transport);
 struct sctp_chunk *sctp_make_heartbeat_ack(const struct sctp_association *asoc,
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 21d0ff1..3bf1399 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1142,6 +1142,26 @@ struct sctp_chunk *sctp_make_violation_max_retrans(
return retval;
 }
 
+struct sctp_chunk *sctp_make_new_encap_port(const struct sctp_association 
*asoc,
+   const struct sctp_chunk *chunk)
+{
+   struct sctp_new_encap_port_hdr nep;
+   struct sctp_chunk *retval;
+
+   retval = sctp_make_abort(asoc, chunk,
+sizeof(struct sctp_errhdr) + sizeof(nep));
+   if (!retval)
+   goto nodata;
+
+   sctp_init_cause(retval, SCTP_ERROR_NEW_ENCAP_PORT, sizeof(nep));
+   nep.cur_port = htons(SCTP_INPUT_CB(chunk->skb)->encap_port);
+   nep.new_port = htons(chunk->transport->encap_port);
+   sctp_addto_chunk(retval, sizeof(nep), &nep);
+
+nodata:
+   return retval;
+}
+
 /* Make a HEARTBEAT chunk.  */
 struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
   const struct sctp_transport *transport)
-- 
2.1.0



[PATCHv2 net-next 17/17] sctp: enable udp tunneling socks

2020-10-08 Thread Xin Long
This patch is to enable udp tunneling socks by calling
sctp_udp_sock_start() in sctp_ctrlsock_init(), and
sctp_udp_sock_stop() in sctp_ctrlsock_exit().

Also add sysctl udp_port to allow changing the listening
sock's port by users.

Wit this patch, the whole sctp over udp feature can be
enabled and used.

v1->v2:
  - Also update ctl_sock udp_port in proc_sctp_do_udp_port()
where netns udp_port gets changed.

Signed-off-by: Xin Long 
---
 net/sctp/protocol.c |  5 +
 net/sctp/sysctl.c   | 49 +
 2 files changed, 54 insertions(+)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index be002b7..79fb4b5 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1469,6 +1469,10 @@ static int __net_init sctp_ctrlsock_init(struct net *net)
if (status)
pr_err("Failed to initialize the SCTP control sock\n");
 
+   status = sctp_udp_sock_start(net);
+   if (status)
+   pr_err("Failed to initialize the SCTP udp tunneling sock\n");
+
return status;
 }
 
@@ -1476,6 +1480,7 @@ static void __net_exit sctp_ctrlsock_exit(struct net *net)
 {
/* Free the control endpoint.  */
inet_ctl_sock_destroy(net->sctp.ctl_sock);
+   sctp_udp_sock_stop(net);
 }
 
 static struct pernet_operations sctp_ctrlsock_ops = {
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index ecc1b5e..a723613 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -49,6 +49,8 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int 
write,
void *buffer, size_t *lenp, loff_t *ppos);
 static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write, void *buffer,
size_t *lenp, loff_t *ppos);
+static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write, void 
*buffer,
+size_t *lenp, loff_t *ppos);
 static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
   void *buffer, size_t *lenp, loff_t *ppos);
 static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
@@ -292,6 +294,15 @@ static struct ctl_table sctp_net_table[] = {
.proc_handler   = proc_dointvec,
},
{
+   .procname   = "udp_port",
+   .data   = &init_net.sctp.udp_port,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_sctp_do_udp_port,
+   .extra1 = SYSCTL_ZERO,
+   .extra2 = &udp_port_max,
+   },
+   {
.procname   = "encap_port",
.data   = &init_net.sctp.encap_port,
.maxlen = sizeof(int),
@@ -487,6 +498,44 @@ static int proc_sctp_do_auth(struct ctl_table *ctl, int 
write,
return ret;
 }
 
+static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write,
+void *buffer, size_t *lenp, loff_t *ppos)
+{
+   struct net *net = current->nsproxy->net_ns;
+   unsigned int min = *(unsigned int *)ctl->extra1;
+   unsigned int max = *(unsigned int *)ctl->extra2;
+   struct ctl_table tbl;
+   int ret, new_value;
+
+   memset(&tbl, 0, sizeof(struct ctl_table));
+   tbl.maxlen = sizeof(unsigned int);
+
+   if (write)
+   tbl.data = &new_value;
+   else
+   tbl.data = &net->sctp.udp_port;
+
+   ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
+   if (write && ret == 0) {
+   struct sock *sk = net->sctp.ctl_sock;
+
+   if (new_value > max || new_value < min)
+   return -EINVAL;
+
+   net->sctp.udp_port = new_value;
+   sctp_udp_sock_stop(net);
+   ret = sctp_udp_sock_start(net);
+   if (ret)
+   net->sctp.udp_port = 0;
+
+   lock_sock(sk);
+   sctp_sk(sk)->udp_port = net->sctp.udp_port;
+   release_sock(sk);
+   }
+
+   return ret;
+}
+
 int sctp_sysctl_net_register(struct net *net)
 {
struct ctl_table *table;
-- 
2.1.0



Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping

2020-10-08 Thread Kurt Kanzenbach
On Thu Oct 08 2020, Vladimir Oltean wrote:
> On Thu, Oct 08, 2020 at 10:34:11AM +0200, Kurt Kanzenbach wrote:
>> On Wed Oct 07 2020, Vladimir Oltean wrote:
>> > On Wed, Oct 07, 2020 at 12:39:49PM +0200, Kurt Kanzenbach wrote:
>> >> For instance the hellcreek switch has actually three ptp hardware
>> >> clocks and the time stamping can be configured to use either one of
>> >> them.
>> >
>> > The sja1105 also has a corrected and an uncorrected PTP clock that can
>> > take timestamps. Initially I had thought I'd be going to spend some time
>> > figuring out multi-PHC support, but now I don't see any practical reason
>> > to use the uncorrected PHC for anything.
>> 
>> Just out of curiosity: How do you implement 802.1AS then? My
>> understanding is that the free-running clock has to be used for the
>
> Has to be? I couldn't find that wording in IEEE 802.1AS-2011.

It doesn't has to be, it *should* be. That's at least the outcome we had
after lots of discussions. Actually Kamil (on Cc) is the expert on this
topic.

>
>> calculation of the peer delays and such meaning there should be a way to
>> get access to both PHCs or having some form of cross timestamping
>> available.
>> 
>> The hellcreek switch can take cross snapshots of all three ptp clocks in
>> hardware for that purpose.
>
> Well, at the end of the day, all the other TSN offloads (tc-taprio,
> tc-gate) will still have to use the synchronized PTP clock, so what
> we're doing is we're simply letting that clock be synchronized by
> ptp4l.

Yes, the synchronized clock is of course needed for the traffic
scheduling and so on. This is what we do here in this code as well. Only
the synchronized one is exported to user space and used. However, the
multi PHCs issue should be addressed as well at some point.

>
>> >> > So when you'll poll for TX timestamps, you'll receive a TX
>> >> > timestamp from the PHY and another one from the switch, and those will
>> >> > be in a race with one another, so you won't know which one is which.
>> >> 
>> >> OK. So what happens if the driver will accept to disable hardware
>> >> timestamping? Is there anything else that needs to be implemented? Are
>> >> there (good) examples?
>> >
>> > It needs to not call skb_complete_tx_timestamp() and friends.
>> >
>> > For PHY timestamping, it also needs to invoke the correct methods for RX
>> > and for TX, where the PHY timestamping hooks will get called. I don't
>> > think that DSA is compatible yet with PHY timestamping, but it is
>> > probably a trivial modification.
>> 
>> Hmm? If DSA doesn't support PHY timestamping how are other DSA drivers
>> dealing with it then? I'm getting really confused.
>
> They aren't dealing with it, of course.
>
>> Furthermore, there is no hellcreek hardware available with timestamping
>> capable PHYs. How am I supposed to even test this?
>> 
>> For now, until there is hardware available, PHY timestamping is not
>> supported with the hellcreek switch.
>
> I was just pointing out that this is something you'll certainly have to
> change if somebody will want PHY timestamping.

Understood.

>
> Even without hardware, you _could_ probably test that DSA is doing the
> right thing by simply adding the PTP timestamping ops to a PHY driver
> that you own, and inject dummy timestamps. The expectation becomes that
> user space gets those dummy timestamps, and not the ones emitted by your
> switch.

Of course it can be mocked. Whenever somebody wants to do PHY
timestamping with a hellcreek switch this issue can be re-visited.

Thanks,
Kurt


signature.asc
Description: PGP signature


Re: [PATCHv2 net-next 04/17] udp: support sctp over udp in skb_udp_tunnel_segment

2020-10-08 Thread Xin Long
CCing Tom Herbert

On Thu, Oct 8, 2020 at 5:48 PM Xin Long  wrote:
>
> To call sctp_gso_segment() properly in skb_udp_tunnel_segment() for sctp
> over udp packets, we need to set transport_header to sctp header. When
> skb->ip_summed == CHECKSUM_PARTIAL, skb_crc32c_csum_help() should be
> called for the inner sctp packet.
>
> Cc: Tom Herbert 
> Signed-off-by: Xin Long 
> ---
>  net/ipv4/udp_offload.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index c0b010b..a484405 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -49,6 +49,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct 
> sk_buff *skb,
> __skb_pull(skb, tnl_hlen);
> skb_reset_mac_header(skb);
> skb_set_network_header(skb, skb_inner_network_offset(skb));
> +   skb_set_transport_header(skb, skb_inner_transport_offset(skb));
> skb->mac_len = skb_inner_network_offset(skb);
> skb->protocol = new_protocol;
>
> @@ -131,8 +132,12 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct 
> sk_buff *skb,
> uh->check = ~csum_fold(csum_add(partial,
>(__force __wsum)htonl(len)));
>
> -   if (skb->encapsulation)
> -   skb_checksum_help(skb);
> +   if (skb->encapsulation) {
> +   if (skb->csum_not_inet)
> +   skb_crc32c_csum_help(skb);
> +   else
> +   skb_checksum_help(skb);
> +   }
>
> if (offload_csum) {
> skb->ip_summed = CHECKSUM_PARTIAL;
> --
> 2.1.0
>


Re: [PATCHv2 net-next 03/17] udp: do checksum properly in skb_udp_tunnel_segment

2020-10-08 Thread Xin Long
CCing Tom Herbert

On Thu, Oct 8, 2020 at 5:48 PM Xin Long  wrote:
>
> This patch fixes two things:
>
>   When skb->ip_summed == CHECKSUM_PARTIAL, skb_checksum_help() should be
>   called do the checksum, instead of gso_make_checksum(), which is used
>   to do the checksum for current proto after calling skb_segment(), not
>   after the inner proto's gso_segment().
>
>   When offload_csum is disabled, the hardware will not do the checksum
>   for the current proto, udp. So instead of calling gso_make_checksum(),
>   it should calculate checksum for udp itself.
>
> Cc: Tom Herbert 
> Signed-off-by: Xin Long 
> ---
>  net/ipv4/udp_offload.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index e67a66f..c0b010b 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -131,14 +131,15 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct 
> sk_buff *skb,
> uh->check = ~csum_fold(csum_add(partial,
>(__force __wsum)htonl(len)));
>
> -   if (skb->encapsulation || !offload_csum) {
> -   uh->check = gso_make_checksum(skb, ~uh->check);
> -   if (uh->check == 0)
> -   uh->check = CSUM_MANGLED_0;
> -   } else {
> +   if (skb->encapsulation)
> +   skb_checksum_help(skb);
> +
> +   if (offload_csum) {
> skb->ip_summed = CHECKSUM_PARTIAL;
> skb->csum_start = skb_transport_header(skb) - 
> skb->head;
> skb->csum_offset = offsetof(struct udphdr, check);
> +   } else {
> +   uh->check = csum_fold(skb_checksum(skb, udp_offset, 
> len, 0));
> }
> } while ((skb = skb->next));
>  out:
> --
> 2.1.0
>


Re: [PATCH 0/7] wfx: move out from the staging area

2020-10-08 Thread Jérôme Pouiller
On Thursday 8 October 2020 09:30:06 CEST Kalle Valo wrote:
[...]
> Yes, the driver needs to be reviewed in linux-wireless list. I recommend
> submitting the whole driver in a patchset with one file per patch, which
> seems to be the easiest way to review a full driver. The final move will
> be in just one commit moving the driver, just like patch 7 does here. As
> an example see how wilc1000 review was done.

I see. I suppose it is still a bit complicated to review? Maybe I could
try to make things easier.

For my submission to staging/ I had taken time to split the driver in an
understandable series of patches[1]. I think it was easier to review than
just sending files one by one. I could do the same thing for the
submission to linux-wireless. It would ask me a bit of work but, since I
already have a template, it is conceivable.

Do you think it is worth it, or it would be an unnecessary effort?

[1] 
https://lore.kernel.org/driverdev-devel/20190919142527.31797-1-jerome.pouil...@silabs.com/
 or commits a7a91ca5a23d^..40115bbc40e2

-- 
Jérôme Pouiller




Re: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Leon Romanovsky
On Thu, Oct 08, 2020 at 09:45:29AM +, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky 
> > Sent: Thursday, October 8, 2020 1:15 PM
> >
> > On Thu, Oct 08, 2020 at 07:14:17AM +, Parav Pandit wrote:
> > >
> > >
> > > > From: Leon Romanovsky 
> > > > Sent: Thursday, October 8, 2020 10:56 AM
> > > >
> > > > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Pierre-Louis Bossart
> > > > > > 
> > > > > > Sent: Thursday, October 8, 2020 3:20 AM
> > > > > >
> > > > > >
> > > > > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > > > > >> -Original Message-
> > > > > > >> From: Pierre-Louis Bossart
> > > > > > >> 
> > > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > > > > >> To: Ertman, David M ; Parav Pandit
> > > > > > >> ; Leon Romanovsky 
> > > > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com;
> > > > > > >> ti...@suse.de; netdev@vger.kernel.org;
> > > > > > >> ranjani.sridha...@linux.intel.com;
> > > > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org;
> > > > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe
> > > > > > >> ; gre...@linuxfoundation.org;
> > > > > > >> k...@kernel.org; Williams, Dan J ;
> > > > > > >> Saleem, Shiraz ;
> > > > > > >> da...@davemloft.net; Patil, Kiran 
> > > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > >  Below is most simple, intuitive and matching with core APIs
> > > > > >  for name and design pattern wise.
> > > > > >  init()
> > > > > >  {
> > > > > > err = ancillary_device_initialize();
> > > > > > if (err)
> > > > > > return ret;
> > > > > > 
> > > > > > err = ancillary_device_add();
> > > > > > if (ret)
> > > > > > goto err_unwind;
> > > > > > 
> > > > > > err = some_foo();
> > > > > > if (err)
> > > > > > goto err_foo;
> > > > > > return 0;
> > > > > > 
> > > > > >  err_foo:
> > > > > > ancillary_device_del(adev);
> > > > > >  err_unwind:
> > > > > > ancillary_device_put(adev->dev);
> > > > > > return err;
> > > > > >  }
> > > > > > 
> > > > > >  cleanup()
> > > > > >  {
> > > > > > ancillary_device_de(adev);
> > > > > > ancillary_device_put(adev);
> > > > > > /* It is common to have a one wrapper for this as
> > > > > >  ancillary_device_unregister().
> > > > > >  * This will match with core device_unregister() that 
> > > > > >  has
> > > > > >  precise documentation.
> > > > > >  * but given fact that init() code need proper error
> > > > > >  unwinding, like above,
> > > > > >  * it make sense to have two APIs, and no need to export
> > > > > >  another symbol for unregister().
> > > > > >  * This pattern is very easy to audit and code.
> > > > > >  */
> > > > > >  }
> > > > > > >>>
> > > > > > >>> I like this flow +1
> > > > > > >>>
> > > > > > >>> But ... since the init() function is performing both
> > > > > > >>> device_init and device_add - it should probably be called
> > > > > > >>> ancillary_device_register, and we are back to a single
> > > > > > >>> exported API for both register and unregister.
> > > > > > >>
> > > > > > >> Kind reminder that we introduced the two functions to allow
> > > > > > >> the caller to know if it needed to free memory when
> > > > > > >> initialize() fails, and it didn't need to free memory when
> > > > > > >> add() failed since
> > > > > > >> put_device() takes care of it. If you have a single init()
> > > > > > >> function it's impossible to know which behavior to select on 
> > > > > > >> error.
> > > > > > >>
> > > > > > >> I also have a case with SoundWire where it's nice to first
> > > > > > >> initialize, then set some data and then add.
> > > > > > >>
> > > > > > >
> > > > > > > The flow as outlined by Parav above does an initialize as the
> > > > > > > first step, so every error path out of the function has to do
> > > > > > > a put_device(), so you would never need to manually free the
> > > > > > > memory in
> > > > > > the setup function.
> > > > > > > It would be freed in the release call.
> > > > > >
> > > > > > err = ancillary_device_initialize(); if (err)
> > > > > > return ret;
> > > > > >
> > > > > > where is the put_device() here? if the release function does any
> > > > > > sort of kfree, then you'd need to do it manually in this case.
> > > > > Since device_initialize() failed, put_device() cannot be done here.
> > > > > So yes, pseudo code should have shown, if (err) {
> > > > >   kfree(adev);
> > > > >   return err;
> > > > > }
> > > > >
> > > > > If we just want to follow register(), unregister() pattern,
> > > > >
> > > > > Than,
> > > > >
> > > > > ancillar_device_reg

Re: [PATCH net] bridge: Netlink interface fix.

2020-10-08 Thread Nikolay Aleksandrov
On Wed, 2020-10-07 at 14:49 +, Nikolay Aleksandrov wrote:
> On Wed, 2020-10-07 at 12:07 +, Henrik Bjoernlund wrote:
> > This commit is correcting NETLINK br_fill_ifinfo() to be able to
> > handle 'filter_mask' with multiple flags asserted.
> > 
> > Fixes: 36a8e8e265420 ("bridge: Extend br_fill_ifinfo to return MPR status")
> > 
> > Signed-off-by: Henrik Bjoernlund 
> > Reviewed-by: Horatiu Vultur 
> > Suggested-by: Nikolay Aleksandrov 
> > Tested-by: Horatiu Vultur 
> > ---
> >  net/bridge/br_netlink.c | 26 +++---
> >  1 file changed, 11 insertions(+), 15 deletions(-)
> > 
> 
> The patch looks good, please don't separate the Fixes tag from the others.
> Acked-by: Nikolay Aleksandrov 
> 

TBH, this does change a user facing api (the attribute nesting), but I think
in this case it's acceptable due to the format being wrong and MRP being new, so
I doubt anyone is yet dumping it mixed with vlan filter_mask and checking for
two identical attributes, i.e. in the old/broken case parsing the attributes
into a table would hide one of them and you'd have to walk over all attributes
to catch that.




Re: [PATCH bpf-next 0/3] libbpf: auto-resize relocatable LOAD/STORE instructions

2020-10-08 Thread Luka Perkov
Hello Andrii,

On Wed, Oct 7, 2020 at 8:01 PM Andrii Nakryiko
 wrote:
>
> On Wed, Oct 7, 2020 at 10:56 AM Luka Perkov  wrote:
> >
> > Hello Andrii,
> >
> > On Fri, Oct 2, 2020 at 3:09 AM Andrii Nakryiko  wrote:
> > > Patch set implements logic in libbpf to auto-adjust memory size (1-, 2-, 
> > > 4-,
> > > 8-bytes) of load/store (LD/ST/STX) instructions which have BPF CO-RE field
> > > offset relocation associated with it. In practice this means transparent
> > > handling of 32-bit kernels, both pointer and unsigned integers. Signed
> > > integers are not relocatable with zero-extending loads/stores, so libbpf
> > > poisons them and generates a warning. If/when BPF gets support for 
> > > sign-extending
> > > loads/stores, it would be possible to automatically relocate them as well.
> > >
> > > All the details are contained in patch #1 comments and commit message.
> > > Patch #2 is a simple change in libbpf to make advanced testing with 
> > > custom BTF
> > > easier. Patch #3 validates correct uses of auto-resizable loads, as well 
> > > as
> > > check that libbpf fails invalid uses.
> > >
> > > I'd really appreciate folks that use BPF on 32-bit architectures to test 
> > > this
> > > out with their BPF programs and report if there are any problems with the
> > > approach.
> > >
> > > Cc: Luka Perkov 
> >
> > First, thank you for the support and sending this series. It took us a
> > bit longer to run the tests as our target hardware still did not fully
> > get complete mainline support and we had to rebase our patches. These
> > are not related to BPF.
> >
> > Related to this patch, we have tested various BPF programs with this
> > patch, and can confirm that it fixed previous issues with pointer
> > offsets that we had and reported at:
> >
> > https://lore.kernel.org/r/CA+XBgLU=8PFkP8S32e4gpst0=r4mfv8rza5kao+cepysntr...@mail.gmail.com/.
> >
> > Most of our programs now work and we are currently debugging other
> > programs that still aren't working. We are still not sure if the
> > remaining issues are related to this or not, but will let you know
> > sometime this week after further and more detailed investigation.
> >
>
> Ok, great, thanks for the update.

Just to update you that we have identified that the problem was a
known issue with JIT as we had enabled the BPF_JIT_ALWAYS_ON.

That said, it would be great to see this series included in 5.10 :)

Thanks,
Luka


pull-request: mac80211 2020-10-08

2020-10-08 Thread Johannes Berg
Hi Dave,

Not sure you were still planning to send anything to Linus,
but even if not this single fix seemed appropriate for net.
Seems to be a long-standing issue.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit d91dc434f2baa592e9793597421231174d57bbbf:

  Merge tag 'rxrpc-fixes-20201005' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (2020-10-06 
06:18:20 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git 
tags/mac80211-for-net-2020-10-08

for you to fetch changes up to 3dc289f8f139997f4e9d3cfccf8738f20d23e47b:

  net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() 
(2020-10-08 12:37:25 +0200)


A single fix for missing input validation in nl80211.


Anant Thazhemadam (1):
  net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()

 net/wireless/nl80211.c | 3 +++
 1 file changed, 3 insertions(+)



pull-request: mac80211-next 2020-10-08

2020-10-08 Thread Johannes Berg
Hi Dave,

And also a few more patches for net-next, mostly fixes
for the work that recently landed there.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 9faebeb2d80065926dfbc09cb73b1bb7779a89cd:

  Merge branch 'ethtool-allow-dumping-policies-to-user-space' (2020-10-06 
06:25:56 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git 
tags/mac80211-next-for-net-next-2020-10-08

for you to fetch changes up to ba6ff70a3bb76c1ff440d3a0044b82e97abb648f:

  mac80211: copy configured beacon tx rate to driver (2020-10-08 12:26:35 +0200)


A handful of changes:
 * fixes for the recent S1G work
 * a docbook build time improvement
 * API to pass beacon rate to lower-level driver


Mauro Carvalho Chehab (1):
  docs: net: 80211: reduce docs build time

Rajkumar Manoharan (1):
  mac80211: copy configured beacon tx rate to driver

Thomas Pedersen (3):
  mac80211: handle lack of sband->bitrates in rates
  mac80211: initialize last_rate for S1G STAs
  cfg80211: only allow S1G channels on S1G band

 Documentation/driver-api/80211/cfg80211.rst| 392 ++---
 .../driver-api/80211/mac80211-advanced.rst | 151 +++-
 Documentation/driver-api/80211/mac80211.rst| 148 +++-
 include/net/mac80211.h |   3 +
 net/mac80211/Makefile  |   1 +
 net/mac80211/cfg.c |   6 +-
 net/mac80211/ieee80211_i.h |   3 +
 net/mac80211/mlme.c|   4 +-
 net/mac80211/rate.c|   1 +
 net/mac80211/s1g.c |  16 +
 net/mac80211/sta_info.c|   4 +
 net/mac80211/sta_info.h|   1 +
 net/wireless/chan.c|   5 +-
 13 files changed, 240 insertions(+), 495 deletions(-)
 create mode 100644 net/mac80211/s1g.c



Re: [PATCH 01/29] iwlwifi: dvm: Demote non-compliant kernel-doc headers

2020-10-08 Thread Kalle Valo
Lee Jones  wrote:

> None of these headers attempt to document any function parameters.
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:388: warning: Function 
> parameter or member 't' not described in 'iwl_bg_statistics_periodic'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:545: warning: Function 
> parameter or member 't' not described in 'iwl_bg_ucode_trace'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:771: warning: Function 
> parameter or member 'priv' not described in 'iwl_alive_start'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'priv' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'start_idx' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'num_events' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'mode' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'pos' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'buf' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function 
> parameter or member 'bufsz' not described in 'iwl_print_event_log'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'priv' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'capacity' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'num_wraps' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'next_entry' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'size' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'mode' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'pos' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'buf' not described in 'iwl_print_last_event_logs'
>  drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function 
> parameter or member 'bufsz' not described in 'iwl_print_last_event_logs'
> 
> Cc: Johannes Berg 
> Cc: Emmanuel Grumbach 
> Cc: Luca Coelho 
> Cc: Intel Linux Wireless 
> Cc: Kalle Valo 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: linux-wirel...@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Lee Jones 

14 patches applied to wireless-drivers-next.git, thanks.

7cb391ffdf3c iwlwifi: dvm: Demote non-compliant kernel-doc headers
b392eabc6abe iwlwifi: rs: Demote non-compliant kernel-doc headers
229b5582deb5 iwlwifi: dvm: tx: Demote non-compliant kernel-doc headers
c8a11a84671e iwlwifi: dvm: lib: Demote non-compliant kernel-doc headers
7619ccceae49 iwlwifi: calib: Demote seemingly unintentional kerneldoc header
8f7ed7bf1384 iwlwifi: dvm: sta: Demote a bunch of nonconformant kernel-doc 
headers
707c528a8d51 iwlwifi: mvm: ops: Remove unused static struct 
'iwl_mvm_debug_names'
108285ec6851 iwlwifi: dvm: Demote a couple of nonconformant kernel-doc headers
7b37b874fce3 iwlwifi: mvm: utils: Fix some doc-rot
de00105cf0dc iwlwifi: dvm: scan: Demote a few nonconformant kernel-doc headers
3a7d806926bb iwlwifi: dvm: rxon: Demote non-conformant kernel-doc headers
91b4780fbae7 iwlwifi: mvm: tx: Demote misuse of kernel-doc headers
6806fc7fcfb2 iwlwifi: dvm: devices: Fix function documentation formatting issues
7d4ced86997f iwlwifi: iwl-drv: Provide descriptions debugfs dentries

-- 
https://patchwork.kernel.org/patch/11766851/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH v5 0/2] netlink: export policy on validation failures

2020-10-08 Thread Johannes Berg
Export the policy used for attribute validation when it fails,
so e.g. for an out-of-range attribute userspace immediately gets
the valid ranges back.

v2 incorporates the suggestion from Jakub to have a function to
estimate the size (netlink_policy_dump_attr_size_estimate()) and
check that it does the right thing on the *normal* policy dumps,
not (just) when calling it from the error scenario.

v3 only addresses a few minor style issues.

v4 fixes up a forgotten 'git add' ... sorry.

v5 is a resend, I messed up v4's cover letter subject (saying v3)
and apparently the second patch didn't go out at all.


Tested using nl80211/iw in a few scenarios, seems to work fine
and return the policy back, e.g.

kernel reports: integer out of range
policy: 04 00 0b 00 0c 00 04 00 01 00 00 00 00 00 00 00 
^ padding
^ minimum allowed value
policy: 04 00 0b 00 0c 00 05 00 ff ff ff ff 00 00 00 00 
^ padding
^ maximum allowed value
policy: 08 00 01 00 04 00 00 00 
^ type 4 == U32

for an out-of-range case.

johannes





[PATCH v5 1/2] netlink: policy: refactor per-attr policy writing

2020-10-08 Thread Johannes Berg
From: Johannes Berg 

Refactor the per-attribute policy writing into a new
helper function, to be used later for dumping out the
policy of a rejected attribute.

v2:
 - fix some indentation
v3:
 - change variable order in netlink_policy_dump_write()

Reviewed-by: Jakub Kicinski 
Signed-off-by: Johannes Berg 
---
 net/netlink/policy.c | 79 
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/net/netlink/policy.c b/net/netlink/policy.c
index ee26d01328ee..4383436759e2 100644
--- a/net/netlink/policy.c
+++ b/net/netlink/policy.c
@@ -196,49 +196,33 @@ bool netlink_policy_dump_loop(struct 
netlink_policy_dump_state *state)
return !netlink_policy_dump_finished(state);
 }
 
-/**
- * netlink_policy_dump_write - write current policy dump attributes
- * @skb: the message skb to write to
- * @state: the policy dump state
- *
- * Returns: 0 on success, an error code otherwise
- */
-int netlink_policy_dump_write(struct sk_buff *skb,
- struct netlink_policy_dump_state *state)
+static int
+__netlink_policy_dump_write_attr(struct netlink_policy_dump_state *state,
+struct sk_buff *skb,
+const struct nla_policy *pt,
+int nestattr)
 {
-   const struct nla_policy *pt;
-   struct nlattr *policy, *attr;
enum netlink_attribute_type type;
-   bool again;
-
-send_attribute:
-   again = false;
-
-   pt = &state->policies[state->policy_idx].policy[state->attr_idx];
+   struct nlattr *attr;
 
-   policy = nla_nest_start(skb, state->policy_idx);
-   if (!policy)
-   return -ENOBUFS;
-
-   attr = nla_nest_start(skb, state->attr_idx);
+   attr = nla_nest_start(skb, nestattr);
if (!attr)
-   goto nla_put_failure;
+   return -ENOBUFS;
 
switch (pt->type) {
default:
case NLA_UNSPEC:
case NLA_REJECT:
/* skip - use NLA_MIN_LEN to advertise such */
-   nla_nest_cancel(skb, policy);
-   again = true;
-   goto next;
+   nla_nest_cancel(skb, attr);
+   return -ENODATA;
case NLA_NESTED:
type = NL_ATTR_TYPE_NESTED;
fallthrough;
case NLA_NESTED_ARRAY:
if (pt->type == NLA_NESTED_ARRAY)
type = NL_ATTR_TYPE_NESTED_ARRAY;
-   if (pt->nested_policy && pt->len &&
+   if (state && pt->nested_policy && pt->len &&
(nla_put_u32(skb, NL_POLICY_TYPE_ATTR_POLICY_IDX,
 netlink_policy_dump_get_policy_idx(state,

pt->nested_policy,
@@ -349,8 +333,47 @@ int netlink_policy_dump_write(struct sk_buff *skb,
if (nla_put_u32(skb, NL_POLICY_TYPE_ATTR_TYPE, type))
goto nla_put_failure;
 
-   /* finish and move state to next attribute */
nla_nest_end(skb, attr);
+   return 0;
+nla_put_failure:
+   nla_nest_cancel(skb, attr);
+   return -ENOBUFS;
+}
+
+/**
+ * netlink_policy_dump_write - write current policy dump attributes
+ * @skb: the message skb to write to
+ * @state: the policy dump state
+ *
+ * Returns: 0 on success, an error code otherwise
+ */
+int netlink_policy_dump_write(struct sk_buff *skb,
+ struct netlink_policy_dump_state *state)
+{
+   const struct nla_policy *pt;
+   struct nlattr *policy;
+   bool again;
+   int err;
+
+send_attribute:
+   again = false;
+
+   pt = &state->policies[state->policy_idx].policy[state->attr_idx];
+
+   policy = nla_nest_start(skb, state->policy_idx);
+   if (!policy)
+   return -ENOBUFS;
+
+   err = __netlink_policy_dump_write_attr(state, skb, pt, state->attr_idx);
+   if (err == -ENODATA) {
+   nla_nest_cancel(skb, policy);
+   again = true;
+   goto next;
+   } else if (err) {
+   goto nla_put_failure;
+   }
+
+   /* finish and move state to next attribute */
nla_nest_end(skb, policy);
 
 next:
-- 
2.26.2



[PATCH v5 2/2] netlink: export policy in extended ACK

2020-10-08 Thread Johannes Berg
From: Johannes Berg 

Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK
to advertise the policy, e.g. if an attribute was out of range,
you'll know the range that's permissible.

Add new NL_SET_ERR_MSG_ATTR_POL() and NL_SET_ERR_MSG_ATTR_POL()
macros to set this, since realistically it's only useful to do
this when the bad attribute (offset) is also returned.

Use it in lib/nlattr.c which practically does all the policy
validation.

v2:
 - add and use netlink_policy_dump_attr_size_estimate()
v3:
 - remove redundant break
v4:
 - really remove redundant break ... sorry

Reviewed-by: Jakub Kicinski 
Signed-off-by: Johannes Berg 
---
 include/linux/netlink.h  | 30 --
 include/net/netlink.h|  4 +++
 include/uapi/linux/netlink.h |  2 ++
 lib/nlattr.c | 35 +++--
 net/netlink/af_netlink.c |  5 +++
 net/netlink/policy.c | 61 
 6 files changed, 110 insertions(+), 27 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index e3e49f0e5c13..666cd0390699 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -68,12 +68,14 @@ netlink_kernel_create(struct net *net, int unit, struct 
netlink_kernel_cfg *cfg)
  * @_msg: message string to report - don't access directly, use
  * %NL_SET_ERR_MSG
  * @bad_attr: attribute with error
+ * @policy: policy for a bad attribute
  * @cookie: cookie data to return to userspace (for success)
  * @cookie_len: actual cookie data length
  */
 struct netlink_ext_ack {
const char *_msg;
const struct nlattr *bad_attr;
+   const struct nla_policy *policy;
u8 cookie[NETLINK_MAX_COOKIE_LEN];
u8 cookie_len;
 };
@@ -95,21 +97,29 @@ struct netlink_ext_ack {
 #define NL_SET_ERR_MSG_MOD(extack, msg)\
NL_SET_ERR_MSG((extack), KBUILD_MODNAME ": " msg)
 
-#define NL_SET_BAD_ATTR(extack, attr) do { \
-   if ((extack))   \
+#define NL_SET_BAD_ATTR_POLICY(extack, attr, pol) do { \
+   if ((extack)) { \
(extack)->bad_attr = (attr);\
+   (extack)->policy = (pol);   \
+   }   \
 } while (0)
 
-#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) do {\
-   static const char __msg[] = msg;\
-   struct netlink_ext_ack *__extack = (extack);\
-   \
-   if (__extack) { \
-   __extack->_msg = __msg; \
-   __extack->bad_attr = (attr);\
-   }   \
+#define NL_SET_BAD_ATTR(extack, attr) NL_SET_BAD_ATTR_POLICY(extack, attr, 
NULL)
+
+#define NL_SET_ERR_MSG_ATTR_POL(extack, attr, pol, msg) do {   \
+   static const char __msg[] = msg;\
+   struct netlink_ext_ack *__extack = (extack);\
+   \
+   if (__extack) { \
+   __extack->_msg = __msg; \
+   __extack->bad_attr = (attr);\
+   __extack->policy = (pol);   \
+   }   \
 } while (0)
 
+#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) \
+   NL_SET_ERR_MSG_ATTR_POL(extack, attr, NULL, msg)
+
 static inline void nl_set_extack_cookie_u64(struct netlink_ext_ack *extack,
u64 cookie)
 {
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 2b9e41075f19..7356f41d23ba 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1957,6 +1957,10 @@ int netlink_policy_dump_get_policy_idx(struct 
netlink_policy_dump_state *state,
 bool netlink_policy_dump_loop(struct netlink_policy_dump_state *state);
 int netlink_policy_dump_write(struct sk_buff *skb,
  struct netlink_policy_dump_state *state);
+int netlink_policy_dump_attr_size_estimate(const struct nla_policy *pt);
+int netlink_policy_dump_write_attr(struct sk_buff *skb,
+  const struct nla_policy *pt,
+  int nestattr);
 void netlink_policy_dump_free(struct netlink_policy_dump_state *state);
 
 #endif
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index d02e472ba54c..c3816ff7bfc3 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -129,6 +129,7 @@ struct nlmsgerr {
  * @NLMSGERR_ATTR_COOKIE: arbitrary subsystem specific cookie to
  * be used - in the success case - to identify a created
  * object or operation or similar (binary)
+ * @NLMSGERR_ATTR_POLICY: policy for a rejected attribute
  * @__NLMSGE

Re: [PATCH 1/2] ath11k: Fix memory leak on error path

2020-10-08 Thread Kalle Valo
Alex Dewar  wrote:

> In ath11k_mac_setup_iface_combinations(), if memory cannot be assigned
> for the variable limits, then the memory assigned to combinations will
> be leaked. Fix this.
> 
> Addresses-Coverity-ID: 1497534 ("Resource leaks")
> Fixes: 2626c269702e ("ath11k: add interface_modes to hw_params")
> Signed-off-by: Alex Dewar 

Patch applied to wireless-drivers-next.git, thanks.

8431350eee2e ath11k: Fix memory leak on error path

-- 
https://patchwork.kernel.org/patch/11815579/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH v4 bpf-next 09/13] bpf: introduce multibuff support to bpf_prog_test_run_xdp()

2020-10-08 Thread Jesper Dangaard Brouer
On Thu, 8 Oct 2020 11:06:14 +0300
Shay Agroskin  wrote:

> Lorenzo Bianconi  writes:
> 
> > Introduce the capability to allocate a xdp multi-buff in
> > bpf_prog_test_run_xdp routine. This is a preliminary patch to 
> > introduce
> > the selftests for new xdp multi-buff ebpf helpers
> >
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> >  net/bpf/test_run.c | 51  ++
> >  1 file changed, 43 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index bd291f5f539c..ec7286cd051b 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -617,44 +617,79 @@ int bpf_prog_test_run_xdp(struct bpf_prog 
> > *prog, const union bpf_attr *kattr,
> >  {
> > u32 tailroom = SKB_DATA_ALIGN(sizeof(struct 
> >  skb_shared_info));
> > u32 headroom = XDP_PACKET_HEADROOM;
> > -   u32 size = kattr->test.data_size_in;
> > u32 repeat = kattr->test.repeat;
> > struct netdev_rx_queue *rxqueue;
> > +   struct skb_shared_info *sinfo;
> > struct xdp_buff xdp = {};
> > +   u32 max_data_sz, size;
> > u32 retval, duration;
> > -   u32 max_data_sz;
> > +   int i, ret, data_len;
> > void *data;
> > -   int ret;
> >  
> > if (kattr->test.ctx_in || kattr->test.ctx_out)
> > return -EINVAL;
> >  
> > -   /* XDP have extra tailroom as (most) drivers use full page 
> > */
> > max_data_sz = 4096 - headroom - tailroom;  
> 
> For the sake of consistency, can this 4096 be changed to PAGE_SIZE 
> ?

The size 4096 is explicitly use, because the selftest xdp_adjust_tail
expect this, else it will fail on ARCHs with 64K PAGE_SIZE.  It also
seems excessive to create 64K packets for testing XDP.

See: tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c

> Same as in
>  data_len = min_t(int, kattr->test.data_size_in - size, 
>  PAGE_SIZE);
> 
> expression below



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Re: mwifiex: fix double free

2020-10-08 Thread Kalle Valo
t...@redhat.com wrote:

> From: Tom Rix 
> 
> clang static analysis reports this problem:
> 
> sdio.c:2403:3: warning: Attempt to free released memory
> kfree(card->mpa_rx.buf);
> ^~~
> 
> When mwifiex_init_sdio() fails in its first call to
> mwifiex_alloc_sdio_mpa_buffer, it falls back to calling it
> again.  If the second alloc of mpa_tx.buf fails, the error
> handler will try to free the old, previously freed mpa_rx.buf.
> Reviewing the code, it looks like a second double free would
> happen with mwifiex_cleanup_sdio().
> 
> So set both pointers to NULL when they are freed.
> 
> Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex 
> driver")
> Signed-off-by: Tom Rix 
> Reviewed-by: Brian Norris 

Patch applied to wireless-drivers-next.git, thanks.

53708f4fd9cf mwifiex: fix double free

-- 
https://patchwork.kernel.org/patch/11815655/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH] rtlwifi: rtl8192se: remove duplicated legacy_httxpowerdiff

2020-10-08 Thread Kalle Valo
Chris Chiu  wrote:

> From: Chris Chiu 
> 
> The legacy_httxpowerdiff in rtl8192se is pretty much the same as
> the legacy_ht_txpowerdiff for other chips. Use the same name to
> keep the consistency.
> 
> Signed-off-by: Chris Chiu 

Patch applied to wireless-drivers-next.git, thanks.

8b2426c50f20 rtlwifi: rtl8192se: remove duplicated legacy_httxpowerdiff

-- 
https://patchwork.kernel.org/patch/11818043/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH][next] ath11k: fix memory leak of 'combinations'

2020-10-08 Thread Kalle Valo
Colin King  wrote:

> From: Colin Ian King 
> 
> Currently the error return path when 'limits' fails to allocate
> does not free the memory allocated for 'combinations'. Fix this
> by adding a kfree before returning.
> 
> Addresses-Coverity: ("Resource leak")
> Fixes: 2626c269702e ("ath11k: add interface_modes to hw_params")
> Signed-off-by: Colin Ian King 

Alex already sent an identical patch:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=8431350eee2e27ae60f5250e0437ab298329070e

Patch set to Superseded.

-- 
https://patchwork.kernel.org/patch/11819025/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH 0/7] wfx: move out from the staging area

2020-10-08 Thread Kalle Valo
Jérôme Pouiller  writes:

> On Thursday 8 October 2020 09:30:06 CEST Kalle Valo wrote:
> [...]
>> Yes, the driver needs to be reviewed in linux-wireless list. I recommend
>> submitting the whole driver in a patchset with one file per patch, which
>> seems to be the easiest way to review a full driver. The final move will
>> be in just one commit moving the driver, just like patch 7 does here. As
>> an example see how wilc1000 review was done.
>
> I see. I suppose it is still a bit complicated to review? Maybe I could
> try to make things easier.
>
> For my submission to staging/ I had taken time to split the driver in an
> understandable series of patches[1]. I think it was easier to review than
> just sending files one by one. I could do the same thing for the
> submission to linux-wireless. It would ask me a bit of work but, since I
> already have a template, it is conceivable.
>
> Do you think it is worth it, or it would be an unnecessary effort?
>
> [1]
> https://lore.kernel.org/driverdev-devel/20190919142527.31797-1-jerome.pouil...@silabs.com/
>  or commits a7a91ca5a23d^..40115bbc40e2

I don't know how others think, but I prefer to review new drivers "one
file per patch" style as I get to see the big picture easily. And
besides, splitting the driver like that would be a huge job for you. I
don't think it's worth your time in this case. And making changes in the
driver during review process becomes even more complex.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


Re: [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len

2020-10-08 Thread Jesper Dangaard Brouer
On Wed, 7 Oct 2020 16:46:10 -0700
Maciej Żenczykowski  wrote:

> >  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> >  {
> > -   return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > - SKB_MAX_ALLOC;
> > +   return IP_MAX_MTU;
> >  }  
> 
> Shouldn't we just delete this helper instead and replace call sites?

It does seem wrong to pass argument skb into this function, as it is
no-longer used...

Guess I can simply replace __bpf_skb_max_len with IP_MAX_MTU.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



[PATCH net-next] neigh: add netlink filtering based on LLADDR for dump

2020-10-08 Thread Florent Fourcot
neighbours table dump supports today two filtering:
 * based on interface index
 * based on master index

This patch adds a new filtering, based on layer two address. That will
help to replace something like it:

 ip neigh show | grep aa:11:22:bb:ee:ff

by a better command:

 ip neigh show lladdr aa:11:22:bb:ee:ff

Signed-off-by: Florent Fourcot 
---
 net/core/neighbour.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8e39e28b0a8d..4b32bf49a005 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2542,9 +2542,25 @@ static bool neigh_ifindex_filtered(struct net_device 
*dev, int filter_idx)
return false;
 }
 
+static bool neigh_lladdr_filtered(struct neighbour *neigh, const u8 *lladdr)
+{
+   if (!lladdr)
+   return false;
+
+   /* Ignore all empty values when lladdr filtering is set */
+   if (!neigh->dev->addr_len)
+   return true;
+
+   if (memcmp(lladdr, neigh->ha, neigh->dev->addr_len) != 0)
+   return true;
+
+   return false;
+}
+
 struct neigh_dump_filter {
int master_idx;
int dev_idx;
+   void *lladdr;
 };
 
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
@@ -2558,7 +2574,7 @@ static int neigh_dump_table(struct neigh_table *tbl, 
struct sk_buff *skb,
struct neigh_hash_table *nht;
unsigned int flags = NLM_F_MULTI;
 
-   if (filter->dev_idx || filter->master_idx)
+   if (filter->dev_idx || filter->master_idx || filter->lladdr)
flags |= NLM_F_DUMP_FILTERED;
 
rcu_read_lock_bh();
@@ -2573,7 +2589,8 @@ static int neigh_dump_table(struct neigh_table *tbl, 
struct sk_buff *skb,
if (idx < s_idx || !net_eq(dev_net(n->dev), net))
goto next;
if (neigh_ifindex_filtered(n->dev, filter->dev_idx) ||
-   neigh_master_filtered(n->dev, filter->master_idx))
+   neigh_master_filtered(n->dev, filter->master_idx) ||
+   neigh_lladdr_filtered(n, filter->lladdr))
goto next;
if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
@@ -2689,6 +2706,9 @@ static int neigh_valid_dump_req(const struct nlmsghdr 
*nlh,
case NDA_MASTER:
filter->master_idx = nla_get_u32(tb[i]);
break;
+   case NDA_LLADDR:
+   filter->lladdr = nla_data(tb[i]);
+   break;
default:
if (strict_check) {
NL_SET_ERR_MSG(extack, "Unsupported attribute 
in neighbor dump request");
-- 
2.20.1



RE: [PATCH v2 1/6] Add ancillary bus support

2020-10-08 Thread Parav Pandit


> From: gre...@linuxfoundation.org 
> Sent: Thursday, October 8, 2020 1:21 PM
> 
> On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky 
> wrote:
> > [..]
> > > All stated above is my opinion, it can be different from yours.
> >
> > Yes, but we need to converge to move this forward. Jason was involved
> > in the current organization for registration, Greg was angling for
> > this to be core functionality. I have use cases outside of RDMA and
> > netdev. Parav was ok with the current organization. The SOF folks
> > already have a proposed incorporation of it. The argument I am hearing
> > is that "this registration api seems hard for driver writers" when we
> > have several driver writers who have already taken a look and can make
> > it work. If you want to follow on with a simpler wrappers for your use
> > case, great, but I do not yet see anyone concurring with your opinion
> > that the current organization is irretrievably broken or too obscure
> > to use.
> 
> That's kind of because I tuned out of this thread a long time ago :)
> 
> I do agree with Leon that I think the current patch is not the correct way to
> do this the easiest, but don't have a competing proposal to show what I
> mean.
> 
> Yet.
Please consider the approach of ib_alloc_device(), ib_dealloc_device() and 
ib_register_register()/unregister().
(a) It avoids driver calling put_device() on error unwinding path.
(b) still achieves container_of().

> 
> Let's see what happens after 5.10-rc1 is out, it's too late now for any of 
> this
> for this next merge window so we can not worry about it for a few weeks.
> 
Ok. INHO giving direction to Dave and others to either refine current APIs or 
follow ib_alloc_device() approach will be a helpful input.

ancillary bus can do better APIs than the newly (march 2020 !) introduced vdpa 
bus [1] and its drivers which follows put_device() pattern in [2] and [3] in 
error unwinding path.

[1] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/vdpa.c
[2] 
https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/ifcvf/ifcvf_main.c#L475
[3] 
https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/mlx5/net/mlx5_vnet.c#L1967

> thanks,
> 
> greg k-h


Re: [Patch net] ip_gre: set dev->hard_header_len properly

2020-10-08 Thread Willem de Bruijn
On Wed, Oct 7, 2020 at 9:22 PM Cong Wang  wrote:
>
> GRE tunnel has its own header_ops, ipgre_header_ops, and sets it
> conditionally. When it is set, it assumes the outer IP header is
> already created before ipgre_xmit().
>
> This is not true when we send packets through a raw packet socket,
> where L2 headers are supposed to be constructed by user. Packet
> socket calls dev_validate_header() to validate the header. But
> GRE tunnel does not set dev->hard_header_len, so that check can
> be simply bypassed, therefore uninit memory could be passed down
> to ipgre_xmit().

If dev->hard_header_len is zero, the packet socket will not reserve
room for the link layer header, so skb->data points to network_header.
But I don't see any uninitialized packet data?

> Fix this by setting dev->hard_header_len whenever sets header_ops,
> as dev->hard_header_len is supposed to be the length of the header
> created by dev->header_ops->create() anyway.

Agreed. Xie has made similar changes to lapbether recently in
commit c7ca03c216ac.

> Reported-and-tested-by: syzbot+4a2c52677a8a1aa28...@syzkaller.appspotmail.com
> Cc: William Tu 
> Cc: Willem de Bruijn 
> Signed-off-by: Cong Wang 
> ---
>  net/ipv4/ip_gre.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 4e31f23e4117..43b62095559e 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -987,10 +987,12 @@ static int ipgre_tunnel_init(struct net_device *dev)
> return -EINVAL;
> dev->flags = IFF_BROADCAST;
> dev->header_ops = &ipgre_header_ops;
> +   dev->hard_header_len = tunnel->hlen + sizeof(*iph);
> }
>  #endif
> } else if (!tunnel->collect_md) {
> dev->header_ops = &ipgre_header_ops;
> +   dev->hard_header_len = tunnel->hlen + sizeof(*iph);

Unrelated to this patch, I do wonder if ipgre_header does the
right thing when tunnel->hlen > sizeof(gre_base_hdr),
i.e., for gre tunnels with optional fields.

Currently it appears to not initialize those.

> }
>
> return ip_tunnel_init(dev);
> --
> 2.28.0
>


Re: [PATCH] man: fix typos

2020-10-08 Thread Samanta Navarro
Signed-off-by: Samanta Navarro 
---
 man/man8/ip-link.8.in   | 14 +++---
 man/man8/ip-neighbour.8 |  6 +++---
 man/man8/tc-actions.8   |  2 +-
 man/man8/tc-pie.8   |  4 ++--
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index f451ecf..62d2d9f 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -391,7 +391,7 @@ packet the new device should accept.
 
 .TP
 .BI gso_max_segs " SEGMENTS "
-specifies the recommended maximum number of a Generic Segment Offload
+specifies the recommended maximum number of Generic Segment Offload
 segments the new device should accept.
 
 .TP
@@ -441,7 +441,7 @@ the following additional arguments are supported:
 - either 802.1Q or 802.1ad.
 
 .BI id " VLANID "
-- specifies the VLAN Identifer to use. Note that numbers with a leading " 0 " 
or " 0x " are interpreted as octal or hexadeimal, respectively.
+- specifies the VLAN Identifier to use. Note that numbers with a leading " 0 " 
or " 0x " are interpreted as octal or hexadecimal, respectively.
 
 .BR reorder_hdr " { " on " | " off " } "
 - specifies whether ethernet headers are reordered or not (default is
@@ -572,7 +572,7 @@ the following additional arguments are supported:
 .in +8
 .sp
 .BI  id " VNI "
-- specifies the VXLAN Network Identifer (or VXLAN Segment
+- specifies the VXLAN Network Identifier (or VXLAN Segment
 Identifier) to use.
 
 .BI dev " PHYS_DEV"
@@ -1237,7 +1237,7 @@ the following additional arguments are supported:
 .in +8
 .sp
 .BI  id " VNI "
-- specifies the Virtual Network Identifer to use.
+- specifies the Virtual Network Identifier to use.
 
 .sp
 .BI remote " IPADDR"
@@ -2147,7 +2147,7 @@ loaded under
 .B xdpgeneric object "|" pinned
 then the kernel will use the generic XDP variant instead of the native one.
 .B xdpdrv
-has the opposite effect of requestsing that the automatic fallback to the
+has the opposite effect of requesting that the automatic fallback to the
 generic XDP variant be disabled and in case driver is not XDP-capable error
 should be returned.
 .B xdpdrv
@@ -2466,7 +2466,7 @@ specifies the master device which enslaves devices to 
show.
 .TP
 .BI vrf " NAME "
 .I NAME
-speficies the VRF which enslaves devices to show.
+specifies the VRF which enslaves devices to show.
 
 .TP
 .BI type " TYPE "
@@ -2497,7 +2497,7 @@ specifies the device to display address-family statistics 
for.
 
 .PP
 .I "TYPE"
-specifies which help of link type to dislpay.
+specifies which help of link type to display.
 
 .SS
 .I GROUP
diff --git a/man/man8/ip-neighbour.8 b/man/man8/ip-neighbour.8
index f71f18b..a27f9ef 100644
--- a/man/man8/ip-neighbour.8
+++ b/man/man8/ip-neighbour.8
@@ -85,11 +85,11 @@ the interface to which this neighbour is attached.
 
 .TP
 .BI proxy
-indicates whether we are proxying for this neigbour entry
+indicates whether we are proxying for this neighbour entry
 
 .TP
 .BI router
-indicates whether neigbour is a router
+indicates whether neighbour is a router
 
 .TP
 .BI extern_learn
@@ -244,7 +244,7 @@ lookup a neighbour entry to a destination given a device
 
 .TP
 .BI proxy
-indicates whether we should lookup a proxy neigbour entry
+indicates whether we should lookup a proxy neighbour entry
 
 .TP
 .BI to " ADDRESS " (default)
diff --git a/man/man8/tc-actions.8 b/man/man8/tc-actions.8
index 6f1c201..80df577 100644
--- a/man/man8/tc-actions.8
+++ b/man/man8/tc-actions.8
@@ -253,7 +253,7 @@ should proceed after executing the action. Any of the 
following are valid:
 .RS
 .TP
 .B reclassify
-Restart the classifiction by jumping back to the first filter attached to
+Restart the classification by jumping back to the first filter attached to
 the action's parent.
 .TP
 .B pipe
diff --git a/man/man8/tc-pie.8 b/man/man8/tc-pie.8
index 0db97d1..5a8c782 100644
--- a/man/man8/tc-pie.8
+++ b/man/man8/tc-pie.8
@@ -40,7 +40,7 @@ aims to control delay. The main design goals are
 PIE is designed to control delay effectively. First, an average dequeue rate is
 estimated based on the standing queue. The rate is used to calculate the 
current
 delay. Then, on a periodic basis, the delay is used to calculate the dropping
-probabilty. Finally, on arrival, a packet is dropped (or marked) based on this
+probability. Finally, on arrival, a packet is dropped (or marked) based on this
 probability.
 
 PIE makes adjustments to the probability based on the trend of the delay i.e.
@@ -52,7 +52,7 @@ growth and are determined through control theoretic 
approaches. alpha determines
 the deviation between the current and target latency changes probability. beta 
exerts
 additional adjustments depending on the latency trend.
 
-The drop probabilty is used to mark packets in ecn mode. However, as in RED,
+The drop probability is used to mark packets in ecn mode. However, as in RED,
 beyond 10% packets are dropped based on this probability. The bytemode is used
 to drop packets proportional to the packet size.
 
-- 
2.28.0



Re: [PATCH net-next v6 2/7] net: dsa: Add DSA driver for Hirschmann Hellcreek switches

2020-10-08 Thread Vladimir Oltean
On Tue, Oct 06, 2020 at 12:13:04PM +0200, Kurt Kanzenbach wrote:
> >> >> +static const struct hellcreek_platform_data de1soc_r1_pdata = {
> >> >> +   .num_ports   = 4,
> >> >> +   .is_100_mbits= 1,
> >> >> +   .qbv_support = 1,
> >> >> +   .qbv_on_cpu_port = 1,
> >> >
> >> > Why does this matter?
> >> 
> >> Because Qbv on the CPU port is a feature and not all switch variants
> >> have that. It will matter as soon as TAPRIO is implemented.
> >
> > How do you plan to install a tc-taprio qdisc on the CPU port?
> 
> That's an issue to be sorted out.

Do you have a compelling use case for tc-taprio on the CPU port though?
I've been waiting for someone to put one on the table.
If it's just "nice to have", I don't think that DSA will change just to
accomodate that. The fact that the CPU port doesn't have a net device is
already pretty much the established behavior.


[PATCH net-next 1/3] net: mscc: ocelot: offload VLAN mangle action to VCAP IS1

2020-10-08 Thread Vladimir Oltean
The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM,
changes the classified VLAN.

We are only exposing this ability for switch ports that are under VLAN
aware bridges. This is because in standalone ports mode and under a
bridge with vlan_filtering=0, the ocelot driver configures the switch to
operate as VLAN-unaware, so the classified VLAN is not derived from the
802.1Q header from the packet, but instead is always equal to the
port-based VLAN ID of the ingress port. We _can_ still change the
classified VLAN for packets when operating in this mode, but the end
result will most likely be a drop, since both the ingress and the egress
port need to be members of the modified VLAN. And even if we install the
new classified VLAN into the VLAN table of the switch, the result would
still not be as expected: we wouldn't see, on the output port, the
modified VLAN tag, but the original one, even though the classified VLAN
was indeed modified. This is because of how the hardware works: on
egress, what is pushed to the frame is a "port tag", which gives us the
following options:

- Tag all frames with port tag (derived from the classified VLAN)
- Tag all frames with port tag, except if the classified VLAN is 0 or
  equal to the native VLAN of the egress port
- No port tag

Needless to say, in VLAN-unaware mode we are disabling the port tag.
Otherwise, the existing VLAN tag would be ignored, and a second VLAN
tag (the port tag), holding the classified VLAN, would be pushed
(instead of replacing the existing 802.1Q tag). This is definitely not
what the user wanted when installing a "vlan modify" action.

So it is simply not worth bothering with VLAN modify rules under other
configurations except when the ports are fully VLAN-aware.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c| 15 +++-
 drivers/net/ethernet/mscc/ocelot_flower.c | 29 ---
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c 
b/drivers/net/ethernet/mscc/ocelot.c
index a965a554ff8b..3c4aa6a0c397 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -205,8 +205,21 @@ int ocelot_port_vlan_filtering(struct ocelot *ocelot, int 
port,
struct ocelot_port *ocelot_port = ocelot->ports[port];
u32 val;
 
-   if (switchdev_trans_ph_prepare(trans))
+   if (switchdev_trans_ph_prepare(trans)) {
+   struct ocelot_vcap_block *block = &ocelot->block[VCAP_IS1];
+   struct ocelot_vcap_filter *filter;
+
+   list_for_each_entry(filter, &block->rules, list) {
+   if (filter->ingress_port_mask & BIT(port) &&
+   filter->action.vid_replace_ena) {
+   dev_err(ocelot->dev,
+   "Cannot change VLAN state with vlan 
modify rules active\n");
+   return -EBUSY;
+   }
+   }
+
return 0;
+   }
 
ocelot_port->vlan_aware = vlan_aware;
 
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c 
b/drivers/net/ethernet/mscc/ocelot_flower.c
index 0ea6d4f411cb..729495a1a77e 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -142,10 +142,11 @@ ocelot_find_vcap_filter_that_points_at(struct ocelot 
*ocelot, int chain)
return NULL;
 }
 
-static int ocelot_flower_parse_action(struct ocelot *ocelot, bool ingress,
- struct flow_cls_offload *f,
+static int ocelot_flower_parse_action(struct ocelot *ocelot, int port,
+ bool ingress, struct flow_cls_offload *f,
  struct ocelot_vcap_filter *filter)
 {
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
struct netlink_ext_ack *extack = f->common.extack;
bool allow_missing_goto_target = false;
const struct flow_action_entry *a;
@@ -266,6 +267,28 @@ static int ocelot_flower_parse_action(struct ocelot 
*ocelot, bool ingress,
}
filter->type = OCELOT_VCAP_FILTER_OFFLOAD;
break;
+   case FLOW_ACTION_VLAN_MANGLE:
+   if (filter->block_id != VCAP_IS1) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "VLAN modify action can only 
be offloaded to VCAP IS1");
+   return -EOPNOTSUPP;
+   }
+   if (filter->goto_target != -1) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "Last action must be GOTO");
+   return -EOPNOTSUPP;
+   }
+   if (!ocelot_port->vlan_aware) {
+   

[PATCH net-next 0/3] Offload tc-vlan mangle to mscc_ocelot switch

2020-10-08 Thread Vladimir Oltean
This series offloads one more action to the VCAP IS1 ingress TCAM, which
is to change the classified VLAN for packets, according to the VCAP IS1
keys (VLAN, source MAC, source IP, EtherType, etc).

Vladimir Oltean (3):
  net: mscc: ocelot: offload VLAN mangle action to VCAP IS1
  net: dsa: tag_ocelot: use VLAN information from tagging header when
available
  selftests: net: mscc: ocelot: add test for VLAN modify action

 drivers/net/ethernet/mscc/ocelot.c| 15 +-
 drivers/net/ethernet/mscc/ocelot_flower.c | 29 ++--
 net/dsa/tag_ocelot.c  | 34 ++
 .../drivers/net/ocelot/tc_flower_chains.sh| 47 ++-
 4 files changed, 119 insertions(+), 6 deletions(-)

-- 
2.25.1



[PATCH net-next 2/3] net: dsa: tag_ocelot: use VLAN information from tagging header when available

2020-10-08 Thread Vladimir Oltean
When the Extraction Frame Header contains a valid classified VLAN, use
that instead of the VLAN header present in the packet.

Signed-off-by: Vladimir Oltean 
---
 net/dsa/tag_ocelot.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index ec16badb7812..3b468aca5c53 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -184,9 +184,14 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
  struct net_device *netdev,
  struct packet_type *pt)
 {
+   struct dsa_port *cpu_dp = netdev->dsa_ptr;
+   struct dsa_switch *ds = cpu_dp->ds;
+   struct ocelot *ocelot = ds->priv;
u64 src_port, qos_class;
+   u64 vlan_tci, tag_type;
u8 *start = skb->data;
u8 *extraction;
+   u16 vlan_tpid;
 
/* Revert skb->data by the amount consumed by the DSA master,
 * so it points to the beginning of the frame.
@@ -214,6 +219,8 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
 
packing(extraction, &src_port,  46, 43, OCELOT_TAG_LEN, UNPACK, 0);
packing(extraction, &qos_class, 19, 17, OCELOT_TAG_LEN, UNPACK, 0);
+   packing(extraction, &tag_type,  16, 16, OCELOT_TAG_LEN, UNPACK, 0);
+   packing(extraction, &vlan_tci,  15,  0, OCELOT_TAG_LEN, UNPACK, 0);
 
skb->dev = dsa_master_find_slave(netdev, 0, src_port);
if (!skb->dev)
@@ -228,6 +235,33 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
skb->offload_fwd_mark = 1;
skb->priority = qos_class;
 
+   /* Ocelot switches copy frames unmodified to the CPU. However, it is
+* possible for the user to request a VLAN modification through
+* VCAP_IS1_ACT_VID_REPLACE_ENA. In this case, what will happen is that
+* the VLAN ID field from the Extraction Header gets updated, but the
+* 802.1Q header does not (the classified VLAN only becomes visible on
+* egress through the "port tag" of front-panel ports).
+* So, for traffic extracted by the CPU, we want to pick up the
+* classified VLAN and manually replace the existing 802.1Q header from
+* the packet with it, so that the operating system is always up to
+* date with the result of tc-vlan actions.
+* NOTE: In VLAN-unaware mode, we don't want to do that, we want the
+* frame to remain unmodified, because the classified VLAN is always
+* equal to the pvid of the ingress port and should not be used for
+* processing.
+*/
+   vlan_tpid = tag_type ? ETH_P_8021AD : ETH_P_8021Q;
+
+   if (ocelot->ports[src_port]->vlan_aware &&
+   eth_hdr(skb)->h_proto == htons(vlan_tpid)) {
+   u16 dummy_vlan_tci;
+
+   skb_push_rcsum(skb, ETH_HLEN);
+   __skb_vlan_pop(skb, &dummy_vlan_tci);
+   skb_pull_rcsum(skb, ETH_HLEN);
+   __vlan_hwaccel_put_tag(skb, htons(vlan_tpid), vlan_tci);
+   }
+
return skb;
 }
 
-- 
2.25.1



[PATCH net-next 3/3] selftests: net: mscc: ocelot: add test for VLAN modify action

2020-10-08 Thread Vladimir Oltean
Create a test that changes a VLAN ID from 200 to 300.

We also need to modify the preferences of the filters installed for the
other rules so that they are unique, because we now install the "tc-vlan
modify" filter in VCAP IS1 only temporarily, and we need to perform the
deletion by filter preference number.

Signed-off-by: Vladimir Oltean 
---
 .../drivers/net/ocelot/tc_flower_chains.sh| 47 ++-
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh 
b/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh
index 71a538add08a..beee0d5646a6 100755
--- a/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh
+++ b/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh
@@ -166,6 +166,9 @@ setup_prepare()
ip link add link $eth3 name $eth3.100 type vlan id 100
ip link set $eth3.100 up
 
+   ip link add link $eth3 name $eth3.200 type vlan id 200
+   ip link set $eth3.200 up
+
tc filter add dev $eth0 ingress chain $(IS1 1) pref 1 \
protocol 802.1Q flower skip_sw vlan_id 100 \
action vlan pop \
@@ -175,12 +178,12 @@ setup_prepare()
flower skip_sw indev $eth1 \
action vlan push protocol 802.1Q id 100
 
-   tc filter add dev $eth0 ingress chain $(IS1 0) \
+   tc filter add dev $eth0 ingress chain $(IS1 0) pref 2 \
protocol ipv4 flower skip_sw src_ip 10.1.1.2 \
action skbedit priority 7 \
action goto chain $(IS1 1)
 
-   tc filter add dev $eth0 ingress chain $(IS2 0 0) \
+   tc filter add dev $eth0 ingress chain $(IS2 0 0) pref 1 \
protocol ipv4 flower skip_sw ip_proto udp dst_port 5201 \
action police rate 50mbit burst 64k \
action goto chain $(IS2 1 0)
@@ -188,6 +191,7 @@ setup_prepare()
 
 cleanup()
 {
+   ip link del $eth3.200
ip link del $eth3.100
tc qdisc del dev $eth0 clsact
ip link del br0
@@ -238,6 +242,44 @@ test_vlan_push()
tcpdump_cleanup
 }
 
+test_vlan_modify()
+{
+   printf "Testing VLAN modification.. "
+
+   ip link set br0 type bridge vlan_filtering 1
+   bridge vlan add dev $eth0 vid 200
+   bridge vlan add dev $eth0 vid 300
+   bridge vlan add dev $eth1 vid 300
+
+   tc filter add dev $eth0 ingress chain $(IS1 2) pref 3 \
+   protocol 802.1Q flower skip_sw vlan_id 200 \
+   action vlan modify id 300 \
+   action goto chain $(IS2 0 0)
+
+   tcpdump_start $eth2
+
+   $MZ $eth3.200 -q -c 1 -p 64 -a $eth3_mac -b $eth2_mac -t ip
+
+   sleep 1
+
+   tcpdump_stop
+
+   if tcpdump_show | grep -q "$eth3_mac > $eth2_mac, .* vlan 300"; then
+   echo "OK"
+   else
+   echo "FAIL"
+   fi
+
+   tcpdump_cleanup
+
+   tc filter del dev $eth0 ingress chain $(IS1 2) pref 3
+
+   bridge vlan del dev $eth0 vid 200
+   bridge vlan del dev $eth0 vid 300
+   bridge vlan del dev $eth1 vid 300
+   ip link set br0 type bridge vlan_filtering 0
+}
+
 test_skbedit_priority()
 {
local num_pkts=100
@@ -262,6 +304,7 @@ trap cleanup EXIT
 ALL_TESTS="
test_vlan_pop
test_vlan_push
+   test_vlan_modify
test_skbedit_priority
 "
 
-- 
2.25.1



[PATCH v1 6/6] staging: qlge: add documentation for debugging qlge

2020-10-08 Thread Coiby Xu
Instructions and examples on kernel data structures dumping and coredump.

Signed-off-by: Coiby Xu 
---
 .../networking/device_drivers/index.rst   |   1 +
 .../device_drivers/qlogic/index.rst   |  18 +++
 .../networking/device_drivers/qlogic/qlge.rst | 118 ++
 MAINTAINERS   |   6 +
 4 files changed, 143 insertions(+)
 create mode 100644 Documentation/networking/device_drivers/qlogic/index.rst
 create mode 100644 Documentation/networking/device_drivers/qlogic/qlge.rst

diff --git a/Documentation/networking/device_drivers/index.rst 
b/Documentation/networking/device_drivers/index.rst
index a3113ffd7a16..d8279de7bf25 100644
--- a/Documentation/networking/device_drivers/index.rst
+++ b/Documentation/networking/device_drivers/index.rst
@@ -15,6 +15,7 @@ Contents:
ethernet/index
fddi/index
hamradio/index
+   qlogic/index
wan/index
wifi/index
 
diff --git a/Documentation/networking/device_drivers/qlogic/index.rst 
b/Documentation/networking/device_drivers/qlogic/index.rst
new file mode 100644
index ..ad05b04286e4
--- /dev/null
+++ b/Documentation/networking/device_drivers/qlogic/index.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+QLogic QLGE Device Drivers
+===
+
+Contents:
+
+.. toctree::
+   :maxdepth: 2
+
+   qlge
+
+.. only::  subproject and html
+
+   Indices
+   ===
+
+   * :ref:`genindex`
diff --git a/Documentation/networking/device_drivers/qlogic/qlge.rst 
b/Documentation/networking/device_drivers/qlogic/qlge.rst
new file mode 100644
index ..0b888253d152
--- /dev/null
+++ b/Documentation/networking/device_drivers/qlogic/qlge.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+QLogic QLGE 10Gb Ethernet device driver
+===
+
+This driver use drgn and devlink for debugging.
+
+Dump kernel data structures in drgn
+---
+
+To dump kernel data structures, the following Python script can be used
+in drgn:
+
+.. code-block:: python
+
+   def align(x, a):
+   """the alignment a should be a power of 2
+   """
+   mask = a - 1
+   return (x+ mask) & ~mask
+
+   def struct_size(struct_type):
+   struct_str = "struct {}".format(struct_type)
+   return sizeof(Object(prog, struct_str, address=0x0))
+
+   def netdev_priv(netdevice):
+   NETDEV_ALIGN = 32
+   return netdevice.value_() + align(struct_size("net_device"), 
NETDEV_ALIGN)
+
+   name = 'xxx'
+   qlge_device = None
+   netdevices = prog['init_net'].dev_base_head.address_of_()
+   for netdevice in list_for_each_entry("struct net_device", netdevices, 
"dev_list"):
+   if netdevice.name.string_().decode('ascii') == name:
+   print(netdevice.name)
+
+   ql_adapter = Object(prog, "struct ql_adapter", 
address=netdev_priv(qlge_device))
+
+The struct ql_adapter will be printed in drgn as follows,
+
+>>> ql_adapter
+(struct ql_adapter){
+.ricb = (struct ricb){
+.base_cq = (u8)0,
+.flags = (u8)120,
+.mask = (__le16)26637,
+.hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 },
+.ipv6_hash_key = (__le32 [10]){},
+.ipv4_hash_key = (__le32 [4]){},
+},
+.flags = (unsigned long)0,
+.wol = (u32)0,
+.nic_stats = (struct nic_stats){
+.tx_pkts = (u64)0,
+.tx_bytes = (u64)0,
+.tx_mcast_pkts = (u64)0,
+.tx_bcast_pkts = (u64)0,
+.tx_ucast_pkts = (u64)0,
+.tx_ctl_pkts = (u64)0,
+.tx_pause_pkts = (u64)0,
+...
+},
+.active_vlans = (unsigned long [64]){
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615,
+18446619461681283072, 0, 42949673024, 2147483647,
+},
+.rx_ring = (struct rx_ring [17]){
+{
+.cqicb = (struct cqicb){
+.msix_vect = (u8)0,
+.reserved1 = (u8)0,
+.reserved2 = (u8)0,
+.flags = (u8)0,
+.len = (__le16)0,
+.rid = (__le16)0,
+...
+},
+.cq_base = (void *)0x0,
+.cq_base_dma = (dma_addr_t)0,
+}
+   

[PATCH v1 4/6] staging: qlge: remove mpi_core_to_log which sends coredump to the kernel ring buffer

2020-10-08 Thread Coiby Xu
devlink health could be used to get coredump. No need to send so much
data to the kernel ring buffer.

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/qlge.h |  3 ---
 drivers/staging/qlge/qlge_dbg.c | 11 ---
 drivers/staging/qlge/qlge_ethtool.c |  1 -
 drivers/staging/qlge/qlge_main.c|  2 --
 drivers/staging/qlge/qlge_mpi.c |  6 --
 5 files changed, 23 deletions(-)

diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h
index 290e754450c5..0a39801be15a 100644
--- a/drivers/staging/qlge/qlge.h
+++ b/drivers/staging/qlge/qlge.h
@@ -2149,7 +2149,6 @@ struct ql_adapter {
u32 port_init;
u32 link_status;
struct ql_mpi_coredump *mpi_coredump;
-   u32 core_is_dumped;
u32 link_config;
u32 led_config;
u32 max_frame_size;
@@ -2162,7 +2161,6 @@ struct ql_adapter {
struct delayed_work mpi_work;
struct delayed_work mpi_port_cfg_work;
struct delayed_work mpi_idc_work;
-   struct delayed_work mpi_core_to_log;
struct completion ide_completion;
const struct nic_operations *nic_ops;
u16 device_id;
@@ -2253,7 +2251,6 @@ int ql_write_cfg(struct ql_adapter *qdev, void *ptr, int 
size, u32 bit,
 void ql_queue_fw_error(struct ql_adapter *qdev);
 void ql_mpi_work(struct work_struct *work);
 void ql_mpi_reset_work(struct work_struct *work);
-void ql_mpi_core_to_log(struct work_struct *work);
 int ql_wait_reg_rdy(struct ql_adapter *qdev, u32 reg, u32 bit, u32 ebit);
 void ql_queue_asic_error(struct ql_adapter *qdev);
 void ql_set_ethtool_ops(struct net_device *ndev);
diff --git a/drivers/staging/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c
index 42fd13990f3a..989575743718 100644
--- a/drivers/staging/qlge/qlge_dbg.c
+++ b/drivers/staging/qlge/qlge_dbg.c
@@ -1314,17 +1314,6 @@ void ql_get_dump(struct ql_adapter *qdev, void *buff)
}
 }
 
-/* Coredump to messages log file using separate worker thread */
-void ql_mpi_core_to_log(struct work_struct *work)
-{
-   struct ql_adapter *qdev =
-   container_of(work, struct ql_adapter, mpi_core_to_log.work);
-
-   print_hex_dump(KERN_DEBUG, "Core is dumping to log file!\n",
-  DUMP_PREFIX_OFFSET, 32, 4, qdev->mpi_coredump,
-  sizeof(*qdev->mpi_coredump), false);
-}
-
 #ifdef QL_REG_DUMP
 static void ql_dump_intr_states(struct ql_adapter *qdev)
 {
diff --git a/drivers/staging/qlge/qlge_ethtool.c 
b/drivers/staging/qlge/qlge_ethtool.c
index d44b2dae9213..eed116d8895e 100644
--- a/drivers/staging/qlge/qlge_ethtool.c
+++ b/drivers/staging/qlge/qlge_ethtool.c
@@ -616,7 +616,6 @@ static void ql_get_regs(struct net_device *ndev,
struct ql_adapter *qdev = netdev_priv(ndev);
 
ql_get_dump(qdev, p);
-   qdev->core_is_dumped = 0;
if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
regs->len = sizeof(struct ql_mpi_coredump);
else
diff --git a/drivers/staging/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c
index 135225530e51..aaca740d46c4 100644
--- a/drivers/staging/qlge/qlge_main.c
+++ b/drivers/staging/qlge/qlge_main.c
@@ -3808,7 +3808,6 @@ static void ql_cancel_all_work_sync(struct ql_adapter 
*qdev)
cancel_delayed_work_sync(&qdev->mpi_reset_work);
cancel_delayed_work_sync(&qdev->mpi_work);
cancel_delayed_work_sync(&qdev->mpi_idc_work);
-   cancel_delayed_work_sync(&qdev->mpi_core_to_log);
cancel_delayed_work_sync(&qdev->mpi_port_cfg_work);
 }
 
@@ -4504,7 +4503,6 @@ static int ql_init_device(struct pci_dev *pdev, struct 
net_device *ndev,
INIT_DELAYED_WORK(&qdev->mpi_work, ql_mpi_work);
INIT_DELAYED_WORK(&qdev->mpi_port_cfg_work, ql_mpi_port_cfg_work);
INIT_DELAYED_WORK(&qdev->mpi_idc_work, ql_mpi_idc_work);
-   INIT_DELAYED_WORK(&qdev->mpi_core_to_log, ql_mpi_core_to_log);
init_completion(&qdev->ide_completion);
mutex_init(&qdev->mpi_mutex);
 
diff --git a/drivers/staging/qlge/qlge_mpi.c b/drivers/staging/qlge/qlge_mpi.c
index 143a886080c5..1cea24201b17 100644
--- a/drivers/staging/qlge/qlge_mpi.c
+++ b/drivers/staging/qlge/qlge_mpi.c
@@ -1269,11 +1269,5 @@ void ql_mpi_reset_work(struct work_struct *work)
return;
}
 
-   if (qdev->mpi_coredump && !ql_core_dump(qdev, qdev->mpi_coredump)) {
-   netif_err(qdev, drv, qdev->ndev, "Core is dumped!\n");
-   qdev->core_is_dumped = 1;
-   queue_delayed_work(qdev->workqueue,
-  &qdev->mpi_core_to_log, 5 * HZ);
-   }
ql_soft_reset_mpi_risc(qdev);
 }
-- 
2.28.0



[PATCH v1 2/6] staging: qlge: coredump via devlink health reporter

2020-10-08 Thread Coiby Xu
$ devlink health dump show DEVICE reporter coredump -p -j
{
"Core Registers": {
"segment": 1,
"values": [ 
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 ]
},
"Test Logic Regs": {
"segment": 2,
"values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
},
"RMII Registers": {
"segment": 3,
"values": [ 
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 ]
},
...
"Sem Registers": {
"segment": 50,
"values": [ 0,0,0,0 ]
}
}

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/qlge_devlink.c | 131 ++--
 1 file changed, 125 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/qlge/qlge_devlink.c 
b/drivers/staging/qlge/qlge_devlink.c
index aa45e7e368c0..91b6600b94a9 100644
--- a/drivers/staging/qlge/qlge_devlink.c
+++ b/drivers/staging/qlge/qlge_devlink.c
@@ -1,16 +1,135 @@
 #include "qlge.h"
 #include "qlge_devlink.h"
 
-static int
-qlge_reporter_coredump(struct devlink_health_reporter *reporter,
-   struct devlink_fmsg *fmsg, void *priv_ctx,
-   struct netlink_ext_ack *extack)
+static int fill_seg_(struct devlink_fmsg *fmsg,
+   struct mpi_coredump_segment_header *seg_header,
+   u32 *reg_data)
 {
-   return 0;
+   int i;
+   int header_size = sizeof(struct mpi_coredump_segment_header);
+   int regs_num = (seg_header->seg_size - header_size) / sizeof(u32);
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, seg_header->description);
+   if (err)
+   return err;
+   err = devlink_fmsg_obj_nest_start(fmsg);
+   if (err)
+   return err;
+   err = devlink_fmsg_u32_pair_put(fmsg, "segment", seg_header->seg_num);
+   if (err)
+   return err;
+   err = devlink_fmsg_arr_pair_nest_start(fmsg, "values");
+   if (err)
+   return err;
+   for (i = 0; i < regs_num; i++) {
+   err = devlink_fmsg_u32_put(fmsg, *reg_data);
+   if (err)
+   return err;
+   reg_data++;
+   }
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+   err = devlink_fmsg_arr_pair_nest_end(fmsg);
+   if (err)
+   return err;
+   err = devlink_fmsg_pair_nest_end(fmsg);
+   return err;
+}
+
+#define fill_seg(seg_hdr, seg_regs)   \
+   do {   \
+   err = fill_seg_(fmsg, &dump->seg_hdr, dump->seg_regs); \
+   if (err) { \
+   kvfree(dump);  \
+   return err;\
+   }  \
+   } while (0)
+
+static int qlge_reporter_coredump(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg, void *priv_ctx,
+ struct netlink_ext_ack *extack)
+{
+   int err = 0;
+
+   struct qlge_devlink *dev = devlink_health_reporter_priv(reporter);
+   struct ql_adapter *qdev = dev->qdev;
+   struct ql_mpi_coredump *dump;
+
+   if (!netif_running(qdev->ndev))
+   return 0;
+
+   dump = kvmalloc(sizeof(*dump), GFP_KERNEL);
+   if (!dump)
+   return -ENOMEM;
+
+   err = ql_core_dump(qdev, dump);
+   if (err) {
+   kvfree(dump);
+   return err;
+   }
+
+   fill_seg(core_regs_seg_hdr, mpi_core_regs);
+   fill_seg(test_logic_regs_seg_hdr, test_logic_regs);
+   fill_seg(rmii_regs_seg_hdr, rmii_regs);
+   fill_seg(fcmac1_regs_seg_hdr, fcmac1_regs);
+   fill_seg(fcmac2_regs_seg_hdr, fcmac2_regs);
+   fill_seg(fc1_mbx_regs_seg_hdr, fc1_mbx_regs);
+   fill_seg(ide_regs_seg_hdr, ide_regs);
+   fill_seg(nic1_mbx_regs_seg_hdr, nic1_mbx_regs);
+   fill_seg(smbus_regs_seg_hdr, smbus_regs);
+   fill_seg(fc2_mbx_regs_seg_hdr, fc2_mbx_regs);
+   fill_seg(nic2_mbx_regs_seg_hdr, nic2_mbx_regs);
+   fill_seg(i2c_regs_seg_hdr, i2c_regs);
+   fill_seg(memc_regs_seg_hdr, memc_regs);
+   fill_seg(pbus_regs_seg_hdr, pbus_regs);
+   fill_seg(mde_regs_seg_hdr, mde_regs);
+   fill_seg(nic_regs_seg_hdr, nic_regs);
+   fill_seg(nic2_regs_seg_hdr, nic2_regs);
+   fill_seg(xgmac1_seg_hdr, xgmac1);
+   fill_seg(xgmac2_seg_hdr, xgmac2

[PATCH v1 3/6] staging: qlge: support force_coredump option for devlink health dump

2020-10-08 Thread Coiby Xu
With force_coredump module paramter set, devlink health dump will reset
the MPI RISC first which takes 5 secs to be finished.

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/qlge_devlink.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/staging/qlge/qlge_devlink.c 
b/drivers/staging/qlge/qlge_devlink.c
index 91b6600b94a9..54257468bc7f 100644
--- a/drivers/staging/qlge/qlge_devlink.c
+++ b/drivers/staging/qlge/qlge_devlink.c
@@ -56,10 +56,17 @@ static int qlge_reporter_coredump(struct 
devlink_health_reporter *reporter,
struct qlge_devlink *dev = devlink_health_reporter_priv(reporter);
struct ql_adapter *qdev = dev->qdev;
struct ql_mpi_coredump *dump;
+   wait_queue_head_t wait;
 
if (!netif_running(qdev->ndev))
return 0;
 
+   if (test_bit(QL_FRC_COREDUMP, &qdev->flags)) {
+   ql_queue_fw_error(qdev);
+   init_waitqueue_head(&wait);
+   wait_event_timeout(wait, 0, 5 * HZ);
+   }
+
dump = kvmalloc(sizeof(*dump), GFP_KERNEL);
if (!dump)
return -ENOMEM;
-- 
2.28.0



[PATCH v1 5/6] staging: qlge: clean up debugging code in the QL_ALL_DUMP ifdef land

2020-10-08 Thread Coiby Xu
The debugging code in the following ifdef land
 - QL_ALL_DUMP
 - QL_REG_DUMP
 - QL_DEV_DUMP
 - QL_CB_DUMP
 - QL_IB_DUMP
 - QL_OB_DUMP

becomes unnecessary because,
 - Device status and general registers can be obtained by ethtool.
 - Coredump can be done via devlink health reporter.
 - Structure related to the hardware (struct ql_adapter) can be obtained
   by crash or drgn.

Suggested-by: Benjamin Poirier 
Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/qlge.h |  82 
 drivers/staging/qlge/qlge_dbg.c | 688 
 drivers/staging/qlge/qlge_ethtool.c |   2 -
 drivers/staging/qlge/qlge_main.c|   7 +-
 4 files changed, 1 insertion(+), 778 deletions(-)

diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h
index 0a39801be15a..8aff3ba77730 100644
--- a/drivers/staging/qlge/qlge.h
+++ b/drivers/staging/qlge/qlge.h
@@ -2285,86 +2285,4 @@ void ql_check_lb_frame(struct ql_adapter *qdev, struct 
sk_buff *skb);
 int ql_own_firmware(struct ql_adapter *qdev);
 int ql_clean_lb_rx_ring(struct rx_ring *rx_ring, int budget);
 
-/* #define QL_ALL_DUMP */
-/* #define QL_REG_DUMP */
-/* #define QL_DEV_DUMP */
-/* #define QL_CB_DUMP */
-/* #define QL_IB_DUMP */
-/* #define QL_OB_DUMP */
-
-#ifdef QL_REG_DUMP
-void ql_dump_xgmac_control_regs(struct ql_adapter *qdev);
-void ql_dump_routing_entries(struct ql_adapter *qdev);
-void ql_dump_regs(struct ql_adapter *qdev);
-#define QL_DUMP_REGS(qdev) ql_dump_regs(qdev)
-#define QL_DUMP_ROUTE(qdev) ql_dump_routing_entries(qdev)
-#define QL_DUMP_XGMAC_CONTROL_REGS(qdev) ql_dump_xgmac_control_regs(qdev)
-#else
-#define QL_DUMP_REGS(qdev)
-#define QL_DUMP_ROUTE(qdev)
-#define QL_DUMP_XGMAC_CONTROL_REGS(qdev)
-#endif
-
-#ifdef QL_STAT_DUMP
-void ql_dump_stat(struct ql_adapter *qdev);
-#define QL_DUMP_STAT(qdev) ql_dump_stat(qdev)
-#else
-#define QL_DUMP_STAT(qdev)
-#endif
-
-#ifdef QL_DEV_DUMP
-void ql_dump_qdev(struct ql_adapter *qdev);
-#define QL_DUMP_QDEV(qdev) ql_dump_qdev(qdev)
-#else
-#define QL_DUMP_QDEV(qdev)
-#endif
-
-#ifdef QL_CB_DUMP
-void ql_dump_wqicb(struct wqicb *wqicb);
-void ql_dump_tx_ring(struct tx_ring *tx_ring);
-void ql_dump_ricb(struct ricb *ricb);
-void ql_dump_cqicb(struct cqicb *cqicb);
-void ql_dump_rx_ring(struct rx_ring *rx_ring);
-void ql_dump_hw_cb(struct ql_adapter *qdev, int size, u32 bit, u16 q_id);
-#define QL_DUMP_RICB(ricb) ql_dump_ricb(ricb)
-#define QL_DUMP_WQICB(wqicb) ql_dump_wqicb(wqicb)
-#define QL_DUMP_TX_RING(tx_ring) ql_dump_tx_ring(tx_ring)
-#define QL_DUMP_CQICB(cqicb) ql_dump_cqicb(cqicb)
-#define QL_DUMP_RX_RING(rx_ring) ql_dump_rx_ring(rx_ring)
-#define QL_DUMP_HW_CB(qdev, size, bit, q_id) \
-   ql_dump_hw_cb(qdev, size, bit, q_id)
-#else
-#define QL_DUMP_RICB(ricb)
-#define QL_DUMP_WQICB(wqicb)
-#define QL_DUMP_TX_RING(tx_ring)
-#define QL_DUMP_CQICB(cqicb)
-#define QL_DUMP_RX_RING(rx_ring)
-#define QL_DUMP_HW_CB(qdev, size, bit, q_id)
-#endif
-
-#ifdef QL_OB_DUMP
-void ql_dump_tx_desc(struct ql_adapter *qdev, struct tx_buf_desc *tbd);
-void ql_dump_ob_mac_iocb(struct ql_adapter *qdev, struct ob_mac_iocb_req 
*ob_mac_iocb);
-void ql_dump_ob_mac_rsp(struct ql_adapter *qdev, struct ob_mac_iocb_rsp 
*ob_mac_rsp);
-#define QL_DUMP_OB_MAC_IOCB(qdev, ob_mac_iocb) ql_dump_ob_mac_iocb(qdev, 
ob_mac_iocb)
-#define QL_DUMP_OB_MAC_RSP(qdev, ob_mac_rsp) ql_dump_ob_mac_rsp(qdev, 
ob_mac_rsp)
-#else
-#define QL_DUMP_OB_MAC_IOCB(qdev, ob_mac_iocb)
-#define QL_DUMP_OB_MAC_RSP(qdev, ob_mac_rsp)
-#endif
-
-#ifdef QL_IB_DUMP
-void ql_dump_ib_mac_rsp(struct ql_adapter *qdev, struct ib_mac_iocb_rsp 
*ib_mac_rsp);
-#define QL_DUMP_IB_MAC_RSP(qdev, ib_mac_rsp) ql_dump_ib_mac_rsp(qdev, 
ib_mac_rsp)
-#else
-#define QL_DUMP_IB_MAC_RSP(qdev, ib_mac_rsp)
-#endif
-
-#ifdef QL_ALL_DUMP
-void ql_dump_all(struct ql_adapter *qdev);
-#define QL_DUMP_ALL(qdev) ql_dump_all(qdev)
-#else
-#define QL_DUMP_ALL(qdev)
-#endif
-
 #endif /* _QLGE_H_ */
diff --git a/drivers/staging/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c
index 989575743718..a02ecf8fb291 100644
--- a/drivers/staging/qlge/qlge_dbg.c
+++ b/drivers/staging/qlge/qlge_dbg.c
@@ -1313,691 +1313,3 @@ void ql_get_dump(struct ql_adapter *qdev, void *buff)
ql_get_core_dump(qdev);
}
 }
-
-#ifdef QL_REG_DUMP
-static void ql_dump_intr_states(struct ql_adapter *qdev)
-{
-   int i;
-   u32 value;
-
-   for (i = 0; i < qdev->intr_count; i++) {
-   ql_write32(qdev, INTR_EN, qdev->intr_context[i].intr_read_mask);
-   value = ql_read32(qdev, INTR_EN);
-   netdev_err(qdev->ndev, "Interrupt %d is %s\n", i,
-  (value & INTR_EN_EN ? "enabled" : "disabled"));
-   }
-}
-
-#define DUMP_XGMAC(qdev, reg)  \
-do {   \
-   u32 data;   \
-   ql_read_xgmac_reg(qdev, reg, &data);\
-   netdev_err(qdev->ndev,

[PATCH v1 1/6] staging: qlge: Initialize devlink health dump framework for the dlge driver

2020-10-08 Thread Coiby Xu
Initialize devlink health dump framework for the dlge driver so the
coredump could be done via devlink.

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/Kconfig|  1 +
 drivers/staging/qlge/Makefile   |  2 +-
 drivers/staging/qlge/qlge.h |  9 +++
 drivers/staging/qlge/qlge_devlink.c | 38 +
 drivers/staging/qlge/qlge_devlink.h |  8 ++
 drivers/staging/qlge/qlge_main.c| 28 +
 6 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/qlge/qlge_devlink.c
 create mode 100644 drivers/staging/qlge/qlge_devlink.h

diff --git a/drivers/staging/qlge/Kconfig b/drivers/staging/qlge/Kconfig
index a3cb25a3ab80..6d831ed67965 100644
--- a/drivers/staging/qlge/Kconfig
+++ b/drivers/staging/qlge/Kconfig
@@ -3,6 +3,7 @@
 config QLGE
tristate "QLogic QLGE 10Gb Ethernet Driver Support"
depends on ETHERNET && PCI
+   select NET_DEVLINK
help
This driver supports QLogic ISP8XXX 10Gb Ethernet cards.
 
diff --git a/drivers/staging/qlge/Makefile b/drivers/staging/qlge/Makefile
index 1dc2568e820c..07c1898a512e 100644
--- a/drivers/staging/qlge/Makefile
+++ b/drivers/staging/qlge/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QLGE) += qlge.o
 
-qlge-objs := qlge_main.o qlge_dbg.o qlge_mpi.o qlge_ethtool.o
+qlge-objs := qlge_main.o qlge_dbg.o qlge_mpi.o qlge_ethtool.o qlge_devlink.o
diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h
index b295990e361b..290e754450c5 100644
--- a/drivers/staging/qlge/qlge.h
+++ b/drivers/staging/qlge/qlge.h
@@ -2060,6 +2060,14 @@ struct nic_operations {
int (*port_initialize)(struct ql_adapter *qdev);
 };
 
+
+
+struct qlge_devlink {
+struct ql_adapter *qdev;
+struct net_device *ndev;
+struct devlink_health_reporter *reporter;
+};
+
 /*
  * The main Adapter structure definition.
  * This structure has all fields relevant to the hardware.
@@ -2077,6 +2085,7 @@ struct ql_adapter {
struct pci_dev *pdev;
struct net_device *ndev;/* Parent NET device */
 
+   struct qlge_devlink *ql_devlink;
/* Hardware information */
u32 chip_rev_id;
u32 fw_rev_id;
diff --git a/drivers/staging/qlge/qlge_devlink.c 
b/drivers/staging/qlge/qlge_devlink.c
new file mode 100644
index ..aa45e7e368c0
--- /dev/null
+++ b/drivers/staging/qlge/qlge_devlink.c
@@ -0,0 +1,38 @@
+#include "qlge.h"
+#include "qlge_devlink.h"
+
+static int
+qlge_reporter_coredump(struct devlink_health_reporter *reporter,
+   struct devlink_fmsg *fmsg, void *priv_ctx,
+   struct netlink_ext_ack *extack)
+{
+   return 0;
+}
+
+static const struct devlink_health_reporter_ops qlge_reporter_ops = {
+   .name = "dummy",
+   .dump = qlge_reporter_coredump,
+};
+
+int qlge_health_create_reporters(struct qlge_devlink *priv)
+{
+   int err;
+
+   struct devlink_health_reporter *reporter;
+   struct devlink *devlink;
+
+   devlink = priv_to_devlink(priv);
+   reporter =
+   devlink_health_reporter_create(devlink, &qlge_reporter_ops,
+  0,
+  priv);
+   if (IS_ERR(reporter)) {
+   netdev_warn(priv->ndev,
+   "Failed to create reporter, err = %ld\n",
+   PTR_ERR(reporter));
+   return PTR_ERR(reporter);
+   }
+   priv->reporter = reporter;
+
+   return err;
+}
diff --git a/drivers/staging/qlge/qlge_devlink.h 
b/drivers/staging/qlge/qlge_devlink.h
new file mode 100644
index ..c91f7a29e805
--- /dev/null
+++ b/drivers/staging/qlge/qlge_devlink.h
@@ -0,0 +1,8 @@
+#ifndef QLGE_DEVLINK_H
+#define QLGE_DEVLINK_H
+
+#include 
+
+int qlge_health_create_reporters(struct qlge_devlink *priv);
+
+#endif /* QLGE_DEVLINK_H */
diff --git a/drivers/staging/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c
index 27da386f9d87..135225530e51 100644
--- a/drivers/staging/qlge/qlge_main.c
+++ b/drivers/staging/qlge/qlge_main.c
@@ -42,6 +42,7 @@
 #include 
 
 #include "qlge.h"
+#include "qlge_devlink.h"
 
 char qlge_driver_name[] = DRV_NAME;
 const char qlge_driver_version[] = DRV_VERSION;
@@ -4549,6 +4550,8 @@ static void ql_timer(struct timer_list *t)
mod_timer(&qdev->timer, jiffies + (5 * HZ));
 }
 
+static const struct devlink_ops qlge_devlink_ops;
+
 static int qlge_probe(struct pci_dev *pdev,
  const struct pci_device_id *pci_entry)
 {
@@ -4556,6 +4559,13 @@ static int qlge_probe(struct pci_dev *pdev,
struct ql_adapter *qdev = NULL;
static int cards_found;
int err = 0;
+   struct devlink *devlink;
+   struct qlge_devlink *ql_devlink;
+
+   devlink = devlink_alloc(&qlge_devlink_ops, sizeof(struct qlge_devlink));
+   if (!devlink)
+   return -ENOMEM;
+   ql_devlink = devlink

[PATCH] dpaa_eth: enable NETIF_MSG_HW by default

2020-10-08 Thread Vladimir Oltean
From: Maxim Kochetkov 

When packets are received on the error queue, this function under
net_ratelimit():

netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n");

does not get printed. Instead we only see:

[ 3658.845592] net_ratelimit: 244 callbacks suppressed
[ 3663.969535] net_ratelimit: 230 callbacks suppressed
[ 3669.085478] net_ratelimit: 228 callbacks suppressed

Enabling NETIF_MSG_HW fixes this issue, and we can see some information
about the frame descriptors of packets.

Signed-off-by: Maxim Kochetkov 
Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index fdff3b4723ba..06cc863f4dd6 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -87,7 +87,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms");
 
 #define DPAA_MSG_DEFAULT (NETIF_MSG_DRV | NETIF_MSG_PROBE | \
  NETIF_MSG_LINK | NETIF_MSG_IFUP | \
- NETIF_MSG_IFDOWN)
+ NETIF_MSG_IFDOWN | NETIF_MSG_HW)
 
 #define DPAA_INGRESS_CS_THRESHOLD 0x1000
 /* Ingress congestion threshold on FMan ports
-- 
2.25.1



Re: [PATCH] dpaa_eth: enable NETIF_MSG_HW by default

2020-10-08 Thread Vladimir Oltean
On Thu, Oct 08, 2020 at 03:03:12PM +0300, Vladimir Oltean wrote:
> From: Maxim Kochetkov 
> 
> When packets are received on the error queue, this function under
> net_ratelimit():
> 
> netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n");
> 
> does not get printed. Instead we only see:
> 
> [ 3658.845592] net_ratelimit: 244 callbacks suppressed
> [ 3663.969535] net_ratelimit: 230 callbacks suppressed
> [ 3669.085478] net_ratelimit: 228 callbacks suppressed
> 
> Enabling NETIF_MSG_HW fixes this issue, and we can see some information
> about the frame descriptors of packets.
> 
> Signed-off-by: Maxim Kochetkov 
> Signed-off-by: Vladimir Oltean 
> ---

I'm sorry for failing to mention the target tree in the email subject.
This is directed towards net-next.

[PATCH 1/2] net: smc: fix missing brace warning for old compilers

2020-10-08 Thread Pujin Shi
For older versions of gcc, the array = {0}; will cause warnings:

net/smc/smc_llc.c: In function 'smc_llc_send_link_delete_all':
net/smc/smc_llc.c:1317:9: warning: missing braces around initializer 
[-Wmissing-braces]
  struct smc_llc_msg_del_link delllc = {0};
 ^
net/smc/smc_llc.c:1317:9: warning: (near initialization for 'delllc.hd') 
[-Wmissing-braces]

1 warnings generated

Fixes: f3811fd7bc97 ("net/smc: send DELETE_LINK, ALL message and wait for send 
to complete")
Signed-off-by: Pujin Shi 
---
 net/smc/smc_llc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 2db967f2fb50..d09d9d2d0bfd 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -1314,7 +1314,7 @@ static void smc_llc_process_cli_delete_link(struct 
smc_link_group *lgr)
  */
 void smc_llc_send_link_delete_all(struct smc_link_group *lgr, bool ord, u32 
rsn)
 {
-   struct smc_llc_msg_del_link delllc = {0};
+   struct smc_llc_msg_del_link delllc = {};
int i;
 
delllc.hd.common.type = SMC_LLC_DELETE_LINK;
-- 
2.18.1



[PATCH 2/2] net: smc: fix missing brace warning for old compilers

2020-10-08 Thread Pujin Shi
For older versions of gcc, the array = {0}; will cause warnings:

net/smc/smc_llc.c: In function 'smc_llc_add_link_local':
net/smc/smc_llc.c:1212:9: warning: missing braces around initializer 
[-Wmissing-braces]
  struct smc_llc_msg_add_link add_llc = {0};
 ^
net/smc/smc_llc.c:1212:9: warning: (near initialization for 'add_llc.hd') 
[-Wmissing-braces]
net/smc/smc_llc.c: In function 'smc_llc_srv_delete_link_local':
net/smc/smc_llc.c:1245:9: warning: missing braces around initializer 
[-Wmissing-braces]
  struct smc_llc_msg_del_link del_llc = {0};
 ^
net/smc/smc_llc.c:1245:9: warning: (near initialization for 'del_llc.hd') 
[-Wmissing-braces]

2 warnings generated

Fixes: 4dadd151b265 ("net/smc: enqueue local LLC messages")
Signed-off-by: Pujin Shi 
---
 net/smc/smc_llc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index d09d9d2d0bfd..85df0ef60500 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -1209,7 +1209,7 @@ static void smc_llc_process_srv_add_link(struct 
smc_link_group *lgr)
 /* enqueue a local add_link req to trigger a new add_link flow */
 void smc_llc_add_link_local(struct smc_link *link)
 {
-   struct smc_llc_msg_add_link add_llc = {0};
+   struct smc_llc_msg_add_link add_llc = {};
 
add_llc.hd.length = sizeof(add_llc);
add_llc.hd.common.type = SMC_LLC_ADD_LINK;
@@ -1242,7 +1242,7 @@ static void smc_llc_add_link_work(struct work_struct 
*work)
  */
 void smc_llc_srv_delete_link_local(struct smc_link *link, u8 del_link_id)
 {
-   struct smc_llc_msg_del_link del_llc = {0};
+   struct smc_llc_msg_del_link del_llc = {};
 
del_llc.hd.length = sizeof(del_llc);
del_llc.hd.common.type = SMC_LLC_DELETE_LINK;
-- 
2.18.1



Re: [PATCH v1 1/6] staging: qlge: Initialize devlink health dump framework for the dlge driver

2020-10-08 Thread Willem de Bruijn
On Thu, Oct 8, 2020 at 7:58 AM Coiby Xu  wrote:
>
> Initialize devlink health dump framework for the dlge driver so the
> coredump could be done via devlink.
>
> Signed-off-by: Coiby Xu 

> @@ -4556,6 +4559,13 @@ static int qlge_probe(struct pci_dev *pdev,
> struct ql_adapter *qdev = NULL;
> static int cards_found;
> int err = 0;
> +   struct devlink *devlink;
> +   struct qlge_devlink *ql_devlink;
> +
> +   devlink = devlink_alloc(&qlge_devlink_ops, sizeof(struct 
> qlge_devlink));
> +   if (!devlink)
> +   return -ENOMEM;
> +   ql_devlink = devlink_priv(devlink);
>
> ndev = alloc_etherdev_mq(sizeof(struct ql_adapter),
>  min(MAX_CPUS,

need to goto devlink_free instead of return -ENOMEM here, too.

> @@ -4614,6 +4624,16 @@ static int qlge_probe(struct pci_dev *pdev,
> free_netdev(ndev);
> return err;

and here

> }
> +
> +   err = devlink_register(devlink, &pdev->dev);
> +   if (err) {
> +   goto devlink_free;
> +   }
> +
> +   qlge_health_create_reporters(ql_devlink);
> +   ql_devlink->qdev = qdev;
> +   ql_devlink->ndev = ndev;
> +   qdev->ql_devlink = ql_devlink;
> /* Start up the timer to trigger EEH if
>  * the bus goes dead
>  */
> @@ -4624,6 +4644,10 @@ static int qlge_probe(struct pci_dev *pdev,
> atomic_set(&qdev->lb_count, 0);
> cards_found++;
> return 0;
> +
> +devlink_free:
> +   devlink_free(devlink);
> +   return err;
>  }


Re: [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len

2020-10-08 Thread Willem de Bruijn
On Thu, Oct 8, 2020 at 7:06 AM Jesper Dangaard Brouer  wrote:
>
> On Wed, 7 Oct 2020 16:46:10 -0700
> Maciej Żenczykowski  wrote:
>
> > >  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> > >  {
> > > -   return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > > - SKB_MAX_ALLOC;
> > > +   return IP_MAX_MTU;
> > >  }
> >
> > Shouldn't we just delete this helper instead and replace call sites?
>
> It does seem wrong to pass argument skb into this function, as it is
> no-longer used...
>
> Guess I can simply replace __bpf_skb_max_len with IP_MAX_MTU.

Should that be IP6_MAX_MTU, which is larger than IP_MAX_MTU?


Re: [PATCHv2 net-next 02/17] udp6: move the mss check after udp gso tunnel processing

2020-10-08 Thread Willem de Bruijn
On Thu, Oct 8, 2020 at 5:48 AM Xin Long  wrote:
>
> For some protocol's gso, like SCTP, it's using GSO_BY_FRAGS for
> gso_size. When using UDP to encapsulate its packet, it will
> return error in udp6_ufo_fragment() as skb->len < gso_size,
> and it will never go to the gso tunnel processing.
>
> So we should move this check after udp gso tunnel processing,
> the same as udp4_ufo_fragment() does. While at it, also tidy
> the variables up.

Please don't mix a new feature and code cleanup.

This patch changes almost every line of the function due to
indentation changes. But the only relevant part is

"
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;

if (skb->encapsulation && skb_shinfo(skb)->gso_type &
(SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
segs = skb_udp_tunnel_segment(skb, features, true);
else {
/* irrelevant here */
}

out:
return segs;
}
"

Is it a sufficient change to just skip the mss check if mss == GSO_BY_FRAGS?


  1   2   3   4   5   >