Re: [PATCH v2 1/6] Add ancillary bus support
On Wed, Oct 07, 2020 at 11:32:11PM -0700, Dan Williams wrote: > On Wed, Oct 7, 2020 at 10:21 PM Leon Romanovsky wrote: > > > > On Wed, Oct 07, 2020 at 08:46:45PM +, Ertman, David M wrote: > > > > -Original Message- > > > > From: Parav Pandit > > > > Sent: Wednesday, October 7, 2020 1:17 PM > > > > To: Leon Romanovsky ; Ertman, David M > > > > > > > > Cc: Pierre-Louis Bossart ; alsa- > > > > de...@alsa-project.org; pa...@mellanox.com; ti...@suse.de; > > > > netdev@vger.kernel.org; ranjani.sridha...@linux.intel.com; > > > > fred...@linux.intel.com; linux-r...@vger.kernel.org; > > > > dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe > > > > ; gre...@linuxfoundation.org; k...@kernel.org; > > > > Williams, > > > > Dan J ; Saleem, Shiraz > > > > ; da...@davemloft.net; Patil, Kiran > > > > > > > > Subject: RE: [PATCH v2 1/6] Add ancillary bus support > > > > > > > > > > > > > From: Leon Romanovsky > > > > > Sent: Thursday, October 8, 2020 12:56 AM > > > > > > > > > > > > This API is partially obscures low level driver-core code and > > > > > > > needs > > > > > > > to provide clear and proper abstractions without need to remember > > > > > > > about put_device. There is already _add() interface why don't you > > > > > > > do > > > > > > > put_device() in it? > > > > > > > > > > > > > > > > > > > The pushback Pierre is referring to was during our mid-tier internal > > > > > > review. It was primarily a concern of Parav as I recall, so he can > > > > > > speak to > > > > his > > > > > reasoning. > > > > > > > > > > > > What we originally had was a single API call > > > > > > (ancillary_device_register) that started with a call to > > > > > > device_initialize(), and every error path out of the function > > > > > > performed a > > > > > put_device(). > > > > > > > > > > > > Is this the model you have in mind? > > > > > > > > > > I don't like this flow: > > > > > ancillary_device_initialize() > > > > > if (ancillary_ancillary_device_add()) { > > > > > put_device() > > > > > ancillary_device_unregister() > > > > Calling device_unregister() is incorrect, because add() wasn't > > > > successful. > > > > Only put_device() or a wrapper ancillary_device_put() is necessary. > > > > > > > > > return err; > > > > > } > > > > > > > > > > And prefer this flow: > > > > > ancillary_device_initialize() > > > > > if (ancillary_device_add()) { > > > > > ancillary_device_unregister() > > > > This is incorrect and a clear deviation from the current core APIs that > > > > adds the > > > > confusion. > > > > > > > > > return err; > > > > > } > > > > > > > > > > In this way, the ancillary users won't need to do non-intuitive > > > > > put_device(); > > > > > > > > Below is most simple, intuitive and matching with core APIs for name and > > > > design pattern wise. > > > > init() > > > > { > > > > err = ancillary_device_initialize(); > > > > if (err) > > > > return ret; > > > > > > > > err = ancillary_device_add(); > > > > if (ret) > > > > goto err_unwind; > > > > > > > > err = some_foo(); > > > > if (err) > > > > goto err_foo; > > > > return 0; > > > > > > > > err_foo: > > > > ancillary_device_del(adev); > > > > err_unwind: > > > > ancillary_device_put(adev->dev); > > > > return err; > > > > } > > > > > > > > cleanup() > > > > { > > > > ancillary_device_de(adev); > > > > ancillary_device_put(adev); > > > > /* It is common to have a one wrapper for this as > > > > ancillary_device_unregister(). > > > > * This will match with core device_unregister() that has precise > > > > documentation. > > > > * but given fact that init() code need proper error unwinding, like > > > > above, > > > > * it make sense to have two APIs, and no need to export another > > > > symbol for unregister(). > > > > * This pattern is very easy to audit and code. > > > > */ > > > > } > > > > > > I like this flow +1 > > > > > > But ... since the init() function is performing both device_init and > > > device_add - it should probably be called ancillary_device_register, > > > and we are back to a single exported API for both register and > > > unregister. > > > > > > At that point, do we need wrappers on the primitives init, add, del, > > > and put? > > > > Let me summarize. > > 1. You are not providing driver/core API but simplification and obfuscation > > of basic primitives and structures. This is new layer. There is no room for > > a claim that we must to follow internal API. > > Yes, this a driver core api, Greg even questioned why it was in > drivers/bus instead of drivers/base which I think makes sense. We can argue till death, but at the end, this is a bus. > > > 2. API should be symmetric. If you call to _register()/_add(), you will need > > to call to _unregister()/_del(). Please don't add obscure _put(). > > It's not obscure it's a long standing semantic for how to properly > handle device_add() fa
RE: [PATCH v2 1/6] Add ancillary bus support
> From: Leon Romanovsky > Sent: Thursday, October 8, 2020 10:56 AM > > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote: > > > > > > > From: Pierre-Louis Bossart > > > Sent: Thursday, October 8, 2020 3:20 AM > > > > > > > > > On 10/7/20 4:22 PM, Ertman, David M wrote: > > > >> -Original Message- > > > >> From: Pierre-Louis Bossart > > > >> Sent: Wednesday, October 7, 2020 1:59 PM > > > >> To: Ertman, David M ; Parav Pandit > > > >> ; Leon Romanovsky > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com; > > > >> ti...@suse.de; netdev@vger.kernel.org; > > > >> ranjani.sridha...@linux.intel.com; > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org; > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe > > > >> ; gre...@linuxfoundation.org; k...@kernel.org; > > > >> Williams, Dan J ; Saleem, Shiraz > > > >> ; da...@davemloft.net; Patil, Kiran > > > >> > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support > > > >> > > > >> > > > >> > > > Below is most simple, intuitive and matching with core APIs for > > > name and design pattern wise. > > > init() > > > { > > > err = ancillary_device_initialize(); > > > if (err) > > > return ret; > > > > > > err = ancillary_device_add(); > > > if (ret) > > > goto err_unwind; > > > > > > err = some_foo(); > > > if (err) > > > goto err_foo; > > > return 0; > > > > > > err_foo: > > > ancillary_device_del(adev); > > > err_unwind: > > > ancillary_device_put(adev->dev); > > > return err; > > > } > > > > > > cleanup() > > > { > > > ancillary_device_de(adev); > > > ancillary_device_put(adev); > > > /* It is common to have a one wrapper for this as > > > ancillary_device_unregister(). > > > * This will match with core device_unregister() that has > > > precise documentation. > > > * but given fact that init() code need proper error > > > unwinding, like above, > > > * it make sense to have two APIs, and no need to export > > > another symbol for unregister(). > > > * This pattern is very easy to audit and code. > > > */ > > > } > > > >>> > > > >>> I like this flow +1 > > > >>> > > > >>> But ... since the init() function is performing both device_init > > > >>> and device_add - it should probably be called > > > >>> ancillary_device_register, and we are back to a single exported > > > >>> API for both register and unregister. > > > >> > > > >> Kind reminder that we introduced the two functions to allow the > > > >> caller to know if it needed to free memory when initialize() > > > >> fails, and it didn't need to free memory when add() failed since > > > >> put_device() takes care of it. If you have a single init() > > > >> function it's impossible to know which behavior to select on error. > > > >> > > > >> I also have a case with SoundWire where it's nice to first > > > >> initialize, then set some data and then add. > > > >> > > > > > > > > The flow as outlined by Parav above does an initialize as the > > > > first step, so every error path out of the function has to do a > > > > put_device(), so you would never need to manually free the memory > > > > in > > > the setup function. > > > > It would be freed in the release call. > > > > > > err = ancillary_device_initialize(); if (err) > > > return ret; > > > > > > where is the put_device() here? if the release function does any > > > sort of kfree, then you'd need to do it manually in this case. > > Since device_initialize() failed, put_device() cannot be done here. > > So yes, pseudo code should have shown, if (err) { > > kfree(adev); > > return err; > > } > > > > If we just want to follow register(), unregister() pattern, > > > > Than, > > > > ancillar_device_register() should be, > > > > /** > > * ancillar_device_register() - register an ancillary device > > * NOTE: __never directly free @adev after calling this function, even > > if it returned > > * an error. Always use ancillary_device_put() to give up the reference > initialized by this function. > > * This note matches with the core and caller knows exactly what to be > done. > > */ > > ancillary_device_register() > > { > > device_initialize(&adev->dev); > > if (!dev->parent || !adev->name) > > return -EINVAL; > > if (!dev->release && !(dev->type && dev->type->release)) { > > /* core is already capable and throws the warning when > release callback is not set. > > * It is done at drivers/base/core.c:1798. > > * For NULL release it says, "does not have a release() > function, it is broken and must be fixed" > > */ > > return -EINVAL; > > } > > err = dev_set_name(adev...); > > if
Re: [PATCH 2/2] usb: serial: option: add Cellient MPL200 card
On Thu, Oct 08, 2020 at 08:47:33AM +0200, Wilken Gottwalt wrote: > On Tue, 6 Oct 2020 09:02:01 +0200 > Johan Hovold wrote: > > > On Mon, Oct 05, 2020 at 02:07:23PM +0200, Wilken Gottwalt wrote: > > > On Mon, 5 Oct 2020 18:36:36 +0700 Lars Melin wrote: > > > > It is very likely that Cellient has replaced the VID with their own and > > > > kept the PID, it is something other mfgrs has done when buying modules > > > > from Qualcomm's series of devices with predefined composition. > > > > > > > > The MS Windows driver for 05c6:9025 describes the interfaces as: > > > > > > > > MI_00 Qualcomm HS-USB Diagnostics 9025 > > > > MI_01 Android Composite ADB Interface > > > > MI_02 Qualcomm HS-USB Android Modem 9025 > > > > MI_03 Qualcomm HS-USB NMEA 9025 > > > > MI_04 Qualcomm Wireless HS-USB Ethernet Adapter 9025 > > > > MI_05 USB Mass Storage Device > > > > > > > > where the net interface is for QMI/RMNET. > > > > It fully matches the blacklisting Wilken has done for 2692:9025 > > > > > > Does your device have a GPS connector? Mine had not and I'm not sure > > > if the description of MI_01 is actually correct. I remember looking at > > > this port and seeing bogus NMEA data. > > > > Well if it's NMEA then the interface shouldn't be blacklisted (even if > > the values are bogus on your device), but if it's ADB it should be as > > that is handled by userspace. > > > > Here's some lsusb output from a Cellient MPL200 that still uses the > > Qualcomm VID: > > > > > > https://www.mail-archive.com/modemmanager-devel@lists.freedesktop.org/msg04523.html > > > > which gives some support to Lars's hypothesis. I guess we'll just keep > > the first interface reserved. > > Lars and Johan are right here. I found an older external Gobi driver > where I actually added comments saying interface 1 is ADB and interface 3 > is NMEA delivering only zeroed values because of the missing antenna > connector, at least for the models I had access to. Great, thanks for confirming. Johan
Re: [PATCH v2 00/29] [Set 1,2,3] Rid W=1 warnings in Wireless
Luca Coelho writes: > On Tue, 2020-10-06 at 10:10 +0300, Kalle Valo wrote: >> Lee Jones writes: >> >> > On Tue, 06 Oct 2020, Kalle Valo wrote: >> > >> > > Lee Jones writes: >> > > >> > > > On Thu, 10 Sep 2020, Lee Jones wrote: >> > > > >> > > > > This is a rebased/re-worked set of patches which have been >> > > > > previously posted to the mailing list(s). >> > > > > >> > > > > This set is part of a larger effort attempting to clean-up W=1 >> > > > > kernel builds, which are currently overwhelmingly riddled with >> > > > > niggly little warnings. >> > > > > >> > > > > There are quite a few W=1 warnings in the Wireless. My plan >> > > > > is to work through all of them over the next few weeks. >> > > > > Hopefully it won't be too long before drivers/net/wireless >> > > > > builds clean with W=1 enabled. >> > > > > >> > > > > Lee Jones (29): >> > > > > iwlwifi: dvm: Demote non-compliant kernel-doc headers >> > > > > iwlwifi: rs: Demote non-compliant kernel-doc headers >> > > > > iwlwifi: dvm: tx: Demote non-compliant kernel-doc headers >> > > > > iwlwifi: dvm: lib: Demote non-compliant kernel-doc headers >> > > > > iwlwifi: calib: Demote seemingly unintentional kerneldoc header >> > > > > wil6210: Fix a couple of formatting issues in >> > > > > 'wil6210_debugfs_init' >> > > > > iwlwifi: dvm: sta: Demote a bunch of nonconformant kernel-doc >> > > > > headers >> > > > > iwlwifi: mvm: ops: Remove unused static struct >> > > > > 'iwl_mvm_debug_names' >> > > > > iwlwifi: dvm: Demote a couple of nonconformant kernel-doc headers >> > > > > iwlwifi: mvm: utils: Fix some doc-rot >> > > > > iwlwifi: dvm: scan: Demote a few nonconformant kernel-doc headers >> > > > > iwlwifi: dvm: rxon: Demote non-conformant kernel-doc headers >> > > > > iwlwifi: mvm: tx: Demote misuse of kernel-doc headers >> > > > > iwlwifi: dvm: devices: Fix function documentation formatting issues >> > > > > iwlwifi: iwl-drv: Provide descriptions debugfs dentries >> > > > > wil6210: wmi: Fix formatting and demote non-conforming function >> > > > > headers >> > > > > wil6210: interrupt: Demote comment header which is clearly not >> > > > > kernel-doc >> > > > > wil6210: txrx: Demote obvious abuse of kernel-doc >> > > > > wil6210: txrx_edma: Demote comments which are clearly not >> > > > > kernel-doc >> > > > > wil6210: pmc: Demote a few nonconformant kernel-doc function >> > > > > headers >> > > > > wil6210: wil_platform: Demote kernel-doc header to standard comment >> > > > > block >> > > > > wil6210: wmi: Correct misnamed function parameter 'ptr_' >> > > > > ath6kl: wmi: Remove unused variable 'rate' >> > > > > ath9k: ar9002_initvals: Remove unused array >> > > > > 'ar9280PciePhy_clkreq_off_L1_9280' >> > > > > ath9k: ar9001_initvals: Remove unused array 'ar5416Bank6_9100' >> > > > > ath9k: ar5008_initvals: Remove unused table entirely >> > > > > ath9k: ar5008_initvals: Move ar5416Bank{0,1,2,3,7} to where they >> > > > > are >> > > > > used >> > > > > brcmsmac: phytbl_lcn: Remove unused array 'dot11lcn_gain_tbl_rev1' >> > > > > brcmsmac: phy_lcn: Remove unused variable >> > > > > 'lcnphy_rx_iqcomp_table_rev0' >> > > > >> > > > What's happening with all of these iwlwifi patches? >> > > > >> > > > Looks like they are still not applied. >> > > >> > > Luca (CCed) takes iwlwifi patches to his iwlwifi tree. >> > >> > Thanks Kalle. >> > >> > Luca, >> > >> > Do you know why these patches have not been applied yet? Do you >> > plan on applying them this week? -rc1 is not due for release for >> > nearly 3 weeks now that Linus tagged an -rc8. >> >> I can also take Lee's patches directly to wireless-drivers-next, if >> that's easier for Luca. > > Yes, please take these patches directly. It simplifies things. Ok, I have assigned all Lee's pending iwlwifi patches to me on patchwork. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCH v2 0/2] add Cellient MPL200 card
Add the Cellient MPL200 card to usb/qmi_wwan and serial/option. The serial/option patch got already applied. Changes v2: Picked proper subject for qmi_wwan patch, moved the MPL200 device to the section of the combined devices and changed the comment about the device to be more precise. Wilken Gottwalt (2): net: usb: qmi_wwan: add Cellient MPL200 card usb: serial: option: add Cellient MPL200 card drivers/net/usb/qmi_wwan.c | 1 + drivers/usb/serial/option.c | 3 +++ 2 files changed, 4 insertions(+) -- 2.28.0
[PATCH v2 1/2] net: usb: qmi_wwan: add Cellient MPL200 card
Add usb ids of the Cellient MPL200 card. Signed-off-by: Wilken Gottwalt --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 07c42c0719f5..5ca1356b8656 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1375,6 +1375,7 @@ static const struct usb_device_id products[] = { {QMI_QUIRK_SET_DTR(0x2cb7, 0x0104, 4)}, /* Fibocom NL678 series */ {QMI_FIXED_INTF(0x0489, 0xe0b4, 0)},/* Foxconn T77W968 LTE */ {QMI_FIXED_INTF(0x0489, 0xe0b5, 0)},/* Foxconn T77W968 LTE with eSIM support*/ + {QMI_FIXED_INTF(0x2692, 0x9025, 4)},/* Cellient MPL200 (rebranded Qualcomm 05c6:9025) */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)},/* Acer Gobi Modem Device */ -- 2.28.0
[PATCH v2 2/2] usb: serial: option: add Cellient MPL200 card
Add usb ids of the Cellient MPL200 card. Signed-off-by: Wilken Gottwalt --- drivers/usb/serial/option.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 0c6f160a214a..a65e620b2277 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -528,6 +528,7 @@ static void option_instat_callback(struct urb *urb); /* Cellient products */ #define CELLIENT_VENDOR_ID 0x2692 #define CELLIENT_PRODUCT_MEN2000x9005 +#define CELLIENT_PRODUCT_MPL2000x9025 /* Hyundai Petatel Inc. products */ #define PETATEL_VENDOR_ID 0x1ff4 @@ -1982,6 +1983,8 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_4COM2, 0xff, 0x02, 0x01) }, { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_4COM2, 0xff, 0x00, 0x00) }, { USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MEN200) }, + { USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MPL200), + .driver_info = RSVD(1) | RSVD(4) }, { USB_DEVICE(PETATEL_VENDOR_ID, PETATEL_PRODUCT_NP10T_600A) }, { USB_DEVICE(PETATEL_VENDOR_ID, PETATEL_PRODUCT_NP10T_600E) }, { USB_DEVICE_AND_INTERFACE_INFO(TPLINK_VENDOR_ID, TPLINK_PRODUCT_LTE, 0xff, 0x00, 0x00) }, /* TP-Link LTE Module */ -- 2.28.0
Re: [PATCH 0/7] wfx: move out from the staging area
Greg Kroah-Hartman writes: > On Wed, Oct 07, 2020 at 12:19:36PM +0200, Jerome Pouiller wrote: >> From: Jérôme Pouiller >> >> I think the wfx driver is now mature enough to be accepted in the >> drivers/net/wireless directory. >> >> There is still one item on the TODO list. It is an idea to improve the rate >> control in some particular cases[1]. However, the current performances of the >> driver seem to satisfy everyone. In add, the suggested change is large >> enough. >> So, I would prefer to implement it only if it really solves an issue. I >> think it >> is not an obstacle to move the driver out of the staging area. >> >> In order to comply with the last rules for the DT bindings, I have converted >> the >> documentation to yaml. I am moderately happy with the result. Especially, for >> the description of the binding. Any comments are welcome. >> >> The series also update the copyrights dates of the files. I don't know >> exactly >> how this kind of changes should be sent. It's a bit weird to change all the >> copyrights in one commit, but I do not see any better way. >> >> I also include a few fixes I have found these last weeks. >> >> [1] https://lore.kernel.org/lkml/3099559.gv3Q75KnN1@pc-42 > > I'll take the first 6 patches here, the last one you should work with > the wireless maintainers to get reviewed. > > Maybe that might want to wait until after 5.10-rc1 is out, with all of > these changes in it, making it an easier move. Yes, the driver needs to be reviewed in linux-wireless list. I recommend submitting the whole driver in a patchset with one file per patch, which seems to be the easiest way to review a full driver. The final move will be in just one commit moving the driver, just like patch 7 does here. As an example see how wilc1000 review was done. Device tree bindings needs to be reviewed by the DT maintainer so CC devicetree on that patch. But do note that there's currently one new driver in review queue, so it will most likely take some time before wfx is reviewed. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH v2 2/2] usb: serial: option: add Cellient MPL200 card
On Thu, Oct 08, 2020 at 09:22:19AM +0200, Wilken Gottwalt wrote: > Add usb ids of the Cellient MPL200 card. > > Signed-off-by: Wilken Gottwalt > --- So I had already applied this one (which didn't change since v1). Thanks again. Johan
[net] tipc: fix NULL pointer dereference in tipc_named_rcv
In the function node_lost_contact(), we call __skb_queue_purge() without grabbing the list->lock. This can cause to a race-condition why processing the list 'namedq' in calling path tipc_named_rcv()->tipc_named_dequeue(). [] BUG: kernel NULL pointer dereference, address: [] #PF: supervisor read access in kernel mode [] #PF: error_code(0x) - not-present page [] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0 [] Oops: [#1] SMP NOPTI [] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.9.0-rc6+ #2 [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...] [] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc] [] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...] [] RSP: 0018:c90a7c58 EFLAGS: 0282 [] RAX: 12ec RBX: RCX: 88807bde1270 [] RDX: 2c7c RSI: 2c7c RDI: 88807b38f1a8 [] RBP: 88807b006288 R08: 88806a367800 R09: 88806a367900 [] R10: 88806a367a00 R11: 88806a367b00 R12: 88807b006258 [] R13: 88807b00628a R14: 888069334d00 R15: 88806a434600 [] FS: () GS:88807948() knlGS:0[...] [] CS: 0010 DS: ES: CR0: 80050033 [] CR2: CR3: 7732 CR4: 06e0 [] Call Trace: [] ? tipc_bcast_rcv+0x9a/0x1a0 [tipc] [] tipc_rcv+0x40d/0x670 [tipc] [] ? _raw_spin_unlock+0xa/0x20 [] tipc_l2_rcv_msg+0x55/0x80 [tipc] [] __netif_receive_skb_one_core+0x8c/0xa0 [] process_backlog+0x98/0x140 [] net_rx_action+0x13a/0x420 [] __do_softirq+0xdb/0x316 [] ? smpboot_thread_fn+0x2f/0x1e0 [] ? smpboot_thread_fn+0x74/0x1e0 [] ? smpboot_thread_fn+0x14e/0x1e0 [] run_ksoftirqd+0x1a/0x40 [] smpboot_thread_fn+0x149/0x1e0 [] ? sort_range+0x20/0x20 [] kthread+0x131/0x150 [] ? kthread_unuse_mm+0xa0/0xa0 [] ret_from_fork+0x22/0x30 [] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...] [] CR2: [] ---[ end trace 65c276a8e2e2f310 ]--- To fix this, we need to grab the lock of the 'namedq' list on both path calling. Fixes: cad2929dc432 ("tipc: update a binding service via broadcast") Acked-by: Jon Maloy Signed-off-by: Hoang Huu Le --- net/tipc/name_distr.c | 10 +- net/tipc/node.c | 2 +- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c index 2f9c148f17e2..fe4edce459ad 100644 --- a/net/tipc/name_distr.c +++ b/net/tipc/name_distr.c @@ -327,8 +327,13 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq, struct tipc_msg *hdr; u16 seqno; + spin_lock_bh(&namedq->lock); skb_queue_walk_safe(namedq, skb, tmp) { - skb_linearize(skb); + if (unlikely(skb_linearize(skb))) { + __skb_unlink(skb, namedq); + kfree_skb(skb); + continue; + } hdr = buf_msg(skb); seqno = msg_named_seqno(hdr); if (msg_is_last_bulk(hdr)) { @@ -338,12 +343,14 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq, if (msg_is_bulk(hdr) || msg_is_legacy(hdr)) { __skb_unlink(skb, namedq); + spin_unlock_bh(&namedq->lock); return skb; } if (*open && (*rcv_nxt == seqno)) { (*rcv_nxt)++; __skb_unlink(skb, namedq); + spin_unlock_bh(&namedq->lock); return skb; } @@ -353,6 +360,7 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq, continue; } } + spin_unlock_bh(&namedq->lock); return NULL; } diff --git a/net/tipc/node.c b/net/tipc/node.c index cf4b239fc569..d269ebe382e1 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1496,7 +1496,7 @@ static void node_lost_contact(struct tipc_node *n, /* Clean up broadcast state */ tipc_bcast_remove_peer(n->net, n->bc_entry.link); - __skb_queue_purge(&n->bc_entry.namedq); + skb_queue_purge(&n->bc_entry.namedq); /* Abort any ongoing link failover */ for (i = 0; i < MAX_BEARERS; i++) { -- 2.25.1
Re: [PATCH v2 1/6] Add ancillary bus support
On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky wrote: [..] > All stated above is my opinion, it can be different from yours. Yes, but we need to converge to move this forward. Jason was involved in the current organization for registration, Greg was angling for this to be core functionality. I have use cases outside of RDMA and netdev. Parav was ok with the current organization. The SOF folks already have a proposed incorporation of it. The argument I am hearing is that "this registration api seems hard for driver writers" when we have several driver writers who have already taken a look and can make it work. If you want to follow on with a simpler wrappers for your use case, great, but I do not yet see anyone concurring with your opinion that the current organization is irretrievably broken or too obscure to use.
Re: [PATCH net-next v2 03/16] devlink: Add devlink reload limit option
Wed, Oct 07, 2020 at 08:00:44AM CEST, mo...@mellanox.com wrote: >Add reload limit to demand restrictions on reload actions. >Reload limits supported: >no_reset: No reset allowed, no down time allowed, no link flap and no > configuration is lost. > >By default reload limit is unspecified and so no constraints on reload >actions are required. > >Some combinations of action and limit are invalid. For example, driver >can not reinitialize its entities without any downtime. > >The no_reset reload limit will have usecase in this patchset to >implement restricted fw_activate on mlx5. > >Have the uapi parameter of reload limit ready for future support of >multiselection. > >Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko
WARNING in ieee80211_ibss_csa_beacon
Hello, syzbot found the following issue on: HEAD commit:c85fb28b Merge tag 'arm64-fixes' of git://git.kernel.org/p.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15b2b40050 kernel config: https://syzkaller.appspot.com/x/.config?x=de7f697da23057c7 dashboard link: https://syzkaller.appspot.com/bug?extid=b6c9fe29aefe68e4ad34 compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+b6c9fe29aefe68e4a...@syzkaller.appspotmail.com WARNING: CPU: 1 PID: 11321 at net/mac80211/ibss.c:504 ieee80211_ibss_csa_beacon+0x4e9/0x5a0 net/mac80211/ibss.c:504 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 11321 Comm: kworker/u4:0 Not tainted 5.9.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: phy10 ieee80211_csa_finalize_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1d6/0x29e lib/dump_stack.c:118 panic+0x2c0/0x800 kernel/panic.c:231 __warn+0x227/0x250 kernel/panic.c:600 report_bug+0x1b1/0x2e0 lib/bug.c:198 handle_bug+0x42/0x80 arch/x86/kernel/traps.c:234 exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:254 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536 RIP: 0010:ieee80211_ibss_csa_beacon+0x4e9/0x5a0 net/mac80211/ibss.c:504 Code: e8 fc 29 8b f9 b8 f4 ff ff ff eb 0a e8 f0 29 8b f9 b8 00 01 00 00 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 d7 29 8b f9 <0f> 0b b8 ea ff ff ff eb e3 e8 c9 29 8b f9 0f 0b e9 88 fb ff ff 48 RSP: 0018:c9000ab5fbf8 EFLAGS: 00010293 RAX: 87e9d419 RBX: 88804cc20580 RCX: 888016268280 RDX: RSI: RDI: RBP: 88804cc23490 R08: dc00 R09: fbfff16c82b4 R10: fbfff16c82b4 R11: R12: 8880a21a18c0 R13: 8880a21a18ba R14: 8880a21a0c00 R15: 8880a21a18e0 ieee80211_set_after_csa_beacon net/mac80211/cfg.c:3043 [inline] __ieee80211_csa_finalize net/mac80211/cfg.c:3099 [inline] ieee80211_csa_finalize+0x46f/0x960 net/mac80211/cfg.c:3122 ieee80211_csa_finalize_work+0xfb/0x140 net/mac80211/cfg.c:3147 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Kernel Offset: disabled Rebooting in 86400 seconds.. --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
Re: [PATCH v2 1/6] Add ancillary bus support
On Thu, Oct 08, 2020 at 07:14:17AM +, Parav Pandit wrote: > > > > From: Leon Romanovsky > > Sent: Thursday, October 8, 2020 10:56 AM > > > > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote: > > > > > > > > > > From: Pierre-Louis Bossart > > > > Sent: Thursday, October 8, 2020 3:20 AM > > > > > > > > > > > > On 10/7/20 4:22 PM, Ertman, David M wrote: > > > > >> -Original Message- > > > > >> From: Pierre-Louis Bossart > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM > > > > >> To: Ertman, David M ; Parav Pandit > > > > >> ; Leon Romanovsky > > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com; > > > > >> ti...@suse.de; netdev@vger.kernel.org; > > > > >> ranjani.sridha...@linux.intel.com; > > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org; > > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe > > > > >> ; gre...@linuxfoundation.org; k...@kernel.org; > > > > >> Williams, Dan J ; Saleem, Shiraz > > > > >> ; da...@davemloft.net; Patil, Kiran > > > > >> > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support > > > > >> > > > > >> > > > > >> > > > > Below is most simple, intuitive and matching with core APIs for > > > > name and design pattern wise. > > > > init() > > > > { > > > > err = ancillary_device_initialize(); > > > > if (err) > > > > return ret; > > > > > > > > err = ancillary_device_add(); > > > > if (ret) > > > > goto err_unwind; > > > > > > > > err = some_foo(); > > > > if (err) > > > > goto err_foo; > > > > return 0; > > > > > > > > err_foo: > > > > ancillary_device_del(adev); > > > > err_unwind: > > > > ancillary_device_put(adev->dev); > > > > return err; > > > > } > > > > > > > > cleanup() > > > > { > > > > ancillary_device_de(adev); > > > > ancillary_device_put(adev); > > > > /* It is common to have a one wrapper for this as > > > > ancillary_device_unregister(). > > > > * This will match with core device_unregister() that has > > > > precise documentation. > > > > * but given fact that init() code need proper error > > > > unwinding, like above, > > > > * it make sense to have two APIs, and no need to export > > > > another symbol for unregister(). > > > > * This pattern is very easy to audit and code. > > > > */ > > > > } > > > > >>> > > > > >>> I like this flow +1 > > > > >>> > > > > >>> But ... since the init() function is performing both device_init > > > > >>> and device_add - it should probably be called > > > > >>> ancillary_device_register, and we are back to a single exported > > > > >>> API for both register and unregister. > > > > >> > > > > >> Kind reminder that we introduced the two functions to allow the > > > > >> caller to know if it needed to free memory when initialize() > > > > >> fails, and it didn't need to free memory when add() failed since > > > > >> put_device() takes care of it. If you have a single init() > > > > >> function it's impossible to know which behavior to select on error. > > > > >> > > > > >> I also have a case with SoundWire where it's nice to first > > > > >> initialize, then set some data and then add. > > > > >> > > > > > > > > > > The flow as outlined by Parav above does an initialize as the > > > > > first step, so every error path out of the function has to do a > > > > > put_device(), so you would never need to manually free the memory > > > > > in > > > > the setup function. > > > > > It would be freed in the release call. > > > > > > > > err = ancillary_device_initialize(); if (err) > > > > return ret; > > > > > > > > where is the put_device() here? if the release function does any > > > > sort of kfree, then you'd need to do it manually in this case. > > > Since device_initialize() failed, put_device() cannot be done here. > > > So yes, pseudo code should have shown, if (err) { > > > kfree(adev); > > > return err; > > > } > > > > > > If we just want to follow register(), unregister() pattern, > > > > > > Than, > > > > > > ancillar_device_register() should be, > > > > > > /** > > > * ancillar_device_register() - register an ancillary device > > > * NOTE: __never directly free @adev after calling this function, even > > > if it returned > > > * an error. Always use ancillary_device_put() to give up the reference > > initialized by this function. > > > * This note matches with the core and caller knows exactly what to be > > done. > > > */ > > > ancillary_device_register() > > > { > > > device_initialize(&adev->dev); > > > if (!dev->parent || !adev->name) > > > return -EINVAL; > > > if (!dev->release && !(dev->type && dev->type->release)) { > > > /* core is already capable and throws the warning when > > release callback is not set. > > >*
Re: [PATCH v2 1/6] Add ancillary bus support
On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote: > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky wrote: > [..] > > All stated above is my opinion, it can be different from yours. > > Yes, but we need to converge to move this forward. Jason was involved > in the current organization for registration, Greg was angling for > this to be core functionality. I have use cases outside of RDMA and > netdev. Parav was ok with the current organization. The SOF folks > already have a proposed incorporation of it. The argument I am hearing > is that "this registration api seems hard for driver writers" when we > have several driver writers who have already taken a look and can make > it work. If you want to follow on with a simpler wrappers for your use > case, great, but I do not yet see anyone concurring with your opinion > that the current organization is irretrievably broken or too obscure > to use. That's kind of because I tuned out of this thread a long time ago :) I do agree with Leon that I think the current patch is not the correct way to do this the easiest, but don't have a competing proposal to show what I mean. Yet. Let's see what happens after 5.10-rc1 is out, it's too late now for any of this for this next merge window so we can not worry about it for a few weeks. thanks, greg k-h
Re: [PATCH v2 1/6] Add ancillary bus support
On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote: > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky wrote: > [..] > > All stated above is my opinion, it can be different from yours. > > Yes, but we need to converge to move this forward. Jason was involved > in the current organization for registration, Greg was angling for > this to be core functionality. I have use cases outside of RDMA and > netdev. Parav was ok with the current organization. The SOF folks > already have a proposed incorporation of it. The argument I am hearing > is that "this registration api seems hard for driver writers" when we > have several driver writers who have already taken a look and can make > it work. If you want to follow on with a simpler wrappers for your use > case, great, but I do not yet see anyone concurring with your opinion > that the current organization is irretrievably broken or too obscure > to use. Can it be that I'm first one to use this bus for very large driver (>120K LOC) that has 5 different ->probe() flows? For example, this https://lore.kernel.org/linux-rdma/20201006172317.GN1874917@unreal/ hints to me that this bus wasn't used with anything complex as it was initially intended. And regarding registration, I said many times that init()/add() scheme is ok, the inability to call to uninit() after add() failure is not ok from my point of view. Thanks
Re: [PATCH v4 bpf-next 09/13] bpf: introduce multibuff support to bpf_prog_test_run_xdp()
Lorenzo Bianconi writes: Introduce the capability to allocate a xdp multi-buff in bpf_prog_test_run_xdp routine. This is a preliminary patch to introduce the selftests for new xdp multi-buff ebpf helpers Signed-off-by: Lorenzo Bianconi --- net/bpf/test_run.c | 51 ++ 1 file changed, 43 insertions(+), 8 deletions(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index bd291f5f539c..ec7286cd051b 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -617,44 +617,79 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, { u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); u32 headroom = XDP_PACKET_HEADROOM; - u32 size = kattr->test.data_size_in; u32 repeat = kattr->test.repeat; struct netdev_rx_queue *rxqueue; + struct skb_shared_info *sinfo; struct xdp_buff xdp = {}; + u32 max_data_sz, size; u32 retval, duration; - u32 max_data_sz; + int i, ret, data_len; void *data; - int ret; if (kattr->test.ctx_in || kattr->test.ctx_out) return -EINVAL; - /* XDP have extra tailroom as (most) drivers use full page */ max_data_sz = 4096 - headroom - tailroom; For the sake of consistency, can this 4096 be changed to PAGE_SIZE ? Same as in data_len = min_t(int, kattr->test.data_size_in - size, PAGE_SIZE); expression below + size = min_t(u32, kattr->test.data_size_in, max_data_sz); + data_len = size; - data = bpf_test_init(kattr, kattr->test.data_size_in, -max_data_sz, headroom, tailroom); + data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom); if (IS_ERR(data)) return PTR_ERR(data); xdp.data_hard_start = data; xdp.data = data + headroom; xdp.data_meta = xdp.data; - xdp.data_end = xdp.data + size; + xdp.data_end = xdp.data + data_len; xdp.frame_sz = headroom + max_data_sz + tailroom; + sinfo = xdp_get_shared_info_from_buff(&xdp); + if (unlikely(kattr->test.data_size_in > size)) { + void __user *data_in = u64_to_user_ptr(kattr->test.data_in); + + while (size < kattr->test.data_size_in) { + skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags]; + struct page *page; + int data_len; + + page = alloc_page(GFP_KERNEL); + if (!page) { + ret = -ENOMEM; + goto out; + } + + __skb_frag_set_page(frag, page); + data_len = min_t(int, kattr->test.data_size_in - size, +PAGE_SIZE); + skb_frag_size_set(frag, data_len); + if (copy_from_user(page_address(page), data_in + size, + data_len)) { + ret = -EFAULT; + goto out; + } + sinfo->nr_frags++; + size += data_len; + } + xdp.mb = 1; + } + rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0); xdp.rxq = &rxqueue->xdp_rxq; bpf_prog_change_xdp(NULL, prog); ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); if (ret) goto out; + if (xdp.data != data + headroom || xdp.data_end != xdp.data + size) - size = xdp.data_end - xdp.data; + size += xdp.data_end - xdp.data - data_len; Can we please drop the variable shadowing of data_len ? This is confusing since the initial value of data_len is correct in the `size` calculation, while its value inside the while loop it not. This seem to be syntactically correct, but I think it's better practice to avoid shadowing here. + ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration); out: bpf_prog_change_xdp(prog, NULL); + for (i = 0; i < sinfo->nr_frags; i++) + __free_page(skb_frag_page(&sinfo->frags[i])); kfree(data); + return ret; }
Re: [PATCH v2 1/2] net: usb: qmi_wwan: add Cellient MPL200 card
Wilken Gottwalt writes: > Add usb ids of the Cellient MPL200 card. > > Signed-off-by: Wilken Gottwalt > --- > drivers/net/usb/qmi_wwan.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c > index 07c42c0719f5..5ca1356b8656 100644 > --- a/drivers/net/usb/qmi_wwan.c > +++ b/drivers/net/usb/qmi_wwan.c > @@ -1375,6 +1375,7 @@ static const struct usb_device_id products[] = { > {QMI_QUIRK_SET_DTR(0x2cb7, 0x0104, 4)}, /* Fibocom NL678 series */ > {QMI_FIXED_INTF(0x0489, 0xe0b4, 0)},/* Foxconn T77W968 LTE */ > {QMI_FIXED_INTF(0x0489, 0xe0b5, 0)},/* Foxconn T77W968 LTE with > eSIM support*/ > + {QMI_FIXED_INTF(0x2692, 0x9025, 4)},/* Cellient MPL200 (rebranded > Qualcomm 05c6:9025) */ > > /* 4. Gobi 1000 devices */ > {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)},/* Acer Gobi Modem Device */ Thanks. Looks nice now. Acked-by: Bjørn Mork
Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
On Wed, 07 Oct 2020 20:19:39 -0700 John Fastabend wrote: > Maciej Żenczykowski wrote: > > On Wed, Oct 7, 2020 at 3:37 PM John Fastabend > > wrote: > > > > > > Daniel Borkmann wrote: > > > > On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote: > > > > [...] > > > > > net/core/dev.c | 24 ++-- > > > > > 1 file changed, 22 insertions(+), 2 deletions(-) > > > > > > Couple high-level comments. Whats the problem with just letting the driver > > > consume the packet? I would chalk it up to a buggy BPF program that is > > > sending these packets. The drivers really shouldn't panic or do anything > > > horrible under this case because even today I don't think we can be > > > 100% certain MTU on skb matches set MTU. Imagine the case where I change > > > the MTU from 9kB->1500B there will be some skbs in-flight with the larger > > > length and some with the shorter. If the drivers panic/fault or otherwise > > > does something else horrible thats not going to be friendly in general > > > case > > > regardless of what BPF does. And seeing this type of config is all done > > > async its tricky (not practical) to flush any skbs in-flight. > > > > > > I've spent many hours debugging these types of feature flag, mtu > > > change bugs on the driver side I'm not sure it can be resolved by > > > the stack easily. Better to just build drivers that can handle it IMO. > > > > > > Do we know if sending >MTU size skbs to drivers causes problems in real > > > cases? I haven't tried on the NICs I have here, but I expect they should > > > be fine. Fine here being system keeps running as expected. Dropping the > > > skb either on TX or RX side is expected. Even with this change though > > > its possible for the skb to slip through if I configure MTU on a live > > > system. > > > > I wholeheartedly agree with the above. > > > > Ideally the only >mtu check should happen at driver admittance. > > But again ideally it should happen in some core stack location not in > > the driver itself. > > Ideally maybe, but IMO we should just let the skb go to the driver > and let the driver sort it out. Even if this means pushing the packet > onto the wire then the switch will drop it or the receiver, etc. A > BPF program can do lots of horrible things that should never be > on the wire otherwise. MTU is just one of them, but sending corrupted > payloads, adding bogus headers, checksums etc. so I don't think we can > reasonable protect against all of them. That is a good point. > Of course if the driver is going to hang/panic then something needs > to be done. Perhaps a needs_mtu_check feature flag, although thats > not so nice either so perhaps drivers just need to handle it themselves. > Also even today the case could happen without BPF as best I can tell > so the drivers should be prepared for it. Yes, borderline cases do exist already today (like your reconf with inflight packets), so I guess drivers should at-least not hang/panic and be robust enough that we can drop this check. I think you have convinced me that we can drop this check. My only concern is how people can troubleshoot this, but its not different from current state. > > However, due to both gso and vlan offload, even this is not trivial to do... > > The mtu is L3, but drivers/hardware/the wire usually care about L2... If net_device->mtu is L3 (1500) and XDP (and TC, right?) operate at L2, that likely means that the "strict" bpf_mtu_check (in my BPF-helper) is wrong, as XDP (and TC) length at this point include the 14 bytes Ethernet header. I will check and fix. Is this accounted for via net_device->hard_header_len ? -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v2 1/6] Add ancillary bus support
On Thu, Oct 8, 2020 at 1:00 AM Leon Romanovsky wrote: > > On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote: > > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky wrote: > > [..] > > > All stated above is my opinion, it can be different from yours. > > > > Yes, but we need to converge to move this forward. Jason was involved > > in the current organization for registration, Greg was angling for > > this to be core functionality. I have use cases outside of RDMA and > > netdev. Parav was ok with the current organization. The SOF folks > > already have a proposed incorporation of it. The argument I am hearing > > is that "this registration api seems hard for driver writers" when we > > have several driver writers who have already taken a look and can make > > it work. If you want to follow on with a simpler wrappers for your use > > case, great, but I do not yet see anyone concurring with your opinion > > that the current organization is irretrievably broken or too obscure > > to use. > > Can it be that I'm first one to use this bus for very large driver (>120K LOC) > that has 5 different ->probe() flows? > > For example, this > https://lore.kernel.org/linux-rdma/20201006172317.GN1874917@unreal/ > hints to me that this bus wasn't used with anything complex as it was > initially intended. I missed that. Yes, I agree that's broken. > > And regarding registration, I said many times that init()/add() scheme is ok, > the inability > to call to uninit() after add() failure is not ok from my point of view. Ok, I got to the wrong conclusion about your position.
[PATCH ipsec] xfrm: interface: fix the priorities for ipip and ipv6 tunnels
As Nicolas noticed in his case, when xfrm_interface module is installed the standard IP tunnels will break in receiving packets. This is caused by the IP tunnel handlers with a higher priority in xfrm interface processing incoming packets by xfrm_input(), which would drop the packets and return 0 instead when anything wrong happens. Rather than changing xfrm_input(), this patch is to adjust the priority for the IP tunnel handlers in xfrm interface, so that the packets would go to xfrmi's later than the others', as the others' would not drop the packets when the handlers couldn't process them. Note that IPCOMP also defines its own IPIP tunnel handler and it calls xfrm_input() as well, so we must make its priority lower than xfrmi's, which means having xfrmi loaded would still break IPCOMP. We may seek another way to fix it in xfrm_input() in the future. Reported-by: Nicolas Dichtel Tested-by: Nicolas Dichtel Fixes: da9bbf0598c9 ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler") FIxes: d7b360c2869f ("xfrm: interface: support IP6IP6 and IP6IP tunnels processing with .cb_handler") Signed-off-by: Xin Long --- net/ipv4/xfrm4_tunnel.c | 4 ++-- net/ipv6/xfrm6_tunnel.c | 4 ++-- net/xfrm/xfrm_interface.c | 8 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c index dc19aff..fb0648e 100644 --- a/net/ipv4/xfrm4_tunnel.c +++ b/net/ipv4/xfrm4_tunnel.c @@ -64,14 +64,14 @@ static int xfrm_tunnel_err(struct sk_buff *skb, u32 info) static struct xfrm_tunnel xfrm_tunnel_handler __read_mostly = { .handler= xfrm_tunnel_rcv, .err_handler= xfrm_tunnel_err, - .priority = 3, + .priority = 4, }; #if IS_ENABLED(CONFIG_IPV6) static struct xfrm_tunnel xfrm64_tunnel_handler __read_mostly = { .handler= xfrm_tunnel_rcv, .err_handler= xfrm_tunnel_err, - .priority = 2, + .priority = 3, }; #endif diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c index 25b7ebd..f696d46 100644 --- a/net/ipv6/xfrm6_tunnel.c +++ b/net/ipv6/xfrm6_tunnel.c @@ -303,13 +303,13 @@ static const struct xfrm_type xfrm6_tunnel_type = { static struct xfrm6_tunnel xfrm6_tunnel_handler __read_mostly = { .handler= xfrm6_tunnel_rcv, .err_handler= xfrm6_tunnel_err, - .priority = 2, + .priority = 3, }; static struct xfrm6_tunnel xfrm46_tunnel_handler __read_mostly = { .handler= xfrm6_tunnel_rcv, .err_handler= xfrm6_tunnel_err, - .priority = 2, + .priority = 3, }; static int __net_init xfrm6_tunnel_net_init(struct net *net) diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c index a8f6611..0bb7963 100644 --- a/net/xfrm/xfrm_interface.c +++ b/net/xfrm/xfrm_interface.c @@ -830,14 +830,14 @@ static struct xfrm6_tunnel xfrmi_ipv6_handler __read_mostly = { .handler= xfrmi6_rcv_tunnel, .cb_handler = xfrmi_rcv_cb, .err_handler= xfrmi6_err, - .priority = -1, + .priority = 2, }; static struct xfrm6_tunnel xfrmi_ip6ip_handler __read_mostly = { .handler= xfrmi6_rcv_tunnel, .cb_handler = xfrmi_rcv_cb, .err_handler= xfrmi6_err, - .priority = -1, + .priority = 2, }; #endif @@ -875,14 +875,14 @@ static struct xfrm_tunnel xfrmi_ipip_handler __read_mostly = { .handler= xfrmi4_rcv_tunnel, .cb_handler = xfrmi_rcv_cb, .err_handler= xfrmi4_err, - .priority = -1, + .priority = 3, }; static struct xfrm_tunnel xfrmi_ipip6_handler __read_mostly = { .handler= xfrmi4_rcv_tunnel, .cb_handler = xfrmi_rcv_cb, .err_handler= xfrmi4_err, - .priority = -1, + .priority = 2, }; #endif -- 2.1.0
Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
On 10/8/20 10:07 AM, Jesper Dangaard Brouer wrote: [...] However, due to both gso and vlan offload, even this is not trivial to do... The mtu is L3, but drivers/hardware/the wire usually care about L2... If net_device->mtu is L3 (1500) and XDP (and TC, right?) operate at L2, that likely means that the "strict" bpf_mtu_check (in my BPF-helper) is wrong, as XDP (and TC) length at this point include the 14 bytes Ethernet header. I will check and fix. Yes, both at L2 layer. Is this accounted for via net_device->hard_header_len ? It is, see also ether_setup().
Re: [PATCH net 2/2] IPv6: reply ICMP error if the first fragment don't include all headers
Hi Eric, Thanks for the comments. I should add "RFC" in subject next time for the uncertain fix patch. On Wed, Oct 07, 2020 at 11:35:41AM +0200, Eric Dumazet wrote: > > > On 10/7/20 5:55 AM, Hangbin Liu wrote: > > > kfree_skb(skb); > > @@ -282,6 +285,21 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff > > *skb, struct net_device *dev, > > } > > } > > > > + /* RFC 8200, Section 4.5 Fragment Header: > > +* If the first fragment does not include all headers through an > > +* Upper-Layer header, then that fragment should be discarded and > > +* an ICMP Parameter Problem, Code 3, message should be sent to > > +* the source of the fragment, with the Pointer field set to zero. > > +*/ > > + nexthdr = hdr->nexthdr; > > + offset = ipv6_skip_exthdr(skb, skb_transport_offset(skb), &nexthdr, > > &frag_off); > > + if (frag_off == htons(IP6_MF) && !pskb_may_pull(skb, offset + 1)) { > > + __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS); > > + icmpv6_param_prob(skb, ICMPV6_HDR_INCOMP, 0); > > + rcu_read_unlock(); > > + return NULL; > > + } > > + > > rcu_read_unlock(); > > > > /* Must drop socket now because of tproxy. */ > > > > Ouch, this is quite a buggy patch. > > I doubt we want to add yet another ipv6_skip_exthdr() call in IPv6 fast path. > > Surely the presence of NEXTHDR_FRAGMENT is already tested elsewhere ? Would you like to help point where NEXTHDR_FRAGMENT was tested before IPv6 defragment? > > Also, ipv6_skip_exthdr() does not pull anything in skb->head, it would be > strange > to force a pull of hundreds of bytes just because you want to check if an > extra byte is there, > if the packet could be forwarded as is, without additional memory allocations. > > Testing skb->len should be more than enough at this stage. Ah, yes, I shouldn't call pskb_may_pull here. > > Also ipv6_skip_exthdr() can return an error. it returns -1 as error, If we tested it by (offset + 1 > skb->len), does that count as an error handler? Thanks Hangbin
Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
On Wed, 7 Oct 2020 23:37:00 +0200 Daniel Borkmann wrote: > On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote: > [...] > > net/core/dev.c | 24 ++-- > > 1 file changed, 22 insertions(+), 2 deletions(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index b433098896b2..19406013f93e 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, > > struct net_device *dev) > > switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) { > > case TC_ACT_OK: > > case TC_ACT_RECLASSIFY: > > + *ret = NET_XMIT_SUCCESS; > > skb->tc_index = TC_H_MIN(cl_res.classid); > > break; > > case TC_ACT_SHOT: > > @@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, > > struct net_device *sb_dev) > > { > > struct net_device *dev = skb->dev; > > struct netdev_queue *txq; > > +#ifdef CONFIG_NET_CLS_ACT > > + bool mtu_check = false; > > +#endif > > + bool again = false; > > struct Qdisc *q; > > int rc = -ENOMEM; > > - bool again = false; > > > > skb_reset_mac_header(skb); > > > > @@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, > > struct net_device *sb_dev) > > > > qdisc_pkt_len_init(skb); > > #ifdef CONFIG_NET_CLS_ACT > > + mtu_check = skb_is_redirected(skb); > > skb->tc_at_ingress = 0; > > # ifdef CONFIG_NET_EGRESS > > if (static_branch_unlikely(&egress_needed_key)) { > > + unsigned int len_orig = skb->len; > > + > > skb = sch_handle_egress(skb, &rc, dev); > > if (!skb) > > goto out; > > + /* BPF-prog ran and could have changed packet size beyond MTU */ > > + if (rc == NET_XMIT_SUCCESS && skb->len > len_orig) > > + mtu_check = true; > > } > > # endif > > + /* MTU-check only happens on "last" net_device in a redirect sequence > > +* (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it > > +* either ingress or egress to another device). > > +*/ > > Hmm, quite some overhead in fast path. Not really, normal fast-path already call is_skb_forwardable(). And it already happens in existing code, ingress redirect code, which I remove calling in patch 6. (I have considered inlining is_skb_forwardable as a optimization for general netstack dev_forward_skb) > Also, won't this be checked multiple times on stacked devices? :( I don't think it will be checked multiple times, because we have a skb_reset_redirect() in ingress path (just after sch_handle_ingress()). > Moreover, this missed the fact that 'real' qdiscs can have > filters attached too, this would come after this check. Can't this instead be > in > driver layer for those that really need it? I would probably only drop the > check > as done in 1/6 and allow the BPF prog to do the validation if needed. See other reply, this is likely what we will end-up with. > > + if (mtu_check && !is_skb_forwardable(dev, skb)) { > > + rc = -EMSGSIZE; > > + goto drop; > > + } > > #endif > > /* If device/qdisc don't need skb->dst, release it right now while > > * its hot in this cpu cache. > > @@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, > > struct net_device *sb_dev) > > > > rc = -ENETDOWN; > > rcu_read_unlock_bh(); > > - > > +#ifdef CONFIG_NET_CLS_ACT > > +drop: > > +#endif > > atomic_long_inc(&dev->tx_dropped); > > kfree_skb_list(skb); > > return rc; > > > -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping
On Wed Oct 07 2020, Vladimir Oltean wrote: > On Wed, Oct 07, 2020 at 12:39:49PM +0200, Kurt Kanzenbach wrote: >> For instance the hellcreek switch has actually three ptp hardware >> clocks and the time stamping can be configured to use either one of >> them. > > The sja1105 also has a corrected and an uncorrected PTP clock that can > take timestamps. Initially I had thought I'd be going to spend some time > figuring out multi-PHC support, but now I don't see any practical reason > to use the uncorrected PHC for anything. Just out of curiosity: How do you implement 802.1AS then? My understanding is that the free-running clock has to be used for the calculation of the peer delays and such meaning there should be a way to get access to both PHCs or having some form of cross timestamping available. The hellcreek switch can take cross snapshots of all three ptp clocks in hardware for that purpose. >> > So when you'll poll for TX timestamps, you'll receive a TX >> > timestamp from the PHY and another one from the switch, and those will >> > be in a race with one another, so you won't know which one is which. >> >> OK. So what happens if the driver will accept to disable hardware >> timestamping? Is there anything else that needs to be implemented? Are >> there (good) examples? > > It needs to not call skb_complete_tx_timestamp() and friends. > > For PHY timestamping, it also needs to invoke the correct methods for RX > and for TX, where the PHY timestamping hooks will get called. I don't > think that DSA is compatible yet with PHY timestamping, but it is > probably a trivial modification. Hmm? If DSA doesn't support PHY timestamping how are other DSA drivers dealing with it then? I'm getting really confused. Furthermore, there is no hellcreek hardware available with timestamping capable PHYs. How am I supposed to even test this? For now, until there is hardware available, PHY timestamping is not supported with the hellcreek switch. Thanks, Kurt signature.asc Description: PGP signature
Re: [PATCH net 2/2] IPv6: reply ICMP error if the first fragment don't include all headers
On Wed, Oct 07, 2020 at 07:58:07AM -0700, Jakub Kicinski wrote: > On Wed, 7 Oct 2020 11:55:02 +0800 Hangbin Liu wrote: > > Based on RFC 8200, Section 4.5 Fragment Header: > > > > - If the first fragment does not include all headers through an > > Upper-Layer header, then that fragment should be discarded and > > an ICMP Parameter Problem, Code 3, message should be sent to > > the source of the fragment, with the Pointer field set to zero. > > > > As the packet may be any kind of L4 protocol, I only checked if there > > has Upper-Layer header by pskb_may_pull(skb, offset + 1). > > > > As the 1st truncated fragment may also be ICMP message, I also add > > a check in ICMP code is_ineligible() to let fragment packet with nexthdr > > ICMP but no ICMP header return false. > > > > Signed-off-by: Hangbin Liu > > net/ipv6/icmp.c:159:65: warning: incorrect type in argument 4 (different base > types) > net/ipv6/icmp.c:159:65:expected unsigned short *fragoff > net/ipv6/icmp.c:159:65:got restricted __be16 * Ah, Thanks for pointing out this error, I will fix it. Regards Hangbin
[PATCH net] sctp: fix sctp_auth_init_hmacs() error path
From: Eric Dumazet After freeing ep->auth_hmacs we have to clear the pointer or risk use-after-free as reported by syzbot: BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline] BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070 Read of size 8 at addr 8880a8ff52c0 by task syz-executor941/6874 CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline] sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070 sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203 sctp_endpoint_put net/sctp/endpointola.c:236 [inline] sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183 sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981 sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415 sk_common_release+0x64/0x390 net/core/sock.c:3254 sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475 __sock_release+0xcd/0x280 net/socket.c:596 sock_close+0x18/0x20 net/socket.c:1277 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 exit_task_work include/linux/task_work.h:25 [inline] do_exit+0xb7d/0x29f0 kernel/exit.c:806 do_group_exit+0x125/0x310 kernel/exit.c:903 __do_sys_exit_group kernel/exit.c:914 [inline] __se_sys_exit_group kernel/exit.c:912 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x43f278 Code: Bad RIP value. RSP: 002b:7fffe0995c38 EFLAGS: 0246 ORIG_RAX: 00e7 RAX: ffda RBX: RCX: 0043f278 RDX: RSI: 003c RDI: RBP: 004bf068 R08: 00e7 R09: ffd0 R10: 2000 R11: 0246 R12: 0001 R13: 006d1180 R14: R15: Allocated by task 6874: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554 kmalloc include/linux/slab.h:554 [inline] kmalloc_array include/linux/slab.h:593 [inline] kcalloc include/linux/slab.h:605 [inline] sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline] sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631 __sys_setsockopt+0x2db/0x610 net/socket.c:2132 __do_sys_setsockopt net/socket.c:2143 [inline] __se_sys_setsockopt net/socket.c:2140 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 6874: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3422 [inline] kfree+0x10e/0x2b0 mm/slab.c:3760 sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline] sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] sctp_auth_init_hmacs net/sctp/auth.c:496 [inline] sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline] sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631 __sys_setsockopt+0x2db/0x610 net/socket.c:2132 __do_sys_setsockopt net/socket.c:2143 [inline] __se_sys_setsockopt net/socket.c:2140 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1f485649f529 ("[SCTP]: Implement SCTP-AUTH internals") Signed-off-by: Eric Dumazet Cc: Vlad Yasevich Cc: Neil Horman Cc: Marcelo Ricardo Leitner --- net/sctp/auth.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sctp/auth.c b/net/sctp/auth.c index 9e289c770574f6009b1e854ee4b9b3d5f942d4d5..7e59d8a18f3e40368eb911b63ac9f514b1dcf095 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -494,6 +494,7 @@ int sctp_auth_init_hmacs(struct sctp_endpoint *ep, gfp_t gfp) out_err: /* Clean up any successful allocations */ sctp_auth_destroy_hmacs(ep->auth_hmacs); + ep->auth_hmacs = NULL; return -ENOM
Re: [Patch net] tipc: fix the skb_unshare() in tipc_buf_append()
On Thu, Oct 8, 2020 at 12:12 PM Cong Wang wrote: > > skb_unshare() drops a reference count on the old skb unconditionally, > so in the failure case, we end up freeing the skb twice here. > And because the skb is allocated in fclone and cloned by caller > tipc_msg_reassemble(), the consequence is actually freeing the > original skb too, thus triggered the UAF by syzbot. Do you mean: frag = skb_clone(skb, GFP_ATOMIC); frag = skb_unshare(frag) will free the 'skb' too? > > Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy(). > > Fixes: ff48b6222e65 ("tipc: use skb_unshare() instead in tipc_buf_append()") > Reported-and-tested-by: syzbot+e96a7ba46281824cc...@syzkaller.appspotmail.com > Cc: Xin Long > Cc: Jon Maloy > Cc: Ying Xue > Signed-off-by: Cong Wang > --- > net/tipc/msg.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/tipc/msg.c b/net/tipc/msg.c > index 52e93ba4d8e2..681224401871 100644 > --- a/net/tipc/msg.c > +++ b/net/tipc/msg.c > @@ -150,7 +150,8 @@ int tipc_buf_append(struct sk_buff **headbuf, struct > sk_buff **buf) > if (fragid == FIRST_FRAGMENT) { > if (unlikely(head)) > goto err; > - frag = skb_unshare(frag, GFP_ATOMIC); > + if (skb_cloned(frag)) > + frag = skb_copy(frag, GFP_ATOMIC); > if (unlikely(!frag)) > goto err; > head = *headbuf = frag; > -- > 2.28.0 >
KMSAN: uninit-value in ieee80211_sta_tx_notify
Hello, syzbot found the following issue on: HEAD commit:5edb1df2 kmsan: drop the _nosanitize string functions git tree: https://github.com/google/kmsan.git master console output: https://syzkaller.appspot.com/x/log.txt?x=1279b1c050 kernel config: https://syzkaller.appspot.com/x/.config?x=4991d22eb136035c dashboard link: https://syzkaller.appspot.com/bug?extid=beae481026cb095addca compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) userspace arch: i386 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+beae481026cb095ad...@syzkaller.appspotmail.com = BUG: KMSAN: uninit-value in ieee80211_ac_from_tid net/mac80211/ieee80211_i.h:2034 [inline] BUG: KMSAN: uninit-value in ieee80211_sta_tx_wmm_ac_notify net/mac80211/mlme.c:2490 [inline] BUG: KMSAN: uninit-value in ieee80211_sta_tx_notify+0x4be/0xda0 net/mac80211/mlme.c:2522 CPU: 1 PID: 8613 Comm: kworker/u4:1 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: phy18 ieee80211_beacon_connection_loss_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:201 ieee80211_ac_from_tid net/mac80211/ieee80211_i.h:2034 [inline] ieee80211_sta_tx_wmm_ac_notify net/mac80211/mlme.c:2490 [inline] ieee80211_sta_tx_notify+0x4be/0xda0 net/mac80211/mlme.c:2522 __ieee80211_tx_status+0x2d3b/0x4450 net/mac80211/status.c:1013 ieee80211_tx_status+0x223/0x270 net/mac80211/status.c:1129 ieee80211_tasklet_handler+0x34e/0x3c0 net/mac80211/main.c:239 tasklet_action_common+0x416/0x6a0 kernel/softirq.c:562 tasklet_action+0x30/0x40 kernel/softirq.c:580 __do_softirq+0x2ea/0x7f5 kernel/softirq.c:299 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706 __run_on_irqstack arch/x86/include/asm/irq_stack.h:23 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:50 [inline] do_softirq_own_stack+0x7c/0xa0 arch/x86/kernel/irq_64.c:77 do_softirq kernel/softirq.c:344 [inline] __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:196 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32 __ieee80211_tx_skb_tid_band+0x28e/0x390 net/mac80211/tx.c:5352 ieee80211_tx_skb_tid net/mac80211/ieee80211_i.h:2003 [inline] ieee80211_tx_skb net/mac80211/ieee80211_i.h:2012 [inline] ieee80211_send_nullfunc+0x59a/0x6d0 net/mac80211/mlme.c:1095 ieee80211_mgd_probe_ap_send+0x897/0xb40 net/mac80211/mlme.c:2593 ieee80211_mgd_probe_ap+0x6b3/0x760 net/mac80211/mlme.c:2669 ieee80211_beacon_connection_loss_work+0x156/0x270 net/mac80211/mlme.c:2788 process_one_work+0x1688/0x2140 kernel/workqueue.c:2269 worker_thread+0x10bc/0x2730 kernel/workqueue.c:2415 kthread+0x551/0x590 kernel/kthread.c:293 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Uninit was created at: kmsan_save_stack_with_flags+0x3c/0x90 mm/kmsan/kmsan.c:143 kmsan_internal_alloc_meta_for_pages mm/kmsan/kmsan_shadow.c:268 [inline] kmsan_alloc_page+0xc5/0x1a0 mm/kmsan/kmsan_shadow.c:292 __alloc_pages_nodemask+0xf34/0x1120 mm/page_alloc.c:4927 __alloc_pages include/linux/gfp.h:509 [inline] __alloc_pages_node include/linux/gfp.h:522 [inline] alloc_pages_node include/linux/gfp.h:536 [inline] __page_frag_cache_refill mm/page_alloc.c:5002 [inline] page_frag_alloc+0x35b/0x880 mm/page_alloc.c:5032 __netdev_alloc_skb+0xc3d/0xc90 net/core/skbuff.c:456 netdev_alloc_skb include/linux/skbuff.h:2821 [inline] dev_alloc_skb include/linux/skbuff.h:2834 [inline] __ieee80211_beacon_get+0x37e3/0x4df0 net/mac80211/tx.c:4819 ieee80211_beacon_get_tim+0x109/0x800 net/mac80211/tx.c:4933 ieee80211_beacon_get include/net/mac80211.h:4845 [inline] mac80211_hwsim_beacon_tx+0x1c3/0xb80 drivers/net/wireless/mac80211_hwsim.c:1676 __iterate_interfaces net/mac80211/util.c:737 [inline] ieee80211_iterate_active_interfaces_atomic+0x40a/0x610 net/mac80211/util.c:773 mac80211_hwsim_beacon+0x11d/0x2e0 drivers/net/wireless/mac80211_hwsim.c:1717 __run_hrtimer+0x7cd/0xf00 kernel/time/hrtimer.c:1524 __hrtimer_run_queues kernel/time/hrtimer.c:1588 [inline] hrtimer_run_softirq+0x3bf/0x690 kernel/time/hrtimer.c:1605 __do_softirq+0x2ea/0x7f5 kernel/softirq.c:299 = --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
[PATCH 1/6] net: tlan: Fix typo abitrary
Fix comment typo. s/abitrary/arbitrary/ Signed-off-by: Naoki Hayama --- drivers/net/ethernet/ti/tlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ti/tlan.c b/drivers/net/ethernet/ti/tlan.c index 76a342ea3797..e922258cae3f 100644 --- a/drivers/net/ethernet/ti/tlan.c +++ b/drivers/net/ethernet/ti/tlan.c @@ -2511,7 +2511,7 @@ static void tlan_phy_power_down(struct net_device *dev) } /* Wait for 50 ms and powerup -* This is abitrary. It is intended to make sure the +* This is arbitrary. It is intended to make sure the * transceiver settles. */ tlan_set_timer(dev, msecs_to_jiffies(50), TLAN_TIMER_PHY_PUP); -- 2.17.1
[PATCH 0/6] spelling: Fix typo related to "arbitrary"
I found some typos related to "arbitrary". s/abitrary/arbitrary/ s/arbitary/arbitrary/ This series fixes them. These typos have been reported in the past in other codes, but correction 'abitrary||arbitrary' wasn't added to scripts/spelling.txt. Therefore, PATCH #6 adds it to spelling.txt. Naoki Hayama (6): net: tlan: Fix typo abitrary dt-bindings: pinctrl: qcom: Fix typo abitrary dt-bindings: pinctrl: sirf: Fix typo abitrary ALSA: hdspm: Fix typo arbitary drm/etnaviv: Fix typo arbitary scripts/spelling.txt: Add arbitrary correction Documentation/devicetree/bindings/pinctrl/pinctrl-atlas7.txt| 2 +- .../devicetree/bindings/pinctrl/qcom,ipq4019-pinctrl.txt| 2 +- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 +- drivers/net/ethernet/ti/tlan.c | 2 +- scripts/spelling.txt| 1 + sound/pci/rme9652/hdspm.c | 2 +- 6 files changed, 6 insertions(+), 5 deletions(-) -- 2.17.1
Re: [PATCH net-next v2 3/7] ethtool: trim policy tables
On 10/6/20 12:07 AM, Jakub Kicinski wrote: > Since ethtool uses strict attribute validation there's no need > to initialize all attributes in policy tables. 0 is NLA_UNSPEC > which is going to be rejected. Remove the NLA_REJECTs. > > Similarly attributes above maxattrs are rejected, so there's > no need to always size the policy tables to ETHTOOL_A_..._MAX. > This implies that all policy tables must be 'complete'. strset_stringsets_policy[] for example is : static const struct nla_policy strset_stringsets_policy[] = { [ETHTOOL_A_STRINGSETS_STRINGSET]= { .type = NLA_NESTED }, }; So when later strset_parse_request() does : req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY]; We have an out-of-bound access since ETHTOOL_A_STRSET_COUNTS_ONLY > ETHTOOL_A_STRINGSETS_STRINGSET Not sure what was the expected type for this attribute, the kernel only looks at its presence, not its value.
Re: [PATCH net-next v2 3/7] ethtool: trim policy tables
On Thu, 2020-10-08 at 11:12 +0200, Eric Dumazet wrote: > > On 10/6/20 12:07 AM, Jakub Kicinski wrote: > > Since ethtool uses strict attribute validation there's no need > > to initialize all attributes in policy tables. 0 is NLA_UNSPEC > > which is going to be rejected. Remove the NLA_REJECTs. > > > > Similarly attributes above maxattrs are rejected, so there's > > no need to always size the policy tables to ETHTOOL_A_..._MAX. > > > > This implies that all policy tables must be 'complete'. > > strset_stringsets_policy[] for example is : > > static const struct nla_policy strset_stringsets_policy[] = { > [ETHTOOL_A_STRINGSETS_STRINGSET]= { .type = NLA_NESTED }, > }; > > So when later strset_parse_request() does : > > req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY]; > > We have an out-of-bound access since ETHTOOL_A_STRSET_COUNTS_ONLY > > ETHTOOL_A_STRINGSETS_STRINGSET Yeah, Leon Romanovsky reported actually running into this yesterday, and I sent a fix :-) > Not sure what was the expected type for this attribute, the kernel > only looks at its presence, not its value. It was NLA_FLAG, but never actually in the policy, so you could never even successfully use it ... Here was the fix https://lore.kernel.org/netdev/20201007125348.a74389e18168.Ieab7a871e27b9698826e75dc9e825e4ddbc852b1@changeid/ johannes
Re: [PATCH net-next v2 3/7] ethtool: trim policy tables
On Thu, 2020-10-08 at 11:13 +0200, Johannes Berg wrote: > > This implies that all policy tables must be 'complete'. Also, yes they had to be complete already, perhaps *except* for NLA_FLAG like this below use ... > > So when later strset_parse_request() does : > > > > req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY]; > > > Here was the fix > https://lore.kernel.org/netdev/20201007125348.a74389e18168.Ieab7a871e27b9698826e75dc9e825e4ddbc852b1@changeid/ Sorry, wrong link https://lore.kernel.org/netdev/20201007125348.a0b250308599.Ie9b429e276d064f28ce12db01fffa430e5c770e0@changeid/ johannes
[net-next PATCH v3] net: dsa: rtl8366rb: Roof MTU for switch
The MTU setting for this DSA switch is global so we need to keep track of the MTU set for each port, then as soon as any MTU changes, roof the MTU to the biggest common denominator and poke that into the switch MTU setting. To achieve this we need a per-chip-variant state container for the RTL8366RB to use for the RTL8366RB-specific stuff. Other SMI switches does seem to have per-port MTU setting capabilities. Fixes: 5f4a8ef384db ("net: dsa: rtl8366rb: Support setting MTU") Signed-off-by: Linus Walleij --- ChangeLog v2->v3: - Fix the reverse-christmas-tree properly by also making it compile :/ ChangeLog v1->v2: - Fix a reverse-christmas-tree variable order issue. --- drivers/net/dsa/realtek-smi-core.c | 3 ++- drivers/net/dsa/realtek-smi-core.h | 2 ++ drivers/net/dsa/rtl8366rb.c| 38 ++ 3 files changed, 38 insertions(+), 5 deletions(-) diff --git a/drivers/net/dsa/realtek-smi-core.c b/drivers/net/dsa/realtek-smi-core.c index fae188c60191..8e49d4f85d48 100644 --- a/drivers/net/dsa/realtek-smi-core.c +++ b/drivers/net/dsa/realtek-smi-core.c @@ -394,9 +394,10 @@ static int realtek_smi_probe(struct platform_device *pdev) var = of_device_get_match_data(dev); np = dev->of_node; - smi = devm_kzalloc(dev, sizeof(*smi), GFP_KERNEL); + smi = devm_kzalloc(dev, sizeof(*smi) + var->chip_data_sz, GFP_KERNEL); if (!smi) return -ENOMEM; + smi->chip_data = (void *)smi + sizeof(*smi); smi->map = devm_regmap_init(dev, NULL, smi, &realtek_smi_mdio_regmap_config); if (IS_ERR(smi->map)) { diff --git a/drivers/net/dsa/realtek-smi-core.h b/drivers/net/dsa/realtek-smi-core.h index 6f2dab7e33d6..bc7bd47fb037 100644 --- a/drivers/net/dsa/realtek-smi-core.h +++ b/drivers/net/dsa/realtek-smi-core.h @@ -71,6 +71,7 @@ struct realtek_smi { int vlan4k_enabled; charbuf[4096]; + void*chip_data; /* Per-chip extra variant data */ }; /** @@ -111,6 +112,7 @@ struct realtek_smi_variant { unsigned int clk_delay; u8 cmd_read; u8 cmd_write; + size_t chip_data_sz; }; /* SMI core calls */ diff --git a/drivers/net/dsa/rtl8366rb.c b/drivers/net/dsa/rtl8366rb.c index 053bf5041f8d..28f510a580be 100644 --- a/drivers/net/dsa/rtl8366rb.c +++ b/drivers/net/dsa/rtl8366rb.c @@ -311,6 +311,13 @@ #define RTL8366RB_GREEN_FEATURE_TX BIT(0) #define RTL8366RB_GREEN_FEATURE_RX BIT(2) +/** + * struct rtl8366rb - RTL8366RB-specific data + */ +struct rtl8366rb { + unsigned int max_mtu[RTL8366RB_NUM_PORTS]; +}; + static struct rtl8366_mib_counter rtl8366rb_mib_counters[] = { { 0, 0, 4, "IfInOctets"}, { 0, 4, 4, "EtherStatsOctets" }, @@ -712,6 +719,7 @@ static int rtl8366rb_setup(struct dsa_switch *ds) { struct realtek_smi *smi = ds->priv; const u16 *jam_table; + struct rtl8366rb *rb; u32 chip_ver = 0; u32 chip_id = 0; int jam_size; @@ -719,6 +727,8 @@ static int rtl8366rb_setup(struct dsa_switch *ds) int ret; int i; + rb = smi->chip_data; + ret = regmap_read(smi->map, RTL8366RB_CHIP_ID_REG, &chip_id); if (ret) { dev_err(smi->dev, "unable to read chip id\n"); @@ -871,6 +881,9 @@ static int rtl8366rb_setup(struct dsa_switch *ds) RTL8366RB_SGCR_MAX_LENGTH_1536); if (ret) return ret; + for (i = 0; i < RTL8366RB_NUM_PORTS; i++) + /* layer 2 size, see rtl8366rb_change_mtu() */ + rb->max_mtu[i] = 1532; /* Enable learning for all ports */ ret = regmap_write(smi->map, RTL8366RB_SSCR0, 0); @@ -1112,20 +1125,36 @@ rtl8366rb_port_disable(struct dsa_switch *ds, int port) static int rtl8366rb_change_mtu(struct dsa_switch *ds, int port, int new_mtu) { struct realtek_smi *smi = ds->priv; + struct rtl8366rb *rb; + unsigned int max_mtu; u32 len; + int i; + + /* Cache the per-port MTU setting */ + rb = smi->chip_data; + rb->max_mtu[port] = new_mtu; - /* The first setting, 1522 bytes, is max IP packet 1500 bytes, + /* Roof out the MTU for the entire switch to the greatest +* common denominator: the biggest set for any one port will +* be the biggest MTU for the switch. +* +* The first setting, 1522 bytes, is max IP packet 1500 bytes, * plus ethernet header, 1518 bytes, plus CPU tag, 4 bytes. * This function should consider the parameter an SDU, so the * MTU passed for this setting is 1518 bytes. The same logic * of subtracting the DSA tag of 4 bytes apply to the other * settings. */ - if (new_mtu <= 1518) + max_mtu = 1518; + for (i = 0; i < RTL8
Re: [PATCH net-next 11/15] sctp: add udphdr to overhead when udp_port is set
On Tue, Oct 6, 2020 at 3:01 AM Marcelo Ricardo Leitner wrote: > > On Sat, Oct 03, 2020 at 08:24:34PM +0800, Xin Long wrote: > > On Sat, Oct 3, 2020 at 7:23 PM Xin Long wrote: > > > > > > On Sat, Oct 3, 2020 at 4:12 PM Xin Long wrote: > > > > > > > > On Sat, Oct 3, 2020 at 12:08 PM Marcelo Ricardo Leitner > > > > wrote: > > > > > > > > > > On Wed, Sep 30, 2020 at 03:00:42AM +0800, kernel test robot wrote: > > > > > > Hi Xin, > > > > > > > > > > > > Thank you for the patch! Yet something to improve: > > > > > > > > > > I wonder how are you planning to fix this. It is quite entangled. > > > > > This is not performance critical. Maybe the cleanest way out is to > > > > > move it to a .c file. > > > > > > > > > > Adding a > > > > > #if defined(CONFIG_IP_SCTP) || defined(CONFIG_IP_SCTP_MODULE) > > > > > in there doesn't seem good. > > > > > > > > > > >In file included from include/net/sctp/checksum.h:27, > > > > > > from net/netfilter/nf_nat_proto.c:16: > > > > > >include/net/sctp/sctp.h: In function 'sctp_mtu_payload': > > > > > > >> include/net/sctp/sctp.h:583:31: error: 'struct net' has no > > > > > > >> member named 'sctp'; did you mean 'ct'? > > > > > > 583 | if (sock_net(&sp->inet.sk)->sctp.udp_port) > > > > > > | ^~~~ > > > > > > | ct > > > > > > > > > > Here is actually another problem, I'm still thinking how to fix it. > > > > > > > > Now sctp_mtu_payload() returns different value depending on > > > > net->sctp.udp_port. but net->sctp.udp_port can be changed by > > > > "sysctl -w" anytime. so: > > Good point. > > > > > > > > > In sctp_packet_config() it gets overhead/headroom by calling > > > > sctp_mtu_payload(). When 'udp_port' is 0, it's IP+MAC header > > > > size. Then if 'udp_port' is changed to 9899 by 'sysctl -w', > > > > udphdr will also be added to the packet in sctp_v4_xmit(), > > > > and later the headroom may not be enough for IP+MAC headers. > > > > > > > > I'm thinking to add sctp_sock->udp_port, and it'll be set when > > > > the sock is created with net->udp_port. but not sure if we should > > > > update sctp_sock->udp_port with net->udp_port when sending packets? > > I don't think so, > > > > something like: > ... > > diff --git a/net/sctp/output.c b/net/sctp/output.c > > index 6614c9fdc51e..c96b13ec72f4 100644 > > --- a/net/sctp/output.c > > +++ b/net/sctp/output.c > > @@ -91,6 +91,14 @@ void sctp_packet_config(struct sctp_packet *packet, > > __u32 vtag, > > if (asoc) { > > sk = asoc->base.sk; > > sp = sctp_sk(sk); > > + > > + if (unlikely(sp->udp_port != sock_net(sk)->sctp.udp_port)) { > > RFC6951 has: > > 6.1. Get or Set the Remote UDP Encapsulation Port Number > (SCTP_REMOTE_UDP_ENCAPS_PORT) > ... >sue_assoc_id: This parameter is ignored for one-to-one style > sockets. For one-to-many style sockets, the application may fill > in an association identifier or SCTP_FUTURE_ASSOC for this query. > It is an error to use SCTP_{CURRENT|ALL}_ASSOC in sue_assoc_id. > >sue_address: This specifies which address is of interest. If a > wildcard address is provided, it applies only to future paths. > > So I'm not seeing a reason to have a system wide knob that takes > effect in run time like this. > Enable, start apps, and they keep behaving as initially configured. > Need to disable? Restart the apps/sockets. > > Thoughts? Right, not to update it on tx path makes more sense. Thanks. > > > + __u16 port = sock_net(sk)->sctp.udp_port; > > + > > + if (!sp->udp_port || !port) > > + sctp_assoc_update_frag_point(asoc); > > + sp->udp_port = port; > > + } > > }
Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping
On Thu, Oct 08, 2020 at 10:34:11AM +0200, Kurt Kanzenbach wrote: > On Wed Oct 07 2020, Vladimir Oltean wrote: > > On Wed, Oct 07, 2020 at 12:39:49PM +0200, Kurt Kanzenbach wrote: > >> For instance the hellcreek switch has actually three ptp hardware > >> clocks and the time stamping can be configured to use either one of > >> them. > > > > The sja1105 also has a corrected and an uncorrected PTP clock that can > > take timestamps. Initially I had thought I'd be going to spend some time > > figuring out multi-PHC support, but now I don't see any practical reason > > to use the uncorrected PHC for anything. > > Just out of curiosity: How do you implement 802.1AS then? My > understanding is that the free-running clock has to be used for the Has to be? I couldn't find that wording in IEEE 802.1AS-2011. > calculation of the peer delays and such meaning there should be a way to > get access to both PHCs or having some form of cross timestamping > available. > > The hellcreek switch can take cross snapshots of all three ptp clocks in > hardware for that purpose. Well, at the end of the day, all the other TSN offloads (tc-taprio, tc-gate) will still have to use the synchronized PTP clock, so what we're doing is we're simply letting that clock be synchronized by ptp4l. > >> > So when you'll poll for TX timestamps, you'll receive a TX > >> > timestamp from the PHY and another one from the switch, and those will > >> > be in a race with one another, so you won't know which one is which. > >> > >> OK. So what happens if the driver will accept to disable hardware > >> timestamping? Is there anything else that needs to be implemented? Are > >> there (good) examples? > > > > It needs to not call skb_complete_tx_timestamp() and friends. > > > > For PHY timestamping, it also needs to invoke the correct methods for RX > > and for TX, where the PHY timestamping hooks will get called. I don't > > think that DSA is compatible yet with PHY timestamping, but it is > > probably a trivial modification. > > Hmm? If DSA doesn't support PHY timestamping how are other DSA drivers > dealing with it then? I'm getting really confused. They aren't dealing with it, of course. > Furthermore, there is no hellcreek hardware available with timestamping > capable PHYs. How am I supposed to even test this? > > For now, until there is hardware available, PHY timestamping is not > supported with the hellcreek switch. I was just pointing out that this is something you'll certainly have to change if somebody will want PHY timestamping. Even without hardware, you _could_ probably test that DSA is doing the right thing by simply adding the PTP timestamping ops to a PHY driver that you own, and inject dummy timestamps. The expectation becomes that user space gets those dummy timestamps, and not the ones emitted by your switch.
RE: [PATCH v2 1/6] Add ancillary bus support
> From: Leon Romanovsky > Sent: Thursday, October 8, 2020 1:15 PM > > On Thu, Oct 08, 2020 at 07:14:17AM +, Parav Pandit wrote: > > > > > > > From: Leon Romanovsky > > > Sent: Thursday, October 8, 2020 10:56 AM > > > > > > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote: > > > > > > > > > > > > > From: Pierre-Louis Bossart > > > > > > > > > > Sent: Thursday, October 8, 2020 3:20 AM > > > > > > > > > > > > > > > On 10/7/20 4:22 PM, Ertman, David M wrote: > > > > > >> -Original Message- > > > > > >> From: Pierre-Louis Bossart > > > > > >> > > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM > > > > > >> To: Ertman, David M ; Parav Pandit > > > > > >> ; Leon Romanovsky > > > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com; > > > > > >> ti...@suse.de; netdev@vger.kernel.org; > > > > > >> ranjani.sridha...@linux.intel.com; > > > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org; > > > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe > > > > > >> ; gre...@linuxfoundation.org; > > > > > >> k...@kernel.org; Williams, Dan J ; > > > > > >> Saleem, Shiraz ; > > > > > >> da...@davemloft.net; Patil, Kiran > > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support > > > > > >> > > > > > >> > > > > > >> > > > > > Below is most simple, intuitive and matching with core APIs > > > > > for name and design pattern wise. > > > > > init() > > > > > { > > > > > err = ancillary_device_initialize(); > > > > > if (err) > > > > > return ret; > > > > > > > > > > err = ancillary_device_add(); > > > > > if (ret) > > > > > goto err_unwind; > > > > > > > > > > err = some_foo(); > > > > > if (err) > > > > > goto err_foo; > > > > > return 0; > > > > > > > > > > err_foo: > > > > > ancillary_device_del(adev); > > > > > err_unwind: > > > > > ancillary_device_put(adev->dev); > > > > > return err; > > > > > } > > > > > > > > > > cleanup() > > > > > { > > > > > ancillary_device_de(adev); > > > > > ancillary_device_put(adev); > > > > > /* It is common to have a one wrapper for this as > > > > > ancillary_device_unregister(). > > > > > * This will match with core device_unregister() that has > > > > > precise documentation. > > > > > * but given fact that init() code need proper error > > > > > unwinding, like above, > > > > > * it make sense to have two APIs, and no need to export > > > > > another symbol for unregister(). > > > > > * This pattern is very easy to audit and code. > > > > > */ > > > > > } > > > > > >>> > > > > > >>> I like this flow +1 > > > > > >>> > > > > > >>> But ... since the init() function is performing both > > > > > >>> device_init and device_add - it should probably be called > > > > > >>> ancillary_device_register, and we are back to a single > > > > > >>> exported API for both register and unregister. > > > > > >> > > > > > >> Kind reminder that we introduced the two functions to allow > > > > > >> the caller to know if it needed to free memory when > > > > > >> initialize() fails, and it didn't need to free memory when > > > > > >> add() failed since > > > > > >> put_device() takes care of it. If you have a single init() > > > > > >> function it's impossible to know which behavior to select on error. > > > > > >> > > > > > >> I also have a case with SoundWire where it's nice to first > > > > > >> initialize, then set some data and then add. > > > > > >> > > > > > > > > > > > > The flow as outlined by Parav above does an initialize as the > > > > > > first step, so every error path out of the function has to do > > > > > > a put_device(), so you would never need to manually free the > > > > > > memory in > > > > > the setup function. > > > > > > It would be freed in the release call. > > > > > > > > > > err = ancillary_device_initialize(); if (err) > > > > > return ret; > > > > > > > > > > where is the put_device() here? if the release function does any > > > > > sort of kfree, then you'd need to do it manually in this case. > > > > Since device_initialize() failed, put_device() cannot be done here. > > > > So yes, pseudo code should have shown, if (err) { > > > > kfree(adev); > > > > return err; > > > > } > > > > > > > > If we just want to follow register(), unregister() pattern, > > > > > > > > Than, > > > > > > > > ancillar_device_register() should be, > > > > > > > > /** > > > > * ancillar_device_register() - register an ancillary device > > > > * NOTE: __never directly free @adev after calling this function, > > > > even if it returned > > > > * an error. Always use ancillary_device_put() to give up the > > > > reference > > > initialized by this function. > > > > * This note matches with the core and caller knows exactly what > > > > to be > > > done. > > > > */ > > > >
Re: [PATCH net 2/2] IPv6: reply ICMP error if the first fragment don't include all headers
On 10/8/20 10:30 AM, Hangbin Liu wrote: > Hi Eric, > > Thanks for the comments. I should add "RFC" in subject next time for the > uncertain fix patch. > > On Wed, Oct 07, 2020 at 11:35:41AM +0200, Eric Dumazet wrote: >> >> >> On 10/7/20 5:55 AM, Hangbin Liu wrote: >> >>> kfree_skb(skb); >>> @@ -282,6 +285,21 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff >>> *skb, struct net_device *dev, >>> } >>> } >>> >>> + /* RFC 8200, Section 4.5 Fragment Header: >>> +* If the first fragment does not include all headers through an >>> +* Upper-Layer header, then that fragment should be discarded and >>> +* an ICMP Parameter Problem, Code 3, message should be sent to >>> +* the source of the fragment, with the Pointer field set to zero. >>> +*/ >>> + nexthdr = hdr->nexthdr; >>> + offset = ipv6_skip_exthdr(skb, skb_transport_offset(skb), &nexthdr, >>> &frag_off); >>> + if (frag_off == htons(IP6_MF) && !pskb_may_pull(skb, offset + 1)) { >>> + __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS); >>> + icmpv6_param_prob(skb, ICMPV6_HDR_INCOMP, 0); >>> + rcu_read_unlock(); >>> + return NULL; >>> + } >>> + >>> rcu_read_unlock(); >>> >>> /* Must drop socket now because of tproxy. */ >>> >> >> Ouch, this is quite a buggy patch. >> >> I doubt we want to add yet another ipv6_skip_exthdr() call in IPv6 fast path. >> >> Surely the presence of NEXTHDR_FRAGMENT is already tested elsewhere ? > > Would you like to help point where NEXTHDR_FRAGMENT was tested before IPv6 > defragment? I think we have to ask the question : Should routers enforce the rule, or only end points ? End points must handle NEXTHDR_FRAGMENT, in ipv6_frag_rcv() > >> >> Also, ipv6_skip_exthdr() does not pull anything in skb->head, it would be >> strange >> to force a pull of hundreds of bytes just because you want to check if an >> extra byte is there, >> if the packet could be forwarded as is, without additional memory >> allocations. >> >> Testing skb->len should be more than enough at this stage. > > Ah, yes, I shouldn't call pskb_may_pull here. >> >> Also ipv6_skip_exthdr() can return an error. > > it returns -1 as error, If we tested it by (offset + 1 > skb->len), does > that count as an error handler? If there is an error, then you want to send the ICMP, right ? The (offset + 1) expression will become 0, and surely the test will be false, so you wont send the ICMP... > > Thanks > Hangbin >
[PATCHv2 net-next 02/17] udp6: move the mss check after udp gso tunnel processing
For some protocol's gso, like SCTP, it's using GSO_BY_FRAGS for gso_size. When using UDP to encapsulate its packet, it will return error in udp6_ufo_fragment() as skb->len < gso_size, and it will never go to the gso tunnel processing. So we should move this check after udp gso tunnel processing, the same as udp4_ufo_fragment() does. While at it, also tidy the variables up. Signed-off-by: Xin Long --- net/ipv6/udp_offload.c | 154 - 1 file changed, 76 insertions(+), 78 deletions(-) diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 584157a..3c5ec8e 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -17,96 +17,94 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, netdev_features_t features) { + u8 nexthdr, frag_hdr_sz = sizeof(struct frag_hdr); + unsigned int unfrag_ip6hlen, unfrag_len, mss; struct sk_buff *segs = ERR_PTR(-EINVAL); - unsigned int mss; - unsigned int unfrag_ip6hlen, unfrag_len; - struct frag_hdr *fptr; + const struct ipv6hdr *ipv6h; u8 *packet_start, *prevhdr; - u8 nexthdr; - u8 frag_hdr_sz = sizeof(struct frag_hdr); + struct frag_hdr *fptr; + int tnl_hlen, err; + struct udphdr *uh; __wsum csum; - int tnl_hlen; - int err; - mss = skb_shinfo(skb)->gso_size; - if (unlikely(skb->len <= mss)) + if (skb->encapsulation && + (skb_shinfo(skb)->gso_type & +(SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) { + segs = skb_udp_tunnel_segment(skb, features, true); goto out; + } - if (skb->encapsulation && skb_shinfo(skb)->gso_type & - (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM)) - segs = skb_udp_tunnel_segment(skb, features, true); - else { - const struct ipv6hdr *ipv6h; - struct udphdr *uh; + if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4))) + goto out; - if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4))) - goto out; + if (!pskb_may_pull(skb, sizeof(struct udphdr))) + goto out; - if (!pskb_may_pull(skb, sizeof(struct udphdr))) - goto out; + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) + return __udp_gso_segment(skb, features); - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) - return __udp_gso_segment(skb, features); - - /* Do software UFO. Complete and fill in the UDP checksum as HW cannot -* do checksum of UDP packets sent as multiple IP fragments. -*/ - - uh = udp_hdr(skb); - ipv6h = ipv6_hdr(skb); - - uh->check = 0; - csum = skb_checksum(skb, 0, skb->len, 0); - uh->check = udp_v6_check(skb->len, &ipv6h->saddr, - &ipv6h->daddr, csum); - if (uh->check == 0) - uh->check = CSUM_MANGLED_0; - - skb->ip_summed = CHECKSUM_UNNECESSARY; - - /* If there is no outer header we can fake a checksum offload -* due to the fact that we have already done the checksum in -* software prior to segmenting the frame. -*/ - if (!skb->encap_hdr_csum) - features |= NETIF_F_HW_CSUM; - - /* Check if there is enough headroom to insert fragment header. */ - tnl_hlen = skb_tnl_header_len(skb); - if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) { - if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz)) - goto out; - } + mss = skb_shinfo(skb)->gso_size; + if (unlikely(skb->len <= mss)) + goto out; - /* Find the unfragmentable header and shift it left by frag_hdr_sz -* bytes to insert fragment header. -*/ - err = ip6_find_1stfragopt(skb, &prevhdr); - if (err < 0) - return ERR_PTR(err); - unfrag_ip6hlen = err; - nexthdr = *prevhdr; - *prevhdr = NEXTHDR_FRAGMENT; - unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) + -unfrag_ip6hlen + tnl_hlen; - packet_start = (u8 *) skb->head + SKB_GSO_CB(skb)->mac_offset; - memmove(packet_start-frag_hdr_sz, packet_start, unfrag_len); - - SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz; - skb->mac_header -= frag_hdr_sz; - skb->network_header -= frag_hdr_sz; - - fptr = (struct frag_hdr *)(skb_network_header(skb) +
[PATCHv2 net-next 01/17] udp: check udp sock encap_type in __udp_lib_err
There is a chance that __udp4/6_lib_lookup() returns a udp encap sock in __udp_lib_err(), like the udp encap listening sock may use the same port as remote encap port, in which case it should go to __udp4/6_lib_err_encap() for more validation before processing the icmp packet. This patch is to check encap_type in __udp_lib_err() for the further validation for a encap sock. Signed-off-by: Xin Long --- net/ipv4/udp.c | 2 +- net/ipv6/udp.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 09f0a23..ca04a8a 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -702,7 +702,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable) sk = __udp4_lib_lookup(net, iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex, inet_sdif(skb), udptable, NULL); - if (!sk) { + if (!sk || udp_sk(sk)->encap_type) { /* No socket for error: try tunnels before discarding */ sk = ERR_PTR(-ENOENT); if (static_branch_unlikely(&udp_encap_needed_key)) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 29d9691..cde9b88 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -560,7 +560,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt, sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source, inet6_iif(skb), inet6_sdif(skb), udptable, NULL); - if (!sk) { + if (!sk || udp_sk(sk)->encap_type) { /* No socket for error: try tunnels before discarding */ sk = ERR_PTR(-ENOENT); if (static_branch_unlikely(&udpv6_encap_needed_key)) { -- 2.1.0
[PATCHv2 net-next 04/17] udp: support sctp over udp in skb_udp_tunnel_segment
To call sctp_gso_segment() properly in skb_udp_tunnel_segment() for sctp over udp packets, we need to set transport_header to sctp header. When skb->ip_summed == CHECKSUM_PARTIAL, skb_crc32c_csum_help() should be called for the inner sctp packet. Cc: Tom Herbert Signed-off-by: Xin Long --- net/ipv4/udp_offload.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index c0b010b..a484405 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -49,6 +49,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, __skb_pull(skb, tnl_hlen); skb_reset_mac_header(skb); skb_set_network_header(skb, skb_inner_network_offset(skb)); + skb_set_transport_header(skb, skb_inner_transport_offset(skb)); skb->mac_len = skb_inner_network_offset(skb); skb->protocol = new_protocol; @@ -131,8 +132,12 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, uh->check = ~csum_fold(csum_add(partial, (__force __wsum)htonl(len))); - if (skb->encapsulation) - skb_checksum_help(skb); + if (skb->encapsulation) { + if (skb->csum_not_inet) + skb_crc32c_csum_help(skb); + else + skb_checksum_help(skb); + } if (offload_csum) { skb->ip_summed = CHECKSUM_PARTIAL; -- 2.1.0
[PATCHv2 net-next 05/17] sctp: create udp4 sock and add its encap_rcv
This patch is to add the functions to create/release udp4 sock, and set the sock's encap_rcv to process the incoming udp encap sctp packets. In sctp_udp_rcv(), as we can see, all we need to do is fix the transport header for sctp_rcv(), then it would implement the part of rfc6951#section-5.4: "When an encapsulated packet is received, the UDP header is removed. Then, the generic lookup is performed, as done by an SCTP stack whenever a packet is received, to find the association for the received SCTP packet" Note that these functions will be called in the last patch of this patchset when enabling this feature. v1->v2: - Add pr_err() when fails to create udp v4 sock. Signed-off-by: Xin Long --- include/net/netns/sctp.h | 5 + include/net/sctp/constants.h | 2 ++ include/net/sctp/sctp.h | 2 ++ net/sctp/protocol.c | 43 +++ 4 files changed, 52 insertions(+) diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h index d8d02e4..3d10bef 100644 --- a/include/net/netns/sctp.h +++ b/include/net/netns/sctp.h @@ -22,6 +22,11 @@ struct netns_sctp { */ struct sock *ctl_sock; + /* udp tunneling listening sock. */ + struct sock *udp4_sock; + /* udp tunneling listening port. */ + int udp_port; + /* This is the global local address list. * We actively maintain this complete list of addresses on * the system by catching address add/delete events. diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 122d9e2..b583166 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -286,6 +286,8 @@ enum { SCTP_MAX_GABS = 16 }; * functions simpler to write. */ +#define SCTP_DEFAULT_UDP_PORT 9899 /* default udp tunneling port */ + /* These are the values for pf exposure, UNUSED is to keep compatible with old * applications by default. */ diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 4fc747b..bfd87a0 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -84,6 +84,8 @@ int sctp_copy_local_addr_list(struct net *net, struct sctp_bind_addr *addr, struct sctp_pf *sctp_get_pf_specific(sa_family_t family); int sctp_register_pf(struct sctp_pf *, sa_family_t); void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int); +int sctp_udp_sock_start(struct net *net); +void sctp_udp_sock_stop(struct net *net); /* * sctp/socket.c diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 2583323..2b7a3e1 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -44,6 +44,7 @@ #include #include #include +#include #define MAX_SCTP_PORT_HASH_ENTRIES (64 * 1024) @@ -840,6 +841,45 @@ static int sctp_ctl_sock_init(struct net *net) return 0; } +static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb) +{ + skb_set_transport_header(skb, sizeof(struct udphdr)); + sctp_rcv(skb); + return 0; +} + +int sctp_udp_sock_start(struct net *net) +{ + struct udp_tunnel_sock_cfg tuncfg = {NULL}; + struct udp_port_cfg udp_conf = {0}; + struct socket *sock; + int err; + + udp_conf.family = AF_INET; + udp_conf.local_ip.s_addr = htonl(INADDR_ANY); + udp_conf.local_udp_port = htons(net->sctp.udp_port); + err = udp_sock_create(net, &udp_conf, &sock); + if (err) { + pr_err("Failed to create the SCTP udp tunneling v4 sock\n"); + return err; + } + + tuncfg.encap_type = 1; + tuncfg.encap_rcv = sctp_udp_rcv; + setup_udp_tunnel_sock(net, sock, &tuncfg); + net->sctp.udp4_sock = sock->sk; + + return 0; +} + +void sctp_udp_sock_stop(struct net *net) +{ + if (net->sctp.udp4_sock) { + udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket); + net->sctp.udp4_sock = NULL; + } +} + /* Register address family specific functions. */ int sctp_register_af(struct sctp_af *af) { @@ -1271,6 +1311,9 @@ static int __net_init sctp_defaults_init(struct net *net) /* Enable ECN by default. */ net->sctp.ecn_enable = 1; + /* Set udp tunneling listening port to default value */ + net->sctp.udp_port = SCTP_DEFAULT_UDP_PORT; + /* Set SCOPE policy to enabled */ net->sctp.scope_policy = SCTP_SCOPE_POLICY_ENABLE; -- 2.1.0
[PATCHv2 net-next 00/17] sctp: Implement RFC6951: UDP Encapsulation of SCTP
Description From the RFC: The Main Reasons: o To allow SCTP traffic to pass through legacy NATs, which do not provide native SCTP support as specified in [BEHAVE] and [NATSUPP]. o To allow SCTP to be implemented on hosts that do not provide direct access to the IP layer. In particular, applications can use their own SCTP implementation if the operating system does not provide one. Implementation Notes: UDP-encapsulated SCTP is normally communicated between SCTP stacks using the IANA-assigned UDP port number 9899 (sctp-tunneling) on both ends. There are circumstances where other ports may be used on either end, and it might be required to use ports other than the registered port. Each SCTP stack uses a single local UDP encapsulation port number as the destination port for all its incoming SCTP packets, this greatly simplifies implementation design. An SCTP implementation supporting UDP encapsulation MUST maintain a remote UDP encapsulation port number per destination address for each SCTP association. Again, because the remote stack may be using ports other than the well-known port, each port may be different from each stack. However, because of remapping of ports by NATs, the remote ports associated with different remote IP addresses may not be identical, even if they are associated with the same stack. Because the well-known port might not be used, implementations need to allow other port numbers to be specified as a local or remote UDP encapsulation port number through APIs. Patches: This patchset is using the udp4/6 tunnel APIs to implement the UDP Encapsulation of SCTP with not much change in SCTP protocol stack and with all current SCTP features keeped in Linux Kernel. 1 - 4: Fix some UDP issues that may be triggered by SCTP over UDP. 5 - 7: Process incoming UDP encapsulated packets and ICMP packets. 8 -10: Remote encap port's update by sysctl, sockopt and packets. 11-14: Process outgoing pakects with UDP encapsulated and its GSO. 15-16: Add the part from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03. 17: Enable this feature. Tests: - lksctp-tools/src/func_tests with UDP Encapsulation enabled/disabled: Both make v4test and v6test passed. - sctp-tests with UDP Encapsulation enabled/disabled: repeatability/procdumps/sctpdiag/gsomtuchange/extoverflow/ sctphashtable passed. Others failed as expected due to those "iptables -p sctp" rules. - netperf on lo/netns/virtio_net, with gso enabled/disabled and with ip_checksum enabled/disabled, with UDP Encapsulation enabled/disabled: No clear performance dropped. v1->v2: - Fix some incorrect code in the patches 5,6,8,10,11,13,14,17, suggested by Marcelo. - Append two patches 15-16 to add the Additional Considerations for UDP Encapsulation of SCTP from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03, noticed by Michael. Xin Long (17): udp: check udp sock encap_type in __udp_lib_err udp6: move the mss check after udp gso tunnel processing udp: do checksum properly in skb_udp_tunnel_segment udp: support sctp over udp in skb_udp_tunnel_segment sctp: create udp4 sock and add its encap_rcv sctp: create udp6 sock and set its encap_rcv sctp: add encap_err_lookup for udp encap socks sctp: add encap_port for netns sock asoc and transport sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt sctp: allow changing transport encap_port by peer packets sctp: add udphdr to overhead when udp_port is set sctp: call sk_setup_caps in sctp_packet_transmit instead sctp: support for sending packet over udp4 sock sctp: support for sending packet over udp6 sock sctp: add the error cause for new encapsulation port restart sctp: handle the init chunk matching an existing asoc sctp: enable udp tunneling socks include/linux/sctp.h | 20 ++ include/net/netns/sctp.h | 8 +++ include/net/sctp/constants.h | 2 + include/net/sctp/sctp.h | 9 ++- include/net/sctp/sm.h| 4 ++ include/net/sctp/structs.h | 14 ++-- include/uapi/linux/sctp.h| 7 ++ net/ipv4/udp.c | 2 +- net/ipv4/udp_offload.c | 16 +++-- net/ipv6/udp.c | 2 +- net/ipv6/udp_offload.c | 154 +-- net/sctp/associola.c | 4 ++ net/sctp/ipv6.c | 44 + net/sctp/output.c| 22 +++ net/sctp/protocol.c | 148 + net/sctp/sm_make_chunk.c | 21 ++ net/sctp/sm_statefuns.c | 52 +++ net/sctp/socket.c| 112 +++ net/sctp/sysctl.c| 59 + 19 files changed, 572 insertions(+), 128 deletions(-) -- 2.1.0
[PATCHv2 net-next 03/17] udp: do checksum properly in skb_udp_tunnel_segment
This patch fixes two things: When skb->ip_summed == CHECKSUM_PARTIAL, skb_checksum_help() should be called do the checksum, instead of gso_make_checksum(), which is used to do the checksum for current proto after calling skb_segment(), not after the inner proto's gso_segment(). When offload_csum is disabled, the hardware will not do the checksum for the current proto, udp. So instead of calling gso_make_checksum(), it should calculate checksum for udp itself. Cc: Tom Herbert Signed-off-by: Xin Long --- net/ipv4/udp_offload.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index e67a66f..c0b010b 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -131,14 +131,15 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, uh->check = ~csum_fold(csum_add(partial, (__force __wsum)htonl(len))); - if (skb->encapsulation || !offload_csum) { - uh->check = gso_make_checksum(skb, ~uh->check); - if (uh->check == 0) - uh->check = CSUM_MANGLED_0; - } else { + if (skb->encapsulation) + skb_checksum_help(skb); + + if (offload_csum) { skb->ip_summed = CHECKSUM_PARTIAL; skb->csum_start = skb_transport_header(skb) - skb->head; skb->csum_offset = offsetof(struct udphdr, check); + } else { + uh->check = csum_fold(skb_checksum(skb, udp_offset, len, 0)); } } while ((skb = skb->next)); out: -- 2.1.0
[PATCHv2 net-next 06/17] sctp: create udp6 sock and set its encap_rcv
This patch is to add the udp6 sock part in sctp_udp_sock_start/stop(). udp_conf.use_udp6_rx_checksums is set to true, as: "The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP checksum SHOULD be computed for IPv4 and IPv6" says in rfc6951#section-5.3. v1->v2: - Add pr_err() when fails to create udp v6 sock. - Add #if IS_ENABLED(CONFIG_IPV6) not to create v6 sock when ipv6 is disabled. Signed-off-by: Xin Long --- include/net/netns/sctp.h | 1 + net/sctp/protocol.c | 26 ++ 2 files changed, 27 insertions(+) diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h index 3d10bef..f622945 100644 --- a/include/net/netns/sctp.h +++ b/include/net/netns/sctp.h @@ -24,6 +24,7 @@ struct netns_sctp { /* udp tunneling listening sock. */ struct sock *udp4_sock; + struct sock *udp6_sock; /* udp tunneling listening port. */ int udp_port; diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 2b7a3e1..49b5d75 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -869,6 +869,28 @@ int sctp_udp_sock_start(struct net *net) setup_udp_tunnel_sock(net, sock, &tuncfg); net->sctp.udp4_sock = sock->sk; +#if IS_ENABLED(CONFIG_IPV6) + memset(&udp_conf, 0, sizeof(udp_conf)); + + udp_conf.family = AF_INET6; + udp_conf.local_ip6 = in6addr_any; + udp_conf.local_udp_port = htons(net->sctp.udp_port); + udp_conf.use_udp6_rx_checksums = true; + udp_conf.ipv6_v6only = true; + err = udp_sock_create(net, &udp_conf, &sock); + if (err) { + pr_err("Failed to create the SCTP udp tunneling v6 sock\n"); + udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket); + net->sctp.udp4_sock = NULL; + return err; + } + + tuncfg.encap_type = 1; + tuncfg.encap_rcv = sctp_udp_rcv; + setup_udp_tunnel_sock(net, sock, &tuncfg); + net->sctp.udp6_sock = sock->sk; +#endif + return 0; } @@ -878,6 +900,10 @@ void sctp_udp_sock_stop(struct net *net) udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket); net->sctp.udp4_sock = NULL; } + if (net->sctp.udp6_sock) { + udp_tunnel_sock_release(net->sctp.udp6_sock->sk_socket); + net->sctp.udp6_sock = NULL; + } } /* Register address family specific functions. */ -- 2.1.0
[PATCHv2 net-next 10/17] sctp: allow changing transport encap_port by peer packets
As rfc6951#section-5.4 says: "After finding the SCTP association (which includes checking the verification tag), the UDP source port MUST be stored as the encapsulation port for the destination address the SCTP packet is received from (see Section 5.1). When a non-encapsulated SCTP packet is received by the SCTP stack, the encapsulation of outgoing packets belonging to the same association and the corresponding destination address MUST be disabled." transport encap_port should be updated by a validated incoming packet's udp src port. We save the udp src port in sctp_input_cb->encap_port, and then update the transport in two places: 1. right after vtag is verified, which is required by RFC, and this allows the existent transports to be updated by the chunks that can only be processed on an asoc. 2. right before processing the 'init' where the transports are added, and this allows building a sctp over udp connection by client with the server not knowing the remote encap port. 3. when processing ootb_pkt and creating the temporary transport for the reply pkt. Note that sctp_input_cb->header is removed, as it's not used any more in sctp. v1->v2: - Change encap_port as __be16 for sctp_input_cb. Signed-off-by: Xin Long --- include/net/sctp/sm.h | 1 + include/net/sctp/structs.h | 7 +-- net/sctp/ipv6.c| 1 + net/sctp/protocol.c| 11 ++- net/sctp/sm_make_chunk.c | 1 + net/sctp/sm_statefuns.c| 2 ++ 6 files changed, 16 insertions(+), 7 deletions(-) diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h index 5c491a3..a499341 100644 --- a/include/net/sctp/sm.h +++ b/include/net/sctp/sm.h @@ -380,6 +380,7 @@ sctp_vtag_verify(const struct sctp_chunk *chunk, if (ntohl(chunk->sctp_hdr->vtag) == asoc->c.my_vtag) return 1; + chunk->transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port; return 0; } diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index aa98e7e..81464ae 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1120,14 +1120,9 @@ static inline void sctp_outq_cork(struct sctp_outq *q) * sctp_input_cb is currently used on rx and sock rx queue */ struct sctp_input_cb { - union { - struct inet_skb_parmh4; -#if IS_ENABLED(CONFIG_IPV6) - struct inet6_skb_parm h6; -#endif - } header; struct sctp_chunk *chunk; struct sctp_af *af; + __be16 encap_port; }; #define SCTP_INPUT_CB(__skb) ((struct sctp_input_cb *)&((__skb)->cb[0])) diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 8a58f42..a064bf2 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -1053,6 +1053,7 @@ static struct inet_protosw sctpv6_stream_protosw = { static int sctp6_rcv(struct sk_buff *skb) { + memset(skb->cb, 0, sizeof(skb->cb)); return sctp_rcv(skb) ? -1 : 0; } diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 5b74187..0d16e5e 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -843,6 +843,9 @@ static int sctp_ctl_sock_init(struct net *net) static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb) { + memset(skb->cb, 0, sizeof(skb->cb)); + SCTP_INPUT_CB(skb)->encap_port = udp_hdr(skb)->source; + skb_set_transport_header(skb, sizeof(struct udphdr)); sctp_rcv(skb); return 0; @@ -1139,9 +1142,15 @@ static struct inet_protosw sctp_stream_protosw = { .flags = SCTP_PROTOSW_FLAG }; +static int sctp4_rcv(struct sk_buff *skb) +{ + memset(skb->cb, 0, sizeof(skb->cb)); + return sctp_rcv(skb); +} + /* Register with IP layer. */ static const struct net_protocol sctp_protocol = { - .handler = sctp_rcv, + .handler = sctp4_rcv, .err_handler = sctp_v4_err, .no_policy = 1, .netns_ok= 1, diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 9a56ae2..21d0ff1 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -2321,6 +2321,7 @@ int sctp_process_init(struct sctp_association *asoc, struct sctp_chunk *chunk, * added as the primary transport. The source address seems to * be a better choice than any of the embedded addresses. */ + asoc->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port; if (!sctp_assoc_add_peer(asoc, peer_addr, gfp, SCTP_ACTIVE)) goto nomem; diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index c669f8b..8edab15 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -6268,6 +6268,8 @@ static struct sctp_packet *sctp_ootb_pkt_new( if (!transport) goto nomem; + transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port; + /* Cache a route for the transport with the chunk's destination as * the source address. */ -- 2
[PATCHv2 net-next 07/17] sctp: add encap_err_lookup for udp encap socks
As it says in rfc6951#section-5.5: "When receiving ICMP or ICMPv6 response packets, there might not be enough bytes in the payload to identify the SCTP association that the SCTP packet triggering the ICMP or ICMPv6 packet belongs to. If a received ICMP or ICMPv6 packet cannot be related to a specific SCTP association or the verification tag cannot be verified, it MUST be discarded silently. In particular, this means that the SCTP stack MUST NOT rely on receiving ICMP or ICMPv6 messages. Implementation constraints could prevent processing received ICMP or ICMPv6 messages." ICMP or ICMPv6 packets need to be handled, and this is implemented by udp encap sock .encap_err_lookup function. The .encap_err_lookup function is called in __udp(6)_lib_err_encap() to confirm this path does need to be updated. For sctp, what we can do here is check if the corresponding asoc and transport exist. Note that icmp packet process for sctp over udp is done by udp sock .encap_err_lookup(), and it means for now we can't do as much as sctp_v4/6_err() does. Also we can't do the two mappings mentioned in rfc6951#section-5.5. Signed-off-by: Xin Long --- net/sctp/protocol.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 49b5d75..dd2d9c4 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -848,6 +848,23 @@ static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb) return 0; } +static int sctp_udp_err_lookup(struct sock *sk, struct sk_buff *skb) +{ + struct sctp_association *asoc; + struct sctp_transport *t; + int family; + + skb->transport_header += sizeof(struct udphdr); + family = (ip_hdr(skb)->version == 4) ? AF_INET : AF_INET6; + sk = sctp_err_lookup(dev_net(skb->dev), family, skb, sctp_hdr(skb), +&asoc, &t); + if (!sk) + return -ENOENT; + + sctp_err_finish(sk, t); + return 0; +} + int sctp_udp_sock_start(struct net *net) { struct udp_tunnel_sock_cfg tuncfg = {NULL}; @@ -866,6 +883,7 @@ int sctp_udp_sock_start(struct net *net) tuncfg.encap_type = 1; tuncfg.encap_rcv = sctp_udp_rcv; + tuncfg.encap_err_lookup = sctp_udp_err_lookup; setup_udp_tunnel_sock(net, sock, &tuncfg); net->sctp.udp4_sock = sock->sk; @@ -887,6 +905,7 @@ int sctp_udp_sock_start(struct net *net) tuncfg.encap_type = 1; tuncfg.encap_rcv = sctp_udp_rcv; + tuncfg.encap_err_lookup = sctp_udp_err_lookup; setup_udp_tunnel_sock(net, sock, &tuncfg); net->sctp.udp6_sock = sock->sk; #endif -- 2.1.0
[PATCHv2 net-next 09/17] sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt
This patch is to implement: rfc6951#section-6.1: Get or Set the Remote UDP Encapsulation Port Number with the param of the struct: struct sctp_udpencaps { sctp_assoc_t sue_assoc_id; struct sockaddr_storage sue_address; uint16_t sue_port; }; the encap_port of sock, assoc or transport can be changed by users, which also means it allows the different transports of the same asoc to have different encap_port value. Signed-off-by: Xin Long --- include/uapi/linux/sctp.h | 7 +++ net/sctp/socket.c | 110 ++ 2 files changed, 117 insertions(+) diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h index 28ad40d..cb78e7a 100644 --- a/include/uapi/linux/sctp.h +++ b/include/uapi/linux/sctp.h @@ -140,6 +140,7 @@ typedef __s32 sctp_assoc_t; #define SCTP_ECN_SUPPORTED 130 #define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE 131 #define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE +#define SCTP_REMOTE_UDP_ENCAPS_PORT132 /* PR-SCTP policies */ #define SCTP_PR_SCTP_NONE 0x @@ -1197,6 +1198,12 @@ struct sctp_event { uint8_t se_on; }; +struct sctp_udpencaps { + sctp_assoc_t sue_assoc_id; + struct sockaddr_storage sue_address; + uint16_t sue_port; +}; + /* SCTP Stream schedulers */ enum sctp_sched_type { SCTP_SS_FCFS, diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 09b94cd..c9e86f5 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4417,6 +4417,53 @@ static int sctp_setsockopt_pf_expose(struct sock *sk, return retval; } +static int sctp_setsockopt_encap_port(struct sock *sk, + struct sctp_udpencaps *encap, + unsigned int optlen) +{ + struct sctp_association *asoc; + struct sctp_transport *t; + + if (optlen != sizeof(*encap)) + return -EINVAL; + + /* If an address other than INADDR_ANY is specified, and +* no transport is found, then the request is invalid. +*/ + if (!sctp_is_any(sk, (union sctp_addr *)&encap->sue_address)) { + t = sctp_addr_id2transport(sk, &encap->sue_address, + encap->sue_assoc_id); + if (!t) + return -EINVAL; + + t->encap_port = encap->sue_port; + return 0; + } + + /* Get association, if assoc_id != SCTP_FUTURE_ASSOC and the +* socket is a one to many style socket, and an association +* was not found, then the id was invalid. +*/ + asoc = sctp_id2assoc(sk, encap->sue_assoc_id); + if (!asoc && encap->sue_assoc_id != SCTP_FUTURE_ASSOC && + sctp_style(sk, UDP)) + return -EINVAL; + + /* If changes are for association, also apply encap to each +* transport. +*/ + if (asoc) { + list_for_each_entry(t, &asoc->peer.transport_addr_list, + transports) + t->encap_port = encap->sue_port; + + return 0; + } + + sctp_sk(sk)->encap_port = encap->sue_port; + return 0; +} + /* API 6.2 setsockopt(), getsockopt() * * Applications use setsockopt() and getsockopt() to set or retrieve @@ -4636,6 +4683,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname, case SCTP_EXPOSE_POTENTIALLY_FAILED_STATE: retval = sctp_setsockopt_pf_expose(sk, kopt, optlen); break; + case SCTP_REMOTE_UDP_ENCAPS_PORT: + retval = sctp_setsockopt_encap_port(sk, kopt, optlen); + break; default: retval = -ENOPROTOOPT; break; @@ -7791,6 +7841,63 @@ static int sctp_getsockopt_pf_expose(struct sock *sk, int len, return retval; } +static int sctp_getsockopt_encap_port(struct sock *sk, int len, + char __user *optval, int __user *optlen) +{ + struct sctp_association *asoc; + struct sctp_udpencaps encap; + struct sctp_transport *t; + + if (len < sizeof(encap)) + return -EINVAL; + + len = sizeof(encap); + if (copy_from_user(&encap, optval, len)) + return -EFAULT; + + /* If an address other than INADDR_ANY is specified, and +* no transport is found, then the request is invalid. +*/ + if (!sctp_is_any(sk, (union sctp_addr *)&encap.sue_address)) { + t = sctp_addr_id2transport(sk, &encap.sue_address, + encap.sue_assoc_id); + if (!t) { + pr_debug("%s: failed no transport\n", __func__); + return -EINVAL; + } + + encap.sue_port = t->encap_port; + goto out; + } + + /* Get association, if assoc_id != SCTP_FU
[PATCHv2 net-next 11/17] sctp: add udphdr to overhead when udp_port is set
sctp_mtu_payload() is for calculating the frag size before making chunks from a msg. So we should only add udphdr size to overhead when udp socks are listening, as only then sctp can handle the incoming sctp over udp packets and outgoing sctp over udp packets will be possible. Note that we can't do this according to transport->encap_port, as different transports may be set to different values, while the chunks were made before choosing the transport, we could not be able to meet all rfc6951#section-5.6 recommends. v1->v2: - Add udp_port for sctp_sock to avoid a potential race issue, it will be used in xmit path in the next patch. Signed-off-by: Xin Long --- include/net/sctp/sctp.h| 7 +-- include/net/sctp/structs.h | 1 + net/sctp/socket.c | 1 + 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index bfd87a0..86f74f2 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -578,10 +578,13 @@ static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp, { __u32 overhead = sizeof(struct sctphdr) + extra; - if (sp) + if (sp) { overhead += sp->pf->af->net_header_len; - else + if (sp->udp_port) + overhead += sizeof(struct udphdr); + } else { overhead += sizeof(struct ipv6hdr); + } if (WARN_ON_ONCE(mtu && mtu <= overhead)) mtu = overhead; diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 81464ae..80f7149 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -178,6 +178,7 @@ struct sctp_sock { */ __u32 hbinterval; + __be16 udp_port; __be16 encap_port; /* This is the max_retrans value for new associations. */ diff --git a/net/sctp/socket.c b/net/sctp/socket.c index c9e86f5..192ab9a 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4926,6 +4926,7 @@ static int sctp_init_sock(struct sock *sk) * be modified via SCTP_PEER_ADDR_PARAMS */ sp->hbinterval = net->sctp.hb_interval; + sp->udp_port= htons(net->sctp.udp_port); sp->encap_port = htons(net->sctp.encap_port); sp->pathmaxrxt = net->sctp.max_retrans_path; sp->pf_retrans = net->sctp.pf_retrans; -- 2.1.0
[PATCHv2 net-next 12/17] sctp: call sk_setup_caps in sctp_packet_transmit instead
sk_setup_caps() was originally called in Commit 90017accff61 ("sctp: Add GSO support"), as: "We have to refresh this in case we are xmiting to more than one transport at a time" This actually happens in the loop of sctp_outq_flush_transports(), and it shouldn't be tied to gso, so move it out of gso part and before sctp_packet_pack(). Signed-off-by: Xin Long --- net/sctp/output.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/sctp/output.c b/net/sctp/output.c index 1441eaf..fb16500 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -508,12 +508,6 @@ static int sctp_packet_pack(struct sctp_packet *packet, sizeof(struct inet6_skb_parm))); skb_shinfo(head)->gso_segs = pkt_count; skb_shinfo(head)->gso_size = GSO_BY_FRAGS; - rcu_read_lock(); - if (skb_dst(head) != tp->dst) { - dst_hold(tp->dst); - sk_setup_caps(sk, tp->dst); - } - rcu_read_unlock(); goto chksum; } @@ -593,6 +587,13 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp) } skb_dst_set(head, dst); + rcu_read_lock(); + if (__sk_dst_get(sk) != tp->dst) { + dst_hold(tp->dst); + sk_setup_caps(sk, tp->dst); + } + rcu_read_unlock(); + /* pack up chunks */ pkt_count = sctp_packet_pack(packet, head, gso, gfp); if (!pkt_count) { -- 2.1.0
[PATCHv2 net-next 13/17] sctp: support for sending packet over udp4 sock
This patch does what the rfc6951#section-5.3 says for ipv4: "Within the UDP header, the source port MUST be the local UDP encapsulation port number of the SCTP stack, and the destination port MUST be the remote UDP encapsulation port number maintained for the association and the destination address to which the packet is sent (see Section 5.1). Because the SCTP packet is the UDP payload, the length of the UDP packet MUST be the length of the SCTP packet plus the size of the UDP header. The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP checksum SHOULD be computed for IPv4 and IPv6." Some places need to be adjusted in sctp_packet_transmit(): 1. For non-gso packets, when transport's encap_port is set, sctp checksum has to be done in sctp_packet_pack(), as the outer udp will use ip_summed = CHECKSUM_PARTIAL to do the offload setting for checksum. 2. Delay calling dst_clone() and skb_dst_set() for non-udp packets until sctp_v4_xmit(), as for udp packets, skb_dst_set() is not needed before calling udp_tunnel_xmit_skb(). then in sctp_v4_xmit(): 1. Go to udp_tunnel_xmit_skb() only when transport->encap_port and net->sctp.udp_port both are set, as these are one for dst port and another for src port. 2. For gso packet, SKB_GSO_UDP_TUNNEL_CSUM is set for gso_type, and with this udp checksum can be done in __skb_udp_tunnel_segment() for each segments after the sctp gso. 3. inner_mac_header and inner_transport_header are set, as these will be needed in __skb_udp_tunnel_segment() to find the right headers. 4. df and ttl are calculated, as these are the required params by udp_tunnel_xmit_skb(). 5. nocheck param has to be false, as "the UDP checksum SHOULD be computed for IPv4 and IPv6", says in rfc6951#section-5.3. v1->v2: - Use sp->udp_port instead in sctp_v4_xmit(), which is more safe. Signed-off-by: Xin Long --- net/sctp/output.c | 9 +++-- net/sctp/protocol.c | 41 ++--- 2 files changed, 33 insertions(+), 17 deletions(-) diff --git a/net/sctp/output.c b/net/sctp/output.c index fb16500..6614c9f 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -514,8 +514,8 @@ static int sctp_packet_pack(struct sctp_packet *packet, if (sctp_checksum_disable) return 1; - if (!(skb_dst(head)->dev->features & NETIF_F_SCTP_CRC) || - dst_xfrm(skb_dst(head)) || packet->ipfragok) { + if (!(tp->dst->dev->features & NETIF_F_SCTP_CRC) || + dst_xfrm(tp->dst) || packet->ipfragok || tp->encap_port) { struct sctphdr *sh = (struct sctphdr *)skb_transport_header(head); @@ -542,7 +542,6 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp) struct sctp_association *asoc = tp->asoc; struct sctp_chunk *chunk, *tmp; int pkt_count, gso = 0; - struct dst_entry *dst; struct sk_buff *head; struct sctphdr *sh; struct sock *sk; @@ -579,13 +578,11 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp) sh->checksum = 0; /* drop packet if no dst */ - dst = dst_clone(tp->dst); - if (!dst) { + if (!tp->dst) { IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES); kfree_skb(head); goto out; } - skb_dst_set(head, dst); rcu_read_lock(); if (__sk_dst_get(sk) != tp->dst) { diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 0d16e5e..be002b7 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1059,25 +1059,44 @@ static int sctp_inet_supported_addrs(const struct sctp_sock *opt, } /* Wrapper routine that calls the ip transmit routine. */ -static inline int sctp_v4_xmit(struct sk_buff *skb, - struct sctp_transport *transport) +static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t) { - struct inet_sock *inet = inet_sk(skb->sk); + struct dst_entry *dst = dst_clone(t->dst); + struct flowi4 *fl4 = &t->fl.u.ip4; + struct sock *sk = skb->sk; + struct inet_sock *inet = inet_sk(sk); __u8 dscp = inet->tos; + __be16 df = 0; pr_debug("%s: skb:%p, len:%d, src:%pI4, dst:%pI4\n", __func__, skb, -skb->len, &transport->fl.u.ip4.saddr, -&transport->fl.u.ip4.daddr); +skb->len, &fl4->saddr, &fl4->daddr); + + if (t->dscp & SCTP_DSCP_SET_MASK) + dscp = t->dscp & SCTP_DSCP_VAL_MASK; - if (transport->dscp & SCTP_DSCP_SET_MASK) - dscp = transport->dscp & SCTP_DSCP_VAL_MASK; + inet->pmtudisc = t->param_flags & SPP_PMTUD_ENABLE ? IP_PMTUDISC_DO + : IP_PMTUDISC_DONT; + SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS); - inet-
[PATCHv2 net-next 08/17] sctp: add encap_port for netns sock asoc and transport
encap_port is added as per netns/sock/assoc/transport, and the latter one's encap_port inherits the former one's by default. The transport's encap_port value would mostly decide if one packet should go out with udp encapsulated or not. This patch also allows users to set netns' encap_port by sysctl. v1->v2: - Change to define encap_port as __be16 for sctp_sock, asoc and transport. Signed-off-by: Xin Long --- include/net/netns/sctp.h | 2 ++ include/net/sctp/structs.h | 6 ++ net/sctp/associola.c | 4 net/sctp/protocol.c| 3 +++ net/sctp/socket.c | 1 + net/sctp/sysctl.c | 10 ++ 6 files changed, 26 insertions(+) diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h index f622945..6af7a39 100644 --- a/include/net/netns/sctp.h +++ b/include/net/netns/sctp.h @@ -27,6 +27,8 @@ struct netns_sctp { struct sock *udp6_sock; /* udp tunneling listening port. */ int udp_port; + /* udp tunneling remote encap port. */ + int encap_port; /* This is the global local address list. * We actively maintain this complete list of addresses on diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 0bdff38..aa98e7e 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -178,6 +178,8 @@ struct sctp_sock { */ __u32 hbinterval; + __be16 encap_port; + /* This is the max_retrans value for new associations. */ __u16 pathmaxrxt; @@ -877,6 +879,8 @@ struct sctp_transport { */ unsigned long last_time_ecne_reduced; + __be16 encap_port; + /* This is the max_retrans value for the transport and will * be initialized from the assocs value. This can be changed * using the SCTP_SET_PEER_ADDR_PARAMS socket option. @@ -1790,6 +1794,8 @@ struct sctp_association { */ unsigned long hbinterval; + __be16 encap_port; + /* This is the max_retrans value for new transports in the * association. */ diff --git a/net/sctp/associola.c b/net/sctp/associola.c index fdb69d4..336df4b 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -99,6 +99,8 @@ static struct sctp_association *sctp_association_init( */ asoc->hbinterval = msecs_to_jiffies(sp->hbinterval); + asoc->encap_port = sp->encap_port; + /* Initialize path max retrans value. */ asoc->pathmaxrxt = sp->pathmaxrxt; @@ -624,6 +626,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, */ peer->hbinterval = asoc->hbinterval; + peer->encap_port = asoc->encap_port; + /* Set the path max_retrans. */ peer->pathmaxrxt = asoc->pathmaxrxt; diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index dd2d9c4..5b74187 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1359,6 +1359,9 @@ static int __net_init sctp_defaults_init(struct net *net) /* Set udp tunneling listening port to default value */ net->sctp.udp_port = SCTP_DEFAULT_UDP_PORT; + /* Set remote encap port to 0 by default */ + net->sctp.encap_port = 0; + /* Set SCOPE policy to enabled */ net->sctp.scope_policy = SCTP_SCOPE_POLICY_ENABLE; diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 53d0a41..09b94cd 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4876,6 +4876,7 @@ static int sctp_init_sock(struct sock *sk) * be modified via SCTP_PEER_ADDR_PARAMS */ sp->hbinterval = net->sctp.hb_interval; + sp->encap_port = htons(net->sctp.encap_port); sp->pathmaxrxt = net->sctp.max_retrans_path; sp->pf_retrans = net->sctp.pf_retrans; sp->ps_retrans = net->sctp.ps_retrans; diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index c16c809..ecc1b5e 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -36,6 +36,7 @@ static int rto_alpha_max = 1000; static int rto_beta_max = 1000; static int pf_expose_max = SCTP_PF_EXPOSE_MAX; static int ps_retrans_max = SCTP_PS_RETRANS_MAX; +static int udp_port_max = 65535; static unsigned long max_autoclose_min = 0; static unsigned long max_autoclose_max = @@ -291,6 +292,15 @@ static struct ctl_table sctp_net_table[] = { .proc_handler = proc_dointvec, }, { + .procname = "encap_port", + .data = &init_net.sctp.encap_port, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + .extra1 = SYSCTL_ZERO, + .extra2 = &udp_port_max, + }, + { .procname = "addr_scope_policy", .data = &init_net.sctp.scope_policy, .maxlen = sizeof(int), -- 2.1.0
[PATCHv2 net-next 14/17] sctp: support for sending packet over udp6 sock
This one basically does the similar things in sctp_v6_xmit as does for udp4 sock in the last patch, just note that: 1. label needs to be calculated, as it's the param of udp_tunnel6_xmit_skb(). 2. The 'nocheck' param of udp_tunnel6_xmit_skb() is false, as required by RFC. v1->v2: - Use sp->udp_port instead in sctp_v6_xmit(), which is more safe. Signed-off-by: Xin Long --- net/sctp/ipv6.c | 43 --- 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index a064bf2..814754d 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -55,6 +55,7 @@ #include #include #include +#include #include @@ -191,33 +192,53 @@ static int sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, return ret; } -static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport) +static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t) { + struct dst_entry *dst = dst_clone(t->dst); + struct flowi6 *fl6 = &t->fl.u.ip6; struct sock *sk = skb->sk; struct ipv6_pinfo *np = inet6_sk(sk); - struct flowi6 *fl6 = &transport->fl.u.ip6; __u8 tclass = np->tclass; - int res; + __be32 label; pr_debug("%s: skb:%p, len:%d, src:%pI6 dst:%pI6\n", __func__, skb, skb->len, &fl6->saddr, &fl6->daddr); - if (transport->dscp & SCTP_DSCP_SET_MASK) - tclass = transport->dscp & SCTP_DSCP_VAL_MASK; + if (t->dscp & SCTP_DSCP_SET_MASK) + tclass = t->dscp & SCTP_DSCP_VAL_MASK; if (INET_ECN_is_capable(tclass)) IP6_ECN_flow_xmit(sk, fl6->flowlabel); - if (!(transport->param_flags & SPP_PMTUD_ENABLE)) + if (!(t->param_flags & SPP_PMTUD_ENABLE)) skb->ignore_df = 1; SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS); - rcu_read_lock(); - res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt), - tclass, sk->sk_priority); - rcu_read_unlock(); - return res; + if (!t->encap_port || !sctp_sk(sk)->udp_port) { + int res; + + skb_dst_set(skb, dst); + rcu_read_lock(); + res = ip6_xmit(sk, skb, fl6, sk->sk_mark, + rcu_dereference(np->opt), + tclass, sk->sk_priority); + rcu_read_unlock(); + return res; + } + + if (skb_is_gso(skb)) + skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM; + + skb->encapsulation = 1; + skb_reset_inner_mac_header(skb); + skb_reset_inner_transport_header(skb); + skb_set_inner_ipproto(skb, IPPROTO_SCTP); + label = ip6_make_flowlabel(sock_net(sk), skb, fl6->flowlabel, true, fl6); + + return udp_tunnel6_xmit_skb(dst, sk, skb, NULL, &fl6->saddr, + &fl6->daddr, tclass, ip6_dst_hoplimit(dst), + label, sctp_sk(sk)->udp_port, t->encap_port, false); } /* Returns the dst cache entry for the given source and destination ip -- 2.1.0
[PATCHv2 net-next 16/17] sctp: handle the init chunk matching an existing asoc
This is from Section 4 of draft-tuexen-tsvwg-sctp-udp-encaps-cons-03, and it requires responding with an abort chunk with an error cause when the udp source port of the received init chunk doesn't match the encap port of the transport. Signed-off-by: Xin Long --- net/sctp/sm_statefuns.c | 50 + 1 file changed, 50 insertions(+) diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 8edab15..244a5d8 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -87,6 +87,13 @@ static enum sctp_disposition sctp_sf_tabort_8_4_8( const union sctp_subtype type, void *arg, struct sctp_cmd_seq *commands); +static enum sctp_disposition sctp_sf_new_encap_port( + struct net *net, + const struct sctp_endpoint *ep, + const struct sctp_association *asoc, + const union sctp_subtype type, + void *arg, + struct sctp_cmd_seq *commands); static struct sctp_sackhdr *sctp_sm_pull_sack(struct sctp_chunk *chunk); static enum sctp_disposition sctp_stop_t1_and_abort( @@ -1493,6 +1500,10 @@ static enum sctp_disposition sctp_sf_do_unexpected_init( if (!sctp_chunk_length_valid(chunk, sizeof(struct sctp_init_chunk))) return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, commands); + + if (SCTP_INPUT_CB(chunk->skb)->encap_port != chunk->transport->encap_port) + return sctp_sf_new_encap_port(net, ep, asoc, type, arg, commands); + /* Grab the INIT header. */ chunk->subh.init_hdr = (struct sctp_inithdr *)chunk->skb->data; @@ -3392,6 +3403,45 @@ static enum sctp_disposition sctp_sf_tabort_8_4_8( sctp_packet_append_chunk(packet, abort); + sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT, SCTP_PACKET(packet)); + + SCTP_INC_STATS(net, SCTP_MIB_OUTCTRLCHUNKS); + + sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); + return SCTP_DISPOSITION_CONSUME; +} + +/* Handling of SCTP Packets Containing an INIT Chunk Matching an + * Existing Associations when the udp encap port is incorrect. + * + * From Section 4 at draft-tuexen-tsvwg-sctp-udp-encaps-cons-03. + */ +static enum sctp_disposition sctp_sf_new_encap_port( + struct net *net, + const struct sctp_endpoint *ep, + const struct sctp_association *asoc, + const union sctp_subtype type, + void *arg, + struct sctp_cmd_seq *commands) +{ + struct sctp_packet *packet = NULL; + struct sctp_chunk *chunk = arg; + struct sctp_chunk *abort; + + packet = sctp_ootb_pkt_new(net, asoc, chunk); + if (!packet) + return SCTP_DISPOSITION_NOMEM; + + abort = sctp_make_new_encap_port(asoc, chunk); + if (!abort) { + sctp_ootb_pkt_free(packet); + return SCTP_DISPOSITION_NOMEM; + } + + abort->skb->sk = ep->base.sk; + + sctp_packet_append_chunk(packet, abort); + sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT, SCTP_PACKET(packet)); -- 2.1.0
Re: [PATCH 0/7] wfx: move out from the staging area
Kalle Valo writes: > Greg Kroah-Hartman writes: > >> On Wed, Oct 07, 2020 at 12:19:36PM +0200, Jerome Pouiller wrote: >>> From: Jérôme Pouiller >>> >>> I think the wfx driver is now mature enough to be accepted in the >>> drivers/net/wireless directory. >>> >>> There is still one item on the TODO list. It is an idea to improve the rate >>> control in some particular cases[1]. However, the current performances of >>> the >>> driver seem to satisfy everyone. In add, the suggested change is large >>> enough. >>> So, I would prefer to implement it only if it really solves an issue. I >>> think it >>> is not an obstacle to move the driver out of the staging area. >>> >>> In order to comply with the last rules for the DT bindings, I have >>> converted the >>> documentation to yaml. I am moderately happy with the result. Especially, >>> for >>> the description of the binding. Any comments are welcome. >>> >>> The series also update the copyrights dates of the files. I don't know >>> exactly >>> how this kind of changes should be sent. It's a bit weird to change all the >>> copyrights in one commit, but I do not see any better way. >>> >>> I also include a few fixes I have found these last weeks. >>> >>> [1] https://lore.kernel.org/lkml/3099559.gv3Q75KnN1@pc-42 >> >> I'll take the first 6 patches here, the last one you should work with >> the wireless maintainers to get reviewed. >> >> Maybe that might want to wait until after 5.10-rc1 is out, with all of >> these changes in it, making it an easier move. > > Yes, the driver needs to be reviewed in linux-wireless list. I recommend > submitting the whole driver in a patchset with one file per patch, which > seems to be the easiest way to review a full driver. The final move will > be in just one commit moving the driver, just like patch 7 does here. As > an example see how wilc1000 review was done. > > Device tree bindings needs to be reviewed by the DT maintainer so CC > devicetree on that patch. BTW, I wrote some instructions for new wireless drivers: https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#new_driver -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCHv2 net-next 15/17] sctp: add the error cause for new encapsulation port restart
This patch is to add the function to make the abort chunk with the error cause for new encapsulation port restart, defined on Section 4.4 in draft-tuexen-tsvwg-sctp-udp-encaps-cons-03. Signed-off-by: Xin Long --- include/linux/sctp.h | 20 include/net/sctp/sm.h| 3 +++ net/sctp/sm_make_chunk.c | 20 3 files changed, 43 insertions(+) diff --git a/include/linux/sctp.h b/include/linux/sctp.h index 7673123..bb19265 100644 --- a/include/linux/sctp.h +++ b/include/linux/sctp.h @@ -482,11 +482,13 @@ enum sctp_error { * 11 Restart of an association with new addresses * 12 User Initiated Abort * 13 Protocol Violation +* 14 Restart of an Association with New Encapsulation Port */ SCTP_ERROR_RESTART = cpu_to_be16(0x0b), SCTP_ERROR_USER_ABORT = cpu_to_be16(0x0c), SCTP_ERROR_PROTO_VIOLATION = cpu_to_be16(0x0d), + SCTP_ERROR_NEW_ENCAP_PORT = cpu_to_be16(0x0e), /* ADDIP Section 3.3 New Error Causes * @@ -793,4 +795,22 @@ enum { SCTP_FLOWLABEL_VAL_MASK = 0xf }; +/* UDP Encapsulation + * draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.html#section-4-4 + * + * The error cause indicating an "Restart of an Association with + * New Encapsulation Port" + * + * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * |Cause Code = 14| Cause Length = 8| + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | Current Encapsulation Port | New Encapsulation Port| + * +---+---+ + */ +struct sctp_new_encap_port_hdr { + __be16 cur_port; + __be16 new_port; +}; + #endif /* __LINUX_SCTP_H__ */ diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h index a499341..fd223c9 100644 --- a/include/net/sctp/sm.h +++ b/include/net/sctp/sm.h @@ -221,6 +221,9 @@ struct sctp_chunk *sctp_make_violation_paramlen( struct sctp_chunk *sctp_make_violation_max_retrans( const struct sctp_association *asoc, const struct sctp_chunk *chunk); +struct sctp_chunk *sctp_make_new_encap_port( + const struct sctp_association *asoc, + const struct sctp_chunk *chunk); struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc, const struct sctp_transport *transport); struct sctp_chunk *sctp_make_heartbeat_ack(const struct sctp_association *asoc, diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 21d0ff1..3bf1399 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -1142,6 +1142,26 @@ struct sctp_chunk *sctp_make_violation_max_retrans( return retval; } +struct sctp_chunk *sctp_make_new_encap_port(const struct sctp_association *asoc, + const struct sctp_chunk *chunk) +{ + struct sctp_new_encap_port_hdr nep; + struct sctp_chunk *retval; + + retval = sctp_make_abort(asoc, chunk, +sizeof(struct sctp_errhdr) + sizeof(nep)); + if (!retval) + goto nodata; + + sctp_init_cause(retval, SCTP_ERROR_NEW_ENCAP_PORT, sizeof(nep)); + nep.cur_port = htons(SCTP_INPUT_CB(chunk->skb)->encap_port); + nep.new_port = htons(chunk->transport->encap_port); + sctp_addto_chunk(retval, sizeof(nep), &nep); + +nodata: + return retval; +} + /* Make a HEARTBEAT chunk. */ struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc, const struct sctp_transport *transport) -- 2.1.0
[PATCHv2 net-next 17/17] sctp: enable udp tunneling socks
This patch is to enable udp tunneling socks by calling sctp_udp_sock_start() in sctp_ctrlsock_init(), and sctp_udp_sock_stop() in sctp_ctrlsock_exit(). Also add sysctl udp_port to allow changing the listening sock's port by users. Wit this patch, the whole sctp over udp feature can be enabled and used. v1->v2: - Also update ctl_sock udp_port in proc_sctp_do_udp_port() where netns udp_port gets changed. Signed-off-by: Xin Long --- net/sctp/protocol.c | 5 + net/sctp/sysctl.c | 49 + 2 files changed, 54 insertions(+) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index be002b7..79fb4b5 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1469,6 +1469,10 @@ static int __net_init sctp_ctrlsock_init(struct net *net) if (status) pr_err("Failed to initialize the SCTP control sock\n"); + status = sctp_udp_sock_start(net); + if (status) + pr_err("Failed to initialize the SCTP udp tunneling sock\n"); + return status; } @@ -1476,6 +1480,7 @@ static void __net_exit sctp_ctrlsock_exit(struct net *net) { /* Free the control endpoint. */ inet_ctl_sock_destroy(net->sctp.ctl_sock); + sctp_udp_sock_stop(net); } static struct pernet_operations sctp_ctrlsock_ops = { diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index ecc1b5e..a723613 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -49,6 +49,8 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); +static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write, void *buffer, +size_t *lenp, loff_t *ppos); static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); static int proc_sctp_do_auth(struct ctl_table *ctl, int write, @@ -292,6 +294,15 @@ static struct ctl_table sctp_net_table[] = { .proc_handler = proc_dointvec, }, { + .procname = "udp_port", + .data = &init_net.sctp.udp_port, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_sctp_do_udp_port, + .extra1 = SYSCTL_ZERO, + .extra2 = &udp_port_max, + }, + { .procname = "encap_port", .data = &init_net.sctp.encap_port, .maxlen = sizeof(int), @@ -487,6 +498,44 @@ static int proc_sctp_do_auth(struct ctl_table *ctl, int write, return ret; } +static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write, +void *buffer, size_t *lenp, loff_t *ppos) +{ + struct net *net = current->nsproxy->net_ns; + unsigned int min = *(unsigned int *)ctl->extra1; + unsigned int max = *(unsigned int *)ctl->extra2; + struct ctl_table tbl; + int ret, new_value; + + memset(&tbl, 0, sizeof(struct ctl_table)); + tbl.maxlen = sizeof(unsigned int); + + if (write) + tbl.data = &new_value; + else + tbl.data = &net->sctp.udp_port; + + ret = proc_dointvec(&tbl, write, buffer, lenp, ppos); + if (write && ret == 0) { + struct sock *sk = net->sctp.ctl_sock; + + if (new_value > max || new_value < min) + return -EINVAL; + + net->sctp.udp_port = new_value; + sctp_udp_sock_stop(net); + ret = sctp_udp_sock_start(net); + if (ret) + net->sctp.udp_port = 0; + + lock_sock(sk); + sctp_sk(sk)->udp_port = net->sctp.udp_port; + release_sock(sk); + } + + return ret; +} + int sctp_sysctl_net_register(struct net *net) { struct ctl_table *table; -- 2.1.0
Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping
On Thu Oct 08 2020, Vladimir Oltean wrote: > On Thu, Oct 08, 2020 at 10:34:11AM +0200, Kurt Kanzenbach wrote: >> On Wed Oct 07 2020, Vladimir Oltean wrote: >> > On Wed, Oct 07, 2020 at 12:39:49PM +0200, Kurt Kanzenbach wrote: >> >> For instance the hellcreek switch has actually three ptp hardware >> >> clocks and the time stamping can be configured to use either one of >> >> them. >> > >> > The sja1105 also has a corrected and an uncorrected PTP clock that can >> > take timestamps. Initially I had thought I'd be going to spend some time >> > figuring out multi-PHC support, but now I don't see any practical reason >> > to use the uncorrected PHC for anything. >> >> Just out of curiosity: How do you implement 802.1AS then? My >> understanding is that the free-running clock has to be used for the > > Has to be? I couldn't find that wording in IEEE 802.1AS-2011. It doesn't has to be, it *should* be. That's at least the outcome we had after lots of discussions. Actually Kamil (on Cc) is the expert on this topic. > >> calculation of the peer delays and such meaning there should be a way to >> get access to both PHCs or having some form of cross timestamping >> available. >> >> The hellcreek switch can take cross snapshots of all three ptp clocks in >> hardware for that purpose. > > Well, at the end of the day, all the other TSN offloads (tc-taprio, > tc-gate) will still have to use the synchronized PTP clock, so what > we're doing is we're simply letting that clock be synchronized by > ptp4l. Yes, the synchronized clock is of course needed for the traffic scheduling and so on. This is what we do here in this code as well. Only the synchronized one is exported to user space and used. However, the multi PHCs issue should be addressed as well at some point. > >> >> > So when you'll poll for TX timestamps, you'll receive a TX >> >> > timestamp from the PHY and another one from the switch, and those will >> >> > be in a race with one another, so you won't know which one is which. >> >> >> >> OK. So what happens if the driver will accept to disable hardware >> >> timestamping? Is there anything else that needs to be implemented? Are >> >> there (good) examples? >> > >> > It needs to not call skb_complete_tx_timestamp() and friends. >> > >> > For PHY timestamping, it also needs to invoke the correct methods for RX >> > and for TX, where the PHY timestamping hooks will get called. I don't >> > think that DSA is compatible yet with PHY timestamping, but it is >> > probably a trivial modification. >> >> Hmm? If DSA doesn't support PHY timestamping how are other DSA drivers >> dealing with it then? I'm getting really confused. > > They aren't dealing with it, of course. > >> Furthermore, there is no hellcreek hardware available with timestamping >> capable PHYs. How am I supposed to even test this? >> >> For now, until there is hardware available, PHY timestamping is not >> supported with the hellcreek switch. > > I was just pointing out that this is something you'll certainly have to > change if somebody will want PHY timestamping. Understood. > > Even without hardware, you _could_ probably test that DSA is doing the > right thing by simply adding the PTP timestamping ops to a PHY driver > that you own, and inject dummy timestamps. The expectation becomes that > user space gets those dummy timestamps, and not the ones emitted by your > switch. Of course it can be mocked. Whenever somebody wants to do PHY timestamping with a hellcreek switch this issue can be re-visited. Thanks, Kurt signature.asc Description: PGP signature
Re: [PATCHv2 net-next 04/17] udp: support sctp over udp in skb_udp_tunnel_segment
CCing Tom Herbert On Thu, Oct 8, 2020 at 5:48 PM Xin Long wrote: > > To call sctp_gso_segment() properly in skb_udp_tunnel_segment() for sctp > over udp packets, we need to set transport_header to sctp header. When > skb->ip_summed == CHECKSUM_PARTIAL, skb_crc32c_csum_help() should be > called for the inner sctp packet. > > Cc: Tom Herbert > Signed-off-by: Xin Long > --- > net/ipv4/udp_offload.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index c0b010b..a484405 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -49,6 +49,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct > sk_buff *skb, > __skb_pull(skb, tnl_hlen); > skb_reset_mac_header(skb); > skb_set_network_header(skb, skb_inner_network_offset(skb)); > + skb_set_transport_header(skb, skb_inner_transport_offset(skb)); > skb->mac_len = skb_inner_network_offset(skb); > skb->protocol = new_protocol; > > @@ -131,8 +132,12 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct > sk_buff *skb, > uh->check = ~csum_fold(csum_add(partial, >(__force __wsum)htonl(len))); > > - if (skb->encapsulation) > - skb_checksum_help(skb); > + if (skb->encapsulation) { > + if (skb->csum_not_inet) > + skb_crc32c_csum_help(skb); > + else > + skb_checksum_help(skb); > + } > > if (offload_csum) { > skb->ip_summed = CHECKSUM_PARTIAL; > -- > 2.1.0 >
Re: [PATCHv2 net-next 03/17] udp: do checksum properly in skb_udp_tunnel_segment
CCing Tom Herbert On Thu, Oct 8, 2020 at 5:48 PM Xin Long wrote: > > This patch fixes two things: > > When skb->ip_summed == CHECKSUM_PARTIAL, skb_checksum_help() should be > called do the checksum, instead of gso_make_checksum(), which is used > to do the checksum for current proto after calling skb_segment(), not > after the inner proto's gso_segment(). > > When offload_csum is disabled, the hardware will not do the checksum > for the current proto, udp. So instead of calling gso_make_checksum(), > it should calculate checksum for udp itself. > > Cc: Tom Herbert > Signed-off-by: Xin Long > --- > net/ipv4/udp_offload.c | 11 ++- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index e67a66f..c0b010b 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -131,14 +131,15 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct > sk_buff *skb, > uh->check = ~csum_fold(csum_add(partial, >(__force __wsum)htonl(len))); > > - if (skb->encapsulation || !offload_csum) { > - uh->check = gso_make_checksum(skb, ~uh->check); > - if (uh->check == 0) > - uh->check = CSUM_MANGLED_0; > - } else { > + if (skb->encapsulation) > + skb_checksum_help(skb); > + > + if (offload_csum) { > skb->ip_summed = CHECKSUM_PARTIAL; > skb->csum_start = skb_transport_header(skb) - > skb->head; > skb->csum_offset = offsetof(struct udphdr, check); > + } else { > + uh->check = csum_fold(skb_checksum(skb, udp_offset, > len, 0)); > } > } while ((skb = skb->next)); > out: > -- > 2.1.0 >
Re: [PATCH 0/7] wfx: move out from the staging area
On Thursday 8 October 2020 09:30:06 CEST Kalle Valo wrote: [...] > Yes, the driver needs to be reviewed in linux-wireless list. I recommend > submitting the whole driver in a patchset with one file per patch, which > seems to be the easiest way to review a full driver. The final move will > be in just one commit moving the driver, just like patch 7 does here. As > an example see how wilc1000 review was done. I see. I suppose it is still a bit complicated to review? Maybe I could try to make things easier. For my submission to staging/ I had taken time to split the driver in an understandable series of patches[1]. I think it was easier to review than just sending files one by one. I could do the same thing for the submission to linux-wireless. It would ask me a bit of work but, since I already have a template, it is conceivable. Do you think it is worth it, or it would be an unnecessary effort? [1] https://lore.kernel.org/driverdev-devel/20190919142527.31797-1-jerome.pouil...@silabs.com/ or commits a7a91ca5a23d^..40115bbc40e2 -- Jérôme Pouiller
Re: [PATCH v2 1/6] Add ancillary bus support
On Thu, Oct 08, 2020 at 09:45:29AM +, Parav Pandit wrote: > > > > From: Leon Romanovsky > > Sent: Thursday, October 8, 2020 1:15 PM > > > > On Thu, Oct 08, 2020 at 07:14:17AM +, Parav Pandit wrote: > > > > > > > > > > From: Leon Romanovsky > > > > Sent: Thursday, October 8, 2020 10:56 AM > > > > > > > > On Thu, Oct 08, 2020 at 04:56:01AM +, Parav Pandit wrote: > > > > > > > > > > > > > > > > From: Pierre-Louis Bossart > > > > > > > > > > > > Sent: Thursday, October 8, 2020 3:20 AM > > > > > > > > > > > > > > > > > > On 10/7/20 4:22 PM, Ertman, David M wrote: > > > > > > >> -Original Message- > > > > > > >> From: Pierre-Louis Bossart > > > > > > >> > > > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM > > > > > > >> To: Ertman, David M ; Parav Pandit > > > > > > >> ; Leon Romanovsky > > > > > > >> Cc: alsa-de...@alsa-project.org; pa...@mellanox.com; > > > > > > >> ti...@suse.de; netdev@vger.kernel.org; > > > > > > >> ranjani.sridha...@linux.intel.com; > > > > > > >> fred...@linux.intel.com; linux-r...@vger.kernel.org; > > > > > > >> dledf...@redhat.com; broo...@kernel.org; Jason Gunthorpe > > > > > > >> ; gre...@linuxfoundation.org; > > > > > > >> k...@kernel.org; Williams, Dan J ; > > > > > > >> Saleem, Shiraz ; > > > > > > >> da...@davemloft.net; Patil, Kiran > > > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > Below is most simple, intuitive and matching with core APIs > > > > > > for name and design pattern wise. > > > > > > init() > > > > > > { > > > > > > err = ancillary_device_initialize(); > > > > > > if (err) > > > > > > return ret; > > > > > > > > > > > > err = ancillary_device_add(); > > > > > > if (ret) > > > > > > goto err_unwind; > > > > > > > > > > > > err = some_foo(); > > > > > > if (err) > > > > > > goto err_foo; > > > > > > return 0; > > > > > > > > > > > > err_foo: > > > > > > ancillary_device_del(adev); > > > > > > err_unwind: > > > > > > ancillary_device_put(adev->dev); > > > > > > return err; > > > > > > } > > > > > > > > > > > > cleanup() > > > > > > { > > > > > > ancillary_device_de(adev); > > > > > > ancillary_device_put(adev); > > > > > > /* It is common to have a one wrapper for this as > > > > > > ancillary_device_unregister(). > > > > > > * This will match with core device_unregister() that > > > > > > has > > > > > > precise documentation. > > > > > > * but given fact that init() code need proper error > > > > > > unwinding, like above, > > > > > > * it make sense to have two APIs, and no need to export > > > > > > another symbol for unregister(). > > > > > > * This pattern is very easy to audit and code. > > > > > > */ > > > > > > } > > > > > > >>> > > > > > > >>> I like this flow +1 > > > > > > >>> > > > > > > >>> But ... since the init() function is performing both > > > > > > >>> device_init and device_add - it should probably be called > > > > > > >>> ancillary_device_register, and we are back to a single > > > > > > >>> exported API for both register and unregister. > > > > > > >> > > > > > > >> Kind reminder that we introduced the two functions to allow > > > > > > >> the caller to know if it needed to free memory when > > > > > > >> initialize() fails, and it didn't need to free memory when > > > > > > >> add() failed since > > > > > > >> put_device() takes care of it. If you have a single init() > > > > > > >> function it's impossible to know which behavior to select on > > > > > > >> error. > > > > > > >> > > > > > > >> I also have a case with SoundWire where it's nice to first > > > > > > >> initialize, then set some data and then add. > > > > > > >> > > > > > > > > > > > > > > The flow as outlined by Parav above does an initialize as the > > > > > > > first step, so every error path out of the function has to do > > > > > > > a put_device(), so you would never need to manually free the > > > > > > > memory in > > > > > > the setup function. > > > > > > > It would be freed in the release call. > > > > > > > > > > > > err = ancillary_device_initialize(); if (err) > > > > > > return ret; > > > > > > > > > > > > where is the put_device() here? if the release function does any > > > > > > sort of kfree, then you'd need to do it manually in this case. > > > > > Since device_initialize() failed, put_device() cannot be done here. > > > > > So yes, pseudo code should have shown, if (err) { > > > > > kfree(adev); > > > > > return err; > > > > > } > > > > > > > > > > If we just want to follow register(), unregister() pattern, > > > > > > > > > > Than, > > > > > > > > > > ancillar_device_reg
Re: [PATCH net] bridge: Netlink interface fix.
On Wed, 2020-10-07 at 14:49 +, Nikolay Aleksandrov wrote: > On Wed, 2020-10-07 at 12:07 +, Henrik Bjoernlund wrote: > > This commit is correcting NETLINK br_fill_ifinfo() to be able to > > handle 'filter_mask' with multiple flags asserted. > > > > Fixes: 36a8e8e265420 ("bridge: Extend br_fill_ifinfo to return MPR status") > > > > Signed-off-by: Henrik Bjoernlund > > Reviewed-by: Horatiu Vultur > > Suggested-by: Nikolay Aleksandrov > > Tested-by: Horatiu Vultur > > --- > > net/bridge/br_netlink.c | 26 +++--- > > 1 file changed, 11 insertions(+), 15 deletions(-) > > > > The patch looks good, please don't separate the Fixes tag from the others. > Acked-by: Nikolay Aleksandrov > TBH, this does change a user facing api (the attribute nesting), but I think in this case it's acceptable due to the format being wrong and MRP being new, so I doubt anyone is yet dumping it mixed with vlan filter_mask and checking for two identical attributes, i.e. in the old/broken case parsing the attributes into a table would hide one of them and you'd have to walk over all attributes to catch that.
Re: [PATCH bpf-next 0/3] libbpf: auto-resize relocatable LOAD/STORE instructions
Hello Andrii, On Wed, Oct 7, 2020 at 8:01 PM Andrii Nakryiko wrote: > > On Wed, Oct 7, 2020 at 10:56 AM Luka Perkov wrote: > > > > Hello Andrii, > > > > On Fri, Oct 2, 2020 at 3:09 AM Andrii Nakryiko wrote: > > > Patch set implements logic in libbpf to auto-adjust memory size (1-, 2-, > > > 4-, > > > 8-bytes) of load/store (LD/ST/STX) instructions which have BPF CO-RE field > > > offset relocation associated with it. In practice this means transparent > > > handling of 32-bit kernels, both pointer and unsigned integers. Signed > > > integers are not relocatable with zero-extending loads/stores, so libbpf > > > poisons them and generates a warning. If/when BPF gets support for > > > sign-extending > > > loads/stores, it would be possible to automatically relocate them as well. > > > > > > All the details are contained in patch #1 comments and commit message. > > > Patch #2 is a simple change in libbpf to make advanced testing with > > > custom BTF > > > easier. Patch #3 validates correct uses of auto-resizable loads, as well > > > as > > > check that libbpf fails invalid uses. > > > > > > I'd really appreciate folks that use BPF on 32-bit architectures to test > > > this > > > out with their BPF programs and report if there are any problems with the > > > approach. > > > > > > Cc: Luka Perkov > > > > First, thank you for the support and sending this series. It took us a > > bit longer to run the tests as our target hardware still did not fully > > get complete mainline support and we had to rebase our patches. These > > are not related to BPF. > > > > Related to this patch, we have tested various BPF programs with this > > patch, and can confirm that it fixed previous issues with pointer > > offsets that we had and reported at: > > > > https://lore.kernel.org/r/CA+XBgLU=8PFkP8S32e4gpst0=r4mfv8rza5kao+cepysntr...@mail.gmail.com/. > > > > Most of our programs now work and we are currently debugging other > > programs that still aren't working. We are still not sure if the > > remaining issues are related to this or not, but will let you know > > sometime this week after further and more detailed investigation. > > > > Ok, great, thanks for the update. Just to update you that we have identified that the problem was a known issue with JIT as we had enabled the BPF_JIT_ALWAYS_ON. That said, it would be great to see this series included in 5.10 :) Thanks, Luka
pull-request: mac80211 2020-10-08
Hi Dave, Not sure you were still planning to send anything to Linus, but even if not this single fix seemed appropriate for net. Seems to be a long-standing issue. Please pull and let me know if there's any problem. Thanks, johannes The following changes since commit d91dc434f2baa592e9793597421231174d57bbbf: Merge tag 'rxrpc-fixes-20201005' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (2020-10-06 06:18:20 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-net-2020-10-08 for you to fetch changes up to 3dc289f8f139997f4e9d3cfccf8738f20d23e47b: net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() (2020-10-08 12:37:25 +0200) A single fix for missing input validation in nl80211. Anant Thazhemadam (1): net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() net/wireless/nl80211.c | 3 +++ 1 file changed, 3 insertions(+)
pull-request: mac80211-next 2020-10-08
Hi Dave, And also a few more patches for net-next, mostly fixes for the work that recently landed there. Please pull and let me know if there's any problem. Thanks, johannes The following changes since commit 9faebeb2d80065926dfbc09cb73b1bb7779a89cd: Merge branch 'ethtool-allow-dumping-policies-to-user-space' (2020-10-06 06:25:56 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-net-next-2020-10-08 for you to fetch changes up to ba6ff70a3bb76c1ff440d3a0044b82e97abb648f: mac80211: copy configured beacon tx rate to driver (2020-10-08 12:26:35 +0200) A handful of changes: * fixes for the recent S1G work * a docbook build time improvement * API to pass beacon rate to lower-level driver Mauro Carvalho Chehab (1): docs: net: 80211: reduce docs build time Rajkumar Manoharan (1): mac80211: copy configured beacon tx rate to driver Thomas Pedersen (3): mac80211: handle lack of sband->bitrates in rates mac80211: initialize last_rate for S1G STAs cfg80211: only allow S1G channels on S1G band Documentation/driver-api/80211/cfg80211.rst| 392 ++--- .../driver-api/80211/mac80211-advanced.rst | 151 +++- Documentation/driver-api/80211/mac80211.rst| 148 +++- include/net/mac80211.h | 3 + net/mac80211/Makefile | 1 + net/mac80211/cfg.c | 6 +- net/mac80211/ieee80211_i.h | 3 + net/mac80211/mlme.c| 4 +- net/mac80211/rate.c| 1 + net/mac80211/s1g.c | 16 + net/mac80211/sta_info.c| 4 + net/mac80211/sta_info.h| 1 + net/wireless/chan.c| 5 +- 13 files changed, 240 insertions(+), 495 deletions(-) create mode 100644 net/mac80211/s1g.c
Re: [PATCH 01/29] iwlwifi: dvm: Demote non-compliant kernel-doc headers
Lee Jones wrote: > None of these headers attempt to document any function parameters. > > Fixes the following W=1 kernel build warning(s): > > drivers/net/wireless/intel/iwlwifi/dvm/main.c:388: warning: Function > parameter or member 't' not described in 'iwl_bg_statistics_periodic' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:545: warning: Function > parameter or member 't' not described in 'iwl_bg_ucode_trace' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:771: warning: Function > parameter or member 'priv' not described in 'iwl_alive_start' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'priv' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'start_idx' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'num_events' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'mode' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'pos' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'buf' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1692: warning: Function > parameter or member 'bufsz' not described in 'iwl_print_event_log' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'priv' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'capacity' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'num_wraps' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'next_entry' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'size' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'mode' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'pos' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'buf' not described in 'iwl_print_last_event_logs' > drivers/net/wireless/intel/iwlwifi/dvm/main.c:1772: warning: Function > parameter or member 'bufsz' not described in 'iwl_print_last_event_logs' > > Cc: Johannes Berg > Cc: Emmanuel Grumbach > Cc: Luca Coelho > Cc: Intel Linux Wireless > Cc: Kalle Valo > Cc: "David S. Miller" > Cc: Jakub Kicinski > Cc: linux-wirel...@vger.kernel.org > Cc: netdev@vger.kernel.org > Signed-off-by: Lee Jones 14 patches applied to wireless-drivers-next.git, thanks. 7cb391ffdf3c iwlwifi: dvm: Demote non-compliant kernel-doc headers b392eabc6abe iwlwifi: rs: Demote non-compliant kernel-doc headers 229b5582deb5 iwlwifi: dvm: tx: Demote non-compliant kernel-doc headers c8a11a84671e iwlwifi: dvm: lib: Demote non-compliant kernel-doc headers 7619ccceae49 iwlwifi: calib: Demote seemingly unintentional kerneldoc header 8f7ed7bf1384 iwlwifi: dvm: sta: Demote a bunch of nonconformant kernel-doc headers 707c528a8d51 iwlwifi: mvm: ops: Remove unused static struct 'iwl_mvm_debug_names' 108285ec6851 iwlwifi: dvm: Demote a couple of nonconformant kernel-doc headers 7b37b874fce3 iwlwifi: mvm: utils: Fix some doc-rot de00105cf0dc iwlwifi: dvm: scan: Demote a few nonconformant kernel-doc headers 3a7d806926bb iwlwifi: dvm: rxon: Demote non-conformant kernel-doc headers 91b4780fbae7 iwlwifi: mvm: tx: Demote misuse of kernel-doc headers 6806fc7fcfb2 iwlwifi: dvm: devices: Fix function documentation formatting issues 7d4ced86997f iwlwifi: iwl-drv: Provide descriptions debugfs dentries -- https://patchwork.kernel.org/patch/11766851/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCH v5 0/2] netlink: export policy on validation failures
Export the policy used for attribute validation when it fails, so e.g. for an out-of-range attribute userspace immediately gets the valid ranges back. v2 incorporates the suggestion from Jakub to have a function to estimate the size (netlink_policy_dump_attr_size_estimate()) and check that it does the right thing on the *normal* policy dumps, not (just) when calling it from the error scenario. v3 only addresses a few minor style issues. v4 fixes up a forgotten 'git add' ... sorry. v5 is a resend, I messed up v4's cover letter subject (saying v3) and apparently the second patch didn't go out at all. Tested using nl80211/iw in a few scenarios, seems to work fine and return the policy back, e.g. kernel reports: integer out of range policy: 04 00 0b 00 0c 00 04 00 01 00 00 00 00 00 00 00 ^ padding ^ minimum allowed value policy: 04 00 0b 00 0c 00 05 00 ff ff ff ff 00 00 00 00 ^ padding ^ maximum allowed value policy: 08 00 01 00 04 00 00 00 ^ type 4 == U32 for an out-of-range case. johannes
[PATCH v5 1/2] netlink: policy: refactor per-attr policy writing
From: Johannes Berg Refactor the per-attribute policy writing into a new helper function, to be used later for dumping out the policy of a rejected attribute. v2: - fix some indentation v3: - change variable order in netlink_policy_dump_write() Reviewed-by: Jakub Kicinski Signed-off-by: Johannes Berg --- net/netlink/policy.c | 79 1 file changed, 51 insertions(+), 28 deletions(-) diff --git a/net/netlink/policy.c b/net/netlink/policy.c index ee26d01328ee..4383436759e2 100644 --- a/net/netlink/policy.c +++ b/net/netlink/policy.c @@ -196,49 +196,33 @@ bool netlink_policy_dump_loop(struct netlink_policy_dump_state *state) return !netlink_policy_dump_finished(state); } -/** - * netlink_policy_dump_write - write current policy dump attributes - * @skb: the message skb to write to - * @state: the policy dump state - * - * Returns: 0 on success, an error code otherwise - */ -int netlink_policy_dump_write(struct sk_buff *skb, - struct netlink_policy_dump_state *state) +static int +__netlink_policy_dump_write_attr(struct netlink_policy_dump_state *state, +struct sk_buff *skb, +const struct nla_policy *pt, +int nestattr) { - const struct nla_policy *pt; - struct nlattr *policy, *attr; enum netlink_attribute_type type; - bool again; - -send_attribute: - again = false; - - pt = &state->policies[state->policy_idx].policy[state->attr_idx]; + struct nlattr *attr; - policy = nla_nest_start(skb, state->policy_idx); - if (!policy) - return -ENOBUFS; - - attr = nla_nest_start(skb, state->attr_idx); + attr = nla_nest_start(skb, nestattr); if (!attr) - goto nla_put_failure; + return -ENOBUFS; switch (pt->type) { default: case NLA_UNSPEC: case NLA_REJECT: /* skip - use NLA_MIN_LEN to advertise such */ - nla_nest_cancel(skb, policy); - again = true; - goto next; + nla_nest_cancel(skb, attr); + return -ENODATA; case NLA_NESTED: type = NL_ATTR_TYPE_NESTED; fallthrough; case NLA_NESTED_ARRAY: if (pt->type == NLA_NESTED_ARRAY) type = NL_ATTR_TYPE_NESTED_ARRAY; - if (pt->nested_policy && pt->len && + if (state && pt->nested_policy && pt->len && (nla_put_u32(skb, NL_POLICY_TYPE_ATTR_POLICY_IDX, netlink_policy_dump_get_policy_idx(state, pt->nested_policy, @@ -349,8 +333,47 @@ int netlink_policy_dump_write(struct sk_buff *skb, if (nla_put_u32(skb, NL_POLICY_TYPE_ATTR_TYPE, type)) goto nla_put_failure; - /* finish and move state to next attribute */ nla_nest_end(skb, attr); + return 0; +nla_put_failure: + nla_nest_cancel(skb, attr); + return -ENOBUFS; +} + +/** + * netlink_policy_dump_write - write current policy dump attributes + * @skb: the message skb to write to + * @state: the policy dump state + * + * Returns: 0 on success, an error code otherwise + */ +int netlink_policy_dump_write(struct sk_buff *skb, + struct netlink_policy_dump_state *state) +{ + const struct nla_policy *pt; + struct nlattr *policy; + bool again; + int err; + +send_attribute: + again = false; + + pt = &state->policies[state->policy_idx].policy[state->attr_idx]; + + policy = nla_nest_start(skb, state->policy_idx); + if (!policy) + return -ENOBUFS; + + err = __netlink_policy_dump_write_attr(state, skb, pt, state->attr_idx); + if (err == -ENODATA) { + nla_nest_cancel(skb, policy); + again = true; + goto next; + } else if (err) { + goto nla_put_failure; + } + + /* finish and move state to next attribute */ nla_nest_end(skb, policy); next: -- 2.26.2
[PATCH v5 2/2] netlink: export policy in extended ACK
From: Johannes Berg Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK to advertise the policy, e.g. if an attribute was out of range, you'll know the range that's permissible. Add new NL_SET_ERR_MSG_ATTR_POL() and NL_SET_ERR_MSG_ATTR_POL() macros to set this, since realistically it's only useful to do this when the bad attribute (offset) is also returned. Use it in lib/nlattr.c which practically does all the policy validation. v2: - add and use netlink_policy_dump_attr_size_estimate() v3: - remove redundant break v4: - really remove redundant break ... sorry Reviewed-by: Jakub Kicinski Signed-off-by: Johannes Berg --- include/linux/netlink.h | 30 -- include/net/netlink.h| 4 +++ include/uapi/linux/netlink.h | 2 ++ lib/nlattr.c | 35 +++-- net/netlink/af_netlink.c | 5 +++ net/netlink/policy.c | 61 6 files changed, 110 insertions(+), 27 deletions(-) diff --git a/include/linux/netlink.h b/include/linux/netlink.h index e3e49f0e5c13..666cd0390699 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -68,12 +68,14 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg) * @_msg: message string to report - don't access directly, use * %NL_SET_ERR_MSG * @bad_attr: attribute with error + * @policy: policy for a bad attribute * @cookie: cookie data to return to userspace (for success) * @cookie_len: actual cookie data length */ struct netlink_ext_ack { const char *_msg; const struct nlattr *bad_attr; + const struct nla_policy *policy; u8 cookie[NETLINK_MAX_COOKIE_LEN]; u8 cookie_len; }; @@ -95,21 +97,29 @@ struct netlink_ext_ack { #define NL_SET_ERR_MSG_MOD(extack, msg)\ NL_SET_ERR_MSG((extack), KBUILD_MODNAME ": " msg) -#define NL_SET_BAD_ATTR(extack, attr) do { \ - if ((extack)) \ +#define NL_SET_BAD_ATTR_POLICY(extack, attr, pol) do { \ + if ((extack)) { \ (extack)->bad_attr = (attr);\ + (extack)->policy = (pol); \ + } \ } while (0) -#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) do {\ - static const char __msg[] = msg;\ - struct netlink_ext_ack *__extack = (extack);\ - \ - if (__extack) { \ - __extack->_msg = __msg; \ - __extack->bad_attr = (attr);\ - } \ +#define NL_SET_BAD_ATTR(extack, attr) NL_SET_BAD_ATTR_POLICY(extack, attr, NULL) + +#define NL_SET_ERR_MSG_ATTR_POL(extack, attr, pol, msg) do { \ + static const char __msg[] = msg;\ + struct netlink_ext_ack *__extack = (extack);\ + \ + if (__extack) { \ + __extack->_msg = __msg; \ + __extack->bad_attr = (attr);\ + __extack->policy = (pol); \ + } \ } while (0) +#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) \ + NL_SET_ERR_MSG_ATTR_POL(extack, attr, NULL, msg) + static inline void nl_set_extack_cookie_u64(struct netlink_ext_ack *extack, u64 cookie) { diff --git a/include/net/netlink.h b/include/net/netlink.h index 2b9e41075f19..7356f41d23ba 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -1957,6 +1957,10 @@ int netlink_policy_dump_get_policy_idx(struct netlink_policy_dump_state *state, bool netlink_policy_dump_loop(struct netlink_policy_dump_state *state); int netlink_policy_dump_write(struct sk_buff *skb, struct netlink_policy_dump_state *state); +int netlink_policy_dump_attr_size_estimate(const struct nla_policy *pt); +int netlink_policy_dump_write_attr(struct sk_buff *skb, + const struct nla_policy *pt, + int nestattr); void netlink_policy_dump_free(struct netlink_policy_dump_state *state); #endif diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index d02e472ba54c..c3816ff7bfc3 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -129,6 +129,7 @@ struct nlmsgerr { * @NLMSGERR_ATTR_COOKIE: arbitrary subsystem specific cookie to * be used - in the success case - to identify a created * object or operation or similar (binary) + * @NLMSGERR_ATTR_POLICY: policy for a rejected attribute * @__NLMSGE
Re: [PATCH 1/2] ath11k: Fix memory leak on error path
Alex Dewar wrote: > In ath11k_mac_setup_iface_combinations(), if memory cannot be assigned > for the variable limits, then the memory assigned to combinations will > be leaked. Fix this. > > Addresses-Coverity-ID: 1497534 ("Resource leaks") > Fixes: 2626c269702e ("ath11k: add interface_modes to hw_params") > Signed-off-by: Alex Dewar Patch applied to wireless-drivers-next.git, thanks. 8431350eee2e ath11k: Fix memory leak on error path -- https://patchwork.kernel.org/patch/11815579/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH v4 bpf-next 09/13] bpf: introduce multibuff support to bpf_prog_test_run_xdp()
On Thu, 8 Oct 2020 11:06:14 +0300 Shay Agroskin wrote: > Lorenzo Bianconi writes: > > > Introduce the capability to allocate a xdp multi-buff in > > bpf_prog_test_run_xdp routine. This is a preliminary patch to > > introduce > > the selftests for new xdp multi-buff ebpf helpers > > > > Signed-off-by: Lorenzo Bianconi > > --- > > net/bpf/test_run.c | 51 ++ > > 1 file changed, 43 insertions(+), 8 deletions(-) > > > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c > > index bd291f5f539c..ec7286cd051b 100644 > > --- a/net/bpf/test_run.c > > +++ b/net/bpf/test_run.c > > @@ -617,44 +617,79 @@ int bpf_prog_test_run_xdp(struct bpf_prog > > *prog, const union bpf_attr *kattr, > > { > > u32 tailroom = SKB_DATA_ALIGN(sizeof(struct > > skb_shared_info)); > > u32 headroom = XDP_PACKET_HEADROOM; > > - u32 size = kattr->test.data_size_in; > > u32 repeat = kattr->test.repeat; > > struct netdev_rx_queue *rxqueue; > > + struct skb_shared_info *sinfo; > > struct xdp_buff xdp = {}; > > + u32 max_data_sz, size; > > u32 retval, duration; > > - u32 max_data_sz; > > + int i, ret, data_len; > > void *data; > > - int ret; > > > > if (kattr->test.ctx_in || kattr->test.ctx_out) > > return -EINVAL; > > > > - /* XDP have extra tailroom as (most) drivers use full page > > */ > > max_data_sz = 4096 - headroom - tailroom; > > For the sake of consistency, can this 4096 be changed to PAGE_SIZE > ? The size 4096 is explicitly use, because the selftest xdp_adjust_tail expect this, else it will fail on ARCHs with 64K PAGE_SIZE. It also seems excessive to create 64K packets for testing XDP. See: tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c > Same as in > data_len = min_t(int, kattr->test.data_size_in - size, > PAGE_SIZE); > > expression below -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: mwifiex: fix double free
t...@redhat.com wrote: > From: Tom Rix > > clang static analysis reports this problem: > > sdio.c:2403:3: warning: Attempt to free released memory > kfree(card->mpa_rx.buf); > ^~~ > > When mwifiex_init_sdio() fails in its first call to > mwifiex_alloc_sdio_mpa_buffer, it falls back to calling it > again. If the second alloc of mpa_tx.buf fails, the error > handler will try to free the old, previously freed mpa_rx.buf. > Reviewing the code, it looks like a second double free would > happen with mwifiex_cleanup_sdio(). > > So set both pointers to NULL when they are freed. > > Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex > driver") > Signed-off-by: Tom Rix > Reviewed-by: Brian Norris Patch applied to wireless-drivers-next.git, thanks. 53708f4fd9cf mwifiex: fix double free -- https://patchwork.kernel.org/patch/11815655/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH] rtlwifi: rtl8192se: remove duplicated legacy_httxpowerdiff
Chris Chiu wrote: > From: Chris Chiu > > The legacy_httxpowerdiff in rtl8192se is pretty much the same as > the legacy_ht_txpowerdiff for other chips. Use the same name to > keep the consistency. > > Signed-off-by: Chris Chiu Patch applied to wireless-drivers-next.git, thanks. 8b2426c50f20 rtlwifi: rtl8192se: remove duplicated legacy_httxpowerdiff -- https://patchwork.kernel.org/patch/11818043/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH][next] ath11k: fix memory leak of 'combinations'
Colin King wrote: > From: Colin Ian King > > Currently the error return path when 'limits' fails to allocate > does not free the memory allocated for 'combinations'. Fix this > by adding a kfree before returning. > > Addresses-Coverity: ("Resource leak") > Fixes: 2626c269702e ("ath11k: add interface_modes to hw_params") > Signed-off-by: Colin Ian King Alex already sent an identical patch: https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=8431350eee2e27ae60f5250e0437ab298329070e Patch set to Superseded. -- https://patchwork.kernel.org/patch/11819025/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH 0/7] wfx: move out from the staging area
Jérôme Pouiller writes: > On Thursday 8 October 2020 09:30:06 CEST Kalle Valo wrote: > [...] >> Yes, the driver needs to be reviewed in linux-wireless list. I recommend >> submitting the whole driver in a patchset with one file per patch, which >> seems to be the easiest way to review a full driver. The final move will >> be in just one commit moving the driver, just like patch 7 does here. As >> an example see how wilc1000 review was done. > > I see. I suppose it is still a bit complicated to review? Maybe I could > try to make things easier. > > For my submission to staging/ I had taken time to split the driver in an > understandable series of patches[1]. I think it was easier to review than > just sending files one by one. I could do the same thing for the > submission to linux-wireless. It would ask me a bit of work but, since I > already have a template, it is conceivable. > > Do you think it is worth it, or it would be an unnecessary effort? > > [1] > https://lore.kernel.org/driverdev-devel/20190919142527.31797-1-jerome.pouil...@silabs.com/ > or commits a7a91ca5a23d^..40115bbc40e2 I don't know how others think, but I prefer to review new drivers "one file per patch" style as I get to see the big picture easily. And besides, splitting the driver like that would be a huge job for you. I don't think it's worth your time in this case. And making changes in the driver during review process becomes even more complex. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len
On Wed, 7 Oct 2020 16:46:10 -0700 Maciej Żenczykowski wrote: > > static u32 __bpf_skb_max_len(const struct sk_buff *skb) > > { > > - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len : > > - SKB_MAX_ALLOC; > > + return IP_MAX_MTU; > > } > > Shouldn't we just delete this helper instead and replace call sites? It does seem wrong to pass argument skb into this function, as it is no-longer used... Guess I can simply replace __bpf_skb_max_len with IP_MAX_MTU. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
[PATCH net-next] neigh: add netlink filtering based on LLADDR for dump
neighbours table dump supports today two filtering: * based on interface index * based on master index This patch adds a new filtering, based on layer two address. That will help to replace something like it: ip neigh show | grep aa:11:22:bb:ee:ff by a better command: ip neigh show lladdr aa:11:22:bb:ee:ff Signed-off-by: Florent Fourcot --- net/core/neighbour.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 8e39e28b0a8d..4b32bf49a005 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2542,9 +2542,25 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx) return false; } +static bool neigh_lladdr_filtered(struct neighbour *neigh, const u8 *lladdr) +{ + if (!lladdr) + return false; + + /* Ignore all empty values when lladdr filtering is set */ + if (!neigh->dev->addr_len) + return true; + + if (memcmp(lladdr, neigh->ha, neigh->dev->addr_len) != 0) + return true; + + return false; +} + struct neigh_dump_filter { int master_idx; int dev_idx; + void *lladdr; }; static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, @@ -2558,7 +2574,7 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, struct neigh_hash_table *nht; unsigned int flags = NLM_F_MULTI; - if (filter->dev_idx || filter->master_idx) + if (filter->dev_idx || filter->master_idx || filter->lladdr) flags |= NLM_F_DUMP_FILTERED; rcu_read_lock_bh(); @@ -2573,7 +2589,8 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, if (idx < s_idx || !net_eq(dev_net(n->dev), net)) goto next; if (neigh_ifindex_filtered(n->dev, filter->dev_idx) || - neigh_master_filtered(n->dev, filter->master_idx)) + neigh_master_filtered(n->dev, filter->master_idx) || + neigh_lladdr_filtered(n, filter->lladdr)) goto next; if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, @@ -2689,6 +2706,9 @@ static int neigh_valid_dump_req(const struct nlmsghdr *nlh, case NDA_MASTER: filter->master_idx = nla_get_u32(tb[i]); break; + case NDA_LLADDR: + filter->lladdr = nla_data(tb[i]); + break; default: if (strict_check) { NL_SET_ERR_MSG(extack, "Unsupported attribute in neighbor dump request"); -- 2.20.1
RE: [PATCH v2 1/6] Add ancillary bus support
> From: gre...@linuxfoundation.org > Sent: Thursday, October 8, 2020 1:21 PM > > On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote: > > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky > wrote: > > [..] > > > All stated above is my opinion, it can be different from yours. > > > > Yes, but we need to converge to move this forward. Jason was involved > > in the current organization for registration, Greg was angling for > > this to be core functionality. I have use cases outside of RDMA and > > netdev. Parav was ok with the current organization. The SOF folks > > already have a proposed incorporation of it. The argument I am hearing > > is that "this registration api seems hard for driver writers" when we > > have several driver writers who have already taken a look and can make > > it work. If you want to follow on with a simpler wrappers for your use > > case, great, but I do not yet see anyone concurring with your opinion > > that the current organization is irretrievably broken or too obscure > > to use. > > That's kind of because I tuned out of this thread a long time ago :) > > I do agree with Leon that I think the current patch is not the correct way to > do this the easiest, but don't have a competing proposal to show what I > mean. > > Yet. Please consider the approach of ib_alloc_device(), ib_dealloc_device() and ib_register_register()/unregister(). (a) It avoids driver calling put_device() on error unwinding path. (b) still achieves container_of(). > > Let's see what happens after 5.10-rc1 is out, it's too late now for any of > this > for this next merge window so we can not worry about it for a few weeks. > Ok. INHO giving direction to Dave and others to either refine current APIs or follow ib_alloc_device() approach will be a helpful input. ancillary bus can do better APIs than the newly (march 2020 !) introduced vdpa bus [1] and its drivers which follows put_device() pattern in [2] and [3] in error unwinding path. [1] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/vdpa.c [2] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/ifcvf/ifcvf_main.c#L475 [3] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/mlx5/net/mlx5_vnet.c#L1967 > thanks, > > greg k-h
Re: [Patch net] ip_gre: set dev->hard_header_len properly
On Wed, Oct 7, 2020 at 9:22 PM Cong Wang wrote: > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > conditionally. When it is set, it assumes the outer IP header is > already created before ipgre_xmit(). > > This is not true when we send packets through a raw packet socket, > where L2 headers are supposed to be constructed by user. Packet > socket calls dev_validate_header() to validate the header. But > GRE tunnel does not set dev->hard_header_len, so that check can > be simply bypassed, therefore uninit memory could be passed down > to ipgre_xmit(). If dev->hard_header_len is zero, the packet socket will not reserve room for the link layer header, so skb->data points to network_header. But I don't see any uninitialized packet data? > Fix this by setting dev->hard_header_len whenever sets header_ops, > as dev->hard_header_len is supposed to be the length of the header > created by dev->header_ops->create() anyway. Agreed. Xie has made similar changes to lapbether recently in commit c7ca03c216ac. > Reported-and-tested-by: syzbot+4a2c52677a8a1aa28...@syzkaller.appspotmail.com > Cc: William Tu > Cc: Willem de Bruijn > Signed-off-by: Cong Wang > --- > net/ipv4/ip_gre.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 4e31f23e4117..43b62095559e 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -987,10 +987,12 @@ static int ipgre_tunnel_init(struct net_device *dev) > return -EINVAL; > dev->flags = IFF_BROADCAST; > dev->header_ops = &ipgre_header_ops; > + dev->hard_header_len = tunnel->hlen + sizeof(*iph); > } > #endif > } else if (!tunnel->collect_md) { > dev->header_ops = &ipgre_header_ops; > + dev->hard_header_len = tunnel->hlen + sizeof(*iph); Unrelated to this patch, I do wonder if ipgre_header does the right thing when tunnel->hlen > sizeof(gre_base_hdr), i.e., for gre tunnels with optional fields. Currently it appears to not initialize those. > } > > return ip_tunnel_init(dev); > -- > 2.28.0 >
Re: [PATCH] man: fix typos
Signed-off-by: Samanta Navarro --- man/man8/ip-link.8.in | 14 +++--- man/man8/ip-neighbour.8 | 6 +++--- man/man8/tc-actions.8 | 2 +- man/man8/tc-pie.8 | 4 ++-- 4 files changed, 13 insertions(+), 13 deletions(-) diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in index f451ecf..62d2d9f 100644 --- a/man/man8/ip-link.8.in +++ b/man/man8/ip-link.8.in @@ -391,7 +391,7 @@ packet the new device should accept. .TP .BI gso_max_segs " SEGMENTS " -specifies the recommended maximum number of a Generic Segment Offload +specifies the recommended maximum number of Generic Segment Offload segments the new device should accept. .TP @@ -441,7 +441,7 @@ the following additional arguments are supported: - either 802.1Q or 802.1ad. .BI id " VLANID " -- specifies the VLAN Identifer to use. Note that numbers with a leading " 0 " or " 0x " are interpreted as octal or hexadeimal, respectively. +- specifies the VLAN Identifier to use. Note that numbers with a leading " 0 " or " 0x " are interpreted as octal or hexadecimal, respectively. .BR reorder_hdr " { " on " | " off " } " - specifies whether ethernet headers are reordered or not (default is @@ -572,7 +572,7 @@ the following additional arguments are supported: .in +8 .sp .BI id " VNI " -- specifies the VXLAN Network Identifer (or VXLAN Segment +- specifies the VXLAN Network Identifier (or VXLAN Segment Identifier) to use. .BI dev " PHYS_DEV" @@ -1237,7 +1237,7 @@ the following additional arguments are supported: .in +8 .sp .BI id " VNI " -- specifies the Virtual Network Identifer to use. +- specifies the Virtual Network Identifier to use. .sp .BI remote " IPADDR" @@ -2147,7 +2147,7 @@ loaded under .B xdpgeneric object "|" pinned then the kernel will use the generic XDP variant instead of the native one. .B xdpdrv -has the opposite effect of requestsing that the automatic fallback to the +has the opposite effect of requesting that the automatic fallback to the generic XDP variant be disabled and in case driver is not XDP-capable error should be returned. .B xdpdrv @@ -2466,7 +2466,7 @@ specifies the master device which enslaves devices to show. .TP .BI vrf " NAME " .I NAME -speficies the VRF which enslaves devices to show. +specifies the VRF which enslaves devices to show. .TP .BI type " TYPE " @@ -2497,7 +2497,7 @@ specifies the device to display address-family statistics for. .PP .I "TYPE" -specifies which help of link type to dislpay. +specifies which help of link type to display. .SS .I GROUP diff --git a/man/man8/ip-neighbour.8 b/man/man8/ip-neighbour.8 index f71f18b..a27f9ef 100644 --- a/man/man8/ip-neighbour.8 +++ b/man/man8/ip-neighbour.8 @@ -85,11 +85,11 @@ the interface to which this neighbour is attached. .TP .BI proxy -indicates whether we are proxying for this neigbour entry +indicates whether we are proxying for this neighbour entry .TP .BI router -indicates whether neigbour is a router +indicates whether neighbour is a router .TP .BI extern_learn @@ -244,7 +244,7 @@ lookup a neighbour entry to a destination given a device .TP .BI proxy -indicates whether we should lookup a proxy neigbour entry +indicates whether we should lookup a proxy neighbour entry .TP .BI to " ADDRESS " (default) diff --git a/man/man8/tc-actions.8 b/man/man8/tc-actions.8 index 6f1c201..80df577 100644 --- a/man/man8/tc-actions.8 +++ b/man/man8/tc-actions.8 @@ -253,7 +253,7 @@ should proceed after executing the action. Any of the following are valid: .RS .TP .B reclassify -Restart the classifiction by jumping back to the first filter attached to +Restart the classification by jumping back to the first filter attached to the action's parent. .TP .B pipe diff --git a/man/man8/tc-pie.8 b/man/man8/tc-pie.8 index 0db97d1..5a8c782 100644 --- a/man/man8/tc-pie.8 +++ b/man/man8/tc-pie.8 @@ -40,7 +40,7 @@ aims to control delay. The main design goals are PIE is designed to control delay effectively. First, an average dequeue rate is estimated based on the standing queue. The rate is used to calculate the current delay. Then, on a periodic basis, the delay is used to calculate the dropping -probabilty. Finally, on arrival, a packet is dropped (or marked) based on this +probability. Finally, on arrival, a packet is dropped (or marked) based on this probability. PIE makes adjustments to the probability based on the trend of the delay i.e. @@ -52,7 +52,7 @@ growth and are determined through control theoretic approaches. alpha determines the deviation between the current and target latency changes probability. beta exerts additional adjustments depending on the latency trend. -The drop probabilty is used to mark packets in ecn mode. However, as in RED, +The drop probability is used to mark packets in ecn mode. However, as in RED, beyond 10% packets are dropped based on this probability. The bytemode is used to drop packets proportional to the packet size. -- 2.28.0
Re: [PATCH net-next v6 2/7] net: dsa: Add DSA driver for Hirschmann Hellcreek switches
On Tue, Oct 06, 2020 at 12:13:04PM +0200, Kurt Kanzenbach wrote: > >> >> +static const struct hellcreek_platform_data de1soc_r1_pdata = { > >> >> + .num_ports = 4, > >> >> + .is_100_mbits= 1, > >> >> + .qbv_support = 1, > >> >> + .qbv_on_cpu_port = 1, > >> > > >> > Why does this matter? > >> > >> Because Qbv on the CPU port is a feature and not all switch variants > >> have that. It will matter as soon as TAPRIO is implemented. > > > > How do you plan to install a tc-taprio qdisc on the CPU port? > > That's an issue to be sorted out. Do you have a compelling use case for tc-taprio on the CPU port though? I've been waiting for someone to put one on the table. If it's just "nice to have", I don't think that DSA will change just to accomodate that. The fact that the CPU port doesn't have a net device is already pretty much the established behavior.
[PATCH net-next 1/3] net: mscc: ocelot: offload VLAN mangle action to VCAP IS1
The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM, changes the classified VLAN. We are only exposing this ability for switch ports that are under VLAN aware bridges. This is because in standalone ports mode and under a bridge with vlan_filtering=0, the ocelot driver configures the switch to operate as VLAN-unaware, so the classified VLAN is not derived from the 802.1Q header from the packet, but instead is always equal to the port-based VLAN ID of the ingress port. We _can_ still change the classified VLAN for packets when operating in this mode, but the end result will most likely be a drop, since both the ingress and the egress port need to be members of the modified VLAN. And even if we install the new classified VLAN into the VLAN table of the switch, the result would still not be as expected: we wouldn't see, on the output port, the modified VLAN tag, but the original one, even though the classified VLAN was indeed modified. This is because of how the hardware works: on egress, what is pushed to the frame is a "port tag", which gives us the following options: - Tag all frames with port tag (derived from the classified VLAN) - Tag all frames with port tag, except if the classified VLAN is 0 or equal to the native VLAN of the egress port - No port tag Needless to say, in VLAN-unaware mode we are disabling the port tag. Otherwise, the existing VLAN tag would be ignored, and a second VLAN tag (the port tag), holding the classified VLAN, would be pushed (instead of replacing the existing 802.1Q tag). This is definitely not what the user wanted when installing a "vlan modify" action. So it is simply not worth bothering with VLAN modify rules under other configurations except when the ports are fully VLAN-aware. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c| 15 +++- drivers/net/ethernet/mscc/ocelot_flower.c | 29 --- 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index a965a554ff8b..3c4aa6a0c397 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -205,8 +205,21 @@ int ocelot_port_vlan_filtering(struct ocelot *ocelot, int port, struct ocelot_port *ocelot_port = ocelot->ports[port]; u32 val; - if (switchdev_trans_ph_prepare(trans)) + if (switchdev_trans_ph_prepare(trans)) { + struct ocelot_vcap_block *block = &ocelot->block[VCAP_IS1]; + struct ocelot_vcap_filter *filter; + + list_for_each_entry(filter, &block->rules, list) { + if (filter->ingress_port_mask & BIT(port) && + filter->action.vid_replace_ena) { + dev_err(ocelot->dev, + "Cannot change VLAN state with vlan modify rules active\n"); + return -EBUSY; + } + } + return 0; + } ocelot_port->vlan_aware = vlan_aware; diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c index 0ea6d4f411cb..729495a1a77e 100644 --- a/drivers/net/ethernet/mscc/ocelot_flower.c +++ b/drivers/net/ethernet/mscc/ocelot_flower.c @@ -142,10 +142,11 @@ ocelot_find_vcap_filter_that_points_at(struct ocelot *ocelot, int chain) return NULL; } -static int ocelot_flower_parse_action(struct ocelot *ocelot, bool ingress, - struct flow_cls_offload *f, +static int ocelot_flower_parse_action(struct ocelot *ocelot, int port, + bool ingress, struct flow_cls_offload *f, struct ocelot_vcap_filter *filter) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; struct netlink_ext_ack *extack = f->common.extack; bool allow_missing_goto_target = false; const struct flow_action_entry *a; @@ -266,6 +267,28 @@ static int ocelot_flower_parse_action(struct ocelot *ocelot, bool ingress, } filter->type = OCELOT_VCAP_FILTER_OFFLOAD; break; + case FLOW_ACTION_VLAN_MANGLE: + if (filter->block_id != VCAP_IS1) { + NL_SET_ERR_MSG_MOD(extack, + "VLAN modify action can only be offloaded to VCAP IS1"); + return -EOPNOTSUPP; + } + if (filter->goto_target != -1) { + NL_SET_ERR_MSG_MOD(extack, + "Last action must be GOTO"); + return -EOPNOTSUPP; + } + if (!ocelot_port->vlan_aware) { +
[PATCH net-next 0/3] Offload tc-vlan mangle to mscc_ocelot switch
This series offloads one more action to the VCAP IS1 ingress TCAM, which is to change the classified VLAN for packets, according to the VCAP IS1 keys (VLAN, source MAC, source IP, EtherType, etc). Vladimir Oltean (3): net: mscc: ocelot: offload VLAN mangle action to VCAP IS1 net: dsa: tag_ocelot: use VLAN information from tagging header when available selftests: net: mscc: ocelot: add test for VLAN modify action drivers/net/ethernet/mscc/ocelot.c| 15 +- drivers/net/ethernet/mscc/ocelot_flower.c | 29 ++-- net/dsa/tag_ocelot.c | 34 ++ .../drivers/net/ocelot/tc_flower_chains.sh| 47 ++- 4 files changed, 119 insertions(+), 6 deletions(-) -- 2.25.1
[PATCH net-next 2/3] net: dsa: tag_ocelot: use VLAN information from tagging header when available
When the Extraction Frame Header contains a valid classified VLAN, use that instead of the VLAN header present in the packet. Signed-off-by: Vladimir Oltean --- net/dsa/tag_ocelot.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index ec16badb7812..3b468aca5c53 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -184,9 +184,14 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb, struct net_device *netdev, struct packet_type *pt) { + struct dsa_port *cpu_dp = netdev->dsa_ptr; + struct dsa_switch *ds = cpu_dp->ds; + struct ocelot *ocelot = ds->priv; u64 src_port, qos_class; + u64 vlan_tci, tag_type; u8 *start = skb->data; u8 *extraction; + u16 vlan_tpid; /* Revert skb->data by the amount consumed by the DSA master, * so it points to the beginning of the frame. @@ -214,6 +219,8 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb, packing(extraction, &src_port, 46, 43, OCELOT_TAG_LEN, UNPACK, 0); packing(extraction, &qos_class, 19, 17, OCELOT_TAG_LEN, UNPACK, 0); + packing(extraction, &tag_type, 16, 16, OCELOT_TAG_LEN, UNPACK, 0); + packing(extraction, &vlan_tci, 15, 0, OCELOT_TAG_LEN, UNPACK, 0); skb->dev = dsa_master_find_slave(netdev, 0, src_port); if (!skb->dev) @@ -228,6 +235,33 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb, skb->offload_fwd_mark = 1; skb->priority = qos_class; + /* Ocelot switches copy frames unmodified to the CPU. However, it is +* possible for the user to request a VLAN modification through +* VCAP_IS1_ACT_VID_REPLACE_ENA. In this case, what will happen is that +* the VLAN ID field from the Extraction Header gets updated, but the +* 802.1Q header does not (the classified VLAN only becomes visible on +* egress through the "port tag" of front-panel ports). +* So, for traffic extracted by the CPU, we want to pick up the +* classified VLAN and manually replace the existing 802.1Q header from +* the packet with it, so that the operating system is always up to +* date with the result of tc-vlan actions. +* NOTE: In VLAN-unaware mode, we don't want to do that, we want the +* frame to remain unmodified, because the classified VLAN is always +* equal to the pvid of the ingress port and should not be used for +* processing. +*/ + vlan_tpid = tag_type ? ETH_P_8021AD : ETH_P_8021Q; + + if (ocelot->ports[src_port]->vlan_aware && + eth_hdr(skb)->h_proto == htons(vlan_tpid)) { + u16 dummy_vlan_tci; + + skb_push_rcsum(skb, ETH_HLEN); + __skb_vlan_pop(skb, &dummy_vlan_tci); + skb_pull_rcsum(skb, ETH_HLEN); + __vlan_hwaccel_put_tag(skb, htons(vlan_tpid), vlan_tci); + } + return skb; } -- 2.25.1
[PATCH net-next 3/3] selftests: net: mscc: ocelot: add test for VLAN modify action
Create a test that changes a VLAN ID from 200 to 300. We also need to modify the preferences of the filters installed for the other rules so that they are unique, because we now install the "tc-vlan modify" filter in VCAP IS1 only temporarily, and we need to perform the deletion by filter preference number. Signed-off-by: Vladimir Oltean --- .../drivers/net/ocelot/tc_flower_chains.sh| 47 ++- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh b/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh index 71a538add08a..beee0d5646a6 100755 --- a/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh +++ b/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh @@ -166,6 +166,9 @@ setup_prepare() ip link add link $eth3 name $eth3.100 type vlan id 100 ip link set $eth3.100 up + ip link add link $eth3 name $eth3.200 type vlan id 200 + ip link set $eth3.200 up + tc filter add dev $eth0 ingress chain $(IS1 1) pref 1 \ protocol 802.1Q flower skip_sw vlan_id 100 \ action vlan pop \ @@ -175,12 +178,12 @@ setup_prepare() flower skip_sw indev $eth1 \ action vlan push protocol 802.1Q id 100 - tc filter add dev $eth0 ingress chain $(IS1 0) \ + tc filter add dev $eth0 ingress chain $(IS1 0) pref 2 \ protocol ipv4 flower skip_sw src_ip 10.1.1.2 \ action skbedit priority 7 \ action goto chain $(IS1 1) - tc filter add dev $eth0 ingress chain $(IS2 0 0) \ + tc filter add dev $eth0 ingress chain $(IS2 0 0) pref 1 \ protocol ipv4 flower skip_sw ip_proto udp dst_port 5201 \ action police rate 50mbit burst 64k \ action goto chain $(IS2 1 0) @@ -188,6 +191,7 @@ setup_prepare() cleanup() { + ip link del $eth3.200 ip link del $eth3.100 tc qdisc del dev $eth0 clsact ip link del br0 @@ -238,6 +242,44 @@ test_vlan_push() tcpdump_cleanup } +test_vlan_modify() +{ + printf "Testing VLAN modification.. " + + ip link set br0 type bridge vlan_filtering 1 + bridge vlan add dev $eth0 vid 200 + bridge vlan add dev $eth0 vid 300 + bridge vlan add dev $eth1 vid 300 + + tc filter add dev $eth0 ingress chain $(IS1 2) pref 3 \ + protocol 802.1Q flower skip_sw vlan_id 200 \ + action vlan modify id 300 \ + action goto chain $(IS2 0 0) + + tcpdump_start $eth2 + + $MZ $eth3.200 -q -c 1 -p 64 -a $eth3_mac -b $eth2_mac -t ip + + sleep 1 + + tcpdump_stop + + if tcpdump_show | grep -q "$eth3_mac > $eth2_mac, .* vlan 300"; then + echo "OK" + else + echo "FAIL" + fi + + tcpdump_cleanup + + tc filter del dev $eth0 ingress chain $(IS1 2) pref 3 + + bridge vlan del dev $eth0 vid 200 + bridge vlan del dev $eth0 vid 300 + bridge vlan del dev $eth1 vid 300 + ip link set br0 type bridge vlan_filtering 0 +} + test_skbedit_priority() { local num_pkts=100 @@ -262,6 +304,7 @@ trap cleanup EXIT ALL_TESTS=" test_vlan_pop test_vlan_push + test_vlan_modify test_skbedit_priority " -- 2.25.1
[PATCH v1 6/6] staging: qlge: add documentation for debugging qlge
Instructions and examples on kernel data structures dumping and coredump. Signed-off-by: Coiby Xu --- .../networking/device_drivers/index.rst | 1 + .../device_drivers/qlogic/index.rst | 18 +++ .../networking/device_drivers/qlogic/qlge.rst | 118 ++ MAINTAINERS | 6 + 4 files changed, 143 insertions(+) create mode 100644 Documentation/networking/device_drivers/qlogic/index.rst create mode 100644 Documentation/networking/device_drivers/qlogic/qlge.rst diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst index a3113ffd7a16..d8279de7bf25 100644 --- a/Documentation/networking/device_drivers/index.rst +++ b/Documentation/networking/device_drivers/index.rst @@ -15,6 +15,7 @@ Contents: ethernet/index fddi/index hamradio/index + qlogic/index wan/index wifi/index diff --git a/Documentation/networking/device_drivers/qlogic/index.rst b/Documentation/networking/device_drivers/qlogic/index.rst new file mode 100644 index ..ad05b04286e4 --- /dev/null +++ b/Documentation/networking/device_drivers/qlogic/index.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +QLogic QLGE Device Drivers +=== + +Contents: + +.. toctree:: + :maxdepth: 2 + + qlge + +.. only:: subproject and html + + Indices + === + + * :ref:`genindex` diff --git a/Documentation/networking/device_drivers/qlogic/qlge.rst b/Documentation/networking/device_drivers/qlogic/qlge.rst new file mode 100644 index ..0b888253d152 --- /dev/null +++ b/Documentation/networking/device_drivers/qlogic/qlge.rst @@ -0,0 +1,118 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=== +QLogic QLGE 10Gb Ethernet device driver +=== + +This driver use drgn and devlink for debugging. + +Dump kernel data structures in drgn +--- + +To dump kernel data structures, the following Python script can be used +in drgn: + +.. code-block:: python + + def align(x, a): + """the alignment a should be a power of 2 + """ + mask = a - 1 + return (x+ mask) & ~mask + + def struct_size(struct_type): + struct_str = "struct {}".format(struct_type) + return sizeof(Object(prog, struct_str, address=0x0)) + + def netdev_priv(netdevice): + NETDEV_ALIGN = 32 + return netdevice.value_() + align(struct_size("net_device"), NETDEV_ALIGN) + + name = 'xxx' + qlge_device = None + netdevices = prog['init_net'].dev_base_head.address_of_() + for netdevice in list_for_each_entry("struct net_device", netdevices, "dev_list"): + if netdevice.name.string_().decode('ascii') == name: + print(netdevice.name) + + ql_adapter = Object(prog, "struct ql_adapter", address=netdev_priv(qlge_device)) + +The struct ql_adapter will be printed in drgn as follows, + +>>> ql_adapter +(struct ql_adapter){ +.ricb = (struct ricb){ +.base_cq = (u8)0, +.flags = (u8)120, +.mask = (__le16)26637, +.hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 }, +.ipv6_hash_key = (__le32 [10]){}, +.ipv4_hash_key = (__le32 [4]){}, +}, +.flags = (unsigned long)0, +.wol = (u32)0, +.nic_stats = (struct nic_stats){ +.tx_pkts = (u64)0, +.tx_bytes = (u64)0, +.tx_mcast_pkts = (u64)0, +.tx_bcast_pkts = (u64)0, +.tx_ucast_pkts = (u64)0, +.tx_ctl_pkts = (u64)0, +.tx_pause_pkts = (u64)0, +... +}, +.active_vlans = (unsigned long [64]){ +0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615, +18446619461681283072, 0, 42949673024, 2147483647, +}, +.rx_ring = (struct rx_ring [17]){ +{ +.cqicb = (struct cqicb){ +.msix_vect = (u8)0, +.reserved1 = (u8)0, +.reserved2 = (u8)0, +.flags = (u8)0, +.len = (__le16)0, +.rid = (__le16)0, +... +}, +.cq_base = (void *)0x0, +.cq_base_dma = (dma_addr_t)0, +} +
[PATCH v1 4/6] staging: qlge: remove mpi_core_to_log which sends coredump to the kernel ring buffer
devlink health could be used to get coredump. No need to send so much data to the kernel ring buffer. Signed-off-by: Coiby Xu --- drivers/staging/qlge/qlge.h | 3 --- drivers/staging/qlge/qlge_dbg.c | 11 --- drivers/staging/qlge/qlge_ethtool.c | 1 - drivers/staging/qlge/qlge_main.c| 2 -- drivers/staging/qlge/qlge_mpi.c | 6 -- 5 files changed, 23 deletions(-) diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h index 290e754450c5..0a39801be15a 100644 --- a/drivers/staging/qlge/qlge.h +++ b/drivers/staging/qlge/qlge.h @@ -2149,7 +2149,6 @@ struct ql_adapter { u32 port_init; u32 link_status; struct ql_mpi_coredump *mpi_coredump; - u32 core_is_dumped; u32 link_config; u32 led_config; u32 max_frame_size; @@ -2162,7 +2161,6 @@ struct ql_adapter { struct delayed_work mpi_work; struct delayed_work mpi_port_cfg_work; struct delayed_work mpi_idc_work; - struct delayed_work mpi_core_to_log; struct completion ide_completion; const struct nic_operations *nic_ops; u16 device_id; @@ -2253,7 +2251,6 @@ int ql_write_cfg(struct ql_adapter *qdev, void *ptr, int size, u32 bit, void ql_queue_fw_error(struct ql_adapter *qdev); void ql_mpi_work(struct work_struct *work); void ql_mpi_reset_work(struct work_struct *work); -void ql_mpi_core_to_log(struct work_struct *work); int ql_wait_reg_rdy(struct ql_adapter *qdev, u32 reg, u32 bit, u32 ebit); void ql_queue_asic_error(struct ql_adapter *qdev); void ql_set_ethtool_ops(struct net_device *ndev); diff --git a/drivers/staging/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c index 42fd13990f3a..989575743718 100644 --- a/drivers/staging/qlge/qlge_dbg.c +++ b/drivers/staging/qlge/qlge_dbg.c @@ -1314,17 +1314,6 @@ void ql_get_dump(struct ql_adapter *qdev, void *buff) } } -/* Coredump to messages log file using separate worker thread */ -void ql_mpi_core_to_log(struct work_struct *work) -{ - struct ql_adapter *qdev = - container_of(work, struct ql_adapter, mpi_core_to_log.work); - - print_hex_dump(KERN_DEBUG, "Core is dumping to log file!\n", - DUMP_PREFIX_OFFSET, 32, 4, qdev->mpi_coredump, - sizeof(*qdev->mpi_coredump), false); -} - #ifdef QL_REG_DUMP static void ql_dump_intr_states(struct ql_adapter *qdev) { diff --git a/drivers/staging/qlge/qlge_ethtool.c b/drivers/staging/qlge/qlge_ethtool.c index d44b2dae9213..eed116d8895e 100644 --- a/drivers/staging/qlge/qlge_ethtool.c +++ b/drivers/staging/qlge/qlge_ethtool.c @@ -616,7 +616,6 @@ static void ql_get_regs(struct net_device *ndev, struct ql_adapter *qdev = netdev_priv(ndev); ql_get_dump(qdev, p); - qdev->core_is_dumped = 0; if (!test_bit(QL_FRC_COREDUMP, &qdev->flags)) regs->len = sizeof(struct ql_mpi_coredump); else diff --git a/drivers/staging/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c index 135225530e51..aaca740d46c4 100644 --- a/drivers/staging/qlge/qlge_main.c +++ b/drivers/staging/qlge/qlge_main.c @@ -3808,7 +3808,6 @@ static void ql_cancel_all_work_sync(struct ql_adapter *qdev) cancel_delayed_work_sync(&qdev->mpi_reset_work); cancel_delayed_work_sync(&qdev->mpi_work); cancel_delayed_work_sync(&qdev->mpi_idc_work); - cancel_delayed_work_sync(&qdev->mpi_core_to_log); cancel_delayed_work_sync(&qdev->mpi_port_cfg_work); } @@ -4504,7 +4503,6 @@ static int ql_init_device(struct pci_dev *pdev, struct net_device *ndev, INIT_DELAYED_WORK(&qdev->mpi_work, ql_mpi_work); INIT_DELAYED_WORK(&qdev->mpi_port_cfg_work, ql_mpi_port_cfg_work); INIT_DELAYED_WORK(&qdev->mpi_idc_work, ql_mpi_idc_work); - INIT_DELAYED_WORK(&qdev->mpi_core_to_log, ql_mpi_core_to_log); init_completion(&qdev->ide_completion); mutex_init(&qdev->mpi_mutex); diff --git a/drivers/staging/qlge/qlge_mpi.c b/drivers/staging/qlge/qlge_mpi.c index 143a886080c5..1cea24201b17 100644 --- a/drivers/staging/qlge/qlge_mpi.c +++ b/drivers/staging/qlge/qlge_mpi.c @@ -1269,11 +1269,5 @@ void ql_mpi_reset_work(struct work_struct *work) return; } - if (qdev->mpi_coredump && !ql_core_dump(qdev, qdev->mpi_coredump)) { - netif_err(qdev, drv, qdev->ndev, "Core is dumped!\n"); - qdev->core_is_dumped = 1; - queue_delayed_work(qdev->workqueue, - &qdev->mpi_core_to_log, 5 * HZ); - } ql_soft_reset_mpi_risc(qdev); } -- 2.28.0
[PATCH v1 2/6] staging: qlge: coredump via devlink health reporter
$ devlink health dump show DEVICE reporter coredump -p -j { "Core Registers": { "segment": 1, "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ] }, "Test Logic Regs": { "segment": 2, "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ] }, "RMII Registers": { "segment": 3, "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ] }, ... "Sem Registers": { "segment": 50, "values": [ 0,0,0,0 ] } } Signed-off-by: Coiby Xu --- drivers/staging/qlge/qlge_devlink.c | 131 ++-- 1 file changed, 125 insertions(+), 6 deletions(-) diff --git a/drivers/staging/qlge/qlge_devlink.c b/drivers/staging/qlge/qlge_devlink.c index aa45e7e368c0..91b6600b94a9 100644 --- a/drivers/staging/qlge/qlge_devlink.c +++ b/drivers/staging/qlge/qlge_devlink.c @@ -1,16 +1,135 @@ #include "qlge.h" #include "qlge_devlink.h" -static int -qlge_reporter_coredump(struct devlink_health_reporter *reporter, - struct devlink_fmsg *fmsg, void *priv_ctx, - struct netlink_ext_ack *extack) +static int fill_seg_(struct devlink_fmsg *fmsg, + struct mpi_coredump_segment_header *seg_header, + u32 *reg_data) { - return 0; + int i; + int header_size = sizeof(struct mpi_coredump_segment_header); + int regs_num = (seg_header->seg_size - header_size) / sizeof(u32); + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, seg_header->description); + if (err) + return err; + err = devlink_fmsg_obj_nest_start(fmsg); + if (err) + return err; + err = devlink_fmsg_u32_pair_put(fmsg, "segment", seg_header->seg_num); + if (err) + return err; + err = devlink_fmsg_arr_pair_nest_start(fmsg, "values"); + if (err) + return err; + for (i = 0; i < regs_num; i++) { + err = devlink_fmsg_u32_put(fmsg, *reg_data); + if (err) + return err; + reg_data++; + } + err = devlink_fmsg_obj_nest_end(fmsg); + if (err) + return err; + err = devlink_fmsg_arr_pair_nest_end(fmsg); + if (err) + return err; + err = devlink_fmsg_pair_nest_end(fmsg); + return err; +} + +#define fill_seg(seg_hdr, seg_regs) \ + do { \ + err = fill_seg_(fmsg, &dump->seg_hdr, dump->seg_regs); \ + if (err) { \ + kvfree(dump); \ + return err;\ + } \ + } while (0) + +static int qlge_reporter_coredump(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, void *priv_ctx, + struct netlink_ext_ack *extack) +{ + int err = 0; + + struct qlge_devlink *dev = devlink_health_reporter_priv(reporter); + struct ql_adapter *qdev = dev->qdev; + struct ql_mpi_coredump *dump; + + if (!netif_running(qdev->ndev)) + return 0; + + dump = kvmalloc(sizeof(*dump), GFP_KERNEL); + if (!dump) + return -ENOMEM; + + err = ql_core_dump(qdev, dump); + if (err) { + kvfree(dump); + return err; + } + + fill_seg(core_regs_seg_hdr, mpi_core_regs); + fill_seg(test_logic_regs_seg_hdr, test_logic_regs); + fill_seg(rmii_regs_seg_hdr, rmii_regs); + fill_seg(fcmac1_regs_seg_hdr, fcmac1_regs); + fill_seg(fcmac2_regs_seg_hdr, fcmac2_regs); + fill_seg(fc1_mbx_regs_seg_hdr, fc1_mbx_regs); + fill_seg(ide_regs_seg_hdr, ide_regs); + fill_seg(nic1_mbx_regs_seg_hdr, nic1_mbx_regs); + fill_seg(smbus_regs_seg_hdr, smbus_regs); + fill_seg(fc2_mbx_regs_seg_hdr, fc2_mbx_regs); + fill_seg(nic2_mbx_regs_seg_hdr, nic2_mbx_regs); + fill_seg(i2c_regs_seg_hdr, i2c_regs); + fill_seg(memc_regs_seg_hdr, memc_regs); + fill_seg(pbus_regs_seg_hdr, pbus_regs); + fill_seg(mde_regs_seg_hdr, mde_regs); + fill_seg(nic_regs_seg_hdr, nic_regs); + fill_seg(nic2_regs_seg_hdr, nic2_regs); + fill_seg(xgmac1_seg_hdr, xgmac1); + fill_seg(xgmac2_seg_hdr, xgmac2
[PATCH v1 3/6] staging: qlge: support force_coredump option for devlink health dump
With force_coredump module paramter set, devlink health dump will reset the MPI RISC first which takes 5 secs to be finished. Signed-off-by: Coiby Xu --- drivers/staging/qlge/qlge_devlink.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/staging/qlge/qlge_devlink.c b/drivers/staging/qlge/qlge_devlink.c index 91b6600b94a9..54257468bc7f 100644 --- a/drivers/staging/qlge/qlge_devlink.c +++ b/drivers/staging/qlge/qlge_devlink.c @@ -56,10 +56,17 @@ static int qlge_reporter_coredump(struct devlink_health_reporter *reporter, struct qlge_devlink *dev = devlink_health_reporter_priv(reporter); struct ql_adapter *qdev = dev->qdev; struct ql_mpi_coredump *dump; + wait_queue_head_t wait; if (!netif_running(qdev->ndev)) return 0; + if (test_bit(QL_FRC_COREDUMP, &qdev->flags)) { + ql_queue_fw_error(qdev); + init_waitqueue_head(&wait); + wait_event_timeout(wait, 0, 5 * HZ); + } + dump = kvmalloc(sizeof(*dump), GFP_KERNEL); if (!dump) return -ENOMEM; -- 2.28.0
[PATCH v1 5/6] staging: qlge: clean up debugging code in the QL_ALL_DUMP ifdef land
The debugging code in the following ifdef land - QL_ALL_DUMP - QL_REG_DUMP - QL_DEV_DUMP - QL_CB_DUMP - QL_IB_DUMP - QL_OB_DUMP becomes unnecessary because, - Device status and general registers can be obtained by ethtool. - Coredump can be done via devlink health reporter. - Structure related to the hardware (struct ql_adapter) can be obtained by crash or drgn. Suggested-by: Benjamin Poirier Signed-off-by: Coiby Xu --- drivers/staging/qlge/qlge.h | 82 drivers/staging/qlge/qlge_dbg.c | 688 drivers/staging/qlge/qlge_ethtool.c | 2 - drivers/staging/qlge/qlge_main.c| 7 +- 4 files changed, 1 insertion(+), 778 deletions(-) diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h index 0a39801be15a..8aff3ba77730 100644 --- a/drivers/staging/qlge/qlge.h +++ b/drivers/staging/qlge/qlge.h @@ -2285,86 +2285,4 @@ void ql_check_lb_frame(struct ql_adapter *qdev, struct sk_buff *skb); int ql_own_firmware(struct ql_adapter *qdev); int ql_clean_lb_rx_ring(struct rx_ring *rx_ring, int budget); -/* #define QL_ALL_DUMP */ -/* #define QL_REG_DUMP */ -/* #define QL_DEV_DUMP */ -/* #define QL_CB_DUMP */ -/* #define QL_IB_DUMP */ -/* #define QL_OB_DUMP */ - -#ifdef QL_REG_DUMP -void ql_dump_xgmac_control_regs(struct ql_adapter *qdev); -void ql_dump_routing_entries(struct ql_adapter *qdev); -void ql_dump_regs(struct ql_adapter *qdev); -#define QL_DUMP_REGS(qdev) ql_dump_regs(qdev) -#define QL_DUMP_ROUTE(qdev) ql_dump_routing_entries(qdev) -#define QL_DUMP_XGMAC_CONTROL_REGS(qdev) ql_dump_xgmac_control_regs(qdev) -#else -#define QL_DUMP_REGS(qdev) -#define QL_DUMP_ROUTE(qdev) -#define QL_DUMP_XGMAC_CONTROL_REGS(qdev) -#endif - -#ifdef QL_STAT_DUMP -void ql_dump_stat(struct ql_adapter *qdev); -#define QL_DUMP_STAT(qdev) ql_dump_stat(qdev) -#else -#define QL_DUMP_STAT(qdev) -#endif - -#ifdef QL_DEV_DUMP -void ql_dump_qdev(struct ql_adapter *qdev); -#define QL_DUMP_QDEV(qdev) ql_dump_qdev(qdev) -#else -#define QL_DUMP_QDEV(qdev) -#endif - -#ifdef QL_CB_DUMP -void ql_dump_wqicb(struct wqicb *wqicb); -void ql_dump_tx_ring(struct tx_ring *tx_ring); -void ql_dump_ricb(struct ricb *ricb); -void ql_dump_cqicb(struct cqicb *cqicb); -void ql_dump_rx_ring(struct rx_ring *rx_ring); -void ql_dump_hw_cb(struct ql_adapter *qdev, int size, u32 bit, u16 q_id); -#define QL_DUMP_RICB(ricb) ql_dump_ricb(ricb) -#define QL_DUMP_WQICB(wqicb) ql_dump_wqicb(wqicb) -#define QL_DUMP_TX_RING(tx_ring) ql_dump_tx_ring(tx_ring) -#define QL_DUMP_CQICB(cqicb) ql_dump_cqicb(cqicb) -#define QL_DUMP_RX_RING(rx_ring) ql_dump_rx_ring(rx_ring) -#define QL_DUMP_HW_CB(qdev, size, bit, q_id) \ - ql_dump_hw_cb(qdev, size, bit, q_id) -#else -#define QL_DUMP_RICB(ricb) -#define QL_DUMP_WQICB(wqicb) -#define QL_DUMP_TX_RING(tx_ring) -#define QL_DUMP_CQICB(cqicb) -#define QL_DUMP_RX_RING(rx_ring) -#define QL_DUMP_HW_CB(qdev, size, bit, q_id) -#endif - -#ifdef QL_OB_DUMP -void ql_dump_tx_desc(struct ql_adapter *qdev, struct tx_buf_desc *tbd); -void ql_dump_ob_mac_iocb(struct ql_adapter *qdev, struct ob_mac_iocb_req *ob_mac_iocb); -void ql_dump_ob_mac_rsp(struct ql_adapter *qdev, struct ob_mac_iocb_rsp *ob_mac_rsp); -#define QL_DUMP_OB_MAC_IOCB(qdev, ob_mac_iocb) ql_dump_ob_mac_iocb(qdev, ob_mac_iocb) -#define QL_DUMP_OB_MAC_RSP(qdev, ob_mac_rsp) ql_dump_ob_mac_rsp(qdev, ob_mac_rsp) -#else -#define QL_DUMP_OB_MAC_IOCB(qdev, ob_mac_iocb) -#define QL_DUMP_OB_MAC_RSP(qdev, ob_mac_rsp) -#endif - -#ifdef QL_IB_DUMP -void ql_dump_ib_mac_rsp(struct ql_adapter *qdev, struct ib_mac_iocb_rsp *ib_mac_rsp); -#define QL_DUMP_IB_MAC_RSP(qdev, ib_mac_rsp) ql_dump_ib_mac_rsp(qdev, ib_mac_rsp) -#else -#define QL_DUMP_IB_MAC_RSP(qdev, ib_mac_rsp) -#endif - -#ifdef QL_ALL_DUMP -void ql_dump_all(struct ql_adapter *qdev); -#define QL_DUMP_ALL(qdev) ql_dump_all(qdev) -#else -#define QL_DUMP_ALL(qdev) -#endif - #endif /* _QLGE_H_ */ diff --git a/drivers/staging/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c index 989575743718..a02ecf8fb291 100644 --- a/drivers/staging/qlge/qlge_dbg.c +++ b/drivers/staging/qlge/qlge_dbg.c @@ -1313,691 +1313,3 @@ void ql_get_dump(struct ql_adapter *qdev, void *buff) ql_get_core_dump(qdev); } } - -#ifdef QL_REG_DUMP -static void ql_dump_intr_states(struct ql_adapter *qdev) -{ - int i; - u32 value; - - for (i = 0; i < qdev->intr_count; i++) { - ql_write32(qdev, INTR_EN, qdev->intr_context[i].intr_read_mask); - value = ql_read32(qdev, INTR_EN); - netdev_err(qdev->ndev, "Interrupt %d is %s\n", i, - (value & INTR_EN_EN ? "enabled" : "disabled")); - } -} - -#define DUMP_XGMAC(qdev, reg) \ -do { \ - u32 data; \ - ql_read_xgmac_reg(qdev, reg, &data);\ - netdev_err(qdev->ndev,
[PATCH v1 1/6] staging: qlge: Initialize devlink health dump framework for the dlge driver
Initialize devlink health dump framework for the dlge driver so the coredump could be done via devlink. Signed-off-by: Coiby Xu --- drivers/staging/qlge/Kconfig| 1 + drivers/staging/qlge/Makefile | 2 +- drivers/staging/qlge/qlge.h | 9 +++ drivers/staging/qlge/qlge_devlink.c | 38 + drivers/staging/qlge/qlge_devlink.h | 8 ++ drivers/staging/qlge/qlge_main.c| 28 + 6 files changed, 85 insertions(+), 1 deletion(-) create mode 100644 drivers/staging/qlge/qlge_devlink.c create mode 100644 drivers/staging/qlge/qlge_devlink.h diff --git a/drivers/staging/qlge/Kconfig b/drivers/staging/qlge/Kconfig index a3cb25a3ab80..6d831ed67965 100644 --- a/drivers/staging/qlge/Kconfig +++ b/drivers/staging/qlge/Kconfig @@ -3,6 +3,7 @@ config QLGE tristate "QLogic QLGE 10Gb Ethernet Driver Support" depends on ETHERNET && PCI + select NET_DEVLINK help This driver supports QLogic ISP8XXX 10Gb Ethernet cards. diff --git a/drivers/staging/qlge/Makefile b/drivers/staging/qlge/Makefile index 1dc2568e820c..07c1898a512e 100644 --- a/drivers/staging/qlge/Makefile +++ b/drivers/staging/qlge/Makefile @@ -5,4 +5,4 @@ obj-$(CONFIG_QLGE) += qlge.o -qlge-objs := qlge_main.o qlge_dbg.o qlge_mpi.o qlge_ethtool.o +qlge-objs := qlge_main.o qlge_dbg.o qlge_mpi.o qlge_ethtool.o qlge_devlink.o diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h index b295990e361b..290e754450c5 100644 --- a/drivers/staging/qlge/qlge.h +++ b/drivers/staging/qlge/qlge.h @@ -2060,6 +2060,14 @@ struct nic_operations { int (*port_initialize)(struct ql_adapter *qdev); }; + + +struct qlge_devlink { +struct ql_adapter *qdev; +struct net_device *ndev; +struct devlink_health_reporter *reporter; +}; + /* * The main Adapter structure definition. * This structure has all fields relevant to the hardware. @@ -2077,6 +2085,7 @@ struct ql_adapter { struct pci_dev *pdev; struct net_device *ndev;/* Parent NET device */ + struct qlge_devlink *ql_devlink; /* Hardware information */ u32 chip_rev_id; u32 fw_rev_id; diff --git a/drivers/staging/qlge/qlge_devlink.c b/drivers/staging/qlge/qlge_devlink.c new file mode 100644 index ..aa45e7e368c0 --- /dev/null +++ b/drivers/staging/qlge/qlge_devlink.c @@ -0,0 +1,38 @@ +#include "qlge.h" +#include "qlge_devlink.h" + +static int +qlge_reporter_coredump(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, void *priv_ctx, + struct netlink_ext_ack *extack) +{ + return 0; +} + +static const struct devlink_health_reporter_ops qlge_reporter_ops = { + .name = "dummy", + .dump = qlge_reporter_coredump, +}; + +int qlge_health_create_reporters(struct qlge_devlink *priv) +{ + int err; + + struct devlink_health_reporter *reporter; + struct devlink *devlink; + + devlink = priv_to_devlink(priv); + reporter = + devlink_health_reporter_create(devlink, &qlge_reporter_ops, + 0, + priv); + if (IS_ERR(reporter)) { + netdev_warn(priv->ndev, + "Failed to create reporter, err = %ld\n", + PTR_ERR(reporter)); + return PTR_ERR(reporter); + } + priv->reporter = reporter; + + return err; +} diff --git a/drivers/staging/qlge/qlge_devlink.h b/drivers/staging/qlge/qlge_devlink.h new file mode 100644 index ..c91f7a29e805 --- /dev/null +++ b/drivers/staging/qlge/qlge_devlink.h @@ -0,0 +1,8 @@ +#ifndef QLGE_DEVLINK_H +#define QLGE_DEVLINK_H + +#include + +int qlge_health_create_reporters(struct qlge_devlink *priv); + +#endif /* QLGE_DEVLINK_H */ diff --git a/drivers/staging/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c index 27da386f9d87..135225530e51 100644 --- a/drivers/staging/qlge/qlge_main.c +++ b/drivers/staging/qlge/qlge_main.c @@ -42,6 +42,7 @@ #include #include "qlge.h" +#include "qlge_devlink.h" char qlge_driver_name[] = DRV_NAME; const char qlge_driver_version[] = DRV_VERSION; @@ -4549,6 +4550,8 @@ static void ql_timer(struct timer_list *t) mod_timer(&qdev->timer, jiffies + (5 * HZ)); } +static const struct devlink_ops qlge_devlink_ops; + static int qlge_probe(struct pci_dev *pdev, const struct pci_device_id *pci_entry) { @@ -4556,6 +4559,13 @@ static int qlge_probe(struct pci_dev *pdev, struct ql_adapter *qdev = NULL; static int cards_found; int err = 0; + struct devlink *devlink; + struct qlge_devlink *ql_devlink; + + devlink = devlink_alloc(&qlge_devlink_ops, sizeof(struct qlge_devlink)); + if (!devlink) + return -ENOMEM; + ql_devlink = devlink
[PATCH] dpaa_eth: enable NETIF_MSG_HW by default
From: Maxim Kochetkov When packets are received on the error queue, this function under net_ratelimit(): netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n"); does not get printed. Instead we only see: [ 3658.845592] net_ratelimit: 244 callbacks suppressed [ 3663.969535] net_ratelimit: 230 callbacks suppressed [ 3669.085478] net_ratelimit: 228 callbacks suppressed Enabling NETIF_MSG_HW fixes this issue, and we can see some information about the frame descriptors of packets. Signed-off-by: Maxim Kochetkov Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index fdff3b4723ba..06cc863f4dd6 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -87,7 +87,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms"); #define DPAA_MSG_DEFAULT (NETIF_MSG_DRV | NETIF_MSG_PROBE | \ NETIF_MSG_LINK | NETIF_MSG_IFUP | \ - NETIF_MSG_IFDOWN) + NETIF_MSG_IFDOWN | NETIF_MSG_HW) #define DPAA_INGRESS_CS_THRESHOLD 0x1000 /* Ingress congestion threshold on FMan ports -- 2.25.1
Re: [PATCH] dpaa_eth: enable NETIF_MSG_HW by default
On Thu, Oct 08, 2020 at 03:03:12PM +0300, Vladimir Oltean wrote: > From: Maxim Kochetkov > > When packets are received on the error queue, this function under > net_ratelimit(): > > netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n"); > > does not get printed. Instead we only see: > > [ 3658.845592] net_ratelimit: 244 callbacks suppressed > [ 3663.969535] net_ratelimit: 230 callbacks suppressed > [ 3669.085478] net_ratelimit: 228 callbacks suppressed > > Enabling NETIF_MSG_HW fixes this issue, and we can see some information > about the frame descriptors of packets. > > Signed-off-by: Maxim Kochetkov > Signed-off-by: Vladimir Oltean > --- I'm sorry for failing to mention the target tree in the email subject. This is directed towards net-next.
[PATCH 1/2] net: smc: fix missing brace warning for old compilers
For older versions of gcc, the array = {0}; will cause warnings: net/smc/smc_llc.c: In function 'smc_llc_send_link_delete_all': net/smc/smc_llc.c:1317:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_del_link delllc = {0}; ^ net/smc/smc_llc.c:1317:9: warning: (near initialization for 'delllc.hd') [-Wmissing-braces] 1 warnings generated Fixes: f3811fd7bc97 ("net/smc: send DELETE_LINK, ALL message and wait for send to complete") Signed-off-by: Pujin Shi --- net/smc/smc_llc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c index 2db967f2fb50..d09d9d2d0bfd 100644 --- a/net/smc/smc_llc.c +++ b/net/smc/smc_llc.c @@ -1314,7 +1314,7 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr) */ void smc_llc_send_link_delete_all(struct smc_link_group *lgr, bool ord, u32 rsn) { - struct smc_llc_msg_del_link delllc = {0}; + struct smc_llc_msg_del_link delllc = {}; int i; delllc.hd.common.type = SMC_LLC_DELETE_LINK; -- 2.18.1
[PATCH 2/2] net: smc: fix missing brace warning for old compilers
For older versions of gcc, the array = {0}; will cause warnings: net/smc/smc_llc.c: In function 'smc_llc_add_link_local': net/smc/smc_llc.c:1212:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_add_link add_llc = {0}; ^ net/smc/smc_llc.c:1212:9: warning: (near initialization for 'add_llc.hd') [-Wmissing-braces] net/smc/smc_llc.c: In function 'smc_llc_srv_delete_link_local': net/smc/smc_llc.c:1245:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_del_link del_llc = {0}; ^ net/smc/smc_llc.c:1245:9: warning: (near initialization for 'del_llc.hd') [-Wmissing-braces] 2 warnings generated Fixes: 4dadd151b265 ("net/smc: enqueue local LLC messages") Signed-off-by: Pujin Shi --- net/smc/smc_llc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c index d09d9d2d0bfd..85df0ef60500 100644 --- a/net/smc/smc_llc.c +++ b/net/smc/smc_llc.c @@ -1209,7 +1209,7 @@ static void smc_llc_process_srv_add_link(struct smc_link_group *lgr) /* enqueue a local add_link req to trigger a new add_link flow */ void smc_llc_add_link_local(struct smc_link *link) { - struct smc_llc_msg_add_link add_llc = {0}; + struct smc_llc_msg_add_link add_llc = {}; add_llc.hd.length = sizeof(add_llc); add_llc.hd.common.type = SMC_LLC_ADD_LINK; @@ -1242,7 +1242,7 @@ static void smc_llc_add_link_work(struct work_struct *work) */ void smc_llc_srv_delete_link_local(struct smc_link *link, u8 del_link_id) { - struct smc_llc_msg_del_link del_llc = {0}; + struct smc_llc_msg_del_link del_llc = {}; del_llc.hd.length = sizeof(del_llc); del_llc.hd.common.type = SMC_LLC_DELETE_LINK; -- 2.18.1
Re: [PATCH v1 1/6] staging: qlge: Initialize devlink health dump framework for the dlge driver
On Thu, Oct 8, 2020 at 7:58 AM Coiby Xu wrote: > > Initialize devlink health dump framework for the dlge driver so the > coredump could be done via devlink. > > Signed-off-by: Coiby Xu > @@ -4556,6 +4559,13 @@ static int qlge_probe(struct pci_dev *pdev, > struct ql_adapter *qdev = NULL; > static int cards_found; > int err = 0; > + struct devlink *devlink; > + struct qlge_devlink *ql_devlink; > + > + devlink = devlink_alloc(&qlge_devlink_ops, sizeof(struct > qlge_devlink)); > + if (!devlink) > + return -ENOMEM; > + ql_devlink = devlink_priv(devlink); > > ndev = alloc_etherdev_mq(sizeof(struct ql_adapter), > min(MAX_CPUS, need to goto devlink_free instead of return -ENOMEM here, too. > @@ -4614,6 +4624,16 @@ static int qlge_probe(struct pci_dev *pdev, > free_netdev(ndev); > return err; and here > } > + > + err = devlink_register(devlink, &pdev->dev); > + if (err) { > + goto devlink_free; > + } > + > + qlge_health_create_reporters(ql_devlink); > + ql_devlink->qdev = qdev; > + ql_devlink->ndev = ndev; > + qdev->ql_devlink = ql_devlink; > /* Start up the timer to trigger EEH if > * the bus goes dead > */ > @@ -4624,6 +4644,10 @@ static int qlge_probe(struct pci_dev *pdev, > atomic_set(&qdev->lb_count, 0); > cards_found++; > return 0; > + > +devlink_free: > + devlink_free(devlink); > + return err; > }
Re: [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len
On Thu, Oct 8, 2020 at 7:06 AM Jesper Dangaard Brouer wrote: > > On Wed, 7 Oct 2020 16:46:10 -0700 > Maciej Żenczykowski wrote: > > > > static u32 __bpf_skb_max_len(const struct sk_buff *skb) > > > { > > > - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len : > > > - SKB_MAX_ALLOC; > > > + return IP_MAX_MTU; > > > } > > > > Shouldn't we just delete this helper instead and replace call sites? > > It does seem wrong to pass argument skb into this function, as it is > no-longer used... > > Guess I can simply replace __bpf_skb_max_len with IP_MAX_MTU. Should that be IP6_MAX_MTU, which is larger than IP_MAX_MTU?
Re: [PATCHv2 net-next 02/17] udp6: move the mss check after udp gso tunnel processing
On Thu, Oct 8, 2020 at 5:48 AM Xin Long wrote: > > For some protocol's gso, like SCTP, it's using GSO_BY_FRAGS for > gso_size. When using UDP to encapsulate its packet, it will > return error in udp6_ufo_fragment() as skb->len < gso_size, > and it will never go to the gso tunnel processing. > > So we should move this check after udp gso tunnel processing, > the same as udp4_ufo_fragment() does. While at it, also tidy > the variables up. Please don't mix a new feature and code cleanup. This patch changes almost every line of the function due to indentation changes. But the only relevant part is " mss = skb_shinfo(skb)->gso_size; if (unlikely(skb->len <= mss)) goto out; if (skb->encapsulation && skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM)) segs = skb_udp_tunnel_segment(skb, features, true); else { /* irrelevant here */ } out: return segs; } " Is it a sufficient change to just skip the mss check if mss == GSO_BY_FRAGS?