Re: [net-next v8 1/6] net: marvell: prestera: Add driver for Prestera family ASIC devices

2020-09-11 Thread Vadym Kochan
On Thu, Sep 10, 2020 at 06:30:47PM -0700, Jakub Kicinski wrote:
> On Thu, 10 Sep 2020 18:00:50 +0300 Vadym Kochan wrote:
> > +static int prestera_sdma_tx_wait(struct prestera_sdma *sdma,
> > +struct prestera_tx_ring *tx_ring)
> > +{
> > +   int tx_wait_num = PRESTERA_SDMA_WAIT_MUL * tx_ring->max_burst;
> > +   bool is_ready;
> > +
> > +   return read_poll_timeout_atomic(prestera_sdma_is_ready, is_ready, true,
> > +   1, tx_wait_num, false, sdma);
> > +}
> 
> This is strange and generates a warning:
> 
> drivers/net/ethernet/marvell/prestera/prestera_rxtx.c: In function 
> ‘prestera_sdma_tx_wait’:
> drivers/net/ethernet/marvell/prestera/prestera_rxtx.c:695:7: warning: 
> variable ‘is_ready’ set but not used [-Wunused-but-set-variable]
>   695 |  bool is_ready;
>   |   ^~~~

Sorry about this mistake, will re-submit new version.


Re: [PATCH] Bluetooth: Re-order clearing suspend tasks

2020-09-11 Thread Marcel Holtmann
Hi Abhishek,

> Unregister_pm_notifier is a blocking call so suspend tasks should be
> cleared beforehand. Otherwise, the notifier will wait for completion
> before returning (and we encounter a 2s timeout on resume).
> 
> Fixes: 0e9952804ec9c8 (Bluetooth: Clear suspend tasks on unregister)
> Signed-off-by: Abhishek Pandit-Subedi 
> ---
> Should have caught that unregister_pm_notifier was blocking last time
> but when testing the earlier patch, I got unlucky and saw that the error
> message was never hit (the suspend timeout).
> 
> When re-testing this patch on the same device, I was able to reproduce
> the problem on an older build with the 0e9952804ec9c8 but not on a newer
> build with the same patch. Changing the order correctly fixes it
> everywhere. Confirmed this by adding debug logs in btusb_disconnect and
> hci_suspend_notifier to confirm what order things were getting called.
> 
> Sorry about the churn. Next I'm going try to do something about the palm
> shaped indentation on my forehead...
> 
> net/bluetooth/hci_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel



Re: [PATCH v3 2/2] Bluetooth: sco: new getsockopt options BT_SNDMTU/BT_RCVMTU

2020-09-11 Thread Marcel Holtmann
Hi Joseph,

> This patch defines new getsockopt options BT_SNDMTU/BT_RCVMTU
> for SCO socket to be compatible with other bluetooth sockets.
> These new options return the same value as option SCO_OPTIONS
> which is already present on existing kernels.
> 
> Reviewed-by: Alain Michaud 
> Reviewed-by: Abhishek Pandit-Subedi 
> Signed-off-by: Joseph Hwang 
> ---
> 
> Changes in v3:
> - Fixed the commit message.
> 
> Changes in v2:
> - Used BT_SNDMTU/BT_RCVMTU instead of creating a new opt name.
> - Used the existing conn->mtu instead of creating a new member
>  in struct sco_pinfo.
> - Noted that the old SCO_OPTIONS in sco_sock_getsockopt_old()
>  would just work as it uses sco_pi(sk)->conn->mtu.
> 
> net/bluetooth/sco.c | 6 ++
> 1 file changed, 6 insertions(+)

patch has been applied to bluetooth-next tree.

Regards

Marcel



Re: [PATCH net-next + leds v2 6/7] net: phy: marvell: add support for LEDs controlled by Marvell PHYs

2020-09-11 Thread Matthias Schiffer
On Thu, 2020-09-10 at 17:00 +0200, Andrew Lunn wrote:
> > I propose that at least these HW modes should be available (and
> > documented) for ethernet PHY controlled LEDs:
> >   mode to determine link on:
> > - `link`
> >   mode for activity (these should blink):
> > - `activity` (both rx and tx), `rx`, `tx`
> >   mode for link (on) and activity (blink)
> > - `link/activity`, maybe `link/rx` and `link/tx`
> >   mode for every supported speed:
> > - `1Gbps`, `100Mbps`, `10Mbps`, ...
> >   mode for every supported cable type:
> > - `copper`, `fiber`, ... (are there others?)
> 
> In theory, there is AUI and BNC, but no modern device will have
> these.
> 
> >   mode that allows the user to determine link speed
> > - `speed` (or maybe `linkspeed` ?)
> > - on some Marvell PHYs the speed can be determined by how fast
> >   the LED is blinking (ie. 1Gbps blinks with default blinking
> >   frequency, 100Mbps with half blinking frequeny of 1Gbps,
> > 10Mbps
> >   of half blinking frequency of 100Mbps)
> > - on other Marvell PHYs this is instead:
> >   1Gpbs blinks 3 times, pause, 3 times, pause, ...
> >   100Mpbs blinks 2 times, pause, 2 times, pause, ...
> >   10Mpbs blinks 1 time, pause, 1 time, pause, ...
> > - we don't need to differentiate these modes with different
> > names,
> >   because the important thing is just that this mode allows the
> >   user to determine the speed from how the LED blinks
> >   mode to just force blinking
> > - `blink`
> > The nice thing is that all this can be documented and done in
> > software
> > as well.
> 
> Have you checked include/dt-bindings/net/microchip-lan78xx.h and
> mscc-phy-vsc8531.h ? If you are defining something generic, we need
> to
> make sure the majority of PHYs can actually do it. There is no
> standardization in this area. I'm sure there is some similarity,
> there
> is only so many ways you can blink an LED, but i suspect we need a
> mixture of standardized modes which we hope most PHYs implement, and
> the option to support hardware specific modes.
> 
> Andrew


FWIW, these are the LED HW trigger modes supported by the TI DP83867
PHY:

- Receive Error
- Receive Error or Transmit Error
- Link established, blink for transmit or receive activity
- Full duplex
- 100/1000BT link established
- 10/100BT link established
- 10BT link established
- 100BT link established
- 1000BT link established
- Collision detected
- Receive activity
- Transmit activity
- Receive or Transmit activity
- Link established

AFAIK, the "Link established, blink for transmit or receive activity"
is the only trigger that involves blinking; all other modes simply make
the LED light up when the condition is met. Setting the output level in
software is also possible.

Regarding the option to emulate unsupported HW triggers in software,
two questions come to my mind:

- Do all PHYs support manual setting of the LED level, or are the PHYs
that can only work with HW triggers?
- Is setting PHY registers always efficiently possible, or should SW
triggers be avoided in certain cases? I'm thinking about setups like
mdio-gpio. I guess this can only become an issue for triggers that
blink.


Kind regards,
Matthias



Re: [PATCH 0/2] Bluetooth: Report extended adv capabilities to userspace

2020-09-11 Thread Marcel Holtmann
Hi Daniel,

> This series improves the kernel/controller support that is reported
> to userspace for the following extended advertising features:
> 
> 1. If extended advertising is available, the number of hardware slots
> is used and reported, rather than the fixed default of 5. If no hardware
> support is available, default is used as before for software rotation.
> 
> 2. New flags indicating general hardware offloading and ability to
> set tx power level. These are kept as two separate flags because in
> the future vendor commands may allow tx power to be set without
> hardware offloading support.
> 
> 
> Daniel Winkler (2):
>  bluetooth: Report num supported adv instances for hw offloading
>  bluetooth: Add MGMT capability flags for tx power and ext advertising
> 
> include/net/bluetooth/mgmt.h | 2 ++
> net/bluetooth/hci_core.c | 2 +-
> net/bluetooth/mgmt.c | 8 +---
> 3 files changed, 8 insertions(+), 4 deletions(-)

both patches have been applied to bluetooth-next tree.

Regards

Marcel



Re: [PATCH 0/3] Bluetooth: Emit events for suspend/resume

2020-09-11 Thread Marcel Holtmann
Hi Abhishek,

> This series adds the suspend/resume events suggested in
> https://patchwork.kernel.org/patch/11663455/.
> 
> I have tested it with some userspace changes that monitors the
> controller resumed event to trigger audio device reconnection and
> verified that the events are correctly emitted.
> 
> Please take a look.
> Abhishek
> 
> 
> Abhishek Pandit-Subedi (3):
>  Bluetooth: Add mgmt suspend and resume events
>  Bluetooth: Add suspend reason for device disconnect
>  Bluetooth: Emit controller suspend and resume events
> 
> include/net/bluetooth/hci_core.h |  6 +++
> include/net/bluetooth/mgmt.h | 16 +++
> net/bluetooth/hci_core.c | 26 +++-
> net/bluetooth/hci_event.c| 73 
> net/bluetooth/mgmt.c | 28 
> 5 files changed, 148 insertions(+), 1 deletion(-)

can you please re-send this series. Unfortunately it seems I only have the 
cover letter, but lost the patches.

Regards

Marcel



Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook

2020-09-11 Thread Laura García Liébana
Hi Daniel,

On Tue, Sep 8, 2020 at 2:55 PM Daniel Borkmann  wrote:
>
> Hi Lukas,
>
> On 9/5/20 7:24 AM, Lukas Wunner wrote:
> > On Fri, Sep 04, 2020 at 11:14:37PM +0200, Daniel Borkmann wrote:
> >> On 9/4/20 6:21 PM, Lukas Wunner wrote:
> [...]
> >> The tc queueing layer which is below is not the tc egress hook; the
> >> latter is for filtering/mangling/forwarding or helping the lower tc
> >> queueing layer to classify.
> >
> > People want to apply netfilter rules on egress, so either we need an
> > egress hook in the xmit path or we'd have to teach tc to filter and
> > mangle based on netfilter rules.  The former seemed more straight-forward
> > to me but I'm happy to pursue other directions.
>
> I would strongly prefer something where nf integrates into existing tc hook,
> not only due to the hook reuse which would be better, but also to allow for a
> more flexible interaction between tc/BPF use cases and nf, to name one

That sounds good but I'm afraid that it would take too much back and
forth discussions. We'll really appreciate it if this small patch can
be unblocked and then rethink the refactoring of ingress/egress hooks
that you commented in another thread.

Thanks!


Re: [PATCHv11 bpf-next 2/5] xdp: add a new helper for dev map multicast support

2020-09-11 Thread Jesper Dangaard Brouer
On Thu, 10 Sep 2020 12:35:33 -0600
David Ahern  wrote:

> On 9/10/20 11:50 AM, Jesper Dangaard Brouer wrote:
> > Maybe we should change the devmap-prog approach, and run this on the
> > xdp_frame's (in bq_xmit_all() to be precise) .  Hangbin's patchset
> > clearly shows that we need this "layer" between running the xdp_prog and
> > the devmap-prog.   
> 
> I would prefer to leave it in dev_map_enqueue.
> 
> The main premise at the moment is that the program attached to the
> DEVMAP entry is an ACL specific to that dev. If the program is going to
> drop the packet, then no sense queueing it.
> 
> I also expect a follow on feature will be useful to allow the DEVMAP
> program to do another REDIRECT (e.g., potentially after modifying). It
> is not handled at the moment as it needs thought - e.g., limiting the
> number of iterative redirects. If such a feature does happen, then no
> sense queueing it to the current device.

It makes a lot of sense to do queuing before redirecting again.  The
(hidden) bulking we do at XDP redirect is the primary reason for the
performance boost. We all remember performance difference between
non-map version of redirect (which Toke fixed via always having the
bulking available in net_device->xdp_bulkq).

In a simple micro-benchmark I bet it will look better running the
devmap-prog right after the xdp_prog (which is what we have today). But
I claim this is the wrong approach, as soon as (1) traffic is more
intermixed, and (2) devmap-prog gets bigger and becomes more specific
to the egress-device (e.g. BPF update constants per egress-device).
When this happens performance suffers, as I-cache and data-access to
each egress-device gets pushed out of cache. (Hint VPP/fd.io approach)

Queuing xdp_frames up for your devmap-prog makes sense, as these share
common properties.  With intermix traffic the first xdp_prog will sort
packets into egress-devices, and then the devmap-prog can operate on
these.  The best illustration[1] of this sorting I saw in a Netflix
blogpost[2] about FreeBSD, section "RSS Assisted LRO" (not directly
related, but illustration was good).


[1] https://miro.medium.com/max/700/1%2alTGL1_D6hTMEMa7EDV8yZA.png
[2] 
https://netflixtechblog.com/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Re: KASAN: use-after-free Read in __xfrm6_tunnel_spi_lookup

2020-09-11 Thread Steffen Klassert
On Thu, Sep 10, 2020 at 10:09:50AM +0200, Dmitry Vyukov wrote:
> On Thu, Sep 10, 2020 at 10:08 AM B K Karthik  wrote:
> >
> > On Thu, Sep 10, 2020 at 1:32 PM Dmitry Vyukov  wrote:
> > >
> > > On Thu, Sep 10, 2020 at 9:20 AM Anant Thazhemadam
> > >  wrote:
> > > > Looks like this bug is no longer valid. I'm not sure which commit seems 
> > > > to have fixed it. Can this be marked as invalid or closed yet?
> > >
> > > You can see on the dashboard (or in mailing list archives) that B K
> > > Karthik tested a patch for this bug in July:
> > > https://syzkaller.appspot.com/bug?extid=72ff2fa98097767b5a27
> > >
> > > So perhaps that patch fixes it? Karthik, did you send it? Was it
> > > merged? Did the commit include the syzbot Reported-by tag?
> > >
> >
> > I did send it. I was taking a u32 spi value and casting it to a
> > pointer to an IP address. Steffen Klassert
> >  pointed out to me that the approach i
> > was looking at was completely wrong.
> > https://lkml.org/lkml/2020/7/27/361 is the conversation. hope this
> > helps.
> 
> +Steffen, was there any other fix merged for this?

I think that was already fixed before the sysbot report came in by
commit 8b404f46dd6a ("xfrm: interface: not xfrmi_ipv6/ipip_handler twice")


Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-09-11 Thread Yunsheng Lin
On 2020/9/11 4:07, Cong Wang wrote:
> On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin  wrote:
>>
>> Currently there is concurrent reset and enqueue operation for the
>> same lockless qdisc when there is no lock to synchronize the
>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause
>> out-of-bounds access for priv->ring[] in hns3 driver if user has
>> requested a smaller queue num when __dev_xmit_skb() still enqueue a
>> skb with a larger queue_mapping after the corresponding qdisc is
>> reset, and call hns3_nic_net_xmit() with that skb later.
>>
>> Reused the existing synchronize_net() in dev_deactivate_many() to
>> make sure skb with larger queue_mapping enqueued to old qdisc(which
>> is saved in dev_queue->qdisc_sleeping) will always be reset when
>> dev_reset_queue() is called.
>>
>> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
>> Signed-off-by: Yunsheng Lin 
>> ---
>> ChangeLog V2:
>> Reuse existing synchronize_net().
>> ---
>>  net/sched/sch_generic.c | 48 
>> +---
>>  1 file changed, 33 insertions(+), 15 deletions(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 265a61d..54c4172 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate);
>>
>>  static void qdisc_deactivate(struct Qdisc *qdisc)
>>  {
>> -   bool nolock = qdisc->flags & TCQ_F_NOLOCK;
>> -
>> if (qdisc->flags & TCQ_F_BUILTIN)
>> return;
>> -   if (test_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state))
>> -   return;
>> -
>> -   if (nolock)
>> -   spin_lock_bh(&qdisc->seqlock);
>> -   spin_lock_bh(qdisc_lock(qdisc));
>>
>> set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
>> -
>> -   qdisc_reset(qdisc);
>> -
>> -   spin_unlock_bh(qdisc_lock(qdisc));
>> -   if (nolock)
>> -   spin_unlock_bh(&qdisc->seqlock);
>>  }
>>
>>  static void dev_deactivate_queue(struct net_device *dev,
>> @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct net_device 
>> *dev,
>> }
>>  }
>>
>> +static void dev_reset_queue(struct net_device *dev,
>> +   struct netdev_queue *dev_queue,
>> +   void *_unused)
>> +{
>> +   struct Qdisc *qdisc;
>> +   bool nolock;
>> +
>> +   qdisc = dev_queue->qdisc_sleeping;
>> +   if (!qdisc)
>> +   return;
>> +
>> +   nolock = qdisc->flags & TCQ_F_NOLOCK;
>> +
>> +   if (nolock)
>> +   spin_lock_bh(&qdisc->seqlock);
>> +   spin_lock_bh(qdisc_lock(qdisc));
> 
> 
> I think you do not need this lock for lockless one.

It seems so.
Maybe another patch to remove qdisc_lock(qdisc) for lockless
qdisc?


> 
>> +
>> +   qdisc_reset(qdisc);
>> +
>> +   spin_unlock_bh(qdisc_lock(qdisc));
>> +   if (nolock)
>> +   spin_unlock_bh(&qdisc->seqlock);
>> +}
>> +
>>  static bool some_qdisc_is_busy(struct net_device *dev)
>>  {
>> unsigned int i;
>> @@ -1213,12 +1223,20 @@ void dev_deactivate_many(struct list_head *head)
>> dev_watchdog_down(dev);
>> }
>>
>> -   /* Wait for outstanding qdisc-less dev_queue_xmit calls.
>> +   /* Wait for outstanding qdisc-less dev_queue_xmit calls or
>> +* outstanding qdisc enqueuing calls.
>>  * This is avoided if all devices are in dismantle phase :
>>  * Caller will call synchronize_net() for us
>>  */
>> synchronize_net();
>>
>> +   list_for_each_entry(dev, head, close_list) {
>> +   netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
>> +
>> +   if (dev_ingress_queue(dev))
>> +   dev_reset_queue(dev, dev_ingress_queue(dev), NULL);
>> +   }
>> +
>> /* Wait for outstanding qdisc_run calls. */
>> list_for_each_entry(dev, head, close_list) {
>> while (some_qdisc_is_busy(dev)) {
> 
> Do you want to reset before waiting for TX action?
> 
> I think it is safer to do it after, at least prior to commit 759ae57f1b
> we did after.

The reference to the txq->qdisc is always protected by RCU, so the 
synchronize_net()
should be enought to ensure there is no skb enqueued to the old qdisc that is 
saved
in the dev_queue->qdisc_sleeping, because __dev_queue_xmit can only see the new 
qdisc
after synchronize_net(), which is noop_qdisc, and noop_qdisc will make sure any 
skb
enqueued to it will be dropped and freed, right?

If we do any additional reset that is not related to qdisc in 
dev_reset_queue(), we
can move it after some_qdisc_is_busy() checking.

Also, it seems the __QDISC_STATE_DEACTIVATED checking in qdisc_run() is 
unnecessary
after this patch, because after synchronize_net() qdisc_run() will now see the 
old
qdisc.

static inline void qdisc_run(struct Qdisc *q)
{
   

[PATCH v1 2/2] net: ag71xx: add flow control support

2020-09-11 Thread Oleksij Rempel
Add flow control support. The functionality was tested on AR9331 SoC and
confirmed by iperf3 results and HW counters exported over ethtool.
Following test configurations was used:

iMX6S receiver <--- TL-SG1005D switch < AR9331 sender

The switch is supporting symmytric flow control:
Settings for eth0:
Supported ports: [ MII ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes:  10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
--->>   Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100Mb/s
Duplex: Full
Auto-negotiation: on
Port: MII
PHYAD: 4
Transceiver: external
Link detected: yes

The iMX6S system was configured to 10Mbit, to let the switch use flow
control:
  - ethtool -s eth0 speed 10

With flow control disabled on AR9331:
  - ethtool -A eth0  rx off tx off
  - iperf3 -u -c 172.17.0.1 -b100M -l1472 -t10

[ ID] Interval   Transfer Bitrate JitterLost/Total 
Datagrams
[  5]   0.00-10.00  sec  66.2 MBytes  55.5 Mbits/sec  0.000 ms  0/47155 (0%)  
sender
[  5]   0.00-10.04  sec  11.5 MBytes  9.57 Mbits/sec  1.309 ms  38986/47146 
(83%)  receiver

With flow control enabled on AR9331:
  - ethtool -A eth0  rx on tx on
  - iperf3 -u -c 172.17.0.1 -b100M -l1472 -t10

[ ID] Interval   Transfer Bitrate JitterLost/Total 
Datagrams
[  5]   0.00-10.00  sec  15.1 MBytes  12.6 Mbits/sec  0.000 ms  0/10727 (0%)  
sender
[  5]   0.00-10.05  sec  11.5 MBytes  9.57 Mbits/sec  1.371 ms  2525/10689 
(24%)  receiver

Similar results are get in opposite direction by introducing extra CPU
load on AR9331:
  - chrt 40 dd if=/dev/zero of=/dev/null &

Signed-off-by: Oleksij Rempel 
---
 drivers/net/ethernet/atheros/ag71xx.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c 
b/drivers/net/ethernet/atheros/ag71xx.c
index 8c80a87aee58..dd5c8a9038bb 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -1056,6 +1056,8 @@ static void ag71xx_mac_validate(struct phylink_config 
*config,
 
phylink_set(mask, MII);
 
+   phylink_set(mask, Pause);
+   phylink_set(mask, Asym_Pause);
phylink_set(mask, Autoneg);
phylink_set(mask, 10baseT_Half);
phylink_set(mask, 10baseT_Full);
@@ -1106,7 +1108,7 @@ static void ag71xx_mac_link_up(struct phylink_config 
*config,
   bool tx_pause, bool rx_pause)
 {
struct ag71xx *ag = netdev_priv(to_net_dev(config->dev));
-   u32 cfg2;
+   u32 cfg1, cfg2;
u32 ifctl;
u32 fifo5;
 
@@ -1140,6 +1142,15 @@ static void ag71xx_mac_link_up(struct phylink_config 
*config,
ag71xx_wr(ag, AG71XX_REG_FIFO_CFG5, fifo5);
ag71xx_wr(ag, AG71XX_REG_MAC_IFCTL, ifctl);
 
+   cfg1 = ag71xx_rr(ag, AG71XX_REG_MAC_CFG1);
+   cfg1 &= ~(MAC_CFG1_TFC | MAC_CFG1_RFC);
+   if (tx_pause)
+   cfg1 |= MAC_CFG1_TFC;
+
+   if (rx_pause)
+   cfg1 |= MAC_CFG1_RFC;
+   ag71xx_wr(ag, AG71XX_REG_MAC_CFG1, cfg1);
+
ag71xx_hw_start(ag);
 }
 
-- 
2.28.0



[PATCH v1 0/2] ag71xx: add ethtool and flow control support

2020-09-11 Thread Oleksij Rempel
The main target of this patches is to provide flow control support
for ag71xx driver. To be able to validate this functionality, I also
added ethtool support with HW counters. So, this patches was validated
with iperf3 and counters showing Pause frames send or received by this
NIC.

Oleksij Rempel (2):
  net: ag71xx: add ethtool support
  net: ag71xx: add flow control support

 drivers/net/ethernet/atheros/ag71xx.c | 160 +-
 1 file changed, 159 insertions(+), 1 deletion(-)

-- 
2.28.0



[PATCH v1 1/2] net: ag71xx: add ethtool support

2020-09-11 Thread Oleksij Rempel
Add basic ethtool support. The functionality was tested on AR9331 SoC.

Signed-off-by: Oleksij Rempel 
---
 drivers/net/ethernet/atheros/ag71xx.c | 147 ++
 1 file changed, 147 insertions(+)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c 
b/drivers/net/ethernet/atheros/ag71xx.c
index 38cce66ef212..8c80a87aee58 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -235,6 +235,59 @@
| NETIF_MSG_RX_ERR  \
| NETIF_MSG_TX_ERR)
 
+struct ag71xx_statistic {
+   unsigned short offset;
+   u32 mask;
+   const char name[ETH_GSTRING_LEN];
+};
+
+static const struct ag71xx_statistic ag71xx_statistics[] = {
+   { 0x0080, GENMASK(17, 0), "Tx/Rx 64 Byte", },
+   { 0x0084, GENMASK(17, 0), "Tx/Rx 65-127 Byte", },
+   { 0x0088, GENMASK(17, 0), "Tx/Rx 128-255 Byte", },
+   { 0x008C, GENMASK(17, 0), "Tx/Rx 256-511 Byte", },
+   { 0x0090, GENMASK(17, 0), "Tx/Rx 512-1023 Byte", },
+   { 0x0094, GENMASK(17, 0), "Tx/Rx 1024-1518 Byte", },
+   { 0x0098, GENMASK(17, 0), "Tx/Rx 1519-1522 Byte VLAN", },
+   { 0x009C, GENMASK(23, 0), "Rx Byte", },
+   { 0x00A0, GENMASK(17, 0), "Rx Packet", },
+   { 0x00A4, GENMASK(11, 0), "Rx FCS Error", },
+   { 0x00A8, GENMASK(17, 0), "Rx Multicast Packet", },
+   { 0x00AC, GENMASK(21, 0), "Rx Broadcast Packet", },
+   { 0x00B0, GENMASK(17, 0), "Rx Control Frame Packet", },
+   { 0x00B4, GENMASK(11, 0), "Rx Pause Frame Packet", },
+   { 0x00B8, GENMASK(11, 0), "Rx Unknown OPCode Packet", },
+   { 0x00BC, GENMASK(11, 0), "Rx Alignment Error", },
+   { 0x00C0, GENMASK(15, 0), "Rx Frame Length Error", },
+   { 0x00C4, GENMASK(11, 0), "Rx Code Error", },
+   { 0x00C8, GENMASK(11, 0), "Rx Carrier Sense Error", },
+   { 0x00CC, GENMASK(11, 0), "Rx Undersize Packet", },
+   { 0x00D0, GENMASK(11, 0), "Rx Oversize Packet", },
+   { 0x00D4, GENMASK(11, 0), "Rx Fragments", },
+   { 0x00D8, GENMASK(11, 0), "Rx Jabber", },
+   { 0x00DC, GENMASK(11, 0), "Rx Dropped Packet", },
+   { 0x00E0, GENMASK(23, 0), "Tx Byte", },
+   { 0x00E4, GENMASK(17, 0), "Tx Packet", },
+   { 0x00E8, GENMASK(17, 0), "Tx Multicast Packet", },
+   { 0x00EC, GENMASK(17, 0), "Tx Broadcast Packet", },
+   { 0x00F0, GENMASK(11, 0), "Tx Pause Control Frame", },
+   { 0x00F4, GENMASK(11, 0), "Tx Deferral Packet", },
+   { 0x00F8, GENMASK(11, 0), "Tx Excessive Deferral Packet", },
+   { 0x00FC, GENMASK(11, 0), "Tx Single Collision Packet", },
+   { 0x0100, GENMASK(11, 0), "Tx Multiple Collision", },
+   { 0x0104, GENMASK(11, 0), "Tx Late Collision Packet", },
+   { 0x0108, GENMASK(11, 0), "Tx Excessive Collision Packet", },
+   { 0x010C, GENMASK(12, 0), "Tx Total Collision", },
+   { 0x0110, GENMASK(11, 0), "Tx Pause Frames Honored", },
+   { 0x0114, GENMASK(11, 0), "Tx Drop Frame", },
+   { 0x0118, GENMASK(11, 0), "Tx Jabber Frame", },
+   { 0x011C, GENMASK(11, 0), "Tx FCS Error", },
+   { 0x0120, GENMASK(11, 0), "Tx Control Frame", },
+   { 0x0124, GENMASK(11, 0), "Tx Oversize Frame", },
+   { 0x0128, GENMASK(11, 0), "Tx Undersize Frame", },
+   { 0x012C, GENMASK(11, 0), "Tx Fragment", },
+};
+
 #define DESC_EMPTY BIT(31)
 #define DESC_MORE  BIT(24)
 #define DESC_PKTLEN_M  0xfff
@@ -394,6 +447,99 @@ static void ag71xx_int_disable(struct ag71xx *ag, u32 ints)
ag71xx_cb(ag, AG71XX_REG_INT_ENABLE, ints);
 }
 
+static void ag71xx_get_drvinfo(struct net_device *ndev,
+  struct ethtool_drvinfo *info)
+{
+   struct ag71xx *ag = netdev_priv(ndev);
+
+   strlcpy(info->driver, "ag71xx", sizeof(info->driver));
+   strlcpy(info->bus_info, of_node_full_name(ag->pdev->dev.of_node),
+   sizeof(info->bus_info));
+}
+
+static int ag71xx_get_link_ksettings(struct net_device *ndev,
+  struct ethtool_link_ksettings *kset)
+{
+   struct ag71xx *ag = netdev_priv(ndev);
+
+   return phylink_ethtool_ksettings_get(ag->phylink, kset);
+}
+
+static int ag71xx_set_link_ksettings(struct net_device *ndev,
+  const struct ethtool_link_ksettings *kset)
+{
+   struct ag71xx *ag = netdev_priv(ndev);
+
+   return phylink_ethtool_ksettings_set(ag->phylink, kset);
+}
+
+static int ag71xx_ethtool_nway_reset(struct net_device *ndev)
+{
+   struct ag71xx *ag = netdev_priv(ndev);
+
+   return phylink_ethtool_nway_reset(ag->phylink);
+}
+
+static void ag71xx_ethtool_get_pauseparam(struct net_device *ndev,
+ struct ethtool_pauseparam *pause)
+{
+   struct ag71xx *ag = netdev_priv(ndev);
+
+   phylink_ethtool_get_pauseparam(ag->phylink, pause);
+}
+
+static int ag71xx_ethtool_set_pauseparam(struct net_device *ndev,
+struct ethtool_pausepar

Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-09-11 Thread Yunsheng Lin
On 2020/9/11 16:13, Yunsheng Lin wrote:
> On 2020/9/11 4:07, Cong Wang wrote:
>> On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin  wrote:
>>>
>>> Currently there is concurrent reset and enqueue operation for the
>>> same lockless qdisc when there is no lock to synchronize the
>>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
>>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause
>>> out-of-bounds access for priv->ring[] in hns3 driver if user has
>>> requested a smaller queue num when __dev_xmit_skb() still enqueue a
>>> skb with a larger queue_mapping after the corresponding qdisc is
>>> reset, and call hns3_nic_net_xmit() with that skb later.
>>>
>>> Reused the existing synchronize_net() in dev_deactivate_many() to
>>> make sure skb with larger queue_mapping enqueued to old qdisc(which
>>> is saved in dev_queue->qdisc_sleeping) will always be reset when
>>> dev_reset_queue() is called.
>>>
>>> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
>>> Signed-off-by: Yunsheng Lin 
>>> ---
>>> ChangeLog V2:
>>> Reuse existing synchronize_net().
>>> ---
>>>  net/sched/sch_generic.c | 48 
>>> +---
>>>  1 file changed, 33 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>>> index 265a61d..54c4172 100644
>>> --- a/net/sched/sch_generic.c
>>> +++ b/net/sched/sch_generic.c
>>> @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate);
>>>
>>>  static void qdisc_deactivate(struct Qdisc *qdisc)
>>>  {
>>> -   bool nolock = qdisc->flags & TCQ_F_NOLOCK;
>>> -
>>> if (qdisc->flags & TCQ_F_BUILTIN)
>>> return;
>>> -   if (test_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state))
>>> -   return;
>>> -
>>> -   if (nolock)
>>> -   spin_lock_bh(&qdisc->seqlock);
>>> -   spin_lock_bh(qdisc_lock(qdisc));
>>>
>>> set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
>>> -
>>> -   qdisc_reset(qdisc);
>>> -
>>> -   spin_unlock_bh(qdisc_lock(qdisc));
>>> -   if (nolock)
>>> -   spin_unlock_bh(&qdisc->seqlock);
>>>  }
>>>
>>>  static void dev_deactivate_queue(struct net_device *dev,
>>> @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct net_device 
>>> *dev,
>>> }
>>>  }
>>>
>>> +static void dev_reset_queue(struct net_device *dev,
>>> +   struct netdev_queue *dev_queue,
>>> +   void *_unused)
>>> +{
>>> +   struct Qdisc *qdisc;
>>> +   bool nolock;
>>> +
>>> +   qdisc = dev_queue->qdisc_sleeping;
>>> +   if (!qdisc)
>>> +   return;
>>> +
>>> +   nolock = qdisc->flags & TCQ_F_NOLOCK;
>>> +
>>> +   if (nolock)
>>> +   spin_lock_bh(&qdisc->seqlock);
>>> +   spin_lock_bh(qdisc_lock(qdisc));
>>
>>
>> I think you do not need this lock for lockless one.
> 
> It seems so.
> Maybe another patch to remove qdisc_lock(qdisc) for lockless
> qdisc?
> 
> 
>>
>>> +
>>> +   qdisc_reset(qdisc);
>>> +
>>> +   spin_unlock_bh(qdisc_lock(qdisc));
>>> +   if (nolock)
>>> +   spin_unlock_bh(&qdisc->seqlock);
>>> +}
>>> +
>>>  static bool some_qdisc_is_busy(struct net_device *dev)
>>>  {
>>> unsigned int i;
>>> @@ -1213,12 +1223,20 @@ void dev_deactivate_many(struct list_head *head)
>>> dev_watchdog_down(dev);
>>> }
>>>
>>> -   /* Wait for outstanding qdisc-less dev_queue_xmit calls.
>>> +   /* Wait for outstanding qdisc-less dev_queue_xmit calls or
>>> +* outstanding qdisc enqueuing calls.
>>>  * This is avoided if all devices are in dismantle phase :
>>>  * Caller will call synchronize_net() for us
>>>  */
>>> synchronize_net();
>>>
>>> +   list_for_each_entry(dev, head, close_list) {
>>> +   netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
>>> +
>>> +   if (dev_ingress_queue(dev))
>>> +   dev_reset_queue(dev, dev_ingress_queue(dev), NULL);
>>> +   }
>>> +
>>> /* Wait for outstanding qdisc_run calls. */
>>> list_for_each_entry(dev, head, close_list) {
>>> while (some_qdisc_is_busy(dev)) {
>>
>> Do you want to reset before waiting for TX action?
>>
>> I think it is safer to do it after, at least prior to commit 759ae57f1b
>> we did after.
> 
> The reference to the txq->qdisc is always protected by RCU, so the 
> synchronize_net()
> should be enought to ensure there is no skb enqueued to the old qdisc that is 
> saved
> in the dev_queue->qdisc_sleeping, because __dev_queue_xmit can only see the 
> new qdisc
> after synchronize_net(), which is noop_qdisc, and noop_qdisc will make sure 
> any skb
> enqueued to it will be dropped and freed, right?
> 
> If we do any additional reset that is not related to qdisc in 
> dev_reset_queue(), we
> can move it after some_qdisc_is_busy() checking.
> 
> Also, it seems the __QDISC_ST

Re: [PATCH v2] hv_netvsc: Add validation for untrusted Hyper-V values

2020-09-11 Thread Andrea Parri
> > @@ -740,12 +755,45 @@ static void netvsc_send_completion(struct
> > net_device *ndev,
> >int budget)
> >  {
> > const struct nvsp_message *nvsp_packet = hv_pkt_data(desc);
> > +   u32 msglen = hv_pkt_datalen(desc);
> > +
> > +   /* Ensure packet is big enough to read header fields */
> > +   if (msglen < sizeof(struct nvsp_message_header)) {
> > +   netdev_err(ndev, "nvsp_message length too small: %u\n",
> > msglen);
> > +   return;
> > +   }
> > 
> > switch (nvsp_packet->hdr.msg_type) {
> > case NVSP_MSG_TYPE_INIT_COMPLETE:
> > +   if (msglen < sizeof(struct nvsp_message_init_complete)) {
> 
> This and other similar places should include header size:
>   if (msglen < sizeof(struct nvsp_message_header) + sizeof(struct 
> nvsp_message_init_complete)) {

Thanks for pointing this out; fixing for v3...

  Andrea


Cloudflare L4LB - UNIMOG - using XDP and TC cls

2020-09-11 Thread Marek Majkowski
Hello,

I know the community is looking for examples of eBPF usage. David from
Cloudflare wrote a blog post about our Layer 4 Load Balancer called
UNIMOG. It's a long read but goes into many architectural details:

https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/

We added the tc cls component to the selftests:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/test_cls_redirect.c

Cheers,
Marek


Re: [PATCH] ath11k: Remove unused inline function htt_htt_stats_debug_dump()

2020-09-11 Thread Kalle Valo
YueHaibing  wrote:

> There is no caller in tree, so can remove it.
> 
> Signed-off-by: YueHaibing 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

9bc260653a1d ath11k: Remove unused inline function htt_htt_stats_debug_dump()

-- 
https://patchwork.kernel.org/patch/11765693/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH RESEND bpf-next v3 0/9] bpf: Support multi-attach for freplace programs

2020-09-11 Thread Toke Høiland-Jørgensen
This series adds support attaching freplace BPF programs to multiple targets.
This is needed to support incremental attachment of multiple XDP programs using
the libxdp dispatcher model.

The first three patches are refactoring patches: The first one is a trivial
change to the logging in the verifier, split out to make the subsequent refactor
easier to read. Patch 2 refactors check_attach_btf_id() so that the checks on
program and target compatibility can be reused when attaching to a secondary
location.

Patch 3 changes prog_aux->linked_prog to be an embedded bpf_tracing_link that is
initialised at program load time. This nicely encapsulates both the trampoline
and the prog reference, and moves the release of these references into bpf_link
teardown. At raw_tracepoint_open() time (i.e., when the link is attached), it
will be removed from the extension prog, and primed as a regular bpf_link.

Based on these refactorings, it becomes pretty straight-forward to support
multiple-attach for freplace programs (patch 4). This is simply a matter of
creating a second bpf_tracing_link if a target is supplied to
raw_tracepoint_open().

Patch 5 is a port of Jiri Olsa's patch to support fentry/fexit on freplace
programs. His approach of getting the target type from the target program
reference no longer works after we've gotten rid of linked_prog (because the
bpf_tracing_link reference disappears on attach). Instead, we used the saved
reference to the target prog type that is also used to verify compatibility on
secondary freplace attachment.

Patches 6-7 are tools and libbpf updates, and patches 8-9 are selftests, the
first one for the multi-freplace functionality itself, and the second one is
Jiri's previous selftest for the fentry-to-freplace fix.

With this series, libxdp and xdp-tools can successfully attach multiple programs
one at a time. To play with this, use the 'freplace-multi-attach' branch of
xdp-tools:

$ git clone --recurse-submodules --branch freplace-multi-attach 
https://github.com/xdp-project/xdp-tools
$ cd xdp-tools
$ make
$ sudo ./xdp-loader/xdp-loader load veth0 lib/testing/xdp_drop.o
$ sudo ./xdp-loader/xdp-loader load veth0 lib/testing/xdp_pass.o
$ sudo ./xdp-loader/xdp-loader status

The series is also available here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=bpf-freplace-multi-attach-alt-03

Changelog:

v3:
  - Get rid of prog_aux->linked_prog entirely in favour of a bpf_tracing_link
  - Incorporate Jiri's fix for attaching fentry to freplace programs

v2:
  - Drop the log arguments from bpf_raw_tracepoint_open
  - Fix kbot errors
  - Rebase to latest bpf-next

---

Jiri Olsa (1):
  selftests/bpf: Adding test for arg dereference in extension trace

Toke Høiland-Jørgensen (8):
  bpf: change logging calls from verbose() to bpf_log() and use log pointer
  bpf: verifier: refactor check_attach_btf_id()
  bpf: wrap prog->aux->linked_prog in a bpf_tracing_link
  bpf: support attaching freplace programs to multiple attach points
  bpf: Fix context type resolving for extension programs
  tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  libbpf: add support for supplying target to bpf_raw_tracepoint_open()
  selftests: add test for multiple attachments of freplace program


 include/linux/bpf.h   |  33 ++-
 include/linux/bpf_verifier.h  |   9 +
 include/uapi/linux/bpf.h  |   6 +-
 kernel/bpf/btf.c  |  22 +-
 kernel/bpf/core.c |   5 +-
 kernel/bpf/syscall.c  | 161 +--
 kernel/bpf/trampoline.c   |  34 ++-
 kernel/bpf/verifier.c | 251 ++
 tools/include/uapi/linux/bpf.h|   6 +-
 tools/lib/bpf/bpf.c   |  13 +-
 tools/lib/bpf/bpf.h   |   9 +
 tools/lib/bpf/libbpf.map  |   1 +
 .../selftests/bpf/prog_tests/fexit_bpf2bpf.c  | 171 +---
 .../selftests/bpf/prog_tests/trace_ext.c  |  93 +++
 .../bpf/progs/freplace_get_constant.c |  15 ++
 .../selftests/bpf/progs/test_trace_ext.c  |  18 ++
 .../bpf/progs/test_trace_ext_tracing.c|  25 ++
 17 files changed, 683 insertions(+), 189 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/freplace_get_constant.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext_tracing.c



[PATCH RESEND bpf-next v3 1/9] bpf: change logging calls from verbose() to bpf_log() and use log pointer

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.

Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/bpf.h  |2 +-
 include/linux/bpf_verifier.h |5 +++-
 kernel/bpf/btf.c |6 +++--
 kernel/bpf/verifier.c|   48 +-
 4 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c6d9f2c444f4..5ad4a935a24e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1394,7 +1394,7 @@ int btf_check_func_arg_match(struct bpf_verifier_env 
*env, int subprog,
 struct bpf_reg_state *regs);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
  struct bpf_reg_state *reg);
-int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
+int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog 
*prog,
 struct btf *btf, const struct btf_type *t);
 
 struct bpf_prog *bpf_prog_by_id(u32 id);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 53c7bd568c5d..20009e766805 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -347,8 +347,9 @@ static inline bool bpf_verifier_log_full(const struct 
bpf_verifier_log *log)
 
 static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 {
-   return (log->level && log->ubuf && !bpf_verifier_log_full(log)) ||
-   log->level == BPF_LOG_KERNEL;
+   return log &&
+   ((log->level && log->ubuf && !bpf_verifier_log_full(log)) ||
+log->level == BPF_LOG_KERNEL);
 }
 
 #define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index f9ac6935ab3c..2ace56c99c36 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -4401,7 +4401,7 @@ static int btf_check_func_type_match(struct 
bpf_verifier_log *log,
 }
 
 /* Compare BTFs of given program with BTF of target program */
-int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
+int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog 
*prog,
 struct btf *btf2, const struct btf_type *t2)
 {
struct btf *btf1 = prog->aux->btf;
@@ -4409,7 +4409,7 @@ int btf_check_type_match(struct bpf_verifier_env *env, 
struct bpf_prog *prog,
u32 btf_id = 0;
 
if (!prog->aux->func_info) {
-   bpf_log(&env->log, "Program extension requires BTF\n");
+   bpf_log(log, "Program extension requires BTF\n");
return -EINVAL;
}
 
@@ -4421,7 +4421,7 @@ int btf_check_type_match(struct bpf_verifier_env *env, 
struct bpf_prog *prog,
if (!t1 || !btf_type_is_func(t1))
return -EFAULT;
 
-   return btf_check_func_type_match(&env->log, btf1, t1, btf2, t2);
+   return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }
 
 /* Compare BTF of a function with given bpf_reg_state.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 814bc6c1ad16..0be7a187fb7f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11043,6 +11043,7 @@ static int check_attach_btf_id(struct bpf_verifier_env 
*env)
struct bpf_prog *prog = env->prog;
bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
struct bpf_prog *tgt_prog = prog->aux->linked_prog;
+   struct bpf_verifier_log *log = &env->log;
u32 btf_id = prog->aux->attach_btf_id;
const char prefix[] = "btf_trace_";
struct btf_func_model fmodel;
@@ -11070,23 +11071,23 @@ static int check_attach_btf_id(struct 
bpf_verifier_env *env)
return 0;
 
if (!btf_id) {
-   verbose(env, "Tracing programs must provide btf_id\n");
+   bpf_log(log, "Tracing programs must provide btf_id\n");
return -EINVAL;
}
btf = bpf_prog_get_target_btf(prog);
if (!btf) {
-   verbose(env,
+   bpf_log(log,
"FENTRY/FEXIT program can only be attached to another 
program annotated with BTF\n");
return -EINVAL;
}
t = btf_type_by_id(btf, btf_id);
if (!t) {
-   verbose(env, "attach_btf_id %u is invalid\n", btf_id);
+   bpf_log(log, "attach_btf_id %u is invalid\n", btf_id);
return -EINVAL;
}
tname = btf_name_by_offset(btf, t->name_off);
if (!tname) {
-   verbose(env, "attach_btf_id %u doesn't have a

Re: [PATCH] ath10k: Remove unused macro ATH10K_ROC_TIMEOUT_HZ

2020-09-11 Thread Kalle Valo
YueHaibing  wrote:

> There is no caller in tree, so can remove it.
> 
> Signed-off-by: YueHaibing 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

42a08ff79ff5 ath10k: Remove unused macro ATH10K_ROC_TIMEOUT_HZ

-- 
https://patchwork.kernel.org/patch/11765703/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH RESEND bpf-next v3 3/9] bpf: wrap prog->aux->linked_prog in a bpf_tracing_link

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The bpf_tracing_link structure is a convenient data structure to contain
the reference to a linked program; in preparation for supporting multiple
attachments for the same freplace program, move the linked_prog in
prog->aux into a bpf_tracing_link wrapper.

With this change, it is no longer possible to attach the same tracing
program multiple times (detaching in-between), since the reference from the
tracing program to the target disappears on the first attach. However,
since the next patch will let the caller supply an attach target, that will
also make it possible to attach to the same place multiple times.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/bpf.h |   21 +---
 kernel/bpf/btf.c|   13 +---
 kernel/bpf/core.c   |5 +--
 kernel/bpf/syscall.c|   81 +--
 kernel/bpf/trampoline.c |   12 ++-
 kernel/bpf/verifier.c   |   13 +---
 6 files changed, 102 insertions(+), 43 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7f19c3216370..722c60f1c1fc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -26,6 +26,7 @@ struct bpf_verifier_log;
 struct perf_event;
 struct bpf_prog;
 struct bpf_prog_aux;
+struct bpf_tracing_link;
 struct bpf_map;
 struct sock;
 struct seq_file;
@@ -614,8 +615,8 @@ static __always_inline unsigned int bpf_dispatcher_nop_func(
 }
 #ifdef CONFIG_BPF_JIT
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
-int bpf_trampoline_link_prog(struct bpf_prog *prog);
-int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
+int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
+int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline 
*tr);
 int bpf_trampoline_get(u64 key, void *addr,
   struct btf_func_model *fmodel,
   struct bpf_trampoline **trampoline);
@@ -667,11 +668,13 @@ static inline struct bpf_trampoline 
*bpf_trampoline_lookup(u64 key)
 {
return NULL;
 }
-static inline int bpf_trampoline_link_prog(struct bpf_prog *prog)
+static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
+  struct bpf_trampoline *tr)
 {
return -ENOTSUPP;
 }
-static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
+static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog,
+struct bpf_trampoline *tr)
 {
return -ENOTSUPP;
 }
@@ -740,14 +743,13 @@ struct bpf_prog_aux {
u32 max_rdonly_access;
u32 max_rdwr_access;
const struct bpf_ctx_arg_aux *ctx_arg_info;
-   struct bpf_prog *linked_prog;
+   struct bpf_tracing_link *tgt_link;
bool verifier_zext; /* Zero extensions has been inserted by verifier. */
bool offload_requested;
bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
bool func_proto_unreliable;
bool sleepable;
enum bpf_tramp_prog_type trampoline_prog_type;
-   struct bpf_trampoline *trampoline;
struct hlist_node tramp_hlist;
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
@@ -827,6 +829,13 @@ struct bpf_link {
struct work_struct work;
 };
 
+struct bpf_tracing_link {
+   struct bpf_link link;
+   enum bpf_attach_type attach_type;
+   struct bpf_trampoline *trampoline;
+   struct bpf_prog *tgt_prog;
+};
+
 struct bpf_link_ops {
void (*release)(struct bpf_link *link);
void (*dealloc)(struct bpf_link *link);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 2ace56c99c36..e10f13f8251c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3706,10 +3706,10 @@ struct btf *btf_parse_vmlinux(void)
 
 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
 {
-   struct bpf_prog *tgt_prog = prog->aux->linked_prog;
+   struct bpf_tracing_link *tgt_link = prog->aux->tgt_link;
 
-   if (tgt_prog) {
-   return tgt_prog->aux->btf;
+   if (tgt_link && tgt_link->tgt_prog) {
+   return tgt_link->tgt_prog->aux->btf;
} else {
return btf_vmlinux;
}
@@ -3733,14 +3733,17 @@ bool btf_ctx_access(int off, int size, enum 
bpf_access_type type,
struct bpf_insn_access_aux *info)
 {
const struct btf_type *t = prog->aux->attach_func_proto;
-   struct bpf_prog *tgt_prog = prog->aux->linked_prog;
struct btf *btf = bpf_prog_get_target_btf(prog);
const char *tname = prog->aux->attach_func_name;
struct bpf_verifier_log *log = info->log;
+   struct bpf_prog *tgt_prog = NULL;
const struct btf_param *args;
u32 nr_args, arg;
int i, ret;
 
+   if (prog->aux->tgt_link)
+   tgt_prog = prog->aux->tgt_link->tgt_prog;
+
if (off % 8) {
bpf_log(log, "func '%

[PATCH RESEND bpf-next v3 5/9] bpf: Fix context type resolving for extension programs

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Eelco reported we can't properly access arguments if the tracing
program is attached to extension program.

Having following program:

  SEC("classifier/test_pkt_md_access")
  int test_pkt_md_access(struct __sk_buff *skb)

with its extension:

  SEC("freplace/test_pkt_md_access")
  int test_pkt_md_access_new(struct __sk_buff *skb)

and tracing that extension with:

  SEC("fentry/test_pkt_md_access_new")
  int BPF_PROG(fentry, struct sk_buff *skb)

It's not possible to access skb argument in the fentry program,
with following error from verifier:

  ; int BPF_PROG(fentry, struct sk_buff *skb)
  0: (79) r1 = *(u64 *)(r1 +0)
  invalid bpf_context access off=0 size=8

The problem is that btf_ctx_access gets the context type for the
traced program, which is in this case the extension.

But when we trace extension program, we want to get the context
type of the program that the extension is attached to, so we can
access the argument properly in the trace program.

This version of the patch is tweaked slightly from Jiri's original one,
since the refactoring in the previous patches means we have to get the
target prog type from the new variable in prog->aux instead of directly
from the target prog.

Reported-by: Eelco Chaudron 
Suggested-by: Jiri Olsa 
Signed-off-by: Toke Høiland-Jørgensen 
---
 kernel/bpf/btf.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e10f13f8251c..1a48253ba168 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3863,7 +3863,14 @@ bool btf_ctx_access(int off, int size, enum 
bpf_access_type type,
 
info->reg_type = PTR_TO_BTF_ID;
if (tgt_prog) {
-   ret = btf_translate_to_vmlinux(log, btf, t, tgt_prog->type, 
arg);
+   enum bpf_prog_type tgt_type;
+
+   if (tgt_prog->type == BPF_PROG_TYPE_EXT)
+   tgt_type = tgt_prog->aux->tgt_prog_type;
+   else
+   tgt_type = tgt_prog->type;
+
+   ret = btf_translate_to_vmlinux(log, btf, t, tgt_type, arg);
if (ret > 0) {
info->btf_id = ret;
return true;



[PATCH RESEND bpf-next v3 2/9] bpf: verifier: refactor check_attach_btf_id()

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The check_attach_btf_id() function really does three things:

1. It performs a bunch of checks on the program to ensure that the
   attachment is valid.

2. It stores a bunch of state about the attachment being requested in
   the verifier environment and struct bpf_prog objects.

3. It allocates a trampoline for the attachment.

This patch splits out (1.) and (3.) into separate functions in preparation
for reusing them when the actual attachment is happening (in the
raw_tracepoint_open syscall operation), which will allow tracing programs
to have multiple (compatible) attachments.

No functional change is intended with this patch.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/bpf.h  |9 ++
 include/linux/bpf_verifier.h |9 ++
 kernel/bpf/trampoline.c  |   22 
 kernel/bpf/verifier.c|  233 +++---
 4 files changed, 170 insertions(+), 103 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5ad4a935a24e..7f19c3216370 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -616,6 +616,9 @@ static __always_inline unsigned int bpf_dispatcher_nop_func(
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
+int bpf_trampoline_get(u64 key, void *addr,
+  struct btf_func_model *fmodel,
+  struct bpf_trampoline **trampoline);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
 #define BPF_DISPATCHER_INIT(_name) {   \
.mutex = __MUTEX_INITIALIZER(_name.mutex),  \
@@ -672,6 +675,12 @@ static inline int bpf_trampoline_unlink_prog(struct 
bpf_prog *prog)
 {
return -ENOTSUPP;
 }
+static inline int bpf_trampoline_get(u64 key, void *addr,
+struct btf_func_model *fmodel,
+struct bpf_trampoline **trampoline)
+{
+   return -EOPNOTSUPP;
+}
 static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 #define DEFINE_BPF_DISPATCHER(name)
 #define DECLARE_BPF_DISPATCHER(name)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 20009e766805..db3db0b69aad 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -447,4 +447,13 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env 
*env, u32 off, u32 cnt);
 int check_ctx_reg(struct bpf_verifier_env *env,
  const struct bpf_reg_state *reg, int regno);
 
+int bpf_check_attach_target(struct bpf_verifier_log *log,
+   const struct bpf_prog *prog,
+   const struct bpf_prog *tgt_prog,
+   u32 btf_id,
+   struct btf_func_model *fmodel,
+   long *tgt_addr,
+   const char **tgt_name,
+   const struct btf_type **tgt_type);
+
 #endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 7dd523a7e32d..cb442c7ece10 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -336,6 +336,28 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
return err;
 }
 
+int bpf_trampoline_get(u64 key, void *addr,
+  struct btf_func_model *fmodel,
+  struct bpf_trampoline **trampoline)
+{
+   struct bpf_trampoline *tr;
+
+   tr = bpf_trampoline_lookup(key);
+   if (!tr)
+   return -ENOMEM;
+
+   mutex_lock(&tr->mutex);
+   if (tr->func.addr)
+   goto out;
+
+   memcpy(&tr->func.model, fmodel, sizeof(*fmodel));
+   tr->func.addr = addr;
+out:
+   mutex_unlock(&tr->mutex);
+   *trampoline = tr;
+   return 0;
+}
+
 void bpf_trampoline_put(struct bpf_trampoline *tr)
 {
if (!tr)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0be7a187fb7f..f2624784b915 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11038,43 +11038,29 @@ static int check_non_sleepable_error_inject(u32 
btf_id)
return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
 }
 
-static int check_attach_btf_id(struct bpf_verifier_env *env)
+int bpf_check_attach_target(struct bpf_verifier_log *log,
+   const struct bpf_prog *prog,
+   const struct bpf_prog *tgt_prog,
+   u32 btf_id,
+   struct btf_func_model *fmodel,
+   long *tgt_addr,
+   const char **tgt_name,
+   const struct btf_type **tgt_type)
 {
-   struct bpf_prog *prog = env->prog;
bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
-   struct bpf_prog *tgt_prog = prog->aux->linked_prog;
-   struct bpf_verifier_log *log = &env->l

[PATCH RESEND bpf-next v3 4/9] bpf: support attaching freplace programs to multiple attach points

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

This enables support for attaching freplace programs to multiple attach
points. It does this by amending UAPI for bpf_raw_tracepoint_open with a
target prog fd and btf ID pair that can be used to supply the new
attachment point. The target must be compatible with the target that was
supplied at program load time.

The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, the bpf_tracing_prog_attach() function will reject new targets for
anything other than PROG_TYPE_EXT programs.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/bpf.h  |3 +
 include/uapi/linux/bpf.h |6 ++-
 kernel/bpf/syscall.c |   96 +++---
 kernel/bpf/verifier.c|9 
 4 files changed, 97 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 722c60f1c1fc..c6b856b2d296 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -753,6 +753,9 @@ struct bpf_prog_aux {
struct hlist_node tramp_hlist;
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
+   /* target BPF prog types for trace programs */
+   enum bpf_prog_type tgt_prog_type;
+   enum bpf_attach_type tgt_attach_type;
/* function name for valid attach_btf_id */
const char *attach_func_name;
struct bpf_prog **func;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 90359cab501d..0885ab6ac8d9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -595,8 +595,10 @@ union bpf_attr {
} query;
 
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
-   __u64 name;
-   __u32 prog_fd;
+   __u64   name;
+   __u32   prog_fd;
+   __u32   tgt_prog_fd;
+   __u32   tgt_btf_id;
} raw_tracepoint;
 
struct { /* anonymous struct for BPF_BTF_LOAD */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2d238aa8962e..7b1da5f063eb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2582,10 +2583,16 @@ static struct bpf_tracing_link 
*bpf_tracing_link_create(struct bpf_prog *prog,
return link;
 }
 
-static int bpf_tracing_prog_attach(struct bpf_prog *prog)
+static int bpf_tracing_prog_attach(struct bpf_prog *prog,
+  int tgt_prog_fd,
+  u32 btf_id)
 {
-   struct bpf_tracing_link *link, *olink;
struct bpf_link_primer link_primer;
+   struct bpf_prog *tgt_prog = NULL;
+   struct bpf_tracing_link *link;
+   struct btf_func_model fmodel;
+   long addr;
+   u64 key;
int err;
 
switch (prog->type) {
@@ -2613,28 +2620,80 @@ static int bpf_tracing_prog_attach(struct bpf_prog 
*prog)
err = -EINVAL;
goto out_put_prog;
}
+   if (tgt_prog_fd) {
+   /* For now we only allow new targets for BPF_PROG_TYPE_EXT */
+   if (prog->type != BPF_PROG_TYPE_EXT ||
+   !btf_id) {
+   err = -EINVAL;
+   goto out_put_prog;
+   }
 
-   link = READ_ONCE(prog->aux->tgt_link);
-   if (!link) {
-   err = -ENOENT;
+   tgt_prog = bpf_prog_get(tgt_prog_fd);
+   if (IS_ERR(tgt_prog)) {
+   err = PTR_ERR(tgt_prog);
+   tgt_prog = NULL;
+   goto out_put_prog;
+   }
+
+   key = ((u64)tgt_prog->aux->id) << 32 | btf_id;
+   } else if (btf_id) {
+   err = -EINVAL;
goto out_put_prog;
}
-   olink = cmpxchg(&prog->aux->tgt_link, link, NULL);
-   if (olink != link) {
-   err = -ENOENT;
-   goto out_put_prog;
+
+   link = READ_ONCE(prog->aux->tgt_link);
+   if (link) {
+   if (tgt_prog && link->trampoline->key != key) {
+   link = NULL;
+   } else {
+   struct bpf_tracing_link *olink;
+
+   olink = cmpxchg(&prog->aux->tgt_link, link, NULL);
+   if (olink != link) {
+   link = NULL;
+   } else if (tgt_prog) {
+   /* re-using link that already has ref on
+* tgt

Re: [PATCH bpf-next v3 1/9] bpf: change logging calls from verbose() to bpf_log() and use log pointer

2020-09-11 Thread Toke Høiland-Jørgensen
Andrii Nakryiko  writes:

> On Thu, Sep 10, 2020 at 6:13 AM Toke Høiland-Jørgensen  
> wrote:
>>
>> From: Toke Høiland-Jørgensen 
>>
>> In preparation for moving code around, change a bunch of references to
>> env->log (and the verbose() logging helper) to use bpf_log() and a direct
>> pointer to struct bpf_verifier_log. While we're touching the function
>> signature, mark the 'prog' argument to bpf_check_type_match() as const.
>>
>> Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
>> for the log struct so we can re-use the code with logging disabled.
>>
>> Signed-off-by: Toke Høiland-Jørgensen 
>> ---
>
> Only 4 out of 9 emails arrived, can you please resubmit your entire
> patch set again?

Sure, done :)

-Toke



Re: [PATCH v2 01/20] ethernet: alteon: convert tasklets to use new tasklet_setup() API

2020-09-11 Thread Allen
> >> >
> >> >
> >> > -static void ace_tasklet(unsigned long arg)
> >> > +static void ace_tasklet(struct tasklet_struct *t)
> >> >  {
> >> > - struct net_device *dev = (struct net_device *) arg;
> >> > - struct ace_private *ap = netdev_priv(dev);
> >> > + struct ace_private *ap = from_tasklet(ap, t, ace_tasklet);
> >> > + struct net_device *dev = (struct net_device *)((char *)ap -
> >> > + ALIGN(sizeof(struct net_device), 
> >> > NETDEV_ALIGN));
> >> >   int cur_size;
> >> >
> >>
> >> I don't see this is as an improvement.  The 'dev' assignment looks so
> >> incredibly fragile and exposes so many internal details about netdev
> >> object allocation, alignment, and layout.
> >>
> >> Who is going to find and fix this if someone changes how netdev object
> >> allocation works?
> >>
> >
> > Thanks for pointing it out. I'll see if I can fix it to keep it simple.
>
> Just add a backpointer to the netdev from the netdev_priv() if you
> absolutely have too.
>

How does this look?
diff --git a/drivers/net/ethernet/alteon/acenic.c
b/drivers/net/ethernet/alteon/acenic.c
index 8470c836fa18..1a7e4df9b3e9 100644
--- a/drivers/net/ethernet/alteon/acenic.c
+++ b/drivers/net/ethernet/alteon/acenic.c
@@ -465,6 +465,7 @@ static int acenic_probe_one(struct pci_dev *pdev,
SET_NETDEV_DEV(dev, &pdev->dev);

ap = netdev_priv(dev);
+   ap->ndev = dev;
ap->pdev = pdev;
ap->name = pci_name(pdev);

@@ -1562,10 +1563,10 @@ static void ace_watchdog(struct net_device
*data, unsigned int txqueue)
 }


-static void ace_tasklet(unsigned long arg)
+static void ace_tasklet(struct tasklet_struct *t)
 {
-   struct net_device *dev = (struct net_device *) arg;
-   struct ace_private *ap = netdev_priv(dev);
+   struct ace_private *ap = from_tasklet(ap, t, ace_tasklet);
+   struct net_device *dev = ap->ndev;
int cur_size;

cur_size = atomic_read(&ap->cur_rx_bufs);
@@ -2269,7 +2270,7 @@ static int ace_open(struct net_device *dev)
/*
 * Setup the bottom half rx ring refill handler
 */
-   tasklet_init(&ap->ace_tasklet, ace_tasklet, (unsigned long)dev);
+   tasklet_setup(&ap->ace_tasklet, ace_tasklet);
return 0;
 }

diff --git a/drivers/net/ethernet/alteon/acenic.h
b/drivers/net/ethernet/alteon/acenic.h
index c670067b1541..265fa601a258 100644
--- a/drivers/net/ethernet/alteon/acenic.h
+++ b/drivers/net/ethernet/alteon/acenic.h
@@ -633,6 +633,7 @@ struct ace_skb
  */
 struct ace_private
 {
+   struct net_device   *ndev;  /* backpointer */
struct ace_info *info;
struct ace_regs __iomem *regs;  /* register base */
struct ace_skb  *skb;
@@ -776,7 +777,7 @@ static int ace_open(struct net_device *dev);
 static netdev_tx_t ace_start_xmit(struct sk_buff *skb,
  struct net_device *dev);
 static int ace_close(struct net_device *dev);
-static void ace_tasklet(unsigned long dev);
+static void ace_tasklet(struct tasklet_struct *t);
 static void ace_dump_trace(struct ace_private *ap);
 static void ace_set_multicast_list(struct net_device *dev);
 static int ace_change_mtu(struct net_device *dev, int new_mtu);

Let me know what you think.

Thanks.


[PATCH RESEND bpf-next v3 9/9] selftests/bpf: Adding test for arg dereference in extension trace

2020-09-11 Thread Toke Høiland-Jørgensen
From: Jiri Olsa 

Adding test that setup following program:

  SEC("classifier/test_pkt_md_access")
  int test_pkt_md_access(struct __sk_buff *skb)

with its extension:

  SEC("freplace/test_pkt_md_access")
  int test_pkt_md_access_new(struct __sk_buff *skb)

and tracing that extension with:

  SEC("fentry/test_pkt_md_access_new")
  int BPF_PROG(fentry, struct sk_buff *skb)

The test verifies that the tracing program can
dereference skb argument properly.

Signed-off-by: Jiri Olsa 
---
 tools/testing/selftests/bpf/prog_tests/trace_ext.c |   93 
 tools/testing/selftests/bpf/progs/test_trace_ext.c |   18 
 .../selftests/bpf/progs/test_trace_ext_tracing.c   |   25 +
 3 files changed, 136 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext_tracing.c

diff --git a/tools/testing/selftests/bpf/prog_tests/trace_ext.c 
b/tools/testing/selftests/bpf/prog_tests/trace_ext.c
new file mode 100644
index ..1089dafb4653
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/trace_ext.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_trace_ext.skel.h"
+#include "test_trace_ext_tracing.skel.h"
+
+static __u32 duration;
+
+void test_trace_ext(void)
+{
+   struct test_trace_ext_tracing *skel_trace = NULL;
+   struct test_trace_ext_tracing__bss *bss_trace;
+   const char *file = "./test_pkt_md_access.o";
+   struct test_trace_ext *skel_ext = NULL;
+   struct test_trace_ext__bss *bss_ext;
+   int err, prog_fd, ext_fd;
+   struct bpf_object *obj;
+   char buf[100];
+   __u32 retval;
+   __u64 len;
+
+   err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
+   if (CHECK_FAIL(err))
+   return;
+
+   DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
+   .attach_prog_fd = prog_fd,
+   );
+
+   skel_ext = test_trace_ext__open_opts(&opts);
+   if (CHECK(!skel_ext, "setup", "freplace/test_pkt_md_access open 
failed\n"))
+   goto cleanup;
+
+   err = test_trace_ext__load(skel_ext);
+   if (CHECK(err, "setup", "freplace/test_pkt_md_access load failed\n")) {
+   libbpf_strerror(err, buf, sizeof(buf));
+   fprintf(stderr, "%s\n", buf);
+   goto cleanup;
+   }
+
+   err = test_trace_ext__attach(skel_ext);
+   if (CHECK(err, "setup", "freplace/test_pkt_md_access attach failed: 
%d\n", err))
+   goto cleanup;
+
+   ext_fd = bpf_program__fd(skel_ext->progs.test_pkt_md_access_new);
+
+   DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts_trace,
+   .attach_prog_fd = ext_fd,
+   );
+
+   skel_trace = test_trace_ext_tracing__open_opts(&opts_trace);
+   if (CHECK(!skel_trace, "setup", "tracing/test_pkt_md_access_new open 
failed\n"))
+   goto cleanup;
+
+   err = test_trace_ext_tracing__load(skel_trace);
+   if (CHECK(err, "setup", "tracing/test_pkt_md_access_new load 
failed\n")) {
+   libbpf_strerror(err, buf, sizeof(buf));
+   fprintf(stderr, "%s\n", buf);
+   goto cleanup;
+   }
+
+   err = test_trace_ext_tracing__attach(skel_trace);
+   if (CHECK(err, "setup", "tracing/test_pkt_md_access_new attach failed: 
%d\n", err))
+   goto cleanup;
+
+   err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+   NULL, NULL, &retval, &duration);
+   CHECK(err || retval, "",
+ "err %d errno %d retval %d duration %d\n",
+ err, errno, retval, duration);
+
+   bss_ext = skel_ext->bss;
+   bss_trace = skel_trace->bss;
+
+   len = bss_ext->ext_called;
+
+   CHECK(bss_ext->ext_called == 0,
+   "check", "failed to trigger freplace/test_pkt_md_access\n");
+   CHECK(bss_trace->fentry_called != len,
+   "check", "failed to trigger fentry/test_pkt_md_access_new\n");
+   CHECK(bss_trace->fexit_called != len,
+   "check", "failed to trigger fexit/test_pkt_md_access_new\n");
+
+cleanup:
+   test_trace_ext__destroy(skel_ext);
+   bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_trace_ext.c 
b/tools/testing/selftests/bpf/progs/test_trace_ext.c
new file mode 100644
index ..a6318f6b52ee
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_trace_ext.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include 
+#include 
+#include 
+#include 
+#include 
+
+volatile __u64 ext_called = 0;
+
+SEC("freplace/test_pkt_md_access")
+int test_pkt_md_access_new(struct __sk_buff *skb)
+{
+   ext_called = skb->len;
+   return 0;
+}
+
+char _lic

[PATCH RESEND bpf-next v3 7/9] libbpf: add support for supplying target to bpf_raw_tracepoint_open()

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

This adds support for supplying a target fd and btf ID for the
raw_tracepoint_open() BPF operation, using a new bpf_raw_tracepoint_opts
structure. This can be used for attaching freplace programs to multiple
destinations.

Signed-off-by: Toke Høiland-Jørgensen 
---
 tools/lib/bpf/bpf.c  |   13 -
 tools/lib/bpf/bpf.h  |9 +
 tools/lib/bpf/libbpf.map |1 +
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 82b983ff6569..25c62993c406 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -804,17 +804,28 @@ int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 
*info_len)
return err;
 }
 
-int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+int bpf_raw_tracepoint_open_opts(const char *name, int prog_fd,
+struct bpf_raw_tracepoint_opts *opts)
 {
union bpf_attr attr;
 
+   if (!OPTS_VALID(opts, bpf_raw_tracepoint_opts))
+   return -EINVAL;
+
memset(&attr, 0, sizeof(attr));
attr.raw_tracepoint.name = ptr_to_u64(name);
attr.raw_tracepoint.prog_fd = prog_fd;
+   attr.raw_tracepoint.tgt_prog_fd = OPTS_GET(opts, tgt_prog_fd, 0);
+   attr.raw_tracepoint.tgt_btf_id = OPTS_GET(opts, tgt_btf_id, 0);
 
return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
 }
 
+int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+{
+   return bpf_raw_tracepoint_open_opts(name, prog_fd, NULL);
+}
+
 int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 bool do_log)
 {
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 015d13f25fcc..30e8854374c0 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -233,7 +233,16 @@ LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void 
*info, __u32 *info_len);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
  __u32 query_flags, __u32 *attach_flags,
  __u32 *prog_ids, __u32 *prog_cnt);
+struct bpf_raw_tracepoint_opts {
+   size_t sz; /* size of this struct for forward/backward compatibility */
+   int tgt_prog_fd; /* target program to attach to */
+   __u32 tgt_btf_id; /* BTF ID of target function */
+};
+#define bpf_raw_tracepoint_opts__last_field tgt_btf_id
+
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
+LIBBPF_API int bpf_raw_tracepoint_open_opts(const char *name, int prog_fd,
+   struct bpf_raw_tracepoint_opts 
*opts);
 LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf,
__u32 log_buf_size, bool do_log);
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 92ceb48a5ca2..a23d9f3f940c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -303,6 +303,7 @@ LIBBPF_0.1.0 {
 LIBBPF_0.2.0 {
global:
bpf_program__section_name;
+   bpf_raw_tracepoint_open_opts;
perf_buffer__buffer_cnt;
perf_buffer__buffer_fd;
perf_buffer__epoll_fd;



[PATCH RESEND bpf-next v3 6/9] tools: add new members to bpf_attr.raw_tracepoint in bpf.h

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Sync addition of new members from main kernel tree.

Signed-off-by: Toke Høiland-Jørgensen 
---
 tools/include/uapi/linux/bpf.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 90359cab501d..0885ab6ac8d9 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -595,8 +595,10 @@ union bpf_attr {
} query;
 
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
-   __u64 name;
-   __u32 prog_fd;
+   __u64   name;
+   __u32   prog_fd;
+   __u32   tgt_prog_fd;
+   __u32   tgt_btf_id;
} raw_tracepoint;
 
struct { /* anonymous struct for BPF_BTF_LOAD */



[PATCH RESEND bpf-next v3 8/9] selftests: add test for multiple attachments of freplace program

2020-09-11 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

This adds a selftest for attaching an freplace program to multiple targets
simultaneously.

Signed-off-by: Toke Høiland-Jørgensen 
---
 .../selftests/bpf/prog_tests/fexit_bpf2bpf.c   |  171 
 .../selftests/bpf/progs/freplace_get_constant.c|   15 ++
 2 files changed, 154 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/freplace_get_constant.c

diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c 
b/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
index eda682727787..cdd0c74f2fbb 100644
--- a/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
@@ -2,36 +2,79 @@
 /* Copyright (c) 2019 Facebook */
 #include 
 #include 
+#include 
+
+typedef int (*test_cb)(struct bpf_object *obj);
+
+static int check_data_map(struct bpf_object *obj, int prog_cnt, bool reset)
+{
+   struct bpf_map *data_map = NULL, *map;
+   __u64 *result = NULL;
+   const int zero = 0;
+   __u32 duration = 0;
+   int ret = -1, i;
+
+   result = malloc((prog_cnt + 32 /* spare */) * sizeof(__u64));
+   if (CHECK(!result, "alloc_memory", "failed to alloc memory"))
+   return -ENOMEM;
+
+   bpf_object__for_each_map(map, obj)
+   if (bpf_map__is_internal(map)) {
+   data_map = map;
+   break;
+   }
+   if (CHECK(!data_map, "find_data_map", "data map not found\n"))
+   goto out;
+
+   ret = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, result);
+   if (CHECK(ret, "get_result",
+ "failed to get output data: %d\n", ret))
+   goto out;
+
+   for (i = 0; i < prog_cnt; i++) {
+   if (CHECK(result[i] != 1, "result",
+ "fexit_bpf2bpf result[%d] failed err %llu\n",
+ i, result[i]))
+   goto out;
+   result[i] = 0;
+   }
+   if (reset) {
+   ret = bpf_map_update_elem(bpf_map__fd(data_map), &zero, result, 
0);
+   if (CHECK(ret, "reset_result", "failed to reset result\n"))
+   goto out;
+   }
+
+   ret = 0;
+out:
+   free(result);
+   return ret;
+}
 
 static void test_fexit_bpf2bpf_common(const char *obj_file,
  const char *target_obj_file,
  int prog_cnt,
  const char **prog_name,
- bool run_prog)
+ bool run_prog,
+ test_cb cb)
 {
-   struct bpf_object *obj = NULL, *pkt_obj;
-   int err, pkt_fd, i;
-   struct bpf_link **link = NULL;
+   struct bpf_object *obj = NULL, *tgt_obj;
struct bpf_program **prog = NULL;
+   struct bpf_link **link = NULL;
__u32 duration = 0, retval;
-   struct bpf_map *data_map;
-   const int zero = 0;
-   __u64 *result = NULL;
+   int err, tgt_fd, i;
 
err = bpf_prog_load(target_obj_file, BPF_PROG_TYPE_UNSPEC,
-   &pkt_obj, &pkt_fd);
+   &tgt_obj, &tgt_fd);
if (CHECK(err, "tgt_prog_load", "file %s err %d errno %d\n",
  target_obj_file, err, errno))
return;
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
-   .attach_prog_fd = pkt_fd,
+   .attach_prog_fd = tgt_fd,
   );
 
link = calloc(sizeof(struct bpf_link *), prog_cnt);
prog = calloc(sizeof(struct bpf_program *), prog_cnt);
-   result = malloc((prog_cnt + 32 /* spare */) * sizeof(__u64));
-   if (CHECK(!link || !prog || !result, "alloc_memory",
- "failed to alloc memory"))
+   if (CHECK(!link || !prog, "alloc_memory", "failed to alloc memory"))
goto close_prog;
 
obj = bpf_object__open_file(obj_file, &opts);
@@ -53,39 +96,33 @@ static void test_fexit_bpf2bpf_common(const char *obj_file,
goto close_prog;
}
 
-   if (!run_prog)
-   goto close_prog;
+   if (cb) {
+   err = cb(obj);
+   if (err)
+   goto close_prog;
+   }
 
-   data_map = bpf_object__find_map_by_name(obj, "fexit_bp.bss");
-   if (CHECK(!data_map, "find_data_map", "data map not found\n"))
+   if (!run_prog)
goto close_prog;
 
-   err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
+   err = bpf_prog_test_run(tgt_fd, 1, &pkt_v6, sizeof(pkt_v6),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "ipv6",
  "err %d errno %d retval %d duration %d\n",
  err, errno, retval, duration);
 
-   err = bpf_map_lookup_elem(b

[PATCH v2 net-next] net: phy: mchp: Add support for LAN8814 QUAD PHY

2020-09-11 Thread Divya Koppera
LAN8814 is a low-power, quad-port triple-speed (10BASE-T/100BASETX/1000BASE-T)
Ethernet physical layer transceiver (PHY). It supports transmission and
reception of data on standard CAT-5, as well as CAT-5e and CAT-6, unshielded
twisted pair (UTP) cables.

LAN8814 supports industry-standard QSGMII (Quad Serial Gigabit Media
Independent Interface) and Q-USGMII (Quad Universal Serial Gigabit Media
Independent Interface) providing chip-to-chip connection to four Gigabit
Ethernet MACs using a single serialized link (differential pair) in each
direction.

The LAN8814 SKU supports high-accuracy timestamping functions to
support IEEE-1588 solutions using Microchip Ethernet switches, as well as
customer solutions based on SoCs and FPGAs.

The LAN8804 SKU has same features as that of LAN8814 SKU except that it does
not support 1588, SyncE, or Q-USGMII with PCH/MCH.

This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T,
QSGMII link with the MAC.

Signed-off-by: Divya Koppera
---
v1 -> v2:
* Removing get_features and config_init as the Errata mentioned and other
  functionality related things are not applicable for this phy.
  Addressed review comments.
---
 drivers/net/phy/micrel.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 9f60865587ea..a7f74b3b97af 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1320,8 +1320,6 @@ static struct phy_driver ksphy_driver[] = {
.name   = "Microchip INDY Gigabit Quad PHY",
.driver_data= &ksz9021_type,
.probe  = kszphy_probe,
-   .get_features   = ksz9031_get_features,
-   .config_init= ksz9031_config_init,
.soft_reset = genphy_soft_reset,
.read_status= ksz9031_read_status,
.get_sset_count = kszphy_get_sset_count,
-- 
2.17.1



[PATCH net] enetc: Fix mdio bus removal on PF probe bailout

2020-09-11 Thread Claudiu Manoil
This is the correct resolution for the conflict from
merging the "net" tree fix:
commit 26cb7085c898 ("enetc: Remove the mdio bus on PF probe bailout")
with the "net-next" new work:
commit 07095c025ac2 ("net: enetc: Use DT protocol information to set up the 
ports")
that moved mdio bus allocation to an ealier stage of
the PF probing routine.

Fixes: a57066b1a019 ("Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Claudiu Manoil 
---
 drivers/net/ethernet/freescale/enetc/enetc_pf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 26d5981b798f..177334f0adb1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -1053,7 +1053,6 @@ static int enetc_pf_probe(struct pci_dev *pdev,
 
 err_reg_netdev:
enetc_teardown_serdes(priv);
-   enetc_mdio_remove(pf);
enetc_free_msix(priv);
 err_alloc_msix:
enetc_free_si_resources(priv);
@@ -1061,6 +1060,7 @@ static int enetc_pf_probe(struct pci_dev *pdev,
si->ndev = NULL;
free_netdev(ndev);
 err_alloc_netdev:
+   enetc_mdio_remove(pf);
enetc_of_put_phy(pf);
 err_map_pf_space:
enetc_pci_remove(pdev);
-- 
2.17.1



Re: linux-next: manual merge of the net-next tree with the net tree

2020-09-11 Thread Paul Barker
On Fri, 11 Sep 2020 at 02:17, Stephen Rothwell  wrote:
>
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   drivers/net/dsa/microchip/ksz9477.c
>
> between commit:
>
>   edecfa98f602 ("net: dsa: microchip: look for phy-mode in port nodes")
>
> from the net tree and commit:
>
>   805a7e6f5388 ("net: dsa: microchip: Improve phy mode message")
>
> from the net-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc drivers/net/dsa/microchip/ksz9477.c
> index 2f5506ac7d19,b62dd64470a8..
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@@ -1229,12 -1229,15 +1229,15 @@@ static void ksz9477_port_setup(struct k
> ksz9477_set_gbit(dev, true, &data8);
> data8 &= ~PORT_RGMII_ID_IG_ENABLE;
> data8 &= ~PORT_RGMII_ID_EG_ENABLE;
>  -  if (dev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
>  -  dev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
>  +  if (p->interface == PHY_INTERFACE_MODE_RGMII_ID ||
>  +  p->interface == PHY_INTERFACE_MODE_RGMII_RXID)
> data8 |= PORT_RGMII_ID_IG_ENABLE;
>  -  if (dev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
>  -  dev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
>  +  if (p->interface == PHY_INTERFACE_MODE_RGMII_ID ||
>  +  p->interface == PHY_INTERFACE_MODE_RGMII_TXID)
> data8 |= PORT_RGMII_ID_EG_ENABLE;
> +   /* On KSZ9893, disable RGMII in-band status support */
> +   if (dev->features & IS_9893)
> +   data8 &= ~PORT_MII_MAC_MODE;
> p->phydev.speed = SPEED_1000;
> break;
> }
> @@@ -1276,22 -1280,21 +1281,30 @@@ static void ksz9477_config_cpu_port(str
>  * note the difference to help debugging.
>  */
> interface = ksz9477_get_interface(dev, i);
>  -  if (!dev->interface)
>  -  dev->interface = interface;
>  -  if (interface && interface != dev->interface) {
>  +  if (!p->interface) {
>  +  if (dev->compat_interface) {
>  +  dev_warn(dev->dev,
>  +   "Using legacy switch 
> \"phy-mode\" property, because it is missing on port %d node. "
>  +   "Please update your device 
> tree.\n",
>  +   i);
>  +  p->interface = dev->compat_interface;
>  +  } else {
>  +  p->interface = interface;
>  +  }
>  +  }
> -   if (interface && interface != p->interface)
> -   dev_info(dev->dev,
> -"use %s instead of %s\n",
> - phy_modes(p->interface),
> - phy_modes(interface));
> ++  if (interface && interface != p->interface) {
> +   prev_msg = " instead of ";
> +   prev_mode = phy_modes(interface);
> +   } else {
> +   prev_msg = "";
> +   prev_mode = "";
> +   }
> +   dev_info(dev->dev,
> +"Port%d: using phy mode %s%s%s\n",
> +i,
>  -   phy_modes(dev->interface),
> ++   phy_modes(p->interface),
> +prev_msg,
> +prev_mode);
>
> /* enable cpu port */
> ksz9477_port_setup(dev, i, true);

Looks good to me wrt my patch "net: dsa: microchip: Improve phy mode message".

Thanks,

-- 
Paul Barker
Konsulko Group


[PATCH] ipv6: remove redundant assignment to variable err

2020-09-11 Thread Colin King
From: Colin Ian King 

The variable err is being initialized with a value that is never read and
it is being updated later with a new value. The initialization is redundant
and can be removed.  Also re-order variable declarations in reverse
Christmas tree ordering.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King 
---
 net/ipv6/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5e7e25e2523a..e8ee20720fe0 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5284,9 +5284,10 @@ static int ip6_route_multipath_del(struct fib6_config 
*cfg,
 {
struct fib6_config r_cfg;
struct rtnexthop *rtnh;
+   int last_err = 0;
int remaining;
int attrlen;
-   int err = 1, last_err = 0;
+   int err;
 
remaining = cfg->fc_mp_len;
rtnh = (struct rtnexthop *)cfg->fc_mp;
-- 
2.27.0



Re: [PATCH bpf-next 1/2] bpf: Fix context type resolving for extension programs

2020-09-11 Thread Toke Høiland-Jørgensen
Alexei Starovoitov  writes:

> On Wed, Sep 9, 2020 at 8:11 AM Jiri Olsa  wrote:
>>
>> Eelco reported we can't properly access arguments if the tracing
>> program is attached to extension program.
>>
>> Having following program:
>>
>>   SEC("classifier/test_pkt_md_access")
>>   int test_pkt_md_access(struct __sk_buff *skb)
>>
>> with its extension:
>>
>>   SEC("freplace/test_pkt_md_access")
>>   int test_pkt_md_access_new(struct __sk_buff *skb)
>>
>> and tracing that extension with:
>>
>>   SEC("fentry/test_pkt_md_access_new")
>>   int BPF_PROG(fentry, struct sk_buff *skb)
>>
>> It's not possible to access skb argument in the fentry program,
>> with following error from verifier:
>>
>>   ; int BPF_PROG(fentry, struct sk_buff *skb)
>>   0: (79) r1 = *(u64 *)(r1 +0)
>>   invalid bpf_context access off=0 size=8
>>
>> The problem is that btf_ctx_access gets the context type for the
>> traced program, which is in this case the extension.
>>
>> But when we trace extension program, we want to get the context
>> type of the program that the extension is attached to, so we can
>> access the argument properly in the trace program.
>>
>> Reported-by: Eelco Chaudron 
>> Signed-off-by: Jiri Olsa 
>> ---
>>  kernel/bpf/btf.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index f9ac6935ab3c..37ad01c32e5a 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -3859,6 +3859,14 @@ bool btf_ctx_access(int off, int size, enum 
>> bpf_access_type type,
>> }
>>
>> info->reg_type = PTR_TO_BTF_ID;
>> +
>> +   /* When we trace extension program, we want to get the context
>> +* type of the program that the extension is attached to, so
>> +* we can access the argument properly in the trace program.
>> +*/
>> +   if (tgt_prog && tgt_prog->type == BPF_PROG_TYPE_EXT)
>> +   tgt_prog = tgt_prog->aux->linked_prog;
>> +
>> if (tgt_prog) {
>> ret = btf_translate_to_vmlinux(log, btf, t, tgt_prog->type, 
>> arg);
>
> I think it would be cleaner to move resolve_prog_type() from verifier.c
> and use that helper function here.

FYI, I've added a different version of this patch to my freplace
multi-attach series (since the approach here was incompatible with
that).

-Toke



Re: [PATCH bpf-next 2/2] selftests/bpf: Adding test for arg dereference in extension trace

2020-09-11 Thread Toke Høiland-Jørgensen
Andrii Nakryiko  writes:

> On Wed, Sep 9, 2020 at 8:38 AM Jiri Olsa  wrote:
>>
>> Adding test that setup following program:
>>
>>   SEC("classifier/test_pkt_md_access")
>>   int test_pkt_md_access(struct __sk_buff *skb)
>>
>> with its extension:
>>
>>   SEC("freplace/test_pkt_md_access")
>>   int test_pkt_md_access_new(struct __sk_buff *skb)
>>
>> and tracing that extension with:
>>
>>   SEC("fentry/test_pkt_md_access_new")
>>   int BPF_PROG(fentry, struct sk_buff *skb)
>>
>> The test verifies that the tracing program can
>> dereference skb argument properly.
>>
>> Signed-off-by: Jiri Olsa 

Just FYI, I included this same patch in my freplace series. I didn't
change anything in the version I just resent, but I'll work with Jiri
and get an updated version of this into the next version based on your
comments here... :)

-Toke



Re: [PATCH net-next] net: mvpp2: Initialize link in mvpp2_isr_handle_{xlg,gmac_internal}

2020-09-11 Thread Russell King - ARM Linux admin
On Thu, Sep 10, 2020 at 05:31:42PM -0700, Nathan Chancellor wrote:
> On Thu, Sep 10, 2020 at 03:28:11PM -0700, David Miller wrote:
> > From: Nathan Chancellor 
> > Date: Thu, 10 Sep 2020 10:48:27 -0700
> > 
> > > Clang warns (trimmed for brevity):
> > > 
> > > drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c:3073:7: warning:
> > > variable 'link' is used uninitialized whenever 'if' condition is false
> > > [-Wsometimes-uninitialized]
> > > if (val & MVPP22_XLG_STATUS_LINK_UP)
> > > ^~~
> > > drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c:3075:31: note:
> > > uninitialized use occurs here
> > > mvpp2_isr_handle_link(port, link);
> > > ^~~~
> > > ...
> > > drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c:3090:8: warning:
> > > variable 'link' is used uninitialized whenever 'if' condition is false
> > > [-Wsometimes-uninitialized]
> > > if (val & MVPP2_GMAC_STATUS0_LINK_UP)
> > > ^~~~
> > > drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c:3092:32: note:
> > > uninitialized use occurs here
> > > mvpp2_isr_handle_link(port, link);
> > > ^~~~
> > > 
> > > Initialize link to false like it was before the refactoring that
> > > happened around link status so that a valid valid is always passed into
> > > mvpp2_isr_handle_link.
> > > 
> > > Fixes: 36cfd3a6e52b ("net: mvpp2: restructure "link status" interrupt 
> > > handling")
> > > Link: https://github.com/ClangBuiltLinux/linux/issues/1151
> > > Signed-off-by: Nathan Chancellor 
> > 
> > This got fixed via another change, a much mode simply one in fact,
> > changing the existing assignments to be unconditional and of the
> > form "link = (bits & MASK);"
> 
> Ah great, that is indeed cleaner, thank you for letting me know!

Hmm, I'm not sure why gcc didn't find that. Strangely, the 0-day bot
seems to have only picked up on it with clang, not gcc.

Thanks for fixing.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH 4.19] net/mlx5e: Don't support phys switch id if not in switchdev mode

2020-09-11 Thread Greg Kroah-Hartman
On Thu, Sep 10, 2020 at 11:46:36AM -0700, Saeed Mahameed wrote:
> On Fri, 2020-08-07 at 15:13 +0200, Greg Kroah-Hartman wrote:
> > On Thu, Aug 06, 2020 at 07:05:42PM -0700, Saeed Mahameed wrote:
> > > From: Roi Dayan 
> > > 
> > > Support for phys switch id ndo added for representors and if
> > > we do not have representors there is no need to support it.
> > > Since each port return different switch id supporting this
> > > block support for creating bond over PFs and attaching to bridge
> > > in legacy mode.
> > > 
> > > This bug doesn't exist upstream as the code got refactored and the
> > > netdev api is totally different.
> > > 
> > > Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
> > > Signed-off-by: Roi Dayan 
> > > Signed-off-by: Saeed Mahameed 
> > > ---
> > > Hi Greg,
> > > 
> > > Sorry for submitting a non upstream patch, but this bug is
> > > bothering some users on 4.19-stable kernels and it doesn't exist
> > > upstream, so i hope you are ok with backporting this one liner
> > > patch.
> > 
> > Also queued up to 4.9.y and 4.14.y.
> > 
> 
> Hi Greg, the request was originally made for 4.19.y kernel,
> I see the patch in 4.9 and 4.14 but not in 4.19 can we push it to 4.19
> as well ? 

Very odd, don't know what happened.

Now fixed up, thanks.

greg k-h


[PATCH net-next] i40e: allow VMDQs to be used with AF_XDP zero-copy

2020-09-11 Thread Magnus Karlsson
From: Magnus Karlsson 

Allow VMDQs to be used with AF_XDP sockets in zero-copy mode. For some
reason, we only allowed main VSIs to be used with zero-copy, but
there is now reason to not allow VMDQs also.

Signed-off-by: Magnus Karlsson 
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2a1153d..ebe15ca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -45,7 +45,7 @@ static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
bool if_running;
int err;
 
-   if (vsi->type != I40E_VSI_MAIN)
+   if (!(vsi->type == I40E_VSI_MAIN || vsi->type == I40E_VSI_VMDQ2))
return -EINVAL;
 
if (qid >= vsi->num_queue_pairs)
-- 
2.7.4



Re: [PATCH net-next] i40e: allow VMDQs to be used with AF_XDP zero-copy

2020-09-11 Thread Maciej Fijalkowski
On Fri, Sep 11, 2020 at 02:08:26PM +0200, Magnus Karlsson wrote:
> From: Magnus Karlsson 
> 
> Allow VMDQs to be used with AF_XDP sockets in zero-copy mode. For some
> reason, we only allowed main VSIs to be used with zero-copy, but
> there is now reason to not allow VMDQs also.

You meant 'to allow' I suppose. And what reason? :)

> 
> Signed-off-by: Magnus Karlsson 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
> b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 2a1153d..ebe15ca 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -45,7 +45,7 @@ static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
>   bool if_running;
>   int err;
>  
> - if (vsi->type != I40E_VSI_MAIN)
> + if (!(vsi->type == I40E_VSI_MAIN || vsi->type == I40E_VSI_VMDQ2))
>   return -EINVAL;
>  
>   if (qid >= vsi->num_queue_pairs)
> -- 
> 2.7.4
> 


Re: [PATCH net-next + leds v2 6/7] net: phy: marvell: add support for LEDs controlled by Marvell PHYs

2020-09-11 Thread Andrew Lunn
> - Do all PHYs support manual setting of the LED level, or are the PHYs
> that can only work with HW triggers?

There are PHYs with do not have simple on/off.

> - Is setting PHY registers always efficiently possible, or should SW
> triggers be avoided in certain cases? I'm thinking about setups like
> mdio-gpio. I guess this can only become an issue for triggers that
> blink.

There are uses cases where not using software frequently writing
registers would be good. PTP time stamping is one, where the extra
jitter can reduce the accuracy of the clock.

I also think activity blinking in software is unlikely to be
accepted. Nothing extra is allowed in the hot path, when you can be
dealing with a million or more packets per second.

So i would say limit software fallback to link and speed, and don't
assume that is even possible depending on the hardware.

Andrew


[PATCH bpf v4] xsk: do not discard packet when NETDEV_TX_BUSY

2020-09-11 Thread Magnus Karlsson
From: Magnus Karlsson 

In the skb Tx path, transmission of a packet is performed with
dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it
signifies that it was not possible to send the packet right now,
please try later. Unfortunately, the xsk transmit code discarded the
packet and returned EBUSY to the application. Fix this unnecessary
packet loss, by not discarding the packet in the Tx ring and return
EAGAIN. As EAGAIN is returned to the application, it can then retry
the send operation later and the packet will then likely be sent as
the driver will then likely have space/resources to send the packet.

In summary, EAGAIN tells the application that the packet was not
discarded from the Tx ring and that it needs to call send()
again. EBUSY, on the other hand, signifies that the packet was not
sent and discarded from the Tx ring. The application needs to put the
packet on the Tx ring again if it wants it to be sent.

Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Magnus Karlsson 
Reported-by: Arkadiusz Zema 
Suggested-by: Arkadiusz Zema 
Suggested-by: Daniel Borkmann 
---
v3->v4:
* Free the skb without triggering the drop trace when NETDEV_TX_BUSY
* Call consume_skb instead of kfree_skb when the packet has been
  sent successfully for correct tracing
* Use sock_wfree as destructor when NETDEV_TX_BUSY
v1->v3:
* Hinder dev_direct_xmit() from freeing and completing the packet to
  user space by manipulating the skb->users count as suggested by
  Daniel Borkmann.
---
 net/xdp/xsk.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index c323162..d32e39d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk)
skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
skb->destructor = xsk_destruct_skb;
 
+   /* Hinder dev_direct_xmit from freeing the packet and
+* therefore completing it in the destructor
+*/
+   refcount_inc(&skb->users);
err = dev_direct_xmit(skb, xs->queue_id);
+   if  (err == NETDEV_TX_BUSY) {
+   /* Tell user-space to retry the send */
+   skb->destructor = sock_wfree;
+   /* Free skb without triggering the perf drop trace */
+   __kfree_skb(skb);
+   err = -EAGAIN;
+   goto out;
+   }
+
xskq_cons_release(xs->tx);
/* Ignore NET_XMIT_CN as packet might have been sent */
-   if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
+   if (err == NET_XMIT_DROP) {
/* SKB completed but not sent */
+   kfree_skb(skb);
err = -EBUSY;
goto out;
}
 
+   consume_skb(skb);
sent_frame = true;
}
 
-- 
2.7.4



Re: [PATCH net-next + leds v2 6/7] net: phy: marvell: add support for LEDs controlled by Marvell PHYs

2020-09-11 Thread Marek Behún
On Fri, 11 Sep 2020 09:12:01 +0200
Matthias Schiffer  wrote:

> On Thu, 2020-09-10 at 17:00 +0200, Andrew Lunn wrote:
> > > I propose that at least these HW modes should be available (and
> > > documented) for ethernet PHY controlled LEDs:
> > >   mode to determine link on:
> > > - `link`
> > >   mode for activity (these should blink):
> > > - `activity` (both rx and tx), `rx`, `tx`
> > >   mode for link (on) and activity (blink)
> > > - `link/activity`, maybe `link/rx` and `link/tx`
> > >   mode for every supported speed:
> > > - `1Gbps`, `100Mbps`, `10Mbps`, ...
> > >   mode for every supported cable type:
> > > - `copper`, `fiber`, ... (are there others?)  
> > 
> > In theory, there is AUI and BNC, but no modern device will have
> > these.
> >   
> > >   mode that allows the user to determine link speed
> > > - `speed` (or maybe `linkspeed` ?)
> > > - on some Marvell PHYs the speed can be determined by how fast
> > >   the LED is blinking (ie. 1Gbps blinks with default blinking
> > >   frequency, 100Mbps with half blinking frequeny of 1Gbps,
> > > 10Mbps
> > >   of half blinking frequency of 100Mbps)
> > > - on other Marvell PHYs this is instead:
> > >   1Gpbs blinks 3 times, pause, 3 times, pause, ...
> > >   100Mpbs blinks 2 times, pause, 2 times, pause, ...
> > >   10Mpbs blinks 1 time, pause, 1 time, pause, ...
> > > - we don't need to differentiate these modes with different
> > > names,
> > >   because the important thing is just that this mode allows
> > > the user to determine the speed from how the LED blinks
> > >   mode to just force blinking
> > > - `blink`
> > > The nice thing is that all this can be documented and done in
> > > software
> > > as well.  
> > 
> > Have you checked include/dt-bindings/net/microchip-lan78xx.h and
> > mscc-phy-vsc8531.h ? If you are defining something generic, we need
> > to
> > make sure the majority of PHYs can actually do it. There is no
> > standardization in this area. I'm sure there is some similarity,
> > there
> > is only so many ways you can blink an LED, but i suspect we need a
> > mixture of standardized modes which we hope most PHYs implement, and
> > the option to support hardware specific modes.
> > 
> > Andrew  
> 
> 
> FWIW, these are the LED HW trigger modes supported by the TI DP83867
> PHY:
> 
> - Receive Error
> - Receive Error or Transmit Error

Does somebody use this? I would just omit these.

> - Link established, blink for transmit or receive activity

`link/activity`

> - Full duplex

Not needed for now, I think.

> - 100/1000BT link established
> - 10/100BT link established

Disjunctive modes can go f*** themselves :)

> - 10BT link established
> - 100BT link established
> - 1000BT link established

`10Mbps`, `100Mbps`, `1Gbps`

> - Collision detected

Not needed for now.

> - Receive activity
> - Transmit activity

`rx/tx`

> - Receive or Transmit activity

`activity`

> - Link established

`link`

> 
> AFAIK, the "Link established, blink for transmit or receive activity"
> is the only trigger that involves blinking; all other modes simply
> make the LED light up when the condition is met. Setting the output
> level in software is also possible.
> 
> Regarding the option to emulate unsupported HW triggers in software,
> two questions come to my mind:
> 
> - Do all PHYs support manual setting of the LED level, or are the PHYs
> that can only work with HW triggers?
> - Is setting PHY registers always efficiently possible, or should SW
> triggers be avoided in certain cases? I'm thinking about setups like
> mdio-gpio. I guess this can only become an issue for triggers that
> blink.

The software trigger do not have to work with the LED connected to the
PHY. Any other LED on the system can be used. Only the information
about link and speed must come from the PHY, and kernel does have this
information already, either by polling or from interrupt.

> 
> 
> Kind regards,
> Matthias
> 



[PATCH net-next v4 2/6] net: dsa: mt7530: Extend device data ready for adding a new hardware

2020-09-11 Thread Landen Chao
Add a structure holding required operations for each device such as device
initialization, PHY port read or write, a checker whether PHY interface is
supported on a certain port, MAC port setup for either bus pad or a
specific PHY interface.

The patch is done for ready adding a new hardware MT7531, and keep the
same setup logic of existing hardware.

Signed-off-by: Landen Chao 
Signed-off-by: Sean Wang 
---
 drivers/net/dsa/mt7530.c | 271 ---
 drivers/net/dsa/mt7530.h |  37 +-
 2 files changed, 234 insertions(+), 74 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 238417db26f9..9c6f80b3e5f5 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -372,8 +372,9 @@ mt7530_fdb_write(struct mt7530_priv *priv, u16 vid,
mt7530_write(priv, MT7530_ATA1 + (i * 4), reg[i]);
 }
 
+/* Setup TX circuit including relevant PAD and driving */
 static int
-mt7530_pad_clk_setup(struct dsa_switch *ds, int mode)
+mt7530_pad_clk_setup(struct dsa_switch *ds, phy_interface_t interface)
 {
struct mt7530_priv *priv = ds->priv;
u32 ncpo1, ssc_delta, trgint, i, xtal;
@@ -387,7 +388,7 @@ mt7530_pad_clk_setup(struct dsa_switch *ds, int mode)
return -EINVAL;
}
 
-   switch (mode) {
+   switch (interface) {
case PHY_INTERFACE_MODE_RGMII:
trgint = 0;
/* PLL frequency: 125MHz */
@@ -409,7 +410,8 @@ mt7530_pad_clk_setup(struct dsa_switch *ds, int mode)
}
break;
default:
-   dev_err(priv->dev, "xMII mode %d not supported\n", mode);
+   dev_err(priv->dev, "xMII interface %d not supported\n",
+   interface);
return -EINVAL;
}
 
@@ -1349,47 +1351,116 @@ mt7530_setup(struct dsa_switch *ds)
return 0;
 }
 
-static void mt7530_phylink_mac_config(struct dsa_switch *ds, int port,
- unsigned int mode,
- const struct phylink_link_state *state)
+static bool
+mt7530_phy_mode_supported(struct dsa_switch *ds, int port,
+ const struct phylink_link_state *state)
 {
struct mt7530_priv *priv = ds->priv;
-   u32 mcr_cur, mcr_new;
 
switch (port) {
-   case 0: /* Internal phy */
-   case 1:
-   case 2:
-   case 3:
-   case 4:
+   case 0 ... 4: /* Internal phy */
if (state->interface != PHY_INTERFACE_MODE_GMII)
-   return;
+   return false;
break;
case 5: /* 2nd cpu port with phy of port 0 or 4 / external phy */
-   if (priv->p5_interface == state->interface)
-   break;
if (!phy_interface_mode_is_rgmii(state->interface) &&
state->interface != PHY_INTERFACE_MODE_MII &&
state->interface != PHY_INTERFACE_MODE_GMII)
-   return;
+   return false;
+   break;
+   case 6: /* 1st cpu port */
+   if (state->interface != PHY_INTERFACE_MODE_RGMII &&
+   state->interface != PHY_INTERFACE_MODE_TRGMII)
+   return false;
+   break;
+   default:
+   dev_err(priv->dev, "%s: unsupported port: %i\n", __func__,
+   port);
+   return false;
+   }
+
+   return true;
+}
+
+static bool
+mt753x_phy_mode_supported(struct dsa_switch *ds, int port,
+ const struct phylink_link_state *state)
+{
+   struct mt7530_priv *priv = ds->priv;
+
+   return priv->info->phy_mode_supported(ds, port, state);
+}
+
+static int
+mt753x_pad_setup(struct dsa_switch *ds, const struct phylink_link_state *state)
+{
+   struct mt7530_priv *priv = ds->priv;
+
+   return priv->info->pad_setup(ds, state->interface);
+}
+
+static int
+mt7530_mac_config(struct dsa_switch *ds, int port, unsigned int mode,
+ phy_interface_t interface)
+{
+   struct mt7530_priv *priv = ds->priv;
+
+   /* Only need to setup port5. */
+   if (port != 5)
+   return 0;
+
+   mt7530_setup_port5(priv->ds, interface);
+
+   return 0;
+}
+
+static int
+mt753x_mac_config(struct dsa_switch *ds, int port, unsigned int mode,
+ const struct phylink_link_state *state)
+{
+   struct mt7530_priv *priv = ds->priv;
+
+   return priv->info->mac_port_config(ds, port, mode, state->interface);
+}
+
+static void
+mt753x_phylink_mac_config(struct dsa_switch *ds, int port, unsigned int mode,
+ const struct phylink_link_state *state)
+{
+   struct mt7530_priv *priv = ds->priv;
+   u32 mcr_cur, mcr_new;
+
+   if (!mt753x_phy_mode_supported(ds, port, state))
+   goto unsupported;
+
+   switch (port) {
+   case 0 ... 4: /* In

[PATCH net-next v4 4/6] net: dsa: mt7530: Add the support of MT7531 switch

2020-09-11 Thread Landen Chao
Add new support for MT7531:

MT7531 is the next generation of MT7530. It is also a 7-ports switch with
5 giga embedded phys, 2 cpu ports, and the same MAC logic of MT7530. Cpu
port 6 only supports SGMII interface. Cpu port 5 supports either RGMII
or SGMII in different HW sku, but cannot be muxed to PHY of port 0/4 like
mt7530. Due to SGMII interface support, pll, and pad setting are different
from MT7530. This patch adds different initial setting, and SGMII phylink
handlers of MT7531.

MT7531 SGMII interface can be configured in following mode:
- 'SGMII AN mode' with in-band negotiation capability
which is compatible with PHY_INTERFACE_MODE_SGMII.
- 'SGMII force mode' without in-band negotiation
which is compatible with 10B/8B encoding of
PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
- 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
which is compatible with 10B/8B encoding of
PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.

Signed-off-by: Landen Chao 
Signed-off-by: Sean Wang 
---
 drivers/net/dsa/Kconfig  |   6 +-
 drivers/net/dsa/mt7530.c | 911 +--
 drivers/net/dsa/mt7530.h | 222 ++
 3 files changed, 1112 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 06d68a848774..2451f61a38e4 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -33,12 +33,12 @@ config NET_DSA_LANTIQ_GSWIP
  the xrx200 / VR9 SoC.
 
 config NET_DSA_MT7530
-   tristate "MediaTek MT7530 and MT7621 Ethernet switch support"
+   tristate "MediaTek MT753x and MT7621 Ethernet switch support"
depends on NET_DSA
select NET_DSA_TAG_MTK
help
- This enables support for the MediaTek MT7530 and MT7621 Ethernet
- switch chip.
+ This enables support for the MediaTek MT7530, MT7531, and MT7621
+ Ethernet switch chips.
 
 config NET_DSA_MV88E6060
tristate "Marvell 88E6060 ethernet switch chip support"
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 9c6f80b3e5f5..8915223a1291 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -233,6 +233,12 @@ mt7530_write(struct mt7530_priv *priv, u32 reg, u32 val)
mutex_unlock(&bus->mdio_lock);
 }
 
+static u32
+_mt7530_unlocked_read(struct mt7530_dummy_poll *p)
+{
+   return mt7530_mii_read(p->priv, p->reg);
+}
+
 static u32
 _mt7530_read(struct mt7530_dummy_poll *p)
 {
@@ -483,6 +489,108 @@ mt7530_pad_clk_setup(struct dsa_switch *ds, 
phy_interface_t interface)
return 0;
 }
 
+static bool mt7531_dual_sgmii_supported(struct mt7530_priv *priv)
+{
+   u32 val;
+
+   val = mt7530_read(priv, MT7531_TOP_SIG_SR);
+
+   return (val & PAD_DUAL_SGMII_EN) != 0;
+}
+
+static int
+mt7531_pad_setup(struct dsa_switch *ds, phy_interface_t interface)
+{
+   struct mt7530_priv *priv = ds->priv;
+   u32 top_sig;
+   u32 hwstrap;
+   u32 xtal;
+   u32 val;
+
+   if (mt7531_dual_sgmii_supported(priv))
+   return 0;
+
+   val = mt7530_read(priv, MT7531_CREV);
+   top_sig = mt7530_read(priv, MT7531_TOP_SIG_SR);
+   hwstrap = mt7530_read(priv, MT7531_HWTRAP);
+   if ((val & CHIP_REV_M) > 0)
+   xtal = (top_sig & PAD_MCM_SMI_EN) ? HWTRAP_XTAL_FSEL_40MHZ :
+   HWTRAP_XTAL_FSEL_25MHZ;
+   else
+   xtal = hwstrap & HWTRAP_XTAL_FSEL_MASK;
+
+   /* Step 1 : Disable MT7531 COREPLL */
+   val = mt7530_read(priv, MT7531_PLLGP_EN);
+   val &= ~EN_COREPLL;
+   mt7530_write(priv, MT7531_PLLGP_EN, val);
+
+   /* Step 2: switch to XTAL output */
+   val = mt7530_read(priv, MT7531_PLLGP_EN);
+   val |= SW_CLKSW;
+   mt7530_write(priv, MT7531_PLLGP_EN, val);
+
+   val = mt7530_read(priv, MT7531_PLLGP_CR0);
+   val &= ~RG_COREPLL_EN;
+   mt7530_write(priv, MT7531_PLLGP_CR0, val);
+
+   /* Step 3: disable PLLGP and enable program PLLGP */
+   val = mt7530_read(priv, MT7531_PLLGP_EN);
+   val |= SW_PLLGP;
+   mt7530_write(priv, MT7531_PLLGP_EN, val);
+
+   /* Step 4: program COREPLL output frequency to 500MHz */
+   val = mt7530_read(priv, MT7531_PLLGP_CR0);
+   val &= ~RG_COREPLL_POSDIV_M;
+   val |= 2 << RG_COREPLL_POSDIV_S;
+   mt7530_write(priv, MT7531_PLLGP_CR0, val);
+   usleep_range(25, 35);
+
+   switch (xtal) {
+   case HWTRAP_XTAL_FSEL_25MHZ:
+   val = mt7530_read(priv, MT7531_PLLGP_CR0);
+   val &= ~RG_COREPLL_SDM_PCW_M;
+   val |= 0x14 << RG_COREPLL_SDM_PCW_S;
+   mt7530_write(priv, MT7531_PLLGP_CR0, val);
+   break;
+   case HWTRAP_XTAL_FSEL_40MHZ:
+   val = mt7530_read(priv, MT7531_PLLGP_CR0);
+   val &= ~RG_COREPLL_SDM_PCW_M;
+   val |= 0x19 << RG_COREPLL_SDM_PCW_S;
+ 

Re: VLAN filtering with DSA

2020-09-11 Thread Ido Schimmel
On Thu, Sep 10, 2020 at 11:41:04AM -0700, Florian Fainelli wrote:
> +Ido,
> 
> On 9/10/2020 8:07 AM, Vladimir Oltean wrote:
> > Florian, can you please reiterate what is the problem with calling
> > vlan_vid_add() with a VLAN that is installed by the bridge?
> > 
> > The effect of vlan_vid_add(), to my knowledge, is that the network
> > interface should add this VLAN to its filtering table, and not drop it.
> > So why return -EBUSY?

Can you clarify when you return -EBUSY? At least in mlxsw we return an
error in case we have a VLAN-aware bridge taking care of some VLAN and
then user space tries to install a VLAN upper with the same VLAN on the
same port. See more below.

> 
> I suppose that if you wanted to have an 802.1Q just for the sake of
> receiving VLAN tagged frames but not have them ingress the to the CPU, you
> could install an 802.1Q upper, but why would you do that unless the CPU
> should also receive that traffic?
> 
> The case that I wanted to cover was to avoid the two programming interfaces
> or the same VLAN, and prefer the bridge VLAN management over the 802.1Q
> upper, because once the switch port is in a bridge, that is what an user
> would expect to use.
> 
> If you have a bridge that is VLAN aware, it will manage the data and control
> path for us and that is all good since it is capable of dealing with VLAN
> tagged frames.
> 
> A non-VLAN aware bridge's data path is only allowed to see untagged traffic,
> so if you wanted somehow to inject untagged traffic into the bridge data
> path, then you would add a 802.1Q upper to that switch port, and somehow add
> that device into the bridge. There is a problem with that though, if you
> have mutliple bridge devices spanning the same switch, and you do the same
> thing on another switch port, with another 802.1Q upper, I believe you could
> break isolation between bridges for that particular VID.

At least in mlxsw this is handled by mapping the two {Port, VID} pairs
into different FIDs, each corresponding to a different bridge instance,
thereby maintaining the isolation.

> 
> Most of this was based on discussions we had with Ido and him explaining to
> me how they were doing it in mlxsw.
> 
> AFAIR the other case which is that you already have a 802.1Q upper, and then
> you add the switch port to the bridge is permitted and the bridge would
> inherit the VLAN into its local database.

If you have swp1 and swp1.10, you can put swp1 in a VLAN-aware bridge
and swp1.10 in a VLAN-unaware bridge. If you add VLAN 10 as part of the
VLAN-aware bridge on swp1, traffic tagged with this VLAN will still be
injected into the stack via swp1.10.

I'm not sure what is the use case for such a configuration and we reject
it in mlxsw.

> 
> I did not put much thoughts back then into a cascading set-up, so some
> assumptions can certainly be broken, and in fact, are broken today as you
> experimented.
> -- 
> Florian


Re: [PATCH net-next] i40e: allow VMDQs to be used with AF_XDP zero-copy

2020-09-11 Thread Maciej Fijalkowski
On Fri, Sep 11, 2020 at 02:29:50PM +0200, Magnus Karlsson wrote:
> On Fri, Sep 11, 2020 at 2:11 PM Maciej Fijalkowski
>  wrote:
> >
> > On Fri, Sep 11, 2020 at 02:08:26PM +0200, Magnus Karlsson wrote:
> > > From: Magnus Karlsson 
> > >
> > > Allow VMDQs to be used with AF_XDP sockets in zero-copy mode. For some
> > > reason, we only allowed main VSIs to be used with zero-copy, but
> > > there is now reason to not allow VMDQs also.
> >
> > You meant 'to allow' I suppose. And what reason? :)
> 
> Yes, sorry. Should be "not to allow". I was too trigger happy ;-).
> 
> I have gotten requests from users that they want to use VMDQs in
> conjunction with containers. Basically small slices of the i40e
> portioned out as netdevs. Do you see any problems with using a VMDQ
> iwth zero-copy?

No, I only meant to provide the actual reason (what you wrote above) in
the commit message.

> 
> /Magnus
> 
> > >
> > > Signed-off-by: Magnus Karlsson 
> > > ---
> > >  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
> > > b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > > index 2a1153d..ebe15ca 100644
> > > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > > @@ -45,7 +45,7 @@ static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
> > >   bool if_running;
> > >   int err;
> > >
> > > - if (vsi->type != I40E_VSI_MAIN)
> > > + if (!(vsi->type == I40E_VSI_MAIN || vsi->type == I40E_VSI_VMDQ2))
> > >   return -EINVAL;
> > >
> > >   if (qid >= vsi->num_queue_pairs)
> > > --
> > > 2.7.4
> > >


Re: [PATCH bpf-next] selftests/bpf: Check trampoline execution in d_path test

2020-09-11 Thread Jiri Olsa
On Thu, Sep 10, 2020 at 05:46:21PM -0700, Alexei Starovoitov wrote:
> On Thu, Sep 10, 2020 at 5:22 AM Jiri Olsa  wrote:
> >
> > Some kernels builds might inline vfs_getattr call within
> > fstat syscall code path, so fentry/vfs_getattr trampoline
> > is not called.
> >
> > I'm not sure how to handle this in some generic way other
> > than use some other function, but that might get inlined at
> > some point as well.
> 
> It's great that we had the test and it failed.
> Doing the test skipping will only hide the problem.
> Please don't do it here and in the future.
> Instead let's figure out the real solution.
> Assuming that vfs_getattr was added to btf_allowlist_d_path
> for a reason we have to make this introspection place
> reliable regardless of compiler inlining decisions.
> We can mark it as 'noinline', but that's undesirable.
> I suggest we remove it from the allowlist and replace it with
> security_inode_getattr.
> I think that is a better long term fix.

in my case vfs_getattr got inlined in vfs_statx_fd and both
of them are defined in fs/stat.c 

so the idea is that inlining will not happen if the function
is defined in another object? or less likely..?

we should be safe when it's called from module

> While at it I would apply the same critical thinking to other
> functions in the allowlist. They might suffer the same issue.
> So s/vfs_truncate/security_path_truncate/ and so on?
> Things won't work when CONFIG_SECURITY is off, but that is a rare kernel 
> config?
> Or add both security_* and vfs_* variants and switch tests to use security_* ?
> but it feels fragile to allow inline-able funcs in allowlist.

hm, what's the difference between vfs_getattr and security_inode_getattr
in this regard? I'd expect compiler could inline it same way as for vfs_getattr

jirka



Re: [PATCH bpf-next] selftests/bpf: Check trampoline execution in d_path test

2020-09-11 Thread Jiri Olsa
On Thu, Sep 10, 2020 at 03:22:10PM -0700, Andrii Nakryiko wrote:
> On Thu, Sep 10, 2020 at 5:25 AM Jiri Olsa  wrote:
> >
> > Some kernels builds might inline vfs_getattr call within
> > fstat syscall code path, so fentry/vfs_getattr trampoline
> > is not called.
> >
> > I'm not sure how to handle this in some generic way other
> > than use some other function, but that might get inlined at
> > some point as well.
> >
> > Adding flags that indicate trampolines were called and failing
> > the test if neither of them got called.
> >
> >   $ sudo ./test_progs -t d_path
> >   test_d_path:PASS:setup 0 nsec
> >   ...
> >   trigger_fstat_events:PASS:trigger 0 nsec
> >   test_d_path:FAIL:124 trampolines not called
> >   #22 d_path:FAIL
> >   Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> >
> > If only one trampoline is called, it's still enough to test
> > the helper, so only warn about missing trampoline call and
> > continue in test.
> >
> >   $ sudo ./test_progs -t d_path -v
> >   test_d_path:PASS:setup 0 nsec
> >   ...
> >   trigger_fstat_events:PASS:trigger 0 nsec
> >   fentry/vfs_getattr not called
> >   #22 d_path:OK
> >   Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> >
> > Signed-off-by: Jiri Olsa 
> > ---
> 
> Acked-by: Andrii Nakryiko 
> 
> >  .../testing/selftests/bpf/prog_tests/d_path.c | 25 +++
> >  .../testing/selftests/bpf/progs/test_d_path.c |  7 ++
> >  2 files changed, 27 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/d_path.c 
> > b/tools/testing/selftests/bpf/prog_tests/d_path.c
> > index fc12e0d445ff..ec15f7d1dd0a 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/d_path.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/d_path.c
> > @@ -120,26 +120,41 @@ void test_d_path(void)
> > if (err < 0)
> > goto cleanup;
> >
> > +   if (!bss->called_stat && !bss->called_close) {
> > +   PRINT_FAIL("trampolines not called\n");
> > +   goto cleanup;
> > +   }
> > +
> > +   if (!bss->called_stat) {
> > +   fprintf(stdout, "fentry/vfs_getattr not called\n");
> > +   goto cleanup;
> > +   }
> > +
> > +   if (!bss->called_close) {
> > +   fprintf(stdout, "fentry/filp_close not called\n");
> > +   goto cleanup;
> > +   }
> 
> not sure why you didn't go with `if (CHECK(!bss->called_close, ...`
> for these checks, would even save you some typing.

ok

> 
> > +
> > for (int i = 0; i < MAX_FILES; i++) {
> > -   CHECK(strncmp(src.paths[i], bss->paths_stat[i], 
> > MAX_PATH_LEN),
> > +   CHECK(bss->called_stat && strncmp(src.paths[i], 
> > bss->paths_stat[i], MAX_PATH_LEN),
> >   "check",
> >   "failed to get stat path[%d]: %s vs %s\n",
> >   i, src.paths[i], bss->paths_stat[i]);
> > -   CHECK(strncmp(src.paths[i], bss->paths_close[i], 
> > MAX_PATH_LEN),
> > +   CHECK(bss->called_close && strncmp(src.paths[i], 
> > bss->paths_close[i], MAX_PATH_LEN),
> >   "check",
> >   "failed to get close path[%d]: %s vs %s\n",
> >   i, src.paths[i], bss->paths_close[i]);
> > /* The d_path helper returns size plus NUL char, hence + 1 
> > */
> > -   CHECK(bss->rets_stat[i] != strlen(bss->paths_stat[i]) + 1,
> > +   CHECK(bss->called_stat && bss->rets_stat[i] != 
> > strlen(bss->paths_stat[i]) + 1,
> >   "check",
> >   "failed to match stat return [%d]: %d vs %zd [%s]\n",
> >   i, bss->rets_stat[i], strlen(bss->paths_stat[i]) + 1,
> >   bss->paths_stat[i]);
> > -   CHECK(bss->rets_close[i] != strlen(bss->paths_stat[i]) + 1,
> > +   CHECK(bss->called_close && bss->rets_close[i] != 
> > strlen(bss->paths_close[i]) + 1,
> >   "check",
> >   "failed to match stat return [%d]: %d vs %zd [%s]\n",
> >   i, bss->rets_close[i], strlen(bss->paths_close[i]) + 
> > 1,
> > - bss->paths_stat[i]);
> > + bss->paths_close[i]);
> 
> 
> those `bss->called_xxx` guard conditions are a bit lost on reading, if
> you reordered CHECKs, you could be more explicit:
> 
> if (bss->called_stat) {
> CHECK(...);
> CHECK(...);
> }
> if (bss->called_close) { ... }

ok, will change

thanks,
jirka



Re: [PATCH bpf-next 2/2] selftests/bpf: Adding test for arg dereference in extension trace

2020-09-11 Thread Jiri Olsa
On Thu, Sep 10, 2020 at 03:34:26PM -0700, Andrii Nakryiko wrote:

SNIP

> > +
> > +void test_trace_ext(void)
> > +{
> > +   struct test_trace_ext_tracing *skel_trace = NULL;
> > +   struct test_trace_ext_tracing__bss *bss_trace;
> > +   const char *file = "./test_pkt_md_access.o";
> > +   struct test_trace_ext *skel_ext = NULL;
> > +   struct test_trace_ext__bss *bss_ext;
> > +   int err, prog_fd, ext_fd;
> > +   struct bpf_object *obj;
> > +   char buf[100];
> > +   __u32 retval;
> > +   __u64 len;
> > +
> > +   err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
> > +   if (CHECK_FAIL(err))
> > +   return;
> 
> We should avoid using bpf_prog_load() for new code. Can you please
> just skeleton instead? Or at least bpf_object__open_file()?

ok

> 
> > +
> > +   DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
> > +   .attach_prog_fd = prog_fd,
> > +   );
> 
> DECLARE_LIBBPF_OPTS does declare a variable, so should be together
> with all the other variables above, otherwise some overly strict C89
> mode compiler will start complaining. You can assign
> `opts.attach_prog_fd = prog_fd;` outside of declaration. But I also
> don't think you need this one. Having .attach_prog_fd in open_opts is
> not great, because it's a per-program setting specified at bpf_object
> level. Would bpf_program__set_attach_target() work here?

right, I'll try it, it should be enough

SNIP

> > +
> > +cleanup:
> > +   test_trace_ext__destroy(skel_ext);
> > +   bpf_object__close(obj);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_trace_ext.c 
> > b/tools/testing/selftests/bpf/progs/test_trace_ext.c
> > new file mode 100644
> > index ..a6318f6b52ee
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_trace_ext.c
> > @@ -0,0 +1,18 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (c) 2019 Facebook
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +volatile __u64 ext_called = 0;
> 
> nit: no need for volatile, global variables are not going anywhere;
> same below in two places

ok, thanks

jirka



[PATCH bpf-next v2 4/5] bpf: selftests: add MPTCP test base

2020-09-11 Thread Nicolas Rybowski
This patch adds a base for MPTCP specific tests.

It is currently limited to the is_mptcp field in case of plain TCP
connection because for the moment there is no easy way to get the subflow
sk from a msk in userspace. This implies that we cannot lookup the
sk_storage attached to the subflow sk in the sockops program.

Acked-by: Matthieu Baerts 
Signed-off-by: Nicolas Rybowski 
---

Notes:
v1 -> v2:
- new patch: mandatory selftests (Alexei)

 tools/testing/selftests/bpf/config|   1 +
 tools/testing/selftests/bpf/network_helpers.c |  37 +-
 tools/testing/selftests/bpf/network_helpers.h |   3 +
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 119 ++
 tools/testing/selftests/bpf/progs/mptcp.c |  48 +++
 5 files changed, 203 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/mptcp.c
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp.c

diff --git a/tools/testing/selftests/bpf/config 
b/tools/testing/selftests/bpf/config
index 2118e23ac07a..8377836ea976 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -39,3 +39,4 @@ CONFIG_BPF_JIT=y
 CONFIG_BPF_LSM=y
 CONFIG_SECURITY=y
 CONFIG_LIRC=y
+CONFIG_MPTCP=y
diff --git a/tools/testing/selftests/bpf/network_helpers.c 
b/tools/testing/selftests/bpf/network_helpers.c
index 12ee40284da0..85cbb683965c 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -14,6 +14,10 @@
 #include "bpf_util.h"
 #include "network_helpers.h"
 
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+
 #define clean_errno() (errno == 0 ? "None" : strerror(errno))
 #define log_err(MSG, ...) ({   \
int __save = errno; \
@@ -66,8 +70,8 @@ static int settimeo(int fd, int timeout_ms)
 
 #define save_errno_close(fd) ({ int __save = errno; close(fd); errno = __save; 
})
 
-int start_server(int family, int type, const char *addr_str, __u16 port,
-int timeout_ms)
+static int start_server_proto(int family, int type, int protocol,
+ const char *addr_str, __u16 port, int timeout_ms)
 {
struct sockaddr_storage addr = {};
socklen_t len;
@@ -76,7 +80,7 @@ int start_server(int family, int type, const char *addr_str, 
__u16 port,
if (make_sockaddr(family, addr_str, port, &addr, &len))
return -1;
 
-   fd = socket(family, type, 0);
+   fd = socket(family, type, protocol);
if (fd < 0) {
log_err("Failed to create server socket");
return -1;
@@ -104,6 +108,19 @@ int start_server(int family, int type, const char 
*addr_str, __u16 port,
return -1;
 }
 
+int start_server(int family, int type, const char *addr_str, __u16 port,
+int timeout_ms)
+{
+   return start_server_proto(family, type, 0, addr_str, port, timeout_ms);
+}
+
+int start_mptcp_server(int family, const char *addr_str, __u16 port,
+  int timeout_ms)
+{
+   return start_server_proto(family, SOCK_STREAM, IPPROTO_MPTCP, addr_str,
+ port, timeout_ms);
+}
+
 int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
 int timeout_ms)
 {
@@ -153,7 +170,7 @@ static int connect_fd_to_addr(int fd,
return 0;
 }
 
-int connect_to_fd(int server_fd, int timeout_ms)
+static int connect_to_fd_proto(int server_fd, int protocol, int timeout_ms)
 {
struct sockaddr_storage addr;
struct sockaddr_in *addr_in;
@@ -173,7 +190,7 @@ int connect_to_fd(int server_fd, int timeout_ms)
}
 
addr_in = (struct sockaddr_in *)&addr;
-   fd = socket(addr_in->sin_family, type, 0);
+   fd = socket(addr_in->sin_family, type, protocol);
if (fd < 0) {
log_err("Failed to create client socket");
return -1;
@@ -192,6 +209,16 @@ int connect_to_fd(int server_fd, int timeout_ms)
return -1;
 }
 
+int connect_to_fd(int server_fd, int timeout_ms)
+{
+   return connect_to_fd_proto(server_fd, 0, timeout_ms);
+}
+
+int connect_to_mptcp_fd(int server_fd, int timeout_ms)
+{
+   return connect_to_fd_proto(server_fd, IPPROTO_MPTCP, timeout_ms);
+}
+
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms)
 {
struct sockaddr_storage addr;
diff --git a/tools/testing/selftests/bpf/network_helpers.h 
b/tools/testing/selftests/bpf/network_helpers.h
index 7205f8afdba1..336423a789e9 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -35,7 +35,10 @@ extern struct ipv6_packet pkt_v6;
 
 int start_server(int family, int type, const char *addr, __u16 port,
 int timeout_ms);
+int start_mptcp_server(int family, const char *addr, __u16 port,
+  int timeout_ms);
 int connect

[PATCH net-next 12/13] mptcp: call tcp_cleanup_rbuf on subflows

2020-09-11 Thread Paolo Abeni
That is needed to let the subflows announce promptly when new
space is available in the receive buffer.

tcp_cleanup_rbuf() is currently a static function, drop the
scope modifier and add a declaration in the TCP header.

Reviewed-by: Mat Martineau 
Signed-off-by: Paolo Abeni 
---
 include/net/tcp.h| 2 ++
 net/ipv4/tcp.c   | 2 +-
 net/mptcp/protocol.c | 6 ++
 net/mptcp/subflow.c  | 2 ++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e85d564446c6..852f0d71dd40 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1414,6 +1414,8 @@ static inline int tcp_full_space(const struct sock *sk)
return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
 }
 
+void tcp_cleanup_rbuf(struct sock *sk, int copied);
+
 /* We provision sk_rcvbuf around 200% of sk_rcvlowat.
  * If 87.5 % (7/8) of the space has been consumed, we want to override
  * SO_RCVLOWAT constraint, since we are receiving skbs with too small
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 57a568875539..d3781b6087cb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1527,7 +1527,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr 
*msg, int len)
  * calculation of whether or not we must ACK for the sake of
  * a window update.
  */
-static void tcp_cleanup_rbuf(struct sock *sk, int copied)
+void tcp_cleanup_rbuf(struct sock *sk, int copied)
 {
struct tcp_sock *tp = tcp_sk(sk);
bool time_to_ack = false;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 148c4e685ecd..a17e534a1425 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -515,6 +515,8 @@ static bool __mptcp_move_skbs_from_subflow(struct 
mptcp_sock *msk,
} while (more_data_avail);
 
*bytes += moved;
+   if (moved)
+   tcp_cleanup_rbuf(ssk, moved);
 
return done;
 }
@@ -1422,10 +1424,14 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock 
*msk, int copied)
 */
mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk;
+   bool slow;
 
ssk = mptcp_subflow_tcp_sock(subflow);
+   slow = lock_sock_fast(ssk);
WRITE_ONCE(ssk->sk_rcvbuf, rcvbuf);
tcp_sk(ssk)->window_clamp = window_clamp;
+   tcp_cleanup_rbuf(ssk, 1);
+   unlock_sock_fast(ssk, slow);
}
}
}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 58f2349930a5..fb59bbd9b4cc 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -823,6 +823,8 @@ static void mptcp_subflow_discard_data(struct sock *ssk, 
struct sk_buff *skb,
sk_eat_skb(ssk, skb);
if (mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len)
subflow->map_valid = 0;
+   if (incr)
+   tcp_cleanup_rbuf(ssk, incr);
 }
 
 static bool subflow_check_data_avail(struct sock *ssk)
-- 
2.26.2



[PATCH net-next 02/13] mptcp: set data_ready status bit in subflow_check_data_avail()

2020-09-11 Thread Paolo Abeni
This simplify mptcp_subflow_data_available() and will
made follow-up patches simpler.

Additionally remove the unneeded checks on subflow copied_seq:
we always whole skbs out of subflows.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/subflow.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 7ae1d3604047..53b455c3c229 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -825,6 +825,8 @@ static bool subflow_check_data_avail(struct sock *ssk)
 
pr_debug("msk=%p ssk=%p data_avail=%d skb=%p", subflow->conn, ssk,
 subflow->data_avail, skb_peek(&ssk->sk_receive_queue));
+   if (!skb_peek(&ssk->sk_receive_queue))
+   subflow->data_avail = 0;
if (subflow->data_avail)
return true;
 
@@ -849,6 +851,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
subflow->map_data_len = skb->len;
subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq -
   subflow->ssn_offset;
+   subflow->data_avail = 1;
return true;
}
 
@@ -876,8 +879,10 @@ static bool subflow_check_data_avail(struct sock *ssk)
ack_seq = mptcp_subflow_get_mapped_dsn(subflow);
pr_debug("msk ack_seq=%llx subflow ack_seq=%llx", old_ack,
 ack_seq);
-   if (ack_seq == old_ack)
+   if (ack_seq == old_ack) {
+   subflow->data_avail = 1;
break;
+   }
 
/* only accept in-sequence mapping. Old values are spurious
 * retransmission; we can hit "future" values on active backup
@@ -922,13 +927,13 @@ static bool subflow_check_data_avail(struct sock *ssk)
ssk->sk_error_report(ssk);
tcp_set_state(ssk, TCP_CLOSE);
tcp_send_active_reset(ssk, GFP_ATOMIC);
+   subflow->data_avail = 0;
return false;
 }
 
 bool mptcp_subflow_data_available(struct sock *sk)
 {
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
-   struct sk_buff *skb;
 
/* check if current mapping is still valid */
if (subflow->map_valid &&
@@ -941,15 +946,7 @@ bool mptcp_subflow_data_available(struct sock *sk)
 subflow->map_data_len);
}
 
-   if (!subflow_check_data_avail(sk)) {
-   subflow->data_avail = 0;
-   return false;
-   }
-
-   skb = skb_peek(&sk->sk_receive_queue);
-   subflow->data_avail = skb &&
-  before(tcp_sk(sk)->copied_seq, TCP_SKB_CB(skb)->end_seq);
-   return subflow->data_avail;
+   return subflow_check_data_avail(sk);
 }
 
 /* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
-- 
2.26.2



[PATCH net-next 09/13] mptcp: move address attribute into mptcp_addr_info

2020-09-11 Thread Paolo Abeni
So that can be accessed easily from the subflow creation
helper. No functional change intended.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/pm_netlink.c | 39 ---
 net/mptcp/protocol.h   |  5 +++--
 net/mptcp/subflow.c|  5 ++---
 3 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 2c208d2e65cd..6947f4fee6b9 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -23,8 +23,6 @@ static int pm_nl_pernet_id;
 
 struct mptcp_pm_addr_entry {
struct list_headlist;
-   unsigned intflags;
-   int ifindex;
struct mptcp_addr_info  addr;
struct rcu_head rcu;
 };
@@ -119,7 +117,7 @@ select_local_address(const struct pm_nl_pernet *pernet,
rcu_read_lock();
spin_lock_bh(&msk->join_list_lock);
list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
-   if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
+   if (!(entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
continue;
 
/* avoid any address already in use by subflows and
@@ -150,7 +148,7 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned 
int pos)
 * can lead to additional addresses not being announced.
 */
list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
-   if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
+   if (!(entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
continue;
if (i++ == pos) {
ret = entry;
@@ -210,8 +208,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct 
mptcp_sock *msk)
msk->pm.subflows++;
check_work_pending(msk);
spin_unlock_bh(&msk->pm.lock);
-   __mptcp_subflow_connect(sk, local->ifindex,
-   &local->addr, &remote);
+   __mptcp_subflow_connect(sk, &local->addr, &remote);
spin_lock_bh(&msk->pm.lock);
return;
}
@@ -257,13 +254,13 @@ void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
local.family = remote.family;
 
spin_unlock_bh(&msk->pm.lock);
-   __mptcp_subflow_connect((struct sock *)msk, 0, &local, &remote);
+   __mptcp_subflow_connect((struct sock *)msk, &local, &remote);
spin_lock_bh(&msk->pm.lock);
 }
 
 static bool address_use_port(struct mptcp_pm_addr_entry *entry)
 {
-   return (entry->flags &
+   return (entry->addr.flags &
(MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) ==
MPTCP_PM_ADDR_FLAG_SIGNAL;
 }
@@ -293,9 +290,9 @@ static int mptcp_pm_nl_append_new_local_addr(struct 
pm_nl_pernet *pernet,
goto out;
}
 
-   if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
+   if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
pernet->add_addr_signal_max++;
-   if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
+   if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
pernet->local_addr_max++;
 
entry->addr.id = pernet->next_id++;
@@ -345,8 +342,9 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct 
sock_common *skc)
if (!entry)
return -ENOMEM;
 
-   entry->flags = 0;
entry->addr = skc_local;
+   entry->addr.ifindex = 0;
+   entry->addr.flags = 0;
ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
if (ret < 0)
kfree(entry);
@@ -460,14 +458,17 @@ static int mptcp_pm_parse_addr(struct nlattr *attr, 
struct genl_info *info,
entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]);
 
 skip_family:
-   if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX])
-   entry->ifindex = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]);
+   if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX]) {
+   u32 val = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]);
+
+   entry->addr.ifindex = val;
+   }
 
if (tb[MPTCP_PM_ADDR_ATTR_ID])
entry->addr.id = nla_get_u8(tb[MPTCP_PM_ADDR_ATTR_ID]);
 
if (tb[MPTCP_PM_ADDR_ATTR_FLAGS])
-   entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);
+   entry->addr.flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);
 
return 0;
 }
@@ -535,9 +536,9 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, 
struct genl_info *info)
ret = -EINVAL;
goto out;
}
-   if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
+   if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
pernet->add_addr_signal_max--;
-   if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
+   if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_

Re: [net-next v5 01/15] virtchnl: Extend AVF ops

2020-09-11 Thread Jakub Kicinski
On Thu, 10 Sep 2020 21:06:05 + Brady, Alan wrote:
> > It seems like these are triggering on old messages too, curious why this 
> > wasn't
> > caught sooner.  Will fix, thanks.
> > 
> I managed to get a 32-bit build environment setup and found that we
> do indeed have alignment issues there on 32 bit systems for some of
> the new ops we added with the series.  However, I think I'm still
> missing something as it looks like you have errors triggering on much
> more than I found and I'm suspecting there might be a compile option
> I'm missing or perhaps my GCC version is older than yours.  E.g., I
> found issues in virtchnl_txq_info_v2, virtchnl_rxq_info_v2,
> virtchnl_config_rx_queues, and virtchnl_rss_hash.  It appears you
> have compile issues in virtchnl_get_capabilities (among others)
> however which did not trigger on mine.  Manual inspection indicates
> that it _should_ be triggering a failure and that your setup is more
> correct than mine.  I'm guessing some extra padding is getting
> included in some places and causing a false positive on the other
> alignment issues.  Are there any hints you can provide me that might
> help me more accurately reproduce this?

Hm. I build like this:

make CC="ccache gcc" O=build32/ ARCH=i386 allmodconfig
make CC="ccache gcc" O=build32/ ARCH=i386 -j 64 W=1 C=1

My GCC is:
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-10.1.0/configure --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.1.0 (GCC) 


[PATCH bpf-next v2 0/5] bpf: add MPTCP subflow support

2020-09-11 Thread Nicolas Rybowski
Previously it was not possible to make a distinction between plain TCP
sockets and MPTCP subflow sockets on the BPF_PROG_TYPE_SOCK_OPS hook.

This patch series now enables a fine control of subflow sockets. In its
current state, it allows to put different sockopt on each subflow from a
same MPTCP connection (socket mark, TCP congestion algorithm, ...) using
BPF programs.

It should also be the basis of exposing MPTCP-specific fields through BPF.

v1 -> v2:
- add basic mandatory selftests for the new helper and is_mptcp field (Alexei)
- rebase on latest bpf-next

Nicolas Rybowski (5):
  bpf: expose is_mptcp flag to bpf_tcp_sock
  mptcp: attach subflow socket to parent cgroup
  bpf: add 'bpf_mptcp_sock' structure and helper
  bpf: selftests: add MPTCP test base
  bpf: selftests: add bpf_mptcp_sock() verifier tests

 include/linux/bpf.h   |  33 +
 include/uapi/linux/bpf.h  |  15 +++
 kernel/bpf/verifier.c |  30 +
 net/core/filter.c |  13 +-
 net/mptcp/Makefile|   2 +
 net/mptcp/bpf.c   |  72 +++
 net/mptcp/subflow.c   |  27 
 scripts/bpf_helpers_doc.py|   2 +
 tools/include/uapi/linux/bpf.h|  15 +++
 tools/testing/selftests/bpf/config|   1 +
 tools/testing/selftests/bpf/network_helpers.c |  37 +-
 tools/testing/selftests/bpf/network_helpers.h |   3 +
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 119 ++
 tools/testing/selftests/bpf/progs/mptcp.c |  48 +++
 tools/testing/selftests/bpf/verifier/sock.c   |  63 ++
 15 files changed, 474 insertions(+), 6 deletions(-)
 create mode 100644 net/mptcp/bpf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/mptcp.c
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp.c

-- 
2.28.0



[PATCH] vhost: reduce stack usage in log_used

2020-09-11 Thread Li Wang
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/vhost/vhost.c: In function log_used:
drivers/vhost/vhost.c:1906:1:
warning: the frame size of 1040 bytes is larger than 1024 bytes

Signed-off-by: Li Wang 
---
 drivers/vhost/vhost.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519c..41769de 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1884,25 +1884,31 @@ static int log_write_hva(struct vhost_virtqueue *vq, 
u64 hva, u64 len)
 
 static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
 {
-   struct iovec iov[64];
+   struct iovec *iov;
int i, ret;
 
if (!vq->iotlb)
return log_write(vq->log_base, vq->log_addr + used_offset, len);
 
+   iov = kcalloc(64, sizeof(*iov), GFP_KERNEL);
+   if (!iov)
+   return -ENOMEM;
+
ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
 len, iov, 64, VHOST_ACCESS_WO);
if (ret < 0)
-   return ret;
+   goto out;
 
for (i = 0; i < ret; i++) {
ret = log_write_hva(vq, (uintptr_t)iov[i].iov_base,
iov[i].iov_len);
if (ret)
-   return ret;
+   goto out;
}
 
-   return 0;
+out:
+   kfree(iov);
+   return ret;
 }
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-- 
2.7.4



Re: [PATCH v3 00/11] Fix PM hibernation in Xen guests

2020-09-11 Thread boris . ostrovsky


On 8/21/20 6:22 PM, Anchal Agarwal wrote:
>
> Known issues:
> 1.KASLR causes intermittent hibernation failures. VM fails to resumes and
> has to be restarted. I will investigate this issue separately and shouldn't
> be a blocker for this patch series.


Is there any change in status for this? This has been noted since January.


-boris


> 2. During hibernation, I observed sometimes that freezing of tasks fails due
> to busy XFS workqueuei[xfs-cil/xfs-sync]. This is also intermittent may be 1
> out of 200 runs and hibernation is aborted in this case. Re-trying hibernation
> may work. Also, this is a known issue with hibernation and some
> filesystems like XFS has been discussed by the community for years with not an
> effectve resolution at this point.
>


Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs

2020-09-11 Thread Marcelo Tosatti
On Wed, Sep 09, 2020 at 11:08:17AM -0400, Nitesh Narayan Lal wrote:
> In a realtime environment, it is essential to isolate unwanted IRQs from
> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> based on the online CPUs could lead to a potential issue on an RT setup
> that has several isolated CPUs but a very few housekeeping CPUs. This is
> because in these kinds of setups an attempt to move the IRQs to the
> limited housekeeping CPUs from isolated CPUs might fail due to the per
> CPU vector limit. This could eventually result in latency spikes because
> of the IRQ threads that we fail to move from isolated CPUs.
> 
> This patch prevents i40e to add vectors only based on available
> housekeeping CPUs by using num_housekeeping_cpus().
> 
> Signed-off-by: Nitesh Narayan Lal 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 2e433fdbf2c3..3b4cd4b3de85 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  /* Local includes */
> @@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
>* will use any remaining vectors to reach as close as we can to the
>* number of online CPUs.
>*/
> - cpus = num_online_cpus();
> + cpus = num_housekeeping_cpus();
>   pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
>   vectors_left -= pf->num_lan_msix;
>  
> -- 
> 2.27.0

For patches 1 and 2:

Reviewed-by: Marcelo Tosatti 



pull-request: wireless-drivers-next-2020-09-11

2020-09-11 Thread Kalle Valo
Hi,

here's a pull request to net-next tree, more info below. Please let me know if
there are any problems.

Kalle

The following changes since commit 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5:

  Linux 5.9-rc1 (2020-08-16 13:04:57 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git 
tags/wireless-drivers-next-2020-09-11

for you to fetch changes up to 5941d003f0a60877a956cc3cae6e3850b46fad0a:

  Merge ath-next from 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git (2020-09-11 
18:03:00 +0300)


wireless-drivers-next patches for v5.10

First set of patches for v5.10. Most noteworthy here is ath11k getting
initial support for QCA6390 and IPQ6018 devices. But most of the
patches are cleanup: W=1 warning fixes, fallthrough keywords, DMA API
changes and tasklet API changes.

Major changes:

ath10k

* support SDIO firmware codedumps

* support station specific TID configurations

ath11k

* add support for IPQ6018

* add support for QCA6390 PCI devices

ath9k

* add support for NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 to improve PTK0
  rekeying

wcn36xx

* add support for TX ack


Alex Dewar (1):
  ath11k: return error if firmware request fails

Alexander A. Klimov (2):
  ath9k: Replace HTTP links with HTTPS ones
  ath5k: Replace HTTP links with HTTPS ones

Alexander Wetzel (1):
  ath9k: add NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 support

Allen Pais (17):
  ath5k: convert tasklets to use new tasklet_setup() API
  ath9k: convert tasklets to use new tasklet_setup() API
  carl9170: convert tasklets to use new tasklet_setup() API
  atmel: convert tasklets to use new tasklet_setup() API
  b43legacy: convert tasklets to use new tasklet_setup() API
  brcmsmac: convert tasklets to use new tasklet_setup() API
  ipw2x00: convert tasklets to use new tasklet_setup() API
  iwlegacy: convert tasklets to use new tasklet_setup() API
  intersil: convert tasklets to use new tasklet_setup() API
  mwl8k: convert tasklets to use new tasklet_setup() API
  qtnfmac: convert tasklets to use new tasklet_setup() API
  rt2x00: convert tasklets to use new tasklet_setup() API
  rtlwifi/rtw88: convert tasklets to use new tasklet_setup() API
  zd1211rw: convert tasklets to use new tasklet_setup() API
  ath11k: convert tasklets to use new tasklet_setup() API
  zd1211rw: fix build warning
  rtlwifi: fix build warning

Andy Shevchenko (1):
  brcmfmac: use %*ph to print small buffer

Anilkumar Kolli (11):
  ath11k: update firmware files read path
  ath11k: rename default board file
  ath11k: ahb: call ath11k_core_init() before irq configuration
  ath11k: convert ath11k_hw_params to an array
  ath11k: define max_radios in hw_params
  ath11k: add hw_ops for pdev id to hw_mac mapping
  ath11k: Add bdf-addr in hw_params
  dt: bindings: net: update compatible for ath11k
  ath11k: move target ce configs to hw_params
  ath11k: add ipq6018 support
  ath11k: remove calling ath11k_init_hw_params() second time

Bolarinwa Olayemi Saheed (1):
  ath9k: Check the return value of pcie_capability_read_*()

Brian Norris (2):
  rtw88: don't treat NULL pointer as an array
  rtw88: use read_poll_timeout_atomic() for poll loop

Bryan O'Donoghue (9):
  wcn36xx: Fix reported 802.11n rx_highest rate wcn3660/wcn3680
  wcn36xx: Add a chip identifier for WCN3680
  wcn36xx: Hook and identify RF_IRIS_WCN3680
  wcn36xx: Add 802.11ac MCS rates
  wcn36xx: Specify ieee80211_rx_status.nss
  wcn36xx: Add 802.11ac HAL param bitfields
  wcn36xx: Add Supported rates V1 structure
  wcn36xx: Use existing pointers in wcn36xx_smd_config_bss_v1
  wcn36xx: Set feature DOT11AC for wcn3680

Carl Huang (24):
  ath11k: do not depend on ARCH_QCOM for ath11k
  ath11k: add hw_params entry for QCA6390
  ath11k: allocate smaller chunks of memory for firmware
  ath11k: fix memory OOB access in qmi_decode
  ath11k: fix KASAN warning of ath11k_qmi_wlanfw_wlan_cfg_send
  ath11k: enable internal sleep clock
  ath11k: hal: create register values dynamically
  ath11k: ce: support different CE configurations
  ath11k: hal: assign msi_addr and msi_data to srng
  ath11k: ce: get msi_addr and msi_data before srng setup
  ath11k: disable CE interrupt before hif start
  ath11k: force single pdev only for QCA6390
  ath11k: initialize wmi config based on hw_params
  ath11k: wmi: put hardware to DBS mode
  ath11k: dp: redefine peer_map and peer_unmap
  ath11k: enable DP interrupt setup for QCA6390
  ath11k: don't initialize rxdma1 related ring
  ath11k: setup QCA6390 rings for both rxdmas
  ath11k: refine the phy_id check in ath11k_reg_chan_list_event
  at

Re: [PATCH net v1] hinic: fix rewaking txq after netif_tx_disable

2020-09-11 Thread Jakub Kicinski
On Thu, 10 Sep 2020 22:04:40 +0800 Luo bin wrote:
> When calling hinic_close in hinic_set_channels, all queues are
> stopped after netif_tx_disable, but some queue may be rewaken in
> free_tx_poll by mistake while drv is handling tx irq. If one queue
> is rewaken core may call hinic_xmit_frame to send pkt after
> netif_tx_disable within a short time which may results in accessing
> memory that has been already freed in hinic_close. So we call
> napi_disable before netif_tx_disable in hinic_close to fix this bug.
> 
> Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support")
> Signed-off-by: Luo bin 

Reviewed-by: Jakub Kicinski 


Re: [RFC PATCH net-next 10/22] nexthop: Allow setting "offload" and "trap" indications on nexthops

2020-09-11 Thread Ido Schimmel
On Tue, Sep 08, 2020 at 09:14:37AM -0600, David Ahern wrote:
> On 9/8/20 3:10 AM, Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > Add a function that can be called by device drivers to set "offload" or
> > "trap" indication on nexthops following nexthop notifications.
> > 
> > Signed-off-by: Ido Schimmel 
> > ---
> >  include/net/nexthop.h |  1 +
> >  net/ipv4/nexthop.c| 21 +
> >  2 files changed, 22 insertions(+)
> > 
> > diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> > index 0bde1aa867c0..4147681e86d2 100644
> > --- a/include/net/nexthop.h
> > +++ b/include/net/nexthop.h
> > @@ -146,6 +146,7 @@ struct nh_notifier_info {
> >  
> >  int register_nexthop_notifier(struct net *net, struct notifier_block *nb);
> >  int unregister_nexthop_notifier(struct net *net, struct notifier_block 
> > *nb);
> > +void nexthop_hw_flags_set(struct net *net, u32 id, bool offload, bool 
> > trap);
> 
> how about nexthop_set_hw_flags? consistency with current nexthop_get_
> ... naming

Sure. I opted for consistency with fib_alias_hw_flags_set() and
fib6_info_hw_flags_set(), but I'll change to be consistent with nexthop
code.

> 
> >  
> >  /* caller is holding rcu or rtnl; no reference taken to nexthop */
> >  struct nexthop *nexthop_find_by_id(struct net *net, u32 id);
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index 70c8ab6906ec..71605c612458 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -2080,6 +2080,27 @@ int unregister_nexthop_notifier(struct net *net, 
> > struct notifier_block *nb)
> >  }
> >  EXPORT_SYMBOL(unregister_nexthop_notifier);
> >  
> > +void nexthop_hw_flags_set(struct net *net, u32 id, bool offload, bool trap)
> > +{
> > +   struct nexthop *nexthop;
> > +
> > +   rcu_read_lock();
> > +
> > +   nexthop = nexthop_find_by_id(net, id);
> > +   if (!nexthop)
> > +   goto out;
> > +
> > +   nexthop->nh_flags &= ~(RTNH_F_OFFLOAD | RTNH_F_TRAP);
> > +   if (offload)
> > +   nexthop->nh_flags |= RTNH_F_OFFLOAD;
> > +   if (trap)
> > +   nexthop->nh_flags |= RTNH_F_TRAP;
> > +
> > +out:
> > +   rcu_read_unlock();
> > +}
> > +EXPORT_SYMBOL(nexthop_hw_flags_set);
> > +
> >  static void __net_exit nexthop_net_exit(struct net *net)
> >  {
> > rtnl_lock();
> > 
> 


Re: [PATCH net-next] i40e: allow VMDQs to be used with AF_XDP zero-copy

2020-09-11 Thread Jakub Kicinski
On Fri, 11 Sep 2020 14:08:26 +0200 Magnus Karlsson wrote:
> From: Magnus Karlsson 
> 
> Allow VMDQs to be used with AF_XDP sockets in zero-copy mode. For some
> reason, we only allowed main VSIs to be used with zero-copy, but
> there is now reason to not allow VMDQs also.
> 
> Signed-off-by: Magnus Karlsson 

The VMQ interfaces that you create through a debugfs command interfaces?

IDK if we should add features to those, or pretend they never existed
in the first place..


Re: [PATCH net-next] net: stmmac: set get_rx_header_len() as void for it didn't have any error code to return

2020-09-11 Thread Jakub Kicinski
On Fri, 11 Sep 2020 11:55:58 +0800 Luo Jiaxing wrote:
> We found the following warning when using W=1 to build kernel:
> 
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3634:6: warning: variable 
> ‘ret’ set but not used [-Wunused-but-set-variable]
> int ret, coe = priv->hw->rx_csum;
> 
> When digging stmmac_get_rx_header_len(), dwmac4_get_rx_header_len() and
> dwxgmac2_get_rx_header_len() return 0 only, without any error code to
> report. Therefore, it's better to define get_rx_header_len() as void.
> 
> Signed-off-by: Luo Jiaxing 

Reviewed-by: Jakub Kicinski 


Re: [PATCH v1 1/2] net: ag71xx: add ethtool support

2020-09-11 Thread Jakub Kicinski
On Fri, 11 Sep 2020 10:25:27 +0200 Oleksij Rempel wrote:
> Add basic ethtool support. The functionality was tested on AR9331 SoC.
> 
> Signed-off-by: Oleksij Rempel 

Reviewed-by: Jakub Kicinski 


[PATCH net-next 1/3] octeontx2-af: fix LD CUSTOM LTYPE aliasing

2020-09-11 Thread skardach
From: Stanislaw Kardach 

Since LD contains LTYPE definitions tweaked toward efficient
NIX_AF_RX_FLOW_KEY_ALG(0..31)_FIELD(0..4) usage, the original location
of NPC_LT_LD_CUSTOM0/1 was aliased with MPLS_IN_* definitions.
Moving custom frame to value 6 and 7 removes the aliasing at the cost of
custom frames being also considered when TCP/UDP RSS algo is configured.

However since the goal of CUSTOM frames is to classify them to a
separate set of RQs, this cost is acceptable.

Signed-off-by: Stanislaw Kardach 
---
 drivers/net/ethernet/marvell/octeontx2/af/npc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h 
b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
index 3803af9231c6..c0ff5f70aa43 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
@@ -77,6 +77,8 @@ enum npc_kpu_ld_ltype {
NPC_LT_LD_ICMP,
NPC_LT_LD_SCTP,
NPC_LT_LD_ICMP6,
+   NPC_LT_LD_CUSTOM0,
+   NPC_LT_LD_CUSTOM1,
NPC_LT_LD_IGMP = 8,
NPC_LT_LD_ESP,
NPC_LT_LD_AH,
@@ -85,8 +87,6 @@ enum npc_kpu_ld_ltype {
NPC_LT_LD_NSH,
NPC_LT_LD_TU_MPLS_IN_NSH,
NPC_LT_LD_TU_MPLS_IN_IP,
-   NPC_LT_LD_CUSTOM0 = 0xE,
-   NPC_LT_LD_CUSTOM1 = 0xF,
 };
 
 enum npc_kpu_le_ltype {
-- 
2.20.1



[PATCH] rndis_host: increase sleep time in the query-response loop

2020-09-11 Thread Olympia Giannou
Some WinCE devices face connectivity issues via the NDIS interface. They
fail to register, resulting in -110 timeout errors and failures during the
probe procedure.

In this kind of WinCE devices, the Windows-side ndis driver needs quite
more time to be loaded and configured, so that the linux rndis host queries
to them fail to be responded correctly on time.

More specifically, when INIT is called on the WinCE side - no other
requests can be served by the Client and this results in a failed QUERY
afterwards.

The increase of the waiting time on the side of the linux rndis host in
the command-response loop leaves the INIT process to complete and respond
to a QUERY, which comes afterwards. The WinCE devices with this special
"feature" in their ndis driver are satisfied by this fix.

Signed-off-by: Olympia Giannou 
---
 drivers/net/usb/rndis_host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c
index bd9c07888ebb..6fa7a009a24a 100644
--- a/drivers/net/usb/rndis_host.c
+++ b/drivers/net/usb/rndis_host.c
@@ -201,7 +201,7 @@ int rndis_command(struct usbnet *dev, struct rndis_msg_hdr 
*buf, int buflen)
dev_dbg(&info->control->dev,
"rndis response error, code %d\n", retval);
}
-   msleep(20);
+   msleep(40);
}
dev_dbg(&info->control->dev, "rndis response timeout\n");
return -ETIMEDOUT;
-- 
2.17.1



Re: [PATCH net-next 4/4] net: dsa: set configure_vlan_while_not_filtering to true by default

2020-09-11 Thread Vladimir Oltean
On Thu, Sep 10, 2020 at 08:09:19PM -0700, Florian Fainelli wrote:
> On 9/10/2020 5:03 PM, Vladimir Oltean wrote:
> > On Thu, Sep 10, 2020 at 02:58:04PM -0700, Florian Fainelli wrote:
> > > On 9/9/2020 11:34 AM, Florian Fainelli wrote:
> > > > On 9/9/2020 10:53 AM, Vladimir Oltean wrote:
> > > > > On Wed, Sep 09, 2020 at 10:22:42AM -0700, Florian Fainelli wrote:
> > > > > > How do you make sure that the CPU port sees the frame untagged
> > > > > > which would
> > > > > > be necessary for a VLAN-unaware bridge? Do you have a special 
> > > > > > remapping
> > > > > > rule?
> > > > >
> > > > > No, I don't have any remapping rules that would be relevant here.
> > > > > Why would the frames need to be necessarily untagged for a 
> > > > > VLAN-unaware
> > > > > bridge, why is it a problem if they aren't?
> > > > >
> > > > > bool br_allowed_ingress(const struct net_bridge *br,
> > > > >      struct net_bridge_vlan_group *vg, struct sk_buff *skb,
> > > > >      u16 *vid, u8 *state)
> > > > > {
> > > > >  /* If VLAN filtering is disabled on the bridge, all packets are
> > > > >   * permitted.
> > > > >   */
> > > > >  if (!br_opt_get(br, BROPT_VLAN_ENABLED)) {
> > > > >      BR_INPUT_SKB_CB(skb)->vlan_filtered = false;
> > > > >      return true;
> > > > >  }
> > > > >
> > > > >  return __allowed_ingress(br, vg, skb, vid, state);
> > > > > }
> > > > >
> > > > > If I have a VLAN on a bridged switch port where the bridge is not
> > > > > filtering, I have an 8021q upper of the bridge with that VLAN ID.
> > > >
> > > > Yes that is the key right there, you need an 8021q upper to pop the VLAN
> > > > ID or push it, that is another thing that users need to be aware of
> > > > which is a bit awkward, most expect things to just work. Maybe we should
> > > > just refuse to have bridge devices that are not VLAN-aware, because this
> > > > is just too cumbersome to deal with.
> > >
> > > With the drivers that you currently maintain and with the CPU port being
> > > always tagged in the VLANs added to the user-facing ports, when you are
> > > using a non-VLAN aware bridge, do you systematically add an br0.1 upper
> > > 802.1Q device to pop/push the VLAN tag?
> >
> > Talking to you, I realized that I confused you uselessly. But in doing
> > that, I actually cleared up a couple of things for myself. So thanks, I
> > guess?
> >
> > This is actually a great question, and it gave me the opportunity to
> > reflect.  So, only 1 driver that I maintain has the logic of always
> > marking the CPU port as egress-tagged. And that would be ocelot/felix.
> >
> > I need to give you a bit of background.
> > The DSA mode of Ocelot switches is more of an afterthought, and I am
> > saying this because there is a distinction I need to make between the
> > "CPU port module" (which is a set of queues that the CPU can inject and
> > extract frames from), and the "NPI port" (which is an operating mode,
> > where a regular front-panel Ethernet port is connected internally to the
> > CPU port module and injection/extraction I/O can therefore be done via
> > Ethernet, and that's your DSA).
> > Basically, when the NPI mode is in use, then it behaves less like an
> > Ethernet port, and more like a set of CPU queues that connect over
> > Ethernet, if that makes sense.
>
> SYSTEMPORT + bcm_sf2 act a lot like that, too, except the CPU port still
> obeys VLAN, buffering, classification and other switch internal rules, but
> essentially we want to map queues from the user-facing port to DMAs used by
> the host processor.

Digressing a lot here, but the NPI port of Ocelot switches really isn't
like that. For example, the NPI port doesn't even need to be in the
reachability domain for a frame to reach it. Other example, a TCAM rule
to drop a frame won't prevent it from reaching the NPI port, if that was
previously selected as a destination for that frame. Other example:
there is no source address learning for traffic injected by the network
stack over the NPI port. So, on RX, every frame that should reach the
CPU is actually _flooded_, due to the destination being unknown. Other
example: the NPI port is so hardcoded to wrap everything in an
Extraction Frame Header, that it even wraps PAUSE frames in it. That one
especially is so bad that I have a patch series in the works to simply
disable the NPI port and use tag_8021q instead. I just hate it.

>
> > The port settings for VLAN are bypassed, and the packet is copied as-is
> > from ingress to the NPI port. The egress-tagged port VLAN configuration
> > does not actually result in a VLAN header being pushed into the frame,
> > if that egress port is the NPI port.  Instead, the classified VLAN ID
> > (i.e. derived from the packet, or from the port-based VLAN, or from
> > custom VLAN classification TCAM rules) is always kept in a 12-bit field
> > of the Extraction Frame Header.
> >
> > Currently I am ignoring the classified VLAN from the Extraction Frame
> > Header, an

Re: [PATCH net-next] net: dsa: b53: Configure VLANs while not filtering

2020-09-11 Thread Vladimir Oltean
On Thu, Sep 10, 2020 at 09:19:05PM -0700, Florian Fainelli wrote:
> Update the B53 driver to support VLANs while not filtering. This
> requires us to enable VLAN globally within the switch upon driver
> initial configuration (dev->vlan_enabled).
>
> We also need to remove the code that dealt with PVID re-configuration in
> b53_vlan_filtering() since that function worked under the assumption
> that it would only be called to make a bridge VLAN filtering, or not
> filtering, and we would attempt to move the port's PVID accordingly.
>
> Now that VLANs are programmed all the time, even in the case of a
> non-VLAN filtering bridge, we would be programming a default_pvid for
> the bridged switch ports.
>
> Signed-off-by: Florian Fainelli 
> ---

Not sure it's worth a lot, but:

Acked-by: Vladimir Oltean 

>  drivers/net/dsa/b53/b53_common.c | 23 ---
>  1 file changed, 4 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/dsa/b53/b53_common.c 
> b/drivers/net/dsa/b53/b53_common.c
> index 6a5796c32721..46ac8875f870 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -1377,23 +1377,6 @@ EXPORT_SYMBOL(b53_phylink_mac_link_up);
>  int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering)
>  {
>   struct b53_device *dev = ds->priv;
> - u16 pvid, new_pvid;
> -
> - b53_read16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port), &pvid);
> - if (!vlan_filtering) {
> - /* Filtering is currently enabled, use the default PVID since
> -  * the bridge does not expect tagging anymore
> -  */
> - dev->ports[port].pvid = pvid;
> - new_pvid = b53_default_pvid(dev);
> - } else {
> - /* Filtering is currently disabled, restore the previous PVID */
> - new_pvid = dev->ports[port].pvid;
> - }
> -
> - if (pvid != new_pvid)
> - b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port),
> - new_pvid);
>
>   b53_enable_vlan(dev, dev->vlan_enabled, vlan_filtering);
>
> @@ -1444,7 +1427,7 @@ void b53_vlan_add(struct dsa_switch *ds, int port,
>   untagged = true;
>
>   vl->members |= BIT(port);
> - if (untagged && !dsa_is_cpu_port(ds, port))
> + if (untagged)
>   vl->untag |= BIT(port);
>   else
>   vl->untag &= ~BIT(port);
> @@ -1482,7 +1465,7 @@ int b53_vlan_del(struct dsa_switch *ds, int port,
>   if (pvid == vid)
>   pvid = b53_default_pvid(dev);
>
> - if (untagged && !dsa_is_cpu_port(ds, port))
> + if (untagged)
>   vl->untag &= ~(BIT(port));
>
>   b53_set_vlan_entry(dev, vid, vl);
> @@ -2619,6 +2602,8 @@ struct b53_device *b53_switch_alloc(struct device *base,
>   dev->priv = priv;
>   dev->ops = ops;
>   ds->ops = &b53_switch_ops;
> + ds->configure_vlan_while_not_filtering = true;
> + dev->vlan_enabled = ds->configure_vlan_while_not_filtering;
>   mutex_init(&dev->reg_mutex);
>   mutex_init(&dev->stats_mutex);
>
> --
> 2.25.1
>


[PATCH net-next 3/3] octeontx2-af: add support for custom KPU entries

2020-09-11 Thread skardach
From: Stanislaw Kardach 

Add ability to load a set of custom KPU entries via firmware APIs. This
allows for flexible support for custom protocol parsing and CAM matching.

AF driver will attempt to load the profile from the firmware file and
verify if it can fit hardware capabilities. If not, it will revert to
the built-in profile.

Next it will replace the first KPU_MAX_CST_LT (2) entries in each KPU
in default profile with entries read from the firmware image.
The built-in profile is amended to always contain KPU_MAX_CSR_LT first
no-match entries and AF driver will disable those in the KPU unless
custom profile is loaded.

By convention the custom entries should only utilize NPC_LT_Lx_CUSTOMy
LTYPEs to maintain interoperability with netdev driver.

In relation to MKEX profile, the order of load priority is as follows:

1. Profile in loaded KPU profile.
2. Profile defined by mkex_profile parameter.
3. Built-in MKEX profile.

Firmware image contains also a list of default protocol overrides to
allow for custom protocols to be used there. This allows to apply some
packet alignment fixups for custom protocols at the cost of HW protocol
checks.

Signed-off-by: Stanislaw Kardach 
---
 .../net/ethernet/marvell/octeontx2/af/npc.h   |  40 -
 .../marvell/octeontx2/af/npc_profile.h|  90 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   6 +
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 170 ++
 5 files changed, 275 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h 
b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
index 6bfb9a9d3003..fe164b85adfb 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
@@ -148,7 +148,7 @@ struct npc_kpu_profile_cam {
u16 dp1_mask;
u16 dp2;
u16 dp2_mask;
-};
+} __packed;
 
 struct npc_kpu_profile_action {
u8 errlev;
@@ -168,7 +168,7 @@ struct npc_kpu_profile_action {
u8 mask;
u8 right;
u8 shift;
-};
+} __packed;
 
 struct npc_kpu_profile {
int cam_entries;
@@ -323,6 +323,15 @@ struct npc_mcam_kex {
u64 intf_ld_flags[NPC_MAX_INTF][NPC_MAX_LD][NPC_MAX_LFL];
 } __packed;
 
+struct npc_kpu_fwdata {
+   int entries;
+   /* What follows is:
+* struct npc_kpu_profile_cam[entries];
+* struct npc_kpu_profile_action[entries];
+*/
+   u8  data[0];
+} __packed;
+
 struct npc_lt_def {
u8  ltype_mask;
u8  ltype_match;
@@ -356,4 +365,31 @@ struct npc_lt_def_cfg {
struct npc_lt_def   pck_iip4;
 } __packed;
 
+/* Loadable KPU profile firmware data */
+struct npc_kpu_profile_fwdata {
+/* strtoull of "kpuprof" with base:36 */
+#define KPU_SIGN   0x00666f727075706b
+#define KPU_NAME_LEN   32
+/** Maximum number of custom KPU entries supported by the built-in profile. */
+#define KPU_MAX_CST_ENT2
+   /* KPU Profle Header */
+   __le64  signature; /* "kpuprof\0" (8 bytes/ASCII characters) */
+   u8  name[KPU_NAME_LEN]; /* KPU Profile name */
+   __le64  version; /* KPU profile version */
+   u8  kpus;
+   u8  reserved[7];
+
+   /* Default MKEX profile to be used with this KPU profile. Format is 
same as for the MKEX
+* profile to streamline processing.
+*/
+   struct npc_mcam_kex mkex;
+   /* LTYPE values for specific HW offloaded protocols. */
+   struct npc_lt_def_cfg   lt_def;
+   /* Dynamically sized data:
+*  Custom KPU CAM and ACTION configuration entries.
+* struct npc_kpu_fwdata kpu[kpus];
+*/
+   u8  data[0];
+} __packed;
+
 #endif /* NPC_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h 
b/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h
index 695c3b5c103e..6ba8be2e1d09 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h
@@ -11,7 +11,10 @@
 #ifndef NPC_PROFILE_H
 #define NPC_PROFILE_H
 
-#define NPC_KPU_PROFILE_VER0x00010005
+#define NPC_KPU_PROFILE_VER0x00010005
+#define NPC_KPU_VER_MAJ(ver)   (u16)(((ver) >> 32) & 0x)
+#define NPC_KPU_VER_MIN(ver)   (u16)(((ver) >> 16) & 0x)
+#define NPC_KPU_VER_PATCH(ver) (u16)((ver) & 0x)
 
 #define NPC_IH_W   0x8000
 #define NPC_IH_UTAG0x2000
@@ -424,6 +427,27 @@ enum NPC_ERRLEV_E {
NPC_ERRLEV_ENUM_LAST = 16,
 };
 
+#define NPC_KPU_NOP_CAM\
+   {   \
+   NPC_S_NA, 0xff, \
+   0x, \
+   0x, \
+   0x, \
+   0x, \
+   0x, \
+   0x, \
+   }
+
+#define NPC_KPU_NOP_ACTION \
+   {

Re: [RFC PATCH net-next 09/22] rtnetlink: Add RTNH_F_TRAP flag

2020-09-11 Thread David Ahern
On 9/11/20 9:26 AM, Ido Schimmel wrote:
> Reworded to:
> 
> "
> rtnetlink: Add RTNH_F_TRAP flag
> 
> The flag indicates to user space that the nexthop is not programmed to
> forward packets in hardware, but rather to trap them to the CPU. This is
> needed, for example, when the MAC of the nexthop neighbour is not
> resolved and packets should reach the CPU to trigger neighbour
> resolution.
> 
> The flag will be used in subsequent patches by netdevsim to test nexthop
> objects programming to device drivers and in the future by mlxsw as
> well.
> 
> Signed-off-by: Ido Schimmel 
> Reviewed-by: David Ahern 
> "

works for me. thanks


Re: [PATCH net-next 1/7] sfc: decouple TXQ type from label

2020-09-11 Thread Jakub Kicinski
On Thu, 10 Sep 2020 21:31:29 +0100 Edward Cree wrote:
> diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
> index 48d91b26f1a2..b0a08d9f4773 100644
> --- a/drivers/net/ethernet/sfc/tx.c
> +++ b/drivers/net/ethernet/sfc/tx.c
> @@ -527,6 +527,12 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
>   }
>  
>   tx_queue = efx_get_tx_queue(efx, index, type);
> + if (WARN_ON(!tx_queue))

_ONCE

> + /* We don't have a TXQ of the right type.
> +  * This should never happen, as we don't advertise offload
> +  * features unless we can support them.
> +  */
> + return NETDEV_TX_BUSY;

You should probably drop this packet, right? Next time qdisc calls the
driver it's unlikely to find a queue it needs.

>   return __efx_enqueue_skb(tx_queue, skb);
>  }



Re: [PATCH net-next] net: mvpp2: Initialize link in mvpp2_isr_handle_{xlg,gmac_internal}

2020-09-11 Thread Nathan Chancellor
On Fri, Sep 11, 2020 at 08:22:36AM -0700, Jakub Kicinski wrote:
> On Fri, 11 Sep 2020 12:11:58 +0100 Russell King - ARM Linux admin wrote:
> > On Thu, Sep 10, 2020 at 05:31:42PM -0700, Nathan Chancellor wrote:
> > > Ah great, that is indeed cleaner, thank you for letting me know!  
> > 
> > Hmm, I'm not sure why gcc didn't find that. Strangely, the 0-day bot
> > seems to have only picked up on it with clang, not gcc.
> 
> May be similar to: https://lkml.org/lkml/2019/2/25/1092
> 
> Recent GCC is so bad at catching uninitialized vars I was considering
> build testing with GCC8 :/

It is even simpler than that, the warning was straight up disabled in
commit 78a5255ffb6a ("Stop the ad-hoc games with -Wno-maybe-initialized").

clang's -Wuninitialized and -Wsometimes-uninitialized are generally more
accurate but can have some false positives because the semantic analysis
phase happens before inlining and dead code elimination.

Cheers,
Nathan


Re: [PATCH net-next 5/7] sfc: de-indirect TSO handling

2020-09-11 Thread Jakub Kicinski
On Thu, 10 Sep 2020 21:33:11 +0100 Edward Cree wrote:
> index 078c7ec2a70e..272eb5ecb7e7 100644
> --- a/drivers/net/ethernet/sfc/ef100_tx.c
> +++ b/drivers/net/ethernet/sfc/ef100_tx.c
> @@ -38,7 +38,8 @@ void ef100_tx_init(struct efx_tx_queue *tx_queue)
>   tx_queue->channel->channel -
>   tx_queue->efx->tx_channel_offset);
>  
> - if (efx_mcdi_tx_init(tx_queue, false))
> + tx_queue->tso_version = 3;
> + if (efx_mcdi_tx_init(tx_queue))
>   netdev_WARN(tx_queue->efx->net_dev,
>   "failed to initialise TXQ %d\n", tx_queue->queue);
>  }

> --- a/drivers/net/ethernet/sfc/tx.c
> +++ b/drivers/net/ethernet/sfc/tx.c
> @@ -338,8 +338,18 @@ netdev_tx_t __efx_enqueue_skb(struct efx_tx_queue 
> *tx_queue, struct sk_buff *skb
>* size limit.
>*/
>   if (segments) {
> - EFX_WARN_ON_ONCE_PARANOID(!tx_queue->handle_tso);
> - rc = tx_queue->handle_tso(tx_queue, skb, &data_mapped);
> + switch (tx_queue->tso_version) {
> + case 1:
> + rc = efx_enqueue_skb_tso(tx_queue, skb, &data_mapped);
> + break;
> + case 2:
> + rc = efx_ef10_tx_tso_desc(tx_queue, skb, &data_mapped);
> + break;
> + case 0: /* No TSO on this queue, SW fallback needed */
> + default:
> + rc = -EINVAL;
> + break;
> + }

Should tso_version 3 be handled in this switch?


Re: [PATCH net-next v3 7/7] net: mvpp2: ptp: add support for transmit timestamping

2020-09-11 Thread Richard Cochran
On Wed, Sep 09, 2020 at 11:00:47AM -0700, Richard Cochran wrote:
> On Tue, Sep 08, 2020 at 11:00:41PM +0100, Russell King wrote:
> 
> > +static bool mvpp2_tx_hw_tstamp(struct mvpp2_port *port,
> > +  struct mvpp2_tx_desc *tx_desc,
> > +  struct sk_buff *skb)
> > +{
> > +   unsigned int mtype, type, i, offset;
> > +   struct mvpp2_hwtstamp_queue *queue;
> > +   struct ptp_header *hdr;
> > +   u64 ptpdesc;
> > +
> > +   if (port->priv->hw_version == MVPP21 ||
> > +   port->tx_hwtstamp_type == HWTSTAMP_TX_OFF)
> > +   return false;
> > +
> > +   type = ptp_classify_raw(skb);
> > +   if (!type)
> > +   return false;
> > +
> > +   hdr = ptp_parse_header(skb, type);
> > +   if (!hdr)
> > +   return false;
> 
> At this point, the skb will be queued up to receive a transmit time
> stamp, and so it should be marked with:
> 
>   skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

Russell, since this series went in already, can you follow up with
a patch for this please?

Thanks,
Richard


[PATCH net-next 2/3] octeontx2-af: prepare for custom KPU profiles

2020-09-11 Thread skardach
From: Stanislaw Kardach 

Refactor KPU related NPC code to prepare for upcoming KPU customization
functionality. This requires the following:
* Gathering all KPU profile related data into a single adapter struct.
* Converting the built-in MKEX definition to a structured one to
  streamline the MKEX loading.
* Convert LT default register configuration into a structure which may
  later on be customized.
* Add a single point for KPU profile loading, currently using only
  built-in profile.

Signed-off-by: Stanislaw Kardach 
---
 .../net/ethernet/marvell/octeontx2/af/npc.h   |  36 
 .../marvell/octeontx2/af/npc_profile.h| 154 ++
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  18 ++
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   |  36 ++--
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 201 ++
 5 files changed, 302 insertions(+), 143 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h 
b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
index c0ff5f70aa43..6bfb9a9d3003 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
@@ -296,6 +296,9 @@ struct nix_rx_action {
 #endif
 };
 
+/* NPC_AF_INTFX_KEX_CFG field masks */
+#define NPC_PARSE_NIBBLE   GENMASK_ULL(30, 0)
+
 /* NIX Receive Vtag Action Structure */
 #define VTAG0_VALID_BITBIT_ULL(15)
 #define VTAG0_TYPE_MASKGENMASK_ULL(14, 12)
@@ -320,4 +323,37 @@ struct npc_mcam_kex {
u64 intf_ld_flags[NPC_MAX_INTF][NPC_MAX_LD][NPC_MAX_LFL];
 } __packed;
 
+struct npc_lt_def {
+   u8  ltype_mask;
+   u8  ltype_match;
+   u8  lid;
+} __packed;
+
+struct npc_lt_def_ipsec {
+   u8  ltype_mask;
+   u8  ltype_match;
+   u8  lid;
+   u8  spi_offset;
+   u8  spi_nz;
+} __packed;
+
+struct npc_lt_def_cfg {
+   struct npc_lt_def   rx_ol2;
+   struct npc_lt_def   rx_oip4;
+   struct npc_lt_def   rx_iip4;
+   struct npc_lt_def   rx_oip6;
+   struct npc_lt_def   rx_iip6;
+   struct npc_lt_def   rx_otcp;
+   struct npc_lt_def   rx_itcp;
+   struct npc_lt_def   rx_oudp;
+   struct npc_lt_def   rx_iudp;
+   struct npc_lt_def   rx_osctp;
+   struct npc_lt_def   rx_isctp;
+   struct npc_lt_def_ipsec rx_ipsec[2];
+   struct npc_lt_def   pck_ol2;
+   struct npc_lt_def   pck_oip4;
+   struct npc_lt_def   pck_oip6;
+   struct npc_lt_def   pck_iip4;
+} __packed;
+
 #endif /* NPC_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h 
b/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h
index aa2727e6211a..695c3b5c103e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h
@@ -140,6 +140,12 @@
 #define NPC_DSA_EXTEND 0x1000
 #define NPC_DSA_EDSA   0x8000
 
+#define NPC_KEXOF_DMAC 8
+#define MKEX_SIGN  0x19bbfdbd15f /* strtoull of "mkexprof" with base:36 */
+#define KEX_LD_CFG(bytesm1, hdr_ofs, ena, flags_ena, key_ofs)  \
+   (((bytesm1) << 16) | ((hdr_ofs) << 8) | ((ena) << 7) | \
+((flags_ena) << 6) | ((key_ofs) & 0x3F))
+
 enum npc_kpu_parser_state {
NPC_S_NA = 0,
NPC_S_KPU1_ETHER,
@@ -13114,4 +13120,152 @@ static struct npc_kpu_profile npc_kpu_profiles[] = {
},
 };
 
+static struct npc_lt_def_cfg npc_lt_defaults = {
+   .rx_ol2 = {
+   .lid = NPC_LID_LA,
+   .ltype_match = NPC_LT_LA_ETHER,
+   .ltype_mask = 0x0F,
+   },
+   .rx_oip4 = {
+   .lid = NPC_LID_LC,
+   .ltype_match = NPC_LT_LC_IP,
+   .ltype_mask = 0x0E,
+   },
+   .rx_iip4 = {
+   .lid = NPC_LID_LG,
+   .ltype_match = NPC_LT_LG_TU_IP,
+   .ltype_mask = 0x0F,
+   },
+   .rx_oip6 = {
+   .lid = NPC_LID_LC,
+   .ltype_match = NPC_LT_LC_IP6,
+   .ltype_mask = 0x0E,
+   },
+   .rx_iip6 = {
+   .lid = NPC_LID_LG,
+   .ltype_match = NPC_LT_LG_TU_IP6,
+   .ltype_mask = 0x0F,
+   },
+   .rx_otcp = {
+   .lid = NPC_LID_LD,
+   .ltype_match = NPC_LT_LD_TCP,
+   .ltype_mask = 0x0F,
+   },
+   .rx_itcp = {
+   .lid = NPC_LID_LH,
+   .ltype_match = NPC_LT_LH_TU_TCP,
+   .ltype_mask = 0x0F,
+   },
+   .rx_oudp = {
+   .lid = NPC_LID_LD,
+   .ltype_match = NPC_LT_LD_UDP,
+   .ltype_mask = 0x0F,
+   },
+   .rx_iudp = {
+   .lid = NPC_LID_LH,
+   .ltype_match = NPC_LT_LH_TU_UDP,
+   .ltype_mask = 0x0F,
+   },
+   .rx_osctp = {
+   .lid = NPC_LID_LD,
+   .ltype_match = NPC_LT_LD_

Re: [RFC PATCH net-next 11/22] nexthop: Emit a notification when a nexthop is added

2020-09-11 Thread Ido Schimmel
On Tue, Sep 08, 2020 at 09:21:08AM -0600, David Ahern wrote:
> On 9/8/20 3:10 AM, Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > Emit a notification in the nexthop notification chain when a new nexthop
> > is added (not replaced). The nexthop can either be a new group or a
> > single nexthop.
> 
> Add a comment about why EVENT_REPLACE is generated on an 'added (not
> replaced)' event.

Reworded:

"
nexthop: Emit a notification when a nexthop is added

Emit a notification in the nexthop notification chain when a new nexthop
is added (not replaced). The nexthop can either be a new group or a
single nexthop.

The notification is sent after the nexthop is inserted into the
red-black tree, as listeners might need to callback into the nexthop
code with the nexthop ID in order to mark the nexthop as offloaded.

A 'REPLACE' notification is emitted instead of 'ADD' as the distinction
between the two is not important for in-kernel listeners. In case the
listener is not familiar with the encoded nexthop ID, it can simply
treat it as a new one. This is also consistent with the route offload
API.

Signed-off-by: Ido Schimmel 
"

> 
> > 
> > The notification is sent after the nexthop is inserted into the
> > red-black tree, as listeners might need to callback into the nexthop
> > code with the nexthop ID in order to mark the nexthop as offloaded.
> > 
> > Signed-off-by: Ido Schimmel 
> > ---
> >  include/net/nexthop.h | 3 ++-
> >  net/ipv4/nexthop.c| 6 +-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> > index 4147681e86d2..6431ff8cdb89 100644
> > --- a/include/net/nexthop.h
> > +++ b/include/net/nexthop.h
> > @@ -106,7 +106,8 @@ struct nexthop {
> >  
> >  enum nexthop_event_type {
> > NEXTHOP_EVENT_ADD,
> 
> looks like the ADD event is not used and can be removed.

Right. I will remove it in a separate patch

> 
> > -   NEXTHOP_EVENT_DEL
> > +   NEXTHOP_EVENT_DEL,
> > +   NEXTHOP_EVENT_REPLACE,
> >  };
> >  
> >  struct nh_notifier_single_info {
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index 71605c612458..1fa249facd46 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -1277,7 +1277,11 @@ static int insert_nexthop(struct net *net, struct 
> > nexthop *new_nh,
> >  
> > rb_link_node_rcu(&new_nh->rb_node, parent, pp);
> > rb_insert_color(&new_nh->rb_node, root);
> > -   rc = 0;
> > +
> > +   rc = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, new_nh, extack);
> > +   if (rc)
> > +   rb_erase(&new_nh->rb_node, &net->nexthop.rb_root);
> > +
> >  out:
> > if (!rc) {
> > nh_base_seq_inc(net);
> > 
> 


Re: [RFC PATCH net-next 13/22] nexthop: Emit a notification when a single nexthop is replaced

2020-09-11 Thread Ido Schimmel
On Tue, Sep 08, 2020 at 09:25:40AM -0600, David Ahern wrote:
> On 9/8/20 3:10 AM, Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > The notification is emitted after all the validation checks were
> > performed, but before the new configuration (i.e., 'struct nh_info') is
> > pointed at by the old shell (i.e., 'struct nexthop'). This prevents the
> > need to perform rollback in case the notification is vetoed.
> > 
> > The next patch will also emit a replace notification for all the nexthop
> > groups in which the nexthop is used.
> > 
> > Signed-off-by: Ido Schimmel 
> > ---
> >  net/ipv4/nexthop.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index a60a519a5462..b8a4abc00146 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -1099,12 +1099,22 @@ static int replace_nexthop_single(struct net *net, 
> > struct nexthop *old,
> >   struct netlink_ext_ack *extack)
> >  {
> > struct nh_info *oldi, *newi;
> > +   int err;
> >  
> > if (new->is_group) {
> > NL_SET_ERR_MSG(extack, "Can not replace a nexthop with a 
> > nexthop group.");
> > return -EINVAL;
> > }
> >  
> > +   err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, new, extack);
> > +   if (err)
> > +   return err;
> > +
> > +   /* Hardware flags were set on 'old' as 'new' is not in the red-black
> > +* tree. Therefore, inherit the flags from 'old' to 'new'.
> > +*/
> > +   new->nh_flags |= old->nh_flags & (RTNH_F_OFFLOAD | RTNH_F_TRAP);
> 
> Will that always be true? ie., has h/w seen 'new' and offloaded it yet?

Yes. The chain was converted to a blocking chain, so it is possible to
program the hardware inline.

> vs the notifier telling hardware about the change, it does its thing and
> sets the flags. But I guess that creates a race between the offload and
> the new data being available.
> 
> > +
> > oldi = rtnl_dereference(old->nh_info);
> > newi = rtnl_dereference(new->nh_info);
> >  
> > 
> 


Re: [PATCH net-next v5 3/6] dt-bindings: net: dsa: add new MT7531 binding to support MT7531

2020-09-11 Thread Florian Fainelli




On 9/11/2020 6:48 AM, Landen Chao wrote:

Add devicetree binding to support the compatible mt7531 switch as used
in the MediaTek MT7531 switch.

Signed-off-by: Sean Wang 
Signed-off-by: Landen Chao 


Reviewed-by: Florian Fainelli 
--
Florian


Re: [RFC PATCH net-next 09/22] rtnetlink: Add RTNH_F_TRAP flag

2020-09-11 Thread Ido Schimmel
On Tue, Sep 08, 2020 at 09:02:33AM -0600, David Ahern wrote:
> On 9/8/20 3:10 AM, Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > The flag indicates to user space that the nexthop is not programmed to
> > forward packets in hardware, but rather to trap them.
> 
> please elaborate in the commit message on what 'trap' is doing. I most
> likely will forget a few years from now.

Reworded to:

"
rtnetlink: Add RTNH_F_TRAP flag

The flag indicates to user space that the nexthop is not programmed to
forward packets in hardware, but rather to trap them to the CPU. This is
needed, for example, when the MAC of the nexthop neighbour is not
resolved and packets should reach the CPU to trigger neighbour
resolution.

The flag will be used in subsequent patches by netdevsim to test nexthop
objects programming to device drivers and in the future by mlxsw as
well.

Signed-off-by: Ido Schimmel 
Reviewed-by: David Ahern 
"

> 
> > 
> > The flag will be used in subsequent patches by netdevsim to test nexthop
> > objects programming to device drivers and in the future by mlxsw as
> > well.
> > 
> > Signed-off-by: Ido Schimmel 
> > ---
> >  include/uapi/linux/rtnetlink.h | 6 --
> >  net/ipv4/fib_semantics.c   | 2 ++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> 
> Reviewed-by: David Ahern 


Re: [PATCH net-next] net: mvpp2: Initialize link in mvpp2_isr_handle_{xlg,gmac_internal}

2020-09-11 Thread Jakub Kicinski
On Fri, 11 Sep 2020 12:11:58 +0100 Russell King - ARM Linux admin wrote:
> On Thu, Sep 10, 2020 at 05:31:42PM -0700, Nathan Chancellor wrote:
> > Ah great, that is indeed cleaner, thank you for letting me know!  
> 
> Hmm, I'm not sure why gcc didn't find that. Strangely, the 0-day bot
> seems to have only picked up on it with clang, not gcc.

May be similar to: https://lkml.org/lkml/2019/2/25/1092

Recent GCC is so bad at catching uninitialized vars I was considering
build testing with GCC8 :/


Re: [RFC PATCH net-next 15/22] nexthop: Emit a notification when a nexthop group is reduced

2020-09-11 Thread Ido Schimmel
On Tue, Sep 08, 2020 at 09:33:42AM -0600, David Ahern wrote:
> On 9/8/20 3:10 AM, Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > When a single nexthop is deleted, the configuration of all the groups
> > using the nexthop is effectively modified. In this case, emit a
> > notification in the nexthop notification chain for each modified group
> > so that listeners would not need to keep track of which nexthops are
> > member in which groups.
> > 
> > In the rare cases where the notification fails, emit an error to the
> > kernel log.
> > 
> > Signed-off-by: Ido Schimmel 
> > ---
> >  net/ipv4/nexthop.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index 0edc3e73d416..33f611bbce1f 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -893,7 +893,7 @@ static void remove_nh_grp_entry(struct net *net, struct 
> > nh_grp_entry *nhge,
> > struct nexthop *nhp = nhge->nh_parent;
> > struct nexthop *nh = nhge->nh;
> > struct nh_group *nhg, *newg;
> > -   int i, j;
> > +   int i, j, err;
> >  
> > WARN_ON(!nh);
> >  
> > @@ -941,6 +941,10 @@ static void remove_nh_grp_entry(struct net *net, 
> > struct nh_grp_entry *nhge,
> > list_del(&nhge->nh_list);
> > nexthop_put(nhge->nh);
> >  
> > +   err = call_nexthop_notifiers(net, NEXTHOP_EVENT_REPLACE, nhp, NULL);
> > +   if (err)
> > +   pr_err("Failed to replace nexthop group after nexthop 
> > deletion\n");
> 
> This should refer to the notifier failing since wrt nexthop code the
> structs are ok. extack on the stack and logging that message would be
> useful too (or have the users of the notifier log why it fails).

'extack on the stack' idea is cool! I will do that

> 
> > +
> > if (nlinfo)
> > nexthop_notify(RTM_NEWNEXTHOP, nhp, nlinfo);
> >  }
> > 
> 


Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook

2020-09-11 Thread Daniel Borkmann

On 9/11/20 9:42 AM, Laura García Liébana wrote:

On Tue, Sep 8, 2020 at 2:55 PM Daniel Borkmann  wrote:

On 9/5/20 7:24 AM, Lukas Wunner wrote:

On Fri, Sep 04, 2020 at 11:14:37PM +0200, Daniel Borkmann wrote:

On 9/4/20 6:21 PM, Lukas Wunner wrote:

[...]

The tc queueing layer which is below is not the tc egress hook; the
latter is for filtering/mangling/forwarding or helping the lower tc
queueing layer to classify.


People want to apply netfilter rules on egress, so either we need an
egress hook in the xmit path or we'd have to teach tc to filter and
mangle based on netfilter rules.  The former seemed more straight-forward
to me but I'm happy to pursue other directions.


I would strongly prefer something where nf integrates into existing tc hook,
not only due to the hook reuse which would be better, but also to allow for a
more flexible interaction between tc/BPF use cases and nf, to name one


That sounds good but I'm afraid that it would take too much back and
forth discussions. We'll really appreciate it if this small patch can
be unblocked and then rethink the refactoring of ingress/egress hooks
that you commented in another thread.


I'm not sure whether your comment was serious or not, but nope, this needs
to be addressed as mentioned as otherwise this use case would regress. It
is one thing for you wanting to remove tc / BPF from your application stack
as you call it, but not at the cost of breaking others.

Thank you,
Daniel


Re: VLAN filtering with DSA

2020-09-11 Thread Vladimir Oltean
On Fri, Sep 11, 2020 at 04:20:58PM +0300, Ido Schimmel wrote:
> On Thu, Sep 10, 2020 at 11:41:04AM -0700, Florian Fainelli wrote:
> > +Ido,
> >
> > On 9/10/2020 8:07 AM, Vladimir Oltean wrote:
> > > Florian, can you please reiterate what is the problem with calling
> > > vlan_vid_add() with a VLAN that is installed by the bridge?
> > >
> > > The effect of vlan_vid_add(), to my knowledge, is that the network
> > > interface should add this VLAN to its filtering table, and not drop it.
> > > So why return -EBUSY?
>
> Can you clarify when you return -EBUSY? At least in mlxsw we return an
> error in case we have a VLAN-aware bridge taking care of some VLAN and
> then user space tries to install a VLAN upper with the same VLAN on the
> same port. See more below.
>

In the original post Message-ID: <20200910150738.mwhh2i6j2qgacqev@skbuf>
I had copied this piece of code:

static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
 u16 vid)
{
...

/* Check for a possible bridge VLAN entry now since there is no
 * need to emulate the switchdev prepare + commit phase.
 */
if (dp->bridge_dev) {
...
/* br_vlan_get_info() returns -EINVAL or -ENOENT if the
 * device, respectively the VID is not found, returning
 * 0 means success, which is a failure for us here.
 */
ret = br_vlan_get_info(dp->bridge_dev, vid, &info);
if (ret == 0)
return -EBUSY;
}
}

> > Most of this was based on discussions we had with Ido and him explaining to
> > me how they were doing it in mlxsw.
> >
> > AFAIR the other case which is that you already have a 802.1Q upper, and then
> > you add the switch port to the bridge is permitted and the bridge would
> > inherit the VLAN into its local database.
>
> If you have swp1 and swp1.10, you can put swp1 in a VLAN-aware bridge
> and swp1.10 in a VLAN-unaware bridge. If you add VLAN 10 as part of the
> VLAN-aware bridge on swp1, traffic tagged with this VLAN will still be
> injected into the stack via swp1.10.
>
> I'm not sure what is the use case for such a configuration and we reject
> it in mlxsw.

Maybe the problem has to do with the fact that Florian took the
.ndo_vlan_rx_add_vid() callback as a shortcut for deducing that there is
an 8021q upper interface.

Currently there are other places in the network stack that don't really
work with a network interface that has problems with an interface that
has "rx-vlan-filter: on" in ethtool -k. See this discussion with Jiri on
the use of tc-vlan:

https://www.spinics.net/lists/netdev/msg645931.html

So, even though today .ndo_vlan_rx_add_vid() is only called from 8021q,
maybe we should dispel the myth that it's specific to 8021q somehow.

Maybe DSA should start tracking its upper interfaces, after all? Not
convinced though.

Thanks,
-Vladimir


Re: VLAN filtering with DSA

2020-09-11 Thread Vladimir Oltean
On Fri, Sep 11, 2020 at 07:30:42PM +0300, Vladimir Oltean wrote:
> Currently there are other places in the network stack that don't really
> work with a network interface that has problems with an interface that
> has "rx-vlan-filter: on" in ethtool -k.

Wow, I should learn how to write.
I meant:

Currently there are other places in the network stack that don't really
work with a network interface that has "rx-vlan-filter: on" in
ethtool -k.


[PATCH net-next 0/3] octeontx2-af: add support for KPU profile customization

2020-09-11 Thread skardach
From: Stanislaw Kardach 

Marvell octeontx2 NPC device contains a configurable Kanguroo Parser Unit
(KPU) and CAM match key data extraction (MKEX). The octeontx2-af driver
configures them both to parse a list of standard protocol headers which
are used by netdev driver and other potential applications (i.e.
userspace through VFIO).
The problem arises when users have some custom protocol headers which
they'd like to use in CAM flow matching. If such protocols are publicly
known, they can be added to the built-in KPU configuration (called
"profile" - in npc_profile.h). If not, then there's more benefit in
keeping such changes local to the user.
For that case a mechanism which would allow users to produce a KPU
profile and load it along with octeontx2-af driver is needed. At the same
time such customization has to take care not to break the netdev driver
operation or other applications (that is be discoverable).

Therefore introduce a mechanism for a limited customization of the
built-in KPU profile via a firmware file (layout and contents described
by struct npc_kpu_profile_fwdata). It allows user modification of only a
limited number of top priority KPU entries, while others are configured
from the built-in KPU profile. Additionally by convention users should
only use NPC_LT_Lx_CUSTOMx LTYPE entries in their profiles to change the
meaning of built-in LTYPEs. This way the baseline protocol support is
always available and the impact of potential user errors is minimized.
As MKEX also needs to be modified to take into account any user
protocols, the KPU profile firmware binary contains also that. Netdev
driver and applications have a way to discover applied MKEX settings by
querying RVU AF device via NPC_GET_KEX_CFG MBOX message.
Finally some users might need to modify hardware packet data alignment
behavior and profile contains settings for that too.

First patch ensures that CUSTOMx LTYPEs are not aliased with meaningful
LTYPEs where possible.

Second patch gathers all KPU profile related data into a single struct
and creates an adapter structure which provides an interface to the KPU
profile for the octeontx2-af driver.

Third patch adds logic for loading the KPU profile through kernel
firmware APIs, filling in the customizable entries in the adapter
structure and programming the MKEX from KPU profile.

Stanislaw Kardach (3):
  octeontx2-af: fix LD CUSTOM LTYPE aliasing
  octeontx2-af: prepare for custom KPU profiles
  octeontx2-af: add support for custom KPU entries

 .../net/ethernet/marvell/octeontx2/af/npc.h   |  80 +++-
 .../marvell/octeontx2/af/npc_profile.h| 244 -
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   6 +
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  22 ++
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   |  36 +-
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 341 ++
 6 files changed, 564 insertions(+), 165 deletions(-)

-- 
2.20.1



[PATCH bpf-next v2 1/5] bpf: expose is_mptcp flag to bpf_tcp_sock

2020-09-11 Thread Nicolas Rybowski
is_mptcp is a field from struct tcp_sock used to indicate that the
current tcp_sock is part of the MPTCP protocol.

In this protocol, a first socket (mptcp_sock) is created with
sk_protocol set to IPPROTO_MPTCP (=262) for control purpose but it
isn't directly on the wire. This is the role of the subflow (kernel)
sockets which are classical tcp_sock with sk_protocol set to
IPPROTO_TCP. The only way to differentiate such sockets from plain TCP
sockets is the is_mptcp field from tcp_sock.

Such an exposure in BPF is thus required to be able to differentiate
plain TCP sockets from MPTCP subflow sockets in BPF_PROG_TYPE_SOCK_OPS
programs.

The choice has been made to silently pass the case when CONFIG_MPTCP is
unset by defaulting is_mptcp to 0 in order to make BPF independent of
the MPTCP configuration. Another solution is to make the verifier fail
in 'bpf_tcp_sock_is_valid_ctx_access' but this will add an additional
'#ifdef CONFIG_MPTCP' in the BPF code and a same injected BPF program
will not run if MPTCP is not set.

An example use-case is provided in
https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/bpf/examples

Suggested-by: Matthieu Baerts 
Acked-by: Matthieu Baerts 
Acked-by: Mat Martineau 
Signed-off-by: Nicolas Rybowski 
---
 include/uapi/linux/bpf.h   | 1 +
 net/core/filter.c  | 9 -
 tools/include/uapi/linux/bpf.h | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7dd314176df7..7d179eada1c3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4060,6 +4060,7 @@ struct bpf_tcp_sock {
__u32 delivered;/* Total data packets delivered incl. rexmits */
__u32 delivered_ce; /* Like the above but only ECE marked packets */
__u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */
+   __u32 is_mptcp; /* Is MPTCP subflow? */
 };
 
 struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index d266c6941967..dab48528dceb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5837,7 +5837,7 @@ bool bpf_tcp_sock_is_valid_access(int off, int size, enum 
bpf_access_type type,
  struct bpf_insn_access_aux *info)
 {
if (off < 0 || off >= offsetofend(struct bpf_tcp_sock,
- icsk_retransmits))
+ is_mptcp))
return false;
 
if (off % size != 0)
@@ -5971,6 +5971,13 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type 
type,
case offsetof(struct bpf_tcp_sock, icsk_retransmits):
BPF_INET_SOCK_GET_COMMON(icsk_retransmits);
break;
+   case offsetof(struct bpf_tcp_sock, is_mptcp):
+#ifdef CONFIG_MPTCP
+   BPF_TCP_SOCK_GET_COMMON(is_mptcp);
+#else
+   *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+   break;
}
 
return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7dd314176df7..7d179eada1c3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4060,6 +4060,7 @@ struct bpf_tcp_sock {
__u32 delivered;/* Total data packets delivered incl. rexmits */
__u32 delivered_ce; /* Like the above but only ECE marked packets */
__u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */
+   __u32 is_mptcp; /* Is MPTCP subflow? */
 };
 
 struct bpf_sock_tuple {
-- 
2.28.0



[PATCH net-next 05/13] mptcp: introduce and use mptcp_try_coalesce()

2020-09-11 Thread Paolo Abeni
Factor-out existing code, will be re-used by the
next patch.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/protocol.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4f12a8ce0ddd..5a2ff333e426 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -110,6 +110,22 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
return 0;
 }
 
+static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
+  struct sk_buff *from)
+{
+   bool fragstolen;
+   int delta;
+
+   if (MPTCP_SKB_CB(from)->offset ||
+   !skb_try_coalesce(to, from, &fragstolen, &delta))
+   return false;
+
+   kfree_skb_partial(from, fragstolen);
+   atomic_add(delta, &sk->sk_rmem_alloc);
+   sk_mem_charge(sk, delta);
+   return true;
+}
+
 static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 struct sk_buff *skb,
 unsigned int offset, size_t copy_len)
@@ -121,24 +137,15 @@ static void __mptcp_move_skb(struct mptcp_sock *msk, 
struct sock *ssk,
 
skb_ext_reset(skb);
skb_orphan(skb);
+   MPTCP_SKB_CB(skb)->offset = offset;
msk->ack_seq += copy_len;
 
tail = skb_peek_tail(&sk->sk_receive_queue);
-   if (offset == 0 && tail) {
-   bool fragstolen;
-   int delta;
-
-   if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
-   kfree_skb_partial(skb, fragstolen);
-   atomic_add(delta, &sk->sk_rmem_alloc);
-   sk_mem_charge(sk, delta);
-   return;
-   }
-   }
+   if (tail && mptcp_try_coalesce(sk, tail, skb))
+   return;
 
skb_set_owner_r(skb, sk);
__skb_queue_tail(&sk->sk_receive_queue, skb);
-   MPTCP_SKB_CB(skb)->offset = offset;
 }
 
 static void mptcp_stop_timer(struct sock *sk)
-- 
2.26.2



[PATCH net-next 13/13] mptcp: simult flow self-tests

2020-09-11 Thread Paolo Abeni
Add a bunch of test-cases for multiple subflow xmit:
create multiple subflows simulating different links
condition via netem and verify that the msk is able
to use completely the aggregated bandwidth.

Signed-off-by: Paolo Abeni 
---
 tools/testing/selftests/net/mptcp/Makefile|   3 +-
 .../selftests/net/mptcp/simult_flows.sh   | 293 ++
 2 files changed, 295 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/mptcp/simult_flows.sh

diff --git a/tools/testing/selftests/net/mptcp/Makefile 
b/tools/testing/selftests/net/mptcp/Makefile
index aa254aefc2c3..00bb158b4a5d 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -5,7 +5,8 @@ KSFT_KHDR_INSTALL := 1
 
 CFLAGS =  -Wall -Wl,--no-as-needed -O2 -g  -I$(top_srcdir)/usr/include
 
-TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh
+TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
+ simult_flows.sh
 
 TEST_GEN_FILES = mptcp_connect pm_nl_ctl
 
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh 
b/tools/testing/selftests/net/mptcp/simult_flows.sh
new file mode 100755
index ..0d88225daa02
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -0,0 +1,293 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+rndh=$(printf %x $sec)-$(mktemp -u XX)
+ns1="ns1-$rndh"
+ns2="ns2-$rndh"
+ns3="ns3-$rndh"
+capture=false
+ksft_skip=4
+timeout=30
+test_cnt=1
+ret=0
+bail=0
+
+usage() {
+   echo "Usage: $0 [ -b ] [ -c ] [ -d ]"
+   echo -e "\t-b: bail out after first error, otherwise runs al testcases"
+   echo -e "\t-c: capture packets for each test using tcpdump (default: no 
capture)"
+   echo -e "\t-d: debug this script"
+}
+
+cleanup()
+{
+   rm -f "$cin" "$cout"
+   rm -f "$sin" "$sout"
+   rm -f "$capout"
+
+   local netns
+   for netns in "$ns1" "$ns2" "$ns3";do
+   ip netns del $netns
+   done
+}
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+   echo "SKIP: Could not run test without ip tool"
+   exit $ksft_skip
+fi
+
+#  "$ns1"  ns2ns3
+# ns1eth1ns2eth1   ns2eth3  ns3eth1
+#netem
+# ns1eth2ns2eth2
+#netem
+
+setup()
+{
+   large=$(mktemp)
+   small=$(mktemp)
+   sout=$(mktemp)
+   cout=$(mktemp)
+   capout=$(mktemp)
+   size=$((2048 * 4096))
+   dd if=/dev/zero of=$small bs=4096 count=20 >/dev/null 2>&1
+   dd if=/dev/zero of=$large bs=4096 count=$((size / 4096)) >/dev/null 2>&1
+
+   trap cleanup EXIT
+
+   for i in "$ns1" "$ns2" "$ns3";do
+   ip netns add $i || exit $ksft_skip
+   ip -net $i link set lo up
+   done
+
+   ip link add ns1eth1 netns "$ns1" type veth peer name ns2eth1 netns 
"$ns2"
+   ip link add ns1eth2 netns "$ns1" type veth peer name ns2eth2 netns 
"$ns2"
+   ip link add ns2eth3 netns "$ns2" type veth peer name ns3eth1 netns 
"$ns3"
+
+   ip -net "$ns1" addr add 10.0.1.1/24 dev ns1eth1
+   ip -net "$ns1" addr add dead:beef:1::1/64 dev ns1eth1 nodad
+   ip -net "$ns1" link set ns1eth1 up mtu 1500
+   ip -net "$ns1" route add default via 10.0.1.2
+   ip -net "$ns1" route add default via dead:beef:1::2
+
+   ip -net "$ns1" addr add 10.0.2.1/24 dev ns1eth2
+   ip -net "$ns1" addr add dead:beef:2::1/64 dev ns1eth2 nodad
+   ip -net "$ns1" link set ns1eth2 up mtu 1500
+   ip -net "$ns1" route add default via 10.0.2.2 metric 101
+   ip -net "$ns1" route add default via dead:beef:2::2 metric 101
+
+   ip netns exec "$ns1" ./pm_nl_ctl limits 1 1
+   ip netns exec "$ns1" ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags subflow
+   ip netns exec "$ns1" sysctl -q net.ipv4.conf.all.rp_filter=0
+
+   ip -net "$ns2" addr add 10.0.1.2/24 dev ns2eth1
+   ip -net "$ns2" addr add dead:beef:1::2/64 dev ns2eth1 nodad
+   ip -net "$ns2" link set ns2eth1 up mtu 1500
+
+   ip -net "$ns2" addr add 10.0.2.2/24 dev ns2eth2
+   ip -net "$ns2" addr add dead:beef:2::2/64 dev ns2eth2 nodad
+   ip -net "$ns2" link set ns2eth2 up mtu 1500
+
+   ip -net "$ns2" addr add 10.0.3.2/24 dev ns2eth3
+   ip -net "$ns2" addr add dead:beef:3::2/64 dev ns2eth3 nodad
+   ip -net "$ns2" link set ns2eth3 up mtu 1500
+   ip netns exec "$ns2" sysctl -q net.ipv4.ip_forward=1
+   ip netns exec "$ns2" sysctl -q net.ipv6.conf.all.forwarding=1
+
+   ip -net "$ns3" addr add 10.0.3.3/24 dev ns3eth1
+   ip -net "$ns3" addr add dead:beef:3::3/64 dev ns3eth1 nodad
+   ip -net "$ns3" link set ns3eth1 up mtu 1500
+   ip -net "$ns3" route add default via 10.0.3.2
+   ip -net "$ns3" route add default via dead:beef:3::2
+
+   ip netns exec "$ns3" ./pm_nl_ctl limits 1 1
+}
+
+# $1: ns, $2: port
+wait_local_port_listen()
+{
+   local listener_ns="${1}"
+   local port="${2}"
+
+  

Re: [RFC PATCH net-next 17/22] nexthop: Replay nexthops when registering a notifier

2020-09-11 Thread Ido Schimmel
On Tue, Sep 08, 2020 at 09:37:10AM -0600, David Ahern wrote:
> On 9/8/20 3:10 AM, Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > When registering a new notifier to the nexthop notification chain,
> > replay all the existing nexthops to the new notifier so that it will
> > have a complete picture of the available nexthops.
> > 
> > Signed-off-by: Ido Schimmel 
> > ---
> >  net/ipv4/nexthop.c | 54 --
> >  1 file changed, 52 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index b40c343ca969..6505a0a28df2 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -156,6 +156,27 @@ static int call_nexthop_notifiers(struct net *net,
> > return notifier_to_errno(err);
> >  }
> >  
> > +static int call_nexthop_notifier(struct notifier_block *nb, struct net 
> > *net,
> > +enum nexthop_event_type event_type,
> > +struct nexthop *nh,
> > +struct netlink_ext_ack *extack)
> > +{
> > +   struct nh_notifier_info info = {
> > +   .net = net,
> > +   .extack = extack,
> > +   };
> > +   int err;
> > +
> > +   err = nh_notifier_info_init(&info, nh);
> > +   if (err)
> > +   return err;
> > +
> > +   err = nb->notifier_call(nb, event_type, &info);
> > +   nh_notifier_info_fini(&info);
> > +
> > +   return notifier_to_errno(err);
> > +}
> > +
> >  static unsigned int nh_dev_hashfn(unsigned int val)
> >  {
> > unsigned int mask = NH_DEV_HASHSIZE - 1;
> > @@ -2116,11 +2137,40 @@ static struct notifier_block nh_netdev_notifier = {
> > .notifier_call = nh_netdev_event,
> >  };
> >  
> > +static int nexthops_dump(struct net *net, struct notifier_block *nb,
> > +struct netlink_ext_ack *extack)
> > +{
> > +   struct rb_root *root = &net->nexthop.rb_root;
> > +   struct rb_node *node;
> > +   int err = 0;
> > +
> > +   for (node = rb_first(root); node; node = rb_next(node)) {
> > +   struct nexthop *nh;
> > +
> > +   nh = rb_entry(node, struct nexthop, rb_node);
> > +   err = call_nexthop_notifier(nb, net, NEXTHOP_EVENT_REPLACE, nh,
> > +   extack);
> > +   if (err)
> > +   break;
> > +   }
> > +
> > +   return err;
> > +}
> > +
> >  int register_nexthop_notifier(struct net *net, struct notifier_block *nb,
> >   struct netlink_ext_ack *extack)
> >  {
> > -   return blocking_notifier_chain_register(&net->nexthop.notifier_chain,
> > -   nb);
> > +   int err;
> > +
> > +   rtnl_lock();
> > +   err = nexthops_dump(net, nb, extack);
> 
> can the unlock be moved here? register function below should not need it.

It can result in this unlikely race:

 - rtnl_lock(); nexthops_dump(); rtnl_unlock()
 - Nexthop is added / deleted
 - blocking_notifier_chain_register()

It is possible to flip the order:

 - blocking_notifier_chain_register()
 - rtnl_lock(); nexthops_dump(); rtnl_unlock()

Worst case:

 - blocking_notifier_chain_register()
 - Nexthop is added / deleted
 - rtnl_lock(); nexthops_dump(); rtnl_unlock()

Which is OK. If we get a delete notification for a nexthop we don't
know, we ignore it. If we get two replace notifications for the same
nexthop we just "update" it.

> 
> > +   if (err)
> > +   goto unlock;
> > +   err = blocking_notifier_chain_register(&net->nexthop.notifier_chain,
> > +  nb);
> > +unlock:
> > +   rtnl_unlock();
> > +   return err;
> >  }
> >  EXPORT_SYMBOL(register_nexthop_notifier);
> >  
> > 
> 


[PATCH net-next v5 0/6] net-next: dsa: mt7530: add support for MT7531

2020-09-11 Thread Landen Chao
This patch series adds support for MT7531.

MT7531 is the next generation of MT7530 which could be found on Mediatek
router platforms such as MT7622 or MT7629.

It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and
the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface.
Cpu port 5 supports either RGMII or SGMII in different HW SKU, but cannot
be muxed to PHY of port 0/4 like mt7530. Due to support for SGMII
interface, pll, and pad setting are different from MT7530.

MT7531 SGMII interface can be configured in following mode:
- 'SGMII AN mode' with in-band negotiation capability
which is compatible with PHY_INTERFACE_MODE_SGMII.
- 'SGMII force mode' without in-band negotiation
which is compatible with 10B/8B encoding of
PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
- 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
which is compatible with 10B/8B encoding of
PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.

v4 -> v5
- Add fixed-link node to dsa cpu port in dts file by suggestion of
  Vladimir Oltean.

v3 -> v4
- Adjust the coding style by suggestion of Jakub Kicinski.
  Remove unnecessary jumping label, merge continuous numeric 'switch
  cases' into one line, and keep the variables longest to shortest
  (reverse xmas tree).

v2 -> v3
- Keep the same setup logic of mt7530/mt7621 because these series of
  patches is for adding mt7531 hardware.
- Do not adjust rgmii delay when vendor phy driver presents in order to
  prevent double adjustment by suggestion of Andrew Lunn.
- Remove redundant 'Example 4' from dt-bindings by suggestion of
  Rob Herring.
- Fix typo.

v1 -> v2
- change phylink_validate callback function to support full-duplex
  gigabit only to match hardware capability.
- add description of SGMII interface.
- configure mt7531 cpu port in fastest speed by default.
- parse SGMII control word for in-band negotiation mode.
- configure RGMII delay based on phy.rst.
- Rename the definition in the header file to avoid potential conflicts.
- Add wrapper function for mdio read/write to support both C22 and C45.
- correct fixed-link speed of 2500base-x in dts.
- add MT7531 port mirror setting.

Landen Chao (6):
  net: dsa: mt7530: Refine message in Kconfig
  net: dsa: mt7530: Extend device data ready for adding a new hardware
  dt-bindings: net: dsa: add new MT7531 binding to support MT7531
  net: dsa: mt7530: Add the support of MT7531 switch
  arm64: dts: mt7622: add mt7531 dsa to mt7622-rfb1 board
  arm64: dts: mt7622: add mt7531 dsa to bananapi-bpi-r64 board

 .../devicetree/bindings/net/dsa/mt7530.txt|   13 +-
 .../dts/mediatek/mt7622-bananapi-bpi-r64.dts  |   50 +
 arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts  |   63 +-
 drivers/net/dsa/Kconfig   |6 +-
 drivers/net/dsa/mt7530.c  | 1192 +++--
 drivers/net/dsa/mt7530.h  |  259 +++-
 6 files changed, 1467 insertions(+), 116 deletions(-)

-- 
2.17.1


[PATCH net-next] bridge: mcast: Fix incomplete MDB dump

2020-09-11 Thread Ido Schimmel
From: Ido Schimmel 

Each MDB entry is encoded in a nested netlink attribute called
'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested
attributed called 'MDBA_MDB_ENTRY_INFO', which encodes a single port
group entry within the MDB entry.

The cited commit added the ability to restart a dump from a specific
port group entry. However, on failure to add a port group entry to the
dump the entire MDB entry (stored in 'nest2') is removed, resulting in
missing port group entries.

Fix this by finalizing the MDB entry with the partial list of already
encoded port group entries.

Fixes: 5205e919c9f0 ("net: bridge: mcast: add support for src list and filter 
mode dumping")
Signed-off-by: Ido Schimmel 
Acked-by: Nikolay Aleksandrov 
Reviewed-by: Jiri Pirko 
---
 net/bridge/br_mdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 67e0976aeed2..00f1651a6aba 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -243,7 +243,7 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct 
netlink_callback *cb,
 
err = __mdb_fill_info(skb, mp, p);
if (err) {
-   nla_nest_cancel(skb, nest2);
+   nla_nest_end(skb, nest2);
goto out;
}
 skip_pg:
-- 
2.26.2



[PATCH 2/3] serial: s3c: Update path of Samsung S3C machine file

2020-09-11 Thread Krzysztof Kozlowski
Correct the path to Samsung S3C24xx machine file, mentioned in
documentation.

Signed-off-by: Krzysztof Kozlowski 
---
 include/linux/serial_s3c.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/serial_s3c.h b/include/linux/serial_s3c.h
index 463ed28d2b27..ca2c5393dc6b 100644
--- a/include/linux/serial_s3c.h
+++ b/include/linux/serial_s3c.h
@@ -254,7 +254,7 @@
  * serial port
  *
  * the pointer is setup by the machine specific initialisation from the
- * arch/arm/mach-s3c2410/ directory.
+ * arch/arm/mach-s3c/ directory.
 */
 
 struct s3c2410_uartcfg {
-- 
2.17.1



[PATCH bpf-next v2 3/5] bpf: add 'bpf_mptcp_sock' structure and helper

2020-09-11 Thread Nicolas Rybowski
In order to precisely identify the parent MPTCP connection of a subflow,
it is required to access the mptcp_sock's token which uniquely identify a
MPTCP connection.

This patch adds a new structure 'bpf_mptcp_sock' exposing the 'token' field
of the 'mptcp_sock' extracted from a subflow's 'tcp_sock'. It also adds the
declaration of a new BPF helper of the same name to expose the newly
defined structure in the userspace BPF API.

This is the foundation to expose more MPTCP-specific fields through BPF.

Currently, it is limited to the field 'token' of the msk but it is
easily extensible.

Acked-by: Matthieu Baerts 
Acked-by: Mat Martineau 
Signed-off-by: Nicolas Rybowski 
---
 include/linux/bpf.h| 33 
 include/uapi/linux/bpf.h   | 14 +++
 kernel/bpf/verifier.c  | 30 ++
 net/core/filter.c  |  4 ++
 net/mptcp/Makefile |  2 +
 net/mptcp/bpf.c| 72 ++
 scripts/bpf_helpers_doc.py |  2 +
 tools/include/uapi/linux/bpf.h | 14 +++
 8 files changed, 171 insertions(+)
 create mode 100644 net/mptcp/bpf.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c6d9f2c444f4..6be74420f6fa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -305,6 +305,7 @@ enum bpf_return_type {
RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common 
or NULL */
RET_PTR_TO_ALLOC_MEM_OR_NULL,   /* returns a pointer to dynamically 
allocated memory or NULL */
RET_PTR_TO_BTF_ID_OR_NULL,  /* returns a pointer to a btf_id or 
NULL */
+   RET_PTR_TO_MPTCP_SOCK_OR_NULL,  /* returns a pointer to mptcp_sock or 
NULL */
 };
 
 /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF 
programs
@@ -385,6 +386,8 @@ enum bpf_reg_type {
PTR_TO_RDONLY_BUF_OR_NULL, /* reg points to a readonly buffer or NULL */
PTR_TO_RDWR_BUF, /* reg points to a read/write buffer */
PTR_TO_RDWR_BUF_OR_NULL, /* reg points to a read/write buffer or NULL */
+   PTR_TO_MPTCP_SOCK,   /* reg points to struct mptcp_sock */
+   PTR_TO_MPTCP_SOCK_OR_NULL, /* reg points to struct mptcp_sock or NULL */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -1785,6 +1788,7 @@ extern const struct bpf_func_proto 
bpf_skc_to_tcp_timewait_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto;
 extern const struct bpf_func_proto bpf_copy_from_user_proto;
+extern const struct bpf_func_proto bpf_mptcp_sock_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
@@ -1841,6 +1845,35 @@ struct sk_reuseport_kern {
u32 reuseport_id;
bool bind_inany;
 };
+
+#ifdef CONFIG_MPTCP
+bool bpf_mptcp_sock_is_valid_access(int off, int size,
+   enum bpf_access_type type,
+   struct bpf_insn_access_aux *info);
+
+u32 bpf_mptcp_sock_convert_ctx_access(enum bpf_access_type type,
+ const struct bpf_insn *si,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog,
+ u32 *target_size);
+#else /* CONFIG_MPTCP */
+static inline bool bpf_mptcp_sock_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux 
*info)
+{
+   return false;
+}
+
+static inline u32 bpf_mptcp_sock_convert_ctx_access(enum bpf_access_type type,
+   const struct bpf_insn *si,
+   struct bpf_insn *insn_buf,
+   struct bpf_prog *prog,
+   u32 *target_size)
+{
+   return 0;
+}
+#endif /* CONFIG_MPTCP */
+
 bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
  struct bpf_insn_access_aux *info);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7d179eada1c3..ee7f6fd67f47 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3579,6 +3579,15 @@ union bpf_attr {
  * the data in *dst*. This is a wrapper of **copy_from_user**\ ().
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * struct bpf_mptcp_sock *bpf_mptcp_sock(struct bpf_sock *sk)
+ * Description
+ * This helper gets a **struct bpf_mptcp_sock** pointer from a
+ * **struct bpf_sock** pointer.
+ * Return
+ * A **struct bpf_mptcp_sock** pointer on success, or **NULL** in
+ * case of failure.
+ *
  */
 #define __BPF_FUNC

[PATCH net-next 11/13] mptcp: allow picking different xmit subflows

2020-09-11 Thread Paolo Abeni
Update the scheduler to less trivial heuristic: cache
the last used subflow, and try to send on it a reasonably
long burst of data.

When the burst or the subflow send space is exhausted, pick
the subflow with the lower ratio between write space and
send buffer - that is, the subflow with the greater relative
amount of free space.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/protocol.c | 109 ---
 net/mptcp/protocol.h |   6 ++-
 2 files changed, 97 insertions(+), 18 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ec9c38d3acc7..148c4e685ecd 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1031,41 +1031,103 @@ static void mptcp_nospace(struct mptcp_sock *msk)
}
 }
 
+static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
+{
+   struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+   /* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
+   if (subflow->request_join && !subflow->fully_established)
+   return false;
+
+   /* only send if our side has not closed yet */
+   return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
+}
+
+#define MPTCP_SEND_BURST_SIZE  ((1 << 16) - \
+sizeof(struct tcphdr) - \
+MAX_TCP_OPTION_SPACE - \
+sizeof(struct ipv6hdr) - \
+sizeof(struct frag_hdr))
+
+struct subflow_send_info {
+   struct sock *ssk;
+   uint64_t ratio;
+};
+
 static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk,
   u32 *sndbuf)
 {
+   struct subflow_send_info send_info[2];
struct mptcp_subflow_context *subflow;
-   struct sock *sk = (struct sock *)msk;
-   struct sock *backup = NULL;
-   bool free;
+   int i, nr_active = 0;
+   int64_t ratio, pace;
+   struct sock *ssk;
 
-   sock_owned_by_me(sk);
+   sock_owned_by_me((struct sock *)msk);
 
*sndbuf = 0;
if (!mptcp_ext_cache_refill(msk))
return NULL;
 
-   mptcp_for_each_subflow(msk, subflow) {
-   struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
-   free = sk_stream_is_writeable(subflow->tcp_sock);
-   if (!free) {
-   mptcp_nospace(msk);
+   if (__mptcp_check_fallback(msk)) {
+   if (!msk->first)
return NULL;
+   *sndbuf = msk->first->sk_sndbuf;
+   return sk_stream_memory_free(msk->first) ? msk->first : NULL;
+   }
+
+   /* re-use last subflow, if the burst allow that */
+   if (msk->last_snd && msk->snd_burst > 0 &&
+   sk_stream_memory_free(msk->last_snd) &&
+   mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
+   mptcp_for_each_subflow(msk, subflow) {
+   ssk =  mptcp_subflow_tcp_sock(subflow);
+   *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf);
}
+   return msk->last_snd;
+   }
+
+   /* pick the subflow with the lower wmem/wspace ratio */
+   for (i = 0; i < 2; ++i) {
+   send_info[i].ssk = NULL;
+   send_info[i].ratio = -1;
+   }
+   mptcp_for_each_subflow(msk, subflow) {
+   ssk =  mptcp_subflow_tcp_sock(subflow);
+   if (!mptcp_subflow_active(subflow))
+   continue;
 
+   nr_active += !subflow->backup;
*sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf);
-   if (subflow->backup) {
-   if (!backup)
-   backup = ssk;
+   if (!sk_stream_memory_free(subflow->tcp_sock))
+   continue;
 
+   pace = READ_ONCE(ssk->sk_pacing_rate);
+   if (!pace)
continue;
-   }
 
-   return ssk;
+   ratio = (int64_t)READ_ONCE(ssk->sk_wmem_queued) << 32 / pace;
+   if (ratio < send_info[subflow->backup].ratio) {
+   send_info[subflow->backup].ssk = ssk;
+   send_info[subflow->backup].ratio = ratio;
+   }
}
 
-   return backup;
+   pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
+msk, nr_active, send_info[0].ssk, send_info[0].ratio,
+send_info[1].ssk, send_info[1].ratio);
+
+   /* pick the best backup if no other subflow is active */
+   if (!nr_active)
+   send_info[0].ssk = send_info[1].ssk;
+
+   if (send_info[0].ssk) {
+   msk->last_snd = send_info[0].ssk;
+   msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE,
+  sk_stream_wspace(msk->last_snd));
+   return msk->last_s

[PATCH net-next 07/13] mptcp: cleanup mptcp_subflow_discard_data()

2020-09-11 Thread Paolo Abeni
There is no need to use the tcp_read_sock(), we can
simply drop the skb. Additionally try to look at the
next buffer for in order data.

This both simplifies the code and avoid unneeded indirect
calls.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/protocol.h |  1 -
 net/mptcp/subflow.c  | 58 +++-
 2 files changed, 14 insertions(+), 45 deletions(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e20154a33fa7..26f5f81f3f4c 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -355,7 +355,6 @@ int mptcp_is_enabled(struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
 struct mptcp_options_received *mp_opt);
 bool mptcp_subflow_data_available(struct sock *sk);
-int mptcp_subflow_discard_data(struct sock *sk, unsigned limit);
 void __init mptcp_subflow_init(void);
 
 /* called with sk socket lock held */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1f048a5bf120..c4c174749865 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -805,50 +805,22 @@ static enum mapping_status get_mapping_status(struct sock 
*ssk,
return MAPPING_OK;
 }
 
-static int subflow_read_actor(read_descriptor_t *desc,
- struct sk_buff *skb,
- unsigned int offset, size_t len)
+static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
+  unsigned limit)
 {
-   size_t copy_len = min(desc->count, len);
-
-   desc->count -= copy_len;
+   struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+   bool fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
+   u32 incr;
 
-   pr_debug("flushed %zu bytes, %zu left", copy_len, desc->count);
-   return copy_len;
-}
+   incr = limit >= skb->len ? skb->len + fin : limit;
 
-int mptcp_subflow_discard_data(struct sock *ssk, unsigned limit)
-{
-   struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-   u32 map_remaining;
-   size_t delta;
-
-   map_remaining = subflow->map_data_len -
-   mptcp_subflow_get_map_offset(subflow);
-   delta = min_t(size_t, limit, map_remaining);
-
-   /* discard mapped data */
-   pr_debug("discarding %zu bytes, current map len=%d", delta,
-map_remaining);
-   if (delta) {
-   read_descriptor_t desc = {
-   .count = delta,
-   };
-   int ret;
-
-   ret = tcp_read_sock(ssk, &desc, subflow_read_actor);
-   if (ret < 0) {
-   ssk->sk_err = -ret;
-   return ret;
-   }
-   if (ret < delta)
-   return 0;
-   if (delta == map_remaining) {
-   subflow->data_avail = 0;
-   subflow->map_valid = 0;
-   }
-   }
-   return 0;
+   pr_debug("discarding=%d len=%d seq=%d", incr, skb->len,
+subflow->map_subflow_seq);
+   tcp_sk(ssk)->copied_seq += incr;
+   if (!before(tcp_sk(ssk)->copied_seq, TCP_SKB_CB(skb)->end_seq))
+   sk_eat_skb(ssk, skb);
+   if (mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len)
+   subflow->map_valid = 0;
 }
 
 static bool subflow_check_data_avail(struct sock *ssk)
@@ -923,9 +895,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
/* only accept in-sequence mapping. Old values are spurious
 * retransmission
 */
-   if (mptcp_subflow_discard_data(ssk, old_ack - ack_seq))
-   goto fatal;
-   return false;
+   mptcp_subflow_discard_data(ssk, skb, old_ack - ack_seq);
}
return true;
 
-- 
2.26.2



Re: [RFC PATCH net-next 17/22] nexthop: Replay nexthops when registering a notifier

2020-09-11 Thread David Ahern
On 9/11/20 10:40 AM, Ido Schimmel wrote:
>>> @@ -2116,11 +2137,40 @@ static struct notifier_block nh_netdev_notifier = {
>>> .notifier_call = nh_netdev_event,
>>>  };
>>>  
>>> +static int nexthops_dump(struct net *net, struct notifier_block *nb,
>>> +struct netlink_ext_ack *extack)
>>> +{
>>> +   struct rb_root *root = &net->nexthop.rb_root;
>>> +   struct rb_node *node;
>>> +   int err = 0;
>>> +
>>> +   for (node = rb_first(root); node; node = rb_next(node)) {
>>> +   struct nexthop *nh;
>>> +
>>> +   nh = rb_entry(node, struct nexthop, rb_node);
>>> +   err = call_nexthop_notifier(nb, net, NEXTHOP_EVENT_REPLACE, nh,
>>> +   extack);
>>> +   if (err)
>>> +   break;
>>> +   }
>>> +
>>> +   return err;
>>> +}
>>> +
>>>  int register_nexthop_notifier(struct net *net, struct notifier_block *nb,
>>>   struct netlink_ext_ack *extack)
>>>  {
>>> -   return blocking_notifier_chain_register(&net->nexthop.notifier_chain,
>>> -   nb);
>>> +   int err;
>>> +
>>> +   rtnl_lock();
>>> +   err = nexthops_dump(net, nb, extack);
>>
>> can the unlock be moved here? register function below should not need it.
> 
> It can result in this unlikely race:
> 
>  - rtnl_lock(); nexthops_dump(); rtnl_unlock()
>  - Nexthop is added / deleted
>  - blocking_notifier_chain_register()
> 

ok. Let's keep the order you have which I believe is consistent with FIB
notifiers.


Re: [PATCH net-next v5 0/6] net-next: dsa: mt7530: add support for MT7531

2020-09-11 Thread Vladimir Oltean
On Fri, Sep 11, 2020 at 09:48:50PM +0800, Landen Chao wrote:
> This patch series adds support for MT7531.
>
> MT7531 is the next generation of MT7530 which could be found on Mediatek
> router platforms such as MT7622 or MT7629.
>
> It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and
> the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface.
> Cpu port 5 supports either RGMII or SGMII in different HW SKU, but cannot
> be muxed to PHY of port 0/4 like mt7530. Due to support for SGMII
> interface, pll, and pad setting are different from MT7530.
>
> MT7531 SGMII interface can be configured in following mode:
> - 'SGMII AN mode' with in-band negotiation capability
> which is compatible with PHY_INTERFACE_MODE_SGMII.
> - 'SGMII force mode' without in-band negotiation
> which is compatible with 10B/8B encoding of
> PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
> - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
> which is compatible with 10B/8B encoding of
> PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.
>
> v4 -> v5
> - Add fixed-link node to dsa cpu port in dts file by suggestion of
>   Vladimir Oltean.

Thank you!

-Vladimir


[PATCH net-next 03/13] mptcp: trigger msk processing even for OoO data

2020-09-11 Thread Paolo Abeni
This is a prerequisite to allow receiving data from multiple
subflows without re-injection.

Instead of dropping the OoO - "future" data in
subflow_check_data_avail(), call into __mptcp_move_skbs()
and let the msk drop that.

To avoid code duplication factor out the mptcp_subflow_discard_data()
helper.

Note that __mptcp_move_skbs() can now find multiple subflows
with data avail (comprising to-be-discarded data), so must
update the byte counter incrementally.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/protocol.c | 33 +++
 net/mptcp/protocol.h |  9 -
 net/mptcp/subflow.c  | 78 
 3 files changed, 78 insertions(+), 42 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 854a8b3b9ecd..95573c6f7762 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -167,7 +167,8 @@ static bool mptcp_subflow_dsn_valid(const struct mptcp_sock 
*msk,
return true;
 
subflow->data_avail = 0;
-   return mptcp_subflow_data_available(ssk);
+   mptcp_subflow_data_available(ssk);
+   return subflow->data_avail == MPTCP_SUBFLOW_DATA_AVAIL;
 }
 
 static void mptcp_check_data_fin_ack(struct sock *sk)
@@ -313,11 +314,18 @@ static bool __mptcp_move_skbs_from_subflow(struct 
mptcp_sock *msk,
struct tcp_sock *tp;
bool done = false;
 
-   if (!mptcp_subflow_dsn_valid(msk, ssk)) {
-   *bytes = 0;
+   pr_debug("msk=%p ssk=%p data avail=%d valid=%d empty=%d",
+msk, ssk, subflow->data_avail,
+mptcp_subflow_dsn_valid(msk, ssk),
+!skb_peek(&ssk->sk_receive_queue));
+   if (subflow->data_avail == MPTCP_SUBFLOW_OOO_DATA) {
+   mptcp_subflow_discard_data(ssk, subflow->map_data_len);
return false;
}
 
+   if (!mptcp_subflow_dsn_valid(msk, ssk))
+   return false;
+
tp = tcp_sk(ssk);
do {
u32 map_remaining, offset;
@@ -376,7 +384,7 @@ static bool __mptcp_move_skbs_from_subflow(struct 
mptcp_sock *msk,
}
} while (more_data_avail);
 
-   *bytes = moved;
+   *bytes += moved;
 
/* If the moves have caught up with the DATA_FIN sequence number
 * it's time to ack the DATA_FIN and change socket state, but
@@ -415,9 +423,17 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, 
struct sock *ssk)
 
 void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
+   struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_sock *msk = mptcp_sk(sk);
+   bool wake;
 
-   set_bit(MPTCP_DATA_READY, &msk->flags);
+   /* move_skbs_to_msk below can legitly clear the data_avail flag,
+* but we will need later to properly woke the reader, cache its
+* value
+*/
+   wake = subflow->data_avail == MPTCP_SUBFLOW_DATA_AVAIL;
+   if (wake)
+   set_bit(MPTCP_DATA_READY, &msk->flags);
 
if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) &&
move_skbs_to_msk(msk, ssk))
@@ -438,7 +454,8 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
move_skbs_to_msk(msk, ssk);
}
 wake:
-   sk->sk_data_ready(sk);
+   if (wake)
+   sk->sk_data_ready(sk);
 }
 
 static void __mptcp_flush_join_list(struct mptcp_sock *msk)
@@ -1281,6 +1298,9 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr 
*msg, size_t len,
set_bit(MPTCP_DATA_READY, &msk->flags);
}
 out_err:
+   pr_debug("msk=%p data_ready=%d rx queue empty=%d copied=%d",
+msk, test_bit(MPTCP_DATA_READY, &msk->flags),
+skb_queue_empty(&sk->sk_receive_queue), copied);
mptcp_rcv_space_adjust(msk, copied);
 
release_sock(sk);
@@ -2308,6 +2328,7 @@ static __poll_t mptcp_poll(struct file *file, struct 
socket *sock,
sock_poll_wait(file, sock, wait);
 
state = inet_sk_state_load(sk);
+   pr_debug("msk=%p state=%d flags=%lx", msk, state, msk->flags);
if (state == TCP_LISTEN)
return mptcp_check_readable(msk);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 60b27d44c184..981e395abb46 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -268,6 +268,12 @@ mptcp_subflow_rsk(const struct request_sock *rsk)
return (struct mptcp_subflow_request_sock *)rsk;
 }
 
+enum mptcp_data_avail {
+   MPTCP_SUBFLOW_NODATA,
+   MPTCP_SUBFLOW_DATA_AVAIL,
+   MPTCP_SUBFLOW_OOO_DATA
+};
+
 /* MPTCP subflow context */
 struct mptcp_subflow_context {
struct  list_head node;/* conn_list of subflows */
@@ -292,10 +298,10 @@ struct mptcp_subflow_context {
map_valid : 1,
mpc_map : 1,
backup : 1,
-   data_avail : 1,
rx_eof : 1,
use_64bit_ack : 1, /* Set when we received a 64-bit DSN */
  

[PATCH net-next 01/13] mptcp: rethink 'is writable' conditional

2020-09-11 Thread Paolo Abeni
Currently, when checking for the 'msk is writable' condition, we
look at the individual subflows write space.
That works well while we send data via a single subflow, but will
not as soon as we will enable concurrent xmit on multiple subflows.

With this change msk becomes writable when the following conditions
hold:
- the socket has some free write space
- there is at least a subflow with write free space

Additionally we need to set the NOSPACE bit on all subflows
before blocking.

Signed-off-by: Paolo Abeni 
---
 net/mptcp/protocol.c | 71 
 net/mptcp/subflow.c  |  6 ++--
 2 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 683196225f91..854a8b3b9ecd 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -472,7 +472,7 @@ void mptcp_data_acked(struct sock *sk)
 {
mptcp_reset_timer(sk);
 
-   if ((!sk_stream_is_writeable(sk) ||
+   if ((!test_bit(MPTCP_SEND_SPACE, &mptcp_sk(sk)->flags) ||
 (inet_sk_state_load(sk) != TCP_ESTABLISHED)) &&
schedule_work(&mptcp_sk(sk)->work))
sock_hold(sk);
@@ -567,6 +567,20 @@ static void dfrag_clear(struct sock *sk, struct 
mptcp_data_frag *dfrag)
put_page(dfrag->page);
 }
 
+static bool mptcp_is_writeable(struct mptcp_sock *msk)
+{
+   struct mptcp_subflow_context *subflow;
+
+   if (!sk_stream_is_writeable((struct sock *)msk))
+   return false;
+
+   mptcp_for_each_subflow(msk, subflow) {
+   if (sk_stream_is_writeable(subflow->tcp_sock))
+   return true;
+   }
+   return false;
+}
+
 static void mptcp_clean_una(struct sock *sk)
 {
struct mptcp_sock *msk = mptcp_sk(sk);
@@ -609,8 +623,15 @@ static void mptcp_clean_una(struct sock *sk)
sk_mem_reclaim_partial(sk);
 
/* Only wake up writers if a subflow is ready */
-   if (test_bit(MPTCP_SEND_SPACE, &msk->flags))
+   if (mptcp_is_writeable(msk)) {
+   set_bit(MPTCP_SEND_SPACE, &mptcp_sk(sk)->flags);
+   smp_mb__after_atomic();
+
+   /* set SEND_SPACE before sk_stream_write_space clears
+* NOSPACE
+*/
sk_stream_write_space(sk);
+   }
}
 }
 
@@ -801,21 +822,31 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct 
sock *ssk,
return ret;
 }
 
-static void mptcp_nospace(struct mptcp_sock *msk, struct socket *sock)
+static void mptcp_nospace(struct mptcp_sock *msk)
 {
+   struct mptcp_subflow_context *subflow;
+
clear_bit(MPTCP_SEND_SPACE, &msk->flags);
smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
 
-   /* enables sk->write_space() callbacks */
-   set_bit(SOCK_NOSPACE, &sock->flags);
+   mptcp_for_each_subflow(msk, subflow) {
+   struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+   struct socket *sock = READ_ONCE(ssk->sk_socket);
+
+   /* enables ssk->write_space() callbacks */
+   if (sock)
+   set_bit(SOCK_NOSPACE, &sock->flags);
+   }
 }
 
 static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 {
struct mptcp_subflow_context *subflow;
+   struct sock *sk = (struct sock *)msk;
struct sock *backup = NULL;
+   bool free;
 
-   sock_owned_by_me((const struct sock *)msk);
+   sock_owned_by_me(sk);
 
if (!mptcp_ext_cache_refill(msk))
return NULL;
@@ -823,12 +854,9 @@ static struct sock *mptcp_subflow_get_send(struct 
mptcp_sock *msk)
mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
-   if (!sk_stream_memory_free(ssk)) {
-   struct socket *sock = ssk->sk_socket;
-
-   if (sock)
-   mptcp_nospace(msk, sock);
-
+   free = sk_stream_is_writeable(subflow->tcp_sock);
+   if (!free) {
+   mptcp_nospace(msk);
return NULL;
}
 
@@ -845,16 +873,10 @@ static struct sock *mptcp_subflow_get_send(struct 
mptcp_sock *msk)
return backup;
 }
 
-static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk)
+static void ssk_check_wmem(struct mptcp_sock *msk)
 {
-   struct socket *sock;
-
-   if (likely(sk_stream_is_writeable(ssk)))
-   return;
-
-   sock = READ_ONCE(ssk->sk_socket);
-   if (sock)
-   mptcp_nospace(msk, sock);
+   if (unlikely(!mptcp_is_writeable(msk)))
+   mptcp_nospace(msk);
 }
 
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
@@ -907,6 +929,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t len)

  1   2   3   >