date:20201202

Re: [PATCH bpf-next] bpf, xdp: add bpf_redirect{,_map}() leaf node detection and optimization

2020-12-02 Thread Björn Töpel


On 2020-12-02 05:46, Alexei Starovoitov wrote:
[...]
> Sorry I don't like this check at all. It's too fragile.
> It will work for one hard coded program.
> It may work for something more real, but will break with minimal
> changes to the prog or llvm changes.
> How are we going to explain that fragility to users?
>
[...]
> I haven't looked through all possible paths, but it feels very dangerous.
> The stack growth is big. Calling xsk_rcv from preempt_disabled
> and recursively calling into another bpf prog?
> That violates all stack checks we have in the verifier.

Fair points, and thanks for pointing them out.

If the robustness (your first point) is improved, say via proper
indirect jump support, the stack usage will still be a concern.

> I see plenty of cons and not a single pro in this patch.
> 5% improvement for micro benchmark? That's hardly a justification.

It's indeed a ubench, and something that is mostly beneficial to AF_XDP.
I'll go back to the drawing board and make sure the cons/pro balance is
improved.

Thanks for the feedback!


Björn

Re: [PATCH net-next] net: sched: remove redundant 'rtnl_held' argument

2020-12-02 Thread Vlad Buslov



On Tue 01 Dec 2020 at 21:24, Jakub Kicinski  wrote:
> On Tue, 1 Dec 2020 20:39:16 +0200 Vlad Buslov wrote:
>> On Tue 01 Dec 2020 at 19:03, Jakub Kicinski  wrote:
>> > On Tue, 1 Dec 2020 09:55:37 +0200 Vlad Buslov wrote:  
>> >> On Tue 01 Dec 2020 at 04:52, Jakub Kicinski  wrote:  
>> >> > On Fri, 27 Nov 2020 17:12:05 +0200 Vlad Buslov wrote:
>> >> >> @@ -2262,7 +2260,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>> >> >>  
>> >> >>if (prio == 0) {
>> >> >>tfilter_notify_chain(net, skb, block, q, parent, n,
>> >> >> -   chain, RTM_DELTFILTER, rtnl_held);
>> >> >> +   chain, RTM_DELTFILTER);
>> >> >>tcf_chain_flush(chain, rtnl_held);
>> >> >>err = 0;
>> >> >>goto errout;
>> >> >
>> >> > Hum. This looks off.
>> >> 
>> >> Hi Jakub,
>> >> 
>> >> Prio==0 means user requests to flush whole chain. In such case rtnl lock
>> >> is obtained earlier in tc_del_tfilter():
>> >> 
>> >>   /* Take rtnl mutex if flushing whole chain, block is shared (no qdisc
>> >>* found), qdisc is not unlocked, classifier type is not specified,
>> >>* classifier is not unlocked.
>> >>*/
>> >>   if (!prio ||
>> >>   (q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
>> >>   !tcf_proto_is_unlocked(name)) {
>> >>   rtnl_held = true;
>> >>   rtnl_lock();
>> >>   }
>> >>   
>> >
>> > Makes sense, although seems a little fragile. Why not put a true in
>> > there, in that case?  
>> 
>> Because, as I described in commit message, the function will trigger an
>> assertion if called without rtnl lock, so passing rtnl_held==false
>> argument makes no sense and is confusing for the reader.
>
> The assumption being that tcf_ functions without the arg must hold the
> lock?

Yes.

>
>> > Do you have a larger plan here? The motivation seems a little unclear
>> > if I'm completely honest. Are you dropping the rtnl_held from all callers 
>> > of __tcf_get_next_proto() just to save the extra argument / typing?  
>> 
>> The plan is to have 'rtnl_held' arg for functions that can be called
>> without rtnl lock and not have such argument for functions that require
>> caller to hold rtnl :)
>> 
>> To elaborate further regarding motivation for this patch: some time ago
>> I received an email asking why I have rtnl_held arg in function that has
>> ASSERT_RTNL() in one of its dependencies. I re-read the code and
>> determined that it was a leftover from earlier version and is not needed
>> in code that was eventually upstreamed. Removing the argument was an
>> easy decision since Jiri hates those and repeatedly asked me to minimize
>> usage of such function arguments, so I didn't expect it to be
>> controversial.
>> 
>> > That's nice but there's also value in the API being consistent.  
>> 
>> Cls_api has multiple functions that don't have 'rtnl_held' argument.
>> Only functions that can work without rtnl lock have it. Why do you
>> suggest it is inconsistent to remove it here?
>
> I see. I was just trying to figure out if you have a plan for larger
> restructuring to improve the situation. I also dislike arguments
> being passed around in a seemingly random fashion. Removing or adding
> them to a single function does not move the needle much, IMO.

No, this is not part of larger effort. I would like to stop passing
'rtnl_held' everywhere, but for that I need other drivers that implement
TC offload to stop requiring rtnl lock, which would allow removing
rtnl_held from tcf_proto_ops callbacks.

>
> But since the patch is correct I'll apply it now, thanks!

Thank you!

Re: [PATCH net-next v1 1/3] vm_sockets: Include flag field in the vsock address data structure

2020-12-02 Thread Stefano Garzarella


On Tue, Dec 01, 2020 at 08:15:04PM +0200, Paraschiv, Andra-Irina wrote:
> On 01/12/2020 18:09, Stefano Garzarella wrote:
>> On Tue, Dec 01, 2020 at 05:25:03PM +0200, Andra Paraschiv wrote:
>>> vsock enables communication between virtual machines and the host they
>>> are running on. With the multi transport support (guest->host and
>>> host->guest), nested VMs can also use vsock channels for communication.
>>>
>>> In addition to this, by default, all the vsock packets are forwarded to
>>> the host, if no host->guest transport is loaded. This behavior can be
>>> implicitly used for enabling vsock communication between sibling VMs.
>>>
>>> Add a flag field in the vsock address data structure that can be used to
>>> explicitly mark the vsock connection as being targeted for a certain
>>> type of communication. This way, can distinguish between nested VMs and
>>> sibling VMs use cases and can also setup them at the same time. Till
>>> now, could either have nested VMs or sibling VMs at a time using the
>>> vsock communication stack.
>>>
>>> Use the already available "svm_reserved1" field and mark it as a flag
>>> field instead. This flag can be set when initializing the vsock address
>>> variable used for the connect() call.
>>
>> Maybe we can split this patch in 2 patches, one to rename the svm_flag
>> and one to add the new flags.
>
> Sure, I can split this in 2 patches, to have a bit more separation of
> duties.
>
>>> Signed-off-by: Andra Paraschiv
>>> ---
>>> include/uapi/linux/vm_sockets.h | 18 +-
>>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
>>> index fd0ed7221645d..58da5a91413ac 100644
>>> --- a/include/uapi/linux/vm_sockets.h
>>> +++ b/include/uapi/linux/vm_sockets.h
>>> @@ -114,6 +114,22 @@
>>>
>>> #define VMADDR_CID_HOST 2
>>>
>>> +/* This sockaddr_vm flag value covers the current default use case:
>>> + * local vsock communication between guest and host and nested VMs setup.
>>> + * In addition to this, implicitly, the vsock packets are forwarded to the host
>>> + * if no host->guest vsock transport is set.
>>> + */
>>> +#define VMADDR_FLAG_DEFAULT_COMMUNICATION 0x
>>
>> I think we don't need this macro, since the next one can be used to
>> check if it a sibling communication (flag 0x1 set) or not (flag 0x1
>> not set).
>
> Right, that's not particularly the use of the flag value, as by
> default comes as 0. It was more for sharing the cases this covers. But
> I can remove the define and keep this kind of info, with regard to the
> default case, in the commit message / comments.

Agree, you can add few lines in the comment block of VMADDR_FLAG_SIBLING
describing the default case when it is not set.

>>> +
>>> +/* Set this flag value in the sockaddr_vm corresponding field if the vsock
>>> + * channel needs to be setup between two sibling VMs running on the same host.
>>> + * This way can explicitly distinguish between vsock channels created for nested
>>> + * VMs (or local communication between guest and host) and the ones created for
>>> + * sibling VMs. And vsock channels for multiple use cases (nested / sibling VMs)
>>> + * can be setup at the same time.
>>> + */
>>> +#define VMADDR_FLAG_SIBLING_VMS_COMMUNICATION 0x0001
>>
>> What do you think if we shorten in VMADDR_FLAG_SIBLING?
>
> Yup, this seems ok as well for me. I'll update the naming.

Thanks,
Stefano

Re: [PATCH iproute2-net 0/3] devlink: Add devlink reload action limit and stats

2020-12-02 Thread Moshe Shemesh




On 12/1/2020 1:25 PM, Vasundhara Volam wrote:
> On Thu, Nov 26, 2020 at 4:46 PM Moshe Shemesh  wrote:
>> Introduce new options on devlink reload API to enable the user to select
>> the reload action required and constrains limits on these actions that he
>> may want to ensure.
>>
>> Add reload stats to show the history per reload action per limit.
>>
>> Patch 1 adds the new API reload action and reload limit options to
>>  devlink reload command.
>> Patch 2 adds pr_out_dev() helper function and modify monitor function to
>>  use it.
>> Patch 3 adds reload stats and remote reload stats to devlink dev show.
>>
>> Moshe Shemesh (3):
>>    devlink: Add devlink reload action and limit options
>>    devlink: Add pr_out_dev() helper function
>>    devlink: Add reload stats to dev show
>>
>>   devlink/devlink.c| 260 +--
>>   include/uapi/linux/devlink.h |   2 +
>>   2 files changed, 249 insertions(+), 13 deletions(-)
>
> I see man pages are not updated accordingly in this series. Will it be
> updated in the follow-up patch?

Right, I will update man page. Thanks.

>> --
>> 2.18.2

Re: [PATCH net-next v1 2/3] virtio_transport_common: Set sibling VMs flag on the receive path

2020-12-02 Thread Stefano Garzarella


On Tue, Dec 01, 2020 at 09:01:05PM +0200, Paraschiv, Andra-Irina wrote:
> On 01/12/2020 18:22, Stefano Garzarella wrote:
>> On Tue, Dec 01, 2020 at 05:25:04PM +0200, Andra Paraschiv wrote:
>>> The vsock flag can be set during the connect() setup logic, when
>>> initializing the vsock address data structure variable. Then the vsock
>>> transport is assigned, also considering this flag.
>>>
>>> The vsock transport is also assigned on the (listen) receive path. The
>>> flag needs to be set considering the use case.
>>>
>>> Set the vsock flag of the remote address to the one targeted for sibling
>>> VMs communication if the following conditions are met:
>>>
>>> * The source CID of the packet is higher than VMADDR_CID_HOST.
>>> * The destination CID of the packet is higher than VMADDR_CID_HOST.
>>>
>>> Signed-off-by: Andra Paraschiv
>>> ---
>>> net/vmw_vsock/virtio_transport_common.c | 8
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 5956939eebb78..871c84e0916b1 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -1062,6 +1062,14 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
>>>   vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid),
>>>   le32_to_cpu(pkt->hdr.src_port));
>>
>> Maybe is better to create an helper function that other transports can
>> use for the same purpose or we can put this code in the
>> vsock_assign_transport() and set this flag only when the 'psk' argument
>> is not NULL (this is the case when it's called by the transports when we
>> receive a new connection request and 'psk' is the listener socket).
>>
>> The second way should allow us to support all the transports without
>> touching them.
>
> Ack, I was wondering about the other transports such as vmci or hyperv.
>
> I can move the logic below in the codebase that assigns the transport,
> after checking 'psk'.
>
>>> +  /* If the packet is coming with the source and destination CIDs higher
>>> +   * than VMADDR_CID_HOST, then a vsock channel should be established for
>>> +   * sibling VMs communication.
>>> +   */
>>> +  if (vchild->local_addr.svm_cid > VMADDR_CID_HOST &&
>>> +  vchild->remote_addr.svm_cid > VMADDR_CID_HOST)
>>> +  vchild->remote_addr.svm_flag = VMADDR_FLAG_SIBLING_VMS_COMMUNICATION;
>>
>> svm_flag is always initialized to 0 in vsock_addr_init(), so this
>> assignment is the first one and it's okay, but to avoid future issues
>> I'd use |= here to set the flag.
>
> Fair point. I was thinking more towards exclusive flags values
> (purposes), but that's fine with the bitwise operator if we would get
> a set of flag values together. I will also update the field name to
> 'svm_flags', let me know if we should keep the previous one or there
> is a better option.

Yeah, maybe in the future we will add some new flags and we'll only need
to add them without touching this code.

Agree with the new 'svm_flags' field name.

Thanks,
Stefano
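
Note: the receive-path rule discussed in this thread can be sketched in plain userspace C. This is only an illustration under assumptions: the struct and the helper mark_sibling_if_needed() are hypothetical stand-ins (not the kernel's sockaddr_vm or any real API), VMADDR_FLAG_SIBLING is the shortened name proposed above, and EXAMPLE_FLAG_OTHER is an invented future flag. It shows why |= is preferred over = once more than one flag can be set.

```c
#include <assert.h>
#include <stdint.h>

#define VMADDR_CID_HOST 2u          /* value shown in the quoted patch */
#define VMADDR_FLAG_SIBLING 0x0001u /* shortened name proposed in the thread */
#define EXAMPLE_FLAG_OTHER 0x0002u  /* hypothetical future flag */

/* Hypothetical stand-in for the address structure under discussion. */
struct example_vm_addr {
	uint32_t cid;
	uint16_t flags;
};

/* Set the sibling flag when both CIDs are above VMADDR_CID_HOST.
 * Using |= keeps any flag already set on the remote address.
 */
static void mark_sibling_if_needed(const struct example_vm_addr *local,
				   struct example_vm_addr *remote)
{
	assert(local && remote);

	if (local->cid > VMADDR_CID_HOST && remote->cid > VMADDR_CID_HOST)
		remote->flags |= VMADDR_FLAG_SIBLING;
}
```

With a plain assignment instead of |=, a flag set earlier on the remote address would be silently cleared, which is the future issue raised in the review.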

[PATCH 0/4] patch for stmmac

2020-12-02 Thread Joakim Zhang

A patch set for stmmac, fix some driver issues.

Fugang Duan (4):
  net: stmmac: dwmac4_lib: increase the timeout for dma reset
  net: stmmac: start phylink instance before stmmac_hw_setup()
  net: ethernet: stmmac: free tx skb buffer in stmmac_resume()
  net: ethernet: stmmac: delete the eee_ctrl_timer after napi disabled

 .../net/ethernet/stmicro/stmmac/dwmac4_lib.c  |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 43 ++-
 2 files changed, 33 insertions(+), 12 deletions(-)

-- 
2.17.1

[PATCH 2/4] net: stmmac: start phylink instance before stmmac_hw_setup()

2020-12-02 Thread Joakim Zhang

From: Fugang Duan 

Start the phylink instance and resume the PHY so that it supplies the
RX clock to the MAC before the MAC layer is initialized by calling
stmmac_hw_setup(). The DMA reset depends on the RX clock; without it,
the DMA reset waits for the maximum timeout value and then times
out.

Signed-off-by: Fugang Duan 
Signed-off-by: Joakim Zhang 
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c| 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8c1ac75901ce..107761ef456a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5277,6 +5277,14 @@ int stmmac_resume(struct device *dev)
return ret;
}
 
+   if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
+   rtnl_lock();
+   phylink_start(priv->phylink);
+   /* We may have called phylink_speed_down before */
+   phylink_speed_up(priv->phylink);
+   rtnl_unlock();
+   }
+
rtnl_lock();
mutex_lock(&priv->lock);
 
@@ -5295,14 +5303,6 @@ int stmmac_resume(struct device *dev)
mutex_unlock(&priv->lock);
rtnl_unlock();
 
-   if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
-   rtnl_lock();
-   phylink_start(priv->phylink);
-   /* We may have called phylink_speed_down before */
-   phylink_speed_up(priv->phylink);
-   rtnl_unlock();
-   }
-
phylink_mac_change(priv->phylink, true);
 
netif_device_attach(ndev);
-- 
2.17.1

[PATCH] LF-2678 net: ethernet: stmmac: delete the eee_ctrl_timer after napi disabled

2020-12-02 Thread Joakim Zhang

From: Fugang Duan 

There is a chance that the eee_ctrl_timer is re-enabled and fires in
the napi callback after the timer has been deleted in stmmac_release().
The timer function then accesses the EEE registers after the clocks
have been disabled, which causes a system hang.

It is safe to delete the timer, and to disable LPI mode, once napi has
been disabled.

Tested-by: Joakim Zhang 
Reviewed-by: Joakim Zhang 
Signed-off-by: Fugang Duan 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index cc1f17b170f0..7e655fa34589 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2933,9 +2933,6 @@ static int stmmac_release(struct net_device *dev)
struct platform_device *pdev = to_platform_device(priv->device);
u32 chan;
 
-   if (priv->eee_enabled)
-   del_timer_sync(&priv->eee_ctrl_timer);
-
if (device_may_wakeup(priv->device))
phylink_speed_down(priv->phylink, false);
/* Stop and disconnect the PHY */
@@ -2954,6 +2951,11 @@ static int stmmac_release(struct net_device *dev)
if (priv->lpi_irq > 0)
free_irq(priv->lpi_irq, dev);
 
+   if (priv->eee_enabled) {
+   priv->tx_path_in_lpi_mode = false;
+   del_timer_sync(&priv->eee_ctrl_timer);
+   }
+
/* Stop TX/RX DMA and clear the descriptors */
stmmac_stop_all_dma(priv);
 
@@ -5224,6 +5226,11 @@ int stmmac_suspend(struct device *dev)
for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
del_timer_sync(&priv->tx_queue[chan].txtimer);
 
+   if (priv->eee_enabled) {
+   priv->tx_path_in_lpi_mode = false;
+   del_timer_sync(&priv->eee_ctrl_timer);
+   }
+
/* Stop TX/RX DMA */
stmmac_stop_all_dma(priv);
 
-- 
2.17.1

[PATCH 3/4] net: ethernet: stmmac: free tx skb buffer in stmmac_resume()

2020-12-02 Thread Joakim Zhang

From: Fugang Duan 

During suspend/resume testing, a WARN_ON() is triggered in
stmmac_xmit() by the following logic:
entry = tx_q->cur_tx;
first_entry = entry;
WARN_ON(tx_q->tx_skbuff[first_entry]);

Normally tx_q->tx_skbuff[tx_q->cur_tx] should be NULL because
the skb has been handled and freed in stmmac_tx_clean().

But stmmac_resume() resets the queue parameters as shown below, so skb
buffers may not have been freed:
tx_q->cur_tx = 0;
tx_q->dirty_tx = 0;

So free the tx skb buffers in stmmac_resume() to avoid the warning and
a memory leak.

log:
[   46.139824] ------------[ cut here ]------------
[   46.144453] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3235 stmmac_xmit+0x7a0/0x9d0
[   46.154969] Modules linked in: crct10dif_ce vvcam(O) flexcan can_dev
[   46.161328] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   O  5.4.24-2.1.0+g2ad925d15481 #1
[   46.170369] Hardware name: NXP i.MX8MPlus EVK board (DT)
[   46.175677] pstate: 8005 (Nzcv daif -PAN -UAO)
[   46.180465] pc : stmmac_xmit+0x7a0/0x9d0
[   46.184387] lr : dev_hard_start_xmit+0x94/0x158
[   46.188913] sp : 800010003cc0
[   46.192224] x29: 800010003cc0 x28: 000177e2a100
[   46.197533] x27: 000176ef0840 x26: 000176ef0090
[   46.202842] x25:  x24: 
[   46.208151] x23: 0003 x22: 8000119ddd30
[   46.213460] x21: 00017636f000 x20: 000176ef0cc0
[   46.218769] x19: 0003 x18: 
[   46.224078] x17:  x16: 
[   46.229386] x15: 0079 x14: 
[   46.234695] x13: 0003 x12: 0003
[   46.240003] x11: 0010 x10: 0010
[   46.245312] x9 : 00017002b140 x8 : 
[   46.250621] x7 : 00017636f000 x6 : 0010
[   46.255930] x5 : 0001 x4 : 000176ef
[   46.261238] x3 : 0003 x2 : 
[   46.266547] x1 : 000177e2a000 x0 : 
[   46.271856] Call trace:
[   46.274302]  stmmac_xmit+0x7a0/0x9d0
[   46.277874]  dev_hard_start_xmit+0x94/0x158
[   46.282056]  sch_direct_xmit+0x11c/0x338
[   46.285976]  __qdisc_run+0x118/0x5f0
[   46.289549]  net_tx_action+0x110/0x198
[   46.293297]  __do_softirq+0x120/0x23c
[   46.296958]  irq_exit+0xb8/0xd8
[   46.300098]  __handle_domain_irq+0x64/0xb8
[   46.304191]  gic_handle_irq+0x5c/0x148
[   46.307936]  el1_irq+0xb8/0x180
[   46.311076]  cpuidle_enter_state+0x84/0x360
[   46.315256]  cpuidle_enter+0x34/0x48
[   46.318829]  call_cpuidle+0x18/0x38
[   46.322314]  do_idle+0x1e0/0x280
[   46.325539]  cpu_startup_entry+0x24/0x40
[   46.329460]  rest_init+0xd4/0xe0
[   46.332687]  arch_call_rest_init+0xc/0x14
[   46.336695]  start_kernel+0x420/0x44c
[   46.340353] ---[ end trace bc1ee695123cbacd ]---

Signed-off-by: Fugang Duan 
Signed-off-by: Joakim Zhang 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 107761ef456a..53c5d77eba57 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1557,6 +1557,19 @@ static void dma_free_tx_skbufs(struct stmmac_priv *priv, u32 queue)
stmmac_free_tx_buffer(priv, queue, i);
 }
 
+/**
+ * stmmac_free_tx_skbufs - free TX skb buffers
+ * @priv: private structure
+ */
+static void stmmac_free_tx_skbufs(struct stmmac_priv *priv)
+{
+   u32 tx_queue_cnt = priv->plat->tx_queues_to_use;
+   u32 queue;
+
+   for (queue = 0; queue < tx_queue_cnt; queue++)
+   dma_free_tx_skbufs(priv, queue);
+}
+
 /**
  * free_dma_rx_desc_resources - free RX dma desc resources
  * @priv: private structure
@@ -5290,6 +5303,7 @@ int stmmac_resume(struct device *dev)
 
stmmac_reset_queues_param(priv);
 
+   stmmac_free_tx_skbufs(priv);
stmmac_clear_descriptors(priv);
 
stmmac_hw_setup(ndev, false);
-- 
2.17.1
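
Note: the failure mode described in this patch can be reproduced in a small userspace model. This is a sketch under assumptions, not driver code: struct example_tx_ring and both helpers are hypothetical, and heap allocations stand in for skbs. Resetting only the indices leaves a stale buffer in slot 0, which is the condition the WARN_ON() in the commit message checks; freeing the buffers first clears it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_RING_SIZE 4

/* Hypothetical model of a TX ring: one buffer pointer per slot. */
struct example_tx_ring {
	void *skb[EXAMPLE_RING_SIZE]; /* NULL when the slot is free */
	unsigned int cur_tx;
	unsigned int dirty_tx;
};

/* Free every buffer still attached to the ring and clear the slots. */
static void example_free_tx_bufs(struct example_tx_ring *ring)
{
	size_t i;

	assert(ring);
	for (i = 0; i < EXAMPLE_RING_SIZE; i++) {
		free(ring->skb[i]);
		ring->skb[i] = NULL;
	}
}

/* Reset the indices only; buffers that are still attached stay attached. */
static void example_reset_queue_params(struct example_tx_ring *ring)
{
	assert(ring);
	ring->cur_tx = 0;
	ring->dirty_tx = 0;
}
```

In this model the next transmit after a bare index reset would find a non-NULL pointer at cur_tx, and the old buffer would be overwritten and leaked.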

[PATCH 1/4] net: stmmac: dwmac4_lib: increase the timeout for dma reset

2020-12-02 Thread Joakim Zhang

From: Fugang Duan 

The current timeout value is not enough for the gmac5 DMA reset
on the imx8mp platform, so increase the timeout.

Signed-off-by: Fugang Duan 
Signed-off-by: Joakim Zhang 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
index 6e30d7eb4983..0b4ee2dbb691 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
@@ -22,7 +22,7 @@ int dwmac4_dma_reset(void __iomem *ioaddr)
 
return readl_poll_timeout(ioaddr + DMA_BUS_MODE, value,
 !(value & DMA_BUS_MODE_SFT_RESET),
-1, 10);
+1, 100);
 }
 
 void dwmac4_set_rx_tail_ptr(void __iomem *ioaddr, u32 tail_ptr, u32 chan)
-- 
2.17.1
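
Note: the effect of a too-short poll timeout can be shown with a simulated register. This is a sketch under assumptions: it does not use the kernel's readl_poll_timeout(), and struct example_dev, example_read_reg(), example_poll_reset() and the EXAMPLE_SFT_RESET bit are all hypothetical. Time is simulated by a counter instead of a real sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_SFT_RESET 0x1u /* hypothetical self-clearing reset bit */

/* Hypothetical device model: the reset bit clears once the simulated
 * clock reaches reset_done_at_us.
 */
struct example_dev {
	unsigned long now_us;
	unsigned long reset_done_at_us;
};

static uint32_t example_read_reg(const struct example_dev *dev)
{
	return dev->now_us >= dev->reset_done_at_us ? 0 : EXAMPLE_SFT_RESET;
}

/* Poll until the reset bit clears or timeout_us of simulated time passes.
 * Returns 0 on success and -ETIMEDOUT otherwise.
 */
static int example_poll_reset(struct example_dev *dev,
			      unsigned long sleep_us, unsigned long timeout_us)
{
	unsigned long deadline = dev->now_us + timeout_us;

	assert(dev && sleep_us > 0);
	for (;;) {
		if (!(example_read_reg(dev) & EXAMPLE_SFT_RESET))
			return 0;
		if (dev->now_us >= deadline)
			return -ETIMEDOUT;
		dev->now_us += sleep_us; /* stands in for a real sleep */
	}
}
```

A device that needs 500 us to finish its reset fails with a 100 us budget and succeeds with a 1000 us one, which is the kind of situation the patch addresses by raising the limit.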

[PATCH 4/4] net: ethernet: stmmac: delete the eee_ctrl_timer after napi disabled

2020-12-02 Thread Joakim Zhang

From: Fugang Duan 

There is a chance that the eee_ctrl_timer is re-enabled and fires in
the napi callback after the timer has been deleted in stmmac_release().
The timer function then accesses the EEE registers after the clocks
have been disabled, which causes a system hang. This issue was found
during suspend/resume and reboot stress tests.

It is safe to delete the timer, and to disable LPI mode, once napi has
been disabled.

Signed-off-by: Fugang Duan 
Signed-off-by: Joakim Zhang 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 53c5d77eba57..03c6995d276a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2938,9 +2938,6 @@ static int stmmac_release(struct net_device *dev)
struct stmmac_priv *priv = netdev_priv(dev);
u32 chan;
 
-   if (priv->eee_enabled)
-   del_timer_sync(&priv->eee_ctrl_timer);
-
if (device_may_wakeup(priv->device))
phylink_speed_down(priv->phylink, false);
/* Stop and disconnect the PHY */
@@ -2959,6 +2956,11 @@ static int stmmac_release(struct net_device *dev)
if (priv->lpi_irq > 0)
free_irq(priv->lpi_irq, dev);
 
+   if (priv->eee_enabled) {
+   priv->tx_path_in_lpi_mode = false;
+   del_timer_sync(&priv->eee_ctrl_timer);
+   }
+
/* Stop TX/RX DMA and clear the descriptors */
stmmac_stop_all_dma(priv);
 
@@ -5185,6 +5187,11 @@ int stmmac_suspend(struct device *dev)
for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
hrtimer_cancel(&priv->tx_queue[chan].txtimer);
 
+   if (priv->eee_enabled) {
+   priv->tx_path_in_lpi_mode = false;
+   del_timer_sync(&priv->eee_ctrl_timer);
+   }
+
/* Stop TX/RX DMA */
stmmac_stop_all_dma(priv);
 
-- 
2.17.1

RE: [PATCH] LF-2678 net: ethernet: stmmac: delete the eee_ctrl_timer after napi disabled

2020-12-02 Thread Joakim Zhang


Hi,

Please ignore this patch; it was sent out by mistake. Sorry.

Best Regards,
Joakim Zhang

> -----Original Message-----
> From: Joakim Zhang 
> Sent: December 2, 2020 17:00
> To: peppe.cavall...@st.com; alexandre.tor...@st.com;
> joab...@synopsys.com
> Cc: da...@davemloft.net; k...@kernel.org; netdev@vger.kernel.org;
> dl-linux-imx 
> Subject: [PATCH] LF-2678 net: ethernet: stmmac: delete the eee_ctrl_timer
> after napi disabled
> 
> From: Fugang Duan 
> 
> There is a chance that the eee_ctrl_timer is re-enabled and fires in
> the napi callback after the timer has been deleted in stmmac_release().
> The timer function then accesses the EEE registers after the clocks
> have been disabled, which causes a system hang.
> 
> It is safe to delete the timer, and to disable LPI mode, once napi has
> been disabled.
> 
> Tested-by: Joakim Zhang 
> Reviewed-by: Joakim Zhang 
> Signed-off-by: Fugang Duan 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index cc1f17b170f0..7e655fa34589 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2933,9 +2933,6 @@ static int stmmac_release(struct net_device *dev)
>   struct platform_device *pdev = to_platform_device(priv->device);
>   u32 chan;
> 
> - if (priv->eee_enabled)
> - del_timer_sync(&priv->eee_ctrl_timer);
> -
>   if (device_may_wakeup(priv->device))
>   phylink_speed_down(priv->phylink, false);
>   /* Stop and disconnect the PHY */
> @@ -2954,6 +2951,11 @@ static int stmmac_release(struct net_device *dev)
>   if (priv->lpi_irq > 0)
>   free_irq(priv->lpi_irq, dev);
> 
> + if (priv->eee_enabled) {
> + priv->tx_path_in_lpi_mode = false;
> + del_timer_sync(&priv->eee_ctrl_timer);
> + }
> +
>   /* Stop TX/RX DMA and clear the descriptors */
>   stmmac_stop_all_dma(priv);
> 
> @@ -5224,6 +5226,11 @@ int stmmac_suspend(struct device *dev)
>   for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
>   del_timer_sync(&priv->tx_queue[chan].txtimer);
> 
> + if (priv->eee_enabled) {
> + priv->tx_path_in_lpi_mode = false;
> + del_timer_sync(&priv->eee_ctrl_timer);
> + }
> +
>   /* Stop TX/RX DMA */
>   stmmac_stop_all_dma(priv);
> 
> --
> 2.17.1

[PATCH v3 net-next 1/4] net: bonding: Notify ports about their initial state

2020-12-02 Thread Tobias Waldekranz

When creating a static bond (e.g. balance-xor), all ports will always
be enabled. This is set, and the corresponding notification is sent
out, before the port is linked to the bond upper.

In the offloaded case, this ordering is hard to deal with.

The lower will first see a notification that it can not associate with
any bond. Then the bond is joined. After that point no more
notifications are sent, so all ports remain disabled.

This change simply sends an extra notification once the port has been
linked to the upper to synchronize the initial state.

Signed-off-by: Tobias Waldekranz 
---
 drivers/net/bonding/bond_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..d6e1f9cf28d5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1922,6 +1922,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
goto err_unregister;
}
 
+   bond_lower_state_changed(new_slave);
+
res = bond_sysfs_slave_add(new_slave);
if (res) {
slave_dbg(bond_dev, slave_dev, "Error %d calling bond_sysfs_slave_add\n", res);
-- 
2.17.1
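
Note: the ordering problem described in this patch can be modelled in a few lines. This is a sketch under assumptions: struct example_port and both functions are hypothetical and only mimic the sequence "notify, link, notify again"; they are not the bonding or DSA APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an offloaded lower device. */
struct example_port {
	bool linked_to_bond; /* set once the port is linked to the upper */
	bool hw_tx_enabled;  /* state the offloading driver has programmed */
};

/* Driver-side handler: a state change can only be acted on once the
 * port knows which bond it belongs to.
 */
static void example_lower_state_changed(struct example_port *p, bool tx_enabled)
{
	assert(p);
	if (!p->linked_to_bond)
		return; /* event cannot be associated with any bond */
	p->hw_tx_enabled = tx_enabled;
}

/* Enslave sequence: the first notification precedes the link step, so it
 * is dropped; resend_after_link models the extra notification.
 */
static void example_enslave(struct example_port *p, bool resend_after_link)
{
	example_lower_state_changed(p, true);
	p->linked_to_bond = true;
	if (resend_after_link)
		example_lower_state_changed(p, true);
}
```

Without the second notification the modelled port stays disabled, because no further state change arrives for a static bond.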

[PATCH v3 net-next 0/4] net: dsa: Link aggregation support

2020-12-02 Thread Tobias Waldekranz

Start off by adding an extra notification when adding a port to a bond;
this allows static LAGs to be offloaded using the bonding driver.

Then add the generic support required to offload link aggregates to
drivers built on top of the DSA subsystem.

Finally, implement offloading for the mv88e6xxx driver, i.e. Marvell's
LinkStreet family.

Supported LAG implementations:
- Bonding
- Team

Supported modes:
- Isolated. The LAG may be used as a regular interface outside of any
  bridge.
- Bridged. The LAG may be added to a bridge, in which case switching
  is offloaded between the LAG and any other switch ports. I.e. the
  LAG behaves just like a port from this perspective.

In bridged mode, the following is supported:
- STP filtering.
- VLAN filtering.
- Multicast filtering. The bridge correctly snoops IGMP and configures
  the proper groups if snooping is enabled. Static groups can also be
  configured. MLD seems to work, but has not been extensively tested.
- Unicast filtering. Automatic learning works. Static entries are
  _not_ supported. This will be added in a later series as it requires
  some more general refactoring in mv88e6xxx before I can test it.

v2 -> v3:
- Skip unnecessary RCU protection of the LAG device pointer, as
  suggested by Vladimir.
- Refcount LAGs with a plain refcount_t instead of `struct kref`, as
  suggested by Vladimir.

v1 -> v2:
- Allocate LAGs from a static pool to avoid late errors under memory
  pressure, as suggested by Andrew.

RFC -> v1:
- Properly propagate MDB operations.
- Support for bonding in addition to team.
- Fixed a locking bug in mv88e6xxx.
- Make sure ports are disabled-by-default in mv88e6xxx.
- Support for both DSA and EDSA tagging.

Tobias Waldekranz (4):
  net: bonding: Notify ports about their initial state
  net: dsa: Link aggregation support
  net: dsa: mv88e6xxx: Link aggregation support
  net: dsa: tag_dsa: Support reception of packets from LAG devices

 drivers/net/bonding/bond_main.c |   2 +
 drivers/net/dsa/mv88e6xxx/chip.c| 234 +++-
 drivers/net/dsa/mv88e6xxx/chip.h|   4 +
 drivers/net/dsa/mv88e6xxx/global2.c |   8 +-
 drivers/net/dsa/mv88e6xxx/global2.h |   5 +
 drivers/net/dsa/mv88e6xxx/port.c|  21 +++
 drivers/net/dsa/mv88e6xxx/port.h|   5 +
 include/net/dsa.h   |  97 
 net/dsa/dsa.c   |  12 +-
 net/dsa/dsa2.c  |  51 ++
 net/dsa/dsa_priv.h  |  31 
 net/dsa/port.c  | 132 
 net/dsa/slave.c |  83 +-
 net/dsa/switch.c|  49 ++
 net/dsa/tag_dsa.c   |  17 +-
 15 files changed, 735 insertions(+), 16 deletions(-)

-- 
2.17.1

[PATCH v3 net-next 3/4] net: dsa: mv88e6xxx: Link aggregation support

2020-12-02 Thread Tobias Waldekranz

Support offloading of LAGs to hardware. LAGs may be attached to a
bridge in which case VLANs, multicast groups, etc. are also offloaded
as usual.

Signed-off-by: Tobias Waldekranz 
---
 drivers/net/dsa/mv88e6xxx/chip.c| 234 +++-
 drivers/net/dsa/mv88e6xxx/chip.h|   4 +
 drivers/net/dsa/mv88e6xxx/global2.c |   8 +-
 drivers/net/dsa/mv88e6xxx/global2.h |   5 +
 drivers/net/dsa/mv88e6xxx/port.c|  21 +++
 drivers/net/dsa/mv88e6xxx/port.h|   5 +
 6 files changed, 269 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index e7f68ac0c7e3..3c4b795ac7e4 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1189,7 +1189,8 @@ static int mv88e6xxx_set_mac_eee(struct dsa_switch *ds, int port,
 }
 
 /* Mask of the local ports allowed to receive frames from a given fabric port */
-static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
+static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port,
+  struct dsa_lag **lag)
 {
struct dsa_switch *ds = chip->ds;
struct dsa_switch_tree *dst = ds->dst;
@@ -1201,6 +1202,9 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
list_for_each_entry(dp, &dst->ports, list) {
if (dp->ds->index == dev && dp->index == port) {
found = true;
+
+   if (dp->lag && lag)
+   *lag = dp->lag;
break;
}
}
@@ -1231,7 +1235,9 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 
 static int mv88e6xxx_port_vlan_map(struct mv88e6xxx_chip *chip, int port)
 {
-   u16 output_ports = mv88e6xxx_port_vlan(chip, chip->ds->index, port);
+   u16 output_ports;
+
+   output_ports = mv88e6xxx_port_vlan(chip, chip->ds->index, port, NULL);
 
/* prevent frames from going back out of the port they came in on */
output_ports &= ~BIT(port);
@@ -1389,14 +1395,21 @@ static int mv88e6xxx_mac_setup(struct mv88e6xxx_chip *chip)
 
 static int mv88e6xxx_pvt_map(struct mv88e6xxx_chip *chip, int dev, int port)
 {
+   struct dsa_lag *lag = NULL;
u16 pvlan = 0;
 
if (!mv88e6xxx_has_pvt(chip))
return 0;
 
/* Skip the local source device, which uses in-chip port VLAN */
-   if (dev != chip->ds->index)
-   pvlan = mv88e6xxx_port_vlan(chip, dev, port);
+   if (dev != chip->ds->index) {
+   pvlan = mv88e6xxx_port_vlan(chip, dev, port, &lag);
+
+   if (lag) {
+   dev = MV88E6XXX_G2_PVT_ADRR_DEV_TRUNK;
+   port = lag->id;
+   }
+   }
 
return mv88e6xxx_g2_pvt_write(chip, dev, port, pvlan);
 }
@@ -5368,6 +5381,207 @@ static int mv88e6xxx_port_egress_floods(struct 
dsa_switch *ds, int port,
return err;
 }
 
+static int mv88e6xxx_lag_sync_map(struct dsa_switch *ds, struct dsa_lag *lag)
+{
+   struct mv88e6xxx_chip *chip = ds->priv;
+   struct dsa_port *dp;
+   u16 map = 0;
+
+   /* Build the map of all ports to distribute flows destined for
+* this LAG. This can be either a local user port, or a DSA
+* port if the LAG port is on a remote chip.
+*/
+   list_for_each_entry(dp, &lag->ports, lag_list) {
+   map |= BIT(dsa_towards_port(ds, dp->ds->index, dp->index));
+   }
+
+   return mv88e6xxx_g2_trunk_mapping_write(chip, lag->id, map);
+}
+
+static const u8 mv88e6xxx_lag_mask_table[8][8] = {
+   /* Row number corresponds to the number of active members in a
+* LAG. Each column states which of the eight hash buckets are
+* mapped to the column:th port in the LAG.
+*
+* Example: In a LAG with three active ports, the second port
+* ([2][1]) would be selected for traffic mapped to buckets
+* 3,4,5 (0x38).
+*/
+   { 0xff,0,0,0,0,0,0,0 },
+   { 0x0f, 0xf0,0,0,0,0,0,0 },
+   { 0x07, 0x38, 0xc0,0,0,0,0,0 },
+   { 0x03, 0x0c, 0x30, 0xc0,0,0,0,0 },
+   { 0x03, 0x0c, 0x30, 0x40, 0x80,0,0,0 },
+   { 0x03, 0x0c, 0x10, 0x20, 0x40, 0x80,0,0 },
+   { 0x03, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,0 },
+   { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 },
+};
+
+static void mv88e6xxx_lag_set_port_mask(u16 *mask, int port,
+   int num_tx, int nth)
+{
+   u8 active = 0;
+   int i;
+
+   num_tx = num_tx <= 8 ? num_tx : 8;
+   if (nth < num_tx)
+   active = mv88e6xxx_lag_mask_table[num_tx - 1][nth];
+
+   for (i = 0; i < 8; i++) {
+   if (BIT(i) & active)
+   mask[i] |= BIT(port);
+   }
+}
+
+stat
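
The bucket distribution encoded in the table above can be checked outside the kernel. What follows is a minimal userspace sketch, assuming standard C integer types in place of u8/u16 and a local BIT() macro. The table values and the masking loop are copied from the patch; lag_build_masks() is a hypothetical helper added only for illustration, and the assert() guarding num_tx is not part of the posted code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIT(n) (1u << (n))

/* Row = number of active members - 1, column = nth member (from the patch). */
static const uint8_t lag_mask_table[8][8] = {
	{ 0xff, 0, 0, 0, 0, 0, 0, 0 },
	{ 0x0f, 0xf0, 0, 0, 0, 0, 0, 0 },
	{ 0x07, 0x38, 0xc0, 0, 0, 0, 0, 0 },
	{ 0x03, 0x0c, 0x30, 0xc0, 0, 0, 0, 0 },
	{ 0x03, 0x0c, 0x30, 0x40, 0x80, 0, 0, 0 },
	{ 0x03, 0x0c, 0x10, 0x20, 0x40, 0x80, 0, 0 },
	{ 0x03, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0 },
	{ 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 },
};

/* OR `port` into every hash bucket owned by the nth of num_tx active members. */
static void lag_set_port_mask(uint16_t *mask, int port, int num_tx, int nth)
{
	uint8_t active = 0;
	int i;

	/* Guard added for this sketch: num_tx == 0 would index row -1. */
	assert(num_tx >= 1 && nth >= 0 && port >= 0 && port < 16);

	num_tx = num_tx <= 8 ? num_tx : 8;
	if (nth < num_tx)
		active = lag_mask_table[num_tx - 1][nth];

	for (i = 0; i < 8; i++) {
		if (BIT(i) & active)
			mask[i] |= (uint16_t)BIT(port);
	}
}

/* Hypothetical helper: build all eight bucket masks for a LAG whose
 * active members are ports[0..num_tx-1].
 */
static void lag_build_masks(uint16_t mask[8], const int *ports, int num_tx)
{
	int nth;

	memset(mask, 0, 8 * sizeof(mask[0]));
	for (nth = 0; nth < num_tx; nth++)
		lag_set_port_mask(mask, ports[nth], num_tx, nth);
}
```

With three active members on (made-up) ports 1, 2 and 5, buckets 0-2 land on port 1, buckets 3-5 on port 2 and buckets 6-7 on port 5, so every bucket has exactly one egress port.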

[PATCH v3 net-next 2/4] net: dsa: Link aggregation support

2020-12-02 Thread Tobias Waldekranz

Monitor the following events and notify the driver when:

- A DSA port joins/leaves a LAG.
- A LAG, made up of DSA ports, joins/leaves a bridge.
- A DSA port in a LAG is enabled/disabled (enabled meaning
  "distributing" in 802.3ad LACP terms).

Each LAG interface to which a DSA port is attached is represented by a
`struct dsa_lag` which is globally reachable from the switch tree and
from each associated port.

When a LAG joins a bridge, the DSA subsystem will treat that as each
individual port joining the bridge. The driver may look at the port's
LAG pointer to see if it is associated with any LAG, if that is
required. This is analogous to how switchdev events are replicated out
to all lower devices when reaching e.g. a LAG.

Signed-off-by: Tobias Waldekranz 
---
 include/net/dsa.h  |  97 +
 net/dsa/dsa2.c |  51 ++
 net/dsa/dsa_priv.h |  31 +++
 net/dsa/port.c | 132 +
 net/dsa/slave.c|  83 +---
 net/dsa/switch.c   |  49 +
 6 files changed, 437 insertions(+), 6 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 4e60d2610f20..aaa350b78c55 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -7,6 +7,7 @@
 #ifndef __LINUX_NET_DSA_H
 #define __LINUX_NET_DSA_H
 
+#include 
 #include 
 #include 
 #include 
@@ -71,6 +72,7 @@ enum dsa_tag_protocol {
 
 struct packet_type;
 struct dsa_switch;
+struct dsa_lag;
 
 struct dsa_device_ops {
struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
@@ -149,6 +151,13 @@ struct dsa_switch_tree {
 
/* List of DSA links composing the routing table */
struct list_head rtable;
+
+   /* Link aggregates */
+   struct {
+   struct dsa_lag *pool;
+   unsigned long *busy;
+   unsigned int num;
+   } lags;
 };
 
 /* TC matchall action types */
@@ -180,6 +189,69 @@ struct dsa_mall_tc_entry {
};
 };
 
+struct dsa_lag {
+   struct net_device *dev;
+   int id;
+
+   struct list_head ports;
+
+   /* For multichip systems, we must ensure that each hash bucket
+* is only enabled on a single egress port throughout the
+* whole tree, lest we send duplicates. Therefore we must
+* maintain a global list of active tx ports, so that each
+* switch can figure out which buckets to enable on which
+* ports.
+*/
+   struct list_head tx_ports;
+   int num_tx;
+
+   refcount_t refcount;
+};
+
+#define dsa_lag_foreach(_id, _dst) \
+   for_each_set_bit(_id, (_dst)->lags.busy, (_dst)->lags.num)
+
+static inline bool dsa_lag_offloading(struct dsa_switch_tree *dst)
+{
+   return dst->lags.num > 0;
+}
+
+static inline bool dsa_lag_available(struct dsa_switch_tree *dst)
+{
+   return !bitmap_full(dst->lags.busy, dst->lags.num);
+}
+
+static inline struct dsa_lag *dsa_lag_by_id(struct dsa_switch_tree *dst, int 
id)
+{
+   if (!test_bit(id, dst->lags.busy))
+   return NULL;
+
+   return &dst->lags.pool[id];
+}
+
+static inline struct net_device *dsa_lag_dev_by_id(struct dsa_switch_tree *dst,
+  int id)
+{
+   struct dsa_lag *lag = dsa_lag_by_id(dst, id);
+
+   return lag ? READ_ONCE(lag->dev) : NULL;
+}
+
+static inline struct dsa_lag *dsa_lag_by_dev(struct dsa_switch_tree *dst,
+struct net_device *dev)
+{
+   struct dsa_lag *lag;
+   int id;
+
+   dsa_lag_foreach(id, dst) {
+   lag = dsa_lag_by_id(dst, id);
+
+   if (lag->dev == dev)
+   return lag;
+   }
+
+   return NULL;
+}
 
 struct dsa_port {
/* A CPU port is physically connected to a master device.
@@ -220,6 +292,9 @@ struct dsa_port {
booldevlink_port_setup;
struct phylink  *pl;
struct phylink_config   pl_config;
+   struct dsa_lag  *lag;
+   struct list_headlag_list;
+   struct list_headlag_tx_list;
 
struct list_head list;
 
@@ -335,6 +410,11 @@ struct dsa_switch {
 */
boolmtu_enforcement_ingress;
 
+   /* The maximum number of LAGs that can be configured. A value of zero
+* is used to indicate that LAG offloading is not supported.
+*/
+   unsigned intnum_lags;
+
size_t num_ports;
 };
 
@@ -624,6 +704,13 @@ struct dsa_switch_ops {
void(*crosschip_bridge_leave)(struct dsa_switch *ds, int tree_index,
  int sw_index, int port,
  struct net_device *br);
+   int (*crosschip_lag_change)(struct dsa_switch *ds, int sw_index,
+   int port, struct net_device *lag_dev,
+   struct ne

[PATCH v3 net-next 4/4] net: dsa: tag_dsa: Support reception of packets from LAG devices

2020-12-02 Thread Tobias Waldekranz

Packets ingressing on a LAG that egress on the CPU port, which are not
classified as management, will have a FORWARD tag that does not
contain the normal source device/port tuple. Instead the trunk bit
will be set, and the port field holds the LAG id.

Since the exact source port information is not available in the tag,
frames are injected directly on the LAG interface and thus never
pass through any DSA port interface on ingress.

Management frames (TO_CPU) are not affected and will pass through the
DSA port interface as usual.

Signed-off-by: Tobias Waldekranz 
---
 net/dsa/dsa.c | 12 +++-
 net/dsa/tag_dsa.c | 17 -
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index a1b1dc8a4d87..7325bf4608e9 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -219,11 +219,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct 
net_device *dev,
}
 
skb = nskb;
-   p = netdev_priv(skb->dev);
skb_push(skb, ETH_HLEN);
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, skb->dev);
 
+   if (unlikely(!dsa_slave_dev_check(skb->dev))) {
+   /* Packet is to be injected directly on an upper
+* device, e.g. a team/bond, so skip all DSA-port
+* specific actions.
+*/
+   netif_rx(skb);
+   return 0;
+   }
+
+   p = netdev_priv(skb->dev);
+
if (unlikely(cpu_dp->ds->untag_bridge_pvid)) {
nskb = dsa_untag_bridge_pvid(skb);
if (!nskb) {
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 112c7c6dd568..be7271de8d0b 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -163,6 +163,7 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, 
struct net_device *dev,
  u8 extra)
 {
int source_device, source_port;
+   bool trunk = false;
enum dsa_code code;
enum dsa_cmd cmd;
u8 *dsa_header;
@@ -174,6 +175,8 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, 
struct net_device *dev,
switch (cmd) {
case DSA_CMD_FORWARD:
skb->offload_fwd_mark = 1;
+
+   trunk = !!(dsa_header[1] & 7);
break;
 
case DSA_CMD_TO_CPU:
@@ -216,7 +219,19 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, 
struct net_device *dev,
source_device = dsa_header[0] & 0x1f;
source_port = (dsa_header[1] >> 3) & 0x1f;
 
-   skb->dev = dsa_master_find_slave(dev, source_device, source_port);
+   if (trunk) {
+   struct dsa_port *cpu_dp = dev->dsa_ptr;
+
+   /* The exact source port is not available in the tag,
+* so we inject the frame directly on the upper
+* team/bond.
+*/
+   skb->dev = dsa_lag_dev_by_id(cpu_dp->dst, source_port);
+   } else {
+   skb->dev = dsa_master_find_slave(dev, source_device,
+source_port);
+   }
+
if (!skb->dev)
return NULL;
 
-- 
2.17.1
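
The source-field decoding described in this patch can be sketched in userspace C. This is a rough illustration only: struct dsa_src and dsa_parse_src() are hypothetical names, and the trunk test simply mirrors the mask used in the diff above rather than a verified hardware definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same mask as the posted patch, not checked against a datasheet. */
#define DSA_TRUNK_MASK 0x07

/* Hypothetical container for the source fields of a FORWARD tag. */
struct dsa_src {
	bool trunk;  /* true: `id` is a LAG id, false: `id` is a source port */
	int device;  /* source device field (low 5 bits of byte 0) */
	int id;      /* bits 7:3 of byte 1 */
};

static struct dsa_src dsa_parse_src(const uint8_t *hdr)
{
	struct dsa_src src;

	assert(hdr);

	src.trunk = !!(hdr[1] & DSA_TRUNK_MASK);
	src.device = hdr[0] & 0x1f;
	src.id = (hdr[1] >> 3) & 0x1f;

	return src;
}
```

A receiver would then pick the LAG netdev when src.trunk is set and fall back to the per-port lookup otherwise, as the diff does with dsa_lag_dev_by_id() and dsa_master_find_slave().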

Re: [External] Re: [PATCH 0/7] Introduce vdpa management tool

2020-12-02 Thread Yongji Xie

On Wed, Dec 2, 2020 at 12:53 PM Parav Pandit  wrote:
>
>
>
> > From: Yongji Xie 
> > Sent: Wednesday, December 2, 2020 9:00 AM
> >
> > On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit  wrote:
> > >
> > >
> > >
> > > > From: Yongji Xie 
> > > > Sent: Tuesday, December 1, 2020 7:49 PM
> > > >
> > > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit  wrote:
> > > > >
> > > > >
> > > > >
> > > > > > From: Yongji Xie 
> > > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > > >
> > > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang 
> > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > > On 2020/11/30 3:07 PM, Yongji Xie wrote:
> > > > > > > >>> Thanks for adding me, Jason!
> > > > > > > >>>
> > > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > > > > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > > > >>> But there is one problem:
> > > > > > > >>>
> > > > > > > >>> In this tool, vdpa device config action and enable action
> > > > > > > >>> are combined into one netlink msg: VDPA_CMD_DEV_NEW. But
> > > > > > > >>> in
> > > > vduse
> > > > > > > >>> case, it needs to be splitted because a chardev should be
> > > > > > > >>> created and opened by a userspace process before we enable
> > > > > > > >>> the vdpa device (call vdpa_register_device()).
> > > > > > > >>>
> > > > > > > >>> So I'd like to know whether it's possible (or have some
> > > > > > > >>> plans) to add two new netlink msgs something like:
> > > > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > > > and
> > > > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > > > > > > >>>
> > > > > > > >> Actually, we've discussed such intermediate step in some
> > > > > > > >> early discussion. It looks to me VDUSE could be one of the 
> > > > > > > >> users of
> > this.
> > > > > > > >>
> > > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > > >> inode(fd) for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD
> > ioctl?
> > > > > > > >>
> > > > > > > > Yes, we can. Actually the current implementation in VDUSE is
> > > > > > > > like this.  But seems like this is still a intermediate step.
> > > > > > > > The fd should be binded to a name or something else which
> > > > > > > > need to be configured before.
> > > > > > >
> > > > > > >
> > > > > > > The name could be specified via the netlink. It looks to me
> > > > > > > the real issue is that until the device is connected with a
> > > > > > > userspace, it can't be used. So we also need to fail the
> > > > > > > enabling if it doesn't
> > > > opened.
> > > > > > >
> > > > > >
> > > > > > Yes, that's true. So you mean we can firstly try to fetch the fd
> > > > > > binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use
> > > > > > the name/vduse_id as a attribute to create vdpa device? It looks 
> > > > > > fine to
> > me.
> > > > >
> > > > > I probably do not well understand. I tried reading patch [1] and
> > > > > few things
> > > > do not look correct as below.
> > > > > Creating the vdpa device on the bus device and destroying the
> > > > > device from
> > > > the workqueue seems unnecessary and racy.
> > > > >
> > > > > It seems vduse driver needs
> > > > > This is something should be done as part of the vdpa dev add
> > > > > command,
> > > > instead of connecting two sides separately and ensuring race free
> > > > access to it.
> > > > >
> > > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> > > > >
> > > >
> > > > Yes, we can avoid these two ioctls with the help of the management tool.
> > > >
> > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > >
> > > > > When above command is executed it creates necessary vdpa device
> > > > > foo2
> > > > on the bus.
> > > > > When user binds foo2 device with the vduse driver, in the probe(),
> > > > > it
> > > > creates respective char device to access it from user space.
> > > >
> > > I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 
> > > or
> > netdevsim.
> > > It has its own implementation similar to fuse with its own backend of 
> > > choice.
> > > More below.
> > >
> > > > But vduse driver is not a vdpa bus driver. It works like vdpasim
> > > > driver, but offloads the data plane and control plane to a user space 
> > > > process.
> > >
> > > In that case to draw parallel lines,
> > >
> > > 1. netdevsim:
> > > (a) create resources in kernel sw
> > > (b) datapath simulates in kernel
> > >
> > > 2. ifc + mlx5 vdpa dev:
> > > (a) creates resource in hw
> > > (b) data path is in hw
> > >
> > > 3. vduse:
> > > (a) creates resources in userspace sw
> > > (b) data path is in user space.
> > > hence creates data path resources for user space.
> > > So char device is created, removed as result of vdpa device creation.
> > >
> > > For example,
> > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > >
> > > Above command

Re: [External] Re: [PATCH 0/7] Introduce vdpa management tool

2020-12-02 Thread Yongji Xie

On Wed, Dec 2, 2020 at 1:51 PM Jason Wang  wrote:
>
>
> On 2020/12/2 12:53 PM, Parav Pandit wrote:
> >
> >> From: Yongji Xie 
> >> Sent: Wednesday, December 2, 2020 9:00 AM
> >>
> >> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit  wrote:
> >>>
> >>>
>  From: Yongji Xie 
>  Sent: Tuesday, December 1, 2020 7:49 PM
> 
>  On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit  wrote:
> >
> >
> >> From: Yongji Xie 
> >> Sent: Tuesday, December 1, 2020 3:26 PM
> >>
> >> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang 
>  wrote:
> >>>
> >>> On 2020/11/30 3:07 PM, Yongji Xie wrote:
> >> Thanks for adding me, Jason!
> >>
> >> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> >> Userspace) [1]. This tool is very useful for the vduse device.
> >> So I'm considering integrating this into my v2 patchset.
> >> But there is one problem:
> >>
> >> In this tool, vdpa device config action and enable action
> >> are combined into one netlink msg: VDPA_CMD_DEV_NEW. But
> >> in
>  vduse
> >> case, it needs to be splitted because a chardev should be
> >> created and opened by a userspace process before we enable
> >> the vdpa device (call vdpa_register_device()).
> >>
> >> So I'd like to know whether it's possible (or have some
> >> plans) to add two new netlink msgs something like:
> >> VDPA_CMD_DEV_ENABLE
> >> and
> >> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> >>
> > Actually, we've discussed such intermediate step in some
> > early discussion. It looks to me VDUSE could be one of the users of
> >> this.
> > Or I wonder whether we can switch to use anonymous
> > inode(fd) for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD
> >> ioctl?
>  Yes, we can. Actually the current implementation in VDUSE is
>  like this.  But seems like this is still a intermediate step.
>  The fd should be binded to a name or something else which
>  need to be configured before.
> >>>
> >>> The name could be specified via the netlink. It looks to me
> >>> the real issue is that until the device is connected with a
> >>> userspace, it can't be used. So we also need to fail the
> >>> enabling if it doesn't
>  opened.
> >> Yes, that's true. So you mean we can firstly try to fetch the fd
> >> binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use
> >> the name/vduse_id as a attribute to create vdpa device? It looks fine 
> >> to
> >> me.
> > I probably do not well understand. I tried reading patch [1] and
> > few things
>  do not look correct as below.
> > Creating the vdpa device on the bus device and destroying the
> > device from
>  the workqueue seems unnecessary and racy.
> > It seems vduse driver needs
> > This is something should be done as part of the vdpa dev add
> > command,
>  instead of connecting two sides separately and ensuring race free
>  access to it.
> > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> >
>  Yes, we can avoid these two ioctls with the help of the management tool.
> 
> > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >
> > When above command is executed it creates necessary vdpa device
> > foo2
>  on the bus.
> > When user binds foo2 device with the vduse driver, in the probe(),
> > it
>  creates respective char device to access it from user space.
> 
> >>> I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 
> >>> or
> >> netdevsim.
> >>> It has its own implementation similar to fuse with its own backend of 
> >>> choice.
> >>> More below.
> >>>
>  But vduse driver is not a vdpa bus driver. It works like vdpasim
>  driver, but offloads the data plane and control plane to a user space 
>  process.
> >>> In that case to draw parallel lines,
> >>>
> >>> 1. netdevsim:
> >>> (a) create resources in kernel sw
> >>> (b) datapath simulates in kernel
> >>>
> >>> 2. ifc + mlx5 vdpa dev:
> >>> (a) creates resource in hw
> >>> (b) data path is in hw
> >>>
> >>> 3. vduse:
> >>> (a) creates resources in userspace sw
> >>> (b) data path is in user space.
> >>> hence creates data path resources for user space.
> >>> So char device is created, removed as result of vdpa device creation.
> >>>
> >>> For example,
> >>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >>>
> >>> Above command will create char device for user space.
> >>>
> >>> Similar command for ifc/mlx5 would have created similar channel for rest 
> >>> of
> >> the config commands in hw.
> >>> vduse channel = char device, eventfd etc.
> >>> ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> >>> channel = sw direct calls
> >>>
> >>> Does it make sense?
> >> In

Re: [PATCH net v2] net/x25: prevent a couple of overflows

2020-12-02 Thread Martin Schiller


On 2020-12-01 16:15, Dan Carpenter wrote:

The .x25_addr[] address comes from the user and is not necessarily
NUL terminated.  This leads to a couple of problems.  The first problem is
that the strlen() in x25_bind() can read beyond the end of the buffer.

The second problem is more subtle and could result in memory corruption.

The call tree is:
  x25_connect()
  --> x25_write_internal()
  --> x25_addr_aton()

The .x25_addr[] buffers are copied to the "addresses" buffer from
x25_write_internal() so it will lead to stack corruption.

Verify that the strings are NUL terminated and return -EINVAL if they
are not.

Reported-by: "kiyin(尹亮)" 
Signed-off-by: Dan Carpenter 
---
The first patch put a NUL terminator on the end of the string and this
patch returns an error instead.  I don't have a strong preference as to
which patch to go with.

 net/x25/af_x25.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 9232cdb42ad9..d41fffb2507b 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -675,7 +675,8 @@ static int x25_bind(struct socket *sock, struct
sockaddr *uaddr, int addr_len)
int len, i, rc = 0;

if (addr_len != sizeof(struct sockaddr_x25) ||
-   addr->sx25_family != AF_X25) {
+   addr->sx25_family != AF_X25 ||
+	strnlen(addr->sx25_addr.x25_addr, X25_ADDR_LEN) == X25_ADDR_LEN) 
{

rc = -EINVAL;
goto out;
}
@@ -769,7 +770,8 @@ static int x25_connect(struct socket *sock, struct
sockaddr *uaddr,

rc = -EINVAL;
if (addr_len != sizeof(struct sockaddr_x25) ||
-   addr->sx25_family != AF_X25)
+   addr->sx25_family != AF_X25 ||
+   strnlen(addr->sx25_addr.x25_addr, X25_ADDR_LEN) == X25_ADDR_LEN)
goto out;

rc = -ENETUNREACH;


Acked-by: Martin Schiller
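
The termination check added by the patch can be tried in isolation. Below is a minimal userspace sketch, assuming a 16-byte buffer as a stand-in for X25_ADDR_LEN; struct addr_buf and addr_is_terminated() are hypothetical names, not kernel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ADDR_LEN 16 /* assumption: stands in for X25_ADDR_LEN */

/* Hypothetical fixed-size address buffer filled in by an untrusted caller. */
struct addr_buf {
	char digits[ADDR_LEN];
};

/* Return true only if a NUL byte exists within the first ADDR_LEN bytes.
 * strnlen() never reads past ADDR_LEN bytes, so a result equal to
 * ADDR_LEN means no terminator was found inside the buffer.
 */
static bool addr_is_terminated(const struct addr_buf *a)
{
	assert(a);

	return strnlen(a->digits, ADDR_LEN) < ADDR_LEN;
}
```

Rejecting the unterminated buffer up front means later strlen()-style walks over the same field cannot run off the end.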

[PATCH net] cxgb3: fix error return code in t3_sge_alloc_qset()

2020-12-02 Thread Zhang Changzhong

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: b1fb1f280d09 ("cxgb3 - Fix dma mapping error path")
Reported-by: Hulk Robot 
Signed-off-by: Zhang Changzhong 
---
 drivers/net/ethernet/chelsio/cxgb3/sge.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c 
b/drivers/net/ethernet/chelsio/cxgb3/sge.c
index e18e9ce..1cc3c51 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
@@ -3175,6 +3175,7 @@ int t3_sge_alloc_qset(struct adapter *adapter, unsigned 
int id, int nports,
  GFP_KERNEL | __GFP_COMP);
if (!avail) {
CH_ALERT(adapter, "free list queue 0 initialization failed\n");
+   ret = -ENOMEM;
goto err;
}
if (avail < q->fl[0].size)
-- 
2.9.5

[PATCH net] net: pasemi: fix error return code in pasemi_mac_open()

2020-12-02 Thread Zhang Changzhong

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 72b05b9940f0 ("pasemi_mac: RX/TX ring management cleanup")
Fixes: 8d636d8bc5ff ("pasemi_mac: jumbo frame support")
Reported-by: Hulk Robot 
Signed-off-by: Zhang Changzhong 
---
 drivers/net/ethernet/pasemi/pasemi_mac.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c 
b/drivers/net/ethernet/pasemi/pasemi_mac.c
index be66601..040a15a 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1078,16 +1078,20 @@ static int pasemi_mac_open(struct net_device *dev)
 
mac->tx = pasemi_mac_setup_tx_resources(dev);
 
-   if (!mac->tx)
+   if (!mac->tx) {
+   ret = -ENOMEM;
goto out_tx_ring;
+   }
 
/* We might already have allocated rings in case mtu was changed
 * before interface was brought up.
 */
if (dev->mtu > 1500 && !mac->num_cs) {
pasemi_mac_setup_csrings(mac);
-   if (!mac->num_cs)
+   if (!mac->num_cs) {
+   ret = -ENOMEM;
goto out_tx_ring;
+   }
}
 
/* Zero out rmon counters */
-- 
2.9.5

[PATCH net] vxlan: fix error return code in __vxlan_dev_create()

2020-12-02 Thread Zhang Changzhong

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
Reported-by: Hulk Robot 
Signed-off-by: Zhang Changzhong 
---
 drivers/net/vxlan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 1a557ae..a506872 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3877,8 +3877,10 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 
if (dst->remote_ifindex) {
remote_dev = __dev_get_by_index(net, dst->remote_ifindex);
-   if (!remote_dev)
+   if (!remote_dev) {
+   err = -ENODEV;
goto errout;
+   }
 
err = netdev_upper_dev_link(remote_dev, dev, extack);
if (err)
-- 
2.9.5
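
The three fixes above share one pattern: a goto-based error path that is taken without first assigning an error code. Here is a small userspace sketch of the corrected shape, assuming a made-up struct dev and an allocator (alloc_or_fail()) that can be told to fail so the error path is reachable; none of these names come from the drivers being patched.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device with two separately allocated rings. */
struct dev {
	int *tx;
	int *rx;
};

/* Hypothetical allocator that fails on request, to exercise error paths. */
static void *alloc_or_fail(size_t n, int fail)
{
	return fail ? NULL : calloc(n, sizeof(int));
}

static int dev_open(struct dev *d, int fail_tx, int fail_rx)
{
	int ret = 0;

	assert(d);
	d->tx = NULL;
	d->rx = NULL;

	d->tx = alloc_or_fail(64, fail_tx);
	if (!d->tx) {
		ret = -ENOMEM; /* without this, the caller would see 0 */
		goto out;
	}

	d->rx = alloc_or_fail(64, fail_rx);
	if (!d->rx) {
		ret = -ENOMEM; /* same rule for every later failure point */
		goto out_free_tx;
	}

	return 0;

out_free_tx:
	free(d->tx);
	d->tx = NULL;
out:
	return ret;
}
```

Setting ret immediately before each goto keeps the error code next to the condition that produced it, which is the shape all three patches adopt.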

Re: [PATCH bpf] libbpf: sanitise map names before pinning

2020-12-02 Thread Toke Høiland-Jørgensen

Andrii Nakryiko  writes:

> On Mon, Nov 30, 2020 at 8:17 AM Toke Høiland-Jørgensen  
> wrote:
>>
>> When we added sanitising of map names before loading programs to libbpf, we
>> still allowed periods in the name. While the kernel will accept these for
>> the map names themselves, they are not allowed in file names when pinning
>
> That sounds like an unnecessary difference in kernel behavior. If the
> kernel allows maps with '.' in the name, why not allow to pin it?
> Should we fix that in the kernel?

Yeah, it is a bit odd. I always assumed the restriction in file names is
to prevent people from creating hidden (.-prefixed) files in bpffs? But
don't actually know for sure. Anyway, if that is the case we could still
allow periods in the middle of names.

I'm certainly not opposed to changing the kernel behaviour and I can
follow up with a patch for this if others agree; but we obviously still
need this for older kernels so I'll send a v2 with the helper method you
suggested below.

-Toke
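
For illustration only, replacing periods (and any other character outside [A-Za-z0-9_]) before building a pin path could look roughly like this. sanitize_pin_name() is a hypothetical name; it is not taken from libbpf or from the patch under discussion, and the eventual helper may well differ.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: rewrite `name` in place so that it only contains
 * alphanumeric characters and underscores.
 */
static void sanitize_pin_name(char *name)
{
	char *p;

	assert(name);

	for (p = name; *p; p++) {
		/* Cast avoids undefined behaviour for negative char values. */
		if (!isalnum((unsigned char)*p) && *p != '_')
			*p = '_';
	}
}
```

Applied to a name such as "prog.rodata" this yields "prog_rodata", which stays valid as a file name while the unmodified name can still be used for the map itself.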

Re: KMSAN: uninit-value in validate_beacon_head

2020-12-02 Thread syzbot

syzbot has found a reproducer for the following issue on:

HEAD commit:73d62e81 kmsan: random: prevent boot-time reports in _mix_..
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=153d460750
kernel config:  https://syzkaller.appspot.com/x/.config?x=eef728deea880383
dashboard link: https://syzkaller.appspot.com/bug?extid=72b99dcf4607e8c770f3
compiler:   clang version 11.0.0 (https://github.com/llvm/llvm-project.git 
ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=14c1cec350
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=160b6cd350

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+72b99dcf4607e8c77...@syzkaller.appspotmail.com

=
BUG: KMSAN: uninit-value in validate_beacon_head+0x51e/0x5c0 
net/wireless/nl80211.c:225
CPU: 0 PID: 8275 Comm: syz-executor237 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 validate_beacon_head+0x51e/0x5c0 net/wireless/nl80211.c:225
 validate_nla lib/nlattr.c:544 [inline]
 __nla_validate_parse+0x241a/0x4e00 lib/nlattr.c:588
 __nla_parse+0x141/0x150 lib/nlattr.c:685
 __nlmsg_parse include/net/netlink.h:733 [inline]
 nlmsg_parse_deprecated include/net/netlink.h:772 [inline]
 nl80211_prepare_wdev_dump+0x6fd/0xbb0 net/wireless/nl80211.c:891
 nl80211_dump_station+0x143/0x740 net/wireless/nl80211.c:5810
 netlink_dump+0xb92/0x1670 net/netlink/af_netlink.c:2268
 __netlink_dump_start+0xcf1/0xea0 net/netlink/af_netlink.c:2373
 genl_family_rcv_msg_dumpit net/netlink/genetlink.c:697 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:780 [inline]
 genl_rcv_msg+0xff0/0x1610 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
 genl_rcv+0x63/0x80 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
 ___sys_sendmsg net/socket.c:2407 [inline]
 __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
 __do_sys_sendmsg net/socket.c:2449 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4418a9
Code: e8 fc a9 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
fb 06 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffe906479e8 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX:  RCX: 004418a9
RDX:  RSI: 20c0 RDI: 0003
RBP: 006cc018 R08:  R09: 004002c8
R10:  R11: 0246 R12: 00402430
R13: 004024c0 R14:  R15: 

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
 kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
 kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
 slab_alloc_node mm/slub.c:2906 [inline]
 __kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512
 __kmalloc_reserve net/core/skbuff.c:142 [inline]
 __alloc_skb+0x309/0xae0 net/core/skbuff.c:210
 alloc_skb include/linux/skbuff.h:1094 [inline]
 netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
 netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
 ___sys_sendmsg net/socket.c:2407 [inline]
 __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
 __do_sys_sendmsg net/socket.c:2449 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
=

Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

2020-12-02 Thread Vladimir Oltean

On Wed, Dec 02, 2020 at 10:13:54AM +0100, Tobias Waldekranz wrote:
> Monitor the following events and notify the driver when:
> 
> - A DSA port joins/leaves a LAG.
> - A LAG, made up of DSA ports, joins/leaves a bridge.
> - A DSA port in a LAG is enabled/disabled (enabled meaning
>   "distributing" in 802.3ad LACP terms).
> 
> Each LAG interface to which a DSA port is attached is represented by a
> `struct dsa_lag` which is globally reachable from the switch tree and
> from each associated port.
> 
> When a LAG joins a bridge, the DSA subsystem will treat that as each
> individual port joining the bridge. The driver may look at the port's
> LAG pointer to see if it is associated with any LAG, if that is
> required. This is analogous to how switchdev events are replicated out
> to all lower devices when reaching e.g. a LAG.
> 
> Signed-off-by: Tobias Waldekranz 
> ---
>  include/net/dsa.h  |  97 +
>  net/dsa/dsa2.c |  51 ++
>  net/dsa/dsa_priv.h |  31 +++
>  net/dsa/port.c | 132 +
>  net/dsa/slave.c|  83 +---
>  net/dsa/switch.c   |  49 +
>  6 files changed, 437 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 4e60d2610f20..aaa350b78c55 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -7,6 +7,7 @@
>  #ifndef __LINUX_NET_DSA_H
>  #define __LINUX_NET_DSA_H
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -71,6 +72,7 @@ enum dsa_tag_protocol {
>  
>  struct packet_type;
>  struct dsa_switch;
> +struct dsa_lag;
>  
>  struct dsa_device_ops {
>   struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
> @@ -149,6 +151,13 @@ struct dsa_switch_tree {
>  
>   /* List of DSA links composing the routing table */
>   struct list_head rtable;
> +
> + /* Link aggregates */
> + struct {
> + struct dsa_lag *pool;
> + unsigned long *busy;
> + unsigned int num;

Can we get rid of the busy array and just look at the refcounts?
Can we also get rid of the "num" variable?

> + } lags;
>  };
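
To make the first question concrete, here is one way a pool could track free slots by reference count alone, without a separate busy bitmap. It is a userspace sketch under stated assumptions: struct lag, struct lag_pool and the lag_*() functions are hypothetical, a plain counter stands in for refcount_t, there is no locking, and the slot count is still kept, so the second question is not addressed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LAG slot; refcount == 0 marks the slot as free. */
struct lag {
	const void *dev;       /* stands in for struct net_device * */
	int id;
	unsigned int refcount;
};

struct lag_pool {
	struct lag *slots;
	unsigned int num;
};

/* Find the in-use slot bound to `dev`, or NULL. */
static struct lag *lag_by_dev(struct lag_pool *pool, const void *dev)
{
	unsigned int i;

	for (i = 0; i < pool->num; i++) {
		if (pool->slots[i].refcount && pool->slots[i].dev == dev)
			return &pool->slots[i];
	}

	return NULL;
}

/* Take a reference on the slot for `dev`, claiming a free slot if needed. */
static struct lag *lag_get(struct lag_pool *pool, const void *dev)
{
	struct lag *lag = lag_by_dev(pool, dev);
	unsigned int i;

	if (lag) {
		lag->refcount++;
		return lag;
	}

	for (i = 0; i < pool->num; i++) {
		lag = &pool->slots[i];
		if (!lag->refcount) {
			lag->dev = dev;
			lag->id = (int)i;
			lag->refcount = 1;
			return lag;
		}
	}

	return NULL; /* pool exhausted */
}

/* Drop a reference; the slot becomes free when the count reaches zero. */
static void lag_put(struct lag *lag)
{
	assert(lag && lag->refcount > 0);

	if (--lag->refcount == 0)
		lag->dev = NULL;
}
```

The trade-off is that iterating over active LAGs means testing each slot's count instead of using for_each_set_bit() over a bitmap.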

Re: [PATCH net] inet_ecn: Fix endianness of checksum update when setting ECT(1)

2020-12-02 Thread Toke Høiland-Jørgensen

Jakub Kicinski  writes:

> On Mon, 30 Nov 2020 19:37:05 +0100 Toke Høiland-Jørgensen wrote:
>> When adding support for propagating ECT(1) marking in IP headers it seems I
>> suffered from endianness-confusion in the checksum update calculation: In
>> fact the ECN field is in the *lower* bits of the first 16-bit word of the
>> IP header when calculating in network byte order. This means that the
>> addition performed to update the checksum field was wrong; let's fix that.
>> 
>> Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as 
>> recommended by RFC6040")
>> Reported-by: Jonathan Morton 
>> Tested-by: Pete Heist 
>> Signed-off-by: Toke Høiland-Jørgensen 
>
> Applied and queued, thanks!
>
>> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
>> index e1eaf1780288..563457fec557 100644
>> --- a/include/net/inet_ecn.h
>> +++ b/include/net/inet_ecn.h
>> @@ -107,7 +107,7 @@ static inline int IP_ECN_set_ect1(struct iphdr *iph)
>>  if ((iph->tos & INET_ECN_MASK) != INET_ECN_ECT_0)
>>  return 0;
>>  
>> -check += (__force u16)htons(0x100);
>> +check += (__force u16)htons(0x1);
>>  
>>  iph->check = (__force __sum16)(check + (check>=0xFFFF));
>>  iph->tos ^= INET_ECN_MASK;
>
> This seems to be open coding csum16_add() - is there a reason and if
> not perhaps worth following up in net-next?

Hmm, good point. I think I originally just copied this from
IP_ECN_set_ce(), which comes all the way back from the initial
Linux-2.6.12-rc2 commit in git. So I suppose it may just predate the
csum helpers? I'll wait for this patch to get propagated to net-next,
then follow up with a fix there :)

-Toke
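
The byte-order argument in this thread can be checked outside the kernel. The following is a minimal userspace sketch, not the kernel code: ip_header_csum(), store_csum(), set_ect1() and ect1_update_is_consistent() are hypothetical helpers written for this illustration, and the 20-byte header is made-up test data. It applies the same incremental update with a caller-chosen increment and compares the result with a full recomputation of the header checksum.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Full ones' complement checksum over the header, as a big-endian value.
 * The checksum bytes (offsets 10 and 11) are treated as zero. */
static uint16_t ip_header_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        if (i == 10)
            continue;
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Store a big-endian checksum value into the header. */
static void store_csum(uint8_t *hdr, uint16_t csum)
{
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}

/* Userspace imitation of the update in IP_ECN_set_ect1(): flip ECT(0) to
 * ECT(1) and patch the checksum by adding 'inc' (already in network byte
 * order) with an end-around carry. Returns 1 if the header was changed. */
static int set_ect1(uint8_t *hdr, uint16_t inc)
{
    uint16_t raw;
    uint32_t check;

    if ((hdr[1] & 0x3) != 0x2)  /* only ECT(0) may become ECT(1) */
        return 0;

    memcpy(&raw, hdr + 10, sizeof(raw));  /* field as stored on the wire */
    check = raw;
    check += inc;
    raw = (uint16_t)(check + (check >= 0xFFFF));
    memcpy(hdr + 10, &raw, sizeof(raw));

    hdr[1] ^= 0x3;  /* 0b10 -> 0b01 lowers the first 16-bit word by one */
    return 1;
}

/* Returns 1 if the incrementally updated checksum matches a full recompute. */
static int ect1_update_is_consistent(uint16_t inc)
{
    uint8_t hdr[20] = {
        0x45, 0x02, 0x00, 0x54, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x01, 0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01,
        0xc0, 0xa8, 0x00, 0x02,
    };
    uint16_t after;

    store_csum(hdr, ip_header_csum(hdr, sizeof(hdr)));
    if (!set_ect1(hdr, inc))
        return 0;
    after = (uint16_t)(((uint16_t)hdr[10] << 8) | hdr[11]);
    return after == ip_header_csum(hdr, sizeof(hdr));
}
```

In this sketch, ect1_update_is_consistent(htons(0x1)) reports a match, while the pre-fix increment htons(0x100) does not, because the ECN bits sit in the low byte of the first 16-bit word when it is read in network byte order.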

Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-12-02 Thread Jiri Pirko

Wed, Dec 02, 2020 at 01:32:46AM CET, edwin.p...@broadcom.com wrote:
>On Tue, Dec 1, 2020 at 3:22 AM Jiri Pirko  wrote:
>
>> >Consider a physical QSFP connector comprising 4 lanes. Today, if the
>> >speed is forced, we would achieve 100G speeds using all 4 lanes with
>> >NRZ encoding. If we configure the port for PAM4 encoding at the same
>> >speed, then we only require 2 of the available 4 lanes. The remaining 2
>> >lanes are wasted. If we only require 2 of the 4 lanes, why not split the
>> >port and request the same speed of one of the now split out ports? Now,
>> >this same speed is only achievable using PAM4 encoding (it is implied)
>> >and we have a spare, potentially usable, assuming an appropriate break-
>> >out cable, port instead of the 2 unused lanes.
>>
>> I don't see how this dynamic split port could work in real life to be
>> honest. The split is something admin needs to configure and can rely
>> that netdevice exists all the time and not comes and goes under
>> different circumstances. Multiple obvious reasons why.
>
>I'm not suggesting the port split be dynamic at all. I'm suggesting that if
>the admin wants or needs to force PAM4 on a port that would otherwise
>be able to achieve the given speed using more lanes with NRZ, then the
>admin should split the port, so that it has fewer lanes, in order to make
>that intent clear (or otherwise configure the port to have fewer lanes
>attached, if you really don't want to or can't create the additional split
>port).

Okay, I see your point now. The thing is, a port split/unsplit causes
a great disturbance: all of the port's netdevs disappear and
reappear. Now consider the following example:

You have a router with routes configured on many netdevs.
On one of those netdevs (which has routes on it), you need, for
whatever reason, to force the lane number.
With your suggestion, the netdev disappears along with its routes, and
routing is then broken. I don't see how this could be acceptable.

We are talking here about netdev configuration, and we have a tool for
that: ethtool. What you suggest takes it to a different level, and
I don't believe it is correct/doable.


>
>Using this approach, the existing ethtool forced speed interface is
>sufficient to configure all possible lane encodings, because the
>encoding that the driver must select is now implicit (note, we don't
>need to care about media type here). That is, the driver can always
>select the encoding that maximizes utilization of the lanes available
>to the port (as defined by the admin).
>
>> >So concretely, I'm suggesting that if we want to force PAM4 at the lower
>> >speeds, split the port and then we don't need an ethtool interface change
>> >at all to achieve the same goal. Having a spare (potentially usable) port
>> >is better than spare (unusable) lanes.
>>
>> The admin has to decide, define.
>
>I'm not sure I understand. The admin would indeed decide. This paragraph
>merely served to motivate why a rational admin should prefer to have a
>spare port rather than unused lanes he can't use, because they would be
>attached to a port using an encoding that doesn't need them. If he wasn't
>planning on using the additional port, he loses nothing. Otherwise, he gains
>something he would not otherwise have had (it's win-win). From the
>perspective of the original port, two unused lanes is no different than two
>lanes allocated to another logical port.
>
>Regards,
>Edwin Peer

[PATCH v6 bpf-next 0/2] libbpf: add support for privileged/unprivileged control separation

2020-12-02 Thread mariusz . dudek

From: Mariusz Dudek 

This patch series adds support for separation of eBPF program
load and xsk socket creation. In for example a Kubernetes
environment you can have an AF_XDP CNI or daemonset that is 
responsible for launching pods that execute an application 
using AF_XDP sockets. It is desirable that the pod runs with
as low privileges as possible, CAP_NET_RAW in this case, 
and that all operations that require privileges are contained
in the CNI or daemonset.

In this case, you have to be able to separate eBPF program load from
xsk socket creation.

Currently, this will not work with the xsk_socket__create APIs
because you need CAP_NET_ADMIN privileges to load the eBPF
program and CAP_SYS_ADMIN privileges to create and update xsk_bpf_maps.
To be exact, xsk_set_bpf_maps does not need those privileges, but
it takes the prog_fd and xsks_map_fd, and those are known only to
the process that loaded the eBPF program. The APIs
bpf_prog_get_fd_by_id, which looks up the prog fd from a prog_id, and
bpf_map_get_fd_by_id, which looks up xsks_map_fd from a map_id, both
require CAP_SYS_ADMIN.

With this patch, the pod can be run with CAP_NET_RAW capability
only. In case your umem is larger or equal process limit for
MEMLOCK you need either increase the limit or CAP_IPC_LOCK capability. 
Without this patch in case of insufficient rights ENOPERM is
returned by xsk_socket__create.

To resolve this privileges issue two new APIs are introduced:
- xsk_setup_xdp_prog - loads the built in XDP program. It can
also return xsks_map_fd which is needed by unprivileged
process to update xsks_map with AF_XDP socket "fd"
- xsk_socket__update_xskmap - inserts an AF_XDP socket into an
xskmap for a particular xsk_socket

Usage example:
int xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd);

int xsk_socket__update_xskmap(struct xsk_socket *xsk, int xsks_map_fd);

Inserts AF_XDP socket "fd" into the xskmap.

The first patch introduces the new APIs. The second patch provides
a new sample application that acts as the control process, and modifies
the existing xdpsock application to work with fewer privileges.

This patch set is based on bpf-next commit ba0581749fec
(net, xdp, xsk: fix __sk_mark_napi_id_once napi_id error)

Since v5
- fixed sample/bpf/xdpsock_user.c to resolve merge conflicts

Since v4
- sample/bpf/Makefile issues fixed

Since v3:
- force_set_map flag removed
- leaking of xsk struct fixed
- unified function error returning policy implemented

Since v2:
- new APIs moved into LIBBPF_0.3.0 section
- struct bpf_prog_cfg_opts removed 
- loading own eBPF program via xsk_setup_xdp_prog functionality removed

Since v1:
- struct bpf_prog_cfg improved for backward/forward compatibility
- API xsk_update_xskmap renamed to xsk_socket__update_xskmap
- commit message formatting fixed

Mariusz Dudek (2):
  libbpf: separate XDP program load with xsk socket creation
  samples/bpf: sample application for eBPF load and socket creation split

 samples/bpf/Makefile|   4 +-
 samples/bpf/xdpsock.h   |   8 ++
 samples/bpf/xdpsock_ctrl_proc.c | 187 
 samples/bpf/xdpsock_user.c  | 146 +++--
 tools/lib/bpf/libbpf.map|   2 +
 tools/lib/bpf/xsk.c |  92 ++--
 tools/lib/bpf/xsk.h |   5 +
 7 files changed, 425 insertions(+), 19 deletions(-)
 create mode 100644 samples/bpf/xdpsock_ctrl_proc.c

-- 
2.20.1

[PATCH v6 bpf-next 2/2] samples/bpf: sample application for eBPF load and socket creation split

2020-12-02 Thread mariusz . dudek

From: Mariusz Dudek 

Introduce a sample program to demonstrate the control and data
plane split. For the control plane part a new program called
xdpsock_ctrl_proc is introduced. For the data plane part, some code
was added to xdpsock_user.c to act as the data plane entity.

Application xdpsock_ctrl_proc works as control entity with sudo
privileges (CAP_SYS_ADMIN and CAP_NET_ADMIN are sufficient) and the
extended xdpsock as data plane entity with CAP_NET_RAW capability
only.

Usage example:

sudo ./samples/bpf/xdpsock_ctrl_proc -i 

sudo ./samples/bpf/xdpsock -i  -q 
-n  -N -l -R

Signed-off-by: Mariusz Dudek 
---
 samples/bpf/Makefile|   4 +-
 samples/bpf/xdpsock.h   |   8 ++
 samples/bpf/xdpsock_ctrl_proc.c | 187 
 samples/bpf/xdpsock_user.c  | 146 +++--
 4 files changed, 335 insertions(+), 10 deletions(-)
 create mode 100644 samples/bpf/xdpsock_ctrl_proc.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 05db041f8b18..26fc96ca619e 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -48,6 +48,7 @@ tprogs-y += syscall_tp
 tprogs-y += cpustat
 tprogs-y += xdp_adjust_tail
 tprogs-y += xdpsock
+tprogs-y += xdpsock_ctrl_proc
 tprogs-y += xsk_fwd
 tprogs-y += xdp_fwd
 tprogs-y += task_fd_query
@@ -105,6 +106,7 @@ syscall_tp-objs := syscall_tp_user.o
 cpustat-objs := cpustat_user.o
 xdp_adjust_tail-objs := xdp_adjust_tail_user.o
 xdpsock-objs := xdpsock_user.o
+xdpsock_ctrl_proc-objs := xdpsock_ctrl_proc.o
 xsk_fwd-objs := xsk_fwd.o
 xdp_fwd-objs := xdp_fwd_user.o
 task_fd_query-objs := task_fd_query_user.o $(TRACE_HELPERS)
@@ -202,7 +204,7 @@ TPROGLDLIBS_tracex4 += -lrt
 TPROGLDLIBS_trace_output   += -lrt
 TPROGLDLIBS_map_perf_test  += -lrt
 TPROGLDLIBS_test_overhead  += -lrt
-TPROGLDLIBS_xdpsock+= -pthread
+TPROGLDLIBS_xdpsock+= -pthread -lcap
 TPROGLDLIBS_xsk_fwd+= -pthread
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
diff --git a/samples/bpf/xdpsock.h b/samples/bpf/xdpsock.h
index b7eca15c78cc..fd70cce60712 100644
--- a/samples/bpf/xdpsock.h
+++ b/samples/bpf/xdpsock.h
@@ -8,4 +8,12 @@
 
 #define MAX_SOCKS 4
 
+#define SOCKET_NAME "sock_cal_bpf_fd"
+#define MAX_NUM_OF_CLIENTS 10
+
+#define CLOSE_CONN  1
+
+typedef __u64 u64;
+typedef __u32 u32;
+
 #endif /* XDPSOCK_H */
diff --git a/samples/bpf/xdpsock_ctrl_proc.c b/samples/bpf/xdpsock_ctrl_proc.c
new file mode 100644
index 000000000000..384e62e3c6d6
--- /dev/null
+++ b/samples/bpf/xdpsock_ctrl_proc.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2017 - 2018 Intel Corporation. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include "xdpsock.h"
+
+static const char *opt_if = "";
+
+static struct option long_options[] = {
+   {"interface", required_argument, 0, 'i'},
+   {0, 0, 0, 0}
+};
+
+static void usage(const char *prog)
+{
+   const char *str =
+   "  Usage: %s [OPTIONS]\n"
+   "  Options:\n"
+   "  -i, --interface=nRun on interface n\n"
+   "\n";
+   fprintf(stderr, "%s\n", str);
+
+   exit(0);
+}
+
+static void parse_command_line(int argc, char **argv)
+{
+   int option_index, c;
+
+   opterr = 0;
+
+   for (;;) {
+   c = getopt_long(argc, argv, "i:",
+   long_options, &option_index);
+   if (c == -1)
+   break;
+
+   switch (c) {
+   case 'i':
+   opt_if = optarg;
+   break;
+   default:
+   usage(basename(argv[0]));
+   }
+   }
+}
+
+static int send_xsks_map_fd(int sock, int fd)
+{
+   char cmsgbuf[CMSG_SPACE(sizeof(int))];
+   struct msghdr msg;
+   struct iovec iov;
+   int value = 0;
+
+   if (fd == -1) {
+   fprintf(stderr, "Incorrect fd = %d\n", fd);
+   return -1;
+   }
+   iov.iov_base = &value;
+   iov.iov_len = sizeof(int);
+
+   msg.msg_name = NULL;
+   msg.msg_namelen = 0;
+   msg.msg_iov = &iov;
+   msg.msg_iovlen = 1;
+   msg.msg_flags = 0;
+   msg.msg_control = cmsgbuf;
+   msg.msg_controllen = CMSG_LEN(sizeof(int));
+
+   struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+
+   cmsg->cmsg_level = SOL_SOCKET;
+   cmsg->cmsg_type = SCM_RIGHTS;
+   cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+
+   *(int *)CMSG_DATA(cmsg) = fd;
+   int ret = sendmsg(sock, &msg, 0);
+
+   if (ret == -1) {
+   fprintf(stderr, "Sendmsg failed with %s", strerror(errno));
+   return -errno;
+   }
+
+   return ret;
+}
+
+int
+main(int argc, char **argv)
+{
+   struct sockaddr_un server;
+   int listening = 1;
+   int rval, msgsock;
+   int

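The control process in this sample hands xsks_map_fd to the data plane process as SCM_RIGHTS ancillary data on a Unix domain socket (send_xsks_map_fd() in the diff above). The receiving side is not visible in this excerpt, so the following is only a hedged sketch of what a receiver could look like, using standard socket calls. The names fd_cmsg_buf, send_fd(), recv_fd() and fd_passing_roundtrip_works() are hypothetical and are not taken from the patch; the socketpair and pipe exist only to make the sketch self-checking.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer with the alignment that struct cmsghdr requires. */
union fd_cmsg_buf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one file descriptor as SCM_RIGHTS ancillary data. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    union fd_cmsg_buf u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    int value = 0;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    iov.iov_base = &value;  /* some ordinary payload must accompany the fd */
    iov.iov_len = sizeof(value);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive one file descriptor. Returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
    union fd_cmsg_buf u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    int value, fd;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &value;
    iov.iov_len = sizeof(value);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}

/* Self-check: pass the read end of a pipe across a socketpair and read
 * through the received descriptor. Returns 1 on success, 0 otherwise.
 * Descriptors are not cleaned up on the failure paths of this sketch. */
static int fd_passing_roundtrip_works(void)
{
    int sv[2], pfd[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) || pipe(pfd))
        return 0;
    if (send_fd(sv[0], pfd[0]))
        return 0;
    got = recv_fd(sv[1]);
    if (got < 0)
        return 0;
    if (write(pfd[1], "x", 1) != 1 || read(got, &c, 1) != 1)
        return 0;

    close(got);
    close(pfd[0]);
    close(pfd[1]);
    close(sv[0]);
    close(sv[1]);
    return c == 'x';
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets an unprivileged process use a map fd it could not have obtained itself.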
[PATCH v6 bpf-next 1/2] libbpf: separate XDP program load with xsk socket creation

2020-12-02 Thread mariusz . dudek

From: Mariusz Dudek 

Add support for separation of eBPF program load and xsk socket
creation.

This is needed for use cases where you want to provide as few
privileges as possible to the data plane application that will
handle xsk socket creation and incoming traffic.

With this patch the data entity container can be run with only
the CAP_NET_RAW capability to fulfill its purpose of creating the xsk
socket and handling packets. If your umem is larger than or
equal to the process limit for MEMLOCK, you need to either increase the
limit or add the CAP_IPC_LOCK capability.

To resolve the privilege issue, two APIs are introduced:

- xsk_setup_xdp_prog - loads the built in XDP program. It can
also return xsks_map_fd which is needed by unprivileged process
to update xsks_map with AF_XDP socket "fd"

- xsk_socket__update_xskmap - inserts an AF_XDP socket into an xskmap
for a particular xsk_socket

Signed-off-by: Mariusz Dudek 
---
 tools/lib/bpf/libbpf.map |  2 +
 tools/lib/bpf/xsk.c  | 92 
 tools/lib/bpf/xsk.h  |  5 +++
 3 files changed, 90 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 29ff4807b909..d939d5ac092e 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -345,4 +345,6 @@ LIBBPF_0.3.0 {
btf__parse_split;
btf__new_empty_split;
btf__new_split;
+   xsk_setup_xdp_prog;
+   xsk_socket__update_xskmap;
 } LIBBPF_0.2.0;
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 9bc537d0b92d..4b051ec7cfbb 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -566,8 +566,35 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
   &xsk->fd, 0);
 }
 
-static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
+static int xsk_create_xsk_struct(int ifindex, struct xsk_socket *xsk)
 {
+   char ifname[IFNAMSIZ];
+   struct xsk_ctx *ctx;
+   char *interface;
+
+   ctx = calloc(1, sizeof(*ctx));
+   if (!ctx)
+   return -ENOMEM;
+
+   interface = if_indextoname(ifindex, &ifname[0]);
+   if (!interface) {
+   free(ctx);
+   return -errno;
+   }
+
+   ctx->ifindex = ifindex;
+   strncpy(ctx->ifname, ifname, IFNAMSIZ - 1);
+   ctx->ifname[IFNAMSIZ - 1] = 0;
+
+   xsk->ctx = ctx;
+
+   return 0;
+}
+
+static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp,
+   int *xsks_map_fd)
+{
+   struct xsk_socket *xsk = _xdp;
struct xsk_ctx *ctx = xsk->ctx;
__u32 prog_id = 0;
int err;
@@ -584,8 +611,7 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 
err = xsk_load_xdp_prog(xsk);
if (err) {
-   xsk_delete_bpf_maps(xsk);
-   return err;
+   goto err_load_xdp_prog;
}
} else {
ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
@@ -598,15 +624,29 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
}
}
 
-   if (xsk->rx)
+   if (xsk->rx) {
err = xsk_set_bpf_maps(xsk);
-   if (err) {
-   xsk_delete_bpf_maps(xsk);
-   close(ctx->prog_fd);
-   return err;
+   if (err) {
+   if (!prog_id) {
+   goto err_set_bpf_maps;
+   } else {
+   close(ctx->prog_fd);
+   return err;
+   }
+   }
}
+   if (xsks_map_fd)
+   *xsks_map_fd = ctx->xsks_map_fd;
 
return 0;
+
+err_set_bpf_maps:
+   close(ctx->prog_fd);
+   bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
+err_load_xdp_prog:
+   xsk_delete_bpf_maps(xsk);
+
+   return err;
 }
 
 static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex,
@@ -689,6 +729,40 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
return ctx;
 }
 
+static void xsk_destroy_xsk_struct(struct xsk_socket *xsk)
+{
+   free(xsk->ctx);
+   free(xsk);
+}
+
+int xsk_socket__update_xskmap(struct xsk_socket *xsk, int fd)
+{
+   xsk->ctx->xsks_map_fd = fd;
+   return xsk_set_bpf_maps(xsk);
+}
+
+int xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd)
+{
+   struct xsk_socket *xsk;
+   int res;
+
+   xsk = calloc(1, sizeof(*xsk));
+   if (!xsk)
+   return -ENOMEM;
+
+   res = xsk_create_xsk_struct(ifindex, xsk);
+   if (res) {
+   free(xsk);
+   return -EINVAL;
+   }
+
+   res = __xsk_setup_xdp_prog(xsk, xsks_map_fd);
+
+   xsk_destroy_xsk_struct(xsk);
+
+   return res;
+}
+
 int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
  const char *ifname,
  __u32 queue_id, struct xsk_umem

Re: [PATCH v2 net-next 0/2] add ppp_generic ioctl(s) to bridge channels

2020-12-02 Thread James Chapman

On 01/12/2020 11:52, Tom Parkin wrote:
> Following on from my previous RFC[1], this series adds two ioctl calls
> to the ppp code to implement "channel bridging".
>
> When two ppp channels are bridged, frames presented to ppp_input() on
> one channel are passed to the other channel's ->start_xmit function for
> transmission.
>
> The primary use-case for this functionality is in an L2TP Access
> Concentrator where PPP frames are typically presented in a PPPoE session
> (e.g. from a home broadband user) and are forwarded to the ISP network in
> a PPPoL2TP session.
>
> The two new ioctls, PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN form a
> symmetric pair.
>
> Userspace code testing and illustrating use of the ioctl calls is
> available in the go-l2tp[2] and l2tp-ktest[3] repositories.
>
> [1]. Previous RFC series:
>
> https://lore.kernel.org/netdev/20201106181647.16358-1-tpar...@katalix.com/
>
> [2]. go-l2tp: a Go library for building L2TP applications on Linux
> systems. Support for the PPPIOCBRIDGECHAN ioctl is on a branch:
>
> https://github.com/katalix/go-l2tp/tree/tp_002_pppoe_2
>
> [3]. l2tp-ktest: a test suite for the Linux Kernel L2TP subsystem.
> Support for the PPPIOCBRIDGECHAN ioctl is on a branch:
>
> https://github.com/katalix/l2tp-ktest/tree/tp_ac_pppoe_tests_2
>
> Changelog:
>
> v2:
> * Add missing __rcu annotation to struct channel 'bridge' field in
>   order to squash a sparse warning from a C=1 build
> * Integrate review comments from gna...@redhat.com
> * Have ppp_unbridge_channels return -EINVAL if the channel isn't
>   part of a bridge: this better aligns with the return code from
>   ppp_disconnect_channel.
> * Improve docs update by including information on ioctl arguments
>   and error return codes.
>
> Tom Parkin (2):
>   ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls
>   docs: update ppp_generic.rst to document new ioctls
>
>  Documentation/networking/ppp_generic.rst |   9 ++
>  drivers/net/ppp/ppp_generic.c| 143 ++-
>  include/uapi/linux/ppp-ioctl.h   |   2 +
>  3 files changed, 152 insertions(+), 2 deletions(-)
>
Reviewed-by: James Chapman

Re: [PATCH v7 3/3] net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver

2020-12-02 Thread Lukasz Stelmach

It was <2020-11-25 Wed 13:26>, when Jakub Kicinski wrote:
> On Tue, 24 Nov 2020 13:03:30 +0100 Łukasz Stelmach wrote:
>> +static int
>> +ax88796c_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +struct ax88796c_device *ax_local = to_ax88796c_device(ndev);
>> +
>> +skb_queue_tail(&ax_local->tx_wait_q, skb);
>> +if (skb_queue_len(&ax_local->tx_wait_q) > TX_QUEUE_HIGH_WATER) {
>> +netif_err(ax_local, tx_queued, ndev,
>> +  "Too many TX packets in queue %d\n",
>> +  skb_queue_len(&ax_local->tx_wait_q));
>
> This will probably happen under heavy traffic. No need to print errors,
> it's normal to back pressure.
>

Removed.

>> +netif_stop_queue(ndev);
>> +}
>> +
>> +set_bit(EVENT_TX, &ax_local->flags);
>> +schedule_work(&ax_local->ax_work);
>> +
>> +return NETDEV_TX_OK;
>> +}
>> +
>> +static void
>> +ax88796c_skb_return(struct ax88796c_device *ax_local, struct sk_buff *skb,
>> +struct rx_header *rxhdr)
>> +{
>> +struct net_device *ndev = ax_local->ndev;
>> +int status;
>> +
>> +do {
>> +if (!(ndev->features & NETIF_F_RXCSUM))
>> +break;
>> +
>> +/* checksum error bit is set */
>> +if ((rxhdr->flags & RX_HDR3_L3_ERR) ||
>> +(rxhdr->flags & RX_HDR3_L4_ERR))
>> +break;
>> +
>> +/* Other types may be indicated by more than one bit. */
>> +if ((rxhdr->flags & RX_HDR3_L4_TYPE_TCP) ||
>> +(rxhdr->flags & RX_HDR3_L4_TYPE_UDP))
>> +skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +} while (0);
>> +
>> +ax_local->stats.rx_packets++;
>> +ax_local->stats.rx_bytes += skb->len;
>> +skb->dev = ndev;
>> +
>> +skb->truesize = skb->len + sizeof(struct sk_buff);
>> +skb->protocol = eth_type_trans(skb, ax_local->ndev);
>> +
>> +netif_info(ax_local, rx_status, ndev, "< rx, len %zu, type 0x%x\n",
>> +   skb->len + sizeof(struct ethhdr), skb->protocol);
>> +
>> +status = netif_rx(skb);
>
> If I'm reading things right this is in process context, so netif_rx_ni()
>

Is it? The stack looks as follows

ax88796c_skb_return()
ax88796c_rx_fixup()
ax88796c_receive()
ax88796c_process_isr()
ax88796c_work()

and ax88796c_work() is scheduled on system_wq.

>> +if (status != NET_RX_SUCCESS)
>> +netif_info(ax_local, rx_err, ndev,
>> +   "netif_rx status %d\n", status);
>
> Again, it's inadvisable to put per packet prints without any rate
> limiting in the data path.

Even if limited by the msglvl flag, which is off by default?

-- 
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics



Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

2020-12-02 Thread Tobias Waldekranz

On Wed, Dec 02, 2020 at 12:07, Vladimir Oltean  wrote:
> On Wed, Dec 02, 2020 at 10:13:54AM +0100, Tobias Waldekranz wrote:
>> +
>> +/* Link aggregates */
>> +struct {
>> +struct dsa_lag *pool;
>> +unsigned long *busy;
>> +unsigned int num;
>
> Can we get rid of the busy array and just look at the refcounts?

We can, but it is typically 4/8B that makes iterating over used LAGs,
finding the first free LAG etc. much easier.

> Can we also get rid of the "num" variable?

It can be computed but it requires traversing the entire dst port list
every time, as you can see in dsa_tree_setup_lags. We need to hoist the
lowest supported num up from the ds level to the dst.
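
The trade-off described above can be illustrated in plain C. This is a simplified userspace sketch, not the DSA code: lag_pool, lag_alloc() and lag_free() are hypothetical names, the pool is limited to one unsigned long worth of IDs, and kernel code would use its own bitmap helpers rather than the GCC/Clang builtin assumed here. With a busy bitmap, finding the first free LAG is a single bit scan instead of a walk over every entry's refcount.

```c
#include <limits.h>

#define LAG_POOL_MAX (sizeof(unsigned long) * CHAR_BIT)

struct lag_pool {
    unsigned long busy;  /* bit n set: LAG n is in use */
    unsigned int num;    /* number of usable LAG IDs, at most LAG_POOL_MAX */
};

/* Claim the lowest free LAG ID, or return -1 if the pool is exhausted. */
static int lag_alloc(struct lag_pool *pool)
{
    unsigned long free_mask = ~pool->busy;
    int id;

    /* Mask off IDs beyond the supported number. */
    if (pool->num < LAG_POOL_MAX)
        free_mask &= (1UL << pool->num) - 1;
    if (!free_mask)
        return -1;

    id = __builtin_ctzl(free_mask);  /* index of the lowest set bit */
    pool->busy |= 1UL << id;
    return id;
}

/* Release a previously claimed LAG ID. */
static void lag_free(struct lag_pool *pool, int id)
{
    pool->busy &= ~(1UL << id);
}
```

Freed IDs are handed out again lowest-first, so a pool of three LAGs that releases ID 1 will return 1 on the next allocation.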

RE: [External] Re: [PATCH 0/7] Introduce vdpa management tool

2020-12-02 Thread Parav Pandit



> From: Yongji Xie 
> Sent: Wednesday, December 2, 2020 2:52 PM
> 
> On Wed, Dec 2, 2020 at 12:53 PM Parav Pandit  wrote:
> >
> >
> >
> > > From: Yongji Xie 
> > > Sent: Wednesday, December 2, 2020 9:00 AM
> > >
> > > On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit  wrote:
> > > >
> > > >
> > > >
> > > > > From: Yongji Xie 
> > > > > Sent: Tuesday, December 1, 2020 7:49 PM
> > > > >
> > > > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit 
> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > From: Yongji Xie 
> > > > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > > > >
> > > > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang
> > > > > > > 
> > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > > On 2020/11/30 3:07 PM, Yongji Xie wrote:
> > > > > > > > >>> Thanks for adding me, Jason!
> > > > > > > > >>>
> > > > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA
> > > > > > > > >>> Device in
> > > > > > > > >>> Userspace) [1]. This tool is very useful for the vduse 
> > > > > > > > >>> device.
> > > > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > > > >>> But there is one problem：
> > > > > > > > >>>
> > > > > > > > >>> In this tool, vdpa device config action and enable
> > > > > > > > >>> action are combined into one netlink msg:
> > > > > > > > >>> VDPA_CMD_DEV_NEW. But in
> > > > > vduse
> > > > > > > > >>> case, it needs to be splitted because a chardev should
> > > > > > > > >>> be created and opened by a userspace process before we
> > > > > > > > >>> enable the vdpa device (call vdpa_register_device()).
> > > > > > > > >>>
> > > > > > > > >>> So I'd like to know whether it's possible (or have
> > > > > > > > >>> some
> > > > > > > > >>> plans) to add two new netlink msgs something like:
> > > > > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > > > > and
> > > > > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more
> flexible.
> > > > > > > > >>>
> > > > > > > > >> Actually, we've discussed such intermediate step in
> > > > > > > > >> some early discussion. It looks to me VDUSE could be
> > > > > > > > >> one of the users of
> > > this.
> > > > > > > > >>
> > > > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > > > >> inode(fd) for VDUSE then fetching it via an
> > > > > > > > >> VDUSE_GET_DEVICE_FD
> > > ioctl?
> > > > > > > > >>
> > > > > > > > > Yes, we can. Actually the current implementation in
> > > > > > > > > VDUSE is like this.  But seems like this is still a 
> > > > > > > > > intermediate
> step.
> > > > > > > > > The fd should be binded to a name or something else
> > > > > > > > > which need to be configured before.
> > > > > > > >
> > > > > > > >
> > > > > > > > The name could be specified via the netlink. It looks to
> > > > > > > > me the real issue is that until the device is connected
> > > > > > > > with a userspace, it can't be used. So we also need to
> > > > > > > > fail the enabling if it doesn't
> > > > > opened.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, that's true. So you mean we can firstly try to fetch
> > > > > > > the fd binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD,
> > > > > > > then use the name/vduse_id as a attribute to create vdpa
> > > > > > > device? It looks fine to
> > > me.
> > > > > >
> > > > > > I probably do not well understand. I tried reading patch [1]
> > > > > > and few things
> > > > > do not look correct as below.
> > > > > > Creating the vdpa device on the bus device and destroying the
> > > > > > device from
> > > > > the workqueue seems unnecessary and racy.
> > > > > >
> > > > > > It seems vduse driver needs
> > > > > > This is something should be done as part of the vdpa dev add
> > > > > > command,
> > > > > instead of connecting two sides separately and ensuring race
> > > > > free access to it.
> > > > > >
> > > > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be
> avoided.
> > > > > >
> > > > >
> > > > > Yes, we can avoid these two ioctls with the help of the management
> tool.
> > > > >
> > > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > > >
> > > > > > When above command is executed it creates necessary vdpa
> > > > > > device
> > > > > > foo2
> > > > > on the bus.
> > > > > > When user binds foo2 device with the vduse driver, in the
> > > > > > probe(), it
> > > > > creates respective char device to access it from user space.
> > > > >
> > > > I see. So vduse cannot work with any existing vdpa devices like
> > > > ifc, mlx5 or
> > > netdevsim.
> > > > It has its own implementation similar to fuse with its own backend of
> choice.
> > > > More below.
> > > >
> > > > > But vduse driver is not a vdpa bus driver. It works like vdpasim
> > > > > driver, but offloads the data plane and control plane to a user space
> process.
> > > >
> > > > In that case to draw parallel lines,
> > > >
> > > > 1. netdevsim:
> > > > (a) create resources in kernel sw
> > > > (b) datapath simulates in kernel
> > > >
> > > > 2. ifc + ml

[PATCH bpf v2] libbpf: sanitise map names before pinning

2020-12-02 Thread Toke Høiland-Jørgensen

When we added sanitising of map names before loading programs to libbpf, we
still allowed periods in the name. While the kernel will accept these for
the map names themselves, they are not allowed in file names when pinning
maps. This means that bpf_object__pin_maps() will fail if called on an
object that contains internal maps (such as the .rodata section).

Fix this by replacing periods with underscores when constructing map pin
paths. This only affects the paths generated by libbpf when
bpf_object__pin_maps() is called with a path argument. Any pin paths set
by bpf_map__set_pin_path() are unaffected, and it will still be up to the
caller to avoid invalid characters in those.

Fixes: 113e6b7e15e2 ("libbpf: Sanitise internal map names so they are not rejected by the kernel")
Signed-off-by: Toke Høiland-Jørgensen 
---
v2:
  - Move string munging to helper function

 tools/lib/bpf/libbpf.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8d05132e1945..08ff7783fb93 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7651,6 +7651,20 @@ bool bpf_map__is_pinned(const struct bpf_map *map)
return map->pinned;
 }
 
+static char *sanitize_pin_path(char *str)
+{
+   char *s = str;
+
+   /* bpffs disallows periods in path names */
+   while (*s) {
+   if (*s == '.')
+   *s = '_';
+   s++;
+   }
+
+   return str;
+}
+
 int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
 {
struct bpf_map *map;
@@ -7680,7 +7694,7 @@ int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
err = -ENAMETOOLONG;
goto err_unpin_maps;
}
-   pin_path = buf;
+   pin_path = sanitize_pin_path(buf);
} else if (!map->pin_path) {
continue;
}
@@ -7724,7 +7738,7 @@ int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
return -EINVAL;
else if (len >= PATH_MAX)
return -ENAMETOOLONG;
-   pin_path = buf;
+   pin_path = sanitize_pin_path(buf);
} else if (!map->pin_path) {
continue;
}
-- 
2.29.2
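
As a quick illustration of the effect of the helper in this patch, the following standalone sketch builds a pin path in the way the commit message describes and rewrites the periods. The body of sanitize_pin_path() mirrors the diff above; build_pin_path(), the buffer handling and the example directory and map names are assumptions made for this illustration, not libbpf code.

```c
#include <stddef.h>
#include <stdio.h>

/* Same logic as the helper in the patch: replace '.' with '_' in place. */
static char *sanitize_pin_path(char *str)
{
    char *s = str;

    while (*s) {
        if (*s == '.')
            *s = '_';
        s++;
    }
    return str;
}

/* Hypothetical helper: join directory and map name, then sanitise.
 * Returns buf, or NULL if the result would not fit in len bytes. */
static char *build_pin_path(char *buf, size_t len, const char *dir,
                            const char *map_name)
{
    int n = snprintf(buf, len, "%s/%s", dir, map_name);

    if (n < 0 || (size_t)n >= len)
        return NULL;
    return sanitize_pin_path(buf);
}
```

With a directory of /sys/fs/bpf/demo and a map named prog.rodata, the resulting path ends in prog_rodata. Note that in this sketch the whole buffer is passed to the helper, so a period in the directory part would be rewritten as well.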

Re: [PATCH net-next v3] macvlan: Support for high multicast packet rate

2020-12-02 Thread Thomas Karlsson

On 2020-12-01 20:11, Jakub Kicinski wrote:
> On Mon, 30 Nov 2020 15:00:43 +0100 Thomas Karlsson wrote:
>> Background:
>> Broadcast and multicast packages are enqueued for later processing.
>> This queue was previously hardcoded to 1000.
>>
>> This proved insufficient for handling very high packet rates.
>> This resulted in packet drops for multicast.
>> While at the same time unicast worked fine.
>>
>> The change:
>> This patch make the queue length adjustable to accommodate
>> for environments with very high multicast packet rate.
>> But still keeps the default value of 1000 unless specified.
>>
>> The queue length is specified as a request per macvlan
>> using the IFLA_MACVLAN_BC_QUEUE_LEN parameter.
>>
>> The actual used queue length will then be the maximum of
>> any macvlan connected to the same port. The actual used
>> queue length for the port can be retrieved (read only)
>> by the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification.
>>
>> This will be followed up by a patch to iproute2
>> in order to adjust the parameter from userspace.
>>
>> Signed-off-by: Thomas Karlsson 
> 
> Looks good! Minor nits below:

:)

> 
>> @@ -1218,6 +1220,7 @@ static int macvlan_port_create(struct net_device *dev)
>>  for (i = 0; i < MACVLAN_HASH_SIZE; i++)
>>  INIT_HLIST_HEAD(&port->vlan_source_hash[i]);
>>  
>> +port->bc_queue_len_used = MACVLAN_DEFAULT_BC_QUEUE_LEN;
> 
> Should this be inited to 0? Otherwise if the first link asks for lower
> queue len than the default it will not get set, right?

Indeed, it looks like you are right; see also below.

 
>>  skb_queue_head_init(&port->bc_queue);
>>  INIT_WORK(&port->bc_work, macvlan_process_broadcast);
>>  
>> @@ -1486,6 +1489,12 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
>>  goto destroy_macvlan_port;
>>  }
>>  
>> +vlan->bc_queue_len_requested = MACVLAN_DEFAULT_BC_QUEUE_LEN;
>> +if (data && data[IFLA_MACVLAN_BC_QUEUE_LEN])
>> +vlan->bc_queue_len_requested = nla_get_u32(data[IFLA_MACVLAN_BC_QUEUE_LEN]);
>> +if (vlan->bc_queue_len_requested > port->bc_queue_len_used)
>> +port->bc_queue_len_used = vlan->bc_queue_len_requested;
> 
> Or perhaps we should just call update_port_bc_queue_len() here?

That would also have prevented the above bug... So yes, I think that is better
to keep the logic only in one place. I'll change to that.
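
The point about keeping the logic in one place can be sketched in userspace C. This is a simplified model, not the macvlan code: the structures are reduced to the fields under discussion, a fixed array stands in for the kernel list, and vlan_model, port_model, port_update_bc_queue_len() and port_add_vlan() are hypothetical names. Recomputing the maximum from scratch lets the first link request a queue shorter than the default, which seeding the port value with the default and only ever raising it could not.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VLANS 8

struct vlan_model {
    uint32_t bc_queue_len_requested;
};

struct port_model {
    struct vlan_model vlans[MAX_VLANS];
    size_t num_vlans;
    uint32_t bc_queue_len_used;
};

/* Recompute the port queue length as the maximum over all attached vlans. */
static void port_update_bc_queue_len(struct port_model *port)
{
    uint32_t max_requested = 0;
    size_t i;

    for (i = 0; i < port->num_vlans; i++) {
        if (port->vlans[i].bc_queue_len_requested > max_requested)
            max_requested = port->vlans[i].bc_queue_len_requested;
    }
    port->bc_queue_len_used = max_requested;
}

/* Attach a vlan with the given request and refresh the port value.
 * Returns 0, or -1 if the model has no room left. */
static int port_add_vlan(struct port_model *port, uint32_t requested)
{
    if (port->num_vlans == MAX_VLANS)
        return -1;
    port->vlans[port->num_vlans++].bc_queue_len_requested = requested;
    port_update_bc_queue_len(port);
    return 0;
}
```

Starting from a port whose used length is 1000, adding a single vlan that requests 500 brings the used length down to 500 in this model; a later vlan requesting 2000 raises it to 2000.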

 
>>  err = register_netdevice(dev);
>>  if (err < 0)
>>  goto destroy_macvlan_port;
> 
>> @@ -1658,6 +1684,8 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
>>  [IFLA_MACVLAN_MACADDR] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
>>  [IFLA_MACVLAN_MACADDR_DATA] = { .type = NLA_NESTED },
>>  [IFLA_MACVLAN_MACADDR_COUNT] = { .type = NLA_U32 },
>> +[IFLA_MACVLAN_BC_QUEUE_LEN] = { .type = NLA_U32 },
>> +[IFLA_MACVLAN_BC_QUEUE_LEN_USED] = { .type = NLA_U32 },
> 
> This is an input policy, so you can set type to NLA_REJECT and you
> won't have to check if it's set on input.
> 

Great!

>>  };
>>  
>>  int macvlan_link_register(struct rtnl_link_ops *ops)
>> @@ -1688,6 +1716,18 @@ static struct rtnl_link_ops macvlan_link_ops = {
>>  .priv_size  = sizeof(struct macvlan_dev),
>>  };
>>  
>> +static void update_port_bc_queue_len(struct macvlan_port *port)
>> +{
>> +struct macvlan_dev *vlan;
>> +u32 max_bc_queue_len_requested = 0;
> 
> Please reorder so that the vars are longest line to shortest.
> 
got it

>> +list_for_each_entry_rcu(vlan, &port->vlans, list) {
> 
> I don't think you need the _rcu() flavor here, this is always called
> from the configuration paths holding RTNL lock, right?
> 

To be honest, what to use (or not use) when traversing the list was what
caused me the most doubt and trouble with this patch :)

I sort of assumed that there must be some outer synchronisation that prevents
two or more concurrent calls to new/delete/change link, but I wasn't sure how
and where that synchronisation took place. Now that I have googled the RTNL
lock I understand that part much better.

The main reason I went with _rcu was that the existing code uses
list_del_rcu and list_add_tail_rcu when modifying the list, as well as _rcu
when accessing/traversing it (in some places). So I figured that if they
needed the _rcu variants, I would need them too.

But on closer inspection I think it is only needed there because the list is
also accessed from, for example, macvlan_handle_frame (obviously not
protected by the RTNL lock), which uses the _rcu version under rcu_read_lock
as protection. So the list must of course also be updated with _rcu in all
places, even if all the updates are done under the RTNL lock.

This was a long ramble :)
But thanks, I think I understand the synchronisation mechanism in the kernel
a bit better now!

As I'm only calling my function from the netlink configuration functions
under the RTNL lock,
It should be safe to drop the _r

Re: [PATCH net] cxgb3: fix error return code in t3_sge_alloc_qset()

2020-12-02 Thread Raju Rangoju

On Wednesday, December 12/02/20, 2020 at 17:56:05 +0800, Zhang Changzhong wrote:
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: b1fb1f280d09 ("cxgb3 - Fix dma mapping error path")
> Reported-by: Hulk Robot 
> Signed-off-by: Zhang Changzhong 
> ---

Acked-by: Raju Rangoju 

>  drivers/net/ethernet/chelsio/cxgb3/sge.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c b/drivers/net/ethernet/chelsio/cxgb3/sge.c
> index e18e9ce..1cc3c51 100644
> --- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
> +++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
> @@ -3175,6 +3175,7 @@ int t3_sge_alloc_qset(struct adapter *adapter, unsigned int id, int nports,
> GFP_KERNEL | __GFP_COMP);
>   if (!avail) {
>   CH_ALERT(adapter, "free list queue 0 initialization failed\n");
> + ret = -ENOMEM;
>   goto err;
>   }
>   if (avail < q->fl[0].size)
> -- 
> 2.9.5
>
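For readers unfamiliar with this class of bug, a minimal user-space sketch of the pattern the patch fixes: a shared error label returns `ret`, but one failure branch jumps there without setting it, so the caller sees success. The function names and the `refill` stand-in are hypothetical and only model the shape of the problem, not the cxgb3 code itself.

```c
#include <errno.h>

/* Hypothetical stand-in for a free-list refill: returns how many buffers
 * were allocated (0 means total failure). */
static int refill(int want, int fail)
{
	return fail ? 0 : want;
}

/* Buggy shape: 'ret' still holds 0 from the earlier successful steps,
 * so the failure is reported as success. */
static int alloc_qset_buggy(int fail)
{
	int ret = 0;
	int avail = refill(16, fail);

	if (!avail)
		goto err;
	return 0;
err:
	return ret;
}

/* Fixed shape: set a negative error code before jumping to the label. */
static int alloc_qset_fixed(int fail)
{
	int ret = 0;
	int avail = refill(16, fail);

	if (!avail) {
		ret = -ENOMEM;
		goto err;
	}
	return 0;
err:
	return ret;
}
```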

[PATCH v5 net-next 0/4] nfc: s3fwrn5: Support a UART interface

2020-12-02 Thread Bongsu Jeon

From: Bongsu Jeon 

S3FWRN82 is a Samsung NFC chip that supports UART communication.
Before adding the UART driver module, I refactored the s3fwrn5_i2c module
so that the common blocks can be reused.

1/4 is the dt bindings for the RN82 UART interface.
2/4..3/4 are refactoring the s3fwrn5_i2c module.
4/4 is the UART driver module implementation.

ChangeLog:
 v5:
   1/4
- remove the 'items' of the compatible property.
- change the GPIO flags.
 v4:
   1/4
- change 'oneOf' to 'items'.
- fix the indentation.
   2/4
- add the ACK tag.
   4/4
- remove the of_match_ptr macro.
 v3:
   3/4
- move the phy_common object to s3fwrn.ko to avoid duplication.
- include the header files to include everything which is used inside.
- wrap the lines.
   4/4
- remove the kfree(phy) because of duplicated free.
- use the phy_common blocks.
- wrap lines properly.
 v2:
   1/4
- change the compatible name.
- change the const to enum for compatible.
- change the node name to nfc.
   3/4
- remove the common function's definition in common header file.
- make the common phy_common.c file to define the common function.
- wrap the lines.
- change the Header guard.
- remove the unused common function.

Bongsu Jeon (4):
  dt-bindings: net: nfc: s3fwrn5: Support a UART interface
  nfc: s3fwrn5: reduce the EN_WAIT_TIME
  nfc: s3fwrn5: extract the common phy blocks
  nfc: s3fwrn5: Support a UART interface

 .../bindings/net/nfc/samsung,s3fwrn5.yaml  |  31 +++-
 drivers/nfc/s3fwrn5/Kconfig|  12 ++
 drivers/nfc/s3fwrn5/Makefile   |   4 +-
 drivers/nfc/s3fwrn5/i2c.c  | 117 
 drivers/nfc/s3fwrn5/phy_common.c   |  75 
 drivers/nfc/s3fwrn5/phy_common.h   |  37 
 drivers/nfc/s3fwrn5/uart.c | 196 +
 7 files changed, 390 insertions(+), 82 deletions(-)
 create mode 100644 drivers/nfc/s3fwrn5/phy_common.c
 create mode 100644 drivers/nfc/s3fwrn5/phy_common.h
 create mode 100644 drivers/nfc/s3fwrn5/uart.c

-- 
1.9.1

[PATCH v5 net-next 1/4] dt-bindings: net: nfc: s3fwrn5: Support a UART interface

2020-12-02 Thread Bongsu Jeon

From: Bongsu Jeon 

Starting with the S3FWRN82 NFC chip, the UART interface can be used.
The S3FWRN82 supports both the I2C and the UART interface.

Signed-off-by: Bongsu Jeon 
---
 .../bindings/net/nfc/samsung,s3fwrn5.yaml  | 31 +++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
index cb0b8a5..ca3904b 100644
--- a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
+++ b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
@@ -12,7 +12,9 @@ maintainers:
 
 properties:
   compatible:
-const: samsung,s3fwrn5-i2c
+enum:
+  - samsung,s3fwrn5-i2c
+  - samsung,s3fwrn82
 
   en-gpios:
 maxItems: 1
@@ -47,10 +49,19 @@ additionalProperties: false
 required:
   - compatible
   - en-gpios
-  - interrupts
-  - reg
   - wake-gpios
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+const: samsung,s3fwrn5-i2c
+then:
+  required:
+- interrupts
+- reg
+
 examples:
   - |
 #include 
@@ -71,3 +82,17 @@ examples:
 wake-gpios = <&gpj0 2 GPIO_ACTIVE_HIGH>;
 };
 };
+  # UART example on Raspberry Pi
+  - |
+uart0 {
+status = "okay";
+
+nfc {
+compatible = "samsung,s3fwrn82";
+
+en-gpios = <&gpio 20 GPIO_ACTIVE_HIGH>;
+wake-gpios = <&gpio 16 GPIO_ACTIVE_HIGH>;
+
+status = "okay";
+};
+};
-- 
1.9.1

[PATCH v5 net-next 2/4] nfc: s3fwrn5: reduce the EN_WAIT_TIME

2020-12-02 Thread Bongsu Jeon

From: Bongsu Jeon 

A delay of 20 ms is enough to enable and
wake up Samsung's NFC chip.

Acked-by: Krzysztof Kozlowski 
Signed-off-by: Bongsu Jeon 
---
 drivers/nfc/s3fwrn5/i2c.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
index ae26594..9a64eea 100644
--- a/drivers/nfc/s3fwrn5/i2c.c
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -19,7 +19,7 @@
 
 #define S3FWRN5_I2C_DRIVER_NAME "s3fwrn5_i2c"
 
-#define S3FWRN5_EN_WAIT_TIME 150
+#define S3FWRN5_EN_WAIT_TIME 20
 
 struct s3fwrn5_i2c_phy {
struct i2c_client *i2c_dev;
@@ -40,7 +40,7 @@ static void s3fwrn5_i2c_set_wake(void *phy_id, bool wake)
 
mutex_lock(&phy->mutex);
gpio_set_value(phy->gpio_fw_wake, wake);
-   msleep(S3FWRN5_EN_WAIT_TIME/2);
+   msleep(S3FWRN5_EN_WAIT_TIME);
mutex_unlock(&phy->mutex);
 }
 
@@ -63,7 +63,7 @@ static void s3fwrn5_i2c_set_mode(void *phy_id, enum s3fwrn5_mode mode)
if (mode != S3FWRN5_MODE_COLD) {
msleep(S3FWRN5_EN_WAIT_TIME);
gpio_set_value(phy->gpio_en, 0);
-   msleep(S3FWRN5_EN_WAIT_TIME/2);
+   msleep(S3FWRN5_EN_WAIT_TIME);
}
 
phy->irq_skip = true;
-- 
1.9.1

[PATCH v5 net-next 3/4] nfc: s3fwrn5: extract the common phy blocks

2020-12-02 Thread Bongsu Jeon

From: Bongsu Jeon 

Extract the common phy blocks so that they can be reused.
The UART module will use the common blocks.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Bongsu Jeon 
---
 drivers/nfc/s3fwrn5/Makefile |   2 +-
 drivers/nfc/s3fwrn5/i2c.c| 117 +--
 drivers/nfc/s3fwrn5/phy_common.c |  63 +
 drivers/nfc/s3fwrn5/phy_common.h |  36 
 4 files changed, 139 insertions(+), 79 deletions(-)
 create mode 100644 drivers/nfc/s3fwrn5/phy_common.c
 create mode 100644 drivers/nfc/s3fwrn5/phy_common.h

diff --git a/drivers/nfc/s3fwrn5/Makefile b/drivers/nfc/s3fwrn5/Makefile
index d0ffa35..6b6f52d 100644
--- a/drivers/nfc/s3fwrn5/Makefile
+++ b/drivers/nfc/s3fwrn5/Makefile
@@ -3,7 +3,7 @@
 # Makefile for Samsung S3FWRN5 NFC driver
 #
 
-s3fwrn5-objs = core.o firmware.o nci.o
+s3fwrn5-objs = core.o firmware.o nci.o phy_common.o
 s3fwrn5_i2c-objs = i2c.o
 
 obj-$(CONFIG_NFC_S3FWRN5) += s3fwrn5.o
diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
index 9a64eea..e1bdde1 100644
--- a/drivers/nfc/s3fwrn5/i2c.c
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -15,75 +15,30 @@
 
 #include 
 
-#include "s3fwrn5.h"
+#include "phy_common.h"
 
 #define S3FWRN5_I2C_DRIVER_NAME "s3fwrn5_i2c"
 
-#define S3FWRN5_EN_WAIT_TIME 20
-
 struct s3fwrn5_i2c_phy {
+   struct phy_common common;
struct i2c_client *i2c_dev;
-   struct nci_dev *ndev;
-
-   int gpio_en;
-   int gpio_fw_wake;
-
-   struct mutex mutex;
 
-   enum s3fwrn5_mode mode;
unsigned int irq_skip:1;
 };
 
-static void s3fwrn5_i2c_set_wake(void *phy_id, bool wake)
-{
-   struct s3fwrn5_i2c_phy *phy = phy_id;
-
-   mutex_lock(&phy->mutex);
-   gpio_set_value(phy->gpio_fw_wake, wake);
-   msleep(S3FWRN5_EN_WAIT_TIME);
-   mutex_unlock(&phy->mutex);
-}
-
 static void s3fwrn5_i2c_set_mode(void *phy_id, enum s3fwrn5_mode mode)
 {
struct s3fwrn5_i2c_phy *phy = phy_id;
 
-   mutex_lock(&phy->mutex);
+   mutex_lock(&phy->common.mutex);
 
-   if (phy->mode == mode)
+   if (s3fwrn5_phy_power_ctrl(&phy->common, mode) == false)
goto out;
 
-   phy->mode = mode;
-
-   gpio_set_value(phy->gpio_en, 1);
-   gpio_set_value(phy->gpio_fw_wake, 0);
-   if (mode == S3FWRN5_MODE_FW)
-   gpio_set_value(phy->gpio_fw_wake, 1);
-
-   if (mode != S3FWRN5_MODE_COLD) {
-   msleep(S3FWRN5_EN_WAIT_TIME);
-   gpio_set_value(phy->gpio_en, 0);
-   msleep(S3FWRN5_EN_WAIT_TIME);
-   }
-
phy->irq_skip = true;
 
 out:
-   mutex_unlock(&phy->mutex);
-}
-
-static enum s3fwrn5_mode s3fwrn5_i2c_get_mode(void *phy_id)
-{
-   struct s3fwrn5_i2c_phy *phy = phy_id;
-   enum s3fwrn5_mode mode;
-
-   mutex_lock(&phy->mutex);
-
-   mode = phy->mode;
-
-   mutex_unlock(&phy->mutex);
-
-   return mode;
+   mutex_unlock(&phy->common.mutex);
 }
 
 static int s3fwrn5_i2c_write(void *phy_id, struct sk_buff *skb)
@@ -91,7 +46,7 @@ static int s3fwrn5_i2c_write(void *phy_id, struct sk_buff *skb)
struct s3fwrn5_i2c_phy *phy = phy_id;
int ret;
 
-   mutex_lock(&phy->mutex);
+   mutex_lock(&phy->common.mutex);
 
phy->irq_skip = false;
 
@@ -102,7 +57,7 @@ static int s3fwrn5_i2c_write(void *phy_id, struct sk_buff *skb)
ret  = i2c_master_send(phy->i2c_dev, skb->data, skb->len);
}
 
-   mutex_unlock(&phy->mutex);
+   mutex_unlock(&phy->common.mutex);
 
if (ret < 0)
return ret;
@@ -114,9 +69,9 @@ static int s3fwrn5_i2c_write(void *phy_id, struct sk_buff *skb)
 }
 
 static const struct s3fwrn5_phy_ops i2c_phy_ops = {
-   .set_wake = s3fwrn5_i2c_set_wake,
+   .set_wake = s3fwrn5_phy_set_wake,
.set_mode = s3fwrn5_i2c_set_mode,
-   .get_mode = s3fwrn5_i2c_get_mode,
+   .get_mode = s3fwrn5_phy_get_mode,
.write = s3fwrn5_i2c_write,
 };
 
@@ -128,7 +83,7 @@ static int s3fwrn5_i2c_read(struct s3fwrn5_i2c_phy *phy)
char hdr[4];
int ret;
 
-   hdr_size = (phy->mode == S3FWRN5_MODE_NCI) ?
+   hdr_size = (phy->common.mode == S3FWRN5_MODE_NCI) ?
NCI_CTRL_HDR_SIZE : S3FWRN5_FW_HDR_SIZE;
ret = i2c_master_recv(phy->i2c_dev, hdr, hdr_size);
if (ret < 0)
@@ -137,7 +92,7 @@ static int s3fwrn5_i2c_read(struct s3fwrn5_i2c_phy *phy)
if (ret < hdr_size)
return -EBADMSG;
 
-   data_len = (phy->mode == S3FWRN5_MODE_NCI) ?
+   data_len = (phy->common.mode == S3FWRN5_MODE_NCI) ?
((struct nci_ctrl_hdr *)hdr)->plen :
((struct s3fwrn5_fw_header *)hdr)->len;
 
@@ -157,24 +112,24 @@ static int s3fwrn5_i2c_read(struct s3fwrn5_i2c_phy *phy)
}
 
 out:
-   return s3fwrn5_recv_frame(phy->ndev, skb, phy->mode);
+   return s3fwrn5_recv_frame(phy->common.ndev, skb, phy->common.mode);
 }
 
 static irqreturn_t s3fwrn5_i2c_irq_thread_fn(int irq,

[PATCH v5 net-next 4/4] nfc: s3fwrn5: Support a UART interface

2020-12-02 Thread Bongsu Jeon

From: Bongsu Jeon 

Starting with the S3FWRN82 NFC chip, the UART interface can be used.
The S3FWRN82 uses the NCI protocol and supports both the I2C and the UART interface.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Bongsu Jeon 
---
 drivers/nfc/s3fwrn5/Kconfig  |  12 +++
 drivers/nfc/s3fwrn5/Makefile |   2 +
 drivers/nfc/s3fwrn5/phy_common.c |  12 +++
 drivers/nfc/s3fwrn5/phy_common.h |   1 +
 drivers/nfc/s3fwrn5/uart.c   | 196 +++
 5 files changed, 223 insertions(+)
 create mode 100644 drivers/nfc/s3fwrn5/uart.c

diff --git a/drivers/nfc/s3fwrn5/Kconfig b/drivers/nfc/s3fwrn5/Kconfig
index 3f8b6da..8a6b1a7 100644
--- a/drivers/nfc/s3fwrn5/Kconfig
+++ b/drivers/nfc/s3fwrn5/Kconfig
@@ -20,3 +20,15 @@ config NFC_S3FWRN5_I2C
  To compile this driver as a module, choose m here. The module will
  be called s3fwrn5_i2c.ko.
  Say N if unsure.
+
+config NFC_S3FWRN82_UART
+tristate "Samsung S3FWRN82 UART support"
+depends on NFC_NCI && SERIAL_DEV_BUS
+select NFC_S3FWRN5
+help
+  This module adds support for a UART interface to the S3FWRN82 chip.
+  Select this if your platform is using the UART bus.
+
+  To compile this driver as a module, choose m here. The module will
+  be called s3fwrn82_uart.ko.
+  Say N if unsure.
diff --git a/drivers/nfc/s3fwrn5/Makefile b/drivers/nfc/s3fwrn5/Makefile
index 6b6f52d..7da827a 100644
--- a/drivers/nfc/s3fwrn5/Makefile
+++ b/drivers/nfc/s3fwrn5/Makefile
@@ -5,6 +5,8 @@
 
 s3fwrn5-objs = core.o firmware.o nci.o phy_common.o
 s3fwrn5_i2c-objs = i2c.o
+s3fwrn82_uart-objs = uart.o
 
 obj-$(CONFIG_NFC_S3FWRN5) += s3fwrn5.o
 obj-$(CONFIG_NFC_S3FWRN5_I2C) += s3fwrn5_i2c.o
+obj-$(CONFIG_NFC_S3FWRN82_UART) += s3fwrn82_uart.o
diff --git a/drivers/nfc/s3fwrn5/phy_common.c b/drivers/nfc/s3fwrn5/phy_common.c
index 5cad1f4..497b02b 100644
--- a/drivers/nfc/s3fwrn5/phy_common.c
+++ b/drivers/nfc/s3fwrn5/phy_common.c
@@ -47,6 +47,18 @@ bool s3fwrn5_phy_power_ctrl(struct phy_common *phy, enum s3fwrn5_mode mode)
 }
 EXPORT_SYMBOL(s3fwrn5_phy_power_ctrl);
 
+void s3fwrn5_phy_set_mode(void *phy_id, enum s3fwrn5_mode mode)
+{
+   struct phy_common *phy = phy_id;
+
+   mutex_lock(&phy->mutex);
+
+   s3fwrn5_phy_power_ctrl(phy, mode);
+
+   mutex_unlock(&phy->mutex);
+}
+EXPORT_SYMBOL(s3fwrn5_phy_set_mode);
+
 enum s3fwrn5_mode s3fwrn5_phy_get_mode(void *phy_id)
 {
struct phy_common *phy = phy_id;
diff --git a/drivers/nfc/s3fwrn5/phy_common.h b/drivers/nfc/s3fwrn5/phy_common.h
index b98531d..99749c9 100644
--- a/drivers/nfc/s3fwrn5/phy_common.h
+++ b/drivers/nfc/s3fwrn5/phy_common.h
@@ -31,6 +31,7 @@ struct phy_common {
 
 void s3fwrn5_phy_set_wake(void *phy_id, bool wake);
 bool s3fwrn5_phy_power_ctrl(struct phy_common *phy, enum s3fwrn5_mode mode);
+void s3fwrn5_phy_set_mode(void *phy_id, enum s3fwrn5_mode mode);
 enum s3fwrn5_mode s3fwrn5_phy_get_mode(void *phy_id);
 
 #endif /* __NFC_S3FWRN5_PHY_COMMON_H */
diff --git a/drivers/nfc/s3fwrn5/uart.c b/drivers/nfc/s3fwrn5/uart.c
new file mode 100644
index 000..82ea35d
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/uart.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * UART Link Layer for S3FWRN82 NCI based Driver
+ *
+ * Copyright (C) 2015 Samsung Electronics
+ * Robert Baldyga 
+ * Copyright (C) 2020 Samsung Electronics
+ * Bongsu Jeon 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "phy_common.h"
+
+#define S3FWRN82_NCI_HEADER 3
+#define S3FWRN82_NCI_IDX 2
+#define NCI_SKB_BUFF_LEN 258
+
+struct s3fwrn82_uart_phy {
+   struct phy_common common;
+   struct serdev_device *ser_dev;
+   struct sk_buff *recv_skb;
+};
+
+static int s3fwrn82_uart_write(void *phy_id, struct sk_buff *out)
+{
+   struct s3fwrn82_uart_phy *phy = phy_id;
+   int err;
+
+   err = serdev_device_write(phy->ser_dev,
+ out->data, out->len,
+ MAX_SCHEDULE_TIMEOUT);
+   if (err < 0)
+   return err;
+
+   return 0;
+}
+
+static const struct s3fwrn5_phy_ops uart_phy_ops = {
+   .set_wake = s3fwrn5_phy_set_wake,
+   .set_mode = s3fwrn5_phy_set_mode,
+   .get_mode = s3fwrn5_phy_get_mode,
+   .write = s3fwrn82_uart_write,
+};
+
+static int s3fwrn82_uart_read(struct serdev_device *serdev,
+ const unsigned char *data,
+ size_t count)
+{
+   struct s3fwrn82_uart_phy *phy = serdev_device_get_drvdata(serdev);
+   size_t i;
+
+   for (i = 0; i < count; i++) {
+   skb_put_u8(phy->recv_skb, *data++);
+
+   if (phy->recv_skb->len < S3FWRN82_NCI_HEADER)
+   continue;
+
+   if ((phy->recv_skb->len - S3FWRN82_NCI_HEADER)
+   < phy->recv_skb->data[S3FWRN82_NCI_IDX])
+   continue;
+
+
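The receive hunk above is cut off in this archive, but the visible part shows byte-wise NCI frame reassembly: bytes are appended until the 3-byte header is present, then until the payload length stored in header byte 2 has arrived. A minimal user-space sketch of that technique follows. The struct and function names are hypothetical, and what happens once a frame is complete (here: count it and reset the buffer) is an assumption, since that part of the patch is not visible.

```c
#include <stddef.h>
#include <stdint.h>

#define NCI_HEADER_LEN 3   /* mirrors S3FWRN82_NCI_HEADER */
#define NCI_PLEN_IDX   2   /* mirrors S3FWRN82_NCI_IDX */
#define NCI_BUF_LEN    258 /* mirrors NCI_SKB_BUFF_LEN: 3 + 255 */

struct nci_rx {
	uint8_t buf[NCI_BUF_LEN];
	size_t len;
	unsigned int frames; /* completed frames seen so far */
};

/* Feed raw UART bytes; returns how many frames this call completed. */
static unsigned int nci_rx_feed(struct nci_rx *rx, const uint8_t *data,
				size_t count)
{
	unsigned int done = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		rx->buf[rx->len++] = data[i];

		if (rx->len < NCI_HEADER_LEN)
			continue; /* header not complete yet */

		if (rx->len - NCI_HEADER_LEN < rx->buf[NCI_PLEN_IDX])
			continue; /* payload not complete yet */

		/* Frame complete. Assumption: hand it off and start over;
		 * this sketch only counts the frame and resets the buffer. */
		done++;
		rx->frames++;
		rx->len = 0;
	}

	return done;
}
```

Because the payload length is a single byte, a frame never exceeds 3 + 255 = 258 bytes, which is why the buffer size above is sufficient in this sketch.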

[GIT PULL] vdpa: last minute bugfixes

2020-12-02 Thread Michael S. Tsirkin

A couple of patches of the obviously correct variety.

The following changes since commit ad89653f79f1882d55d9df76c9b2b94f008c4e27:

  vhost-vdpa: fix page pinning leakage in error path (rework) (2020-11-25 04:29:07 -0500)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 2c602741b51daa12f8457f222ce9ce9c4825d067:

  vhost_vdpa: return -EFAULT if copy_to_user() fails (2020-12-02 04:36:40 -0500)


vdpa: last minute bugfixes

A couple of fixes that surfaced at the last minute.

Signed-off-by: Michael S. Tsirkin 


Dan Carpenter (1):
  vhost_vdpa: return -EFAULT if copy_to_user() fails

Randy Dunlap (1):
  vdpa: mlx5: fix vdpa/vhost dependencies

 drivers/Makefile | 1 +
 drivers/vdpa/Kconfig | 1 +
 drivers/vhost/vdpa.c | 4 +++-
 3 files changed, 5 insertions(+), 1 deletion(-)

Linux IPV6 TCP egress path device passed for LOCAL_OUT hook is incorrect

2020-12-02 Thread Preethi Ramachandra

Hi David Ahern,

In the IPv6 TCP egress path, the device passed to the NF_INET_LOCAL_OUT hook
should be skb_dst(skb)->dev instead of dst->dev.

https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L202
struct dst_entry *dst = skb_dst(skb); >>> This may return slave device.

In this code path the DST Dev and SKB DST Dev will be set to the VRF master device.
ip6_xmit->l3mdev_ip6_out->vrf_l3_out->vrf_ip6_out (This will set SKB DST Dev to vrf0)

However, once control passes back to ip6_xmit
(https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L280),
the slave device is passed to the LOCAL_OUT nf_hook instead of skb_dst(skb)->dev.

return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
   net, (struct sock *)sk, skb, NULL, dst->dev,  <<< Should be skb_dst(skb)->dev
   dst_output);

This will cause a bug in firewall filters. 
nf_hook->ip6table_mangle_hook->ip6table_filter_hook (Device passed is slave 
device). If the firewall filter rule is configured with device as VRF it will 
fail to apply the filter. As per Linux documentation filters should work for 
VRF devices.
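The behaviour described above can be modelled in user space as a stale cached pointer. This is only a sketch of the report's claim, not verified against the kernel: the structs are simplified stand-ins that borrow kernel names, and `vrf_out` and `hook_dev` are hypothetical helpers. A pointer taken from the skb before the VRF step still refers to the original (slave) device afterwards, while re-reading it from the skb yields the VRF device.

```c
/* Simplified stand-ins; they only borrow the kernel's names. */
struct net_device { const char *name; };
struct dst_entry  { struct net_device *dev; };
struct sk_buff    { struct dst_entry *dst; };

static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
	return skb->dst;
}

/* Hypothetical stand-in for the l3mdev/VRF output step: it swaps the
 * skb's dst for one that points at the VRF device. */
static void vrf_out(struct sk_buff *skb, struct dst_entry *vrf_dst)
{
	skb->dst = vrf_dst;
}

/* Returns the name of the device a LOCAL_OUT hook would be handed. */
static const char *hook_dev(struct sk_buff *skb, struct dst_entry *vrf_dst,
			    int use_cached)
{
	struct dst_entry *dst = skb_dst(skb); /* cached before the VRF step */

	vrf_out(skb, vrf_dst);

	/* use_cached models "dst->dev"; otherwise "skb_dst(skb)->dev". */
	return use_cached ? dst->dev->name : skb_dst(skb)->dev->name;
}
```

In this model a rule matching on the VRF device name only sees it when the device is re-read from the skb after the VRF step.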

Firewall Rule:

ip6tables -A OUTPUT -o vrf0 -m comment --comment F-outer6_T-B-lo0_I-1001 -j outer6
(or: -A OUTPUT -o vrf0 -m mark --mark 0x100 -m comment --comment F-outer6_T-B-lo0_I-1001 -j outer6)
The rule fails to apply for TCP packets.

Linux firewall doc:

https://www.kernel.org/doc/Documentation/networking/vrf.txt

[2] Iptables on ingress supports PREROUTING with skb->dev set to the real
ingress device and both INPUT and PREROUTING rules with skb->dev set to
the VRF device. For egress POSTROUTING and OUTPUT rules can be written
using either the VRF device or real egress device.

Comparison of the device passed to the LOCAL_OUT hook in the IPv4 TCP and
IPv6 raw/UDP scenarios.

TCP IPv4 egress: in __ip_local_out, skb_dst(skb)->dev is passed to
NF_INET_LOCAL_OUT.

https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L115
return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT,
  net, sk, skb, NULL, skb_dst(skb)->dev, << will be set to vrf0
  dst_output);
}

Raw/UDP IPv6 egress path: skb_dst(skb)->dev is passed to NF_INET_LOCAL_OUT.

rawv6_sendmsg->ip6_send_skb->ip6_local_out->__ip6_local_out (LOCAL_OUT hook,
device passed is SKB DST which is VRF)->nf_hook->ip6table_mangle_hook->ip6table_filter_hook

https://elixir.bootlin.com/linux/latest/source/net/ipv6/output_core.c#L167

return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
   net, sk, skb, NULL, skb_dst(skb)->dev,
   dst_output);

Thanks,
Preethi



[PATCH v2 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Oleksij Rempel

Add stats support for the ar9331 switch.

Signed-off-by: Oleksij Rempel 
---
 drivers/net/dsa/qca/ar9331.c | 242 ++-
 1 file changed, 241 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
index e24a99031b80..1a8027bc9561 100644
--- a/drivers/net/dsa/qca/ar9331.c
+++ b/drivers/net/dsa/qca/ar9331.c
@@ -101,6 +101,57 @@
 AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \
 AR9331_SW_PORT_STATUS_SPEED_M)
 
+/* MIB registers */
+#define AR9331_MIB_COUNTER(x)  (0x2 + ((x) * 0x100))
+
+#define AR9331_PORT_MIB_rxbroad(_port) (AR9331_MIB_COUNTER(_port) + 0x00)
+#define AR9331_PORT_MIB_rxpause(_port) (AR9331_MIB_COUNTER(_port) + 0x04)
+#define AR9331_PORT_MIB_rxmulti(_port) (AR9331_MIB_COUNTER(_port) + 0x08)
+#define AR9331_PORT_MIB_rxfcserr(_port) (AR9331_MIB_COUNTER(_port) + 0x0c)
+#define AR9331_PORT_MIB_rxalignerr(_port) (AR9331_MIB_COUNTER(_port) + 0x10)
+#define AR9331_PORT_MIB_rxrunt(_port) (AR9331_MIB_COUNTER(_port) + 0x14)
+#define AR9331_PORT_MIB_rxfragment(_port) (AR9331_MIB_COUNTER(_port) + 0x18)
+#define AR9331_PORT_MIB_rx64byte(_port) (AR9331_MIB_COUNTER(_port) + 0x1c)
+#define AR9331_PORT_MIB_rx128byte(_port) (AR9331_MIB_COUNTER(_port) + 0x20)
+#define AR9331_PORT_MIB_rx256byte(_port) (AR9331_MIB_COUNTER(_port) + 0x24)
+#define AR9331_PORT_MIB_rx512byte(_port) (AR9331_MIB_COUNTER(_port) + 0x28)
+#define AR9331_PORT_MIB_rx1024byte(_port) (AR9331_MIB_COUNTER(_port) + 0x2c)
+#define AR9331_PORT_MIB_rx1518byte(_port) (AR9331_MIB_COUNTER(_port) + 0x30)
+#define AR9331_PORT_MIB_rxmaxbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x34)
+#define AR9331_PORT_MIB_rxtoolong(_port) (AR9331_MIB_COUNTER(_port) + 0x38)
+
+/* 64 bit counter */
+#define AR9331_PORT_MIB_rxgoodbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x3c)
+
+/* 64 bit counter */
+#define AR9331_PORT_MIB_rxbadbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x44)
+
+#define AR9331_PORT_MIB_rxoverflow(_port) (AR9331_MIB_COUNTER(_port) + 0x4c)
+#define AR9331_PORT_MIB_filtered(_port) (AR9331_MIB_COUNTER(_port) + 0x50)
+#define AR9331_PORT_MIB_txbroad(_port) (AR9331_MIB_COUNTER(_port) + 0x54)
+#define AR9331_PORT_MIB_txpause(_port) (AR9331_MIB_COUNTER(_port) + 0x58)
+#define AR9331_PORT_MIB_txmulti(_port) (AR9331_MIB_COUNTER(_port) + 0x5c)
+#define AR9331_PORT_MIB_txunderrun(_port) (AR9331_MIB_COUNTER(_port) + 0x60)
+#define AR9331_PORT_MIB_tx64byte(_port) (AR9331_MIB_COUNTER(_port) + 0x64)
+#define AR9331_PORT_MIB_tx128byte(_port) (AR9331_MIB_COUNTER(_port) + 0x68)
+#define AR9331_PORT_MIB_tx256byte(_port) (AR9331_MIB_COUNTER(_port) + 0x6c)
+#define AR9331_PORT_MIB_tx512byte(_port) (AR9331_MIB_COUNTER(_port) + 0x70)
+#define AR9331_PORT_MIB_tx1024byte(_port) (AR9331_MIB_COUNTER(_port) + 0x74)
+#define AR9331_PORT_MIB_tx1518byte(_port) (AR9331_MIB_COUNTER(_port) + 0x78)
+#define AR9331_PORT_MIB_txmaxbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x7c)
+#define AR9331_PORT_MIB_txoversize(_port) (AR9331_MIB_COUNTER(_port) + 0x80)
+
+/* 64 bit counter */
+#define AR9331_PORT_MIB_txbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x84)
+
+#define AR9331_PORT_MIB_txcollision(_port) (AR9331_MIB_COUNTER(_port) + 0x8c)
+#define AR9331_PORT_MIB_txabortcol(_port) (AR9331_MIB_COUNTER(_port) + 0x90)
+#define AR9331_PORT_MIB_txmulticol(_port) (AR9331_MIB_COUNTER(_port) + 0x94)
+#define AR9331_PORT_MIB_txsinglecol(_port) (AR9331_MIB_COUNTER(_port) + 0x98)
+#define AR9331_PORT_MIB_txexcdefer(_port) (AR9331_MIB_COUNTER(_port) + 0x9c)
+#define AR9331_PORT_MIB_txdefer(_port) (AR9331_MIB_COUNTER(_port) + 0xa0)
+#define AR9331_PORT_MIB_txlatecol(_port) (AR9331_MIB_COUNTER(_port) + 0xa4)
+
 /* Phy bypass mode
  * 
  * Bit:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
@@ -154,6 +205,59 @@
 #define AR9331_SW_MDIO_POLL_SLEEP_US   1
 #define AR9331_SW_MDIO_POLL_TIMEOUT_US 20
 
+#define STATS_INTERVAL_JIFFIES (100 * HZ)
+
+struct ar9331_sw_stats {
+   u64 rxbroad;
+   u64 rxpause;
+   u64 rxmulti;
+   u64 rxfcserr;
+   u64 rxalignerr;
+   u64 rxrunt;
+   u64 rxfragment;
+   u64 rx64byte;
+   u64 rx128byte;
+   u64 rx256byte;
+   u64 rx512byte;
+   u64 rx1024byte;
+   u64 rx1518byte;
+   u64 rxmaxbyte;
+   u64 rxtoolong;
+   u64 rxgoodbyte;
+   u64 rxbadbyte;
+   u64 rxoverflow;
+   u64 filtered;
+   u64 txbroad;
+   u64 txpause;
+   u64 txmulti;
+   u64 txunderrun;
+   u64 tx64byte;
+   u64 tx128byte;
+   u64 tx256byte;
+   u64 tx512b

[PATCH v2 1/2] net: dsa: add optional stats64 support

2020-12-02 Thread Oleksij Rempel

Allow DSA drivers to export stats64

Signed-off-by: Oleksij Rempel 
---
 include/net/dsa.h |  3 +++
 net/dsa/slave.c   | 14 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 4e60d2610f20..457b89143875 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -655,6 +655,9 @@ struct dsa_switch_ops {
int (*port_change_mtu)(struct dsa_switch *ds, int port,
   int new_mtu);
int (*port_max_mtu)(struct dsa_switch *ds, int port);
+
+   void(*get_stats64)(struct dsa_switch *ds, int port,
+  struct rtnl_link_stats64 *s);
 };
 
 #define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes)   \
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index ff2266d2b998..6e1a4dc18a97 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1602,6 +1602,18 @@ static struct devlink_port *dsa_slave_get_devlink_port(struct net_device *dev)
return dp->ds->devlink ? &dp->devlink_port : NULL;
 }
 
+static void dsa_slave_get_stats64(struct net_device *dev,
+ struct rtnl_link_stats64 *s)
+{
+   struct dsa_port *dp = dsa_slave_to_port(dev);
+   struct dsa_switch *ds = dp->ds;
+
+   if (!ds->ops->get_stats64)
+   return dev_get_tstats64(dev, s);
+
+   return ds->ops->get_stats64(ds, dp->index, s);
+}
+
 static const struct net_device_ops dsa_slave_netdev_ops = {
.ndo_open   = dsa_slave_open,
.ndo_stop   = dsa_slave_close,
@@ -1621,7 +1633,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 #endif
.ndo_get_phys_port_name = dsa_slave_get_phys_port_name,
.ndo_setup_tc   = dsa_slave_setup_tc,
-   .ndo_get_stats64= dev_get_tstats64,
+   .ndo_get_stats64= dsa_slave_get_stats64,
.ndo_get_port_parent_id = dsa_slave_get_port_parent_id,
.ndo_vlan_rx_add_vid= dsa_slave_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = dsa_slave_vlan_rx_kill_vid,
-- 
2.29.2
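The patch above follows a common "optional callback with generic fallback" pattern. A minimal user-space sketch of that dispatch is shown here; all type and function names (`link_stats`, `switch_ops`, `generic_stats`, `hw_stats`, `port_get_stats64`) are hypothetical stand-ins rather than the DSA API, and the values they fill in are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

struct link_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct switch_ops {
	/* Optional: may be NULL if the driver has no hardware counters. */
	void (*get_stats64)(int port, struct link_stats *s);
};

/* Stand-in for the generic software statistics path. */
static void generic_stats(struct link_stats *s)
{
	s->rx_packets = 10;
	s->tx_packets = 20;
}

/* Stand-in for a driver that reports per-port hardware counters. */
static void hw_stats(int port, struct link_stats *s)
{
	s->rx_packets = 1000u + (uint64_t)port;
	s->tx_packets = 2000u + (uint64_t)port;
}

/* Use the driver callback when present, else the generic path. */
static void port_get_stats64(const struct switch_ops *ops, int port,
			     struct link_stats *s)
{
	if (!ops->get_stats64) {
		generic_stats(s);
		return;
	}

	ops->get_stats64(port, s);
}
```

Keeping the NULL check in one wrapper means drivers that do not implement the callback keep their previous behaviour unchanged.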

[PATCH v2 0/2] net: dsa: add stats64 support

2020-12-02 Thread Oleksij Rempel

changes v2:
- use stats64 instead of get_ethtool_stats
- add worker to poll for the stats

Oleksij Rempel (2):
  net: dsa: add optional stats64 support
  net: dsa: qca: ar9331: export stats64

 drivers/net/dsa/qca/ar9331.c | 242 ++-
 include/net/dsa.h|   3 +
 net/dsa/slave.c  |  14 +-
 3 files changed, 257 insertions(+), 2 deletions(-)

-- 
2.29.2
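As background for the polling worker mentioned in the changelog: one common way to export 64-bit statistics from 32-bit hardware counters is to poll often enough that a counter wraps at most once between reads, and to accumulate the difference. The sketch below shows that technique under the assumption of free-running (not clear-on-read) counters; it is not taken from the patch, whose stats code is truncated in this archive, and the names `counter64` and `counter64_update` are hypothetical.

```c
#include <stdint.h>

struct counter64 {
	uint32_t last;  /* last raw value read from the 32-bit register */
	uint64_t total; /* accumulated 64-bit value */
};

/* Fold a new raw reading into the 64-bit total. The unsigned 32-bit
 * subtraction yields the correct delta across a single wraparound. */
static void counter64_update(struct counter64 *c, uint32_t raw)
{
	c->total += (uint32_t)(raw - c->last);
	c->last = raw;
}
```

If the hardware clears its counters on read instead, the delta step is unnecessary and each raw reading can simply be added to the total.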

Re: KMSAN: uninit-value in __skb_checksum_complete (5)

2020-12-02 Thread syzbot

syzbot has found a reproducer for the following issue on:

HEAD commit:73d62e81 kmsan: random: prevent boot-time reports in _mix_..
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=13bd460750
kernel config:  https://syzkaller.appspot.com/x/.config?x=eef728deea880383
dashboard link: https://syzkaller.appspot.com/bug?extid=b024befb3ca7990fea37
compiler:   clang version 11.0.0 (https://github.com/llvm/llvm-project.git ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=126c837950
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17cdf7b550

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b024befb3ca7990fe...@syzkaller.appspotmail.com

=
BUG: KMSAN: uninit-value in __skb_checksum_complete+0x421/0x630 net/core/skbuff.c:2846
CPU: 0 PID: 497 Comm: kworker/u4:11 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __skb_checksum_complete+0x421/0x630 net/core/skbuff.c:2846
 __skb_checksum_validate_complete include/linux/skbuff.h:4014 [inline]
 icmp_rcv+0x94b/0x1d70 net/ipv4/icmp.c:1081
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 batadv_send_skb_packet+0x622/0x970 net/batman-adv/send.c:108
 batadv_send_broadcast_skb+0x76/0x90 net/batman-adv/send.c:127
 batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:394 [inline]
 batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:420 [inline]
 batadv_iv_send_outstanding_bat_ogm_packet+0xb3a/0xf00 net/batman-adv/bat_iv_ogm.c:1712
 process_one_work+0x121c/0x1fc0 kernel/workqueue.c:2272
 worker_thread+0x10cc/0x2740 kernel/workqueue.c:2418
 kthread+0x51c/0x560 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
 kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
 kmsan_memcpy_memmove_metadata+0x25e/0x2d0 mm/kmsan/kmsan.c:226
 kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:246
 __msan_memcpy+0x46/0x60 mm/kmsan/kmsan_instr.c:110
 csum_partial_copy_nocheck include/net/checksum.h:51 [inline]
 skb_copy_and_csum_bits+0x23e/0x13e0 net/core/skbuff.c:2733
 icmp_glue_bits+0x155/0x400 net/ipv4/icmp.c:356
 __ip_append_data+0x4f8e/0x6210 net/ipv4/ip_output.c:1139
 ip_append_data+0x326/0x490 net/ipv4/ip_output.c:1323
 icmp_push_reply+0x1f8/0x810 net/ipv4/icmp.c:374
 __icmp_send+0x2a98/0x3a90 net/ipv4/icmp.c:762
 icmp_send include/net/icmp.h:43 [inline]
 __udp4_lib_rcv+0x421f/0x5880 net/ipv4/udp.c:2405
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]

Re: [PATCH 2/2] powerpc/ps3: make system bus's remove and shutdown callbacks return void

2020-12-02 Thread Michael Ellerman

Uwe Kleine-König  writes:
> Hello Michael,
>
> On Sat, Nov 28, 2020 at 09:48:30AM +0100, Takashi Iwai wrote:
>> On Thu, 26 Nov 2020 17:59:50 +0100,
>> Uwe Kleine-König wrote:
>> > 
>> > The driver core ignores the return value of struct device_driver::remove
>> > because there is only little that can be done. For the shutdown callback
>> > it's ps3_system_bus_shutdown() which ignores the return value.
>> > 
>> > To simplify the quest to make struct device_driver::remove return void,
>> > let struct ps3_system_bus_driver::remove return void, too. All users
>> > already unconditionally return 0, this commit makes it obvious that
>> > returning an error code is a bad idea and ensures future users behave
>> > accordingly.
>> > 
>> > Signed-off-by: Uwe Kleine-König 
>> 
>> For the sound bit:
>> Acked-by: Takashi Iwai 
>
> assuming that you are the one who will apply this patch: Note that it
> depends on patch 1 that Takashi already applied to his tree. So you
> either have to wait until patch 1 appears in some tree that you merge
> before applying, or you have to take patch 1, too. (With Takashi
> optionally dropping it then.)

Thanks. I've picked up both patches.

If Takashi doesn't want to rebase his tree to drop patch 1 that's OK, it
will just arrive in mainline via two paths, but git should handle it.

cheers

Re: [PATCH v2 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Marc Kleine-Budde

On 12/2/20 1:07 PM, Oleksij Rempel wrote:
> Add stats support for the ar9331 switch.
> 
> Signed-off-by: Oleksij Rempel 
> ---
>  drivers/net/dsa/qca/ar9331.c | 242 ++-
>  1 file changed, 241 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
> index e24a99031b80..1a8027bc9561 100644
> --- a/drivers/net/dsa/qca/ar9331.c
> +++ b/drivers/net/dsa/qca/ar9331.c
> @@ -101,6 +101,57 @@
>AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \
>AR9331_SW_PORT_STATUS_SPEED_M)
>  
> +/* MIB registers */
> +#define AR9331_MIB_COUNTER(x) (0x2 + ((x) * 0x100))
> +
> +#define AR9331_PORT_MIB_rxbroad(_port) (AR9331_MIB_COUNTER(_port) + 0x00)
> +#define AR9331_PORT_MIB_rxpause(_port) (AR9331_MIB_COUNTER(_port) + 0x04)
> +#define AR9331_PORT_MIB_rxmulti(_port) (AR9331_MIB_COUNTER(_port) + 0x08)
> +#define AR9331_PORT_MIB_rxfcserr(_port) (AR9331_MIB_COUNTER(_port) + 0x0c)
> +#define AR9331_PORT_MIB_rxalignerr(_port) (AR9331_MIB_COUNTER(_port) + 0x10)
> +#define AR9331_PORT_MIB_rxrunt(_port) (AR9331_MIB_COUNTER(_port) + 0x14)
> +#define AR9331_PORT_MIB_rxfragment(_port) (AR9331_MIB_COUNTER(_port) + 0x18)
> +#define AR9331_PORT_MIB_rx64byte(_port) (AR9331_MIB_COUNTER(_port) + 0x1c)
> +#define AR9331_PORT_MIB_rx128byte(_port) (AR9331_MIB_COUNTER(_port) + 0x20)
> +#define AR9331_PORT_MIB_rx256byte(_port) (AR9331_MIB_COUNTER(_port) + 0x24)
> +#define AR9331_PORT_MIB_rx512byte(_port) (AR9331_MIB_COUNTER(_port) + 0x28)
> +#define AR9331_PORT_MIB_rx1024byte(_port) (AR9331_MIB_COUNTER(_port) + 0x2c)
> +#define AR9331_PORT_MIB_rx1518byte(_port) (AR9331_MIB_COUNTER(_port) + 0x30)
> +#define AR9331_PORT_MIB_rxmaxbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x34)
> +#define AR9331_PORT_MIB_rxtoolong(_port) (AR9331_MIB_COUNTER(_port) + 0x38)
> +
> +/* 64 bit counter */
> +#define AR9331_PORT_MIB_rxgoodbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x3c)
> +
> +/* 64 bit counter */
> +#define AR9331_PORT_MIB_rxbadbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x44)
> +
> +#define AR9331_PORT_MIB_rxoverflow(_port) (AR9331_MIB_COUNTER(_port) + 0x4c)
> +#define AR9331_PORT_MIB_filtered(_port) (AR9331_MIB_COUNTER(_port) + 0x50)
> +#define AR9331_PORT_MIB_txbroad(_port) (AR9331_MIB_COUNTER(_port) + 0x54)
> +#define AR9331_PORT_MIB_txpause(_port) (AR9331_MIB_COUNTER(_port) + 0x58)
> +#define AR9331_PORT_MIB_txmulti(_port) (AR9331_MIB_COUNTER(_port) + 0x5c)
> +#define AR9331_PORT_MIB_txunderrun(_port) (AR9331_MIB_COUNTER(_port) + 0x60)
> +#define AR9331_PORT_MIB_tx64byte(_port) (AR9331_MIB_COUNTER(_port) + 0x64)
> +#define AR9331_PORT_MIB_tx128byte(_port) (AR9331_MIB_COUNTER(_port) + 0x68)
> +#define AR9331_PORT_MIB_tx256byte(_port) (AR9331_MIB_COUNTER(_port) + 0x6c)
> +#define AR9331_PORT_MIB_tx512byte(_port) (AR9331_MIB_COUNTER(_port) + 0x70)
> +#define AR9331_PORT_MIB_tx1024byte(_port) (AR9331_MIB_COUNTER(_port) + 0x74)
> +#define AR9331_PORT_MIB_tx1518byte(_port) (AR9331_MIB_COUNTER(_port) + 0x78)
> +#define AR9331_PORT_MIB_txmaxbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x7c)
> +#define AR9331_PORT_MIB_txoversize(_port) (AR9331_MIB_COUNTER(_port) + 0x80)
> +
> +/* 64 bit counter */
> +#define AR9331_PORT_MIB_txbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x84)
> +
> +#define AR9331_PORT_MIB_txcollision(_port) (AR9331_MIB_COUNTER(_port) + 0x8c)
> +#define AR9331_PORT_MIB_txabortcol(_port) (AR9331_MIB_COUNTER(_port) + 0x90)
> +#define AR9331_PORT_MIB_txmulticol(_port) (AR9331_MIB_COUNTER(_port) + 0x94)
> +#define AR9331_PORT_MIB_txsinglecol(_port) (AR9331_MIB_COUNTER(_port) + 0x98)
> +#define AR9331_PORT_MIB_txexcdefer(_port) (AR9331_MIB_COUNTER(_port) + 0x9c)
> +#define AR9331_PORT_MIB_txdefer(_port) (AR9331_MIB_COUNTER(_port) + 0xa0)
> +#define AR9331_PORT_MIB_txlatecol(_port) (AR9331_MIB_COUNTER(_port) + 0xa4)
> +
>  /* Phy bypass mode
>   * 
>   * Bit:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
> @@ -154,6 +205,59 @@
>  #define AR9331_SW_MDIO_POLL_SLEEP_US 1
>  #define AR9331_SW_MDIO_POLL_TIMEOUT_US   20
>  
> +#define STATS_INTERVAL_JIFFIES   (100 * HZ)
> +
> +struct ar9331_sw_stats {
> + u64 rxbroad;
> + u64 rxpause;
> + u64 rxmulti;
> + u64 rxfcserr;
> + u64 rxalignerr;
> + u64 rxrunt;
> + u64 rxfragment;
> + u64 rx64byte;
> + u64 rx128byte;
> + u64 rx256byte;
> + u64 rx512byte;
> + u64 rx1024byte;
> + u64 rx1518byte;
> + u64 rxmaxbyte;
> +

Re: [PATCH 2/2] powerpc/ps3: make system bus's remove and shutdown callbacks return void

2020-12-02 Thread Takashi Iwai

On Wed, 02 Dec 2020 13:14:06 +0100,
Michael Ellerman wrote:
> 
> Uwe Kleine-König  writes:
> > Hello Michael,
> >
> > On Sat, Nov 28, 2020 at 09:48:30AM +0100, Takashi Iwai wrote:
> >> On Thu, 26 Nov 2020 17:59:50 +0100,
> >> Uwe Kleine-König wrote:
> >> > 
> >> > The driver core ignores the return value of struct device_driver::remove
> >> > because there is only little that can be done. For the shutdown callback
> >> > it's ps3_system_bus_shutdown() which ignores the return value.
> >> > 
> >> > To simplify the quest to make struct device_driver::remove return void,
> >> > let struct ps3_system_bus_driver::remove return void, too. All users
> >> > already unconditionally return 0, this commit makes it obvious that
> >> > returning an error code is a bad idea and ensures future users behave
> >> > accordingly.
> >> > 
> >> > Signed-off-by: Uwe Kleine-König 
> >> 
> >> For the sound bit:
> >> Acked-by: Takashi Iwai 
> >
> > assuming that you are the one who will apply this patch: Note that it
> > depends on patch 1 that Takashi already applied to his tree. So you
> > either have to wait until patch 1 appears in some tree that you merge
> > before applying, or you have to take patch 1, too. (With Takashi
> > optionally dropping it then.)
> 
> Thanks. I've picked up both patches.
> 
> If Takashi doesn't want to rebase his tree to drop patch 1 that's OK, it
> will just arrive in mainline via two paths, but git should handle it.

Yeah, I'd like to avoid rebasing, so let's get it merged from both
trees.  git can handle such a case gracefully.


thanks,

Takashi

[PATCH 1/1] bareudp: constify device_type declaration

2020-12-02 Thread Jonas Bonn

device_type may be declared as const.

Signed-off-by: Jonas Bonn 
---
 drivers/net/bareudp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 28257bccec41..85ebd2b7e446 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -522,7 +522,7 @@ static const struct nla_policy bareudp_policy[IFLA_BAREUDP_MAX + 1] = {
 };
 
 /* Info for udev, that this is a virtual tunnel endpoint */
-static struct device_type bareudp_type = {
+static const struct device_type bareudp_type = {
.name = "bareudp",
 };
 
-- 
2.27.0

[PATCH 3/5] gtp: really check namespaces before xmit

2020-12-02 Thread Jonas Bonn

Blindly assuming that packet transmission crosses namespaces results in
skb marks being lost in the single namespace case.

Signed-off-by: Jonas Bonn 
---
 drivers/net/gtp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 096aaf1c867e..e053f86f43f3 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -592,7 +592,9 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
ip4_dst_hoplimit(&pktinfo.rt->dst),
0,
pktinfo.gtph_port, pktinfo.gtph_port,
-   true, false);
+   !net_eq(sock_net(pktinfo.pctx->sk),
+   dev_net(dev)),
+   false);
break;
}
 
-- 
2.27.0

[PATCH 4/5] gtp: drop unnecessary call to skb_dst_drop

2020-12-02 Thread Jonas Bonn

The call to skb_dst_drop() is already done as part of udp_tunnel_xmit().

Signed-off-by: Jonas Bonn 
---
 drivers/net/gtp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index e053f86f43f3..c19465458187 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -515,8 +515,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
goto err_rt;
}
 
-   skb_dst_drop(skb);
-
/* This is similar to tnl_update_pmtu(). */
df = iph->frag_off;
if (df) {
-- 
2.27.0

[PATCH 5/5] gtp: set device type

2020-12-02 Thread Jonas Bonn

Set the devtype to 'gtp' when setting up the link.

Signed-off-by: Jonas Bonn 
---
 drivers/net/gtp.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index c19465458187..6de38e06588d 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -610,6 +610,10 @@ static const struct net_device_ops gtp_netdev_ops = {
.ndo_get_stats64= dev_get_tstats64,
 };
 
+static const struct device_type gtp_type = {
+   .name = "gtp",
+};
+
 static void gtp_link_setup(struct net_device *dev)
 {
unsigned int max_gtp_header_len = sizeof(struct iphdr) +
@@ -618,6 +622,7 @@ static void gtp_link_setup(struct net_device *dev)
 
dev->netdev_ops = &gtp_netdev_ops;
dev->needs_free_netdev  = true;
+   SET_NETDEV_DEVTYPE(dev, &gtp_type);
 
dev->hard_header_len = 0;
dev->addr_len = 0;
-- 
2.27.0

[PATCH 1/5] gtp: set initial MTU

2020-12-02 Thread Jonas Bonn

The GTP link is brought up with a default MTU of zero.  This can lead to
some rather unexpected behaviour for users who are more accustomed to
interfaces coming online with reasonable defaults.

This patch sets an initial MTU for the GTP link of 1500 less worst-case
tunnel overhead.

Signed-off-by: Jonas Bonn 
---
 drivers/net/gtp.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 4c04e271f184..5a048f050a9c 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -612,11 +612,16 @@ static const struct net_device_ops gtp_netdev_ops = {
 
 static void gtp_link_setup(struct net_device *dev)
 {
+   unsigned int max_gtp_header_len = sizeof(struct iphdr) +
+ sizeof(struct udphdr) +
+ sizeof(struct gtp0_header);
+
dev->netdev_ops = &gtp_netdev_ops;
dev->needs_free_netdev  = true;
 
dev->hard_header_len = 0;
dev->addr_len = 0;
+   dev->mtu = ETH_DATA_LEN - max_gtp_header_len;
 
/* Zero header length. */
dev->type = ARPHRD_NONE;
@@ -626,11 +631,7 @@ static void gtp_link_setup(struct net_device *dev)
dev->features   |= NETIF_F_LLTX;
netif_keep_dst(dev);
 
-   /* Assume largest header, ie. GTPv0. */
-   dev->needed_headroom= LL_MAX_HEADER +
- sizeof(struct iphdr) +
- sizeof(struct udphdr) +
- sizeof(struct gtp0_header);
+   dev->needed_headroom= LL_MAX_HEADER + max_gtp_header_len;
 }
 
 static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
-- 
2.27.0
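
The "1500 less worst-case tunnel overhead" arithmetic from this patch can be worked through as a small sketch. The header sizes below are assumptions rather than values quoted in the patch: 20 bytes for an IPv4 header without options, 8 bytes for UDP, and 20 bytes for the GTPv0 header. All sketch_* and SKETCH_* names are hypothetical.

```c
#include <assert.h>

/* Assumed sizes in bytes; they are not quoted from the patch itself. */
enum {
	SKETCH_ETH_DATA_LEN = 1500,	/* payload of a standard Ethernet frame */
	SKETCH_IPV4_HDR_LEN = 20,	/* struct iphdr, no options */
	SKETCH_UDP_HDR_LEN  = 8,	/* struct udphdr */
	SKETCH_GTP0_HDR_LEN = 20,	/* struct gtp0_header, largest GTP header */
};

/* Worst-case encapsulation overhead, mirroring max_gtp_header_len. */
static unsigned int sketch_max_gtp_header_len(void)
{
	return SKETCH_IPV4_HDR_LEN + SKETCH_UDP_HDR_LEN + SKETCH_GTP0_HDR_LEN;
}

/* Initial MTU, mirroring dev->mtu = ETH_DATA_LEN - max_gtp_header_len. */
static unsigned int sketch_gtp_initial_mtu(void)
{
	return SKETCH_ETH_DATA_LEN - sketch_max_gtp_header_len();
}
```

Under these assumed sizes the overhead is 48 bytes, so the link would come up with an MTU of 1452.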

[PATCH 2/5] gtp: include role in link info

2020-12-02 Thread Jonas Bonn

Querying link info for the GTP interface doesn't reveal in which "role" the
device is set to operate.  Include this information in the info query
result.

Signed-off-by: Jonas Bonn 
---
 drivers/net/gtp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 5a048f050a9c..096aaf1c867e 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -728,7 +728,8 @@ static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
 
 static size_t gtp_get_size(const struct net_device *dev)
 {
-   return nla_total_size(sizeof(__u32));   /* IFLA_GTP_PDP_HASHSIZE */
+   return nla_total_size(sizeof(__u32)) +  /* IFLA_GTP_PDP_HASHSIZE */
+   + nla_total_size(sizeof(__u32)); /* IFLA_GTP_ROLE */
 }
 
 static int gtp_fill_info(struct sk_buff *skb, const struct net_device *dev)
@@ -737,6 +738,8 @@ static int gtp_fill_info(struct sk_buff *skb, const struct net_device *dev)
 
if (nla_put_u32(skb, IFLA_GTP_PDP_HASHSIZE, gtp->hash_size))
goto nla_put_failure;
+   if (nla_put_u32(skb, IFLA_GTP_ROLE, gtp->role))
+   goto nla_put_failure;
 
return 0;
 
-- 
2.27.0
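
The size that gtp_get_size() reports after this change can be checked with a userspace re-implementation of the netlink attribute sizing rules. This is a sketch under assumptions: a 4-byte attribute header and 4-byte alignment, with hypothetical sketch_* names standing in for NLA_ALIGN() and nla_total_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to the 4-byte netlink attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Attribute header (u16 length + u16 type) plus payload, padded. */
static size_t sketch_nla_total_size(size_t payload)
{
	const size_t hdrlen = sketch_nla_align(4);

	return sketch_nla_align(hdrlen + payload);
}

/* Two u32 attributes, as in the patched gtp_get_size(). */
static size_t sketch_gtp_get_size(void)
{
	return sketch_nla_total_size(sizeof(uint32_t)) +	/* IFLA_GTP_PDP_HASHSIZE */
	       sketch_nla_total_size(sizeof(uint32_t));	/* IFLA_GTP_ROLE */
}
```

Each u32 attribute takes 8 bytes on the wire, so the reported size grows from 8 to 16 bytes once the role attribute is included.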

Re: [PATCH v2 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Oleksij Rempel

On Wed, Dec 02, 2020 at 01:15:58PM +0100, Marc Kleine-Budde wrote:
> On 12/2/20 1:07 PM, Oleksij Rempel wrote:
> > Add stats support for the ar9331 switch.
> > 
> > Signed-off-by: Oleksij Rempel 
> > ---
> >  drivers/net/dsa/qca/ar9331.c | 242 ++-
> >  1 file changed, 241 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
> > index e24a99031b80..1a8027bc9561 100644
> > --- a/drivers/net/dsa/qca/ar9331.c
> > +++ b/drivers/net/dsa/qca/ar9331.c
> > @@ -101,6 +101,57 @@
> >  AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \
> >  AR9331_SW_PORT_STATUS_SPEED_M)
> >  
> > +/* MIB registers */
> > +#define AR9331_MIB_COUNTER(x) (0x2 + ((x) * 0x100))
> > +
> > +#define AR9331_PORT_MIB_rxbroad(_port) (AR9331_MIB_COUNTER(_port) + 0x00)
> > +#define AR9331_PORT_MIB_rxpause(_port) (AR9331_MIB_COUNTER(_port) + 0x04)
> > +#define AR9331_PORT_MIB_rxmulti(_port) (AR9331_MIB_COUNTER(_port) + 0x08)
> > +#define AR9331_PORT_MIB_rxfcserr(_port) (AR9331_MIB_COUNTER(_port) + 0x0c)
> > +#define AR9331_PORT_MIB_rxalignerr(_port) (AR9331_MIB_COUNTER(_port) + 0x10)
> > +#define AR9331_PORT_MIB_rxrunt(_port) (AR9331_MIB_COUNTER(_port) + 0x14)
> > +#define AR9331_PORT_MIB_rxfragment(_port) (AR9331_MIB_COUNTER(_port) + 0x18)
> > +#define AR9331_PORT_MIB_rx64byte(_port) (AR9331_MIB_COUNTER(_port) + 0x1c)
> > +#define AR9331_PORT_MIB_rx128byte(_port) (AR9331_MIB_COUNTER(_port) + 0x20)
> > +#define AR9331_PORT_MIB_rx256byte(_port) (AR9331_MIB_COUNTER(_port) + 0x24)
> > +#define AR9331_PORT_MIB_rx512byte(_port) (AR9331_MIB_COUNTER(_port) + 0x28)
> > +#define AR9331_PORT_MIB_rx1024byte(_port) (AR9331_MIB_COUNTER(_port) + 0x2c)
> > +#define AR9331_PORT_MIB_rx1518byte(_port) (AR9331_MIB_COUNTER(_port) + 0x30)
> > +#define AR9331_PORT_MIB_rxmaxbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x34)
> > +#define AR9331_PORT_MIB_rxtoolong(_port) (AR9331_MIB_COUNTER(_port) + 0x38)
> > +
> > +/* 64 bit counter */
> > +#define AR9331_PORT_MIB_rxgoodbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x3c)
> > +
> > +/* 64 bit counter */
> > +#define AR9331_PORT_MIB_rxbadbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x44)
> > +
> > +#define AR9331_PORT_MIB_rxoverflow(_port) (AR9331_MIB_COUNTER(_port) + 0x4c)
> > +#define AR9331_PORT_MIB_filtered(_port) (AR9331_MIB_COUNTER(_port) + 0x50)
> > +#define AR9331_PORT_MIB_txbroad(_port) (AR9331_MIB_COUNTER(_port) + 0x54)
> > +#define AR9331_PORT_MIB_txpause(_port) (AR9331_MIB_COUNTER(_port) + 0x58)
> > +#define AR9331_PORT_MIB_txmulti(_port) (AR9331_MIB_COUNTER(_port) + 0x5c)
> > +#define AR9331_PORT_MIB_txunderrun(_port) (AR9331_MIB_COUNTER(_port) + 0x60)
> > +#define AR9331_PORT_MIB_tx64byte(_port) (AR9331_MIB_COUNTER(_port) + 0x64)
> > +#define AR9331_PORT_MIB_tx128byte(_port) (AR9331_MIB_COUNTER(_port) + 0x68)
> > +#define AR9331_PORT_MIB_tx256byte(_port) (AR9331_MIB_COUNTER(_port) + 0x6c)
> > +#define AR9331_PORT_MIB_tx512byte(_port) (AR9331_MIB_COUNTER(_port) + 0x70)
> > +#define AR9331_PORT_MIB_tx1024byte(_port) (AR9331_MIB_COUNTER(_port) + 0x74)
> > +#define AR9331_PORT_MIB_tx1518byte(_port) (AR9331_MIB_COUNTER(_port) + 0x78)
> > +#define AR9331_PORT_MIB_txmaxbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x7c)
> > +#define AR9331_PORT_MIB_txoversize(_port) (AR9331_MIB_COUNTER(_port) + 0x80)
> > +
> > +/* 64 bit counter */
> > +#define AR9331_PORT_MIB_txbyte(_port) (AR9331_MIB_COUNTER(_port) + 0x84)
> > +
> > +#define AR9331_PORT_MIB_txcollision(_port) (AR9331_MIB_COUNTER(_port) + 0x8c)
> > +#define AR9331_PORT_MIB_txabortcol(_port) (AR9331_MIB_COUNTER(_port) + 0x90)
> > +#define AR9331_PORT_MIB_txmulticol(_port) (AR9331_MIB_COUNTER(_port) + 0x94)
> > +#define AR9331_PORT_MIB_txsinglecol(_port) (AR9331_MIB_COUNTER(_port) + 0x98)
> > +#define AR9331_PORT_MIB_txexcdefer(_port) (AR9331_MIB_COUNTER(_port) + 0x9c)
> > +#define AR9331_PORT_MIB_txdefer(_port) (AR9331_MIB_COUNTER(_port) + 0xa0)
> > +#define AR9331_PORT_MIB_txlatecol(_port) (AR9331_MIB_COUNTER(_port) + 0xa4)
> > +
> >  /* Phy bypass mode
> >   * 
> >   * Bit:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
> > @@ -154,6 +205,59 @@
> >  #define AR9331_SW_MDIO_POLL_SLEEP_US   1
> >  #define AR9331_SW_MDIO_POLL_TIMEOUT_US 20
> >  
> > +#define STATS_INTERVAL_JIFFIES (100 * HZ)
> > +
> > +struct ar9331_sw_stats {
> > +   u64 rxbroad;
> > +   u64 rxpause;
> > +   u64 rxmulti;
> > +   u64 rxfcserr;
> > +   u64 rxalign

[PATCH] net: 8021q: use netdev_info() instead of pr_info()

2020-12-02 Thread Enrico Weigelt, metux IT consult

Use netdev_info() instead of pr_info() for more consistent log output.

Signed-off-by: Enrico Weigelt 
---
 net/8021q/vlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index f292e0267bb9..d3a6f4ffdaef 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -132,7 +132,7 @@ int vlan_check_real_dev(struct net_device *real_dev,
const char *name = real_dev->name;
 
if (real_dev->features & NETIF_F_VLAN_CHALLENGED) {
-   pr_info("VLANs not supported on %s\n", name);
+   netdev_info(real_dev, "VLANs not supported on %s\n", name);
NL_SET_ERR_MSG_MOD(extack, "VLANs not supported on device");
return -EOPNOTSUPP;
}
@@ -385,7 +385,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 
if ((event == NETDEV_UP) &&
(dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
-   pr_info("adding VLAN 0 to HW filter on device %s\n",
+   netdev_info(dev, "adding VLAN 0 to HW filter on device %s\n",
dev->name);
vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
}
-- 
2.11.0

Re: [PATCH v2 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Marc Kleine-Budde

On 12/2/20 1:43 PM, Oleksij Rempel wrote:
>>> +struct ar9331_sw_priv;
>>> +struct ar9331_sw_port {
>>> +   int idx;
>>> +   struct ar9331_sw_priv *priv;
>>> +   struct delayed_work mib_read;
>>> +   struct ar9331_sw_stats stats;
>>> +   struct mutex lock;  /* stats access */
>>
>> What does the lock protect? It's only used a single time.
> 
> The ar9331_read_stats() function is called from two different contexts:
> from the worker via ar9331_do_stats_poll() and from user space via
> ar9331_get_stats64().
> 
> The mutex lock should prevent a race in the read-modify-write operations
> on stats->*.

Makes sense!

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |
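
The race discussed in this reply can be illustrated in userspace with a pthread mutex standing in for the kernel mutex. This is only a sketch: all sketch_* names are hypothetical, and it assumes that each read adds a hardware delta to a 64-bit software total, with two callers playing the roles of the polling worker and the stats64 query.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_port_stats {
	pthread_mutex_t lock;	/* serialises the read-modify-write below */
	uint64_t rxbroad;	/* accumulated 64-bit software total */
};

/* Adds one hardware delta to the running total under the lock. */
static void sketch_read_stats(struct sketch_port_stats *s, uint32_t hw_delta)
{
	pthread_mutex_lock(&s->lock);
	s->rxbroad += hw_delta;
	pthread_mutex_unlock(&s->lock);
}

/* One caller context: performs 100000 stats reads of delta 1. */
static void *sketch_caller(void *arg)
{
	for (int i = 0; i < 100000; i++)
		sketch_read_stats(arg, 1);
	return NULL;
}

/* Runs two concurrent callers and returns the final total. */
static uint64_t sketch_run_two_contexts(void)
{
	struct sketch_port_stats s = { .rxbroad = 0 };
	pthread_t a, b;

	pthread_mutex_init(&s.lock, NULL);
	pthread_create(&a, NULL, sketch_caller, &s);
	pthread_create(&b, NULL, sketch_caller, &s);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&s.lock);
	return s.rxbroad;
}
```

With the lock held around the addition no update is lost, so the two callers always sum to 200000. Without it, concurrent additions to the same field could overwrite each other.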




Re: [PATCH v3] Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK

2020-12-02 Thread Miguel Ojeda

On Sat, Nov 28, 2020 at 8:34 PM Masahiro Yamada  wrote:
>
> Revert commit cebc04ba9aeb ("add CONFIG_ENABLE_MUST_CHECK").
>
> A lot of warn_unused_result warnings existed in 2006, but by now
> they have been fixed thanks to people doing allmodconfig tests.
>
> Our goal is to always enable __must_check where appropriate, so this
> CONFIG option is no longer needed.
>
> I see a lot of defconfig (arch/*/configs/*_defconfig) files having:
>
> # CONFIG_ENABLE_MUST_CHECK is not set
>
> I did not touch them for now since it would be a big churn. If arch
> maintainers want to clean them up, please go ahead.
>
> While I was here, I also moved __must_check to compiler_attributes.h
> from compiler_types.h
>
> Signed-off-by: Masahiro Yamada 
> Acked-by: Jason A. Donenfeld 

Picked this new version with the Acks etc., plus I moved it within
compiler_attributes.h to keep it sorted (it's sorted by the second
column, rather than the first).

Thanks a lot!

Cheers,
Miguel

[PATCH 1/7] net: 8021q: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/8021q/vlan.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index f292e0267bb9..683e9e825b9e 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -36,15 +36,10 @@
 #include "vlan.h"
 #include "vlanproc.h"
 
-#define DRV_VERSION "1.8"
-
 /* Global VLAN variables */
 
 unsigned int vlan_net_id __read_mostly;
 
-const char vlan_fullname[] = "802.1Q VLAN Support";
-const char vlan_version[] = DRV_VERSION;
-
 /* End of global variables definitions. */
 
 static int vlan_group_prealloc_vid(struct vlan_group *vg,
@@ -687,7 +682,7 @@ static int __init vlan_proto_init(void)
 {
int err;
 
-   pr_info("%s v%s\n", vlan_fullname, vlan_version);
+   pr_info("802.1Q VLAN Support\n");
 
err = register_pernet_subsys(&vlan_net_ops);
if (err < 0)
@@ -743,4 +738,3 @@ module_init(vlan_proto_init);
 module_exit(vlan_cleanup_module);
 
 MODULE_LICENSE("GPL");
-MODULE_VERSION(DRV_VERSION);
-- 
2.11.0

[PATCH 2/7] net: batman-adv: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/batman-adv/main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 70fee9b42e25..1c2ccad94bf8 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -747,6 +747,5 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR(BATADV_DRIVER_AUTHOR);
 MODULE_DESCRIPTION(BATADV_DRIVER_DESC);
 MODULE_SUPPORTED_DEVICE(BATADV_DRIVER_DEVICE);
-MODULE_VERSION(BATADV_SOURCE_VERSION);
 MODULE_ALIAS_RTNL_LINK("batadv");
 MODULE_ALIAS_GENL_FAMILY(BATADV_NL_NAME);
-- 
2.11.0

[PATCH 5/7] net: bridge: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/bridge/br.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 401eeb9142eb..2502fdcbb8b2 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -399,5 +399,4 @@ static void __exit br_deinit(void)
 module_init(br_init)
 module_exit(br_deinit)
 MODULE_LICENSE("GPL");
-MODULE_VERSION(BR_VERSION);
 MODULE_ALIAS_RTNL_LINK("bridge");
-- 
2.11.0

[PATCH 3/7] net: ipv4: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt 
---
 net/ipv4/tcp_cubic.c| 1 -
 net/ipv4/tcp_illinois.c | 1 -
 net/ipv4/tcp_nv.c   | 1 -
 3 files changed, 3 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index c7bf5b26bf0c..c6bcd445df04 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -537,4 +537,3 @@ module_exit(cubictcp_unregister);
 MODULE_AUTHOR("Sangtae Ha, Stephen Hemminger");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("CUBIC TCP");
-MODULE_VERSION("2.3");
diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 00e54873213e..8cc9967e82ef 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -355,4 +355,3 @@ module_exit(tcp_illinois_unregister);
 MODULE_AUTHOR("Stephen Hemminger, Shao Liu");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("TCP Illinois");
-MODULE_VERSION("1.0");
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
index 95db7a11ba2a..b3879fb24d33 100644
--- a/net/ipv4/tcp_nv.c
+++ b/net/ipv4/tcp_nv.c
@@ -499,4 +499,3 @@ module_exit(tcpnv_unregister);
 MODULE_AUTHOR("Lawrence Brakmo");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("TCP NV");
-MODULE_VERSION("1.0");
-- 
2.11.0

[PATCH 6/7] net: vmw_vsock: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/vmw_vsock/af_vsock.c | 1 -
 net/vmw_vsock/hyperv_transport.c | 1 -
 net/vmw_vsock/vmci_transport.c   | 1 -
 3 files changed, 3 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab4526..cc196ffba3ed 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2238,5 +2238,4 @@ module_exit(vsock_exit);
 
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
-MODULE_VERSION("1.0.2.0-k");
 MODULE_LICENSE("GPL v2");
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 630b851f8150..dae562f40896 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -929,6 +929,5 @@ module_init(hvs_init);
 module_exit(hvs_exit);
 
 MODULE_DESCRIPTION("Hyper-V Sockets");
-MODULE_VERSION("1.0.0");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_NETPROTO(PF_VSOCK);
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 8b65323207db..bd39cca58ee6 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -2140,7 +2140,6 @@ module_exit(vmci_transport_exit);
 
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMCI transport for Virtual Sockets");
-MODULE_VERSION("1.0.5.0-k");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("vmware_vsock");
 MODULE_ALIAS_NETPROTO(PF_VSOCK);
-- 
2.11.0

[PATCH 7/7] net: tipc: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/tipc/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index c2ff42900b53..8c0c45347c53 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -227,4 +227,3 @@ module_exit(tipc_exit);
 
 MODULE_DESCRIPTION("TIPC: Transparent Inter Process Communication");
 MODULE_LICENSE("Dual BSD/GPL");
-MODULE_VERSION(TIPC_MOD_VER);
-- 
2.11.0

[PATCH 4/7] net: bluetooth: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult

Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/bluetooth/6lowpan.c  | 3 ---
 net/bluetooth/af_bluetooth.c | 1 -
 net/bluetooth/bnep/core.c| 1 -
 net/bluetooth/cmtp/core.c| 1 -
 net/bluetooth/hidp/core.c| 1 -
 net/bluetooth/rfcomm/core.c  | 1 -
 6 files changed, 8 deletions(-)

diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index cff4944d5b66..9759515edc6e 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -21,8 +21,6 @@
 
 #include  /* for the compression support */
 
-#define VERSION "0.1"
-
 static struct dentry *lowpan_enable_debugfs;
 static struct dentry *lowpan_control_debugfs;
 
@@ -1316,5 +1314,4 @@ module_exit(bt_6lowpan_exit);
 
 MODULE_AUTHOR("Jukka Rissanen ");
 MODULE_DESCRIPTION("Bluetooth 6LoWPAN");
-MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 4ef6a54403aa..6d587a3b3c56 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -799,6 +799,5 @@ module_exit(bt_exit);
 
 MODULE_AUTHOR("Marcel Holtmann ");
 MODULE_DESCRIPTION("Bluetooth Core ver " VERSION);
-MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_NETPROTO(PF_BLUETOOTH);
diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
index 43c284158f63..96f0eb60deb0 100644
--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -764,6 +764,5 @@ MODULE_PARM_DESC(compress_dst, "Compress destination headers");
 
 MODULE_AUTHOR("Marcel Holtmann ");
 MODULE_DESCRIPTION("Bluetooth BNEP ver " VERSION);
-MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("bt-proto-4");
diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index 07cfa3249f83..9a6306d7f738 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -511,6 +511,5 @@ module_exit(cmtp_exit);
 
 MODULE_AUTHOR("Marcel Holtmann ");
 MODULE_DESCRIPTION("Bluetooth CMTP ver " VERSION);
-MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("bt-proto-5");
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 3b4fa27a44e6..597f5dde434a 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -1475,6 +1475,5 @@ module_exit(hidp_exit);
 MODULE_AUTHOR("Marcel Holtmann ");
 MODULE_AUTHOR("David Herrmann ");
 MODULE_DESCRIPTION("Bluetooth HIDP ver " VERSION);
-MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("bt-proto-6");
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index f2bacb464ccf..3384da308a9b 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -2240,6 +2240,5 @@ MODULE_PARM_DESC(l2cap_ertm, "Use L2CAP ERTM mode for connection");
 
 MODULE_AUTHOR("Marcel Holtmann ");
 MODULE_DESCRIPTION("Bluetooth RFCOMM ver " VERSION);
-MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("bt-proto-3");
-- 
2.11.0

Re: [PATCH v2 1/2] net: dsa: add optional stats64 support

2020-12-02 Thread Vladimir Oltean

On Wed, Dec 02, 2020 at 01:07:11PM +0100, Oleksij Rempel wrote:
> Allow DSA drivers to export stats64
> 
> Signed-off-by: Oleksij Rempel 
> ---

Reviewed-by: Vladimir Oltean

Re: [PATCH net-next] net: sfp: add debugfs support

2020-12-02 Thread Russell King - ARM Linux admin

Jakub,

What's your opinion on this patch? It seems to have stalled...

Regards,
Russell

On Tue, Nov 24, 2020 at 12:46:40PM +0200, Ido Schimmel wrote:
> On Tue, Nov 24, 2020 at 09:49:16AM +, Russell King - ARM Linux admin wrote:
> > On Tue, Nov 24, 2020 at 10:41:51AM +0200, Ido Schimmel wrote:
> > > On Tue, Nov 24, 2020 at 01:14:31AM +0100, Andrew Lunn wrote:
> > > > On Mon, Nov 23, 2020 at 10:06:16PM +, Russell King wrote:
> > > > > Add debugfs support to SFP so that the internal state of the SFP state
> > > > > machines and hardware signal state can be viewed from userspace, rather
> > > > > than having to compile a debug kernel to view state transitions
> > > > > in the kernel log.  The 'state' output looks like:
> > > > > 
> > > > > Module state: empty
> > > > > Module probe attempts: 0 0
> > > > > Device state: up
> > > > > Main state: down
> > > > > Fault recovery remaining retries: 5
> > > > > PHY probe remaining retries: 12
> > > > > moddef0: 0
> > > > > rx_los: 1
> > > > > tx_fault: 1
> > > > > tx_disable: 1
> > > > > 
> > > > > Signed-off-by: Russell King 
> > > > 
> > > > Hi Russell
> > > > 
> > > > This looks useful. I always seem to end up recompiling the kernel,
> > > > which as you said, this should avoid.
> > > 
> > > FWIW, another option is to use drgn [1]. Especially when the state is
> > > queried from the kernel and not hardware. We are using that in mlxsw
> > > [2][3].
> > 
> > Presumably that requires /proc/kcore support, which 32-bit ARM doesn't
> > have.
> 
> Yes, it does seem to be required for live debugging. I mostly work with
> x86 systems, I guess it's completely different for Andrew and you.
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Re: [PATCH v2 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Vladimir Oltean

On Wed, Dec 02, 2020 at 01:07:12PM +0100, Oleksij Rempel wrote:
> Add stats support for the ar9331 switch.
> 
> Signed-off-by: Oleksij Rempel 
> ---
>  /* Warning: switch reset will reset last AR9331_SW_MDIO_PHY_MODE_PAGE request
> @@ -422,6 +527,7 @@ static void ar9331_sw_phylink_mac_link_down(struct dsa_switch *ds, int port,
>   phy_interface_t interface)
>  {
>   struct ar9331_sw_priv *priv = (struct ar9331_sw_priv *)ds->priv;
> + struct ar9331_sw_port *p = &priv->port[port];
>   struct regmap *regmap = priv->regmap;
>   int ret;
>  
> @@ -429,6 +535,8 @@ static void ar9331_sw_phylink_mac_link_down(struct dsa_switch *ds, int port,
>AR9331_SW_PORT_STATUS_MAC_MASK, 0);
>   if (ret)
>   dev_err_ratelimited(priv->dev, "%s: %i\n", __func__, ret);
> +
> + cancel_delayed_work_sync(&p->mib_read);

Is this sufficient? Do you get a guaranteed .phylink_mac_link_down event
on unbind? How do you ensure you don't race with the stats worker there?

> +static void ar9331_stats_update(struct ar9331_sw_port *port,
> + struct rtnl_link_stats64 *stats)
> +{
> + struct ar9331_sw_stats *s = &port->stats;
> +
> + stats->rx_packets = s->rxbroad + s->rxmulti + s->rx64byte +
> + s->rx128byte + s->rx256byte + s->rx512byte + s->rx1024byte +
> + s->rx1518byte + s->rxmaxbyte;
> + stats->tx_packets = s->txbroad + s->txmulti + s->tx64byte +
> + s->tx128byte + s->tx256byte + s->tx512byte + s->tx1024byte +
> + s->tx1518byte + s->txmaxbyte;
> + stats->rx_bytes = s->rxgoodbyte;
> + stats->tx_bytes = s->txbyte;
> + stats->rx_errors = s->rxfcserr + s->rxalignerr + s->rxrunt +
> + s->rxfragment + s->rxoverflow;
> + stats->tx_errors = s->txoversize;
> + stats->multicast = s->rxmulti;
> + stats->collisions = s->txcollision;
> + stats->rx_length_errors = s->rxrunt * s->rxfragment + s->rxtoolong;

Multiplication? Is this right?
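
For reference, if a plain sum was intended (an assumption; the exact
semantics of these hardware counters are not confirmed here), a minimal
userspace sketch could look like the following. The struct and function
names are hypothetical and are not the driver's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-port MIB counters named in the patch. */
struct mib_counters {
	uint64_t rxrunt;
	uint64_t rxfragment;
	uint64_t rxtoolong;
};

/* Assumed intent: every runt, fragment and too-long frame counts as one
 * length error, so the three counters are added rather than multiplied.
 */
static uint64_t rx_length_errors(const struct mib_counters *s)
{
	assert(s != NULL);

	return s->rxrunt + s->rxfragment + s->rxtoolong;
}
```

With the product, a port that saw runts but no fragments would report
only the too-long frames, which is why the expression looks suspicious.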

> + stats->rx_crc_errors = s->rxfcserr + s->rxalignerr + s->rxfragment;
> + stats->rx_frame_errors = s->rxalignerr;
> + stats->rx_missed_errors = s->rxoverflow;
> + stats->tx_aborted_errors = s->txabortcol;
> + stats->tx_fifo_errors = s->txunderrun;
> + stats->tx_window_errors = s->txlatecol;
> + stats->rx_nohandler = s->filtered;
> +}
> +
> +static void ar9331_do_stats_poll(struct work_struct *work)
> +{
> +

Could you remove this empty line?

Re: [PATCH 1/2][v3] e1000e: Leverage direct_complete to speed up s2ram

2020-12-02 Thread Kai-Heng Feng

Hi Yu,

> On Dec 1, 2020, at 09:21, Chen Yu  wrote:
> 
> The NIC is put in runtime suspend status when there is no cable connected.
> As a result, it is safe to keep non-wakeup NIC in runtime suspended during
> s2ram because the system does not rely on the NIC plug event nor WoL to wake
> up the system. Besides that, unlike the s2idle, s2ram does not need to
> manipulate S0ix settings during suspend.
> 
> This patch introduces the .prepare() for e1000e so that if the NIC is runtime
> suspended the subsequent suspend/resume hooks will be skipped so as to speed
> up the s2ram. The pm core will check whether the NIC is a wake up device so
> there's no need to check it again in .prepare(). DPM_FLAG_SMART_PREPARE flag
> should be set during probe to ask the pci subsystem to honor the driver's
> prepare() result. Besides, the NIC remains runtime suspended after resumed
> from s2ram as there is no need to resume it.
> 
> Tested on i7-2600K with 82579V NIC
> Before the patch:
> e1000e :00:19.0: pci_pm_suspend+0x0/0x160 returned 0 after 225146 usecs
> e1000e :00:19.0: pci_pm_resume+0x0/0x90 returned 0 after 140588 usecs
> 
> After the patch:
> echo disabled > //sys/devices/pci\:00/\:00\:19.0/power/wakeup
> becomes 0 usecs because the hooks will be skipped.
> 
> Suggested-by: Kai-Heng Feng 
> Signed-off-by: Chen Yu 

Well, I was intending to send it, but anyway :)

> ---
> v2: Added test data and some commit log revise(Paul Menzel)
>Only skip the suspend/resume if the NIC is not a wake up device specified
>by the user(Kai-Heng Feng)
> v3: Leverage direct complete mechanism to skip all hooks(Kai-Heng Feng)
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b30f00891c03..b210bba3f20a 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -25,6 +25,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include "e1000.h"
> 
> @@ -6957,6 +6958,12 @@ static int __e1000_resume(struct pci_dev *pdev)
>   return 0;
> }
> 
> +static int e1000e_pm_prepare(struct device *dev)
> +{
> + return pm_runtime_suspended(dev) &&
> + pm_suspend_via_firmware();
> +}
> +
> static __maybe_unused int e1000e_pm_suspend(struct device *dev)
> {
>   struct net_device *netdev = pci_get_drvdata(to_pci_dev(dev));
> @@ -7665,7 +7672,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> 
>   e1000_print_device_info(adapter);
> 
> - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
> + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_PREPARE);

This isn't required for pci_pm_prepare() to use the driver's .prepare callback.

> 
>   if (pci_dev_run_wake(pdev) && hw->mac.type < e1000_pch_cnp)
>   pm_runtime_put_noidle(&pdev->dev);
> @@ -7890,6 +7897,7 @@ MODULE_DEVICE_TABLE(pci, e1000_pci_tbl);
> 
> static const struct dev_pm_ops e1000_pm_ops = {
> #ifdef CONFIG_PM_SLEEP
> + .prepare= e1000e_pm_prepare,

How do we make sure a link change that happened in S3 can be detected after
resume, without a .complete callback which asks the device to runtime resume?

Kai-Heng

>   .suspend= e1000e_pm_suspend,
>   .resume = e1000e_pm_resume,
>   .freeze = e1000e_pm_freeze,
> -- 
> 2.17.1
>

[net-next v4 3/8] seg6: add support for optional attributes in SRv6 behaviors

2020-12-02 Thread Andrea Mayer

Before this patch, each SRv6 behavior specifies a set of required
attributes that must be provided by the userspace application when such
behavior is going to be instantiated. If at least one of the required
attributes is not provided, the creation of the behavior fails.

The SRv6 behavior framework lacks a way to manage optional attributes.
By definition, an optional attribute for a SRv6 behavior consists of an
attribute which may or may not be provided by the userspace. Therefore,
if an optional attribute is missing (and thus not supplied by the user)
the creation of the behavior goes ahead without any issue.

This patch explicitly differentiates the required attributes from the
optional attributes. In particular, each behavior can declare a set of
required attributes and a set of optional ones.

The semantics of the required attributes remain *totally* unaffected by
this patch. The introduction of the optional attributes does NOT impact
the backward compatibility of the existing SRv6 behaviors.

It is essential to note that if an (optional or required) attribute is
supplied to a SRv6 behavior which does not expect it, the behavior
simply discards such attribute without generating any error or warning.
This operating mode remained unchanged both before and after the
introduction of the optional attributes extension.
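
The rule described above can be sketched in plain userspace C. This is
only an illustration under assumptions: the attribute IDs, struct
behavior_desc and parse_attrs() are hypothetical names, and the real code
in seg6_local.c walks netlink attributes rather than a ready-made bitmask.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute IDs; each one is encoded as the flag (1UL << ID). */
enum { ATTR_SRH, ATTR_TABLE, ATTR_VRFTABLE, ATTR_MAX };

struct behavior_desc {
	unsigned long attrs;     /* required attributes */
	unsigned long optattrs;  /* optional attributes */
};

/* 'supplied' is the set of attributes provided by userspace.
 * Returns 0 and stores the optional attributes actually seen in
 * *parsed_optattrs, or -EINVAL if a required attribute is missing.
 * Attributes the behavior does not expect are silently ignored.
 */
static int parse_attrs(const struct behavior_desc *desc,
		       unsigned long supplied,
		       unsigned long *parsed_optattrs)
{
	assert(desc != NULL && parsed_optattrs != NULL);
	/* an attribute cannot be both required and optional */
	assert((desc->attrs & desc->optattrs) == 0);

	if ((supplied & desc->attrs) != desc->attrs)
		return -EINVAL;

	*parsed_optattrs = supplied & desc->optattrs;
	return 0;
}
```

A missing optional attribute leaves its bit clear in the result and is
not an error, while a missing required attribute still fails the
creation, which is the backward-compatible part of the design.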

The optional attributes are one of the key components used to implement
the SRv6 End.DT6 behavior based on the Virtual Routing and Forwarding
(VRF) framework. The optional attributes make possible the coexistence
of the already existing SRv6 End.DT6 implementation with the new SRv6
End.DT6 VRF-based implementation without breaking any backward
compatibility. Further details on the SRv6 End.DT6 behavior (VRF mode)
are reported in subsequent patches.

From the userspace point of view, the support for optional attributes DOES
NOT require any changes to the userspace applications, i.e. iproute2,
unless new attributes (required or optional) are needed.

Signed-off-by: Andrea Mayer 
---
 net/ipv6/seg6_local.c | 120 +-
 1 file changed, 106 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index aef39eab9be2..3b5657c622a0 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -36,6 +36,21 @@ struct seg6_local_lwt;
 struct seg6_action_desc {
int action;
unsigned long attrs;
+
+   /* The optattrs field is used for specifying all the optional
+* attributes supported by a specific behavior.
+* It means that if one of these attributes is not provided in the
+* netlink message during the behavior creation, no errors will be
+* returned to the userspace.
+*
+* Each attribute can be only of two types (mutually exclusive):
+* 1) required or 2) optional.
+* Every user MUST obey to this rule! If you set an attribute as
+* required the same attribute CANNOT be set as optional and vice
+* versa.
+*/
+   unsigned long optattrs;
+
int (*input)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
int static_headroom;
 };
@@ -57,6 +72,10 @@ struct seg6_local_lwt {
 
int headroom;
struct seg6_action_desc *desc;
+   /* unlike the required attrs, we have to track the optional attributes
+* that have been effectively parsed.
+*/
+   unsigned long parsed_optattrs;
 };
 
 static struct seg6_local_lwt *seg6_local_lwtunnel(struct lwtunnel_state *lwt)
@@ -959,26 +978,26 @@ static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = {
 };
 
 /* call the destroy() callback (if available) for each set attribute in
- * @slwt, starting from the first attribute up to the @max_parsed (excluded)
- * attribute.
+ * @parsed_attrs, starting from the first attribute up to the @max_parsed
+ * (excluded) attribute.
  */
-static void __destroy_attrs(int max_parsed, struct seg6_local_lwt *slwt)
+static void __destroy_attrs(unsigned long parsed_attrs, int max_parsed,
+   struct seg6_local_lwt *slwt)
 {
-   unsigned long attrs = slwt->desc->attrs;
struct seg6_action_param *param;
int i;
 
/* Every required seg6local attribute is identified by an ID which is
 * encoded as a flag (i.e: 1 << ID) in the 'attrs' bitmask;
 *
-* We scan the 'attrs' bitmask, starting from the first attribute
+* We scan the 'parsed_attrs' bitmask, starting from the first attribute
 * up to the @max_parsed (excluded) attribute.
 * For each set attribute, we retrieve the corresponding destroy()
 * callback. If the callback is not available, then we skip to the next
 * attribute; otherwise, we call the destroy() callback.
 */
for (i = 0; i < max_parsed; ++i) {
-   if (!(attrs & (1 << i)))
+   if (!(parsed_attrs & (1 << i)))
co

[net-next v4 0/8] seg6: add support for SRv6 End.DT4/DT6 behavior

2020-12-02 Thread Andrea Mayer

This patchset provides support for the SRv6 End.DT4 and End.DT6 (VRF mode)
behaviors.

The SRv6 End.DT4 behavior is used to implement multi-tenant IPv4 L3 VPNs. It
decapsulates the received packets and performs IPv4 routing lookup in the
routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a
VRF device in order to force the routing lookup into the associated routing
table.
The SRv6 End.DT4 behavior is defined in the SRv6 Network Programming [1].

The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior
which allows us to set up IPv6 L3 VPNs over SRv6 networks. This new
implementation of DT6 is based on the same VRF infrastructure already exploited
for implementing the SRv6 End.DT4 behavior. The aim of the new SRv6 End.DT6 in
VRF mode consists in simplifying the construction of IPv6 L3 VPN services in
the multi-tenant environment.
Currently, the two SRv6 End.DT6 implementations (legacy and VRF mode)
coexist seamlessly and can be chosen according to the context and the user
preferences.

- Patch 1 is needed to solve a pre-existing issue with tunneled packets
  when a sniffer is attached;

- Patch 2 improves the management of the seg6local attributes used by the
  SRv6 behaviors;

- Patch 3 adds support for optional attributes in SRv6 behaviors;

- Patch 4 introduces two callbacks used for customizing the
  creation/destruction of a SRv6 behavior;

- Patch 5 is the core patch that adds support for the SRv6 End.DT4
  behavior;

- Patch 6 introduces the VRF support for SRv6 End.DT6 behavior;

- Patch 7 adds the selftest for SRv6 End.DT4 behavior;

- Patch 8 adds the selftest for SRv6 End.DT6 (VRF mode) behavior.

Regarding iproute2, the support for the new "vrftable" attribute, required by
both SRv6 End.DT4 and End.DT6 (VRF mode) behaviors, is provided in a different
patchset that will follow shortly.

I would like to thank David Ahern for his support during the development of
this patchset.

Comments, suggestions and improvements are very welcome!

Thanks,
Andrea Mayer

v4
 seg6: add support for the SRv6 End.DT4 behavior
  - remove IS_ERR() checks in cmp_nla_vrftable(), thanks to Jakub Kicinski.

 remove patch for iproute2:
  - mixing the iproute2 patch with this patchset confused patchwork.

v3
 notes about the build bot:
  - apparently the ',' (comma) in the subject prefix confused the build bot.
Removed the ',' in favor of ' ' (space). 

Thanks to David Ahern and Konstantin Ryabitsev for shedding light on this
fact.
Thanks also to Nathan Chancellor for trying to build the patchset v2 by
simulating the bot issue.

 add new patch for iproute2:
  - [9/9] seg6: add support for vrftable attribute in End.DT4/DT6 behaviors

 add new patch:
  -  [8/9] selftests: add selftest for the SRv6 End.DT6 (VRF) behavior

 add new patch:
  - [6/9] seg6: add VRF support for SRv6 End.DT6 behavior

 add new patch:
  - [3/9] seg6: add support for optional attributes in SRv6 behaviors

 selftests: add selftest for the SRv6 End.DT4 behavior
  - keep David Ahern's review tag since the code wasn't changed. Thanks to
    David Ahern for his review.

 seg6: add support for the SRv6 End.DT4 behavior
  - remove useless error in seg6_end_dt4_build();
  - remove #ifdef/#endif stubs for DT4 when CONFIG_NET_L3_MASTER_DEV is not
defined;
  - fix coding style.

Thanks to Jakub Kicinski for his review and for all his suggestions.

 seg6: add callbacks for customizing the creation/destruction of a behavior
  - remove typedef(s) slwt_{build/destroy}_state_t;
  - fix coding style: remove empty lines, trivial comments and rename labels in
the seg6_local_build_state() function.

Thanks to Jakub Kicinski for his review and for all his suggestions.

 seg6: improve management of behavior attributes
  - remove defensive programming approach in destroy_attr_srh(),
destroy_attr_bpf() and destroy_attrs();
  - change the __destroy_attrs() function signature, renaming the 'end'
    argument to 'parsed_max'. Now, the __destroy_attrs() keeps only the
    'parsed_max' and 'slwt' arguments.

Thanks to Jakub Kicinski for his review and for all his suggestions.

 vrf: add mac header for tunneled packets when sniffer is attached
  - keep David Ahern's review tag since the code wasn't changed. 

Thanks to Jakub Kicinski for pointing it out and David Ahern for his review.

v2
 no changes made: resubmitted after false build report.

v1
 improve comments;

 add new patch 2/5 titled: seg6: improve management of behavior attributes

 seg6: add support for the SRv6 End.DT4 behavior
  - remove the inline keyword in the definition of fib6_config_get_net().

 selftests: add selftest for the SRv6 End.DT4 behavior
  - add check for the vrf sysctl

[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming

Andrea Mayer (8):
  vrf: add mac header for tunneled packets when sniffer is attached
  seg6: improve management of behavior attributes
  seg6:

[net-next v4 1/8] vrf: add mac header for tunneled packets when sniffer is attached

2020-12-02 Thread Andrea Mayer

Before this patch, a sniffer attached to a VRF used as the receiving
interface of L3 tunneled packets detects them as malformed packets and
it complains about that (i.e.: tcpdump shows bogus packets).

The reason is that a tunneled L3 packet does not carry any L2
information and when the VRF is set as the receiving interface of a
decapsulated L3 packet, no mac header is currently set or valid.
Therefore, the purpose of this patch consists of adding a MAC header to
any packet which is directly received on the VRF interface ONLY IF:

 i) a sniffer is attached on the VRF and ii) the mac header is not set.

In this case, the mac address of the VRF is copied in both the
destination and the source address of the ethernet header. The protocol
type is set either to IPv4 or IPv6, depending on which L3 packet is
received.
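
As a rough userspace illustration of the header described above (not the
kernel code, which operates on an skb): the struct, macro and function
names below are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN     6
#define PROTO_IPV4  0x0800  /* EtherType for IPv4 */
#define PROTO_IPV6  0x86DD  /* EtherType for IPv6 */

/* Hypothetical on-wire Ethernet header layout (14 bytes). */
struct eth_hdr {
	uint8_t dest[MAC_LEN];
	uint8_t source[MAC_LEN];
	uint8_t proto[2];  /* EtherType, network byte order */
};

/* Fill 'eth' the way the commit message describes: the VRF's own MAC
 * address is used as both destination and source, and the EtherType
 * reflects the L3 protocol of the received packet.
 */
static void fill_vrf_mac_header(struct eth_hdr *eth,
				const uint8_t vrf_mac[MAC_LEN],
				uint16_t proto)
{
	assert(eth != NULL && vrf_mac != NULL);
	assert(proto == PROTO_IPV4 || proto == PROTO_IPV6);

	memcpy(eth->dest, vrf_mac, MAC_LEN);
	memcpy(eth->source, vrf_mac, MAC_LEN);
	eth->proto[0] = (uint8_t)(proto >> 8);    /* high byte first */
	eth->proto[1] = (uint8_t)(proto & 0xff);
}
```

Because the destination equals the address of the VRF device, a sniffer
sees a well-formed frame addressed to the local host.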

Signed-off-by: Andrea Mayer 
Reviewed-by: David Ahern 
---
 drivers/net/vrf.c | 78 +++
 1 file changed, 72 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f8d711a84763..259d5cbacf2c 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1310,6 +1310,61 @@ static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
skb_dst_set(skb, &rt6->dst);
 }
 
+static int vrf_prepare_mac_header(struct sk_buff *skb,
+ struct net_device *vrf_dev, u16 proto)
+{
+   struct ethhdr *eth;
+   int err;
+
+   /* in general, we do not know if there is enough space in the head of
+* the packet for hosting the mac header.
+*/
+   err = skb_cow_head(skb, LL_RESERVED_SPACE(vrf_dev));
+   if (unlikely(err))
+   /* no space in the skb head */
+   return -ENOBUFS;
+
+   __skb_push(skb, ETH_HLEN);
+   eth = (struct ethhdr *)skb->data;
+
+   skb_reset_mac_header(skb);
+
+   /* we set the ethernet destination and the source addresses to the
+* address of the VRF device.
+*/
+   ether_addr_copy(eth->h_dest, vrf_dev->dev_addr);
+   ether_addr_copy(eth->h_source, vrf_dev->dev_addr);
+   eth->h_proto = htons(proto);
+
+   /* the destination address of the Ethernet frame corresponds to the
+* address set on the VRF interface; therefore, the packet is intended
+* to be processed locally.
+*/
+   skb->protocol = eth->h_proto;
+   skb->pkt_type = PACKET_HOST;
+
+   skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+
+   skb_pull_inline(skb, ETH_HLEN);
+
+   return 0;
+}
+
+/* prepare and add the mac header to the packet if it was not set previously.
+ * In this way, packet sniffers such as tcpdump can parse the packet correctly.
+ * If the mac header was already set, the original mac header is left
+ * untouched and the function returns immediately.
+ */
+static int vrf_add_mac_header_if_unset(struct sk_buff *skb,
+  struct net_device *vrf_dev,
+  u16 proto)
+{
+   if (skb_mac_header_was_set(skb))
+   return 0;
+
+   return vrf_prepare_mac_header(skb, vrf_dev, proto);
+}
+
 static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
   struct sk_buff *skb)
 {
@@ -1336,9 +1391,15 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
skb->skb_iif = vrf_dev->ifindex;
 
if (!list_empty(&vrf_dev->ptype_all)) {
-   skb_push(skb, skb->mac_len);
-   dev_queue_xmit_nit(skb, vrf_dev);
-   skb_pull(skb, skb->mac_len);
+   int err;
+
+   err = vrf_add_mac_header_if_unset(skb, vrf_dev,
+ ETH_P_IPV6);
+   if (likely(!err)) {
+   skb_push(skb, skb->mac_len);
+   dev_queue_xmit_nit(skb, vrf_dev);
+   skb_pull(skb, skb->mac_len);
+   }
}
 
IP6CB(skb)->flags |= IP6SKB_L3SLAVE;
@@ -1381,9 +1442,14 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev,
vrf_rx_stats(vrf_dev, skb->len);
 
if (!list_empty(&vrf_dev->ptype_all)) {
-   skb_push(skb, skb->mac_len);
-   dev_queue_xmit_nit(skb, vrf_dev);
-   skb_pull(skb, skb->mac_len);
+   int err;
+
+   err = vrf_add_mac_header_if_unset(skb, vrf_dev, ETH_P_IP);
+   if (likely(!err)) {
+   skb_push(skb, skb->mac_len);
+   dev_queue_xmit_nit(skb, vrf_dev);
+   skb_pull(skb, skb->mac_len);
+   }
}
 
skb = vrf_rcv_nfhook(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, vrf_dev);
-- 
2.20.1

[net-next v4 6/8] seg6: add VRF support for SRv6 End.DT6 behavior

2020-12-02 Thread Andrea Mayer

SRv6 End.DT6 is defined in the SRv6 Network Programming [1].

The Linux kernel already offers an implementation of the SRv6
End.DT6 behavior which permits IPv6 L3 VPNs over SRv6 networks. This
implementation is not particularly suitable in contexts where we need to
deploy IPv6 L3 VPNs among different tenants which share the same network
address schemes. The underlying problem lies in the fact that the
current version of DT6 (called legacy DT6 from now on) needs a complex
configuration to be applied on routers which requires ad-hoc routes and
routing policy rules to ensure the correct isolation of tenants.

Consequently, a new implementation of DT6 has been introduced with the
aim of simplifying the construction of IPv6 L3 VPN services in the
multi-tenant environment using SRv6 networks. To accomplish this task,
we reused the same VRF infrastructure and SRv6 core components already
exploited for implementing the SRv6 End.DT4 behavior.

Currently the two End.DT6 implementations coexist seamlessly and can be
used depending on the context and the user preferences. So, in order to
support both versions of DT6 a new attribute (vrftable) has been
introduced which allows us to differentiate the implementation of the
behavior to be used.

A SRv6 End.DT6 legacy behavior is still instantiated using a command
like the following one:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table 100 dev eth0

To instantiate the SRv6 End.DT6 in VRF mode, the command is still
pretty straightforward:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 vrftable 100 dev eth0

Obviously as in the case of SRv6 End.DT4, the VRF strict_mode parameter
must be set (net.vrf.strict_mode=1) and the VRF associated with table
100 must exist.

Please note that the instances of SRv6 End.DT6 legacy and End.DT6 VRF
mode can coexist in the same system/configuration without problems.
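
The way the two modes are told apart can be sketched in userspace C as
follows. The enum values mirror the names used in the patch, while the
OPT_* flags and dt6_parse_mode() are hypothetical stand-ins for the
parsed optional attributes and the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>

enum dt_mode { DT_INVALID_MODE, DT_LEGACY_MODE, DT_VRF_MODE };

/* Hypothetical flags standing in for the parsed optional attributes. */
#define OPT_TABLE     (1UL << 0)  /* "table" was supplied */
#define OPT_VRFTABLE  (1UL << 1)  /* "vrftable" was supplied */

/* Exactly one of "table" and "vrftable" must be present: "table" selects
 * the legacy End.DT6, "vrftable" selects the VRF-based one.
 */
static enum dt_mode dt6_parse_mode(unsigned long parsed_optattrs)
{
	bool legacy, vrfmode;

	/* in this sketch only the two DT6 optional attributes may be set */
	assert((parsed_optattrs & ~(OPT_TABLE | OPT_VRFTABLE)) == 0);

	legacy  = (parsed_optattrs & OPT_TABLE) != 0;
	vrfmode = (parsed_optattrs & OPT_VRFTABLE) != 0;

	if (legacy == vrfmode)
		/* both absent or both present */
		return DT_INVALID_MODE;

	return legacy ? DT_LEGACY_MODE : DT_VRF_MODE;
}
```

Existing configurations that pass only "table" keep resolving to the
legacy mode, which is how backward compatibility is preserved.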

[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming

Signed-off-by: Andrea Mayer 
---
 net/ipv6/seg6_local.c | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 24c2616c8c11..b07f7c1c82a4 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -497,6 +497,10 @@ static int __seg6_end_dt_vrf_build(struct seg6_local_lwt *slwt, const void *cfg,
info->proto = htons(ETH_P_IP);
info->hdrlen = sizeof(struct iphdr);
break;
+   case AF_INET6:
+   info->proto = htons(ETH_P_IPV6);
+   info->hdrlen = sizeof(struct ipv6hdr);
+   break;
default:
return -EINVAL;
}
@@ -649,6 +653,47 @@ static int seg6_end_dt4_build(struct seg6_local_lwt *slwt, const void *cfg,
 {
return __seg6_end_dt_vrf_build(slwt, cfg, AF_INET, extack);
 }
+
+static enum
+seg6_end_dt_mode seg6_end_dt6_parse_mode(struct seg6_local_lwt *slwt)
+{
+   unsigned long parsed_optattrs = slwt->parsed_optattrs;
+   bool legacy, vrfmode;
+
+   legacy  = !!(parsed_optattrs & (1 << SEG6_LOCAL_TABLE));
+   vrfmode = !!(parsed_optattrs & (1 << SEG6_LOCAL_VRFTABLE));
+
+   if (!(legacy ^ vrfmode))
+   /* both are absent or present: invalid DT6 mode */
+   return DT_INVALID_MODE;
+
+   return legacy ? DT_LEGACY_MODE : DT_VRF_MODE;
+}
+
+static enum seg6_end_dt_mode seg6_end_dt6_get_mode(struct seg6_local_lwt *slwt)
+{
+   struct seg6_end_dt_info *info = &slwt->dt_info;
+
+   return info->mode;
+}
+
+static int seg6_end_dt6_build(struct seg6_local_lwt *slwt, const void *cfg,
+ struct netlink_ext_ack *extack)
+{
+   enum seg6_end_dt_mode mode = seg6_end_dt6_parse_mode(slwt);
+   struct seg6_end_dt_info *info = &slwt->dt_info;
+
+   switch (mode) {
+   case DT_LEGACY_MODE:
+   info->mode = DT_LEGACY_MODE;
+   return 0;
+   case DT_VRF_MODE:
+   return __seg6_end_dt_vrf_build(slwt, cfg, AF_INET6, extack);
+   default:
+   NL_SET_ERR_MSG(extack, "table or vrftable must be specified");
+   return -EINVAL;
+   }
+}
 #endif
 
 static int input_action_end_dt6(struct sk_buff *skb,
@@ -660,6 +705,28 @@ static int input_action_end_dt6(struct sk_buff *skb,
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
goto drop;
 
+#ifdef CONFIG_NET_L3_MASTER_DEV
+   if (seg6_end_dt6_get_mode(slwt) == DT_LEGACY_MODE)
+   goto legacy_mode;
+
+   /* DT6_VRF_MODE */
+   skb = end_dt_vrf_core(skb, slwt);
+   if (!skb)
+   /* packet has been processed and consumed by the VRF */
+   return 0;
+
+   if (IS_ERR(skb))
+   return PTR_ERR(skb);
+
+   /* note: this time we do not need to specify the table because the VRF
+* takes care of selecting the correct table.
+*/
+   seg6_lookup_any_nexthop(sk

[net-next v4 4/8] seg6: add callbacks for customizing the creation/destruction of a behavior

2020-12-02 Thread Andrea Mayer

We introduce two callbacks used for customizing the creation/destruction of
a SRv6 behavior. Such callbacks are defined in the new struct
seg6_local_lwtunnel_ops and hereafter we provide a brief description of
them:

 - build_state(...): used for calling the custom constructor of the
   behavior during its initialization phase and after all the attributes
   have been parsed successfully;

 - destroy_state(...): used for calling the custom destructor of the
   behavior before it is completely destroyed.
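
The optional-callback pattern described by the two items above can be
sketched in userspace C. All names here (struct behavior, behavior_ops,
the demo callbacks) are hypothetical simplifications; the kernel version
also passes a netlink extack argument, which is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct behavior;  /* hypothetical stand-in for struct seg6_local_lwt */

/* Both callbacks are optional: a NULL pointer means "nothing to do". */
struct behavior_ops {
	int (*build_state)(struct behavior *b, const void *cfg);
	void (*destroy_state)(struct behavior *b);
};

struct behavior {
	struct behavior_ops ops;
	int state;  /* hypothetical private state owned by the callbacks */
};

/* Run the custom constructor, if any, once attribute parsing succeeded. */
static int behavior_build_state(struct behavior *b, const void *cfg)
{
	assert(b != NULL);

	if (!b->ops.build_state)
		return 0;

	return b->ops.build_state(b, cfg);
}

/* Run the custom destructor, if any, before the behavior is torn down. */
static void behavior_destroy_state(struct behavior *b)
{
	assert(b != NULL);

	if (b->ops.destroy_state)
		b->ops.destroy_state(b);
}

/* Example callbacks for a behavior that keeps private state. */
static int demo_build(struct behavior *b, const void *cfg)
{
	(void)cfg;
	b->state = 1;
	return 0;
}

static void demo_destroy(struct behavior *b)
{
	b->state = 0;
}
```

Behaviors that do not need custom setup leave both pointers NULL and pay
nothing beyond the pointer checks.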

Signed-off-by: Andrea Mayer 
---
 net/ipv6/seg6_local.c | 49 +++
 1 file changed, 49 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 3b5657c622a0..da5bf4167a52 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -33,6 +33,13 @@
 
 struct seg6_local_lwt;
 
+/* callbacks used for customizing the creation and destruction of a behavior */
+struct seg6_local_lwtunnel_ops {
+   int (*build_state)(struct seg6_local_lwt *slwt, const void *cfg,
+  struct netlink_ext_ack *extack);
+   void (*destroy_state)(struct seg6_local_lwt *slwt);
+};
+
 struct seg6_action_desc {
int action;
unsigned long attrs;
@@ -53,6 +60,8 @@ struct seg6_action_desc {
 
int (*input)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
int static_headroom;
+
+   struct seg6_local_lwtunnel_ops slwt_ops;
 };
 
 struct bpf_lwt_prog {
@@ -1055,6 +1064,38 @@ static int parse_nla_optional_attrs(struct nlattr **attrs,
return err;
 }
 
+/* call the custom constructor of the behavior during its initialization phase
+ * and after that all its attributes have been parsed successfully.
+ */
+static int
+seg6_local_lwtunnel_build_state(struct seg6_local_lwt *slwt, const void *cfg,
+   struct netlink_ext_ack *extack)
+{
+   struct seg6_action_desc *desc = slwt->desc;
+   struct seg6_local_lwtunnel_ops *ops;
+
+   ops = &desc->slwt_ops;
+   if (!ops->build_state)
+   return 0;
+
+   return ops->build_state(slwt, cfg, extack);
+}
+
+/* call the custom destructor of the behavior which is invoked before the
+ * tunnel is going to be destroyed.
+ */
+static void seg6_local_lwtunnel_destroy_state(struct seg6_local_lwt *slwt)
+{
+   struct seg6_action_desc *desc = slwt->desc;
+   struct seg6_local_lwtunnel_ops *ops;
+
+   ops = &desc->slwt_ops;
+   if (!ops->destroy_state)
+   return;
+
+   ops->destroy_state(slwt);
+}
+
 static int parse_nla_action(struct nlattr **attrs, struct seg6_local_lwt *slwt)
 {
struct seg6_action_param *param;
@@ -1154,6 +1195,10 @@ static int seg6_local_build_state(struct net *net, struct nlattr *nla,
if (err < 0)
goto out_free;
 
+   err = seg6_local_lwtunnel_build_state(slwt, cfg, extack);
+   if (err < 0)
+   goto out_destroy_attrs;
+
newts->type = LWTUNNEL_ENCAP_SEG6_LOCAL;
newts->flags = LWTUNNEL_STATE_INPUT_REDIRECT;
newts->headroom = slwt->headroom;
@@ -1162,6 +1207,8 @@ static int seg6_local_build_state(struct net *net, struct nlattr *nla,
 
return 0;
 
+out_destroy_attrs:
+   destroy_attrs(slwt);
 out_free:
kfree(newts);
return err;
@@ -1171,6 +1218,8 @@ static void seg6_local_destroy_state(struct lwtunnel_state *lwt)
 {
struct seg6_local_lwt *slwt = seg6_local_lwtunnel(lwt);
 
+   seg6_local_lwtunnel_destroy_state(slwt);
+
destroy_attrs(slwt);
 
return;
-- 
2.20.1

[net-next v4 8/8] selftests: add selftest for the SRv6 End.DT6 (VRF) behavior

2020-12-02 Thread Andrea Mayer

This selftest is designed for evaluating the new SRv6 End.DT6 (VRF) behavior
used, in this example, for implementing IPv6 L3 VPN use cases.

Signed-off-by: Andrea Mayer 
Signed-off-by: Paolo Lungaroni 
---
 .../selftests/net/srv6_end_dt6_l3vpn_test.sh  | 502 ++
 1 file changed, 502 insertions(+)
 create mode 100755 tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh

diff --git a/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
new file mode 100755
index ..68708f5e26a0
--- /dev/null
+++ b/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
@@ -0,0 +1,502 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# author: Andrea Mayer 
+# author: Paolo Lungaroni 
+
+# This test is designed for evaluating the new SRv6 End.DT6 behavior used for
+# implementing IPv6 L3 VPN use cases.
+#
+# Hereafter a network diagram is shown, where two different tenants (named 100
+# and 200) offer IPv6 L3 VPN services allowing hosts to communicate with each
+# other across an IPv6 network.
+#
+# Only hosts belonging to the same tenant (and to the same VPN) can communicate
+# with each other. Instead, the communication among hosts of different tenants
+# is forbidden.
+# In other words, hosts hs-t100-1 and hs-t100-2 are connected through the IPv6
+# L3 VPN of tenant 100 while hs-t200-3 and hs-t200-4 are connected using the
+# IPv6 L3 VPN of tenant 200. Cross connection between tenant 100 and tenant 200
+# is forbidden and thus, for example, hs-t100-1 cannot reach hs-t200-3 and vice
+# versa.
+#
+# Routers rt-1 and rt-2 implement IPv6 L3 VPN services leveraging the SRv6
+# architecture. The key components for such VPNs are: a) SRv6 Encap behavior,
+# b) SRv6 End.DT6 behavior and c) VRF.
+#
+# To explain how an IPv6 L3 VPN based on SRv6 works, let us briefly consider an
+# example where, within the same domain of tenant 100, the host hs-t100-1 pings
+# the host hs-t100-2.
+#
+# First of all, L2 reachability of the host hs-t100-2 is taken into account by
+# the router rt-1 which acts as a ndp proxy.
+#
+# When the host hs-t100-1 sends an IPv6 packet destined to hs-t100-2, the
+# router rt-1 receives the packet on the internal veth-t100 interface. Such
+# interface is enslaved to the VRF vrf-100 whose associated table contains the
+# SRv6 Encap route for encapsulating any IPv6 packet in a IPv6 plus the Segment
+# Routing Header (SRH) packet. This packet is sent through the (IPv6) core
+# network up to the router rt-2 that receives it on veth0 interface.
+#
+# The rt-2 router uses the 'localsid' routing table to process incoming
+# IPv6+SRH packets which belong to the VPN of the tenant 100. For each of these
+# packets, the SRv6 End.DT6 behavior removes the outer IPv6+SRH headers and
+# performs the lookup on the vrf-100 table using the destination address of
+# the decapsulated IPv6 packet. Afterwards, the packet is sent to the host
+# hs-t100-2 through the veth-t100 interface.
+#
+# The ping response follows the same processing but this time the roles of rt-1
+# and rt-2 are swapped.
+#
+# Of course, the IPv6 L3 VPN for tenant 200 works exactly as the IPv6 L3 VPN
+# for tenant 100. In this case, only hosts hs-t200-3 and hs-t200-4 are able to
+# connect with each other.
+#
+#
+# [Network diagram, garbled and truncated in this archive. Recoverable labels:
+#  hosts hs-t100-1 (veth0, cafe::1/64) and hs-t100-2 (veth0, cafe::2/64), each
+#  in its own netns and attached to a router-side veth-t100 (cafe::254/64);
+#  each router also holds a localsid table and a vrf-100 device.]

[net-next v4 7/8] selftests: add selftest for the SRv6 End.DT4 behavior

2020-12-02 Thread Andrea Mayer

This selftest is designed for evaluating the new SRv6 End.DT4 behavior
used, in this example, for implementing IPv4 L3 VPN use cases.

Signed-off-by: Andrea Mayer 
Reviewed-by: David Ahern 
---
 .../selftests/net/srv6_end_dt4_l3vpn_test.sh  | 494 ++
 1 file changed, 494 insertions(+)
 create mode 100755 tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh

diff --git a/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
new file mode 100755
index ..ad7a9fc59934
--- /dev/null
+++ b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
@@ -0,0 +1,494 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# author: Andrea Mayer 
+
+# This test is designed for evaluating the new SRv6 End.DT4 behavior used for
+# implementing IPv4 L3 VPN use cases.
+#
+# Hereafter a network diagram is shown, where two different tenants (named 100
+# and 200) offer IPv4 L3 VPN services allowing hosts to communicate with each
+# other across an IPv6 network.
+#
+# Only hosts belonging to the same tenant (and to the same VPN) can communicate
+# with each other. Instead, the communication among hosts of different tenants
+# is forbidden.
+# In other words, hosts hs-t100-1 and hs-t100-2 are connected through the IPv4
+# L3 VPN of tenant 100 while hs-t200-3 and hs-t200-4 are connected using the
+# IPv4 L3 VPN of tenant 200. Cross connection between tenant 100 and tenant 200
+# is forbidden and thus, for example, hs-t100-1 cannot reach hs-t200-3 and vice
+# versa.
+#
+# Routers rt-1 and rt-2 implement IPv4 L3 VPN services leveraging the SRv6
+# architecture. The key components for such VPNs are: a) SRv6 Encap behavior,
+# b) SRv6 End.DT4 behavior and c) VRF.
+#
+# To explain how an IPv4 L3 VPN based on SRv6 works, let us briefly consider an
+# example where, within the same domain of tenant 100, the host hs-t100-1 pings
+# the host hs-t100-2.
+#
+# First of all, L2 reachability of the host hs-t100-2 is taken into account by
+# the router rt-1 which acts as an arp proxy.
+#
+# When the host hs-t100-1 sends an IPv4 packet destined to hs-t100-2, the
+# router rt-1 receives the packet on the internal veth-t100 interface. Such
+# interface is enslaved to the VRF vrf-100 whose associated table contains the
+# SRv6 Encap route for encapsulating any IPv4 packet in an IPv6 plus Segment
+# Routing Header (SRH) packet. This packet is sent through the (IPv6) core
+# network up to the router rt-2 that receives it on the veth0 interface.
+#
+# The rt-2 router uses the 'localsid' routing table to process incoming
+# IPv6+SRH packets which belong to the VPN of the tenant 100. For each of these
+# packets, the SRv6 End.DT4 behavior removes the outer IPv6+SRH headers and
+# performs the lookup on the vrf-100 table using the destination address of
+# the decapsulated IPv4 packet. Afterwards, the packet is sent to the host
+# hs-t100-2 through the veth-t100 interface.
+#
+# The ping response follows the same processing but this time the roles of rt-1
+# and rt-2 are swapped.
+#
+# Of course, the IPv4 L3 VPN for tenant 200 works exactly as the IPv4 L3 VPN
+# for tenant 100. In this case, only hosts hs-t200-3 and hs-t200-4 are able to
+# connect with each other.
+#
+#
+# [Network diagram, garbled and truncated in this archive. Recoverable labels:
+#  hosts hs-t100-1 (veth0, 10.0.0.1/24) and hs-t100-2 (veth0, 10.0.0.2/24),
+#  each in its own netns and attached to a router-side veth-t100
+#  (10.0.0.254/24); each router also holds a localsid table and a vrf-100
+#  device.]

[net-next v4 2/8] seg6: improve management of behavior attributes

2020-12-02 Thread Andrea Mayer

Depending on the attribute (e.g. SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc.),
the parse() callback performs some validity checks on the provided input
and updates the tunnel state (slwt) with the result of the parsing
operation. However, an attribute may also need to reserve some additional
resources (e.g. allocating memory or setting up an eBPF program) in the
parse() callback to complete the parsing operation.

The parse() callbacks are invoked by the parse_nla_action() for each
attribute belonging to a specific behavior. Given a behavior with N
attributes, if the parsing of the i-th attribute fails, the
parse_nla_action() returns immediately with an error. Nonetheless, the
resources acquired while parsing the previous i-1 attributes are not freed
by the parse_nla_action().

Attributes which acquire resources must release them *in an explicit way*
in both the seg6_local_{build/destroy}_state(). As a consequence, adding a
new attribute of this type requires changes to
seg6_local_{build/destroy}_state() to release the resources correctly.

The seg6local infrastructure still lacks a simple and structured way to
release the resources acquired in the parse() operations.

We introduce a new callback in the struct seg6_action_param named
destroy(). This callback releases any resource which may have been acquired
in the parse() counterpart. Each attribute may or may not implement the
destroy() callback depending on whether it needs to free some acquired
resources.

The destroy() callback comes with several advantages:

 1) we can have as many attributes as we want for a given behavior with no
need to explicitly free the taken resources;

 2) as in the case of seg6_local_build_state(), the
seg6_local_destroy_state() does not need to handle the release of
resources directly. Indeed, it calls the destroy_attrs() function which
is in charge of calling the destroy() callback for every set attribute.
We do not need to patch seg6_local_{build/destroy}_state() anymore as
we add new attributes;

 3) the code is more readable and better structured. Indeed, all the
information needed to handle a given attribute is contained in only
one place;

 4) it facilitates the integration with new features introduced in further
patches.
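
For readers who want to see the unwinding idea in isolation, here is a
minimal userspace sketch. It is not the kernel code: struct state, the ops[]
table, destroy_attrs_upto(), parse_all() and the fail_bpf/live_allocs knobs
are all hypothetical names made up for illustration. It only models what the
description says: each attribute may carry an optional destroy() callback,
and when parsing attribute i fails, the attributes already parsed before it
are released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical attribute ids, loosely mirroring SEG6_LOCAL_* */
enum { ATTR_SRH, ATTR_TABLE, ATTR_BPF, ATTR_MAX };

struct state {
	void *srh;      /* resource owned by ATTR_SRH */
	int table;      /* plain value, nothing to release */
	char *bpf_name; /* resource owned by ATTR_BPF */
};

struct attr_ops {
	int (*parse)(struct state *st);
	void (*destroy)(struct state *st); /* optional, may be NULL */
};

static int fail_bpf;    /* test knob: force the ATTR_BPF parse to fail */
static int live_allocs; /* number of resources currently held */

static int parse_srh(struct state *st)
{
	st->srh = malloc(16);
	if (!st->srh)
		return -1;
	live_allocs++;
	return 0;
}

static void destroy_srh(struct state *st)
{
	free(st->srh);
	st->srh = NULL;
	live_allocs--;
}

static int parse_table(struct state *st)
{
	st->table = 100;
	return 0;
}

static int parse_bpf(struct state *st)
{
	if (fail_bpf)
		return -1;
	st->bpf_name = malloc(8);
	if (!st->bpf_name)
		return -1;
	live_allocs++;
	return 0;
}

static void destroy_bpf(struct state *st)
{
	free(st->bpf_name);
	st->bpf_name = NULL;
	live_allocs--;
}

static const struct attr_ops ops[ATTR_MAX] = {
	[ATTR_SRH]   = { .parse = parse_srh,   .destroy = destroy_srh },
	[ATTR_TABLE] = { .parse = parse_table },
	[ATTR_BPF]   = { .parse = parse_bpf,   .destroy = destroy_bpf },
};

/* Call destroy() for every set attribute in [0, max_parsed). */
static void destroy_attrs_upto(unsigned long mask, int max_parsed,
			       struct state *st)
{
	assert(max_parsed <= ATTR_MAX);
	for (int i = 0; i < max_parsed; i++) {
		if (!(mask & (1UL << i)) || !ops[i].destroy)
			continue;
		ops[i].destroy(st);
	}
}

/* Parse every set attribute; on failure undo the ones already parsed. */
static int parse_all(unsigned long mask, struct state *st)
{
	for (int i = 0; i < ATTR_MAX; i++) {
		int err;

		if (!(mask & (1UL << i)))
			continue;
		err = ops[i].parse(st);
		if (err) {
			destroy_attrs_upto(mask, i, st);
			return err;
		}
	}
	return 0;
}
```

With fail_bpf set, parse_all() returns an error and live_allocs is back to
zero, which is the property the destroy() callback is meant to guarantee.
Adding a new resource-owning attribute to this sketch only means adding one
more ops[] entry.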

Signed-off-by: Andrea Mayer 
---
 net/ipv6/seg6_local.c | 80 +--
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index eba23279912d..aef39eab9be2 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -710,6 +710,11 @@ static int cmp_nla_srh(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
return memcmp(a->srh, b->srh, len);
 }
 
+static void destroy_attr_srh(struct seg6_local_lwt *slwt)
+{
+   kfree(slwt->srh);
+}
+
 static int parse_nla_table(struct nlattr **attrs, struct seg6_local_lwt *slwt)
 {
slwt->table = nla_get_u32(attrs[SEG6_LOCAL_TABLE]);
@@ -901,16 +906,30 @@ static int cmp_nla_bpf(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
return strcmp(a->bpf.name, b->bpf.name);
 }
 
+static void destroy_attr_bpf(struct seg6_local_lwt *slwt)
+{
+	kfree(slwt->bpf.name);
+	if (slwt->bpf.prog)
+		bpf_prog_put(slwt->bpf.prog);
+}
+
 struct seg6_action_param {
int (*parse)(struct nlattr **attrs, struct seg6_local_lwt *slwt);
int (*put)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
int (*cmp)(struct seg6_local_lwt *a, struct seg6_local_lwt *b);
+
+   /* optional destroy() callback useful for releasing resources which
+* have been previously acquired in the corresponding parse()
+* function.
+*/
+   void (*destroy)(struct seg6_local_lwt *slwt);
 };
 
 static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = {
[SEG6_LOCAL_SRH]= { .parse = parse_nla_srh,
.put = put_nla_srh,
-   .cmp = cmp_nla_srh },
+   .cmp = cmp_nla_srh,
+   .destroy = destroy_attr_srh },
 
[SEG6_LOCAL_TABLE]  = { .parse = parse_nla_table,
.put = put_nla_table,
@@ -934,10 +953,49 @@ static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = {
 
[SEG6_LOCAL_BPF]= { .parse = parse_nla_bpf,
.put = put_nla_bpf,
-   .cmp = cmp_nla_bpf },
+   .cmp = cmp_nla_bpf,
+   .destroy = destroy_attr_bpf },
 
 };
 
+/* call the destroy() callback (if available) for each set attribute in
+ * @slwt, starting from the first attribute up to the @max_parsed (excluded)
+ * attribute.
+ */
+static void __destroy_attrs(int max_parsed, struct seg6_local_lwt *slwt)
+{
+   unsigned long attrs = slwt->desc->attrs;
+   struct seg6_action_param *p

[net-next v4 5/8] seg6: add support for the SRv6 End.DT4 behavior

2020-12-02 Thread Andrea Mayer

SRv6 End.DT4 is defined in the SRv6 Network Programming [1].

The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in
multi-tenant environments. It decapsulates the received packets and
performs an IPv4 routing lookup in the routing table of the tenant.

The SRv6 End.DT4 Linux implementation leverages a VRF device in order to
force the routing lookup into the associated routing table.

To make the End.DT4 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one
VRF during the tunnel creation. This constraint has to be enforced by
enabling the VRF strict_mode sysctl parameter, i.e.:
 $ sysctl -wq net.vrf.strict_mode=1
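
As an illustration of the constraint above, here is a small userspace model
of the table-to-VRF lookup. It is only a sketch: struct vrf_dev,
vrf_ifindex_by_table() and dt4_build_error() are hypothetical names, and the
real lookup is done in the kernel by the l3mdev code. The two error strings
are the ones the patch reports through extack; everything else is assumed
for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical view of a VRF device: its ifindex and its routing table */
struct vrf_dev {
	int ifindex;
	unsigned int table_id;
};

/*
 * Return the ifindex of the VRF bound to @table_id.
 * -EPERM:  strict mode is off, so the table -> VRF mapping is not
 *          guaranteed to be unique and the lookup is refused.
 * -ENODEV: strict mode is on but no VRF uses that table.
 */
static int vrf_ifindex_by_table(const struct vrf_dev *vrfs, size_t n,
				int strict_mode, unsigned int table_id)
{
	if (!strict_mode)
		return -EPERM;

	for (size_t i = 0; i < n; i++)
		if (vrfs[i].table_id == table_id)
			return vrfs[i].ifindex;

	return -ENODEV;
}

/* Map a lookup result to the message shown to the user, NULL on success. */
static const char *dt4_build_error(int ret)
{
	if (ret >= 0)
		return NULL;

	switch (ret) {
	case -EPERM:
		return "Strict mode for VRF is disabled";
	case -ENODEV:
		return "Table has no associated VRF device";
	default:
		return "unexpected error";
	}
}
```

In this model a route using "vrftable 100" can only be created when strict
mode is enabled and exactly one VRF is bound to table 100.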

At JANOG44, LINE Corporation presented their multi-tenant DC architecture
using SRv6 [2]. In the slides, they reported that the Linux kernel lacks
support for the SRv6 End.DT4 behavior.

The SRv6 End.DT4 behavior can be instantiated using a command similar to
the following:

 $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0

We introduce the "vrftable" extension in iproute2 in a following patch.

[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming
[2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6

Signed-off-by: Andrea Mayer 
---
 include/uapi/linux/seg6_local.h |   1 +
 net/ipv6/seg6_local.c   | 287 
 2 files changed, 288 insertions(+)

diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
index edc138bdc56d..3b39ef1dbb46 100644
--- a/include/uapi/linux/seg6_local.h
+++ b/include/uapi/linux/seg6_local.h
@@ -26,6 +26,7 @@ enum {
SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF,
SEG6_LOCAL_BPF,
+   SEG6_LOCAL_VRFTABLE,
__SEG6_LOCAL_MAX,
 };
 #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index da5bf4167a52..24c2616c8c11 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -69,6 +69,28 @@ struct bpf_lwt_prog {
char *name;
 };
 
+enum seg6_end_dt_mode {
+   DT_INVALID_MODE = -EINVAL,
+   DT_LEGACY_MODE  = 0,
+   DT_VRF_MODE = 1,
+};
+
+struct seg6_end_dt_info {
+   enum seg6_end_dt_mode mode;
+
+   struct net *net;
+   /* VRF device associated to the routing table used by the SRv6
+* End.DT4/DT6 behavior for routing IPv4/IPv6 packets.
+*/
+   int vrf_ifindex;
+   int vrf_table;
+
+   /* tunneled packet proto and family (IPv4 or IPv6) */
+   __be16 proto;
+   u16 family;
+   int hdrlen;
+};
+
 struct seg6_local_lwt {
int action;
struct ipv6_sr_hdr *srh;
@@ -78,6 +100,9 @@ struct seg6_local_lwt {
int iif;
int oif;
struct bpf_lwt_prog bpf;
+#ifdef CONFIG_NET_L3_MASTER_DEV
+   struct seg6_end_dt_info dt_info;
+#endif
 
int headroom;
struct seg6_action_desc *desc;
@@ -429,6 +454,203 @@ static int input_action_end_dx4(struct sk_buff *skb,
return -EINVAL;
 }
 
+#ifdef CONFIG_NET_L3_MASTER_DEV
+static struct net *fib6_config_get_net(const struct fib6_config *fib6_cfg)
+{
+   const struct nl_info *nli = &fib6_cfg->fc_nlinfo;
+
+   return nli->nl_net;
+}
+
+static int __seg6_end_dt_vrf_build(struct seg6_local_lwt *slwt, const void *cfg,
+				   u16 family, struct netlink_ext_ack *extack)
+{
+	struct seg6_end_dt_info *info = &slwt->dt_info;
+	int vrf_ifindex;
+	struct net *net;
+
+	net = fib6_config_get_net(cfg);
+
+	/* note that vrf_table was already set by parse_nla_vrftable() */
+	vrf_ifindex = l3mdev_ifindex_lookup_by_table_id(L3MDEV_TYPE_VRF, net,
+							info->vrf_table);
+	if (vrf_ifindex < 0) {
+		if (vrf_ifindex == -EPERM) {
+			NL_SET_ERR_MSG(extack,
+				       "Strict mode for VRF is disabled");
+		} else if (vrf_ifindex == -ENODEV) {
+			NL_SET_ERR_MSG(extack,
+				       "Table has no associated VRF device");
+		} else {
+			pr_debug("seg6local: SRv6 End.DT* creation error=%d\n",
+				 vrf_ifindex);
+		}
+
+		return vrf_ifindex;
+	}
+
+	info->net = net;
+	info->vrf_ifindex = vrf_ifindex;
+
+	switch (family) {
+	case AF_INET:
+		info->proto = htons(ETH_P_IP);
+		info->hdrlen = sizeof(struct iphdr);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	info->family = family;
+	info->mode = DT_VRF_MODE;
+
+	return 0;
+}
+
+/* The SRv6 End.DT4/DT6 behavior extracts the inner (IPv4/IPv6) packet and
+ * routes the IPv4/IPv6 packet by looking at the configured routing table.
+ *
+ * In the SRv6 End.DT4/DT6 use case,

Re: [PATCH net-next] net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround

2020-12-02 Thread Russell King - ARM Linux admin

On Wed, Nov 25, 2020 at 02:00:20PM +, Russell King - ARM Linux admin wrote:
> On Tue, Nov 24, 2020 at 08:18:56AM +0800, kernel test robot wrote:
> > All warnings (new ones prefixed by >>):
> > 
> >drivers/net/phy/sfp.c: In function 'sfp_i2c_read':
> > >> drivers/net/phy/sfp.c:339:9: warning: variable 'block_size' set but not 
> > >> used [-Wunused-but-set-variable]
> >  339 |  size_t block_size;
> >  | ^~
> 
> I'm waiting for Thomas to re-test the fixed patch I sent, but Thomas
> seems to be of the opinion that there's no need to re-test, despite
> the fixed patch having the intended effect of changing the behaviour
> on the I2C bus.
> 
> If nothing is forthcoming, I'm intending to drop the patch; we don't
> need to waste time supporting untested workarounds for what are
> essentially broken SFPs by vendors twisting the SFP MSA in the
> kernel.

I have had no further co-operation from Thomas so far. If I don't hear
from someone who is able to test this module by this weekend, I will be
dropping this patch from my repository.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

RE: [PATCH] dpaa_eth: copy timestamp fields to new skb in A-050385 workaround

2020-12-02 Thread Camelia Alexandra Groza

> -Original Message-
> From: Yangbo Lu 
> Sent: Tuesday, December 1, 2020 09:53
> To: netdev@vger.kernel.org
> Cc: Y.b. Lu ; Madalin Bucur
> ; David S . Miller 
> Subject: [PATCH] dpaa_eth: copy timestamp fields to new skb in A-050385
> workaround
> 
> The timestamp fields should be copied to new skb too in
> A-050385 workaround for later TX timestamping handling.
> 
> Fixes: 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround")
> Signed-off-by: Yangbo Lu 
> ---

Acked-by: Camelia Groza 

>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> index d9c2859..cb7c028 100644
> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> @@ -2120,6 +2120,15 @@ static int dpaa_a050385_wa(struct net_device
> *net_dev, struct sk_buff **s)
>   skb_copy_header(new_skb, skb);
>   new_skb->dev = skb->dev;
> 
> + /* Copy relevant timestamp info from the old skb to the new */
> + if (priv->tx_tstamp) {
> + skb_shinfo(new_skb)->tx_flags = skb_shinfo(skb)->tx_flags;
> + skb_shinfo(new_skb)->hwtstamps = skb_shinfo(skb)->hwtstamps;
> + skb_shinfo(new_skb)->tskey = skb_shinfo(skb)->tskey;
> + if (skb->sk)
> + skb_set_owner_w(new_skb, skb->sk);
> + }
> +
>   /* We move the headroom when we align it so we have to reset the
>* network and transport header offsets relative to the new data
>* pointer. The checksum offload relies on these offsets.
> @@ -2127,7 +2136,6 @@ static int dpaa_a050385_wa(struct net_device
> *net_dev, struct sk_buff **s)
>   skb_set_network_header(new_skb, skb_network_offset(skb));
>   skb_set_transport_header(new_skb, skb_transport_offset(skb));
> 
> - /* TODO: does timestamping need the result in the old skb? */
>   dev_kfree_skb(skb);
>   *s = new_skb;
> 
> --
> 2.7.4

Re: [External] Re: [PATCH 0/7] Introduce vdpa management tool

2020-12-02 Thread Yongji Xie

On Wed, Dec 2, 2020 at 7:13 PM Parav Pandit  wrote:
>
>
>
> > From: Yongji Xie 
> > Sent: Wednesday, December 2, 2020 2:52 PM
> >
> > On Wed, Dec 2, 2020 at 12:53 PM Parav Pandit  wrote:
> > >
> > >
> > >
> > > > From: Yongji Xie 
> > > > Sent: Wednesday, December 2, 2020 9:00 AM
> > > >
> > > > On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit  wrote:
> > > > >
> > > > >
> > > > >
> > > > > > From: Yongji Xie 
> > > > > > Sent: Tuesday, December 1, 2020 7:49 PM
> > > > > >
> > > > > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit 
> > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > From: Yongji Xie 
> > > > > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > > > > >
> > > > > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang
> > > > > > > > 
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On 2020/11/30 3:07 PM, Yongji Xie wrote:
> > > > > > > > > >>> Thanks for adding me, Jason!
> > > > > > > > > >>>
> > > > > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA
> > > > > > > > > >>> Device in
> > > > > > > > > >>> Userspace) [1]. This tool is very useful for the vduse 
> > > > > > > > > >>> device.
> > > > > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > > > > > >>> But there is one problem:
> > > > > > > > > > >>>
> > > > > > > > > > >>> In this tool, vdpa device config action and enable
> > > > > > > > > > >>> action are combined into one netlink msg:
> > > > > > > > > > >>> VDPA_CMD_DEV_NEW. But in vduse case, it needs to be
> > > > > > > > > > >>> split because a chardev should be created and opened
> > > > > > > > > > >>> by a userspace process before we enable the vdpa
> > > > > > > > > > >>> device (call vdpa_register_device()).
> > > > > > > > > > >>>
> > > > > > > > > > >>> So I'd like to know whether it's possible (or have
> > > > > > > > > > >>> some plans) to add two new netlink msgs something
> > > > > > > > > > >>> like: VDPA_CMD_DEV_ENABLE and VDPA_CMD_DEV_DISABLE to
> > > > > > > > > > >>> make the config path more flexible.
> > > > > > > > > >>>
> > > > > > > > > >> Actually, we've discussed such intermediate step in
> > > > > > > > > >> some early discussion. It looks to me VDUSE could be
> > > > > > > > > >> one of the users of
> > > > this.
> > > > > > > > > >>
> > > > > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > > > > >> inode(fd) for VDUSE then fetching it via an
> > > > > > > > > >> VDUSE_GET_DEVICE_FD
> > > > ioctl?
> > > > > > > > > >>
> > > > > > > > > > Yes, we can. Actually the current implementation in
> > > > > > > > > > VDUSE is like this.  But seems like this is still a 
> > > > > > > > > > intermediate
> > step.
> > > > > > > > > > The fd should be binded to a name or something else
> > > > > > > > > > which need to be configured before.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The name could be specified via the netlink. It looks to
> > > > > > > > > me the real issue is that until the device is connected
> > > > > > > > > with a userspace, it can't be used. So we also need to
> > > > > > > > > fail the enabling if it doesn't
> > > > > > opened.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, that's true. So you mean we can firstly try to fetch
> > > > > > > > the fd binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD,
> > > > > > > > then use the name/vduse_id as a attribute to create vdpa
> > > > > > > > device? It looks fine to
> > > > me.
> > > > > > >
> > > > > > > I probably do not well understand. I tried reading patch [1]
> > > > > > > and few things
> > > > > > do not look correct as below.
> > > > > > > Creating the vdpa device on the bus device and destroying the
> > > > > > > device from
> > > > > > the workqueue seems unnecessary and racy.
> > > > > > >
> > > > > > > It seems vduse driver needs
> > > > > > > This is something should be done as part of the vdpa dev add
> > > > > > > command,
> > > > > > instead of connecting two sides separately and ensuring race
> > > > > > free access to it.
> > > > > > >
> > > > > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be
> > avoided.
> > > > > > >
> > > > > >
> > > > > > Yes, we can avoid these two ioctls with the help of the management
> > tool.
> > > > > >
> > > > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > > > >
> > > > > > > When above command is executed it creates necessary vdpa
> > > > > > > device
> > > > > > > foo2
> > > > > > on the bus.
> > > > > > > When user binds foo2 device with the vduse driver, in the
> > > > > > > probe(), it
> > > > > > creates respective char device to access it from user space.
> > > > > >
> > > > > I see. So vduse cannot work with any existing vdpa devices like
> > > > > ifc, mlx5 or
> > > > netdevsim.
> > > > > It has its own implementation similar to fuse with its own backend of
> > choice.
> > > > > More below.
> > > > >
> > > > > > But vduse driver is not a vdpa bus driver. I

Re: [PATCH net-next v2] mptcp: be careful on MPTCP-level ack.

2020-12-02 Thread Eric Dumazet




On 11/24/20 10:51 PM, Paolo Abeni wrote:
> We can enter the main mptcp_recvmsg() loop even when
> no subflows are connected. As noted by Eric, that would
> result in a divide by zero oops on ack generation.
> 
> Address the issue by checking the subflow status before
> sending the ack.
> 
> Additionally protect mptcp_recvmsg() against invocation
> with weird socket states.
> 
> v1 -> v2:
>  - removed unneeded inline keyword - Jakub
> 
> Reported-and-suggested-by: Eric Dumazet 
> Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling")
> Signed-off-by: Paolo Abeni 
> ---
>  net/mptcp/protocol.c | 67 
>  1 file changed, 49 insertions(+), 18 deletions(-)
> 

Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will
trigger an infinite loop if there is data available in the receive queue?

It seems the following is needed; commit ea4ca586b16f removed
a needed check to catch this condition.

Untested patch, I can submit it formally if you agree.

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 221f7cdd416bdb681968bf1b3ff2ed1b03cea3ce..57213ff60f784fae14c2a96f391ccdec6249c168 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1921,7 +1921,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
len = min_t(size_t, len, INT_MAX);
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 
-   for (;;) {
+   while (copied < len) {
int bytes_read, old_space;
 
bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied);
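
To make the zero-length case concrete, here is a userspace model of the two
loop shapes. It is a sketch under assumptions, not the mptcp code:
recv_iterations() is a made-up helper, the receive queue is reduced to a byte
count, and the wake-up target is assumed to be at least one byte even when
len is 0 (my reading of sock_rcvlowat(), not something shown in this thread).
A return value of -1 means the loop was still spinning when the iteration cap
was reached.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the receive loop.
 * @len:       bytes requested by the caller
 * @queued:    bytes sitting in the receive queue
 * @check_len: 0 models "for (;;)", 1 models "while (copied < len)"
 * @max_iter:  safety cap so the model itself always terminates
 *
 * Returns the number of loop iterations, or -1 if the cap was hit.
 */
static int recv_iterations(size_t len, size_t queued, int check_len,
			   int max_iter)
{
	const size_t target = 1; /* assumed minimum wake-up target */
	size_t copied = 0;
	int iter = 0;

	while (!check_len || copied < len) {
		size_t room, n;

		if (iter == max_iter)
			return -1; /* still looping: models the hang */
		iter++;

		/* copy as much as the caller asked for and the queue holds */
		room = len - copied;
		n = room < queued ? room : queued;
		copied += n;
		queued -= n;

		if (copied >= target)
			break; /* enough data delivered */
		if (queued == 0)
			break; /* nothing left: the real code would wait */
	}
	return iter;
}
```

With len == 0 and data queued, the unconditional loop never copies anything,
never reaches the target and never sees an empty queue, so it only stops at
the cap. The bounded variant does not enter the loop at all.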

Re: [PATCH net-next 2/2] net: ipa: add support for inline checksum offload

2020-12-02 Thread Alex Elder

On 12/1/20 8:13 PM, Jakub Kicinski wrote:
>> To function, the rmnet driver must also add support for this new
>> "inline" checksum offload.  The changes implementing this will be
>> submitted soon.
> We don't usually merge half of a feature. Why not wait until all
> support is in place?
> 
> Do I understand right that it's rmnet that will push the csum header?
> This change seems to only reserve space for it and request the feature
> at init..

You are correct.  The IPA hardware needs to be programmed to
perform the computation and verify that the checksum in the
header matches what it computes (for AP RX offload), or
insert it into the header (for AP TX offload).  That's what
this patch handles.

The RMNet driver is responsible for stripping the offload
header off on RX, and acting on what it says (i.e., whether
hardware is able to state the checksum is good).  And on TX
it inserts an offload header that says what to checksum and
where to put it in the packet.

It's totally fine not to merge this until we have the whole
package ready, I understand.  I'll see what I can do to get
that done quickly.

Thanks Jakub.

-Alex

[iproute2-next v2] seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors

2020-12-02 Thread Paolo Lungaroni

We introduce the "vrftable" attribute for supporting the SRv6 End.DT4 and
End.DT6 behaviors in iproute2.
The "vrftable" attribute indicates the routing table associated with
the VRF device used by SRv6 End.DT4/DT6 for routing IPv4/IPv6 packets.

The SRv6 End.DT4/DT6 is used to implement IPv4/IPv6 L3 VPNs based on Segment
Routing over IPv6 networks in multi-tenant environments.
It decapsulates the received packets and performs the IPv4/IPv6 routing
lookup in the routing table of the tenant.

The SRv6 End.DT4/DT6 leverages a VRF device in order to force the routing
lookup into the associated routing table using the "vrftable" attribute.

Some examples:
 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0
 $ ip -6 route add 2001:db8::2 encap seg6local action End.DT6 vrftable 200 dev eth0

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT4 vrftable 100 dev eth0 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::2
[ {
"dst": "2001:db8::2",
"encap": "seg6local",
"action": "End.DT6",
"vrftable": 200,
"dev": "eth0",
"metric": 1024,
"flags": [ ],
"pref": "medium"
} ]

v2:
 - no changes made: resubmit after pulling out this patch from the kernel
   patchset.

v1:
 - mixing this patch with the kernel patchset confused patchwork.

Signed-off-by: Paolo Lungaroni 
Signed-off-by: Andrea Mayer 
---
 include/uapi/linux/seg6_local.h |  1 +
 ip/iproute_lwtunnel.c   | 19 ---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
index 5312de80..bb5c8ddf 100644
--- a/include/uapi/linux/seg6_local.h
+++ b/include/uapi/linux/seg6_local.h
@@ -26,6 +26,7 @@ enum {
SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF,
SEG6_LOCAL_BPF,
+   SEG6_LOCAL_VRFTABLE,
__SEG6_LOCAL_MAX,
 };
 #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 9b4f0885..1ab95cd2 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -294,6 +294,11 @@ static void print_encap_seg6local(FILE *fp, struct rtattr 
*encap)
 
rtnl_rttable_n2a(rta_getattr_u32(tb[SEG6_LOCAL_TABLE]),
 b1, sizeof(b1)));
 
+   if (tb[SEG6_LOCAL_VRFTABLE])
+   print_string(PRINT_ANY, "vrftable", "vrftable %s ",
+
rtnl_rttable_n2a(rta_getattr_u32(tb[SEG6_LOCAL_VRFTABLE]),
+b1, sizeof(b1)));
+
if (tb[SEG6_LOCAL_NH4]) {
print_string(PRINT_ANY, "nh4",
 "nh4 %s ", rt_addr_n2a_rta(AF_INET, 
tb[SEG6_LOCAL_NH4]));
@@ -860,9 +865,10 @@ static int lwt_parse_bpf(struct rtattr *rta, size_t len,
 static int parse_encap_seg6local(struct rtattr *rta, size_t len, int *argcp,
 char ***argvp)
 {
-   int segs_ok = 0, hmac_ok = 0, table_ok = 0, nh4_ok = 0, nh6_ok = 0;
-   int iif_ok = 0, oif_ok = 0, action_ok = 0, srh_ok = 0, bpf_ok = 0;
-   __u32 action = 0, table, iif, oif;
+   int segs_ok = 0, hmac_ok = 0, table_ok = 0, vrftable_ok = 0;
+   int nh4_ok = 0, nh6_ok = 0, iif_ok = 0, oif_ok = 0;
+   __u32 action = 0, table, vrftable, iif, oif;
+   int action_ok = 0, srh_ok = 0, bpf_ok = 0;
struct ipv6_sr_hdr *srh;
char **argv = *argvp;
int argc = *argcp;
@@ -887,6 +893,13 @@ static int parse_encap_seg6local(struct rtattr *rta, 
size_t len, int *argcp,
duparg2("table", *argv);
rtnl_rttable_a2n(&table, *argv);
ret = rta_addattr32(rta, len, SEG6_LOCAL_TABLE, table);
+   } else if (strcmp(*argv, "vrftable") == 0) {
+   NEXT_ARG();
+   if (vrftable_ok++)
+   duparg2("vrftable", *argv);
+   rtnl_rttable_a2n(&vrftable, *argv);
+   ret = rta_addattr32(rta, len, SEG6_LOCAL_VRFTABLE,
+   vrftable);
} else if (strcmp(*argv, "nh4") == 0) {
NEXT_ARG();
if (nh4_ok++)
-- 
2.20.1

Re: [PATCH net-next v1 0/3] vsock: Add flag field in the vsock address

2020-12-02 Thread Stefano Garzarella


Hi Andra,

On Tue, Dec 01, 2020 at 05:25:02PM +0200, Andra Paraschiv wrote:

vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be set up to use vsock channels, as multi-transport
support has been available in mainline since the v5.5 Linux kernel release.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to set up communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flag field in the vsock address data structure. The "svm_reserved1" field
has been repurposed to be the flag field. The value of the flag will then be
taken into consideration when the vsock transport is assigned.
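
A rough userspace sketch of the idea follows. Every name in it is
hypothetical (demo_sockaddr_vm, DEMO_FLAG_SIBLING, demo_pick_transport()):
the cover letter quoted here names neither the new field nor the flag value,
and the real transport assignment has more inputs than this. The sketch only
models the two statements above: without a host->guest transport everything
goes to the host, and an explicit flag in the address can request that path
even when a host->guest transport is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: "send this via the host, the peer is a sibling VM" */
#define DEMO_FLAG_SIBLING 0x1

/* Simplified address: the 16-bit reserved slot is reused as a flag field */
struct demo_sockaddr_vm {
	uint16_t family;
	uint16_t flags; /* formerly a reserved field, always zero */
	uint32_t port;
	uint32_t cid;
};

enum demo_transport {
	DEMO_G2H, /* guest->host transport: packets are forwarded to the host */
	DEMO_H2G, /* host->guest transport: packets go to a nested guest */
};

/* Pick a transport for @dst given whether a host->guest one is loaded. */
static enum demo_transport
demo_pick_transport(const struct demo_sockaddr_vm *dst, int h2g_loaded)
{
	/* explicit request wins over the implicit rule */
	if (dst->flags & DEMO_FLAG_SIBLING)
		return DEMO_G2H;

	/* implicit rule: no host->guest transport, forward to the host */
	if (!h2g_loaded)
		return DEMO_G2H;

	return DEMO_H2G;
}
```

Because the reserved field was zero before, an address built by existing
code keeps its old meaning in this model.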

This way we can distinguish between the nested VMs / local communication and
sibling VMs use cases, and can also set up one or more types of communication
at the same time.



Another thing worth mentioning is that for now it is not supported in 
vhost-vsock, since we are discarding every packet not addressed to the 
host.


What we should do would be:
- add a new IOCTL to vhost-vsock to enable sibling communication, by 
  default I'd like to leave it disabled


- allow sibling forwarding only if both guests have sibling 
  communication enabled and we should implement some kind of filtering 
  or network namespace support to allow the communication only between a 
  subset of VMs



Do you have plans to work on it?

Otherwise I'll put it in my to-do list and hope I have time to do it (maybe 
next month).


Thanks,
Stefano

Re: [PATCH v2 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Oleksij Rempel

On Wed, Dec 02, 2020 at 03:04:38PM +0200, Vladimir Oltean wrote:
> On Wed, Dec 02, 2020 at 01:07:12PM +0100, Oleksij Rempel wrote:
> > Add stats support for the ar9331 switch.
> > 
> > Signed-off-by: Oleksij Rempel 
> > ---
> >  /* Warning: switch reset will reset last AR9331_SW_MDIO_PHY_MODE_PAGE 
> > request
> > @@ -422,6 +527,7 @@ static void ar9331_sw_phylink_mac_link_down(struct 
> > dsa_switch *ds, int port,
> > phy_interface_t interface)
> >  {
> > struct ar9331_sw_priv *priv = (struct ar9331_sw_priv *)ds->priv;
> > +   struct ar9331_sw_port *p = &priv->port[port];
> > struct regmap *regmap = priv->regmap;
> > int ret;
> >  
> > @@ -429,6 +535,8 @@ static void ar9331_sw_phylink_mac_link_down(struct 
> > dsa_switch *ds, int port,
> >  AR9331_SW_PORT_STATUS_MAC_MASK, 0);
> > if (ret)
> > dev_err_ratelimited(priv->dev, "%s: %i\n", __func__, ret);
> > +
> > +   cancel_delayed_work_sync(&p->mib_read);
> 
> Is this sufficient? Do you get a guaranteed .phylink_mac_link_down event
> on unbind? How do you ensure you don't race with the stats worker there?

Currently it works, but you are right, I'd better do this on remove as
well.

> > +static void ar9331_stats_update(struct ar9331_sw_port *port,
> > +   struct rtnl_link_stats64 *stats)
> > +{
> > +   struct ar9331_sw_stats *s = &port->stats;
> > +
> > +   stats->rx_packets = s->rxbroad + s->rxmulti + s->rx64byte +
> > +   s->rx128byte + s->rx256byte + s->rx512byte + s->rx1024byte +
> > +   s->rx1518byte + s->rxmaxbyte;
> > +   stats->tx_packets = s->txbroad + s->txmulti + s->tx64byte +
> > +   s->tx128byte + s->tx256byte + s->tx512byte + s->tx1024byte +
> > +   s->tx1518byte + s->txmaxbyte;
> > +   stats->rx_bytes = s->rxgoodbyte;
> > +   stats->tx_bytes = s->txbyte;
> > +   stats->rx_errors = s->rxfcserr + s->rxalignerr + s->rxrunt +
> > +   s->rxfragment + s->rxoverflow;
> > +   stats->tx_errors = s->txoversize;
> > +   stats->multicast = s->rxmulti;
> > +   stats->collisions = s->txcollision;
> > +   stats->rx_length_errors = s->rxrunt * s->rxfragment + s->rxtoolong;
> 
> Multiplication? Is this right?

no. fixed.
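
The corrected line is not quoted in this reply, so the following is only a guess at its shape: a userspace sketch that sums the three length-related counters instead of multiplying two of them. The struct is a cut-down stand-in for `struct ar9331_sw_stats`, and the helper name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the counters from the quoted patch; a userspace stand-in. */
struct sw_stats {
	uint64_t rxrunt;
	uint64_t rxfragment;
	uint64_t rxtoolong;
};

/* Assumed fix: add the length-related counters rather than multiply them. */
static uint64_t rx_length_errors(const struct sw_stats *s)
{
	assert(s);
	return s->rxrunt + s->rxfragment + s->rxtoolong;
}
```

With rxrunt = 2, rxfragment = 0 and rxtoolong = 5 the sum reports 7 errors, whereas the multiplied form would silently drop the runts and report 5.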

> > +   stats->rx_crc_errors = s->rxfcserr + s->rxalignerr + s->rxfragment;
> > +   stats->rx_frame_errors = s->rxalignerr;
> > +   stats->rx_missed_errors = s->rxoverflow;
> > +   stats->tx_aborted_errors = s->txabortcol;
> > +   stats->tx_fifo_errors = s->txunderrun;
> > +   stats->tx_window_errors = s->txlatecol;
> > +   stats->rx_nohandler = s->filtered;
> > +}
> > +
> > +static void ar9331_do_stats_poll(struct work_struct *work)
> > +{
> > +
> 
> Could you remove this empty line.

fixed

Thank you!

Regards,
Oleksij
-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

[PATCH net-next v3] devlink: Add devlink port documentation

2020-12-02 Thread Parav Pandit

Added documentation for devlink port and port function related commands.

Signed-off-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Reviewed-by: Jacob Keller 
---
Changelog:
v2->v3:
 - rephrased many lines
 - first paragraph now describes devlink port
 - instead of saying PCI device/function, using PCI function everywhere
 - changed 'physical link layer' to 'link layer'
 - made devlink port type description more clear
 - made devlink port flavour description more clear
 - moved devlink port type table after port flavour
 - added description for the example diagram
 - describe that the CPU port is linked to DSA
 - made devlink port description for eswitch port more clear
v1->v2:
 - Removed duplicate table entries for DEVLINK_PORT_FLAVOUR_VIRTUAL.
 - replaced 'consist of' with 'consisting'
 - changed 'can be' to 'can be of'
---
 .../networking/devlink/devlink-port.rst   | 111 ++
 Documentation/networking/devlink/index.rst|   1 +
 2 files changed, 112 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-port.rst

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
new file mode 100644
index ..8407bbe9ce88
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -0,0 +1,111 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Devlink Port
+============
+
+``devlink-port`` is a port that exists on the device. A devlink port can
+be of one among many flavours. A devlink port flavour along with port
+attributes describes what a port represents.
+
+A device driver that intends to publish a devlink port sets the
+devlink port attributes and registers the devlink port.
+
+Devlink port flavours are described below.
+
+.. list-table:: List of devlink port flavours
+   :widths: 33 90
+
+   * - Flavour
+     - Description
+   * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL``
+     - Any kind of physical networking port. This can be an eswitch physical
+       port or any other physical port on the device.
+   * - ``DEVLINK_PORT_FLAVOUR_DSA``
+     - This indicates a DSA interconnect port.
+   * - ``DEVLINK_PORT_FLAVOUR_CPU``
+     - This indicates a CPU port applicable only to DSA.
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_PF``
+     - This indicates an eswitch port representing a networking port of
+       PCI physical function (PF).
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
+     - This indicates an eswitch port representing a networking port of
+       PCI virtual function (VF).
+   * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
+     - This indicates a virtual port for the virtual PCI device such as PCI VF.
+
+Devlink port types are described below.
+
+.. list-table:: List of devlink port types
+   :widths: 23 90
+
+   * - Type
+     - Description
+   * - ``DEVLINK_PORT_TYPE_ETH``
+     - Driver should set this port type when a link layer of the port is Ethernet.
+   * - ``DEVLINK_PORT_TYPE_IB``
+     - Driver should set this port type when a link layer of the port is InfiniBand.
+   * - ``DEVLINK_PORT_TYPE_AUTO``
+     - This type is indicated by the user when user prefers to set the port type
+       to be automatically detected by the device driver.
+
+A controller consists of one or more PCI functions. Such a PCI function can have
+one or more networking ports. A networking port of such a PCI function is
+represented by the eswitch devlink port. A devlink instance holds ports of two
+types of controllers.
+
+(1) controller discovered on same system where eswitch resides:
+This is the case where PCI PF/VF of a controller and devlink eswitch
+instance both are located on a single system.
+
+(2) controller located on external host system.
+This is the case where a controller is located in one system and its
+devlink eswitch ports are located in a different system. Such a controller
+is called an external controller.
+
+An example view of two controller systems::
+
+In this example a controller which contains the eswitch is the local controller
+with controller number = 0. The second is an external controller having
+controller number = 1. Eswitch devlink instance has representor devlink
+ports for the PCI functions of both the controllers.
+
+ -
+ |   |
+ |   - - --- --- |
+---  |   | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
+| server  |  | ---   / ---/- --- ---/--- ---/--- |
+| pci rc  |=== | pf0 |__//   | pf1 |___/___/ |
+| connect |  | ---   --- |
+---  | | controller_num=1 (no eswitch)   |
+ --|--
+ (internal wire)
+   |
+

Re: [PATCH v5 6/9] task_isolation: arch/arm64: enable task isolation functionality

2020-12-02 Thread Mark Rutland

Hi Alex,

On Mon, Nov 23, 2020 at 05:58:06PM +, Alex Belits wrote:
> In do_notify_resume(), call task_isolation_before_pending_work_check()
> first, to report isolation breaking, then after handling all pending
> work, call task_isolation_start() for TIF_TASK_ISOLATION tasks.
> 
> Add _TIF_TASK_ISOLATION to _TIF_WORK_MASK, and _TIF_SYSCALL_WORK,
> define local NOTIFY_RESUME_LOOP_FLAGS to check in the loop, since we
> don't clear _TIF_TASK_ISOLATION in the loop.
> 
> Early kernel entry code calls task_isolation_kernel_enter(). In
> particular:
> 
> Vectors:
> el1_sync -> el1_sync_handler() -> task_isolation_kernel_enter()
> el1_irq -> asm_nmi_enter(), handle_arch_irq()
> el1_error -> do_serror()
> el0_sync -> el0_sync_handler()
> el0_irq -> handle_arch_irq()
> el0_error -> do_serror()
> el0_sync_compat -> el0_sync_compat_handler()
> el0_irq_compat -> handle_arch_irq()
> el0_error_compat -> do_serror()
> 
> SDEI entry:
> __sdei_asm_handler -> __sdei_handler() -> nmi_enter()

As a heads-up, the arm64 entry code is changing, as we found that our
lockdep, RCU, and context-tracking management wasn't quite right. I have
a series of patches:

  https://lore.kernel.org/r/20201130115950.22492-1-mark.rutl...@arm.com

... which are queued in the arm64 for-next/fixes branch. I intend to
have some further rework ready for the next cycle. I'd appreciate if you
could Cc me on any patches altering the arm64 entry code, as I have a
vested interest.

That was quite obviously broken if PROVE_LOCKING and NO_HZ_FULL were
chosen and context tracking was in use (e.g. with
CONTEXT_TRACKING_FORCE), so I'm assuming that this series has not been
tested in that configuration. What sort of testing has this seen?

It would be very helpful for the next posting if you could provide any
instructions on how to test this series (e.g. with pointers to any test
suite that you have), since it's very easy to introduce subtle breakage
in this area without realising it.

> 
> Functions called from there:
> asm_nmi_enter() -> nmi_enter() -> task_isolation_kernel_enter()
> asm_nmi_exit() -> nmi_exit() -> task_isolation_kernel_return()
> 
> Handlers:
> do_serror() -> nmi_enter() -> task_isolation_kernel_enter()
>   or task_isolation_kernel_enter()
> el1_sync_handler() -> task_isolation_kernel_enter()
> el0_sync_handler() -> task_isolation_kernel_enter()
> el0_sync_compat_handler() -> task_isolation_kernel_enter()
> 
> handle_arch_irq() is irqchip-specific, most call handle_domain_irq()
> There is a separate patch for irqchips that do not follow this rule.
> 
> handle_domain_irq() -> task_isolation_kernel_enter()
> do_handle_IPI() -> task_isolation_kernel_enter() (may be redundant)
> nmi_enter() -> task_isolation_kernel_enter()

The IRQ cases look very odd to me. With the rework I've just done for
arm64, we'll do the regular context tracking accounting before we ever
get into handle_domain_irq() or similar, so I suspect that's not
necessary at all?

> 
> Signed-off-by: Chris Metcalf 
> [abel...@marvell.com: simplified to match kernel 5.10]
> Signed-off-by: Alex Belits 
> ---
>  arch/arm64/Kconfig   |  1 +
>  arch/arm64/include/asm/barrier.h |  1 +
>  arch/arm64/include/asm/thread_info.h |  7 +--
>  arch/arm64/kernel/entry-common.c |  7 +++
>  arch/arm64/kernel/ptrace.c   | 10 ++
>  arch/arm64/kernel/signal.c   | 13 -
>  arch/arm64/kernel/smp.c  |  3 +++
>  7 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1515f6f153a0..fc958d8d8945 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -141,6 +141,7 @@ config ARM64
>   select HAVE_ARCH_PREL32_RELOCATIONS
>   select HAVE_ARCH_SECCOMP_FILTER
>   select HAVE_ARCH_STACKLEAK
> + select HAVE_ARCH_TASK_ISOLATION
>   select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>   select HAVE_ARCH_TRACEHOOK
>   select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index c3009b0e5239..ad5a6dd380cf 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -49,6 +49,7 @@
>  #define dma_rmb()dmb(oshld)
>  #define dma_wmb()dmb(oshst)
>  
> +#define instr_sync() isb()

I think I've asked on prior versions of the patchset, but what is this
for? Where is it going to be used, and what is the expected semantics?
I'm wary of exposing this outside of arch code because there aren't
strong cross-architectural semantics, and at the least this requires
some documentation.

If it's unused, please delete it.

[...]

> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 43d4c329775f..8152760de683 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -77,6 +78,8 @@ asmlink

Re: [EXT] Re: [PATCH v5 0/9] "Task_isolation" mode

2020-12-02 Thread Mark Rutland

On Tue, Nov 24, 2020 at 05:40:49PM +, Alex Belits wrote:
> 
> On Tue, 2020-11-24 at 08:36 -0800, Tom Rix wrote:
> > On 11/23/20 9:42 AM, Alex Belits wrote:
> > > This is an update of task isolation work that was originally done by
> > > Chris Metcalf and maintained by him until November 2017. It is adapted
> > > to the current kernel and cleaned up to implement its functionality in
> > > a more complete and cleaner manner.
> > 
> > I am having problems applying the patchset to today's linux-next.
> > 
> > Which kernel should I be using ?
> 
> The patches are against Linus' tree, in particular, commit
> a349e4c659609fd20e4beea89e5c4a4038e33a95

Is there any reason to base on that commit in particular?

Generally it's preferred that a series is based on a tag (so either a
release or an -rc kernel), and that the cover letter explains what the
base is. If you can do that in future it'll make the series much easier
to work with.

Thanks,
Mark.

[PATCH v3 net-next 0/2] net: dsa: add stats64 support

2020-12-02 Thread Oleksij Rempel

changes v3:
- fix wrong multiplication
- cancel port workers on remove

changes v2:
- use stats64 instead of get_ethtool_stats
- add worker to poll for the stats

Oleksij Rempel (2):
  net: dsa: add optional stats64 support
  net: dsa: qca: ar9331: export stats64

 drivers/net/dsa/qca/ar9331.c | 248 ++-
 include/net/dsa.h|   3 +
 net/dsa/slave.c  |  14 +-
 3 files changed, 263 insertions(+), 2 deletions(-)

-- 
2.29.2

[PATCH v3 net-next 1/2] net: dsa: add optional stats64 support

2020-12-02 Thread Oleksij Rempel

Allow DSA drivers to export stats64

Signed-off-by: Oleksij Rempel 
Reviewed-by: Vladimir Oltean 
---
 include/net/dsa.h |  3 +++
 net/dsa/slave.c   | 14 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 4e60d2610f20..457b89143875 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -655,6 +655,9 @@ struct dsa_switch_ops {
int (*port_change_mtu)(struct dsa_switch *ds, int port,
   int new_mtu);
int (*port_max_mtu)(struct dsa_switch *ds, int port);
+
+   void(*get_stats64)(struct dsa_switch *ds, int port,
+  struct rtnl_link_stats64 *s);
 };
 
 #define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes)   \
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index ff2266d2b998..6e1a4dc18a97 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1602,6 +1602,18 @@ static struct devlink_port *dsa_slave_get_devlink_port(struct net_device *dev)
return dp->ds->devlink ? &dp->devlink_port : NULL;
 }
 
+static void dsa_slave_get_stats64(struct net_device *dev,
+ struct rtnl_link_stats64 *s)
+{
+   struct dsa_port *dp = dsa_slave_to_port(dev);
+   struct dsa_switch *ds = dp->ds;
+
+   if (!ds->ops->get_stats64)
+   return dev_get_tstats64(dev, s);
+
+   return ds->ops->get_stats64(ds, dp->index, s);
+}
+
 static const struct net_device_ops dsa_slave_netdev_ops = {
.ndo_open   = dsa_slave_open,
.ndo_stop   = dsa_slave_close,
@@ -1621,7 +1633,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 #endif
.ndo_get_phys_port_name = dsa_slave_get_phys_port_name,
.ndo_setup_tc   = dsa_slave_setup_tc,
-   .ndo_get_stats64= dev_get_tstats64,
+   .ndo_get_stats64= dsa_slave_get_stats64,
.ndo_get_port_parent_id = dsa_slave_get_port_parent_id,
.ndo_vlan_rx_add_vid= dsa_slave_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = dsa_slave_vlan_rx_kill_vid,
-- 
2.29.2
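
The patch above makes `get_stats64` an optional driver callback and falls back to the generic per-CPU counters when a driver leaves it unset. The pattern can be sketched in plain userspace C as below. All names here (`struct switch_ops`, `generic_stats`, `hw_stats`, `get_stats`) are stand-ins invented for illustration, not kernel APIs, and the counter values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct stats {
	uint64_t rx_packets;
};

struct switch_ops {
	/* Optional: NULL when the driver has no hardware counters to export. */
	void (*get_stats64)(int port, struct stats *s);
};

/* Stand-in for the generic software counters (dev_get_tstats64 in the patch). */
static void generic_stats(struct stats *s)
{
	s->rx_packets = 1;
}

/* Stand-in for a driver that reads counters from the switch hardware. */
static void hw_stats(int port, struct stats *s)
{
	s->rx_packets = 100 + (uint64_t)port;
}

/* Dispatch: prefer the driver callback, otherwise use the generic path. */
static void get_stats(const struct switch_ops *ops, int port, struct stats *s)
{
	assert(ops && s);

	if (!ops->get_stats64) {
		generic_stats(s);
		return;
	}

	ops->get_stats64(port, s);
}
```

Drivers that do not set the callback keep their current behaviour, which is why the new op can be added without touching any other DSA driver.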

[PATCH v3 net-next 2/2] net: dsa: qca: ar9331: export stats64

2020-12-02 Thread Oleksij Rempel

Add stats support for the ar9331 switch.

Signed-off-by: Oleksij Rempel 
---
 drivers/net/dsa/qca/ar9331.c | 248 ++-
 1 file changed, 247 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
index e24a99031b80..48c81996e807 100644
--- a/drivers/net/dsa/qca/ar9331.c
+++ b/drivers/net/dsa/qca/ar9331.c
@@ -101,6 +101,57 @@
 AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \
 AR9331_SW_PORT_STATUS_SPEED_M)
 
+/* MIB registers */
+#define AR9331_MIB_COUNTER(x)  (0x2 + ((x) * 0x100))
+
+#define AR9331_PORT_MIB_rxbroad(_port)      (AR9331_MIB_COUNTER(_port) + 0x00)
+#define AR9331_PORT_MIB_rxpause(_port)      (AR9331_MIB_COUNTER(_port) + 0x04)
+#define AR9331_PORT_MIB_rxmulti(_port)      (AR9331_MIB_COUNTER(_port) + 0x08)
+#define AR9331_PORT_MIB_rxfcserr(_port)     (AR9331_MIB_COUNTER(_port) + 0x0c)
+#define AR9331_PORT_MIB_rxalignerr(_port)   (AR9331_MIB_COUNTER(_port) + 0x10)
+#define AR9331_PORT_MIB_rxrunt(_port)       (AR9331_MIB_COUNTER(_port) + 0x14)
+#define AR9331_PORT_MIB_rxfragment(_port)   (AR9331_MIB_COUNTER(_port) + 0x18)
+#define AR9331_PORT_MIB_rx64byte(_port)     (AR9331_MIB_COUNTER(_port) + 0x1c)
+#define AR9331_PORT_MIB_rx128byte(_port)    (AR9331_MIB_COUNTER(_port) + 0x20)
+#define AR9331_PORT_MIB_rx256byte(_port)    (AR9331_MIB_COUNTER(_port) + 0x24)
+#define AR9331_PORT_MIB_rx512byte(_port)    (AR9331_MIB_COUNTER(_port) + 0x28)
+#define AR9331_PORT_MIB_rx1024byte(_port)   (AR9331_MIB_COUNTER(_port) + 0x2c)
+#define AR9331_PORT_MIB_rx1518byte(_port)   (AR9331_MIB_COUNTER(_port) + 0x30)
+#define AR9331_PORT_MIB_rxmaxbyte(_port)    (AR9331_MIB_COUNTER(_port) + 0x34)
+#define AR9331_PORT_MIB_rxtoolong(_port)    (AR9331_MIB_COUNTER(_port) + 0x38)
+
+/* 64 bit counter */
+#define AR9331_PORT_MIB_rxgoodbyte(_port)   (AR9331_MIB_COUNTER(_port) + 0x3c)
+
+/* 64 bit counter */
+#define AR9331_PORT_MIB_rxbadbyte(_port)    (AR9331_MIB_COUNTER(_port) + 0x44)
+
+#define AR9331_PORT_MIB_rxoverflow(_port)   (AR9331_MIB_COUNTER(_port) + 0x4c)
+#define AR9331_PORT_MIB_filtered(_port)     (AR9331_MIB_COUNTER(_port) + 0x50)
+#define AR9331_PORT_MIB_txbroad(_port)      (AR9331_MIB_COUNTER(_port) + 0x54)
+#define AR9331_PORT_MIB_txpause(_port)      (AR9331_MIB_COUNTER(_port) + 0x58)
+#define AR9331_PORT_MIB_txmulti(_port)      (AR9331_MIB_COUNTER(_port) + 0x5c)
+#define AR9331_PORT_MIB_txunderrun(_port)   (AR9331_MIB_COUNTER(_port) + 0x60)
+#define AR9331_PORT_MIB_tx64byte(_port)     (AR9331_MIB_COUNTER(_port) + 0x64)
+#define AR9331_PORT_MIB_tx128byte(_port)    (AR9331_MIB_COUNTER(_port) + 0x68)
+#define AR9331_PORT_MIB_tx256byte(_port)    (AR9331_MIB_COUNTER(_port) + 0x6c)
+#define AR9331_PORT_MIB_tx512byte(_port)    (AR9331_MIB_COUNTER(_port) + 0x70)
+#define AR9331_PORT_MIB_tx1024byte(_port)   (AR9331_MIB_COUNTER(_port) + 0x74)
+#define AR9331_PORT_MIB_tx1518byte(_port)   (AR9331_MIB_COUNTER(_port) + 0x78)
+#define AR9331_PORT_MIB_txmaxbyte(_port)    (AR9331_MIB_COUNTER(_port) + 0x7c)
+#define AR9331_PORT_MIB_txoversize(_port)   (AR9331_MIB_COUNTER(_port) + 0x80)
+
+/* 64 bit counter */
+#define AR9331_PORT_MIB_txbyte(_port)       (AR9331_MIB_COUNTER(_port) + 0x84)
+
+#define AR9331_PORT_MIB_txcollision(_port)  (AR9331_MIB_COUNTER(_port) + 0x8c)
+#define AR9331_PORT_MIB_txabortcol(_port)   (AR9331_MIB_COUNTER(_port) + 0x90)
+#define AR9331_PORT_MIB_txmulticol(_port)   (AR9331_MIB_COUNTER(_port) + 0x94)
+#define AR9331_PORT_MIB_txsinglecol(_port)  (AR9331_MIB_COUNTER(_port) + 0x98)
+#define AR9331_PORT_MIB_txexcdefer(_port)   (AR9331_MIB_COUNTER(_port) + 0x9c)
+#define AR9331_PORT_MIB_txdefer(_port)      (AR9331_MIB_COUNTER(_port) + 0xa0)
+#define AR9331_PORT_MIB_txlatecol(_port)    (AR9331_MIB_COUNTER(_port) + 0xa4)
+
 /* Phy bypass mode
  * 
  * Bit:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
@@ -154,6 +205,59 @@
 #define AR9331_SW_MDIO_POLL_SLEEP_US   1
 #define AR9331_SW_MDIO_POLL_TIMEOUT_US 20
 
+#define STATS_INTERVAL_JIFFIES (100 * HZ)
+
+struct ar9331_sw_stats {
+   u64 rxbroad;
+   u64 rxpause;
+   u64 rxmulti;
+   u64 rxfcserr;
+   u64 rxalignerr;
+   u64 rxrunt;
+   u64 rxfragment;
+   u64 rx64byte;
+   u64 rx128byte;
+   u64 rx256byte;
+   u64 rx512byte;
+   u64 rx1024byte;
+   u64 rx1518byte;
+   u64 rxmaxbyte;
+   u64 rxtoolong;
+   u64 rxgoodbyte;
+   u64 rxbadbyte;
+   u64 rxoverflow;
+   u64 filtered;
+   u64 txbroad;
+   u64 txpause;
+   u64 txmulti;
+   u64 txunderrun;
+   u64 tx64byte;
+   u64 tx128byte;
+   u64 tx256byte;
+   u64 tx512b

[PATCH v2 net-next] net: ipa: fix build-time bug in ipa_hardware_config_qsb()

2020-12-02 Thread Alex Elder

Jon Hunter reported observing a build bug in the IPA driver:
  https://lore.kernel.org/netdev/5b5d9d40-94d5-5dad-b861-fd9bef826...@nvidia.com

The problem is that the QMB0 max read value set for IPA v4.5 (16) is
too large to fit in the 4-bit field.

The actual value we want is 0, which requests that the hardware use
the maximum it is capable of.

Reported-by: Jon Hunter 
Tested-by: Jon Hunter 
Signed-off-by: Alex Elder 
---
v2: Got confirmation that 0 is the desired value to use (with comment).

 drivers/net/ipa/ipa_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index d0768452c15cf..84bb8ae927252 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -288,7 +288,7 @@ static void ipa_hardware_config_qsb(struct ipa *ipa)
max1 = 0;   /* PCIe not present */
break;
case IPA_VERSION_4_5:
-   max0 = 16;
+   max0 = 0;   /* No limit (hardware maximum) */
break;
}
val = u32_encode_bits(max0, GEN_QMB_0_MAX_READS_FMASK);
-- 
2.20.1
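
Why 16 cannot be encoded is easy to check outside the kernel. The sketch below is a userspace approximation of what a field-encoding helper has to verify; it is not the kernel's `u32_encode_bits()` implementation. The mask value (a 4-bit field at bits 3..0) and all helper names are assumptions made for illustration, and the helpers only handle contiguous masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: a 4-bit field occupying bits 3..0. */
#define MAX_READS_FMASK 0x0000000fu

/* Lowest set bit of the mask, i.e. the weight of the field's bit 0. */
static uint32_t field_low_bit(uint32_t mask)
{
	assert(mask != 0);
	return mask & (~mask + 1u);
}

/* True if val can be stored in the (contiguous) field described by mask. */
static bool field_fits(uint32_t val, uint32_t mask)
{
	return val <= mask / field_low_bit(mask);
}

/* Shift val into position; callers are expected to check field_fits(). */
static uint32_t field_encode(uint32_t val, uint32_t mask)
{
	assert(field_fits(val, mask));
	return (val * field_low_bit(mask)) & mask;
}
```

A 4-bit field tops out at 15, so `field_fits(16, MAX_READS_FMASK)` is false while 0 always fits, which is consistent with the one-line fix in the diff.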

Re: [PATCH v5 5/9] task_isolation: Add driver-specific hooks

2020-12-02 Thread Mark Rutland

On Mon, Nov 23, 2020 at 05:57:42PM +, Alex Belits wrote:
> Some drivers don't call functions that call
> task_isolation_kernel_enter() in interrupt handlers. Call it
> directly.

I don't think putting this in drivers is the right approach. IIUC we
only need to track user<->kernel transitions, and we can do that within
the architectural entry code before we ever reach irqchip code. I
suspect the current approach is an artifact of that being difficult in
the old structure of the arch code; recent rework should address that,
and we can restructure things further in future.

Thanks,
Mark.

> Signed-off-by: Alex Belits 
> ---
>  drivers/irqchip/irq-armada-370-xp.c | 6 ++
>  drivers/irqchip/irq-gic-v3.c| 3 +++
>  drivers/irqchip/irq-gic.c   | 3 +++
>  drivers/s390/cio/cio.c  | 3 +++
>  4 files changed, 15 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> index d7eb2e93db8f..4ac7babe1abe 100644
> --- a/drivers/irqchip/irq-armada-370-xp.c
> +++ b/drivers/irqchip/irq-armada-370-xp.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -572,6 +573,7 @@ static const struct irq_domain_ops armada_370_xp_mpic_irq_ops = {
>  static void armada_370_xp_handle_msi_irq(struct pt_regs *regs, bool is_chained)
>  {
>   u32 msimask, msinr;
> + int isol_entered = 0;
>  
>   msimask = readl_relaxed(per_cpu_int_base +
>   ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS)
> @@ -588,6 +590,10 @@ static void armada_370_xp_handle_msi_irq(struct pt_regs *regs, bool is_chained)
>   continue;
>  
>   if (is_chained) {
> + if (!isol_entered) {
> + task_isolation_kernel_enter();
> + isol_entered = 1;
> + }
>   irq = irq_find_mapping(armada_370_xp_msi_inner_domain,
>  msinr - PCI_MSI_DOORBELL_START);
>   generic_handle_irq(irq);
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 16fecc0febe8..ded26dd4da0f 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -646,6 +647,8 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>  {
>   u32 irqnr;
>  
> + task_isolation_kernel_enter();
> +
>   irqnr = gic_read_iar();
>  
>   if (gic_supports_nmi() &&
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index 6053245a4754..bb482b4ae218 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -337,6 +338,8 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
>   struct gic_chip_data *gic = &gic_data[0];
>   void __iomem *cpu_base = gic_data_cpu_base(gic);
>  
> + task_isolation_kernel_enter();
> +
>   do {
>   irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
>   irqnr = irqstat & GICC_IAR_INT_ID_MASK;
> diff --git a/drivers/s390/cio/cio.c b/drivers/s390/cio/cio.c
> index 6d716db2a46a..beab1b6d 100644
> --- a/drivers/s390/cio/cio.c
> +++ b/drivers/s390/cio/cio.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -584,6 +585,8 @@ void cio_tsch(struct subchannel *sch)
>   struct irb *irb;
>   int irq_context;
>  
> + task_isolation_kernel_enter();
> +
>   irb = this_cpu_ptr(&cio_irb);
>   /* Store interrupt response block to lowcore. */
>   if (tsch(sch->schid, irb) != 0)
> -- 
> 2.20.1
>

Re: [PATCH v5 7/9] task_isolation: don't interrupt CPUs with tick_nohz_full_kick_cpu()

2020-12-02 Thread Mark Rutland

On Mon, Nov 23, 2020 at 05:58:22PM +, Alex Belits wrote:
> From: Yuri Norov 
> 
> For nohz_full CPUs the desirable behavior is to receive interrupts
> generated by tick_nohz_full_kick_cpu(). But for hard isolation it's
> obviously not desirable because it breaks isolation.
> 
> This patch adds check for it.
> 
> Signed-off-by: Yuri Norov 
> [abel...@marvell.com: updated, only exclude CPUs running isolated tasks]
> Signed-off-by: Alex Belits 
> ---
>  kernel/time/tick-sched.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index a213952541db..6c8679e200f0 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -268,7 +269,8 @@ static void tick_nohz_full_kick(void)
>   */
>  void tick_nohz_full_kick_cpu(int cpu)
>  {
> - if (!tick_nohz_full_cpu(cpu))
> + smp_rmb();

What does this barrier pair with? The commit message doesn't mention it,
and it's not clear in-context.
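
For readers less familiar with the term, "pairing" means a read-side barrier only helps when some writer orders its stores with a matching barrier. The sketch below shows the general shape using C11 fences in userspace. It is a generic illustration, not a claim about what the patch intended: `payload`, `ready`, `publish()` and `consume()` are made-up names, and C11 release/acquire fences are only an approximation of the kernel's smp_wmb()/smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* data being handed over */
static atomic_bool ready;  /* flag announcing that payload is valid */

/* Writer side: store the data, then order it before the flag store. */
static void publish(int value)
{
	payload = value;
	atomic_thread_fence(memory_order_release);  /* write-side barrier */
	atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: load the flag, then order it before the data load. */
static bool consume(int *out)
{
	assert(out);

	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return false;

	atomic_thread_fence(memory_order_acquire);  /* read-side barrier */
	*out = payload;
	return true;
}
```

Without the fence in publish(), the fence in consume() guarantees nothing, which is why a lone smp_rmb() with no documented writer-side counterpart draws this kind of question.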

Thanks,
Mark.

> + if (!tick_nohz_full_cpu(cpu) || task_isolation_on_cpu(cpu))
>   return;
>  
>   irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> -- 
> 2.20.1
>

[PATCH net-next] selftests: forwarding: Add MPLS L2VPN test

2020-12-02 Thread Guillaume Nault

Connect hosts H1 and H2 using two intermediate encapsulation routers
(LER1 and LER2). These routers encapsulate traffic from the hosts,
including the original Ethernet header, into MPLS.

Use ping to test reachability between H1 and H2.

Signed-off-by: Guillaume Nault 
---
 .../testing/selftests/net/forwarding/Makefile |   1 +
 tools/testing/selftests/net/forwarding/config |   3 +
 .../selftests/net/forwarding/tc_mpls_l2vpn.sh | 192 ++
 3 files changed, 196 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/tc_mpls_l2vpn.sh

diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile
index 250fbb2d1625..d97bd6889446 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -48,6 +48,7 @@ TEST_PROGS = bridge_igmp.sh \
tc_chains.sh \
tc_flower_router.sh \
tc_flower.sh \
+   tc_mpls_l2vpn.sh \
tc_shblocks.sh \
tc_vlan_modify.sh \
vxlan_asymmetric.sh \
diff --git a/tools/testing/selftests/net/forwarding/config b/tools/testing/selftests/net/forwarding/config
index da96eff72a8e..10e9a3321ae1 100644
--- a/tools/testing/selftests/net/forwarding/config
+++ b/tools/testing/selftests/net/forwarding/config
@@ -6,6 +6,9 @@ CONFIG_IPV6_MULTIPLE_TABLES=y
 CONFIG_NET_VRF=m
 CONFIG_BPF_SYSCALL=y
 CONFIG_CGROUP_BPF=y
+CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_MPLS=m
+CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_CLS_FLOWER=m
 CONFIG_NET_SCH_INGRESS=m
 CONFIG_NET_ACT_GACT=m
diff --git a/tools/testing/selftests/net/forwarding/tc_mpls_l2vpn.sh b/tools/testing/selftests/net/forwarding/tc_mpls_l2vpn.sh
new file mode 100755
index ..03743f04e178
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/tc_mpls_l2vpn.sh
@@ -0,0 +1,192 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +---+
+# | H1 (v$h1) |
+# | 192.0.2.1/24  |
+# | 2001:db8::1/124   |
+# | + $h1 |
+# +-|-+
+#   |
+#   | (Plain Ethernet traffic)
+#   |
+# +-|-+
+# | LER1+ $edge1  |
+# | -ingress: |
+# |   -encapsulate Ethernet into MPLS |
+# |   -add outer Ethernet header  |
+# |   -redirect to $mpls1 (egress)|
+# |   |
+# | + $mpls1  |
+# | |   -ingress: |
+# | | -remove outer Ethernet header   |
+# | | -remove MPLS header |
+# | | -redirect to $edge1 (egress)|
+# +-|-+
+#   |
+#   | (Ethernet over MPLS traffic)
+#   |
+# +-|-+
+# | LER2+ $mpls2  |
+# | -ingress: |
+# |   -remove outer Ethernet header   |
+# |   -remove MPLS header |
+# |   -redirect to $edge2 (egress)|
+# |   |
+# | + $edge2  |
+# | |   -ingress: |
+# | | -encapsulate Ethernet into MPLS |
+# | | -add outer Ethernet header  |
+# | | -redirect to $mpls2 (egress)|
+# +-|-|
+#   |
+#   | (Plain Ethernet traffic)
+#   |
+# +-|-+
+# | H2 (v$h2)   | |
+# | + $h2 |
+# | 192.0.2.2/24  |
+# | 2001:db8::2/124   |
+# +---+
+#
+# LER1 and LER2 logically represent two different routers. However, no VRF is
+# created for them, as they don't do any IP routing.
+
+ALL_TESTS="mpls_forward_eth"
+NUM_NETIFS=6
+source lib.sh
+
+h1_create()
+{
+   simple_if_init $h1 192.0.2.1/24 2001:db8::1/124
+}
+
+h1_destroy()
+{
+   simple_if_fini $h1 192.0.2.1/24 2001:db8::1/124
+}
+
+h2_create()
+{
+   simple_if_init $h2 192.0.2.2/24 2001:db8::2/124
+}
+
+h2_destroy()
+{
+   simple_if_fini $h2 192.0.2.2/24 2001:db8::2/124
+}
+
+ler1_create()
+{
+   tc qdisc add dev $edge1 ingress
+   tc filter add dev $edge1 ingress\
+  matchall \
+  action mpls mac_push label 102   \
+
