[PATCH iproute2] ip6tunnel: fix 'ip -6 {show|change} dev ' cmds

2019-06-06 Thread Mahesh Bandewar
ow ip6tnl1 Signed-off-by: Mahesh Bandewar --- ip/ip6tunnel.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c index 999408ed801b..56fd3466ed06 100644 --- a/ip/ip6tunnel.c +++ b/ip/ip6tunnel.c @@ -298,6 +298,8 @@ static int parse_args(int argc, char **argv, int cm

[PATCH next 0/3] blackhole device to invalidate dst

2019-06-21 Thread Mahesh Bandewar
is device instead of 'lo' when marking the dst dead. First patch implements the blackhole device and second patch uses it in IPv4 and IPv6 stack while the third patch is the self test that ensures the sanity of this device. Mahesh Bandewar (3): loopback: create blackhole net device si

[PATCH next 1/3] loopback: create blackhole net device similar to loopack.

2019-06-21 Thread Mahesh Bandewar
since it's not registered it won't have an ifindex. The lower MTU effectively makes the device fail the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar --- drivers/net/loopback.c| 76 ++- inclu

[PATCH next 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst entries

2019-06-21 Thread Mahesh Bandewar
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar --- net/core/dst.c | 2 +- net/ipv4/route.c | 3 +-- net/ipv6/route.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/net/core/dst.c b/

[PATCH next 3/3] blackhole_dev: add a selftest

2019-06-21 Thread Mahesh Bandewar
Since this is not really a device with all capabilities, this test ensures that it has *enough* to make it through the data path without causing unwanted side-effects (read crash!). Signed-off-by: Mahesh Bandewar --- lib/Kconfig.debug | 9 ++ lib/Makefile

[PATCHv2 next 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst entries

2019-06-27 Thread Mahesh Bandewar
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar --- v1 -> v2 no change net/core/dst.c | 2 +- net/ipv4/route.c | 3 +-- net/ipv6/route.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff

[PATCHv2 next 1/3] loopback: create blackhole net device similar to loopack.

2019-06-27 Thread Mahesh Bandewar
since it's not registered it won't have an ifindex. The lower MTU effectively makes the device fail the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar --- v1->v2 no change drivers/net/loopback.c| 76 ++

[PATCHv2 next 3/3] blackhole_dev: add a selftest

2019-06-27 Thread Mahesh Bandewar
Since this is not really a device with all capabilities, this test ensures that it has *enough* to make it through the data path without causing unwanted side-effects (read crash!). Signed-off-by: Mahesh Bandewar --- v1 -> v2 fixed the conflict resolution in selftests Makefile

[PATCHv2 next 0/3] blackhole device to invalidate dst

2019-06-27 Thread Mahesh Bandewar
is device instead of 'lo' when marking the dst dead. First patch implements the blackhole device and second patch uses it in IPv4 and IPv6 stack while the third patch is the self test that ensures the sanity of this device. v1->v2 fixed the self-test patch to handle the conf

[PATCHv3 next 0/3] blackhole device to invalidate dst

2019-07-01 Thread Mahesh Bandewar
is device instead of 'lo' when marking the dst dead. First patch implements the blackhole device and second patch uses it in IPv4 and IPv6 stack while the third patch is the self test that ensures the sanity of this device. v1->v2 fixed the self-test patch to handle the conflict

[PATCHv3 next 1/3] loopback: create blackhole net device similar to loopack.

2019-07-01 Thread Mahesh Bandewar
since it's not registered it won't have an ifindex. The lower MTU effectively makes the device fail the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar --- v1->v2->v3 no change drivers/net/loopback.c| 76 +

[PATCHv3 next 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst entries

2019-07-01 Thread Mahesh Bandewar
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar --- v1->v2->v3 no change net/core/dst.c | 2 +- net/ipv4/route.c | 3 +-- net/ipv6/route.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions

[PATCHv3 next 3/3] blackhole_dev: add a selftest

2019-07-01 Thread Mahesh Bandewar
Since this is not really a device with all capabilities, this test ensures that it has *enough* to make it through the data path without causing unwanted side-effects (read crash!). Signed-off-by: Mahesh Bandewar --- v1 -> v2 fixed the conflict resolution in selftests Makefile v2 -> v3

[PATCH next] loopback: fix lockdep splat

2019-07-02 Thread Mahesh Bandewar
0x260/0x260 [3.855074] kernel_init+0xf/0x180 [3.855076] ? rest_init+0x260/0x260 [3.855078] ret_from_fork+0x24/0x30 Fixes: 4de83b88c66 ("loopback: create blackhole net device similar to loopack.") Reported-by: Geert Uytterhoeven Cc: Eric Dumazet Signed-off-by: Mahesh Bandewa

[PATCH next] neigh: initialize neigh entry correctly during arp processing

2017-08-16 Thread Mahesh Bandewar
From: Mahesh Bandewar If ARP processing creates a neigh entry, it is immediately marked STALE without a timer and stays in that state as long as the host does not send traffic to that neighbour. I observed this on hosts in an IPv6 environment, where there is very little to no

[PATCH next] ipvlan: always use the current L2 addr of the master

2017-10-11 Thread Mahesh Bandewar
From: Mahesh Bandewar If the underlying master ever changes its L2 (e.g. bonding device), then make sure that the IPvlan slaves always emit packets with the current L2 of the master instead of the stale mac addr which was copied during the device creation. The problem can be seen with following

[PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-15 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[RFC PATCH 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-09-21 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be
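
The input format described above (two comma-separated hex u32 words) can be illustrated with a small userspace sketch. This is not the kernel's sysctl handler; the function name `parse_hex_words` is hypothetical, and which word is the high or low half of the mask is an assumption the snippet leaves open, so the words are simply stored in the order given.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "HEX,HEX" into two 32-bit words, stored in the order given.
 * Returns 0 on success, -1 on malformed input.
 * Assumption: the meaning of each word (high vs. low half) is not
 * specified here; this only shows the text-to-u32 conversion.
 */
static int parse_hex_words(const char *s, uint32_t out[2])
{
	for (int i = 0; i < 2; i++) {
		char *end;
		unsigned long v;

		/* Reject empty fields, signs, and leading whitespace. */
		if (!isxdigit((unsigned char)*s))
			return -1;

		errno = 0;
		v = strtoul(s, &end, 16);	/* base 16 also accepts a "0x" prefix */
		if (errno != 0 || v > UINT32_MAX)
			return -1;
		out[i] = (uint32_t)v;

		if (i == 0) {
			/* Exactly one comma must separate the two words. */
			if (*end != ',')
				return -1;
			s = end + 1;
		} else if (*end != '\0') {
			/* Nothing may follow the second word. */
			return -1;
		}
	}
	return 0;
}
```

Rejecting anything other than exactly two words keeps the sketch strict; a real handler may be more lenient about whitespace or trailing newlines.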

[RFC PATCH 2/2] userns: control capabilities of some user namespaces

2017-09-21 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[RFC PATCH 0/2] capability controlled user-namespaces

2017-09-21 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCH next] bonding: speed/duplex update at NETDEV_UP event

2017-09-27 Thread Mahesh Bandewar
From: Mahesh Bandewar Some NIC drivers don't have correct speed/duplex settings at the time they send NETDEV_UP notification and that messes up the bonding state. Especially 802.3ad mode which is very sensitive to these settings. In the current implementation we invoke bond_update_speed_d

[PATCH 2/2] userns: control capabilities of some user namespaces

2017-09-29 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCH 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-09-29 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCH 0/2] capability controlled user-namespaces

2017-09-29 Thread Mahesh Bandewar
From: Mahesh Bandewar [Same as the previous RFC series sent on 9/21] TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few

[PATCH iproute2] iproute: make clang happy

2018-08-20 Thread Mahesh Bandewar
From: Mahesh Bandewar These are primarily fixes for "string is not string literal" warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op change. I had to replace couple of print helper functions with the code they call as it was becoming harder to eliminate thes
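
As a rough illustration of the class of warning mentioned above, the sketch below shows the usual way to avoid passing a non-literal string as a format argument. The helper `format_label` is hypothetical and is not taken from the iproute2 patch.

```c
#include <stdio.h>

/*
 * Copy an externally supplied label into buf.
 * Writing snprintf(buf, n, label) would pass a non-literal format string,
 * which is what -Wformat-nonliteral complains about. A literal "%s" avoids
 * the warning and keeps '%' characters in the label from being interpreted.
 */
static int format_label(char *buf, size_t n, const char *label)
{
	return snprintf(buf, n, "%s", label);
}
```

With this shape, a label such as `100%d` is copied verbatim instead of being treated as a conversion specification.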

[PATCHv2 iproute2 1/3] ipmaddr: use preferred_family when given

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar When creating socket() AF_INET is used irrespective of the family that is given at the command-line (with -4, -6, or -0). This change will open the socket with the preferred family. Signed-off-by: Mahesh Bandewar --- ip/ipmaddr.c | 13 - 1 file changed, 12
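
A minimal sketch of the idea, assuming that an unset preference is represented as `AF_UNSPEC` and that the previous `AF_INET` behaviour is kept as the fallback. Both helper names are hypothetical; the actual patch may validate the family differently.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: choose the address family for the socket.
 * Assumption: AF_UNSPEC means no family option was given on the
 * command line, so fall back to the old AF_INET default.
 */
static int pick_socket_family(int preferred_family)
{
	return preferred_family == AF_UNSPEC ? AF_INET : preferred_family;
}

/* Open a datagram socket in the chosen family; returns the fd or -1. */
static int open_family_socket(int preferred_family)
{
	return socket(pick_socket_family(preferred_family), SOCK_DGRAM, 0);
}
```

Note that opening a socket in some families may need extra privileges, so a caller should still check the return value of `open_family_socket()`.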

[PATCHv2 iproute2 2/3] tc: remove extern from prototype declarations

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar Signed-off-by: Mahesh Bandewar --- tc/m_ematch.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tc/m_ematch.h b/tc/m_ematch.h index f634f19164fa..80b02cfad6cc 100644 --- a/tc/m_ematch.h +++ b/tc/m_ematch.h @@ -20,7 +20,7 @@ struct bstr

[PATCHv2 iproute2 0/3] clang + misc changes

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar The primary theme is to make clang compile the iproute2 package without warnings. Along with this there are two other misc patches in the series. First patch uses the preferred_family when operating with maddr feature. Prior to this patch, it would always open an AF_INET

[PATCHv2 iproute2 3/3] iproute: make clang happy with iproute2 package

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar These are primarily fixes for "string is not string literal" warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op change. I had to replace couple of print helper functions with the code they call as it was becoming harder to eliminate thes

[PATCHv3 iproute2 2/2] iproute: make clang happy

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar These are primarily fixes for "string is not string literal" warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op change. I had to replace couple of print helper functions with the code they call as it was becoming harder to eliminate thes

[PATCHv3 iproute2 0/2] clang + misc changes

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar The primary theme is to make clang compile the iproute2 package without warnings. Along with this there are two other misc patches in the series. First patch uses the preferred_family when operating with maddr feature. Prior to this patch, it would always open an AF_INET

[PATCHv3 iproute2 1/2] ipmaddr: use preferred_family when given

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar When creating socket() AF_INET is used irrespective of the family that is given at the command-line (with -4, -6, or -0). This change will open the socket with the preferred family. Signed-off-by: Mahesh Bandewar --- ip/ipmaddr.c | 13 - 1 file changed, 12

[PATCH iproute2] iproute2: fix use-after-free

2018-09-12 Thread Mahesh Bandewar
From: Mahesh Bandewar A local program using iproute2 lib pointed out the issue and looking at the code it is pretty obvious - a = (struct nlmsghdr *)b; ... free(b); if (a->nlmsg_seq == seq) ... Fixes: 86bf43c7c2fd ("lib/libnetlink: update rtnl_talk to support mal
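
The pattern quoted above (a pointer aliasing a buffer, the buffer freed, then the alias dereferenced) can be avoided by reading the needed field before the free. The sketch below uses a hypothetical `struct msg_hdr` as a stand-in for the netlink header; it is not the libnetlink code or the actual fix.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a netlink message header; not struct nlmsghdr. */
struct msg_hdr {
	uint32_t len;
	uint32_t seq;
};

/* Allocate a message with the given sequence number (illustration helper). */
static void *make_msg(uint32_t seq)
{
	struct msg_hdr *h = malloc(sizeof(*h));

	if (h != NULL) {
		h->len = sizeof(*h);
		h->seq = seq;
	}
	return h;
}

/*
 * Free buf and report whether its sequence number matched.
 * Returns 1 on match, 0 on mismatch, -1 if buf is NULL.
 */
static int consume_and_check_seq(void *buf, uint32_t seq)
{
	uint32_t msg_seq;

	if (buf == NULL)
		return -1;

	/* Read everything we need *before* releasing the buffer. */
	msg_seq = ((struct msg_hdr *)buf)->seq;
	free(buf);

	return msg_seq == seq;
}
```

Copying the field into a local first means no pointer into the freed buffer is used afterwards, whichever order the later checks run in.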

[PATCHv4 2/2] userns: control capabilities of some user namespaces

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCHv4 0/2] capability controlled user-namespaces

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCHv4 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. Capability mask is stored in kernel as kernel_cap_t type (array of u32). This sysctl takes input as comma separated hex u32 words. For simplicity one could see this sysctl to operate on string inputs. However

[PATCH next] blackhole_netdev: fix syzkaller reported issue

2019-10-09 Thread Mahesh Bandewar
DR6: fffe0ff0 DR7: 0400 Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Signed-off-by: Mahesh Bandewar --- net/ipv6/addrconf.c | 6 +- net/ipv6/route.c| 5 ++--- 2 files changed, 7 insertions(+), 4 deletions(-) di

[PATCH next] ipvlan: consolidate TSO flags using NETIF_F_ALL_TSO

2019-10-09 Thread Mahesh Bandewar
This will ensure that any new TSO-related flags that get added (which would be part of the ALL_TSO mask) are covered, so the IPvlan driver doesn't need an update every time a new flag is added. Signed-off-by: Mahesh Bandewar Suggested-by: Eric Dumazet --- drivers/net/ipvlan/ipvlan_main.c | 4 ++-- 1 file chang

[PATCHv2 next] blackhole_netdev: fix syzkaller reported issue

2019-10-10 Thread Mahesh Bandewar
DR6: fffe0ff0 DR7: 0400 Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Signed-off-by: Mahesh Bandewar CC: Eric Dumazet --- v1->v2: fixed missing update in ip6_dst_ifdown() net/ipv6/addrconf.c | 6 +- net/ip

[PATCHv3 next] blackhole_netdev: fix syzkaller reported issue

2019-10-11 Thread Mahesh Bandewar
DR6: fffe0ff0 DR7: 0400 Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Signed-off-by: Mahesh Bandewar CC: Eric Dumazet CC: Wei Wang --- v1->v2: fixed missing update in ip6_dst_ifdown() v2->v3: added idev cle

[PATCH net] Revert "blackhole_netdev: fix syzkaller reported issue"

2019-10-16 Thread Mahesh Bandewar
evert this now and I'll send a better fix after analysing / fixing the weirdness observed. CC: Eric Dumazet CC: Wei Wang CC: David S. Miller Signed-off-by: Mahesh Bandewar --- net/ipv6/addrconf.c | 7 +-- net/ipv6/route.c| 15 +-- 2 files changed, 10 insertions(+), 12

[PATCH net] bonding: pass link-local packets to bonding master also.

2018-09-24 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCH net] bonding: avoid possible dead-lock

2018-09-24 Thread Mahesh Bandewar
From: Mahesh Bandewar Syzkaller reported this on a slightly older kernel but it's still applicable to the current kernel - == WARNING: possible circular locking dependency detected 4.18.0-next-20180823+ #46 Not ta

[PATCH net] bonding: fix warning message

2018-10-02 Thread Mahesh Bandewar
From: Mahesh Bandewar RX queue config for bonding master could be different from its slave device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also."), the packet is reinjected into stack with skb->dev as bonding master. This potentially

[PATCH next v2] bonding: pass link-local packets to bonding master also.

2018-07-18 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCH net v2] bonding: pass link-local packets to bonding master also.

2018-07-18 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCH iproute2] ipmaddr: use preferred_family when given

2018-08-15 Thread Mahesh Bandewar
From: Mahesh Bandewar When creating socket() AF_INET is used irrespective of the family that is given at the command-line (with -4, -6, or -0). This change will open the socket with the preferred family. Signed-off-by: Mahesh Bandewar --- ip/ipmaddr.c | 13 - 1 file changed, 12

[PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-02 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-02 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCH resend 0/2] capability controlled user-namespaces

2017-11-02 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCHv2 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCHv2 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-09 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCHv2 0/2] capability controlled user-namespaces

2017-11-09 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCH 0/2] bring UP loopback device at initialziation

2017-07-04 Thread Mahesh Bandewar
From: Mahesh Bandewar In almost every scenario the loopback device is brought UP after initialization. So there is no point in creating the device in the DOWN state and following that with a device UP operation. This change exposed another issue in fib-trie initialization, which is corrected in the first patch

[PATCH 2/2] loopback: bringup 'lo' by default at initialization

2017-07-04 Thread Mahesh Bandewar
From: Mahesh Bandewar loopback devices are always brought up right after its initialization including the case of network namespace creation. e.g. ip netns add foo ip -netns foo link set lo up This patch will eliminate the need to do that separately and would bring it up as part of the

[PATCH 1/2] ipv4: initialize fib_trie prior to register_netdev_notifier call.

2017-07-04 Thread Mahesh Bandewar
From: Mahesh Bandewar Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier(). It does not cause any problem since there are no devices UP at this moment, but trying to bring 'lo' UP at initialization would make this assumption wron

[PATCH 1/3] ipv4: initialize fib_trie prior to register_netdev_notifier call.

2017-07-19 Thread Mahesh Bandewar
From: Mahesh Bandewar Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier(). In fact, fib_trie initialization needs to happen before the first rtnl_register(). It does not cause any problem since there are no devices UP at this moment, but trying to

[PATCHv3 2/2] userns: control capabilities of some user namespaces

2017-12-05 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCHv3 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-12-05 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCHv3 0/2] capability controlled user-namespaces

2017-12-05 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCH next] ipvlan: add L2 check for packets arriving via virtual devices

2017-12-07 Thread Mahesh Bandewar
From: Mahesh Bandewar Packets that don't have dest mac as the mac of the master device should not be entertained by the IPvlan rx-handler. This is mostly true as the packet path mostly takes care of that, except when the master device is a virtual device. As demonstrated in the following

[PATCH next 2/2] ipvlan: remove excessive packet scrubbing

2017-12-13 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan currently scrubs packets at every location where packets may be crossing namespace boundary. Though this is desirable, currently IPvlan does it more than necessary. e.g. packets that are going to take dev_forward_skb() path will get scrubbed so no point in scrubbing

[PATCH next 0/2] ipvlan: packet scrub

2017-12-13 Thread Mahesh Bandewar
From: Mahesh Bandewar While crossing namespace boundary IPvlan aggressively scrubs packets. This is creating problems. First thing is that scrubbing changes the packet type in skb meta-data to PACKET_HOST. This causes erroneous packet delivery when dev_forward_skb() has already marked the

[PATCH next 1/2] Revert "ipvlan: add L2 check for packets arriving via virtual devices"

2017-12-13 Thread Mahesh Bandewar
From: Mahesh Bandewar This reverts commit 92ff42645028fa6f9b8aa767718457b9264316b4. Even though the check added is not that taxing, it's not really needed. First of all this will be per packet cost and second thing is that the eth_type_trans() already does this correctly. The exce

[PATCH] kmod: don't load module unless req process has CAP_SYS_MODULE

2017-05-12 Thread Mahesh Bandewar
From: Mahesh Bandewar A process inside random user-ns should not load a module, which is currently possible. As demonstrated in following scenario - Create namespaces; especially a user-ns and become root inside. $ unshare -rfUp -- unshare -unm -- bash Try to load the bridge module. It

[PATCH net] ipv6: avoid dad-failures for addresses with NODAD

2017-05-12 Thread Mahesh Bandewar
From: Mahesh Bandewar Every address gets added with TENTATIVE flag even for the addresses with IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process we realize it's an address with NODAD and complete the process without sending any probe. However the TENTATIVE flags

[PATCH next 2/2] ipvlan: implement VEPA mode

2017-10-26 Thread Mahesh Bandewar
From: Mahesh Bandewar This is very similar to the Macvlan VEPA mode, however, there is some difference. IPvlan uses the mac-address of the lower device, so the VEPA mode has implications of ICMP-redirects for packets destined for its immediate neighbors sharing same master since the packets will

[PATCH next 1/2] ipvlan: introduce 'private' attribute for all existing modes.

2017-10-26 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan has always operated in bridge mode. However, there are scenarios where each slave should be able to talk through the master device but not necessarily to each other. Think of an environment where each namespace is a private and independent customer. In this

[PATCH next 0/2] add 'private' and 'vepa' attributes to ipvlan modes

2017-10-26 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan has always been operating in bridge-mode for its supported modes i.e. if the packets are destined to the adjacent neighbor dev, then IPvlan driver will switch the packet internally without needing the packets to hit the wire or get routed. However, there are

[PATCH] ip/ipvlan: enhance ability to add mode flags to existing modes

2017-10-30 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan supported bridge-only functionality prior to commits a190d04db937 ('ipvlan: introduce 'private' attribute for all existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode'). These two commits allow to configure the VEPA and priv

[PATCH next 2/3] ipvlan: Process fragmented multicast frames correctly

2015-04-23 Thread Mahesh Bandewar
Multicast processing in IPvlan was faulty as is. Eric Dumazet pointed out that fragmented packets won't be processed correctly unless a defrag step is introduced. This patch adds the defrag step before the driver attempts to process multicast frame(s). Signed-off-by: Mahesh Bandewar --- drivers/net

[PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread Mahesh Bandewar
we need to apply any additional tricks to further reduce the impact of this (multicast / broadcast) type of traffic, it can be implemented while processing this work without affecting the fast-path. Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan/ipvlan.h | 5 ++ drivers/net/ipvlan

[PATCH next 3/3] ipvlan: Always set broadcast bit in multicast filter

2015-04-23 Thread Mahesh Bandewar
correctly without affecting performance characteristics of the device. Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan/ipvlan_main.c | 20 ++-- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c

Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread Mahesh Bandewar
On Thu, Apr 23, 2015 at 5:32 PM, Eric Dumazet wrote: > On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote: > >> +static void ipvlan_multicast_enqueue(struct ipvl_port *port, >> + struct sk_buff *skb) >> +{ >> + if (skb-

Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread Mahesh Bandewar
On Thu, Apr 23, 2015 at 9:28 PM, David Miller wrote: > From: Mahesh Bandewar > Date: Thu, 23 Apr 2015 19:54:29 -0700 > >> On Thu, Apr 23, 2015 at 5:32 PM, Eric Dumazet wrote: >>> On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote: >>> >>>>

Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-24 Thread Mahesh Bandewar
On Fri, Apr 24, 2015 at 1:15 PM, Dan Williams wrote: > On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote: >> Processing multicast / broadcast in fast path is performance draining >> and having more links means more clonning and bringing performance >> down further

Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-27 Thread Mahesh Bandewar
On Fri, Apr 24, 2015 at 3:59 PM, Dan Williams wrote: > On Fri, 2015-04-24 at 15:40 -0700, Mahesh Bandewar wrote: >> On Fri, Apr 24, 2015 at 1:15 PM, Dan Williams wrote: >> > On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote: >> >> Processing multica

Re: [PATCH v8 net-next 2/2] bonding: Simplify the xmit function for modes that use xmit_hash

2015-10-19 Thread Mahesh Bandewar
On Mon, Oct 19, 2015 at 9:35 AM, Jiri Pirko wrote: > Sun, Oct 05, 2014 at 02:45:01AM CEST, mahe...@google.com wrote: >>Earlier change to use usable slave array for TLB mode had an additional >>performance advantage. So extending the same logic to all other modes >>that use xmit-hash for slave sele

Re: [PATCH net-next] ipvlan: read direct ifindex instead of iflink

2015-10-21 Thread Mahesh Bandewar
> > Notice also that ipv6 processing is not using iflink. Since there is a > discrepancy in usage, fixup both v4 and v6 case to use local dev > variable. > > Tested this with l3 ipvlan on top of veth, as well as with single > physical interface in the top namespace. > > S

[PATCH next 2/3] bonding: unify all places where actor-oper key needs to be updated.

2015-10-31 Thread Mahesh Bandewar
-machine logic - (a) If port is "duplex" then only it can participate in LACP (b) Speed change reinitializes the LACP state-machine. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_3ad.c | 87 +- 1 file changed, 52 insertions(+), 35

[PATCH next 3/3] bonding: simplify / unify event handling code for 3ad mode.

2015-10-31 Thread Mahesh Bandewar
Old logic of updating state-machine is not required since ad_update_actor_keys() does it implicitly. The only loss is the notification differentiation between speed vs. duplex change. Now only one unified notification is printed. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_3ad.c

[PATCH next 0/3] re-org actor admin/oper key updates

2015-10-31 Thread Mahesh Bandewar
ndler to decide if the key needs update. After this patch-set none of the machines (from same sample set) were exhibiting LACP-weirdness that was observed earlier. Mahesh Bandewar (3): bonding: Simplify __get_duplex function. bonding: unify all places where actor-oper key needs to be up

[PATCH next 1/3] bonding: Simplify __get_duplex function.

2015-10-31 Thread Mahesh Bandewar
Eliminate 'else' clause by simply initializing variable Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_3ad.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 3c45358844eb..3a
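
The refactoring described above can be sketched in isolation. The struct, field names, and return values below are illustrative assumptions and do not reproduce the bonding driver's `__get_duplex()`.

```c
/* Illustrative only: not the bonding driver's real types or values. */
struct port_info {
	int link_up;		/* non-zero when the link is up */
	int full_duplex;	/* non-zero when the link is full duplex */
};

/* Before: the default value is assigned in an 'else' branch. */
static int get_duplex_with_else(const struct port_info *p)
{
	int retval;

	if (p->link_up)
		retval = p->full_duplex ? 1 : 0;
	else
		retval = 0;
	return retval;
}

/* After: initialize the default up front and drop the 'else'. */
static int get_duplex_initialized(const struct port_info *p)
{
	int retval = 0;

	if (p->link_up)
		retval = p->full_duplex ? 1 : 0;
	return retval;
}
```

Both versions return the same value for every input; the second just has one fewer branch to read.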

Re: [PATCH net-next] macvlan: fix failure during registration v3

2016-04-26 Thread Mahesh Bandewar
On Sat, Apr 23, 2016 at 3:03 PM, Francesco Ruggeri wrote: > If macvlan_common_newlink fails in register_netdevice after macvlan_init > then it decrements port->count twice, first in macvlan_uninit (from > register_netdevice or rollback_registered) and then again in > macvlan_common_newlink. > A si

Re: [PATCH net-next] macvlan: fix failure during registration v3

2016-04-26 Thread Mahesh Bandewar
[...] >> > -destroy_port: >> > - port->count -= 1; >> > - if (!port->count) >> > - macvlan_port_destroy(lowerdev); >> I think you still need this when it fails netdev_upper_dev_link(). The >> only thing you should remove is the label. > > I don't think so. I think the doub

[PATCH next] ipvlan: Fix failure path in dev registration during link creation

2016-04-27 Thread Mahesh Bandewar
From: Mahesh Bandewar When newlink creation fails at device-registration, the port->count is decremented twice. Francesco Ruggeri (frugg...@arista.com) found this issue in Macvlan and the same exists in IPvlan driver too. While fixing this issue I noticed another issue of missing unregister

Re: [PATCH next] ipvlan: Fix failure path in dev registration during link creation

2016-04-27 Thread Mahesh Bandewar
On Wed, Apr 27, 2016 at 11:57 AM, David Miller wrote: > From: Mahesh Bandewar > Date: Wed, 27 Apr 2016 11:37:39 -0700 > >> While fixing this issue I noticed another issue of missing unregister >> in case of failure, so adding it to the fix which is similar to the >>

[PATCH next v2] ipvlan: Fix failure path in dev registration during link creation

2016-04-27 Thread Mahesh Bandewar
From: Mahesh Bandewar When newlink creation fails at device-registration, the port->count is decremented twice. Francesco Ruggeri (frugg...@arista.com) found this issue in Macvlan and the same exists in IPvlan driver too. While fixing this issue I noticed another issue of missing unregister

[PATCH next] ipvlan: inherit MTU from master device

2016-01-27 Thread Mahesh Bandewar
From: Mahesh Bandewar When we create IPvlan slave; we use ether_setup() and that sets up default MTU to 1500 while the master device may have lower / different MTU. Any subsequent changes to the masters' MTU are reflected into the slaves' MTU setting. However if those don't happ

Re: [PATCH next] ipvlan: inherit MTU from master device

2016-01-28 Thread Mahesh Bandewar
On Thu, Jan 28, 2016 at 5:13 AM, Eric Dumazet wrote: > On Wed, Jan 27, 2016 at 11:33 PM, Mahesh Bandewar wrote: >> From: Mahesh Bandewar >> >> When we create IPvlan slave; we use ether_setup() and that >> sets up default MTU to 1500 while the master device may have

[PATCH next 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-02 Thread Mahesh Bandewar
From: Mahesh Bandewar Scrub skb before hitting the iptable hooks to ensure packets hit these hooks. Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan/ipvlan_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan

[PATCH next 2/3] ipvlan: mode is u16

2016-02-02 Thread Mahesh Bandewar
From: Mahesh Bandewar The mode argument was erroneously defined as u32 but it has always been u16. Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan/ipvlan.h | 1 - drivers/net/ipvlan/ipvlan_main.c | 9 ++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers

[PATCH next 3/3] ipvlan: misc changes

2016-02-02 Thread Mahesh Bandewar
From: Mahesh Bandewar 1. scope correction for few functions that are used in single file. 2. Adjust variables that are used in fast-path to fit into single cacheline 3. Update rcv_frame() to skip shared check for frames coming over wire Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan

Re: [PATCH next 2/3] ipvlan: mode is u16

2016-02-08 Thread Mahesh Bandewar
On Sun, Feb 7, 2016 at 11:19 AM, David Miller wrote: > From: Mahesh Bandewar > Date: Tue, 2 Feb 2016 11:20:30 -0800 > >> From: Mahesh Bandewar >> >> The mode argument was erroneously defined as u32 but it has always >> been u16. >> >> Signed-off-by

[PATCHv2 next 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-10 Thread Mahesh Bandewar
From: Mahesh Bandewar Scrub skb before hitting the iptable hooks to ensure packets hit these hooks. Signed-off-by: Mahesh Bandewar --- v1: initial patch v2: resend drivers/net/ipvlan/ipvlan_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ipvlan

[PATCHv2 next 2/3] ipvlan: mode is u16

2016-02-10 Thread Mahesh Bandewar
From: Mahesh Bandewar The mode argument was erroneously defined as u32 but it has always been u16. Also use ipvlan_set_mode() helper to set the mode instead of assigning directly. This should avoid future erroneous assignments / updates. Signed-off-by: Mahesh Bandewar --- v1: initial patch v2

[PATCHv2 next 3/3] ipvlan: misc changes

2016-02-10 Thread Mahesh Bandewar
From: Mahesh Bandewar 1. scope correction for few functions that are used in single file. 2. Adjust variables that are used in fast-path to fit into single cacheline 3. Update rcv_frame() to skip shared check for frames coming over wire Signed-off-by: Mahesh Bandewar --- v1: initial patch v2

Re: [PATCHv2 next 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-17 Thread Mahesh Bandewar
On Fri, Feb 12, 2016 at 2:24 PM, Cong Wang wrote: > On Wed, Feb 10, 2016 at 7:33 AM, Mahesh Bandewar wrote: >> From: Mahesh Bandewar >> >> Scrub skb before hitting the iptable hooks to ensure packets hit >> these hooks. >> >> Signed-off-by: Mahesh Band

[PATCH next v3 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-17 Thread Mahesh Bandewar
From: Mahesh Bandewar Scrub skb before hitting the iptable hooks to ensure packets hit these hooks in master's namespace. Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan/ipvlan_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/i

[PATCH next v3 3/3] ipvlan: misc perf changes

2016-02-17 Thread Mahesh Bandewar
From: Mahesh Bandewar 1. scope correction for few functions that are used in single file. 2. Adjust variables that are used in fast-path to fit into single cacheline 3. Update rcv_frame() to skip shared check for frames coming over wire Signed-off-by: Mahesh Bandewar --- drivers/net/ipvlan
