[PATCH net] tcp: correct read of TFO keys on big endian systems

2020-08-10 Thread Jason Baron
test on s390x, which depends on the read value matching what was written. I've confirmed that the test now passes on big and little endian systems. Signed-off-by: Jason Baron Fixes: 438ac88009bc ("net: fastopen: robustness and endianness fixes for SipHash") Cc: Ard Biesheuvel Cc:

Re: [PATCH v3 6/7] venus: Make debug infrastructure more flexible

2020-06-11 Thread Jason Baron
On 6/11/20 5:19 PM, jim.cro...@gmail.com wrote: > trimmed.. > Currently I think there are not enough "levels" to map something like drm.debug to the new dyn dbg feature. I don't think it is intrinsic but I couldn't find the bit of the code where the 5-bit level in struct _ddebug

Re: [net-next 0/2] net: sched: cls-flower: add support for port-based fragment filtering

2020-05-31 Thread Jason Baron
On 5/29/20 7:52 PM, David Miller wrote: > From: Jason Baron > Date: Wed, 27 May 2020 16:25:28 -0400 > >> Port based allow rules must currently allow all fragments since the >> port number is not included in the 1rst fragment. We want to restrict >> allowing all

[net-next 0/2] net: sched: cls-flower: add support for port-based fragment filtering

2020-05-27 Thread Jason Baron
, demonstrating the new behavior. Jason Baron (2): net: sched: cls-flower: include ports in 1rst fragment selftests: tc_flower: add destination port tests net/core/flow_dissector.c | 4 +- net/sched/cls_flower.c | 3 +- .../testing/selftests/net

[net-next 1/2] net: sched: cls-flower: include ports in 1rst fragment

2020-05-27 Thread Jason Baron
ing a rule which allows/blocks specific ports and allows all non-first ip fragments (via setting ip_flags to frag/nofirstfrag). Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Signed-off-by: Jason Baron --- net/core/flow_dissector.c | 4 +++- net/sched/cls_flower.c| 3 ++- 2 files chang

[net-next 2/2] selftests: tc_flower: add destination port tests

2020-05-27 Thread Jason Baron
Verify that tc flower can match on destination port for udp/tcp for both non-fragment and first fragment cases. Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Signed-off-by: Jason Baron --- .../testing/selftests/net/forwarding/tc_flower.sh | 73 +- 1 file changed, 72

[net-next v2] tcp: add TCP_INFO status for failed client TFO

2019-10-23 Thread Jason Baron
lds do not cover all the cases where TFO may fail, but other failures, such as SYN/ACK + data being dropped, will result in the connection not becoming established. And a connection blackhole after session establishment shows up as a stalled connection. Signed-off-by: Jason Baron Cc: Eric Dumazet

Re: [net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-22 Thread Jason Baron
On 10/22/19 2:17 PM, Yuchung Cheng wrote: > On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell wrote: >> >> On Mon, Oct 21, 2019 at 5:11 PM Jason Baron wrote: >>> >>> >>> >>> On 10/21/19 4:36 PM, Eric Dumazet wrote: >>>>

Re: [net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-21 Thread Jason Baron
On 10/21/19 4:36 PM, Eric Dumazet wrote: > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch wrote: >> > >> Actually, longterm I hope we would be able to get rid of the >> blackhole-detection and fallback heuristics. In a far distant future where >> these middleboxes have been weeded out ;-) >

Re: [net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-21 Thread Jason Baron
On 10/21/19 2:02 PM, Yuchung Cheng wrote: > Thanks for the patch. Detailed comments below > > On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell wrote: >> >> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron wrote: >>> >>> The TCPI_OPT_SYN_DATA bit as part of

[net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-18 Thread Jason Baron
ainly not cover all the cases where TFO may fail, but other failures, such as SYN/ACK + data being dropped, will result in the connection not becoming established. And a connection blackhole after session establishment shows up as a stalled connection. Signed-off-by: Jason Baron Cc: Eric Dumazet Cc

Re: [PATCH v2] tcp: Add TCP_INFO counter for packets received out-of-order

2019-09-17 Thread Jason Baron
On 9/10/19 4:38 PM, Eric Dumazet wrote: > On Tue, Sep 10, 2019 at 10:11 PM Thomas Higdon wrote: >> >> > ... >> Because an additional 32-bit member in struct tcp_info would cause >> a hole on 64-bit systems, we reserve a struct member '_reserved'. > ... >> diff --git a/include/uapi/linux/tcp.h b

Re: [PATCH net] tcp: remove empty skb from write queue in error cases

2019-08-26 Thread Jason Baron
write queue > is empty") > Signed-off-by: Eric Dumazet > Cc: Jason Baron > Reported-by: Vladimir Rutsky > Cc: Soheil Hassas Yeganeh > Cc: Neal Cardwell > --- > net/ipv4/tcp.c | 30 -- > 1 file changed, 20 insertion

Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed

2019-08-19 Thread Jason Baron
On 8/17/19 12:26 PM, Eric Dumazet wrote: > > > On 8/17/19 4:19 PM, Jason Baron wrote: >> >> >> On 8/17/19 12:26 AM, Eric Dumazet wrote: >>> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE >>> under memory pressure"

[net PATCH] net/smc: make sure EPOLLOUT is raised

2019-08-19 Thread Jason Baron
SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior that Eric Dumazet introduced for tcp sockets. Signed-off-by: Jason Baron Cc: Eric Dumazet Cc: Ursula Braun Cc: Karsten Graul --- net/smc/smc_tx.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net

Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed

2019-08-17 Thread Jason Baron
On 8/17/19 12:26 AM, Eric Dumazet wrote: > As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE > under memory pressure"), it is crucial we properly set SOCK_NOSPACE > when needed. > > However, Jason patch had a bug, because the 'n

Re: [PATCH net-next] gso: enable udp gso for virtual devices

2019-06-26 Thread Jason Baron
On 6/14/19 4:53 PM, Jason Baron wrote: > > > On 6/13/19 5:20 PM, Willem de Bruijn wrote: >>>>> @@ -237,6 +237,7 @@ static inline int find_next_netdev_feature(u64 >>>>> feature, unsigned long start) >>>>>

Re: [PATCH net-next] gso: enable udp gso for virtual devices

2019-06-14 Thread Jason Baron
On 6/13/19 5:20 PM, Willem de Bruijn wrote: @@ -237,6 +237,7 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) NETIF_F_GSO_GRE_CSUM | \ NETIF_F_GSO_IPXIP4 |

Re: [PATCH net-next] gso: enable udp gso for virtual devices

2019-06-13 Thread Jason Baron
On 6/13/19 1:15 PM, Alexander Duyck wrote: > On Wed, Jun 12, 2019 at 4:14 PM Jason Baron wrote: >> >> Now that the stack supports UDP GRO, we can enable udp gso for virtual >> devices. If packets are looped back locally, and UDP GRO is not enabled >> then they will b

[PATCH net-next] gso: enable udp gso for virtual devices

2019-06-13 Thread Jason Baron
. Tested by connecting two namespaces via macvlan and then ran udpgso_bench_tx: before: udp tx: 2068 MB/s 35085 calls/s 35085 msg/s after (no UDP_GRO): udp tx: 3438 MB/s 58319 calls/s 58319 msg/s after (UDP_GRO): udp tx: 8037 MB/s 136314 calls/s 136314 msg/s Signed-off-by: Jason

[PATCH net-next v2 5/6] Documentation: ip-sysctl.txt: Document tcp_fastopen_key

2019-05-29 Thread Jason Baron
Add docs for /proc/sys/net/ipv4/tcp_fastopen_key Signed-off-by: Jason Baron Signed-off-by: Christoph Paasch Cc: Jeremy Sowden Acked-by: Yuchung Cheng --- Documentation/networking/ip-sysctl.txt | 20 1 file changed, 20 insertions(+) diff --git a/Documentation/networking

[PATCH net-next v2 3/6] tcp: add support to TCP_FASTOPEN_KEY for optional backup key

2019-05-29 Thread Jason Baron
ceive a 32-byte value as output if requested. If a 16-byte value is used to set the primary key via TCP_FASTOPEN_KEY, then any previously set backup key will be removed. Signed-off-by: Jason Baron Signed-off-by: Christoph Paasch Acked-by: Yuchung Cheng --- net/ipv4/

[PATCH net-next v2 2/6] tcp: add backup TFO key infrastructure

2019-05-29 Thread Jason Baron
se of this infrastructure in subsequent patches. Suggested-by: Igor Lubashev Signed-off-by: Jason Baron Signed-off-by: Christoph Paasch Acked-by: Yuchung Cheng --- include/net/tcp.h | 41 ++- include/uapi/linux/snmp.h | 1 + net/ipv4/proc.c| 1 + net

[PATCH net-next v2 6/6] selftests/net: add TFO key rotation selftest

2019-05-29 Thread Jason Baron
Demonstrate how the primary and backup TFO keys can be rotated while minimizing the number of client cookies that are rejected. Signed-off-by: Jason Baron Signed-off-by: Christoph Paasch Acked-by: Yuchung Cheng --- tools/testing/selftests/net/.gitignore | 1 + tools/testing

[PATCH net-next v2 1/6] tcp: introduce __tcp_fastopen_cookie_gen_cipher()

2019-05-29 Thread Jason Baron
Christoph Paasch Signed-off-by: Jason Baron Acked-by: Yuchung Cheng --- net/ipv4/tcp_fastopen.c | 73 + 1 file changed, 37 insertions(+), 36 deletions(-) diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 018a484..3889ad2 100644

[PATCH net-next v2 4/6] tcp: add support for optional TFO backup key to net.ipv4.tcp_fastopen_key

2019-05-29 Thread Jason Baron
ne key is set, userspace will simply read back that single key as follows: # echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key x-x-x-x Signed-off-by: Jason Baron Signed-off-by: Christoph Paasch Acked-by: Yuchung Cheng --- net/i
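
The read-back format shown above ("x-x-x-x") can be handled from userspace with a small parser. The following is a minimal Python sketch, not code from the patch. It assumes each "x" is a 32-bit hexadecimal word, that an optional backup key follows the primary key after a comma, and that output is zero-padded to 8 digits; the helper names parse_tfo_keys() and format_tfo_keys() are hypothetical. It deliberately works on 32-bit words only and makes no claim about the byte order of the key in memory.

```python
def parse_tfo_keys(text):
    """Parse sysctl-style text into a list of keys.

    Each key is a tuple of four 32-bit integers. One key (primary) or two
    keys (primary, backup) are accepted; the comma separator is an assumption.
    """
    keys = []
    for part in text.strip().split(","):
        words = part.strip().split("-")
        if len(words) != 4:
            raise ValueError("expected 4 dash-separated words, got %r" % part)
        key = tuple(int(w, 16) for w in words)  # raises ValueError on bad hex
        if any(w > 0xFFFFFFFF for w in key):
            raise ValueError("word does not fit in 32 bits: %r" % part)
        keys.append(key)
    if len(keys) > 2:
        raise ValueError("at most a primary and a backup key are expected")
    return keys


def format_tfo_keys(keys):
    """Format keys back into the dash/comma text form (zero-padded, assumed)."""
    return ",".join("-".join("%08x" % w for w in key) for key in keys)


if __name__ == "__main__":
    sample = "00112233-44556677-8899aabb-ccddeeff"
    print(format_tfo_keys(parse_tfo_keys(sample)))
```

A write followed by a read should round-trip through these helpers unchanged, which is the same property the big-endian fix at the top of this list depends on.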

[PATCH net-next v2 0/6] add TFO backup key

2019-05-29 Thread Jason Baron
Jason Changes in v2: -spelling fixes in ip-sysctl.txt (Jeremy Sowden) -re-base to latest net-next Christoph Paasch (1): tcp: introduce __tcp_fastopen_cookie_gen_cipher() Jason Baron (5): tcp: add backup TFO key infrastructure tcp: add support to TCP_FASTOPEN_KEY for optional backup key

Re: [PATCH net-next 0/6] add TFO backup key

2019-05-28 Thread Jason Baron
On 5/24/19 7:17 PM, Yuchung Cheng wrote: > On Thu, May 23, 2019 at 4:31 PM Yuchung Cheng wrote: >> >> On Thu, May 23, 2019 at 12:14 PM David Miller wrote: >>> >>> From: Jason Baron >>> Date: Wed, 22 May 2019 16:39:32 -0400 >>> >>>>

[PATCH net-next 5/6] Documentation: ip-sysctl.txt: Document tcp_fastopen_key

2019-05-22 Thread Jason Baron
Add docs for /proc/sys/net/ipv4/tcp_fastopen_key Signed-off-by: Christoph Paasch Signed-off-by: Jason Baron --- Documentation/networking/ip-sysctl.txt | 20 1 file changed, 20 insertions(+) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip

[PATCH net-next 3/6] tcp: add support to TCP_FASTOPEN_KEY for optional backup key

2019-05-22 Thread Jason Baron
is used to set the primary key via TCP_FASTOPEN_KEY, then any previously set backup key will be removed. Signed-off-by: Christoph Paasch Signed-off-by: Jason Baron --- net/ipv4/tcp.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/net/ipv4/t

[PATCH net-next 0/6] add TFO backup key

2019-05-22 Thread Jason Baron
Jason Christoph Paasch (1): tcp: introduce __tcp_fastopen_cookie_gen_cipher() Jason Baron (5): tcp: add backup TFO key infrastructure tcp: add support to TCP_FASTOPEN_KEY for optional backup key tcp: add support for optional TFO backup key to /proc/sys/net/ipv4/tcp_fastopen_key Document

[PATCH net-next 6/6] selftests/net: add TFO key rotation selftest

2019-05-22 Thread Jason Baron
Demonstrate how the primary and backup TFO keys can be rotated while minimizing the number of client cookies that are rejected. Signed-off-by: Christoph Paasch Signed-off-by: Jason Baron --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile

[PATCH net-next 2/6] tcp: add backup TFO key infrastructure

2019-05-22 Thread Jason Baron
se of this infrastructure in subsequent patches. Suggested-by: Igor Lubashev Signed-off-by: Christoph Paasch Signed-off-by: Jason Baron --- include/net/tcp.h | 41 ++- include/uapi/linux/snmp.h | 1 + net/ipv4/proc.c| 1 + net/ipv4/sysctl_net_ipv4.c | 2 +-

[PATCH net-next 4/6] tcp: add support for optional TFO backup key to /proc/sys/net/ipv4/tcp_fastopen_key

2019-05-22 Thread Jason Baron
ne key is set, userspace will simply read back that single key as follows: # echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key x-x-x-x Signed-off-by: Christoph Paasch Signed-off-by: Jason Baron --- net/i

[PATCH net-next 1/6] tcp: introduce __tcp_fastopen_cookie_gen_cipher()

2019-05-22 Thread Jason Baron
From: Christoph Paasch Restructure __tcp_fastopen_cookie_gen() to take a 'struct crypto_cipher' argument and rename it as __tcp_fastopen_cookie_gen_cipher(). Subsequent patches will provide different ciphers based on which key is being used for the cookie generation. Signed-off-by: J

Re: [RFC] nasty corner case in unix_dgram_sendmsg()

2019-02-27 Thread Jason Baron
On 2/26/19 6:59 PM, Al Viro wrote: > On Tue, Feb 26, 2019 at 03:35:39PM -0500, Jason Baron wrote: > >>> I understand what the unix_dgram_peer_wake_me() is doing; I understand >>> what unix_dgram_poll() is using it for. What I do not understand is >>>

Re: [RFC] nasty corner case in unix_dgram_sendmsg()

2019-02-26 Thread Jason Baron
On 2/26/19 2:03 PM, Al Viro wrote: > On Tue, Feb 26, 2019 at 03:31:32PM +, Rainer Weikusat wrote: >> Al Viro writes: >>> On Tue, Feb 26, 2019 at 06:28:17AM +, Al Viro wrote: >> >> [...] >> >> * if after relocking we see that unix_peer(sk) now is equal to other, we arrange f

[PATCH v2 net-next] af_unix: ensure POLLOUT on remote close() for connected dgram socket

2018-08-03 Thread Jason Baron
as reported as a hang when /dev/log is closed. The fix is to signal POLLOUT if the socket is marked as SOCK_DEAD, which means a subsequent write() will get -ECONNREFUSED. Reported-by: Ian Lance Taylor Cc: David Rientjes Cc: Rainer Weikusat Cc: Eric Dumazet Signed-off-by: Jason Baron --- v2: u

[PATCH net] af_unix: ensure POLLOUT on remote close() for connected dgram sockets

2018-07-16 Thread Jason Baron
' step. Nevertheless, this has been observed when the syslog daemon closes /dev/log. Tested against a reproducer that re-creates the syslog hang. The proposed fix is to move the wake_up_interruptible_all() call after the 'free all skbs' step. Reported-by: Ian Lance Taylor Cc: Rainer W

Re: Bug report: epoll can fail to report EPOLLOUT when unix datagram socket peer is closed

2018-07-11 Thread Jason Baron
kbs' step would in fact cause a wakeup and a POLLOUT return. So the race here is probably fairly rare because it means there are no skbs that thread 1 queued and that thread 1 schedules before the 'free all skbs' step. Nevertheless, this has been observed in the wild via syslog. The

[PATCH v4 2/3] qemu: virtio-net: use 64-bit values for feature flags

2018-01-05 Thread Jason Baron
In preparation for using some of the high-order feature bits, make sure that virtio-net uses 64-bit values everywhere. Signed-off-by: Jason Baron Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: virtio-...@lists.oasis-open.org --- hw/net/virtio-net.c

[PATCH net-next v4 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2018-01-05 Thread Jason Baron
tention is that device feature bits are to grow down from bit 63, since the transports are starting from bit 24 and growing up. Signed-off-by: Jason Baron Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: virtio-...@lists.oasis-open.org --- drivers/net/virtio_net.c| 23 ++

[PATCH v4 3/3] qemu: add linkspeed and duplex settings to virtio-net

2018-01-05 Thread Jason Baron
bsequently overwrite it later if desired via: 'ethtool -s'. Linkspeed and duplex settings can be set as: '-device virtio-net,speed=1,duplex=full' where speed is [-1...INT_MAX], and duplex is ["half"|"full"]. Signed-off-by: Jason Baron Cc: "Michael

[PATCH v4 0/3] virtio_net: allow hypervisor to indicate linkspeed and duplex setting

2018-01-05 Thread Jason Baron
* only do speed/duplex read in virtnet_config_changed_work() on LINK_UP changes from v2: * move speed/duplex read into virtnet_config_changed_work() so link up changes are detected Jason Baron (1): virtio_net: propagate linkspeed/duplex settings from the hypervisor drivers/net/virtio_net.c

Re: [PATCH net-next v3 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2018-01-04 Thread Jason Baron
On 01/04/2018 01:22 PM, Michael S. Tsirkin wrote: > On Thu, Jan 04, 2018 at 01:12:30PM -0500, Jason Baron wrote: >> >> >> On 01/04/2018 12:05 PM, Michael S. Tsirkin wrote: >>> On Thu, Jan 04, 2018 at 12:16:44AM -0500, Jason Baron wrote: >>>> The ability

Re: [PATCH net-next v3 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2018-01-04 Thread Jason Baron
On 01/04/2018 12:05 PM, Michael S. Tsirkin wrote: > On Thu, Jan 04, 2018 at 12:16:44AM -0500, Jason Baron wrote: >> The ability to set speed and duplex for virtio_net is useful in various >> scenarios as described here: >> >> 16032be virtio_net: add ethtool support

Re: [PATCH net-next v3 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2018-01-04 Thread Jason Baron
On 01/04/2018 11:27 AM, Michael S. Tsirkin wrote: > On Thu, Jan 04, 2018 at 12:16:44AM -0500, Jason Baron wrote: >> The ability to set speed and duplex for virtio_net is useful in various >> scenarios as described here: >> >> 16032be virtio_net: add ethtool support

[PATCH v3 3/3] qemu: add linkspeed and duplex settings to virtio-net

2018-01-03 Thread Jason Baron
bsequently overwrite it later if desired via: 'ethtool -s'. Linkspeed and duplex settings can be set as: '-device virtio-net,speed=1,duplex=full' where speed is [-1...INT_MAX], and duplex is ["half"|"full"]. Signed-off-by: Jason Baron Cc: "Michael

[PATCH v3 0/3] virtio_net: allow hypervisor to indicate linkspeed and duplex setting

2018-01-03 Thread Jason Baron
s, -Jason linux changes: changes from v2: * move speed/duplex read into virtnet_config_changed_work() so link up changes are detected Jason Baron (1): virtio_net: propagate linkspeed/duplex settings from the hypervisor drivers/net/virtio_net.c| 19 ++- include/

[PATCH net-next v3 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2018-01-03 Thread Jason Baron
tention is that device feature bits are to grow down from bit 63, since the transports are starting from bit 24 and growing up. Signed-off-by: Jason Baron Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: virtio-...@lists.oasis-open.org --- drivers/net/virtio_net.c| 19 ++

[PATCH v3 2/3] qemu: virtio-net: use 64-bit values for feature flags

2018-01-03 Thread Jason Baron
In preparation for using some of the high-order feature bits, make sure that virtio-net uses 64-bit values everywhere. Signed-off-by: Jason Baron Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: virtio-...@lists.oasis-open.org --- hw/net/virtio-net.c

Re: [PATCH net-next v2 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2017-12-28 Thread Jason Baron
On 12/27/2017 04:43 PM, David Miller wrote: > From: Jason Baron > Date: Fri, 22 Dec 2017 16:54:01 -0500 > >> The ability to set speed and duplex for virtio_net is useful in various >> scenarios as described here: >> >> 16032be virtio_net: add ethtool su

[PATCH 3/3] qemu: add linkspeed and duplex settings to virtio-net

2017-12-22 Thread Jason Baron
bsequently overwrite it later if desired via: 'ethtool -s'. Linkspeed and duplex settings can be set as: '-device virtio-net,speed=1,duplex=full' where speed is [-1...INT_MAX], and duplex is ["half"|"full"]. Signed-off-by: Jason Baron Cc: "Michae

[PATCH 2/3] qemu: use 64-bit values for feature flags in virtio-net

2017-12-22 Thread Jason Baron
In preparation for using some of the high-order feature bits, make sure that virtio-net uses 64-bit values everywhere. Signed-off-by: Jason Baron Cc: "Michael S. Tsirkin" Cc: Jason Wang --- hw/net/virtio-net.c | 54 +- include/hw/vir

[PATCH net-next v2 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2017-12-22 Thread Jason Baron
htool commands. Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows the hypervisor to export a linkspeed and duplex setting. The user can subsequently overwrite it later if desired via: 'ethtool -s'. Signed-off-by: Jason Baron Cc: "Michael S. Tsirkin" Cc:

[PATCH v2 0/3] virtio_net: allow hypervisor to indicate linkspeed and duplex setting

2017-12-22 Thread Jason Baron
-Jason linux changes: Jason Baron (1): virtio_net: propagate linkspeed/duplex settings from the hypervisor drivers/net/virtio_net.c

[PATCH net] tcp: correct memory barrier usage in tcp_check_space()

2017-01-24 Thread Jason Baron
From: Jason Baron sock_reset_flag() maps to __clear_bit() not the atomic version clear_bit(). Thus, we need smp_mb(), smp_mb__after_atomic() is not sufficient. Fixes: 3c7151275c0c ("tcp: add memory barriers to write space paths") Cc: Eric Dumazet Cc: Oleg Nesterov Signed-off-by: J

Re: wrong smp_mb__after_atomic() in tcp_check_space() ?

2017-01-23 Thread Jason Baron
On 01/23/2017 01:04 PM, Eric Dumazet wrote: On Mon, 2017-01-23 at 11:56 -0500, Jason Baron wrote: On 01/23/2017 09:30 AM, Oleg Nesterov wrote: Hello, smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the non-atomic __clear_bit() and thus it can not guarantee test_bit

Re: wrong smp_mb__after_atomic() in tcp_check_space() ?

2017-01-23 Thread Jason Baron
On 01/23/2017 09:30 AM, Oleg Nesterov wrote: Hello, smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE) (non-atomic too) won't be reordered. Indeed. Here's a bit of discussion on it: http:/

[PATCH net-next] tcp: accept RST for rcv_nxt - 1 after receiving a FIN

2017-01-17 Thread Jason Baron
From: Jason Baron Using a Mac OSX box as a client connecting to a Linux server, we have found that when certain applications (such as 'ab'), are abruptly terminated (via ^C), a FIN is sent followed by a RST packet on tcp connections. The FIN is accepted by the Linux stack but the R

Re: [RFC PATCH] tcp: accept RST for rcv_nxt - 1 after receiving a FIN

2017-01-13 Thread Jason Baron
On 01/11/2017 10:48 AM, Eric Dumazet wrote: > On Thu, 2017-01-05 at 16:33 -0500, Jason Baron wrote: > >> >> +/* Accept RST for rcv_nxt - 1 after a FIN. >> + * When tcp connections are abruptly terminated from Mac OSX (via ^C), a >> + * FIN is sent followed by a RS

Re: [RFC PATCH] tcp: accept RST for rcv_nxt - 1 after receiving a FIN

2017-01-11 Thread Jason Baron
On 01/11/2017 12:17 AM, Christoph Paasch wrote: Hello Jason, (resending as Gmail sent out with HTML) On 05/01/17 - 16:33:28, Jason Baron wrote: Using a Mac OSX box as a client connecting to a Linux server, we have found that when certain applications (such as 'ab'), are abruptly

[RFC PATCH] tcp: accept RST for rcv_nxt - 1 after receiving a FIN

2017-01-05 Thread Jason Baron
. 1:1(0) ack 1 win 32768
0.200 accept(3, ..., ...) = 4
// Client closes the connection
0.300 < F. 1:1(0) ack 1 win 32768
// now send rst with same sequence
0.300 < R. 1:1(0) ack 1 win 32768
// make sure we are in TCP_CLOSE
0.400 %{ assert tcpi_state == 7 }%
Signed-off-by: Jason Baron ---

Re: [net PATCH] fib_trie: Correct /proc/net/route off by one error

2016-11-07 Thread Jason Baron
/proc/net/route") Cc: Andy Whitcroft Reported-by: Jason Baron Signed-off-by: Alexander Duyck --- net/ipv4/fib_trie.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) Ok. Works for me. Feel free to add: Reviewed-and-Tested-by: Jason Baron Thanks, -Jason

Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer

2016-11-04 Thread Jason Baron
On 11/04/2016 02:43 PM, Alexander Duyck wrote: On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron wrote: From: Jason Baron When read() is called on /proc/net/route requesting a size that is one entry size (128 bytes) less than m->size or greater, the resulting output has missing and/or duplic

[PATCH net] fib_trie: correct /proc/net/route for large read buffer

2016-11-04 Thread Jason Baron
From: Jason Baron When read() is called on /proc/net/route requesting a size that is one entry size (128 bytes) less than m->size or greater, the resulting output has missing and/or duplicate entries. Since m->size is typically PAGE_SIZE, for a PAGE_SIZE of 4,096 this means that reads requ
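
One way to check for the symptom described above is to dump the file with several read() sizes and compare the results. This is a minimal Python sketch, not the reproducer used for the patch. The helper names read_all() and reads_agree() are hypothetical, and the 4,096-byte figure assumes the PAGE_SIZE mentioned in the message. A routing table that changes between passes would also produce a mismatch, so the check is only meaningful on a quiet system.

```python
import os

ENTRY_SIZE = 128  # bytes per /proc/net/route entry, per the message above


def read_all(path, bufsize):
    """Read the whole file using read() calls of at most bufsize bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, bufsize)
            if not chunk:  # empty read means end of file
                return b"".join(chunks)
            chunks.append(chunk)
    finally:
        os.close(fd)


def reads_agree(path, sizes):
    """Return True if every buffer size yields the same sequence of lines."""
    dumps = [read_all(path, size).splitlines() for size in sizes]
    return all(dump == dumps[0] for dump in dumps)


if __name__ == "__main__":
    route = "/proc/net/route"
    if os.path.exists(route):
        # Sizes around one entry short of an assumed 4,096-byte page.
        sizes = [ENTRY_SIZE, 4096 - ENTRY_SIZE, 4096, 8192]
        print("consistent" if reads_agree(route, sizes) else "mismatch")
```

Missing or duplicate entries show up as a mismatch between the small-buffer dump and the large-buffer ones.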

Re: [PATCH v2 net-next 0/2] bnx2x: page allocation failure

2016-10-05 Thread Jason Baron
On 10/03/2016 08:19 PM, David Miller wrote: > From: "Baron, Jason" > Date: Mon, 3 Oct 2016 20:24:32 + > >> Or should I just send the incremental at this point? > Incremental is required at this point. Hi David, Ok. The above question was sent out erroneously. I have already posted the incr

[PATCH net-next] bnx2x: free the mac filter group list before freeing the cmd

2016-09-26 Thread Jason Baron
The group list must be freed prior to freeing the command otherwise we have a use-after-free. Signed-off-by: Jason Baron Cc: Yuval Mintz Cc: Ariel Elior --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet

Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-23 Thread Jason Baron
Hi, On 09/23/2016 03:24 AM, Nicholas Piggin wrote: On Fri, 23 Sep 2016 14:42:53 +0800 "Hillf Danton" wrote: The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows with the number of fds passed. We had a customer report page allocation failures of order-4 for this allocat

[PATCH v2 net-next 1/2] bnx2x: allocate mac filtering 'mcast_list' in PAGE_SIZE increments

2016-09-22 Thread Jason Baron
From: Jason Baron Currently, we can have high order page allocations that specify GFP_ATOMIC when configuring multicast MAC address filters. For example, we have seen order 2 page allocation failures with ~500 multicast addresses configured. Convert the allocation for 'mcast_list'

[PATCH v2 net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-22 Thread Jason Baron
From: Jason Baron Currently, we can have high order page allocations that specify GFP_ATOMIC when configuring multicast MAC address filters. For example, we have seen order 2 page allocation failures with ~500 multicast addresses configured. Convert the allocation for the pending list to be

[PATCH v2 net-next 0/2] bnx2x: page allocation failure

2016-09-22 Thread Jason Baron
+0x10/0x40 [1207325.864263] [] ? kthread_create_on_node+0x180/0x180 [1207325.871288] [] ret_from_fork+0x42/0x70 [1207325.877183] [] ? kthread_create_on_node+0x180/0x180 v2: -make use of list_next_entry() -only use PAGE_SIZE allocations Jason Baron (2): bnx2x: allocate mac filtering '

Re: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-20 Thread Jason Baron
On 09/20/2016 07:30 AM, David Laight wrote: From: Jason Baron Sent: 19 September 2016 19:34 ... sizeof(struct bnx2x_mcast_list_elem) = 24. So there are 170 per page on x86. So if we want to fit 2,048 elements, we need 12 pages. If you only need to save the mcast addresses you could use a
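
The per-page arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: the 24-byte element size and the 2,048-element target come from the message, while the 4,096-byte page size is an assumption for x86, and the function names are hypothetical. With these inputs the ceiling works out to 13 pages, since 12 pages hold 170 * 12 = 2,040 elements, so the figure of 12 in the message reads as an approximation.

```python
PAGE_SIZE = 4096  # assumed x86 page size in bytes
ELEM_SIZE = 24    # sizeof(struct bnx2x_mcast_list_elem), as quoted above
WANTED = 2048     # number of elements to fit, as quoted above


def elems_per_page(page_size=PAGE_SIZE, elem_size=ELEM_SIZE):
    """Whole elements that fit in one page; the remainder is left unused."""
    return page_size // elem_size


def pages_needed(count, page_size=PAGE_SIZE, elem_size=ELEM_SIZE):
    """Pages required for count elements, rounding up (integer ceiling)."""
    per_page = elems_per_page(page_size, elem_size)
    return -(-count // per_page)


if __name__ == "__main__":
    print(elems_per_page())      # 170
    print(pages_needed(WANTED))  # 13
```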

Re: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-20 Thread Jason Baron
On 09/20/2016 11:00 AM, Mintz, Yuval wrote: The question I raised was whether it actually makes a difference under such circumstances whether the device would actually filter those multicast addresses or be completely multicast promiscuous. e.g., whether it's significant to be filtering out multi

Re: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-20 Thread Jason Baron
for the pending list to be done in PAGE_SIZE increments. Signed-off-by: Jason Baron While I appreciate the effort, I wonder whether it's worth it: - The hardware [even in its newer generation] provides an approximate based classification [I.e., hashed] with 256 bins. When configurin

Re: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-19 Thread Jason Baron
the pending list to be done in PAGE_SIZE increments. Signed-off-by: Jason Baron While I appreciate the effort, I wonder whether it's worth it: - The hardware [even in its newer generation] provides an approximate based classification [I.e., hashed] with 256 bins. When configuring 500 mult

[PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-16 Thread Jason Baron
increments. Signed-off-by: Jason Baron --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 131 ++--- 1 file changed, 94 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index

[PATCH net-next 1/2] bnx2x: allocate mac filtering 'mcast_list' in PAGE_SIZE increments

2016-09-16 Thread Jason Baron
PAGE_SIZE increments. Signed-off-by: Jason Baron --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 85 1 file changed, 57 insertions(+), 28 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ma

[PATCH net-next 0/2] bnx2x: page allocation failure

2016-09-16 Thread Jason Baron
+0x10/0x40 [1207325.864263] [] ? kthread_create_on_node+0x180/0x180 [1207325.871288] [] ret_from_fork+0x42/0x70 [1207325.877183] [] ? kthread_create_on_node+0x180/0x180 Jason Baron (2): bnx2x: allocate mac filtering 'mcast_list' in PAGE_SIZE increments bnx2x: allocate mac filte

Re: strange Mac OSX RST behavior

2016-07-22 Thread Jason Baron
Hi, After looking at this further we found that there is actually a rate limit on 'rst' packets sent by OSX on a closed socket. It's set to 250 per second and controlled via: net.inet.icmp.icmplim. Increasing that limit resolves the issue, but the default is apparently 250. Thanks, -Jason On 07

[PATCH net] tcp: enable per-socket rate limiting of all 'challenge acks'

2016-07-14 Thread Jason Baron
From: Jason Baron The per-socket rate limit for 'challenge acks' was introduced in the context of limiting ack loops: commit f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock") And I think it can be extended to rate limit all 'challenge acks' on

Re: strange Mac OSX RST behavior

2016-07-01 Thread Jason Baron
On 07/01/2016 02:16 PM, One Thousand Gnomes wrote: >> yes, we do in fact see a POLLRDHUP from the FIN in this case and >> read of zero, but we still have more data to write to the socket, and >> b/c the RST is dropped here, the socket stays in TIME_WAIT until >> things eventually time out... > >

Re: strange Mac OSX RST behavior

2016-07-01 Thread Jason Baron
On 07/01/2016 01:08 PM, Rick Jones wrote: > On 07/01/2016 08:10 AM, Jason Baron wrote: >> I'm wondering if anybody else has run into this... >> >> On Mac OSX 10.11.5 (latest version), we have found that when tcp >> connections are abruptly terminated (via ^C), a F

strange Mac OSX RST behavior

2016-07-01 Thread Jason Baron
I'm wondering if anybody else has run into this... On Mac OSX 10.11.5 (latest version), we have found that when tcp connections are abruptly terminated (via ^C), a FIN is sent followed by an RST packet. The RST is sent with the same sequence number as the FIN, and thus dropped since the stack only

Re: [PATCH v2 net-next 2/2] tcp: reduce cpu usage when SO_SNDBUF is set

2016-06-22 Thread Jason Baron
On 06/22/2016 02:51 PM, Eric Dumazet wrote: On Wed, 2016-06-22 at 11:43 -0700, Eric Dumazet wrote: On Wed, 2016-06-22 at 14:18 -0400, Jason Baron wrote: For 1/2, getting the correct memory barrier, should I re-submit that as a separate patch? Are you sure a full memory barrier (smp_mb

Re: [PATCH v2 net-next 2/2] tcp: reduce cpu usage when SO_SNDBUF is set

2016-06-22 Thread Jason Baron
On 06/22/2016 01:34 PM, Eric Dumazet wrote: On Wed, 2016-06-22 at 11:32 -0400, Jason Baron wrote: From: Jason Baron When SO_SNDBUF is set and we are under tcp memory pressure, the effective write buffer space can be much lower than what was set using SO_SNDBUF. For example, we may have set

[PATCH v2 net-next 2/2] tcp: reduce cpu usage when SO_SNDBUF is set

2016-06-22 Thread Jason Baron
From: Jason Baron When SO_SNDBUF is set and we are under tcp memory pressure, the effective write buffer space can be much lower than what was set using SO_SNDBUF. For example, we may have set the buffer to 100kb, but we may only be able to write 10kb. In this scenario poll()/select()/epoll
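For readers unfamiliar with the option being discussed, here is a minimal userspace sketch of setting SO_SNDBUF and reading back what the kernel reports. The helper name `demo_set_sndbuf` is hypothetical; it uses only the standard `setsockopt()`/`getsockopt()` calls and does not reproduce anything from the patch itself.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Request a send buffer of `bytes` on socket `fd` and return the size
 * the kernel reports afterwards, or -1 if either call fails.
 */
int demo_set_sndbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

For example, `demo_set_sndbuf(fd, 100 * 1024)` requests the 100kb buffer mentioned in the message. The value read back is an upper limit rather than a guarantee (on Linux, socket(7) notes that the kernel doubles the requested value for bookkeeping), which is consistent with the message's point that the space actually writable under memory pressure can be much smaller.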

[PATCH v2 net-next 1/2] tcp: replace smp_mb__after_atomic() with smp_mb() in tcp_poll()

2016-06-22 Thread Jason Baron
From: Jason Baron sock_reset_flag() maps to __clear_bit() not the atomic version clear_bit(), hence we need an smp_mb() there, smp_mb__after_atomic() is not sufficient. Fixes: 3c7151275c0c ("tcp: add memory barriers to write space paths") Cc: Eric Dumazet Signed-off-by: Jason Baron

[PATCH v2 net-next 0/2] tcp: reduce cpu usage when SO_SNDBUF is set

2016-06-22 Thread Jason Baron
with SOCK_QUEUE_SHRUNK Jason Baron (2): tcp: replace smp_mb__after_atomic() with smp_mb() in tcp_poll() tcp: reduce cpu usage when SO_SNDBUF is set include/net/sock.h | 6 ++ net/ipv4/tcp.c | 26 +++--- net/ipv4/tcp_input.c | 5 +++-- 3 files changed, 28 insertions(

Re: [PATCH net-next] tcp: reduce cpu usage when SO_SNDBUF is set

2016-06-21 Thread Jason Baron
On 06/20/2016 06:29 PM, Eric Dumazet wrote: > On Mon, 2016-06-20 at 17:23 -0400, Jason Baron wrote: >> From: Jason Baron >> >> When SO_SNDBUF is set and we are under tcp memory pressure, the effective >> write buffer space can be much lower than what was set using SO

[PATCH net-next] tcp: reduce cpu usage when SO_SNDBUF is set

2016-06-20 Thread Jason Baron
From: Jason Baron When SO_SNDBUF is set and we are under tcp memory pressure, the effective write buffer space can be much lower than what was set using SO_SNDBUF. For example, we may have set the buffer to 100kb, but we may only be able to write 10kb. In this scenario poll()/select()/epoll

Re: use-after-free in sctp_do_sm

2015-12-04 Thread Jason Baron
On 12/04/2015 12:03 PM, Joe Perches wrote: > On Fri, 2015-12-04 at 11:47 -0500, Jason Baron wrote: >> When DYNAMIC_DEBUG is enabled we have this wrapper from >> include/linux/dynamic_debug.h: >> >> if (unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT)) >>
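The quoted wrapper guards the print call with a flag test. A userspace analogue, offered only as a sketch, shows the consequence of that shape: the arguments of the debug call are evaluated only when the flag is set. Everything prefixed `demo_` or `DEMO_` is hypothetical and merely stands in for the kernel's descriptor and `_DPRINTK_FLAGS_PRINT`; this is not the contents of include/linux/dynamic_debug.h.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_FLAGS_PRINT 0x1u   /* stand-in for _DPRINTK_FLAGS_PRINT */

struct demo_descriptor {
    unsigned int flags;
};

int demo_eval_count;            /* counts how often the argument is computed */

int demo_expensive_arg(void)
{
    demo_eval_count++;
    return 42;
}

/* The flag test wraps the whole call, so arguments sit inside the branch. */
#define demo_pr_debug(desc, fmt, ...)                   \
    do {                                                \
        if ((desc)->flags & DEMO_FLAGS_PRINT)           \
            fprintf(stderr, fmt, __VA_ARGS__);          \
    } while (0)

/* Returns how many times the argument was evaluated for the given flags. */
int demo_run(unsigned int flags)
{
    struct demo_descriptor d = { .flags = flags };

    demo_eval_count = 0;
    demo_pr_debug(&d, "value=%d\n", demo_expensive_arg());
    assert(demo_eval_count <= 1);
    return demo_eval_count;
}
```

With the flag clear, `demo_run(0)` evaluates the argument zero times; with `DEMO_FLAGS_PRINT` set it is evaluated once. In the context of the thread, this would mean an argument that dereferences freed memory is only touched when the debug site is enabled.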

Re: use-after-free in sctp_do_sm

2015-12-04 Thread Jason Baron
On 12/04/2015 11:12 AM, Dmitry Vyukov wrote: > On Thu, Dec 3, 2015 at 9:51 PM, Joe Perches wrote: >> (adding lkml as this is likely better discussed there) >> >> On Thu, 2015-12-03 at 15:42 -0500, Jason Baron wrote: >>> On 12/03/2015 03:24 PM, Joe Perches wrote: >

Re: use-after-free in sctp_do_sm

2015-12-03 Thread Jason Baron
On 12/03/2015 03:24 PM, Joe Perches wrote: > On Thu, 2015-12-03 at 15:10 -0500, Jason Baron wrote: >> On 12/03/2015 03:03 PM, Joe Perches wrote: >>> On Thu, 2015-12-03 at 14:32 -0500, Jason Baron wrote: >>>> On 12/03/2015 01:52 PM, Aaron Conole wrote: >>>>

Re: use-after-free in sctp_do_sm

2015-12-03 Thread Jason Baron
On 12/03/2015 03:03 PM, Joe Perches wrote: > On Thu, 2015-12-03 at 14:32 -0500, Jason Baron wrote: >> On 12/03/2015 01:52 PM, Aaron Conole wrote: >>> I think that as a minimum, the following patch should be evaluted, >>> but am unsure to whom I should submit it (af

Re: use-after-free in sctp_do_sm

2015-12-03 Thread Jason Baron
On 12/03/2015 01:52 PM, Aaron Conole wrote: > Dmitry Vyukov writes: >> On Thu, Dec 3, 2015 at 6:02 PM, Eric Dumazet wrote: >>> On Thu, Dec 3, 2015 at 7:55 AM, Dmitry Vyukov wrote: On Thu, Dec 3, 2015 at 3:48 PM, Eric Dumazet wrote: >> >> No, I don't. But pr_debug always computes

Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label

2015-11-30 Thread Jason Baron
On 11/30/2015 04:50 PM, Johannes Weiner wrote: > On Mon, Nov 30, 2015 at 04:08:18PM -0500, Jason Baron wrote: >> We're trying to move to the updated API, so this should be: >> static_branch_unlikely(&memcg_sockets_enabled_key) >> >> see: include/linux/jump_

Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label

2015-11-30 Thread Jason Baron
Hi, On 11/24/2015 04:52 PM, Johannes Weiner wrote: > The unified hierarchy memory controller is going to use this jump > label as well to control the networking callbacks. Move it to the > memory controller code and give it a more generic name. > > Signed-off-by: Johannes Weiner > Acked-by: Mich

Re: use-after-free in sock_wake_async

2015-11-24 Thread Jason Baron
On 11/24/2015 10:21 AM, Eric Dumazet wrote: > On Tue, Nov 24, 2015 at 6:18 AM, Dmitry Vyukov wrote: >> Hello, >> >> The following program triggers use-after-free in sock_wake_async: >> >> // autogenerated by syzkaller (http://github.com/google/syzkaller) >> #include >> #include >> #include >>

Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue

2015-11-23 Thread Jason Baron
still has a pointer to the > corresponding peer_wait queue. There's no way to forcibly deregister a > wait queue with epoll. > > Based on an idea by Jason Baron, the patch below changes the code such > that a wait_queue_t belonging to the client socket is enqueued on the >
