Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-16 Thread Dave Taht
On Wed, Jan 16, 2013 at 5:19 PM, Maciej Soltysiak wrote: > On Mon, Jan 14, 2013 at 7:11 AM, Dave Taht wrote: > >> The routing cache got eliminated between 3.3 and 3.6, and there were >> all sorts of changes to it over the last 6 releases that have been >> bothersome. > > Ok, you might be onto som

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-16 Thread Dave Taht
Sounds like a bingo to me. On Wed, Jan 16, 2013 at 5:19 PM, Maciej Soltysiak wrote: > On Mon, Jan 14, 2013 at 7:11 AM, Dave Taht wrote: > >> The routing cache got eliminated between 3.3 and 3.6, and there were >> all sorts of changes to it over the last 6 releases that have been >> bothersome. >

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-16 Thread Maciej Soltysiak
On Mon, Jan 14, 2013 at 7:11 AM, Dave Taht wrote: > The routing cache got eliminated between 3.3 and 3.6, and there were > all sorts of changes to it over the last 6 releases that have been > bothersome. Ok, you might be onto something. It got eliminated in 3.6.x; So I checked a few things with

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-14 Thread Dave Taht
On Mon, Jan 14, 2013 at 1:14 AM, Dave Taht wrote: > I am so buried as to only be able to do new builds of cero once a week. > > Can the bad behavior be duplicated on a single core other sort of > processor, like x86? Or merely boot up a x86 box in a single processor > mode? > > I'll try to get a n

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-14 Thread Ketan Kulkarni
I have never played around with polipo proxy much - neither did wonder about its DNS behavior. It would be good to have a bug filed and discussion tracked over there. Maciej: can you please report a bug and put the logs (preferably without TFO ;-) )? I can take a look at those later this week prob

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-14 Thread Eric Dumazet
Some paths want to check that a spinlock is held, others want to check that it's not held; it depends on the context. So returning 1 on UP would break a bunch of code as well. On Mon, Jan 14, 2013 at 12:18 AM, Jerry Chu wrote: > > > On Sun, Jan 13, 2013 at 7:05 PM, Eric Dumazet wrote: > >> Oh well ye

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-14 Thread Jerry Chu
On Sun, Jan 13, 2013 at 7:05 PM, Eric Dumazet wrote: > Oh well yes, this doesnt quite work on !SMP. > Strange - how would one assert a spin lock is held, and obviously only for SMP? (I almost think arch_spin_is_locked(lock) should be ((void)(lock), 1) for UP for the purpose of assertion...) Als
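
The trade-off discussed here can be sketched in plain userland C. This is only a model under stated assumptions: the two macros below are hypothetical stand-ins for a uniprocessor "is locked" query that always answers 0 or always answers 1, and none of the names are the kernel's own. It shows that a constant answer can satisfy "must be held" checks or "must not be held" checks, but not both.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a uniprocessor spinlock: it carries no real
 * state, because there is no second CPU to spin against. */
struct up_spinlock_model { char unused; };

/* Assumed UP behaviour: the query ignores the lock and answers 0. */
#define up_is_locked_always0(lock) ((void)(lock), 0)
/* The alternative floated in the thread: ignore the lock and answer 1. */
#define up_is_locked_always1(lock) ((void)(lock), 1)

/* "Lock must be held" assertion; true means the assertion would fire. */
static bool held_check_fires_0(struct up_spinlock_model *l)
{
    return !up_is_locked_always0(l);
}

static bool held_check_fires_1(struct up_spinlock_model *l)
{
    return !up_is_locked_always1(l);
}

/* "Lock must NOT be held" assertion; true means the assertion would fire. */
static bool free_check_fires_0(struct up_spinlock_model *l)
{
    return up_is_locked_always0(l);
}

static bool free_check_fires_1(struct up_spinlock_model *l)
{
    return up_is_locked_always1(l);
}
```

With the always-0 variant every "must be held" check fires even when the caller did take the lock; with the always-1 variant every "must not be held" check fires instead.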

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Dave Taht
I am so buried as to only be able to do new builds of cero once a week. Can the bad behavior be duplicated on some other sort of single-core processor, like x86? Or by merely booting up an x86 box in single-processor mode? I'll try to get a new release out next Sunday. On Sun, Jan 13, 2013 at 8:43 PM, K

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Dave Taht
This is a different issue than TFO, so I'm taking the TFO-ers off the list. On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak wrote: > I am seeing something strange here, with polipo related to TFO but also DNS. I have had polipo's internal DNS resolver mess up on multiple occasions exactly along the

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Ketan Kulkarni
Thanks Eric and Yuchung for taking care of the patch. I will test a few more TFO cases as well once this patch is built in cero. Thanks, Ketan On Jan 14, 2013 9:37 AM, "Eric Dumazet" wrote: > > Quite frankly I would just remove the BUG_ON() > > diff --git a/net/core/request_sock.c b/net/core/reque

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Eric Dumazet
Quite frankly I would just remove the BUG_ON() diff --git a/net/core/request_sock.c b/net/core/request_sock.c index c31d9e8..4425148 100644 --- a/net/core/request_sock.c +++ b/net/core/request_sock.c @@ -186,8 +186,6 @@ void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Eric Dumazet
Oh well yes, this doesn't quite work on !SMP. And this kind of bug is frequent. See the following example: commit b9980cdcf2524c5fe15d8cbae9c97b3ed6385563 Author: Hugh Dickins Date: Wed Feb 8 17:13:40 2012 -0800 mm: fix UP THP spin_is_locked BUGs Fix CONFIG_TRANSPARENT_HUGEPAGE=y CON

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Yuchung Cheng
Thanks for making these efforts to debug this. Ketan: can we try replacing the one BUG_ON with two WARN_ONs to confirm the exact faulty condition? I wish I could do that myself but I don't have a box at hand. Yuchung On Sun, Jan 13, 2013 at 1:39 PM, Felix Fietkau wrote: > On 2013-01-13 7:03 PM, Eri
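
The debugging idea in this message, splitting one compound fatal check into two non-fatal ones, can be sketched as below. This is a userland model, not kernel code: the two conditions (`lock_seen_held`, `owned_by_user`) and both function names are hypothetical placeholders, since the snippet does not say which conditions the real BUG_ON combines.

```c
#include <stdbool.h>
#include <stdio.h>

/* Combined form: reports only that "something" failed, not which half. */
static bool combined_check_fires(bool lock_seen_held, bool owned_by_user)
{
    return !lock_seen_held || !owned_by_user;
}

/* Split form: one warning per condition, so the log identifies the faulty
 * half. Returns a bitmask: bit 0 for the first condition, bit 1 for the
 * second. Execution continues after a warning instead of halting. */
static int split_check(bool lock_seen_held, bool owned_by_user)
{
    int fired = 0;

    if (!lock_seen_held) {
        fprintf(stderr, "warn: lock not reported as held\n");
        fired |= 1;
    }
    if (!owned_by_user) {
        fprintf(stderr, "warn: socket not owned by user\n");
        fired |= 2;
    }
    return fired;
}
```

In this model a return value of 1 from `split_check` would point at the lock query alone, which is the kind of evidence the split is meant to produce.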

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Felix Fietkau
On 2013-01-13 7:03 PM, Eric Dumazet wrote: > I suspect a bug in the spin_is_locked() implementation on your arch, as > the socket lock should be held at this point. I don't think this is an arch implementation bug, this probably happens on all !SMP systems. See this bit from include/linux/spinlock_u

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Eric Dumazet
I suspect a bug in the spin_is_locked() implementation on your arch, as the socket lock should be held at this point. On Sun, Jan 13, 2013 at 9:01 AM, Ketan Kulkarni wrote: > I could get a chance to get the backtrace from serial port. I didnt do the > kgdb session yet. > To iterate, the crash o

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-13 Thread Ketan Kulkarni
I got a chance to capture the backtrace from the serial port. I haven't done the kgdb session yet. To reiterate, the crash occurs on the TFO server on the MIPS platform. The call trace looks like this: [ 1024.53] Call Trace: [ 1024.53] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c [ 1024.53] [<8024a36c

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-05 Thread Ketan Kulkarni
Disabling ECN on the cero box has no effect. The box crashed with ECN disabled. I also tried enabling ECN on x86 and it didn't crash in either case. The tcpdump on cero lo is updated at - https://www.bufferbloat.net/issues/418#change-1703 It is exactly similar to the previously attached "lo_capture.t

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
On Fri, Jan 4, 2013 at 7:35 PM, Dave Taht wrote: > It's rather fun to explore a new protocol on a friday night!, but this > thread is getting out of hand. I created a bug for it here: > > https://www.bufferbloat.net/issues/418 > > I don't mind if we continue to discuss it here, but do put packet >

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Yuchung Cheng
On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni wrote: > Well, I was trying polipo server on cero box and httping from laptop. On > both the boxes I set 3 in tcp_fastopen. > > The panic is seen only when server is on cero box. > If I run server on my laptop and httping from cero all TFO connections

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
It's rather fun to explore a new protocol on a Friday night, but this thread is getting out of hand. I created a bug for it here: https://www.bufferbloat.net/issues/418 I don't mind if we continue to discuss it here, but do put packet captures on the bug, please... I scripted up a few tests tha

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Eric Dumazet
/* TCP Fast Open Cookie as stored in memory */ struct tcp_fastopen_cookie { s8 len; u8 val[TCP_FASTOPEN_COOKIE_MAX]; }; I wonder if 's8' really does what we want on all arches. We want to store a negative 8bit number, not an unsigned one... On Fri, Jan 4, 2013 at 7:02
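
The signedness question raised here can be checked with a small userland sketch. It assumes that the kernel's `s8` is a typedef of `signed char`, which is worth confirming in the tree in question; `int8_t` stands in for it below, and the struct name and array size are illustrative only. What C does guarantee is that only plain `char` has implementation-defined signedness, while `signed char` is signed on every architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define COOKIE_MAX_MODEL 16 /* illustrative size, not taken from the thread */

/* Userland model of the cookie layout quoted above. */
struct cookie_model {
    int8_t  len;                   /* explicitly signed 8-bit length */
    uint8_t val[COOKIE_MAX_MODEL]; /* cookie bytes */
};

/* Store a negative marker in the signed field and read it back. */
static bool negative_len_survives(void)
{
    struct cookie_model c = { .len = -1 };

    return c.len < 0;
}

/* The same marker pushed through an unsigned 8-bit field wraps to 255,
 * which is the failure mode the message worries about. */
static int len_through_unsigned(void)
{
    uint8_t len = (uint8_t)-1;

    return len;
}
```

If the length field were ever declared as plain `char`, the first function could give different answers on different architectures, because some ABIs make `char` unsigned.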

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Ketan Kulkarni
Without TFO all worked fine. The problem is when tfo server is on cero box. I will try both ECN on on laptop and disabling ECN on cero with TFO on. Will report the behavior seen. Thanks, Ketan. On Jan 5, 2013 7:50 AM, "Yuchung Cheng" wrote: > On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni wrote

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Ketan Kulkarni
Well, I was trying polipo server on the cero box and httping from the laptop. On both boxes I set 3 in tcp_fastopen. The panic is seen only when the server is on the cero box. If I run the server on my laptop and httping from cero, all TFO connections are successful. So I doubt the only problem is SYN+DATA. U
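
For readers wondering why the value 3 matters: the `net.ipv4.tcp_fastopen` sysctl is a bitmask in which bit 0x1 enables the client side and bit 0x2 enables the server side, so 3 turns on both. A minimal sketch of that decoding follows; the macro and function names are made up for illustration and are not the kernel's identifiers.

```c
#include <stdbool.h>

/* Bit meanings of the tcp_fastopen sysctl value (names are illustrative). */
#define TFO_MODEL_CLIENT 0x1u /* send data in the opening SYN */
#define TFO_MODEL_SERVER 0x2u /* accept data in a received SYN */

static bool tfo_client_enabled(unsigned int sysctl_value)
{
    return (sysctl_value & TFO_MODEL_CLIENT) != 0;
}

static bool tfo_server_enabled(unsigned int sysctl_value)
{
    return (sysctl_value & TFO_MODEL_SERVER) != 0;
}
```

With the value set to 3 on both machines, either box can act as TFO client or server, which is what allows the test above to swap the roles of the laptop and the router.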

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Robert Bradley
On 04/01/13 21:01, dpr...@reed.com wrote: Is this a TFO where the endpoint is on cerowrt, or just a SYN+DATA for a non cerowrt destination? I was looking at the firewall rules, and they are pretty complicated. Perhaps the SYN+DATA triggers a strange firewall behavior (a loop?) SYN's are sp

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Robert Bradley
On 04/01/13 20:42, Maciej Soltysiak wrote: This is very weird, because TFO is TCP and the DNS queries fired off by polipo are UDP: root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo This is the only DNS traffic I saw during the attempts. The tcpdumps have udp bad checksum but wh

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Jerry Chu
On Fri, Jan 4, 2013 at 1:21 PM, Dave Taht wrote: > On Fri, Jan 4, 2013 at 12:57 PM, Jerry Chu wrote: > > +ycheng > > > > On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak > > wrote: > >> > >> Oops, apologies if email was formatted weirdly... > > > > > > The problem you described below is separa

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Jerry Chu
+ycheng On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak wrote: > Oops, apologies if email was formatted weirdly... The problem you described below is separate from the MIPS router crash one, right? BTW, we've only tested on x86_64 arch. In addition to tcpdump, "netstat -s | grep -i fastopen"

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Jerry Chu
+ycheng On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht wrote: > Hmm. I would lean towards there being an issue with the new (freshly > ported forward to 3.7.1) unaligned checksum code for mips based on > what you say here. Or an offload... > > As for the 239.x multicast issue, hmm... separate issue

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
On Fri, Jan 4, 2013 at 1:36 PM, Jerry Chu wrote: > On Fri, Jan 4, 2013 at 1:21 PM, Dave Taht wrote: >> >> On Fri, Jan 4, 2013 at 12:57 PM, Jerry Chu wrote: >> > +ycheng >> > >> > On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak >> > wrote: >> >> >> >> Oops, apologies if email was formatted wei

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
On Fri, Jan 4, 2013 at 12:57 PM, Jerry Chu wrote: > +ycheng > > On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak > wrote: >> >> Oops, apologies if email was formatted weirdly... > > > The problem you described below is separate from the MIPS router crash one, > right? I think - but am of course

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
Hmm. I would lean towards there being an issue with the new (freshly ported forward to 3.7.1) unaligned checksum code for mips based on what you say here. Or an offload... As for the 239.x multicast issue, hmm... separate issue entirely. Probably... And then there's TFO. I note that in order to u

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread dpreed
know. -Original Message- From: "Maciej Soltysiak" Sent: Friday, January 4, 2013 3:43pm To: "Dave Taht" , "Ketan Kulkarni" Cc: "Jerry Chu" , "Eric Dumazet" , cerowrt-devel@lists.bufferbloat.net Subject: Re: [Cerowrt-devel] TFO crashes cerowrt 3.

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Maciej Soltysiak
Oops, apologies if email was formatted weirdly... On Fri, Jan 4, 2013 at 9:42 PM, Maciej Soltysiak wrote: > I am seeing something strange here, with polipo related to TFO but also > DNS. > When I just took 3.7.1-1 and set my windows 7 laptop to use > gw.home.lan:8123 as http proxy it didn't work.

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Maciej Soltysiak
I am seeing something strange here, with polipo related to TFO but also DNS. When I just took 3.7.1-1 and set my windows 7 laptop to use gw.home.lan:8123 as http proxy it didn't work. What I observed was: A) after quite a while polipo's response to browser was 504 Host www.osnews.com lookup failed:

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Eric Dumazet
Sorry, could you give us a copy of the panic stack trace ? Thanks On Fri, Jan 4, 2013 at 9:04 AM, Dave Taht wrote: > On Thu, Jan 3, 2013 at 8:54 AM, Ketan Kulkarni wrote: > > Thanks Dave. > > I upgraded my 3800 to 3.7.1-1. It is working for day to day Internet > activity. > > > > However, I a

Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet wrote: > Sorry, could you give us a copy of the panic stack trace ? I will get a serial console up on a wndr3800 by sunday. (sorry, just landed in california, am in disarray) The latest dev build of cero for the wndr3800 and wndr3700v2 is at: http://

[Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

2013-01-04 Thread Dave Taht
On Thu, Jan 3, 2013 at 8:54 AM, Ketan Kulkarni wrote: > Thanks Dave. > I upgraded my 3800 to 3.7.1-1. It is working for day to day Internet activity. > > However, I am not able to get through even a single TCP TFO > connection. The router restarts as soon as it sees the TFO connection. > Looks lik