hiren accepted this revision.
hiren added a comment.
This revision has a positive review.
I hope you'll write a more descriptive commit log (not just the 'what' but
also the 'why') for the change. Thanks a lot for your work!
Cheers,
Hiren
INLINE COMMENTS
sys/netinet/tcp_syncache.c:1507 Do
hiren added a comment.
In https://reviews.freebsd.org/D5872#128555, @lstewart wrote:
> I thought that had been fixed ages ago... oops.
Fixed? i.e. doing something other than setting cwnd to 1 seg?
> It should be calling cc_cong_signal() with a new congestion type.
Hum...
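For illustration only, a toy model of the idea quoted above: route every window reduction through one signal function and add a new signal type, rather than setting cwnd to one segment inline. This is a sketch under stated assumptions, not the kernel code. All names here (toy_conn, toy_cong_signal, TOY_CC_SENDFAIL) are hypothetical, and which event the new type would stand for is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signal types; TOY_CC_SENDFAIL stands in for the "new
 * congestion type" and is invented for this sketch. */
enum toy_cc_signal { TOY_CC_RTO, TOY_CC_NDUPACK, TOY_CC_SENDFAIL };

struct toy_conn {
    uint32_t cwnd;      /* congestion window, bytes */
    uint32_t ssthresh;  /* slow-start threshold, bytes */
    uint32_t maxseg;    /* segment size, bytes */
};

/* Half the current window, but never less than two segments. */
static uint32_t toy_half_window(const struct toy_conn *c)
{
    uint32_t half = c->cwnd / 2;
    uint32_t floor = 2 * c->maxseg;
    return half > floor ? half : floor;
}

/* Single place where the window is reduced, keyed by signal type. */
static void toy_cong_signal(struct toy_conn *c, enum toy_cc_signal sig)
{
    assert(c != NULL);
    switch (sig) {
    case TOY_CC_RTO:
        c->ssthresh = toy_half_window(c);
        c->cwnd = c->maxseg;
        break;
    case TOY_CC_NDUPACK:
        c->ssthresh = toy_half_window(c);
        c->cwnd = c->ssthresh;
        break;
    case TOY_CC_SENDFAIL:
        /* Assumed policy: drop to one segment, leave ssthresh alone. */
        c->cwnd = c->maxseg;
        break;
    }
}
```

The point of the design is that the policy for the new case lives next to the other cases, so a congestion control module could override it instead of the caller hard-coding "1 seg".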
hiren added a comment.
In https://reviews.freebsd.org/D5872#128539, @lstewart wrote:
> ... but replace with a macro to check that the rexmit/persist timer is
armed if appropriate!
Yes, that would be useful!
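A rough sketch of what such a check macro might look like, written against a toy structure rather than the real tcpcb. The field names and the definition of "appropriate" (here: some sent data is still unacknowledged) are assumptions made for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the connection state; not the kernel's struct tcpcb. */
struct toy_tcb {
    uint32_t snd_una;      /* oldest unacknowledged sequence number */
    uint32_t snd_max;      /* highest sequence number sent */
    bool rexmt_armed;      /* retransmit timer running? */
    bool persist_armed;    /* persist timer running? */
};

/*
 * True when the invariant holds: either nothing is outstanding, or at
 * least one of the two timers is armed. The equality test on sequence
 * numbers is safe across wraparound.
 */
#define TOY_TIMER_OK(tp)                                  \
    ((tp)->snd_max == (tp)->snd_una ||                    \
     (tp)->rexmt_armed || (tp)->persist_armed)
```

In a debug build this could be wrapped in an assertion at the end of the output path, so a connection left with outstanding data and no timer is caught where it happens.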
REVISION DETAIL
https://reviews.freebsd.org/D5872
hiren added a comment.
Ack for removing the ENOBUFS case.
REVISION DETAIL
https://reviews.freebsd.org/D5872
To: sepherosa_gmail.com, network, glebius, adrian, delphij,
decui_microsoft.com, honzhan_microsoft.com,
hiren added a comment.
In https://reviews.freebsd.org/D5872#127345, @jtl wrote:
> In https://reviews.freebsd.org/D5872#127343, @mike-karels.net wrote:
>
> > If we get an ENOBUFS when sending data, we will already be running the
retransmit timer.
>
>
> Good point, but see below.
hiren added a comment.
In https://reviews.freebsd.org/D5872#127123, @jtl wrote:
>
> The key feature that makes the retransmit timer inappropriate for an
ACK-only case is that it is only stopped when we receive input; however, in the
ACK-only case, we really want to stop it
hiren added a comment.
Another panic from an almost *idle* box:
Sanitized panic #6
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 6525980672B (6223 MB)
Blocksize: 512
Dumptime: Thu Feb 19 06:16:57 2015
Hostname: xx
hiren added a comment.
>>! In D1711#96, @rrs wrote:
> Hiren:
>
> You have the wrong structure type.
>
> In the printf before panic it is giving you the lock that was spinning.. that
> would be in the callout_cpu structure I bet.. I mis-told you in email.
>
> So if you did
>
> print *(struct ca
hiren added a comment.
>>! In D1711#92, @rrs wrote:
> Hiren:
>
> There also should have been a printf before the panic string
> printf( "spin lock %p (%s) held by %p (tid %d) too long\n",
> m, m->lock_object.lo_name, td, td->td_tid);
>
> Can we see what that lovely printf has displa
hiren added a comment.
>>! In D1711#91, @rrs wrote:
> Hiren:
>
> That's helpful.. as I said this is strange. The callout you posted shows it's
> associated with CPU 0, (c_cpu == 0), and yet
> the mtx on that (which is what we are spinning on) is free (its owned == 4).
> So why would we have crash
hiren added a comment.
>>! In D1711#86, @hselasky wrote:
> Hi,
>
> rrs + hiren:
>
> I think the problem is this:
>
> In "_callout_stop_safe()" we sometimes exit having "cc_migration_cpu(cc,
> direct) = CPUBLOCK;". Now if a second call to "_callout_stop_safe()" happens
> before the pending cal
hiren added a comment.
>>! In D1711#88, @rrs wrote:
> Hans:
>
> I don't get your call sequence, I sent you an email on it..
>
> Hiren:
>
> Can you go up the call chain and dump the callout structure
> c in
> 0x80760064 in callout_lock (c=0xf8000d81dc98) at
> /usr/src/sys/kern/kern_
hiren added a comment.
@hps: cc_cpu[MAXCPU] info as you requested on IRC. Let me know if you need more
info.
(kgdb) backtrace
#0 doadump (textdump=1) at pcpu.h:219
#1 0x80749c17 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:452
#2 0x80749ff4 in pa
hiren added a comment.
@rrs: One more
Sanitized panic #5
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 1694281728B (1615 MB)
Blocksize: 512
Dumptime: Sun Feb 15 18:03:14 2015
Hostname: x
Magic: FreeBSD
hiren added a comment.
@rrs:
Looks like we've come full circle back to the very first crash reported. We are
on stable10 with all relevant fixes.
Sanitized panic #4
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 6764437504B (6451
hiren added a comment.
It all started with:
https://lists.freebsd.org/pipermail/freebsd-net/2014-September/039730.html
Last (conclusive) email in that thread:
https://lists.freebsd.org/pipermail/freebsd-net/2015-January/040895.html
That issue was fixed by: https://reviews.freebsd.org/D1438 i.e.
hiren added a comment.
>>! In D1777#16, @bz wrote:
> Hiren, it only took us 4 years to trigger this? Can people actually
> easily/reliably reproduce it?
Heh, I am not sure about "people" but we @llnw can see this very reliably.
Do you have any other theories/patches that we can try? It'd be he
hiren added a comment.
Update from llnw world:
Things have been pretty stable here without any panics for 24+ hours with
Stable10+D1711+D1777.
Thanks a lot, Randall!
REVISION DETAIL
https://reviews.freebsd.org/D1711
To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, sbruno, imp, adrian, hsela
hiren added a comment.
Update from llnw world:
Things have been pretty stable here without any panics for 24+ hours with
Stable10+D1711+D1777.
Thanks a lot, Randall!
REVISION DETAIL
https://reviews.freebsd.org/D1777
To: rrs, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian, bz, jhb
Cc
hiren added a comment.
>>! In D1711#59, @rrs wrote:
> Hiren:
>
> Ok looking at kern_timeout.c that's a call to
> class->lc_lock(c_lock, lock_status);
>
> If my 10.x matches yours.
>
> And the call inside that kern_rwlock.c:757
> is
>
> v = rw->rw_lock;
> owner = (struct thread *)RW_OWNER(v);
hiren added a comment.
>>! In D1711#60, @hiren wrote:
>>>! In D1711#58, @rrs wrote:
>
>> hiren:
>>
>> This looks interesting to me, it is definitely something I would like to
>> look at. I assume you
>> are on 10.stable like Sean?
>
> Yes, it's plain stable10+D1711.
> Also, all 3 panics are fr
hiren added a comment.
>>! In D1711#61, @hselasky wrote:
> Hi,
>
> There is only one or two likely consumers of callout_init_rw() at the present
> moment, and one of them is:
>
> ./netinet6/nd6.c: canceled = callout_stop(&ln->ln_timer_ch);
> ./netinet6/nd6.c: can
hiren added a comment.
>>! In D1711#59, @rrs wrote:
> Hiren:
>
> Ok looking at kern_timeout.c that's a call to
> class->lc_lock(c_lock, lock_status);
>
> If my 10.x matches yours.
It's not :-(
Looks like what we have here is not stock stable10 really. I'll check all the
details and get back
hiren added a comment.
>>! In D1711#58, @rrs wrote:
> hiren:
>
> This looks interesting to me, it is definitely something I would like to look
> at. I assume you
> are on 10.stable like Sean?
Yes, it's plain stable10+D1711.
Also, all 3 panics are from the same system.
REVISION DETAIL
https:
hiren added a subscriber: hiren.
hiren added a comment.
Sanitized panic #3
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 5393809408B (5143 MB)
Blocksize: 512
Dumptime: Tue Feb 3 13:21:19 2015
Hostname: xxx