[Differential] D6689: tcp/lro: Implement hash table for LRO entries.

2016-07-05 Thread rrs (Randall Stewart)
rrs accepted this revision. rrs added a comment. These look fine and familiar ;-) I think a hash table as an option to sorting is probably a good thing :D REVISION DETAIL https://reviews.freebsd.org/D6689 EMAIL PREFERENCES https://reviews.freebsd.org/settings/panel/emailpreferences/

[Differential] D6137: tcp/lro: Refactor the free/active list operation.

2016-04-28 Thread rrs (Randall Stewart)
rrs added a comment. It would be worth checking the assembly output.. if it truly inlines it then it should probably only do one compare.. but worth checking... REVISION DETAIL https://reviews.freebsd.org/D6137 EMAIL PREFERENCES https://reviews.freebsd.org/settings/panel/emailpreference

[Differential] [Updated] D1809: [sockbuf] Don't expose lock details when isn't needed

2015-11-09 Thread rrs (Randall Stewart)
rrs added a comment. The socket buffer with SCTP is just not something thats workable. There are all sorts of pre-defined notions that closely align a socket buffer to stream-of-bytes semantics of TCP. With UDP its never an issue, since you have all un-ordered who cares up come the mess

[Differential] [Closed] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests).

2015-03-28 Thread rrs (Randall Stewart)
rrs closed this revision. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, imp, adrian, hselasky, sbruno Cc: julian, hiren, jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net ___ freebsd-net@freeb

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-18 Thread rrs (Randall Stewart)
rrs added a comment. Ok after much discussion with Hans, we *could* have an issue where the user sends in an invalid CPU. This is *not* what I think is happening with Hiren since the cc_cpu and lock is all sane (it would be a invalid index to cc_cpu which would not have an init'd lock). But I hav

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-18 Thread rrs (Randall Stewart)
rrs added a comment. I have thought long and hard about this. I don't think its a bug. But to know for sure I will need to add some instrumentation. I suspect what is happening is a tremendous number of callouts all come due at the same time. The three back traces trying to stop or reset a callou

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hans: I think your wrong here. The caller of callout_cpu_switch() is holding the CC_LOCK(). Now there are only two callers of this function. Either the actual callout code itself (softclock_call_cc()) or the callout_reset_sbt_on(). In the case of callout_reset_sbt_on(). So

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hans: Let me explain to you how I think you are wrong, you are missing a small subtle thing here When we do the callout_stop we set cc_migration_cpu() = CPUBLOCK *NOT* c->c_cpu = CPUBLOCK; You are confusing the two things. The CPUBLOCK is used in two different places

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: You have the wrong structure type. In the printf before panic it is giving you the lock that was spinning.. that would be in the callout_cpu structure I bet.. I mis-told you in email. So if you did print *(struct callout_cpu *)0x81364180 It should show you

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Wow, but look at the flags here. They are cc_flags == 0. That means its *not* on the wheel and yet the thing it points to (our victim) *thinks* its on the wheel. This is not good.. We are stuck in a lock trying to reschedule the timeout (a lock that is not locked by the way

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: There also should have been a printf before the panic string printf( "spin lock %p (%s) held by %p (tid %d) too long\n", m, m->lock_object.lo_name, td, td->td_tid); Can we see what that lovely printf has displayed? In theory the lo_name should be "callou

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: Thats helpful.. as I said this is strange. The callout you posted shows its associated with CPU 0, (c_cpu == 0), and yet the mtx on that (which is what we are spinning on) is free (its owned == 4). So why would we have crashed holding the spin lock too long? Unless j

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hans: I don't get your call sequence, I sent you an email on it.. Hiren: Can you go up the call chain and dump the callout structure c in 0x80760064 in callout_lock (c=0xf8000d81dc98) at /usr/src/sys/kern/kern_timeout.c:530 There is something funny here, becau

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-05 Thread rrs (Randall Stewart)
rrs added a comment. Jhb/Others So lets go through your scenario with code in arp: a) softclock dequeues callout to run -- Which calls softclock_call_cc We make it to line:676 and see that "yes" the user (arp) init'd with a rw_mtx and run the next line 677 (to get the lock). b) othe

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-05 Thread rrs (Randall Stewart)
rrs added a comment. Adrian: I know he said callout_drain, but just like in TCP that is *not* always possible. In the case of the arp/nd6 code lock are held (same as TCP) so you can't do a callout_drain. Thats why the original author put ref-counting in with the idea that the timer would kill

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Hmm thinking about your comment jhb, we could easily add the callout_drain_async to the current callout code. If you think its worth while maybe we should add that to D1711 Jhb, if you think its worth doing add that to D1711 and I will work on it ;-) REVISION DETAIL http

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. JHB: The scenario you outline is *exactly* the panic that was seen by sbruno. I guess my description was unclear. The existing code in that other thread right now does a callout_stop and tests the return code. If its one its one (which says I canceled a callout) then it l

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. I don't think this is a refcnt issue bz, the base of this is a hole in the way the callout code works. Basically there is a window when a) The callout_wheel is executing, it sees that a "lock" has been configured, so it goes to release the callout wheel lock and then lo

[Differential] [Request, 51 lines] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs created this revision. rrs added reviewers: jhb, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian. rrs added subscribers: hselasky, julian, hiren, emaste, freebsd-net. REVISION SUMMARY There is a hole in the timer code that when you give it a lock, it will lock that lock and then ch

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. I have created D1777 to address the nd6/arp crash separately. I am currently in the midst of testing these. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, sbruno, imp, adrian, hselasky Cc: julian, hiren, jhb, kostikbel,

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Imp: Ok I have spent a bit of time puzzling this out. First I was mistaken, the callouts being run are either arptimer or nd6timer(function name not right). These are not using giant but the passed in lle structure rw_lock. We need to adjust these so that they check they:

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Julian: The simple fix is to change the callout_init_rw -> callout_init(c, 1); That makes it so the callout_stop() in this instance would return 0, not 1. Since the callout could not be stopped. This would then cause the reference count to drop to 1 (it was 2 the original

[Differential] [Updated, 241 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-04 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3631. rrs added a comment. This revision now requires review to proceed. This fixes the comment as imp suggested and the indent... CHANGES SINCE LAST UPDATE https://reviews.freebsd.org/D1711?vs=3603&id=3631 REVISION DETAIL https://reviews.freebsd.org/D1711

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Julian: The point is *exactly* that, the callout *has* a reference.. and now that the table is being flushed if the callout_stop returns 1 it thinks the callout has been stopped, which it has, which means it will not run and release its reference. Thus the lowering of the

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Ok guys, I have puzzled out what that crash *may* be that was posted by Hiren. The same issue exists in the timeout code rewrite that Han's has up on the board as well. Though the callout_drain_async() may solve it if the user called that instead. Here is what is happening.

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-03 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: Ok looking at kern_timeout.c thats a call to class->lc_lock(c_lock, lock_status); If my 10.x matches yours. And the call inside that kern_rwlock.c:757 is v = rw->rw_lock; owner = (struct thread *)RW_OWNER(v); I would imagine v is probably a freed lock or some suc

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-03 Thread rrs (Randall Stewart)
rrs added a comment. Sbruno: I have a hard time connecting the dots between the kernel panic's you are seeing and the callout system (especially the changes I made above). You were getting a spin-lock held too long right? hiren: This looks interesting to me, it is definitely something I would

[Differential] [Updated, 241 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-03 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3603. rrs added a comment. Lets try this one more time to get line 789 right ;-) CHANGES SINCE LAST UPDATE https://reviews.freebsd.org/D1711?vs=3602&id=3603 REVISION DETAIL https://reviews.freebsd.org/D1711 AFFECTED FILES kern/kern_timeout.c sys/callout

[Differential] [Updated, 242 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-03 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3602. rrs added a comment. This revision now requires review to proceed. Ed: See if this comment clarification does not help a bit. Hans: I have no intention of *not* leaving the macro changes in here, this was a comment suggested by imp and I think its a go

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-03 Thread rrs (Randall Stewart)
rrs added inline comments. INLINE COMMENTS sys/kern/kern_timeout.c:789 Opps I will fix that thanks Ed! sys/kern/kern_timeout.c:1067-1069 Let me see if I can edit this comment a bit and make it more clear. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, sbruno, lst

[Differential] [Updated] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests).

2015-02-02 Thread rrs (Randall Stewart)
rrs added reviewers: jhb, kostikbel. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, sbruno, lstewart, imp, hselasky, adrian, jhb, kostikbel Cc: jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net ___ freebsd-net@freebsd.o

[Differential] [Updated, 239 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-02 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3591. rrs added a comment. This revision now requires review to proceed. Ok I have stripped out the kernel test framework and moved this to D1755.. jhb, you are a reviewer there as well ;-) This boils things down to *just* the fixes to the callout. CHANGES SINCE

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-01-29 Thread rrs (Randall Stewart)
rrs added a comment. jhb The only reason I put the test stuff in this patch is its the only way to validate the patch, and that is something we really need in the callout system is a way to validate that its working correctly since this stuff tends to be very subtle and a bit of a head banger

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-01-29 Thread rrs (Randall Stewart)
rrs added a comment. Imp: I can see how we can easily move callout_test.h out of sys. That really should be in the module/tests/callout_test/ dir. But where would you suggest the global framework file go if not sys? REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, adr

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-01-29 Thread rrs (Randall Stewart)
rrs added a comment. To answer jhb and kostikbel 1) Yes this does address the two issues that Hans re-write of the callout system did without changing the KPI. There may be other bugs as well, but with the test framework and the old code I could reproduce both issues (spin lock held to

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-01-28 Thread rrs (Randall Stewart)
rrs added a comment. Hans: We have discussed that, and *no* it will *not* return the incorrect thing once it can't be stopped. There is code in the callout_reset_sbt_on() that makes sure zero is returned. The only way that zero will *not* be returned is if the PENDING flag is set. That won't hap

[Differential] [Updated, 895 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-01-28 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3503. rrs added a comment. This addresses all of the concerns of Hans. I still need help with getting MK_TESTKERN to properly be placed in the /mk infrastructure, I have pending queries to GNN and JHB on this... CHANGES SINCE LAST UPDATE https://reviews.freebs

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-01-28 Thread rrs (Randall Stewart)
rrs added a comment. Moving the tests to a sub-dir sounds like a good idea as well.. Thanks Hans.. INLINE COMMENTS sys/kern/kern_timeout.c:674 Hans, no *you must* set it to false here in the soft clock so you can recognize if it is requested to be stopped (if possible). The return codes are

[Differential] [Request, 895 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-01-28 Thread rrs (Randall Stewart)
rrs created this revision. rrs added reviewers: gnn, rwatson, imp, hselasky, adrian, sbruno, lstewart. rrs added a subscriber: freebsd-net. REVISION SUMMARY The callout code had two specific bugs within it after the new CPU migration feature was added. 1) The callout_active() call at times c