rrs accepted this revision.
rrs added a comment.
These look fine and familiar ;-)
I think a hash table as an option to sorting is probably a good thing :D
REVISION DETAIL
https://reviews.freebsd.org/D6689
EMAIL PREFERENCES
https://reviews.freebsd.org/settings/panel/emailpreferences/
rrs added a comment.
It would be worth checking the assembly output.. if it truly inlines it then
it should probably only do one compare.. but worth checking...
REVISION DETAIL
https://reviews.freebsd.org/D6137
EMAIL PREFERENCES
https://reviews.freebsd.org/settings/panel/emailpreference
rrs added a comment.
The socket buffer with SCTP is just not something thats workable. There are
all sorts of pre-defined notions that closely align a socket buffer to
stream-of-bytes
semantics of TCP. With UDP its never an issue, since you have all un-ordered
who
cares up come the mess
rrs closed this revision.
REVISION DETAIL
https://reviews.freebsd.org/D1711
To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, imp, adrian, hselasky, sbruno
Cc: julian, hiren, jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net
___
freebsd-net@freeb
rrs added a comment.
Ok after much discussion with Hans, we *could* have an issue where the
user sends in an invalid CPU. This is *not* what I think is happening with Hiren
since the cc_cpu and lock is all sane (it would be a invalid index to cc_cpu
which
would not have an init'd lock). But I hav
rrs added a comment.
I have thought long and hard about this. I don't think its a bug.
But to know for sure I will need to add some instrumentation.
I suspect what is happening is a tremendous number of callouts
all come due at the same time. The three back traces trying to stop or reset
a callou
rrs added a comment.
Hans:
I think your wrong here. The caller of callout_cpu_switch() is holding
the CC_LOCK(). Now there are only two callers of this function.
Either the actual callout code itself (softclock_call_cc()) or
the callout_reset_sbt_on(). In the case of callout_reset_sbt_on().
So
rrs added a comment.
Hans:
Let me explain to you how I think you are wrong, you are missing a
small subtle thing here
When we do the callout_stop we set
cc_migration_cpu() = CPUBLOCK
*NOT*
c->c_cpu = CPUBLOCK;
You are confusing the two things. The CPUBLOCK is used
in two different places
rrs added a comment.
Hiren:
You have the wrong structure type.
In the printf before panic it is giving you the lock that was spinning.. that
would be in the callout_cpu structure I bet.. I mis-told you in email.
So if you did
print *(struct callout_cpu *)0x81364180
It should show you
rrs added a comment.
Wow, but look at the flags here. They are cc_flags == 0.
That means its *not* on the wheel and yet the thing it points to
(our victim) *thinks* its on the wheel.
This is not good.. We are stuck in a lock
trying to reschedule the timeout (a lock that is
not locked by the way
rrs added a comment.
Hiren:
There also should have been a printf before the panic string
printf( "spin lock %p (%s) held by %p (tid %d) too long\n",
m, m->lock_object.lo_name, td, td->td_tid);
Can we see what that lovely printf has displayed?
In theory the lo_name should be "callou
rrs added a comment.
Hiren:
Thats helpful.. as I said this is strange. The callout you posted shows its
associated with CPU 0, (c_cpu == 0), and yet
the mtx on that (which is what we are spinning on) is free (its owned == 4). So
why would we have crashed
holding the spin lock too long? Unless j
rrs added a comment.
Hans:
I don't get your call sequence, I sent you an email on it..
Hiren:
Can you go up the call chain and dump the callout structure
c in
0x80760064 in callout_lock (c=0xf8000d81dc98) at
/usr/src/sys/kern/kern_timeout.c:530
There is something funny here, becau
rrs added a comment.
Jhb/Others
So lets go through your scenario with code in arp:
a) softclock dequeues callout to run
-- Which calls softclock_call_cc
We make it to line:676 and see that "yes" the user (arp) init'd with a
rw_mtx
and run the next line 677 (to get the lock).
b) othe
rrs added a comment.
Adrian:
I know he said callout_drain, but just like in TCP that is *not* always
possible. In
the case of the arp/nd6 code lock are held (same as TCP) so you can't do a
callout_drain. Thats
why the original author put ref-counting in with the idea that the timer would
kill
rrs added a comment.
Hmm thinking about your comment jhb, we could easily add the
callout_drain_async to the
current callout code. If you think its worth while maybe we should add that to
D1711
Jhb, if you think its worth doing add that to D1711 and I will work on it ;-)
REVISION DETAIL
http
rrs added a comment.
JHB:
The scenario you outline is *exactly* the panic that was seen by sbruno. I
guess my description
was unclear.
The existing code in that other thread right now does a callout_stop and
tests the return code. If its one its one (which says I canceled a callout)
then it
l
rrs added a comment.
I don't think this is a refcnt issue bz, the base of this is a hole in the way
the callout code works. Basically there is a window when
a) The callout_wheel is executing, it sees that a "lock" has been configured,
so it goes to
release the callout wheel lock and then lo
rrs created this revision.
rrs added reviewers: jhb, imp, sbruno, gnn, rwatson, lstewart, kostikbel,
adrian.
rrs added subscribers: hselasky, julian, hiren, emaste, freebsd-net.
REVISION SUMMARY
There is a hole in the timer code that when you give it
a lock, it will lock that lock and then ch
rrs added a comment.
I have created D1777 to address the nd6/arp crash separately. I am currently in
the midst
of testing these.
REVISION DETAIL
https://reviews.freebsd.org/D1711
To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, sbruno, imp, adrian, hselasky
Cc: julian, hiren, jhb, kostikbel,
rrs added a comment.
Imp:
Ok I have spent a bit of time puzzling this out. First I was mistaken, the
callouts being run
are either arptimer or nd6timer(function name not right). These are not using
giant but
the passed in lle structure rw_lock.
We need to adjust these so that they check they:
rrs added a comment.
Julian:
The simple fix is to change the callout_init_rw -> callout_init(c, 1);
That makes it so the callout_stop() in this instance would return 0, not 1.
Since the
callout could not be stopped.
This would then cause the reference count to drop to 1 (it was 2 the original
rrs updated this revision to Diff 3631.
rrs added a comment.
This revision now requires review to proceed.
This fixes the comment as imp suggested and the indent...
CHANGES SINCE LAST UPDATE
https://reviews.freebsd.org/D1711?vs=3603&id=3631
REVISION DETAIL
https://reviews.freebsd.org/D1711
rrs added a comment.
Julian:
The point is *exactly* that, the callout *has* a reference.. and now that the
table is being
flushed if the callout_stop returns 1 it thinks the callout has been stopped,
which it has, which
means it will not run and release its reference. Thus the lowering of the
rrs added a comment.
Ok guys, I have puzzled out what that crash *may* be that was posted by Hiren.
The same
issue exists in the timeout code rewrite that Han's has up on the board as
well. Though
the callout_drain_async() may solve it if the user called that instead. Here is
what is happening.
rrs added a comment.
Hiren:
Ok looking at kern_timeout.c thats a call to
class->lc_lock(c_lock, lock_status);
If my 10.x matches yours.
And the call inside that kern_rwlock.c:757
is
v = rw->rw_lock;
owner = (struct thread *)RW_OWNER(v);
I would imagine v is probably a freed lock or some suc
rrs added a comment.
Sbruno: I have a hard time connecting the dots between the kernel panic's you
are seeing
and the callout system (especially the changes I made above). You were getting
a spin-lock held
too long right?
hiren:
This looks interesting to me, it is definitely something I would
rrs updated this revision to Diff 3603.
rrs added a comment.
Lets try this one more time to get line 789 right ;-)
CHANGES SINCE LAST UPDATE
https://reviews.freebsd.org/D1711?vs=3602&id=3603
REVISION DETAIL
https://reviews.freebsd.org/D1711
AFFECTED FILES
kern/kern_timeout.c
sys/callout
rrs updated this revision to Diff 3602.
rrs added a comment.
This revision now requires review to proceed.
Ed:
See if this comment clarification does not help a bit.
Hans:
I have no intention of *not* leaving the macro changes in here, this was a
comment suggested by
imp and I think its a go
rrs added inline comments.
INLINE COMMENTS
sys/kern/kern_timeout.c:789 Opps I will fix that thanks Ed!
sys/kern/kern_timeout.c:1067-1069 Let me see if I can edit this comment a bit
and make it more clear.
REVISION DETAIL
https://reviews.freebsd.org/D1711
To: rrs, gnn, rwatson, sbruno, lst
rrs added reviewers: jhb, kostikbel.
REVISION DETAIL
https://reviews.freebsd.org/D1711
To: rrs, gnn, rwatson, sbruno, lstewart, imp, hselasky, adrian, jhb, kostikbel
Cc: jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net
___
freebsd-net@freebsd.o
rrs updated this revision to Diff 3591.
rrs added a comment.
This revision now requires review to proceed.
Ok I have stripped out the kernel test framework and moved this
to D1755.. jhb, you are a reviewer there as well ;-)
This boils things down to *just* the fixes to the callout.
CHANGES SINCE
rrs added a comment.
jhb
The only reason I put the test stuff in this patch is its the only way to
validate the patch, and
that is something we really need in the callout system is a way to validate
that its working
correctly since this stuff tends to be very subtle and a bit of a head banger
rrs added a comment.
Imp:
I can see how we can easily move callout_test.h out of sys. That really
should be in the module/tests/callout_test/ dir.
But where would you suggest the global framework file go if not sys?
REVISION DETAIL
https://reviews.freebsd.org/D1711
To: rrs, gnn, rwatson, adr
rrs added a comment.
To answer jhb and kostikbel
1) Yes this does address the two issues that Hans re-write of the callout
system did without changing the KPI.
There may be other bugs as well, but with the test framework and the old
code I could reproduce both issues
(spin lock held to
rrs added a comment.
Hans:
We have discussed that, and *no* it will *not* return the incorrect thing once
it can't be stopped. There is code in the callout_reset_sbt_on() that makes sure
zero is returned. The only way that zero will *not* be returned is if the
PENDING
flag is set. That won't hap
rrs updated this revision to Diff 3503.
rrs added a comment.
This addresses all of the concerns of Hans.
I still need help with getting MK_TESTKERN to properly be
placed in the /mk infrastructure, I have pending queries to GNN and JHB on
this...
CHANGES SINCE LAST UPDATE
https://reviews.freebs
rrs added a comment.
Moving the tests to a sub-dir sounds like a good idea as well.. Thanks Hans..
INLINE COMMENTS
sys/kern/kern_timeout.c:674 Hans, no *you must* set it to false here in the
soft clock so you can recognize if
it is requested to be stopped (if possible). The return codes are
rrs created this revision.
rrs added reviewers: gnn, rwatson, imp, hselasky, adrian, sbruno, lstewart.
rrs added a subscriber: freebsd-net.
REVISION SUMMARY
The callout code had two specific bugs within it after the new CPU migration
feature was added.
1) The callout_active() call at times c
39 matches
Mail list logo