[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Hmm thinking about your comment jhb, we could easily add the callout_drain_async to the current callout code. If you think its worth while maybe we should add that to D1711 Jhb, if you think its worth doing add that to D1711 and I will work on it ;-) REVISION DETAIL http

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread adrian (Adrian Chadd)
adrian added a comment. .. except he said callout_drain(). What happens if that's put in as part of the teardown process? REVISION DETAIL https://reviews.freebsd.org/D1777 To: rrs, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian, bz, jhb Cc: bz, emaste, hiren, julian, hselasky, freebsd

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. JHB: The scenario you outline is *exactly* the panic that was seen by sbruno. I guess my description was unclear. The existing code in that other thread right now does a callout_stop and tests the return code. If its one its one (which says I canceled a callout) then it l

Re: does "nat redirect_port tcp" works for you on -CURRENT?

2015-02-04 Thread Ian Smith
On Thu, 5 Feb 2015 02:14:41 +0300, Lev Serebryakov wrote: > On 05.02.2015 01:16, Lev Serebryakov wrote: > > > I have such rules in my firewall: > > > > nat 9 config redirect_port tcp 192.168.134.2:16881 16881 > > redirect_port udp 192.158.134.2:16881 16881 redirect_port tcp > > 192.168.134

Re: Intel 82574L (em)

2015-02-04 Thread Sean Bruno
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 02/02/15 09:35, Pieper, Jeffrey E wrote: > In the past we have, yes. This was a few years ago, but iirc the > current implementation is supposed to be the official solution. > > Jeff Jeff: Thanks for you feedback. Jack: I've tried compiling

Re: does "nat redirect_port tcp" works for you on -CURRENT?

2015-02-04 Thread Lev Serebryakov
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 05.02.2015 01:16, Lev Serebryakov wrote: > I have such rules in my firewall: > > nat 9 config redirect_port tcp 192.168.134.2:16881 16881 > redirect_port udp 192.158.134.2:16881 16881 redirect_port tcp > 192.168.134.2:22 2 > > nat 1 config

[Differential] [Updated] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread jhb (John Baldwin)
jhb added a comment. This is just "How It Works". You are always supposed to do a callout_drain() before freeing the storage belonging to a callout. I don't understand how you are preventing the callout/lock being freed out from under the callout routine in this version either. Now you can h

[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. I don't think this is a refcnt issue bz, the base of this is a hole in the way the callout code works. Basically there is a window when a) The callout_wheel is executing, it sees that a "lock" has been configured, so it goes to release the callout wheel lock and then lo

Re: does "nat redirect_port tcp" works for you on -CURRENT?

2015-02-04 Thread Lev Serebryakov
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 05.02.2015 01:16, Lev Serebryakov wrote: > nat 9 config redirect_port tcp 192.168.134.2:16881 16881 > redirect_port udp 192.158.134.2:16881 16881 redirect_port tcp > 192.168.134.2:22 2 Also, if I add "log" to this config: nat 9 config log r

does "nat redirect_port tcp" works for you on -CURRENT?

2015-02-04 Thread Lev Serebryakov
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 I have such rules in my firewall: nat 9 config redirect_port tcp 192.168.134.2:16881 16881 redirect_port udp 192.158.134.2:16881 16881 redirect_port tcp 192.168.134.2:22 2 nat 1 config ip $EXT_IP same_ports ... // Packets from outer world 110

[Differential] [Changed Subscribers] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread bz (Bjoern A. Zeeb)
bz added a subscriber: bz. bz added a reviewer: bz. bz added a comment. It smells like a lle_refcnt bug to me and I have send a private email to errs to understand the problem better (and also if other optional kernel options might have been used while this was experienced). REVISION DETAIL h

[Differential] [Updated] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread adrian (Adrian Chadd)
adrian added a comment. I'm still a little iffy about this kind of fix. It looks more like a lifecycle management thing that we should fix, rather than just moving to MPSAFE. But I unfortunately don't have the time to go look at the lle code and figure out how it should look. :( REVISION DETAI

[Differential] [Request, 51 lines] D1777: Associated fix for arp/nd6 timer usage.

2015-02-04 Thread rrs (Randall Stewart)
rrs created this revision. rrs added reviewers: jhb, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian. rrs added subscribers: hselasky, julian, hiren, emaste, freebsd-net. REVISION SUMMARY There is a hole in the timer code that when you give it a lock, it will lock that lock and then ch

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. I have created D1777 to address the nd6/arp crash separately. I am currently in the midst of testing these. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, sbruno, imp, adrian, hselasky Cc: julian, hiren, jhb, kostikbel,

[Differential] [Commented On] D1761: Extend LRO support to accumulate more than 65535 bytes

2015-02-04 Thread adrian (Adrian Chadd)
adrian added a comment. I think we should talk to rwatson about his mbuf hijinx and see about extending flags. This isn't the only option to add. REVISION DETAIL https://reviews.freebsd.org/D1761 To: hselasky, rrs, glebius, gnn, emaste, lstewart, rwatson, bz, imp, np, adrian Cc: freebsd-net _

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Imp: Ok I have spent a bit of time puzzling this out. First I was mistaken, the callouts being run are either arptimer or nd6timer(function name not right). These are not using giant but the passed in lle structure rw_lock. We need to adjust these so that they check they:

[Differential] [Commented On] D1761: Extend LRO support to accumulate more than 65535 bytes

2015-02-04 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. Adrian: We do have one bit left after M_FLOWID was removed. Should I use it for this? And what would be an appropriate name. M_LENGTH_OK ? REVISION DETAIL https://reviews.freebsd.org/D1761 To: hselasky, rrs, glebius, gnn, emaste, lstewart, rwatson, bz, imp, np, adria

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread imp (Warner Losh)
imp added a comment. So why doesn't the lle* code set mpsafe to 1? After re-reading the man page several times, I'm thinking that's the solution here. It already uses other locks and reference counts to synchronize things, so why get Giant involved at all? REVISION DETAIL https://reviews.fre

[Differential] [Requested Changes To] D1761: Extend LRO support to accumulate more than 65535 bytes

2015-02-04 Thread adrian (Adrian Chadd)
adrian requested changes to this revision. adrian added a comment. This revision now requires changes to proceed. I still don't want to see the hash type stuff polluted with things that aren't hashes, making it more difficult to implement RSS work distribution along with other kinds of features.

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread imp (Warner Losh)
imp added a comment. For the lle* code, it looks like the reference count for the data structure improperly doesn't cover the implicit use of the mutex by the callout system. That seems to be the real bug here, no? Protecting a mutex with a reference count without holding a reference to that mu

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Julian: The simple fix is to change the callout_init_rw -> callout_init(c, 1); That makes it so the callout_stop() in this instance would return 0, not 1. Since the callout could not be stopped. This would then cause the reference count to drop to 1 (it was 2 the original

[Differential] [Updated, 241 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-04 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3631. rrs added a comment. This revision now requires review to proceed. This fixes the comment as imp suggested and the indent... CHANGES SINCE LAST UPDATE https://reviews.freebsd.org/D1711?vs=3603&id=3631 REVISION DETAIL https://reviews.freebsd.org/D1711

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. julian: What do you mean by "wait a bit". Spinning or sleeping? REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, hselasky, adrian, imp, sbruno Cc: julian, hiren, jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread julian (JulianElischer)
julian added a comment. let me see if this is correct.. I'm assuming that the reference count can not change if you hold the lock? if so then before releasing the lock, we can check if the count is 1.. if it is then we are the only person who even knows where this thing is so there can't be any

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Julian: The point is *exactly* that, the callout *has* a reference.. and now that the table is being flushed if the callout_stop returns 1 it thinks the callout has been stopped, which it has, which means it will not run and release its reference. Thus the lowering of the

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. julian: Hence a lock is used, the callback won't be called when callout_stop() returns 1. Only the mutex will still be used. Maybe a callout_reset() having 1 tick as timeout will work instead of callout_stop(). If the callout_reset() returns 0 we add a ref instead of g

[Differential] [Changed Subscribers] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-04 Thread julian (JulianElischer)
julian added a subscriber: julian. julian added a comment. >>! In D1711#66, @rrs wrote: > 3) The callout_stop() on CPU 2 does what it is supposed to and sets the > cc_cancel bit to true and > return 1. This causes callout_stop() to lower the reference count which > means when llentry_free(

[Differential] [Updated, 79 lines] D1761: Extend LRO support to accumulate more than 65535 bytes

2015-02-04 Thread hselasky (Hans Petter Selasky)
hselasky edited reviewers, added: np; removed: rmacklem hselasky removed a subscriber: np. hselasky updated this revision to Diff 3630. hselasky added a comment. Address comments from imp and np: - Update patch to use IP_MAXPACKET as default for LRO which preserves existing behaviour. - Add to n

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Ok guys, I have puzzled out what that crash *may* be that was posted by Hiren. The same issue exists in the timeout code rewrite that Han's has up on the board as well. Though the callout_drain_async() may solve it if the user called that instead. Here is what is happening.

Re: [RFC][patch] New "keep-state-only" option (version 3)

2015-02-04 Thread Julian Elischer
On 2/4/15 6:08 PM, bycn82 wrote: /Cool, But maybe not all people are following this topic, so can you please simplify it by answering below question in order to allow more people to know what is going on here. / /What kind of problem you are facing and how does your patch resolve it? / le

Re: [RFC][patch] New "keep-state-only" option (version 3)

2015-02-04 Thread Julian Elischer
On 2/4/15 5:24 PM, Lev Serebryakov wrote: -- Re-installation of state (with second, third, etc... packet of connection) should update TCP state of state (sorry!), or it will die in 10 seconds. This version seems to be final (apart from name of new option!). It works perfectly on my route

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#59, @rrs wrote: > Hiren: > > Ok looking at kern_timeout.c thats a call to > class->lc_lock(c_lock, lock_status); > > If my 10.x matches yours. > > And the call inside that kern_rwlock.c:757 > is > > v = rw->rw_lock; > owner = (struct thread *)RW_OWNER(v);

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#60, @hiren wrote: >>>! In D1711#58, @rrs wrote: > >> hiren: >> >> This looks interesting to me, it is definitely something I would like to >> look at. I assume you >> are on 10.stable like Sean? > > Yes, its plain stable10+D1711. > Also, all 3 panics are fr

Re: [RFC][patch] New "keep-state-only" option (version 3)

2015-02-04 Thread bycn82
*Cool, But maybe not all people are following this topic, so can you please simplify it by answering below question in order to allow more people to know what is going on here.* *What kind of problem you are facing and how does your patch resolve it?* On 4 February 2015 at 17:24, Lev Serebryako

[RFC][patch] New "keep-state-only" option (version 3)

2015-02-04 Thread Lev Serebryakov
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 03.02.2015 19:55, Lev Serebryakov wrote: >> Ok, "allow-state"/"deny-state" was very limited idea. Here is >> more universal mechanism: new "keep-state-only" (aliased as >> "record-only") option, which works exactly as "keep-state" BUT >> cancel

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#61, @hselasky wrote: > Hi, > > There is only one or two likely consumers of callout_init_rw() at the present > moment, and one of them is: > > ./netinet6/nd6.c: canceled = callout_stop(&ln->ln_timer_ch); > ./netinet6/nd6.c: can

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#59, @rrs wrote: > Hiren: > > Ok looking at kern_timeout.c thats a call to > class->lc_lock(c_lock, lock_status); > > If my 10.x matches yours. It's not :-( Looks like what we have here is not stock stable10 really. I'll check all the details and get back