[CFR] Forward RTO recovery algorithm (rfc5682) patch

2011-02-05 Thread Weongyo Jeong
This is a *VERY* experimental patch to implement rfc5682, which is on
the IETF standards track. It could be completely wrong, full of bugs, or
based on the wrong approach, because I'm really a newbie to the TCP stack.

This patch implements two features, the `Basic FRTO algorithm' and the
`SACK-Enhanced FRTO algorithm', but does not include a feature to support
things mentioned at `Appendix A'.
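
For reviewers who do not have the RFC at hand, the ACK handling of the
Basic algorithm (steps 2 and 3 of RFC 5682) can be sketched roughly as
follows. This is an illustration only and is not code from the patch: the
names frto_basic_ack, enum frto_action and seq_lt are made up here, and the
"does not acknowledge all of the retransmitted data" condition of step 2a is
simplified to "the ACK does not advance SND.UNA".

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch (not from the patch). */
enum frto_action {
	FRTO_CONVENTIONAL,	/* 2a/3a: fall back to conventional RTO recovery */
	FRTO_SEND_NEW,		/* 2b: send up to two new segments, go to step 3 */
	FRTO_SPURIOUS		/* 3b: declare the timeout spurious */
};

/* Wrap-safe "a < b" for 32-bit TCP sequence numbers. */
static int
seq_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/*
 * Decide what to do with the first (step == 2) or second (step == 3)
 * ACK that arrives after the RTO retransmission.  "recover" is the
 * highest sequence number sent before the timeout.
 */
static enum frto_action
frto_basic_ack(int step, uint32_t ack, uint32_t snd_una, uint32_t recover)
{
	int advances = seq_lt(snd_una, ack);	/* 0 means duplicate ACK */

	if (step == 2) {
		/* 2b: window advances and the ACK is still below "recover". */
		if (advances && seq_lt(ack, recover))
			return (FRTO_SEND_NEW);
		/* 2a: duplicate ACK, or the ACK reaches "recover". */
		return (FRTO_CONVENTIONAL);
	}
	/* Step 3: a new ACK means spurious, a duplicate ACK means real loss. */
	return (advances ? FRTO_SPURIOUS : FRTO_CONVENTIONAL);
}
```

With snd_una = 1000 and recover = 2000, a first ACK for 1100 gives
FRTO_SEND_NEW, while a first ACK for 2000 gives FRTO_CONVENTIONAL.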

I'm looking for reviewers to teach me, line by line, where it's wrong or
where the approach is bad.  However, any comments are welcome.

The patch is available at:

http://people.freebsd.org/~weongyo/patch_tcpfrto_20110205.diff

regards,
Weongyo Jeong

Index: netinet/tcp_input.c
===
--- netinet/tcp_input.c	(revision 218148)
+++ netinet/tcp_input.c	(working copy)
@@ -161,6 +161,11 @@
 &VNET_NAME(tcp_abc_l_var), 2,
 "Cap the max cwnd increment during slow-start to this number of segments");
 
+VNET_DEFINE(int, tcp_do_rfc5682) = 1;
+SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, rfc5682, CTLFLAG_RW,
+&VNET_NAME(tcp_do_rfc5682), 0,
+"Enable RFC 5682 (Forward RTO-Recovery: F-RTO)");
+
 SYSCTL_NODE(_net_inet_tcp, OID_AUTO, ecn, CTLFLAG_RW, 0, "TCP ECN");
 
 VNET_DEFINE(int, tcp_do_ecn) = 0;
@@ -254,6 +259,212 @@
 	}
 }
 
+/* revert to the conventional RTO recovery. */
+#define	TCP_FRTO_REVERT(tp)	do {	\
+	(tp)->frto_flags = 0;		\
+} while (0)
+
+static int
+tcp_frto_send2mss(struct tcpcb *tp, struct tcphdr *th, uint16_t type)
+{
+	struct inpcb *inp = tp->t_inpcb;
+	struct socket *so = inp->inp_socket;
+	u_long oldcwnd;
+	tcp_seq onxt;
+
+	/*
+	 * XXXWG If the TCP sender does not have any new data to send, OR the
+	 * advertised window prohibits new transmissions, the recommended
+	 * action is to skip step 3 of this algorithm and continue with
+	 * slow-start retransmissions, following the conventional RTO recovery
+	 * algorithm.  However, alternative ways of handling the window-limited
+	 * cases that could result in better performance are discussed in
+	 * Appendix A.
+	 */
+	if (so->so_snd.sb_cc == 0 || tp->snd_wnd == 0)
+		/* XXXWG skip step 3 OR do alternative ways.  */
+		return (1);
+	oldcwnd = tp->snd_cwnd;
+	onxt = tp->snd_nxt;
+	/*
+	 * XXXWG transmit up to two new (previously unsent) segments and enter
+	 * step 3 of this algorithm.  If the TCP sender does not have enough
+	 * unsent data, it can send only one segment.
+	 */
+	tp->snd_cwnd = 2 * tp->t_maxseg;
+	tp->snd_nxt = tp->snd_max;
+	/*
+	 * XXXWG in addition, the TCP sender MAY override the Nagle algorithm.
+	 */
+	tp->t_flags |= TF_ACKNOW;
+	(void) tcp_output(tp);
+	tp->snd_nxt = onxt;
+	tp->snd_cwnd = oldcwnd;
+	return (0);
+}
+
+static void inline
+tcp_frto_ack_received(struct tcpcb *tp, struct tcphdr *th, uint16_t type)
+{
+
+	/* SACK-Enhanced F-RTO */
+
+	if (tp->t_flags & TF_SACK_PERMIT) {
+		if ((tp->frto_flags & FRTO_INSTEP1) != 0 &&
+		    (tp->frto_flags & FRTO_CANGOSTEP2) != 0) {
+			/* SACK-Enhanced F-RTO - step 2 */
+			if (type == CC_DUPACK)
+				/*
+				 * 2) If duplicate ACKs arrive before the
+				 * cumulative acknowledgment for retransmitted
+				 * data, adjust the scoreboard according to
+				 * the incoming SACK information.  Stay in
+				 * step 2 and wait for the next new
+				 * acknowledgment.
+				 */
+				return;
+			KASSERT(type == CC_ACK,
+			    ("%s: expected ACK (but %d)", __func__, type));
+
+			/*
+			 * 2) When a new acknowledgment arrives, set variable
+			 * "RecoveryPoint" to indicate the highest sequence
+			 * number transmitted so far.
+			 */
+			tp->frto_flags |= FRTO_INSTEP2;
+			tp->snd_recover = tp->snd_max;
+
+			/*
+			 * 2a) If the Cumulative Acknowledgement field covers
+			 * "RecoveryPoint" but not more than
+			 * "RecoveryPoint", revert to the conventional RTO
+			 * recovery and set the congestion window to no
+			 * more than 2 * MSS, like a regular TCP would do.
+			 * Do not enter step 3 of this algorithm.
+			 */
+			if (th->th_ack == tp->snd_recover) {
+				tp->snd_cwnd = 2 * tp->t_maxseg;
+				TCP_FRTO_REVERT(tp);
+				return;
+			/*
+			 * 2b) If the Cumulative Acknowledgment field does not
+			 * cover "RecoveryPoint" but is larger than
+			 * SND.UNA.
+			 */
+			} else if (SEQ_LT(th->th_ack, tp->snd_recover) &&
+			    SEQ_GT(th->th_ack, tp->snd_una) &&
+			    !tcp_frto_send2mss(tp, th, type)) {
+				tp->frto_flags |= FRTO_CANGOSTEP3;
+				tp->frto_fack = tp->snd_fack;
+			}
+		}
+		if ((tp->frto_flags & FRTO_INSTEP2) != 0 &&
+		    (tp->frto_flags & FRTO_CANGOSTEP3) != 0) {
+			struct sackhole *q = TAILQ_FIRST(&tp->snd_holes);
+			/* the Cumulative Acknowledgment */
+			tcp_seq cack = SEQ_MAX(th->th_ack, tp->snd_una);
+
+			/*
+			 * XXXWG 3a) If the Cumulative Acknowledgment field or
+			 * the SACK information covers more than
+			 * "RecoveryPoint".
+			 */
+			if (SEQ_GEQ(cack, tp->snd_recover) ||
+			    (q != NULL && SEQ_GT(q->start, tp->snd_recover)))
+				goto frto_revert;
+			/*
+			 * XXXWG 3a) tak

Re: Problem with re0

2011-02-05 Thread Zeus V Panchenko
Pyun YongHyeon (pyu...@gmail.com) [11.01.31 23:14] wrote:
> 
> Then I have no idea. Does other OS work with your hardware without
> issues? As last resort, could you try vendor's FreeBSD driver? The
> vendor's driver applies a bunch of magic DSP fixups which re(4)
> does not have. I don't know whether it makes difference or not but
> it would be worth a try. Note, vendor's driver treat your
> controller as old 8139 such that it disables all offload features
> and does not work on non-x86 architectures.
>

i386 exposes the same problem :(

as for vendor's drivers, i didn't try them yet ...

-- 
Zeus V. Panchenko
IT Dpt., IBS ltd    GMT+2 (EET)
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: panic: bufwrite: buffer is not busy???

2011-02-05 Thread Eugene Grosbein
On 02.02.2011 00:50, Gleb Smirnoff wrote:
> On Wed, Feb 02, 2011 at 12:30:20AM +0600, Eugene Grosbein wrote:
> E> On 31.01.2011 14:20, Julian Elischer wrote:
> E> 
> E> > replace with:
> E> > 
> E> > if ((hook == NULL) ||
> E> >     NG_HOOK_NOT_VALID(hook) ||
> E> >     ((peer = NG_HOOK_PEER(hook)) == NULL) ||
> E> >     NG_HOOK_NOT_VALID(peer) ||
> E> >     ((peernode = NG_PEER_NODE(hook)) == NULL) ||
> E> >     NG_NODE_NOT_VALID(peernode)) {
> E> >         if (peer)
> E> >                 KASSERT((peernode != NULL),
> E> >                     ("peer node NULL while peer hook exists"));
> E> >         NG_FREE_ITEM(item);
> E> 
> E> Today I updated the panicking router to RELENG_8 and combined the changes
> E> proposed by Julian and Gleb. After 8 hours it has just panicked again and
> E> again could not finish writing the crash dump:
> E> 
> E> Fatal trap 12: page fault while in kernel mode
> E> cpuid = 3; apic id = 06
> E> fault virtual address   = 0x63
> E> fault code  = supervisor read data, page not present
> E> instruction pointer = 0x20:0x803d4ccd
> E> stack pointer   = 0x28:0xff80ebffc600
> E> frame pointer   = 0x28:0xff80ebffc680
> E> code segment= base 0x0, limit 0xf, type 0x1b
> E> = DPL 0, pres 1, long 1, def32 0, gran 1
> E> processor eflags= interrupt enabled, resume, IOPL = 0
> E> current process = 2390 (mpd5)
> E> trap number = 12
> E> panic: page fault
> E> cpuid = 3
> E> Uptime: 8h3m51s
> E> Dumping 4087 MB (3 chunks)
> E>   chunk 0: 1MB (150 pages) ... ok
> E>   chunk 1: 3575MB (915088 pages) 3559 3543panic: bufwrite: buffer is not busy???
> E> cpuid = 3
> E> Uptime: 8h3m52s
> E> Automatic reboot in 15 seconds - press a key on the console to abort
> E> 
> E> # gdb kernel
> E> GNU gdb 6.1.1 [FreeBSD]
> E> Copyright 2004 Free Software Foundation, Inc.
> E> GDB is free software, covered by the GNU General Public License, and you are
> E> welcome to change it and/or distribute copies of it under certain conditions.
> E> Type "show copying" to see the conditions.
> E> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> E> This GDB was configured as "amd64-marcel-freebsd"...
> E> (gdb) l *0x803d4ccd
> E> 0x803d4ccd is in ng_pppoe_disconnect (netgraph.h:191).
> E> 186 int line);
> E> 187
> E> 188 static __inline void
> E> 189 _chkhook(hook_p hook, char *file, int line)
> E> 190 {
> E> 191 if (hook->hk_magic != HK_MAGIC) {
> E> 192 printf("Accessing freed hook ");
> E> 193 dumphook(hook, file, line);
> E> 194 }
> E> 195 hook->lastline = line;
> E> (gdb) x/i 0x803d4ccd
> E> 0x803d4ccd :   cmpl   $0x78573011,0x64(%rbx)
> 
> This looks like ng_pppoe_disconnect() was called with NULL argument.
> 
> Can you add KDB_TRACE option to kernel? Your boxes for some reason can't
> dump core, but with this option we will have at least trace.

Same box, more panics with KDB_TRACE, NETGRAPH_DEBUG, your patch and
Julian's.

First: again, no dump (not even started to dump, and no "Uptime:" written to 
console):

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x2006c
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x803e5a6d
stack pointer   = 0x28:0xff80ec03d600
frame pointer   = 0x28:0xff80ec03d680
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 2390 (mpd5)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
X_db_sym_numargs() at 0x801a227a = X_db_sym_numargs+0x15a
kdb_backtrace() at 0x8033d547 = kdb_backtrace+0x37
panic() at 0x8030b567 = panic+0x187
dblfault_handler() at 0x804c0ca0 = dblfault_handler+0x330
dblfault_handler() at 0x804c107f = dblfault_handler+0x70f
trap() at 0x804c155f = trap+0x3df
calltrap() at 0x804a8de4 = calltrap+0x8
--- trap 0xc, rip = 0x803e5a6d, rsp = 0xff80ec03d600, rbp = 0xff80ec03d680 ---
ng_parse_get_token() at 0x803e5a6d = ng_parse_get_token+0x70cd
ng_destroy_hook() at 0x803d53b2 = ng_destroy_hook+0x222
ng_rmnode() at 0x803d69bb = ng_rmnode+0x12ab
ng_snd_item() at 0x803d8520 = ng_snd_item+0x3f0
ng_parse_get_token() at 0x803e97fa = ng_parse_get_token+0xae5a
sosend_generic() at 0x80373df6 = sosend_generic+0x436
kern_sendit() at 0x803776d5 = kern_sendit+0x1a5
kern_sendit() at 0x8037790c = kern_sendit+0x3dc

Re: panic: bufwrite: buffer is not busy???

2011-02-05 Thread Mike Tancsa
On 2/5/2011 8:48 AM, Eugene Grosbein wrote:
> 
> First: again, no dump (not even started to dump, and no "Uptime:" written to 
> console):

if you try and enable dumps manually from the shell,

dumpon -v /dev/ad0s1b
(or whatever your swap partition is), what does dumpon return with ?

---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


Re: panic: bufwrite: buffer is not busy???

2011-02-05 Thread Eugene Grosbein
On 05.02.2011 19:55, Mike Tancsa wrote:
> On 2/5/2011 8:48 AM, Eugene Grosbein wrote:
>>
>> First: again, no dump (not even started to dump, and no "Uptime:" written to 
>> console):
> 
> if you try and enable dumps manually from the shell,
> 
> dumpon -v /dev/ad0s1b
> (or whatever your swap partition is), what does dumpon return with ?

# . /etc/rc.conf
# set -x
# dumpon -v $dumpdev
+ dumpon -v /dev/ad0s4b
kernel dumps on /dev/ad0s4b



Re: panic: bufwrite: buffer is not busy???

2011-02-05 Thread Eugene Grosbein
On 05.02.2011 20:00, Eugene Grosbein wrote:

>>> First: again, no dump (not even started to dump, and no "Uptime:" written 
>>> to console):
>>
>> if you try and enable dumps manually from the shell,
>>
>> dumpon -v /dev/ad0s1b
>> (or whatever your swap partition is), what does dumpon return with ?
> 
> # . /etc/rc.conf
> # set -x
> # dumpon -v $dumpdev
> + dumpon -v /dev/ad0s4b
> kernel dumps on /dev/ad0s4b

Note: this is NanoBSD running from SSD. It uses ad0s1 and ad0s2 for code
and ad0s3 for /cfg, as usual, 1GB total. The rest of the SSD is dedicated
to ad0s4, where I have ad0s4b (8GB) for crash dumps and the remainder as
ad0s4a for /var/crash.

And ad0s4b is NOT configured as swap. There are 4GB of RAM and no swap here.
More than 3GB of RAM are generally free.

Eugene Grosbein


divert rewrite

2011-02-05 Thread Julian Elischer
for some time now it has been apparent that the divert socket protocol 
was a little too heavily tied to IPv4.


With IPv6 coming along now, it seems that we should look at how to 
extend it.


I see a couple of possible ways to do this:

--- the first way: 

One would be to add an IPv6 version of divert sockets, possibly from
the same base code. The ipfw code that calls it would pass on whether
the packet being passed out is IPv4 or IPv6 (or divert can just look),
and divert would pass the packet to the correct socket if it was open.


From an application point of view, this means you would have to open 
an ipv4 divert socket and an ipv6 divert socket.


if you didn't have the right one open.. you would just never see the 
packet.
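
To make the "it can just look" part concrete, here is a rough sketch of
how the family could be picked from the packet itself: the IP version is
the top four bits of the first header byte for both IPv4 and IPv6. The
names divert_classify and divert_deliver and the v4_open/v6_open flags are
invented for this example and do not exist in the current divert code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification result for this sketch. */
enum divert_family { DIVERT_NONE = 0, DIVERT_V4 = 4, DIVERT_V6 = 6 };

/* Look at the version nibble of the first byte of the IP header. */
static enum divert_family
divert_classify(const uint8_t *pkt, size_t len)
{
	if (pkt == NULL || len == 0)
		return (DIVERT_NONE);
	switch (pkt[0] >> 4) {
	case 4:
		return (DIVERT_V4);
	case 6:
		return (DIVERT_V6);
	default:
		return (DIVERT_NONE);
	}
}

/*
 * Returns 1 if the packet would be handed to an open socket of the
 * matching family, 0 if the application would simply never see it.
 */
static int
divert_deliver(const uint8_t *pkt, size_t len, int v4_open, int v6_open)
{
	switch (divert_classify(pkt, len)) {
	case DIVERT_V4:
		return (v4_open != 0);
	case DIVERT_V6:
		return (v6_open != 0);
	default:
		return (0);
	}
}
```

An application that only opened the IPv4 socket corresponds to
v4_open = 1, v6_open = 0: IPv6 packets are then dropped from its point of
view.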


Since applications that use divert would probably have to be rewritten
to cope with ipv6 anyhow, this seems to be an ok solution/cost.

Any app that was not updated would continue to run with ipv4 but would 
never see IPV6 packets even if diverted.


-- another way 

Another way to do this would be to recode divert to be its own 
protocol family with its own sockaddr type.


That socket address would include the family as now, but would have
enough room to support ipv4 and ipv6 addresses, as well as special
fields that are currently not available in divert or are just 'hacked'
(such as the fact that the name of the interface is hidden in the
'sin_zero' bytes of the ipv4 socket address, and if you keep it and
pass it back you are effectively passing that information back too).


In this scheme we would allow the socket address structure to have
enough fields to be able to encode some of the more interesting
packet layer information that is in the mbuf.
For example, the FIB, or some of the other packet flags
or maybe even one or two of the common tags.

I could see that some of these flags might be useful to a divert agent 
that understood the protocol stack it was working with:


#define M_PROTO1        0x00000010 /* protocol-specific */
#define M_PROTO2        0x00000020 /* protocol-specific */
#define M_PROTO3        0x00000040 /* protocol-specific */
#define M_PROTO4        0x00000080 /* protocol-specific */
#define M_PROTO5        0x00000100 /* protocol-specific */
#define M_BCAST         0x00000200 /* send/received as link-level broadcast */
#define M_MCAST         0x00000400 /* send/received as link-level multicast */

#define M_SKIP_FIREWALL 0x00004000 /* skip firewall processing */

#define M_VLANTAG       0x00010000 /* ether_vtag is valid */
#define M_PROMISC       0x00020000 /* packet was not for us */
#define M_PROTO6        0x00080000 /* protocol-specific */
#define M_PROTO7        0x00100000 /* protocol-specific */
#define M_PROTO8        0x00200000 /* protocol-specific */
#define M_FLOWID        0x00400000 /* flowid is valid */


If we really wanted to do more, we could also define an OOB format
that could be used with recvmsg() and sendmsg() that would be
extensible enough to really give a lot of information.
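
To show what such an OOB format could look like in practice, here is a
small sketch that packs and unpacks one item (a FIB number) as ancillary
data using the standard CMSG_* macros. The level and type constants
DIVERT_CMSG_LEVEL and DIVERT_CMSG_FIB and both helper functions are
invented for this example; only the CMSG_* macros and struct msghdr are
real.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define DIVERT_CMSG_LEVEL	0xD1	/* hypothetical cmsg_level value */
#define DIVERT_CMSG_FIB		1	/* hypothetical cmsg_type value */

/*
 * Write one control message carrying "fib" into buf (which must be
 * suitably aligned for struct cmsghdr).  Returns the control length to
 * put in msg_controllen for sendmsg(), or 0 if buf is too small.
 */
static size_t
divert_put_fib(void *buf, size_t buflen, uint32_t fib)
{
	struct msghdr msg;
	struct cmsghdr *cm;

	if (buflen < CMSG_SPACE(sizeof(fib)))
		return (0);
	memset(buf, 0, buflen);
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(fib));
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = DIVERT_CMSG_LEVEL;
	cm->cmsg_type = DIVERT_CMSG_FIB;
	cm->cmsg_len = CMSG_LEN(sizeof(fib));
	memcpy(CMSG_DATA(cm), &fib, sizeof(fib));
	return ((size_t)msg.msg_controllen);
}

/* Scan control data as returned by recvmsg(); returns 1 if a FIB was found. */
static int
divert_get_fib(void *buf, size_t len, uint32_t *fib)
{
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&msg, 0, sizeof(msg));
	msg.msg_control = buf;
	msg.msg_controllen = len;
	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level == DIVERT_CMSG_LEVEL &&
		    cm->cmsg_type == DIVERT_CMSG_FIB &&
		    cm->cmsg_len == CMSG_LEN(sizeof(*fib))) {
			memcpy(fib, CMSG_DATA(cm), sizeof(*fib));
			return (1);
		}
	}
	return (0);
}
```

Because each item is a separate control message with its own type, new
kinds of information could be added later without changing the socket
address layout, and unknown types can simply be skipped by older
applications.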

This would be the least compatible, and to tell the truth, I'd be 
tempted to leave the old ipv4 interface in place as an upgrade aid.

it could however handle all sorts of protocols, not just ipv4 and ipv6
but possibly L2 packets etc. as well.
It may also be more work than I hope to do :-)

--

If anyone else has suggestions or man-power or would like to help..
pipe up!


Julian






Re: divert rewrite

2011-02-05 Thread Ivo Vachkov
Hello,

How can I help?

/ipv




-- 
"UNIX is basically a simple operating system, but you have to be a
genius to understand the simplicity." Dennis Ritchie


Re: divert rewrite

2011-02-05 Thread Julian Elischer

On 2/5/11 4:09 PM, Ivo Vachkov wrote:
> Hello,
>
> How can I help?

If you have IPv6 connectivity and experience, that would help: I have no
experience or connectivity with it, so I'll be coding blind and will need
a tester.
If you have an application for IPv6 testing that would be even better.
Divert is often used for NAT, but that doesn't seem very useful for IPv6,
and natd doesn't support it anyhow.










Re: Proposed patch for Port Randomization modifications according to RFC6056

2011-02-05 Thread Giorgos Keramidas
On Fri, 28 Jan 2011 11:00:40 -0800, Doug Barton  wrote:
> I haven't reviewed the patch in detail yet but I wanted to first thank
> you for taking on this work, and being so responsive to Fernando's
> request (which I agreed with, and you updated before I even had a
> chance to say so). :)

Thanks from me too.

> My one comment so far is on the name of the sysctl's. There are 2
> problems with sysctl/variable names that use an rfc title. The first is
> that they are not very descriptive to the 99.9% of users who are not
> familiar with that particular doc. The second is more esoteric, but if
> the rfc is subsequently updated or obsoleted we're stuck with either an
> anachronism or updating code (both of which have their potential areas
> of confusion).
>
> So in order to avoid this issue, and make it more consistent with the
> existing:
>
> net.inet.ip.portrange.randomtime
> net.inet.ip.portrange.randomcps
> net.inet.ip.portrange.randomized
>
> How does net.inet.ip.portrange.randomalg sound? I would also suggest
> that the second sysctl be named
> net.inet.ip.portrange.randomalg.alg5_tradeoff so that one could do
> 'sysctl net.inet.ip.portrange.randomalg' and see both values. But I won't
> quibble on that. :)

It's a usability issue too, so I'd certainly support renaming the
sysctls to something human-friendly.  It's bad enough to have to go
to a search engine to find out what net.inet.rfc1234 means.  It's
worse when RFC 1234 was obsoleted a few years ago and is now called
RFC 54321.



MSS rewrite / MSS clamping?

2011-02-05 Thread Jason Fesler
I'm in search of MSS clamping for FreeBSD servers; in particular, for 
IPv6.  I'm finding pretty much nothing (except iptables..) on the net.


Am I chasing wild geese?


--
 Jason Fesler, email/jabber  resume: http://jfesler.com
 "Give a man fire, and he'll be warm for a day;
 set a man on fire, and he'll be warm for the rest of his life."


Re: MSS rewrite / MSS clamping?

2011-02-05 Thread Boris Kochergin

On 02/05/11 23:07, Jason Fesler wrote:
> I'm in search of MSS clamping for FreeBSD servers; in particular, for
> IPv6.  I'm finding pretty much nothing (except iptables..) on the net.
>
> Am I chasing wild geese?

pf.conf(5) mentions a "max-mss" option for traffic normalization.

-Boris


Re: MSS rewrite / MSS clamping?

2011-02-05 Thread Julian Elischer

On 2/5/11 8:07 PM, Jason Fesler wrote:
> I'm in search of MSS clamping for FreeBSD servers; in particular,
> for IPv6.

Well, there is ng_tcpmss  but I see that it only works for IPv4
It may however be relatively easy to add code to allow it to work for 
IPV6.
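
For anyone wondering what the clamping itself involves, the option
rewrite is independent of the IP version; only the header size used to
derive the limit differs. The following is a rough sketch, not the
ng_tcpmss code: clamp_mss() and mss_for_ipv6_mtu() are names made up here,
the caller is assumed to have already located the TCP option bytes of a
SYN segment, and the TCP checksum fix-up that a real implementation needs
after changing the option is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL	0	/* end of option list */
#define OPT_NOP	1	/* no-operation (padding) */
#define OPT_MSS	2	/* maximum segment size, length 4 */

/* Largest MSS that fits an IPv6 path MTU: 40 bytes IPv6 + 20 bytes TCP. */
static uint16_t
mss_for_ipv6_mtu(uint16_t mtu)
{
	return (mtu > 60 ? (uint16_t)(mtu - 60) : 0);
}

/*
 * Walk the TCP options and lower an MSS option that exceeds maxmss.
 * Returns 1 if the option was rewritten, 0 otherwise.  The caller must
 * still update the TCP checksum (not shown).
 */
static int
clamp_mss(uint8_t *opts, size_t optlen, uint16_t maxmss)
{
	size_t i = 0;

	while (i < optlen) {
		uint8_t kind = opts[i];
		uint8_t len;

		if (kind == OPT_EOL)
			break;
		if (kind == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= optlen)
			break;			/* truncated option */
		len = opts[i + 1];
		if (len < 2 || i + len > optlen)
			break;			/* malformed length */
		if (kind == OPT_MSS && len == 4) {
			uint16_t mss = (uint16_t)((opts[i + 2] << 8) | opts[i + 3]);

			if (mss <= maxmss)
				return (0);
			opts[i + 2] = (uint8_t)(maxmss >> 8);
			opts[i + 3] = (uint8_t)(maxmss & 0xff);
			return (1);
		}
		i += len;
	}
	return (0);
}
```

For example, a SYN advertising MSS 1460 that has to cross a 1280-byte
IPv6 path would be clamped to mss_for_ipv6_mtu(1280), which is 1220.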


There is also the code in ports net/tcpmss, which is also IPv4 only and
in addition relies on DIVERT, which is currently IPv4 only (I hope to
change that).

> I'm finding pretty much nothing (except iptables..) on the net.

I assume you don't include things like ppp links if you are talking
about a server.

> Am I chasing wild geese?


don't know about pf.  it MAY be able to help.

For what it is worth, I expect a lot of IPV6 stuff to get kicked into 
shape over the next few months :-)







Re: Connections not purged on address deletion

2011-02-05 Thread Robert Watson


On Fri, 4 Feb 2011, Prabhu Hariharan wrote:

> When I delete an IP-address from an interface, the TCP (and other)
> connections using that local IP-address are not getting purged.  The telnet
> or ssh sessions on the other end just get hung, as FreeBSD address-deletion
> doesn't handle this situation and fails to call pfctlinput() to notify
> protocols on this event.  The TCP connections simply linger in the system
> and take their due course on TCP timers to free those inpcbs.
>
> tcp4 0 0 30.30.30.31.22 30.30.30.30.58796 ESTABLISHED
>
> Is this by design?  Or is there any significance in relying on applications
> to do timeouts intelligently, without a notification from the network layer?


I don't know if it's by design per se, but it proves extremely handy in 
practice when 802.11 blips and DHCP goes funny.  Or, perhaps more 
historically, when PPP was restarted, leading to addresses being removed and 
re-added.


Robert