Re: FBSD 1GBit router?

2008-03-03 Thread Łukasz Bromirski

Willem Jan Withagen wrote:


I'm looking for a stream exploder.:)
1 2Mbit stream in, and as many as possible out.
And 7*1Gb = 14Gbit, so I'd like to be pushing 7000 streams.
(One advantage is that they will be UDP streams, so there is
a little less bookkeeping in the protocol stack )


Wouldn't it be a case for use of multicast vs unicast? Hardware
is always better anyway, so why not invest in some switch that
can do unicast/multicast in hardware?

--
"Don't expect me to cry for all the |   Łukasz Bromirski
 reasons you had to die" -- Kurt Cobain |http://lukasz.bromirski.net
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: ipfw initialization: SI_ORDER_ANY -> SI_ORDER_MIDDLE?

2008-03-03 Thread Paolo Pisati
On Sun, Mar 02, 2008 at 03:58:50PM +0100, Luigi Rizzo wrote:
> 
> The SI_ORDER_* definitions in /sys/sys/kernel.h are enumerated on a
> large range, so if the existing code does not have races,
> you can safely move the non-leaf modules
> (such as ipfw.ko in your case) to (SI_ORDER_ANY - some_small_integer)
> without breaking anything.

fine, i did this.
 
is it MFCable?

bye,
P.


Current problem reports assigned to freebsd-net@FreeBSD.org

2008-03-03 Thread FreeBSD bugmaster
Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.  Description

a kern/38554   net    changing interface ipaddress doesn't seem to work
s kern/39937   net    ipstealth issue
s kern/81147   net    [net] [patch] em0 reinitialization while adding aliase
o kern/92552   net    A serious bug in most network drivers from 5.X to 6.X
s kern/95665   net    [if_tun] "ping: sendto: No buffer space available" wit
s kern/105943  net    Network stack may modify read-only mbuf chain copies
o kern/106316  net    [dummynet] dummynet with multipass ipfw drops packets
o kern/108542  net    [bce]: Huge network latencies with 6.2-RELEASE / STABL
o kern/112528  net    [nfs] NFS over TCP under load hangs with "impossible p
o kern/112686  net    [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o kern/112722  net    [udp] IP v4 udp fragmented packet reject
o kern/113842  net    [ipv6] PF_INET6 proto domain state can't be cleared wi
o kern/114714  net    [gre][patch] gre(4) is not MPSAFE and does not support
o kern/114839  net    [fxp] fxp looses ability to speak with traffic
o kern/115239  net    [ipnat] panic with 'kmem_map too small' using ipnat
o kern/116077  net    [ip] [patch] 6.2-STABLE panic during use of multi-cast
f kern/116172  net    [tun] [panic] Network / ipv6 recursive mutex panic
o kern/116185  net    [iwi] if_iwi driver leads system to reboot
o kern/116328  net    [bge]: Solid hang with bge interface
o kern/116747  net    [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile
o kern/116837  net    [tun] [panic] [patch] ifconfig tunX destroy: panic
o kern/117043  net    [em] Intel PWLA8492MT Dual-Port Network adapter EEPROM
o kern/117271  net    [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap
o kern/117423  net    [vlan] Duplicate IP on different interfaces
o kern/117448  net    [carp] 6.2 kernel crash (regression)
o kern/118880  net    [ipv6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented
o kern/119225  net    [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card (regr
o kern/119345  net    [ath] Unsuported Atheros 5424/2424 and CPU speedstep n
o kern/119361  net    [bge] bge(4) transmit performance problem
o kern/119945  net    [rum] [panic] rum device in hostap mode, cause kernel
o kern/120130  net    [carp] [panic] carp causes kernel panics in any conste
o kern/120266  net    [panic] gnugk causes kernel panic when closing UDP soc
o kern/120304  net    [netgraph] [patch] netgraph source assumes 32-bit time
f kern/120725  net    [bce] On board second lan port 'bce1' with Broadcom Ne
f kern/120966  net    [rum]: kernel panic with if_rum and WPA encryption

35 problems total.

Non-critical problems

S Tracker      Resp.  Description

o conf/23063   net    [PATCH] for static ARP tables in rc.network
s bin/41647    net    ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383   net    [nfs] [patch] NFS root configurations without dynamic
s kern/60293   net    FreeBSD arp poison patch
o kern/64556   net    [sis] if_sis short cable fix problems with NetGear FA3
o kern/95267   net    packet drops periodically appear
o kern/95277   net    [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519  net    [netisr] suggestion to fix suboptimal network polling
o kern/102035  net    [plip] plip networking disables parallel port printing
o conf/102502  net    [patch] ifconfig name does't rename netgraph node in n
o conf/107035  net    [patch] bridge interface given in rc.conf not taking a
o kern/109470  net    [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o kern/112179  net    [sis] [patch] sis driver for natsemi DP83815D autonego
o kern/114915  net    [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643   net    [patch] [request] fstat(1): add INET/INET6 socket deta
o bin/117339   net    [patch] route(8): loading routing management commands
o kern/118727  net    [ng] [patch] [request] add new ng_pf module
a kern/118879  net    [bge] [patch] bge has checksum problems on the 5703 ch
o kern/118975  net    [bge] [patch] Broadcom 5906 not handled by FreeBSD
o bin/118987   net    ifconfig(8): ifconfig -l (address_family) does not wor
o kern/119432  net    [arp] route add -host  -iface  causes arp e
o kern/119617  net    [nfs] nfs error on wpa network when reseting/shutdown
o kern/119791  net    [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/120493  net    [wpi] if_wpi.ko fails to load on a Toshiba Satellite P
o kern/120566  net    [request]: ifconfig(8) make order of arguments more fr
o kern/120958  net    no response

Re: ipfw initialization: SI_ORDER_ANY -> SI_ORDER_MIDDLE?

2008-03-03 Thread Luigi Rizzo
On Mon, Mar 03, 2008 at 11:17:19AM +0100, Paolo Pisati wrote:
> On Sun, Mar 02, 2008 at 03:58:50PM +0100, Luigi Rizzo wrote:
> > 
> > The SI_ORDER_* definitions in /sys/sys/kernel.h are enumerated on a
> > large range, so if the existing code does not have races,
> > you can safely move the non-leaf modules
> > (such as ipfw.ko in your case) to (SI_ORDER_ANY - some_small_integer)
> > without breaking anything.
> 
> fine, i did this.
>  
> is it MFCable?

i think so, the SI_ORDER_* definitions are the same at least down to
RELENG_6, which is the lowest release we probably care about.
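
The ordering idea can be illustrated in user space. This is only a sketch: the two SI_ORDER_* values are as recalled from sys/kernel.h and should be checked against your tree, while NONLEAF_OFFSET, the subsystem number and the entry names are made up for the example. Entries are compared first by subsystem and then by order, so a module at (SI_ORDER_ANY - small) is initialized before leaf modules left at SI_ORDER_ANY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Order constants as recalled from sys/kernel.h; verify against your tree. */
#define SI_ORDER_MIDDLE	0x1000000u
#define SI_ORDER_ANY	0xfffffffu

/* Hypothetical offset standing in for "some_small_integer". */
#define NONLEAF_OFFSET	255u

struct init_entry {
	const char	*name;
	unsigned int	 subsystem;	/* illustrative, not a real SI_SUB_* */
	unsigned int	 order;
};

/* Compare by subsystem first, then by order within the subsystem. */
static int
init_cmp(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Sort entries into the sequence in which they would be initialized. */
void
sort_inits(struct init_entry *v, size_t n)
{
	assert(v != NULL);
	qsort(v, n, sizeof(*v), init_cmp);
}
```

Sorting { leaf at SI_ORDER_ANY, ipfw at SI_ORDER_ANY - NONLEAF_OFFSET, something at SI_ORDER_MIDDLE } puts the SI_ORDER_MIDDLE entry first, ipfw second and the leaf last.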

cheers
luigi


Re: FBSD 1GBit router?

2008-03-03 Thread Willem Jan Withagen

Łukasz Bromirski wrote:

Willem Jan Withagen wrote:


I'm looking for a stream exploder.:)
1 2Mbit stream in, and as many as possible out.
And 7*1Gb = 14Gbit, so I'd like to be pushing 7000 streams.
(One advantage is that they will be UDP streams, so there is
a little less bookkeeping in the protocol stack )


Wouldn't it be a case for use of multicast vs unicast? Hardware
is always better anyway, so why not invest in some switch that
can do unicast/multicast in hardware?


Useful suggestion, only this is going to be in an overlay cloud where
we do not have control over all the endpoint networks, let alone that we
can get them to use multicast. And even those that use multicast in their
last-mile equipment don't always have correct setups.

My experience is that multicast is nice in theory and experiment, but when
push comes to shove it does not completely deliver.

--WjW




Re: FBSD 1GBit router?

2008-03-03 Thread Łukasz Bromirski

Willem Jan Withagen wrote:

Łukasz Bromirski wrote:

Willem Jan Withagen wrote:


I'm looking for a stream exploder.:)
1 2Mbit stream in, and as many as possible out.
And 7*1Gb = 14Gbit, so I'd like to be pushing 7000 streams.
(One advantage is that they will be UDP streams, so there is
a little less bookkeeping in the protocol stack )


Wouldn't it be a case for use of multicast vs unicast? Hardware
is always better anyway, so why not invest in some switch that
can do unicast/multicast in hardware?


Useful suggestion, only this is going to be in an overlay cloud where
we do not have control over all the endpoint networks, let alone that we
can get them to use multicast. And even those that use multicast in their
last-mile equipment don't always have correct setups.

My experience is that multicast is nice in theory and experiment, but when
push comes to shove it does not completely deliver.


I don't know the exact requirements and application used, but given IP TV
deployments relying heavily on multicast, and all other "VoD"
technologies also using multicast...I find your comments disturbing :)

However, if you don't control the network over which it will be
transported, you need to replicate each stream...and so either
you'll find the bandwidth to do it (or pay for it) or be forced to switch
to another design.
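
Replicating each stream in software comes down to one send per receiver for every incoming packet. A rough sketch follows, assuming plain IPv4 UDP sockets; fan_out() is a name made up for this example and it does nothing about pacing, batching or error recovery, which a box pushing thousands of streams would certainly need.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Send one received packet to every destination in dests[].
 * Returns the number of destinations the full packet was handed
 * to the stack for; failed sends are skipped, not retried.
 */
size_t
fan_out(int sock, const void *pkt, size_t len,
    const struct sockaddr_in *dests, size_t ndests)
{
	size_t sent = 0;

	assert(pkt != NULL);
	assert(dests != NULL || ndests == 0);
	for (size_t i = 0; i < ndests; i++) {
		ssize_t n = sendto(sock, pkt, len, 0,
		    (const struct sockaddr *)&dests[i], sizeof(dests[i]));
		if (n == (ssize_t)len)
			sent++;
	}
	return (sent);
}
```

The cost is linear in the number of receivers: every incoming packet turns into ndests system calls and ndests copies on the wire, which is exactly the bandwidth that has to be found or paid for.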

--
"Don't expect me to cry for all the |   Łukasz Bromirski
 reasons you had to die" -- Kurt Cobain |http://lukasz.bromirski.net


Re: FBSD 1GBit router?

2008-03-03 Thread Willem Jan Withagen

Łukasz Bromirski wrote:
My experience is that multicast is nice in theory and experiment, but when
push comes to shove it does not completely deliver.


I don't know the exact requirements and application used, but given IP TV
deployments relying heavily on multicast, and all other "VoD"
technologies also using multicast...I find your comments disturbing :)


Where do you think I'm getting my experience from? ;)
Even network owners and techies will admit to this when squeezed.


However, if you don't control the network over which it will be
transported, you need to replicate each stream...and so either
you'll find the bandwidth to do it (or pay for it) or be forced to switch
to another design.


There can only be one...
And IP TV over closed networks is not going to make it.
That is not what the customer really wants.

But this is my last remark on this aspect on this list,
since it takes us very OT.

--WjW


Re: FBSD 1GBit router?

2008-03-03 Thread Bruce M. Simpson

Willem Jan Withagen wrote:

Łukasz Bromirski wrote:

Wouldn't it be a case for use of multicast vs unicast? Hardware
is always better anyway, so why not invest in some switch that
can do unicast/multicast in hardware?


Useful suggestion, only this is going to be in an overlay cloud where
we do not have control over all the endpoint networks, let alone that we
can get them to use multicast. And even those that use multicast in their
last-mile equipment don't always have correct setups.

My experience is that multicast is nice in theory and experiment, but when
push comes to shove it does not completely deliver.


I have to agree wholeheartedly, for more detail than you can shake a 
stick at, look here:

   http://www.cs.ucr.edu/~michalis/COURSES/204-02b/papers/ramalho.html

If you're running over MPLS all bets are off. MPLS is like ATM in the 
sense that it ain't got no multicast grok, as far as I can fathom, 
anyway. Label switching is label switching. I never saw any support for 
the notion of 1:M in the LSPs.


Multicast is more likely to succeed at the moment when you have complete 
knowledge of the network topology, and IP layer visibility. There are 
ongoing efforts to address these limitations.


later
BMS



Re: kern/92090: [bge] bge0: watchdog timeout -- resetting

2008-03-03 Thread gavin
Synopsis: [bge] bge0: watchdog timeout -- resetting

State-Changed-From-To: feedback->open
State-Changed-By: gavin
State-Changed-When: Mon Mar 3 13:32:16 UTC 2008
State-Changed-Why: 
Feedback was received - this is still an issue


Responsible-Changed-From-To: gavin->freebsd-net
Responsible-Changed-By: gavin
Responsible-Changed-When: Mon Mar 3 13:32:16 UTC 2008
Responsible-Changed-Why: 
Over to freebsd-net

http://www.freebsd.org/cgi/query-pr.cgi?pr=92090


Re: Ephemeral ports patch (fixed)

2008-03-03 Thread Fernando Gont

At 04:11 a.m. 03/03/2008, Mike Silbersack wrote:

Here's the same patch, but with the first ephemeral port changed 
from 1024 to 1.


Now that I've actually gone to try to apply the patch (so I can view
the two codepaths side by side, rather than in diff form), I'm
finding that I can't apply it.  I think all the whitespace got
stomped, either by your mail program or my mail program.  Can you
please resend this as an attachment?


Sure. Please let me know if this one is okay.

Kind regards,

--
Fernando Gont
e-mail: [EMAIL PROTECTED] || [EMAIL PROTECTED]
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1



Index: in.h
===
RCS file: /home/ncvs/src/sys/netinet/in.h,v
retrieving revision 1.100
diff -u -r1.100 in.h
--- in.h  12 Jun 2007 16:24:53 -  1.100
+++ in.h  1 Mar 2008 09:00:10 -
@@ -293,8 +293,7 @@
  *
  * The value IP_PORTRANGE_HIGH changes the range of candidate port numbers
  * into the "high" range.  These are reserved for client outbound connections
- * which do not want to be filtered by any firewalls.  Note that by default
- * this is the same as IP_PORTRANGE_DEFAULT.
+ * which do not want to be filtered by any firewalls.
  *
  * The value IP_PORTRANGE_LOW changes the range to the "low" are
  * that is (by convention) restricted to privileged processes.  This
@@ -331,8 +330,13 @@
 #define IPPORT_RESERVED 1024
 
 /*
- * Default local port range, used by both IP_PORTRANGE_DEFAULT
- * and IP_PORTRANGE_HIGH.
+ * Default local port range, used by IP_PORTRANGE_DEFAULT
+ */
+#define IPPORT_EPHEMERALFIRST  1
+#define IPPORT_EPHEMERALLAST   65535
+ 
+/*
+ * Dynamic port range, used by IP_PORTRANGE_HIGH.
  */
 #define IPPORT_HIFIRSTAUTO  49152
 #define IPPORT_HILASTAUTO   65535
Index: in_pcb.c
===
RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.198
diff -u -r1.198 in_pcb.c
--- in_pcb.c  22 Dec 2007 10:06:11 -  1.198
+++ in_pcb.c  1 Mar 2008 09:00:11 -
@@ -89,8 +89,8 @@
  */
 int ipport_lowfirstauto  = IPPORT_RESERVED - 1; /* 1023 */
 int ipport_lowlastauto = IPPORT_RESERVEDSTART;  /* 600 */
-int ipport_firstauto = IPPORT_HIFIRSTAUTO;  /* 49152 */
-int ipport_lastauto  = IPPORT_HILASTAUTO;   /* 65535 */
+int ipport_firstauto = IPPORT_EPHEMERALFIRST;   /* 1 */
+int ipport_lastauto  = IPPORT_EPHEMERALLAST;    /* 65535 */
 int ipport_hifirstauto = IPPORT_HIFIRSTAUTO;    /* 49152 */
 int ipport_hilastauto  = IPPORT_HILASTAUTO; /* 65535 */
 
@@ -393,7 +393,7 @@
 	if (*lportp != 0)
 		lport = *lportp;
 	if (lport == 0) {
-		u_short first, last;
+		u_short first, last, aux;
 		int count;
 
 		if (laddr.s_addr != INADDR_ANY)
@@ -440,47 +440,28 @@
 		/*
 		 * Simple check to ensure all ports are not used up causing
 		 * a deadlock here.
-		 *
-		 * We split the two cases (up and down) so that the direction
-		 * is not being tested on each round of the loop.
 		 */
 		if (first > last) {
-			/*
-			 * counting down
-			 */
-			if (dorandom)
-				*lastport = first -
-				    (arc4random() % (first - last));
-			count = first - last;
+			aux = first;
+			first = last;
+			last = aux;
+		}
 
-			do {
-				if (count-- < 0)	/* completely used? */
-					return (EADDRNOTAVAIL);
-				--*lastport;
-				if (*lastport > first || *lastport < last)
-					*lastport = first;
-				lport = htons(*lastport);
-			} while (in_pcblookup_local(pcbinfo, laddr, lport,
-			    wild));
-		} else {
-			/*
-			 * counting up
-			 */
-			if (dorandom)
-				*lastport = first +
-				    (arc4random() % (last - first));
-			count = last - first;
+		if (dorandom)
+			*lastport = first +
+			    (arc4random() % (last - first));
 
-			do {
-				if (count-- < 0)	/* completely used? */
-					return (EADDRNOTAVAIL);
-				++*lastport;
-   
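
The visible part of the diff normalizes the range so that first <= last, picks a random starting point, and then searches in one direction only. A stand-alone sketch of that approach is below. It is not the code from the patch: find_free_port(), the in_use callback (standing in for in_pcblookup_local()) and the ports_below_in_use() predicate are hypothetical, the random value is passed in by the caller, and the range is taken as last - first + 1 so that a single-port range does not lead to a modulo by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's "is this port taken?" lookup. */
typedef bool (*in_use_fn)(uint16_t port, void *ctx);

/* Example predicate: every port below *(uint16_t *)ctx counts as taken. */
bool
ports_below_in_use(uint16_t port, void *ctx)
{
	return (port < *(const uint16_t *)ctx);
}

/*
 * Return a free port within [first, last] (given in either order),
 * starting the search at a random offset and wrapping around once.
 * Returns 0 when every port in the range is in use.
 */
uint16_t
find_free_port(uint16_t first, uint16_t last, uint32_t rnd,
    in_use_fn in_use, void *ctx)
{
	assert(in_use != NULL);

	if (first > last) {		/* normalize, as the patch does */
		uint16_t aux = first;
		first = last;
		last = aux;
	}

	uint32_t range = (uint32_t)last - first + 1;
	uint32_t start = rnd % range;

	for (uint32_t i = 0; i < range; i++) {
		uint16_t port = (uint16_t)(first + (start + i) % range);
		if (!in_use(port, ctx))
			return (port);
	}
	return (0);			/* range completely used */
}
```

With ports below 1003 taken, find_free_port(1000, 1005, 0, ...) walks 1000, 1001, 1002 and settles on 1003; passing the range as (1005, 1000) gives the same search space.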

Re: FBSD 1GBit router?

2008-03-03 Thread Tom Evans
On Mon, 2008-03-03 at 13:54 +0100, Łukasz Bromirski wrote:
> I don't know the exact requirements and application used, but given IP TV
> deployments relying heavily on multicast, and all other "VoD"
> technologies also using multicast...I find your comments disturbing :)
> 
> However, if you don't control the network over which it will be
> transported, you need to replicate each stream...and so either
> you'll find the bandwidth to do it (or pay for it) or be forced to switch
> to another design.
> 

The 4 most used 'IPTV' (in that they deliver TV, over IP) in the UK (BBC
iPlayer, 4OD, Sky by Broadband, and itv.com) do not use multicast at
all. Some multicast streams are available from the BBC, but they are
notorious for not working with various different providers.

Clearly, whilst multicast may be the future, for now it is not.

Tom




drlb: a direct routing loadbalancer

2008-03-03 Thread Jean-Yves Moulin

Hi everybody,

I have made a simple software load balancer for FreeBSD. It works only in
direct-routing mode.


This is a kernel module that works on FreeBSD 6.2, 6.3 and 7.0 (tested),
with NetBSD support coming soon. It uses pfil to watch incoming packets
and redirect them to a real server. You can define multiple virtual IPs
for the same pool of real servers. You can use four schedulers:
round-robin, least-connections, weighted round-robin and weighted
least-connections. You can specify timeouts (before closing or dropping
connections), persistence (for TLS) and the size of the connection hash
table.
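
To illustrate the weighted least-connections scheduler mentioned above, here is a minimal sketch. It is not taken from the drlb sources: struct real_server and pick_weighted_least_conn() are invented for this example. The server with the lowest ratio of active connections to weight wins, and the ratios are compared by cross-multiplication to avoid floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-server state; drlb's real structures may differ. */
struct real_server {
	unsigned int	weight;		/* 0 means "never schedule" */
	unsigned int	active_conns;
	bool		enabled;	/* false once the server is inhibited */
};

/*
 * Return the index of the enabled server with the smallest
 * active_conns / weight ratio, or -1 if none is eligible.
 */
int
pick_weighted_least_conn(const struct real_server *rs, size_t n)
{
	int best = -1;

	assert(rs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!rs[i].enabled || rs[i].weight == 0)
			continue;
		if (best < 0) {
			best = (int)i;
			continue;
		}
		/* conns_i / w_i < conns_best / w_best, cross-multiplied */
		unsigned long long lhs =
		    (unsigned long long)rs[i].active_conns * rs[best].weight;
		unsigned long long rhs =
		    (unsigned long long)rs[best].active_conns * rs[i].weight;
		if (lhs < rhs)
			best = (int)i;
	}
	return (best);
}
```

A server with weight 3 and 20 connections (ratio about 6.7) is preferred over one with weight 1 and 10 connections (ratio 10). An inhibited server is simply skipped for new connections, which leaves its existing ones untouched.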


When you inhibit a real server, only new connections are no longer
redirected to it. The load balancer keeps open connections alive (that's
why I made it; IPVS doesn't do this job right).


It comes with two tools: drlbctl, for configuring it, and lbdyn, which
checks your real servers (a la keepalived).


It's a very simple tool and I need some time to add useful features
(like sharing of the connection table), but I use it in a production
environment. IPv6 will be tested soon.


You can find it here: http://jym.free.fr (under construction :-) and 
source here:  http://jym.free.fr/files/drlb-0.7.tar.gz


I will make a port soon.

Comments are welcome.

Greetings from Paris!
jym


Re: Ephemeral port range (patch)

2008-03-03 Thread Fernando Gont

At 04:43 a.m. 03/03/2008, Mike Silbersack wrote:

Earlier in the week, I had commented (via private e-mail?) that I 
thought that Amit Klein's algorithm which I recently implemented in 
ip_id.c might be adapted to serve as an ephemeral port 
allocator.  Now that I've thought more about it, I'm not as certain 
that it would fit well.  I'll try to sketch out my ideas and see if 
I can figure out how it could fit.


(Shame on me... somehow your mail got stuck in my queue, and I didn't
respond to it).


While I haven't looked much at the scheme proposed by Amit, I think
there's a "flaw" with the algorithm: IP IDs need to be unique for
{source IP, dest IP, Protocol}. And the algorithm still keeps a
*global* IP ID. That means you'll cycle through the whole IP ID space
when you probably didn't need to.


Here, too, a double-hash based scheme (a la RFC1948) will do. It
would basically separate the IP ID space for every {source IP, dest
IP, Protocol} tuple, and thus you'll cycle through the IP ID space
only as fast as needed.


What's interesting is that when it comes to port randomization, IP ID 
randomization, and even timestamp randomization, the double-hash 
scheme seems to be the right solution.
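
As applied to ports, the double-hash idea can be sketched as follows. One keyed hash F() of the connection identifiers gives a per-destination offset, a second keyed hash G() selects a counter from a small table, and the port is first + (offset + counter) mod range. This is only an illustration: FNV-1a is used here as a stand-in for the cryptographic hash (e.g. MD5) a real implementation would need and is NOT secure, and TABLE_SIZE, struct flow and pick_port() are names made up for the example. Checking whether the chosen port is already in use is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16			/* hypothetical counter table size */

static uint16_t counter_table[TABLE_SIZE];

struct flow {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t dst_port;
};

/* FNV-1a step: placeholder for a real keyed hash, not secure. */
static uint32_t
fnv1a(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return (h);
}

/* Hash the secret together with the flow identifiers. */
static uint32_t
hash_flow(const struct flow *f, uint32_t secret)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &secret, sizeof(secret));
	h = fnv1a(h, &f->src_ip, sizeof(f->src_ip));
	h = fnv1a(h, &f->dst_ip, sizeof(f->dst_ip));
	h = fnv1a(h, &f->dst_port, sizeof(f->dst_port));
	return (h);
}

/* Pick the next candidate port in [first, last] for this flow. */
uint16_t
pick_port(const struct flow *f, uint32_t secret1, uint32_t secret2,
    uint16_t first, uint16_t last)
{
	assert(f != NULL && first <= last);

	uint32_t range = (uint32_t)last - first + 1;
	uint32_t offset = hash_flow(f, secret1);		/* F() */
	uint32_t index = hash_flow(f, secret2) % TABLE_SIZE;	/* G() */
	uint32_t port = first + (offset + counter_table[index]) % range;

	counter_table[index]++;		/* only this bucket advances */
	return ((uint16_t)port);
}
```

Two consecutive calls for the same flow return adjacent ports (modulo the range), while flows that hash to other buckets advance independently, so each destination's port space is consumed only as fast as that destination is actually used.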


That said, at least theoretically speaking, one could argue that
there shouldn't be a problem with simply randomizing the IP ID
number. For connection-oriented protocols, you should be doing PMTUD,
and thus will not care about the IP ID. If your packets are doing
fragmentation, then on links with large bandwidth-delay products
you're already in trouble. For connection-less transport protocols
(e.g., UDP), while they usually do not implement PMTUD, they also do
not implement flow-control or congestion control. So you are either
sending data to a local system (e.g., in a LAN), or you probably
shouldn't be sending data that fast (and then you shouldn't have
problems with trivially randomizing the IP ID).




The double-hash concept sounds pretty good, but there's a major 
problem with it.  If an application does a bind() to get a local 
port before doing a connect(), you don't know the remote IP or the remote port.


Yes, this is described in Section 3.5 of our I-D
(http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-port-randomization-01.txt).
Our take is that in that scenario you could simply randomize the
local port (i.e., implement the double-hash scheme, and fall back to
trivial randomization when you face this scenario).




There's a related "feature" in the BSD TCP stack that all local 
ports are considered equal; even for applications that do a 
connect() call and specify a remote IP/port, we do not let them use 
the same local port to two different remote IPs at the same 
time.  This puts a limit on the total number of outgoing connections 
that one machine can have.


mmm... I see. So this could limit the number of outgoing connections 
to about (ephemeral_ports/TIME_WAIT). Any objections against changing 
this? At least for outgoing connections (i.e., non-listening 
sockets), this shouldn't be the case. I'd be interested in working on 
this issue...
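
To put a rough number on (ephemeral_ports/TIME_WAIT): if every local port can be used by only one outgoing connection at a time and then sits in TIME_WAIT, the sustainable rate of new connections is about the size of the port range divided by the TIME_WAIT duration. The sketch below just does that division; max_new_conn_rate() is a name invented here, and the 60-second figure used in the note afterwards (2 * MSL with a 30-second MSL) is an assumption to check against your system.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on new outgoing connections per second when each
 * local port in [first, last] is unavailable for time_wait_secs
 * after being used once.
 */
double
max_new_conn_rate(uint16_t first, uint16_t last, double time_wait_secs)
{
	assert(first <= last);
	assert(time_wait_secs > 0.0);
	return (((double)last - (double)first + 1.0) / time_wait_secs);
}
```

For the 49152-65535 range (16384 ports) and 60 seconds this gives roughly 273 connections per second in total, no matter how many different remote hosts are involved.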


Kind regards,

--
Fernando Gont
e-mail: [EMAIL PROTECTED] || [EMAIL PROTECTED]
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1






Re: 7.0-RC1 onboard em1 intel pro1000 vanishing occasionally

2008-03-03 Thread Oznon

Hi all,

I am currently experiencing the same issue, except this time around
with a Supermicro X7DBU motherboard (Intel® 5000P (Blackford) chipset). The
identical occurrence is happening to me, so we were wondering if
anyone had a resolution? Possibly an updated driver for the Intel NIC?
Anywho, here is a snippet of the particular error we receive before the
interface disappears:

em0:  port
0x2000-0x201f mem 0xd892-0xd893,0xd890-0xd891 irq 18 at
device 0.0 on pci5
em0: Using MSI interrupt
em0: Ethernet address: 00:30:48:7f:be:38
em0: [FILTER]
em1:  port
0x2020-0x203f mem 0xd896-0xd897,0xd894-0xd895 irq 19 at
device 0.1 on pci5
em1: Using MSI interrupt
em1: Setup of Shared code failed

Thanks all,

Guy



Andrew Snow-2 wrote:
> 
> 
> Hi,
> 
> I have a recent Supermicro board  (Super X7DWT
> Intel 5400 chipset) with two onboard NICs - Intel (ESB2/Gilgal) 82563EB 
> Dual-Port Gigabit Ethernet Controller
> 
> 
> 
> Usually boot up looks like this:
> 
> em0:  port 
> 0x3000-0x301f mem 0xda02-0xda03,0xda00-0xda01 irq 44 at 
> device 0.0 on pci5
> em0: Using MSI interrupt
> em0: Ethernet address: 00:30:48:7e:20:e0
> em0: [FILTER]
> em1:  port 
> 0x3020-0x303f mem 0xda06-0xda07,0xda04-0xda05 irq 40 at 
> device 0.1 on pci5
> em1: Using MSI interrupt
> em1: Ethernet address: 00:30:48:7e:20:e1
> em1: [FILTER]
> 
> Sometimes when I reboot this happens:
> 
> em0:  port 
> 0x3000-0x301f mem 0xda02-0xda03,0xda00-0xda01 irq 44 at 
> device 0.0 on pci5
> em0: Using MSI interrupt
> em0: Ethernet address: 00:30:48:7e:20:e0
> em0: [FILTER]
> em1:  port 
> 0x3020-0x303f mem 0xda06-0xda07,0xda04-0xda05 irq 40 at 
> device 0.1 on pci5
> em1: Using MSI interrupt
> em1: Setup of Shared code failed
> device_attach: em1 attach returned 6
> 
> And em1 does not exist after that.  Power cycling the machine seems to 
> fix it for the next boot.
> 
> 
> Any suggestions?
> 
> 
> Regards,
> 
> - Andrew
> 

-- 
View this message in context: 
http://www.nabble.com/7.0-RC1-onboard-em1-intel-pro1000-vanishing-occasionally-tp14979560p15811094.html
Sent from the freebsd-net mailing list archive at Nabble.com.



Re: 7.0-RC1 onboard em1 intel pro1000 vanishing occasionally

2008-03-03 Thread Oznon

I am experiencing the exact same issue as well except this time around I do
not have an IPMI card and I am using the X7DBU motherboard by Supermicro.

Any luck?

Thanks,

Guy


Vladimir Ivanov wrote:
> 
> Andrew Snow wrote:
>> Vladimir Ivanov wrote:
>>> We've same issue w/Supermicro boards if IPMI daughterboard installed. 
>>> A problem looks as PHY reg reads/writes fails.
>>
>> Ahh, that explains it, thanks.
>>
>> The management cards seem to cause multiple problems with the FreeBSD em
>> driver over time.  I don't want to give up the IPMI cards so I'll just
>> keep using system reset to get em1 working, it usually only takes 1 or 
>> 2 resets.
>>
>> My IPMI card has an option for external/dedicated LAN port so I might
>> try that also.
> We keep trying same way.  It is the only way to disable virtual lan 
> channel I seem :-|
> But Jack gave me another hope.
> 
> WBR
> 
> -- 
> Vladimir Ivanov
> Network Operations Center
> OOO "Yandex"
> t: +7 495 739-7000
> f: +7 495 739-7070
> @: [EMAIL PROTECTED] (corporate)
>   [EMAIL PROTECTED] (personal)
> www: www.yandex.ru
> -- 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/7.0-RC1-onboard-em1-intel-pro1000-vanishing-occasionally-tp14979560p15811130.html
Sent from the freebsd-net mailing list archive at Nabble.com.



Re: LOR icmp6_input/nd6_lookup

2008-03-03 Thread gnn
At Fri, 29 Feb 2008 13:44:27 -0600,
Kevin Day wrote:
> 
> This is from 7.0-RELEASE:
> 
> lock order reversal:
>   1st 0xc3bde2b8 rtentry (rtentry) @ netinet6/nd6.c:1930
>   2nd 0xc3af367c radix node head (radix node head) @ net/route.c:147
> KDB: stack backtrace:
> db_trace_self_wrapper
> (c08af130,e11b8600,c0662bbe,c08b1592,c3af367c,...) at  
> db_trace_self_wrapper+0x26
> kdb_backtrace(c08b1592,c3af367c,c08b15f3,c08b15f3,c08b9ce7,...) at  
> kdb_backtrace+0x29
> witness_checkorder(c3af367c,9,c08b9cde,93,e11b8624,...) at  
> witness_checkorder+0x6de
> _mtx_lock_flags(c3af367c,0,c08b9cde,93,c066160b,...) at _mtx_lock_flags 
> +0xbc
> rtalloc1(e11b86e0,0,0,0,c3c9d01c,...) at rtalloc1+0x63
> nd6_lookup(c3c9d024,0,c39fd800,c3bde258,c3bde258,...) at nd6_lookup+0x55
> nd6_is_addr_neighbor(c3c9d01c,c39fd800,c08c1d75,78a,c09a5ed8,...) at  
> nd6_is_addr_neighbor+0x3b
> nd6_output(c39fd800,c39fd800,c3cf9b00,c3c9d01c,c3bde258,...) at  
> nd6_output+0x10f
> ip6_output(c3cf9b00,0,e11b88e0,0,0,...) at ip6_output+0x1081
> icmp6_reflect(c3cf9b00,28,8,1,c08c96d0,...) at icmp6_reflect+0x42f
> icmp6_input(e11b8c88,e11b8c70,3a,1d5,0,...) at icmp6_input+0x6dc
> ip6_input(c3be2900,0,c08b9887,8c,c09a1e24,...) at ip6_input+0xe36
> netisr_processqueue(c0955e30,0,c08b9887,f6,c3865a40,...) at  
> netisr_processqueue+0x8b
> swi_net(0,0,c08a938d,471,c3870364,...) at swi_net+0x9b
> ithread_loop(c383ac90,e11b8d38,c08a9115,305,c3873000,...) at  
> ithread_loop+0x1b5
> fork_exit(c060fbe0,c383ac90,e11b8d38) at fork_exit+0xb8
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0, eip = 0, esp = 0xe11b8d70, ebp = 0 ---
> 
> Are LOR's still PR-worthy?

Yes, can you file one?

Best,
George


Re: 7.0-RC1 onboard em1 intel pro1000 vanishing occasionally

2008-03-03 Thread Jack Vogel
The fix to this problem is in the new shared code that got checked into
CURRENT on Friday. I will be MFCing the changes eventually but
if you want to test now you'll need to go with CURRENT.

On Mon, Mar 3, 2008 at 11:00 AM, Oznon <[EMAIL PROTECTED]> wrote:
>
>  I am experiencing the exact same issue as well except this time around I do
>  not have an IPMI card and I am using the X7DBU motherboard by Supermicro.
>
>  Any luck?
>
>  Thanks,
>
>  Guy
>
>
>
>
>  Vladimir Ivanov wrote:
>  >
>  > Andrew Snow wrote:
>  >> Vladimir Ivanov wrote:
>  >>> We've same issue w/Supermicro boards if IPMI daughterboard installed.
>  >>> A problem looks as PHY reg reads/writes fails.
>  >>
>  >> Ahh, that explains it, thanks.
>  >>
>  >> The management cards seem to cause multiple problems with the FreeBSD em
>  >> driver over time.  I don't want to give up the IPMI cards so I'll just
>  >> keep using system reset to get em1 working, it usually only takes 1 or
>  >> 2 resets.
>  >>
>  >> My IPMI card has an option for external/dedicated LAN port so I might
>  >> try that also.
>  > We keep trying same way.  It is the only way to disable virtual lan
>  > channel I seem :-|
>  > But Jack gave me another hope.
>  >
>  > WBR
>  >
>  > --
>  > Vladimir Ivanov
>  > Network Operations Center
>  > OOO "Yandex"
>  > t: +7 495 739-7000
>  > f: +7 495 739-7070
>  > @: [EMAIL PROTECTED] (corporate)
>  >   [EMAIL PROTECTED] (personal)
>  > www: www.yandex.ru
>  > --
>  >
>  >
>
>  --
>  View this message in context: 
> http://www.nabble.com/7.0-RC1-onboard-em1-intel-pro1000-vanishing-occasionally-tp14979560p15811130.html
>
> Sent from the freebsd-net mailing list archive at Nabble.com.
>
>


Re: 7.0-RC1 onboard em1 intel pro1000 vanishing occasionally

2008-03-03 Thread Oznon

Thanks for getting back to me so quickly, Jack. Your response is very much
appreciated; it appears we will be testing with CURRENT for the time
being.

Cheers,

Guy


Jack Vogel wrote:
> 
> The fix to this problem is in the new shared code that got checked into
> CURRENT on Friday. I will be MFCing the changes eventually but
> if you want to test now you'll need to go with CURRENT.
> 
> On Mon, Mar 3, 2008 at 11:00 AM, Oznon <[EMAIL PROTECTED]> wrote:
>>
>>  I am experiencing the exact same issue as well except this time around I
>> do
>>  not have an IPMI card and I am using the X7DBU motherboard by
>> Supermicro.
>>
>>  Any luck?
>>
>>  Thanks,
>>
>>  Guy
>>
>>
>>
>>
>>  Vladimir Ivanov wrote:
>>  >
>>  > Andrew Snow wrote:
>>  >> Vladimir Ivanov wrote:
>>  >>> We've same issue w/Supermicro boards if IPMI daughterboard
>> installed.
>>  >>> A problem looks as PHY reg reads/writes fails.
>>  >>
>>  >> Ahh, that explains it, thanks.
>>  >>
>>  >> The management cards seem to cause multiple problems with the FreeBSD
>> em
>>  >> driver over time.  I don't want to give up the IPMI cards so I'll
>> just
>>  >> keep using system reset to get em1 working, it usually only takes 1
>> or
>>  >> 2 resets.
>>  >>
>>  >> My IPMI card has an option for external/dedicated LAN port so I might
>>  >> try that also.
>>  > We keep trying same way.  It is the only way to disable virtual lan
>>  > channel I seem :-|
>>  > But Jack gave me another hope.
>>  >
>>  > WBR
>>  >
>>  > --
>>  > Vladimir Ivanov
>>  > Network Operations Center
>>  > OOO "Yandex"
>>  > t: +7 495 739-7000
>>  > f: +7 495 739-7070
>>  > @: [EMAIL PROTECTED] (corporate)
>>  >   [EMAIL PROTECTED] (personal)
>>  > www: www.yandex.ru
>>  > --
>>  >
>>  > ___
>>  > freebsd-net@freebsd.org mailing list
>>  > http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>  > To unsubscribe, send any mail to "[EMAIL PROTECTED]"
>>  >
>>
>>  --
>>  View this message in context:
>> http://www.nabble.com/7.0-RC1-onboard-em1-intel-pro1000-vanishing-occasionally-tp14979560p15811130.html
>>
>> Sent from the freebsd-net mailing list archive at Nabble.com.
>>
>>  ___
>>
>>
>> freebsd-net@freebsd.org mailing list
>>  http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>  To unsubscribe, send any mail to "[EMAIL PROTECTED]"
>>
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "[EMAIL PROTECTED]"
> 

-- 
View this message in context: 
http://www.nabble.com/7.0-RC1-onboard-em1-intel-pro1000-vanishing-occasionally-tp14979560p15816039.html
Sent from the freebsd-net mailing list archive at Nabble.com.



Re: Ephemeral port range (patch)

2008-03-03 Thread Mike Silbersack


On Mon, 3 Mar 2008, Fernando Gont wrote:

(Shame on me... somehow you mail got stuck in my queue, and I didn't respond 
to it).


No sweat, I've taken far longer to reply to your e-mails!

While I haven't looked much at the scheme proposed by Amit, I think there's a 
"flaw" with the algorithm: IP IDs need to be unique for {source IP, dest IP, 
Protocol}. And the algorithm still keeps a *global* IP ID. That means you'll 
cycle through the whole IP ID space when you probably didn't need to.


That is true.  I think we have a time/space tradeoff here, with Amit's 
algorithm taking more memory and less time than a hash-based algorithm. 
But I haven't benchmarked one against the other, so it is possible that a 
double-hash might win in both categories.


I think Robert Watson said something about investigating the issue of IP 
IDs more in the near future.  What I'd like to see (if possible) is that 
we use Amit's algorithm until we've established a connection with a host, 
then switch to per-IP state and just use linear IP IDs.  That would seem 
to provide the least overhead for high speed connections.


That said, at least theoretically speaking, one could argue that there 
shouldn't be a problem with simply randomizing the IP ID number. For 
connection-oriented protocols, you should be doing PMTUD, and thus will not 
care about the IP ID. If your packets are doing fragmentation, then on links 
with large bandwidth-delay products you're already in trouble. For 
connection-less transport protocols (e.g., UDP), while they usually do not 
implement PMTUD, they also do not implement flow-control or congestion 
control. So you are either sending data to a local system (e.g., in a LAN), 
or you probably shouldn't be sending data that fast (and then you shouldn't 
have problems with trivially randomizing the IP ID).


I have attempted to make that argument before, and it did not go over well 
with most people.  :)


I think the counter-argument was primarily centered around UDP NFS, which, 
as you pointed out, is almost always a losing case.


The double-hash concept sounds pretty good, but there's a major problem 
with it.  If an application does a bind() to get a local port before doing 
a connect(), you don't know the remote IP or the remote port.


Yes, this is described in Section 3.5 of our id 
(http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-port-randomization-01.txt). 
Our take is that in that scenario you could simply randomize the local port. 
(i.e., implement the double-hash scheme, and fall-back to trivial 
randomization when you face this scenario).
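
A minimal sketch of the scheme discussed above, as I understand the double-hash idea: the port is derived from a keyed hash of the connection endpoints plus a per-bucket counter, and a plain randomized pick is used when bind() is called before the remote endpoint is known. This is an illustration under stated assumptions, not code from the draft or from the FreeBSD sources. The names dh_select_port, random_select_port, toy_hash, TABLE_SIZE, and the in_use callbacks are hypothetical, toy_hash is a non-cryptographic stand-in for a real keyed hash, and the port range simply reuses the IPPORT_HIFIRSTAUTO/IPPORT_HILASTAUTO values.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_EPHEMERAL 49152u	/* same value as IPPORT_HIFIRSTAUTO */
#define MAX_EPHEMERAL 65535u	/* same value as IPPORT_HILASTAUTO */
#define NUM_EPHEMERAL (MAX_EPHEMERAL - MIN_EPHEMERAL + 1u)
#define TABLE_SIZE 16u		/* hypothetical number of counter buckets */

/* Hypothetical callback: returns nonzero if the port is already taken. */
typedef int (*port_in_use_fn)(uint16_t port);

/* Stand-ins for the real PCB lookup, used only for illustration. */
static int none_in_use(uint16_t port) { (void)port; return (0); }
static int all_in_use(uint16_t port) { (void)port; return (1); }

static uint16_t dh_table[TABLE_SIZE];	/* per-bucket increment counters */

/* Toy keyed mixer; a real implementation would use a cryptographic hash. */
static uint32_t
toy_hash(uint32_t laddr, uint32_t faddr, uint16_t fport, uint32_t key)
{
	uint32_t h = key ^ 0x9e3779b9u;

	h = (h ^ laddr) * 0x01000193u;
	h = (h ^ faddr) * 0x01000193u;
	h = (h ^ fport) * 0x01000193u;
	return (h ^ (h >> 16));
}

/* connect() case: remote endpoint known. Returns 0 if no port is free. */
static uint16_t
dh_select_port(uint32_t laddr, uint32_t faddr, uint16_t fport,
    uint32_t key1, uint32_t key2, port_in_use_fn in_use)
{
	uint32_t offset, index, count;

	assert(in_use != 0);
	offset = toy_hash(laddr, faddr, fport, key1) % NUM_EPHEMERAL;
	index = toy_hash(laddr, faddr, fport, key2) % TABLE_SIZE;

	for (count = 0; count < NUM_EPHEMERAL; count++) {
		uint16_t port = (uint16_t)(MIN_EPHEMERAL +
		    (offset + dh_table[index]) % NUM_EPHEMERAL);

		dh_table[index]++;	/* next caller in this bucket moves on */
		if (!in_use(port))
			return (port);
	}
	return (0);
}

/* bind()-before-connect() case: no remote endpoint, so just randomize. */
static uint16_t
random_select_port(uint32_t rnd, port_in_use_fn in_use)
{
	uint32_t count;

	assert(in_use != 0);
	for (count = 0; count < NUM_EPHEMERAL; count++) {
		uint16_t port = (uint16_t)(MIN_EPHEMERAL +
		    (rnd % NUM_EPHEMERAL + count) % NUM_EPHEMERAL);

		if (!in_use(port))
			return (port);
	}
	return (0);
}
```

The caller would use dh_select_port() on the connect() path and random_select_port() on the bind() path, passing a value from the system random source as rnd.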


Doh, I will try to read the ENTIRE paper next time before commenting.

There's a related "feature" in the BSD TCP stack that all local ports are 
considered equal; even for applications that do a connect() call and 
specify a remote IP/port, we do not let them use the same local port to two 
different remote IPs at the same time.  This puts a limit on the total 
number of outgoing connections that one machine can have.


mmm... I see. So this could limit the number of outgoing connections to about 
(ephemeral_ports/TIME_WAIT). Any objections against changing this? At least 
for outgoing connections (i.e., non-listening sockets), this shouldn't be the 
case. I'd be interested in working on this issue...


I don't think anyone is actively working on that problem, so you won't be 
stepping on anyone's toes by looking into it.  Bring on the patches!


There's a piece of low hanging fruit also in that area - we add incoming 
connections to the local port hash table, even though it seems unlikely 
that you are going to receive a connection from 1.1.1.1:5->1.1.1.2:80 
and then connect from 1.1.1.2:80->1.1.1.1:5.  Those unnecessary 
additions to the local port hash table would be nice to remove if you're 
investigating the related issues.


One thing you may or may not have noticed is that FreeBSD keeps TIME_WAIT 
sockets in a separate zone which has a limited size, so you will not have to 
worry too much about them clogging up all ephemeral ports.


-Mike


IPv6 addresses not released when routes change

2008-03-03 Thread Kevin Oberman
At a recent networking conference an IPv6 hour took place where IPv6
only was available. It was an interesting experience. On the whole,
things worked well, but I hit one problem.

When I brought up my system, I associated with the main conference SSID
and received an IPv6 address prior to the IPv6 hour. Everything was
working fine. At the appointed time, all of the SSIDs for IPv4/IPv6 were
disabled (no, I was not expecting that) and I re-associated with the
IPv6-only SSID. The interface added the new address in a different /64,
but did not remove the old address, even though there was no longer a
router for it.

Worse, it continued to originate connections with the old address as the
source. I had to manually delete the old address before I could open a
connection. 

I've been thinking about what software should handle this. It needs to
be some software that is aware that the router that assigned the
prefix is no longer available. There is nothing wrong with an IPv6
interface having several active addresses, so you can't delete one
just because another is assigned.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751




Re: Ephemeral ports patch (fixed)

2008-03-03 Thread Mike Silbersack



On Mon, 3 Mar 2008, Fernando Gont wrote:


At 04:11 a.m. 03/03/2008, Mike Silbersack wrote:

Here's the same patch, but with the first ephemeral port changed from 1024 
to 1.


Now that I've actually gone to try to apply the patch (so I can view the 
two codepaths side by side, rather than in diff form), I'm finding that I 
can't apply it.  I think all the whitespace got stomped, either by your 
mail program or my mail program.  Can you please resend this as an 
attachment?


Sure. Please let me know if this one is okay.

Kind regards,

--
Fernando Gont
e-mail: [EMAIL PROTECTED] || [EMAIL PROTECTED]
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1


Too optimistic:

! #define IPPORT_EPHEMERALLAST  655535

Otherwise the patch looks good to me.  It looked a bit strange in unified 
diff format, I needed to look at it in context format.  (Strange, since I 
usually prefer unified.)


Rui, were you going to get this committed?

-Mike


Re: Ephemeral ports patch (fixed)

2008-03-03 Thread Fernando Gont

At 03:23 a.m. 04/03/2008, Mike Silbersack wrote:


Too optimistic:

! #define IPPORT_EPHEMERALLAST  655535

Otherwise the patch looks good to me.  It looked a bit strange in 
unified diff format, I needed to look at it in context 
format.  (Strange, since I usually prefer unified.)


Doh! I had fixed this in the patch itself, but then undid that change 
when I changed the first ephemeral port from 1024 to 1.


This one should be fine. :-)

Kind regards,

--
Fernando Gont
e-mail: [EMAIL PROTECTED] || [EMAIL PROTECTED]
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1



Index: in.h
===
RCS file: /home/ncvs/src/sys/netinet/in.h,v
retrieving revision 1.100
diff -u -r1.100 in.h
--- in.h12 Jun 2007 16:24:53 -  1.100
+++ in.h1 Mar 2008 09:00:10 -
@@ -293,8 +293,7 @@
  *
  * The value IP_PORTRANGE_HIGH changes the range of candidate port numbers
  * into the "high" range.  These are reserved for client outbound connections
- * which do not want to be filtered by any firewalls.  Note that by default
- * this is the same as IP_PORTRANGE_DEFAULT.
+ * which do not want to be filtered by any firewalls.
  *
  * The value IP_PORTRANGE_LOW changes the range to the "low" are
  * that is (by convention) restricted to privileged processes.  This
@@ -331,8 +330,13 @@
 #define IPPORT_RESERVED 1024
 
 /*
- * Default local port range, used by both IP_PORTRANGE_DEFAULT
- * and IP_PORTRANGE_HIGH.
+ * Default local port range, used by IP_PORTRANGE_DEFAULT
+ */
+#define IPPORT_EPHEMERALFIRST  1
+#define IPPORT_EPHEMERALLAST   65535 
+ 
+/*
+ * Dynamic port range, used by IP_PORTRANGE_HIGH.
  */
 #define IPPORT_HIFIRSTAUTO  49152
 #define IPPORT_HILASTAUTO   65535
Index: in_pcb.c
===
RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.198
diff -u -r1.198 in_pcb.c
--- in_pcb.c22 Dec 2007 10:06:11 -  1.198
+++ in_pcb.c1 Mar 2008 09:00:11 -
@@ -89,8 +89,8 @@
  */
 int ipport_lowfirstauto  = IPPORT_RESERVED - 1; /* 1023 */
 int ipport_lowlastauto = IPPORT_RESERVEDSTART;  /* 600 */
-int ipport_firstauto = IPPORT_HIFIRSTAUTO;  /* 49152 */
-int ipport_lastauto  = IPPORT_HILASTAUTO;   /* 65535 */
+int ipport_firstauto = IPPORT_EPHEMERALFIRST;   /* 1 */
+int ipport_lastauto  = IPPORT_EPHEMERALLAST; /* 65535 */
 int ipport_hifirstauto = IPPORT_HIFIRSTAUTO; /* 49152 */
 int ipport_hilastauto  = IPPORT_HILASTAUTO; /* 65535 */
 
@@ -393,7 +393,7 @@
if (*lportp != 0)
lport = *lportp;
if (lport == 0) {
-   u_short first, last;
+   u_short first, last, aux;
int count;
 
if (laddr.s_addr != INADDR_ANY)
@@ -440,47 +440,28 @@
/*
 * Simple check to ensure all ports are not used up causing
 * a deadlock here.
-*
-* We split the two cases (up and down) so that the direction
-* is not being tested on each round of the loop.
 */
if (first > last) {
-   /*
-* counting down
-*/
-   if (dorandom)
-   *lastport = first -
-   (arc4random() % (first - last));
-   count = first - last;
+   aux = first;
+   first = last;
+   last = aux;
+   }
 
-   do {
-   if (count-- < 0)/* completely used? */
-   return (EADDRNOTAVAIL);
-   --*lastport;
-   if (*lastport > first || *lastport < last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_pcblookup_local(pcbinfo, laddr, lport,
-   wild));
-   } else {
-   /*
-* counting up
-*/
-   if (dorandom)
-   *lastport = first +
-   (arc4random() % (last - first));
-   count = last - first;
+   if (dorandom)
+   *lastport = first +
+   (arc4random() % (last - first));
 
-   do {
-   if (count-- < 0)/* completely used? */
-   return (EADDRNOTAVAIL);
-   ++*lastport;
-   if (*lastport 
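
The diff is cut off at this point in the archive. The following is not the remainder of the patch; it is only a rough userland sketch of the simplification that is visible in the hunk above: swap the bounds when first > last, optionally pick a random starting point, and then count in one direction only. The function name pick_ephemeral, the rnd parameter (standing in for arc4random()), the in_use callbacks (standing in for in_pcblookup_local()), and the guard against a zero-width range are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: returns nonzero if the port is already taken. */
typedef int (*port_in_use_fn)(uint16_t port);

/* Stand-ins for the real PCB lookup, used only for illustration. */
static int none_in_use(uint16_t port) { (void)port; return (0); }
static int all_in_use(uint16_t port) { (void)port; return (1); }

/*
 * Pick a local port in [first, last] (host byte order).
 * *lastport carries the previous pick between calls, as in the kernel code.
 * Returns 0 when every port in the range is in use.
 */
static uint16_t
pick_ephemeral(uint16_t first, uint16_t last, int dorandom, uint32_t rnd,
    uint16_t *lastport, port_in_use_fn in_use)
{
	uint16_t aux;
	int count;

	assert(lastport != 0 && in_use != 0);

	/* Normalise the bounds so that only the "counting up" case remains. */
	if (first > last) {
		aux = first;
		first = last;
		last = aux;
	}

	/* Random starting point; skipped for a one-port range (no modulo by 0). */
	if (dorandom && last > first)
		*lastport = (uint16_t)(first + rnd % (uint32_t)(last - first));

	count = last - first;
	do {
		if (count-- < 0)	/* completely used? */
			return (0);
		++*lastport;
		if (*lastport < first || *lastport > last)
			*lastport = first;	/* wrap around inside the range */
	} while (in_use(*lastport));

	return (*lastport);
}
```

With the bounds swapped up front, a reversed range such as (65535, 49152) behaves exactly like (49152, 65535), which is what lets the two separate loops collapse into one.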

Re: Ephemeral port range (patch)

2008-03-03 Thread Fernando Gont

At 03:37 a.m. 04/03/2008, Mike Silbersack wrote:

While I haven't looked much at the scheme proposed by Amit, I think 
there's a "flaw" with the algorithm: IP IDs need to be unique for 
{source IP, dest IP, Protocol}. And the algorithm still keeps a 
*global* IP ID. That means you'll cycle through the whole IP ID 
space when you probably didn't need to.


That is true.  I think we have a time/space tradeoff here, with 
Amit's algorithm taking more memory and less time than a hash-based 
algorithm. But I haven't benchmarked one against the other, so it is 
possible that a double-hash might win in both categories.


(Thinking out loud)
Note that in the case of implementing the double-hash scheme for 
connection-oriented protocols, once you compute the hash for the 
first IP ID to be used for a connection, you could store the result 
of the hash in the TCB, and thus you wouldn't need to recompute this 
"expensive" hash every time you send a packet.

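A small sketch of the caching idea from the paragraph above, under stated assumptions: the hash results are computed once per connection, stored in the connection's control block, and later packets only pay for a counter increment. The struct toy_tcb, next_ip_id, toy_hash, IPID_BUCKETS, and the hash_calls counter are hypothetical names made up for this example (toy_hash is a non-cryptographic stand-in), and none of this is taken from the FreeBSD TCP code.

```c
#include <assert.h>
#include <stdint.h>

#define IPID_BUCKETS 64u	/* hypothetical size of the counter table */

/* Hypothetical stand-in for the per-connection control block (TCB). */
struct toy_tcb {
	uint32_t laddr;		/* local IPv4 address */
	uint32_t faddr;		/* remote IPv4 address */
	uint16_t ipid_offset;	/* cached result of the first hash */
	uint8_t  ipid_bucket;	/* cached result of the second hash */
	uint8_t  ipid_cached;	/* nonzero once the hashes were computed */
};

static uint16_t ipid_counters[IPID_BUCKETS];
static unsigned hash_calls;	/* counts how often the "expensive" hash ran */

/* Toy keyed mixer; a real implementation would use a cryptographic hash. */
static uint32_t
toy_hash(uint32_t laddr, uint32_t faddr, uint32_t key)
{
	uint32_t h = key ^ 0x9e3779b9u;

	hash_calls++;
	h = (h ^ laddr) * 0x01000193u;
	h = (h ^ faddr) * 0x01000193u;
	return (h ^ (h >> 16));
}

/* Return the IP ID for the next packet sent on this connection. */
static uint16_t
next_ip_id(struct toy_tcb *tp, uint32_t key1, uint32_t key2)
{
	assert(tp != 0);

	if (!tp->ipid_cached) {
		/* First packet only: compute both hashes and remember them. */
		tp->ipid_offset = (uint16_t)toy_hash(tp->laddr, tp->faddr, key1);
		tp->ipid_bucket =
		    (uint8_t)(toy_hash(tp->laddr, tp->faddr, key2) % IPID_BUCKETS);
		tp->ipid_cached = 1;
	}
	/* Every packet: cached offset plus the bucket's running counter. */
	return ((uint16_t)(tp->ipid_offset + ipid_counters[tp->ipid_bucket]++));
}
```

The protocol is left out of the hash input here because a TCB already implies TCP; a connection-less protocol would have no such block to cache in and would need to hash per packet or keep per-destination state instead.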


I think Robert Watson said something about investigating the issue 
of IP IDs more in the near future.  What I'd like to see (if 
possible) is that we use Amit's algorithm until we've established a 
connection with a host, then switch to per-IP state and just use 
linear IP IDs.  That would seem to provide the least overhead for 
high speed connections.


I haven't yet looked that much at Amit's approach but, from what I 
have seen, your suggestion makes sense.



That said, at least theoretically speaking, one could argue that 
there shouldn't be a problem with simply randomizing the IP ID 
number. For connection-oriented protocols, you should be doing 
PMTUD, and thus will not care about the IP ID. If your packets are 
doing fragmentation, then on links with large bandwidth-delay 
products you're already in trouble. For connection-less transport 
protocols (e.g., UDP), while they usually do not implement PMTUD, 
they also do not implement flow-control or congestion control. So 
you are either sending data to a local system (e.g., in a LAN), or 
you probably shouldn't be sending data that fast (and then you 
shouldn't have problems with trivially randomizing the IP ID).


I have attempted to make that argument before, and it did not go 
over well with most people.  :)


I think the counter-argument was primarily centered around UDP NFS, 
which, as you pointed out, is almost always a losing case.


Relying on IP fragmentation for anything that is supposed to be 
reliable and that should work at high speed is...mmm... probably not 
the best idea. ;-)  Other than the classic "fragmentation considered 
harmful", there's a more recent id (RFC?) entitled "fragmentation 
considered very harmful" which shows the problems that may arise due 
to fragmentation.


So the thing here is that people want to do the wrong thing, and then 
blame the IP ID generator. ;-)




The double-hash concept sounds pretty good, but there's a major 
problem with it.  If an application does a bind() to get a local 
port before doing a connect(), you don't know the remote IP or the remote port.


Yes, this is described in Section 3.5 of our id 
(http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-port-randomization-01.txt). 
Our take is that in that scenario you could simply randomize the 
local port. (i.e., implement the double-hash scheme, and fall-back 
to trivial randomization when you face this scenario).


Doh, I will try to read the ENTIRE paper next time before commenting.


No worries.



There's a related "feature" in the BSD TCP stack that all local 
ports are considered equal; even for applications that do a 
connect() call and specify a remote IP/port, we do not let them 
use the same local port to two different remote IPs at the same 
time.  This puts a limit on the total number of outgoing 
connections that one machine can have.


mmm... I see. So this could limit the number of outgoing 
connections to about (ephemeral_ports/TIME_WAIT). Any objections 
against changing this? At least for outgoing connections (i.e., 
non-listening sockets), this shouldn't be the case. I'd be 
interested in working on this issue...


I don't think anyone is actively working on that problem, so you 
won't be stepping on anyone's toes by looking into it.  Bring on the patches!


Great! Will do.



There's a piece of low hanging fruit also in that area - we add 
incoming connections to the local port hash table, even though it 
seems unlikely that you are going to receive a connection from 
1.1.1.1:5->1.1.1.2:80 and then connect from 
1.1.1.2:80->1.1.1.1:5.  Those unnecessary additions to the local 
port hash table would be nice to remove if you're investigating the 
related issues.


Ok.


One thing you may or may not have noticed is that FreeBSD keeps 
TIME_WAIT sockets in a separate zone which has a limited size, so you 
will not have to worry too much about them clogging up all ephemeral ports.


I had not... but will have a look at it.

Thanks!

--
Fernando Gont
e-mail: [EMAIL PROTECTED] |