WIDE-DHCP

2013-08-13 Thread s m
hello guys,


Does anybody use WIDE-DHCP? I installed it on my FreeBSD 8.2 system but
don't know how to configure it. I searched a lot but cannot find any
useful documentation.

Please let me know if somebody has configured it or has a working setup.
Thanks in advance,
SAM
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: WIDE-DHCP

2013-08-13 Thread Olivier Nicole
Sam,

It seems that the distribution includes a directory called db_sample
with some tutorials/examples.

But it also seems that the last release of wide-dhcp is 16 years old...

Olivier

On Tue, Aug 13, 2013 at 3:42 PM, s m  wrote:
> hello guys,
>
>
> Does anybody use WIDE-DHCP? I installed it on my FreeBSD 8.2 system but
> don't know how to configure it. I searched a lot but cannot find any
> useful documentation.
>
> Please let me know if somebody has configured it or has a working setup.
> Thanks in advance,
> SAM


Re: WIDE-DHCP

2013-08-13 Thread sthaug
> It seems that the distribution includes a directory called db_sample
> with some tutorials/examples.
> 
> But it also seems that the last release of wide-dhcp is 16 years old...

And I also strongly doubt that he's going to have any better luck
with his /8 net.

Steinar Haug, Nethelp consulting, sth...@nethelp.no


Re: WIDE-DHCP

2013-08-13 Thread s m
Yes, unfortunately that is not good enough for me :((


On Tue, Aug 13, 2013 at 1:32 PM,  wrote:

> > It seems that the distribution includes a directory called db_sample
> > with some tutorials/examples.
> >
> > But it also seems that the last release of wide-dhcp is 16 years old...
>
> And I also strongly doubt that he's going to have any better luck
> with his /8 net.
>
> Steinar Haug, Nethelp consulting, sth...@nethelp.no
>


Different providers for different nat clients

2013-08-13 Thread ar...@artem.ru

Hello!

I have an unusual task and don't understand how to implement it.

There is a router with 3 interfaces:

IF1: PROVIDER A
IF2: PROVIDER B
IF3: LAN

Clients are served via NAT. There are about 15 clients.

Here is what I need to do:

By default, all traffic from all clients goes to PROVIDER A via IF1.
But if the total incoming traffic for any particular client exceeds X Mb,
then that client, and only that client, must be switched to PROVIDER B.
The switch must be automatic and must not use any software on the
client side.

I know how to count traffic, but I don't understand how to route
external traffic to/from NAT clients through a particular external
interface.

Any idea how this is done?

Artem



Re: Different providers for different nat clients

2013-08-13 Thread Olivier Nicole
Artem,

> I have a strange task and don't understand how to implement such scheme.
>
> There is a router with 3 interfaces:
>
> IF1: PROVIDER A
> IF2: PROVIDER B
> IF3: LAN
>
> Clients served via NAT. There are about 15 clients.
>
> Now, what i need to do:
>
> By default all traffic from all clients goes to PROVIDER A via IF1.
> But, if total incoming traffic for any particular client becomes over
> X Mb then that client and only that client must be switch for
> PROVIDER B. The switch must be automatic and must not use any software
> on the client side.
> While i know how to count traffic i don't understand how to route
> external traffic to/from nat clients on particular external interface.
>
> Any idea how it is done?

I would think that you have to dynamically change the NAT configuration
to associate the client with the IP address from provider B.

How you do that depends on the NAT software you are using, which
you did not say.

Good luck,

Olivier


Re: Different providers for different nat clients

2013-08-13 Thread ar...@artem.ru

On 13.08.2013 16:19, Olivier Nicole wrote:

> Artem,
>
> > I have a strange task and don't understand how to implement such scheme.
> >
> > There is a router with 3 interfaces:
> >
> > IF1: PROVIDER A
> > IF2: PROVIDER B
> > IF3: LAN
> >
> > Clients served via NAT. There are about 15 clients.
> >
> > Now, what i need to do:
> >
> > By default all traffic from all clients goes to PROVIDER A via IF1.
> > But, if total incoming traffic for any particular client becomes over
> > X Mb then that client and only that client must be switch for
> > PROVIDER B. The switch must be automatic and must not use any software
> > on the client side.
> > While i know how to count traffic i don't understand how to route
> > external traffic to/from nat clients on particular external interface.
> >
> > Any idea how it is done?
>
> I would think that you have to dynamically change the configuration of
> the NAT to associate the client to the IP from provider B.
>
> Now, how you do that depends on the NAT software you are using, that
> you did not say.
>
> Good luck,
>
> Olivier

Um.. I was planning to use the included natd,
but I think it has only one external address to use.


Re: Different providers for different nat clients

2013-08-13 Thread Olivier Nicole
Artem,

> Um.. i was planning to use the included natd
> But i think it has only one external address to use

I think there are a couple of rules to add to ipfw to enable NAT; that
may be where you divert to one provider or the other:

ipfw add divert natd all from 192.168.x.y to any via ISPB
ipfw add divert natd all from any to any via ISPA

That's the direction I would look at.

Best regards,

Olivier



Re: Different providers for different nat clients

2013-08-13 Thread Daniel Hartmeier
On Tue, Aug 13, 2013 at 04:11:37PM +0400, ar...@artem.ru wrote:

> There is a router with 3 interfaces:
> 
> IF1: PROVIDER A
> IF2: PROVIDER B
> IF3: LAN
> 
> Clients served via NAT. There are about 15 clients.
> 
> Now, what i need to do:
> 
> By default all traffic from all clients goes to PROVIDER A via IF1.
> But, if total incoming traffic for any particular client becomes
> over X Mb then that client
> and only that client must be switch for PROVIDER B. The switch must
> be automatic and must
> not use any software on the client side.
> While i know how to count traffic i don't understand how to route
> external traffic to/from
> nat clients on particular external interface.
> 
> Any idea how it is done?

This is called source-based routing, and at least pf and ipfw support
it. Using pf it could look like

  table <overquota>
  nat on IF1 from !IF1 -> IF1
  nat on IF2 from !IF2 -> IF2
  pass in on IF3 route-to (IF2 GW2) from <overquota>

with the default route going through IF1 to GW1.

To add a client to the table, use

  pfctl -t overquota -Ta 192.168.2.3

Subsequent new connections will go out through the second provider.
Existing connections will continue to go through the first provider,
unless you explicitly remove the sessions, as in

  pfctl -k 192.168.2.3
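
A minimal sketch of how adding clients to the table could be automated,
assuming the per-client byte totals are already available as
"address bytes" lines on standard input (how they are counted is outside
this sketch). The names over_quota, switch_client and DRY_RUN are
hypothetical; only the pfctl invocation is taken from the commands above.

```sh
#!/bin/sh

# Print the addresses whose byte total exceeds the limit.
# Input on stdin: one "address bytes" pair per line (assumed format).
over_quota() {
    limit="$1"
    awk -v limit="$limit" '$2 + 0 > limit + 0 { print $1 }'
}

# Add one client to the overquota table so that its new connections
# leave through the second provider. Existing states are left alone.
# With DRY_RUN=1 the command is only printed, not executed.
switch_client() {
    ip="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pfctl -t overquota -Ta $ip"
    else
        pfctl -t overquota -Ta "$ip"
    fi
}
```

A periodic job could pipe its counters through over_quota and call
switch_client for each address printed. Whether to also kill the
client's existing states is a separate decision, since that interrupts
sessions already in progress.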

Daniel


Re: Create CARP interface in state INIT?

2013-08-13 Thread Daniel Hartmeier
On Thu, Aug 08, 2013 at 01:11:58PM +0100, Karl Pielorz wrote:

> Is there any way from rc.conf of creating a carp interface in the
> 'down' state - i.e. INIT?

I think any interface configured with ifconfig_* in rc.conf will cause
an explicit additional "ifconfig up" call from /etc/network.subr.

Furthermore, ifconfig itself will add "up" when setting the IP address.

So, don't configure the carp interface in rc.conf, but do it in
/etc/rc.local, and be careful to add the address while the vhid is not
yet configured, as in:

  ifconfig carp0 create
  ifconfig carp0 inet 192.168.107.21
  ifconfig carp0 down
  ifconfig carp0 vhid 21 pass secret advskew 100

HTH,
Daniel


Re: Different providers for different nat clients

2013-08-13 Thread Julian Elischer

On 8/13/13 8:34 PM, Olivier Nicole wrote:
> Artem,
>
> > Um.. i was planning to use the included natd
> > But i think it has only one external address to use
>
> I think there is a couple of rules to add to ipfw to enable NAT, that
> maybe where you divert to here or there:
>
> ipfw add divert natd all from 192.169.x.y to any via ISPB
> ipfw add divert natd all from any to any via ISPA
>
> That's the direction I would look at.


OK, here are some thoughts.
You want existing sessions from the offending client to continue to
run through the original interface, or those sessions will immediately
die. So you need to use dynamic, session-based routing.

One way to do this is to use the keep-state and check-state rules in ipfw.

If you use rules like

  check-state
  fwd ISP2 ip from table(1) to any in recv $LAN keep-state
  fwd ISP1 ip from any to any in recv $LAN keep-state

then each session will keep using the provider it started on, even if
the contents of table(1) change.


Then you can use NAT rules on each $ISP interface to ensure that
packets get translated correctly.

It's up to you to arrange the contents of the table.

I can't remember offhand whether a firewall pass terminates on a fwd
rule match or not; you may want to check that.

I think you should divide your rules up into sections for each interface
and direction using skipto, and then in each section have specialist
rules for just that traffic. So with 3 interfaces you would have 6 sets
of rules (say 1000, 2000, 3000, 4000, 5000 and 6000), and the very first
rules would be:
skipto 1000 ip from any to any in recv $LAN
skipto 2000 ip from any to any out xmit $LAN
skipto 3000 ip from any to any in recv $ISP1
skipto 4000 ip from any to any out xmit $ISP1
skipto 5000 ip from any to any in recv $ISP2
skipto 6000 ip from any to any out xmit $ISP2
[handle loopback packets here]

At 1000 you have the rules above.
At 3000, 4000, 5000 and 6000 you have NAT rules (with a different NAT
instance for each interface).


you can use whatever method you like (e.g. dummynet accounting?) to 
work out how much traffic is going, and add and remove entries in the 
table.
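
One possible way to get the per-client totals, sketched here with plain
ipfw count rules rather than the dummynet accounting suggested above.
It assumes one count rule per client and the "rule packets bytes
rule-text" column layout printed by `ipfw show`; bytes_for is a
hypothetical helper name.

```sh
#!/bin/sh

# Sum the byte counters of every listed rule whose text mentions the
# given address. Input on stdin: output in the layout of `ipfw show`,
# i.e. rule number, packet count, byte count, then the rule body.
bytes_for() {
    ip="$1"
    awk -v ip="$ip" '
        {
            # Fields 1-3 are the counters; the rule body starts at 4.
            for (i = 4; i <= NF; i++)
                if ($i == ip) { total += $3; break }
        }
        END { printf "%.0f\n", total }'
}
```

For example, `ipfw show | bytes_for 192.168.2.3` would print the bytes
counted so far for that client, and once the total passes the limit the
client can be added with `ipfw table 1 add 192.168.2.3`.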


Remember, though, to make sure existing sessions don't get switched!

Julian




> Best regards,
>
> Olivier




TSO and FreeBSD vs Linux

2013-08-13 Thread Julian Elischer
I have been tracking down a performance embarrassment on Amazon EC2
and I think I have found it.
Our OS cousins over in Linux land have implemented some interesting
behaviour when TSO is in use.

They seem to aggregate ACKs when there is a lot of traffic so that
they can create the largest possible TSO packet. We, on the other hand,
respond to each and every returning ACK as it arrives, and thus
generally fall into the behaviour of sending a bunch of small packets,
each sized to what one ACK acknowledged.


for two examples look at:


http://www.freebsd.org/~julian/LvsF-tcp-start.tiff
and
http://www.freebsd.org/~julian/LvsF-tcp.tiff

In each case, FreeBSD is on the left and Linux is on the right.

The first image shows the sessions as they start, and the second shows
them some distance later (when the sequence numbers wrap around; no
particular reason to use that, it was just fun to see).
In both cases you can see that each Linux packet (white), once they have
got going, is responding to multiple bumps in the send window sequence
number (green and yellow lines), representing the arrival of several
ACKs, while FreeBSD produces a whole bunch of smaller packets, slavishly
following exactly the size of each incoming ACK. This gives us quite a
performance debt.

Notice that this behaviour in Linux seems to be modal: it seems to
'switch on' a little way into the 'starting' trace.

In addition, you can see that Linux gets going faster even at the
beginning, where TSO isn't in play, by sending a lot more packets
up front (of course, the wisdom of this can be argued).

Has anyone done any work on aggregating ACKs, or delaying responding
to them?

Julian
(who suspects he's about to find out more about TSO and the send
path than he ever wanted to)




Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Navdeep Parhar
On 08/13/13 10:29, Julian Elischer wrote:
..
> 
> Has anyone done any work on aggregating ACKs, or delaying responding to
> them?

If LRO is enabled on the FreeBSD receiver, ACKs are already aggregated
(a duplicate ACK will result in an immediate flush though.)  See tcp_lro_rx.
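
A small sketch for checking whether LRO is currently enabled on an
interface, assuming the usual `options=...<...>` line in ifconfig
output. The helper name has_lro is hypothetical.

```sh
#!/bin/sh

# Succeeds if the ifconfig output read from stdin lists LRO among the
# enabled interface options (the "options=...<A,B,C>" line).
has_lro() {
    grep 'options=' | grep -q '[<,]LRO[,>]'
}
```

Usage would be along the lines of `ifconfig xn0 | has_lro && echo "LRO is on"`.
On drivers that support it, `ifconfig xn0 lro` and `ifconfig xn0 -lro`
turn the feature on and off; whether a given driver supports LRO at all
has to be checked separately.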

Regards,
Navdeep


Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Julian Elischer

On 8/14/13 1:29 AM, Julian Elischer wrote:
> I have been tracking down a performance embarrassment on AMAZON EC2
> and have found it I think.
> Our OS cousins over at Linux land have implemented some interesting
> behaviour when TSO is in use.
>
> They seem to aggregate ACKS when there is a lot of traffic so that
> they can create the largest possible TSO packet. We on the other hand
> respond to each and every returning ACK, as it arrives and thus
> generally fall into the behaviour of sending a bunch of small packets,
> the size of each ack.
>
> for two examples look at:
>
> http://www.freebsd.org/~julian/LvsF-tcp-start.tiff
> and
> http://www.freebsd.org/~julian/LvsF-tcp.tiff

Some people have trouble with TIFF, so here they are as JPEGs:

http://www.freebsd.org/~julian/LvsF-tcp-start.jpg
and
http://www.freebsd.org/~julian/LvsF-tcp.jpg

> in each case, we can see FreeBSD on the left and Linux on the right.
>
> The first case shows the case as the sessions start, and the second
> case shows some distance later (when the sequence numbers wrap around..
> no particular reason to use that, it was just fun to see).
> In both cases you can see that each Linux packet (white)(once they
> have got going) is responding to multiple bumps in the send window
> sequence number (green and yellow lines) (representing the arrival of
> several ACKs) while FreeBSD produces a whole bunch of smaller packets,
> slavishly following exactly the size of each incoming ack.. This gives
> us quite a performance debt.
> Notice that this behaviour in Linux seems to be modal.. it seems to
> 'switch on' a little bit into the 'starting' trace.
>
> In addition, you can see also that Linux gets going faster even in
> the beginning where TSO isn't in play, by sending a lot more packets
> up-front. (of course the wisdom of this can be argued).
>
> Has anyone done any work on aggregating ACKs, or delaying responding
> to them?
>
> Julian
> (Who's suspecting he's about to find out more about TSO and the send
> path, than he ever wanted to).



Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Luigi Rizzo
On Tue, Aug 13, 2013 at 7:37 PM, Navdeep Parhar  wrote:

> On 08/13/13 10:29, Julian Elischer wrote:
> ..
> >
> > Has anyone done any work on aggregating ACKs, or delaying responding to
> > them?
>
> If LRO is enabled on the FreeBSD receiver, ACKs are already aggregated
> (a duplicate ACK will result in an immediate flush though.)  See
> tcp_lro_rx.
>

From what I have heard (no direct experience though), when TSO is enabled
Linux may decide to hold a transmission in the hope of getting more ACKs
soon, and hence a larger segment sent in one shot.

I am not sure I can find similar code in FreeBSD; there is something
mentioned in tcp_output(), but the check there only seems to be against
t_maxseg:

    /*
     * Sender silly window avoidance.   We transmit under the following
     * conditions when len is non-zero:
     *
     *  - We have a full segment (or more with TSO)
     *  - This is the last buffer in a write()/send() and we are
     *    either idle or running NODELAY
     *  - we've timed out (e.g. persist timer)
     *  - we have more then 1/2 the maximum send window's worth of
     *    data (receiver may be limited the window size)
     *  - we need to retransmit
     */
    if (len) {
        if (len >= tp->t_maxseg)
            goto send;
        /*

and t_maxseg seems to be capped to the MSS.

This could be implemented in tcp_output(), I suppose.

cheers
luigi


> Regards,
> Navdeep



-- 
-+---
 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/. Universita` di Pisa
 TEL  +39-050-2211611   . via Diotisalvi 2
 Mobile   +39-338-6809875   . 56122 PISA (Italy)
-+---


Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Julian Elischer

On 8/14/13 1:37 AM, Navdeep Parhar wrote:
> On 08/13/13 10:29, Julian Elischer wrote:
> ..
> > Has anyone done any work on aggregating ACKs, or delaying responding to
> > them?
>
> If LRO is enabled on the FreeBSD receiver, ACKs are already aggregated
> (a duplicate ACK will result in an immediate flush though.)  See tcp_lro_rx.

Not always, and certainly not with Xen (xn0) on EC2.

> Regards,
> Navdeep





Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Navdeep Parhar
On 08/13/13 17:51, Julian Elischer wrote:
> On 8/14/13 1:37 AM, Navdeep Parhar wrote:
>> On 08/13/13 10:29, Julian Elischer wrote:
>> ..
>>> Has anyone done any work on aggregating ACKs, or delaying responding to
>>> them?
>> If LRO is enabled on the FreeBSD receiver, ACKs are already aggregated
>> (a duplicate ACK will result in an immediate flush though.)  See
>> tcp_lro_rx.
> not always, , certainly not with XEN (xn0) on EC2.

It will vary from driver to driver and how many frames a driver gets to
(or chooses to) process in one run of its interrupt handler.


Re: kern/181257: [bge] bge link status change

2013-08-13 Thread linimon
Old Synopsis: bge link status change
New Synopsis: [bge] bge link status change

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Aug 14 02:08:11 UTC 2013
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=181257


TCP Initial Window 10 MFC (was: Re: svn commit: r252789 - stable/9/sys/netinet)

2013-08-13 Thread Lawrence Stewart
Hi Andre,

[RE team is BCCed so they're aware of this discussion]

On 07/06/13 00:58, Andre Oppermann wrote:
> Author: andre
> Date: Fri Jul  5 14:58:24 2013
> New Revision: 252789
> URL: http://svnweb.freebsd.org/changeset/base/252789
> 
> Log:
>   MFC r242266:
>   
>Increase the initial CWND to 10 segments as defined in IETF TCPM
>draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
>window improves the overall performance of many web services without
>risking congestion collapse.
>   
>As long as it remains a draft it is placed under a sysctl marking it
>as experimental:
> net.inet.tcp.experimental.initcwnd10 = 1
>When it becomes an official RFC soon the sysctl will be changed to
>the RFC number and moved to net.inet.tcp.
>   
>This implementation differs from the RFC draft in that it is a bit
>more conservative in the case of packet loss on SYN or SYN|ACK because
>we haven't reduced the default RTO to 1 second yet.  Also the restart
>window isn't yet increased as allowed.  Both will be adjusted with
>upcoming changes.
>   
>Is is enabled by default.  In Linux it is enabled since kernel 3.0.

I haven't been fully alert to FreeBSD happenings this year so apologies
for bringing this up so long after the MFC.

I don't think this change should have been MFCed, at least not in its
current form. Enabling the switch to IW=10 on a stable branch is
inappropriate IMO. I also think the "net.inet.tcp.experimental" sysctl
branch is poorly named as per the important discussion we had back in
February [1]. I would really prefer we didn't get stuck having to keep
it around by making a stable release with it being present.

I think this commit should be backed out of stable/9 and more
importantly, 9.2-RELEASE.

As an aside, I am intending to follow up to the Feb discussion with a
patch that implements the basic infrastructure I proposed so that we can
continue that discussion.

Thoughts?

Cheers,
Lawrence

[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-February/034698.html


Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Lawrence Stewart
On 08/14/13 03:29, Julian Elischer wrote:
> I have been tracking down a performance embarrassment on AMAZON EC2 and
> have found it I think.

Let us please avoid conflating performance with throughput. The
behaviour you go on to describe as a performance embarrassment is
actually a throughput difference, and the FreeBSD behaviour you're
describing is essentially sacrificing throughput and CPU cycles for
lower latency. That may not be a trade-off you like, but it is an
important factor in this discussion.

Don't fall into the trap of labelling Linux's propensity for maximising
throughput as superior to an alternative approach which strikes a
different balance. It all depends on the use case.

> Our OS cousins over at Linux land have implemented some interesting
> behaviour when TSO is in use.
> 
> They seem to aggregate ACKS when there is a lot of traffic so that they
> can create the
> largest possible TSO packet. We on the other hand respond to each and
> every returning ACK, as it arrives and thus generally fall into the
> behaviour of sending a bunch of small packets, the size of each ack.

There's a feature controlled by ethtool called GRO (generic receive
offload) which appears to be enabled by default on at least Ubuntu and,
I guess, other Linux distributions too. It's responsible for aggregating
ACKs and data to batch them up the stack if the driver doesn't provide a
hardware offload implementation. Try rerunning your experiments with the
ACK batching disabled on the Linux host to get an additional comparison
point.
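
A short sketch of how GRO could be inspected and switched off on the
Linux side for such a comparison run. The wrapper name gro_off is
hypothetical and the interface name is whatever the test host uses;
the commands need root.

```sh
#!/bin/sh

# Show the current GRO setting of a Linux interface, then disable it.
gro_off() {
    iface="$1"
    # Lower-case -k lists the offload settings of the interface.
    ethtool -k "$iface" | grep generic-receive-offload
    # Upper-case -K changes them.
    ethtool -K "$iface" gro off
}
```

For example, `gro_off eth0` before the run and `ethtool -K eth0 gro on`
afterwards to restore the previous behaviour.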

> for two examples look at:
> 
> 
> http://www.freebsd.org/~julian/LvsF-tcp-start.tiff
> and
> http://www.freebsd.org/~julian/LvsF-tcp.tiff
> 
> in each case, we can see FreeBSD on the left and Linux on the right.
> 
> The first case shows the case as the sessions start, and the second case
> shows
> some distance later (when the sequence numbers wrap around.. no particular
> reason to use that, it was just fun to see).
> In both cases you can see that each Linux packet (white)(once they have got
> going) is responding to multiple bumps in the send window sequence
> number (green and yellow lines) (representing the arrival of several ACKs)
> while FreeBSD produces a whole bunch of smaller packets, slavishly
> following
> exactly the size of each incoming ack.. This gives us quite  a
> performance debt.

Again, please s/performance/what-you-really-mean/ here.

> Notice that this behaviour in Linux seems to be modal.. it seems to
> 'switch on' a little bit
> into the 'starting' trace.
> 
> In addition, you can see also that Linux gets going faster even in the
> beginning where
> TSO isn't in play, by sending a lot more packets up-front. (of course
> the wisdom of this
> can be argued).

They switched to using an initial window of 10 segments some time ago.
FreeBSD starts with 3 or, more recently, 10 if you're running recent
9-STABLE or 10-CURRENT.
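
To put numbers on the two starting points, here is a sketch of the
initial window formulas as given in RFC 3390 and RFC 6928. It
illustrates the RFC text only and is not taken from the FreeBSD or
Linux sources; the function names are hypothetical.

```sh
#!/bin/sh

# Small integer helpers.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }
max() { if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# RFC 3390: IW = min(4*MSS, max(2*MSS, 4380 bytes))
iw_rfc3390() {
    mss="$1"
    min $((4 * mss)) "$(max $((2 * mss)) 4380)"
}

# RFC 6928: IW = min(10*MSS, max(2*MSS, 14600 bytes))
iw_rfc6928() {
    mss="$1"
    min $((10 * mss)) "$(max $((2 * mss)) 14600)"
}
```

With an MSS of 1460 bytes, iw_rfc3390 gives 4380 bytes (3 segments) and
iw_rfc6928 gives 14600 bytes (10 segments), which matches the "3 or 10"
described above.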

> Has anyone done any work on aggregating ACKs, or delaying responding to
> them?

As noted by Navdeep, we already have the code to aggregate ACKs in our
software LRO implementation. The bigger problem is that appropriate byte
counting places a default 2*MSS limit on the amount of ACKed data the
window can grow by i.e. if an ACK for 64k of data comes up the stack,
we'll grow the window by 2 segments worth of data in response. That
needs to be addressed - we could send the ACK count up with the
aggregated single ACK or just ignore abc_l_var when LRO is in use for a
connection.
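
The effect of that limit can be sketched with the slow-start rule from
RFC 3465, where the window grows by min(bytes acked, L * MSS) per ACK.
This is an arithmetic illustration only, assuming L plays the role of
the abc_l_var setting mentioned above; abc_growth is a hypothetical
name.

```sh
#!/bin/sh

# Window growth in bytes for one ACK during slow start under ABC:
# min(bytes_acked, L * MSS).
abc_growth() {
    acked="$1"
    mss="$2"
    l="$3"
    cap=$((l * mss))
    if [ "$acked" -lt "$cap" ]; then
        echo "$acked"
    else
        echo "$cap"
    fi
}
```

With an MSS of 1448 bytes and L = 2, forty-five separate ACKs of one
segment each grow the window by 45 * 1448 = 65160 bytes in total, while
a single aggregated ACK covering the same 65160 bytes grows it by only
2896 bytes.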

Cheers,
Lawrence


Re: TSO and FreeBSD vs Linux

2013-08-13 Thread Julian Elischer

On 8/14/13 11:39 AM, Lawrence Stewart wrote:
> On 08/14/13 03:29, Julian Elischer wrote:
> > I have been tracking down a performance embarrassment on AMAZON EC2 and
> > have found it I think.
>
> Let us please avoid conflating performance with throughput. The
> behaviour you go on to describe as a performance embarrassment is
> actually a throughput difference, and the FreeBSD behaviour you're
> describing is essentially sacrificing throughput and CPU cycles for
> lower latency. That may not be a trade-off you like, but it is an
> important factor in this discussion.

It was an embarrassment in that in one class of test we performed very
poorly. It was not a disaster or a show-stopper, but for our product it
is a critical number. It is a throughput difference, as you say, but
that is a very important part of performance.

The latency of Linux didn't seem to be any worse than FreeBSD's; just
the throughput was a lot higher in the same scenario.

> Don't fall into the trap of labelling Linux's propensity for maximising
> throughput as superior to an alternative approach which strikes a
> different balance. It all depends on the use case.

Well, the Linux balance seems to be "be better all around" at the
moment, so that is embarrassing. :-) I could see no latency regression.

> > Our OS cousins over at Linux land have implemented some interesting
> > behaviour when TSO is in use.
> >
> > They seem to aggregate ACKS when there is a lot of traffic so that they
> > can create the
> > largest possible TSO packet. We on the other hand respond to each and
> > every returning ACK, as it arrives and thus generally fall into the
> > behaviour of sending a bunch of small packets, the size of each ack.
>
> There's a thing controlled by ethtool called GRO (generic receive
> offload) which appears to be enabled by default on at least Ubuntu and I
> guess other Linux's too. It's responsible for aggregating ACKs and data
> to batch them up the stack if the driver doesn't provide a hardware
> offload implementation. Try rerunning your experiments with the ACK
> batching disabled on the Linux host to get an additional comparison point.

I will try that as soon as I get back to the machines in question.

> > for two examples look at:
> >
> > http://www.freebsd.org/~julian/LvsF-tcp-start.tiff
> > and
> > http://www.freebsd.org/~julian/LvsF-tcp.tiff
> >
> > in each case, we can see FreeBSD on the left and Linux on the right.
> >
> > The first case shows the case as the sessions start, and the second case
> > shows
> > some distance later (when the sequence numbers wrap around.. no particular
> > reason to use that, it was just fun to see).
> > In both cases you can see that each Linux packet (white)(once they have got
> > going) is responding to multiple bumps in the send window sequence
> > number (green and yellow lines) (representing the arrival of several ACKs)
> > while FreeBSD produces a whole bunch of smaller packets, slavishly
> > following
> > exactly the size of each incoming ack.. This gives us quite  a
> > performance debt.
>
> Again, please s/performance/what-you-really-mean/ here.

OK: in my tests this makes FreeBSD data transfers much slower, by as
much as 60%.

> > Notice that this behaviour in Linux seems to be modal.. it seems to
> > 'switch on' a little bit
> > into the 'starting' trace.
> >
> > In addition, you can see also that Linux gets going faster even in the
> > beginning where
> > TSO isn't in play, by sending a lot more packets up-front. (of course
> > the wisdom of this
> > can be argued).
>
> They switched to using an initial window of 10 segments some time ago.
> FreeBSD starts with 3 or more recently, 10 if you're running recent
> 9-STABLE or 10-CURRENT.

I tried setting initial values as shown:
  net.inet.tcp.local_slowstart_flightsize: 10
  net.inet.tcp.slowstart_flightsize: 10
It didn't seem to make much difference, but I will redo the test.

> > Has anyone done any work on aggregating ACKs, or delaying responding to
> > them?
>
> As noted by Navdeep, we already have the code to aggregate ACKs in our
> software LRO implementation. The bigger problem is that appropriate byte
> counting places a default 2*MSS limit on the amount of ACKed data the
> window can grow by i.e. if an ACK for 64k of data comes up the stack,
> we'll grow the window by 2 segments worth of data in response. That
> needs to be addressed - we could send the ACK count up with the
> aggregated single ACK or just ignore abc_l_var when LRO is in use for a
> connection.

So, does "software LRO" mean that LRO on the NIC should be ON or OFF
to see this?

> Cheers,
> Lawrence


