[Bug 280037] KTLS with Intel QAT may trigger kernel panics

2024-07-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280037

--- Comment #12 from ss3bsd <3226388...@jcom.home.ne.jp> ---

> I'm now running the same machine with 
> kern.ipc.tls.cbc_enable=0 
> to see if the stability changes.

The machine ran for 2 weeks without a panic.

A couple of days ago I changed kern.ipc.tls.cbc_enable back from 0 to 1.

Today the machine panicked again in the CBC-related part of the code
(ktls_ocf_tls_cbc_decrypt).

So the cause might be something specific to CBC (not strong evidence,
though).
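
For anyone repeating this A/B test, the toggle is a plain sysctl; a minimal
sketch (the tunable name is taken from the comments above, and the
/etc/sysctl.conf line is only there to make the setting persistent):

```
# sysctl kern.ipc.tls.cbc_enable                         # show the current value
# sysctl kern.ipc.tls.cbc_enable=0                       # 0 = keep CBC sessions out of KTLS, as in the test above
# echo 'kern.ipc.tls.cbc_enable=0' >> /etc/sysctl.conf   # persist across reboots
```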


--

Unread portion of the kernel message buffer:
trap number = 12
panic: page fault
cpuid = 1
time = 1721297633
KDB: stack backtrace:
#0 0x809d2b5d at kdb_backtrace+0x5d
#1 0x809858c1 at vpanic+0x131
#2 0x80985783 at panic+0x43
#3 0x80e5f91b at trap_fatal+0x40b
#4 0x80e5f966 at trap_pfault+0x46
#5 0x80e36288 at calltrap+0x8
#6 0x80e21cc8 at bounce_bus_dmamap_load_buffer+0x178
#7 0x809c8a07 at bus_dmamap_load_crp_buffer+0x237
#8 0x82ddb7ad at qat_ocf_process+0x40d
#9 0x80c83fd0 at crypto_dispatch+0x60
#10 0x80c8c3cd at ktls_ocf_dispatch+0x5d
#11 0x80c8d3f4 at ktls_ocf_tls_cbc_decrypt+0x344
#12 0x80a1a484 at ktls_work_thread+0x664
#13 0x8093fc7f at fork_exit+0x7f
#14 0x80e372ee at fork_trampoline+0xe
Uptime: 2d17h13m52s

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 279875] sockstat: segmentation fault

2024-07-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279875

John Marshall  changed:

   What|Removed |Added

 CC||j...@jmarshall.id.au

--- Comment #1 from John Marshall  ---
'Me too'

Recent 14-STABLE amd64

 FreeBSD 14.1-STABLE #0 stable/14-n268159-60f78f8ed14d: Tue Jul 16 19:25:41
AEST 2024
j...@rwsrv08.gfn.riverwillow.net.au:/build/obj/john/kits/src/amd64.amd64/sys/RWSRV08

There is no segfault if I specify -j to restrict the display to one of the
jails; it only happens if I specify -j0 or omit -j. This is my third build of
14-STABLE (beginning in early May) and all of them have behaved the same. The
same vintage of 14-STABLE on i386 is fine. I only have the two systems running
FreeBSD.
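
Condensed from the report above, the reproduction is simply (the jail name is
a placeholder for one of the jails on this host):

```
# sockstat -j myjail     # restricted to a single jail: works
# sockstat -j0           # jail 0: segfaults
# sockstat               # -j omitted: segfaults
```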

rwsrv08# lldb -X sockstat
(lldb) target create "sockstat"
Current executable set to '/usr/bin/sockstat' (x86_64).
(lldb) run
Process 87548 launched: '/usr/bin/sockstat' (x86_64)
USER COMMAND    PID   FD  PROTO  LOCAL ADDRESS FOREIGN ADDRESS
root sockstat   87554 6   stream -> [87548 8]
root sockstat   87553 6   stream -> [87548 7]
...
root syslogd 2948 9   dgram  /var/run/logpriv
root gssd2810 3   stream /var/run/gssd.sock
Process 87548 stopped
* thread #1, name = 'sockstat', stop reason = signal SIGSEGV: address not
mapped to object (fault address: 0x18)
frame #0: 0x02c892dde507 sockstat`displaysock [inlined]
file_compare(a=, b=0x) at sockstat.c:179:38
   176  static int64_t
   177  file_compare(const struct file *a, const struct file *b)
   178  {
-> 179  return ((int64_t)(a->xf_data/2 - b->xf_data/2));
^
   180  }
   181  RB_GENERATE_STATIC(files_t, file, file_tree, file_compare);
   182  
(lldb) bt
* thread #1, name = 'sockstat', stop reason = signal SIGSEGV: address not
mapped to object (fault address: 0x18)
  * frame #0: 0x02c892dde507 sockstat`displaysock [inlined]
file_compare(a=, b=0x) at sockstat.c:179:38
frame #1: 0x02c892dde507 sockstat`displaysock [inlined]
files_t_RB_FIND(head=, elm=) at sockstat.c:181:1
frame #2: 0x02c892dde4fe sockstat`displaysock(s=0x1790ce24be00,
pos=) at sockstat.c:1165:10
frame #3: 0x02c892ddd71f sockstat`display at sockstat.c:1345:4
frame #4: 0x02c892ddcc07 sockstat`main(argc=,
argv=) at sockstat.c:1577:2
frame #5: 0x02d0b7f008da libc.so.7`__libc_start1(argc=1,
argv=0x02d0b2e0ed10, env=0x02d0b2e0ed20, cleanup=,
mainX=(sockstat`main at sockstat.c:1434)) at libc_start1.c:157:7
frame #6: 0x02c892ddb18d sockstat`_start at crt1_s.S:83
(lldb) q

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 280037] KTLS with Intel QAT may trigger kernel panics

2024-07-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280037

--- Comment #13 from ss3bsd <3226388...@jcom.home.ne.jp> ---
By the way, is there an easy way to disable only QAT's CBC acceleration while
still letting the KTLS offload handle CBC?

I want to try that next if there is.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread Junho Choi
Alan - this is a great result to see. Thanks for experimenting.

Just curious why bbr and rack don't co-exist? Those are two separate things.
Is it a current bug or by design?

BR,

On Thu, Jul 18, 2024 at 5:27 AM  wrote:

> > On 17. Jul 2024, at 22:00, Alan Somers  wrote:
> >
> > On Sat, Jul 13, 2024 at 1:50 AM  wrote:
> >>
> >>> On 13. Jul 2024, at 01:43, Alan Somers  wrote:
> >>>
> >>> I've been experimenting with RACK and BBR.  In my environment, they
> >>> can dramatically improve single-stream TCP performance, which is
> >>> awesome.  But pf interferes.  I have to disable pf in order for them
> >>> to work at all.
> >>>
> >>> Is this a known limitation?  If not, I will experiment some more to
> >>> determine exactly what aspect of my pf configuration is responsible.
> >>> If so, can anybody suggest what changes would have to happen to make
> >>> the two compatible?
> >> A problem with same symptoms was already reported and fixed in
> >> https://reviews.freebsd.org/D43769
> >>
> >> Which version are you using?
> >>
> >> Best regards
> >> Michael
> >>>
> >>> -Alan
> >
> > TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> >
> > I want to follow up with the list to post my conclusions.  Firstly
> > tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> > incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> > confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> > or switch to a 14.1 server.
> >
> > Here's the real problem: on multiple production servers, downloading
> > large files (or ZFS send/recv streams) was slow.  After ruling out
> > many possible causes, wireshark revealed that the connection was
> > suffering about 0.05% packet loss.  I don't know the source of that
> > packet loss, but I don't believe it to be congestion-related.  Along
> > with a 54ms RTT, that's a fatal combination for the throughput of
> > loss-based congestion control algorithms.  According to the Mathis
> > Formula [1], I could only expect 1.1 MBps over such a connection.
> > That's actually worse than what I saw.  With default settings
> > (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> > outdated, but that's still pretty close for such a simple formula
> > that's 27 years old.
> >
> > So I benchmarked all available congestion control algorithms for
> > single download streams.  The results are summarized in the table
> > below.
> >
> > AlgoPacket Loss RateAverage Throughput
> > vegas   0.05%   2.0 MBps
> > newreno 0.05%   3.2 MBps
> > cubic   0.05%   5.6 MBps
> > hd  0.05%   8.6 MBps
> > cdg 0.05%   13.5 MBps
> > rack0.04%   14 MBps
> > htcp0.05%   15 MBps
> > dctcp   0.05%   15 MBps
> > chd 0.05%   17.3 MBps
> > bbr 0.05%   29.2 MBps
> > cubic   10% 159 kBps
> > chd 10% 208 kBps
> > bbr 10% 5.7 MBps
> >
> > RACK seemed to achieve about the same maximum bandwidth as BBR, though
> > it took a lot longer to get there.  Also, with RACK, wireshark
> > reported about 10x as many retransmissions as dropped packets, which
> > is suspicious.
> >
> > At one point, something went haywire and packet loss briefly spiked to
> > the neighborhood of 10%.  I took advantage of the chaos to repeat my
> > measurements.  As the table shows, all algorithms sucked under those
> > conditions, but BBR sucked impressively less than the others.
> >
> > Disclaimer: there was significant run-to-run variation; the presented
> > results are averages.  And I did not attempt to measure packet loss
> > exactly for most runs; 0.05% is merely an average of a few selected
> > runs.  These measurements were taken on a production server running a
> > real workload, which introduces noise.  Soon I hope to have the
> > opportunity to repeat the experiment on an idle server in the same
> > environment.
> >
> > In conclusion, while we'd like to use BBR, we really can't until we
> > upgrade to 14.1, which hopefully will be soon.  So in the meantime
> > we've switched all relevant servers from cubic to chd, and we'll
> > reevaluate BBR after the upgrade.
> Hi Alan,
>
> just to be clear: the version of BBR currently implemented is
> BBR version 1, which is known to be unfair in certain scenarios.
> Google is still working on BBR to address this problem and improve
> it in other aspects. But there is no RFC yet and the updates haven't
> been implemented yet in FreeBSD.
>
> Best regards
> Michael
> >
> > [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
> >
> > -Alan
>
>
>

-- 
Junho Choi  | https://saturnsoft.net
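
As a quick check on the Mathis Formula figure quoted above, i.e.
BW ~ (MSS / RTT) * (C / sqrt(p)): with the 54 ms RTT and 0.05% loss from the
message, and assuming MSS = 1460 bytes and the usual constant C ~ 0.93, bc(1)
reproduces the quoted estimate:

```
$ echo '(1460 / 0.054) * (0.93 / sqrt(0.0005))' | bc -l   # ~1124494 bytes/s, i.e. about 1.1 MBps
```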


Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread Alan Somers
I'm not sure what you're asking.  BBR and RACK are two different
algorithms that accomplish the same thing.  It wouldn't make sense to
use both on the same socket at the same time.

On Thu, Jul 18, 2024 at 7:01 AM Junho Choi  wrote:
>
> Alan - this is a great result to see. Thanks for experimenting.
>
> Just curious why bbr and rack don't co-exist? Those are two separate things.
> Is it a current bug or by design?
>
> BR,
>
> On Thu, Jul 18, 2024 at 5:27 AM  wrote:
>>
>> > On 17. Jul 2024, at 22:00, Alan Somers  wrote:
>> >
>> > On Sat, Jul 13, 2024 at 1:50 AM  wrote:
>> >>
>> >>> On 13. Jul 2024, at 01:43, Alan Somers  wrote:
>> >>>
>> >>> I've been experimenting with RACK and BBR.  In my environment, they
>> >>> can dramatically improve single-stream TCP performance, which is
>> >>> awesome.  But pf interferes.  I have to disable pf in order for them
>> >>> to work at all.
>> >>>
>> >>> Is this a known limitation?  If not, I will experiment some more to
>> >>> determine exactly what aspect of my pf configuration is responsible.
>> >>> If so, can anybody suggest what changes would have to happen to make
>> >>> the two compatible?
>> >> A problem with same symptoms was already reported and fixed in
>> >> https://reviews.freebsd.org/D43769
>> >>
>> >> Which version are you using?
>> >>
>> >> Best regards
>> >> Michael
>> >>>
>> >>> -Alan
>> >
>> > TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
>> >
>> > I want to follow up with the list to post my conclusions.  Firstly
>> > tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
>> > incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
>> > confirm that tcp_bbr works for me if I either disable LRO, disable PF,
>> > or switch to a 14.1 server.
>> >
>> > Here's the real problem: on multiple production servers, downloading
>> > large files (or ZFS send/recv streams) was slow.  After ruling out
>> > many possible causes, wireshark revealed that the connection was
>> > suffering about 0.05% packet loss.  I don't know the source of that
>> > packet loss, but I don't believe it to be congestion-related.  Along
>> > with a 54ms RTT, that's a fatal combination for the throughput of
>> > loss-based congestion control algorithms.  According to the Mathis
>> > Formula [1], I could only expect 1.1 MBps over such a connection.
>> > That's actually worse than what I saw.  With default settings
>> > (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
>> > outdated, but that's still pretty close for such a simple formula
>> > that's 27 years old.
>> >
>> > So I benchmarked all available congestion control algorithms for
>> > single download streams.  The results are summarized in the table
>> > below.
>> >
>> > AlgoPacket Loss RateAverage Throughput
>> > vegas   0.05%   2.0 MBps
>> > newreno 0.05%   3.2 MBps
>> > cubic   0.05%   5.6 MBps
>> > hd  0.05%   8.6 MBps
>> > cdg 0.05%   13.5 MBps
>> > rack0.04%   14 MBps
>> > htcp0.05%   15 MBps
>> > dctcp   0.05%   15 MBps
>> > chd 0.05%   17.3 MBps
>> > bbr 0.05%   29.2 MBps
>> > cubic   10% 159 kBps
>> > chd 10% 208 kBps
>> > bbr 10% 5.7 MBps
>> >
>> > RACK seemed to achieve about the same maximum bandwidth as BBR, though
>> > it took a lot longer to get there.  Also, with RACK, wireshark
>> > reported about 10x as many retransmissions as dropped packets, which
>> > is suspicious.
>> >
>> > At one point, something went haywire and packet loss briefly spiked to
>> > the neighborhood of 10%.  I took advantage of the chaos to repeat my
>> > measurements.  As the table shows, all algorithms sucked under those
>> > conditions, but BBR sucked impressively less than the others.
>> >
>> > Disclaimer: there was significant run-to-run variation; the presented
>> > results are averages.  And I did not attempt to measure packet loss
>> > exactly for most runs; 0.05% is merely an average of a few selected
>> > runs.  These measurements were taken on a production server running a
>> > real workload, which introduces noise.  Soon I hope to have the
>> > opportunity to repeat the experiment on an idle server in the same
>> > environment.
>> >
>> > In conclusion, while we'd like to use BBR, we really can't until we
>> > upgrade to 14.1, which hopefully will be soon.  So in the meantime
>> > we've switched all relevant servers from cubic to chd, and we'll
>> > reevaluate BBR after the upgrade.
>> Hi Alan,
>>
>> just to be clear: the version of BBR currently implemented is
>> BBR version 1, which is known to be unfair in certain scenarios.
>> Google is still working on BBR to address this problem and improve
>> it in other aspects. But there is no RFC yet and the updates haven't
>> been implemented yet in FreeBSD.
>>
>> Best regards
>> Michael
>> >
>> > [1]: https://www.slac.stanford.e

Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread Alan Somers
On Wed, Jul 17, 2024 at 2:27 PM  wrote:
>
> > On 17. Jul 2024, at 22:00, Alan Somers  wrote:
> >
> > On Sat, Jul 13, 2024 at 1:50 AM  wrote:
> >>
> >>> On 13. Jul 2024, at 01:43, Alan Somers  wrote:
> >>>
> >>> I've been experimenting with RACK and BBR.  In my environment, they
> >>> can dramatically improve single-stream TCP performance, which is
> >>> awesome.  But pf interferes.  I have to disable pf in order for them
> >>> to work at all.
> >>>
> >>> Is this a known limitation?  If not, I will experiment some more to
> >>> determine exactly what aspect of my pf configuration is responsible.
> >>> If so, can anybody suggest what changes would have to happen to make
> >>> the two compatible?
> >> A problem with same symptoms was already reported and fixed in
> >> https://reviews.freebsd.org/D43769
> >>
> >> Which version are you using?
> >>
> >> Best regards
> >> Michael
> >>>
> >>> -Alan
> >
> > TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> >
> > I want to follow up with the list to post my conclusions.  Firstly
> > tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> > incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> > confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> > or switch to a 14.1 server.
> >
> > Here's the real problem: on multiple production servers, downloading
> > large files (or ZFS send/recv streams) was slow.  After ruling out
> > many possible causes, wireshark revealed that the connection was
> > suffering about 0.05% packet loss.  I don't know the source of that
> > packet loss, but I don't believe it to be congestion-related.  Along
> > with a 54ms RTT, that's a fatal combination for the throughput of
> > loss-based congestion control algorithms.  According to the Mathis
> > Formula [1], I could only expect 1.1 MBps over such a connection.
> > That's actually worse than what I saw.  With default settings
> > (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> > outdated, but that's still pretty close for such a simple formula
> > that's 27 years old.
> >
> > So I benchmarked all available congestion control algorithms for
> > single download streams.  The results are summarized in the table
> > below.
> >
> > AlgoPacket Loss RateAverage Throughput
> > vegas   0.05%   2.0 MBps
> > newreno 0.05%   3.2 MBps
> > cubic   0.05%   5.6 MBps
> > hd  0.05%   8.6 MBps
> > cdg 0.05%   13.5 MBps
> > rack0.04%   14 MBps
> > htcp0.05%   15 MBps
> > dctcp   0.05%   15 MBps
> > chd 0.05%   17.3 MBps
> > bbr 0.05%   29.2 MBps
> > cubic   10% 159 kBps
> > chd 10% 208 kBps
> > bbr 10% 5.7 MBps
> >
> > RACK seemed to achieve about the same maximum bandwidth as BBR, though
> > it took a lot longer to get there.  Also, with RACK, wireshark
> > reported about 10x as many retransmissions as dropped packets, which
> > is suspicious.
> >
> > At one point, something went haywire and packet loss briefly spiked to
> > the neighborhood of 10%.  I took advantage of the chaos to repeat my
> > measurements.  As the table shows, all algorithms sucked under those
> > conditions, but BBR sucked impressively less than the others.
> >
> > Disclaimer: there was significant run-to-run variation; the presented
> > results are averages.  And I did not attempt to measure packet loss
> > exactly for most runs; 0.05% is merely an average of a few selected
> > runs.  These measurements were taken on a production server running a
> > real workload, which introduces noise.  Soon I hope to have the
> > opportunity to repeat the experiment on an idle server in the same
> > environment.
> >
> > In conclusion, while we'd like to use BBR, we really can't until we
> > upgrade to 14.1, which hopefully will be soon.  So in the meantime
> > we've switched all relevant servers from cubic to chd, and we'll
> > reevaluate BBR after the upgrade.
> Hi Alan,
>
> just to be clear: the version of BBR currently implemented is
> BBR version 1, which is known to be unfair in certain scenarios.
> Google is still working on BBR to address this problem and improve
> it in other aspects. But there is no RFC yet and the updates haven't
> been implemented yet in FreeBSD.

I've also heard that RACK suffers from fairness problems.  Do you know
how RACK and BBR compare for fairness?



Re: flushing default router list upon inet6 route flush

2024-07-18 Thread Mark Johnston
On Wed, Jul 17, 2024 at 09:19:53AM +0800, Zhenlei Huang wrote:
> 
> 
> > On Jul 17, 2024, at 4:04 AM, Mark Johnston  wrote:
> > 
> > Hello,
> > 
> > When IPv6 SLAAC is configured for an interface, the kernel will update
> > its default router list upon receipt of a router advertisement.  In so
> > doing it may install a default route; in the kernel this happens in
> > defrouter_addreq().
> > 
> > If one uses "route flush" or "service routing restart" to reset the
> > routing tables, the default router list is not purged, so a subsequent
> > RA from the original default router does not update the list, and so
> > does not re-create the default route, even if one re-runs rtsol(8).
> > 
> > This appears to be a bug, but I'm not sure where best to fix it.  Should
> > "service routing restart" invoke "ndp -R" to flush the default router
> > list?
> 
> That can be a workaround, but not the ideal fix.
> 
> > Should route(8) handle this as part of a flush command?
> 
> No, I do not think so. route(8) should handle the routing / FIB parts.
> The IPv6 default router list is maintained on a per-AF basis. Handling the
> default router list via route(8), i.e. from userland, seems more like a
> hack.
> 
> > Or
> > something else?
> 
> I'd propose that the kernel handle this situation, so that in other cases,
> such as `route -6 delete default` or a route change event from a NETLINK
> socket, the IPv6 SLAAC default router feature also works as expected.
> 
> To be precise, have `sys/netinet6/nd6_rtr.c` listen for route events and
> clear the `installed` flag when the previously installed default route is
> deleted, or maybe purge the whole default router list. Then the next time
> the kernel receives an RA it re-installs the default route.

Thank you for the hint.  It turns out that the kernel already does this,
but a bug was preventing it from working correctly.
https://reviews.freebsd.org/D46020 fixes the problem for me.
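
For systems without that fix, the workaround discussed above looks roughly
like this (igb0 is a placeholder interface name):

```
# service routing restart   # flushes the routes, but (pre-fix) not the default router list
# ndp -r                    # the stale default router entry is still listed here
# ndp -R                    # flush the default router list by hand
# rtsol igb0                # re-solicit an RA; the default route is installed again
```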

> This IMO may have a side effect: the user may really want to delete the
> default route without providing an explicit replacement. I think in that
> case the user should disable accepting RAs on the interface (e.g.
> ifconfig em0 inet6 no_radr, or turn on net.inet6.ip6.no_radr globally).

That sounds perfectly reasonable.

> How about this proposal ?
> 
> 
> Best regards,
> Zhenlei
> 



Re: Multiple Fibs and INET6

2024-07-18 Thread Santiago Martinez
Hi everyone,
Did anyone have the chance to take a look?
For me it's a bug, but before filing a PR I want to know your view, or whether
it's a limitation or a bug by design.
Br
Santi


> On 12 Jul 2024, at 19:06, Santiago Martinez  wrote:
> 
> 
> Hi Everyone.
> 
> While adding -F ( fib as used in netstat ) to ping and ping6 I have found 
> something that from my understanding is not correct.
> Please can you advise?
> I have the following setup :
> 
> -- two fibs (0 and 1) 
> -- two  loop-backs (lo0 and lo1).
> -- Lo1 has been assigned to fib1
> -- net.add_addr_allfibs = 0
> My interface output looks like this:
> 
> ifconfig lo0 | grep inet6
>inet6 ::1 prefixlen 128
>inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
> 
> ifconfig lo1 | grep inet6
>inet6 fe80::1%lo1 prefixlen 64 scopeid 0x3
> 
> 
> If I do netstat -rn -6 -F0 I get the following, which is what I expected.
> 
> Internet6:
> Destination                       Gateway                       Flags     Netif Expire
> ::/96                             link#2                        URS       lo0
> ::1                               link#2                        UHS       lo0
> :::0.0.0.0/96                     link#2                        URS       lo0
> fe80::%lo0/10                     link#2                        URS       lo0
> fe80::%lo0/64                     link#2                        U         lo0
> fe80::1%lo0                       link#2                        UHS       lo0
> ff02::/16                         link#2                        URS       lo0
> 
> 
> Now,  netstat -rn -6  -F1 shows  "fe80::1%lo0" which should not be there and 
> "fe80::1%lo1" is missing which should be there.
> Internet6:
> Destination                       Gateway                       Flags     Netif Expire
> fe80::%lo1/64                     link#3                        U         lo1
> fe80::1%lo0                       link#2                        UHS       lo0
> 
> 
> What output I was expecting was:
> Internet6:
> Destination                       Gateway                       Flags     Netif Expire
> fe80::%lo1/64                     link#3                        U         lo1
> fe80::1%lo1                       link#3                        UHS       lo1
> 
> 
> 
> This makes the ping -6 -F0 fe80::1%lo0  to work but ping -6 -F1 fe80::1%l01 
> to fail which I wanted to use as test case.
> 
> Thanks in advance.
> 
> Santiago
> 
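
For reference, the setup quoted above can be recreated with stock tools
roughly as follows; setfib(1) stands in for the proposed ping -F option, and
lo1 is assumed to pick up fe80::1%lo1 automatically when brought up, as in
the ifconfig output above:

```
# sysctl net.fibs=2                    # make sure a second FIB exists
# sysctl net.add_addr_allfibs=0
# ifconfig lo1 create
# ifconfig lo1 fib 1
# ifconfig lo1 inet6 -ifdisabled up
# netstat -6rnF 1                      # shows the fe80::1%lo0 entry described above
# setfib 1 ping6 -c3 fe80::1%lo1
```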


Re: Multiple Fibs and INET6

2024-07-18 Thread Zhenlei Huang


> On Jul 13, 2024, at 1:06 AM, Santiago Martinez  wrote:
> 
> Hi Everyone.
> 
> While adding -F ( fib as used in netstat ) to ping and ping6 I have found 
> something that from my understanding is not correct.
> Please can you advise?
> I have the following setup :
> 
> -- two fibs (0 and 1) 
> -- two  loop-backs (lo0 and lo1).
> -- Lo1 has been assigned to fib1
> -- net.add_addr_allfibs = 0
> My interface output looks like this:
> 
> 
> ifconfig lo0 | grep inet6 
>inet6 ::1 prefixlen 128 
>inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
> 
> ifconfig lo1 | grep inet6 
>inet6 fe80::1%lo1 prefixlen 64 scopeid 0x3
> 
> 
> If I do a netstat -rn -6  -F0 I get the following which is was i expected.
> 
> Internet6: 
> Destination   Gateway   Flags 
> Netif Expire 
> ::/96 link#2URS 
> lo0 
> ::1   link#2UHS 
> lo0 
> :::0.0.0.0/96 link#2URS 
> lo0 
> fe80::%lo0/10 link#2URS 
> lo0 
> fe80::%lo0/64 link#2U   
> lo0 
> fe80::1%lo0   link#2UHS 
> lo0 
> ff02::/16 link#2URS 
> lo0
> 
> 
> Now,  netstat -rn -6  -F1 shows  "fe80::1%lo0" which should not be there and 
> "fe80::1%lo1" is missing which should be there.
> 
> Internet6: 
> Destination   Gateway   Flags 
> Netif Expire 
> fe80::%lo1/64 link#3U   
> lo1 
> fe80::1%lo0   link#2UHS 
> lo0
> 
That seems wrong at first glance. IIRC, there's a HACK (I'd prefer this) for
the loopback route. For example
```
# sysctl net.fibs=3
net.fibs: 2 -> 3
# ifconfig epair create
# epair0a
# ifconfig epair0a fib 2
# ifconfig epair0a inet6 -ifdisabled up
# netstat -6rnF 2
Routing tables (fib: 2)

Internet6:
Destination                       Gateway                       Flags     Netif Expire
fe80::%epair0a/64                 link#5                        U         epair0a
fe80::3b:b3ff:fe8f:9a0a%lo0       link#1                        UHS       lo0
```

The loopback route always refers to the first loopback interface, aka lo0.
> 
> 
> What output I was expecting was:
> Internet6: 
> Destination   Gateway   Flags 
> Netif Expire 
> fe80::%lo1/64 link#3U   
> lo1 
> fe80::1%lo1   link#3UHS 
> lo1
> 
> 
> 
> This makes the ping -6 -F0 fe80::1%lo0  to work but ping -6 -F1 fe80::1%l01 
> to fail which I wanted to use as test case.
> 
That is interesting. I can ping without failure.

```
# setfib 1 ping6 -c3 fe80::1%lo1
PING(56=40+8+8 bytes) fe80::1%lo1 --> fe80::1%lo1
16 bytes from fe80::1%lo1, icmp_seq=0 hlim=64 time=0.050 ms
16 bytes from fe80::1%lo1, icmp_seq=1 hlim=64 time=0.067 ms
16 bytes from fe80::1%lo1, icmp_seq=2 hlim=64 time=0.096 ms

--- fe80::1%lo1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.050/0.071/0.096/0.019 ms
```

Best regards,
Zhenlei

> Thanks in advance.
> 
> Santiago
> 
> 





Re: Multiple Fibs and INET6

2024-07-18 Thread Santiago Martinez

Interesting, I'm running 14.1p2.

How does your routing table look for fib1?

Santi


On 7/18/24 18:09, Zhenlei Huang wrote:



On Jul 13, 2024, at 1:06 AM, Santiago Martinez  
wrote:


Hi Everyone.

While adding -F ( fib as used in netstat ) to ping and ping6 I have 
found something that from my understanding is not correct.

Please can you advise?

I have the following setup :

-- two fibs (0 and 1)
-- two  loop-backs (lo0 and lo1).
-- Lo1 has been assigned to fib1
--net.add_addr_allfibs = 0

My interface output looks like this:


ifconfig lo0 | grep inet6
   inet6 ::1 prefixlen 128
   inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2

ifconfig lo1 | grep inet6
   inet6 fe80::1%lo1 prefixlen 64 scopeid 0x3

If I do a netstat -rn -6  -F0 I get the following which is was i 
expected.


Internet6:
Destination   Gateway   Flags 
Netif Expire
::/96 link#2    URS 
lo0
::1   link#2    UHS 
lo0
:::0.0.0.0/96 link#2    URS 
lo0
fe80::%lo0/10 link#2    URS 
lo0
fe80::%lo0/64 link#2    U 
  lo0
fe80::1%lo0   link#2    UHS 
lo0
ff02::/16 link#2    URS 
lo0


Now,  netstat -rn -6  -F1 shows  "fe80::1%lo0" which should not be 
there and "fe80::1%lo1" is missing which should be there.


Internet6:
Destination   Gateway   Flags 
Netif Expire
fe80::%lo1/64 link#3    U 
  lo1
*fe80::1%lo0   link#2    UHS 
lo0*


That seems wrong from my first glance. IIRC, there's HACK ( I'd prefer 
this ) for loopback route. For example

```
# sysctl net.fibs=3
net.fibs: 2 -> 3
# ifconfig epair create
# epair0a
# ifconfig epair0a fib 2
# ifconfig epair0a inet6 -ifdisabled up
# netstat -6rnF 2
Routing tables (fib: 2)

Internet6:
Destination                     Gateway                       Flags   
Netif Expire

fe80::%epair0a/64                 link#5                        U epair0a
fe80::3b:b3ff:fe8f:9a0a%lo0       link#1                        UHS   
      lo0

```

The loopback route always refer the first loop interface, aka lo0.



What output I was expecting was:

Internet6:
Destination   Gateway   Flags 
Netif Expire
fe80::%lo1/64 link#3    U 
  lo1
*fe80::1%lo1   link#3    UHS 
lo1*



This makes the ping -6 -F0 fe80::1%lo0  to work but ping -6 -F1 
fe80::1%l01 to fail which I wanted to use as test case.



That is interesting. I can ping without failure.

```
# setfib 1 ping6 -c3 fe80::1%lo1
PING(56=40+8+8 bytes) fe80::1%lo1 --> fe80::1%lo1
16 bytes from fe80::1%lo1, icmp_seq=0 hlim=64 time=0.050 ms
16 bytes from fe80::1%lo1, icmp_seq=1 hlim=64 time=0.067 ms
16 bytes from fe80::1%lo1, icmp_seq=2 hlim=64 time=0.096 ms

--- fe80::1%lo1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.050/0.071/0.096/0.019 ms
```

Best regards,
Zhenlei


Thanks in advance.

Santiago







Re: Multiple Fibs and INET6

2024-07-18 Thread Zhenlei Huang


> On Jul 19, 2024, at 12:11 AM, Santiago Martinez  wrote:
> 
> Interesting, I'm running 14.1p2.
> 
> 

Yes, I'm running exactly the same version as you.
> How does your routing table look for fib1?
> 
> 

```
# netstat -6rnF 1
Routing tables (fib: 1)

Internet6:
Destination                       Gateway                       Flags     Netif Expire
fe80::%lo1/64                     link#5                        U         lo1
fe80::1%lo0                       link#2                        UHS       lo0
```
> Santi
> 
> 
> 
> On 7/18/24 18:09, Zhenlei Huang wrote:
>> 
>> 
>>> On Jul 13, 2024, at 1:06 AM, Santiago Martinez >> > wrote:
>>> 
>>> Hi Everyone.
>>> 
>>> While adding -F ( fib as used in netstat ) to ping and ping6 I have found 
>>> something that from my understanding is not correct.
>>> Please can you advise?
>>> I have the following setup :
>>> 
>>> -- two fibs (0 and 1) 
>>> -- two  loop-backs (lo0 and lo1).
>>> -- Lo1 has been assigned to fib1
>>> -- net.add_addr_allfibs = 0
>>> My interface output looks like this:
>>> 
>>> 
>>> ifconfig lo0 | grep inet6 
>>>inet6 ::1 prefixlen 128 
>>>inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
>>> 
>>> ifconfig lo1 | grep inet6 
>>>inet6 fe80::1%lo1 prefixlen 64 scopeid 0x3
>>> 
>>> 
>>> If I do a netstat -rn -6  -F0 I get the following which is was i expected.
>>> 
>>> Internet6: 
>>> Destination   Gateway   Flags 
>>> Netif Expire 
>>> ::/96 link#2URS 
>>> lo0 
>>> ::1   link#2UHS 
>>> lo0 
>>> :::0.0.0.0/96 link#2URS 
>>> lo0 
>>> fe80::%lo0/10 link#2URS 
>>> lo0 
>>> fe80::%lo0/64 link#2U   
>>> lo0 
>>> fe80::1%lo0   link#2UHS 
>>> lo0 
>>> ff02::/16 link#2URS 
>>> lo0
>>> 
>>> 
>>> Now,  netstat -rn -6  -F1 shows  "fe80::1%lo0" which should not be there 
>>> and "fe80::1%lo1" is missing which should be there.
>>> 
>>> Internet6: 
>>> Destination   Gateway   Flags 
>>> Netif Expire 
>>> fe80::%lo1/64 link#3U   
>>> lo1 
>>> fe80::1%lo0   link#2UHS 
>>> lo0
>>> 
>> That seems wrong from my first glance. IIRC, there's HACK ( I'd prefer this 
>> ) for loopback route. For example
>> ```
>> # sysctl net.fibs=3
>> net.fibs: 2 -> 3
>> # ifconfig epair create
>> # epair0a
>> # ifconfig epair0a fib 2
>> # ifconfig epair0a inet6 -ifdisabled up
>> # netstat -6rnF 2
>> Routing tables (fib: 2)
>> 
>> Internet6:
>> Destination   Gateway   Flags 
>> Netif Expire
>> fe80::%epair0a/64 link#5U   
>> epair0a
>> fe80::3b:b3ff:fe8f:9a0a%lo0   link#1UHS 
>> lo0
>> ```
>> 
>> The loopback route always refer the first loop interface, aka lo0.  
>>> 
>>> 
>>> What output I was expecting was:
>>> Internet6: 
>>> Destination   Gateway   Flags 
>>> Netif Expire 
>>> fe80::%lo1/64 link#3U   
>>> lo1 
>>> fe80::1%lo1   link#3UHS 
>>> lo1
>>> 
>>> 
>>> 
>>> This makes the ping -6 -F0 fe80::1%lo0  to work but ping -6 -F1 fe80::1%l01 
>>> to fail which I wanted to use as test case.
>>> 
>> That is interesting. I can ping without failure.
>> 
>> ```
>> # setfib 1 ping6 -c3 fe80::1%lo1
>> PING(56=40+8+8 bytes) fe80::1%lo1 --> fe80::1%lo1
>> 16 bytes from fe80::1%lo1, icmp_seq=0 hlim=64 time=0.050 ms
>> 16 bytes from fe80::1%lo1, icmp_seq=1 hlim=64 time=0.067 ms
>> 16 bytes from fe80::1%lo1, icmp_seq=2 hlim=64 time=0.096 ms
>> 
>> --- fe80::1%lo1 ping statistics ---
>> 3 packets transmitted, 3 packets received, 0.0% packet loss
>> round-trip min/avg/max/stddev = 0.050/0.071/0.096/0.019 ms
>> ```
>> 
>> Best regards,
>> Zhenlei
>> 
>>> 
>>> Thanks in advance.
>>> 
>>> Santiago
>>> 
>>> 
>> 
>> 
>> 





Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread tuexen
> On 18. Jul 2024, at 15:00, Junho Choi  wrote:
> 
> Alan - this is a great result to see. Thanks for experimenting.
> 
> Just curious why bbr and rack don't co-exist? Those are two separate things.
> Is it a current bug or by design?
Technically RACK and BBR can coexist. The problem was with pf and/or LRO.

But this is all fixed now in 14.1 and head.

Best regards
Michael
> 
> BR,
> 
> On Thu, Jul 18, 2024 at 5:27 AM  wrote:
>> On 17. Jul 2024, at 22:00, Alan Somers  wrote:
>> 
>> On Sat, Jul 13, 2024 at 1:50 AM  wrote:
>>> 
 On 13. Jul 2024, at 01:43, Alan Somers  wrote:
 
 I've been experimenting with RACK and BBR.  In my environment, they
 can dramatically improve single-stream TCP performance, which is
 awesome.  But pf interferes.  I have to disable pf in order for them
 to work at all.
 
 Is this a known limitation?  If not, I will experiment some more to
 determine exactly what aspect of my pf configuration is responsible.
 If so, can anybody suggest what changes would have to happen to make
 the two compatible?
>>> A problem with same symptoms was already reported and fixed in
>>> https://reviews.freebsd.org/D43769
>>> 
>>> Which version are you using?
>>> 
>>> Best regards
>>> Michael
 
 -Alan
>> 
>> TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
>> 
>> I want to follow up with the list to post my conclusions.  Firstly
>> tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
>> incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
>> confirm that tcp_bbr works for me if I either disable LRO, disable PF,
>> or switch to a 14.1 server.
>> 
>> Here's the real problem: on multiple production servers, downloading
>> large files (or ZFS send/recv streams) was slow.  After ruling out
>> many possible causes, wireshark revealed that the connection was
>> suffering about 0.05% packet loss.  I don't know the source of that
>> packet loss, but I don't believe it to be congestion-related.  Along
>> with a 54ms RTT, that's a fatal combination for the throughput of
>> loss-based congestion control algorithms.  According to the Mathis
>> Formula [1], I could only expect 1.1 MBps over such a connection.
>> That's actually worse than what I saw.  With default settings
>> (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
>> outdated, but that's still pretty close for such a simple formula
>> that's 27 years old.
>> 
>> So I benchmarked all available congestion control algorithms for
>> single download streams.  The results are summarized in the table
>> below.
>> 
>> AlgoPacket Loss RateAverage Throughput
>> vegas   0.05%   2.0 MBps
>> newreno 0.05%   3.2 MBps
>> cubic   0.05%   5.6 MBps
>> hd  0.05%   8.6 MBps
>> cdg 0.05%   13.5 MBps
>> rack0.04%   14 MBps
>> htcp0.05%   15 MBps
>> dctcp   0.05%   15 MBps
>> chd 0.05%   17.3 MBps
>> bbr 0.05%   29.2 MBps
>> cubic   10% 159 kBps
>> chd 10% 208 kBps
>> bbr 10% 5.7 MBps
>> 
>> RACK seemed to achieve about the same maximum bandwidth as BBR, though
>> it took a lot longer to get there.  Also, with RACK, wireshark
>> reported about 10x as many retransmissions as dropped packets, which
>> is suspicious.
>> 
>> At one point, something went haywire and packet loss briefly spiked to
>> the neighborhood of 10%.  I took advantage of the chaos to repeat my
>> measurements.  As the table shows, all algorithms sucked under those
>> conditions, but BBR sucked impressively less than the others.
>> 
>> Disclaimer: there was significant run-to-run variation; the presented
>> results are averages.  And I did not attempt to measure packet loss
>> exactly for most runs; 0.05% is merely an average of a few selected
>> runs.  These measurements were taken on a production server running a
>> real workload, which introduces noise.  Soon I hope to have the
>> opportunity to repeat the experiment on an idle server in the same
>> environment.
>> 
>> In conclusion, while we'd like to use BBR, we really can't until we
>> upgrade to 14.1, which hopefully will be soon.  So in the meantime
>> we've switched all relevant servers from cubic to chd, and we'll
>> reevaluate BBR after the upgrade.
> Hi Alan,
> 
> just to be clear: the version of BBR currently implemented is
> BBR version 1, which is known to be unfair in certain scenarios.
> Google is still working on BBR to address this problem and improve
> it in other aspects. But there is no RFC yet and the updates haven't
> been implemented yet in FreeBSD.
> 
> Best regards
> Michael
>> 
>> [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
>> 
>> -Alan
> 
> 
> 
> 
> -- 
> Junho Choi  | https://saturnsoft.net




Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread tuexen



> On 18. Jul 2024, at 16:03, Alan Somers  wrote:
> 
> On Wed, Jul 17, 2024 at 2:27 PM  wrote:
>> 
>>> On 17. Jul 2024, at 22:00, Alan Somers  wrote:
>>> 
>>> On Sat, Jul 13, 2024 at 1:50 AM  wrote:
 
> On 13. Jul 2024, at 01:43, Alan Somers  wrote:
> 
> I've been experimenting with RACK and BBR.  In my environment, they
> can dramatically improve single-stream TCP performance, which is
> awesome.  But pf interferes.  I have to disable pf in order for them
> to work at all.
> 
> Is this a known limitation?  If not, I will experiment some more to
> determine exactly what aspect of my pf configuration is responsible.
> If so, can anybody suggest what changes would have to happen to make
> the two compatible?
 A problem with same symptoms was already reported and fixed in
 https://reviews.freebsd.org/D43769
 
 Which version are you using?
 
 Best regards
 Michael
> 
> -Alan
>>> 
>>> TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
>>> 
>>> I want to follow up with the list to post my conclusions.  Firstly
>>> tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
>>> incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
>>> confirm that tcp_bbr works for me if I either disable LRO, disable PF,
>>> or switch to a 14.1 server.
>>> 
>>> Here's the real problem: on multiple production servers, downloading
>>> large files (or ZFS send/recv streams) was slow.  After ruling out
>>> many possible causes, wireshark revealed that the connection was
>>> suffering about 0.05% packet loss.  I don't know the source of that
>>> packet loss, but I don't believe it to be congestion-related.  Along
>>> with a 54ms RTT, that's a fatal combination for the throughput of
>>> loss-based congestion control algorithms.  According to the Mathis
>>> Formula [1], I could only expect 1.1 MBps over such a connection.
>>> That's actually worse than what I saw.  With default settings
>>> (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
>>> outdated, but that's still pretty close for such a simple formula
>>> that's 27 years old.
>>> 
>>> So I benchmarked all available congestion control algorithms for
>>> single download streams.  The results are summarized in the table
>>> below.
>>> 
>>> AlgoPacket Loss RateAverage Throughput
>>> vegas   0.05%   2.0 MBps
>>> newreno 0.05%   3.2 MBps
>>> cubic   0.05%   5.6 MBps
>>> hd  0.05%   8.6 MBps
>>> cdg 0.05%   13.5 MBps
>>> rack0.04%   14 MBps
>>> htcp0.05%   15 MBps
>>> dctcp   0.05%   15 MBps
>>> chd 0.05%   17.3 MBps
>>> bbr 0.05%   29.2 MBps
>>> cubic   10% 159 kBps
>>> chd 10% 208 kBps
>>> bbr 10% 5.7 MBps
>>> 
>>> RACK seemed to achieve about the same maximum bandwidth as BBR, though
>>> it took a lot longer to get there.  Also, with RACK, wireshark
>>> reported about 10x as many retransmissions as dropped packets, which
>>> is suspicious.
>>> 
>>> At one point, something went haywire and packet loss briefly spiked to
>>> the neighborhood of 10%.  I took advantage of the chaos to repeat my
>>> measurements.  As the table shows, all algorithms sucked under those
>>> conditions, but BBR sucked impressively less than the others.
>>> 
>>> Disclaimer: there was significant run-to-run variation; the presented
>>> results are averages.  And I did not attempt to measure packet loss
>>> exactly for most runs; 0.05% is merely an average of a few selected
>>> runs.  These measurements were taken on a production server running a
>>> real workload, which introduces noise.  Soon I hope to have the
>>> opportunity to repeat the experiment on an idle server in the same
>>> environment.
>>> 
>>> In conclusion, while we'd like to use BBR, we really can't until we
>>> upgrade to 14.1, which hopefully will be soon.  So in the meantime
>>> we've switched all relevant servers from cubic to chd, and we'll
>>> reevaluate BBR after the upgrade.
>> Hi Alan,
>> 
>> just to be clear: the version of BBR currently implemented is
>> BBR version 1, which is known to be unfair in certain scenarios.
>> Google is still working on BBR to address this problem and improve
>> it in other aspects. But there is no RFC yet and the updates haven't
>> been implemented yet in FreeBSD.
> 
> I've also heard that RACK suffers from fairness problems.  Do you know
> how RACK and BBR compare for fairness?
RACK should be fair; BBR (version 1) is known not to be fair...

Best regards
Michael




Re: Multiple Fibs and INET6

2024-07-18 Thread Santiago Martinez
Indeed, ping does work if I ping "fe80::1%lo1" on FIB 1, which is
correct.

My script was getting the address from the routing table output (-F1),
which returns "%lo0" instead of the correct loopback interface (lo6 in
my case), and as a result it was failing.

The routing table should return the correct loopback interface instead
of lo0. Not sure how difficult or easy that is; I will take a look (ENOCLUE).


Best regards.

Santi


On 7/18/24 18:15, Zhenlei Huang wrote:



On Jul 19, 2024, at 12:11 AM, Santiago Martinez  
wrote:


Interesting, I'm running 14.1p2.




Yes, I'm running exactly the same version as you.


How does your routing table look for fib1?




```
# netstat -6rnF 1
Routing tables (fib: 1)

Internet6:
Destination                       Gateway     Flags     Netif Expire
fe80::%lo1/64                     link#5      U           lo1
fe80::1%lo0                       link#2      UHS         lo0
```


Santi


On 7/18/24 18:09, Zhenlei Huang wrote:



On Jul 13, 2024, at 1:06 AM, Santiago Martinez 
 wrote:


Hi Everyone.

While adding -F ( fib as used in netstat ) to ping and ping6 I have 
found something that from my understanding is not correct.

Please can you advise?

I have the following setup :

-- two fibs (0 and 1)
-- two  loop-backs (lo0 and lo1).
-- Lo1 has been assigned to fib1
--net.add_addr_allfibs = 0

My interface output looks like this:


ifconfig lo0 | grep inet6
   inet6 ::1 prefixlen 128
   inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2

ifconfig lo1 | grep inet6
   inet6 fe80::1%lo1 prefixlen 64 scopeid 0x3

If I do a netstat -rn -6  -F0 I get the following which is was i 
expected.


Internet6:
Destination   Gateway 
  Flags Netif Expire
::/96 link#2    URS 
lo0
::1   link#2    UHS 
lo0
:::0.0.0.0/96 link#2    URS 
lo0
fe80::%lo0/10 link#2    URS 
lo0
fe80::%lo0/64 link#2    U 
  lo0
fe80::1%lo0   link#2    UHS 
lo0
ff02::/16 link#2    URS 
lo0


Now,  netstat -rn -6  -F1 shows  "fe80::1%lo0" which should not be 
there and "fe80::1%lo1" is missing which should be there.


Internet6:
Destination   Gateway 
  Flags Netif Expire
fe80::%lo1/64 link#3    U 
  lo1
*fe80::1%lo0   link#2 
   UHS lo0*


That seems wrong from my first glance. IIRC, there's HACK ( I'd 
prefer this ) for loopback route. For example

```
# sysctl net.fibs=3
net.fibs: 2 -> 3
# ifconfig epair create
# epair0a
# ifconfig epair0a fib 2
# ifconfig epair0a inet6 -ifdisabled up
# netstat -6rnF 2
Routing tables (fib: 2)

Internet6:
Destination                       Gateway     Flags     Netif Expire
fe80::%epair0a/64                 link#5                        U   
    epair0a

fe80::3b:b3ff:fe8f:9a0a%lo0       link#1                        UHS lo0
```

The loopback route always refer the first loop interface, aka lo0.



What output I was expecting was:

Internet6:
Destination   Gateway 
  Flags Netif Expire
fe80::%lo1/64 link#3    U 
  lo1
*fe80::1%lo1   link#3    
UHS lo1*



This makes the ping -6 -F0 fe80::1%lo0  to work but ping -6 -F1 
fe80::1%l01 to fail which I wanted to use as test case.



That is interesting. I can ping without failure.

```
# setfib 1 ping6 -c3 fe80::1%lo1
PING(56=40+8+8 bytes) fe80::1%lo1 --> fe80::1%lo1
16 bytes from fe80::1%lo1, icmp_seq=0 hlim=64 time=0.050 ms
16 bytes from fe80::1%lo1, icmp_seq=1 hlim=64 time=0.067 ms
16 bytes from fe80::1%lo1, icmp_seq=2 hlim=64 time=0.096 ms

--- fe80::1%lo1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.050/0.071/0.096/0.019 ms
```

Best regards,
Zhenlei



Thanks in advance.

Santiago











Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread Alan Somers
Coexist how?  Do you mean that one socket can use one and a different
socket uses the other?  That makes sense.

On Thu, Jul 18, 2024 at 10:34 AM  wrote:
>
> > On 18. Jul 2024, at 15:00, Junho Choi  wrote:
> >
> > Alan - this is a great result to see. Thanks for experimenting.
> >
> > Just curious why bbr and rack don't co-exist? Those are two separate things.
> > Is it a current bug or by design?
> Technically RACK and BBR can coexist. The problem was with pf and/or LRO.
>
> But this is all fixed now in 14.1 and head.
>
> Best regards
> Michael
> >
> > BR,
> >
> > On Thu, Jul 18, 2024 at 5:27 AM  wrote:
> >> On 17. Jul 2024, at 22:00, Alan Somers  wrote:
> >>
> >> On Sat, Jul 13, 2024 at 1:50 AM  wrote:
> >>>
>  On 13. Jul 2024, at 01:43, Alan Somers  wrote:
> 
>  I've been experimenting with RACK and BBR.  In my environment, they
>  can dramatically improve single-stream TCP performance, which is
>  awesome.  But pf interferes.  I have to disable pf in order for them
>  to work at all.
> 
>  Is this a known limitation?  If not, I will experiment some more to
>  determine exactly what aspect of my pf configuration is responsible.
>  If so, can anybody suggest what changes would have to happen to make
>  the two compatible?
> >>> A problem with same symptoms was already reported and fixed in
> >>> https://reviews.freebsd.org/D43769
> >>>
> >>> Which version are you using?
> >>>
> >>> Best regards
> >>> Michael
> 
>  -Alan
> >>
> >> TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> >>
> >> I want to follow up with the list to post my conclusions.  Firstly
> >> tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> >> incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> >> confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> >> or switch to a 14.1 server.
> >>
> >> Here's the real problem: on multiple production servers, downloading
> >> large files (or ZFS send/recv streams) was slow.  After ruling out
> >> many possible causes, wireshark revealed that the connection was
> >> suffering about 0.05% packet loss.  I don't know the source of that
> >> packet loss, but I don't believe it to be congestion-related.  Along
> >> with a 54ms RTT, that's a fatal combination for the throughput of
> >> loss-based congestion control algorithms.  According to the Mathis
> >> Formula [1], I could only expect 1.1 MBps over such a connection.
> >> That's actually worse than what I saw.  With default settings
> >> (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> >> outdated, but that's still pretty close for such a simple formula
> >> that's 27 years old.
> >>
> >> So I benchmarked all available congestion control algorithms for
> >> single download streams.  The results are summarized in the table
> >> below.
> >>
> >> AlgoPacket Loss RateAverage Throughput
> >> vegas   0.05%   2.0 MBps
> >> newreno 0.05%   3.2 MBps
> >> cubic   0.05%   5.6 MBps
> >> hd  0.05%   8.6 MBps
> >> cdg 0.05%   13.5 MBps
> >> rack0.04%   14 MBps
> >> htcp0.05%   15 MBps
> >> dctcp   0.05%   15 MBps
> >> chd 0.05%   17.3 MBps
> >> bbr 0.05%   29.2 MBps
> >> cubic   10% 159 kBps
> >> chd 10% 208 kBps
> >> bbr 10% 5.7 MBps
> >>
> >> RACK seemed to achieve about the same maximum bandwidth as BBR, though
> >> it took a lot longer to get there.  Also, with RACK, wireshark
> >> reported about 10x as many retransmissions as dropped packets, which
> >> is suspicious.
> >>
> >> At one point, something went haywire and packet loss briefly spiked to
> >> the neighborhood of 10%.  I took advantage of the chaos to repeat my
> >> measurements.  As the table shows, all algorithms sucked under those
> >> conditions, but BBR sucked impressively less than the others.
> >>
> >> Disclaimer: there was significant run-to-run variation; the presented
> >> results are averages.  And I did not attempt to measure packet loss
> >> exactly for most runs; 0.05% is merely an average of a few selected
> >> runs.  These measurements were taken on a production server running a
> >> real workload, which introduces noise.  Soon I hope to have the
> >> opportunity to repeat the experiment on an idle server in the same
> >> environment.
> >>
> >> In conclusion, while we'd like to use BBR, we really can't until we
> >> upgrade to 14.1, which hopefully will be soon.  So in the meantime
> >> we've switched all relevant servers from cubic to chd, and we'll
> >> reevaluate BBR after the upgrade.
> > Hi Alan,
> >
> > just to be clear: the version of BBR currently implemented is
> > BBR version 1, which is known to be unfair in certain scenarios.
> > Google is still working on BBR to address this problem and improve
> > it in other aspects. But there i

Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread tuexen
> On 18. Jul 2024, at 20:37, Alan Somers  wrote:
> 
> Coexist how?  Do you mean that one socket can use one and a different
> socket uses the other?  That makes sense.
Correct.

Best regards
Michael
> 
> On Thu, Jul 18, 2024 at 10:34 AM  wrote:
>> 
>>> On 18. Jul 2024, at 15:00, Junho Choi  wrote:
>>> 
>>> Alan - this is a great result to see. Thanks for experimenting.
>>> 
>>> Just curious why bbr and rack don't co-exist? Those are two separate things.
>>> Is it a current bug or by design?
>> Technically RACK and BBR can coexist. The problem was with pf and/or LRO.
>> 
>> But this is all fixed now in 14.1 and head.
>> 
>> Best regards
>> Michael
>>> 
>>> BR,
>>> 
>>> On Thu, Jul 18, 2024 at 5:27 AM  wrote:
 On 17. Jul 2024, at 22:00, Alan Somers  wrote:
 
 On Sat, Jul 13, 2024 at 1:50 AM  wrote:
> 
>> On 13. Jul 2024, at 01:43, Alan Somers  wrote:
>> 
>> I've been experimenting with RACK and BBR.  In my environment, they
>> can dramatically improve single-stream TCP performance, which is
>> awesome.  But pf interferes.  I have to disable pf in order for them
>> to work at all.
>> 
>> Is this a known limitation?  If not, I will experiment some more to
>> determine exactly what aspect of my pf configuration is responsible.
>> If so, can anybody suggest what changes would have to happen to make
>> the two compatible?
> A problem with same symptoms was already reported and fixed in
> https://reviews.freebsd.org/D43769
> 
> Which version are you using?
> 
> Best regards
> Michael
>> 
>> -Alan
 
 TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
 
 I want to follow up with the list to post my conclusions.  Firstly
 tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
 incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
 confirm that tcp_bbr works for me if I either disable LRO, disable PF,
 or switch to a 14.1 server.
 
 Here's the real problem: on multiple production servers, downloading
 large files (or ZFS send/recv streams) was slow.  After ruling out
 many possible causes, wireshark revealed that the connection was
 suffering about 0.05% packet loss.  I don't know the source of that
 packet loss, but I don't believe it to be congestion-related.  Along
 with a 54ms RTT, that's a fatal combination for the throughput of
 loss-based congestion control algorithms.  According to the Mathis
 Formula [1], I could only expect 1.1 MBps over such a connection.
 That's actually worse than what I saw.  With default settings
 (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
 outdated, but that's still pretty close for such a simple formula
 that's 27 years old.
 
 So I benchmarked all available congestion control algorithms for
 single download streams.  The results are summarized in the table
 below.
 
 AlgoPacket Loss RateAverage Throughput
 vegas   0.05%   2.0 MBps
 newreno 0.05%   3.2 MBps
 cubic   0.05%   5.6 MBps
 hd  0.05%   8.6 MBps
 cdg 0.05%   13.5 MBps
 rack0.04%   14 MBps
 htcp0.05%   15 MBps
 dctcp   0.05%   15 MBps
 chd 0.05%   17.3 MBps
 bbr 0.05%   29.2 MBps
 cubic   10% 159 kBps
 chd 10% 208 kBps
 bbr 10% 5.7 MBps
 
 RACK seemed to achieve about the same maximum bandwidth as BBR, though
 it took a lot longer to get there.  Also, with RACK, wireshark
 reported about 10x as many retransmissions as dropped packets, which
 is suspicious.
 
 At one point, something went haywire and packet loss briefly spiked to
 the neighborhood of 10%.  I took advantage of the chaos to repeat my
 measurements.  As the table shows, all algorithms sucked under those
 conditions, but BBR sucked impressively less than the others.
 
 Disclaimer: there was significant run-to-run variation; the presented
 results are averages.  And I did not attempt to measure packet loss
 exactly for most runs; 0.05% is merely an average of a few selected
 runs.  These measurements were taken on a production server running a
 real workload, which introduces noise.  Soon I hope to have the
 opportunity to repeat the experiment on an idle server in the same
 environment.
 
 In conclusion, while we'd like to use BBR, we really can't until we
 upgrade to 14.1, which hopefully will be soon.  So in the meantime
 we've switched all relevant servers from cubic to chd, and we'll
 reevaluate BBR after the upgrade.
>>> Hi Alan,
>>> 
>>> just to be clear: the version of BBR currently implemented is
>>> BBR version 1, which is known to be unfair in cert

Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

2024-07-18 Thread Junho Choi
RACK is a loss-detection algorithm and BBR is a congestion-control
algorithm, so they sit at different layers.
E.g. Linux can configure them independently.

However, in FreeBSD it looks like both use the same configuration sysctl
(net.inet.tcp.functions_default=tcp_rack|tcp_bbr),
so it is not possible to set both.

Is there any plan to improve this? Or does tcp_bbr include tcp_rack's
loss-probe behavior?

A little confused.

Best,
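
For completeness, the system-wide knobs involved are the TCP function-block
sysctls below; per-connection selection uses the TCP_FUNCTION_BLK socket
option rather than this default. The stack names are assumed to be the ones
listed by functions_available (typically "rack" and "bbr", not the module
names):

```
# kldload tcp_rack tcp_bbr                      # load the alternate stacks if needed
# sysctl net.inet.tcp.functions_available       # lists every registered TCP stack
# sysctl net.inet.tcp.functions_default=rack    # default stack for new connections
```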


On Fri, Jul 19, 2024 at 4:23 AM  wrote:

> > On 18. Jul 2024, at 20:37, Alan Somers  wrote:
> >
> > Coexist how?  Do you mean that one socket can use one and a different
> > socket uses the other?  That makes sense.
> Correct.
>
> Best regards
> Michael
> >
> > On Thu, Jul 18, 2024 at 10:34 AM  wrote:
> >>
> >>> On 18. Jul 2024, at 15:00, Junho Choi  wrote:
> >>>
> >>> Alan - this is a great result to see. Thanks for experimenting.
> >>>
> >>> Just curious why bbr and rack don't co-exist? Those are two separate
> things.
> >>> Is it a current bug or by design?
> >> Technically RACK and BBR can coexist. The problem was with pf and/or
> LRO.
> >>
> >> But this is all fixed now in 14.1 and head.
> >>
> >> Best regards
> >> Michael
> >>>
> >>> BR,
> >>>
> >>> On Thu, Jul 18, 2024 at 5:27 AM  wrote:
>  On 17. Jul 2024, at 22:00, Alan Somers  wrote:
> 
>  On Sat, Jul 13, 2024 at 1:50 AM  wrote:
> >
> >> On 13. Jul 2024, at 01:43, Alan Somers  wrote:
> >>
> >> I've been experimenting with RACK and BBR.  In my environment, they
> >> can dramatically improve single-stream TCP performance, which is
> >> awesome.  But pf interferes.  I have to disable pf in order for them
> >> to work at all.
> >>
> >> Is this a known limitation?  If not, I will experiment some more to
> >> determine exactly what aspect of my pf configuration is responsible.
> >> If so, can anybody suggest what changes would have to happen to make
> >> the two compatible?
> > A problem with same symptoms was already reported and fixed in
> > https://reviews.freebsd.org/D43769
> >
> > Which version are you using?
> >
> > Best regards
> > Michael
> >>
> >> -Alan
> 
>  TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> 
>  I want to follow up with the list to post my conclusions.  Firstly
>  tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
>  incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
>  confirm that tcp_bbr works for me if I either disable LRO, disable PF,
>  or switch to a 14.1 server.
> 
>  Here's the real problem: on multiple production servers, downloading
>  large files (or ZFS send/recv streams) was slow.  After ruling out
>  many possible causes, wireshark revealed that the connection was
>  suffering about 0.05% packet loss.  I don't know the source of that
>  packet loss, but I don't believe it to be congestion-related.  Along
>  with a 54ms RTT, that's a fatal combination for the throughput of
>  loss-based congestion control algorithms.  According to the Mathis
>  Formula [1], I could only expect 1.1 MBps over such a connection.
>  That's actually worse than what I saw.  With default settings
>  (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
>  outdated, but that's still pretty close for such a simple formula
>  that's 27 years old.
> 
>  So I benchmarked all available congestion control algorithms for
>  single download streams.  The results are summarized in the table
>  below.
> 
>  AlgoPacket Loss RateAverage Throughput
>  vegas   0.05%   2.0 MBps
>  newreno 0.05%   3.2 MBps
>  cubic   0.05%   5.6 MBps
>  hd  0.05%   8.6 MBps
>  cdg 0.05%   13.5 MBps
>  rack0.04%   14 MBps
>  htcp0.05%   15 MBps
>  dctcp   0.05%   15 MBps
>  chd 0.05%   17.3 MBps
>  bbr 0.05%   29.2 MBps
>  cubic   10% 159 kBps
>  chd 10% 208 kBps
>  bbr 10% 5.7 MBps
> 
>  RACK seemed to achieve about the same maximum bandwidth as BBR, though
>  it took a lot longer to get there.  Also, with RACK, wireshark
>  reported about 10x as many retransmissions as dropped packets, which
>  is suspicious.
> 
>  At one point, something went haywire and packet loss briefly spiked to
>  the neighborhood of 10%.  I took advantage of the chaos to repeat my
>  measurements.  As the table shows, all algorithms sucked under those
>  conditions, but BBR sucked impressively less than the others.
> 
>  Disclaimer: there was significant run-to-run variation; the presented
>  results are averages.  And I did not attempt to measure packet loss
>  exactly for most runs; 0.05% is merely an av