Hi Olivier,

I have followed your instructions:
- I have configured static MAC addresses on the switch.

I have also checked all the settings that you mentioned; since I use BSDRP
v1.96, they are all already in place:
- the Chelsio settings are present in loader.conf.local
- IP redirects are off
.....
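
For anyone reproducing the last two checks on a running router, a minimal
sketch follows. It assumes a FreeBSD host and the stock sysctl names
net.inet.ip.redirect and net.inet6.ip6.redirect, plus loader tunables under
the hw.cxgbe prefix; none of these names are quoted from this thread, and the
function name show_forwarding_settings is only illustrative.

```sh
#!/bin/sh
# Print the forwarding-related settings discussed in this thread.
# FreeBSD only: relies on sysctl(8) and kenv(1). The sysctl names and the
# hw.cxgbe prefix are assumptions, not values taken from the thread.
show_forwarding_settings() {
    if [ "$(uname -s)" != "FreeBSD" ]; then
        echo "skipped: not a FreeBSD host"
        return 0
    fi
    # 0 means the router does not send ICMP redirects
    sysctl net.inet.ip.redirect net.inet6.ip6.redirect
    # Loader tunables for the Chelsio driver, if any were set at boot
    kenv | grep '^hw\.cxgbe\.' || echo "no hw.cxgbe tunables set"
}

show_forwarding_settings
```

On a non-FreeBSD machine the function only prints a notice and returns.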

I see some improvement on the generator but not on the router. The
generator can generate ~7.8Mpps (still half the 14M) using:
pkt-gen -N -f tx -i vcxl0 -n 1000000000 -4 \
    -d 198.19.10.1:2000-198.19.10.100 -D 00:07:43:2f:29:b0 \
    -s 198.18.10.1:2000-198.18.10.20 -S 00:07:43:32:b1:61 \
    -w 4 -l 60 -p 2
.....
298.852607 main_thread [2655] 7696852 pps (7712554 pkts 3702025920 bps in 1002040 usec) 305.82 avg_batch 199998 min_space
299.853510 main_thread [2655] 7560538 pps (7567796 pkts 3632542080 bps in 1000960 usec) 288.34 avg_batch 199998 min_space
300.855517 main_thread [2655] 7585604 pps (7600828 pkts 3648397440 bps in 1002007 usec) 322.03 avg_batch 199998 min_space
301.857506 main_thread [2655] 7554147 pps (7569165 pkts 3633199200 bps in 1001988 usec) 323.40 avg_batch 199998 min_space
......
On the receiver I get:

pkt-gen -N -f rx -i vcxl1 -w 4
....
297.806511 main_thread [2655] 4111236 pps (4119467 pkts 1977344160 bps in 1002002 usec) 31.44 avg_batch 827 min_space
298.808513 main_thread [2655] 4116059 pps (4124299 pkts 1979663520 bps in 1002002 usec) 31.36 avg_batch 923 min_space
299.810513 main_thread [2655] 4113719 pps (4121951 pkts 1978536480 bps in 1002001 usec) 31.50 avg_batch 928 min_space
300.812505 main_thread [2655] 4101845 pps (4110016 pkts 1972807680 bps in 1001992 usec) 31.58 avg_batch 832 min_space
301.814505 main_thread [2655] 4115806 pps (4124033 pkts 1979535840 bps in 1001999 usec) 31.62 avg_batch 620 min_space
....
The router forwards ~4.1Mpps and drops ~3.6Mpps (far from 14M, and even
less than what I had before). For some reason the router's throughput has
now gone down from 5Mpps to 4Mpps.
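
For reference, the "14M" target is the theoretical line rate for minimum-size
frames. A rough sketch of the arithmetic, assuming a 10 Gbit/s link and that
the pkt-gen length (-l 60) excludes the 4-byte CRC, so each frame occupies
60 + 4 bytes plus 20 bytes of preamble and inter-frame gap on the wire. The
function name line_rate_pps is only illustrative.

```sh
#!/bin/sh
# Theoretical packets per second on an Ethernet link.
# $1 = link speed in bit/s
# $2 = frame length in bytes without CRC (the value given to pkt-gen -l)
line_rate_pps() {
    # 4 bytes CRC + 8 bytes preamble/SFD + 12 bytes inter-frame gap = 24
    wire_bytes=$(( $2 + 24 ))
    echo $(( $1 / (wire_bytes * 8) ))
}

# 10 Gbit/s link (assumed), 60-byte frames as in the pkt-gen command
line_rate_pps 10000000000 60    # prints 14880952
```

Against that figure, the ~7.6Mpps generated here is roughly half of line
rate, and the ~4.1Mpps forwarded is a bit more than a quarter of it.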

Here are some stats from the router:
#top -CHIPS
last pid: 94189;  load averages:  2.98,  1.00,  0.43    up 6+02:52:49  04:38:10
141 threads:   8 running, 109 sleeping, 24 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 65.0% interrupt, 35.0% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system, 55.9% interrupt, 44.1% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system, 98.8% interrupt,  1.2% idle
Mem: 5208K Active, 21M Inact, 441M Wired, 205M Buf, 15G Free
Swap:

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME     CPU COMMAND
   11 root        -92    -      0   432K CPU0     0   3:56  99.74% intr{irq259:
   11 root        -92    -      0   432K CPU3     3   3:54  99.32% intr{irq262:
   11 root        -92    -      0   432K WAIT     1   3:29  72.46% intr{irq260:
   11 root        -92    -      0   432K WAIT     2   3:23  65.40% intr{irq261:
   10 root        155 ki31      0    64K RUN      2 146.8H  34.39% idle{idle: c
   10 root        155 ki31      0    64K CPU1     1 146.8H  27.43% idle{idle: c
   10 root        155 ki31      0    64K RUN      3 146.8H   0.79% idle{idle: c
   11 root        -100    -      0   432K WAIT     1  33:58   0.40% intr{irq20:
94189 root         20    0    13M  2956K CPU2     2   0:00   0.05% top
   11 root        -60    -      0   432K WAIT     2   2:21   0.03% intr{swi4: c
    0 root        -92    -      0   448K -        2   0:00   0.03% kernel{t5nex
   11 root        -92    -      0   432K WAIT     2   0:00   0.02% intr{irq265:
   11 root        -92    -      0   432K WAIT     1   0:00   0.01% intr{irq264:
   11 root        -92    -      0   432K WAIT     3   0:00   0.01% intr{irq266:
   21 root        -16    -      0    48K psleep   1   0:08   0.00% pagedaemon{d
   11 root        -92    -      0   432K RUN      0   0:00   0.00% intr{irq263:
   24 root        -16    -      0   128K qsleep   2   0:07   0.00% bufdaemon{bu
   18 root        -16    -      0    16K -        1   0:10   0.00% rand_harvest
53889 root         20    0    20M  6216K select   2   0:00   0.00% sshd
   19 root        -16    -      0    16K tzpoll   2   0:01   0.00% acpi_thermal
   11 root        -92    -      0   432K WAIT     2   0:02   0.00% intr{irq267:
   25 root        -16    -      0    16K vlruwt   1   0:01   0.00% vnlru
   26 root         16    -      0    16K syncer   2   0:02   0.00% syncer

[root@router]~# nic-queue-usage cxl0
[Q0  2779K/s] [Q1  1946K/s] [Q2  1761K/s] [Q3  2779K/s] [QT  9266K/s 17512K/s -> 0K/s]
[Q0  2763K/s] [Q1  1935K/s] [Q2  1750K/s] [Q3  2763K/s] [QT  9213K/s 17558K/s -> 0K/s]
[Q0  2764K/s] [Q1  1935K/s] [Q2  1750K/s] [Q3  2764K/s] [QT  9215K/s 17440K/s -> 0K/s]
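
The per-queue counters above are not evenly spread. A small sketch that
turns one sample into per-queue shares, using integer arithmetic only; the
function name queue_share is illustrative, and the input values are the
Q0..Q3 figures from the first line of the output above.

```sh
#!/bin/sh
# Print each queue's share of the summed rate, in whole percent.
# Arguments: one rate per queue, all in the same unit (here K/s).
queue_share() {
    total=0
    for q in "$@"; do
        total=$(( total + q ))
    done
    [ "$total" -gt 0 ] || return 1
    i=0
    for q in "$@"; do
        echo "Q$i $(( q * 100 / total ))%"
        i=$(( i + 1 ))
    done
}

queue_share 2779 1946 1761 2779
# prints:
# Q0 29%
# Q1 21%
# Q2 19%
# Q3 29%
```

Q0 and Q3 each carry about 29% of the load. If irq259 to irq262 map to Q0 to
Q3 in order (an assumption, not confirmed here), that would be consistent
with CPU 0 and CPU 3 being saturated in the top output.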

[root@router]~# netstat -hd 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
      7.6M     0  3.5M       462M       4.1M     0       251M     0     0
      7.6M     0  3.5M       463M       4.1M     0       251M     0     0
      7.8M     0  3.7M       478M       4.1M     0       251M     0     0
      7.6M     0  3.5M       462M       4.1M     0       251M     0     0
      7.6M     0  3.5M       463M       4.1M     0       250M     0     0
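
In these samples the input packet count minus idrops equals the output packet
count (7.6M - 3.5M = 4.1M), so the loss happens on input. A minimal sketch of
the drop percentage, with the netstat figures expressed in thousands of
packets per second; the function name input_drop_percent is illustrative.

```sh
#!/bin/sh
# Percentage of received packets that were dropped on input.
# $1 = input packets per second, $2 = input drops per second (same unit)
input_drop_percent() {
    echo $(( $2 * 100 / $1 ))
}

# 7.6M packets in, 3.5M idrops, both in thousands per second
input_drop_percent 7600 3500    # prints 46
```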

[root@router]~# vmstat 1
procs  memory       page                    disks     faults         cpu
r b w  avm   fre   flt  re  pi  po    fr   sr da0 pa0   in    sy    cs us sy id
0 0 0 224M   15G     3   0   0   0     4    0   0   0 1164    81  2449  0  0 100
0 0 0 224M   15G     1   0   0   0     0    2   0   0 40718   117 81546  0 84 16
0 0 0 224M   15G     0   0   0   0     0    1   0   0 40592   112 81271  0 85 15
0 0 0 224M   15G     0   0   0   0     0    2   0   0 40496   112 81100  0 81 19
0 0 0 224M   15G     0   0   0   0     0    1   0   0 40604   112 81320  0 83 17
0 0 0 224M   15G     0   0   0   0     0    2   0   0 40483   112 81053  0 79 21
0 0 0 224M   15G     0   0   0   0     0    1   0   0 40645   112 81389  0 84 16
0 0 0 224M   15G     0   0   0   0     0    1   0   0 40635   112 81390  0 81 19
0 0 0 224M   15G     0   0   0   0     0    2   0   0 40438   112 80982  0 82 18
0 0 0 224M   15G     0   0   0   0     0    1   0   0 40335   112 80783  0 82 18
0 0 0 224M   15G     0   0   0   0     0    2   0   0 40498   112 81084  0 82 18

    2 users    Load  3.38  3.15  2.01                  Aug  5 04:49
   Mem usage:   3%Phy  1%Kmem
Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act   23828    6032   230592    10780  15442K  count
All   23828    6032   230592    10780          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt 40460 total
             16       81k        24  40k   18             cow         atkbd0 1
                                                          zfod        uart0 4
 0.0%Sys  82.1%Intr  0.0%User  0.0%Nice 17.9%Idle         ozfod  1128 hpet0 uhci
|    |    |    |    |    |    |    |    |    |           %ozfod       uhci2 uhci
+++++++++++++++++++++++++++++++++++++++++                 daefr       ciss0 256
                                           dtbuf          prcfr       t5nex0:evt
Namei     Name-cache   Dir-cache    350137 desvn          totfr       t5nex0:0a0
   Calls    hits   %    hits   %     16447 numvn          react 18873 t5nex0:0a1
       3       3 100                  5870 frevn          pdwak 19296 t5nex0:0a2
                                                        2 pdpgs  1007 t5nex0:0a3
Disks   da0 pass0                                         intrn    11 t5nex0:1a0
KB/t   0.00  0.00                                  451720 wire     49 t5nex0:1a1
tps       0     0                                    4912 act      47 t5nex0:1a2
MB/s   0.00  0.00                                   21924 inact    48 t5nex0:1a3
%busy     0     0                                         laund     1 bce0 267
                                                 15812360 free
Can you point me to something else?

Regards,

Lyubo


On Wed, 5 Feb 2020 at 18:40, Lyubomir Yotov <l.yo...@gmail.com> wrote:

> Thanks Olivier,
>
> I will try it with the settings that you proposed.
>
> Regards,
>
> Lyubo
>
> On Wed, 5 Feb 2020 at 17:15, Olivier Cochard-Labbé <oliv...@cochard.me>
> wrote:
>
>> On Wed, Feb 5, 2020 at 3:49 PM Lyubomir Yotov <l.yo...@gmail.com> wrote:
>>
>>> Dear Olivier,
>>>
>> Hi,
>>
>>
>>> The switch is configured with:
>>>
>>> configure fdb agingtime 0
>>> disable lldp ports 1,3,5,7
>>>
>>> disable flow-control tx ports 1,3,5,7
>>> disable flow-control rx ports 1,3,5,7
>>>
>>> create vlan vlan2 tag 2
>>> create vlan vlan3 tag 3
>>>
>>> configure vlan "Default" del port 1,3,5,7
>>> configure vlan2 add ports 1,3
>>> configure vlan3 add ports 5,7
>>>
>>
>> No static MAC addresses configured on the switch?
>> If so, are you pinging the devices from each other before starting the
>> benches, to be sure the switch has a fully populated MAC address table?
>> => If not, the switch will flood ("broadcast") packets to all ports
>> belonging to the VLAN (with a potentially huge performance impact) because
>> it doesn't know where the destination is.
>>
>> cf. line 73 of this example, where I'm using ping to check the lab status
>> (and to populate the switch MAC table) in one of my scripts here:
>>
>> https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/bench-lab-2nodes.config#L73
>>
>>>
>>> #####dut (DL360 G6) configuration:
>>> #cxl0 MAC 00:07:43:2f:29:b0
>>> #cxl1 MAC 00:07:43:2f:29:b8
>>> ifconfig cxl0 198.18.0.203/24
>>> ifconfig cxl1 198.19.0.203/24
>>> #to create static arp records on restart use
>>> sysrc static_arp_pairs="generator receiver"
>>> sysrc static_arp_generator="198.18.0.201 00:07:43:32:b1:61"
>>> sysrc static_arp_receiver="198.19.0.201 00:07:43:32:b1:69"
>>> #to create static arp records on the fly (lost after restart)
>>> arp -S 198.18.0.201 00:07:43:32:b1:61
>>> arp -S 198.19.0.201 00:07:43:32:b1:69
>>> #add a static route for 198.19.0.0/16
>>> route add -net 198.19.0.0/16 198.19.0.201
>>>
>>>
>> Did you disable ICMP redirects on the router too? And the Chelsio
>> advanced hardware features?
>> Here is an example of my configuration files:
>>
>> https://github.com/ocochard/netbenches/tree/master/Atom_C2758_8Cores-Chelsio_T540-CR/forwarding-pf-ipfw/configs/forwarding
>>
>>
>>> #####start the test
>>> #start the receiver part
>>> pkt-gen -N -f rx -i vcxl1 -w 4
>>> #start the generator part, use '-p 2' in order to achieve higher
>>> #throughput from the processor.
>>> pkt-gen -N -f tx -i vcxl0 -n 1000000000 -4 \
>>>     -d 198.19.10.1:2000-198.19.10.100 -D 00:07:43:2f:29:b0 \
>>>     -s 198.18.10.1:2000-198.18.10.20 -S 00:07:43:32:b1:61 \
>>>     -w 4 -l 60 -U -p 2
>>>
>>
>> The pkt-gen option -U (software checksum calculation) has a big
>> performance impact, because my quick & dirty patch doesn't use a
>> pre-calculated table but calculates the checksum for each packet
>> generated. The Chelsio NIC (vcxl0 here) supports hardware checksum
>> calculation in netmap mode.
>>
>>>
>>>
>>> #####Results:
>>> The generator can generate about 5.5 Mpps.
>>>
>>
>> There is a problem here: with your hardware you should be able to
>> generate line-rate (14.88Mpps).
>>
>>> The router drops about 1.5 Mpps.
>>> The receiver gets about 4 Mpps.
>>>
>>
>> Wow, this is not good, because it is lower than a small Atom (8-core) CPU,
>> and this small Atom has a TDP of 20W compared to this Xeon with a TDP of 80W.
>>
>> I never tested a Xeon E5410 with only 4 cores, but I would expect it to be
>> able to forward a lot more than 4Mpps.
>>
>> cf Atom results here:
>>
>> https://github.com/ocochard/netbenches/tree/master/Atom_C2758_8Cores-Chelsio_T540-CR/forwarding-pf-ipfw/results/fbsd12-stable.r354440.BSDRP.1.96
>>
>>> There is a 100% saturation on one of the CPU cores and about 75% evenly
>>> distributed among the other cores on the router:
>>>
>>
>> If the load is evenly distributed, this confirms you are correctly
>> generating multiple flows.
>>
>>>
>>> The generator can generate more than 10 Mpps but the number of flows
>>> should be one or two for this hardware configuration:
>>> pkt-gen -N -f tx -i vcxl0 -n 1000000000 -4 \
>>>     -d 198.19.10.1:2000-198.19.10.2 -D 00:07:43:2f:29:b0 \
>>>     -s 198.18.10.1:2000-198.18.10.10 -S 00:07:43:32:b1:61 \
>>>     -w 4 -l 60 -U -p 2
>>>
>>>
>> Try without -U, and it should be able to reach line-rate.
>> Disable all unneeded Chelsio advanced features too (check the
>> loader.conf.local examples).
>>
>> Regards,
>>
>> Olivier
>> _______________________________________________
>> Bsdrp-users mailing list
>> Bsdrp-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bsdrp-users
>>
>