On 10.11.2018 at 23:06, Jesper Dangaard Brouer wrote:
On Sat, 10 Nov 2018 20:56:02 +0100
Paweł Staszewski <pstaszew...@itcare.pl> wrote:

On 10.11.2018 at 20:49, Paweł Staszewski wrote:

On 10.11.2018 at 20:34, Jesper Dangaard Brouer wrote:
On Fri, 9 Nov 2018 23:20:38 +0100 Paweł Staszewski
<pstaszew...@itcare.pl> wrote:
On 08.11.2018 at 20:12, Paweł Staszewski wrote:
CPU load is lower than with the ConnectX-4 - but it looks like the bandwidth
limit is the same :)
Here it is again after reaching 60Gbit/60Gbit:

   bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
    input: /proc/net/dev type: rate
    -         iface                   Rx          Tx         Total
   ===================================================================
            enp175s0:          45.09 Gb/s  15.09 Gb/s     60.18 Gb/s
            enp216s0:          15.14 Gb/s  45.19 Gb/s     60.33 Gb/s
   -------------------------------------------------------------------
               total:          60.45 Gb/s  60.48 Gb/s    120.93 Gb/s
Today it reached 65Gbit/65Gbit.

But starting from 60Gbit/s RX / 60Gbit/s TX the NICs start to drop packets
(with 50% CPU on all 28 cores) - so there is still CPU power left to use :).
This is weird!

How do you see / measure these drops?
A simple ICMP test like ping -i 0.1.
I am pinging the management IP address on a VLAN that is attached
to one NIC (the side that is more stressed with RX).
The other ICMP test is forwarded through this router, to a host behind it.

Both measurements show the same loss ratio of 0.1 to 0.5% after
reaching ~45Gbit/s on the RX side - depending on how hard the RX side is
pushed, the drops vary between 0.1 and 0.5 - even 0.6% :)
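Roughly, the two tests look like this (the addresses are placeholders for
illustration, not the real ones):

  # ping the router's own management address on the VLAN attached to the busy RX NIC
  ping -i 0.1 -c 1000 192.0.2.1
  # ping a host behind the router, so the packets are forwarded through it
  ping -i 0.1 -c 1000 198.51.100.10

The packet-loss percentage in ping's summary line is where the 0.1-0.6%
figure comes from.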

Okay, good to know - you use an external measurement for this. I do
think packets are getting dropped by the NIC.

So I checked other stats.
softnet_stat shows an average of 1k squeezed per second:
Is the output below the raw counters, not per-sec?

It would be valuable to see the per sec stats instead...
I use this tool:
https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl
CPU          total/sec     dropped/sec    squeezed/sec   collision/sec      rx_rps/sec  flow_limit/sec
CPU:00               0               0               0               0               0               0
[...]
CPU:13               0               0               0               0               0               0
CPU:14          485538               0              43               0               0               0
CPU:15          474794               0              51               0               0               0
CPU:16          449322               0              41               0               0               0
CPU:17          476420               0              46               0               0               0
CPU:18          440436               0              38               0               0               0
CPU:19          501499               0              49               0               0               0
CPU:20          459468               0              49               0               0               0
CPU:21          438928               0              47               0               0               0
CPU:22          468983               0              40               0               0               0
CPU:23          446253               0              47               0               0               0
CPU:24          451909               0              46               0               0               0
CPU:25          479373               0              55               0               0               0
CPU:26          467848               0              49               0               0               0
CPU:27          453153               0              51               0               0               0
CPU:28               0               0               0               0               0               0
[...]
CPU:40               0               0               0               0               0               0
CPU:41               0               0               0               0               0               0
CPU:42          466853               0              43               0               0               0
CPU:43          453059               0              54               0               0               0
CPU:44          363219               0              34               0               0               0
CPU:45          353632               0              38               0               0               0
CPU:46          371618               0              40               0               0               0
CPU:47          350518               0              46               0               0               0
CPU:48          397544               0              40               0               0               0
CPU:49          364873               0              38               0               0               0
CPU:50          383630               0              38               0               0               0
CPU:51          358771               0              39               0               0               0
CPU:52          372547               0              38               0               0               0
CPU:53          372882               0              36               0               0               0
CPU:54          366244               0              43               0               0               0
CPU:55          365886               0              39               0               0               0

Summed:       11835201               0            1217               0               0               0

Do notice, the per CPU squeeze is not too large.
Yes - but I'm hunting for something invisible now :) Something invisible is slowing down packet processing :) So I am trying to find any counter that has something to do with packet processing.
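For a quick raw look without the script, something like this should work
(assuming the usual /proc/net/softnet_stat layout, where the first three hex
columns are processed, dropped and time_squeeze, and GNU awk for strtonum):

  # per-CPU cumulative counters, decoded from hex (not per-sec)
  awk '{printf "CPU:%02d processed=%d dropped=%d squeezed=%d\n", NR-1, strtonum("0x"$1), strtonum("0x"$2), strtonum("0x"$3)}' /proc/net/softnet_stat

A steadily growing third value on the busy CPUs is the same squeeze the tool
above reports per second.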

The summed 11.8 Mpps is a little high compared to:

  Ethtool(enp216s0) stat: 4971677 (4,971,677) <= rx_packets /sec
  Ethtool(enp175s0) stat: 3717148 (3,717,148) <= rx_packets /sec
  Sum:  3717148+4971677 = 8688825 (8,688,825)
Yes, I mentioned this - the stats from /proc/net/dev for the NICs are weird if you compare them to ethtool; there are big differences for the Mellanox NICs.
Especially with packets/s.
For example, when I change CQE compression I get more interrupts - and likewise more packets/s - but the same bandwidth. When I change the ring settings - like half an hour ago, when I changed the TX ring from 4096 to 256 - I get fewer interrupts and fewer packets/s, but more bandwidth... weird...
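For the record, those two knobs are changed with something like this (the
priv-flag name is what my mlx5 driver exposes; it may differ between driver
versions):

  # toggle CQE compression on the mlx5 NIC
  ethtool --set-priv-flags enp175s0 rx_cqe_compress on
  # shrink the TX ring from 4096 to 256 descriptors
  ethtool -G enp175s0 tx 256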

Because with normal traffic, more packets/s should mean more bandwidth - if the average frame is 500-600 bytes and I gain 1M+ pps, that should mean roughly +5-6Gbit/s more on average. But it looks like the counter tracks the number of interrupts rather than the number of packets.

[...]
Remember, those tests are now on two separate ConnectX-5s connected to
two separate PCIe x16 gen 3.0 slots.
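For reference, the negotiated link of each slot can be double-checked with
lspci; the PCI addresses below (af:00.0 and d8:00.0) are only inferred from
the interface names, so treat them as an assumption:

  # LnkCap shows what the slot/NIC can do, LnkSta what was actually negotiated (run as root)
  lspci -s af:00.0 -vv | grep -E 'LnkCap|LnkSta'
  lspci -s d8:00.0 -vv | grep -E 'LnkCap|LnkSta'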
   That is strange... I still suspect some HW NIC issue. Can you provide
ethtool stats info via this tool:

https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

$ ethtool_stats.pl --dev enp175s0 --dev enp216s0

The tool removes zero-stat counters and reports per-sec stats. That makes
it easier to spot what is relevant for the given workload.
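If you only want to watch a single counter per second without the script, a
crude equivalent is a delta loop over ethtool -S (the counter name here is
just an example, taken from the stats below):

  # rough per-second delta of one ethtool counter; the first printed value
  # is bogus because it is a delta against zero
  prev=0
  while sleep 1; do
    cur=$(ethtool -S enp175s0 | awk '/rx_discards_phy:/ {print $2}')
    echo "$((cur - prev)) rx_discards_phy/sec"
    prev=$cur
  done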
Yes, mlnx just has too many counters that are always 0 in my case :)
I will try this also.
But there are still a lot of non-zero counters:
Show adapter(s) (enp175s0 enp216s0) statistics (ONLY that changed!)
Ethtool(enp175s0) stat:         8891 (          8,891) <= ch0_arm /sec
[...]

I have copied the stats over into another document so I can take a better
look at them... and I've found some interesting stats.

E.g. we can see that the NIC hardware is dropping packets.

RX-drops on enp175s0:

  (enp175s0) stat: 4850734036 ( 4,850,734,036) <= rx_bytes /sec
  (enp175s0) stat: 5069043007 ( 5,069,043,007) <= rx_bytes_phy /sec
                   -218308971 (  -218,308,971) Dropped bytes /sec

  (enp175s0) stat:     139602 (     139,602) <= rx_discards_phy /sec

  (enp175s0) stat: 3717148 ( 3,717,148) <= rx_packets /sec
  (enp175s0) stat: 3862420 ( 3,862,420) <= rx_packets_phy /sec
                   -145272 (  -145,272) Dropped packets /sec


RX-drops on enp216s0 are lower:

  (enp216s0) stat: 2592286809 ( 2,592,286,809) <= rx_bytes /sec
  (enp216s0) stat: 2633575771 ( 2,633,575,771) <= rx_bytes_phy /sec
                    -41288962 (   -41,288,962) Dropped bytes /sec

  (enp216s0) stat:   464 (464) <= rx_discards_phy /sec

  (enp216s0) stat: 4971677 ( 4,971,677) <= rx_packets /sec
  (enp216s0) stat: 4975563 ( 4,975,563) <= rx_packets_phy /sec
                     -3886 (    -3,886) Dropped packets /sec
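
That gap can be watched directly; as a rough sketch, the cumulative
difference between the wire-side and stack-side packet counters on mlx5 is:

  # packets the port received (rx_packets_phy) but the stack never saw (rx_packets)
  ethtool -S enp175s0 | awk '/^ *rx_packets:/ {sw=$2} /^ *rx_packets_phy:/ {hw=$2} END {print hw - sw, "packets lost before the stack"}'

The difference also includes frames the NIC consumes itself (pause frames,
FCS errors and so on), so rx_discards_phy - which, as I understand the
counter, is packets the port dropped for lack of buffers - points more
directly at the drops.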

