On 05.11.2018 at 21:17, Jesper Dangaard Brouer wrote:
On Sun, 4 Nov 2018 01:24:03 +0100 Paweł Staszewski <pstaszew...@itcare.pl> wrote:

And today, after applying the patch for the page allocator, I again reached
64/64 Gbit/s

with only 50-60% CPU load.
Great.

Today there was no slowpath hit for networking :)

But again packets dropped at 64 Gbit RX and 64 Gbit TX ...
And as it should not be a PCI Express limit - I think something more is
Well, this does sound like a PCIe bandwidth limit to me.

See the PCIe BW here: https://en.wikipedia.org/wiki/PCI_Express

You likely have PCIe v3, where one lane has 984.6 MBytes/s or 7.87 Gbit/s.
Thus, x16 lanes have 15.75 GBytes/s or 126 Gbit/s.  It does say "in each
direction", but you are also forwarding RX->TX between both ports of a
dual-port NIC that shares the same PCIe slot.
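
To make that limit concrete, here is a minimal Python sketch of the
arithmetic. The per-lane figure is the PCIe 3.0 number from the Wikipedia
table linked above; the 2x 64 Gbit/s load and the shared dual-port slot are
the setup described in this thread:

    # PCIe 3.0 usable throughput per lane (after 128b/130b encoding),
    # from the Wikipedia table linked above.
    gbit_per_lane = 984.6 * 8 / 1000        # ~7.88 Gbit/s
    slot_limit = 16 * gbit_per_lane         # x16 slot: ~126 Gbit/s per direction

    # Forwarding RX->TX between both ports of a dual-port NIC in one slot:
    # every forwarded bit crosses the slot once towards the host (RX DMA) and
    # once back to the NIC (TX DMA), so each slot direction carries the sum
    # of both ports' traffic.
    load_per_direction = 64 + 64            # Gbit/s, the 64/64 test above

    print(f"x16 slot limit : ~{slot_limit:.0f} Gbit/s per direction")
    print(f"offered load   : {load_per_direction} Gbit/s per direction")
    if load_per_direction > slot_limit:
        print("-> at or over the PCIe limit, before descriptor/completion overhead")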
The network controller has been changed from one dual-port 100G ConnectX-4 to two separate 100G ConnectX-5 cards.


   PerfTop:   92239 irqs/sec  kernel:99.4%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

     6.65%  [kernel]       [k] irq_entries_start
     5.57%  [kernel]       [k] tasklet_action_common.isra.21
     4.60%  [kernel]       [k] mlx5_eq_int
     4.04%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
     3.66%  [kernel]       [k] _raw_spin_lock_irqsave
     3.58%  [kernel]       [k] mlx5e_sq_xmit
     2.66%  [kernel]       [k] fib_table_lookup
     2.52%  [kernel]       [k] _raw_spin_lock
     2.51%  [kernel]       [k] build_skb
     2.50%  [kernel]       [k] _raw_spin_lock_irq
     2.04%  [kernel]       [k] try_to_wake_up
     1.83%  [kernel]       [k] queued_spin_lock_slowpath
     1.81%  [kernel]       [k] mlx5e_poll_tx_cq
     1.65%  [kernel]       [k] do_idle
     1.50%  [kernel]       [k] mlx5e_poll_rx_cq
     1.34%  [kernel]       [k] __sched_text_start
     1.32%  [kernel]       [k] cmd_exec
     1.30%  [kernel]       [k] cmd_work_handler
     1.16%  [kernel]       [k] vlan_do_receive
     1.15%  [kernel]       [k] memcpy_erms
     1.15%  [kernel]       [k] __dev_queue_xmit
     1.07%  [kernel]       [k] mlx5_cmd_comp_handler
     1.06%  [kernel]       [k] sched_ttwu_pending
     1.00%  [kernel]       [k] ipt_do_table
     0.98%  [kernel]       [k] ip_finish_output2
     0.92%  [kernel]       [k] pfifo_fast_dequeue
     0.88%  [kernel]       [k] mlx5e_handle_rx_cqe_mpwrq
     0.78%  [kernel]       [k] dev_gro_receive
     0.78%  [kernel]       [k] mlx5e_napi_poll
     0.76%  [kernel]       [k] mlx5e_post_rx_mpwqes
     0.70%  [kernel]       [k] process_one_work
     0.67%  [kernel]       [k] __netif_receive_skb_core
     0.65%  [kernel]       [k] __build_skb
     0.63%  [kernel]       [k] llist_add_batch
     0.62%  [kernel]       [k] tcp_gro_receive
     0.60%  [kernel]       [k] inet_gro_receive
     0.59%  [kernel]       [k] ip_route_input_rcu
     0.59%  [kernel]       [k] rcu_irq_exit
     0.56%  [kernel]       [k] napi_complete_done
     0.52%  [kernel]       [k] kmem_cache_alloc
     0.48%  [kernel]       [k] __softirqentry_text_start
     0.48%  [kernel]       [k] mlx5e_xmit
     0.47%  [kernel]       [k] __queue_work
     0.46%  [kernel]       [k] memset_erms
     0.46%  [kernel]       [k] dev_hard_start_xmit
     0.45%  [kernel]       [k] insert_work
     0.45%  [kernel]       [k] enqueue_task_fair
     0.44%  [kernel]       [k] __wake_up_common
     0.43%  [kernel]       [k] finish_task_switch
     0.43%  [kernel]       [k] kmem_cache_free_bulk
     0.42%  [kernel]       [k] ip_forward
     0.42%  [kernel]       [k] worker_thread
     0.41%  [kernel]       [k] schedule
     0.41%  [kernel]       [k] _raw_spin_unlock_irqrestore
     0.40%  [kernel]       [k] netif_skb_features
     0.40%  [kernel]       [k] queue_work_on
     0.40%  [kernel]       [k] pfifo_fast_enqueue
     0.39%  [kernel]       [k] vlan_dev_hard_start_xmit
     0.39%  [kernel]       [k] page_frag_free
     0.36%  [kernel]       [k] swiotlb_map_page
     0.36%  [kernel]       [k] update_cfs_rq_h_load
     0.35%  [kernel]       [k] validate_xmit_skb.isra.142
     0.35%  [kernel]       [k] dev_ifconf
     0.35%  [kernel]       [k] check_preempt_curr
     0.34%  [kernel]       [k] _raw_spin_trylock
     0.34%  [kernel]       [k] rcu_idle_exit
     0.33%  [kernel]       [k] ip_rcv_core.isra.20.constprop.25
     0.33%  [kernel]       [k] __qdisc_run
     0.33%  [kernel]       [k] skb_release_data
     0.32%  [kernel]       [k] native_sched_clock
     0.30%  [kernel]       [k] add_interrupt_randomness
     0.29%  [kernel]       [k] interrupt_entry
     0.28%  [kernel]       [k] skb_gro_receive
     0.26%  [kernel]       [k] read_tsc
     0.26%  [kernel]       [k] __get_xps_queue_idx
     0.26%  [kernel]       [k] inet_gifconf
     0.26%  [kernel]       [k] skb_segment
     0.25%  [kernel]       [k] __tasklet_schedule_common
     0.25%  [kernel]       [k] smpboot_thread_fn
     0.23%  [kernel]       [k] __update_load_avg_se
     0.22%  [kernel]       [k] tcp4_gro_receive


Not much traffic now:
  bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
  input: /proc/net/dev type: rate
  |         iface                   Rx                   Tx                Total
==============================================================================
         enp175s0:           6.95 Gb/s            4.20 Gb/s           11.15 Gb/s
         enp216s0:           4.23 Gb/s            6.98 Gb/s           11.21 Gb/s
------------------------------------------------------------------------------
            total:          11.18 Gb/s           11.18 Gb/s           22.37 Gb/s

  bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
  input: /proc/net/dev type: rate
  |         iface                   Rx                   Tx                Total
==============================================================================
         enp175s0:       700264.50 P/s        923890.25 P/s 1624154.75 P/s
         enp216s0:       932598.81 P/s        708771.50 P/s 1641370.25 P/s
------------------------------------------------------------------------------
            total:      1632863.38 P/s       1632661.75 P/s 3265525.00 P/s
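
For completeness, a minimal Python sketch of how rates like the ones above
can be derived from /proc/net/dev by sampling the cumulative counters twice.
The interface names are the two NICs from this box; this only approximates
what bwm-ng does internally:

    import time

    def read_bytes(ifaces):
        # /proc/net/dev keeps cumulative per-interface counters; after the
        # two header lines, field 0 is rx_bytes and field 8 is tx_bytes.
        stats = {}
        with open("/proc/net/dev") as f:
            for line in f.readlines()[2:]:
                name, data = line.split(":", 1)
                name = name.strip()
                if name in ifaces:
                    fields = data.split()
                    stats[name] = (int(fields[0]), int(fields[8]))
        return stats

    ifaces = ("enp175s0", "enp216s0")   # the two NICs from the output above
    interval = 1.0                      # seconds, like bwm-ng's 1.000s probe
    before = read_bytes(ifaces)
    time.sleep(interval)
    after = read_bytes(ifaces)

    for name in ifaces:
        rx = (after[name][0] - before[name][0]) * 8 / interval / 1e9
        tx = (after[name][1] - before[name][1]) * 8 / interval / 1e9
        print(f"{name}: RX {rx:.2f} Gb/s  TX {tx:.2f} Gb/s")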

going on there - and it is hard to catch, because perf top doesn't change;
besides, there is no queued slowpath hit now.

I have now also ordered Intel cards to compare - but the ETA is 3 weeks.
Faster - in about 3 days - I will have Mellanox ConnectX-5 cards, so I can
separate the traffic onto two different x16 PCIe buses.
I do think you need to separate the traffic onto two different x16 PCIe
slots.  I have found that the ConnectX-5 has significantly higher
packet-per-second performance than the ConnectX-4, but that is not your
use-case (max BW). I've not tested these NICs for maximum
_bidirectional_ bandwidth limits; I've only made sure I can do 100G
unidirectional, which can hit some funny motherboard memory limits
(remember to equip the motherboard with 4 RAM modules for full memory BW).
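
As a rough illustration of that memory-bandwidth point, a small Python
sketch; the DDR4-2666 per-channel figure and the assumption that each
forwarded byte touches DRAM twice (RX DMA write plus TX DMA read) are
assumptions for illustration, not something measured on this box:

    # Assumed DIMM speed: DDR4-2666, 8 bytes per transfer -> ~21.3 GB/s per channel.
    channel_gbytes = 2666e6 * 8 / 1e9

    # Assumed DRAM traffic for forwarding at 100 Gbit/s line rate: each byte is
    # written once by RX DMA and read once for TX DMA (ignoring descriptors,
    # skb metadata and cache effects).
    line_rate_gbit = 100.0
    needed_gbytes = 2 * line_rate_gbit / 8

    for channels in (1, 2, 4):
        avail = channels * channel_gbytes
        print(f"{channels} channel(s): ~{avail:.0f} GB/s available, "
              f"~{needed_gbytes:.1f} GB/s needed for packet data alone")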

Yes, the memory channels are populated separately and there are 4 modules per CPU :)
