On 03/24/2017 16:53, Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER
SCIENCE CORP] wrote:
It looks like netmap is there; however, is there a way of figuring out
if netmap is being used?
If you're not running netmap-fwd or some other netmap application, it's
not being used. You have just 1 txq/rxq and that would explain the
difference between cxl and vcxl.
> cxl0: 16 txq, 8 rxq (NIC)
> vcxl0: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
...
And yes, we are using UDP 64 bytes tests.
That's strange then. The "input packets" counter counts every single
frame that the chip saw on the wire that matched any of its MAC
addresses, including frames that the chip drops. There's no way to
explain why vcxl sees ~640K pps incoming vs. 2.8M pps for cxl. That
number shouldn't depend on your router configuration at all -- it's
entirely dependent on the traffic generators. Are you sure you aren't
getting PAUSE frames out of the chip? There's nothing else that could
slow down UDP senders.
# sysctl -a | grep tx_pause
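For example, a loop along these lines will watch both ports (this
assumes the counters are exposed as dev.cxl.<port>.stats.tx_pause and
rx_pause; the exact OID names can vary with the driver version):

# steadily increasing tx_pause means the chip is asking senders to back off
while true; do
    sysctl dev.cxl.0.stats.tx_pause dev.cxl.0.stats.rx_pause \
           dev.cxl.1.stats.tx_pause dev.cxl.1.stats.rx_pause
    sleep 1
done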
Regards,
Navdeep
On 3/24/17 7:39 PM, Navdeep Parhar wrote:
On 03/24/2017 16:07, Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER
SCIENCE CORP] wrote:
When we switch over to the vcxl* interfaces we get very bad results.
You're probably not using netmap with the vcxl interfaces, and the
number of "normal" tx and rx queues is just 2 for these interfaces.
Even if you _are_ using netmap, the hw.cxgbe.nnmtxq10g/rxq10g tunables
don't work anymore. Use these to control the number of queues for
netmap:
hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi
You should see a line like this in dmesg for every cxl/vcxl interface;
it tells you exactly how many queues the driver configured:
cxl0: 4 txq, 4 rxq (NIC); 4 txq, 2 rxq (TOE)
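For example, a /boot/loader.conf sketch using the per-VI tunables in
place of the old 10g ones (8 queues per VI is only an illustration --
pick values to match your cores), plus a quick way to confirm what the
driver actually configured after reboot:

# netmap queue counts for the vcxl* (extra VI) interfaces
hw.cxgbe.nnmtxq_vi=8
hw.cxgbe.nnmrxq_vi=8

# check the queue lines the driver reported
dmesg | grep -E 'v?cxl[0-9]+:'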
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
      629k  4.5k      0        66M       629k     0        66M     0     0
      701k  5.0k      0        74M       701k     0        74M     0     0
      668k  4.8k      0        70M       668k     0        70M     0     0
      667k  4.8k      0        70M       667k     0        70M     0     0
      645k  4.5k      0        68M       645k     0        68M     0     0
      686k  4.9k      0        72M       686k     0        72M     0     0
And by using just the cxl* interfaces we were getting about:
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
      2.8M     0   1.2M       294M       1.6M     0       171M     0     0
      2.8M     0   1.2M       294M       1.6M     0       171M     0     0
      2.8M     0   1.2M       294M       1.6M     0       171M     0     0
      2.8M     0   1.2M       295M       1.6M     0       172M     0     0
      2.8M     0   1.2M       295M       1.6M     0       171M     0     0
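For reference, the per-second figures above come from netstat in watch
mode; a sketch of the equivalent commands, assuming the -h
(human-readable) and -d (drops column) flags are available in your
netstat:

netstat -dhw 1              # system-wide totals, as above
netstat -dhw 1 -I vcxl0     # a single interface, e.g. vcxl0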
These are our configurations for now. Any advice or suggestions would
be appreciated.
What I don't understand is that you have PAUSE disabled and congestion
drops enabled, yet the number of packets coming in (whether they are
eventually dropped is irrelevant here) is still very low in your
experiments. It's almost as if the senders are backing off in the face
of packet loss. Are you using TCP or UDP? Always use UDP for pps
testing -- the senders need to be relentless.
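If your generators have netmap available, one way to get a relentless
64-byte UDP stream is netmap's pkt-gen; a rough sketch, where the
interface, addresses, and MAC are placeholders for your setup:

# flood 64-byte UDP frames toward the router's ingress port
pkt-gen -i netmap:ix0 -f tx -l 64 \
        -s 172.16.1.100 -d 172.16.2.100 \
        -D 00:07:43:xx:xx:xx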
Regards,
Navdeep
/etc/rc.conf configurations
ifconfig_cxl0="up"
ifconfig_cxl1="up"
ifconfig_vcxl0="inet 172.16.2.1/24 -tso -lro mtu 9000"
ifconfig_vcxl1="inet 172.16.1.1/24 -tso -lro mtu 9000"
gateway_enable="YES"
/boot/loader.conf configurations
# Chelsio Modules
t4fw_cfg_load="YES"
t5fw_cfg_load="YES"
if_cxgbe_load="YES"
# tx and rx queue sizes
dev.cxl.0.qsize_txq=8192
dev.cxl.0.qsize_rxq=8192
dev.cxl.1.qsize_txq=8192
dev.cxl.1.qsize_rxq=8192
# drop toecaps to increase queues
dev.t5nex.0.toecaps=0
dev.t5nex.0.rdmacaps=0
dev.t5nex.0.iscsicaps=0
dev.t5nex.0.fcoecaps=0
# Controls the hardware response to congestion. -1 disables
# congestion feedback and is not recommended. 0 instructs the
# hardware to backpressure its pipeline on congestion. This
# usually results in the port emitting PAUSE frames. 1 instructs
# the hardware to drop frames destined for congested queues. (From the
# cxgbe(4) man page.)
dev.t5nex.0.cong_drop=1
# Saw these recommendations in Vicenzo's email thread
hw.cxgbe.num_vis=2
hw.cxgbe.fl_pktshift=0
hw.cxgbe.toecaps_allowed=0
hw.cxgbe.nnmtxq10g=8
hw.cxgbe.nnmrxq10g=8
/etc/sysctl.conf configurations
# Turning off pauses
dev.cxl.0.pause_settings=0
dev.cxl.1.pause_settings=0
# John Jasen's suggestion - March 24, 2017
net.isr.bindthreads=0
net.isr.maxthreads=24
On 3/18/17 1:28 AM, Navdeep Parhar wrote:
On Fri, Mar 17, 2017 at 11:43:32PM -0400, John Jasen wrote:
On 03/17/2017 03:32 PM, Navdeep Parhar wrote:
On Fri, Mar 17, 2017 at 12:21 PM, John Jasen <jja...@gmail.com>
wrote:
Yes.
We were hopeful, initially, that we would be able to achieve higher
packet forwarding rates through either netmap-fwd or the enhancements
proposed in https://wiki.freebsd.org/ProjectsRoutingProposal
Have you tried netmap-fwd? I'd be interested in how that did in
your tests.
We have. On this particular box (11-STABLE, netmap-fwd fresh from
git), it took about 1.7M pps in, dropped 500K, and passed about 800K.
I'm led to believe that vcxl interfaces may yield better results?
Yes, those are the ones with native netmap support. Any netmap-based
application should use the vcxl interfaces. If you ran it on the
main cxl interfaces, you were running netmap in emulated mode.
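For example (assuming netmap-fwd takes the interfaces to forward
between as plain arguments, as in its README examples), a native-mode
run would point it at the vcxl pair, and dmesg should show netmap
queues on those interfaces:

dmesg | grep vcxl           # expect "... (NIC); N txq, N rxq (netmap)"
netmap-fwd vcxl0 vcxl1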
Regards,
Navdeep