On 02/17/2017 06:18 PM, Alec Hothan (ahothan) wrote:

Hi Karl

Can you also tell us which version of DPDK you were using for OVS and for VPP (for VPP, is it the one bundled with 17.01)?

DPDK 16.11 and VPP 17.01.

“The pps is the bi-directional sum of the packets received back at the traffic generator.”

Just to make sure….

If your traffic gen sends 1 Mpps to each of the 2 interfaces and you get no drop (meaning you receive 1 Mpps back from each interface), what do you report: 2 Mpps or 4 Mpps?

You seem to say 2 Mpps (sum of all RX).

The CSIT perf numbers report sum(TX); in the above example CSIT would report 2 Mpps.

The CSIT numbers for 1 vhost/1 VM (practically similar to yours) are about half of what you report.

https://docs.fd.io/csit/rls1701/report/vpp_performance_results_hw/performance_results_hw.html#ge2p1x520-dot1q-l2xcbase-eth-2vhost-1vm-ndrpdrdisc

Scroll down the table to tc13/tc14, 4t4c (4 threads), L2XC, 64B: NDR 5.95 Mpps (aggregated TX of the 2 interfaces), PDR 7.47 Mpps,

while the results in your slides put it at around 11 Mpps.

So either your testbed really switches 2 times more packets than the CSIT one, or you’re actually reporting double the amount compared to how CSIT reports it…
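
As a purely illustrative aid (the per-port rate below is the 1 Mpps example from this thread, not a measurement from either testbed), here is a small Python sketch of how the different reporting conventions diverge:

TG_TX_PPS_PER_PORT = 1_000_000   # traffic gen sends 1 Mpps into each interface
NUM_PORTS = 2                    # symmetric bi-directional test, 2 physical interfaces
DROP_RATE = 0.0                  # zero-drop case

tg_tx_total = TG_TX_PPS_PER_PORT * NUM_PORTS    # sum(TX) at the traffic gen (CSIT convention)
tg_rx_total = tg_tx_total * (1.0 - DROP_RATE)   # sum(RX) back at the traffic gen
tg_tx_plus_rx = tg_tx_total + tg_rx_total       # TX+RX aggregate at the traffic gen
vswitch_forwarded = 2 * tg_tx_total             # PVP path crosses the vswitch twice per packet

for label, pps in [
    ("sum(TX) at traffic gen (CSIT)", tg_tx_total),
    ("sum(RX) back at traffic gen", tg_rx_total),
    ("sum(TX+RX) at traffic gen", tg_tx_plus_rx),
    ("vswitch-centric forwarded", vswitch_forwarded),
]:
    print(f"{label}: {pps / 1e6:.2f} Mpps")

At zero drop the first two conventions both give 2 Mpps, while the last two give 4 Mpps, which is why the convention matters when comparing the slides against the CSIT tables.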

Thanks

 Alec

    *From: *Karl Rister <kris...@redhat.com>
    *Organization: *Red Hat
    *Reply-To: *"kris...@redhat.com" <kris...@redhat.com>
    *Date: *Thursday, February 16, 2017 at 11:09 AM
    *To: *"Alec Hothan (ahothan)" <ahot...@cisco.com>, "Maciek
    Konstantynowicz (mkonstan)" <mkons...@cisco.com>, Thomas F Herbert
    <therb...@redhat.com>
    *Cc: *Andrew Theurer <atheu...@redhat.com>, Douglas Shakshober
    <dsh...@redhat.com>, "csit-...@lists.fd.io"
    <csit-...@lists.fd.io>, vpp-dev <vpp-dev@lists.fd.io>
    *Subject: *Re: [vpp-dev] Interesting perf test results from Red
    Hat's test team

    On 02/15/2017 08:58 PM, Alec Hothan (ahothan) wrote:

        Great summary slides, Karl. I have a few more questions on the slides.

        ·         Did you use OSP10/OSPD/ML2 to deploy your testpmd VM/configure the vswitch or is it direct launch using libvirt and direct config of the vswitches? (this is a bit related to Maciek’s question on the exact interface configs in the vswitch)

    There was no use of OSP in these tests; the guest is launched via libvirt and the vswitches are manually launched and configured with shell scripts.

        ·         Unclear if all the chart results were measured using 4 phys cores (no HT) or 2 phys cores (4 threads with HT)

    Only slide 3 has any 4 core (no HT) data; all other data is captured using HT on the appropriate number of cores: 2 for single queue, 4 for two queue, and 6 for three queue.
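
    To make the core accounting above concrete, here is a minimal sketch. It assumes one PMD thread per interface queue, 4 interfaces in the PVP path (2 physical, 2 vhost), and 2 PMD threads per physical core with HT; the function itself is illustrative, not part of the test scripts:

def vswitch_cores(num_queues, ht_enabled=True, interfaces=4):
    """PMD threads and physical cores needed for the PVP setup described above."""
    pmd_threads = interfaces * num_queues          # one data plane thread per interface queue
    threads_per_core = 2 if ht_enabled else 1      # HT packs two PMD threads per physical core
    return pmd_threads, pmd_threads // threads_per_core

for q in (1, 2, 3):
    threads, cores = vswitch_cores(q)
    print(f"{q} queue(s): {threads} PMD threads on {cores} physical cores (HT)")
# -> 2 cores for single queue, 4 for two queues, 6 for three queues, as stated above.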

        ·         How do you report your pps? ;-) Are those

        o   vswitch centric (how many packets the vswitch forwards per second coming from traffic gen and from VMs)

        o   or traffic gen centric aggregated TX (how many pps are sent by the traffic gen on both interfaces)

        o   or traffic gen centric aggregated TX+RX (how many pps are sent and received by the traffic gen on both interfaces)

    The pps is the bi-directional sum of the packets received back at the traffic generator.

        ·         From the numbers shown, it looks like it is the first or the last

        ·         Unidirectional or symmetric bi-directional traffic?

    symmetric bi-directional

        ·         BIOS Turbo boost enabled or disabled?

    disabled

        ·         How many vcpus running the testpmd VM?

    3, 5, or 7.  1 VCPU for housekeeping and then 2 VCPUs for each queue in the configuration.  Only the required VCPUs are active for any configuration, so the VCPU count varies depending on the configuration being tested.
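
    In other words, a trivial sketch of the vCPU sizing just described (the function name is illustrative, not from the test harness):

def guest_vcpus(num_queues):
    # 1 housekeeping vCPU plus 2 vCPUs per queue, as described above.
    return 1 + 2 * num_queues

assert [guest_vcpus(q) for q in (1, 2, 3)] == [3, 5, 7]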

        ·         How do you range the combinations in your 1M flows src/dest MAC? I’m not aware of any real NFV cloud deployment/VNF that handles that type of flow pattern, do you?

    We increment all the fields being modified by one for each packet until we hit a million and then we restart at the base value and repeat.  So all IPs and/or MACs get modified in unison.
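
    A rough sketch of that flow pattern (illustrative Python; the wrap count is the 1M flows from the thread, everything else is a placeholder rather than the traffic generator's actual configuration):

FLOW_COUNT = 1_000_000

def flow_offsets(flow_count=FLOW_COUNT):
    """Yield a per-packet offset that increments by one and wraps back to
    the base value after flow_count packets; the same offset is applied to
    every modified field (srcMac, dstMac, srcIp, dstIp) in unison."""
    offset = 0
    while True:
        yield offset
        offset = (offset + 1) % flow_count

gen = flow_offsets()
print([next(gen) for _ in range(5)])   # [0, 1, 2, 3, 4]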

    We actually arrived at the srcMac,dstMac configuration in a backwards manner.  On one of our systems where we develop the traffic generator we were getting an error when doing srcMac,dstMac,srcIp,dstIp that we couldn't figure out in the time needed for this work, so we were going to just go with srcMac,dstMac due to time constraints.  However, on the system where we actually did the testing both worked, so I just collected both out of curiosity.

        Thanks

           Alec

             *From: *<vpp-dev-boun...@lists.fd.io> on behalf of "Maciek Konstantynowicz (mkonstan)" <mkons...@cisco.com>
             *Date: *Wednesday, February 15, 2017 at 1:28 PM
             *To: *Thomas F Herbert <therb...@redhat.com>
             *Cc: *Andrew Theurer <atheu...@redhat.com>, Douglas Shakshober <dsh...@redhat.com>, "csit-...@lists.fd.io" <csit-...@lists.fd.io>, vpp-dev <vpp-dev@lists.fd.io>, Karl Rister <kris...@redhat.com>
             *Subject: *Re: [vpp-dev] Interesting perf test results from Red Hat's test team

             Thomas, many thanks for sending this.

             A few comments and questions after reading the slides:

             1. s3 clarification - host and data plane thread setup - vswitch pmd (data plane) thread placement

                 a. “1PMD/core (4 core)” - HT (SMT) disabled, 4 phy cores used for vswitch, each with data plane thread.

                 b. “2PMD/core (2 core)” - HT (SMT) enabled, 2 phy cores, 4 logical cores used for vswitch, each with data plane thread.

                 c. in both cases each data plane thread handling a single interface - 2* physical, 2* vhost => 4 threads, all busy.

                 d. in both cases frames are dropped by vswitch or in vring due to vswitch not keeping up - IOW testpmd in kvm guest is not DUT.

             2. s3 question - vswitch setup - it is unclear what the forwarding mode of each vswitch is, as only srcIp changed in flows

                 a. flow or MAC learning mode?

                 b. port to port crossconnect?

             3. s3 comment - host and data plane thread setup

                 a. “2PMD/core (2 core)” case - thread placement may yield different results

                     - physical interface threads as siblings vs.

                     - physical and virtual interface threads as siblings.

                 b. “1PMD/core (4 core)” - one would expect these to be much higher than “2PMD/core (2 core)”

                     - speculation: possibly due to "instruction load" imbalance between threads.

                     - two types of thread with different "instruction load": phy->vhost vs. vhost->phy

                     - "instruction load" = instr/pkt, instr/cycle (IPC efficiency).

             4. s4 comment - results look as expected for vpp

             5. s5 question - unclear why throughput doubled

                 a. e.g. for vpp from "11.16 Mpps" to "22.03 Mpps"

                 b. if only queues increased, and cpu resources did not, or have they?

             6. s6 question - similar to point 5 - unclear cpu and thread resources.

             7. s7 comment - anomaly for 3q (virtio multi-queue) for (srcMac,dstMac)

                 a. could be due to flow hashing inefficiency.

             -Maciek

                 On 15 Feb 2017, at 17:34, Thomas F Herbert <therb...@redhat.com> wrote:

                 Here are test results on VPP 17.01 compared with OVS 2.6/DPDK 16.11, performed by Karl Rister of Red Hat.

                 This is PVP testing with 1, 2, and 3 queues. It is an interesting comparison with the CSIT results. Of particular interest is the drop-off on the 3-queue results.

                 --TFH

                 --

                 *Thomas F Herbert*

                 SDN Group

                 Office of Technology

                 *Red Hat*

        
                 <vpp-17.01_vs_ovs-2.6.pdf>


--
    Karl Rister <kris...@redhat.com>


--
*Thomas F Herbert*
SDN Group
Office of Technology
*Red Hat*
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
