Re: GCP cloud : Virtio-PMD performance Issue
Thanks @maxime.coque...@redhat.com
Have included dev@dpdk.org

On Fri, Dec 6, 2024 at 2:11 AM Maxime Coquelin wrote:
> Hi Mukul,
>
> DPDK upstream mailing lists should be added to this e-mail.
> I am not allowed to provide off-list support; all discussions should
> happen upstream.
>
> If this is reproduced with the downstream DPDK provided with RHEL and
> you have a RHEL subscription, please use the Red Hat issue tracker.
>
> Thanks for your understanding,
> Maxime
>
> On 12/5/24 21:36, Mukul Sinha wrote:
> > + Varun
> >
> > On Fri, Dec 6, 2024 at 2:04 AM Mukul Sinha <mukul.si...@broadcom.com> wrote:
> >
> >    Hi GCP & Virtio-PMD dev teams,
> >    We are from the VMware NSX Advanced Load Balancer team. In GCP cloud
> >    (custom-8-8192 VM instance type, 8 cores / 8 GB) we are triaging a
> >    TCP profile application throughput issue with a single dispatcher
> >    core and a single Rx/Tx queue (queue depth: 2048). The throughput we
> >    get with the dpdk-22.11 virtio PMD is significantly degraded
> >    compared to the dpdk-20.05 PMD.
> >    We see the Tx packet-drop counter on the virtio NIC incrementing
> >    rapidly, which points to the GCP hypervisor side being unable to
> >    drain the packets fast enough (no drops are seen on the Rx side).
> >    The behavior is as follows:
> >    Using dpdk-22.11:
> >    At just 75% CPU usage we already see a huge number of Tx packet
> >    drops (no Rx drops), causing TCP retransmissions and eventually
> >    bringing down the effective throughput.
> >    Using dpdk-20.05:
> >    Even at ~95% CPU usage, without any packet drops (neither Rx nor
> >    Tx), we get much better throughput.
> >
> >    To improve the dpdk-22.11 numbers we tried increasing the queue
> >    depth to 4096, but that didn't help.
> >    If with dpdk-22.11 we move from a single core with Rx/Tx queue=1 to
> >    a single core with Rx/Tx queue=2, we get slightly better numbers
> >    (but still not matching dpdk-20.05 with a single core and Rx/Tx
> >    queue=1). This again corroborates that the GCP hypervisor is the
> >    bottleneck here.
> >
> >    To root-cause this issue we replicated the behavior using native
> >    DPDK testpmd as shown below (commands used):
> >    Hugepage size: 2 MB
> >    ./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --portmask=0x3
> >    set fwd mac
> >    set fwd flowgen
> >    set txpkts 1518
> >    start
> >    stop
> >
> >    Testpmd traffic run (packet size = 1518) for the exact same
> >    interval of 15 seconds:
> >
> >    22.11:
> >    -- Forward statistics for port 0 --
> >    RX-packets: 2          RX-dropped: 0            RX-total: 2
> >    TX-packets: 19497570   TX-dropped: 364674686    TX-total: 384172256
> >
> >    20.05:
> >    -- Forward statistics for port 0 --
> >    RX-packets: 3          RX-dropped: 0            RX-total: 3
> >    TX-packets: 19480319   TX-dropped: 0            TX-total: 19480319
> >
> >    As you can see:
> >    dpdk-22.11: packets generated: ~384 million; packets serviced: ~19.5 million; Tx-dropped: ~364 million
> >    dpdk-20.05: packets generated: ~19.5 million; packets serviced: ~19.5 million; Tx-dropped: 0
> >
> >    The actual serviced traffic remains almost the same between the two
> >    versions (implying the underlying GCP hypervisor is only capable of
> >    handling that much), but with dpdk-22.11 the PMD is pushing almost
> >    20x the traffic compared to dpdk-20.05.
> >    The same pattern can be seen even if we run traffic for a longer
> >    duration.
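> >    As a rough sanity check on the rates these counters imply
> >    (back-of-the-envelope only, assuming the 15-second interval is
> >    exact):
> >    attempted Tx rate (22.11):        384172256 pkts / 15 s ~= 25.6 Mpps
> >    serviced Tx rate (both versions): ~19.5M pkts / 15 s    ~= 1.3 Mpps
> >    i.e. the hypervisor drains roughly 1.3 Mpps in both cases, while
> >    dpdk-22.11 attempts roughly 20x that rate.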
> >    ===
> >
> >    Following are our queries:
> >    @Virtio-dev team
> >    1. Why, with dpdk-22.11 and the virtio PMD, is the testpmd
> >    application able to pump 20 times the Tx traffic towards the
> >    hypervisor compared to dpdk-20.05? What has changed between these
> >    two versions?
Re: GCP cloud : Virtio-PMD performance Issue
GCP-dev team
@jeroe...@google.com @rush...@google.com @joshw...@google.com
Please do check on this & get back.

On Fri, Dec 6, 2024 at 4:24 AM Mukul Sinha wrote:
> Thanks @maxime.coque...@redhat.com
> Have included dev@dpdk.org
>
> [snip]
Re: GCP cloud : Virtio-PMD performance Issue
Hi Maxime,
We have run perf top on dpdk-20.05 vs dpdk-22.11, but there is no difference
in the top-hitter APIs. In our analysis the virtio-PMD CPU performance is not
a bottleneck (in fact it is more performant now); it is the GCP hypervisor
which isn't able to cope with 3 times the Tx traffic load.

On Fri, Dec 6, 2024 at 1:54 PM Maxime Coquelin wrote:
> Hi Mukul,
>
> On 12/5/24 23:54, Mukul Sinha wrote:
> > [snip]
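(For reference, a minimal sketch of the kind of perf invocation used for such
a comparison, assuming the datapath runs as a single process and <pid> is its
process ID; core pinning and symbol setup omitted:)

    # live view of the hottest functions in the datapath process
    perf top -p <pid>
    # or record for ~30 seconds and compare the two builds offline
    perf record -g -p <pid> -- sleep 30
    perf report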
Re: GCP cloud : Virtio-PMD performance Issue
Thanks Maxime,
We will analyse further and try pinpointing the regression commit between
DPDK-21.11 & DPDK-21.08. Will get back with further queries once we have an
update.
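The plan is roughly the following (a sketch only, assuming the regression
keeps reproducing with the testpmd flowgen run described earlier; exact build
and test steps depend on the local setup):

    git bisect start v21.11 v21.08        # v21.11 bad, v21.08 good
    meson setup build && ninja -C build   # rebuild at each bisect step
    # re-run the testpmd flowgen test, check the TX-dropped counter,
    # then mark the current commit accordingly:
    git bisect good    # or: git bisect bad
    # repeat until git bisect reports the first bad commit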
On Fri, Dec 13, 2024 at 4:17 PM Maxime Coquelin wrote:
> (with DPDK ML that got removed)
>
> On 12/13/24 11:46, Maxime Coquelin wrote:
> >
> > On 12/13/24 11:21, Mukul Sinha wrote:
> >> Thanks @joshw...@google.com @Maxime Coquelin for the inputs.
> >>
> >> @Maxime Coquelin
> >> I did code bisecting and was able to pinpoint through testpmd runs
> >> that this issue starts appearing from the DPDK-21.11 version onwards.
> >> Up to DPDK-21.08 the issue is not seen.
> >> To recap the issue: the actual amount of traffic serviced by the
> >> hypervisor remains almost the same between the two versions (implying
> >> the underlying GCP hypervisor is only capable of handling that much),
> >> but in >=dpdk-21.11 versions the virtio PMD is pushing almost 20x the
> >> traffic compared to dpdk-21.08. This huge traffic rate in
> >> >=dpdk-21.11 leads to high packet-drop rates, since the underlying
> >> hypervisor can at most handle the same load it was servicing in
> >> <=dpdk-21.08.
> >> The same pattern can be seen even if we run traffic for a longer
> >> duration.
> >>
> >> Eg:
> >> Testpmd traffic run (packet size = 1518) for the exact same interval
> >> of 15 seconds:
> >>
> >> >=21.11 DPDK versions:
> >> -- Forward statistics for port 0 --
> >> RX-packets: 2          RX-dropped: 0            RX-total: 2
> >> TX-packets: 19497570   TX-dropped: 364674686    TX-total: 384172256
> >>
> >> Up to 21.08 DPDK versions:
> >> -- Forward statistics for port 0 --
> >> RX-packets: 3          RX-dropped: 0            RX-total: 3
> >> TX-packets: 19480319   TX-dropped: 0            TX-total: 19480319
> >>
> >> As you can see:
> >> >=dpdk-21.11: packets generated: ~384 million; packets serviced: ~19.5 million; Tx-dropped: ~364 million
> >> <=dpdk-21.08: packets generated: ~19.5 million; packets serviced: ~19.5 million; Tx-dropped: 0
> >>
> >> ==
> >> @Maxime Coquelin
> >> I have gone through all the commits made by the virtio team between
> >> DPDK-21.08 and DPDK-21.11 as per the commit logs available at
> >> https://git.dpdk.org/dpdk/log/drivers/net/virtio
> >> I even tried undoing all the possibly relevant commits (that I could
> >> think of) on a dpdk-21.11 workspace and then re-running testpmd in
> >> order to track down which commit introduced this regression, but no
> >> luck.
> >> I need your inputs: if you could glance through the commits made
> >> between these releases, please let us know if there is any particular
> >> commit of interest which you think could cause the behavior seen
> >> above (or any commit not captured in the above git link; maybe a
> >> commit outside the virtio PMD code, perhaps?).
> >
> > As your issue seems 100% reproducible, using git bisect you should be
> > able to point to the commit introducing the regression.
> >
> > This is what I need to be able to help you.
> >
> > Regards,
> > Maxime
> >
> >> Thanks,
> >> Mukul
> >>
> >> On Mon, Dec 9, 2024 at 9:54 PM Joshua Washington <joshw...@google.com> wrote:
> >>
> >>    Hello,
> >>
> >>    Based on your VM shape (8 vCPU VM) and packet size (1518B
> >>    packets), what you are seeing is exactly expected. 8 vCPU Gen 2
> >>    VMs have a default egress cap of 16 Gbps. This equates to roughly
> >>    1.3 Mpps when using 1518B packets, including IFG. Over the course
> >>    of 15 seconds, ...
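(A quick back-of-the-envelope check of the ~1.3 Mpps figure quoted above,
assuming 20 bytes of preamble/SFD plus inter-frame gap per 1518-byte frame on
the wire:)

    wire size per frame:  1518 B + 20 B = 1538 B = 12304 bits
    16 Gbps egress cap:   16e9 bits/s / 12304 bits ~= 1.3 Mpps
    over 15 seconds:      1.3 Mpps * 15 s ~= 19.5 million packets,
                          which matches the ~19.5M TX-packets actually
                          serviced in the testpmd runs above.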