Re: GCP cloud : Virtio-PMD performance Issue

2024-12-09 Thread Mukul Sinha
Thanks @maxime.coque...@redhat.com
Have included dev@dpdk.org


On Fri, Dec 6, 2024 at 2:11 AM Maxime Coquelin 
wrote:

> Hi Mukul,
>
> DPDK upstream mailing lists should be added to this e-mail.
> I am not allowed to provide off-list support, all discussions should
> happen upstream.
>
> If this is reproduced with downstream DPDK provided with RHEL and you
> have a RHEL subscription, please use the Red Hat issue tracker.
>
> Thanks for your understanding,
> Maxime
>
> On 12/5/24 21:36, Mukul Sinha wrote:
> > + Varun
> >
> > On Fri, Dec 6, 2024 at 2:04 AM Mukul Sinha <mukul.si...@broadcom.com> wrote:
> >
> > Hi GCP & Virtio-PMD dev teams,
> > We are from the VMware NSX Advanced Load Balancer team. In GCP cloud
> > (*custom-8-8192 VM instance type, 8 cores / 8 GB*) we are triaging a TCP
> > profile application throughput issue with a single dispatcher core and a
> > single Rx/Tx queue (queue depth: 2048): the throughput we get with the
> > dpdk-22.11 virtio-PMD code is significantly degraded compared to the
> > dpdk-20.05 PMD.
> > We see the Tx packet drop counter on the virtio NIC incrementing rapidly,
> > which points to the GCP hypervisor side being unable to drain the packets
> > fast enough (no drops are seen on the Rx side).
> > The behavior is as follows:
> > _Using dpdk-22.11_
> > Already at 75% CPU usage we start seeing a huge number of Tx packet
> > drops reported (no Rx drops), causing TCP retransmissions and eventually
> > bringing down the effective throughput numbers.
> > _Using dpdk-20.05_
> > Even at ~95% CPU usage we get much better throughput, without any
> > packet drops (neither Rx nor Tx).
> >
> > To improve the dpdk-22.11 numbers we tried increasing the queue depth
> > to 4096, but that didn't help.
> > If, with dpdk-22.11, we move from a single core with Rx/Tx queue=1 to a
> > single core with Rx/Tx queue=2, we get slightly better numbers (but they
> > still don't match those obtained with dpdk-20.05, single core, Rx/Tx
> > queue=1). This again corroborates that the GCP hypervisor is the
> > bottleneck here.
> >
> > To root-cause this issue, we were able to replicate the behavior
> > using native DPDK testpmd as shown below (commands used):
> > Hugepage size: 2 MB
> >   ./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=2048
> > --rxd=2048 --rxq=1 --txq=1  --portmask=0x3
> > set fwd mac
> > set fwd flowgen
> > set txpkts 1518
> > start
> > stop
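For context on the TX-dropped counter shown in the statistics below: testpmd
hands each burst to the driver via rte_eth_tx_burst(), and whatever the Tx
queue does not accept is freed and counted as dropped. A minimal sketch of
that accounting (simplified, with hypothetical names, not testpmd's exact
code):

  #include <rte_ethdev.h>
  #include <rte_mbuf.h>

  /* Hypothetical counter mirroring what testpmd reports as TX-dropped. */
  static uint64_t tx_dropped;

  static void
  send_burst(uint16_t port_id, uint16_t queue_id,
             struct rte_mbuf **pkts, uint16_t nb_pkts)
  {
          uint16_t nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);

          /* Packets the virtio Tx queue refused are freed and counted. */
          while (nb_tx < nb_pkts) {
                  rte_pktmbuf_free(pkts[nb_tx++]);
                  tx_dropped++;
          }
  }

A large TX-dropped count therefore means the application kept offering far
more packets than the backend would accept.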
> >
> > Testpmd traffic run (packet size = 1518) for the exact same time
> > interval of 15 seconds:
> >
> > _22.11_
> >   ---------------------- Forward statistics for port 0 ----------------------
> >   RX-packets: 2         RX-dropped: 0          RX-total: 2
> >   TX-packets: 19497570  *TX-dropped: 364674686*  TX-total: 384172256
> >   ----------------------------------------------------------------------------
> >
> > _20.05_
> >   ---------------------- Forward statistics for port 0 ----------------------
> >   RX-packets: 3         RX-dropped: 0          RX-total: 3
> >   TX-packets: 19480319  TX-dropped: 0          TX-total: 19480319
> >   ----------------------------------------------------------------------------
> >
> > As you can see:
> > dpdk-22.11
> > Packets generated: 384 million; packets serviced: ~19.5 million;
> > Tx-dropped: 364 million
> > dpdk-20.05
> > Packets generated: ~19.5 million; packets serviced: ~19.5 million;
> > Tx-dropped: 0
> >
> > The actual serviced traffic remains almost the same between the two
> > versions (implying the underlying GCP hypervisor is only capable of
> > handling that much), but with dpdk-22.11 the PMD is pushing almost 20x
> > the traffic compared to dpdk-20.05.
> > The same pattern can be seen even if we run traffic for a longer
> > duration.
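Expressed as rates over the 15-second run (derived from the counters above):

  22.11: 384172256 / 15 s ≈ 25.6 Mpps offered, 19497570 / 15 s ≈ 1.3 Mpps actually sent
  20.05: 19480319 / 15 s ≈ 1.3 Mpps offered and sent

i.e. both versions get roughly the same ~1.3 Mpps through; 22.11 simply offers
about 20x more.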
> >
> > ===========================================================================
> >
> > Our queries are as follows:
> > @ Virtio-dev team
> > 1. Why, with dpdk-22.11 and the virtio PMD, is the testpmd application
> > able to pump 20 times the Tx traffic towards the hypervisor compared to
> > dpdk-20.05?
> > What has changed

Re: GCP cloud : Virtio-PMD performance Issue

2024-12-09 Thread Mukul Sinha
GCP-dev team (@jeroe...@google.com, @rush...@google.com, @joshw...@google.com),
please do check on this and get back to us.

On Fri, Dec 6, 2024 at 4:24 AM Mukul Sinha  wrote:

> Thanks @maxime.coque...@redhat.com
> Have included dev@dpdk.org
>
>
> On Fri, Dec 6, 2024 at 2:11 AM Maxime Coquelin 
> wrote:
>
>> Hi Mukul,
>>
>> DPDK upstream mailing lists should be added to this e-mail.
>> I am not allowed to provide off-list support, all discussions should
>> happen upstream.
>>
>> If this is reproduced with downstream DPDK provided with RHEL and you
>> have a RHEL subscription, please use the Red Hat issue tracker.
>>
>> Thanks for your understanding,
>> Maxime
>>
>> On 12/5/24 21:36, Mukul Sinha wrote:
>> > + Varun
>> >
>> > On Fri, Dec 6, 2024 at 2:04 AM Mukul Sinha <mukul.si...@broadcom.com> wrote:
>> >
>> > Hi GCP & Virtio-PMD dev teams,
>> > We are from the VMware NSX Advanced Load Balancer team. In GCP cloud
>> > (*custom-8-8192 VM instance type, 8 cores / 8 GB*) we are triaging a TCP
>> > profile application throughput issue with a single dispatcher core and a
>> > single Rx/Tx queue (queue depth: 2048): the throughput we get with the
>> > dpdk-22.11 virtio-PMD code is significantly degraded compared to the
>> > dpdk-20.05 PMD.
>> > We see the Tx packet drop counter on the virtio NIC incrementing rapidly,
>> > which points to the GCP hypervisor side being unable to drain the packets
>> > fast enough (no drops are seen on the Rx side).
>> > The behavior is as follows:
>> > _Using dpdk-22.11_
>> > Already at 75% CPU usage we start seeing a huge number of Tx packet
>> > drops reported (no Rx drops), causing TCP retransmissions and eventually
>> > bringing down the effective throughput numbers.
>> > _Using dpdk-20.05_
>> > Even at ~95% CPU usage we get much better throughput, without any
>> > packet drops (neither Rx nor Tx).
>> >
>> > To improve the dpdk-22.11 numbers we tried increasing the queue depth
>> > to 4096, but that didn't help.
>> > If, with dpdk-22.11, we move from a single core with Rx/Tx queue=1 to a
>> > single core with Rx/Tx queue=2, we get slightly better numbers (but they
>> > still don't match those obtained with dpdk-20.05, single core, Rx/Tx
>> > queue=1). This again corroborates that the GCP hypervisor is the
>> > bottleneck here.
>> >
>> > To root-cause this issue, we were able to replicate the behavior
>> > using native DPDK testpmd as shown below (commands used):
>> > Hugepage size: 2 MB
>> >   ./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=2048
>> > --rxd=2048 --rxq=1 --txq=1  --portmask=0x3
>> > set fwd mac
>> > set fwd flowgen
>> > set txpkts 1518
>> > start
>> > stop
>> >
>> > Testpmd traffic run (packet size = 1518) for the exact same time
>> > interval of 15 seconds:
>> >
>> > _22.11_
>> >   ---------------------- Forward statistics for port 0 ----------------------
>> >   RX-packets: 2         RX-dropped: 0          RX-total: 2
>> >   TX-packets: 19497570  *TX-dropped: 364674686*  TX-total: 384172256
>> >   ----------------------------------------------------------------------------
>> >
>> > _20.05_
>> >   ---------------------- Forward statistics for port 0 ----------------------
>> >   RX-packets: 3         RX-dropped: 0          RX-total: 3
>> >   TX-packets: 19480319  TX-dropped: 0          TX-total: 19480319
>> >   ----------------------------------------------------------------------------
>> >
>> > As you can see:
>> > dpdk-22.11
>> > Packets generated: 384 million; packets serviced: ~19.5 million;
>> > Tx-dropped: 364 million
>> > dpdk-20.05
>> > Packets generated: ~19.5 million; packets serviced: ~19.5 million;
>> > Tx-dropped: 0
>> >
>> > The actual serviced traffic remains almost the same between the two
>> > versions (implying the underlying GCP hypervisor is only capable of
>> > handling that much), but with dpdk-22.11 the PMD is pushing almost 20x

Re: GCP cloud : Virtio-PMD performance Issue

2024-12-11 Thread Mukul Sinha
Hi Maxime,
We have run perf top on dpdk-20.05 vs dpdk-22.11, but there is no notable
difference in the top-hitter APIs. In our analysis the virtio-PMD's CPU
performance is not a bottleneck (in fact it is more performant now); it is the
GCP hypervisor which isn't able to cope with 3 times the Tx traffic load.
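For reference, the comparison above can be reproduced with a profiler pinned
to the forwarding core while traffic is running; a typical invocation
(assuming the dispatcher/PMD lcore is pinned to CPU 1, adjust -C accordingly)
would be:

  perf top -C 1

run once per DPDK version, comparing the top symbols between the two.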

On Fri, Dec 6, 2024 at 1:54 PM Maxime Coquelin 
wrote:

> Hi Mukul,
>
> On 12/5/24 23:54, Mukul Sinha wrote:
> > Thanks @maxime.coque...@redhat.com
> > Have included dev@dpdk.org
> >
> >
> > On Fri, Dec 6, 2024 at 2:11 AM Maxime Coquelin <maxime.coque...@redhat.com> wrote:
> >
> > Hi Mukul,
> >
> > DPDK upstream mailing lists should be added to this e-mail.
> > I am not allowed to provide off-list support, all discussions should
> > happen upstream.
> >
> > If this is reproduced with downstream DPDK provided with RHEL and you
> > have a RHEL subscription, please use the Red Hat issue tracker.
> >
> > Thanks for your understanding,
> > Maxime
> >
> > On 12/5/24 21:36, Mukul Sinha wrote:
> >  > + Varun
> >  >
> >  > On Fri, Dec 6, 2024 at 2:04 AM Mukul Sinha <mukul.si...@broadcom.com> wrote:
> >  >
> >  > Hi GCP & Virtio-PMD dev teams,
> >  > We are from the VMware NSX Advanced Load Balancer team. In GCP cloud
> >  > (*custom-8-8192 VM instance type, 8 cores / 8 GB*) we are triaging a
> >  > TCP profile application throughput issue with a single dispatcher core
> >  > and a single Rx/Tx queue (queue depth: 2048): the throughput we get
> >  > with the dpdk-22.11 virtio-PMD code is significantly degraded compared
> >  > to the dpdk-20.05 PMD.
> >  > We see the Tx packet drop counter on the virtio NIC incrementing
> >  > rapidly, which points to the GCP hypervisor side being unable to drain
> >  > the packets fast enough (no drops are seen on the Rx side).
> >  > The behavior is as follows:
> >  > _Using dpdk-22.11_
> >  > Already at 75% CPU usage we start seeing a huge number of Tx packet
> >  > drops reported (no Rx drops), causing TCP retransmissions and
> >  > eventually bringing down the effective throughput numbers.
> >  > _Using dpdk-20.05_
> >  > Even at ~95% CPU usage we get much better throughput, without any
> >  > packet drops (neither Rx nor Tx).
> >  >
> >  > To improve the dpdk-22.11 numbers we tried increasing the queue depth
> >  > to 4096, but that didn't help.
> >  > If, with dpdk-22.11, we move from a single core with Rx/Tx queue=1 to
> >  > a single core with Rx/Tx queue=2, we get slightly better numbers (but
> >  > they still don't match those obtained with dpdk-20.05, single core,
> >  > Rx/Tx queue=1). This again corroborates that the GCP hypervisor is the
> >  > bottleneck here.
> >  >
> >  > To root-cause this issue, we were able to replicate the behavior
> >  > using native DPDK testpmd as shown below (commands used):
> >  > Hugepage size: 2 MB
> >  >   ./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=2048
> >  > --rxd=2048 --rxq=1 --txq=1  --portmask=0x3
> >  > set fwd mac
> >  > set fwd flowgen
> >  > set txpkts 1518
> >  > start
> >  > stop
> >  >
> >  > Testpmd traffic run (packet size = 1518) for the exact same time
> >  > interval of 15 seconds:
> >  >
> >  > _22.11_
> >  >   ---------------------- Forward statistics for port 0 ----------------------
> >  >   RX-packets: 2         RX-dropped: 0          RX-total: 2
> >  >   TX-packets: 19497570  *TX-dropped: 364674686*  TX-total: 384172256
> >  >   ----------------------------------------------------------------------------
> >  > _20.05_

Re: GCP cloud : Virtio-PMD performance Issue

2024-12-17 Thread Mukul Sinha
Thanks Maxime,
We will analyse further and try to pinpoint the regression commit between
DPDK-21.08 and DPDK-21.11.
Will get back with further queries once we have an update.
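For reference, a sketch of such a bisect run (assuming the testpmd
reproduction described earlier is used as the test at each step):

  # in a DPDK git workspace
  git bisect start
  git bisect bad v21.11         # first release where the issue is seen
  git bisect good v21.08        # last release where it is not
  # at each commit git checks out: rebuild, rerun the testpmd reproduction,
  # then mark the result
  git bisect good               # or: git bisect bad
  git bisect reset              # once git has printed the culprit commit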

On Fri, Dec 13, 2024 at 4:17 PM Maxime Coquelin 
wrote:

> (re-adding the DPDK ML, which had been removed)
>
> On 12/13/24 11:46, Maxime Coquelin wrote:
> >
> >
> > On 12/13/24 11:21, Mukul Sinha wrote:
> >> Thanks @joshw...@google.com @Maxime Coquelin for the inputs.
> >>
> >> @Maxime Coquelin
> >> I did code bisecting and was able to pinpoint through testpmd runs
> >> that *we start seeing this issue from DPDK-21.11 onwards; up to
> >> DPDK-21.08 the issue is not seen.*
> >> To recap the issue: the actual amount of traffic serviced by the
> >> hypervisor remains almost the same between the two versions (implying
> >> the underlying GCP hypervisor is only capable of handling that much),
> >> but in >=dpdk-21.11 the virtio-PMD is pushing almost 20x the traffic
> >> compared to dpdk-21.08. (This enormous traffic rate in >=dpdk-21.11
> >> leads to high packet drop rates, since the underlying hypervisor can at
> >> most handle the same load it was servicing with <=dpdk-21.08.)
> >> The same pattern can be seen even if we run traffic for a longer
> >> duration.
> >>
> >> *_Eg:_*
> >> Testpmd traffic run (packet size = 1518) for the exact same time
> >> interval of 15 seconds:
> >>
> >> _*>=21.11 DPDK version*_
> >>   ---------------------- Forward statistics for port 0 ----------------------
> >>   RX-packets: 2         RX-dropped: 0          RX-total: 2
> >>   TX-packets: 19497570  *TX-dropped: 364674686*  TX-total: 384172256
> >>   ----------------------------------------------------------------------------
> >>
> >> _*Up to 21.08 DPDK version*_
> >>   ---------------------- Forward statistics for port 0 ----------------------
> >>   RX-packets: 3         RX-dropped: 0          RX-total: 3
> >>   TX-packets: 19480319  TX-dropped: 0          TX-total: 19480319
> >>   ----------------------------------------------------------------------------
> >>
> >> As you can see:
> >> >=dpdk-21.11
> >> Packets generated: 384 million; packets serviced: ~19.5 million;
> >> Tx-dropped: 364 million
> >> <=dpdk-21.08
> >> Packets generated: ~19.5 million; packets serviced: ~19.5 million;
> >> Tx-dropped: 0
> >>
> >>
> >> ==========================================================================
> >> @Maxime Coquelin
> >> I have gone through all the commits made by the virtio team between
> >> DPDK-21.08 and DPDK-21.11 as per the commit logs available at
> >> https://git.dpdk.org/dpdk/log/drivers/net/virtio
> >> I even tried undoing all the potentially relevant commits I could think
> >> of on a dpdk-21.11 workspace and then re-running testpmd in order to
> >> track down which commit introduced this regression, but no luck.
> >> We need your further inputs: could you glance through the commits made
> >> between these releases and let us know if there is any particular
> >> commit of interest which you think could cause the behavior seen
> >> above (or if there is any commit not captured in the above git link,
> >> maybe a commit outside the virtio PMD code perhaps?).
> >
> > As your issue seems 100% reproducible, using git bisect you should be
> > able to point to the commit introducing the regression.
> >
> > This is what I need to be able to help you.
> >
> > Regards,
> > Maxime
> >
> >>
> >> Thanks,
> >> Mukul
> >>
> >>
> >> On Mon, Dec 9, 2024 at 9:54 PM Joshua Washington <joshw...@google.com> wrote:
> >>
> >> Hello,
> >>
> >> Based on your VM shape (an 8 vCPU VM) and packet size (1518B packets),
> >> what you are seeing is exactly expected. 8 vCPU Gen 2 VMs have a
> >> default egress cap of 16 Gbps. This equates to roughly 1.3 Mpps when
> >> using 1518B packets, including IFG. Over the course of 15 seconds,
> >>
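For reference, the arithmetic behind those figures (assuming 20 bytes of
preamble + inter-frame gap per frame on the wire):

  16 Gbps / ((1518 B + 20 B) * 8 bits/B) ≈ 1.3 Mpps
  1.3 Mpps * 15 s ≈ 19.5 million packets

which matches the ~19.5 million TX-packets actually serviced in both runs
quoted above.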