Re: [PATCH net-next] virtio_net: Add TX stop and wake counters

2024-06-07 Thread Jason Xing
On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens  wrote:
>
> > From: Jakub Kicinski 
> > Sent: Friday, February 2, 2024 10:01 AM
> > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
> >
> > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:
> > > > Can you say more? I'm curious what's your use case.
> > >
> > > I'm not working at Nvidia, so my point of view may differ from theirs.
> > > From what I can tell, those two counters help me narrow down
> > > the problem space when I have to diagnose/debug some issues.
> >
> > right, i'm asking to collect useful debugging tricks, nothing against the 
> > patch
> > itself :)
> >
> > > 1) I sometimes notice that if some irq is held too long (say, one
> > > simple case: output of printk printed to the console), those two
> > > counters can reflect the issue.
> > > 2) Similarly in virtio net, I recently traced counters like these,
> > > which the current kernel does not have, and it turned out that one of
> > > the output queues in the backend was behaving badly.
> > > ...
> > >
> > > Stop/wake queue counters may not show directly the root cause of the
> > > issue, but help us 'guess' to some extent.
> >
> > I'm surprised you say you can detect stall-related issues with this.
> > I guess virtio doesn't have BQL support, which makes it special.
> > Normal HW drivers with BQL almost never stop the queue by themselves.
> > I mean - if they do, and BQL is active, then the system is probably
> > misconfigured (queue is too short). This is what we use at Meta to detect
> > stalls in drivers with BQL:
> >
> > https://lore.kernel.org/all/20240131102150.728960-3-lei...@debian.org/
> >
> > Daniel, I think this may be a good enough excuse to add per-queue stats to
> > the netdev genl family, if you're up for that. LMK if you want more info,
> > otherwise I guess ethtool -S is fine for now.
>
> For now, I would like to go with ethtool -S. But I can take on the netdev 
> level as a background task. Are you suggesting adding them to 
> rtnl_link_stats64?

Hello Daniel, Jakub,

Sorry to revive this thread. I wonder why we don't use this patch the
way the mlnx driver does, instead of adding the statistics through the
yaml file? Are we gradually adding more fields to the yaml file to
replace the 'ethtool -S' command?

Thanks,
Jason
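
As background for the counters being discussed: a driver bumps a "stop"
counter wherever it calls netif_tx_stop_queue() (typically when the TX ring
fills) and a "wake" counter where it calls netif_tx_wake_queue() in the
completion path. Below is a minimal sketch of that pattern for a generic
driver; the struct and field names (example_txq, tx_stop, tx_wake) are
illustrative assumptions, not the actual virtio_net code.

#include <linux/netdevice.h>
#include <linux/u64_stats_sync.h>

/* Hypothetical per-queue TX context; only what the sketch needs. */
struct example_txq {
        struct u64_stats_sync syncp;
        u64 tx_stop;
        u64 tx_wake;
};

/* Called from ndo_start_xmit(): stop the queue when the ring is full
 * and count the event.
 */
static void example_maybe_stop(struct example_txq *sq,
                               struct netdev_queue *txq, bool ring_full)
{
        if (ring_full) {
                netif_tx_stop_queue(txq);
                u64_stats_update_begin(&sq->syncp);
                sq->tx_stop++;
                u64_stats_update_end(&sq->syncp);
        }
}

/* Called from the TX completion (NAPI) path: wake the queue once there
 * is room again, and count that too.
 */
static void example_maybe_wake(struct example_txq *sq,
                               struct netdev_queue *txq, bool has_room)
{
        if (netif_tx_queue_stopped(txq) && has_room) {
                netif_tx_wake_queue(txq);
                u64_stats_update_begin(&sq->syncp);
                sq->tx_wake++;
                u64_stats_update_end(&sq->syncp);
        }
}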



Re: [patch net-next] virtio_net: add support for Byte Queue Limits

2024-06-07 Thread Jiri Pirko
Fri, Jun 07, 2024 at 08:47:43AM CEST, jasow...@redhat.com wrote:
>On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko  wrote:
>>
>> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasow...@redhat.com wrote:
>> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko  wrote:
>> >>
>> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasow...@redhat.com wrote:
>> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin  
>> >> >wrote:
>> >> >>
>> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote:
>> >> >> > > If the codes of orphan mode don't have an impact when you enable
>> >> >> > > napi_tx mode, please keep it if you can.
>> >> >> >
>> >> >> > For example, it complicates BQL implementation.
>> >> >> >
>> >> >> > Thanks
>> >> >>
>> >> >> I very much doubt sending interrupts to a VM can
>> >> >> *on all benchmarks* compete with not sending interrupts.
>> >> >
>> >> >It should not differ too much from the physical NIC. We can have one
>> >> >more round of benchmarks to see the difference.
>> >> >
>> >> >But if NAPI mode needs to win all of the benchmarks in order to get
>> >> >rid of orphan, that would be very difficult. Considering various bugs
>> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most
>> >> >of the benchmark doesn't show obvious differences.
>> >> >
>> >> >Looking at git history, there are commits that remove skb_orphan(),
>> >> >for example:
>> >> >
>> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340
>> >> >Author: Eric Dumazet 
>> >> >Date:   Fri Sep 28 07:53:26 2012 +
>> >> >
>> >> >mlx4: dont orphan skbs in mlx4_en_xmit()
>> >> >
>> >> >After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX
>> >> >completions) we no longer need to orphan skbs in mlx4_en_xmit()
>> >> >since skb wont stay a long time in TX ring before their release.
>> >> >
>> >> >Orphaning skbs in ndo_start_xmit() should be avoided as much as
>> >> >possible, since it breaks TCP Small Queue or other flow control
>> >> >mechanisms (per socket limits)
>> >> >
>> >> >Signed-off-by: Eric Dumazet 
>> >> >Acked-by: Yevgeny Petrilin 
>> >> >Cc: Or Gerlitz 
>> >> >Signed-off-by: David S. Miller 
>> >> >
>> >> >>
>> >> >> So yea, it's great if napi and hardware are advanced enough
>> >> >> that the default can be changed, since this way virtio
>> >> >> is closer to a regular nic and more or standard
>> >> >> infrastructure can be used.
>> >> >>
>> >> >> But dropping it will go against *no breaking userspace* rule.
>> >> >> Complicated? Tough.
>> >> >
>> >> >I don't know what kind of userspace is broken by this. Or why it is
>> >> >not broken since the day we enable NAPI mode by default.
>> >>
>> >> There is a module option that explicitly allows user to set
>> >> napi_tx=false
>> >> or
>> >> napi_weight=0
>> >>
>> >> So if you remove this option or ignore it, either way breaks the
>> >> user's expectations.
>> >
>> >We can keep them, but I wonder what's the expectation of the user
>> >here? The only thing so far I can imagine is the performance
>> >difference.
>>
>> True.
>>
>> >
>> >> I personally would vote for this breakage. To carry ancient
>> >> things like this one forever does not make sense to me.
>> >
>> >Exactly.
>> >
>> >> While at it,
>> >> let's remove all virtio net module params. Thoughts?
>> >
>> >I tend to
>> >
>> >1) drop the orphan mode, but we can have some benchmarks first
>>
>> Any idea which? It would be really tricky to find the ones where
>> orphan mode makes a difference, I assume.
>
>True. Personally, I would like to just drop orphan mode. But I'm not
>sure others are happy with this.

How about doing it the other way around? I will take a stab at sending a
patch removing it. If anyone is against that and has solid data proving
orphan mode is needed, let them provide it.


>
>Thanks
>
>>
>>
>> >2) keep the module parameters
>>
>> and ignore them, correct? Perhaps a warning would be good.
>>
>>
>> >
>> >Thanks
>> >
>> >>
>> >>
>> >>
>> >> >
>> >> >Thanks
>> >> >
>> >> >>
>> >> >> --
>> >> >> MST
>> >> >>
>> >> >
>> >>
>> >
>>
>
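
For readers who want to see what "adding BQL support" means at the driver
level, here is a minimal sketch of the standard BQL hooks, assuming a
generic TX path. This is not the actual virtio_net patch, and the helper
names (example_*) are illustrative.

#include <linux/netdevice.h>

/* Transmit side: tell BQL how many bytes were queued to the hardware. */
static void example_xmit_one(struct net_device *dev, unsigned int qidx,
                             unsigned int bytes)
{
        struct netdev_queue *txq = netdev_get_tx_queue(dev, qidx);

        /* ... descriptor posting happens here in a real driver ... */
        netdev_tx_sent_queue(txq, bytes);
}

/* Completion side: report finished packets/bytes; BQL adapts the queue
 * limit and wakes the queue when appropriate.
 */
static void example_tx_complete(struct net_device *dev, unsigned int qidx,
                                unsigned int pkts, unsigned int bytes)
{
        struct netdev_queue *txq = netdev_get_tx_queue(dev, qidx);

        netdev_tx_completed_queue(txq, pkts, bytes);
        /* On a ring reset, netdev_tx_reset_queue(txq) must also be called. */
}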



Re: [patch net-next] virtio_net: add support for Byte Queue Limits

2024-06-07 Thread Michael S. Tsirkin
On Fri, Jun 07, 2024 at 11:57:37AM +0200, Jiri Pirko wrote:
> >True. Personally, I would like to just drop orphan mode. But I'm not
> >sure others are happy with this.
> 
> How about doing it the other way around? I will take a stab at sending a
> patch removing it. If anyone is against that and has solid data proving
> orphan mode is needed, let them provide it.

Break it with no warning and see if anyone complains?
No, this is not how we handle userspace compatibility, normally.

-- 
MST




Re: [patch net-next] virtio_net: add support for Byte Queue Limits

2024-06-07 Thread Jason Xing
On Fri, Jun 7, 2024 at 5:57 PM Jiri Pirko  wrote:
>
> Fri, Jun 07, 2024 at 08:47:43AM CEST, jasow...@redhat.com wrote:
> >On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko  wrote:
> >>
> >> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasow...@redhat.com wrote:
> >> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko  wrote:
> >> >>
> >> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasow...@redhat.com wrote:
> >> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin  
> >> >> >wrote:
> >> >> >>
> >> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote:
> >> >> >> > > If the codes of orphan mode don't have an impact when you enable
> >> >> >> > > napi_tx mode, please keep it if you can.
> >> >> >> >
> >> >> >> > For example, it complicates BQL implementation.
> >> >> >> >
> >> >> >> > Thanks
> >> >> >>
> >> >> >> I very much doubt sending interrupts to a VM can
> >> >> >> *on all benchmarks* compete with not sending interrupts.
> >> >> >
> >> >> >It should not differ too much from the physical NIC. We can have one
> >> >> >more round of benchmarks to see the difference.
> >> >> >
> >> >> >But if NAPI mode needs to win all of the benchmarks in order to get
> >> >> >rid of orphan, that would be very difficult. Considering various bugs
> >> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most
> >> >> >of the benchmark doesn't show obvious differences.
> >> >> >
> >> >> >Looking at git history, there are commits that remove skb_orphan(),
> >> >> >for example:
> >> >> >
> >> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340
> >> >> >Author: Eric Dumazet 
> >> >> >Date:   Fri Sep 28 07:53:26 2012 +
> >> >> >
> >> >> >mlx4: dont orphan skbs in mlx4_en_xmit()
> >> >> >
> >> >> >After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX
> >> >> >completions) we no longer need to orphan skbs in mlx4_en_xmit()
> >> >> >since skb wont stay a long time in TX ring before their release.
> >> >> >
> >> >> >Orphaning skbs in ndo_start_xmit() should be avoided as much as
> >> >> >possible, since it breaks TCP Small Queue or other flow control
> >> >> >mechanisms (per socket limits)
> >> >> >
> >> >> >Signed-off-by: Eric Dumazet 
> >> >> >Acked-by: Yevgeny Petrilin 
> >> >> >Cc: Or Gerlitz 
> >> >> >Signed-off-by: David S. Miller 
> >> >> >
> >> >> >>
> >> >> >> So yea, it's great if napi and hardware are advanced enough
> >> >> >> that the default can be changed, since this way virtio
> >> >> >> is closer to a regular nic and more or standard
> >> >> >> infrastructure can be used.
> >> >> >>
> >> >> >> But dropping it will go against *no breaking userspace* rule.
> >> >> >> Complicated? Tough.
> >> >> >
> >> >> >I don't know what kind of userspace is broken by this. Or why it is
> >> >> >not broken since the day we enable NAPI mode by default.
> >> >>
> >> >> There is a module option that explicitly allows user to set
> >> >> napi_tx=false
> >> >> or
> >> >> napi_weight=0
> >> >>
> >> >> So if you remove this option or ignore it, either way breaks the
> >> >> user's expectations.
> >> >
> >> >We can keep them, but I wonder what's the expectation of the user
> >> >here? The only thing so far I can imagine is the performance
> >> >difference.
> >>
> >> True.
> >>
> >> >
> >> >> I personally would vote for this breakage. To carry ancient
> >> >> things like this one forever does not make sense to me.
> >> >
> >> >Exactly.
> >> >
> >> >> While at it,
> >> >> let's remove all virtio net module params. Thoughts?
> >> >
> >> >I tend to
> >> >
> >> >1) drop the orphan mode, but we can have some benchmarks first
> >>
> >> Any idea which? It would be really tricky to find the ones where
> >> orphan mode makes a difference, I assume.
> >
> >True. Personally, I would like to just drop orphan mode. But I'm not
> >sure others are happy with this.
>
> How about doing it the other way around? I will take a stab at sending a
> patch removing it. If anyone is against that and has solid data proving
> orphan mode is needed, let them provide it.

Honestly, in production we have to use skb orphan mode. I cannot
provide strong evidence for why the default mode causes performance
degradation in some cases; I don't know the root cause. The only thing
I can tell is that if I enable skb orphan mode, then 1) more skbs are
allowed to pass through the TCP layer, and 2) key members like snd_cwnd
in TCP behave normally.

I know that without orphan mode the skbs are limited by the combination
of TSO and napi_tx mode, thanks to sk_wmem_alloc accounting. So I
_guess_ the root cause is that delayed interrupts from the host machine
delay the freeing of skbs, which slows down the TX speed. If the
interval between two interrupts were as short as on a real NIC, I think
the issue would disappear.

That's all I can tell. Just for the record; I hope this information is
also useful to other readers.

Thanks,
Jason

>
>
> >
> >Thanks
> >
> >>
> >>
> >2) keep the module parameters
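
The mechanism behind the behaviour Jason describes above: orphaning an skb
in the xmit path runs its destructor early (for TCP transmit skbs that is
tcp_wfree()), which releases the bytes from the socket's sk_wmem_alloc
accounting, so TCP Small Queues no longer counts them as in flight and the
socket may queue more data. A simplified paraphrase of the skbuff helper is
below; details may differ across kernel versions.

/* Simplified paraphrase of skb_orphan() from include/linux/skbuff.h. */
static inline void skb_orphan(struct sk_buff *skb)
{
        if (skb->destructor) {
                skb->destructor(skb); /* for TCP: tcp_wfree() drops sk_wmem_alloc */
                skb->destructor = NULL;
                skb->sk = NULL;       /* skb is no longer charged to the socket */
        }
        /* (the real helper also sanity-checks skb->sk in the else branch) */
}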

Re: [patch net-next] virtio_net: add support for Byte Queue Limits

2024-06-07 Thread Jiri Pirko
Fri, Jun 07, 2024 at 12:23:37PM CEST, m...@redhat.com wrote:
>On Fri, Jun 07, 2024 at 11:57:37AM +0200, Jiri Pirko wrote:
>> >True. Personally, I would like to just drop orphan mode. But I'm not
>> >sure others are happy with this.
>> 
>> How about doing it the other way around? I will take a stab at sending a
>> patch removing it. If anyone is against that and has solid data proving
>> orphan mode is needed, let them provide it.
>
>Break it with no warning and see if anyone complains?

This is not what I suggested at all.

>No, this is not how we handle userspace compatibility, normally.

Sure.

Again:

I would send an orphan-removal patch containing:
1) no module option removal; warn if someone sets one
2) the module option to disable napi is ignored
3) orphan mode is removed from the code

There is no breakage. Only, hypothetically, a performance downgrade in
some hypothetical use case nobody knows of. My point was: if someone
presents solid data during the patch review proving that orphan mode is
needed, let's toss out the patch.

Makes sense?


>
>-- 
>MST
>
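
A minimal sketch of what points 1) and 2) could look like in the driver.
The parameter names match virtio_net's existing module parameters; the
init function, warning text, and everything else here are illustrative
assumptions, not the actual patch.

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/netdevice.h>

/* Keep the legacy knobs so module loading does not break, but warn and
 * ignore them (TX NAPI is always used once orphan mode is gone).
 */
static bool napi_tx = true;
module_param(napi_tx, bool, 0644);
MODULE_PARM_DESC(napi_tx, "Deprecated and ignored; TX NAPI is always enabled");

static int napi_weight = NAPI_POLL_WEIGHT;
module_param(napi_weight, int, 0444);
MODULE_PARM_DESC(napi_weight, "Deprecated and ignored");

static int __init example_init(void)
{
        if (!napi_tx || !napi_weight)
                pr_warn("virtio_net: napi_tx/napi_weight are deprecated and have no effect\n");
        return 0;
}
module_init(example_init);

MODULE_LICENSE("GPL");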



RE: [PATCH net-next] virtio_net: Add TX stop and wake counters

2024-06-07 Thread Dan Jurgens
> From: Jason Xing 
> Sent: Friday, June 7, 2024 4:16 AM
> To: Dan Jurgens 
> Cc: Jakub Kicinski ; Michael S. Tsirkin ;
> netdev@vger.kernel.org; jasow...@redhat.com;
> xuanz...@linux.alibaba.com; virtualizat...@lists.linux.dev;
> da...@davemloft.net; eduma...@google.com; ab...@redhat.com; Parav
> Pandit 
> Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
> 
> On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens  wrote:
> >
> > > From: Jakub Kicinski 
> > > Sent: Friday, February 2, 2024 10:01 AM
> > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake
> > > counters
> > >
> > > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:
> > > > > Can you say more? I'm curious what's your use case.
> > > >
> > > > I'm not working at Nvidia, so my point of view may differ from theirs.
> > > > From what I can tell, those two counters help me narrow down
> > > > the problem space when I have to diagnose/debug some issues.
> > >
> > > right, i'm asking to collect useful debugging tricks, nothing
> > > against the patch itself :)
> > >
> > > > 1) I sometimes notice that if some irq is held too long (say, one
> > > > simple case: output of printk printed to the console), those two
> > > > counters can reflect the issue.
> > > > 2) Similarly in virtio net, I recently traced counters like these,
> > > > which the current kernel does not have, and it turned out that one
> > > > of the output queues in the backend was behaving badly.
> > > > ...
> > > >
> > > > Stop/wake queue counters may not show directly the root cause of
> > > > the issue, but help us 'guess' to some extent.
> > >
> > > I'm surprised you say you can detect stall-related issues with this.
> > > I guess virtio doesn't have BQL support, which makes it special.
> > > Normal HW drivers with BQL almost never stop the queue by themselves.
> > > I mean - if they do, and BQL is active, then the system is probably
> > > misconfigured (queue is too short). This is what we use at Meta to
> > > detect stalls in drivers with BQL:
> > >
> > > https://lore.kernel.org/all/20240131102150.728960-3-lei...@debian.or
> > > g/
> > >
> > > Daniel, I think this may be a good enough excuse to add per-queue
> > > stats to the netdev genl family, if you're up for that. LMK if you
> > > want more info, otherwise I guess ethtool -S is fine for now.
> >
> > For now, I would like to go with ethtool -S. But I can take on the netdev
> level as a background task. Are you suggesting adding them to
> rtnl_link_stats64?
> 
> Hello Daniel, Jakub,
> 
> Sorry to revive this thread. I wonder why we don't use this patch the
> way the mlnx driver does, instead of adding the statistics through the
> yaml file? Are we gradually adding more fields to the yaml file to
> replace the 'ethtool -S' command?
> 

It's trivial to have the stats in ethtool as well. But I noticed that the
stats series intentionally removed some stats from ethtool, so I didn't
put them in both places.

> Thanks,
> Jason
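
For readers following the ethtool-vs-netlink question: the yaml file
referred to here is the netdev netlink spec, and per-queue stats are
exposed to it through a driver's netdev_stat_ops. A minimal sketch of that
hookup follows; the stop/wake members are the counters under discussion,
and the exact struct/field names, along with everything prefixed example_,
should be treated as assumptions rather than the merged code.

#include <linux/netdevice.h>
#include <net/netdev_queues.h>

/* Hypothetical driver state, just enough for the sketch. */
struct example_txq_stats {
        u64 bytes, packets, stop, wake;
};

struct example_priv {
        struct example_txq_stats txq[16];
};

static void example_get_queue_stats_tx(struct net_device *dev, int idx,
                                       struct netdev_queue_stats_tx *stats)
{
        struct example_priv *priv = netdev_priv(dev);
        struct example_txq_stats *sq = &priv->txq[idx];

        stats->bytes   = sq->bytes;
        stats->packets = sq->packets;
        stats->stop    = sq->stop; /* times the TX queue was stopped */
        stats->wake    = sq->wake; /* times it was woken again */
}

static const struct netdev_stat_ops example_stat_ops = {
        .get_queue_stats_tx = example_get_queue_stats_tx,
};

/* At probe time the driver opts in with: dev->stat_ops = &example_stat_ops;
 * after which the values are reachable via the netdev genl family
 * (qstats-get) rather than ethtool -S.
 */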


Re: [PATCH net-next] virtio_net: Add TX stop and wake counters

2024-06-07 Thread Jason Xing
On Fri, Jun 7, 2024 at 8:37 PM Dan Jurgens  wrote:
>
> > From: Jason Xing 
> > Sent: Friday, June 7, 2024 4:16 AM
> > To: Dan Jurgens 
> > Cc: Jakub Kicinski ; Michael S. Tsirkin ;
> > netdev@vger.kernel.org; jasow...@redhat.com;
> > xuanz...@linux.alibaba.com; virtualizat...@lists.linux.dev;
> > da...@davemloft.net; eduma...@google.com; ab...@redhat.com; Parav
> > Pandit 
> > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
> >
> > On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens  wrote:
> > >
> > > > From: Jakub Kicinski 
> > > > Sent: Friday, February 2, 2024 10:01 AM
> > > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake
> > > > counters
> > > >
> > > > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:
> > > > > > Can you say more? I'm curious what's your use case.
> > > > >
> > > > > I'm not working at Nvidia, so my point of view may differ from theirs.
> > > > > From what I can tell, those two counters help me narrow down
> > > > > the problem space when I have to diagnose/debug some issues.
> > > >
> > > > right, i'm asking to collect useful debugging tricks, nothing
> > > > against the patch itself :)
> > > >
> > > > > 1) I sometimes notice that if some irq is held too long (say, one
> > > > > simple case: output of printk printed to the console), those two
> > > > > counters can reflect the issue.
> > > > > 2) Similarly in virtio net, I recently traced counters like these,
> > > > > which the current kernel does not have, and it turned out that one
> > > > > of the output queues in the backend was behaving badly.
> > > > > ...
> > > > >
> > > > > Stop/wake queue counters may not show directly the root cause of
> > > > > the issue, but help us 'guess' to some extent.
> > > >
> > > > I'm surprised you say you can detect stall-related issues with this.
> > > > I guess virtio doesn't have BQL support, which makes it special.
> > > > Normal HW drivers with BQL almost never stop the queue by themselves.
> > > > I mean - if they do, and BQL is active, then the system is probably
> > > > misconfigured (queue is too short). This is what we use at Meta to
> > > > detect stalls in drivers with BQL:
> > > >
> > > > https://lore.kernel.org/all/20240131102150.728960-3-lei...@debian.or
> > > > g/
> > > >
> > > > Daniel, I think this may be a good enough excuse to add per-queue
> > > > stats to the netdev genl family, if you're up for that. LMK if you
> > > > want more info, otherwise I guess ethtool -S is fine for now.
> > >
> > > For now, I would like to go with ethtool -S. But I can take on the netdev
> > level as a background task. Are you suggesting adding them to
> > rtnl_link_stats64?
> >
> > Hello Daniel, Jakub,
> >
> > Sorry to revive this thread. I wonder why we don't use this patch the
> > way the mlnx driver does, instead of adding the statistics through the
> > yaml file? Are we gradually adding more fields to the yaml file to
> > replace the 'ethtool -S' command?
> >
>
> It's trivial to have the stats in ethtool as well. But I noticed that the
> stats series intentionally removed some stats from ethtool, so I didn't
> put them in both places.

Thank you for the reply. I thought there was some particular reason :-)