Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens wrote: > > > From: Jakub Kicinski > > Sent: Friday, February 2, 2024 10:01 AM > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters > > > > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote: > > > > Can you say more? I'm curious what's your use case. > > > > > > I'm not working at Nvidia, so my point of view may differ from theirs. > > > From what I can tell is that those two counters help me narrow down > > > the range if I have to diagnose/debug some issues. > > > > right, i'm asking to collect useful debugging tricks, nothing against the > > patch > > itself :) > > > > > 1) I sometimes notice that if some irq is held too long (say, one > > > simple case: output of printk printed to the console), those two > > > counters can reflect the issue. > > > 2) Similarly in virtio net, recently I traced such counters the > > > current kernel does not have and it turned out that one of the output > > > queues in the backend behaves badly. > > > ... > > > > > > Stop/wake queue counters may not show directly the root cause of the > > > issue, but help us 'guess' to some extent. > > > > I'm surprised you say you can detect stall-related issues with this. > > I guess virtio doesn't have BQL support, which makes it special. > > Normal HW drivers with BQL almost never stop the queue by themselves. > > I mean - if they do, and BQL is active, then the system is probably > > misconfigured (queue is too short). This is what we use at Meta to detect > > stalls in drivers with BQL: > > > > https://lore.kernel.org/all/20240131102150.728960-3-lei...@debian.org/ > > > > Daniel, I think this may be a good enough excuse to add per-queue stats to > > the netdev genl family, if you're up for that. LMK if you want more info, > > otherwise I guess ethtool -S is fine for now. > > For now, I would like to go with ethtool -S. But I can take on the netdev > level as a background task. 
Are you suggesting adding them to rtnl_link_stats64?

Hello Daniel, Jakub,

Sorry to revive this thread. I wonder why this patch doesn't expose the stats the way the mlnx driver does, instead of adding statistics into the yaml file? Are we gradually adding more fields to the yaml file to replace the 'ethtool -S' command?

Thanks,
Jason
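[Editor's note] The stop/wake counters under discussion follow the usual driver pattern: stop the queue when the TX ring runs out of descriptors and bump a stop counter, wake it when completions free enough space and bump a wake counter. A minimal userspace sketch of that accounting (struct and function names are illustrative, not the actual virtio_net code):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of a TX ring with stop/wake accounting. */
struct txq {
	int free_descs;                 /* free descriptors in the ring */
	int wake_thresh;                /* wake once at least this many are free */
	bool stopped;                   /* queue state as the stack sees it */
	unsigned long tx_stop, tx_wake; /* the proposed counters */
};

/* Transmit path: consume descriptors, stop the queue if the ring fills. */
static void txq_xmit(struct txq *q, int descs)
{
	q->free_descs -= descs;
	if (q->free_descs <= 0) {
		q->stopped = true;
		q->tx_stop++;
	}
}

/* Completion path: reclaim descriptors, wake the queue above the threshold. */
static void txq_complete(struct txq *q, int descs)
{
	q->free_descs += descs;
	if (q->stopped && q->free_descs >= q->wake_thresh) {
		q->stopped = false;
		q->tx_wake++;
	}
}
```

A burst that exhausts the ring shows up as a tx_stop/tx_wake pair, while a backend that is slow to complete keeps the queue stopped longer without changing the counts, which is why these counters narrow a stall down rather than prove it.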
Re: [patch net-next] virtio_net: add support for Byte Queue Limits
Fri, Jun 07, 2024 at 08:47:43AM CEST, jasow...@redhat.com wrote: >On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko wrote: >> >> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasow...@redhat.com wrote: >> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko wrote: >> >> >> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasow...@redhat.com wrote: >> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin >> >> >wrote: >> >> >> >> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote: >> >> >> > > If the codes of orphan mode don't have an impact when you enable >> >> >> > > napi_tx mode, please keep it if you can. >> >> >> > >> >> >> > For example, it complicates BQL implementation. >> >> >> > >> >> >> > Thanks >> >> >> >> >> >> I very much doubt sending interrupts to a VM can >> >> >> *on all benchmarks* compete with not sending interrupts. >> >> > >> >> >It should not differ too much from the physical NIC. We can have one >> >> >more round of benchmarks to see the difference. >> >> > >> >> >But if NAPI mode needs to win all of the benchmarks in order to get >> >> >rid of orphan, that would be very difficult. Considering various bugs >> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most >> >> >of the benchmark doesn't show obvious differences. >> >> > >> >> >Looking at git history, there're commits that removes skb_orphan(), for >> >> >example: >> >> > >> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340 >> >> >Author: Eric Dumazet >> >> >Date: Fri Sep 28 07:53:26 2012 + >> >> > >> >> >mlx4: dont orphan skbs in mlx4_en_xmit() >> >> > >> >> >After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX >> >> >completions) we no longer need to orphan skbs in mlx4_en_xmit() >> >> >since skb wont stay a long time in TX ring before their release. 
>> >> > >> >> >Orphaning skbs in ndo_start_xmit() should be avoided as much as >> >> >possible, since it breaks TCP Small Queue or other flow control >> >> >mechanisms (per socket limits) >> >> > >> >> >Signed-off-by: Eric Dumazet >> >> >Acked-by: Yevgeny Petrilin >> >> >Cc: Or Gerlitz >> >> >Signed-off-by: David S. Miller >> >> > >> >> >> >> >> >> So yea, it's great if napi and hardware are advanced enough >> >> >> that the default can be changed, since this way virtio >> >> >> is closer to a regular nic and more or standard >> >> >> infrastructure can be used. >> >> >> >> >> >> But dropping it will go against *no breaking userspace* rule. >> >> >> Complicated? Tough. >> >> > >> >> >I don't know what kind of userspace is broken by this. Or why it is >> >> >not broken since the day we enable NAPI mode by default. >> >> >> >> There is a module option that explicitly allows user to set >> >> napi_tx=false >> >> or >> >> napi_weight=0 >> >> >> >> So if you remove this option or ignore it, both breaks the user >> >> expectation. >> > >> >We can keep them, but I wonder what's the expectation of the user >> >here? The only thing so far I can imagine is the performance >> >difference. >> >> True. >> >> > >> >> I personally would vote for this breakage. To carry ancient >> >> things like this one forever does not make sense to me. >> > >> >Exactly. >> > >> >> While at it, >> >> let's remove all virtio net module params. Thoughts? >> > >> >I tend to >> > >> >1) drop the orphan mode, but we can have some benchmarks first >> >> Any idea which? That would be really tricky to find the ones where >> orphan mode makes difference I assume. > >True. Personally, I would like to just drop orphan mode. But I'm not >sure others are happy with this. How about we do it the other way around? I will take a stab at sending a patch removing it. If anyone is against it and has solid data to prove orphan mode is needed, let them provide it.
> >Thanks > >> >> >> >2) keep the module parameters >> >> and ignore them, correct? Perhaps a warning would be good. >> >> >> > >> >Thanks >> > >> >> >> >> >> >> >> >> > >> >> >Thanks >> >> > >> >> >> >> >> >> -- >> >> >> MST >> >> >> >> >> > >> >> >> > >> >
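[Editor's note] For readers following the BQL part of this thread: BQL tracks in-flight bytes per TX queue and stops the queue once a dynamically sized byte limit is exceeded, so a driver with BQL almost never stops the queue on ring exhaustion itself. A simplified, fixed-limit userspace model of the accounting (the real kernel entry points are netdev_tx_sent_queue()/netdev_tx_completed_queue(); the adaptive limit algorithm in dynamic_queue_limits.h is deliberately omitted here):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative fixed-limit model of BQL per-queue byte accounting. */
struct bql {
	unsigned int limit;    /* byte limit; dynamically resized in the kernel */
	unsigned int inflight; /* bytes handed to the device, not yet completed */
	bool stopped;
};

/* Analogous to netdev_tx_sent_queue(): charge bytes given to the device. */
static void bql_sent(struct bql *q, unsigned int bytes)
{
	q->inflight += bytes;
	if (q->inflight > q->limit)
		q->stopped = true;  /* stack stops queueing new packets */
}

/* Analogous to netdev_tx_completed_queue(): uncharge completed bytes. */
static void bql_completed(struct bql *q, unsigned int bytes)
{
	q->inflight -= bytes;
	if (q->stopped && q->inflight < q->limit)
		q->stopped = false; /* queue restarted */
}
```

Because the limit adapts to completion behavior, standing queues stay short; this is also why the BQL-based stall detector referenced earlier in the thread works for drivers that implement it.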
Re: [patch net-next] virtio_net: add support for Byte Queue Limits
On Fri, Jun 07, 2024 at 11:57:37AM +0200, Jiri Pirko wrote: > >True. Personally, I would like to just drop orphan mode. But I'm not > >sure others are happy with this. > > How about to do it other way around. I will take a stab at sending patch > removing it. If anyone is against and has solid data to prove orphan > mode is needed, let them provide those. Break it with no warning and see if anyone complains? No, this is not how we handle userspace compatibility, normally. -- MST
Re: [patch net-next] virtio_net: add support for Byte Queue Limits
On Fri, Jun 7, 2024 at 5:57 PM Jiri Pirko wrote: > > Fri, Jun 07, 2024 at 08:47:43AM CEST, jasow...@redhat.com wrote: > >On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko wrote: > >> > >> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasow...@redhat.com wrote: > >> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko wrote: > >> >> > >> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasow...@redhat.com wrote: > >> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin > >> >> >wrote: > >> >> >> > >> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote: > >> >> >> > > If the codes of orphan mode don't have an impact when you enable > >> >> >> > > napi_tx mode, please keep it if you can. > >> >> >> > > >> >> >> > For example, it complicates BQL implementation. > >> >> >> > > >> >> >> > Thanks > >> >> >> > >> >> >> I very much doubt sending interrupts to a VM can > >> >> >> *on all benchmarks* compete with not sending interrupts. > >> >> > > >> >> >It should not differ too much from the physical NIC. We can have one > >> >> >more round of benchmarks to see the difference. > >> >> > > >> >> >But if NAPI mode needs to win all of the benchmarks in order to get > >> >> >rid of orphan, that would be very difficult. Considering various bugs > >> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most > >> >> >of the benchmark doesn't show obvious differences. > >> >> > > >> >> >Looking at git history, there're commits that removes skb_orphan(), > >> >> >for example: > >> >> > > >> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340 > >> >> >Author: Eric Dumazet > >> >> >Date: Fri Sep 28 07:53:26 2012 + > >> >> > > >> >> >mlx4: dont orphan skbs in mlx4_en_xmit() > >> >> > > >> >> >After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX > >> >> >completions) we no longer need to orphan skbs in mlx4_en_xmit() > >> >> >since skb wont stay a long time in TX ring before their release. 
> >> >> > > >> >> >Orphaning skbs in ndo_start_xmit() should be avoided as much as > >> >> >possible, since it breaks TCP Small Queue or other flow control > >> >> >mechanisms (per socket limits) > >> >> > > >> >> >Signed-off-by: Eric Dumazet > >> >> >Acked-by: Yevgeny Petrilin > >> >> >Cc: Or Gerlitz > >> >> >Signed-off-by: David S. Miller > >> >> > > >> >> >> > >> >> >> So yea, it's great if napi and hardware are advanced enough > >> >> >> that the default can be changed, since this way virtio > >> >> >> is closer to a regular nic and more or standard > >> >> >> infrastructure can be used. > >> >> >> > >> >> >> But dropping it will go against *no breaking userspace* rule. > >> >> >> Complicated? Tough. > >> >> > > >> >> >I don't know what kind of userspace is broken by this. Or why it is > >> >> >not broken since the day we enable NAPI mode by default. > >> >> > >> >> There is a module option that explicitly allows user to set > >> >> napi_tx=false > >> >> or > >> >> napi_weight=0 > >> >> > >> >> So if you remove this option or ignore it, both breaks the user > >> >> expectation. > >> > > >> >We can keep them, but I wonder what's the expectation of the user > >> >here? The only thing so far I can imagine is the performance > >> >difference. > >> > >> True. > >> > >> > > >> >> I personally would vote for this breakage. To carry ancient > >> >> things like this one forever does not make sense to me. > >> > > >> >Exactly. > >> > > >> >> While at it, > >> >> let's remove all virtio net module params. Thoughts? > >> > > >> >I tend to > >> > > >> >1) drop the orphan mode, but we can have some benchmarks first > >> > >> Any idea which? That would be really tricky to find the ones where > >> orphan mode makes difference I assume. > > > >True. Personally, I would like to just drop orphan mode. But I'm not > >sure others are happy with this. > > How about to do it other way around. I will take a stab at sending patch > removing it. 
If anyone is against and has solid data to prove orphan > mode is needed, let them provide those. Honestly, in production we have to use skb orphan mode. I cannot provide strong evidence for why the default mode causes performance degradation in some cases; I mean, I don't know the root cause. The only thing I can tell is that if I enable skb orphan mode: 1) more skbs pass through the TCP layer, and 2) key members like snd_cwnd in TCP behave normally. I know that without orphan mode the skbs are controlled/limited by the combination of TSO and napi_tx mode, thanks to sk_wmem_alloc. So I _guess_ the root cause is: delays in the interrupts generated by the host machine delay the freeing of skbs, which slows down the TX speed. If the interval between two interrupts were very short, as on a real NIC, I think the issue would disappear. That's all I can tell. Just for the record; hope this information is also useful to other readers. Thanks, Jason > > > > > >Thanks > > >> >> >> >2) keep the mo
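[Editor's note] The mechanism Jason describes can be modeled simply: skb_orphan() drops the skb's charge against the socket at transmit time, so sk_wmem_alloc-based throttling (e.g. TCP Small Queues) never kicks in, whereas without orphaning, delayed TX completions keep bytes charged and stall the sender. A toy userspace sketch under illustrative limits (all names here are made up for the model, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a socket's TX byte accounting. */
struct sock_model {
	unsigned int wmem_alloc; /* bytes charged, like sk_wmem_alloc */
	unsigned int tsq_limit;  /* TSQ-style cap on unfreed TX bytes */
};

/* TSQ-style check: may the socket queue another skb of this size? */
static bool can_xmit(const struct sock_model *sk, unsigned int bytes)
{
	return sk->wmem_alloc + bytes <= sk->tsq_limit;
}

/* Transmit path. With orphaning, the charge is dropped immediately
 * (as skb_orphan() would), so later completion delays cannot throttle
 * the socket. */
static void xmit(struct sock_model *sk, unsigned int bytes, bool orphan)
{
	sk->wmem_alloc += bytes;
	if (orphan)
		sk->wmem_alloc -= bytes; /* uncharged at xmit time */
}

/* Completion path: only the non-orphan case still holds a charge. */
static void tx_complete(struct sock_model *sk, unsigned int bytes, bool orphan)
{
	if (!orphan)
		sk->wmem_alloc -= bytes;
}
```

If completions are delayed (tx_complete() hasn't run), the non-orphan socket hits its cap and stalls while the orphaning one keeps sending, which matches the behavior described above and is exactly the flow-control breakage the quoted mlx4 commit message warns about.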
Re: [patch net-next] virtio_net: add support for Byte Queue Limits
Fri, Jun 07, 2024 at 12:23:37PM CEST, m...@redhat.com wrote: >On Fri, Jun 07, 2024 at 11:57:37AM +0200, Jiri Pirko wrote: >> >True. Personally, I would like to just drop orphan mode. But I'm not >> >sure others are happy with this. >> >> How about to do it other way around. I will take a stab at sending patch >> removing it. If anyone is against and has solid data to prove orphan >> mode is needed, let them provide those. > >Break it with no warning and see if anyone complains? This is not what I suggested at all. >No, this is not how we handle userspace compatibility, normally. Sure. Again: I would send an orphan-removal patch containing: 1) no module option removal; warn if someone sets one 2) the module option to disable napi is ignored 3) orphan mode is removed from the code There is no breakage; only, hypothetically, a performance downgrade in some hypothetical use case nobody knows of. My point was: if someone presents solid data during the patch review proving orphan mode is needed, let's toss out the patch. Makes sense? > >-- >MST >
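[Editor's note] The plan sketched above — keep the parameters, warn, and ignore them — is the usual deprecation pattern. A minimal userspace illustration of the decision logic (the parameter names napi_tx/napi_weight come from virtio_net; the warning text and function name are made up for this sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative: resolve the effective TX mode once orphan mode is removed.
 * The deprecated parameters are still accepted so module load doesn't
 * break, but they no longer change behavior. */
static bool effective_napi_tx(bool napi_tx_param, int napi_weight_param)
{
	if (!napi_tx_param || napi_weight_param == 0)
		fprintf(stderr,
			"virtio_net: napi_tx=false/napi_weight=0 is deprecated and ignored; TX NAPI is always enabled\n");
	return true; /* orphan mode gone: NAPI is the only TX mode */
}
```

Nothing that sets the old options fails to load, which is the sense in which "there is no breakage": only the performance characteristics of the ignored setting can change.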
RE: [PATCH net-next] virtio_net: Add TX stop and wake counters
> From: Jason Xing > Sent: Friday, June 7, 2024 4:16 AM > To: Dan Jurgens > Cc: Jakub Kicinski ; Michael S. Tsirkin ; > netdev@vger.kernel.org; jasow...@redhat.com; > xuanz...@linux.alibaba.com; virtualizat...@lists.linux.dev; > da...@davemloft.net; eduma...@google.com; ab...@redhat.com; Parav > Pandit > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters > > On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens wrote: > > > > > From: Jakub Kicinski > > > Sent: Friday, February 2, 2024 10:01 AM > > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake > > > counters > > > > > > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote: > > > > > Can you say more? I'm curious what's your use case. > > > > > > > > I'm not working at Nvidia, so my point of view may differ from theirs. > > > > From what I can tell is that those two counters help me narrow > > > > down the range if I have to diagnose/debug some issues. > > > > > > right, i'm asking to collect useful debugging tricks, nothing > > > against the patch itself :) > > > > > > > 1) I sometimes notice that if some irq is held too long (say, one > > > > simple case: output of printk printed to the console), those two > > > > counters can reflect the issue. > > > > 2) Similarly in virtio net, recently I traced such counters the > > > > current kernel does not have and it turned out that one of the > > > > output queues in the backend behaves badly. > > > > ... > > > > > > > > Stop/wake queue counters may not show directly the root cause of > > > > the issue, but help us 'guess' to some extent. > > > > > > I'm surprised you say you can detect stall-related issues with this. > > > I guess virtio doesn't have BQL support, which makes it special. > > > Normal HW drivers with BQL almost never stop the queue by themselves. > > > I mean - if they do, and BQL is active, then the system is probably > > > misconfigured (queue is too short). 
This is what we use at Meta to > > > detect stalls in drivers with BQL: > > > > > > https://lore.kernel.org/all/20240131102150.728960-3-lei...@debian.or > > > g/ > > > > > > Daniel, I think this may be a good enough excuse to add per-queue > > > stats to the netdev genl family, if you're up for that. LMK if you > > > want more info, otherwise I guess ethtool -S is fine for now. > > > > For now, I would like to go with ethtool -S. But I can take on the netdev > level as a background task. Are you suggesting adding them to > rtnl_link_stats64? > > Hello Daniel, Jakub, > > Sorry to revive this thread. I wonder why not use this patch like mlnx driver > does instead of adding statistics into the yaml file? Are we gradually using > or > adding more fields into the yaml file to replace the 'ethtool -S' command? > > It's trivial to have the stats in ethtool as well. But I noticed the stats series intentionally removed some stats from ethtool. So I didn't put them in both places. > Thanks, > Jason
Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
On Fri, Jun 7, 2024 at 8:37 PM Dan Jurgens wrote: > > > From: Jason Xing > > Sent: Friday, June 7, 2024 4:16 AM > > To: Dan Jurgens > > Cc: Jakub Kicinski ; Michael S. Tsirkin ; > > netdev@vger.kernel.org; jasow...@redhat.com; > > xuanz...@linux.alibaba.com; virtualizat...@lists.linux.dev; > > da...@davemloft.net; eduma...@google.com; ab...@redhat.com; Parav > > Pandit > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters > > > > On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens wrote: > > > > > > > From: Jakub Kicinski > > > > Sent: Friday, February 2, 2024 10:01 AM > > > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake > > > > counters > > > > > > > > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote: > > > > > > Can you say more? I'm curious what's your use case. > > > > > > > > > > I'm not working at Nvidia, so my point of view may differ from theirs. > > > > > From what I can tell is that those two counters help me narrow > > > > > down the range if I have to diagnose/debug some issues. > > > > > > > > right, i'm asking to collect useful debugging tricks, nothing > > > > against the patch itself :) > > > > > > > > > 1) I sometimes notice that if some irq is held too long (say, one > > > > > simple case: output of printk printed to the console), those two > > > > > counters can reflect the issue. > > > > > 2) Similarly in virtio net, recently I traced such counters the > > > > > current kernel does not have and it turned out that one of the > > > > > output queues in the backend behaves badly. > > > > > ... > > > > > > > > > > Stop/wake queue counters may not show directly the root cause of > > > > > the issue, but help us 'guess' to some extent. > > > > > > > > I'm surprised you say you can detect stall-related issues with this. > > > > I guess virtio doesn't have BQL support, which makes it special. > > > > Normal HW drivers with BQL almost never stop the queue by themselves. 
> > > > I mean - if they do, and BQL is active, then the system is probably > > > > misconfigured (queue is too short). This is what we use at Meta to > > > > detect stalls in drivers with BQL: > > > > > > > > https://lore.kernel.org/all/20240131102150.728960-3-lei...@debian.or > > > > g/ > > > > > > > > Daniel, I think this may be a good enough excuse to add per-queue > > > > stats to the netdev genl family, if you're up for that. LMK if you > > > > want more info, otherwise I guess ethtool -S is fine for now. > > > > > > For now, I would like to go with ethtool -S. But I can take on the netdev > > level as a background task. Are you suggesting adding them to > > rtnl_link_stats64? > > > > Hello Daniel, Jakub, > > > > Sorry to revive this thread. I wonder why not use this patch like mlnx > > driver > > does instead of adding statistics into the yaml file? Are we gradually > > using or > > adding more fields into the yaml file to replace the 'ethtool -S' command? > > > > It's trivial to have the stats in ethtool as well. But I noticed the stats > series intentionally removed some stats from ethtool. So I didn't put it both > places. Thank you for the reply. I thought there was some particular reason :-)