> -----Original Message-----
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Tuesday, December 22, 2020 12:41 PM
> To: Willem de Bruijn <willemdebruijn.ker...@gmail.com>; wangyunjian
> <wangyunj...@huawei.com>
> Cc: Network Development <netdev@vger.kernel.org>; Michael S. Tsirkin
> <m...@redhat.com>; virtualizat...@lists.linux-foundation.org; Lilijun (Jerry)
> <jerry.lili...@huawei.com>; chenchanghu <chenchan...@huawei.com>;
> xudingke <xudin...@huawei.com>; huangbin (J)
> <brian.huang...@huawei.com>
> Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg
> fails
>
>
> On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> > On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunj...@huawei.com> wrote:
> >> From: Yunjian Wang<wangyunj...@huawei.com>
> >>
> >> Currently we break the loop and wake up the vhost_worker when sendmsg
> >> fails. When the worker wakes up again, we'll meet the same error.
> > The patch is based on the assumption that such error cases always
> > return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
> >
> >> This will cause high CPU load. To fix this issue, we can skip this
> >> description by ignoring the error. When we exceeds sndbuf, the return
> >> value of sendmsg is -EAGAIN. In the case we don't skip the
> >> description and don't drop packet.
> > the -> that
> >
> > here and above: description -> descriptor
> >
> > Perhaps slightly revise to more explicitly state that
> >
> > 1. in the case of persistent failure (i.e., bad packet), the driver
> > drops the packet
> > 2. in the case of transient failure (e.g,. memory
> > pressure) the driver schedules the worker to try again later
>
>
> If we want to go with this way, we need a better time to wakeup the worker.
> Otherwise it just produces more stress on the cpu that is what this patch
> tries to avoid.

The problem was first discovered when a VM sent an abnormal packet, after
which the VM could no longer send any packets. After commit feb8892cb441c7
("vhost_net: conditionally enable tx polling"), there have also been high
CPU consumption issues.

It is the first problem that I am actually more concerned with and want to
solve.

Thanks

> > Thanks
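
To make the persistent/transient split suggested in the quoted review concrete, here is a minimal user-space sketch. It is not code from drivers/vhost/net.c: the enum, its values, and classify_tx_result() are hypothetical names used only for illustration, and treating every error other than EAGAIN and ENOMEM as persistent is an assumption, not something the patch or the thread settles.

```c
#include <errno.h>

/* Hypothetical action names for illustration; not from the vhost_net driver. */
enum tx_action {
    TX_DONE,        /* send succeeded, descriptor is consumed */
    TX_RETRY_LATER, /* transient failure: keep the descriptor, retry later */
    TX_DROP         /* persistent failure: consume the descriptor, drop packet */
};

/*
 * Classify a sendmsg()-style result, where failure is reported as a
 * negative errno value (the kernel-internal convention).
 */
static enum tx_action classify_tx_result(int err)
{
    if (err >= 0)
        return TX_DONE;

    switch (-err) {
    case EAGAIN:  /* e.g. sndbuf exceeded, as described in the patch */
    case ENOMEM:  /* e.g. allocation failure under memory pressure */
        return TX_RETRY_LATER;
    default:      /* assumed persistent, e.g. a malformed packet */
        return TX_DROP;
    }
}
```

With this split, only TX_RETRY_LATER would requeue the worker, so a bad packet can no longer stall the queue. When to wake the worker after a transient failure is the open question raised in the quoted text and is not addressed by this sketch.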