On Wed, 20 Jun 2018 08:01:50 +0900 Greg KH <g...@kroah.com> wrote: > On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote: > > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v > > guests failed to bring up network interfaces on boot, logging "A link > > change request failed with some changes committed already. Interface eth0 > > may have been left with an inconsistent configuration, please check." > > Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went > > about bisecting which landed on: > > > > commit be9c798d0d13ae609a91177323ac816545c39d28 > > Author: Stephen Hemminger <step...@networkplumber.org> > > Date: Mon May 14 15:32:18 2018 -0700 > > > > hv_netvsc: common detach logic > > > > [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ] > > > > Make common function for detaching internals of device > > during changes to MTU and RSS. Make sure no more packets > > are transmitted and all packets have been received before > > doing device teardown. > > > > Change the wait logic to be common and use usleep_range(). > > > > Changes transmit enabling logic so that transmit queues are disabled > > during the period when lower device is being changed. And enabled > > only after sub channels are setup. This avoids issue where it could > > be that a packet was being sent while subchannel was not initialized. > > > > Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") > > Signed-off-by: Stephen Hemminger <sthem...@microsoft.com> > > Signed-off-by: David S. Miller <da...@davemloft.net> > > Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> > > > > > > The removal of which does indeed fix the problem. That led me to: > > > > commit 52acf73b6e9a6962045feb2ba5a8921da2201915 > > Author: Dexuan Cui <de...@microsoft.com> > > Date: Wed Jun 6 21:32:51 2018 +0000 > > > > hv_netvsc: Fix a network regression after ifdown/ifup > > > > Recently people reported the NIC stops working after > > "ifdown eth0; ifup eth0". It turns out in this case the TX queues are > > not > > enabled, after the refactoring of the common detach logic: when the NIC > > has sub-channels, usually we enable all the TX queues after all > > sub-channels are set up: see rndis_set_subchannel() -> > > netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where > > the number of channels doesn't change, we also must make sure the TX > > queues > > are enabled. The patch fixes the regression. > > > > Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") > > Signed-off-by: Dexuan Cui <de...@microsoft.com> > > Cc: Stephen Hemminger <sthem...@microsoft.com> > > Cc: K. Y. Srinivasan <k...@microsoft.com> > > Cc: Haiyang Zhang <haiya...@microsoft.com> > > Signed-off-by: David S. Miller <da...@davemloft.net> > > > > Which sounded very promising, but does not seem to fully fix the issue. > > Doing some more digging, I was able to determine that the message coincides > > with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the > > hv_netvsc driver loads. If I delay the mtu change until well after the > > driver loads, everything works fine. If I unload hv_netvsc and then reload > > it and apply the mtu change immediately, the failure re-occurs. So > > something is racy here, and the above doesn't entirely address it. > > > > I'm happy to test out any suggested patches and/or do additional debugging > > if anyone has any suggestions. > > > > (oh, and I did also try 4.18-rc1 and the problem still persists) > > Always cc: the authors of the patch you are having problems with, along > with the mailing list for the networking subsystem, so that the right > people know to look at this.
Let me take a look at it, and log it into the bug system.