> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan <k...@microsoft.com>
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Haiyang
> Zhang <haiya...@microsoft.com>; Alex Ng (LIS) <ale...@microsoft.com>;
> Radim Krcmar <rkrc...@redhat.com>; Cathy Avery <cav...@redhat.com>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan <k...@microsoft.com> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan <k...@microsoft.com>
> >> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Haiyang
> >> Zhang <haiya...@microsoft.com>; Alex Ng (LIS) <ale...@microsoft.com>;
> >> Radim Krcmar <rkrc...@redhat.com>; Cathy Avery <cav...@redhat.com>
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan <k...@microsoft.com> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: de...@linuxdriverproject.org
> >> >> Cc: linux-ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>;
> >> >> Haiyang Zhang <haiya...@microsoft.com>; Alex Ng (LIS)
> >> >> <ale...@microsoft.com>; Radim Krcmar <rkrc...@redhat.com>; Cathy
> >> >> Avery <cav...@redhat.com>
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> >> delivered to CPU0 regardless of what CPU we're sending
> >> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> >> >> the fact that in case we're crashing on some other CPU and CPU0 is
> >> >> still alive and operational CHANNELMSG_UNLOAD_RESPONSE will be
> >> >> delivered there completing vmbus_connection.unload_event, our wait
> >> >> on the current CPU will never end.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> >> by forcing crash on a secondary CPU, e.g.:
> >
> > Prior to 2012R2, all messages would be delivered on CPU0 and this includes
> > CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support kexec on
> > pre-2012 R2 hosts. On 2012. From 2012 R2 on, all vmbus messages
> > (responses) will be delivered on the CPU that we initially set up - look
> > at the code in vmbus_negotiate_version(). So on post 2012 R2 hosts, the
> > response to CHANNELMSG_UNLOAD_RESPONSE will be delivered on the CPU where
> > we initiate the contact with the host - CHANNELMSG_INITIATE_CONTACT
> > message.
>
> Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4. On
> WS2012R2 what you're saying is true and all messages including
> CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for initial
> contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
> case and it is always delivered to CPU0, no matter which CPU we used for
> initial contact. This can be a host bug. You can use the attached patch
> to see the issue.

This looks like a host bug and I will try to get it addressed before WS2016 ships.

> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.

Thank you.

K. Y

> --
> Vitaly

_______________________________________________
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
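
For illustration only, here is a minimal userspace sketch of the idea discussed in this thread: scan the message slot of every CPU while also checking whether the unload has already been completed elsewhere. It is not the kernel patch. All names (sim_state, sim_wait_current_cpu, sim_wait_all_cpus, SIM_NR_CPUS, the SIM_MSG_* values) are hypothetical stand-ins, the per-CPU message pages are modelled as a plain array, and completion_done() is modelled as a boolean flag. The polling loops are bounded so the sketch terminates, whereas the wait described in the thread would never end.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Stand-in message types; these are not the real vmbus values. */
enum sim_msgtype {
    SIM_MSG_NONE = 0,
    SIM_MSG_UNLOAD_RESPONSE = 1,
};

struct sim_state {
    enum sim_msgtype msg_page[SIM_NR_CPUS]; /* one message slot per CPU */
    bool unload_done;                       /* models completion_done() */
};

/*
 * Models a wait that only looks at the slot of the CPU we crashed on.
 * Returns false when the response never shows up on that CPU within
 * max_polls passes.
 */
static bool sim_wait_current_cpu(struct sim_state *s, int cur_cpu,
                                 int max_polls)
{
    assert(cur_cpu >= 0 && cur_cpu < SIM_NR_CPUS);

    for (int i = 0; i < max_polls; i++) {
        if (s->unload_done)
            return true;
        if (s->msg_page[cur_cpu] == SIM_MSG_UNLOAD_RESPONSE) {
            s->msg_page[cur_cpu] = SIM_MSG_NONE; /* consume the message */
            s->unload_done = true;
            return true;
        }
    }
    return false;
}

/*
 * Models the proposed approach: on every pass, first check whether the
 * unload was already completed, then scan the slots of all CPUs.
 */
static bool sim_wait_all_cpus(struct sim_state *s, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (s->unload_done)
            return true;
        for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
            if (s->msg_page[cpu] == SIM_MSG_UNLOAD_RESPONSE) {
                s->msg_page[cpu] = SIM_MSG_NONE; /* consume the message */
                s->unload_done = true;
                return true;
            }
        }
    }
    return false;
}
```

With the response placed in slot 0 and the crash simulated on CPU 2, sim_wait_current_cpu() gives up without seeing it, while sim_wait_all_cpus() finds it on the first pass. Checking the completion flag first is what makes a race with another CPU's handler harmless in this model.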