On 07/05/2018 07:25 PM, Jason J. Herne wrote:
If a vfio-ccw device is left in an error state (example: pending unit check) then it is possible for that state to persist for a vfio-ccw device even after the enable subchannel that we do to bring the device online. If this state is allowed to persist then even simple I/O operations will needlessly fail. A basic sense ccw is used to clear this error state for the boot device.

Signed-off-by: Jason J. Herne <jjhe...@linux.ibm.com>

I don't like this. AFAIK an IPL is preceded by a subsystem reset. If it weren't, the IPL-ed OS (program) would have to take care of any potential mess left by the previous one -- and pray it gets control.

A subsystem reset should clear any device error state, so it is not supposed to persist across subsystem resets. If the error re-emerges (unsolicited) after the reset, it's likely something is really broken and needs investigation. Generally IPL is supposed to fail in such cases (except for corner cases which are not really handled by this patch).

AFAIU this patch works around a broken reset. While our bios is not a guest and one could try to argue that it's firmware -- part of 'the machine', I believe handling the reset in the bios is wrong. AFAIR the qemu emulator is in charge, and if needed makes kvm do what it has to. If the reset is broken for vfio-ccw (which is very possible, but I would have to check), I think we should fix it in the right place.

A workaround may still be justified (if kernel changes like clear support are needed). But we should indicate that clearly in the commit message and in the code.

Regards,
Halil