On Mon, Dec 14, 2015 at 08:35:43AM +0100, Hannes Reinecke wrote:
> On 12/14/2015 08:24 AM, Stefan Hajnoczi wrote:
> >On Thu, Dec 10, 2015 at 10:13:17AM +0100, Hannes Reinecke wrote:
> >>On 12/10/2015 09:26 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Nov 27, 2015 at 03:58:58PM +0100, Hannes Reinecke wrote:
> >>>>here's now an updated version to enable ALUA and simplified
> >>>>active/passive multipath support for qemu.
> >>>>
> >>>>This patchset relies on having _two_ block devices configured,
> >>>>and two SCSI disks pointing to those block devices with the
> >>>>_same_ 'wwn' property and unique 'port_group' properties.
> >>>>I know, this is a bit of a nasty hack, but I hope to add
> >>>>proper multipath support (with several SCSI devices pointing /
> >>>>linking to the same block device) in the near future.
> >>>>
> >>>>It also implements an 'alua_policy', which allows for simulating
> >>>>an 'active/passive' multipath setup.
> >>>>
> >>>>And for testing I've implemented a 'block_disconnect' HMP command,
> >>>>which simulates a link failure for the attached devices.
> >>>>
> >>>>I wouldn't object if someone declares this a gross hack, but with
> >>>>it I can finally simulate real-life multipath failover and do
> >>>>some functional multipath-tools testing without having to recurse
> >>>>on using real hardware.
> >>>
> >>>I'm not familiar with how ALUA works but have been thinking about a
> >>>multipath problem:
> >>>
> >>>If the host has SCSI disks that are marked 'offline' then QEMU will
> >>>refuse to start up since it cannot open the block device (ENXIO).
> >>>
> >>Define 'offline'.
> >>If this means the ALUA state 'offline' then we wouldn't have to worry; ALUA
> >>state 'offline' essentially means "Yeah, there's something here, but I won't
> >>tell you and you cannot access it.".
> >>And any transitions to and from 'offline' are essentially vendor-specific.
> >>In short: Do not use it.
> >>
> >>If OTOH it means the 'block_disconnect' state this is something which
> >>should/needs to be implemented in the HBA emulation for simulating
> >>a link failure.
> >>qemu itself should be able to access the device and it should start up
> >>perfectly normal, so we shouldn't get any ENXIO errors.
> >>
> >>(Obviously, if _all_ disks are in 'disconnect' state the guest wouldn't
> >>start up as it cannot read any data. But that's beside the point.)
> >
> >I'm referring to scsi_device_set_state(scmd->device, SDEV_OFFLINE) in
> >Linux. This is the state where the host block device cannot be opened
> >or accessed.
> >
> Which means the device is declared dead by the SCSI stack.
> And qemu does _very_ well not to start in these circumstances.
>
> However, this behaviour is not influenced nor modified by the ALUA patchset
> but is rather a different topic.
>
> <rambling>
> Setting a device 'offline' is the final step in SCSI EH, which means that
> SCSI EH has exhausted its options and doesn't know how to fix the device.
> However, in modern systems this typically happens when SCSI EH kicks in
> during a (transport) link disconnect, as then every single step in SCSI EH
> will fail. (Which also means that SCSI EH is woefully inadequate for FC, but
> that's a different topic.)
> But as this is a transport issue, _all_ respective drivers should be aware
> of this, and should have been modified _not_ to start SCSI EH when the
> transport link is severed.
> So the very fact that SCSI EH is started means that there's an issue with
> the driver, which really needs to be fixed first.
> Hence I think qemu is right here, as the underlying reason for the 'offline'
> device should be fixed first.
> </rambling>

Interesting, thanks for explaining.

Stefan
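
For anyone who wants to check for the SDEV_OFFLINE condition on the host
before launching a guest, here is a minimal sketch. It is not part of the
patchset. It assumes the Linux sysfs attribute /sys/block/<disk>/device/state
(which reads "running" or "offline", among other values) and that opening an
offline device fails with ENXIO, as described above. The helper names
sdev_state, can_open and safe_to_start are made up for illustration.

```python
import errno
import os
from pathlib import Path


def sdev_state(disk: str, sysfs_root: str = "/sys/block") -> str:
    """Return the SCSI device state string for a disk name such as 'sda'.

    Assumption: the state is exposed at <sysfs_root>/<disk>/device/state.
    """
    return (Path(sysfs_root) / disk / "device" / "state").read_text().strip()


def can_open(dev_path: str) -> bool:
    """Try a read-only open; return False if the kernel reports ENXIO."""
    try:
        fd = os.open(dev_path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False
        raise  # any other error is not the 'offline' case discussed here
    os.close(fd)
    return True


def safe_to_start(disk: str, sysfs_root: str = "/sys/block") -> bool:
    """Hypothetical pre-flight check: only 'running' counts as usable."""
    return sdev_state(disk, sysfs_root) == "running"
```

Something like safe_to_start("sda") could be called from a wrapper script
before starting QEMU. It only reports the state; as noted above, the
underlying reason for the device going offline still has to be fixed first.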