> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
> Sent: 29 February 2016 14:56
> To: Paul Durrant
> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeul...@suse.com; Roger Pau Monne; jgr...@suse.com
> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
>
> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Bob Liu [mailto:bob....@oracle.com]
> > > Sent: 29 February 2016 03:37
> > > To: xen-devel@lists.xen.org
> > > Cc: Ian Jackson; jbeul...@suse.com; Roger Pau Monne; jgr...@suse.com; Paul Durrant; konrad.w...@oracle.com; Bob Liu
> > > Subject: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
> > >
> > > 1) What is this patch about?
> > > This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> > > A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> > > extra request which is used to pass through SCSI commands.
> > > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > > It can be extended easily to transmit other per-request/bio data from frontend
> > > to backend e.g Data Integrity Field per bio.
> > >
> > > 2) Why we need this?
> > > Currently only raw data segments are transmitted from blkfront to blkback, which
> > > means some advanced features are lost.
> > > * Guest knows nothing about features of the real backend storage.
> > >   For example, on bare-metal environment INQUIRY SCSI command can be used
> > >   to query storage device information. If it's a SSD or flash device we
> > >   can have the option to use the device as a fast cache.
> > >   But this can't happen in current domU guests, because blkfront only
> > >   knows it's just a normal virtual disk
> > >
> >
> > That's the sort of information that should be advertised via xenstore then.
> > There already feature flags for specific things but if some form of
> > throughput/latency information is meaningful to a frontend stack then
> > perhaps that can be advertised too.
>
> Certainly could be put on the XenStore. Do you envision this being done
> pre guest creation (so toolstack does it), or the backend finds this
> and populates the XenStore keys?
>
> Or that the frontend writes an XenStore key 'scsi-inq=vpd80' and the backend
> responds by populating an 'scsi-inq-vpd80=' <binary blob>'? If so can
> the XenStore accept binary payloads? Can it be more than 4K?
>
I was thinking more along the lines of blkback creating xenstore keys with any relevant information. We have sector size and number of sectors already but I don't see any harm in having values for other quantities that may be useful to a frontend. Bouncing SCSI inquiries via xenstore was certainly not what I was thinking.

> > >
> > > * Failover Clusters in Windows
> > >   Failover clusters require SCSI-3 persistent reservation target disks,
> > >   but now this can't work in domU.
> > >
> >
> > That's true but allowing arbitrary SCSI messages through is not the way
> > forward IMO. Just because Windows thinks every HBA is SCSI doesn't mean
> > other OS do so I think reservation/release should have dedicated messages
> > in the blkif protocol if it's desirable to support clustering in the frontend.
>
> Could you expand a bit on the 'dedicated message' you have in mind please?
>

As in we create message types to reserve/release whatever backend is being used and the backend uses whatever mechanism is appropriate to deal with that. E.g. if it were qdisk talking to a raw file then that might just be an flock.

> >
> > > 3) Known issues:
> > > * Security issues, how to 'validate' this extra request payload.
> > >   E.g SCSI operates on LUN bases (the whole disk) while we really just want to
> > >   operate on partitions
> > >
> > > * Can't pass SCSI commands through if the backend storage driver is bio-based
> > >   instead of request-based.
> > >
> > > 4) Alternative approach: Using PVSCSI instead:
> > > * Doubt PVSCSI can support as many type of backend storage devices as
> > >   Xen-block.
> > >
> >
> > LIO can interface to any block device in much the same way blkback does IIRC.
>
> But it can't do multipath or LVMs - which is an integral component.
>

Really? I was not aware of that limitation and it surprises me since AFAIK LIO can also use a raw file as a backstore which seems like it would be above either of those.
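
To make the 'dedicated message' idea above a bit more concrete, here is a rough sketch of how a backend sitting on a raw file might map reserve/release onto flock(). The names blk_resv_op, BLK_RESV_RESERVE, BLK_RESV_RELEASE and blk_resv_handle() are made up for illustration; nothing like them exists in blkif.h today. It also assumes a Linux-style flock(), which is advisory only and nowhere near full SCSI-3 persistent reservation semantics.

```c
#include <errno.h>
#include <sys/file.h>

/* Hypothetical message types; NOT part of the existing blkif protocol. */
enum blk_resv_op {
    BLK_RESV_RESERVE,
    BLK_RESV_RELEASE
};

/*
 * Map a reserve/release message onto an advisory flock() taken on the
 * file descriptor of the raw image file.
 * Returns 0 on success or a negative errno value on failure.
 */
int blk_resv_handle(int fd, enum blk_resv_op op)
{
    int how;

    switch (op) {
    case BLK_RESV_RESERVE:
        /* Exclusive and non-blocking: fail at once if already held. */
        how = LOCK_EX | LOCK_NB;
        break;
    case BLK_RESV_RELEASE:
        how = LOCK_UN;
        break;
    default:
        return -EINVAL;
    }

    if (flock(fd, how) == 0)
        return 0;

    return -errno;
}
```

A second backend instance that has opened the same image separately would get -EWOULDBLOCK from a reserve attempt until the holder releases. A backend on top of a real SCSI LUN would presumably use a different mechanism behind the same two messages.
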
> Anyhow that is more of a implementation specific quirk.
> >
> > > * Much longer path:
> > >   ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
> > >   PVSCSI-backend -> Target framework(LIO?) ->
> > >
> > >   With xen-block we only need:
> > >   ioctl() -> blkfront -> blkback ->
> > >
> >
> > ...and what happens if the block device that blkback is talking to is a SCSI LUN?
> >
> > That latter path is also not true for Windows. You've got all the SCSI
> > translation logic in the frontend when using blkif so that first path would
> > collapse to:
> >
> > Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX
>
> I don't know if it matters on the length of the path for say SCSI INQ. It isn't like
> that is performance specific. Neither are the clustering SCSI commands.
>
> >
> > > * xen-block has been existed for many years, widely used and more stable.
> > >
> >
> > It's definitely widely used, but it has had stability issues in recent times.
>
> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and
> LKML please ?

I was casting my mind back to incompatibilities that crept in with persistent grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as my backend these days.

  Paul

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel