> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
> Sent: 29 February 2016 14:56
> To: Paul Durrant
> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeul...@suse.com; Roger
> Pau Monne; jgr...@suse.com
> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-through
> SCSI commands
> 
> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Bob Liu [mailto:bob....@oracle.com]
> > > Sent: 29 February 2016 03:37
> > > To: xen-devel@lists.xen.org
> > > Cc: Ian Jackson; jbeul...@suse.com; Roger Pau Monne; jgr...@suse.com;
> > > Paul Durrant; konrad.w...@oracle.com; Bob Liu
> > > Subject: [RFC PATCH] xen-block: introduces extra request to pass-through
> > > SCSI commands
> > >
> > > 1) What is this patch about?
> > > This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> > > A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> > > extra request which is used to pass through SCSI commands.
> > > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > > It can be extended easily to transmit other per-request/bio data from
> > > frontend to backend, e.g. Data Integrity Field per bio.
> > >
> > > 2) Why do we need this?
> > > Currently only raw data segments are transmitted from blkfront to blkback,
> > > which means some advanced features are lost.
> > >  * Guest knows nothing about features of the real backend storage.
> > >   For example, in a bare-metal environment the INQUIRY SCSI command can be
> > >   used to query storage device information. If it's an SSD or flash device
> > >   we can have the option to use the device as a fast cache.
> > >   But this can't happen in current domU guests, because blkfront only
> > >   knows it's just a normal virtual disk.
> > >
> >
> > That's the sort of information that should be advertised via xenstore then.
> > There are already feature flags for specific things but if some form of
> > throughput/latency information is meaningful to a frontend stack then
> > perhaps that can be advertised too.
> 
> Certainly could be put on the XenStore. Do you envision this being done
> pre guest creation (so toolstack does it), or the backend finds this
> and populates the XenStore keys?
> 
> Or that the frontend writes a XenStore key 'scsi-inq=vpd80' and the backend
> responds by populating a 'scsi-inq-vpd80=<binary blob>'? If so, can
> the XenStore accept binary payloads? Can it be more than 4K?
> 

I was thinking more along the lines of blkback creating xenstore keys with any 
relevant information. We have sector size and number of sectors already but I 
don't see any harm in having values for other quantities that may be useful to 
a frontend. Bouncing SCSI inquiries via xenstore was certainly not what I was 
thinking.
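
To illustrate the idea, here is a minimal sketch of a backend publishing extra per-device values as plain key/value pairs. It is not blkback code and does not talk to a real xenstore: the store is modelled as a dict, the function name and the backend path are made up for the example, and the "rotational" key is purely hypothetical. Only "sector-size" and "sectors" correspond to the existing keys mentioned above.

```python
def advertise_backend_keys(backend_path, sector_size, sectors, extra=None):
    """Build the key/value pairs a backend might publish for one device.

    Values are stored as strings, as xenstore values are.
    `extra` holds additional (hypothetical) quantities a frontend could read.
    """
    values = {
        "sector-size": sector_size,
        "sectors": sectors,
    }
    values.update(extra or {})
    # Prefix every key with the device's backend directory.
    return {f"{backend_path}/{name}": str(value) for name, value in values.items()}


if __name__ == "__main__":
    # Hypothetical backend path and hypothetical "rotational" key.
    store = advertise_backend_keys(
        "backend/vbd/1/51712", 512, 2097152, extra={"rotational": 0}
    )
    for key, value in sorted(store.items()):
        print(key, "=", value)
```

A frontend would then read only the keys it understands and ignore the rest, which is how the existing feature flags already behave.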

> 
> >
> > >  * Failover Clusters in Windows
> > >   Failover clusters require SCSI-3 persistent reservation target disks,
> > >   but now this can't work in domU.
> > >
> >
> > That's true but allowing arbitrary SCSI messages through is not the way
> > forward IMO. Just because Windows thinks every HBA is SCSI doesn't mean
> > other OSes do, so I think reservation/release should have dedicated messages
> > in the blkif protocol if it's desirable to support clustering in the frontend.
> 
> Could you expand a bit on the 'dedicated message' you have in mind please?
> 

As in we create message types to reserve/release whatever backend is being used 
and the backend uses whatever mechanism is appropriate to deal with that. E.g. 
if it were qdisk talking to a raw file then that might just be an flock.
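
A minimal sketch of the raw-file case, assuming a Linux host: the two operation names and the handler are hypothetical (nothing like them exists in the blkif protocol today), and the only real API used is fcntl.flock() from the Python standard library.

```python
import fcntl

# Hypothetical operation codes for dedicated reserve/release messages.
OP_RESERVE = "reserve"
OP_RELEASE = "release"


def handle_reservation(op, backing_file):
    """Map a reserve/release message onto an advisory lock on a raw file.

    Returns True on success, False if another holder already has the
    reservation.
    """
    if op == OP_RESERVE:
        try:
            # Exclusive, non-blocking: fail at once rather than wait.
            fcntl.flock(backing_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        return True
    if op == OP_RELEASE:
        fcntl.flock(backing_file, fcntl.LOCK_UN)
        return True
    raise ValueError(f"unknown operation: {op!r}")
```

A backend sitting on something other than a raw file would implement the same two messages with whatever mechanism suits it, so the frontend never needs to know whether SCSI is involved.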

> >
> > > 3) Known issues:
> > >  * Security issues, how to 'validate' this extra request payload.
> > >    E.g. SCSI operates on a LUN basis (the whole disk) while we really just
> > >    want to operate on partitions.
> > >
> > >  * Can't pass SCSI commands through if the backend storage driver is
> > >    bio-based instead of request-based.
> > >
> > > 4) Alternative approach: Using PVSCSI instead:
> > >  * Doubt PVSCSI can support as many types of backend storage devices as
> > >    Xen-block.
> > >
> >
> > LIO can interface to any block device in much the same way blkback does
> > IIRC.
> 
> But it can't do multipath or LVMs - which is an integral component.
> 

Really? I was not aware of that limitation and it surprises me since AFAIK LIO 
can also use a raw file as a backstore which seems like it would be above 
either of those.

> Anyhow that is more of an implementation-specific quirk.
> >
> > >  * Much longer path:
> > >    ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend ->
> > >    PVSCSI-backend -> Target framework(LIO?) ->
> > >
> > >    With xen-block we only need:
> > >    ioctl() -> blkfront -> blkback ->
> > >
> >
> > ...and what happens if the block device that blkback is talking to is a SCSI
> > LUN?
> >
> > That latter path is also not true for Windows. You've got all the SCSI
> > translation logic in the frontend when using blkif so that first path would
> > collapse to:
> >
> > Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX
> 
> I don't know if the length of the path matters for, say, SCSI INQ. It isn't
> like that is performance specific. Neither are the clustering SCSI commands.
> 
> >
> > >  * xen-block has existed for many years, is widely used and more stable.
> > >
> >
> > It's definitely widely used, but it has had stability issues in recent
> > times.
> 
> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and
> LKML please ?

I was casting my mind back to incompatibilities that crept in with persistent 
grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as 
my backend these days.

  Paul

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel