On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
> Hi Nicholas,
>
> On 04/19/2012 06:53 PM, Nicholas A. Bellinger wrote:
> > On Thu, 2012-04-19 at 07:30 -0500, Anthony Liguori wrote:
> >> However, for storage, be it scsi or direct access, the same problem
> >> really doesn't exist.  There isn't an obvious benefit to being in the
> >> kernel.
> >>
> >
> > In the modern Linux v3.x tree, it was decided there is an obvious
> > benefit to fabric drivers developers for going ahead and putting proper
> > SCSI target logic directly into the kernel..  ;)
>
> I'm sure there are obvious benefits to having the kernel have SCSI target
> logic.  I'm not claiming that there isn't.
>
> But there is not an obvious benefit to doing SCSI emulation *for virtual
> machine* guests in the kernel.
>
> Guests are unconditionally hostile.  There is no qualification here.
> Public clouds are the obvious example of this.
>
> TCM runs in the absolute most privileged context possible.  When you're
> dealing with extremely hostile input, it's pretty obvious that you want
> to run it in the lowest privileged context as humanly possible.
>
The argument that a SCSI target for virtual machines is so complex that it
can't possibly be implemented properly in the kernel is a bunch of non-sense.

> Good or bad, QEMU runs as a very unprivileged user confined by SELinux and
> very soon, sandboxed with seccomp.  There's an obvious benefit to putting
> complex code into an environment like this.
>

Being able to identify which virtio-scsi guests can actually connect via
vhost-scsi into individual tcm_vhost endpoints is step one here.  tcm_vhost
(as well as its older sibling tcm_loop) currently both use a virtual
initiator WWPN that is set via configfs before the virtual machine is
attached to the tcm_vhost fabric endpoint + LUNs.

Using vhost-scsi initiator WWPNs to enforce which clients can connect to
individual tcm_vhost endpoints is one option for restricting access.  We are
already doing something similar with iscsi-target and tcm_fc(FCoE) endpoints
to restrict fabric login access from remote SCSI initiator ports..

<SNIP>

> >
> >> So before we get too deep in patches, we need some solid justification
> >> first.
> >>
> >
> > So the potential performance benefit is one thing that will be in favor
> > of vhost-scsi,
>
> Why?  Why would vhost-scsi be any faster than doing target emulation in
> userspace and then doing O_DIRECT with linux-aio to a raw device?
>

Well, using a raw device from userspace there is still going to be an SG-IO
memcpy between user <-> kernel in the current code, yes..?  Being able to
deliver interrupts and SGL memory directly into tcm_vhost cmwq kernel
context for backend device execution, without QEMU userspace involvement or
an extra SGL memcpy, is the perceived performance benefit here.  (A rough
sketch of the userspace submission path in question follows below.)

How much benefit will this actually provide across single port and multi
port tcm_vhost LUNs into a single guest..?  That still remains to be
demonstrated with performance + throughput benchmarks..

> > I think the ability to utilize the underlying TCM fabric
> > and run concurrent ALUA multipath using multiple virtio-scsi LUNs to the
> > same /sys/kernel/config/target/core/$HBA/$DEV/ backend can potentially
> > give us some nice flexibility when dynamically managing paths into the
> > virtio-scsi guest.
>
> The thing is, you can always setup this kind of infrastructure and expose
> a raw block device to userspace and then have QEMU emulate a target and
> turn that into O_DIRECT + linux-aio.
>
> We can also use SG_IO to inject SCSI commands if we really need to.  I'd
> rather we optimize this path.  If nothing else, QEMU should be filtering
> SCSI requests before the kernel sees them.  If something is going to SEGV,
> it's far better that it's QEMU than the kernel.
>

QEMU SG-IO and BSG drivers are fine for tcm_loop SCSI LUNs with QEMU HBA
emulation, but they still aren't tied directly to an individual guest
instance.  That is, the raw devices being passed into SG-IO / BSG are still
locally accessible on the host as SCSI devices without guest access
restrictions, while a tcm_vhost endpoint does not expose any host-accessible
block device and can also restrict access to an authorized list of
virtio-scsi clients.  (See the second sketch below for what the SG_IO path
looks like from userspace.)

> We cannot avoid doing SCSI emulation in QEMU.  SCSI is too fundamental to
> far too many devices.  So the prospect of not having good SCSI emulation
> in QEMU is not realistic.
>

I'm certainly not advocating for a lack of decent SCSI emulation in QEMU.
Being able to support this across all host platforms is something QEMU
certainly needs to take seriously.
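For reference, the userspace path being suggested above looks roughly like
the following: open the raw device O_DIRECT and drive it with linux-aio
(libaio).  This is only a minimal sketch to make the comparison concrete --
the device path and block size are placeholders and error handling is
abbreviated -- and it is not meant to reflect what QEMU's block layer
actually does internally:

/* build with: gcc -o aio-read aio-read.c -laio */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLK_SZ 4096             /* placeholder I/O size */

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf;
    int fd;

    /* O_DIRECT bypasses the page cache; the buffer must be aligned. */
    fd = open("/dev/sdX", O_RDONLY | O_DIRECT);    /* placeholder device */
    if (fd < 0) { perror("open"); return 1; }
    if (posix_memalign(&buf, BLK_SZ, BLK_SZ)) return 1;

    if (io_setup(1, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

    /* Queue one read at offset 0 and wait for its completion. */
    io_prep_pread(&cb, fd, buf, BLK_SZ, 0);
    if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }
    if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) { fprintf(stderr, "io_getevents failed\n"); return 1; }

    printf("completed with res=%ld\n", (long)ev.res);

    io_destroy(ctx);
    free(buf);
    close(fd);
    return 0;
}

The vhost-scsi argument above is that the virtqueue SGLs can instead be
handed to tcm_vhost cmwq context directly, without a round trip through
QEMU userspace for every request.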
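And here is roughly what the SG_IO injection mentioned above looks like
from userspace -- a standard INQUIRY issued through the SG_IO ioctl against
a host-visible device node (the /dev/sg0 path is just a placeholder).  The
point being made above is that this node is the host's own view of the LUN;
the ioctl itself carries no notion of which guest is allowed to reach it:

#include <fcntl.h>
#include <scsi/sg.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    unsigned char cdb[6]    = { 0x12, 0, 0, 0, 96, 0 };  /* INQUIRY, 96 bytes */
    unsigned char data[96]  = { 0 };
    unsigned char sense[32] = { 0 };
    struct sg_io_hdr hdr;
    int fd;

    fd = open("/dev/sg0", O_RDWR);      /* placeholder: host-visible SG node */
    if (fd < 0) { perror("open"); return 1; }

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id    = 'S';
    hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    hdr.cmdp            = cdb;
    hdr.cmd_len         = sizeof(cdb);
    hdr.dxferp          = data;
    hdr.dxfer_len       = sizeof(data);
    hdr.sbp             = sense;
    hdr.mx_sb_len       = sizeof(sense);
    hdr.timeout         = 5000;         /* milliseconds */

    /* The raw CDB is handed to the SCSI midlayer for this host device. */
    if (ioctl(fd, SG_IO, &hdr) < 0) { perror("SG_IO"); return 1; }

    printf("vendor: %.8s  product: %.16s\n", data + 8, data + 16);
    close(fd);
    return 0;
}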
Quite the opposite, I think virtio-scsi <-> vhost-scsi is a mechanism by
which it will (eventually) be possible to support T10 DIF protection for
storage blocks directly between the Linux KVM guest <-> host.

In order for QEMU userspace to support this, Linux would need to expose a
method to userspace for issuing DIF-protected CDBs.  This userspace API
currently does not exist AFAIK, so a kernel-level approach is currently the
only option when it comes to supporting end-to-end block protection
information originating from within Linux guests.  (Note this is going to
involve a virtio-scsi spec rev as well.)

--nab