Re: [Qemu-devel] [PATCH] block: Set cdrom device read only flag
Since pretty much every cdrom drive use scsi today, shouldnt the readonly/writeable flag for MMC devices rather be done down in the hw/scsi* code rather than the generic block code? If/when/ever I get enough time I would like to port the writeable dvd+r emulation I wrote for STGT to qemu. In STGT, MMC/DVD devices can be either read-only or read-write. If the filesize for the backing file is >0 bytes, it is assumed the file is an iso image and that the file is a read-only iso image. If filesize is ==0, then the file is opened read-write and is treated as a "blank dvd+r disk that the initiator can write/burn to" regards ronnie sahlberg On Tue, Aug 7, 2012 at 6:47 PM, Markus Armbruster wrote: > Kevin Wolf writes: > >> Am 02.08.2012 09:20, schrieb Kevin Shanahan: >>> On Thu, Aug 02, 2012 at 02:49:52PM +0930, Kevin Shanahan wrote: >>>> On Thu, Aug 02, 2012 at 11:46:13AM +0930, Kevin Shanahan wrote: >>>>> On Thu, Aug 02, 2012 at 11:02:42AM +0930, Kevin Shanahan wrote: >>>>>> Set the block driver read_only flag for cdrom devices so that >>>>>> qmp_change_blockdev does not attempt to open cdrom files in read-write >>>>>> mode when changing media. >>>>> >>>>> Hrm, this fixes my simple test case using the kvm monitor directly but >>>>> changing media via libvirt still has the same issue (fails for RO >>>>> files, succeeds for writable files). >>>>> >>>>> $ virsh attach-disk --type cdrom --mode readonly test1 >>>>> /srv/kvm/iso/ubuntu-12.04-server-amd64.iso hdc >>>>> error: Failed to attach disk >>>>> error: internal error unable to execute QEMU command 'change': >>>>> Could not open '/srv/kvm/iso/ubuntu-12.04-server-amd64.iso' >>>>> >>>>> I'll keep looking into it. >>>> >>>> In the libvirt case, it seems libvirt is failing to add media=cdrom to >>>> the commandline, so in this case qemu is defaulting to media=disk and >>>> my proposed fix has no effect. Diving into libvirt now to see why no >>>> media=disk is getting added... >>>> >>>> Common test case has this xml (generated by virt-install): >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> Ok, looks like libvirt is intentionally leaving media=cdrom off the >>> command line in the case that "-device ide-cd,..." is >>> supported. Presumably by specifying the device this way, qemu is >>> supposed to work out that the media type is cdrom automatically (but >>> it doesn't, it defaults to disk). >>> >>> Libvirt wants to use: >>> >>> qemu-kvm ... \ >>> -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw \ >>> -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 ... >>> >>> If I hack qemu/qemu_command.c::qemuBuildDriveStr() to ignore the check >>> for QEMU_CAPS_IDE_CD and always add media=cdrom, then (with my qemu as >>> well patch) qemu will open cdrom media files read-only. >>> >>> There's probably a neater way to just get qemu to set the media type >>> if "-device ide-cd,..." is used, but I haven't worked it out yet. >>> >>> Anyway, apologies for the rambling conversation with myself on your >>> lists. Hope this is helpful in some way. >> >> Thanks, that's some good information. >> >> However, I don't think you should start from media=cdrom. libvirt >> already does specify readonly=on and that is the property you're really >> interested in. Since commit 528f7663 (released with 0.13) the 'change' >> command should keep the read-only flag for all kinds of media. > > Correct. > >> Now I'm not sure where you lost the read-only flag. At least on git >> master it doesn't seem to reproduce for me. > > I can: > > $ qemu --enable-kvm -S -m 384 -vnc localhost:5500 -monitor stdio \ > -drive if=none,id=cd,readonly=on,format=raw \ > -device ide-cd,bus=ide.1,unit=0,drive=cd > QEMU 1.1.50 monitor - type 'help' for more information > (qemu) change cd r5.iso > Could not open 'r5.iso' > (qemu) q > armbru@blackfin:~/work/images$ ls -l r5.iso > -r--r--r--. 1 armbru armbru 2872639488 Mar 31 2011 r5.iso > > Looks like a QEMU bug to me. >
Re: [Qemu-devel] [PATCH] ISCSI: Pick default initiator-name based on the name of the VM
csi_open(BlockDriverState *bs, const char > *filename, int flags) > if (ret != 0) { > error_report("Failed to set initiator username and password"); > ret = -EINVAL; > -goto failed; > +goto out; > } > } > > @@ -985,13 +981,13 @@ static int iscsi_open(BlockDriverState *bs, const char > *filename, int flags) > if (parse_chap(iscsi, iscsi_url->target) != 0) { > error_report("iSCSI: Failed to set CHAP user/password"); > ret = -EINVAL; > -goto failed; > +goto out; > } > > if (iscsi_set_session_type(iscsi, ISCSI_SESSION_NORMAL) != 0) { > error_report("iSCSI: Failed to set session type to normal."); > ret = -EINVAL; > -goto failed; > +goto out; > } > > iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C); > @@ -1012,7 +1008,7 @@ static int iscsi_open(BlockDriverState *bs, const char > *filename, int flags) > != 0) { > error_report("iSCSI: Failed to start async connect."); > ret = -EINVAL; > -goto failed; > +goto out; > } > > while (!task.complete) { > @@ -1023,11 +1019,7 @@ static int iscsi_open(BlockDriverState *bs, const char > *filename, int flags) > error_report("iSCSI: Failed to connect to LUN : %s", > iscsi_get_error(iscsi)); > ret = -EINVAL; > -goto failed; > -} > - > -if (iscsi_url != NULL) { > -iscsi_destroy_url(iscsi_url); > +goto out; > } > > /* Medium changer or tape. We dont have any emulation for this so this > must > @@ -1039,19 +1031,22 @@ static int iscsi_open(BlockDriverState *bs, const > char *filename, int flags) > bs->sg = 1; > } > > -return 0; > +ret = 0; > > -failed: > +out: > if (initiator_name != NULL) { > g_free(initiator_name); > } > if (iscsi_url != NULL) { > iscsi_destroy_url(iscsi_url); > } > -if (iscsi != NULL) { > -iscsi_destroy_context(iscsi); > + > +if (ret) { > +if (iscsi != NULL) { > +iscsi_destroy_context(iscsi); > +} > +memset(iscsilun, 0, sizeof(IscsiLun)); > } > -memset(iscsilun, 0, sizeof(IscsiLun)); > return ret; > } > > > Can you confirm that this is how the initiator name should be passed, so I can > split the above patch and commit it? > That looks good. Thanks! ronnie sahlberg
Re: [Qemu-devel] virtio-scsi vs. virtio-blk
On Thu, Aug 9, 2012 at 10:31 PM, Stefan Priebe - Profihost AG wrote: > Am 09.08.2012 14:19, schrieb Paolo Bonzini: > >> Il 09/08/2012 14:08, Stefan Priebe - Profihost AG ha scritto: >>> >>> >>> virtio-scsi: >>> rand 4k: >>>write: io=822448KB, bw=82228KB/s, iops=20557, runt= 10002msec >>>read : io=950920KB, bw=94694KB/s, iops=23673, runt= 10042msec >>> seq: >>>write: io=2436MB, bw=231312KB/s, iops=56, runt= 10784msec >>>read : io=3248MB, bw=313799KB/s, iops=76, runt= 10599msec >>> >>> virtio-blk: >>> rand 4k: >>>write: io=896472KB, bw=89051KB/s, iops=22262, runt= 10067msec >>>read : io=1710MB, bw=175073KB/s, iops=43768, runt= 10002msec >>> seq: >>>write: io=4008MB, bw=391285KB/s, iops=95, runt= 10489msec >>>read : io=5748MB, bw=570178KB/s, iops=139, runt= 10323msec >> >> >> Thanks; some overhead is expected, but not this much. Especially the >> sequential case is bad, what disk is this? > > > right now this is an external iscsi Nexentastor. Locally i can't get this > bandwith nor these iops to test. > > > >> Things to test include: >> >> - using the deadline I/O scheduler on at least the host, and possibly >> the guest too > > guest uses noop right now. Disk Host is nexentastor running open solaris. I > use libiscsi right now so the disks are not visible in both cases > (virtio-blk and virtio-scsi) to the host right now. > And if you mount the disks locally on the host using open-iscsi, and access them as /dev/sg* from qemu, what performance do you get? virtio-blk would first go to scsi emulation and then call out to block/iscsi.c to translate back to scsi commands to send to libiscsi while virtio-scsi (I think) would treat libiscsi as a generic scsi passthrough device. I.e. all commands just go straight through bdrv_aio_ioctl(SG_IO) If anything, I think the codepaths should be shorter for the virtio-scsi case, and it should due to the lack of scsi emulate and scsi re-encode perform better. Can you try also using normal scsi-generic and see how it performs compared to virtio-blk/-scsi ? git show 983924532f61091fd90d1f2fafa4aa938c414dbb This command shows how to set up libiscsi with passthrough via an emulated scsi hba. as virtio-blk/-scsi both use libiscsi, i think the bottleneck might either be the interface between guest and qemu, or the difference to the guest when talking to local scsi-emulation vs talking to the passthrough remote target. also is it possible to map the LUNs locally on the host using open-iscsi and then use the scsi-generic devices /dev/sg* for qemu and see how it compares?
Re: [Qemu-devel] virtio-scsi vs. virtio-blk
You want discard to work? That should not be a problem with iscsi. You are using qemu 1.0 ? So you dont have the qemu support for scsi-generic passthrough to iscsi devices. This should though work without too much trouble First you need an iscsi target that supports SBC UNMAP command. STGT does support this : http://stgt.sourceforge.net/ This is a userspace software iscsi target that works on most distros of linux. Support for thin-provisioning in STGT is very very recent : commit 9a855ac0026971c2b5c7f7133febfaf9462410dc Author: Ronnie Sahlberg Date: Sun Apr 15 12:07:32 2012 +1000 SBC: Add support for thin-provisioning and the UNMAP command The UNMAP command is implemented using FALLOC_FL_PUNCH_HOLE and release UNMAPPED blocks back to the underlying filesystem. FALLOC_FL_PUNCH_HOLE is fairly new addition to Linux but works o ext4 and XFS filesystems currently. Signed-off-by: Ronnie Sahlberg Signed-off-by: FUJITA Tomonori That means STGT version 1.0.27 or later. As this is very recent your distro probably does not have support for this yet, so you probably want to download, compile and install STGT from the current git tree. Thin-provisioning in STGT requires the also very recent addition on FALLOC_FL_PUNCH_HOLE (and also SEEK_HOLE/SEEK_DATA if you want "get_lba_status" support) Because STGT just calls out to these functions. I think you need to run the target on linux 3.2 or later kernels using ext4/xfs filessytem for this to work since I dont think any other filesystems support it. Never tested XFS myself but google claims it works. Once you have STGT running on a 3.2 or later kernel and using a filesystem that supports discard, this is the command to tell TGTD to activate thin-provisioning support for the LUN : tgtadm --lld iscsi --mode logicalunit --op update --tid $TID --lun 1 --params thin_provisioning=1 STGT will show the thin provisioning status like this LUN: 1 Type: disk SCSI ID: IET 00010001 SCSI SN: beaf11 Size: 105 MB, Block size: 512 Online: Yes Removable media: Yes Prevent removal: No Readonly: No Thin-provisioning: Yes Backing store type: rdwr Backing store path: /100mb.img Backing store flags: ... Once you got STGT up and running and setup for thin provisioning that should be almost all you need. (Other iscsi targets may also probably support thin-provisioning but I have no idea on how to set them up) Once you have set STGT up, you just need a guest that supports discard. Any recent linux distro with a kernel 3.2 or later should do. I often use latest mint when I test. Just set it up and put a ext4 filesystem on the iscsi lun, and use the 'discard' mount option in the guest. Use 'ls -ls ' on the target and see that the file 'shrinks' in size when you delete files from the ext4 filesystem. You can also use wireshark, it understands and decodes the unmap UNMAP that is used to punch holes in the medium. NOTE: STGT does not yet support/implement the "thresholds" for thin-provisioning so there is not yet any mechanism to automatically inform your guest when the unterlying storage is running low on space. So you do need to track space utilization on the target backing filesystem youself! At some stage I will add the thresholds from sbc 4.7.3.8 but it wont be anytime soon. (patches are likely welcome) That should be all you need to do to get it running. It is pretty easy. Ping me if you have any trouble. regards ronnie sahlberg On Fri, Aug 10, 2012 at 8:30 PM, Paolo Bonzini wrote: > Il 10/08/2012 12:28, Stefan Priebe - Profihost AG ha scritto: >> I'm using iscsi. So no raw or qcow2. > > Ok, then you need to use scsi-block as your device instead of scsi-disk > or scsi-hd. This will disable the QEMU SCSI emulation and let your VM > talk directly to the NAS. > > CCing Ronnie who may be interested in bug reports since I'm on holiday > starting "soon". > > Paolo > >> >> Thanks, >> >> Stefan >> >> Am 10.08.2012 12:20, schrieb Paolo Bonzini: >>> Il 10/08/2012 11:22, Stefan Priebe - Profihost AG ha scritto: >>>> virtio-scsi is now working fine. Could you please help me to get discard >>>> / trim running? I can't find any information what is needed to get >>>> discard / trim working. >>> >>> You need to add discard_granularity=NNN, where NNN is usually 512 for >>> raw and the cluster size (65536) for qcow2. >>> >>> However, discard is supported only for XFS on raw, and on qcow2 it will >>> not reclaim space---the space will only be reused for future qcow2 >>> allocation. >>> >>> Bette
Re: [Qemu-devel] virtio-scsi vs. virtio-blk
On Fri, Aug 10, 2012 at 8:30 PM, Paolo Bonzini wrote: > Il 10/08/2012 12:28, Stefan Priebe - Profihost AG ha scritto: >> I'm using iscsi. So no raw or qcow2. > > Ok, then you need to use scsi-block as your device instead of scsi-disk > or scsi-hd. This will disable the QEMU SCSI emulation and let your VM > talk directly to the NAS. > > CCing Ronnie who may be interested in bug reports since I'm on holiday > starting "soon". > I think it works on any, You can of course not boot from a if=scsi disk in qemu, but any '-drive file=iscsi://...,if=scsi' should work as long as it is not the boot device. SCSI emulation in qemu picks this up via WRITESAME10/16 and then calls bdrv_aio_discard() block/iscsi.c is invoked for discard and then translates this back to a SBC UNMAP command it sends to the target. Now, block/iscsi.c does assume that any target that reports that it supports thin-provisioning actually implements UNMAP command. There could be targets that support thin-provision ing that does NOT support UNMAP and unly support discard via WRITESAME10/16 so at some stage I should send a patch to iscsi.c to check which commands the target supprots and use one of the supported ones instead of a blanket "you say you support thin-provisioning, I take that as confirmation you support SBC UNMAP" regards ronnie sahlberg
Re: [Qemu-devel] virtio-scsi vs. virtio-blk
You can easily verify if your target supports thin-provisioning via the UNMAP command. Download the sg3-utils package and either mount the LUN locally via the kernel open-iscsi or apply the libiscsi patch to sg3-utils to make it iscsi-aware then use the commands"sg_unmap" to try to unmap regions and "sg_get_lba_status" to check that the regions are now unmapped. On Fri, Aug 10, 2012 at 9:54 PM, Stefan Priebe - Profihost AG wrote: > http://www.nexenta.com/corp/products/what-is-openstorage/nexentastor > > tells me: > "SCSI UNMAP as a client-side feature frees up storage in the back end, in > the context of thin provisioning (a 100-to-one reduction in space for VDI > when using NexentaStor)." > > So i would say nexenta supports it. But i'm using virtio-scsi-pci? I'm > really sorry to ask so many questions. > > Stefan > Am 10.08.2012 13:20, schrieb ronnie sahlberg: > >> On Fri, Aug 10, 2012 at 8:30 PM, Paolo Bonzini >> wrote: >>> >>> Il 10/08/2012 12:28, Stefan Priebe - Profihost AG ha scritto: >>>> >>>> I'm using iscsi. So no raw or qcow2. >>> >>> >>> Ok, then you need to use scsi-block as your device instead of scsi-disk >>> or scsi-hd. This will disable the QEMU SCSI emulation and let your VM >>> talk directly to the NAS. >>> >>> CCing Ronnie who may be interested in bug reports since I'm on holiday >>> starting "soon". >>> >> >> I think it works on any, >> You can of course not boot from a if=scsi disk in qemu, >> >> but any '-drive file=iscsi://...,if=scsi' should work as long as it is >> not the boot device. >> >> SCSI emulation in qemu picks this up via WRITESAME10/16 and then calls >> bdrv_aio_discard() >> block/iscsi.c is invoked for discard and then translates this back to >> a SBC UNMAP command it sends to the target. >> >> >> Now, block/iscsi.c does assume that any target that reports that it >> supports thin-provisioning actually implements UNMAP command. >> There could be targets that support thin-provision ing that does NOT >> support UNMAP and unly support discard via WRITESAME10/16 >> so at some stage I should send a patch to iscsi.c to check which >> commands the target supprots and use one of the supported ones instead >> of a blanket >> "you say you support thin-provisioning, I take that as confirmation >> you support SBC UNMAP" >> >> >> regards >> ronnie sahlberg >> >
Re: [Qemu-devel] virtio-scsi vs. virtio-blk
On Fri, Aug 10, 2012 at 9:57 PM, Stefan Priebe - Profihost AG wrote: > Am 10.08.2012 13:12, schrieb ronnie sahlberg: > >> You want discard to work? > > Yes > > >> You are using qemu 1.0 ? > > actual qemu-kvm git > > >> So you dont have the qemu support for scsi-generic passthrough to iscsi >> devices. > > Why? > scsi-generic passthrough I think was added for iscsi in 1.1 so in 1.0 your guest will talk scsi to qemu, and invoke the scsi-emulation in qemu. It then will call functions like 'bdrv_aio_discard()" in libiscsi that will translate it back into a scsi command again and pass it to the target. It still works, it just means you have a small degradation of performance compared to if you could send the SCSI CDB straight through to the iscsi target as you can in qemu 1.1 Very likely so small performance hit that you can not even measure it. Biggest difference is cosmetic if you run 'sg_inq' in your guest. in 1.0 it will talk to the qemu scsi emulation layer. in 1.1 when scsi passthrough is use you will talk to the iscsi target.
Re: [Qemu-devel] virtio-scsi vs. virtio-blk
On Fri, Aug 10, 2012 at 10:14 PM, Stefan Priebe - Profihost AG wrote: > Am 10.08.2012 14:04, schrieb ronnie sahlberg: > >> On Fri, Aug 10, 2012 at 9:57 PM, Stefan Priebe - Profihost AG >> wrote: >>> >>> Am 10.08.2012 13:12, schrieb ronnie sahlberg: >>> >>>> You want discard to work? >>> >>> >>> Yes >>> >>> >>>> You are using qemu 1.0 ? >>> >>> >>> actual qemu-kvm git >>> >>> >>>> So you dont have the qemu support for scsi-generic passthrough to iscsi >>>> devices. >>> >>> >>> Why? >>> >> >> scsi-generic passthrough I think was added for iscsi in 1.1 >> so in 1.0 your guest will talk scsi to qemu, and invoke the >> scsi-emulation in qemu. >> It then will call functions like 'bdrv_aio_discard()" in libiscsi >> that will translate it back into a scsi command again and pass it to >> the target. >> >> It still works, it just means you have a small degradation of >> performance compared to if you could send the SCSI CDB straight >> through to the iscsi target as you can in qemu 1.1 >> Very likely so small performance hit that you can not even measure it. > > > which version are you talking about? I use qemu-kvm.git so this is upcomming > 1.2 and i use libiscsi 1.5.0. I dont know the kvm version numbers. But you can check the file block/iscsi.c for the version you use for this : .bdrv_aio_discard = iscsi_aio_discard, If it has bdrv_aio_discard then you have support for 'discard' when using the scsi emulation. i.e. -drive ...,if=scsi,... #ifdef __linux__ .bdrv_ioctl = iscsi_ioctl, .bdrv_aio_ioctl = iscsi_aio_ioctl, #endif If it has these two lines too, then you have scsi-passthrough and can bypass the qemu scsi emulation. One way to activate passthough is via scsi-generic: Example: -device lsi -device scsi-generic,drive=MyISCSI \ -drive file=iscsi://10.1.1.125/iqn.ronnie.test/1,if=none,id=MyI regards ronnie sahlberg
Re: [Qemu-devel] [PATCH] iscsi: fix race between task completition and task abortion
Stefan H How should I do Acked-by properly, Is a reply with the text Acked-by: Ronnie Sahlberg sufficient ? regards ronnie sahlberg On Tue, Aug 14, 2012 at 8:35 PM, Stefan Hajnoczi wrote: > On Tue, Aug 14, 2012 at 08:44:46AM +0200, Stefan Priebe wrote: >> From: spriebe > > CCing Ronnie, iSCSI block driver author. > >> >> --- >> block/iscsi.c | 36 >> 1 files changed, 20 insertions(+), 16 deletions(-) >> >> diff --git a/block/iscsi.c b/block/iscsi.c >> index 12ca76d..257f97f 100644 >> --- a/block/iscsi.c >> +++ b/block/iscsi.c >> @@ -76,6 +76,10 @@ static void >> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void >> *command_data, >> void *private_data) >> { >> + IscsiAIOCB *acb = (IscsiAIOCB *)private_data; >> + >> + scsi_free_scsi_task(acb->task); >> + acb->task = NULL; >> } >> >> static void >> @@ -85,14 +89,19 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb) >> IscsiLun *iscsilun = acb->iscsilun; >> >> acb->common.cb(acb->common.opaque, -ECANCELED); >> -acb->canceled = 1; >> >> -/* send a task mgmt call to the target to cancel the task on the target >> */ >> -iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task, >> - iscsi_abort_task_cb, NULL); >> +if (acb->canceled != 0) >> +return; >> + >> +acb->canceled = 1; >> >> -/* then also cancel the task locally in libiscsi */ >> -iscsi_scsi_task_cancel(iscsilun->iscsi, acb->task); >> +/* send a task mgmt call to the target to cancel the task on the target >> + * this also cancels the task in libiscsi >> + */ >> +if (acb->task) { >> +iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task, >> + iscsi_abort_task_cb, &acb); >> +} >> } >> >> static AIOPool iscsi_aio_pool = { >> @@ -184,6 +193,11 @@ iscsi_readv_writev_bh_cb(void *p) >> } >> >> qemu_aio_release(acb); >> + >> +if (acb->task) { >> + scsi_free_scsi_task(acb->task); >> + acb->task = NULL; >> +} >> } >> >> >> @@ -212,8 +226,6 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int >> status, >> } >> >> iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); >> -scsi_free_scsi_task(acb->task); >> -acb->task = NULL; >> } >> >> static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun) >> @@ -313,8 +325,6 @@ iscsi_aio_read16_cb(struct iscsi_context *iscsi, int >> status, >> } >> >> iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); >> -scsi_free_scsi_task(acb->task); >> -acb->task = NULL; >> } >> >> static BlockDriverAIOCB * >> @@ -429,8 +439,6 @@ iscsi_synccache10_cb(struct iscsi_context *iscsi, int >> status, >> } >> >> iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); >> -scsi_free_scsi_task(acb->task); >> -acb->task = NULL; >> } >> >> static BlockDriverAIOCB * >> @@ -483,8 +491,6 @@ iscsi_unmap_cb(struct iscsi_context *iscsi, int status, >> } >> >> iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); >> -scsi_free_scsi_task(acb->task); >> -acb->task = NULL; >> } >> >> static BlockDriverAIOCB * >> @@ -560,8 +566,6 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int >> status, >> } >> >> iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); >> -scsi_free_scsi_task(acb->task); >> -acb->task = NULL; >> } >> >> static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs, >> -- >> 1.7.2.5 >> >>
Re: [Qemu-devel] [PATCH] PATCH V2: fix NULL dereferences / races between task completition and abort
Acked-by: Ronnie Sahlberg On Wed, Aug 15, 2012 at 5:09 PM, Stefan Priebe wrote: > Signed-off-by: Stefan Priebe > > --- > block/iscsi.c | 55 +++ > 1 files changed, 23 insertions(+), 32 deletions(-) > > diff --git a/block/iscsi.c b/block/iscsi.c > index 12ca76d..1c8b049 100644 > --- a/block/iscsi.c > +++ b/block/iscsi.c > @@ -76,6 +76,10 @@ static void > iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void > *command_data, > void *private_data) > { > + IscsiAIOCB *acb = (IscsiAIOCB *)private_data; > + > + scsi_free_scsi_task(acb->task); > + acb->task = NULL; > } > > static void > @@ -84,15 +88,15 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb) > IscsiAIOCB *acb = (IscsiAIOCB *)blockacb; > IscsiLun *iscsilun = acb->iscsilun; > > -acb->common.cb(acb->common.opaque, -ECANCELED); > acb->canceled = 1; > > -/* send a task mgmt call to the target to cancel the task on the target > */ > -iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task, > - iscsi_abort_task_cb, NULL); > +acb->common.cb(acb->common.opaque, -ECANCELED); > > -/* then also cancel the task locally in libiscsi */ > -iscsi_scsi_task_cancel(iscsilun->iscsi, acb->task); > +/* send a task mgmt call to the target to cancel the task on the target > + * this also cancels the task in libiscsi > + */ > +iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task, > + iscsi_abort_task_cb, &acb); > } > > static AIOPool iscsi_aio_pool = { > @@ -179,11 +183,18 @@ iscsi_readv_writev_bh_cb(void *p) > > qemu_bh_delete(acb->bh); > > -if (acb->canceled == 0) { > +if (!acb->canceled) { > acb->common.cb(acb->common.opaque, acb->status); > } > > qemu_aio_release(acb); > + > +if (acb->canceled) { > +return; > +} > + > +scsi_free_scsi_task(acb->task); > +acb->task = NULL; > } > > > @@ -197,10 +208,8 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int > status, > > g_free(acb->buf); > > -if (acb->canceled != 0) { > +if (acb->canceled) { > qemu_aio_release(acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > return; > } > > @@ -212,8 +221,6 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int > status, > } > > iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > } > > static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun) > @@ -298,10 +305,8 @@ iscsi_aio_read16_cb(struct iscsi_context *iscsi, int > status, > > trace_iscsi_aio_read16_cb(iscsi, status, acb, acb->canceled); > > -if (acb->canceled != 0) { > +if (acb->canceled) { > qemu_aio_release(acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > return; > } > > @@ -313,8 +318,6 @@ iscsi_aio_read16_cb(struct iscsi_context *iscsi, int > status, > } > > iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > } > > static BlockDriverAIOCB * > @@ -414,10 +417,8 @@ iscsi_synccache10_cb(struct iscsi_context *iscsi, int > status, > { > IscsiAIOCB *acb = opaque; > > -if (acb->canceled != 0) { > +if (acb->canceled) { > qemu_aio_release(acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > return; > } > > @@ -429,8 +430,6 @@ iscsi_synccache10_cb(struct iscsi_context *iscsi, int > status, > } > > iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > } > > static BlockDriverAIOCB * > @@ -468,10 +467,8 @@ iscsi_unmap_cb(struct iscsi_context *iscsi, int status, > { > IscsiAIOCB *acb = opaque; > > -if (acb->canceled != 0) { > +if (acb->canceled) { > qemu_aio_release(acb); > -scsi_free_scsi_task(acb->task); > -acb->task = NULL; > return; > } > > @@ -483,8 +480,6 @@ iscsi_unmap_cb(struct iscsi_context *iscsi, int status, > } > > iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); > -scsi_free_scsi_task(acb->task); > -acb-&g
[Qemu-devel] [PATCH] Two trivial patches for iSCSI and blank DVDs
Kevin, List, Paolo Please find two trivial patches to the iscsi block driver. 1, If the LUN contains a blank disk then the readcapacity10 data will contain a ReturnedLogicalBlockAddress of 0. For this case this does not mean that the device contains one readable block and can be read from. This is a special case for a blank disk that contains no readable blocks at all. This change makes bdrv_getlength()/iscsi_getlength() now rerurn a size of 0 for iSCSI LUNs that contain a blank disk. 2, If the iSCSI LUN contains a blank disk then we can not read or write to it using bdrv_*() functions since there is no bdrv api for writeable mmc devices. For this case with a blank disk force the use of scsi-generic just like we do for tape and mediachanger devices. This allows the guest to talk directly to the target LUN and write to/burn the disk. I have confirmed that dvdrecord -dao -ignsize -overburn dev=/dev/sg1 .iso works from a qemu guest to burn an iSCSI cdrom with these patches. regards ronnie sahlberg
[Qemu-devel] [PATCH 1/2] ISCSI: Set number of blocks to 0 for blank CDROM devices
The number of blocks of the device is used to compute the device size in bdrv_getlength()/iscsi_getlength(). For MMC devices, the ReturnedLogicalBlockAddress in the READCAPACITY10 has a special meaning when it is 0. In this case it does not mean that LBA 0 is the last accessible LBA, and thus the device has 1 readable block, but instead it means that the disc is blank and there are no readable blocks. This change ensures that when the iSCSI LUN is loaded with a blank DVD-R disk or similar that bdrv_getlength() will return the correct size of the device as 0 bytes. Signed-off-by: Ronnie Sahlberg --- block/iscsi.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index bb9cf82..fb420ea 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -717,7 +717,12 @@ iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status, } itask->iscsilun->block_size = rc10->block_size; -itask->iscsilun->num_blocks = rc10->lba + 1; +if (rc10->lba == 0) { +/* blank disk loaded */ +itask->iscsilun->num_blocks = 0; +} else { +itask->iscsilun->num_blocks = rc10->lba + 1; +} itask->bs->total_sectors= itask->iscsilun->num_blocks * itask->iscsilun->block_size / BDRV_SECTOR_SIZE ; -- 1.7.3.1
[Qemu-devel] [PATCH 2/2] ISCSI: Force scsi-generic for MMC with blank disks
There is no bdrv_* API for the commands for burning a blank MMC disk so when iSCSI LUNs are specified and the LUN is a MMC device with 0 available blocks. This is a blank disk so force scsi generic. This allows the guest to talk directly to the target to burn data on the disk. Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index fb420ea..ca53afa 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1017,10 +1017,19 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags) /* Medium changer or tape. We dont have any emulation for this so this must * be sg ioctl compatible. We force it to be sg, otherwise qemu will try * to read from the device to guess the image format. + * MMC device with no blocks contain a blank disk so force them to use sg + * too. */ -if (iscsilun->type == TYPE_MEDIUM_CHANGER || -iscsilun->type == TYPE_TAPE) { +switch (iscsilun->type) { +case TYPE_ROM: +if (iscsilun->num_blocks > 0) { +break; +} +case TYPE_MEDIUM_CHANGER: +case TYPE_TAPE: bs->sg = 1; +default: +break; } ret = 0; -- 1.7.3.1
Re: [Qemu-devel] [PATCH 2/2] ISCSI: Force scsi-generic for MMC with blank disks
On Sun, Aug 19, 2012 at 7:58 AM, Paolo Bonzini wrote: > Il 17/08/2012 04:36, Ronnie Sahlberg ha scritto: >> There is no bdrv_* API for the commands for burning a blank MMC disk >> so when iSCSI LUNs are specified and the LUN is a MMC device with >> 0 available blocks. This is a blank disk so force scsi generic. >> >> This allows the guest to talk directly to the target to burn data on >> the disk. >> >> Signed-off-by: Ronnie Sahlberg > > What happens without the patch? It's ok that scsi-{hd,cd} does not > work, but do scsi-{block,generic} work without the patch? > Neither of them work, basically because in block.c:find_image_format() if bs->sg is not set in if (bs->sg || !bdrv_is_inserted(bs)) { then we continue to ret = bdrv_pread(bs, 0, buf, sizeof(buf)); which fails with an error. So this patch is basically to prevent find_image_format() from trying to read from a blank disk. Maybe there is a better way to do this? regards ronnie sahlberg
Re: [Qemu-devel] [PATCH 2/2] ISCSI: Force scsi-generic for MMC with blank disks
Ah, This patch only affects the case when there is a blank / empty disk loaded. It has no effect on when real *.iso images are loaded and the disk contains data. The use case to be able to "burn" to an iscsi cdrom is probably not very urgent, so maybe it is best to delay this until post 1.2 regards ronnie sahlberg On Sun, Aug 19, 2012 at 8:02 AM, ronnie sahlberg wrote: > On Sun, Aug 19, 2012 at 7:58 AM, Paolo Bonzini wrote: >> Il 17/08/2012 04:36, Ronnie Sahlberg ha scritto: >>> There is no bdrv_* API for the commands for burning a blank MMC disk >>> so when iSCSI LUNs are specified and the LUN is a MMC device with >>> 0 available blocks. This is a blank disk so force scsi generic. >>> >>> This allows the guest to talk directly to the target to burn data on >>> the disk. >>> >>> Signed-off-by: Ronnie Sahlberg >> >> What happens without the patch? It's ok that scsi-{hd,cd} does not >> work, but do scsi-{block,generic} work without the patch? >> > > Neither of them work, basically because in > block.c:find_image_format() > > if bs->sg is not set in > > if (bs->sg || !bdrv_is_inserted(bs)) { > > then we continue to > > ret = bdrv_pread(bs, 0, buf, sizeof(buf)); > > which fails with an error. > So this patch is basically to prevent find_image_format() from trying > to read from a blank disk. > > Maybe there is a better way to do this? > > > > regards > ronnie sahlberg
Re: [Qemu-devel] [PATCH 1/2] ISCSI: Set number of blocks to 0 for blank CDROM devices
On Sun, Aug 19, 2012 at 7:57 AM, Paolo Bonzini wrote: > Il 17/08/2012 04:36, Ronnie Sahlberg ha scritto: >> The number of blocks of the device is used to compute the device size >> in bdrv_getlength()/iscsi_getlength(). >> For MMC devices, the ReturnedLogicalBlockAddress in the READCAPACITY10 >> has a special meaning when it is 0. >> In this case it does not mean that LBA 0 is the last accessible LBA, >> and thus the device has 1 readable block, but instead it means that the >> disc is blank and there are no readable blocks. >> >> This change ensures that when the iSCSI LUN is loaded with a blank >> DVD-R disk or similar that bdrv_getlength() will return the correct >> size of the device as 0 bytes. >> >> Signed-off-by: Ronnie Sahlberg >> --- >> block/iscsi.c |7 ++- >> 1 files changed, 6 insertions(+), 1 deletions(-) >> >> diff --git a/block/iscsi.c b/block/iscsi.c >> index bb9cf82..fb420ea 100644 >> --- a/block/iscsi.c >> +++ b/block/iscsi.c >> @@ -717,7 +717,12 @@ iscsi_readcapacity10_cb(struct iscsi_context *iscsi, >> int status, >> } >> >> itask->iscsilun->block_size = rc10->block_size; >> -itask->iscsilun->num_blocks = rc10->lba + 1; >> +if (rc10->lba == 0) { >> +/* blank disk loaded */ >> +itask->iscsilun->num_blocks = 0; >> +} else { >> +itask->iscsilun->num_blocks = rc10->lba + 1; >> +} >> itask->bs->total_sectors= itask->iscsilun->num_blocks * >> itask->iscsilun->block_size / >> BDRV_SECTOR_SIZE ; >> >> > > Thanks, applied to SCSI branch. We'll see whether this hits 1.2. This patch only matters for the use case of having empty/blank media loaded into an iscsi cdrom. I doubt being able to "burn" data to an iscsi drive is very important or urgent, so it is fine with me to delay until post 1.2 regards ronnie sahlberg
Re: [Qemu-devel] [PATCH 2/2] ISCSI: Force scsi-generic for MMC with blank disks
On Sun, Aug 19, 2012 at 8:16 AM, Paolo Bonzini wrote: > Il 19/08/2012 00:02, ronnie sahlberg ha scritto: >> Neither of them work, basically because in >> block.c:find_image_format() >> >> if bs->sg is not set in >> >> if (bs->sg || !bdrv_is_inserted(bs)) { >> >> then we continue to >> >> ret = bdrv_pread(bs, 0, buf, sizeof(buf)); >> >> which fails with an error. >> So this patch is basically to prevent find_image_format() from trying >> to read from a blank disk. >> >> Maybe there is a better way to do this? > > Yeah, I think in this case find_image_format should just use raw. Ok, so that is basically what the patch does. It forces bs->sg==true so that we pick "raw" right there instead of trying to read from the device. So you are happy with the patch ? regards ronnie sahlberg
Re: [Qemu-devel] [PATCH 2/2] ISCSI: Force scsi-generic for MMC with blank disks
On Sun, Aug 19, 2012 at 8:20 AM, Paolo Bonzini wrote: > Il 19/08/2012 00:02, ronnie sahlberg ha scritto: >> On Sun, Aug 19, 2012 at 7:58 AM, Paolo Bonzini wrote: >>> Il 17/08/2012 04:36, Ronnie Sahlberg ha scritto: >>>> There is no bdrv_* API for the commands for burning a blank MMC disk >>>> so when iSCSI LUNs are specified and the LUN is a MMC device with >>>> 0 available blocks. This is a blank disk so force scsi generic. >>>> >>>> This allows the guest to talk directly to the target to burn data on >>>> the disk. >>>> >>>> Signed-off-by: Ronnie Sahlberg >>> >>> What happens without the patch? It's ok that scsi-{hd,cd} does not >>> work, but do scsi-{block,generic} work without the patch? >>> >> >> Neither of them work, basically because in >> block.c:find_image_format() >> >> if bs->sg is not set in >> >> if (bs->sg || !bdrv_is_inserted(bs)) { >> >> then we continue to >> >> ret = bdrv_pread(bs, 0, buf, sizeof(buf)); >> >> which fails with an error. >> So this patch is basically to prevent find_image_format() from trying >> to read from a blank disk. >> >> Maybe there is a better way to do this? > > Ah, BTW does using -drive ...,format=raw work, or does it hiccup later > again on something else? > format=raw works ! That then begs the question if would it be possible to force format=raw always for iscsi devices? A iscsi device as far as I can see would always just be a raw block device and there would never be a "header" on such devices so maybe that would be a solution? regards ronnie sahlberg
Re: [Qemu-devel] [PATCH RFT 0/3] iscsi: fix NULL dereferences / races between task completion and abort
On Mon, Aug 20, 2012 at 6:12 PM, Stefan Priebe - Profihost AG wrote: > Hi Ronnie, > > Am 20.08.2012 10:08, schrieb Paolo Bonzini: > >> That's because the "big QEMU lock" is held by the thread that called >> qemu_aio_cancel. >> >>> and i also see >>> no cancellation message in kernel log. >> >> >> And that's because the UNMAP actually ultimately succeeds. You'll >> probably see soft lockup messages though. >> >> The solution here is to bump the timeout of the UNMAP command (either in >> the kernel or in libiscsi, I didn't really understand who's at fault). > > > What's your suggestion / idea about that? > There are no timeouts in libiscsi itself. But you can probably tweak it through the guest kernel : $ cat /sys/bus/scsi/devices/0\:0\:0\:0/timeout 30 should be the default scsi timeout for this device, and $ cat /sys/bus/scsi/devices/0\:0\:0\:0/queue_depth 31 would be how many concurrent i/o that the guest will drive to the device. When performing the UNMAPS, maybe setting timeout to something really big, and at the same time also dropping queue-depth to something really small so that the guest kernel will not be able to send so many UNMAPs concurrently. ronnie s
[Qemu-devel] [PATCH] iSCSI: Add support for SG_IO to bdrv_ioctl()/iscsi_ioctl()
Paolo, List Please find a patch that adds emulation of SG_IO to the synchronous function bdrv_ioctl()/iscsi_ioctl(). Previously we have only supported emulation for this ioctl in iSCSI for the asynchronous function iscsi_aio_ioctl() since that is the only place scsi-generic uses SG_IO. By adding support for SG_IO to iscsi_ioctl() this makes scsi-block and scsi-disk work too. Since scsi-block/scsi-disk never worked with iscsi, and only scsi-generic worked, this is not a new regression. So whether this should go in now or wait until post 1.2 is for your disgression. regards ronnie sahlberg
[Qemu-devel] [PATCH] iSCSI: Add support for SG_IO in bdrv_ioctl()
We need to support SG_IO in the synchronous bdrv_ioctl() since this is used by scsi-block Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 109 - 1 files changed, 108 insertions(+), 1 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 993a86d..9e98bfe 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -548,7 +548,8 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status, #define SG_ERR_DRIVER_SENSE0x08 -if (status == SCSI_STATUS_CHECK_CONDITION && acb->task->datain.size >= 2) { +if (status == SCSI_STATUS_CHECK_CONDITION +&& acb->task->datain.size >= 2) { int ss; acb->ioh->driver_status |= SG_ERR_DRIVER_SENSE; @@ -633,9 +634,53 @@ static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs, return &acb->common; } +struct IoctlTask { +int status; +int complete; +sg_io_hdr_t *ioh; +struct scsi_task *task; +}; + +static void +iscsi_ioctl_cb(struct iscsi_context *iscsi, int status, + void *command_data, void *opaque) +{ +struct IoctlTask *itask = opaque; + +if (status < 0) { +error_report("Failed to ioctl(SG_IO) to iSCSI lun. %s", + iscsi_get_error(iscsi)); +itask->status = -EIO; +} + +itask->ioh->driver_status = 0; +itask->ioh->host_status = 0; +itask->ioh->resid = 0; + +#define SG_ERR_DRIVER_SENSE0x08 + +if (status == SCSI_STATUS_CHECK_CONDITION +&& itask->task->datain.size >= 2) { +int ss; + +itask->ioh->driver_status |= SG_ERR_DRIVER_SENSE; + +itask->ioh->sb_len_wr = itask->task->datain.size - 2; +ss = (itask->ioh->mx_sb_len >= itask->ioh->sb_len_wr) ? + itask->ioh->mx_sb_len : itask->ioh->sb_len_wr; +memcpy(itask->ioh->sbp, &itask->task->datain.data[2], ss); +} + +itask->complete = 1; +} + static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf) { IscsiLun *iscsilun = bs->opaque; +struct iscsi_context *iscsi = iscsilun->iscsi; +struct IoctlTask itask; +struct scsi_task *task; +struct iscsi_data data; switch (req) { case SG_GET_VERSION_NUM: @@ -644,6 +689,68 @@ static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf) case SG_GET_SCSI_ID: ((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type; break; +case SG_IO: +itask.ioh = buf; +task = malloc(sizeof(struct scsi_task)); +if (task == NULL) { +error_report("iSCSI: Failed to allocate task for scsi command. %s", + iscsi_get_error(iscsi)); +return -1; +} +memset(task, 0, sizeof(struct scsi_task)); + +switch (itask.ioh->dxfer_direction) { +case SG_DXFER_TO_DEV: +task->xfer_dir = SCSI_XFER_WRITE; +break; +case SG_DXFER_FROM_DEV: +task->xfer_dir = SCSI_XFER_READ; +break; +default: +task->xfer_dir = SCSI_XFER_NONE; +break; +} +task->cdb_size = itask.ioh->cmd_len; +memcpy(&task->cdb[0], itask.ioh->cmdp, itask.ioh->cmd_len); +task->expxferlen = itask.ioh->dxfer_len; + +if (task->xfer_dir == SCSI_XFER_WRITE) { +data.data = itask.ioh->dxferp; +data.size = itask.ioh->dxfer_len; +} + +if (iscsi_scsi_command_async(iscsi, iscsilun->lun, task, + iscsi_ioctl_cb, + (task->xfer_dir == SCSI_XFER_WRITE) ? + &data : NULL, + &itask) != 0) { +scsi_free_scsi_task(task); +return -1; +} + +/* tell libiscsi to read straight into the buffer we got from ioctl */ +if (task->xfer_dir == SCSI_XFER_READ) { +scsi_task_add_data_in_buffer(task, + itask.ioh->dxfer_len, + itask.ioh->dxferp); +} + +itask.complete = 0; +itask.status = 0; +itask.task = task; +while (!itask.complete) { +iscsi_set_events(iscsilun); +qemu_aio_wait(); +} +scsi_free_scsi_task(task); + +if (itask.status != 0) { +error_report("iSCSI: Failed to send async command to target : %s", + iscsi_get_error(iscsi)); +return -1; +} + +return 0; default: return -1; } -- 1.7.3.1
Re: [Qemu-devel] [PATCH v2 1/3] net: asynchronous send/receive infrastructure for net/socket.c
In net_socket_update_fd_handler() shouldnt you call qemu_notify_event() if any of the handlers have changed from NULL to non-NULL ? or else it might take a while before the change takes effect. regards ronnie sahlberg On Wed, Aug 22, 2012 at 1:52 AM, Stefan Hajnoczi wrote: > The net/socket.c net client is not truly asynchronous. This patch > borrows the qemu_set_fd_handler2() code from net/tap.c as the basis for > proper asynchronous send/receive. > > Only read packets from the socket when the peer is able to receive. > This avoids needless queuing. > > Later patches implement asynchronous send. > > Signed-off-by: Stefan Hajnoczi > --- > net/socket.c | 58 > -- > 1 file changed, 52 insertions(+), 6 deletions(-) > > diff --git a/net/socket.c b/net/socket.c > index c172c24..54e32f0 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -42,9 +42,51 @@ typedef struct NetSocketState { > unsigned int packet_len; > uint8_t buf[4096]; > struct sockaddr_in dgram_dst; /* contains inet host and port destination > iff connectionless (SOCK_DGRAM) */ > +IOHandler *send_fn; /* differs between SOCK_STREAM/SOCK_DGRAM > */ > +bool read_poll; /* waiting to receive data? */ > +bool write_poll; /* waiting to transmit data? */ > } NetSocketState; > > static void net_socket_accept(void *opaque); > +static void net_socket_writable(void *opaque); > + > +/* Only read packets from socket when peer can receive them */ > +static int net_socket_can_send(void *opaque) > +{ > +NetSocketState *s = opaque; > + > +return qemu_can_send_packet(&s->nc); > +} > + > +static void net_socket_update_fd_handler(NetSocketState *s) > +{ > +qemu_set_fd_handler2(s->fd, > + s->read_poll ? net_socket_can_send : NULL, > + s->read_poll ? s->send_fn : NULL, > + s->write_poll ? net_socket_writable : NULL, > + s); > +} > + > +static void net_socket_read_poll(NetSocketState *s, bool enable) > +{ > +s->read_poll = enable; > +net_socket_update_fd_handler(s); > +} > + > +static void net_socket_write_poll(NetSocketState *s, bool enable) > +{ > +s->write_poll = enable; > +net_socket_update_fd_handler(s); > +} > + > +static void net_socket_writable(void *opaque) > +{ > +NetSocketState *s = opaque; > + > +net_socket_write_poll(s, false); > + > +qemu_flush_queued_packets(&s->nc); > +} > > /* XXX: we consider we can send the whole packet without blocking */ > static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf, > size_t size) > @@ -81,7 +123,8 @@ static void net_socket_send(void *opaque) > } else if (size == 0) { > /* end of connection */ > eoc: > -qemu_set_fd_handler(s->fd, NULL, NULL, NULL); > +net_socket_read_poll(s, false); > +net_socket_write_poll(s, false); > if (s->listen_fd != -1) { > qemu_set_fd_handler(s->listen_fd, net_socket_accept, NULL, s); > } > @@ -152,7 +195,8 @@ static void net_socket_send_dgram(void *opaque) > return; > if (size == 0) { > /* end of connection */ > -qemu_set_fd_handler(s->fd, NULL, NULL, NULL); > +net_socket_read_poll(s, false); > +net_socket_write_poll(s, false); > return; > } > qemu_send_packet(&s->nc, s->buf, size); > @@ -243,7 +287,8 @@ static void net_socket_cleanup(NetClientState *nc) > { > NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc); > if (s->fd != -1) { > -qemu_set_fd_handler(s->fd, NULL, NULL, NULL); > +net_socket_read_poll(s, false); > +net_socket_write_poll(s, false); > close(s->fd); > s->fd = -1; > } > @@ -314,8 +359,8 @@ static NetSocketState > *net_socket_fd_init_dgram(NetClientState *peer, > > s->fd = fd; > s->listen_fd = -1; > - > -qemu_set_fd_handler(s->fd, net_socket_send_dgram, NULL, s); > +s->send_fn = net_socket_send_dgram; > +net_socket_read_poll(s, true); > > /* mcast: save bound address as dst */ > if (is_connected) { > @@ -332,7 +377,8 @@ err: > static void net_socket_connect(void *opaque) > { > NetSocketState *s = opaque; > -qemu_set_fd_handler(s->fd, net_socket_send, NULL, s); > +s->send_fn = net_socket_send; > +net_socket_read_poll(s, true); > } > > static NetClientInfo net_socket_info = { > -- > 1.7.10.4 > >
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
On Tue, Jan 24, 2012 at 5:07 AM, Eric Blake wrote: > > Can -readconfig support reading from an inherited fd, rather than only > taking a file name that qemu has to open()? > ... > > Can you give an actual command line that uses -readconfig, as part of > your example? > Thanks for the review Eric. I will update the patch with an example using -readconfig and fix the typo. Yes, -readconfig can read from a pipe() and it can also read from an arbitrary file descriptor. Reading from a pipe : cat iscsi.conf | ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -display vnc=127.0.0.1:0 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 -readconfig /proc/self/fd/0 Read from an arbitrary filedescriptor inherited from the parent process : 9
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
Fair enough. I will send a separate tiny patch to add 'fd:' support to specify to qemu to -readconfig from a preexisting filedescriptor. Other protocols like 'exec:' can easily be added later as needed. regards ronnie sahlberg On Thu, Jan 26, 2012 at 2:57 AM, Eric Blake wrote: > On 01/24/2012 11:47 PM, ronnie sahlberg wrote: >> Read from an arbitrary filedescriptor inherited from the parent process : >> 9> vnc=127.0.0.1:0 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 >> -readconfig /proc/self/fd/9 > > That requires the existence of procfs, which is not portable (although > it does work on Linux). I'd rather see: > > -readconfig fd:9 > > which matches things for -incoming; that is, if -readconfig starts with > '/' or '.', it is a filename; otherwise, it is a protocol:value > designation, where we recognize at least the fd: protocol where a value > is the incoming fd, but we could also recognize things like exec: > protocol which is an arbitrary command to use via popen. > >> I imagine you would pipe() then fork() and pass the read side of your >> pipe to qemu here ? > > Yes, the idea is that libvirt would rather pipe() and then pass the read > size fd to qemu, so that libvirt's handling of the decrypted secret > information is only ever passed over the pipe and not stored on disk. > >> If this works well or at least in some acceptable form it might be >> useful for other users needing to pass sensitive config data into QEMU >> too? > > Yes, the fd: notation of -incoming should be reusable in multiple > contexsts, including any other location where sensitive information must > be passed in. > > -- > Eric Blake ebl...@redhat.com +1-919-301-3266 > Libvirt virtualization library http://libvirt.org >
[Qemu-devel] [PATCH 0/0] Allow -readconfig to read from a pre-existing filedescriptor
List, Please find attached a trivial patch to allow -readconfig to read from a pre-existing filedescriptor instead of a file off disk. Syntax is '-readconfig fd:' to read from filedescriptor . This is useful for example for libvirt which allow it to pass configuration data, including possibly sensitive such, into qemu without this data hitting the filesystem or a disk. regards ronnie sahlberg
[Qemu-devel] [PATCH] READCONFIG: Allow reading the configuration from a pre-existing filedescriptor
Update the readconfig filename parsing to allow specifying an existing, inherited, filedescriptor as 'fd:' This is useful when you want to pass potentially sensitive onfiguration data to qemu without having it hit the filesystem/stable-storage Signed-off-by: Ronnie Sahlberg --- qemu-config.c | 15 +-- qemu-options.hx |3 ++- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/qemu-config.c b/qemu-config.c index b030205..c12c5eb 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -770,8 +770,19 @@ out: int qemu_read_config_file(const char *filename) { -FILE *f = fopen(filename, "r"); -int ret; +FILE *f; +int ret, fd; + +if (strncmp(filename, "fd:", 3)) { +f = fopen(filename, "r"); +} else { +errno = 0; +fd = strtol(filename + 3, NULL, 10); +if (errno != 0) { +return -errno; +} +f = fdopen(fd, "r"); +} if (f == NULL) { return -errno; diff --git a/qemu-options.hx b/qemu-options.hx index 3a07ae8..653ff2d 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2577,7 +2577,8 @@ DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig, STEXI @item -readconfig @var{file} @findex -readconfig -Read device configuration from @var{file}. +Read device configuration from @var{file}. Use 'fd:' as filename +to read from an existing, inherited, filedesriptor. ETEXI DEF("writeconfig", HAS_ARG, QEMU_OPTION_writeconfig, "-writeconfig \n" -- 1.7.3.1
[Qemu-devel] [PATCH 0/0] Add configuration variables for iSCSI
Kevin, List Please review and/or apply. This is version 3 of the patch to add configuration variables for iSCSI. Version 2 added the feature to specify configuration blocks that apply to a specific target name, allowing qemu to use different settings if/when connecting one guest to multiple different iscsi targets. Version 3 fixes a typo and adds an example to the documentation on how to use -readconfig for reading the configuration from a file This patch adds configuration variables for iSCSI to set initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target and username and password for CHAP authentication. This allows specifying a initiator-name either from the command line -iscsi initiator-name=iqn.2004-01.com.example:test or from a configuration file included with -readconfig [iscsi] initiator-name = iqn.2004-01.com.example:test header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE user = CHAP username password = CHAP password In the configuration file it is also possible to set a target specific configuratyion block using the header [iscsi "iqn.target.name"] When a iscsi session is initialized, it will first try to use a configuration section that matches the target name. If no such block is found, it will fall-back to try the default [iscsi] section instead. regards ronnie sahlberg
[Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
This patch adds configuration variables for iSCSI to set initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target and username and password for CHAP authentication. This allows specifying a initiator-name either from the command line -iscsi initiator-name=iqn.2004-01.com.example:test or from a configuration file included with -readconfig [iscsi] initiator-name = iqn.2004-01.com.example:test header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE user = CHAP username password = CHAP password If you use several different targets, you can also configure this on a per target basis by using a group name: [iscsi "iqn.target.name"] ... The configuration file can be read using -readconfig. Example : qemu-system-i386 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 -readconfig iscsi.conf Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 139 +++ qemu-config.c | 27 +++ qemu-doc.texi | 54 +- qemu-options.hx | 16 +-- vl.c|8 +++ 5 files changed, 229 insertions(+), 15 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 938c568..bd3ca11 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -455,6 +455,109 @@ iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data, } } +static int parse_chap(struct iscsi_context *iscsi, const char *target) +{ +QemuOptsList *list; +QemuOpts *opts; +const char *user = NULL; +const char *password = NULL; + +list = qemu_find_opts("iscsi"); +if (!list) { +return 0; +} + +opts = qemu_opts_find(list, target); +if (opts == NULL) { +opts = QTAILQ_FIRST(&list->head); +if (!opts) { +return 0; +} +} + +user = qemu_opt_get(opts, "user"); +if (!user) { +return 0; +} + +password = qemu_opt_get(opts, "password"); +if (!password) { +error_report("CHAP username specified but no password was given"); +return -1; +} + +if (iscsi_set_initiator_username_pwd(iscsi, user, password)) { +error_report("Failed to set initiator username and password"); +return -1; +} + +return 0; +} + +static void parse_header_digest(struct iscsi_context *iscsi, const char *target) +{ +QemuOptsList *list; +QemuOpts *opts; +const char *digest = NULL; + +list = qemu_find_opts("iscsi"); +if (!list) { +return; +} + +opts = qemu_opts_find(list, target); +if (opts == NULL) { +opts = QTAILQ_FIRST(&list->head); +if (!opts) { +return; +} +} + +digest = qemu_opt_get(opts, "header-digest"); +if (!digest) { +return; +} + +if (!strcmp(digest, "CRC32C")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_CRC32C); +} else if (!strcmp(digest, "NONE")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE); +} else if (!strcmp(digest, "CRC32C-NONE")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_CRC32C_NONE); +} else if (!strcmp(digest, "NONE-CRC32C")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C); +} else { +error_report("Invalid header-digest setting : %s", digest); +} +} + +static char *parse_initiator_name(const char *target) +{ +QemuOptsList *list; +QemuOpts *opts; +const char *name = NULL; + +list = qemu_find_opts("iscsi"); +if (!list) { +return g_strdup("iqn.2008-11.org.linux-kvm"); +} + +opts = qemu_opts_find(list, target); +if (opts == NULL) { +opts = QTAILQ_FIRST(&list->head); +if (!opts) { +return g_strdup("iqn.2008-11.org.linux-kvm"); +} +} + +name = qemu_opt_get(opts, "initiator-name"); +if (!name) { +return g_strdup("iqn.2008-11.org.linux-kvm"); +} + +return g_strdup(name); +} + /* * We support iscsi url's on the form * iscsi://[%@][:]// @@ -465,6 +568,7 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags) struct iscsi_context *iscsi = NULL; struct iscsi_url *iscsi_url = NULL; struct IscsiTask task; +char *initiator_name = NULL; int ret; if ((BDRV_SECTOR_SIZE % 512) != 0) { @@ -474,16 +578,6 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags) return -EINVAL; } -memset(iscsilun, 0, sizeof(IscsiLun)); - -/* Should really append the KVM name after the ':' here */ -iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:"); -if (iscsi == NULL) { -error_report("iSCSI: Failed to create
Re: [Qemu-devel] [PATCH v2 11/33] scsi-disk: support DVD profile in GET CONFIGURATION
Hi Paolo, List, I see you are very active in the MMC emulation for qemu. I have written several MMC emulators, the third of which (in STGT) is open source. I did the MMC emulation in STGT, (see README.mmc for how to use it as "blank burnable media") Do you have DVD+/-R + morhping implemented yet? STGT has a feature where once an initiator has finished "burning" a new image to a "blank" disk where STGT automagically morphs the LUN from being profile:dvd+/-r to be profile:dvd-rom to indicate it is no longer a blank burnable disk but a readonly disk containing an iso. Would that be of interest to you or the community? It is a few thousand lines of semi-trivial code to port but nothing that hairy. Im happy to port it to qemu if there is need/desire/interest for it. There was a bug in STGT that made it impossible to build a virtual jukebox with writeable/burnable media for about 6 months and no one noticed, so maybe not too many people use such features :-( regards ronnie sahlberg On Thu, Jan 26, 2012 at 7:09 PM, Paolo Bonzini wrote: > On 01/25/2012 09:13 PM, Artyom Tarasenko wrote: >> >> On 1/25/12, Paolo Bonzini wrote: >>> >>> On 01/25/2012 05:34 PM, Artyom Tarasenko wrote: >>>> >>>> This patch produces the following error when booting Solaris/SPARC: >>>> >>>> WARNING: >>>> /iommu@0,1000/sbus@0,10001000/espdma@5,840/esp@5,880 >>>> (esp0): >>>> data transfer overrun: current esp state: >>>> esp: State=DATA Last State=DATA_DONE >>>> esp: Latched stat=0x91 intr=0x10 fifo >>>> 0x0 >>>> esp: last msg out: IDENTIFY; last msg in: COMMAND COMPLETE >>>> esp: DMA csr=0xa4240030 >>>> esp: addr=fc005860 dmacnt=0 last=fc005858 last_cnt=8 >>>> esp: Cmd dump for Target 6 Lun 0: >>>> esp: cdblen=10, cdb=[ 0x46 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0x8 >>>> 0x0 >>>> ] >>>> esp: pkt_state=0xf pkt_flags=0x1 >>>> pkt_statistics=0x0 >>>> esp: cmd_flags=0x462 cmd_timeout=60 >>>> >>>> Particularly it seems that Solaris is not happy with respose length >>>>> >>>>> 8. Any idea, why? > > > The response length is not 8, it is 40; but Solaris only expected 8 (see the > CDB in the error dump). > > >>>> $ sparc-softmmu/qemu-system-sparc -bios ~/ss5-170.bin -m 32 -nographic >>>> -drive if=scsi,unit=6,media=cdrom,file=~/Solaris9cd1.iso >>> >>> >>> From a quick look, it's the same bug that was reported recently on >>> INQUIRY commands. >> >> >> If you need I can provide more input. > > > You can test http://article.gmane.org/gmane.comp.emulators.qemu/133434 if > you have time. > > Paolo >
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
Kevin, Collissions are bad, but what about IF ! STRNCMP (filename, "/proc/self/fd/", 14) THEN fopen(filename, "r") ELSE fdopen(atoi(filename+14), "r") FI modulo better validation for the atio() arguments. Probability of anyone using "/proc/self/fd/" as a prefix for normal files is very small. Small enough it will be justifiable to say "do that and you are on your own". ? regards ronnie sahlberg On Thu, Jan 26, 2012 at 8:08 PM, Kevin Wolf wrote: > Am 25.01.2012 16:57, schrieb Eric Blake: >> On 01/24/2012 11:47 PM, ronnie sahlberg wrote: >>> Read from an arbitra filedescriptor inherited from the parent process : >>> 9>> vnc=127.0.0.1:0 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 >>> -readconfig /proc/self/fd/9 >> >> That requires the existence of procfs, which is not portable (although >> it does work on Linux). I'd rather see: >> >> -readconfig fd:9 >> >> which matches things for -incoming; that is, if -readconfig starts with >> '/' or '.', it is a filename; otherwise, it is a protocol:value >> designation, where we recognize at least the fd: protocol where a value >> is the incoming fd, but we could also recognize things like exec: >> protocol which is an arbitrary command to use via popen. > > Magic prefixes like this have one big problem: What if someone has a > config file called "fd:9"? We have the very same problem with protocols > in the block layer and while in the general case it's a convenient > syntax, we've come to hate it in cases where it misinterprets things. > > Kevin
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
Ok so what about this You use a filename starting with "/proc/self/fd/" and you dont have a proc filesystem mounted? you are on your own! regards ronnie sahlberg On Thu, Jan 26, 2012 at 8:27 PM, Kevin Wolf wrote: > Am 26.01.2012 10:18, schrieb ronnie sahlberg: >> Kevin, >> >> Collissions are bad, but what about >> >> IF ! STRNCMP (filename, "/proc/self/fd/", 14) THEN >> fopen(filename, "r") >> ELSE >> fdopen(atoi(filename+14), "r") >> FI >> >> modulo better validation for the atio() arguments. >> >> >> Probability of anyone using "/proc/self/fd/" as a prefix for normal >> files is very small. >> Small enough it will be justifiable to say "do that and you are on your >> own". ? > > Interesting idea. Maybe that could work. > > Kevin 0001-READCONFIG-make-proc-self-fd-a-magic-prefix-to-refer.patch.gz Description: GNU Zip compressed data From 65aa496d77b839f1c48745fc5545e3e6772f724c Mon Sep 17 00:00:00 2001 From: Ronnie Sahlberg Date: Thu, 26 Jan 2012 20:47:40 +1100 Subject: [PATCH] READCONFIG: make /proc/self/fd/ a magic prefix to refer to a specific filedesriptor inherited from the parent. Signed-off-by: Ronnie Sahlberg --- qemu-config.c | 16 ++-- qemu-options.hx |3 ++- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/qemu-config.c b/qemu-config.c index b030205..9e709d4 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -770,8 +770,20 @@ out: int qemu_read_config_file(const char *filename) { -FILE *f = fopen(filename, "r"); -int ret; +FILE *f; +int ret, fd; +char *ptr = NULL; + +if (strncmp(filename, "/proc/self/fd/", 14)) { +f = fopen(filename, "r"); +} else { +errno = 0; +fd = strtol(filename + 14, &ptr, 10); +if (errno != 0 || ptr == filename + 14 || *ptr != '\0') { +return -EINVAL; +} +f = fdopen(fd, "r"); +} if (f == NULL) { return -errno; diff --git a/qemu-options.hx b/qemu-options.hx index 3a07ae8..e54af58 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2577,7 +2577,8 @@ DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig, STEXI @item -readconfig @var{file} @findex -readconfig -Read device configuration from @var{file}. +Read device configuration from @var{file}. Use '/proc/self/fd/' as filename +to read from an existing, inherited, filedesriptor. ETEXI DEF("writeconfig", HAS_ARG, QEMU_OPTION_writeconfig, "-writeconfig \n" -- 1.7.3.1
Re: [Qemu-devel] qemu-img convert with block driver without .bdrv_create (like iscsi)
On Thu, Oct 25, 2012 at 1:02 AM, Kevin Wolf wrote: > Am 25.10.2012 09:52, schrieb Paolo Bonzini: >> Il 25/10/2012 09:46, Kevin Wolf ha scritto: >>>>> 1)add a .bdrv_create in block/iscsi.c ? >>>>> >>>>> (like host_device block driver, only open/close the device and check if >>>>> size if big enough) >>> Yes, this is the right way. >> >> Could it be a default implementation of .bdrv_create (i.e. something >> you'd do in bdrv_create if the protocol doesn't have it)? > > No, there are block drivers that really can't create images. They should > keep failing. Technically, you can not create new LUNs via the iscsi protocol itself, you can only access pre-existing luns that have been created by some out-of-band method. So basically, with iscsi you can only ever access already preexisting luns. In that case I think requiring a .bdrv_create feels wrong. We are not creating a new LUN here but just opening an already existing LUN. What about changing bdrv_create() so that IF there is no .bdrv_create method, then assume it might be a pre-existing file in which case we fallback to use .bdrv_open instead. regards ronnie sahlberg
Re: [Qemu-devel] qemu-img convert with block driver without .bdrv_create (like iscsi)
On Thu, Oct 25, 2012 at 6:58 AM, Paolo Bonzini wrote: > Il 25/10/2012 15:41, ronnie sahlberg ha scritto: >> On Thu, Oct 25, 2012 at 1:02 AM, Kevin Wolf wrote: >>> Am 25.10.2012 09:52, schrieb Paolo Bonzini: >>>> Il 25/10/2012 09:46, Kevin Wolf ha scritto: >>>>>>> 1)add a .bdrv_create in block/iscsi.c ? >>>>>>> >>>>>>> (like host_device block driver, only open/close the device and check if >>>>>>> size if big enough) >>>>> Yes, this is the right way. >>>> >>>> Could it be a default implementation of .bdrv_create (i.e. something >>>> you'd do in bdrv_create if the protocol doesn't have it)? >>> >>> No, there are block drivers that really can't create images. They should >>> keep failing. >> >> Technically, you can not create new LUNs via the iscsi protocol >> itself, you can only access pre-existing luns >> that have been created by some out-of-band method. >> >> So basically, with iscsi you can only ever access already preexisting luns. >> >> In that case I think requiring a .bdrv_create feels wrong. We are not >> creating a new LUN here but just opening an already existing LUN. > > The problem is that bdrv_create is overloaded to mean both "create the > backing storage" and "format the image". Only the latter applies to > iSCSI and, in general, as far as protocols are concerned bdrv_create is > usually a no-op. > > However, Kevin said he prefers to have an explicit override of > bdrv_create for iSCSI. Can you implement that? (I'll then do the same > for NBD). > Yepp. Since there is consensus. We can do a .bdrv_create for iscsi.c regards ronnie sahlberg
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
But older distros/kernels work fine? Can you take a network trace? About half a year there was an issue where recent kernels had added support to start using new scsi opcodes, but the qemu functions that determine "which transfer direction is used for this opcode" had not yet been updated, so that the opcode was sent with the wrong transfer direction. That caused the guests memory to be overwritten and crash. I dont have (easy) access to the git tree right now, but it was a patch for the ATA_PASSTHROUGH command that fixed that. Could you take a network trace and check maybe this is something similar ? I.e. does the guest send an "unusual" scsi opcode just before the crash ? regards ronnie sahlberg On Tue, Oct 30, 2012 at 12:37 PM, Peter Lieven wrote: > Am 30.10.2012 19:27, schrieb Stefan Hajnoczi: > >> On Tue, Oct 30, 2012 at 4:56 PM, Peter Lieven wrote: >>> >>> On 30.10.2012 09:32, Stefan Hajnoczi wrote: >>>> >>>> On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote: >>>>> >>>>> Hi, >>>> >>>> Bug subject should be virtio-blk, not virtio-scsi. virtio-scsi is a >>>> different virtio device type from virtoi-blk and is not present in the >>>> backtrace you posted. >>>> >>>> Sounds pedantic but I want to make sure this gets chalked up against the >>>> right device :). >>>> >>>>> If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio >>>>> storage backend that supports iSCSI >>>>> qemu-kvm crashes reliably with the following error: >>>> >>>> Are you using vanilla qemu-kvm-1.2.0 or are there patches applied? >>>> >>>> Have you tried qemu-kvm.git/master? >>>> >>>> Have you tried a local raw disk image to check whether libiscsi is >>>> involved? >>>> >>>>> Bad ram pointer 0x3039303620008000 >>>>> >>>>> This happens directly after the confirmation of the Timezone before >>>>> the Disk is partitioned. >>>>> >>>>> If I specify -global virtio-blk-pci.scsi=off in the cmdline this >>>>> does not happen. >>>>> >>>>> Here is a stack trace: >>>>> >>>>> Thread 1 (Thread 0x77fee700 (LWP 8226)): >>>>> #0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6 >>>>> No symbol table info available. >>>>> #1 <https://github.com/sahlberg/libiscsi/issues/1> >>>>> 0x557b751d in qemu_ram_addr_from_host_nofail ( >>>>> ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835 >>>>> ram_addr = 0 >>>>> #2 <https://github.com/sahlberg/libiscsi/issues/2> >>>>> 0x557b9177 in cpu_physical_memory_unmap ( >>>>> buffer=0x3039303620008000, len=4986663671065686081, is_write=1, >>>>> access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645 >>>> >>>> buffer and len are ASCII junk. It appears to be hex digits and it's not >>>> clear where they come from. >>>> >>>> It would be interesting to print *elem one stack frame up in #3 >>>> virtqueue_fill() to show the iovecs and in/out counts. >>> >>> >>> (gdb) print *elem >> >> Great, thanks for providing this info: >> >>> $6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, >>> 1916656688, >>> 2024130072, 2024130088, 0 , 4129, 93825009696000, >>> 140737328183160, 0 }, out_addr = {2024130056, >>> 2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 >> times>, >>> 12385, 93825009696000, 140737328183160, 0 }, >> >> Up to here everything is fine. >> >>> in_sg = >>> {{ >>>iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, { >>>iov_base = 0x383038454635, iov_len = 3544389261899019573}, { >> >> The fields are bogus, in_sg has been overwritten with ASCII data. >> Unfortunately I don't see any hint of where this ASCII data came from >> yet. >> >> The hdr fields you provided in stack frame #6 show that in_sg was >> overwritten during or after the bdrv_ioctl() call. We pulled valid >> data out of the vring and mapped buffers correctly. But something is >> overwriting in_sg and when we complete the request we blow up due to >> the bogus values. > > Ok. What I have to mention. I've been testing with qemu-kvm 1.2.0 > an
Re: [Qemu-devel] QEMU question: is eventfd not thread safe?
Hi, As Paolo said, I hit this with block/iscsi.c a few months back. Then about a month back I recall something that looks alkmost identical to this hit the IDE driver on the KVM list. ( At least the symptoms looked just like my symptoms, and the KVM guys managed to bisect it down to the exact same patch that I did that would expose the bug in block.iscsi.c, namely the lack of calling qemu_notify_event() ) If we have hit a problem 3 times recently due to developers not realizing the importance of calling qemu_notify_event(), maybe the API is suboptimal. Would it be possible to change the set-event-handlers functions to automatically call qemu_notify_event() when the descriptos change? To eliminate the need to call this function at all ? regards ronnie sahlberg On Mon, Jul 2, 2012 at 10:06 AM, Alexey Kardashevskiy wrote: > On 02/07/12 09:07, Benjamin Herrenschmidt wrote: >>>>> diff --git a/iohandler.c b/iohandler.c >>>>> index 3c74de6..54f4c48 100644 >>>>> --- a/iohandler.c >>>>> +++ b/iohandler.c >>>>> @@ -77,6 +77,7 @@ int qemu_set_fd_handler2(int fd, >>>>> ioh->fd_write = fd_write; >>>>> ioh->opaque = opaque; >>>>> ioh->deleted = 0; >>>>> + kill(getpid(), SIGUSR2); >>>>> } >>>>> return 0; >>>>> } >> >> That probably wants to be a pthread_kill targetted at the main loop. >> >>>>> +static void sigusr2_print(int signal) >>>>> +{ >>>>> +} >>>>> + >>>>> +static void sigusr2_init(void) >>>>> +{ >>>>> + struct sigaction action; >>>>> + >>>>> + memset(&action, 0, sizeof(action)); >>>>> + sigfillset(&action.sa_mask); >>>>> + action.sa_handler = sigusr2_print; >>>>> + action.sa_flags = 0; >>>>> + sigaction(SIGUSR2, &action, NULL); >>>>> +} >>>>> + >> >> Won't that conflict with the business in coroutine-sigaltstack.c ? > > The code which touches SIGUSR2 does not compile on power. > >> Hrm... looking at it, it looks like it will save/restore the handler, >> so that should be good. >> >> Still, one might want to wrap that into something, like >> qemu_wake_main_loop(); > > > I already posted another patch with qemu_notify_event() in this mail thread > later :) > > >> >> Cheers, >> Ben. >> >>>>> int main_loop_init(void) >>>>> { >>>>> int ret; >>>>> >>>>> + sigusr2_init(); >>>>> + >>>>> qemu_mutex_lock_iothread(); >>>>> ret = qemu_signal_init(); >>>>> if (ret) { >>>>> -- >>>>> 1.7.10 >>> >>> >> >> > > > -- > Alexey >
[Qemu-devel] [PATCH] SCSI rd/wr/vr protect != 0 is an error
List, Paolo, Please find a small patch for the scsi emulation in qemu. Since QEMU scsi emulation does not implement/support protection information any READ/WRITE/VERIFY command that specifies a non-zero value for the rd/vr/wr-protect field in the CDB should fail with ILLEGAL_REQUEST/INVALID_FIELD_IN_CDB regards ronnie sahlberg
[Qemu-devel] [PATCH] SCSI: rd/wr/vr-protect !=0 is an error
The QEMU SCSI emulation does not support protection information, so any READ/WRITE/VERIFY commands that has the protect bits set to non-zero should fail with ILLEGAL_REQUEST/INVALID_FIELD_IN_CDB >From SCSI SBC : If the logical unit does not support protection information, then the device server should terminate the command with CHECK CONDITION status with the sense key set to ILLEGAL REQUEST and the additional sense code set to INVALID FIELD IN CDB. Signed-off-by: Ronnie Sahlberg --- hw/scsi-disk.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 34336b1..b2f3c0c 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1555,6 +1555,9 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) case READ_16: len = r->req.cmd.xfer / s->qdev.blocksize; DPRINTF("Read (sector %" PRId64 ", count %d)\n", r->req.cmd.lba, len); +if (r->req.cmd.buf[1] & 0xe0) { +goto fail; +} if (r->req.cmd.lba > s->qdev.max_lba) { goto illegal_lba; } @@ -1575,6 +1578,9 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) DPRINTF("Write %s(sector %" PRId64 ", count %d)\n", (command & 0xe) == 0xe ? "And Verify " : "", r->req.cmd.lba, len); +if (r->req.cmd.buf[1] & 0xe0) { +goto fail; +} if (r->req.cmd.lba > s->qdev.max_lba) { goto illegal_lba; } -- 1.7.3.1
[Qemu-devel] [PATCH] SCSI improved LBA-out-of-range checks
Paolo, List Please find a small patch to the scsi emulation. This patch improves the checkign that the requested lbas are all available. We check both that lba+len is not going past the end of the device but also iflba+len < lba This second condition could occur for deviously crafted scsi packets where lba is set to 0x and len is set to 2 in which case lba+len would wrap to 1
[Qemu-devel] [PATCH] SCSI: improve the lba-out-of-range tests for read/write/verify
Improve the tests for the LBA to cover more cases, the new test looks like this if (r->req.cmd.lba > r->req.cmd.lba + len || r->req.cmd.lba + len > s->qdev.max_lba) { For the 16 byte opcodes, the lba is a uint64, so the first check is to make sure that we do not wrap. For example if an opcode would specify the LBA:0x and LEN:2 then lba+len would wrap to 1. The second part of the test is to verify that ALL requested blocks are available, not just the first one. Signed-off-by: Ronnie Sahlberg --- hw/scsi-disk.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index b2f3c0c..40c05de 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1558,7 +1558,8 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) if (r->req.cmd.buf[1] & 0xe0) { goto fail; } -if (r->req.cmd.lba > s->qdev.max_lba) { +if (r->req.cmd.lba > r->req.cmd.lba + len +|| r->req.cmd.lba + len > s->qdev.max_lba) { goto illegal_lba; } r->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); @@ -1581,7 +1582,8 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) if (r->req.cmd.buf[1] & 0xe0) { goto fail; } -if (r->req.cmd.lba > s->qdev.max_lba) { +if (r->req.cmd.lba > r->req.cmd.lba + len +|| r->req.cmd.lba + len > s->qdev.max_lba) { goto illegal_lba; } r->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); -- 1.7.3.1
[Qemu-devel] [PATCH] SCSI improved LBA-out-of-range checks BUGFIX
Paolo, Sorry but the previous patch was bad. Use this patch instead it uses the correct check of if (r->req.cmd.lba > r->req.cmd.lba + len || r->req.cmd.lba + len > s->qdev.max_lba + 1) {
[Qemu-devel] [PATCH] SCSI Improved checking for LBA out of range
Improve the tests for the LBA to cover more cases, the new test looks like this if (r->req.cmd.lba > r->req.cmd.lba + len || r->req.cmd.lba + len > s->qdev.max_lba + 1) { For the 16 byte opcodes, the lba is a uint64, so the first check is to make sure that we do not wrap. For example if an opcode would specify the LBA:0x and LEN:2 then lba+len would wrap to 1. The second part of the test is to verify that ALL requested blocks are available, not just the first one. Signed-off-by: Ronnie Sahlberg --- hw/scsi-disk.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index b2f3c0c..2c2be33 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1558,7 +1558,8 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) if (r->req.cmd.buf[1] & 0xe0) { goto fail; } -if (r->req.cmd.lba > s->qdev.max_lba) { +if (r->req.cmd.lba > r->req.cmd.lba + len +|| r->req.cmd.lba + len > s->qdev.max_lba + 1) { goto illegal_lba; } r->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); @@ -1581,7 +1582,8 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) if (r->req.cmd.buf[1] & 0xe0) { goto fail; } -if (r->req.cmd.lba > s->qdev.max_lba) { +if (r->req.cmd.lba > r->req.cmd.lba + len +|| r->req.cmd.lba + len > s->qdev.max_lba + 1) { goto illegal_lba; } r->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); -- 1.7.3.1
Re: [Qemu-devel] [PATCH] SCSI: improve the lba-out-of-range tests for read/write/verify
Sorry, there is a bug in it. Can you change it to : || r->req.cmd.lba + len > s->qdev.max_lba + 1) { or else the last lba of the device will be flagged as out of range. On Thu, Jul 12, 2012 at 5:04 PM, Paolo Bonzini wrote: > Il 12/07/2012 08:52, Ronnie Sahlberg ha scritto: >> Improve the tests for the LBA to cover more cases, the new test looks like >> this >> if (r->req.cmd.lba > r->req.cmd.lba + len >> || r->req.cmd.lba + len > s->qdev.max_lba) { >> >> For the 16 byte opcodes, the lba is a uint64, so the first check is to make >> sure that we do not wrap. >> For example if an opcode would specify the LBA:0x and LEN:2 >> then lba+len would wrap to 1. >> >> The second part of the test is to verify that ALL requested blocks are >> available, not just the first one. > > Fixed code style for || and applied to scsi-next. > > Thanks! > > Paolo > >> Signed-off-by: Ronnie Sahlberg >> --- >> hw/scsi-disk.c |6 -- >> 1 files changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c >> index b2f3c0c..40c05de 100644 >> --- a/hw/scsi-disk.c >> +++ b/hw/scsi-disk.c >> @@ -1558,7 +1558,8 @@ static int32_t scsi_send_command(SCSIRequest *req, >> uint8_t *buf) >> if (r->req.cmd.buf[1] & 0xe0) { >> goto fail; >> } >> -if (r->req.cmd.lba > s->qdev.max_lba) { >> +if (r->req.cmd.lba > r->req.cmd.lba + len >> +|| r->req.cmd.lba + len > s->qdev.max_lba) { >> goto illegal_lba; >> } >> r->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); >> @@ -1581,7 +1582,8 @@ static int32_t scsi_send_command(SCSIRequest *req, >> uint8_t *buf) >> if (r->req.cmd.buf[1] & 0xe0) { >> goto fail; >> } >> -if (r->req.cmd.lba > s->qdev.max_lba) { >> +if (r->req.cmd.lba > r->req.cmd.lba + len >> +|| r->req.cmd.lba + len > s->qdev.max_lba) { >> goto illegal_lba; >> } >> r->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); >> > >
Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
Do you really need to resolve the conflicts? It might be easier and sufficient to just flag those hashes where a conflict has been detected as : "dont dedup this hash anymore, collissions have been seen." On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet wrote: > Le Wednesday 02 Jan 2013 à 12:26:37 (-0600), Troy Benjegerdes a écrit : >> The probability may be 'low' but it is not zero. Just because it's >> hard to calculate the hash doesn't mean you can't do it. If your >> input data is not random the probability of a hash collision is >> going to get scewed. >> >> Read about how Bitcoin uses hashes. >> >> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards, >> and I can make a regression test that will create deduplication hash >> collisions on purpose. > > It's not a problem as Eric pointed out while reviewing the previous patchset > there is a small place left with zeroes on the deduplication block. > A bit could be set on it when a collision is detected and an offset could > point > to a cluster used to resolve collisions. > >> >> >> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Beno?t Canet wrote: >> > > How does this code handle hash collisions, and do you have some >> > > regression >> > > tests that purposefully create a dedup hash collision, and verify that >> > > the >> > > 'right thing' happens? >> > >> > The two hash function that can be used are cryptographics and not broken >> > yet. >> > So nobody knows how to generate a collision. >> > >> > You can do the math to calculate the probability of collision using a 256 >> > bit >> > hash while processing 1EiB of data the result is so low you can consider it >> > won't happen. >> > The sha256 ZFS deduplication works the same way regarding collisions. >> > >> > I currently use qemu-io-test for testing purpose and iozone with the -w >> > flag in >> > the guest. >> > I would like to find a good deduplication stress test to run in a guest. >> > >> > Regards >> > >> > Beno?t >> > >> > > It's great that this almost works, but it seems rather dangerous to put >> > > something like this into the mainline code without some regression tests. >> > > >> > > (I'm also suspecting the regression test will be a great way to find >> > > flakey hardware) >> > > >> > > -- >> > > Troy Benjegerdes'da hozer' >> > > ho...@hozed.org >> > > >> > > Somone asked my why I work on this free (http://www.fsf.org/philosophy/) >> > > software & hardware (http://q3u.be) stuff and not get a real job. >> > > Charles Shultz had the best answer: >> > > >> > > "Why do musicians compose symphonies and poets write poems? They do it >> > > because life wouldn't have any meaning for them if they didn't. That's >> > > why >> > > I draw cartoons. It's my life." -- Charles Shultz >> >> -- >> -- >> Troy Benjegerdes'da hozer' ho...@hozed.org >> >> Somone asked my why I work on this free (http://www.fsf.org/philosophy/) >> software & hardware (http://q3u.be) stuff and not get a real job. >> Charles Shultz had the best answer: >> >> "Why do musicians compose symphonies and poets write poems? They do it >> because life wouldn't have any meaning for them if they didn't. That's why >> I draw cartoons. It's my life." -- Charles Shultz >> >
Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
On Wed, Jan 2, 2013 at 11:18 AM, Troy Benjegerdes wrote: > If you do get a hash collision, it's a rather exceptional event, so I'd > say every effort should be made to log the event and the data that created > it in multiple places. > > There are three questions I'd ask on a hash collision: > > 1) was it the data? > 2) was it the hardware? > 3) was it a software bug? Yes, that is probably good too, and saving off the old and new block content that collided. Unless you are checksumming the blocks, I suspect that the most common reason for "collisions" would just be cases where the original block was corrupted/changed on disk and you dont detect it and then when you re-write an identical one the blocks no longer match and thus you get a false collision.
[Qemu-devel] [PATCH] iSCSI: remove the now obsolete call to qemu_notify_event()
Paolo, List Please find a trivial patch to iscsi that removes the call to qemu_notify_event() We do not need to call this any more since a recent patch that made qemu_aio_set_fd_handler() and friends invoke this call automatically when required. regards ronnie sahlberg
[Qemu-devel] [PATCH] iSCSI: We dont need to explicitely call qemu_notify_event() any more
We no longer need to explicitely call qemu_notify_event() any more since this is now done automatically any time the filehandles we listen to change. Signed-off-by: Ronnie Sahlberg --- block/iscsi.c |6 -- 1 files changed, 0 insertions(+), 6 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 0b96165..355ce65 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -167,12 +167,6 @@ iscsi_set_events(IscsiLun *iscsilun) } -/* If we just added an event, the callback might be delayed - * unless we call qemu_notify_event(). - */ -if (ev & ~iscsilun->events) { -qemu_notify_event(); -} iscsilun->events = ev; } -- 1.7.3.1
[Qemu-devel] [PATCH] iSCSI: Add support for SG_IO to iscsi_ioctl()
Paolo, List Please find a small patch that adds support for SG_IO to the synchronous iscsi_ioctl() function. This is needed to make scsi-block work with iscsi. regards ronnie sahlberg
[Qemu-devel] [PATCH] iSCSI: We need to support SG_IO also from iscsi_ioctl()
We need to support SG_IO from the synchronous iscsi_ioctl() since scsi-block uses this to do an INQ to the device to discover its properties This patch makes scsi-block work with iscsi. Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 355ce65..189ab6f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -537,7 +537,8 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status, #define SG_ERR_DRIVER_SENSE0x08 -if (status == SCSI_STATUS_CHECK_CONDITION && acb->task->datain.size >= 2) { +if (status == SCSI_STATUS_CHECK_CONDITION +&& acb->task->datain.size >= 2) { int ss; acb->ioh->driver_status |= SG_ERR_DRIVER_SENSE; @@ -622,9 +623,17 @@ static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs, return &acb->common; } + +static void ioctl_cb(void *opaque, int status) +{ +int *p_status = opaque; +*p_status = status; +} + static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf) { IscsiLun *iscsilun = bs->opaque; +int status; switch (req) { case SG_GET_VERSION_NUM: @@ -633,6 +642,15 @@ static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf) case SG_GET_SCSI_ID: ((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type; break; +case SG_IO: +status = -EINPROGRESS; +iscsi_aio_ioctl(bs, req, buf, ioctl_cb, &status); + +while (status == -EINPROGRESS) { +qemu_aio_wait(); +} + +return 0; default: return -1; } -- 1.7.3.1
Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
On Fri, Sep 7, 2012 at 1:47 AM, Daniel P. Berrange wrote: > On Thu, Sep 06, 2012 at 09:10:04PM +0530, Bharata B Rao wrote: > > On Thu, Sep 06, 2012 at 11:29:36AM +0300, Avi Kivity wrote: > > > On 08/14/2012 12:58 PM, Kevin Wolf wrote: > > > > > > > >> While we are at this, let me bring out another issue. Gluster > supports 3 > > > >> transport types: > > > >> > > > >> - socket in which case the server will be hostname, ipv4 or ipv4 > address. > > > >> - rdma in which case server will be interpreted similar to socket. > > > >> - unix in which case server will be a path to unix domain socket > and this > > > >> will look like any other filesystem path. (Eg. > /tmp/glusterd.socket) > > > >> > > > >> I don't think we can fit 'unix' within the standard URI scheme (RFC > 3986) > > > >> easily, but I am planning to specify the 'unix' transport as below: > > > >> > > > >> gluster://[/path/to/unix/domain/socket]/volname/image?transport=unix > > > >> > > > >> i,e., I am asking the user to put the unix domain socket path within > > > >> square brackets when transport type is unix. > > > >> > > > >> Do you think this is fine ? > > > > > > > > Never saw something like this before, but it does seem reasonable to > me. > > > > Excludes ] from the valid characters in the file name of the socket, > but > > > > that shouldn't be a problem in practice. > > > > > > Bikeshedding, but I prefer > > > > > > gluster:///path/to/unix/domain/socket:/volname/image?transport=unix > > > > So if the unix domain socket is /tmp/glusterd.socket, then this would > look like: > > > > gluster:///tmp/glusterd.socket:/volname/image?transport=unix. > > > > So you are saying : will the separator b/n the unix domain socket path > > and rest of the URI components. > > > > Unless you or others strongly feel about this, I would like to go with > > [ ] based spec, which I feel is less prone to errors like missing a colon > > by mistake :) > > > > > > > > as being more similar to file://, or even > > > > > > gluster:///path/to/unix/domain/socket/volname/image?transport=unix > > > > > > with the last two components implied to be part of the payload, not the > > > path. > > > > Note that image is a file path by itself like /dir1/a.img. So I guess it > > becomes difficult to figure out where the unix domain socket path ends > > and rest of the URI components begin w/o a separator in between. > > IMHO this is all gross. URIs already have a well defined way to provide > multiple parameters, dealing with escaping of special characters. ie query > parameters. The whole benefit of using URI syntax is to let apps process > the URIs using standard APIs. We should not be trying to define some extra > magic encoding to let us stuff 2 separate parameters into the path > component > since that means apps have to now write custom parsing code again. Either > the UNIX socket path, or the volume path should be in the URI path, not > both. The other part should be a URI parameter. I'd really expect us to > use: > > gluster:///volname/image?transport=unix&sockpath=/path/to/unix/sock > > +1 ronnie sahlberg
Re: [Qemu-devel] [PATCH] iSCSI: We dont need to explicitely call qemu_notify_event() any more
Ping? On Thu, Aug 30, 2012 at 4:56 PM, Ronnie Sahlberg wrote: > We no longer need to explicitely call qemu_notify_event() any more since this > is now done automatically any time the filehandles we listen to change. > > Signed-off-by: Ronnie Sahlberg > --- > block/iscsi.c |6 -- > 1 files changed, 0 insertions(+), 6 deletions(-) > > diff --git a/block/iscsi.c b/block/iscsi.c > index 0b96165..355ce65 100644 > --- a/block/iscsi.c > +++ b/block/iscsi.c > @@ -167,12 +167,6 @@ iscsi_set_events(IscsiLun *iscsilun) > > } > > -/* If we just added an event, the callback might be delayed > - * unless we call qemu_notify_event(). > - */ > -if (ev & ~iscsilun->events) { > -qemu_notify_event(); > -} > iscsilun->events = ev; > } > > -- > 1.7.3.1 >
Re: [Qemu-devel] [PATCH] iSCSI: We need to support SG_IO also from iscsi_ioctl()
ping? On Thu, Aug 30, 2012 at 5:28 PM, Ronnie Sahlberg wrote: > We need to support SG_IO from the synchronous iscsi_ioctl() since > scsi-block uses this to do an INQ to the device to discover its properties > This patch makes scsi-block work with iscsi. > > Signed-off-by: Ronnie Sahlberg > --- > block/iscsi.c | 20 +++- > 1 files changed, 19 insertions(+), 1 deletions(-) > > diff --git a/block/iscsi.c b/block/iscsi.c > index 355ce65..189ab6f 100644 > --- a/block/iscsi.c > +++ b/block/iscsi.c > @@ -537,7 +537,8 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int > status, > > #define SG_ERR_DRIVER_SENSE0x08 > > -if (status == SCSI_STATUS_CHECK_CONDITION && acb->task->datain.size >= > 2) { > +if (status == SCSI_STATUS_CHECK_CONDITION > +&& acb->task->datain.size >= 2) { > int ss; > > acb->ioh->driver_status |= SG_ERR_DRIVER_SENSE; > @@ -622,9 +623,17 @@ static BlockDriverAIOCB > *iscsi_aio_ioctl(BlockDriverState *bs, > return &acb->common; > } > > + > +static void ioctl_cb(void *opaque, int status) > +{ > +int *p_status = opaque; > +*p_status = status; > +} > + > static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void > *buf) > { > IscsiLun *iscsilun = bs->opaque; > +int status; > > switch (req) { > case SG_GET_VERSION_NUM: > @@ -633,6 +642,15 @@ static int iscsi_ioctl(BlockDriverState *bs, unsigned > long int req, void *buf) > case SG_GET_SCSI_ID: > ((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type; > break; > +case SG_IO: > +status = -EINPROGRESS; > +iscsi_aio_ioctl(bs, req, buf, ioctl_cb, &status); > + > +while (status == -EINPROGRESS) { > +qemu_aio_wait(); > +} > + > +return 0; > default: > return -1; > } > -- > 1.7.3.1 >
[Qemu-devel] [PATCH] SCSI disk, set HISUP bit in INQ data
Paolo, List Please find a trivial patch that make SCSI-DISK set the HISUP bit in the INQ data. Since I think all LUN numbers are reported as SAM4 describes, we should set this bit. Its already sed in SCSI-BUS regards ronnie sahlberg
[Qemu-devel] [PATCH] SCSI: Standard INQUIRY data should report HiSup flag as set.
QEMU as far as I know only reports LUN numbers using the modes that are described in SAM4. As such, since all LUN numbers generated by the SCSI emulation in QEMU follow SAM4, we should set the HiSup bit in the standard INQUIRY data to indicate such. >From SAM4: 4.6.3 LUNs overview All LUN formats described in this standard are hierarchical in structure even when only a single level in that hierarchy is used. The HISUP bit shall be set to one in the standard INQUIRY data (see SPC-4) when any LUN format described in this standard is used. Non-hierarchical formats are outside the scope of this standard. Signed-off-by: Ronnie Sahlberg --- hw/scsi-disk.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 1585683..52bc062 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -678,7 +678,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) * is actually implemented, but we're good enough. */ outbuf[2] = 5; -outbuf[3] = 2; /* Format 2 */ +outbuf[3] = 2 | 0x10; /* Format 2, HiSup */ if (buflen > 36) { outbuf[4] = buflen - 5; /* Additional Length = (Len - 1) - 4 */ -- 1.7.3.1
Re: [Qemu-devel] [RFC] find_next_bit optimizations
Even more efficient might be to do bitwise instead of logical or >if (tmp | d1 | d2 | d3) { that should remove 3 of the 4 conditional jumps and should become 3 bitwise ors and one conditional jump On Mon, Mar 11, 2013 at 8:58 AM, Paolo Bonzini wrote: > Il 11/03/2013 16:37, Peter Lieven ha scritto: >> >> Am 11.03.2013 um 16:29 schrieb Paolo Bonzini : >> >>> Il 11/03/2013 16:24, Peter Lieven ha scritto: > How would that be different in your patch? But you can solve it by > making two >= loops, one checking for 4*BITS_PER_LONG and one checking > BITS_PER_LONG. This is what I have now: diff --git a/util/bitops.c b/util/bitops.c index e72237a..b0dc93f 100644 --- a/util/bitops.c +++ b/util/bitops.c @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size, const unsigned long *p = addr + BITOP_WORD(offset); unsigned long result = offset & ~(BITS_PER_LONG-1); unsigned long tmp; +unsigned long d0,d1,d2,d3; if (offset >= size) { return size; } size -= result; -offset %= BITS_PER_LONG; +offset &= (BITS_PER_LONG-1); if (offset) { tmp = *(p++); tmp &= (~0UL << offset); @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size, size -= BITS_PER_LONG; result += BITS_PER_LONG; } -while (size & ~(BITS_PER_LONG-1)) { +while (size >= 4*BITS_PER_LONG) { +d0 = *p; +d1 = *(p+1); +d2 = *(p+2); +d3 = *(p+3); +if (d0 || d1 || d2 || d3) { +break; +} +p+=4; +result += 4*BITS_PER_LONG; +size -= 4*BITS_PER_LONG; +} +while (size >= BITS_PER_LONG) { if ((tmp = *(p++))) { goto found_middle; } >>> >>> Minus the %= vs. &=, >>> >>> Reviewed-by: Paolo Bonzini >>> >>> Perhaps: >>> >>>tmp = *p; >>>d1 = *(p+1); >>>d2 = *(p+2); >>>d3 = *(p+3); >>>if (tmp) { >>>goto found_middle; >>>} >>>if (d1 || d2 || d3) { >>>break; >>>} >> >> i do not know what gcc interally makes of the d0 || d1 || d2 || d3 ? > > It depends on the target and how expensive branches are. > >> i would guess its sth like one addition w/ carry and 1 test? > > It could be either 4 compare-and-jump sequences, or 3 bitwise ORs > followed by a compare-and-jump. > > That is, either: > > test %r8, %r8 > jz second_loop > test %r9, %r9 > jz second_loop > test %r10, %r10 > jz second_loop > test %r11, %r11 > jz second_loop > > or > > or %r9, %r8 > or %r11, %r10 > or %r8, %r10 > jz second_loop > > Don't let the length of the code fool you. The processor knows how to > optimize all of these, and GCC knows too. > >> your proposed change would introduce 2 tests (maybe)? > > Yes, but I expect they to be fairly well predicted. > >> what about this to be sure? >> >>tmp = *p; >>d1 = *(p+1); >>d2 = *(p+2); >>d3 = *(p+3); >>if (tmp || d1 || d2 || d3) { >>if (tmp) { >>goto found_middle; > > I suspect that GCC would rewrite it my version (definitely if it > produces 4 compare-and-jumps; but possibly it does it even if it goes > for bitwise ORs, I haven't checked. > > Regarding your other question ("one last thought. would it make sense to > update only `size`in the while loops and compute the `result` at the end > as `orgsize` - `size`?"), again the compiler knows better and might even > do this for you. It will likely drop the p increases and use p[result], > so if you do that change you may even get the same code, only this time > p is increased and you get an extra subtraction at the end. :) > > Bottom line: don't try to outsmart an optimizing C compiler on > micro-optimization, unless you have benchmarked it and it shows there is > a problem. > > Paolo > >>} >>break; >>} >> >> Peter >> > >
Re: [Qemu-devel] [PULL 00/14] SCSI changes for 2014-06-18
On Wed, Jun 18, 2014 at 8:57 AM, Paolo Bonzini wrote: > Il 18/06/2014 17:53, Peter Maydell ha scritto: > >> Then you probably need to fix the configure test ;-) This is >> Ubuntu Trusty so really pretty recent (it's my main x86 build >> machine, not some oddball platform); libiscsi-dev 1.4.0-3. > > > This is a very old release. I'll apply Peter's patch to disable libiscsi > for pre-1.9.0 versions and respin the pull request. > > I didn't want to do that because 1.4.0->1.9.0 breaks ABI, but Peter thinks > that versions before 1.8.0 are not reliable anyway. +1 Versions prior are good for causal use but Peter is the authority on production use here. I think qemu should have a configure check and just force disable iscsi support if libiscsi < 1.9.0 > > If Ubuntu wants to distribute both 1.4.0 and 1.9.0, they can move 1.9.0 > header files to a separate directory using pkgconfig (1.4.0 does not support > pkgconfig). > > Paolo >
Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi wrote: > On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote: >> On 07.05.2014 12:29, Paolo Bonzini wrote: >> >Il 07/05/2014 12:07, Stefan Hajnoczi ha scritto: >> >>On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote: >> +static void iscsi_attach_aio_context(BlockDriverState *bs, >> + AioContext *new_context) >> +{ >> +IscsiLun *iscsilun = bs->opaque; >> + >> +iscsilun->aio_context = new_context; >> +iscsi_set_events(iscsilun); >> + >> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER) >> +/* Set up a timer for sending out iSCSI NOPs */ >> +iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context, >> + QEMU_CLOCK_REALTIME, SCALE_MS, >> + iscsi_nop_timed_event, iscsilun); >> +timer_mod(iscsilun->nop_timer, >> + qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL); >> +#endif >> +} >> >>> >> >>>Is it still guaranteed that iscsi_nop_timed_event for a target is not >> >>>invoked >> >>>while we are in another function/callback of the iscsi driver for the >> >>>same target? >> > >> >Yes, since the timer is in the same AioContext as the iscsi driver >> >callbacks. >> >> >> Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are >> in iscsi_service. >> As Paolo outlined, this cannot happen, right? > > Okay, I think we're safe then. The timer can only be invoked during > aio_poll() event loop iterations. It cannot be invoked while we're > inside iscsi_service(). > >> >>BTW, is iscsi_reconnect() the right libiscsi interface to use since it >> >>is synchronous? It seems like this would block QEMU until the socket >> >>has connected! The guest would be frozen. >> > >> >There is no asynchronous interface yet for reconnection, unfortunately. >> >> We initiate the reconnect after we miss a few NOP replies. So the target is >> already down for approx. 30 seconds. >> Every process inside the guest is already haging or has timed out. >> >> If I understand correctly with the new patches only the communication with >> this target is hanging or isn't it? >> So what benefit would an asyncronous reconnect have? > > Asynchronous reconnect is desirable: > > 1. The QEMU monitor is blocked while we're waiting for the iSCSI target >to accept our reconnect. This means the management stack (libvirt) >cannot control QEMU until we time out or succeed. > > 2. The guest is totally frozen - cannot execute instructions - because >it will soon reach a point in the code that locks the QEMU global >mutex (which is being held while we reconnect to the iSCSI target). > >This may be okayish for guests where the iSCSI LUN contains the >"main" data that is being processed. But what if an iSCSI LUN was >just attached to a guest that is also doing other things that are >independent (e.g. serving a website, processing data from a local >disk, etc) - now the reconnect causes downtime for the entire guest. I will look into making the reconnect async over the next few days.
Re: [Qemu-devel] [PATCH 08/20] nfs: Handle failure for potentially large allocations
For this case and for the iscsi case, isn't it likely that once this happens the guest is pretty much doomed since I/O to the disk will no longer work ? Should we also change the block layer so that IF *_readv/_writev fails with -ENOMEM then it should try again but break the request up into a chain of smaller chunks ? On Wed, May 21, 2014 at 9:28 AM, Kevin Wolf wrote: > Some code in the block layer makes potentially huge allocations. Failure > is not completely unexpected there, so avoid aborting qemu and handle > out-of-memory situations gracefully. > > This patch addresses the allocations in the nfs block driver. > > Signed-off-by: Kevin Wolf > --- > block/nfs.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/block/nfs.c b/block/nfs.c > index 539bd95..e3d6216 100644 > --- a/block/nfs.c > +++ b/block/nfs.c > @@ -165,7 +165,11 @@ static int coroutine_fn nfs_co_writev(BlockDriverState > *bs, > > nfs_co_init_task(client, &task); > > -buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE); > +buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE); > +if (buf == NULL) { > +return -ENOMEM; > +} > + > qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE); > > if (nfs_pwrite_async(client->context, client->fh, > -- > 1.8.3.1 > >
Re: [Qemu-devel] [PATCH] block/iscsi: use 16 byte CDBs only when necessary
That is one big request. I assume the device reports "no limit" in the VPD page so we can not state it is the guest/application going beyond the allowed limit? I am not entirely sure what meaning the target assigns to Protocol Error means here. It could be that ~100M is way higher than MaxBurstLength ? What is the MaxBurstLength that was reported by the server during login negotiation? If so, we should make libiscsi check the maxburstlength and fail the request early. We would still fail the I/O so it will not really solve anything much but at least we should not send the request to the server. Best would probably be to take the smallest of a non-zero Block-Limits.max_transfer_length and iscsi-MaxBurstLength/block-size and pass this back to the guest in the emulated Block-Limits-VPD. At least then you have tried to tell the guest "never do SCSI I/O bigger than this". I.e. even if the target reports BlockLimits.MaxTransferLength == 0 == no limit to QEMU, QEMU should probably take the iscsi transport limit into account and pass this to the guest by setting the emulated BlockLimits page it passes to scale to the maximum that MaxBurstLength allows. Then if BTRFS or SG_IO in the guest ignores the BlockLimits it is clearly a guest problem. (A different interpretation for ProtocolError could be the mismatch between the iscsi expected data transfer length and the scsi transfer length, but that should result in residuals, not protocol error.) Hypothetically there could be targets that support really huge MaxBurstLengths > 32MB. For those you probably want to switch to WRITE16 when the SCSI transfer length goes > 0x. - if (iscsilun->use_16_for_rw) { + if (iscsilun->use_16_for_rw || num_sectors > 0x) { regards ronnie sahlberg On Mon, Sep 1, 2014 at 8:21 AM, Peter Lieven wrote: > On 17.06.2014 13:46, Paolo Bonzini wrote: > > Il 17/06/2014 13:37, Peter Lieven ha scritto: > > On 17.06.2014 13:15, Paolo Bonzini wrote: > > Il 17/06/2014 08:14, Peter Lieven ha scritto: > > > > BTW, while debugging a case with a bigger storage supplier I found > that open-iscsi seems to do exactly this undeterministic behaviour. > I have a 3TB LUN. If I access < 2TB sectors it uses READ10/WRITE10 and > if I go beyond 2TB it changes to READ16/WRITE16. > > > Isn't that exactly what your latest patch does for >64K sector writes? :) > > > Not exactly, we choose the default by checking the LUN size. 10 Byte for > < 2TB and 16 Byte otherwise. > > > Yeah, I meant introducing the non-determinism. > > My latest patch makes an exception if a request is bigger than 64K > sectors and > switches to 16 Byte requests. These would otherwise end in an I/O error. > > > It could also be split at the block layer, like we do for unmap. I think > there's also a maximum transfer size somewhere in the VPD, we could to > READ16/WRITE16 if it is >64K sectors. > > > It seems that there might be a real world example where Linux issues >32MB > write requests. Maybe someone familiar with btrfs can advise. > I see iSCSI Protocol Errors in my logs: > > Sep 1 10:10:14 libiscsi:0 PDU header: 01 a1 00 00 00 01 00 00 00 00 00 00 > 00 00 00 00 00 00 00 07 06 8f 30 00 00 00 00 06 00 00 00 0a 2a 00 01 09 9e > 50 00 47 98 00 00 00 00 00 00 00 [XXX] > Sep 1 10:10:14 qemu-2.0.0: iSCSI: Failed to write10 data to iSCSI lun. > Request was rejected with reason: 0x04 (Protocol Error) > > Looking at the headers the xferlen in the iSCSI PDU is 110047232 Byte which > is 214936 sectors. > 214936 % 65536 = 18328 which is exactly the number of blocks in the SCSI > WRITE10 CDB. > > Can someone advise if this is something that btrfs can cause > or if I have to > blame the customer that he issues very big write requests with Direct I/O? > > The user sseems something like this in the log: > [34640.489284] BTRFS: bdev /dev/vda2 errs: wr 8232, rd 0, flush 0, corrupt > 0, gen 0 > [34640.490379] end_request: I/O error, dev vda, sector 17446880 > [34640.491251] end_request: I/O error, dev vda, sector 5150144 > [34640.491290] end_request: I/O error, dev vda, sector 17472080 > [34640.492201] end_request: I/O error, dev vda, sector 17523488 > [34640.492201] end_request: I/O error, dev vda, sector 17536592 > [34640.492201] end_request: I/O error, dev vda, sector 17599088 > [34640.492201] end_request: I/O error, dev vda, sector 17601104 > [34640.685611] end_request: I/O error, dev vda, sector 15495456 > [34640.685650] end_request: I/O error, dev vda, sector 7138216 > > Thanks, > Peter >
Re: [Qemu-devel] [PATCH] block/iscsi: use 16 byte CDBs only when necessary
On Wed, Sep 3, 2014 at 1:09 AM, Peter Lieven wrote: > > >> Am 02.09.2014 um 21:30 schrieb Peter Lieven : >> >> Looking at the code, is it possible that not the guest is causing trouble >> here, but >> multiwrite_merge code? >> >> From what I see the only limit it has when merging requests is the number of >> IOVs. >> >> >> Any thoughts? >> >> Mine are: >> a) Introducing bs->bl.max_request_size and set merge = 0 if the result would >> be too big. Default >> max request size to 32768 sectors (see below). >> b) Hardcoding the limit in multiwrite_merge for now limiting the merged size >> to 16MB (32768 sectors). >> Which is the limit we already use in bdrv_co_discard and >> bdrv_co_write_zeroes if we don't know >> better. > > or c) disabling multiwrite merge for RAW or only iSCSI completely. > I think (a) would be best. But I would suggest some small modifications: Set the default max to something even smaller, like 256 sectors. This should not have much effect on throughput since the client/initiator can just send several concurrent i/os instead of one large i/o. Also update scsi_disk_emulate_inquiry() so we tell the guest the maximum transfer length for a single i/o so that the guest kernel will break up any huge requests into proper sizes.
Re: [Qemu-devel] [PATCH] block/iscsi: use 16 byte CDBs only when necessary
On Wed, Sep 3, 2014 at 7:18 AM, Paolo Bonzini wrote: > Il 03/09/2014 16:17, ronnie sahlberg ha scritto: >> I think (a) would be best. >> But I would suggest some small modifications: >> >> Set the default max to something even smaller, like 256 sectors. This >> should not have much effect on throughput since the client/initiator >> can just send several concurrent i/os instead of one large i/o. > > I disagree. 256 sectors are 128k, that's a fairly normal I/O size. > Fair enough. But maybe a command line argument to set/override the max? This would be useful when using scsi-passthrough to usb-scsi disks. Linux kernel has I think a hard limit on 120k for the usb transport, which means when doing SG_IO to a usb disk you are limited to this as the max transfer length since anything larger will break in the usb layer. This limit also I think is not easily discoverable by an application since * the actual device still reports, in most cases, max_transfer_length == unlimited * and it is semi-hard to discover whether a /dev/sg* device is on a usb bus or not.
Re: [Qemu-devel] [PATCH 0/4] introduce max_transfer_length
Looks good to me. (minor question is just why not let default max be 0x for both 10 and 16 CDBs ?) On Fri, Sep 5, 2014 at 9:51 AM, Peter Lieven wrote: > This series adds the basics for introducing a maximum transfer length > to the block layer. Its main purpose is currently avoiding that > a multiwrite_merge exceeds the max_xfer_len of an attached iSCSI LUN. > This is a required bug fix. > > Discussed reporting of this maximum in the SCSI Disk Inquiry Emulation > of the Block Limits VPD is currently not added as we do not import any > of the other limits there. This has to be addresses in a seperate series. > > Peter Lieven (4): > BlockLimits: introduce max_transfer_length > block: immediate cancel oversized read/write requests > block/iscsi: set max_transfer_length > block: avoid creating oversized writes in multiwrite_merge > > block.c | 23 +++ > block/iscsi.c | 12 ++-- > include/block/block_int.h |3 +++ > 3 files changed, 36 insertions(+), 2 deletions(-) > > -- > 1.7.9.5 >
Re: [Qemu-devel] [PATCH 0/4] introduce max_transfer_length
Feel free to add a Reviewed-by: Ronnie Sahlberg to the patches. On Fri, Sep 5, 2014 at 12:52 PM, Peter Lieven wrote: > Am 05.09.2014 um 19:05 schrieb ronnie sahlberg: >> Looks good to me. >> >> (minor question is just why not let default max be 0x for both 10 >> and 16 CDBs ?) > > You are right. I was looking at the technical limit, but in fact it doesn't > make sense to have different limits. Its ridiculous to say, you wan't to > do big I/O then you need a target thats bigger than 2TB ;-) > > Peter > >> >> On Fri, Sep 5, 2014 at 9:51 AM, Peter Lieven wrote: >>> This series adds the basics for introducing a maximum transfer length >>> to the block layer. Its main purpose is currently avoiding that >>> a multiwrite_merge exceeds the max_xfer_len of an attached iSCSI LUN. >>> This is a required bug fix. >>> >>> Discussed reporting of this maximum in the SCSI Disk Inquiry Emulation >>> of the Block Limits VPD is currently not added as we do not import any >>> of the other limits there. This has to be addresses in a seperate series. >>> >>> Peter Lieven (4): >>> BlockLimits: introduce max_transfer_length >>> block: immediate cancel oversized read/write requests >>> block/iscsi: set max_transfer_length >>> block: avoid creating oversized writes in multiwrite_merge >>> >>> block.c | 23 +++ >>> block/iscsi.c | 12 ++-- >>> include/block/block_int.h |3 +++ >>> 3 files changed, 36 insertions(+), 2 deletions(-) >>> >>> -- >>> 1.7.9.5 >>> >
Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
On Mon, Sep 8, 2014 at 7:42 AM, Paolo Bonzini wrote: > Il 08/09/2014 16:35, Peter Lieven ha scritto: > > messages. :) So you would not throw an error msg here? >>> No, though a trace could be useful. >> >> Is there a howto somewhere how to implement that? > > Try commit 4ac4458076e1aaf3b01a6361154117df20e22215. > >> Whats your opinion changed the max_xfer_len to 0x regardsless >> of use_16_for_rw in iSCSI? > > If you implemented request splitting in the block layer, it would be > okay to force max_xfer_len to 0x. I think it should be OK if that is a different series. This only affects the iSCSI transport since it is the only transport so far to record or report a max transfer length. If a guest generates these big requests(i.e. not multiwrite) we already fail them due to the READ10 limitations. We would just fail the request at a different layer in the stack with these patches. What I would like to see would also be to report these limitations to the guest itself to prevent it from generating too large I/Os. I am willing to do that part once the initial max_xfer_len ones are finished.
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
On Wed, Jul 16, 2014 at 10:11 AM, Stefan Weil wrote: > Am 16.07.2014 18:49, schrieb Paolo Bonzini: >> Il 16/07/2014 18:28, Stefan Weil ha scritto: >>> Debian testing includes a brand new libiscsi, but it >>> does not include libiscsi.pc, so pkg-config won't know that it is >>> available and configure will disable libiscsi. >> >> That's a packaging bug. > > CC'ing Michael as he is the Debian maintainer of this package and > Aurélien who maintains QEMU for Debian. > > Michael, should I send a Debian bug report for libiscsi-dev? Would an > update of libiscsi for Debian stable be reasonable if versions older > than 1.9 are too buggy to be used? If you ask debian to upgrade. Could you ask them to wait and upgrade after I have release the next version, hopefully if all goes well, at the end of this week? It contains new functionality, thanks to plieven, to better handle cases where active/passive storage arrays perform failover. > I must admit that I'm a little bit > surprised because iSCSI support worked for me quite well the last time I > used it with Debian wheezy. I think, and plieven please correct me if I am wrong, earlier version would work reasonably well for basic use but there were bugs and gaps in functionality that made it ill suited for enterprise environments. > > Regards > Stefan > >> >>> I have a patch which >>> fixes this, so QEMU for Debian testing could include libiscsi again. >>> >>> Is a feature regression like this one acceptable? Do we need additional >>> testing (maybe run the build bots with --enable-xxx, so builds fail when >>> xxx no longer works)? >> >> As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit >> message, this was intentional. I was reluctant to do it, but ultimately >> Peter Lieven convinced me that it isn't just about using fancy new APIs; >> libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0 >> requires a patch to avoid segfaults, and many more if you want to run it >> on ARM). >> >> Paolo > >
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
On Wed, Jul 16, 2014 at 10:29 AM, Michael Tokarev wrote: > 16.07.2014 21:23, ronnie sahlberg wrote: > >> If you ask debian to upgrade. Could you ask them to wait and upgrade after I >> have release the next version, hopefully if all goes well, at the end >> of this week? > > There's no problem in updating now to fix missing .pc file and to update > next week to include a new version. > Please find a new version 1.12 on the website. Thanks. ronnie sahlberg
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
Hi, You mean using the '-S' mode and setting the key after qemu has initialized all devices but before it actually starts booting the guest? That would not be easy and would probably require some intrusive in qemu itself that might affect other device types, so I am unsure. The difference between qcow2 and iscsi and the problem is that .open() is called for all devices before the monitor is started, so .open() is called before we would have a chance to even hand the password to qemu. For qcow2 this is not a problem since even if the file is password protected the file header is not, so you can still open the file and read the header (to discover it is password protected) without knowing the password. So qcow2 can still open the file and do all the sanity checks it needs without yet knowing the password. You only need the password at a much later at the stage when you want to actually start doing I/O to the device. iSCSI is different because the username/password is not really associated with the LUN or doing I/O, the password is required to connect to the iscsi target itself, so without the password we can not even connect to the target, much less run REPORTLUNS or TESTUNITREADY to check if the LUN even exists. We cant even check if the target iqn name even exists at all without the password. So to pass the password similarly to how qcow2 does it, I see three options, neither of which are especially attractive : 1, Change iscsi_open() to become a NOOP, dont even try to connect to the target until the CPU is started (and -S has passed the password) and do the connect on first i/o instead of .open() This would mean that if the target or the LUN does not even exist at all, we wont get an early failure where QEMU would fail to start due to "the device does not exist", instead the error would be postponed until much later and cause qemu to error out on first i/o. This looks like a horrible kludge. 2, Change -S so that it is called before qemu calls any of the .open() functions for block devices. I am not sure what impact this would have and if -S users would be "surprised" if the monitor is open but the block devices have not yet been scanned. This looks difficult to easily integrate with the command line parsing anyway. This too sounds like a horrible kludge. 3, Add a new block device method .open_post_start() which is just like .open() but invoked after the CPU has been started. So .open() is called like today before -S monitor is started, then the new .open_post_start() would be called once -S has started the CPU with the 'c' command. In this case for iscsi, .open() could be changed to just create a context for the device and establish the TCP connection to the target but it would not log in to the target. .open_post_start() which could be called for all block devices once the CPU has been started would then use the context and connection from .open() and perform the actual login and discover the LUN. 3 would still be a little bit ugly since a whole set of error conditions such as "target iqn does not exist", "lun does not exist", "wrong type of LUN", "incorrect password" ... would not be detected until once the guest CPU is actually started. Would be less ugly than options 1,2 though. But even with something like option 3, to make it possible to use -S to provide the password, this is mainly for addressing the case when running via libvirt ? I still want to provide a mechanism for setting the password directly in a configuration file for users that run qemu directly from the command line and are not using -S. comments ? regards ronnie sahlberg On Mon, Dec 19, 2011 at 12:48 AM, Paolo Bonzini wrote: > On 12/18/2011 05:48 AM, Ronnie Sahlberg wrote: >> >> This patch adds configuration variables for iSCSI to set >> initiator-name to use when logging in to the target, >> which type of header-digest to negotiate with the target >> and username and password for CHAP authentication. >> >> This allows specifying a initiator-name either from the command line >> -iscsi initiator-name=iqn.2004-01.com.example:test >> or from a configuration file included with -readconfig >> [iscsi] >> initiator-name = iqn.2004-01.com.example:test >> header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE >> user = CHAP username >> password = CHAP password > > > header digest and user look like they should be options to the block driver. > For the password you can reuse the system used by qcow2 perhaps? > > Paolo
Re: [Qemu-devel] OEM Windows in Qemu
Why not just buy a non-OEM version ? They are surely not tied to a specific piece of hardware and its licence should allow to run on "arbitrary PC HW you happen to want to install/run it on once activated" On Fri, Dec 23, 2011 at 5:11 PM, wrote: > > >>"My patch" does not produce any SLIC at all. The instructions mentions >>using SLIC from your machine - "my patch" is just a way to _embed_ a given >>data into VM, not a way to "produce" anything. You get in your VM what >>you have outside, either in a file or in your own BIOS, depending on where >>you took that data from. > > Right, I meant to say simply that it did copy the SLIC in a recognizable > form and embed it into the BIOS. > > > >>Yes, the patch only changes RSDT to match SLIC - this is what win7 verifies, >>all other tables does not matter for win7. And yes, it might be different >>for winXP - you may try setting all tables in VM to be of DELL OEM and see >>what happens. > > I had high hopes for this, but no joy. All the DMI tables now read Dell > Inc but XP refuses to activate. I think the key is those specific > memory addresses I mentioned earlier. I tried raw write of "Dell Inc" > into the specified address space, but there are 2 problems with this. > First is that the address space already contains data so I overwrote > something. Second even then Windows did not activate, I think the > activation check only occurs at certain times, probably on boot, so I > made my changes after it checked and it didn't check again and so didn't > see the changes I made. > > I need to modify the BIOS to contain the string so it is visible on boot > without me having to manually edit it. Is there a way to hardcode the > string into a specific memory address when I compile the BIOS? > > Brian > >
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
I do recognize that there is real need to provide password to iscsi (and rbd?) via the monitor. I myself do not use libvirt (it is just too much pain to be worth it to try to upgrade it out-of-step with your distribution) but I recognize that likely vast majority of users would use libvirt. Once consensus is reached on "how to handle devices that need two-stage setup" and what it would look like I can volunteer to implement it. It would however be way out of scope of and bigger than a simple iscsi-local patch What about the idea about splitting .open() into two stages: .open() /* stage one, called before -S monitor is invoked */ .open_stage_2() /* called after -S finishes and when it starts to boot the guest CPU */ ? regards ronnie sahlberg On Fri, Dec 23, 2011 at 8:08 PM, Paolo Bonzini wrote: > On 12/22/2011 09:51 PM, ronnie sahlberg wrote: >> >> The difference between qcow2 and iscsi and the problem is that .open() >> is called for all devices before the monitor is started, so .open() is >> called before we would have a chance to even hand the password to >> qemu. >> >> For qcow2 this is not a problem since even if the file is password >> protected the file header is not, so you can still open the file and >> read the header (to discover it is password protected) without knowing >> the password. >> So qcow2 can still open the file and do all the sanity checks it needs >> without yet knowing the password. > > > You're right. > > I see that the libvirt driver for rbd simply passes it on the command line. > I hope I'll be able to review the patch further today. > > Paolo
Re: [Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
On Thu, Jan 19, 2012 at 11:17 PM, Kevin Wolf wrote: > Am 18.12.2011 05:48, schrieb Ronnie Sahlberg: >> This patch adds configuration variables for iSCSI to set >> initiator-name to use when logging in to the target, >> which type of header-digest to negotiate with the target >> and username and password for CHAP authentication. >> >> This allows specifying a initiator-name either from the command line >> -iscsi initiator-name=iqn.2004-01.com.example:test >> or from a configuration file included with -readconfig >> [iscsi] >> initiator-name = iqn.2004-01.com.example:test >> header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE >> user = CHAP username >> password = CHAP password >> >> The patch also updates the manpage and qemu-doc >> >> Signed-off-by: Ronnie Sahlberg > > So these options are global? What if I wanted to use two different > setups for two different images? > Good point. I will rework the patch so that it first checks for [iscsi "iqn.target.name"] and if that is not found it falls-back to just checking for [iscsi] That would allow to have one "catch all" section for all targets, but also the possibility to override and use different settings on a per-target basis. I will post an updated patch in a day or two. regards ronnie sahlberg
[Qemu-devel] [PATCH] Add configuration variables for iscsi
Kevin, List This is version 2 of the patch to add configuration variables for iSCSI. This version adds the feature to specify configuration blocks that apply to a specific target name, allowing qemu to use different settings if/when connecting one guest to multiple different iscsi targets. This patch adds configuration variables for iSCSI to set initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target and username and password for CHAP authentication. This allows specifying a initiator-name either from the command line -iscsi initiator-name=iqn.2004-01.com.example:test or from a configuration file included with -readconfig [iscsi] initiator-name = iqn.2004-01.com.example:test header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE user = CHAP username password = CHAP password In the configuration file it is also possible to set a target specific configuratyion block using the header [iscsi "iqn.target.name"] When a iscsi session is initialized, it will first try to use a configuration section that matches the target name. If no such block is found, it will fall-back to try the default [iscsi] section instead. regards ronnie sahlberg
[Qemu-devel] [PATCH] iSCSI: add configuration variables for iSCSI
This patch adds configuration variables for iSCSI to set initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target and username and password for CHAP authentication. This allows specifying a initiator-name either from the command line -iscsi initiator-name=iqn.2004-01.com.example:test or from a configuration file included with -readconfig [iscsi] initiator-name = iqn.2004-01.com.example:test header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE user = CHAP username password = CHAP password If you use several different targets, you can also configure this on a per target basis by using a group name: [iscsi "iqn.target.name"] ... Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 139 +++ qemu-config.c | 27 +++ qemu-doc.texi | 38 +++- qemu-options.hx | 16 +-- vl.c|8 +++ 5 files changed, 213 insertions(+), 15 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 938c568..bd3ca11 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -455,6 +455,109 @@ iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data, } } +static int parse_chap(struct iscsi_context *iscsi, const char *target) +{ +QemuOptsList *list; +QemuOpts *opts; +const char *user = NULL; +const char *password = NULL; + +list = qemu_find_opts("iscsi"); +if (!list) { +return 0; +} + +opts = qemu_opts_find(list, target); +if (opts == NULL) { +opts = QTAILQ_FIRST(&list->head); +if (!opts) { +return 0; +} +} + +user = qemu_opt_get(opts, "user"); +if (!user) { +return 0; +} + +password = qemu_opt_get(opts, "password"); +if (!password) { +error_report("CHAP username specified but no password was given"); +return -1; +} + +if (iscsi_set_initiator_username_pwd(iscsi, user, password)) { +error_report("Failed to set initiator username and password"); +return -1; +} + +return 0; +} + +static void parse_header_digest(struct iscsi_context *iscsi, const char *target) +{ +QemuOptsList *list; +QemuOpts *opts; +const char *digest = NULL; + +list = qemu_find_opts("iscsi"); +if (!list) { +return; +} + +opts = qemu_opts_find(list, target); +if (opts == NULL) { +opts = QTAILQ_FIRST(&list->head); +if (!opts) { +return; +} +} + +digest = qemu_opt_get(opts, "header-digest"); +if (!digest) { +return; +} + +if (!strcmp(digest, "CRC32C")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_CRC32C); +} else if (!strcmp(digest, "NONE")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE); +} else if (!strcmp(digest, "CRC32C-NONE")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_CRC32C_NONE); +} else if (!strcmp(digest, "NONE-CRC32C")) { +iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C); +} else { +error_report("Invalid header-digest setting : %s", digest); +} +} + +static char *parse_initiator_name(const char *target) +{ +QemuOptsList *list; +QemuOpts *opts; +const char *name = NULL; + +list = qemu_find_opts("iscsi"); +if (!list) { +return g_strdup("iqn.2008-11.org.linux-kvm"); +} + +opts = qemu_opts_find(list, target); +if (opts == NULL) { +opts = QTAILQ_FIRST(&list->head); +if (!opts) { +return g_strdup("iqn.2008-11.org.linux-kvm"); +} +} + +name = qemu_opt_get(opts, "initiator-name"); +if (!name) { +return g_strdup("iqn.2008-11.org.linux-kvm"); +} + +return g_strdup(name); +} + /* * We support iscsi url's on the form * iscsi://[%@][:]// @@ -465,6 +568,7 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags) struct iscsi_context *iscsi = NULL; struct iscsi_url *iscsi_url = NULL; struct IscsiTask task; +char *initiator_name = NULL; int ret; if ((BDRV_SECTOR_SIZE % 512) != 0) { @@ -474,16 +578,6 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags) return -EINVAL; } -memset(iscsilun, 0, sizeof(IscsiLun)); - -/* Should really append the KVM name after the ':' here */ -iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:"); -if (iscsi == NULL) { -error_report("iSCSI: Failed to create iSCSI context."); -ret = -ENOMEM; -goto failed; -} - iscsi_url = iscsi_parse_full_url(iscsi, filename); if (iscsi_url
Re: [Qemu-devel] [PATCH] readconfig: add emulation of /dev/fd/ to platforms that that lacks this API
None. No platforms that I care about. On Tue, Mar 20, 2012 at 3:43 AM, Anthony Liguori wrote: > On 03/03/2012 01:49 AM, Ronnie Sahlberg wrote: >> >> Please find a patch to -readconfig. >> >> On many platforms -readconfig /dev/fd/ can be used to read from an >> already opened and inherited filedescriptor. >> >> On platforms that do not natively provide /dev/fd/ > > > What platforms don't provide /dev/fd/ where fd passing makes sense? > > Regards, > > Anthony Liguori > > >> add emulation of this by checking if the ofiginal fopen(path) failed, then >> IF >> the path starts with /dev/fd/ then try to read the descriptorvalue and >> fdopen() it. >> >> >> It means that we can use -readconfig /dev/fd/ on all platforms. On >> those that provide a /dev/fd we just open that file. On those that do not we >> emulate it. >> >> >> regards >> ronnie sahlberg >> >> >
[Qemu-devel] (no subject)
Please find attached a patch to add built-in support for iSCSI into QEMU. Please review and/or apply this patch. This is the latest version of this patch and I think I have addressed all previous concerns and suggestions. Using built-in iSCSI support has many advantages for certain use cases : * if you have very many iSCSI devices it may be impractical to expose all LUNs to the underlying host. * the devices become private to the guest and are not visible to the host. This automatically avoids polluting the page-cache on the host. * security, the devices are private to the guest, which prevents other guests or the host itself from accessing the devices. * security, it allows non-root users a secure way to get private and password protected access to the device by using CHAP authentication. * migration, many other virtualization systems provide built-in iscsi clients like this. Also providing this as feature in QEMU may make it easier for such users to migrate over to QEMU/KVM. * easier to maintain. For users with very many QEMU instances I think having guest-private iscsi targets and LUNs may be easier to manage than 'huge set of files in /dev/scsi/*' * easier to maintain, when copying a QEMU instance from one host to another, this offers a more self-contained solution where the QEMU command line itself cotnains all required storage configuration, instead of also having to coordinate this move with setting up and tearing down open-iscsi logins on the underlying hosts. The patch has been tested by installing and running several distributions such as RHEL6 and Ubuntu on devices, both system disk as well as the installer CDROM as being iscsi devices. This testing has also been performed by running full QEMU under valgrind. Performance is comparable to using open-iscsi devices.
[Qemu-devel] [PATCH] iSCSI block driver support
This patch adds a new block driver : block.iscsi.c This driver interfaces with the multiplatform posix library for iscsi initiator/client access to iscsi devices hosted at git://github.com/sahlberg/libiscsi.git The patch adds the driver to interface with the iscsi library. It also updated the configure script to * by default, probe is libiscsi is available and if so, build qemu against libiscsi. * --enable-libiscsi Force a build against libiscsi. If libiscsi is not available the build will fail. * --disable-libiscsi Do not link against libiscsi, even if it is available. When linked with libiscsi, qemu gains support to access iscsi resources such as disks and cdrom directly, without having to make the devices visible to the host. You can specify devices using a iscsi url of the form : iscsi://[[:@]][:/ When using authentication, the password can optionally be set with LIBISCSI_CHAP_PASSWORD="password" to avoid it showing up in the process list Example: ./x86_64-softmmu/qemu-system-x86_64 -m 1024 -cdrom iscsi://127.0.0.1/iqn.ronnie.test/2 --drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 -boot d -enable-kvm Signed-off-by: Ronnie Sahlberg --- Makefile.objs |1 + block/iscsi.c | 596 + configure | 31 +++ trace-events |6 + 4 files changed, 634 insertions(+), 0 deletions(-) create mode 100644 block/iscsi.c diff --git a/Makefile.objs b/Makefile.objs index 52d8b23..1b21c14 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -25,6 +25,7 @@ block-nested-y += qed-check.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o +block-nested-$(CONFIG_LIBISCSI) += iscsi.o block-nested-$(CONFIG_CURL) += curl.o block-nested-$(CONFIG_RBD) += rbd.o diff --git a/block/iscsi.c b/block/iscsi.c new file mode 100644 index 000..07fb3bc --- /dev/null +++ b/block/iscsi.c @@ -0,0 +1,596 @@ +/* + * QEMU Block driver for iSCSI images + * + * Copyright (c) 2010 Ronnie Sahlberg + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "config-host.h" + +#include +#include "sysemu.h" +#include "qemu-common.h" +#include "qemu-error.h" +#include "block_int.h" +#include "trace.h" + +#include +#include + + +typedef struct IscsiLun { +struct iscsi_context *iscsi; +int lun; +int block_size; +unsigned long num_blocks; +} IscsiLun; + +typedef struct IscsiAIOCB { +BlockDriverAIOCB common; +QEMUIOVector *qiov; +QEMUBH *bh; +IscsiLun *iscsilun; +struct scsi_task *task; +uint8_t *buf; +int canceled; +int status; +size_t read_size; +size_t read_offset; +} IscsiAIOCB; + +struct IscsiTask { +IscsiLun *iscsilun; +int status; +int complete; +}; + +static void +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data, +void *private_data) +{ +} + +static void +iscsi_aio_cancel(BlockDriverAIOCB *blockacb) +{ +IscsiAIOCB *acb = (IscsiAIOCB *)blockacb; +IscsiLun *iscsilun = acb->iscsilun; + +acb->status = -ECANCELED; +acb->common.cb(acb->common.opaque, acb->status); +acb->canceled = 1; + +iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task, + iscsi_abort_task_cb, NULL); +} + +static AIOPool iscsi_aio_pool = { +.aiocb_size = sizeof(IscsiAIOCB), +.cancel = iscsi_aio_cancel, +}; + + +static void iscsi_process_read(void *arg); +static void iscsi_process_write(void *arg); + +static int iscsi_process_flush(void *arg) +{ +IscsiLun *iscsilun = arg; + +return iscsi_queue_length(iscsilun->iscsi) > 0; +} + +static void +iscsi_set_events(IscsiLun *iscsilun) +{ +
[Qemu-devel] iSCSI support for QEMU
Sorry about the missing subject line in the previous mail Got confused when using git-send-patch :-) regards ronnie sahlberg On Sun, Jun 12, 2011 at 12:47 PM, Ronnie Sahlberg wrote: > Please find attached a patch to add built-in support for iSCSI into QEMU. > Please review and/or apply this patch. > > This is the latest version of this patch and I think I have addressed all > previous concerns and suggestions. > > > Using built-in iSCSI support has many advantages for certain use cases : > > * if you have very many iSCSI devices it may be impractical to expose > all LUNs to the underlying host. > > * the devices become private to the guest and are not visible to the host. > This automatically avoids polluting the page-cache on the host. > > * security, the devices are private to the guest, which prevents other guests > or the host itself from accessing the devices. > > * security, it allows non-root users a secure way to get private and password > protected access to the device by using CHAP authentication. > > * migration, many other virtualization systems provide built-in iscsi clients > like this. Also providing this as feature in QEMU may make it easier for such > users to migrate over to QEMU/KVM. > > * easier to maintain. For users with very many QEMU instances I think having > guest-private iscsi targets and LUNs may be easier to manage than 'huge set > of files in /dev/scsi/*' > > * easier to maintain, when copying a QEMU instance from one host to another, > this offers a more self-contained solution where the QEMU command line itself > cotnains all required storage configuration, instead of also having to > coordinate this move with setting up and tearing down open-iscsi logins on > the underlying hosts. > > > > > The patch has been tested by installing and running several distributions > such as RHEL6 and Ubuntu on devices, both system disk as well as the > installer CDROM as being iscsi devices. > > This testing has also been performed by running full QEMU under valgrind. > > Performance is comparable to using open-iscsi devices. > > >
[Qemu-devel] [PATCH 0/4] ISCSI: Passthough updates
Paolo, list Please find a set of patches for iscsi. 1, The recent change to the eventsystem in iscsi is racy. If the async connect of the socket takes longer than almost instant, the shortcircuit trying to write directly to the socket may try to do so before the socket is established. So we hcange the logic here and ONLY try the shortcircuit avoiding setting up POLLOUT if we have been fully logged in. This avoids the race where we might crash if the socket is not yet connected. 2, Swithc to using READCAPACITY10 for MMC devices. Only SBC devices support READCAPACITY16 so far. 3, Switch to use READ10 for MMC devices instead of READ16. With this patch ISCSI and MMC devices now work correctly. 4, If the device is a SMC device, then force the device to be sg. We dont support emulation as far as i know for SMV devices, so these device types would only work for sg or iscsi passthrough anyway. The reason we need to forcer sg is that since we can not READ from a SMC device, we need to shortcircuit find_image_format() before it tries to do the bdrv_pread() trying to guess what format the image is. With this patch I have verified I can access and manage a media changer via iscsi-passthrough. regards ronnie sahblerg
[Qemu-devel] [PATCH 1/4] ISCSI: Only call READCAPACITY16 for SBC devices
Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index df0b6c8..39d75cb 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -727,13 +727,20 @@ iscsi_inquiry_cb(struct iscsi_context *iscsi, int status, void *command_data, scsi_free_scsi_task(task); -task = iscsi_readcapacity16_task(iscsi, itask->iscsilun->lun, +switch (itask->iscsilun->type) { +case TYPE_DISK: +task = iscsi_readcapacity16_task(iscsi, itask->iscsilun->lun, iscsi_readcapacity16_cb, opaque); -if (task == NULL) { -error_report("iSCSI: failed to send readcapacity16 command."); -itask->status = 1; +if (task == NULL) { +error_report("iSCSI: failed to send readcapacity16 command."); +itask->status = 1; +itask->complete = 1; +return; +} +break; +default: +itask->status = 0; itask->complete = 1; -return; } } -- 1.7.3.1
[Qemu-devel] [PATCH 2/4] ISCSI: Use READCAPACITY10 for MMC devices
Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 47 +++ 1 files changed, 47 insertions(+), 0 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 39d75cb..2ddb9e5 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -700,6 +700,42 @@ iscsi_readcapacity16_cb(struct iscsi_context *iscsi, int status, } static void +iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status, +void *command_data, void *opaque) +{ +struct IscsiTask *itask = opaque; +struct scsi_readcapacity10 *rc10; +struct scsi_task *task = command_data; + +if (status != 0) { +error_report("iSCSI: Failed to read capacity of iSCSI lun. %s", + iscsi_get_error(iscsi)); +itask->status = 1; +itask->complete = 1; +scsi_free_scsi_task(task); +return; +} + +rc10 = scsi_datain_unmarshall(task); +if (rc10 == NULL) { +error_report("iSCSI: Failed to unmarshall readcapacity10 data."); +itask->status = 1; +itask->complete = 1; +scsi_free_scsi_task(task); +return; +} + +itask->iscsilun->block_size = rc10->block_size; +itask->iscsilun->num_blocks = rc10->lba + 1; +itask->bs->total_sectors= itask->iscsilun->num_blocks * + itask->iscsilun->block_size / BDRV_SECTOR_SIZE ; + +itask->status = 0; +itask->complete = 1; +scsi_free_scsi_task(task); +} + +static void iscsi_inquiry_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) { @@ -738,6 +774,17 @@ iscsi_inquiry_cb(struct iscsi_context *iscsi, int status, void *command_data, return; } break; +case TYPE_ROM: +task = iscsi_readcapacity10_task(iscsi, itask->iscsilun->lun, + 0, 0, + iscsi_readcapacity10_cb, opaque); +if (task == NULL) { +error_report("iSCSI: failed to send readcapacity16 command."); +itask->status = 1; +itask->complete = 1; +return; +} +break; default: itask->status = 0; itask->complete = 1; -- 1.7.3.1
[Qemu-devel] [PATCH 3/4] ISCSI: Only use READ16 for SBC devices. Use READ10 for other device types such as MMC
Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 21 - 1 files changed, 16 insertions(+), 5 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 2ddb9e5..a015a52 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -379,14 +379,25 @@ iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num, memset(acb->task, 0, sizeof(struct scsi_task)); acb->task->xfer_dir = SCSI_XFER_READ; -acb->task->cdb_size = 16; -acb->task->cdb[0] = 0x88; lba = sector_qemu2lun(sector_num, iscsilun); -*(uint32_t *)&acb->task->cdb[2] = htonl(lba >> 32); -*(uint32_t *)&acb->task->cdb[6] = htonl(lba & 0x); -*(uint32_t *)&acb->task->cdb[10] = htonl(num_sectors); acb->task->expxferlen = qemu_read_size; +switch (iscsilun->type) { +case TYPE_DISK: +acb->task->cdb_size = 16; +acb->task->cdb[0] = 0x88; +*(uint32_t *)&acb->task->cdb[2] = htonl(lba >> 32); +*(uint32_t *)&acb->task->cdb[6] = htonl(lba & 0x); +*(uint32_t *)&acb->task->cdb[10] = htonl(num_sectors); +break; +default: +acb->task->cdb_size = 10; +acb->task->cdb[0] = 0x28; +*(uint32_t *)&acb->task->cdb[2] = htonl(lba); +*(uint16_t *)&acb->task->cdb[7] = htons(num_sectors); +break; +} + if (iscsi_scsi_command_async(iscsi, iscsilun->lun, acb->task, iscsi_aio_read16_cb, NULL, -- 1.7.3.1
[Qemu-devel] [PATCH 4/4] ISCSI: If the device we open is a SMC device, then force the use of sg. We dont have any medium changer emulation so only passthrough via real sg or scsi-generic via iscsi wou
Forcing sg also makes qemu skip trying to read from the device to guess the image format by reading from the device (find_image_format()). SMC devices do not implement READ6/10/12/16 so it is noit possible to read from them. With this patch I can successfully manage a SMC device wiht iscsi in passthrough mode. Signed-off-by: Ronnie Sahlberg --- block/iscsi.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index a015a52..9ce38b5 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1032,6 +1032,15 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags) if (iscsi_url != NULL) { iscsi_destroy_url(iscsi_url); } + +/* Medium changer. We dont have any emulation for this so this must + be sg ioctl compatible. We force it to be sg, otherwise qemu will try + to read from the device to guess the image format. + */ +if (iscsilun->type == TYPE_MEDIUM_CHANGER) { +bs->sg = 1; +} + return 0; failed: -- 1.7.3.1
Re: [Qemu-devel] [PULL 1.1 0/2] SCSI patches for 1.1.0-rc3
Paolo, You need this patch too since without it it might crash. commit 6e46eb1846a862dad253be1a576f8554071b154a Author: Ronnie Sahlberg Date: Sat May 26 10:28:05 2012 +1000 ISCSI: We can only do the shortcircuit and write directly to the socket IFF we know the socket is open (and writeable). If the target is more than insignificant distance away, we could otherwise try to write to the socket before the nonblocking connect has completed which would cause a crash. Signed-off-by: Ronnie Sahlberg The other patches in the series can wait until later, but this one is needed for 1.1 regards ronnie sahlberg On Fri, May 25, 2012 at 9:11 PM, Paolo Bonzini wrote: > Il 22/05/2012 14:25, Paolo Bonzini ha scritto: >> The following changes since commit 76ee152a86d5f2533443ce4d2be6fe253cfb3c45: >> >> Update version to 1.1.0-rc2 (2012-05-14 17:56:50 -0500) >> >> are available in the git repository at: >> >> git://github.com/bonzini/qemu.git scsi-next >> >> for you to fetch changes up to e1a2d34f4abd8a117f5c5a25a5bb2e67d597de23: >> >> ISCSI: call qemu_notify_event() after updating events (2012-05-22 14:14:05 >> +0200) > > Rebased, commit cd0b3cc9eee6f9b6b165f24787b4290bdc1c3f68 is now the tip > of the branch. > > Paolo > >> ---- >> Jim Meyering (1): >> scsi: declare vmstate_info_scsi_requests to be static >> >> Ronnie Sahlberg (1): >> ISCSI: call qemu_notify_event() after updating events >> >> block/iscsi.c | 7 +++ >> hw/scsi-bus.c | 2 +- >> 2 files changed, 8 insertions(+), 1 deletion(-) > >
Re: [Qemu-devel] [PULL 1.1 0/2] SCSI patches for 1.1.0-rc3
Almost, but connecter=1 should be set after the if statement not inside it. if (status != 0) { +itask->iscsilun->connected = 1; itask->status = 1; I.e. dont set it in the error path, set it after the block. then it should be all good. optionally you can revert 5b5e96bba6835b794ba237c1ddd6580dea8d4aef and use the patch I attach instead regards ronnie sahlberg 2012 at 5:46 PM, Paolo Bonzini wrote: > Il 26/05/2012 07:41, ronnie sahlberg ha scritto: >> Paolo, >> >> You need this patch too since without it it might crash. >> >> commit 6e46eb1846a862dad253be1a576f8554071b154a >> Author: Ronnie Sahlberg >> Date: Sat May 26 10:28:05 2012 +1000 >> >> ISCSI: We can only do the shortcircuit and write directly to the socket >> IFF we know the socket is open (and writeable). >> If the target is more than insignificant distance away, we could >> otherwise >> try to write to the socket before the nonblocking connect has completed >> which would cause a crash. >> >> Signed-off-by: Ronnie Sahlberg >> >> >> The other patches in the series can wait until later, but this one is >> needed for 1.1 > > I didn't get the patch, so I redid it. > > Can you test asap git://github.com/bonzini/qemu.git, branch > scsi-candidate-1.1 to see if it matches what you want to be in 1.1? > > Paolo 0001-ISCSI-We-can-only-do-the-shortcircuit-and-write-dire.patch Description: Binary data
Re: [Qemu-devel] [PULL 1.1 0/2] SCSI patches for 1.1.0-rc3
Im compiling your branch now and will verify all is good. It is missing an include so you need to add this patch to compile : On Sat, May 26, 2012 at 5:59 PM, ronnie sahlberg wrote: > Almost, > but connecter=1 should be set after the if statement not inside it. > > if (status != 0) { > + itask->iscsilun->connected = 1; > itask->status = 1; > > I.e. dont set it in the error path, set it after the block. then it > should be all good. > > > > optionally you can revert 5b5e96bba6835b794ba237c1ddd6580dea8d4aef > and use the patch I attach instead > > > regards > ronnie sahlberg > > > 2012 at 5:46 PM, Paolo Bonzini wrote: >> Il 26/05/2012 07:41, ronnie sahlberg ha scritto: >>> Paolo, >>> >>> You need this patch too since without it it might crash. >>> >>> commit 6e46eb1846a862dad253be1a576f8554071b154a >>> Author: Ronnie Sahlberg >>> Date: Sat May 26 10:28:05 2012 +1000 >>> >>> ISCSI: We can only do the shortcircuit and write directly to the socket >>> IFF we know the socket is open (and writeable). >>> If the target is more than insignificant distance away, we could >>> otherwise >>> try to write to the socket before the nonblocking connect has completed >>> which would cause a crash. >>> >>> Signed-off-by: Ronnie Sahlberg >>> >>> >>> The other patches in the series can wait until later, but this one is >>> needed for 1.1 >> >> I didn't get the patch, so I redid it. >> >> Can you test asap git://github.com/bonzini/qemu.git, branch >> scsi-candidate-1.1 to see if it matches what you want to be in 1.1? >> >> Paolo 0001-ISCSI-need-to-include-scsi-defs.h.patch Description: Binary data
Re: [Qemu-devel] [PULL 1.1 0/2] SCSI patches for 1.1.0-rc3
I have compiled your branch and run through some tests. It all looks good as long as you apply the patch to #include "hw/scsi-defs.h" On Sat, May 26, 2012 at 6:17 PM, ronnie sahlberg wrote: > Im compiling your branch now and will verify all is good. > > It is missing an include so you need to add this patch to compile : > > > > > On Sat, May 26, 2012 at 5:59 PM, ronnie sahlberg > wrote: >> Almost, >> but connecter=1 should be set after the if statement not inside it. >> >> if (status != 0) { >> + itask->iscsilun->connected = 1; >> itask->status = 1; >> >> I.e. dont set it in the error path, set it after the block. then it >> should be all good. >> >> >> >> optionally you can revert 5b5e96bba6835b794ba237c1ddd6580dea8d4aef >> and use the patch I attach instead >> >> >> regards >> ronnie sahlberg >> >> >> 2012 at 5:46 PM, Paolo Bonzini wrote: >>> Il 26/05/2012 07:41, ronnie sahlberg ha scritto: >>>> Paolo, >>>> >>>> You need this patch too since without it it might crash. >>>> >>>> commit 6e46eb1846a862dad253be1a576f8554071b154a >>>> Author: Ronnie Sahlberg >>>> Date: Sat May 26 10:28:05 2012 +1000 >>>> >>>> ISCSI: We can only do the shortcircuit and write directly to the socket >>>> IFF we know the socket is open (and writeable). >>>> If the target is more than insignificant distance away, we could >>>> otherwise >>>> try to write to the socket before the nonblocking connect has completed >>>> which would cause a crash. >>>> >>>> Signed-off-by: Ronnie Sahlberg >>>> >>>> >>>> The other patches in the series can wait until later, but this one is >>>> needed for 1.1 >>> >>> I didn't get the patch, so I redid it. >>> >>> Can you test asap git://github.com/bonzini/qemu.git, branch >>> scsi-candidate-1.1 to see if it matches what you want to be in 1.1? >>> >>> Paolo
Re: [Qemu-devel] [PATCH 4/4] ISCSI: If the device we open is a SMC device, then force the use of sg. We dont have any medium changer emulation so only passthrough via real sg or scsi-generic via iscsi
Paolo I think I have seen a problem inside libiscsi that could be triggered by the shortcut. Can you remove this shortcut completely : -/* Try to write as much as we can to the socket - * without setting up an event. - * Only do this if we are completely logged in, so we know that - * the socket is in connected state. - */ -if (iscsi_is_logged_in(iscsi)) { -if (iscsi_which_events(iscsi) & POLLOUT) { -iscsi_process_write(iscsilun); -} -} I think there is a problem inside libiscsi if the socket becomes full and is no longer writeable and we try to write via this shortcurcuit. It will take a while until I can verify or fix that issue and before a new version of libiscsi can be available so I would feel most comfortable with if we just remove this optimization from QEMU for now. It can be added back later once libiscsi is fixed. regards ronnie sahlberg On Mon, May 28, 2012 at 4:48 PM, Paolo Bonzini wrote: > Il 27/05/2012 15:12, Andreas Färber ha scritto: >>> > Modified to also do the same for tapes, applied to scsi-next branch for >>> > 1.2. >> Paolo, it seems you haven't pushed scsi-next since then. > > Yeah, I have a pending push request for scsi-next, so I'm waiting till > Anthony applies it before pushing 1.2-only patches (I wasn't expecting > parallel 1.1/1.2 development for SCSI). > >> I hope you've >> also shortened the subject to a humanly bearable length? > > Yes. :) > > Paolo
Re: [Qemu-devel] Block job commands in QEMU 1.2 [v2, including support for replication]
On Fri, May 25, 2012 at 11:25 PM, Paolo Bonzini wrote: > Il 25/05/2012 14:09, Stefan Hajnoczi ha scritto: >>> > >>> > Perhaps that be simply a new qemu-img subcommand? It should be possible >>> > to run it while the VM is offline. Then the file that is produced could >>> > be fed to blockdev-dirty-enable. >> For both continuous replication and incremental backups we cannot >> require that the guest is shut down in order to collect the dirty >> bitmap, I think. > > Yes, that is a problem for internal snapshots. For external snapshots, > see the drive-mirror command's sync parameter. Perhaps we can add a > blockdev-dirty-fill command that adds allocated sectors up to a given > base to the dirty bitmap. > >> I think we really need a libvirt API because a local file not only has >> permissions issues but also is not network transparent. The continuous >> replication server runs on another machine, how will it access the dirty >> bitmap file? > > This is still using a "push" model where the dirty data is sent from > QEMU to the replication server, so the dirty bitmap is not needed on the > machine that runs the replication server---only on the machine that runs > the VM (to preserve the bitmap across VM shutdowns including power > loss). It has to be stored on shared storage if you plan to run the VM > from multiple hosts. Why reinventing the wheel? Wouldnt it be much better to externalize the snapshotting. Some/Many filesystems support snapshotting today. A-whole-lot-of/most-non-consumer-grade block storage devices support it too. So a different way to do this would be to use a mechanism to quiescence the backing file/device and then call out to an external agent to snapshot the backing file or the backing device. Other external tools can then be used to compute a dense delta between this new snapshot and the previous snapshot and transfer to the other side. I think all filesystems that support snapshotting on file level support APIs for a cheap way to cumpute the block deltas. I would imagine all midrange or better block storage devices that support LUN snapshotting provide this too. So why do snapshotting and computation of snapshot deltas in qemu ? Why not just externalize it with "you want snapshotting and incremental replication => you must use a system where file/block supports it". regards ronnie sahlberg
[Qemu-devel] [PATCH 0/2] Two trivial patches for SCSI thin provisioning
List, Please find two trivial patches for the recent thin provisioning support 1, you can support unmap via WRITE_SAME_10 just as well as WRITE_SAME_16 since they are the same/identical as far as how QEMU works ? 2, if implementing LBPWS and LBPWS10 we should also tell the guest that the LUN is thin-provisionied. regards ronnie sahlberg
[Qemu-devel] [PATCH 2/2] SCSI emulation: should tell the guest that we actually support thin provisioning
Signed-off-by: Ronnie Sahlberg --- hw/scsi-disk.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index ccd9291..d6c197f 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -532,7 +532,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) outbuf[3] = buflen = 8; outbuf[4] = 0; outbuf[5] = 0x60; /* write_same 10/16 supported */ -outbuf[6] = 0; +outbuf[6] = 2;/* thin provisioning */ outbuf[7] = 0; break; } -- 1.7.3.1
[Qemu-devel] [PATCH 1/2] SCSI emulation: Support unmap via WRITE_SAME_10. It is the same as WRITE_SAME_16 as far as QEMU is concerned
Signed-off-by: Ronnie Sahlberg --- hw/scsi-disk.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 9949786..ccd9291 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -531,7 +531,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) { outbuf[3] = buflen = 8; outbuf[4] = 0; -outbuf[5] = 0x40; /* write same with unmap supported */ +outbuf[5] = 0x60; /* write_same 10/16 supported */ outbuf[6] = 0; outbuf[7] = 0; break; @@ -1456,10 +1456,11 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf) goto illegal_lba; } break; +case WRITE_SAME_10: case WRITE_SAME_16: len = r->req.cmd.xfer / s->qdev.blocksize; -DPRINTF("WRITE SAME(16) (sector %" PRId64 ", count %d)\n", +DPRINTF("WRITE SAME() (sector %" PRId64 ", count %d)\n", r->req.cmd.lba, len); if (r->req.cmd.lba > s->qdev.max_lba) { -- 1.7.3.1
Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support
On Fri, Apr 20, 2012 at 5:00 PM, Nicholas A. Bellinger wrote: > On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote: >> Hi Nicholas, >> >> On 04/19/2012 06:53 PM, Nicholas A. Bellinger wrote: >> > On Thu, 2012-04-19 at 07:30 -0500, Anthony Liguori wrote: >> >> However, for storage, be it scsi or direct access, the same problem really >> >> doesn't exist. There isn't an obvious benefit to being in the kernel. >> >> >> > >> > In the modern Linux v3.x tree, it was decided there is an obvious >> > benefit to fabric drivers developers for going ahead and putting proper >> > SCSI target logic directly into the kernel.. ;) >> >> I'm sure there are obvious benefits to having the kernel have SCSI target >> logic. >> I'm not claiming that there isn't. >> >> But there is not an obvious benefit to doing SCSI emulation *for virtual >> machine* guests in the kernel. >> >> Guests are unconditionally hostile. There is no qualification here. Public >> clouds are the obvious example of this. >> >> TCM runs in the absolute most privileged context possible. When you're >> dealing >> with extremely hostile input, it's pretty obvious that you want to run it in >> the >> lowest privileged context as humanly possible. >> > > The argument that a SCSI target for virtual machines is so complex that > it can't possibly be implemented properly in the kernel is a bunch of > non-sense. There are also other benefits to NOT implement scsi emulation in the kernel, aside from the security aspect of running large amounts of code inside kernel context vs within restricted userspace context. I am very happy to be able to add emulation of new opcodes or new features to tgtd WITHOUT having to recompile my kernel. regards ronnie sahlberg
Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support
On Mon, Apr 23, 2012 at 7:33 PM, Stefan Hajnoczi wrote: > On Sat, Apr 21, 2012 at 9:51 AM, Nicholas A. Bellinger > wrote: >> On Fri, 2012-04-20 at 12:09 +0100, Stefan Hajnoczi wrote: >>> On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini wrote: >>> > Il 20/04/2012 09:00, Nicholas A. Bellinger ha scritto: >> >> >> >>> > - no support for migration (there can be pending SCSI requests at >>> > migration time, that need to be restarted on the destination) >>> >>> Yes and it hasn't been thought through by me at least ;-). So >>> migration is indeed a challenge that needs to be worked through. >>> >>> > - no support for non-raw images (fix: use NBD on a Unix socket? perhaps >>> > add an NBD backend to lio) >>> >>> For me this is the biggest issue with kernel-level storage for virtual >>> machines. We have NBD today but it goes through the network stack >>> using a limited protocol and probably can't do zero-copy. >>> >>> The most promising option I found was dm-userspace >>> (http://wiki.xensource.com/xenwiki/DmUserspace), which implements a >>> device-mapper target with an in-kernel MMU-like lookup mechanism that >>> calls out to userspace when block addresses need to be translated. >>> It's not anywhere near to upstream and hasn't been pushed for several >>> years. On the plus side we could also write a userspace >>> implementation of this so that QEMU image formats continue to be >>> portable to other host OSes without duplicating code. >>> >>> If tcm_vhost only works with raw images then I don't see it as a >>> realistic option given the effort it will require to complete and >>> maintain. >>> >> >> So there has been interest in the past for creating a TCM backend that >> allows for a userspace passthrough, but so far the code to do this has >> not materialized yet.. >> >> There are pieces of logic from STGT that provide an interface for doing >> something similar that still exist in the upstream kernel. Allowing >> different QEMU formats to be processed (in userspace) through a hybrid >> TCM backend driver that fits into the existing HBA/DEV layout in >> /sys/kernel/config/target/$HBA/$DEV/ is what would be required to really >> do this properly.. > > Could you point to the existing upstream code? > > I think the hybrid TCM backend driver means a way for a userspace > process to execute SCSI Tasks from the target - implementing a subset > of se_subsystem_api in userspace? > > If we solve the problem at the block driver level instead of inside > the SCSI target then it's also possible for the host to inspect VM > disk images similar to loopback devices (mount -o loop). Do you think > putting the userspace interface into the SCSI target has advantages > over the block driver or device-mapper level? > Hi Stefan, A little bit off-topic but When you design the proper place and API to plug virt-scsi into an external SCSI parser outside of qemu like the target in the kernel ... It would be very nice if one could also plug virt-scsi into libiscsi and pass the CDBs straight to the remote iSCSI target too. Keep some thoughts on virt-scsi + libiscsi integration. regards ronnie sahlberg
[Qemu-devel] [PATCH 0/1] Add support for iSCSI thin-provisioning
List, Please find a patch that updates the iscsi block device to provide support for 1, LUNs bigger than 2TB by switching to READCAPACITY16 2, Thin-provisioning by implementing bdrv_aio_discard using SCSI UNMAP commands. The unmapping/discard of blocks was done by booting a guest and using sg_write_same --16 --unmap from within the guest. This was trapped in hw/scsi-disc.c and then appeared on the wire to the iscsi target as a proper SCSI UNMAP command. regards ronnie sahlberg
[Qemu-devel] [PATCH] ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs
Update the configure test for libiscsi support to detect version 1.3 or later. Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands. Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB. Update to implement bdrv_aio_discard() using the UNMAP command. This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning. Signed-off-by: Ronnie Sahlberg --- block/iscsi.c | 86 configure |5 ++- 2 files changed, 77 insertions(+), 14 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index bd3ca11..eb49093 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -383,6 +383,65 @@ iscsi_aio_flush(BlockDriverState *bs, return &acb->common; } +static void +iscsi_unmap_cb(struct iscsi_context *iscsi, int status, + void *command_data, void *opaque) +{ +IscsiAIOCB *acb = opaque; + +if (acb->canceled != 0) { +qemu_aio_release(acb); +scsi_free_scsi_task(acb->task); +acb->task = NULL; +return; +} + +acb->status = 0; +if (status < 0) { +error_report("Failed to unmap data on iSCSI lun. %s", + iscsi_get_error(iscsi)); +acb->status = -EIO; +} + +iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); +scsi_free_scsi_task(acb->task); +acb->task = NULL; +} + +static BlockDriverAIOCB * +iscsi_aio_discard(BlockDriverState *bs, + int64_t sector_num, int nb_sectors, + BlockDriverCompletionFunc *cb, void *opaque) +{ +IscsiLun *iscsilun = bs->opaque; +struct iscsi_context *iscsi = iscsilun->iscsi; +IscsiAIOCB *acb; +struct unmap_list list[1]; + +acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque); + +acb->iscsilun = iscsilun; +acb->canceled = 0; + +list[0].lba = sector_qemu2lun(sector_num, iscsilun); +list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size; + +acb->task = iscsi_unmap_task(iscsi, iscsilun->lun, + 0, 0, &list[0], 1, + iscsi_unmap_cb, + acb); +if (acb->task == NULL) { +error_report("iSCSI: Failed to send unmap command. %s", + iscsi_get_error(iscsi)); +qemu_aio_release(acb); +return NULL; +} + +iscsi_set_events(iscsilun); + +return &acb->common; +} + static int64_t iscsi_getlength(BlockDriverState *bs) { @@ -396,11 +455,11 @@ iscsi_getlength(BlockDriverState *bs) } static void -iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status, +iscsi_readcapacity16_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) { struct IscsiTask *itask = opaque; -struct scsi_readcapacity10 *rc10; +struct scsi_readcapacity16 *rc16; struct scsi_task *task = command_data; if (status != 0) { @@ -412,26 +471,25 @@ iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status, return; } -rc10 = scsi_datain_unmarshall(task); -if (rc10 == NULL) { -error_report("iSCSI: Failed to unmarshall readcapacity10 data."); +rc16 = scsi_datain_unmarshall(task); +if (rc16 == NULL) { +error_report("iSCSI: Failed to unmarshall readcapacity16 data."); itask->status = 1; itask->complete = 1; scsi_free_scsi_task(task); return; } -itask->iscsilun->block_size = rc10->block_size; -itask->iscsilun->num_blocks = rc10->lba; -itask->bs->total_sectors = (uint64_t)rc10->lba * - rc10->block_size / BDRV_SECTOR_SIZE ; +itask->iscsilun->block_size = rc16->block_length; +itask->iscsilun->num_blocks = rc16->returned_lba; +itask->bs->total_sectors= rc16->returned_lba * + rc16->block_length / BDRV_SECTOR_SIZE ; itask->status = 0; itask->complete = 1; scsi_free_scsi_task(task); } - static void iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) @@ -445,10 +503,10 @@ iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data, return; } -task = iscsi_readcapacity10_task(iscsi, itask->iscsilun->lun, 0, 0, - iscsi_readcapacity10_cb, opaque); +task = iscsi_readcapacity16_task(iscsi, itask->iscsilun->lun, + iscsi_readcapacity16_cb, opaque); if (task == NULL) { -error_report("iSCSI: failed to send readcapacity command.&qu
Re: [Qemu-devel] [PATCH] ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs
On Tue, Apr 24, 2012 at 4:46 PM, Paolo Bonzini wrote: > Il 24/04/2012 08:29, Ronnie Sahlberg ha scritto: >> Update the configure test for libiscsi support to detect version 1.3 or >> later. >> Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP >> commands. >> >> Update the iscsi block layer to use READCAPACITY16 to detect the size of the >> LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB. >> >> Update to implement bdrv_aio_discard() using the UNMAP command. >> This allows us to use thin-provisioned LUNs from TGTD and other iSCSI >> targets that support thin-provisioning. > > Looks good. Kevin, do you want me to take libiscsi patches via the SCSI > tree? > > As an aside, I am not really sure of the utility of adding these utility > functions directly in libiscsi, rather than making it a pure transport > library. block/iscsi.c is going to grow as you add more functionality > (e.g. WRITE SAME commands), and libiscsi will have to be updated each > time in lockstep. > > I can see the value of basic read/write/flush and readcap10/16, but with > unmap it's starting to be a bit more specific. Are there other clients > of libiscsi that use these functions? Should they be placed into > block/iscsi.c or a new block/iscsi-cdb.c instead? I see your point. I like to add scsi commands as I use them to libiscsi, since then I dont have to re-write the marshalling/unmarshalling code everytime in my small test and utility programs. For example when i want to write some one-off small tools to test something. But, yes. There is no real need to use them directly from qemu. So ignore this patch for now. I will redo UNMAP in the patch to instead use the generic scsi function inside libiscsi. That will serve the purpose to verify that the public API in libiscsi is sufficient for accessing the generic transport and secondly that will show a useful example on how to send CDB+data to and to receive data back from the generic function. This generic API would be what a future virt-scsi->libiscsi integration would use anyway. regards ronnie sahlberg
Re: [Qemu-devel] [PATCH] ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs
On Tue, Apr 24, 2012 at 6:05 PM, Paolo Bonzini wrote: > Il 24/04/2012 10:02, ronnie sahlberg ha scritto: >> So ignore this patch for now. I will redo UNMAP in the patch to >> instead use the generic scsi function inside libiscsi. >> >> That will serve the purpose to verify that the public API in libiscsi >> is sufficient for accessing the generic transport and >> secondly that will show a useful example on how to send CDB+data to >> and to receive data back from the generic function. >> >> This generic API would be what a future virt-scsi->libiscsi >> integration would use anyway. > > I will be on holiday from tomorrow to May 1, and I won't be able to send > a pull request today. As I want to minimize the time I spend looking at > qemu-devel, :) I'll take this patch as is, since the new libiscsi > version is needed anyway for READ CAPACITY (16). > > You could write a follow-up patch to teach iscsi.c about WRITE SAME(10) > and WRITE SAME(16) with the unmap bit set, and use generic CDB+data > functions for those. > Ok. That works for me. Thanks. regards ronnie sahlberg
Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support
On Tue, Apr 24, 2012 at 7:13 PM, Stefan Hajnoczi wrote: > On Tue, Apr 24, 2012 at 8:05 AM, Paolo Bonzini wrote: >> Il 24/04/2012 06:21, ronnie sahlberg ha scritto: >>> Hi Stefan, >>> >>> A little bit off-topic but >>> >>> When you design the proper place and API to plug virt-scsi into an >>> external SCSI parser outside of qemu like the target in the kernel ... >>> >>> It would be very nice if one could also plug virt-scsi into libiscsi >>> and pass the CDBs straight to the remote iSCSI target too. >>> Keep some thoughts on virt-scsi + libiscsi integration. >> >> Yes, that makes a lot of sense. It's a bit harder than scsi-generic but >> we do want to get there. > > Yep. I think previously there was discussion about a libiscsi > SCSIDevice so that guest SCSI commands can be sent to libiscsi LUNs > without going through the QEMU block layer. (Another way to pass > arbitrary SCSI commands to libiscsi is by hooking up .bdrv_aio_ioctl() > with SG_IO scsi-generic compatible code in block/iscsi.c.) bdrv_aio_ioctl() and SG_IO would mean #ifdef __linux__ So maybe it would be better to instead create a new hw/scsi-scsi.c that calls straight into block/iscsi.c ? That would be a lot more work than emulating SG_IO but would work on all platforms. Comments? How important is !linux support ? regards ronnie sahlberg
Re: [Qemu-devel] Wireshark SCSI dissectors for new transports
Hi Stefan. Wireshark supports a lot more than just SCSI over iSCSI It dissects SCSI over USB, FCOE, raw-FC, FCIP, iFCP (I never got access to traces for mFCP :-( ) and also over NDMP. I never got access to HyperSCSI traces either, so that is missing too. Since I never got HyperSCSI or mFCP (very short-lived attempt from HBA vendors) those two are the only ones today I think where we miss decode. virt-scsi from QEMU sounds interesting! SCSI has a very well defined API in wireshark so assind a new transport should be trivial. I have done so several times. First you need a DLT value from the tcpdump folks to wrap your packets in. Once you have that it should be semi-trivial to hook the SCSI-dissector into your transport/DLT Depending on what the framing looks like, you need at least a wrapper around the SCSI payload that can contain an I_T identifier, then a LUN field, and then scoped per LUN you need a task-tag or similar. Wireshark would need I_T to be able to track initiators and targets separately and form a "conversation" between an arbitrary pair. A LUN identifier to track different luns on the same I_T separately. Finally it also needs a task-tag so that on a specific ILT nexus it will be able to match a SCSI CDB with DATA-IN/OUT blobs and a SCSI response/sense to make it map well you might need to wrap thing inside a struct scsi_wrapper { initiator identifier target identifier lun task-tag opcode (cdb, datain, dataout, response/sense) scsi * } if your transport also supports multiple datain/out blobs for a single task, in order to reassemble the data we would also need a offset/length for each datain/out blob. regards ronnie sahlberg On Tue, Feb 28, 2012 at 8:59 PM, Stefan Hajnoczi wrote: > Wireshark today supports SCSI dissectors for iSCSI. > > In the QEMU system emulator we have an emulated SCSI target which > handles devices for SCSI Parallel Interface (SPI), USB Mass Storage > Device, and now supports the new virtio-scsi transport (for efficient > virtual machine SCSI I/O). > > Cong and I would like to add pcap support to the emulated SCSI target > in QEMU, making it easy to use Wireshark for SCSI debugging and > analysis. I'm looking for advice on getting started because we are > not familiar with Wireshark/dissector internals. > > I believe the SCSI dissector does not support raw CDB dissection - it > requires a SCSI transport dissector (currently iSCSI). A few options > come to mind: > > 1. Change the SCSI dissector to support top-level dissecting of raw > SCSI CDBs without transport information (initiator, target, and other > metadata). > > 2. Add virtio-scsi and perhaps SPI SCSI transport dissectors. > > 3. A simpler approach I'm missing? :) > > Suggestions appreciated. > > Stefan
[Qemu-devel] [PATCH] Rename libiscsi to libiscsiclient
Please find a patch that changes the name of the iscsi library from libiscsi to libiscsiclient While libiscsi is a very nice name that works on all platforms, including win32, it clashes with a linux library. regards ronnie sahlberg
[Qemu-devel] [PATCH] Change library from libiscsi to libiscsiclient to match library rename in libiscsi
Signed-off-by: Ronnie Sahlberg --- configure |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/configure b/configure index fb0e18e..294c0c1 100755 --- a/configure +++ b/configure @@ -2503,9 +2503,9 @@ if test "$libiscsi" != "no" ; then #include int main(void) { iscsi_create_context(""); return 0; } EOF - if compile_prog "-Werror" "-liscsi" ; then + if compile_prog "-Werror" "-liscsiclient" ; then libiscsi="yes" -LIBS="$LIBS -liscsi" +LIBS="$LIBS -liscsiclient" else if test "$libiscsi" = "yes" ; then feature_not_found "libiscsi" -- 1.7.3.1
Re: [Qemu-devel] QEMU desired libiscsi.so clashes with libiscsi.so from iscsi-initiator-utils
Yes, Very unfortuante since libiscsi is such a nice name for a multiplatform library what even works on win32 :-( I have so renamed it to libiscsiclient and sent a patch to qemu to this list to use -liscsiclient instead of -liscsi tarballs can be found at https://github.com/sahlberg/libiscsi/downloads That resolves all issues you are concerned about ? regards ronnie sahlberg On Tue, Feb 14, 2012 at 12:24 AM, Daniel P. Berrange wrote: > I was investigating how to build latest QEMU with the iSCSI block driver > enabled. I saw that configure wanted a libiscsi.so, so I installed that > library from Fedora RPMs via the iscsi-initiator-utils package, but it > still wouldn't build. > > After further investigation, I find that QEMU in fact wants a completely > different, unlreated libiscsi.so library: > > https://github.com/sahlberg/libiscsi > > Obviously we have a problem here because we can't have two different > libraries called libiscsi.so installed at the same time. > > Since iscsi-initiator-utils is a standard Linux distro package whose usage > of libiscsi.so predates this github project, it seems that to resolve this > it will be neccessary to rename the latter. eg perhaps libiscsi-client.so ? > > The followup question is where to find actual libiscsi releases to package > up for OS distros ? It is not very desirable to just package GIT snapshots. > > Regards, > Daniel > -- > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- http://virt-manager.org :| > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|