[Qemu-devel] [Bug 1198350] Re: USB pass-through fails with USBDEVFS_DISCONNECT: Invalid argument
I also have this issue. Does anyone have a work around? (it works with Virtual Box) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1198350 Title: USB pass-through fails with USBDEVFS_DISCONNECT: Invalid argument Status in QEMU: New Bug description: Host Gentoo linux 32bit Guest Windows XP SP3 qemu 1.4.2 and qemu fresh get clone and build 2013-07-04 (version1.5.50) qemu command line qemu-system-i386 -enable-kvm localtime -m 2047 -boot d /archive3/qemu/WindowsXP.img -net nic,model=rtl8139 -net user -usb -device usb-ehci,id=ehci -usbdevice host:1493:19 The device I am trying to use with the guest is an interface for the Suunto Ambit 2 GPS watch which has no linux support. When the USB device is plugged in qemu reports to the command line: USBDEVFS_DISCONNECT: Invalid argument Invalid argument dmesg shows [237755.495968] usb 2-1.5: new full-speed USB device number 34 using ehci-pci [237755.582778] usb 2-1.5: config 1 has an invalid interface number: 1 but max is 0 [237755.582781] usb 2-1.5: config 1 has no interface number 0 [237755.583628] usb 2-1.5: New USB device found, idVendor=1493, idProduct=0019 [237755.583631] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [237755.583633] usb 2-1.5: Product: Ambit [237755.583634] usb 2-1.5: Manufacturer: Suunto [237755.583636] usb 2-1.5: SerialNumber: CE8309511700 [237756.584937] usb 2-1.5: reset full-speed USB device number 34 using ehci-pci [237756.832658] usb 2-1.5: reset full-speed USB device number 34 using ehci-pci [237757.143585] usb 2-1.5: usbfs: process 12684 (qemu-system-i38) did not claim interface 1 before use In the windows guest Device Manager a HID device is listed but nothing else happens, no found new hardware dialog or the Suunto software (which is sitting there waiting) is not triggered as it should be. I have tried successfully with several other devices (flash drive, mouse, printer and video capture device). Because this device pretends to be an HID device my kernel's hid-generic driver was picking it up first until I modified hid-core.c to ignore this vendorid/productid. But still no joy. I'm guessing it has something to do with the the dmesg lines: [237755.582778] usb 2-1.5: config 1 has an invalid interface number: 1 but max is 0 [237755.582781] usb 2-1.5: config 1 has no interface number 0 But read that these warnings are not important though I don't get them for other devices. Nor do I get: [237757.143585] usb 2-1.5: usbfs: process 12684 (qemu-system-i38) did not claim interface 1 before use I've done alot of searching and I've run out of ideas. Any help would be great. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1198350/+subscriptions
Re: [Qemu-devel] [PATCH v2] net/net: Change the default mac address of nic
On 10/14/2013 03:30 PM, Mike Qiu wrote: Sorry for the first version make a mistake of the sender. There is no difference with v1. Thanks Mike The default mac address is 52:54:00:12:34:56 + index, this will cause problem that when we boot up more than one guest with all mac addresses unset by default, assume that each guest has one nic. In this situation, all the guest's nic has the same mac address. This patch is to solve this bug. Signed-off-by: Mike Qiu --- net/net.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/net.c b/net/net.c index c330c9a..2796d04 100644 --- a/net/net.c +++ b/net/net.c @@ -21,6 +21,8 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ +#include + #include "config-host.h" #include "net/net.h" @@ -147,12 +149,13 @@ void qemu_macaddr_default_if_unset(MACAddr *macaddr) if (memcmp(macaddr, &zero, sizeof(zero)) != 0) return; +srand((unsigned)time(NULL)); macaddr->a[0] = 0x52; macaddr->a[1] = 0x54; macaddr->a[2] = 0x00; -macaddr->a[3] = 0x12; -macaddr->a[4] = 0x34; -macaddr->a[5] = 0x56 + index++; +macaddr->a[3] = rand() % 256; +macaddr->a[4] = rand() % 256; +macaddr->a[5] = rand() % 256 + index++; } /**
Re: [Qemu-devel] [PATCH] hmp: Add '\n' in monitor_printf()
On 10/14/2013 10:36 PM, Markus Armbruster wrote: Mike Qiu writes: Without this, output of 'info block' scsi0-hd0: /images/f18-ppc64.qcow2 (qcow2) [not inserted] scsi0-cd2: [not inserted] Removable device: not locked, tray closed floppy0: [not inserted] Removable device: not locked, tray closed sd0: [not inserted] Removable device: not locked, tray closed There will be no additional lines between scsi0-hd0 scsi0-cd2, and break the info style. Just saw a similar one: (qemu) info block disk0: test.img (raw) [not inserted] cd: [not inserted] Removable device: not locked, tray closed foo: tmp.img (raw) Removable device: not locked, tray closed [not inserted](qemu) This patch is to solve this. Signed-off-by: Mike Qiu --- hmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hmp.c b/hmp.c index 5891507..2d2e5f8 100644 --- a/hmp.c +++ b/hmp.c @@ -367,7 +367,7 @@ void hmp_info_block(Monitor *mon, const QDict *qdict) info->value->inserted->iops_wr_max, info->value->inserted->iops_size); } else { -monitor_printf(mon, " [not inserted]"); +monitor_printf(mon, " [not inserted]\n"); } if (verbose) { monitor_printf(mon, "\nImages:\n"); What about removing the newline before "Images"? A good idea I think, it no need to add addition lines in one info. But see commit id: fbe2e26c15af35e4d157874dc80f6a19eebaa83b -if (verbose) { -monitor_printf(mon, " images:\n"); ^^^ -image_info = info->value->inserted->image; -while (1) { - bdrv_image_info_dump((fprintf_function)monitor_printf, - mon, image_info); -if (image_info->has_backing_image) { -image_info = image_info->backing_image; -} else { -break; -} +if (verbose) { +monitor_printf(mon, "\nImages:\n"); ^ +image_info = info->value->inserted->image; +while (1) { + bdrv_image_info_dump((fprintf_function)monitor_printf, + mon, image_info); +if (image_info->has_backing_image) { +image_info = image_info->backing_image; +} else { +break; } } -} else { -monitor_printf(mon, " [not inserted]"); } It was changed to add this, so there maybe some reasons I think, Thanks Mike I think we should also drop this newline: if (info->value->removable) { monitor_printf(mon, "Removable device: %slocked, tray %s\n", info->value->locked ? "" : "not ", info->value->tray_open ? "open" : "closed"); } Maybe more. The function probably needs a systematic newline review.
Re: [Qemu-devel] [PATCH v2] net/net: Change the default mac address of nic
On 10/15/2013 01:07 PM, Stefan Weil wrote: Am 15.10.2013 06:17, schrieb Mike Qiu: Changelog to v1: Find remainder of macaddr->a[5] by modulo 256, otherwise it may be overflow by add index++. The default mac address is 52:54:00:12:34:56 + index, this will cause problem that when we boot up more than one guest with all mac addresses unset by default, assume that each guest has one nic. In this situation, all the guest's nic has the same mac address. This patch is to solve this bug. Signed-off-by: Mike Qiu --- net/net.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/net.c b/net/net.c index c330c9a..9e72764 100644 --- a/net/net.c +++ b/net/net.c @@ -21,6 +21,8 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ +#include + #include "config-host.h" #include "net/net.h" @@ -147,12 +149,13 @@ void qemu_macaddr_default_if_unset(MACAddr *macaddr) if (memcmp(macaddr, &zero, sizeof(zero)) != 0) return; +srand((unsigned)time(NULL)); macaddr->a[0] = 0x52; macaddr->a[1] = 0x54; macaddr->a[2] = 0x00; -macaddr->a[3] = 0x12; -macaddr->a[4] = 0x34; -macaddr->a[5] = 0x56 + index++; +macaddr->a[3] = rand() % 256; +macaddr->a[4] = rand() % 256; +macaddr->a[5] = (rand() % 256 + index++) % 256; } /** There is no overflow which must be handled because a[5] is an uint8_t value, so the assignment automatically limits the range to 0...255. OK, you are right, but I think we'd better to ensure this, even though a[5] is an uint8_t. Is it reasonable to get a random mac address in your guest? I don't think so. It would no longer be possible to connect to a guest using ssh, restart that guest and connect again with ssh. Why not? I have do the experiment, after reboot, the mac is not changed. and the ip address always the same. And can be login to the guest after reboot. If you start more than one guest, you simply have to decide which mac address you want and tell it on the command line. Yes, mostly we should do this, but users sometimes not do this, so this interface should cover this situation. Thank Mike Stefan
Re: [Qemu-devel] [PATCH v2] net/net: Change the default mac address of nic
On 10/15/2013 02:05 PM, Stefan Weil wrote: Am 15.10.2013 07:57, schrieb mike: On 10/15/2013 01:07 PM, Stefan Weil wrote: Am 15.10.2013 06:17, schrieb Mike Qiu: Changelog to v1: Find remainder of macaddr->a[5] by modulo 256, otherwise it may be overflow by add index++. The default mac address is 52:54:00:12:34:56 + index, this will cause problem that when we boot up more than one guest with all mac addresses unset by default, assume that each guest has one nic. In this situation, all the guest's nic has the same mac address. This patch is to solve this bug. Signed-off-by: Mike Qiu --- net/net.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/net.c b/net/net.c index c330c9a..9e72764 100644 --- a/net/net.c +++ b/net/net.c @@ -21,6 +21,8 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ +#include + #include "config-host.h" #include "net/net.h" @@ -147,12 +149,13 @@ void qemu_macaddr_default_if_unset(MACAddr *macaddr) if (memcmp(macaddr, &zero, sizeof(zero)) != 0) return; +srand((unsigned)time(NULL)); macaddr->a[0] = 0x52; macaddr->a[1] = 0x54; macaddr->a[2] = 0x00; -macaddr->a[3] = 0x12; -macaddr->a[4] = 0x34; -macaddr->a[5] = 0x56 + index++; +macaddr->a[3] = rand() % 256; +macaddr->a[4] = rand() % 256; +macaddr->a[5] = (rand() % 256 + index++) % 256; } /** There is no overflow which must be handled because a[5] is an uint8_t value, so the assignment automatically limits the range to 0...255. OK, you are right, but I think we'd better to ensure this, even though a[5] is an uint8_t. Is it reasonable to get a random mac address in your guest? I don't think so. It would no longer be possible to connect to a guest using ssh, restart that guest and connect again with ssh. Why not? I have do the experiment, after reboot, the mac is not changed. and the ip address always the same. And can be login to the guest after reboot. "restart" means terminate QEMU and start it again. OK, qemu support the mac address unset right ? So it may be a joke if just can start only one VM with mac address unset :). In your case, also can use monitor to get the mac address, simply 'info network' :) Then you can do as if you set the mac address. Thanks Mike
Re: [Qemu-devel] [PATCH] hmp: Add '\n' in monitor_printf()
On 10/15/2013 04:58 PM, Kevin Wolf wrote: Am 15.10.2013 um 05:38 hat mike geschrieben: On 10/14/2013 10:36 PM, Markus Armbruster wrote: Mike Qiu writes: Without this, output of 'info block' scsi0-hd0: /images/f18-ppc64.qcow2 (qcow2) [not inserted] scsi0-cd2: [not inserted] Removable device: not locked, tray closed floppy0: [not inserted] Removable device: not locked, tray closed sd0: [not inserted] Removable device: not locked, tray closed There will be no additional lines between scsi0-hd0 scsi0-cd2, and break the info style. Just saw a similar one: (qemu) info block disk0: test.img (raw) [not inserted] cd: [not inserted] Removable device: not locked, tray closed foo: tmp.img (raw) Removable device: not locked, tray closed [not inserted](qemu) This patch is to solve this. Signed-off-by: Mike Qiu --- hmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hmp.c b/hmp.c index 5891507..2d2e5f8 100644 --- a/hmp.c +++ b/hmp.c @@ -367,7 +367,7 @@ void hmp_info_block(Monitor *mon, const QDict *qdict) info->value->inserted->iops_wr_max, info->value->inserted->iops_size); } else { -monitor_printf(mon, " [not inserted]"); +monitor_printf(mon, " [not inserted]\n"); } if (verbose) { monitor_printf(mon, "\nImages:\n"); What about removing the newline before "Images"? A good idea I think, it no need to add addition lines in one info. But see commit id: fbe2e26c15af35e4d157874dc80f6a19eebaa83b [...] It was changed to add this, so there maybe some reasons I think, Like everything else in that commit, I did that change because I found it more readable. The problem seems to be commit 3e9fab69 ('block: Add support for throttling burst max in QMP and the command line'), which added a bogus "[not inserted]" message. We simply need to drop it altogether instead of adding a newline. Yes, I agree with you. but maybe need the author of the commit 3e9fab69 ('block: Add support for throttling burst max in QMP and the command line') to have some comments on this line, I think. I think we should also drop this newline: if (info->value->removable) { monitor_printf(mon, "Removable device: %slocked, tray %s\n", info->value->locked ? "" : "not ", info->value->tray_open ? "open" : "closed"); } Why? Look: (qemu) info block scsi0-cd0: /tmp/cdrom.qcow2 (qcow2) Removable device: not locked, tray closed Backing file: /home/kwolf/images/iso/Fedora-18-x86_64-Live-Desktop.iso (chain depth: 1) I/O throttling: bps=1048576 bps_rd=0 bps_wr=0 bps_max=104857 bps_rd_max=0 bps_wr_max=0 iops=0 iops_rd=0 iops_wr=0 iops_max=0 iops_rd_max=0 iops_wr_max=0 iops_size=0 Do you really want to remove the newline? I'm not, but Markus suggest to do so. Thanks Mike Kevin
Re: [Qemu-devel] [PATCH] hmp: Add '\n' in monitor_printf()
On 10/15/2013 05:31 PM, Markus Armbruster wrote: Kevin Wolf writes: Am 15.10.2013 um 05:38 hat mike geschrieben: On 10/14/2013 10:36 PM, Markus Armbruster wrote: Mike Qiu writes: Without this, output of 'info block' scsi0-hd0: /images/f18-ppc64.qcow2 (qcow2) [not inserted] scsi0-cd2: [not inserted] Removable device: not locked, tray closed floppy0: [not inserted] Removable device: not locked, tray closed sd0: [not inserted] Removable device: not locked, tray closed There will be no additional lines between scsi0-hd0 scsi0-cd2, and break the info style. Just saw a similar one: (qemu) info block disk0: test.img (raw) [not inserted] cd: [not inserted] Removable device: not locked, tray closed foo: tmp.img (raw) Removable device: not locked, tray closed [not inserted](qemu) This patch is to solve this. Signed-off-by: Mike Qiu --- hmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hmp.c b/hmp.c index 5891507..2d2e5f8 100644 --- a/hmp.c +++ b/hmp.c @@ -367,7 +367,7 @@ void hmp_info_block(Monitor *mon, const QDict *qdict) info->value->inserted->iops_wr_max, info->value->inserted->iops_size); } else { -monitor_printf(mon, " [not inserted]"); +monitor_printf(mon, " [not inserted]\n"); } if (verbose) { monitor_printf(mon, "\nImages:\n"); What about removing the newline before "Images"? A good idea I think, it no need to add addition lines in one info. But see commit id: fbe2e26c15af35e4d157874dc80f6a19eebaa83b [...] It was changed to add this, so there maybe some reasons I think, Like everything else in that commit, I did that change because I found it more readable. The problem seems to be commit 3e9fab69 ('block: Add support for throttling burst max in QMP and the command line'), which added a bogus "[not inserted]" message. We simply need to drop it altogether instead of adding a newline. I think we should also drop this newline: if (info->value->removable) { monitor_printf(mon, "Removable device: %slocked, tray %s\n", info->value->locked ? "" : "not ", info->value->tray_open ? "open" : "closed"); } Why? Look: (qemu) info block scsi0-cd0: /tmp/cdrom.qcow2 (qcow2) Removable device: not locked, tray closed Backing file: /home/kwolf/images/iso/Fedora-18-x86_64-Live-Desktop.iso (chain depth: 1) I/O throttling: bps=1048576 bps_rd=0 bps_wr=0 bps_max=104857 bps_rd_max=0 bps_wr_max=0 iops=0 iops_rd=0 iops_wr=0 iops_max=0 iops_rd_max=0 iops_wr_max=0 iops_size=0 Do you really want to remove the newline? This one made me think I do: foo: tmp.img (raw) Removable device: not locked, tray closed [not inserted](qemu) If the '[not inserted]' is wrong and needs to go, then I actually don't. Here '[not inserted]' is very strange, if the commit author has some reasonable reasons, we can keep it, otherwise, I think we should remove it. Thanks Mike
Re: [Qemu-devel] [PATCH v2] net/net: Change the default mac address of nic
On 10/15/2013 08:36 PM, Eric Blake wrote: On 10/14/2013 11:07 PM, Stefan Weil wrote: Is it reasonable to get a random mac address in your guest? I don't think so. It would no longer be possible to connect to a guest using ssh, restart that guest and connect again with ssh. Agreed - libvirt ALWAYS passes a MAC to qemu, even if the user did not specify a MAC to libvirt, precisely because the MAC must be reproducible rather than random to avoid changing the guest ABI. I don't think this patch is needed - it's up to management to use qemu correctly. Yes, you are right in this condition. But qemu support Mac address unset. Also we can get the ip address through a lot of different ways, like use monitor to get the mac and then get the ip. So we can login use ssh. But as you mentioned, this patch is not needed, I don't agree with you. First, this patch just fix the Potential issue of this feature. Now libvirt maybe can't triggered this issue, who can promise in future will not. The second is, qemu not only be used by libvirt, lots of developers like to use the command line to boot up the guest. And in the future, we are not sure about other program will use qemu. The third is, when one feature has a issue in qemu, no matter when it is been triggered, should we not fix it? Mike Thanks
Re: [Qemu-devel] [PATCH] hmp: Add '\n' in monitor_printf()
On 10/16/2013 03:56 PM, Wenchao Xia wrote: 于 2013/10/16 15:47, Markus Armbruster 写道: Wenchao Xia writes: 于 2013/10/15 18:07, mike 写道: On 10/15/2013 04:58 PM, Kevin Wolf wrote: Am 15.10.2013 um 05:38 hat mike geschrieben: On 10/14/2013 10:36 PM, Markus Armbruster wrote: Mike Qiu writes: Without this, output of 'info block' scsi0-hd0: /images/f18-ppc64.qcow2 (qcow2) [not inserted] scsi0-cd2: [not inserted] Removable device: not locked, tray closed floppy0: [not inserted] Removable device: not locked, tray closed sd0: [not inserted] Removable device: not locked, tray closed There will be no additional lines between scsi0-hd0 scsi0-cd2, and break the info style. Just saw a similar one: (qemu) info block disk0: test.img (raw) [not inserted] cd: [not inserted] Removable device: not locked, tray closed foo: tmp.img (raw) Removable device: not locked, tray closed [not inserted](qemu) This patch is to solve this. Signed-off-by: Mike Qiu --- hmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hmp.c b/hmp.c index 5891507..2d2e5f8 100644 --- a/hmp.c +++ b/hmp.c @@ -367,7 +367,7 @@ void hmp_info_block(Monitor *mon, const QDict *qdict) info->value->inserted->iops_wr_max, info->value->inserted->iops_size); } else { - monitor_printf(mon, " [not inserted]"); + monitor_printf(mon, " [not inserted]\n"); } if (verbose) { monitor_printf(mon, "\nImages:\n"); What about removing the newline before "Images"? A good idea I think, it no need to add addition lines in one info. But see commit id: fbe2e26c15af35e4d157874dc80f6a19eebaa83b [...] It was changed to add this, so there maybe some reasons I think, Like everything else in that commit, I did that change because I found it more readable. The problem seems to be commit 3e9fab69 ('block: Add support for throttling burst max in QMP and the command line'), which added a bogus "[not inserted]" message. We simply need to drop it altogether instead of adding a newline. Yes, I agree with you. but maybe need the author of the commit 3e9fab69 ('block: Add support for throttling burst max in QMP and the command line') to have some comments on this line, I think. I think we should also drop this newline: if (info->value->removable) { monitor_printf(mon, " Removable device: %slocked, tray %s\n", info->value->locked ? "" : "not ", info->value->tray_open ? "open" : "closed"); } Why? Look: (qemu) info block scsi0-cd0: /tmp/cdrom.qcow2 (qcow2) Removable device: not locked, tray closed Backing file: /home/kwolf/images/iso/Fedora-18-x86_64-Live-Desktop.iso (chain depth: 1) I/O throttling: bps=1048576 bps_rd=0 bps_wr=0 bps_max=104857 bps_rd_max=0 bps_wr_max=0 iops=0 iops_rd=0 iops_wr=0 iops_max=0 iops_rd_max=0 iops_wr_max=0 iops_size=0 Do you really want to remove the newline? I'm not, but Markus suggest to do so. Why the new line matters? I think there is a QMP interface available, so the hmp output format would not bother much, just need to tip clear the info. I want this fixed: $ qemu -nodefaults -nographic -monitor stdio -S -drive if=none,file=tmp.img QEMU 1.6.50 monitor - type 'help' for more information (qemu) info block none0: tmp.img (raw) Removable device: not locked, tray closed [not inserted](qemu) Output doesn't end with newline, messing up the prompt. I see, no end with newline is bad, should fix. OK, I will make a patch to drop this ' [not inserted]' line instead of add a new line. Thanks Mike Elswhere in the thread, we concluded that the offending '[not inserted]' is bogus, and should be dropped.
Re: [Qemu-devel] [PATCH v2] hmp: solve '\n' in monitor_printf()
On 10/17/2013 04:11 PM, Kevin Wolf wrote: Am 17.10.2013 um 05:16 hat Mike Qiu geschrieben: Change to v1: remove '[not inserted]' line instead of adding '\n' Please put such notes for reviewers below the --- line, so that git am automatically removes them and they don't end up in the git commit message. OK, thanks for your kindly remind, Thanks Mike Output of 'info block' scsi0-hd0: /images/f18-ppc64.qcow2 (qcow2) [not inserted] scsi0-cd2: [not inserted] Removable device: not locked, tray closed floppy0: [not inserted] Removable device: not locked, tray closed sd0: [not inserted] Removable device: not locked, tray closed There will be no additional lines between scsi0-hd0 and scsi0-cd2. At the same time, scsi0-hd0 already inserted, but still has '[not inserted]' flag. This line should be removed. This patch is to solve this. Signed-off-by: Mike Qiu Reviewed-by: Kevin Wolf
Re: [Qemu-devel] [PATCH v2] hmp: solve '\n' in monitor_printf()
On 10/17/2013 04:14 PM, Stefan Hajnoczi wrote: On Wed, Oct 16, 2013 at 11:16:01PM -0400, Mike Qiu wrote: Change to v1: remove '[not inserted]' line instead of adding '\n' Output of 'info block' scsi0-hd0: /images/f18-ppc64.qcow2 (qcow2) [not inserted] scsi0-cd2: [not inserted] Removable device: not locked, tray closed floppy0: [not inserted] Removable device: not locked, tray closed sd0: [not inserted] Removable device: not locked, tray closed There will be no additional lines between scsi0-hd0 and scsi0-cd2. At the same time, scsi0-hd0 already inserted, but still has '[not inserted]' flag. This line should be removed. This patch is to solve this. Signed-off-by: Mike Qiu --- hmp.c | 2 -- 1 file changed, 2 deletions(-) The commit message is out-of-date, this patch now drops the bogus "[not inserted]" output. I fixed up the commit message. In the future, please put the patch changelog after the "---" line so that git am does not include the changelog in the commit description. OK, Got it, thanks for your kindly remind :) Thanks Mike Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan
Re: [Qemu-devel] [Bug 1174654] Re: qemu-system-x86_64 takes 100% CPU after host machine resumed from suspend to ram
On 10/18/2013 04:29 AM, tobias wrote: hi, tried your option but it does not help. (cpu usage is still high) below my command line syntax: qemu-system-x86_64 -global mc146818rtc.lost_tick_policy=slew -machine accel=kvm:tcg -name win7 -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 813f5806-64ec-3319-452a-5e1834e753c9 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -drive file=/data/vmware/win7.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -device usb-tablet,id=input0 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -vga std Hi, have you enable the kernel CPU idle driver? especially the guest kernel. Thanks Mike
Re: [Qemu-devel] [PATCH v2] net/net: Change the default mac address of nic
On 10/17/2013 08:30 PM, Stefan Hajnoczi wrote: On Tue, Oct 15, 2013 at 09:33:06PM +0800, mike wrote: On 10/15/2013 08:36 PM, Eric Blake wrote: On 10/14/2013 11:07 PM, Stefan Weil wrote: Is it reasonable to get a random mac address in your guest? I don't think so. It would no longer be possible to connect to a guest using ssh, restart that guest and connect again with ssh. Agreed - libvirt ALWAYS passes a MAC to qemu, even if the user did not specify a MAC to libvirt, precisely because the MAC must be reproducible rather than random to avoid changing the guest ABI. I don't think this patch is needed - it's up to management to use qemu correctly. Yes, you are right in this condition. But qemu support Mac address unset. Also we can get the ip address through a lot of different ways, like use monitor to get the mac and then get the ip. So we can login use ssh. But as you mentioned, this patch is not needed, I don't agree with you. First, this patch just fix the Potential issue of this feature. Now libvirt maybe can't triggered this issue, who can promise in future will not. The second is, qemu not only be used by libvirt, lots of developers like to use the command line to boot up the guest. And in the future, we are not sure about other program will use qemu. The third is, when one feature has a issue in qemu, no matter when it is been triggered, should we not fix it? NACK I'm not going to merge this patch: If you terminate QEMU and launch it again the NIC gets a different MAC address. Some guest operating systems are sensitive to this - under For these users must use -device ,mac=XX:XX:XX:XX:XX:XX. I think no body will boot up the guest, which sensitive to this, without mac address. Actually, people use the command line without mac address, mean they mainly don't care about mac address, so give them random mac address is reasonable I think. In my opinion, if we fix this, for qemu side no any issue, we both support mac address set or unset correctly. What am I confuse is, *qemu supports mac address unset, why we force users must set the address when more than one guests*? This is unreasonable. many Linux distros the network interfaces names change due to the MAC address change. As a result firewall configuration will break and other services may fail to start because they cannot find the interface. Agree, so this mac address should set in qemu command line as libvirt does :) If you have multiple guests or want control over the MAC address, set it explicitly using -device ,mac=XX:XX:XX:XX:XX:XX. Currently, especially for developers, people mainly use qemu command line directly, and as qemu supports mac address unset, they may try the simplest command line to boot up lots of guests, they will confuse about why all this guest use the same mac address. Thanks Mike Stefan
Re: [Qemu-devel] [Bug 1174654] Re: qemu-system-x86_64 takes 100% CPU after host machine resumed from suspend to ram
On 10/18/2013 03:12 PM, tobias wrote: Hello Mike, Thanks a lot for getting back on this. Is the "cpu idle driver" a command line option I need to specify for qemu (the -cpu option ?) I could not find a reference to "idle" in the man page. You need to check the guest kernel config file. Thanks Mike regards, Tobias. On 18-10-13 04:33, mike wrote: On 10/18/2013 04:29 AM, tobias wrote: hi, tried your option but it does not help. (cpu usage is still high) below my command line syntax: qemu-system-x86_64 -global mc146818rtc.lost_tick_policy=slew -machine accel=kvm:tcg -name win7 -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 813f5806-64ec-3319-452a-5e1834e753c9 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -drive file=/data/vmware/win7.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -device usb-tablet,id=input0 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -vga std Hi, have you enable the kernel CPU idle driver? especially the guest kernel. Thanks Mike
Re: [Qemu-devel] [PATCH v2] net/net: Change the default mac address of nic
On 10/18/2013 05:00 PM, Stefan Hajnoczi wrote: On Fri, Oct 18, 2013 at 10:54:13AM +0800, mike wrote: NACK I'm not going to merge this patch: If you terminate QEMU and launch it again the NIC gets a different MAC address. Some guest operating systems are sensitive to this - under For these users must use -device ,mac=XX:XX:XX:XX:XX:XX. I think no body will boot up the guest, which sensitive to this, without mac address. Actually, people use the command line without mac address, mean they mainly don't care about mac address, so give them random mac address is reasonable I think. In my opinion, if we fix this, for qemu side no any issue, we both support mac address set or unset correctly. What am I confuse is, *qemu supports mac address unset, why we force users must set the address when more than one guests*? This is unreasonable. many Linux distros the network interfaces names change due to the MAC address change. As a result firewall configuration will break and other services may fail to start because they cannot find the interface. Agree, so this mac address should set in qemu command line as libvirt does :) If you have multiple guests or want control over the MAC address, set it explicitly using -device ,mac=XX:XX:XX:XX:XX:XX. Currently, especially for developers, people mainly use qemu command line directly, and as qemu supports mac address unset, they may try the simplest command line to boot up lots of guests, they will confuse about why all this guest use the same mac address. Your argument is weak: *you* want to avoid specifying the MAC address so in exchange you want to *break* existing configurations and force other people to start specifying a MAC address. OK, I do not want to break the existing configurations. I'm fine if you do not want to merge this patch. But I think this should be an issue of qemu, and need to do something on it, so I make this patch. Can we try other solutions to solve this issue? (if you agree this should be an issue) Also this potential issue can happens if the user set the same mac address with more than one guest on one host. Can we avoid this ? Thanks Mike This doesn't improve anything, it will just annoy users and cause bug reports. Sorry that there isn't a solution that satisfies everyone, you'll have to add a MAC address to your command-line. Stefan
Re: [Qemu-devel] [PATCH] spapr: increase temporary fdt buffer size
On 09/24/2013 01:59 PM, Alexey Kardashevskiy wrote: At the moment the size of the buffer is set to 64K which is enough for approximately 150 VCPUs which is not the limit. This increases the buffer up to 256K which allows having a tree for approximately 600 VCPUs which is way beyond the real number we need. As only the real size of the tree is copied to the guest, there will be no impact on existing configurations. Signed-off-by: Alexey Kardashevskiy --- hw/ppc/spapr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1bb76f1..063fe63 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -62,7 +62,7 @@ * * We load our kernel at 4M, leaving space for SLOF initial image */ -#define FDT_MAX_SIZE0x1 +#define FDT_MAX_SIZE0x4 #define RTAS_MAX_SIZE 0x1 #define FW_MAX_SIZE 0x40 #define FW_FILE_NAME"slof.bin" Signed-off-by: Mike Qiu
[Qemu-devel] QEMU built from TCC
Can QEMU be built from TCC? Mike
[Qemu-devel] [Bug 818673] Re: virtio: trying to map MMIO memory
Does this Bug similiar with https://bugzilla.redhat.com/show_bug.cgi?id=771390 ? ** Bug watch added: Red Hat Bugzilla #771390 https://bugzilla.redhat.com/show_bug.cgi?id=771390 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/818673 Title: virtio: trying to map MMIO memory Status in QEMU: New Bug description: Qemu host is Core i7, running Linux. Guest is Windows XP sp3. Often, qemu will crash shortly after starting (1-5 minutes) with a statement "qemu-system-x86_64: virtio: trying to map MMIO memory" This has occured with qemu-kvm 0.14, qemu-kvm 0.14.1, qemu-0.15.0-rc0 and qemu 0.15.0-rc1. Qemu is started as such: qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid -drive file=/home/rick/qemu/hds/wxp.raw,if=virtio -m 768 -name WinXP -net nic,model=virtio -net user -localtime -usb -vga qxl -device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice port=1234,disable-ticketing -daemonize -monitor telnet:localhost:12341,server,nowait The WXP guest has virtio 1.1.16 drivers for net and scsi, and the most current spice binaries from spice-space.org. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/818673/+subscriptions
[Qemu-devel] [Bug 601946] Re: [Feature request] qemu-img multi-threaded compressed image conversion
The compression in this case is certainly chunked already, otherwise you couldn't implement a pseudo block device without reading the entire stream to read the last block! As the data in the new disk is necessarily chunk compressed, parallelisation is perfect feasible, it's just a question of the algorithm you use to arbitrate the work between the threads, which may need some thought as you'd likely be navigating a tree structure. There's no question that Jes' suggestion would create a 12x speed up for me, and there's pretty standard off the shelf server hardware with 48 cores. As Jan-Simon Möller points out, being single-threaded and single-process isn't much of an option any more. If one is trying to compress, say, a 4TB virtual disk image then using a little over 2% of the available CPU time meaning you have to wait a week is going to be... frustrating :) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/601946 Title: [Feature request] qemu-img multi-threaded compressed image conversion Status in QEMU: New Bug description: Feature request: qemu-img multi-threaded compressed image conversion Suppose I want to convert raw image to compressed qcow2. Multi- threaded conversion will be much faster, because bottleneck is compressing data. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/601946/+subscriptions
Re: [Qemu-devel] [PATCH] net: Allow specifying ifname for qemu-bridge-helper
On 10/12/2012 12:49 AM, Mike Lovell wrote: This makes a few changes to allow ifname to be specified when using qemu-bridge-helper with both the bridge and tap network interfaces. It adds the --ifname option to qemu-bridge-helper, removes the restriction that ifname cannot be specified with helper for the tap interface, and adds logic to specify the --ifname option when exec'ing the helper. Signed-off-by: Mike Lovell --- This feature was originally requested by Mario De Chenno on the qemu-devel mailing list. Seems pretty simple and figured it was something I could throw together pretty quickly. I have tested the following combinations of invoking qemu (where qbr is qemu-bridge-helper) qemu-system-x86_64 -net nic -net tap,helper="qbr --br=test1" qemu-system-x86_64 -net nic -net tap,helper="qbr --br=test1",ifname=vm1 qemu-system-x86_64 -net nic -net tap,helper=qbr qemu-system-x86_64 -net nic -net tap,helper=qbr,ifname=vm1 qemu-system-x86_64 -net nic -net bridge,helper=qbr qemu-system-x86_64 -net nic -net bridge,helper=qbr,ifname=vm1 qemu-system-x86_64 -net nic -net bridge,helper=qbr,ifname=vm1,br=test1 qemu-system-x86_64 -net nic -net bridge,helper=qbr,br=test1 I realized there were a couple more cases to check. I also tested the following. qemu-system-x86_64 -net nic -net tap,helper="qbr --ifname=vm1" qemu-system-x86_64 -net nic -net tap,helper="qbr --br=test1 --ifname=vm1" qemu-system-x86_64 -net nic -net tap,helper="qbr --ifname=vm1",ifname=foo qemu-system-x86_64 -net nic -net tap,helper="qbr --br=test1 --ifname=vm1",ifname=foo In the last two cases, the --ifname specified in the helper option (in this case, vm1) is used over the tap ifname option (in this case, foo). mike
[Qemu-devel] [PATCH] net: Allow specifying ifname for qemu-bridge-helper
This makes a few changes to allow ifname to be specified when using qemu-bridge-helper with both the bridge and tap network interfaces. It adds the --ifname option to qemu-bridge-helper, removes the restriction that ifname cannot be specified with helper for the tap interface, and adds logic to specify the --ifname option when exec'ing the helper. Signed-off-by: Mike Lovell --- This feature was originally requested by Mario De Chenno on the qemu-devel mailing list. Seems pretty simple and figured it was something I could throw together pretty quickly. I have tested the following combinations of invoking qemu (where qbr is qemu-bridge-helper) qemu-system-x86_64 -net nic -net tap,helper="qbr --br=test1" qemu-system-x86_64 -net nic -net tap,helper="qbr --br=test1",ifname=vm1 qemu-system-x86_64 -net nic -net tap,helper=qbr qemu-system-x86_64 -net nic -net tap,helper=qbr,ifname=vm1 qemu-system-x86_64 -net nic -net bridge,helper=qbr qemu-system-x86_64 -net nic -net bridge,helper=qbr,ifname=vm1 qemu-system-x86_64 -net nic -net bridge,helper=qbr,ifname=vm1,br=test1 qemu-system-x86_64 -net nic -net bridge,helper=qbr,br=test1 net/tap.c| 39 --- qapi-schema.json |3 ++- qemu-bridge-helper.c | 10 +++--- 3 files changed, 37 insertions(+), 15 deletions(-) diff --git a/net/tap.c b/net/tap.c index a88ae8f..cfb5bff 100644 --- a/net/tap.c +++ b/net/tap.c @@ -417,11 +417,13 @@ static int recv_fd(int c) return len; } -static int net_bridge_run_helper(const char *helper, const char *bridge) +static int net_bridge_run_helper(const char *helper, + const char *bridge, + const char *ifname) { sigset_t oldmask, mask; int pid, status; -char *args[5]; +char *args[6]; char **parg; int sv[2]; @@ -439,7 +441,9 @@ static int net_bridge_run_helper(const char *helper, const char *bridge) int open_max = sysconf(_SC_OPEN_MAX), i; char fd_buf[6+10]; char br_buf[6+IFNAMSIZ] = {0}; -char helper_cmd[PATH_MAX + sizeof(fd_buf) + sizeof(br_buf) + 15]; +char ifname_buf[10+IFNAMSIZ] = {0}; +char helper_cmd[PATH_MAX + sizeof(fd_buf) + sizeof(br_buf) + +sizeof(ifname_buf) + 15]; for (i = 0; i < open_max; i++) { if (i != STDIN_FILENO && @@ -459,8 +463,13 @@ static int net_bridge_run_helper(const char *helper, const char *bridge) snprintf(br_buf, sizeof(br_buf), "%s%s", "--br=", bridge); } -snprintf(helper_cmd, sizeof(helper_cmd), "%s %s %s %s", - helper, "--use-vnet", fd_buf, br_buf); +if ((strstr(helper, "--ifname=") == NULL) && (ifname != NULL)) { +snprintf(ifname_buf, sizeof(ifname_buf), "%s%s" , + "--ifname=", ifname); +} + +snprintf(helper_cmd, sizeof(helper_cmd), "%s %s %s %s %s", + helper, "--use-vnet", fd_buf, br_buf, ifname_buf); parg = args; *parg++ = (char *)"sh"; @@ -473,12 +482,17 @@ static int net_bridge_run_helper(const char *helper, const char *bridge) /* assume helper is just the executable path name */ snprintf(br_buf, sizeof(br_buf), "%s%s", "--br=", bridge); +if (ifname != NULL) { +snprintf(ifname_buf, sizeof(ifname_buf), "%s%s" , + "--ifname=", ifname); +} parg = args; *parg++ = (char *)helper; *parg++ = (char *)"--use-vnet"; *parg++ = fd_buf; *parg++ = br_buf; +*parg++ = ifname_buf; *parg++ = NULL; execv(helper, args); @@ -517,7 +531,7 @@ int net_init_bridge(const NetClientOptions *opts, const char *name, NetClientState *peer) { const NetdevBridgeOptions *bridge; -const char *helper, *br; +const char *helper, *br, *ifname; TAPState *s; int fd, vnet_hdr; @@ -527,8 +541,9 @@ int net_init_bridge(const NetClientOptions *opts, const char *name, helper = bridge->has_helper ? bridge->helper : DEFAULT_BRIDGE_HELPER; br = bridge->has_br ? bridge->br : DEFAULT_BRIDGE_INTERFACE; +ifname = bridge->has_ifname ? bridge->ifname : NULL; -fd = net_bridge_run_helper(helper, br); +fd = net_bridge_run_helper(helper, br, ifname); if (fd == -1) { return -1; } @@ -622,14 +637,16 @@ int net_init_tap(const NetClientOptions *opts, const char *name, model = "tap"; } else if (tap->has_helper) { -if (tap->has_ifname || tap->has_script || tap->h
Re: [Qemu-devel] [PATCH] net: Allow specifying ifname for qemu-bridge-helper
On 10/12/2012 02:32 AM, Michael Tokarev wrote: On 12.10.2012 10:49, Mike Lovell wrote: /* request a tap device, disable PI, and add vnet header support if - * requested and it's available. */ -prep_ifreq(&ifr, "tap%d"); + * requested and it's available. use ifname if provided for tap name. */ +prep_ifreq(&ifr, ifname != NULL ? ifname : "tap%d"); Should we check for special symbols here? prep_ifreq() does this: snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname); so at least it ensures we have length constraint. I tried the code as is with specifying ifnames with various random combinations of special characters. Some of them we just allowed through, some caused an error when initializing the tap device, and some cause problems in the shell invoking qemu. I think the linux kernel does the necessary checking during the TUNSETIFF ioctl and the qemu-bridge-helper exits with an error if there was a problem. Actually I'm not so sure anymore this is a good idea. For example, system may have firewall (iptables) rules in place for, say, future ppp interfaces for ppp clients, and this way we may request the interface to be named pppX and be allowed to send packets where we don't usually have access to. While I admit this does have that possibility, I'm not sure its a qemu problem. I don't know what the original motivation for the request was but I could see this being the reason for the request. Some administrator sets up firewall rules for a variety of guests ahead of actually running them and needs to specify the interface at runtime. Also, without using the helper programs, the qemu already allows specifying arbitrary names such as ppp0. Maybe - at least - require some common prefix for the interfaces created this way, so we'll live in our own, easily distinguishable namespace -- like, qvif* (from Qemu Virtual InterFace)? I do like the idea of using a common prefix for the default name of tap devices. Something like "qvif%d" instead of "tap%d" in tap initialization code. But something tells me this could break compatibility with external management software where something might be expecting the interface name to start with tap. mike
[Qemu-devel] centos 6.3 and nested kvm
i have been using kvm in an nested environment for a couple months now. things have appeared to be working fine. i have used both intel (westmere-ep) hosts and, when starting the first level guest, i have specified '-cpu qemu64,+vmx' to pass the kvm support through. when using a centos 6.2 guest, i am able to load the kvm and kvm_intel modules and use it. the problem is today i tried updating my centos 6.2 guest to 6.3 and now the kvm_intel module fails to load in this guest. there aren't any kernel messages that i can see. does anyone on the list have an idea of what might have changed. the 6.2 system had 2.6.32-220.el6.x86_64 as the kernel version and 6.3 has 2.6.32-279.11.1.el6. the exact error i get when trying to load the module is FATAL: Error inserting kvm_intel (/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/arch/x86/kvm/kvm-intel.ko): Input/output error anyone know of anything that might have changed in the kvm_intel module or elsewhere that would cause this to break? mike
Re: [Qemu-devel] centos 6.3 and nested kvm
On 10/30/2012 02:02 PM, Mike Lovell wrote: i have been using kvm in an nested environment for a couple months now. things have appeared to be working fine. i have used both intel (westmere-ep) hosts and, when starting the first level guest, i have specified '-cpu qemu64,+vmx' to pass the kvm support through. when using a centos 6.2 guest, i am able to load the kvm and kvm_intel modules and use it. the problem is today i tried updating my centos 6.2 guest to 6.3 and now the kvm_intel module fails to load in this guest. there aren't any kernel messages that i can see. does anyone on the list have an idea of what might have changed. the 6.2 system had 2.6.32-220.el6.x86_64 as the kernel version and 6.3 has 2.6.32-279.11.1.el6. the exact error i get when trying to load the module is FATAL: Error inserting kvm_intel (/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/arch/x86/kvm/kvm-intel.ko): Input/output error anyone know of anything that might have changed in the kvm_intel module or elsewhere that would cause this to break? i probably should have specified this before but i'm using qemu 1.1.2 on the host. mike
[Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
Hi all, Here is something I've been tinkering with the past few weeks and now have it in a state where the basic idea makes sense, it works, and could use some feedback from the community. This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I first had the idea when I was playing with the udp and mcast socket network backends while exploring how to build a VM infrastructure. I liked the idea of using the sockets backends cause it doesn't require escalated permissions to configure and run as well as the ability to talk over IP networks. But the built in socket backends either allowed for only 2 guests talking directly or for multiple guests where all traffic is sent to all. So one can either have two guests talking or have bandwidth wasted with multiple guests. There wasn't something that could talk to multiple guests but also utilize unicast traffic. So I made a backend that can do this. It takes the basics of how the udp and mcast socket backends work and combines them with some switching based on the ethernet packets. The result is multiple guests can talk to each other but not waste bandwidth by delivering unicast traffic to all guests. The backend also adds some header data to each packet. This header includes a network identifier so multiple logical networks can be created using the same multicast configuration but still have separation in the guests. Its kind of like VXLAN or NVGRE but replace the GRE tunnels with UDP packets. There are a couple advantages that I see to this. It allows for multiple guests in multiple locations to talk to each other while keeping unicast traffic to just between two hosts. It doesn't require root permissions to run. It can operate over non-ethernet networks (like IPoIB). It doesn't require changing network configuration on the host. It allows for a ton of logical networks to be created (currently 65536 per multicast address and port combination). There are a few disadvantages as well. It does add some more processing to the QEMU process but not much (I saw it go as fast as the socket backends). It is encapsulating an Ethernet frame inside a UDP packet so there is the overhead of the IP and UDP headers as well as the transport medium headers (most likely Ethernet again). Because there is additional header data and MTU of the guest could be limited depending on the ability to send larger multicast packet from the host. (I haven't really looked closely at this last one). There isn't the ability for something besides QEMU processes to communicate using this, though I hope to build a utility to work with a tap device. Overall, I think this is something that's pretty cool. I don't know how much people give any thought to the socket backends for real world use and so I don't know if this would be of much use to anyone. I am looking for some feedback into what the community thinks and for comments about the code. Its only my second time doing more than 20 lines of C so I'm sure I did some stupid things. I have only tested on 64 bit x86 Linux systems so far. Hopefully you all have good things to say. :) mike
[Qemu-devel] [PATCH] Initial commit for QDES - QEMU Distributed Ethernet Switch
This commit adds a new network backend to QEMU. It combines the basic behavior of the unicast udp and multicast socket backends with some intelligence about the source and destination of the packets. It also adds a header to the packets to allow for creating multiple logical networks using the same underlying network infrastructure. (Kind of like GRE Keys). This provides a network backend that acts as a type of Ethernet switch. During initialization, QDES will create two sockets. One that is used for receiving unicast udp traffic and for sending both unicast and multicast udp traffic. The second socket is for receiving multicast udp traffic. A GHashTable is also created that will serve as an address table for where packets should be delivered. A timer is also configured to regularly clean the contents of the address table. When a packet is received by either of the two sockets, the header is parsed and checked to make sure the packet is a member of the correct logical network. Then the ethernet frame that is received is inspected for the source MAC address. The address table is then updated with the source MAC address, the address of where the packets was received from, and the current time. When QEMU delivers a packet to be sent, the destination MAC address is looked for in the address table. If it is found, the packet is sent to the remote address stored in the table along with the approriate header. If the address is not found, the packet and header is sent to the configured multicast address so all members of the network will receive the packet. Unfortunately, only IPv4 is currently supported. IPv6 is on the short list of improvements to be made. Signed-off-by: Mike Lovell --- hmp-commands.hx |2 +- net.c | 31 +++- net.h |1 + net/Makefile.objs |1 + net/qdes.c| 453 + net/qdes.h| 38 + net/socket.c |3 +- net/socket.h |3 + qemu-options.hx |5 + 9 files changed, 533 insertions(+), 4 deletions(-) create mode 100644 net/qdes.c create mode 100644 net/qdes.h diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..042bb85 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1007,7 +1007,7 @@ ETEXI { .name = "host_net_add", .args_type = "device:s,opts:s?", -.params = "tap|user|socket|vde|dump [options]", +.params = "tap|user|socket|qdes|vde|dump [options]", .help = "add host VLAN client", .mhandler.cmd = net_host_device_add, }, diff --git a/net.c b/net.c index 4aa416c..dbb4a48 100644 --- a/net.c +++ b/net.c @@ -30,6 +30,7 @@ #include "net/dump.h" #include "net/slirp.h" #include "net/vde.h" +#include "net/qdes.h" #include "net/util.h" #include "monitor.h" #include "qemu-common.h" @@ -1016,6 +1017,30 @@ static const struct { { /* end of list */ } }, }, +[NET_CLIENT_TYPE_QDES] = { +.type = "qdes", +.init = net_init_qdes, +.desc = { +NET_COMMON_PARAMS_DESC, +{ +.name = "timer", +.type = QEMU_OPT_NUMBER, +.help = "Seconds between cleaning mac address table" +}, { +.name = "mcast", +.type = QEMU_OPT_STRING, +.help = "UDP multicast address and port number", +}, { +.name = "localaddr", +.type = QEMU_OPT_STRING, +.help = "source address for multicast and udp packets", +}, { +.name = "network", +.type = QEMU_OPT_NUMBER, +.help = "qdes network number", +}, +}, +}, #ifdef CONFIG_VDE [NET_CLIENT_TYPE_VDE] = { .type = "vde", @@ -1104,7 +1129,8 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error **errp) #ifdef CONFIG_VDE strcmp(type, "vde") != 0 && #endif -strcmp(type, "socket") != 0) { +strcmp(type, "socket") != 0 && +strcmp(type, "qdes") != 0) { error_set(errp, QERR_INVALID_PARAMETER_VALUE, "type", "a netdev backend type"); return -1; @@ -1170,7 +1196,7 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error **errp) static int net_host_check_device(const char *device) { int i; -const char *valid_param_list[] = { "tap", "socket", "dump" +const char *valid_param_list[] = { "tap", "socket", "dump", "qdes" #ifdef CONFIG_NET_BRIDGE
Re: [Qemu-devel] [PATCH] Initial commit for QDES - QEMU Distributed Ethernet Switch
I'm not sure why but it looks like my intro email for this got eaten by something. Here it is again and sorry if it shows up twice. This is my first time posting to the list and submitting a patch and I guess something doesn't like the way I did it. - Hi all, Here is something I've been tinkering with the past few weeks and now have it in a state where the basic idea makes sense, it works, and could use some feedback from the community. This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I first had the idea when I was playing with the udp and mcast socket network backends while exploring how to build a VM infrastructure. I liked the idea of using the sockets backends cause it doesn't require escalated permissions to configure and run as well as the ability to talk over IP networks. But the built in socket backends either allowed for only 2 guests talking directly or for multiple guests where all traffic is sent to all. So one can either have two guests talking or have bandwidth wasted with multiple guests. There wasn't something that could talk to multiple guests but also utilize unicast traffic. So I made a backend that can do this. It takes the basics of how the udp and mcast socket backends work and combines them with some switching based on the ethernet packets. The result is multiple guests can talk to each other but not waste bandwidth by delivering unicast traffic to all guests. The backend also adds some header data to each packet. This header includes a network identifier so multiple logical networks can be created using the same multicast configuration but still have separation in the guests. There are a couple advantages that I see to this. It allows for multiple guests in multiple locations to talk to each other while keeping unicast traffic to just between two hosts. It doesn't require root permissions to run. It can operate over non-ethernet networks (like IPoIB). It doesn't require changing network configuration on the host. It allows for a ton of logical networks to be created (currently 65536 per multicast address and port combination). There are a few disadvantages as well. It does add some more processing to the QEMU process but not much (I saw it go as fast as the socket backends). It is encapsulating an Ethernet frame inside a UDP packet so there is the overhead of the IP and UDP headers as well as the transport medium headers (most likely Ethernet again). Because there is additional header data and MTU of the guest could be limited depending on the ability to send larger multicast packet from the host. (I haven't really looked closely at this last one). There isn't the ability for something besides QEMU processes to communicate using this, though I hope to build a utility to work with a tap device. Overall, I think this is something that's pretty cool. I don't know how much people give any thought to the socket backends for real world use and so I don't know if this would be of much use to anyone. I am looking for some feedback into what the community thinks and for comments about the code. Its only my second time doing more than 20 lines of C so I'm sure I did some stupid things. I have only tested on 64 bit x86 Linux systems so far. Hopefully you all have good things to say. :) mike
Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
On 06/25/2012 04:40 AM, Stefan Hajnoczi wrote: On Mon, Jun 25, 2012 at 6:42 AM, Mike Lovell wrote: This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I first had the idea when I was playing with the udp and mcast socket network backends while exploring how to build a VM infrastructure. I liked the idea of using the sockets backends cause it doesn't require escalated permissions to configure and run as well as the ability to talk over IP networks. But the built in socket backends either allowed for only 2 guests talking directly or for multiple guests where all traffic is sent to all. So one can either have two guests talking or have bandwidth wasted with multiple guests. There wasn't something that could talk to multiple guests but also utilize unicast traffic. Have you looked at QEMU's net/vde.c backend? Does VDE (http://vde.sourceforge.net/) already do everything that QDES does? Stefan I have looked at VDE and used it for a few things. I think QDES has an advantage over VDE in that it doesn't require additional steps to get up and running. With VDE, one has to at least start a vde_switch process on the host for the guest to connect to. Then, if multiple hosts are being used, the multiple vde_switch processes have to be connected. Maybe through some vde_plug processes connected through netcat or ssh. Its a few extra steps that can be a pain in a dynamic environment. With QDES, the only configuration should be the options to QEMU as long as the multiple hosts can send and receive data through the same multicast address. QDES doesn't have anywhere close to the feature set that VDE does. This doesn't do VLANs at the switching level. It doesn't support STP so connecting it to other bridges might cause weird behavior. Etc. But it does provide a really easy way for multiple guests on multiple hosts to be connected. mike
Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
On 06/25/2012 09:33 AM, Mike Lovell wrote: On 06/25/2012 04:40 AM, Stefan Hajnoczi wrote: Have you looked at QEMU's net/vde.c backend? Does VDE (http://vde.sourceforge.net/) already do everything that QDES does? Stefan I have looked at VDE and used it for a few things. I think QDES has an advantage over VDE in that it doesn't require additional steps to get up and running. With VDE, one has to at least start a vde_switch process on the host for the guest to connect to. Then, if multiple hosts are being used, the multiple vde_switch processes have to be connected. Maybe through some vde_plug processes connected through netcat or ssh. Its a few extra steps that can be a pain in a dynamic environment. With QDES, the only configuration should be the options to QEMU as long as the multiple hosts can send and receive data through the same multicast address. QDES doesn't have anywhere close to the feature set that VDE does. This doesn't do VLANs at the switching level. It doesn't support STP so connecting it to other bridges might cause weird behavior. Etc. But it does provide a really easy way for multiple guests on multiple hosts to be connected. Oh. I forgot another reason why I decided to do this over using VDE. I'll do this one with an example. Say you have 3 virtual machines on 3 different hosts. Each host has a vde_switch process running, Switch A, B, and C. Each vde_switch has connections to the other 2 through some vde_plug's and netcat. In this case, VDE will disable one of the links between switches to prevent loops, say the link between Switch A and C. Traffic from the VM connected to Switch C that is destined for the VM on Switch A will have to traverse through Switch B. This is a suboptimal traffic flow. Especially when you consider that the traffic has to flow through 3 to 4 additional processes on each host for each direction. QDES is able to send traffic directly to the correct destination because it isn't using something like STP. Connecting it to a regular 802.1D switching network would probably result in bad things. There aren't loops in QDES because individual packets only flow one way through it. Packets received from the sockets aren't sent back out the sockets. I hope that makes sense. mike
Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
On 06/26/2012 02:29 AM, Stefan Hajnoczi wrote: On Mon, Jun 25, 2012 at 5:32 PM, Mike Lovell wrote: Oh. I forgot another reason why I decided to do this over using VDE. I'll do this one with an example. Say you have 3 virtual machines on 3 different hosts. Each host has a vde_switch process running, Switch A, B, and C. Each vde_switch has connections to the other 2 through some vde_plug's and netcat. In this case, VDE will disable one of the links between switches to prevent loops, say the link between Switch A and C. Traffic from the VM connected to Switch C that is destined for the VM on Switch A will have to traverse through Switch B. This is a suboptimal traffic flow. Especially when you consider that the traffic has to flow through 3 to 4 additional processes on each host for each direction. I haven't tried VDE myself but this sounds odd. Why can't you run a single vde_switch instance and connect multiple guests to it (with netcat)? you can connect multiple guests to a single vde_switch. as i understand it, the communication happens over a local unix domain socket. this would limit the guests on the same switch to the same host. so if you want guests on multiple hosts to talk using vde, i think you need a vde_switch on each host and then connect the vde_switch processes. mike
Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
On 06/27/2012 02:26 AM, Stefan Hajnoczi wrote: On Tue, Jun 26, 2012 at 5:48 PM, Mike Lovell wrote: you can connect multiple guests to a single vde_switch. as i understand it, the communication happens over a local unix domain socket. this would limit the guests on the same switch to the same host. so if you want guests on multiple hosts to talk using vde, i think you need a vde_switch on each host and then connect the vde_switch processes. It can be done with socat or netcat (unix_domain_socket_a <-> TCP <-> unix_domain_socket_b): http://www.dest-unreach.org/socat/doc/socat.html Another idea is to take the QDES code and turn it into a freestanding program that speaks the net/socket.c protocol. That way it works with existing QEMUs: launch the qdes daemon, then launch qemu -netdev socket,connect=qdes-host:qdes-port. okay. so yes that would be possible. its complicated by the fact that there are multiple unix domain sockets used between a vde client application (like QEMU) and the vde_switch process and some of these are dynamically created. clients connect to a control socket for the switch, negotiate the creation of a new unix socket, and then pass traffic over the new socket. whatever is starting the vde_switches and the qemu guests will have to be aware of this and connect them. this goes back to one of my original points. setting this up on multiple physical hosts is more complex than QDES is. with QDES, you just need to specify a ip and port on the physical host that is unique to each process as well as specify a multicast ip and port thats the same for all. no figuring out where the other virtual machines are running, spawning additional processes, and connecting all of those processes together. (i hope to eventually remove the need to specify the local address and port) also, connecting multiple guests on multiple hosts to a single vde_switch instance results in a sub-optimal traffic flow. traffic from a guest on one host will have to go to the host running the vde_switch and then to the host with the destination guest. with QDES, the traffic goes directly to the host running the destination guest. i actually did my initial prototyping of QDES using an external python app that talks to QEMU using the socket network backend. i used the udp one instead of connect but it still worked. some of the future work i intend to do is to have a few external applications that can talk to other QDES as well. i want to do one that can forward packets between a tap device and other QEMU processes so that regular systems can talk to the QEMU virtual machines. doing another that talks to the QEMU sockets backend would be possible as well. i do think there is some value in having it built into QEMU in the ease of use. mike
[Qemu-devel] [PATCH] flatload: fix bss clearing
The current bss clear logic assumes the target mmap address and host address are the same. Use g2h to translate from the target address space to the host so we can call memset on it. Signed-off-by: Mike Frysinger --- linux-user/flatload.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/linux-user/flatload.c b/linux-user/flatload.c index be79496..58f679e 100644 --- a/linux-user/flatload.c +++ b/linux-user/flatload.c @@ -660,7 +660,7 @@ static int load_flat_file(struct linux_binprm * bprm, } /* zero the BSS. */ -memset((void *)((unsigned long)datapos + data_len), 0, bss_len); +memset(g2h(datapos + data_len), 0, bss_len); return 0; } -- 1.7.7.3
Re: [Qemu-devel] [PATCH] flatload: fix bss clearing
On Monday 09 July 2012 09:21:52 Andreas Färber wrote: > Am 09.07.2012 15:04, schrieb Mike Frysinger: > > The current bss clear logic assumes the target mmap address and host > > address are the same. Use g2h to translate from the target address > > space to the host so we can call memset on it. > > Patch looks sensible. Are you working on rebasing your Blackfin target > to QOM and AREG0? i've rebased them to the latest release (1.1.0). FDPIC seems to work fine, as does basic ELF, but FLAT gets into an infinite loop and i haven't figured out why just yet. -mike signature.asc Description: This is a digitally signed message part.
[Qemu-devel] [PATCH] fix gcc warnings when RESERVED_VA is 0
The current code, while correct, triggers a bunch of gcc warnings when RESERVED_VA is 0 like so: linux-user/syscall.c: In function 'do_shmat': linux-user/syscall.c:3058: warning: comparison of unsigned expression < 0 is always false linux-user/syscall.c: In function 'open_self_maps': linux-user/syscall.c:4960: warning: comparison of unsigned expression < 0 is always false linux-user/syscall.c:4960: warning: comparison of unsigned expression < 0 is always false Signed-off-by: Mike Frysinger --- cpu-all.h |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/cpu-all.h b/cpu-all.h index 5e07d28..0e5dcf0 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -202,10 +202,16 @@ extern unsigned long reserved_va; #if HOST_LONG_BITS <= TARGET_VIRT_ADDR_SPACE_BITS #define h2g_valid(x) 1 #else +/* Gcc likes to warn about comparing unsigned longs to < 0, so cpp it away. */ +# if RESERVED_VA +# define _h2g_reserved_va(x) ((x) < RESERVED_VA) +# else +# define _h2g_reserved_va(x) 1 +# endif #define h2g_valid(x) ({ \ unsigned long __guest = (unsigned long)(x) - GUEST_BASE; \ (__guest < (1ul << TARGET_VIRT_ADDR_SPACE_BITS)) && \ -(!RESERVED_VA || (__guest < RESERVED_VA)); \ +_h2g_reserved_va(__guest); \ }) #endif -- 1.7.9.7
[Qemu-devel] [PATCH] fix warnings from printf target addresses
Current code triggers: memory.c: In function 'invalid_read': memory.c:1001: warning: format '%#x' expects type 'unsigned int', but argument 4 has type 'target_phys_addr_t' memory.c: In function 'invalid_write': memory.c:1013: warning: format '%#x' expects type 'unsigned int', but argument 4 has type 'target_phys_addr_t' Signed-off-by: Mike Frysinger --- memory.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/memory.c b/memory.c index 58a242d..7d5f4a3 100644 --- a/memory.c +++ b/memory.c @@ -998,7 +998,8 @@ static uint64_t invalid_read(void *opaque, target_phys_addr_t addr, MemoryRegion *mr = opaque; if (!mr->warning_printed) { -fprintf(stderr, "Invalid read from memory region %s at offset %#x\n", mr->name, addr); +fprintf(stderr, "Invalid read from memory region %s at offset %#llx\n", +mr->name, (unsigned long long)addr); mr->warning_printed = true; } return -1U; @@ -1010,7 +1011,8 @@ static void invalid_write(void *opaque, target_phys_addr_t addr, uint64_t data, MemoryRegion *mr = opaque; if (!mr->warning_printed) { -fprintf(stderr, "Invalid write to memory region %s at offset %#x\n", mr->name, addr); +fprintf(stderr, "Invalid write to memory region %s at offset %#llx\n", +mr->name, (unsigned long long)addr); mr->warning_printed = true; } } -- 1.7.9.7
[Qemu-devel] [PATCH] allow make {dist,}clean work w/out configure
There's no reason to require configure to run before running a clean target, so check MAKECMDGOALS before. Signed-off-by: Mike Frysinger --- Makefile |4 1 file changed, 4 insertions(+) diff --git a/Makefile b/Makefile index 1cd5bc8..e75740c 100644 --- a/Makefile +++ b/Makefile @@ -14,9 +14,11 @@ config-host.mak: $(SRC_PATH)/configure @sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh else config-host.mak: +ifeq ($(findstring clean,$(MAKECMDGOALS)),) @echo "Please call configure before running make!" @exit 1 endif +endif GENERATED_HEADERS = config-host.h trace.h qemu-options.def ifeq ($(TRACE_BACKEND),dtrace) @@ -398,7 +400,9 @@ qemu-doc.dvi qemu-doc.html qemu-doc.info qemu-doc.pdf: \ # Add a dependency on the generated files, so that they are always # rebuilt before other object files +ifeq ($(findstring clean,$(MAKECMDGOALS)),) Makefile: $(GENERATED_HEADERS) +endif # Include automatically generated dependency files # Dependencies in Makefile.objs files come from our recursive subdir rules -- 1.7.9.7
Re: [Qemu-devel] [PATCH] fix warnings from printf target addresses
On Sunday 16 September 2012 03:24:22 Blue Swirl wrote: > On Sun, Sep 16, 2012 at 12:05 AM, Mike Frysinger wrote: > > Current code triggers: > > memory.c: In function 'invalid_read': > > memory.c:1001: warning: format '%#x' expects type 'unsigned int', > > but argument 4 has type 'target_phys_addr_t' > > memory.c: In function 'invalid_write': > > memory.c:1013: warning: format '%#x' expects type 'unsigned int', > > but argument 4 has type 'target_phys_addr_t' > > > > -fprintf(stderr, "Invalid read from memory region %s at offset > > %#x\n", mr->name, addr); +fprintf(stderr, "Invalid read from > > memory region %s at offset %#llx\n", +mr->name, > > (unsigned long long)addr); > > The right way is not adding potentially dangerous casts they're not dangerous here at all > but using TARGET_PRIxPHYS or TARGET_FMT_plx. np. i just copied what was already in memory.c, but i guess that's broken too. -mike signature.asc Description: This is a digitally signed message part.
[Qemu-devel] [PATCH] include address in invalid memory region accesses
The current code to display invalid memory accesses isn't terribly useful as it doesn't tell you what address is actually being accessed. Include it in the error message. Signed-off-by: Mike Frysinger --- memory.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/memory.c b/memory.c index d528d1f..0ea0320 100644 --- a/memory.c +++ b/memory.c @@ -998,7 +998,9 @@ static uint64_t invalid_read(void *opaque, target_phys_addr_t addr, MemoryRegion *mr = opaque; if (!mr->warning_printed) { -fprintf(stderr, "Invalid read from memory region %s\n", mr->name); +fprintf(stderr, +"Invalid read from memory region %s at %#" TARGET_PRIxPHYS "\n", +mr->name, addr); mr->warning_printed = true; } return -1U; @@ -1010,7 +1012,9 @@ static void invalid_write(void *opaque, target_phys_addr_t addr, uint64_t data, MemoryRegion *mr = opaque; if (!mr->warning_printed) { -fprintf(stderr, "Invalid write to memory region %s\n", mr->name); +fprintf(stderr, +"Invalid write to memory region %s at %#" TARGET_PRIxPHYS "\n", +mr->name, addr); mr->warning_printed = true; } } -- 1.7.9.7
Re: [Qemu-devel] [PATCH] configure: do not quote $PKG_CONFIG
On Sunday 15 July 2012 15:54:51 Stefan Weil wrote: > Am 15.07.2012 22:26, schrieb Mike Frysinger: > > We should not quote the PKG_CONFIG setting as this deviates from the > > canonical upstream behavior that gets integrated with all other build > > systems, and deviates from how we treat all other toolchain variables > > that we get from the environment. > > > > Ultimately, the point is that it breaks passing custom flags directly > > to pkg-config via the env var where this normally works elsewhere, > > and it used to work in the past. ping ... -mike signature.asc Description: This is a digitally signed message part.
Re: [Qemu-devel] [PATCH] allow make {dist, }clean work w/out configure
On Sunday 16 September 2012 05:52:47 Paolo Bonzini wrote: > Il 16/09/2012 02:30, Mike Frysinger ha scritto: > > There's no reason to require configure to run before running a clean > > target, so check MAKECMDGOALS before. > > > > --- a/Makefile > > +++ b/Makefile > > @@ -14,9 +14,11 @@ config-host.mak: $(SRC_PATH)/configure > > @sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh > > else > > config-host.mak: > > +ifeq ($(findstring clean,$(MAKECMDGOALS)),) > > Please use instead: > > ifneq ($(filter-out %clean,$(MAKECMDGOALS)),) > > so that "make clean all" still gives error. that breaks `make` though. or rather, `make` no longer errors out. need to make it a little more complicated to handle this. -mike signature.asc Description: This is a digitally signed message part.
[Qemu-devel] [PATCH v2] allow make {dist, }clean work w/out configure
There's no reason to require configure to run before running a clean target, so check MAKECMDGOALS before. Signed-off-by: Mike Frysinger --- v2 - handle edge cases Makefile |4 1 file changed, 4 insertions(+) diff --git a/Makefile b/Makefile index 1cd5bc8..0a37369 100644 --- a/Makefile +++ b/Makefile @@ -14,9 +14,11 @@ config-host.mak: $(SRC_PATH)/configure @sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh else config-host.mak: +ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail)) @echo "Please call configure before running make!" @exit 1 endif +endif GENERATED_HEADERS = config-host.h trace.h qemu-options.def ifeq ($(TRACE_BACKEND),dtrace) @@ -398,7 +400,9 @@ qemu-doc.dvi qemu-doc.html qemu-doc.info qemu-doc.pdf: \ # Add a dependency on the generated files, so that they are always # rebuilt before other object files +ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail)) Makefile: $(GENERATED_HEADERS) +endif # Include automatically generated dependency files # Dependencies in Makefile.objs files come from our recursive subdir rules -- 1.7.9.7
[Qemu-devel] [PATCH] ignore more generated files
Signed-off-by: Mike Frysinger --- .gitignore |8 1 file changed, 8 insertions(+) diff --git a/.gitignore b/.gitignore index 824c0d2..25c134e 100644 --- a/.gitignore +++ b/.gitignore @@ -11,6 +11,7 @@ trace-dtrace.dtrace *-darwin-user *-linux-user *-bsd-user +libcacard/vscclient libdis* libhw32 libhw64 @@ -51,7 +52,11 @@ test-string-output-visitor test-visitor-serialization fsdev/virtfs-proxy-helper.1 fsdev/virtfs-proxy-helper.pod +a.out* .gdbinit +.gdb_history +core +gmon.out *.a *.aux *.cp @@ -79,6 +84,9 @@ fsdev/virtfs-proxy-helper.pod *.orig .pc patches +*.diff +*.patch +*.rej pc-bios/bios-pq/status pc-bios/vgabios-pq/status pc-bios/optionrom/linuxboot.bin -- 1.7.9.7
Re: [Qemu-devel] [PATCH] ignore more generated files
On Sunday 16 September 2012 16:33:15 Stefan Weil wrote: > Am 16.09.2012 22:11, schrieb Mike Frysinger: > > +libcacard/vscclient > > +a.out* > > +.gdb_history > > +core > > +gmon.out > > +*.diff > > +*.patch > > +*.rej > > IMHO .gitignore should only contain files which are generated > by a normal QEMU build. > > a.out*, .gdb_history, core, *.diff, *patch, *.rej are not QEMU specific > and should be ignored in the user's GIT configuration. these files show up a lot when doing development on qemu, hence i think they do make sense to be listed here. it also makes things "just work" for all users rather than requiring every one to set up their local system in the same way. plus, that would conflict with repos that do want to merge these types of files (uncommon, but not unheard of as test inputs). it's not like entries in this file "cost" anything at all. -mike signature.asc Description: This is a digitally signed message part.
Re: [Qemu-devel] [PATCH] ignore more generated files
On Monday 17 September 2012 03:19:54 Jan Kiszka wrote: > On 2012-09-16 22:55, Mike Frysinger wrote: > > On Sunday 16 September 2012 16:33:15 Stefan Weil wrote: > >> Am 16.09.2012 22:11, schrieb Mike Frysinger: > >>> +libcacard/vscclient +a.out* +.gdb_history +core +gmon.out > >>> +*.diff +*.patch +*.rej > >> > >> IMHO .gitignore should only contain files which are generated by > >> a normal QEMU build. > >> > >> a.out*, .gdb_history, core, *.diff, *patch, *.rej are not QEMU > >> specific and should be ignored in the user's GIT configuration. > > > > these files show up a lot when doing development on qemu, hence i > > think they do make sense to be listed here. it also makes things > > "just work" for all users rather than requiring every one to set > > up their local system in the same way. plus, that would conflict > > with repos that do want to merge these types of files (uncommon, > > but not unheard of as test inputs). > > > > it's not like entries in this file "cost" anything at all. > > NAK. I'd like to see what is polluting my repositories, not ignoring > it because that's the policy of someone else. Excluding *.rej is > something I would _never_ do. If they lie around somewhere, something > was not merged here - that's my policy. if something wasn't merged, your git command would have told you and spit an error. same with patch. if .rej files are showing up and you're not noticing at the time of patch application, it's because you're using the tools wrong. -mike signature.asc Description: This is a digitally signed message part.
[Qemu-devel] [RFC PATCH 0/3] v2.2 RCU Implementation for QEMU
This patch set merges Paolo's conversion of address spaces to enable RCU. There is one more patchset coming for TLB access that I'm debugging right now. After I submit the last one I'll start working on enabling RCU more widely - and the series will need to be refactored. The series is availale online at: https://github.com/ncultra/qemu/tree/rcu-for-1.7 Mike Day (2): fixup changes from commit f63ca950 fixup changes from commit f62a6b2f from Paulo Bonzini Paolo Bonzini (1): exec: change iotlb APIs to take AddressSpaceDispatch aio-posix.c| 1 + aio-win32.c| 1 + cpus.c | 3 +++ cputlb.c | 7 --- exec.c | 13 - include/exec/cpu-all.h | 3 +++ include/exec/cputlb.h | 9 ++--- 7 files changed, 26 insertions(+), 11 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH 3/3] exec: change iotlb APIs to take AddressSpaceDispatch
From: Paolo Bonzini This makes it possible to start following RCU rules, which require not dereferencing as->dispatch more than once. It is not covering the whole of TCG, since the TLB data structures are not RCU-friendly, but it is enough for exec.c. Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- cputlb.c | 7 --- exec.c| 9 + include/exec/cputlb.h | 9 ++--- 3 files changed, 15 insertions(+), 10 deletions(-) diff --git a/cputlb.c b/cputlb.c index 977c0ca..1f74d08 100644 --- a/cputlb.c +++ b/cputlb.c @@ -255,6 +255,7 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr, hwaddr paddr, int prot, int mmu_idx, target_ulong size) { +AddressSpaceDispatch *d; MemoryRegionSection *section; unsigned int index; target_ulong address; @@ -269,8 +270,8 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr, } sz = size; -section = address_space_translate_for_iotlb(&address_space_memory, paddr, -&xlat, &sz); +d = address_space_memory.dispatch; +section = address_space_translate_for_iotlb(d, paddr, &xlat, &sz); assert(sz >= TARGET_PAGE_SIZE); #if defined(DEBUG_TLB) @@ -290,7 +291,7 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr, } code_address = address; -iotlb = memory_region_section_get_iotlb(env, section, vaddr, paddr, xlat, +iotlb = memory_region_section_get_iotlb(d, env, section, vaddr, paddr, xlat, prot, &address); index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1); diff --git a/exec.c b/exec.c index 52e7fd5..c194bad 100644 --- a/exec.c +++ b/exec.c @@ -301,11 +301,11 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, } MemoryRegionSection * -address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat, +address_space_translate_for_iotlb(AddressSpaceDispatch *d, hwaddr addr, hwaddr *xlat, hwaddr *plen) { MemoryRegionSection *section; -section = address_space_translate_internal(as->dispatch, addr, xlat, plen, false); +section = address_space_translate_internal(d, addr, xlat, plen, false); assert(!section->mr->iommu_ops); return section; @@ -719,7 +719,8 @@ static int cpu_physical_memory_set_dirty_tracking(int enable) return ret; } -hwaddr memory_region_section_get_iotlb(CPUArchState *env, +hwaddr memory_region_section_get_iotlb(AddressSpaceDispatch *d, + CPUArchState *env, MemoryRegionSection *section, target_ulong vaddr, hwaddr paddr, hwaddr xlat, @@ -739,7 +740,7 @@ hwaddr memory_region_section_get_iotlb(CPUArchState *env, iotlb |= PHYS_SECTION_ROM; } } else { -iotlb = section - address_space_memory.dispatch->sections; +iotlb = section - d->sections; iotlb += xlat; } diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h index e21cb60..bd7ff2f 100644 --- a/include/exec/cputlb.h +++ b/include/exec/cputlb.h @@ -19,6 +19,8 @@ #ifndef CPUTLB_H #define CPUTLB_H +#include + #if !defined(CONFIG_USER_ONLY) /* cputlb.c */ void tlb_protect_code(ram_addr_t ram_addr); @@ -34,9 +36,10 @@ extern int tlb_flush_count; void tb_flush_jmp_cache(CPUArchState *env, target_ulong addr); MemoryRegionSection * -address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat, - hwaddr *plen); -hwaddr memory_region_section_get_iotlb(CPUArchState *env, +address_space_translate_for_iotlb(AddressSpaceDispatch *d, hwaddr addr, + hwaddr *xlat, hwaddr *plen); +hwaddr memory_region_section_get_iotlb(AddressSpaceDispatch *d, + CPUArchState *env, MemoryRegionSection *section, target_ulong vaddr, hwaddr paddr, hwaddr xlat, -- 1.8.3.1
[Qemu-devel] [PATCH 1/3] fixup changes from commit f63ca950
From: Paolo Bonzini Reviewed-by: Mike Day --- aio-posix.c | 1 + aio-win32.c | 1 + 2 files changed, 2 insertions(+) diff --git a/aio-posix.c b/aio-posix.c index 562add6..7f27ea5 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -176,6 +176,7 @@ bool aio_poll(AioContext *ctx, bool blocking) int ret; bool busy, progress; +rcu_quiescent_state(); progress = false; /* diff --git a/aio-win32.c b/aio-win32.c index 8a6abb0..9744fa5 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -99,6 +99,7 @@ bool aio_poll(AioContext *ctx, bool blocking) bool busy, progress; int count; +rcu_quiescent_state(); progress = false; /* -- 1.8.3.1
[Qemu-devel] [PATCH 2/3] fixup changes from commit f62a6b2f from Paulo Bonzini
From: Paolo Bonzini Reviewed-by: Mike Day --- cpus.c | 3 +++ exec.c | 4 +++- include/exec/cpu-all.h | 3 +++ 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 624658e..d6f2775 100644 --- a/cpus.c +++ b/cpus.c @@ -764,6 +764,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg) CPUState *cpu = arg; int r; +tls_alloc_cpu_single_env_var(); qemu_mutex_lock(&qemu_global_mutex); qemu_thread_get_self(cpu->thread); cpu->thread_id = qemu_get_thread_id(); @@ -804,6 +805,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg) sigset_t waitset; int r; +tls_alloc_cpu_single_env_var(); qemu_mutex_lock_iothread(); qemu_thread_get_self(cpu->thread); cpu->thread_id = qemu_get_thread_id(); @@ -850,6 +852,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) { CPUState *cpu = arg; +tls_alloc_cpu_single_env_var(); qemu_tcg_init_cpu_signals(); qemu_thread_get_self(cpu->thread); diff --git a/exec.c b/exec.c index 3ca9381..52e7fd5 100644 --- a/exec.c +++ b/exec.c @@ -45,7 +45,7 @@ #include "trace.h" #endif #include "exec/cpu-all.h" - +#include "qemu/tls.h" #include "exec/cputlb.h" #include "translate-all.h" @@ -72,6 +72,7 @@ static MemoryRegion io_mem_unassigned; CPUState *first_cpu; /* current CPU in the current thread. It is only valid inside cpu_exec() */ +DEFINE_TLS(CPUArchState *, cpu_single_env_var); DEFINE_TLS(CPUState *, current_cpu); /* 0 = Do not count executed instructions. 1 = Precise instruction counting. @@ -313,6 +314,7 @@ address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat, void cpu_exec_init_all(void) { +tls_alloc_cpu_single_env_var(); #if !defined(CONFIG_USER_ONLY) qemu_mutex_init(&ram_list.mutex); memory_map_init(); diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index a407b50..b961e58 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -22,6 +22,7 @@ #include "qemu-common.h" #include "exec/cpu-common.h" #include "qemu/thread.h" +#include "qemu/tls.h" #include "qom/cpu.h" /* some important defines: @@ -356,6 +357,8 @@ int page_get_flags(target_ulong address); void page_set_flags(target_ulong start, target_ulong end, int flags); int page_check_range(target_ulong start, target_ulong len, int flags); #endif +DECLARE_TLS(CPUArchState *, cpu_single_env_var); +#define cpu_single_env (*tls_get_cpu_single_env_var()) CPUArchState *cpu_copy(CPUArchState *env); -- 1.8.3.1
Re: [Qemu-devel] [RFC PATCH 0/3] v2.2 RCU Implementation for QEMU
Andreas Färber writes: >> https://github.com/ncultra/qemu/tree/rcu-for-1.7 >> >> Mike Day (2): >> fixup changes from commit f63ca950 >> fixup changes from commit f62a6b2f from Paulo Bonzini > > These two patches are certainly not acceptable for upstream, lacking any > textual explanation and with those subjects. Thanks. The entire series is not ready yet - it needs to be refactored and tested, hence the RFC tag. I'm trying to keep an rebased version of the Paolo's May RCU tree available for those who are interested. When its ready to be considered for an upstream merge it will be as expected. The the next step for me is to start working on leveraging the RCU code. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
[Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object orien
These patches apply to Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135 Signed-off-by: Mike Day --- hw/misc/ivshmem.c | 2 +- hw/pci-bridge/pci_bridge_dev.c | 6 +++--- hw/pci/pci_bridge.c| 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index ebcb52a..46d8c27 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) static void pci_ivshmem_instance_finalize(Object *obj) { -IVShmemState *s = IVSHMEM(dev); +IVShmemState *s = IVSHMEM(obj); if (s->migration_blocker) { migrate_del_blocker(s->migration_blocker); diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c index c995d5d..22caf14 100644 --- a/hw/pci-bridge/pci_bridge_dev.c +++ b/hw/pci-bridge/pci_bridge_dev.c @@ -87,7 +87,6 @@ shpc_error: bridge_error: return err; } - static void pci_bridge_dev_exitfn(PCIDevice *dev) { PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev); @@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev) static void pci_bridge_dev_instance_finalize(Object *obj) { PCIDevice *dev = PCI_DEVICE(obj); -PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev); -PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br); +PCIBridge *br = PCI_BRIDGE(dev); +PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br); + shpc_free(dev); memory_region_destroy(&bridge_dev->bar); pci_bridge_free(dev); diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index 63f9912..307e076 100644 --- a/hw/pci/pci_bridge.c +++ b/hw/pci/pci_bridge.c @@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev) void pci_bridge_free(PCIDevice *pci_dev) { -PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev); +PCIBridge *s = PCI_BRIDGE(pci_dev); pci_bridge_region_cleanup(s, s->windows); memory_region_destroy(&s->address_space_mem); memory_region_destroy(&s->address_space_io); -- 1.8.3.1
[Qemu-devel] [PATCH RESEND] RCU implementation for Qemu. Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system.
These patches change from using Linux kernel style upcasts to typesafe object oriented casts with runtime checking semantics. These patches apply to Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135 Signed-off-by: Mike Day --- hw/misc/ivshmem.c | 2 +- hw/pci-bridge/pci_bridge_dev.c | 6 +++--- hw/pci/pci_bridge.c| 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index ebcb52a..46d8c27 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) static void pci_ivshmem_instance_finalize(Object *obj) { -IVShmemState *s = IVSHMEM(dev); +IVShmemState *s = IVSHMEM(obj); if (s->migration_blocker) { migrate_del_blocker(s->migration_blocker); diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c index c995d5d..22caf14 100644 --- a/hw/pci-bridge/pci_bridge_dev.c +++ b/hw/pci-bridge/pci_bridge_dev.c @@ -87,7 +87,6 @@ shpc_error: bridge_error: return err; } - static void pci_bridge_dev_exitfn(PCIDevice *dev) { PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev); @@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev) static void pci_bridge_dev_instance_finalize(Object *obj) { PCIDevice *dev = PCI_DEVICE(obj); -PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev); -PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br); +PCIBridge *br = PCI_BRIDGE(dev); +PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br); + shpc_free(dev); memory_region_destroy(&bridge_dev->bar); pci_bridge_free(dev); diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index 63f9912..307e076 100644 --- a/hw/pci/pci_bridge.c +++ b/hw/pci/pci_bridge.c @@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev) void pci_bridge_free(PCIDevice *pci_dev) { -PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev); +PCIBridge *s = PCI_BRIDGE(pci_dev); pci_bridge_region_cleanup(s, s->windows); memory_region_destroy(&s->address_space_mem); memory_region_destroy(&s->address_space_io); -- 1.8.3.1
Re: [Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object o
Peter Maydell writes: > On 19 August 2013 19:33, Mike Day wrote: >> These patches apply to Paolo Bonzini's rcu tree: >> >> https://github.com/bonzini/qemu/tree/rcu >> commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135 >> >> Signed-off-by: Mike Day >> --- >> hw/misc/ivshmem.c | 2 +- >> hw/pci-bridge/pci_bridge_dev.c | 6 +++--- >> hw/pci/pci_bridge.c| 2 +- >> 3 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c >> index ebcb52a..46d8c27 100644 >> --- a/hw/misc/ivshmem.c >> +++ b/hw/misc/ivshmem.c >> @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) >> >> static void pci_ivshmem_instance_finalize(Object *obj) >> { >> -IVShmemState *s = IVSHMEM(dev); >> +IVShmemState *s = IVSHMEM(obj); > > This should have been a flat-out compiler error, right? Yes, correct, but Paolo hasn't previously submitted this specific change code afaik. >> if (s->migration_blocker) { >> migrate_del_blocker(s->migration_blocker); >> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c >> index c995d5d..22caf14 100644 >> --- a/hw/pci-bridge/pci_bridge_dev.c >> +++ b/hw/pci-bridge/pci_bridge_dev.c >> @@ -87,7 +87,6 @@ shpc_error: >> bridge_error: >> return err; >> } >> - > > Stray blank line change. Thanks - and git just found it for me too. Apologies. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs.
Add RCU-enabled variants on the existing bsd DQ facility. Each Q operation has the same interface as the existing (non-RCU) version. Also, each operation is implemented as macro for now. Using the RCU-enabled DQ, existing DQ users will be able to convert to RCU without using a different list interface. To accompany the RCU-enabled DQ, there is also a test file that uses concurrent readers to contend with a single updater. This patchset builds on top of Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu Signed-off-by: Mike Day --- docs/rcu.txt | 2 +- include/qemu/queue.h | 11 -- include/qemu/rcu_queue.h | 137 +++ tests/Makefile | 14 ++- tests/rcuq_test.c| 279 +++ 5 files changed, 430 insertions(+), 13 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index b3c593c..de59896 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -106,7 +106,7 @@ The core RCU API is small: so that the reclaimer function can fetch the struct foo address and free it: -call_rcu1(foo_reclaim, &foo.rcu); +call_rcu1(&foo.rcu, foo_reclaim); void foo_reclaim(struct rcu_head *rp) { diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 847ddd1..f6f0636 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -139,17 +139,6 @@ struct { \ (elm)->field.le_prev = &(head)->lh_first; \ } while (/*CONSTCOND*/0) -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\ -(elm)->field.le_prev = &(head)->lh_first; \ -(elm)->field.le_next = (head)->lh_first;\ -smp_wmb(); /* fill elm before linking it */ \ -if ((head)->lh_first != NULL) {\ -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\ -} \ -(head)->lh_first = (elm); \ -smp_wmb(); \ -} while (/* CONSTCOND*/0) - #define QLIST_REMOVE(elm, field) do { \ if ((elm)->field.le_next != NULL) \ (elm)->field.le_next->field.le_prev = \ diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h new file mode 100644 index 000..e82196d --- /dev/null +++ b/include/qemu/rcu_queue.h @@ -0,0 +1,137 @@ +#ifndef QEMU_RCU_SYS_QUEUE_H +#define QEMU_RCU_SYS_QUEUE_H + +/* + * rc_queue.h + * + * Userspace RCU QSBR header. + * + * LGPL-compatible code should include this header with : + * + * #define _LGPL_SOURCE + * #include + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013 Mike D. Day, IBM Corporation. + * + * IBM's contributions to this file may be relicensed under LGPLv2 or later. + */ + +#include "qemu/rcu.h" /* rcu.h includes qemu/queue.h and qemu/atomic.h */ + + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * List functions. + */ + + +/* + * The difference between atomic_read/set and atomic_rcu_read/set + * is in the including of a read/write memory barrier to the volatile + * access. atomic_rcu_* macros include the memory barrier, the + * plain atomic macros do not. Therefore, it should be correct to + * issue a series of reads or writes to the same element using only + * the atomic_* macro, until the last read or write, which should be + * atomic_rcu_* to introduce a read or write memory barrier as + * appropriate. + */ + +/* Upon publication of the listelm->next value, list readers + * will see the new node when following next pointers from + * antecedent nodes, but may not see the new node when following + * prev pointers from subsequent nodes until after the rcu grace + * period expires. + * see linux/include/rculist.h __list_add_rcu(new, prev, next) + */ +#define QLIST_INSERT_AFTER_RCU(listelm,
[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)
Add RCU-enabled variants on the existing bsd DQ facility. Each Q operation has the same interface as the existing (non-RCU) version. Also, each operation is implemented as macro for now. Using the RCU-enabled DQ, existing DQ users will be able to convert to RCU without using a different list interface. This version (2) adds a macro to walk a Q in reverse: QLIST_FOREACH_REVERSE_RCU(el, head, field) Accordingly the reader threads in the test program walk the Q in reverse in addition to walking forward. To accompany the RCU-enabled DQ, there is also a test file that uses concurrent readers to contend with a single updater. This patchset builds on top of Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu Signed-off-by: Mike Day --- docs/rcu.txt | 2 +- include/qemu/queue.h | 11 -- include/qemu/rcu_queue.h | 145 tests/Makefile | 6 +- tests/rcuq_test.c| 290 +++ 5 files changed, 440 insertions(+), 14 deletions(-) create mode 100644 include/qemu/rcu_queue.h create mode 100644 tests/rcuq_test.c diff --git a/docs/rcu.txt b/docs/rcu.txt index b3c593c..de59896 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -106,7 +106,7 @@ The core RCU API is small: so that the reclaimer function can fetch the struct foo address and free it: -call_rcu1(foo_reclaim, &foo.rcu); +call_rcu1(&foo.rcu, foo_reclaim); void foo_reclaim(struct rcu_head *rp) { diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 847ddd1..f6f0636 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -139,17 +139,6 @@ struct { \ (elm)->field.le_prev = &(head)->lh_first; \ } while (/*CONSTCOND*/0) -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\ -(elm)->field.le_prev = &(head)->lh_first; \ -(elm)->field.le_next = (head)->lh_first;\ -smp_wmb(); /* fill elm before linking it */ \ -if ((head)->lh_first != NULL) {\ -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\ -} \ -(head)->lh_first = (elm); \ -smp_wmb(); \ -} while (/* CONSTCOND*/0) - #define QLIST_REMOVE(elm, field) do { \ if ((elm)->field.le_next != NULL) \ (elm)->field.le_next->field.le_prev = \ diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h new file mode 100644 index 000..198a87d --- /dev/null +++ b/include/qemu/rcu_queue.h @@ -0,0 +1,145 @@ +#ifndef QEMU_RCU_SYS_QUEUE_H +#define QEMU_RCU_SYS_QUEUE_H + +/* + * rc_queue.h + * + * Userspace RCU QSBR header. + * + * LGPL-compatible code should include this header with : + * + * #define _LGPL_SOURCE + * #include + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013 Mike D. Day, IBM Corporation. + * + * IBM's contributions to this file may be relicensed under LGPLv2 or later. + */ + +#include "qemu/rcu.h" /* rcu.h includes qemu/queue.h and qemu/atomic.h */ + + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * List functions. + */ + + +/* + * The difference between atomic_read/set and atomic_rcu_read/set + * is in the including of a read/write memory barrier to the volatile + * access. atomic_rcu_* macros include the memory barrier, the + * plain atomic macros do not. Therefore, it should be correct to + * issue a series of reads or writes to the same element using only + * the atomic_* macro, until the last read or write, which should be + * atomic_rcu_* to introduce a read or write memory barrier as + * appropriate. + */ + +/* Upon publication of the listelm->next value, list readers + * will see the new node when following next pointer
Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)
Paolo Bonzini writes: > Just a couple of questions, one of them on the new macro... > >> +/* prior to publication of the elm->prev->next value, some list >> + * readers may still see the removed element when following >> + * the antecedent's next pointer. >> + */ >> + >> +#define QLIST_REMOVE_RCU(elm, field) do { \ >> +if ((elm)->field.le_next != NULL) { \ >> + (elm)->field.le_next->field.le_prev =\ >> +(elm)->field.le_prev; \ >> +} \ >> +atomic_rcu_set((elm)->field.le_prev, (elm)->field.le_next); \ >> +} while (/*CONSTCOND*/0) > > Why is the barrier needed here, but not in Linux's list_del_rcu? > > I think it is not needed because all involved elements have already been > published and just have their pointers shuffled. I read this as more than shuffling pointers. The intent here is that the previous element's next pointer is being updated to omit the current element from the list. atomic_set always deferences the pointer passed to it, and (field)->le_pre is a double pointer. So looking at the macro: #define atomic_set(ptr, i) ((*(__typeof__(*ptr) *volatile) (ptr)) = (i)) It translates to: ( ( * (__typeof(*elm->field.le_prev) *volatile) (elm)->field.le_prev) = elm->field.le_next; ) Which is: *((struct *elm) *volatile)(elm)->field.le_prev = elm->field.le_next; Which is: *(elm)->field.le_prev = elm->field.le_next; Because field.le_prev is a double pointer that has previously been set to &prev (the address of the previous list element) this is assiging the *previous* element's next pointer, the way I read it. The Linux list_del_rcu is dealing with a singly linked list and therefore does not set a value in the previous node's element. But I'm still unclear on whether or not the memory barrier is needed because the deleted element won't be reclaimed right away. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)
On Sun, Aug 25, 2013 at 3:18 PM, Mathieu Desnoyers < mathieu.desnoy...@efficios.com> wrote: > > I'm not very comfortable with your DQ implementation not providing any > kind of guarantee to a forward traversal followed by backward traversal, > nor for backward followed by forward traversal. If a list add is > executed concurrently with traversals, and we can ensure there are no > list del of the node, if a traversal sees the added node when doing > forward iteration, I would clearly expect to still see it if a backward > iteration follows. > > I took the liberty of implementing a couple of ideas I had to provide > a RCU DQ with those guarantees. I just pushed the code here (beware, I > just did some basic single-threaded unit tests so far, so consider this > code as largely untested concurrency-wise): > > git clone git://git.urcu.io/urcu.git > branch: rcudq > file: urcu/rcudq.h > > Direct link to the file via gitweb: > http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcudq.h;h=4a8d7b0d5143a958514cf130b1c124d99f3194ca;hb=refs/heads/rcudq Mathieu - Thanks for the review! And thanks for the code, I'm working with it right now. I like the idea of using a flag to provide a form of atomicity for the doubly-linked list elements. I'm also planning on running some timing tests to see of the additional memory barriers and atomic accesses make *any* difference whatsoever. Mike Mike Day | ncm...@ncultra.org | +1 919 371-8786
[Qemu-devel] [RFC PATCH V3 ] Introduce RCU-enabled DQs
Add RCU-enabled variants on the existing bsd DQ facility. Each Q operation has the same interface as the existing (non-RCU) version. Also, each operation is implemented as macro for now. Using the RCU-enabled DQ, existing DQ users will be able to convert to RCU without using a different list interface. Changes since V2: * reverted QLIST_FOREACH_REVERSE_RCU, it's not needed in the interface file. (But it is used in the test program.) * added g_test support to the test program To accompany the RCU-enabled DQ, there is also a test file that uses concurrent readers to contend with a single updater. This patchset builds on top of Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu Signed-off-by: Mike Day --- docs/rcu.txt | 2 +- include/qemu/queue.h | 11 -- include/qemu/rcu_queue.h | 137 +++ tests/Makefile | 5 +- tests/rcuq_test.c| 337 +++ 5 files changed, 479 insertions(+), 13 deletions(-) create mode 100644 include/qemu/rcu_queue.h create mode 100644 tests/rcuq_test.c diff --git a/docs/rcu.txt b/docs/rcu.txt index b3c593c..de59896 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -106,7 +106,7 @@ The core RCU API is small: so that the reclaimer function can fetch the struct foo address and free it: -call_rcu1(foo_reclaim, &foo.rcu); +call_rcu1(&foo.rcu, foo_reclaim); void foo_reclaim(struct rcu_head *rp) { diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 847ddd1..f6f0636 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -139,17 +139,6 @@ struct { \ (elm)->field.le_prev = &(head)->lh_first; \ } while (/*CONSTCOND*/0) -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\ -(elm)->field.le_prev = &(head)->lh_first; \ -(elm)->field.le_next = (head)->lh_first;\ -smp_wmb(); /* fill elm before linking it */ \ -if ((head)->lh_first != NULL) {\ -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\ -} \ -(head)->lh_first = (elm); \ -smp_wmb(); \ -} while (/* CONSTCOND*/0) - #define QLIST_REMOVE(elm, field) do { \ if ((elm)->field.le_next != NULL) \ (elm)->field.le_next->field.le_prev = \ diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h new file mode 100644 index 000..e2b8ba5 --- /dev/null +++ b/include/qemu/rcu_queue.h @@ -0,0 +1,137 @@ +#ifndef QEMU_RCU_SYS_QUEUE_H +#define QEMU_RCU_SYS_QUEUE_H + +/* + * rcu_queue.h + * + * Userspace RCU QSBR header. + * + * LGPL-compatible code should include this header with : + * + * #define _LGPL_SOURCE + * #include + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013 Mike D. Day, IBM Corporation. + * + * IBM's contributions to this file may be relicensed under LGPLv2 or later. + */ + +#include "qemu/rcu.h" /* rcu.h includes qemu/queue.h and qemu/atomic.h */ + + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * List functions. + */ + + +/* + * The difference between atomic_read/set and atomic_rcu_read/set + * is in the including of a read/write memory barrier to the volatile + * access. atomic_rcu_* macros include the memory barrier, the + * plain atomic macros do not. Therefore, it should be correct to + * issue a series of reads or writes to the same element using only + * the atomic_* macro, until the last read or write, which should be + * atomic_rcu_* to introduce a read or write memory barrier as + * appropriate. + */ + +/* Upon publication of the listelm->next value, list readers + * will see the new node when following next pointers from + * anteced
[Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ
Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has not been tested further than that at this point. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 51 - exec.c| 111 +- hw/9pfs/virtio-9p-synth.c | 2 +- include/exec/cpu-all.h| 4 +- include/qemu/rcu_queue.h | 8 5 files changed, 111 insertions(+), 65 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..5c7b284 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -397,8 +398,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); - -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -407,6 +408,7 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -457,8 +459,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; +rcu_read_lock(); if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +472,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -526,6 +529,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } +rcu_read_unlock(); last_seen_block = block; last_offset = offset; @@ -565,10 +569,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -631,7 +635,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +645,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->length); } -qemu_mutex_unlock_ramlist(); +rcu_read_unlock(); ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); @@ -664,8 +668,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) int64_t t0; int total_sent = 0; -qemu_mutex_lock_ramlist(); - if (ram_list.version != last_version) { reset_ram_globals(); } @@ -701,8 +703,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) i++; } -qemu_mutex_unlock_ramlist(); - /* * Must occur before EOS (or any QEMUFile operation) * because of RDMA protocol. @@ -814,6 +814,7 @@ static inline void *host_from_stream_offset(QEMUFile *f, static RAMBlock *block = NULL; char id[256]; uint8_t len; +void *ptr = NULL; if (flags & RAM_SAVE_FLAG_CONTINUE) { if (!block) { @@ -
Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ
On Wed, Aug 28, 2013 at 12:35 PM, Paolo Bonzini wrote: > > > @@ -828,13 +829,18 @@ static inline void *host_from_stream_offset(QEMUFile *f, > > qemu_get_buffer(f, (uint8_t *)id, len); > > id[len] = 0; > > > > -QTAILQ_FOREACH(block, &ram_list.blocks, next) { > > -if (!strncmp(id, block->idstr, sizeof(id))) > > -return memory_region_get_ram_ptr(block->mr) + offset; > > +rcu_read_lock(); > > +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > > +if (!strncmp(id, block->idstr, sizeof(id))) { > > +ptr = memory_region_get_ram_ptr(block->mr) + offset; > > +goto unlock_out; > > +} > > } > > > > fprintf(stderr, "Can't find block %s!\n", id); > > -return NULL; > > +unlock_out: > > +rcu_read_unlock(); > > +return ptr; > > } > > > Similarly, here the critical section includes the caller, and block is > living across calls to host. Again, for now just put all of ram_load > under a huge RCU critical section. Later we can use ram_list.version to > refresh the list and make the critical sections smaller. And: rcu_read_lock() and rcu_read_unlock() can be called recursively, so I can still leave the lock/unlock pair in host_from_stream_offset.
Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ
On Fri, Aug 30, 2013 at 4:19 AM, Paolo Bonzini wrote: > > > I'm not sure about that; returning an RCU-protected variable after > rcu_read_unlock() seems wrong to me because the pointer may not be valid > at that point. I suggest using a comment that asks to call > host_from_stream_offset within rcu_read_lock()/rcu_read_unlock(). > However, if existing practice in the kernel is different, I'll bow to that. In this case the only caller is ram_load so I'm removing the critical section from within host_from_stream_offset, and adding comments to note that the ram_list needs to be protected by the caller. The current docs/rcu.txt information indicates that rcu critical sections can be "nested or overlapping." But your suggestion results in cleaner code - we will have to go back to this later as you noted earlier. Mike
[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2
Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has not been tested further than that at this point. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 80 +++--- exec.c| 142 ++ hw/9pfs/virtio-9p-synth.c | 2 +- include/exec/cpu-all.h| 4 +- include/qemu/rcu_queue.h | 8 +++ 5 files changed, 151 insertions(+), 85 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..3f4d676 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -398,7 +399,11 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { + /* This assumes the iothread lock or the ram_list mutex is taken. + * if that changes, accesses to ram_list need to be protected + * by a mutex (writes) or an rcu read lock (reads) + */ +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -457,8 +462,13 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; +/* Sometimes called with the ram_list mutex held (ram_save_complete) + * also called WITHOUT the ram_list mutex held. (ram_save_iterate). + * Protect ram_list with an rcu critical section. + */ +rcu_read_lock(); if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +479,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -526,6 +536,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } +rcu_read_unlock(); last_seen_block = block; last_offset = offset; @@ -565,10 +576,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -631,7 +642,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +652,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->length); } -qemu_mutex_unlock_ramlist(); +rcu_read_unlock(); ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); @@ -664,8 +675,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) int64_t t0; int total_sent = 0; -qemu_mutex_lock_ramlist(); - if (ram_list.version != last_version) { reset_ram_globals(); } @@ -701,8 +710,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) i++; } -qemu_mutex_unlock_ramlist(); - /*
Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2
On Fri, Aug 30, 2013 at 12:38 PM, Paolo Bonzini wrote: > > > @@ -867,7 +879,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) > > if (version_id < 4 || version_id > 4) { > > return -EINVAL; > > } > > - > > +/* this implements a long-running RCU critical section. > > + * When rcu reclaims in the code start to become numerous > > + * it will be necessary to reduce the granularity of this critical > > + * section. > > + */ > > Please add the same comment (and a rcu_read_lock/unlock pair replacing > the ramlist mutex) in ram_save_iterate, too. Just double checking on this particular change. In practice ram_save manipulates the ram_list indirectly through ram_save_block. But I'm assuming you want this change because of the ram state info that persists between calls to ram_save (ram_list version in particular). Also, there is potential for the callback functions ram_control_*_iterate to manipulate the ram_list. I'm adding the rcu_read_lock/unlock pair in ram_load. It will be recursive with the same calls in ram_save_block, but as you pointed out this is low overhead. With this change in my working code, ram_control_*_iterate are called from within an rcu critical section. Mike
Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2
On Tue, Sep 3, 2013 at 10:09 AM, Paolo Bonzini wrote: > > Il 03/09/2013 15:56, Mike Day ha scritto: > >> > +/* this implements a long-running RCU critical section. > >> > + * When rcu reclaims in the code start to become numerous > >> > + * it will be necessary to reduce the granularity of this critical > >> > + * section. > >> > + */ > >> > >> Please add the same comment (and a rcu_read_lock/unlock pair replacing > >> the ramlist mutex) in ram_save_iterate, too. > > > > Just double checking on this particular change. In practice ram_save > > manipulates the ram_list indirectly through ram_save_block. But I'm > > assuming you want this change because of the ram state info that > > persists between calls to ram_save (ram_list version in particular). > > ram_list.version is not really a problem, but last_seen_block has to > persist across ram_save_block calls. Got it. that's a subtle point. > > Also, there is potential for the callback functions > > ram_control_*_iterate to manipulate the ram_list. > > I think that's right now not possible (and they could use > rcu_read_lock/unlock as well). Yeah. So how about we say for now that the rcu critical section status upon entry to the ram_control_*_iterate functions is undefined. I'll make some updates. Mike
[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V3
Changes from V2: * added rcu reclaim function to free ram blocks * reduced the number of comments * made rcu locking policy consistent for different caller scenarios * rcu updates to ram_block now are protected only by the iothread mutex Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has not been tested further than that at this point. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 103 ++ exec.c | 160 +-- include/exec/cpu-all.h | 6 +- include/qemu/rcu_queue.h | 8 +++ 4 files changed, 173 insertions(+), 104 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..4cb1543 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -398,7 +399,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -407,6 +409,8 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); + trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -446,6 +450,8 @@ static void migration_bitmap_sync(void) * * Returns: The number of bytes written. * 0 means no dirty pages + * + * assumes that the caller has rcu-locked the ram_list */ static int ram_save_block(QEMUFile *f, bool last_stage) @@ -457,8 +463,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; + if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +476,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -526,9 +533,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } + last_seen_block = block; last_offset = offset; - return bytes_sent; } @@ -565,10 +572,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -602,10 +609,20 @@ static void reset_ram_globals(void) last_offset = 0; last_version = ram_list.version; ram_bulk_stage = true; +smp_wmb(); } #define MAX_WAIT 50 /* ms, half buffered_file limit */ + +/* ram_save_* functions each implement a long-running RCU critical + * section. When rcu-reclaims in the code start to become numerous + * it will be necessary to reduce the granularity of these critical + * sections. + * + * (ram_save_setup, ram_save_iterate, and ram_save_complete.) + */ + static int ram_save_setup(QEMUFile *f, void *opaque) { RAMBlock *block; @@ -631,7 +648,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +658,13 @@ static
[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4
* now passes virt-test -t qemu * uses call_rcu instead of call_rcu1 * completely removed the ram_list mutex and its locking functions * cleaned up some comments Changes from V3: * added rcu reclaim function to free ram blocks * reduced the number of comments * made rcu locking policy consistent for different caller scenarios * rcu updates to ram_block now are protected only by the iothread mutex Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has been virt-tested using KVM -t qemu and passes 15/15 migration tests. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 96 +--- exec.c | 162 +-- include/exec/cpu-all.h | 13 ++-- include/qemu/rcu_queue.h | 8 +++ 4 files changed, 160 insertions(+), 119 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..76ef5c9 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -398,7 +399,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -407,6 +409,8 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); + trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -446,6 +450,8 @@ static void migration_bitmap_sync(void) * * Returns: The number of bytes written. * 0 means no dirty pages + * + * assumes that the caller has rcu-locked the ram_list */ static int ram_save_block(QEMUFile *f, bool last_stage) @@ -458,7 +464,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) ram_addr_t current_addr; if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +475,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -565,10 +571,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -602,10 +608,20 @@ static void reset_ram_globals(void) last_offset = 0; last_version = ram_list.version; ram_bulk_stage = true; +smp_wmb(); } #define MAX_WAIT 50 /* ms, half buffered_file limit */ + +/* ram_save_* functions each implement a long-running RCU critical + * section. When rcu-reclaims in the code start to become numerous + * it will be necessary to reduce the granularity of these critical + * sections. + * + * (ram_save_setup, ram_save_iterate, and ram_save_complete.) + */ + static int ram_save_setup(QEMUFile *f, void *opaque) { RAMBlock *block; @@ -631,7 +647,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +657,13 @@ static int ram_save_setup(QEMUFile *f
Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4
On Wed, Sep 4, 2013 at 3:58 PM, Paolo Bonzini wrote: > > > @@ -1323,23 +1325,21 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr) > > { > > RAMBlock *block; > > > > -/* The list is protected by the iothread lock here. */ > > +/* This assumes the iothread lock is taken here too. */ > > + > > block = ram_list.mru_block; > > if (block && addr - block->offset < block->length) { > > -goto found; > > +return block; > > } > > -QTAILQ_FOREACH(block, &ram_list.blocks, next) { > > +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > > if (addr - block->offset < block->length) { > > -goto found; > > +atomic_rcu_set(&ram_list.mru_block, block); > > I think this is not needed, block was already published into the block > list. What is important is to order mru_block == NULL so that it > happens before the node is removed. Which we don't do, but we can fix > later. I am thinking of writing some macros and possibly reorganizing the ram globals into a struct so that we can update it by exchanging pointers atomically. It seems to disjoint right and error-prone now that we are making it rcu-enabled. thanks! Mike
Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote: > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if > userland touches a still unmapped virtual address, a sigbus signal is > sent instead of allocating a new page. The sigbus signal handler will > then resolve the page fault in userland by calling the > remap_anon_pages syscall. What does "unmapped virtual address" mean in this context? Mike
Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]
Josh, Logs are uploaded to cephdrop with the file name mikedawson-rbd-qemu-deadlock. - At about 2013-08-05 19:46 or 47, we hit the issue, traffic went to 0 - At about 2013-08-05 19:53:51, ran a 'virsh screenshot' Environment is: - Ceph 0.61.7 (client is co-mingled with three OSDs) - rbd cache = true and cache=writeback - qemu 1.4.0 1.4.0+dfsg-1expubuntu4 - Ubuntu Raring with 3.8.0-25-generic This issue is reproducible in my environment, and I'm willing to run any wip branch you need. What else can I provide to help? Thanks, Mike Dawson On 8/5/2013 3:48 AM, Stefan Hajnoczi wrote: On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote: Am 02.08.2013 um 23:47 schrieb Mike Dawson : We can "un-wedge" the guest by opening a NoVNC session or running a 'virsh screenshot' command. After that, the guest resumes and runs as expected. At that point we can examine the guest. Each time we'll see: If virsh screenshot works then this confirms that QEMU itself is still responding. Its main loop cannot be blocked since it was able to process the screendump command. This supports Josh's theory that a callback is not being invoked. The virtio-blk I/O request would be left in a pending state. Now here is where the behavior varies between configurations: On a Windows guest with 1 vCPU, you may see the symptom that the guest no longer responds to ping. On a Linux guest with multiple vCPUs, you may see the hung task message from the guest kernel because other vCPUs are still making progress. Just the vCPU that issued the I/O request and whose task is in UNINTERRUPTIBLE state would really be stuck. Basically, the symptoms depend not just on how QEMU is behaving but also on the guest kernel and how many vCPUs you have configured. I think this can explain how both problems you are observing, Oliver and Mike, are a result of the same bug. At least I hope they are :). Stefan
[Qemu-devel] [PATCH] block: Bugfix 'format' and 'snapshot' used in drive option
When use -drive file='xxx',format=qcow2,snapshot=on the error message "Can't use snapshot=on with driver-specific options" can be show, and fail to start the qemu. This should not be happened, and there is no file.driver option in qemu command line. It is because the commit 74fe54f2a1b5c4f4498a8fe521e1dfc936656cd4, it puts 'driver' option if the command line use 'format' option. This patch is to solve this bug. Signed-off-by: Mike Qiu --- blockdev.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/blockdev.c b/blockdev.c index 41b0a49..e174b7d 100644 --- a/blockdev.c +++ b/blockdev.c @@ -340,6 +340,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, QDict *bs_opts; const char *id; bool has_driver_specific_opts; +BlockDriver *drv = NULL; translation = BIOS_ATA_TRANSLATION_AUTO; media = MEDIA_DISK; @@ -485,7 +486,11 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, return NULL; } -qdict_put(bs_opts, "driver", qstring_from_str(buf)); +drv = bdrv_find_whitelisted_format(buf, ro); +if (!drv) { +error_report("'%s' invalid format", buf); +return NULL; +} } /* disk I/O throttling */ @@ -700,12 +705,13 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, } QINCREF(bs_opts); -ret = bdrv_open(dinfo->bdrv, file, bs_opts, bdrv_flags, NULL); +ret = bdrv_open(dinfo->bdrv, file, bs_opts, bdrv_flags, drv); if (ret < 0) { if (ret == -EMEDIUMTYPE) { error_report("could not open disk image %s: not in %s format", - file ?: dinfo->id, qdict_get_str(bs_opts, "driver")); + file ?: dinfo->id, drv ? drv->format_name : + qdict_get_str(bs_opts, "driver")); } else { error_report("could not open disk image %s: %s", file ?: dinfo->id, strerror(-ret)); -- 1.8.3.1
Re: [Qemu-devel] [PATCH] block: Bugfix 'format' and 'snapshot' used in drive option
Hi all This 'bug' also happens use -snapshot, But as newer to qemu, I'm not sure about what am I understanding. But I think snapshot should works at least in qcow2. My understanding about the driver option is mainly for 'nbd' and 'http/https/ftp', and so in commit: c2ad1b0c465a9ea8375eaff14bbd85705c673f73 So when 'driver' option used, the snapshot can't be using. As this understanding, my first patch is try to get the the value of key==driver, and if the value in 'nbd' and 'http/https/ftp', then give out that error messages. So give me some comments and sugguestions, pls. Thanks Mike 2013/8/8 22:45, Mike Qiu wrote: > When use -drive file='xxx',format=qcow2,snapshot=on the error > message "Can't use snapshot=on with driver-specific options" > can be show, and fail to start the qemu. > > This should not be happened, and there is no file.driver option > in qemu command line. > > It is because the commit 74fe54f2a1b5c4f4498a8fe521e1dfc936656cd4, > it puts 'driver' option if the command line use 'format' option. > > This patch is to solve this bug. > > Signed-off-by: Mike Qiu > --- > blockdev.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/blockdev.c b/blockdev.c > index 41b0a49..e174b7d 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -340,6 +340,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, > QDict *bs_opts; > const char *id; > bool has_driver_specific_opts; > +BlockDriver *drv = NULL; > > translation = BIOS_ATA_TRANSLATION_AUTO; > media = MEDIA_DISK; > @@ -485,7 +486,11 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, > return NULL; > } > > -qdict_put(bs_opts, "driver", qstring_from_str(buf)); > +drv = bdrv_find_whitelisted_format(buf, ro); > +if (!drv) { > +error_report("'%s' invalid format", buf); > +return NULL; > +} > } > > /* disk I/O throttling */ > @@ -700,12 +705,13 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, > } > > QINCREF(bs_opts); > -ret = bdrv_open(dinfo->bdrv, file, bs_opts, bdrv_flags, NULL); > +ret = bdrv_open(dinfo->bdrv, file, bs_opts, bdrv_flags, drv); > > if (ret < 0) { > if (ret == -EMEDIUMTYPE) { > error_report("could not open disk image %s: not in %s format", > - file ?: dinfo->id, qdict_get_str(bs_opts, > "driver")); > + file ?: dinfo->id, drv ? drv->format_name : > + qdict_get_str(bs_opts, "driver")); > } else { > error_report("could not open disk image %s: %s", > file ?: dinfo->id, strerror(-ret));
[Qemu-devel] [RFC PATCH 02/14] rcu: add rcu library
From: Paolo Bonzini This includes a (mangled) copy of the urcu-qsbr code from liburcu. The main changes are: 1) removing dependencies on many other header files in liburcu; 2) removing for simplicity the tentative busy waiting in synchronize_rcu, which has limited performance effects; 3) replacing futexes in synchronize_rcu with QemuEvents for Win32 portability. The API is the same as liburcu, so it should be possible in the future to require liburcu on POSIX systems for example and use our copy only on Windows. Among the various versions available I chose urcu-qsbr, which has the fastest rcu_read_{lock,unlock} but requires the program to manually annotate quiescent points or intervals. QEMU threads usually have easily identified quiescent periods, so this should not be a problem. Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- docs/rcu.txt | 301 + hw/9pfs/virtio-9p-synth.c | 1 + include/qemu/queue.h | 13 ++ include/qemu/rcu-pointer.h | 110 + include/qemu/rcu.h | 168 + include/qemu/thread.h | 3 - libcacard/Makefile | 3 +- util/Makefile.objs | 1 + util/rcu.c | 203 ++ 9 files changed, 799 insertions(+), 4 deletions(-) create mode 100644 docs/rcu.txt create mode 100644 include/qemu/rcu-pointer.h create mode 100644 include/qemu/rcu.h create mode 100644 util/rcu.c diff --git a/docs/rcu.txt b/docs/rcu.txt new file mode 100644 index 000..19e4840 --- /dev/null +++ b/docs/rcu.txt @@ -0,0 +1,301 @@ +Using RCU (Read-Copy-Update) for synchronization + + +Read-copy update (RCU) is a synchronization mechanism that is used to +protect read-mostly data structures. RCU is very efficient and scalable +on the read side (it is wait-free), and thus can make the read paths +extremely fast. + +RCU supports concurrency between a single writer and multiple readers, +thus it is not used alone. Typically, the write-side will use a lock to +serialize multiple updates, but other approaches are possible (e.g., +restricting updates to a single task). In QEMU, when a lock is used, +this will often be the "iothread mutex", also known as the "big QEMU +lock" (BQL). Also, restricting updates to a single task is done in +QEMU using the "bottom half" API. + +RCU is fundamentally a "wait-to-finish" mechanism. The read side marks +sections of code with "critical sections", and the update side will wait +for the execution of all *currently running* critical sections before +proceeding, or before asynchronously executing a callback. + +The key point here is that only the currently running critical sections +are waited for; critical sections that are started _after_ the beginning +of the wait do not extend the wait, despite running concurrently with +the updater. This is the reason why RCU is more scalable than, +for example, reader-writer locks. It is so much more scalable that +the system will have a single instance of the RCU mechanism; a single +mechanism can be used for an arbitrary number of "things", without +having to worry about things such as contention or deadlocks. + +How is this possible? The basic idea is to split updates in two phases, +"removal" and "reclamation". During removal, we ensure that subsequent +readers will not be able to get a reference to the old data. After +removal has completed, a critical section will not be able to access +the old data. Therefore, critical sections that begin after removal +do not matter; as soon as all previous critical sections have finished, +there cannot be any readers who hold references to the data structure, +which may not be safely reclaimed (e.g., freed or unref'ed). + +Here is a picutre: + +thread 1 thread 2 thread 3 +------ +enter RCU crit.sec. + |finish removal phase + |begin wait + | |enter RCU crit.sec. +exit RCU crit.sec | | +complete wait | +begin reclamation phase | + exit RCU crit.sec. + + +Note how thread 3 is still executing its critical section when thread 2 +starts reclaiming data. This is possible, because the old version of the +data structure was not accessible at the time thread 3 began executing +that critical section. + + +RCU API +=== + +The core RCU API is small: + + void rcu_read_lock(void); + +Used by a reader to inform the reclaimer that the reader is +entering an RCU read-side critical s
[Qemu-devel] [RFC PATCH 05/14] rcu: add call_rcu
From: Paolo Bonzini Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- docs/rcu.txt | 108 +-- include/qemu/rcu.h | 22 ++ util/rcu.c | 120 + 3 files changed, 246 insertions(+), 4 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index 5736676..d7c4f0b 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -82,7 +82,50 @@ The core RCU API is small: Note that it would be valid for another update to come while synchronize_rcu is running. Because of this, it is better that the updater releases any locks it may hold before calling -synchronize_rcu. +synchronize_rcu. If this is not possible (for example, because +the updater is protected by the BQL), you can use call_rcu. + + void call_rcu1(struct rcu_head * head, +void (*func)(struct rcu_head *head)); + +This function invokes func(head) after all pre-existing RCU +read-side critical sections on all threads have completed. This +marks the end of the removal phase, with func taking care +asynchronously of the reclamation phase. + +The foo struct needs to have an rcu_head structure added, +perhaps as follows: + +struct foo { +struct rcu_head rcu; +int a; +char b; +long c; +}; + +so that the reclaimer function can fetch the struct foo address +and free it: + +call_rcu1(foo_reclaim, &foo.rcu); + +void foo_reclaim(struct rcu_head *rp) +{ +struct foo *fp = container_of(rp, struct foo, rcu); +g_free(fp); +} + +For the common case where the rcu_head member is the first of the +struct, you can use the following macro. + + void call_rcu(T *p, + void (*func)(T *p), + field-name); + +call_rcu1 is typically used through this macro, in the common case +where the "struct rcu_head" is the first field in the struct. In +the above case, one could have written simply: + +call_rcu(foo_reclaim, g_free, rcu); typeof(*p) rcu_dereference(p); typeof(p) rcu_assign_pointer(p, typeof(p) v); @@ -173,6 +216,11 @@ DIFFERENCES WITH LINUX - rcu_dereference takes a _pointer_ to the variable being accessed. Wrong usage will be detected by the compiler. +- call_rcu is a macro that has an extra argument (the name of the first + field in the struct, which must be a struct rcu_head), and expects the + type of the callback's argument to be the type of the first argument. + call_rcu1 is the same as Linux's call_rcu. + - Quiescent points must be marked explicitly unless the thread uses condvars/semaphores/events for synchronization. @@ -229,7 +277,47 @@ The write side looks simply like this (with appropriate locking): synchronize_rcu(); free(old); -Note that the same idiom would be possible with reader/writer +If the processing cannot be done purely within the critical section, it +is possible to combine this idiom with a "real" reference count: + +rcu_read_lock(); +p = rcu_dereference(&foo); +foo_ref(p); +rcu_read_unlock(); +/* do something with p. */ +foo_unref(p); + +The write side can be like this: + +qemu_mutex_lock(&foo_mutex); +old = foo; +rcu_assign_pointer(foo, new); +qemu_mutex_unlock(&foo_mutex); +synchronize_rcu(); +foo_unref(old); + +or with call_rcu: + +qemu_mutex_lock(&foo_mutex); +old = foo; +rcu_assign_pointer(foo, new); +qemu_mutex_unlock(&foo_mutex); +call_rcu(foo_unref, old, rcu); + +In both cases, the write side only performs removal. Reclamation +happens when the last reference to a "foo" object is dropped. +Using synchronize_rcu() is undesirably expensive, because the +last reference may be dropped on the read side. Hence you can +use call_rcu() instead: + + foo_unref(struct foo *p) { +if (atomic_dec(&p->refcount) == 0) { +call_rcu(foo_destroy, p, rcu); +} +} + + +Note that the same idioms would be possible with reader/writer locks: read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock); @@ -239,13 +327,25 @@ locks: write_mutex_unlock(&foo_rwlock); free(p); +-- + +read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock); +p = foo;old = foo; +foo_ref(p); foo = new; +read_unlock(&foo_rwlock); write_mutex_unlock(&foo_rwlock); +/* do something with p. */ foo_unref(old); +foo_unref(
[Qemu-devel] [RFC PATCH 00/14] RCU Implementation for Qemu
This is a rebase of Paolo's May patchset on v1.6.0-rc3 The tree is availavle on github: https://github.com/ncultra/qemu/tree/rcu-for-1.7 Mike Day (3): fix #include directive for rcu header include osdep.h for definition of glue(a,b) fix pointer reference to rcu_assign_pointer Paolo Bonzini (11): qemu-thread: add QemuEvent rcu: add rcu library qemu-thread: register threads with RCU rcu: add call_rcu rcu: add rcutorture rcu: allow nested calls to rcu_thread_offline/rcu_thread_online qemu-thread: report RCU quiescent states event loop: report RCU quiescent states cpus: report RCU quiescent states block: report RCU quiescent states migration: report RCU quiescent states aio-posix.c | 9 +- aio-win32.c | 7 + block/raw-posix.c | 3 + block/raw-win32.c | 3 + cpus.c | 3 + docs/rcu.txt| 434 +++ hw/9pfs/virtio-9p-synth.c | 1 + include/qemu/queue.h| 13 ++ include/qemu/rcu-pointer.h | 110 +++ include/qemu/rcu.h | 208 + include/qemu/thread-posix.h | 8 + include/qemu/thread-win32.h | 4 + include/qemu/thread.h | 10 +- kvm-all.c | 3 + libcacard/Makefile | 3 +- main-loop.c | 7 +- migration.c | 2 + tests/Makefile | 4 +- tests/rcutorture.c | 439 util/Makefile.objs | 1 + util/qemu-thread-posix.c| 173 - util/qemu-thread-win32.c| 44 - util/rcu.c | 320 23 files changed, 1796 insertions(+), 13 deletions(-) create mode 100644 docs/rcu.txt create mode 100644 include/qemu/rcu-pointer.h create mode 100644 include/qemu/rcu.h create mode 100644 tests/rcutorture.c create mode 100644 util/rcu.c -- 1.8.3.1
[Qemu-devel] [RFC PATCH 04/14] qemu-thread: register threads with RCU
From: Paolo Bonzini Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- docs/rcu.txt | 13 +++-- util/qemu-thread-posix.c | 28 +++- util/qemu-thread-win32.c | 2 ++ 3 files changed, 36 insertions(+), 7 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index 19e4840..5736676 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -122,8 +122,8 @@ on many POSIX systems other than Linux and Solaris). For this reason, QEMU's RCU implementation resorts to manual annotation of "quiescent states", i.e. points where no RCU read-side critical -section can be active. All threads that participate in the RCU mechanism -need to annotate such points. +section can be active. All threads created with qemu_thread_create +participate in the RCU mechanism and need to annotate such points. Marking quiescent states is done with the following three APIs: @@ -144,8 +144,8 @@ Marking quiescent states is done with the following three APIs: thread. -Furthermore, threads that participate in the RCU mechanism must communicate -this fact using the following APIs: +The following APIs can be used to use RCU in a thread that is not +created with qemu_thread_create(): void rcu_register_thread(void); @@ -160,8 +160,9 @@ this fact using the following APIs: either manually or by using the QemuCond/QemuSemaphore/QemuEvent APIs. -Note that these APIs are relatively heavyweight, and should _not_ be -nested. +Note that these APIs are relatively heavyweight, should _not_ be +nested, and should not be called in threads that are created with +qemu_thread_create(). DIFFERENCES WITH LINUX diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c index 37dd298..2371176 100644 --- a/util/qemu-thread-posix.c +++ b/util/qemu-thread-posix.c @@ -26,6 +26,7 @@ #endif #include "qemu/thread.h" #include "qemu/atomic.h" +#include "qemu/rcu.h" static void error_exit(int err, const char *msg) { @@ -388,6 +389,26 @@ void qemu_event_wait(QemuEvent *ev) } +typedef struct QemuThreadData { +/* Passed to win32_start_routine. */ +void *(*start_routine)(void *); +void *arg; +} QemuThreadData; + +static void *thread_start_routine(void *arg) +{ +QemuThreadData *data = (QemuThreadData *) arg; +void *(*start_routine)(void *) = data->start_routine; +void *thread_arg = data->arg; +void *ret; + +rcu_register_thread(); +g_free(data); +ret = start_routine(thread_arg); +rcu_unregister_thread(); +return ret; +} + void qemu_thread_create(QemuThread *thread, void *(*start_routine)(void*), void *arg, int mode) @@ -395,6 +416,11 @@ void qemu_thread_create(QemuThread *thread, sigset_t set, oldset; int err; pthread_attr_t attr; +QemuThreadData *data; + +data = g_malloc(sizeof(*data)); +data->start_routine = start_routine; +data->arg = arg; err = pthread_attr_init(&attr); if (err) { @@ -410,7 +436,7 @@ void qemu_thread_create(QemuThread *thread, /* Leave signal handling to the iothread. */ sigfillset(&set); pthread_sigmask(SIG_SETMASK, &set, &oldset); -err = pthread_create(&thread->thread, &attr, start_routine, arg); +err = pthread_create(&thread->thread, &attr, thread_start_routine, data); if (err) error_exit(err, __func__); diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c index 27a5217..0c4850d 100644 --- a/util/qemu-thread-win32.c +++ b/util/qemu-thread-win32.c @@ -278,6 +278,7 @@ static unsigned __stdcall win32_start_routine(void *arg) data = NULL; } qemu_thread_data = data; +rcu_register_thread(); qemu_thread_exit(start_routine(thread_arg)); abort(); } @@ -293,6 +294,7 @@ void qemu_thread_exit(void *arg) data->exited = true; LeaveCriticalSection(&data->cs); } +rcu_unregister_thread(); _endthreadex(0); } -- 1.8.3.1
[Qemu-devel] [RFC PATCH 08/14] qemu-thread: report RCU quiescent states
From: Paolo Bonzini Most threads will use mutexes and other sleeping synchronization primitives (condition variables, semaphores, events) periodically. For these threads, the synchronization primitives are natural places to report a quiescent state (possibly an extended one). Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- docs/rcu.txt | 27 +++ util/qemu-thread-posix.c | 29 + util/qemu-thread-win32.c | 16 +++- util/rcu.c | 3 --- 4 files changed, 67 insertions(+), 8 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index 4e7cde3..38fd8f4 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -168,6 +168,33 @@ of "quiescent states", i.e. points where no RCU read-side critical section can be active. All threads created with qemu_thread_create participate in the RCU mechanism and need to annotate such points. +Luckily, in most cases no manual annotation is needed, because waiting +on condition variables (qemu_cond_wait), semaphores (qemu_sem_wait, +qemu_sem_timedwait) or events (qemu_event_wait) implicitly marks the thread +as quiescent for the whole duration of the wait. (There is an exception +for semaphore waits with a zero timeout). + +Manual annotation is still needed in the following cases: + +- threads that spend their sleeping time in the kernel, for example + in a call to select(), poll() or WaitForMultipleObjects(). The QEMU + I/O thread is an example of this case. + +- threads that perform a lot of I/O. In QEMU, the workers used for + aio=thread are an example of this case (see aio_worker in block/raw-*). + +- threads that run continuously until they exit. The migration thread + is an example of this case. + +Regarding the second case, note that the workers run in the QEMU thread +pool. The thread pool uses semaphores for synchronization, hence it does +report quiescent states periodically. However, in some cases (e.g. NFS +mounted with the "hard" option) the workers can take an arbitrarily long +amount of time. When this happens, synchronize_rcu() will not exit and +call_rcu() callbacks will be delayed arbitrarily. It is therefore a +good idea to mark I/O system calls as quiescence points in the worker +functions. + Marking quiescent states is done with the following three APIs: void rcu_quiescent_state(void); diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c index 2371176..dbefff8 100644 --- a/util/qemu-thread-posix.c +++ b/util/qemu-thread-posix.c @@ -119,7 +119,9 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex) { int err; +rcu_thread_offline(); err = pthread_cond_wait(&cond->cond, &mutex->lock); +rcu_thread_online(); if (err) error_exit(err, __func__); } @@ -211,6 +213,10 @@ int qemu_sem_timedwait(QemuSemaphore *sem, int ms) int rc; struct timespec ts; +if (ms) { +rcu_thread_offline(); +} + #if defined(__APPLE__) || defined(__NetBSD__) rc = 0; compute_abs_deadline(&ts, ms); @@ -228,7 +234,10 @@ int qemu_sem_timedwait(QemuSemaphore *sem, int ms) --sem->count; } pthread_mutex_unlock(&sem->lock); -return (rc == ETIMEDOUT ? -1 : 0); +if (rc == ETIMEDOUT) { +rc == -1; +} + #else if (ms <= 0) { /* This is cheaper than sem_timedwait. */ @@ -236,7 +245,7 @@ int qemu_sem_timedwait(QemuSemaphore *sem, int ms) rc = sem_trywait(&sem->sem); } while (rc == -1 && errno == EINTR); if (rc == -1 && errno == EAGAIN) { -return -1; +goto out; } } else { compute_abs_deadline(&ts, ms); @@ -244,19 +253,26 @@ int qemu_sem_timedwait(QemuSemaphore *sem, int ms) rc = sem_timedwait(&sem->sem, &ts); } while (rc == -1 && errno == EINTR); if (rc == -1 && errno == ETIMEDOUT) { -return -1; +goto out; } } if (rc < 0) { error_exit(errno, __func__); } -return 0; + #endif + +out: +if (ms) { +rcu_thread_online(); +} +return rc; } void qemu_sem_wait(QemuSemaphore *sem) { int rc; +rcu_thread_offline(); #if defined(__APPLE__) || defined(__NetBSD__) pthread_mutex_lock(&sem->lock); @@ -276,6 +292,7 @@ void qemu_sem_wait(QemuSemaphore *sem) error_exit(errno, __func__); } #endif +rcu_thread_online(); } #ifdef __linux__ @@ -384,7 +401,11 @@ void qemu_event_wait(QemuEvent *ev) return; } } +rcu_thread_offline(); futex_wait(ev, EV_BUSY); +rcu_thread_online(); +} else { +rcu_quiescent_state(); } } diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c index 0c4850d..7bc07b3 100644 --- a/util/qemu-thread-win32.c +++ b/
[Qemu-devel] [RFC PATCH 07/14] rcu: allow nested calls to rcu_thread_offline/rcu_thread_online
From: Paolo Bonzini Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- docs/rcu.txt | 5 + include/qemu/rcu.h | 21 +++-- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index d7c4f0b..4e7cde3 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -187,6 +187,11 @@ Marking quiescent states is done with the following three APIs: thread. +rcu_thread_offline() and rcu_thread_online() can be nested. The end of +the extended quiescent state will coincide with the outermost call to +rcu_thread_online(). + + The following APIs can be used to use RCU in a thread that is not created with qemu_thread_create(): diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h index e43b912..3a55045 100644 --- a/include/qemu/rcu.h +++ b/include/qemu/rcu.h @@ -77,6 +77,9 @@ struct rcu_reader_data { unsigned long ctr; bool waiting; +/* Data used by reader only */ +unsigned offline; + /* Data used for registry, protected by rcu_gp_lock */ QLIST_ENTRY(rcu_reader_data) node; }; @@ -127,9 +130,14 @@ static inline void rcu_read_unlock(void) static inline void rcu_quiescent_state(void) { struct rcu_reader_data *p_rcu_reader = get_rcu_reader(); +unsigned ctr; + +if (p_rcu_reader->offline > 0) { +return; +} /* Reuses smp_rmb() in the last rcu_read_unlock(). */ -unsigned ctr = atomic_read(&rcu_gp_ctr); +ctr = atomic_read(&rcu_gp_ctr); atomic_xchg(&p_rcu_reader->ctr, ctr); if (atomic_read(&p_rcu_reader->waiting)) { atomic_set(&p_rcu_reader->waiting, false); @@ -141,6 +149,10 @@ static inline void rcu_thread_offline(void) { struct rcu_reader_data *p_rcu_reader = get_rcu_reader(); +if (p_rcu_reader->offline++ > 0) { +return; +} + atomic_xchg(&p_rcu_reader->ctr, 0); if (atomic_read(&p_rcu_reader->waiting)) { atomic_set(&p_rcu_reader->waiting, false); @@ -150,7 +162,12 @@ static inline void rcu_thread_offline(void) static inline void rcu_thread_online(void) { -rcu_quiescent_state(); +struct rcu_reader_data *p_rcu_reader = get_rcu_reader(); + +assert(p_rcu_reader->offline != 0); +if (--p_rcu_reader->offline == 0) { +rcu_quiescent_state(); +} } extern void synchronize_rcu(void); -- 1.8.3.1
[Qemu-devel] [RFC PATCH 03/14] fix #include directive for rcu header
Signed-off-by: Mike Day --- hw/9pfs/virtio-9p-synth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/9pfs/virtio-9p-synth.c b/hw/9pfs/virtio-9p-synth.c index d5f5842..fdfea21 100644 --- a/hw/9pfs/virtio-9p-synth.c +++ b/hw/9pfs/virtio-9p-synth.c @@ -17,7 +17,7 @@ #include "virtio-9p-xattr.h" #include "fsdev/qemu-fsdev.h" #include "virtio-9p-synth.h" -#include "util/rcu.h" +#include "qemu/rcu.h" #include -- 1.8.3.1
[Qemu-devel] [RFC PATCH 09/14] event loop: report RCU quiescent states
From: Paolo Bonzini Threads that run event loops also have places that can sleep for an extended time. Place an extended quiescent state there. Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- aio-posix.c | 9 - aio-win32.c | 7 +++ main-loop.c | 7 ++- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/aio-posix.c b/aio-posix.c index b68eccd..562add6 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -17,6 +17,7 @@ #include "block/block.h" #include "qemu/queue.h" #include "qemu/sockets.h" +#include "qemu/rcu.h" struct AioHandler { @@ -232,10 +233,16 @@ bool aio_poll(AioContext *ctx, bool blocking) } /* wait until next event */ +if (blocking) { +rcu_thread_offline(); +} ret = g_poll((GPollFD *)ctx->pollfds->data, ctx->pollfds->len, blocking ? -1 : 0); - +if (blocking) { +rcu_thread_online(); +} + /* if we have any readable fds, dispatch event */ if (ret > 0) { QLIST_FOREACH(node, &ctx->aio_handlers, node) { diff --git a/aio-win32.c b/aio-win32.c index 38723bf..8a6abb0 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -175,7 +175,14 @@ bool aio_poll(AioContext *ctx, bool blocking) /* wait until next event */ while (count > 0) { int timeout = blocking ? INFINITE : 0; + +if (timeout) { +rcu_thread_offline(); +} int ret = WaitForMultipleObjects(count, events, FALSE, timeout); +if (timeout) { +rcu_thread_online(); +} /* if we have any signaled events, dispatch event */ if ((DWORD) (ret - WAIT_OBJECT_0) >= count) { diff --git a/main-loop.c b/main-loop.c index a44fff6..79f43da 100644 --- a/main-loop.c +++ b/main-loop.c @@ -27,6 +27,7 @@ #include "qemu/sockets.h" // struct in_addr needed for libslirp.h #include "slirp/libslirp.h" #include "qemu/main-loop.h" +#include "qemu/rcu.h" #include "block/aio.h" #ifndef _WIN32 @@ -220,6 +221,7 @@ static int os_host_main_loop_wait(uint32_t timeout) if (timeout > 0) { spin_counter = 0; qemu_mutex_unlock_iothread(); + rcu_thread_offline(); } else { spin_counter++; } @@ -227,7 +229,8 @@ static int os_host_main_loop_wait(uint32_t timeout) ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout); if (timeout > 0) { -qemu_mutex_lock_iothread(); +rcu_thread_online(); + qemu_mutex_lock_iothread(); } glib_pollfds_poll(); @@ -424,7 +427,9 @@ static int os_host_main_loop_wait(uint32_t timeout) } qemu_mutex_unlock_iothread(); +rcu_thread_offline(); g_poll_ret = g_poll(poll_fds, n_poll_fds + w->num, poll_timeout); +rcu_thread_online(); qemu_mutex_lock_iothread(); if (g_poll_ret > 0) { for (i = 0; i < w->num; i++) { -- 1.8.3.1
[Qemu-devel] [PATCH 13/14] include osdep.h for definition of glue(a, b)
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h index 3a55045..f398e6c 100644 --- a/include/qemu/rcu.h +++ b/include/qemu/rcu.h @@ -39,6 +39,7 @@ #include "qemu/compiler.h" #include "qemu/rcu-pointer.h" #include "qemu/thread.h" +#include "qemu/osdep.h" #include "qemu/queue.h" #include "qemu/atomic.h" -- 1.8.3.1
[Qemu-devel] [RFC PATCH 10/14] cpus: report RCU quiescent states
From: Paolo Bonzini CPU threads have extended quiescent states while relinquishing control to the accelerator (except TCG). Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- cpus.c| 3 +++ kvm-all.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/cpus.c b/cpus.c index 0f65e76..624658e 100644 --- a/cpus.c +++ b/cpus.c @@ -37,6 +37,7 @@ #include "sysemu/qtest.h" #include "qemu/main-loop.h" #include "qemu/bitmap.h" +#include "qemu/rcu.h" #ifndef _WIN32 #include "qemu/compatfd.h" @@ -818,6 +819,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg) while (1) { current_cpu = NULL; qemu_mutex_unlock_iothread(); + rcu_thread_offline(); do { int sig; r = sigwait(&waitset, &sig); @@ -826,6 +828,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg) perror("sigwait"); exit(1); } + rcu_thread_online(); qemu_mutex_lock_iothread(); current_cpu = cpu; qemu_wait_io_event_common(cpu); diff --git a/kvm-all.c b/kvm-all.c index 716860f..a47573d 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -33,6 +33,7 @@ #include "exec/memory.h" #include "exec/address-spaces.h" #include "qemu/event_notifier.h" +#include "qemu/rcu.h" #include "trace.h" /* This check must be after config-host.h is included */ @@ -1641,7 +1642,9 @@ int kvm_cpu_exec(CPUState *cpu) } qemu_mutex_unlock_iothread(); + rcu_thread_offline(); run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0); + rcu_thread_online(); qemu_mutex_lock_iothread(); kvm_arch_post_run(cpu, run); -- 1.8.3.1
[Qemu-devel] [RFC PATCH 11/14] block: report RCU quiescent states
From: Paolo Bonzini The aio workers may spend a long time executing I/O operations; mark that time as an extended quiescent state. Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- block/raw-posix.c | 3 +++ block/raw-win32.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index ba721d3..962833e 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -26,6 +26,7 @@ #include "qemu/log.h" #include "block/block_int.h" #include "qemu/module.h" +#include "qemu/rcu.h" #include "trace.h" #include "block/thread-pool.h" #include "qemu/iov.h" @@ -735,6 +736,7 @@ static int aio_worker(void *arg) RawPosixAIOData *aiocb = arg; ssize_t ret = 0; +rcu_thread_offline(); switch (aiocb->aio_type & QEMU_AIO_TYPE_MASK) { case QEMU_AIO_READ: ret = handle_aiocb_rw(aiocb); @@ -774,6 +776,7 @@ static int aio_worker(void *arg) } g_slice_free(RawPosixAIOData, aiocb); +rcu_thread_online(); return ret; } diff --git a/block/raw-win32.c b/block/raw-win32.c index 9b5b2af..5f88452 100644 --- a/block/raw-win32.c +++ b/block/raw-win32.c @@ -25,6 +25,7 @@ #include "qemu/timer.h" #include "block/block_int.h" #include "qemu/module.h" +#include "qemu/rcu.h" #include "raw-aio.h" #include "trace.h" #include "block/thread-pool.h" @@ -99,6 +100,7 @@ static int aio_worker(void *arg) ssize_t ret = 0; size_t count; +rcu_thread_offline(); switch (aiocb->aio_type & QEMU_AIO_TYPE_MASK) { case QEMU_AIO_READ: count = handle_aiocb_rw(aiocb); @@ -136,6 +138,7 @@ static int aio_worker(void *arg) } g_slice_free(RawWin32AIOData, aiocb); +rcu_thread_online(); return ret; } -- 1.8.3.1
[Qemu-devel] [PATCH 14/14] fix pointer reference to rcu_assign_pointer
Signed-off-by: Mike Day --- tests/rcutorture.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/rcutorture.c b/tests/rcutorture.c index 236c628..b219ff5 100644 --- a/tests/rcutorture.c +++ b/tests/rcutorture.c @@ -293,7 +293,7 @@ static void *rcu_update_stress_test(void *arg) smp_mb(); p->pipe_count = 0; p->mbtest = 1; -rcu_assign_pointer(&rcu_stress_current, &p); +rcu_assign_pointer(rcu_stress_current, p); rcu_stress_idx = i; for (i = 0; i < RCU_STRESS_PIPE_LEN; i++) if (i != rcu_stress_idx) { -- 1.8.3.1
[Qemu-devel] [RFC PATCH 12/14] migration: report RCU quiescent states
From: Paolo Bonzini The migration thread polls s->state periodically, it does not use a mutex or condition variable, so it has to report quiescent states manually. Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- migration.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/migration.c b/migration.c index 1402fa7..9d0950e 100644 --- a/migration.c +++ b/migration.c @@ -22,6 +22,7 @@ #include "qemu/sockets.h" #include "migration/block.h" #include "qemu/thread.h" +#include "qemu/rcu.h" #include "qmp-commands.h" #include "trace.h" @@ -563,6 +564,7 @@ static void *migration_thread(void *opaque) int64_t current_time; uint64_t pending_size; +rcu_quiescent_state(); if (!qemu_file_rate_limit(s->file)) { DPRINTF("iterate\n"); pending_size = qemu_savevm_state_pending(s->file, max_size); -- 1.8.3.1
[Qemu-devel] [RFC PATCH 01/14] qemu-thread: add QemuEvent
From: Paolo Bonzini This emulates Win32 manual-reset events using futexes or conditional variables. Typical ways to use them are with multi-producer, single-consumer data structures, to test for a complex condition whose elements come from different threads: for (;;) { qemu_event_reset(ev); ... test complex condition ... if (condition is true) { break; } qemu_event_wait(ev); } Alternatively: ... compute condition ... if (condition) { do { qemu_event_wait(ev); qemu_event_reset(ev); ... compute condition ... } while(condition); qemu_event_set(ev); } QemuEvent provides a very fast userspace path in the common case when no other thread is waiting, or the event is not changing state. It is used to report RCU quiescent states to the thread calling synchronize_rcu (the latter being the single consumer), and to report call_rcu invocations to the thread that receives them. Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- include/qemu/thread-posix.h | 8 +++ include/qemu/thread-win32.h | 4 ++ include/qemu/thread.h | 7 +++ util/qemu-thread-posix.c| 116 util/qemu-thread-win32.c| 26 ++ 5 files changed, 161 insertions(+) diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h index 361566a..eb5c7a1 100644 --- a/include/qemu/thread-posix.h +++ b/include/qemu/thread-posix.h @@ -21,6 +21,14 @@ struct QemuSemaphore { #endif }; +struct QemuEvent { +#ifndef __linux__ +pthread_mutex_t lock; +pthread_cond_t cond; +#endif +unsigned value; +}; + struct QemuThread { pthread_t thread; }; diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h index 13adb95..3d58081 100644 --- a/include/qemu/thread-win32.h +++ b/include/qemu/thread-win32.h @@ -17,6 +17,10 @@ struct QemuSemaphore { HANDLE sema; }; +struct QemuEvent { +HANDLE event; +}; + typedef struct QemuThreadData QemuThreadData; struct QemuThread { QemuThreadData *data; diff --git a/include/qemu/thread.h b/include/qemu/thread.h index c02404b..3e32c65 100644 --- a/include/qemu/thread.h +++ b/include/qemu/thread.h @@ -7,6 +7,7 @@ typedef struct QemuMutex QemuMutex; typedef struct QemuCond QemuCond; typedef struct QemuSemaphore QemuSemaphore; +typedef struct QemuEvent QemuEvent; typedef struct QemuThread QemuThread; #ifdef _WIN32 @@ -45,6 +46,12 @@ void qemu_sem_wait(QemuSemaphore *sem); int qemu_sem_timedwait(QemuSemaphore *sem, int ms); void qemu_sem_destroy(QemuSemaphore *sem); +void qemu_event_init(QemuEvent *ev, bool init); +void qemu_event_set(QemuEvent *ev); +void qemu_event_reset(QemuEvent *ev); +void qemu_event_wait(QemuEvent *ev); +void qemu_event_destroy(QemuEvent *ev); + void qemu_thread_create(QemuThread *thread, void *(*start_routine)(void *), void *arg, int mode); diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c index 4de133e..37dd298 100644 --- a/util/qemu-thread-posix.c +++ b/util/qemu-thread-posix.c @@ -20,7 +20,12 @@ #include #include #include +#ifdef __linux__ +#include +#include +#endif #include "qemu/thread.h" +#include "qemu/atomic.h" static void error_exit(int err, const char *msg) { @@ -272,6 +277,117 @@ void qemu_sem_wait(QemuSemaphore *sem) #endif } +#ifdef __linux__ +#define futex(...) syscall(__NR_futex, __VA_ARGS__) + +static inline void futex_wake(QemuEvent *ev, int n) +{ +futex(ev, FUTEX_WAKE, n, NULL, NULL, 0); +} + +static inline void futex_wait(QemuEvent *ev, unsigned val) +{ +futex(ev, FUTEX_WAIT, (int) val, NULL, NULL, 0); +} +#else +static inline void futex_wake(QemuEvent *ev, int n) +{ +if (n == 1) { +pthread_cond_signal(&ev->cond); +} else { +pthread_cond_broadcast(&ev->cond); +} +} + +static inline void futex_wait(QemuEvent *ev, unsigned val) +{ +pthread_mutex_lock(&ev->lock); +if (ev->value == val) { +pthread_cond_wait(&ev->cond, &ev->lock); +} +pthread_mutex_unlock(&ev->lock); +} +#endif + +/* Valid transitions: + * - free->set, when setting the event + * - busy->set, when setting the event, followed by futex_wake + * - set->free, when resetting the event + * - free->busy, when waiting + * + * set->busy does not happen (it can be observed from the outside but + * it really is set->free->busy). + * + * busy->free provably cannot happen; to enforce it, the set->free transition + * is done with an OR, which becomes a no-op if the event has concurrently + * transitioned to free or busy. + */ + +#define EV_SET 0 +#define EV_FREE1 +#define EV_BUSY -1 + +void qemu_event_init(QemuEvent *ev, bool init) +{ +#ifndef __linux__ +pthread_mutex_init(&ev->lock,
[Qemu-devel] [RFC PATCH 06/14] rcu: add rcutorture
From: Paolo Bonzini A stress test program (works, too :)). Signed-off-by: Paolo Bonzini Reviewed-by: Mike Day --- tests/Makefile | 4 +- tests/rcutorture.c | 439 + 2 files changed, 442 insertions(+), 1 deletion(-) create mode 100644 tests/rcutorture.c diff --git a/tests/Makefile b/tests/Makefile index d044908..b4a52b4 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -99,7 +99,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \ tests/test-string-input-visitor.o tests/test-qmp-output-visitor.o \ tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \ tests/test-qmp-commands.o tests/test-visitor-serialization.o \ - tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o + tests/test-x86-cpuid.o tests/test-mul64.o tests/rcutortore.o \ +tests/test-int128.o test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o @@ -122,6 +123,7 @@ tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o libqemuutil.a libqemustub.a tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o xbzrle.o page_cache.o libqemuutil.a tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o ++tests/rcutorture$(EXESUF): tests/rcutorture.o libqemuutil.a tests/test-int128$(EXESUF): tests/test-int128.o tests/test-qapi-types.c tests/test-qapi-types.h :\ diff --git a/tests/rcutorture.c b/tests/rcutorture.c new file mode 100644 index 000..236c628 --- /dev/null +++ b/tests/rcutorture.c @@ -0,0 +1,439 @@ +/* + * rcutorture.c: simple user-level performance/stress test of RCU. + * + * Usage: + * ./rcu rperf [ ] + * Run a read-side performance test with the specified + * number of readers for seconds. + * ./rcu uperf [ ] + * Run an update-side performance test with the specified + * number of updaters and specified duration. + * ./rcu perf [ ] + * Run a combined read/update performance test with the specified + * number of readers and one updater and specified duration. + * + * The above tests produce output as follows: + * + * n_reads: 46008000 n_updates: 146026 nreaders: 2 nupdaters: 1 duration: 1 + * ns/read: 43.4707 ns/update: 6848.1 + * + * The first line lists the total number of RCU reads and updates executed + * during the test, the number of reader threads, the number of updater + * threads, and the duration of the test in seconds. The second line + * lists the average duration of each type of operation in nanoseconds, + * or "nan" if the corresponding type of operation was not performed. + * + * ./rcu stress [ ] + * Run a stress test with the specified number of readers and + * one updater. + * + * This test produces output as follows: + * + * n_reads: 114633217 n_updates: 3903415 n_mberror: 0 + * rcu_stress_count: 114618391 14826 0 0 0 0 0 0 0 0 0 + * + * The first line lists the number of RCU read and update operations + * executed, followed by the number of memory-ordering violations + * (which will be zero in a correct RCU implementation). The second + * line lists the number of readers observing progressively more stale + * data. A correct RCU implementation will have all but the first two + * numbers non-zero. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (c) 2008 Paul E. McKenney, IBM Corporation. + */ + +/* + * Test variables. + */ + +#include +#include +#include +#include +#include "qemu/atomic.h" +#include "qemu/rcu.h" +#include "qemu/compiler.h" +#include "qemu/thread.h" + +long long n_reads = 0LL; +long n_updates = 0L; +int nthreadsrunning; + +char argsbuf[64]; + +#define GOFLAG_INIT 0 +#define GOFLAG_RUN 1 +#define GOFLAG_STOP 2 + +static volatile int goflag = GOFLAG_INIT; + +#define RCU_READ_RUN 1000 + +#define NR_THREADS 100 +static QemuThread threads[NR_THREADS]; +static struct rcu_reader_data *data[NR_THREADS]; +static int n_threads; + +static void create_thread(void *(*func)(void *)) +{ +if (n_threads >= NR_THREADS) { +fprintf(stderr, "Thread limit of %d exceeded!\n", NR_THREADS); +exit(-1); +} +
[Qemu-devel] [PATCH 1/2] fixed tests/Makefile to correctly link rcutorture
From: Paolo Bonzini Reviewed-by: Mike Day --- tests/Makefile | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index b4a52b4..4d68d28 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -44,9 +44,14 @@ check-unit-y += tests/test-cutils$(EXESUF) gcov-files-test-cutils-y += util/cutils.c check-unit-y += tests/test-mul64$(EXESUF) gcov-files-test-mul64-y = util/host-utils.c +check-unit-y += tests/test-tls$(EXESUF) +# all code tested by test-tls is inside tls.h +gcov-files-test-tls-y = check-unit-y += tests/test-int128$(EXESUF) # all code tested by test-int128 is inside int128.h gcov-files-test-int128-y = +check-unit-y += tests/rcutorture$(EXESUF) +gcov-files-rcutorture-y = util/rcu.c check-unit-y += tests/test-bitops$(EXESUF) check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh @@ -99,8 +104,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \ tests/test-string-input-visitor.o tests/test-qmp-output-visitor.o \ tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \ tests/test-qmp-commands.o tests/test-visitor-serialization.o \ - tests/test-x86-cpuid.o tests/test-mul64.o tests/rcutortore.o \ -tests/test-int128.o + tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \ + tests/test-tls.o tests/rcutorture.o test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o @@ -123,8 +128,9 @@ tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o libqemuutil.a libqemustub.a tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o xbzrle.o page_cache.o libqemuutil.a tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o +tests/rcutorture$(EXESUF): tests/rcutorture.o libqemuutil.a tests/test-int128$(EXESUF): tests/test-int128.o +tests/test-tls$(EXESUF): tests/test-tls.o libqemuutil.a +tests/rcutorture$(EXESUF): tests/rcutorture.o libqemuutil.a tests/test-qapi-types.c tests/test-qapi-types.h :\ $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py -- 1.8.3.1
[Qemu-devel] [RFC PATCH 0/2] v2.1 RCU Implementation for QEMU
This series applies on top today's git.qemu.org/master and is online at: https://github.com/ncultra/qemu/tree/rcu-for-1.7 Paolo Bonzini (2): fixed tests/Makefile to correctly link rcutorture enable TLS in build and activate test-tls in make check configure | 63 ++ include/qemu/tls.h | 127 + include/qom/cpu.h | 2 +- tests/Makefile | 14 -- tests/test-tls.c | 87 5 files changed, 261 insertions(+), 32 deletions(-) create mode 100644 tests/test-tls.c -- 1.8.3.1
[Qemu-devel] [PATCH 2/2] enable TLS in build and activate test-tls in make check
From: Paolo Bonzini Reviewed-by: Mike Day --- configure | 63 ++ include/qemu/tls.h | 127 + include/qom/cpu.h | 2 +- tests/Makefile | 2 +- tests/test-tls.c | 87 5 files changed, 252 insertions(+), 29 deletions(-) create mode 100644 tests/test-tls.c diff --git a/configure b/configure index 18fa608..baf61c8 100755 --- a/configure +++ b/configure @@ -285,6 +285,7 @@ fi ar="${AR-${cross_prefix}ar}" as="${AS-${cross_prefix}as}" cpp="${CPP-$cc -E}" +nm="${NM-${cross_prefix}nm}" objcopy="${OBJCOPY-${cross_prefix}objcopy}" ld="${LD-${cross_prefix}ld}" libtool="${LIBTOOL-${cross_prefix}libtool}" @@ -1241,6 +1242,29 @@ if compile_prog "-Werror -fno-gcse" "" ; then TRANSLATE_OPT_CFLAGS=-fno-gcse fi +## +# Using __thread is either faster than pthread_get/setspecific, +# or (if using GCC's "emutls" feature) exactly the same. So +# we always use it if available. + +cat > $TMPC << EOF +__thread int x; + +int main(void) +{ + x = 42; + return x; +} +EOF +if compile_prog "-Werror" "" ; then + tls=yes +else + tls=no +fi + +## +# Position Independent executables + if test "$static" = "yes" ; then if test "$pie" = "yes" ; then error_exit "static and pie are mutually incompatible" @@ -1260,19 +1284,18 @@ if test "$pie" = ""; then fi if test "$pie" != "no" ; then + if test "$CONFIG_TLS" = yes; then +THREAD=__thread + else +THREAD= + fi cat > $TMPC << EOF - -#ifdef __linux__ -# define THREAD __thread -#else -# define THREAD -#endif - -static THREAD int tls_var; +static $THREAD int tls_var; int main(void) { return tls_var; } EOF + unset THREAD if compile_prog "-fPIE -DPIE" "-pie"; then QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS" LDFLAGS="-pie $LDFLAGS" @@ -3184,6 +3207,22 @@ if test "$trace_backend" = "dtrace"; then fi ## +# check for TLS runtime + +# Some versions of mingw include the "magic" definitions that make +# TLS work, some don't. Check for it. + +if test "$mingw32" = yes; then + cat > $TMPC << EOF +int main(void) { return 0; } +EOF + compile_prog "" "" + if $nm $TMPE | grep _tls_used > /dev/null 2>&1; then +mingw32_tls_runtime=yes + fi +fi + +## # check and set a backend for coroutine # We prefer ucontext, but it's not always possible. The fallback @@ -3677,6 +3716,9 @@ if test "$mingw32" = "yes" ; then version_micro=0 echo "CONFIG_FILEVERSION=$version_major,$version_minor,$version_subminor,$version_micro" >> $config_host_mak echo "CONFIG_PRODUCTVERSION=$version_major,$version_minor,$version_subminor,$version_micro" >> $config_host_mak + if test "$mingw32_tls_runtime" = yes; then +echo "CONFIG_MINGW32_TLS_RUNTIME=y" >> $config_host_mak + fi else echo "CONFIG_POSIX=y" >> $config_host_mak fi @@ -3979,6 +4021,10 @@ if test "$cpuid_h" = "yes" ; then echo "CONFIG_CPUID_H=y" >> $config_host_mak fi +if test "$tls" = "yes" ; then + echo "CONFIG_TLS=y" >> $config_host_mak +fi + if test "$int128" = "yes" ; then echo "CONFIG_INT128=y" >> $config_host_mak fi @@ -4107,6 +4153,7 @@ echo "OBJCC=$objcc" >> $config_host_mak echo "AR=$ar" >> $config_host_mak echo "AS=$as" >> $config_host_mak echo "CPP=$cpp" >> $config_host_mak +echo "NM=$nm" >> $config_host_mak echo "OBJCOPY=$objcopy" >> $config_host_mak echo "LD=$ld" >> $config_host_mak echo "WINDRES=$windres" >> $config_host_mak diff --git a/include/qemu/tls.h b/include/qemu/tls.h index b92ea9d..c878aaa 100644 --- a/include/qemu/tls.h +++ b/include/qemu/tls.h @@ -1,7 +1,7 @@ /* * Abstraction layer for defining and using TLS variables * - * Copyright (c) 2011 Red Hat, Inc + * Copyright (c) 2011, 2013 Red Hat, Inc * Copyright (c) 2011 Linaro Limited * * Authors: @@ -25,28 +25,117 @@ #ifndef QEMU_TLS_H #define QEMU_TLS_H -/* Per-thread variables. Note that we only have implementations - * which are really thread-local on Linux; the dummy implementations - * define plain global variables. +#ifdef CONFIG_WIN32 + +/* Do not use GCC's "emutls" path on
Re: [Qemu-devel] Configure virtio-scsi options via libvirt
On 10:08 Wed 07 May , Stefan Hajnoczi wrote: > On Tue, May 06, 2014 at 04:13:50PM -0700, Mike Perez wrote: > > I would like be able to configure virtio-scsi options num_queues, > > max_sectors, > > and cmd_per_lun via libvirt. Are there any plans to have this support? > > Hi Mike, > I'm not sure about the status of libvirt support for virtio-scsi options > but in the meantime you can use passthrough: > > http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html > > Stefan Thanks Stefan, My plan is to write the support in OpenStack, which unfortunately does not accept patches doing qemu passthrough: http://lists.openstack.org/pipermail/openstack-dev/2013-November/018551.html -- Mike Perez
Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version
On Wed, May 7, 2014 at 5:16 PM, Hani Benhabiles wrote: > FWIW, this new version doesn't trigger a false positive I was having with > patches 05-07 in [1]. > > However, from a quick test for this patch on a couple of patches, I am getting > warnings like this: "WARNING: braces {} are not necessary for single statement > blocks" so there are still some Qemu related changes missing. Thanks for the feedback! As Peter pointed out (and you confirm), I missed forward-porting QEMU's brace rules.
Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version
On Thu, May 8, 2014 at 10:02 AM, Peter Maydell wrote: >>> total: 0 errors, 0 warnings, 79 lines checked >>> >>> /tmp/a has no obvious style problems and is ready for submission. >>> Check 500, Bad 52 >> >> How does your new checkpatch compare to our current one? > > Yes; we do sometimes let checkpatch-failing commits through, > so it would be more interesting to know: > * how many commits passed old-checkpatch but fail the new one > * how many commits failed with old-checkpatch but pass now > * in both cases, how many of the failures are false positives? > > (the last of those is tricky to answer without eyeballing > the errors, unfortunately) That seems like a good test regime and not a lot of effort using the bash script Don shared. There appears to be some interest in this patch. I'll work on a 2nd revision, following Peter's suggestion to separate the QEMU rules into a separate, smaller patch (and with the correct handling of brackets). Mike
Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts
On Thu, May 8, 2014 at 11:12 PM, Alexey Kardashevskiy wrote: >> There are >> a couple ways to mitigate this type of situation by using alternative >> data structures to inform the loop traversal. I don't know if it is >> worth the effort, though. > > Here I lost you :) If I read the code correctly, the problem I'm wondering about is that the loop will waste time traversing the array when there are only unallocated interrupts from the current index to the end. For example, if the interrupt array is 256 entries and the highest index that is allocated is 16, the outside loop will traverse all 256 entries while it should have exited after the 16th. To mitigate this you could keep a shadow index of the current highest allocated index and check for that in the outside loop. Or you could maintain a shadow linked list that only includes allocated array entries and just traverse that list. Each element on the list would be an allocated entry in the interrupt array. > btw I just realized that in patch#2 it should be xics_find_source instead > of xics_find_server. There are many interrupt servers already and just one > interrupt source (we could have many like one per PHB or something like > that but we are not there yet), this is what I meant.
[Qemu-devel] [PATCH] qemu-img fails to delete last snapshot
When deleting the last snapshot, copying the resulting snapshot table currently fails, causing the delete operation to also fail. Fix the failure by skipping the copy and just writing the snapshot header and freeing the extra clusters. Signed-off-by: Mike Day --- There are two specific problems in the curent code. First is a lack of parenthesis in the calculation of a memmove parameter: s->nb_snapshots - snapshot_index - 1 When s->nb_snapshots is 0, snapshot_index is 1. 0 - 1 - 1 = 0xfffe it should be: 0 - (1 - 1) = 0x00 The second problem is shifting the snapshot table to the left. After removing the last snapshot there are no existing snapshots to be shifted. All that needs to be done is to write the header and unallocate the blocks. block/qcow2-snapshot.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index 0aa9def..c8b842c 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs) assert(offset <= INT_MAX); snapshots_size = offset; - /* Allocate space for the new snapshot list */ -snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +snapshots_offset = 0; +if (snapshots_size) { +snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +} offset = snapshots_offset; if (offset < 0) { ret = offset; @@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs) /* The snapshot list position has not yet been updated, so these clusters * must indeed be completely free */ -ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); -if (ret < 0) { -goto fail; +if (snapshots_size) { +ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); +if (ret < 0) { +goto fail; +} } - /* Write all snapshots to the new list */ for(i = 0; i < s->nb_snapshots; i++) { sn = s->snapshots + i; @@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs, return -ENOENT; } sn = s->snapshots[snapshot_index]; - /* Remove it from the snapshot list */ -memmove(s->snapshots + snapshot_index, -s->snapshots + snapshot_index + 1, -(s->nb_snapshots - snapshot_index - 1) * sizeof(sn)); s->nb_snapshots--; +if (s->nb_snapshots) { +memmove(s->snapshots + snapshot_index, +s->snapshots + snapshot_index + 1, +(s->nb_snapshots - (snapshot_index - 1)) * sizeof(sn)); +} + ret = qcow2_write_snapshots(bs); if (ret < 0) { error_setg_errno(errp, -ret, -- 1.9.0
[Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
For the create subcommand the backing file (-b) option is documented on-line but not in the binary. Add it. Signed-off-by: Mike Day --- qemu-img-cmds.hx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index d029609..7724709 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -16,9 +16,9 @@ STEXI ETEXI DEF("create", img_create, -"create [-q] [-f fmt] [-o options] filename [size]") +"create [-q] [-f fmt] [-b backing-file] [-o options] filename [size]") STEXI -@item create [-q] [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}] +@item create [-q] [-f @var{fmt}] [-b @var{backing_file}] [-o @var{options}] @var{filename} [@var{size}] ETEXI DEF("commit", img_commit, -- 1.9.0
Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
On Mon, May 12, 2014 at 11:20 AM, Eric Blake wrote: > Online where? In 'qemu-img --help', or on some web page (at what URL)? http://qemu.weilnetz.de/qemu-doc.html#vm_005fsnapshots -o backing file is not documented in the help command. Mike
Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
On Mon, May 12, 2014 at 11:53 AM, Eric Blake wrote: > Ah, but it is: > > $ qemu-img create -f qcow2 -o help > Supported options: > size Virtual disk size > compat Compatibility level (0.10 or 1.1) > backing_file File name of a base image > backing_fmt Image format of the base image > encryption Encrypt the image > cluster_size qcow2 cluster size > preallocationPreallocation mode (allowed values: off, metadata) > lazy_refcounts Postpone refcount updates > > and more importantly, it only appears in the help output of -f modes > that actually support a backing file. Contrast: > > $ qemu-img create -f raw -o help > Supported options: > size Virtual disk size This is much more prominent in the top-level help: rebase [-q] [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename
Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
On Mon, May 12, 2014 at 12:36 PM, Kevin Wolf wrote: > What would qemu-img rebase do with -o? It is just for (safely) changing > the backing file, not for updating options. There is qemu-img amend for > that, and it does have an -o option. A little background ... I'm writing a 4-day KVM training course for the Linux foundation. From time to time I come across undocumented, buggy, or confusing options and if I can I try to fix them. In the case of qemu-img I looked in a lot of places for documentation and ended up at the unofficial web page that documented the -b backing file option which works just fine for create and rebase. If an official page had the information I needed in one place instead of bits and pieces scattered everywhere I would have used it, naturally. Here's why the help is confusing. On the top level help, all the options except for -b are documented. A couple subcommands are even listed in the top level help with their options. I expected that -b would be documented on the top level with the others. The fact that -b is documented for the rebase subcommand reenforced my expectation. I was assuming there is some consistency in using options with backing files. (Especially if the option is called "backingfile.") I'm not advocating anything here, and I think its fine to leave things as they are. I just don't want to be the guy who complains and never fixes anything. Mike
Re: [Qemu-devel] [PATCH] qemu-img fails to delete last snapshot
On Mon, May 12, 2014 at 12:39 PM, Eric Blake wrote: > This information is actually quite useful in understanding the patch, > and I would have included it prior to the --- for inclusion in git, > rather than in the reviewer-only commentary that gets stripped during > 'git am'. > >> >> block/qcow2-snapshot.c | 25 +++-- >> 1 file changed, 15 insertions(+), 10 deletions(-) > > I'd suggest posting a v2 with a better commit message; but the code > itself seems fine, so you can add: > > Reviewed-by: Eric Blake Got it, thanks! Mike
[Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot
When deleting the last snapshot, copying the resulting snapshot table currently fails, causing the delete operation to also fail. Fix the failure by skipping the copy and just writing the snapshot header and freeing the extra clusters. There are two specific problems in the current code. First is a lack of parenthesis in the calculation of the memmove size parameter: s->nb_snapshots - snapshot_index - 1 When s->nb_snapshots is 0, snapshot_index is 1. 0 - 1 - 1 = 0xfffe it should be: 0 - (1 - 1) = 0x00 The second problem is shifting the snapshot table to the left. After removing the last snapshot there are no existing snapshots to be shifted. All that needs to be done is to write the header and unallocate the blocks. Signed-off-by: Mike Day Reviewed-by: Eric Blake --- v2: improved the git log entry added Eric Blake as a reviewer block/qcow2-snapshot.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index 0aa9def..c8b842c 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs) assert(offset <= INT_MAX); snapshots_size = offset; - /* Allocate space for the new snapshot list */ -snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +snapshots_offset = 0; +if (snapshots_size) { +snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +} offset = snapshots_offset; if (offset < 0) { ret = offset; @@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs) /* The snapshot list position has not yet been updated, so these clusters * must indeed be completely free */ -ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); -if (ret < 0) { -goto fail; +if (snapshots_size) { +ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); +if (ret < 0) { +goto fail; +} } - /* Write all snapshots to the new list */ for(i = 0; i < s->nb_snapshots; i++) { sn = s->snapshots + i; @@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs, return -ENOENT; } sn = s->snapshots[snapshot_index]; - /* Remove it from the snapshot list */ -memmove(s->snapshots + snapshot_index, -s->snapshots + snapshot_index + 1, -(s->nb_snapshots - snapshot_index - 1) * sizeof(sn)); s->nb_snapshots--; +if (s->nb_snapshots) { +memmove(s->snapshots + snapshot_index, +s->snapshots + snapshot_index + 1, +(s->nb_snapshots - (snapshot_index - 1)) * sizeof(sn)); +} + ret = qcow2_write_snapshots(bs); if (ret < 0) { error_setg_errno(errp, -ret, -- 1.9.0
[Qemu-devel] [PATCH] qemu-img: sort block formats in help message
The help message for qemu-img lists the supported block formats, of which there are 27 as of version 2.0.50. The formats are printed in the order of their driver's position in a linked list, which appears random. This patch prints the formats in sorted order, making it easier to read and to find a specific format in the list. [Added suggestions from Fam Zheng to declare variables at the top of the scope in help() and to omit explicit cast for void* opaque. --Stefan] [Removed call to g_sequence_lookup because it breaks the build on machines with glib < 2.28. --Mike] Signed-off-by: Mike Day Signed-off-by: Stefan Hajnoczi --- qemu-img.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 96f4463..93e51d1 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -32,6 +32,7 @@ #include "block/block_int.h" #include "block/qapi.h" #include +#include #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \ ", Copyright (c) 2004-2008 Fabrice Bellard\n" @@ -55,9 +56,22 @@ typedef enum OutputFormat { #define BDRV_O_FLAGS BDRV_O_CACHE_WB #define BDRV_DEFAULT_CACHE "writeback" -static void format_print(void *opaque, const char *name) +static gint compare_data(gconstpointer a, gconstpointer b, gpointer user) { -printf(" %s", name); +return g_strcmp0(a, b); +} + +static void print_format(gpointer data, gpointer user) +{ +printf(" %s", (char *)data); +} + +static void add_format_to_seq(void *opaque, const char *fmt_name) +{ +GSequence *seq = opaque; + +g_sequence_insert_sorted(seq, (gpointer)fmt_name, + compare_data, NULL); } static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...) @@ -142,10 +156,15 @@ static void QEMU_NORETURN help(void) " '-f' first image format\n" " '-F' second image format\n" " '-s' run in Strict mode - fail on different image size or sector allocation\n"; +GSequence *seq; printf("%s\nSupported formats:", help_msg); -bdrv_iterate_format(format_print, NULL); +seq = g_sequence_new(NULL); +bdrv_iterate_format(add_format_to_seq, seq); +g_sequence_foreach(seq, print_format, NULL); printf("\n"); +g_sequence_free(seq); + exit(EXIT_SUCCESS); } -- 1.9.0