Re: [Qemu-devel] Re: [PATCH] pci: disable migration of p2p bridge

2010-12-22 Thread Isaku Yamahata
On Wed, Dec 22, 2010 at 08:27:17AM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 22, 2010 at 12:13:43PM +0900, Isaku Yamahata wrote:
> > Right now pcibus_get_dev_path() isn't migration safe because
> > bus number/secondary bus number are set by guest OS.
> > So it can't be used reliably for qemu internal id.
> > 
> > For 0.14 release, disable p2p bridge migration at the moment.
> > Once pcibus_get_dev_path() is fixed, this patch should be reverted.
> > It will be addressed for 0.15 release.
> > 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Alex Williamson 
> > Cc: Blue Swirl 
> > Signed-off-by: Isaku Yamahata 
> 
> 
> Hmm, haven't looked into this deeply - can we do this in one place
> when the bridge is created?

Unfortunately it's not easy. It requires revising
register_device_unmigratable(). I have to admit this patch is ugly. 

This patch is a temporary workaround and should be reverted eventually.
So I think it is better to address the original issue (allowing migration
of p2p bridge) instead of addressing register_device_unmigratable().

thanks
-- 
yamahata



Re: [Qemu-devel] Re: [PATCH 0/4] ppc: Fix PReP emulation

2010-12-22 Thread Alexander Graf

On 22.12.2010, at 00:51, Andreas Färber  wrote:

> On 21.12.2010 at 01:46, Alexander Graf wrote:
> 
>> On 21.12.2010, at 01:33, Andreas Färber wrote:
>> 
>>> OpenHack'Ware never worked for me before. Supposedly patched Linux kernels 
>>> loaded via -kernel, still searching for a working one though...
>>> 
>>> $ qemu-system-ppc -M prep -nographic
>>> ERROR: BUG caught...
>>> BIOS execution exception
>>> nip=0x0580 msr=0x2000 dar=0x dsisr=0x
>>> Stopping execution
>>> 
>>> $ qemu-system-ppc -M prep -kernel .../vmlinuz-2.4.25 -nographic # 
>>> http://www.olifantasia.com/qemu/
>>> ERROR: BUG caught...
>>> BIOS execution exception
>>> nip=0x0580 msr=0x2000 dar=0x dsisr=0x
>>> Stopping execution
>>> 
>>> $ qemu-system-ppc -M prep -kernel .../vmlinuz-2.4.25 -append 
>>> 'ide0=0x1f0,0x3f6,13 ide1=0x170,0x376,13 netdev=9,0x300,eth0 console=ttyS0 
>>> console=tty0 root=/dev/hda' -nographic
>>> ERROR: BUG caught...
>>> BIOS execution exception
>>> nip=0x0580 msr=0x2000 dar=0x dsisr=0x
>>> Stopping execution
>>> 
>> 
>> This one boots for me on qemu from SLES10SP3 (0.8.2):
>> 
>> http://www.oszoo.org/wiki/index.php/Debian_Etch_ppc_(PowerPC)_-_Qemu_Ready
> 
> $ qemu-system-ppc -m 256 -M prep -kernel .../zImage.prep -hda 
> .../debian-ppc-qemu.qcow -nographic
> 
> shows same error as above.
> 
> Could this be due to the IRQ issue? The older kernel did expect IRQ 13 twice.
> I would expect it to get a little further before running into such an error...

Phew - I doubt it too. There have been quite some changes since it last worked 
I guess ;). First thing that sounds reasonable now would be to find a known 
good version and check when the interpreted code paths start to differ.

Alex

> 



Re: [Qemu-devel] [PATCH] MIPS interrupts and -icount

2010-12-22 Thread Edgar E. Iglesias
On Tue, Dec 21, 2010 at 11:28:43PM +0100, Aurelien Jarno wrote:
> On Sun, Jul 25, 2010 at 07:46:49AM +0200, Edgar E. Iglesias wrote:
> > On Sun, Jul 25, 2010 at 05:08:07AM +0200, Aurelien Jarno wrote:
> > > On Sun, Jul 25, 2010 at 02:07:54AM +0200, Edgar E. Iglesias wrote:
> > > > On Sun, Jul 25, 2010 at 12:55:45AM +0200, Aurelien Jarno wrote:
> > > > > On Thu, Jul 22, 2010 at 01:32:18PM +0200, Edgar E. Iglesias wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I'm seeing an error when emulating MIPS guests with -icount.
> > > > > > In cpu_interrupt:
> > > > > >   cpu_abort(env, "Raised interrupt while not in I/O function");
> > > > > 
> > > > > I am not able to reproduce the issue. Do you have more details how to
> > > > > reproduce the problem?
> > > > 
> > > > You need a machine with devices that raise the hw interrupts. I didn't
> > > > see the error on the images on the wiki though. But I've got a machine
> > > > here that triggers it easily. Will check if I can publish it and an image.
> > > > 
> > > 
> > > That would be nice if you can share it.
> > > 
> > > > > > It seems to me like the MIPS interrupt glue logic between interrupt
> > > > > > controllers and the MIPS core is not modeled correctly.
> > > > > 
> > > > > It seems indeed that sometimes interrupts are triggered while not in
> > > > > I/O functions, your patch addresses part of the problem.
> > > > > 
> > > > > > When hw interrupt pending bits in CP0_Cause are set, the CPU should
> > > > > > see the hw interrupt line as active. The CPU may or may not take the
> > > > > > interrupt based on internal state (global irq mask etc) but the glue
> > > > > > logic shouldn't care about that. Am I missing something here?
> > > > > 
> > > > > I don't think it is correct. On the real hardware, the interrupt line is
> > > > > actually active only when all conditions are fulfilled.
> > > > > 
> > > > > The thing to remember is that the interrupts are level triggered. So if
> > > > > an interrupt is masked, it should be rejected by the CPU, but could be
> > > > > triggered again as soon as the interrupt mask is changed.
> > > > 
> > > > Agreed, that behaviour is what I'm trying to achieve. The problem now
> > > > is that the level triggered line, prior to CPU masking, is being masked
> > > > by external logic. IMO it shouldn't. See hw/mips_int.c and cpu-exec.c prior
> > > > to the patch.
> > > 
> > > Actually all depends if you consider the MIPS interrupt controller part
> > > of the CPU or not. It could be entirely modeled in the CPU, that is in 
> > > cpu-exec.c or entirely modeled as a separate controller, that is in 
> > > mips_int.c.
> > > 
> > > IMHO it should be in mips_int.c. It is an interrupt controller like
> > > another that combines a few interrupt lines into a single one that feeds
> > > the CPU. It is like for example the i8259, with the exception that the 
> > > configuration is not done by load/store into MMIO area, but directly 
> > > using CPU special registers. We should probably mark these instructions 
> > > as I/O.
> > 
> > 
> > Hi,
> > 
> > I agree that it's not obvious where things should be modeled, I'll try to
> > explain my view.
> > 
> > As a first step I'm trying to model a MIPS configured with Vectored
> > Interrupts. We've got external interrupt logic feeding the hw
> > interrupt lines. These lines are level triggered, held active by
> > the external logic as long as interrupts are pending, regardless
> > of whether the CPU wants to take the interrupt now or later. In fact,
> > there is no way to access the internal flags from RTL logic located
> > here (AFAIK). In my mind, this layer pretty much ends in hw/mips_int.c.
> > 
> > Internally in the MIPS core, I'm guessing there is logic that simply
> > applies the internal CPU masks, outputting a single internal IRQ line
> > that decides whether the CPU should take the IRQ or not. Here, things like
> > IE flags etc matter. I don't have access to RTL on the MIPS side so I'm
> > just guessing here.
> > 
> > In my mind, we should model this latter part by asserting INTERRUPT_HARD
> > from hw/mips_int.c whenever any hw lines are active and letting the
> > CPU in cpu-exec.c decide when to take the interrupt by applying its
> > internal masking.
> > 
> 
> Sorry to come back so long after this discussion, but I now have another
> argument. This commit causes a regression, the host CPU is now always at
> 100%. QEMU spent all its time looping because the CPU interrupt line is
> asserted.
> 
> Not asserting the CPU interrupt line when interrupts are disabled fixes
> the issue.
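
The two positions in the quoted discussion can be compared with a small
standalone model. This is only an illustrative sketch, not QEMU code: the
ToyMipsCpu struct and the toy_* functions are hypothetical names. The bit
positions follow the MIPS32 CP0 layout (Status.IE in bit 0, EXL in bit 1,
ERL in bit 2, Status.IM and Cause.IP in bits 8..15, hardware lines 0..5
mapped to IP2..IP7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal CPU state; this is not QEMU's CPUState. */
typedef struct {
    uint32_t cp0_status;  /* IE = bit 0, EXL = bit 1, ERL = bit 2, IM = bits 8..15 */
    uint32_t cp0_cause;   /* IP = bits 8..15 */
} ToyMipsCpu;

#define ST_IE   (1u << 0)
#define ST_EXL  (1u << 1)
#define ST_ERL  (1u << 2)
#define IP_MASK 0xff00u

/* External glue: a device raises or lowers hardware line irq (0..5 -> IP2..IP7).
 * The line is level triggered, so the bit simply mirrors the level. */
static void toy_set_irq(ToyMipsCpu *cpu, int irq, bool level)
{
    uint32_t bit;

    assert(irq >= 0 && irq <= 5);
    bit = 1u << (irq + 10);
    if (level) {
        cpu->cp0_cause |= bit;
    } else {
        cpu->cp0_cause &= ~bit;
    }
}

/* First view: the CPU input is asserted whenever any IP bit is pending,
 * independent of the CPU's own masks. */
static bool toy_line_asserted(const ToyMipsCpu *cpu)
{
    return (cpu->cp0_cause & IP_MASK) != 0;
}

/* CPU-internal decision: a pending bit must be unmasked in Status.IM,
 * IE must be set, and the CPU must not be at exception or error level. */
static bool toy_irq_deliverable(const ToyMipsCpu *cpu)
{
    uint32_t pending = cpu->cp0_cause & cpu->cp0_status & IP_MASK;

    return pending != 0
        && (cpu->cp0_status & ST_IE) != 0
        && (cpu->cp0_status & (ST_EXL | ST_ERL)) == 0;
}
```

With a line pending and IE clear, toy_line_asserted() is true while
toy_irq_deliverable() is false. A wake-up check for a halted CPU that keys
on the former would keep the CPU runnable in that state, which would be
consistent with the 100% host CPU report, though nothing in the thread
confirms that this is the actual cause.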


Hi, I don't see this problem with the qemu.org test images and neither
with my boards/images. I see QEMU basically not running at all when
the guest is idle. Do you have more info on how to reproduce it?

If the CPU hw interrupt line is asserted it means some device is
signaling interrupts. Maybe we are modeling the wake-up filter
wrongly in target-mips/exec.h, maybe a rea

[Qemu-devel] possible regression in qemu-kvm 0.13.0 (memtest)

2010-12-22 Thread Peter Lieven
Hi,

I came across a strange issue when updating from qemu-kvm 0.12.5 to 
qemu-kvm-0.13.0

If I start a VM with the following parameters
qemu-kvm-0.13.0 -m 2048 -smp 2 -monitor tcp:0:4014,server,nowait -vnc :14 -name 
'ubuntu.test'  -boot order=dc,menu=off  -cdrom ubuntu-10.04.1-desktop-amd64.iso 
-k de 

and select memtest in the Ubuntu CD boot menu, the VM immediately resets. After
this reset several errors occur, including graphics corruption or the
qemu-kvm binary aborting with error 134.

Exactly the same scenario on the same machine with qemu-kvm-0.12.5 works 
flawlessly.

Any ideas?

Thanks,
Peter




Re: [Qemu-devel] [PATCH] target-mips: fix translation of MT instructions

2010-12-22 Thread Aurelien Jarno
On Fri, Oct 29, 2010 at 07:48:46AM -0700, Nathan Froyd wrote:
> The translation of dmt/emt/dvpe/evpe was doing the moral equivalent of:
> 
>   int x;
>   ... /* no initialization of x */
>   x = f (x);
> 
> which confused later bits of TCG rather badly, leading to crashes.
> 
> Fix the helpers to only return results (those instructions have no
> inputs), and fix the translation code accordingly.
> 
> Signed-off-by: Nathan Froyd 

Thanks, applied.

> ---
>  target-mips/helper.h|8 
>  target-mips/op_helper.c |   28 
>  target-mips/translate.c |8 
>  3 files changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/target-mips/helper.h b/target-mips/helper.h
> index cb13fb2..297ab64 100644
> --- a/target-mips/helper.h
> +++ b/target-mips/helper.h
> @@ -154,10 +154,10 @@ DEF_HELPER_2(mttlo, void, tl, i32)
>  DEF_HELPER_2(mtthi, void, tl, i32)
>  DEF_HELPER_2(mttacx, void, tl, i32)
>  DEF_HELPER_1(mttdsp, void, tl)
> -DEF_HELPER_1(dmt, tl, tl)
> -DEF_HELPER_1(emt, tl, tl)
> -DEF_HELPER_1(dvpe, tl, tl)
> -DEF_HELPER_1(evpe, tl, tl)
> +DEF_HELPER_0(dmt, tl)
> +DEF_HELPER_0(emt, tl)
> +DEF_HELPER_0(dvpe, tl)
> +DEF_HELPER_0(evpe, tl)
>  #endif /* !CONFIG_USER_ONLY */
>  
>  /* microMIPS functions */
> diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
> index 41abd57..ec6864d 100644
> --- a/target-mips/op_helper.c
> +++ b/target-mips/op_helper.c
> @@ -1554,40 +1554,28 @@ void helper_mttdsp(target_ulong arg1)
>  }
>  
>  /* MIPS MT functions */
> -target_ulong helper_dmt(target_ulong arg1)
> +target_ulong helper_dmt(void)
>  {
>  // TODO
> -arg1 = 0;
> -// rt = arg1
> -
> -return arg1;
> + return 0;
>  }
>  
> -target_ulong helper_emt(target_ulong arg1)
> +target_ulong helper_emt(void)
>  {
>  // TODO
> -arg1 = 0;
> -// rt = arg1
> -
> -return arg1;
> +return 0;
>  }
>  
> -target_ulong helper_dvpe(target_ulong arg1)
> +target_ulong helper_dvpe(void)
>  {
>  // TODO
> -arg1 = 0;
> -// rt = arg1
> -
> -return arg1;
> +return 0;
>  }
>  
> -target_ulong helper_evpe(target_ulong arg1)
> +target_ulong helper_evpe(void)
>  {
>  // TODO
> -arg1 = 0;
> -// rt = arg1
> -
> -return arg1;
> +return 0;
>  }
>  #endif /* !CONFIG_USER_ONLY */
>  
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index d62c615..c4c44c1 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -12033,22 +12033,22 @@ static void decode_opc (CPUState *env, DisasContext 
> *ctx, int *is_branch)
>  switch (op2) {
>  case OPC_DMT:
>  check_insn(env, ctx, ASE_MT);
> -gen_helper_dmt(t0, t0);
> +gen_helper_dmt(t0);
>  gen_store_gpr(t0, rt);
>  break;
>  case OPC_EMT:
>  check_insn(env, ctx, ASE_MT);
> -gen_helper_emt(t0, t0);
> +gen_helper_emt(t0);
>  gen_store_gpr(t0, rt);
>  break;
>  case OPC_DVPE:
>  check_insn(env, ctx, ASE_MT);
> -gen_helper_dvpe(t0, t0);
> +gen_helper_dvpe(t0);
>  gen_store_gpr(t0, rt);
>  break;
>  case OPC_EVPE:
>  check_insn(env, ctx, ASE_MT);
> -gen_helper_evpe(t0, t0);
> +gen_helper_evpe(t0);
>  gen_store_gpr(t0, rt);
>  break;
>  case OPC_DI:
> -- 
> 1.6.3.2
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



[Qemu-devel] Re: PCIe Transaction handling in Qemu

2010-12-22 Thread Isaku Yamahata
On Tue, Dec 21, 2010 at 02:24:29PM -0600, Adnan Khaleel wrote:
> Hello, 

Hi.


> I have a question regarding how Qemu PCIe devices handle Config Transactions vs
> Memory Transactions (assuming the PCI device is setup to act
> as PCI_BASE_ADDRESS_SPACE_MEMORY).
> 
> I'm using portions of hw/cirrus_vga.c to make my point,

If you can send out what you have instead of mimicked
example, it would help to figure out what you are trying to do.


> I have some questions about PCIe operations assuming the device has MMIO
> handlers involved (as shown above).
> 1. Will all PCIe config operations ALWAYS use the installed config handlers?
> Or can PCIe config operations use the MMIO handlers?

MMIO accesses to the MMCONFIG area are routed to the write/read config handlers.
MMIO accesses to a memory BAR, on the other hand, are routed to the MMIO
handlers you pictured.
NOTE: the upstream qemu lacks q35 chipset support, so a guest can NOT do
  MMIO on the MMCONFIG area.

> 2. Assuming that both PCI config and MMIO operations can use the MMIO handlers,
> is there any way I can identify if a transaction is a config or a memory
> transaction?
> 3.a. What address is passed on the MMIO handlers for config and MMIO
> operations? From pci_data_write in pci_host.c, it appears that config
> operations send only the offset into the config region. I couldn't determine
> what address is passed for MMIO operations.
>b. Is it an offset from the BAR for MMIO operations?
>c. How do I get the full physical address?
>d. What address does a PCIe device expect to see - physical or offset for?
>e. Is there any way I can find out what the bus and device numbers are once
> inside the config and MMIO handlers? i.e. once the execution has reached
> the pci_cirrus_write_config() or cirrus_vga_mem_readb(..) from the code above?

The offset in the configuration space of each PCIe function is passed to
the write/read config handler.
The physical address is passed to the MMIO handler of a memory BAR.
-- 
yamahata
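
For the MMCONFIG case mentioned above, the PCIe ECAM layout fixes how an
offset inside the MMCONFIG window encodes bus, device, function and register.
The following is a minimal sketch of that decoding, plus the subtraction
needed to turn a physical address into a BAR-relative offset. The names
ToyEcamAddr, toy_ecam_decode and toy_bar_offset are hypothetical and do not
come from QEMU; the base addresses are assumed to be known to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Fields encoded in an ECAM (MMCONFIG) offset. */
typedef struct {
    unsigned bus;   /* offset bits 27..20 */
    unsigned dev;   /* offset bits 19..15 */
    unsigned func;  /* offset bits 14..12 */
    unsigned reg;   /* offset bits 11..0: byte offset in config space */
} ToyEcamAddr;

/* Decode a physical address that falls inside a 256 MiB MMCONFIG window
 * starting at mmconfig_base (assumed to be supplied by the caller). */
static ToyEcamAddr toy_ecam_decode(uint64_t phys, uint64_t mmconfig_base)
{
    ToyEcamAddr a;
    uint64_t off;

    assert(phys >= mmconfig_base);
    off = phys - mmconfig_base;
    assert(off < (1ull << 28));

    a.bus  = (unsigned)((off >> 20) & 0xff);
    a.dev  = (unsigned)((off >> 15) & 0x1f);
    a.func = (unsigned)((off >> 12) & 0x07);
    a.reg  = (unsigned)(off & 0xfff);
    return a;
}

/* If an MMIO handler is handed a physical address, the offset into the BAR
 * is the difference from the base the BAR was mapped at. */
static uint64_t toy_bar_offset(uint64_t phys, uint64_t bar_base)
{
    assert(phys >= bar_base);
    return phys - bar_base;
}
```

A device model that wants BAR-relative offsets would have to remember the
base address it was given when the BAR was mapped; that bookkeeping is left
out here.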



[Qemu-devel] [PATCH 0/3] pcie/aer: glue inject aer error into hmp

2010-12-22 Thread Isaku Yamahata
This patch series introduces an HMP command to inject AER errors.
The fw device path is now used to specify the PCI function.

Isaku Yamahata (3):
  build, pci: remove QMP dependency on core PCI code
  pci: introduce a parser for fw device path to pci device
  pcie/aer: glue aer error injection into qemu monitor

 Makefile.objs   |4 +-
 Makefile.target |2 +
 hmp-commands.hx |   28 +++
 hw/pci-stub.c   |   50 
 hw/pci.c|  128 
 hw/pci.h|2 +
 hw/pcie_aer.c   |  222 +++
 sysemu.h|5 +
 8 files changed, 438 insertions(+), 3 deletions(-)
 create mode 100644 hw/pci-stub.c




[Qemu-devel] [PATCH 2/3] pci: introduce a parser for fw device path to pci device

2010-12-22 Thread Isaku Yamahata
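Introduce a function to parse fw device path to pci device.
The format is
/pci@{<ioport>, <mmio>}/[<fw_name>]@<slot>,<func>/.../[<fw_name>]@<slot>,<func>

<ioport> = "i"<ioport addr in hex>
<mmio> = <mmio addr in hex>
<slot> = slot number in hex
<func> = func number in hex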

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |  128 ++
 hw/pci.h |2 +
 2 files changed, 130 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index eb21848..a52a323 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2027,3 +2027,131 @@ static char *pcibus_get_dev_path(DeviceState *dev)
 return strdup(path);
 }
 
+/*
+ * Parse format and get PCIDevice
+ * return 0 on success
+ *   <0 on error: format is invalid or device isn't found.
+ *
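+ * Format:
+ * /pci@{<ioport>, <mmio>}/[<fw_name>]@<slot>,<func>/...
+ * .../[<fw_name>]@<slot>,<func>
+ *
+ * <ioport> = "i"<ioport addr in hex>
+ * <mmio> = <mmio addr in hex>
+ * <slot> = slot number in hex
+ * <func> = func number in hex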
+ *
+ */
+int pci_parse_fw_dev_path(const char *path, PCIDevice **pdev)
+{
+const char *p = path;
+char *e;
+size_t len;
+PCIBus *bus;
+struct PCIHostBus *host;
+
+if (*p != '/') {
+return -EINVAL;
+}
+e = strchr(p + 1, '/');
+if (e == NULL) {
+return -EINVAL;
+}
+len = e - p;
+p = e + 1;
+
+bus = NULL;
+QLIST_FOREACH(host, &host_buses, next) {
+DeviceState *qdev = host->bus->qbus.parent;
+if (qdev) {
+char *devpath = qdev_get_fw_dev_path(qdev);
+
+if (len == strlen(devpath) && !strncmp(devpath, path, len)) {
+bus = host->bus;
+qemu_free(devpath);
+break;
+}
+qemu_free(devpath);
+} else {
+/* This pci bus doesn't have host-to-pci bridge device.
+ * Check only if the path is pci ignoring other parameters. */
+#define PCI_FW_PATH "/pci@"
+if (strncmp(path, PCI_FW_PATH, strlen(PCI_FW_PATH))) {
+return -EINVAL;
+}
+bus = host->bus;
+break;
+}
+}
+
+for (;;) {
+char *at;
+char *comma;
+unsigned long slot;
+unsigned long func;
+PCIDevice *dev;
+PCIBus *child_bus;
+
+if (!bus) {
+return -ENODEV;
+}
+if (*p == '\0') {
+return -EINVAL;
+}
+
+at = strchr(p, '@');
+if (at == NULL) {
+return -EINVAL;
+}
+slot = strtoul(at + 1, &e, 16);
+if (e == at + 1 || *e != ',') {
+return -EINVAL;
+}
+if (slot >= PCI_SLOT_MAX) {
+return -EINVAL;
+}
+
+comma = e;
+func = strtoul(comma + 1, &e, 16);
+if (e == comma + 1 || (*e != '/' && *e != '\0')) {
+return -EINVAL;
+}
+if (func >= PCI_FUNC_MAX) {
+return -EINVAL;
+}
+
+len = e - p;
+dev = bus->devices[PCI_DEVFN(slot, func)];
+if (!dev) {
+return -ENODEV;
+}
+if (at != p) {
+/* fw_name is specified. */
+char *fw_dev_path = pcibus_get_fw_dev_path(&dev->qdev);
+if (strncmp(p, fw_dev_path, len)) {
+qemu_free(fw_dev_path);
+return -EINVAL;
+}
+qemu_free(fw_dev_path);
+}
+
+if (*e == '\0') {
+*pdev = dev;
+return 0;
+}
+
+/*
+ * descending down pci-to-pci bridge.
+ * At the moment, there is no way to safely determine if the given
+ * pci device is really pci-to-pci device.
+ */
+p = e;
+QLIST_FOREACH(child_bus, &bus->child, sibling) {
+if (child_bus->parent_dev == dev) {
+bus = child_bus;
+continue;
+}
+}
+bus = NULL;
+}
+}
diff --git a/hw/pci.h b/hw/pci.h
index 6e80b08..96f8d52 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -16,6 +16,7 @@
 #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
 #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn) ((devfn) & 0x07)
+#define PCI_SLOT_MAX            32
 #define PCI_FUNC_MAX            8
 
 /* Class, Vendor and Device IDs from Linux's pci_ids.h */
@@ -258,6 +259,7 @@ int pci_parse_devaddr(const char *addr, int *domp, int 
*busp,
   unsigned int *slotp, unsigned int *funcp);
 int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
  unsigned *slotp);
+int pci_parse_fw_dev_path(const char *path, PCIDevice **pdev);
 
 void do_pci_info_print(Monitor *mon, const QObject *data);
 void do_pci_info(Monitor *mon, QObject **ret_data);
-- 
1.7.1.1
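
The per-component parsing done by this patch can be exercised outside QEMU.
Below is a minimal standalone sketch, assuming only the "[name]@slot,func"
component syntax and the slot/func limits from the patch. The toy_* names
are hypothetical and not part of the patch, and the sketch only collects
devfn values instead of looking up PCIDevice objects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLOT_MAX 32
#define TOY_FUNC_MAX 8

/*
 * Parse one "[name]@slot,func" component starting at p (slot and func in hex).
 * On success return 0, store slot and func, and set *end to the '/' or '\0'
 * that terminates the component. Return -1 on malformed input.
 */
static int toy_parse_component(const char *p, unsigned *slot, unsigned *func,
                               const char **end)
{
    size_t n = strcspn(p, "@/");  /* the '@' must come before the next '/' */
    const char *comma;
    char *e;
    unsigned long s, f;

    if (p[n] != '@') {
        return -1;
    }
    s = strtoul(p + n + 1, &e, 16);
    if (e == p + n + 1 || *e != ',' || s >= TOY_SLOT_MAX) {
        return -1;
    }
    comma = e;
    f = strtoul(comma + 1, &e, 16);
    if (e == comma + 1 || (*e != '/' && *e != '\0') || f >= TOY_FUNC_MAX) {
        return -1;
    }
    *slot = (unsigned)s;
    *func = (unsigned)f;
    *end = e;
    return 0;
}

/*
 * Skip the leading host bridge component ("/pci@...") and collect the devfn
 * of every following component, root first. Return the number of components
 * parsed, or -1 if the path is malformed or has more than max components.
 */
static int toy_parse_fw_path(const char *path, unsigned char *devfn, int max)
{
    const char *p;
    int n = 0;

    if (path[0] != '/') {
        return -1;
    }
    p = strchr(path + 1, '/');
    if (p == NULL) {
        return -1;
    }
    while (*p == '/') {
        unsigned slot, func;

        if (n == max || toy_parse_component(p + 1, &slot, &func, &p) != 0) {
            return -1;
        }
        devfn[n++] = (unsigned char)((slot << 3) | func);
    }
    return n;
}
```

For example, "/pci@i0cf8/pci-bridge@1e,0/ethernet@3,1" yields two devfn
values, one for the bridge and one for the device behind it. One remark on
the descent step in the patch itself: a "continue" inside QLIST_FOREACH only
advances that inner loop, so the "bus = NULL" after it appears to be reached
even when a matching child bus was found; a "break" out of the inner loop
may be what was intended.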




[Qemu-devel] [PATCH 3/3] pcie/aer: glue aer error injection into qemu monitor

2010-12-22 Thread Isaku Yamahata
introduce pcie_aer_inject_error command.

Signed-off-by: Isaku Yamahata 
---
Changes v9 -> v10:
- use fw device path
- error path
- pci-stub.c for CONFIG_PCI=n

Changes v8 -> v9:
- revise error code

Changes v7 -> v8:
- use domain:slot.func:slot.func...:slot.func instead of domain:bus:slot.func
- allow symbolic aer error name in addition to 32bit value

Changes v6 -> v7:
- check return value.

Changes v3 -> v4:
- s/PCIE_AER/PCIEAER/g for structure names.
- compilation adjustment.

Changes v2 -> v3:
- compilation adjustment.
---
 hmp-commands.hx |   28 +++
 hw/pci-stub.c   |   13 +++
 hw/pcie_aer.c   |  222 +++
 sysemu.h|5 +
 4 files changed, 268 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index dd3db36..a5dec9e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -873,6 +873,34 @@ Hot remove PCI device.
 ETEXI
 
 {
+.name   = "pcie_aer_inject_error",
+.args_type  = "advisory_non_fatal:-a,correctable:-c,"
+ "pci_fw_dev_path:s,error_status:s,"
+ "header0:i?,header1:i?,header2:i?,header3:i?,"
+ "prefix0:i?,prefix1:i?,prefix2:i?,prefix3:i?",
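+.params = "[-a] [-c] <pci fw dev path> <error status>"
+  " [<tlp header> [<tlp header prefix>]]",
+.help   = "inject pcie aer error\n\t\t\t"
+ " -a for advisory non fatal error\n\t\t\t"
+ " -c for correctable error\n\t\t\t"
+  "<pci fw dev path> = fw device path to pci device"
+ "\n\t\t\t/pci@<ioport>/"
+  "[<fw_name>]@<slot>,<func>/.../"
+  "[<fw_name>]@<slot>,<func>\n\t\t\t"
+  "<error status> = error string or 32bit\n\t\t\t"
+  "<tlp header> = 32bit x 4\n\t\t\t"
+  "<tlp header prefix> = 32bit x 4",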
+.user_print  = pcie_aer_inject_error_print,
+.mhandler.cmd_new = do_pcie_aer_inejct_error,
+},
+
+STEXI
+@item pcie_aer_inject_error
+@findex pcie_aer_inject_error
+Inject PCIe AER error
+ETEXI
+
+{
 .name   = "host_net_add",
 .args_type  = "device:s,opts:s?",
 .params = "tap|user|socket|vde|dump [options]",
diff --git a/hw/pci-stub.c b/hw/pci-stub.c
index 674591d..c5a0aa8 100644
--- a/hw/pci-stub.c
+++ b/hw/pci-stub.c
@@ -18,6 +18,7 @@
  * with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "sysemu.h"
 #include "monitor.h"
 #include "pci.h"
 
@@ -35,3 +36,15 @@ void do_pci_info_print(Monitor *mon, const QObject *data)
 {
 pci_error_message(mon);
 }
+
+int do_pcie_aer_inejct_error(Monitor *mon,
+ const QDict *qdict, QObject **ret_data)
+{
+pci_error_message(mon);
+return -ENOSYS;
+}
+
+void pcie_aer_inject_error_print(Monitor *mon, const QObject *data)
+{
+pci_error_message(mon);
+}
diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
index cb97a95..091680e 100644
--- a/hw/pcie_aer.c
+++ b/hw/pcie_aer.c
@@ -19,6 +19,8 @@
  */
 
 #include "sysemu.h"
+#include "qemu-objects.h"
+#include "monitor.h"
 #include "pci_bridge.h"
 #include "pcie.h"
 #include "msix.h"
@@ -806,3 +808,223 @@ const VMStateDescription vmstate_pcie_aer_log = {
 VMSTATE_END_OF_LIST()
 }
 };
+
+void pcie_aer_inject_error_print(Monitor *mon, const QObject *data)
+{
+QDict *qdict;
+int devfn;
+assert(qobject_type(data) == QTYPE_QDICT);
+qdict = qobject_to_qdict(data);
+
+devfn = (int)qdict_get_int(qdict, "devfn");
+monitor_printf(mon, "OK domain: %x, bus: %x devfn: %x.%x\n",
+   (int) qdict_get_int(qdict, "domain"),
+   (int) qdict_get_int(qdict, "bus"),
+   PCI_SLOT(devfn), PCI_FUNC(devfn));
+}
+
+typedef struct PCIEAERErrorName {
+const char *name;
+uint32_t val;
+bool correctable;
+} PCIEAERErrorName;
+
+/*
+ * AER error name -> value conversion table
+ * This naming scheme is the same as that of the Linux aer-inject tool.
+ */
+static const struct PCIEAERErrorName pcie_aer_error_list[] = {
+{
+.name = "TRAIN",
+.val = PCI_ERR_UNC_TRAIN,
+.correctable = false,
+}, {
+.name = "DLP",
+.val = PCI_ERR_UNC_DLP,
+.correctable = false,
+}, {
+.name = "SDN",
+.val = PCI_ERR_UNC_SDN,
+.correctable = false,
+}, {
+.name = "POISON_TLP",
+.val = PCI_ERR_UNC_POISON_TLP,
+.correctable = false,
+}, {
+.name = "FCP",
+.val = PCI_ERR_UNC_FCP,
+.correctable = false,
+}, {
+.name = "COMP_TIME",
+.val = PCI_ERR_UNC_COMP_TIME,
+.correctable = false,
+}, {
+.name = "COMP_ABORT",
+.val = PCI_ERR_UNC_COMP_ABORT,
+.correctable = false,
+}, {
+.name = "UNX_COMP",
+.val = PCI_ERR_UNC_UNX_COMP,
+.correctable = false,
+}, {
+.name = "RX_OVER",
+.val = PCI_ERR_UNC_RX_OVER,
+.correctable = false,
+}, {
+.name = "MALF_TLP",
+.val = P

[Qemu-devel] [PATCH 1/3] build, pci: remove QMP dependency on core PCI code

2010-12-22 Thread Isaku Yamahata
By introducing pci-stub.c, eliminate the QMP dependency on core PCI code
required by the query-pci command.

Signed-off-by: Isaku Yamahata 
---
 Makefile.objs   |4 +---
 Makefile.target |2 ++
 hw/pci-stub.c   |   37 +
 3 files changed, 40 insertions(+), 3 deletions(-)
 create mode 100644 hw/pci-stub.c

diff --git a/Makefile.objs b/Makefile.objs
index d6b3d60..c3e52c5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -169,9 +169,7 @@ hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-$(CONFIG_VIRTIO) += virtio.o virtio-console.o
 hw-obj-y += fw_cfg.o
-# FIXME: Core PCI code and its direct dependencies are required by the
-# QMP query-pci command.
-hw-obj-y += pci.o pci_bridge.o
+hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
 hw-obj-$(CONFIG_PCI) += msix.o msi.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
 hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
diff --git a/Makefile.target b/Makefile.target
index d08f5dd..38582d4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -1,6 +1,7 @@
 # -*- Mode: makefile -*-
 
 GENERATED_HEADERS = config-target.h
+CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y)
 CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
 
 include ../config-host.mak
@@ -188,6 +189,7 @@ ifdef CONFIG_SOFTMMU
 obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
+obj-$(CONFIG_NO_PCI) += pci-stub.o
 obj-$(CONFIG_VIRTIO) += virtio-blk.o virtio-balloon.o virtio-net.o 
virtio-serial-bus.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 obj-y += vhost_net.o
diff --git a/hw/pci-stub.c b/hw/pci-stub.c
new file mode 100644
index 0000000..674591d
--- /dev/null
+++ b/hw/pci-stub.c
@@ -0,0 +1,37 @@
+/*
+ * PCI stubs for platforms that don't support the PCI bus.
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "monitor.h"
+#include "pci.h"
+
+static void pci_error_message(Monitor *mon)
+{
+monitor_printf(mon, "PCI devices not supported\n");
+}
+
+void do_pci_info(Monitor *mon, QObject **ret_data)
+{
+pci_error_message(mon);
+}
+
+void do_pci_info_print(Monitor *mon, const QObject *data)
+{
+pci_error_message(mon);
+}
-- 
1.7.1.1




Re: [Qemu-devel] Re: [PATCH] pci: disable migration of p2p bridge

2010-12-22 Thread Michael S. Tsirkin
On Wed, Dec 22, 2010 at 05:00:45PM +0900, Isaku Yamahata wrote:
> On Wed, Dec 22, 2010 at 08:27:17AM +0200, Michael S. Tsirkin wrote:
> > On Wed, Dec 22, 2010 at 12:13:43PM +0900, Isaku Yamahata wrote:
> > > Right now pcibus_get_dev_path() isn't migration safe because
> > > bus number/secondary bus number are set by guest OS.
> > > So it can't be used reliably for qemu internal id.
> > > 
> > > For 0.14 release, disable p2p bridge migration at the moment.
> > > Once pcibus_get_dev_path() is fixed, this patch should be reverted.
> > > It will be addressed for 0.15 release.
> > > 
> > > Cc: "Michael S. Tsirkin" 
> > > Cc: Alex Williamson 
> > > Cc: Blue Swirl 
> > > Signed-off-by: Isaku Yamahata 
> > 
> > 
> > Hmm, haven't looked into this deeply - can we do this in one place
> > when the bridge is created?
> 
> Unfortunately it's not easy. It requires revising
> register_device_unmigratable(). I have to admit this patch is ugly. 
> 
> This patch is a temporary workaround and should be reverted eventually.
> So I think it is better to address the original issue (allowing migration
> of p2p bridge) instead of addressing register_device_unmigratable().
> 
> thanks

Hmm. Let's discuss how we will fix pcibus_get_dev_path?
It does not seem to matter how exactly we build it up.
Any way to describe the topology will do I think,
it is slightly ugly to have multiple routines to
build up the path.

For migration, it does not even matter how we describe the domain
as it can't be hot-plugged, we can just use some kind of number.
However, at the moment we don't really support multi-domain
systems, in that there's no way to create such, correct?
So we could use the domain:00:dev.fn:dev.fn:dev.fn syntax as the
natural and backwards compatible way.

What does matter is the commands we give the users for hotplug, these
really are visible.  However, it's not clear that we must support a full
hierarchical path for this.  Maybe the parent bus qdev id is sufficient
for this?

If so the multiple routines to build up the path issue is
a non-issue and we can easily just fix migration...

Sorry about rambling, to summarise let's just apply the below?

>

pci: fix migration device path for devices behind nested bridges

We were using bus number in the device path, which is clearly
broken as this number is guest-assigned for all devices
except the root.

Fix by using hierarchical list of slot/function numbers, walking
the path from root down to device, instead. Add :00 as bus number
so that if there are no nested bridges, this is compatible
with what we have now.

Note: as pointed out by Gleb, using openfirmware paths
might be cleaner, doing this would break compatibility though.

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci.c |   44 
 1 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8f6fcf8..aed2d42 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1826,15 +1826,41 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
 
 static char *pcibus_get_dev_path(DeviceState *dev)
 {
-PCIDevice *d = (PCIDevice *)dev;
-char path[16];
-
-snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
- pci_find_domain(d->bus),
- 0 /* TODO: need a persistent path for nested buses.
-* Note: pci_bus_num(d->bus) is not right as it's guest
-* assigned. */,
- PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
-
-return strdup(path);
+PCIDevice *d = container_of(dev, PCIDevice, qdev);
+PCIDevice *t;
+int slot_depth;
+/* Path format: Domain:00:Slot.Function:Slot.Function:Slot.Function.
+ * 00 is added here to make this format compatible with
+ * domain:Bus:Slot.Func for systems without nested PCI bridges.
+ * Slot.Function list specifies the slot and function numbers for all
+ * devices on the path from root to the specific device. */
+int domain_len = strlen(":00");
+int slot_len = strlen(":SS.F");
+int path_len;
+char *path, *p;
+
+/* Calculate # of slots on path between device and root. */;
+slot_depth = 0;
+for (t = d; t; t = t->bus->parent_dev)
+++slot_depth;
+
+path_len = domain_len + slot_len * slot_depth;
+
+/* Allocate memory, fill in the terminating null byte. */
+path = malloc(path_len + 1 /* For '\0' */);
+path[path_len] = '\0';
+
+/* First field is the domain. */
+snprintf(path, domain_len, "%04x:00", pci_find_domain(d->bus));
+
+/* Fill in slot numbers. We walk up from device to root, so need to print
+ * them in the reverse order, last to first. */
+p = path + path_len;
+for (t = d; t; t = t->bus->parent_dev) {
+p -= slot_len;
+snprintf(p, slot_len, ":%02x.%x", PCI_SLOT(t->devfn), 
PCI_FUNC(d->devfn));
+}
+
+return path;
 }
 
-- 
1.7.3.2.91.g446ac
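
The path layout described in the commit message (domain, a fixed ":00" bus
field, then one ":SS.F" entry per hop from the root down to the device) can
be checked with a standalone sketch. ToyDev is a hypothetical stand-in for
PCIDevice that keeps only a parent pointer and a devfn; toy_dev_path is not
the function from the diff. Two details differ from the diff as posted: the
domain field is sized for the full "DDDD:00" prefix rather than for ":00",
and each hop is printed with that hop's own function number.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PCIDevice. */
typedef struct ToyDev {
    struct ToyDev *parent;  /* bridge this device sits behind, NULL on the root bus */
    unsigned devfn;         /* (slot << 3) | func */
} ToyDev;

/* Build "DDDD:00:SS.F:SS.F..." for device d; the caller frees the result. */
static char *toy_dev_path(unsigned domain, const ToyDev *d)
{
    const size_t domain_len = strlen("DDDD:00");
    const size_t slot_len = strlen(":SS.F");
    char slot[sizeof ":SS.F"];
    size_t depth = 0;
    size_t path_len;
    const ToyDev *t;
    char *path, *p;

    /* Count the hops between the device and the root bus. */
    for (t = d; t; t = t->parent) {
        depth++;
    }
    path_len = domain_len + slot_len * depth;

    path = malloc(path_len + 1 /* for '\0' */);
    if (path == NULL) {
        return NULL;
    }

    /* The first field is the domain followed by the fixed bus number. */
    snprintf(path, domain_len + 1, "%04x:00", domain & 0xffff);
    path[path_len] = '\0';

    /* Walk from the device up to the root, filling entries back to front.
     * memcpy is used so that no terminator lands inside the path. */
    p = path + path_len;
    for (t = d; t; t = t->parent) {
        p -= slot_len;
        snprintf(slot, sizeof slot, ":%02x.%x",
                 (t->devfn >> 3) & 0x1f, t->devfn & 0x07);
        memcpy(p, slot, slot_len);
    }
    return path;
}
```

For a device on the root bus this produces the same "domain:00:slot.func"
string as the old format, which is the compatibility property the commit
message asks for.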



[Qemu-devel] Re: [PATCH 2/3] pci: introduce a parser for fw device path to pci device

2010-12-22 Thread Michael S. Tsirkin
On Wed, Dec 22, 2010 at 07:54:49PM +0900, Isaku Yamahata wrote:
> Introduce a function to parse fw device path to pci device.
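> the format is
> /pci@{<ioport>, <mmio>}/[<fw_name>]@<slot>,<func>/.../[<fw_name>]@<slot>,<func>
> 
> <ioport> = "i"<ioport addr in hex>
> <mmio> = <mmio addr in hex>
> <slot> = slot number in hex
> <func> = func number in hex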
> 
> Signed-off-by: Isaku Yamahata 

What concerns me the most here is the use of io addresses,
not sure it's the right thing for the command interface.

Why do we need to support full path at all?  Can we use the id of the
parent bus for this?  Supplying a bus id for the device seems like a
natural way to describe a tree, with minimal need for parsing.

> ---
>  hw/pci.c |  128 
> ++
>  hw/pci.h |2 +
>  2 files changed, 130 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index eb21848..a52a323 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -2027,3 +2027,131 @@ static char *pcibus_get_dev_path(DeviceState *dev)
>  return strdup(path);
>  }
>  
> +/*
> + * Parse format and get PCIDevice
> + * return 0 on success
> + *   <0 on error: format is invalid or device isn't found.
> + *
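> + * Format:
> + * /pci@{<ioport>, <mmio>}/[<fw_name>]@<slot>,<func>/...
> + * .../[<fw_name>]@<slot>,<func>
> + *
> + * <ioport> = "i"<ioport addr in hex>
> + * <mmio> = <mmio addr in hex>
> + * <slot> = slot number in hex
> + * <func> = func number in hex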
> + *
> + */
> +int pci_parse_fw_dev_path(const char *path, PCIDevice **pdev)
> +{
> +const char *p = path;
> +char *e;
> +size_t len;
> +PCIBus *bus;
> +struct PCIHostBus *host;
> +
> +if (*p != '/') {
> +return -EINVAL;
> +}
> +e = strchr(p + 1, '/');
> +if (e == NULL) {
> +return -EINVAL;
> +}
> +len = e - p;
> +p = e + 1;
> +
> +bus = NULL;
> +QLIST_FOREACH(host, &host_buses, next) {
> +DeviceState *qdev = host->bus->qbus.parent;
> +if (qdev) {
> +char *devpath = qdev_get_fw_dev_path(qdev);
> +
> +if (len == strlen(devpath) && !strncmp(devpath, path, len)) {
> +bus = host->bus;
> +qemu_free(devpath);
> +break;
> +}
> +qemu_free(devpath);
> +} else {
> +/* This pci bus doesn't have host-to-pci bridge device.
> + * Check only if the path is pci ignoring other parameters. */
> +#define PCI_FW_PATH "/pci@"
> +if (strncmp(path, PCI_FW_PATH, strlen(PCI_FW_PATH))) {
> +return -EINVAL;
> +}
> +bus = host->bus;
> +break;
> +}
> +}
> +
> +for (;;) {
> +char *at;
> +char *comma;
> +unsigned long slot;
> +unsigned long func;
> +PCIDevice *dev;
> +PCIBus *child_bus;
> +
> +if (!bus) {
> +return -ENODEV;
> +}
> +if (*p == '\0') {
> +return -EINVAL;
> +}
> +
> +at = strchr(p, '@');
> +if (at == NULL) {
> +return -EINVAL;
> +}
> +slot = strtoul(at + 1, &e, 16);
> +if (e == at + 1 || *e != ',') {
> +return -EINVAL;
> +}
> +if (slot >= PCI_SLOT_MAX) {
> +return -EINVAL;
> +}
> +
> +comma = e;
> +func = strtoul(comma + 1, &e, 16);
> +if (e == comma + 1 || (*e != '/' && *e != '\0')) {
> +return -EINVAL;
> +}
> +if (func >= PCI_FUNC_MAX) {
> +return -EINVAL;
> +}
> +
> +len = e - p;
> +dev = bus->devices[PCI_DEVFN(slot, func)];
> +if (!dev) {
> +return -ENODEV;
> +}
> +if (at != p) {
> +/* fw_name is specified. */
> +char *fw_dev_path = pcibus_get_fw_dev_path(&dev->qdev);
> +if (strncmp(p, fw_dev_path, len)) {
> +qemu_free(fw_dev_path);
> +return -EINVAL;
> +}
> +qemu_free(fw_dev_path);
> +}
> +
> +if (*e == '\0') {
> +*pdev = dev;
> +return 0;
> +}
> +
> +/*
> + * descending down pci-to-pci bridge.
> + * At the moment, there is no way to safely determine if the given
> + * pci device is really pci-to-pci device.
> + */
> +p = e;
> +QLIST_FOREACH(child_bus, &bus->child, sibling) {
> +if (child_bus->parent_dev == dev) {
> +bus = child_bus;
> +continue;
> +}
> +}
> +bus = NULL;
> +}
> +}
> diff --git a/hw/pci.h b/hw/pci.h
> index 6e80b08..96f8d52 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -16,6 +16,7 @@
>  #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)         ((devfn) & 0x07)
> +#define PCI_SLOT_MAX            32
>  #define PCI_FUNC_MAX            8
>  
>  /* Class, Vendor and Device IDs from Linux's pci_ids.h */
> @@ -258,6 +259,7 @@ int pci_parse_devaddr(const char *add
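
The per-component step of the parser quoted above (find the '@', read the slot in hex, expect a ',', read the function in hex) can be looked at in isolation. What follows is only a minimal standalone sketch, not QEMU code: parse_slot_func is a hypothetical helper name, and the PCI_* macros are restated locally so the snippet compiles without hw/pci.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Local copies of the hw/pci.h macros so the sketch compiles on its own. */
#define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)         ((devfn) & 0x07)
#define PCI_SLOT_MAX            32
#define PCI_FUNC_MAX            8

/*
 * Hypothetical helper: parse one "[name]@slot,func" path component.
 * On success, store the combined devfn in *devfn, point *end at the
 * terminating '/' or '\0', and return 0. Return -EINVAL on bad input.
 */
int parse_slot_func(const char *p, int *devfn, const char **end)
{
    const char *at = strchr(p, '@');
    const char *comma;
    char *e;
    unsigned long slot;
    unsigned long func;

    if (at == NULL) {
        return -EINVAL;
    }

    /* Slot number in hex, which must be followed by a comma. */
    slot = strtoul(at + 1, &e, 16);
    if (e == at + 1 || *e != ',' || slot >= PCI_SLOT_MAX) {
        return -EINVAL;
    }

    /* Function number in hex, which must end the component. */
    comma = e;
    func = strtoul(comma + 1, &e, 16);
    if (e == comma + 1 || (*e != '/' && *e != '\0') || func >= PCI_FUNC_MAX) {
        return -EINVAL;
    }

    *devfn = (int)PCI_DEVFN(slot, func);
    /* The packing must round-trip through the accessor macros. */
    assert((unsigned long)PCI_SLOT(*devfn) == slot);
    assert((unsigned long)PCI_FUNC(*devfn) == func);
    *end = e;
    return 0;
}
```

A caller would invoke this once per path component and then descend into the child bus of the device it found. One detail worth keeping in mind for that descent step: QLIST_FOREACH expands to a for loop, so a continue inside it only continues the inner list walk, and a flag or goto is needed to resume the outer loop once the child bus has been found.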

[Qemu-devel] Re: [PATCH 1/3] build, pci: remove QMP dependency on core PCI code

2010-12-22 Thread Michael S. Tsirkin
On Wed, Dec 22, 2010 at 07:54:48PM +0900, Isaku Yamahata wrote:
> by introducing pci-stub.c, eliminate QMP dependency on core PCI code
> required by query-pci command.
> 
> Signed-off-by: Isaku Yamahata 

Yay! Applied.

> ---
>  Makefile.objs   |4 +---
>  Makefile.target |2 ++
>  hw/pci-stub.c   |   37 +
>  3 files changed, 40 insertions(+), 3 deletions(-)
>  create mode 100644 hw/pci-stub.c
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index d6b3d60..c3e52c5 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -169,9 +169,7 @@ hw-obj-y =
>  hw-obj-y += vl.o loader.o
>  hw-obj-$(CONFIG_VIRTIO) += virtio.o virtio-console.o
>  hw-obj-y += fw_cfg.o
> -# FIXME: Core PCI code and its direct dependencies are required by the
> -# QMP query-pci command.
> -hw-obj-y += pci.o pci_bridge.o
> +hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
>  hw-obj-$(CONFIG_PCI) += msix.o msi.o
>  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
>  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
> diff --git a/Makefile.target b/Makefile.target
> index d08f5dd..38582d4 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -1,6 +1,7 @@
>  # -*- Mode: makefile -*-
>  
>  GENERATED_HEADERS = config-target.h
> +CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y)
>  CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
>  
>  include ../config-host.mak
> @@ -188,6 +189,7 @@ ifdef CONFIG_SOFTMMU
>  obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o
>  # virtio has to be here due to weird dependency between PCI and virtio-net.
>  # need to fix this properly
> +obj-$(CONFIG_NO_PCI) += pci-stub.o
>  obj-$(CONFIG_VIRTIO) += virtio-blk.o virtio-balloon.o virtio-net.o virtio-serial-bus.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>  obj-y += vhost_net.o
> diff --git a/hw/pci-stub.c b/hw/pci-stub.c
> new file mode 100644
> index 000..674591d
> --- /dev/null
> +++ b/hw/pci-stub.c
> @@ -0,0 +1,37 @@
> +/*
> + * PCI stubs for platforms that don't support the PCI bus.
> + *
> + * Copyright (c) 2010 Isaku Yamahata 
> + *VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "monitor.h"
> +#include "pci.h"
> +
> +static void pci_error_message(Monitor *mon)
> +{
> +monitor_printf(mon, "PCI devices not supported\n");
> +}
> +
> +void do_pci_info(Monitor *mon, QObject **ret_data)
> +{
> +pci_error_message(mon);
> +}
> +
> +void do_pci_info_print(Monitor *mon, const QObject *data)
> +{
> +pci_error_message(mon);
> +}
> -- 
> 1.7.1.1



[Qemu-devel] Deterministic replay

2010-12-22 Thread Pavel Dovgaluk
Hello.

 I am working on an implementation of deterministic execution replay for
QEMU. It should be similar to VMware's replay debugging.

 To make the alarm timer (which invokes the host_alarm_handler function)
deterministic, I changed its behavior: it now sets a flag requesting that
execution be stopped, and this flag is checked before every instruction in
the translated code.
 To allow breaking out of execution at any moment, I added the flag check
and an exit_tb call before every instruction.

 But one problem occurred: when I perform the same actions as the
gen_eob() function, Windows XP cannot boot on the emulated machine (it
reboots with a GPF).
 When I remove two of the actions, everything is OK. These actions are
the following:
    if (s->tb->flags & HF_INHIBIT_IRQ_MASK) {
        gen_helper_reset_inhibit_irq();
    }
    if (s->tb->flags & HF_RF_MASK) {
        gen_helper_reset_rf();
    }

 What are these actions for? When I remove either of them, the boot process
is OK.
 Please help me understand the interrupt mechanism, or point me to where I
can read about it.

 My modifications are applied to QEMU version 0.12.3, so please refer to
that version.

Pavel Dovgaluk






Re: [Qemu-devel] PCIe Transaction handling in Qemu

2010-12-22 Thread Paul Brook
> I have some questions about PCIe operations assuming the device has MMIO
> handlers involved (as shown above).

> 1. Will all PCIe config operations
> ALWAYS use the installed config handlers? Or can PCIe config operations
> use the MMIO handlers? 

Access to PCI config space is provided by the PCI host bridge. It has nothing 
to do with any memory BARs the device may have.  The host bridge may expose 
this in any way it chooses, including but not limited to ISA IO ports or a 
memory mapped region of its own.  Either way, the device doesn't care.

> 2. Assuming that both PCI config and MMIO
> operations can use the MMIO handlers, is there any way I can identify if a
> transaction is a config or a memory transaction?

Incorrect assumption. Memory and Config accesses are completely separate.

> 3.a. What address is
> passed on the MMIO handlers for config and MMIO operations? From
> pci_data_write in pci_host.c, it appears that config operations send only
> the offset into the config region. I couldn't determine what address is
> passed for MMIO operations. b. Is it an offset from the BAR for MMIO
> operations?

The offset from the start of the region.

>c. How do I get the full physical address?

You don't.  "Full physical address" is a fairly ill defined term. Physical 
addresses are local to a particular bus. It's common for CPU/ram and each PCI 
bus to have completely independent physical address spaces, with the host 
bridge providing mapping between the two.

>d. What address does a PCIe device expect to see - physical or offset?

Offset. Old versions of qemu used to pass the cpu physical address. This was a 
bug.

> e. Is there any way I can find out what the bus and device numbers are
> once inside the config and MMIO handlers? i.e once the execution has
> reached the pci_cirrus_write_config() or cirrus_vga_mem_readb(..) from the
> code above?

No. The device does not, and should not know this. 

Paul
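
The point about offsets can be illustrated with a toy model. This is a hedged sketch only, not QEMU's memory API: ToyBar, toy_region_readb and toy_bus_readb are made-up names. It shows a bus-side dispatcher subtracting the region's base address before calling the device handler, so the device only ever sees an offset into its own region and behaves the same wherever the BAR is mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a memory BAR: a window on the PCI bus backed by a device. */
typedef struct ToyBar {
    uint64_t base;      /* bus address currently programmed into the BAR */
    uint64_t size;      /* size of the window in bytes (at most 16 here) */
    uint8_t  regs[16];  /* device registers behind the window */
} ToyBar;

/* Device-side handler: it sees only the offset within its own region. */
uint8_t toy_region_readb(const ToyBar *bar, uint64_t offset)
{
    assert(offset < sizeof(bar->regs));
    return bar->regs[offset];
}

/*
 * Bus-side dispatch: translate a bus address into (region, offset).
 * Returns 0 and stores the byte in *val on a hit, -1 if no region
 * claims the address.
 */
int toy_bus_readb(const ToyBar *bars, size_t nbars, uint64_t addr,
                  uint8_t *val)
{
    size_t i;

    for (i = 0; i < nbars; i++) {
        if (addr >= bars[i].base && addr - bars[i].base < bars[i].size) {
            *val = toy_region_readb(&bars[i], addr - bars[i].base);
            return 0;
        }
    }
    return -1;
}
```

Config space does not appear in this model at all, which mirrors the answer above: config accesses are routed by the host bridge and never reach the memory handlers.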



[Qemu-devel] Re: [PATCH 2/3] pci: introduce a parser for fw device path to pci device

2010-12-22 Thread Isaku Yamahata
On Wed, Dec 22, 2010 at 01:04:43PM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 22, 2010 at 07:54:49PM +0900, Isaku Yamahata wrote:
> > Introduce a function to parse fw device path to pci device.
> > the format is
> > /p...@{, 
> > }/[]@,/.../[]@,
> > 
> >  = "i"
> >  = 
> >  = slot number in hex
> >  = func number in hex
> > 
> > Signed-off-by: Isaku Yamahata 
> 
> What concerns me the most here is the use of io addresses,
> not sure it's the right thing for the command interface.
>
> Why do we need to support full path at all?  Can we use the id of the
> parent bus for this?  Supplying a bus id for the device seems like a
> natural way to describe a tree, with minimal need for parsing.

The ids of most devices are currently NULL, so unfortunately the id is
not usable right now. Assigning ids to all qdevs systematically would
probably be difficult.

To be honest, I don't have a strong opinion on the format for PCI topology.
So far we have discussed the following candidates.
Which format do you prefer?

- domain:<bus>:<slot>.<func>
  guest assigns bus number, so this can't be used for qemu internal use.
- domain:00:<slot>.<func>:...:<slot>.<func>
- fw device path
- id
  Unfortunately id is NULL for most devices right now. So id doesn't work.
- any other?

thanks,
-- 
yamahata



[Qemu-devel] Re: [PATCH 2/3] pci: introduce a parser for fw device path to pci device

2010-12-22 Thread Michael S. Tsirkin
On Wed, Dec 22, 2010 at 08:36:40PM +0900, Isaku Yamahata wrote:
> On Wed, Dec 22, 2010 at 01:04:43PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Dec 22, 2010 at 07:54:49PM +0900, Isaku Yamahata wrote:
> > > Introduce a function to parse fw device path to pci device.
> > > the format is
> > > /p...@{, 
> > > }/[]@,/.../[]@,
> > > 
> > >  = "i"
> > >  = 
> > >  = slot number in hex
> > >  = func number in hex
> > > 
> > > Signed-off-by: Isaku Yamahata 
> > 
> > What concerns me the most here is the use of io addresses,
> > not sure it's the right thing for the command interface.
> >
> > Why do we need to support full path at all?  Can we use the id of the
> > parent bus for this?  Supplying a bus id for the device seems like a
> > natural way to describe a tree, with minimal need for parsing.
> 
> The ids of most devices are set NULL currently. So the id is useless
> right now unfortunately.

It's up to the user to assign ids. If one doesn't, one won't be able
to activate hotplug/aer, which does not seem like a serious limitation.

> Maybe how to assign ids to all of qdevs
> systematically would be difficult.

I don't think it's urgent. No id -> can't use some of the
functionality. No big deal IMO.

> To be honest, I don't have strong opinion for format of pci topology.
> So far we discussed the following candidates.
> what format do you prefer?
> 
> - domain:<bus>:<slot>.<func>
>   guest assigns bus number, so this can't be used for qemu internal use.
> - domain:00:<slot>.<func>:...:<slot>.<func>
> - fw device path
> - id
>   Unfortunately id is NULL for most devices right now. So id doesn't work.
> - any other?
> 
> thanks,
> -- 
> yamahata



[Qemu-devel] checking the number of target cpu cycles or instructions executed

2010-12-22 Thread Stefano Bonifazi

Hi all! :)
 how can I check the number of target CPU cycles or target instructions
executed inside qemu-user (i.e. qemu-ppc)?

Is there any variable I can inspect for such information?
Thank you very much in advance!
 Stefano B.



Re: [Qemu-devel] Deterministic replay

2010-12-22 Thread Paul Brook
> Hello.
> 
>  I am working on implementation of deterministic execution replay technology
> for Qemu. It should be similar to VMWare's replay debugging.
> 
>  To make alarm timer (which invokes host_alarm_handler function)
> deterministic, I changed its behavior: it sets flag, that execution should
> be stopped and this flag is checked before every instruction in the
> translated code.

You don't need to do this. A much better solution is to not use the host timer 
at all. See -icount.

Paul
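
For readers unfamiliar with -icount: as far as I understand the documented behaviour, with a fixed value N the virtual CPU is treated as executing one instruction every 2^N ns of virtual time, so the guest clock becomes a pure function of the instruction count. The sketch below only illustrates that arithmetic under that assumption; icount_to_ns and insns_until_deadline are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Virtual nanoseconds elapsed after `icount` guest instructions,
 * assuming a fixed rate of one instruction per 2^shift ns.
 * Overflow for very large counts is ignored in this sketch.
 */
uint64_t icount_to_ns(uint64_t icount, unsigned shift)
{
    assert(shift < 64);
    return icount << shift;
}

/*
 * Number of instructions that must still execute before a virtual
 * timer with the given deadline is due. Rounds up, so the timer is
 * never considered due before its deadline.
 */
uint64_t insns_until_deadline(uint64_t now_icount, uint64_t deadline_ns,
                              unsigned shift)
{
    uint64_t now_ns = icount_to_ns(now_icount, shift);
    uint64_t delta;

    if (deadline_ns <= now_ns) {
        return 0;
    }
    delta = deadline_ns - now_ns;
    return (delta + (UINT64_C(1) << shift) - 1) >> shift;
}
```

Because both functions depend only on the instruction count, two runs that execute the same instruction stream see timers expire at the same points, which is what makes this approach attractive for deterministic execution.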



[Qemu-devel] Asynchronously Mirroring a Virtual Machine

2010-12-22 Thread Tomer Margalit
Hi,

I am a grad student at Tel Aviv University, and my thesis is focused on
asynchronous mirroring.
I have already built asynchronous mirror software (which is composed of a
driver and a daemon), which works by setting up a virtual block device over
an existing one, and thus mirroring the existing one (asynchronously) to the
network using a window.

What I would like to add to it is the ability to asynchronously mirror a
(QEMU) VM as well, so that if the primary site crashes, the VM can be
restored (to the last stable point) immediately.

Since I want it to be as general as possible, I will not rely on my mirror,
but only on there being a virtual block device that is mirroring me (and
that has consistent writes (that is, if write A succeeded, and afterwards
write B succeeded, then if B is on the secondary site, then A must be there
too)), and that there is some kind of marking mechanism (so that I can mark
the last consistent state).

The reason I want to do this using QEMU is that it has live migration -
which is almost what I need.
Right now my plan is to treat the memory as a file, and put it and all the
vm images on the same mirroring block device.
Then, every minute or so, I will (stop the vm) sync the memory and disks,
and put a mark (where during the minute I will continue to write what I
can).

This sounds to me like doing a continuous "live migration", but never
actually moving the machine, and after finishing the migration on the side
that initiated it, not throwing away the dirty/clean pages (so that the
memory does not have to be moved completely again).

I would appreciate any direction and/or suggestions in this matter.

Also, I wonder where I can find some documentation about the whole
migration process - should I just read the code, or is there a document
about it?

Thanks,
Tomer


RE: [Qemu-devel] Deterministic replay

2010-12-22 Thread Pavel Dovgaluk
> >  I am working on implementation of deterministic execution replay technology
> > for Qemu. It should be similar to VMWare's replay debugging.
> >
> >  To make alarm timer (which invokes host_alarm_handler function)
> > deterministic, I changed its behavior: it sets flag, that execution should
> > be stopped and this flag is checked before every instruction in the
> > translated code.
> 
> You don't need to do this. A much better solution is to not use the host timer
> at all. See -icount.

 Thank you for your reply.
 I know that there are virtual timers, enabled by the -icount option, that
can be used to get rid of host timer usage.
 But my problem is different: I need to synchronize the alarm thread, which
breaks the execution of guest code to allow processing of interrupts and
interaction with VNC/GDB/...
 Events caused by the alarm thread are non-deterministic and asynchronous.
To save these events in the execution log (for later replay), I need to
synchronize them with the execution of guest code.
 The way I do this is to allow execution to stop at any point in the
guest code.


Pavel Dovgaluk
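
The record/replay idea described above (tag each asynchronous event with the point in guest execution where it was delivered, then deliver it at the same point on replay) can be sketched generically. This is an illustrative sketch under my own assumptions, not the implementation being discussed: ReplayLog, replay_record and replay_next are hypothetical names, and a plain instruction counter is assumed as the synchronization point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLAY_LOG_MAX 64

typedef struct ReplayEvent {
    uint64_t icount;  /* guest instruction count when the event was seen */
    int kind;         /* event type (non-negative), e.g. a timer tick */
} ReplayEvent;

typedef struct ReplayLog {
    ReplayEvent ev[REPLAY_LOG_MAX];
    size_t len;       /* number of recorded events */
    size_t pos;       /* next event to deliver during replay */
} ReplayLog;

/* Record phase: tag an asynchronous event with the current icount. */
int replay_record(ReplayLog *log, uint64_t icount, int kind)
{
    if (log->len == REPLAY_LOG_MAX) {
        return -1;    /* log is full */
    }
    /* Events must be logged in execution order. */
    assert(log->len == 0 || log->ev[log->len - 1].icount <= icount);
    log->ev[log->len].icount = icount;
    log->ev[log->len].kind = kind;
    log->len++;
    return 0;
}

/*
 * Replay phase: called at each synchronization point. Returns the kind
 * of the next event if it was recorded at exactly this icount, or -1
 * if nothing is due here.
 */
int replay_next(ReplayLog *log, uint64_t icount)
{
    if (log->pos < log->len && log->ev[log->pos].icount == icount) {
        return log->ev[log->pos++].kind;
    }
    return -1;
}
```

During replay the execution loop would call replay_next repeatedly at each point until it returns -1, since several events may share one instruction count.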




Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Hannes Reinecke
On 12/21/2010 11:05 PM, Benjamin Herrenschmidt wrote:
>>> So back to square 1 ... my vscsi (and virtio-blk too btw) can
>>> technically pass a max size to the guest, but we don't have a way to
>>> interrogate scsi-generic (and the underlying block driver) which is the
>>> main issue (that plus the fact that the ioctl seems to be broken in
>>> "compat" mode for /dev/sg specifically)...
>>>
>> Ah, the warm and fuzzy feeling knowing to be not alone in this ...
>>
>> This is basically the same issue I brought up with the first
>> submission round of my megasas emulation.
> 
> heh.
> 
>> As we're passing scatter-gather lists directly to the underlying
>> device we might end up sending a request which is improperly
>> formatted. The linux block layer has three limits onto which a
>> request has to be formatted:
>> - Max length of the scatter-gather list (max_sectors)
>> - Max overall request size (max_segments)
> 
> Didn't you swap the 2 above ? max_sectors is the max overall req. size
> and max_segments the max number of SG elements afaik :-)
> 
Yeah, could be. 'twas only meant for illustration anyway.

>> - Max length of individual sg elements (max_segment_size)
> 
>> newer kernels export these limits; they have been exported with
>> commit c77a5710b7e23847bfdb81fcaa10b585f65c960a.
>> For older kernels, however, we're being left in the dark here.
> 
> Well, first of all, "sg" is not there so that doesn't help with the
> scsi-generic problem much, then parsing sysfs... yuck.
> 
Well, sort of. 'sg' doesn't have any block queue limits directly as the
block queue is attached to the block device (surprise, surprise :-).
But nevertheless any commands sent via SG_IO are being placed on the
block queue, hence the same limits apply here, too.
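
To make the three limits concrete, here is a small sketch of a check that a scatter-gather request would have to pass, using the meaning of the names as corrected in the quoted text (max_sectors is the overall request size, max_segments the number of elements). QueueLimits and request_fits are hypothetical names, 512-byte sectors are assumed, and an iovec element is treated as one segment, which is a simplification of what the block layer really does when it maps pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical container for the three block queue limits. */
typedef struct QueueLimits {
    uint32_t max_sectors;       /* max total request size, 512-byte sectors */
    uint32_t max_segments;      /* max number of scatter-gather elements */
    uint32_t max_segment_size;  /* max size of a single element, in bytes */
} QueueLimits;

/* Return 1 if the iovec list respects all three limits, 0 otherwise. */
int request_fits(const QueueLimits *lim, const struct iovec *iov,
                 size_t iovcnt)
{
    uint64_t total = 0;
    size_t i;

    assert(lim != NULL && (iov != NULL || iovcnt == 0));

    if (iovcnt > lim->max_segments) {
        return 0;               /* too many elements */
    }
    for (i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > lim->max_segment_size) {
            return 0;           /* one element is too large */
        }
        total += iov[i].iov_len;
    }
    /* Overall size limit, converted from sectors to bytes. */
    return total <= (uint64_t)lim->max_sectors * 512;
}
```

A request that fails this check would have to be reshaped before being submitted via SG_IO, and knowing the three values in the first place is exactly the part that is hard to obtain for an sg device.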

>> So on newer kernel we probably could be doing a quick check on the
>> block queue limits and reformat the I/O if required.
> 
> Maybe but then, "sg" isn't there. We "could" I suppose use "sr" as an
> indication tho when we know it's a cdrom.
> 
If it were me I would be using
>> Instead of reformatting we could be sending each element of an sg
>> list individually. Thereby we would be introducing some slowdown as
>> the sg lists have to be reassembled again by the lower layers, but
>> we would be insulated from any sg list mismatch.
>> However, this won't cover requests with too large sg elements.
>> For those we could probably use some simple divide-by-two algorithm
>> on the element to make them fit.
> 
> How can we ? We need a single request to match a single sg list anyways
> no ?
> 
Yes, true. That's what I was trying to illustrate here.

> Let's say you get a READ10 from the guest for 200Kb and your underlying
> max_sectors is 128Kb. How do you want to "break that up" ? The only way
> would be to make it two different READ10's and that's a can of worms
> especially if you start putting tags into the picture...
> 
Precisely. Hence I didn't try to implement anything in that area :-)
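
For reference, the mechanical part of the 200Kb example is simple; the hard part is everything around it. The sketch below splits one READ(10) into several CDBs that each stay under a maximum transfer length. It is only an illustration under stated assumptions: build_read10 and split_read10 are hypothetical helpers, lengths are in logical blocks, and it deliberately ignores the ordering, tagging and error-handling problems raised above, which is where the can of worms lies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define READ_10 0x28

/* Fill a 10-byte READ(10) CDB: big-endian LBA in bytes 2-5, length in 7-8. */
void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, 10);
    cdb[0] = READ_10;
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);
    cdb[8] = (uint8_t)nblocks;
}

/*
 * Split a read of `nblocks` blocks starting at `lba` into CDBs of at
 * most `max_blocks` blocks each. Returns the number of CDBs written,
 * or -1 if max_blocks is zero or the output array is too small.
 */
int split_read10(uint32_t lba, uint16_t nblocks, uint16_t max_blocks,
                 uint8_t cdbs[][10], int max_cdbs)
{
    int n = 0;

    assert(cdbs != NULL);
    if (max_blocks == 0) {
        return -1;
    }
    while (nblocks > 0) {
        uint16_t chunk = nblocks < max_blocks ? nblocks : max_blocks;

        if (n == max_cdbs) {
            return -1;
        }
        build_read10(cdbs[n++], lba, chunk);
        lba += chunk;
        nblocks = (uint16_t)(nblocks - chunk);
    }
    return n;
}
```

With 512-byte blocks, a 200Kb read is 400 blocks and a 128Kb limit is 256 blocks, so the example turns into one command for 256 blocks followed by one for 144 blocks.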

>> But seeing we have to split the I/O requests anyway we might as well
>> use the divide-by-two algorithm for the sg lists, too.
>>
>> Easiest would be if we could just transfer the available bits and
>> push the request back to the guest as a partial completion.
>> Sadly the I/O stack on the guest will choose to interpret this as an
>> I/O error instead of retrying the remainder :-(
>>
>> So in the long run I fear we have to implement some sort of I/O
>> request splitting in Qemu, using the values from sysfs.
> 
> So in my case, I'm happy for the time being to continue doing bounce
> buffering and so my only problem at the moment is the max request size
> (aka max_sectors). Also I -can- tell the guest what my limitation is,
> it's part of the vscsi login protocol. I can look into doing DMA
> directly to the guest SG lists later maybe.
> 
> However, I can't quite figure out how to reliably obtain that
> information in my driver since on one hand, the ioctl doesn't seem to
> work in mixed 32/64-bit environments, and on the other hand, sysfs
> doesn't seem to have anything for "sg" in /sys/class/block... Besides,
> those are both Linux-isms... so we'd have to be extra careful there too.
> 
Yes. I've been bashing my head against this, too.

IMO the whole problem arises from the fact that we're deliberately
destroying information here.
Most modern HBAs are using separate codepaths for streaming/block I/O
anyway, but when using 'scsi-generic' we are forced to discard this
information. We have to fake a SCSI READ/WRITE command, and send it via
SG_IO to the underlying device and keep fingers crossed that we're not
exceeding any device limitations.

The whole problem would just go away if we could use the standard block
read()/write() calls here. Then the iovec would be placed _as
scatter-gather list_ on the request-queue and the block layer would take
care of the whole issue.

I've tried to advocate this approach once, but (again) was being told
that it's a misuse of scsi-generic and I should be usi

Re: [Qemu-devel] A problem about qemu compiling .

2010-12-22 Thread Mulyadi Santosa
On Wed, Dec 22, 2010 at 20:02, D Prince  wrote:
>
>
> 2010/12/13 Mulyadi Santosa 
>>
>> ./configure --target-list=i386-softmmu --static --audio-card-list=adlib
>>
>>
>> --
>> regards,
>>
>> Mulyadi Santosa
>
> It's working alright now. Thank you very much for your response!
>  --Terry
>
>

Congrats :) So, did you adopt my idea, or did you come up with another
solution? I'd love to read your solution...



-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Christoph Hellwig
On Wed, Dec 22, 2010 at 02:54:54PM +0100, Hannes Reinecke wrote:
> Most modern HBAs are using separate codepaths for streaming/block I/O
> anyway,

That's not true at all.  Every normal HBA just passes normal SCSI
commands to the SCSI targets.  It's just raid adapters that take special
commands, and the megaraid one is extremely special as it actually
emulates a few SCSI commands even in RAID mode, which almost no other
HBA does.  Strictly speaking we should not allow scsi-generic with
megaraid_sas, except for the separate passthrough channels that the real
hardware has for things like tape drives.

> However, since Alex Graf is facing similar problems with the AHCI HBA of
> his maybe we could retry again ...

AHCI is an ATA adapter, and should never be used with scsi-generic for
disks.  Only for the ATAPI-attached cdroms/tapes/etc it could be used,
although it's quite pointless.




Re: [Qemu-devel] QEMU forks survey

2010-12-22 Thread Andreas Färber

Am 22.12.2010 um 08:47 schrieb Bastien ROUCARIES:

On Tue, Dec 21, 2010 at 7:28 PM, Andreas Färber > wrote:
Since Christmas and the New Year with its good intentions are
approaching,

apart from z80 there's some more feature forks around:

http://repo.or.cz/w/qemu.git/forks?o=age


There is also the gold platform, aka the Android QEMU port (see a
previous thread)


Thanks. I'd like to document all these on wiki.qemu.org in some form.
Unfortunately the site failed to email me a new password, so that'll  
have to wait another 24h...


Andreas


Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Benjamin Herrenschmidt
On Wed, 2010-12-22 at 14:54 +0100, Hannes Reinecke wrote:

> Well, sort of. 'sg' doesn't have any block queue limits directly as the
> block queue is attached to the block device (surprise, surprise :-).
> But nevertheless any commands sent via SG_IO are being placed on the
> block queue, hence the same limits apply here, too.

Right, tho is there a "simple" way to map sg to the appropriate block
driver to retrieve the info via sysfs ? It looks possible from a quick
peek there but it also looks like an ungodly mess.
 
> If it were me I would be using

I think you meant to type more here :-)

> > However, I can't quite figure out how to reliably obtain that
> > information in my driver since on one hand, the ioctl doesn't seem to
> > work in mixed 32/64-bit environments, and on the other hand, sysfs
> > doesn't seem to have anything for "sg" in /sys/class/block... Besides,
> > those are both Linux-isms... so we'd have to be extra careful there too.
> > 
> Yes. I've been bashing my head against this, too.

Christoph, any suggestion there ?

> IMO the whole problem arises from the fact that we're deliberately
> destroying information here.
> Most modern HBAs are using separate codepaths for streaming/block I/O
> anyway, but when using 'scsi-generic' we are forced to discard this
> information. We have to fake a SCSI READ/WRITE command, and send it via
> SG_IO to the underlying device and keep fingers crossed that we're not
> exceeding any device limitations.

I wouldn't say it like that no.

It's a transport problem. In my case I'm not "faking" anything, vscsi is
just a transport (a variant of SRP). The problem is that when
'emulating' a HW HBA, you have no way to express the intrinsic
limitations of the underlying HBA, but that's not a problem I have with
vscsi which is meant to be a transport and as such does have means to
convey that sort of information (tho in my case, I have some issues due
to assumptions/bugs in the existing ibm vscsi client driver but that's a
different topic).

So I think there's a significant difference here between emulating a HW
HBA and doing something like vscsi. The former has problems that cannot
be easily solved I believe. The latter's problems, on the other hand, can be
solved, the means to do so are there, but we have to deal with
"interface" issues ... plumbing problems.

The non working compat ioctl is one, the fact that "sg" has
no /sys/class/block (or /sys/block) entries is another, etc... Ie, we
are faced with a problem with Linux not exposing this information in
an easy to retrieve way, and no proper cross-platform way to obtain
it either.

> The whole problem would just go away if we could use the standard block
> read()/write() calls here. Then the iovec would be placed _as
> scatter-gather list_ on the request-queue and the block layer would take
> care of the whole issue.

That would be somewhat cheating with the concept of just being a SCSI
transport layer :-) You would interpret some requests and turn them into
something else. That would be "interesting" when your user starts using
tags and make assumptions about what's in flight and what not etc...

> I've tried to advocate this approach once, but (again) was being told
> that it's a misuse of scsi-generic and I should be using scsi-disk instead.
> 
> However, since Alex Graf is facing similar problems with the AHCI HBA of
> his maybe we could retry again ...

Again, I'd say different problems :-) To some extent scsi-disk will
solve the issues with basic read/write operations, but there are some more
nasty SCSI commands that you want to pass through for things like DVD burning
for example, unless we start building higher level abstractions into the
kernel. So you -still- end up acting somewhat as a SCSI transport layer,
and potentially hit the problem with limits again.

Cheers,
Ben.

> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> h...@suse.de+49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)





Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Benjamin Herrenschmidt
On Wed, 2010-12-22 at 14:27 +0100, Christoph Hellwig wrote:
> On Wed, Dec 22, 2010 at 02:54:54PM +0100, Hannes Reinecke wrote:
> > Most modern HBAs are using separate codepaths for streaming/block I/O
> > anyway,
> 
> That's not true at all.  Every normal HBA just passes normal SCSI
> commands to the SCSI targets.  It's just raid adapters that take special
> commands, and the megaraid one is extremely special as it actually
> emulates a few SCSI commands even in RAID mode, which almost no other
> HBA does.  Strictly speaking we should not allow scsi-generic with
> megaraid_sas, except for the separate passthrough channels that the real
> hardware has for things like tape drives.

Actually, I would put it differently here.

scsi-generic is -fundamentally- busted for HBA HW emulation since you
simply cannot convey the limits of the underlying real HBA.

If you are on top of usb-storage with a 120K max_sectors and try to
emulate a piece of HBA with no such limitation how in hell do you make
your guest know not to give you >120K requests at a time and what do you
do if it does ? You're stuffed basically...

Hence, the only way scsi-generic can make sense imho, is for something
like vscsi which I'm doing now, which is just a transport and does have
the ability to convey to the client/guest some of those limitations...
provided it can get to them in the first place (see the discussion, it's
really non trivial, which makes /dev/sg even less useful even for normal
userspace :-)

In the Megaraid case, the fact that it has this separate read/write
channel on the contrary should make it -easier- to solve that problem
typically by allowing the emulation layer to construct sequences of
READ/WRITE requests that match the limitations. I.e. it makes
scsi-generic a possibility while it would otherwise be (and is) broken in
unfixable ways with other HBA emulation.

> > However, since Alex Graf is facing similar problems with the AHCI HBA of
> > his maybe we could retry again ...
> 
> AHCI is an ATA adapter, and should never be used with scsi-generic for
> disks.  Only for the ATAPI-attached cdroms/tapes/etc it could be used,
> although it's quite pointless.

Right, but in that case (cdroms etc...) it would have the exact same
problem. I'm not familiar with AHCI HW, and so I don't know whether
there's a way for the HW to convey "limits" to the driver, but if not,
then operating via scsi-generic would be busted the same way anything
else is.

Basically, scsi-generic cannot work for emulating an HBA. In fact, I
would go as far as saying that it's not possible to generically emulate
an HBA that just passes through any SCSI command, simply due to the
inability to convey those limits.

vscsi is a special case (and other "paravirt" drivers that may exist)
because being explicitly designed for acting as such transports, they
-do- convey the necessary limit information. I don't know iscsi but I
would be surprised if it didn't provide similar facilities.

So what we need here is a way for qemu to retrieve those reliably when
using scsi-generic. That's the missing piece of the puzzle on my side.

Cheers,
Ben.





[Qemu-devel] [PATCH] Add a check for readlink in mapped mode.

2010-12-22 Thread Venkateswararao Jujjuri (JV)
Signed-off-by: Venkateswararao Jujjuri 
---
 hw/9pfs/virtio-9p-local.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c
index a8e7525..9a106d4 100644
--- a/hw/9pfs/virtio-9p-local.c
+++ b/hw/9pfs/virtio-9p-local.c
@@ -112,6 +112,13 @@ static ssize_t local_readlink(FsContext *fs_ctx, const char *path,
 ssize_t tsize = -1;
 if (fs_ctx->fs_sm == SM_MAPPED) {
 int fd;
+mode_t tmp_mode;
+/* Make sure that it is a symlink */
+if (getxattr(rpath(fs_ctx, path), "user.virtfs.mode", &tmp_mode,
+sizeof(mode_t)) <= 0 || !S_ISLNK(tmp_mode)) {
+errno = EINVAL;
+return -1;
+}
 fd = open(rpath(fs_ctx, path), O_RDONLY);
 if (fd == -1) {
 return -1;
-- 
1.6.5.2
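
A note on testing a stored mode for "is a symlink": the file-type values in st_mode are not independent bit flags, so a plain bitmask test against S_IFLNK also matches other file types (a regular file, for instance, shares a bit with it). The S_ISLNK() macro compares the whole type field instead. The sketch below demonstrates the difference; mode_is_symlink, mode_has_lnk_bits and check_stored_mode are hypothetical helper names used only for illustration.

```c
#define _XOPEN_SOURCE 700  /* request S_IFLNK/S_ISLNK on strict compilers */
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Correct test: compares the complete file-type field of the mode. */
int mode_is_symlink(mode_t mode)
{
    return S_ISLNK(mode) ? 1 : 0;
}

/* Plain bitmask test, shown for contrast: it also matches other types. */
int mode_has_lnk_bits(mode_t mode)
{
    return (mode & S_IFLNK) != 0;
}

/* Hypothetical wrapper shaped like the check in a readlink path. */
int check_stored_mode(mode_t mode)
{
    if (!mode_is_symlink(mode)) {
        return -EINVAL;
    }
    /* Every symlink mode necessarily has the S_IFLNK bits set. */
    assert(mode_has_lnk_bits(mode));
    return 0;
}
```

With these helpers, a mode of S_IFREG | 0644 passes the bitmask test but is rejected by check_stored_mode, while S_IFLNK | 0777 is accepted.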




Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Alexander Graf

On 22.12.2010, at 14:27, Christoph Hellwig wrote:

> On Wed, Dec 22, 2010 at 02:54:54PM +0100, Hannes Reinecke wrote:
>> Most modern HBAs are using separate codepaths for streaming/block I/O
>> anyway,
> 
> That's not true at all.  Every normal HBA just passes normal SCSI
> commands to the SCSI targets.  It's just raid adapters that take special
> commands, and the megaraid one is extremely special as it actually
> emulates a few SCSI commands even in RAID mode, which almost no other
> HBA does.  Strictly speaking we should not allow scsi-generic with
> megaraid_sas, except for the separate passthrough channels that the real
> hardware has for things like tape drives.
> 
>> However, since Alex Graf is facing similar problems with the AHCI HBA of
>> his maybe we could retry again ...
> 
> AHCI is an ATA adapter, and should never be used with scsi-generic for
> disks.  Only for the ATAPI-attached cdroms/tapes/etc it could be used,
> although it's quite pointless.

It's not 100% pointless - ATAPI passthrough is a feature requested by users.
If we were to model ATAPI properly, it would end up using whatever SCSI layers 
we have below - which means ATAPI passthrough would be a mere matter of 
replacing the "scsi-cdrom" backend with a "scsi-passthrough" backend.

Now for the fun part: ATAPI can also do NCQ. So if we were to do ATAPI 
passthrough on a CD-ROM with NCQ, we would end up with exactly the same 
situation as megasas: NCQ goes through the normal read/write path of a block 
backend, while passthrough would do SG_IO.


Alex





Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Alexander Graf

On 22.12.2010, at 22:59, Benjamin Herrenschmidt wrote:

> On Wed, 2010-12-22 at 14:54 +0100, Hannes Reinecke wrote:
> 
>> Well, sort of. 'sg' doesn't have any block queue limits directly as the
>> block queue is attached to the block device (surprise, surprise :-).
>> But nevertheless any commands sent via SG_IO are being placed on the
>> block queue, hence the same limits apply here, too.
> 
> Right, though is there a "simple" way to map sg to the appropriate block
> driver to retrieve the info via sysfs? It looks possible from a quick
> peek there, but it also looks like an ungodly mess.
> 
>> If it were me I would be using
> 
> I think you meant to type more here :-)
> 
>>> However, I can't quite figure out how to reliably obtain that
>>> information in my driver since on one hand, the ioctl doesn't seem to
>>> work in mixed 32/64-bit environments, and on the other hand, sysfs
>>> doesn't seem to have anything for "sg" in /sys/class/block... Besides,
>>> those are both Linux-isms... so we'd have to be extra careful there too.
>>> 
>> Yes. I've been bashing my head against this, too.
> 
> Christoph, any suggestion there ?
> 
>> IMO the whole problem arises from the fact that we're deliberately
>> destroying information here.
>> Most modern HBAs are using separate codepaths for streaming/block I/O
>> anyway, but when using 'scsi-generic' we are forced to discard this
>> information. We have to fake a SCSI READ/WRITE command, and send it via
>> SG_IO to the underlying device and keep fingers crossed that we're not
>> exceeding any device limitations.
> 
> I wouldn't say it like that no.
> 
> It's a transport problem. In my case I'm not "faking" anything, vscsi is
> just a transport (a variant of SRP). The problem is that when
> 'emulating' a HW HBA, you have no way to express the intrinsic
> limitations of the underlying HBA, but that's not a problem I have with
> vscsi which is meant to be a transport and as such does have means to
> convey that sort of information (tho in my case, I have some issues due
> to assumptions/bugs in the existing ibm vscsi client driver but that's a
> different topic).
> 
> So I think there's a significant difference here between emulating a HW
> HBA and doing something like vscsi. The former has problems that cannot
> be easily solved, I believe. The latter's problems, on the other hand, can
> be solved; the means to do so are there, but we have to deal with
> "interface" issues ... plumbing problems.
> 
> The non-working compat ioctl is one, the fact that "sg" has
> no /sys/class/block (or /sys/block) entries is another, etc... I.e., we
> are faced with a problem with Linux not exposing that information in
> an easy-to-retrieve way, and no proper cross-platform way to obtain
> that information either.

Why would you care about cross-platform here? Not saying I fully understand 
what information exactly you're lacking. But it's either SG_IO max request size 
in which case you don't need any equivalent on other platforms, as it's not 
available anywhere else. Or it's something else in which case you can just set 
it to some "safe" small default value and call it a day :).


Alex




Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Benjamin Herrenschmidt
On Thu, 2010-12-23 at 00:23 +0100, Alexander Graf wrote:
> > The non-working compat ioctl is one, the fact that "sg" has
> > no /sys/class/block (or /sys/block) entries is another, etc... I.e., we
> > are faced with a problem with Linux not exposing that information in
> > an easy-to-retrieve way, and no proper cross-platform way to obtain
> > that information either.
> 
> Why would you care about cross-platform here? Not saying I fully
> understand what information exactly you're lacking. But it's either
> SG_IO max request size in which case you don't need any equivalent on
> other platforms, as it's not available anywhere else. Or it's
> something else in which case you can just set it to some "safe" small
> default value and call it a day :).

Well, do we support something like scsi-generic on Windows or BSD
hosts? Dunno... just asking :-) They -have- mechanisms (at least
Windows does) to pass SCSI requests down the stack. In that case, they'll
have similar limitations (at the very least the max request size).

So we'd want some way to expose that... but if scsi-generic today is
Linux-only, then I can try to add Linux-isms in there as a stop-gap to
at least retrieve the max request size, which is the main issue for
me right now... at least until I start trying to have the SG_IO
read/write directly into guest memory without bouncing :-) At that
point, the SG limits might become trouble as well.
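
A minimal sketch of what such a Linux-only stop-gap could look like. It assumes that the sg node answers the BLKSECTGET ioctl with a byte count written as an int; that is an assumption to verify against the running kernel, not something this thread confirms. The function names and the 64 KiB fallback are invented for illustration.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* BLKSECTGET */

/* Hypothetical "safe" default used when the host limit is unknown. */
#define FALLBACK_MAX_BYTES (64 * 1024)

/* Returns the reported limit in bytes, or -1 if it cannot be queried.
 * Assumption: the driver stores an int holding a byte count. */
static long query_max_request_bytes(int fd)
{
    int max_bytes = 0;   /* start from zero so an untouched value is obvious */

    if (ioctl(fd, BLKSECTGET, &max_bytes) < 0 || max_bytes <= 0) {
        return -1;
    }
    return max_bytes;
}

/* Caps a wanted transfer length at the limit, or at the fallback
 * when no usable limit is available. */
static size_t clamp_request(size_t wanted, long limit)
{
    size_t max = limit > 0 ? (size_t)limit : FALLBACK_MAX_BYTES;

    return wanted < max ? wanted : max;
}
```

A caller would query the limit once when the device is opened, then pass every transfer length through clamp_request() and split anything larger into several SG_IO requests.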

Cheers,
Ben.




Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Alexander Graf

On 23.12.2010, at 00:35, Benjamin Herrenschmidt wrote:

> On Thu, 2010-12-23 at 00:23 +0100, Alexander Graf wrote:
>>> The non-working compat ioctl is one, the fact that "sg" has
>>> no /sys/class/block (or /sys/block) entries is another, etc... I.e., we
>>> are faced with a problem with Linux not exposing that information in
>>> an easy-to-retrieve way, and no proper cross-platform way to obtain
>>> that information either.
>> 
>> Why would you care about cross-platform here? Not saying I fully
>> understand what information exactly you're lacking. But it's either
>> SG_IO max request size in which case you don't need any equivalent on
>> other platforms, as it's not available anywhere else. Or it's
>> something else in which case you can just set it to some "safe" small
>> default value and call it a day :).
> 
> Well, do we support something like scsi-generic on Windows or BSD
> hosts? Dunno... just asking :-) They -have- mechanisms (at least
> Windows does) to pass SCSI requests down the stack. In that case, they'll
> have similar limitations (at the very least the max request size).
> 
> So we'd want some way to expose that... but if scsi-generic today is
> Linux-only, then I can try to add Linux-isms in there as a stop-gap to
> at least retrieve the max request size, which is the main issue for
> me right now... at least until I start trying to have the SG_IO
> read/write directly into guest memory without bouncing :-) At that
> point, the SG limits might become trouble as well.

This all belongs in the block layer. If you create a callback function or 
property in the block struct, Windows can implement its own limits when someone 
sits down to implement SG_IO on Windows.
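
To make the idea concrete, here is a rough sketch of such a per-backend callback with a conservative default. Every name in it (blk_backend, blk_backend_ops, max_request_bytes, the 64 KiB default) is hypothetical and does not correspond to an existing QEMU structure.

```c
#include <stddef.h>

struct blk_backend;   /* hypothetical backend handle */

/* Hypothetical per-host operations table. */
struct blk_backend_ops {
    /* Returns the largest request in bytes, or 0 if unknown. */
    size_t (*max_request_bytes)(struct blk_backend *b);
};

struct blk_backend {
    const struct blk_backend_ops *ops;
    size_t host_limit;   /* filled in by host-specific probing code */
};

/* Conservative value used when a host has no way to report a limit. */
#define DEFAULT_MAX_REQUEST_BYTES (64 * 1024)

/* Device emulation asks the block layer, never the host directly. */
static size_t backend_max_request(struct blk_backend *b)
{
    size_t n = 0;

    if (b->ops && b->ops->max_request_bytes) {
        n = b->ops->max_request_bytes(b);
    }
    return n ? n : DEFAULT_MAX_REQUEST_BYTES;
}

/* Example host implementation: reports whatever was probed earlier. */
static size_t linux_sg_max_request(struct blk_backend *b)
{
    return b->host_limit;
}

static const struct blk_backend_ops linux_sg_ops = {
    .max_request_bytes = linux_sg_max_request,
};
```

A backend without the callback simply gets the default, so a future Windows or BSD implementation only has to supply its own function in the table.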


Alex




Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Benjamin Herrenschmidt
On Thu, 2010-12-23 at 00:39 +0100, Alexander Graf wrote:
> This all belongs in the block layer. If you create a callback
> function or property in the block struct, Windows can implement its
> own limits when someone sits down to implement SG_IO on Windows.

Right, and it seems we do have "generic" ways to interrogate those
limits... except they seem to be broken for "sg" :-)

Also, I've spotted some oddities where the ioctl for the max request size
sometimes does its put_user through an int * and sometimes through a
short *... ooops...

Cheers,
Ben.





Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Alexander Graf

On 23.12.2010, at 00:44, Benjamin Herrenschmidt wrote:

> On Thu, 2010-12-23 at 00:39 +0100, Alexander Graf wrote:
>> This all belongs in the block layer. If you create a callback
>> function or property in the block struct, Windows can implement its
>> own limits when someone sits down to implement SG_IO on Windows.
> 
> Right, and it seems we do have "generic" ways to interrogate those
> limits... except they seem to be broken for "sg" :-)
> 
> Also, I've spotted some oddities where the ioctl for the max request size
> sometimes does its put_user through an int * and sometimes through a
> short *... ooops...

Congratulations on finding lots of Linux bugs :). Look at it this way: 
You'll most likely be the very first person actually using sg properly. So 
after you're done, others won't have to fix it :).


Alex




Re: [Qemu-devel] scsi-generic and max request size

2010-12-22 Thread Benjamin Herrenschmidt
On Thu, 2010-12-23 at 00:49 +0100, Alexander Graf wrote:
> 
> Congratulations on finding lots of Linux bugs :). Look at it this
> way: You'll most likely be the very first person actually using
> sg properly. So after you're done, others won't have to fix it :).

Hahah, I doubt it :-) Makes me wonder whether "sg" can be used properly
to be honest...

Cheers,
Ben.





Re: [Qemu-devel] [PATCH] Add a check for readlink in mapped mode.

2010-12-22 Thread Stefan Hajnoczi
On Wed, Dec 22, 2010 at 11:09 PM, Venkateswararao Jujjuri (JV)
 wrote:
> Signed-off-by: Venkateswararao Jujjuri 
> ---
>  hw/9pfs/virtio-9p-local.c |    7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] Asynchronously Mirroring a Virtual Machine

2010-12-22 Thread Stefan Hajnoczi
On Wed, Dec 22, 2010 at 12:43 PM, Tomer Margalit
 wrote:
> What I would like to add to it is the ability to asynchronously mirror a
> (QEMU) VM as well, so that if the primary site crashes, the VM can be
> restored (to the last stable point) immediately.
> Since I want it to be as general as possible, I will not rely on my mirror,
> but only on there being a virtual block device that is mirroring me (and
> that has consistent writes (that is, if write A succeeded, and afterwards
> write B succeeded, then if B is on the secondary site, then A must be there
> too)), and that there is some kind of marking mechanism (so that I can mark
> the last consistent state).
> The reason I want to do this using QEMU is that it has live migration -
> which is almost what I need.
> Right now my plan is to treat the memory as a file, and put it and all the
> vm images on the same mirroring block device.

Have you looked at Kemari for KVM (try searching qemu-devel)?  It
builds a fault tolerance mechanism on top of live migration.  Memory,
disk, and network changes are sent to a secondary machine which
passively tracks the state of the VM.  You can fail over to the last
completed transaction state.  Sounds similar to what you want to do.

Stefan



Re: [Qemu-devel] possible regression in qemu-kvm 0.13.0 (memtest)

2010-12-22 Thread Stefan Hajnoczi
On Wed, Dec 22, 2010 at 10:02 AM, Peter Lieven  wrote:
> If I start a VM with the following parameters
> qemu-kvm-0.13.0 -m 2048 -smp 2 -monitor tcp:0:4014,server,nowait -vnc :14 
> -name 'ubuntu.test'  -boot order=dc,menu=off  -cdrom 
> ubuntu-10.04.1-desktop-amd64.iso -k de
>
> and select memtest in the Ubuntu CD Boot Menu, the VM immediately resets. 
> After this reset there happen several errors including graphic corruption or 
> the qemu-kvm binary
> aborting with error 134.
>
> Exactly the same scenario on the same machine with qemu-kvm-0.12.5 works 
> flawlessly.
>
> Any ideas?

You could track down the commit which broke this using git-bisect(1).
The steps are:

$ git bisect start v0.13.0 v0.12.5

Then:

$ ./configure [...] && make
$ x86_64-softmmu/qemu-system-x86_64 -m 2048 -smp 2 \
    -monitor tcp:0:4014,server,nowait -vnc :14 -name 'ubuntu.test' \
    -boot order=dc,menu=off -cdrom ubuntu-10.04.1-desktop-amd64.iso -k de

If memtest runs as expected:
$ git bisect good
otherwise:
$ git bisect bad

Keep repeating this and you should end up at the commit that introduced the bug.

Stefan



[Qemu-devel] [Bug 513273] Re: kvm with -vga std is broken since karmic

2010-12-22 Thread morleyfl
** Changed in: vgabios (Ubuntu)
 Assignee: Dustin Kirkland (kirkland) => morleyfl (morleyfl)

** Changed in: vgabios (Ubuntu Lucid)
 Assignee: Dustin Kirkland (kirkland) => morleyfl (morleyfl)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/513273

Title:
  kvm with -vga std is broken since karmic

Status in QEMU:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Invalid
Status in “seabios” package in Ubuntu:
  Invalid
Status in “vgabios” package in Ubuntu:
  Fix Released
Status in “qemu-kvm” source package in Lucid:
  Invalid
Status in “seabios” source package in Lucid:
  Invalid
Status in “vgabios” source package in Lucid:
  Fix Released

Bug description:
  Binary package hint: qemu-kvm

it works with -vga cirrus, with -vga std I got:

BUG: kvm_dirty_pages_log_enable_slot: invalid parameters
BUG: kvm_dirty_pages_log_disable_slot: invalid parameters
[the two lines above repeat 30 more times]


And the driver does not work properly (I cannot set the screen resolution) ...
the virtual machine almost works; the only problem is the screen in the WinXP guest

ProblemType: Bug
Architecture: amd64
Date: Wed Jan 27 15:15:49 2010
DistroRelease: Ubuntu 10.04
KvmCmdLine: Error: command ['ps', '-C', 'kvm', '-F'] failed with exit code 1: 
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
MachineType: Acer Aspire 9300
NonfreeKernelModules: nvidia
Package: qemu-kvm 0.12.2-0ubuntu1
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLin

[Qemu-devel] svm_decache_regs Problem?

2010-12-22 Thread calvino
Hi there, recently I have been digging into the kvm source code, and when I use
svm_decache_regs, kvm oopses and reports a bug like this:
--
BUG:unable to handle kernel NULL pointer dereference at 05f8
IP:[]svm_decache_regs+0x2f/0x72
PGD 19c98067 PUD 1c8db067 PMD 0
Oops:0002 [1] SMP
--

My OS is debian-lenny-507, the kernel version is 2.6.26-0.rc8, and cpuinfo is:

vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) Dual Core Processor 4850e
stepping : 2
cpu MHz : 2505.188
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm
3dnowext 3dnow rep_good nopl pni cx16 lahf_lm cmp_legacy svm extapic
cr8_legacy 3dnowprefetch

the system is x86_64

--

Here is my dump of the svm_decache_regs
0x8022a8c7 : push   %rbp
0x8022a8c8 : mov    %rsp,%rbp
0x8022a8cb : sub    $0x18,%rsp
0x8022a8cf : mov    %rdi,-0x18(%rbp)
0x8022a8d3 : mov    -0x18(%rbp),%rdi
0x8022a8d7 : callq  0x8022a2a0
0x8022a8dc : mov    %rax,-0x8(%rbp)
0x8022a8e0 : mov    -0x8(%rbp),%rax
0x8022a8e4 : mov    0x1ca0(%rax),%rdx
0x8022a8eb : mov    -0x18(%rbp),%rax
0x8022a8ef : mov    0x168(%rax),%rax
0x8022a8f6 : mov    %rax,0x5f8(%rdx)   ---> error step?
0x8022a8fd : mov    -0x8(%rbp),%rax
0x8022a901 : mov    0x1ca0(%rax),%rdx
0x8022a908 : mov    -0x18(%rbp),%rax
0x8022a90c : mov    0x188(%rax),%rax
0x8022a913 : mov    %rax,0x5d8(%rdx)
0x8022a91a : mov    -0x8(%rbp),%rax
0x8022a91e : mov    0x1ca0(%rax),%rdx
0x8022a925 : mov    -0x18(%rbp),%rax
0x8022a929 : mov    0x1e8(%rax),%rax
0x8022a930 : mov    %rax,0x578(%rdx)
0x8022a937 : leaveq
0x8022a938 : retq

As I see it, the BUG points at the instruction mov %rax,0x5f8(%rdx): the
access to the address 0x5f8(%rdx) is what faults.
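
The oops reports a NULL pointer dereference at 05f8, and the faulting store targets 0x5f8(%rdx). That is what one would see if %rdx, the pointer loaded from 0x1ca0(%rax) just before, were 0: the fault address is then the bare field offset. The sketch below only illustrates that arithmetic; the struct is a made-up stand-in that reproduces the three offsets visible in the disassembly and is not the real kernel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: padding chosen so the three fields land on the
 * offsets that appear in the disassembly (0x578, 0x5d8, 0x5f8). */
struct fake_save_block {
    uint8_t  pad0[0x578];
    uint64_t field_578;
    uint8_t  pad1[0x5d8 - 0x578 - 8];
    uint64_t field_5d8;
    uint8_t  pad2[0x5f8 - 0x5d8 - 8];
    uint64_t field_5f8;
};

/* Address the CPU touches for "offset(base)" addressing. */
static uintptr_t effective_address(uintptr_t base, size_t offset)
{
    return base + offset;
}

/* With a NULL base, the faulting address equals the field offset. */
static void check_null_base(void)
{
    size_t off = offsetof(struct fake_save_block, field_5f8);

    assert(off == 0x5f8);
    assert(effective_address(0, off) == 0x5f8);
}
```

If that reading is right, the thing to check is why the pointer stored at offset 0x1ca0 of the structure returned by the earlier call is still NULL when this code runs.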


In addition, here is part of the dump of my own function:
0x8022bfa0 : callq  0x8022a2a0   ---> here to_svm(vcpu)
0x8022bfa5 : mov    %rax,-0x8(%rbp)
0x8022bfa9 : mov    -0x8(%rbp),%rax
0x8022bfad : mov    0x1ca0(%rax),%rdx
0x8022bfb4 : mov    -0x20(%rbp),%rax
0x8022bfb8 : mov    0x168(%rax),%rax
0x8022bfbf : mov    %rax,0x5f8(%rdx)   ---> the error instruction, the 0x5f8(%rdx) address
0x8022bfc6 : mov    -0x8(%rbp),%rax
0x8022bfca : mov    0x1ca0(%rax),%rdx
0x8022bfd1 : mov    -0x20(%rbp),%rax
0x8022bfd5 : mov    0x1e8(%rax),%rax
0x8022bfdc : mov    %rax,0x578(%rdx)
0x8022bfe3 : mov    -0x20(%rbp),%rax
0x8022bfe7 : mov    0x1e8(%rax),%rax
0x8022bfee : cmp    $0x8026103a,%rax
0x8022bff4 : jne    0x8022c06a
0x8022bff6 : mov    $0x805f07a3,%rdi

static int handle_invalid_op(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
    struct vcpu_svm *svm = to_svm(vcpu);

    /* do something */
}


ADDON: my kvm startup command is:kvm -hda xxx -cdrom xxx -net
nic,model=rtl8139,macaddress=11:11:11:11:11:11 -net
tap,ifname=tap,script=xxx vnc  -boot c

When I start kvm, it oopses with a memory error like the one above.


---
Could anyone check how svm_decache_regs works?