Re: [Qemu-devel] latest qemu with gdb remote not working

2013-06-02 Thread Peter Maydell
On 2 June 2013 06:46, Peter Cheung  wrote:
> I just tried to compile the latest qemu on Fedora 18 64 bits,
> it is also fail. When i press "c" in gdb, the qemu won't start
> running.

Good. This significantly increases the chances that somebody
will investigate.

> But one thing fedora is different than mac, when i connect gdb to qemu, it
> won't say "warning: Error 268435459 getting port names from mach_port_names"

That looks like a Mac GDB specific issue.

> In my mac, i use gcc from mac port, here is the version detail:
>
> /Users/peter>gcc -v

> gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)

This is pretty ancient -- I'd suggest using clang to build
QEMU instead. (It probably won't fix your problem here, though.)

What command line are you using to configure and build QEMU?

thanks
-- PMM



Re: [Qemu-devel] KVM call agenda for 2013-05-28

2013-06-02 Thread Michael S. Tsirkin
On Fri, May 31, 2013 at 01:45:55PM +0200, Laszlo Ersek wrote:
> On 05/31/13 09:09, Jordan Justen wrote:
> 
> > Why is updating the ACPI tables in seabios viewed as such a burden?
> > Either qemu does it, or seabios... (And, OVMF too, but I don't think
> > you guys are concerned with that. :)
> 
> I am :)
> 
> > On the flip side, why is moving the ACPI tables to QEMU such an issue?
> > It seems like Xen and virtualbox both already do this. Why is running
> > iasl not an issue for them?
> 
> I think something was mentioned about iasl having problems on BE
> machines? I could be easily wrong but I *guess* qemu's hosts x targets
> (emulate what on what) set is a proper superset of xen's and
> virtualbox's. Presumably if you want to run an x86 guest on a MIPS host,
> and also want to build qemu on the same MIPS (or SPARC) host, you'd have
> to run iasl there too.

You guys should take a look at the patch series I posted.

That's solved there by the means of keeping iasl output in qemu git tree.
configure checks for a working iasl and enables/disables
using this pre-processed output accordingly.
Everyone developing ASL code would still need working iasl
but that's already the case today.

> > tables :)
> 
> Impossible. :)
> 
> In earnest, I think what we have now is (mostly) correct, just not
> extensive / flexible enough. No support for PCI hotplug or CPU hotplug,
> none for S3 (although all of these tie into UEFI deeply), no MTRR setup,
> no MPTABLE; let alone a non-PIIX chipset. (Well maybe I shouldn't lump
> these under the "ACPI umbrella".)
> 
> > but I haven't seen it as much of a burden. (Of course,
> > Laszlo has helped out with many of the ACPI changes in OVMF, so his
> > opinion should be taken into consideration too. :)
> 
> It hasn't been a "burden" in the sense of me not liking the activity; I
> actually like fiddling with knobs. It has certainly been extra work to
> bring OVMF's ACPI tables closer to SeaBIOS's functionality / flexibility
> (and we still lag behind it quite.).
> 
> Due to licensing differences I can't just port code from SeaBIOS to OVMF
> (and I never have without explicit permission), so it's been a lot of
> back and forth with acpidump / iasl -d in guests (massage OVMF, boot
> guest, check guest dmesg / lspci, dump tables, compare, repeat), brain
> picking colleagues, the ACPI and PIIX specs and so on. I have a page on
> the RH intranet dedicated to this. When something around these parts is
> being changed (or looks like it could be changed) in SeaBIOS, or between
> qemu and SeaBIOS, I always must be alert and consider reimplementing it
> in, or porting it with permission to, OVMF. (Most recent example:
> pvpanic device -- currently only in SeaBIOS.)
> 
> It worries me that if I slack off, or am busy with something else, or
> simply don't notice, then the gap will widen again. I appreciate
> learning a bunch about ACPI, and don't mind the days of work that went
> into some of my simple-looking ACPI patches for OVMF, but had the tables
> come from a common (programmatic) source, none of this would have been
> an issue, and I wouldn't have felt even occasionally that ACPI patches
> for OVMF were both duplicate work *and* futile (considering how much
> ahead SeaBIOS was).
> 
> I don't mind reimplementing stuff, or porting it with permission, going
> forward, but the sophisticated parts in SeaBIOS are a hard nut. For
> example I'll never be able to auto-extract offsets from generated AML
> and patch the AML using those offsets; the edk2 build tools (a project
> separate from edk2) don't support this, and it takes several months to
> get a thing as simple as gcc-47 build flags into edk2-buildtools.
> 
> Instead I have to write template ASL, compile it to AML, hexdump the
> result, verify it against the AML grammar in the ACPI spec (offsets
> aren't obvious, BytePrefix and friends are a joy), define & initialize a
> packed struct or array in OVMF, and patch the template AML using fixed
> field names or array subscripts. Workable, but dog slow. If the ACPI
> payload came from up above, we might be as well provided with a list of
> (canonical name, offset, size) triplets, and could perhaps blindly patch
> the contents. (Not unlike Michael's linker code for connecting tables
> into a hierarchy.)
> 
> AFAIK most recently iasl got built-in support for offset extraction (and
> in the process the current SeaBIOS build method was broken...), so that
> part might get easier in the future.
> 
> Oh well it's Friday, sorry about this rant! :) I'll happily do what I
> can in the current status quo, but frequently, it won't amount to much.
> 
> Thanks,
> Laszlo



Re: [Qemu-devel] KVM call agenda for 2013-05-28

2013-06-02 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 10:34:26PM -0400, Kevin O'Connor wrote:
> On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
> > There were discussions on potentially introducing a middle component
> > to generate the tables.  Coreboot was raised as a possibility, and
> > David thought it would be okay to use coreboot for both OVMF and
> > SeaBIOS.  The possibility was also raised of a "rom" that lives in the
> > qemu repo, is run in the guest, and generates the tables (which is
> > similar to the hvmloader approach that Xen uses).
> 
> Given the objections to implementing ACPI directly in QEMU,

I don't think that's a given, just yet.

So far Anthony asked to be shown the kind of project that
ACPI generation in QEMU would enable. Since qemu community wasn't
directly exposed to the ACPI-related patches it's easy to see how qemu
maintainers won't be aware of the churn and maintainance overhead caused by
generating them on the guest side.

That seems reasonable, so please hang on just a little bit longer
until I post acpi hotplug support for pci bridges
based on this code.

Then we can discuss.

-- 
MST



[Qemu-devel] [PATCH 0/4] Fix ppc64 tcg issues

2013-06-02 Thread Anton Blanchard

Hi,

qemu is currently broken on ppc64. After applying the following patches
I am able to boot a ppc64 and x86-64 image successfully. 

Anton



[Qemu-devel] [PATCH 3/4] tcg-ppc64: Fix add2_i64

2013-06-02 Thread Anton Blanchard

add2_i64 was adding the lower double word to the upper double word
of each input. Fix this so we add the lower double words, then the
upper double words with carry propagation.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Anton Blanchard 
---

sub2 has similar issues, I haven't fixed it because I don't have
a testcase yet.

Index: b/tcg/ppc64/tcg-target.c
===
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1958,18 +1958,18 @@ static void tcg_out_op (TCGContext *s, T
environment.  So in 64-bit mode it's always carry-out of bit 63.
The fallback code using deposit works just as well for 32-bit.  */
 a0 = args[0], a1 = args[1];
-if (a0 == args[4] || (!const_args[5] && a0 == args[5])) {
+if (a0 == args[3] || (!const_args[5] && a0 == args[5])) {
 a0 = TCG_REG_R0;
 }
-if (const_args[3]) {
-tcg_out32(s, ADDIC | TAI(a0, args[2], args[3]));
+if (const_args[4]) {
+tcg_out32(s, ADDIC | TAI(a0, args[2], args[4]));
 } else {
-tcg_out32(s, ADDC | TAB(a0, args[2], args[3]));
+tcg_out32(s, ADDC | TAB(a0, args[2], args[4]));
 }
 if (const_args[5]) {
-tcg_out32(s, (args[5] ? ADDME : ADDZE) | RT(a1) | RA(args[4]));
+tcg_out32(s, (args[5] ? ADDME : ADDZE) | RT(a1) | RA(args[3]));
 } else {
-tcg_out32(s, ADDE | TAB(a1, args[4], args[5]));
+tcg_out32(s, ADDE | TAB(a1, args[3], args[5]));
 }
 if (a0 != args[0]) {
 tcg_out_mov(s, TCG_TYPE_I64, args[0], a0);
@@ -2147,7 +2147,7 @@ static const TCGTargetOpDef ppc_op_defs[
 { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
 { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
 
-{ INDEX_op_add2_i64, { "r", "r", "r", "rI", "r", "rZM" } },
+{ INDEX_op_add2_i64, { "r", "r", "r", "r", "rI", "rZM" } },
 { INDEX_op_sub2_i64, { "r", "r", "rI", "r", "rZM", "r" } },
 { INDEX_op_muls2_i64, { "r", "r", "r", "r" } },
 { INDEX_op_mulu2_i64, { "r", "r", "r", "r" } },



[Qemu-devel] [PATCH 4/4] tcg-ppc64: rotr_i32 rotates wrong amount

2013-06-02 Thread Anton Blanchard

rotr_i32 calculates the amount to left shift and puts it into a
temporary, but then doesn't use it when doing the shift.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Anton Blanchard 
---

Index: b/tcg/ppc64/tcg-target.c
===
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1661,7 +1661,7 @@ static void tcg_out_op (TCGContext *s, T
 tcg_out_rlw(s, RLWINM, args[0], args[1], 32 - args[2], 0, 31);
 } else {
 tcg_out32(s, SUBFIC | TAI(0, args[2], 32));
-tcg_out32(s, RLWNM | SAB(args[1], args[0], args[2])
+tcg_out32(s, RLWNM | SAB(args[1], args[0], 0)
  | MB(0) | ME(31));
 }
 break;



[Qemu-devel] [PATCH 2/4] tcg-ppc64: bswap64 rotates output 32 bits

2013-06-02 Thread Anton Blanchard

If our input and output is in the same register, bswap64 tries to
undo a rotate of the input. This just ends up rotating the output.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Anton Blanchard 
---

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 0fcf2b5..64fb0af 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1922,8 +1922,6 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 
 if (a0 == 0) {
 tcg_out_mov(s, TCG_TYPE_I64, args[0], a0);
-/* Revert the source rotate that we performed above.  */
-tcg_out_rld(s, RLDICL, a1, a1, 32, 0);
 }
 break;
 



[Qemu-devel] [PATCH 1/4] tcg-ppc64: Fix RLDCL opcode

2013-06-02 Thread Anton Blanchard

The rldcl instruction doesn't have an sh field, so the minor opcode
of 8 is actually 4 when using the XO30 macro.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Anton Blanchard 
---

Index: b/tcg/ppc64/tcg-target.c
===
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -357,7 +357,7 @@ static int tcg_target_const_match (tcg_t
 #define RLDICL XO30(  0)
 #define RLDICR XO30(  1)
 #define RLDIMI XO30(  3)
-#define RLDCL  XO30(  8)
+#define RLDCL  XO30(  4)
 
 #define BCLR   XO19( 16)
 #define BCCTR  XO19(528)



Re: [Qemu-devel] [PATCH 10/21] memory: make section size a 128-bit integer

2013-06-02 Thread Paolo Bonzini
Il 01/06/2013 00:18, Richard Henderson ha scritto:
> On 05/30/2013 02:16 PM, Paolo Bonzini wrote:
>> +static inline Int128 int128_rshift(Int128 a, int n)
>> +{
>> +return (Int128) { (a.lo >> n) | (a.hi << (64 - n)), (a.hi >> n) };
>> +}
> 
> Produces wrong results for n == 0, since (a.hi << 64) is undefined.

Good catch, I'm adding an


if (!n) {
return a;
}

before.

Paolo



Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough

2013-06-02 Thread Alex Williamson
On Sat, 2013-06-01 at 14:13 +0200, Benoît Canet wrote:
> Hello,
> 
> I may have soon the PF driver of an SR-IOV card to code and make work with
> QEMU/KVM so I have the following questions.
> 
> In an AMD64 setup where QEMU use VFIO to passthrough the VFs of an SR-IOV card
> to a guest will the consequences of a PF FLR be handled fine by QEMU and the
> guest ?
> 
> I read that pci_reset_function would call pci_restore_state restoring the 
> SR-IOV
> configuration after the reset of the PF.
> 
> The ways the hardware work means that the VFs would disappear and reappear in 
> a
> short lapse of time.
> 
> Will these events be handled by the kernel pci hotplug code ?
> 
> Given that the PF driver restore the PF config space after the reset will /sys
> files used by QEMU disappear and reappear messing the QEMU VFIO passthrough or
> will it goes smoothly ?

On an Intel 82576 SR-IOV NIC, a FLR of the PF does not cause the VFs to
be removed.  It's not clear to me that they continue working across the
reset, but there is no hotplug.  If there was a hotplug, vfio-pci won't
release the device while it's in use, so the hotplug would be blocked
until the devices becomes unused, such as from the VM being shutdown.
Thanks,

Alex




Re: [Qemu-devel] [PATCH 10/21] memory: make section size a 128-bit integer

2013-06-02 Thread Peter Maydell
On 31 May 2013 23:18, Richard Henderson  wrote:
> On 05/30/2013 02:16 PM, Paolo Bonzini wrote:
>> +static inline Int128 int128_rshift(Int128 a, int n)
>> +{
>> +return (Int128) { (a.lo >> n) | (a.hi << (64 - n)), (a.hi >> n) };
>> +}
>
> Produces wrong results for n == 0, since (a.hi << 64) is undefined.

It produces wrong results for shifts by more than 64,
for that matter.

thanks
-- PMM



Re: [Qemu-devel] [SeaBIOS] [SeaBIOS PATCH] boot: fix fw_dev_path pattern for q35-pcihost

2013-06-02 Thread Gleb Natapov
On Wed, May 29, 2013 at 10:33:54AM +0800, Amos Kong wrote:
> On Tue, May 28, 2013 at 06:59:02PM -0400, Kevin O'Connor wrote:
> > On Tue, May 28, 2013 at 08:28:14PM +0800, Amos Kong wrote:
> > > Bootindex string passed from qemu:
> > >  /q35-pcihost@i0cf8/ethernet@2/ethernet-phy@0
> > > 
> > > We match pci domain by "/pci@i0cf8" in SeaBIOS, but fw_dev_path prefix
> > > of q35 is "/q35-pcihost@i0cf8". So bootindex in qemu commandline
> > > doesn't work if it uses q35 machine type.
> > > 
> > > This patch fixes the pattern to match both original pc-i440fx & q35
> > > 
> > > Signed-off-by: Amos Kong 
> > > ---
> > >  src/boot.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/boot.c b/src/boot.c
> > > index cd9d784..f30d47e 100644
> > > --- a/src/boot.c
> > > +++ b/src/boot.c
> > > @@ -97,7 +97,7 @@ find_prio(const char *glob)
> > >  return -1;
> > >  }
> > >  
> > > -#define FW_PCI_DOMAIN "/pci@i0cf8"
> > > +#define FW_PCI_DOMAIN "/*pci*@i0cf8"
> > 
> > The seabios pattern matching code isn't that sophisticated - I think
> > this could end up doing something unexpected.  Why does it need to
> > change?
> 
> If we start a guest with default machine type (pc-i440fx), the prefix
> of bootindex string is "/pci@i0cf8", if we start guest with -M q35,
> the prefix will become "/q35-pcihost@i0cf8".
> 
> We only match "/pci@i0cf8" in seabios, it causes boot priority of q35
> devices could not be changed.
> 
> We could not change TYPE_Q35_HOST_DEVICE to 'pci' in qemu to adapt
> seabios, so fix the pattern.
> 
Why couldn't we? QEMU not been able to generate proper device string is
QEMU bug.

--
Gleb.



Re: [Qemu-devel] [PATCH 10/21] memory: make section size a 128-bit integer

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 16:18, Peter Maydell ha scritto:
> On 31 May 2013 23:18, Richard Henderson  wrote:
>> On 05/30/2013 02:16 PM, Paolo Bonzini wrote:
>>> +static inline Int128 int128_rshift(Int128 a, int n)
>>> +{
>>> +return (Int128) { (a.lo >> n) | (a.hi << (64 - n)), (a.hi >> n) };
>>> +}
>>
>> Produces wrong results for n == 0, since (a.hi << 64) is undefined.
> 
> It produces wrong results for shifts by more than 64,
> for that matter.

This should work:

int64_t h;
if (!n) {
return a;
}
h = a.hi >> n;
if (n >= 64) {
return (Int128) { h, h >> 63 };
} else {
   return (Int128) { (a.lo >> n) | (a.hi << (64 - n)), h };
}

Paolo




Re: [Qemu-devel] [PATCH 10/21] memory: make section size a 128-bit integer

2013-06-02 Thread Peter Maydell
On 2 June 2013 15:36, Paolo Bonzini  wrote:
> This should work:
>
> int64_t h;
> if (!n) {
> return a;
> }
> h = a.hi >> n;

This is undefined for n >= 64.

> if (n >= 64) {
> return (Int128) { h, h >> 63 };
> } else {
>return (Int128) { (a.lo >> n) | (a.hi << (64 - n)), h };
> }

I would suggest looking at fpu/softfloat-macros.h:shift128Right()
except that that has at least one clearly dubious thing in it
(a check for "count < 64" in an else case that can't be reached
if count < 64)...

thanks
-- PMM



Re: [Qemu-devel] [SeaBIOS] [SeaBIOS PATCH] boot: fix fw_dev_path pattern for q35-pcihost

2013-06-02 Thread Michael S. Tsirkin
On Sun, Jun 02, 2013 at 05:29:45PM +0300, Gleb Natapov wrote:
> On Wed, May 29, 2013 at 10:33:54AM +0800, Amos Kong wrote:
> > On Tue, May 28, 2013 at 06:59:02PM -0400, Kevin O'Connor wrote:
> > > On Tue, May 28, 2013 at 08:28:14PM +0800, Amos Kong wrote:
> > > > Bootindex string passed from qemu:
> > > >  /q35-pcihost@i0cf8/ethernet@2/ethernet-phy@0
> > > > 
> > > > We match pci domain by "/pci@i0cf8" in SeaBIOS, but fw_dev_path prefix
> > > > of q35 is "/q35-pcihost@i0cf8". So bootindex in qemu commandline
> > > > doesn't work if it uses q35 machine type.
> > > > 
> > > > This patch fixes the pattern to match both original pc-i440fx & q35
> > > > 
> > > > Signed-off-by: Amos Kong 
> > > > ---
> > > >  src/boot.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/boot.c b/src/boot.c
> > > > index cd9d784..f30d47e 100644
> > > > --- a/src/boot.c
> > > > +++ b/src/boot.c
> > > > @@ -97,7 +97,7 @@ find_prio(const char *glob)
> > > >  return -1;
> > > >  }
> > > >  
> > > > -#define FW_PCI_DOMAIN "/pci@i0cf8"
> > > > +#define FW_PCI_DOMAIN "/*pci*@i0cf8"
> > > 
> > > The seabios pattern matching code isn't that sophisticated - I think
> > > this could end up doing something unexpected.  Why does it need to
> > > change?
> > 
> > If we start a guest with default machine type (pc-i440fx), the prefix
> > of bootindex string is "/pci@i0cf8", if we start guest with -M q35,
> > the prefix will become "/q35-pcihost@i0cf8".
> > 
> > We only match "/pci@i0cf8" in seabios, it causes boot priority of q35
> > devices could not be changed.
> > 
> > We could not change TYPE_Q35_HOST_DEVICE to 'pci' in qemu to adapt
> > seabios, so fix the pattern.
> > 
> Why couldn't we? QEMU not been able to generate proper device string is
> QEMU bug.
> 
> --
>   Gleb.


I applied a patch that does just that.




Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-02 Thread Gleb Natapov
On Wed, May 29, 2013 at 11:45:44AM +0300, Michael S. Tsirkin wrote:
> On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
> > On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
> > > Juan is not available now, and Anthony asked for
> > > agenda to be sent early.
> > > So here comes:
> > > 
> > > Agenda for the meeting Tue, May 28:
> > > 
> > > - Generating acpi tables
> > 
> > I didn't see any meeting notes, but I thought it would be worthwhile
> > to summarize the call.  This is from memory so correct me if I got
> > anything wrong.
> > 
> > Anthony believes that the generation of ACPI tables is the task of the
> > firmware.  Reasons cited include security implications of running more
> > code in qemu vs the guest context, complexities in running iasl on
> > big-endian machines, possible complexity of having to regenerate
> > tables on a vm reboot, overall sloppiness of doing it in QEMU.  Raised
> > that QOM interface should be sufficient.
> > 
> > Kevin believes that the bios table code should be moved up into QEMU.
> > Reasons cited include the churn rate in SeaBIOS for this QEMU feature
> > (15-20% of all SeaBIOS commits since integrating with QEMU have been
> > for bios tables; 20% of SeaBIOS commits in last year), complexity of
> > trying to pass all the content needed to generate the tables (eg,
> > device details, power tree, irq routing), complexity of scheduling
> > changes across different repos and synchronizing their rollout,
> > complexity of implemeting the code in both OVMF and SeaBIOS.  Kevin
> > wasn't aware of a requirement to regenerate acpi tables on a vm
> > reboot.
> 
> I think this last one is based on a misunderstanding: it's based
> on assumption that we we change hardware by hotplug
> we should regenerate the tables to match.
> But there's no management that can take advantage of
> this.
> Two possible reasonable things we can tell management:
> - hotplug for device XXX is not supported: restart qemu
>   to make guest use the device
> - hotplug for device XXX is supported
> 
> What is proposed here instead is a third option:
> - hotplug is supported but device is not functional.
>   reboot guest to make it fully functional
> 
> This will naturally lead to requirement to regenerate tables on reset.
> 
> And this is what would happen with guest-generated
> tables, and I consider this a bug, not a feature.
> 
+1. This will probably break guest resume too.

> If you really wanted to update tables dynamically, without restarting
> qemu, don't stop there, add an interface for guest to update them
> without reset. I think that's over-endineering and a
> requirement that's best avoided.
> 
> 
> > There were discussions on potentially introducing a middle component
> > to generate the tables.  Coreboot was raised as a possibility, and
> > David thought it would be okay to use coreboot for both OVMF and
> > SeaBIOS.  The possibility was also raised of a "rom" that lives in the
> > qemu repo, is run in the guest, and generates the tables (which is
> > similar to the hvmloader approach that Xen uses).
> > 
> > Anthony requested that patches be made that generate the ACPI tables
> > in QEMU for the upcoming hotplug work, so that they could be evaluated
> > to see if they truly do need to live in QEMU or if the code could live
> > in the firmware.  There were no objections.
> > 
> > -Kevin
> 
> I volunteered to implement this.
Why hotplug should generate ACPI code? It does not do so on real HW.

> 
> It was also mentioned that this patch does not yet have to fix the
> cross-version migration issue with fw_cfg. If we agree on a direction,
> we will fix it then.
> 
> Lastly, a proposal was made by Michael to make the call bi-weekly
> instead of weekly, as we were cancelling it too much.
> There were no objections.
> 
> Thus, the next call is planned for June 11, 2013.
> 
> -- 
> MST
> 
> ___
> SeaBIOS mailing list
> seab...@seabios.org
> http://www.seabios.org/mailman/listinfo/seabios

--
Gleb.



Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-02 Thread Michael S. Tsirkin
On Sun, Jun 02, 2013 at 06:05:42PM +0300, Gleb Natapov wrote:
> On Wed, May 29, 2013 at 11:45:44AM +0300, Michael S. Tsirkin wrote:
> > On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
> > > On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
> > > > Juan is not available now, and Anthony asked for
> > > > agenda to be sent early.
> > > > So here comes:
> > > > 
> > > > Agenda for the meeting Tue, May 28:
> > > > 
> > > > - Generating acpi tables
> > > 
> > > I didn't see any meeting notes, but I thought it would be worthwhile
> > > to summarize the call.  This is from memory so correct me if I got
> > > anything wrong.
> > > 
> > > Anthony believes that the generation of ACPI tables is the task of the
> > > firmware.  Reasons cited include security implications of running more
> > > code in qemu vs the guest context, complexities in running iasl on
> > > big-endian machines, possible complexity of having to regenerate
> > > tables on a vm reboot, overall sloppiness of doing it in QEMU.  Raised
> > > that QOM interface should be sufficient.
> > > 
> > > Kevin believes that the bios table code should be moved up into QEMU.
> > > Reasons cited include the churn rate in SeaBIOS for this QEMU feature
> > > (15-20% of all SeaBIOS commits since integrating with QEMU have been
> > > for bios tables; 20% of SeaBIOS commits in last year), complexity of
> > > trying to pass all the content needed to generate the tables (eg,
> > > device details, power tree, irq routing), complexity of scheduling
> > > changes across different repos and synchronizing their rollout,
> > > complexity of implemeting the code in both OVMF and SeaBIOS.  Kevin
> > > wasn't aware of a requirement to regenerate acpi tables on a vm
> > > reboot.
> > 
> > I think this last one is based on a misunderstanding: it's based
> > on assumption that we we change hardware by hotplug
> > we should regenerate the tables to match.
> > But there's no management that can take advantage of
> > this.
> > Two possible reasonable things we can tell management:
> > - hotplug for device XXX is not supported: restart qemu
> >   to make guest use the device
> > - hotplug for device XXX is supported
> > 
> > What is proposed here instead is a third option:
> > - hotplug is supported but device is not functional.
> >   reboot guest to make it fully functional
> > 
> > This will naturally lead to requirement to regenerate tables on reset.
> > 
> > And this is what would happen with guest-generated
> > tables, and I consider this a bug, not a feature.
> > 
> +1. This will probably break guest resume too.
> 
> > If you really wanted to update tables dynamically, without restarting
> > qemu, don't stop there, add an interface for guest to update them
> > without reset. I think that's over-endineering and a
> > requirement that's best avoided.
> > 
> > 
> > > There were discussions on potentially introducing a middle component
> > > to generate the tables.  Coreboot was raised as a possibility, and
> > > David thought it would be okay to use coreboot for both OVMF and
> > > SeaBIOS.  The possibility was also raised of a "rom" that lives in the
> > > qemu repo, is run in the guest, and generates the tables (which is
> > > similar to the hvmloader approach that Xen uses).
> > > 
> > > Anthony requested that patches be made that generate the ACPI tables
> > > in QEMU for the upcoming hotplug work, so that they could be evaluated
> > > to see if they truly do need to live in QEMU or if the code could live
> > > in the firmware.  There were no objections.
> > > 
> > > -Kevin
> > 
> > I volunteered to implement this.
> Why hotplug should generate ACPI code? It does not do so on real HW.

Hotplug should not generate ACPI code.
What is meant here is adding ACPI code to support hotplug
of devices behind a PCI to PCI bridge.


> > 
> > It was also mentioned that this patch does not yet have to fix the
> > cross-version migration issue with fw_cfg. If we agree on a direction,
> > we will fix it then.
> > 
> > Lastly, a proposal was made by Michael to make the call bi-weekly
> > instead of weekly, as we were cancelling it too much.
> > There were no objections.
> > 
> > Thus, the next call is planned for June 11, 2013.
> > 
> > -- 
> > MST
> > 
> > ___
> > SeaBIOS mailing list
> > seab...@seabios.org
> > http://www.seabios.org/mailman/listinfo/seabios
> 
> --
>   Gleb.



Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough

2013-06-02 Thread Benoît Canet

Thanks a lot for the answer.

Best regards

Benoît

> Le Sunday 02 Jun 2013 à 08:11:42 (-0600), Alex Williamson a écrit :
> On Sat, 2013-06-01 at 14:13 +0200, Benoît Canet wrote:
> > Hello,
> > 
> > I may have soon the PF driver of an SR-IOV card to code and make work with
> > QEMU/KVM so I have the following questions.
> > 
> > In an AMD64 setup where QEMU use VFIO to passthrough the VFs of an SR-IOV 
> > card
> > to a guest will the consequences of a PF FLR be handled fine by QEMU and the
> > guest ?
> > 
> > I read that pci_reset_function would call pci_restore_state restoring the 
> > SR-IOV
> > configuration after the reset of the PF.
> > 
> > The ways the hardware work means that the VFs would disappear and reappear 
> > in a
> > short lapse of time.
> > 
> > Will these events be handled by the kernel pci hotplug code ?
> > 
> > Given that the PF driver restore the PF config space after the reset will 
> > /sys
> > files used by QEMU disappear and reappear messing the QEMU VFIO passthrough 
> > or
> > will it goes smoothly ?
> 
> On an Intel 82576 SR-IOV NIC, a FLR of the PF does not cause the VFs to
> be removed.  It's not clear to me that they continue working across the
> reset, but there is no hotplug.  If there was a hotplug, vfio-pci won't
> release the device while it's in use, so the hotplug would be blocked
> until the devices becomes unused, such as from the VM being shutdown.
> Thanks,
> 
> Alex
> 
> 



[Qemu-devel] [PATCH] softfloat: Fix shift128Right for shift counts 64..127

2013-06-02 Thread Peter Maydell
shift128Right would give the wrong result for a shift count
between 64 and 127. This was never noticed because all of
our uses of this function are guaranteed not to use shift
counts in this range.

Signed-off-by: Peter Maydell 
---
Found by code inspection. This contribution can be licensed
under either the softfloat-2a or -2b license.

 fpu/softfloat-macros.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
index b5164af..9b09545 100644
--- a/fpu/softfloat-macros.h
+++ b/fpu/softfloat-macros.h
@@ -168,7 +168,7 @@ INLINE void
 z0 = a0>>count;
 }
 else {
-z1 = ( count < 64 ) ? ( a0>>( count & 63 ) ) : 0;
+z1 = (count < 128) ? (a0 >> (count & 63)) : 0;
 z0 = 0;
 }
 *z1Ptr = z1;
-- 
1.7.11.4




Re: [Qemu-devel] Why some test suite in kvm-unit-tests designed for 64bit only?

2013-06-02 Thread Gleb Natapov
On Thu, May 30, 2013 at 06:58:07PM +0800, 李春奇  wrote:
> Hi there,
> I'm now reading codes of kvm-unit-tests and I found that some of the
> test cases for x86 is only designed for x86_64 (including access.flat,
> apic.flat, emulator.flat, idt_test.flat and so on). I wonder why these
> cases are not designed for i386? Or is there any other concerns?
> 
If functionality it tests does not depend on what mode vcpu is in then
having support only for x86_64 is simpler.

--
Gleb.



Re: [Qemu-devel] [SeaBIOS] [SeaBIOS PATCH] boot: fix fw_dev_path pattern for q35-pcihost

2013-06-02 Thread Gleb Natapov
On Sun, Jun 02, 2013 at 05:59:04PM +0300, Michael S. Tsirkin wrote:
> On Sun, Jun 02, 2013 at 05:29:45PM +0300, Gleb Natapov wrote:
> > On Wed, May 29, 2013 at 10:33:54AM +0800, Amos Kong wrote:
> > > On Tue, May 28, 2013 at 06:59:02PM -0400, Kevin O'Connor wrote:
> > > > On Tue, May 28, 2013 at 08:28:14PM +0800, Amos Kong wrote:
> > > > > Bootindex string passed from qemu:
> > > > >  /q35-pcihost@i0cf8/ethernet@2/ethernet-phy@0
> > > > > 
> > > > > We match pci domain by "/pci@i0cf8" in SeaBIOS, but fw_dev_path prefix
> > > > > of q35 is "/q35-pcihost@i0cf8". So bootindex in qemu commandline
> > > > > doesn't work if it uses q35 machine type.
> > > > > 
> > > > > This patch fixes the pattern to match both original pc-i440fx & q35
> > > > > 
> > > > > Signed-off-by: Amos Kong 
> > > > > ---
> > > > >  src/boot.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/src/boot.c b/src/boot.c
> > > > > index cd9d784..f30d47e 100644
> > > > > --- a/src/boot.c
> > > > > +++ b/src/boot.c
> > > > > @@ -97,7 +97,7 @@ find_prio(const char *glob)
> > > > >  return -1;
> > > > >  }
> > > > >  
> > > > > -#define FW_PCI_DOMAIN "/pci@i0cf8"
> > > > > +#define FW_PCI_DOMAIN "/*pci*@i0cf8"
> > > > 
> > > > The seabios pattern matching code isn't that sophisticated - I think
> > > > this could end up doing something unexpected.  Why does it need to
> > > > change?
> > > 
> > > If we start a guest with default machine type (pc-i440fx), the prefix
> > > of bootindex string is "/pci@i0cf8", if we start guest with -M q35,
> > > the prefix will become "/q35-pcihost@i0cf8".
> > > 
> > > We only match "/pci@i0cf8" in seabios, it causes boot priority of q35
> > > devices could not be changed.
> > > 
> > > We could not change TYPE_Q35_HOST_DEVICE to 'pci' in qemu to adapt
> > > seabios, so fix the pattern.
> > > 
> > Why couldn't we? QEMU not been able to generate proper device string is
> > QEMU bug.
> > 
> > --
> > Gleb.
> 
> 
> I applied a patch that does just that.
I noticed :) I have many emails to read :(

--
Gleb.



[Qemu-devel] [PATCH 00/15] Memory/IOMMU patches part 4: region ownership

2013-06-02 Thread Paolo Bonzini
Now that the DMA APIs are unified, we move closer and closer to breaking
memory access from the BQL dependency.  This series adds an API to
reference/unreference memory regions, which is not really needed only
for BQL-less memory access: the big lock can already be dropped between
address_space_map and the corresponding unmap, and the last patch fixes
potential problems in this area if the DMA destination is hot-unplugged.
Lockless memory access only makes things a bit worse.

Reference counting the region piggybacks on reference counting of a QOM
object, the "owner" of the region.  The owner API is designed so that
it will be called as little as possible.  Unowned subregions will get a
region if memory_region_set_owner is called after the subregion is added.
This is in general the common case already; often setting the owner can
be delegated to a bus-specific API that already takes a DeviceState
(for example pci_register_bar or sysbus_init_mmio).

I'll be leaving soon for a few days of vacation, hence I'm not
sending a pull request for the previous part yet.

Paolo

Paolo Bonzini (15):
  memory: add getter/setter for owner
  memory: add ref/unref
  memory: add ref/unref calls
  exec: add a reference to the region returned by address_space_translate
  pci: set owner for BARs
  sysbus: set owner for MMIO regions
  acpi: add memory_region_set_owner calls
  misc: add memory_region_set_owner calls
  isa/portio: allow setting an owner
  vga: add memory_region_set_owner calls
  pci-assign: add memory_region_set_owner calls
  vfio: add memory_region_set_owner calls
  exec: check MRU in qemu_ram_addr_from_host
  memory: return MemoryRegion from qemu_ram_addr_from_host
  memory: ref/unref memory across address_space_map/unmap

 exec.c| 79 ++-
 hw/acpi/ich9.c|  1 +
 hw/acpi/piix4.c   |  5 +++
 hw/char/serial-pci.c  |  1 +
 hw/core/loader.c  |  1 +
 hw/core/sysbus.c  |  2 +
 hw/display/cirrus_vga.c   | 19 ++---
 hw/display/exynos4210_fimd.c  |  6 +++
 hw/display/framebuffer.c  | 12 +++---
 hw/display/qxl.c  |  6 ++-
 hw/display/vga-isa-mm.c   |  2 +-
 hw/display/vga-isa.c  |  4 +-
 hw/display/vga-pci.c  |  5 ++-
 hw/display/vga.c  | 19 ++---
 hw/display/vga_int.h  |  9 ++--
 hw/display/vmware_vga.c   |  4 +-
 hw/i386/kvm/pci-assign.c  | 11 +
 hw/i386/kvmvapic.c|  1 +
 hw/isa/apm.c  |  1 +
 hw/isa/isa-bus.c  |  2 +
 hw/misc/pc-testdev.c  |  7 
 hw/misc/vfio.c| 10 +
 hw/pci/pci.c  |  2 +
 hw/virtio/dataplane/hostmem.c |  7 
 hw/virtio/vhost.c |  2 +
 hw/virtio/virtio-balloon.c|  1 +
 hw/xen/xen_pt.c   |  4 ++
 include/exec/cpu-common.h |  2 +-
 include/exec/ioport.h |  3 ++
 include/exec/memory.h | 51 +-
 include/hw/virtio/dataplane/hostmem.h |  1 +
 ioport.c  | 10 +
 kvm-all.c |  2 +
 memory.c  | 51 ++
 target-arm/kvm.c  |  2 +
 target-i386/kvm.c |  4 +-
 target-sparc/mmu_helper.c |  1 +
 xen-all.c |  2 +
 38 files changed, 301 insertions(+), 51 deletions(-)

-- 
1.8.1.4




[Qemu-devel] [PATCH 01/15] memory: add getter/setter for owner

2013-06-02 Thread Paolo Bonzini
Whenever memory regions are accessed outside the BQL, they need to be
preserved against hot-unplug.  MemoryRegions actually do not have their
own reference count; they piggyback on a QOM object, their "owner".
Add two functions to retrieve and specify the owner.

The setter function will affect the owner recursively on a whole tree
of contained regions, but without crossing (a) aliases (b) regions that
are already owned by another device.  This is so that a device can create
a complex tree of regions and a single call to memory_region_set_owner
(perhaps even within a bus-specific function, e.g. pci_register_bar) will
affect the entire tree.

Signed-off-by: Paolo Bonzini 
---
 include/exec/memory.h | 18 ++
 memory.c  | 21 +
 2 files changed, 39 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3598c4f..457a53c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -150,6 +150,7 @@ struct MemoryRegion {
 const MemoryRegionIOMMUOps *iommu_ops;
 void *opaque;
 MemoryRegion *parent;
+struct Object *owner;
 Int128 size;
 hwaddr addr;
 void (*destructor)(MemoryRegion *mr);
@@ -388,6 +389,23 @@ void memory_region_init_iommu(MemoryRegion *mr,
 void memory_region_destroy(MemoryRegion *mr);
 
 /**
+ * memory_region_owner: get a memory region's owner.
+ *
+ * @mr: the memory region being queried.
+ */
+struct Object *memory_region_owner(MemoryRegion *mr);
+
+/**
+ * memory_region_set_owner: set the owner for a memory region and all
+ * the unowned regions below it.
+ *
+ * @mr: the memory region being set.
+ * @owner: the object that acts as the owner
+ */
+void memory_region_set_owner(MemoryRegion *mr,
+ struct Object *owner);
+
+/**
  * memory_region_size: get a memory region's size.
  *
  * @mr: the memory region being queried.
diff --git a/memory.c b/memory.c
index 49371ee..e855105 100644
--- a/memory.c
+++ b/memory.c
@@ -826,6 +826,7 @@ void memory_region_init(MemoryRegion *mr,
 mr->opaque = NULL;
 mr->iommu_ops = NULL;
 mr->parent = NULL;
+mr->owner = NULL;
 mr->size = int128_make64(size);
 if (size == UINT64_MAX) {
 mr->size = int128_2_64();
@@ -1092,6 +1093,26 @@ void memory_region_destroy(MemoryRegion *mr)
 g_free(mr->ioeventfds);
 }
 
+Object *memory_region_owner(MemoryRegion *mr)
+{
+return mr->owner;
+}
+
+void memory_region_set_owner(MemoryRegion *mr,
+ Object *owner)
+{
+MemoryRegion *child;
+
+assert(mr->owner == NULL || mr->owner == owner);
+mr->owner = owner;
+
+QTAILQ_FOREACH(child, &mr->subregions, subregions_link) {
+if (child->owner == NULL || child->owner == owner) {
+memory_region_set_owner(child, owner);
+}
+}
+}
+
 uint64_t memory_region_size(MemoryRegion *mr)
 {
 if (int128_eq(mr->size, int128_2_64())) {
-- 
1.8.1.4





[Qemu-devel] [PATCH 02/15] memory: add ref/unref

2013-06-02 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 include/exec/memory.h | 30 ++
 memory.c  | 14 ++
 2 files changed, 44 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 457a53c..71df1e6 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -268,6 +268,36 @@ struct MemoryListener {
 void memory_region_init(MemoryRegion *mr,
 const char *name,
 uint64_t size);
+
+/**
+ * memory_region_ref: Add 1 to a memory region's reference count
+ *
+ * Whenever memory regions are accessed outside the BQL, they need to be
+ * preserved against hot-unplug.  MemoryRegions actually do not have their
+ * own reference count; they piggyback on a QOM object, their "owner".
+ * This function adds a reference to the owner.
+ *
+ * All MemoryRegions must have an owner if they can disappear, even if the
+ * device they belong to operates exclusively under the BQL.  This is because
+ * the region could be returned at any time by memory_region_find, and this
+ * is usually under guest control.
+ *
+ * @mr: the #MemoryRegion
+ */
+void memory_region_ref(MemoryRegion *mr);
+
+/**
+ * memory_region_unref: Remove 1 to a memory region's reference count
+ *
+ * Whenever memory regions are accessed outside the BQL, they need to be
+ * preserved against hot-unplug.  MemoryRegions actually do not have their
+ * own reference count; they piggyback on a QOM object, their "owner".
+ * This function removes a reference to the owner and possibly destroys it.
+ *
+ * @mr: the #MemoryRegion
+ */
+void memory_region_unref(MemoryRegion *mr);
+
 /**
  * memory_region_init_io: Initialize an I/O memory region.
  *
diff --git a/memory.c b/memory.c
index e855105..28613d6 100644
--- a/memory.c
+++ b/memory.c
@@ -1113,6 +1113,20 @@ void memory_region_set_owner(MemoryRegion *mr,
 }
 }
 
+void memory_region_ref(MemoryRegion *mr)
+{
+if (mr && mr->owner) {
+object_ref(mr->owner);
+}
+}
+
+void memory_region_unref(MemoryRegion *mr)
+{
+if (mr && mr->owner) {
+object_unref(mr->owner);
+}
+}
+
 uint64_t memory_region_size(MemoryRegion *mr)
 {
 if (int128_eq(mr->size, int128_2_64())) {
-- 
1.8.1.4





[Qemu-devel] [PATCH 12/15] vfio: add memory_region_set_owner calls

2013-06-02 Thread Paolo Bonzini
Cc: Alex Williamson 
Signed-off-by: Paolo Bonzini 
---
 hw/misc/vfio.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index a1f5803..3c0dc9f 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1156,6 +1156,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
 
 memory_region_init_io(&quirk->mem, &vfio_ati_3c3_quirk, quirk,
   "vfio-ati-3c3-quirk", 1);
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem, 3,
 &quirk->mem);
 
@@ -1247,6 +1248,7 @@ static void vfio_probe_ati_4010_quirk(VFIODevice *vdev, 
int nr)
 
 memory_region_init_io(&quirk->mem, &vfio_ati_4010_quirk, quirk,
   "vfio-ati-4010-quirk", 8);
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 
1);
 
 QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1333,6 +1335,7 @@ static void vfio_probe_ati_f10_quirk(VFIODevice *vdev, 
int nr)
 
 memory_region_init_io(&quirk->mem, &vfio_ati_f10_quirk, quirk,
   "vfio-ati-f10-quirk", 8);
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 
1);
 
 QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1453,6 +1456,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIODevice 
*vdev)
 
 memory_region_init_io(&quirk->mem, &vfio_nvidia_3d0_quirk, quirk,
   "vfio-nvidia-3d0-quirk", 6);
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
 0x10, &quirk->mem);
 
@@ -1568,6 +1572,7 @@ static void 
vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr)
 
 memory_region_init_io(&quirk->mem, &vfio_nvidia_bar5_window_quirk, quirk,
   "vfio-nvidia-bar5-window-quirk", 16);
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 
1);
 
 QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1647,6 +1652,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice 
*vdev, int nr)
 memory_region_init_io(&quirk->mem, &vfio_nvidia_bar0_88000_quirk, quirk,
   "vfio-nvidia-bar0-88000-quirk",
   TARGET_PAGE_ALIGN(PCIE_CONFIG_SPACE_SIZE));
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
 0x88000 & TARGET_PAGE_MASK,
 &quirk->mem, 1);
@@ -1726,6 +1732,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice 
*vdev, int nr)
 memory_region_init_io(&quirk->mem, &vfio_nvidia_bar0_1800_quirk, quirk,
   "vfio-nvidia-bar0-1800-quirk",
   TARGET_PAGE_ALIGN(PCI_CONFIG_SPACE_SIZE));
+memory_region_set_owner(&quirk->mem, OBJECT(vdev));
 memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
 0x1800 & TARGET_PAGE_MASK,
 &quirk->mem, 1);
@@ -2237,6 +2244,7 @@ empty_region:
 memory_region_init(submem, name, 0);
 }
 
+memory_region_set_owner(submem, memory_region_owner(mem));
 memory_region_add_subregion(mem, offset, submem);
 
 return ret;
-- 
1.8.1.4





[Qemu-devel] [PATCH 07/15] acpi: add memory_region_set_owner calls

2013-06-02 Thread Paolo Bonzini
ACPI regions are added directly to the I/O address space, without
going through BARs.  Thus they need the owner to be set directly.

Signed-off-by: Paolo Bonzini 
---
 hw/acpi/ich9.c  | 1 +
 hw/acpi/piix4.c | 5 +
 hw/isa/apm.c| 1 +
 3 files changed, 7 insertions(+)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 4a17f32..0b19864 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -223,6 +223,7 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
   8);
 memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi);
 
+memory_region_set_owner(&pm->io, OBJECT(lpc_pci));
 pm->irq = sci_irq;
 qemu_register_reset(pm_reset, pm);
 pm->powerdown_notifier.notify = pm_powerdown_req;
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index c4af1cc..d097592 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -433,6 +433,8 @@ static int piix4_pm_initfn(PCIDevice *dev)
 acpi_pm1_cnt_init(&s->ar, &s->io, s->s4_val);
 acpi_gpe_init(&s->ar, GPE_LEN);
 
+memory_region_set_owner(&s->io, OBJECT(s));
+
 s->powerdown_notifier.notify = piix4_pm_powerdown_req;
 qemu_register_powerdown_notifier(&s->powerdown_notifier);
 
@@ -672,10 +674,12 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion 
*parent,
 {
 memory_region_init_io(&s->io_gpe, &piix4_gpe_ops, s, "apci-gpe0",
   GPE_LEN);
+memory_region_set_owner(&s->io_gpe, OBJECT(s));
 memory_region_add_subregion(parent, GPE_BASE, &s->io_gpe);
 
 memory_region_init_io(&s->io_pci, &piix4_pci_ops, s, "apci-pci-hotplug",
   PCI_HOTPLUG_SIZE);
+memory_region_set_owner(&s->io_pci, OBJECT(s));
 memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR,
 &s->io_pci);
 pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
@@ -683,6 +687,7 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion 
*parent,
 qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu);
 memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug",
   PIIX4_PROC_LEN);
+memory_region_set_owner(&s->io_cpu, OBJECT(s));
 memory_region_add_subregion(parent, PIIX4_PROC_BASE, &s->io_cpu);
 s->cpu_added_notifier.notify = piix4_cpu_added_req;
 qemu_register_cpu_added_notifier(&s->cpu_added_notifier);
diff --git a/hw/isa/apm.c b/hw/isa/apm.c
index 5f21d21..376c564 100644
--- a/hw/isa/apm.c
+++ b/hw/isa/apm.c
@@ -97,6 +97,7 @@ void apm_init(PCIDevice *dev, APMState *apm, 
apm_ctrl_changed_t callback,
 
 /* ioport 0xb2, 0xb3 */
 memory_region_init_io(&apm->io, &apm_ops, apm, "apm-io", 2);
+memory_region_set_owner(&apm->io, OBJECT(dev));
 memory_region_add_subregion(pci_address_space_io(dev), APM_CNT_IOPORT,
 &apm->io);
 }
-- 
1.8.1.4





[Qemu-devel] [PATCH 05/15] pci: set owner for BARs

2013-06-02 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 hw/pci/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fa30110..2456075 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -914,6 +914,8 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 uint64_t wmask;
 pcibus_t size = memory_region_size(memory);
 
+memory_region_set_owner(memory, OBJECT(pci_dev));
+
 assert(region_num >= 0);
 assert(region_num < PCI_NUM_REGIONS);
 if (size & (size-1)) {
-- 
1.8.1.4





[Qemu-devel] [PATCH 04/15] exec: add a reference to the region returned by address_space_translate

2013-06-02 Thread Paolo Bonzini
Once address_space_translate will be callable out of the big QEMU
lock, the returned MemoryRegion will be unprotected as soon as
address_space_translate terminates. Avoid this by adding a reference to
the region, and dropping it in the caller of address_space_translate.

Signed-off-by: Paolo Bonzini 
---
 exec.c| 16 ++--
 include/exec/memory.h |  3 ++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index ba50e8d..bf287cb 100644
--- a/exec.c
+++ b/exec.c
@@ -292,6 +292,7 @@ MemoryRegion *address_space_translate(AddressSpace *as, 
hwaddr addr,
 
 *plen = len;
 *xlat = addr;
+memory_region_ref(mr);
 return mr;
 }
 
@@ -1994,6 +1995,7 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, 
uint8_t *buf,
 memcpy(buf, ptr, l);
 }
 }
+memory_region_unref(mr);
 len -= l;
 buf += l;
 addr += l;
@@ -2159,8 +2161,10 @@ void *address_space_map(AddressSpace *as,
 raddr = memory_region_get_ram_addr(mr) + xlat;
 } else {
 if (memory_region_get_ram_addr(mr) + xlat != raddr + todo) {
+memory_region_unref(mr);
 break;
 }
+memory_region_unref(mr);
 }
 
 len -= l;
@@ -2260,6 +2264,7 @@ static inline uint32_t ldl_phys_internal(hwaddr addr,
 break;
 }
 }
+memory_region_unref(mr);
 return val;
 }
 
@@ -2319,6 +2324,7 @@ static inline uint64_t ldq_phys_internal(hwaddr addr,
 break;
 }
 }
+memory_region_unref(mr);
 return val;
 }
 
@@ -2386,6 +2392,7 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
 break;
 }
 }
+memory_region_unref(mr);
 return val;
 }
 
@@ -2433,6 +2440,7 @@ void stl_phys_notdirty(hwaddr addr, uint32_t val)
 }
 }
 }
+memory_region_unref(mr);
 }
 
 /* warning: addr must be aligned */
@@ -2474,6 +2482,7 @@ static inline void stl_phys_internal(hwaddr addr, 
uint32_t val,
 }
 invalidate_and_set_dirty(addr1, 4);
 }
+memory_region_unref(mr);
 }
 
 void stl_phys(hwaddr addr, uint32_t val)
@@ -2537,6 +2546,7 @@ static inline void stw_phys_internal(hwaddr addr, 
uint32_t val,
 }
 invalidate_and_set_dirty(addr1, 2);
 }
+memory_region_unref(mr);
 }
 
 void stw_phys(hwaddr addr, uint32_t val)
@@ -2626,11 +2636,13 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
 {
 MemoryRegion*mr;
 hwaddr l = 1;
+bool res;
 
 mr = address_space_translate(&address_space_memory,
  phys_addr, &phys_addr, &l, false);
 
-return !(memory_region_is_ram(mr) ||
- memory_region_is_romd(mr));
+res = !(memory_region_is_ram(mr) || memory_region_is_romd(mr));
+memory_region_unref(mr);
+return res;
 }
 #endif
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 71df1e6..d3a2888 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -997,7 +997,8 @@ bool address_space_write(AddressSpace *as, hwaddr addr,
 bool address_space_read(AddressSpace *as, hwaddr addr, uint8_t *buf, int len);
 
 /* address_space_translate: translate an address range into an address space
- * into a MemoryRegion and an address range into that section
+ * into a MemoryRegion and an address range into that section.  Add a reference
+ * to that region.
  *
  * @as: #AddressSpace to be accessed
  * @addr: address within that address space
-- 
1.8.1.4





[Qemu-devel] [PATCH 06/15] sysbus: set owner for MMIO regions

2013-06-02 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 hw/core/sysbus.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index 9004d8c..788696b 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -115,6 +115,8 @@ void sysbus_init_mmio(SysBusDevice *dev, MemoryRegion 
*memory)
 n = dev->num_mmio++;
 dev->mmio[n].addr = -1;
 dev->mmio[n].memory = memory;
+
+memory_region_set_owner(dev->mmio[n].memory, OBJECT(dev));
 }
 
 MemoryRegion *sysbus_mmio_get_region(SysBusDevice *dev, int n)
-- 
1.8.1.4





[Qemu-devel] [PATCH 03/15] memory: add ref/unref calls

2013-06-02 Thread Paolo Bonzini
Add ref/unref calls at the following places:

- places where memory regions are stashed by a listener and
  used outside the BQL (including in Xen or KVM).

- memory_region_find callsites

Signed-off-by: Paolo Bonzini 
---
 exec.c|  6 +-
 hw/core/loader.c  |  1 +
 hw/display/exynos4210_fimd.c  |  6 ++
 hw/display/framebuffer.c  | 12 +++-
 hw/i386/kvmvapic.c|  1 +
 hw/misc/vfio.c|  2 ++
 hw/virtio/dataplane/hostmem.c |  7 +++
 hw/virtio/vhost.c |  2 ++
 hw/virtio/virtio-balloon.c|  1 +
 hw/xen/xen_pt.c   |  4 
 include/hw/virtio/dataplane/hostmem.h |  1 +
 kvm-all.c |  2 ++
 memory.c  | 16 
 target-arm/kvm.c  |  2 ++
 target-sparc/mmu_helper.c |  1 +
 xen-all.c |  2 ++
 16 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 8909478..ba50e8d 100644
--- a/exec.c
+++ b/exec.c
@@ -815,12 +815,16 @@ static uint16_t phys_section_add(MemoryRegionSection 
*section)
 phys_sections_nb_alloc);
 }
 phys_sections[phys_sections_nb] = *section;
+memory_region_ref(section->mr);
 return phys_sections_nb++;
 }
 
 static void phys_sections_clear(void)
 {
-phys_sections_nb = 0;
+while (phys_sections_nb > 0) {
+MemoryRegionSection *section = &phys_sections[--phys_sections_nb];
+memory_region_unref(section->mr);
+}
 }
 
 static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection 
*section)
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 3a60cbe..44d8714 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -727,6 +727,7 @@ int rom_load_all(void)
 addr += rom->romsize;
 section = memory_region_find(get_system_memory(), rom->addr, 1);
 rom->isrom = int128_nz(section.size) && 
memory_region_is_rom(section.mr);
+memory_region_unref(section.mr);
 }
 qemu_register_reset(rom_reset, NULL);
 roms_loaded = 1;
diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index 0da00a9..f44c4a6 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -1126,6 +1126,11 @@ static void 
fimd_update_memory_section(Exynos4210fimdState *s, unsigned win)
 /* Total number of bytes of virtual screen used by current window */
 w->fb_len = fb_mapped_len = (w->virtpage_width + w->virtpage_offsize) *
 (w->rightbot_y - w->lefttop_y + 1);
+
+/* TODO: add .exit and unref the region there.  Not needed yet since sysbus
+ * does not support hot-unplug.
+ */
+memory_region_unref(w->mem_section.mr);
 w->mem_section = memory_region_find(sysbus_address_space(&s->busdev),
 fb_start_addr, w->fb_len);
 assert(w->mem_section.mr);
@@ -1154,6 +1159,7 @@ static void 
fimd_update_memory_section(Exynos4210fimdState *s, unsigned win)
 return;
 
 error_return:
+memory_region_unref(w->mem_section.mr);
 w->mem_section.mr = NULL;
 w->mem_section.size = int128_zero();
 w->host_fb_addr = NULL;
diff --git a/hw/display/framebuffer.c b/hw/display/framebuffer.c
index 49c9e59..4546e42 100644
--- a/hw/display/framebuffer.c
+++ b/hw/display/framebuffer.c
@@ -54,11 +54,11 @@ void framebuffer_update_display(
 src_len = src_width * rows;
 
 mem_section = memory_region_find(address_space, base, src_len);
+mem = mem_section.mr;
 if (int128_get64(mem_section.size) != src_len ||
 !memory_region_is_ram(mem_section.mr)) {
-return;
+goto out;
 }
-mem = mem_section.mr;
 assert(mem);
 assert(mem_section.offset_within_address_space == base);
 
@@ -68,10 +68,10 @@ void framebuffer_update_display(
but it's not really worth it as dirty flag tracking will probably
already have failed above.  */
 if (!src_base)
-return;
+goto out;
 if (src_len != src_width * rows) {
 cpu_physical_memory_unmap(src_base, src_len, 0, 0);
-return;
+goto out;
 }
 src = src_base;
 dest = surface_data(ds);
@@ -102,10 +102,12 @@ void framebuffer_update_display(
 }
 cpu_physical_memory_unmap(src_base, src_len, 0, 0);
 if (first < 0) {
-return;
+goto out;
 }
 memory_region_reset_dirty(mem, mem_section.offset_within_region, src_len,
   DIRTY_MEMORY_VGA);
 *first_row = first;
 *last_row = last;
+out:
+memory_region_unref(mem);
 }
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 655483b..e375c1c 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -605,6 +605,7 @@ static void vapic_map_rom_writable(VAPICROMState *s)
  rom_size);
 memory_region_add_subregion_overlap(as, ro

[Qemu-devel] [PATCH 09/15] isa/portio: allow setting an owner

2013-06-02 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 hw/isa/isa-bus.c  |  2 ++
 include/exec/ioport.h |  3 +++
 ioport.c  | 10 ++
 3 files changed, 15 insertions(+)

diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index 7860b17..d263d0f 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -100,6 +100,7 @@ static inline void isa_init_ioport(ISADevice *dev, uint16_t 
ioport)
 
 void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 {
+memory_region_set_owner(io, OBJECT(dev));
 memory_region_add_subregion(isabus->address_space_io, start, io);
 isa_init_ioport(dev, start);
 }
@@ -116,6 +117,7 @@ void isa_register_portio_list(ISADevice *dev, uint16_t 
start,
 isa_init_ioport(dev, start);
 
 portio_list_init(piolist, pio_start, opaque, name);
+portio_list_set_owner(piolist, OBJECT(dev));
 portio_list_add(piolist, isabus->address_space_io, start);
 }
 
diff --git a/include/exec/ioport.h b/include/exec/ioport.h
index fc28350..5fe0d99 100644
--- a/include/exec/ioport.h
+++ b/include/exec/ioport.h
@@ -62,6 +62,7 @@ typedef struct PortioList {
 unsigned nr;
 struct MemoryRegion **regions;
 struct MemoryRegion **aliases;
+struct Object *owner;
 void *opaque;
 const char *name;
 } PortioList;
@@ -69,6 +70,8 @@ typedef struct PortioList {
 void portio_list_init(PortioList *piolist,
   const struct MemoryRegionPortio *callbacks,
   void *opaque, const char *name);
+void portio_list_set_owner(PortioList *piolist,
+   struct Object *owner);
 void portio_list_destroy(PortioList *piolist);
 void portio_list_add(PortioList *piolist,
  struct MemoryRegion *address_space,
diff --git a/ioport.c b/ioport.c
index a0ac2a0..1cccd70 100644
--- a/ioport.c
+++ b/ioport.c
@@ -347,6 +347,12 @@ void portio_list_init(PortioList *piolist,
 piolist->address_space = NULL;
 piolist->opaque = opaque;
 piolist->name = name;
+piolist->owner = NULL;
+}
+
+void portio_list_set_owner(PortioList *piolist, Object *owner)
+{
+piolist->owner = owner;
 }
 
 void portio_list_destroy(PortioList *piolist)
@@ -386,8 +392,12 @@ static void portio_list_add_1(PortioList *piolist,
  */
 memory_region_init_io(region, ops, piolist->opaque, piolist->name,
   INT64_MAX);
+memory_region_set_owner(region, piolist->owner);
+
 memory_region_init_alias(alias, piolist->name,
  region, start + off_low, off_high - off_low);
+memory_region_set_owner(alias, piolist->owner);
+
 memory_region_add_subregion(piolist->address_space,
 start + off_low, alias);
 piolist->regions[piolist->nr] = region;
-- 
1.8.1.4





[Qemu-devel] [PATCH 08/15] misc: add memory_region_set_owner calls

2013-06-02 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 hw/char/serial-pci.c | 1 +
 hw/misc/pc-testdev.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index 2138e35..6b6106b 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -106,6 +106,7 @@ static int multi_serial_pci_init(PCIDevice *dev)
 s->irq = pci->irqs[i];
 pci->name[i] = g_strdup_printf("uart #%d", i+1);
 memory_region_init_io(&s->io, &serial_io_ops, s, pci->name[i], 8);
+memory_region_set_owner(&s->io, OBJECT(pci));
 memory_region_add_subregion(&pci->iobar, 8 * i, &s->io);
 }
 return 0;
diff --git a/hw/misc/pc-testdev.c b/hw/misc/pc-testdev.c
index 32df175..77998d6 100644
--- a/hw/misc/pc-testdev.c
+++ b/hw/misc/pc-testdev.c
@@ -150,12 +150,19 @@ static int init_test_device(ISADevice *isa)
 
 memory_region_init_io(&dev->ioport, &test_ioport_ops, dev,
   "pc-testdev-ioport", 4);
+memory_region_set_owner(&dev->ioport, OBJECT(dev));
+
 memory_region_init_io(&dev->flush, &test_flush_ops, dev,
   "pc-testdev-flush-page", 4);
+memory_region_set_owner(&dev->flush, OBJECT(dev));
+
 memory_region_init_io(&dev->irq, &test_irq_ops, dev,
   "pc-testdev-irq-line", 24);
+memory_region_set_owner(&dev->irq, OBJECT(dev));
+
 memory_region_init_io(&dev->iomem, &test_iomem_ops, dev,
   "pc-testdev-iomem", IOMEM_LEN);
+memory_region_set_owner(&dev->iomem, OBJECT(dev));
 
 memory_region_add_subregion(io,  0xe0,   &dev->ioport);
 memory_region_add_subregion(io,  0xe4,   &dev->flush);
-- 
1.8.1.4





[Qemu-devel] [PATCH 10/15] vga: add memory_region_set_owner calls

2013-06-02 Thread Paolo Bonzini
Cc: Gerd Hoffman 
Signed-off-by: Paolo Bonzini 
---
 hw/display/cirrus_vga.c | 19 ++-
 hw/display/qxl.c|  6 --
 hw/display/vga-isa-mm.c |  2 +-
 hw/display/vga-isa.c|  4 ++--
 hw/display/vga-pci.c|  5 +++--
 hw/display/vga.c| 19 ++-
 hw/display/vga_int.h|  9 +
 hw/display/vmware_vga.c |  4 ++--
 8 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
index 64bfe2b..ffc4dd4 100644
--- a/hw/display/cirrus_vga.c
+++ b/hw/display/cirrus_vga.c
@@ -2806,12 +2806,14 @@ static const MemoryRegionOps cirrus_vga_io_ops = {
 },
 };
 
-static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci,
+static void cirrus_init_common(CirrusVGAState * s, int device_id,
+   DeviceState *owner,
MemoryRegion *system_memory,
MemoryRegion *system_io)
 {
 int i;
 static int inited;
+int is_pci = !!object_dynamic_cast(OBJECT(owner), TYPE_PCI_DEVICE);
 
 if (!inited) {
 inited = 1;
@@ -2843,19 +2845,23 @@ static void cirrus_init_common(CirrusVGAState * s, int 
device_id, int is_pci,
 /* Register ioport 0x3b0 - 0x3df */
 memory_region_init_io(&s->cirrus_vga_io, &cirrus_vga_io_ops, s,
   "cirrus-io", 0x30);
+memory_region_set_owner(&s->cirrus_vga_io, OBJECT(owner));
 memory_region_add_subregion(system_io, 0x3b0, &s->cirrus_vga_io);
 
 memory_region_init(&s->low_mem_container,
"cirrus-lowmem-container",
0x2);
+memory_region_set_owner(&s->low_mem_container, OBJECT(owner));
 
 memory_region_init_io(&s->low_mem, &cirrus_vga_mem_ops, s,
   "cirrus-low-memory", 0x2);
+memory_region_set_owner(&s->low_mem, OBJECT(owner));
 memory_region_add_subregion(&s->low_mem_container, 0, &s->low_mem);
 for (i = 0; i < 2; ++i) {
 static const char *names[] = { "vga.bank0", "vga.bank1" };
 MemoryRegion *bank = &s->cirrus_bank[i];
 memory_region_init_alias(bank, names[i], &s->vga.vram, 0, 0x8000);
+memory_region_set_owner(bank, OBJECT(owner));
 memory_region_set_enabled(bank, false);
 memory_region_add_subregion_overlap(&s->low_mem_container, i * 0x8000,
 bank, 1);
@@ -2870,6 +2876,7 @@ static void cirrus_init_common(CirrusVGAState * s, int 
device_id, int is_pci,
 memory_region_init_io(&s->cirrus_linear_io, &cirrus_linear_io_ops, s,
   "cirrus-linear-io", s->vga.vram_size_mb
   * 1024 * 1024);
+memory_region_set_owner(&s->cirrus_linear_io, OBJECT(owner));
 memory_region_set_flush_coalesced(&s->cirrus_linear_io);
 
 /* I/O handler for LFB */
@@ -2878,11 +2885,13 @@ static void cirrus_init_common(CirrusVGAState * s, int 
device_id, int is_pci,
   s,
   "cirrus-bitblt-mmio",
   0x40);
+memory_region_set_owner(&s->cirrus_linear_bitblt_io, OBJECT(owner));
 memory_region_set_flush_coalesced(&s->cirrus_linear_bitblt_io);
 
 /* I/O handler for memory-mapped I/O */
 memory_region_init_io(&s->cirrus_mmio_io, &cirrus_mmio_io_ops, s,
   "cirrus-mmio", CIRRUS_PNPMMIO_SIZE);
+memory_region_set_owner(&s->cirrus_mmio_io, OBJECT(owner));
 memory_region_set_flush_coalesced(&s->cirrus_mmio_io);
 
 s->real_vram_size =
@@ -2912,8 +2921,8 @@ static int vga_initfn(ISADevice *dev)
 ISACirrusVGAState *d = ISA_CIRRUS_VGA(dev);
 VGACommonState *s = &d->cirrus_vga.vga;
 
-vga_common_init(s);
-cirrus_init_common(&d->cirrus_vga, CIRRUS_ID_CLGD5430, 0,
+vga_common_init(s, DEVICE(dev));
+cirrus_init_common(&d->cirrus_vga, CIRRUS_ID_CLGD5430, DEVICE(d),
isa_address_space(dev), isa_address_space_io(dev));
 s->con = graphic_console_init(DEVICE(dev), s->hw_ops, s);
 rom_add_vga(VGABIOS_CIRRUS_FILENAME);
@@ -2959,8 +2968,8 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev)
  int16_t device_id = pc->device_id;
 
  /* setup VGA */
- vga_common_init(&s->vga);
- cirrus_init_common(s, device_id, 1, pci_address_space(dev),
+ vga_common_init(&s->vga, DEVICE(dev));
+ cirrus_init_common(s, device_id, DEVICE(dev), pci_address_space(dev),
 pci_address_space_io(dev));
  s->vga.con = graphic_console_init(DEVICE(dev), s->vga.hw_ops, &s->vga);
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index c475cb1..3b6cc85 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2066,9 +2066,11 @@ static int qxl_init_primary(PCIDevice *dev)
 qxl->id = 0;
 qxl_init_ramsize(qxl);
 vga->vram_size_mb = qxl->vga.vram_size >> 20;
-vga_common_init(vga);
-vga_init(vga, pci_address_space(dev), pc

[Qemu-devel] [PATCH 14/15] memory: return MemoryRegion from qemu_ram_addr_from_host

2013-06-02 Thread Paolo Bonzini
It will be needed in the next patch.

Signed-off-by: Paolo Bonzini 
---
 exec.c| 35 +--
 include/exec/cpu-common.h |  2 +-
 target-i386/kvm.c |  4 ++--
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index 9fd4c90..39324ba 100644
--- a/exec.c
+++ b/exec.c
@@ -1329,15 +1329,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-/* Return a host pointer to ram allocated with qemu_ram_alloc.
-   With the exception of the softmmu code in this file, this should
-   only be used for local memory (e.g. video ram) that the device owns,
-   and knows it isn't going to access beyond the end of the block.
-
-   It should not be used for general purpose DMA.
-   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
- */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+static RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 {
 RAMBlock *block;
 
@@ -1357,6 +1349,21 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 
 found:
 ram_list.mru_block = block;
+return block;
+}
+
+/* Return a host pointer to ram allocated with qemu_ram_alloc.
+ * With the exception of the softmmu code in this file, this should
+ * only be used for local memory (e.g. video ram) that the device owns,
+ * and knows it isn't going to access beyond the end of the block.
+ *
+ * It should not be used for general purpose DMA.
+ * Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
+ */
+void *qemu_get_ram_ptr(ram_addr_t addr)
+{
+RAMBlock *block = qemu_get_ram_block(addr);
+
 if (xen_enabled()) {
 /* We need to check if the requested address is in the RAM
  * because we don't want to map the entire memory in QEMU.
@@ -1431,14 +1438,14 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, 
ram_addr_t *size)
 }
 }
 
-int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 {
 RAMBlock *block;
 uint8_t *host = ptr;
 
 if (xen_enabled()) {
 *ram_addr = xen_ram_addr_from_mapcache(ptr);
-return 0;
+return qemu_get_ram_block(*ram_addr)->mr;
 }
 
 block = ram_list.mru_block;
@@ -1456,11 +1463,11 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t 
*ram_addr)
 }
 }
 
-return -1;
+return NULL;
 
 found:
 *ram_addr = block->offset + (host - block->host);
-return 0;
+return block->mr;
 }
 
 /* Some of the softmmu routines need to translate from a host pointer
@@ -1469,7 +1476,7 @@ ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
 {
 ram_addr_t ram_addr;
 
-if (qemu_ram_addr_from_host(ptr, &ram_addr)) {
+if (qemu_ram_addr_from_host(ptr, &ram_addr) == NULL) {
 fprintf(stderr, "Bad ram pointer %p\n", ptr);
 abort();
 }
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index e061e21..2db580a 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -50,7 +50,7 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
-int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
 ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9ffb6ca..7ba98cd 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -318,7 +318,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 
 if ((env->mcg_cap & MCG_SER_P) && addr
 && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
-if (qemu_ram_addr_from_host(addr, &ram_addr) ||
+if (qemu_ram_addr_from_host(addr, &ram_addr) == NULL ||
 !kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
 fprintf(stderr, "Hardware memory error for memory used by "
 "QEMU itself instead of guest system!\n");
@@ -350,7 +350,7 @@ int kvm_arch_on_sigbus(int code, void *addr)
 hwaddr paddr;
 
 /* Hope we are lucky for AO MCE */
-if (qemu_ram_addr_from_host(addr, &ram_addr) ||
+if (qemu_ram_addr_from_host(addr, &ram_addr) == NULL ||
 !kvm_physical_memory_addr_from_host(CPU(first_cpu)->kvm_state,
 addr, &paddr)) {
 fprintf(stderr, "Hardware memory error for memory used by "
-- 
1.8.1.4





[Qemu-devel] [PATCH 11/15] pci-assign: add memory_region_set_owner calls

2013-06-02 Thread Paolo Bonzini
Cc: Alex Williamson 
Signed-off-by: Paolo Bonzini 
---
 hw/i386/kvm/pci-assign.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index ff85590..4b1c2d9 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -300,6 +300,7 @@ static void assigned_dev_iomem_setup(PCIDevice *pci_dev, 
int region_num,
 if (e_size > 0) {
 memory_region_init(®ion->container, "assigned-dev-container",
e_size);
+memory_region_set_owner(®ion->container, OBJECT(pci_dev));
 memory_region_add_subregion(®ion->container, 0, 
®ion->real_iomem);
 
 /* deal with MSI-X MMIO page */
@@ -330,9 +331,12 @@ static void assigned_dev_ioport_setup(PCIDevice *pci_dev, 
int region_num,
 
 region->e_size = size;
 memory_region_init(®ion->container, "assigned-dev-container", size);
+memory_region_set_owner(®ion->container, OBJECT(pci_dev));
+
 memory_region_init_io(®ion->real_iomem, &assigned_dev_ioport_ops,
   r_dev->v_addrs + region_num,
   "assigned-dev-iomem", size);
+memory_region_set_owner(®ion->real_iomem, OBJECT(pci_dev));
 memory_region_add_subregion(®ion->container, 0, ®ion->real_iomem);
 }
 
@@ -482,6 +486,8 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
   &slow_bar_ops, &pci_dev->v_addrs[i],
   "assigned-dev-slow-bar",
   cur_region->size);
+memory_region_set_owner(&pci_dev->v_addrs[i].real_iomem,
+OBJECT(pci_dev));
 } else {
 void *virtbase = pci_dev->v_addrs[i].u.r_virtbase;
 char name[32];
@@ -490,6 +496,9 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 memory_region_init_ram_ptr(&pci_dev->v_addrs[i].real_iomem,
name, cur_region->size,
virtbase);
+memory_region_set_owner(&pci_dev->v_addrs[i].real_iomem,
+OBJECT(pci_dev));
+
 vmstate_register_ram(&pci_dev->v_addrs[i].real_iomem,
  &pci_dev->dev.qdev);
 }
@@ -1651,6 +1660,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice 
*dev)
 
 memory_region_init_io(&dev->mmio, &assigned_dev_msix_mmio_ops, dev,
   "assigned-dev-msix", MSIX_PAGE_SIZE);
+memory_region_set_owner(&dev->mmio, OBJECT(dev));
 return 0;
 }
 
@@ -1916,6 +1926,7 @@ static void assigned_dev_load_option_rom(AssignedDevice 
*dev)
 snprintf(name, sizeof(name), "%s.rom",
 object_get_typename(OBJECT(dev)));
 memory_region_init_ram(&dev->dev.rom, name, st.st_size);
+memory_region_set_owner(&dev->dev.rom, OBJECT(dev));
 vmstate_register_ram(&dev->dev.rom, &dev->dev.qdev);
 ptr = memory_region_get_ram_ptr(&dev->dev.rom);
 memset(ptr, 0xff, st.st_size);
-- 
1.8.1.4





[Qemu-devel] [PATCH 13/15] exec: check MRU in qemu_ram_addr_from_host

2013-06-02 Thread Paolo Bonzini
This function is not used outside the iothread mutex, so it
can use ram_list.mru_block.

Signed-off-by: Paolo Bonzini 
---
 exec.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index bf287cb..9fd4c90 100644
--- a/exec.c
+++ b/exec.c
@@ -1441,18 +1441,26 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t 
*ram_addr)
 return 0;
 }
 
+block = ram_list.mru_block;
+if (block && block->host && host - block->host < block->length) {
+goto found;
+}
+
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
 /* This case append when the block is not mapped. */
 if (block->host == NULL) {
 continue;
 }
 if (host - block->host < block->length) {
-*ram_addr = block->offset + (host - block->host);
-return 0;
+goto found;
 }
 }
 
 return -1;
+
+found:
+*ram_addr = block->offset + (host - block->host);
+return 0;
 }
 
 /* Some of the softmmu routines need to translate from a host pointer
-- 
1.8.1.4





Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-02 Thread Gleb Natapov
On Sun, Jun 02, 2013 at 06:09:50PM +0300, Michael S. Tsirkin wrote:
> On Sun, Jun 02, 2013 at 06:05:42PM +0300, Gleb Natapov wrote:
> > On Wed, May 29, 2013 at 11:45:44AM +0300, Michael S. Tsirkin wrote:
> > > On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
> > > > On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
> > > > > Juan is not available now, and Anthony asked for
> > > > > agenda to be sent early.
> > > > > So here comes:
> > > > > 
> > > > > Agenda for the meeting Tue, May 28:
> > > > > 
> > > > > - Generating acpi tables
> > > > 
> > > > I didn't see any meeting notes, but I thought it would be worthwhile
> > > > to summarize the call.  This is from memory so correct me if I got
> > > > anything wrong.
> > > > 
> > > > Anthony believes that the generation of ACPI tables is the task of the
> > > > firmware.  Reasons cited include security implications of running more
> > > > code in qemu vs the guest context, complexities in running iasl on
> > > > big-endian machines, possible complexity of having to regenerate
> > > > tables on a vm reboot, overall sloppiness of doing it in QEMU.  Raised
> > > > that QOM interface should be sufficient.
> > > > 
> > > > Kevin believes that the bios table code should be moved up into QEMU.
> > > > Reasons cited include the churn rate in SeaBIOS for this QEMU feature
> > > > (15-20% of all SeaBIOS commits since integrating with QEMU have been
> > > > for bios tables; 20% of SeaBIOS commits in last year), complexity of
> > > > trying to pass all the content needed to generate the tables (eg,
> > > > device details, power tree, irq routing), complexity of scheduling
> > > > changes across different repos and synchronizing their rollout,
> > > > complexity of implemeting the code in both OVMF and SeaBIOS.  Kevin
> > > > wasn't aware of a requirement to regenerate acpi tables on a vm
> > > > reboot.
> > > 
> > > I think this last one is based on a misunderstanding: it's based
> > > on assumption that we we change hardware by hotplug
> > > we should regenerate the tables to match.
> > > But there's no management that can take advantage of
> > > this.
> > > Two possible reasonable things we can tell management:
> > > - hotplug for device XXX is not supported: restart qemu
> > >   to make guest use the device
> > > - hotplug for device XXX is supported
> > > 
> > > What is proposed here instead is a third option:
> > > - hotplug is supported but device is not functional.
> > >   reboot guest to make it fully functional
> > > 
> > > This will naturally lead to requirement to regenerate tables on reset.
> > > 
> > > And this is what would happen with guest-generated
> > > tables, and I consider this a bug, not a feature.
> > > 
> > +1. This will probably break guest resume too.
> > 
> > > If you really wanted to update tables dynamically, without restarting
> > > qemu, don't stop there, add an interface for guest to update them
> > > without reset. I think that's over-endineering and a
> > > requirement that's best avoided.
> > > 
> > > 
> > > > There were discussions on potentially introducing a middle component
> > > > to generate the tables.  Coreboot was raised as a possibility, and
> > > > David thought it would be okay to use coreboot for both OVMF and
> > > > SeaBIOS.  The possibility was also raised of a "rom" that lives in the
> > > > qemu repo, is run in the guest, and generates the tables (which is
> > > > similar to the hvmloader approach that Xen uses).
> > > > 
> > > > Anthony requested that patches be made that generate the ACPI tables
> > > > in QEMU for the upcoming hotplug work, so that they could be evaluated
> > > > to see if they truly do need to live in QEMU or if the code could live
> > > > in the firmware.  There were no objections.
> > > > 
> > > > -Kevin
> > > 
> > > I volunteered to implement this.
> > Why hotplug should generate ACPI code? It does not do so on real HW.
> 
> Hotplug should not generate ACPI code.
> What is meant here is adding ACPI code to support hotplug
> of devices behind a PCI to PCI bridge.
> 
Ah, OK. This one does not change on reset.

--
Gleb.



[Qemu-devel] [PATCH 15/15] memory: ref/unref memory across address_space_map/unmap

2013-06-02 Thread Paolo Bonzini
The iothread mutex might be released between map and unmap, so the
mapped region might disappear.

Signed-off-by: Paolo Bonzini 
---
 exec.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 39324ba..538aa9e 100644
--- a/exec.c
+++ b/exec.c
@@ -2068,6 +2068,7 @@ void cpu_physical_memory_write_rom(hwaddr addr,
 }
 
 typedef struct {
+MemoryRegion *mr;
 void *buffer;
 hwaddr addr;
 hwaddr len;
@@ -2165,15 +2166,18 @@ void *address_space_map(AddressSpace *as,
 bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
 bounce.addr = addr;
 bounce.len = l;
+bounce.mr = mr;
 if (!is_write) {
 address_space_read(as, addr, bounce.buffer, l);
 }
 
 *plen = l;
+memory_region_ref(mr);
 return bounce.buffer;
 }
 if (!todo) {
 raddr = memory_region_get_ram_addr(mr) + xlat;
+memory_region_ref(mr);
 } else {
 if (memory_region_get_ram_addr(mr) + xlat != raddr + todo) {
 memory_region_unref(mr);
@@ -2200,8 +2204,12 @@ void address_space_unmap(AddressSpace *as, void *buffer, 
hwaddr len,
  int is_write, hwaddr access_len)
 {
 if (buffer != bounce.buffer) {
+MemoryRegion *mr;
+ram_addr_t addr1;
+
+mr = qemu_ram_addr_from_host(buffer, &addr1);
+assert(mr);
 if (is_write) {
-ram_addr_t addr1 = qemu_ram_addr_from_host_nofail(buffer);
 while (access_len) {
 unsigned l;
 l = TARGET_PAGE_SIZE;
@@ -2215,6 +2223,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, 
hwaddr len,
 if (xen_enabled()) {
 xen_invalidate_map_cache_entry(buffer);
 }
+memory_region_unref(mr);
 return;
 }
 if (is_write) {
@@ -,6 +2231,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, 
hwaddr len,
 }
 qemu_vfree(bounce.buffer);
 bounce.buffer = NULL;
+memory_region_unref(bounce.mr);
 cpu_notify_map_clients();
 }
 
-- 
1.8.1.4




Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-02 Thread Michael S. Tsirkin
On Sun, Jun 02, 2013 at 06:40:43PM +0300, Gleb Natapov wrote:
> On Sun, Jun 02, 2013 at 06:09:50PM +0300, Michael S. Tsirkin wrote:
> > On Sun, Jun 02, 2013 at 06:05:42PM +0300, Gleb Natapov wrote:
> > > On Wed, May 29, 2013 at 11:45:44AM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
> > > > > On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
> > > > > > Juan is not available now, and Anthony asked for
> > > > > > agenda to be sent early.
> > > > > > So here comes:
> > > > > > 
> > > > > > Agenda for the meeting Tue, May 28:
> > > > > > 
> > > > > > - Generating acpi tables
> > > > > 
> > > > > I didn't see any meeting notes, but I thought it would be worthwhile
> > > > > to summarize the call.  This is from memory so correct me if I got
> > > > > anything wrong.
> > > > > 
> > > > > Anthony believes that the generation of ACPI tables is the task of the
> > > > > firmware.  Reasons cited include security implications of running more
> > > > > code in qemu vs the guest context, complexities in running iasl on
> > > > > big-endian machines, possible complexity of having to regenerate
> > > > > tables on a vm reboot, overall sloppiness of doing it in QEMU.  Raised
> > > > > that QOM interface should be sufficient.
> > > > > 
> > > > > Kevin believes that the bios table code should be moved up into QEMU.
> > > > > Reasons cited include the churn rate in SeaBIOS for this QEMU feature
> > > > > (15-20% of all SeaBIOS commits since integrating with QEMU have been
> > > > > for bios tables; 20% of SeaBIOS commits in last year), complexity of
> > > > > trying to pass all the content needed to generate the tables (eg,
> > > > > device details, power tree, irq routing), complexity of scheduling
> > > > > changes across different repos and synchronizing their rollout,
> > > > > complexity of implemeting the code in both OVMF and SeaBIOS.  Kevin
> > > > > wasn't aware of a requirement to regenerate acpi tables on a vm
> > > > > reboot.
> > > > 
> > > > I think this last one is based on a misunderstanding: it's based
> > > > on assumption that we we change hardware by hotplug
> > > > we should regenerate the tables to match.
> > > > But there's no management that can take advantage of
> > > > this.
> > > > Two possible reasonable things we can tell management:
> > > > - hotplug for device XXX is not supported: restart qemu
> > > >   to make guest use the device
> > > > - hotplug for device XXX is supported
> > > > 
> > > > What is proposed here instead is a third option:
> > > > - hotplug is supported but device is not functional.
> > > >   reboot guest to make it fully functional
> > > > 
> > > > This will naturally lead to requirement to regenerate tables on reset.
> > > > 
> > > > And this is what would happen with guest-generated
> > > > tables, and I consider this a bug, not a feature.
> > > > 
> > > +1. This will probably break guest resume too.
> > > 
> > > > If you really wanted to update tables dynamically, without restarting
> > > > qemu, don't stop there, add an interface for guest to update them
> > > > without reset. I think that's over-endineering and a
> > > > requirement that's best avoided.
> > > > 
> > > > 
> > > > > There were discussions on potentially introducing a middle component
> > > > > to generate the tables.  Coreboot was raised as a possibility, and
> > > > > David thought it would be okay to use coreboot for both OVMF and
> > > > > SeaBIOS.  The possibility was also raised of a "rom" that lives in the
> > > > > qemu repo, is run in the guest, and generates the tables (which is
> > > > > similar to the hvmloader approach that Xen uses).
> > > > > 
> > > > > Anthony requested that patches be made that generate the ACPI tables
> > > > > in QEMU for the upcoming hotplug work, so that they could be evaluated
> > > > > to see if they truly do need to live in QEMU or if the code could live
> > > > > in the firmware.  There were no objections.
> > > > > 
> > > > > -Kevin
> > > > 
> > > > I volunteered to implement this.
> > > Why hotplug should generate ACPI code? It does not do so on real HW.
> > 
> > Hotplug should not generate ACPI code.
> > What is meant here is adding ACPI code to support hotplug
> > of devices behind a PCI to PCI bridge.
> > 
> Ah, OK. This one does not change on reset.

It wouldn't if QEMU generates it.
With bios generating the tables it might depending
on how it's implemented.
To make it not change across resets we'd need
an interface in QEMU to tell guest whether a
device was added since QEMU start.

> --
>   Gleb.



Re: [Qemu-devel] [PATCH 02/15] memory: add ref/unref

2013-06-02 Thread Peter Maydell
On 2 June 2013 16:43, Paolo Bonzini  wrote:
> +/**
> + * memory_region_ref: Add 1 to a memory region's reference count
> + *
> + * Whenever memory regions are accessed outside the BQL, they need to be
> + * preserved against hot-unplug.  MemoryRegions actually do not have their
> + * own reference count; they piggyback on a QOM object, their "owner".
> + * This function adds a reference to the owner.

It doesn't make much sense to describe the function as
"add 1 to a memory region's reference count" and then
say that memory regions don't have reference counts...

-- PMM



Re: [Qemu-devel] [PATCH 14/15] memory: return MemoryRegion from qemu_ram_addr_from_host

2013-06-02 Thread Peter Maydell
On 2 June 2013 16:43, Paolo Bonzini  wrote:
> -int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
> +MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
>  ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);

This is weird, because now the _nofail and the standard
versions of this function return different things. Why
wouldn't a caller of the _nofail version potentially
need the MemoryRegion* too?

-- PMM



Re: [Qemu-devel] [PATCH 00/15] Memory/IOMMU patches part 4: region ownership

2013-06-02 Thread Peter Maydell
On 2 June 2013 16:43, Paolo Bonzini  wrote:
> Reference counting the region piggybacks on reference counting of a QOM
> object, the "owner" of the region.  The owner API is designed so that
> it will be called as little as possible.  Unowned subregions will get a
> region if memory_region_set_owner is called after the subregion is added.
> This is in general the common case already; often setting the owner can
> be delegated to a bus-specific API that already takes a DeviceState
> (for example pci_register_bar or sysbus_init_mmio).

This feels a bit fragile to me -- there doesn't seem to be
a clear rule for who has to set the owner of a region or
when they have to do it, or for ensuring that it doesn't
get forgotten altogether.

What happens if I take a MemoryRegion* that another device
has exposed to me as a sysbus mmio region (and so claimed
ownership of) and pass it to pci_register_bar()? Who owns
it at that point? [That's a legitimate thing to do, I think,
though I don't suppose anybody does it at the moment.
Sysbus MMIOs aren't only for mapping in the system address
space, they're a general way for one device to expose a
MemoryRegion * for use by another device.]

thanks
-- PMM



Re: [Qemu-devel] [PATCH] tcx: Fix 24-bit display mode

2013-06-02 Thread Mark Cave-Ayland

On 01/06/13 21:59, Blue Swirl wrote:


On Sat, Jun 1, 2013 at 11:41 AM, Mark Cave-Ayland
  wrote:

Commit d08151bf (conversion of tcx to the memory API) broke the 24-bit mode of
the tcx display adapter by accidentally passing in the final address of the
dirty region to memory_region_reset_dirty() instead of its size.

Signed-off-by: Mark Cave-Ayland


Patch looks OK, but please wrap the lines:
WARNING: line over 80 characters
#81: FILE: hw/display/tcx.c:196:
+  page_min, (page_max - page_min) +
TARGET_PAGE_SIZE,

WARNING: line over 80 characters
#100: FILE: hw/display/tcx.c:288:
+  page_min, (page_max - page_min) +
TARGET_PAGE_SIZE,

total: 0 errors, 2 warnings, 26 lines checked


Thanks for the pointers - I'll resubmit a tidied-up v2 patch shortly.


ATB,

Mark.



[Qemu-devel] [PATCHv2] tcx: Fix 24-bit display mode

2013-06-02 Thread Mark Cave-Ayland
Commit d08151bf (conversion of tcx to the memory API) broke the 24-bit mode of
the tcx display adapter by accidentally passing in the final address of the
dirty region to memory_region_reset_dirty() instead of its size.

Signed-off-by: Mark Cave-Ayland 
---
 hw/display/tcx.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/display/tcx.c b/hw/display/tcx.c
index fc27f45..995641c 100644
--- a/hw/display/tcx.c
+++ b/hw/display/tcx.c
@@ -193,15 +193,16 @@ static inline void reset_dirty(TCXState *ts, ram_addr_t 
page_min,
   ram_addr_t cpage)
 {
 memory_region_reset_dirty(&ts->vram_mem,
-  page_min, page_max + TARGET_PAGE_SIZE,
+  page_min,
+  (page_max - page_min) + TARGET_PAGE_SIZE,
   DIRTY_MEMORY_VGA);
 memory_region_reset_dirty(&ts->vram_mem,
   page24 + page_min * 4,
-  page24 + page_max * 4 + TARGET_PAGE_SIZE,
+  (page_max - page_min) * 4 + TARGET_PAGE_SIZE,
   DIRTY_MEMORY_VGA);
 memory_region_reset_dirty(&ts->vram_mem,
   cpage + page_min * 4,
-  cpage + page_max * 4 + TARGET_PAGE_SIZE,
+  (page_max - page_min) * 4 + TARGET_PAGE_SIZE,
   DIRTY_MEMORY_VGA);
 }
 
@@ -285,7 +286,8 @@ static void tcx_update_display(void *opaque)
 /* reset modified pages */
 if (page_max >= page_min) {
 memory_region_reset_dirty(&ts->vram_mem,
-  page_min, page_max + TARGET_PAGE_SIZE,
+  page_min,
+  (page_max - page_min) + TARGET_PAGE_SIZE,
   DIRTY_MEMORY_VGA);
 }
 }
-- 
1.7.10.4




Re: [Qemu-devel] [PATCHv2] tcx: Fix 24-bit display mode

2013-06-02 Thread Blue Swirl
Thanks, applied.

On Sun, Jun 2, 2013 at 4:23 PM, Mark Cave-Ayland
 wrote:
> Commit d08151bf (conversion of tcx to the memory API) broke the 24-bit mode of
> the tcx display adapter by accidentally passing in the final address of the
> dirty region to memory_region_reset_dirty() instead of its size.
>
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/display/tcx.c |   10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/display/tcx.c b/hw/display/tcx.c
> index fc27f45..995641c 100644
> --- a/hw/display/tcx.c
> +++ b/hw/display/tcx.c
> @@ -193,15 +193,16 @@ static inline void reset_dirty(TCXState *ts, ram_addr_t 
> page_min,
>ram_addr_t cpage)
>  {
>  memory_region_reset_dirty(&ts->vram_mem,
> -  page_min, page_max + TARGET_PAGE_SIZE,
> +  page_min,
> +  (page_max - page_min) + TARGET_PAGE_SIZE,
>DIRTY_MEMORY_VGA);
>  memory_region_reset_dirty(&ts->vram_mem,
>page24 + page_min * 4,
> -  page24 + page_max * 4 + TARGET_PAGE_SIZE,
> +  (page_max - page_min) * 4 + TARGET_PAGE_SIZE,
>DIRTY_MEMORY_VGA);
>  memory_region_reset_dirty(&ts->vram_mem,
>cpage + page_min * 4,
> -  cpage + page_max * 4 + TARGET_PAGE_SIZE,
> +  (page_max - page_min) * 4 + TARGET_PAGE_SIZE,
>DIRTY_MEMORY_VGA);
>  }
>
> @@ -285,7 +286,8 @@ static void tcx_update_display(void *opaque)
>  /* reset modified pages */
>  if (page_max >= page_min) {
>  memory_region_reset_dirty(&ts->vram_mem,
> -  page_min, page_max + TARGET_PAGE_SIZE,
> +  page_min,
> +  (page_max - page_min) + TARGET_PAGE_SIZE,
>DIRTY_MEMORY_VGA);
>  }
>  }
> --
> 1.7.10.4
>
>



[Qemu-devel] [PATCH] linux-user: Allow getdents to be provided by getdents64

2013-06-02 Thread Peter Maydell
Newer architectures may only implement the getdents64 syscall, not
getdents. Provide an implementation of getdents in terms of getdents64
so that we can run getdents-using targets on a getdents64-only host.

Signed-off-by: Peter Maydell 
---
Guess which exciting new architecture doesn't have getdents :-)
Tested on i386 by temporarily nobbling the #ifdefs so we use the
via-getdents64 path rather than via-getdents, and with the test program
from the getdents(2) manpage as the guest binary.

 linux-user/syscall.c |   60 +-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 053fa14..15c771a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -208,8 +208,11 @@ static int gettid(void) {
 return -ENOSYS;
 }
 #endif
+#ifdef __NR_getdents
 _syscall3(int, sys_getdents, uint, fd, struct linux_dirent *, dirp, uint, 
count);
-#if defined(TARGET_NR_getdents64) && defined(__NR_getdents64)
+#endif
+#if !defined(__NR_getdents) || \
+(defined(TARGET_NR_getdents64) && defined(__NR_getdents64))
 _syscall3(int, sys_getdents64, uint, fd, struct linux_dirent64 *, dirp, uint, 
count);
 #endif
 #if defined(TARGET_NR__llseek) && defined(__NR_llseek)
@@ -6963,6 +6966,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 break;
 #endif
 case TARGET_NR_getdents:
+#ifdef __NR_getdents
 #if TARGET_ABI_BITS == 32 && HOST_LONG_BITS == 64
 {
 struct target_dirent *target_dirp;
@@ -7035,6 +7039,60 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 unlock_user(dirp, arg2, ret);
 }
 #endif
+#else
+/* Implement getdents in terms of getdents64 */
+{
+struct linux_dirent64 *dirp;
+abi_long count = arg3;
+
+dirp = lock_user(VERIFY_WRITE, arg2, count, 0);
+if (!dirp) {
+goto efault;
+}
+ret = get_errno(sys_getdents64(arg1, dirp, count));
+if (!is_error(ret)) {
+/* Convert the dirent64 structs to target dirent.  We do this
+ * in-place, since we can guarantee that a target_dirent is no
+ * larger than a dirent64; however this means we have to be
+ * careful to read everything before writing in the new format.
+ */
+struct linux_dirent64 *de;
+struct target_dirent *tde;
+int len = ret;
+int tlen = 0;
+
+de = dirp;
+tde = (struct target_dirent *)dirp;
+while (len > 0) {
+int namelen, treclen;
+int reclen = de->d_reclen;
+uint64_t ino = de->d_ino;
+int64_t off = de->d_off;
+uint8_t type = de->d_type;
+
+namelen = strlen(de->d_name);
+treclen = offsetof(struct target_dirent, d_name) + namelen 
+ 2;
+treclen = QEMU_ALIGN_UP(treclen, sizeof(abi_long));
+
+tde->d_ino = tswapal(ino);
+tde->d_off = tswapal(off);
+tde->d_reclen = tswap16(treclen);
+memmove(tde->d_name, de->d_name, namelen + 1);
+/* The target_dirent type is in what was formerly a padding
+ * byte at the end of the structure:
+ */
+*(((char *)tde) + treclen - 1) = type;
+
+de = (struct linux_dirent64 *)((char *)de + reclen);
+tde = (struct target_dirent *)((char *)tde + treclen);
+len -= reclen;
+tlen += treclen;
+}
+ret = tlen;
+}
+unlock_user(dirp, arg2, ret);
+}
+#endif
 break;
 #if defined(TARGET_NR_getdents64) && defined(__NR_getdents64)
 case TARGET_NR_getdents64:
-- 
1.7.9.5




Re: [Qemu-devel] Could configure generate QEMU's linker scripts?

2013-06-02 Thread Peter Maydell
On 20 May 2013 18:21, Richard Henderson  wrote:
> On 05/19/2013 09:30 AM, Ed Maste wrote:
>> That is, it just changes the start address.  Is this generally the
>> only difference between QEMU's linker scripts and system built-ins?
>
> Yes.

So for a new architecture how do we determine whether we need
to fiddle with the start address or not? (More specifically,
is aarch64 going to need a linker script or just to go in the
configure list of "hosts which don't need one" ?)

thanks
-- PMM



Re: [Qemu-devel] [PATCH uq/master] fix double free the memslot in kvm_set_phys_mem

2013-06-02 Thread Richard W.M. Jones
On Fri, May 31, 2013 at 04:52:18PM +0800, Xiao Guangrong wrote:
> Luiz Capitulino reported that guest refused to boot and qemu
> complained with:
> kvm_set_phys_mem: error unregistering overlapping slot: Invalid argument
> 
> It is caused by commit 235e8982ad that did double free for the memslot
> so that the second one raises the -EINVAL error
> 
> Fix it by reset memory size only if it is needed
> 
> Reported-by: Luiz Capitulino 
> Signed-off-by: Xiao Guangrong 
> ---
>  kvm-all.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 8e7bbf8..405480e 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -206,7 +206,8 @@ static int kvm_set_user_memory_region(KVMState *s, 
> KVMSlot *slot)
>  if (s->migration_log) {
>  mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
>  }
> -if (mem.flags & KVM_MEM_READONLY) {
> +
> +if (slot->memory_size && mem.flags & KVM_MEM_READONLY) {
>  /* Set the slot size to 0 before setting the slot to the desired
>   * value. This is needed based on KVM commit 75d61fbc. */
>  mem.memory_size = 0;

Tested and fixes it for me too.

Tested-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



Re: [Qemu-devel] [PATCH 10/21] memory: make section size a 128-bit integer

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 16:50, Peter Maydell ha scritto:
> On 2 June 2013 15:36, Paolo Bonzini  wrote:
>> This should work:
>>
>> int64_t h;
>> if (!n) {
>> return a;
>> }
>> h = a.hi >> n;
> 
> This is undefined for n >= 64.

Yes, it has to be a.hi >> (n & 63).

> I would suggest looking at fpu/softfloat-macros.h:shift128Right()
> except that that has at least one clearly dubious thing in it
> (a check for "count < 64" in an else case that can't be reached
> if count < 64)...

It's a bit different in that I want an arithmetic right shift.

Paolo



Re: [Qemu-devel] [PATCH uq/master] fix double free the memslot in kvm_set_phys_mem

2013-06-02 Thread Jordan Justen
Reviewed-by: Jordan Justen 

On Fri, May 31, 2013 at 1:52 AM, Xiao Guangrong
 wrote:
> Luiz Capitulino reported that guest refused to boot and qemu
> complained with:
> kvm_set_phys_mem: error unregistering overlapping slot: Invalid argument
>
> It is caused by commit 235e8982ad that did double free for the memslot
> so that the second one raises the -EINVAL error
>
> Fix it by reset memory size only if it is needed
>
> Reported-by: Luiz Capitulino 
> Signed-off-by: Xiao Guangrong 
> ---
>  kvm-all.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 8e7bbf8..405480e 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -206,7 +206,8 @@ static int kvm_set_user_memory_region(KVMState *s, 
> KVMSlot *slot)
>  if (s->migration_log) {
>  mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
>  }
> -if (mem.flags & KVM_MEM_READONLY) {
> +
> +if (slot->memory_size && mem.flags & KVM_MEM_READONLY) {
>  /* Set the slot size to 0 before setting the slot to the desired
>   * value. This is needed based on KVM commit 75d61fbc. */
>  mem.memory_size = 0;
> --
> 1.7.7.6
>



Re: [Qemu-devel] [PATCH V19 0/8] add-cow file format

2013-06-02 Thread Fam Zheng
On Thu, 05/30 18:00, Dongxu Wang wrote:
> It will introduce a new file format: add-cow.
> 
> The add-cow file format makes it possible to perform copy-on-write on top of
> a raw disk image.  When we know that no backing file clusters remain visible
> (e.g. we have streamed the entire image and copied all data from the backing
> file), then it is possible to discard the add-cow file and use the raw image
> file directly.
> 
> This feature adds the copy-on-write feature to raw files (which cannot support
> it natively) while allowing us to get full performance again later when we no
> longer need copy-on-write.
> 
> add-cow can benefit from other available functions, such as path_has_protocol
> and qed_read_string, so we will make them public.
> 
> snapshot_blkdev are not supported now for add-cow. Will add it in futher 
> patches.
> 
> These patches are using QemuOpts parser, former patches could be found here:
> http://patchwork.ozlabs.org/patch/247508/
> 
> 
> v18->v19:
> 1) support parallel aio write.
> 2) fix flush method.
> 3) other small fix.
> v17 -> v18:
> 1) remove version field.
> 2) header size is maximum value and cluster size value.
> 3) fix type.
> 4) move struct to source file.
> 5) cluster_size->table_size.
> 6) use error_report, not fprintf.
> 7) remove version field from header.
> 8) header_size is MAX(cluster_size, 4096).
> 9) introduce s->cluster_sectors.
> 10) use BLKDBG_L2_LOAD/UPDATE.
> 11) add 037 and 038 tests.
> 
> v16->v17):
> 1) Use stringify.
> 
> v15->v16):
> 1) Rebased on QEMU upstream source tree.
> 2) Judge if opts is null in add_cow_create function.
> 
> v14->v15:
> 1) Fix typo and make some sentences more readable in docs.
> 2) Introduce STRINGIZER macro.
> 
> v13->v14:
> 1) Make some sentences more clear in docs.
> 2) Make MAGIC from 8 bytes to 4 bytes.
> 3) Fix some bugs.
> 
> v12->v13:
> 1) Use QemuOpts, not QEMUOptionParameter
> 2) cluster_size configuable
> 3) Refactor block-cache.c
> 4) Correct qemu-iotests script.
> 5) Other bug fix.
> 
> v11->v12:
> 1) Removed un-used feature bit.
> 2) Share cache code with qcow2.c.
> 3) Remove snapshot_blkdev support, will add it in another patch.
> 5) COW Bitmap field in add-cow file will be multiple of 65536.
> 6) fix grammer and typo.
> 
> Dong Xu Wang (8):
>   V18: docs: document for add-cow file format

Why mention V18 here?

-- 
Fam



Re: [Qemu-devel] [PATCH V19 1/8] V18: docs: document for add-cow file format

2013-06-02 Thread Fam Zheng
On Thu, 05/30 18:00, Dongxu Wang wrote:
> From: Dong Xu Wang 
> 
> Document for add-cow format, the usage and spec of add-cow are
> introduced.
> 
> v18-v19:
> 1) backing_fmt and image_fmt NUL-terminated.
> 2) other fix.
> V17->V18:
> 1) remove version field.
> 2) header size is maximum value and cluster size value.
> 3) fix type.
> 
> Signed-off-by: Dong Xu Wang 
> Signed-off-by: Dongxu Wang 

Changed git config? I suggest you could remove the old SOB line (if
respin).




Re: [Qemu-devel] [PATCH V19 1/8] V18: docs: document for add-cow file format

2013-06-02 Thread Dongxu Wang

On 2013/6/3 9:48, Fam Zheng wrote:

On Thu, 05/30 18:00, Dongxu Wang wrote:

From: Dong Xu Wang 

Document for add-cow format, the usage and spec of add-cow are
introduced.

v18-v19:
1) backing_fmt and image_fmt NUL-terminated.
2) other fix.
V17->V18:
1) remove version field.
2) header size is maximum value and cluster size value.
3) fix type.

Signed-off-by: Dong Xu Wang 
Signed-off-by: Dongxu Wang 


Changed git config? I suggest you could remove the old SOB line (if
respin).

Oh, yes, I changed my gitconfig file, and I did not notice these wrong 
SOB line, please ignore these boring lines. :)







Re: [Qemu-devel] [PATCH V3 0/4] qapi and snapshot code clean up in block layer

2013-06-02 Thread Wenchao Xia

于 2013-5-31 21:19, Luiz Capitulino 写道:

On Fri, 31 May 2013 21:04:10 +0800
Wenchao Xia  wrote:


于 2013-5-30 10:41, Wenchao Xia 写道:

于 2013-5-27 23:41, Kevin Wolf 写道:

Am 25.05.2013 um 05:09 hat Wenchao Xia geschrieben:

These patches are the common part of my hmp/qmp block query series
and Pavel's
qmp snapshot command converion series. It mainly does following things:
1 move snapshot related code to block/snapshot.c, qmp and info
dumping code to
block/qapi.c.
2 better info dumping function to get rid of buffer, avoid string
truncation.


Posted comments on patch 1 and 4.

Patches 2 and 3 are:
Reviewed-by: Kevin Wolf 


It seems nothing need change, Kevin, do you think it can be merged?


This serial blocks mine and Pavel's work, anything need to be
improved? Respin with
-typedef int (*fprintf_function)(FILE *f, const char *fmt, ...)
+typedef int (*fprintf_function)(void *out, const char *fmt, ...)
?


As far as my review is concerned, I'm OK with your current version.


  Thanks Luiz. This series does code moves, so it have big chance to get
conflict when upstream changes, hope it not hang out too long... sorry
for pushing many times.

--
Best Regards

Wenchao Xia




Re: [Qemu-devel] updated: kvm networking todo wiki

2013-06-02 Thread Rusty Russell
Anthony Liguori  writes:
> "Michael S. Tsirkin"  writes:
>
>> On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote:
>>> Stefan Hajnoczi  writes:
>>> 
>>> > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell  
>>> > wrote:
>>> >> Anthony Liguori  writes:
>>> >>> Rusty Russell  writes:
>>>  On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>> > FWIW, I think what's more interesting is using vhost-net as a 
>>> > networking
>>> > backend with virtio-net in QEMU being what's guest facing.
>>> >
>>> > In theory, this gives you the best of both worlds: QEMU acts as a 
>>> > first
>>> > line of defense against a malicious guest while still getting the
>>> > performance advantages of vhost-net (zero-copy).
>>> >
>>>  It would be an interesting idea if we didn't already have the vhost
>>>  model where we don't need the userspace bounce.
>>> >>>
>>> >>> The model is very interesting for QEMU because then we can use vhost as
>>> >>> a backend for other types of network adapters (like vmxnet3 or even
>>> >>> e1000).
>>> >>>
>>> >>> It also helps for things like fault tolerance where we need to be able
>>> >>> to control packet flow within QEMU.
>>> >>
>>> >> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 
>>> >> thoughts).
>>> >>
>>> >> Then I'm really confused as to what this would look like.  A zero copy
>>> >> sendmsg?  We should be able to implement that today.
>>> >>
>>> >> On the receive side, what can we do better than readv?  If we need to
>>> >> return to userspace to tell the guest that we've got a new packet, we
>>> >> don't win on latency.  We might reduce syscall overhead with a
>>> >> multi-dimensional readv to read multiple packets at once?
>>> >
>>> > Sounds like recvmmsg(2).
>>> 
>>> Could we map this to mergable rx buffers though?
>>> 
>>> Regards,
>>> 
>>> Anthony Liguori
>>
>> Yes because we don't have to complete buffers in order.
>
> What I meant though was for GRO, we don't know how large the received
> packet is going to be.  Mergable rx buffers lets us allocate a pool of
> data for all incoming packets instead of allocating max packet size *
> max packets.
>
> recvmmsg expects an array of msghdrs and I presume each needs to be
> given a fixed size.  So this seems incompatible with mergable rx
> buffers.

Good point.  You'd need to build 64k buffers to pass to recvmmsg, then
reuse the parts it didn't touch on the next call.  This limits us to
about a 16th of what we could do with an interface which understood
buffer merging, but I don't know how much that would matter in
practice.  We'd need some benchmarks

Cheers,
Rusty.







[Qemu-devel] [PATCH V14 0/6] enhancement for qmp/hmp interfaces of block info

2013-06-02 Thread Wenchao Xia
  This series lets qmp interface show delaied info, including internal snapshot
/backing chain on all block device at runtime, which helps management stack and
human user, by retrieving exactly the same info of what qemu sees.

Example:
-> { "execute": "query-block" }
<- {
  "return":[
 {
"io-status": "ok",
"device":"ide0-hd0",
"locked":false,
"removable":false,
"inserted":{
   "ro":false,
   "drv":"qcow2",
   "encrypted":false,
   "file":"disks/test.qcow2",
   "backing_file_depth":1,
   "bps":100,
   "bps_rd":0,
   "bps_wr":0,
   "iops":100,
   "iops_rd":0,
   "iops_wr":0,
   "image":{
  "filename":"disks/test.qcow2",
  "format":"qcow2",
  "virtual-size":2048000,
  "backing_file":"base.qcow2",
  "full-backing-filename":"disks/base.qcow2",
  "backing-filename-format:"qcow2",
  "snapshots":[
 {
"id": "1",
"name": "snapshot1",
"vm-state-size": 0,
"date-sec": 1200,
"date-nsec": 12,
"vm-clock-sec": 206,
"vm-clock-nsec": 30
 }
  ],
  "backing-image":{
  "filename":"disks/base.qcow2",
  "format":"qcow2",
  "virtual-size":2048000
  }
   }
},
"type":"unknown"
 },
 {
"io-status": "ok",
"device":"ide1-cd0",
"locked":false,
"removable":true,
"type":"unknown"
 },
 {
"device":"floppy0",
"locked":false,
"removable":true,
"type":"unknown"
 },
 {
"device":"sd0",
"locked":false,
"removable":true,
"type":"unknown"
 }
  ]
   }

  These patches follows the rule that use qmp to retrieve information,
hmp layer just does a translation from qmp object it got. To make code
graceful, snapshot and image info retrieving code in qemu and qemu-img are
merged into block layer, and some function names were adjusted to describe
them better. For the part touched by the series, it works as:

   qemu  qemu-img

dump_monitordump_stdout
 |--| 
|
   block/qapi.c

  This series requires following series applied:
"[PATCH V3 0/4] qapi and snapshot code clean up in block layer",
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg03539.html

  Special thanks for Markus, Stefan, Kevin, Eric reviewing many times.

v14:
  Address Eric's comments:
  2/6: better comments in code.
  3/6: spelling fix in comments.

Wenchao Xia (6):
  1 block: add snapshot info query function bdrv_query_snapshot_info_list()
  2 block: add image info query function bdrv_query_image_info()
  3 qmp: add recursive member in ImageInfo
  4 qmp: add ImageInfo in BlockDeviceInfo used by query-block
  5 hmp: show ImageInfo in 'info block'
  6 hmp: add parameters device and -v for info block

 block/qapi.c |  148 ++
 hmp.c|   21 +++
 include/block/qapi.h |   14 +++--
 monitor.c|7 ++-
 qapi-schema.json |   10 +++-
 qemu-img.c   |   10 +++-
 qmp-commands.hx  |   69 +++-
 7 files changed, 242 insertions(+), 37 deletions(-)





[Qemu-devel] [PATCH V14 3/6] qmp: add recursive member in ImageInfo

2013-06-02 Thread Wenchao Xia
New member *backing-image is added to reflect the backing chain
status.

Signed-off-by: Wenchao Xia 
---
 block/qapi.c |7 +++
 qapi-schema.json |5 -
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index e9d8b74..a407c3d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -94,6 +94,13 @@ int bdrv_query_snapshot_info_list(BlockDriverState *bs,
  * @p_info: location to store image information
  * @errp: location to store error information
  *
+ * Store "flat" image information in @p_info.
+ *
+ * "Flat" means it does *not* query backing image information,
+ * i.e. (*pinfo)->has_backing_image will be set to false and
+ * (*pinfo)->backing_image to NULL even when the image does in fact have
+ * a backing image.
+ *
  * @p_info will be set only on success. On error, store error in @errp.
  */
 void bdrv_query_image_info(BlockDriverState *bs,
diff --git a/qapi-schema.json b/qapi-schema.json
index ef1f657..a02999d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -236,6 +236,8 @@
 #
 # @snapshots: #optional list of VM snapshots
 #
+# @backing-image: #optional info of the backing image (since 1.6)
+#
 # Since: 1.3
 #
 ##
@@ -245,7 +247,8 @@
'*actual-size': 'int', 'virtual-size': 'int',
'*cluster-size': 'int', '*encrypted': 'bool',
'*backing-filename': 'str', '*full-backing-filename': 'str',
-   '*backing-filename-format': 'str', '*snapshots': ['SnapshotInfo'] } 
}
+   '*backing-filename-format': 'str', '*snapshots': ['SnapshotInfo'],
+   '*backing-image': 'ImageInfo' } }
 
 ##
 # @ImageCheck:
-- 
1.7.1





[Qemu-devel] [PATCH V14 1/6] block: add snapshot info query function bdrv_query_snapshot_info_list()

2013-06-02 Thread Wenchao Xia
This patch adds function bdrv_query_snapshot_info_list(), which will
retrieve snapshot info of an image in qmp object format. The implementation
is based on the code moved from qemu-img.c with modification to fit more
for qmp based block layer API.

Signed-off-by: Wenchao Xia 
Reviewed-by: Eric Blake 
---
 block/qapi.c |   55 ++---
 include/block/qapi.h |4 ++-
 qemu-img.c   |5 +++-
 3 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 794dbf8..1ed56da 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -26,29 +26,56 @@
 #include "block/block_int.h"
 #include "qmp-commands.h"
 
-void bdrv_collect_snapshots(BlockDriverState *bs , ImageInfo *info)
+/*
+ * Returns 0 on success, with *p_list either set to describe snapshot
+ * information, or NULL because there are no snapshots.  Returns -errno on
+ * error, with *p_list untouched.
+ */
+int bdrv_query_snapshot_info_list(BlockDriverState *bs,
+  SnapshotInfoList **p_list,
+  Error **errp)
 {
 int i, sn_count;
 QEMUSnapshotInfo *sn_tab = NULL;
-SnapshotInfoList *info_list, *cur_item = NULL;
+SnapshotInfoList *info_list, *cur_item = NULL, *head = NULL;
+SnapshotInfo *info;
+
 sn_count = bdrv_snapshot_list(bs, &sn_tab);
+if (sn_count < 0) {
+const char *dev = bdrv_get_device_name(bs);
+switch (sn_count) {
+case -ENOMEDIUM:
+error_setg(errp, "Device '%s' is not inserted", dev);
+break;
+case -ENOTSUP:
+error_setg(errp,
+   "Device '%s' does not support internal snapshots",
+   dev);
+break;
+default:
+error_setg_errno(errp, -sn_count,
+ "Can't list snapshots of device '%s'", dev);
+break;
+}
+return sn_count;
+}
 
 for (i = 0; i < sn_count; i++) {
-info->has_snapshots = true;
-info_list = g_new0(SnapshotInfoList, 1);
+info = g_new0(SnapshotInfo, 1);
+info->id= g_strdup(sn_tab[i].id_str);
+info->name  = g_strdup(sn_tab[i].name);
+info->vm_state_size = sn_tab[i].vm_state_size;
+info->date_sec  = sn_tab[i].date_sec;
+info->date_nsec = sn_tab[i].date_nsec;
+info->vm_clock_sec  = sn_tab[i].vm_clock_nsec / 10;
+info->vm_clock_nsec = sn_tab[i].vm_clock_nsec % 10;
 
-info_list->value= g_new0(SnapshotInfo, 1);
-info_list->value->id= g_strdup(sn_tab[i].id_str);
-info_list->value->name  = g_strdup(sn_tab[i].name);
-info_list->value->vm_state_size = sn_tab[i].vm_state_size;
-info_list->value->date_sec  = sn_tab[i].date_sec;
-info_list->value->date_nsec = sn_tab[i].date_nsec;
-info_list->value->vm_clock_sec  = sn_tab[i].vm_clock_nsec / 10;
-info_list->value->vm_clock_nsec = sn_tab[i].vm_clock_nsec % 10;
+info_list = g_new0(SnapshotInfoList, 1);
+info_list->value = info;
 
 /* XXX: waiting for the qapi to support qemu-queue.h types */
 if (!cur_item) {
-info->snapshots = cur_item = info_list;
+head = cur_item = info_list;
 } else {
 cur_item->next = info_list;
 cur_item = info_list;
@@ -57,6 +84,8 @@ void bdrv_collect_snapshots(BlockDriverState *bs , ImageInfo 
*info)
 }
 
 g_free(sn_tab);
+*p_list = head;
+return 0;
 }
 
 void bdrv_collect_image_info(BlockDriverState *bs,
diff --git a/include/block/qapi.h b/include/block/qapi.h
index e6e568d..4f223d1 100644
--- a/include/block/qapi.h
+++ b/include/block/qapi.h
@@ -29,7 +29,9 @@
 #include "block/block.h"
 #include "block/snapshot.h"
 
-void bdrv_collect_snapshots(BlockDriverState *bs , ImageInfo *info);
+int bdrv_query_snapshot_info_list(BlockDriverState *bs,
+  SnapshotInfoList **p_list,
+  Error **errp);
 void bdrv_collect_image_info(BlockDriverState *bs,
  ImageInfo *info,
  const char *filename);
diff --git a/qemu-img.c b/qemu-img.c
index 82c7977..29929c5 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1665,7 +1665,10 @@ static ImageInfoList *collect_image_info_list(const char 
*filename,
 
 info = g_new0(ImageInfo, 1);
 bdrv_collect_image_info(bs, info, filename);
-bdrv_collect_snapshots(bs, info);
+bdrv_query_snapshot_info_list(bs, &info->snapshots, NULL);
+if (info->snapshots) {
+info->has_snapshots = true;
+}
 
 elem = g_new0(ImageInfoList, 1);
 elem->value = info;
-- 
1.7.1





[Qemu-devel] [PATCH V14 2/6] block: add image info query function bdrv_query_image_info()

2013-06-02 Thread Wenchao Xia
This patch adds function bdrv_query_image_info(), which will
retrieve image info in qmp object format. The implementation is
based on the code moved from qemu-img.c, but uses block layer
function to get snapshot info.

Signed-off-by: Wenchao Xia 
---
 block/qapi.c |   43 +--
 include/block/qapi.h |6 +++---
 qemu-img.c   |   11 ++-
 3 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 1ed56da..e9d8b74 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -88,18 +88,29 @@ int bdrv_query_snapshot_info_list(BlockDriverState *bs,
 return 0;
 }
 
-void bdrv_collect_image_info(BlockDriverState *bs,
- ImageInfo *info,
- const char *filename)
+/**
+ * bdrv_query_image_info:
+ * @bs: block device to examine
+ * @p_info: location to store image information
+ * @errp: location to store error information
+ *
+ * @p_info will be set only on success. On error, store error in @errp.
+ */
+void bdrv_query_image_info(BlockDriverState *bs,
+   ImageInfo **p_info,
+   Error **errp)
 {
 uint64_t total_sectors;
-char backing_filename[1024];
+const char *backing_filename;
 char backing_filename2[1024];
 BlockDriverInfo bdi;
+int ret;
+Error *err = NULL;
+ImageInfo *info = g_new0(ImageInfo, 1);
 
 bdrv_get_geometry(bs, &total_sectors);
 
-info->filename= g_strdup(filename);
+info->filename= g_strdup(bs->filename);
 info->format  = g_strdup(bdrv_get_format_name(bs));
 info->virtual_size= total_sectors * 512;
 info->actual_size = bdrv_get_allocated_file_size(bs);
@@ -116,7 +127,7 @@ void bdrv_collect_image_info(BlockDriverState *bs,
 info->dirty_flag = bdi.is_dirty;
 info->has_dirty_flag = true;
 }
-bdrv_get_backing_filename(bs, backing_filename, sizeof(backing_filename));
+backing_filename = bs->backing_file;
 if (backing_filename[0] != '\0') {
 info->backing_filename = g_strdup(backing_filename);
 info->has_backing_filename = true;
@@ -134,6 +145,26 @@ void bdrv_collect_image_info(BlockDriverState *bs,
 info->has_backing_filename_format = true;
 }
 }
+
+ret = bdrv_query_snapshot_info_list(bs, &info->snapshots, &err);
+switch (ret) {
+case 0:
+if (info->snapshots) {
+info->has_snapshots = true;
+}
+break;
+/* recoverable error */
+case -ENOMEDIUM:
+case -ENOTSUP:
+error_free(err);
+break;
+default:
+error_propagate(errp, err);
+qapi_free_ImageInfo(info);
+return;
+}
+
+*p_info = info;
 }
 
 BlockInfo *bdrv_query_info(BlockDriverState *bs)
diff --git a/include/block/qapi.h b/include/block/qapi.h
index 4f223d1..ab1f48f 100644
--- a/include/block/qapi.h
+++ b/include/block/qapi.h
@@ -32,9 +32,9 @@
 int bdrv_query_snapshot_info_list(BlockDriverState *bs,
   SnapshotInfoList **p_list,
   Error **errp);
-void bdrv_collect_image_info(BlockDriverState *bs,
- ImageInfo *info,
- const char *filename);
+void bdrv_query_image_info(BlockDriverState *bs,
+   ImageInfo **p_info,
+   Error **errp);
 BlockInfo *bdrv_query_info(BlockDriverState *s);
 BlockStats *bdrv_query_stats(const BlockDriverState *bs);
 
diff --git a/qemu-img.c b/qemu-img.c
index 29929c5..04a3f7c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1642,6 +1642,7 @@ static ImageInfoList *collect_image_info_list(const char 
*filename,
 ImageInfoList *head = NULL;
 ImageInfoList **last = &head;
 GHashTable *filenames;
+Error *err = NULL;
 
 filenames = g_hash_table_new_full(g_str_hash, str_equal_func, NULL, NULL);
 
@@ -1663,11 +1664,11 @@ static ImageInfoList *collect_image_info_list(const 
char *filename,
 goto err;
 }
 
-info = g_new0(ImageInfo, 1);
-bdrv_collect_image_info(bs, info, filename);
-bdrv_query_snapshot_info_list(bs, &info->snapshots, NULL);
-if (info->snapshots) {
-info->has_snapshots = true;
+bdrv_query_image_info(bs, &info, &err);
+if (error_is_set(&err)) {
+error_report("%s", error_get_pretty(err));
+error_free(err);
+goto err;
 }
 
 elem = g_new0(ImageInfoList, 1);
-- 
1.7.1





[Qemu-devel] [PATCH V14 5/6] hmp: show ImageInfo in 'info block'

2013-06-02 Thread Wenchao Xia
Now human monitor can show image details, include internal
snapshot and backing chain info for every block device.

Signed-off-by: Wenchao Xia 
---
 hmp.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/hmp.c b/hmp.c
index 4fb76ec..2aa832c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -22,6 +22,7 @@
 #include "qemu/sockets.h"
 #include "monitor/monitor.h"
 #include "ui/console.h"
+#include "block/qapi.h"
 
 static void hmp_handle_error(Monitor *mon, Error **errp)
 {
@@ -277,6 +278,7 @@ void hmp_info_cpus(Monitor *mon, const QDict *qdict)
 void hmp_info_block(Monitor *mon, const QDict *qdict)
 {
 BlockInfoList *block_list, *info;
+ImageInfo *image_info;
 
 block_list = qmp_query_block(NULL);
 
@@ -318,6 +320,18 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
 info->value->inserted->iops,
 info->value->inserted->iops_rd,
 info->value->inserted->iops_wr);
+
+monitor_printf(mon, " images:\n");
+image_info = info->value->inserted->image;
+while (1) {
+bdrv_image_info_dump((fprintf_function)monitor_printf, mon,
+ image_info);
+if (image_info->has_backing_image) {
+image_info = image_info->backing_image;
+} else {
+break;
+}
+}
 } else {
 monitor_printf(mon, " [not inserted]");
 }
-- 
1.7.1





[Qemu-devel] [PATCH V14 6/6] hmp: add parameters device and -v for info block

2013-06-02 Thread Wenchao Xia
With these parameters, user can choose the information to be showed,
to avoid message flood in the monitor.

Signed-off-by: Wenchao Xia 
Reviewed-by: Kevin Wolf 
---
 hmp.c |   25 -
 monitor.c |7 ---
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/hmp.c b/hmp.c
index 2aa832c..a590ace 100644
--- a/hmp.c
+++ b/hmp.c
@@ -279,10 +279,15 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
 {
 BlockInfoList *block_list, *info;
 ImageInfo *image_info;
+const char *device = qdict_get_try_str(qdict, "device");
+int verbose = qdict_get_try_bool(qdict, "verbose", 0);
 
 block_list = qmp_query_block(NULL);
 
 for (info = block_list; info; info = info->next) {
+if (device && strcmp(device, info->value->device)) {
+continue;
+}
 monitor_printf(mon, "%s: removable=%d",
info->value->device, info->value->removable);
 
@@ -321,15 +326,17 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
 info->value->inserted->iops_rd,
 info->value->inserted->iops_wr);
 
-monitor_printf(mon, " images:\n");
-image_info = info->value->inserted->image;
-while (1) {
-bdrv_image_info_dump((fprintf_function)monitor_printf, mon,
- image_info);
-if (image_info->has_backing_image) {
-image_info = image_info->backing_image;
-} else {
-break;
+if (verbose) {
+monitor_printf(mon, " images:\n");
+image_info = info->value->inserted->image;
+while (1) {
+bdrv_image_info_dump((fprintf_function)monitor_printf,
+ mon, image_info);
+if (image_info->has_backing_image) {
+image_info = image_info->backing_image;
+} else {
+break;
+}
 }
 }
 } else {
diff --git a/monitor.c b/monitor.c
index 6ce2a4e..243f5ae 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2472,9 +2472,10 @@ static mon_cmd_t info_cmds[] = {
 },
 {
 .name   = "block",
-.args_type  = "",
-.params = "",
-.help   = "show the block devices",
+.args_type  = "verbose:-v,device:B?",
+.params = "[-v] [device]",
+.help   = "show info of one block device or all block devices "
+  "(and details of images with -v option)",
 .mhandler.cmd = hmp_info_block,
 },
 {
-- 
1.7.1





[Qemu-devel] [PATCH V14 4/6] qmp: add ImageInfo in BlockDeviceInfo used by query-block

2013-06-02 Thread Wenchao Xia
Now image info will be retrieved as an embbed json object inside
BlockDeviceInfo, backing chain info and all related internal snapshot
info can be got in the enhanced recursive structure of ImageInfo.

Signed-off-by: Wenchao Xia 
---
 block/qapi.c |   43 +--
 include/block/qapi.h |4 ++-
 qapi-schema.json |5 +++-
 qmp-commands.hx  |   69 -
 4 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index a407c3d..a4bc411 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -174,9 +174,15 @@ void bdrv_query_image_info(BlockDriverState *bs,
 *p_info = info;
 }
 
-BlockInfo *bdrv_query_info(BlockDriverState *bs)
+/* @p_info will be set only on success. */
+void bdrv_query_info(BlockDriverState *bs,
+ BlockInfo **p_info,
+ Error **errp)
 {
 BlockInfo *info = g_malloc0(sizeof(*info));
+BlockDriverState *bs0;
+ImageInfo **p_image_info;
+Error *local_err = NULL;
 info->device = g_strdup(bs->device_name);
 info->type = g_strdup("unknown");
 info->locked = bdrv_dev_is_medium_locked(bs);
@@ -230,8 +236,30 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
 info->inserted->iops_wr =
bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
 }
+
+bs0 = bs;
+p_image_info = &info->inserted->image;
+while (1) {
+bdrv_query_image_info(bs0, p_image_info, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+goto err;
+}
+if (bs0->drv && bs0->backing_hd) {
+bs0 = bs0->backing_hd;
+(*p_image_info)->has_backing_image = true;
+p_image_info = &((*p_image_info)->backing_image);
+} else {
+break;
+}
+}
 }
-return info;
+
+*p_info = info;
+return;
+
+ err:
+qapi_free_BlockInfo(info);
 }
 
 BlockStats *bdrv_query_stats(const BlockDriverState *bs)
@@ -268,16 +296,25 @@ BlockInfoList *qmp_query_block(Error **errp)
 {
 BlockInfoList *head = NULL, **p_next = &head;
 BlockDriverState *bs = NULL;
+Error *local_err = NULL;
 
  while ((bs = bdrv_next(bs))) {
 BlockInfoList *info = g_malloc0(sizeof(*info));
-info->value = bdrv_query_info(bs);
+bdrv_query_info(bs, &info->value, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+goto err;
+}
 
 *p_next = info;
 p_next = &info->next;
 }
 
 return head;
+
+ err:
+qapi_free_BlockInfoList(head);
+return NULL;
 }
 
 BlockStatsList *qmp_query_blockstats(Error **errp)
diff --git a/include/block/qapi.h b/include/block/qapi.h
index ab1f48f..0496cc9 100644
--- a/include/block/qapi.h
+++ b/include/block/qapi.h
@@ -35,7 +35,9 @@ int bdrv_query_snapshot_info_list(BlockDriverState *bs,
 void bdrv_query_image_info(BlockDriverState *bs,
ImageInfo **p_info,
Error **errp);
-BlockInfo *bdrv_query_info(BlockDriverState *s);
+void bdrv_query_info(BlockDriverState *bs,
+ BlockInfo **p_info,
+ Error **errp);
 BlockStats *bdrv_query_stats(const BlockDriverState *bs);
 
 void bdrv_snapshot_dump(fprintf_function func_fprintf, void *f,
diff --git a/qapi-schema.json b/qapi-schema.json
index a02999d..5ad6894 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -759,6 +759,8 @@
 #
 # @iops_wr: write I/O operations per second is specified
 #
+# @image: the info of image used (since: 1.6)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -768,7 +770,8 @@
 '*backing_file': 'str', 'backing_file_depth': 'int',
 'encrypted': 'bool', 'encryption_key_missing': 'bool',
 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
-'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int'} }
+'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
+'image': 'ImageInfo' } }
 
 ##
 # @BlockDeviceIoStatus:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ffd130e..8cea5e5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1704,6 +1704,47 @@ Each json-object contain the following:
  - "iops": limit total I/O operations per second (json-int)
  - "iops_rd": limit read operations per second (json-int)
  - "iops_wr": limit write operations per second (json-int)
+ - "image": the detail of the image, it is a json-object containing
+the following:
+ - "filename": image file name (json-string)
+ - "format": image format (json-string)
+ - "virtual-size": image capacity in bytes (json-int)
+ - "dirty-flag": true if image is not cleanly closed, not present
+

[Qemu-devel] [PATCH v1 0/3] Serial cleanup

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

Some cosmetics, refactored to use util/fifo8 for the FIFO8, then
factored out some common code.

Tested as working on petalogix-ml605 machine model + Linux (has
coverage of serial fifo usage).


Peter Crosthwaite (3):
  char/serial: cosmetic fixes.
  char/serial: Use generic Fifo8
  char/serial: serial_ioport_write: Factor out common code

 hw/char/serial.c | 128 +++
 include/hw/char/serial.h |  15 ++
 2 files changed, 56 insertions(+), 87 deletions(-)

-- 
1.8.3.rc1.44.gb387c77.dirty




[Qemu-devel] [PATCH v1 1/3] char/serial: cosmetic fixes.

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

Some cosmetic fixes to char/serial fixing some checkpatch errors.

Cc: qemu-triv...@nongnu.org

Signed-off-by: Peter Crosthwaite 
---
Needed for the next patch to pass checkpatch. Done as sep patch to not
obscure that patch.

 hw/char/serial.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index 66b6348..bd6813e 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -263,8 +263,9 @@ static gboolean serial_xmit(GIOChannel *chan, GIOCondition 
cond, void *opaque)
 if (s->tsr_retry <= 0) {
 if (s->fcr & UART_FCR_FE) {
 s->tsr = fifo_get(s,XMIT_FIFO);
-if (!s->xmit_fifo.count)
+if (!s->xmit_fifo.count) {
 s->lsr |= UART_LSR_THRE;
+}
 } else if ((s->lsr & UART_LSR_THRE)) {
 return FALSE;
 } else {
@@ -461,10 +462,11 @@ static uint64_t serial_ioport_read(void *opaque, hwaddr 
addr, unsigned size)
 } else {
 if(s->fcr & UART_FCR_FE) {
 ret = fifo_get(s,RECV_FIFO);
-if (s->recv_fifo.count == 0)
+if (s->recv_fifo.count == 0) {
 s->lsr &= ~(UART_LSR_DR | UART_LSR_BI);
-else
+} else {
 qemu_mod_timer(s->fifo_timeout_timer, qemu_get_clock_ns 
(vm_clock) + s->char_transmit_time * 4);
+}
 s->timeout_ipending = 0;
 } else {
 ret = s->rbr;
@@ -534,15 +536,21 @@ static uint64_t serial_ioport_read(void *opaque, hwaddr 
addr, unsigned size)
 static int serial_can_receive(SerialState *s)
 {
 if(s->fcr & UART_FCR_FE) {
-if(s->recv_fifo.count < UART_FIFO_LENGTH)
-/* Advertise (fifo.itl - fifo.count) bytes when count < ITL, and 1 if 
above. If UART_FIFO_LENGTH - fifo.count is
-advertised the effect will be to almost always fill the fifo 
completely before the guest has a chance to respond,
-effectively overriding the ITL that the guest has set. */
- return (s->recv_fifo.count <= s->recv_fifo.itl) ? 
s->recv_fifo.itl - s->recv_fifo.count : 1;
-else
- return 0;
+if (s->recv_fifo.count < UART_FIFO_LENGTH) {
+/*
+ * Advertise (fifo.itl - fifo.count) bytes when count < ITL, and 1
+ * if above. If UART_FIFO_LENGTH - fifo.count is advertised the
+ * effect will be to almost always fill the fifo completely before
+ * the guest has a chance to respond, effectively overriding the 
ITL
+ * that the guest has set.
+ */
+return (s->recv_fifo.count <= s->recv_fifo.itl) ?
+s->recv_fifo.itl - s->recv_fifo.count : 1;
+} else {
+return 0;
+}
 } else {
-return !(s->lsr & UART_LSR_DR);
+return !(s->lsr & UART_LSR_DR);
 }
 }
 
-- 
1.8.3.rc1.44.gb387c77.dirty




[Qemu-devel] [PATCH v1 2/3] char/serial: Use generic Fifo8

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

Use the generic Fifo8 helper provided by QEMU, rather than re-implement
privately.

Signed-off-by: Peter Crosthwaite 
---

 hw/char/serial.c | 98 +---
 include/hw/char/serial.h | 15 +++-
 2 files changed, 39 insertions(+), 74 deletions(-)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index bd6813e..0a2b6c9 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -92,8 +92,6 @@
 #define UART_FCR_RFR0x02/* RCVR Fifo Reset */
 #define UART_FCR_FE 0x01/* FIFO Enable */
 
-#define XMIT_FIFO   0
-#define RECV_FIFO   1
 #define MAX_XMIT_RETRY  4
 
 #ifdef DEBUG_SERIAL
@@ -106,50 +104,14 @@ do {} while (0)
 
 static void serial_receive1(void *opaque, const uint8_t *buf, int size);
 
-static void fifo_clear(SerialState *s, int fifo)
+static inline void recv_fifo_put(SerialState *s, uint8_t chr)
 {
-SerialFIFO *f = (fifo) ? &s->recv_fifo : &s->xmit_fifo;
-memset(f->data, 0, UART_FIFO_LENGTH);
-f->count = 0;
-f->head = 0;
-f->tail = 0;
-}
-
-static int fifo_put(SerialState *s, int fifo, uint8_t chr)
-{
-SerialFIFO *f = (fifo) ? &s->recv_fifo : &s->xmit_fifo;
-
 /* Receive overruns do not overwrite FIFO contents. */
-if (fifo == XMIT_FIFO || f->count < UART_FIFO_LENGTH) {
-
-f->data[f->head++] = chr;
-
-if (f->head == UART_FIFO_LENGTH)
-f->head = 0;
-}
-
-if (f->count < UART_FIFO_LENGTH)
-f->count++;
-else if (fifo == RECV_FIFO)
+if (!fifo8_is_full(&s->recv_fifo)) {
+fifo8_push(&s->recv_fifo, chr);
+} else {
 s->lsr |= UART_LSR_OE;
-
-return 1;
-}
-
-static uint8_t fifo_get(SerialState *s, int fifo)
-{
-SerialFIFO *f = (fifo) ? &s->recv_fifo : &s->xmit_fifo;
-uint8_t c;
-
-if(f->count == 0)
-return 0;
-
-c = f->data[f->tail++];
-if (f->tail == UART_FIFO_LENGTH)
-f->tail = 0;
-f->count--;
-
-return c;
+}
 }
 
 static void serial_update_irq(SerialState *s)
@@ -165,7 +127,7 @@ static void serial_update_irq(SerialState *s)
 tmp_iir = UART_IIR_CTI;
 } else if ((s->ier & UART_IER_RDI) && (s->lsr & UART_LSR_DR) &&
(!(s->fcr & UART_FCR_FE) ||
-s->recv_fifo.count >= s->recv_fifo.itl)) {
+s->recv_fifo.num >= s->recv_fifo_itl)) {
 tmp_iir = UART_IIR_RDI;
 } else if ((s->ier & UART_IER_THRI) && s->thr_ipending) {
 tmp_iir = UART_IIR_THRI;
@@ -262,8 +224,9 @@ static gboolean serial_xmit(GIOChannel *chan, GIOCondition 
cond, void *opaque)
 
 if (s->tsr_retry <= 0) {
 if (s->fcr & UART_FCR_FE) {
-s->tsr = fifo_get(s,XMIT_FIFO);
-if (!s->xmit_fifo.count) {
+s->tsr = fifo8_is_full(&s->xmit_fifo) ?
+0 : fifo8_pop(&s->xmit_fifo);
+if (!s->xmit_fifo.num) {
 s->lsr |= UART_LSR_THRE;
 }
 } else if ((s->lsr & UART_LSR_THRE)) {
@@ -317,7 +280,11 @@ static void serial_ioport_write(void *opaque, hwaddr addr, 
uint64_t val,
 } else {
 s->thr = (uint8_t) val;
 if(s->fcr & UART_FCR_FE) {
-fifo_put(s, XMIT_FIFO, s->thr);
+/* xmit overruns overwrite data, so make space if needed */
+if (fifo8_is_full(&s->xmit_fifo)) {
+fifo8_pop(&s->xmit_fifo);
+}
+fifo8_push(&s->xmit_fifo, s->thr);
 s->thr_ipending = 0;
 s->lsr &= ~UART_LSR_TEMT;
 s->lsr &= ~UART_LSR_THRE;
@@ -368,28 +335,28 @@ static void serial_ioport_write(void *opaque, hwaddr 
addr, uint64_t val,
 if (val & UART_FCR_RFR) {
 qemu_del_timer(s->fifo_timeout_timer);
 s->timeout_ipending=0;
-fifo_clear(s,RECV_FIFO);
+fifo8_reset(&s->recv_fifo);
 }
 
 if (val & UART_FCR_XFR) {
-fifo_clear(s,XMIT_FIFO);
+fifo8_reset(&s->xmit_fifo);
 }
 
 if (val & UART_FCR_FE) {
 s->iir |= UART_IIR_FE;
-/* Set RECV_FIFO trigger Level */
+/* Set recv_fifo trigger Level */
 switch (val & 0xC0) {
 case UART_FCR_ITL_1:
-s->recv_fifo.itl = 1;
+s->recv_fifo_itl = 1;
 break;
 case UART_FCR_ITL_2:
-s->recv_fifo.itl = 4;
+s->recv_fifo_itl = 4;
 break;
 case UART_FCR_ITL_3:
-s->recv_fifo.itl = 8;
+s->recv_fifo_itl = 8;
 break;
 case UART_FCR_ITL_4:
-s->recv_fifo.itl = 14;
+s->recv_fifo_itl = 14;
 break;
 }
 } else
@@ -461,8 +428,9 @@ static uint64_t serial_ioport_read(void *opaque, hwaddr 
addr, unsigned size)
 ret = s->divider & 0xff;
 } else

[Qemu-devel] [PATCH v1 3/3] char/serial: serial_ioport_write: Factor out common code

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

These three lines are common to both FIFO and regular mode. Just factor
them out to outside the if rather than replicate the same lines inside
both if and else.

Cc: qemu-triv...@nongnu.org

Signed-off-by: Peter Crosthwaite 
---

 hw/char/serial.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index 0a2b6c9..017610e 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -285,15 +285,11 @@ static void serial_ioport_write(void *opaque, hwaddr 
addr, uint64_t val,
 fifo8_pop(&s->xmit_fifo);
 }
 fifo8_push(&s->xmit_fifo, s->thr);
-s->thr_ipending = 0;
 s->lsr &= ~UART_LSR_TEMT;
-s->lsr &= ~UART_LSR_THRE;
-serial_update_irq(s);
-} else {
-s->thr_ipending = 0;
-s->lsr &= ~UART_LSR_THRE;
-serial_update_irq(s);
 }
+s->thr_ipending = 0;
+s->lsr &= ~UART_LSR_THRE;
+serial_update_irq(s);
 serial_xmit(NULL, G_IO_OUT, s);
 }
 break;
-- 
1.8.3.rc1.44.gb387c77.dirty




Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-02 Thread Gerd Hoffmann
On 06/01/13 01:01, Jordan Justen wrote:
> On Fri, May 31, 2013 at 2:32 AM, Gerd Hoffmann  wrote:
>>   Hi,
>>
>>> I guess -bios would load coreboot. Coreboot would siphon the data
>>> necessary for ACPI table building through the current (same) fw_cfg
>>> bottleneck, build the tables,
>>
>> Yes.
> 
> So, this is really about making coreboot+seabios the default QEMU
> firmware, and making seabios depend on being a coreboot payload?

I still think it's better to simply have qemu generate the acpi tables,
but if that isn't going to be accepted we should seriously consider &
evaluate switching to coreboot.

>>> load the boot firmware (SeaBIOS or OVMF or
>>> something else -- not sure how to configure that),
> 
> It wouldn't be loading OVMF. It would be loading CorebootPkg.

Yep.  Some OVMF bits would be needed though (virtio drivers, qemu boot
priority support, ...).

cheers,
  Gerd




Re: [Qemu-devel] [PATCH v2 1/2] target-i386/seg_helper: refactor 4 helper functions

2013-06-02 Thread li guang
Oh, almost forgot it,
Peter, what about the version?

Thanks!

在 2013-05-06一的 13:55 +0800,liguang写道:
> for helper_{lsl, lar, verr, verw}, there are
> common parts, so move them outside, and then
> call this new helper-helper function.
> 
> Signed-off-by: liguang 
> ---
> v2: change misc_check_helper to privilege_check
> ---
>  target-i386/seg_helper.c |  179 ++---
>  1 files changed, 56 insertions(+), 123 deletions(-)
> 
> diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c
> index ff93374..eb9dc04 100644
> --- a/target-i386/seg_helper.c
> +++ b/target-i386/seg_helper.c
> @@ -2288,9 +2288,10 @@ void helper_sysexit(CPUX86State *env, int dflag)
>  EIP = EDX;
>  }
>  
> -target_ulong helper_lsl(CPUX86State *env, target_ulong selector1)
> +
> +static target_ulong privilege_check(CPUX86State *env, target_ulong selector1,
> +  int inst)
>  {
> -unsigned int limit;
>  uint32_t e1, e2, eflags, selector;
>  int rpl, dpl, cpl, type;
>  
> @@ -2302,14 +2303,30 @@ target_ulong helper_lsl(CPUX86State *env, 
> target_ulong selector1)
>  if (load_segment(env, &e1, &e2, selector) != 0) {
>  goto fail;
>  }
> +
> +CC_SRC = eflags & ~CC_Z;
> +
>  rpl = selector & 3;
>  dpl = (e2 >> DESC_DPL_SHIFT) & 3;
>  cpl = env->hflags & HF_CPL_MASK;
> +
>  if (e2 & DESC_S_MASK) {
> -if ((e2 & DESC_CS_MASK) && (e2 & DESC_C_MASK)) {
> -/* conforming */
> -} else {
> -if (dpl < cpl || dpl < rpl) {
> +if (e2 & DESC_CS_MASK) {
> +switch (inst) {
> +case 1:
> +goto fail;
> +case 2:
> +if (!(e2 & (DESC_R_MASK | DESC_C_MASK))) {
> +goto fail;
> +}
> +break;
> +case 3:
> +case 4:
> +if (!(e2 & DESC_C_MASK)) {
> +goto check_pl;
> +}
> +break;
> +default:
>  goto fail;
>  }
>  }
> @@ -2321,140 +2338,56 @@ target_ulong helper_lsl(CPUX86State *env, 
> target_ulong selector1)
>  case 3:
>  case 9:
>  case 11:
> -break;
> +if (inst == 3) {
> +break;
> +}
> +case 5:
> +case 12:
> +if (inst == 4) {
> +break;
> +}
>  default:
>  goto fail;
>  }
> -if (dpl < cpl || dpl < rpl) {
> -fail:
> -CC_SRC = eflags & ~CC_Z;
> -return 0;
> -}
> +goto check_pl;
> +}
> +
> +if (inst == 3) {
> +e2 &= 0x00f0ff00;
>  }
> -limit = get_seg_limit(e1, e2);
> +if (inst == 4) {
> +e2 = get_seg_limit(e1, e2);
> +}
> +
>  CC_SRC = eflags | CC_Z;
> -return limit;
> +
> +check_pl:
> +if (dpl < cpl || dpl < rpl) {
> +goto fail;
> +}
> +
> +fail:
> +return e2;
>  }
>  
> -target_ulong helper_lar(CPUX86State *env, target_ulong selector1)
> +target_ulong helper_lsl(CPUX86State *env, target_ulong selector1)
>  {
> -uint32_t e1, e2, eflags, selector;
> -int rpl, dpl, cpl, type;
> +return privilege_check(env, selector1, 4);
> +}
>  
> -selector = selector1 & 0x;
> -eflags = cpu_cc_compute_all(env, CC_OP);
> -if ((selector & 0xfffc) == 0) {
> -goto fail;
> -}
> -if (load_segment(env, &e1, &e2, selector) != 0) {
> -goto fail;
> -}
> -rpl = selector & 3;
> -dpl = (e2 >> DESC_DPL_SHIFT) & 3;
> -cpl = env->hflags & HF_CPL_MASK;
> -if (e2 & DESC_S_MASK) {
> -if ((e2 & DESC_CS_MASK) && (e2 & DESC_C_MASK)) {
> -/* conforming */
> -} else {
> -if (dpl < cpl || dpl < rpl) {
> -goto fail;
> -}
> -}
> -} else {
> -type = (e2 >> DESC_TYPE_SHIFT) & 0xf;
> -switch (type) {
> -case 1:
> -case 2:
> -case 3:
> -case 4:
> -case 5:
> -case 9:
> -case 11:
> -case 12:
> -break;
> -default:
> -goto fail;
> -}
> -if (dpl < cpl || dpl < rpl) {
> -fail:
> -CC_SRC = eflags & ~CC_Z;
> -return 0;
> -}
> -}
> -CC_SRC = eflags | CC_Z;
> -return e2 & 0x00f0ff00;
> +target_ulong helper_lar(CPUX86State *env, target_ulong selector1)
> +{
> +return privilege_check(env, selector1, 3);
>  }
>  
>  void helper_verr(CPUX86State *env, target_ulong selector1)
>  {
> -uint32_t e1, e2, eflags, selector;
> -int rpl, dpl, cpl;
> -
> -selector = selector1 & 0x;
> -eflags = cpu_cc_compute_all(env, CC_OP);
> -if ((selector & 0xfffc) == 0) {
> -goto fail;
> -}
> -if (load_segment(env, &e1, &e2, selector) != 0) {
> -goto fail;
> -}
> -  

[Qemu-devel] [PATCH v1 0/3] Memory: Trivial fixes

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

Some trivial fixes to memory API.


Peter Crosthwaite (3):
  memory: Fix comment typo
  memory: as_update_topology_pass: Improve comments
  memory: render_memory_region: factor out fr constant setters

 memory.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

-- 
1.8.3.rc1.44.gb387c77.dirty




[Qemu-devel] [PATCH v1 1/3] memory: Fix comment typo

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

s/ajacent/adjacent

Signed-off-by: Peter Crosthwaite 
---

 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/memory.c b/memory.c
index 5cb8f4a..60e033b 100644
--- a/memory.c
+++ b/memory.c
@@ -282,7 +282,7 @@ static bool can_merge(FlatRange *r1, FlatRange *r2)
 && r1->readonly == r2->readonly;
 }
 
-/* Attempt to simplify a view by merging ajacent ranges */
+/* Attempt to simplify a view by merging adjacent ranges */
 static void flatview_simplify(FlatView *view)
 {
 unsigned i, j;
-- 
1.8.3.rc1.44.gb387c77.dirty




[Qemu-devel] [PATCH v1 2/3] memory: as_update_topology_pass: Improve comments

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

These comments we're a little difficult to read. First one had
incorrect parenthesis. The part about attributes changing is
really applicable to the region being 'in both' rather than 'in
new'

Second comment has an obscure parenthetic about 'Logging may have
changed'. Made clearer, as this if is supposed to handle the case where
the memory region is unchanged (with the notable exception re logging).

Signed-off-by: Peter Crosthwaite 
---

 memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 60e033b..7e710c4 100644
--- a/memory.c
+++ b/memory.c
@@ -719,7 +719,7 @@ static void address_space_update_topology_pass(AddressSpace 
*as,
 || int128_lt(frold->addr.start, frnew->addr.start)
 || (int128_eq(frold->addr.start, frnew->addr.start)
 && !flatrange_equal(frold, frnew {
-/* In old, but (not in new, or in new but attributes changed). */
+/* In old but not in new, or in both but attributes changed. */
 
 if (!adding) {
 MEMORY_LISTENER_UPDATE_REGION(frold, as, Reverse, region_del);
@@ -727,7 +727,7 @@ static void address_space_update_topology_pass(AddressSpace 
*as,
 
 ++iold;
 } else if (frold && frnew && flatrange_equal(frold, frnew)) {
-/* In both (logging may have changed) */
+/* In both and unchanged (except logging may have changed) */
 
 if (adding) {
 MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, region_nop);
-- 
1.8.3.rc1.44.gb387c77.dirty




[Qemu-devel] [PATCH v1 3/3] memory: render_memory_region: factor out fr constant setters

2013-06-02 Thread peter . crosthwaite
From: Peter Crosthwaite 

These 4 replicated lines set properties of fr that are constant over
the course of the function. Factor out their repeated setting (and also
guards against them being set multiple times in the loop below).

Signed-off-by: Peter Crosthwaite 
---

 memory.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/memory.c b/memory.c
index 7e710c4..7bdfc58 100644
--- a/memory.c
+++ b/memory.c
@@ -556,6 +556,11 @@ static void render_memory_region(FlatView *view,
 base = clip.start;
 remain = clip.size;
 
+fr.mr = mr;
+fr.dirty_log_mask = mr->dirty_log_mask;
+fr.romd_mode = mr->romd_mode;
+fr.readonly = readonly;
+
 /* Render the region itself into any gaps left by the current view. */
 for (i = 0; i < view->nr && int128_nz(remain); ++i) {
 if (int128_ge(base, addrrange_end(view->ranges[i].addr))) {
@@ -564,12 +569,8 @@ static void render_memory_region(FlatView *view,
 if (int128_lt(base, view->ranges[i].addr.start)) {
 now = int128_min(remain,
  int128_sub(view->ranges[i].addr.start, base));
-fr.mr = mr;
 fr.offset_in_region = offset_in_region;
 fr.addr = addrrange_make(base, now);
-fr.dirty_log_mask = mr->dirty_log_mask;
-fr.romd_mode = mr->romd_mode;
-fr.readonly = readonly;
 flatview_insert(view, i, &fr);
 ++i;
 int128_addto(&base, now);
@@ -584,12 +585,8 @@ static void render_memory_region(FlatView *view,
 int128_subfrom(&remain, now);
 }
 if (int128_nz(remain)) {
-fr.mr = mr;
 fr.offset_in_region = offset_in_region;
 fr.addr = addrrange_make(base, remain);
-fr.dirty_log_mask = mr->dirty_log_mask;
-fr.romd_mode = mr->romd_mode;
-fr.readonly = readonly;
 flatview_insert(view, i, &fr);
 }
 }
-- 
1.8.3.rc1.44.gb387c77.dirty




[Qemu-devel] fstrim and cache=none

2013-06-02 Thread Dusty Mabe
Hi,

Is it valid to have cache=none and discard=unmap together? I notice
that when I have cache=none the fstrim operations inside of my guests
gives me an error and i get this from the sys filesystem:

[root@guest ~]# cat /sys/block/sdb/queue/discard*
4096
0
0


This wasn't very obvious to me to begin with and just wanted to see if
this is a bug or as designed.


libvirt 1.0.6
qemu 1.5.0

Thanks,
Dusty



[Qemu-devel] Boot guest OS on many nodes?

2013-06-02 Thread Steven.G
Dear all,

I've a scene,
I have many host nodes with linux kernels, and I want to boot one guest
OS on all these nodes use QEMU/KVM. In order that I can use all the
physical CPUs and Mems distributed on these linux nodes, and the Guest
looks like a SMP or shared-memory system.
How can I use QEMU/KVM to achieve this?Any one helps me with some basic
ideas.

Best regards

Steven



Re: [Qemu-devel] [PATCH 7/8] pseries: savevm support for PAPR virtual SCSI

2013-06-02 Thread Alexey Kardashevskiy
On 05/31/2013 08:41 PM, Paolo Bonzini wrote:
> Il 31/05/2013 12:25, Alexey Kardashevskiy ha scritto:
>> On 05/31/2013 08:07 PM, Benjamin Herrenschmidt wrote:
>>> On Fri, 2013-05-31 at 15:58 +1000, Alexey Kardashevskiy wrote:

 And another question (sorry I am not very familiar with terminology but
 cc:Ben is :) ) - what happens with indirect requests if migration happened
 in the middle of handling such a request? virtio-scsi does not seem to
 handle this situation anyhow, it just reconstructs the whole request and
 that's it.
>>>
>>> So Paolo, the crux of the question here is really whether we have any
>>> guarantee about the state of the request when this happens (by this I
>>> mean a save happening with requests still "in flight") ?
>>>
>>> IE. Can the request can be at any stage of processing, with the data
>>> transfer phase being half way through, or do we somewhat know for sure
>>> that the request will *not* have started transferring any data ?
>>>
>>> This is key, because in the latter case, all we really need to do is
>>> save the request itself, and re-parse it on restore as if it was
>>> new really (at least from a DMA descriptor perspective).
>>>
>>> However, if the data transfer is already half way through, we need to
>>> somewhat save the state of the data transfer machinery, ie. the position
>>> of the "cursor" that follows the guest-provided DMA descriptor list,
>>> etc... (which isn't *that* trivial since we have a concept of indirect
>>> descriptors and we use pointers to follow them, so we'd probably have
>>> to re-walk the whole user descriptors list until we reach the same 
>>> position).
> 
> It may be halfway through, but it is always restarted on the destination.
> 
> virtio-scsi parses the whole descriptor chain upfront and sends the
> guest addresses in the migration stream.
> 
>> Is not it the same QEMU thread which handles hcalls and QEMU console
>> commands so the migration cannot stop parsing/handling a vscsi_req?
> 
> The VM is paused and I/O is flushed at the point when the reqs are sent.
>  That's why you couldn't get a pending request.  Only failed requests
> remain in queue.


Ok. I implemented {save|load}_request for IBMVSCSI, started testing - the
destination system behaves very unstable, sometime it is a fault in
_raw_spin_lock or it looks okay but any attempt to read the filesystem
leads to 100% cpu load in qemu process and no response from the guest.

I tried virtio-scsi as well (as it was referred as a good example), it
fails in exactly the same way. So I started wondering - when did you try it
last time? :)

My test is:
1. create qcow2 image 8GB, put it to 2GB USB disk.
2. put 1.8GB "dummy" image onto the same USB disk.
3. run qemu with qcow2 image.
4. do "mkfs.ext4 /dev/sda" in the guest. It creates 300MB file when there
is enough space.
5. wait till the source qemu gets stopped due to io error (info status
confirms this).
6. migrate.
7. remove "dummy".
8. "c"ontinue in the destination guest.

Is it good/bad/ugly? What do I miss? Thanks!


-- 
Alexey



Re: [Qemu-devel] [PATCH 7/8] pseries: savevm support for PAPR virtual SCSI

2013-06-02 Thread Paolo Bonzini
Il 01/06/2013 02:01, Benjamin Herrenschmidt ha scritto:
> On Fri, 2013-05-31 at 12:41 +0200, Paolo Bonzini wrote:
> 
>> It may be halfway through, but it is always restarted on the destination.
> 
> "restarted" as in the whole transfer is restarted if any right ? So we
> can essentially consider as a new request for which we just did
> scsi_req_enqueue() ?
> 
> IE. We don't do direct DMA to guest pages just yet (we still do copies)
> so basically our process is:
> 
>  1- Obtain request from guest
>  2- Queue it (scsi_req_enqueue)
>  3- No transfer -> go away (completion is called)
>  4- Pre-process user descriptors (check desc type direct vs indirect,
> position our "cursor" walking them etc)
>  5- scsi_req_continue()
> .../... loop of callbacks & transfer
> 
> Now from what you say, I assume that regardless of the point where
> the request was, when we "resume" it will always be at step 4 ?
> 
> IE. I can just pre-process the descriptors again ? (I actually need
> to transfer them again from the guest since I suspect I clobber them
> at the very least due to byteswap) and call scsi_req_continue() and
> assume the transfer (if any) started from the beginning ?

Yes.  Unless the spec somehow lets the guest figure out the point at
which the whole chain has been pre-processed, and lets the guest modify
the chain at this point.  But if that's not the case, you can do that.
Memory has already been loaded when load_request runs.

Paolo



Re: [Qemu-devel] [PATCH 7/8] pseries: savevm support for PAPR virtual SCSI

2013-06-02 Thread Paolo Bonzini
Il 03/06/2013 07:46, Alexey Kardashevskiy ha scritto:
> Ok. I implemented {save|load}_request for IBMVSCSI, started testing - the
> destination system behaves very unstable, sometime it is a fault in
> _raw_spin_lock or it looks okay but any attempt to read the filesystem
> leads to 100% cpu load in qemu process and no response from the guest.
> 
> I tried virtio-scsi as well (as it was referred as a good example), it
> fails in exactly the same way. So I started wondering - when did you try it
> last time? :)

Perhaps a year ago.  Gerd must have tested usb-storage later than that
though.

> My test is:
> 1. create qcow2 image 8GB, put it to 2GB USB disk.
> 2. put 1.8GB "dummy" image onto the same USB disk.
> 3. run qemu with qcow2 image.
> 4. do "mkfs.ext4 /dev/sda" in the guest. It creates 300MB file when there
> is enough space.
> 5. wait till the source qemu gets stopped due to io error (info status
> confirms this).
> 6. migrate.
> 7. remove "dummy".
> 8. "c"ontinue in the destination guest.

Sounds good.  We really need testcases for this.  I'll take a look when
I come back from vacation.

Paolo



Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 17:05, Gleb Natapov ha scritto:
>>> Anthony requested that patches be made that generate the ACPI tables
>>> in QEMU for the upcoming hotplug work, so that they could be evaluated
>>> to see if they truly do need to live in QEMU or if the code could live
>>> in the firmware.  There were no objections.
>>
>> I volunteered to implement this.
>
> Why hotplug should generate ACPI code? It does not do so on real HW.

Hotplug can do a LoadTable and merge it into the existing ones.  But
then you do not need QEMU-time generation of tables to do the same thing
for cold-plug.

Paolo



Re: [Qemu-devel] fstrim and cache=none

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 22:21, Dusty Mabe ha scritto:
> Hi,
> 
> Is it valid to have cache=none and discard=unmap together? I notice
> that when I have cache=none the fstrim operations inside of my guests
> gives me an error

What error?  Can you add an extra disk and try

dd if=/dev/urandom of=/dev/sdc bs=1M count=32
sg_unmap -l0 -n64 /dev/sdc

from within the guest?

> and i get this from the sys filesystem:
> 
> [root@guest ~]# cat /sys/block/sdb/queue/discard*
> 4096
> 0
> 0

This is expected.  It won't change between discard=unmap and
discard=ignore, too.  (Tip: Use "grep . /sys/block/sdb/queue/discard*"
so that it shows the filename).

> 
> 
> This wasn't very obvious to me to begin with and just wanted to see if
> this is a bug or as designed.

It should be.  It may be a Linux bug, too.

Paolo




Re: [Qemu-devel] [PATCH 14/15] memory: return MemoryRegion from qemu_ram_addr_from_host

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 18:04, Peter Maydell ha scritto:
> On 2 June 2013 16:43, Paolo Bonzini  wrote:
>> -int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
>> +MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
>>  ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
> 
> This is weird, because now the _nofail and the standard
> versions of this function return different things. Why
> wouldn't a caller of the _nofail version potentially
> need the MemoryRegion* too?

I'll adjust both functions.

Paolo




Re: [Qemu-devel] [PATCH 00/15] Memory/IOMMU patches part 4: region ownership

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 18:12, Peter Maydell ha scritto:
> On 2 June 2013 16:43, Paolo Bonzini  wrote:
>> Reference counting the region piggybacks on reference counting of a QOM
>> object, the "owner" of the region.  The owner API is designed so that
>> it will be called as little as possible.  Unowned subregions will get a
>> region if memory_region_set_owner is called after the subregion is added.
>> This is in general the common case already; often setting the owner can
>> be delegated to a bus-specific API that already takes a DeviceState
>> (for example pci_register_bar or sysbus_init_mmio).
> 
> This feels a bit fragile to me -- there doesn't seem to be
> a clear rule for who has to set the owner of a region or
> when they have to do it, or for ensuring that it doesn't
> get forgotten altogether.

The best rule would be "who creates it".  This would touch half the
files in hw/, which I am trying to avoid.

So I'm settling for:

* in general, the bus sets the owner for regions that are managed by the
bus (patches 5/6/9);

* if the device plays directly with address_space_memory/io (shouldn't
happen except in very special cases, such as patch 7) or if regions are
added/deleted dynamically to a container (patches 8/10/11/12), then the
device must set the owner itself.

All stuff that should be in a comment, I guess...

> What happens if I take a MemoryRegion* that another device
> has exposed to me as a sysbus mmio region (and so claimed
> ownership of) and pass it to pci_register_bar()?

You get an assertion failure.

> Who owns it at that point? [That's a legitimate thing to do, I think,
> though I don't suppose anybody does it at the moment.
> Sysbus MMIOs aren't only for mapping in the system address
> space, they're a general way for one device to expose a
> MemoryRegion * for use by another device.]

I don't think it is legitimate, MMIO regions are just for use via
sysbus_map_mmio.  But the same scenario could apply without sysbus MMIO
in the future (e.g. via QOM properties), so it is a good question.

The right thing to do is to use a container or alias region, and put the
1st region inside it.  Then the 1st region keeps its owner, and the
container/alias gets a new one.

Paolo



Re: [Qemu-devel] [PATCH 02/15] memory: add ref/unref

2013-06-02 Thread Paolo Bonzini
Il 02/06/2013 17:58, Peter Maydell ha scritto:
>> > + * memory_region_ref: Add 1 to a memory region's reference count
>> > + *
>> > + * Whenever memory regions are accessed outside the BQL, they need to be
>> > + * preserved against hot-unplug.  MemoryRegions actually do not have their
>> > + * own reference count; they piggyback on a QOM object, their "owner".
>> > + * This function adds a reference to the owner.
> It doesn't make much sense to describe the function as
> "add 1 to a memory region's reference count" and then
> say that memory regions don't have reference counts...

The fact that the reference count is the owner's is really an
implementation detail.  The reference count is used entirely as if it
were the region's.

So the summary says it is the region's, the long description says what
happens under the hood.

Suggestions on rephrasing the comments are welcome.

Paolo



Re: [Qemu-devel] [PATCH uq/master] fix double free the memslot in kvm_set_phys_mem

2013-06-02 Thread Gleb Natapov
On Fri, May 31, 2013 at 04:52:18PM +0800, Xiao Guangrong wrote:
> Luiz Capitulino reported that guest refused to boot and qemu
> complained with:
> kvm_set_phys_mem: error unregistering overlapping slot: Invalid argument
> 
> It is caused by commit 235e8982ad that did double free for the memslot
> so that the second one raises the -EINVAL error
> 
> Fix it by reset memory size only if it is needed
> 
> Reported-by: Luiz Capitulino 
> Signed-off-by: Xiao Guangrong 
Thanks, applied.

> ---
>  kvm-all.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 8e7bbf8..405480e 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -206,7 +206,8 @@ static int kvm_set_user_memory_region(KVMState *s, 
> KVMSlot *slot)
>  if (s->migration_log) {
>  mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
>  }
> -if (mem.flags & KVM_MEM_READONLY) {
> +
> +if (slot->memory_size && mem.flags & KVM_MEM_READONLY) {
>  /* Set the slot size to 0 before setting the slot to the desired
>   * value. This is needed based on KVM commit 75d61fbc. */
>  mem.memory_size = 0;
> -- 
> 1.7.7.6

--
Gleb.