date:20150814

Re: [Qemu-devel] [PATCH RFC] pseries: define coldplugged devices as "configured"

2015-08-14 Thread Laurent Vivier



On 14/08/2015 07:20, Bharata B Rao wrote:
> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
>> When a device is hotplugged, attach() sets "configured" to
>> false, waiting an action from the OS to configure it and then
>> to call ibm,configure-connector. On ibm,configure-connector,
>> the hypervisor sets "configured" to true.
>>
>> In case of coldplugged device, attach() sets "configured" to
>> false, but firmware and OS never call the ibm,configure-connector
>> in this case, so it remains set to false.
>>
>> It could be harmless, but when we unplug a device, hypervisor
>> waits the device becomes configured because for it, a not configured
>> device is a device being configured, so it waits the end of configuration
>> to unplug it... and it never happens, so it is never unplugged.
> 
> Not true for at least logical DR device like CPU. I am able to cleanly
> unplug a cold plugged CPU in the patchset I posted at:
> 
> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> 
> And this is how the state transitions work for cold plugged CPU devices:

Could you try with a PCI card?

Thanks,
Laurent

Re: [Qemu-devel] [PATCH RFC] pseries: define coldplugged devices as "configured"

2015-08-14 Thread Laurent Vivier



On 14/08/2015 07:20, Bharata B Rao wrote:
> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
>> When a device is hotplugged, attach() sets "configured" to
>> false, waiting an action from the OS to configure it and then
>> to call ibm,configure-connector. On ibm,configure-connector,
>> the hypervisor sets "configured" to true.
>>
>> In case of coldplugged device, attach() sets "configured" to
>> false, but firmware and OS never call the ibm,configure-connector
>> in this case, so it remains set to false.
>>
>> It could be harmless, but when we unplug a device, hypervisor
>> waits the device becomes configured because for it, a not configured
>> device is a device being configured, so it waits the end of configuration
>> to unplug it... and it never happens, so it is never unplugged.
> 
> Not true for at least logical DR device like CPU. I am able to cleanly
> unplug a cold plugged CPU in the patchset I posted at:
> 
> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> 
> And this is how the state transitions work for cold plugged CPU devices:
> 
> - Cold plugged CPU DRC is explicitly set with allocation_state=USABLE
>   and isolation_state=UNISOLATED.
> - device_del results in drck->detach() that just returns by setting
>   drc->awaiting_release to true.
> - Unplug notification is sent to guest.
> - Guest comes back with set_indicator RTAS call for setting isolation_state
>   to ISOLATED. set_isolation_state() sets drc->configured to false.
> - Guest comes back again with set_indicator RTAS call for setting allocation
>   state to UNUSABLE. set_allocation_state() finalizes the device removal by
>   calling drck->detach()

It doesn't work for PCI, because (QEMU 2.4.0):

static int set_allocation_state(sPAPRDRConnector *drc,
                                sPAPRDRAllocationState state)
...
    if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
...
            drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
                         drc->detach_cb_opaque, NULL);
...
    }

> - drck->detach() now calls drc->detach_cb() that truly releases the
>   CPU resource by getting rid of vCPU thread in QEMU.

Laurent
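
The difference between the CPU and PCI paths described in this thread can be illustrated with a toy state model. This is only a sketch built from the descriptions above: every name prefixed with toy_ or Toy is hypothetical and is not part of QEMU's sPAPR DRC code, and the rule that the ISOLATED transition completes a pending release only for a configured device is an assumption taken from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names are QEMU's real sPAPR DRC API. */
typedef enum { TOY_TYPE_CPU, TOY_TYPE_PCI } ToyDrcType;

typedef struct {
    ToyDrcType type;
    bool configured;        /* true once ibm,configure-connector has run */
    bool awaiting_release;  /* unplug requested, waiting for the guest */
    bool released;          /* device fully removed on the hypervisor side */
} ToyDrc;

/* Build a connector; 'configured' is the flag the RFC changes for coldplug. */
static ToyDrc toy_drc_new(ToyDrcType type, bool configured)
{
    ToyDrc drc = { type, configured, false, false };
    return drc;
}

/* device_del: only records that a release is pending. */
static void toy_request_unplug(ToyDrc *drc)
{
    assert(!drc->released);
    drc->awaiting_release = true;
}

/* Guest sets the isolation state to ISOLATED (assumed rule, see lead-in). */
static void toy_guest_isolate(ToyDrc *drc)
{
    if (drc->awaiting_release && drc->configured) {
        drc->released = true;
        drc->awaiting_release = false;
    }
    drc->configured = false;
}

/* Guest sets the allocation state to UNUSABLE; PCI is skipped, mirroring
 * the type check in the snippet quoted above. */
static void toy_guest_set_unusable(ToyDrc *drc)
{
    if (drc->type != TOY_TYPE_PCI && drc->awaiting_release) {
        drc->released = true;
        drc->awaiting_release = false;
    }
}

/* Run the whole unplug sequence; report whether the device was released. */
static bool toy_unplug(ToyDrc drc)
{
    toy_request_unplug(&drc);
    toy_guest_isolate(&drc);
    toy_guest_set_unusable(&drc);
    return drc.released;
}
```

In this model, toy_unplug(toy_drc_new(TOY_TYPE_PCI, false)) stays unreleased, which is the coldplugged PCI case reported here. The same call with configured set to true, or with TOY_TYPE_CPU, completes the removal.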

Re: [Qemu-devel] [PATCH RFC] pseries: define coldplugged devices as "configured"

2015-08-14 Thread Bharata B Rao

On Fri, Aug 14, 2015 at 09:16:08AM +0200, Laurent Vivier wrote:
> 
> 
> On 14/08/2015 07:20, Bharata B Rao wrote:
> > On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
> >> When a device is hotplugged, attach() sets "configured" to
> >> false, waiting an action from the OS to configure it and then
> >> to call ibm,configure-connector. On ibm,configure-connector,
> >> the hypervisor sets "configured" to true.
> >>
> >> In case of coldplugged device, attach() sets "configured" to
> >> false, but firmware and OS never call the ibm,configure-connector
> >> in this case, so it remains set to false.
> >>
> >> It could be harmless, but when we unplug a device, hypervisor
> >> waits the device becomes configured because for it, a not configured
> >> device is a device being configured, so it waits the end of configuration
> >> to unplug it... and it never happens, so it is never unplugged.
> > 
> > Not true for at least logical DR device like CPU. I am able to cleanly
> > unplug a cold plugged CPU in the patchset I posted at:
> > 
> > https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> > 
> > And this is how the state transitions work for cold plugged CPU devices:
> 
> Could you try with a PCI card ?

Yes, there is an issue with removal of cold plugged PCI devices. I can see
the device getting completely removed in the guest but it still remains
in QEMU as shown by the QEMU monitor. So your patch fixes this by ensuring
complete removal.

Regards,
Bharata.

Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation

2015-08-14 Thread Li, Liang Z

> >>> Could please point out what issue the patch 317b0a6d8ba44e try to fix?
> >>> I found in live migration the cpu_synchronize_all_states will be called
> >>> twice, and it will take more than 1 ms sometimes. I try to do some
> >>> optimization but lack the knowledge about the background.
> >>
> >> What the code in 317b0a6d8ba44e requires is to retrieve the TSC value
> >> from the kernel.
> >
> > I know 317b0a6d8ba44e is to retrieve the TSC value, but I don't understand
> > why it is needed. During the live migration, the cpu_synchronize_all_states
> > will be called later after stopping kvm-clock. The env->tsc will be updated,
> > is that not enough? Or is there some case like call the
> > 'stop_vm(RUN_STATE_PAUSED )' or ' 'stop_vm (RUN_STATE_DEBUG) ', that
> > require updating the env->tsc? By google, I find that your patch try to fix
> > some issue, but I don't know what the exact issue.
> 
> I remember testing these, and I afair that was the reason:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html
> 
> --
> mg

Hi Mg,

Thanks for your reply. I have read the thread you linked. What does
'switching from old to new disk' mean? Could you give a detailed description?

Liang

Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation

2015-08-14 Thread Marcin Gibuła


On 2015-08-14 at 03:23, Li, Liang Z wrote:
>> On Thu, Aug 13, 2015 at 01:25:29AM +, Li, Liang Z wrote:
>>> Hi Paolo & Marcelo,
>>>
>>> Could please point out what issue the patch 317b0a6d8ba44e try to fix?  I
>>> found in live migration the cpu_synchronize_all_states will be called twice,
>>> and it will take more than 1 ms sometimes. I try to do some optimization but
>>> lack the knowledge about the background.
>>
>> What the code in 317b0a6d8ba44e requires is to retrieve the TSC value from
>> the kernel.
>
> I know 317b0a6d8ba44e is to retrieve the TSC value, but I don't understand why it is
> needed. During the live migration, the cpu_synchronize_all_states will be called
> later after stopping kvm-clock. The env->tsc will be updated, is that not enough?
> Or is there some case like call the 'stop_vm(RUN_STATE_PAUSED )' or ' 'stop_vm
> (RUN_STATE_DEBUG) ', that require updating the env->tsc? By google, I find that
> your patch try to fix some issue, but I don't know what the exact issue.

I remember testing these, and afair that was the reason:

http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html

--
mg

Re: [Qemu-devel] [PATCH RFC 02/10] maint: remove double semicolons in many files

2015-08-14 Thread Daniel P. Berrange

On Thu, Aug 13, 2015 at 06:57:55PM +0100, Peter Maydell wrote:
> On 31 July 2015 at 17:30, Daniel P. Berrange  wrote:
> > A number of source files have statements accidentally
> > terminated by a double semicolon - eg 'foo = bar;;'.
> > This is harmless but a mistake none the less.
> >
> > The tcg/ia64/tcg-target.c file is whitelisted because
> > it has valid use of ';;' in a comment containing assembly
> > code.
> >
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  block/vhdx.c | 2 +-
> >  cfg.mk   | 5 -
> >  hw/arm/vexpress.c| 4 ++--
> >  hw/intc/arm_gic.c| 2 +-
> >  numa.c   | 2 +-
> >  qga/commands-win32.c | 2 +-
> >  6 files changed, 10 insertions(+), 7 deletions(-)
> 
> If you kept the "enable the check in cfg.mk" change out of
> the patches like these, we could review and commit them
> without them being tangled up or waiting on the review of
> the syntax-checking infrastructure itself...

Sure, I just really wanted to illustrate the use of the syntax check
infrastructure with this series of fixes. Now I've done that I'll
resubmit just the fixes, while debate over the checking infrastructure
continues.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [RFC PATCH V7 07/19] protect TBContext with tb_lock.

2015-08-14 Thread Frederic Konrad


On 12/08/2015 20:20, Alex Bennée wrote:
> Frederic Konrad  writes:
>
>> On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
>>> From: KONRAD Frederic 
>>>
>>> This protects TBContext with tb_lock to make tb_* thread safe.
>>>
>>> We can still have issue with tb_flush in case of multithread TCG:
>>> An other CPU can be executing code during a flush.
>>>
>>> This can be fixed later by making all other TCG thread exiting before calling
>>> tb_flush().
>>>
>>> tb_find_slow is separated into tb_find_slow and tb_find_physical as the whole
>>> tb_find_slow doesn't require to lock the tb.
>>>
>>> Signed-off-by: KONRAD Frederic 
>>>
>>> Changes:
>> [...]
>>> @@ -675,6 +710,7 @@ static inline void code_gen_alloc(size_t tb_size)
>>>              CODE_GEN_AVG_BLOCK_SIZE;
>>>      tcg_ctx.tb_ctx.tbs =
>>>              g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
>>> +    qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>>>  }
>>>
>>>  /* Must be called before using the QEMU cpus. 'tb_size' is the size
>>> @@ -699,16 +735,22 @@ bool tcg_enabled(void)
>>>      return tcg_ctx.code_gen_buffer != NULL;
>>>  }
>>>
>>> -/* Allocate a new translation block. Flush the translation buffer if
>>> -   too many translation blocks or too much generated code. */
>>> +/*
>>> + * Allocate a new translation block. Flush the translation buffer if
>>> + * too many translation blocks or too much generated code.
>>> + * tb_alloc is not thread safe but tb_gen_code is protected by a mutex so this
>>> + * function is called only by one thread.
>>> + */
>>>  static TranslationBlock *tb_alloc(target_ulong pc)
>>>  {
>>> -    TranslationBlock *tb;
>>> +    TranslationBlock *tb = NULL;
>>>
>>>      if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
>>>          (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
>>>           tcg_ctx.code_gen_buffer_max_size) {
>>> -        return NULL;
>>> +        tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
>>> +        tb->pc = pc;
>>> +        tb->cflags = 0;
>>
>> Missed this wrong unreverted part which in the end doesn't do a tb_flush
>> when required and crashes!
>> Fixing that allows me to boot with jessie and virt.
>
> \o/
>
> Do you see crashes while it is running?
>
> It's interesting that I've not had a problem booting jessie with virt
> though - just crashes while hanging.
>
> Are you likely to push a v8 this week (or a temp branch?) with this and
> any other obvious fixes? I appreciate Paolo has given you a not-so-small
> pile of review comments as well so I wasn't looking for a complete new
> patch set!

Here is something I did yesterday:
multi_tcg_v7_bugfixed

The patch-set is a mess and not re-based on the patch-set sent by Paolo.

Fred

>> Fred
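
The tb_alloc problem discussed in this message comes down to an inverted capacity check: the block must be handed out only while there is room, and NULL must be returned when the buffer is full so that the caller can flush and retry. Below is a minimal standalone sketch of that contract. SketchTB, SketchTBContext and the sketch_* functions are hypothetical stand-ins, not QEMU's TranslationBlock, TBContext or tb_flush, and the code-buffer size check and all locking are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_BLOCKS 2

/* Hypothetical stand-ins for a translation block and its container. */
typedef struct {
    unsigned long pc;
    unsigned cflags;
} SketchTB;

typedef struct {
    SketchTB tbs[SKETCH_MAX_BLOCKS];
    int nb_tbs;
} SketchTBContext;

/* Drop every block so that allocation can start again from slot 0. */
static void sketch_tb_flush(SketchTBContext *ctx)
{
    ctx->nb_tbs = 0;
}

/* Return a fresh block, or NULL when the table is full. */
static SketchTB *sketch_tb_alloc(SketchTBContext *ctx, unsigned long pc)
{
    SketchTB *tb;

    if (ctx->nb_tbs >= SKETCH_MAX_BLOCKS) {
        return NULL;            /* full: the caller has to flush first */
    }
    tb = &ctx->tbs[ctx->nb_tbs++];
    tb->pc = pc;
    tb->cflags = 0;
    return tb;
}

/* Caller side: on NULL, flush once and retry; the retry cannot fail. */
static SketchTB *sketch_tb_gen(SketchTBContext *ctx, unsigned long pc)
{
    SketchTB *tb = sketch_tb_alloc(ctx, pc);

    if (tb == NULL) {
        sketch_tb_flush(ctx);
        tb = sketch_tb_alloc(ctx, pc);
        assert(tb != NULL);
    }
    return tb;
}
```

With the check inverted, as in the quoted hunk, the allocator would hand out slots only once the table is already full and the flush would never be triggered, which matches the crash described above.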

Re: [Qemu-devel] [PULL 0/4] target-mips queue

2015-08-14 Thread Peter Maydell

On 13 August 2015 at 17:45, Leon Alrae  wrote:
> Hi,
>
> First target-mips pull request for 2.5 consisting of patches sent during
> 2.4 freeze.
>
> Thanks,
> Leon
>
> Cc: Peter Maydell 
> Cc: Aurelien Jarno 
>
> The following changes since commit ca0e5d8b0d065a95d0f9042f71b2ace45b015596:
>
>   Open 2.5 development tree (2015-08-11 23:15:55 +0100)
>
> are available in the git repository at:
>
>   git://github.com/lalrae/qemu.git tags/mips-20150813
>
> for you to fetch changes up to c85570163bdf1ba29cb52a63f22ff1c48f1b9398:
>
>   target-mips: Use CPU_LOG_INT for logging related to interrupts (2015-08-13 
> 16:22:53 +0100)
>
> 
> MIPS patches 2015-08-13
>
> Changes:
> * mips32r5-generic CPU updated and renamed to P5600
> * improvements in LWL/LDL, logging and fulong2e

Applied, thanks.

-- PMM

Re: [Qemu-devel] Plan for using softmmu with linux-user

2015-08-14 Thread Peter Maydell

On 14 August 2015 at 04:25, gchen gchen  wrote:
>  - If the performance of "linux-user + softmmu + tci" is not acceptable
>(at present, I am not quite sure), we have to implement SW64 tcg host
>target instead of tci.

If you care even slightly about performance, then do not use TCI.
A tcg backend is only about 2000 lines of code, they're not terribly
difficult to implement.

-- PMM

Re: [Qemu-devel] [PATCH QEMU] vmstate: Remove redefinition of VMSTATE_UINT32_ARRAY

2015-08-14 Thread Peter Maydell

On 14 August 2015 at 07:16, Soren Brinkmann  wrote:
> The macro is defined twice in identical ways.
>
> Signed-off-by: Soren Brinkmann 
> ---
> I have the feeling I'm missing a tiny one-letter difference or some
> ifdef, but I believe the mentioned macro is defined twice.

Duplicate accidentally introduced way back in
commit 9ba2f6601d92c73 as far as I can tell.

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation

2015-08-14 Thread Marcin Gibuła


> Thanks for your reply, I have read the thread in your email, what's the
> mean of 'switching from old to new disk', could give a detail description?

The test case was like this (using libvirt):

1. Get a VM running (Linux, using kvmclock),
2. Use blockcopy to copy disk data from one location to another,
3. Issue blockjob --pivot (to finish mirroring).

From what I remember, at step 3 the VM is momentarily paused and resumed,
so the kvm state change handler is called twice. Without this patch, the VM
hung because its time went backwards (or qemu crashed if the assertion was
not compiled out).


--
mg

Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation

2015-08-14 Thread Li, Liang Z

> Subject: Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc
> value for kvmclock_current_nsec calculation
> 
> >  Thanks for your reply, I have read the thread in your email, what's the
> mean of 'switching from old to new disk', could give a detail description?
> 
> The test case was like that (using libvirt):
> 
> 1. Get VM running (linux, using kvmclock),
> 2. Use blockcopy to copy disk data from one location to another,
> 3. Issue blockjob --pivot (to finish mirroring)
> 
>  From what I remember, at point 3, VM is momentarily paused and resumed,
> so kvm state change handler is called twice. Without this patch, the VM
> hanged because its time goes backwards (or qemu crashed if assertion was
> not compiled out).
> 
> --
> mg

So the problem is caused by stop_vm(RUN_STATE_PAUSED): in this case
env->tsc is not updated, which leads to the issue. Is that right?
If cpu_clean_all_dirty() is needed only because of the APIC status, I think
we can call cpu_synchronize_all_states() in do_vm_stop, after
vm_state_notify(), when RUN_STATE_PAUSED is hit. At that point all the
device models are stopped, so there is no outdated APIC status.

I want to write a patch that fixes this issue in another way. Could you help
verify it in your environment? I would very much appreciate it.

Thanks.

Liang

Re: [Qemu-devel] [PATCH 00/10] translate-all.c thread-safety

2015-08-14 Thread Frederic Konrad


On 12/08/2015 18:40, Paolo Bonzini wrote:
> Hi, this is my attempt at 1) extracting upstreamable parts out of Fred's
> MTTCG,

Can you take this one as well, after "replace spinlock by QemuMutex":
"remove unused spinlock."?

Thanks,
Fred

> and 2) documenting what's going on in user-mode MTTCG 3) fix
> one bug in the process.  I couldn't find any other locking problem
> from reading the code.
>
> The final two patches are not really upstreamable because they
> add some still unnecessary locks to system emulation, but I included
> them to show what's going on.  With this locking logic I do not need
> tb_lock to be recursive anymore.
>
> Paolo
>
> KONRAD Frederic (4):
>   cpus: protect work list with work_mutex
>   cpus: remove tcg_halt_cond global variable.
>   replace spinlock by QemuMutex.
>   tcg: protect TBContext with tb_lock.
>
> Paolo Bonzini (8):
>   exec-all: remove non-TCG stuff from exec-all.h header.
>   cpu-exec: elide more icount code if CONFIG_USER_ONLY
>   tcg: code_bitmap is not used by user-mode emulation
>   tcg: comment on which functions have to be called with mmap_lock held
>   tcg: add memory barriers in page_find_alloc accesses
>   exec: make mmap_lock/mmap_unlock globally available
>   cpu-exec: fix lock hierarchy for user-mode emulation
>   tcg: comment on which functions have to be called with tb_lock held
>
>  bsd-user/qemu.h  |   2 -
>  cpu-exec.c   | 107 +--
>  cpus.c   |  34 ++
>  exec.c   |   4 ++
>  hw/i386/kvmvapic.c   |   2 +
>  include/exec/exec-all.h  |  19 +++---
>  include/exec/ram_addr.h  |   1 +
>  include/qom/cpu.h|   9 ++-
>  include/sysemu/sysemu.h  |   3 +
>  linux-user/main.c|   6 +-
>  linux-user/qemu.h|   2 -
>  qom/cpu.c|   1 +
>  target-i386/cpu.h|   3 +
>  target-i386/mem_helper.c |  25 +++-
>  target-i386/translate.c  |   2 +
>  tcg/tcg.h|   6 ++
>  translate-all.c  | 161 +--
>  17 files changed, 290 insertions(+), 97 deletions(-)

Re: [Qemu-devel] Plan for using softmmu with linux-user

2015-08-14 Thread gchen gchen

On 2015年08月14日 16:44, Peter Maydell wrote:
> On 14 August 2015 at 04:25, gchen gchen  wrote:
>>  - If the performance of "linux-user + softmmu + tci" is not acceptable
>>(at present, I am not quite sure), we have to implement SW64 tcg host
>>target instead of tci.
>
> If you care even slightly about performance, then do not use TCI.
> A tcg backend is only about 2000 lines of code, they're not terribly
> difficult to implement.
>

OK, thank you for your suggestion, but I guess I still need to get tci
working correctly first:

 - If I implement an SW64 tcg backend, I guess I can't get help from qemu
   upstream: I don't think SW64 is valuable enough for upstream (nor am I
   sure that I can implement an Alpha tcg backend during working hours).

 - tci is one tcg backend, and at present it lets i386 console programs
   work under Alpha. So I can learn about tcg backends by fixing its X issues
   with help from upstream (and then implement the SW64 tcg backend next).

 - Also, tci is only slightly slower than a native tcg backend, so if we are
   lucky, its performance may be enough too! (I hope so).

Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

Re: [Qemu-devel] [PATCH RFC 00/10] Enable repository wide style checking

2015-08-14 Thread Daniel P. Berrange

On Thu, Aug 13, 2015 at 09:39:48PM +0100, Peter Maydell wrote:
> On 13 August 2015 at 19:27, Eric Blake  wrote:
> > It's worth asking the gnulib folks for an opinion on whether relaxing
> > the license on maint.mk and GNUmakefile to explicitly go back to GPLv2+,
> > and/or explicitly add some explicit exception clause like gcc that makes
> > it clear that using these files to build does not taint the built
> > product.  Personally, I see no problem with using GPLv3'd tools (after
> > all, qemu requires GPLv3 GNU make, and gcc is also GPLv3 although clang
> > can step around that one), but I also see your reluctance of even having
> > a file in the qemu.git repo that has a GPLv3 clause.
> 
> Right; we don't ship make or gcc in our code repo, and using
> external-to-the-repository tools which happen to be GPLv3 is
> obviously fine. Similarly, if you used the maint.mk script externally
> as a tool which allowed you to find bugs which you submitted
> patches to fix that wouldn't be a problem. I just don't want
> a GPLv3-licensed file in the git repo and an integrated part
> of our build-and-test system...

Ok, I certainly understand why we can't have GPLv3 code built
into QEMU, but I thought build-system tests would be ok because
it does not affect the built binaries in any way.

> I would certainly appreciate a maint.mk with a GPLv2-or-later
> license. Our other options are (a) use the last v2+ version
> (which is what we do with our binutils disassemblers)
> (b) do the style checks we care about some other way or
> (c) don't bother doing the style checks at all.

Option (b) could involve re-factoring the existing check_patch.pl
script to give us the 2 main benefits from the gnulib check code

 - Ability to turn on/off individual rules on a per-file basis
 - Ability to run against the entire codebase not just patches

IIUC, the check_patch.pl script was imported from Linux, so I'm
not sure if there is a general desire to minimize the divergence
from the original file, or whether refactoring would be welcome?

I can certainly explore the viability of such an approach if people
are conceptually open to some significant changes to check_patch.pl

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH 0/5] Wire up various EL2/EL3 address translation ops

2015-08-14 Thread Peter Maydell

Ping?

thanks
-- PMM

On 24 July 2015 at 16:20, Peter Maydell  wrote:
> This patch series wires up some of the EL2 and EL3 address
> translation operations which we were missing:
>  * the AArch64 EL2 and EL3 AT ops
>  * the AArch32 ATS12NSO ops
>  * the AArch32 ATS1H ops
>
> Most of these are still not accessible or not very interesting
> because we don't have any CPUs which set ARM_FEATURE_EL2 yet.
> Providing ATS12NSO for AArch32-with-EL3 CPUs is a genuine bugfix.
>
> I included a bugfix for the 32-bit EL2 stage 1 translation
> regime. I think that the only remaining thing missing for EL2
> (based on eyeballing our current code) is implementing stage
> 2 translations.
>
> NB: this code isn't really tested, but it looks nice when you
> read it.
>
> Peter Maydell (5):
>   target-arm: there is no TTBR1 for 32-bit EL2 stage 1 translations
>   target-arm: Wire up AArch64 EL2 and EL3 address translation ops
>   target-arm: Add CP_ACCESS_TRAP_UNCATEGORIZED_EL2,3
>   target-arm: Enable the AArch32 ATS12NSO ops
>   target-arm: Implement AArch32 ATS1H* operations
>
>  target-arm/cpu.h   |  3 ++
>  target-arm/helper.c| 88 
> ++
>  target-arm/op_helper.c |  8 +
>  3 files changed, 92 insertions(+), 7 deletions(-)

Re: [Qemu-devel] [PATCH v2 0/6] replace qemu_fls() with pow2ceil()/pow2floor()

2015-08-14 Thread Peter Maydell

Ping?

(Patches 1 and 2 have been reviewed; thanks.)

-- PMM

On 24 July 2015 at 13:33, Peter Maydell  wrote:
> We have a qemu_fls() function which is just a silly wrapper
> around clz32() and which is used in only a handful of places
> in the codebase. It turns out that all of those are really
> trying to round up or down to a power of 2, which is something
> we have utility functions for. This series replaces all
> the qemu_fls() calls with pow2ceil() or pow2floor(), and then
> removes the now-unused function.
>
> For the case where you really want to do bit counting rather
> than just power-of-2 rounding, you should use the clz/clo
> functions directly.
>
> No changes from v1 to v2 except for a new patch 6 which moves
> the pow2ceil and pow2floor functions to inline.
>
> Peter Maydell (6):
>   hw/pci: Use pow2ceil() rather than hand-calculation
>   hw/virtio/virtio-pci: Use pow2ceil() rather than hand-calculation
>   hw/block/nvme.c: Use pow2ceil() rather than hand-calculation
>   exec.c: Use pow2floor() rather than hand-calculation
>   Remove unused qemu_fls function
>   Make pow2ceil() and pow2floor() inline
>
>  exec.c|  4 +---
>  hw/block/nvme.c   |  2 +-
>  hw/pci/msix.c |  4 +---
>  hw/pci/pci.c  |  4 +---
>  hw/virtio/virtio-pci.c|  4 +---
>  include/qemu-common.h | 17 +
>  include/qemu/host-utils.h | 33 +
>  util/cutils.c | 28 
>  8 files changed, 39 insertions(+), 57 deletions(-)
>
> --
> 1.9.1

Re: [Qemu-devel] [PATCH 0/5] arm_gic: Drop running_irq and last_active arrays

2015-08-14 Thread Peter Maydell

Ping?

thanks
-- PMM

On 28 July 2015 at 14:22, Peter Maydell  wrote:
> This patchset is a bit of cleanup to our GIC implementation that
> I've wanted to do for ages.
>
> Our current GIC code uses a couple of arrays (running_irq and
> last_active) to track currently active interrupts so that
> it can correctly determine the running priority as potentially
> nested interrupts are taken and then dismissed. This does
> work, but:
>  * the effectively-a-linked-list is not very hardware-ish,
>which is usually a bit of a red flag when doing modelling
>  * the GICv2 spec adds the Active Priority Registers which are
>for use for saving and restoring this sort of state, and
>implementing these properly effectively constrains you to
>an implementation that doesn't look like what we have now
>  * it doesn't really fit with the GIC grouping and security
>extensions, where a guest can say "dismiss the last group 1
>interrupt" rather than just "dismiss the last interrupt".
>
> This series gets rid of those arrays and instead uses the
> Active Priority Registers to do the job. The APRs have one
> bit per "preemption level" (ie per distinct group priority).
> When we take an interrupt we set the appropriate bit to 1
> (and this will always be the lowest set bit in the register,
> because low-numbered priorities are higher and we wouldn't
> have taken the interrupt unless it was higher than our current
> priority). Similarly, when we dismiss an interrupt this just
> clears the lowest set bit in the register, which must be the
> current active interrupt. (It's important not to try to look
> at the current configured priority of the interrupt number,
> because the guest might have reconfigured it while it was
> active.) The new running priority is then calculable by
> looking at the new lowest set bit.
>
> The new code also takes a step in the direction of
> separating the idea of "priority drop" from "deactivate interrupt",
> which we will need to implement the GICv2 feature which allows
> guests to do these two things as separate operations. There's
> more work to do in this area though.
>
> Patch series structure:
>  * patch 1 disentangles the v7M NVIC from some of the internal
>state we're about to rewrite
>  * patch 2 fixes a bug that would have meant we could have multiple
>active interrupts at the same group priority, which (a) isn't
>permitted and (b) would break the redesign we're about to do
>  * patch 3 is fixing the guest accessors for the APR registers
>  * patch 4 is the meat of the change
>  * patch 5 is a bonus bugfix
>
>
> Peter Maydell (5):
>   armv7m_nvic: Implement ICSR without using internal GIC state
>   hw/intc/arm_gic: Running priority is group priority, not full priority
>   hw/intc/arm_gic: Fix handling of GICC_APR, GICC_NSAPR registers
>   hw/intc/arm_gic: Drop running_irq and last_active arrays
>   hw/intc/arm_gic: Actually set the active bits for active interrupts
>
>  hw/intc/arm_gic.c| 245 
> ++-
>  hw/intc/arm_gic_common.c |   8 +-
>  hw/intc/armv7m_nvic.c|  13 +--
>  include/hw/intc/arm_gic_common.h |  11 +-
>  4 files changed, 224 insertions(+), 53 deletions(-)

Re: [Qemu-devel] [PATCH 0/4] target-arm: Implement missing EL3 (and EL2) registers

2015-08-14 Thread Peter Maydell

Ping?

thanks
-- PMM

On 30 July 2015 at 19:36, Peter Maydell  wrote:
> This series adds a handful of EL3 system registers that
> we were missing. It also includes the EL2 flavours
> where there were obvious easy parallels. I think this
> means we now have all the EL3 sysregs we care about.
> (A previous series added missing address translation
> operations; I still have to do the missing TLB ops.)
>
> None of these registers are exciting; they're all either
> reads-as-written or RAZ/WI.
>
> A note for people who care about EL2: I notice that a
> lot of AArch32 EL2 registers have the access permission
> pattern of "accessible from EL2(NS) and from EL3 if
> SCR.NS==1, but traps if accessed from EL3 if SCR.NS==0".
> We don't implement this wrinkle (we won't trap the
> erroneous EL3 access). This is true of the EL2 regs I
> add here, but then it's true of all our existing ones...
>
> Peter Maydell (4):
>   target-arm: Add missing MAIR_EL3 and TPIDR_EL3 registers
>   target-arm: Implement missing AMAIR registers
>   target-arm: Implement missing AFSR registers
>   target-arm: Implement missing ACTLR registers
>
>  target-arm/helper.c | 74 
> -
>  1 file changed, 68 insertions(+), 6 deletions(-)
>

Re: [Qemu-devel] [PATCH 1/6] cputlb: Add functions for flushing TLB for a single MMU index

2015-08-14 Thread Peter Maydell

On 7 August 2015 at 13:33, Peter Maydell  wrote:
> Guest CPU TLB maintenance operations may be sufficiently
> specialized to only need to flush TLB entries corresponding
> to a particular MMU index. Implement cputlb functions for
> this, to avoid the inefficiency of flushing TLB entries
> which we don't need to.
>
> Signed-off-by: Peter Maydell 
> ---
>  cputlb.c| 81 
> +
>  include/exec/exec-all.h | 47 
>  2 files changed, 128 insertions(+)
>
> diff --git a/cputlb.c b/cputlb.c
> index a506086..a1996ba 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -69,6 +69,39 @@ void tlb_flush(CPUState *cpu, int flush_global)
>  tlb_flush_count++;
>  }
>
> +static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
> +{
> +CPUArchState *env = cpu->env_ptr;
> +
> +#if defined(DEBUG_TLB)
> +printf("tlb_flush_by_mmuidx %d:\n", mmu_idx);
> +#endif

This debug tracing doesn't compile if enabled -- it was written
to go with my initial implementation which took a single
mmu_idx rather than varargs, and I forgot to update it. I'll
send out a v2 shortly.

thanks
-- PMM
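
For illustration, here is a standalone sketch of a varargs flush in which the debug trace sits inside the loop, where an mmu_idx variable actually exists. It assumes the index list is terminated by a negative value. SketchTlb, the sketch_* functions and SKETCH_DEBUG_TLB are all hypothetical names, and this is not the cputlb.c code or the promised v2.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NB_MMU_MODES 4
#define SKETCH_TLB_SIZE 8

/* Hypothetical TLB: one validity flag per entry, per MMU index. */
typedef struct {
    int valid[SKETCH_NB_MMU_MODES][SKETCH_TLB_SIZE];
} SketchTlb;

/* Mark every entry valid, so that flushes have something to clear. */
static void sketch_tlb_fill(SketchTlb *tlb)
{
    int i, j;

    for (i = 0; i < SKETCH_NB_MMU_MODES; i++) {
        for (j = 0; j < SKETCH_TLB_SIZE; j++) {
            tlb->valid[i][j] = 1;
        }
    }
}

/* Count the entries that are still valid for one MMU index. */
static int sketch_count_valid(const SketchTlb *tlb, int mmu_idx)
{
    int j, n = 0;

    for (j = 0; j < SKETCH_TLB_SIZE; j++) {
        n += tlb->valid[mmu_idx][j];
    }
    return n;
}

/* Flush each listed MMU index; the list ends at the first negative value. */
static void sketch_v_flush_by_mmuidx(SketchTlb *tlb, va_list argp)
{
    for (;;) {
        int mmu_idx = va_arg(argp, int);

        if (mmu_idx < 0) {
            break;
        }
        assert(mmu_idx < SKETCH_NB_MMU_MODES);
#if defined(SKETCH_DEBUG_TLB)
        /* Tracing is per index, so it compiles when enabled. */
        printf("sketch_flush_by_mmuidx %d\n", mmu_idx);
#endif
        memset(tlb->valid[mmu_idx], 0, sizeof(tlb->valid[mmu_idx]));
    }
}

/* Public entry point: sketch_flush_by_mmuidx(tlb, idx0, idx1, ..., -1). */
static void sketch_flush_by_mmuidx(SketchTlb *tlb, ...)
{
    va_list argp;

    va_start(argp, tlb);
    sketch_v_flush_by_mmuidx(tlb, argp);
    va_end(argp);
}
```

A call such as sketch_flush_by_mmuidx(&tlb, 0, 2, -1) clears indexes 0 and 2 and leaves the others untouched.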

[Qemu-devel] [PATCH] Move RAMBlock and ram_list to ram_addr.h

2015-08-14 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
---
 include/exec/cpu-all.h  | 41 -
 include/exec/ram_addr.h | 40 
 2 files changed, 40 insertions(+), 41 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ea6a9a6..175f376 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -273,44 +273,6 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 #if !defined(CONFIG_USER_ONLY)
 
-/* memory API */
-
-typedef struct RAMBlock RAMBlock;
-
-struct RAMBlock {
-struct rcu_head rcu;
-struct MemoryRegion *mr;
-uint8_t *host;
-ram_addr_t offset;
-ram_addr_t used_length;
-ram_addr_t max_length;
-void (*resized)(const char*, uint64_t length, void *host);
-uint32_t flags;
-/* Protected by iothread lock.  */
-char idstr[256];
-/* RCU-enabled, writes protected by the ramlist lock */
-QLIST_ENTRY(RAMBlock) next;
-int fd;
-};
-
-static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
-{
-assert(offset < block->used_length);
-assert(block->host);
-return (char *)block->host + offset;
-}
-
-typedef struct RAMList {
-QemuMutex mutex;
-/* Protected by the iothread lock.  */
-unsigned long *dirty_memory[DIRTY_MEMORY_NUM];
-RAMBlock *mru_block;
-/* RCU-enabled, writes protected by the ramlist lock. */
-QLIST_HEAD(, RAMBlock) blocks;
-uint32_t version;
-} RAMList;
-extern RAMList ram_list;
-
 /* Flags stored in the low bits of the TLB virtual address.  These are
defined so that fast path ram access is all zeros.  */
 /* Zero if TLB entry is valid.  */
@@ -323,9 +285,6 @@ extern RAMList ram_list;
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
-ram_addr_t last_ram_offset(void);
-void qemu_mutex_lock_ramlist(void);
-void qemu_mutex_unlock_ramlist(void);
 #endif /* !CONFIG_USER_ONLY */
 
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c113f21..c400a75 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -22,6 +22,46 @@
 #ifndef CONFIG_USER_ONLY
 #include "hw/xen/xen.h"
 
+typedef struct RAMBlock RAMBlock;
+
+struct RAMBlock {
+struct rcu_head rcu;
+struct MemoryRegion *mr;
+uint8_t *host;
+ram_addr_t offset;
+ram_addr_t used_length;
+ram_addr_t max_length;
+void (*resized)(const char*, uint64_t length, void *host);
+uint32_t flags;
+/* Protected by iothread lock.  */
+char idstr[256];
+/* RCU-enabled, writes protected by the ramlist lock */
+QLIST_ENTRY(RAMBlock) next;
+int fd;
+};
+
+static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
+{
+assert(offset < block->used_length);
+assert(block->host);
+return (char *)block->host + offset;
+}
+
+typedef struct RAMList {
+QemuMutex mutex;
+/* Protected by the iothread lock.  */
+unsigned long *dirty_memory[DIRTY_MEMORY_NUM];
+RAMBlock *mru_block;
+/* RCU-enabled, writes protected by the ramlist lock. */
+QLIST_HEAD(, RAMBlock) blocks;
+uint32_t version;
+} RAMList;
+extern RAMList ram_list;
+
+ram_addr_t last_ram_offset(void);
+void qemu_mutex_lock_ramlist(void);
+void qemu_mutex_unlock_ramlist(void);
+
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 bool share, const char *mem_path,
 Error **errp);
-- 
2.4.3

[Qemu-devel] [PATCH] trace-events: Add hmp completion

2015-08-14 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Add completion for the trace event names in the hmp trace-event
command.

Signed-off-by: Dr. David Alan Gilbert 
---
 hmp-commands.hx |  1 +
 hmp.h   |  1 +
 monitor.c   | 20 
 3 files changed, 22 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d3b7932..94d2c39 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -271,6 +271,7 @@ ETEXI
 .params = "name on|off",
 .help   = "changes status of a specific trace event",
 .mhandler.cmd = hmp_trace_event,
+.command_completion = trace_event_completion,
 },
 
 STEXI
diff --git a/hmp.h b/hmp.h
index 0cf4f2a..b8f5d33 100644
--- a/hmp.h
+++ b/hmp.h
@@ -113,6 +113,7 @@ void set_link_completion(ReadLineState *rs, int nb_args, const char *str);
 void netdev_add_completion(ReadLineState *rs, int nb_args, const char *str);
 void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str);
 void ringbuf_write_completion(ReadLineState *rs, int nb_args, const char *str);
+void trace_event_completion(ReadLineState *rs, int nb_args, const char *str);
 void watchdog_action_completion(ReadLineState *rs, int nb_args,
 const char *str);
 void migrate_set_capability_completion(ReadLineState *rs, int nb_args,
diff --git a/monitor.c b/monitor.c
index aeea2b5..2e6abac 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4429,6 +4429,26 @@ void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str)
 }
 }
 
+void trace_event_completion(ReadLineState *rs, int nb_args, const char *str)
+{
+size_t len;
+
+len = strlen(str);
+readline_set_completion_index(rs, len);
+if (nb_args == 2) {
+TraceEventID id;
+for (id = 0; id < trace_event_count(); id++) {
+const char *event_name = trace_event_get_name(trace_event_id(id));
+if (!strncmp(str, event_name, len)) {
+readline_add_completion(rs, event_name);
+}
+}
+} else if (nb_args == 3) {
+add_completion_option(rs, str, "on");
+add_completion_option(rs, str, "off");
+}
+}
+
 void watchdog_action_completion(ReadLineState *rs, int nb_args, const char *str)
 {
 int i;
-- 
2.4.3
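
Note on the patch above: the heart of trace_event_completion() is a prefix match of the partially typed argument against every trace event name. The following is only a standalone sketch of that matching step, not the monitor code itself. It assumes a hypothetical fixed list demo_events[] in place of trace_event_get_name(), and it stores matches in a caller-supplied array instead of calling readline_add_completion().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event names; the real list comes from the trace backend. */
static const char *const demo_events[] = {
    "bdrv_aio_readv",
    "bdrv_aio_writev",
    "qemu_coroutine_enter",
};

/*
 * Store in out[] (at most max entries) every name that starts with the
 * partial string str, and return the number of matches stored.
 */
static size_t complete_prefix(const char *str, const char **out, size_t max)
{
    size_t len = strlen(str);
    size_t n = 0;
    size_t i;

    assert(str != NULL && out != NULL);
    for (i = 0; i < sizeof(demo_events) / sizeof(demo_events[0]); i++) {
        /* strncmp() returning 0 over len bytes means str is a prefix. */
        if (n < max && !strncmp(str, demo_events[i], len)) {
            out[n++] = demo_events[i];
        }
    }
    return n;
}
```

An empty partial string matches every name, which is why pressing tab with nothing typed lists all events.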

Re: [Qemu-devel] [PATCH] mirror: Fix coroutine reentrance

2015-08-14 Thread Stefan Hajnoczi

On Thu, Aug 13, 2015 at 10:41:50AM +0200, Kevin Wolf wrote:
> This fixes a regression introduced by commit dcfb3beb ("mirror: Do zero
> write on target if sectors not allocated"), which was reported to cause
> aborts with the message "Co-routine re-entered recursively".
> 
> The cause for this bug is the following code in mirror_iteration_done():
> 
> if (s->common.busy) {
> qemu_coroutine_enter(s->common.co, NULL);
> }
> 
> This has always been ugly because - unlike most places that reenter - it
> doesn't have a specific yield that it pairs with, but is more
> uncontrolled.  What we really mean here is "reenter the coroutine if
> it's in one of the four explicit yields in mirror.c".
> 
> This used to be equivalent with s->common.busy because neither
> mirror_run() nor mirror_iteration() call any function that could yield.
> However since commit dcfb3beb this doesn't hold true any more:
> bdrv_get_block_status_above() can yield.
> 
> So what happens is that bdrv_get_block_status_above() wants to take a
> lock that is already held, so it adds itself to the queue of waiting
> coroutines and yields. Instead of being woken up by the unlock function,
> however, it gets woken up by mirror_iteration_done(), which is obviously
> wrong.
> 
> In most cases the code actually happens to cope fairly well with such
> cases, but in this specific case, the unlock must already have scheduled
> the coroutine for wakeup when mirror_iteration_done() reentered it. And
> then the coroutine happened to process the scheduled restarts and tried
> to reenter itself recursively.
> 
> This patch fixes the problem by pairing the reenter in
> mirror_iteration_done() with specific yields instead of abusing
> s->common.busy.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Kevin Wolf 
> ---
>  block/mirror.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH RFC 00/10] Enable repository wide style checking

2015-08-14 Thread Peter Maydell

On 14 August 2015 at 11:30, Paul Eggert  wrote:
> Peter Maydell wrote:
>> I just don't want
>> a GPLv3-licensed file in the git repo and an integrated part
>> of our build-and-test system...
>
>
> My kneejerk reaction is that the build procedures in question are large
> enough that they should stay GPLv3.  If you don't want those files in your
> git repo you can simply fetch them as part of your bootstrap or autogen.sh
> or whatever. Although this might not mollify people who worry about GPLv3
> cooties infecting their executables, catering to paranoia is not high on our
> list of things to do.

My objections to the GPLv3 here are purely pragmatic. QEMU contains
too much GPLv2-only code to feasibly rewrite, and GPLv2 and v3 aren't
compatible. Therefore we can't use GPLv3 code. That's sometimes
awkward for us in that it prevents us using code from other free
software projects, but that's the way licensing works.

thanks
-- PMM

[Qemu-devel] [PATCH v2 0/6] flush TLBs for one MMUidx only, missing AArch64 TLB ops

2015-08-14 Thread Peter Maydell

This series does three things:

(1) implement the "flush the TLB only for a specified MMU index"
functionality that we talked about when we added all the new
MMU index values for ARM for EL2 and EL3

(2) use that to restrict the AArch64 TLB maintenance operations
to only the MMU indexes they need to touch

(3) add all the missing EL2 and EL3 related TLB operations for
AArch64

I did a quick performance test by running hackbench. Measuring
suggests that performance is improved by between half and one
percent, which isn't fantastic but then I don't know how much
of hackbench's runtime is bottlenecked by TLB flushes. I would
expect that a workload that actually used EL2 and EL3 will
benefit by not having the EL2 and EL3 flushes taking out the
EL1&0 TLB too.

Disclaimer: the EL2 and EL3 parts of this code are untested
because we haven't completely implemented those for AArch64 yet.

Changes v1->v2:
 * patch 1 updated so the debug printfs will compile if enabled
 * rebased

Patches 2..6 have already been reviewed.

thanks
-- PMM


Peter Maydell (6):
  cputlb: Add functions for flushing TLB for a single MMU index
  target-arm: Move TLBI ALLE1/ALLE1IS definitions into numeric order
  target-arm: Restrict AArch64 TLB flushes to the MMU indexes they must
touch
  target-arm: Implement missing EL2 TLBI operations
  target-arm: Implement missing EL3 TLB invalidate operations
  target-arm: Implement AArch64 TLBI operations on IPAs

 cputlb.c|  97 ++
 include/exec/exec-all.h |  47 +++
 target-arm/helper.c | 329 +---
 3 files changed, 428 insertions(+), 45 deletions(-)

-- 
1.9.1

[Qemu-devel] [PATCH v2 6/6] target-arm: Implement AArch64 TLBI operations on IPAs

2015-08-14 Thread Peter Maydell

Implement the AArch64 TLBI operations which take an intermediate
physical address and invalidate stage 2 translations.

Signed-off-by: Peter Maydell 
Reviewed-by: Edgar E. Iglesias 
---
 target-arm/helper.c | 55 +
 1 file changed, 55 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 4982396..42a81ab 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2680,6 +2680,45 @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 }
 
+static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+/* Invalidate by IPA. This has to invalidate any structures that
+ * contain only stage 2 translation information, but does not need
+ * to apply to structures that contain combined stage 1 and stage 2
+ * translation information.
+ * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
+ */
+ARMCPU *cpu = arm_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+uint64_t pageaddr;
+
+if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
+return;
+}
+
+pageaddr = sextract64(value << 12, 0, 48);
+
+tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S2NS, -1);
+}
+
+static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+CPUState *other_cs;
+uint64_t pageaddr;
+
+if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
+return;
+}
+
+pageaddr = sextract64(value << 12, 0, 48);
+
+CPU_FOREACH(other_cs) {
+tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S2NS, -1);
+}
+}
+
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 /* We don't implement EL2, so the only control on DC ZVA is the
@@ -2860,6 +2899,14 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
   .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
   .access = PL1_W, .type = ARM_CP_NO_RAW,
   .writefn = tlbi_aa64_vae1_write },
+{ .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_ipas2e1is_write },
+{ .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_ipas2e1is_write },
 { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
   .access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -2868,6 +2915,14 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 6,
   .access = PL2_W, .type = ARM_CP_NO_RAW,
   .writefn = tlbi_aa64_alle1is_write },
+{ .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_ipas2e1_write },
+{ .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_ipas2e1_write },
 { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
   .access = PL2_W, .type = ARM_CP_NO_RAW,
-- 
1.9.1
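
A note on the address arithmetic in the patch above: sextract64(value << 12, 0, 48) turns the register value, which the patch treats as a page number, into a byte address and sign-extends it from bit 47. The sketch below reimplements that computation so it can be tried in isolation. demo_sextract64() is written to behave like QEMU's sextract64() but is a local copy, and demo_ipa_pageaddr() is a hypothetical helper name. It relies on right shifts of negative values being arithmetic, which holds on the compilers QEMU supports but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extending bitfield extract: take 'length' bits of 'value'
 * starting at bit 'start' and sign-extend the result to 64 bits.
 */
static int64_t demo_sextract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    /* Move the field to the top, then shift it back down arithmetically. */
    return ((int64_t)(value << (64 - length - start))) >> (64 - length);
}

/* Page address as computed by the IPA-based TLBI write functions. */
static uint64_t demo_ipa_pageaddr(uint64_t regval)
{
    return (uint64_t)demo_sextract64(regval << 12, 0, 48);
}
```

Register bits that land above bit 47 after the shift are discarded, so they cannot influence which page is flushed.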

[Qemu-devel] [PATCH v2 4/6] target-arm: Implement missing EL2 TLBI operations

2015-08-14 Thread Peter Maydell

Implement the missing TLBI operations that exist only
if EL2 is implemented.

Signed-off-by: Peter Maydell 
Reviewed-by: Edgar E. Iglesias 
---
 target-arm/helper.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index aea8b33..77ce718 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2562,6 +2562,16 @@ static void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 }
 
+static void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+CPUState *other_cs;
+
+CPU_FOREACH(other_cs) {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1E2, -1);
+}
+}
+
 static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
@@ -3065,10 +3075,22 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 1,
   .type = ARM_CP_NO_RAW, .access = PL2_W,
   .writefn = tlbi_aa64_vae2_write },
+{ .name = "TLBI_VALE2", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_vae2_write },
+{ .name = "TLBI_ALLE2IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 0,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle2is_write },
 { .name = "TLBI_VAE2IS", .state = ARM_CP_STATE_AA64,
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1,
   .type = ARM_CP_NO_RAW, .access = PL2_W,
   .writefn = tlbi_aa64_vae2is_write },
+{ .name = "TLBI_VALE2IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_vae2is_write },
 #ifndef CONFIG_USER_ONLY
 { .name = "CNTHCTL_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 14, .crm = 1, .opc2 = 0,
-- 
1.9.1

[Qemu-devel] [PATCH v2 2/6] target-arm: Move TLBI ALLE1/ALLE1IS definitions into numeric order

2015-08-14 Thread Peter Maydell

Move the two regdefs for TLBI ALLE1 and TLBI ALLE1IS down so that the
whole set of AArch64 TLBI regdefs is arranged in numeric order.

Signed-off-by: Peter Maydell 
Reviewed-by: Edgar E. Iglesias 
---
 target-arm/helper.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 1568aa6..2ca8839 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2672,14 +2672,6 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
   .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
   .access = PL1_W, .type = ARM_CP_NOP },
 /* TLBI operations */
-{ .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
-  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
-  .access = PL2_W, .type = ARM_CP_NO_RAW,
-  .writefn = tlbiall_write },
-{ .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
-  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
-  .access = PL2_W, .type = ARM_CP_NO_RAW,
-  .writefn = tlbiall_is_write },
 { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
   .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
   .access = PL1_W, .type = ARM_CP_NO_RAW,
@@ -2728,6 +2720,14 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
   .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
   .access = PL1_W, .type = ARM_CP_NO_RAW,
   .writefn = tlbi_aa64_vaa_write },
+{ .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbiall_is_write },
+{ .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbiall_write },
 #ifndef CONFIG_USER_ONLY
 /* 64 bit address translation operations */
 { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64,
-- 
1.9.1

[Qemu-devel] [PATCH v2 5/6] target-arm: Implement missing EL3 TLB invalidate operations

2015-08-14 Thread Peter Maydell

Implement the remaining stage 1 TLB invalidate operations
visible from EL3.

Signed-off-by: Peter Maydell 
Reviewed-by: Edgar E. Iglesias 
---
 target-arm/helper.c | 76 +
 1 file changed, 76 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 77ce718..4982396 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2538,6 +2538,15 @@ static void tlbi_aa64_alle2_write(CPUARMState *env, const ARMCPRegInfo *ri,
 tlb_flush_by_mmuidx(cs, ARMMMUIdx_S1E2, -1);
 }
 
+static void tlbi_aa64_alle3_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+ARMCPU *cpu = arm_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S1E3, -1);
+}
+
 static void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
@@ -2572,6 +2581,16 @@ static void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 }
 
+static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+CPUState *other_cs;
+
+CPU_FOREACH(other_cs) {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1E3, -1);
+}
+}
+
 static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
@@ -2607,6 +2626,20 @@ static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri,
 tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S1E2, -1);
 }
 
+static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t value)
+{
+/* Invalidate by VA, EL3
+ * Currently handles both VAE3 and VALE3, since we don't support
+ * flush-last-level-only.
+ */
+ARMCPU *cpu = arm_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+uint64_t pageaddr = sextract64(value << 12, 0, 56);
+
+tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S1E3, -1);
+}
+
 static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
@@ -2636,6 +2669,17 @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 }
 
+static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+   uint64_t value)
+{
+CPUState *other_cs;
+uint64_t pageaddr = sextract64(value << 12, 0, 56);
+
+CPU_FOREACH(other_cs) {
+tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1E3, -1);
+}
+}
+
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 /* We don't implement EL2, so the only control on DC ZVA is the
@@ -2820,10 +2864,18 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
   .access = PL2_W, .type = ARM_CP_NO_RAW,
   .writefn = tlbi_aa64_alle1is_write },
+{ .name = "TLBI_VMALLS12E1IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 6,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle1is_write },
 { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
   .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
   .access = PL2_W, .type = ARM_CP_NO_RAW,
   .writefn = tlbi_aa64_alle1_write },
+{ .name = "TLBI_VMALLS12E1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 6,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle1is_write },
 #ifndef CONFIG_USER_ONLY
 /* 64 bit address translation operations */
 { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64,
@@ -3197,6 +3249,30 @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 2,
   .access = PL3_RW, .accessfn = cptr_access, .resetvalue = 0,
   .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[3]) },
+{ .name = "TLBI_ALLE3IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 0,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle3is_write },
+{ .name = "TLBI_VAE3IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 1,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_vae3is_write },
+{ .name = "TLBI_VALE3IS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 5,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_vae3is_write },
+{ .name = "TLBI_ALLE3", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 0,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle3_write },
+{ .name = "TLBI_VAE3", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn =

[Qemu-devel] [PATCH v2 1/6] cputlb: Add functions for flushing TLB for a single MMU index

2015-08-14 Thread Peter Maydell

Guest CPU TLB maintenance operations may be sufficiently
specialized to only need to flush TLB entries corresponding
to a particular MMU index. Implement cputlb functions for
this, to avoid the inefficiency of flushing TLB entries
which we don't need to.

Signed-off-by: Peter Maydell 
---
 cputlb.c| 97 +
 include/exec/exec-all.h | 47 
 2 files changed, 144 insertions(+)

diff --git a/cputlb.c b/cputlb.c
index a506086..4bc6c24 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -69,6 +69,47 @@ void tlb_flush(CPUState *cpu, int flush_global)
 tlb_flush_count++;
 }
 
+static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
+{
+CPUArchState *env = cpu->env_ptr;
+
+#if defined(DEBUG_TLB)
+printf("tlb_flush_by_mmuidx:");
+#endif
+/* must reset current TB so that interrupts cannot modify the
+   links while we are modifying them */
+cpu->current_tb = NULL;
+
+for (;;) {
+int mmu_idx = va_arg(argp, int);
+
+if (mmu_idx < 0) {
+break;
+}
+
+#if defined(DEBUG_TLB)
+printf(" %d", mmu_idx);
+#endif
+
+memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+}
+
+#if defined(DEBUG_TLB)
+printf("\n");
+#endif
+
+memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+}
+
+void tlb_flush_by_mmuidx(CPUState *cpu, ...)
+{
+va_list argp;
+va_start(argp, cpu);
+v_tlb_flush_by_mmuidx(cpu, argp);
+va_end(argp);
+}
+
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
 {
 if (addr == (tlb_entry->addr_read &
@@ -121,6 +162,62 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 tb_flush_jmp_cache(cpu, addr);
 }
 
+void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
+{
+CPUArchState *env = cpu->env_ptr;
+int i, k;
+va_list argp;
+
+va_start(argp, addr);
+
+#if defined(DEBUG_TLB)
+printf("tlb_flush_page_by_mmu_idx: " TARGET_FMT_lx, addr);
+#endif
+/* Check if we need to flush due to large pages.  */
+if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
+#if defined(DEBUG_TLB)
+printf(" forced full flush ("
+   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+   env->tlb_flush_addr, env->tlb_flush_mask);
+#endif
+v_tlb_flush_by_mmuidx(cpu, argp);
+va_end(argp);
+return;
+}
+/* must reset current TB so that interrupts cannot modify the
+   links while we are modifying them */
+cpu->current_tb = NULL;
+
+addr &= TARGET_PAGE_MASK;
+i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+
+for (;;) {
+int mmu_idx = va_arg(argp, int);
+
+if (mmu_idx < 0) {
+break;
+}
+
+#if defined(DEBUG_TLB)
+printf(" %d", mmu_idx);
+#endif
+
+tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
+
+/* check whether there are vtlb entries that need to be flushed */
+for (k = 0; k < CPU_VTLB_SIZE; k++) {
+tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
+}
+}
+va_end(argp);
+
+#if defined(DEBUG_TLB)
+printf("\n");
+#endif
+
+tb_flush_jmp_cache(cpu, addr);
+}
+
 /* update the TLBs so that writes to code in the virtual page 'addr'
can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a6fce04..4933683 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -96,8 +96,46 @@ bool qemu_in_vcpu_thread(void);
 void cpu_reload_memory_map(CPUState *cpu);
 void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
 /* cputlb.c */
+/**
+ * tlb_flush_page:
+ * @cpu: CPU whose TLB should be flushed
+ * @addr: virtual address of page to be flushed
+ *
+ * Flush one page from the TLB of the specified CPU, for all
+ * MMU indexes.
+ */
 void tlb_flush_page(CPUState *cpu, target_ulong addr);
+/**
+ * tlb_flush:
+ * @cpu: CPU whose TLB should be flushed
+ * @flush_global: ignored
+ *
+ * Flush the entire TLB for the specified CPU.
+ * The flush_global flag is in theory an indicator of whether the whole
+ * TLB should be flushed, or only those entries not marked global.
+ * In practice QEMU does not implement any global/not global flag for
+ * TLB entries, and the argument is ignored.
+ */
 void tlb_flush(CPUState *cpu, int flush_global);
+/**
+ * tlb_flush_page_by_mmuidx:
+ * @cpu: CPU whose TLB should be flushed
+ * @addr: virtual address of page to be flushed
+ * @...: list of MMU indexes to flush, terminated by a negative value
+ *
+ * Flush one page from the TLB of the specified CPU, for the specified
+ * MMU indexes.
+ */
+void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
+/**
+ * tlb_flush_by_mmuidx:
+ * @cpu: CPU whose TLB should be flushed
+ * @...: list of MMU indexes to flush, terminated by a nega

[Qemu-devel] [PATCH v2 3/6] target-arm: Restrict AArch64 TLB flushes to the MMU indexes they must touch

2015-08-14 Thread Peter Maydell

Now we have the ability to flush the TLB only for specific MMU indexes,
update the AArch64 TLB maintenance instruction implementations to only
flush the parts of the TLB they need to, rather than doing full flushes.

We take the opportunity to remove some duplicate functions (the per-asid
tlb ops work like the non-per-asid ones because we don't support
flushing a TLB only by ASID) and to bring the function names in line
with the architectural TLBI operation names.

Signed-off-by: Peter Maydell 
Reviewed-by: Edgar E. Iglesias 
---
 target-arm/helper.c | 172 +++-
 1 file changed, 129 insertions(+), 43 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 2ca8839..aea8b33 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2478,65 +2478,151 @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env,
  * Page D4-1736 (DDI0487A.b)
  */
 
-static void tlbi_aa64_va_write(CPUARMState *env, const ARMCPRegInfo *ri,
-   uint64_t value)
+static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
 {
-/* Invalidate by VA (AArch64 version) */
 ARMCPU *cpu = arm_env_get_cpu(env);
-uint64_t pageaddr = sextract64(value << 12, 0, 56);
+CPUState *cs = CPU(cpu);
 
-tlb_flush_page(CPU(cpu), pageaddr);
+if (arm_is_secure_below_el3(env)) {
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
+} else {
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S12NSE1, ARMMMUIdx_S12NSE0, -1);
+}
 }
 
-static void tlbi_aa64_vaa_write(CPUARMState *env, const ARMCPRegInfo *ri,
-uint64_t value)
+static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
 {
-/* Invalidate by VA, all ASIDs (AArch64 version) */
-ARMCPU *cpu = arm_env_get_cpu(env);
-uint64_t pageaddr = sextract64(value << 12, 0, 56);
+bool sec = arm_is_secure_below_el3(env);
+CPUState *other_cs;
 
-tlb_flush_page(CPU(cpu), pageaddr);
+CPU_FOREACH(other_cs) {
+if (sec) {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
+} else {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
+ARMMMUIdx_S12NSE0, -1);
+}
+}
 }
 
-static void tlbi_aa64_asid_write(CPUARMState *env, const ARMCPRegInfo *ri,
- uint64_t value)
+static void tlbi_aa64_alle1_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
 {
-/* Invalidate by ASID (AArch64 version) */
+/* Note that the 'ALL' scope must invalidate both stage 1 and
+ * stage 2 translations, whereas most other scopes only invalidate
+ * stage 1 translations.
+ */
 ARMCPU *cpu = arm_env_get_cpu(env);
-int asid = extract64(value, 48, 16);
-tlb_flush(CPU(cpu), asid == 0);
+CPUState *cs = CPU(cpu);
+
+if (arm_is_secure_below_el3(env)) {
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
+} else {
+if (arm_feature(env, ARM_FEATURE_EL2)) {
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S12NSE1, ARMMMUIdx_S12NSE0,
+ARMMMUIdx_S2NS, -1);
+} else {
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S12NSE1, ARMMMUIdx_S12NSE0, -1);
+}
+}
 }
 
-static void tlbi_aa64_va_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+static void tlbi_aa64_alle2_write(CPUARMState *env, const ARMCPRegInfo *ri,
   uint64_t value)
 {
+ARMCPU *cpu = arm_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+
+tlb_flush_by_mmuidx(cs, ARMMMUIdx_S1E2, -1);
+}
+
+static void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+/* Note that the 'ALL' scope must invalidate both stage 1 and
+ * stage 2 translations, whereas most other scopes only invalidate
+ * stage 1 translations.
+ */
+bool sec = arm_is_secure_below_el3(env);
+bool has_el2 = arm_feature(env, ARM_FEATURE_EL2);
 CPUState *other_cs;
-uint64_t pageaddr = sextract64(value << 12, 0, 56);
 
 CPU_FOREACH(other_cs) {
-tlb_flush_page(other_cs, pageaddr);
+if (sec) {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
+} else if (has_el2) {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
+ARMMMUIdx_S12NSE0, ARMMMUIdx_S2NS, -1);
+} else {
+tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
+ARMMMUIdx_S12NSE0, -1);
+}
 }
 }
 
-static void tlbi_aa64_vaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-  uint64_t value)
+static void tlbi_aa64_vae1_w

Re: [Qemu-devel] [PATCH 0/4] target-sparc: Update to use VMStateDescription

2015-08-14 Thread Peter Maydell

On 13 August 2015 at 23:37, Mark Cave-Ayland
 wrote:
> On 10/08/15 13:34, Peter Maydell wrote:
>
>> This patchset updates target-sparc to use VMStateDescription
>> rather than hand-written save/load functions. (This and CRIS
>> are the last two targets still using the old approach.)
>>
>> It's based on some patches from back in 2012 by Juan which
>> I've updated, rebased and made some tweaks to.
>>
>> This is a migration compatibility break; we don't care about
>> cross-version migration on SPARC guests, and not having to
>> maintain the old wire format allows a cleaner vmstate
>> description in several ways.

> Thanks for looking into this! In general the patches look very
> reasonable (although I will need to give them a more thorough testing
> when I get a chance) - my only concern is the break in migration
> compatibility. Am I right in thinking that with this patch applied a
> loadvm cannot restore a savevm from an earlier version?

Yep, that's what cross-version breaks imply.

> Not so much for qemu-system-sparc64 which is still somewhat
> experimental, however qemu-system-sparc has become very usable since
> 2012 with the advent of the cg3 and OpenBIOS changes that can now run
> Solaris/SunOS and I do have a slight concern that people could lose
> their qcow2 snapshots. Then again if we document this loudly in the
> release notes then I guess it is possible to convert a snapshot back to
> a raw, boot that and then savevm it back to the newer qcow2 again...

If there's a migration break, then the old vm-snapshot is
useless. You can load it on the old QEMU, obviously, but the
new one will never run it. The best you can do is load the VM
on the old QEMU and do a clean shutdown of it. Then you can do
a cold boot on the new QEMU.

(Note that QEMU has several meanings of 'snapshot'; the one
I have in mind here is the complete-saved-state-of-disk-and-VM
you get via savevm. Snapshots which are just saved-state-of-disk
are fine.)

Anyway, my assumption was that nobody cared much about migration
compatibility for sparc (nobody has *ever* to my knowledge complained
about compat breaks for ARM targets, which we've done fairly regularly
over the last few years). They're very hard to keep working
reliably, because you pretty much have to start defining per-version
machines (like all the pc-i440fx-2.4, pc-i440fx-2.3, etc) so that
each new version of QEMU can still produce a machine that is
exactly like the one the previous ones did, and you need to test
to make sure you haven't accidentally broken migration between
versions. So mostly we've only cared for targets where there's
serious usage as a VM target (x86, ppc, s390 more recently).
(If you've never tested 'savevm on qemu 2.3 and loadvm on 2.4'
then I'd probably put even odds on it being at least subtly broken.)

But this is in the end a target maintainer choice, so if you
really want to maintain cross-version compat I can do that.
The downside is that you end up with either (a) a really ugly
vmstate because it has to maintain unnatural field orders or
on-the-wire state or (b) a bunch of code that's only exercised
on migration from older QEMU. I think the most awkward part here
is that the old wire format wants to send the ITLB and DTLB
structures interleaved, which means you can't just define them
as being arrays of structs. Juan's original 2012 patchset actually
did the conversion as a compatible change followed by a breaking
change -- this is the patch which does the breaking-change:
http://lists.gnu.org/archive/html/qemu-devel/2012-03/msg03818.html
The diffstat is "9 insertions(+), 163 deletions(-)"...
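
To make the interleaving problem concrete, here is a minimal standalone sketch. It is not QEMU code: TlbEntry, N_TLB, put_interleaved and put_grouped are hypothetical names, and the field set is an assumption for illustration. It only shows that sending ITLB and DTLB entries alternately gives a different stream from sending each array whole, which is what a plain array-of-structs description would produce.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_TLB 4  /* hypothetical size, chosen only for illustration */

typedef struct TlbEntry {
    uint64_t tag;
    uint64_t tte;
} TlbEntry;

/* Append one entry to the buffer and return the new write position. */
static size_t put_entry(uint64_t *buf, size_t pos, const TlbEntry *e)
{
    buf[pos++] = e->tag;
    buf[pos++] = e->tte;
    return pos;
}

/* Interleaved layout: itlb[0], dtlb[0], itlb[1], dtlb[1], ... */
size_t put_interleaved(uint64_t *buf, const TlbEntry *itlb,
                       const TlbEntry *dtlb)
{
    size_t pos = 0;
    for (size_t i = 0; i < N_TLB; i++) {
        pos = put_entry(buf, pos, &itlb[i]);
        pos = put_entry(buf, pos, &dtlb[i]);
    }
    return pos;
}

/* Array-of-structs layout: all of itlb[], then all of dtlb[]. */
size_t put_grouped(uint64_t *buf, const TlbEntry *itlb,
                   const TlbEntry *dtlb)
{
    size_t pos = 0;
    for (size_t i = 0; i < N_TLB; i++) {
        pos = put_entry(buf, pos, &itlb[i]);
    }
    for (size_t i = 0; i < N_TLB; i++) {
        pos = put_entry(buf, pos, &dtlb[i]);
    }
    return pos;
}

/* Returns 1 if the two layouts yield different streams for sample data. */
int layouts_differ(void)
{
    TlbEntry itlb[N_TLB], dtlb[N_TLB];
    uint64_t a[4 * N_TLB], b[4 * N_TLB];

    for (size_t i = 0; i < N_TLB; i++) {
        itlb[i].tag = i;
        itlb[i].tte = 100 + i;
        dtlb[i].tag = 200 + i;
        dtlb[i].tte = 300 + i;
    }
    size_t la = put_interleaved(a, itlb, dtlb);
    size_t lb = put_grouped(b, itlb, dtlb);
    return la != lb || memcmp(a, b, la * sizeof(a[0])) != 0;
}
```

Both functions write the same number of words, so the incompatibility is purely one of ordering; a compatible description would have to reproduce the first ordering field by field.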

Let me know what you'd prefer here.

thanks
-- PMM

[Qemu-devel] Help debugging a regression in KVM Module

2015-08-14 Thread Peter Lieven

Hi,

some time ago I stumbled across a regression in the KVM module that was
introduced somewhere between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a
3.12 host kernel and kvm-kmod 3.19, but I have now found that it is also
present with a 3.19 host kernel and its included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all
commits to the kvm-related parts.

Now my question is how best to bisect only kvm-related changes (those that
go into kvm-kmod)?

Thanks,
Peter

Re: [Qemu-devel] Win32 stdio not working if SDL is enabled

2015-08-14 Thread Daniel P. Berrange

On Thu, Aug 13, 2015 at 07:48:47PM +0200, Stefan Weil wrote:
> Am 13.08.2015 um 14:06 schrieb Daniel P. Berrange:
> > When debugging some patches on Windows, I discovered that nothing printed
> > to stderr ever appears on the console. Eventually I discovered that if I
> > build with --disable-sdl, then stderr appears just fine.
> > 
> > Looking at the code in vl.c I see a hack for SDL introduced in
> > 
> >   commit 59a36a2f6728081050afc6ec97d0018467999f79
> >   Author: Stefan Weil 
> >   Date:   Thu Jun 18 20:11:03 2009 +0200
> > 
> > Win32: Fix compilation with SDL.
> > 
> > 
> > If I mostly kill the hack from vl.c, and just leave a plain '#undef main'
> > then I get working console stderr once again.
> > 
> 
> Hi Daniel,
> 
> that's a feature of SDL 1.2: stdout and stderr are by default
> redirected to files stdout.txt and stderr.txt in the executable's
> directory.
> 
> This redirection can be disabled by an environment variable
> (SDL_STDIO_REDIRECT="no"). On my Linux machines, I always
> set this variable, so when I run QEMU for Windows with
> wine32 or wine64, stdout and stderr work.
> 
> Printing to stdout / stderr on Windows can be an adventure:
> depending on your shell (command.exe, cmd.exe, MinGW shell,
> MinGW rxvt, Cygwin shell, ...) it works different, and I also
> had application crashes when a GUI application which was
> not started from a shell tried to print to stdout.

I see it is something intentional done by SDL, but I don't think it is
desirable in general.  I rather doubt it would crash as that would imply
that code is checking the return value of fprintf() and taking some action
on error. Instead we exclusively ignore fprintf() return values, so if the
OS is reporting an I/O error we'll be ignoring it. In any case, it is
possible to build QEMU on Win32 without SDL, or set that env variable,
at which point QEMU will be printing to stdio anyway. So in the unlikely
case there is a crash scenario, we need to fix that regardless.

IMHO we should be disabling this bogus behaviour of SDL so QEMU does not
have different behaviour wrt stdio depending on what libraries you happen
to build against, or what platform you choose. Expecting people to know
about a magic env variable to make QEMU work as it does everywhere else
is just broken.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH] block/iscsi: validate block size returned from target

2015-08-14 Thread Peter Lieven

It has been reported that at least tgtd returns a block size of 0
for LUN 0. To avoid running into divide by zero later on and protect
against other problematic block sizes validate the block size right
at connection time.

Cc: qemu-sta...@nongnu.org
Reported-by: Andrey Korolyov 
Signed-off-by: Peter Lieven 
---
 block/iscsi.c | 4 
 dtc   | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 5002916..fac3a7a 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1214,6 +1214,10 @@ static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
 
     if (task == NULL || task->status != SCSI_STATUS_GOOD) {
         error_setg(errp, "iSCSI: failed to send readcapacity10 command.");
+    } else if (!iscsilun->block_size ||
+               iscsilun->block_size % BDRV_SECTOR_SIZE) {
+        error_setg(errp, "iSCSI: the target returned an invalid "
+                   "block size of %d.", iscsilun->block_size);
     }
     if (task) {
         scsi_free_scsi_task(task);
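
The check added by the patch can be looked at in isolation. The following is a minimal sketch, assuming BDRV_SECTOR_SIZE is 512; block_size_is_valid and SECTOR_SIZE are hypothetical names used only here. It mirrors the patch's condition: zero is rejected, and so is any size that is not a multiple of the sector size.

```c
#include <stdbool.h>

#define SECTOR_SIZE 512  /* assumed value of BDRV_SECTOR_SIZE */

/*
 * Returns true if the block size reported by the target is usable:
 * non-zero (avoids a later divide by zero) and a whole number of sectors.
 */
bool block_size_is_valid(int block_size)
{
    return block_size != 0 && block_size % SECTOR_SIZE == 0;
}
```

With this predicate a target reporting 0 or 520 would be refused at connection time, while 512 and 4096 would be accepted.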

[Qemu-devel] qemu-img seg, test 082 not showing the error

2015-08-14 Thread Dr. David Alan Gilbert

Hi,
  I noticed that although 'make check-block' was passing happily I was
seeing a kernel log showing a qemu-img seg:

[Fri Aug 14 12:26:07 2015] qemu-img[7725]: segfault at 0 ip   (null) sp 773e9a98 error 14 in qemu-img[55f707577000+f8000]


The case that fails is:
run_qemu_img amend -f qcow2 -o backing_file=/home/dgilbert/try-world3/tests/qemu-iotests/scratch/t.qcow2, -o help /home/dgilbert/try-world3/tests/qemu-iotests/scratch/t.qcow2

and I think the problem is due to a disagreement between

626f84f3 - qemu-img amend: Support multiple -o options

and

76a3a34d - qemu-img: Add progress output for amend

In the 'Invalid option list' case it goto's out skipping
the qemu_progress_init, but the out: clause does a 
qemu_progress_end that then segs.
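
The control flow described above can be reduced to a small standalone sketch. None of this is the qemu-img code: progress_init, progress_end_checked and amend are hypothetical stand-ins, and the assumption is that the "end" step calls through a function pointer that is only set by the "init" step. The guarded variant shows one way the out: path could be made safe when init was skipped; moving the init call earlier would be another.

```c
#include <stdbool.h>
#include <stddef.h>

/* NULL until progress_init() runs; calling through it then would crash. */
static void (*progress_end_fn)(void);
static int end_calls;

static void real_end(void)
{
    end_calls++;
}

void progress_init(void)
{
    progress_end_fn = real_end;
}

/* Guarded variant: safe to call even if progress_init() was skipped. */
void progress_end_checked(void)
{
    if (progress_end_fn != NULL) {
        progress_end_fn();
        progress_end_fn = NULL;
    }
}

int progress_end_calls(void)
{
    return end_calls;
}

/* Mimics the shape of the failing path: an early goto skips the init. */
int amend(bool bad_options)
{
    int ret = -1;

    if (bad_options) {
        goto out;               /* progress_init() has not run yet */
    }
    progress_init();
    ret = 0;
out:
    progress_end_checked();     /* an unguarded call would jump to NULL */
    return ret;
}
```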

However, I've not dug into why the testcase code didn't
spot it, which seems a more important problem.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] Win32 stdio not working if SDL is enabled

2015-08-14 Thread Daniel P. Berrange

On Fri, Aug 14, 2015 at 12:14:15PM +0100, Daniel P. Berrange wrote:
> On Thu, Aug 13, 2015 at 07:48:47PM +0200, Stefan Weil wrote:
> > Am 13.08.2015 um 14:06 schrieb Daniel P. Berrange:
> > > When debugging some patches on Windows, I discovered that nothing printed
> > > to stderr ever appears on the console. Eventually I discovered that if I
> > > build with --disable-sdl, then stderr appears just fine.
> > > 
> > > Looking at the code in vl.c I see a hack for SDL introduced in
> > > 
> > >   commit 59a36a2f6728081050afc6ec97d0018467999f79
> > >   Author: Stefan Weil 
> > >   Date:   Thu Jun 18 20:11:03 2009 +0200
> > > 
> > > Win32: Fix compilation with SDL.
> > > 
> > > 
> > > If I mostly kill the hack from vl.c, and just leave a plain '#undef main'
> > > then I get working console stderr once again.
> > > 
> > 
> > Hi Daniel,
> > 
> > that's a feature of SDL 1.2: stdout and stderr are by default
> > redirected to files stdout.txt and stderr.txt in the executable's
> > directory.
> > 
> > This redirection can be disabled by an environment variable
> > (SDL_STDIO_REDIRECT="no"). On my Linux machines, I always
> > set this variable, so when I run QEMU for Windows with
> > wine32 or wine64, stdout and stderr work.
> > 
> > Printing to stdout / stderr on Windows can be an adventure:
> > depending on your shell (command.exe, cmd.exe, MinGW shell,
> > MinGW rxvt, Cygwin shell, ...) it works different, and I also
> > had application crashes when a GUI application which was
> > not started from a shell tried to print to stdout.
> 
> I see it is something intentional done by SDL, but I don't think it is
> desirable in general.  I rather doubt it would crash as that would imply
> that code is checking the return value of fprintf() and taking some action
> on error. Instead we exclusively ignore fprintf() return values, so if the
> OS is reporting an I/O error we'll be ignoring it. In any case, it is
> possible to build QEMU on Win32 without SDL, or set that env variable,
> at which point QEMU will be printing to stdio anyway. So in the unlikely
> case there is a crash scenario, we need to fix that regardless.
> 
> IMHO we should be disabling this bogus behaviour of SDL so QEMU does not
> have different behaviour wrt stdio depending on what libraries you happen
> to build against, or what platform you choose. Expecting people to know
> about a magic env variable to make QEMU work as it does everywhere else
> is just broken.

A thought occurs to me - on Windows we actually build two copies of
the emulator

 - qemu-system-x86_64.exe - linked to the "console" subsystem
 - qemu-system-x86_64w.exe - linked to the "windows" subsystem [1]

With the 'windows' subsystem build it is reasonable to believe that the
user will not have any console generally available to view stderr/out.

So how about we make it such that when linked to the 'console' subsystem
we have stdout/stderr open by default, and when linked to the 'windows'
subsystem we have stdout/stderr redirected to a file (as SDL does). Except
that we make this redirection to a file happen in QEMU code, so it has
consistent behaviour even in non-SDL builds on Windows.

Regards,
Daniel

[1] '-mwindows'
 This option is available for Cygwin and MinGW targets.  It
 specifies that a GUI application is to be generated by instructing
 the linker to set the PE header subsystem type appropriately.
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH 0/4] target-sparc: Update to use VMStateDescription

2015-08-14 Thread Artyom Tarasenko

Hi Mark,

On Fri, Aug 14, 2015 at 12:37 AM, Mark Cave-Ayland
 wrote:
> On 10/08/15 13:34, Peter Maydell wrote:
>
>> This patchset updates target-sparc to use VMStateDescription
>> rather than hand-written save/load functions. (This and CRIS
>> are the last two targets still using the old approach.)
>>
>> It's based on some patches from back in 2012 by Juan which
>> I've updated, rebased and made some tweaks to.
>>
>> This is a migration compatibility break; we don't care about
>> cross-version migration on SPARC guests, and not having to
>> maintain the old wire format allows a cleaner vmstate
>> description in several ways.
>>
>> NB that the 'split cpu_put_psr' patch seems to me to be a
>> bugfix in and of itself, since currently we might try to
>> call cpu_check_irqs() and deliver interrupts while we're
>> halfway through updating a PSR value...
>>
>> Juan Quintela (2):
>>   vmstate: introduce CPU_DoubleU arrays
>>   target-sparc: Convert to VMStateDescription
>>
>> Peter Maydell (2):
>>   target-sparc: Split cpu_put_psr into side-effect and no-side-effect
>> parts
>>   target-sparc: Don't flush TLB in cpu_load function
>>
>>  hw/sparc64/sun4u.c  |  20 ---
>>  include/migration/vmstate.h |   7 +
>>  migration/vmstate.c |  23 +++
>>  target-sparc/cpu-qom.h  |   4 +
>>  target-sparc/cpu.c  |   1 +
>>  target-sparc/cpu.h  |   7 +-
>>  target-sparc/machine.c  | 360
>>  target-sparc/win_helper.c   |  19 ++-
>>  8 files changed, 210 insertions(+), 231 deletions(-)
>
> Hi Peter,
>
> Thanks for looking into this! In general the patches look very
> reasonable (although I will need to give them a more thorough testing
> when I get a chance) - my only concern is the break in migration
> compatibility. Am I right in thinking that with this patch applied a
> loadvm cannot restore a savevm from an earlier version?
>
> Not so much for qemu-system-sparc64 which is still somewhat
> experimental, however qemu-system-sparc has become very usable since
> 2012 with the advent of the cg3 and OpenBIOS changes that can now run
> Solaris/SunOS and I do have a slight concern that people could lose
> their qcow2 snapshots. Then again if we document this loudly in the
> release notes then I guess it is possible to convert a snapshot back to
> a raw, boot that and then savevm it back to the newer qcow2 again...

I think you and Peter are talking about different snapshots. The filesystem
snapshots are not affected by this series, so there is no need to convert
qcow2 back and forth.
What would be broken is the live system snapshot - it won't be
possible to live migrate from one QEMU version to another one without
rebooting the guest.
But I guess a reboot for a QEMU upgrade is not too expensive for our
current users.

ATB,
Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC] pseries: define coldplugged devices as "configured"

2015-08-14 Thread Laurent Vivier

I'd like to know whether this is the right way to fix the problem: are there
any more comments on this patch? People from IBM?

Laurent

On 13/08/2015 14:53, Laurent Vivier wrote:
> When a device is hotplugged, attach() sets "configured" to
> false, waiting an action from the OS to configure it and then
> to call ibm,configure-connector. On ibm,configure-connector,
> the hypervisor sets "configured" to true.
> 
> In case of coldplugged device, attach() sets "configured" to
> false, but firmware and OS never call the ibm,configure-connector
> in this case, so it remains set to false.
> 
> It could be harmless, but when we unplug a device, hypervisor
> waits the device becomes configured because for it, a not configured
> device is a device being configured, so it waits the end of configuration
> to unplug it... and it never happens, so it is never unplugged.
> 
> This patch set by default coldplugged device to "configured=true",
> hotplugged device to "configured=false".
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ppc/spapr_drc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index ee87432..e86babf 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -310,7 +310,7 @@ static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
>      drc->dev = d;
>      drc->fdt = fdt;
>      drc->fdt_start_offset = fdt_start_offset;
> -    drc->configured = false;
> +    drc->configured = coldplug;
>  
>      object_property_add_link(OBJECT(drc), "device",
>                               object_get_typename(OBJECT(drc->dev)),
>
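
The behaviour described in the commit message can be modelled with a toy state machine. This is only a sketch under stated assumptions: ToyDrc and the toy_* functions are hypothetical and much simpler than the real DR connector code, and the "unplug waits while not configured" rule is taken from the description above rather than from the source.

```c
#include <stdbool.h>

typedef struct ToyDrc {
    bool attached;
    bool configured;
    bool awaiting_release;
} ToyDrc;

void toy_attach(ToyDrc *drc, bool coldplug)
{
    drc->attached = true;
    drc->configured = coldplug;    /* the one-line change in the patch */
    drc->awaiting_release = false;
}

/* Stands in for the guest calling ibm,configure-connector. */
void toy_configure_connector(ToyDrc *drc)
{
    drc->configured = true;
}

/* Returns true if the device could be released immediately. */
bool toy_detach(ToyDrc *drc)
{
    if (!drc->configured) {
        /* Treated as "still being configured": defer the unplug. */
        drc->awaiting_release = true;
        return false;
    }
    drc->attached = false;
    drc->awaiting_release = false;
    return true;
}
```

In this model a coldplugged device attached with coldplug set to true can be detached at once, whereas one attached with configured left false stays in the awaiting-release state until something calls toy_configure_connector().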

[Qemu-devel] [PATCH] vl: redirect stdio to a file in Windows GUI build

2015-08-14 Thread Daniel P. Berrange

If linked to the windows subsystem (-mwindows gcc arg) then there
will be no console available for stdout/err to send data to. Use
the same approach as SDL by redirecting stdout/err to text files
in the current directory.  If linked to the console subsystem
then leave stdout/err untouched.

The redirect can be disabled with the QEMU_NO_STDIO_REDIRECT env
variable.

The result is that qemu-system-x86_64.exe can use stdio in the
same manner as on any UNIX platform. The qemu-system-x86_64w.exe
binary will log to text files and options like -monitor stdio
will not be available.

Signed-off-by: Daniel P. Berrange 
---
 vl.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/vl.c b/vl.c
index bb9ed8b..c9997bf 100644
--- a/vl.c
+++ b/vl.c
@@ -2995,6 +2995,22 @@ int main(int argc, char **argv, char **envp)
     Error *main_loop_err = NULL;
     Error *err = NULL;
 
+#ifdef _WIN32
+    /*
+     * If we're linked with -mwindows GetConsoleWindow returns
+     * NULL, so we know stdout/err are not available, so let's
+     * redirect them to a file. If linked to console subsystem
+     * then we can use stdout/err as normal
+     */
+    if (GetConsoleWindow() == NULL &&
+        getenv("QEMU_NO_STDIO_REDIRECT") == NULL) {
+        freopen("stdout.txt", "w", stdout);
+        freopen("stderr.txt", "w", stderr);
+        setvbuf(stdout, NULL, _IOLBF, BUFSIZ); /* Line buffering */
+        setbuf(stderr, NULL); /* No buffering */
+    }
+#endif /* _WIN32 */
+
     qemu_init_cpu_loop();
     qemu_mutex_lock_iothread();
 
-- 
2.4.3
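
The redirect logic in the patch can be exercised on any platform with a portable sketch. This is not the vl.c code: should_redirect_stdio and redirect_stream are hypothetical helpers, and the console test is passed in as a plain boolean instead of calling GetConsoleWindow(). Unlike the patch, the sketch checks the return values of freopen() and setvbuf().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Decision used by the patch: redirect only when there is no console
 * and the opt-out environment variable is unset (NULL from getenv()).
 */
bool should_redirect_stdio(bool has_console, const char *opt_out_env)
{
    return !has_console && opt_out_env == NULL;
}

/*
 * Reopen 'stream' onto 'path' and set its buffering mode
 * (_IOFBF, _IOLBF or _IONBF). Returns 0 on success, -1 on failure.
 */
int redirect_stream(FILE *stream, const char *path, int buf_mode)
{
    if (freopen(path, "w", stream) == NULL) {
        return -1;
    }
    /* setvbuf() has to come before any other operation on the stream. */
    if (setvbuf(stream, NULL, buf_mode, BUFSIZ) != 0) {
        return -1;
    }
    return 0;
}
```

A caller would use it along the lines of redirect_stream(stdout, "stdout.txt", _IOLBF). One caveat worth checking before relying on the "line buffering" comment: as far as I know, the Microsoft C runtime documents _IOLBF as behaving the same as _IOFBF on Win32.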

Re: [Qemu-devel] [PATCH v2 2/6] hw/arm: new interface for devices which need to behave differently for kernel boot

2015-08-14 Thread Peter Maydell

On 18 July 2015 at 10:00, Peter Maydell  wrote:
> On 18 July 2015 at 04:55, Peter Crosthwaite
>  wrote:
>> On Thu, Jul 16, 2015 at 1:11 PM, Peter Maydell  
>> wrote:
>>> For ARM we have a little minimalist bootloader in hw/arm/boot.c which
>>> takes the place of firmware if we're directly booting a Linux kernel.
>>> Unfortunately a few devices need special case handling in this situation
>>> to do the initialization which on real hardware would be done by
>>> firmware. (In particular if we're booting a kernel in NonSecure state
>>> then we need to make a TZ-aware GIC put all its interrupts into Group 1,
>>> or the guest will be unable to use them.)
>>>
>>> Create a new QOM interface which can be implemented by devices which
>>> need to do something different from their default reset behaviour.
>>> The callback will be called after machine initialization and before
>>> first reset.
>>>
>>> Suggested-by: Peter Crosthwaite 
>>> Signed-off-by: Peter Maydell 

>>> +    /** arm_linux_init: configure the device for a direct boot
>>> +     * of an ARM Linux kernel (so that device reset puts it into
>>> +     * the state the kernel expects after firmware initialization,
>>> +     * rather than the true hardware reset state). This callback is
>>> +     * called once after machine construction is complete (before the
>>> +     * first system reset).
>>> +     *
>>> +     * @obj: the object implementing this interface
>>> +     * @secure_boot: true if we are booting Secure, false for NonSecure
>>> +     * (or for a CPU which doesn't support TrustZone)
>>> +     */
>>> +    void (*arm_linux_init)(ARMLinuxBootIf *obj, bool secure_boot);
>>
>> Can we drop the "arm_"? ARM is always going to be there in the context
>> as it is in the typename due to ARMLinuxBootIfClass.
>
> Yeah, I guess so. I wasn't really sure what the best method
> name here was.
>
>> So If we are going for an ARM-specific thing, it might make sense to
>> instead drop all the _linux_ stuff and have this call unconditional.
>> Then the API has wider application than just Linux boots. The struct
>> arm_boot_info can be made more widely visible as the one data argument
>> the device accepts, from which security state as well and is_linux can
>> be fished.
>
> I was going to pass arm_boot_info in, but that struct requires
> the target cpu.h so it can't be used in compiled-once objects
> like the GIC code. Hence the single bool parameter.
>
> I'm also not too keen on increasing the set of things we try
> to handle in boot. Currently we do:
>  * "firmware", ie the guest code gets to do all setup that it needs,
>    just as on hardware
>  * "linux kernel", where we provide the more-or-less documented
>    boot environment etc for Linux kernels in particular
>
> I think you're implying that we want to support a third thing here?

Any further comment on this? As I said, I'm unconvinced about
making this method more general than we really need. The other
patches have been reviewed so consensus on this API is I think
the only blocker.
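
For readers who want to see the shape of the API under discussion without the QOM machinery, here is a plain C sketch. Everything in it is an assumption made for illustration: BootIf, BootIfOps, ToyGic and boot_notify are hypothetical names, and a function-pointer table stands in for a QOM interface. It shows the narrow form argued for above, where the device receives a single bool and applies it at its next reset.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct BootIf BootIf;

typedef struct BootIfOps {
    /* Called once after machine construction, before the first reset. */
    void (*linux_init)(BootIf *obj, bool secure_boot);
} BootIfOps;

struct BootIf {
    const BootIfOps *ops;
};

#define TOY_GIC_IRQS 8

typedef struct ToyGic {
    BootIf parent;                 /* first member, so BootIf * casts back */
    bool irqs_nonsecure_on_reset;  /* set by the boot hook */
    bool group1[TOY_GIC_IRQS];     /* true: interrupt is in Group 1 */
} ToyGic;

static void toy_gic_linux_init(BootIf *obj, bool secure_boot)
{
    ToyGic *gic = (ToyGic *)obj;
    /* A NonSecure kernel boot wants every interrupt in Group 1. */
    gic->irqs_nonsecure_on_reset = !secure_boot;
}

static const BootIfOps toy_gic_ops = {
    .linux_init = toy_gic_linux_init,
};

void toy_gic_reset(ToyGic *gic)
{
    for (size_t i = 0; i < TOY_GIC_IRQS; i++) {
        gic->group1[i] = gic->irqs_nonsecure_on_reset;
    }
}

void toy_gic_init(ToyGic *gic)
{
    gic->parent.ops = &toy_gic_ops;
    gic->irqs_nonsecure_on_reset = false;  /* true hardware reset state */
    toy_gic_reset(gic);
}

/* Board code: notify every device that implements the hook. */
void boot_notify(BootIf **devs, size_t n, bool secure_boot)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i]->ops != NULL && devs[i]->ops->linux_init != NULL) {
            devs[i]->ops->linux_init(devs[i], secure_boot);
        }
    }
}
```

Because the hook takes only a bool, the toy device needs no CPU-specific header, which is the constraint mentioned above for compiled-once objects.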

thanks
-- PMM

[Qemu-devel] [PATCH] spice: Allow to set password even if disable-ticketing was used

2015-08-14 Thread Christophe Fergeau

Before commit b1ea7b79e1, it was possible to start with -spice
disable-ticketing, and then use the "set_password spice" command to
enable ticketing with SPICE. Since commit b1ea7b79e1 this is no longer
possible as qemu_spice_set_ticket() will return an error unless the
'auth' type is "spice". When ticketing is disabled, 'auth' is "none" so
the attempt to set password fails.

This commit allows to call qemu_spice_set_ticket() when 'auth' is "none"
and changes 'auth' to "spice" when this happens.
---
 ui/spice-core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/ui/spice-core.c b/ui/spice-core.c
index 4da3042..3b20c6c 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -882,6 +882,10 @@ static int qemu_spice_set_ticket(bool fail_if_conn, bool disconnect_if_conn)
 int qemu_spice_set_passwd(const char *passwd,
                           bool fail_if_conn, bool disconnect_if_conn)
 {
+    if (strcmp(auth, "none") == 0) {
+        /* Allow to set a password when started with 'disable-ticketing' */
+        auth = "spice";
+    }
     if (strcmp(auth, "spice") != 0) {
         return -1;
     }
-- 
2.4.3

Re: [Qemu-devel] [PATCH v2 2/6] hw/arm: new interface for devices which need to behave differently for kernel boot

2015-08-14 Thread Peter Maydell

[oops, forgot to update Peter C's email address in the From line;
apologies to everybody else for the duplicate mail.]

On 18 July 2015 at 10:00, Peter Maydell  wrote:
> On 18 July 2015 at 04:55, Peter Crosthwaite
>  wrote:
>> On Thu, Jul 16, 2015 at 1:11 PM, Peter Maydell  
>> wrote:
>>> For ARM we have a little minimalist bootloader in hw/arm/boot.c which
>>> takes the place of firmware if we're directly booting a Linux kernel.
>>> Unfortunately a few devices need special case handling in this situation
>>> to do the initialization which on real hardware would be done by
>>> firmware. (In particular if we're booting a kernel in NonSecure state
>>> then we need to make a TZ-aware GIC put all its interrupts into Group 1,
>>> or the guest will be unable to use them.)
>>>
>>> Create a new QOM interface which can be implemented by devices which
>>> need to do something different from their default reset behaviour.
>>> The callback will be called after machine initialization and before
>>> first reset.

>>> +typedef struct ARMLinuxBootIfClass {
>>> +    /*< private >*/
>>> +    InterfaceClass parent_class;
>>> +
>>> +    /*< public >*/
>>> +    /** arm_linux_init: configure the device for a direct boot
>>> +     * of an ARM Linux kernel (so that device reset puts it into
>>> +     * the state the kernel expects after firmware initialization,
>>> +     * rather than the true hardware reset state). This callback is
>>> +     * called once after machine construction is complete (before the
>>> +     * first system reset).
>>> +     *
>>> +     * @obj: the object implementing this interface
>>> +     * @secure_boot: true if we are booting Secure, false for NonSecure
>>> +     * (or for a CPU which doesn't support TrustZone)
>>> +     */
>>> +    void (*arm_linux_init)(ARMLinuxBootIf *obj, bool secure_boot);
>>
>> Can we drop the "arm_"? ARM is always going to be there in the context
>> as it is in the typename due to ARMLinuxBootIfClass.
>
> Yeah, I guess so. I wasn't really sure what the best method
> name here was.
>
>> So If we are going for an ARM-specific thing, it might make sense to
>> instead drop all the _linux_ stuff and have this call unconditional.
>> Then the API has wider application than just Linux boots. The struct
>> arm_boot_info can be made more widely visible as the one data argument
>> the device accepts, from which security state as well and is_linux can
>> be fished.
>
> I was going to pass arm_boot_info in, but that struct requires
> the target cpu.h so it can't be used in compiled-once objects
> like the GIC code. Hence the single bool parameter.
>
> I'm also not too keen on increasing the set of things we try
> to handle in boot. Currently we do:
>  * "firmware", ie the guest code gets to do all setup that it needs,
>    just as on hardware
>  * "linux kernel", where we provide the more-or-less documented
>    boot environment etc for Linux kernels in particular
>
> I think you're implying that we want to support a third thing here?

Any further comment on this? As I said, I'm unconvinced about
making this method more general than we really need. The other
patches have been reviewed so consensus on this API is I think
the only blocker.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] spice: Allow to set password even if disable-ticketing was used

2015-08-14 Thread Daniel P. Berrange

On Fri, Aug 14, 2015 at 02:47:15PM +0200, Christophe Fergeau wrote:
> Before commit b1ea7b79e1, it was possible to start with -spice
> disable-ticketing, and then use the "set_password spice" command to
> enable ticketing with SPICE. Since commit b1ea7b79e1 this is no longer
> possible as qemu_spice_set_ticket() will return an error unless the
> 'auth' type is "spice". When ticketing is disabled, 'auth' is "none" so
> the attempt to set password fails.
> 
> This commit allows to call qemu_spice_set_ticket() when 'auth' is "none"
> and changes 'auth' to "spice" when this happens.

IMHO we should not be changing the authentication method as a side
effect of trying to set the password.

If app has disabled ticketing, it should remain disabled and the
set password call is right to return an error.

We should have a graphics-set-auth command for changing authentication
parameters on existing graphics backend.
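
The separation argued for here can be sketched as follows. This is a toy model, not the spice-core code: ToyAuth and the toy_* functions are hypothetical, and toy_set_auth stands in for the suggested (not yet existing) graphics-set-auth command. Setting a password never changes the authentication mode; only the explicit call does.

```c
#include <stdio.h>

typedef enum ToyAuth {
    TOY_AUTH_NONE,   /* started with disable-ticketing */
    TOY_AUTH_SPICE   /* ticketing enabled */
} ToyAuth;

static ToyAuth toy_auth = TOY_AUTH_NONE;
static char toy_passwd[64];

/* Explicit request to change the authentication mode. */
void toy_set_auth(ToyAuth mode)
{
    toy_auth = mode;
}

ToyAuth toy_get_auth(void)
{
    return toy_auth;
}

/*
 * Fails with -1 while ticketing is disabled, and has no side effect
 * on the authentication mode either way.
 */
int toy_set_passwd(const char *passwd)
{
    if (toy_auth != TOY_AUTH_SPICE) {
        return -1;
    }
    snprintf(toy_passwd, sizeof(toy_passwd), "%s", passwd);
    return 0;
}
```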

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-14 Thread Paolo Bonzini



- Original Message -
> From: "Peter Lieven" 
> To: qemu-devel@nongnu.org, k...@vger.kernel.org
> Cc: "Paolo Bonzini" 
> Sent: Friday, August 14, 2015 1:11:34 PM
> Subject: Help debugging a regression in KVM Module
> 
> Hi,
> 
> some time ago I stumbled across a regression in the KVM Module that has been
> introduced somewhere
> between 3.17 and 3.19.
> 
> I have a rather old openSUSE guest with an XFS filesystem which reliably
> crashes after some live migrations.
> I originally believed that the issue might be related to my setup with a 3.12
> host kernel and kvm-kmod 3.19,
> but I now found that it is also still present with a 3.19 host kernel with
> included 3.19 kvm module.
> 
> My idea was to continue testing on a 3.12 host kernel and then bisect all
> commits to the kvm related parts.
> 
> Now my question is how to best bisect only kvm related changes (those that go
> into kvm-kmod)?

I haven't forgotten this.  Sorry. :(

Unfortunately I'll be away for three weeks, but I'll make it a priority
when I'm back.

Paolo

[Qemu-devel] [PULL v2 09/20] configure: Default to enable module build

2015-08-14 Thread Paolo Bonzini

From: Fam Zheng 

We have had module build support for a while, but it has also bitrotted
several times. It probably makes sense to enable it by default so that
people can notice and use it.

Add --disable-modules as a counterpart to --enable-modules, which is
now turned on by default.  If both are omitted, support is guessed as
usual.

pie is now checked for all platforms, because it's depended on by module
build.

Signed-off-by: Fam Zheng 
Message-Id: <1423481144-20314-2-git-send-email-f...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 .travis.yml |   2 +-
 configure   | 123 +---
 2 files changed, 86 insertions(+), 39 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 0ac170b..12bf1db 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -99,5 +99,5 @@ matrix:
   EXTRA_CONFIG="--enable-trace-backends=ust"
   compiler: gcc
 - env: TARGETS=i386-softmmu,x86_64-softmmu
-   EXTRA_CONFIG="--enable-modules"
+   EXTRA_CONFIG="--disable-modules"
   compiler: gcc
diff --git a/configure b/configure
index 704b34c..6faeb00 100755
--- a/configure
+++ b/configure
@@ -271,7 +271,7 @@ gcov_tool="gcov"
 EXESUF=""
 DSOSUF=".so"
 LDFLAGS_SHARED="-shared"
-modules="no"
+modules=""
 prefix="/usr/local"
 mandir="\${prefix}/share/man"
 datadir="\${prefix}/share"
@@ -784,6 +784,9 @@ for opt do
   --enable-modules)
   modules="yes"
   ;;
+  --disable-modules)
+  modules="no"
+  ;;
   --cpu=*)
   ;;
   --target-list=*) target_list="$optarg"
@@ -1508,9 +1511,6 @@ if compile_prog "-Werror -fno-gcse" "" ; then
 fi
 
 if test "$static" = "yes" ; then
-  if test "$modules" = "yes" ; then
-error_exit "static and modules are mutually incompatible"
-  fi
   if test "$pie" = "yes" ; then
 error_exit "static and pie are mutually incompatible"
   else
@@ -1518,17 +1518,6 @@ if test "$static" = "yes" ; then
   fi
 fi
 
-# Unconditional check for compiler __thread support
-  cat > $TMPC << EOF
-static __thread int tls_var;
-int main(void) { return tls_var; }
-EOF
-
-if ! compile_prog "-Werror" "" ; then
-error_exit "Your compiler does not support the __thread specifier for " \
-   "Thread-Local Storage (TLS). Please upgrade to a version that does."
-fi
-
 if test "$pie" = ""; then
   case "$cpu-$targetos" in
 i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
@@ -1601,6 +1590,17 @@ EOF
   fi
 fi
 
+# Unconditional check for compiler __thread support
+  cat > $TMPC << EOF
+static __thread int tls_var;
+int main(void) { return tls_var; }
+EOF
+
+if ! compile_prog "-Werror" "" ; then
+error_exit "Your compiler does not support the __thread specifier for " \
+   "Thread-Local Storage (TLS). Please upgrade to a version that does."
+fi
+
 ##
 # __sync_fetch_and_and requires at least -march=i486. Many toolchains
 # use i686 as default anyway, but for those that don't, an explicit
@@ -2784,17 +2784,26 @@ if test "$modules" = yes; then
 glib_modules="$glib_modules gmodule-2.0"
 fi
 
-for i in $glib_modules; do
-if $pkg_config --atleast-version=$glib_req_ver $i; then
-glib_cflags=`$pkg_config --cflags $i`
-glib_libs=`$pkg_config --libs $i`
-CFLAGS="$glib_cflags $CFLAGS"
-LIBS="$glib_libs $LIBS"
-libs_qga="$glib_libs $libs_qga"
-else
-error_exit "glib-$glib_req_ver $i is required to compile QEMU"
-fi
-done
+glib_pkg_config()
+{
+  if $pkg_config --atleast-version=$glib_req_ver $1; then
+local probe_cflags
+local probe_libs
+probe_cflags=$($pkg_config --cflags $1)
+probe_libs=$($pkg_config --libs $1)
+CFLAGS="$probe_cflags $CFLAGS"
+LIBS="$probe_libs $LIBS"
+libs_qga="$probe_libs $libs_qga"
+glib_cflags="$probe_cflags $glib_cflags"
+glib_libs="$probe_libs $glib_libs"
+return 0
+  else
+return 1
+  fi
+}
+
+glib_pkg_config gthread-2.0 || \
+  error_exit "glib-$glib_req_ver gthread-2.0 is required to compile QEMU"
 
 # g_test_trap_subprocess added in 2.38. Used by some tests.
 glib_subprocess=yes
@@ -2815,19 +2824,57 @@ if ! compile_prog "$glib_cflags -Werror" "$glib_libs" ; then
 fi
 
 ##
-# SHA command probe for modules
-if test "$modules" = yes; then
-shacmd_probe="sha1sum sha1 shasum"
-for c in $shacmd_probe; do
-if has $c; then
-shacmd="$c"
-break
-fi
-done
-if test "$shacmd" = ""; then
-error_exit "one of the checksum commands is required to enable modules: $shacmd_probe"
+# SHA command and gmodule-2.0 probe for modules
+# return 0 if probe succeeds
+# $1: true - force mode, exit if probe fail
+# false - optional mode, return 1 if probe fail
+module_try_enable()
+{
+  force=$1
+  if test "$static" = "yes"; then
+if $force; then
+  error_exit "static and modules are mutually incompatible"
+else
+  modules="no"
+  return
 fi
-fi
+  fi
+

[Qemu-devel] [PULL v2 00/20] SCSI, build, TCG, RCU, misc patches for 2015-08-12

2015-08-14 Thread Paolo Bonzini

The following changes since commit cb48f67ad8c7b33c617d4f8144a27706e69fd688:

  bsd-user: Fix operand to cpu_x86_exec (2015-07-30 12:38:49 +0100)

are available in the git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 2dfebe37e6210278cddd070761ee543f49a8a82d:

  disas: Defeature print_target_address (2015-08-13 11:33:35 +0200)


* SCSI fixes from Stefan and Fam
* vhost-scsi fix from Igor and Lu Lina
* a build system fix from Daniel
* two more multi-arch-related patches from Peter C.
* TCG patches from myself and Sergey Fedorov
* RCU improvement from Wen Congyang
* enabling module builds by default
* a few more simple cleanups


Chen Hanxiao (1):
  exec: use macro ROUND_UP for alignment

Daniel P. Berrange (1):
  configure: only add CONFIG_RDMA to config-host.h once

Fam Zheng (3):
  scsi-disk: Fix assertion failure on WRITE SAME
  virtio-scsi-test: Add test case for tail unaligned WRITE SAME
  configure: Default to enable module build

Igor Mammedov (1):
  vhost/scsi: call vhost_dev_cleanup() at unrealize() time

Lu Lina (1):
  vhost-scsi: Clarify vhost_virtqueue_mask argument

Paolo Bonzini (6):
  exec: drop cpu_can_do_io, just read cpu->can_do_io
  qemu-nbd: remove unnecessary qemu_notify_event()
  scsi: create restart bottom half in the right AioContext
  scsi-disk: identify AIO callbacks more clearly
  scsi-generic: identify AIO callbacks more clearly
  hw: fix mask for ColdFire UART command register

Peter Crosthwaite (2):
  cpu_defs: Simplify CPUTLB padding logic
  disas: Defeature print_target_address

Sergey Fedorov (1):
  cpu-exec: Do not invalidate original TB in cpu_exec_nocache()

Stefan Hajnoczi (3):
  virtio-scsi: use virtqueue_map_sg() when loading requests
  scsi-disk: fix cmd.mode field typo
  tests: virtio-scsi: clear unit attention after reset

Wen Congyang (1):
  rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

 .travis.yml              |   2 +-
 configure                | 127 +++
 cpu-exec.c               |  10 ++--
 cpus.c                   |   2 +-
 disas.c                  |  12 +
 exec.c                   |   2 +-
 hw/char/mcf_uart.c       |   2 +-
 hw/scsi/scsi-bus.c       |   3 +-
 hw/scsi/scsi-disk.c      |  97
 hw/scsi/scsi-generic.c   |  66 ++--
 hw/scsi/vhost-scsi.c     |   3 +-
 hw/scsi/virtio-scsi.c    |   5 ++
 include/exec/cpu-defs.h  |  23 +
 include/exec/exec-all.h  |  23 +
 include/qom/cpu.h        |   4 +-
 qemu-nbd.c               |   1 -
 qom/cpu.c                |   2 +-
 softmmu_template.h       |   4 +-
 tests/virtio-scsi-test.c | 100 +++--
 translate-all.c          |  11 +++-
 util/rcu.c               |  48 +-
 21 files changed, 337 insertions(+), 210 deletions(-)
-- 
2.4.3

Re: [Qemu-devel] [PATCH] spice: Allow to set password even if disable-ticketing was used

2015-08-14 Thread Christophe Fergeau

Hey,

On Fri, Aug 14, 2015 at 01:54:59PM +0100, Daniel P. Berrange wrote:
> On Fri, Aug 14, 2015 at 02:47:15PM +0200, Christophe Fergeau wrote:
> > Before commit b1ea7b79e1, it was possible to start with -spice
> > disable-ticketing, and then use the "set_password spice" command to
> > enable ticketing with SPICE. Since commit b1ea7b79e1 this is no longer
> > possible as qemu_spice_set_ticket() will return an error unless the
> > 'auth' type is "spice". When ticketing is disabled, 'auth' is "none" so
> > the attempt to set password fails.
> > 
> > This commit allows to call qemu_spice_set_ticket() when 'auth' is "none"
> > and changes 'auth' to "spice" when this happens.
> 
> IMHO we should not be changing the authentication method as a side
> effect of trying to set the password.
> 
> If app has disabled ticketing, it should remain disabled and the
> set password call is right to return an error.
> 

In general I agree with you. However in this case, this used to be
working until ~1 year ago, and this change of behaviour caused a bug in
oVirt (oVirt side is being fixed). This is why I sent this patch.

The intent of commit b1ea7b seems to be to prevent
qemu_spice_set_passwd() from being called when SASL is used, and does
not mention at all whether preventing going from auth being "none" to
"spice" is intentional.

If this change of behaviour was an intentional bug fix, and if we are
fine with asking for oVirt changes for this, then I'm ok with dropping
this patch.

Christophe


[Qemu-devel] [PATCH 2/3] i8257: remove cpu_request_exit irq

2015-08-14 Thread Paolo Bonzini

This is unused.  cpu_exit now is almost exclusively an internal function
to the CPU execution loop.  The next patch will change the remaining
occurrences to qemu_cpu_kick, making it truly internal.

Signed-off-by: Paolo Bonzini 
---
 hw/dma/i82374.c         |  5 +
 hw/dma/i8257.c          | 13 -
 hw/i386/pc.c            | 13 +
 hw/isa/i82378.c         |  3 +--
 hw/mips/mips_fulong2e.c | 13 +
 hw/mips/mips_jazz.c     | 13 +
 hw/mips/mips_malta.c    | 13 +
 hw/ppc/prep.c           | 11 ---
 hw/sparc/sun4m.c        |  2 +-
 hw/sparc64/sun4u.c      |  2 +-
 include/hw/isa/isa.h    |  2 +-
 11 files changed, 13 insertions(+), 77 deletions(-)

diff --git a/hw/dma/i82374.c b/hw/dma/i82374.c
index b8ad2e6..f630971 100644
--- a/hw/dma/i82374.c
+++ b/hw/dma/i82374.c
@@ -38,7 +38,6 @@ do { fprintf(stderr, "i82374 ERROR: " fmt , ## __VA_ARGS__); } while (0)
 
 typedef struct I82374State {
 uint8_t commands[8];
-qemu_irq out;
 PortioList port_list;
 } I82374State;
 
@@ -101,7 +100,7 @@ static uint32_t i82374_read_descriptor(void *opaque, uint32_t nport)
 
 static void i82374_realize(I82374State *s, Error **errp)
 {
-DMA_init(1, &s->out);
+DMA_init(1);
 memset(s->commands, 0, sizeof(s->commands));
 }
 
@@ -145,8 +144,6 @@ static void i82374_isa_realize(DeviceState *dev, Error **errp)
 isa->iobase);
 
 i82374_realize(s, errp);
-
-qdev_init_gpio_out(dev, &s->out, 1);
 }
 
 static Property i82374_properties[] = {
diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
index 409ba7d..1398424 100644
--- a/hw/dma/i8257.c
+++ b/hw/dma/i8257.c
@@ -59,7 +59,6 @@ static struct dma_cont {
 uint8_t flip_flop;
 int dshift;
 struct dma_regs regs[4];
-qemu_irq *cpu_request_exit;
 MemoryRegion channel_io;
 MemoryRegion cont_io;
 } dma_controllers[2];
@@ -521,13 +520,11 @@ static const MemoryRegionOps cont_io_ops = {
 
 /* dshift = 0: 8 bit DMA, 1 = 16 bit DMA */
 static void dma_init2(struct dma_cont *d, int base, int dshift,
-  int page_base, int pageh_base,
-  qemu_irq *cpu_request_exit)
+  int page_base, int pageh_base)
 {
 int i;
 
 d->dshift = dshift;
-d->cpu_request_exit = cpu_request_exit;
 
 memory_region_init_io(&d->channel_io, NULL, &channel_io_ops, d,
   "dma-chan", 8 << d->dshift);
@@ -591,12 +588,10 @@ static const VMStateDescription vmstate_dma = {
 }
 };
 
-void DMA_init(int high_page_enable, qemu_irq *cpu_request_exit)
+void DMA_init(int high_page_enable)
 {
-dma_init2(&dma_controllers[0], 0x00, 0, 0x80,
-  high_page_enable ? 0x480 : -1, cpu_request_exit);
-dma_init2(&dma_controllers[1], 0xc0, 1, 0x88,
-  high_page_enable ? 0x488 : -1, cpu_request_exit);
+dma_init2(&dma_controllers[0], 0x00, 0, 0x80, high_page_enable ? 0x480 : -1);
+dma_init2(&dma_controllers[1], 0xc0, 1, 0x88, high_page_enable ? 0x488 : -1);
 vmstate_register (NULL, 0, &vmstate_dma, &dma_controllers[0]);
 vmstate_register (NULL, 1, &vmstate_dma, &dma_controllers[1]);
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7661ea9..c63a308 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1431,15 +1431,6 @@ DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
 return dev;
 }
 
-static void cpu_request_exit(void *opaque, int irq, int level)
-{
-CPUState *cpu = current_cpu;
-
-if (cpu && level) {
-cpu_exit(cpu);
-}
-}
-
 static const MemoryRegionOps ioport80_io_ops = {
 .write = ioport80_write,
 .read = ioport80_read,
@@ -1474,7 +1465,6 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
 qemu_irq rtc_irq = NULL;
 qemu_irq *a20_line;
 ISADevice *i8042, *port92, *vmmouse, *pit = NULL;
-qemu_irq *cpu_exit_irq;
 MemoryRegion *ioport80_io = g_new(MemoryRegion, 1);
 MemoryRegion *ioportF0_io = g_new(MemoryRegion, 1);
 
@@ -1551,8 +1541,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
 port92 = isa_create_simple(isa_bus, "port92");
 port92_init(port92, &a20_line[1]);
 
-cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1);
-DMA_init(0, cpu_exit_irq);
+DMA_init(0);
 
 for(i = 0; i < MAX_FD; i++) {
 fd[i] = drive_get(IF_FLOPPY, 0, i);
diff --git a/hw/isa/i82378.c b/hw/isa/i82378.c
index fcf97d8..d4c8306 100644
--- a/hw/isa/i82378.c
+++ b/hw/isa/i82378.c
@@ -100,7 +100,6 @@ static void i82378_realize(PCIDevice *pci, Error **errp)
 
 /* 2 82C37 (dma) */
 isa = isa_create_simple(isabus, "i82374");
-qdev_connect_gpio_out(DEVICE(isa), 0, s->out[1]);
 
 /* timer */
 isa_create_simple(isabus, "mc146818rtc");
@@ -111,7 +110,7 @@ static void i82378_init(Object *obj)
 DeviceState *dev = DEVICE(obj);
 I82378State *s = I82378(obj);
 
-qdev_init_gpio_out(dev, s->out, 2);
+qdev_init_gpio_out(dev, s->out, 1);
 qdev_init_gpio_in(dev, i

[Qemu-devel] [PATCH 1/3] i8257: rewrite DMA_schedule to avoid hooking into the CPU loop

2015-08-14 Thread Paolo Bonzini

The i8257 DMA controller uses an idle bottom half, which by default
does not cause the main loop to exit.  Therefore, the DMA_schedule
function is there to ensure that the CPU relinquishes the iothread
mutex to the iothread.

However, this is not enough since the iothread will call
aio_compute_timeout() and go to sleep again.  In the iothread
world, forcing execution of the idle bottom half is much simpler,
and only requires a call to qemu_notify_event().  Do it, removing
the need for the "cpu_request_exit" pseudo-irq.  The next patch
will remove it.

Signed-off-by: Paolo Bonzini 
---
 hw/block/fdc.c       |  2 +-
 hw/dma/i8257.c       | 18 --
 hw/sparc/sun4m.c     |  2 +-
 hw/sparc64/sun4u.c   |  2 +-
 include/hw/isa/isa.h |  2 +-
 5 files changed, 16 insertions(+), 10 deletions(-)
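
To illustrate the idea outside QEMU, here is a minimal sketch of the flag-guarded wakeup described in the commit message. The fake_* functions and notify_count are hypothetical stand-ins for the main-loop primitives, not QEMU APIs; only the dma_bh_scheduled flag name is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's main-loop primitives. */
static int notify_count;       /* how many times the event loop was woken */
static bool dma_bh_scheduled;  /* same role as the flag added by the patch */

/* Stand-in for qemu_notify_event(): just count the wakeups. */
static void fake_notify_event(void) { notify_count++; }

/* Models the "rearm" path of DMA_run(): an idle bottom half is queued. */
static void fake_schedule_idle_bh(void) { dma_bh_scheduled = true; }

/* Models DMA_run_bh(): the flag is cleared before the work runs. */
static void fake_run_bh(void) { dma_bh_scheduled = false; }

/* Models the new DMA_schedule(): wake the loop only if work is pending. */
static void fake_dma_schedule(void)
{
    if (dma_bh_scheduled) {
        fake_notify_event();
    }
}
```

Calling fake_dma_schedule() before fake_schedule_idle_bh() leaves notify_count untouched; calling it afterwards increments the counter once. That is the property the patch relies on to avoid needless wakeups.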

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 5e1b67e..6686a72 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1417,7 +1417,7 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction)
  * recall us...
  */
 DMA_hold_DREQ(fdctrl->dma_chann);
-DMA_schedule(fdctrl->dma_chann);
+DMA_schedule();
 } else {
 /* Start transfer */
 fdctrl_transfer_handler(fdctrl, fdctrl->dma_chann, 0,
diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
index a414029..409ba7d 100644
--- a/hw/dma/i8257.c
+++ b/hw/dma/i8257.c
@@ -358,6 +358,7 @@ static void channel_run (int ncont, int ichan)
 }
 
 static QEMUBH *dma_bh;
+static bool dma_bh_scheduled;
 
 static void DMA_run (void)
 {
@@ -390,12 +391,15 @@ static void DMA_run (void)
 
 running = 0;
 out:
-if (rearm)
+if (rearm) {
 qemu_bh_schedule_idle(dma_bh);
+dma_bh_scheduled = true;
+}
 }
 
 static void DMA_run_bh(void *unused)
 {
+dma_bh_scheduled = false;
 DMA_run();
 }
 
@@ -458,12 +462,14 @@ int DMA_write_memory (int nchan, void *buf, int pos, int len)
 return len;
 }
 
-/* request the emulator to transfer a new DMA memory block ASAP */
-void DMA_schedule(int nchan)
+/* request the emulator to transfer a new DMA memory block ASAP (even
+ * if the idle bottom half would not have exited the iothread yet).
+ */
+void DMA_schedule(void)
 {
-struct dma_cont *d = &dma_controllers[nchan > 3];
-
-qemu_irq_pulse(*d->cpu_request_exit);
+if (dma_bh_scheduled) {
+qemu_notify_event();
+}
 }
 
 static void dma_reset(void *opaque)
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 68ac4d8..ebaae9d 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -109,7 +109,7 @@ int DMA_write_memory (int nchan, void *buf, int pos, int size)
 }
 void DMA_hold_DREQ (int nchan) {}
 void DMA_release_DREQ (int nchan) {}
-void DMA_schedule(int nchan) {}
+void DMA_schedule(void) {}
 
 void DMA_init(int high_page_enable, qemu_irq *cpu_request_exit)
 {
diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index 30cfa0e..44eb4eb 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -112,7 +112,7 @@ int DMA_write_memory (int nchan, void *buf, int pos, int size)
 }
 void DMA_hold_DREQ (int nchan) {}
 void DMA_release_DREQ (int nchan) {}
-void DMA_schedule(int nchan) {}
+void DMA_schedule(void) {}
 
 void DMA_init(int high_page_enable, qemu_irq *cpu_request_exit)
 {
diff --git a/include/hw/isa/isa.h b/include/hw/isa/isa.h
index f21ceaa..81b94ea 100644
--- a/include/hw/isa/isa.h
+++ b/include/hw/isa/isa.h
@@ -112,7 +112,7 @@ int DMA_read_memory (int nchan, void *buf, int pos, int size);
 int DMA_write_memory (int nchan, void *buf, int pos, int size);
 void DMA_hold_DREQ (int nchan);
 void DMA_release_DREQ (int nchan);
-void DMA_schedule(int nchan);
+void DMA_schedule(void);
 void DMA_init(int high_page_enable, qemu_irq *cpu_request_exit);
 void DMA_register_channel (int nchan,
DMA_transfer_handler transfer_handler,
-- 
2.4.3

[Qemu-devel] [PATCH 0/3] Signal-free qemu_cpu_kick for TCG

2015-08-14 Thread Paolo Bonzini

The first two patches remove most uses of cpu_exit outside the
CPU loop.  The third patch converts qemu_cpu_kick to do memory
accesses from the iothread instead of using a signal.

Paolo

Paolo Bonzini (3):
  i8257: rewrite DMA_schedule to avoid hooking into the CPU loop
  i8257: remove cpu_request_exit irq
  tcg: signal-free qemu_cpu_kick

 cpu-exec.c              | 18 --
 cpus.c                  | 91 ++---
 gdbstub.c               |  2 +-
 hw/block/fdc.c          |  2 +-
 hw/dma/i82374.c         |  5 +--
 hw/dma/i8257.c          | 31 +
 hw/i386/pc.c            | 13 +--
 hw/isa/i82378.c         |  3 +-
 hw/mips/mips_fulong2e.c | 13 +--
 hw/mips/mips_jazz.c     | 13 +--
 hw/mips/mips_malta.c    | 13 +--
 hw/ppc/prep.c           | 11 --
 hw/ppc/spapr_rtas.c     |  2 +-
 hw/sparc/sun4m.c        |  4 +--
 hw/sparc64/sun4u.c      |  4 +--
 include/hw/isa/isa.h    |  4 +--
 qom/cpu.c               |  2 ++
 17 files changed, 64 insertions(+), 167 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH 3/3] tcg: signal-free qemu_cpu_kick

2015-08-14 Thread Paolo Bonzini

Signals are slow and do not exist on Win32.  It is not much more
complicated to use memory barriers (which we already need anyway on
Windows!) and set the existing flags in the iothread.

qemu_cpu_kick_thread is not used anymore on TCG, since the TCG thread
is never outside usermode while the CPU is running (not halted).

Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c          | 18 ---
 cpus.c              | 91 +++--
 gdbstub.c           |  2 +-
 hw/ppc/spapr_rtas.c |  2 +-
 qom/cpu.c           |  2 ++
 5 files changed, 35 insertions(+), 80 deletions(-)
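
As a rough illustration of kicking through memory instead of a signal, here is a minimal sketch in portable C11. It assumes release/acquire atomics from <stdatomic.h> in place of QEMU's own smp_wmb(), smp_rmb() and atomic_mb_read() helpers, so it is an analogue rather than the code in this patch. kick(), poll_kick() and pending_work are hypothetical names; only exit_request is borrowed from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int pending_work;   /* the reason for the kick (hypothetical) */
static atomic_bool exit_request;  /* flag polled by the execution loop */

/* Kicker side: publish the reason first, then raise the flag.
 * The release store keeps the earlier write visible to any thread
 * that later observes the flag with acquire semantics. */
static void kick(int work)
{
    atomic_store_explicit(&pending_work, work, memory_order_relaxed);
    atomic_store_explicit(&exit_request, true, memory_order_release);
}

/* Loop side: consume the flag; return the published work if a kick
 * was seen, or -1 if there was nothing to do. */
static int poll_kick(void)
{
    if (atomic_exchange_explicit(&exit_request, false, memory_order_acquire)) {
        return atomic_load_explicit(&pending_work, memory_order_relaxed);
    }
    return -1;
}
```

The ordering matters in the same way as in the commit message: the flag must not become visible before whatever caused the exit, otherwise the loop could leave and find nothing to handle.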

diff --git a/cpu-exec.c b/cpu-exec.c
index 713540f..069c2eb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -367,19 +367,10 @@ int cpu_exec(CPUState *cpu)
 cpu->halted = 0;
 }
 
-current_cpu = cpu;
-
-/* As long as current_cpu is null, up to the assignment just above,
- * requests by other threads to exit the execution loop are expected to
- * be issued using the exit_request global. We must make sure that our
- * evaluation of the global value is performed past the current_cpu
- * value transition point, which requires a memory barrier as well as
- * an instruction scheduling constraint on modern architectures.  */
-smp_mb();
-
+atomic_mb_set(&current_cpu, cpu);
 rcu_read_lock();
 
-if (unlikely(exit_request)) {
+if (unlikely(atomic_mb_read(&exit_request))) {
 cpu->exit_request = 1;
 }
 
@@ -519,8 +510,11 @@ int cpu_exec(CPUState *cpu)
  * loop. Whatever requested the exit will also
  * have set something else (eg exit_request or
  * interrupt_request) which we will handle
- * next time around the loop.
+ * next time around the loop.  But we need to
+ * ensure tcg_exit_req is read before exit_request
+ * or interrupt_request.
  */
+smp_rmb();
 next_tb = 0;
 break;
 case TB_EXIT_ICOUNT_EXPIRED:
diff --git a/cpus.c b/cpus.c
index c1e74d9..0aa02a0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -661,14 +661,6 @@ static void cpu_handle_guest_debug(CPUState *cpu)
 cpu->stopped = true;
 }
 
-static void cpu_signal(int sig)
-{
-if (current_cpu) {
-cpu_exit(current_cpu);
-}
-exit_request = 1;
-}
-
 #ifdef CONFIG_LINUX
 static void sigbus_reraise(void)
 {
@@ -781,29 +773,11 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 }
 }
 
-static void qemu_tcg_init_cpu_signals(void)
-{
-sigset_t set;
-struct sigaction sigact;
-
-memset(&sigact, 0, sizeof(sigact));
-sigact.sa_handler = cpu_signal;
-sigaction(SIG_IPI, &sigact, NULL);
-
-sigemptyset(&set);
-sigaddset(&set, SIG_IPI);
-pthread_sigmask(SIG_UNBLOCK, &set, NULL);
-}
-
 #else /* _WIN32 */
 static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 {
 abort();
 }
-
-static void qemu_tcg_init_cpu_signals(void)
-{
-}
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
@@ -1041,7 +1015,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 rcu_register_thread();
 
 qemu_mutex_lock_iothread();
-qemu_tcg_init_cpu_signals();
 qemu_thread_get_self(cpu->thread);
 
 CPU_FOREACH(cpu) {
@@ -1085,61 +1058,45 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #ifndef _WIN32
 int err;
 
+if (cpu->thread_kicked) {
+return;
+}
+cpu->thread_kicked = true;
 err = pthread_kill(cpu->thread->thread, SIG_IPI);
 if (err) {
 fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
 exit(1);
 }
 #else /* _WIN32 */
-if (!qemu_cpu_is_self(cpu)) {
-CONTEXT tcgContext;
-
-if (SuspendThread(cpu->hThread) == (DWORD)-1) {
-fprintf(stderr, "qemu:%s: GetLastError:%lu\n", __func__,
-GetLastError());
-exit(1);
-}
-
-/* On multi-core systems, we are not sure that the thread is actually
- * suspended until we can get the context.
- */
-tcgContext.ContextFlags = CONTEXT_CONTROL;
-while (GetThreadContext(cpu->hThread, &tcgContext) != 0) {
-continue;
-}
-
-cpu_signal(0);
-
-if (ResumeThread(cpu->hThread) == (DWORD)-1) {
-fprintf(stderr, "qemu:%s: GetLastError:%lu\n", __func__,
-GetLastError());
-exit(1);
-}
-}
+abort();
 #endif
 }
 
 void qemu_cpu_kick(CPUState *cpu)
 {
 qemu_cond_broadcast(cpu->halt_cond);
-if (!tcg_enabled() && !cpu->thread_kicked) {
+if (tcg_enabled()) {
+/* Ensure whatever caused the exit has reached the CPU threads before
+ * writing exit_request.
+ */
+smp_wmb();
+exit_request = 1;
+/* Ignore the CPU argument since all CPUs run in the same thread;

Re: [Qemu-devel] [PATCH RFC 00/10] Enable repository wide style checking

2015-08-14 Thread Paul Eggert


Peter Maydell wrote:

I just don't want
a GPLv3-licensed file in the git repo and an integrated part
of our build-and-test system...


My kneejerk reaction is that the build procedures in question are large enough 
that they should stay GPLv3.  If you don't want those files in your git repo you 
can simply fetch them as part of your bootstrap or autogen.sh or whatever. 
Although this might not mollify people who worry about GPLv3 cooties infecting 
their executables, catering to paranoia is not high on our list of things to do.

[Qemu-devel] [PATCH] e500 ATMU register reads broken

2015-08-14 Thread Rudolf Marek


Hi all,

I noticed that ATMU register reads on E500 are broken. Due to the wrong mask, 
some registers cannot be read and instead some other registers are read. Please 
see attached patch which fixes the problem.


I also noticed that if there was an intention to have 1:1 PCI/CPU space mapping 
for 0xC000_ for MPC8544DS without programming ATMUs - it does not work, 
unless ATMUs are programmed.


Signed-off-by: Rudolf Marek 

Thanks,
Rudolf

--
S přátelským pozdravem / Best regards / Mit freundlichen Grüßen

Ing. Rudolf Marek
SYSGO s.r.o.
Zelený pruh 99
CZ-14800 Praha 4
Phone: +420 222138 627, +49 6136 9948 627
Fax: +420 296374890, +49 6136 9948 1 627
rudolf.ma...@sysgo.com

http://www.sysgo.com | http://www.elinos.com | http://www.pikeos.com

>From 75795e2bcc6ffbb245d192eb84e063d855dbf248 Mon Sep 17 00:00:00 2001
From: Rudolf Marek 
Date: Fri, 14 Aug 2015 13:38:55 +0200
Subject: [PATCH] PPC: e500 pci host: Fix ATMUs register reads

There is a bug in the register mask used when reading
the ATMU registers. As a result, some registers
cannot be read, and reads alias to other
registers. Fix it.

Signed-off-by: Rudolf Marek 
---
 hw/pci-host/ppce500.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
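
For illustration, the aliasing can be reproduced in isolation. This sketch assumes per-window register offsets of POTAR=0x00, POTEAR=0x04, POWBAR=0x08 and POWAR=0x10, and a window base of 0xC20. Those values and the decode_* helpers are assumptions made for the example, not copied from the patch.

```c
#include <stdint.h>

/* Assumed register offsets inside one 0x20-byte ATMU window. */
enum { POTAR = 0x00, POTEAR = 0x04, POWBAR = 0x08, POWAR = 0x10 };

/* Hypothetical base address of an outbound window. */
enum { WINDOW_BASE = 0xC20 };

/* Old decode: the 0xC mask keeps only bits 2-3, so bit 4 is lost. */
static uint32_t decode_old(uint32_t addr) { return addr & 0xC; }

/* New decode: the 0x1F mask keeps the whole offset within the window. */
static uint32_t decode_new(uint32_t addr) { return addr & 0x1F; }
```

With these assumptions, decode_old(WINDOW_BASE + POWAR) yields 0x00, so a read of POWAR lands on the POTAR case, while decode_new() returns 0x10 and selects the intended register.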

diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c
index 613ba73..50add34 100644
--- a/hw/pci-host/ppce500.c
+++ b/hw/pci-host/ppce500.c
@@ -140,7 +140,7 @@ static uint64_t pci_reg_read4(void *opaque, hwaddr addr,
 case PPCE500_PCI_OW3:
 case PPCE500_PCI_OW4:
 idx = (addr >> 5) & 0x7;
-switch (addr & 0xC) {
+switch (addr & 0x1F) {
 case PCI_POTAR:
 value = pci->pob[idx].potar;
 break;
@@ -162,7 +162,7 @@ static uint64_t pci_reg_read4(void *opaque, hwaddr addr,
 case PPCE500_PCI_IW2:
 case PPCE500_PCI_IW1:
 idx = ((addr >> 5) & 0x3) - 1;
-switch (addr & 0xC) {
+switch (addr & 0x1F) {
 case PCI_PITAR:
 value = pci->pib[idx].pitar;
 break;
-- 
1.9.1

[Qemu-devel] [Bug 1484925] [NEW] Segfault with custom vnc client

2015-08-14 Thread Uli Stärk

Public bug reported:

Hey,

I'm using Citrix XenServer 6.5. I wrote a script that uses noVNC to
connect to the rfb console via xapi. When I use GRML and try to boot it,
the QEMU process segfaults and kills my VM. This happens when the screen
resizes and the kernel is loading:

recvfrom(3, "\3\1\0\0\0\0\2\200\1\220\3\0\2\200\0\0\0P\1\220", 4096, 0, NULL, NULL) = 20
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xb28000} ---

In the child process I can see the following message right before the parent
segfaults:
read(4, "cirrus: blanking the screen line_offset=0 height=480\n", 53) = 53

This issue only happens when my custom php/novnc-client is
connected. I also tried the nodejs/novnc package from xen-orchestra, with
the same result. With the stock client from Citrix XenCenter it works just
fine, so I think it is related to noVNC. I hope this is just a bug and
not exploitable to force a VM to crash or execute code.

XenServer launches the qemu with the following command line:

qemu-dm-25 --syslog -d 25 -m 2048 -boot dc -serial pty -vcpus 1
-videoram 4 -vncunused -k en-us -vnc 127.0.0.1:1 -usb -usbdevice tablet
-net nic,vlan=0,macaddr=8a:43:e2:b1:57:df,model=rtl8139 -net
tap,vlan=0,bridge=xenbr0,ifname=tap25.0 -acpi -monitor pty

XenServer 6.5 is using the following version:
# /usr/lib64/xen/bin/qemu-dm -help
QEMU PC emulator version 0.10.2, Copyright (c) 2003-2008 Fabrice Bellard

Greetings
Uli Stärk

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1484925


Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-14 Thread Peter Lieven

On 22.06.2015 at 23:54, John Snow wrote:
>
> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>> On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven wrote:
 On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven wrote:
>> On 18.06.2015 at 10:42, Kevin Wolf wrote:
>>> On 18.06.2015 at 10:30, Peter Lieven wrote:
 On 18.06.2015 at 09:45, Kevin Wolf wrote:
> On 18.06.2015 at 09:12, Peter Lieven wrote:
>> Thread 2 (Thread 0x75550700 (LWP 2636)):
>> #0  0x75d87aa3 in ppoll () from
>> /lib/x86_64-linux-gnu/libc.so.6
>> No symbol table info available.
>> #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
>> nfds=3,
>>   timeout=4999424576) at qemu-timer.c:326
>>   ts = {tv_sec = 4, tv_nsec = 999424576}
>>   tvsec = 4
>> #2  0x55956feb in aio_poll (ctx=0x563528e0,
>> blocking=true)
>>   at aio-posix.c:231
>>   node = 0x0
>>   was_dispatching = false
>>   ret = 1
>>   progress = false
>> #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
>> offset=4292007936,
>>   qiov=0x7554f760, is_write=false, flags=0) at
>> block.c:2699
>>   aio_context = 0x563528e0
>>   co = 0x563888a0
>>   rwco = {bs = 0x5637eae0, offset = 4292007936,
>> qiov = 0x7554f760, is_write = false, ret =
>> 2147483647,
>> flags = 0}
>> #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
>> sector_num=8382828,
>>   buf=0x744cc800 "(", nb_sectors=4, is_write=false,
>> flags=0)
>>   at block.c:2722
>>   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
>> size =
>> 2048}
>>   iov = {iov_base = 0x744cc800, iov_len = 2048}
>> #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
>> sector_num=8382828,
>>   buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
>> No locals.
>> #6  0x5599acef in blk_read (blk=0x56376820,
>> sector_num=8382828,
>>   buf=0x744cc800 "(", nb_sectors=4) at
>> block/block-backend.c:404
>> No locals.
>> #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
>> lba=2095707,
>>   buf=0x744cc800 "(", sector_size=2048) at
>> hw/ide/atapi.c:116
>>   ret = 32767
> Here is the problem: The ATAPI emulation uses synchronous
> blk_read()
> instead of the AIO or coroutine interfaces. This means that it
> keeps
> polling for request completion while it holds the BQL until the
> request
> is completed.
 I will look at this.
>> I need some further help. My way to "emulate" a hung NFS Server is to
>> block it in the Firewall. Currently I face the problem that I
>> cannot mount
>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>> tried with
>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>> another
>> bug):
>>
>> (gdb) thread apply all bt full
>>
>> Thread 3 (Thread 0x70c21700 (LWP 29710)):
>> #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
>> util/qemu-thread-posix.c:120
>>  err = 
>>  __func__ = "qemu_cond_broadcast"
>> #1  0x55911164 in rfifolock_unlock
>> (r=r@entry=0x56259910) at
>> util/rfifolock.c:75
>>  __PRETTY_FUNCTION__ = "rfifolock_unlock"
>> #2  0x55875921 in aio_context_release
>> (ctx=ctx@entry=0x562598b0)
>> at async.c:329
>> No locals.
>> #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
>> blocking=blocking@entry=true) at aio-posix.c:272
>>  node = 
>>  was_dispatching = false
>>  i = 
>>  ret = 
>>  progress = false
>>  timeout = 611734526
>>  __PRETTY_FUNCTION__ = "aio_poll"
>> #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
>> offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>> block/io.c:552
>>  aio_context = 0x562598b0
>>  co = 
>>  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
>> 0x70c208f0, is_write = false, ret = 2147483647, flags =
>> (unknown: 0)}
>> #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
>> nb_sectors=nb_sectors@entry=4, i

Re: [Qemu-devel] [PATCH] mirror: Fix coroutine reentrance

2015-08-14 Thread Jeff Cody

On Thu, Aug 13, 2015 at 10:41:50AM +0200, Kevin Wolf wrote:
> This fixes a regression introduced by commit dcfb3beb ("mirror: Do zero
> write on target if sectors not allocated"), which was reported to cause
> aborts with the message "Co-routine re-entered recursively".
> 
> The cause for this bug is the following code in mirror_iteration_done():
> 
> if (s->common.busy) {
> qemu_coroutine_enter(s->common.co, NULL);
> }
> 
> This has always been ugly because - unlike most places that reenter - it
> doesn't have a specific yield that it pairs with, but is more
> uncontrolled.  What we really mean here is "reenter the coroutine if
> it's in one of the four explicit yields in mirror.c".
> 
> This used to be equivalent with s->common.busy because neither
> mirror_run() nor mirror_iteration() call any function that could yield.
> However since commit dcfb3beb this doesn't hold true any more:
> bdrv_get_block_status_above() can yield.
> 
> So what happens is that bdrv_get_block_status_above() wants to take a
> lock that is already held, so it adds itself to the queue of waiting
> coroutines and yields. Instead of being woken up by the unlock function,
> however, it gets woken up by mirror_iteration_done(), which is obviously
> wrong.
> 
> In most cases the code actually happens to cope fairly well with such
> cases, but in this specific case, the unlock must already have scheduled
> the coroutine for wakeup when mirror_iteration_done() reentered it. And
> then the coroutine happened to process the scheduled restarts and tried
> to reenter itself recursively.
> 
> This patch fixes the problem by pairing the reenter in
> mirror_iteration_done() with specific yields instead of abusing
> s->common.busy.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Kevin Wolf 
> ---
>  block/mirror.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index fc4d8f5..b2fb4b9 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -60,6 +60,7 @@ typedef struct MirrorBlockJob {
>  int sectors_in_flight;
>  int ret;
>  bool unmap;
> +bool waiting_for_io;
>  } MirrorBlockJob;
>  
>  typedef struct MirrorOp {
> @@ -114,11 +115,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  qemu_iovec_destroy(&op->qiov);
>  g_slice_free(MirrorOp, op);
>  
> -/* Enter coroutine when it is not sleeping.  The coroutine sleeps to
> - * rate-limit itself.  The coroutine will eventually resume since there 
> is
> - * a sleep timeout so don't wake it early.
> - */
> -if (s->common.busy) {
> +if (s->waiting_for_io) {
>  qemu_coroutine_enter(s->common.co, NULL);
>  }
>  }
> @@ -203,7 +200,9 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  /* Wait for I/O to this cluster (from a previous iteration) to be done.  
> */
>  while (test_bit(next_chunk, s->in_flight_bitmap)) {
>  trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> +s->waiting_for_io = true;
>  qemu_coroutine_yield();
> +s->waiting_for_io = false;
>  }
>  
>  do {
> @@ -239,7 +238,9 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>   */
>  while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
>  trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
> +s->waiting_for_io = true;
>  qemu_coroutine_yield();
> +s->waiting_for_io = false;
>  }
>  if (s->buf_free_count < nb_chunks + added_chunks) {
>  trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
> @@ -333,7 +334,9 @@ static void mirror_free_init(MirrorBlockJob *s)
>  static void mirror_drain(MirrorBlockJob *s)
>  {
>  while (s->in_flight > 0) {
> +s->waiting_for_io = true;
>  qemu_coroutine_yield();
> +s->waiting_for_io = false;
>  }
>  }
>  
> @@ -506,7 +509,9 @@ static void coroutine_fn mirror_run(void *opaque)
>  if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
>  (cnt == 0 && s->in_flight > 0)) {
>  trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
> +s->waiting_for_io = true;
>  qemu_coroutine_yield();
> +s->waiting_for_io = false;
>  continue;
>  } else if (cnt != 0) {
>  delay_ns = mirror_iteration(s);
> -- 
> 1.8.3.1
>

Reviewed-by: Jeff Cody
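
For readers following along, the difference between the old and new reenter conditions can be modelled with two predicates. This is only a sketch: struct job_model and the *_should_reenter helpers are hypothetical, and the two booleans merely mirror s->common.busy and the new waiting_for_io field from the patch.

```c
#include <stdbool.h>

/* Hypothetical model of the two pieces of job state involved. */
struct job_model {
    bool busy;            /* true whenever the job is not sleeping */
    bool waiting_for_io;  /* true only inside one of the paired yields */
};

/* Old condition in mirror_iteration_done(): reenter whenever busy. */
static bool old_should_reenter(const struct job_model *j)
{
    return j->busy;
}

/* New condition: reenter only from a yield that expects this wakeup. */
static bool new_should_reenter(const struct job_model *j)
{
    return j->waiting_for_io;
}
```

A coroutine that yielded while waiting for a lock is busy but not waiting_for_io, so the old predicate would reenter it and the new one would leave it for the unlock path to wake.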

Re: [Qemu-devel] [PATCH] mirror: Fix coroutine reentrance

2015-08-14 Thread Jeff Cody

On Thu, Aug 13, 2015 at 10:41:50AM +0200, Kevin Wolf wrote:
> This fixes a regression introduced by commit dcfb3beb ("mirror: Do zero
> write on target if sectors not allocated"), which was reported to cause
> aborts with the message "Co-routine re-entered recursively".
> 
> The cause for this bug is the following code in mirror_iteration_done():
> 
> if (s->common.busy) {
> qemu_coroutine_enter(s->common.co, NULL);
> }
> 
> This has always been ugly because - unlike most places that reenter - it
> doesn't have a specific yield that it pairs with, but is more
> uncontrolled.  What we really mean here is "reenter the coroutine if
> it's in one of the four explicit yields in mirror.c".
> 
> This used to be equivalent with s->common.busy because neither
> mirror_run() nor mirror_iteration() call any function that could yield.
> However since commit dcfb3beb this doesn't hold true any more:
> bdrv_get_block_status_above() can yield.
> 
> So what happens is that bdrv_get_block_status_above() wants to take a
> lock that is already held, so it adds itself to the queue of waiting
> coroutines and yields. Instead of being woken up by the unlock function,
> however, it gets woken up by mirror_iteration_done(), which is obviously
> wrong.
> 
> In most cases the code actually happens to cope fairly well with such
> cases, but in this specific case, the unlock must already have scheduled
> the coroutine for wakeup when mirror_iteration_done() reentered it. And
> then the coroutine happened to process the scheduled restarts and tried
> to reenter itself recursively.
> 
> This patch fixes the problem by pairing the reenter in
> mirror_iteration_done() with specific yields instead of abusing
> s->common.busy.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Kevin Wolf 

Thanks, applied to my block branch:

https://github.com/codyprime/qemu-kvm-jtc/tree/block

-Jeff

[Qemu-devel] [PULL 0/2] Block job patches

2015-08-14 Thread Jeff Cody

The following changes since commit be1f13ac9d9fc21908975460652a72f5f0c018c5:

  Merge remote-tracking branch 'remotes/lalrae/tags/mips-20150813' into staging (2015-08-13 17:47:44 +0100)

are available in the git repository at:


  g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to e424aff5f307227b1c2512bbb8ece891bb895cef:

  mirror: Fix coroutine reentrance (2015-08-14 09:51:31 -0400)


Block job patches


Kevin Wolf (1):
  mirror: Fix coroutine reentrance

Stefan Hajnoczi (1):
  block/mirror: limit qiov to IOV_MAX elements

[Qemu-devel] [PULL 1/2] block/mirror: limit qiov to IOV_MAX elements

2015-08-14 Thread Jeff Cody

From: Stefan Hajnoczi 

If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
EINVAL failures may be encountered.

It is possible to trigger this by setting granularity to a low value
like 8192.

This patch stops appending chunks once IOV_MAX is reached.

The spurious EINVAL failure can be reproduced with a qcow2 image file
and the following QMP invocation:

  qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
  granularity=8192, sync='full', mode='absolute-paths',
  format='raw')

While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
bs=4k.

Cc: Jeff Cody 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paolo Bonzini 
Message-id: 1435761950-26714-1-git-send-email-stefa...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/mirror.c | 4 
 trace-events   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index fc4d8f5..0841964 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -245,6 +245,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
 break;
 }
+if (IOV_MAX < nb_chunks + added_chunks) {
+trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
+break;
+}
 
 /* We have enough free space to copy these sectors.  */
 bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
diff --git a/trace-events b/trace-events
index 94bf3bb..8f9614a 100644
--- a/trace-events
+++ b/trace-events
@@ -94,6 +94,7 @@ mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirt
 mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
 mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
 mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
+mirror_break_iov_max(void *s, int nb_chunks, int added_chunks) "s %p requested chunks %d added_chunks %d"
 
 # block/backup.c
 backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
-- 
1.9.3
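
For readers following the IOV_MAX patch above, here is a minimal standalone sketch of the batching rule it adds. It is not QEMU code: sketch_batch_chunks(), run_lengths and the iov_max parameter are hypothetical names used only for illustration, and the sketch assumes that each chunk ends up as one element of the I/O vector. The limit is passed in as a parameter instead of being read from <limits.h> so the example stays portable.

```c
#include <assert.h>

/*
 * Simplified model of the batching loop in mirror_iteration().
 * run_lengths[i] stands in for added_chunks on iteration i.
 * All names here are illustrative, not QEMU's.
 */
static int sketch_batch_chunks(const int *run_lengths, int n_runs,
                               int buf_free_count, int iov_max)
{
    int nb_chunks = 0;
    int i;

    assert(n_runs >= 0 && buf_free_count >= 0 && iov_max >= 0);

    for (i = 0; i < n_runs; i++) {
        int added_chunks = run_lengths[i];

        /* Existing check: not enough free buffers for this run. */
        if (buf_free_count < nb_chunks + added_chunks) {
            break;
        }
        /* Check added by the patch: stop before the request would need
         * more vector elements than the limit allows. */
        if (iov_max < nb_chunks + added_chunks) {
            break;
        }
        nb_chunks += added_chunks;
    }
    return nb_chunks;
}
```

With three runs of 400 chunks, 2000 free buffers and a limit of 1024, the sketch stops at 800 chunks, where the pre-patch logic would have batched all 1200.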

[Qemu-devel] [PULL 2/2] mirror: Fix coroutine reentrance

2015-08-14 Thread Jeff Cody

From: Kevin Wolf 

This fixes a regression introduced by commit dcfb3beb ("mirror: Do zero
write on target if sectors not allocated"), which was reported to cause
aborts with the message "Co-routine re-entered recursively".

The cause for this bug is the following code in mirror_iteration_done():

if (s->common.busy) {
qemu_coroutine_enter(s->common.co, NULL);
}

This has always been ugly because - unlike most places that reenter - it
doesn't have a specific yield that it pairs with, but is more
uncontrolled.  What we really mean here is "reenter the coroutine if
it's in one of the four explicit yields in mirror.c".

This used to be equivalent with s->common.busy because neither
mirror_run() nor mirror_iteration() call any function that could yield.
However since commit dcfb3beb this doesn't hold true any more:
bdrv_get_block_status_above() can yield.

So what happens is that bdrv_get_block_status_above() wants to take a
lock that is already held, so it adds itself to the queue of waiting
coroutines and yields. Instead of being woken up by the unlock function,
however, it gets woken up by mirror_iteration_done(), which is obviously
wrong.

In most cases the code actually happens to cope fairly well with such
cases, but in this specific case, the unlock must already have scheduled
the coroutine for wakeup when mirror_iteration_done() reentered it. And
then the coroutine happened to process the scheduled restarts and tried
to reenter itself recursively.

This patch fixes the problem by pairing the reenter in
mirror_iteration_done() with specific yields instead of abusing
s->common.busy.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
Message-id: 1439455310-11263-1-git-send-email-kw...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/mirror.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 0841964..9474443 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -60,6 +60,7 @@ typedef struct MirrorBlockJob {
 int sectors_in_flight;
 int ret;
 bool unmap;
+bool waiting_for_io;
 } MirrorBlockJob;
 
 typedef struct MirrorOp {
@@ -114,11 +115,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 qemu_iovec_destroy(&op->qiov);
 g_slice_free(MirrorOp, op);
 
-/* Enter coroutine when it is not sleeping.  The coroutine sleeps to
- * rate-limit itself.  The coroutine will eventually resume since there is
- * a sleep timeout so don't wake it early.
- */
-if (s->common.busy) {
+if (s->waiting_for_io) {
 qemu_coroutine_enter(s->common.co, NULL);
 }
 }
@@ -203,7 +200,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
 while (test_bit(next_chunk, s->in_flight_bitmap)) {
 trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+s->waiting_for_io = true;
 qemu_coroutine_yield();
+s->waiting_for_io = false;
 }
 
 do {
@@ -239,7 +238,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
  */
 while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
 trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+s->waiting_for_io = true;
 qemu_coroutine_yield();
+s->waiting_for_io = false;
 }
 if (s->buf_free_count < nb_chunks + added_chunks) {
 trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
@@ -337,7 +338,9 @@ static void mirror_free_init(MirrorBlockJob *s)
 static void mirror_drain(MirrorBlockJob *s)
 {
 while (s->in_flight > 0) {
+s->waiting_for_io = true;
 qemu_coroutine_yield();
+s->waiting_for_io = false;
 }
 }
 
@@ -510,7 +513,9 @@ static void coroutine_fn mirror_run(void *opaque)
 if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
 (cnt == 0 && s->in_flight > 0)) {
 trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
+s->waiting_for_io = true;
 qemu_coroutine_yield();
+s->waiting_for_io = false;
 continue;
 } else if (cnt != 0) {
 delay_ns = mirror_iteration(s);
-- 
1.9.3
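
The difference between the old and the new wake-up condition in the patch above can be shown with a small standalone model. This is only a sketch under stated assumptions: it uses no real coroutines, SketchJob and the helper functions are hypothetical names, and "reentering" is modelled as a counter. It assumes, as the removed comment says, that busy simply means "the job is not sleeping".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MirrorBlockJob; only the flags matter here. */
typedef struct SketchJob {
    bool busy;            /* job is not sleeping (the old, too-broad test) */
    bool waiting_for_io;  /* job is parked in one of mirror.c's own yields */
    int reentered;        /* how often the completion callback woke the job */
} SketchJob;

/* The job yields inside mirror.c: the flag brackets the yield. */
static void sketch_park_in_mirror_yield(SketchJob *job)
{
    assert(job != NULL);
    job->busy = true;
    job->waiting_for_io = true;
}

/* The job yields inside a callee (for example while queued on a lock):
 * it is still "busy", but it is not waiting for mirror I/O to complete. */
static void sketch_park_in_callee_yield(SketchJob *job)
{
    assert(job != NULL);
    job->busy = true;
    job->waiting_for_io = false;
}

/* Old completion logic: reenter whenever the job is not sleeping. */
static void sketch_completion_old(SketchJob *job)
{
    if (job->busy) {
        job->reentered++;
    }
}

/* New completion logic: reenter only a job parked in an explicit yield. */
static void sketch_completion_new(SketchJob *job)
{
    if (job->waiting_for_io) {
        job->reentered++;
    }
}
```

In this model a job parked in a callee's yield is woken by sketch_completion_old() but left alone by sketch_completion_new(), which corresponds to the spurious reentry the commit message describes.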

Re: [Qemu-devel] [PULL v2 00/20] SCSI, build, TCG, RCU, misc patches for 2015-08-12

2015-08-14 Thread Peter Maydell

On 14 August 2015 at 14:03, Paolo Bonzini  wrote:
> The following changes since commit cb48f67ad8c7b33c617d4f8144a27706e69fd688:
>
>   bsd-user: Fix operand to cpu_x86_exec (2015-07-30 12:38:49 +0100)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 2dfebe37e6210278cddd070761ee543f49a8a82d:
>
>   disas: Defeature print_target_address (2015-08-13 11:33:35 +0200)
>
> 
> * SCSI fixes from Stefan and Fam
> * vhost-scsi fix from Igor and Lu Lina
> * a build system fix from Daniel
> * two more multi-arch-related patches from Peter C.
> * TCG patches from myself and Sergey Fedorov
> * RCU improvement from Wen Congyang
> * enabling module builds by default
> * a few more simple cleanups

Hi; I'm afraid this failed to build on my w32 config:

/home/petmay01/linaro/qemu-for-merges/block/dmg.c:1: warning: -fPIC
ignored for target (all code is position independent)

(I have warnings-are-errors enabled.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH] spice: Allow to set password even if disable-ticketing was used

2015-08-14 Thread Daniel P. Berrange

On Fri, Aug 14, 2015 at 03:09:44PM +0200, Christophe Fergeau wrote:
> Hey,
> 
> On Fri, Aug 14, 2015 at 01:54:59PM +0100, Daniel P. Berrange wrote:
> > On Fri, Aug 14, 2015 at 02:47:15PM +0200, Christophe Fergeau wrote:
> > > Before commit b1ea7b79e1, it was possible to start with -spice
> > > disable-ticketing, and then use the "set_password spice" command to
> > > enable ticketing with SPICE. Since commit b1ea7b79e1 this is no longer
> > > possible as qemu_spice_set_ticket() will return an error unless the
> > > 'auth' type is "spice". When ticketing is disabled, 'auth' is "none" so
> > > the attempt to set password fails.
> > > 
> > > This commit allows to call qemu_spice_set_ticket() when 'auth' is "none"
> > > and changes 'auth' to "spice" when this happens.
> > 
> > IMHO we should not be changing the authentication method as a side
> > effect of trying to set the password.
> > 
> > If app has disabled ticketing, it should remain disabled and the
> > set password call is right to return an error.
> > 
> 
> In general I agree with you. However in this case, this used to be
> working until ~1 year ago, and this change of behaviour caused a bug in
> oVirt (oVirt side is being fixed). This is why I sent this patch.
> 
> The intent of commit b1ea7b seems to be to prevent
> qemu_spice_set_passwd() from being called when SASL is used, and does
> not mention at all whether preventing going from auth being "none" to
> "spice" is intentional.
> 
> If this change of behaviour was an intentional bug fix, and if we are
> fine with asking for oVirt changes for this, then I'm ok with dropping
> this patch.

Hmm, is oVirt using this via libvirt ? If so, I guess we have to fix
it, as that would be a break in current usage.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-14 Thread Kevin Wolf

On 14.08.2015 at 15:43, Peter Lieven wrote:
> On 22.06.2015 at 23:54, John Snow wrote:
> >
> > On 06/22/2015 09:09 AM, Peter Lieven wrote:
> >> On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
> >>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven wrote:
>  On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
> > On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven wrote:
> >> On 18.06.2015 at 10:42, Kevin Wolf wrote:
> >>> On 18.06.2015 at 10:30, Peter Lieven wrote:
>  On 18.06.2015 at 09:45, Kevin Wolf wrote:
> > On 18.06.2015 at 09:12, Peter Lieven wrote:
> >> Thread 2 (Thread 0x75550700 (LWP 2636)):
> >> #0  0x75d87aa3 in ppoll () from
> >> /lib/x86_64-linux-gnu/libc.so.6
> >> No symbol table info available.
> >> #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
> >> nfds=3,
> >>   timeout=4999424576) at qemu-timer.c:326
> >>   ts = {tv_sec = 4, tv_nsec = 999424576}
> >>   tvsec = 4
> >> #2  0x55956feb in aio_poll (ctx=0x563528e0,
> >> blocking=true)
> >>   at aio-posix.c:231
> >>   node = 0x0
> >>   was_dispatching = false
> >>   ret = 1
> >>   progress = false
> >> #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
> >> offset=4292007936,
> >>   qiov=0x7554f760, is_write=false, flags=0) at
> >> block.c:2699
> >>   aio_context = 0x563528e0
> >>   co = 0x563888a0
> >>   rwco = {bs = 0x5637eae0, offset = 4292007936,
> >> qiov = 0x7554f760, is_write = false, ret =
> >> 2147483647,
> >> flags = 0}
> >> #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
> >> sector_num=8382828,
> >>   buf=0x744cc800 "(", nb_sectors=4, is_write=false,
> >> flags=0)
> >>   at block.c:2722
> >>   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
> >> size =
> >> 2048}
> >>   iov = {iov_base = 0x744cc800, iov_len = 2048}
> >> #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
> >> sector_num=8382828,
> >>   buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
> >> No locals.
> >> #6  0x5599acef in blk_read (blk=0x56376820,
> >> sector_num=8382828,
> >>   buf=0x744cc800 "(", nb_sectors=4) at
> >> block/block-backend.c:404
> >> No locals.
> >> #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
> >> lba=2095707,
> >>   buf=0x744cc800 "(", sector_size=2048) at
> >> hw/ide/atapi.c:116
> >>   ret = 32767
> > Here is the problem: The ATAPI emulation uses synchronous
> > blk_read()
> > instead of the AIO or coroutine interfaces. This means that it
> > keeps
> > polling for request completion while it holds the BQL until the
> > request
> > is completed.
>  I will look at this.
> >> I need some further help. My way to "emulate" a hung NFS Server is to
> >> block it in the Firewall. Currently I face the problem that I
> >> cannot mount
> >> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
> >> tried with
> >> a kernel NFS mount). It reads a few sectors and then stalls (maybe
> >> another
> >> bug):
> >>
> >> (gdb) thread apply all bt full
> >>
> >> Thread 3 (Thread 0x70c21700 (LWP 29710)):
> >> #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
> >> util/qemu-thread-posix.c:120
> >>  err = 
> >>  __func__ = "qemu_cond_broadcast"
> >> #1  0x55911164 in rfifolock_unlock
> >> (r=r@entry=0x56259910) at
> >> util/rfifolock.c:75
> >>  __PRETTY_FUNCTION__ = "rfifolock_unlock"
> >> #2  0x55875921 in aio_context_release
> >> (ctx=ctx@entry=0x562598b0)
> >> at async.c:329
> >> No locals.
> >> #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
> >> blocking=blocking@entry=true) at aio-posix.c:272
> >>  node = 
> >>  was_dispatching = false
> >>  i = 
> >>  ret = 
> >>  progress = false
> >>  timeout = 611734526
> >>  __PRETTY_FUNCTION__ = "aio_poll"
> >> #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
> >> offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
> >> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
> >> block/io.c:552
> >>  aio_context = 0x562598b0
> >>  co = 
> >>  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov

Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation

2015-08-14 Thread Marcin Gibuła


So the problem is caused by stop_vm(RUN_STATE_PAUSED): in this case
env->tsc is not updated, which leads to the issue.
Is that right?


I think so.


If cpu_clean_all_dirty() is needed only because of the APIC status, I think
we can call cpu_synchronize_all_states() in do_vm_stop, after
vm_state_notify(), when RUN_STATE_PAUSED is hit. At that point all the
device models are stopped, so there is no outdated APIC status.


Yes, cpu_clean_all_dirty() was needed because without it, the second 
call to cpu_synchronize_all_states() (which is done inside 
qemu_savevm_state_complete() and after kvmclock) does nothing.



I want to write a patch that fixes this issue in another way. Could you help
verify it in your environment? I would appreciate it very much.


Sure, I'll test it. Both issues were quite easy to reproduce.

--
mg

Re: [Qemu-devel] [PULL v2 00/20] SCSI, build, TCG, RCU, misc patches for 2015-08-12

2015-08-14 Thread Paolo Bonzini



On 14/08/2015 15:53, Peter Maydell wrote:
> Hi; I'm afraid this failed to build on my w32 config:
> 
> /home/petmay01/linaro/qemu-for-merges/block/dmg.c:1: warning: -fPIC
> ignored for target (all code is position independent)
> 
> (I have warnings-are-errors enabled.)

This is a very weird warning, also warnings-are-errors does not build
for me with Win32.

Paolo

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-14 Thread Peter Lieven

On 14.08.2015 at 16:08, Kevin Wolf wrote:
> On 14.08.2015 at 15:43, Peter Lieven wrote:
>> On 22.06.2015 at 23:54, John Snow wrote:
>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
 On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven wrote:
>> On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven wrote:
 On 18.06.2015 at 10:42, Kevin Wolf wrote:
> On 18.06.2015 at 10:30, Peter Lieven wrote:
>> On 18.06.2015 at 09:45, Kevin Wolf wrote:
>>> On 18.06.2015 at 09:12, Peter Lieven wrote:
 Thread 2 (Thread 0x75550700 (LWP 2636)):
 #0  0x75d87aa3 in ppoll () from
 /lib/x86_64-linux-gnu/libc.so.6
 No symbol table info available.
 #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
 nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
 #2  0x55956feb in aio_poll (ctx=0x563528e0,
 blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
 #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
 offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
 block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
 2147483647,
 flags = 0}
 #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4, is_write=false,
 flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
 size =
 2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
 #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
 No locals.
 #6  0x5599acef in blk_read (blk=0x56376820,
 sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4) at
 block/block-backend.c:404
 No locals.
 #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
 lba=2095707,
   buf=0x744cc800 "(", sector_size=2048) at
 hw/ide/atapi.c:116
   ret = 32767
>>> Here is the problem: The ATAPI emulation uses synchronous
>>> blk_read()
>>> instead of the AIO or coroutine interfaces. This means that it
>>> keeps
>>> polling for request completion while it holds the BQL until the
>>> request
>>> is completed.
>> I will look at this.
 I need some further help. My way to "emulate" a hung NFS Server is to
 block it in the Firewall. Currently I face the problem that I
 cannot mount
 a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
 tried with
 a kernel NFS mount). It reads a few sectors and then stalls (maybe
 another
 bug):

 (gdb) thread apply all bt full

 Thread 3 (Thread 0x70c21700 (LWP 29710)):
 #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
 util/qemu-thread-posix.c:120
  err = 
  __func__ = "qemu_cond_broadcast"
 #1  0x55911164 in rfifolock_unlock
 (r=r@entry=0x56259910) at
 util/rfifolock.c:75
  __PRETTY_FUNCTION__ = "rfifolock_unlock"
 #2  0x55875921 in aio_context_release
 (ctx=ctx@entry=0x562598b0)
 at async.c:329
 No locals.
 #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
 blocking=blocking@entry=true) at aio-posix.c:272
  node = 
  was_dispatching = false
  i = 
  ret = 
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = "aio_poll"
 #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
 offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
 is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
 block/io.c:552
  aio_context = 0x562598b0
  co = 
  rwco = {bs

Re: [Qemu-devel] [PULL v2 00/20] SCSI, build, TCG, RCU, misc patches for 2015-08-12

2015-08-14 Thread Paolo Bonzini



On 14/08/2015 16:21, Paolo Bonzini wrote:
> 
> 
> On 14/08/2015 15:53, Peter Maydell wrote:
>> Hi; I'm afraid this failed to build on my w32 config:
>>
>> /home/petmay01/linaro/qemu-for-merges/block/dmg.c:1: warning: -fPIC
>> ignored for target (all code is position independent)
>>
>> (I have warnings-are-errors enabled.)
> 
> This is a very weird warning, also warnings-are-errors does not build
> for me with Win32.

Having googled about it, I will have to submit v3.  The warning is
totally idiotic, to the point that I might make an exception to the
usual "attack code, not people" rule.

Paolo

[Qemu-devel] [Bug 1484925] Re: Segfault with custom vnc client

2015-08-14 Thread Daniel Berrange

Can you attach GDB to your qemu-dm process and attempt to capture a full
stack trace when it crashes (i.e. thread apply all backtrace)?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1484925

Title:
  Segfault with custom vnc client

Status in QEMU:
  New

Bug description:
  Hey,

  I'm using Citrix XenServer 6.5. I wrote a script that uses noVNC to
  connect to the rfb console via xapi. When I use GRML and try to boot
  it, the QEMU process segfaults and kills my VM. This happens when the
  screen resizes and the kernel is loading:

  recvfrom(3, "\3\1\0\0\0\0\2\200\1\220\3\0\2\200\0\0\0P\1\220", 4096, 0, NULL, NULL) = 20
  --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xb28000} ---

  I can see the following message in the child process, right before the parent segfaults:
  read(4, "cirrus: blanking the screen line_offset=0 height=480\n", 53) = 53

  This issue only happens when I have my custom php/novnc-client
  connected. I also tried the nodejs/novnc package from xen-orchestra -
  same result. Using the stock client from Citrix XenCenter it works
  just fine. So I think it is related to noVNC. I hope this is just a
  bug and not exploitable to force a VM to crash or execute code.

  XenServer launches the qemu with the following command line:

  qemu-dm-25 --syslog -d 25 -m 2048 -boot dc -serial pty -vcpus 1
  -videoram 4 -vncunused -k en-us -vnc 127.0.0.1:1 -usb -usbdevice
  tablet -net nic,vlan=0,macaddr=8a:43:e2:b1:57:df,model=rtl8139 -net
  tap,vlan=0,bridge=xenbr0,ifname=tap25.0 -acpi -monitor pty

  XenServer 6.5 is using the following version:
  # /usr/lib64/xen/bin/qemu-dm -help
  QEMU PC emulator version 0.10.2, Copyright (c) 2003-2008 Fabrice Bellard

  Greetings
  Uli Stärk

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1484925/+subscriptions

Re: [Qemu-devel] [PULL v2 00/20] SCSI, build, TCG, RCU, misc patches for 2015-08-12

2015-08-14 Thread Peter Maydell

On 14 August 2015 at 15:21, Paolo Bonzini  wrote:
>
>
> On 14/08/2015 15:53, Peter Maydell wrote:
>> Hi; I'm afraid this failed to build on my w32 config:
>>
>> /home/petmay01/linaro/qemu-for-merges/block/dmg.c:1: warning: -fPIC
>> ignored for target (all code is position independent)
>>
>> (I have warnings-are-errors enabled.)
>
> This is a very weird warning, also warnings-are-errors does not build
> for me with Win32.

I agree it's a bit weird, but there it is. Even if we don't
have warnings-as-errors we don't want a warning line for
every .c-to-.o compilation...

I think some mingw setups still emit warnings, but I've
been able to get to the point with the setup I use for
build tests where I can build with -Werror in the --extra-cflags
string.

-- PMM

Re: [Qemu-devel] [PATCH] spice: Allow to set password even if disable-ticketing was used

2015-08-14 Thread Christophe Fergeau

On Fri, Aug 14, 2015 at 03:04:48PM +0100, Daniel P. Berrange wrote:
> Hmm, is oVirt using this via libvirt ? If so, I guess we have to fix
> it, as that would be a break in current usage.

Yes this is done through libvirt.

Before commit qemu-2.1.0-rc2~11^2, you could use virsh update-device
with

to set the password for a running domain whose graphics node is


After qemu-2.1.0-rc2~11^2, this results in an error.

Christophe



Re: [Qemu-devel] [PATCH] spice: Allow to set password even if disable-ticketing was used

2015-08-14 Thread Daniel P. Berrange

On Fri, Aug 14, 2015 at 02:47:15PM +0200, Christophe Fergeau wrote:
> Before commit b1ea7b79e1, it was possible to start with -spice
> disable-ticketing, and then use the "set_password spice" command to
> enable ticketing with SPICE. Since commit b1ea7b79e1 this is no longer
> possible as qemu_spice_set_ticket() will return an error unless the
> 'auth' type is "spice". When ticketing is disabled, 'auth' is "none" so
> the attempt to set password fails.
> 
> This commit allows to call qemu_spice_set_ticket() when 'auth' is "none"
> and changes 'auth' to "spice" when this happens.

BTW, you need to have a Signed-off-by here

> ---
>  ui/spice-core.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Daniel P. Berrange 

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PULL 1/2] throttle: refuse bps_max/iops_max without bps/iops

2015-08-14 Thread Stefan Hajnoczi

The bps_max/iops_max values are meaningless without corresponding
bps/iops values.  Report an error if bps_max/iops_max is given without
bps/iops.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Alberto Garcia 
Message-id: 1438683733-2-2-git-send-email-stefa...@redhat.com
---
 blockdev.c  |  6 ++
 include/qemu/throttle.h |  2 ++
 util/throttle.c | 15 +++
 3 files changed, 23 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 62a4586..4125ff6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -337,6 +337,12 @@ static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 return false;
 }
 
+if (throttle_max_is_missing_limit(cfg)) {
+error_setg(errp, "bps_max/iops_max require corresponding"
+ " bps/iops values");
+return false;
+}
+
 return true;
 }
 
diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 995b2d5..12faaad 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -114,6 +114,8 @@ bool throttle_conflicting(ThrottleConfig *cfg);
 
 bool throttle_is_valid(ThrottleConfig *cfg);
 
+bool throttle_max_is_missing_limit(ThrottleConfig *cfg);
+
 void throttle_config(ThrottleState *ts,
  ThrottleTimers *tt,
  ThrottleConfig *cfg);
diff --git a/util/throttle.c b/util/throttle.c
index 706c131..1113671 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -300,6 +300,21 @@ bool throttle_is_valid(ThrottleConfig *cfg)
 return !invalid;
 }
 
+/* check if bps_max/iops_max is used without bps/iops
+ * @cfg: the throttling configuration to inspect
+ */
+bool throttle_max_is_missing_limit(ThrottleConfig *cfg)
+{
+int i;
+
+for (i = 0; i < BUCKETS_COUNT; i++) {
+if (cfg->buckets[i].max && !cfg->buckets[i].avg) {
+return true;
+}
+}
+return false;
+}
+
 /* fix bucket parameters */
 static void throttle_fix_bucket(LeakyBucket *bkt)
 {
-- 
2.4.3
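
As a rough standalone illustration of the rule enforced by the patch above: a burst limit (max) only makes sense when the matching sustained limit (avg) is also set. The sketch below is not the QEMU implementation; SketchBucket, SketchConfig and SKETCH_BUCKETS are hypothetical stand-ins, and the field types and bucket count are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BUCKETS 6   /* illustrative; stands in for BUCKETS_COUNT */

typedef struct SketchBucket {
    double avg;   /* sustained limit (bps/iops); 0 means "not set" */
    double max;   /* burst limit (bps_max/iops_max); 0 means "not set" */
} SketchBucket;

typedef struct SketchConfig {
    SketchBucket buckets[SKETCH_BUCKETS];
} SketchConfig;

/* Returns true if any bucket has a burst limit without a sustained limit. */
static bool sketch_max_is_missing_limit(const SketchConfig *cfg)
{
    int i;

    assert(cfg != NULL);

    for (i = 0; i < SKETCH_BUCKETS; i++) {
        if (cfg->buckets[i].max != 0 && cfg->buckets[i].avg == 0) {
            return true;
        }
    }
    return false;
}
```

An all-zero configuration passes, a configuration with only max set in some bucket is rejected, and setting avg in that same bucket makes it valid again.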

[Qemu-devel] [PULL 0/2] Block patches

2015-08-14 Thread Stefan Hajnoczi

The following changes since commit 2be4f242b50a84bf360df02480b173bfed161107:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging (2015-08-04 16:51:24 +0100)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 92e11a17612108b1729bde4ce61aad0cc1ce5889:

  throttle: add throttle_max_is_missing_limit() test (2015-08-05 12:53:48 +0100)





Stefan Hajnoczi (2):
  throttle: refuse bps_max/iops_max without bps/iops
  throttle: add throttle_max_is_missing_limit() test

 blockdev.c  |  6 ++
 include/qemu/throttle.h |  2 ++
 tests/test-throttle.c   | 21 +
 util/throttle.c | 15 +++
 4 files changed, 44 insertions(+)

-- 
2.4.3

[Qemu-devel] [PULL 2/2] throttle: add throttle_max_is_missing_limit() test

2015-08-14 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Alberto Garcia 
Message-id: 1438683733-2-3-git-send-email-stefa...@redhat.com
---
 tests/test-throttle.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index 0168445..85c9b6c 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -329,6 +329,26 @@ static void test_is_valid(void)
 test_is_valid_for_value(1, true);
 }
 
+static void test_max_is_missing_limit(void)
+{
+int i;
+
+for (i = 0; i < BUCKETS_COUNT; i++) {
+memset(&cfg, 0, sizeof(cfg));
+cfg.buckets[i].max = 100;
+cfg.buckets[i].avg = 0;
+g_assert(throttle_max_is_missing_limit(&cfg));
+
+cfg.buckets[i].max = 0;
+cfg.buckets[i].avg = 0;
+g_assert(!throttle_max_is_missing_limit(&cfg));
+
+cfg.buckets[i].max = 0;
+cfg.buckets[i].avg = 100;
+g_assert(!throttle_max_is_missing_limit(&cfg));
+}
+}
+
 static void test_have_timer(void)
 {
 /* zero structures */
@@ -591,6 +611,7 @@ int main(int argc, char **argv)
 g_test_add_func("/throttle/config/enabled", test_enabled);
 g_test_add_func("/throttle/config/conflicting", test_conflicting_config);
 g_test_add_func("/throttle/config/is_valid",test_is_valid);
+g_test_add_func("/throttle/config/max", test_max_is_missing_limit);
 g_test_add_func("/throttle/config_functions",   test_config_functions);
 g_test_add_func("/throttle/accounting", test_accounting);
 g_test_add_func("/throttle/groups", test_groups);
-- 
2.4.3

Re: [Qemu-devel] Plan for using softmmu with linux-user

2015-08-14 Thread Richard Henderson

On 08/14/2015 02:37 AM, gchen gchen wrote:
>  - If I implement SW64 tcg backend, I guess, I can't get help from qemu
>    upstream: I don't think SW64 is valuable enough for upstream (either
>    I am not sure that I can implement Alpha tcg backend in working time).

It'll need some updating to apply to master, but I started an alpha backend a
couple of years ago.  It looks like it was last rebased in May 2014.

  git://github.com/rth7680/qemu.git tcg-alpha-2

r~

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-14 Thread Peter Lieven

On 14.08.2015 at 16:08, Kevin Wolf wrote:
> On 14.08.2015 at 15:43, Peter Lieven wrote:
>> On 22.06.2015 at 23:54, John Snow wrote:
>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
 On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven wrote:
>> On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven wrote:
 On 18.06.2015 at 10:42, Kevin Wolf wrote:
> On 18.06.2015 at 10:30, Peter Lieven wrote:
>> On 18.06.2015 at 09:45, Kevin Wolf wrote:
>>> On 18.06.2015 at 09:12, Peter Lieven wrote:
 Thread 2 (Thread 0x75550700 (LWP 2636)):
 #0  0x75d87aa3 in ppoll () from
 /lib/x86_64-linux-gnu/libc.so.6
 No symbol table info available.
 #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
 nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
 #2  0x55956feb in aio_poll (ctx=0x563528e0,
 blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
 #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
 offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
 block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
 2147483647,
 flags = 0}
 #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4, is_write=false,
 flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
 size =
 2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
 #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
 No locals.
 #6  0x5599acef in blk_read (blk=0x56376820,
 sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4) at
 block/block-backend.c:404
 No locals.
 #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
 lba=2095707,
   buf=0x744cc800 "(", sector_size=2048) at
 hw/ide/atapi.c:116
   ret = 32767
>>> Here is the problem: The ATAPI emulation uses synchronous
>>> blk_read()
>>> instead of the AIO or coroutine interfaces. This means that it
>>> keeps
>>> polling for request completion while it holds the BQL until the
>>> request
>>> is completed.
>> I will look at this.
 I need some further help. My way to "emulate" a hung NFS Server is to
 block it in the Firewall. Currently I face the problem that I
 cannot mount
 a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
 tried with
 a kernel NFS mount). It reads a few sectors and then stalls (maybe
 another
 bug):

 (gdb) thread apply all bt full

 Thread 3 (Thread 0x70c21700 (LWP 29710)):
 #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
 util/qemu-thread-posix.c:120
  err = 
  __func__ = "qemu_cond_broadcast"
 #1  0x55911164 in rfifolock_unlock
 (r=r@entry=0x56259910) at
 util/rfifolock.c:75
  __PRETTY_FUNCTION__ = "rfifolock_unlock"
 #2  0x55875921 in aio_context_release
 (ctx=ctx@entry=0x562598b0)
 at async.c:329
 No locals.
 #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
 blocking=blocking@entry=true) at aio-posix.c:272
  node = 
  was_dispatching = false
  i = 
  ret = 
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = "aio_poll"
 #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
 offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
 is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
 block/io.c:552
  aio_context = 0x562598b0
  co = 
  rwco = {bs

Re: [Qemu-devel] [PULL 0/2] Block job patches

2015-08-14 Thread Peter Maydell

On 14 August 2015 at 14:57, Jeff Cody  wrote:
> The following changes since commit be1f13ac9d9fc21908975460652a72f5f0c018c5:
>
>   Merge remote-tracking branch 'remotes/lalrae/tags/mips-20150813' into 
> staging (2015-08-13 17:47:44 +0100)
>
> are available in the git repository at:
>
>
>   g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request
>
> for you to fetch changes up to e424aff5f307227b1c2512bbb8ece891bb895cef:
>
>   mirror: Fix coroutine reentrance (2015-08-14 09:51:31 -0400)
>
> 
> Block job patches
> 
>
> Kevin Wolf (1):
>   mirror: Fix coroutine reentrance
>
> Stefan Hajnoczi (1):
>   block/mirror: limit qiov to IOV_MAX elements

Your pull req tag has not only these two commits in it,
but also a merge commit ("Merge branch 'block-next' into HEAD").
Why is that?

thanks
-- PMM

Re: [Qemu-devel] [PULL 0/2] Block job patches

2015-08-14 Thread Jeff Cody

On Fri, Aug 14, 2015 at 03:51:03PM +0100, Peter Maydell wrote:
> On 14 August 2015 at 14:57, Jeff Cody  wrote:
> > The following changes since commit be1f13ac9d9fc21908975460652a72f5f0c018c5:
> >
> >   Merge remote-tracking branch 'remotes/lalrae/tags/mips-20150813' into 
> > staging (2015-08-13 17:47:44 +0100)
> >
> > are available in the git repository at:
> >
> >
> >   g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request
> >
> > for you to fetch changes up to e424aff5f307227b1c2512bbb8ece891bb895cef:
> >
> >   mirror: Fix coroutine reentrance (2015-08-14 09:51:31 -0400)
> >
> > 
> > Block job patches
> > 
> >
> > Kevin Wolf (1):
> >   mirror: Fix coroutine reentrance
> >
> > Stefan Hajnoczi (1):
> >   block/mirror: limit qiov to IOV_MAX elements
> 
> Your pull req tag has not only these two commits in it,
> but also a merge commit ("Merge branch 'block-next' into HEAD").
> Why is that?
> 
> thanks
> -- PMM

Hi,

I was trying to keep a commit id stable (for 'block/mirror: limit qiov
to IOV_MAX elements'), so it could be used for a downstream backport
before this patch actually hit the official upstream repo.

Does this cause you any issues (i.e., should I rebase and submit a new
pull request)?

Thanks,
Jeff

[Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset

2015-08-14 Thread Xiao Guangrong

Currently, the offset in OperationRegion is limited to 32 bits. Extend it
to 64 bits so that we can switch the SSDT to 64 bit in a later patch.

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 2 +-
 include/hw/acpi/aml-build.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 0d4b324..02f9e3d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -752,7 +752,7 @@ Aml *aml_package(uint8_t num_elements)
 
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefOpRegion */
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len)
+  uint64_t offset, uint32_t len)
 {
 Aml *var = aml_alloc();
 build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index e3afa13..996ac5b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -222,7 +222,7 @@ Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
 Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
 uint8_t aln, uint8_t len);
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len);
+  uint64_t offset, uint32_t len);
 Aml *aml_irq_no_flags(uint8_t irq);
 Aml *aml_named_field(const char *name, unsigned length);
 Aml *aml_reserved_field(unsigned length);
-- 
2.4.3

[Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof

2015-08-14 Thread Xiao Guangrong

Implement the DeRefOf term, which is used by the NVDIMM _DSM method in a later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 02f9e3d..9e89efc 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
 return var;
 }
 
+/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
+Aml *aml_derefof(Aml *arg)
+{
+Aml *var = aml_opcode(0x83 /* DerefOfOp */);
+aml_append(var, arg);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 996ac5b..21dc5e9 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -275,6 +275,7 @@ Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name);
 Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
+Aml *aml_derefof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
2.4.3

[Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table

2015-08-14 Thread Xiao Guangrong

NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)

Currently, we only support PMEM mode. Each device has 3 tables:
- SPA table, which defines the PMEM region info

- MEM DEV table, which has the @handle used to associate the table with
  the ACPI NVDIMM device we will introduce in a later patch.
  We can also safely ignore the memory device's interleave, since the
  real nvdimm hardware access is hidden behind the host

- DCR table, which defines the Vendor ID used to associate the device
  with a specific vendor nvdimm driver. Since we only implement PMEM
  mode this time, the Command window and Data window are not needed

Signed-off-by: Xiao Guangrong 
---
 hw/i386/acpi-build.c   |   3 +
 hw/mem/Makefile.objs   |   2 +-
 hw/mem/nvdimm/acpi.c   | 285 +
 hw/mem/nvdimm/internal.h   |  29 +
 hw/mem/nvdimm/pc-nvdimm.c  |  27 -
 include/hw/mem/pc-nvdimm.h |   2 +
 6 files changed, 346 insertions(+), 2 deletions(-)
 create mode 100644 hw/mem/nvdimm/acpi.c
 create mode 100644 hw/mem/nvdimm/internal.h

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 8ead1c1..092ed2f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -39,6 +39,7 @@
 #include "hw/loader.h"
 #include "hw/isa/isa.h"
 #include "hw/acpi/memory_hotplug.h"
+#include "hw/mem/pc-nvdimm.h"
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "sysemu/tpm_backend.h"
@@ -1741,6 +1742,8 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 build_dmar_q35(tables_blob, tables->linker);
 }
 
+pc_nvdimm_build_nfit_table(table_offsets, tables_blob, tables->linker);
+
 /* Add tables supplied by user (if any) */
 for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
 unsigned len = acpi_table_len(u);
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 4df7482..7a6948d 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,2 @@
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
-common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
new file mode 100644
index 0000000..f28752f
--- /dev/null
+++ b/hw/mem/nvdimm/acpi.c
@@ -0,0 +1,285 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) NFIT Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ * and the DSM specification can be found at:
+ *   http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu-common.h"
+
+#include "hw/acpi/aml-build.h"
+#include "hw/mem/pc-nvdimm.h"
+
+#include "internal.h"
+
+static void nfit_spa_uuid_pm(void *uuid)
+{
+uuid_le uuid_pm = UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d,
+  0x33, 0x18, 0xb7, 0x8c, 0xdb);
+memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
+}
+
+enum {
+NFIT_TABLE_SPA = 0,
+NFIT_TABLE_MEM = 1,
+NFIT_TABLE_IDT = 2,
+NFIT_TABLE_SMBIOS = 3,
+NFIT_TABLE_DCR = 4,
+NFIT_TABLE_BDW = 5,
+NFIT_TABLE_FLUSH = 6,
+};
+
+enum {
+EFI_MEMORY_UC = 0x1ULL,
+EFI_MEMORY_WC = 0x2ULL,
+EFI_MEMORY_WT = 0x4ULL,
+EFI_MEMORY_WB = 0x8ULL,
+EFI_MEMORY_UCE = 0x10ULL,
+EFI_MEMORY_WP = 0x1000ULL,
+EFI_MEMORY_RP = 0x2000ULL,
+EFI_MEMORY_XP = 0x4000ULL,
+EFI_MEMORY_NV = 0x8000ULL,
+EFI_MEMORY_MORE_RELIABLE = 0x10000ULL,
+};
+
+/*
+ * struct nfit - Nvdimm Firmware Interface Table
+ * @signature: "NFIT"
+ */
+struct nfit {
+ACPI_TABLE_HEADER_DEF
+uint32_t reserved;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_spa - System Physical Address Range Structure
+ */
+struct nfit_spa {
+uint16_t type;
+uint16_t length;
+uint16_t spa_index;
+uint16_t flags;
+uint32_t reserved;
+uint32_t proximity_domain;
+uint8_t type_uuid[16];
+uint64_t spa_base;
+uint64_t spa_length;
+uint64_t mem_attr;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_memdev - Memory Device to SPA Map Structure
+ */
+struct nfit_memdev {
+uint16_t type;
+uint16_t length;
+uint32_t nfit_handle;
+uint16_t phys_id;
+uint16_t region_id;
+uint16_t s

[Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit

2015-08-14 Thread Xiao Guangrong

Only 512M is left for MMIO below 4G, and that is used by PCI, BIOS, etc.
Other components also reserve regions for their internal usage, e.g.
[0xFED00000, 0xFED00000 + 0x400) is reserved for HPET.

Switch the SSDT to 64 bit to use the large free space above 4G. In later
patches, we will dynamically allocate free space within this region for
use by the NVDIMM _DSM method.

Signed-off-by: Xiao Guangrong 
---
 hw/i386/acpi-build.c  | 4 ++--
 hw/i386/acpi-dsdt.dsl | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 46eddb8..8ead1c1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
 build_header(linker, table_data,
 (void *)(table_data->data + table_data->len - ssdt->buf->len),
-"SSDT", ssdt->buf->len, 1);
+"SSDT", ssdt->buf->len, 2);
 free_aml_allocator();
 }
 
@@ -1586,7 +1586,7 @@ build_dsdt(GArray *table_data, GArray *linker, AcpiMiscInfo *misc)
 
 memset(dsdt, 0, sizeof *dsdt);
 build_header(linker, table_data, dsdt, "DSDT",
- misc->dsdt_size, 1);
+ misc->dsdt_size, 2);
 }
 
 static GArray *
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index a2d84ec..5cd3f0e 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
 DefinitionBlock (
 "acpi-dsdt.aml",// Output Filename
 "DSDT", // Signature
-0x01,   // DSDT Compliance Revision
+0x02,   // DSDT Compliance Revision
 "BXPC", // OEMID
 "BXDSDT",   // TABLE ID
 0x1 // OEM Revision
-- 
2.4.3

[Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof

2015-08-14 Thread Xiao Guangrong

Implement the SizeOf term, which is used by the NVDIMM _DSM method in a later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9e89efc..a526eed 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
 return var;
 }
 
+/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
+Aml *aml_sizeof(Aml *arg)
+{
+Aml *var = aml_opcode(0x87 /* SizeOfOp */);
+aml_append(var, arg);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 21dc5e9..6b591ab 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -276,6 +276,7 @@ Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
+Aml *aml_sizeof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
2.4.3

[Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-14 Thread Xiao Guangrong

NVDIMM reserves all the free range above 4G to do:
- Persistent Memory (PMEM) mapping
- implement NVDIMM ACPI device _DSM method

Signed-off-by: Xiao Guangrong 
---
 hw/i386/pc.c   | 12 ++--
 hw/mem/nvdimm/pc-nvdimm.c  | 13 +
 include/hw/mem/pc-nvdimm.h |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7661ea9..41af6ea 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -64,6 +64,7 @@
 #include "hw/pci/pci_host.h"
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/mem/pc-nvdimm.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
 
@@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
 MemoryRegion *ram_below_4g, *ram_above_4g;
 FWCfgState *fw_cfg;
 PCMachineState *pcms = PC_MACHINE(machine);
+ram_addr_t offset;
 
 assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
 
@@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
 exit(EXIT_FAILURE);
 }
 
+offset = 0x100000000ULL + above_4g_mem_size;
+
 /* initialize hotplug memory address space */
 if (guest_info->has_reserved_memory &&
 (machine->ram_size < machine->maxram_size)) {
@@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
 exit(EXIT_FAILURE);
 }
 
-pcms->hotplug_memory.base =
-ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
+pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
 
 if (pcms->enforce_aligned_dimm) {
 /* size hotplug region assuming 1G page max alignment per slot */
@@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
"hotplug-memory", hotplug_mem_size);
 memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
 &pcms->hotplug_memory.mr);
+
+offset = pcms->hotplug_memory.base + hotplug_mem_size;
 }
 
+ /* all the space left above 4G is reserved for NVDIMM. */
+pc_nvdimm_reserve_range(offset);
+
 /* Initialize PC system firmware */
 pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
 
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index a53d235..7a270a8 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -24,6 +24,19 @@
 
 #include "hw/mem/pc-nvdimm.h"
 
+#define PAGE_SIZE  (1UL << 12)
+
+static struct nvdimms_info {
+ram_addr_t current_addr;
+} nvdimms_info;
+
+/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
+void pc_nvdimm_reserve_range(ram_addr_t offset)
+{
+offset = ROUND_UP(offset, PAGE_SIZE);
+nvdimms_info.current_addr = offset;
+}
+
 static char *get_file(Object *obj, Error **errp)
 {
 PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index 51152b8..8601e9b 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
 #define PC_NVDIMM(obj) \
 OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
 
+void pc_nvdimm_reserve_range(ram_addr_t offset);
 #endif
-- 
2.4.3

[Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data

2015-08-14 Thread Xiao Guangrong

If @configdata is false, QEMU will build a static, read-only
namespace in memory and use it to serve
DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests
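
The diff in this message defines a `fletcher64()` helper; its call sites are cut off in this archive. As a reading aid, here is a standalone sketch of the same computation that assembles each 32-bit word from bytes in little-endian order instead of relying on `cpu_to_le32()`. The name `fletcher64_sketch` is hypothetical, and trailing bytes that do not fill a whole word are ignored, mirroring the `len / sizeof(uint32_t)` loop bound in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-style checksum over little-endian 32-bit words. */
static uint64_t fletcher64_sketch(const uint8_t *data, size_t len)
{
    uint32_t lo32 = 0;   /* running sum of words, wraps at 32 bits */
    uint64_t hi32 = 0;   /* running sum of the running sums */
    size_t i;

    for (i = 0; i + 4 <= len; i += 4) {
        uint32_t word = (uint32_t)data[i] |
                        (uint32_t)data[i + 1] << 8 |
                        (uint32_t)data[i + 2] << 16 |
                        (uint32_t)data[i + 3] << 24;

        lo32 += word;
        hi32 += lo32;
    }

    return hi32 << 32 | lo32;
}
```

Because the high half accumulates the running low sum, reordering the input words changes the result, which a plain additive checksum would not detect.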

Signed-off-by: Xiao Guangrong 
---
 hw/mem/Makefile.objs   |   3 +-
 hw/mem/nvdimm/acpi.c   |  10 ++
 hw/mem/nvdimm/internal.h   |  12 ++
 hw/mem/nvdimm/namespace.c  | 307 +
 include/hw/mem/pc-nvdimm.h |   2 +
 5 files changed, 333 insertions(+), 1 deletion(-)
 create mode 100644 hw/mem/nvdimm/namespace.c

diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 7a6948d..7f3fab2 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,3 @@
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
-common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o\
+  nvdimm/namespace.o
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 0b09efa..c773954 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -240,6 +240,8 @@ static void build_nfit_table(GSList *device_list, char *buf)
 
 for (; device_list; device_list = device_list->next) {
 PCNVDIMMDevice *nvdimm = device_list->data;
+struct nfit_memdev *nfit_memdev;
+struct nfit_dcr *nfit_dcr;
 int spa_index, dcr_index;
 
 spa_index = ++index;
@@ -252,10 +254,15 @@ static void build_nfit_table(GSList *device_list, char *buf)
  * build Memory Device to System Physical Address Range Mapping
  * Table.
  */
+nfit_memdev = (struct nfit_memdev *)buf;
 buf += build_memdev_table(buf, nvdimm, spa_index, dcr_index);
 
 /* build Control Region Descriptor Table. */
+nfit_dcr = (struct nfit_dcr *)buf;
 buf += build_dcr_table(buf, nvdimm, dcr_index);
+
+calculate_nvdimm_isetcookie(nvdimm, nfit_memdev->region_spa_offset,
+nfit_dcr->serial_number);
 }
 }
 
@@ -382,6 +389,9 @@ void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
 
 build_header(linker, table_data, (void *)(table_data->data + nfit_start),
  "NFIT", table_data->len - nfit_start, 1);
+
+build_nvdimm_configdata(list);
+
 exit:
 g_slist_free(list);
 }
diff --git a/hw/mem/nvdimm/internal.h b/hw/mem/nvdimm/internal.h
index 90d54dc..b1f3f16 100644
--- a/hw/mem/nvdimm/internal.h
+++ b/hw/mem/nvdimm/internal.h
@@ -13,6 +13,14 @@
 #ifndef __NVDIMM_INTERNAL_H
 #define __NVDIMM_INTERNAL_H
 
+/* #define NVDIMM_DEBUG */
+
+#ifdef NVDIMM_DEBUG
+#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
+#else
+#define nvdebug(...)
+#endif
+
 #define PAGE_SIZE   (1UL << 12)
 
 typedef struct {
@@ -27,4 +35,8 @@ typedef struct {
 
 GSList *get_nvdimm_built_list(void);
 ram_addr_t reserved_range_push(uint64_t size);
+
+void calculate_nvdimm_isetcookie(PCNVDIMMDevice *nvdimm, uint64_t spa,
+ uint32_t sn);
+void build_nvdimm_configdata(GSList *device_list);
 #endif
diff --git a/hw/mem/nvdimm/namespace.c b/hw/mem/nvdimm/namespace.c
new file mode 100644
index 0000000..04626da
--- /dev/null
+++ b/hw/mem/nvdimm/namespace.c
@@ -0,0 +1,307 @@
+/*
+ * NVDIMM  Namespace Support
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * NVDIMM namespace specification can be found at:
+ *  http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "hw/mem/pc-nvdimm.h"
+
+#include "internal.h"
+
+static uint64_t fletcher64(void *addr, size_t len)
+{
+uint32_t *buf = addr;
+uint32_t lo32 = 0;
+uint64_t hi32 = 0;
+int i;
+
+for (i = 0; i < len / sizeof(uint32_t); i++) {
+lo32 += cpu_to_le32(buf[i]);
+hi32 += lo32;
+}
+
+return hi32 << 32 | lo32;
+}
+
+struct interleave_set_info {
+struct interleave_set_info_map {
+uint64_t region_spa_offset;
+uint32_t serial_number;
+uint32_t zero;
+} mapping[1];
+};
+
+void calculate_nvdimm_isetcookie(PCNVDIMMDevice *nvdimm, uint64_t spa,
+ uint32_t sn)
+{
+struct interleave_set_info info;
+
+info.mapping[0].region_spa_offset = spa;
+info.mapping[0].serial_number = sn;
+info.mappin

[Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field

2015-08-14 Thread Xiao Guangrong

Implement the CreateField term, which is used by the NVDIMM _DSM method in a later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 14 ++
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a526eed..debdad2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1151,6 +1151,20 @@ Aml *aml_sizeof(Aml *arg)
 return var;
 }
 
+/* ACPI 6.0: 20.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
+{
+Aml *var = aml_alloc();
+
+build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+aml_append(var, srcbuf);
+aml_append(var, index);
+aml_append(var, len);
+build_append_namestring(var->buf, "%s", name);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 6b591ab..d4dbd44 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -277,6 +277,7 @@ Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
2.4.3

[Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-08-14 Thread Xiao Guangrong

The parameter @file is used as the backing memory for NVDIMM, which is
divided into two parts if @configdata is true:
- the first part is [0, size - 128K), which is used as PMEM (Persistent
  Memory)
- the 128K at the end of the file, which is used as the Config Data Area;
  it stores Label namespace data

@file supports both regular files and block devices. Either kind of
file can be used for test and emulation; in the real world, however,
for performance reasons we usually use one of these as the NVDIMM
backing file:
- a regular file in a filesystem with DAX enabled, created on an NVDIMM
  device on the host
- the raw PMEM device on the host, e.g. /dev/pmem0
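
The split of the backing file described above can be sketched as a small pure function. This is an illustration only: `struct nvdimm_layout` and `split_backing_file` are hypothetical names, and the 128K constant is taken from the description rather than from a QEMU header.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONFIG_DATA_SIZE (128ULL << 10)  /* 128K label area at the end */

/* Hypothetical result type describing how the backing file is used. */
struct nvdimm_layout {
    uint64_t pmem_size;      /* bytes exposed to the guest as PMEM */
    uint64_t config_offset;  /* file offset of the Config Data Area */
    bool has_config;         /* false when the whole file is PMEM */
};

/* Returns false when the file is too small for the requested layout. */
static bool split_backing_file(uint64_t file_size, bool configdata,
                               struct nvdimm_layout *layout)
{
    if (configdata) {
        if (file_size <= CONFIG_DATA_SIZE) {
            return false;    /* nothing would be left for PMEM */
        }
        layout->pmem_size = file_size - CONFIG_DATA_SIZE;
        layout->config_offset = layout->pmem_size;
    } else {
        if (file_size == 0) {
            return false;
        }
        layout->pmem_size = file_size;
        layout->config_offset = 0;
    }

    layout->has_config = configdata;
    return true;
}
```

Checking the size before subtracting avoids relying on a signed cast to detect underflow when the file is smaller than the config area.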

Signed-off-by: Xiao Guangrong 
---
 hw/mem/nvdimm/pc-nvdimm.c  | 109 -
 include/hw/mem/pc-nvdimm.h |   7 +++
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index 7a270a8..97710d1 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -22,12 +22,20 @@
  * License along with this library; if not, see 
  */
 
+#include 
+#include 
+#include 
+
+#include "exec/address-spaces.h"
 #include "hw/mem/pc-nvdimm.h"
 
-#define PAGE_SIZE  (1UL << 12)
+#define PAGE_SIZE   (1UL << 12)
+
+#define MIN_CONFIG_DATA_SIZE (128 << 10)
 
 static struct nvdimms_info {
 ram_addr_t current_addr;
+int device_index;
 } nvdimms_info;
 
 /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
@@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
 nvdimms_info.current_addr = offset;
 }
 
+static ram_addr_t reserved_range_push(uint64_t size)
+{
+uint64_t current;
+
+current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
+
+/* do not have enough space? */
+if (current + size < current) {
+return 0;
+}
+
+nvdimms_info.current_addr = current + size;
+return current;
+}
+
+static uint32_t new_device_index(void)
+{
+return nvdimms_info.device_index++;
+}
+
 static char *get_file(Object *obj, Error **errp)
 {
 PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
@@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
 {
 PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
 
+if (memory_region_size(&nvdimm->mr)) {
+error_setg(errp, "cannot change property value");
+return;
+}
+
 if (nvdimm->file) {
 g_free(nvdimm->file);
 }
@@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
  set_configdata, NULL);
 }
 
+static uint64_t get_file_size(int fd)
+{
+struct stat stat_buf;
+uint64_t size;
+
+if (fstat(fd, &stat_buf) < 0) {
+return 0;
+}
+
+if (S_ISREG(stat_buf.st_mode)) {
+return stat_buf.st_size;
+}
+
+if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
+return size;
+}
+
+return 0;
+}
+
 static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
 {
 PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+char name[512];
+void *buf;
+ram_addr_t addr;
+uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
+int fd;
 
 if (!nvdimm->file) {
 error_setg(errp, "file property is not set");
 }
+
+fd = open(nvdimm->file, O_RDWR);
+if (fd < 0) {
+error_setg(errp, "can not open %s", nvdimm->file);
+return;
+}
+
+size = get_file_size(fd);
+buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+if (buf == MAP_FAILED) {
+error_setg(errp, "can not do mmap on %s", nvdimm->file);
+goto do_close;
+}
+
+nvdimm->config_data_size = config_size;
+if (nvdimm->configdata) {
+/* reserve MIN_CONFIG_DATA_SIZE for config data. */
+nvdimm_size = size - config_size;
+nvdimm->config_data_addr = buf + nvdimm_size;
+} else {
+nvdimm_size = size;
+nvdimm->config_data_addr = NULL;
+}
+
+if ((int64_t)nvdimm_size <= 0) {
+error_setg(errp, "file size is too small to store NVDIMM"
+ " configure data");
+goto do_unmap;
+}
+
+addr = reserved_range_push(nvdimm_size);
+if (!addr) {
+error_setg(errp, "do not have enough space for size %#lx.\n", size);
+goto do_unmap;
+}
+
+nvdimm->device_index = new_device_index();
+sprintf(name, "NVDIMM-%d", nvdimm->device_index);
+memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
+   buf);
+vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
+memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
+
+return;
+
+do_unmap:
+munmap(buf, size);
+do_close:
+close(fd);
 }
 
 static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index 8601e9b..f617fd2 100644
--- a/include/hw/mem

[Qemu-devel] [PATCH v2 12/18] nvdimm: save arg3 for NVDIMM device _DSM method

2015-08-14 Thread Xiao Guangrong

Check if the function (Arg2) has additional input info (Arg3) and save
the info if needed.

We only do the save on the NVDIMM device since we are not going to support
any function on the root device.

Signed-off-by: Xiao Guangrong 
---
 hw/mem/nvdimm/acpi.c | 73 +++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 909a8ef..0b09efa 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -259,6 +259,26 @@ static void build_nfit_table(GSList *device_list, char *buf)
 }
 }
 
+enum {
+NFIT_CMD_IMPLEMENTED = 0,
+
+/* bus commands */
+NFIT_CMD_ARS_CAP = 1,
+NFIT_CMD_ARS_START = 2,
+NFIT_CMD_ARS_QUERY = 3,
+
+/* per-dimm commands */
+NFIT_CMD_SMART = 1,
+NFIT_CMD_SMART_THRESHOLD = 2,
+NFIT_CMD_DIMM_FLAGS = 3,
+NFIT_CMD_GET_CONFIG_SIZE = 4,
+NFIT_CMD_GET_CONFIG_DATA = 5,
+NFIT_CMD_SET_CONFIG_DATA = 6,
+NFIT_CMD_VENDOR_EFFECT_LOG_SIZE = 7,
+NFIT_CMD_VENDOR_EFFECT_LOG = 8,
+NFIT_CMD_VENDOR = 9,
+};
+
 struct dsm_buffer {
 /* RAM page. */
 uint32_t handle;
@@ -366,6 +386,19 @@ exit:
 g_slist_free(list);
 }
 
+static bool device_cmd_has_arg3[] = {
+false,  /* NFIT_CMD_IMPLEMENTED */
+false,  /* NFIT_CMD_SMART */
+false,  /* NFIT_CMD_SMART_THRESHOLD */
+false,  /* NFIT_CMD_DIMM_FLAGS */
+false,  /* NFIT_CMD_GET_CONFIG_SIZE */
+true,   /* NFIT_CMD_GET_CONFIG_DATA */
+true,   /* NFIT_CMD_SET_CONFIG_DATA */
+false,  /* NFIT_CMD_VENDOR_EFFECT_LOG_SIZE */
+false,  /* NFIT_CMD_VENDOR_EFFECT_LOG */
+false,  /* NFIT_CMD_VENDOR */
+};
+
 #define BUILD_STA_METHOD(_dev_, _method_)  \
 do {   \
 _method_ = aml_method("_STA", 0);  \
@@ -390,10 +423,20 @@ exit:
 
 static void build_nvdimm_devices(Aml *root_dev, GSList *list)
 {
+Aml *has_arg3;
+int i, cmd_nr;
+
+cmd_nr = ARRAY_SIZE(device_cmd_has_arg3);
+has_arg3 = aml_package(cmd_nr);
+for (i = 0; i < cmd_nr; i++) {
+aml_append(has_arg3, aml_int(device_cmd_has_arg3[i]));
+}
+aml_append(root_dev, aml_name_decl("CAG3", has_arg3));
+
 for (; list; list = list->next) {
 PCNVDIMMDevice *nvdimm = list->data;
 uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
-Aml *dev, *method;
+Aml *dev, *method, *ifctx;
 
 dev = aml_device("NVD%d", nvdimm->device_index);
 aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
@@ -403,6 +446,34 @@ static void build_nvdimm_devices(Aml *root_dev, GSList *list)
 method = aml_method("_DSM", 4);
 {
 SAVE_ARG012_HANDLE(method, aml_int(handle));
+
+/* Local5 = DeRefOf(Index(CAG3, Arg2)) */
+aml_append(method,
+   aml_store(aml_derefof(aml_index(aml_name("CAG3"),
+   aml_arg(2))), aml_local(5)));
+/* if 0 < local5 */
+ifctx = aml_if(aml_lless(aml_int(0), aml_local(5)));
+{
+/* Local0 = Index(Arg3, 0) */
+aml_append(ifctx, aml_store(aml_index(aml_arg(3), aml_int(0)),
+   aml_local(0)));
+/* Local1 = sizeof(Local0) */
+aml_append(ifctx, aml_store(aml_sizeof(aml_local(0)),
+   aml_local(1)));
+/* Local2 = Local1 << 3 */
+aml_append(ifctx, aml_store(aml_shiftleft(aml_local(1),
+   aml_int(3)), aml_local(2)));
+/* Local3 = DeRefOf(Local0) */
+aml_append(ifctx, aml_store(aml_derefof(aml_local(0)),
+   aml_local(3)));
+/* CreateField(Local3, 0, local2, IBUF) */
+aml_append(ifctx, aml_create_field(aml_local(3),
+   aml_int(0), aml_local(2), "IBUF"));
+/* ARG3 = IBUF */
+aml_append(ifctx, aml_store(aml_name("IBUF"),
+   aml_name("ARG3")));
+}
+aml_append(method, ifctx);
 NOTIFY_AND_RETURN(method);
 }
 aml_append(dev, method);
-- 
2.4.3

[Qemu-devel] [PATCH v2 17/18] nvdimm: support NFIT_CMD_SET_CONFIG_DATA

2015-08-14 Thread Xiao Guangrong

Function 6 is used to set Namespace Label Data

Signed-off-by: Xiao Guangrong 
---
 hw/mem/nvdimm/acpi.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 517d710..283228d 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -382,12 +382,17 @@ struct cmd_out_get_config_data {
 uint8_t out_buf[0];
 } QEMU_PACKED;
 
+struct cmd_out_set_config_data {
+uint32_t status;
+} QEMU_PACKED;
+
 struct dsm_out {
 union {
 uint32_t status;
 struct cmd_out_implemented cmd_implemented;
 struct cmd_out_get_config_size cmd_config_size;
 struct cmd_out_get_config_data cmd_config_get;
+struct cmd_out_set_config_data cmd_config_set;
 uint8_t data[PAGE_SIZE];
 };
 };
@@ -483,6 +488,38 @@ exit:
 return status;
 }
 
+static uint32_t
+dsm_cmd_config_set(PCNVDIMMDevice *nvdimm, struct dsm_buffer *in,
+   struct dsm_out *out)
+{
+struct cmd_in_set_config_data *cmd_in = &in->cmd_config_set;
+uint32_t status;
+
+if (!nvdimm->configdata) {
+status = NFIT_STATUS_NOT_SUPPORTED;
+goto exit;
+}
+
+le32_to_cpus(&cmd_in->length);
+le32_to_cpus(&cmd_in->offset);
+
+nvdebug("Write Config: offset %#x length %#x.\n", cmd_in->offset,
+cmd_in->length);
+if (nvdimm->config_data_size < cmd_in->length + cmd_in->offset) {
+nvdebug("position %#x is beyond config data (len = %#lx).\n",
+cmd_in->length + cmd_in->offset, nvdimm->config_data_size);
+status = NFIT_STATUS_INVALID_PARAS;
+goto exit;
+}
+
+status = NFIT_STATUS_SUCCESS;
+memcpy(nvdimm->config_data_addr + cmd_in->offset, cmd_in->in_buf,
+   cmd_in->length);
+
+exit:
+return status;
+}
+
 static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
 {
 GSList *list = get_nvdimm_built_list();
@@ -510,6 +547,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct 
dsm_out *out)
 case NFIT_CMD_GET_CONFIG_DATA:
 status = dsm_cmd_config_get(nvdimm, in, out);
 break;
+case NFIT_CMD_SET_CONFIG_DATA:
+status = dsm_cmd_config_set(nvdimm, in, out);
+break;
 default:
 status = NFIT_STATUS_NOT_SUPPORTED;
 };
-- 
2.4.3
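
A note on the bounds check in dsm_cmd_config_set() above. If offset and
length are both uint32_t, as the le32_to_cpus() calls suggest, their sum
can wrap around in 32-bit arithmetic and pass the comparison against
config_data_size. The following is a minimal sketch of a check that
cannot wrap, assuming only standard C. config_range_ok() is a
hypothetical helper, not something the patch defines:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * byte range [offset, offset + length) lies entirely inside a config
 * area of config_size bytes.  The comparison is done by subtraction,
 * so no intermediate value can wrap around.
 */
static bool config_range_ok(uint64_t config_size, uint32_t offset,
                            uint32_t length)
{
    if (offset > config_size) {
        return false;
    }
    /* config_size - offset is safe here because offset <= config_size. */
    return length <= config_size - offset;
}
```

With a 128K config area, config_range_ok(131072, 0xfffffff0, 0x20) is
false, whereas the 32-bit sum of those two values is 0x10 and would be
accepted by a "size < length + offset" test.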

[Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-08-14 Thread Xiao Guangrong

Changelog:
- Use little endian for the DSM method, thanks to Stefan's suggestion

- Introduce a new parameter, @configdata. If it is false, Qemu builds a
  static, read-only namespace in memory and uses it to serve
  DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case no
  reserved region is needed at the end of @file, which is good for
  users who want to pass a whole nvdimm device and make its data
  completely visible to the guest

- Split the source code into separate files and add maintainer info

BTW, PCOMMIT virtualization on the KVM side is work in progress and will
hopefully be posted next week

== Background ==
NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform. NVDIMMs are discovered via ACPI and configured by the
_DSM method of the NVDIMM device in ACPI. Some supporting documents can be
found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Currently, the NVDIMM driver has been merged into the upstream Linux kernel
and this patchset tries to enable it in the virtualization field

== Design ==
NVDIMM supports two access modes. One is PMEM, which maps the NVDIMM into the
CPU's address space so the CPU can access it directly as normal memory. The
other is BLK, which is used as a block device to reduce the amount of CPU
address space occupied

BLK mode accesses NVDIMM via Command Register window and Data Register window.
BLK virtualization has high overhead since each sector access causes at
least two VM-EXITs. So we currently only implement vPMEM in this patchset

--- vPMEM design ---
We introduce a new device named "pc-nvdimm". It has a parameter, file, which
is the file-backed memory passed to the guest. The file can be a regular file
or a block device. Any file can be used for testing or emulation; however,
in the real world, the files passed to the guest are:
- a regular file in a DAX-enabled filesystem created on an NVDIMM device
  on the host
- the raw PMEM device on the host, e.g. /dev/pmem0
Memory accesses to addresses created by mmap on these kinds of files
directly reach the NVDIMM device on the host.

--- vConfigure data area design ---
Each NVDIMM device has a config data area which is used to store label
namespace data. In order to emulate this area, we divide the file into two
parts:
- the first part is [0, size - 128K), which is used as PMEM
- the 128K at the end of the file, which is used as the Config Data Area
This way the label namespace data persists across power loss or system
failure
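
The split described above can be sketched in plain C as follows. This is
only an illustration of the arithmetic: struct nvdimm_layout and
split_backing_file() are hypothetical names and do not appear in the
patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* 128K reserved at the end of the backing file for the config data area. */
#define CONFIG_AREA_SIZE (128ULL * 1024)

/* Hypothetical type, for illustration only. */
struct nvdimm_layout {
    uint64_t pmem_size;     /* bytes at the start of the file used as PMEM */
    uint64_t config_offset; /* file offset where the config data area starts */
};

/*
 * Split a backing file of file_size bytes into a PMEM part followed by
 * a 128K config data area.  Returns false when the file is too small
 * to leave any room for PMEM.
 */
static bool split_backing_file(uint64_t file_size, struct nvdimm_layout *out)
{
    if (file_size <= CONFIG_AREA_SIZE) {
        return false;
    }
    out->pmem_size = file_size - CONFIG_AREA_SIZE;
    out->config_offset = out->pmem_size;
    return true;
}
```

For the 10G file used in the test section, this gives 10G - 128K of PMEM
with the config data area starting right after it.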

--- _DSM method design ---
_DSM in ACPI is used to configure NVDIMM. Currently we only allow access to
label namespace data, i.e. Get Namespace Label Size (Function Index 4),
Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
(Function Index 6)

_DSM uses two pages to transfer data between ACPI and Qemu. The first page
is RAM-based and holds the input of the _DSM method; Qemu reuses it to
store the output. The other page is MMIO-based; ACPI writes data to this
page to transfer control to Qemu

We use the address region above 4G to map these pages because there is a huge
amount of free space above 4G and it avoids address overlap with PCI and other
components with reserved addresses (e.g. HPET). This is also the reason we
choose MMIO notification instead of PIO
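
The two-page layout can be pictured with the sketch below. It assumes a
4K page and follows the field order used by struct dsm_buffer in the
patches. The names struct dsm_layout and DSM_PAGE_SIZE are hypothetical
and chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DSM_PAGE_SIZE 4096   /* assumed page size */

/* Hypothetical name; mirrors the two pages described above. */
struct dsm_layout {
    /* Page 0: RAM.  ACPI stores the _DSM input, QEMU writes the output. */
    uint32_t handle;         /* device handle, 0 for the root device */
    uint8_t  arg0[16];       /* _DSM UUID */
    uint32_t arg1;           /* revision */
    uint32_t arg2;           /* function index */
    uint8_t  arg3[DSM_PAGE_SIZE - 3 * sizeof(uint32_t) - 16];

    /* Page 1: MMIO.  A write here transfers control to QEMU. */
    union {
        uint32_t notify;
        uint8_t  pad[DSM_PAGE_SIZE];
    } mmio;
};

/* The RAM page must be exactly one page so the MMIO page is page aligned. */
_Static_assert(offsetof(struct dsm_layout, mmio) == DSM_PAGE_SIZE,
               "MMIO page must start on the second page");
_Static_assert(sizeof(struct dsm_layout) == 2 * DSM_PAGE_SIZE,
               "layout must span exactly two pages");
```

Only the store to the notify field traps to the hypervisor; the
argument stores land in ordinary RAM.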

== Test ==
In host
1) create a memory-backed file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
2) append '-device pc-nvdimm,file=/tmp/nvdimm' to the Qemu command line

In guest, download the latest upstream kernel (4.2 merge window) and enable
ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace, and /dev/pmem0
appears. You can do anything with /dev/pmem0, including DAX access.

Currently the Linux NVDIMM driver does not support namespace operations on this
kind of PMEM. Apply the changes below to support dynamic namespaces:

@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
continue;
}
 
-   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   if (nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;

You can append another NVDIMM device in the guest and do:
# cd /sys/bus/nd/devices/
# cd namespace1.0/
# echo `uuidgen` > uuid
# echo `expr 1024 \* 1024 \* 128` > size
then reload nd_pmem.ko

You can then see /dev/pmem1 appear

== TODO ==
1) NVDIMM NUMA support
2) NVDIMM hotplug support

Xiao Guangrong (18):
  acpi: allow aml_operation_region() working on 64 bit off

[Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract

2015-08-14 Thread Xiao Guangrong

Introduce the "pc-nvdimm" device, which has two parameters:
- @file, which is the backing memory file for the NVDIMM device

- @configdata, which specifies whether we need to reserve 128k at the end
  of @file for the nvdimm device's config data. Default is false

If @configdata is false, Qemu will build a static, read-only
namespace in memory and use it to serve
DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
This is good for users who want to pass a whole nvdimm device
and make its data completely visible to the guest

We can use "-device pc-nvdimm,file=/dev/pmem,configdata" in the
Qemu command line to create an NVDIMM device for the guest

Signed-off-by: Xiao Guangrong 
---
 default-configs/i386-softmmu.mak   |  1 +
 default-configs/x86_64-softmmu.mak |  1 +
 hw/Makefile.objs   |  2 +-
 hw/mem/Makefile.objs   |  1 +
 hw/mem/nvdimm/pc-nvdimm.c  | 99 ++
 include/hw/mem/pc-nvdimm.h | 31 
 6 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 hw/mem/nvdimm/pc-nvdimm.c
 create mode 100644 include/hw/mem/pc-nvdimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 48b5762..67fc3a8 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -49,3 +49,4 @@ CONFIG_MEM_HOTPLUG=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
+CONFIG_NVDIMM=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 4962ed7..dfcde36 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_MEM_HOTPLUG=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
+CONFIG_NVDIMM=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 73afa41..1e25d3f 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -30,7 +30,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
+devices-dirs-y += mem/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..4df7482 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
new file mode 100644
index 000..a53d235
--- /dev/null
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -0,0 +1,99 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "hw/mem/pc-nvdimm.h"
+
+static char *get_file(Object *obj, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+return g_strdup(nvdimm->file);
+}
+
+static void set_file(Object *obj, const char *str, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+if (nvdimm->file) {
+g_free(nvdimm->file);
+}
+
+nvdimm->file = g_strdup(str);
+}
+
+static bool has_configdata(Object *obj, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+return nvdimm->configdata;
+}
+
+static void set_configdata(Object *obj, bool value, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+nvdimm->configdata = value;
+}
+
+static void pc_nvdimm_init(Object *obj)
+{
+object_property_add_str(obj, "file", get_file, set_file, NULL);
+object_property_add_bool(obj, "configdata", has_configdata,
+ set_configdata, NULL);
+}
+
+static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+
+if (!nvdimm->file) {
+error_setg(errp, "file property is not set");
+}
+}
+
+static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+/* nvdimm hotplug has not been supported yet. */
+dc->hotpluggable = false;
+
+dc->realize = pc_nvdimm_realize;
+dc->desc = "NVDIMM memory module";
+}
+
+static TypeInfo pc_nvdimm_info = {
+.name  = TY

[Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-08-14 Thread Xiao Guangrong

_DSM is defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method)

Function 0 is a query function. We do not support any function on the root
device, and only 3 functions are supported for NVDIMM devices:
NFIT_CMD_GET_CONFIG_SIZE, NFIT_CMD_GET_CONFIG_DATA and
NFIT_CMD_SET_CONFIG_DATA. That means we currently only allow access to the
device's Label Namespace
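
As a rough illustration of what the query function reports for an NVDIMM
device: function N is advertised by setting bit N of the returned mask.
The enum values below are the function indexes used in this series.
dimm_cmd_mask() is a hypothetical helper written for this note, not the
code in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Function indexes as used by the series. */
enum {
    NFIT_CMD_IMPLEMENTED     = 0,
    NFIT_CMD_GET_CONFIG_SIZE = 4,
    NFIT_CMD_GET_CONFIG_DATA = 5,
    NFIT_CMD_SET_CONFIG_DATA = 6,
};

/*
 * Hypothetical helper: build the bitmask returned by function 0 for an
 * NVDIMM device.  Set Namespace Label Data is advertised only when the
 * device has a writable config data area.
 */
static uint64_t dimm_cmd_mask(bool configdata)
{
    uint64_t mask = (1ULL << NFIT_CMD_IMPLEMENTED)
                  | (1ULL << NFIT_CMD_GET_CONFIG_SIZE)
                  | (1ULL << NFIT_CMD_GET_CONFIG_DATA);

    if (configdata) {
        mask |= 1ULL << NFIT_CMD_SET_CONFIG_DATA;
    }
    return mask;
}
```

This yields 0x31 for a device without @configdata and 0x71 with it.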

Signed-off-by: Xiao Guangrong 
---
 hw/mem/nvdimm/acpi.c | 152 +++
 1 file changed, 152 insertions(+)

diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index c773954..20aefce 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -31,6 +31,7 @@
 #include "exec/address-spaces.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/mem/pc-nvdimm.h"
+#include "sysemu/sysemu.h"
 
 #include "internal.h"
 
@@ -41,6 +42,22 @@ static void nfit_spa_uuid_pm(void *uuid)
 memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
 }
 
+static bool dsm_is_root_uuid(uint8_t *uuid)
+{
+uuid_le uuid_root = UUID_LE(0x2f10e7a4, 0x9e91, 0x11e4, 0x89,
+0xd3, 0x12, 0x3b, 0x93, 0xf7, 0x5c, 0xba);
+
+return !memcmp(uuid, &uuid_root, sizeof(uuid_root));
+}
+
+static bool dsm_is_dimm_uuid(uint8_t *uuid)
+{
+uuid_le uuid_dimm = UUID_LE(0x4309ac30, 0x0d11, 0x11e4, 0x91,
+0x91, 0x08, 0x00, 0x20, 0x0c, 0x9a, 0x66);
+
+return !memcmp(uuid, &uuid_dimm, sizeof(uuid_dimm));
+}
+
 enum {
 NFIT_TABLE_SPA = 0,
 NFIT_TABLE_MEM = 1,
@@ -162,6 +179,20 @@ static uint32_t nvdimm_index_to_handle(int index)
 return index + 1;
 }
 
+static PCNVDIMMDevice
+*get_nvdimm_device_by_handle(GSList *list, uint32_t handle)
+{
+for (; list; list = list->next) {
+PCNVDIMMDevice *nvdimm = list->data;
+
+if (nvdimm_index_to_handle(nvdimm->device_index) == handle) {
+return nvdimm;
+}
+}
+
+return NULL;
+}
+
 static size_t get_nfit_total_size(int nr)
 {
 /* each nvdimm has 3 tables. */
@@ -286,6 +317,23 @@ enum {
 NFIT_CMD_VENDOR = 9,
 };
 
+enum {
+NFIT_STATUS_SUCCESS = 0,
+NFIT_STATUS_NOT_SUPPORTED = 1,
+NFIT_STATUS_NON_EXISTING_MEM_DEV = 2,
+NFIT_STATUS_INVALID_PARAS = 3,
+NFIT_STATUS_VENDOR_SPECIFIC_ERROR = 4,
+};
+
+#define DSM_REVISION (1)
+
+/* do not support any command except NFIT_CMD_IMPLEMENTED on root. */
+#define ROOT_SUPPORT_CMD (1 << NFIT_CMD_IMPLEMENTED)
+/* support NFIT_CMD_SET_CONFIG_DATA iff nvdimm->configdata is true. */
+#define DIMM_SUPPORT_CMD ((1 << NFIT_CMD_IMPLEMENTED)\
+   | (1 << NFIT_CMD_GET_CONFIG_SIZE)\
+   | (1 << NFIT_CMD_GET_CONFIG_DATA))
+
 struct dsm_buffer {
 /* RAM page. */
 uint32_t handle;
@@ -306,6 +354,18 @@ struct dsm_buffer {
 static ram_addr_t dsm_addr;
 static size_t dsm_size;
 
+struct cmd_out_implemented {
+uint64_t cmd_list;
+};
+
+struct dsm_out {
+union {
+uint32_t status;
+struct cmd_out_implemented cmd_implemented;
+uint8_t data[PAGE_SIZE];
+};
+};
+
 static uint64_t dsm_read(void *opaque, hwaddr addr,
  unsigned size)
 {
@@ -314,10 +374,102 @@ static uint64_t dsm_read(void *opaque, hwaddr addr,
 return 0;
 }
 
+static void dsm_write_root(struct dsm_buffer *in, struct dsm_out *out)
+{
+uint32_t function = in->arg2;
+
+if (function == NFIT_CMD_IMPLEMENTED) {
+out->cmd_implemented.cmd_list = cpu_to_le64(ROOT_SUPPORT_CMD);
+return;
+}
+
+out->status = cpu_to_le32(NFIT_STATUS_NOT_SUPPORTED);
+nvdebug("Return status %#x.\n", out->status);
+}
+
+static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
+{
+GSList *list = get_nvdimm_built_list();
+PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
+uint32_t function = in->arg2;
+uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
+uint64_t cmd_list;
+
+if (!nvdimm) {
+goto set_status_free;
+}
+
+switch (function) {
+case NFIT_CMD_IMPLEMENTED:
+cmd_list = DIMM_SUPPORT_CMD;
+if (nvdimm->configdata) {
+cmd_list |= 1 << NFIT_CMD_SET_CONFIG_DATA;
+}
+
+out->cmd_implemented.cmd_list = cpu_to_le64(cmd_list);
+goto free;
+default:
+status = NFIT_STATUS_NOT_SUPPORTED;
+};
+
+nvdebug("Return status %#x.\n", status);
+
+set_status_free:
+out->status = cpu_to_le32(status);
+free:
+g_slist_free(list);
+}
+
 static void dsm_write(void *opaque, hwaddr addr,
   uint64_t val, unsigned size)
 {
+struct MemoryRegion *dsm_ram_mr = opaque;
+struct dsm_buffer *dsm;
+struct dsm_out *out;
+void *buf;
+
 assert(val == NOTIFY_VALUE);
+
+buf = memory_region_get_ram_ptr(dsm_ram_mr);
+dsm = buf;
+out = buf;
+
+le32_to_cpus(&dsm->handle);
+le32_to_cpus(&dsm->arg1);
+le32_to_cpus(&dsm->arg2);
+
+nvdebug("Arg0 " UUID_FMT ".\n", dsm->arg0[0], dsm->

[Qemu-devel] [PATCH v2 16/18] nvdimm: support NFIT_CMD_GET_CONFIG_DATA

2015-08-14 Thread Xiao Guangrong

Function 5 is used to get Namespace Label Data

Signed-off-by: Xiao Guangrong 
---
 hw/mem/nvdimm/acpi.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 0a5f2c2..517d710 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -352,6 +352,7 @@ struct dsm_buffer {
 uint32_t arg1;
 uint32_t arg2;
 union {
+struct cmd_in_get_config_data cmd_config_get;
 struct cmd_in_set_config_data cmd_config_set;
 char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
 };
@@ -454,6 +455,34 @@ dsm_cmd_config_size(PCNVDIMMDevice *nvdimm, struct 
dsm_buffer *in,
 return NFIT_STATUS_SUCCESS;
 }
 
+static uint32_t
+dsm_cmd_config_get(PCNVDIMMDevice *nvdimm, struct dsm_buffer *in,
+   struct dsm_out *out)
+{
+struct cmd_in_get_config_data *cmd_in = &in->cmd_config_get;
+uint32_t status;
+
+le32_to_cpus(&cmd_in->length);
+le32_to_cpus(&cmd_in->offset);
+
+nvdebug("Read Config: offset %#x length %#x.\n", cmd_in->offset,
+cmd_in->length);
+
+if (nvdimm->config_data_size < cmd_in->length + cmd_in->offset) {
+nvdebug("position %#x is beyond config data (len = %#lx).\n",
+cmd_in->length + cmd_in->offset, nvdimm->config_data_size);
+status = NFIT_STATUS_INVALID_PARAS;
+goto exit;
+}
+
+status = NFIT_STATUS_SUCCESS;
+memcpy(out->cmd_config_get.out_buf, nvdimm->config_data_addr +
+   cmd_in->offset, cmd_in->length);
+
+exit:
+return status;
+}
+
 static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
 {
 GSList *list = get_nvdimm_built_list();
@@ -478,6 +507,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct 
dsm_out *out)
 case NFIT_CMD_GET_CONFIG_SIZE:
 status = dsm_cmd_config_size(nvdimm, in, out);
 break;
+case NFIT_CMD_GET_CONFIG_DATA:
+status = dsm_cmd_config_get(nvdimm, in, out);
+break;
 default:
 status = NFIT_STATUS_NOT_SUPPORTED;
 };
-- 
2.4.3
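
The handler above converts the guest-supplied offset and length from
little endian in place with le32_to_cpus(). For readers unfamiliar with
that convention, here is a host-independent sketch of what decoding a
little-endian 32-bit field means. load_le32() is a hypothetical
stand-alone helper, not a QEMU API:

```c
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit little-endian value from a byte
 * buffer.  The result is the same on little- and big-endian hosts
 * because the bytes are combined explicitly.
 */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The bytes 78 56 34 12 decode to 0x12345678 regardless of host byte order.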

[Qemu-devel] [PATCH 07/11] target-m68k: Use setcond for scc

2015-08-14 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target-m68k/translate.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index ce48e2a..28c3e1e 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -886,19 +886,21 @@ static void gen_jmpcc(DisasContext *s, int cond, TCGLabel 
*l1)
 
 DISAS_INSN(scc)
 {
-TCGLabel *l1;
+DisasCompare c;
 int cond;
-TCGv reg;
+TCGv reg, tmp;
 
-l1 = gen_new_label();
 cond = (insn >> 8) & 0xf;
+gen_cc_cond(&c, s, cond);
+
+tmp = tcg_temp_new();
+tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+free_cond(&c);
+
 reg = DREG(insn, 0);
-tcg_gen_andi_i32(reg, reg, 0xff00);
-/* This is safe because we modify the reg directly, with no other values
-   live.  */
-gen_jmpcc(s, cond ^ 1, l1);
-tcg_gen_ori_i32(reg, reg, 0xff);
-gen_set_label(l1);
+tcg_gen_neg_i32(tmp, tmp);
+tcg_gen_deposit_i32(reg, reg, tmp, 0, 8);
+tcg_temp_free(tmp);
 }
 
 /* Force a TB lookup after an instruction that changes the CPU state.  */
-- 
2.4.3
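
The new sequence is branch-free: setcond produces 0 or 1, neg turns that
into 0 or 0xffffffff, and deposit writes the low 8 bits of the result
into the data register. A plain-C model of that data flow, as read from
the diff above, is sketched below. scc_model() is a hypothetical name,
and this is host C, not TCG.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the Scc data flow: set the low byte of reg to
 * 0xff when cond is true and to 0x00 otherwise, leaving the upper 24
 * bits untouched.
 */
static uint32_t scc_model(uint32_t reg, bool cond)
{
    uint32_t tmp = cond ? 1u : 0u;            /* setcond: 1 or 0 */

    tmp = 0u - tmp;                           /* neg: 0xffffffff or 0 */
    return (reg & ~0xffu) | (tmp & 0xffu);    /* deposit at bit 0, width 8 */
}
```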

[Qemu-devel] [PATCH 10/11] target-m68k: Inline shifts

2015-08-14 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target-m68k/helper.c| 52 ---
 target-m68k/helper.h|  3 --
 target-m68k/translate.c | 94 +
 3 files changed, 72 insertions(+), 77 deletions(-)

diff --git a/target-m68k/helper.c b/target-m68k/helper.c
index ff7e481..6bd80a5 100644
--- a/target-m68k/helper.c
+++ b/target-m68k/helper.c
@@ -322,58 +322,6 @@ void HELPER(set_sr)(CPUM68KState *env, uint32_t val)
 m68k_switch_sp(env);
 }
 
-uint32_t HELPER(shl_cc)(CPUM68KState *env, uint32_t val, uint32_t shift)
-{
-uint64_t result;
-
-shift &= 63;
-result = (uint64_t)val << shift;
-
-env->cc_c = (result >> 32) & 1;
-env->cc_n = result;
-env->cc_z = result;
-env->cc_v = 0;
-env->cc_x = shift ? env->cc_c : env->cc_x;
-
-return result;
-}
-
-uint32_t HELPER(shr_cc)(CPUM68KState *env, uint32_t val, uint32_t shift)
-{
-uint64_t temp;
-uint32_t result;
-
-shift &= 63;
-temp = (uint64_t)val << 32 >> shift;
-result = temp >> 32;
-
-env->cc_c = (temp >> 31) & 1;
-env->cc_n = result;
-env->cc_z = result;
-env->cc_v = 0;
-env->cc_x = shift ? env->cc_c : env->cc_x;
-
-return result;
-}
-
-uint32_t HELPER(sar_cc)(CPUM68KState *env, uint32_t val, uint32_t shift)
-{
-uint64_t temp;
-uint32_t result;
-
-shift &= 63;
-temp = (int64_t)val << 32 >> shift;
-result = temp >> 32;
-
-env->cc_c = (temp >> 31) & 1;
-env->cc_n = result;
-env->cc_z = result;
-env->cc_v = result ^ val;
-env->cc_x = shift ? env->cc_c : env->cc_x;
-
-return result;
-}
-
 /* FPU helpers.  */
 uint32_t HELPER(f64_to_i32)(CPUM68KState *env, float64 val)
 {
diff --git a/target-m68k/helper.h b/target-m68k/helper.h
index c868148..9985f9b 100644
--- a/target-m68k/helper.h
+++ b/target-m68k/helper.h
@@ -5,9 +5,6 @@ DEF_HELPER_2(divu, void, env, i32)
 DEF_HELPER_2(divs, void, env, i32)
 DEF_HELPER_3(addx_cc, i32, env, i32, i32)
 DEF_HELPER_3(subx_cc, i32, env, i32, i32)
-DEF_HELPER_3(shl_cc, i32, env, i32, i32)
-DEF_HELPER_3(shr_cc, i32, env, i32, i32)
-DEF_HELPER_3(sar_cc, i32, env, i32, i32)
 DEF_HELPER_2(set_sr, void, env, i32)
 DEF_HELPER_3(movec, void, env, i32, i32)
 
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 19097c2..a536054 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2060,48 +2060,98 @@ DISAS_INSN(addx)
 gen_helper_addx_cc(reg, cpu_env, reg, src);
 }
 
-/* TODO: This could be implemented without helper functions.  */
 DISAS_INSN(shift_im)
 {
-TCGv reg;
-int tmp;
-TCGv shift;
+TCGv reg = DREG(insn, 0);
+int count = (insn >> 9) & 7;
+int arith = insn & 8;
 
-set_cc_op(s, CC_OP_FLAGS);
+if (count == 0) {
+count = 8;
+}
 
-reg = DREG(insn, 0);
-tmp = (insn >> 9) & 7;
-if (tmp == 0)
-tmp = 8;
-shift = tcg_const_i32(tmp);
-/* No need to flush flags becuse we know we will set C flag.  */
 if (insn & 0x100) {
-gen_helper_shl_cc(reg, cpu_env, reg, shift);
+tcg_gen_shri_i32(QREG_CC_C, reg, 32 - count);
+tcg_gen_shli_i32(QREG_CC_N, reg, count);
 } else {
-if (insn & 8) {
-gen_helper_shr_cc(reg, cpu_env, reg, shift);
+tcg_gen_shri_i32(QREG_CC_C, reg, count - 1);
+if (arith) {
+tcg_gen_sari_i32(QREG_CC_N, reg, count);
 } else {
-gen_helper_sar_cc(reg, cpu_env, reg, shift);
+tcg_gen_shri_i32(QREG_CC_N, reg, count);
 }
 }
+tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
+tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
+
+/* Note that ColdFire always clears V, while M68000 sets it for
+   a change in the sign bit.  */
+if (arith && m68k_feature(s->env, M68K_FEATURE_M68000)) {
+tcg_gen_xor_i32(QREG_CC_V, QREG_CC_N, reg);
+} else {
+tcg_gen_movi_i32(QREG_CC_V, 0);
+}
+
+tcg_gen_mov_i32(reg, QREG_CC_N);
+set_cc_op(s, CC_OP_FLAGS);
 }
 
 DISAS_INSN(shift_reg)
 {
-TCGv reg;
-TCGv shift;
+TCGv reg, s32;
+TCGv_i64 t64, s64;
+int arith = insn & 8;
 
 reg = DREG(insn, 0);
-shift = DREG(insn, 9);
+t64 = tcg_temp_new_i64();
+s64 = tcg_temp_new_i64();
+s32 = tcg_temp_new();
+
+/* Note that m68k truncates the shift count modulo 64, not 32.
+   In addition, a 64-bit shift makes it easy to find "the last
+   bit shifted out", for the carry flag.  */
+tcg_gen_andi_i32(s32, DREG(insn, 9), 63);
+tcg_gen_extu_i32_i64(s64, s32);
+
+/* Non-arithmetic shift clears V.  Use it as a source zero here.  */
+tcg_gen_movi_i32(QREG_CC_V, 0);
+
 if (insn & 0x100) {
-gen_helper_shl_cc(reg, cpu_env, reg, shift);
+tcg_gen_extu_i32_i64(t64, reg);
+tcg_gen_shl_i64(t64, t64, s64);
+tcg_temp_free_i64(s64);
+tcg_gen_extr_i64_i32(QREG_CC_N, QREG_CC_C, t64);
+tcg_temp_free_i64
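
For the immediate-count case, the carry flag is the last bit shifted
out of the register. For a left shift by count that is bit 32 - count
of the original value, and for a right shift it is bit count - 1. The
host-side model below can be used to cross-check the shift amounts
passed to tcg_gen_shri_i32() in shift_im. It is a sketch with
hypothetical names, valid for counts 1 to 8, and is not TCG code.

```c
#include <stdint.h>

/* Hypothetical result type: shifted value plus the carry flag (0 or 1). */
struct shift_result {
    uint32_t value;
    uint32_t carry;
};

/* Logical left shift by an immediate count in the range 1..8. */
static struct shift_result shl_im_model(uint32_t reg, unsigned count)
{
    struct shift_result r;

    r.carry = (reg >> (32 - count)) & 1;   /* last bit shifted out */
    r.value = reg << count;
    return r;
}

/* Logical right shift by an immediate count in the range 1..8. */
static struct shift_result shr_im_model(uint32_t reg, unsigned count)
{
    struct shift_result r;

    r.carry = (reg >> (count - 1)) & 1;    /* last bit shifted out */
    r.value = reg >> count;
    return r;
}
```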

[Qemu-devel] [PATCH v2 18/18] nvdimm: add maintain info

2015-08-14 Thread Xiao Guangrong

Add NVDIMM maintainer

Signed-off-by: Xiao Guangrong 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 978b717..86786e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -793,6 +793,12 @@ M: Jiri Pirko 
 S: Maintained
 F: hw/net/rocker/
 
+NVDIMM
+M: Xiao Guangrong 
+S: Maintained
+F: hw/mem/nvdimm/
+F: include/hw/mem/pc-nvdimm.h
+
 Subsystems
 --
 Audio
-- 
2.4.3

[Qemu-devel] [PATCH v2 10/18] nvdimm: init the address region used by DSM method

2015-08-14 Thread Xiao Guangrong

This memory range is used to transfer data between ACPI in the guest and Qemu.
It occupies two pages:
- one is RAM-based; it holds the input of the _DSM method and Qemu reuses
  it to store the output

- the other is MMIO-based; ACPI writes data to this page to transfer
  control to Qemu

Signed-off-by: Xiao Guangrong 
---
 hw/mem/nvdimm/acpi.c  | 80 ++-
 hw/mem/nvdimm/internal.h  |  1 +
 hw/mem/nvdimm/pc-nvdimm.c |  2 +-
 3 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index f28752f..e0f2ad3 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -28,6 +28,7 @@
 
 #include "qemu-common.h"
 
+#include "exec/address-spaces.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/mem/pc-nvdimm.h"
 
@@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char 
*buf)
 }
 }
 
+struct dsm_buffer {
+/* RAM page. */
+uint32_t handle;
+uint8_t arg0[16];
+uint32_t arg1;
+uint32_t arg2;
+union {
+char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
+};
+
+/* MMIO page. */
+union {
+uint32_t notify;
+char pedding[PAGE_SIZE];
+};
+};
+
+static ram_addr_t dsm_addr;
+static size_t dsm_size;
+
+static uint64_t dsm_read(void *opaque, hwaddr addr,
+ unsigned size)
+{
+return 0;
+}
+
+static void dsm_write(void *opaque, hwaddr addr,
+  uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps dsm_ops = {
+.read = dsm_read,
+.write = dsm_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static int build_dsm_buffer(void)
+{
+MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
+ram_addr_t addr;
+
+QEMU_BUILD_BUG_ON(PAGE_SIZE * 2 != sizeof(struct dsm_buffer));
+
+/* DSM buffer has already been built. */
+if (dsm_addr) {
+return 0;
+}
+
+addr = reserved_range_push(2 * PAGE_SIZE);
+if (!addr) {
+return -1;
+}
+
+dsm_addr = addr;
+dsm_size = PAGE_SIZE * 2;
+
+dsm_ram_mr = g_new(MemoryRegion, 1);
+memory_region_init_ram(dsm_ram_mr, NULL, "dsm_ram", PAGE_SIZE,
+   &error_abort);
+vmstate_register_ram_global(dsm_ram_mr);
+memory_region_add_subregion(get_system_memory(), addr, dsm_ram_mr);
+
+dsm_mmio_mr = g_new(MemoryRegion, 1);
+memory_region_init_io(dsm_mmio_mr, NULL, &dsm_ops, dsm_ram_mr,
+  "dsm_mmio", PAGE_SIZE);
+memory_region_add_subregion(get_system_memory(), addr + PAGE_SIZE,
+dsm_mmio_mr);
+return 0;
+}
+
 void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
 GArray *linker)
 {
-GSList *list = get_nvdimm_built_list();
+GSList *list;
 size_t total;
 char *buf;
 int nfit_start, nr;
 
+if (build_dsm_buffer()) {
+fprintf(stderr, "do not have enough space for DSM buffer.\n");
+return;
+}
+
+list = get_nvdimm_built_list();
 nr = get_nvdimm_device_number(list);
 total = get_nfit_total_size(nr);
 
diff --git a/hw/mem/nvdimm/internal.h b/hw/mem/nvdimm/internal.h
index 252a222..90d54dc 100644
--- a/hw/mem/nvdimm/internal.h
+++ b/hw/mem/nvdimm/internal.h
@@ -26,4 +26,5 @@ typedef struct {
 (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } })
 
 GSList *get_nvdimm_built_list(void);
+ram_addr_t reserved_range_push(uint64_t size);
 #endif
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index 2a6cfa2..752842a 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -45,7 +45,7 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
 nvdimms_info.current_addr = offset;
 }
 
-static ram_addr_t reserved_range_push(uint64_t size)
+ram_addr_t reserved_range_push(uint64_t size)
 {
 uint64_t current;
 
-- 
2.4.3

[Qemu-devel] [PATCH v2 11/18] nvdimm: build ACPI nvdimm devices

2015-08-14 Thread Xiao Guangrong

NVDIMM devices are defined in ACPI 6.0 9.20 NVDIMM Devices

There is a root device under \_SB, and the individual NVDIMM devices are under
the root device. Each NVDIMM device has an _ADR which returns its handle, used
to associate it with the MEMDEV table in NFIT

We reserve handle 0 for the root device. In this patch, we save handle, arg0,
arg1 and arg2. Arg3 is conditionally saved in a later patch

Signed-off-by: Xiao Guangrong 
---
 hw/i386/acpi-build.c   |   2 +
 hw/mem/nvdimm/acpi.c   | 130 -
 include/hw/mem/pc-nvdimm.h |   2 +
 3 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 092ed2f..a792135 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1342,6 +1342,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(sb_scope, scope);
 }
 }
+
+pc_nvdimm_build_acpi_devices(sb_scope);
 aml_append(ssdt, sb_scope);
 }
 
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index e0f2ad3..909a8ef 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -135,10 +135,11 @@ struct nfit_dcr {
 uint8_t reserved2[6];
 } QEMU_PACKED;
 
-#define REVSISON_ID1
-#define NFIT_FIC1  0x201
+#define REVSISON_ID 1
+#define NFIT_FIC1   0x201
 
 #define MAX_NVDIMM_NUMBER   10
+#define NOTIFY_VALUE0x99
 
 static int get_nvdimm_device_number(GSList *list)
 {
@@ -281,12 +282,15 @@ static size_t dsm_size;
 static uint64_t dsm_read(void *opaque, hwaddr addr,
  unsigned size)
 {
+fprintf(stderr, "BUG: we never read DSM notification MMIO.\n");
+assert(0);
 return 0;
 }
 
 static void dsm_write(void *opaque, hwaddr addr,
   uint64_t val, unsigned size)
 {
+assert(val == NOTIFY_VALUE);
 }
 
 static const MemoryRegionOps dsm_ops = {
@@ -361,3 +365,125 @@ void pc_nvdimm_build_nfit_table(GArray *table_offsets, 
GArray *table_data,
 exit:
 g_slist_free(list);
 }
+
+#define BUILD_STA_METHOD(_dev_, _method_)  \
+do {   \
+_method_ = aml_method("_STA", 0);  \
+aml_append(_method_, aml_return(aml_int(0x0f)));   \
+aml_append(_dev_, _method_);   \
+} while (0)
+
+#define SAVE_ARG012_HANDLE(_method_, _handle_) \
+do {   \
+aml_append(_method_, aml_store(_handle_, aml_name("HDLE")));   \
+aml_append(_method_, aml_store(aml_arg(0), aml_name("ARG0"))); \
+aml_append(_method_, aml_store(aml_arg(1), aml_name("ARG1"))); \
+aml_append(_method_, aml_store(aml_arg(2), aml_name("ARG2"))); \
+} while (0)
+
+#define NOTIFY_AND_RETURN(_method_)\
+do {   \
+aml_append(_method_, aml_store(aml_int(NOTIFY_VALUE),  \
+   aml_name("NOTI"))); \
+aml_append(_method_, aml_return(aml_name("ODAT")));\
+} while (0)
+
+static void build_nvdimm_devices(Aml *root_dev, GSList *list)
+{
+    for (; list; list = list->next) {
+        PCNVDIMMDevice *nvdimm = list->data;
+        uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
+        Aml *dev, *method;
+
+        dev = aml_device("NVD%d", nvdimm->device_index);
+        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
+
+        BUILD_STA_METHOD(dev, method);
+
+        method = aml_method("_DSM", 4);
+        {
+            SAVE_ARG012_HANDLE(method, aml_int(handle));
+            NOTIFY_AND_RETURN(method);
+        }
+        aml_append(dev, method);
+
+        aml_append(root_dev, dev);
+    }
+}
+
+void pc_nvdimm_build_acpi_devices(Aml *sb_scope)
+{
+    Aml *dev, *method, *field;
+    struct dsm_buffer *dsm_buf;
+    GSList *list = get_nvdimm_built_list();
+    int nr = get_nvdimm_device_number(list);
+
+    if (nr <= 0 || nr > MAX_NVDIMM_NUMBER) {
+        g_slist_free(list);
+        return;
+    }
+
+    dev = aml_device("NVDR");
+    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
+
+    /* map DSM buffer into ACPI namespace. */
+    aml_append(dev, aml_operation_region("DSMR", AML_SYSTEM_MEMORY,
+               dsm_addr, dsm_size));
+
+    /*
+     * DSM input:
+     * @HDLE: stores the device's handle; it is zero if the _DSM call happens
+     *        on ROOT.
+     * @ARG0 ~ @ARG3: store the parameters of the _DSM call.
+     *
+     * They are RAM-mapped on the host, so accesses never cause a VM exit.
+     */
+    field = aml_field("DSMR", AML_DWORD_ACC, AML_PRESERVE);
+    aml_append(field, aml_named_field("HDLE",
+   siz
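
The patch is cut off above, but the notification flow it sets up can be sketched roughly as follows. This is only an illustration, not code from the series: the struct layout, field widths, and the names dsm_shared, host_notify_write and guest_dsm_call are all hypothetical, and the "reply" the host produces is a toy value. What it tries to show is the split visible in the patch: the _DSM arguments go into plain memory (no exit), and only the final store of NOTIFY_VALUE reaches the host's write handler.

```c
#include <assert.h>
#include <stdint.h>

#define NOTIFY_VALUE 0x99          /* same value the patch defines */

/*
 * Hypothetical stand-in for the shared DSM buffer. The member names
 * mirror the AML names used in the patch (HDLE, ARG0..ARG2, ODAT);
 * the order and widths here are assumptions.
 */
struct dsm_shared {
    uint32_t handle;               /* HDLE */
    uint32_t arg0;                 /* ARG0 */
    uint32_t arg1;                 /* ARG1 */
    uint32_t arg2;                 /* ARG2 */
    uint32_t out;                  /* ODAT */
};

static struct dsm_shared shared;   /* models the RAM-backed fields */
static unsigned notify_count;      /* how many times the host was poked */

/* Models dsm_write(): the only access that reaches the host. */
static void host_notify_write(uint64_t val)
{
    assert(val == NOTIFY_VALUE);
    notify_count++;
    shared.out = shared.handle;    /* toy reply: echo the handle back */
}

/* Models the generated _DSM: save handle and args, notify, return ODAT. */
static uint32_t guest_dsm_call(uint32_t handle, uint32_t a0,
                               uint32_t a1, uint32_t a2)
{
    shared.handle = handle;        /* plain memory stores, no exit */
    shared.arg0 = a0;
    shared.arg1 = a1;
    shared.arg2 = a2;
    host_notify_write(NOTIFY_VALUE);   /* the single trapping write */
    return shared.out;
}
```

In this model, guest_dsm_call(7, 1, 2, 3) returns 7 and bumps notify_count once; a write of any other value would trip the assert, as in the patched dsm_write().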

[Qemu-devel] [PATCH 00/11] Proposed format for m68k flags

2015-08-14 Thread Richard Henderson

As promised a couple of days ago, with the addition of CC_OP_CMP,
which wasn't in the text of my proposal the other day.  From the
looks of the generated code, I believe this is ideal.

The following is based on Laurent's 8/30 Update cpu flags management.

FWIW, there's something in the last patch here that breaks the
coldfire kernel I've been testing (it may even be a bug in tcg;
the problem only appears well into the boot process).  But I'm
about to go away for the weekend and still wanted to include it
to show what can be done.

For convenience, the complete tree pushed to

  git://github.com/rth7680/qemu.git tgt-m68k


r~


Richard Henderson (11):
  target-m68k: Print flags properly
  target-m68k: Some fixes to SR and flags management
  target-m68k: Remove incorrect clearing of cc_x
  target-m68k: Replace helper_xflag_lt with setcond
  target-m68k: Reorg flags handling
  target-m68k: Introduce DisasCompare
  target-m68k: Use setcond for scc
  target-m68k: Optimize some comparisons
  target-m68k: Optimize gen_flush_flags
  target-m68k: Inline shifts
  target-m68k: Inline addx, subx, negx

 target-m68k/cpu.c   |   2 +-
 target-m68k/cpu.h   |  48 +--
 target-m68k/helper.c| 399 +
 target-m68k/helper.h|  12 +-
 target-m68k/op_helper.c |  35 +--
 target-m68k/qregs.def   |   6 +-
 target-m68k/translate.c | 769 +++-
 7 files changed, 674 insertions(+), 597 deletions(-)

-- 
2.4.3
