date:20110927

Re: [Qemu-devel] When the tlb_fill will be called from generated code?

2011-09-27 Thread Jan Kiszka

On 2011-09-27 06:15, 陳韋任 wrote:
> Hi, all
> 
>   The comment above tlb_fill says:
> 
> /* try to fill the TLB and return an exception if error. If retaddr is
>NULL, it means that the function was called in C code (i.e. not
>from generated code or from helper.c) */
> 
> I see tlb_fill only be called from softmmu_template.h (i.e., C code). I
> am wondering when/where the tlb_fill is called from generated code (code
> cache) or from helper.c.
> 

You can find the answer yourself: Load qemu into gdb, set a breakpoint
on that function and let it run. If you want to catch only the retaddr
== NULL case, make the breakpoint conditional.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions

2011-09-27 Thread Peter Maydell

On 27 September 2011 05:29, Andi Kleen  wrote:
> Peter Maydell  writes:
>> The answer is that the edge cases basically never match. No CPU
>> architecture does handling of NaNs and input denormals and output
>> denormals and underflow checks and all the rest of it in exactly
>> the same way as anybody else. (In particular x86 is pretty crazy,
>
> Can you clarify this?
>
> IEEE754 is pretty strict on how all these things are handled
> and to my knowledge all serious x86 are fully IEEE compliant.
> Or are you refering to the x87 80bits expansion? While useful
> that's not used anymore with SSE.

IEEE leaves some leeway for implementations. Just off the top
of my head:
 * if two NaNs are passed to an op then which one is propagated
   is implementation defined
 * value of the 'default NaN' is imp-def
 * whether the msbit of the significand is 1 or 0 to indicate
   an SNaN is imp-def
 * how an SNaN is converted to a QNaN is imp-def
 * tininess can be detected either before or after rounding

which different architectures vary on (and some are better at
documenting their choices than others).

Also implementations often have extra non-IEEE modes (which
may even be the default, for speed):
 * squashing denormal outputs to zero
 * squashing denormal inputs to zero
and there's even less agreement here.

Common-but-not-officially-ieee ops like 'max' and 'min'
can also vary: for instance Intel's SSE MAXSD/MINSD etc
have very weird behaviour for the special cases: if both
operands are 0.0s of any sign you always get the second
operand, so max(-0,+0) != max(+0,-0), and if only one operand
is a NaN then the second operand is returned whether it is
the NaN or not (so max(NaN, 3) != max(3, NaN).

> On the other hand qemu is not very good at it, e.g. with x87
> it doesn't even pass paranoia.

This is only because nobody cares much about x86 TCG. ARM floating
point emulation is (now) bit-for-bit correct apart from a handful
of operations which don't set the right set of exception flags.

-- PMM

Re: [Qemu-devel] Help writing a trivial device

2011-09-27 Thread Stefan Hajnoczi

On Mon, Sep 26, 2011 at 08:03:12PM +0200, Lluís Vilanova wrote:
> #if 1   /* KVM doesn't like it */
> s->data_ptr = g_malloc(s->size);
> memory_region_init_ram_ptr(&s->data, &s->dev.qdev, "backdoor.data",
>s->size, s->data_ptr);
> pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->data);
> #endif

Have you tried page-aligning data_ptr with qemu_memalign() or doing an
anonymous mmap instead of g_malloc()?

>From virt/kvm/kvm_main.c:__kvm_set_memory_region():

/* General sanity checks */
if (mem->memory_size & (PAGE_SIZE - 1))
goto out;
if (mem->guest_phys_addr & (PAGE_SIZE - 1))
goto out;
/* We can read the guest memory with __xxx_user() later on. */
if (user_alloc &&
((mem->userspace_addr & (PAGE_SIZE - 1)) ||
 !access_ok(VERIFY_WRITE,
(void __user *)(unsigned long)mem->userspace_addr,
mem->memory_size)))
goto out;

Stefan

Re: [Qemu-devel] virtqueue corruption in emulation mode?

2011-09-27 Thread Stefan Hajnoczi

On Mon, Sep 26, 2011 at 07:16:56PM -0500, Sinha, Ani wrote:
> I am using the virtqueue (virtqueue_pop, virtqueue_push etc) in the emulated 
> mode (non-kvm mode) from an IO thread (a separate thread different from main 
> QEMU thread). What I am observing is that the virtqueue memory seems to get 
> corrupt. Either qemu crashes while performing virtqueue_push() 
> (virtqueue_push() -> virtqueue_fill() 
> ->bring_used_idx()->lduw_phys()->qemu_get_ram_ptr()->"bad ram offset") or 
> crashes when the guest accesses a bad memory while using virtqueue. Now this 
> never ever happens when I run QEMU in KVM mode (/dev/kvm present) OR when I 
> use my functions from within the main qemu thread. I am unable to figure out 
> why this is happening. I have looked into my code over and over again and I 
> can't seem to explain this behavior. Can any of you guys give me any inkling?

QEMU is not thread-safe in general.  It uses a big lock to protect most
of its internal state.

When you say "an IO thread" it sounds like you spawn a new thread
outside the big lock (qemu_global_mutex).  You cannot call the existing
virtqueue functions outside the big lock because they traverse (and
modify!) the memory management data structures.

Please call new threads "helper threads" or something other than "IO
thread" because I/O thread has a specific meaning in QEMU.  It's the
event loop thread that execute main_loop_wait() and dispatches fd
handlers when select(2) returns.  This will prevent confusion.

If you follow the way that existing virtio devices are implemented there
should be no problem.

Stefan

Re: [Qemu-devel] Using iPXE with older qemu releases?

2011-09-27 Thread Stefan Hajnoczi

On Mon, Sep 26, 2011 at 02:22:21PM -0500, Kenton Cabiness wrote:
> Is there a way to point an older qemu release (currently running
> qemu-kvm-0.12.1.2-2.16) to iPXE products?
> 
> We have built iPXE and installed the files and tested by changing
> the symbolic links in /usr/share/qemu-kvm (RH6.1 system) to point to
> the iPXE files.  We would like to package the files in an RPM for
> installation on several machines, but since the links are owned by
> the qemu-kvm package, we cant have the iPXE package overwrite them.
> 
> Is there a command line argument to qemu to point to a different
> directory for iPXE?  I've been looking through the code but can't
> find what tells qemu where to pick up the files.

Try -device virtio-net-pci,romfile=/path/to/ipxe.rom,... as part of your
command-line.  If you have trouble getting this working, please post
your full command-line.

Alternatively use -option-rom /path/to/ipxe.rom.  Either method should
work.

Stefan

[Qemu-devel] [PATCH] tracetool: Omit useless QEMU_*_ENABLED() check

2011-09-27 Thread Stefan Hajnoczi

SystemTap provides a "semaphore" that can optionally be tested before
executing a trace event.  The purpose of this mechanism is to skip
expensive tracing code when the trace event is disabled.

For example, some applications may have trace events that format or
convert strings for trace events.  This expensive processing should only
be done in the case where the trace event is enabled.

Since QEMU's generated trace events never have such special-purpose
code, there is no reason to add the semaphore check.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index 4c9951d..ffe3eba 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -415,9 +415,7 @@ linetoh_dtrace()
 # Define an empty function for the trace event
 cat <

Re: [Qemu-devel] [PATCH] fix qemu build with nas installed

2011-09-27 Thread Christoph Egger


On 09/21/11 12:16, Stefan Hajnoczi wrote:

On Wed, Sep 21, 2011 at 10:34 AM, Christoph Egger
  wrote:


When NAS (http://nas.sf.net) is installed then there
is an existing audio/audio.h. Then when compiling
qemu, #include "audio/audio.h" takes the one from NAS
and causes the build to fail.

So rename audio/audio.h to audio/qaudio.h and adjust
all users.

Signed-off-by: Christoph Egger

diff --git a/arch_init.c b/arch_init.c
index 9a5a0e3..516c4c0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -32,7 +32,7 @@
  #include "monitor.h"
  #include "sysemu.h"
  #include "arch_init.h"
-#include "audio/audio.h"
+#include "audio/qaudio.h"


It's not looking for system headers so why is
/usr/include/audio/audio.h getting picked up instead of
./audio/audio.h?

If it was #include  I could understand but I suspect we
have a mess of include paths that the compiler is searching and it
would be nice to fix that instead.


The include pathes passed to the compiler are always pre-pended.
So the compiler searches headers in third-party packages first.
I haven't found the place in configure where this happens.

Christoph

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

Re: [Qemu-devel] [FYI] Soft feature freeze for 1.0 is 10/15 (three weeks away)

2011-09-27 Thread Avi Kivity


On 09/26/2011 09:07 PM, Blue Swirl wrote:

>>  The default address is used for early serial printk in OpenBIOS, so it
>>  should still work.
>
>  Ok, so drop the extra mapping, but init the dynamic mapping to 0x80013000.

That should work.


It's already there (macio.c):

if (macio_state->escc_mem) {
memory_region_add_subregion(bar, 0x13000, macio_state->escc_mem);
}

I'll drop the extra mapping.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC] Generic image streaming

2011-09-27 Thread Stefan Hajnoczi

On Tue, Sep 27, 2011 at 11:26:45AM +0800, Zhi Yong Wu wrote:
> On Fri, Sep 23, 2011 at 11:57 PM, Stefan Hajnoczi
>  wrote:
> > Here is my generic image streaming branch, which aims to provide a way
> > to copy the contents of a backing file into an image file of a running
> > guest without requiring specific support in the various block drivers
> > (e.g.  qcow2, qed, vmdk):
> >
> > http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
> >
> > The tree does not provide full image streaming yet but I'd like to
> > discuss the approach taken in the code.  Here are the main points:
> >
> > The image streaming API is available through HMP and QMP commands.  When
> > streaming is started on a block device a coroutine is created to do the
> > background I/O work.  The coroutine can be cancelled.
> >
> > While the coroutine copies data from the backing file into the image
> > file, the guest may be performing I/O to the image file.  Guest reads do
> > not conflict with streaming but guest writes require special handling.
> > If the guest writes to a region of the image file that we are currently
> > copying, then there is the potential to clobber the guest write with old
> > data from the backing file.
> >
> > Previously I solved this in a QED-specific way by taking advantage of
> > the serialization of allocating write requests.  In order to do this
> > generically we need to track in-flight requests and have the ability to
> > queue I/O.  Guest writes that affect an in-flight streaming copy
> > operation must wait for that operation to complete before being issued.
> > Streaming copy operations must skip overlapping regions of guest writes.
> >
> > One big difference to the QED image streaming implementation is that
> > this generic implementation is not based on copy-on-read operations.
> > Instead we do a sequence of bdrv_is_allocated() to find regions for
> > streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> Why is the api not bdrv_aio_readv/writev? In your branch, it seems
> that you only modify bdrv_read/write. Does your branch currently only
> support sync read/write mode?

No, it is designed to work with all three: aio, co, and sync.

The streaming loop itself is in a coroutine but it will be able to
interact with any other block I/O requests.  The critical part is the
request tracking here, which monitors aio, co, and sync:
http://repo.or.cz/w/qemu/stefanha.git/commitdiff/cf365d4e33ba19fadd4f1f20c64526e890d34239

Stefan

[Qemu-devel] [RFC][PATCH] Drop global lock during TCG code execution

2011-09-27 Thread Jan Kiszka

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

I'm pretty sure some cases are still broken, definitely SMP (we no
longer perform round-robin scheduling "by chance"). Still, a Linux x86
UP guest and my Musicpal ARM model boot fine here. These numbers
demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Note that this patch depends on
http://thread.gmane.org/gmane.comp.emulators.qemu/118657

---
 cpus.c  |   18 ++
 exec.c  |   33 +
 qemu-common.h   |   11 ---
 softmmu_template.h  |8 
 target-i386/op_helper.c |   27 ---
 5 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/cpus.c b/cpus.c
index f983033..7d64437 100644
--- a/cpus.c
+++ b/cpus.c
@@ -721,7 +721,7 @@ void qemu_cpu_kick(void *_env)
 CPUState *env = _env;
 
 qemu_cond_broadcast(env->halt_cond);
-if (kvm_enabled() && !env->thread_kicked) {
+if (!env->thread_kicked) {
 qemu_cpu_kick_thread(env);
 env->thread_kicked = true;
 }
@@ -750,17 +750,7 @@ int qemu_cpu_is_self(void *_env)
 
 void qemu_mutex_lock_iothread(void)
 {
-if (kvm_enabled()) {
-qemu_mutex_lock(&qemu_global_mutex);
-} else {
-iothread_requesting_mutex = true;
-if (qemu_mutex_trylock(&qemu_global_mutex)) {
-qemu_cpu_kick_thread(first_cpu);
-qemu_mutex_lock(&qemu_global_mutex);
-}
-iothread_requesting_mutex = false;
-qemu_cond_broadcast(&qemu_io_proceeded_cond);
-}
+qemu_mutex_lock(&qemu_global_mutex);
 }
 
 void qemu_mutex_unlock_iothread(void)
@@ -912,7 +902,11 @@ static int tcg_cpu_exec(CPUState *env)
 env->icount_decr.u16.low = decr;
 env->icount_extra = count;
 }
+qemu_mutex_unlock(&qemu_global_mutex);
+
 ret = cpu_exec(env);
+
+qemu_mutex_lock(&qemu_global_mutex);
 #ifdef CONFIG_PROFILER
 qemu_time += profile_getclock() - ti;
 #endif
diff --git a/exec.c b/exec.c
index 1e6f732..0524574 100644
--- a/exec.c
+++ b/exec.c
@@ -1118,6 +1118,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif
 #ifdef TARGET_HAS_PRECISE_SMC
 if (current_tb_modified) {
+qemu_mutex_unlock_iothread();
 /* we generate a block containing just the instruction
modifying the memory. It will ensure that it cannot modify
itself */
@@ -1205,6 +1206,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
 p->first_tb = NULL;
 #ifdef TARGET_HAS_PRECISE_SMC
 if (current_tb_modified) {
+qemu_mutex_unlock_iothread();
 /* we generate a block containing just the instruction
modifying the memory. It will ensure that it cannot modify
itself */
@@ -2061,9 +2063,11 @@ void tlb_flush_page(CPUState *env, target_ulong addr)
can be detected */
 static void tlb_protect_code(ram_addr_t ram_addr)
 {
+qemu_mutex_lock_iothread();
 cpu_physical_memory_reset_dirty(ram_addr,
 ram_addr + TARGET_PAGE_SIZE,
 CODE_DIRTY_FLAG);
+qemu_mutex_unlock_iothread();
 }
 
 /* update the TLB so that writes in physical page 'phys_addr' are no longer
@@ -3122,11 +3126,13 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (addr - block->offset < block->length) {
+#if 0 /* requires RCU - and like also a smarter balancing algorithm */
 /* Move this entry to to start of the list.  */
 if (block != QLIST_FIRST(&ram_list.blocks)) {
 QLIST_REMOVE(block, next);
 QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
 }
+#endif
 if (xen_enabled()) {
 /* We need to check if the requested address is in the RAM
  * because we don't want to map the entire memory in QEMU.
@@ -3417,6 +3423,7 @@ static void check_watchpoint(int offset, int len_mask

Re: [Qemu-devel] [PATCH] tracetool: Omit useless QEMU_*_ENABLED() check

2011-09-27 Thread Masami Hiramatsu

(2011/09/27 17:00), Stefan Hajnoczi wrote:
> SystemTap provides a "semaphore" that can optionally be tested before
> executing a trace event.  The purpose of this mechanism is to skip
> expensive tracing code when the trace event is disabled.
> 
> For example, some applications may have trace events that format or
> convert strings for trace events.  This expensive processing should only
> be done in the case where the trace event is enabled.
> 
> Since QEMU's generated trace events never have such special-purpose
> code, there is no reason to add the semaphore check.
> 
> Signed-off-by: Stefan Hajnoczi 

Thanks Stefan,

The systemtap sdt.h just makes a space with NOP for inserting
a breakpoint. However, ENABLED() check adds another static
variable checking for each tracepoint. So without any clear
reason, ENABLED() should not be used.

I've checked that this can remove useless conditional jumps from
the code.

Without this patch:
00420130 :
[snip]
  420146:   48 89 44 24 08  mov%rax,0x8(%rsp)
  42014b:   31 c0   xor%eax,%eax
 // Here, this check a semaphore
  42014d:   66 83 3d 0f da 58 00cmpw   $0x0,0x58da0f(%rip)
  420154:   00
// And a conditional jump
  420155:   75 3e   jne420195 
  420157:   48 8b 1d 4a df 58 00mov0x58df4a(%rip),%rbx
[snip]
  420192:   41 5c   pop%r12
  420194:   c3  retq
  420195:   90  nop // jump to here
  420196:   eb bf   jmp420157 

With this patch:
00420130 :
[snip]
  420146:   48 89 44 24 08  mov%rax,0x8(%rsp)
  42014b:   31 c0   xor%eax,%eax
  42014d:   90  nop // We have just one NOP
  42014e:   48 8b 1d 13 d7 58 00mov0x58d713(%rip),%rbx
  420155:   48 85 dbtest   %rbx,%rbx

Reviewed-by: Masami Hiramatsu 

> ---
>  scripts/tracetool |4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/tracetool b/scripts/tracetool
> index 4c9951d..ffe3eba 100755
> --- a/scripts/tracetool
> +++ b/scripts/tracetool
> @@ -415,9 +415,7 @@ linetoh_dtrace()
>  # Define an empty function for the trace event
>  cat <  static inline void trace_$name($args) {
> -if (QEMU_${nameupper}_ENABLED()) {
> -QEMU_${nameupper}($argnames);
> -}
> +QEMU_${nameupper}($argnames);
>  }
>  EOF
>  }


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com

Re: [Qemu-devel] [FYI] Soft feature freeze for 1.0 is 10/15 (three weeks away)

2011-09-27 Thread Avi Kivity

On 09/27/2011 11:33 AM, Avi Kivity wrote:

On 09/26/2011 09:07 PM, Blue Swirl wrote:
>>  The default address is used for early serial printk in OpenBIOS, 
so it

>>  should still work.
>
>  Ok, so drop the extra mapping, but init the dynamic mapping to 
0x80013000.

That should work.

It's already there (macio.c):

if (macio_state->escc_mem) {
memory_region_add_subregion(bar, 0x13000, macio_state->escc_mem);
}

I'll drop the extra mapping.

Well, it's not that easy.  As the other mapping is part of an ordinary 
BAR, you need to setup the device (at least PCI_COMMAND and 
PCI_BASE_ADDRESS_0) so it responds to memory requests, and also enable 
the bridge.

We could hack it by having a low-priority mapping at 0x80013000, but it 
seems wrong.  Maybe the firmware should configure that BAR first?  What 
happens on real hardware?

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC] Generic image streaming

2011-09-27 Thread Zhi Yong Wu

On Fri, Sep 23, 2011 at 11:57 PM, Stefan Hajnoczi
 wrote:
> Here is my generic image streaming branch, which aims to provide a way
> to copy the contents of a backing file into an image file of a running
> guest without requiring specific support in the various block drivers
> (e.g.  qcow2, qed, vmdk):
>
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
Sorry, i missed this support logic. thanks.
>
> The tree does not provide full image streaming yet but I'd like to
> discuss the approach taken in the code.  Here are the main points:
>
> The image streaming API is available through HMP and QMP commands.  When
> streaming is started on a block device a coroutine is created to do the
> background I/O work.  The coroutine can be cancelled.
>
> While the coroutine copies data from the backing file into the image
> file, the guest may be performing I/O to the image file.  Guest reads do
> not conflict with streaming but guest writes require special handling.
> If the guest writes to a region of the image file that we are currently
> copying, then there is the potential to clobber the guest write with old
> data from the backing file.
>
> Previously I solved this in a QED-specific way by taking advantage of
> the serialization of allocating write requests.  In order to do this
> generically we need to track in-flight requests and have the ability to
> queue I/O.  Guest writes that affect an in-flight streaming copy
> operation must wait for that operation to complete before being issued.
> Streaming copy operations must skip overlapping regions of guest writes.
>
> One big difference to the QED image streaming implementation is that
> this generic implementation is not based on copy-on-read operations.
> Instead we do a sequence of bdrv_is_allocated() to find regions for
> streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> populate the image file.
>
> It turns out that generic copy-on-read is not an attractive operation
> because it requires using bounce buffers for every request.  Kevin
> pointed out the case where a guest performs a read and pokes the data
> buffer before the read completes, copy-on-read would write out the
> modified memory into the image file unless we use a bounce buffer.
>
> There are a few pieces missing in my tree, which have mostly been solved
> in other places and just need to be reused:
> 1. Arbitration between guest and streaming requests (this is the only
>   real new thing).
> 2. Efficient zero handling (skip writing those regions or mark them as
>   zero clusters).
> 3. Queuing/dependencies when arbitration decides a request must wait.
>   I'm taking a look at reusing Zhi Yong's block queue.
> 4. Rate-limiting to ensure streaming I/O does not impact the guest.
>   Already exists in the QED-specific patches, it may make sense to
>   extract common code that both migration and the block layer can use.
>
> Ideas or questions?
>
> Stefan
>
>



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH v3] memory: simple memory tree printer

2011-09-27 Thread Avi Kivity


On 09/26/2011 11:03 PM, Blue Swirl wrote:

Add a monitor command 'info mtree' to show the memory hierarchy
much like /proc/iomem in Linux.


Thanks, applied.


+
+#ifdef TARGET_I386
+QTAILQ_INIT(&ml_head);
+mon_printf(f, "I/O\n");
+mtree_print_mr(mon_printf, f, address_space_io.root, 0, 0,&ml_head);
+#endif



I dropped the #ifdef here, since some non-x86 still have fake I/O 
address spaces.


--
error compiling committee.c: too many arguments to function

[Qemu-devel] [PULL] VirtFS update

2011-09-27 Thread Aneesh Kumar K.V


The following changes since commit d85a1302a91912c52cdc3fe459b313848a8a0792:

  Merge remote-tracking branch 'kwolf/for-anthony' into staging (2011-09-22 
10:31:26 -0500)

are available in the git repository at:

  git://repo.or.cz/qemu/v9fs.git for-upstream-5

Aneesh Kumar K.V (8):
  hw/9pfs: Make v9fs_string* functions non-static
  hw/9pfs: Use read-write lock for protecting fid path.
  hw/9pfs: Move fid pathname tracking to seperate data type.
  hw/9pfs: Add init callback to fs driver
  hw/9pfs: Add fs driver specific details to fscontext
  hw/9pfs: Avoid unnecessary get_fid in v9fs_clunk
  hw/9pfs: Implement TFLUSH operation
  hw/9pfs: Add handle based fs driver

 Makefile.objs  |8 +-
 fsdev/file-op-9p.h |   54 ++-
 fsdev/qemu-fsdev.c |1 +
 fsdev/qemu-fsdev.h |1 +
 hw/9pfs/codir.c|   64 +++-
 hw/9pfs/cofile.c   |  110 +--
 hw/9pfs/cofs.c |  195 ++--
 hw/9pfs/coxattr.c  |   41 ++-
 hw/9pfs/virtio-9p-coth.h   |   72 +++--
 hw/9pfs/virtio-9p-device.c |   10 +-
 hw/9pfs/virtio-9p-handle.c |  611 +
 hw/9pfs/virtio-9p-local.c  |  213 +---
 hw/9pfs/virtio-9p.c|  820 
 hw/9pfs/virtio-9p.h|   50 +++-
 14 files changed, 1712 insertions(+), 538 deletions(-)
 create mode 100644 hw/9pfs/virtio-9p-handle.c

Re: [Qemu-devel] LPC2011 Virtualization Micro Conf

2011-09-27 Thread Jes Sorensen

Hi,

For those who are interested, I have posted the notes from the 2011
Linux Plumbers Virtualization micro conference here:

http://wiki.linuxplumbersconf.org/2011:virtualization

Slides can be found by clicking on the presentation and going onto the
Plumbers abstracts.

Cheers,
Jes

[Qemu-devel] [PATCH] slirp: Fix packet expiration

2011-09-27 Thread Thomas Huth


The two new variables "arp_requested" and "expiration_date" in the mbuf
structure have been added after the variable-sized "m_dat_" array. The
variables have to be added before the m_dat_ array instead.
Without this patch, the expiration_date gets clobbered by code that
accesses the m_dat_ array.
I experienced this problem with the code in slirp/tftp.c: The
tftp_send_data() function created a new packet with the m_get()
function (which fills-in a default expiration_date value). Then the
TFTP code cleared the data section of the packet, which accidentially
also cleared the expiration_date. This zeroed expiration_date then
finally causes the packet to be discarded during if_start(), so that
TFTP packets were not transmitted anymore.

Signed-off-by: Thomas Huth 
---
 slirp/mbuf.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index 55170e5..37b9dbb 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -82,12 +82,12 @@ struct m_hdr {
 struct mbuf {
struct  m_hdr m_hdr;
Slirp *slirp;
+   boolarp_requested;
+   uint64_t expiration_date;
union M_dat {
charm_dat_[1]; /* ANSI don't like 0 sized arrays */
char*m_ext_;
} M_dat;
-bool arp_requested;
-uint64_t expiration_date;
 };
 
 #define m_next m_hdr.mh_next

[Qemu-devel] [PATCH] ppc/e500_pci: Fix an array overflow issue

2011-09-27 Thread Liu Yu

Signed-off-by: Liu Yu 
---
 hw/ppce500_pci.c |   26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c
index 2db365d..3e24e85 100644
--- a/hw/ppce500_pci.c
+++ b/hw/ppce500_pci.c
@@ -108,15 +108,18 @@ static uint32_t pci_reg_read4(void *opaque, 
target_phys_addr_t addr)
 
 case PPCE500_PCI_IW3:
 case PPCE500_PCI_IW2:
-case PPCE500_PCI_IW1:
+case PPCE500_PCI_IW1: {
+int idx = ((addr >> 5) & 0x3) - 1;
+
 switch (addr & 0xC) {
-case PCI_PITAR: value = pci->pib[(addr >> 5) & 0x3].pitar; break;
-case PCI_PIWBAR: value = pci->pib[(addr >> 5) & 0x3].piwbar; break;
-case PCI_PIWBEAR: value = pci->pib[(addr >> 5) & 0x3].piwbear; break;
-case PCI_PIWAR: value = pci->pib[(addr >> 5) & 0x3].piwar; break;
+case PCI_PITAR: value = pci->pib[idx].pitar; break;
+case PCI_PIWBAR: value = pci->pib[idx].piwbar; break;
+case PCI_PIWBEAR: value = pci->pib[idx].piwbear; break;
+case PCI_PIWAR: value = pci->pib[idx].piwar; break;
 default: break;
 };
 break;
+}
 
 case PPCE500_PCI_GASKET_TIMR:
 value = pci->gasket_time;
@@ -164,15 +167,18 @@ static void pci_reg_write4(void *opaque, 
target_phys_addr_t addr,
 
 case PPCE500_PCI_IW3:
 case PPCE500_PCI_IW2:
-case PPCE500_PCI_IW1:
+case PPCE500_PCI_IW1: {
+int idx = ((addr >> 5) & 0x3) - 1;
+
 switch (addr & 0xC) {
-case PCI_PITAR: pci->pib[(addr >> 5) & 0x3].pitar = value; break;
-case PCI_PIWBAR: pci->pib[(addr >> 5) & 0x3].piwbar = value; break;
-case PCI_PIWBEAR: pci->pib[(addr >> 5) & 0x3].piwbear = value; break;
-case PCI_PIWAR: pci->pib[(addr >> 5) & 0x3].piwar = value; break;
+case PCI_PITAR: pci->pib[idx].pitar = value; break;
+case PCI_PIWBAR: pci->pib[idx].piwbar = value; break;
+case PCI_PIWBEAR: pci->pib[idx].piwbear = value; break;
+case PCI_PIWAR: pci->pib[idx].piwar = value; break;
 default: break;
 };
 break;
+}
 
 case PPCE500_PCI_GASKET_TIMR:
 pci->gasket_time = value;
-- 
1.6.4

Re: [Qemu-devel] [PATCH] slirp: Fix packet expiration

2011-09-27 Thread Jan Kiszka

On 2011-09-27 11:20, Thomas Huth wrote:
> 
> The two new variables "arp_requested" and "expiration_date" in the mbuf
> structure have been added after the variable-sized "m_dat_" array. The
> variables have to be added before the m_dat_ array instead.
> Without this patch, the expiration_date gets clobbered by code that
> accesses the m_dat_ array.
> I experienced this problem with the code in slirp/tftp.c: The
> tftp_send_data() function created a new packet with the m_get()
> function (which fills-in a default expiration_date value). Then the
> TFTP code cleared the data section of the packet, which accidentially
> also cleared the expiration_date. This zeroed expiration_date then
> finally causes the packet to be discarded during if_start(), so that
> TFTP packets were not transmitted anymore.
> 
> Signed-off-by: Thomas Huth 
> ---
>  slirp/mbuf.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/slirp/mbuf.h b/slirp/mbuf.h
> index 55170e5..37b9dbb 100644
> --- a/slirp/mbuf.h
> +++ b/slirp/mbuf.h
> @@ -82,12 +82,12 @@ struct m_hdr {
>  struct mbuf {
>   struct  m_hdr m_hdr;
>   Slirp *slirp;
> + boolarp_requested;
> + uint64_t expiration_date;
>   union M_dat {
>   charm_dat_[1]; /* ANSI don't like 0 sized arrays */
>   char*m_ext_;
>   } M_dat;
> -bool arp_requested;
> -uint64_t expiration_date;
>  };
>  
>  #define m_next   m_hdr.mh_next

Thanks, applied.

What generates "---" as separator? Confuses git am here. You may
want to use standard "---" in the future.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] slirp: Fix packet expiration

2011-09-27 Thread Fabien Chouteau

On 27/09/2011 11:20, Thomas Huth wrote:
> 
> The two new variables "arp_requested" and "expiration_date" in the mbuf
> structure have been added after the variable-sized "m_dat_" array. The
> variables have to be added before the m_dat_ array instead.
> Without this patch, the expiration_date gets clobbered by code that
> accesses the m_dat_ array.
> I experienced this problem with the code in slirp/tftp.c: The
> tftp_send_data() function created a new packet with the m_get()
> function (which fills-in a default expiration_date value). Then the
> TFTP code cleared the data section of the packet, which accidentially
> also cleared the expiration_date. This zeroed expiration_date then
> finally causes the packet to be discarded during if_start(), so that
> TFTP packets were not transmitted anymore.
> 

Thanks for the patch Thomas.

I think we can add a comment to avoid this kind of mistake in the
future.

/* This is a "flexible array member". It should always remain
 * the last member of the structure.
 */

> Signed-off-by: Thomas Huth 
> ---
>  slirp/mbuf.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/slirp/mbuf.h b/slirp/mbuf.h
> index 55170e5..37b9dbb 100644
> --- a/slirp/mbuf.h
> +++ b/slirp/mbuf.h
> @@ -82,12 +82,12 @@ struct m_hdr {
>  struct mbuf {
>   struct  m_hdr m_hdr;
>   Slirp *slirp;
> + boolarp_requested;
> + uint64_t expiration_date;
>   union M_dat {
>   charm_dat_[1]; /* ANSI don't like 0 sized arrays */
>   char*m_ext_;
>   } M_dat;
> -bool arp_requested;
> -uint64_t expiration_date;
>  };
>  
>  #define m_next   m_hdr.mh_next
> 


-- 
Fabien Chouteau

Re: [Qemu-devel] [PATCH] slirp: Fix packet expiration

2011-09-27 Thread Jan Kiszka

On 2011-09-27 12:56, Fabien Chouteau wrote:
> On 27/09/2011 11:20, Thomas Huth wrote:
>>
>> The two new variables "arp_requested" and "expiration_date" in the mbuf
>> structure have been added after the variable-sized "m_dat_" array. The
>> variables have to be added before the m_dat_ array instead.
>> Without this patch, the expiration_date gets clobbered by code that
>> accesses the m_dat_ array.
>> I experienced this problem with the code in slirp/tftp.c: The
>> tftp_send_data() function created a new packet with the m_get()
>> function (which fills-in a default expiration_date value). Then the
>> TFTP code cleared the data section of the packet, which accidentially
>> also cleared the expiration_date. This zeroed expiration_date then
>> finally causes the packet to be discarded during if_start(), so that
>> TFTP packets were not transmitted anymore.
>>
> 
> Thanks for the patch Thomas.
> 
> I think we can add a comment to avoid this kind of mistake in the
> future.
> 
> /* This is a "flexible array member". It should always remain
>  * the last member of the structure.
>  */

Good point, folded something like that in.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] When the tlb_fill will be called from generated code?

2011-09-27 Thread 陳韋任

Hi, Jan

> You can find the answer yourself: Load qemu into gdb, set a breakpoint
> on that function and let it run. If you want to catch only the retaddr
> == NULL case, make the breakpoint conditional.

  Thanks for your tip. I see when retaddr != NULL, then the calling
sequence of tlb_fill might be something like (take i386 guest for
example):

  - __stl_mmu/__ldl_mmu -> tlb_fill

  - helper_ljmp_protected -> load_segment -> ldl_kernel -> __ldl_mmu

I am not sure when/where __stl_mmu/__ldl_mmu are used. I do set
breakpoint on __stl_mmu/__ldl_mmu, but the backtrace can only show
something like,

#0  __stl_mmu (addr=196608, val=0, mmu_idx=0) at 
/tmp/chenwj/temp/qemu-0.13.0/softmmu_template.h:228
#1  0x400028e1 in ?? ()
#2  0x00b4 in ?? ()
#3  0xecc68ff412fa4137 in ?? ()
#4  0x in ?? ()

When retaddr == NULL, then the calling sequence of tlb_fill is,

  tb_find_slow -> get_page_addr_code -> ldub_code -> __ldb_cmmu

I can only guest the b in __ldb_cmmu means load byte, but I can't
figure out what's the difference between _cmmu and _mmu. Could you
give me some hint? Thanks!


Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667

Re: [Qemu-devel] [PATCH] Changed the type of val argument of the function gt64120_writel() from uint32_t to uint64_t, so change the corresponding bswap32() to bswap64().

2011-09-27 Thread Peter Maydell

On 27 September 2011 02:20, Ray Wang  wrote:
> On 9/26/2011 5:46 PM, Peter Maydell wrote:
>> I don't know this device, but this looks a bit suspicious. If
>> you do a bswap64() on the value for a 32-bit write this will put
>> the data into the high 32 bits and zeros into the low half; then
>> storing into s->regs[] will just write a zero (since regs[] is
>> 32 bits), won't it? Changing only writel and not readl also looks
>> odd.
>
>   Yes, you're right. However, if a 64-bit value with effective high 32 bits
> is considered as a 32-bit one, some significant information will be lost.

Yes, but this function is never passed in values with the high part
set. The memory region API gives each device a single function which
implements all widths of memory access (1,2,4 and at some point we
will implement 8 byte widths), so the 'value' parameter is 64 bits
wide. However for accesses at sizes less than 8 only the bottom part
has interesting data, the top part is always zero.

Clearly when we get support for 64-bit accesses, this device will
need some bugfixes for that to work (for instance, it will need to
support passing on the value for pci config writes at the right
width rather than always passing 4). I had a quick scan of the
datasheet for the chip, and I couldn't see anything saying what
happens if you try to access the 32 bit registers at 64 bits, though.
A patch adding support for 64 bit accesses would have to be done
so as not to break 32 bit accesses, obviously.

>> What is the bug this change is trying to fix?
>
>   It is always a potential bug to process a 64-bit value as 32-bit one,
> isn't it?

No. Sometimes a 64 bit type contains 32 bit data. If the code is
wrong then it should be possible to demonstrate that QEMU behaves
wrongly (ie differently from the hardware) as a result.

-- PMM

[Qemu-devel] [PATCH qemu] e1000: CTRL.RST emulation

2011-09-27 Thread Michael S. Tsirkin

e1000 spec says CTRL.RST write should have the same effect
as bus reset, except that is preserves PCI Config.
Reset device registers and interrupts.

Fix suggested by Andy Gospodarek 

Signed-off-by: Michael S. Tsirkin 
---
 hw/e1000.c |   97 +++
 1 files changed, 51 insertions(+), 46 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 6a3a941..81328f4 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -151,6 +151,40 @@ static const char phy_regcap[0x20] = {
 [PHY_ID2] = PHY_R, [M88E1000_PHY_SPEC_STATUS] = PHY_R
 };
 
+static const uint16_t e1000_eeprom_template[64] = {
+0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
+0x3000, 0x1000, 0x6403, E1000_DEVID, 0x8086, E1000_DEVID, 0x8086, 0x3040,
+0x0008, 0x2000, 0x7e14, 0x0048,  0x1000, 0x00d8,  0x, 0x2700,
+0x6cc9, 0x3150, 0x0722, 0x040b,  0x0984, 0x,  0xc000, 0x0706,
+0x1008, 0x, 0x0f04, 0x7fff,  0x4d01, 0x,  0x, 0x,
+0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
+0x0100, 0x4000, 0x121c, 0x,  0x, 0x,  0x, 0x,
+0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
+};
+
+static const uint16_t phy_reg_init[] = {
+[PHY_CTRL] = 0x1140,   [PHY_STATUS] = 0x796d, // link 
initially up
+[PHY_ID1] = 0x141, [PHY_ID2] = PHY_ID2_INIT,
+[PHY_1000T_CTRL] = 0x0e00, [M88E1000_PHY_SPEC_CTRL] = 
0x360,
+[M88E1000_EXT_PHY_SPEC_CTRL] = 0x0d60, [PHY_AUTONEG_ADV] = 0xde1,
+[PHY_LP_ABILITY] = 0x1e0,  [PHY_1000T_STATUS] = 0x3c00,
+[M88E1000_PHY_SPEC_STATUS] = 0xac00,
+};
+
+static const uint32_t mac_reg_init[] = {
+[PBA] = 0x00100030,
+[LEDCTL] =  0x602,
+[CTRL] =E1000_CTRL_SWDPIN2 | E1000_CTRL_SWDPIN0 |
+E1000_CTRL_SPD_1000 | E1000_CTRL_SLU,
+[STATUS] =  0x8000 | E1000_STATUS_GIO_MASTER_ENABLE |
+E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
+E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
+E1000_STATUS_LU,
+[MANC] =E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
+E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
+E1000_MANC_RMCP_EN,
+};
+
 static void
 set_interrupt_cause(E1000State *s, int index, uint32_t val)
 {
@@ -192,9 +226,26 @@ rxbufsize(uint32_t v)
 return 2048;
 }
 
+static void e1000_reset(void *opaque)
+{
+E1000State *d = opaque;
+
+memset(d->phy_reg, 0, sizeof d->phy_reg);
+memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
+memset(d->mac_reg, 0, sizeof d->mac_reg);
+memmove(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
+d->rxbuf_min_shift = 1;
+memset(&d->tx, 0, sizeof d->tx);
+}
+
 static void
 set_ctrl(E1000State *s, int index, uint32_t val)
 {
+if (val & E1000_CTRL_RST) {
+e1000_reset(s);
+qemu_set_irq(s->dev.irq[0], 0);
+}
+
 /* RST is self clearing */
 s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
 }
@@ -1047,40 +1098,6 @@ static const VMStateDescription vmstate_e1000 = {
 }
 };
 
-static const uint16_t e1000_eeprom_template[64] = {
-0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
-0x3000, 0x1000, 0x6403, E1000_DEVID, 0x8086, E1000_DEVID, 0x8086, 0x3040,
-0x0008, 0x2000, 0x7e14, 0x0048,  0x1000, 0x00d8,  0x, 0x2700,
-0x6cc9, 0x3150, 0x0722, 0x040b,  0x0984, 0x,  0xc000, 0x0706,
-0x1008, 0x, 0x0f04, 0x7fff,  0x4d01, 0x,  0x, 0x,
-0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
-0x0100, 0x4000, 0x121c, 0x,  0x, 0x,  0x, 0x,
-0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
-};
-
-static const uint16_t phy_reg_init[] = {
-[PHY_CTRL] = 0x1140,   [PHY_STATUS] = 0x796d, // link 
initially up
-[PHY_ID1] = 0x141, [PHY_ID2] = PHY_ID2_INIT,
-[PHY_1000T_CTRL] = 0x0e00, [M88E1000_PHY_SPEC_CTRL] = 
0x360,
-[M88E1000_EXT_PHY_SPEC_CTRL] = 0x0d60, [PHY_AUTONEG_ADV] = 0xde1,
-[PHY_LP_ABILITY] = 0x1e0,  [PHY_1000T_STATUS] = 0x3c00,
-[M88E1000_PHY_SPEC_STATUS] = 0xac00,
-};
-
-static const uint32_t mac_reg_init[] = {
-[PBA] = 0x00100030,
-[LEDCTL] =  0x602,
-[CTRL] =E1000_CTRL_SWDPIN2 | E1000_CTRL_SWDPIN0 |
-E1000_CTRL_SPD_1000 | E1000_CTRL_SLU,
-[STATUS] =  0x8000 | E1000_STATUS_GIO_MASTER_ENABLE |
-E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
-E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
-E1000_STATUS_LU,
-[MANC] =E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
-E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
-E1000_MANC_RMCP_EN,
-};
-
 /* PCI i

Re: [Qemu-devel] When the tlb_fill will be called from generated code?

2011-09-27 Thread Max Filippov

> I am not sure when/where __stl_mmu/__ldl_mmu are used. I do set

They are called from the places in TBs where
tcg_gen_qemu_{ld,st}{8,16,32,64}{u,s} were injected.

> breakpoint on __stl_mmu/__ldl_mmu, but the backtrace can only show
> something like,
>
> #0  __stl_mmu (addr=196608, val=0, mmu_idx=0) at 
> /tmp/chenwj/temp/qemu-0.13.0/softmmu_template.h:228
> #1  0x400028e1 in ?? ()
> #2  0x00b4 in ?? ()
> #3  0xecc68ff412fa4137 in ?? ()
> #4  0x in ?? ()
>
> When retaddr == NULL, then the calling sequence of tlb_fill is,
>
>  tb_find_slow -> get_page_addr_code -> ldub_code -> __ldb_cmmu
>
> I can only guest the b in __ldb_cmmu means load byte, but I can't
> figure out what's the difference between _cmmu and _mmu. Could you
> give me some hint? Thanks!

_cmmu is used to access code, _mmu is for data.

-- 
Thanks.
-- Max

Re: [Qemu-devel] [PATCH qemu] e1000: CTRL.RST emulation

2011-09-27 Thread Peter Maydell

On 27 September 2011 12:25, Michael S. Tsirkin  wrote:
> e1000 spec says CTRL.RST write should have the same effect
> as bus reset, except that is preserves PCI Config.
> Reset device registers and interrupts.
>
> Fix suggested by Andy Gospodarek 

Doesn't this have the same effect as this patch:
http://patchwork.ozlabs.org/patch/108673/

except that it's harder to read because it's moved a lot
of code around in the file?

(I think you have an extra qemu_set_irq() call in there,
actually. But it was hard to find. Also your code has the
bug that was in earlier revisions of Anthony's patch where
after doing the reset you fall through and allow other bits
in the ctrl register to be set.)

-- PMM

Re: [Qemu-devel] [RFC] Generic image streaming

2011-09-27 Thread Stefan Hajnoczi

On Mon, Sep 26, 2011 at 3:21 PM, Stefan Hajnoczi
 wrote:
> On Mon, Sep 26, 2011 at 09:35:01AM -0300, Marcelo Tosatti wrote:
>> On Fri, Sep 23, 2011 at 04:57:26PM +0100, Stefan Hajnoczi wrote:
>> > Here is my generic image streaming branch, which aims to provide a way
>> > to copy the contents of a backing file into an image file of a running
>> > guest without requiring specific support in the various block drivers
>> > (e.g.  qcow2, qed, vmdk):
>> >
>> > http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
>> >
>> > The tree does not provide full image streaming yet but I'd like to
>> > discuss the approach taken in the code.  Here are the main points:
>> >
>> > The image streaming API is available through HMP and QMP commands.  When
>> > streaming is started on a block device a coroutine is created to do the
>> > background I/O work.  The coroutine can be cancelled.
>> >
>> > While the coroutine copies data from the backing file into the image
>> > file, the guest may be performing I/O to the image file.  Guest reads do
>> > not conflict with streaming but guest writes require special handling.
>> > If the guest writes to a region of the image file that we are currently
>> > copying, then there is the potential to clobber the guest write with old
>> > data from the backing file.
>> >
>> > Previously I solved this in a QED-specific way by taking advantage of
>> > the serialization of allocating write requests.  In order to do this
>> > generically we need to track in-flight requests and have the ability to
>> > queue I/O.  Guest writes that affect an in-flight streaming copy
>> > operation must wait for that operation to complete before being issued.
>> > Streaming copy operations must skip overlapping regions of guest writes.
>> >
>> > One big difference to the QED image streaming implementation is that
>> > this generic implementation is not based on copy-on-read operations.
>> > Instead we do a sequence of bdrv_is_allocated() to find regions for
>> > streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
>> > populate the image file.
>> >
>> > It turns out that generic copy-on-read is not an attractive operation
>> > because it requires using bounce buffers for every request.
>>
>> Isnt COR essential for a decent read performance on the
>> image-stream-from-slow-remote-origin case?
>
> It is essential for re-read performance from a slow backing file.  With
> images over internet HTTP it most definitely is worth doing
> copy-on-read.
>
> In the case of an NFS server the performance depends on the network and
> server.  It might be similar speed or faster to read from NFS.
>
> I will think some more about how to implement generic copy-on-read.

I've sketched out how generic copy-on-read can work.  It's probably
not much extra effort since we need request tracking and the ability
to queue/hold requests anyway.

I hope to have patches implementing this by the end of the week:

1. When CoR is enabled, overlapping requests get queued so that only
one is actually being issued to the host at a time.  This prevents
race conditions where a guest write request is clobbered by a
copy-on-read.  Note that only overlapping requests are queued,
non-overlapping requests proceed in parallel.

2. The read operation uses bdrv_is_allocated() first to see whether a
copy-on-read needs to be performed or if we can go down the fast path.
 The fast path is the normal read straight into the guest buffer.  The
copy-on-read path reads into a bounce buffer, writes into the image
file, and then copies the bounce buffer into the guest buffer.

3. The .bdrv_is_allocated() implementations will be audited and
improved to make them aio/coroutine-friendly where necessary.

Stefan

Re: [Qemu-devel] [PATCH] pseries: Refactor spapr irq allocation

2011-09-27 Thread Alexander Graf


On 16.09.2011, at 08:49, David Gibson wrote:

> Paulo Bonzini changed the original spapr code, which manually assigned irq
> numbers for each virtual device, to allocate them automatically from the
> device initialization. That allowed spapr virtual devices to be constructed
> with -device, which is a good start.  However, the way that patch worked
> doesn't extend nicely for the future when we want to support devices other
> than sPAPR VIO devices (e.g. virtio and PCI).
> 
> This patch rearranges the irq allocation to be global across the sPAPR
> environment, so it can be used by other bus types as well.
> 
> Signed-off-by: David Gibson 

Thanks, applied to ppc-next.

Alex

Re: [Qemu-devel] [PATCH qemu] e1000: CTRL.RST emulation

2011-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2011 at 12:50:21PM +0100, Peter Maydell wrote:
> On 27 September 2011 12:25, Michael S. Tsirkin  wrote:
> > e1000 spec says CTRL.RST write should have the same effect
> > as bus reset, except that is preserves PCI Config.
> > Reset device registers and interrupts.
> >
> > Fix suggested by Andy Gospodarek 
> 
> Doesn't this have the same effect as this patch:
> http://patchwork.ozlabs.org/patch/108673/

Right except mine clears the interrupts as well.
I missed that patch - what happened to it in the end?

> except that it's harder to read because it's moved a lot
> of code around in the file?
> 
> (I think you have an extra qemu_set_irq() call in there,

The device spec says we should reset the interrupts.
So it seems necessary.

> actually. But it was hard to find.

I probably should split the patch out
1. code movement
2. code change

Forward declarations just to work around random
placement of functions in file seem wrong -
why not order the functions sensibly instead?

> Also your code has the
> bug that was in earlier revisions of Anthony's patch where
> after doing the reset you fall through and allow other bits
> in the ctrl register to be set.)
> 
> -- PMM

True.

-- 
MST

Re: [Qemu-devel] [PATCH v3] memory: simple memory tree printer

2011-09-27 Thread Jan Kiszka

On 2011-09-26 22:03, Blue Swirl wrote:
> Add a monitor command 'info mtree' to show the memory hierarchy
> much like /proc/iomem in Linux.
> 
> Signed-off-by: Blue Swirl 
> ---
> The alias addresses are unbiased (PPC):
> -fffe : system
>   800a-800a : alias vga.chain4 @vga.vram -
>   8088-808f : macio
> 808e-808f : macio-nvram
> 808a-808a0fff : pmac-ide
> 80896000-80895fff : (null)
> 80893000-8089303f : alias escc-bar @escc -003f
> 80888000-80888fff : dbdma
> 8088-80880fff : heathrow-pic
>   8080-8080 : vga.rom
>   8000-807f : vga.vram
>   800a-800b : vga-lowmem
>   80013000-8001303f : escc
>   fee0-fee00fff : pci-data-idx
>   fec0-fec00fff : pci-conf-idx
>   fe00-fe1f : isa-mmio
> vga.vram
> 8000-807f : vga.vram
> escc
> 80013000-8001303f : escc
> ---
>  memory.c  |   88 
> +
>  memory.h  |2 +
>  monitor.c |   13 +
>  3 files changed, 103 insertions(+), 0 deletions(-)
> 
> diff --git a/memory.c b/memory.c
> index 71e769e..53264be 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1271,3 +1271,91 @@ void set_system_io_map(MemoryRegion *mr)
>  address_space_io.root = mr;
>  memory_region_update_topology();
>  }
> +
> +typedef struct MemoryRegionList MemoryRegionList;
> +
> +struct MemoryRegionList {
> +const MemoryRegion *mr;
> +bool printed;

What's the point of printed? It's never true unless I'm missing
something fundamental.

> +QTAILQ_ENTRY(MemoryRegionList) queue;
> +};
> +
> +typedef QTAILQ_HEAD(queue, MemoryRegionList) MemoryRegionListHead;
> +
> +static void mtree_print_mr(fprintf_function mon_printf, void *f,
> +   const MemoryRegion *mr, unsigned int level,
> +   target_phys_addr_t base,
> +   MemoryRegionListHead *print_queue)
> +{
> +const MemoryRegion *submr;
> +unsigned int i;
> +
> +for (i = 0; i < level; i++) {
> +mon_printf(f, "  ");
> +}
> +
> +if (mr->alias) {
> +MemoryRegionList *ml;
> +bool found = false;
> +
> +/* check if the alias is already in the queue */
> +QTAILQ_FOREACH(ml, print_queue, queue) {
> +if (ml->mr == mr->alias && !ml->printed) {
> +found = true;
> +}
> +}
> +
> +if (!found) {
> +ml = g_new(MemoryRegionList, 1);
> +ml->mr = mr->alias;
> +ml->printed = false;
> +QTAILQ_INSERT_TAIL(print_queue, ml, queue);
> +}
> +mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " : alias %s @%s "
> +   TARGET_FMT_plx "-" TARGET_FMT_plx "\n",
> +   base + mr->addr,
> +   base + mr->addr + (target_phys_addr_t)mr->size - 1,
> +   mr->name,
> +   mr->alias->name,
> +   mr->alias_offset,
> +   mr->alias_offset + (target_phys_addr_t)mr->size - 1);
> +} else {
> +mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " : %s\n",
> +   base + mr->addr,
> +   base + mr->addr + (target_phys_addr_t)mr->size - 1,
> +   mr->name);

Region priority is still not printed. Will send a patch.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH qemu] e1000: CTRL.RST emulation

2011-09-27 Thread Peter Maydell

On 27 September 2011 13:32, Michael S. Tsirkin  wrote:
> On Tue, Sep 27, 2011 at 12:50:21PM +0100, Peter Maydell wrote:
>> On 27 September 2011 12:25, Michael S. Tsirkin  wrote:
>> > e1000 spec says CTRL.RST write should have the same effect
>> > as bus reset, except that is preserves PCI Config.
>> > Reset device registers and interrupts.
>> >
>> > Fix suggested by Andy Gospodarek 
>>
>> Doesn't this have the same effect as this patch:
>> http://patchwork.ozlabs.org/patch/108673/
>
> Right except mine clears the interrupts as well.

I agree that's required, but why doesn't it belong in
e1000_reset() ? Surely a qdev reset ought also to clear
the output irq signals...
[Disclaimer: this is fishing for somebody to explain to
me what the semantics of qdev reset actually are :-)]

> I missed that patch - what happened to it in the end?

I think it just got lost in the shuffle (Anthony L never
did come back and actually produce a compiler that gave
a warning on it, so I think that was just a misremembering
on his part.)

-- PMM

[Qemu-devel] [PATCH] linux-user: implement reboot syscall

2011-09-27 Thread Alexander Graf

For OBS, we're running a full cross-guest inside of a VM. When a build
is done there, we reboot the guest as shutdown mechanism.

Unfortunately, reboot is not implemented in linux-user. So this mechanism
fails, spilling unpretty warnings. This patch implements sys_reboot()
emulation.

Signed-off-by: Alexander Graf 
---
 linux-user/syscall.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 27970a4..808f488 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -245,6 +245,8 @@ _syscall3(int, sys_sched_getaffinity, pid_t, pid, unsigned 
int, len,
 #define __NR_sys_sched_setaffinity __NR_sched_setaffinity
 _syscall3(int, sys_sched_setaffinity, pid_t, pid, unsigned int, len,
   unsigned long *, user_mask_ptr);
+_syscall4(int, reboot, int, magic1, int, magic2, unsigned int, cmd,
+  void *, arg);
 
 static bitmask_transtbl fcntl_flags_tbl[] = {
   { TARGET_O_ACCMODE,   TARGET_O_WRONLY,O_ACCMODE,   O_WRONLY,},
@@ -5858,7 +5860,11 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 #endif
 case TARGET_NR_reboot:
-goto unimplemented;
+if (!(p = lock_user_string(arg4)))
+goto efault;
+ret = reboot(arg1, arg2, arg3, p);
+unlock_user(p, arg4, 0);
+break;
 #ifdef TARGET_NR_readdir
 case TARGET_NR_readdir:
 goto unimplemented;
-- 
1.6.0.2

Re: [Qemu-devel] [PATCH] ppc/e500_pci: Fix an array overflow issue

2011-09-27 Thread Alexander Graf


On 27.09.2011, at 10:17, Liu Yu wrote:

> Signed-off-by: Liu Yu 

Patch description missing.

Also, please always CC qemu-...@nongnu.org for patches concerning ppc.


> ---
> hw/ppce500_pci.c |   26 --
> 1 files changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c
> index 2db365d..3e24e85 100644
> --- a/hw/ppce500_pci.c
> +++ b/hw/ppce500_pci.c
> @@ -108,15 +108,18 @@ static uint32_t pci_reg_read4(void *opaque, 
> target_phys_addr_t addr)
> 
> case PPCE500_PCI_IW3:
> case PPCE500_PCI_IW2:
> -case PPCE500_PCI_IW1:
> +case PPCE500_PCI_IW1: {
> +int idx = ((addr >> 5) & 0x3) - 1;

So this is the main change, right? Why the -1? A guest could potentially access 
pib[-1] using this, no?

> +
> switch (addr & 0xC) {
> -case PCI_PITAR: value = pci->pib[(addr >> 5) & 0x3].pitar; break;
> -case PCI_PIWBAR: value = pci->pib[(addr >> 5) & 0x3].piwbar; break;
> -case PCI_PIWBEAR: value = pci->pib[(addr >> 5) & 0x3].piwbear; break;
> -case PCI_PIWAR: value = pci->pib[(addr >> 5) & 0x3].piwar; break;
> +case PCI_PITAR: value = pci->pib[idx].pitar; break;
> +case PCI_PIWBAR: value = pci->pib[idx].piwbar; break;
> +case PCI_PIWBEAR: value = pci->pib[idx].piwbear; break;
> +case PCI_PIWAR: value = pci->pib[idx].piwar; break;

I'm fairly sure this breaks checkpatch.pl.


Alex

[Qemu-devel] [PATCH 1/2] e1000: minor reset code refactoring

2011-09-27 Thread Michael S. Tsirkin

Move reset callback to start of file to make it easier
to reuse it elsewhere in code.

Signed-off-by: Michael S. Tsirkin 
---
 hw/e1000.c |   92 ++--
 1 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 6a3a941..87a1104 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -151,6 +151,40 @@ static const char phy_regcap[0x20] = {
 [PHY_ID2] = PHY_R, [M88E1000_PHY_SPEC_STATUS] = PHY_R
 };
 
+static const uint16_t e1000_eeprom_template[64] = {
+0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
+0x3000, 0x1000, 0x6403, E1000_DEVID, 0x8086, E1000_DEVID, 0x8086, 0x3040,
+0x0008, 0x2000, 0x7e14, 0x0048,  0x1000, 0x00d8,  0x, 0x2700,
+0x6cc9, 0x3150, 0x0722, 0x040b,  0x0984, 0x,  0xc000, 0x0706,
+0x1008, 0x, 0x0f04, 0x7fff,  0x4d01, 0x,  0x, 0x,
+0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
+0x0100, 0x4000, 0x121c, 0x,  0x, 0x,  0x, 0x,
+0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
+};
+
+static const uint16_t phy_reg_init[] = {
+[PHY_CTRL] = 0x1140,   [PHY_STATUS] = 0x796d, // link 
initially up
+[PHY_ID1] = 0x141, [PHY_ID2] = PHY_ID2_INIT,
+[PHY_1000T_CTRL] = 0x0e00, [M88E1000_PHY_SPEC_CTRL] = 
0x360,
+[M88E1000_EXT_PHY_SPEC_CTRL] = 0x0d60, [PHY_AUTONEG_ADV] = 0xde1,
+[PHY_LP_ABILITY] = 0x1e0,  [PHY_1000T_STATUS] = 0x3c00,
+[M88E1000_PHY_SPEC_STATUS] = 0xac00,
+};
+
+static const uint32_t mac_reg_init[] = {
+[PBA] = 0x00100030,
+[LEDCTL] =  0x602,
+[CTRL] =E1000_CTRL_SWDPIN2 | E1000_CTRL_SWDPIN0 |
+E1000_CTRL_SPD_1000 | E1000_CTRL_SLU,
+[STATUS] =  0x8000 | E1000_STATUS_GIO_MASTER_ENABLE |
+E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
+E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
+E1000_STATUS_LU,
+[MANC] =E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
+E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
+E1000_MANC_RMCP_EN,
+};
+
 static void
 set_interrupt_cause(E1000State *s, int index, uint32_t val)
 {
@@ -192,6 +226,18 @@ rxbufsize(uint32_t v)
 return 2048;
 }
 
+static void e1000_reset(void *opaque)
+{
+E1000State *d = opaque;
+
+memset(d->phy_reg, 0, sizeof d->phy_reg);
+memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
+memset(d->mac_reg, 0, sizeof d->mac_reg);
+memmove(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
+d->rxbuf_min_shift = 1;
+memset(&d->tx, 0, sizeof d->tx);
+}
+
 static void
 set_ctrl(E1000State *s, int index, uint32_t val)
 {
@@ -1047,40 +1093,6 @@ static const VMStateDescription vmstate_e1000 = {
 }
 };
 
-static const uint16_t e1000_eeprom_template[64] = {
-0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
-0x3000, 0x1000, 0x6403, E1000_DEVID, 0x8086, E1000_DEVID, 0x8086, 0x3040,
-0x0008, 0x2000, 0x7e14, 0x0048,  0x1000, 0x00d8,  0x, 0x2700,
-0x6cc9, 0x3150, 0x0722, 0x040b,  0x0984, 0x,  0xc000, 0x0706,
-0x1008, 0x, 0x0f04, 0x7fff,  0x4d01, 0x,  0x, 0x,
-0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
-0x0100, 0x4000, 0x121c, 0x,  0x, 0x,  0x, 0x,
-0x, 0x, 0x, 0x,  0x, 0x,  0x, 0x,
-};
-
-static const uint16_t phy_reg_init[] = {
-[PHY_CTRL] = 0x1140,   [PHY_STATUS] = 0x796d, // link 
initially up
-[PHY_ID1] = 0x141, [PHY_ID2] = PHY_ID2_INIT,
-[PHY_1000T_CTRL] = 0x0e00, [M88E1000_PHY_SPEC_CTRL] = 
0x360,
-[M88E1000_EXT_PHY_SPEC_CTRL] = 0x0d60, [PHY_AUTONEG_ADV] = 0xde1,
-[PHY_LP_ABILITY] = 0x1e0,  [PHY_1000T_STATUS] = 0x3c00,
-[M88E1000_PHY_SPEC_STATUS] = 0xac00,
-};
-
-static const uint32_t mac_reg_init[] = {
-[PBA] = 0x00100030,
-[LEDCTL] =  0x602,
-[CTRL] =E1000_CTRL_SWDPIN2 | E1000_CTRL_SWDPIN0 |
-E1000_CTRL_SPD_1000 | E1000_CTRL_SLU,
-[STATUS] =  0x8000 | E1000_STATUS_GIO_MASTER_ENABLE |
-E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
-E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
-E1000_STATUS_LU,
-[MANC] =E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
-E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
-E1000_MANC_RMCP_EN,
-};
-
 /* PCI interface */
 
 static void
@@ -1120,18 +1132,6 @@ pci_e1000_uninit(PCIDevice *dev)
 return 0;
 }
 
-static void e1000_reset(void *opaque)
-{
-E1000State *d = opaque;
-
-memset(d->phy_reg, 0, sizeof d->phy_reg);
-memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
-

[Qemu-devel] [PATCH 0/2] e1000: reset handling

2011-09-27 Thread Michael S. Tsirkin

Here's a fix for e1000 reset as suggested by
Andy Gospodarek and Anthony PERARD,
written in a way that doesn't require forward
declarations, and with reset clearing interrupts.

Changes from v1:
- value written was stored in CTRL register
  so it will be non 0 after reset.

Notes: some internal state:
uint32_t rxbuf_size;
int check_rxov;
isn't cleared in reset need to check that it's ok.

-- 
MST

[Qemu-devel] [PATCH 2/2] e1000: CTRL.RST emulation

2011-09-27 Thread Michael S. Tsirkin

e1000 spec says CTRL.RST write should have the same effect
as bus reset, except that is preserves PCI Config.
Reset device registers and interrupts.

Fix suggested by Andy Gospodarek 
Similar fix proposed by Anthony PERARD 

Signed-off-by: Michael S. Tsirkin 
---
 hw/e1000.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 87a1104..b51e089 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -241,8 +241,13 @@ static void e1000_reset(void *opaque)
 static void
 set_ctrl(E1000State *s, int index, uint32_t val)
 {
-/* RST is self clearing */
-s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
+if (val & E1000_CTRL_RST) {
+e1000_reset(s);
+qemu_set_irq(s->dev.irq[0], 0);
+return;
+}
+
+s->mac_reg[CTRL] = val;
 }
 
 static void
-- 
1.7.5.53.gc233e

[Qemu-devel] [PATCH 1/3] memory: Print region priority

2011-09-27 Thread Jan Kiszka

Useful to discover eclipses.

Signed-off-by: Jan Kiszka 
---

PS: Current memory/master requires an obvious build fix (central
definition of some types), but I assume you already have a patch in
your tree.

 memory.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index b1724fe..4c190e5 100644
--- a/memory.c
+++ b/memory.c
@@ -1370,18 +1370,20 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 ml->printed = false;
 QTAILQ_INSERT_TAIL(print_queue, ml, queue);
 }
-mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " : alias %s @%s "
+mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " (prio %d): alias %s 
@%s "
TARGET_FMT_plx "-" TARGET_FMT_plx "\n",
base + mr->addr,
base + mr->addr + (target_phys_addr_t)mr->size - 1,
+   mr->priority,
mr->name,
mr->alias->name,
mr->alias_offset,
mr->alias_offset + (target_phys_addr_t)mr->size - 1);
 } else {
-mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " : %s\n",
+mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " (prio %d): %s\n",
base + mr->addr,
base + mr->addr + (target_phys_addr_t)mr->size - 1,
+   mr->priority,
mr->name);
 }
 QTAILQ_FOREACH(submr, &mr->subregions, subregions_link) {
-- 
1.7.3.4

[Qemu-devel] [PATCH 2/3] memory: Do not print empty PIO root

2011-09-27 Thread Jan Kiszka

Signed-off-by: Jan Kiszka 
---
 memory.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/memory.c b/memory.c
index 4c190e5..24c5abd 100644
--- a/memory.c
+++ b/memory.c
@@ -1414,7 +1414,10 @@ void mtree_info(fprintf_function mon_printf, void *f)
 g_free(ml2);
 }
 
-QTAILQ_INIT(&ml_head);
-mon_printf(f, "I/O\n");
-mtree_print_mr(mon_printf, f, address_space_io.root, 0, 0, &ml_head);
+if (address_space_io.root &&
+!QTAILQ_EMPTY(&address_space_io.root->subregions)) {
+QTAILQ_INIT(&ml_head);
+mon_printf(f, "I/O\n");
+mtree_print_mr(mon_printf, f, address_space_io.root, 0, 0, &ml_head);
+}
 }
-- 
1.7.3.4

[Qemu-devel] [PATCH 3/3] memory: Print regions in ascending order

2011-09-27 Thread Jan Kiszka

Makes reading the output more user friendly.

Signed-off-by: Jan Kiszka 
---
 memory.c |   37 +++--
 1 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/memory.c b/memory.c
index 24c5abd..6ff78cb 100644
--- a/memory.c
+++ b/memory.c
@@ -1339,12 +1339,13 @@ typedef QTAILQ_HEAD(queue, MemoryRegionList) 
MemoryRegionListHead;
 static void mtree_print_mr(fprintf_function mon_printf, void *f,
const MemoryRegion *mr, unsigned int level,
target_phys_addr_t base,
-   MemoryRegionListHead *print_queue)
+   MemoryRegionListHead *alias_print_queue)
 {
+MemoryRegionList *new_ml, *ml, *next_ml;
+MemoryRegionListHead submr_print_queue;
 const MemoryRegion *submr;
 unsigned int i;
 
-
 if (!mr) {
 return;
 }
@@ -1358,7 +1359,7 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 bool found = false;
 
 /* check if the alias is already in the queue */
-QTAILQ_FOREACH(ml, print_queue, queue) {
+QTAILQ_FOREACH(ml, alias_print_queue, queue) {
 if (ml->mr == mr->alias && !ml->printed) {
 found = true;
 }
@@ -1368,7 +1369,7 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 ml = g_new(MemoryRegionList, 1);
 ml->mr = mr->alias;
 ml->printed = false;
-QTAILQ_INSERT_TAIL(print_queue, ml, queue);
+QTAILQ_INSERT_TAIL(alias_print_queue, ml, queue);
 }
 mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx " (prio %d): alias %s 
@%s "
TARGET_FMT_plx "-" TARGET_FMT_plx "\n",
@@ -1386,9 +1387,33 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
mr->priority,
mr->name);
 }
+
+QTAILQ_INIT(&submr_print_queue);
+
 QTAILQ_FOREACH(submr, &mr->subregions, subregions_link) {
-mtree_print_mr(mon_printf, f, submr, level + 1, base + mr->addr,
-   print_queue);
+new_ml = g_new(MemoryRegionList, 1);
+new_ml->mr = submr;
+QTAILQ_FOREACH(ml, &submr_print_queue, queue) {
+if (new_ml->mr->addr < ml->mr->addr ||
+(new_ml->mr->addr == ml->mr->addr &&
+ new_ml->mr->priority > ml->mr->priority)) {
+QTAILQ_INSERT_BEFORE(ml, new_ml, queue);
+new_ml = NULL;
+break;
+}
+}
+if (new_ml) {
+QTAILQ_INSERT_TAIL(&submr_print_queue, new_ml, queue);
+}
+}
+
+QTAILQ_FOREACH(ml, &submr_print_queue, queue) {
+mtree_print_mr(mon_printf, f, ml->mr, level + 1, base + mr->addr,
+   alias_print_queue);
+}
+
+QTAILQ_FOREACH_SAFE(next_ml, &submr_print_queue, queue, ml) {
+g_free(ml);
 }
 }
 
-- 
1.7.3.4

Re: [Qemu-devel] How to run realview-pbx-a9 image in qemu

2011-09-27 Thread loody

hi:

2011/7/12 Xiao Jiang :
> David Gilbert wrote:
>>
>> On 12 July 2011 07:04, Xiao Jiang  wrote:
>>

 It looks like I am not in luck, qemu still can't run successfully.
 I recompiled the qemu from linaro qemu tree and executed below
 instructions in order.

 1. open window A, run below cmd.
 xjiang@xjiang-desktop:~/work/qemu$ sudo qemu-system-arm -M
 realview-pbx-a9
 -m 1024 -kernel zImage-cortex-a9 -serial telnet::,server -append

>>
>> I think you have a bad kernel image.
>> There are some prebuilt ones (and config files) linked off:
>>
>> http://www.arm.com/community/software-enablement/linux.php
>>
>> but unfortunately the webserver that holds them isn't happy today.
>>
>>
>
> Hi Dave,
>
> Thanks for the link, the server is unhappy :(.
> I compiled this image from mainline ..., so to support qemu,
> there should be some modifications in linux kernel, maybe I
> should try kernel tree from linaro.
> But fortunately, I can run vexpress-a9 image in linaro qemu
> successfully this afternoon which is also built from mainline, :)
>
> Regards,
> Xiao
>>
>> Dave
Would you mind to let me know which configs you use to compile for a9
running on qemu?
Thanks for your help,

Re: [Qemu-devel] Help writing a trivial device

2011-09-27 Thread Lluís Vilanova

Stefan Hajnoczi writes:

> On Mon, Sep 26, 2011 at 08:03:12PM +0200, Lluís Vilanova wrote:
>> #if 1   /* KVM doesn't like it */
>> s-> data_ptr = g_malloc(s->size);
>> memory_region_init_ram_ptr(&s->data, &s->dev.qdev, "backdoor.data",
>>s-> size, s->data_ptr);
>> pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->data);
>> #endif

> Have you tried page-aligning data_ptr with qemu_memalign() or doing an
> anonymous mmap instead of g_malloc()?

> From virt/kvm/kvm_main.c:__kvm_set_memory_region():

> /* General sanity checks */
> if (mem->memory_size & (PAGE_SIZE - 1))
>   goto out;
> if (mem->guest_phys_addr & (PAGE_SIZE - 1))
>   goto out;
> /* We can read the guest memory with __xxx_user() later on. */
> if (user_alloc &&
> ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
>  !access_ok(VERIFY_WRITE,
>   (void __user *)(unsigned long)mem->userspace_addr, mem-> 
> memory_size)))
>   goto out;

Damn! It's so obvious I even feel ashamed :)


Thanks a lot,
Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] Help writing a trivial device

2011-09-27 Thread Lluís Vilanova

Zhi Yong Wu writes:

> 2011/9/26 Lluís Vilanova :
>> The testing system is a Linux 2.6.32, with this:
>> 
>> int fd = open("/sys/devices/pci:00/00:00:004.00/resource0", O_RDWR);
>> void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> printf("-> %ld\n", *(uint64_t*)addr);
> This is the way about how to test bar? Does bar1 correspond to guest
> file /sys/devices/pci:00/00:00:004.00/resource0?

In linux, resourceN corresponds to the Nth BAR.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH qemu] e1000: CTRL.RST emulation

2011-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2011 at 01:39:17PM +0100, Peter Maydell wrote:
> On 27 September 2011 13:32, Michael S. Tsirkin  wrote:
> > On Tue, Sep 27, 2011 at 12:50:21PM +0100, Peter Maydell wrote:
> >> On 27 September 2011 12:25, Michael S. Tsirkin  wrote:
> >> > e1000 spec says CTRL.RST write should have the same effect
> >> > as bus reset, except that is preserves PCI Config.
> >> > Reset device registers and interrupts.
> >> >
> >> > Fix suggested by Andy Gospodarek 
> >>
> >> Doesn't this have the same effect as this patch:
> >> http://patchwork.ozlabs.org/patch/108673/
> >
> > Right except mine clears the interrupts as well.
> 
> I agree that's required, but why doesn't it belong in
> e1000_reset() ?

I think not, because pci core resets interrupts for us.

> Surely a qdev reset ought also to clear
> the output irq signals...
> [Disclaimer: this is fishing for somebody to explain to
> me what the semantics of qdev reset actually are :-)]

AFAIK what happens is that the qdev is only involved
until we get to the pci root. From then on
pci_bus_reset calls pci_device_reset to reset the device.

> > I missed that patch - what happened to it in the end?
> 
> I think it just got lost in the shuffle (Anthony L never
> did come back and actually produce a compiler that gave
> a warning on it, so I think that was just a misremembering
> on his part.)
> 
> -- PMM

Yes, that just looks strange. So I intend to just stick this
patch in my tree, unless someone NACKs. Then it'll get merged
in due course.

-- 
MST

[Qemu-devel] [PATCH] Add stdio char device on windows

2011-09-27 Thread Fabien Chouteau

Simple implementation of an stdio char device on Windows.

Signed-off-by: Fabien Chouteau 
---
 qemu-char.c |  195 ++-
 1 files changed, 193 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 09d2309..1351a48 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -538,6 +538,9 @@ int send_all(int fd, const void *_buf, int len1)
 }
 #endif /* !_WIN32 */
 
+#define STDIO_MAX_CLIENTS 1
+static int stdio_nb_clients;
+
 #ifndef _WIN32
 
 typedef struct {
@@ -545,8 +548,6 @@ typedef struct {
 int max_size;
 } FDCharDriver;
 
-#define STDIO_MAX_CLIENTS 1
-static int stdio_nb_clients = 0;
 
 static int fd_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
@@ -1451,6 +1452,8 @@ static int qemu_chr_open_pp(QemuOpts *opts, 
CharDriverState **_chr)
 
 #else /* _WIN32 */
 
+static CharDriverState *stdio_clients[STDIO_MAX_CLIENTS];
+
 typedef struct {
 int max_size;
 HANDLE hcom, hrecv, hsend;
@@ -1809,6 +1812,193 @@ static int qemu_chr_open_win_file_out(QemuOpts *opts, 
CharDriverState **_chr)
 
 return qemu_chr_open_win_file(fd_out, _chr);
 }
+
+static int win_stdio_write(CharDriverState *chr, const uint8_t *buf, int len)
+{
+HANDLE *hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
+DWORD   dwSize;
+int len1;
+
+len1 = len;
+
+while (len1 > 0) {
+if (!WriteFile(hStdOut, buf, len1, &dwSize, NULL)) {
+break;
+}
+buf  += dwSize;
+len1 -= dwSize;
+}
+
+return len - len1;
+}
+
+static HANDLE *hStdIn;
+
+static void win_stdio_wait_func(void *opaque)
+{
+CharDriverState *chr = opaque;
+INPUT_RECORD buf[4];
+int  ret;
+DWORDdwSize;
+int  i;
+
+ret = ReadConsoleInput(hStdIn, buf, sizeof(buf)/sizeof(*buf), &dwSize);
+
+if (!ret) {
+/* Avoid error storm */
+qemu_del_wait_object(hStdIn, NULL, NULL);
+return;
+}
+
+for (i = 0; i < dwSize; i++) {
+KEY_EVENT_RECORD *kev = &buf[i].Event.KeyEvent;
+
+if (buf[i].EventType == KEY_EVENT && kev->bKeyDown) {
+int j;
+if (kev->uChar.AsciiChar != 0) {
+for (j = 0; j < kev->wRepeatCount; j++)
+if (qemu_chr_be_can_write(chr)) {
+uint8_t c = kev->uChar.AsciiChar;
+qemu_chr_be_write(chr, &c, 1);
+}
+}
+}
+}
+}
+
+static HANDLE  hInputReadyEvent;
+static HANDLE  hInputDoneEvent;
+static uint8_t win_stdio_buf;
+
+static DWORD WINAPI win_stdio_thread(LPVOID param)
+{
+int   ret;
+DWORD dwSize;
+
+while (1) {
+
+/* Wait for one byte */
+ret = ReadFile(hStdIn, &win_stdio_buf, 1, &dwSize, NULL);
+
+/* Exit in case of error, continue if nothing read */
+if (!ret) {
+break;
+}
+if (!dwSize) {
+continue;
+}
+
+/* Some terminal emulator returns \r\n for Enter, just pass \n */
+if (win_stdio_buf == '\r') {
+continue;
+}
+
+/* Signal the main thread and wait until the byte was eaten */
+if (!SetEvent(hInputReadyEvent)) {
+break;
+}
+if (WaitForSingleObject(hInputDoneEvent, INFINITE) != WAIT_OBJECT_0) {
+break;
+}
+}
+
+qemu_del_wait_object(hInputReadyEvent, NULL, NULL);
+return 0;
+}
+
+static void win_stdio_thread_wait_func(void *opaque)
+{
+CharDriverState *chr = opaque;
+
+if (qemu_chr_be_can_write(chr)) {
+qemu_chr_be_write(chr, &win_stdio_buf, 1);
+}
+
+SetEvent(hInputDoneEvent);
+}
+
+static void qemu_chr_set_echo_win_stdio(CharDriverState *chr, bool echo)
+{
+DWORD mode = 0;
+
+GetConsoleMode(hStdIn, &mode);
+
+if (echo) {
+SetConsoleMode(hStdIn, mode | (ENABLE_ECHO_INPUT));
+} else {
+SetConsoleMode(hStdIn, mode & (~ENABLE_ECHO_INPUT));
+}
+}
+
+static int qemu_chr_open_win_stdio(QemuOpts *opts,
+CharDriverState **_chr)
+{
+CharDriverState *chr;
+DWORDdwMode;
+int  is_console = 0;
+
+hStdIn = GetStdHandle(STD_INPUT_HANDLE);
+if (hStdIn == INVALID_HANDLE_VALUE) {
+fprintf(stderr, "cannot open stdio: invalid handle\n");
+exit(1);
+}
+
+is_console = GetConsoleMode(hStdIn, &dwMode) != 0;
+
+if (stdio_nb_clients >= STDIO_MAX_CLIENTS
+|| ((display_type != DT_NOGRAPHIC) && (stdio_nb_clients != 0))) {
+return -EIO;
+}
+
+chr = g_malloc0(sizeof(CharDriverState));
+if (!chr) {
+return -ENOMEM;
+}
+
+chr->chr_write = win_stdio_write;
+
+if (stdio_nb_clients == 0) {
+if (is_console) {
+if (qemu_add_wait_object(hStdIn,
+ win_stdio_wait_func, chr)) {
+fprintf(stderr, "qemu_add_wai

Re: [Qemu-devel] How to run realview-pbx-a9 image in qemu

2011-09-27 Thread David Gilbert

On 27 September 2011 14:01, loody  wrote:
> hi:




> Would you mind to let me know which configs you use to compile for a9
> running on qemu?

Kernel configs?  I mostly use prebuilt kernels from the Linaro images.

Dave

Re: [Qemu-devel] The reason behind block linking constraint?

2011-09-27 Thread Rob Landley

On 09/26/2011 10:13 PM, 陳韋任 wrote:
> Hi, Rob
> 
  Is it just because we cannot optimize block linking which crosses page
 boundary, or there are some correctness/safety issues should be considered?
>>>
>>> If we link a TB with another TB from the different page, then the
>>> second TB may disappear when the memory mapping changes and the
>>> subsequent direct jump from the first TB will crash qemu.
>>>
>>> I guess that this usually does not happen in usermode, because the
>>> guest would not modify executable code memory mapping. However I
>>> suppose that this is also possible.
>>
>> Dynamic linking modifies guest code, requiring the page to be
>> retranslated.  With lazy binding this can happen at any time, and
>> without PIE executables this can happen to just about any executable page.
> 
>   Max and I have some discussion about the page boundary constraint
> of block linking. Maybe it's not worth to track cross-page block
> linking, for latter possible block unchaining. So there is a page
> boundary constraint.
> 
>   You said dynamic linking requires the page to be retranslated.
> Does that imply if there is NO page boundary constraint, user
> mode might crash? If so, does it occur frequently? Maybe small program
> just works fine without such constraint, I have to run something
> big to make QEMU crash?

The constraints you're talking about are on the translated code, dynamic
linking happens on the target code.  Changes to the target code require
regenerating the translated code, which happens with page granularity.

Rob

Re: [Qemu-devel] [PATCH] Add stdio char device on windows

2011-09-27 Thread Paolo Bonzini


On 09/27/2011 03:14 PM, Fabien Chouteau wrote:

+/* Some terminal emulator returns \r\n for Enter, just pass \n */
+if (win_stdio_buf == '\r') {
+continue;
+}


Does the \r actually do any damage?


+static void qemu_chr_set_echo_win_stdio(CharDriverState *chr, bool echo)
+{
+DWORD mode = 0;
+
+GetConsoleMode(hStdIn, &mode);
+
+if (echo) {
+SetConsoleMode(hStdIn, mode | (ENABLE_ECHO_INPUT));
+} else {
+SetConsoleMode(hStdIn, mode & (~ENABLE_ECHO_INPUT));
+}
+}


You also need to enable ENABLE_LINE_INPUT for ENABLE_ECHO_INPUT to have 
effect.


Paolo

Re: [Qemu-devel] How to run realview-pbx-a9 image in qemu

2011-09-27 Thread loody

hi dave:

2011/9/27 David Gilbert :
> On 27 September 2011 14:01, loody  wrote:
>> hi:
>
>
> 
>
>> Would you mind to let me know which configs you use to compile for a9
>> running on qemu?
>
> Kernel configs?  I mostly use prebuilt kernels from the Linaro images.
>
> Dave
>
Doesn't qemu emulate any default confis in arm, such as realview_defconfig?
(as far as I know, qemu can emulate malta in mips configs)

So that mean qemu can emulate Linaro soc board?

Thanks for your help,


-- 
Regards,

Re: [Qemu-devel] KVM call agenda for September 26

2011-09-27 Thread Juan Quintela

Juan Quintela  wrote:
> Hi
>
> Please send in any agenda items you are interested in covering.
>
> Thanks, Juan.

As there are no agenda for today, call gets cancelled.

Happy hacking, Juan.

[Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

Move qemu_eventfd unmodified to oslib-posix and use it for signaling
POSIX AIO completions. If native eventfd suport is available, this
avoids multiple read accesses to drain multiple pending signals. As
before we use a pipe if eventfd is not supported.

Signed-off-by: Jan Kiszka 
---
 os-posix.c |   32 
 oslib-posix.c  |   32 +++-
 posix-aio-compat.c |   12 
 3 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/os-posix.c b/os-posix.c
index dbf3b24..a918895 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -45,10 +45,6 @@
 #include 
 #endif
 
-#ifdef CONFIG_EVENTFD
-#include 
-#endif
-
 static struct passwd *user_pwd;
 static const char *chroot_dir;
 static int daemonize;
@@ -333,34 +329,6 @@ void os_set_line_buffering(void)
 setvbuf(stdout, NULL, _IOLBF, 0);
 }
 
-/*
- * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
- */
-int qemu_eventfd(int fds[2])
-{
-#ifdef CONFIG_EVENTFD
-int ret;
-
-ret = eventfd(0, 0);
-if (ret >= 0) {
-fds[0] = ret;
-qemu_set_cloexec(ret);
-if ((fds[1] = dup(ret)) == -1) {
-close(ret);
-return -1;
-}
-qemu_set_cloexec(fds[1]);
-return 0;
-}
-
-if (errno != ENOSYS) {
-return -1;
-}
-#endif
-
-return qemu_pipe(fds);
-}
-
 int qemu_create_pidfile(const char *filename)
 {
 char buffer[128];
diff --git a/oslib-posix.c b/oslib-posix.c
index a304fb0..8ef7bd7 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -47,7 +47,9 @@ extern int daemon(int, int);
 #include "trace.h"
 #include "qemu_socket.h"
 
-
+#ifdef CONFIG_EVENTFD
+#include 
+#endif
 
 int qemu_daemon(int nochdir, int noclose)
 {
@@ -139,6 +141,34 @@ int qemu_pipe(int pipefd[2])
 return ret;
 }
 
+/*
+ * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
+ */
+int qemu_eventfd(int fds[2])
+{
+#ifdef CONFIG_EVENTFD
+int ret;
+
+ret = eventfd(0, 0);
+if (ret >= 0) {
+fds[0] = ret;
+qemu_set_cloexec(ret);
+if ((fds[1] = dup(ret)) == -1) {
+close(ret);
+return -1;
+}
+qemu_set_cloexec(fds[1]);
+return 0;
+}
+
+if (errno != ENOSYS) {
+return -1;
+}
+#endif
+
+return qemu_pipe(fds);
+}
+
 int qemu_utimensat(int dirfd, const char *path, const struct timespec *times,
int flags)
 {
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index d3c1174..2aa5ba3 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -521,7 +521,7 @@ static void posix_aio_read(void *opaque)
 PosixAioState *s = opaque;
 ssize_t len;
 
-/* read all bytes from signal pipe */
+/* read all bytes from eventfd or signal pipe */
 for (;;) {
 char bytes[16];
 
@@ -546,10 +546,14 @@ static PosixAioState *posix_aio_state;
 
 static void posix_aio_notify_event(void)
 {
-char byte = 0;
+/* Write 8 bytes to be compatible with eventfd.  */
+static const uint64_t val = 1;
 ssize_t ret;
 
-ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
+do {
+ret = write(posix_aio_state->wfd, &val, sizeof(val));
+} while (ret < 0 && errno == EINTR);
+
 if (ret < 0 && errno != EAGAIN)
 die("write()");
 }
@@ -665,7 +669,7 @@ int paio_init(void)
 s = g_malloc(sizeof(PosixAioState));
 
 s->first_aio = NULL;
-if (qemu_pipe(fds) == -1) {
+if (qemu_eventfd(fds) == -1) {
 fprintf(stderr, "failed to create pipe\n");
 return -1;
 }
-- 
1.7.3.4

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Anthony Liguori


On 09/27/2011 08:56 AM, Jan Kiszka wrote:

Move qemu_eventfd unmodified to oslib-posix and use it for signaling
POSIX AIO completions. If native eventfd suport is available, this
avoids multiple read accesses to drain multiple pending signals. As
before we use a pipe if eventfd is not supported.

Signed-off-by: Jan Kiszka
---
  os-posix.c |   32 
  oslib-posix.c  |   32 +++-
  posix-aio-compat.c |   12 
  3 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/os-posix.c b/os-posix.c
index dbf3b24..a918895 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -45,10 +45,6 @@
  #include
  #endif

-#ifdef CONFIG_EVENTFD
-#include
-#endif
-
  static struct passwd *user_pwd;
  static const char *chroot_dir;
  static int daemonize;
@@ -333,34 +329,6 @@ void os_set_line_buffering(void)
  setvbuf(stdout, NULL, _IOLBF, 0);
  }

-/*
- * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
- */
-int qemu_eventfd(int fds[2])
-{
-#ifdef CONFIG_EVENTFD
-int ret;
-
-ret = eventfd(0, 0);
-if (ret>= 0) {
-fds[0] = ret;
-qemu_set_cloexec(ret);
-if ((fds[1] = dup(ret)) == -1) {
-close(ret);
-return -1;
-}
-qemu_set_cloexec(fds[1]);
-return 0;
-}
-
-if (errno != ENOSYS) {
-return -1;
-}
-#endif
-
-return qemu_pipe(fds);
-}
-
  int qemu_create_pidfile(const char *filename)
  {
  char buffer[128];
diff --git a/oslib-posix.c b/oslib-posix.c
index a304fb0..8ef7bd7 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -47,7 +47,9 @@ extern int daemon(int, int);
  #include "trace.h"
  #include "qemu_socket.h"

-
+#ifdef CONFIG_EVENTFD
+#include
+#endif

  int qemu_daemon(int nochdir, int noclose)
  {
@@ -139,6 +141,34 @@ int qemu_pipe(int pipefd[2])
  return ret;
  }

+/*
+ * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
+ */
+int qemu_eventfd(int fds[2])
+{
+#ifdef CONFIG_EVENTFD
+int ret;
+
+ret = eventfd(0, 0);
+if (ret>= 0) {
+fds[0] = ret;
+qemu_set_cloexec(ret);
+if ((fds[1] = dup(ret)) == -1) {
+close(ret);
+return -1;
+}
+qemu_set_cloexec(fds[1]);
+return 0;
+}
+
+if (errno != ENOSYS) {
+return -1;
+}
+#endif
+
+return qemu_pipe(fds);
+}
+


I think it's a bit dangerous to implement eventfd() in terms of pipe().

You don't expect to handle EAGAIN with eventfd() whereas you have to handle it 
with pipe().


Moreover, the eventfd() counter is not lossy (practically speaking) whereas if 
you use pipe() as a counter, it will be lossy in practice.


This is why posix aio uses pipe() and not eventfd().

Regards,

Anthony Liguori


  int qemu_utimensat(int dirfd, const char *path, const struct timespec *times,
 int flags)
  {
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index d3c1174..2aa5ba3 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -521,7 +521,7 @@ static void posix_aio_read(void *opaque)
  PosixAioState *s = opaque;
  ssize_t len;

-/* read all bytes from signal pipe */
+/* read all bytes from eventfd or signal pipe */
  for (;;) {
  char bytes[16];

@@ -546,10 +546,14 @@ static PosixAioState *posix_aio_state;

  static void posix_aio_notify_event(void)
  {
-char byte = 0;
+/* Write 8 bytes to be compatible with eventfd.  */
+static const uint64_t val = 1;
  ssize_t ret;

-ret = write(posix_aio_state->wfd,&byte, sizeof(byte));
+do {
+ret = write(posix_aio_state->wfd,&val, sizeof(val));
+} while (ret<  0&&  errno == EINTR);
+
  if (ret<  0&&  errno != EAGAIN)
  die("write()");
  }
@@ -665,7 +669,7 @@ int paio_init(void)
  s = g_malloc(sizeof(PosixAioState));

  s->first_aio = NULL;
-if (qemu_pipe(fds) == -1) {
+if (qemu_eventfd(fds) == -1) {
  fprintf(stderr, "failed to create pipe\n");
  return -1;
  }

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Avi Kivity


On 09/27/2011 05:07 PM, Anthony Liguori wrote:


You don't expect to handle EAGAIN with eventfd() whereas you have to 
handle it with pipe().


Moreover, the eventfd() counter is not lossy (practically speaking) 
whereas if you use pipe() as a counter, it will be lossy in practice.


This is why posix aio uses pipe() and not eventfd().


We could define a qemu_event mechanism that satisfies the least common 
denominator, and is implemented by eventfd when available.


qemu_event_create()
qemu_event_signal()
qemu_event_wait()
qemu_event_poll_add() // registers in main loop
qemu_event_poll_del()

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] Add stdio char device on windows

2011-09-27 Thread Fabien Chouteau

On 27/09/2011 15:31, Paolo Bonzini wrote:
> On 09/27/2011 03:14 PM, Fabien Chouteau wrote:
>> +/* Some terminal emulator returns \r\n for Enter, just pass \n */
>> +if (win_stdio_buf == '\r') {
>> +continue;
>> +}
>
> Does the \r actually do any damage?
>

It's just more convenient to have the same behavior on all hosts (i.e.
no \r).

>> +static void qemu_chr_set_echo_win_stdio(CharDriverState *chr, bool echo)
>> +{
>> +DWORD mode = 0;
>> +
>> +GetConsoleMode(hStdIn, &mode);
>> +
>> +if (echo) {
>> +SetConsoleMode(hStdIn, mode | (ENABLE_ECHO_INPUT));
>> +} else {
>> +SetConsoleMode(hStdIn, mode & (~ENABLE_ECHO_INPUT));
>> +}
>> +}
> 
> You also need to enable ENABLE_LINE_INPUT for ENABLE_ECHO_INPUT to have 
> effect.
> 

OK thanks, I missed that...

Regards,

-- 
Fabien Chouteau

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Anthony Liguori


On 09/27/2011 09:11 AM, Avi Kivity wrote:

On 09/27/2011 05:07 PM, Anthony Liguori wrote:


You don't expect to handle EAGAIN with eventfd() whereas you have to handle it
with pipe().

Moreover, the eventfd() counter is not lossy (practically speaking) whereas if
you use pipe() as a counter, it will be lossy in practice.

This is why posix aio uses pipe() and not eventfd().


We could define a qemu_event mechanism that satisfies the least common
denominator, and is implemented by eventfd when available.

qemu_event_create()
qemu_event_signal()
qemu_event_wait()
qemu_event_poll_add() // registers in main loop
qemu_event_poll_del()


See hw/event_notifier.[ch].

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Avi Kivity


On 09/27/2011 05:19 PM, Anthony Liguori wrote:

qemu_event_create()
qemu_event_signal()
qemu_event_wait()
qemu_event_poll_add() // registers in main loop
qemu_event_poll_del()



See hw/event_notifier.[ch].


Precisely.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

On 2011-09-27 16:19, Anthony Liguori wrote:
> On 09/27/2011 09:11 AM, Avi Kivity wrote:
>> On 09/27/2011 05:07 PM, Anthony Liguori wrote:
>>>
>>> You don't expect to handle EAGAIN with eventfd() whereas you have to handle 
>>> it
>>> with pipe().
>>>
>>> Moreover, the eventfd() counter is not lossy (practically speaking) whereas 
>>> if
>>> you use pipe() as a counter, it will be lossy in practice.
>>>
>>> This is why posix aio uses pipe() and not eventfd().
>>
>> We could define a qemu_event mechanism that satisfies the least common
>> denominator, and is implemented by eventfd when available.
>>
>> qemu_event_create()
>> qemu_event_signal()
>> qemu_event_wait()
>> qemu_event_poll_add() // registers in main loop
>> qemu_event_poll_del()
> 
> See hw/event_notifier.[ch].

That code looks suspicious, btw. It claims things ("we use
EFD_SEMAPHORE") it does not do.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH] event_notifier: move to top-level directory

2011-09-27 Thread Avi Kivity

Has no business in hw/.

Signed-off-by: Avi Kivity 
---
 hw/event_notifier.c => event_notifier.c |1 -
 hw/event_notifier.h => event_notifier.h |0
 2 files changed, 0 insertions(+), 1 deletions(-)
 rename hw/event_notifier.c => event_notifier.c (98%)
 rename hw/event_notifier.h => event_notifier.h (100%)

diff --git a/hw/event_notifier.c b/event_notifier.c
similarity index 98%
rename from hw/event_notifier.c
rename to event_notifier.c
index 13f3656..2c73555 100644
--- a/hw/event_notifier.c
+++ b/event_notifier.c
@@ -10,7 +10,6 @@
  * the COPYING file in the top-level directory.
  */
 
-#include "hw.h"
 #include "event_notifier.h"
 #ifdef CONFIG_EVENTFD
 #include 
diff --git a/hw/event_notifier.h b/event_notifier.h
similarity index 100%
rename from hw/event_notifier.h
rename to event_notifier.h
-- 
1.7.6.3

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

On 2011-09-27 16:07, Anthony Liguori wrote:
> On 09/27/2011 08:56 AM, Jan Kiszka wrote:
>> Move qemu_eventfd unmodified to oslib-posix and use it for signaling
>> POSIX AIO completions. If native eventfd suport is available, this
>> avoids multiple read accesses to drain multiple pending signals. As
>> before we use a pipe if eventfd is not supported.
>>
>> Signed-off-by: Jan Kiszka
>> ---
>>   os-posix.c |   32 
>>   oslib-posix.c  |   32 +++-
>>   posix-aio-compat.c |   12 
>>   3 files changed, 39 insertions(+), 37 deletions(-)
>>
>> diff --git a/os-posix.c b/os-posix.c
>> index dbf3b24..a918895 100644
>> --- a/os-posix.c
>> +++ b/os-posix.c
>> @@ -45,10 +45,6 @@
>>   #include
>>   #endif
>>
>> -#ifdef CONFIG_EVENTFD
>> -#include
>> -#endif
>> -
>>   static struct passwd *user_pwd;
>>   static const char *chroot_dir;
>>   static int daemonize;
>> @@ -333,34 +329,6 @@ void os_set_line_buffering(void)
>>   setvbuf(stdout, NULL, _IOLBF, 0);
>>   }
>>
>> -/*
>> - * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
>> - */
>> -int qemu_eventfd(int fds[2])
>> -{
>> -#ifdef CONFIG_EVENTFD
>> -int ret;
>> -
>> -ret = eventfd(0, 0);
>> -if (ret>= 0) {
>> -fds[0] = ret;
>> -qemu_set_cloexec(ret);
>> -if ((fds[1] = dup(ret)) == -1) {
>> -close(ret);
>> -return -1;
>> -}
>> -qemu_set_cloexec(fds[1]);
>> -return 0;
>> -}
>> -
>> -if (errno != ENOSYS) {
>> -return -1;
>> -}
>> -#endif
>> -
>> -return qemu_pipe(fds);
>> -}
>> -
>>   int qemu_create_pidfile(const char *filename)
>>   {
>>   char buffer[128];
>> diff --git a/oslib-posix.c b/oslib-posix.c
>> index a304fb0..8ef7bd7 100644
>> --- a/oslib-posix.c
>> +++ b/oslib-posix.c
>> @@ -47,7 +47,9 @@ extern int daemon(int, int);
>>   #include "trace.h"
>>   #include "qemu_socket.h"
>>
>> -
>> +#ifdef CONFIG_EVENTFD
>> +#include
>> +#endif
>>
>>   int qemu_daemon(int nochdir, int noclose)
>>   {
>> @@ -139,6 +141,34 @@ int qemu_pipe(int pipefd[2])
>>   return ret;
>>   }
>>
>> +/*
>> + * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
>> + */
>> +int qemu_eventfd(int fds[2])
>> +{
>> +#ifdef CONFIG_EVENTFD
>> +int ret;
>> +
>> +ret = eventfd(0, 0);
>> +if (ret>= 0) {
>> +fds[0] = ret;
>> +qemu_set_cloexec(ret);
>> +if ((fds[1] = dup(ret)) == -1) {
>> +close(ret);
>> +return -1;
>> +}
>> +qemu_set_cloexec(fds[1]);
>> +return 0;
>> +}
>> +
>> +if (errno != ENOSYS) {
>> +return -1;
>> +}
>> +#endif
>> +
>> +return qemu_pipe(fds);
>> +}
>> +
> 
> I think it's a bit dangerous to implement eventfd() in terms of pipe().
> 
> You don't expect to handle EAGAIN with eventfd() whereas you have to handle 
> it 
> with pipe().

EAGAIN is returned on eventfd read if no event is pending and the fd is
non-blocking - just as we configure it.

> 
> Moreover, the eventfd() counter is not lossy (practically speaking) whereas 
> if 
> you use pipe() as a counter, it will be lossy in practice.
> 
> This is why posix aio uses pipe() and not eventfd().

I don't get this yet. eventfd is lossy by default. It only decreases the
counter on read if you specify EFD_SEMAPHORE - which we do not do.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH qemu] e1000: CTRL.RST emulation

2011-09-27 Thread Anthony Liguori


On 09/27/2011 06:50 AM, Peter Maydell wrote:

On 27 September 2011 12:25, Michael S. Tsirkin  wrote:

e1000 spec says CTRL.RST write should have the same effect
as bus reset, except that is preserves PCI Config.
Reset device registers and interrupts.

Fix suggested by Andy Gospodarek


Doesn't this have the same effect as this patch:
http://patchwork.ozlabs.org/patch/108673/

except that it's harder to read because it's moved a lot
of code around in the file?

(I think you have an extra qemu_set_irq() call in there,
actually. But it was hard to find. Also your code has the
bug that was in earlier revisions of Anthony's patch where
after doing the reset you fall through and allow other bits
in the ctrl register to be set.)


I didn't see your note which said not to rend the patch unless I produce a 
compiler that issues a warning.  Since Anthony P. was going to resubmit, I never 
looked into it further.


I honestly don't care at this point which patch gets merged.

Regards,

Anthony Liguori



-- PMM

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Avi Kivity

On 09/27/2011 05:29 PM, Jan Kiszka wrote:

>
>  Moreover, the eventfd() counter is not lossy (practically speaking) whereas 
if
>  you use pipe() as a counter, it will be lossy in practice.
>
>  This is why posix aio uses pipe() and not eventfd().

I don't get this yet. eventfd is lossy by default. It only decreases the
counter on read if you specify EFD_SEMAPHORE - which we do not do.

It's not lossy - a read returns the number of events written since the 
last read.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Anthony Liguori


On 09/27/2011 09:29 AM, Jan Kiszka wrote:

On 2011-09-27 16:07, Anthony Liguori wrote:

On 09/27/2011 08:56 AM, Jan Kiszka wrote:

Move qemu_eventfd unmodified to oslib-posix and use it for signaling
POSIX AIO completions. If native eventfd suport is available, this
avoids multiple read accesses to drain multiple pending signals. As
before we use a pipe if eventfd is not supported.

Signed-off-by: Jan Kiszka
---
   os-posix.c |   32 
   oslib-posix.c  |   32 +++-
   posix-aio-compat.c |   12 
   3 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/os-posix.c b/os-posix.c
index dbf3b24..a918895 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -45,10 +45,6 @@
   #include
   #endif

-#ifdef CONFIG_EVENTFD
-#include
-#endif
-
   static struct passwd *user_pwd;
   static const char *chroot_dir;
   static int daemonize;
@@ -333,34 +329,6 @@ void os_set_line_buffering(void)
   setvbuf(stdout, NULL, _IOLBF, 0);
   }

-/*
- * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
- */
-int qemu_eventfd(int fds[2])
-{
-#ifdef CONFIG_EVENTFD
-int ret;
-
-ret = eventfd(0, 0);
-if (ret>= 0) {
-fds[0] = ret;
-qemu_set_cloexec(ret);
-if ((fds[1] = dup(ret)) == -1) {
-close(ret);
-return -1;
-}
-qemu_set_cloexec(fds[1]);
-return 0;
-}
-
-if (errno != ENOSYS) {
-return -1;
-}
-#endif
-
-return qemu_pipe(fds);
-}
-
   int qemu_create_pidfile(const char *filename)
   {
   char buffer[128];
diff --git a/oslib-posix.c b/oslib-posix.c
index a304fb0..8ef7bd7 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -47,7 +47,9 @@ extern int daemon(int, int);
   #include "trace.h"
   #include "qemu_socket.h"

-
+#ifdef CONFIG_EVENTFD
+#include
+#endif

   int qemu_daemon(int nochdir, int noclose)
   {
@@ -139,6 +141,34 @@ int qemu_pipe(int pipefd[2])
   return ret;
   }

+/*
+ * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
+ */
+int qemu_eventfd(int fds[2])
+{
+#ifdef CONFIG_EVENTFD
+int ret;
+
+ret = eventfd(0, 0);
+if (ret>= 0) {
+fds[0] = ret;
+qemu_set_cloexec(ret);
+if ((fds[1] = dup(ret)) == -1) {
+close(ret);
+return -1;
+}
+qemu_set_cloexec(fds[1]);
+return 0;
+}
+
+if (errno != ENOSYS) {
+return -1;
+}
+#endif
+
+return qemu_pipe(fds);
+}
+


I think it's a bit dangerous to implement eventfd() in terms of pipe().

You don't expect to handle EAGAIN with eventfd() whereas you have to handle it
with pipe().


EAGAIN is returned on eventfd read if no event is pending and the fd is
non-blocking - just as we configure it.



Moreover, the eventfd() counter is not lossy (practically speaking) whereas if
you use pipe() as a counter, it will be lossy in practice.

This is why posix aio uses pipe() and not eventfd().


I don't get this yet. eventfd is lossy by default. It only decreases the
counter on read if you specify EFD_SEMAPHORE - which we do not do.


uint64_t value;

for (i = 0; i < 1 << 32; i++) {
   value = 1;
   write(fd, &value, sizeof(value));
}

uint64_t count = 0;

do {
   len = read(fd, &value, sizeof(value));
   count += value;
} while (len != -1);

With eventfd, count == 2^32.  With pipe, count == 8192.

That's each '1' is stored in the pipe buffer whereas with eventfd, an index is 
just incremented.


Not sure what you mean re: EFD_SEMAPHORE.  EFD_SEMAPHORE basically means any 
non-zero value is returned as 1 and the counter is decremented by 1.  Without 
EFD_SEMAPHORE, the count is returned and the counter is reset to 0.


Regards,

Anthony Liguori



Jan

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

On 2011-09-27 16:34, Avi Kivity wrote:
> On 09/27/2011 05:29 PM, Jan Kiszka wrote:
>>>
>>>  Moreover, the eventfd() counter is not lossy (practically speaking) 
>>> whereas if
>>>  you use pipe() as a counter, it will be lossy in practice.
>>>
>>>  This is why posix aio uses pipe() and not eventfd().
>>
>> I don't get this yet. eventfd is lossy by default. It only decreases the
>> counter on read if you specify EFD_SEMAPHORE - which we do not do.
>>
> 
> It's not lossy - a read returns the number of events written since the 
> last read.

Yeah, but what's the point? We don't evaluate this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Anthony Liguori


On 09/27/2011 09:22 AM, Jan Kiszka wrote:

On 2011-09-27 16:19, Anthony Liguori wrote:

On 09/27/2011 09:11 AM, Avi Kivity wrote:

On 09/27/2011 05:07 PM, Anthony Liguori wrote:


You don't expect to handle EAGAIN with eventfd() whereas you have to handle it
with pipe().

Moreover, the eventfd() counter is not lossy (practically speaking) whereas if
you use pipe() as a counter, it will be lossy in practice.

This is why posix aio uses pipe() and not eventfd().


We could define a qemu_event mechanism that satisfies the least common
denominator, and is implemented by eventfd when available.

qemu_event_create()
qemu_event_signal()
qemu_event_wait()
qemu_event_poll_add() // registers in main loop
qemu_event_poll_del()


See hw/event_notifier.[ch].


That code looks suspicious, btw. It claims things ("we use
EFD_SEMAPHORE") it does not do.


Indeed.

But the interface appears to be the right one IMHO.

Regards,

Anthony Liguori



Jan

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Paolo Bonzini


On 09/27/2011 04:07 PM, Anthony Liguori wrote:


I think it's a bit dangerous to implement eventfd() in terms of pipe().

You don't expect to handle EAGAIN with eventfd() whereas you have to
handle it with pipe().

Moreover, the eventfd() counter is not lossy (practically speaking)
whereas if you use pipe() as a counter, it will be lossy in practice.

This is why posix aio uses pipe() and not eventfd().


But this is the same idiom we use for the iothread signaling.  We're not 
using the eventfd's counter.  Perhaps it would be nice to complete 
EventNotifier with "notify event" methods and use it, but Jan's patch is 
safe, I think.


Paolo

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Avi Kivity

On 09/27/2011 05:36 PM, Jan Kiszka wrote:

On 2011-09-27 16:34, Avi Kivity wrote:
>  On 09/27/2011 05:29 PM, Jan Kiszka wrote:
>>>
>>>   Moreover, the eventfd() counter is not lossy (practically speaking) 
whereas if
>>>   you use pipe() as a counter, it will be lossy in practice.
>>>
>>>   This is why posix aio uses pipe() and not eventfd().
>>
>>  I don't get this yet. eventfd is lossy by default. It only decreases the
>>  counter on read if you specify EFD_SEMAPHORE - which we do not do.
>>
>
>  It's not lossy - a read returns the number of events written since the
>  last read.

Yeah, but what's the point? We don't evaluate this.

If we write an interface that looks like eventfd but subtly differs, 
someone will get bitten.  If we write a new interface and implement it 
via eventfd (or a pipe), no one gets bitten.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

On 2011-09-27 16:42, Avi Kivity wrote:
> On 09/27/2011 05:36 PM, Jan Kiszka wrote:
>> On 2011-09-27 16:34, Avi Kivity wrote:
>>>  On 09/27/2011 05:29 PM, Jan Kiszka wrote:
>
>   Moreover, the eventfd() counter is not lossy (practically speaking) 
> whereas if
>   you use pipe() as a counter, it will be lossy in practice.
>
>   This is why posix aio uses pipe() and not eventfd().

  I don't get this yet. eventfd is lossy by default. It only decreases the
  counter on read if you specify EFD_SEMAPHORE - which we do not do.

>>>
>>>  It's not lossy - a read returns the number of events written since the
>>>  last read.
>>
>> Yeah, but what's the point? We don't evaluate this.
>>
>>
> 
> If we write an interface that looks like eventfd but subtly differs, 
> someone will get bitten.  If we write a new interface and implement it 
> via eventfd (or a pipe), no one gets bitten.

I don't disagree that there is still room for improving the existing
interface, generalizing to qemu_event. But that's not in the scope of
this patch.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Avi Kivity


On 09/27/2011 05:45 PM, Jan Kiszka wrote:

I don't disagree that there is still room for improving the existing
interface, generalizing to qemu_event. But that's not in the scope of
this patch.



Why not use event_notify.c?

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

On 2011-09-27 16:48, Avi Kivity wrote:
> On 09/27/2011 05:45 PM, Jan Kiszka wrote:
>> I don't disagree that there is still room for improving the existing
>> interface, generalizing to qemu_event. But that's not in the scope of
>> this patch.
>>
> 
> Why not use event_notify.c?

It doesn't solve the wrapping issue, it mandates eventfd support.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH V10 2/5] Add TPM (frontend) hardware interface (TPM TIS) to Qemu

2011-09-27 Thread Stefan Berger

This patch adds the main code of the TPM frontend driver, the TPM TIS
interface, to Qemu. The code is largely based on the previous implementation
for Xen but has been significantly extended to meet the standard's
requirements, such as the support for changing of localities and all the
functionality of the available flags.

Communication with the backend (i.e., for Xen or the libtpms-based one)
is cleanly separated through an interface which the backend driver needs
to implement.

The TPM TIS driver's backend was previously chosen in the code added
to arch_init. The frontend holds a pointer to the chosen backend (interface).

Communication with the backend is largely based on signals and conditions.
Whenever the frontend has collected a complete packet, it will signal
the backend, which then starts processing the command. Once the result
has been returned, the backend invokes a callback function
(tis_tpm_receive_cb()).

The one tricky part is support for VM suspend while the TPM is processing
a command. In this case the frontend driver is waiting for the backend
to return the result of the last command before shutting down. It waits
on a condition for a signal from the backend, which is delivered in 
tis_tpm_receive_cb().

Testing the proper functioning of the different flags and localities 
cannot be done from user space when running in Linux for example, since
access to the address space of the TPM TIS interface is not possible. Also
the Linux driver itself does not exercise all functionality. So, for
testing there is a fairly extensive test suite as part of the SeaBIOS patches
since from within the BIOS one can have full access to all the TPM's registers.

v10:
  - Use error_report where necessary
  - Convert to use memory_region_init_io; use DEVICE_NATIVE_ENDIAN which
works for an x86 VM on x86 or ppc host
  - Implement cleanup in tpm_tis_init
  - put frontend model name into TPMBackend driver structure for later display
with the monitor

v9:
  - prefixing all function with tpm_tis_ and all constants with TPM_TIS_
  - adding minimum VMStateDescription and marking device as non-migratable

v5:
  - adding comment to tis_data_read
  - refactoring following support for split command line options
-tpmdev and -device
  - code handling the configuration of the TPM device was moved to tpm.c
  - removed empty line at end of file

v3:
  - prefixing functions with tis_
  - added a function to the backend interface 'early_startup_tpm' that
allows to detect the presence of the block storage and gracefully fails
Qemu if it's not available. This works with migration using shared
storage but doesn't support migration with block storage migration.
For encyrypted QCoW2 and in case of a snapshot resue the late_startup_tpm
interface function is called

Signed-off-by: Stefan Berger 

---
 hw/tpm_tis.c |  818 +++
 1 file changed, 818 insertions(+)

Index: qemu-git.pt/hw/tpm_tis.c
===
--- /dev/null
+++ qemu-git.pt/hw/tpm_tis.c
@@ -0,0 +1,818 @@
+/*
+ * tpm_tis.c - QEMU's TPM TIS interface emulator
+ *
+ * Copyright (C) 2006,2010,2011 IBM Corporation
+ *
+ * Authors:
+ *  Stefan Berger 
+ *  David Safford 
+ *
+ * Xen 4 support: Andrease Niederl 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ *
+ * Implementation of the TIS interface according to specs found at
+ * http://www.trustedcomputiggroup.org.
+ * In the developers menu choose the PC Client section then find the TIS
+ * specification.
+ */
+
+#include "tpm.h"
+#include "block.h"
+#include "exec-memory.h"
+#include "hw/hw.h"
+#include "hw/pc.h"
+#include "hw/tpm_tis.h"
+#include "qemu-error.h"
+
+#include 
+
+/*#define DEBUG_TIS */
+
+/* whether the STS interrupt is supported */
+/*#define RAISE_STS_IRQ */
+
+/* tis registers */
+#define TPM_TIS_REG_ACCESS0x00
+#define TPM_TIS_REG_INT_ENABLE0x08
+#define TPM_TIS_REG_INT_VECTOR0x0c
+#define TPM_TIS_REG_INT_STATUS0x10
+#define TPM_TIS_REG_INTF_CAPABILITY   0x14
+#define TPM_TIS_REG_STS   0x18
+#define TPM_TIS_REG_DATA_FIFO 0x24
+#define TPM_TIS_REG_DID_VID   0xf00
+#define TPM_TIS_REG_RID   0xf04
+
+#define TPM_TIS_STS_VALID (1 << 7)
+#define TPM_TIS_STS_COMMAND_READY (1 << 6)
+#define TPM_TIS_STS_TPM_GO(1 << 5)
+#define TPM_TIS_STS_DATA_AVAILABLE(1 << 4)
+#define TPM_TIS_STS_EXPECT(1 << 3)
+#define TPM_TIS_STS_RESPONSE_RETRY(1 << 1)
+
+#define TPM_TIS_ACCESS_TPM_REG_VALID_STS  (1 << 7)
+#define TPM_TIS_ACCESS_ACTIVE_LOCALITY(1 << 5)
+#define TPM_TIS_ACCESS_BEEN_SEIZED(1 << 4)
+#define TPM_TIS_ACCESS_SEIZE

[Qemu-devel] [PATCH V10 3/5] Add a debug register

2011-09-27 Thread Stefan Berger

This patch uses the possibility to add a vendor-specific register and
adds a debug register useful for dumping the TIS's internal state. This
register is only active in a debug build (#define DEBUG_TIS).

v9:
 - prefixing all function with tpm_tis_ and all constants with TPM_TIS_

v3:
 - all output goes to stderr

Signed-off-by: Stefan Berger 

---
 hw/tpm_tis.c |   73 +++
 1 file changed, 73 insertions(+)

Index: qemu-git.pt/hw/tpm_tis.c
===
--- qemu-git.pt.orig/hw/tpm_tis.c
+++ qemu-git.pt/hw/tpm_tis.c
@@ -47,6 +47,9 @@
 #define TPM_TIS_REG_DID_VID   0xf00
 #define TPM_TIS_REG_RID   0xf04
 
+/* vendor-specific registers */
+#define TPM_TIS_REG_DEBUG 0xf90
+
 #define TPM_TIS_STS_VALID (1 << 7)
 #define TPM_TIS_STS_COMMAND_READY (1 << 6)
 #define TPM_TIS_STS_TPM_GO(1 << 5)
@@ -92,6 +95,11 @@
 
 #define TPM_TIS_NO_DATA_BYTE  0xff
 
+/* local prototypes */
+static uint64_t tpm_tis_mmio_read(void *opaque, target_phys_addr_t addr,
+  unsigned size);
+
+
 #ifdef DEBUG_TIS
 static void tpm_tis_show_buffer(const TPMSizedBuffer *sb, const char *string)
 {
@@ -319,6 +327,66 @@ static uint32_t tpm_tis_data_read(TPMSta
 return ret;
 }
 
+#ifdef DEBUG_TIS
+static void tpm_tis_dump_state(void *opaque, target_phys_addr_t addr)
+{
+static const unsigned regs[] = {
+TPM_TIS_REG_ACCESS,
+TPM_TIS_REG_INT_ENABLE,
+TPM_TIS_REG_INT_VECTOR,
+TPM_TIS_REG_INT_STATUS,
+TPM_TIS_REG_INTF_CAPABILITY,
+TPM_TIS_REG_STS,
+TPM_TIS_REG_DID_VID,
+TPM_TIS_REG_RID,
+0xfff};
+int idx;
+uint8_t locty = tpm_tis_locality_from_addr(addr);
+target_phys_addr_t base = addr & ~0xfff;
+TPMState *s = opaque;
+TPMTISState *tis = &s->s.tis;
+
+fprintf(stderr,
+"tpm_tis: active locality  : %d\n"
+"tpm_tis: state of locality %d : %d\n"
+"tpm_tis: register dump:\n",
+tis->active_locty,
+locty, tis->loc[locty].state);
+
+for (idx = 0; regs[idx] != 0xfff; idx++) {
+fprintf(stderr, "tpm_tis: 0x%04x : 0x%08x\n", regs[idx],
+(uint32_t)tpm_tis_mmio_read(opaque, base + regs[idx], 4));
+}
+
+fprintf(stderr,
+"tpm_tis: read offset   : %d\n"
+"tpm_tis: result buffer : ",
+tis->loc[locty].r_offset);
+for (idx = 0;
+ idx < tpm_tis_get_size_from_buffer(&tis->loc[locty].r_buffer);
+ idx++) {
+fprintf(stderr, "%c%02x%s",
+tis->loc[locty].r_offset == idx ? '>' : ' ',
+tis->loc[locty].r_buffer.buffer[idx],
+((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
+}
+fprintf(stderr,
+"\n"
+"tpm_tis: write offset  : %d\n"
+"tpm_tis: request buffer: ",
+tis->loc[locty].w_offset);
+for (idx = 0;
+ idx < tpm_tis_get_size_from_buffer(&tis->loc[locty].w_buffer);
+ idx++) {
+fprintf(stderr, "%c%02x%s",
+tis->loc[locty].w_offset == idx ? '>' : ' ',
+tis->loc[locty].w_buffer.buffer[idx],
+((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
+}
+fprintf(stderr, "\n");
+}
+#endif
+
 /*
  * Read a register of the TIS interface
  * See specs pages 33-63 for description of the registers
@@ -391,6 +459,11 @@ static uint64_t tpm_tis_mmio_read(void *
 case TPM_TIS_REG_RID:
 val = TPM_TIS_TPM_RID;
 break;
+#ifdef DEBUG_TIS
+case TPM_TIS_REG_DEBUG:
+tpm_tis_dump_state(opaque, addr);
+break;
+#endif
 }
 
 qemu_mutex_unlock(&s->state_lock);

[Qemu-devel] [PATCH V10 4/5] Build the TPM frontend code

2011-09-27 Thread Stefan Berger

Build the TPM frontend code that has been added so far.

Signed-off-by: Stefan Berger 

---
 Makefile.target |1 +
 configure   |   11 +++
 2 files changed, 12 insertions(+)

Index: qemu-git.pt/Makefile.target
===
--- qemu-git.pt.orig/Makefile.target
+++ qemu-git.pt/Makefile.target
@@ -194,6 +194,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virt
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-y += memory.o
+obj-$(CONFIG_TPM) += tpm.o tpm_tis.o
 LIBS+=-lz
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
Index: qemu-git.pt/configure
===
--- qemu-git.pt.orig/configure
+++ qemu-git.pt/configure
@@ -182,6 +182,7 @@ usb_redir=""
 opengl=""
 zlib="yes"
 guest_agent="yes"
+tpm="no"
 
 # parse CC options first
 for opt do
@@ -766,6 +767,8 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --enable-tpm) tpm="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1048,6 +1051,7 @@ echo "  --disable-usb-redir  disable
 echo "  --enable-usb-redir   enable usb network redirection support"
 echo "  --disable-guest-agentdisable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent enable building of the QEMU Guest Agent"
+echo "  --enable-tpm enable TPM support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is 
launched"
 exit 1
@@ -2714,6 +2718,7 @@ echo "nss used  $smartcard_nss"
 echo "usb net redir $usb_redir"
 echo "OpenGL support$opengl"
 echo "build guest agent $guest_agent"
+echo "TPM support   $tpm"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3558,6 +3563,12 @@ if test "$gprof" = "yes" ; then
   fi
 fi
 
+if test "$tpm" = "yes"; then
+  if test "$target_softmmu" = "yes" ; then
+echo "CONFIG_TPM=y" >> $config_host_mak
+  fi
+fi
+
 linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case "$ARCH" in

[Qemu-devel] [PATCH V10 1/5] Support for TPM command line options

2011-09-27 Thread Stefan Berger

This patch adds support for TPM command line options.
The command line options supported here are

./qemu-... -tpmdev passthrough,path=,id=
   -device tpm-tis,tpmdev=

and

./qemu-... -tpmdev ?

where the latter works similar to -soundhw ? and shows a list of
available TPM backends (for example 'passthrough').

Using the type parameter, the backend is chosen, i.e., 'passthrough' for the
passthrough driver. The interpretation of the other parameters along
with determining whether enough parameters were provided is pushed into
the backend driver, which needs to implement the interface function
'create' and return a TPMDriver structure if the VM can be started or 'NULL'
if not enough or bad parameters were provided.

Monitor support for 'info tpm' has been added. It for example prints the
following:

(qemu) info tpm
TPM devices:
 tpm0: model=tpm-tis
  \ tpm0: type=passthrough,path=/dev/tpm0

v10:
 - tpm_display_backend_drivers always prints to stderr

v9:
 - prefixing all functions with tpm_tis_ and all constants with TPM_TIS_
 - splitting off of -tpm command line support into its own patch
 - only support TPM passthrough for now

v8:
 - adjusting formatting of backend drivers output to accomodate better
   formatting of 'passthrough' backend output

v6:
 - use #idef CONFIG_TPM to surround TPM calls
 - use QLIST_FOREACH_SAFE rather than QLIST_FOREACH in tpm_cleanup
 - commented backend ops in tpm.h
 - moving to IRQ 5 (11 collided with network cards)

v5:
 - fixing typo reported by Serge Hallyn
 - Adapting code to split command line parameters supporting 
   -tpmdev ... -device tpm-tis,tpmdev=...
 - moved code out of arch_init.c|h into tpm.c|h
 - increasing reserved memory for ACPI tables to 128kb (from 64kb)
 - the backend interface has a create() function for interpreting the command
   line parameters and returning a TPMDevice structure; previoulsy
   this function was called handle_options()
 - the backend interface has a destroy() function for cleaning up after
   the create() function was called
 - added support for 'info tpm' in monitor

v4:
 - coding style fixes

v3:
 - added hw/tpm_tis.h to this patch so Qemu compiles at this stage

Signed-off-by: Stefan Berger 

---
 hmp-commands.hx |2 
 hw/tpm_tis.h|   89 +
 monitor.c   |   10 +++
 qemu-config.c   |   20 ++
 qemu-options.hx |   34 +++
 tpm.c   |  167 
 tpm.h   |   90 ++
 vl.c|   15 +
 8 files changed, 427 insertions(+)

Index: qemu-git.pt/qemu-options.hx
===
--- qemu-git.pt.orig/qemu-options.hx
+++ qemu-git.pt/qemu-options.hx
@@ -1760,6 +1760,40 @@ ETEXI
 
 DEFHEADING()
 
+DEFHEADING(TPM device options:)
+
+#ifndef _WIN32
+# ifdef CONFIG_TPM
+DEF("tpmdev", HAS_ARG, QEMU_OPTION_tpmdev, \
+"-tpmdev [],id=str[,option][,option][,...]\n",
+QEMU_ARCH_ALL)
+# endif
+#endif
+STEXI
+
+The general form of a TPM device option is:
+@table @option
+
+@item -tpmdev @var{backend} ,id=@var{id} [,@var{options}]
+@findex -tpmdev
+Backend type must be:
+
+The specific backend type will determine the applicable options.
+The @code{-tpmdev} options requires a @code{-device} option.
+
+Options to each backend are described below.
+
+Use ? to print all available TPM backend types.
+@example
+qemu -tpmdev ?
+@end example
+
+@end table
+
+ETEXI
+
+DEFHEADING()
+
 DEFHEADING(Linux/Multiboot boot specific:)
 STEXI
 
Index: qemu-git.pt/vl.c
===
--- qemu-git.pt.orig/vl.c
+++ qemu-git.pt/vl.c
@@ -139,6 +139,7 @@ int main(int argc, char **argv)
 #include "block.h"
 #include "blockdev.h"
 #include "block-migration.h"
+#include "tpm.h"
 #include "dma.h"
 #include "audio/audio.h"
 #include "migration.h"
@@ -2675,6 +2676,11 @@ int main(int argc, char **argv, char **e
 ram_size = value;
 break;
 }
+#ifdef CONFIG_TPM
+case QEMU_OPTION_tpmdev:
+tpm_config_parse(qemu_find_opts("tpmdev"), optarg);
+break;
+#endif
 case QEMU_OPTION_mempath:
 mem_path = optarg;
 break;
@@ -3327,6 +,12 @@ int main(int argc, char **argv, char **e
 exit(1);
 }
 
+#ifdef CONFIG_TPM
+if (tpm_init() < 0) {
+exit(1);
+}
+#endif
+
 /* init the bluetooth world */
 if (foreach_device_config(DEV_BT, bt_parse))
 exit(1);
@@ -3575,6 +3587,9 @@ int main(int argc, char **argv, char **e
 quit_timers();
 net_cleanup();
 res_free();
+#ifdef CONFIG_TPM
+tpm_cleanup();
+#endif
 
 return 0;
 }
Index: qemu-git.pt/qemu-config.c
===
--- qemu-git.pt.orig/qemu-config.c
+++ qemu-git.pt/qemu-config.c
@@ -508,6 +508,25 @@ QemuOptsList qemu_boot_opts = {
 },
 };
 
+st

[Qemu-devel] [PATCH V10 5/5] Add a TPM Passthrough backend driver implementation

2011-09-27 Thread Stefan Berger

>From Andreas Niederl's original posting with adaptations where necessary:

This patch is based of off version 9 of Stefan Berger's patch series
  "Qemu Trusted Platform Module (TPM) integration"
and adds a new backend driver for it.

This patch adds a passthrough backend driver for passing commands sent to the
emulated TPM device directly to a TPM device opened on the host machine.

Thus it is possible to use a hardware TPM device in a system running on QEMU,
providing the ability to access a TPM in a special state (e.g. after a Trusted
Boot).

This functionality is being used in the acTvSM Trusted Virtualization Platform
which is available on [1].

Usage example:
  qemu-system-x86_64 -tpmdev passthrough,id=tpm0,path=/dev/tpm0 \
 -device tpm-tis,tpmdev=tpm0 \
 -cdrom test.iso -boot d

Some notes about the host TPM:
The TPM needs to be enabled and activated. If that's not the case one
has to go through the BIOS/UEFI and enable and activate that TPM for TPM
commands to work as expected.
It may be necessary to boot the kernel using tpm_tis.force=1 in the boot
command line or 'modprobe tpm_tis force=1' in case of using it as a module.

Changes for v10:
 - clarified documentation
 - using /dev/tpm0 as default device if path option is not given
 - refactored code handling 'device' option into its own function
 - fixed name of structure to TPMPassthruThreadParams
 - only add tpm_passthrough_driver to collection of backends if
   it is compiled on the host

Changes for v9:
 - prefixing of all functions and variables with tpm_passthrough_
 - cleanup of all variables into a structure that is now accessed
   using TPMBackend (tb->s.tpm_pt)
 - build it on Linux machines
 - added function to test whether given device is a TPM and refuse
   startup if it is not

Regards,
Andreas Niederl, Stefan Berger

[1] http://trustedjava.sourceforge.net/

Signed-off-by: Andreas Niederl 
Signed-off-by: Stefan Berger 

---
 Makefile.target  |1 
 configure|3 
 hw/tpm_passthrough.c |  471 +++
 qemu-options.hx  |   29 +++
 tpm.c|   23 ++
 tpm.h|   34 +++
 6 files changed, 561 insertions(+)
 create mode 100644 hw/tpm_passthrough.c

Index: qemu-git.pt/Makefile.target
===
--- qemu-git.pt.orig/Makefile.target
+++ qemu-git.pt/Makefile.target
@@ -195,6 +195,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-y += memory.o
 obj-$(CONFIG_TPM) += tpm.o tpm_tis.o
+obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
 LIBS+=-lz
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
Index: qemu-git.pt/configure
===
--- qemu-git.pt.orig/configure
+++ qemu-git.pt/configure
@@ -3565,6 +3565,9 @@ fi
 
 if test "$tpm" = "yes"; then
   if test "$target_softmmu" = "yes" ; then
+if test "$linux" = "yes" ; then
+  echo "CONFIG_TPM_PASSTHROUGH=y" >> $config_target_mak
+fi
 echo "CONFIG_TPM=y" >> $config_host_mak
   fi
 fi
Index: qemu-git.pt/hw/tpm_passthrough.c
===
--- /dev/null
+++ qemu-git.pt/hw/tpm_passthrough.c
@@ -0,0 +1,471 @@
+/*
+ *  passthrough TPM driver
+ *
+ *  Copyright (c) 2010, 2011 IBM Corporation
+ *  Authors:
+ *Stefan Berger 
+ *
+ *  Copyright (C) 2011 IAIK, Graz University of Technology
+ *Author: Andreas Niederl
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "tpm.h"
+#include "hw/hw.h"
+#include "hw/tpm_tis.h"
+#include "hw/pc.h"
+
+/* #define DEBUG_TPM */
+
+/* data structures */
+
+typedef struct TPMPassthruThreadParams {
+TPMState *tpm_state;
+
+TPMRecvDataCB *recv_data_callback;
+TPMBackend *tb;
+} TPMPassthruThreadParams;
+
+struct TPMPassthruState {
+QemuThread thread;
+bool thread_terminate;
+bool thread_running;
+
+TPMPassthruThreadParams tpm_thread_params;
+
+char tpm_dev[64];
+int tpm_fd;
+bool had_startup_error;
+};
+
+#define TPM_PASSTHROUGH_DEFAULT_DEVICE "/dev/tpm0"
+
+/* borrowed from qemu-char.c */
+static int tpm_passthrough_unix_write(int fd, const uint8_t *buf, uint32_t len)
+{
+int ret, len1;
+
+len1 = len;
+wh

[Qemu-devel] [PATCH V10 0/5] Qemu Trusted Platform Module (TPM) integration

2011-09-27 Thread Stefan Berger

The following series of patches adds TPM (Trusted Platform Module) support
to Qemu. An emulator for the TIS (TPM Interface Spec) interface is
added that provides the basis for accessing a 'backend' implementing the actual
TPM functionality. The TIS emulator serves as a 'frontend' enabling for
example Linux's TPM TIS (tpm_tis) driver.

In this series I am posting a backend implementation that makes use of the
host's TPM through a passthrough driver, which on Linux is accessed
using /dev/tpm0.

v10:
 - applies to checkout of 1ce9ce6 (Sep 27)
 - addressed Michael Tsirkin's comments on v9

v9:
 - addressed Michael Tsirkin's and other reviewers' comments
 - only posting Andreas Niederl's passthrough driver as the backend driver

v8:
 - applies to checkout of f0fb8b7 (Aug 30)
 - fixing compilation error pointed out by Andreas Niederl
 - adding patch that allows to feed an initial state into the libtpms TPM
 - following memory API changes (glib) where necessary

v7:
 - applies to checkout of b9c6cbf (Aug 9)
 - measuring the modules if multiboot is used
 - coding style fixes

v6:
 - applies to checkout of 75ef849 (July 2nd)
 - some fixes and improvements to existing patches; see individual patches
 - added a patch with a null driver responding to all TPM requests with
   a response indicating failure; this backend has no dependencies and
   can alwayy be built;
 - added a patch to support the hashing of kernel, ramfs and command line
   if those were passed to Qemu using -kernel, -initrd and -append
   respectively. Measurements are taken, logged, and passed to SeaBIOS using
   the firmware interface.
 - libtpms revision 7 now requires 83kb of block storage due to having more
   NVRAM space

v5:
 - applies to checkout of 1fddfba1
 - adding support for split command line using the -tpmdev ... -device ...
   options while keeping the -tpm option
 - support for querying the device models using -tpm model=?
 - support for monitor 'info tpm'
 - adding documentation of command line options for man page and web page
 - increasing room for ACPI tables that qemu reserves to 128kb (from 64kb)
 - adding (experimental) support for block migration
 - adding (experimental) support for taking measurements when kernel,
   initrd and kernel command line are directly passed to Qemu

v4:
 - applies to checkout of d2d979c6
 - more coding style fixes
 - adding patch for supporting blob encryption (in addition to the existing
   QCoW2-level encryption)
   - this allows for graceful termination of a migration if the target
 is detected to have a wrong key
   - tested with big and little endian hosts
 - main thread releases mutex while checking for work to do on behalf of
   backend
 - introducing file locking (fcntl) on the block layer for serializing access
   to shared (QCoW2) files (used during migration)

v3:
 - Building a null driver at patch 5/8 that responds to all requests
   with an error response; subsequently this driver is transformed to the
   libtpms-based driver for real TPM functionality
 - Reworked the threading; dropped the patch for qemu_thread_join; the
   main thread synchronizing with the TPM thread termination may need
   to write data to the block storage while waiting for the thread to 
   terminate; did not previously show a problem but is safer
 - A lot of testing based on recent git checkout 4b4a72e5 (4/10):
   - migration of i686 VM from x86_64 host to i686 host to ppc64 host while
 running tests inside the VM
   - tests with S3 suspend/resume
   - tests with snapshots
   - multiple-hour tests with VM suspend/resume (using virsh save/restore)
 while running a TPM test suite inside the VM
   All tests passed; [not all of them were done on the ppc64 host]

v2:
 - splitting some of the patches into smaller ones for easier review
 - fixes in individual patches

Regards,
Stefan

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Avi Kivity

On 09/27/2011 05:50 PM, Jan Kiszka wrote:

On 2011-09-27 16:48, Avi Kivity wrote:
>  On 09/27/2011 05:45 PM, Jan Kiszka wrote:
>>  I don't disagree that there is still room for improving the existing
>>  interface, generalizing to qemu_event. But that's not in the scope of
>>  this patch.
>>
>
>  Why not use event_notify.c?

It doesn't solve the wrapping issue, it mandates eventfd support.

We can add pipe fallbacks too (though if it was meant to use with vhost, 
that's not what we want).

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Anthony Liguori

On 09/27/2011 09:54 AM, Avi Kivity wrote:

On 09/27/2011 05:50 PM, Jan Kiszka wrote:

On 2011-09-27 16:48, Avi Kivity wrote:
> On 09/27/2011 05:45 PM, Jan Kiszka wrote:
>> I don't disagree that there is still room for improving the existing
>> interface, generalizing to qemu_event. But that's not in the scope of
>> this patch.
>>
>
> Why not use event_notify.c?

It doesn't solve the wrapping issue, it mandates eventfd support.

We can add pipe fallbacks too (though if it was meant to use with vhost, that's
not what we want).

vhost cannot exist on a kernel that doesn't support eventfd so I don't think we 
need to worry about it.

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO

2011-09-27 Thread Jan Kiszka

On 2011-09-27 16:54, Avi Kivity wrote:
> On 09/27/2011 05:50 PM, Jan Kiszka wrote:
>> On 2011-09-27 16:48, Avi Kivity wrote:
>>>  On 09/27/2011 05:45 PM, Jan Kiszka wrote:
  I don't disagree that there is still room for improving the existing
  interface, generalizing to qemu_event. But that's not in the scope of
  this patch.

>>>
>>>  Why not use event_notify.c?
>>
>> It doesn't solve the wrapping issue, it mandates eventfd support.
>>
> 
> We can add pipe fallbacks too (though if it was meant to use with vhost, 
> that's not what we want).

Not a practical issue due to the dependency on much more recent vhost.

So EventNotifier will have to be migrated over a generic solution as
well. Again, that's food for additional patches.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Try to integrate the Xen PCI Passthrough code into linux: have pci_regs conflict with libpci.

2011-09-27 Thread Anthony PERARD

Hi,

I'm trying to integrate the Xen PCI Passthrough code into Qemu. But we
use libpci, and it's not friendly with pci_regs.h.

So can I replace pci_regs by the libpci one?
Should I avoid to include both? (by having a "hook" the libpci functions)
Or do you have any other suggestions?

Thanks,
Regards,

-- 
Anthony PERARD

Re: [Qemu-devel] [PATCH 5/7] target-mips:Support for Cavium specific instructions

2011-09-27 Thread Richard Henderson

On 09/26/2011 09:17 PM, kha...@kics.edu.pk wrote:
> +switch (opc) {
> +case OPC_SEQI:
> +tcg_gen_setcondi_tl(TCG_COND_LT, cpu_gpr[rt], t0, 1);
> +opn = "seqi";
> +break;
> +case OPC_SNEI:
> +tcg_gen_setcondi_tl(TCG_COND_GT, cpu_gpr[rt], t0, 0);
> +opn = "snei";
> +break;
> +}

You didn't change anything wrt my comments from the last round.


r~

Re: [Qemu-devel] [PATCH] event_notifier: move to top-level directory

2011-09-27 Thread Stefan Hajnoczi

On Tue, Sep 27, 2011 at 3:26 PM, Avi Kivity  wrote:
> Has no business in hw/.
>
> Signed-off-by: Avi Kivity 
> ---
>  hw/event_notifier.c => event_notifier.c |    1 -
>  hw/event_notifier.h => event_notifier.h |    0
>  2 files changed, 0 insertions(+), 1 deletions(-)
>  rename hw/event_notifier.c => event_notifier.c (98%)
>  rename hw/event_notifier.h => event_notifier.h (100%)

Yay.  Now perhaps we can kill qemu_eventfd(), whose users are
typically poking around with write(2) and read(2) when really they
could use the high-level event_notifier.h API.

Stefan

Re: [Qemu-devel] Try to integrate the Xen PCI Passthrough code into linux: have pci_regs conflict with libpci.

2011-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2011 at 04:02:23PM +0100, Anthony PERARD wrote:
> Hi,
> 
> I'm trying to integrate the Xen PCI Passthrough code into Qemu. But we
> use libpci, and it's not friendly with pci_regs.h.
> 
> So can I replace pci_regs by the libpci one?

I prefer sticking to pci_regs in linux.

> Should I avoid to include both? (by having a "hook" the libpci functions)
> Or do you have any other suggestions?
> 
> Thanks,
> Regards,

Can you avoid libpci? It was very useful before sysfs, but
on modern systems there isn't much that it does.

-- 
MST

Re: [Qemu-devel] [PATCH] target-i386: Remove redundant word mask in port out instructions

2011-09-27 Thread Richard Henderson

On 09/26/2011 10:20 AM, Jan Kiszka wrote:
> T0 was already masked to 16 bits when loading it.
> 
> Signed-off-by: Jan Kiszka 

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH] event_notifier: move to top-level directory

2011-09-27 Thread Jan Kiszka

On 2011-09-27 17:17, Stefan Hajnoczi wrote:
> On Tue, Sep 27, 2011 at 3:26 PM, Avi Kivity  wrote:
>> Has no business in hw/.
>>
>> Signed-off-by: Avi Kivity 
>> ---
>>  hw/event_notifier.c => event_notifier.c |1 -
>>  hw/event_notifier.h => event_notifier.h |0
>>  2 files changed, 0 insertions(+), 1 deletions(-)
>>  rename hw/event_notifier.c => event_notifier.c (98%)
>>  rename hw/event_notifier.h => event_notifier.h (100%)
> 
> Yay.  Now perhaps we can kill qemu_eventfd(), whose users are
> typically poking around with write(2) and read(2) when really they
> could use the high-level event_notifier.h API.

EventNotifiers will have to be superseded by something more generic
first. It's not fully covering the use cases of cpus.c and
posix-aio-compat.c.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 0/6] Alpha system emulation, v11

2011-09-27 Thread Richard Henderson

Ping.


r~

On 09/22/2011 09:04 AM, Edgar E. Iglesias wrote:
> On Thu, Sep 22, 2011 at 08:30:43AM -0700, Richard Henderson wrote:
>> Changes v10 - v11:
>>   * Fixes for icount, based on feedback from Peter and Edgar.
>>   * Re-based against master.
>>
>> The patch set is also available at
>>
>>   git://repo.or.cz/qemu/rth.git axp-system-7
>>
>>
>> r~
>>
>>
>> Richard Henderson (6):
>>   target-alpha: Honor icount for RPCC instruction.
>>   target-alpha: Add custom PALcode image for CLIPPER emulation.
>>   target-alpha: Add CLIPPER emulation.
>>   target-alpha: Implement WAIT IPR.
>>   target-alpha: Implement HALT IPR.
>>   target-alpha: Add high-resolution access to wall clock and an alarm.
> 
> Thanks
> 
> Ack on everything but 2/6. Someone else can maybe check if the submodulery
> is OK..
> 
> Acked-by: Edgar E. Iglesias

Re: [Qemu-devel] [PATCH 2/2] e1000: CTRL.RST emulation

2011-09-27 Thread Andy Gospodarek

On Tue, Sep 27, 2011 at 8:58 AM, Michael S. Tsirkin  wrote:
> e1000 spec says CTRL.RST write should have the same effect
> as bus reset, except that is preserves PCI Config.
> Reset device registers and interrupts.
>
> Fix suggested by Andy Gospodarek 
> Similar fix proposed by Anthony PERARD 
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/e1000.c |    9 +++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 87a1104..b51e089 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -241,8 +241,13 @@ static void e1000_reset(void *opaque)
>  static void
>  set_ctrl(E1000State *s, int index, uint32_t val)
>  {
> -    /* RST is self clearing */
> -    s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
> +    if (val & E1000_CTRL_RST) {
> +        e1000_reset(s);
> +        qemu_set_irq(s->dev.irq[0], 0);
> +        return;
> +    }
> +
> +    s->mac_reg[CTRL] = val;
>  }
>
>  static void
> --
> 1.7.5.53.gc233e
>

Looks good to me.  Thanks for following up with this, Michael.

[Qemu-devel] [PATCH V2] Add stdio char device on windows

2011-09-27 Thread Fabien Chouteau

Simple implementation of an stdio char device on Windows.

Signed-off-by: Fabien Chouteau 
---
 qemu-char.c |  199 ++-
 1 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 09d2309..46acf1c 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -538,6 +538,9 @@ int send_all(int fd, const void *_buf, int len1)
 }
 #endif /* !_WIN32 */
 
+#define STDIO_MAX_CLIENTS 1
+static int stdio_nb_clients;
+
 #ifndef _WIN32
 
 typedef struct {
@@ -545,8 +548,6 @@ typedef struct {
 int max_size;
 } FDCharDriver;
 
-#define STDIO_MAX_CLIENTS 1
-static int stdio_nb_clients = 0;
 
 static int fd_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
@@ -1451,6 +1452,8 @@ static int qemu_chr_open_pp(QemuOpts *opts, 
CharDriverState **_chr)
 
 #else /* _WIN32 */
 
+static CharDriverState *stdio_clients[STDIO_MAX_CLIENTS];
+
 typedef struct {
 int max_size;
 HANDLE hcom, hrecv, hsend;
@@ -1809,6 +1812,197 @@ static int qemu_chr_open_win_file_out(QemuOpts *opts, 
CharDriverState **_chr)
 
 return qemu_chr_open_win_file(fd_out, _chr);
 }
+
+static int win_stdio_write(CharDriverState *chr, const uint8_t *buf, int len)
+{
+HANDLE *hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
+DWORD   dwSize;
+int len1;
+
+len1 = len;
+
+while (len1 > 0) {
+if (!WriteFile(hStdOut, buf, len1, &dwSize, NULL)) {
+break;
+}
+buf  += dwSize;
+len1 -= dwSize;
+}
+
+return len - len1;
+}
+
+static HANDLE *hStdIn;
+
+static void win_stdio_wait_func(void *opaque)
+{
+CharDriverState *chr = opaque;
+INPUT_RECORD buf[4];
+int  ret;
+DWORDdwSize;
+int  i;
+
+ret = ReadConsoleInput(hStdIn, buf, sizeof(buf)/sizeof(*buf), &dwSize);
+
+if (!ret) {
+/* Avoid error storm */
+qemu_del_wait_object(hStdIn, NULL, NULL);
+return;
+}
+
+for (i = 0; i < dwSize; i++) {
+KEY_EVENT_RECORD *kev = &buf[i].Event.KeyEvent;
+
+if (buf[i].EventType == KEY_EVENT && kev->bKeyDown) {
+int j;
+if (kev->uChar.AsciiChar != 0) {
+for (j = 0; j < kev->wRepeatCount; j++)
+if (qemu_chr_be_can_write(chr)) {
+uint8_t c = kev->uChar.AsciiChar;
+qemu_chr_be_write(chr, &c, 1);
+}
+}
+}
+}
+}
+
+static HANDLE  hInputReadyEvent;
+static HANDLE  hInputDoneEvent;
+static uint8_t win_stdio_buf;
+
+static DWORD WINAPI win_stdio_thread(LPVOID param)
+{
+int   ret;
+DWORD dwSize;
+
+while (1) {
+
+/* Wait for one byte */
+ret = ReadFile(hStdIn, &win_stdio_buf, 1, &dwSize, NULL);
+
+/* Exit in case of error, continue if nothing read */
+if (!ret) {
+break;
+}
+if (!dwSize) {
+continue;
+}
+
+/* Some terminal emulator returns \r\n for Enter, just pass \n */
+if (win_stdio_buf == '\r') {
+continue;
+}
+
+/* Signal the main thread and wait until the byte was eaten */
+if (!SetEvent(hInputReadyEvent)) {
+break;
+}
+if (WaitForSingleObject(hInputDoneEvent, INFINITE) != WAIT_OBJECT_0) {
+break;
+}
+}
+
+qemu_del_wait_object(hInputReadyEvent, NULL, NULL);
+return 0;
+}
+
+static void win_stdio_thread_wait_func(void *opaque)
+{
+CharDriverState *chr = opaque;
+
+if (qemu_chr_be_can_write(chr)) {
+qemu_chr_be_write(chr, &win_stdio_buf, 1);
+}
+
+SetEvent(hInputDoneEvent);
+}
+
+static void qemu_chr_set_echo_win_stdio(CharDriverState *chr, bool echo)
+{
+DWORD dwMode = 0;
+
+GetConsoleMode(hStdIn, &dwMode);
+
+if (echo) {
+SetConsoleMode(hStdIn, dwMode | ENABLE_ECHO_INPUT);
+} else {
+SetConsoleMode(hStdIn, dwMode & ~ENABLE_ECHO_INPUT);
+}
+}
+
+static int qemu_chr_open_win_stdio(QemuOpts *opts,
+CharDriverState **_chr)
+{
+CharDriverState *chr;
+DWORDdwMode;
+int  is_console = 0;
+
+hStdIn = GetStdHandle(STD_INPUT_HANDLE);
+if (hStdIn == INVALID_HANDLE_VALUE) {
+fprintf(stderr, "cannot open stdio: invalid handle\n");
+exit(1);
+}
+
+is_console = GetConsoleMode(hStdIn, &dwMode) != 0;
+
+if (stdio_nb_clients >= STDIO_MAX_CLIENTS
+|| ((display_type != DT_NOGRAPHIC) && (stdio_nb_clients != 0))) {
+return -EIO;
+}
+
+chr = g_malloc0(sizeof(CharDriverState));
+if (!chr) {
+return -ENOMEM;
+}
+
+chr->chr_write = win_stdio_write;
+
+if (stdio_nb_clients == 0) {
+if (is_console) {
+if (qemu_add_wait_object(hStdIn,
+ win_stdio_wait_func, chr)) {
+fprintf(stderr, "qemu_add

Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code

2011-09-27 Thread Blue Swirl

On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood  wrote:
> On 09/24/2011 05:00 AM, Alexander Graf wrote:
>> On 24.09.2011, at 10:44, Blue Swirl wrote:
>>> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf  wrote:
 On 24.09.2011, at 09:41, Blue Swirl wrote:
> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood  
> wrote:
>> The goal with the spin table stuff, suboptimal as it is, was something
>> that would work on any powerpc implementation.  Other
>> implementation-specific release mechanisms are allowed, and are
>> indicated by a property in the cpu node, but only if the loader knows
>> that the OS supports it.
>>
>>> IIUC the spec that includes these bits is not finalized yet. It is 
>>> however in use on all u-boot versions for e500 that I'm aware of and 
>>> the method Linux uses to bring up secondary CPUs.
>>
>> It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
>> just released which clarifies some things such as WIMG.
>>
>>> Stuart / Scott, do you have any pointers to documentation where the 
>>> spinning is explained?
>>
>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>
> Chapter 5.5.2 describes the table. This is actually an interface
> between OS and Open Firmware, obviously there can't be a real hardware
> device that magically loads r3 etc.
>
> Not Open Firmware, but rather an ePAPR-compliant loader.

'boot program to client program interface definition'.

> The device method would break abstraction layers,
>
> Which abstraction layers?

QEMU system emulation emulates hardware, not software. Hardware
devices don't touch CPU registers.

> it's much like
> vmport stuff in x86. Using a hypercall would be a small improvement.
> Instead it should be possible to implement a small boot ROM which puts
> the secondary CPUs into managed halt state without spinning, then the
> boot CPU could send an IPI to a halted CPU to wake them up based on
> the spin table, just like real HW would do.
>
> The spin table, with no IPI or halt state, is what real HW does (or
> rather, what software does on real HW) today.  It's ugly and inefficient
> but it should work everywhere.  Anything else would be dependent on a
> specific HW implementation.

Yes. Hardware doesn't ever implement the spin table.

> On Sparc32 OpenBIOS this
> is something like a few lines of ASM on both sides.

 That sounds pretty close to what I had implemented in v1. Back then the 
 only comment was to do it using this method from Scott.
>
> I had some comments on the actual v1 implementation as well. :-)
>
 So we have the choice between having code inside the guest that
 spins, maybe even only checks every x ms, by programming a timer,
 or we can try to make an event out of the memory write. V1 was
 the former, v2 (this one) is the latter. This version performs a
 lot better and is easier to understand.
>>>
>>> The abstraction layers should not be broken lightly, I suppose some
>>> performance or laziness^Wlocal optimization reasons were behind vmport
>>> design too. The ideal way to solve this could be to detect a spinning
>>> CPU and optimize that for all architectures, that could be tricky
>>> though (if a CPU remains in the same TB for extended periods, inspect
>>> the TB: if it performs a loop with a single load instruction, replace
>>> the load by a special wait operation for any memory stores to that
>>> page).
>
> How's that going to work with KVM?
>
>> In fact, the whole kernel loading way we go today is pretty much
>> wrong. We should rather do it similar to OpenBIOS where firmware
>> always loads and then pulls the kernel from QEMU using a PV
>> interface. At that point, we would have to implement such an
>> optimization as you suggest. Or implement a hypercall :).
>
> I think the current approach is more usable for most purposes.  If you
> start U-Boot instead of a kernel, how do pass information on from the
> user (kernel, rfs, etc)?  Require the user to create flash images[1]?

No, for example OpenBIOS gets the kernel command line from fw_cfg device.

> Maybe that's a useful mode of operation in some cases, but I don't think
> we should be slavishly bound to it.  Think of the current approach as
> something between whole-system and userspace emulation.

This is similar to ARM, M68k and Xtensa semi-hosting mode, but not at
kernel level but lower. Perhaps this mode should be enabled with
-semihosting flag or a new flag. Then the bare metal version could be
run without the flag.

> Where does the device tree come from?  How do you tell the guest about
> what devices it has, especially in virtualization scenarios with non-PCI
> passthrough devices, or custom qdev instantiations?
>
>> But at least we'd always be running the same guest software stack.
>
> No we wouldn't.  Any U-Boot that runs under QEMU would have to be
> heavily modified, unles

Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code

2011-09-27 Thread Alexander Graf


On 27.09.2011, at 17:50, Blue Swirl wrote:

> On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood  wrote:
>> On 09/24/2011 05:00 AM, Alexander Graf wrote:
>>> On 24.09.2011, at 10:44, Blue Swirl wrote:
 On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf  wrote:
> On 24.09.2011, at 09:41, Blue Swirl wrote:
>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood  
>> wrote:
>>> The goal with the spin table stuff, suboptimal as it is, was something
>>> that would work on any powerpc implementation.  Other
>>> implementation-specific release mechanisms are allowed, and are
>>> indicated by a property in the cpu node, but only if the loader knows
>>> that the OS supports it.
>>> 
 IIUC the spec that includes these bits is not finalized yet. It is 
 however in use on all u-boot versions for e500 that I'm aware of and 
 the method Linux uses to bring up secondary CPUs.
>>> 
>>> It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
>>> just released which clarifies some things such as WIMG.
>>> 
 Stuart / Scott, do you have any pointers to documentation where the 
 spinning is explained?
>>> 
>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>> 
>> Chapter 5.5.2 describes the table. This is actually an interface
>> between OS and Open Firmware, obviously there can't be a real hardware
>> device that magically loads r3 etc.
>> 
>> Not Open Firmware, but rather an ePAPR-compliant loader.
> 
> 'boot program to client program interface definition'.
> 
>> The device method would break abstraction layers,
>> 
>> Which abstraction layers?
> 
> QEMU system emulation emulates hardware, not software. Hardware
> devices don't touch CPU registers.

The great part about this emulated device is that it's basically guest software 
running in host context. To the guest, it's not a device in the ordinary sense, 
such as vmport, but rather the same as software running on another core, just 
that the other core isn't running any software.

Sure, if you consider this a device, it does break abstraction layers. Just 
consider it as host running guest code, then it makes sense :).

> 
>> it's much like
>> vmport stuff in x86. Using a hypercall would be a small improvement.
>> Instead it should be possible to implement a small boot ROM which puts
>> the secondary CPUs into managed halt state without spinning, then the
>> boot CPU could send an IPI to a halted CPU to wake them up based on
>> the spin table, just like real HW would do.
>> 
>> The spin table, with no IPI or halt state, is what real HW does (or
>> rather, what software does on real HW) today.  It's ugly and inefficient
>> but it should work everywhere.  Anything else would be dependent on a
>> specific HW implementation.
> 
> Yes. Hardware doesn't ever implement the spin table.
> 
>> On Sparc32 OpenBIOS this
>> is something like a few lines of ASM on both sides.
> 
> That sounds pretty close to what I had implemented in v1. Back then the 
> only comment was to do it using this method from Scott.
>> 
>> I had some comments on the actual v1 implementation as well. :-)
>> 
> So we have the choice between having code inside the guest that
> spins, maybe even only checks every x ms, by programming a timer,
> or we can try to make an event out of the memory write. V1 was
> the former, v2 (this one) is the latter. This version performs a
> lot better and is easier to understand.
 
 The abstraction layers should not be broken lightly, I suppose some
 performance or laziness^Wlocal optimization reasons were behind vmport
 design too. The ideal way to solve this could be to detect a spinning
 CPU and optimize that for all architectures, that could be tricky
 though (if a CPU remains in the same TB for extended periods, inspect
 the TB: if it performs a loop with a single load instruction, replace
 the load by a special wait operation for any memory stores to that
 page).
>> 
>> How's that going to work with KVM?
>> 
>>> In fact, the whole kernel loading way we go today is pretty much
>>> wrong. We should rather do it similar to OpenBIOS where firmware
>>> always loads and then pulls the kernel from QEMU using a PV
>>> interface. At that point, we would have to implement such an
>>> optimization as you suggest. Or implement a hypercall :).
>> 
>> I think the current approach is more usable for most purposes.  If you
>> start U-Boot instead of a kernel, how do pass information on from the
>> user (kernel, rfs, etc)?  Require the user to create flash images[1]?
> 
> No, for example OpenBIOS gets the kernel command line from fw_cfg device.
> 
>> Maybe that's a useful mode of operation in some cases, but I don't think
>> we should be slavishly bound to it.  Think of the current approach as
>> something between whole-system and userspace emulatio

[Qemu-devel] [PATCH] Move filedescriptor parsing code from net.c into qemu_parse_fd()

2011-09-27 Thread Stefan Berger

Move the parsing of a filedescriptor into a common function qemu_parse_fd().
Have the code in net.c call this function.

Signed-off-by: Stefan Berger 

---
 net.c   |8 ++--
 qemu-char.c |   12 
 qemu-char.h |2 ++
 3 files changed, 16 insertions(+), 6 deletions(-)

Index: qemu-git.pt/net.c
===
--- qemu-git.pt.orig/net.c
+++ qemu-git.pt/net.c
@@ -36,6 +36,7 @@
 #include "qemu_socket.h"
 #include "hw/qdev.h"
 #include "iov.h"
+#include "qemu-char.h"
 
 static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
@@ -733,12 +734,7 @@ int net_handle_fd_param(Monitor *mon, co
 return -1;
 }
 } else {
-char *endptr = NULL;
-
-fd = strtol(param, &endptr, 10);
-if (*endptr || (fd == 0 && param == endptr)) {
-return -1;
-}
+fd = qemu_parse_fd(param);
 }
 
 return fd;
Index: qemu-git.pt/qemu-char.c
===
--- qemu-git.pt.orig/qemu-char.c
+++ qemu-git.pt/qemu-char.c
@@ -2692,3 +2692,15 @@ CharDriverState *qemu_chr_find(const cha
 }
 return NULL;
 }
+
+int qemu_parse_fd(const char *param)
+{
+int fd;
+char *endptr = NULL;
+
+fd = strtol(param, &endptr, 10);
+if (*endptr || (fd == 0 && param == endptr)) {
+return -1;
+}
+return fd;
+}
Index: qemu-git.pt/qemu-char.h
===
--- qemu-git.pt.orig/qemu-char.h
+++ qemu-git.pt/qemu-char.h
@@ -248,4 +248,6 @@ int qemu_set_fd_handler(int fd,
 IOHandler *fd_read,
 IOHandler *fd_write,
 void *opaque);
+
+int qemu_parse_fd(const char *param);
 #endif

Re: [Qemu-devel] [PATCH] event_notifier: move to top-level directory

2011-09-27 Thread Anthony Liguori


On 09/27/2011 10:28 AM, Jan Kiszka wrote:

On 2011-09-27 17:17, Stefan Hajnoczi wrote:

On Tue, Sep 27, 2011 at 3:26 PM, Avi Kivity  wrote:

Has no business in hw/.

Signed-off-by: Avi Kivity
---
  hw/event_notifier.c =>  event_notifier.c |1 -
  hw/event_notifier.h =>  event_notifier.h |0
  2 files changed, 0 insertions(+), 1 deletions(-)
  rename hw/event_notifier.c =>  event_notifier.c (98%)
  rename hw/event_notifier.h =>  event_notifier.h (100%)


Yay.  Now perhaps we can kill qemu_eventfd(), whose users are
typically poking around with write(2) and read(2) when really they
could use the high-level event_notifier.h API.


EventNotifiers will have to be superseded by something more generic
first. It's not fully covering the use cases of cpus.c and
posix-aio-compat.c.


Actually, for posix-aio, we can just switch to using g_idle_add().  g_idle_add() 
uses g_source_attach which is thread safe.  g_idle_add() gives you a thread safe 
mechanism to defer a piece of work to the main loop which is really what we want 
here.


This can actually be made to work with sync I/O emulation too by having another 
GMainLoop in the sync I/O loop although I thought I recalled a patch series to 
remove that stuff.


Kevin/Stefan, what's the plans for sync I/O emulation?

See untested patch below.

Regards,

Anthony Liguori



Jan



>From cf036a192f09ea76d89648b83ec84a4226d14172 Mon Sep 17 00:00:00 2001
From: Anthony Liguori 
Date: Tue, 27 Sep 2011 11:01:51 -0500
Subject: [PATCH] posix-aio-compat: use g_idle_add()

Signed-off-by: Anthony Liguori 
---
 posix-aio-compat.c |   51 +++
 1 files changed, 7 insertions(+), 44 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index d3c1174..1f746be 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -52,7 +52,6 @@ struct qemu_paiocb {
 };
 
 typedef struct PosixAioState {
-int rfd, wfd;
 struct qemu_paiocb *first_aio;
 } PosixAioState;
 
@@ -466,7 +465,7 @@ static int qemu_paio_error(struct qemu_paiocb *aiocb)
 return ret;
 }
 
-static int posix_aio_process_queue(void *opaque)
+static gboolean posix_aio_process_queue(void *opaque)
 {
 PosixAioState *s = opaque;
 struct qemu_paiocb *acb, **pacb;
@@ -477,8 +476,9 @@ static int posix_aio_process_queue(void *opaque)
 pacb = &s->first_aio;
 for(;;) {
 acb = *pacb;
-if (!acb)
-return result;
+if (!acb) {
+goto out;
+}
 
 ret = qemu_paio_error(acb);
 if (ret == ECANCELED) {
@@ -513,27 +513,8 @@ static int posix_aio_process_queue(void *opaque)
 }
 }
 
-return result;
-}
-
-static void posix_aio_read(void *opaque)
-{
-PosixAioState *s = opaque;
-ssize_t len;
-
-/* read all bytes from signal pipe */
-for (;;) {
-char bytes[16];
-
-len = read(s->rfd, bytes, sizeof(bytes));
-if (len == -1 && errno == EINTR)
-continue; /* try again */
-if (len == sizeof(bytes))
-continue; /* more to read */
-break;
-}
-
-posix_aio_process_queue(s);
+out:
+return FALSE;
 }
 
 static int posix_aio_flush(void *opaque)
@@ -546,12 +527,7 @@ static PosixAioState *posix_aio_state;
 
 static void posix_aio_notify_event(void)
 {
-char byte = 0;
-ssize_t ret;
-
-ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
-if (ret < 0 && errno != EAGAIN)
-die("write()");
+g_idle_add(posix_aio_process_queue, posix_aio_state);
 }
 
 static void paio_remove(struct qemu_paiocb *acb)
@@ -665,19 +641,6 @@ int paio_init(void)
 s = g_malloc(sizeof(PosixAioState));
 
 s->first_aio = NULL;
-if (qemu_pipe(fds) == -1) {
-fprintf(stderr, "failed to create pipe\n");
-return -1;
-}
-
-s->rfd = fds[0];
-s->wfd = fds[1];
-
-fcntl(s->rfd, F_SETFL, O_NONBLOCK);
-fcntl(s->wfd, F_SETFL, O_NONBLOCK);
-
-qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush,
-posix_aio_process_queue, s);
 
 ret = pthread_attr_init(&attr);
 if (ret)
-- 
1.7.4.1

Re: [Qemu-devel] [Xen-devel] Re: Try to integrate the Xen PCI Passthrough code into linux: have pci_regs conflict with libpci.

2011-09-27 Thread Anthony PERARD

On Tue, Sep 27, 2011 at 16:20, Michael S. Tsirkin  wrote:
> On Tue, Sep 27, 2011 at 04:02:23PM +0100, Anthony PERARD wrote:
>> Hi,
>>
>> I'm trying to integrate the Xen PCI Passthrough code into Qemu. But we
>> use libpci, and it's not friendly with pci_regs.h.
>>
>> So can I replace pci_regs by the libpci one?
>
> I prefer sticking to pci_regs in linux.

Fair enough.

>> Should I avoid to include both? (by having a "hook" the libpci functions)
>> Or do you have any other suggestions?
>>
>> Thanks,
>> Regards,
>
> Can you avoid libpci? It was very useful before sysfs, but
> on modern systems there isn't much that it does.

I was thinking to keep any compatibility with a *BSD system, but since
there is one function that assume the existance of the sysfs, I will
just "rewrote" the needed functions and remove the usage of libpci.

Thanks,

-- 
Anthony PERARD

[Qemu-devel] [PATCH 01/12] qemu-timer: remove active_timers array

2011-09-27 Thread Paolo Bonzini

Embed the list in the QEMUClock instead.

Signed-off-by: Paolo Bonzini 
---
 qemu-timer.c |   59 +++--
 1 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index ad1fc8b..acf7a15 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -134,6 +134,7 @@ struct QEMUClock {
 int enabled;
 
 QEMUTimer *warp_timer;
+QEMUTimer *active_timers;
 
 NotifierList reset_notifiers;
 int64_t last;
@@ -352,14 +353,10 @@ next:
 }
 }
 
-#define QEMU_NUM_CLOCKS 3
-
 QEMUClock *rt_clock;
 QEMUClock *vm_clock;
 QEMUClock *host_clock;
 
-static QEMUTimer *active_timers[QEMU_NUM_CLOCKS];
-
 static QEMUClock *qemu_new_clock(int type)
 {
 QEMUClock *clock;
@@ -403,7 +400,7 @@ static void icount_warp_rt(void *opaque)
 int64_t delta = cur_time - cur_icount;
 qemu_icount_bias += MIN(warp_delta, delta);
 }
-if (qemu_timer_expired(active_timers[QEMU_CLOCK_VIRTUAL],
+if (qemu_timer_expired(vm_clock->active_timers,
qemu_get_clock_ns(vm_clock))) {
 qemu_notify_event();
 }
@@ -434,7 +431,7 @@ void qemu_clock_warp(QEMUClock *clock)
  * the earliest vm_clock timer.
  */
 icount_warp_rt(NULL);
-if (!all_cpu_threads_idle() || !active_timers[clock->type]) {
+if (!all_cpu_threads_idle() || !clock->active_timers) {
 qemu_del_timer(clock->warp_timer);
 return;
 }
@@ -489,7 +486,7 @@ void qemu_del_timer(QEMUTimer *ts)
 
 /* NOTE: this code must be signal safe because
qemu_timer_expired() can be called from a signal. */
-pt = &active_timers[ts->clock->type];
+pt = &ts->clock->active_timers;
 for(;;) {
 t = *pt;
 if (!t)
@@ -513,7 +510,7 @@ static void qemu_mod_timer_ns(QEMUTimer *ts, int64_t 
expire_time)
 /* add the timer in the sorted list */
 /* NOTE: this code must be signal safe because
qemu_timer_expired() can be called from a signal. */
-pt = &active_timers[ts->clock->type];
+pt = &ts->clock->active_timers;
 for(;;) {
 t = *pt;
 if (!qemu_timer_expired_ns(t, expire_time)) {
@@ -526,7 +523,7 @@ static void qemu_mod_timer_ns(QEMUTimer *ts, int64_t 
expire_time)
 *pt = ts;
 
 /* Rearm if necessary  */
-if (pt == &active_timers[ts->clock->type]) {
+if (pt == &ts->clock->active_timers) {
 if (!alarm_timer->pending) {
 qemu_rearm_alarm_timer(alarm_timer);
 }
@@ -548,7 +545,7 @@ void qemu_mod_timer(QEMUTimer *ts, int64_t expire_time)
 int qemu_timer_pending(QEMUTimer *ts)
 {
 QEMUTimer *t;
-for(t = active_timers[ts->clock->type]; t != NULL; t = t->next) {
+for (t = ts->clock->active_timers; t != NULL; t = t->next) {
 if (t == ts)
 return 1;
 }
@@ -569,7 +566,7 @@ static void qemu_run_timers(QEMUClock *clock)
 return;
 
 current_time = qemu_get_clock_ns(clock);
-ptimer_head = &active_timers[clock->type];
+ptimer_head = &clock->active_timers;
 for(;;) {
 ts = *ptimer_head;
 if (!qemu_timer_expired_ns(ts, current_time)) {
@@ -773,8 +770,8 @@ int64_t qemu_next_icount_deadline(void)
 int64_t delta = INT32_MAX;
 
 assert(use_icount);
-if (active_timers[QEMU_CLOCK_VIRTUAL]) {
-delta = active_timers[QEMU_CLOCK_VIRTUAL]->expire_time -
+if (vm_clock->active_timers) {
+delta = vm_clock->active_timers->expire_time -
  qemu_get_clock_ns(vm_clock);
 }
 
@@ -789,20 +786,20 @@ static int64_t qemu_next_alarm_deadline(void)
 int64_t delta;
 int64_t rtdelta;
 
-if (!use_icount && active_timers[QEMU_CLOCK_VIRTUAL]) {
-delta = active_timers[QEMU_CLOCK_VIRTUAL]->expire_time -
+if (!use_icount && vm_clock->active_timers) {
+delta = vm_clock->active_timers->expire_time -
  qemu_get_clock_ns(vm_clock);
 } else {
 delta = INT32_MAX;
 }
-if (active_timers[QEMU_CLOCK_HOST]) {
-int64_t hdelta = active_timers[QEMU_CLOCK_HOST]->expire_time -
+if (host_clock->active_timers) {
+int64_t hdelta = host_clock->active_timers->expire_time -
  qemu_get_clock_ns(host_clock);
 if (hdelta < delta)
 delta = hdelta;
 }
-if (active_timers[QEMU_CLOCK_REALTIME]) {
-rtdelta = (active_timers[QEMU_CLOCK_REALTIME]->expire_time -
+if (rt_clock->active_timers) {
+rtdelta = (rt_clock->active_timers->expire_time -
  qemu_get_clock_ns(rt_clock));
 if (rtdelta < delta)
 delta = rtdelta;
@@ -871,9 +868,9 @@ static void dynticks_rearm_timer(struct qemu_alarm_timer *t)
 int64_t current_ns;
 
 assert(alarm_has_dynticks(t));
-if (!active_timers[QEMU_CLOCK_REALTIME] &&
-!active_timers[QEMU_CLOCK_VIRTUAL] &&
-!active_timers[QEMU_CLOCK_HOST])
+if (!rt_clock->active_timers &&
+!vm_clo

Re: [Qemu-devel] [PATCH 1/5] qemu-char: make qemu_chr_event public

2011-09-27 Thread Hans de Goede


Hi,

On 08/12/2011 03:57 PM, Anthony Liguori wrote:

On 08/11/2011 07:25 AM, Hans de Goede wrote:

Make qemu_chr_event public so that it can be used by chardev code
which lives outside of qemu-char.c


Normally, qemu_chr_generic_open() would be used to do this. Of course, there is 
no generic_close().

Are you sure you don't need the BH indirection?


A bit of a late reply (I was / am still waiting to see how the new
improved chardev code ends up). Things seem to work fine without
the BH indirection for all the spice cases I've tested (agent and usbredir).

But it might indeed be a good idea to keep the BH indirection, so we
would need some way to have the BH indirection for close to, options:

1) DIY in spice-qemu-char.c
2) Add a generic_close function

I would prefer 2, what do you think?

Thanks & Regards,

Hans

Re: [Qemu-devel] [PATCH] event_notifier: move to top-level directory

2011-09-27 Thread Paolo Bonzini


On 09/27/2011 06:05 PM, Anthony Liguori wrote:

Actually, for posix-aio, we can just switch to using g_idle_add().
g_idle_add() uses g_source_attach which is thread safe.  g_idle_add()
 gives you a thread safe mechanism to defer a piece of work to the
main loop which is really what we want here.


For that, a bottom half would also do (apart that I am not sure it is
async-safe with TCG).  In fact, that would make sense since all of
posix_aio_process_queue could become a bottom half.

But the problem to be solved here is waking up qemu_aio_wait, and
scheduling bottom halves currently wakes up only the vl.c main loop.


This can actually be made to work with sync I/O emulation too by
having another GMainLoop in the sync I/O loop although I thought I
recalled a patch series to remove that stuff.


... which stuff? :)  Another GMainLoop in the sync I/O loop is 
problematic for two reasons: 1) the sync I/O loop does not relinquish 
the I/O thread mutex, which makes it very different from the outer loop; 
2) a nested GMainLoop keeps polling all the file descriptors in the 
outer loop, which requires you to cope with reentrancy in those monitor 
commands that flush AIO.


Paolo

Re: [Qemu-devel] [FYI] Soft feature freeze for 1.0 is 10/15 (three weeks away)

2011-09-27 Thread Blue Swirl

On Tue, Sep 27, 2011 at 8:57 AM, Avi Kivity  wrote:
> On 09/27/2011 11:33 AM, Avi Kivity wrote:
>>
>> On 09/26/2011 09:07 PM, Blue Swirl wrote:
>>>
>>> >>  The default address is used for early serial printk in OpenBIOS, so
>>> >> it
>>> >>  should still work.
>>> >
>>> >  Ok, so drop the extra mapping, but init the dynamic mapping to
>>> > 0x80013000.
>>>
>>> That should work.
>>
>> It's already there (macio.c):
>>
>>    if (macio_state->escc_mem) {
>>        memory_region_add_subregion(bar, 0x13000, macio_state->escc_mem);
>>    }
>>
>> I'll drop the extra mapping.
>>
>
> Well, it's not that easy.  As the other mapping is part of an ordinary BAR,
> you need to setup the device (at least PCI_COMMAND and PCI_BASE_ADDRESS_0)
> so it responds to memory requests, and also enable the bridge.
>
> We could hack it by having a low-priority mapping at 0x80013000, but it
> seems wrong.  Maybe the firmware should configure that BAR first?  What
> happens on real hardware?

In this message I seem to confess that the address is arbitrary and in
the subsequent messages the overlap with PCI region is also discussed.
http://lists.nongnu.org/archive/html/qemu-devel/2009-01/msg00542.html

Maybe the address of macio should be fixed as Laurent suggested.

Re: [Qemu-devel] [FYI] Soft feature freeze for 1.0 is 10/15 (three weeks away)

2011-09-27 Thread Avi Kivity

On 09/27/2011 07:39 PM, Blue Swirl wrote:

>
>  Well, it's not that easy.  As the other mapping is part of an ordinary BAR,
>  you need to setup the device (at least PCI_COMMAND and PCI_BASE_ADDRESS_0)
>  so it responds to memory requests, and also enable the bridge.
>
>  We could hack it by having a low-priority mapping at 0x80013000, but it
>  seems wrong.  Maybe the firmware should configure that BAR first?  What
>  happens on real hardware?

In this message I seem to confess that the address is arbitrary and in
the subsequent messages the overlap with PCI region is also discussed.
http://lists.nongnu.org/archive/html/qemu-devel/2009-01/msg00542.html

Maybe the address of macio should be fixed as Laurent suggested.

I'll leave it up to you - I'm out of my depth here.

Meanwhile I suggest applying the pci alias patch - the bug is 
independent of the vga issue.

--
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH 04/12] qemu-timer: move icount to cpus.c

2011-09-27 Thread Paolo Bonzini

None of this is needed by tools, and most of it can even be made static
inside cpus.c.

Signed-off-by: Paolo Bonzini 
---
 cpus.c|  296 +
 exec-all.h|   14 +++
 exec.c|7 --
 qemu-common.h |4 +
 qemu-timer.c  |  279 -
 qemu-timer.h  |   24 +-
 6 files changed, 297 insertions(+), 327 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8978779..58d353f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -65,6 +65,282 @@
 static CPUState *next_cpu;
 
 /***/
+/* guest cycle counter */
+
+/* Conversion factor from emulated instructions to virtual clock ticks.  */
+static int icount_time_shift;
+/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
+#define MAX_ICOUNT_SHIFT 10
+/* Compensate for varying guest execution speed.  */
+static int64_t qemu_icount_bias;
+static QEMUTimer *icount_rt_timer;
+static QEMUTimer *icount_vm_timer;
+static QEMUTimer *icount_warp_timer;
+static int64_t vm_clock_warp_start;
+int64_t qemu_icount;
+int use_icount;
+
+typedef struct TimersState {
+int64_t cpu_ticks_prev;
+int64_t cpu_ticks_offset;
+int64_t cpu_clock_offset;
+int32_t cpu_ticks_enabled;
+int64_t dummy;
+} TimersState;
+
+TimersState timers_state;
+
+/* Return the virtual CPU time, based on the instruction counter.  */
+int64_t cpu_get_icount(void)
+{
+int64_t icount;
+CPUState *env = cpu_single_env;;
+
+icount = qemu_icount;
+if (env) {
+if (!can_do_io(env)) {
+fprintf(stderr, "Bad clock read\n");
+}
+icount -= (env->icount_decr.u16.low + env->icount_extra);
+}
+return qemu_icount_bias + (icount << icount_time_shift);
+}
+
+/* return the host CPU cycle counter and handle stop/restart */
+int64_t cpu_get_ticks(void)
+{
+if (use_icount) {
+return cpu_get_icount();
+}
+if (!timers_state.cpu_ticks_enabled) {
+return timers_state.cpu_ticks_offset;
+} else {
+int64_t ticks;
+ticks = cpu_get_real_ticks();
+if (timers_state.cpu_ticks_prev > ticks) {
+/* Note: non increasing ticks may happen if the host uses
+   software suspend */
+timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - 
ticks;
+}
+timers_state.cpu_ticks_prev = ticks;
+return ticks + timers_state.cpu_ticks_offset;
+}
+}
+
+/* return the host CPU monotonic timer and handle stop/restart */
+int64_t cpu_get_clock(void)
+{
+int64_t ti;
+if (!timers_state.cpu_ticks_enabled) {
+return timers_state.cpu_clock_offset;
+} else {
+ti = get_clock();
+return ti + timers_state.cpu_clock_offset;
+}
+}
+
+/* enable cpu_get_ticks() */
+void cpu_enable_ticks(void)
+{
+if (!timers_state.cpu_ticks_enabled) {
+timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
+timers_state.cpu_clock_offset -= get_clock();
+timers_state.cpu_ticks_enabled = 1;
+}
+}
+
+/* disable cpu_get_ticks() : the clock is stopped. You must not call
+   cpu_get_ticks() after that.  */
+void cpu_disable_ticks(void)
+{
+if (timers_state.cpu_ticks_enabled) {
+timers_state.cpu_ticks_offset = cpu_get_ticks();
+timers_state.cpu_clock_offset = cpu_get_clock();
+timers_state.cpu_ticks_enabled = 0;
+}
+}
+
+/* Correlation between real and virtual time is always going to be
+   fairly approximate, so ignore small variation.
+   When the guest is idle real and virtual time will be aligned in
+   the IO wait loop.  */
+#define ICOUNT_WOBBLE (get_ticks_per_sec() / 10)
+
+static void icount_adjust(void)
+{
+int64_t cur_time;
+int64_t cur_icount;
+int64_t delta;
+static int64_t last_delta;
+/* If the VM is not running, then do nothing.  */
+if (!runstate_is_running()) {
+return;
+}
+cur_time = cpu_get_clock();
+cur_icount = qemu_get_clock_ns(vm_clock);
+delta = cur_icount - cur_time;
+/* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  
*/
+if (delta > 0
+&& last_delta + ICOUNT_WOBBLE < delta * 2
+&& icount_time_shift > 0) {
+/* The guest is getting too far ahead.  Slow time down.  */
+icount_time_shift--;
+}
+if (delta < 0
+&& last_delta - ICOUNT_WOBBLE > delta * 2
+&& icount_time_shift < MAX_ICOUNT_SHIFT) {
+/* The guest is getting too far behind.  Speed time up.  */
+icount_time_shift++;
+}
+last_delta = delta;
+qemu_icount_bias = cur_icount - (qemu_icount << icount_time_shift);
+}
+
+static void icount_adjust_rt(void *opaque)
+{
+qemu_mod_timer(icount_rt_timer,
+   qemu_get_clock_ms(rt_clock) + 1000);
+icount_adjust();
+}
+
+static void icount_adjust_vm(void *opaque)
+{
+qemu_mod_timer(icount_vm_timer,
+   qemu_get_clock_ns(vm_c

Re: [Qemu-devel] [Qemu-ppc] [PATCH] ppc/e500_pci: Fix an array overflow issue

2011-09-27 Thread Scott Wood

On 09/27/2011 07:45 AM, Alexander Graf wrote:
> On 27.09.2011, at 10:17, Liu Yu wrote:
>> ---
>> hw/ppce500_pci.c |   26 --
>> 1 files changed, 16 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c
>> index 2db365d..3e24e85 100644
>> --- a/hw/ppce500_pci.c
>> +++ b/hw/ppce500_pci.c
>> @@ -108,15 +108,18 @@ static uint32_t pci_reg_read4(void *opaque, 
>> target_phys_addr_t addr)
>>
>> case PPCE500_PCI_IW3:
>> case PPCE500_PCI_IW2:
>> -case PPCE500_PCI_IW1:
>> +case PPCE500_PCI_IW1: {
>> +int idx = ((addr >> 5) & 0x3) - 1;
> 
> So this is the main change, right? Why the -1? A guest could potentially 
> access pib[-1] using this, no?

Not with the values of addr that lead to this code.  The -1 is because
IW1/2/3 are 0x1e0/0x1c0/0x1a0.  Previously IW1 would overflow the array.

>> switch (addr & 0xC) {
>> -case PCI_PITAR: value = pci->pib[(addr >> 5) & 0x3].pitar; break;
>> -case PCI_PIWBAR: value = pci->pib[(addr >> 5) & 0x3].piwbar; break;
>> -case PCI_PIWBEAR: value = pci->pib[(addr >> 5) & 0x3].piwbear; 
>> break;
>> -case PCI_PIWAR: value = pci->pib[(addr >> 5) & 0x3].piwar; break;
>> +case PCI_PITAR: value = pci->pib[idx].pitar; break;
>> +case PCI_PIWBAR: value = pci->pib[idx].piwbar; break;
>> +case PCI_PIWBEAR: value = pci->pib[idx].piwbear; break;
>> +case PCI_PIWAR: value = pci->pib[idx].piwar; break;
> 
> I'm fairly sure this breaks checkpatch.pl.

So does the original code...

If this is to be fixed, the outbound window switch should be fixed too
(and made to use idx, for consistency).

-Scott

Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code

2011-09-27 Thread Blue Swirl

On Tue, Sep 27, 2011 at 3:59 PM, Alexander Graf  wrote:
>
> On 27.09.2011, at 17:50, Blue Swirl wrote:
>
>> On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood  wrote:
>>> On 09/24/2011 05:00 AM, Alexander Graf wrote:
 On 24.09.2011, at 10:44, Blue Swirl wrote:
> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf  wrote:
>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood  
>>> wrote:
 The goal with the spin table stuff, suboptimal as it is, was something
 that would work on any powerpc implementation.  Other
 implementation-specific release mechanisms are allowed, and are
 indicated by a property in the cpu node, but only if the loader knows
 that the OS supports it.

> IIUC the spec that includes these bits is not finalized yet. It is 
> however in use on all u-boot versions for e500 that I'm aware of and 
> the method Linux uses to bring up secondary CPUs.

 It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
 just released which clarifies some things such as WIMG.

> Stuart / Scott, do you have any pointers to documentation where the 
> spinning is explained?

 https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>>
>>> Chapter 5.5.2 describes the table. This is actually an interface
>>> between OS and Open Firmware, obviously there can't be a real hardware
>>> device that magically loads r3 etc.
>>>
>>> Not Open Firmware, but rather an ePAPR-compliant loader.
>>
>> 'boot program to client program interface definition'.
>>
>>> The device method would break abstraction layers,
>>>
>>> Which abstraction layers?
>>
>> QEMU system emulation emulates hardware, not software. Hardware
>> devices don't touch CPU registers.
>
> The great part about this emulated device is that it's basically guest 
> software running in host context. To the guest, it's not a device in the 
> ordinary sense, such as vmport, but rather the same as software running on 
> another core, just that the other core isn't running any software.
>
> Sure, if you consider this a device, it does break abstraction layers. Just 
> consider it as host running guest code, then it makes sense :).
>
>>
>>> it's much like
>>> vmport stuff in x86. Using a hypercall would be a small improvement.
>>> Instead it should be possible to implement a small boot ROM which puts
>>> the secondary CPUs into managed halt state without spinning, then the
>>> boot CPU could send an IPI to a halted CPU to wake them up based on
>>> the spin table, just like real HW would do.
>>>
>>> The spin table, with no IPI or halt state, is what real HW does (or
>>> rather, what software does on real HW) today.  It's ugly and inefficient
>>> but it should work everywhere.  Anything else would be dependent on a
>>> specific HW implementation.
>>
>> Yes. Hardware doesn't ever implement the spin table.
>>
>>> On Sparc32 OpenBIOS this
>>> is something like a few lines of ASM on both sides.
>>
>> That sounds pretty close to what I had implemented in v1. Back then the 
>> only comment was to do it using this method from Scott.
>>>
>>> I had some comments on the actual v1 implementation as well. :-)
>>>
>> So we have the choice between having code inside the guest that
>> spins, maybe even only checks every x ms, by programming a timer,
>> or we can try to make an event out of the memory write. V1 was
>> the former, v2 (this one) is the latter. This version performs a
>> lot better and is easier to understand.
>
> The abstraction layers should not be broken lightly, I suppose some
> performance or laziness^Wlocal optimization reasons were behind vmport
> design too. The ideal way to solve this could be to detect a spinning
> CPU and optimize that for all architectures, that could be tricky
> though (if a CPU remains in the same TB for extended periods, inspect
> the TB: if it performs a loop with a single load instruction, replace
> the load by a special wait operation for any memory stores to that
> page).
>>>
>>> How's that going to work with KVM?
>>>
 In fact, the whole kernel loading way we go today is pretty much
 wrong. We should rather do it similar to OpenBIOS where firmware
 always loads and then pulls the kernel from QEMU using a PV
 interface. At that point, we would have to implement such an
 optimization as you suggest. Or implement a hypercall :).
>>>
>>> I think the current approach is more usable for most purposes.  If you
>>> start U-Boot instead of a kernel, how do pass information on from the
>>> user (kernel, rfs, etc)?  Require the user to create flash images[1]?
>>
>> No, for example OpenBIOS gets the kernel command line from fw_cfg device.
>>
>>> Maybe that's a useful mode of operation in some cases, but

[Qemu-devel] [PATCH 07/12] qemu-timer: move more stuff out of qemu-timer.c

2011-09-27 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini 
---
 qemu-timer.c |   35 ---
 qemu-timer.h |2 ++
 savevm.c |   25 +
 vl.c |1 +
 4 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 7fa81e1..58926dd 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -266,11 +266,8 @@ static QEMUClock *qemu_new_clock(int type)
 clock = g_malloc0(sizeof(QEMUClock));
 clock->type = type;
 clock->enabled = 1;
+clock->last = INT64_MIN;
 notifier_list_init(&clock->reset_notifiers);
-/* required to detect & report backward jumps */
-if (type == QEMU_CLOCK_HOST) {
-clock->last = get_clock_realtime();
-}
 return clock;
 }
 
@@ -344,7 +341,7 @@ void qemu_del_timer(QEMUTimer *ts)
 
 /* modify the current timer so that it will be fired when current_time
>= expire_time. The corresponding callback will be called. */
-static void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
+void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
 {
 QEMUTimer **pt, *t;
 
@@ -378,8 +375,6 @@ static void qemu_mod_timer_ns(QEMUTimer *ts, int64_t 
expire_time)
 }
 }
 
-/* modify the current timer so that it will be fired when current_time
-   >= expire_time. The corresponding callback will be called. */
 void qemu_mod_timer(QEMUTimer *ts, int64_t expire_time)
 {
 qemu_mod_timer_ns(ts, expire_time * ts->scale);
@@ -464,33 +459,11 @@ void init_clocks(void)
 rt_clock = qemu_new_clock(QEMU_CLOCK_REALTIME);
 vm_clock = qemu_new_clock(QEMU_CLOCK_VIRTUAL);
 host_clock = qemu_new_clock(QEMU_CLOCK_HOST);
-
-rtc_clock = host_clock;
 }
 
-/* save a timer */
-void qemu_put_timer(QEMUFile *f, QEMUTimer *ts)
+uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts)
 {
-uint64_t expire_time;
-
-if (qemu_timer_pending(ts)) {
-expire_time = ts->expire_time;
-} else {
-expire_time = -1;
-}
-qemu_put_be64(f, expire_time);
-}
-
-void qemu_get_timer(QEMUFile *f, QEMUTimer *ts)
-{
-uint64_t expire_time;
-
-expire_time = qemu_get_be64(f);
-if (expire_time != -1) {
-qemu_mod_timer_ns(ts, expire_time);
-} else {
-qemu_del_timer(ts);
-}
+return qemu_timer_pending(ts) ? ts->expire_time : -1;
 }
 
 void qemu_run_all_timers(void)
diff --git a/qemu-timer.h b/qemu-timer.h
index b4ea201..9f4ffed 100644
--- a/qemu-timer.h
+++ b/qemu-timer.h
@@ -52,9 +52,11 @@ QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
   QEMUTimerCB *cb, void *opaque);
 void qemu_free_timer(QEMUTimer *ts);
 void qemu_del_timer(QEMUTimer *ts);
+void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time);
 void qemu_mod_timer(QEMUTimer *ts, int64_t expire_time);
 int qemu_timer_pending(QEMUTimer *ts);
 int qemu_timer_expired(QEMUTimer *timer_head, int64_t current_time);
+uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts);
 
 void qemu_run_all_timers(void);
 int qemu_alarm_pending(void);
diff --git a/savevm.c b/savevm.c
index 46f2447..eb0ccfd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -81,6 +81,7 @@
 #include "migration.h"
 #include "qemu_socket.h"
 #include "qemu-queue.h"
+#include "qemu-timer.h"
 #include "cpus.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
@@ -674,6 +675,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
 return v;
 }
 
+
+/* timer */
+
+void qemu_put_timer(QEMUFile *f, QEMUTimer *ts)
+{
+uint64_t expire_time;
+
+expire_time = qemu_timer_expire_time_ns(ts);
+qemu_put_be64(f, expire_time);
+}
+
+void qemu_get_timer(QEMUFile *f, QEMUTimer *ts)
+{
+uint64_t expire_time;
+
+expire_time = qemu_get_be64(f);
+if (expire_time != -1) {
+qemu_mod_timer_ns(ts, expire_time);
+} else {
+qemu_del_timer(ts);
+}
+}
+
+
 /* bool */
 
 static int get_bool(QEMUFile *f, void *pv, size_t size)
diff --git a/vl.c b/vl.c
index 8ae6468..19e7169 100644
--- a/vl.c
+++ b/vl.c
@@ -2322,6 +2322,7 @@ int main(int argc, char **argv, char **envp)
 runstate_init();
 
 init_clocks();
+rtc_clock = host_clock;
 
 qemu_cache_utils_init(envp);
 
-- 
1.7.6

1 2 >

1 - 100 of 156 matches

Mail list logo