On 04/01/2016 13:21, Peter Maydell wrote:
> On 3 January 2016 at 20:57, David Durham <clearc...@lycos.com> wrote:
>> Any suggestions or comments on how to do this are very welcome
>> ... I built qemu with --target-list i386-softmmu and when I run
>> qemu, top only shows one qemu-system-i386 using 100% of one core
>
> This is expected. Our current emulation is single threaded
> even when emulating multiple target CPUs, so we'll only
> use one host core. (We do have some helper threads for a
> few IO tasks etc but those are not cpu-bound.)
>
> There is some development work in progress to try to
> make better use of multi-core hosts but it's not very
> far advanced yet. (Also emulating x86 guests on arm hosts
> with multiple cpus might not ever be supported because
> the x86 memory model would require barriers everywhere
> and it's not clear it would overall improve performance.
> ARM-on-x86 is the primary initial usecase.)
>
> thanks
> -- PMM
For your information, the x86 memory model only requires
barriers in the following cases (this is to some extent
implemented on modern machines with multiple actual x86
CPU sockets, as opposed to multicore chips; it may also
be observed when using any kind of DMA/bus-master
hardware such as GPUs):
1. Instructions with the explicit "LOCK" prefix. These
require a memory barrier, then a locked read-modify-write
on a single address, then another memory barrier.
2. Explicit memory barrier instructions (there have been
a few over the years).
3. Some of the XCHG-family instructions implicitly behave
as though they had a LOCK prefix (XCHG with a memory
operand is the classic example).
4. On modern CPUs, the floating point ("ESC") instructions
are treated as normal instructions, and the related historic
"WAIT" opcode is now a NOP (optionally throwing an
"FPU disabled" exception). On the 386 and older, floating
point instructions might postpone their memory writes
to any point up to and including the next same-CPU WAIT,
but this was never a multi-CPU barrier, just
synchronization between the CPU and FPU chips within
each two-chip CPU.
5. Some specific operations (see the architecture manuals)
typically associated with cache management, system calls
and/or thread switching also act as barriers.
6. Only a minority of instructions flush the instruction
decode (and hence TCG translation) buffers, though for
highest consistency any actual write to a memory page
with code should cause the translation of that code to
be discarded from cache.
7. If doing cycle-accurate bug-for-bug emulation of
specific CPU models, it might be necessary to exactly
model the implicit size limitations of their various
caches, such as how many page table entries are cached
by the on-CPU TLB or how many bytes ahead the
instruction decoder may look. But I don't think that
is a qemu feature anyway.
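
As a rough illustration of cases 1 to 3, here is a minimal C11 sketch. It is not taken from QEMU; the names (counter, flag, locked_increment, full_barrier, swap_flag, plain_copy, demo_barriers) are made up for this example. The comments describe what compilers typically emit on x86, which I am assuming rather than guaranteeing for any particular compiler.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state, used only for this illustration. */
static atomic_int counter = 0;
static atomic_int flag = 0;

/* Case 1: a locked read-modify-write on a single address.
 * On x86 this typically becomes a LOCK-prefixed instruction
 * such as LOCK XADD. Returns the value before the increment. */
static int locked_increment(void)
{
    return atomic_fetch_add(&counter, 1);
}

/* Case 2: an explicit full memory barrier (typically MFENCE
 * or an equivalent locked instruction on x86). */
static void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Case 3: an atomic exchange. On x86 this is typically XCHG
 * with a memory operand, which is locked without needing an
 * explicit LOCK prefix. Returns the previous value. */
static int swap_flag(int new_value)
{
    return atomic_exchange(&flag, new_value);
}

/* For contrast: an ordinary load and store, compiled to
 * plain MOV instructions with no barrier at all. */
static int plain_copy(const int *src, int *dst)
{
    *dst = *src;
    return *dst;
}

/* Exercises each helper once, in a single thread. */
static void demo_barriers(void)
{
    int a = 7, b = 0;

    assert(locked_increment() == 0);
    full_barrier();
    assert(swap_flag(1) == 0);
    assert(plain_copy(&a, &b) == 7);
}
```

Calling demo_barriers() from a test program runs all four helpers; only the first three correspond to barrier cases from the list.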
This still leaves the majority of code needing no memory barriers.
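
To make that concrete, a translator could classify each guest instruction and only emit a barrier for the listed cases. The sketch below is purely hypothetical: the insn_kind categories, needs_barrier and count_barriers are names invented here, it follows the list above as stated, and it is not how QEMU's TCG is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction categories, named for this sketch only. */
enum insn_kind {
    INSN_PLAIN,          /* ordinary loads, stores and ALU operations */
    INSN_LOCK_PREFIXED,  /* case 1: explicit LOCK prefix */
    INSN_FENCE,          /* case 2: explicit barrier instruction */
    INSN_XCHG_MEM,       /* case 3: implicitly locked exchange */
    INSN_SERIALIZING     /* case 5: cache management, system calls, etc. */
};

/* Returns true if, per the list above, the instruction acts as a barrier. */
static bool needs_barrier(enum insn_kind kind)
{
    switch (kind) {
    case INSN_LOCK_PREFIXED:
    case INSN_FENCE:
    case INSN_XCHG_MEM:
    case INSN_SERIALIZING:
        return true;
    case INSN_PLAIN:
    default:
        return false;
    }
}

/* Counts how many instructions in a block would need a barrier. */
static size_t count_barriers(const enum insn_kind *block, size_t n)
{
    size_t count = 0;

    assert(block != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (needs_barrier(block[i])) {
            count++;
        }
    }
    return count;
}
```

For a block of five instructions where one is LOCK-prefixed and one is an exchange with memory, count_barriers reports two, and the other three translate without any barrier.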
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded