Re: [PATCH v2] ui/cocoa: Fix stride resolution of pixman image

2021-02-25 Thread Gerd Hoffmann
On Wed, Feb 24, 2021 at 01:08:22PM +, Peter Maydell wrote:
> On Wed, 24 Feb 2021 at 11:23, Gerd Hoffmann  wrote:
> >
> > On Mon, Feb 22, 2021 at 11:40:12PM +0900, Akihiko Odaki wrote:
> > > A display can receive an image whose stride is greater than its
> > > width. In fact, when a guest requests virtio-gpu to scan out a
> > > smaller part of an image, virtio-gpu passes it to the display as an
> > > image whose width is that of the part and whose stride is that of
> > > the whole image.
> >
> > Probably not limited to virtio-gpu.  Wayland rounds display framebuffers
> > to the next multiple of 64, so when running -- for example -- 800x600
> > wayland will create an image 832 pixels wide.  Other UIs had similar
> > issues.
> >
> > Patch added to UI patch queue.
> 
> Could you add Akihiko's explanation to the commit message
> for the patch in your queue, please?

That _is_ the (v2) commit message ;)

Akihiko: new versions of a patch should be sent as a new thread, not as a
reply.  It is less confusing for both people and tools like b4
(https://pypi.org/project/b4/) which help with patch processing.

take care,
  Gerd
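
As background to the stride discussion above, a minimal sketch (illustrative
only, not part of the patch) of how a display backend can walk a pixman
image while honoring its stride; the destination buffer and its stride are
hypothetical, the pixman calls are the public API:

    #include <stdint.h>
    #include <string.h>
    #include <pixman.h>

    /* Copy the visible pixels of a pixman image into a linear buffer.
     * The source stride is in bytes and may exceed width * bytes-per-pixel. */
    static void copy_visible_rows(pixman_image_t *image,
                                  uint8_t *dst, int dst_stride)
    {
        uint8_t *src = (uint8_t *)pixman_image_get_data(image);
        int src_stride = pixman_image_get_stride(image);
        int width = pixman_image_get_width(image);
        int height = pixman_image_get_height(image);
        int bpp = PIXMAN_FORMAT_BPP(pixman_image_get_format(image)) / 8;

        for (int y = 0; y < height; y++) {
            /* only width * bpp bytes of each row are visible pixels,
             * the remainder of the stride is padding */
            memcpy(dst + y * dst_stride, src + y * src_stride, width * bpp);
        }
    }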




Re: [PATCH v2 1/1] hw/s390x: modularize virtio-gpu-ccw

2021-02-25 Thread Gerd Hoffmann
  Hi,

> a programming error. So I'm absolutely against shoving this logic
> down into object.c. But I find the variant I posted nicer to document
> and nicer to read: looking at virtio_ccw_gpu_register() one sees
> immediately that if built as a module, it is OK if the registration
> fails, and if built-in it is expected to work.

Ok, makes sense, and we'll probably not have that many cases where we
need this ...

take care,
  Gerd




Re: [PATCH 1/3] migration/ram: Modify the code comment of ram_save_host_page()

2021-02-25 Thread Kunkun Jiang

On 2021/2/25 6:53, David Edmondson wrote:

On Tuesday, 2021-02-23 at 10:16:43 +08, Kunkun Jiang wrote:


ram_save_host_page() has been modified several times
since it was introduced, but its comment has not been kept up to
date. Update the comment to explain ram_save_host_page()
more clearly.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  migration/ram.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 72143da0ac..fc49c3f898 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1970,15 +1970,16 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss,
  }
  
  /**

- * ram_save_host_page: save a whole host page
+ * ram_save_host_page: save a whole host page or the rest of a block
   *
- * Starting at *offset send pages up to the end of the current host
- * page. It's valid for the initial offset to point into the middle of
- * a host page in which case the remainder of the hostpage is sent.
- * Only dirty target pages are sent. Note that the host page size may
- * be a huge page for this block.
- * The saving stops at the boundary of the used_length of the block
- * if the RAMBlock isn't a multiple of the host page size.
+ * Starting at pss->page send pages up to the end of the current host
+ * page or the boundary of used_length of the block (if the RAMBlock
+ * isn't a multiple of the host page size). The min one is selected.
+ * Only dirty target pages are sent.
+ *
+ * Note that the host page size may be a huge page for this block, it's
+ * valid for the initial offset to point into the middle of a host page
+ * in which case the remainder of the hostpage is sent.

How about:

* Send dirty pages between pss->page and either the end of that page or
* the used_length of the RAMBlock, whichever is smaller.
*
* Note that if the host page is a huge page, pss->page may be in the
* middle of that page.


Thank you. It looks concise and comprehensive.

Best Regards

Kunkun Jiang


   *
   * Returns the number of pages written or negative on error
   *
--
2.23.0

dme.






Re: [PATCH] libqos/qgraph: format qgraph comments for sphinx documentation

2021-02-25 Thread Emanuele Giuseppe Esposito




On 24/02/2021 11:59, Emanuele Giuseppe Esposito wrote:



On 24/02/2021 11:49, Paolo Bonzini wrote:

On 24/02/21 11:18, Emanuele Giuseppe Esposito wrote:

    qtest
+   qgraph


It may make sense to add instead a "toctree" directive in qtest.rst.  
I haven't checked what the result looks like, though.


Current result is

- QTest Device Emulation Testing Framework
- Qtest Driver Framework

but I agree, maybe with an internal toctree in qtest.rst it will be 
clearer. I'll try.


After trying, I think that simply adding a toctree in qtest.rst is not 
the prettiest solution. The end result will be something like


Qtest driver framework (title)
- qgraph (link to qgraph.rst)
QTest is a device emulation testing framework... [qtest.rst content]

The qgraph link will be also visible in docs/index and docs/devel/index

What about this:

diff --git a/docs/devel/qgraph.rst b/docs/devel/qgraph.rst
index 9349c45af8..62a45cbcbf 100644
--- a/docs/devel/qgraph.rst
+++ b/docs/devel/qgraph.rst
@@ -1,5 +1,261 @@
+.. _qgraph:
+
 
 Qtest Driver Framework
 

---

Add an anchor in qgraph.rst


 .. kernel-doc:: tests/qtest/libqos/qgraph.h
diff --git a/docs/devel/qtest.rst b/docs/devel/qtest.rst
index 97c5a75626..b7201456b6 100644
--- a/docs/devel/qtest.rst
+++ b/docs/devel/qtest.rst
@@ -2,6 +2,12 @@
 QTest Device Emulation Testing Framework
 

+.. toctree::
+   :hidden:
+
+   qgraph
+
+
 QTest is a device emulation testing framework.  It can be very useful 
to test
 device models; it could also control certain aspects of QEMU (such as 
virtual

 clock stepping), with a special purpose "qtest" protocol.  Refer to
@@ -24,6 +30,9 @@ On top of libqtest, a higher level library, 
``libqos``, was created to

 encapsulate common tasks of device drivers, such as memory management and
 communicating with system buses or devices. Many virtual device tests use
 libqos instead of directly calling into libqtest.
+Libqos also offers the qgraph API to increase the coverage of each test and
+automate QEMU command line arguments and device setup.
+Refer to :ref:`qgraph` for an explanation of qgraph and its API.

 Steps to add a new QTest case are:

---

Add a hidden toctree because the new file must be linked from at least one
toctree, and reference qgraph in the text using the anchor.




diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 1dcce3bbed..f0038f8722 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -12,6 +12,7 @@ Contents:

 .. toctree::
:maxdepth: 2
+   :includehidden:

build-system
kconfig
@@ -24,7 +25,6 @@ Contents:
atomics
stable-process
qtest
-   qgraph
decodetree
secure-coding-practices
tcg

---

Allow showing the hidden toctree in the docs/devel index, so that the 
link is visible


End result:
- no visible change in docs/index
- qgraph link visible in docs/devel/index
- qgraph linked as a text link in qtest.rst

Thank you,
Emanuele




Re: [PATCH] target/arm: Speed up aarch64 TBL/TBX

2021-02-25 Thread Alex Bennée


Richard Henderson  writes:

> Always perform one call instead of two for 16-byte operands.
> Use byte loads/stores directly into the vector register file
> instead of extractions and deposits to a 64-bit local variable.
>
> In order to easily receive pointers into the vector register file,
> convert the helper to the gvec out-of-line signature.  Move the
> helper into vec_helper.c, where it can make use of H1 and clear_tail.
>
> Signed-off-by: Richard Henderson 

Much better, drops from 12.34% to 5.09% of total runtime, now almost all
inline:

  https://fileserver.linaro.org/s/cEZxoLGQ2pMi4xe


Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 

> ---
>
> Alex, as briefly discussed on IRC today, streamline the TBL/TBX
> implementation.  Would you run this through whatever benchmark
> you were experimenting with today?  This is unmeasurable in RISU
> (exactly one perf hit in the helper through the entire run).
>
> r~
>
> ---
>  target/arm/helper-a64.h|  2 +-
>  target/arm/helper-a64.c| 32 -
>  target/arm/translate-a64.c | 58 +-
>  target/arm/vec_helper.c| 52 ++
>  4 files changed, 60 insertions(+), 84 deletions(-)
>
> diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
> index 7bd6aed659..c139fa81f9 100644
> --- a/target/arm/helper-a64.h
> +++ b/target/arm/helper-a64.h
> @@ -28,7 +28,7 @@ DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
>  DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
>  DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
>  DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
> -DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, 
> i32)
> +DEF_HELPER_FLAGS_4(simd_tblx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
>  DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
>  DEF_HELPER_FLAGS_3(neon_ceq_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
> diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
> index 7f56c78fa6..061c8ff846 100644
> --- a/target/arm/helper-a64.c
> +++ b/target/arm/helper-a64.c
> @@ -179,38 +179,6 @@ float64 HELPER(vfp_mulxd)(float64 a, float64 b, void 
> *fpstp)
>  return float64_mul(a, b, fpst);
>  }
>  
> -uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t 
> indices,
> -  uint32_t rn, uint32_t numregs)
> -{
> -/* Helper function for SIMD TBL and TBX. We have to do the table
> - * lookup part for the 64 bits worth of indices we're passed in.
> - * result is the initial results vector (either zeroes for TBL
> - * or some guest values for TBX), rn the register number where
> - * the table starts, and numregs the number of registers in the table.
> - * We return the results of the lookups.
> - */
> -int shift;
> -
> -for (shift = 0; shift < 64; shift += 8) {
> -int index = extract64(indices, shift, 8);
> -if (index < 16 * numregs) {
> -/* Convert index (a byte offset into the virtual table
> - * which is a series of 128-bit vectors concatenated)
> - * into the correct register element plus a bit offset
> - * into that element, bearing in mind that the table
> - * can wrap around from V31 to V0.
> - */
> -int elt = (rn * 2 + (index >> 3)) % 64;
> -int bitidx = (index & 7) * 8;
> -uint64_t *q = aa64_vfp_qreg(env, elt >> 1);
> -uint64_t val = extract64(q[elt & 1], bitidx, 8);
> -
> -result = deposit64(result, shift, 8, val);
> -}
> -}
> -return result;
> -}
> -
>  /* 64bit/double versions of the neon float compare functions */
>  uint64_t HELPER(neon_ceq_f64)(float64 a, float64 b, void *fpstp)
>  {
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index b23a8975d5..496e14688a 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -7520,10 +7520,8 @@ static void disas_simd_tb(DisasContext *s, uint32_t 
> insn)
>  int rm = extract32(insn, 16, 5);
>  int rn = extract32(insn, 5, 5);
>  int rd = extract32(insn, 0, 5);
> -int is_tblx = extract32(insn, 12, 1);
> -int len = extract32(insn, 13, 2);
> -TCGv_i64 tcg_resl, tcg_resh, tcg_idx;
> -TCGv_i32 tcg_regno, tcg_numregs;
> +int is_tbx = extract32(insn, 12, 1);
> +int len = (extract32(insn, 13, 2) + 1) * 16;
>  
>  if (op2 != 0) {
>  unallocated_encoding(s);
> @@ -7534,53 +7532,11 @@ static void disas_simd_tb(DisasContext *s, uint32_t 
> insn)
>  return;
>  }
>  
> -/* This does a table lookup: for every byte element in the input
> - * we index into a table formed from up to four vector registers,
> - * and then the output is the result of the lookups. Our helper
> - * function does the lookup operation for a single 64 bit part of
> - * the input.
> 

Re: [PATCH v2] Autoconnect jack ports by default

2021-02-25 Thread José Ramón Muñoz Pekkarinen
On Thu, 25 Feb 2021 at 00:38, Geoffrey McRae  wrote:

> While I get where you're coming from, those using QEMU with Jack are
> already advanced users that are used to reading technical documentation.
> Having our one client do something that is unexpected/different would
> not only confuse existing Jack users but also anyone following any
> guides/documentation on how to use a generic jack client. IMO the better
> solution here is simply better documentation, perhaps even a known
> working sample setup.
>

    This is an interesting point, but to be clear up front: whether the
patch is accepted or not is up to you, and I'll respect any decision on
upstreaming. I'm just curious how a default behaviour that multiple
other applications and libraries already adopt is going to confuse the
jack community. For instance, I tend to work with firefox and mumble,
which natively support jack, and they connect automatically with no extra
patchbay configuration required; the basics just work out of the box,
while you can still do more complex routing if you want to use the full
set of features of jack.

    On the other hand, I took some time to read up on configuring the
connection automatically via the qjackctl patchbay, so thanks for pointing
out that route.

José.


Re: [PATCH v4] virtio-blk: Respect discard granularity

2021-02-25 Thread Stefano Garzarella

On Thu, Feb 25, 2021 at 09:12:39AM +0900, Akihiko Odaki wrote:

Report the configured granularity for the discard operation to the
guest. If it is not set, use the block size.

Since until now we have ignored the configured discard granularity
and always reported the block size, let's add
'report-discard-granularity' property and disable it for older
machine types to avoid migration issues.
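
For example (illustrative command line; the drive id is hypothetical), the
granularity can be configured through the existing block device property:

    -device virtio-blk-pci,drive=drive0,discard_granularity=2097152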

Signed-off-by: Akihiko Odaki 
---
hw/block/virtio-blk.c  | 8 +++-
hw/core/machine.c  | 4 +++-
include/hw/virtio/virtio-blk.h | 1 +
3 files changed, 11 insertions(+), 2 deletions(-)


Reviewed-by: Stefano Garzarella 




Re: [PATCH v2] ui/cocoa: Fix stride resolution of pixman image

2021-02-25 Thread Akihiko Odaki
2021年2月25日(木) 17:02 Gerd Hoffmann :
>
> On Wed, Feb 24, 2021 at 01:08:22PM +, Peter Maydell wrote:
> > On Wed, 24 Feb 2021 at 11:23, Gerd Hoffmann  wrote:
> > >
> > > On Mon, Feb 22, 2021 at 11:40:12PM +0900, Akihiko Odaki wrote:
> > > > A display can receive an image whose stride is greater than its
> > > > width. In fact, when a guest requests virtio-gpu to scan out a
> > > > smaller part of an image, virtio-gpu passes it to the display as an
> > > > image whose width is that of the part and whose stride is that of
> > > > the whole image.
> > >
> > > Probably not limited to virtio-gpu.  Wayland rounds display framebuffers
> > > to the next multiple of 64, so when running -- for example -- 800x600
> > > wayland will create an image 832 pixels wide.  Other UIs had similar
> > > issues.
> > >
> > > Patch added to UI patch queue.
> >
> > Could you add Akihiko's explanation to the commit message
> > for the patch in your queue, please?
>
> That _is_ the (v2) commit message ;)
>
> Akihiko: new versions of a patch should be sent as a new thread, not as a
> reply.  It is less confusing for both people and tools like b4
> (https://pypi.org/project/b4/) which help with patch processing.

I didn't know that. Thanks for telling me that. I'll do so next time.

Regards,
Akihiko Odaki

>
> take care,
>   Gerd
>



[PATCH] ui/cocoa: Mark variables static

2021-02-25 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 ui/cocoa.m | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 0ef5fdf3b7a..9e9a2f88dde 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -80,7 +80,7 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 
 static void cocoa_refresh(DisplayChangeListener *dcl);
 
-NSWindow *normalWindow, *about_window;
+static NSWindow *normalWindow, *about_window;
 static const DisplayChangeListenerOps dcl_ops = {
 .dpy_name  = "cocoa",
 .dpy_gfx_update = cocoa_update,
@@ -93,11 +93,11 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 static int last_buttons;
 static int cursor_hide = 1;
 
-int gArgc;
-char **gArgv;
-bool stretch_video;
-NSTextField *pauseLabel;
-NSArray * supportedImageFileTypes;
+static int gArgc;
+static char **gArgv;
+static bool stretch_video;
+static NSTextField *pauseLabel;
+static NSArray * supportedImageFileTypes;
 
 static QemuSemaphore display_init_sem;
 static QemuSemaphore app_started_sem;
@@ -135,7 +135,7 @@ static bool bool_with_iothread_lock(BoolCodeBlock block)
 }
 
 // Mac to QKeyCode conversion
-const int mac_to_qkeycode_map[] = {
+static const int mac_to_qkeycode_map[] = {
 [kVK_ANSI_A] = Q_KEY_CODE_A,
 [kVK_ANSI_B] = Q_KEY_CODE_B,
 [kVK_ANSI_C] = Q_KEY_CODE_C,
-- 
2.24.3 (Apple Git-128)




Re: [PATCH v2] Autoconnect jack ports by default

2021-02-25 Thread Gerd Hoffmann
On Wed, Feb 24, 2021 at 11:33:14PM +0100, Christian Schoenebeck wrote:
> On Mittwoch, 24. Februar 2021 23:04:47 CET Geoffrey McRae wrote:
> > This goes against how all standard jack clients work, a new jack client
> > should not auto-connect at all unless explicitly configured to as if
> > there is an existing audio diagram configured (which is 99% of the time)
> > it will cause unexpected/undesired behavior.
> > 
> > Jack is not supposed to be an 'automatic' system, it's the
> > responsibility of the patch bay software to route connections.
> > 
> > The auto-connect feature exists to allow the jack audiodev to re-connect
> > a broken connection when the jack device restarts/reconnects.
> 
> Well, that was also my idea first, and I would agree with you in case of a 
> regular music app of course, but then I thought QEMU is probably not an 
> average JACK client, and it simply lowers the entry level for new users who 
> probably just want to output to system out anyway.

Well, I guess there is more software like that, any music player for
example.  I don't think this is a good reason for qemu to have
non-standard behavior.  If you want qemu autoconnect, you can use the
connect-ports option.
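
For example (illustrative; the port-name regex depends on the local JACK
setup), something along the lines of:

    -audiodev jack,id=jack0,out.connect-ports=system:playback_.*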

Besides that, I'd expect the patch bay software to be able to remember the
routing configuration per application, so the setup would be a one-time
thing you don't have to re-do on every qemu launch.  Not fully sure this
is actually the case though, I'm not a regular jack user.

take care,
  Gerd




Re: [PATCH v22 16/17] i386: gdbstub: only write CR0/CR2/CR3/EFER for SOFTMMU

2021-02-25 Thread Claudio Fontana
On 2/25/21 5:19 AM, Richard Henderson wrote:
> On 2/24/21 5:34 AM, Claudio Fontana wrote:
>> Signed-off-by: Claudio Fontana 
>> Cc: Paolo Bonzini 
>> ---
>>  target/i386/gdbstub.c | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
>> index 41e265fc67..9f505d6ee3 100644
>> --- a/target/i386/gdbstub.c
>> +++ b/target/i386/gdbstub.c
>> @@ -383,26 +383,38 @@ int x86_cpu_gdb_write_register(CPUState *cs, uint8_t 
>> *mem_buf, int n)
>>  
>>  case IDX_CTL_CR0_REG:
>>  if (env->hflags & HF_CS64_MASK) {
>> +#ifdef CONFIG_SOFTMMU
>>  cpu_x86_update_cr0(env, ldq_p(mem_buf));
>> +#endif
>>  return 8;
>>  }
>> +#ifdef CONFIG_SOFTMMU
>>  cpu_x86_update_cr0(env, ldl_p(mem_buf));
>> +#endif
>>  return 4;
> 
> It would be nice to do all these with rather less ifdefs.
> And let's correctly use !CONFIG_USER_ONLY.
> 
> Without adding more stubs, may I suggest a new helper:
> 
> static target_ulong read_long_cs64(env, buf, len)
> {
> #ifdef TARGET_X86_64
> if (env->hflags & HF_CS64_MASK) {
> *len = 8;
> return ldq_p(buf);
> }
> #endif
> *len = 4;
> return ldl_p(buf);
> }

In the current code the #ifdef TARGET_X86_64 is not there. Is it safe to
use everywhere?

should we do a matching:

static int gdb_read_reg_cs64(CPUX86State *env, GByteArray *buf,
                             target_ulong val)
{
if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
return gdb_get_reg64(buf, val);
}
return gdb_get_reg32(buf, val);
}

?

Should we add the #ifdef TARGET_X86_64 here as well?

Thanks,

Claudio

> 
> which, even by itself allows some cleanup in this function.
> Then:
> 
> case IDX_CTL_CR2_REG:
>tmp = read_long_cs64(env, mem_buf, &len);
> #ifndef CONFIG_USER_ONLY
>env->cr[2] = tmp;
> #endif
>return len;
> 
> which still has one ifdef, but not 2.
> 
> 
> r~
> 




[RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support

2021-02-25 Thread Ying Fang
An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core systems, so a cpu topology description
is helpful to provide the guest with the right view. Dario Faggioli's talk
in [0] also shows that the virtual topology may have an impact on scheduler
performance. Thus this patch series is posted to introduce cpu topology
support for the arm platform.

Both fdt and ACPI are used to present the cpu topology. To describe
the cpu topology via ACPI, a PPTT table is introduced according to the
processor hierarchy node structure. This series is derived from [1], where
we tried to bring both cpu and cache topology support to the arm
platform, but there are still some issues to solve to support the cache
hierarchy. So we split the cpu topology part out and send it separately.
The patch series to support the cache hierarchy will be sent later, since
Salil Mehta's cpu hotplug feature needs the cpu topology enabled first and
he is waiting for it to be upstreamed.

This patch series was initially based on the patches posted by Andrew Jones [2].
I jumped in on it since some OS vendor cooperation partners are eager for it.
Thanks for Andrew's contribution.

After applying this patch series, launch a guest with virt-6.0 and the cpu
topology configured with sockets:cores:threads = 2:4:2, and you will get the
output below from the lscpu command.
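
For example (illustrative command line; kernel, disk, memory and other
arguments are omitted):

    qemu-system-aarch64 -M virt-6.0 -enable-kvm -cpu host \
        -smp 16,sockets=2,cores=4,threads=2 \
        -numa node,cpus=0-7 -numa node,cpus=8-15 \
        ...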

-
Architecture:aarch64
CPU op-mode(s):  64-bit
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   2
NUMA node(s):2
Vendor ID:   HiSilicon
Model:   0
Model name:  Kunpeng-920
Stepping:0x1
BogoMIPS:200.00
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

[0] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html
[2] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com

Ying Fang (5):
  device_tree: Add qemu_fdt_add_path
  hw/arm/virt: Add cpu-map to device tree
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table

 hw/acpi/aml-build.c  | 40 ++
 hw/arm/virt-acpi-build.c | 64 +---
 hw/arm/virt.c| 40 +-
 include/hw/acpi/acpi-defs.h  | 13 
 include/hw/acpi/aml-build.h  |  7 
 include/hw/arm/virt.h|  1 +
 include/sysemu/device_tree.h |  1 +
 softmmu/device_tree.c| 45 +++--
 8 files changed, 204 insertions(+), 7 deletions(-)

-- 
2.23.0




[RFC PATCH 5/5] hw/arm/virt-acpi-build: add PPTT table

2021-02-25 Thread Ying Fang
Add the Processor Properties Topology Table (PPTT) to present
CPU topology information to the guest. A three-level cpu
topology is built, in accordance with what the linux kernel currently does.

Tested-by: Jiajie Li 
Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 50 
 1 file changed, 50 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bb91152fe2..38d50ce66c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -436,6 +436,50 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  vms->oem_table_id);
 }
 
+static void
+build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket = 0;
+MachineState *ms = MACHINE(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_LEAF_NODE,
+  socket_offset, uid++);
+ } else {
+build_processor_hierarchy(table_data,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID,
+  socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_thread_hierarchy(table_data, core_offset, uid++);
+}
+ }
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2,
+ vms->oem_id, vms->oem_table_id);
+}
+
 /* GTDT */
 static void
 build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -688,6 +732,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 unsigned dsdt, xsdt;
 GArray *tables_blob = tables->table_data;
 MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->no_cpu_topology;
 
 table_offsets = g_array_new(false, true /* clear */,
 sizeof(uint32_t));
@@ -707,6 +752,11 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
+if (ms->smp.cpus > 1 && cpu_topology_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, vms);
+}
+
 acpi_add_table(table_offsets, tables_blob);
 build_gtdt(tables_blob, tables->linker, vms);
 
-- 
2.23.0




[RFC PATCH 4/5] hw/acpi/aml-build: add processor hierarchy node structure

2021-02-25 Thread Ying Fang
Add the processor hierarchy node structures to build ACPI information
for the CPU topology. Since the private resources may be used to describe
the cache hierarchy, and they vary between topology levels,
three helpers are introduced to describe the hierarchy.

(1) build_socket_hierarchy for socket description
(2) build_processor_hierarchy for processor description
(3) build_thread_hierarchy for thread (logical processor) description

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
 hw/acpi/aml-build.c | 40 +
 include/hw/acpi/acpi-defs.h | 13 
 include/hw/acpi/aml-build.h |  7 +++
 3 files changed, 60 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a2cd7a5830..a0af3e9d73 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1888,6 +1888,46 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
  table_data->len - slit_start, 1, oem_id, oem_table_id);
 }
 
+/*
+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, ACPI_PPTT_PHYSICAL_PACKAGE, 4);
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD |
+  ACPI_PPTT_ACPI_LEAF_NODE, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent , 4); /* parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index cf9f44299c..45e10d886f 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,17 @@ struct AcpiIortRC {
 } QEMU_PACKED;
 typedef struct AcpiIortRC AcpiIortRC;
 
+enum {
+ACPI_PPTT_TYPE_PROCESSOR = 0,
+ACPI_PPTT_TYPE_CACHE,
+ACPI_PPTT_TYPE_ID,
+ACPI_PPTT_TYPE_RESERVED
+};
+
+#define ACPI_PPTT_PHYSICAL_PACKAGE  (1)
+#define ACPI_PPTT_ACPI_PROCESSOR_ID_VALID   (1 << 1)
+#define ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD  (1 << 2)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_LEAF_NODE(1 << 3)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_IDENTICAL(1 << 4)  /* ACPI 6.3 */
+
 #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 380d3e3924..7f0ca1a198 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -462,6 +462,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
 const char *oem_id, const char *oem_table_id);
 
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
 
-- 
2.23.0




[RFC PATCH 1/5] device_tree: Add qemu_fdt_add_path

2021-02-25 Thread Ying Fang
qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except
it also adds any missing parent nodes. We also tweak an error
message of qemu_fdt_add_subnode().
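
For example (illustrative usage; the path is hypothetical), a single call
creates every missing level of the path:

    /* creates /cpus/cpu-map, cluster0 and core1 if not present yet */
    qemu_fdt_add_path(fdt, "/cpus/cpu-map/cluster0/core1");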

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 include/sysemu/device_tree.h |  1 +
 softmmu/device_tree.c| 45 ++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index b9a3ddc518..1e3857ca0c 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 
 retval = fdt_add_subnode(fdt, parent, basename);
 if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
 exit(1);
 }
 
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 return retval;
 }
 
+/*
+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *dupname, *basename, *p;
+int parent, retval = -1;
+
+if (path[0] != '/') {
+return retval;
+}
+
+parent = fdt_path_offset(fdt, "/");
+p = dupname = g_strdup(path);
+
+while (p) {
+*p = '/';
+basename = p + 1;
+p = strchr(p + 1, '/');
+if (p) {
+*p = '\0';
+}
+retval = fdt_path_offset(fdt, dupname);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Invalid path %s: %s",
+ __func__, path, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode(fdt, parent, basename);
+if (retval < 0) {
+break;
+}
+}
+parent = retval;
+}
+
+g_free(dupname);
+return retval;
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
 const char *dumpdtb = current_machine->dumpdtb;
-- 
2.23.0




[RFC PATCH 3/5] hw/arm/virt-acpi-build: distinguish possible and present cpus

2021-02-25 Thread Ying Fang
When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled in the MADT.
Furthermore, this is also needed if we are going to support CPU
hotplug in the future.

This patch is a rework based on Andrew Jones's contribution at
https://lists.gnu.org/archive/html/qemu-arm/2018-07/msg00076.html

Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 14 ++
 hw/arm/virt.c|  2 ++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f9c9df916c..bb91152fe2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -61,13 +61,16 @@
 
 static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
 {
-MachineState *ms = MACHINE(vms);
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 uint16_t i;
 
-for (i = 0; i < ms->smp.cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
 aml_append(scope, dev);
 }
 }
@@ -479,6 +482,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 const int *irqmap = vms->irqmap;
 AcpiMadtGenericDistributor *gicd;
 AcpiMadtGenericMsiFrame *gic_msi;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 int i;
 
 acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));
@@ -489,7 +493,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -504,7 +508,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicc->cpu_interface_number = cpu_to_le32(i);
 gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
 gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (possible_cpus->cpus[i].cpu != NULL) {
+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
 
 if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {
 gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c133b342b8..75659502e2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2047,6 +2047,8 @@ static void machvirt_init(MachineState *machine)
 
 qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
 object_unref(cpuobj);
+/* Initialize cpu member here since cpu hotplug is not supported yet */
+machine->possible_cpus->cpus[n].cpu = cpuobj;
 }
 fdt_add_timer_nodes(vms);
 fdt_add_cpu_nodes(vms);
-- 
2.23.0




[RFC PATCH 2/5] hw/arm/virt: Add cpu-map to device tree

2021-02-25 Thread Ying Fang
Support device tree CPU topology descriptions.
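
As a worked example (for illustration): with -smp 16,sockets=2,cores=4,threads=2,
cpu 9 ends up at /cpus/cpu-map/cluster1/core0/thread1, since
9 / (4 * 2) = 1, (9 / 2) % 4 = 0 and 9 % 2 = 1.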

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 38 +-
 include/hw/arm/virt.h |  1 +
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 371147f3ae..c133b342b8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,10 +351,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 int cpu;
 int addr_cells = 1;
 const MachineState *ms = MACHINE(vms);
+const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 int smp_cpus = ms->smp.cpus;
 
 /*
- * From Documentation/devicetree/bindings/arm/cpus.txt
+ * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
  *  On ARM v8 64-bit systems value should be set to 2,
  *  that corresponds to the MPIDR_EL1 register size.
  *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
@@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
 }
 
+if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) {
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+}
+
 g_free(nodename);
 }
+
+if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) {
+/*
+ * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+ */
+qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d/%s%d",
+"cluster", cpu / (ms->smp.cores * ms->smp.threads),
+"core", (cpu / ms->smp.threads) % ms->smp.cores,
+"thread", cpu % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d",
+"cluster", cpu / ms->smp.cores,
+"core", cpu % ms->smp.cores);
+}
+qemu_fdt_add_path(vms->fdt, map_path);
+qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+g_free(map_path);
+g_free(cpu_path);
+}
+}
 }
 
 static void fdt_add_its_gic_node(VirtMachineState *vms)
@@ -2742,6 +2777,7 @@ static void virt_machine_5_2_options(MachineClass *mc)
 virt_machine_6_0_options(mc);
 compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
 vmc->no_secure_gpio = true;
+vmc->no_cpu_topology = true;
 }
 DEFINE_VIRT_MACHINE(5, 2)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ee9a93101e..7ef6d08ac3 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -129,6 +129,7 @@ struct VirtMachineClass {
 bool no_kvm_steal_time;
 bool acpi_expose_flash;
 bool no_secure_gpio;
+bool no_cpu_topology;
 };
 
 struct VirtMachineState {
-- 
2.23.0




Re: [PATCH v3 1/3] ui/console: Add placeholder flag to message surface

2021-02-25 Thread Gerd Hoffmann
> -DisplaySurface *qemu_create_message_surface(int w, int h,
> -const char *msg)
> +DisplaySurface *qemu_create_placeholder_surface(int w, int h,
> +const char *msg)
>  {
>  DisplaySurface *surface = qemu_create_displaysurface(w, h);
>  pixman_color_t bg = color_table_rgb[0][QEMU_COLOR_BLACK];

Not setting QEMU_PLACEHOLDER_FLAG here?

take care,
  Gerd




Re: [PATCH v22 17/17] i386: move cpu_load_efer into sysemu-only section of cpu.h

2021-02-25 Thread Claudio Fontana
On 2/25/21 5:28 AM, Richard Henderson wrote:
> On 2/24/21 5:34 AM, Claudio Fontana wrote:
>> cpu_load_efer is now used only for sysemu code.
>>
>> Therefore, make this inline function not visible anymore
>> in CONFIG_USER_ONLY builds.
>>
>> Signed-off-by: Claudio Fontana 
>> ---
>>  target/i386/cpu.h | 31 ---
>>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> Perhaps move to cpu-internal.h?  It is not used outside of target/i386/.
> 
> Or declared in cpu-internal.h and placed in cpu-sysemu.c?  I don't see that
> it's particularly performance sensitive either.
> 
> But one way or the other,
> Reviewed-by: Richard Henderson 
> 
> 
> r~
> 

cpu-sysemu.c (and cpu.c) now seems to be about the cpu class, model, properties,
and related functions. Maybe worth writing that down in the header..

the file that seems to contain the pertinent content now is "helper.c", which
maybe should be renamed..

Ciao,

Claudio




Re: [PATCH v3 2/3] ui/console: Pass placeholder surface to displays

2021-02-25 Thread Gerd Hoffmann
> +static void dpy_gfx_switch(DisplayChangeListener *dcl, DisplaySurface 
> *surface)

int width, int height;

> +static DisplaySurface *placeholder;
> +static const char placeholder_msg[] = "Display output is not active.";
> +DisplaySurface *broadcast;
> +
> +if (!dcl->ops->dpy_gfx_switch) {
> +return;
> +}
> +
> +if (surface) {
> +broadcast = surface;
> +} else {
> +if (!placeholder) {
> +placeholder = qemu_create_placeholder_surface(640, 480, 
> placeholder_msg);
> +}

Just create a new one unconditionally.

> @@ -1685,9 +1704,7 @@ void dpy_gfx_replace_surface(QemuConsole *con,
>  if (con != (dcl->con ? dcl->con : active_console)) {
>  continue;
>  }
> -if (dcl->ops->dpy_gfx_switch) {
> -dcl->ops->dpy_gfx_switch(dcl, surface);
> -}
> +dpy_gfx_switch(dcl, surface);

You can look at the old_surface here and pass the size to
dpy_gfx_switch(), so the placeholder is created with the same size.

take care,
  Gerd




Re: [PATCH] qtest: delete redundant qtest.h header files

2021-02-25 Thread Markus Armbruster
Chen Qun  writes:

> There are 23 files that include the "sysemu/qtest.h",
> but they do not use any qtest functions.
>
> Signed-off-by: Chen Qun 

The subject sounds as if you were deleting file include/sysemu/qtest.h,
which would be wrong.  You're actually deleting inclusions.  Suggest to
say

qtest: delete superfluous inclusions of qtest.h

or

delete superfluous #include "sysemu/qtest.h"

Perhaps the maintainer merging your patch can do that for you.




Re: [PATCH v3 3/3] virtio-gpu: Do not distinguish the primary console

2021-02-25 Thread Gerd Hoffmann
> -if (m->scanout_id == 0 && m->width == 0) {
> +if (m->width == 0) {
>  s->ds = qemu_create_placeholder_surface(640, 480,
>  "Guest disabled 
> display.");
>  dpy_gfx_replace_surface(con, s->ds);

Just call dpy_gfx_replace_surface(con, NULL) here and let console.c
create the placeholder?

>  for (i = 0; i < g->conf.max_outputs; i++) {
>  g->scanout[i].con =
>  graphic_console_init(DEVICE(g), i, &virtio_gpu_ops, g);
> -if (i > 0) {
> -dpy_gfx_replace_surface(g->scanout[i].con, NULL);
> -}

I think we should call dpy_gfx_replace_surface(..., NULL)
unconditionally here instead of removing the call.

> +/* primary head */

Comment can go away as we remove the special case for scanout == 0,

> +ds = qemu_create_placeholder_surface(scanout->width  ?: 640,
> + scanout->height ?: 480,
> + "Guest disabled display.");
>  dpy_gfx_replace_surface(scanout->con, ds);

likewise "dpy_gfx_replace_surface(..., NULL);"

take care,
  Gerd




[PATCH v2 1/7] intel_iommu: Fix mask may be uninitialized in vtd_context_device_invalidate

2021-02-25 Thread Eric Auger
With -Werror=maybe-uninitialized configuration we get
../hw/i386/intel_iommu.c: In function ‘vtd_context_device_invalidate’:
../hw/i386/intel_iommu.c:1888:10: error: ‘mask’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
 1888 | mask = ~mask;
  | ~^~~

Add a g_assert_not_reached() to avoid the error.

Signed-off-by: Eric Auger 
---
 hw/i386/intel_iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b4f5094259..3206f379f8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1884,6 +1884,8 @@ static void vtd_context_device_invalidate(IntelIOMMUState 
*s,
 case 3:
 mask = 7;   /* Mask bit 2:0 in the SID field */
 break;
+default:
+g_assert_not_reached();
 }
 mask = ~mask;
 
-- 
2.26.2




[PATCH v2 0/7] Some vIOMMU fixes

2021-02-25 Thread Eric Auger
Hi,

Here is a set of vIOMMU fixes:

SMMUv3:
- top SID computation overflow when handling SMMU_CMD_CFGI_ALL
- internal IOTLB handling (changes related to range invalidation)
  - smmu_iotlb_inv_iova with asid = -1
  - non power of 2 invalidation range handling.

VIRTIO-IOMMU:
  - non power of 2 invalidation range handling.

Best Regards

Eric

v2: https://github.com/eauger/qemu/tree/viommu_fixes_for_6-v2
v1: https://github.com/eauger/qemu/tree/viommu_fixes_for_6

History:
v1 -> v2:
- new:
  - dma: Introduce dma_aligned_pow2_mask()
  - intel_iommu: Fix mask may be uninitialized in vtd_context_device_invalidate
  - hw/arm/smmuv3: Uniformize sid traces

Eric Auger (7):
  intel_iommu: Fix mask may be uninitialized in
vtd_context_device_invalidate
  dma: Introduce dma_aligned_pow2_mask()
  virtio-iommu: Handle non power of 2 range invalidations
  hw/arm/smmu-common: Fix smmu_iotlb_inv_iova when asid is not set
  hw/arm/smmuv3: Enforce invalidation on a power of two range
  hw/arm/smmuv3: Fix SMMU_CMD_CFGI_STE_RANGE handling
  hw/arm/smmuv3: Uniformize sid traces

 hw/arm/smmu-common.c | 32 +-
 hw/arm/smmu-internal.h   |  5 
 hw/arm/smmuv3.c  | 58 +++-
 hw/arm/trace-events  | 24 -
 hw/i386/intel_iommu.c| 32 +++---
 hw/virtio/virtio-iommu.c | 19 ++---
 include/sysemu/dma.h |  3 +++
 softmmu/dma-helpers.c| 26 ++
 8 files changed, 130 insertions(+), 69 deletions(-)

-- 
2.26.2




[PATCH v2 6/7] hw/arm/smmuv3: Fix SMMU_CMD_CFGI_STE_RANGE handling

2021-02-25 Thread Eric Auger
If the whole SID range (32b) is invalidated (SMMU_CMD_CFGI_ALL),
@end overflows and we fail to handle the command properly.

Once this gets fixed, the current code really is awkward in the
sense it loops over the whole range instead of removing the
currently cached configs through a hash table lookup.

Fix both the overflow and the lookup.
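
Concretely (illustrative arithmetic, not part of the patch): SMMU_CMD_CFGI_ALL
is encoded as a CFGI_STE_RANGE with range = 31, so the old 32-bit computation

    end = start + (1 << (range + 1)) - 1;   /* 1 << 32 overflows a 32-bit int */

is undefined, while computing end as a 64-bit value yields the intended
0xffffffff:

    uint64_t end = start + (1ULL << (range + 1)) - 1;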

Signed-off-by: Eric Auger 
---
 hw/arm/smmu-internal.h |  5 +
 hw/arm/smmuv3.c| 34 --
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
index 55147f29be..2d75b31953 100644
--- a/hw/arm/smmu-internal.h
+++ b/hw/arm/smmu-internal.h
@@ -104,4 +104,9 @@ typedef struct SMMUIOTLBPageInvInfo {
 uint64_t mask;
 } SMMUIOTLBPageInvInfo;
 
+typedef struct SMMUSIDRange {
+uint32_t start;
+uint32_t end;
+} SMMUSIDRange;
+
 #endif
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index fdd6332ce5..3b87324ce2 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -32,6 +32,7 @@
 
 #include "hw/arm/smmuv3.h"
 #include "smmuv3-internal.h"
+#include "smmu-internal.h"
 
 /**
  * smmuv3_trigger_irq - pulse @irq if enabled and update
@@ -895,6 +896,20 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
 }
 }
 
+static gboolean
+smmuv3_invalidate_ste(gpointer key, gpointer value, gpointer user_data)
+{
+SMMUDevice *sdev = (SMMUDevice *)key;
+uint32_t sid = smmu_get_sid(sdev);
+SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
+
+if (sid < sid_range->start || sid > sid_range->end) {
+return false;
+}
+trace_smmuv3_config_cache_inv(sid);
+return true;
+}
+
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -965,27 +980,18 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
 {
-uint32_t start = CMD_SID(&cmd), end, i;
+uint32_t start = CMD_SID(&cmd);
 uint8_t range = CMD_STE_RANGE(&cmd);
+uint64_t end = start + (1ULL << (range + 1)) - 1;
+SMMUSIDRange sid_range = {start, end};
 
 if (CMD_SSEC(&cmd)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
-
-end = start + (1 << (range + 1)) - 1;
 trace_smmuv3_cmdq_cfgi_ste_range(start, end);
-
-for (i = start; i <= end; i++) {
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, i);
-SMMUDevice *sdev;
-
-if (!mr) {
-continue;
-}
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
-}
+g_hash_table_foreach_remove(bs->configs, smmuv3_invalidate_ste,
+&sid_range);
 break;
 }
 case SMMU_CMD_CFGI_CD:
-- 
2.26.2




[PATCH v2 5/7] hw/arm/smmuv3: Enforce invalidation on a power of two range

2021-02-25 Thread Eric Auger
As of today, the driver can invalidate a number of pages that is
not a power of 2. However, IOTLB unmap notifications and internal
IOTLB invalidations work with masks, leading to erroneous
invalidations.

In case the range is not a power of 2, split the invalidation into
power-of-2 invalidations.

When looking for a single page entry in the vSMMU internal IOTLB,
let's make sure that if the entry is not found using
g_hash_table_remove() we iterate over all the entries to find a
potential range that overlaps it.
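
As a worked example of the splitting (for illustration): an invalidation of
5 pages starting at page 0 is split into a 4-page invalidation (pages 0-3)
followed by a 1-page invalidation (page 4), so every notification covers a
power-of-2, naturally aligned range.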

Signed-off-by: Eric Auger 
---
 hw/arm/smmu-common.c | 30 ++
 hw/arm/smmuv3.c  | 24 
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index e9ca3aebb2..84d2c62c26 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -151,22 +151,28 @@ inline void
 smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
 uint8_t tg, uint64_t num_pages, uint8_t ttl)
 {
+/* if tg is not set we use 4KB range invalidation */
+uint8_t granule = tg ? tg * 2 + 10 : 12;
+
 if (ttl && (num_pages == 1) && (asid >= 0)) {
 SMMUIOTLBKey key = smmu_get_iotlb_key(asid, iova, tg, ttl);
 
-g_hash_table_remove(s->iotlb, &key);
-} else {
-/* if tg is not set we use 4KB range invalidation */
-uint8_t granule = tg ? tg * 2 + 10 : 12;
-
-SMMUIOTLBPageInvInfo info = {
-.asid = asid, .iova = iova,
-.mask = (num_pages * 1 << granule) - 1};
-
-g_hash_table_foreach_remove(s->iotlb,
-smmu_hash_remove_by_asid_iova,
-&info);
+if (g_hash_table_remove(s->iotlb, &key)) {
+return;
+}
+/*
+ * if the entry is not found, let's see if it does not
+ * belong to a larger IOTLB entry
+ */
 }
+
+SMMUIOTLBPageInvInfo info = {
+.asid = asid, .iova = iova,
+.mask = (num_pages * 1 << granule) - 1};
+
+g_hash_table_foreach_remove(s->iotlb,
+smmu_hash_remove_by_asid_iova,
+&info);
 }
 
 inline void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index bd1f97000d..fdd6332ce5 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -861,7 +861,8 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
 uint16_t vmid = CMD_VMID(cmd);
 bool leaf = CMD_LEAF(cmd);
 uint8_t tg = CMD_TG(cmd);
-hwaddr num_pages = 1;
+uint64_t first_page = 0, last_page;
+uint64_t num_pages = 1;
 int asid = -1;
 
 if (tg) {
@@ -874,9 +875,24 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
 if (type == SMMU_CMD_TLBI_NH_VA) {
 asid = CMD_ASID(cmd);
 }
-trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf);
-smmuv3_inv_notifiers_iova(s, asid, addr, tg, num_pages);
-smmu_iotlb_inv_iova(s, asid, addr, tg, num_pages, ttl);
+
+/* Split invalidations into ^2 range invalidations */
+last_page = num_pages - 1;
+while (num_pages) {
+uint8_t granule = tg * 2 + 10;
+uint64_t mask, count;
+
+mask = dma_aligned_pow2_mask(first_page, last_page, 64 - granule);
+count = mask + 1;
+
+trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, count, ttl, leaf);
+smmuv3_inv_notifiers_iova(s, asid, addr, tg, count);
+smmu_iotlb_inv_iova(s, asid, addr, tg, count, ttl);
+
+num_pages -= count;
+first_page += count;
+addr += count * BIT_ULL(granule);
+}
 }
 
 static int smmuv3_cmdq_consume(SMMUv3State *s)
-- 
2.26.2




[PATCH v2 2/7] dma: Introduce dma_aligned_pow2_mask()

2021-02-25 Thread Eric Auger
Currently get_naturally_aligned_size() is used by the intel iommu
to compute the maximum invalidation range based on @size which is
a power of 2 while being aligned with the @start address and less
than the maximum range defined by @gaw.

This helper is also useful for other iommu devices (virtio-iommu,
SMMUv3) to make sure IOMMU UNMAP notifiers are only called with
power-of-2 range sizes.

Let's move the helper into dma-helpers.c and rename it to
dma_aligned_pow2_mask(). Also rewrite it so that it
accommodates UINT64_MAX values for the size mask and max mask.
It now returns a mask instead of a size. Change the caller.
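
As a worked example (for illustration): with start = 0x5000 and end = 0xafff
the start address is only 4KB aligned, so the helper returns 0xfff and the
caller emits a 4KB notification; with start = 0x8000 and end = 0xbfff the
alignment allows the full 16KB, so it returns 0x3fff.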

Signed-off-by: Eric Auger 
---
 hw/i386/intel_iommu.c | 30 +++---
 include/sysemu/dma.h  |  3 +++
 softmmu/dma-helpers.c | 26 ++
 3 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3206f379f8..6be8f32918 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/pci-host/q35.h"
 #include "sysemu/kvm.h"
+#include "sysemu/dma.h"
 #include "sysemu/sysemu.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
@@ -3455,24 +3456,6 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
 return vtd_dev_as;
 }
 
-static uint64_t get_naturally_aligned_size(uint64_t start,
-   uint64_t size, int gaw)
-{
-uint64_t max_mask = 1ULL << gaw;
-uint64_t alignment = start ? start & -start : max_mask;
-
-alignment = MIN(alignment, max_mask);
-size = MIN(size, max_mask);
-
-if (alignment <= size) {
-/* Increase the alignment of start */
-return alignment;
-} else {
-/* Find the largest page mask from size */
-return 1ULL << (63 - clz64(size));
-}
-}
-
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -3501,13 +3484,14 @@ static void vtd_address_space_unmap(VTDAddressSpace 
*as, IOMMUNotifier *n)
 
 while (remain >= VTD_PAGE_SIZE) {
 IOMMUTLBEvent event;
-uint64_t mask = get_naturally_aligned_size(start, remain, s->aw_bits);
+uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
+uint64_t size = mask + 1;
 
-assert(mask);
+assert(size);
 
 event.type = IOMMU_NOTIFIER_UNMAP;
 event.entry.iova = start;
-event.entry.addr_mask = mask - 1;
+event.entry.addr_mask = mask;
 event.entry.target_as = &address_space_memory;
 event.entry.perm = IOMMU_NONE;
 /* This field is meaningless for unmap */
@@ -3515,8 +3499,8 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, 
IOMMUNotifier *n)
 
 memory_region_notify_iommu_one(n, &event);
 
-start += mask;
-remain -= mask;
+start += size;
+remain -= size;
 }
 
 assert(!remain);
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index a052f7bca3..2acb303be2 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -296,4 +296,7 @@ uint64_t dma_buf_write(uint8_t *ptr, int32_t len, 
QEMUSGList *sg);
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
 QEMUSGList *sg, enum BlockAcctType type);
 
+uint64_t dma_aligned_pow2_mask(uint64_t start, uint64_t end,
+   int max_addr_bits);
+
 #endif
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 29001b5459..7d766a5e89 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -330,3 +330,29 @@ void dma_acct_start(BlockBackend *blk, BlockAcctCookie 
*cookie,
 {
 block_acct_start(blk_get_stats(blk), cookie, sg->size, type);
 }
+
+uint64_t dma_aligned_pow2_mask(uint64_t start, uint64_t end, int max_addr_bits)
+{
+uint64_t max_mask = UINT64_MAX, addr_mask = end - start;
+uint64_t alignment_mask, size_mask;
+
+if (max_addr_bits != 64) {
+max_mask = (1ULL << max_addr_bits) - 1;
+}
+
+alignment_mask = start ? (start & -start) - 1 : max_mask;
+alignment_mask = MIN(alignment_mask, max_mask);
+size_mask = MIN(addr_mask, max_mask);
+
+if (alignment_mask <= size_mask) {
+/* Increase the alignment of start */
+return alignment_mask;
+} else {
+/* Find the largest page mask from size */
+if (addr_mask == UINT64_MAX) {
+return UINT64_MAX;
+}
+return (1ULL << (63 - clz64(addr_mask + 1))) - 1;
+}
+}
+
-- 
2.26.2




[PATCH v2 3/7] virtio-iommu: Handle non power of 2 range invalidations

2021-02-25 Thread Eric Auger
Unmap notifiers work with an address mask assuming an
invalidation range of a power of 2. Nothing mandates this
in the VIRTIO-IOMMU spec.

So in case the range is not a power of 2, split it into
several invalidations.
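
As a worked example (for illustration): unmapping [0x1000, 0x3fff] (three
4KB pages) is notified as two ranges, [0x1000, 0x1fff] and [0x2000, 0x3fff],
each of which is a power-of-2, naturally aligned range.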

Signed-off-by: Eric Auger 
---
 hw/virtio/virtio-iommu.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index c2883a2f6c..1b23e8e18c 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -155,6 +155,7 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion 
*mr, hwaddr virt_start,
   hwaddr virt_end)
 {
 IOMMUTLBEvent event;
+uint64_t delta = virt_end - virt_start;
 
 if (!(mr->iommu_notify_flags & IOMMU_NOTIFIER_UNMAP)) {
 return;
@@ -164,12 +165,24 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion 
*mr, hwaddr virt_start,
 
 event.type = IOMMU_NOTIFIER_UNMAP;
 event.entry.target_as = &address_space_memory;
-event.entry.addr_mask = virt_end - virt_start;
-event.entry.iova = virt_start;
 event.entry.perm = IOMMU_NONE;
 event.entry.translated_addr = 0;
+event.entry.addr_mask = delta;
+event.entry.iova = virt_start;
 
-memory_region_notify_iommu(mr, 0, event);
+if (delta == UINT64_MAX) {
+memory_region_notify_iommu(mr, 0, event);
+}
+
+
+while (virt_start != virt_end + 1) {
+uint64_t mask = dma_aligned_pow2_mask(virt_start, virt_end, 64);
+
+event.entry.addr_mask = mask;
+event.entry.iova = virt_start;
+memory_region_notify_iommu(mr, 0, event);
+virt_start += mask + 1;
+}
 }
 
 static gboolean virtio_iommu_notify_unmap_cb(gpointer key, gpointer value,
-- 
2.26.2




[PATCH v2 7/7] hw/arm/smmuv3: Uniformize sid traces

2021-02-25 Thread Eric Auger
Convert all sid printouts to sid=0x%x.

Signed-off-by: Eric Auger 
---
 hw/arm/trace-events | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index a335ee891d..b79a91af5f 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -29,26 +29,26 @@ smmuv3_cmdq_opcode(const char *opcode) "<--- %s"
 smmuv3_cmdq_consume_out(uint32_t prod, uint32_t cons, uint8_t prod_wrap, 
uint8_t cons_wrap) "prod:%d, cons:%d, prod_wrap:%d, cons_wrap:%d "
 smmuv3_cmdq_consume_error(const char *cmd_name, uint8_t cmd_error) "Error on 
%s command execution: %d"
 smmuv3_write_mmio(uint64_t addr, uint64_t val, unsigned size, uint32_t r) 
"addr: 0x%"PRIx64" val:0x%"PRIx64" size: 0x%x(%d)"
-smmuv3_record_event(const char *type, uint32_t sid) "%s sid=%d"
-smmuv3_find_ste(uint16_t sid, uint32_t features, uint16_t sid_split) "SID:0x%x 
features:0x%x, sid_split:0x%x"
+smmuv3_record_event(const char *type, uint32_t sid) "%s sid=0x%x"
+smmuv3_find_ste(uint16_t sid, uint32_t features, uint16_t sid_split) "sid=0x%x 
features:0x%x, sid_split:0x%x"
 smmuv3_find_ste_2lvl(uint64_t strtab_base, uint64_t l1ptr, int l1_ste_offset, 
uint64_t l2ptr, int l2_ste_offset, int max_l2_ste) "strtab_base:0x%"PRIx64" 
l1ptr:0x%"PRIx64" l1_off:0x%x, l2ptr:0x%"PRIx64" l2_off:0x%x max_l2_ste:%d"
 smmuv3_get_ste(uint64_t addr) "STE addr: 0x%"PRIx64
-smmuv3_translate_disable(const char *n, uint16_t sid, uint64_t addr, bool 
is_write) "%s sid=%d bypass (smmu disabled) iova:0x%"PRIx64" is_write=%d"
-smmuv3_translate_bypass(const char *n, uint16_t sid, uint64_t addr, bool 
is_write) "%s sid=%d STE bypass iova:0x%"PRIx64" is_write=%d"
-smmuv3_translate_abort(const char *n, uint16_t sid, uint64_t addr, bool 
is_write) "%s sid=%d abort on iova:0x%"PRIx64" is_write=%d"
-smmuv3_translate_success(const char *n, uint16_t sid, uint64_t iova, uint64_t 
translated, int perm) "%s sid=%d iova=0x%"PRIx64" translated=0x%"PRIx64" 
perm=0x%x"
+smmuv3_translate_disable(const char *n, uint16_t sid, uint64_t addr, bool 
is_write) "%s sid=0x%x bypass (smmu disabled) iova:0x%"PRIx64" is_write=%d"
+smmuv3_translate_bypass(const char *n, uint16_t sid, uint64_t addr, bool 
is_write) "%s sid=0x%x STE bypass iova:0x%"PRIx64" is_write=%d"
+smmuv3_translate_abort(const char *n, uint16_t sid, uint64_t addr, bool 
is_write) "%s sid=0x%x abort on iova:0x%"PRIx64" is_write=%d"
+smmuv3_translate_success(const char *n, uint16_t sid, uint64_t iova, uint64_t 
translated, int perm) "%s sid=0x%x iova=0x%"PRIx64" translated=0x%"PRIx64" 
perm=0x%x"
 smmuv3_get_cd(uint64_t addr) "CD addr: 0x%"PRIx64
 smmuv3_decode_cd(uint32_t oas) "oas=%d"
 smmuv3_decode_cd_tt(int i, uint32_t tsz, uint64_t ttb, uint32_t granule_sz, 
bool had) "TT[%d]:tsz:%d ttb:0x%"PRIx64" granule_sz:%d had:%d"
-smmuv3_cmdq_cfgi_ste(int streamid) "streamid =%d"
+smmuv3_cmdq_cfgi_ste(int streamid) "streamid= 0x%x"
 smmuv3_cmdq_cfgi_ste_range(int start, int end) "start=0x%x - end=0x%x"
-smmuv3_cmdq_cfgi_cd(uint32_t sid) "streamid = %d"
-smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t 
perc) "Config cache HIT for sid %d (hits=%d, misses=%d, hit rate=%d)"
-smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, 
uint32_t perc) "Config cache MISS for sid %d (hits=%d, misses=%d, hit rate=%d)"
-smmuv3_s1_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, uint64_t 
num_pages, uint8_t ttl, bool leaf) "vmid =%d asid =%d addr=0x%"PRIx64" tg=%d 
num_pages=0x%"PRIx64" ttl=%d leaf=%d"
+smmuv3_cmdq_cfgi_cd(uint32_t sid) "sid=0x%x"
+smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t 
perc) "Config cache HIT for sid=0x%x (hits=%d, misses=%d, hit rate=%d)"
+smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, 
uint32_t perc) "Config cache MISS for sid=0x%x (hits=%d, misses=%d, hit 
rate=%d)"
+smmuv3_s1_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, uint64_t 
num_pages, uint8_t ttl, bool leaf) "vmid=%d asid=%d addr=0x%"PRIx64" tg=%d 
num_pages=0x%"PRIx64" ttl=%d leaf=%d"
 smmuv3_cmdq_tlbi_nh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
-smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid %d"
+smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu 
mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
mr=%s"
 smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova, 
uint8_t tg, uint64_t num_pages) "iommu mr=%s asid=%d iova=0x%"PRIx64" tg=%d 
num_pages=0x%"PRIx64
-- 
2.26.2




[PATCH v2 4/7] hw/arm/smmu-common: Fix smmu_iotlb_inv_iova when asid is not set

2021-02-25 Thread Eric Auger
If the asid is not set, do not attempt to locate the key directly
as all inserted keys have a valid asid.

Use g_hash_table_foreach_remove instead.
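
For reference, the fallback path already present in smmu_iotlb_inv_iova()
walks the whole hash table with a predicate, roughly like the sketch below
(the predicate and struct names are only illustrative):

    SMMUIOTLBPageInvInfo info = {
        .asid = asid,
        .iova = iova,
        .mask = (num_pages << granule) - 1,  /* granule derived from tg */
    };

    g_hash_table_foreach_remove(s->iotlb,
                                smmu_hash_remove_by_asid_iova, &info);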

Signed-off-by: Eric Auger 
---
 hw/arm/smmu-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 405d5c5325..e9ca3aebb2 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -151,7 +151,7 @@ inline void
 smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
 uint8_t tg, uint64_t num_pages, uint8_t ttl)
 {
-if (ttl && (num_pages == 1)) {
+if (ttl && (num_pages == 1) && (asid >= 0)) {
 SMMUIOTLBKey key = smmu_get_iotlb_key(asid, iova, tg, ttl);
 
 g_hash_table_remove(s->iotlb, &key);
-- 
2.26.2




Re: [PATCH v2 38/42] esp: convert ti_buf from array to Fifo8

2021-02-25 Thread Mark Cave-Ayland

On 23/02/2021 18:49, Philippe Mathieu-Daudé wrote:


On 2/9/21 8:30 PM, Mark Cave-Ayland wrote:

Rename TI_BUFSZ to ESP_FIFO_SZ since this constant is really describing the size
of the FIFO and is not directly related to the TI size.

Signed-off-by: Mark Cave-Ayland 
---
  hw/scsi/esp.c | 117 ++
  include/hw/scsi/esp.h |   8 +--
  2 files changed, 79 insertions(+), 46 deletions(-)



@@ -806,11 +818,9 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t 
val)
  } else {
  trace_esp_error_fifo_overrun();
  }
-} else if (s->ti_wptr == TI_BUFSZ - 1) {
-trace_esp_error_fifo_overrun();
  } else {
  s->ti_size++;
-s->ti_buf[s->ti_wptr++] = val & 0xff;
+esp_fifo_push(s, val & 0xff);


Personally I'd drop the '& 0xff' part.


I left it as it was so that it was a direct translation of the code it was replacing, 
but I can easily drop it.



  }
  
  /* Non-DMA transfers raise an interrupt after every byte */

@@ -839,8 +849,7 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t 
val)
  case CMD_FLUSH:
  trace_esp_mem_writeb_cmd_flush(val);
  /*s->ti_size = 0;*/


Is this comment still meaningful?


This line can also be removed, so I will make this change for v3.


-s->ti_wptr = 0;
-s->ti_rptr = 0;
+fifo8_reset(&s->fifo);
  break;
  case CMD_RESET:
  trace_esp_mem_writeb_cmd_reset(val);
@@ -958,11 +967,18 @@ static int esp_pre_save(void *opaque)
  static int esp_post_load(void *opaque, int version_id)
  {
  ESPState *s = ESP(opaque);
+int len, i;
  
  version_id = MIN(version_id, s->mig_version_id);
  
  if (version_id < 5) {

  esp_set_tc(s, s->mig_dma_left);
+
+/* Migrate ti_buf to fifo */
+len = s->mig_ti_wptr - s->mig_ti_rptr;
+for (i = 0; i < len; i++) {
+fifo8_push(&s->fifo, s->mig_ti_buf[i]);


Again I dare to add:
Reviewed-by: Philippe Mathieu-Daudé 


Thank you :)


ATB,

Mark.



[PATCH v3 00/38] ppc: qemu: Convert qemu-ppce500 to driver model and enable additional driver support

2021-02-25 Thread Bin Meng
At present when building qemu-ppce500 the following warnings are seen:

= WARNING ==
This board does not use CONFIG_DM. CONFIG_DM will be
compulsory starting with the v2020.01 release.
Failure to update may result in board removal.
  UPD include/generated/timestamp_autogenerated.h
See doc/driver-model/migration.rst for more info.

= WARNING ==
This board does not use CONFIG_DM_PCI Please update
the board to use CONFIG_DM_PCI before the v2019.07 release.
Failure to update by the deadline may result in board removal.
See doc/driver-model/migration.rst for more info.

= WARNING ==
This board does not use CONFIG_DM_ETH (Driver Model
for Ethernet drivers). Please update the board to use
CONFIG_DM_ETH before the v2020.07 release. Failure to
update by the deadline may result in board removal.
See doc/driver-model/migration.rst for more info.


The conversion of the qemu-ppce500 board to driver model is long overdue.

When testing the existing qemu-ppce500 support, PCI was found broken.
This is caused by 2 separate issues:

- One issue was caused by U-Boot:
  Commit e002474158d1 ("pci: pci-uclass: Dynamically allocate the PCI regions")
  Patch #1 updated the non-DM fsl_pci_init driver to dynamically allocate the
  PCI regions, to keep in sync with the pci uclass driver
- One issue was caused by QEMU:
  commit e6b4e5f4795b ("PPC: e500: Move CCSR and MMIO space to upper end of 
address space")
  commit cb3778a0455a ("PPC: e500 pci host: Add support for ATMUs")
  Patch #3-4 fixed this issue to keep in sync with latest QEMU upstream

Patch #5-8, #34-36 are minor fixes and clean-ups.

Starting from patch#9, these are driver model conversion patches.

Patch #11-17 are mainly related to CONFIG_ADDR_MAP, a library to support targets
that have non-identity virtual-physical address mappings. A new command 
'addrmap'
is introduced to aid debugging, and a fix to arch/powerpc/asm/include/io.h is
made to correct the usage of CONFIG_ADDR_MAP as it can only be used in the post-
relocation phase. Also the initialization of this library is moved a bit earlier
in the post-relocation phase otherwise device drivers won't work.

Patch #19-21 are 85xx PCI driver fixes. They add support for controller register
physical addresses beyond 32-bit, as well as support for 64-bit bus and cpu addresses,
as current upstream QEMU uses 64-bit cpu addresses.

Starting from patch#24, these are additional driver support patches.

Patch #24, #26 are minor fix to the 'virtio' command and BLK driver dependency.

Patch #25 enables the VirtIO NET support as by default a VirtIO standard PCI
networking device is connected as an ethernet interface at PCI address 0.1.0.

Patch #27 enables the VirtIO BLK driver support.

Patch #28-30 enables the GPIO support.

Patch #31-32 enables poweroff via GPIO.

Patch #33 enables RTC over the I2C bus.

Patch #37 moves the qemu-ppce500 board code to board/emulation as that is the
place for other QEMU targets like x86, arm, riscv.

Patch #38 adds a reST document to describe how to build and run U-Boot for the
QEMU ppce500 machine.

I hope we can make this series to U-Boot v2021.04 release.

This series is available at u-boot-x86/qemu-ppc for testing.

This cover letter is cc'ed to QEMU mailing list for a heads-up.
A future patch will be sent to the QEMU mailing list to bring its in-tree
U-Boot source code up to date.

Changes in v3:
- rebase on top of u-boot/master

Changes in v2:
- drop the revert patch of commit e002474158d1
- new patch: pci: fsl_pci_init: Dynamically allocate the PCI regions
- add more details in the commit message, and put some comments
  in the codes to explain why
- add doc/usage/addrmap.rst
- new patch: test: cmd: Add a basic test for 'addrmap' command
- new patch: virtio: Fix VirtIO BLK driver dependency
- new patch: ppc: qemu: Enable VirtIO BLK support
- new patch: ppc: mpc85xx: Add 'gpibe' register to 'struct ccsr_gpio'
- new patch: gpio: mpc8xxx: Support controller register physical address beyond 
32-bit
- new patch: ppc: qemu: Enable GPIO support
- new patch: dm: sysreset: Add a Kconfig option for the 'reset' command
- new patch: ppc: qemu: Enable support for power off via GPIO
- new patch: ppc: qemu: Enable RTC support via I2C
- new patch: ppc: qemu: Delete the temporary FDT virtual-physical mapping after 
U-Boot is relocated
- new patch: ppc: qemu: Drop a custom env variable 'fdt_addr_r'
- new patch: ppc: qemu: Drop fixed_sdram()
- add descriptions for VirtIO BLK, RTC and power off

Bin Meng (38):
  pci: fsl_pci_init: Dynamically allocate the PCI regions
  ppc: qemu: Update MAINTAINERS for correct email address
  common: fdt_support: Support special case of PCI address in
fdt_read_prop()
  ppc: qemu: Support non-identity PCI bus 

RE: [PATCH 07/10] Disable auto-coverge before entering COLO mode.

2021-02-25 Thread Rao, Lei
Sorry for the late reply due to CNY.
Auto-converge ensures that live migration can be completed smoothly when there 
are too many dirty pages.
COLO may encounter the same situation when rebuilding a new secondary VM. 
So, I think it is necessary to enable COLO and auto-converge at the same time.
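
For comparison, the stricter alternative suggested below (rejecting the
combination up front) would only be a few lines in migrate_caps_check();
a rough sketch, with the error wording purely as an example:

    if (cap_list[MIGRATION_CAPABILITY_X_COLO] &&
        cap_list[MIGRATION_CAPABILITY_AUTO_CONVERGE]) {
        error_setg(errp, "COLO is not compatible with auto-converge");
        return false;
    }

This patch instead keeps both capabilities enabled and only turns
auto-converge off (and stops the CPU throttle) once the migration thread
finishes, so throttling can still help the migration converge.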

Thanks,
Lei.

-Original Message-
From: Lukas Straub  
Sent: Sunday, February 14, 2021 6:52 PM
To: Rao, Lei 
Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; 
jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; 
dgilb...@redhat.com; qemu-devel@nongnu.org
Subject: Re: [PATCH 07/10] Disable auto-coverge before entering COLO mode.

On Wed, 13 Jan 2021 10:46:32 +0800
leirao  wrote:

> From: "Rao, Lei" 
> 
> If we don't disable the feature of auto-converge for live migration 
> before entering COLO mode, it will continue to run with COLO running, 
> and eventually the system will hang due to the CPU throttle reaching 
> DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
> 
> Signed-off-by: Lei Rao 
> ---
>  migration/migration.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c index 
> 31417ce..6ab37e5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error 
> **errp)
>  qapi_free_MigrationCapabilityStatusList(cap);
>  }
>  
> +static void colo_auto_converge_enabled(bool value, Error **errp) {
> +MigrationCapabilityStatusList *cap = NULL;
> +
> +if (migrate_colo_enabled() && migrate_auto_converge()) {
> +QAPI_LIST_PREPEND(cap,
> +  migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
> +  value));
> +qmp_migrate_set_capabilities(cap, errp);
> +qapi_free_MigrationCapabilityStatusList(cap);
> +}
> +cpu_throttle_stop();
> +}
> +

I think it's better to error out in migration_prepare or migrate_caps_check if 
both colo and auto-converge is enabled.

>  static void migrate_set_block_incremental(MigrationState *s, bool 
> value)  {
>  s->parameters.block_incremental = value; @@ -3401,7 +3415,7 @@ 
> static MigIterateState migration_iteration_run(MigrationState *s)  
> static void migration_iteration_finish(MigrationState *s)  {
>  /* If we enabled cpu throttling for auto-converge, turn it off. */
> -cpu_throttle_stop();
> +colo_auto_converge_enabled(false, &error_abort);
>  
>  qemu_mutex_lock_iothread();
>  switch (s->state) {



-- 




RE: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint

2021-02-25 Thread Rao, Lei
If the user shuts down the guest normally and QEMU crashes, I think this is 
unacceptable.
Since we can avoid this situation, why not do it?
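
For reference, the one-line change to softmmu/runstate.c quoted (truncated)
at the end of this thread presumably just adds the missing entry to
runstate_transitions_def[], i.e. something like:

    { RUN_STATE_COLO, RUN_STATE_SHUTDOWN },

so that a guest-initiated power off during a checkpoint no longer hits the
invalid-transition abort.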

Thanks,
Lei.

-Original Message-
From: Lukas Straub  
Sent: Sunday, February 14, 2021 7:46 PM
To: Rao, Lei 
Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; 
jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; 
dgilb...@redhat.com; qemu-devel@nongnu.org
Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown during 
checkpoint

On Fri, 29 Jan 2021 02:57:57 +
"Rao, Lei"  wrote:

> The state will be set RUN_STATE_COLO in colo_do_checkpoint_transaction(). If 
> the guest executes power off or shutdown at this time and the QEMU main 
> thread will call vm_shutdown(), it will set the state to RUN_STATE_SHUTDOWN.
> The state switch from RUN_STATE_COLO to RUN_STATE_SHUTDOWN is not defined in 
> runstate_transitions_def. this will cause QEMU crash. Although this is small 
> probability, it may still happen.

This patch fixes the 'colo' -> 'shutdown' transition. AFAIK then 
colo_do_checkpoint_transaction will call vm_start() again, which does 
'shutdown' -> 'running' and (rightfully) crashes. So I think it is better to 
crash here too.

>  By the way. Do you have any comments about other patches?
> Thanks,
> Lei.
> 
> -Original Message-
> From: Lukas Straub 
> Sent: Thursday, January 28, 2021 2:24 AM
> To: Rao, Lei 
> Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; 
> jasow...@redhat.com; zhang.zhanghaili...@huawei.com; 
> quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org
> Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown 
> during checkpoint
> 
> On Thu, 21 Jan 2021 01:48:31 +
> "Rao, Lei"  wrote:
> 
> > The Primary VM can be shut down when it is in COLO state, which may trigger 
> > this bug.  
> 
> Do you have a backtrace for this bug?
> 
> > About 'shutdown' -> 'colo' -> 'running', I think you are right, I did have 
> > the problems you said. For 'shutdown'->'colo', The fixed 
> > patch(5647051f432b7c9b57525470b0a79a31339062d2) have been merged.
> > Recently, I found another bug as follows in the test.
> > qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running'
> > Aborted (core dumped)
> > The gdb bt as following:
> > #0  __GI_raise (sig=sig@entry=6) at 
> > ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x7faa3d613859 in __GI_abort () at abort.c:79
> > #2  0x55c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at 
> > vl.c:723
> > #3  0x55c5a1f8cae4 in vm_prepare_start () at 
> > /home/workspace/colo-qemu/cpus.c:2206
> > #4  0x55c5a1f8cb1b in vm_start () at 
> > /home/workspace/colo-qemu/cpus.c:2213
> > #5  0x55c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) 
> > at migration/migration.c:3376
> > #6  0x55c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at 
> > migration/migration.c:3527
> > #7  0x55c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at 
> > util/qemu-thread-posix.c:519
> > #8  0x7faa3d7e9609 in start_thread (arg=) at 
> > pthread_create.c:477
> > #9  0x7faa3d710293 in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > 
> > For the bug, I made the following changes:
> > @@ -3379,7 +3379,9 @@ static void 
> > migration_iteration_finish(MigrationState *s)
> >  case MIGRATION_STATUS_CANCELLED:
> >  case MIGRATION_STATUS_CANCELLING:
> >  if (s->vm_was_running) {
> > -vm_start();
> > +if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> > +vm_start();
> > +}
> >  } else {
> >  if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
> >  runstate_set(RUN_STATE_POSTMIGRATE);
> >  
> > I will send the patch to community after more test.
> > 
> > Thanks,
> > Lei.
> > 
> > -Original Message-
> > From: Lukas Straub 
> > Sent: Thursday, January 21, 2021 3:13 AM
> > To: Rao, Lei 
> > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; 
> > jasow...@redhat.com; zhang.zhanghaili...@huawei.com; 
> > quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org
> > Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown 
> > during checkpoint
> > 
> > On Wed, 13 Jan 2021 10:46:27 +0800
> > leirao  wrote:
> >   
> > > From: "Rao, Lei" 
> > > 
> > > This patch fixes the following:
> > > qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
> > > Aborted (core dumped)
> > > 
> > > Signed-off-by: Lei Rao 
> > 
> > I wonder how that is possible, since the VM is stopped during 'colo' state.
> > 
> > Unrelated to this patch, I think this area needs some work since the 
> > following unintended runstate transition is possible:
> > 'shutdown' -> 'colo' -> 'running'.
> >   
> > > ---
> > >  softmmu/runstate.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/softmmu/runstate.c b/softmmu/run

Re: [PATCH] qapi: Remove QMP events and commands from user-mode builds

2021-02-25 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 24/02/21 18:16, Philippe Mathieu-Daudé wrote:
>> We removed the QMP loop in user-mode builds in commit 1935e0e4e09
>> ("qapi/meson: Remove QMP from user-mode emulation"), now commands
>> and events code is unreachable.
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>   qapi/meson.build | 12 
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>> diff --git a/qapi/meson.build b/qapi/meson.build
>> index 0652569bc43..fcb15a78f15 100644
>> --- a/qapi/meson.build
>> +++ b/qapi/meson.build
>> @@ -102,11 +102,15 @@
>>   'qapi-types-@0@.h'.format(module),
>>   'qapi-visit-@0@.c'.format(module),
>>   'qapi-visit-@0@.h'.format(module),
>> -'qapi-events-@0@.c'.format(module),
>> -'qapi-events-@0@.h'.format(module),
>> -'qapi-commands-@0@.c'.format(module),
>> -'qapi-commands-@0@.h'.format(module),
>> ]
>> +  if have_system or have_tools
>> +qapi_module_outputs += [
>> +  'qapi-events-@0@.c'.format(module),
>> +  'qapi-events-@0@.h'.format(module),
>> +  'qapi-commands-@0@.c'.format(module),
>> +  'qapi-commands-@0@.h'.format(module),
>> +]
>> +  endif
>> if module.endswith('-target')
>>   qapi_specific_outputs += qapi_module_outputs
>> else
>> 
>
> Acked-by: Paolo Bonzini 

I'm taking this as "Markus, care to take this through your tree?"

Queued, thanks!




Re: [PATCH v3 3/3] virtio-gpu: Do not distinguish the primary console

2021-02-25 Thread Akihiko Odaki
On Thu, 25 Feb 2021 at 18:11, Gerd Hoffmann :
>
> > -if (m->scanout_id == 0 && m->width == 0) {
> > +if (m->width == 0) {
> >  s->ds = qemu_create_placeholder_surface(640, 480,
> >  "Guest disabled 
> > display.");
> >  dpy_gfx_replace_surface(con, s->ds);
>
> Just call dpy_gfx_replace_surface(con, NULL) here and let console.c
> create the placeholder?

I'll change according to this and the comments in the other replies
and submit a new version.

>
> >  for (i = 0; i < g->conf.max_outputs; i++) {
> >  g->scanout[i].con =
> >  graphic_console_init(DEVICE(g), i, &virtio_gpu_ops, g);
> > -if (i > 0) {
> > -dpy_gfx_replace_surface(g->scanout[i].con, NULL);
> > -}
>
> I think we should call dpy_gfx_replace_surface(..., NULL)
> unconditionally here instead of removing the call.

graphic_console_init will set a placeholder anyway so it does not need
an additional call.

>
> > +/* primary head */
>
> Comment can go away as we remove the special case for scanout == 0,
>
> > +ds = qemu_create_placeholder_surface(scanout->width  ?: 640,
> > + scanout->height ?: 480,
> > + "Guest disabled display.");
> >  dpy_gfx_replace_surface(scanout->con, ds);
>
> likewise "dpy_gfx_replace_surface(..., NULL);"
>
> take care,
>   Gerd
>



Re: [PATCH v2 1/2] tests/acceptance: replace unstable apt.armbian.com URLs for orangepi-pc, cubieboard

2021-02-25 Thread Daniel P . Berrangé
On Wed, Feb 24, 2021 at 09:02:51PM +0100, Niek Linnenbank wrote:
> Hi Philippe, Cleber,
> 
> On Wed, Feb 24, 2021 at 8:14 PM Cleber Rosa  wrote:
> 
> > On Wed, Feb 24, 2021 at 10:12:10AM +0100, Philippe Mathieu-Daudé wrote:
> > > Hi Niek,
> > >
> > > On 2/23/21 11:53 PM, Niek Linnenbank wrote:
> > > > Currently the automated acceptance tests for the Orange Pi PC and
> > cubieboard
> > > > machines are disabled by default. The tests for both machines require
> > artifacts
> > > > that are stored on the apt.armbian.com domain. Unfortunately, some of
> > these artifacts
> > > > have been removed from apt.armbian.com and it is uncertain whether
> > more will be removed.
> > > >
> > > > This commit moves the artifacts previously stored on apt.armbian.com
> > to github
> > > > and retrieves them using the path: '//'.
> > > >
> > > > Signed-off-by: Niek Linnenbank 
> > > > Reviewed-by: Willian Rampazzo 
> > > > Reviewed-by: Cleber Rosa 
> > >
> > > > Tested-by: Cleber Rosa 
> > >
> > > Did Cleber test this new version?
> > >
> >
> 
> You're right, it was the previous version (v1) that Cleber tested using my
> own machine URL's.
> 
> I was actually not sure whether I should or should not have added the
> Tested-by/Reviewed-by tags in such scenario.
> The content had to be changed due to the outcome of our discussion but also
> I thought I don't want to silently drop
> the tags since Cleber invested his time into it too.
> 
> What should I do here, next time?
> 
> 
> 
> >
> > Nope, and I'm having issues with those URLs.  For instance:
> >
> >$ curl -L
> > https://github.com/nieklinnenbank/QemuArtifacts/raw/master/cubieboard/linux-image-dev-sunxi_5.75_armhf.deb
> >version https://git-lfs.github.com/spec/v1
> >oid
> > sha256:a4b765c851de76592f55023b1ff4104f7fd29bf90937e6054e0a64fdda56380b
> >size 20331524
> >
> > Looks like it has to do with GitHub's behavior wrt quota.
> >
> 
> Indeed. Just this morning I received an e-mail from github with the
> following text:
> 
> "[GitHub] Git LFS disabled for nieklinnenbank
> 
> Git LFS has been disabled on your personal account nieklinnenbank because
> you’ve exceeded your data plan by at least 150%.
> Please purchase additional data packs to cover your bandwidth and storage
> usage:
> 
>   https://github.com/account/billing/data/upgrade
> 
> Current usage as of 24 Feb 2021 09:49AM UTC:
> 
>   Bandwidth: 1.55 GB / 1 GB (155%)
>   Storage: 0.48 GB / 1 GB (48%)"
> 
> I wasn't aware of it but it appears that Github has these quota's for the
> Large File Storage (LFS). I uploaded the files in the git LFS
> because single files are also limited to 100MiB each on the regular Git
> repositories.
> 
> With those strict limits, in my opinion Github isn't really a solution
> since the bandwidth limit will be reached very quickly. At least for the
> LFS part that is. I don't know yet if there is any limit for regular access.
> 
> My current ideas:

>   - we can try to just update the URLs to armbian that are working now
> (with the risk of breaking again in the near future). Ive also found this
> link, which may be more stable:
>  https://archive.armbian.com/orangepipc/archive/

Just do this, as it is the simplest option that gets things working. We
have already spent far too long talking about the problem instead of
just fixing the URLs.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 1/1] accel/tcg: Replace parallel_cpus with cpu->cflags_base

2021-02-25 Thread Alex Bennée


Richard Henderson  writes:

> Precompute the initial tb->cflags value for a cpu from its cluster
> and the number of cpus live in the system.  This avoids having to
> compute this constant data every time we look up a TB.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH v2] target/arm: Speed up aarch64 TBL/TBX

2021-02-25 Thread Alex Bennée


Richard Henderson  writes:

> Always perform one call instead of two for 16-byte operands.
> Use byte loads/stores directly into the vector register file
> instead of extractions and deposits to a 64-bit local variable.
>
> In order to easily receive pointers into the vector register file,
> convert the helper to the gvec out-of-line signature.  Move the
> helper into vec_helper.c, where it can make use of H1 and clear_tail.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 

Looked marginally slower on the (1!) run I did but probably in the noise
and the generated code looks nicer.

-- 
Alex Bennée



Re: [PATCH v2 00/42] esp: consolidate PDMA transfer buffers and other fixes

2021-02-25 Thread Mark Cave-Ayland

On 23/02/2021 21:32, Philippe Mathieu-Daudé wrote:


Hi Mark,

On 2/9/21 8:29 PM, Mark Cave-Ayland wrote:

This patch series comes from an experimental branch that I've been working on
to try and boot a MacOS toolbox ROM under the QEMU q800 machine. The effort is
far from complete, but it seems worth submitting these patches separately since
they are limited to the ESP device and form a substantial part of the work to
date.

As part of Laurent's recent q800 work so-called PDMA (pseudo-DMA) support was
added to the ESP device. This is whereby the DREQ (DMA request) line is used
to signal to the host CPU that it can transfer data to/from the device over
the SCSI bus.

The existing PDMA tracks 4 separate transfer data sources as indicated by the
ESP pdma_origin variable: PDMA, TI, CMD and ASYNC with an independent variable
pdma_len to store the transfer length. This works well with Linux which uses a
single PDMA request to transfer a number of sectors in a single request.

Unfortunately the MacOS toolbox ROM has other ideas here: it sends data to the
ESP as a mixture of FIFO and PDMA transfers and then uses a mixture of the FIFO
and DMA counters to confirm that the correct number of bytes have been
transferred. For this to work correctly the PDMA buffers and separate pdma_len
transfer counter must be consolidated into the FIFO to allow mixing of both
types of transfer within a single request.

The patchset is split into several sections:

- Patches 1-7 are minor patches which make esp.c checkpatch friendly, QOMify 
ESPState,
   and also fix up some trace events ready for later patches in the series

- Patches 8-13 unify the DMA transfer count. In particular there are 2 synthetic
   variables dma_counter and dma_left within ESPState which do not need to 
exist.
   DMA transfer lengths are programmed into the TC (transfer count) register 
which is
   decremented for each byte transferred, generating an interrupt when it 
reaches zero.
   These patches add helper functions to read the TC and STC registers directly 
and
   remove these synthetic variables so that the DMA transfer length is now 
tracked in
   a single place.

- Now that the TC register represents the authoritative DMA transfer length, 
patches
   14-25 work to eliminate the separate PDMA variables pdma_start, pdma_cur, 
pdma_len
   and separate PDMA buffers PDMA and CMD. The PDMA position variables can be 
replaced
   by the existing ESP cmdlen and ti_wptr/ti_rptr, whilst the FIFO (TI) buffer 
is used
   for incoming data with commands being accumulated in cmdbuf as per standard 
DMA
   requests.


I tried to help reviewing up to this point.

The next parts are too specific to me.


Thanks Phil - I understand that a set of 42 patches for a 25-year-old disk controller 
is never going to be at the top of most people's review list, and some parts are almost 
impossible to review unless you have a good understanding of the datasheet.


I'll see if Laurent has any comments over the next few days, but other than that I'd 
be inclined to send a v3 followed soon by a PR to avoid me having to update these 
regularly (I already see a slight conflict with Paolo's SCSI error handling changes, 
for example).



ATB,

Mark.



Re: [PATCH] libqos/qgraph: format qgraph comments for sphinx documentation

2021-02-25 Thread Paolo Bonzini

On 25/02/21 09:22, Emanuele Giuseppe Esposito wrote:

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 1dcce3bbed..f0038f8722 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -12,6 +12,7 @@ Contents:

  .. toctree::
     :maxdepth: 2
+   :includehidden:

     build-system
     kconfig
@@ -24,7 +25,6 @@ Contents:
     atomics
     stable-process
     qtest
-   qgraph
     decodetree
     secure-coding-practices
     tcg

---

Allow showing the hidden toctree in the docs/devel index, so that the 
link is visible


End result:
- no visible change in docs/index
- qgraph link visible in docs/devel/index
- qgraph linked as text link in qtree


Makes sense.  Did you also try increasing the maxdepth?

Paolo




Re: [PATCH] tcg/i386: rdpmc: use the the condtions

2021-02-25 Thread Paolo Bonzini

On 25/02/21 06:47, Zheng Zhan Liang wrote:

Signed-off-by: Zheng Zhan Liang 
---
  target/i386/tcg/misc_helper.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/tcg/misc_helper.c b/target/i386/tcg/misc_helper.c
index f02e4fd400..90b87fdef0 100644
--- a/target/i386/tcg/misc_helper.c
+++ b/target/i386/tcg/misc_helper.c
@@ -222,7 +222,8 @@ void helper_rdtscp(CPUX86State *env)
  
  void helper_rdpmc(CPUX86State *env)

  {
-if ((env->cr[4] & CR4_PCE_MASK) && ((env->hflags & HF_CPL_MASK) != 0)) {
+if (((env->cr[4] & CR4_PCE_MASK) == 0 ) &&
+((env->hflags & HF_CPL_MASK) != 0)) {
  raise_exception_ra(env, EXCP0D_GPF, GETPC());
  }
  cpu_svm_check_intercept_param(env, SVM_EXIT_RDPMC, 0, GETPC());



Queued, thanks.

Paolo




Re: [PATCH v2 1/7] intel_iommu: Fix mask may be uninitialized in vtd_context_device_invalidate

2021-02-25 Thread Philippe Mathieu-Daudé
On 2/25/21 10:14 AM, Eric Auger wrote:
> With -Werror=maybe-uninitialized configuration we get
> ../hw/i386/intel_iommu.c: In function ‘vtd_context_device_invalidate’:
> ../hw/i386/intel_iommu.c:1888:10: error: ‘mask’ may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
>  1888 | mask = ~mask;
>   | ~^~~
> 
> Add a g_assert_not_reached() to avoid the error.
> 
> Signed-off-by: Eric Auger 
> ---
>  hw/i386/intel_iommu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index b4f5094259..3206f379f8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1884,6 +1884,8 @@ static void 
> vtd_context_device_invalidate(IntelIOMMUState *s,
>  case 3:
>  mask = 7;   /* Mask bit 2:0 in the SID field */
>  break;
> +default:
> +g_assert_not_reached();
>  }
>  mask = ~mask;

Unrelated to this patch, but I wonder why we don't directly assign the
correct value of the mask in the switch cases...
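For instance, something along the lines of (sketch only):

    case 3:
        mask = ~7; /* mask bits 2:0 in the SID field */
        break;

and then dropping the later "mask = ~mask;" statement.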

Reviewed-by: Philippe Mathieu-Daudé 

>  
> 




Re: [PATCH v4 0/8] hw/sh4: Kconfig cleanups

2021-02-25 Thread Paolo Bonzini

On 22/02/21 15:15, Philippe Mathieu-Daudé wrote:

Missing review: 1 (license)

Since v3:
- Include full MIT license text (Peter)

Since v2:
- Added missing TC58128/SH_PCI Kconfig entries (Peter)

Since v1:
- Addressed Peter Maydell review comments from
https://www.mail-archive.com/qemu-block@nongnu.org/msg80599.html

Philippe Mathieu-Daudé (8):
   hw/sh4: Add missing license
   hw/sh4: Add missing Kconfig dependency on SH7750 for the R2D board
   hw/intc: Introduce SH_INTC Kconfig entry
   hw/char: Introduce SH_SCI Kconfig entry
   hw/timer: Introduce SH_TIMER Kconfig entry
   hw/block: Introduce TC58128 eeprom Kconfig entry
   hw/pci-host: Introduce SH_PCI Kconfig entry
   hw/sh4: Remove now unused CONFIG_SH4 from Kconfig

  include/hw/sh4/sh.h   | 31 ---
  hw/block/tc58128.c| 26 ++
  hw/{sh4 => pci-host}/sh_pci.c |  0
  MAINTAINERS   |  6 ++
  hw/block/Kconfig  |  3 +++
  hw/block/meson.build  |  2 +-
  hw/char/Kconfig   |  3 +++
  hw/char/meson.build   |  2 +-
  hw/intc/Kconfig   |  3 +++
  hw/intc/meson.build   |  2 +-
  hw/pci-host/Kconfig   |  4 
  hw/pci-host/meson.build   |  1 +
  hw/sh4/Kconfig| 12 ++--
  hw/sh4/meson.build|  1 -
  hw/timer/Kconfig  |  4 
  hw/timer/meson.build  |  2 +-
  16 files changed, 88 insertions(+), 14 deletions(-)
  rename hw/{sh4 => pci-host}/sh_pci.c (100%)



Acked-by: Paolo Bonzini 

Paolo




Re: [PATCH v2] char: don't fail when client is not connected

2021-02-25 Thread Paolo Bonzini

On 09/02/21 06:49, Pavel Dovgalyuk wrote:

This patch checks that ioc is not null before
using it in the tcp socket's tcp_chr_add_watch() function.

The failure occurs in replay mode, when the monitor
and serial port are tcp servers and there are no
clients connected to them:

-monitor tcp:127.0.0.1:8081,server,nowait
-serial tcp:127.0.0.1:8082,server,nowait


Signed-off-by: Pavel Dovgalyuk 
Reviewed-by: Marc-André Lureau 
---
  chardev/char-socket.c |3 +++
  1 file changed, 3 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 213a4c8dd0..cef1d9438f 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -385,6 +385,9 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t 
len)
  static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
  {
  SocketChardev *s = SOCKET_CHARDEV(chr);
+if (!s->ioc) {
+return NULL;
+}
  return qio_channel_create_watch(s->ioc, cond);
  }
  



Queued, thanks.

Paolo




[PATCH v1 2/2] block/qcow2: introduce inflight writes counters: fix discard

2021-02-25 Thread Vladimir Sementsov-Ogievskiy
There is a bug in qcow2: a host cluster can be discarded (its refcount
becomes 0) and reused while a data write to it is still in flight. In this
case the data write may pollute another, recently allocated cluster or
even metadata.

To fix the issue, let's track in-flight writes per host cluster in a
hash table and consult the new counter when discarding and reusing host
clusters.
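
The intended usage on a data-write path is then roughly the following
sketch, with hypothetical local variables (host_offset, bytes, qiov); the
real hooks are added to block/qcow2.c below:

    qemu_co_mutex_lock(&s->lock);
    /* ... host cluster is looked up / allocated under s->lock ... */
    qcow2_inflight_writes_inc_locked(bs, host_offset, bytes);
    qemu_co_mutex_unlock(&s->lock);

    ret = bdrv_co_pwritev(s->data_file, host_offset, bytes, qiov, 0);

    /* may complete a discard that was deferred while the write was in flight */
    qcow2_inflight_writes_dec(bs, host_offset, bytes);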

Enable qcow2-discard-during-rewrite as it is fixed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |   9 +
 block/qcow2-refcount.c| 154 +-
 block/qcow2.c |  26 ++-
 .../tests/qcow2-discard-during-rewrite|   2 +-
 4 files changed, 186 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0678073b74..fea2525a76 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -420,6 +420,8 @@ typedef struct BDRVQcow2State {
  * is to convert the image with the desired compression type set.
  */
 Qcow2CompressionType compression_type;
+
+GHashTable *inflight_writes_counters;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -896,6 +898,13 @@ int qcow2_shrink_reftable(BlockDriverState *bs);
 int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
 int qcow2_detect_metadata_preallocation(BlockDriverState *bs);
 
+int qcow2_inflight_writes_inc_locked(BlockDriverState *bs, int64_t offset,
+ int64_t length);
+int qcow2_inflight_writes_dec(BlockDriverState *bs, int64_t offset,
+  int64_t length);
+int qcow2_inflight_writes_dec_locked(BlockDriverState *bs, int64_t offset,
+ int64_t length);
+
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size);
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8e649b008e..0ecb1167a6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -799,6 +799,145 @@ found:
 }
 }
 
+/*
+ * Qcow2InFlightRefcount is a type for values of s->inflight_writes_counters
+ * hash map. Its keys are cluster indices.
+ */
+typedef struct Qcow2InFlightRefcount {
+/*
+ * Number of in-flight writes to the cluster, always > 0, as when becomes
+ * 0 the entry is removed from s->inflight_writes_counters.
+ */
+uint64_t inflight_writes_cnt;
+
+/* Cluster refcount is known to be zero */
+bool refcount_zero;
+
+/* Cluster refcount was made zero with this discard-type */
+enum qcow2_discard_type type;
+} Qcow2InFlightRefcount;
+
+static Qcow2InFlightRefcount *find_infl_wr(BDRVQcow2State *s,
+   int64_t cluster_index)
+{
+Qcow2InFlightRefcount *infl;
+
+if (!s->inflight_writes_counters) {
+return NULL;
+}
+
+infl = g_hash_table_lookup(s->inflight_writes_counters, &cluster_index);
+
+if (infl) {
+assert(infl->inflight_writes_cnt > 0);
+}
+
+return infl;
+}
+
+/*
+ * Returns true if there are any in-flight writes to the cluster blocking
+ * its reallocation.
+ */
+static bool has_infl_wr(BDRVQcow2State *s, int64_t cluster_index)
+{
+return !!find_infl_wr(s, cluster_index);
+}
+
+static int update_inflight_write_cnt(BlockDriverState *bs,
+ int64_t offset, int64_t length,
+ bool decrease, bool locked)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t start, last, cluster_offset;
+
+if (locked) {
+qemu_co_mutex_assert_locked(&s->lock);
+}
+
+start = start_of_cluster(s, offset);
+last = start_of_cluster(s, offset + length - 1);
+for (cluster_offset = start; cluster_offset <= last;
+ cluster_offset += s->cluster_size)
+{
+int64_t cluster_index = cluster_offset >> s->cluster_bits;
+Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
+
+if (!infl) {
+infl = g_new0(Qcow2InFlightRefcount, 1);
+g_hash_table_insert(s->inflight_writes_counters,
+g_memdup(&cluster_index, 
sizeof(cluster_index)),
+infl);
+}
+
+if (decrease) {
+assert(infl->inflight_writes_cnt >= 1);
+infl->inflight_writes_cnt--;
+} else {
+infl->inflight_writes_cnt++;
+}
+
+if (infl->inflight_writes_cnt == 0) {
+bool refcount_zero = infl->refcount_zero;
+enum qcow2_discard_type type = infl->type;
+
+g_hash_table_remove(s->inflight_writes_counters, &cluster_index);
+
+if (refcount_zero) {
+/*
+ * Slow path. We must reset normal refcount to actually release
+ * the cluster.
+ */
+int ret;
+
+if (!locked) {
+qemu_co_mutex_lock(&s->lock);
+  

[PATCH v1 1/2] iotests: add qcow2-discard-during-rewrite

2021-02-25 Thread Vladimir Sementsov-Ogievskiy
Simple test:
 - start writing to allocated cluster A
 - discard this cluster
 - write to another unallocated cluster B (it gets allocated in the same
   place where A was allocated)
 - continue writing to A

For now the last action pollutes cluster B, which is a bug fixed by the
following commit.

For now, add the test to the "disabled" group, so that it doesn't run
automatically.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 .../tests/qcow2-discard-during-rewrite| 72 +++
 .../tests/qcow2-discard-during-rewrite.out| 21 ++
 2 files changed, 93 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
 create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out

diff --git a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite 
b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
new file mode 100755
index 00..7f0d8a107a
--- /dev/null
+++ b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+# group: quick disabled
+#
+# Test discarding (and reusing) host cluster during writing data to it.
+#
+# Copyright (c) 2021 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=vsement...@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./../common.rc
+. ./../common.filter
+
+_supported_fmt qcow2
+_supported_proto file fuse
+_supported_os Linux
+
+size=1M
+_make_test_img $size
+
+(
+cat <

[PATCH v1 0/2] qcow2: fix parallel rewrite and discard

2021-02-25 Thread Vladimir Sementsov-Ogievskiy
Hi all! It turns out that nothing prevents discarding and reallocating a host
cluster while data is still being written to it. This way the data write will
pollute another freshly allocated cluster of data or metadata.

Here is my suggestion to fix it, based on an improved refcount model. Look
at patch 02 for details.

I don't insist on this version, and will soon send a v2 based on CoRwLock,
as Kevin suggested, which should look simpler. Still, with v1 we keep the
possibility of a relatively asynchronous discard, though that doesn't seem
worth the complexity. But I'd like to share my idea of additional "runtime"
reference counters for clusters, as it may be needed later if we face
problems with the more restrictive CoRwLock, or maybe for some other task.
So here it is.

Vladimir Sementsov-Ogievskiy (2):
  iotests: add qcow2-discard-during-rewrite
  block/qcow2: introduce inflight writes counters: fix discard

 block/qcow2.h |   9 +
 block/qcow2-refcount.c| 154 +-
 block/qcow2.c |  26 ++-
 .../tests/qcow2-discard-during-rewrite|  72 
 .../tests/qcow2-discard-during-rewrite.out|  21 +++
 5 files changed, 278 insertions(+), 4 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
 create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out

-- 
2.29.2




Re: [PATCH] tcg/i386: rdpmc: use the the condtions

2021-02-25 Thread Philippe Mathieu-Daudé
Hi Paolo,

On 2/25/21 11:07 AM, Paolo Bonzini wrote:
> On 25/02/21 06:47, Zheng Zhan Liang wrote:
>> Signed-off-by: Zheng Zhan Liang 
>> ---
>>   target/i386/tcg/misc_helper.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/i386/tcg/misc_helper.c
>> b/target/i386/tcg/misc_helper.c
>> index f02e4fd400..90b87fdef0 100644
>> --- a/target/i386/tcg/misc_helper.c
>> +++ b/target/i386/tcg/misc_helper.c
>> @@ -222,7 +222,8 @@ void helper_rdtscp(CPUX86State *env)
>>     void helper_rdpmc(CPUX86State *env)
>>   {
>> -    if ((env->cr[4] & CR4_PCE_MASK) && ((env->hflags & HF_CPL_MASK)
>> != 0)) {
>> +    if (((env->cr[4] & CR4_PCE_MASK) == 0 ) &&
>> +    ((env->hflags & HF_CPL_MASK) != 0)) {
>>   raise_exception_ra(env, EXCP0D_GPF, GETPC());
>>   }
>>   cpu_svm_check_intercept_param(env, SVM_EXIT_RDPMC, 0, GETPC());
>>
> 
> Queued, thanks.

Do you mind fixing the patch subject?




[PATCH v4 1/3] ui/console: Add placeholder flag to message surface

2021-02-25 Thread Akihiko Odaki
The surfaces created with the former qemu_create_message_surface
did not display content from the guest and always contained a
simple message describing the reason.

A display backend may want to hide the window showing such a
surface. This change renames the function to
qemu_create_placeholder_surface and adds a "placeholder" flag; the
display can check the flag to decide whether to do anything special,
such as hiding the window.
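
For instance, a backend's switch handler can destroy or hide the window of
a secondary console instead of drawing the message, as done for SDL later
in this series:

    if (is_placeholder(new_surface) && qemu_console_get_index(dcl->con)) {
        sdl2_window_destroy(scon);
        return;
    }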

Signed-off-by: Akihiko Odaki 
---
 hw/display/vhost-user-gpu.c |  4 ++--
 hw/display/virtio-gpu.c |  6 +++---
 include/ui/console.h| 10 --
 ui/console.c| 11 ++-
 ui/vnc.c|  2 +-
 5 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 4d8cb3525bf..3e911da795e 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -194,8 +194,8 @@ vhost_user_gpu_handle_display(VhostUserGPU *g, 
VhostUserGpuMsg *msg)
 con = s->con;
 
 if (m->scanout_id == 0 && m->width == 0) {
-s->ds = qemu_create_message_surface(640, 480,
-"Guest disabled display.");
+s->ds = qemu_create_placeholder_surface(640, 480,
+"Guest disabled display.");
 dpy_gfx_replace_surface(con, s->ds);
 } else {
 s->ds = qemu_create_displaysurface(m->width, m->height);
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 2e4a9822b6a..c1f17bec17e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -338,9 +338,9 @@ static void virtio_gpu_disable_scanout(VirtIOGPU *g, int 
scanout_id)
 
 if (scanout_id == 0) {
 /* primary head */
-ds = qemu_create_message_surface(scanout->width  ?: 640,
- scanout->height ?: 480,
- "Guest disabled display.");
+ds = qemu_create_placeholder_surface(scanout->width  ?: 640,
+ scanout->height ?: 480,
+ "Guest disabled display.");
 }
 dpy_gfx_replace_surface(scanout->con, ds);
 scanout->resource_id = 0;
diff --git a/include/ui/console.h b/include/ui/console.h
index d30e972d0b5..c960b7066cc 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -106,6 +106,7 @@ struct QemuConsoleClass {
 };
 
 #define QEMU_ALLOCATED_FLAG 0x01
+#define QEMU_PLACEHOLDER_FLAG   0x02
 
 typedef struct DisplaySurface {
 pixman_format_code_t format;
@@ -259,8 +260,8 @@ DisplaySurface *qemu_create_displaysurface_from(int width, 
int height,
 pixman_format_code_t format,
 int linesize, uint8_t *data);
 DisplaySurface *qemu_create_displaysurface_pixman(pixman_image_t *image);
-DisplaySurface *qemu_create_message_surface(int w, int h,
-const char *msg);
+DisplaySurface *qemu_create_placeholder_surface(int w, int h,
+const char *msg);
 PixelFormat qemu_default_pixelformat(int bpp);
 
 DisplaySurface *qemu_create_displaysurface(int width, int height);
@@ -281,6 +282,11 @@ static inline int is_buffer_shared(DisplaySurface *surface)
 return !(surface->flags & QEMU_ALLOCATED_FLAG);
 }
 
+static inline int is_placeholder(DisplaySurface *surface)
+{
+return surface->flags & QEMU_PLACEHOLDER_FLAG;
+}
+
 void register_displaychangelistener(DisplayChangeListener *dcl);
 void update_displaychangelistener(DisplayChangeListener *dcl,
   uint64_t interval);
diff --git a/ui/console.c b/ui/console.c
index c5d11bc7017..32823faf414 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1436,8 +1436,8 @@ DisplaySurface 
*qemu_create_displaysurface_pixman(pixman_image_t *image)
 return surface;
 }
 
-DisplaySurface *qemu_create_message_surface(int w, int h,
-const char *msg)
+DisplaySurface *qemu_create_placeholder_surface(int w, int h,
+const char *msg)
 {
 DisplaySurface *surface = qemu_create_displaysurface(w, h);
 pixman_color_t bg = color_table_rgb[0][QEMU_COLOR_BLACK];
@@ -1454,6 +1454,7 @@ DisplaySurface *qemu_create_message_surface(int w, int h,
  x+i, y, FONT_WIDTH, FONT_HEIGHT);
 qemu_pixman_image_unref(glyph);
 }
+surface->flags |= QEMU_PLACEHOLDER_FLAG;
 return surface;
 }
 
@@ -1550,7 +1551,7 @@ void register_displaychangelistener(DisplayChangeListener 
*dcl)
 dcl->ops->dpy_gfx_switch(dcl, con->surface);
 } else {
 if (!dummy) {
-dummy = qemu_create_message_surface(640, 480, nodev);
+dummy = qemu_create_placeholder_surface(640, 480, nodev);
   

Re: [PATCH v2 7/7] hw/arm/smmuv3: Uniformize sid traces

2021-02-25 Thread Philippe Mathieu-Daudé
On 2/25/21 10:14 AM, Eric Auger wrote:
> Convert all sid printouts to sid=0x%x.
> 
> Signed-off-by: Eric Auger 
> ---
>  hw/arm/trace-events | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




[PATCH] hw/riscv: Add basic fw_cfg support to virt

2021-02-25 Thread Asherah Connor
Provides a minimal fw_cfg for the virt machine on riscv.  I've
arbitrarily selected an MMIO base for it.  This is very rudimentary, so
no DMA interface is exposed.  Tested as working!
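
Further items can later be hung off the same device, for example
(a hypothetical follow-up, not part of this patch):

    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, machine->ram_size);
    fw_cfg_add_file(fw_cfg, "etc/boot-config", data, len);

and user-supplied blobs should then also work through the generic
"-fw_cfg name=opt/...,string=..." command line option.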

(First patch to qemu, so whatever patience you can afford would be
appreciated.)

Signed-off-by: Asherah Connor 
---
 hw/riscv/virt.c | 25 +
 include/hw/riscv/virt.h |  4 +++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 2299b3a6be..4981ca004b 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -56,6 +56,7 @@ static const struct MemmapEntry {
 [VIRT_PLIC] ={  0xc00, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
 [VIRT_UART0] =   { 0x1000, 0x100 },
 [VIRT_VIRTIO] =  { 0x10001000,0x1000 },
+[VIRT_FW_CFG] =  { 0x1010,  0x10 },
 [VIRT_FLASH] =   { 0x2000, 0x400 },
 [VIRT_PCIE_ECAM] =   { 0x3000,0x1000 },
 [VIRT_PCIE_MMIO] =   { 0x4000,0x4000 },
@@ -488,6 +489,26 @@ static inline DeviceState *gpex_pcie_init(MemoryRegion 
*sys_mem,
 return dev;
 }
 
+static FWCfgState *create_fw_cfg(const RISCVVirtState *s)
+{
+hwaddr base = virt_memmap[VIRT_FW_CFG].base;
+hwaddr size = virt_memmap[VIRT_FW_CFG].size;
+FWCfgState *fw_cfg;
+char *nodename;
+
+fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)MACHINE(s)->smp.cpus);
+
+nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
+qemu_fdt_add_subnode(s->fdt, nodename);
+qemu_fdt_setprop_string(s->fdt, nodename,
+"compatible", "qemu,fw-cfg-mmio");
+qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
+ 2, base, 2, size);
+g_free(nodename);
+return fw_cfg;
+}
+
 static void virt_machine_init(MachineState *machine)
 {
 const struct MemmapEntry *memmap = virt_memmap;
@@ -652,6 +673,10 @@ static void virt_machine_init(MachineState *machine)
 start_addr = virt_memmap[VIRT_FLASH].base;
 }
 
+/* init fw_cfg */
+s->fw_cfg = create_fw_cfg(s);
+rom_set_fw(s->fw_cfg);
+
 /* Compute the fdt load address in dram */
 fdt_load_addr = riscv_load_fdt(memmap[VIRT_DRAM].base,
machine->ram_size, s->fdt);
diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index 84b7a3848f..3b81a2e3f6 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -40,6 +40,7 @@ struct RISCVVirtState {
 RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
 DeviceState *plic[VIRT_SOCKETS_MAX];
 PFlashCFI01 *flash[2];
+FWCfgState *fw_cfg;
 
 void *fdt;
 int fdt_size;
@@ -58,7 +59,8 @@ enum {
 VIRT_DRAM,
 VIRT_PCIE_MMIO,
 VIRT_PCIE_PIO,
-VIRT_PCIE_ECAM
+VIRT_PCIE_ECAM,
+VIRT_FW_CFG
 };
 
 enum {
-- 
2.24.3 (Apple Git-128)




[PATCH v4 3/3] virtio-gpu: Do not distinguish the primary console

2021-02-25 Thread Akihiko Odaki
In the past, virtio-gpu set NULL as the surface for the secondary
consoles to hide their windows. The distinction is now handled in
ui/console and the display backends, and virtio-gpu no longer
has to do that.

Signed-off-by: Akihiko Odaki 
---
 hw/display/vhost-user-gpu.c  |  6 ++
 hw/display/virtio-gpu-3d.c   | 10 +++---
 hw/display/virtio-gpu-base.c |  3 ---
 hw/display/virtio-gpu.c  |  9 +
 4 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 3e911da795e..a01f9315e19 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -193,10 +193,8 @@ vhost_user_gpu_handle_display(VhostUserGPU *g, 
VhostUserGpuMsg *msg)
 s = &g->parent_obj.scanout[m->scanout_id];
 con = s->con;
 
-if (m->scanout_id == 0 && m->width == 0) {
-s->ds = qemu_create_placeholder_surface(640, 480,
-"Guest disabled display.");
-dpy_gfx_replace_surface(con, s->ds);
+if (m->width == 0) {
+dpy_gfx_replace_surface(con, NULL);
 } else {
 s->ds = qemu_create_displaysurface(m->width, m->height);
 /* replace surface on next update */
diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c
index 0b0c11474dd..9eb489077b1 100644
--- a/hw/display/virtio-gpu-3d.c
+++ b/hw/display/virtio-gpu-3d.c
@@ -179,10 +179,8 @@ static void virgl_cmd_set_scanout(VirtIOGPU *g,
 info.width, info.height,
 ss.r.x, ss.r.y, ss.r.width, ss.r.height);
 } else {
-if (ss.scanout_id != 0) {
-dpy_gfx_replace_surface(
-g->parent_obj.scanout[ss.scanout_id].con, NULL);
-}
+dpy_gfx_replace_surface(
+g->parent_obj.scanout[ss.scanout_id].con, NULL);
 dpy_gl_scanout_disable(g->parent_obj.scanout[ss.scanout_id].con);
 }
 g->parent_obj.scanout[ss.scanout_id].resource_id = ss.resource_id;
@@ -595,9 +593,7 @@ void virtio_gpu_virgl_reset(VirtIOGPU *g)
 
 virgl_renderer_reset();
 for (i = 0; i < g->parent_obj.conf.max_outputs; i++) {
-if (i != 0) {
-dpy_gfx_replace_surface(g->parent_obj.scanout[i].con, NULL);
-}
+dpy_gfx_replace_surface(g->parent_obj.scanout[i].con, NULL);
 dpy_gl_scanout_disable(g->parent_obj.scanout[i].con);
 }
 }
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4a57350917c..25f8920fdb6 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -193,9 +193,6 @@ virtio_gpu_base_device_realize(DeviceState *qdev,
 for (i = 0; i < g->conf.max_outputs; i++) {
 g->scanout[i].con =
 graphic_console_init(DEVICE(g), i, &virtio_gpu_ops, g);
-if (i > 0) {
-dpy_gfx_replace_surface(g->scanout[i].con, NULL);
-}
 }
 
 return true;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index c1f17bec17e..c9f5e36fd07 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -325,7 +325,6 @@ static void virtio_gpu_disable_scanout(VirtIOGPU *g, int 
scanout_id)
 {
 struct virtio_gpu_scanout *scanout = &g->parent_obj.scanout[scanout_id];
 struct virtio_gpu_simple_resource *res;
-DisplaySurface *ds = NULL;
 
 if (scanout->resource_id == 0) {
 return;
@@ -336,13 +335,7 @@ static void virtio_gpu_disable_scanout(VirtIOGPU *g, int 
scanout_id)
 res->scanout_bitmask &= ~(1 << scanout_id);
 }
 
-if (scanout_id == 0) {
-/* primary head */
-ds = qemu_create_placeholder_surface(scanout->width  ?: 640,
- scanout->height ?: 480,
- "Guest disabled display.");
-}
-dpy_gfx_replace_surface(scanout->con, ds);
+dpy_gfx_replace_surface(scanout->con, NULL);
 scanout->resource_id = 0;
 scanout->ds = NULL;
 scanout->width = 0;
-- 
2.24.3 (Apple Git-128)




[PATCH v4 2/3] ui/console: Pass placeholder surface to displays

2021-02-25 Thread Akihiko Odaki
ui/console used to accept NULL as a graphic console surface, but its
semantics were inconsistent among displays:
- cocoa and gtk-egl perform a NULL dereference.
- egl-headless, spice and spice-egl do nothing.
- gtk releases underlying resources.
- sdl2-2d and sdl2-gl destroy the window.
- vnc shows a message, "Display output is not active."

Fortunately, only virtio-gpu and virtio-gpu-3d assign NULL, so
we can study them to figure out the desired behavior. They assign
NULL *except* for the primary display when the device is realized,
reset, or its scanout is disabled. This effectively destroys
windows for the (uninitialized) secondary displays.

To implement consistent behavior across display device
realization/reset, this change embeds it into the operation that
switches the surface. When NULL is given as the new surface on a
switch, ui/console instead passes a placeholder down to each
display listener.

sdl destroys the window for a secondary console if its surface is a
placeholder. The other displays simply show the placeholder.

Signed-off-by: Akihiko Odaki 
---
 ui/console.c   | 17 -
 ui/gtk.c   |  4 
 ui/sdl2-2d.c   |  7 ++-
 ui/sdl2-gl.c   |  4 ++--
 ui/spice-display.c |  6 +++---
 ui/vnc.c   | 10 --
 6 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/ui/console.c b/ui/console.c
index 32823faf414..171a7bf14b9 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1675,11 +1675,26 @@ void dpy_gfx_update_full(QemuConsole *con)
 void dpy_gfx_replace_surface(QemuConsole *con,
  DisplaySurface *surface)
 {
+static const char placeholder_msg[] = "Display output is not active.";
 DisplayState *s = con->ds;
 DisplaySurface *old_surface = con->surface;
 DisplayChangeListener *dcl;
+int width;
+int height;
+
+if (!surface) {
+if (old_surface) {
+width = surface_width(old_surface);
+height = surface_height(old_surface);
+} else {
+width = 640;
+height = 480;
+}
+
+surface = qemu_create_placeholder_surface(width, height, 
placeholder_msg);
+}
 
-assert(old_surface != surface || surface == NULL);
+assert(old_surface != surface);
 
 con->surface = surface;
 QLIST_FOREACH(dcl, &s->listeners, next) {
diff --git a/ui/gtk.c b/ui/gtk.c
index 79dc2401203..a4a5f981e2a 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -567,10 +567,6 @@ static void gd_switch(DisplayChangeListener *dcl,
 }
 vc->gfx.ds = surface;
 
-if (!surface) {
-return;
-}
-
 if (surface->format == PIXMAN_x8r8g8b8) {
 /*
  * PIXMAN_x8r8g8b8 == CAIRO_FORMAT_RGB24
diff --git a/ui/sdl2-2d.c b/ui/sdl2-2d.c
index a2ea85127d5..bfebbdeaea8 100644
--- a/ui/sdl2-2d.c
+++ b/ui/sdl2-2d.c
@@ -32,14 +32,11 @@ void sdl2_2d_update(DisplayChangeListener *dcl,
 int x, int y, int w, int h)
 {
 struct sdl2_console *scon = container_of(dcl, struct sdl2_console, dcl);
-DisplaySurface *surf = qemu_console_surface(dcl->con);
+DisplaySurface *surf = scon->surface;
 SDL_Rect rect;
 size_t surface_data_offset;
 assert(!scon->opengl);
 
-if (!surf) {
-return;
-}
 if (!scon->texture) {
 return;
 }
@@ -75,7 +72,7 @@ void sdl2_2d_switch(DisplayChangeListener *dcl,
 scon->texture = NULL;
 }
 
-if (!new_surface) {
+if (is_placeholder(new_surface) && qemu_console_get_index(dcl->con)) {
 sdl2_window_destroy(scon);
 return;
 }
diff --git a/ui/sdl2-gl.c b/ui/sdl2-gl.c
index fd594d74611..a21d2deed91 100644
--- a/ui/sdl2-gl.c
+++ b/ui/sdl2-gl.c
@@ -86,7 +86,7 @@ void sdl2_gl_switch(DisplayChangeListener *dcl,
 
 scon->surface = new_surface;
 
-if (!new_surface) {
+if (is_placeholder(new_surface) && qemu_console_get_index(dcl->con)) {
 qemu_gl_fini_shader(scon->gls);
 scon->gls = NULL;
 sdl2_window_destroy(scon);
@@ -112,7 +112,7 @@ void sdl2_gl_refresh(DisplayChangeListener *dcl)
 assert(scon->opengl);
 
 graphic_hw_update(dcl->con);
-if (scon->updates && scon->surface) {
+if (scon->updates && scon->real_window) {
 scon->updates = 0;
 sdl2_gl_render_surface(scon);
 }
diff --git a/ui/spice-display.c b/ui/spice-display.c
index 6f32b66a6e7..222c7c20a2a 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -388,7 +388,7 @@ void qemu_spice_display_switch(SimpleSpiceDisplay *ssd,
 SimpleSpiceUpdate *update;
 bool need_destroy;
 
-if (surface && ssd->surface &&
+if (ssd->surface &&
 surface_width(surface) == pixman_image_get_width(ssd->surface) &&
 surface_height(surface) == pixman_image_get_height(ssd->surface) &&
 surface_format(surface) == pixman_image_get_format(ssd->surface)) {
@@ -410,8 +410,8 @@ void qemu_spice_display_switch(SimpleSpiceDisplay *ssd,
 
 /* full mode switch */
 trace_qemu_spice_display_

Re: [PATCH] libqos/qgraph: format qgraph comments for sphinx documentation

2021-02-25 Thread Emanuele Giuseppe Esposito




On 25/02/2021 11:05, Paolo Bonzini wrote:

On 25/02/21 09:22, Emanuele Giuseppe Esposito wrote:

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 1dcce3bbed..f0038f8722 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -12,6 +12,7 @@ Contents:

  .. toctree::
 :maxdepth: 2
+   :includehidden:

 build-system
 kconfig
@@ -24,7 +25,6 @@ Contents:
 atomics
 stable-process
 qtest
-   qgraph
 decodetree
 secure-coding-practices
 tcg

---

Allow showing the hidden toctree in the docs/devel index, so that the 
link is visible


End result:
- no visible change in docs/index
- qgraph link visible in docs/devel/index
- qgraph linked as text link in qtree


Makes sense.  Did you also try increasing the maxdepth?


Yes, it does not change much on the qgraph side. By default, with this 
depth a simple toctree with qgraph would already be visible in both indexes.


As I see it, qgraph is a "subsection" of qtest, not a separate entry in 
docs/index; that is why I am hiding it.


I will go ahead and submit the patch for the sphinx documentation.
As you suggested, better qgraph examples and explanations will follow in 
a separate series.


Thank you,
Emanuele




Re: [RFC PATCH 0/5] Experimenting with tb-lookup tweaks

2021-02-25 Thread Alex Bennée


Richard Henderson  writes:

> On 2/24/21 8:58 AM, Alex Bennée wrote:
>> Hi Richard,
>> 
>> Well I spun up some of the ideas we talked about to see if there was
>> anything to be squeezed out of the function. In the end the results
>> seem to be a washout with my pigz benchmark:
>> 
>>  qemu-system-aarch64 -cpu cortex-a57 \
>>-machine type=virt,virtualization=on,gic-version=3 \
>>-serial mon:stdio \
>>-netdev user,id=unet,hostfwd=tcp::-:22 \
>>-device virtio-net-pci,netdev=unet,id=virt-net,disable-legacy=on \
>>-device virtio-scsi-pci,id=virt-scsi,disable-legacy=on \
>>-blockdev 
>> driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-buster-arm64
>>  \
>>-device scsi-hd,drive=hd,id=virt-scsi-hd \
>>-smp 4 -m 4096 \
>>-kernel ~/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image \
>>-append "root=/dev/sda2 systemd.unit=benchmark-pigz.service" \
>>-display none -snapshot
>> 
>> | Command | Mean [s]   | Min [s] | Max [s] | Relative |
>> |-++-+-+--|
>> | Before  | 46.597 ± 2.482 |  45.208 |  53.618 | 1.00 |
>> | After   | 46.867 ± 2.242 |  45.871 |  53.180 | 1.00 |
>
> Well that's disappointing.
>
>> Maybe the code cleanup itself makes it worthwhile. WDYT?
>
> I think there's little doubt that the first 3 patches are a good code cleanup.
>
> Patch 4 I think is still beneficial, simply so that we can add that "Above
> fields" comment.
>
> Patch 5 would only be worthwhile if we could measure any positive difference,
> which it seems we cannot.
>
> I have a follow-up patch to remove the parallel_cpus global variable which I
> will post in a moment.  While it removes a handful of insns from this
> fast-path, I doubt it helps.  But getting rid of a global is probably always
> positive, no?
>
> I was glancing through the lookup function for alpha, instead of aarch64 and 
> saw:
>
>  21e:   33 43 18xor0x18(%rbx),%eax
>  221:   4c 31 e1xor%r12,%rcx
>  224:   44 31 eaxor%r13d,%edx
>  227:   09 c2   or %eax,%edx
>  229:   48 0b 4b 08 or 0x8(%rbx),%rcx
>
> and thought -- hang on, how come we're just ORing, not XORing, here?  Of course
> it's the cs_base field, which alpha has set to zero.  The compiler has
> simplified bits |= 0 ^ tb->cs_base.
>
> Which got me thinking: what if we had a per-cpu
>
> typedef struct {
> target_ulong pc;
> ...
> } TranslationBlockID;
>
> static inline bool arch_tbid_cmp(TranslationBlockID x,
>  TranslationBlockID y)
> {
> return x.pc == y.pc && ...;
> }
>
> We could potentially reduce this to memcmp(&x, &y).
>
> First, this would allow cs_base to be eliminated where it is not used.  
> Second,
> this would allow cs_base to be renamed for the non-x86 targets for which it is
> being abused.  Third, it would allow tb->flags to be either (a) elided or (b)
> extended by the target as needed.
>
> This final is directed at ARM, of course, where we've overflowed the uint32_t
> that is tb->flags.  We could now extend that to 64-bits.
>
> Obviously, some tweaks to tb_hash_func would be required as well, but that's
> manageable.
>
> What do you think about this last?

Sounds like a good idea for clean-up, especially being able to get rid
of cs_base or extend tb->flags when needed. One concern is where we go
when we get to heterogeneous emulation: will the targets share the same
translation area like the current cpu->cluster_index stuff, or will that
only be for similar but not quite identical architectures? Maybe I'm
thinking too far ahead... 

>
>
> r~


-- 
Alex Bennée
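
For anyone following along, a minimal, self-contained C sketch of the
per-target lookup key Richard describes above. The struct name, field set
and the by-pointer comparison are assumptions for illustration only, not
the eventual QEMU design:

#include <stdbool.h>
#include <stdint.h>

/* Stand-in for target_ulong; real targets use 32 or 64 bits. */
typedef uint64_t tulong;

/*
 * Hypothetical per-target TB lookup key: each target would declare only
 * the fields it actually needs (e.g. drop cs_base, or widen flags).
 */
typedef struct {
    tulong pc;
    tulong cs_base;     /* only for targets that really use it */
    uint64_t flags;     /* could grow to 64 bits where 32 overflow */
} TranslationBlockID;

static inline bool arch_tbid_cmp(const TranslationBlockID *x,
                                 const TranslationBlockID *y)
{
    /*
     * Field-by-field compare; memcmp(x, y, sizeof(*x)) would only be
     * equivalent if the struct has no padding between these fields.
     */
    return x->pc == y->pc &&
           x->cs_base == y->cs_base &&
           x->flags == y->flags;
}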



Re: [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements

2021-02-25 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:18PM +, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > For some read/writes the virtio queue elements are unmappable by
> > the daemon; these are cases where the data is to be read/written
> > from non-RAM.  In viritofs's case this is typically a direct read/write
> > into an mmap'd DAX file also on virtiofs (possibly on another instance).
> > 
> > When we receive a virtio queue element, check that we have enough
> > mappable data to handle the headers.  Make a note of the number of
> > unmappable 'in' entries (ie. for read data back to the VMM),
> > and flag the fuse_bufvec for 'out' entries with a new flag
> > FUSE_BUF_PHYS_ADDR.
> 
> Looking back at this I think vhost-user will need generic
> READ_MEMORY/WRITE_MEMORY commands. It's okay for virtio-fs to have its
> own IO command (although not strictly necessary).
> 
> With generic READ_MEMORY/WRITE_MEMORY libvhost-user and other vhost-user
> device backend implementations can handle vring descriptors that point
> into the DAX window. This can be done transparently so individual device
> implementations (net, blk, etc) don't even know when memory is copied vs
> zero-copy shared memory access.
> 
> So this approach is okay for virtio-fs but it's not a long-term solution
> for all of vhost-user. Eventually the long-term solution may be needed
> so that other VIRTIO devices that have shared memory resources work.
> 
> Another bonus of READ_MEMORY/WRITE_MEMORY is that users that prefer an
> enforcing vIOMMU can disable shared memory (maybe just keep the vring
> itself mmapped).
> 
> I just wanted to share this idea but don't expect it to be addressed in
> this patch series.

Yes, that would be nice; although in this case it would imply an extra
memory copy; you'd have to do the IO in the daemon, and then perform a
read/write back across the socket.

> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index a090040bb2..ed9280de91 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -611,6 +611,13 @@ enum fuse_buf_flags {
> >   * detected.
> >   */
> >  FUSE_BUF_FD_RETRY = (1 << 3),
> > +
> > +/**
> > + * The addresses in the iovec represent guest physical addresses
> > + * that can't be mapped by the daemon process.
> > + * IO must be bounced back to the VMM to do it.
> > + */
> > +FUSE_BUF_PHYS_ADDR = (1 << 4),
> 
> With a vIOMMU it's an IOVA. Without a vIOMMU it's a GPA. This constant
> may need to be renamed in the future, but it is okay for now.

Do we have any naming for something that's either a GPA or an IOVA?

> > +if (req->bad_in_num || req->bad_out_num) {
> > +bool handled_unmappable = false;
> > +
> > +if (out_num > 2 && out_num_readable >= 2 && !req->bad_in_num &&
> > +out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
> > +((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
> > +out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
> 
> This violates the VIRTIO specification:
> 
>   2.6.4.1 Device Requirements: Message Framing
> 
>   The device MUST NOT make assumptions about the particular arrangement of 
> descriptors.
> 
>   
> https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-280004
> 
> The driver is not obligated to submit separate iovecs. out_num == 1 is
> valid and the device needs to process it byte-wise instead of making
> assumptions about iovec layout.

Yes, it's actually not new in this patch, but I'll clean it up.
I took the shortcut all the way back in:
  e17f7a580e2c599330ad virtiofsd: Pass write iov's all the way through

Dave

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
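
As background for the framing point above, a minimal sketch of
layout-agnostic copying out of a descriptor chain. QEMU and libvhost-user
already ship iov helpers for this, so the function below is only an
illustration of the idea, not the virtiofsd code:

#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy "len" bytes starting at byte offset "off" out of an iovec array,
 * without assuming how the driver arranged the descriptors
 * (VIRTIO 1.1, 2.6.4.1: devices must not assume a particular layout).
 * Returns 0 on success, -1 if the iovecs hold fewer bytes than requested.
 */
static int iov_copy_out(const struct iovec *iov, unsigned int iov_cnt,
                        size_t off, void *dst, size_t len)
{
    unsigned int i;

    for (i = 0; i < iov_cnt && len > 0; i++) {
        size_t cur;

        if (off >= iov[i].iov_len) {
            off -= iov[i].iov_len;      /* skip descriptors before "off" */
            continue;
        }
        cur = iov[i].iov_len - off;
        if (cur > len) {
            cur = len;
        }
        memcpy(dst, (const char *)iov[i].iov_base + off, cur);
        dst = (char *)dst + cur;
        len -= cur;
        off = 0;
    }
    return len == 0 ? 0 : -1;
}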




[PATCH v2] blockjob: report a better error message

2021-02-25 Thread Stefano Garzarella
When a block job fails, we report the strerror(-job->job.ret) error
message, even if the job set an error object.
Let's report a better error message using error_get_pretty(job->job.err).

If an error object was not set, strerror(-job->ret) is used as fallback,
as explained in include/qemu/job.h:

typedef struct Job {
...
/**
 * Error object for a failed job.
 * If job->ret is nonzero and an error object was not set, it will be set
 * to strerror(-job->ret) during job_completed.
 */
Error *err;
}

In block_job_query() there can be a transient state where 'job.err' has not
yet been set by a scheduled bottom half. In that case we use
strerror(-job->ret) as before.

Suggested-by: Kevin Wolf 
Signed-off-by: Stefano Garzarella 
---

Notes:
v2:
- fixed potential issue in block_job_query() [Kevin]
- updated commit message

 blockjob.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index f2feff051d..ef968017a2 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -318,8 +318,12 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 info->status= job->job.status;
 info->auto_finalize = job->job.auto_finalize;
 info->auto_dismiss  = job->job.auto_dismiss;
-info->has_error = job->job.ret != 0;
-info->error = job->job.ret ? g_strdup(strerror(-job->job.ret)) : NULL;
+if (job->job.ret) {
+info->has_error = true;
+info->error = job->job.err ?
+g_strdup(error_get_pretty(job->job.err)) :
+g_strdup(strerror(-job->job.ret));
+}
 return info;
 }
 
@@ -356,7 +360,7 @@ static void block_job_event_completed(Notifier *n, void 
*opaque)
 }
 
 if (job->job.ret < 0) {
-msg = strerror(-job->job.ret);
+msg = error_get_pretty(job->job.err);
 }
 
 qapi_event_send_block_job_completed(job_type(&job->job),
-- 
2.29.2




Re: [PATCH] multiprocess: move feature to meson_options.txt

2021-02-25 Thread Philippe Mathieu-Daudé
On 2/24/21 1:23 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini 
> ---
>  configure | 12 
>  meson.build   |  9 +++--
>  meson_options.txt |  2 ++
>  3 files changed, 13 insertions(+), 10 deletions(-)
...

> @@ -2535,6 +2540,7 @@ endif
>  summary_info += {'target list':   ' '.join(target_dirs)}
>  if have_system
>summary_info += {'default devices':   get_option('default_devices')}
> +  summary_info += {'Multiprocess QEMU': multiprocess_allowed}

Since you are changing this, it is a good opportunity to find a
better description for this feature (similar to how we recently
clarified the TCI description).

The current description is easily confused with multiprocessing (which
is on by default in QEMU and something every developer wants to exploit).

So the main multiprocess code resides in hw/remote/mpqemu*.

I have the impression "monolithic application" is common in
software engineering. What about "polylithic QEMU"?

Stefan once described it as "out of (main) process device emulation".

Relevant links:
https://english.stackexchange.com/questions/112633/whats-an-antonym-of-monolithic-as-in-monolithic-architecture/119212#119212
https://infovis-wiki.net/wiki/Polylithic_design

...
>  if not supported_cpus.contains(cpu)
> diff --git a/meson_options.txt b/meson_options.txt
> index 675a9c500a..bf11de7bb2 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -45,6 +45,8 @@ option('cfi', type: 'boolean', value: 'false',
> description: 'Control-Flow Integrity (CFI)')
>  option('cfi_debug', type: 'boolean', value: 'false',
> description: 'Verbose errors in case of CFI violation')
> +option('multiprocess', type: 'feature', value: 'auto',
> +   description: 'Multiprocess QEMU support')




Re: [PATCH v3 02/16] qapi/expr.py: Check for dict instead of OrderedDict

2021-02-25 Thread Markus Armbruster
John Snow  writes:

> On 2/24/21 4:30 AM, Markus Armbruster wrote:
>> John Snow  writes:
>> 
>>> OrderedDict is a subtype of dict, so we can check for a more general
>>> form. These functions do not themselves depend on it being any
>>> particular type.
>> 
>> True.  The actual arguments can only be OrderedDict, though.  I think we
>> refrained from relaxing to dict in these helpers because we felt
>> "staying ordered" is clearer.
>> 
>
> As a habit, I tend towards declaring the least specific type possible 
> for input and declaring the most specific type possible for output.

This maximizes generality, which can be quite worthwhile.  Maximizing
generality by default is not a bad habit, I guess.  But cases exist
where generality is not needed, and other considerations take
precedence.

>> We're *this* close to mooting the point, because
>> 
>>  Changed in version 3.7: Dictionary order is guaranteed to be
>>  insertion order. This behavior was an implementation detail of
>>  CPython from 3.6.
>> 
>> https://docs.python.org/3.7/library/stdtypes.html
>> 
>> Is messing with it necessary for later work?  If not, is it a worthwhile
>> improvement?
>> 
>
> Not strictly necessary, but if the expression checkers here don't 
> *require* the type to be an OrderedDict, why bother to enforce that here?
>
> It's just a bid to slacken the type (my type hints will look for Dict, 
> not OrderedDict) and leave the use of OrderedDict as an "implementation 
> detail" that only parser.py knows about.

"Orderedness" is anything but a detail only parser.py knows about.

Example:

{ 'command': 'blockdev-insert-medium',
  'data': { 'id': 'str',
'node-name': 'str'} }

AST:

OrderedDict([('command', 'blockdev-insert-medium'),
 ('data',
  OrderedDict([('id', {'type': 'str'}),
   ('node-name', {'type': 'str'})]))])

For the inner dictionary, order matters, because the difference between

void qmp_blockdev_insert_medium(const char *id, const char *node_name,
Error **errp);

and

void qmp_blockdev_insert_medium(const char *node_name, const char *id,
Error **errp);

matters.

For the outer dictionary, order carries no semantic meaning.

My point is: parser.py fundamentally builds *ordered* dicts.  We're
certainly free to relax them to more general types wherever
"orderedness" is not needed.  However, one of the main aims of this
typing exercise is to make the code easier to read, and I doubt making
things more general helps there.

Related: the type aliases for the AST you define later in this series.
I figure relaxing these to more general types where possible would
actually reduce their value.  TopLevelExpression tells me more than
dict.

I'm not against relaxing types per se.  Judicious relaxation is often
useful to keep code more generally useful.  When to relax isn't always
obvious.

> (I needed to change it for prototyping using an off-the-shelf parser, so 
> it was annoying to have it check for a stronger type if it doesn't 
> absolutely have to.)

If your off-the-shelf parser doesn't preserve order, it's not fit for the
purpose :)

>>> Signed-off-by: John Snow 
>>> Reviewed-by: Eduardo Habkost 
>>> Reviewed-by: Cleber Rosa 




Re: [PATCH] qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute

2021-02-25 Thread Philippe Mathieu-Daudé
On 2/11/21 8:42 PM, Nathan Chancellor wrote:
> fw_cfg_showrev() is called by an indirect call in kobj_attr_show(),
> which violates clang's CFI checking because fw_cfg_showrev()'s second
> parameter is 'struct attribute', whereas the ->show() member of 'struct
> kobj_attribute' expects the second parameter to be of type 'struct
> kobj_attribute'.
> 
> $ cat /sys/firmware/qemu_fw_cfg/rev
> 3
> 
> $ dmesg | grep "CFI failure"
> [   26.016832] CFI failure (target: fw_cfg_showrev+0x0/0x8):
> 
> Fix this by converting fw_cfg_rev_attr to 'struct kobj_attribute' where
> this would have been caught automatically by the incompatible pointer
> types compiler warning. Update fw_cfg_showrev() accordingly.
> 
> Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg 
> device")
> Link: https://github.com/ClangBuiltLinux/linux/issues/1299
> Signed-off-by: Nathan Chancellor 
> ---
>  drivers/firmware/qemu_fw_cfg.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 
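
For context, a rough sketch of the CFI-clean shape of the attribute,
following the commit message above. The exact mode and const-ness are
assumptions here; see Nathan's patch for the real hunk:

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/stat.h>
#include <linux/sysfs.h>
#include <linux/types.h>

static u32 fw_cfg_rev;  /* revision value read from the fw_cfg device */

/*
 * show() now takes a struct kobj_attribute second parameter, matching
 * what kobj_attr_show() actually passes through the indirect call, so
 * clang's CFI check no longer trips.
 */
static ssize_t fw_cfg_showrev(struct kobject *kobj,
                              struct kobj_attribute *attr, char *buf)
{
    return sprintf(buf, "%u\n", fw_cfg_rev);
}

static const struct kobj_attribute fw_cfg_rev_attr = {
    .attr = { .name = "rev", .mode = S_IRUSR },
    .show = fw_cfg_showrev,
};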




Re: [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup

2021-02-25 Thread Klaus Jensen
On Feb 23 06:00, Keith Busch wrote:
> These look good.
> 
> Reviewed-by: Keith Busch 

Thanks, applied to nvme-next.




[PATCH v23 00/17] i386 cleanup PART 2

2021-02-25 Thread Claudio Fontana
v22 -> v23:

* i386: move TCG btp_helper into sysemu/
 - extended the #ifndef CONFIG_USER_ONLY to entire else of
   if (cpl != 0).

* i386: split misc helper into user and sysemu parts
 - added g_assert_not_reached() and changed user file name to -stubs.

* i386: separate fpu_helper into user and sysemu parts
 - removed unused return value
 - added comment about issues with current cpu_x86_fsave.

* i386: split off sysemu part of cpu.c
 - rename cpu-softmmu.c to cpu-sysemu.c
 - fixed two misspelled comments, and added two comments
   in the headers of cpu.c and cpu-sysemu.c to describe them

* i386: gdbstub: only write CR0/CR2/CR3/EFER for sysemu
 - defined some aux functions to reduce repeated code

* i386: make cpu_load_efer sysemu-only
 - move the function to helper.c, remove "inline"

v21 -> v22: replace "softmmu" with "sysemu"

v20 -> v21:

* meson: add target_user_arch
  - add hexagon

v19 -> v20:

* add new patch to make gdbstub only write certain registers for softmmu.
  In particular, CR0, CR2, CR3 and EFER should not be changed under
  CONFIG_USER_ONLY. (Paolo)

* add new patch to make cpu_load_efer softmmu-only (Paolo)

* i386: split svm_helper into softmmu and stub-only user

  - fixed commit message spelling (Eric)

  - mention in commit message that this reproduces the existing stubs,
but really everything that requires s->cpl == 0 should be impossible
to trigger from user-mode, and we could find a way to assert that
more consistently.

v18 -> v19:

* i386: split smm helper (softmmu)
  - add g_assert_not_reached and cpu_abort for invalid states in
CONFIG_USER_ONLY (Paolo)

* i386: move TCG btp_helper into softmmu/
  - for CONFIG_USER_ONLY, assert that the hidden IOBPT flags are not set
while attempting to generate io_bpt helpers.
Theory to verify (Paolo)

* i386: split svm_helper into softmmu and stub-only user
  - added XXX in the commit message to highlight the question about
whether the same check should be done controlling access to
cpu_load_efer() and state of the hidden SVME flag. (Paolo)

v17 -> v18:

* meson: add target_user_arch

 - add target_user_arch to all targets which build user.
   Otherwise meson complains about missing key for archs without it.
   (Paolo)

* wrap a few gen_helper_ calls around ifndef CONFIG_USER_ONLY.
  This would need a look from someone like Alex or Richard I think,
  as potentially we could remove even more code around the
  gen_helper_ calls for CONFIG_USER_ONLY.

  In the current master code, we have empty helpers for user mode,
  but still we generate the preamble code, temporary variables etc,
  just to then call a helper_() function that does nothing.

  In particular I am referring to patches:

  i386: split tcg btp_helper into softmmu and user parts
DEF_HELPER_FLAGS_3(set_dr, TCG_CALL_NO_WG, void, env, int, tl)
DEF_HELPER_FLAGS_4(bpt_io, TCG_CALL_NO_WG, void, env, i32, i32, tl)
gen_bpt_io
gen_helper_set_dr(cpu_env, s->tmp2_i32, s->T0);

  i386: split smm helper (softmmu)
DEF_HELPER_1(rsm, void, env)
gen_helper_rsm(cpu_env);

  (Alex, Richard?)

* removed suffixes from user/ and softmmu/ modules
  (Alex, Philippe).
  Where possible, removed user stubs entirely.
  Renamed the leftover svm_helper stubs to user/svm_stubs.c

* cleaned up lefover unnecessary header files and squashed them.
 

v16 -> v17: changed to RFC

* tcg_ops are already in master, removed from the series

* i386: split cpu accelerators from cpu.c, using AccelCPUClass:
  removed spurious ; and added spacing before/after functions (Richard)

* added new patches as RFC for the next steps, introducing target-specific
  user-mode specific meson variables, and applied to i386/tcg as an
  example, in order to gather feedback.

v15 -> v16:

* cpu: Move synchronize_from_tb() to tcg_ops:
  - adjusted comments (Alex)

* cpu: tcg_ops: move to tcg-cpu-ops.h, keep a pointer in CPUClass:
  - remove forward decl. of AccelCPUClass, should be in a later patch. (Alex)
  - simplified comment about tcg_ops in struct CPUClass (Alex)
  - remove obsolete comment about ARM blocking TCGCPUOps from being const.
(Alex)

* accel: replace struct CpusAccel with AccelOpsClass:
  - reworded commit message to be clearer about the objective (Alex)

* accel: introduce AccelCPUClass extending CPUClass
  - reworded commit message to be clearer about the objective (Alex)

* hw/core/cpu: call qemu_init_vcpu in cpu_common_realizefn:
  - dropped this patch (Alex, Philippe)

  will try again later, also in the context of:
  https://www.mail-archive.com/qemu-devel@nongnu.org/msg686480.html

* accel: introduce new accessor functions
  - squashed comments in previous patch introducing accel-cpu.h. (Philippe)

* accel-cpu: make cpu_realizefn return a bool
  - split in two patches, separating the change to the phys_bits check
(Philippe)

v14 -> v15:

* change the TcgCpuOperations so that all fields of the struct are
  defined unconditionally

[PATCH v23 01/17] i386: split cpu accelerators from cpu.c, using AccelCPUClass

2021-02-25 Thread Claudio Fontana
i386 is the first user of AccelCPUClass, allowing to split
cpu.c into:

cpu.ccpuid and common x86 cpu functionality
host-cpu.c   host x86 cpu functions and "host" cpu type
kvm/kvm-cpu.cKVM x86 AccelCPUClass
hvf/hvf-cpu.cHVF x86 AccelCPUClass
tcg/tcg-cpu.cTCG x86 AccelCPUClass

Signed-off-by: Claudio Fontana 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 target/i386/cpu.h   |  20 +-
 target/i386/host-cpu.h  |  19 ++
 target/i386/kvm/kvm-cpu.h   |  41 
 target/i386/tcg/tcg-cpu.h   |  15 --
 hw/i386/pc_piix.c   |   1 +
 target/i386/cpu.c   | 390 
 target/i386/host-cpu.c  | 201 +++
 target/i386/hvf/hvf-cpu.c   |  68 +++
 target/i386/kvm/kvm-cpu.c   | 151 ++
 target/i386/kvm/kvm.c   |   3 +-
 target/i386/tcg/tcg-cpu.c   | 113 ++-
 MAINTAINERS |   2 +-
 target/i386/hvf/meson.build |   1 +
 target/i386/kvm/meson.build |   7 +-
 target/i386/meson.build |   6 +-
 15 files changed, 651 insertions(+), 387 deletions(-)
 create mode 100644 target/i386/host-cpu.h
 create mode 100644 target/i386/kvm/kvm-cpu.h
 delete mode 100644 target/i386/tcg/tcg-cpu.h
 create mode 100644 target/i386/host-cpu.c
 create mode 100644 target/i386/hvf/hvf-cpu.c
 create mode 100644 target/i386/kvm/kvm-cpu.c

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8be39cfb62..c8a84a9033 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1925,13 +1925,20 @@ int cpu_x86_signal_handler(int host_signum, void *pinfo,
void *puc);
 
 /* cpu.c */
+void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
+  uint32_t vendor2, uint32_t vendor3);
+typedef struct PropValue {
+const char *prop, *value;
+} PropValue;
+void x86_cpu_apply_props(X86CPU *cpu, PropValue *props);
+
+/* cpu.c other functions (cpuid) */
 void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx);
 void cpu_clear_apic_feature(CPUX86State *env);
 void host_cpuid(uint32_t function, uint32_t count,
 uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
-void host_vendor_fms(char *vendor, int *family, int *model, int *stepping);
 
 /* helper.c */
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
@@ -2136,17 +2143,6 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess 
access);
 void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
TPRAccess access);
 
-
-/* Change the value of a KVM-specific default
- *
- * If value is NULL, no default will be set and the original
- * value from the CPU model table will be kept.
- *
- * It is valid to call this function only for properties that
- * are already present in the kvm_default_props table.
- */
-void x86_cpu_change_kvm_default(const char *prop, const char *value);
-
 /* Special values for X86CPUVersion: */
 
 /* Resolve to latest CPU version */
diff --git a/target/i386/host-cpu.h b/target/i386/host-cpu.h
new file mode 100644
index 00..b47bc0943f
--- /dev/null
+++ b/target/i386/host-cpu.h
@@ -0,0 +1,19 @@
+/*
+ * x86 host CPU type initialization and host CPU functions
+ *
+ * Copyright 2021 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HOST_CPU_H
+#define HOST_CPU_H
+
+void host_cpu_instance_init(X86CPU *cpu);
+void host_cpu_max_instance_init(X86CPU *cpu);
+void host_cpu_realizefn(CPUState *cs, Error **errp);
+
+void host_cpu_vendor_fms(char *vendor, int *family, int *model, int *stepping);
+
+#endif /* HOST_CPU_H */
diff --git a/target/i386/kvm/kvm-cpu.h b/target/i386/kvm/kvm-cpu.h
new file mode 100644
index 00..e858ca21e5
--- /dev/null
+++ b/target/i386/kvm/kvm-cpu.h
@@ -0,0 +1,41 @@
+/*
+ * i386 KVM CPU type and functions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef KVM_CPU_H
+#define KVM_CPU_H
+
+#ifdef CONFIG_KVM
+/*
+ * Change the value of a KVM-specific default
+ *
+ * If value is NULL, no default will be set and the original
+ * value from the CPU model table will be kept.
+ *
+ * It is valid 

[PATCH v23 07/17] i386: split off sysemu-only functionality in tcg-cpu

2021-02-25 Thread Claudio Fontana
Signed-off-by: Claudio Fontana 
Reviewed-by: Richard Henderson 
---
 target/i386/tcg/tcg-cpu.h  | 24 +
 target/i386/tcg/sysemu/tcg-cpu.c   | 83 ++
 target/i386/tcg/tcg-cpu.c  | 75 ++-
 target/i386/tcg/meson.build|  3 ++
 target/i386/tcg/sysemu/meson.build |  3 ++
 target/i386/tcg/user/meson.build   |  2 +
 6 files changed, 119 insertions(+), 71 deletions(-)
 create mode 100644 target/i386/tcg/tcg-cpu.h
 create mode 100644 target/i386/tcg/sysemu/tcg-cpu.c
 create mode 100644 target/i386/tcg/sysemu/meson.build
 create mode 100644 target/i386/tcg/user/meson.build

diff --git a/target/i386/tcg/tcg-cpu.h b/target/i386/tcg/tcg-cpu.h
new file mode 100644
index 00..36bd300af0
--- /dev/null
+++ b/target/i386/tcg/tcg-cpu.h
@@ -0,0 +1,24 @@
+/*
+ * i386 TCG cpu class initialization functions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#ifndef TCG_CPU_H
+#define TCG_CPU_H
+
+bool tcg_cpu_realizefn(CPUState *cs, Error **errp);
+
+#endif /* TCG_CPU_H */
diff --git a/target/i386/tcg/sysemu/tcg-cpu.c b/target/i386/tcg/sysemu/tcg-cpu.c
new file mode 100644
index 00..c223c0fe9b
--- /dev/null
+++ b/target/i386/tcg/sysemu/tcg-cpu.c
@@ -0,0 +1,83 @@
+/*
+ * i386 TCG cpu class initialization functions specific to sysemu
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "tcg/helper-tcg.h"
+
+#include "sysemu/sysemu.h"
+#include "qemu/units.h"
+#include "exec/address-spaces.h"
+
+#include "tcg/tcg-cpu.h"
+
+static void tcg_cpu_machine_done(Notifier *n, void *unused)
+{
+X86CPU *cpu = container_of(n, X86CPU, machine_done);
+MemoryRegion *smram =
+(MemoryRegion *) object_resolve_path("/machine/smram", NULL);
+
+if (smram) {
+cpu->smram = g_new(MemoryRegion, 1);
+memory_region_init_alias(cpu->smram, OBJECT(cpu), "smram",
+ smram, 0, 4 * GiB);
+memory_region_set_enabled(cpu->smram, true);
+memory_region_add_subregion_overlap(cpu->cpu_as_root, 0,
+cpu->smram, 1);
+}
+}
+
+bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
+{
+X86CPU *cpu = X86_CPU(cs);
+
+/*
+ * The realize order is important, since x86_cpu_realize() checks if
+ * nothing else has been set by the user (or by accelerators) in
+ * cpu->ucode_rev and cpu->phys_bits, and the memory regions
+ * initialized here are needed for the vcpu initialization.
+ *
+ * realize order:
+ * tcg_cpu -> host_cpu -> x86_cpu
+ */
+cpu->cpu_as_mem = g_new(MemoryRegion, 1);
+cpu->cpu_as_root = g_new(MemoryRegion, 1);
+
+/* Outer container... */
+memory_region_init(cpu->cpu_as_root, OBJECT(cpu), "memory", ~0ull);
+memory_region_set_enabled(cpu->cpu_as_root, true);
+
+/*
+ * ... with two regions inside: normal system memory with low
+ * priority, and...
+ */
+memory_region_init_alias(cpu->cpu_as_mem, OBJECT(cpu), "memory",
+ get_system_memory(), 0, ~0ull);
+memory_region_add_subregion_overlap(cpu->cpu_as_root, 0, cpu->cpu_as_mem, 
0);
+memory_region_set_enabled(cpu->cpu_as_mem, true);
+
+cs->num_ases = 2;
+cpu_address_space_init(cs, 0, "cpu-memory", cs->memory);
+cpu_address_space_init(cs, 1, "cpu-smm", cpu->cpu_as_root);
+
+/* ... SMRAM with higher priority, linked from /machine/smram.  */
+cpu->machine_done.notify = tcg_cpu_machine_done;
+qemu_add_machine_init_done_notifier(&cpu->machine_done);
+return true;
+}
diff --git a/tar

Re: [PATCH v2 00/42] esp: consolidate PDMA transfer buffers and other fixes

2021-02-25 Thread Philippe Mathieu-Daudé
On 2/25/21 10:54 AM, Mark Cave-Ayland wrote:
> On 23/02/2021 21:32, Philippe Mathieu-Daudé wrote:
> 
>> Hi Mark,
>>
>> On 2/9/21 8:29 PM, Mark Cave-Ayland wrote:
>>> This patch series comes from an experimental branch that I've been
>>> working on
>>> to try and boot a MacOS toolbox ROM under the QEMU q800 machine. The
>>> effort is
>>> far from complete, but it seems worth submitting these patches
>>> separately since
>>> they are limited to the ESP device and form a substantial part of the
>>> work to
>>> date.
>>>
>>> As part of Laurent's recent q800 work so-called PDMA (pseudo-DMA)
>>> support was
>>> added to the ESP device. This is whereby the DREQ (DMA request) line
>>> is used
>>> to signal to the host CPU that it can transfer data to/from the
>>> device over
>>> the SCSI bus.
>>>
>>> The existing PDMA tracks 4 separate transfer data sources as
>>> indicated by the
>>> ESP pdma_origin variable: PDMA, TI, CMD and ASYNC with an independent
>>> variable
>>> pdma_len to store the transfer length. This works well with Linux
>>> which uses a
>>> single PDMA request to transfer a number of sectors in a single request.
>>>
>>> Unfortunately the MacOS toolbox ROM has other ideas here: it sends
>>> data to the
>>> ESP as a mixture of FIFO and PDMA transfers and then uses a mixture
>>> of the FIFO
>>> and DMA counters to confirm that the correct number of bytes have been
>>> transferred. For this to work correctly the PDMA buffers and separate
>>> pdma_len
>>> transfer counter must be consolidated into the FIFO to allow mixing
>>> of both
>>> types of transfer within a single request.
>>>
>>> The patchset is split into several sections:
>>>
>>> - Patches 1-7 are minor patches which make esp.c checkpatch friendly,
>>> QOMify ESPState,
>>>    and also fix up some trace events ready for later patches in the
>>> series
>>>
>>> - Patches 8-13 unify the DMA transfer count. In particular there are
>>> 2 synthetic
>>>    variables dma_counter and dma_left within ESPState which do not
>>> need to exist.
>>>    DMA transfer lengths are programmed into the TC (transfer count)
>>> register which is
>>>    decremented for each byte transferred, generating an interrupt
>>> when it reaches zero.
>>>    These patches add helper functions to read the TC and STC
>>> registers directly and
>>>    remove these synthetic variables so that the DMA transfer length
>>> is now tracked in
>>>    a single place.
>>>
>>> - Now that the TC register represents the authoritative DMA transfer
>>> length, patches
>>>    14-25 work to eliminate the separate PDMA variables pdma_start,
>>> pdma_cur, pdma_len
>>>    and separate PDMA buffers PDMA and CMD. The PDMA position
>>> variables can be replaced
>>>    by the existing ESP cmdlen and ti_wptr/ti_rptr, whilst the FIFO
>>> (TI) buffer is used
>>>    for incoming data with commands being accumulated in cmdbuf as per
>>> standard DMA
>>>    requests.
>>
>> I tried to help reviewing up to this point.
>>
>> The next parts are too specific to me.
> 
> Thanks Phil - I understand that a set of 42 patches for a 25 year old
> disk controller is never going to be at the top of most people's review
> list, and some parts are almost impossible to review unless you have a
> good understanding of the datasheet.

Well, I also have a series for a 30+ year old MIPS board and am
not confident about posting it because there is probably little interest
from the community, although it is very interesting to compare with
current SoCs and see how the IP blocks are indeed reused and improved
over time -- or not... i.e. when someone reports a hw bug in a 2020
product and the same bug is present in the IP core from the '80s that
it inherited ;)

> I'll see if Laurent has any comments over the next few days, but other
> than that I'd be inclined to send a v3 followed soon by a PR to avoid me
> having to update these regularly (I already see a slight conflict with
> Paolo's SCSI error handling changes, for example).

I'll have a look at your v3 and Cc you when I post this MIPS board :D

Regards,

Phil.



[PATCH v23 03/17] accel: introduce new accessor functions

2021-02-25 Thread Claudio Fontana
avoid open coding the accesses to cpu->accel_cpu interfaces,
and instead introduce:

accel_cpu_instance_init,
accel_cpu_realizefn

to be used by the targets/ initfn code,
and by cpu_exec_realizefn respectively.

Signed-off-by: Claudio Fontana 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 include/qemu/accel.h | 13 +
 accel/accel-common.c | 19 +++
 cpu.c|  6 +-
 target/i386/cpu.c|  9 ++---
 4 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/include/qemu/accel.h b/include/qemu/accel.h
index b9d6d69eb8..da0c8ab523 100644
--- a/include/qemu/accel.h
+++ b/include/qemu/accel.h
@@ -78,4 +78,17 @@ int accel_init_machine(AccelState *accel, MachineState *ms);
 void accel_setup_post(MachineState *ms);
 #endif /* !CONFIG_USER_ONLY */
 
+/**
+ * accel_cpu_instance_init:
+ * @cpu: The CPU that needs to do accel-specific object initializations.
+ */
+void accel_cpu_instance_init(CPUState *cpu);
+
+/**
+ * accel_cpu_realizefn:
+ * @cpu: The CPU that needs to call accel-specific cpu realization.
+ * @errp: currently unused.
+ */
+void accel_cpu_realizefn(CPUState *cpu, Error **errp);
+
 #endif /* QEMU_ACCEL_H */
diff --git a/accel/accel-common.c b/accel/accel-common.c
index 9901b0531c..0f6fb4fb66 100644
--- a/accel/accel-common.c
+++ b/accel/accel-common.c
@@ -89,6 +89,25 @@ void accel_init_interfaces(AccelClass *ac)
 accel_init_cpu_interfaces(ac);
 }
 
+void accel_cpu_instance_init(CPUState *cpu)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (cc->accel_cpu && cc->accel_cpu->cpu_instance_init) {
+cc->accel_cpu->cpu_instance_init(cpu);
+}
+}
+
+void accel_cpu_realizefn(CPUState *cpu, Error **errp)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (cc->accel_cpu && cc->accel_cpu->cpu_realizefn) {
+/* NB: errp parameter is unused currently */
+cc->accel_cpu->cpu_realizefn(cpu, errp);
+}
+}
+
 static const TypeInfo accel_cpu_type = {
 .name = TYPE_ACCEL_CPU,
 .parent = TYPE_OBJECT,
diff --git a/cpu.c b/cpu.c
index ba5d272c1e..25e6fbfa2c 100644
--- a/cpu.c
+++ b/cpu.c
@@ -130,11 +130,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 CPUClass *cc = CPU_GET_CLASS(cpu);
 
 cpu_list_add(cpu);
-
-if (cc->accel_cpu) {
-/* NB: errp parameter is unused currently */
-cc->accel_cpu->cpu_realizefn(cpu, errp);
-}
+accel_cpu_realizefn(cpu, errp);
 
 #ifdef CONFIG_TCG
 /* NB: errp parameter is unused currently */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6e2b5d7e59..14e2a60ee5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -28,7 +28,6 @@
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
 #include "sysemu/hvf.h"
-#include "hw/core/accel-cpu.h"
 #include "sysemu/xen.h"
 #include "sysemu/whpx.h"
 #include "kvm/kvm_i386.h"
@@ -6786,8 +6785,6 @@ static void x86_cpu_initfn(Object *obj)
 {
 X86CPU *cpu = X86_CPU(obj);
 X86CPUClass *xcc = X86_CPU_GET_CLASS(obj);
-CPUClass *cc = CPU_CLASS(xcc);
-
 CPUX86State *env = &cpu->env;
 
 env->nr_dies = 1;
@@ -6836,10 +6833,8 @@ static void x86_cpu_initfn(Object *obj)
 x86_cpu_load_model(cpu, xcc->model);
 }
 
-/* if required, do the accelerator-specific cpu initialization */
-if (cc->accel_cpu) {
-cc->accel_cpu->cpu_instance_init(CPU(obj));
-}
+/* if required, do accelerator-specific cpu initializations */
+accel_cpu_instance_init(CPU(obj));
 }
 
 static int64_t x86_cpu_get_arch_id(CPUState *cs)
-- 
2.26.2




[PATCH v23 05/17] accel-cpu: make cpu_realizefn return a bool

2021-02-25 Thread Claudio Fontana
overall, all devices' realize functions take an Error **errp, but return void.

hw/core/qdev.c code, which realizes devices, therefore does:

local_err = NULL;
dc->realize(dev, &local_err);
if (local_err != NULL) {
goto fail;
}

However, we can improve at least accel_cpu to return a meaningful bool value.

Signed-off-by: Claudio Fontana 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 include/hw/core/accel-cpu.h | 2 +-
 include/qemu/accel.h| 2 +-
 target/i386/host-cpu.h  | 2 +-
 accel/accel-common.c| 6 +++---
 cpu.c   | 5 +++--
 target/i386/host-cpu.c  | 5 +++--
 target/i386/kvm/kvm-cpu.c   | 4 ++--
 target/i386/tcg/tcg-cpu.c   | 6 --
 8 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/include/hw/core/accel-cpu.h b/include/hw/core/accel-cpu.h
index 24a6697412..5dbfd79955 100644
--- a/include/hw/core/accel-cpu.h
+++ b/include/hw/core/accel-cpu.h
@@ -32,7 +32,7 @@ typedef struct AccelCPUClass {
 
 void (*cpu_class_init)(CPUClass *cc);
 void (*cpu_instance_init)(CPUState *cpu);
-void (*cpu_realizefn)(CPUState *cpu, Error **errp);
+bool (*cpu_realizefn)(CPUState *cpu, Error **errp);
 } AccelCPUClass;
 
 #endif /* ACCEL_CPU_H */
diff --git a/include/qemu/accel.h b/include/qemu/accel.h
index da0c8ab523..4f4c283f6f 100644
--- a/include/qemu/accel.h
+++ b/include/qemu/accel.h
@@ -89,6 +89,6 @@ void accel_cpu_instance_init(CPUState *cpu);
  * @cpu: The CPU that needs to call accel-specific cpu realization.
  * @errp: currently unused.
  */
-void accel_cpu_realizefn(CPUState *cpu, Error **errp);
+bool accel_cpu_realizefn(CPUState *cpu, Error **errp);
 
 #endif /* QEMU_ACCEL_H */
diff --git a/target/i386/host-cpu.h b/target/i386/host-cpu.h
index b47bc0943f..6a9bc918ba 100644
--- a/target/i386/host-cpu.h
+++ b/target/i386/host-cpu.h
@@ -12,7 +12,7 @@
 
 void host_cpu_instance_init(X86CPU *cpu);
 void host_cpu_max_instance_init(X86CPU *cpu);
-void host_cpu_realizefn(CPUState *cs, Error **errp);
+bool host_cpu_realizefn(CPUState *cs, Error **errp);
 
 void host_cpu_vendor_fms(char *vendor, int *family, int *model, int *stepping);
 
diff --git a/accel/accel-common.c b/accel/accel-common.c
index 0f6fb4fb66..d77c09d7b5 100644
--- a/accel/accel-common.c
+++ b/accel/accel-common.c
@@ -98,14 +98,14 @@ void accel_cpu_instance_init(CPUState *cpu)
 }
 }
 
-void accel_cpu_realizefn(CPUState *cpu, Error **errp)
+bool accel_cpu_realizefn(CPUState *cpu, Error **errp)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 
 if (cc->accel_cpu && cc->accel_cpu->cpu_realizefn) {
-/* NB: errp parameter is unused currently */
-cc->accel_cpu->cpu_realizefn(cpu, errp);
+return cc->accel_cpu->cpu_realizefn(cpu, errp);
 }
+return true;
 }
 
 static const TypeInfo accel_cpu_type = {
diff --git a/cpu.c b/cpu.c
index 25e6fbfa2c..34a0484bf4 100644
--- a/cpu.c
+++ b/cpu.c
@@ -130,8 +130,9 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 CPUClass *cc = CPU_GET_CLASS(cpu);
 
 cpu_list_add(cpu);
-accel_cpu_realizefn(cpu, errp);
-
+if (!accel_cpu_realizefn(cpu, errp)) {
+return;
+}
 #ifdef CONFIG_TCG
 /* NB: errp parameter is unused currently */
 if (tcg_enabled()) {
diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index d07d41c34c..4ea9e354ea 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -80,7 +80,7 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
 return phys_bits;
 }
 
-void host_cpu_realizefn(CPUState *cs, Error **errp)
+bool host_cpu_realizefn(CPUState *cs, Error **errp)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
@@ -97,10 +97,11 @@ void host_cpu_realizefn(CPUState *cs, Error **errp)
 error_setg(errp, "phys-bits should be between 32 and %u "
" (but is %u)",
TARGET_PHYS_ADDR_SPACE_BITS, phys_bits);
-return;
+return false;
 }
 cpu->phys_bits = phys_bits;
 }
+return true;
 }
 
 #define CPUID_MODEL_ID_SZ 48
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index c23bbe6c50..c660ad4293 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -18,7 +18,7 @@
 #include "kvm_i386.h"
 #include "hw/core/accel-cpu.h"
 
-static void kvm_cpu_realizefn(CPUState *cs, Error **errp)
+static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
@@ -41,7 +41,7 @@ static void kvm_cpu_realizefn(CPUState *cs, Error **errp)
MSR_IA32_UCODE_REV);
 }
 }
-host_cpu_realizefn(cs, errp);
+return host_cpu_realizefn(cs, errp);
 }
 
 /*
diff --git a/target/i386/tcg/tcg-cpu.c b/target/i386/tcg/tcg-cpu.c
index 1d3d6d1c6a..23e1f5f0c3 100644
--- a/target/i386/tcg/tcg-cpu.c
+++ b/target/i386/tcg/tcg-cpu.c
@@ -96,7 +96,7 @@ static voi

[PATCH v23 02/17] cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn

2021-02-25 Thread Claudio Fontana
move the call to accel_cpu->cpu_realizefn to the general
cpu_exec_realizefn from target/i386, so it does not need to be
called for every target explicitly as we enable more targets.

Signed-off-by: Claudio Fontana 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cpu.c |  6 ++
 target/i386/cpu.c | 20 +++-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/cpu.c b/cpu.c
index bfbe5a66f9..ba5d272c1e 100644
--- a/cpu.c
+++ b/cpu.c
@@ -36,6 +36,7 @@
 #include "sysemu/replay.h"
 #include "exec/translate-all.h"
 #include "exec/log.h"
+#include "hw/core/accel-cpu.h"
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
@@ -130,6 +131,11 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 
 cpu_list_add(cpu);
 
+if (cc->accel_cpu) {
+/* NB: errp parameter is unused currently */
+cc->accel_cpu->cpu_realizefn(cpu, errp);
+}
+
 #ifdef CONFIG_TCG
 /* NB: errp parameter is unused currently */
 if (tcg_enabled()) {
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 648e41791f..6e2b5d7e59 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6445,16 +6445,19 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
-CPUClass *cc = CPU_GET_CLASS(cs);
 X86CPU *cpu = X86_CPU(dev);
 X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
 CPUX86State *env = &cpu->env;
 Error *local_err = NULL;
 static bool ht_warned;
 
-/* The accelerator realizefn needs to be called first. */
-if (cc->accel_cpu) {
-cc->accel_cpu->cpu_realizefn(cs, errp);
+/* Process Hyper-V enlightenments */
+x86_cpu_hyperv_realize(cpu);
+
+cpu_exec_realizefn(cs, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
 }
 
 if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
@@ -6570,15 +6573,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 env->cache_info_amd.l3_cache = &legacy_l3_cache;
 }
 
-/* Process Hyper-V enlightenments */
-x86_cpu_hyperv_realize(cpu);
-
-cpu_exec_realizefn(cs, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return;
-}
-
 #ifndef CONFIG_USER_ONLY
 MachineState *ms = MACHINE(qdev_get_machine());
 qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
-- 
2.26.2




[RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration

2021-02-25 Thread Eric Auger
Up to now vSMMUv3 has not been integrated with VFIO. VFIO
integration requires programming the physical IOMMU consistently
with the guest mappings. However, as opposed to VTD, SMMUv3 has
no "Caching Mode" which allows easy trapping of guest mappings.
This means the vSMMUv3 cannot use the same VFIO integration as VTD.

However SMMUv3 has 2 translation stages. This was devised with the
virtualization use case in mind, where stage 1 is "owned" by the
guest whereas the host uses stage 2 for VM isolation.

This series sets up this nested translation stage. It only works
if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
other words, it does not work if there is a physical SMMUv2).

- We force the host to use stage 2 instead of stage 1, when we
  detect a vSMMUV3 is behind a VFIO device. For a VFIO device
  without any virtual IOMMU, we still use stage 1 as many existing
  SMMUs expect this behavior.
- We use PCIPASIDOps to propagate guest stage1 config changes on
  STE (Stream Table Entry) changes.
- We implement a specific UNMAP notifier that conveys guest
  IOTLB invalidations to the host
- We register MSI IOVA/GPA bindings to the host so that the latter
  can build a nested stage translation
- As the legacy MAP notifier is not called anymore, we must make
  sure stage 2 mappings are set. This is achieved through another
  prereg memory listener.
- Physical SMMU stage 1 related faults are reported to the guest
  via an eventfd mechanism and exposed through a dedicated VFIO-PCI
  region. Then they are reinjected into the guest.

Best Regards

Eric

This series applies on top of
[PATCH v2 0/7] Some vIOMMU fixes and
PATCH 0/2] Additional vIOMMU fixes related to UNMAP notifiers

All the patches can be found at:
v8: https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8

Previous version:
v7: https://github.com/eauger/qemu/tree/v5.2.0-rc1-2stage-rfcv7

Kernel Dependencies:
[1] [PATCH v14 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
[2] [PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part)
branch containing both:
https://github.com/eauger/linux/tree/v5.11-stallv12-2stage-v14

History:
v7 -> v8:
- adapt to changes to the kernel uapi
- Fix unregistration of MSI bindings
- applies on top of range invalidation fixes
- changes in IOTLBEntry (flags)
- addressed all the comments from reviewers/testers I hope.
  Many thanks to all of you! see individual logs

v6 -> v7:
- rebase on v5.2.0-rc1
- added:
  "pci: Add return_page_response pci ops" and
  "vfio/pci: Implement return_page_response page response callback"
  for vSVA integration (not used in this series).

v5 -> v6:
- just rebase work

v4 -> v5:
- Use PCIPASIDOps for config update notifications
- removal of notification for MSI binding which is not needed
  anymore
- Use a single fault region
- use the specific interrupt index

v3 -> v4:
- adapt to changes in uapi (asid cache invalidation)
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
- sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
- fix MSI binding for MSI (not MSIX)
- fix mingw compilation

v2 -> v3:
- rework fault handling
- MSI binding registration done in vfio-pci. MSI binding tear down called
  on container cleanup path
- leaf parameter propagated

v1 -> v2:
- Fixed dual assignment (asid now correctly propagated on TLB invalidations)
- Integrated fault reporting


Eric Auger (27):
  hw/vfio/common: trace vfio_connect_container operations
  update-linux-headers: Import iommu.h
  header update against 5.11-rc2 and IOMMU/VFIO nested stage APIs
  memory: Add new fields in IOTLBEntry
  hw/arm/smmuv3: Properly propagate S1 asid invalidation
  memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
  memory: Introduce IOMMU Memory Region inject_faults API
  iommu: Introduce generic header
  vfio: Force nested if iommu requires it
  vfio: Introduce hostwin_from_range helper
  vfio: Introduce helpers to DMA map/unmap a RAM section
  vfio: Set up nested stage mappings
  vfio: Pass stage 1 MSI bindings to the host
  vfio: Helper to get IRQ info including capabilities
  vfio/pci: Register handler for iommu fault
  vfio/pci: Set up the DMA FAULT region
  vfio/pci: Implement the DMA fault handler
  hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
  hw/arm/smmuv3: Store the PASID table GPA in the translation config
  hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
  hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
  hw/arm/smmuv3: Pass stage 1 configurations to the host
  hw/arm/smmuv3: Implement fault injection
  hw/arm/smmuv3: Allow MAP notifiers
  pci: Add return_page_response pci ops
  vfio/pci: Implement return_page_response page response callback

Liu Yi L (1):
  pci: introduce PCIPASIDOps to PCIDevice

 hw/arm/smmu-common.c  |   2 +-
 hw/arm/smmu-internal.h| 

[PATCH v23 04/17] target/i386: fix host_cpu_adjust_phys_bits error handling

2021-02-25 Thread Claudio Fontana
move the check for phys_bits outside of host_cpu_adjust_phys_bits,
because otherwise it is impossible to return an error condition
explicitly.

Signed-off-by: Claudio Fontana 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 target/i386/host-cpu.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 9cfe56ce41..d07d41c34c 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -50,7 +50,7 @@ static void host_cpu_enable_cpu_pm(X86CPU *cpu)
 env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
 }
 
-static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu, Error **errp)
+static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
 {
 uint32_t host_phys_bits = host_cpu_phys_bits();
 uint32_t phys_bits = cpu->phys_bits;
@@ -77,14 +77,6 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu, Error 
**errp)
 }
 }
 
-if (phys_bits &&
-(phys_bits > TARGET_PHYS_ADDR_SPACE_BITS ||
- phys_bits < 32)) {
-error_setg(errp, "phys-bits should be between 32 and %u "
-   " (but is %u)",
-   TARGET_PHYS_ADDR_SPACE_BITS, phys_bits);
-}
-
 return phys_bits;
 }
 
@@ -97,7 +89,17 @@ void host_cpu_realizefn(CPUState *cs, Error **errp)
 host_cpu_enable_cpu_pm(cpu);
 }
 if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
-cpu->phys_bits = host_cpu_adjust_phys_bits(cpu, errp);
+uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
+
+if (phys_bits &&
+(phys_bits > TARGET_PHYS_ADDR_SPACE_BITS ||
+ phys_bits < 32)) {
+error_setg(errp, "phys-bits should be between 32 and %u "
+   " (but is %u)",
+   TARGET_PHYS_ADDR_SPACE_BITS, phys_bits);
+return;
+}
+cpu->phys_bits = phys_bits;
 }
 }
 
-- 
2.26.2




[PATCH v23 13/17] i386: split svm_helper into sysemu and stub-only user

2021-02-25 Thread Claudio Fontana
For now we just copy over the previous user stubs, but really,

everything that requires s->cpl == 0 should be impossible
to trigger from user-mode emulation.

Later on we should add a check that asserts this, for example:

static bool check_cpl0(DisasContext *s)
{
 int cpl = s->cpl;
 #ifdef CONFIG_USER_ONLY
 assert(cpl == 3);
 #endif
 if (cpl != 0) {
 gen_exception(s, EXCP0D_GPF, s->pc_start - s->cs_base);
 return false;
 }
 return true;
}

Signed-off-by: Claudio Fontana 
Cc: Paolo Bonzini 
Reviewed-by: Richard Henderson 
---
 target/i386/tcg/{ => sysemu}/svm_helper.c | 62 +-
 target/i386/tcg/user/svm_stubs.c  | 76 +++
 target/i386/tcg/meson.build   |  1 -
 target/i386/tcg/sysemu/meson.build|  1 +
 target/i386/tcg/user/meson.build  |  1 +
 5 files changed, 80 insertions(+), 61 deletions(-)
 rename target/i386/tcg/{ => sysemu}/svm_helper.c (96%)
 create mode 100644 target/i386/tcg/user/svm_stubs.c

diff --git a/target/i386/tcg/svm_helper.c b/target/i386/tcg/sysemu/svm_helper.c
similarity index 96%
rename from target/i386/tcg/svm_helper.c
rename to target/i386/tcg/sysemu/svm_helper.c
index 097bb9b83d..5b9c6f18be 100644
--- a/target/i386/tcg/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -1,5 +1,5 @@
 /*
- *  x86 SVM helpers
+ *  x86 SVM helpers (sysemu only)
  *
  *  Copyright (c) 2003 Fabrice Bellard
  *
@@ -22,66 +22,10 @@
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
-#include "helper-tcg.h"
+#include "tcg/helper-tcg.h"
 
 /* Secure Virtual Machine helpers */
 
-#if defined(CONFIG_USER_ONLY)
-
-void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
-{
-}
-
-void helper_vmmcall(CPUX86State *env)
-{
-}
-
-void helper_vmload(CPUX86State *env, int aflag)
-{
-}
-
-void helper_vmsave(CPUX86State *env, int aflag)
-{
-}
-
-void helper_stgi(CPUX86State *env)
-{
-}
-
-void helper_clgi(CPUX86State *env)
-{
-}
-
-void helper_skinit(CPUX86State *env)
-{
-}
-
-void helper_invlpga(CPUX86State *env, int aflag)
-{
-}
-
-void cpu_vmexit(CPUX86State *nenv, uint32_t exit_code, uint64_t exit_info_1,
-uintptr_t retaddr)
-{
-assert(0);
-}
-
-void helper_svm_check_intercept_param(CPUX86State *env, uint32_t type,
-  uint64_t param)
-{
-}
-
-void cpu_svm_check_intercept_param(CPUX86State *env, uint32_t type,
-   uint64_t param, uintptr_t retaddr)
-{
-}
-
-void helper_svm_check_io(CPUX86State *env, uint32_t port, uint32_t param,
- uint32_t next_eip_addend)
-{
-}
-#else
-
 static inline void svm_save_seg(CPUX86State *env, hwaddr addr,
 const SegmentCache *sc)
 {
@@ -797,5 +741,3 @@ void do_vmexit(CPUX86State *env, uint32_t exit_code, 
uint64_t exit_info_1)
host's code segment or non-canonical (in the case of long mode), a
#GP fault is delivered inside the host. */
 }
-
-#endif
diff --git a/target/i386/tcg/user/svm_stubs.c b/target/i386/tcg/user/svm_stubs.c
new file mode 100644
index 00..97528b56ad
--- /dev/null
+++ b/target/i386/tcg/user/svm_stubs.c
@@ -0,0 +1,76 @@
+/*
+ *  x86 SVM helpers (user-mode)
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "tcg/helper-tcg.h"
+
+void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
+{
+}
+
+void helper_vmmcall(CPUX86State *env)
+{
+}
+
+void helper_vmload(CPUX86State *env, int aflag)
+{
+}
+
+void helper_vmsave(CPUX86State *env, int aflag)
+{
+}
+
+void helper_stgi(CPUX86State *env)
+{
+}
+
+void helper_clgi(CPUX86State *env)
+{
+}
+
+void helper_skinit(CPUX86State *env)
+{
+}
+
+void helper_invlpga(CPUX86State *env, int aflag)
+{
+}
+
+void cpu_vmexit(CPUX86State *nenv, uint32_t exit_code, uint64_t exit_info_1,
+uintptr_t retaddr)
+{
+assert(0);
+}
+
+void helper_svm_check_intercept_param(CPUX86State *env, uint32_t type,
+  uint64_t param)
+{
+}
+
+void cpu_svm_check_intercept_param(CPUX86State *env, uint32_t type,
+   uint64_t param, uintptr_t retaddr)
+{
+}
+
+void helper_svm_check_io(CPUX86Stat

[PATCH v23 06/17] meson: add target_user_arch

2021-02-25 Thread Claudio Fontana
the lack of target_user_arch makes it hard to fully leverage the
build system in order to separate user code from sysemu code.

Provide it, so that we can avoid the proliferation of #ifdef
in target code.

Signed-off-by: Claudio Fontana 
Reviewed-by: Alex Bennée 

[claudio: added changes for new target hexagon]

Signed-off-by: Claudio Fontana 
---
 meson.build   | 5 +
 target/alpha/meson.build  | 3 +++
 target/arm/meson.build| 2 ++
 target/cris/meson.build   | 3 +++
 target/hexagon/meson.build| 3 +++
 target/hppa/meson.build   | 3 +++
 target/i386/meson.build   | 2 ++
 target/m68k/meson.build   | 3 +++
 target/microblaze/meson.build | 3 +++
 target/mips/meson.build   | 3 +++
 target/nios2/meson.build  | 3 +++
 target/openrisc/meson.build   | 3 +++
 target/ppc/meson.build| 3 +++
 target/riscv/meson.build  | 3 +++
 target/s390x/meson.build  | 3 +++
 target/sh4/meson.build| 3 +++
 target/sparc/meson.build  | 3 +++
 target/tilegx/meson.build | 3 +++
 target/tricore/meson.build| 3 +++
 target/xtensa/meson.build | 3 +++
 20 files changed, 60 insertions(+)

diff --git a/meson.build b/meson.build
index 05a67c20d9..5be4e5f38c 100644
--- a/meson.build
+++ b/meson.build
@@ -1735,6 +1735,7 @@ modules = {}
 hw_arch = {}
 target_arch = {}
 target_softmmu_arch = {}
+target_user_arch = {}
 
 ###
 # Trace files #
@@ -2132,6 +2133,10 @@ foreach target : target_dirs
 abi = config_target['TARGET_ABI_DIR']
 target_type='user'
 qemu_target_name = 'qemu-' + target_name
+t = target_user_arch[arch].apply(config_target, strict: false)
+arch_srcs += t.sources()
+arch_deps += t.dependencies()
+
 if 'CONFIG_LINUX_USER' in config_target
   base_dir = 'linux-user'
   target_inc += include_directories('linux-user/host/' / 
config_host['ARCH'])
diff --git a/target/alpha/meson.build b/target/alpha/meson.build
index 1aec55abb4..1b0555d3ee 100644
--- a/target/alpha/meson.build
+++ b/target/alpha/meson.build
@@ -14,5 +14,8 @@ alpha_ss.add(files(
 alpha_softmmu_ss = ss.source_set()
 alpha_softmmu_ss.add(files('machine.c'))
 
+alpha_user_ss = ss.source_set()
+
 target_arch += {'alpha': alpha_ss}
 target_softmmu_arch += {'alpha': alpha_softmmu_ss}
+target_user_arch += {'alpha': alpha_user_ss}
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 15b936c101..a96af5ee1b 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -53,6 +53,8 @@ arm_softmmu_ss.add(files(
   'monitor.c',
   'psci.c',
 ))
+arm_user_ss = ss.source_set()
 
 target_arch += {'arm': arm_ss}
 target_softmmu_arch += {'arm': arm_softmmu_ss}
+target_user_arch += {'arm': arm_user_ss}
diff --git a/target/cris/meson.build b/target/cris/meson.build
index 67c3793c85..7fd81e0348 100644
--- a/target/cris/meson.build
+++ b/target/cris/meson.build
@@ -10,5 +10,8 @@ cris_ss.add(files(
 cris_softmmu_ss = ss.source_set()
 cris_softmmu_ss.add(files('mmu.c', 'machine.c'))
 
+cris_user_ss = ss.source_set()
+
 target_arch += {'cris': cris_ss}
 target_softmmu_arch += {'cris': cris_softmmu_ss}
+target_user_arch += {'cris': cris_user_ss}
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index 15318a6fa7..e92d45400d 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -188,4 +188,7 @@ hexagon_ss.add(files(
 'conv_emu.c',
 ))
 
+hexagon_user_ss = ss.source_set()
+
 target_arch += {'hexagon': hexagon_ss}
+target_user_arch += {'hexagon': hexagon_user_ss}
diff --git a/target/hppa/meson.build b/target/hppa/meson.build
index 8a7ff82efc..85ad314671 100644
--- a/target/hppa/meson.build
+++ b/target/hppa/meson.build
@@ -15,5 +15,8 @@ hppa_ss.add(files(
 hppa_softmmu_ss = ss.source_set()
 hppa_softmmu_ss.add(files('machine.c'))
 
+hppa_user_ss = ss.source_set()
+
 target_arch += {'hppa': hppa_ss}
 target_softmmu_arch += {'hppa': hppa_softmmu_ss}
+target_user_arch += {'hppa': hppa_user_ss}
diff --git a/target/i386/meson.build b/target/i386/meson.build
index fd24479590..cac26a4581 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -19,6 +19,7 @@ i386_softmmu_ss.add(files(
   'machine.c',
   'monitor.c',
 ))
+i386_user_ss = ss.source_set()
 
 subdir('kvm')
 subdir('hax')
@@ -28,3 +29,4 @@ subdir('tcg')
 
 target_arch += {'i386': i386_ss}
 target_softmmu_arch += {'i386': i386_softmmu_ss}
+target_user_arch += {'i386': i386_user_ss}
diff --git a/target/m68k/meson.build b/target/m68k/meson.build
index 05cd9fbd1e..b507682684 100644
--- a/target/m68k/meson.build
+++ b/target/m68k/meson.build
@@ -13,5 +13,8 @@ m68k_ss.add(files(
 m68k_softmmu_ss = ss.source_set()
 m68k_softmmu_ss.add(files('monitor.c'))
 
+m68k_user_ss = ss.source_set()
+
 target_arch += {'m68k': m68k_ss}
 target_softmmu_arch += {'m68k': m68k_softmmu_ss}
+target_user_arch += {'m68k': m68k_user_ss}
diff --git a/target/microblaze/meson.build b/target/microblaze/meson.build
index 05ee0ec163..52d8fcb0a3 100644
--- a/target/

[PATCH v23 10/17] i386: move TCG btp_helper into sysemu/

2021-02-25 Thread Claudio Fontana
for user-mode, assert that the hidden IOBPT flags are not set
while attempting to generate io_bpt helpers.
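
A rough sketch of what that user-mode assertion in gen_bpt_io() might look
like (illustration only, based on the existing translate.c code and the
bpt_io helper signature visible in the diff below; not the exact hunk of
this patch):

static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, int ot)
{
#ifdef CONFIG_USER_ONLY
    /*
     * The hidden HF_IOBPT_MASK flag is only ever set by the sysemu
     * hw-breakpoint code, so it must never be seen in user-mode.
     */
    g_assert((s->flags & HF_IOBPT_MASK) == 0);
#else
    if (s->flags & HF_IOBPT_MASK) {
        TCGv_i32 t_size = tcg_const_i32(1 << ot);
        TCGv t_next = tcg_const_tl(s->pc - s->cs_base);

        gen_helper_bpt_io(cpu_env, t_port, t_size, t_next);
        tcg_temp_free_i32(t_size);
        tcg_temp_free(t_next);
    }
#endif
}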

Signed-off-by: Claudio Fontana 
Cc: Paolo Bonzini 
Reviewed-by: Richard Henderson 
---
 target/i386/helper.h|   7 +
 target/i386/tcg/helper-tcg.h|   3 +
 target/i386/tcg/bpt_helper.c| 276 --
 target/i386/tcg/sysemu/bpt_helper.c | 293 
 target/i386/tcg/translate.c |   8 +-
 target/i386/tcg/sysemu/meson.build  |   1 +
 6 files changed, 311 insertions(+), 277 deletions(-)
 create mode 100644 target/i386/tcg/sysemu/bpt_helper.c

diff --git a/target/i386/helper.h b/target/i386/helper.h
index 8ffda4cdc6..095520f81f 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -46,7 +46,11 @@ DEF_HELPER_2(read_crN, tl, env, int)
 DEF_HELPER_3(write_crN, void, env, int, tl)
 DEF_HELPER_2(lmsw, void, env, tl)
 DEF_HELPER_1(clts, void, env)
+
+#ifndef CONFIG_USER_ONLY
 DEF_HELPER_FLAGS_3(set_dr, TCG_CALL_NO_WG, void, env, int, tl)
+#endif /* !CONFIG_USER_ONLY */
+
 DEF_HELPER_FLAGS_2(get_dr, TCG_CALL_NO_WG, tl, env, int)
 DEF_HELPER_2(invlpg, void, env, tl)
 
@@ -100,7 +104,10 @@ DEF_HELPER_3(outw, void, env, i32, i32)
 DEF_HELPER_2(inw, tl, env, i32)
 DEF_HELPER_3(outl, void, env, i32, i32)
 DEF_HELPER_2(inl, tl, env, i32)
+
+#ifndef CONFIG_USER_ONLY
 DEF_HELPER_FLAGS_4(bpt_io, TCG_CALL_NO_WG, void, env, i32, i32, tl)
+#endif /* !CONFIG_USER_ONLY */
 
 DEF_HELPER_3(svm_check_intercept_param, void, env, i32, i64)
 DEF_HELPER_4(svm_check_io, void, env, i32, i32, i32)
diff --git a/target/i386/tcg/helper-tcg.h b/target/i386/tcg/helper-tcg.h
index c133c63555..b420b3356d 100644
--- a/target/i386/tcg/helper-tcg.h
+++ b/target/i386/tcg/helper-tcg.h
@@ -92,4 +92,7 @@ void do_interrupt_x86_hardirq(CPUX86State *env, int intno, 
int is_hw);
 /* smm_helper.c */
 void do_smm_enter(X86CPU *cpu);
 
+/* bpt_helper.c */
+bool check_hw_breakpoints(CPUX86State *env, bool force_dr6_update);
+
 #endif /* I386_HELPER_TCG_H */
diff --git a/target/i386/tcg/bpt_helper.c b/target/i386/tcg/bpt_helper.c
index 979230ac12..fb2a65ac9c 100644
--- a/target/i386/tcg/bpt_helper.c
+++ b/target/i386/tcg/bpt_helper.c
@@ -19,223 +19,9 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
-#include "exec/exec-all.h"
 #include "exec/helper-proto.h"
 #include "helper-tcg.h"
 
-
-#ifndef CONFIG_USER_ONLY
-static inline bool hw_local_breakpoint_enabled(unsigned long dr7, int index)
-{
-return (dr7 >> (index * 2)) & 1;
-}
-
-static inline bool hw_global_breakpoint_enabled(unsigned long dr7, int index)
-{
-return (dr7 >> (index * 2)) & 2;
-
-}
-static inline bool hw_breakpoint_enabled(unsigned long dr7, int index)
-{
-return hw_global_breakpoint_enabled(dr7, index) ||
-   hw_local_breakpoint_enabled(dr7, index);
-}
-
-static inline int hw_breakpoint_type(unsigned long dr7, int index)
-{
-return (dr7 >> (DR7_TYPE_SHIFT + (index * 4))) & 3;
-}
-
-static inline int hw_breakpoint_len(unsigned long dr7, int index)
-{
-int len = ((dr7 >> (DR7_LEN_SHIFT + (index * 4))) & 3);
-return (len == 2) ? 8 : len + 1;
-}
-
-static int hw_breakpoint_insert(CPUX86State *env, int index)
-{
-CPUState *cs = env_cpu(env);
-target_ulong dr7 = env->dr[7];
-target_ulong drN = env->dr[index];
-int err = 0;
-
-switch (hw_breakpoint_type(dr7, index)) {
-case DR7_TYPE_BP_INST:
-if (hw_breakpoint_enabled(dr7, index)) {
-err = cpu_breakpoint_insert(cs, drN, BP_CPU,
-&env->cpu_breakpoint[index]);
-}
-break;
-
-case DR7_TYPE_IO_RW:
-/* Notice when we should enable calls to bpt_io.  */
-return hw_breakpoint_enabled(env->dr[7], index)
-   ? HF_IOBPT_MASK : 0;
-
-case DR7_TYPE_DATA_WR:
-if (hw_breakpoint_enabled(dr7, index)) {
-err = cpu_watchpoint_insert(cs, drN,
-hw_breakpoint_len(dr7, index),
-BP_CPU | BP_MEM_WRITE,
-&env->cpu_watchpoint[index]);
-}
-break;
-
-case DR7_TYPE_DATA_RW:
-if (hw_breakpoint_enabled(dr7, index)) {
-err = cpu_watchpoint_insert(cs, drN,
-hw_breakpoint_len(dr7, index),
-BP_CPU | BP_MEM_ACCESS,
-&env->cpu_watchpoint[index]);
-}
-break;
-}
-if (err) {
-env->cpu_breakpoint[index] = NULL;
-}
-return 0;
-}
-
-static void hw_breakpoint_remove(CPUX86State *env, int index)
-{
-CPUState *cs = env_cpu(env);
-
-switch (hw_breakpoint_type(env->dr[7], index)) {
-case DR7_TYPE_BP_INST:
-if (env->cpu_breakpoint[index]) {
-cpu_breakpoint_remove_by_ref(cs, env->cpu_breakpoint[index]);
-env->cpu_breakpoint[index] = NULL;
-}
-break;
-
-

[RFC v8 05/28] hw/arm/smmuv3: Properly propagate S1 asid invalidation

2021-02-25 Thread Eric Auger
At the moment the ASID invalidation command (CMD_TLBI_NH_ASID) is
propagated as a domain invalidation, i.e. all ASIDs get invalidated,
failing to restrict the invalidation to the targeted ASID.

Fix that by populating the new fields introduced earlier in the
IOTLBEntry struct, namely setting the granularity to PASID and setting
the arch_id to the invalidated ASID.
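
For reference, a notifier consumer (e.g. on the VFIO side) can then tell
this event apart from a regular page-selective unmap along these lines
(a minimal sketch using only the fields set above; the actual VFIO
handling is added later in this series):

static void example_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
{
    if ((iotlb->flags & IOMMU_INV_FLAGS_ARCHID) &&
        iotlb->granularity == IOMMU_INV_GRAN_PASID) {
        /* ASID-selective invalidation: only iotlb->arch_id is relevant */
    } else {
        /* page-selective invalidation: use iotlb->iova and addr_mask */
    }
}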

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 42 --
 hw/arm/trace-events |  1 +
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index d037d6df5b..8dffb1bcc3 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -835,6 +835,29 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 memory_region_notify_iommu_one(n, &event);
 }
 
+/**
+ * smmuv3_notify_asid - call the notifier @n for a given asid
+ *
+ * @mr: IOMMU mr region handle
+ * @n: notifier to be called
+ * @asid: address space ID or negative value if we don't care
+ */
+static void smmuv3_notify_asid(IOMMUMemoryRegion *mr,
+   IOMMUNotifier *n, int asid)
+{
+IOMMUTLBEvent event = {};
+
+event.type = IOMMU_NOTIFIER_UNMAP;
+event.entry.target_as = &address_space_memory;
+event.entry.perm = IOMMU_NONE;
+event.entry.granularity = IOMMU_INV_GRAN_PASID;
+event.entry.flags = IOMMU_INV_FLAGS_ARCHID;
+event.entry.arch_id = asid;
+
+memory_region_notify_iommu_one(n, &event);
+}
+
+
 /* invalidate an asid/iova range tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova,
   uint8_t tg, uint64_t num_pages)
@@ -910,6 +933,22 @@ smmuv3_invalidate_ste(gpointer key, gpointer value, 
gpointer user_data)
 return true;
 }
 
+static void smmuv3_s1_asid_inval(SMMUState *s, uint16_t asid)
+{
+SMMUDevice *sdev;
+
+trace_smmuv3_s1_asid_inval(asid);
+QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
+IOMMUMemoryRegion *mr = &sdev->iommu;
+IOMMUNotifier *n;
+
+IOMMU_NOTIFIER_FOREACH(n, mr) {
+smmuv3_notify_asid(mr, n, asid);
+}
+}
+smmu_iotlb_inv_asid(s, asid);
+}
+
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -1020,8 +1059,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 uint16_t asid = CMD_ASID(&cmd);
 
 trace_smmuv3_cmdq_tlbi_nh_asid(asid);
-smmu_inv_notifiers_all(&s->smmu_state);
-smmu_iotlb_inv_asid(bs, asid);
+smmuv3_s1_asid_inval(bs, asid);
 break;
 }
 case SMMU_CMD_TLBI_NH_ALL:
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index b79a91af5f..8e530ba79d 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -46,6 +46,7 @@ smmuv3_cmdq_cfgi_cd(uint32_t sid) "sid=0x%x"
 smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t 
perc) "Config cache HIT for sid=0x%x (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, 
uint32_t perc) "Config cache MISS for sid=0x%x (hits=%d, misses=%d, hit 
rate=%d)"
 smmuv3_s1_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, uint64_t 
num_pages, uint8_t ttl, bool leaf) "vmid=%d asid=%d addr=0x%"PRIx64" tg=%d 
num_pages=0x%"PRIx64" ttl=%d leaf=%d"
+smmuv3_s1_asid_inval(int asid) "asid=%d"
 smmuv3_cmdq_tlbi_nh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
 smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
-- 
2.26.2




[PATCH v23 08/17] i386: split smm helper (sysemu)

2021-02-25 Thread Claudio Fontana
SMM is only really useful for sysemu; split it into two modules
around CONFIG_USER_ONLY, in order to remove the ifdef
and use the build system instead.

add cpu_abort() when detecting attempts to enter SMM mode via
SMI interrupt in user-mode, and assert that the cpu is not
in SMM mode while translating RSM instructions.

Signed-off-by: Claudio Fontana 
Cc: Paolo Bonzini 
Reviewed-by: Richard Henderson 
---
 target/i386/helper.h  |  4 
 target/i386/tcg/seg_helper.c  |  4 
 target/i386/tcg/{ => sysemu}/smm_helper.c | 19 ++-
 target/i386/tcg/translate.c   |  5 +
 target/i386/tcg/meson.build   |  1 -
 target/i386/tcg/sysemu/meson.build|  1 +
 6 files changed, 16 insertions(+), 18 deletions(-)
 rename target/i386/tcg/{ => sysemu}/smm_helper.c (98%)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index c2ae2f7e61..8ffda4cdc6 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -70,7 +70,11 @@ DEF_HELPER_1(clac, void, env)
 DEF_HELPER_1(stac, void, env)
 DEF_HELPER_3(boundw, void, env, tl, int)
 DEF_HELPER_3(boundl, void, env, tl, int)
+
+#ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(rsm, void, env)
+#endif /* !CONFIG_USER_ONLY */
+
 DEF_HELPER_2(into, void, env, int)
 DEF_HELPER_2(cmpxchg8b_unlocked, void, env, tl)
 DEF_HELPER_2(cmpxchg8b, void, env, tl)
diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index 180d47f0e9..d04fbdd7cd 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -1351,7 +1351,11 @@ bool x86_cpu_exec_interrupt(CPUState *cs, int 
interrupt_request)
 case CPU_INTERRUPT_SMI:
 cpu_svm_check_intercept_param(env, SVM_EXIT_SMI, 0, 0);
 cs->interrupt_request &= ~CPU_INTERRUPT_SMI;
+#ifdef CONFIG_USER_ONLY
+cpu_abort(CPU(cpu), "SMI interrupt: cannot enter SMM in user-mode");
+#else
 do_smm_enter(cpu);
+#endif /* CONFIG_USER_ONLY */
 break;
 case CPU_INTERRUPT_NMI:
 cpu_svm_check_intercept_param(env, SVM_EXIT_NMI, 0, 0);
diff --git a/target/i386/tcg/smm_helper.c b/target/i386/tcg/sysemu/smm_helper.c
similarity index 98%
rename from target/i386/tcg/smm_helper.c
rename to target/i386/tcg/sysemu/smm_helper.c
index 62d027abd3..a45b5651c3 100644
--- a/target/i386/tcg/smm_helper.c
+++ b/target/i386/tcg/sysemu/smm_helper.c
@@ -1,5 +1,5 @@
 /*
- *  x86 SMM helpers
+ *  x86 SMM helpers (sysemu-only)
  *
  *  Copyright (c) 2003 Fabrice Bellard
  *
@@ -18,27 +18,14 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/log.h"
-#include "helper-tcg.h"
+#include "tcg/helper-tcg.h"
 
 
 /* SMM support */
 
-#if defined(CONFIG_USER_ONLY)
-
-void do_smm_enter(X86CPU *cpu)
-{
-}
-
-void helper_rsm(CPUX86State *env)
-{
-}
-
-#else
-
 #ifdef TARGET_X86_64
 #define SMM_REVISION_ID 0x00020064
 #else
@@ -330,5 +317,3 @@ void helper_rsm(CPUX86State *env)
 qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
 log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
 }
-
-#endif /* !CONFIG_USER_ONLY */
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index af1faf9342..b882041ef0 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -8319,9 +8319,14 @@ static target_ulong disas_insn(DisasContext *s, CPUState 
*cpu)
 gen_svm_check_intercept(s, pc_start, SVM_EXIT_RSM);
 if (!(s->flags & HF_SMM_MASK))
 goto illegal_op;
+#ifdef CONFIG_USER_ONLY
+/* we should not be in SMM mode */
+g_assert_not_reached();
+#else
 gen_update_cc_op(s);
 gen_jmp_im(s, s->pc - s->cs_base);
 gen_helper_rsm(cpu_env);
+#endif /* CONFIG_USER_ONLY */
 gen_eob(s);
 break;
 case 0x1b8: /* SSE4.2 popcnt */
diff --git a/target/i386/tcg/meson.build b/target/i386/tcg/meson.build
index 320bcd1e46..449d9719ef 100644
--- a/target/i386/tcg/meson.build
+++ b/target/i386/tcg/meson.build
@@ -8,7 +8,6 @@ i386_ss.add(when: 'CONFIG_TCG', if_true: files(
   'misc_helper.c',
   'mpx_helper.c',
   'seg_helper.c',
-  'smm_helper.c',
   'svm_helper.c',
   'tcg-cpu.c',
   'translate.c'), if_false: files('tcg-stub.c'))
diff --git a/target/i386/tcg/sysemu/meson.build 
b/target/i386/tcg/sysemu/meson.build
index 4ab30cc32e..35ba16dc3d 100644
--- a/target/i386/tcg/sysemu/meson.build
+++ b/target/i386/tcg/sysemu/meson.build
@@ -1,3 +1,4 @@
 i386_softmmu_ss.add(when: ['CONFIG_TCG', 'CONFIG_SOFTMMU'], if_true: files(
   'tcg-cpu.c',
+  'smm_helper.c',
 ))
-- 
2.26.2




[PATCH v23 11/17] i386: split misc helper user stubs and sysemu part

2021-02-25 Thread Claudio Fontana
Signed-off-by: Claudio Fontana 
---
 target/i386/tcg/misc_helper.c| 463 ---
 target/i386/tcg/sysemu/misc_helper.c | 438 +
 target/i386/tcg/user/misc_stubs.c|  75 +
 target/i386/tcg/sysemu/meson.build   |   1 +
 target/i386/tcg/user/meson.build |   1 +
 5 files changed, 515 insertions(+), 463 deletions(-)
 create mode 100644 target/i386/tcg/sysemu/misc_helper.c
 create mode 100644 target/i386/tcg/user/misc_stubs.c

diff --git a/target/i386/tcg/misc_helper.c b/target/i386/tcg/misc_helper.c
index f02e4fd400..82fb7037ac 100644
--- a/target/i386/tcg/misc_helper.c
+++ b/target/i386/tcg/misc_helper.c
@@ -18,12 +18,9 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
-#include "exec/cpu_ldst.h"
-#include "exec/address-spaces.h"
 #include "helper-tcg.h"
 
 /*
@@ -39,69 +36,6 @@ void cpu_load_eflags(CPUX86State *env, int eflags, int 
update_mask)
 (eflags & update_mask) | 0x2;
 }
 
-void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
-{
-#ifdef CONFIG_USER_ONLY
-fprintf(stderr, "outb: port=0x%04x, data=%02x\n", port, data);
-#else
-address_space_stb(&address_space_io, port, data,
-  cpu_get_mem_attrs(env), NULL);
-#endif
-}
-
-target_ulong helper_inb(CPUX86State *env, uint32_t port)
-{
-#ifdef CONFIG_USER_ONLY
-fprintf(stderr, "inb: port=0x%04x\n", port);
-return 0;
-#else
-return address_space_ldub(&address_space_io, port,
-  cpu_get_mem_attrs(env), NULL);
-#endif
-}
-
-void helper_outw(CPUX86State *env, uint32_t port, uint32_t data)
-{
-#ifdef CONFIG_USER_ONLY
-fprintf(stderr, "outw: port=0x%04x, data=%04x\n", port, data);
-#else
-address_space_stw(&address_space_io, port, data,
-  cpu_get_mem_attrs(env), NULL);
-#endif
-}
-
-target_ulong helper_inw(CPUX86State *env, uint32_t port)
-{
-#ifdef CONFIG_USER_ONLY
-fprintf(stderr, "inw: port=0x%04x\n", port);
-return 0;
-#else
-return address_space_lduw(&address_space_io, port,
-  cpu_get_mem_attrs(env), NULL);
-#endif
-}
-
-void helper_outl(CPUX86State *env, uint32_t port, uint32_t data)
-{
-#ifdef CONFIG_USER_ONLY
-fprintf(stderr, "outl: port=0x%04x, data=%08x\n", port, data);
-#else
-address_space_stl(&address_space_io, port, data,
-  cpu_get_mem_attrs(env), NULL);
-#endif
-}
-
-target_ulong helper_inl(CPUX86State *env, uint32_t port)
-{
-#ifdef CONFIG_USER_ONLY
-fprintf(stderr, "inl: port=0x%04x\n", port);
-return 0;
-#else
-return address_space_ldl(&address_space_io, port,
- cpu_get_mem_attrs(env), NULL);
-#endif
-}
-
 void helper_into(CPUX86State *env, int next_eip_addend)
 {
 int eflags;
@@ -126,64 +60,6 @@ void helper_cpuid(CPUX86State *env)
 env->regs[R_EDX] = edx;
 }
 
-#if defined(CONFIG_USER_ONLY)
-target_ulong helper_read_crN(CPUX86State *env, int reg)
-{
-return 0;
-}
-
-void helper_write_crN(CPUX86State *env, int reg, target_ulong t0)
-{
-}
-#else
-target_ulong helper_read_crN(CPUX86State *env, int reg)
-{
-target_ulong val;
-
-cpu_svm_check_intercept_param(env, SVM_EXIT_READ_CR0 + reg, 0, GETPC());
-switch (reg) {
-default:
-val = env->cr[reg];
-break;
-case 8:
-if (!(env->hflags2 & HF2_VINTR_MASK)) {
-val = cpu_get_apic_tpr(env_archcpu(env)->apic_state);
-} else {
-val = env->v_tpr;
-}
-break;
-}
-return val;
-}
-
-void helper_write_crN(CPUX86State *env, int reg, target_ulong t0)
-{
-cpu_svm_check_intercept_param(env, SVM_EXIT_WRITE_CR0 + reg, 0, GETPC());
-switch (reg) {
-case 0:
-cpu_x86_update_cr0(env, t0);
-break;
-case 3:
-cpu_x86_update_cr3(env, t0);
-break;
-case 4:
-cpu_x86_update_cr4(env, t0);
-break;
-case 8:
-if (!(env->hflags2 & HF2_VINTR_MASK)) {
-qemu_mutex_lock_iothread();
-cpu_set_apic_tpr(env_archcpu(env)->apic_state, t0);
-qemu_mutex_unlock_iothread();
-}
-env->v_tpr = t0 & 0x0f;
-break;
-default:
-env->cr[reg] = t0;
-break;
-}
-}
-#endif
-
 void helper_lmsw(CPUX86State *env, target_ulong t0)
 {
 /* only 4 lower bits of CR0 are modified. PE cannot be set to zero
@@ -232,345 +108,6 @@ void helper_rdpmc(CPUX86State *env)
 raise_exception_err(env, EXCP06_ILLOP, 0);
 }
 
-#if defined(CONFIG_USER_ONLY)
-void helper_wrmsr(CPUX86State *env)
-{
-}
-
-void helper_rdmsr(CPUX86State *env)
-{
-}
-#else
-void helper_wrmsr(CPUX86State *env)
-{
-uint64_t val;
-CPUState *cs = env_cpu(env);
-
-cpu_svm_check_intercept_param(env, SVM_EXIT_MSR, 1, GETPC());
-
-val = ((uint32_t)env->regs[R_EAX]) |
-((uint64_t)((uint32_t)env->regs[R_EDX]) << 32);
-
-switch ((uint32_t)env->r

[PATCH v23 12/17] i386: separate fpu_helper into user and sysemu parts

2021-02-25 Thread Claudio Fontana
Signed-off-by: Claudio Fontana 
Reviewed-by: Alex Bennée 

[claudio: removed unused return value]
Signed-off-by: Claudio Fontana 
---
 target/i386/cpu.h   |  3 ++
 target/i386/tcg/fpu_helper.c| 65 +
 target/i386/tcg/sysemu/fpu_helper.c | 57 +
 target/i386/tcg/user/fpu_helper.c   | 58 +
 target/i386/tcg/sysemu/meson.build  |  1 +
 target/i386/tcg/user/meson.build|  1 +
 6 files changed, 122 insertions(+), 63 deletions(-)
 create mode 100644 target/i386/tcg/sysemu/fpu_helper.c
 create mode 100644 target/i386/tcg/user/fpu_helper.c

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c8a84a9033..3797789dc2 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1816,7 +1816,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env);
 int cpu_get_pic_interrupt(CPUX86State *s);
 /* MSDOS compatibility mode FPU exception support */
 void x86_register_ferr_irq(qemu_irq irq);
+void fpu_check_raise_ferr_irq(CPUX86State *s);
 void cpu_set_ignne(void);
+void cpu_clear_ignne(void);
+
 /* mpx_helper.c */
 void cpu_sync_bndcs_hflags(CPUX86State *env);
 
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 60ed93520a..ade18aa13c 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -21,17 +21,10 @@
 #include 
 #include "cpu.h"
 #include "exec/helper-proto.h"
-#include "qemu/host-utils.h"
-#include "exec/exec-all.h"
-#include "exec/cpu_ldst.h"
 #include "fpu/softfloat.h"
 #include "fpu/softfloat-macros.h"
 #include "helper-tcg.h"
 
-#ifdef CONFIG_SOFTMMU
-#include "hw/irq.h"
-#endif
-
 /* float macros */
 #define FT0(env->ft0)
 #define ST0(env->fpregs[env->fpstt].d)
@@ -75,36 +68,6 @@
 #define floatx80_ln2_d make_floatx80(0x3ffe, 0xb17217f7d1cf79abLL)
 #define floatx80_pi_d make_floatx80(0x4000, 0xc90fdaa22168c234LL)
 
-#if !defined(CONFIG_USER_ONLY)
-static qemu_irq ferr_irq;
-
-void x86_register_ferr_irq(qemu_irq irq)
-{
-ferr_irq = irq;
-}
-
-static void cpu_clear_ignne(void)
-{
-CPUX86State *env = &X86_CPU(first_cpu)->env;
-env->hflags2 &= ~HF2_IGNNE_MASK;
-}
-
-void cpu_set_ignne(void)
-{
-CPUX86State *env = &X86_CPU(first_cpu)->env;
-env->hflags2 |= HF2_IGNNE_MASK;
-/*
- * We get here in response to a write to port F0h.  The chipset should
- * deassert FP_IRQ and FERR# instead should stay signaled until FPSW_SE is
- * cleared, because FERR# and FP_IRQ are two separate pins on real
- * hardware.  However, we don't model FERR# as a qemu_irq, so we just
- * do directly what the chipset would do, i.e. deassert FP_IRQ.
- */
-qemu_irq_lower(ferr_irq);
-}
-#endif
-
-
 static inline void fpush(CPUX86State *env)
 {
 env->fpstt = (env->fpstt - 1) & 7;
@@ -203,8 +166,8 @@ static void fpu_raise_exception(CPUX86State *env, uintptr_t 
retaddr)
 raise_exception_ra(env, EXCP10_COPR, retaddr);
 }
 #if !defined(CONFIG_USER_ONLY)
-else if (ferr_irq && !(env->hflags2 & HF2_IGNNE_MASK)) {
-qemu_irq_raise(ferr_irq);
+else {
+fpu_check_raise_ferr_irq(env);
 }
 #endif
 }
@@ -2501,18 +2464,6 @@ void helper_frstor(CPUX86State *env, target_ulong ptr, 
int data32)
 }
 }
 
-#if defined(CONFIG_USER_ONLY)
-void cpu_x86_fsave(CPUX86State *env, target_ulong ptr, int data32)
-{
-helper_fsave(env, ptr, data32);
-}
-
-void cpu_x86_frstor(CPUX86State *env, target_ulong ptr, int data32)
-{
-helper_frstor(env, ptr, data32);
-}
-#endif
-
 #define XO(X)  offsetof(X86XSaveArea, X)
 
 static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra)
@@ -2780,18 +2731,6 @@ void helper_fxrstor(CPUX86State *env, target_ulong ptr)
 }
 }
 
-#if defined(CONFIG_USER_ONLY)
-void cpu_x86_fxsave(CPUX86State *env, target_ulong ptr)
-{
-helper_fxsave(env, ptr);
-}
-
-void cpu_x86_fxrstor(CPUX86State *env, target_ulong ptr)
-{
-helper_fxrstor(env, ptr);
-}
-#endif
-
 void helper_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm)
 {
 uintptr_t ra = GETPC();
diff --git a/target/i386/tcg/sysemu/fpu_helper.c 
b/target/i386/tcg/sysemu/fpu_helper.c
new file mode 100644
index 00..1c3610da3b
--- /dev/null
+++ b/target/i386/tcg/sysemu/fpu_helper.c
@@ -0,0 +1,57 @@
+/*
+ *  x86 FPU, MMX/3DNow!/SSE/SSE2/SSE3/SSSE3/SSE4/PNI helpers (sysemu code)
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Gener

[RFC v8 01/28] hw/vfio/common: trace vfio_connect_container operations

2021-02-25 Thread Eric Auger
We currently trace vfio_disconnect_container() but we do not trace
the container <-> group creation, which can be useful to understand
the VFIO topology.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 3 +++
 hw/vfio/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a50b10c801..fcf2c5049f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1709,6 +1709,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 QLIST_FOREACH(container, &space->containers, next) {
 if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
 group->container = container;
+trace_vfio_connect_existing_container(group->groupid,
+  container->fd);
 QLIST_INSERT_HEAD(&container->group_list, group, container_next);
 vfio_kvm_device_add_group(group);
 return 0;
@@ -1742,6 +1744,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 if (ret) {
 goto free_container_exit;
 }
+trace_vfio_connect_new_container(group->groupid, container->fd);
 
 switch (container->iommu_type) {
 case VFIO_TYPE1v2_IOMMU:
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index c0e75f24b7..c17ad82aa4 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -104,6 +104,8 @@ vfio_listener_region_add_no_dma_map(const char *name, 
uint64_t iova, uint64_t si
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING 
region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" 
- 0x%"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"
+vfio_connect_existing_container(int groupid, int container_fd) "group=%d 
existing container fd=%d"
+vfio_connect_new_container(int groupid, int container_fd) "group=%d new 
container fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int 
num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
-- 
2.26.2




[RFC v8 09/28] iommu: Introduce generic header

2021-02-25 Thread Eric Auger
This header is meant to expose data types used by
several IOMMU devices, such as the structs needed for SVA and
nested stage configuration.

Signed-off-by: Eric Auger 
---
 include/hw/iommu/iommu.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 include/hw/iommu/iommu.h

diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
new file mode 100644
index 00..12092bda7b
--- /dev/null
+++ b/include/hw/iommu/iommu.h
@@ -0,0 +1,28 @@
+/*
+ * common header for iommu devices
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HW_IOMMU_IOMMU_H
+#define QEMU_HW_IOMMU_IOMMU_H
+#ifdef __linux__
+#include 
+#endif
+
+typedef struct IOMMUConfig {
+union {
+#ifdef __linux__
+struct iommu_pasid_table_config pasid_cfg;
+#endif
+  };
+} IOMMUConfig;
+
+
+#endif /* QEMU_HW_IOMMU_IOMMU_H */
-- 
2.26.2




[PATCH v23 09/17] i386: split tcg excp_helper into sysemu and user parts

2021-02-25 Thread Claudio Fontana
Signed-off-by: Claudio Fontana 
Reviewed-by: Richard Henderson 
---
 target/i386/tcg/excp_helper.c| 572 --
 target/i386/tcg/sysemu/excp_helper.c | 581 +++
 target/i386/tcg/user/excp_helper.c   |  39 ++
 target/i386/tcg/sysemu/meson.build   |   1 +
 target/i386/tcg/user/meson.build |   1 +
 5 files changed, 622 insertions(+), 572 deletions(-)
 create mode 100644 target/i386/tcg/sysemu/excp_helper.c
 create mode 100644 target/i386/tcg/user/excp_helper.c

diff --git a/target/i386/tcg/excp_helper.c b/target/i386/tcg/excp_helper.c
index b7d6259e4a..0183f3932e 100644
--- a/target/i386/tcg/excp_helper.c
+++ b/target/i386/tcg/excp_helper.c
@@ -137,575 +137,3 @@ void raise_exception_ra(CPUX86State *env, int 
exception_index, uintptr_t retaddr
 {
 raise_interrupt2(env, exception_index, 0, 0, 0, retaddr);
 }
-
-#if !defined(CONFIG_USER_ONLY)
-static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
-int *prot)
-{
-CPUX86State *env = &X86_CPU(cs)->env;
-uint64_t rsvd_mask = PG_HI_RSVD_MASK;
-uint64_t ptep, pte;
-uint64_t exit_info_1 = 0;
-target_ulong pde_addr, pte_addr;
-uint32_t page_offset;
-int page_size;
-
-if (likely(!(env->hflags2 & HF2_NPT_MASK))) {
-return gphys;
-}
-
-if (!(env->nested_pg_mode & SVM_NPT_NXE)) {
-rsvd_mask |= PG_NX_MASK;
-}
-
-if (env->nested_pg_mode & SVM_NPT_PAE) {
-uint64_t pde, pdpe;
-target_ulong pdpe_addr;
-
-#ifdef TARGET_X86_64
-if (env->nested_pg_mode & SVM_NPT_LMA) {
-uint64_t pml5e;
-uint64_t pml4e_addr, pml4e;
-
-pml5e = env->nested_cr3;
-ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
-
-pml4e_addr = (pml5e & PG_ADDRESS_MASK) +
-(((gphys >> 39) & 0x1ff) << 3);
-pml4e = x86_ldq_phys(cs, pml4e_addr);
-if (!(pml4e & PG_PRESENT_MASK)) {
-goto do_fault;
-}
-if (pml4e & (rsvd_mask | PG_PSE_MASK)) {
-goto do_fault_rsvd;
-}
-if (!(pml4e & PG_ACCESSED_MASK)) {
-pml4e |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pml4e_addr, pml4e);
-}
-ptep &= pml4e ^ PG_NX_MASK;
-pdpe_addr = (pml4e & PG_ADDRESS_MASK) +
-(((gphys >> 30) & 0x1ff) << 3);
-pdpe = x86_ldq_phys(cs, pdpe_addr);
-if (!(pdpe & PG_PRESENT_MASK)) {
-goto do_fault;
-}
-if (pdpe & rsvd_mask) {
-goto do_fault_rsvd;
-}
-ptep &= pdpe ^ PG_NX_MASK;
-if (!(pdpe & PG_ACCESSED_MASK)) {
-pdpe |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pdpe_addr, pdpe);
-}
-if (pdpe & PG_PSE_MASK) {
-/* 1 GB page */
-page_size = 1024 * 1024 * 1024;
-pte_addr = pdpe_addr;
-pte = pdpe;
-goto do_check_protect;
-}
-} else
-#endif
-{
-pdpe_addr = (env->nested_cr3 & ~0x1f) + ((gphys >> 27) & 0x18);
-pdpe = x86_ldq_phys(cs, pdpe_addr);
-if (!(pdpe & PG_PRESENT_MASK)) {
-goto do_fault;
-}
-rsvd_mask |= PG_HI_USER_MASK;
-if (pdpe & (rsvd_mask | PG_NX_MASK)) {
-goto do_fault_rsvd;
-}
-ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
-}
-
-pde_addr = (pdpe & PG_ADDRESS_MASK) + (((gphys >> 21) & 0x1ff) << 3);
-pde = x86_ldq_phys(cs, pde_addr);
-if (!(pde & PG_PRESENT_MASK)) {
-goto do_fault;
-}
-if (pde & rsvd_mask) {
-goto do_fault_rsvd;
-}
-ptep &= pde ^ PG_NX_MASK;
-if (pde & PG_PSE_MASK) {
-/* 2 MB page */
-page_size = 2048 * 1024;
-pte_addr = pde_addr;
-pte = pde;
-goto do_check_protect;
-}
-/* 4 KB page */
-if (!(pde & PG_ACCESSED_MASK)) {
-pde |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pde_addr, pde);
-}
-pte_addr = (pde & PG_ADDRESS_MASK) + (((gphys >> 12) & 0x1ff) << 3);
-pte = x86_ldq_phys(cs, pte_addr);
-if (!(pte & PG_PRESENT_MASK)) {
-goto do_fault;
-}
-if (pte & rsvd_mask) {
-goto do_fault_rsvd;
-}
-/* combine pde and pte nx, user and rw protections */
-ptep &= pte ^ PG_NX_MASK;
-page_size = 4096;
-} else {
-uint32_t pde;
-
-/* page directory entry */
-pde_addr = (env->nested_cr3 & ~0xfff) + ((gphys >> 20) & 0xffc);
-pde = x86_ldl_phys(cs, pde_addr);
-if (!(pde & PG_PRESENT_MASK)) {
-goto do_fault;
-}
-ptep = pde |

[PATCH v23 14/17] i386: split seg_helper into user-only and sysemu parts

2021-02-25 Thread Claudio Fontana
Signed-off-by: Claudio Fontana 
Reviewed-by: Richard Henderson 
---
 target/i386/tcg/helper-tcg.h|   5 +
 target/i386/tcg/seg_helper.h|  66 
 target/i386/tcg/seg_helper.c| 233 +---
 target/i386/tcg/sysemu/seg_helper.c | 125 +++
 target/i386/tcg/user/seg_helper.c   | 109 +
 target/i386/tcg/sysemu/meson.build  |   1 +
 target/i386/tcg/user/meson.build|   1 +
 7 files changed, 311 insertions(+), 229 deletions(-)
 create mode 100644 target/i386/tcg/seg_helper.h
 create mode 100644 target/i386/tcg/sysemu/seg_helper.c
 create mode 100644 target/i386/tcg/user/seg_helper.c

diff --git a/target/i386/tcg/helper-tcg.h b/target/i386/tcg/helper-tcg.h
index b420b3356d..30eacdbbc9 100644
--- a/target/i386/tcg/helper-tcg.h
+++ b/target/i386/tcg/helper-tcg.h
@@ -88,6 +88,11 @@ void do_vmexit(CPUX86State *env, uint32_t exit_code, 
uint64_t exit_info_1);
 
 /* seg_helper.c */
 void do_interrupt_x86_hardirq(CPUX86State *env, int intno, int is_hw);
+void do_interrupt_all(X86CPU *cpu, int intno, int is_int,
+  int error_code, target_ulong next_eip, int is_hw);
+void handle_even_inj(CPUX86State *env, int intno, int is_int,
+ int error_code, int is_hw, int rm);
+int exception_has_error_code(int intno);
 
 /* smm_helper.c */
 void do_smm_enter(X86CPU *cpu);
diff --git a/target/i386/tcg/seg_helper.h b/target/i386/tcg/seg_helper.h
new file mode 100644
index 00..ebf1035277
--- /dev/null
+++ b/target/i386/tcg/seg_helper.h
@@ -0,0 +1,66 @@
+/*
+ *  x86 segmentation related helpers macros
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef SEG_HELPER_H
+#define SEG_HELPER_H
+
+//#define DEBUG_PCALL
+
+#ifdef DEBUG_PCALL
+# define LOG_PCALL(...) qemu_log_mask(CPU_LOG_PCALL, ## __VA_ARGS__)
+# define LOG_PCALL_STATE(cpu)  \
+log_cpu_state_mask(CPU_LOG_PCALL, (cpu), CPU_DUMP_CCOP)
+#else
+# define LOG_PCALL(...) do { } while (0)
+# define LOG_PCALL_STATE(cpu) do { } while (0)
+#endif
+
+/*
+ * TODO: Convert callers to compute cpu_mmu_index_kernel once
+ * and use *_mmuidx_ra directly.
+ */
+#define cpu_ldub_kernel_ra(e, p, r) \
+cpu_ldub_mmuidx_ra(e, p, cpu_mmu_index_kernel(e), r)
+#define cpu_lduw_kernel_ra(e, p, r) \
+cpu_lduw_mmuidx_ra(e, p, cpu_mmu_index_kernel(e), r)
+#define cpu_ldl_kernel_ra(e, p, r) \
+cpu_ldl_mmuidx_ra(e, p, cpu_mmu_index_kernel(e), r)
+#define cpu_ldq_kernel_ra(e, p, r) \
+cpu_ldq_mmuidx_ra(e, p, cpu_mmu_index_kernel(e), r)
+
+#define cpu_stb_kernel_ra(e, p, v, r) \
+cpu_stb_mmuidx_ra(e, p, v, cpu_mmu_index_kernel(e), r)
+#define cpu_stw_kernel_ra(e, p, v, r) \
+cpu_stw_mmuidx_ra(e, p, v, cpu_mmu_index_kernel(e), r)
+#define cpu_stl_kernel_ra(e, p, v, r) \
+cpu_stl_mmuidx_ra(e, p, v, cpu_mmu_index_kernel(e), r)
+#define cpu_stq_kernel_ra(e, p, v, r) \
+cpu_stq_mmuidx_ra(e, p, v, cpu_mmu_index_kernel(e), r)
+
+#define cpu_ldub_kernel(e, p)cpu_ldub_kernel_ra(e, p, 0)
+#define cpu_lduw_kernel(e, p)cpu_lduw_kernel_ra(e, p, 0)
+#define cpu_ldl_kernel(e, p) cpu_ldl_kernel_ra(e, p, 0)
+#define cpu_ldq_kernel(e, p) cpu_ldq_kernel_ra(e, p, 0)
+
+#define cpu_stb_kernel(e, p, v)  cpu_stb_kernel_ra(e, p, v, 0)
+#define cpu_stw_kernel(e, p, v)  cpu_stw_kernel_ra(e, p, v, 0)
+#define cpu_stl_kernel(e, p, v)  cpu_stl_kernel_ra(e, p, v, 0)
+#define cpu_stq_kernel(e, p, v)  cpu_stq_kernel_ra(e, p, v, 0)
+
+#endif /* SEG_HELPER_H */
diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index d04fbdd7cd..cf3f051524 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -26,49 +26,7 @@
 #include "exec/cpu_ldst.h"
 #include "exec/log.h"
 #include "helper-tcg.h"
-
-//#define DEBUG_PCALL
-
-#ifdef DEBUG_PCALL
-# define LOG_PCALL(...) qemu_log_mask(CPU_LOG_PCALL, ## __VA_ARGS__)
-# define LOG_PCALL_STATE(cpu)  \
-log_cpu_state_mask(CPU_LOG_PCALL, (cpu), CPU_DUMP_CCOP)
-#else
-# define LOG_PCALL(...) do { } while (0)
-# define LOG_PCALL_STATE(cpu) do { } while (0)
-#endif
-
-/*
- * TODO: Convert callers to compute cpu_mmu_index_kernel once
- * and use *_mmuidx_ra directly.
- */
-#define cpu_ldub_kernel_ra(e, p, r) \
-cpu_ldub_mmuidx_ra(e, p, cpu_mmu_index_kernel(e)

[RFC v8 06/28] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute

2021-02-25 Thread Eric Auger
We introduce a new IOMMU Memory Region attribute,
IOMMU_ATTR_VFIO_NESTED, that tells whether the virtual IOMMU
requires HW nested paging for VFIO integration.

The current Intel virtual IOMMU device supports "Caching
Mode" and does not require two stages at the physical level to be
integrated with VFIO. However SMMUv3 does not implement such
a "caching mode" and requires the use of HW nested paging.

As such SMMUv3 is the first IOMMU device to advertise this
attribute.
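
On the consumer side, given an IOMMUMemoryRegion *iommu_mr, VFIO can query
the attribute with memory_region_iommu_get_attr(), roughly as follows
(sketch only; the real check comes with the VFIO patches of this series):

    bool nested = false;

    if (!memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
                                      (void *)&nested) && nested) {
        /* the vIOMMU requires HW nested paging: set up dual-stage mappings */
    }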

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c   | 12 
 include/exec/memory.h |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 8dffb1bcc3..6172a62b8e 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1580,6 +1580,17 @@ static int smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 return 0;
 }
 
+static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
+   enum IOMMUMemoryRegionAttr attr,
+   void *data)
+{
+if (attr == IOMMU_ATTR_VFIO_NESTED) {
+*(bool *) data = true;
+return 0;
+}
+return -EINVAL;
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1587,6 +1598,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
+imrc->get_attr = smmuv3_get_attr;
 }
 
 static const TypeInfo smmuv3_type_info = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index f2c9bd5fcc..04c75f13c2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -273,7 +273,8 @@ typedef struct MemoryRegionClass {
 
 
 enum IOMMUMemoryRegionAttr {
-IOMMU_ATTR_SPAPR_TCE_FD
+IOMMU_ATTR_SPAPR_TCE_FD,
+IOMMU_ATTR_VFIO_NESTED,
 };
 
 /*
-- 
2.26.2




[RFC v8 12/28] vfio: Introduce hostwin_from_range helper

2021-02-25 Thread Eric Auger
Let's introduce a hostwin_from_range() helper that returns the
hostwin encapsulating an IOVA range or NULL if none is found.

This improves the readability of callers and removes the usage
of hostwin_found.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 04e5699ccf..7d3c35a0ed 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -654,6 +654,19 @@ out:
 rcu_read_unlock();
 }
 
+static VFIOHostDMAWindow *
+hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+return hostwin;
+}
+}
+return NULL;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -663,7 +676,6 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 VFIOHostDMAWindow *hostwin;
-bool hostwin_found;
 Error *err = NULL;
 
 if (vfio_listener_skipped_section(section)) {
@@ -748,15 +760,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 #endif
 }
 
-hostwin_found = false;
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-
-if (!hostwin_found) {
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
 error_setg(&err, "Container %p can't map guest IOVA region"
" 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
 goto fail;
@@ -937,16 +942,9 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 
 if (memory_region_is_ram_device(section->mr)) {
 hwaddr pgmask;
-VFIOHostDMAWindow *hostwin;
-bool hostwin_found = false;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-assert(hostwin_found); /* or region_add() would have failed */
+assert(hostwin); /* or region_add() would have failed */
 
 pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.26.2




[PATCH v23 16/17] i386: gdbstub: only write CR0/CR2/CR3/EFER for sysemu

2021-02-25 Thread Claudio Fontana
define some aux functions to avoid repeating the same code
all over.

Signed-off-by: Claudio Fontana 
Cc: Paolo Bonzini 
---
 target/i386/gdbstub.c | 167 --
 1 file changed, 63 insertions(+), 104 deletions(-)

diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
index 41e265fc67..30812fe21f 100644
--- a/target/i386/gdbstub.c
+++ b/target/i386/gdbstub.c
@@ -78,6 +78,25 @@ static const int gpr_map32[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
 #define GDB_FORCE_64 0
 #endif
 
+static int gdb_read_reg_cs64(uint32_t hflags, GByteArray *buf, target_ulong 
val)
+{
+if ((hflags & HF_CS64_MASK) || GDB_FORCE_64) {
+return gdb_get_reg64(buf, val);
+}
+return gdb_get_reg32(buf, val);
+}
+
+static int gdb_write_reg_cs64(uint32_t hflags, uint8_t *buf, target_ulong *val)
+{
+#ifdef TARGET_X86_64
+if (hflags & HF_CS64_MASK) {
+*val = ldq_p(buf);
+return 8;
+}
+#endif
+*val = ldl_p(buf);
+return 4;
+}
 
 int x86_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 {
@@ -142,25 +161,14 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 return gdb_get_reg32(mem_buf, env->segs[R_FS].selector);
 case IDX_SEG_REGS + 5:
 return gdb_get_reg32(mem_buf, env->segs[R_GS].selector);
-
 case IDX_SEG_REGS + 6:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->segs[R_FS].base);
-}
-return gdb_get_reg32(mem_buf, env->segs[R_FS].base);
-
+return gdb_read_reg_cs64(env->hflags, mem_buf, 
env->segs[R_FS].base);
 case IDX_SEG_REGS + 7:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->segs[R_GS].base);
-}
-return gdb_get_reg32(mem_buf, env->segs[R_GS].base);
+return gdb_read_reg_cs64(env->hflags, mem_buf, 
env->segs[R_GS].base);
 
 case IDX_SEG_REGS + 8:
 #ifdef TARGET_X86_64
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->kernelgsbase);
-}
-return gdb_get_reg32(mem_buf, env->kernelgsbase);
+return gdb_read_reg_cs64(env->hflags, mem_buf, env->kernelgsbase);
 #else
 return gdb_get_reg32(mem_buf, 0);
 #endif
@@ -188,45 +196,23 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 return gdb_get_reg32(mem_buf, env->mxcsr);
 
 case IDX_CTL_CR0_REG:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->cr[0]);
-}
-return gdb_get_reg32(mem_buf, env->cr[0]);
-
+return gdb_read_reg_cs64(env->hflags, mem_buf, env->cr[0]);
 case IDX_CTL_CR2_REG:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->cr[2]);
-}
-return gdb_get_reg32(mem_buf, env->cr[2]);
-
+return gdb_read_reg_cs64(env->hflags, mem_buf, env->cr[2]);
 case IDX_CTL_CR3_REG:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->cr[3]);
-}
-return gdb_get_reg32(mem_buf, env->cr[3]);
-
+return gdb_read_reg_cs64(env->hflags, mem_buf, env->cr[3]);
 case IDX_CTL_CR4_REG:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->cr[4]);
-}
-return gdb_get_reg32(mem_buf, env->cr[4]);
-
+return gdb_read_reg_cs64(env->hflags, mem_buf, env->cr[4]);
 case IDX_CTL_CR8_REG:
-#ifdef CONFIG_SOFTMMU
+#ifndef CONFIG_USER_ONLY
 tpr = cpu_get_apic_tpr(cpu->apic_state);
 #else
 tpr = 0;
 #endif
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, tpr);
-}
-return gdb_get_reg32(mem_buf, tpr);
+return gdb_read_reg_cs64(env->hflags, mem_buf, tpr);
 
 case IDX_CTL_EFER_REG:
-if ((env->hflags & HF_CS64_MASK) || GDB_FORCE_64) {
-return gdb_get_reg64(mem_buf, env->efer);
-}
-return gdb_get_reg32(mem_buf, env->efer);
+return gdb_read_reg_cs64(env->hflags, mem_buf, env->efer);
 }
 }
 return 0;
@@ -266,7 +252,8 @@ int x86_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
-uint32_t tmp;
+target_ulong tmp;
+int len;
 
 /* N.B. GDB can't deal with changes in registers or sizes in the middle
of a session. So if we're in 32-bit mode on a 64-bit cpu, still act
@@ -329,30 +316,13 @@ int x86_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 r

[RFC v8 15/28] vfio: Pass stage 1 MSI bindings to the host

2021-02-25 Thread Eric Auger
We register the stage1 MSI bindings when enabling the vectors
and unregister them when MSI is disabled.
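
For context, the binding helper below ends up being called from the MSI-X
vector enable path in hw/vfio/pci.c, roughly like this (hypothetical
call-site sketch; the function name comes from the existing code):

    /* inside vfio_msix_vector_do_use(), once the vector is set up */
    ret = vfio_register_msi_binding(vdev, nr, true);
    if (ret) {
        error_report("%s: failed to register the stage1 MSI binding for "
                     "vector %d", vdev->vbasedev.name, nr);
    }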

Signed-off-by: Eric Auger 

---

v7 -> v8:
- add unregistration on msix_disable
- remove vfio_container_unbind_msis()

v4 -> v5:
- use VFIO_IOMMU_SET_MSI_BINDING

v2 -> v3:
- only register the notifier if the IOMMU translates MSIs
- record the msi bindings in a container list and unregister on
  container release
---
 hw/vfio/common.c  | 59 +++
 hw/vfio/pci.c | 76 ++-
 hw/vfio/trace-events  |  2 +
 include/hw/vfio/vfio-common.h | 12 ++
 4 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9bd40f5299..8a64ba414b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -667,6 +667,65 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 }
 }
 
+int vfio_iommu_set_msi_binding(VFIOContainer *container, int n,
+   IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_set_msi_binding ustruct;
+VFIOMSIBinding *binding;
+int ret;
+
+QLIST_FOREACH(binding, &container->msibinding_list, next) {
+if (binding->index == n) {
+return 0;
+}
+}
+
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+ustruct.iova = iotlb->iova;
+ustruct.flags = VFIO_IOMMU_BIND_MSI;
+ustruct.gpa = iotlb->translated_addr;
+ustruct.size = iotlb->addr_mask + 1;
+ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+if (ret) {
+error_report("%s: failed to register the stage1 MSI binding (%m)",
+ __func__);
+return ret;
+}
+binding =  g_new0(VFIOMSIBinding, 1);
+binding->iova = ustruct.iova;
+binding->gpa = ustruct.gpa;
+binding->size = ustruct.size;
+binding->index = n;
+
+QLIST_INSERT_HEAD(&container->msibinding_list, binding, next);
+return 0;
+}
+
+int vfio_iommu_unset_msi_binding(VFIOContainer *container, int n)
+{
+struct vfio_iommu_type1_set_msi_binding ustruct;
+VFIOMSIBinding *binding, *tmp;
+int ret;
+
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+QLIST_FOREACH_SAFE(binding, &container->msibinding_list, next, tmp) {
+if (binding->index != n) {
+continue;
+}
+ustruct.flags = VFIO_IOMMU_UNBIND_MSI;
+ustruct.iova = binding->iova;
+ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+if (ret) {
+error_report("Failed to unregister the stage1 MSI binding for 
iova=0x%"PRIx64" (%m)",
+ binding->iova);
+}
+QLIST_REMOVE(binding, next);
+g_free(binding);
+return ret;
+}
+return 0;
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b28e58db34..573c74b466 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -366,6 +366,65 @@ static void vfio_msi_interrupt(void *opaque)
 notify(&vdev->pdev, nr);
 }
 
+static bool vfio_iommu_require_msi_binding(IOMMUMemoryRegion *iommu_mr)
+{
+bool msi_translate = false, nested = false;
+
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
+ (void *)&msi_translate);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+if (!nested || !msi_translate) {
+return false;
+}
+   return true;
+}
+
+static int vfio_register_msi_binding(VFIOPCIDevice *vdev,
+ int vector_n, bool set)
+{
+VFIOContainer *container = vdev->vbasedev.group->container;
+PCIDevice *dev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(dev);
+IOMMUMemoryRegionClass *imrc;
+IOMMUMemoryRegion *iommu_mr;
+IOMMUTLBEntry entry;
+MSIMessage msg;
+
+if (as == &address_space_memory) {
+return 0;
+}
+
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+if (!vfio_iommu_require_msi_binding(iommu_mr)) {
+return 0;
+}
+
+/* MSI doorbell address is translated by an IOMMU */
+
+if (!set) { /* unregister */
+trace_vfio_unregister_msi_binding(vdev->vbasedev.name, vector_n);
+
+return vfio_iommu_unset_msi_binding(container, vector_n);
+}
+
+msg = pci_get_msi_message(dev, vector_n);
+imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
+
+rcu_read_lock();
+entry = imrc->translate(iommu_mr, msg.address, IOMMU_WO, 0);
+rcu_read_unlock();
+
+if (entry.perm == IOMMU_NONE) {
+return -ENOENT;
+}
+
+trace_vfio_register_msi_binding(vdev->vbasedev.name, vector_n,
+msg.address, entry.translated_addr);
+
+return vfio_iommu_set_msi_

[RFC v8 04/28] memory: Add new fields in IOTLBEntry

2021-02-25 Thread Eric Auger
The current IOTLBEntry is too simple to interact with
some physical IOMMUs. IOTLBs can be invalidated with different
granularities: domain, pasid, addr. The current IOTLB entry only offers
page-selective invalidation. Let's add a granularity field
that conveys this information.

Also TLB entries are usually tagged with some ids such as the asid
or pasid. When propagating an invalidation command from the
guest to the host, we need to pass those IDs.

Also we add a leaf field which indicates, in case of invalidation
notification, whether only cache entries for the last level of
translation are required to be invalidated.

A flag field is introduced to inform whether those fields are set.

To ensure that existing users do not use those new fields,
initialize the IOMMUTLBEvents to zero where needed.
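
As an illustration, a vIOMMU propagating an ASID-selective invalidation
now fills an event as below (this mirrors what the SMMUv3 ASID
invalidation patch of this series does):

    IOMMUTLBEvent event = {};

    event.type = IOMMU_NOTIFIER_UNMAP;
    event.entry.target_as = &address_space_memory;
    event.entry.perm = IOMMU_NONE;
    event.entry.granularity = IOMMU_INV_GRAN_PASID;
    event.entry.flags = IOMMU_INV_FLAGS_ARCHID;
    event.entry.arch_id = asid;

    memory_region_notify_iommu_one(n, &event);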

Signed-off-by: Eric Auger 

---

v7 -> v8:
- add pasid, granularity and flags
---
 hw/arm/smmu-common.c |  2 +-
 hw/arm/smmuv3.c  |  2 +-
 hw/i386/intel_iommu.c|  6 +++---
 hw/ppc/spapr_iommu.c |  2 +-
 hw/virtio/virtio-iommu.c |  4 ++--
 include/exec/memory.h| 36 +++-
 6 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 84d2c62c26..0ba3dca3b8 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -471,7 +471,7 @@ IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
 /* Unmap the whole notifier's range */
 static void smmu_unmap_notifier_range(IOMMUNotifier *n)
 {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 
 event.type = IOMMU_NOTIFIER_UNMAP;
 event.entry.target_as = &address_space_memory;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 3b87324ce2..d037d6df5b 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -801,7 +801,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
uint8_t tg, uint64_t num_pages)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint8_t granule;
 
 if (!tg) {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6be8f32918..1c5b43f902 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1195,7 +1195,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
 uint32_t offset;
 uint64_t slpte;
 uint64_t subpage_size, subpage_mask;
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint64_t iova = start;
 uint64_t iova_next;
 int ret = 0;
@@ -2427,7 +2427,7 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
   VTDInvDesc *inv_desc)
 {
 VTDAddressSpace *vtd_dev_as;
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 struct VTDBus *vtd_bus;
 hwaddr addr;
 uint64_t sz;
@@ -3483,7 +3483,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 size = remain = end - start + 1;
 
 while (remain >= VTD_PAGE_SIZE) {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
 uint64_t size = mask + 1;
 
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index f2d460db5f..dda5ba52bf 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -450,7 +450,7 @@ static void spapr_tce_reset(DeviceState *dev)
 static target_ulong put_tce_emu(SpaprTceTable *tcet, target_ulong ioba,
 target_ulong tce)
 {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 hwaddr page_mask = IOMMU_PAGE_MASK(tcet->page_shift);
 unsigned long index = (ioba - tcet->bus_offset) >> tcet->page_shift;
 
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1b23e8e18c..83ed2b82e6 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -129,7 +129,7 @@ static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start,
 hwaddr virt_end, hwaddr paddr,
 uint32_t flags)
 {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 IOMMUAccessFlags perm = IOMMU_ACCESS_FLAG(flags & VIRTIO_IOMMU_MAP_F_READ,
                                               flags & VIRTIO_IOMMU_MAP_F_WRITE);
 
@@ -154,7 +154,7 @@ static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start,
 static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start,
   hwaddr virt_end)
 {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint64_t delta = virt_end - virt_start;
 
 if (!(mr->iommu_notify_flags & IOMMU_NOTIFIER_UNMAP)) {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c6fb714e49..f2c9bd5fcc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -75,14 +75,48 @@ typedef enum {
 IOMMU_RW   = 3,
 } IOMMUAccessFlags;
 
+/* Granularity of the cache i

[RFC v8 16/28] vfio: Helper to get IRQ info including capabilities

2021-02-25 Thread Eric Auger
As done for VFIO regions, add helpers to retrieve IRQ info,
including their optional capabilities.
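
As a usage sketch (not part of the patch), a caller could look up the
nested DMA fault IRQ with these helpers. The wrapper name below is
invented; the VFIO_IRQ_TYPE_NESTED/VFIO_IRQ_SUBTYPE_DMA_FAULT constants
come from later patches in this series.

    #include "qemu/osdep.h"
    #include <linux/vfio.h>
    #include "hw/vfio/vfio-common.h"

    /* Hypothetical caller: return the index of the nested DMA fault IRQ, if any. */
    static int find_dma_fault_irq_index(VFIODevice *vbasedev)
    {
        struct vfio_irq_info *info = NULL;
        int index;

        if (vfio_get_dev_irq_info(vbasedev, VFIO_IRQ_TYPE_NESTED,
                                  VFIO_IRQ_SUBTYPE_DMA_FAULT, &info)) {
            return -ENODEV;                 /* no matching extended IRQ */
        }
        index = info->index;
        g_free(info);                       /* the helper allocates, caller frees */
        return index;
    }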

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c  | 97 +++
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  7 +++
 3 files changed, 105 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8a64ba414b..9c8533c9c5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1419,6 +1419,25 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
 return true;
 }
 
+struct vfio_info_cap_header *
+vfio_get_irq_info_cap(struct vfio_irq_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IRQ_INFO_FLAG_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
   struct vfio_region_info *info)
 {
@@ -2352,6 +2371,33 @@ retry:
 return 0;
 }
 
+int vfio_get_irq_info(VFIODevice *vbasedev, int index,
+  struct vfio_irq_info **info)
+{
+size_t argsz = sizeof(struct vfio_irq_info);
+
+*info = g_malloc0(argsz);
+
+(*info)->index = index;
+retry:
+(*info)->argsz = argsz;
+
+if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if ((*info)->argsz > argsz) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+
+goto retry;
+}
+
+return 0;
+}
+
 int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
  uint32_t subtype, struct vfio_region_info **info)
 {
@@ -2387,6 +2433,42 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
 return -ENODEV;
 }
 
+int vfio_get_dev_irq_info(VFIODevice *vbasedev, uint32_t type,
+  uint32_t subtype, struct vfio_irq_info **info)
+{
+int i;
+
+for (i = 0; i < vbasedev->num_irqs; i++) {
+struct vfio_info_cap_header *hdr;
+struct vfio_irq_info_cap_type *cap_type;
+
+if (vfio_get_irq_info(vbasedev, i, info)) {
+continue;
+}
+
+hdr = vfio_get_irq_info_cap(*info, VFIO_IRQ_INFO_CAP_TYPE);
+if (!hdr) {
+g_free(*info);
+continue;
+}
+
+cap_type = container_of(hdr, struct vfio_irq_info_cap_type, header);
+
+trace_vfio_get_dev_irq(vbasedev->name, i,
+   cap_type->type, cap_type->subtype);
+
+if (cap_type->type == type && cap_type->subtype == subtype) {
+return 0;
+}
+
+g_free(*info);
+}
+
+*info = NULL;
+return -ENODEV;
+}
+
+
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 {
 struct vfio_region_info *info = NULL;
@@ -2402,6 +2484,21 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 return ret;
 }
 
+bool vfio_has_irq_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
+{
+struct vfio_region_info *info = NULL;
+bool ret = false;
+
+if (!vfio_get_region_info(vbasedev, region, &info)) {
+if (vfio_get_region_info_cap(info, cap_type)) {
+ret = true;
+}
+g_free(info);
+}
+
+return ret;
+}
+
 /*
  * Interfaces for IBM EEH (Enhanced Error Handling)
  */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8e2a297a4c..815a71e099 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -119,6 +119,7 @@ vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) "Re
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
+vfio_get_dev_irq(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
 vfio_iommu_addr_inv_iotlb(int asid, uint64_t addr, uint64_t size, uint64_t nb_granules, bool leaf) "nested IOTLB invalidate asid=%d, addr=0x%"PRIx64" granule_size=0x%"PRIx64" nb_granules=0x%"PRIx64" leaf=%d"
 vfio_iommu_asid_inv_iotlb(int asid) "nested IOTLB invalidate asid=%d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f30133b2a3..fcbda2d071 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -232,6 +232,13 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
  unsigned int *avail);
 struct vfio_info_cap_header *
 vfio_get_device_in

[PATCH v23 15/17] i386: split off sysemu part of cpu.c

2021-02-25 Thread Claudio Fontana
Signed-off-by: Claudio Fontana 
Reviewed-by: Richard Henderson 
---
 target/i386/cpu-internal.h |  70 +++
 target/i386/cpu-sysemu.c   | 352 +
 target/i386/cpu.c  | 385 +
 target/i386/meson.build|   1 +
 4 files changed, 429 insertions(+), 379 deletions(-)
 create mode 100644 target/i386/cpu-internal.h
 create mode 100644 target/i386/cpu-sysemu.c

diff --git a/target/i386/cpu-internal.h b/target/i386/cpu-internal.h
new file mode 100644
index 00..9baac5c0b4
--- /dev/null
+++ b/target/i386/cpu-internal.h
@@ -0,0 +1,70 @@
+/*
+ * i386 CPU internal definitions to be shared between cpu.c and cpu-sysemu.c
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef I386_CPU_INTERNAL_H
+#define I386_CPU_INTERNAL_H
+
+typedef enum FeatureWordType {
+   CPUID_FEATURE_WORD,
+   MSR_FEATURE_WORD,
+} FeatureWordType;
+
+typedef struct FeatureWordInfo {
+FeatureWordType type;
+/* feature flags names are taken from "Intel Processor Identification and
+ * the CPUID Instruction" and AMD's "CPUID Specification".
+ * In cases of disagreement between feature naming conventions,
+ * aliases may be added.
+ */
+const char *feat_names[64];
+union {
+/* If type==CPUID_FEATURE_WORD */
+struct {
+uint32_t eax;   /* Input EAX for CPUID */
+bool needs_ecx; /* CPUID instruction uses ECX as input */
+uint32_t ecx;   /* Input ECX value for CPUID */
+int reg;/* output register (R_* constant) */
+} cpuid;
+/* If type==MSR_FEATURE_WORD */
+struct {
+uint32_t index;
+} msr;
+};
+uint64_t tcg_features; /* Feature flags supported by TCG */
+uint64_t unmigratable_flags; /* Feature flags known to be unmigratable */
+uint64_t migratable_flags; /* Feature flags known to be migratable */
+/* Features that shouldn't be auto-enabled by "-cpu host" */
+uint64_t no_autoenable_flags;
+} FeatureWordInfo;
+
+extern FeatureWordInfo feature_word_info[];
+
+void x86_cpu_expand_features(X86CPU *cpu, Error **errp);
+
+#ifndef CONFIG_USER_ONLY
+GuestPanicInformation *x86_cpu_get_crash_info(CPUState *cs);
+void x86_cpu_get_crash_info_qom(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp);
+
+void x86_cpu_apic_create(X86CPU *cpu, Error **errp);
+void x86_cpu_apic_realize(X86CPU *cpu, Error **errp);
+void x86_cpu_machine_reset_cb(void *opaque);
+#endif /* !CONFIG_USER_ONLY */
+
+#endif /* I386_CPU_INTERNAL_H */
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
new file mode 100644
index 00..6477584313
--- /dev/null
+++ b/target/i386/cpu-sysemu.c
@@ -0,0 +1,352 @@
+/*
+ *  i386 CPUID, CPU class, definitions, models: sysemu-only code
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "sysemu/xen.h"
+#include "sysemu/whpx.h"
+#include "kvm/kvm_i386.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-run-state.h"
+#include "qapi/qmp/qdict.h"
+#include "qom/qom-qobject.h"
+#include "qapi/qapi-commands-machine-target.h"
+#include "hw/qdev-properties.h"
+
+#include "exec/address-spaces.h"
+#include "hw/i386/apic_internal.h"
+
+#include "cpu-internal.h"
+
+/* Return a QDict containing keys for all properties that can be included
+ * in static expansion of CPU models. All properties set by 
x86_cpu_load_model()
+ * must be included in the dictionary.
+ */
+static QDict *x86_cpu_static_props(void)
+{
+FeatureWord w;
+int i;
+static 

[PATCH v23 17/17] i386: make cpu_load_efer sysemu-only

2021-02-25 Thread Claudio Fontana
cpu_load_efer is now used only for sysemu code.

Therefore, move this function's implementation to the
sysemu-only section of helper.c.

Signed-off-by: Claudio Fontana 
---
 target/i386/cpu.h| 20 +---
 target/i386/helper.c | 13 +
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3797789dc2..a1268abe9f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1957,6 +1957,11 @@ static inline AddressSpace *cpu_addressspace(CPUState *cs, MemTxAttrs attrs)
 return cpu_get_address_space(cs, cpu_asidx_from_attrs(cs, attrs));
 }
 
+/*
+ * load efer and update the corresponding hflags. XXX: do consistency
+ * checks with cpuid bits?
+ */
+void cpu_load_efer(CPUX86State *env, uint64_t val);
 uint8_t x86_ldub_phys(CPUState *cs, hwaddr addr);
 uint32_t x86_lduw_phys(CPUState *cs, hwaddr addr);
 uint32_t x86_ldl_phys(CPUState *cs, hwaddr addr);
@@ -2053,21 +2058,6 @@ static inline uint32_t cpu_compute_eflags(CPUX86State *env)
 return eflags;
 }
 
-
-/* load efer and update the corresponding hflags. XXX: do consistency
-   checks with cpuid bits? */
-static inline void cpu_load_efer(CPUX86State *env, uint64_t val)
-{
-env->efer = val;
-env->hflags &= ~(HF_LMA_MASK | HF_SVME_MASK);
-if (env->efer & MSR_EFER_LMA) {
-env->hflags |= HF_LMA_MASK;
-}
-if (env->efer & MSR_EFER_SVME) {
-env->hflags |= HF_SVME_MASK;
-}
-}
-
 static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env)
 {
 return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 });
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 618ad1c409..7304721a94 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -574,6 +574,19 @@ void do_cpu_sipi(X86CPU *cpu)
 #endif
 
 #ifndef CONFIG_USER_ONLY
+
+void cpu_load_efer(CPUX86State *env, uint64_t val)
+{
+env->efer = val;
+env->hflags &= ~(HF_LMA_MASK | HF_SVME_MASK);
+if (env->efer & MSR_EFER_LMA) {
+env->hflags |= HF_LMA_MASK;
+}
+if (env->efer & MSR_EFER_SVME) {
+env->hflags |= HF_SVME_MASK;
+}
+}
+
 uint8_t x86_ldub_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
-- 
2.26.2




[RFC v8 20/28] hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute

2021-02-25 Thread Eric Auger
The SMMUv3 has the peculiarity of translating MSI
transactions. Let's advertise the corresponding
attribute.
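
As a sketch (not part of the patch), a consumer queries the attribute
the same way the VFIO code does earlier in this series; only the
wrapper name below is invented.

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    /* Hypothetical helper: does this vIOMMU translate MSI transactions? */
    static bool iommu_mr_translates_msi(IOMMUMemoryRegion *iommu_mr)
    {
        bool msi_translate = false;

        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
                                     (void *)&msi_translate);
        return msi_translate;
    }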

Signed-off-by: Eric Auger 

---
---
 hw/arm/smmuv3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 6172a62b8e..a998e237f0 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1587,6 +1587,9 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 if (attr == IOMMU_ATTR_VFIO_NESTED) {
 *(bool *) data = true;
 return 0;
+} else if (attr == IOMMU_ATTR_MSI_TRANSLATE) {
+*(bool *) data = true;
+return 0;
 }
 return -EINVAL;
 }
-- 
2.26.2




[RFC v8 08/28] memory: Introduce IOMMU Memory Region inject_faults API

2021-02-25 Thread Eric Auger
This new API allows the caller to inject @count iommu_faults into
the IOMMU memory region.
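
As a usage sketch (not part of the patch), a caller forwarding a single
fault record could look as follows. The wrapper name is invented,
struct iommu_fault comes from the imported Linux uapi header, and
-ENOENT is the documented return value when the vIOMMU does not
implement the callback.

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include <linux/iommu.h>                /* struct iommu_fault (imported uapi) */

    /* Hypothetical caller: forward one physical fault record to the vIOMMU. */
    static int forward_one_fault(IOMMUMemoryRegion *iommu_mr,
                                 struct iommu_fault *fault)
    {
        int ret = memory_region_inject_faults(iommu_mr, 1, fault);

        if (ret == -ENOENT) {
            /* the vIOMMU does not provide an inject_faults() implementation */
        }
        return ret;
    }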

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 24 
 softmmu/memory.c  | 10 ++
 2 files changed, 34 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ad6c807262..0c4389c383 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -65,6 +65,8 @@ struct ReservedRegion {
 unsigned type;
 };
 
+struct iommu_fault;
+
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
 /* See address_space_translate: bit 0 is read, bit 1 is write.  */
@@ -475,6 +477,19 @@ struct IOMMUMemoryRegionClass {
  int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
  uint64_t page_size_mask,
  Error **errp);
+
+/*
+ * Inject @count faults into the IOMMU memory region
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_inject_faults() will return -ENOENT
+ *
+ * @iommu: the IOMMU memory region to inject the faults in
+ * @count: number of faults to inject
+ * @buf: fault buffer
+ */
+int (*inject_faults)(IOMMUMemoryRegion *iommu, int count,
+ struct iommu_fault *buf);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1496,6 +1511,15 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
 int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
uint64_t page_size_mask,
Error **errp);
+/**
+ * memory_region_inject_faults : inject @count faults stored in @buf
+ *
+ * @iommu_mr: the IOMMU memory region
+ * @count: number of faults to be injected
+ * @buf: buffer containing the faults
+ */
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf);
 
 /**
  * memory_region_name: get a memory region's name
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 874a8fccde..c3f2c562f7 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2029,6 +2029,16 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
 return imrc->num_indexes(iommu_mr);
 }
 
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+if (!imrc->inject_faults) {
+return -ENOENT;
+}
+return imrc->inject_faults(iommu_mr, count, buf);
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
2.26.2




[RFC v8 19/28] vfio/pci: Implement the DMA fault handler

2021-02-25 Thread Eric Auger
Whenever the eventfd is triggered, we retrieve the DMA fault(s)
from the mmapped fault region and inject them into the IOMMU
memory region.
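
The fault region starts with a header exposing head, nb_entries and
entry_size, followed by the fault queue itself; the handler consumes
entries from its private tail index up to head and then publishes the
new tail. The lines below are condensed from the handler in this patch
(queue, header and vdev as in the diff), for illustration only.

    /* Ring consumption, as done in vfio_dma_fault_notifier_handler(). */
    while (vdev->fault_tail_index != header.head) {
        memory_region_inject_faults(iommu_mr, 1,
                                    &queue[vdev->fault_tail_index]);
        vdev->fault_tail_index =
            (vdev->fault_tail_index + 1) % header.nb_entries;
    }
    /* publish the consumed tail index back to the kernel */
    pwrite(vdev->vbasedev.fd, &vdev->fault_tail_index, 4,
           vdev->dma_fault_region.fd_offset);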

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 50 ++
 hw/vfio/pci.h |  1 +
 2 files changed, 51 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9fbbf88673..cb46288512 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2915,10 +2915,60 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 static void vfio_dma_fault_notifier_handler(void *opaque)
 {
 VFIOPCIExtIRQ *ext_irq = opaque;
+VFIOPCIDevice *vdev = ext_irq->vdev;
+PCIDevice *pdev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(pdev);
+IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(as->root);
+struct vfio_region_dma_fault header;
+struct iommu_fault *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
 
 if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
 return;
 }
+
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->dma_fault_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_fault *)vdev->dma_fault_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mmapped: slower fault handling",
+ vdev->vbasedev.name);
+
+queue_buffer = g_malloc(queue_size);
+bytes =  pread(vdev->vbasedev.fd, queue_buffer, queue_size,
+   vdev->dma_fault_region.fd_offset + header.offset);
+if (bytes != queue_size) {
+error_report("%s unable to read the fault queue (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+queue = (struct iommu_fault *)queue_buffer;
+}
+
+while (vdev->fault_tail_index != header.head) {
+memory_region_inject_faults(iommu_mr, 1,
+&queue[vdev->fault_tail_index]);
+vdev->fault_tail_index =
+(vdev->fault_tail_index + 1) % header.nb_entries;
+}
+bytes = pwrite(vdev->vbasedev.fd, &vdev->fault_tail_index, 4,
+   vdev->dma_fault_region.fd_offset);
+if (bytes != 4) {
+error_report("%s unable to write the fault region tail index (0x%lx)",
+ __func__, bytes);
+}
+g_free(queue_buffer);
 }
 
 static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index d6cc689f5e..350e9e9005 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -146,6 +146,7 @@ struct VFIOPCIDevice {
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
 VFIORegion dma_fault_region;
+uint32_t fault_tail_index;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.26.2




Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-25 Thread Nicholas Piggin
Excerpts from Greg Kurz's message of February 23, 2021 1:01 am:
> On Mon, 22 Feb 2021 22:48:51 +1100
> Alexey Kardashevskiy  wrote:
> 
>> Ping?
>> 
>> I need community support here :) I am hearing that having this mode 
>> helps heaps with development in fully emulated environments as this 
>> skips SLOF entirely, for example. Another rumour I am hearing is that 
>> there is interest in running grub in the userspace which this VOF thing 
>> makes handy too.
>> 
> 
> I had tried a previous version of this : skipping SLOF is very
> beneficial to do guest work, even when running on KVM.

I agree. I do run KVM in simulators which works quite well, but
SLOF ends up being the slowest part, it's also a black box to me
so it's difficult to go about debugging or changing things.

I can't be of much more help with the code I'm sorry, except to
add my +1 for the feature and agree with Greg that the non-vof
changes seem small.

Thanks,
Nick



[RFC v8 13/28] vfio: Introduce helpers to DMA map/unmap a RAM section

2021-02-25 Thread Eric Auger
Let's introduce two helpers that allow DMA mapping/unmapping of a
RAM section. Those helpers will be called for nested stage setup
from another call site. This also makes the structure of
vfio_listener_region_add/del() clearer.
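
As an illustration (not part of the patch), a typical call site maps a
RAM section and reports any error. The wrapper name is invented; the
helper signature is the one introduced below.

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "exec/memory.h"
    #include "hw/vfio/vfio-common.h"

    /* Hypothetical call site: map one RAM section into the container. */
    static void map_ram_section_example(VFIOContainer *container,
                                        MemoryRegionSection *section)
    {
        Error *err = NULL;

        if (!memory_region_is_ram(section->mr)) {
            return;                     /* the helper asserts on non-RAM sections */
        }
        if (vfio_dma_map_ram_section(container, section, &err)) {
            error_report_err(err);      /* err is only set on a hard failure */
        }
    }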

Signed-off-by: Eric Auger 

---

v5 -> v6:
- add Error **
---
 hw/vfio/common.c | 199 +--
 hw/vfio/trace-events |   4 +-
 2 files changed, 119 insertions(+), 84 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7d3c35a0ed..e02fb2a3ef 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -667,13 +667,126 @@ hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
 return NULL;
 }
 
+static int vfio_dma_map_ram_section(VFIOContainer *container,
+MemoryRegionSection *section, Error **err)
+{
+VFIOHostDMAWindow *hostwin;
+Int128 llend, llsize;
+hwaddr iova, end;
+void *vaddr;
+int ret;
+
+assert(memory_region_is_ram(section->mr));
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+end = int128_get64(int128_sub(llend, int128_one()));
+
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
+error_setg(err, "Container %p can't map guest IOVA region"
+   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+return -EFAULT;
+}
+
+trace_vfio_dma_map_ram(iova, end, vaddr);
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
+return 0;
+}
+}
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
+if (ret) {
+error_setg(err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+   "0x%"HWADDR_PRIx", %p) = %d (%m)",
+   container, iova, int128_get64(llsize), vaddr, ret);
+if (memory_region_is_ram_device(section->mr)) {
+/* Allow unexpected mappings not to be fatal for RAM devices */
+error_report_err(*err);
+return 0;
+}
+return ret;
+}
+return 0;
+}
+
+static void vfio_dma_unmap_ram_section(VFIOContainer *container,
+   MemoryRegionSection *section)
+{
+Int128 llend, llsize;
+hwaddr iova, end;
+bool try_unmap = true;
+int ret;
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+if (int128_ge(int128_make64(iova), llend)) {
+return;
+}
+end = int128_get64(int128_sub(llend, int128_one()));
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+trace_vfio_dma_unmap_ram(iova, end);
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
+
+assert(hostwin); /* or region_add() would have failed */
+
+pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+}
+
+if (try_unmap) {
+if (int128_eq(llsize, int128_2_64())) {
+/* The unmap ioctl doesn't accept a full 64-bit span. */
+llsize = int128_rshift(llsize, 1);
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+iova += int128_get64(llsize);
+}
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
 VFIOContainer *cont

[RFC v8 21/28] hw/arm/smmuv3: Store the PASID table GPA in the translation config

2021-02-25 Thread Eric Auger
For VFIO integration we will need to pass the Context Descriptor (CD)
table GPA to the host. The CD table is also referred to as the PASID
table. Its GPA corresponds to the S1ContextPtr field of the Stream
Table Entry. So let's decode and store it in the configuration
structure.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c  | 1 +
 include/hw/arm/smmu-common.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index a998e237f0..ab0e1c5818 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -358,6 +358,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   "SMMUv3 S1 stalling fault model not allowed yet\n");
 goto bad_ste;
 }
+cfg->s1ctxptr = STE_CTXPTR(ste);
 return 0;
 
 bad_ste:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 706be3c6d0..d578339935 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -76,6 +76,7 @@ typedef struct SMMUTransCfg {
 uint8_t tbi;   /* Top Byte Ignore */
 uint16_t asid;
 SMMUTransTableInfo tt[2];
+dma_addr_t s1ctxptr;
 uint32_t iotlb_hits;   /* counts IOTLB hits for this asid */
 uint32_t iotlb_misses; /* counts IOTLB misses for this asid */
 } SMMUTransCfg;
-- 
2.26.2




[RFC v8 28/28] vfio/pci: Implement return_page_response page response callback

2021-02-25 Thread Eric Auger
This patch implements the page response path. The response is
written into the page response ring buffer and then the header's
head index is updated. This path is not used by this series. It
is introduced here as a POC for vSVA/ARM integration.

Signed-off-by: Eric Auger 

---

v11 -> v12:
- use VFIO_REGION_INFO_CAP_DMA_FAULT_RESPONSE [Shameer]
- fix hot del regression reported and fixed by Shameer
---
 hw/vfio/pci.c | 123 ++
 hw/vfio/pci.h |   2 +
 2 files changed, 125 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index cb46288512..437780e76b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2655,6 +2655,61 @@ out:
 g_free(fault_region_info);
 }
 
+static void vfio_init_fault_response_regions(VFIOPCIDevice *vdev, Error **errp)
+{
+struct vfio_region_info *fault_region_info = NULL;
+struct vfio_region_info_cap_fault *cap_fault;
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_info_cap_header *hdr;
+char *fault_region_name;
+int ret;
+
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT_RESPONSE,
+   &fault_region_info);
+if (ret) {
+goto out;
+}
+
+hdr = vfio_get_region_info_cap(fault_region_info,
+   VFIO_REGION_INFO_CAP_DMA_FAULT_RESPONSE);
+if (!hdr) {
+error_setg(errp, "failed to retrieve DMA FAULT RESPONSE capability");
+goto out;
+}
+cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
+ header);
+if (cap_fault->version != 1) {
+error_setg(errp, "Unsupported DMA FAULT RESPONSE API version %d",
+   cap_fault->version);
+goto out;
+}
+
+fault_region_name = g_strdup_printf("%s DMA FAULT RESPONSE %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->dma_fault_response_region,
+fault_region_info->index,
+fault_region_name);
+g_free(fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to set up the DMA FAULT RESPONSE region %d",
+ fault_region_info->index);
+goto out;
+}
+
+ret = vfio_region_mmap(&vdev->dma_fault_response_region);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT RESPONSE queue");
+}
+out:
+g_free(fault_region_info);
+}
+
 static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
@@ -2730,6 +2785,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 return;
 }
 
+vfio_init_fault_response_regions(vdev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
@@ -2908,8 +2969,68 @@ static int vfio_iommu_set_pasid_table(PCIBus *bus, int32_t devfn,
 return ioctl(container->fd, VFIO_IOMMU_SET_PASID_TABLE, &info);
 }
 
+static int vfio_iommu_return_page_response(PCIBus *bus, int32_t devfn,
+   IOMMUPageResponse *resp)
+{
+PCIDevice *pdev = bus->devices[devfn];
+VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+struct iommu_page_response *response = &resp->resp;
+struct vfio_region_dma_fault_response header;
+struct iommu_page_response *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
+
+if (!vdev->dma_fault_response_region.mem) {
+return -EINVAL;
+}
+
+/* read the header */
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->dma_fault_response_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return -1;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_page_response *)vdev->dma_fault_response_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mmapped: slower fault handling",
+ vdev->vbasedev.name);
+
+queue_buffer = g_malloc(queue_size);
+bytes = pread(vdev->vbasedev.fd, queue_buffer, queue_size,
+  vdev->dma_fault_response_region.fd_offset + header.offset);
+if (bytes != queue_size) {
+error_report("%s unable to read the fault queue (0x%lx)",
+ __func__, bytes);
+   

[RFC v8 02/28] update-linux-headers: Import iommu.h

2021-02-25 Thread Eric Auger
Update the script to import the new iommu.h uapi header.

Signed-off-by: Eric Auger 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index fa6f2b6272..f588678837 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -142,7 +142,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h iommu.h \
   psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.26.2




[RFC v8 17/28] vfio/pci: Register handler for iommu fault

2021-02-25 Thread Eric Auger
We use the new extended IRQ VFIO_IRQ_TYPE_NESTED type and
VFIO_IRQ_SUBTYPE_DMA_FAULT subtype to set/unset
a notifier for physical DMA faults. The associated eventfd is
triggered, in nested mode, whenever a fault is detected at the
physical IOMMU level.

The actual handler will be implemented in subsequent patches.
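
As a sketch (not part of the patch), registration happens at realize
time with the call added below in vfio_realize(); only the wrapper
function and the comment in its error path are invented. This assumes
the hw/vfio/pci.c context of this patch.

    /* Hypothetical wrapper: wire up the nested DMA fault eventfd at realize time. */
    static void register_dma_fault_irq_example(VFIOPCIDevice *vdev)
    {
        int ret;

        ret = vfio_register_ext_irq_handler(vdev, VFIO_IRQ_TYPE_NESTED,
                                            VFIO_IRQ_SUBTYPE_DMA_FAULT,
                                            vfio_dma_fault_notifier_handler);
        if (ret) {
            /* no nested DMA fault IRQ: physical faults won't reach the vIOMMU */
        }
    }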

Signed-off-by: Eric Auger 

---

v4 -> v5:
- index_to_str now returns the index name, ie. DMA_FAULT
- use the extended IRQ

v3 -> v4:
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
---
 hw/vfio/pci.c | 81 ++-
 hw/vfio/pci.h |  7 +
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 573c74b466..dfce777556 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2850,6 +2850,76 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 .set_pasid_table = vfio_iommu_set_pasid_table,
 };
 
+static void vfio_dma_fault_notifier_handler(void *opaque)
+{
+VFIOPCIExtIRQ *ext_irq = opaque;
+
+if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
+return;
+}
+}
+
+static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
+ uint32_t type, uint32_t subtype,
+ IOHandler *handler)
+{
+int32_t fd, ext_irq_index, index;
+struct vfio_irq_info *irq_info;
+Error *err = NULL;
+EventNotifier *n;
+int ret;
+
+ret = vfio_get_dev_irq_info(&vdev->vbasedev, type, subtype, &irq_info);
+if (ret) {
+return ret;
+}
+index = irq_info->index;
+ext_irq_index = irq_info->index - VFIO_PCI_NUM_IRQS;
+g_free(irq_info);
+
+vdev->ext_irqs[ext_irq_index].vdev = vdev;
+vdev->ext_irqs[ext_irq_index].index = index;
+n = &vdev->ext_irqs[ext_irq_index].notifier;
+
+ret = event_notifier_init(n, 0);
+if (ret) {
+error_report("vfio: Unable to init event notifier for ext irq %d(%d)",
+ ext_irq_index, ret);
+return ret;
+}
+
+fd = event_notifier_get_fd(n);
+qemu_set_fd_handler(fd, vfio_dma_fault_notifier_handler, NULL,
+&vdev->ext_irqs[ext_irq_index]);
+
+ret = vfio_set_irq_signaling(&vdev->vbasedev, index, 0,
+ VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err);
+if (ret) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+qemu_set_fd_handler(fd, NULL, NULL, vdev);
+event_notifier_cleanup(n);
+}
+return ret;
+}
+
+static void vfio_unregister_ext_irq_notifiers(VFIOPCIDevice *vdev)
+{
+VFIODevice *vbasedev = &vdev->vbasedev;
+Error *err = NULL;
+int i;
+
+for (i = 0; i < vbasedev->num_irqs - VFIO_PCI_NUM_IRQS; i++) {
+if (vfio_set_irq_signaling(vbasedev, i + VFIO_PCI_NUM_IRQS , 0,
+   VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+}
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->ext_irqs[i].notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->ext_irqs[i].notifier);
+}
+g_free(vdev->ext_irqs);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = VFIO_PCI(pdev);
@@ -2860,7 +2930,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 ssize_t len;
 struct stat st;
 int groupid;
-int i, ret;
+int i, ret, nb_ext_irqs;
 bool is_mdev;
 
 if (!vdev->vbasedev.sysfsdev) {
@@ -2948,6 +3018,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
+nb_ext_irqs = vdev->vbasedev.num_irqs - VFIO_PCI_NUM_IRQS;
+if (nb_ext_irqs > 0) {
+vdev->ext_irqs = g_new0(VFIOPCIExtIRQ, nb_ext_irqs);
+}
+
 vfio_populate_device(vdev, &err);
 if (err) {
 error_propagate(errp, err);
@@ -3159,6 +3234,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
+vfio_register_ext_irq_handler(vdev, VFIO_IRQ_TYPE_NESTED,
+  VFIO_IRQ_SUBTYPE_DMA_FAULT,
+  vfio_dma_fault_notifier_handler);
 vfio_setup_resetfn_quirk(vdev);
 
 pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
@@ -3201,6 +3279,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
+vfio_unregister_ext_irq_notifiers(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 if (vdev->irqchip_change_notifier.notify) {
 kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 1574ef983f..c5f06f4ae4 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -114,6 +114,12 @@ typedef struct VFIOMSIXInfo {
 unsigned long *pe

[RFC v8 23/28] hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation

2021-02-25 Thread Eric Auger
Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.
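
As a sketch of the consumer side (an assumption, not shown in this
patch), the VFIO nested-stage notifier can read entry->leaf together
with the ASID when building the host invalidation; the helper name
below is invented, while the IOMMUTLBEntry fields are the ones added
earlier in this series.

    /* Hypothetical consumer (e.g. a VFIO nested-stage UNMAP notifier). */
    static void relay_leaf_hint_example(IOMMUTLBEntry *entry)
    {
        if (entry->flags & IOMMU_INV_FLAGS_ARCHID) {
            /* ASID-scoped invalidation; entry->leaf tells the host whether
             * only last-level (leaf) TLB entries need to be invalidated. */
            bool leaf = entry->leaf;

            (void)leaf;                 /* host invalidation filling omitted */
        }
    }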

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index d5a935004b..24d77175bf 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -799,7 +799,7 @@ epilogue:
 static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
IOMMUNotifier *n,
int asid, dma_addr_t iova,
-   uint8_t tg, uint64_t num_pages)
+   uint8_t tg, uint64_t num_pages, bool leaf)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 IOMMUTLBEvent event = {};
@@ -834,6 +834,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 event.entry.perm = IOMMU_NONE;
 event.entry.flags = IOMMU_INV_FLAGS_ARCHID;
 event.entry.arch_id = asid;
+event.entry.leaf = leaf;
 
 memory_region_notify_iommu_one(n, &event);
 }
@@ -863,7 +864,7 @@ static void smmuv3_notify_asid(IOMMUMemoryRegion *mr,
 
 /* invalidate an asid/iova range tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova,
-  uint8_t tg, uint64_t num_pages)
+  uint8_t tg, uint64_t num_pages, bool leaf)
 {
 SMMUDevice *sdev;
 
@@ -875,7 +876,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova,
 tg, num_pages);
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmuv3_notify_iova(mr, n, asid, iova, tg, num_pages);
+smmuv3_notify_iova(mr, n, asid, iova, tg, num_pages, leaf);
 }
 }
 }
@@ -913,7 +914,7 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
 count = mask + 1;
 
 trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, count, ttl, leaf);
-smmuv3_inv_notifiers_iova(s, asid, addr, tg, count);
+smmuv3_inv_notifiers_iova(s, asid, addr, tg, count, leaf);
 smmu_iotlb_inv_iova(s, asid, addr, tg, count, ttl);
 
 num_pages -= count;
-- 
2.26.2




  1   2   3   4   >