date:20211223

Re: [PATCH] Adding Cédric's repos in MAINTAINERS file.

2021-12-23 Thread Cédric Le Goater


Hello,

On 12/8/21 04:52, lagar...@linux.ibm.com wrote:

From: Leonardo Garcia 

Signed-off-by: Leonardo Garcia 


Here is a description of the branches I have put in place over the years
for aspeed and powernv machines on github:

  - prev stable branch

  dev branch
  -  current staging branch (I should call it -staging)
  -next   frozen staging branch
  -for-upstream   pull request branch (created on demand)

gitlab replicates but for test purposes only.

I haven't formalized ppc yet, but it should be more or less the same.
Thanks for reminding me. I will update this when it is clear.

Thanks,

C.


---
  MAINTAINERS | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..52c6b99763 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -273,6 +273,7 @@ F: hw/ppc/ppc.c
  F: hw/ppc/ppc_booke.c
  F: include/hw/ppc/ppc.h
  F: disas/ppc.c
+T: git https://gitlab.com/legoater/qemu.git
  
  RISC-V TCG CPUs

  M: Palmer Dabbelt 
@@ -390,6 +391,7 @@ R: David Gibson 
  R: Greg Kurz 
  S: Maintained
  F: target/ppc/kvm.c
+T: git https://gitlab.com/legoater/qemu.git
  
  S390 KVM CPUs

  M: Halil Pasic 
@@ -1343,6 +1345,7 @@ F: tests/qtest/libqos/*spapr*
  F: tests/qtest/rtas*
  F: tests/qtest/libqos/rtas*
  F: tests/avocado/ppc_pseries.py
+T: git https://gitlab.com/legoater/qemu.git
  
  PowerNV (Non-Virtualized)

  M: Cédric Le Goater 
@@ -1356,6 +1359,7 @@ F: include/hw/ppc/pnv*
  F: include/hw/pci-host/pnv*
  F: pc-bios/skiboot.lid
  F: tests/qtest/pnv*
+T: git https://gitlab.com/legoater/qemu.git powernv-next
  
  virtex_ml507

  M: Edgar E. Iglesias 
@@ -1399,6 +1403,7 @@ F: hw/ppc/vof*
  F: include/hw/ppc/vof*
  F: pc-bios/vof/*
  F: pc-bios/vof*
+T: git https://gitlab.com/legoater/qemu.git
  
  RISC-V Machines

  ---
@@ -2244,6 +2249,7 @@ S: Supported
  F: hw/*/*xive*
  F: include/hw/*/*xive*
  F: docs/*/*xive*
+T: git https://gitlab.com/legoater/qemu.git
  
  Renesas peripherals

  R: Yoshinori Sato

[PULL v2 4/7] iotests.py: add qemu_tool_popen()

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

Split qemu_tool_popen() from qemu_tool_pipe_and_status() to be used
separately.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Nikita Lapshin 
---
 tests/qemu-iotests/iotests.py | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 83bfedb902..452d047716 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -138,14 +138,22 @@ def unarchive_sample_image(sample, fname):
 shutil.copyfileobj(f_in, f_out)
 
 
+def qemu_tool_popen(args: Sequence[str],
+connect_stderr: bool = True) -> 'subprocess.Popen[str]':
+stderr = subprocess.STDOUT if connect_stderr else None
+# pylint: disable=consider-using-with
+return subprocess.Popen(args,
+stdout=subprocess.PIPE,
+stderr=stderr,
+universal_newlines=True)
+
+
 def qemu_tool_pipe_and_status(tool: str, args: Sequence[str],
   connect_stderr: bool = True) -> Tuple[str, int]:
 """
 Run a tool and return both its output and its exit code
 """
-stderr = subprocess.STDOUT if connect_stderr else None
-with subprocess.Popen(args, stdout=subprocess.PIPE,
-  stderr=stderr, universal_newlines=True) as subp:
+with qemu_tool_popen(args, connect_stderr) as subp:
 output = subp.communicate()[0]
 if subp.returncode < 0:
 cmd = ' '.join(args)
-- 
2.31.1
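
A minimal, standalone sketch of the split this patch makes, for readers who do not have the iotests tree at hand. The names `tool_popen` and `tool_pipe_and_status` are hypothetical stand-ins (they are not the iotests.py functions), and the child command is an arbitrary example:

```python
import subprocess
import sys
from typing import Sequence, Tuple


def tool_popen(args: Sequence[str],
               connect_stderr: bool = True) -> 'subprocess.Popen[str]':
    """Start a tool; stdout is piped, stderr is optionally merged into it."""
    stderr = subprocess.STDOUT if connect_stderr else None
    return subprocess.Popen(args,
                            stdout=subprocess.PIPE,
                            stderr=stderr,
                            universal_newlines=True)


def tool_pipe_and_status(args: Sequence[str],
                         connect_stderr: bool = True) -> Tuple[str, int]:
    """Run a tool to completion and return (output, exit code)."""
    with tool_popen(args, connect_stderr) as subp:
        output = subp.communicate()[0]
        return output, subp.returncode


if __name__ == '__main__':
    # Hypothetical child process: the current Python interpreter.
    print(tool_pipe_and_status([sys.executable, '-c', 'print("hello")']))
```

A caller that needs to interact with the process while it is still running can presumably use the first helper directly and manage the Popen object's lifetime itself, which is the reason for separating the two.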

Re: [PATCH 2/3] scripts/qapi-gen.py: add --add-trace-points option

2021-12-23 Thread Philippe Mathieu-Daudé

On 12/21/21 20:35, Vladimir Sementsov-Ogievskiy wrote:
> Add an option to generate trace points. We should generate both trace
> points and trace-events files for further trace point code generation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/qapi/gen.py  | 13 ++---
>  scripts/qapi/main.py | 10 +++---
>  2 files changed, 17 insertions(+), 6 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH qemu master] hw/misc/aspeed_pwm: fix typo

2021-12-23 Thread Philippe Mathieu-Daudé

On 12/22/21 11:24, Troy Lee wrote:
> Typo found during developing.
> 
> Fixes: 70b3f1a34d3c ("hw/misc: Add basic Aspeed PWM model")
> Signed-off-by: Troy Lee 
> ---
>  hw/misc/aspeed_pwm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/misc/aspeed_pwm.c b/hw/misc/aspeed_pwm.c
> index 8ebab5dcef..dbf9634da3 100644
> --- a/hw/misc/aspeed_pwm.c
> +++ b/hw/misc/aspeed_pwm.c
> @@ -96,7 +96,7 @@ static void aspeed_pwm_class_init(ObjectClass *klass, void 
> *data)
>  
>  dc->realize = aspeed_pwm_realize;
>  dc->reset = aspeed_pwm_reset;
> -dc->desc = "Aspeed PWM Controller",
> +dc->desc = "Aspeed PWM Controller";
>  dc->vmsd = &vmstate_aspeed_pwm;
>  }
>  

No need for another patch, since it doesn't build.
Simply squash it in your commit 70b3f1a34d3c.

Re: [PATCH 02/15] ppc: Mark the 'taihu' machine as deprecated

2021-12-23 Thread Philippe Mathieu-Daudé

On 12/6/21 11:36, Cédric Le Goater wrote:
> From: Thomas Huth 
> 
> The PPC 405 CPU is a system-on-a-chip, so all 405 machines are very similar,
> except for some external periphery. However, the periphery of the 'taihu'
> machine is hardly emulated at all (e.g. neither the LCD nor the USB part had
> been implemented), so there is not much value added by this board. The users
> can use the 'ref405ep' machine to test their PPC405 code instead.
> 
> Signed-off-by: Thomas Huth 
> Reviewed-by: Daniel Henrique Barboza 
> Message-Id: <20211203164904.290954-2-th...@redhat.com>
> Signed-off-by: Cédric Le Goater 
> ---
>  docs/about/deprecated.rst | 9 +
>  hw/ppc/ppc405_boards.c| 1 +
>  2 files changed, 10 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 0/5] hw/qdev: Clarify qdev_connect_gpio_out() documentation

2021-12-23 Thread Philippe Mathieu-Daudé

Hi Peter.

Since you reviewed v1, an Acked-by on v2 would be welcome.
Otherwise, if you don't object, I plan to queue this via the
machine-next tree.

Thanks,

Phil.

On 12/18/21 14:04, Philippe Mathieu-Daudé wrote:
> Trivial patches clarifying qdev_connect_gpio_out() use,
> basically that the qemu_irq argument is an input.
> 
> Since v1:
> - Addressed Yanan Wang and Peter Maydell comments:
> - Correct qdev_init_gpio_out_named() doc
> - Drop i8042_setup_a20_line() wrapper
> 
> Philippe Mathieu-Daudé (5):
>   hw/qdev: Cosmetic around documentation
>   hw/qdev: Correct qdev_init_gpio_out_named() documentation
>   hw/qdev: Correct qdev_connect_gpio_out_named() documentation
>   hw/qdev: Rename qdev_connect_gpio_out*() 'input_pin' parameter
>   hw/input/pckbd: Open-code i8042_setup_a20_line() wrapper
> 
>  include/hw/input/i8042.h |  1 -
>  include/hw/qdev-core.h   | 24 ++--
>  hw/core/gpio.c   | 13 +++--
>  hw/i386/pc.c |  3 ++-
>  hw/input/pckbd.c |  5 -
>  5 files changed, 27 insertions(+), 19 deletions(-)
>

[PULL v2 0/7] NBD patches

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

The following changes since commit 2bf40d0841b942e7ba12953d515e62a436f0af84:

  Merge tag 'pull-user-20211220' of https://gitlab.com/rth7680/qemu into 
staging (2021-12-20 13:20:07 -0800)

are available in the Git repository at:

  https://src.openvz.org/scm/~vsementsov/qemu.git tags/pull-nbd-2021-12-22-v2

for you to fetch changes up to ab7f7e67a7e7b49964109501dfcde4ec29bae60e:

  iotests: add nbd-reconnect-on-open test (2021-12-23 09:40:34 +0100)


nbd: reconnect-on-open feature
  v2: simple fix for mypy and pylint complaints on patch 04



Vladimir Sementsov-Ogievskiy (7):
  nbd: allow reconnect on open, with corresponding new options
  nbd/client-connection: nbd_co_establish_connection(): return real
error
  nbd/client-connection: improve error message of cancelled attempt
  iotests.py: add qemu_tool_popen()
  iotests.py: add and use qemu_io_wrap_args()
  iotests.py: add qemu_io_popen()
  iotests: add nbd-reconnect-on-open test

 qapi/block-core.json  |  9 ++-
 block/nbd.c   | 45 +++-
 nbd/client-connection.c   | 59 ++-
 tests/qemu-iotests/iotests.py | 37 ++
 .../qemu-iotests/tests/nbd-reconnect-on-open  | 71 +++
 .../tests/nbd-reconnect-on-open.out   | 11 +++
 6 files changed, 200 insertions(+), 32 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-reconnect-on-open
 create mode 100644 tests/qemu-iotests/tests/nbd-reconnect-on-open.out

-- 
2.31.1

Re: [PATCH qemu master] hw/misc/aspeed_pwm: fix typo

2021-12-23 Thread Cédric Le Goater


Hello Troy Lee,

On 12/22/21 11:24, Troy Lee wrote:

Typo found during developing.

Fixes: 70b3f1a34d3c ("hw/misc: Add basic Aspeed PWM model")


PWM is not upstream. I will include the fix in a new aspeed-7.0 branch.

Thanks,

C.




Signed-off-by: Troy Lee 
---
  hw/misc/aspeed_pwm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/aspeed_pwm.c b/hw/misc/aspeed_pwm.c
index 8ebab5dcef..dbf9634da3 100644
--- a/hw/misc/aspeed_pwm.c
+++ b/hw/misc/aspeed_pwm.c
@@ -96,7 +96,7 @@ static void aspeed_pwm_class_init(ObjectClass *klass, void 
*data)
  
  dc->realize = aspeed_pwm_realize;

  dc->reset = aspeed_pwm_reset;
-dc->desc = "Aspeed PWM Controller",
+dc->desc = "Aspeed PWM Controller";
  dc->vmsd = &vmstate_aspeed_pwm;
  }

Re: [PATCH for-6.1?] iotest: Further enhance qemu-img-bitmaps

2021-12-23 Thread Hanna Reitz


On 21.07.21 22:46, Eric Blake wrote:

Add a regression test to make sure we detect attempts to use 'qemu-img
bitmap' to modify an in-use local file.

Suggested-by: Nir Soffer 
Signed-off-by: Eric Blake 
---

Sadly, this missed my bitmaps pull request today.  If there's any
reason to respin that pull request, I'm inclined to add this in, as it
just touches the iotests; otherwise, if it slips to 6.2 it's not too
bad.


(Going through my patches folder...)

Not sure if you’re still interested in this, but if so, we should skip 
this test case if OFD locks are not available (like 153 does).


Hanna

Re: [PATCH 3/3] meson: generate trace points for qmp commands

2021-12-23 Thread Vladimir Sementsov-Ogievskiy


23.12.2021 01:11, Paolo Bonzini wrote:

On Tue, 21 Dec 2021 at 20:35, Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> wrote:

--- a/trace/meson.build
+++ b/trace/meson.build
@@ -2,10 +2,14 @@
  specific_ss.add(files('control-target.c'))

  trace_events_files = []
-foreach dir : [ '.' ] + trace_events_subdirs
-  trace_events_file = meson.project_source_root() / dir / 'trace-events'
+foreach path : [ '.' ] + trace_events_subdirs + qapi_trace_events
+  if path.contains('trace-events')
+    trace_events_file = meson.project_build_root() / 'qapi' / path



Just using "trace_events_file = 'qapi' / path" might work, since the build is 
nonrecursive.


This says:

ninja: error: '../trace/qapi/qapi-commands-authz.trace-events', needed by 
'trace/trace-events-all', missing and no known rule to make it
make[1]: *** [Makefile:162: run-ninja] Error 1
make[1]: Leaving directory '/work/src/qemu/up/up-trace-qmp-commands/build'
make: *** [GNUmakefile:11: all] Error 2


So it considers the path relative to the current "trace" directory.



If it doesn't, use the custom target object, possibly indexing it as ct[index]. You can 
use a dictionary to store the custom targets and find them from the "path" 
variable.



Oh! Great, thanks! Magic. The following hack works:

diff --git a/meson.build b/meson.build
index 20d32fd20d..c42a76a14c 100644
--- a/meson.build
+++ b/meson.build
@@ -39,6 +39,7 @@ qemu_icondir = get_option('datadir') / 'icons'
 config_host_data = configuration_data()
 genh = []
 qapi_trace_events = []
+qapi_trace_events_targets = {}
 
 target_dirs = config_host['TARGET_DIRS'].split()

 have_linux_user = false
diff --git a/qapi/meson.build b/qapi/meson.build
index 333ca60583..d4de04459d 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -139,6 +139,9 @@ foreach output : qapi_util_outputs
   if output.endswith('.h')
 genh += qapi_files[i]
   endif
+  if output.endswith('.trace-events')
+qapi_trace_events_targets += {output: qapi_files[i]}
+  endif
   util_ss.add(qapi_files[i])
   i = i + 1
 endforeach
@@ -147,6 +150,9 @@ foreach output : qapi_specific_outputs + 
qapi_nonmodule_outputs
   if output.endswith('.h')
 genh += qapi_files[i]
   endif
+  if output.endswith('.trace-events')
+qapi_trace_events_targets += {output: qapi_files[i]}
+  endif
   specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: qapi_files[i])
   i = i + 1
 endforeach
diff --git a/trace/meson.build b/trace/meson.build
index 77e44fa68d..daa24c3a2d 100644
--- a/trace/meson.build
+++ b/trace/meson.build
@@ -4,7 +4,7 @@ specific_ss.add(files('control-target.c'))
 trace_events_files = []
 foreach path : [ '.' ] + trace_events_subdirs + qapi_trace_events
   if path.contains('trace-events')
-trace_events_file = meson.project_build_root() / 'qapi' / path
+trace_events_file = qapi_trace_events_targets[path]
   else
 trace_events_file = meson.project_source_root() / path / 'trace-events'
   endif



--
Best regards,
Vladimir

Re: [PATCH 1/3] block: better document SSH host key fingerprint checking

2021-12-23 Thread Hanna Reitz


On 18.11.21 15:35, Daniel P. Berrangé wrote:

The docs still illustrate host key fingerprint checking using the old
md5 hashes which are considered insecure and obsolete. Change it to
illustrate using a sha256 hash. Also show how to extract the hash
value from the known_hosts file.

Signed-off-by: Daniel P. Berrangé 
---
  docs/system/qemu-block-drivers.rst.inc | 30 ++
  1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index 16225710eb..2aeeaf6361 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -778,10 +778,32 @@ The optional *HOST_KEY_CHECK* parameter controls how the 
remote
  host's key is checked.  The default is ``yes`` which means to use
  the local ``.ssh/known_hosts`` file.  Setting this to ``no``
  turns off known-hosts checking.  Or you can check that the host key
-matches a specific fingerprint:
-``host_key_check=md5:78:45:8e:14:57:4f:d5:45:83:0a:0e:f3:49:82:c9:c8``
-(``sha1:`` can also be used as a prefix, but note that OpenSSH
-tools only use MD5 to print fingerprints).
+matches a specific fingerprint. The fingerprint can be provided in
+``md5``, ``sha1``, or ``sha256`` format, however, it is strongly
+recommended to only use ``sha256``, since the other options are
+considered insecure by modern standards. The fingerprint value
+must be given as a hex encoded string::
+
+  
host_key_check=sha256:04ce2ae89ff4295a6b9c4111640bdcb3297858ee55cb434d9dd88796e93aa795``


I think the backticks at the end of this line should be dropped.

With that done:

Reviewed-by: Hanna Reitz 


+
+The key string may optionally contain ":" separators between
+each pair of hex digits.
+
+The ``$HOME/.ssh/known_hosts`` file contains the base64 encoded
+host keys. These can be converted into the format needed for
+QEMU using a command such as::
+
+   $ for key in `grep 10.33.8.112 known_hosts | awk '{print $3}'`
+ do
+   echo $key | base64 -d | sha256sum
+ done
+ 6c3aa525beda9dc83eadfbd7e5ba7d976ecb59575d1633c87cd06ed2ed6e366f  -
+ 12214fd9ea5b408086f98ecccd9958609bd9ac7c0ea316734006bc7818b45dc8  -
+ d36420137bcbd101209ef70c3b15dc07362fbe0fa53c5b135eba6e6afa82f0ce  -
+
+Note that there can be multiple keys present per host, each with
+different key ciphers. Care is needed to pick the key fingerprint
+that matches the cipher QEMU will negotiate with the remote server.
  
  Currently authentication must be done using ssh-agent.  Other

  authentication methods may be supported in future.
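
As a rough illustration of the shell loop in the patch above, the same conversion can be sketched in Python. This assumes a plain `host keytype base64key` line in known_hosts (no hashed host names and no leading marker field); the function name `host_key_check_value` is hypothetical, and the key data in the demo is a placeholder, not a real host key:

```python
import base64
import hashlib


def host_key_check_value(known_hosts_line: str, colons: bool = False) -> str:
    """Return 'sha256:<hex>' for one 'host keytype base64key' line."""
    fields = known_hosts_line.split()
    key_blob = base64.b64decode(fields[2])      # third field, like awk's $3
    digest = hashlib.sha256(key_blob).hexdigest()
    if colons:
        # Optional ":" separator between each pair of hex digits.
        digest = ':'.join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return 'sha256:' + digest


if __name__ == '__main__':
    # Placeholder line: the base64 field is dummy data, not a real host key.
    line = '10.33.8.112 ssh-ed25519 ' + base64.b64encode(b'dummy key').decode()
    print('host_key_check=' + host_key_check_value(line))
```

As the patch notes, a host may have several keys of different types, so the line fed to such a helper still has to be the one for the key type QEMU negotiates.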

Re: [PATCH 2/3] block: support sha256 fingerprint with pre-blockdev options

2021-12-23 Thread Hanna Reitz


On 18.11.21 15:35, Daniel P. Berrangé wrote:

When support for sha256 fingerprint checking was added in

   commit bf783261f0aee6e81af3916bff7606d71ccdc153
   Author: Daniel P. Berrangé 
   Date:   Tue Jun 22 12:51:56 2021 +0100

 block/ssh: add support for sha256 host key fingerprints

it was only made to work with -blockdev. Getting it working with
-drive requires some extra custom parsing.

Signed-off-by: Daniel P. Berrangé 
---
  block/ssh.c | 5 +
  1 file changed, 5 insertions(+)


Reviewed-by: Hanna Reitz

Re: Building QEMU as a shared library

2021-12-23 Thread Philippe Mathieu-Daudé

Hi Peter,

On 12/15/21 11:10, Peter Maydell wrote:
> On Wed, 15 Dec 2021 at 08:18, Amir Gonnen  wrote:
>> My goal is to simulate a mixed architecture system.
>>
>> Today QEMU strongly assumes that the simulated system is a *single 
>> architecture*.
>> Changing this assumption and supporting mixed architecture in QEMU proved to 
>> be
>> non-trivial and may require significant development effort. Common code such 
>> as
>> TCG and others explicitly include architecture specific header files, for 
>> example.
> 
> Yeah. This is definitely something we'd like to fix some day. It's
> the approach I would prefer for getting multi-architecture machines.

Am I understanding correctly that your preference would be *not* using
shared libraries, but having a monolithic process able to use any
configuration of heterogeneous architectures?

What are your thoughts on Daniel's idea where (IIUC) cores are
external processes wired via vhost-user? One problem is that not all
supported operating systems provide this possibility.

>> Instead, I would like to suggest a new approach we use at Neuroblade to 
>> achieve this:
>> Build QEMU as a shared library that can be loaded and used directly in a 
>> larger simulation.
>> Today we build qemu-system-nios2 shared library and load it from 
>> qemu-system-x86_64 in order
>> to simulate an x86_64 system that also consists of multiple nios2 cores.
>> In our simulation, two independent "main" functions are running on different 
>> threads, and
>> simulation synchronization is reduced to synchronizing threads.
> 
> I agree with Stefan that you should go ahead and send the code as
> an RFC patchset, but I feel like there is a lot of work required
> to really get the codebase into a state where it is a clean
> shared library...
> 
> -- PMM
>

Re: [PATCH] acpi: validate hotplug selector on access

2021-12-23 Thread Mauro Matteo Cascella

Hi,

On Wed, Dec 22, 2021 at 9:52 PM Michael S. Tsirkin  wrote:
>
> On Wed, Dec 22, 2021 at 09:27:51PM +0100, Philippe Mathieu-Daudé wrote:
> > On Wed, Dec 22, 2021 at 9:20 PM Michael S. Tsirkin  wrote:
> > > On Wed, Dec 22, 2021 at 08:19:41PM +0100, Philippe Mathieu-Daudé wrote:
> > > > +Mauro & Alex
> > > >
> > > > On 12/21/21 15:48, Michael S. Tsirkin wrote:
> > > > > When bus is looked up on a pci write, we didn't
> > > > > validate that the lookup succeeded.
> > > > > Fuzzers thus can trigger QEMU crash by dereferencing the NULL
> > > > > bus pointer.
> > > > >
> > > > > Fixes: b32bd763a1 ("pci: introduce acpi-index property for PCI 
> > > > > device")
> > > > > Cc: "Igor Mammedov" 
> > > > > Fixes: https://gitlab.com/qemu-project/qemu/-/issues/770
> > > > > Signed-off-by: Michael S. Tsirkin 
> > > >
> > > > It seems this problem is important enough to get a CVE assigned.
> > >
> > > Guest root can crash guest.
> > > I don't see why we would assign a CVE.
> >
> > Well thinking about downstream distributions, if there is a CVE assigned,
> > it helps them to have it written in the commit. Maybe I am mistaken.
> >
> > Unrelated but it seems there is a coordination problem with the
> > qemu-security@ list,
> > if this isn't a security issue, why was a CVE requested?
>
> Right.  I don't think a privileged user crashing a VM warrants a CVE,
> it can just halt a CPU or whatever. Just cancel the CVE request pls.

While I agree with you that this is kind of borderline and I expressed
similar concerns in the past, I was told that:

1) root guest users are not necessarily trustworthy (from the host perspective).
2) NULL pointer deref and similar issues caused by an
ill-handled/error condition are CVE worthy, even if triggered by root.
3) In other cases, DoS triggered by root is not a security issue
because it's an expected behavior and not an ill-handled/error
condition (think of assert failures, for example).

In other words, "ill-handled condition" is the crucial factor that
makes a bug CVE worthy or not.

+Prasad, can you shed some light on this? Is my understanding correct?

Also, please note that we regularly get CVE requests for bugs like
this and many CVEs have been assigned in the past. Of course that
doesn't mean we can't change things going forward, but I think we
should make it clear (probably here:
https://www.qemu.org/docs/master/system/security.html) that these
kinds of bugs are not eligible for CVE assignment.

> > > > Mauro, please update us when you get the CVE number.
> > > > Michael, please amend the CVE number before committing the fix.
> > > >
> > > > FWIW Paolo asked every fuzzed bug reproducer to be committed
> > > > as qtest, see tests/qtest/fuzz*c. Alex has a way to generate
> > > > reproducer in plain C.
> > > >
> > > > Regards,
> > > >
> > > > Phil.
> > >
>

-- 
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0

Re: [PATCH 3/3] block: print the server key type and fingerprint on failure

2021-12-23 Thread Hanna Reitz


On 18.11.21 15:35, Daniel P. Berrangé wrote:

When validating the server key fingerprint fails, it is difficult for
the user to know what they got wrong. The fingerprint accepted by QEMU
is received in a different format than openssh displays. There can also
be keys for multiple different ciphers in known_hosts. It may not be
obvious which cipher QEMU will use and whether it will be the same
as openssh. Address this by printing the server key type and its
corresponding fingerprint in the format QEMU accepts.

Signed-off-by: Daniel P. Berrangé 
---
  block/ssh.c | 37 ++---
  1 file changed, 30 insertions(+), 7 deletions(-)


Nice!

Reviewed-by: Hanna Reitz 


diff --git a/block/ssh.c b/block/ssh.c
index fcc0ab765a..967a2b971e 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -386,14 +386,28 @@ static int compare_fingerprint(const unsigned char 
*fingerprint, size_t len,
  return *host_key_check - '\0';
  }
  
+static char *format_fingerprint(const unsigned char *fingerprint, size_t len)

+{
+static const char *hex = "0123456789abcdef";
+char *ret = g_new0(char, (len * 2) + 1);
+for (size_t i = 0; i < len; i++) {
+ret[i * 2] = hex[((fingerprint[i] >> 4) & 0xf)];
+ret[(i * 2) + 1] = hex[(fingerprint[i] & 0xf)];


(I would have found an sn?printf() solution a bit simpler here
(snprintf(&ret[i * 2], 3, "%02x", fingerprint[i])),
but now you already wrote the code, so...)


+}
+ret[len * 2] = '\0';
+return ret;
+}
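
For readers following the format_fingerprint() logic above, here is a small Python sketch of the same nibble-lookup idea. It is only an illustration of the technique, not the QEMU code, and the sample bytes are arbitrary:

```python
HEX_DIGITS = "0123456789abcdef"


def format_fingerprint(fingerprint: bytes) -> str:
    """Hex-encode bytes one nibble at a time, high nibble first."""
    out = []
    for byte in fingerprint:
        out.append(HEX_DIGITS[(byte >> 4) & 0xf])   # upper four bits
        out.append(HEX_DIGITS[byte & 0xf])          # lower four bits
    return ''.join(out)


if __name__ == '__main__':
    sample = bytes([0x04, 0xce, 0x2a, 0xe8])   # arbitrary sample bytes
    print(format_fingerprint(sample))
```

The result is the lowercase hex string with two digits per input byte, which is the form that host_key_check accepts after the `sha256:` prefix.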

[PATCH v2 0/2] block: Minor vhost-user-blk fixes

2021-12-23 Thread Philippe Mathieu-Daudé

- Add vhost-user-blk help to qemu-storage-daemon,
- Do not list vhost-user-blk in BlockExportType when
  CONFIG_VHOST_USER_BLK_SERVER is disabled.

Since v1:
- Reword patch 2 description (Markus)
- Fix BlockExportOptions enum build failure (Markus)

Philippe Mathieu-Daudé (2):
  qemu-storage-daemon: Add vhost-user-blk help
  qapi/block: Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER

 qapi/block-export.json   |  6 --
 storage-daemon/qemu-storage-daemon.c | 13 +
 2 files changed, 17 insertions(+), 2 deletions(-)

-- 
2.33.1

[PATCH v2 1/2] qemu-storage-daemon: Add vhost-user-blk help

2021-12-23 Thread Philippe Mathieu-Daudé

Add missing vhost-user-blk help:

  $ qemu-storage-daemon -h
  ...
--export [type=]vhost-user-blk,id=,node-name=,
 addr.type=unix,addr.path=[,writable=on|off]
 [,logical-block-size=][,num-queues=]
   export the specified block node as a
   vhosts-user-blk device over UNIX domain socket
--export [type=]vhost-user-blk,id=,node-name=,
 fd,addr.str=[,writable=on|off]
 [,logical-block-size=][,num-queues=]
   export the specified block node as a
   vhosts-user-blk device over file descriptor
  ...

Fixes: 90fc91d50b7 ("convert vhost-user-blk server to block export API")
Reported-by: Qing Wang 
Signed-off-by: Philippe Mathieu-Daudé 
---
 storage-daemon/qemu-storage-daemon.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index 52cf17e8ace..0c19e128e3f 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -104,6 +104,19 @@ static void help(void)
 " export the specified block node over FUSE\n"
 "\n"
 #endif /* CONFIG_FUSE */
+#ifdef CONFIG_VHOST_USER_BLK_SERVER
+"  --export [type=]vhost-user-blk,id=,node-name=,\n"
+"   addr.type=unix,addr.path=[,writable=on|off]\n"
+"   [,logical-block-size=][,num-queues=]\n"
+" export the specified block node as a\n"
+" vhosts-user-blk device over UNIX domain socket\n"
+"  --export [type=]vhost-user-blk,id=,node-name=,\n"
+"   fd,addr.str=[,writable=on|off]\n"
+"   [,logical-block-size=][,num-queues=]\n"
+" export the specified block node as a\n"
+" vhosts-user-blk device over file descriptor\n"
+"\n"
+#endif /* CONFIG_VHOST_USER_BLK_SERVER */
 "  --monitor [chardev=]name[,mode=control][,pretty[=on|off]]\n"
 " configure a QMP monitor\n"
 "\n"
-- 
2.33.1

[PATCH v2 2/2] qapi/block: Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER

2021-12-23 Thread Philippe Mathieu-Daudé

When building QEMU with --disable-vhost-user and using introspection,
query-qmp-schema lists vhost-user-blk even though it's not actually
available:

  { "execute": "query-qmp-schema" }
  {
  "return": [
  ...
  {
  "name": "312",
  "members": [
  {
  "name": "nbd"
  },
  {
  "name": "vhost-user-blk"
  }
  ],
  "meta-type": "enum",
  "values": [
  "nbd",
  "vhost-user-blk"
  ]
  },

Make vhost-user-blk in BlockExportType conditional on
CONFIG_VHOST_USER_BLK_SERVER, so it doesn't end up listed by
query-qmp-schema when the option is disabled.

Fixes: 90fc91d50b7 ("convert vhost-user-blk server to block export API")
Signed-off-by: Philippe Mathieu-Daudé 
---
v2: Reword + restrict BlockExportOptions union (armbru)
---
 qapi/block-export.json | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index c1b92ce1c1c..f9ce79a974b 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -277,7 +277,8 @@
 # Since: 4.2
 ##
 { 'enum': 'BlockExportType',
-  'data': [ 'nbd', 'vhost-user-blk',
+  'data': [ 'nbd',
+{ 'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
 { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] }
 
 ##
@@ -319,7 +320,8 @@
   'discriminator': 'type',
   'data': {
   'nbd': 'BlockExportOptionsNbd',
-  'vhost-user-blk': 'BlockExportOptionsVhostUserBlk',
+  'vhost-user-blk': { 'type': 'BlockExportOptionsVhostUserBlk',
+  'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
   'fuse': { 'type': 'BlockExportOptionsFuse',
 'if': 'CONFIG_FUSE' }
} }
-- 
2.33.1
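
To make the effect of the 'if' keys in the diff above concrete, here is a toy Python model of which enum members remain visible for a given set of enabled config symbols. It is not the QAPI generator: the helper name `visible_members` is hypothetical, and it only handles the single-symbol string form of 'if' used in this patch:

```python
from typing import Iterable, List, Set, Union

Member = Union[str, dict]


def visible_members(data: Iterable[Member], enabled: Set[str]) -> List[str]:
    """Return member names that are unconditional or whose 'if' is enabled."""
    names = []
    for member in data:
        if isinstance(member, str):
            names.append(member)                  # unconditional member
        elif 'if' not in member or member['if'] in enabled:
            names.append(member['name'])          # condition satisfied
    return names


# Enum data as it looks after this patch.
BLOCK_EXPORT_TYPE = [
    'nbd',
    {'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER'},
    {'name': 'fuse', 'if': 'CONFIG_FUSE'},
]

if __name__ == '__main__':
    # Neither optional feature configured: only the unconditional member.
    print(visible_members(BLOCK_EXPORT_TYPE, set()))
```

With neither symbol enabled, only 'nbd' remains, which matches the intent of hiding vhost-user-blk from introspection when the server is not built.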

[PULL 0/1] "make check" switch to meson test harness

2021-12-23 Thread Paolo Bonzini

The following changes since commit 2bf40d0841b942e7ba12953d515e62a436f0af84:

  Merge tag 'pull-user-20211220' of https://gitlab.com/rth7680/qemu into 
staging (2021-12-20 13:20:07 -0800)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream-mtest

for you to fetch changes up to 3d2f73ef75e25ba850aff4fcccb36d50137afd0f:

  build: use "meson test" as the test harness (2021-12-23 10:06:19 +0100)


Replace tap-driver.pl with "meson test".


Paolo Bonzini (1):
  build: use "meson test" as the test harness

 Makefile  |   3 +-
 meson.build   |   5 +-
 scripts/mtest2make.py | 112 ++-
 scripts/tap-driver.pl | 379 --
 scripts/tap-merge.pl  | 111 ---
 tests/fp/meson.build  |   2 +-
 6 files changed, 51 insertions(+), 561 deletions(-)
 delete mode 100755 scripts/tap-driver.pl
 delete mode 100755 scripts/tap-merge.pl
-- 
2.33.1

[PULL 1/1] build: use "meson test" as the test harness

2021-12-23 Thread Paolo Bonzini

"meson test" starting with version 0.57 is just as capable and easy to
use as QEMU's own TAP driver.  All existing options for "make check"
work.  The only required code change involves how to mark "slow" tests;
they need to belong to an additional "slow" suite.

The rules for .tap output are replaced by JUnit XML; GitLab is able
to parse that output and present it in the CI pipeline report.

Signed-off-by: Paolo Bonzini 
---
 Makefile  |   3 +-
 meson.build   |   5 +-
 scripts/mtest2make.py | 112 +
 scripts/tap-driver.pl | 379 --
 scripts/tap-merge.pl  | 111 -
 tests/fp/meson.build  |   2 +-
 6 files changed, 51 insertions(+), 561 deletions(-)
 delete mode 100755 scripts/tap-driver.pl
 delete mode 100755 scripts/tap-merge.pl

diff --git a/Makefile b/Makefile
index 74c5b46d38..5d66c35ea5 100644
--- a/Makefile
+++ b/Makefile
@@ -145,7 +145,8 @@ NINJAFLAGS = $(if $V,-v) $(if $(MAKE.n), -n) $(if 
$(MAKE.k), -k0) \
 $(filter-out -j, $(lastword -j1 $(filter -l% -j%, $(MAKEFLAGS \
 
 ninja-cmd-goals = $(or $(MAKECMDGOALS), all)
-ninja-cmd-goals += $(foreach t, $(.tests), $(.test.deps.$t))
+ninja-cmd-goals += $(foreach t, $(.check.build-suites), $(.check-$t.deps))
+ninja-cmd-goals += $(foreach t, $(.bench.build-suites), $(.bench-$t.deps))
 
 makefile-targets := build.ninja ctags TAGS cscope dist clean uninstall
 # "ninja -t targets" also lists all prerequisites.  If build system
diff --git a/meson.build b/meson.build
index f45ecf31bd..f0f1d5ba9d 100644
--- a/meson.build
+++ b/meson.build
@@ -1,8 +1,11 @@
 project('qemu', ['c'], meson_version: '>=0.58.2',
 default_options: ['warning_level=1', 'c_std=gnu11', 'cpp_std=gnu++11', 
'b_colorout=auto',
-  'b_staticpic=false'],
+  'b_staticpic=false', 'stdsplit=false'],
 version: files('VERSION'))
 
+add_test_setup('quick', exclude_suites: 'slow', is_default: true)
+add_test_setup('slow', env: ['G_TEST_SLOW=1', 'SPEED=slow'])
+
 not_found = dependency('', required: false)
 keyval = import('keyval')
 ss = import('sourceset')
diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
index 02c0453e67..7067bdadf5 100644
--- a/scripts/mtest2make.py
+++ b/scripts/mtest2make.py
@@ -13,101 +13,79 @@
 
 class Suite(object):
 def __init__(self):
-self.tests = list()
-self.slow_tests = list()
-self.executables = set()
+self.deps = set()
+self.speeds = ['quick']
+
+def names(self, base):
+return [base if speed == 'quick' else f'{base}-{speed}' for speed in 
self.speeds]
+
 
 print('''
 SPEED = quick
 
-# $1 = environment, $2 = test command, $3 = test name, $4 = dir
-.test-human-tap = $1 $(if $4,(cd $4 && $2),$2) -m $(SPEED) < /dev/null | 
./scripts/tap-driver.pl --test-name="$3" $(if $(V),,--show-failures-only)
-.test-human-exitcode = $1 $(PYTHON) scripts/test-driver.py $(if $4,-C$4) $(if 
$(V),--verbose) -- $2 < /dev/null
-.test-tap-tap = $1 $(if $4,(cd $4 && $2),$2) < /dev/null | sed "s/^[a-z][a-z]* 
[0-9]*/& $3/" || true
-.test-tap-exitcode = printf "%s\\n" 1..1 "`$1 $(if $4,(cd $4 && $2),$2) < 
/dev/null > /dev/null || echo "not "`ok 1 $3"
-.test.human-print = echo $(if $(V),'$1 $2','Running test $3') &&
-.test.env = MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$(( $${RANDOM:-0} % 255 + 1))}
+.speed.quick = $(foreach s,$(sort $(filter-out %-slow, $1)), --suite $s)
+.speed.slow = $(foreach s,$(sort $1), --suite $s)
 
-# $1 = test name, $2 = test target (human or tap)
-.test.run = $(call 
.test.$2-print,$(.test.env.$1),$(.test.cmd.$1),$(.test.name.$1)) $(call 
.test-$2-$(.test.driver.$1),$(.test.env.$1),$(.test.cmd.$1),$(.test.name.$1),$(.test.dir.$1))
+.mtestargs = --no-rebuild -t 0
+ifneq ($(SPEED), quick)
+.mtestargs += --setup $(SPEED)
+endif
+.mtestargs += $(subst -j,--num-processes , $(filter-out -j, $(lastword -j1 
$(filter -j%, $(MAKEFLAGS)
 
-.test.output-format = human
-''')
+.check.mtestargs = $(MTESTARGS) $(.mtestargs) $(if 
$(V),--verbose,--print-errorlogs)
+.bench.mtestargs = $(MTESTARGS) $(.mtestargs) --benchmark --verbose''')
 
 introspect = json.load(sys.stdin)
-i = 0
 
 def process_tests(test, targets, suites):
-global i
-env = ' '.join(('%s=%s' % (shlex.quote(k), shlex.quote(v))
-for k, v in test['env'].items()))
 executable = test['cmd'][0]
 try:
 executable = os.path.relpath(executable)
 except:
 pass
-if test['workdir'] is not None:
-try:
-test['cmd'][0] = os.path.relpath(executable, test['workdir'])
-except:
-test['cmd'][0] = executable
-else:
-test['cmd'][0] = executable
-cmd = ' '.join((shlex.quote(x) for x in test['cmd']))
-driver = test['protocol'] if 'protocol' in test else 'exitcode'
-
-i += 1
-if test['workdir'] is not None:
-print('.test.dir.%d := %s' % (i, shlex.quote(test['workdir'])))
 
 deps = (target

Re: [PATCH 3/3] block: print the server key type and fingerprint on failure

2021-12-23 Thread Philippe Mathieu-Daudé

On 11/18/21 15:35, Daniel P. Berrangé wrote:
> When validating the server key fingerprint fails, it is difficult for
> the user to know what they got wrong. The fingerprint accepted by QEMU
> is received in a different format than openssh displays. There can also
> be keys for multiple different ciphers in known_hosts. It may not be
> obvious which cipher QEMU will use and whether it will be the same
> as openssh. Address this by printing the server key type and its

"OpenSSH"? (twice)

> corresponding fingerprint in the format QEMU accepts.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  block/ssh.c | 37 ++---
>  1 file changed, 30 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [RFC PATCH v3 18/27] hw/intc: Add LoongArch ls7a interrupt controller support(PCH-PIC)

2021-12-23 Thread Mark Cave-Ayland


On 22/12/2021 02:38, yangxiaojuan wrote:


Hi, Mark

On 12/18/2021 08:33 AM, Mark Cave-Ayland wrote:

On 04/12/2021 12:07, Xiaojuan Yang wrote:


This patch implements the PCH-PIC interrupt controller.

Signed-off-by: Xiaojuan Yang 
Signed-off-by: Song Gao 
---
   hw/intc/Kconfig |   4 +
   hw/intc/loongarch_pch_pic.c | 357 
   hw/intc/meson.build |   1 +
   hw/intc/trace-events|   5 +
   hw/loongarch/Kconfig|   1 +
   include/hw/intc/loongarch_pch_pic.h |  61 +
   6 files changed, 429 insertions(+)
   create mode 100644 hw/intc/loongarch_pch_pic.c
   create mode 100644 include/hw/intc/loongarch_pch_pic.h

diff --git a/hw/intc/Kconfig b/hw/intc/Kconfig
index 511dcac537..96da13ad1d 100644
--- a/hw/intc/Kconfig
+++ b/hw/intc/Kconfig
@@ -76,3 +76,7 @@ config M68K_IRQC
 config LOONGARCH_IPI
   bool
+
+config LOONGARCH_PCH_PIC
+bool
+select UNIMP
diff --git a/hw/intc/loongarch_pch_pic.c b/hw/intc/loongarch_pch_pic.c
new file mode 100644
index 00..2ede29ceb0
--- /dev/null
+++ b/hw/intc/loongarch_pch_pic.c
@@ -0,0 +1,357 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU Loongson 7A1000 I/O interrupt controller.
+ *
+ * Copyright (C) 2021 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/loongarch/loongarch.h"
+#include "hw/irq.h"
+#include "hw/intc/loongarch_pch_pic.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+#define for_each_set_bit(bit, addr, size) \
+ for ((bit) = find_first_bit((addr), (size));\
+  (bit) < (size);\
+  (bit) = find_next_bit((addr), (size), (bit) + 1))
+
+static void pch_pic_update_irq(loongarch_pch_pic *s, uint64_t mask, int level)
+{
+int i;
+uint64_t val;
+val = mask & s->intirr & (~s->int_mask);
+
+for_each_set_bit(i, &val, 64) {
+if (level == 1) {
+if ((s->intisr & (0x1ULL << i)) == 0) {
+s->intisr |= 1ULL << i;
+qemu_set_irq(s->parent_irq[s->htmsi_vector[i]], 1);
+}
+} else if (level == 0) {
+if (s->intisr & (0x1ULL << i)) {
+s->intisr &= ~(0x1ULL << i);
+qemu_set_irq(s->parent_irq[s->htmsi_vector[i]], 0);
+}
+}
+}
+}


The normal pattern would be to use something like:

for (i = 0; i < 64; i++) {
 if (level) {
 s->intisr |= 1ULL << i;
 } else {
 s->intisr &= ~(0x1ULL << i);
 }

 qemu_set_irq(s->parent_irq[s->htmsi_vector[i]], level);
}

Why is it necessary to check the previous value of (s->intisr & (0x1ULL << i)) 
here?


The previous value is checked here to avoid an unnecessary write. It does seem to make 
things more complicated; I will modify it.


In general a *_update_irq() function should be fine to propagate the IRQ up to the 
parent directly: I think this is fine in this case because you are directly 
manipulating the parent_irq elements rather than using e.g. a priority encoder within 
this device to raise an IRQ to the CPU. I'm presuming this final prioritisation and 
delivery is done elsewhere?
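
For what it's worth, the two variants can be compared with a small stand-alone Python model (not QEMU code). This is only a sketch: the function names are made up for illustration, `pending` stands in for `mask & s->intirr & ~s->int_mask`, and the recorded `(line, level)` tuples stand in for the qemu_set_irq() calls. Under those assumptions both variants leave intisr in the same state; the unguarded one just records a notification for every pending bit.

```python
def update_guarded(intisr, pending, level):
    """Model of the patch: touch a bit (and notify) only when it changes."""
    calls = []
    for i in range(64):
        bit = 1 << i
        if not pending & bit:
            continue
        if level and not intisr & bit:
            intisr |= bit
            calls.append((i, 1))
        elif not level and intisr & bit:
            intisr &= ~bit
            calls.append((i, 0))
    return intisr, calls


def update_simple(intisr, pending, level):
    """Model of the suggested pattern: set or clear, then always notify."""
    calls = []
    for i in range(64):
        bit = 1 << i
        if not pending & bit:
            continue
        if level:
            intisr |= bit
        else:
            intisr &= ~bit
        calls.append((i, 1 if level else 0))
    return intisr, calls


# Hypothetical starting state: bits 0 and 2 already in service,
# bits 1 and 2 pending, and the input line is raised.
state, pending = 0b0101, 0b0110
guarded_isr, guarded_calls = update_guarded(state, pending, 1)
simple_isr, simple_calls = update_simple(state, pending, 1)
print(bin(guarded_isr), guarded_calls)
print(bin(simple_isr), simple_calls)
```

In this model the only observable difference is the extra notification for bit 2, which was already set before the call.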



+static void pch_pic_irq_handler(void *opaque, int irq, int level)
+{
+loongarch_pch_pic *s = LOONGARCH_PCH_PIC(opaque);
+
+assert(irq < PCH_PIC_IRQ_NUM);
+uint64_t mask = 1ULL << irq;
+
+trace_pch_pic_irq_handler(s->intedge, irq, level);
+
+if (s->intedge & mask) {
+/* Edge triggered */
+if (level) {
+if ((s->last_intirr & mask) == 0) {
+s->intirr |= mask;
+}
+s->last_intirr |= mask;
+} else {
+s->last_intirr &= ~mask;
+}
+} else {
+/* Level triggered */
+if (level) {
+s->intirr |= mask;
+s->last_intirr |= mask;
+} else {
+s->intirr &= ~mask;
+s->last_intirr &= ~mask;
+}
+
+}
+pch_pic_update_irq(s, mask, level);
+}
+
+static uint64_t loongarch_pch_pic_reg_read(void *opaque, hwaddr addr,
+   unsigned size)
+{
+loongarch_pch_pic *s = LOONGARCH_PCH_PIC(opaque);
+uint64_t val = 0;
+uint32_t offset = addr & 0xfff;
+int64_t offset_tmp;
+
+if (size == 8) {
+switch (offset) {
+case PCH_PIC_INT_ID_OFFSET:
+val = (PCH_PIC_INT_ID_NUM << 32) | PCH_PIC_INT_ID_VAL;
+break;
+case PCH_PIC_INT_MASK_OFFSET:
+val =  s->int_mask;
+break;
+case PCH_PIC_INT_STATUS_OFFSET:
+val = s->intisr & (~s->int_mask);
+break;
+case PCH_PIC_INT_EDGE_OFFSET:
+val = s->intedge;
+break;
+case PCH_PIC_INT_POL_OFFSET:
+val = s->int_polarity;
+break;
+case PCH_PIC_HTMSI_EN_OFFSET...PCH_PIC_HTMSI_EN_END:
+val = s->h

Re: [PATCH 1/3] scripts/qapi/commands: gen_commands(): add add_trace_points argument

2021-12-23 Thread Vladimir Sementsov-Ogievskiy


21.12.2021 22:35, Vladimir Sementsov-Ogievskiy wrote:

Add the possibility to generate trace points for each qmp command.

We should generate both trace points and trace-events file, for further
trace point code generation.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
  scripts/qapi/commands.py | 84 ++--
  1 file changed, 73 insertions(+), 11 deletions(-)

diff --git a/scripts/qapi/commands.py b/scripts/qapi/commands.py
index 21001bbd6b..e62f1a4125 100644
--- a/scripts/qapi/commands.py
+++ b/scripts/qapi/commands.py
@@ -53,7 +53,8 @@ def gen_command_decl(name: str,
  def gen_call(name: str,
   arg_type: Optional[QAPISchemaObjectType],
   boxed: bool,
- ret_type: Optional[QAPISchemaType]) -> str:
+ ret_type: Optional[QAPISchemaType],
+ add_trace_points: bool) -> str:
  ret = ''
  
  argstr = ''

@@ -71,21 +72,65 @@ def gen_call(name: str,
  if ret_type:
  lhs = 'retval = '
  
-ret = mcgen('''

+qmp_name = f'qmpq_{c_name(name)}'


It was called qmpq_ because qmp_ conflicts with the existing qmp_ trace points 
for jobs. But looking at them, they don't add much information beyond the new qmpq_ 
trace events, so in v2 I'll remove the old qmp_ trace points (there are not many of 
them) and the newly generated trace points will be named simply qmp_*.
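
To illustrate the resulting names, here is a minimal Python sketch of the trace-events line emitted per command. It assumes a simplified stand-in (`simple_c_name`, a hypothetical helper) for the QAPI generator's c_name(), mapping only '-' and '.' to '_'; the real helper handles more cases. The argument list and format string are the ones used by gen_trace() in the v2 patch of this series.

```python
def simple_c_name(name: str) -> str:
    """Hypothetical, simplified stand-in for the QAPI c_name() helper."""
    return name.replace('-', '_').replace('.', '_')


def trace_event_line(command: str) -> str:
    """Build one trace-events line for a QMP command name."""
    return (f'qmp_{simple_c_name(command)}'
            '(const char *tag, const char *json) "%s%s"\n')


print(trace_event_line('block-job-cancel'), end='')
```

For 'block-job-cancel' this yields an event named qmp_block_job_cancel, which is exactly the name the hand-written job trace point used before, hence the conflict.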


--
Best regards,
Vladimir

Re: [PATCH 3/3] meson: generate trace points for qmp commands

2021-12-23 Thread Vladimir Sementsov-Ogievskiy


23.12.2021 12:33, Vladimir Sementsov-Ogievskiy wrote:

23.12.2021 01:11, Paolo Bonzini wrote:

On Tue, 21 Dec 2021 at 20:35, Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> wrote:

    --- a/trace/meson.build
    +++ b/trace/meson.build
    @@ -2,10 +2,14 @@
  specific_ss.add(files('control-target.c'))

  trace_events_files = []
    -foreach dir : [ '.' ] + trace_events_subdirs
    -  trace_events_file = meson.project_source_root() / dir / 'trace-events'
    +foreach path : [ '.' ] + trace_events_subdirs + qapi_trace_events
    +  if path.contains('trace-events')
    +    trace_events_file = meson.project_build_root() / 'qapi' / path



Just using "trace_events_file = 'qapi' / path" might work, since the build is 
nonrecursive.


This says:

ninja: error: '../trace/qapi/qapi-commands-authz.trace-events', needed by 
'trace/trace-events-all', missing and no known rule to make it
make[1]: *** [Makefile:162: run-ninja] Error 1
make[1]: Leaving directory '/work/src/qemu/up/up-trace-qmp-commands/build'
make: *** [GNUmakefile:11: all] Error 2


So it considers the path relative to the current "trace" directory.



If it doesn't, use the custom target object, possibly indexing it as ct[index]. You can 
use a dictionary to store the custom targets and find them from the "path" 
variable.



Oh! Many thanks! Magic. The following hack works:

diff --git a/meson.build b/meson.build
index 20d32fd20d..c42a76a14c 100644
--- a/meson.build
+++ b/meson.build
@@ -39,6 +39,7 @@ qemu_icondir = get_option('datadir') / 'icons'
  config_host_data = configuration_data()
  genh = []
  qapi_trace_events = []
+qapi_trace_events_targets = {}

  target_dirs = config_host['TARGET_DIRS'].split()
  have_linux_user = false
diff --git a/qapi/meson.build b/qapi/meson.build
index 333ca60583..d4de04459d 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -139,6 +139,9 @@ foreach output : qapi_util_outputs
    if output.endswith('.h')
  genh += qapi_files[i]
    endif
+  if output.endswith('.trace-events')
+    qapi_trace_events_targets += {output: qapi_files[i]}
+  endif
    util_ss.add(qapi_files[i])
    i = i + 1
  endforeach
@@ -147,6 +150,9 @@ foreach output : qapi_specific_outputs + 
qapi_nonmodule_outputs
    if output.endswith('.h')
  genh += qapi_files[i]
    endif
+  if output.endswith('.trace-events')
+    qapi_trace_events_targets += {output: qapi_files[i]}
+  endif
    specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: qapi_files[i])
    i = i + 1
  endforeach
diff --git a/trace/meson.build b/trace/meson.build
index 77e44fa68d..daa24c3a2d 100644
--- a/trace/meson.build
+++ b/trace/meson.build
@@ -4,7 +4,7 @@ specific_ss.add(files('control-target.c'))
  trace_events_files = []
  foreach path : [ '.' ] + trace_events_subdirs + qapi_trace_events
    if path.contains('trace-events')
-    trace_events_file = meson.project_build_root() / 'qapi' / path
+    trace_events_file = qapi_trace_events_targets[path]
    else
  trace_events_file = meson.project_source_root() / path / 'trace-events'
    endif





Or even simpler, I can use a list combined from needed qapi_files[] elements. 
So, the solution is to use custom target objects or their indexed subobjects 
instead of raw paths. This way Meson resolves dependencies better.

--
Best regards,
Vladimir

Re: [PATCH qemu] s390x/css: fix PMCW invalid mask

2021-12-23 Thread Halil Pasic

On Wed, 22 Dec 2021 17:46:11 +0100
Cornelia Huck  wrote:

> On Thu, Dec 16 2021, Nico Boehr  wrote:
> 
> > Previously, we required bits 5, 6 and 7 to be zero (0x07 == 0b111). But,
> > as per the principles of operation, bit 5 is ignored in MSCH and bits 0,
> > 1, 6 and 7 need to be zero.
> >
> > As both PMCW_FLAGS_MASK_INVALID and ioinst_schib_valid() are only used
> > by ioinst_handle_msch(), adjust the mask accordingly.
> >
> > Fixes: db1c8f53bfb1 ("s390: Channel I/O basic definitions.")
> > Signed-off-by: Nico Boehr 
> > Reviewed-by: Pierre Morel 
> > Reviewed-by: Halil Pasic 
> > Reviewed-by: Janosch Frank 
> > ---
> >  include/hw/s390x/ioinst.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/hw/s390x/ioinst.h b/include/hw/s390x/ioinst.h
> > index 3771fff9d44d..ea8d0f244492 100644
> > --- a/include/hw/s390x/ioinst.h
> > +++ b/include/hw/s390x/ioinst.h
> > @@ -107,7 +107,7 @@ QEMU_BUILD_BUG_MSG(sizeof(PMCW) != 28, "size of PMCW is 
> > wrong");
> >  #define PMCW_FLAGS_MASK_MP 0x0004
> >  #define PMCW_FLAGS_MASK_TF 0x0002
> >  #define PMCW_FLAGS_MASK_DNV 0x0001
> > -#define PMCW_FLAGS_MASK_INVALID 0x0700
> > +#define PMCW_FLAGS_MASK_INVALID 0xc300  
> 
> Removing bit 5 from this mask makes sense, as it is simply ignored.
> 
> I'm a bit confused about bits 0 and 1, however. They are _QF and _W,
> respectively (just out of the context here), which are in the same class
> as _DNV (i.e. characteristics of the subchannel that cannot be modified
> via msch). Looking at the PoP, I don't see what is supposed to happen if
> the program tries to modify the dnv bit (maybe I'm simply overlooking
> it.) I would naively assume that the w bit should behave in the same way
> (as it does for message subchannels what dnv does for I/O subchannels,
> and the rest of the values are not meaningful if it is not set), and
> probably also the qf bit (as it doesn't make sense for the program to
> turn QDIO capabilities on and off.) The main question is whether trying
> to modify these bits causes an error or is ignored. The PoP suggests an
> error (no idea if the internal architecture agrees, it hopefully does);
> what happens for dnv?

"""
Bits 0, 1, 6, and 7 of word 1, and bits 0-28 of word 6
of the SCHIB operand must be zeros, and bits 9 and
10 of word 1 must not both be ones. When the
extended-I/O-measurement-block facility is installed
and a format-1 measurement block is specified, bits
26-31 of word 11 must be specified as zeros.
"""
(IBM z/Architecture Principles of Operation (SA22-7832-10), 14-8)

The internal architecture agrees.

The DNV bit is ignored. As to why, I don't know; probably for historical
reasons. The PoP tells us that whatever is neither listed as significant
nor checked (resulting in an operation exception if not appropriate)
is ignored:
"""
The remaining
fields of the SCHIB are ignored and do not affect the
processing of MODIFY SUBCHANNEL. (For further
details, see “Subchannel-Information Block” on
page 2
"""
(same page)

Regarding word 1 of the SCHIB the alignment between PoP and AR is
perfect AFAICT.
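
To double-check the arithmetic, here is a small Python sketch. It assumes the 16-bit PMCW flags field occupies the upper half of word 1 and uses IBM bit numbering (bit 0 is the most significant bit), which matches the mask values in the patch; the helper names are hypothetical.

```python
def ibm_bits_to_mask(bits, width=16):
    """Build a mask from IBM-style bit numbers (bit 0 = most significant)."""
    mask = 0
    for bit in bits:
        mask |= 1 << (width - 1 - bit)
    return mask


# Old check: bits 5, 6 and 7 had to be zero.
OLD_INVALID = ibm_bits_to_mask([5, 6, 7])
# New check per the PoP quote: bits 0, 1, 6 and 7 must be zero.
NEW_INVALID = ibm_bits_to_mask([0, 1, 6, 7])


def flags_valid(flags, invalid_mask):
    """A flags value passes when none of the must-be-zero bits are set."""
    return (flags & invalid_mask) == 0


print(hex(OLD_INVALID), hex(NEW_INVALID))  # 0x700 0xc300
```

With the new mask a value with only bit 5 set (0x0400) is accepted, while one with bit 0 set (0x8000) is rejected.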

> 
> We support neither message subchannels nor QDIO in QEMU, so it's
> probably not relevant right now; but it would still be good if we could
> clarify the expected behaviour here :)
> 
> >  
> >  #define PMCW_CHARS_MASK_ST 0x00e0
> >  #define PMCW_CHARS_MASK_MBFC 0x0004  
> 
>

Re: [RFC PATCH v3 22/27] hw/loongarch: Add some devices support for 3A5000.

2021-12-23 Thread Mark Cave-Ayland


On 22/12/2021 08:26, yangxiaojuan wrote:


Hi, Mark

On 12/18/2021 06:02 PM, Mark Cave-Ayland wrote:

On 04/12/2021 12:07, Xiaojuan Yang wrote:


1. Add uart, virtio-net, vga and usb for 3A5000.
2. Add irq set and map for the pci host. Non-pci devices
use irqs 0-16, pci devices use 16-64.
3. Add some unimplemented devices to emulate guest unused
memory space.

Signed-off-by: Xiaojuan Yang 
Signed-off-by: Song Gao 
---
   hw/loongarch/Kconfig|  8 +
   hw/loongarch/loongson3.c| 63 +++--
   hw/pci-host/ls7a.c  | 42 +-
   include/hw/intc/loongarch_ipi.h |  2 ++
   include/hw/pci-host/ls7a.h  |  4 +++
   softmmu/qdev-monitor.c  |  3 +-
   6 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index 468e3acc74..9ea3b92708 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -1,5 +1,13 @@
   config LOONGSON3_LS7A
   bool
+imply VGA_PCI
+imply VIRTIO_VGA
+imply PARALLEL
+imply PCI_DEVICES
+select ISA_BUS
+select SERIAL
+select SERIAL_ISA
+select VIRTIO_PCI
   select PCI_EXPRESS_7A
   select LOONGARCH_IPI
   select LOONGARCH_PCH_PIC
diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index c42f830208..e4a02e7c18 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -10,8 +10,11 @@
   #include "qemu/datadir.h"
   #include "qapi/error.h"
   #include "hw/boards.h"
+#include "hw/char/serial.h"
   #include "sysemu/sysemu.h"
   #include "sysemu/qtest.h"
+#include "hw/irq.h"
+#include "net/net.h"
   #include "sysemu/runstate.h"
   #include "sysemu/reset.h"
   #include "hw/loongarch/loongarch.h"
@@ -20,6 +23,7 @@
   #include "hw/intc/loongarch_pch_pic.h"
   #include "hw/intc/loongarch_pch_msi.h"
   #include "hw/pci-host/ls7a.h"
+#include "hw/misc/unimp.h"
   static void loongarch_cpu_reset(void *opaque)
@@ -91,11 +95,12 @@ static void sysbus_mmio_map_loongarch(SysBusDevice *dev, 
int n,
   memory_region_add_subregion(iocsr, addr, dev->mmio[n].memory);
   }
   -static void loongson3_irq_init(MachineState *machine)
+static PCIBus *loongson3_irq_init(MachineState *machine)
   {
   LoongArchMachineState *lams = LOONGARCH_MACHINE(machine);
-DeviceState *ipi, *extioi, *pch_pic, *pch_msi, *cpudev;
+DeviceState *ipi, *extioi, *pch_pic, *pch_msi, *cpudev, *pciehost;
   SysBusDevice *d;
+PCIBus *pci_bus;
   int cpu, pin, i;
   unsigned long ipi_addr;
   @@ -135,6 +140,10 @@ static void loongson3_irq_init(MachineState *machine)
   sysbus_realize_and_unref(d, &error_fatal);
   sysbus_mmio_map(d, 0, LS7A_IOAPIC_REG_BASE);
   +serial_mm_init(get_system_memory(), LS7A_UART_BASE, 0,
+   qdev_get_gpio_in(pch_pic, LS7A_UART_IRQ - 
PCH_PIC_IRQ_OFFSET),
+   115200, serial_hd(0), DEVICE_LITTLE_ENDIAN);
+
   /* Connect 64 pch_pic irqs to extioi */
   for (int i = 0; i < PCH_PIC_IRQ_NUM; i++) {
   sysbus_connect_irq(d, i, qdev_get_gpio_in(extioi, i));
@@ -149,6 +158,35 @@ static void loongson3_irq_init(MachineState *machine)
   sysbus_connect_irq(d, i,
  qdev_get_gpio_in(extioi, i + PCH_MSI_IRQ_START));
   }
+
+pciehost = qdev_new(TYPE_LS7A_HOST_DEVICE);
+d = SYS_BUS_DEVICE(pciehost);
+sysbus_realize_and_unref(d, &error_fatal);
+pci_bus = PCI_HOST_BRIDGE(pciehost)->bus;
+
+/* Connect 48 pci irq to pch_pic */
+for (i = 0; i < LS7A_PCI_IRQS; i++) {
+qdev_connect_gpio_out(pciehost, i,
+  qdev_get_gpio_in(pch_pic, i + LS7A_DEVICE_IRQS));
+}
+
+return pci_bus;
+}
+
+/* Network support */
+static void network_init(PCIBus *pci_bus)
+{
+int i;
+
+for (i = 0; i < nb_nics; i++) {
+NICInfo *nd = &nd_table[i];
+
+if (!nd->model) {
+nd->model = g_strdup("virtio");
+}
+
+pci_nic_init_nofail(nd, pci_bus, nd->model, NULL);
+}
   }
 static void loongson3_init(MachineState *machine)
@@ -161,6 +199,7 @@ static void loongson3_init(MachineState *machine)
   MemoryRegion *address_space_mem = get_system_memory();
   LoongArchMachineState *lams = LOONGARCH_MACHINE(machine);
   int i;
+PCIBus *pci_bus = NULL;
 if (!cpu_model) {
   cpu_model = LOONGARCH_CPU_TYPE_NAME("Loongson-3A5000");
@@ -207,8 +246,26 @@ static void loongson3_init(MachineState *machine)
   memory_region_add_subregion(address_space_mem, 0x9000, 
&lams->highmem);
   offset += highram_size;
   +/*
+ * There are some invalid guest memory access.
+ * Create some unimplemented devices to emulate this.
+ */
+create_unimplemented_device("ls7a-lpc", 0x10002000, 0x14);
+create_unimplemented_device("pci-dma-cfg", 0x1001041c, 0x4);
+create_unimplemented_device("node-bridge", 0xEFDFB000274, 0x4);
+create_unimplemented_device("ls7a-lionlpc", 0x1fe01400, 0x

[PATCH v2 0/4] trace qmp commands

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

Hi all!

This series aims to add trace points for each qmp command with help of
qapi code generator.

v2:
01: new
02: use qmp_* naming for new trace-events
03: add Philippe's r-b, thanks!
04: rewrite, so that it works now! Thanks to Paolo for fast help!

Vladimir Sementsov-Ogievskiy (4):
  jobs: drop qmp_ trace points
  scripts/qapi/commands: gen_commands(): add add_trace_points argument
  scripts/qapi-gen.py: add --add-trace-points option
  meson: generate trace points for qmp commands

 meson.build  |  1 +
 blockdev.c   |  8 
 job-qmp.c|  6 ---
 block/trace-events   |  9 -
 qapi/meson.build |  9 -
 scripts/qapi/commands.py | 84 ++--
 scripts/qapi/gen.py  | 13 +--
 scripts/qapi/main.py | 10 +++--
 trace-events |  8 
 trace/meson.build| 11 --
 10 files changed, 107 insertions(+), 52 deletions(-)

-- 
2.31.1

[PATCH v2 1/4] jobs: drop qmp_ trace points

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

We are going to implement automatic trace points for qmp commands.
These several trace points are in conflict with upcoming ones. So, drop
them now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 blockdev.c | 8 
 job-qmp.c  | 6 --
 block/trace-events | 9 -
 trace-events   | 8 
 4 files changed, 31 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 0eb2823b1b..10961d81a4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2586,8 +2586,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 goto out;
 }
 
-trace_qmp_block_stream(bs);
-
 out:
 aio_context_release(aio_context);
 }
@@ -3354,7 +3352,6 @@ void qmp_block_job_cancel(const char *device,
 goto out;
 }
 
-trace_qmp_block_job_cancel(job);
 job_user_cancel(&job->job, force, errp);
 out:
 aio_context_release(aio_context);
@@ -3369,7 +3366,6 @@ void qmp_block_job_pause(const char *device, Error **errp)
 return;
 }
 
-trace_qmp_block_job_pause(job);
 job_user_pause(&job->job, errp);
 aio_context_release(aio_context);
 }
@@ -3383,7 +3379,6 @@ void qmp_block_job_resume(const char *device, Error 
**errp)
 return;
 }
 
-trace_qmp_block_job_resume(job);
 job_user_resume(&job->job, errp);
 aio_context_release(aio_context);
 }
@@ -3397,7 +3392,6 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
 return;
 }
 
-trace_qmp_block_job_complete(job);
 job_complete(&job->job, errp);
 aio_context_release(aio_context);
 }
@@ -3411,7 +3405,6 @@ void qmp_block_job_finalize(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_block_job_finalize(job);
 job_ref(&job->job);
 job_finalize(&job->job, errp);
 
@@ -3435,7 +3428,6 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_block_job_dismiss(bjob);
 job = &bjob->job;
 job_dismiss(&job, errp);
 aio_context_release(aio_context);
diff --git a/job-qmp.c b/job-qmp.c
index 829a28aa70..cf0cb9d717 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -57,7 +57,6 @@ void qmp_job_cancel(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_job_cancel(job);
 job_user_cancel(job, true, errp);
 aio_context_release(aio_context);
 }
@@ -71,7 +70,6 @@ void qmp_job_pause(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_job_pause(job);
 job_user_pause(job, errp);
 aio_context_release(aio_context);
 }
@@ -85,7 +83,6 @@ void qmp_job_resume(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_job_resume(job);
 job_user_resume(job, errp);
 aio_context_release(aio_context);
 }
@@ -99,7 +96,6 @@ void qmp_job_complete(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_job_complete(job);
 job_complete(job, errp);
 aio_context_release(aio_context);
 }
@@ -113,7 +109,6 @@ void qmp_job_finalize(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_job_finalize(job);
 job_ref(job);
 job_finalize(job, errp);
 
@@ -136,7 +131,6 @@ void qmp_job_dismiss(const char *id, Error **errp)
 return;
 }
 
-trace_qmp_job_dismiss(job);
 job_dismiss(&job, errp);
 aio_context_release(aio_context);
 }
diff --git a/block/trace-events b/block/trace-events
index 549090d453..5be3e3913b 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -49,15 +49,6 @@ block_copy_read_fail(void *bcs, int64_t start, int ret) "bcs 
%p start %"PRId64"
 block_copy_write_fail(void *bcs, int64_t start, int ret) "bcs %p start 
%"PRId64" ret %d"
 block_copy_write_zeroes_fail(void *bcs, int64_t start, int ret) "bcs %p start 
%"PRId64" ret %d"
 
-# ../blockdev.c
-qmp_block_job_cancel(void *job) "job %p"
-qmp_block_job_pause(void *job) "job %p"
-qmp_block_job_resume(void *job) "job %p"
-qmp_block_job_complete(void *job) "job %p"
-qmp_block_job_finalize(void *job) "job %p"
-qmp_block_job_dismiss(void *job) "job %p"
-qmp_block_stream(void *bs) "bs %p"
-
 # file-win32.c
 file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) 
"acb %p opaque %p offset %"PRId64" count %d type %d"
 
diff --git a/trace-events b/trace-events
index a637a61eba..1265f1e0cc 100644
--- a/trace-events
+++ b/trace-events
@@ -79,14 +79,6 @@ job_state_transition(void *job,  int ret, const char *legal, 
const char *s0, con
 job_apply_verb(void *job, const char *state, const char *verb, const char 
*legal) "job %p in state %s; applying verb %s (%s)"
 job_completed(void *job, int ret) "job %p ret %d"
 
-# job-qmp.c
-qmp_job_cancel(void *job) "job %p"
-qmp_job_pause(void *job) "job %p"
-qmp_job_resume(void *job) "job %p"
-qmp_job_complete(void *job) "job %p"
-qmp_job_finalize(void *job) "job %p"
-qmp_job_dismiss(void *job) "job %p"
-
 
 ### Guest events, keep at bottom
 
-- 
2.31.1

[PATCH v2 2/4] scripts/qapi/commands: gen_commands(): add add_trace_points argument

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

Add the possibility to generate trace points for each qmp command.

We should generate both trace points and trace-events file, for further
trace point code generation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/qapi/commands.py | 84 ++--
 1 file changed, 73 insertions(+), 11 deletions(-)

diff --git a/scripts/qapi/commands.py b/scripts/qapi/commands.py
index 21001bbd6b..9691c11f96 100644
--- a/scripts/qapi/commands.py
+++ b/scripts/qapi/commands.py
@@ -53,7 +53,8 @@ def gen_command_decl(name: str,
 def gen_call(name: str,
  arg_type: Optional[QAPISchemaObjectType],
  boxed: bool,
- ret_type: Optional[QAPISchemaType]) -> str:
+ ret_type: Optional[QAPISchemaType],
+ add_trace_points: bool) -> str:
 ret = ''
 
 argstr = ''
@@ -71,21 +72,65 @@ def gen_call(name: str,
 if ret_type:
 lhs = 'retval = '
 
-ret = mcgen('''
+qmp_name = f'qmp_{c_name(name)}'
+upper = qmp_name.upper()
+
+if add_trace_points:
+ret += mcgen('''
+
+if (trace_event_get_state_backends(TRACE_%(upper)s)) {
+g_autoptr(GString) req_json = qobject_to_json(QOBJECT(args));
+trace_%(qmp_name)s("", req_json->str);
+}
+''',
+ upper=upper, qmp_name=qmp_name)
+
+ret += mcgen('''
 
 %(lhs)sqmp_%(c_name)s(%(args)s&err);
-error_propagate(errp, err);
 ''',
 c_name=c_name(name), args=argstr, lhs=lhs)
-if ret_type:
-ret += mcgen('''
+
+ret += mcgen('''
 if (err) {
+''')
+
+if add_trace_points:
+ret += mcgen('''
+trace_%(qmp_name)s("FAIL: ", error_get_pretty(err));
+''',
+ qmp_name=qmp_name)
+
+ret += mcgen('''
+error_propagate(errp, err);
 goto out;
 }
+''')
+
+if ret_type:
+ret += mcgen('''
 
 qmp_marshal_output_%(c_name)s(retval, ret, errp);
 ''',
  c_name=ret_type.c_name())
+
+if add_trace_points:
+if ret_type:
+ret += mcgen('''
+
+if (trace_event_get_state_backends(TRACE_%(upper)s)) {
+g_autoptr(GString) ret_json = qobject_to_json(*ret);
+trace_%(qmp_name)s("RET:", ret_json->str);
+}
+''',
+ upper=upper, qmp_name=qmp_name)
+else:
+ret += mcgen('''
+
+trace_%(qmp_name)s("SUCCESS", "");
+''',
+ qmp_name=qmp_name)
+
 return ret
 
 
@@ -122,10 +167,14 @@ def gen_marshal_decl(name: str) -> str:
  proto=build_marshal_proto(name))
 
 
+def gen_trace(name: str) -> str:
+return f'qmp_{c_name(name)}(const char *tag, const char *json) "%s%s"\n'
+
 def gen_marshal(name: str,
 arg_type: Optional[QAPISchemaObjectType],
 boxed: bool,
-ret_type: Optional[QAPISchemaType]) -> str:
+ret_type: Optional[QAPISchemaType],
+add_trace_points: bool) -> str:
 have_args = boxed or (arg_type and not arg_type.is_empty())
 if have_args:
 assert arg_type is not None
@@ -180,7 +229,7 @@ def gen_marshal(name: str,
 }
 ''')
 
-ret += gen_call(name, arg_type, boxed, ret_type)
+ret += gen_call(name, arg_type, boxed, ret_type, add_trace_points)
 
 ret += mcgen('''
 
@@ -238,11 +287,12 @@ def gen_register_command(name: str,
 
 
 class QAPISchemaGenCommandVisitor(QAPISchemaModularCVisitor):
-def __init__(self, prefix: str):
+def __init__(self, prefix: str, add_trace_points: bool):
 super().__init__(
 prefix, 'qapi-commands',
 ' * Schema-defined QAPI/QMP commands', None, __doc__)
 self._visited_ret_types: Dict[QAPIGenC, Set[QAPISchemaType]] = {}
+self.add_trace_points = add_trace_points
 
 def _begin_user_module(self, name: str) -> None:
 self._visited_ret_types[self._genc] = set()
@@ -261,6 +311,15 @@ def _begin_user_module(self, name: str) -> None:
 
 ''',
  commands=commands, visit=visit))
+
+if self.add_trace_points and c_name(commands) != 'qapi_commands':
+self._genc.add(mcgen('''
+#include "trace/trace-qapi.h"
+#include "qapi/qmp/qjson.h"
+#include "trace/trace-%(nm)s_trace_events.h"
+''',
+ nm=c_name(commands)))
+
 self._genh.add(mcgen('''
 #include "%(types)s.h"
 
@@ -322,7 +381,9 @@ def visit_command(self,
 with ifcontext(ifcond, self._genh, self._genc):
 self._genh.add(gen_command_decl(name, arg_type, boxed, ret_type))
 self._genh.add(gen_marshal_decl(name))
-self._genc.add(gen_marshal(name, arg_type, boxed, ret_type))
+self._genc.add(gen_marshal(name, arg_type, boxed, ret_type,
+   self.add_trace_points))
+self._gent.add(gen_trace(name))
 with self._temp_module('./init'):
 with ifcontext(ifcond, self._genh, sel

[PATCH v2 3/4] scripts/qapi-gen.py: add --add-trace-points option

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

Add an option to generate trace points. We should generate both trace
points and trace-events files for further trace point code generation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 
---
 scripts/qapi/gen.py  | 13 ++---
 scripts/qapi/main.py | 10 +++---
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/scripts/qapi/gen.py b/scripts/qapi/gen.py
index 995a97d2b8..605b3fe68a 100644
--- a/scripts/qapi/gen.py
+++ b/scripts/qapi/gen.py
@@ -251,7 +251,7 @@ def __init__(self,
 self._builtin_blurb = builtin_blurb
 self._pydoc = pydoc
 self._current_module: Optional[str] = None
-self._module: Dict[str, Tuple[QAPIGenC, QAPIGenH]] = {}
+self._module: Dict[str, Tuple[QAPIGenC, QAPIGenH, QAPIGen]] = {}
 self._main_module: Optional[str] = None
 
 @property
@@ -264,6 +264,11 @@ def _genh(self) -> QAPIGenH:
 assert self._current_module is not None
 return self._module[self._current_module][1]
 
+@property
+def _gent(self) -> QAPIGen:
+assert self._current_module is not None
+return self._module[self._current_module][2]
+
 @staticmethod
 def _module_dirname(name: str) -> str:
 if QAPISchemaModule.is_user_module(name):
@@ -293,7 +298,8 @@ def _add_module(self, name: str, blurb: str) -> None:
 basename = self._module_filename(self._what, name)
 genc = QAPIGenC(basename + '.c', blurb, self._pydoc)
 genh = QAPIGenH(basename + '.h', blurb, self._pydoc)
-self._module[name] = (genc, genh)
+gent = QAPIGen(basename + '.trace-events')
+self._module[name] = (genc, genh, gent)
 self._current_module = name
 
 @contextmanager
@@ -304,11 +310,12 @@ def _temp_module(self, name: str) -> Iterator[None]:
 self._current_module = old_module
 
 def write(self, output_dir: str, opt_builtins: bool = False) -> None:
-for name, (genc, genh) in self._module.items():
+for name, (genc, genh, gent) in self._module.items():
 if QAPISchemaModule.is_builtin_module(name) and not opt_builtins:
 continue
 genc.write(output_dir)
 genh.write(output_dir)
+gent.write(output_dir)
 
 def _begin_builtin_module(self) -> None:
 pass
diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
index f2ea6e0ce4..3adf0319cf 100644
--- a/scripts/qapi/main.py
+++ b/scripts/qapi/main.py
@@ -32,7 +32,8 @@ def generate(schema_file: str,
  output_dir: str,
  prefix: str,
  unmask: bool = False,
- builtins: bool = False) -> None:
+ builtins: bool = False,
+ add_trace_points: bool = False) -> None:
 """
 Generate C code for the given schema into the target directory.
 
@@ -49,7 +50,7 @@ def generate(schema_file: str,
 schema = QAPISchema(schema_file)
 gen_types(schema, output_dir, prefix, builtins)
 gen_visit(schema, output_dir, prefix, builtins)
-gen_commands(schema, output_dir, prefix)
+gen_commands(schema, output_dir, prefix, add_trace_points)
 gen_events(schema, output_dir, prefix)
 gen_introspect(schema, output_dir, prefix, unmask)
 
@@ -74,6 +75,8 @@ def main() -> int:
 parser.add_argument('-u', '--unmask-non-abi-names', action='store_true',
 dest='unmask',
 help="expose non-ABI names in introspection")
+parser.add_argument('--add-trace-points', action='store_true',
+help="add trace points to qmp marshals")
 parser.add_argument('schema', action='store')
 args = parser.parse_args()
 
@@ -88,7 +91,8 @@ def main() -> int:
  output_dir=args.output_dir,
  prefix=args.prefix,
  unmask=args.unmask,
- builtins=args.builtins)
+ builtins=args.builtins,
+ add_trace_points=args.add_trace_points)
 except QAPIError as err:
 print(f"{sys.argv[0]}: {str(err)}", file=sys.stderr)
 return 1
-- 
2.31.1

[PATCH v2 4/4] meson: generate trace points for qmp commands

2021-12-23 Thread Vladimir Sementsov-Ogievskiy

1. Use --add-trace-points when generating QMP commands
2. Add corresponding .trace-events files as outputs in qapi_files
   custom target
3. Define global qapi_trace_events list of .trace-events file targets,
   to fill in qapi/meson.build and to use in trace/meson.build
4. In trace/meson.build use the new array as an additional source of
   .trace-events files to be processed

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 meson.build   |  1 +
 qapi/meson.build  |  9 -
 trace/meson.build | 11 ---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/meson.build b/meson.build
index 17c7280f78..fcb130f163 100644
--- a/meson.build
+++ b/meson.build
@@ -38,6 +38,7 @@ qemu_icondir = get_option('datadir') / 'icons'
 
 config_host_data = configuration_data()
 genh = []
+qapi_trace_events = []
 
 target_dirs = config_host['TARGET_DIRS'].split()
 have_linux_user = false
diff --git a/qapi/meson.build b/qapi/meson.build
index c0c49c15e4..826e6c2a0a 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -114,6 +114,7 @@ foreach module : qapi_all_modules
   'qapi-events-@0@.h'.format(module),
   'qapi-commands-@0@.c'.format(module),
   'qapi-commands-@0@.h'.format(module),
+  'qapi-commands-@0@.trace-events'.format(module),
 ]
   endif
   if module.endswith('-target')
@@ -126,7 +127,7 @@ endforeach
 qapi_files = custom_target('shared QAPI source files',
   output: qapi_util_outputs + qapi_specific_outputs + qapi_nonmodule_outputs,
   input: [ files('qapi-schema.json') ],
-  command: [ qapi_gen, '-o', 'qapi', '-b', '@INPUT0@' ],
+  command: [ qapi_gen, '-o', 'qapi', '-b', '@INPUT0@', '--add-trace-points' ],
   depend_files: [ qapi_inputs, qapi_gen_depends ])
 
 # Now go through all the outputs and add them to the right sourceset.
@@ -137,6 +138,9 @@ foreach output : qapi_util_outputs
   if output.endswith('.h')
 genh += qapi_files[i]
   endif
+  if output.endswith('.trace-events')
+qapi_trace_events += qapi_files[i]
+  endif
   util_ss.add(qapi_files[i])
   i = i + 1
 endforeach
@@ -145,6 +149,9 @@ foreach output : qapi_specific_outputs + qapi_nonmodule_outputs
   if output.endswith('.h')
 genh += qapi_files[i]
   endif
+  if output.endswith('.trace-events')
+qapi_trace_events += qapi_files[i]
+  endif
   specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: qapi_files[i])
   i = i + 1
 endforeach
diff --git a/trace/meson.build b/trace/meson.build
index 573dd699c6..c4794a1f2a 100644
--- a/trace/meson.build
+++ b/trace/meson.build
@@ -2,10 +2,15 @@
 specific_ss.add(files('control-target.c'))
 
 trace_events_files = []
-foreach dir : [ '.' ] + trace_events_subdirs
-  trace_events_file = meson.project_source_root() / dir / 'trace-events'
+foreach item : [ '.' ] + trace_events_subdirs + qapi_trace_events
+  if item in qapi_trace_events
+trace_events_file = item
+group_name = item.full_path().split('/')[-1].underscorify()
+  else
+trace_events_file = meson.project_source_root() / item / 'trace-events'
+group_name = item == '.' ? 'root' : item.underscorify()
+  endif
   trace_events_files += [ trace_events_file ]
-  group_name = dir == '.' ? 'root' : dir.underscorify()
   group = '--group=' + group_name
   fmt = '@0@-' + group_name + '.@1@'
 
-- 
2.31.1

Re: [PATCH qemu] s390x/css: fix PMCW invalid mask

2021-12-23 Thread Cornelia Huck

On Thu, Dec 23 2021, Halil Pasic  wrote:

> On Wed, 22 Dec 2021 17:46:11 +0100
> Cornelia Huck  wrote:
>
>> On Thu, Dec 16 2021, Nico Boehr  wrote:
>> 
>> > Previously, we required bits 5, 6 and 7 to be zero (0x07 == 0b111). But,
>> > as per the principles of operation, bit 5 is ignored in MSCH and bits 0,
>> > 1, 6 and 7 need to be zero.
>> >
>> > As both PMCW_FLAGS_MASK_INVALID and ioinst_schib_valid() are only used
>> > by ioinst_handle_msch(), adjust the mask accordingly.
>> >
>> > Fixes: db1c8f53bfb1 ("s390: Channel I/O basic definitions.")
>> > Signed-off-by: Nico Boehr 
>> > Reviewed-by: Pierre Morel 
>> > Reviewed-by: Halil Pasic 
>> > Reviewed-by: Janosch Frank 
>> > ---
>> >  include/hw/s390x/ioinst.h | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/include/hw/s390x/ioinst.h b/include/hw/s390x/ioinst.h
>> > index 3771fff9d44d..ea8d0f244492 100644
>> > --- a/include/hw/s390x/ioinst.h
>> > +++ b/include/hw/s390x/ioinst.h
>> > @@ -107,7 +107,7 @@ QEMU_BUILD_BUG_MSG(sizeof(PMCW) != 28, "size of PMCW is wrong");
>> >  #define PMCW_FLAGS_MASK_MP 0x0004
>> >  #define PMCW_FLAGS_MASK_TF 0x0002
>> >  #define PMCW_FLAGS_MASK_DNV 0x0001
>> > -#define PMCW_FLAGS_MASK_INVALID 0x0700
>> > +#define PMCW_FLAGS_MASK_INVALID 0xc300  
>> 
>> Removing bit 5 from this mask makes sense, as it is simply ignored.
>> 
>> I'm a bit confused about bits 0 and 1, however. They are _QF and _W,
>> respectively (just out of the context here), which are in the same class
>> as _DNV (i.e. characteristics of the subchannel that cannot be modified
>> via msch). Looking at the PoP, I don't see what is supposed to happen if
>> the program tries to modify the dnv bit (maybe I'm simply overlooking
>> it.) I would naively assume that the w bit should behave in the same way
>> (as it does for message subchannels what dnv does for I/O subchannels,
>> and the rest of the values are not meaningful if it is not set), and
>> probably also the qf bit (as it doesn't make sense for the program to
>> turn QDIO capabilities on and off.) The main question is whether trying
>> to modify these bits causes an error or is ignored. The PoP suggests an
>> error (no idea if the internal architecture agrees, it hopefully does);
>> what happens for dnv?
>
> """
> Bits 0, 1, 6, and 7 of word 1, and bits 0-28 of word 6
> of the SCHIB operand must be zeros, and bits 9 and
> 10 of word 1 must not both be ones. When the
> extended-I/O-measurement-block facility is installed
> and a format-1 measurement block is specified, bits
> 26-31 of word 11 must be specified as zeros.
> """
> (IBM z/Architecture Principles of Operation (SA22-7832-10), 14-8)
>
> The internal architecture agrees.

Thanks for checking.

>
> DNV bit is ignored. Regarding why, I don't know. Probably for historic
> reasons.

Yeah, it's a bit odd, "for historic reason" seems plausible.

> The PoP tells us that whatever is not listed as significant
> or checked and results in an operation exception if not appropriate
> is ignored:
> """
> The remaining
> fields of the SCHIB are ignored and do not affect the
> processing of MODIFY SUBCHANNEL. (For further
> details, see “Subchannel-Information Block” on
> page 2
> """
> (same page)
>
> Regarding word 1 of the SCHIB the alignment between PoP and AR is
> perfect AFAICT.
>
>> 
>> We support neither message subchannels nor QDIO in QEMU, so it's
>> probably not relevant right now; but it would still be good if we could
>> clarify the expected behaviour here :)
>> 
>> >  
>> >  #define PMCW_CHARS_MASK_ST 0x00e0
>> >  #define PMCW_CHARS_MASK_MBFC 0x0004  
>> 
>> 

In that case,

Reviewed-by: Cornelia Huck
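
As a worked check of the mask discussed above: the PoP numbers bits from the
most significant end, so bits 0, 1, 6 and 7 of a 16-bit field are 0x8000,
0x4000, 0x0200 and 0x0100. The sketch below assumes the PMCW flags occupy the
first halfword of word 1; the helper names are illustrative only and do not
exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the halfword. */
static uint16_t ibm_bit16(unsigned bit)
{
    assert(bit < 16);
    return (uint16_t)(0x8000u >> bit);
}

/* Bits that must be zero for MSCH per the PoP quote: 0, 1, 6 and 7. */
static uint16_t pmcw_flags_mask_invalid_new(void)
{
    return ibm_bit16(0) | ibm_bit16(1) | ibm_bit16(6) | ibm_bit16(7);
}

/* Bits rejected by the old mask: 5, 6 and 7. */
static uint16_t pmcw_flags_mask_invalid_old(void)
{
    return ibm_bit16(5) | ibm_bit16(6) | ibm_bit16(7);
}
```

The two helpers evaluate to 0xc300 and 0x0700 respectively, matching the new
and old values of PMCW_FLAGS_MASK_INVALID in the patch.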

Re: [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically

2021-12-23 Thread Peter Xu

Hi, Yong,

On Tue, Dec 14, 2021 at 07:07:32PM +0800, huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
> 
> Introduce the third method GLOBAL_DIRTY_LIMIT of dirty
> tracking to calculate the dirty rate periodically for dirty restraint.
> 
> Implement a thread to calculate the dirty rate periodically, which will
> be used for the dirty page limit.
> 
> Add dirtylimit.h to introduce the utility functions for the dirty
> limit implementation.

Sorry to be late on reading it, my apologies.

> 
> Signed-off-by: Hyman Huang(黄勇) 
> ---
>  include/exec/memory.h   |   5 +-
>  include/sysemu/dirtylimit.h |  51 ++
>  migration/dirtyrate.c   | 160 +---
>  migration/dirtyrate.h   |   2 +
>  4 files changed, 207 insertions(+), 11 deletions(-)
>  create mode 100644 include/sysemu/dirtylimit.h
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 20f1b27..606bec8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
>  /* Dirty tracking enabled because measuring dirty rate */
>  #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
>  
> -#define GLOBAL_DIRTY_MASK  (0x3)
> +/* Dirty tracking enabled because dirty limit */
> +#define GLOBAL_DIRTY_LIMIT  (1U << 2)
> +
> +#define GLOBAL_DIRTY_MASK  (0x7)
>  
>  extern unsigned int global_dirty_tracking;
>  
> diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
> new file mode 100644
> index 000..34e48f8
> --- /dev/null
> +++ b/include/sysemu/dirtylimit.h
> @@ -0,0 +1,51 @@
> +/*
> + * dirty limit helper functions
> + *
> + * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
> + *
> + * Authors:
> + *  Hyman Huang(黄勇) 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#ifndef QEMU_DIRTYRLIMIT_H
> +#define QEMU_DIRTYRLIMIT_H
> +
> +#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */
> +
> +/**
> + * dirtylimit_calc_current
> + *
> + * get current dirty page rate for specified virtual CPU.
> + */
> +int64_t dirtylimit_calc_current(int cpu_index);
> +
> +/**
> + * dirtylimit_calc_start
> + *
> + * start dirty page rate calculation thread.
> + */
> +void dirtylimit_calc_start(void);
> +
> +/**
> + * dirtylimit_calc_quit
> + *
> + * quit dirty page rate calculation thread.
> + */
> +void dirtylimit_calc_quit(void);
> +
> +/**
> + * dirtylimit_calc_state_init
> + *
> + * initialize dirty page rate calculation state.
> + */
> +void dirtylimit_calc_state_init(int max_cpus);
> +
> +/**
> + * dirtylimit_calc_state_finalize
> + *
> + * finalize dirty page rate calculation state.
> + */
> +void dirtylimit_calc_state_finalize(void);
> +#endif

Since dirtylimit and dirtyrate look so alike, I wonder whether it would be
easier to just reuse dirtyrate.h; after all, you reused dirtyrate.c.

> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index d65e744..e8d4e4a 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -27,6 +27,7 @@
>  #include "qapi/qmp/qdict.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/runstate.h"
> +#include "sysemu/dirtylimit.h"
>  #include "exec/memory.h"
>  
>  /*
> @@ -46,6 +47,155 @@ static struct DirtyRateStat DirtyStat;
>  static DirtyRateMeasureMode dirtyrate_mode =
>  DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
>  
> +struct {
> +DirtyRatesData data;
> +bool quit;
> +QemuThread thread;
> +} *dirtylimit_calc_state;
> +
> +static void dirtylimit_global_dirty_log_start(void)
> +{
> +qemu_mutex_lock_iothread();
> +memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
> +qemu_mutex_unlock_iothread();
> +}
> +
> +static void dirtylimit_global_dirty_log_stop(void)
> +{
> +qemu_mutex_lock_iothread();
> +memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
> +qemu_mutex_unlock_iothread();
> +}

This is merely dirtyrate_global_dirty_log_start/stop but with a different flag.

Let's introduce global_dirty_log_change() with BQL?

  global_dirty_log_change(flag, start)
  {
    qemu_mutex_lock_iothread();
    if (start) {
        memory_global_dirty_log_start(flag);
    } else {
        memory_global_dirty_log_stop(flag);
    }
    qemu_mutex_unlock_iothread();
  }

Then we merge 4 functions into one.

We can also have a BQL-version of global_dirty_log_sync() in the same patch if
you think the above is helpful.
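
To make the suggestion concrete, here is a standalone sketch of such a helper.
The lock and dirty-log functions below are local stand-ins that only mimic the
QEMU functions of the same name (they track a counter and a flag mask), so
that the example compiles on its own; the flag values are the ones from the
patch.

```c
#include <assert.h>
#include <stdbool.h>

#define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
#define GLOBAL_DIRTY_LIMIT      (1U << 2)

/* Stand-ins for QEMU state and functions, for illustration only. */
static unsigned int global_dirty_tracking;
static int bql_depth;

static void qemu_mutex_lock_iothread(void)   { bql_depth++; }
static void qemu_mutex_unlock_iothread(void) { bql_depth--; }

static void memory_global_dirty_log_start(unsigned int flags)
{
    assert(bql_depth > 0);          /* must be called with the BQL held */
    global_dirty_tracking |= flags;
}

static void memory_global_dirty_log_stop(unsigned int flags)
{
    assert(bql_depth > 0);          /* must be called with the BQL held */
    global_dirty_tracking &= ~flags;
}

/* One helper replaces the four start/stop wrappers. */
static void global_dirty_log_change(unsigned int flag, bool start)
{
    qemu_mutex_lock_iothread();
    if (start) {
        memory_global_dirty_log_start(flag);
    } else {
        memory_global_dirty_log_stop(flag);
    }
    qemu_mutex_unlock_iothread();
}
```

Callers would then use global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true) and
global_dirty_log_change(GLOBAL_DIRTY_DIRTY_RATE, false) and so on, instead of
a dedicated wrapper per flag and direction.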

> +
> +static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
> + CPUState *cpu, bool start)
> +{
> +if (start) {
> +dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
> +} else {
> +dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
> +}
> +}
> +
> +static void dirtylimit_calc_func(void)

Would you still consider merging this with calculate_dirtyrate_dirty_ring?

I still don't see why it can't.

Maybe it cannot be directly reused, but the whole logic is really, really
simi

Re: [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU

2021-12-23 Thread Emanuele Giuseppe Esposito





On 20/12/2021 11:47, Vladimir Sementsov-Ogievskiy wrote:

20.12.2021 13:34, Emanuele Giuseppe Esposito wrote:



On 18/12/2021 12:53, Vladimir Sementsov-Ogievskiy wrote:

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:

Once job lock is used and aiocontext is removed, mirror has
to perform job operations under the same critical section,
using the helpers prepared in previous commit.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/mirror.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 00089e519b..f22fa7da6e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
  BlockDriverState *target_bs;
  BlockDriverState *mirror_top_bs;
  Error *local_err = NULL;
-    bool abort = job->ret < 0;
+    bool abort = job_has_failed(job);
  int ret = 0;
  if (s->prepared) {
@@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error **errp)
  s->should_complete = true;
  /* If the job is paused, it will be re-entered when it is resumed */
-    if (!job->paused) {
-    job_enter(job);
-    }
+    job_enter_not_paused(job);
  }
  static void coroutine_fn mirror_pause(Job *job)
@@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
   * from one of our own drain sections, to avoid a deadlock waiting for
   * ourselves.
   */
-    if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
+    if (job_not_paused_nor_cancelled(&s->common.job) && !s->in_drain) {
  return true;
  }



Why introduce a separate API function for every use case?

Could we instead just use WITH_JOB_LOCK_GUARD() ?



This implies making the struct job_mutex public. Is that ok for you?



Yes, I think it's OK.

Alternatively, you can use job_lock() / job_unlock(), or even rewrite the
WITH_JOB_LOCK_GUARD() macro using job_lock/job_unlock, to keep the mutex
private. But I don't think it's really worth it now.


Note that struct Job is already public, so if we use a per-job mutex in the
future it is still not a problem. Only when we decide to make struct Job
private will we have to decide something about JOB_LOCK_GUARD(), and at
that point we'll just rewrite it to work through some helper function
instead of directly touching the mutex.





Ok I will do that. Just FYI the initial idea was that drivers like 
monitor would not need to know about job_mutex lock, that is why I made 
the helpers in mirror.c.


Thank you,
Emanuele
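
For illustration, the alternative mentioned above (a guard macro built only on
job_lock()/job_unlock(), so the mutex stays private) could look roughly like
the sketch below. Everything here is a hypothetical stand-in: the macro name,
the pthread-based lock, the cut-down Job struct and the helper are not the
QEMU definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* The mutex stays private to this file; users only see lock/unlock. */
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static int job_lock_depth;

static void job_lock(void)
{
    pthread_mutex_lock(&job_mutex);
    job_lock_depth++;
}

static void job_unlock(void)
{
    assert(job_lock_depth > 0);
    job_lock_depth--;
    pthread_mutex_unlock(&job_mutex);
}

/*
 * Runs the following block exactly once with the job lock held.
 * Caveat of this simple form: leaving the block with break, return or
 * goto skips job_unlock().
 */
#define WITH_JOB_LOCK_GUARD_SKETCH() \
    for (int once_ = (job_lock(), 1); once_; once_ = (job_unlock(), 0))

/* Cut-down stand-in for struct Job. */
typedef struct Job {
    bool paused;
    bool cancelled;
} Job;

static bool job_not_paused_nor_cancelled_sketch(Job *job)
{
    bool ret = false;

    WITH_JOB_LOCK_GUARD_SKETCH() {
        ret = !job->paused && !job->cancelled;
    }
    return ret;
}
```

With a guard like this, a driver can read several job fields in one critical
section without a dedicated helper per combination of fields.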

Re: [RFC PATCH v2 11/14] block_job_query: remove atomic read

2021-12-23 Thread Emanuele Giuseppe Esposito





On 18/12/2021 13:07, Vladimir Sementsov-Ogievskiy wrote:

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:

Not sure what the atomic here was supposed to do, since job.busy
is protected by the job lock.


In block_job_query() it is protected only since the previous commit. So,
before the previous commit, the atomic read makes sense.


To me it doesn't really, because it is protected with job_lock/unlock in
job.c, and here it is read with an atomic. But maybe I am missing something.


Hmm, but job_lock() is still a no-op at this point. So, actually, it
would be more correct to drop this qatomic_read after patch 14.




Will do.

Thank you,
Emanuele

[PULL 02/15] meson: reuse common_user_inc when building files specific to user-mode emulators

2021-12-23 Thread Paolo Bonzini

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index f45ecf31bd..b0af02b805 100644
--- a/meson.build
+++ b/meson.build
@@ -2897,6 +2897,7 @@ foreach target : target_dirs
   else
 abi = config_target['TARGET_ABI_DIR']
 target_type='user'
+target_inc += common_user_inc
 qemu_target_name = 'qemu-' + target_name
 if target_base_arch in target_user_arch
   t = target_user_arch[target_base_arch].apply(config_target, strict: false)
@@ -2905,7 +2906,6 @@ foreach target : target_dirs
 endif
 if 'CONFIG_LINUX_USER' in config_target
   base_dir = 'linux-user'
-  target_inc += include_directories('linux-user/host/' / host_arch)
 endif
 if 'CONFIG_BSD_USER' in config_target
   base_dir = 'bsd-user'
-- 
2.33.1

[PULL 00/15] Build system and KVM changes for 2021-12-23

2021-12-23 Thread Paolo Bonzini

The following changes since commit 2bf40d0841b942e7ba12953d515e62a436f0af84:

  Merge tag 'pull-user-20211220' of https://gitlab.com/rth7680/qemu into staging (2021-12-20 13:20:07 -0800)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to c139f026aa685e6b27a5a8ecb3272d4ed1700312:

  KVM: x86: ignore interrupt_bitmap field of KVM_GET/SET_SREGS (2021-12-23 10:05:28 +0100)


* configure and meson cleanups
* KVM_GET/SET_SREGS2 support for x86
* fix occasional container build failures for debian-tricore-cross


Maxim Levitsky (1):
  KVM: use KVM_{GET|SET}_SREGS2 when supported.

Paolo Bonzini (13):
  docker: include bison in debian-tricore-cross
  meson: reuse common_user_inc when building files specific to user-mode emulators
  user: move common-user includes to a subdirectory of {bsd,linux}-user/
  meson: cleanup common-user/ build
  configure: simplify creation of plugin symbol list
  configure: do not set bsd_user/linux_user early
  configure, makefile: remove traces of really old files
  configure: parse --enable/--disable-strip automatically, flip default
  configure: move non-command-line variables away from command-line parsing section
  meson: build contrib/ executables after generated headers
  configure, meson: move config-poison.h to meson
  meson: add comments in the target-specific flags section
  KVM: x86: ignore interrupt_bitmap field of KVM_GET/SET_SREGS

Thomas Huth (1):
  block/file-posix: Simplify the XFS_IOC_DIOINFO handling

 Makefile   |  11 +-
 block/file-posix.c |  37 ++---
 bsd-user/{ => include}/special-errno.h |   0
 bsd-user/meson.build   |   2 +-
 common-user/meson.build|   2 +-
 configure  | 182 +++--
 contrib/elf2dmp/meson.build|   2 +-
 contrib/ivshmem-client/meson.build |   2 +-
 contrib/ivshmem-server/meson.build |   2 +-
 contrib/rdmacm-mux/meson.build |   2 +-
 .../{ => include}/host/aarch64/host-signal.h   |   0
 linux-user/{ => include}/host/alpha/host-signal.h  |   0
 linux-user/{ => include}/host/arm/host-signal.h|   0
 linux-user/{ => include}/host/i386/host-signal.h   |   0
 linux-user/{ => include}/host/mips/host-signal.h   |   0
 linux-user/{ => include}/host/ppc/host-signal.h|   0
 linux-user/{ => include}/host/ppc64/host-signal.h  |   0
 linux-user/{ => include}/host/riscv/host-signal.h  |   0
 linux-user/{ => include}/host/s390/host-signal.h   |   0
 linux-user/{ => include}/host/s390x/host-signal.h  |   0
 linux-user/{ => include}/host/sparc/host-signal.h  |   0
 .../{ => include}/host/sparc64/host-signal.h   |   0
 linux-user/{ => include}/host/x32/host-signal.h|   0
 linux-user/{ => include}/host/x86_64/host-signal.h |   0
 linux-user/{ => include}/special-errno.h   |   0
 linux-user/meson.build |   4 +-
 meson.build|  33 ++--
 pc-bios/s390-ccw/Makefile  |   2 -
 plugins/meson.build|  11 +-
 scripts/make-config-poison.sh  |  16 ++
 scripts/meson-buildoptions.py  |  21 ++-
 scripts/meson-buildoptions.sh  |   3 +
 target/i386/cpu.h  |   3 +
 target/i386/kvm/kvm.c  | 130 +--
 target/i386/machine.c  |  29 
 .../docker/dockerfiles/debian-tricore-cross.docker |   1 +
 36 files changed, 259 insertions(+), 236 deletions(-)
 rename bsd-user/{ => include}/special-errno.h (100%)
 rename linux-user/{ => include}/host/aarch64/host-signal.h (100%)
 rename linux-user/{ => include}/host/alpha/host-signal.h (100%)
 rename linux-user/{ => include}/host/arm/host-signal.h (100%)
 rename linux-user/{ => include}/host/i386/host-signal.h (100%)
 rename linux-user/{ => include}/host/mips/host-signal.h (100%)
 rename linux-user/{ => include}/host/ppc/host-signal.h (100%)
 rename linux-user/{ => include}/host/ppc64/host-signal.h (100%)
 rename linux-user/{ => include}/host/riscv/host-signal.h (100%)
 rename linux-user/{ => include}/host/s390/host-signal.h (100%)
 rename linux-user/{ => include}/host/s390x/host-signal.h (100%)
 rename linux-user/{ => include}/host/sparc/host-signal.h (100%)
 rename linux-user/{ => include}/host/sparc64/host-signal.h (100%)
 rename linux-user/{ => include}/host/x32/host-signal.h (100%)
 rename linux-user/{ => include}/host/x86_64/host-signal.h (100%)
 rename linux-user/{ => include}/special-errno.h (100%)
 create

[PULL 14/15] KVM: use KVM_{GET|SET}_SREGS2 when supported.

2021-12-23 Thread Paolo Bonzini

From: Maxim Levitsky 

This allows PDPTRs to be made part of the migration
stream, so that they are not reloaded after migration, which
would be against the x86 spec.

Signed-off-by: Maxim Levitsky 
Message-Id: <20211101132300.192584-2-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.h |   3 ++
 target/i386/kvm/kvm.c | 108 +-
 target/i386/machine.c |  29 
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 04f2b790c9..9911d7c871 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1455,6 +1455,9 @@ typedef struct CPUX86State {
 SegmentCache idt; /* only base and limit are used */
 
 target_ulong cr[5]; /* NOTE: cr1 is unused */
+
+bool pdptrs_valid;
+uint64_t pdptrs[4];
 int32_t a20_mask;
 
 BNDReg bnd_regs[4];
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 13f8e30c2a..d81745620b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -124,6 +124,7 @@ static uint32_t num_architectural_pmu_fixed_counters;
 static int has_xsave;
 static int has_xcrs;
 static int has_pit_state2;
+static int has_sregs2;
 static int has_exception_payload;
 
 static bool has_msr_mcg_ext_ctl;
@@ -2324,6 +2325,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 has_xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
 has_xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
 has_pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);
+has_sregs2 = kvm_check_extension(s, KVM_CAP_SREGS2) > 0;
 
 hv_vpindex_settable = kvm_check_extension(s, KVM_CAP_HYPERV_VP_INDEX);
 
@@ -2650,6 +2652,61 @@ static int kvm_put_sregs(X86CPU *cpu)
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_SREGS, &sregs);
 }
 
+static int kvm_put_sregs2(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+struct kvm_sregs2 sregs;
+int i;
+
+sregs.flags = 0;
+
+if ((env->eflags & VM_MASK)) {
+set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
+set_v8086_seg(&sregs.ds, &env->segs[R_DS]);
+set_v8086_seg(&sregs.es, &env->segs[R_ES]);
+set_v8086_seg(&sregs.fs, &env->segs[R_FS]);
+set_v8086_seg(&sregs.gs, &env->segs[R_GS]);
+set_v8086_seg(&sregs.ss, &env->segs[R_SS]);
+} else {
+set_seg(&sregs.cs, &env->segs[R_CS]);
+set_seg(&sregs.ds, &env->segs[R_DS]);
+set_seg(&sregs.es, &env->segs[R_ES]);
+set_seg(&sregs.fs, &env->segs[R_FS]);
+set_seg(&sregs.gs, &env->segs[R_GS]);
+set_seg(&sregs.ss, &env->segs[R_SS]);
+}
+
+set_seg(&sregs.tr, &env->tr);
+set_seg(&sregs.ldt, &env->ldt);
+
+sregs.idt.limit = env->idt.limit;
+sregs.idt.base = env->idt.base;
+memset(sregs.idt.padding, 0, sizeof sregs.idt.padding);
+sregs.gdt.limit = env->gdt.limit;
+sregs.gdt.base = env->gdt.base;
+memset(sregs.gdt.padding, 0, sizeof sregs.gdt.padding);
+
+sregs.cr0 = env->cr[0];
+sregs.cr2 = env->cr[2];
+sregs.cr3 = env->cr[3];
+sregs.cr4 = env->cr[4];
+
+sregs.cr8 = cpu_get_apic_tpr(cpu->apic_state);
+sregs.apic_base = cpu_get_apic_base(cpu->apic_state);
+
+sregs.efer = env->efer;
+
+if (env->pdptrs_valid) {
+for (i = 0; i < 4; i++) {
+sregs.pdptrs[i] = env->pdptrs[i];
+}
+sregs.flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
+}
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_SREGS2, &sregs);
+}
+
+
 static void kvm_msr_buf_reset(X86CPU *cpu)
 {
 memset(cpu->kvm_msr_buf, 0, MSR_BUF_SIZE);
@@ -3330,6 +3387,53 @@ static int kvm_get_sregs(X86CPU *cpu)
 return 0;
 }
 
+static int kvm_get_sregs2(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+struct kvm_sregs2 sregs;
+int i, ret;
+
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
+if (ret < 0) {
+return ret;
+}
+
+get_seg(&env->segs[R_CS], &sregs.cs);
+get_seg(&env->segs[R_DS], &sregs.ds);
+get_seg(&env->segs[R_ES], &sregs.es);
+get_seg(&env->segs[R_FS], &sregs.fs);
+get_seg(&env->segs[R_GS], &sregs.gs);
+get_seg(&env->segs[R_SS], &sregs.ss);
+
+get_seg(&env->tr, &sregs.tr);
+get_seg(&env->ldt, &sregs.ldt);
+
+env->idt.limit = sregs.idt.limit;
+env->idt.base = sregs.idt.base;
+env->gdt.limit = sregs.gdt.limit;
+env->gdt.base = sregs.gdt.base;
+
+env->cr[0] = sregs.cr0;
+env->cr[2] = sregs.cr2;
+env->cr[3] = sregs.cr3;
+env->cr[4] = sregs.cr4;
+
+env->efer = sregs.efer;
+
+env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
+
+if (env->pdptrs_valid) {
+for (i = 0; i < 4; i++) {
+env->pdptrs[i] = sregs.pdptrs[i];
+}
+}
+
+/* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */
+x86_update_hflags(env);
+
+return 0;
+}
+
 static int kvm_get_msrs(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
@@ -4173,7 +4277,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 assert(

[PULL 01/15] docker: include bison in debian-tricore-cross

2021-12-23 Thread Paolo Bonzini

Binutils sometimes fail to build if bison is not installed:

  /bin/sh ./ylwrap `test -f arparse.y || echo ./`arparse.y y.tab.c arparse.c y.tab.h arparse.h y.output arparse.output --  -d
  ./ylwrap: 109: ./ylwrap: -d: not found

(the correct invocation of ylwrap would have "bison -d" after the double
dash).  Work around by installing it in the container.

Cc: Alex Bennée 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/596
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 tests/docker/dockerfiles/debian-tricore-cross.docker | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/docker/dockerfiles/debian-tricore-cross.docker b/tests/docker/dockerfiles/debian-tricore-cross.docker
index d8df2c6117..3f6b55562c 100644
--- a/tests/docker/dockerfiles/debian-tricore-cross.docker
+++ b/tests/docker/dockerfiles/debian-tricore-cross.docker
@@ -16,6 +16,7 @@ MAINTAINER Philippe Mathieu-Daudé 
 RUN apt update && \
 DEBIAN_FRONTEND=noninteractive apt install -yy eatmydata && \
 DEBIAN_FRONTEND=noninteractive eatmydata apt install -yy \
+   bison \
bzip2 \
ca-certificates \
ccache \
-- 
2.33.1

[PULL 07/15] configure: do not set bsd_user/linux_user early

2021-12-23 Thread Paolo Bonzini

Similar to other optional features, leave the variables empty and compute
the actual value later.  Use the existence of include or source directories
to detect whether an OS or CPU supports respectively bsd-user and linux-user.

For now, BSD user-mode emulation is buildable even on TCI-only
architectures.  This probably will change once safe signals are
brought over from linux-user.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 configure | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/configure b/configure
index 0306f0c8bc..6516ec243c 100755
--- a/configure
+++ b/configure
@@ -320,8 +320,8 @@ linux="no"
 solaris="no"
 profiler="no"
 softmmu="yes"
-linux_user="no"
-bsd_user="no"
+linux_user=""
+bsd_user=""
 pkgversion=""
 pie=""
 qom_cast_debug="yes"
@@ -538,7 +538,6 @@ gnu/kfreebsd)
 ;;
 freebsd)
   bsd="yes"
-  bsd_user="yes"
   make="${MAKE-gmake}"
   # needed for kinfo_getvmmap(3) in libutil.h
 ;;
@@ -583,7 +582,6 @@ haiku)
 ;;
 linux)
   linux="yes"
-  linux_user="yes"
   vhost_user=${default_feature:-yes}
 ;;
 esac
@@ -1257,18 +1255,26 @@ if eval test -z "\${cross_cc_$cpu}"; then
 cross_cc_vars="$cross_cc_vars cross_cc_${cpu}"
 fi
 
-# For user-mode emulation the host arch has to be one we explicitly
-# support, even if we're using TCI.
-if [ "$ARCH" = "unknown" ]; then
-  bsd_user="no"
-  linux_user="no"
-fi
-
 default_target_list=""
 deprecated_targets_list=ppc64abi32-linux-user
 deprecated_features=""
 mak_wilds=""
 
+if [ "$linux_user" != no ]; then
+if [ "$targetos" = linux ] && [ -d $source_path/linux-user/host/$cpu ]; then
+linux_user=yes
+elif [ "$linux_user" = yes ]; then
+error_exit "linux-user not supported on this architecture"
+fi
+fi
+if [ "$bsd_user" != no ]; then
+if [ "$bsd_user" = "" ]; then
+test $targetos = freebsd && bsd_user=yes
+fi
+if [ "$bsd_user" = yes ] && ! [ -d $source_path/bsd-user/$targetos ]; then
+error_exit "bsd-user not supported on this host OS"
+fi
+fi
 if [ "$softmmu" = "yes" ]; then
 mak_wilds="${mak_wilds} $source_path/configs/targets/*-softmmu.mak"
 fi
-- 
2.33.1

[PULL 04/15] meson: cleanup common-user/ build

2021-12-23 Thread Paolo Bonzini

It is not necessary to have a separate static_library just for common_user
files; using the one that already covers the rest of common_ss is enough
unless you need to reuse some source files between emulators and tests.
Just place common files for all user-mode emulators in common_ss,
similar to what is already done for softmmu_ss in full system emulators.

The only disadvantage is that the include_directories under bsd-user/include/
and linux-user/include/ are now enabled for all targets rather than only
user mode emulators.  This however is not different from how include/sysemu/
is available when building user mode emulators.

Tested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 common-user/meson.build |  2 +-
 meson.build | 13 +
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/common-user/meson.build b/common-user/meson.build
index 5cb42bc664..26212dda5c 100644
--- a/common-user/meson.build
+++ b/common-user/meson.build
@@ -1,6 +1,6 @@
 common_user_inc += include_directories('host/' / host_arch)
 
-common_user_ss.add(files(
+user_ss.add(files(
   'safe-syscall.S',
   'safe-syscall-error.c',
 ))
diff --git a/meson.build b/meson.build
index b0af02b805..879628ab68 100644
--- a/meson.build
+++ b/meson.build
@@ -2377,7 +2377,6 @@ blockdev_ss = ss.source_set()
 block_ss = ss.source_set()
 chardev_ss = ss.source_set()
 common_ss = ss.source_set()
-common_user_ss = ss.source_set()
 crypto_ss = ss.source_set()
 hwcore_ss = ss.source_set()
 io_ss = ss.source_set()
@@ -2629,17 +2628,6 @@ subdir('common-user')
 subdir('bsd-user')
 subdir('linux-user')
 
-common_user_ss = common_user_ss.apply(config_all, strict: false)
-common_user = static_library('common-user',
- sources: common_user_ss.sources(),
- dependencies: common_user_ss.dependencies(),
- include_directories: common_user_inc,
- name_suffix: 'fa',
- build_by_default: false)
-common_user = declare_dependency(link_with: common_user)
-
-user_ss.add(common_user)
-
 # needed for fuzzing binaries
 subdir('tests/qtest/libqos')
 subdir('tests/qtest/fuzz')
@@ -2857,6 +2845,7 @@ common_all = common_ss.apply(config_all, strict: false)
 common_all = static_library('common',
 build_by_default: false,
 sources: common_all.sources() + genh,
+include_directories: common_user_inc,
 implicit_include_directories: false,
 dependencies: common_all.dependencies(),
 name_suffix: 'fa')
-- 
2.33.1

[PULL 15/15] KVM: x86: ignore interrupt_bitmap field of KVM_GET/SET_SREGS

2021-12-23 Thread Paolo Bonzini

This is unnecessary, because the interrupt would be retrieved and queued
anyway by KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS respectively,
and it makes the flow more similar to the one for KVM_GET/SET_SREGS2.

Signed-off-by: Paolo Bonzini 
---
 target/i386/kvm/kvm.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index d81745620b..2c8feb4a6f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2607,11 +2607,11 @@ static int kvm_put_sregs(X86CPU *cpu)
 CPUX86State *env = &cpu->env;
 struct kvm_sregs sregs;
 
+/*
+ * The interrupt_bitmap is ignored because KVM_SET_SREGS is
+ * always followed by KVM_SET_VCPU_EVENTS.
+ */
 memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
-if (env->interrupt_injected >= 0) {
-sregs.interrupt_bitmap[env->interrupt_injected / 64] |=
-(uint64_t)1 << (env->interrupt_injected % 64);
-}
 
 if ((env->eflags & VM_MASK)) {
 set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
@@ -3341,23 +3341,17 @@ static int kvm_get_sregs(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
 struct kvm_sregs sregs;
-int bit, i, ret;
+int ret;
 
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs);
 if (ret < 0) {
 return ret;
 }
 
-/* There can only be one pending IRQ set in the bitmap at a time, so try
-   to find it and save its number instead (-1 for none). */
-env->interrupt_injected = -1;
-for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) {
-if (sregs.interrupt_bitmap[i]) {
-bit = ctz64(sregs.interrupt_bitmap[i]);
-env->interrupt_injected = i * 64 + bit;
-break;
-}
-}
+/*
+ * The interrupt_bitmap is ignored because KVM_GET_SREGS is
+ * always preceded by KVM_GET_VCPU_EVENTS.
+ */
 
 get_seg(&env->segs[R_CS], &sregs.cs);
 get_seg(&env->segs[R_DS], &sregs.ds);
-- 
2.33.1
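
For readers unfamiliar with the bitmap encoding that this patch stops using, here is a minimal Python model of the removed logic. It is only an illustration, not QEMU code: the helper names are hypothetical, and the four-word bitmap size is an assumption (256 interrupt vectors packed into 64-bit words).

```python
# Illustrative model of the interrupt_bitmap handling removed by the patch.
# Assumption: the bitmap holds 256 bits as four 64-bit words.
BITMAP_WORDS = 4


def encode_injected(vector):
    """Build the bitmap for one pending vector; -1 means "none pending"."""
    bitmap = [0] * BITMAP_WORDS
    if vector >= 0:
        # Same arithmetic as the removed kvm_put_sregs() code.
        bitmap[vector // 64] |= 1 << (vector % 64)
    return bitmap


def decode_injected(bitmap):
    """Return the lowest set bit as a vector number, or -1 if none is set."""
    for i, word in enumerate(bitmap):
        if word:
            # (word & -word) isolates the lowest set bit; its bit_length()
            # minus one is the count of trailing zeros (ctz64 in the C code).
            bit = (word & -word).bit_length() - 1
            return i * 64 + bit
    return -1
```

After the patch none of this round trip is needed, because the pending interrupt travels through KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS instead.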

[PULL 03/15] user: move common-user includes to a subdirectory of {bsd, linux}-user/

2021-12-23 Thread Paolo Bonzini

Avoid polluting the compilation of common-user/ with local include files;
making an include file available to common-user/ should be a deliberate
decision in order to keep a clear interface that can be used by both
bsd-user/ and linux-user/.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 bsd-user/{ => include}/special-errno.h  | 0
 bsd-user/meson.build| 2 +-
 linux-user/{ => include}/host/aarch64/host-signal.h | 0
 linux-user/{ => include}/host/alpha/host-signal.h   | 0
 linux-user/{ => include}/host/arm/host-signal.h | 0
 linux-user/{ => include}/host/i386/host-signal.h| 0
 linux-user/{ => include}/host/mips/host-signal.h| 0
 linux-user/{ => include}/host/ppc/host-signal.h | 0
 linux-user/{ => include}/host/ppc64/host-signal.h   | 0
 linux-user/{ => include}/host/riscv/host-signal.h   | 0
 linux-user/{ => include}/host/s390/host-signal.h| 0
 linux-user/{ => include}/host/s390x/host-signal.h   | 0
 linux-user/{ => include}/host/sparc/host-signal.h   | 0
 linux-user/{ => include}/host/sparc64/host-signal.h | 0
 linux-user/{ => include}/host/x32/host-signal.h | 0
 linux-user/{ => include}/host/x86_64/host-signal.h  | 0
 linux-user/{ => include}/special-errno.h| 0
 linux-user/meson.build  | 4 ++--
 18 files changed, 3 insertions(+), 3 deletions(-)
 rename bsd-user/{ => include}/special-errno.h (100%)
 rename linux-user/{ => include}/host/aarch64/host-signal.h (100%)
 rename linux-user/{ => include}/host/alpha/host-signal.h (100%)
 rename linux-user/{ => include}/host/arm/host-signal.h (100%)
 rename linux-user/{ => include}/host/i386/host-signal.h (100%)
 rename linux-user/{ => include}/host/mips/host-signal.h (100%)
 rename linux-user/{ => include}/host/ppc/host-signal.h (100%)
 rename linux-user/{ => include}/host/ppc64/host-signal.h (100%)
 rename linux-user/{ => include}/host/riscv/host-signal.h (100%)
 rename linux-user/{ => include}/host/s390/host-signal.h (100%)
 rename linux-user/{ => include}/host/s390x/host-signal.h (100%)
 rename linux-user/{ => include}/host/sparc/host-signal.h (100%)
 rename linux-user/{ => include}/host/sparc64/host-signal.h (100%)
 rename linux-user/{ => include}/host/x32/host-signal.h (100%)
 rename linux-user/{ => include}/host/x86_64/host-signal.h (100%)
 rename linux-user/{ => include}/special-errno.h (100%)

diff --git a/bsd-user/special-errno.h b/bsd-user/include/special-errno.h
similarity index 100%
rename from bsd-user/special-errno.h
rename to bsd-user/include/special-errno.h
diff --git a/bsd-user/meson.build b/bsd-user/meson.build
index 9fcb80c3fa..8380fa44c2 100644
--- a/bsd-user/meson.build
+++ b/bsd-user/meson.build
@@ -4,7 +4,7 @@ endif
 
 bsd_user_ss = ss.source_set()
 
-common_user_inc += include_directories('.')
+common_user_inc += include_directories('include')
 
 bsd_user_ss.add(files(
   'bsdload.c',
diff --git a/linux-user/host/aarch64/host-signal.h b/linux-user/include/host/aarch64/host-signal.h
similarity index 100%
rename from linux-user/host/aarch64/host-signal.h
rename to linux-user/include/host/aarch64/host-signal.h
diff --git a/linux-user/host/alpha/host-signal.h b/linux-user/include/host/alpha/host-signal.h
similarity index 100%
rename from linux-user/host/alpha/host-signal.h
rename to linux-user/include/host/alpha/host-signal.h
diff --git a/linux-user/host/arm/host-signal.h b/linux-user/include/host/arm/host-signal.h
similarity index 100%
rename from linux-user/host/arm/host-signal.h
rename to linux-user/include/host/arm/host-signal.h
diff --git a/linux-user/host/i386/host-signal.h b/linux-user/include/host/i386/host-signal.h
similarity index 100%
rename from linux-user/host/i386/host-signal.h
rename to linux-user/include/host/i386/host-signal.h
diff --git a/linux-user/host/mips/host-signal.h b/linux-user/include/host/mips/host-signal.h
similarity index 100%
rename from linux-user/host/mips/host-signal.h
rename to linux-user/include/host/mips/host-signal.h
diff --git a/linux-user/host/ppc/host-signal.h b/linux-user/include/host/ppc/host-signal.h
similarity index 100%
rename from linux-user/host/ppc/host-signal.h
rename to linux-user/include/host/ppc/host-signal.h
diff --git a/linux-user/host/ppc64/host-signal.h b/linux-user/include/host/ppc64/host-signal.h
similarity index 100%
rename from linux-user/host/ppc64/host-signal.h
rename to linux-user/include/host/ppc64/host-signal.h
diff --git a/linux-user/host/riscv/host-signal.h b/linux-user/include/host/riscv/host-signal.h
similarity index 100%
rename from linux-user/host/riscv/host-signal.h
rename to linux-user/include/host/riscv/host-signal.h
diff --git a/linux-user/host/s390/host-signal.h b/linux-user/include/host/s390/host-signal.h
similarity index 100%
rename from linux-user/host/s390/host-signal.h
rename to linux-user/include/host/s390/host-signal.h
diff --git a/linux-user/host/s390x/host-signal.h b/linux-user/include/host/s390x/host-signal.h
similarity inde

[PULL 08/15] configure, makefile: remove traces of really old files

2021-12-23 Thread Paolo Bonzini

These files were removed more than a year ago in the best case, and
more than ten years ago for some really old TCG files.
Remove any remaining traces of them.

Acked-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 Makefile  | 11 ---
 configure |  9 -
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/Makefile b/Makefile
index 74c5b46d38..06ad8a61e1 100644
--- a/Makefile
+++ b/Makefile
@@ -205,14 +205,11 @@ recurse-clean: $(addsuffix /clean, $(ROM_DIRS))
 clean: recurse-clean
-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) -t clean || :
-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) clean-ctlist || :
-# avoid old build problems by removing potentially incorrect old files
-   rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h
find . \( -name '*.so' -o -name '*.dll' -o -name '*.[oda]' \) -type f \
! -path ./roms/edk2/ArmPkg/Library/GccLto/liblto-aarch64.a \
! -path ./roms/edk2/ArmPkg/Library/GccLto/liblto-arm.a \
-exec rm {} +
-   rm -f TAGS cscope.* *.pod *~ */*~
-   rm -f fsdev/*.pod scsi/*.pod
+   rm -f TAGS cscope.* *~ */*~
 
 VERSION = $(shell cat $(SRC_PATH)/VERSION)
 
@@ -223,10 +220,10 @@ qemu-%.tar.bz2:
 
 distclean: clean
-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) -t clean -g || :
-   rm -f config-host.mak config-host.h* config-poison.h
+   rm -f config-host.mak config-poison.h
rm -f tests/tcg/config-*.mak
-   rm -f config-all-disas.mak config.status
-   rm -f roms/seabios/config.mak roms/vgabios/config.mak
+   rm -f config.status
+   rm -f roms/seabios/config.mak
rm -f qemu-plugins-ld.symbols qemu-plugins-ld64.symbols
rm -f *-config-target.h *-config-devices.mak *-config-devices.h
rm -rf meson-private meson-logs meson-info compile_commands.json
diff --git a/configure b/configure
index 6516ec243c..c8b32e7277 100755
--- a/configure
+++ b/configure
@@ -3665,9 +3665,6 @@ fi
 # so the build tree will be missing the link back to the new file, and
 # tests might fail. Prefer to keep the relevant files in their own
 # directory and symlink the directory instead.
-# UNLINK is used to remove symlinks from older development versions
-# that might get into the way when doing "git update" without doing
-# a "make distclean" in between.
 LINKS="Makefile"
 LINKS="$LINKS tests/tcg/Makefile.target"
 LINKS="$LINKS pc-bios/optionrom/Makefile"
@@ -3679,7 +3676,6 @@ LINKS="$LINKS tests/avocado tests/data"
 LINKS="$LINKS tests/qemu-iotests/check"
 LINKS="$LINKS python"
 LINKS="$LINKS contrib/plugins/Makefile "
-UNLINK="pc-bios/keymaps"
 for bios_file in \
 $source_path/pc-bios/*.bin \
 $source_path/pc-bios/*.elf \
@@ -3701,11 +3697,6 @@ for f in $LINKS ; do
 symlink "$source_path/$f" "$f"
 fi
 done
-for f in $UNLINK ; do
-if [ -L "$f" ]; then
-rm -f "$f"
-fi
-done
 
 (for i in $cross_cc_vars; do
   export $i
-- 
2.33.1

[PULL 05/15] block/file-posix: Simplify the XFS_IOC_DIOINFO handling

2021-12-23 Thread Paolo Bonzini

From: Thomas Huth 

The handling for the XFS_IOC_DIOINFO ioctl is currently quite excessive:
This is not a "real" feature like the other features that we provide with
the "--enable-xxx" and "--disable-xxx" switches for the configure script,
since this does not influence lots of code (it's only about one call to
xfsctl() in file-posix.c), so people don't gain much with the ability to
disable this with "--disable-xfsctl".
It's also unfortunate that the ioctl will be disabled on Linux in case
the user did not install the right xfsprogs-devel package before running
configure. Thus let's simplify this by providing the ioctl definition
on our own, so we can completely get rid of the header dependency and
thus the related code in the configure script.

Suggested-by: Paolo Bonzini 
Signed-off-by: Thomas Huth 
Message-Id: <20211215125824.250091-1-th...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 block/file-posix.c | 37 -
 configure  | 31 ---
 meson.build|  1 -
 3 files changed, 16 insertions(+), 53 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b283093e5b..1f1756e192 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -106,10 +106,6 @@
 #include 
 #endif
 
-#ifdef CONFIG_XFS
-#include 
-#endif
-
 /* OS X does not have O_DSYNC */
 #ifndef O_DSYNC
 #ifdef O_SYNC
@@ -156,9 +152,6 @@ typedef struct BDRVRawState {
 int perm_change_flags;
 BDRVReopenState *reopen_state;
 
-#ifdef CONFIG_XFS
-bool is_xfs:1;
-#endif
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
@@ -409,14 +402,22 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
 bs->bl.request_alignment = 0;
 }
-#ifdef CONFIG_XFS
-if (s->is_xfs) {
-struct dioattr da;
-if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, &da) >= 0) {
-bs->bl.request_alignment = da.d_miniosz;
-/* The kernel returns wrong information for d_mem */
-/* s->buf_align = da.d_mem; */
-}
+
+#ifdef __linux__
+/*
+ * The XFS ioctl definitions are shipped in extra packages that might
+ * not always be available. Since we just need the XFS_IOC_DIOINFO ioctl
+ * here, we simply use our own definition instead:
+ */
+struct xfs_dioattr {
+uint32_t d_mem;
+uint32_t d_miniosz;
+uint32_t d_maxiosz;
+} da;
+if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
+bs->bl.request_alignment = da.d_miniosz;
+/* The kernel returns wrong information for d_mem */
+/* s->buf_align = da.d_mem; */
 }
 #endif
 
@@ -798,12 +799,6 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #endif
 s->needs_alignment = raw_needs_alignment(bs);
 
-#ifdef CONFIG_XFS
-if (platform_test_xfs_fd(s->fd)) {
-s->is_xfs = true;
-}
-#endif
-
 bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
 if (S_ISREG(st.st_mode)) {
 /* When extending regular files, we get zeros from the OS */
diff --git a/configure b/configure
index 8ccfe51673..b66ab31834 100755
--- a/configure
+++ b/configure
@@ -291,7 +291,6 @@ EXTRA_CXXFLAGS=""
 EXTRA_LDFLAGS=""
 
 xen_ctrl_version="$default_feature"
-xfs="$default_feature"
 membarrier="$default_feature"
 vhost_kernel="$default_feature"
 vhost_net="$default_feature"
@@ -1019,10 +1018,6 @@ for opt do
   ;;
   --enable-opengl) opengl="yes"
   ;;
-  --disable-xfsctl) xfs="no"
-  ;;
-  --enable-xfsctl) xfs="yes"
-  ;;
   --disable-zlib-test)
   ;;
   --enable-guest-agent) guest_agent="yes"
@@ -1429,7 +1424,6 @@ cat << EOF
   avx512f AVX512F optimization support
   replication replication support
   opengl  opengl support
-  xfsctl  xfsctl support
   qom-cast-debug  cast debugging support
   tools   build qemu-io, qemu-nbd and qemu-img tools
   bochs   bochs image format support
@@ -2321,28 +2315,6 @@ EOF
 fi
 fi
 
-##
-# xfsctl() probe, used for file-posix.c
-if test "$xfs" != "no" ; then
-  cat > $TMPC << EOF
-#include   /* NULL */
-#include 
-int main(void)
-{
-xfsctl(NULL, 0, 0, NULL);
-return 0;
-}
-EOF
-  if compile_prog "" "" ; then
-xfs="yes"
-  else
-if test "$xfs" = "yes" ; then
-  feature_not_found "xfs" "Install xfsprogs/xfslibs devel"
-fi
-xfs=no
-  fi
-fi
-
 ##
 # plugin linker support probe
 
@@ -3454,9 +3426,6 @@ echo "CONFIG_BDRV_RO_WHITELIST=$block_drv_ro_whitelist" >> $config_host_mak
 if test "$block_drv_whitelist_tools" = "yes" ; then
   echo "CONFIG_BDRV_WHITELIST_TOOLS=y" >> $config_host_mak
 fi
-if test "$xfs" = "yes" ; then
-  echo "CONFIG_XFS=y" >> $config_host_mak
-fi
 qemu_version=$(head $source_path/VERSION)
 echo "PKGVERSION=$pkgversion" >>$config_host_mak
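
As a cross-check of the open-coded `_IOR('X', 30, struct xfs_dioattr)` request in this patch, the following Python sketch recomputes the ioctl number and unpacks the three-field reply. It assumes the asm-generic ioctl bit layout (as used on x86 and arm; some architectures such as powerpc or mips encode the direction bits differently), and the helper names `ior` and `parse_dioattr` are made up for the illustration.

```python
import struct

# Assumption: asm-generic ioctl encoding (nr: bits 0-7, type: bits 8-15,
# size: bits 16-29, direction: bits 30-31, with "read" encoded as 2).
IOC_NRSHIFT = 0
IOC_TYPESHIFT = 8
IOC_SIZESHIFT = 16
IOC_DIRSHIFT = 30
IOC_READ = 2


def ior(type_char, nr, size):
    """Python equivalent of the C _IOR(type, nr, struct) macro."""
    return ((IOC_READ << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT) |
            (ord(type_char) << IOC_TYPESHIFT) | (nr << IOC_NRSHIFT))


# struct xfs_dioattr from the patch: three uint32_t fields, no padding.
XFS_DIOATTR_FMT = "=III"
XFS_DIOATTR_SIZE = struct.calcsize(XFS_DIOATTR_FMT)
XFS_IOC_DIOINFO = ior("X", 30, XFS_DIOATTR_SIZE)


def parse_dioattr(raw):
    """Split the raw ioctl reply into the fields named in the patch."""
    d_mem, d_miniosz, d_maxiosz = struct.unpack(XFS_DIOATTR_FMT, raw)
    return {"d_mem": d_mem, "d_miniosz": d_miniosz, "d_maxiosz": d_maxiosz}
```

On a Linux host the request could be issued with `fcntl.ioctl(fd, XFS_IOC_DIOINFO, bytes(XFS_DIOATTR_SIZE))`; on a non-XFS file the call is expected to fail, which matches the patch only using the result when `ioctl()` returns a non-negative value.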

[PULL 06/15] configure: simplify creation of plugin symbol list

2021-12-23 Thread Paolo Bonzini

--dynamic-list is present on all supported ELF (not Windows or Darwin)
platforms, since it dates back to 2006; -exported_symbols_list is
likewise present on all supported versions of macOS.  Do not bother
doing a functional test in configure.

Remove the file creation from configure as well: for Darwin, move the
creation of the Darwin-formatted symbols to meson; for ELF, use the
file in the source path directly and switch from -Wl, to -Xlinker so
that paths containing a comma do not break.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 configure   | 80 -
 plugins/meson.build | 11 +--
 2 files changed, 8 insertions(+), 83 deletions(-)

diff --git a/configure b/configure
index b66ab31834..0306f0c8bc 100755
--- a/configure
+++ b/configure
@@ -78,7 +78,6 @@ TMPC="${TMPDIR1}/${TMPB}.c"
 TMPO="${TMPDIR1}/${TMPB}.o"
 TMPCXX="${TMPDIR1}/${TMPB}.cxx"
 TMPE="${TMPDIR1}/${TMPB}.exe"
-TMPTXT="${TMPDIR1}/${TMPB}.txt"
 
 rm -f config.log
 
@@ -2315,69 +2314,6 @@ EOF
 fi
 fi
 
-##
-# plugin linker support probe
-
-if test "$plugins" != "no"; then
-
-#
-# See if --dynamic-list is supported by the linker
-
-ld_dynamic_list="no"
-cat > $TMPTXT < $TMPC <
-void foo(void);
-
-void foo(void)
-{
-  printf("foo\n");
-}
-
-int main(void)
-{
-  foo();
-  return 0;
-}
-EOF
-
-if compile_prog "" "-Wl,--dynamic-list=$TMPTXT" ; then
-ld_dynamic_list="yes"
-fi
-
-#
-# See if -exported_symbols_list is supported by the linker
-
-ld_exported_symbols_list="no"
-cat > $TMPTXT <> $config_host_mak
-# Copy the export object list to the build dir
-if test "$ld_dynamic_list" = "yes" ; then
-   echo "CONFIG_HAS_LD_DYNAMIC_LIST=yes" >> $config_host_mak
-   ld_symbols=qemu-plugins-ld.symbols
-   cp "$source_path/plugins/qemu-plugins.symbols" $ld_symbols
-elif test "$ld_exported_symbols_list" = "yes" ; then
-   echo "CONFIG_HAS_LD_EXPORTED_SYMBOLS_LIST=yes" >> $config_host_mak
-   ld64_symbols=qemu-plugins-ld64.symbols
-   echo "# Automatically generated by configure - do not modify" > $ld64_symbols
-   grep 'qemu_' "$source_path/plugins/qemu-plugins.symbols" | sed 's/;//g' | \
-   sed -E 's/^[[:space:]]*(.*)/_\1/' >> $ld64_symbols
-else
-   error_exit \
-   "If \$plugins=yes, either \$ld_dynamic_list or " \
-   "\$ld_exported_symbols_list should have been set to 'yes'."
-fi
 fi
 
 if test -n "$gdb_bin"; then
diff --git a/plugins/meson.build b/plugins/meson.build
index b3de57853b..d0a2ee94cf 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -1,10 +1,15 @@
 plugin_ldflags = []
 # Modules need more symbols than just those in plugins/qemu-plugins.symbols
 if not enable_modules
-  if 'CONFIG_HAS_LD_DYNAMIC_LIST' in config_host
-plugin_ldflags = ['-Wl,--dynamic-list=qemu-plugins-ld.symbols']
-  elif 'CONFIG_HAS_LD_EXPORTED_SYMBOLS_LIST' in config_host
+  if targetos == 'darwin'
+qemu_plugins_symbols_list = configure_file(
+  input: files('qemu-plugins.symbols'),
+  output: 'qemu-plugins-ld64.symbols',
+  capture: true,
+  command: ['sed', '-ne', 's/^[[:space:]]*\\(qemu_.*\\);/_\\1/p', '@INPUT@'])
 plugin_ldflags = ['-Wl,-exported_symbols_list,qemu-plugins-ld64.symbols']
+  else
+plugin_ldflags = ['-Xlinker', '--dynamic-list=' + (meson.project_source_root() / 'plugins/qemu-plugins.symbols')]
   endif
 endif
 
-- 
2.33.1
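
The sed expression that the patch moves into plugins/meson.build can be hard to read through two layers of quoting. The Python sketch below mimics it under stated assumptions: `\s` stands in for `[[:space:]]`, the function name `darwin_symbols` is hypothetical, and the input in the usage note is a made-up excerpt in the style of a linker dynamic list rather than the real qemu-plugins.symbols file.

```python
import re

# Rough equivalent of: sed -ne 's/^[[:space:]]*\(qemu_.*\);/_\1/p'
# Only lines where the substitution succeeds are printed (-n ... p).
SYMBOL_RE = re.compile(r"^\s*(qemu_.*);")


def darwin_symbols(dynamic_list_text):
    """Turn an ELF --dynamic-list body into -exported_symbols_list lines."""
    result = []
    for line in dynamic_list_text.splitlines():
        new_line, count = SYMBOL_RE.subn(r"_\1", line, count=1)
        if count:
            # Mach-O symbol names carry a leading underscore.
            result.append(new_line)
    return result
```

For example, `darwin_symbols("{\n  qemu_plugin_reset;\n  qemu_plugin_uninstall;\n};\n")` yields the two names with the leading whitespace and trailing semicolon removed and an underscore prepended, while the brace lines are dropped because they do not match.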

[PULL 10/15] configure: move non-command-line variables away from command-line parsing section

2021-12-23 Thread Paolo Bonzini

This makes it easier to identify candidates for moving to Meson.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Paolo Bonzini 
---
 configure | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index 302d58102b..8eb8e4c2cc 100755
--- a/configure
+++ b/configure
@@ -307,16 +307,12 @@ debug="no"
 sanitizers="no"
 tsan="no"
 fortify_source="$default_feature"
-mingw32="no"
 gcov="no"
 EXESUF=""
 modules="no"
 module_upgrades="no"
 prefix="/usr/local"
 qemu_suffix="qemu"
-bsd="no"
-linux="no"
-solaris="no"
 profiler="no"
 softmmu="yes"
 linux_user=""
@@ -330,8 +326,6 @@ opengl="$default_feature"
 cpuid_h="no"
 avx2_opt="$default_feature"
 guest_agent="$default_feature"
-guest_agent_with_vss="no"
-guest_agent_ntddscsi="no"
 vss_win32_sdk="$default_feature"
 win_sdk="no"
 want_tools="$default_feature"
@@ -526,6 +520,10 @@ fi
 
 # OS specific
 
+mingw32="no"
+bsd="no"
+linux="no"
+solaris="no"
 case $targetos in
 windows)
   mingw32="yes"
@@ -2546,6 +2544,7 @@ fi
 ##
 # check if we have VSS SDK headers for win
 
+guest_agent_with_vss="no"
 if test "$mingw32" = "yes" && test "$guest_agent" != "no" && \
 test "$vss_win32_sdk" != "no" ; then
   case "$vss_win32_sdk" in
@@ -2576,7 +2575,6 @@ EOF
   echo "ERROR: The headers are extracted in the directory \`inc'."
   feature_not_found "VSS support"
 fi
-guest_agent_with_vss="no"
   fi
 fi
 
@@ -2603,6 +2601,7 @@ fi
 
 ##
 # check if mingw environment provides a recent ntddscsi.h
+guest_agent_ntddscsi="no"
 if test "$mingw32" = "yes" && test "$guest_agent" != "no"; then
   cat > $TMPC << EOF
 #include 
-- 
2.33.1

[PULL 09/15] configure: parse --enable/--disable-strip automatically, flip default

2021-12-23 Thread Paolo Bonzini

Always include the STRIP variable in config-host.mak (it's only used
by the s390-ccw firmware build, and it adds a default if configure
omitted it), and use meson-buildoptions.sh to turn
--enable/--disable-strip into -Dstrip.

The default is now not to strip the binaries, as is the case for almost
every other package that has a configure script.

Signed-off-by: Paolo Bonzini 
---
 configure | 10 +-
 pc-bios/s390-ccw/Makefile |  2 --
 scripts/meson-buildoptions.py | 21 ++---
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/configure b/configure
index c8b32e7277..302d58102b 100755
--- a/configure
+++ b/configure
@@ -307,7 +307,6 @@ debug="no"
 sanitizers="no"
 tsan="no"
 fortify_source="$default_feature"
-strip_opt="yes"
 mingw32="no"
 gcov="no"
 EXESUF=""
@@ -890,7 +889,6 @@ for opt do
   debug_tcg="yes"
   debug_mutex="yes"
   debug="yes"
-  strip_opt="no"
   fortify_source="no"
   ;;
   --enable-sanitizers) sanitizers="yes"
@@ -901,8 +899,6 @@ for opt do
   ;;
   --disable-tsan) tsan="no"
   ;;
-  --disable-strip) strip_opt="no"
-  ;;
   --disable-slirp) slirp="disabled"
   ;;
   --enable-slirp) slirp="enabled"
@@ -1365,7 +1361,6 @@ Advanced options (experts only):
   --enable-debug   enable common debug build options
   --enable-sanitizers  enable default sanitizers
   --enable-tsanenable thread sanitizer
-  --disable-strip  disable stripping binaries
   --disable-werror disable compilation abort on warning
   --disable-stack-protector disable compiler-provided stack protection
   --audio-drv-list=LISTset audio drivers to try if -audiodev is not used
@@ -3312,9 +3307,6 @@ echo "GIT_SUBMODULES_ACTION=$git_submodules_action" >> $config_host_mak
 if test "$debug_tcg" = "yes" ; then
   echo "CONFIG_DEBUG_TCG=y" >> $config_host_mak
 fi
-if test "$strip_opt" = "yes" ; then
-  echo "STRIP=${strip}" >> $config_host_mak
-fi
 if test "$mingw32" = "yes" ; then
   echo "CONFIG_WIN32=y" >> $config_host_mak
   if test "$guest_agent_with_vss" = "yes" ; then
@@ -3591,6 +3583,7 @@ echo "GLIB_CFLAGS=$glib_cflags" >> $config_host_mak
 echo "GLIB_LIBS=$glib_libs" >> $config_host_mak
 echo "QEMU_LDFLAGS=$QEMU_LDFLAGS" >> $config_host_mak
 echo "LD_I386_EMULATION=$ld_i386_emulation" >> $config_host_mak
+echo "STRIP=$strip" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
 echo "LIBS_QGA=$libs_qga" >> $config_host_mak
 
@@ -3805,7 +3798,6 @@ if test "$skip_meson" = no; then
 -Doptimization=$(if test "$debug" = yes; then echo 0; else echo 2; fi) \
 -Ddebug=$(if test "$debug_info" = yes; then echo true; else echo false; fi) \
 -Dwerror=$(if test "$werror" = yes; then echo true; else echo false; fi) \
--Dstrip=$(if test "$strip_opt" = yes; then echo true; else echo false; fi) \
 -Db_pie=$(if test "$pie" = yes; then echo true; else echo false; fi) \
 -Db_coverage=$(if test "$gcov" = yes; then echo true; else echo false; fi) \
 -Db_lto=$lto -Dcfi=$cfi -Dtcg=$tcg -Dxen=$xen \
diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index cee9d2c63b..0eb68efc7b 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -44,8 +44,6 @@ build-all: s390-ccw.img s390-netboot.img
 s390-ccw.elf: $(OBJECTS)
$(call quiet-command,$(CC) $(LDFLAGS) -o $@ $(OBJECTS),"BUILD","$(TARGET_DIR)$@")
 
-STRIP ?= strip
-
 s390-ccw.img: s390-ccw.elf
$(call quiet-command,$(STRIP) --strip-unneeded $< -o $@,"STRIP","$(TARGET_DIR)$@")
 
diff --git a/scripts/meson-buildoptions.py b/scripts/meson-buildoptions.py
index 96969d89ee..98ae944148 100755
--- a/scripts/meson-buildoptions.py
+++ b/scripts/meson-buildoptions.py
@@ -36,6 +36,10 @@
 "trace_file",
 }
 
+BUILTIN_OPTIONS = {
+"strip",
+}
+
 LINE_WIDTH = 76
 
 
@@ -90,14 +94,17 @@ def allow_arg(opt):
 return not (set(opt["choices"]) <= {"auto", "disabled", "enabled"})
 
 
+def filter_options(json):
+if ":" in json["name"]:
+return False
+if json["section"] == "user":
+return json["name"] not in SKIP_OPTIONS
+else:
+return json["name"] in BUILTIN_OPTIONS
+
+
 def load_options(json):
-json = [
-x
-for x in json
-if x["section"] == "user"
-and ":" not in x["name"]
-and x["name"] not in SKIP_OPTIONS
-]
+json = [x for x in json if filter_options(x)]
 return sorted(json, key=lambda x: x["name"])
 
 
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index ae8f18edc2..46360e541d 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -13,6 +13,7 @@ meson_options_help() {
   printf "%s\n" '   jemalloc/system/tcmalloc)'
   printf "%s\n" '  --enable-slirp[=CHOICE]  Whether and how to find the slirp 
library'
   printf "%s\n" '   (choices: 
auto/disabl
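
The new `filter_options()` helper in scripts/meson-buildoptions.py is what lets a Meson built-in option such as `strip` reach the generated shell script. The following self-contained sketch repeats its logic on made-up introspection data. Assumptions: `SKIP_OPTIONS` here lists only `trace_file`, the one entry visible in the diff context (the real set is longer), and the sample entries, including `"core"` as the section of built-in options, are illustrative.

```python
# Trimmed-down stand-ins for the sets in scripts/meson-buildoptions.py.
SKIP_OPTIONS = {"trace_file"}
BUILTIN_OPTIONS = {"strip"}


def filter_options(opt):
    """Same decision logic as the function added by the patch."""
    if ":" in opt["name"]:
        return False  # option belonging to a subproject
    if opt["section"] == "user":
        return opt["name"] not in SKIP_OPTIONS
    else:
        return opt["name"] in BUILTIN_OPTIONS


def load_options(options):
    """Keep the accepted options, sorted by name."""
    kept = [x for x in options if filter_options(x)]
    return sorted(kept, key=lambda x: x["name"])


# Hypothetical excerpt of build-option introspection output.
SAMPLE = [
    {"name": "strip", "section": "core"},
    {"name": "buildtype", "section": "core"},
    {"name": "slirp", "section": "user"},
    {"name": "trace_file", "section": "user"},
    {"name": "libvhost-user:link-static", "section": "user"},
]
```

With this sample, `[o["name"] for o in load_options(SAMPLE)]` keeps only `slirp` and `strip`: other built-ins, skipped user options and subproject options are all filtered out.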

[PATCH v2 05/23] dma: Let dma_memory_read/write() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_memory_read() or dma_memory_write().

Patch created mechanically using spatch with this script:

  @@
  expression E1, E2, E3, E4;
  @@
  (
  - dma_memory_read(E1, E2, E3, E4)
  + dma_memory_read(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED)
  |
  - dma_memory_write(E1, E2, E3, E4)
  + dma_memory_write(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED)
  )

Reviewed-by: Richard Henderson 
Reviewed-by: Li Qiang 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Message-Id: <20210702092439.989969-6-phi...@redhat.com>
---
v4: Merged conflict in hw/dma/pl330.c
---
 include/hw/ppc/spapr_vio.h|  6 --
 include/sysemu/dma.h  | 20 
 hw/arm/musicpal.c | 13 +++--
 hw/arm/smmu-common.c  |  3 ++-
 hw/arm/smmuv3.c   | 14 +-
 hw/core/generic-loader.c  |  3 ++-
 hw/dma/pl330.c| 12 
 hw/dma/sparc32_dma.c  | 16 ++--
 hw/dma/xlnx-zynq-devcfg.c |  6 --
 hw/dma/xlnx_dpdma.c   | 10 ++
 hw/i386/amd_iommu.c   | 16 +---
 hw/i386/intel_iommu.c | 28 +---
 hw/ide/macio.c|  2 +-
 hw/intc/xive.c|  7 ---
 hw/misc/bcm2835_property.c|  3 ++-
 hw/misc/macio/mac_dbdma.c | 10 ++
 hw/net/allwinner-sun8i-emac.c | 18 --
 hw/net/ftgmac100.c| 25 -
 hw/net/imx_fec.c  | 32 
 hw/net/npcm7xx_emc.c  | 20 
 hw/nvram/fw_cfg.c |  9 ++---
 hw/pci-host/pnv_phb3.c|  5 +++--
 hw/pci-host/pnv_phb3_msi.c|  9 ++---
 hw/pci-host/pnv_phb4.c|  5 +++--
 hw/sd/allwinner-sdhost.c  | 14 --
 hw/sd/sdhci.c | 35 ++-
 hw/usb/hcd-dwc2.c |  8 
 hw/usb/hcd-ehci.c |  6 --
 hw/usb/hcd-ohci.c | 18 +++---
 hw/usb/hcd-xhci.c | 18 +++---
 30 files changed, 241 insertions(+), 150 deletions(-)

diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index c90e74a67dd..5d2ea8e6656 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -97,14 +97,16 @@ static inline bool spapr_vio_dma_valid(SpaprVioDevice *dev, uint64_t taddr,
 static inline int spapr_vio_dma_read(SpaprVioDevice *dev, uint64_t taddr,
  void *buf, uint32_t size)
 {
-return (dma_memory_read(&dev->as, taddr, buf, size) != 0) ?
+return (dma_memory_read(&dev->as, taddr,
+buf, size, MEMTXATTRS_UNSPECIFIED) != 0) ?
 H_DEST_PARM : H_SUCCESS;
 }
 
 static inline int spapr_vio_dma_write(SpaprVioDevice *dev, uint64_t taddr,
   const void *buf, uint32_t size)
 {
-return (dma_memory_write(&dev->as, taddr, buf, size) != 0) ?
+return (dma_memory_write(&dev->as, taddr,
+ buf, size, MEMTXATTRS_UNSPECIFIED) != 0) ?
 H_DEST_PARM : H_SUCCESS;
 }
 
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index e8ad42226f6..522682bf386 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -143,12 +143,14 @@ static inline MemTxResult dma_memory_rw(AddressSpace *as, dma_addr_t addr,
  * @addr: address within that address space
  * @buf: buffer with the data transferred
  * @len: length of the data transferred
+ * @attrs: memory transaction attributes
  */
 static inline MemTxResult dma_memory_read(AddressSpace *as, dma_addr_t addr,
-  void *buf, dma_addr_t len)
+  void *buf, dma_addr_t len,
+  MemTxAttrs attrs)
 {
 return dma_memory_rw(as, addr, buf, len,
- DMA_DIRECTION_TO_DEVICE, MEMTXATTRS_UNSPECIFIED);
+ DMA_DIRECTION_TO_DEVICE, attrs);
 }
 
 /**
@@ -162,12 +164,14 @@ static inline MemTxResult dma_memory_read(AddressSpace *as, dma_addr_t addr,
  * @addr: address within that address space
  * @buf: buffer with the data transferred
  * @len: the number of bytes to write
+ * @attrs: memory transaction attributes
  */
 static inline MemTxResult dma_memory_write(AddressSpace *as, dma_addr_t addr,
-   const void *buf, dma_addr_t len)
+   const void *buf, dma_addr_t len,
+   MemTxAttrs attrs)
 {
 return dma_memory_rw(as, addr, (void *)buf, len,
- DMA_DIRECTION_FROM_DEVICE, MEMTXATTRS_UNSPECIFIED);
+ DMA_DIRECTION_FROM_DEVICE, attrs);
 }
 
 /**
@@ -239,7 +243,7 @@ static inline void dma_memory_unmap(AddressSpace *as,

[PULL 11/15] meson: build contrib/ executables after generated headers

2021-12-23 Thread Paolo Bonzini

This will be needed as soon as config-poison.h moves from configure to
a meson custom_target (which is built at "ninja" time).

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Paolo Bonzini 
---
 contrib/elf2dmp/meson.build| 2 +-
 contrib/ivshmem-client/meson.build | 2 +-
 contrib/ivshmem-server/meson.build | 2 +-
 contrib/rdmacm-mux/meson.build | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/elf2dmp/meson.build b/contrib/elf2dmp/meson.build
index 4d86cb390a..6707d43c4f 100644
--- a/contrib/elf2dmp/meson.build
+++ b/contrib/elf2dmp/meson.build
@@ -1,5 +1,5 @@
 if curl.found()
-  executable('elf2dmp', files('main.c', 'addrspace.c', 'download.c', 'pdb.c', 'qemu_elf.c'),
+  executable('elf2dmp', files('main.c', 'addrspace.c', 'download.c', 'pdb.c', 'qemu_elf.c'), genh,
  dependencies: [glib, curl],
  install: true)
 endif
diff --git a/contrib/ivshmem-client/meson.build b/contrib/ivshmem-client/meson.build
index 1b171efb4f..ce8dcca84d 100644
--- a/contrib/ivshmem-client/meson.build
+++ b/contrib/ivshmem-client/meson.build
@@ -1,4 +1,4 @@
-executable('ivshmem-client', files('ivshmem-client.c', 'main.c'),
+executable('ivshmem-client', files('ivshmem-client.c', 'main.c'), genh,
dependencies: glib,
build_by_default: targetos == 'linux',
install: false)
diff --git a/contrib/ivshmem-server/meson.build b/contrib/ivshmem-server/meson.build
index 3a53942201..c6c3c82e89 100644
--- a/contrib/ivshmem-server/meson.build
+++ b/contrib/ivshmem-server/meson.build
@@ -1,4 +1,4 @@
-executable('ivshmem-server', files('ivshmem-server.c', 'main.c'),
+executable('ivshmem-server', files('ivshmem-server.c', 'main.c'), genh,
dependencies: [qemuutil, rt],
build_by_default: targetos == 'linux',
install: false)
diff --git a/contrib/rdmacm-mux/meson.build b/contrib/rdmacm-mux/meson.build
index 6cc5016747..7674f54cc5 100644
--- a/contrib/rdmacm-mux/meson.build
+++ b/contrib/rdmacm-mux/meson.build
@@ -2,7 +2,7 @@ if 'CONFIG_PVRDMA' in config_host
   # if not found, CONFIG_PVRDMA should not be set
   # FIXME: broken on big endian architectures
   libumad = cc.find_library('ibumad', required: true)
-  executable('rdmacm-mux', files('main.c'),
+  executable('rdmacm-mux', files('main.c'), genh,
  dependencies: [glib, libumad],
  build_by_default: false,
  install: false)
-- 
2.33.1

[PULL 12/15] configure, meson: move config-poison.h to meson

2021-12-23 Thread Paolo Bonzini

This ensures that the file is regenerated properly whenever config-target.h
or config-devices.h files change.

Signed-off-by: Paolo Bonzini 
---
 Makefile  |  2 +-
 configure | 11 ---
 meson.build   | 12 
 scripts/make-config-poison.sh | 16 
 4 files changed, 29 insertions(+), 12 deletions(-)
 create mode 100755 scripts/make-config-poison.sh

diff --git a/Makefile b/Makefile
index 06ad8a61e1..2f80f56a4a 100644
--- a/Makefile
+++ b/Makefile
@@ -220,7 +220,7 @@ qemu-%.tar.bz2:
 
 distclean: clean
-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) -t clean -g || :
-   rm -f config-host.mak config-poison.h
+   rm -f config-host.mak
rm -f tests/tcg/config-*.mak
rm -f config.status
rm -f roms/seabios/config.mak
diff --git a/configure b/configure
index 8eb8e4c2cc..d2f12bc2d6 100755
--- a/configure
+++ b/configure
@@ -3827,17 +3827,6 @@ if test -n "${deprecated_features}"; then
 echo "  features: ${deprecated_features}"
 fi
 
-# Create list of config switches that should be poisoned in common code...
-# but filter out CONFIG_TCG and CONFIG_USER_ONLY which are special.
-target_configs_h=$(ls *-config-devices.h *-config-target.h 2>/dev/null)
-if test -n "$target_configs_h" ; then
-sed -n -e '/CONFIG_TCG/d' -e '/CONFIG_USER_ONLY/d' \
--e '/^#define / { s///; s/ .*//; s/^/#pragma GCC poison /p; }' \
-$target_configs_h | sort -u > config-poison.h
-else
-:> config-poison.h
-fi
-
 # Save the configure command line for later reuse.
 cat
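
To make clear what the sed pipeline removed from configure produced (and what the new scripts/make-config-poison.sh, not shown in this excerpt, has to reproduce), here is a Python sketch of the same text transformation. The function name and the sample header are assumptions for illustration, and Python's `sorted()` may order names differently from `sort -u` under some locales.

```python
def poison_lines(header_text):
    """Model of the removed configure snippet that built config-poison.h."""
    pragmas = set()
    for line in header_text.splitlines():
        # sed -e '/CONFIG_TCG/d' -e '/CONFIG_USER_ONLY/d': these are special
        # and must stay usable in common code.
        if "CONFIG_TCG" in line or "CONFIG_USER_ONLY" in line:
            continue
        # '/^#define / { s///; s/ .*//; s/^/#pragma GCC poison /p; }'
        if line.startswith("#define "):
            name = line[len("#define "):].split(" ", 1)[0]
            pragmas.add("#pragma GCC poison " + name)
    # sort -u: unique lines in sorted order.
    return sorted(pragmas)


# Hypothetical content of concatenated *-config-target.h files.
SAMPLE_HEADERS = """\
/* Automatically generated - do not modify */
#define CONFIG_KVM 1
#define CONFIG_TCG 1
#define TARGET_X86_64 1
#define CONFIG_KVM 1
#define CONFIG_USER_ONLY 1
"""
```

For the sample above the result is one pragma for `CONFIG_KVM` and one for `TARGET_X86_64`; the duplicate define collapses and the two special switches are left unpoisoned.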

[PATCH v2 01/23] dma: Let dma_memory_valid() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_memory_valid().

Reviewed-by: Richard Henderson 
Reviewed-by: Li Qiang 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Message-Id: <20210702092439.989969-2-phi...@redhat.com>
---
 include/hw/ppc/spapr_vio.h | 2 +-
 include/sysemu/dma.h   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 4bea87f39cc..4c45f1579fa 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -91,7 +91,7 @@ static inline void spapr_vio_irq_pulse(SpaprVioDevice *dev)
 static inline bool spapr_vio_dma_valid(SpaprVioDevice *dev, uint64_t taddr,
uint32_t size, DMADirection dir)
 {
-return dma_memory_valid(&dev->as, taddr, size, dir);
+return dma_memory_valid(&dev->as, taddr, size, dir, MEMTXATTRS_UNSPECIFIED);
 }
 
 static inline int spapr_vio_dma_read(SpaprVioDevice *dev, uint64_t taddr,
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 3201e7901db..296f3b57c9c 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -73,11 +73,11 @@ static inline void dma_barrier(AddressSpace *as, DMADirection dir)
  * dma_memory_{read,write}() and check for errors */
 static inline bool dma_memory_valid(AddressSpace *as,
 dma_addr_t addr, dma_addr_t len,
-DMADirection dir)
+DMADirection dir, MemTxAttrs attrs)
 {
 return address_space_access_valid(as, addr, len,
   dir == DMA_DIRECTION_FROM_DEVICE,
-  MEMTXATTRS_UNSPECIFIED);
+  attrs);
 }
 
 static inline MemTxResult dma_memory_rw_relaxed(AddressSpace *as,
-- 
2.33.1

[PATCH v2 07/23] dma: Have dma_buf_rw() take a void pointer

2021-12-23 Thread Philippe Mathieu-Daudé

DMA operations are run on any kind of buffer, not arrays of
uint8_t. Convert dma_buf_rw() to take a void pointer argument
to save us pointless casts to uint8_t *.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 softmmu/dma-helpers.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 3c06a2feddd..09e29997ee5 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -294,9 +294,10 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
 }
 
 
-static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg,
+static uint64_t dma_buf_rw(void *buf, int32_t len, QEMUSGList *sg,
DMADirection dir)
 {
+uint8_t *ptr = buf;
 uint64_t resid;
 int sg_cur_index;
 
-- 
2.33.1
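
The void-pointer conversion in the dma_buf_rw() patch above can be illustrated outside QEMU. The following is a minimal sketch, assuming a hypothetical helper toy_buf_copy() and a hypothetical struct toy_info, neither of which exists in QEMU. It shows only the C rule the patch relies on: any object pointer converts implicitly to void *, and the callee converts back to uint8_t * once, so call sites need no cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload type, standing in for a device-specific struct. */
struct toy_info {
    uint32_t size;
    uint32_t count;
};

/*
 * Hypothetical helper (not a QEMU function): copy up to dst_len bytes of
 * the object at buf into dst and return the number of bytes not copied.
 */
static size_t toy_buf_copy(void *buf, size_t len, uint8_t *dst, size_t dst_len)
{
    uint8_t *ptr = buf;   /* single implicit conversion, as in the patch */
    size_t n = len < dst_len ? len : dst_len;

    assert(buf != NULL && dst != NULL);
    memcpy(dst, ptr, n);
    return len - n;
}
```

A caller can then write toy_buf_copy(&info, sizeof(info), out, sizeof(out)) directly, where the uint8_t * prototype would have required (uint8_t *)&info.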

[PULL 13/15] meson: add comments in the target-specific flags section

2021-12-23 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini 
---
 meson.build | 5 +
 1 file changed, 5 insertions(+)

diff --git a/meson.build b/meson.build
index a61eb7cee5..3519ed51e3 100644
--- a/meson.build
+++ b/meson.build
@@ -233,6 +233,7 @@ endif
 # Target-specific checks and dependencies #
 ###
 
+# Fuzzing
 if get_option('fuzzing') and get_option('fuzzing_engine') == '' and \
 not cc.links('''
   #include 
@@ -244,6 +245,7 @@ if get_option('fuzzing') and get_option('fuzzing_engine') == '' and \
   error('Your compiler does not support -fsanitize=fuzzer')
 endif
 
+# Tracing backends
 if 'ftrace' in get_option('trace_backends') and targetos != 'linux'
   error('ftrace is supported only on Linux')
 endif
@@ -257,6 +259,7 @@ if 'syslog' in get_option('trace_backends') and not cc.compiles('''
   error('syslog is not supported on this system')
 endif
 
+# Miscellaneous Linux-only features
 if targetos != 'linux' and get_option('mpath').enabled()
   error('Multipath is supported only on Linux')
 endif
@@ -266,6 +269,7 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
 endif
 multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
 
+# Target-specific libraries and flags
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
 util = cc.find_library('util', required: false)
@@ -306,6 +310,7 @@ elif targetos == 'openbsd'
   endif
 endif
 
+# Target-specific configuration of accelerators
 accelerators = []
 if not get_option('kvm').disabled() and targetos == 'linux'
   accelerators += 'CONFIG_KVM'
-- 
2.33.1

[PATCH v2 03/23] dma: Let dma_memory_rw_relaxed() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

We will add the MemTxAttrs argument to dma_memory_rw() in
the next commit. Since dma_memory_rw_relaxed() is only used
by dma_memory_rw(), modify it first in a separate commit to
keep the next commit easier to review.

Reviewed-by: Richard Henderson 
Reviewed-by: Li Qiang 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Message-Id: <20210702092439.989969-4-phi...@redhat.com>
---
 include/sysemu/dma.h | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index d23516f020a..3be803cf3ff 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -83,9 +83,10 @@ static inline bool dma_memory_valid(AddressSpace *as,
 static inline MemTxResult dma_memory_rw_relaxed(AddressSpace *as,
 dma_addr_t addr,
 void *buf, dma_addr_t len,
-DMADirection dir)
+DMADirection dir,
+MemTxAttrs attrs)
 {
-return address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED,
+return address_space_rw(as, addr, attrs,
 buf, len, dir == DMA_DIRECTION_FROM_DEVICE);
 }
 
@@ -93,7 +94,9 @@ static inline MemTxResult dma_memory_read_relaxed(AddressSpace *as,
   dma_addr_t addr,
   void *buf, dma_addr_t len)
 {
-return dma_memory_rw_relaxed(as, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+return dma_memory_rw_relaxed(as, addr, buf, len,
+ DMA_DIRECTION_TO_DEVICE,
+ MEMTXATTRS_UNSPECIFIED);
 }
 
 static inline MemTxResult dma_memory_write_relaxed(AddressSpace *as,
@@ -102,7 +105,8 @@ static inline MemTxResult dma_memory_write_relaxed(AddressSpace *as,
dma_addr_t len)
 {
 return dma_memory_rw_relaxed(as, addr, (void *)buf, len,
- DMA_DIRECTION_FROM_DEVICE);
+ DMA_DIRECTION_FROM_DEVICE,
+ MEMTXATTRS_UNSPECIFIED);
 }
 
 /**
@@ -124,7 +128,8 @@ static inline MemTxResult dma_memory_rw(AddressSpace *as, dma_addr_t addr,
 {
 dma_barrier(as, dir);
 
-return dma_memory_rw_relaxed(as, addr, buf, len, dir);
+return dma_memory_rw_relaxed(as, addr, buf, len, dir,
+ MEMTXATTRS_UNSPECIFIED);
 }
 
 /**
-- 
2.33.1

[PATCH v2 04/23] dma: Let dma_memory_rw() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_memory_rw().

Reviewed-by: Richard Henderson 
Reviewed-by: Li Qiang 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Message-Id: <20210702092439.989969-5-phi...@redhat.com>
---
 include/hw/pci/pci.h  |  3 ++-
 include/sysemu/dma.h  | 11 ++-
 hw/intc/spapr_xive.c  |  3 ++-
 hw/usb/hcd-ohci.c | 10 ++
 softmmu/dma-helpers.c |  3 ++-
 5 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e7cdf2d5ec5..4383f1c95e0 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -808,7 +808,8 @@ static inline MemTxResult pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
  void *buf, dma_addr_t len,
  DMADirection dir)
 {
-return dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir);
+return dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
+ dir, MEMTXATTRS_UNSPECIFIED);
 }
 
 /**
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 3be803cf3ff..e8ad42226f6 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -121,15 +121,15 @@ static inline MemTxResult dma_memory_write_relaxed(AddressSpace *as,
  * @buf: buffer with the data transferred
  * @len: the number of bytes to read or write
  * @dir: indicates the transfer direction
+ * @attrs: memory transaction attributes
  */
 static inline MemTxResult dma_memory_rw(AddressSpace *as, dma_addr_t addr,
 void *buf, dma_addr_t len,
-DMADirection dir)
+DMADirection dir, MemTxAttrs attrs)
 {
 dma_barrier(as, dir);
 
-return dma_memory_rw_relaxed(as, addr, buf, len, dir,
- MEMTXATTRS_UNSPECIFIED);
+return dma_memory_rw_relaxed(as, addr, buf, len, dir, attrs);
 }
 
 /**
@@ -147,7 +147,8 @@ static inline MemTxResult dma_memory_rw(AddressSpace *as, dma_addr_t addr,
 static inline MemTxResult dma_memory_read(AddressSpace *as, dma_addr_t addr,
   void *buf, dma_addr_t len)
 {
-return dma_memory_rw(as, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+return dma_memory_rw(as, addr, buf, len,
+ DMA_DIRECTION_TO_DEVICE, MEMTXATTRS_UNSPECIFIED);
 }
 
 /**
@@ -166,7 +167,7 @@ static inline MemTxResult dma_memory_write(AddressSpace *as, dma_addr_t addr,
const void *buf, dma_addr_t len)
 {
 return dma_memory_rw(as, addr, (void *)buf, len,
- DMA_DIRECTION_FROM_DEVICE);
+ DMA_DIRECTION_FROM_DEVICE, MEMTXATTRS_UNSPECIFIED);
 }
 
 /**
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 4ec659b93e1..eae95c716f1 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -1684,7 +1684,8 @@ static target_ulong h_int_esb(PowerPCCPU *cpu,
 mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
 
 if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
-  (flags & SPAPR_XIVE_ESB_STORE))) {
+  (flags & SPAPR_XIVE_ESB_STORE),
+  MEMTXATTRS_UNSPECIFIED)) {
 qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
   HWADDR_PRIx "\n", mmio_addr);
 return H_HARDWARE;
diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index 1cf2816772c..56e2315c734 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -586,7 +586,8 @@ static int ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
 if (n > len)
 n = len;
 
-if (dma_memory_rw(ohci->as, ptr + ohci->localmem_base, buf, n, dir)) {
+if (dma_memory_rw(ohci->as, ptr + ohci->localmem_base, buf,
+  n, dir, MEMTXATTRS_UNSPECIFIED)) {
 return -1;
 }
 if (n == len) {
@@ -595,7 +596,7 @@ static int ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
 ptr = td->be & ~0xfffu;
 buf += n;
 if (dma_memory_rw(ohci->as, ptr + ohci->localmem_base, buf,
-  len - n, dir)) {
+  len - n, dir, MEMTXATTRS_UNSPECIFIED)) {
 return -1;
 }
 return 0;
@@ -613,7 +614,8 @@ static int ohci_copy_iso_td(OHCIState *ohci,
 if (n > len)
 n = len;
 
-if (dma_memory_rw(ohci->as, ptr + ohci->localmem_base, buf, n, dir)) {
+if (dma_memory_rw(ohci->as, ptr + ohci->localmem_base, buf,
+  n, dir, MEMTXATTRS_UNSPECIFIED)) {
 return -1;
 }
 if (n == len) {
@@ -622,7 +624,7 @@ static int ohci_copy_iso_td(OHCIState *ohci,
 ptr = end_addr & ~0xfffu;
 buf += n;
 if (dma_memory_rw(ohci->as, ptr + ohci->localmem_base, buf,
-  len - n, dir)) {
+  len - n, dir,

[PATCH v2 00/23] hw: Have DMA APIs take MemTxAttrs arg & propagate MemTxResult (full)

2021-12-23 Thread Philippe Mathieu-Daudé

Hi Peter and Paolo.

This series contains all the uncontroversial patches from
the "improve DMA situations, avoid re-entrancy issues"
earlier series. The rest will be discussed on top.

The only operations added are:
- take MemTxAttrs argument
- propagate MemTxResult
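
The two operations listed above can be sketched in isolation. This is a minimal sketch under stated assumptions: ToyTxResult, ToyTxAttrs, ToySpace and toy_dma_read() are simplified, hypothetical stand-ins and are not QEMU's MemTxResult, MemTxAttrs, AddressSpace or dma_memory_read(). The point is only the shape of the API: the caller passes the transaction attributes in, and the access result is returned rather than discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for illustration; these are NOT QEMU's definitions. */
typedef uint32_t ToyTxResult;
#define TOY_TX_OK    0u
#define TOY_TX_ERROR 1u

typedef struct ToyTxAttrs {
    unsigned int secure : 1;   /* hypothetical attribute bit */
} ToyTxAttrs;

typedef struct ToySpace {
    uint8_t *ram;      /* backing storage */
    size_t size;       /* number of bytes in ram */
    int secure_only;   /* non-zero: reject non-secure accesses */
} ToySpace;

/* The caller supplies the attributes and receives the transaction result. */
static ToyTxResult toy_dma_read(ToySpace *as, size_t addr, void *buf,
                                size_t len, ToyTxAttrs attrs)
{
    assert(as != NULL && buf != NULL);
    if (addr > as->size || len > as->size - addr) {
        return TOY_TX_ERROR;            /* out-of-range access */
    }
    if (as->secure_only && !attrs.secure) {
        return TOY_TX_ERROR;            /* attributes do not permit access */
    }
    memcpy(buf, as->ram + addr, len);
    return TOY_TX_OK;
}
```

Because the result is returned, a caller can tell a failed access apart from a successful one instead of consuming whatever happened to be left in the buffer.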

All patches are reviewed.

If you don't see any objection, I plan to send this via
a pull request by the end of next week.

Regards,

Phil.

Philippe Mathieu-Daudé (23):
  dma: Let dma_memory_valid() take MemTxAttrs argument
  dma: Let dma_memory_set() take MemTxAttrs argument
  dma: Let dma_memory_rw_relaxed() take MemTxAttrs argument
  dma: Let dma_memory_rw() take MemTxAttrs argument
  dma: Let dma_memory_read/write() take MemTxAttrs argument
  dma: Let dma_memory_map() take MemTxAttrs argument
  dma: Have dma_buf_rw() take a void pointer
  dma: Have dma_buf_read() / dma_buf_write() take a void pointer
  dma: Let pci_dma_rw() take MemTxAttrs argument
  dma: Let dma_buf_rw() take MemTxAttrs argument
  dma: Let dma_buf_write() take MemTxAttrs argument
  dma: Let dma_buf_read() take MemTxAttrs argument
  dma: Let dma_buf_rw() propagate MemTxResult
  dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult
  dma: Let st*_dma() take MemTxAttrs argument
  dma: Let ld*_dma() take MemTxAttrs argument
  dma: Let st*_dma() propagate MemTxResult
  dma: Let ld*_dma() propagate MemTxResult
  hw/scsi/megasas: Use uint32_t for reply queue head/tail values
  pci: Let st*_pci_dma() take MemTxAttrs argument
  pci: Let ld*_pci_dma() take MemTxAttrs argument
  pci: Let st*_pci_dma() propagate MemTxResult
  pci: Let ld*_pci_dma() propagate MemTxResult

 include/hw/pci/pci.h  | 38 +--
 include/hw/ppc/spapr_vio.h| 30 
 include/sysemu/dma.h  | 90 +--
 hw/arm/musicpal.c | 13 ++---
 hw/arm/smmu-common.c  |  3 +-
 hw/arm/smmuv3.c   | 14 --
 hw/audio/intel-hda.c  | 13 +++--
 hw/core/generic-loader.c  |  3 +-
 hw/display/virtio-gpu.c   | 10 ++--
 hw/dma/pl330.c| 12 +++--
 hw/dma/sparc32_dma.c  | 16 ---
 hw/dma/xlnx-zynq-devcfg.c |  6 ++-
 hw/dma/xlnx_dpdma.c   | 10 ++--
 hw/hyperv/vmbus.c |  8 ++--
 hw/i386/amd_iommu.c   | 16 ---
 hw/i386/intel_iommu.c | 28 ++-
 hw/ide/ahci.c | 18 ---
 hw/ide/macio.c|  2 +-
 hw/intc/pnv_xive.c|  7 +--
 hw/intc/spapr_xive.c  |  3 +-
 hw/intc/xive.c|  7 +--
 hw/misc/bcm2835_property.c|  3 +-
 hw/misc/macio/mac_dbdma.c | 10 ++--
 hw/net/allwinner-sun8i-emac.c | 18 ---
 hw/net/eepro100.c | 49 +++
 hw/net/ftgmac100.c| 25 ++
 hw/net/imx_fec.c  | 32 -
 hw/net/npcm7xx_emc.c  | 20 
 hw/net/tulip.c| 36 +++---
 hw/nvme/ctrl.c|  5 +-
 hw/nvram/fw_cfg.c | 16 ---
 hw/pci-host/pnv_phb3.c|  5 +-
 hw/pci-host/pnv_phb3_msi.c|  9 ++--
 hw/pci-host/pnv_phb4.c|  5 +-
 hw/scsi/esp-pci.c |  2 +-
 hw/scsi/megasas.c | 86 ++---
 hw/scsi/mptsas.c  | 16 +--
 hw/scsi/scsi-bus.c|  4 +-
 hw/scsi/vmw_pvscsi.c  | 20 +---
 hw/sd/allwinner-sdhost.c  | 14 +++---
 hw/sd/sdhci.c | 35 +-
 hw/usb/hcd-dwc2.c |  8 ++--
 hw/usb/hcd-ehci.c |  6 ++-
 hw/usb/hcd-ohci.c | 28 ++-
 hw/usb/hcd-xhci.c | 26 ++
 hw/usb/libhw.c|  3 +-
 hw/virtio/virtio.c|  6 ++-
 softmmu/dma-helpers.c | 32 -
 hw/scsi/trace-events  |  8 ++--
 49 files changed, 542 insertions(+), 332 deletions(-)

-- 
2.33.1

[PATCH v2 06/23] dma: Let dma_memory_map() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_memory_map().

Patch created mechanically using spatch with this script:

  @@
  expression E1, E2, E3, E4;
  @@
  - dma_memory_map(E1, E2, E3, E4)
  + dma_memory_map(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED)

Reviewed-by: Richard Henderson 
Reviewed-by: Li Qiang 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Message-Id: <20210702092439.989969-7-phi...@redhat.com>
---
 include/hw/pci/pci.h|  3 ++-
 include/sysemu/dma.h|  5 +++--
 hw/display/virtio-gpu.c | 10 ++
 hw/hyperv/vmbus.c   |  8 +---
 hw/ide/ahci.c   |  8 +---
 hw/usb/libhw.c  |  3 ++-
 hw/virtio/virtio.c  |  6 --
 softmmu/dma-helpers.c   |  3 ++-
 8 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 4383f1c95e0..1acefc2a4c3 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -875,7 +875,8 @@ static inline void *pci_dma_map(PCIDevice *dev, dma_addr_t addr,
 {
 void *buf;
 
-buf = dma_memory_map(pci_get_address_space(dev), addr, plen, dir);
+buf = dma_memory_map(pci_get_address_space(dev), addr, plen, dir,
+ MEMTXATTRS_UNSPECIFIED);
 return buf;
 }
 
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 522682bf386..97ff6f29f8c 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -202,16 +202,17 @@ MemTxResult dma_memory_set(AddressSpace *as, dma_addr_t addr,
  * @addr: address within that address space
  * @len: pointer to length of buffer; updated on return
  * @dir: indicates the transfer direction
+ * @attrs: memory attributes
  */
 static inline void *dma_memory_map(AddressSpace *as,
dma_addr_t addr, dma_addr_t *len,
-   DMADirection dir)
+   DMADirection dir, MemTxAttrs attrs)
 {
 hwaddr xlen = *len;
 void *p;
 
 p = address_space_map(as, addr, &xlen, dir == DMA_DIRECTION_FROM_DEVICE,
-  MEMTXATTRS_UNSPECIFIED);
+  attrs);
 *len = xlen;
 return p;
 }
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index d78b9700c7d..c6dc818988c 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -814,8 +814,9 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
 
 do {
 len = l;
-map = dma_memory_map(VIRTIO_DEVICE(g)->dma_as,
- a, &len, DMA_DIRECTION_TO_DEVICE);
+map = dma_memory_map(VIRTIO_DEVICE(g)->dma_as, a, &len,
+ DMA_DIRECTION_TO_DEVICE,
+ MEMTXATTRS_UNSPECIFIED);
 if (!map) {
 qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map MMIO memory for"
   " element %d\n", __func__, e);
@@ -1252,8 +1253,9 @@ static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
 for (i = 0; i < res->iov_cnt; i++) {
 hwaddr len = res->iov[i].iov_len;
 res->iov[i].iov_base =
-dma_memory_map(VIRTIO_DEVICE(g)->dma_as,
-   res->addrs[i], &len, DMA_DIRECTION_TO_DEVICE);
+dma_memory_map(VIRTIO_DEVICE(g)->dma_as, res->addrs[i], &len,
+   DMA_DIRECTION_TO_DEVICE,
+   MEMTXATTRS_UNSPECIFIED);
 
 if (!res->iov[i].iov_base || len != res->iov[i].iov_len) {
 /* Clean up the half-a-mapping we just created... */
diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
index dbce3b35fba..8aad29f1bb2 100644
--- a/hw/hyperv/vmbus.c
+++ b/hw/hyperv/vmbus.c
@@ -373,7 +373,8 @@ static ssize_t gpadl_iter_io(GpadlIter *iter, void *buf, uint32_t len)
 
 maddr = (iter->gpadl->gfns[idx] << TARGET_PAGE_BITS) | off_in_page;
 
-iter->map = dma_memory_map(iter->as, maddr, &mlen, iter->dir);
+iter->map = dma_memory_map(iter->as, maddr, &mlen, iter->dir,
+   MEMTXATTRS_UNSPECIFIED);
 if (mlen != pgleft) {
 dma_memory_unmap(iter->as, iter->map, mlen, iter->dir, 0);
 iter->map = NULL;
@@ -490,7 +491,8 @@ int vmbus_map_sgl(VMBusChanReq *req, DMADirection dir, struct iovec *iov,
 goto err;
 }
 
-iov[ret_cnt].iov_base = dma_memory_map(sgl->as, a, &l, dir);
+iov[ret_cnt].iov_base = dma_memory_map(sgl->as, a, &l, dir,
+   MEMTXATTRS_UNSPECIFIED);
 if (!l) {
 ret = -EFAULT;
 goto err;
@@ -566,7 +568,7 @@ static vmbus_ring_buffer *ringbuf_map_hdr(VMBusRingBufCommon *ringbuf)
 dma_addr_t mlen = sizeof(*rb);
 
 rb = dma_memory_map(ringbuf->as, ringbuf->rb_addr, &mlen,
-DMA

[PATCH v2 18/23] dma: Let ld*_dma() propagate MemTxResult

2021-12-23 Thread Philippe Mathieu-Daudé

dma_memory_read() returns a MemTxResult type. Do not discard
it, return it to the caller.

Update the few callers.
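
The calling convention this patch moves to can be sketched as follows. This is a minimal sketch, assuming a hypothetical toy_ldl_be() over a fixed byte array; it is not QEMU's ldl_be_dma(), and ToyTxResult is a stand-in for MemTxResult. The loaded value travels through an out-parameter so that the return value is free to carry the transaction result. One deliberate difference from the macro in the diff: this sketch leaves *pval untouched on failure, which is a choice made here for clarity and not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for illustration; NOT QEMU's MemTxResult. */
typedef uint32_t ToyTxResult;
#define TOY_TX_OK    0u
#define TOY_TX_ERROR 1u

/* Hypothetical guest memory holding big-endian data. */
static const uint8_t toy_ram[8] = {
    0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
};

/* Load a big-endian 32-bit value into *pval and return the access result. */
static ToyTxResult toy_ldl_be(size_t addr, uint32_t *pval)
{
    uint8_t raw[4];

    assert(pval != NULL);
    if (addr > sizeof(toy_ram) || sizeof(raw) > sizeof(toy_ram) - addr) {
        return TOY_TX_ERROR;   /* out of range: *pval is left unchanged */
    }
    memcpy(raw, toy_ram + addr, sizeof(raw));
    *pval = ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
            ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
    return TOY_TX_OK;
}
```

Callers that do not care about the result can ignore the return value and read the out-parameter, which is what the updated wrappers in this patch do.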

Reviewed-by: Richard Henderson 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h   |  6 --
 include/hw/ppc/spapr_vio.h |  6 +-
 include/sysemu/dma.h   | 25 -
 hw/intc/pnv_xive.c |  8 
 hw/usb/hcd-xhci.c  |  7 ---
 5 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 0613308b1b6..8c5f2ed5054 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -854,8 +854,10 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, dma_addr_t addr,
 static inline uint##_bits##_t ld##_l##_pci_dma(PCIDevice *dev,  \
dma_addr_t addr) \
 {   \
-return ld##_l##_dma(pci_get_address_space(dev), addr,   \
-MEMTXATTRS_UNSPECIFIED);\
+uint##_bits##_t val; \
+ld##_l##_dma(pci_get_address_space(dev), addr, &val, \
+ MEMTXATTRS_UNSPECIFIED); \
+return val; \
 }   \
 static inline void st##_s##_pci_dma(PCIDevice *dev, \
 dma_addr_t addr, uint##_bits##_t val) \
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index d2ec9b0637f..7eae1a48478 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -127,7 +127,11 @@ static inline int spapr_vio_dma_set(SpaprVioDevice *dev, uint64_t taddr,
 #define vio_stq(_dev, _addr, _val) \
 (stq_be_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
 #define vio_ldq(_dev, _addr) \
-(ldq_be_dma(&(_dev)->as, (_addr), MEMTXATTRS_UNSPECIFIED))
+({ \
+uint64_t _val; \
+ldq_be_dma(&(_dev)->as, (_addr), &_val, MEMTXATTRS_UNSPECIFIED); \
+_val; \
+})
 
 int spapr_vio_send_crq(SpaprVioDevice *dev, uint8_t *crq);
 
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 725e8e90f88..e6776586613 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -240,14 +240,15 @@ static inline void dma_memory_unmap(AddressSpace *as,
 }
 
 #define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
-static inline uint##_bits##_t ld##_lname##_##_end##_dma(AddressSpace *as, \
-dma_addr_t addr, \
-MemTxAttrs attrs) \
-{   \
-uint##_bits##_t val;\
-dma_memory_read(as, addr, &val, (_bits) / 8, attrs); \
-return _end##_bits##_to_cpu(val);   \
-}   \
+static inline MemTxResult ld##_lname##_##_end##_dma(AddressSpace *as, \
+dma_addr_t addr, \
+uint##_bits##_t *pval, \
+MemTxAttrs attrs) \
+{ \
+MemTxResult res = dma_memory_read(as, addr, pval, (_bits) / 8, attrs); \
+_end##_bits##_to_cpus(pval); \
+return res; \
+} \
 static inline MemTxResult st##_sname##_##_end##_dma(AddressSpace *as, \
 dma_addr_t addr, \
 uint##_bits##_t val, \
@@ -257,12 +258,10 @@ static inline void dma_memory_unmap(AddressSpace *as,
 return dma_memory_write(as, addr, &val, (_bits) / 8, attrs); \
 }
 
-static inline uint8_t ldub_dma(AddressSpace *as, dma_addr_t addr, MemTxAttrs attrs)
+static inline MemTxResult ldub_dma(AddressSpace *as, dma_addr_t addr,
+   uint8_t *val, MemTxAttrs attrs)
 {
-uint8_t val;
-
-dma_memory_read(as, addr, &val, 1, attrs);
-return val;
+return dma_memory_read(as, addr, val, 1, attrs);
 }
 
 static inline MemTxResult stb_dma(AddressSpace *as, dma_addr_t addr,
diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index d9249bbc0c1..bb207514f2d 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -172,7 +172,7 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, uint32_t type,
 
 /* Get the page size of the indirect table. */
 vsd_addr = vsd & VSD_ADDRESS_MASK;
-vsd = ldq_be_dma(&address_space_memory, vsd_addr, MEMTXATTRS_UNSPECIFIED);
+ldq_be_dma(&address_space_memory, vsd_addr, &vsd, MEMTXATTRS_UNSPECIFIED);
 
 if (!(vsd & VSD_ADDRESS_MASK)) {
 #ifdef XIVE_DEBUG
@@ -195,8 +195,8 @@ static uint6

[PATCH v2 09/23] dma: Let pci_dma_rw() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling pci_dma_rw().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h | 10 ++
 hw/audio/intel-hda.c |  3 ++-
 hw/scsi/esp-pci.c|  2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 1acefc2a4c3..a751ab5a75d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -806,10 +806,10 @@ static inline AddressSpace *pci_get_address_space(PCIDevice *dev)
  */
 static inline MemTxResult pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
  void *buf, dma_addr_t len,
- DMADirection dir)
+ DMADirection dir, MemTxAttrs attrs)
 {
 return dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
- dir, MEMTXATTRS_UNSPECIFIED);
+ dir, attrs);
 }
 
 /**
@@ -827,7 +827,8 @@ static inline MemTxResult pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
 static inline MemTxResult pci_dma_read(PCIDevice *dev, dma_addr_t addr,
void *buf, dma_addr_t len)
 {
-return pci_dma_rw(dev, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+return pci_dma_rw(dev, addr, buf, len,
+  DMA_DIRECTION_TO_DEVICE, MEMTXATTRS_UNSPECIFIED);
 }
 
 /**
@@ -845,7 +846,8 @@ static inline MemTxResult pci_dma_read(PCIDevice *dev, dma_addr_t addr,
 static inline MemTxResult pci_dma_write(PCIDevice *dev, dma_addr_t addr,
 const void *buf, dma_addr_t len)
 {
-return pci_dma_rw(dev, addr, (void *) buf, len, DMA_DIRECTION_FROM_DEVICE);
+return pci_dma_rw(dev, addr, (void *) buf, len,
+  DMA_DIRECTION_FROM_DEVICE, MEMTXATTRS_UNSPECIFIED);
 }
 
 #define PCI_DMA_DEFINE_LDST(_l, _s, _bits)  \
diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 8ce9df64e3e..fb3d34a4a0c 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -427,7 +427,8 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
 dprint(d, 3, "dma: entry %d, pos %d/%d, copy %d\n",
st->be, st->bp, st->bpl[st->be].len, copy);
 
-pci_dma_rw(&d->pci, st->bpl[st->be].addr + st->bp, buf, copy, !output);
+pci_dma_rw(&d->pci, st->bpl[st->be].addr + st->bp, buf, copy, !output,
+   MEMTXATTRS_UNSPECIFIED);
 st->lpib += copy;
 st->bp += copy;
 buf += copy;
diff --git a/hw/scsi/esp-pci.c b/hw/scsi/esp-pci.c
index dac054aeed4..1792f84cea6 100644
--- a/hw/scsi/esp-pci.c
+++ b/hw/scsi/esp-pci.c
@@ -280,7 +280,7 @@ static void esp_pci_dma_memory_rw(PCIESPState *pci, uint8_t *buf, int len,
 len = pci->dma_regs[DMA_WBC];
 }
 
-pci_dma_rw(PCI_DEVICE(pci), addr, buf, len, dir);
+pci_dma_rw(PCI_DEVICE(pci), addr, buf, len, dir, MEMTXATTRS_UNSPECIFIED);
 
 /* update status registers */
 pci->dma_regs[DMA_WBC] -= len;
-- 
2.33.1

[PATCH v2 02/23] dma: Let dma_memory_set() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_memory_set().

Reviewed-by: Richard Henderson 
Reviewed-by: Li Qiang 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Message-Id: <20210702092439.989969-3-phi...@redhat.com>
---
 include/hw/ppc/spapr_vio.h | 3 ++-
 include/sysemu/dma.h   | 3 ++-
 hw/nvram/fw_cfg.c  | 3 ++-
 softmmu/dma-helpers.c  | 5 ++---
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 4c45f1579fa..c90e74a67dd 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -111,7 +111,8 @@ static inline int spapr_vio_dma_write(SpaprVioDevice *dev, uint64_t taddr,
 static inline int spapr_vio_dma_set(SpaprVioDevice *dev, uint64_t taddr,
 uint8_t c, uint32_t size)
 {
-return (dma_memory_set(&dev->as, taddr, c, size) != 0) ?
+return (dma_memory_set(&dev->as, taddr,
+   c, size, MEMTXATTRS_UNSPECIFIED) != 0) ?
 H_DEST_PARM : H_SUCCESS;
 }
 
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 296f3b57c9c..d23516f020a 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -175,9 +175,10 @@ static inline MemTxResult dma_memory_write(AddressSpace *as, dma_addr_t addr,
  * @addr: address within that address space
  * @c: constant byte to fill the memory
  * @len: the number of bytes to fill with the constant byte
+ * @attrs: memory transaction attributes
  */
 MemTxResult dma_memory_set(AddressSpace *as, dma_addr_t addr,
-   uint8_t c, dma_addr_t len);
+   uint8_t c, dma_addr_t len, MemTxAttrs attrs);
 
 /**
  * address_space_map: Map a physical memory region into a host virtual address.
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index c06b30de112..f7803fe3c30 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -399,7 +399,8 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
  * tested before.
  */
 if (read) {
-if (dma_memory_set(s->dma_as, dma.address, 0, len)) {
+if (dma_memory_set(s->dma_as, dma.address, 0, len,
+   MEMTXATTRS_UNSPECIFIED)) {
 dma.control |= FW_CFG_DMA_CTL_ERROR;
 }
 }
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 7d766a5e89a..1f07217ad4a 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -19,7 +19,7 @@
 /* #define DEBUG_IOMMU */
 
 MemTxResult dma_memory_set(AddressSpace *as, dma_addr_t addr,
-   uint8_t c, dma_addr_t len)
+   uint8_t c, dma_addr_t len, MemTxAttrs attrs)
 {
 dma_barrier(as, DMA_DIRECTION_FROM_DEVICE);
 
@@ -31,8 +31,7 @@ MemTxResult dma_memory_set(AddressSpace *as, dma_addr_t addr,
 memset(fillbuf, c, FILLBUF_SIZE);
 while (len > 0) {
 l = len < FILLBUF_SIZE ? len : FILLBUF_SIZE;
-error |= address_space_write(as, addr, MEMTXATTRS_UNSPECIFIED,
- fillbuf, l);
+error |= address_space_write(as, addr, attrs, fillbuf, l);
 len -= l;
 addr += l;
 }
-- 
2.33.1

[PATCH v2 08/23] dma: Have dma_buf_read() / dma_buf_write() take a void pointer

2021-12-23 Thread Philippe Mathieu-Daudé

DMA operations are run on any kind of buffer, not arrays of
uint8_t. Convert dma_buf_read/dma_buf_write functions to take
a void pointer argument and save us pointless casts to uint8_t *.

Remove these pointless casts in the megasas device model.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/dma.h  |  4 ++--
 hw/scsi/megasas.c | 22 +++---
 softmmu/dma-helpers.c |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 97ff6f29f8c..0d5b836013d 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -302,8 +302,8 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
   QEMUSGList *sg, uint64_t offset, uint32_t align,
   BlockCompletionFunc *cb, void *opaque);
-uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
-uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
+uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg);
+uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg);
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
 QEMUSGList *sg, enum BlockAcctType type);
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 8f357841004..dc28302f96d 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -848,7 +848,7 @@ static int megasas_ctrl_get_info(MegasasState *s, MegasasCmd *cmd)
MFI_INFO_PDMIX_SATA |
MFI_INFO_PDMIX_LD);
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -878,7 +878,7 @@ static int megasas_mfc_get_defaults(MegasasState *s, MegasasCmd *cmd)
 info.disable_preboot_cli = 1;
 info.cluster_disable = 1;
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -899,7 +899,7 @@ static int megasas_dcmd_get_bios_info(MegasasState *s, MegasasCmd *cmd)
 info.expose_all_drives = 1;
 }
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -910,7 +910,7 @@ static int megasas_dcmd_get_fw_time(MegasasState *s, MegasasCmd *cmd)
 
 fw_time = cpu_to_le64(megasas_fw_time());
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&fw_time, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&fw_time, dcmd_size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -937,7 +937,7 @@ static int megasas_event_info(MegasasState *s, MegasasCmd *cmd)
 info.shutdown_seq_num = cpu_to_le32(s->shutdown_event);
 info.boot_seq_num = cpu_to_le32(s->boot_event);
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -1006,7 +1006,7 @@ static int megasas_dcmd_pd_get_list(MegasasState *s, MegasasCmd *cmd)
 info.size = cpu_to_le32(offset);
 info.count = cpu_to_le32(num_pd_disks);
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&info, offset, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, offset, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -1172,7 +1172,7 @@ static int megasas_dcmd_ld_get_list(MegasasState *s, MegasasCmd *cmd)
 info.ld_count = cpu_to_le32(num_ld_disks);
 trace_megasas_dcmd_ld_get_list(cmd->index, num_ld_disks, max_ld_disks);
 
-resid = dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+resid = dma_buf_read(&info, dcmd_size, &cmd->qsg);
 cmd->iov_size = dcmd_size - resid;
 return MFI_STAT_OK;
 }
@@ -1221,7 +1221,7 @@ static int megasas_dcmd_ld_list_query(MegasasState *s, MegasasCmd *cmd)
 info.size = dcmd_size;
 trace_megasas_dcmd_ld_get_list(cmd->index, num_ld_disks, max_ld_disks);
 
-resid = dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+resid = dma_buf_read(&info, dcmd_size, &cmd->qsg);
 cmd->iov_size = dcmd_size - resid;
 return MFI_STAT_OK;
 }
@@ -1390,7 +1390,7 @@ static int megasas_dcmd_cfg_read(MegasasState *s, MegasasCmd *cmd)
 ld_offset += sizeof(struct mfi_ld_config);
 }
 
-cmd->iov_size -= dma_buf_read((uint8_t *)data, info->size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(data, info->size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -1420,7 +1420,7 @@ static int megasas_dcmd_get_properties(MegasasState *s, MegasasCmd *cmd)
 info.ecc_bucket_leak_rate = cpu_to_le16(1440);
 info.expose_encl_devices = 1;
 
-cmd->iov_size -= dma_buf_read((uint8_t *)&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
 return MFI_STAT_OK;
 }
 
@@ -1465,7 +1465,7 @@ static int megasas

[PATCH v2 14/23] dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult

2021-12-23 Thread Philippe Mathieu-Daudé

Since the previous commit, dma_buf_rw() returns a MemTxResult
type. Do not discard it, return it to the caller.

Since both dma_buf_read() and dma_buf_write() previously returned
the QEMUSGList size not consumed, add an extra argument where the
unconsumed size can be stored.

Update the few callers.
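
The calling convention described above can be sketched as follows. This
is a standalone mock, not QEMU code: MockSGList, mock_dma_buf_read() and
the MOCK_TX_* values are hypothetical stand-ins for QEMUSGList,
dma_buf_read() and MemTxResult. Only the idea (result as return value,
residual through an optional pointer) follows this patch; the attrs
argument is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for MemTxResult values; not the QEMU definitions. */
typedef uint32_t MockTxResult;
#define MOCK_TX_OK    0u
#define MOCK_TX_ERROR 1u

/* Hypothetical, flattened stand-in for QEMUSGList: one guest buffer. */
typedef struct {
    uint8_t *guest;   /* mock guest memory backing the list */
    uint64_t size;    /* total bytes described by the list */
} MockSGList;

/*
 * Mock of the new dma_buf_read() shape: the transaction result is the
 * return value, and the bytes of the list left unconsumed are stored
 * through residp when the caller passes a non-NULL pointer.
 */
static MockTxResult mock_dma_buf_read(const void *ptr, int32_t len,
                                      uint64_t *residp, MockSGList *sg)
{
    uint64_t xfer;

    assert(sg != NULL);
    if (len < 0 || sg->guest == NULL) {
        return MOCK_TX_ERROR;
    }
    xfer = (uint64_t)len < sg->size ? (uint64_t)len : sg->size;
    memcpy(sg->guest, ptr, xfer);
    if (residp) {
        *residp = sg->size - xfer;
    }
    return MOCK_TX_OK;
}

/* Caller that needs the residual, in the style of the megasas hunks. */
static uint64_t reply_with_resid(MockSGList *sg, const void *info,
                                 int32_t len, uint64_t iov_size)
{
    uint64_t resid = 0;

    mock_dma_buf_read(info, len, &resid, sg);
    return iov_size - resid;
}

/*
 * Caller that passes NULL for residp, as the AHCI hunks do. Unlike
 * those hunks, this sketch also looks at the returned result.
 */
static int reply_ignore_resid(MockSGList *sg, const void *info, int32_t len)
{
    return mock_dma_buf_read(info, len, NULL, sg) == MOCK_TX_OK;
}
```

Keeping residp optional lets callers that never used the residual pass
NULL instead of declaring a dummy variable.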

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/dma.h  |  6 --
 hw/ide/ahci.c |  8 
 hw/nvme/ctrl.c|  4 ++--
 hw/scsi/megasas.c | 48 ++-
 hw/scsi/scsi-bus.c|  4 ++--
 softmmu/dma-helpers.c | 18 ++--
 6 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index fd8f16003dd..d11c1d794f9 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -302,8 +302,10 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
   QEMUSGList *sg, uint64_t offset, uint32_t align,
   BlockCompletionFunc *cb, void *opaque);
-uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs);
-uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs);
+MemTxResult dma_buf_read(void *ptr, int32_t len, uint64_t *residp,
+ QEMUSGList *sg, MemTxAttrs attrs);
+MemTxResult dma_buf_write(void *ptr, int32_t len, uint64_t *residp,
+  QEMUSGList *sg, MemTxAttrs attrs);
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
 QEMUSGList *sg, enum BlockAcctType type);
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 205dfdc6622..0c7d31ceada 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1384,9 +1384,9 @@ static void ahci_pio_transfer(const IDEDMA *dma)
 const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 
 if (is_write) {
-dma_buf_write(s->data_ptr, size, &s->sg, attrs);
+dma_buf_write(s->data_ptr, size, NULL, &s->sg, attrs);
 } else {
-dma_buf_read(s->data_ptr, size, &s->sg, attrs);
+dma_buf_read(s->data_ptr, size, NULL, &s->sg, attrs);
 }
 }
 
@@ -1479,9 +1479,9 @@ static int ahci_dma_rw_buf(const IDEDMA *dma, bool 
is_write)
 }
 
 if (is_write) {
-dma_buf_read(p, l, &s->sg, MEMTXATTRS_UNSPECIFIED);
+dma_buf_read(p, l, NULL, &s->sg, MEMTXATTRS_UNSPECIFIED);
 } else {
-dma_buf_write(p, l, &s->sg, MEMTXATTRS_UNSPECIFIED);
+dma_buf_write(p, l, NULL, &s->sg, MEMTXATTRS_UNSPECIFIED);
 }
 
 /* free sglist, update byte count */
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 462f79a1f60..fa410a179a6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1150,9 +1150,9 @@ static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t 
*ptr, uint32_t len,
 uint64_t residual;
 
 if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
-residual = dma_buf_write(ptr, len, &sg->qsg, attrs);
+dma_buf_write(ptr, len, &residual, &sg->qsg, attrs);
 } else {
-residual = dma_buf_read(ptr, len, &sg->qsg, attrs);
+dma_buf_read(ptr, len, &residual, &sg->qsg, attrs);
 }
 
 if (unlikely(residual)) {
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index fe36de10a21..87101705d01 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -738,6 +738,7 @@ static int megasas_ctrl_get_info(MegasasState *s, 
MegasasCmd *cmd)
 size_t dcmd_size = sizeof(info);
 BusChild *kid;
 int num_pd_disks = 0;
+uint64_t resid;
 
 memset(&info, 0x0, dcmd_size);
 if (cmd->iov_size < dcmd_size) {
@@ -848,7 +849,8 @@ static int megasas_ctrl_get_info(MegasasState *s, 
MegasasCmd *cmd)
MFI_INFO_PDMIX_SATA |
MFI_INFO_PDMIX_LD);
 
-cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
+dma_buf_read(&info, dcmd_size, &resid, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
+cmd->iov_size -= resid;
 return MFI_STAT_OK;
 }
 
@@ -856,6 +858,7 @@ static int megasas_mfc_get_defaults(MegasasState *s, 
MegasasCmd *cmd)
 {
 struct mfi_defaults info;
 size_t dcmd_size = sizeof(struct mfi_defaults);
+uint64_t resid;
 
 memset(&info, 0x0, dcmd_size);
 if (cmd->iov_size < dcmd_size) {
@@ -878,7 +881,8 @@ static int megasas_mfc_get_defaults(MegasasState *s, 
MegasasCmd *cmd)
 info.disable_preboot_cli = 1;
 info.cluster_disable = 1;
 
-cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
+dma_buf_read(&info, dcmd_size, &resid, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
+cmd->iov_size -= resid;
 return MFI_STAT_OK;
 }
 
@@ -886,6 +890,7 @@ static int megasas_dcmd_get_bios_info(MegasasState *s, 
MegasasCmd *cmd)
 {
 struct mfi_bios_data info;
 size_t dcmd_size = sizeof(info);
+uint64_t resid;
 
 m

[PATCH v2 10/23] dma: Let dma_buf_rw() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling dma_buf_rw().

Keep the default MEMTXATTRS_UNSPECIFIED in the 2 callers.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 softmmu/dma-helpers.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 7f37548394e..fa81d2b386c 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -295,7 +295,7 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
 
 
 static uint64_t dma_buf_rw(void *buf, int32_t len, QEMUSGList *sg,
-   DMADirection dir)
+   DMADirection dir, MemTxAttrs attrs)
 {
 uint8_t *ptr = buf;
 uint64_t resid;
@@ -307,8 +307,7 @@ static uint64_t dma_buf_rw(void *buf, int32_t len, 
QEMUSGList *sg,
 while (len > 0) {
 ScatterGatherEntry entry = sg->sg[sg_cur_index++];
 int32_t xfer = MIN(len, entry.len);
-dma_memory_rw(sg->as, entry.base, ptr, xfer, dir,
-  MEMTXATTRS_UNSPECIFIED);
+dma_memory_rw(sg->as, entry.base, ptr, xfer, dir, attrs);
 ptr += xfer;
 len -= xfer;
 resid -= xfer;
@@ -319,12 +318,14 @@ static uint64_t dma_buf_rw(void *buf, int32_t len, 
QEMUSGList *sg,
 
 uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg)
 {
-return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_FROM_DEVICE);
+return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_FROM_DEVICE,
+  MEMTXATTRS_UNSPECIFIED);
 }
 
 uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg)
 {
-return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE);
+return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE,
+  MEMTXATTRS_UNSPECIFIED);
 }
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
-- 
2.33.1

[PATCH v2 17/23] dma: Let st*_dma() propagate MemTxResult

2021-12-23 Thread Philippe Mathieu-Daudé

dma_memory_write() returns a MemTxResult type. Do not discard
it, return it to the caller.

Reviewed-by: Richard Henderson 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/dma.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index f3cf60d222d..725e8e90f88 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -248,13 +248,13 @@ static inline void dma_memory_unmap(AddressSpace *as,
 dma_memory_read(as, addr, &val, (_bits) / 8, attrs); \
 return _end##_bits##_to_cpu(val);   \
 }   \
-static inline void st##_sname##_##_end##_dma(AddressSpace *as,  \
- dma_addr_t addr,   \
- uint##_bits##_t val,   \
- MemTxAttrs attrs)  \
-{   \
-val = cpu_to_##_end##_bits(val);\
-dma_memory_write(as, addr, &val, (_bits) / 8, attrs);   \
+static inline MemTxResult st##_sname##_##_end##_dma(AddressSpace *as, \
+dma_addr_t addr, \
+uint##_bits##_t val, \
+MemTxAttrs attrs) \
+{ \
+val = cpu_to_##_end##_bits(val); \
+return dma_memory_write(as, addr, &val, (_bits) / 8, attrs); \
 }
 
 static inline uint8_t ldub_dma(AddressSpace *as, dma_addr_t addr, MemTxAttrs 
attrs)
@@ -265,10 +265,10 @@ static inline uint8_t ldub_dma(AddressSpace *as, 
dma_addr_t addr, MemTxAttrs att
 return val;
 }
 
-static inline void stb_dma(AddressSpace *as, dma_addr_t addr,
-   uint8_t val, MemTxAttrs attrs)
+static inline MemTxResult stb_dma(AddressSpace *as, dma_addr_t addr,
+  uint8_t val, MemTxAttrs attrs)
 {
-dma_memory_write(as, addr, &val, 1, attrs);
+return dma_memory_write(as, addr, &val, 1, attrs);
 }
 
 DEFINE_LDST_DMA(uw, w, 16, le);
-- 
2.33.1
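
A minimal sketch of the pattern this patch applies: a token-pasting
macro that generates store helpers which return the result of the
underlying write instead of discarding it. Everything here is a mock:
MockAS, mock_memory_write() and DEFINE_MOCK_ST() are hypothetical names,
the MOCK_TX_* values are not the QEMU MemTxResult definitions, and the
byte-order conversion done by the real macro is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for MemTxResult values. */
typedef uint32_t MockTxResult;
#define MOCK_TX_OK    0u
#define MOCK_TX_ERROR 1u

/* Hypothetical address space: a flat byte array with a bounds check. */
typedef struct {
    uint8_t *mem;
    size_t size;
} MockAS;

static MockTxResult mock_memory_write(MockAS *as, size_t addr,
                                      const void *buf, size_t len)
{
    assert(as != NULL);
    if (addr > as->size || len > as->size - addr) {
        return MOCK_TX_ERROR;   /* write would fall outside the mock space */
    }
    memcpy(as->mem + addr, buf, len);
    return MOCK_TX_OK;
}

/*
 * Generates mock_st<bits>(): the helper returns the result of the
 * underlying write. Values are stored in host byte order here.
 */
#define DEFINE_MOCK_ST(_bits)                                           \
static inline MockTxResult mock_st##_bits(MockAS *as, size_t addr,      \
                                          uint##_bits##_t val)          \
{                                                                       \
    return mock_memory_write(as, addr, &val, (_bits) / 8);              \
}

DEFINE_MOCK_ST(16)
DEFINE_MOCK_ST(32)
```

A device model could then test the return value of a store, for example
to raise an error status when a descriptor write-back fails, rather than
assuming the store always succeeds.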

[PATCH v2 11/23] dma: Let dma_buf_write() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_buf_write().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/dma.h  | 2 +-
 hw/ide/ahci.c | 6 --
 hw/nvme/ctrl.c| 3 ++-
 hw/scsi/megasas.c | 2 +-
 hw/scsi/scsi-bus.c| 2 +-
 softmmu/dma-helpers.c | 5 ++---
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index 0d5b836013d..e3dd74a9c4f 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -303,7 +303,7 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
   QEMUSGList *sg, uint64_t offset, uint32_t align,
   BlockCompletionFunc *cb, void *opaque);
 uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg);
-uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg);
+uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs);
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
 QEMUSGList *sg, enum BlockAcctType type);
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 8e77ddb660f..079d2977f23 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1381,8 +1381,10 @@ static void ahci_pio_transfer(const IDEDMA *dma)
 has_sglist ? "" : "o");
 
 if (has_sglist && size) {
+const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
+
 if (is_write) {
-dma_buf_write(s->data_ptr, size, &s->sg);
+dma_buf_write(s->data_ptr, size, &s->sg, attrs);
 } else {
 dma_buf_read(s->data_ptr, size, &s->sg);
 }
@@ -1479,7 +1481,7 @@ static int ahci_dma_rw_buf(const IDEDMA *dma, bool 
is_write)
 if (is_write) {
 dma_buf_read(p, l, &s->sg);
 } else {
-dma_buf_write(p, l, &s->sg);
+dma_buf_write(p, l, &s->sg, MEMTXATTRS_UNSPECIFIED);
 }
 
 /* free sglist, update byte count */
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 5f573c417b3..e1a531d5d6c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1146,10 +1146,11 @@ static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, 
uint8_t *ptr, uint32_t len,
 assert(sg->flags & NVME_SG_ALLOC);
 
 if (sg->flags & NVME_SG_DMA) {
+const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 uint64_t residual;
 
 if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
-residual = dma_buf_write(ptr, len, &sg->qsg);
+residual = dma_buf_write(ptr, len, &sg->qsg, attrs);
 } else {
 residual = dma_buf_read(ptr, len, &sg->qsg);
 }
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index dc28302f96d..da1c88167ee 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -1465,7 +1465,7 @@ static int megasas_dcmd_set_properties(MegasasState *s, 
MegasasCmd *cmd)
 dcmd_size);
 return MFI_STAT_INVALID_PARAMETER;
 }
-dma_buf_write(&info, dcmd_size, &cmd->qsg);
+dma_buf_write(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 trace_megasas_dcmd_unsupported(cmd->index, cmd->iov_size);
 return MFI_STAT_OK;
 }
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 77325d8cc7a..64a506a3975 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1423,7 +1423,7 @@ void scsi_req_data(SCSIRequest *req, int len)
 if (req->cmd.mode == SCSI_XFER_FROM_DEV) {
 req->resid = dma_buf_read(buf, len, req->sg);
 } else {
-req->resid = dma_buf_write(buf, len, req->sg);
+req->resid = dma_buf_write(buf, len, req->sg, MEMTXATTRS_UNSPECIFIED);
 }
 scsi_req_continue(req);
 }
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index fa81d2b386c..2f1a241b81a 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -322,10 +322,9 @@ uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList 
*sg)
   MEMTXATTRS_UNSPECIFIED);
 }
 
-uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg)
+uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs)
 {
-return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE,
-  MEMTXATTRS_UNSPECIFIED);
+return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE, attrs);
 }
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
-- 
2.33.1

[PATCH v2 20/23] pci: Let st*_pci_dma() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling st*_pci_dma().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h | 11 ++-
 hw/audio/intel-hda.c | 10 ++
 hw/net/eepro100.c| 29 ++---
 hw/net/tulip.c   | 18 ++
 hw/scsi/megasas.c| 15 ++-
 hw/scsi/vmw_pvscsi.c |  3 ++-
 6 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 8c5f2ed5054..9f51ef2c3c2 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -859,11 +859,12 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, 
dma_addr_t addr,
  MEMTXATTRS_UNSPECIFIED); \
 return val; \
 }   \
-static inline void st##_s##_pci_dma(PCIDevice *dev, \
-dma_addr_t addr, uint##_bits##_t val) \
-{   \
-st##_s##_dma(pci_get_address_space(dev), addr, val, \
- MEMTXATTRS_UNSPECIFIED); \
+static inline void st##_s##_pci_dma(PCIDevice *dev, \
+dma_addr_t addr, \
+uint##_bits##_t val, \
+MemTxAttrs attrs) \
+{ \
+st##_s##_dma(pci_get_address_space(dev), addr, val, attrs); \
 }
 
 PCI_DMA_DEFINE_LDST(ub, b, 8);
diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index fb3d34a4a0c..3309ae0ea18 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -345,6 +345,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
 
 static void intel_hda_response(HDACodecDevice *dev, bool solicited, uint32_t 
response)
 {
+const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
 IntelHDAState *d = container_of(bus, IntelHDAState, codecs);
 hwaddr addr;
@@ -367,8 +368,8 @@ static void intel_hda_response(HDACodecDevice *dev, bool 
solicited, uint32_t res
 ex = (solicited ? 0 : (1 << 4)) | dev->cad;
 wp = (d->rirb_wp + 1) & 0xff;
 addr = intel_hda_addr(d->rirb_lbase, d->rirb_ubase);
-stl_le_pci_dma(&d->pci, addr + 8*wp, response);
-stl_le_pci_dma(&d->pci, addr + 8*wp + 4, ex);
+stl_le_pci_dma(&d->pci, addr + 8 * wp, response, attrs);
+stl_le_pci_dma(&d->pci, addr + 8 * wp + 4, ex, attrs);
 d->rirb_wp = wp;
 
 dprint(d, 2, "%s: [wp 0x%x] response 0x%x, extra 0x%x\n",
@@ -394,6 +395,7 @@ static void intel_hda_response(HDACodecDevice *dev, bool 
solicited, uint32_t res
 static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
uint8_t *buf, uint32_t len)
 {
+const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
 IntelHDAState *d = container_of(bus, IntelHDAState, codecs);
 hwaddr addr;
@@ -428,7 +430,7 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t 
stnr, bool output,
st->be, st->bp, st->bpl[st->be].len, copy);
 
 pci_dma_rw(&d->pci, st->bpl[st->be].addr + st->bp, buf, copy, !output,
-   MEMTXATTRS_UNSPECIFIED);
+   attrs);
 st->lpib += copy;
 st->bp += copy;
 buf += copy;
@@ -451,7 +453,7 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t 
stnr, bool output,
 if (d->dp_lbase & 0x01) {
 s = st - d->st;
 addr = intel_hda_addr(d->dp_lbase & ~0x01, d->dp_ubase);
-stl_le_pci_dma(&d->pci, addr + 8*s, st->lpib);
+stl_le_pci_dma(&d->pci, addr + 8 * s, st->lpib, attrs);
 }
 dprint(d, 3, "dma: --\n");
 
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 16e95ef9cc9..83c4431b1ad 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -700,6 +700,8 @@ static void set_ru_state(EEPRO100State * s, ru_state_t 
state)
 
 static void dump_statistics(EEPRO100State * s)
 {
+const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
+
 /* Dump statistical data. Most data is never changed by the emulation
  * and always 0, so we first just copy the whole block and then those
  * values which really matter.
@@ -707,16 +709,18 @@ static void dump_statistics(EEPRO100State * s)
  */
 pci_dma_write(&s->dev, s->statsaddr, &s->statistics, s->stats_size);
 stl_le_pci_dma(&s->dev, s->statsaddr + 0,
-   s->statistics.tx_good_frames);
+   s->statistics.tx_good_frames, attrs);
 stl_le_pci_dma(&s->dev, s->statsaddr + 36,
-   s->statistics.rx_good_frames);
+   s->statistics.rx_good_frames, attrs);
 stl_le_pci_dma(&s->dev, s->statsaddr + 48,
-   s->statistics.rx_resource_errors);
+   s->statistics.rx_reso

[PATCH v2 21/23] pci: Let ld*_pci_dma() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling ld*_pci_dma().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h |  6 +++---
 hw/audio/intel-hda.c |  2 +-
 hw/net/eepro100.c| 19 +--
 hw/net/tulip.c   | 18 ++
 hw/scsi/megasas.c| 16 ++--
 hw/scsi/mptsas.c | 10 ++
 hw/scsi/vmw_pvscsi.c |  3 ++-
 hw/usb/hcd-xhci.c|  1 +
 8 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 9f51ef2c3c2..7a46c1fa226 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -852,11 +852,11 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, 
dma_addr_t addr,
 
 #define PCI_DMA_DEFINE_LDST(_l, _s, _bits)  \
 static inline uint##_bits##_t ld##_l##_pci_dma(PCIDevice *dev,  \
-   dma_addr_t addr) \
+   dma_addr_t addr, \
+   MemTxAttrs attrs) \
 {   \
 uint##_bits##_t val; \
-ld##_l##_dma(pci_get_address_space(dev), addr, &val, \
- MEMTXATTRS_UNSPECIFIED); \
+ld##_l##_dma(pci_get_address_space(dev), addr, &val, attrs); \
 return val; \
 }   \
 static inline void st##_s##_pci_dma(PCIDevice *dev, \
diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 3309ae0ea18..e34b7ab0e92 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -335,7 +335,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
 
 rp = (d->corb_rp + 1) & 0xff;
 addr = intel_hda_addr(d->corb_lbase, d->corb_ubase);
-verb = ldl_le_pci_dma(&d->pci, addr + 4*rp);
+verb = ldl_le_pci_dma(&d->pci, addr + 4 * rp, MEMTXATTRS_UNSPECIFIED);
 d->corb_rp = rp;
 
 dprint(d, 2, "%s: [rp 0x%x] verb 0x%08x\n", __func__, rp, verb);
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 83c4431b1ad..eb82e9cb118 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -737,6 +737,7 @@ static void read_cb(EEPRO100State *s)
 
 static void tx_command(EEPRO100State *s)
 {
+const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 uint32_t tbd_array = s->tx.tbd_array_addr;
 uint16_t tcb_bytes = s->tx.tcb_bytes & 0x3fff;
 /* Sends larger than MAX_ETH_FRAME_SIZE are allowed, up to 2600 bytes. */
@@ -772,11 +773,14 @@ static void tx_command(EEPRO100State *s)
 /* Extended Flexible TCB. */
 for (; tbd_count < 2; tbd_count++) {
 uint32_t tx_buffer_address = ldl_le_pci_dma(&s->dev,
-tbd_address);
+tbd_address,
+attrs);
 uint16_t tx_buffer_size = lduw_le_pci_dma(&s->dev,
-  tbd_address + 4);
+  tbd_address + 4,
+  attrs);
 uint16_t tx_buffer_el = lduw_le_pci_dma(&s->dev,
-tbd_address + 6);
+tbd_address + 6,
+attrs);
 tbd_address += 8;
 TRACE(RXTX, logout
 ("TBD (extended flexible mode): buffer address 0x%08x, 
size 0x%04x\n",
@@ -792,9 +796,12 @@ static void tx_command(EEPRO100State *s)
 }
 tbd_address = tbd_array;
 for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-uint32_t tx_buffer_address = ldl_le_pci_dma(&s->dev, tbd_address);
-uint16_t tx_buffer_size = lduw_le_pci_dma(&s->dev, tbd_address + 4);
-uint16_t tx_buffer_el = lduw_le_pci_dma(&s->dev, tbd_address + 6);
+uint32_t tx_buffer_address = ldl_le_pci_dma(&s->dev, tbd_address,
+attrs);
+uint16_t tx_buffer_size = lduw_le_pci_dma(&s->dev, tbd_address + 4,
+  attrs);
+uint16_t tx_buffer_el = lduw_le_pci_dma(&s->dev, tbd_address + 6,
+attrs);
 tbd_address += 8;
 TRACE(RXTX, logout
 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index 1f2c79dd58b..c76e4868f73 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -70,16 +70,18 @@ static const VMStateDescription vmstate_pci

[PATCH v2 12/23] dma: Let dma_buf_read() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling
dma_buf_read().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/dma.h  |  2 +-
 hw/ide/ahci.c |  4 ++--
 hw/nvme/ctrl.c|  2 +-
 hw/scsi/megasas.c | 24 
 hw/scsi/scsi-bus.c|  2 +-
 softmmu/dma-helpers.c |  5 ++---
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index e3dd74a9c4f..fd8f16003dd 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -302,7 +302,7 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
   QEMUSGList *sg, uint64_t offset, uint32_t align,
   BlockCompletionFunc *cb, void *opaque);
-uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg);
+uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs);
 uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs);
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 079d2977f23..205dfdc6622 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1386,7 +1386,7 @@ static void ahci_pio_transfer(const IDEDMA *dma)
 if (is_write) {
 dma_buf_write(s->data_ptr, size, &s->sg, attrs);
 } else {
-dma_buf_read(s->data_ptr, size, &s->sg);
+dma_buf_read(s->data_ptr, size, &s->sg, attrs);
 }
 }
 
@@ -1479,7 +1479,7 @@ static int ahci_dma_rw_buf(const IDEDMA *dma, bool 
is_write)
 }
 
 if (is_write) {
-dma_buf_read(p, l, &s->sg);
+dma_buf_read(p, l, &s->sg, MEMTXATTRS_UNSPECIFIED);
 } else {
 dma_buf_write(p, l, &s->sg, MEMTXATTRS_UNSPECIFIED);
 }
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e1a531d5d6c..462f79a1f60 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1152,7 +1152,7 @@ static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t 
*ptr, uint32_t len,
 if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
 residual = dma_buf_write(ptr, len, &sg->qsg, attrs);
 } else {
-residual = dma_buf_read(ptr, len, &sg->qsg);
+residual = dma_buf_read(ptr, len, &sg->qsg, attrs);
 }
 
 if (unlikely(residual)) {
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index da1c88167ee..fe36de10a21 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -848,7 +848,7 @@ static int megasas_ctrl_get_info(MegasasState *s, 
MegasasCmd *cmd)
MFI_INFO_PDMIX_SATA |
MFI_INFO_PDMIX_LD);
 
-cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 return MFI_STAT_OK;
 }
 
@@ -878,7 +878,7 @@ static int megasas_mfc_get_defaults(MegasasState *s, 
MegasasCmd *cmd)
 info.disable_preboot_cli = 1;
 info.cluster_disable = 1;
 
-cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 return MFI_STAT_OK;
 }
 
@@ -899,7 +899,7 @@ static int megasas_dcmd_get_bios_info(MegasasState *s, 
MegasasCmd *cmd)
 info.expose_all_drives = 1;
 }
 
-cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 return MFI_STAT_OK;
 }
 
@@ -910,7 +910,7 @@ static int megasas_dcmd_get_fw_time(MegasasState *s, 
MegasasCmd *cmd)
 
 fw_time = cpu_to_le64(megasas_fw_time());
 
-cmd->iov_size -= dma_buf_read(&fw_time, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&fw_time, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 return MFI_STAT_OK;
 }
 
@@ -937,7 +937,7 @@ static int megasas_event_info(MegasasState *s, MegasasCmd 
*cmd)
 info.shutdown_seq_num = cpu_to_le32(s->shutdown_event);
 info.boot_seq_num = cpu_to_le32(s->boot_event);
 
-cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, dcmd_size, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 return MFI_STAT_OK;
 }
 
@@ -1006,7 +1006,7 @@ static int megasas_dcmd_pd_get_list(MegasasState *s, 
MegasasCmd *cmd)
 info.size = cpu_to_le32(offset);
 info.count = cpu_to_le32(num_pd_disks);
 
-cmd->iov_size -= dma_buf_read(&info, offset, &cmd->qsg);
+cmd->iov_size -= dma_buf_read(&info, offset, &cmd->qsg, MEMTXATTRS_UNSPECIFIED);
 return MFI_STAT_OK;
 }
 
@@ -1100,7 +1100,7 @@ static int megasas_pd_get_info_submit(SCSIDevice *sdev, 
int lun,
 info->connected_port_bitmap = 0x1;
 info->device_speed = 1;
 info->link_speed = 1;
-resid = dma_buf_read(cmd->iov_buf, dcmd_size, &cmd->qsg);
+resid = dma_buf_read(cmd->iov_buf, dcmd_s

[PATCH v3 kvm/queue 02/16] mm/memfd: Introduce MFD_INACCESSIBLE flag

2021-12-23 Thread Chao Peng

Introduce a new memfd_create() flag indicating that the content of the
created memfd is inaccessible from userspace. It does this by force
setting the F_SEAL_INACCESSIBLE seal when the file is created. It also
sets F_SEAL_SEAL to prevent future sealing, which means it cannot
coexist with MFD_ALLOW_SEALING.

Signed-off-by: Chao Peng 
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c | 12 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..48750474b904 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,7 @@
 #define MFD_CLOEXEC0x0001U
 #define MFD_ALLOW_SEALING  0x0002U
 #define MFD_HUGETLB0x0004U
+#define MFD_INACCESSIBLE   0x0008U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 9f80f162791a..c898a007fb76 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -245,7 +245,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, 
unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
+  MFD_INACCESSIBLE)
 
 SYSCALL_DEFINE2(memfd_create,
const char __user *, uname,
@@ -267,6 +268,10 @@ SYSCALL_DEFINE2(memfd_create,
return -EINVAL;
}
 
+   /* Disallow sealing when MFD_INACCESSIBLE is set. */
+   if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING)
+   return -EINVAL;
+
/* length includes terminating zero */
len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
if (len <= 0)
@@ -315,6 +320,11 @@ SYSCALL_DEFINE2(memfd_create,
*file_seals &= ~F_SEAL_SEAL;
}
 
+   if (flags & MFD_INACCESSIBLE) {
+   file_seals = memfd_file_seals_ptr(file);
+   *file_seals &= F_SEAL_SEAL | F_SEAL_INACCESSIBLE;
+   }
+
fd_install(fd, file);
kfree(name);
return fd;
-- 
2.17.1
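
The flag handling described in the commit message can be sketched in
userspace as below. This is an illustration under assumptions, not the
kernel code: the sk_* functions are hypothetical, the SK_MFD_* values
mirror the uapi hunk above, SK_F_SEAL_INACCESSIBLE is a placeholder
value (the real seal is defined elsewhere in the series), and the huge
page size bits that memfd_create() also accepts are ignored. The sketch
ORs both seals in, which is one reading of "force setting" in the
commit message.

```c
#include <assert.h>
#include <errno.h>

/* Flag values mirrored from the uapi hunk in this patch. */
#define SK_MFD_CLOEXEC       0x0001U
#define SK_MFD_ALLOW_SEALING 0x0002U
#define SK_MFD_HUGETLB       0x0004U
#define SK_MFD_INACCESSIBLE  0x0008U
#define SK_MFD_ALL_FLAGS     (SK_MFD_CLOEXEC | SK_MFD_ALLOW_SEALING | \
                              SK_MFD_HUGETLB | SK_MFD_INACCESSIBLE)

#define SK_F_SEAL_SEAL         0x0001U /* same value as F_SEAL_SEAL */
#define SK_F_SEAL_INACCESSIBLE 0x0020U /* placeholder value, an assumption */

/* Reject unknown flags and the INACCESSIBLE + ALLOW_SEALING combination. */
static int sk_check_flags(unsigned int flags)
{
    if (flags & ~SK_MFD_ALL_FLAGS) {
        return -EINVAL;
    }
    if ((flags & SK_MFD_INACCESSIBLE) && (flags & SK_MFD_ALLOW_SEALING)) {
        return -EINVAL;
    }
    return 0;
}

/* Compute the seals a freshly created file would carry in this sketch. */
static unsigned int sk_initial_seals(unsigned int flags)
{
    /* Assumed default: sealing is disabled unless explicitly allowed. */
    unsigned int seals = SK_F_SEAL_SEAL;

    assert(sk_check_flags(flags) == 0);
    if (flags & SK_MFD_ALLOW_SEALING) {
        seals &= ~SK_F_SEAL_SEAL;
    }
    if (flags & SK_MFD_INACCESSIBLE) {
        seals |= SK_F_SEAL_SEAL | SK_F_SEAL_INACCESSIBLE;
    }
    return seals;
}
```

With both seals set at creation time, userspace has no later point at
which it could drop the inaccessible property or add further seals.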

[PATCH v2 22/23] pci: Let st*_pci_dma() propagate MemTxResult

2021-12-23 Thread Philippe Mathieu-Daudé

st*_dma() returns a MemTxResult type. Do not discard
it, return it to the caller.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 7a46c1fa226..c90cecc85c0 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -859,12 +859,12 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, 
dma_addr_t addr,
 ld##_l##_dma(pci_get_address_space(dev), addr, &val, attrs); \
 return val; \
 }   \
-static inline void st##_s##_pci_dma(PCIDevice *dev, \
-dma_addr_t addr, \
-uint##_bits##_t val, \
-MemTxAttrs attrs) \
+static inline MemTxResult st##_s##_pci_dma(PCIDevice *dev, \
+   dma_addr_t addr, \
+   uint##_bits##_t val, \
+   MemTxAttrs attrs) \
 { \
-st##_s##_dma(pci_get_address_space(dev), addr, val, attrs); \
+return st##_s##_dma(pci_get_address_space(dev), addr, val, attrs); \
 }
 
 PCI_DMA_DEFINE_LDST(ub, b, 8);
-- 
2.33.1

[PATCH v2 13/23] dma: Let dma_buf_rw() propagate MemTxResult

2021-12-23 Thread Philippe Mathieu-Daudé

dma_memory_rw() returns a MemTxResult type. Do not discard
it, return it to the caller.

Since dma_buf_rw() was previously returning the QEMUSGList
size not consumed, add an extra argument where this size
can be stored.

Update the 2 callers.

Reviewed-by: Klaus Jensen 
Signed-off-by: Philippe Mathieu-Daudé 
---
 softmmu/dma-helpers.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index a391773c296..b0be1564797 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -294,12 +294,14 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
 }
 
 
-static uint64_t dma_buf_rw(void *buf, int32_t len, QEMUSGList *sg,
-   DMADirection dir, MemTxAttrs attrs)
+static MemTxResult dma_buf_rw(void *buf, int32_t len, uint64_t *residp,
+  QEMUSGList *sg, DMADirection dir,
+  MemTxAttrs attrs)
 {
 uint8_t *ptr = buf;
 uint64_t resid;
 int sg_cur_index;
+MemTxResult res = MEMTX_OK;
 
 resid = sg->size;
 sg_cur_index = 0;
@@ -307,23 +309,34 @@ static uint64_t dma_buf_rw(void *buf, int32_t len, 
QEMUSGList *sg,
 while (len > 0) {
 ScatterGatherEntry entry = sg->sg[sg_cur_index++];
 int32_t xfer = MIN(len, entry.len);
-dma_memory_rw(sg->as, entry.base, ptr, xfer, dir, attrs);
+res |= dma_memory_rw(sg->as, entry.base, ptr, xfer, dir, attrs);
 ptr += xfer;
 len -= xfer;
 resid -= xfer;
 }
 
-return resid;
+if (residp) {
+*residp = resid;
+}
+return res;
 }
 
 uint64_t dma_buf_read(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs attrs)
 {
-return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_FROM_DEVICE, attrs);
+uint64_t resid;
+
+dma_buf_rw(ptr, len, &resid, sg, DMA_DIRECTION_FROM_DEVICE, attrs);
+
+return resid;
 }
 
 uint64_t dma_buf_write(void *ptr, int32_t len, QEMUSGList *sg, MemTxAttrs 
attrs)
 {
-return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE, attrs);
+uint64_t resid;
+
+dma_buf_rw(ptr, len, &resid, sg, DMA_DIRECTION_TO_DEVICE, attrs);
+
+return resid;
 }
 
 void dma_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
-- 
2.33.1
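
The shape of the reworked helper can be sketched outside QEMU as below:
walk the scatter-gather entries, OR the per-entry results together, and
report the unconsumed size through an optional pointer. All names
(MockSGList, MockSGEntry, mock_buf_rw(), MOCK_TX_*) are hypothetical,
only one transfer direction is modelled, and entry lengths are assumed
to be positive. Unlike the hunk above, the loop in this sketch also
stops when the list runs out of entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for MemTxResult values. */
typedef uint32_t MockTxResult;
#define MOCK_TX_OK    0u
#define MOCK_TX_ERROR 1u

typedef struct {
    size_t base;        /* offset into the mock guest memory */
    int32_t len;        /* entry length, assumed > 0 */
} MockSGEntry;

typedef struct {
    const MockSGEntry *sg;
    int nsg;
    uint64_t size;      /* sum of all entry lengths */
    uint8_t *mem;       /* mock guest memory */
    size_t mem_size;
} MockSGList;

/* Bounds-checked copy into the mock guest memory. */
static MockTxResult mock_memory_write(MockSGList *l, size_t base,
                                      const uint8_t *p, size_t n)
{
    if (base > l->mem_size || n > l->mem_size - base) {
        return MOCK_TX_ERROR;
    }
    memcpy(l->mem + base, p, n);
    return MOCK_TX_OK;
}

static MockTxResult mock_buf_rw(const void *buf, int32_t len,
                                uint64_t *residp, MockSGList *l)
{
    const uint8_t *ptr = buf;
    MockTxResult res = MOCK_TX_OK;
    uint64_t resid;
    int i = 0;

    assert(l != NULL);
    resid = l->size;

    while (len > 0 && i < l->nsg) {
        MockSGEntry e = l->sg[i++];
        int32_t xfer = len < e.len ? len : e.len;

        /* Accumulate: one failing entry marks the whole transfer. */
        res |= mock_memory_write(l, e.base, ptr, (size_t)xfer);
        ptr += xfer;
        len -= xfer;
        resid -= (uint64_t)xfer;
    }

    if (residp) {
        *residp = resid;
    }
    return res;
}
```

Because the results are OR-ed, the walk continues after a failed entry
and the residual is still computed over the full request, so the caller
gets both pieces of information independently.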

[PATCH v3 kvm/queue 03/16] mm/memfd: Introduce MEMFD_OPS

2021-12-23 Thread Chao Peng

From: "Kirill A. Shutemov" 

This patch introduces a new MEMFD_OPS facility around files created by
memfd_create(). It allows a third kernel component to make use of the
memory bookmarked in a memfd and to be notified when memory in the
file is allocated or invalidated. KVM will use a memfd file descriptor
as the guest memory backend and will use MEMFD_OPS to interact with the
memfd subsystem. In the future there might be other consumers (e.g.
VFIO with encrypted device memory).

It consists of two sets of callbacks:
  - memfd_falloc_notifier: callbacks provided by KVM and called by
    memfd when memory gets allocated/invalidated through fallocate().
  - memfd_pfn_ops: callbacks provided by memfd and called by KVM to
    request a memory page from memfd.

Locking is needed for the above callbacks to prevent race conditions:
  - get_owner/put_owner ensure, through a reference mechanism, that the
    owner is still alive in the invalidate_page_range/fallocate
    callback handlers.
  - The page is locked between get_lock_pfn/put_unlock_pfn to ensure
    the pfn is still valid when it is used (e.g. when the KVM page
    fault handler uses it to establish the mapping in the secondary MMU
    page tables).

Userspace is in charge of the guest memory lifecycle: it can allocate
the memory with fallocate() or punch a hole to free memory from the
guest.

The file descriptor is passed down to KVM as the guest memory backend.
KVM registers itself as the owner of the memfd via
memfd_register_falloc_notifier() and provides the memfd_falloc_notifier
callbacks that need to be called on fallocate() and hole punching.

memfd_register_falloc_notifier() returns the memfd_pfn_ops callbacks
that KVM needs to use to request a new page from memfd.

At this time only shmem is supported.
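
The owner-reference protocol described above can be sketched in
userspace as follows. This is a mock under assumptions, not the kernel
implementation: the sk_* types and functions are hypothetical, the
callbacks drop the inode argument, the "one owner per file" rule in
sk_register() is an assumption, and the pfn_ops side is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the notifier callbacks provided by the owner. */
struct sk_falloc_notifier {
    void (*invalidate_page_range)(void *owner, unsigned long start,
                                  unsigned long end);
    void (*fallocate)(void *owner, unsigned long start, unsigned long end);
    bool (*get_owner)(void *owner);
    void (*put_owner)(void *owner);
};

/* Hypothetical stand-in for the per-inode state holding the registration. */
struct sk_inode {
    void *owner;
    const struct sk_falloc_notifier *notifier;
};

static int sk_register(struct sk_inode *inode, void *owner,
                       const struct sk_falloc_notifier *notifier)
{
    if (!inode || !owner || !notifier || inode->owner) {
        return -1;      /* assumption: at most one owner per file */
    }
    inode->owner = owner;
    inode->notifier = notifier;
    return 0;
}

/* memfd side: pin the owner, deliver the event, drop the pin. */
static void sk_notify_fallocate(struct sk_inode *inode,
                                unsigned long start, unsigned long end)
{
    const struct sk_falloc_notifier *n;

    assert(inode != NULL);
    n = inode->notifier;
    if (!n || !n->get_owner(inode->owner)) {
        return;         /* no owner, or owner is already going away */
    }
    n->fallocate(inode->owner, start, end);
    n->put_owner(inode->owner);
}

/* Example owner: a reference count plus an event counter. */
struct sk_owner {
    int refs;
    bool alive;
    int events;
};

static bool sk_owner_get(void *o)
{
    struct sk_owner *w = o;

    if (!w->alive) {
        return false;
    }
    w->refs++;
    return true;
}

static void sk_owner_put(void *o)
{
    ((struct sk_owner *)o)->refs--;
}

static void sk_owner_event(void *o, unsigned long start, unsigned long end)
{
    (void)start;
    (void)end;
    ((struct sk_owner *)o)->events++;
}

static const struct sk_falloc_notifier sk_owner_notifier = {
    .invalidate_page_range = sk_owner_event,
    .fallocate = sk_owner_event,
    .get_owner = sk_owner_get,
    .put_owner = sk_owner_put,
};
```

Taking the reference before invoking the handler is what keeps the
owner from being torn down while a notification is in flight.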

Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---
 include/linux/memfd.h|  22 ++
 include/linux/shmem_fs.h |  16 
 mm/Kconfig   |   4 +
 mm/memfd.c   |  21 ++
 mm/shmem.c   | 158 +++
 5 files changed, 221 insertions(+)

diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index 4f1600413f91..0007073b53dc 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -13,4 +13,26 @@ static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned long a)
 }
 #endif
 
+#ifdef CONFIG_MEMFD_OPS
+struct memfd_falloc_notifier {
+   void (*invalidate_page_range)(struct inode *inode, void *owner,
+ pgoff_t start, pgoff_t end);
+   void (*fallocate)(struct inode *inode, void *owner,
+ pgoff_t start, pgoff_t end);
+   bool (*get_owner)(void *owner);
+   void (*put_owner)(void *owner);
+};
+
+struct memfd_pfn_ops {
+   long (*get_lock_pfn)(struct inode *inode, pgoff_t offset, int *order);
+   void (*put_unlock_pfn)(unsigned long pfn);
+
+};
+
+extern int memfd_register_falloc_notifier(struct inode *inode, void *owner,
+   const struct memfd_falloc_notifier *notifier,
+   const struct memfd_pfn_ops **pfn_ops);
+extern void memfd_unregister_falloc_notifier(struct inode *inode);
+#endif
+
 #endif /* __LINUX_MEMFD_H */
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 166158b6e917..503adc63728c 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -12,6 +12,11 @@
 
 /* inode in-kernel data */
 
+#ifdef CONFIG_MEMFD_OPS
+struct memfd_falloc_notifier;
+struct memfd_pfn_ops;
+#endif
+
 struct shmem_inode_info {
spinlock_t  lock;
unsigned intseals;  /* shmem seals */
@@ -24,6 +29,10 @@ struct shmem_inode_info {
struct shared_policypolicy; /* NUMA memory alloc policy */
struct simple_xattrsxattrs; /* list of xattrs */
atomic_tstop_eviction;  /* hold when working on inode */
+#ifdef CONFIG_MEMFD_OPS
+   void*owner;
+   const struct memfd_falloc_notifier *falloc_notifier;
+#endif
struct inodevfs_inode;
 };
 
@@ -96,6 +105,13 @@ extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 
+#ifdef CONFIG_MEMFD_OPS
+extern int shmem_register_falloc_notifier(struct inode *inode, void *owner,
+   const struct memfd_falloc_notifier *notifier,
+   const struct memfd_pfn_ops **pfn_ops);
+extern void shmem_unregister_falloc_notifier(struct inode *inode);
+#endif
+
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
SGP_READ,   /* don't exceed i_size, don't allocate page */
diff --git a/mm/Kconfig b/mm/Kconfig
index 28edafc820ad..9989904d1b56 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -900,6 +900,1

[PATCH v2 23/23] pci: Let ld*_pci_dma() propagate MemTxResult

2021-12-23 Thread Philippe Mathieu-Daudé

ld*_dma() returns a MemTxResult type. Do not discard
it, return it to the caller.

Update the few callers.
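
For readers unfamiliar with the pattern, here is a minimal standalone
sketch of a load helper that reports its transaction result and returns
the loaded value through a pointer. It is not QEMU code: MockTxResult,
mock_guest_ram, mock_ldl_le() and mock_read_verb() are hypothetical
stand-ins, and the bounds check merely simulates a failing transaction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MemTxResult and its OK/error values. */
typedef uint32_t MockTxResult;
#define MOCK_TX_OK    0u
#define MOCK_TX_ERROR 1u

/* Eight bytes of pretend guest memory, little-endian contents. */
uint8_t mock_guest_ram[8] = { 0x78, 0x56, 0x34, 0x12,
                              0xef, 0xbe, 0xad, 0xde };

/* Load a 32-bit little-endian value; *val is written only on success. */
MockTxResult mock_ldl_le(uint64_t addr, uint32_t *val)
{
    assert(val != NULL);
    if (addr > sizeof(mock_guest_ram) - sizeof(*val))
        return MOCK_TX_ERROR;

    const uint8_t *p = &mock_guest_ram[addr];
    *val = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return MOCK_TX_OK;
}

/* A caller that chooses to act on the result instead of discarding it. */
int mock_read_verb(uint64_t addr, uint32_t *verb)
{
    if (mock_ldl_le(addr, verb) != MOCK_TX_OK)
        return -1;
    return 0;
}
```

Callers may still ignore the return value, but with this shape a failed
load can no longer be confused with a legitimately loaded value.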

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h | 17 -
 hw/audio/intel-hda.c |  2 +-
 hw/net/eepro100.c| 25 ++---
 hw/net/tulip.c   | 16 
 hw/scsi/megasas.c| 21 -
 hw/scsi/mptsas.c | 16 +++-
 hw/scsi/vmw_pvscsi.c | 16 ++--
 7 files changed, 60 insertions(+), 53 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c90cecc85c0..5b36334a28a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -850,15 +850,14 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, dma_addr_t addr,
   DMA_DIRECTION_FROM_DEVICE, MEMTXATTRS_UNSPECIFIED);
 }
 
-#define PCI_DMA_DEFINE_LDST(_l, _s, _bits)  \
-static inline uint##_bits##_t ld##_l##_pci_dma(PCIDevice *dev,  \
-   dma_addr_t addr, \
-   MemTxAttrs attrs) \
-{   \
-uint##_bits##_t val; \
-ld##_l##_dma(pci_get_address_space(dev), addr, &val, attrs); \
-return val; \
-}   \
+#define PCI_DMA_DEFINE_LDST(_l, _s, _bits) \
+static inline MemTxResult ld##_l##_pci_dma(PCIDevice *dev, \
+   dma_addr_t addr, \
+   uint##_bits##_t *val, \
+   MemTxAttrs attrs) \
+{ \
+return ld##_l##_dma(pci_get_address_space(dev), addr, val, attrs); \
+} \
 static inline MemTxResult st##_s##_pci_dma(PCIDevice *dev, \
dma_addr_t addr, \
uint##_bits##_t val, \
diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index e34b7ab0e92..2b55d521503 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -335,7 +335,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
 
 rp = (d->corb_rp + 1) & 0xff;
 addr = intel_hda_addr(d->corb_lbase, d->corb_ubase);
-verb = ldl_le_pci_dma(&d->pci, addr + 4 * rp, MEMTXATTRS_UNSPECIFIED);
+ldl_le_pci_dma(&d->pci, addr + 4 * rp, &verb, MEMTXATTRS_UNSPECIFIED);
 d->corb_rp = rp;
 
 dprint(d, 2, "%s: [rp 0x%x] verb 0x%08x\n", __func__, rp, verb);
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index eb82e9cb118..679f52f80f1 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -769,18 +769,16 @@ static void tx_command(EEPRO100State *s)
 } else {
 /* Flexible mode. */
 uint8_t tbd_count = 0;
+uint32_t tx_buffer_address;
+uint16_t tx_buffer_size;
+uint16_t tx_buffer_el;
+
 if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
 /* Extended Flexible TCB. */
 for (; tbd_count < 2; tbd_count++) {
-uint32_t tx_buffer_address = ldl_le_pci_dma(&s->dev,
-tbd_address,
-attrs);
-uint16_t tx_buffer_size = lduw_le_pci_dma(&s->dev,
-  tbd_address + 4,
-  attrs);
-uint16_t tx_buffer_el = lduw_le_pci_dma(&s->dev,
-tbd_address + 6,
-attrs);
+ldl_le_pci_dma(&s->dev, tbd_address, &tx_buffer_address, attrs);
+lduw_le_pci_dma(&s->dev, tbd_address + 4, &tx_buffer_size, attrs);
+lduw_le_pci_dma(&s->dev, tbd_address + 6, &tx_buffer_el, attrs);
 tbd_address += 8;
 TRACE(RXTX, logout
 ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
@@ -796,12 +794,9 @@ static void tx_command(EEPRO100State *s)
 }
 tbd_address = tbd_array;
 for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-uint32_t tx_buffer_address = ldl_le_pci_dma(&s->dev, tbd_address,
-attrs);
-uint16_t tx_buffer_size = lduw_le_pci_dma(&s->dev, tbd_address + 4,
-  attrs);
-uint16_t tx_buffer_el = lduw_le_pci_dma(&s->dev, tbd_address + 6,
-attrs);
+ldl_le_pci_dma(&s->dev, tbd_address, &tx_buffer_address, attrs);
+lduw_le_pci_dma(&s->dev, tbd_address + 4, &tx_buffer_size, attrs);
+

[PATCH v2 15/23] dma: Let st*_dma() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling st*_dma().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Richard Henderson 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h   |  3 ++-
 include/hw/ppc/spapr_vio.h | 12 
 include/sysemu/dma.h   | 10 ++
 hw/nvram/fw_cfg.c  |  4 ++--
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index a751ab5a75d..d07e9707b48 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -859,7 +859,8 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, dma_addr_t addr,
 static inline void st##_s##_pci_dma(PCIDevice *dev, \
 dma_addr_t addr, uint##_bits##_t val) \
 {   \
-st##_s##_dma(pci_get_address_space(dev), addr, val);\
+st##_s##_dma(pci_get_address_space(dev), addr, val, \
+ MEMTXATTRS_UNSPECIFIED); \
 }
 
 PCI_DMA_DEFINE_LDST(ub, b, 8);
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 5d2ea8e6656..e87f8e6f596 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -118,10 +118,14 @@ static inline int spapr_vio_dma_set(SpaprVioDevice *dev, uint64_t taddr,
 H_DEST_PARM : H_SUCCESS;
 }
 
-#define vio_stb(_dev, _addr, _val) (stb_dma(&(_dev)->as, (_addr), (_val)))
-#define vio_sth(_dev, _addr, _val) (stw_be_dma(&(_dev)->as, (_addr), (_val)))
-#define vio_stl(_dev, _addr, _val) (stl_be_dma(&(_dev)->as, (_addr), (_val)))
-#define vio_stq(_dev, _addr, _val) (stq_be_dma(&(_dev)->as, (_addr), (_val)))
+#define vio_stb(_dev, _addr, _val) \
+(stb_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
+#define vio_sth(_dev, _addr, _val) \
+(stw_be_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
+#define vio_stl(_dev, _addr, _val) \
+(stl_be_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
+#define vio_stq(_dev, _addr, _val) \
+(stq_be_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
 #define vio_ldq(_dev, _addr) (ldq_be_dma(&(_dev)->as, (_addr)))
 
 int spapr_vio_send_crq(SpaprVioDevice *dev, uint8_t *crq);
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index d11c1d794f9..ebbc0501681 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -249,10 +249,11 @@ static inline void dma_memory_unmap(AddressSpace *as,
 }   \
 static inline void st##_sname##_##_end##_dma(AddressSpace *as,  \
  dma_addr_t addr,   \
- uint##_bits##_t val)   \
+ uint##_bits##_t val,   \
+ MemTxAttrs attrs)  \
 {   \
 val = cpu_to_##_end##_bits(val);\
-dma_memory_write(as, addr, &val, (_bits) / 8, MEMTXATTRS_UNSPECIFIED); \
+dma_memory_write(as, addr, &val, (_bits) / 8, attrs); \
 }
 
 static inline uint8_t ldub_dma(AddressSpace *as, dma_addr_t addr)
@@ -263,9 +264,10 @@ static inline uint8_t ldub_dma(AddressSpace *as, dma_addr_t addr)
 return val;
 }
 
-static inline void stb_dma(AddressSpace *as, dma_addr_t addr, uint8_t val)
+static inline void stb_dma(AddressSpace *as, dma_addr_t addr,
+   uint8_t val, MemTxAttrs attrs)
 {
-dma_memory_write(as, addr, &val, 1, MEMTXATTRS_UNSPECIFIED);
+dma_memory_write(as, addr, &val, 1, attrs);
 }
 
 DEFINE_LDST_DMA(uw, w, 16, le);
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 9b91b15cb08..e5f3c981841 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -360,7 +360,7 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
 if (dma_memory_read(s->dma_as, dma_addr,
 &dma, sizeof(dma), MEMTXATTRS_UNSPECIFIED)) {
 stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
-   FW_CFG_DMA_CTL_ERROR);
+   FW_CFG_DMA_CTL_ERROR, MEMTXATTRS_UNSPECIFIED);
 return;
 }
 
@@ -446,7 +446,7 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
 }
 
 stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
-dma.control);
+dma.control, MEMTXATTRS_UNSPECIFIED);
 
 trace_fw_cfg_read(s, 0);
 }
-- 
2.33.1

[PATCH v3 kvm/queue 01/16] mm/shmem: Introduce F_SEAL_INACCESSIBLE

2021-12-23 Thread Chao Peng

From: "Kirill A. Shutemov" 

Introduce a new seal, F_SEAL_INACCESSIBLE, indicating that the content
of the file is inaccessible from userspace by any means, e.g. read(),
write() or mmap().

It provides the semantics required for KVM guest private memory support:
a file descriptor with this seal set is going to be used as the source
of guest memory in confidential computing environments such as Intel
TDX/AMD SEV, but may not be accessible from host userspace.

At this time only shmem implements this seal.
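
As a rough userspace sketch of the two kinds of checks this patch adds
to shmem (refusing data access once sealed, and requiring page-aligned
sizes and fallocate() ranges once sealed), assuming a 4096-byte page.
The sketch_* names are hypothetical; only the seal value 0x0020 and the
-EPERM/-EINVAL results are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SEAL_INACCESSIBLE 0x0020u /* value taken from the patch */
#define SKETCH_PAGE_SIZE         4096u   /* assumed page size */

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* read()/write()/mmap()-style access is refused once the seal is set. */
int sketch_check_access(unsigned int seals)
{
    return (seals & SKETCH_SEAL_INACCESSIBLE) ? -EPERM : 0;
}

/* With the seal set, offsets and lengths must be page aligned. */
int sketch_check_range(unsigned int seals, uint64_t offset, uint64_t len)
{
    if ((seals & SKETCH_SEAL_INACCESSIBLE) &&
        ((offset | len) & (SKETCH_PAGE_SIZE - 1)))
        return -EINVAL;
    return 0;
}
```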

Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---
 include/uapi/linux/fcntl.h |  1 +
 mm/shmem.c | 37 +++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..e2bad051936f 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,7 @@
 #define F_SEAL_GROW0x0004  /* prevent file from growing */
 #define F_SEAL_WRITE   0x0008  /* prevent writes */
 #define F_SEAL_FUTURE_WRITE0x0010  /* prevent future writes while mapped */
+#define F_SEAL_INACCESSIBLE0x0020  /* prevent file from accessing */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/shmem.c b/mm/shmem.c
index 18f93c2d68f1..faa7e9b1b9bc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1098,6 +1098,10 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
(newsize > oldsize && (info->seals & F_SEAL_GROW)))
return -EPERM;
 
+   if ((info->seals & F_SEAL_INACCESSIBLE) &&
+   (newsize & ~PAGE_MASK))
+   return -EINVAL;
+
if (newsize != oldsize) {
error = shmem_reacct_size(SHMEM_I(inode)->flags,
oldsize, newsize);
@@ -1364,6 +1368,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
goto redirty;
if (!total_swap_pages)
goto redirty;
+   if (info->seals & F_SEAL_INACCESSIBLE)
+   goto redirty;
 
/*
 * Our capabilities prevent regular writeback or sync from ever calling
@@ -2262,6 +2268,9 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
if (ret)
return ret;
 
+   if (info->seals & F_SEAL_INACCESSIBLE)
+   return -EPERM;
+
/* arm64 - allow memory tagging on RAM-based files */
vma->vm_flags |= VM_MTE_ALLOWED;
 
@@ -2459,12 +2468,15 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
pgoff_t index = pos >> PAGE_SHIFT;
 
/* i_rwsem is held by caller */
-   if (unlikely(info->seals & (F_SEAL_GROW |
-  F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
+   if (unlikely(info->seals & (F_SEAL_GROW | F_SEAL_WRITE |
+   F_SEAL_FUTURE_WRITE |
+   F_SEAL_INACCESSIBLE))) {
if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))
return -EPERM;
if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
return -EPERM;
+   if (info->seals & F_SEAL_INACCESSIBLE)
+   return -EPERM;
}
 
return shmem_getpage(inode, index, pagep, SGP_WRITE);
@@ -2538,6 +2550,21 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
end_index = i_size >> PAGE_SHIFT;
if (index > end_index)
break;
+
+   /*
+* inode_lock protects setting up seals as well as write to
+* i_size. Setting F_SEAL_INACCESSIBLE only allowed with
+* i_size == 0.
+*
+* Check F_SEAL_INACCESSIBLE after i_size. It effectively
+* serialize read vs. setting F_SEAL_INACCESSIBLE without
+* taking inode_lock in read path.
+*/
+   if (SHMEM_I(inode)->seals & F_SEAL_INACCESSIBLE) {
+   error = -EPERM;
+   break;
+   }
+
if (index == end_index) {
nr = i_size & ~PAGE_MASK;
if (nr <= offset)
@@ -2663,6 +2690,12 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
goto out;
}
 
+   if ((info->seals & F_SEAL_INACCESSIBLE) &&
+   (offset & ~PAGE_MASK || len & ~PAGE_MASK)) {
+   error = -EINVAL;
+   goto out;
+   }
+
shmem_falloc.waitq = &shmem_falloc_waitq;
shmem_falloc.start = (u64)unmap_start >> PAGE_SHIFT;
shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
-- 
2.17.1

[PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast memslot lookup by file offset

2021-12-23 Thread Chao Peng

Similar to hva_tree for hva ranges, maintain an interval tree, ofs_tree,
for the offset range of an fd-based memslot so that lookup by offset
range is faster when the memslot count is high.
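
To make the offset range concrete, the following userspace sketch
computes the closed interval [start, last] that a slot would occupy and
looks slots up by overlap. It assumes a 4 KiB page and uses a linear
scan as a stand-in for the kernel interval tree; all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

struct sketch_slot {
    uint64_t ofs;    /* byte offset of the slot within the fd */
    uint64_t npages; /* slot size in pages */
};

/* Closed interval [start, last], as used by interval tree nodes. */
struct sketch_range {
    uint64_t start;
    uint64_t last;
};

struct sketch_range sketch_slot_range(const struct sketch_slot *slot)
{
    assert(slot->npages > 0);
    struct sketch_range r = {
        .start = slot->ofs,
        .last = slot->ofs + (slot->npages << SKETCH_PAGE_SHIFT) - 1,
    };
    return r;
}

/* Linear-scan stand-in for an interval tree query: first overlapping slot. */
const struct sketch_slot *sketch_find_slot(const struct sketch_slot *slots,
                                           size_t n,
                                           uint64_t start, uint64_t last)
{
    for (size_t i = 0; i < n; i++) {
        struct sketch_range r = sketch_slot_range(&slots[i]);

        if (r.start <= last && start <= r.last)
            return &slots[i];
    }
    return NULL;
}
```

The scan above is O(n) per query; the point of keeping the ranges in an
interval tree is to avoid exactly that cost when there are many slots.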

Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c  | 17 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2cd35560c44b..3bd875f9669f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -451,6 +451,7 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 struct kvm_memory_slot {
struct hlist_node id_node[2];
struct interval_tree_node hva_node[2];
+   struct interval_tree_node ofs_node[2];
struct rb_node gfn_node[2];
gfn_t base_gfn;
unsigned long npages;
@@ -560,6 +561,7 @@ struct kvm_memslots {
u64 generation;
atomic_long_t last_used_slot;
struct rb_root_cached hva_tree;
+   struct rb_root_cached ofs_tree;
struct rb_root gfn_tree;
/*
 * The mapping table from slot id to memslot.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..47e96d1eb233 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1087,6 +1087,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 
atomic_long_set(&slots->last_used_slot, (unsigned long)NULL);
slots->hva_tree = RB_ROOT_CACHED;
+   slots->ofs_tree = RB_ROOT_CACHED;
slots->gfn_tree = RB_ROOT;
hash_init(slots->id_hash);
slots->node_idx = j;
@@ -1363,7 +1364,7 @@ static void kvm_replace_gfn_node(struct kvm_memslots *slots,
  * With NULL @old this simply adds @new.
  * With NULL @new this simply removes @old.
  *
- * If @new is non-NULL its hva_node[slots_idx] range has to be set
+ * If @new is non-NULL its hva/ofs_node[slots_idx] range has to be set
  * appropriately.
  */
 static void kvm_replace_memslot(struct kvm *kvm,
@@ -1377,6 +1378,7 @@ static void kvm_replace_memslot(struct kvm *kvm,
if (old) {
hash_del(&old->id_node[idx]);
interval_tree_remove(&old->hva_node[idx], &slots->hva_tree);
+   interval_tree_remove(&old->ofs_node[idx], &slots->ofs_tree);
 
if ((long)old == atomic_long_read(&slots->last_used_slot))
atomic_long_set(&slots->last_used_slot, (long)new);
@@ -1388,20 +1390,27 @@ static void kvm_replace_memslot(struct kvm *kvm,
}
 
/*
-* Initialize @new's hva range.  Do this even when replacing an @old
+* Initialize @new's hva/ofs range.  Do this even when replacing an @old
 * slot, kvm_copy_memslot() deliberately does not touch node data.
 */
new->hva_node[idx].start = new->userspace_addr;
new->hva_node[idx].last = new->userspace_addr +
  (new->npages << PAGE_SHIFT) - 1;
+   if (kvm_slot_is_private(new)) {
+   new->ofs_node[idx].start = new->ofs;
+   new->ofs_node[idx].last = new->ofs +
+ (new->npages << PAGE_SHIFT) - 1;
+   }
 
/*
 * (Re)Add the new memslot.  There is no O(1) interval_tree_replace(),
-* hva_node needs to be swapped with remove+insert even though hva can't
-* change when replacing an existing slot.
+* hva_node/ofs_node needs to be swapped with remove+insert even though
+* hva/ofs can't change when replacing an existing slot.
 */
hash_add(slots->id_hash, &new->id_node[idx], new->id);
interval_tree_insert(&new->hva_node[idx], &slots->hva_tree);
+   if (kvm_slot_is_private(new))
+   interval_tree_insert(&new->ofs_node[idx], &slots->ofs_tree);
 
/*
 * If the memslot gfn is unchanged, rb_replace_node() can be used to
-- 
2.17.1

[PATCH v3 kvm/queue 09/16] KVM: Split out common memory invalidation code

2021-12-23 Thread Chao Peng

When fd-based memory is enabled, there will be two types of memory
invalidation:
  - memory invalidation from native MMU through mmu_notifier callback
for hva-based memory, and,
  - memory invalidation from memfd through memfd_notifier callback for
fd-based memory.

Some code can be shared between these two types of memory invalidation.
This patch moves that shared code into one place so that it can be
used for both CONFIG_MMU_NOTIFIER and CONFIG_MEMFD_NOTIFIER.

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 virt/kvm/kvm_main.c | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 19736a0013a0..7b7530b1ea1e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -469,22 +469,6 @@ void kvm_destroy_vcpus(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
-{
-   return container_of(mn, struct kvm, mmu_notifier);
-}
-
-static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
- struct mm_struct *mm,
- unsigned long start, unsigned long end)
-{
-   struct kvm *kvm = mmu_notifier_to_kvm(mn);
-   int idx;
-
-   idx = srcu_read_lock(&kvm->srcu);
-   kvm_arch_mmu_notifier_invalidate_range(kvm, start, end);
-   srcu_read_unlock(&kvm->srcu, idx);
-}
 
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
@@ -611,6 +595,25 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
/* The notifiers are averse to booleans. :-( */
return (int)ret;
 }
+#endif
+
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
+{
+   return container_of(mn, struct kvm, mmu_notifier);
+}
+
+static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
+ struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+   struct kvm *kvm = mmu_notifier_to_kvm(mn);
+   int idx;
+
+   idx = srcu_read_lock(&kvm->srcu);
+   kvm_arch_mmu_notifier_invalidate_range(kvm, start, end);
+   srcu_read_unlock(&kvm->srcu, idx);
+}
 
 static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
-- 
2.17.1

[PATCH v2 16/23] dma: Let ld*_dma() take MemTxAttrs argument

2021-12-23 Thread Philippe Mathieu-Daudé

Let devices specify transaction attributes when calling ld*_dma().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Richard Henderson 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h   |  3 ++-
 include/hw/ppc/spapr_vio.h |  3 ++-
 include/sysemu/dma.h   | 11 ++-
 hw/intc/pnv_xive.c |  7 ---
 hw/usb/hcd-xhci.c  |  6 +++---
 5 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d07e9707b48..0613308b1b6 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -854,7 +854,8 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, dma_addr_t addr,
 static inline uint##_bits##_t ld##_l##_pci_dma(PCIDevice *dev,  \
dma_addr_t addr) \
 {   \
-return ld##_l##_dma(pci_get_address_space(dev), addr);  \
+return ld##_l##_dma(pci_get_address_space(dev), addr,   \
+MEMTXATTRS_UNSPECIFIED);\
 }   \
 static inline void st##_s##_pci_dma(PCIDevice *dev, \
 dma_addr_t addr, uint##_bits##_t val) \
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index e87f8e6f596..d2ec9b0637f 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -126,7 +126,8 @@ static inline int spapr_vio_dma_set(SpaprVioDevice *dev, uint64_t taddr,
 (stl_be_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
 #define vio_stq(_dev, _addr, _val) \
 (stq_be_dma(&(_dev)->as, (_addr), (_val), MEMTXATTRS_UNSPECIFIED))
-#define vio_ldq(_dev, _addr) (ldq_be_dma(&(_dev)->as, (_addr)))
+#define vio_ldq(_dev, _addr) \
+(ldq_be_dma(&(_dev)->as, (_addr), MEMTXATTRS_UNSPECIFIED))
 
 int spapr_vio_send_crq(SpaprVioDevice *dev, uint8_t *crq);
 
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index ebbc0501681..f3cf60d222d 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -241,10 +241,11 @@ static inline void dma_memory_unmap(AddressSpace *as,
 
 #define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
 static inline uint##_bits##_t ld##_lname##_##_end##_dma(AddressSpace *as, \
-dma_addr_t addr) \
+dma_addr_t addr, \
+MemTxAttrs attrs) \
 {   \
 uint##_bits##_t val;\
-dma_memory_read(as, addr, &val, (_bits) / 8, MEMTXATTRS_UNSPECIFIED); \
+dma_memory_read(as, addr, &val, (_bits) / 8, attrs); \
 return _end##_bits##_to_cpu(val);   \
 }   \
 static inline void st##_sname##_##_end##_dma(AddressSpace *as,  \
@@ -253,14 +254,14 @@ static inline void dma_memory_unmap(AddressSpace *as,
  MemTxAttrs attrs)  \
 {   \
 val = cpu_to_##_end##_bits(val);\
-dma_memory_write(as, addr, &val, (_bits) / 8, attrs); \
+dma_memory_write(as, addr, &val, (_bits) / 8, attrs);   \
 }
 
-static inline uint8_t ldub_dma(AddressSpace *as, dma_addr_t addr)
+static inline uint8_t ldub_dma(AddressSpace *as, dma_addr_t addr, MemTxAttrs attrs)
 {
 uint8_t val;
 
-dma_memory_read(as, addr, &val, 1, MEMTXATTRS_UNSPECIFIED);
+dma_memory_read(as, addr, &val, 1, attrs);
 return val;
 }
 
diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index ad43483612e..d9249bbc0c1 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -172,7 +172,7 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, uint32_t type,
 
 /* Get the page size of the indirect table. */
 vsd_addr = vsd & VSD_ADDRESS_MASK;
-vsd = ldq_be_dma(&address_space_memory, vsd_addr);
+vsd = ldq_be_dma(&address_space_memory, vsd_addr, MEMTXATTRS_UNSPECIFIED);
 
 if (!(vsd & VSD_ADDRESS_MASK)) {
 #ifdef XIVE_DEBUG
@@ -195,7 +195,8 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, uint32_t type,
 /* Load the VSD we are looking for, if not already done */
 if (vsd_idx) {
 vsd_addr = vsd_addr + vsd_idx * XIVE_VSD_SIZE;
-vsd = ldq_be_dma(&address_space_memory, vsd_addr);
+vsd = ldq_be_dma(&address_space_memory, vsd_addr,
+ MEMTXATTRS_UNSPECIFIED);
 
 if (!(vsd & VSD_ADDRESS_MASK)) {
 #ifdef XIVE_DEBUG
@@ -542,7 +543,7 @@ stat

[PATCH v3 kvm/queue 08/16] KVM: Special handling for fd-based memory invalidation

2021-12-23 Thread Chao Peng

For fd-based guest memory, the memory backend (e.g. the fd provider)
should notify KVM to unmap/invalidate the private memory from the KVM
secondary MMU when userspace punches a hole in the fd (e.g. when
userspace converts private memory to shared memory).

To support fd-based memory invalidation, the existing hva-based memory
invalidation needs to be extended. A new 'inode' for the fd is passed in
from memfd_falloc_notifier, and 'start/end' represent the start/end
offset in the fd instead of an hva range. During invalidation KVM needs
to check this inode against the one in the memslot. Only when the
'inode' in the memslot equals the passed-in 'inode' should we invalidate
the mapping in KVM.
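
The inode check and range clamping described above can be sketched in
userspace as follows. This is an illustration under assumptions (4 KiB
pages, hypothetical sketch_* names, an opaque pointer standing in for
the inode, and an explicit empty-range check that the kernel gets from
its interval tree query instead), not the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

struct sketch_memslot {
    const void *inode; /* identity of the backing file */
    uint64_t ofs;      /* byte offset of the slot within the fd */
    uint64_t base_gfn; /* first guest frame number of the slot */
    uint64_t npages;   /* slot size in pages */
};

/*
 * Clamp the fd offset range [start, end) to the slot and convert it to a
 * gfn range [*gfn_start, *gfn_end). Returns false when the slot is backed
 * by a different inode or the clamped range is empty.
 */
bool sketch_invalidate_range(const struct sketch_memslot *slot,
                             const void *inode,
                             uint64_t start, uint64_t end,
                             uint64_t *gfn_start, uint64_t *gfn_end)
{
    assert(slot && gfn_start && gfn_end);
    if (slot->inode != inode)
        return false;

    uint64_t slot_end = slot->ofs + (slot->npages << SKETCH_PAGE_SHIFT);
    uint64_t s = start > slot->ofs ? start : slot->ofs;
    uint64_t e = end < slot_end ? end : slot_end;

    if (s >= e)
        return false;

    *gfn_start = slot->base_gfn + ((s - slot->ofs) >> SKETCH_PAGE_SHIFT);
    *gfn_end = slot->base_gfn +
               ((e - slot->ofs + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
    return true;
}
```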

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 virt/kvm/kvm_main.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b7a1c4d7eaaa..19736a0013a0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -494,6 +494,7 @@ typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
 struct kvm_useraddr_range {
unsigned long start;
unsigned long end;
+   struct inode *inode;
pte_t pte;
gfn_handler_t handler;
on_lock_fn_t on_lock;
@@ -544,14 +545,27 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
struct interval_tree_node *node;
 
slots = __kvm_memslots(kvm, i);
-   useraddr_tree = &slots->hva_tree;
+   useraddr_tree = range->inode ? &slots->ofs_tree : &slots->hva_tree;
kvm_for_each_memslot_in_useraddr_range(node, useraddr_tree,
  range->start, range->end - 1) 
{
unsigned long useraddr_start, useraddr_end;
+   unsigned long useraddr_base;
+
+   if (range->inode) {
+   slot = container_of(node, struct kvm_memory_slot,
+   ofs_node[slots->node_idx]);
+   if (!slot->file ||
+   slot->file->f_inode != range->inode)
+   continue;
+   useraddr_base = slot->ofs;
+   } else {
+   slot = container_of(node, struct kvm_memory_slot,
+   hva_node[slots->node_idx]);
+   useraddr_base = slot->userspace_addr;
+   }
 
-   slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
-   useraddr_start = max(range->start, slot->userspace_addr);
-   useraddr_end = min(range->end, slot->userspace_addr +
+   useraddr_start = max(range->start, useraddr_base);
+   useraddr_end = min(range->end, useraddr_base +
   (slot->npages << 
PAGE_SHIFT));
 
/*
@@ -568,10 +582,10 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
 */
gfn_range.start = useraddr_to_gfn_memslot(useraddr_start,
- slot, true);
+   slot, !range->inode);
gfn_range.end = useraddr_to_gfn_memslot(
useraddr_end + PAGE_SIZE - 1,
-   slot, true);
+   slot, !range->inode);
gfn_range.slot = slot;
 
if (!locked) {
@@ -613,6 +627,7 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
.on_lock= (void *)kvm_null_fn,
.flush_on_ret   = true,
.may_block  = false,
+   .inode  = NULL,
};
 
return __kvm_handle_useraddr_range(kvm, &range);
@@ -632,6 +647,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
.on_lock= (void *)kvm_null_fn,
.flush_on_ret   = false,
.may_block  = false,
+   .inode  = NULL,
};
 
return __kvm_handle_useraddr_range(kvm, &range);
@@ -700,6 +716,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
.on_lock= kvm_inc_notifier_count,
.flush_on_ret   = true,
.may_block  = mmu_notifier_range_blockable(range),
+   .inode  = NULL,
};
 
trace_kvm_unmap_hva_range(range-

[PATCH v3 kvm/queue 11/16] KVM: Add kvm_map_gfn_range

2021-12-23 Thread Chao Peng

This new function establishes the mapping in KVM page tables for a
given gfn range. It can be used in the memory fallocate callback for
memfd based memory to establish the mapping for KVM secondary MMU when
the pages are allocated in the memory backend.

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 arch/x86/kvm/mmu/mmu.c   | 47 
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c  |  5 +
 3 files changed, 54 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1d275e9d76b5..2856eb662a21 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1568,6 +1568,53 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
return ret;
 }
 
+bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+   struct kvm_vcpu *vcpu;
+   kvm_pfn_t pfn;
+   gfn_t gfn;
+   int idx;
+   bool ret = true;
+
+   /* Need vcpu context for kvm_mmu_do_page_fault. */
+   vcpu = kvm_get_vcpu(kvm, 0);
+   if (mutex_lock_killable(&vcpu->mutex))
+   return false;
+
+   vcpu_load(vcpu);
+   idx = srcu_read_lock(&kvm->srcu);
+
+   kvm_mmu_reload(vcpu);
+
+   gfn = range->start;
+   while (gfn < range->end) {
+   if (signal_pending(current)) {
+   ret = false;
+   break;
+   }
+
+   if (need_resched())
+   cond_resched();
+
+   pfn = kvm_mmu_do_page_fault(vcpu, gfn << PAGE_SHIFT,
+   PFERR_WRITE_MASK | PFERR_USER_MASK,
+   false);
+   if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) {
+   ret = false;
+   break;
+   }
+
+   gfn++;
+   }
+
+   srcu_read_unlock(&kvm->srcu, idx);
+   vcpu_put(vcpu);
+
+   mutex_unlock(&vcpu->mutex);
+
+   return ret;
+}
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
bool flush = false;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index be567925831b..8c2359175509 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -241,6 +241,8 @@ struct kvm_gfn_range {
pte_t pte;
bool may_block;
 };
+
+bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f495c1a313bd..660ce15973ad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -471,6 +471,11 @@ EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
 #if defined(CONFIG_MEMFD_OPS) ||\
(defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER))
 
+bool __weak kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+   return false;
+}
+
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
 typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
-- 
2.17.1

[PATCH v3 kvm/queue 06/16] KVM: Implement fd-based memory using MEMFD_OPS interfaces

2021-12-23 Thread Chao Peng

This patch adds the new memfd facility in KVM using MEMFD_OPS to provide
guest memory from a file descriptor created in userspace with
memfd_create() instead of a traditional userspace hva. It mainly provides
two kinds of functions:
  - Pair/unpair an fd-based memslot with a memory backend that owns the
    file descriptor when such a memslot gets created/deleted.
  - Get/put a pfn to be used in the KVM page fault handler from/to the
    paired memory backend.

At pairing time, KVM and the memfd subsystem exchange callbacks that
each can call into the other side. These callbacks are the major
places to implement fd-based guest memory provisioning.
KVM->memfd:
  - get_pfn: get and lock a page at the specified offset in the fd.
  - put_pfn: put and unlock the pfn.
Note: the page needs to be locked between get_pfn/put_pfn to ensure the
pfn is valid when KVM uses it to establish the mapping in the secondary
MMU page table.
memfd->KVM:
  - invalidate_page_range: called when userspace punches a hole in the
    fd; KVM should unmap the related pages in the secondary MMU.
  - fallocate: called when userspace fallocates space on the fd; KVM
    can map the related pages in the secondary MMU.
  - get/put_owner: used to ensure the guest is still alive, through a
    reference mechanism, when calling the above invalidate/fallocate
    callbacks.

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 arch/x86/kvm/Kconfig |  1 +
 include/linux/kvm_host.h |  6 +++
 virt/kvm/Makefile.kvm|  2 +-
 virt/kvm/memfd.c | 91 
 4 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/memfd.c

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 03b2ce34e7f4..86655cd660ca 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,6 +46,7 @@ config KVM
select SRCU
select INTERVAL_TREE
select HAVE_KVM_PM_NOTIFIER if PM
+   select MEMFD_OPS
help
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3bd875f9669f..21f8b1880723 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -806,6 +806,12 @@ static inline void kvm_irqfd_exit(void)
 {
 }
 #endif
+
+int kvm_memfd_register(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_memfd_unregister(struct kvm_memory_slot *slot);
+long kvm_memfd_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, int *order);
+void kvm_memfd_put_pfn(kvm_pfn_t pfn);
+
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module);
 void kvm_exit(void);
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index ffdcad3cc97a..8842128d8429 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,7 @@
 
 KVM ?= ../../../virt/kvm
 
-kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o $(KVM)/memfd.o
 kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
diff --git a/virt/kvm/memfd.c b/virt/kvm/memfd.c
new file mode 100644
index ..662393a76782
--- /dev/null
+++ b/virt/kvm/memfd.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * memfd.c: routines for fd based guest memory
+ * Copyright (c) 2021, Intel Corporation.
+ *
+ * Author:
+ * Chao Peng 
+ */
+
+#include 
+#include 
+
+#ifdef CONFIG_MEMFD_OPS
+static const struct memfd_pfn_ops *memfd_ops;
+
+static void memfd_invalidate_page_range(struct inode *inode, void *owner,
+   pgoff_t start, pgoff_t end)
+{
+}
+
+static void memfd_fallocate(struct inode *inode, void *owner,
+   pgoff_t start, pgoff_t end)
+{
+}
+
+static bool memfd_get_owner(void *owner)
+{
+   return kvm_get_kvm_safe(owner);
+}
+
+static void memfd_put_owner(void *owner)
+{
+   kvm_put_kvm(owner);
+}
+
+static const struct  memfd_falloc_notifier memfd_notifier = {
+   .invalidate_page_range = memfd_invalidate_page_range,
+   .fallocate = memfd_fallocate,
+   .get_owner = memfd_get_owner,
+   .put_owner = memfd_put_owner,
+};
+#endif
+
+long kvm_memfd_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, int *order)
+{
+#ifdef CONFIG_MEMFD_OPS
+   pgoff_t index = gfn - slot->base_gfn + (slot->ofs >> PAGE_SHIFT);
+
+   return memfd_ops->get_lock_pfn(slot->file->f_inode, index, order);
+#else
+   return -EOPNOTSUPP;
+#endif
+}
+
+void kvm_memfd_put_pfn(kvm_pfn_t pfn)
+{
+#ifdef CONFIG_MEMFD_OPS
+   memfd_ops->put_unlock_pfn(pfn);
+#endif
+}
+
+int kvm_memfd_register(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+#ifdef CONFIG_MEMFD_OPS
+   int ret;
+   struct fd fd = fdget(slot->fd);
+
+   if (!fd.file)
+   return -EINVAL;
+
+   ret

[PATCH v3 kvm/queue 13/16] KVM: Add KVM_EXIT_MEMORY_ERROR exit

2021-12-23 Thread Chao Peng

This new exit allows userspace to handle memory-related errors.
Currently it supports two types of errors (KVM_EXIT_MEM_MAP_SHARED and
KVM_EXIT_MEM_MAP_PRIVATE), which are used for shared memory <-> private
memory conversion in memory encryption usage.

After private memory is enabled, there are two places in KVM that can
exit to userspace to trigger private <-> shared conversion:
  - explicit conversion: happens when the guest explicitly calls into
    KVM to map a range (as private or shared); KVM then exits to
    userspace to do the map/unmap operations.
  - implicit conversion: happens in the KVM page fault handler.
    * If the fault is due to a private memory access, it causes a
      userspace exit for a shared->private conversion request when the
      page has not been allocated in the private memory backend.
    * If the fault is due to a shared memory access, it causes a
      userspace exit for a private->shared conversion request when the
      page has already been allocated in the private memory backend.
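
For illustration, a VMM might translate the exit type into a
fallocate() mode roughly as follows. This is a sketch under
assumptions: the struct mirrors struct kvm_memory_exit from the diff
below using fixed-width types, the helper name
conversion_fallocate_mode is hypothetical, and the mapping (allocate
for shared->private, punch a hole for private->shared) is inferred
from the cover letter rather than mandated by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/falloc.h>

/* Values copied from this patch; not present in mainline <linux/kvm.h>. */
#define KVM_EXIT_MEM_MAP_SHARED  1
#define KVM_EXIT_MEM_MAP_PRIVATE 2

/* Local mirror of the proposed struct kvm_memory_exit. */
struct kvm_memory_exit_sketch {
    uint32_t type;
    union {
        struct {
            uint64_t gpa;
            uint64_t size;
        } map;
    } u;
};

/*
 * Hypothetical helper: pick the fallocate() mode a VMM could apply to
 * the private backend fd for a conversion request.
 * Returns -1 for an unknown type.
 */
static int conversion_fallocate_mode(const struct kvm_memory_exit_sketch *mem)
{
    assert(mem != NULL);

    switch (mem->type) {
    case KVM_EXIT_MEM_MAP_PRIVATE:
        /* shared -> private: allocate the range in the backend */
        return 0;
    case KVM_EXIT_MEM_MAP_SHARED:
        /* private -> shared: drop the range from the backend */
        return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
    default:
        return -1;
    }
}
```

The resulting mode would then be passed to fallocate() together with
the fd offset and length that correspond to u.map.gpa and u.map.size.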

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/uapi/linux/kvm.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 41434322fa23..d68db3b2eeec 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -243,6 +243,18 @@ struct kvm_xen_exit {
} u;
 };
 
+struct kvm_memory_exit {
+#define KVM_EXIT_MEM_MAP_SHARED 1
+#define KVM_EXIT_MEM_MAP_PRIVATE 2
+   __u32 type;
+   union {
+   struct {
+   __u64 gpa;
+   __u64 size;
+   } map;
+   } u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -282,6 +294,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_X86_BUS_LOCK 33
 #define KVM_EXIT_XEN  34
 #define KVM_EXIT_RISCV_SBI35
+#define KVM_EXIT_MEMORY_ERROR 36
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -499,6 +512,8 @@ struct kvm_run {
unsigned long args[6];
unsigned long ret[2];
} riscv_sbi;
+   /* KVM_EXIT_MEMORY_ERROR */
+   struct kvm_memory_exit mem;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.17.1

[PATCH v2 19/23] hw/scsi/megasas: Use uint32_t for reply queue head/tail values

2021-12-23 Thread Philippe Mathieu-Daudé

While the reply queue values fit in 16-bit, they are accessed
as 32-bit:

  661:s->reply_queue_head = ldl_le_pci_dma(pcid, s->producer_pa);
  662:s->reply_queue_head %= MEGASAS_MAX_FRAMES;
  663:s->reply_queue_tail = ldl_le_pci_dma(pcid, s->consumer_pa);
  664:s->reply_queue_tail %= MEGASAS_MAX_FRAMES;

Having:

  41:#define MEGASAS_MAX_FRAMES 2048 /* Firmware limit at 65535 */

In order to update the ld/st*_pci_dma() API to pass the address
of the value to access, it is simpler to have the head/tail declared
as 32-bit values. Replace the uint16_t by uint32_t, wasting 4 bytes in
the MegasasState structure.
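
To illustrate the point, here is a standalone sketch of a load helper
that writes its result through a pointer. load_le32() and struct
queue_state are hypothetical stand-ins, not the QEMU ld/st*_pci_dma()
API; with a uint16_t field the caller would need a temporary, while a
uint32_t field can be passed by address directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEGASAS_MAX_FRAMES 2048

/* Hypothetical 32-bit little-endian load that stores through a pointer. */
static void load_le32(const uint8_t *buf, uint32_t *val)
{
    assert(buf != NULL && val != NULL);
    *val = (uint32_t)buf[0] |
           ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 24);
}

/* Simplified stand-in for the head/tail fields of MegasasState. */
struct queue_state {
    uint32_t reply_queue_head;
    uint32_t reply_queue_tail;
};

static void read_head(struct queue_state *s, const uint8_t *producer)
{
    /* The field is 32-bit, so its address can be handed over as-is. */
    load_le32(producer, &s->reply_queue_head);
    s->reply_queue_head %= MEGASAS_MAX_FRAMES;
}
```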

Acked-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/scsi/megasas.c| 4 ++--
 hw/scsi/trace-events | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 87101705d01..266c3d38003 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -109,8 +109,8 @@ struct MegasasState {
 uint64_t reply_queue_pa;
 void *reply_queue;
 uint16_t reply_queue_len;
-uint16_t reply_queue_head;
-uint16_t reply_queue_tail;
+uint32_t reply_queue_head;
+uint32_t reply_queue_tail;
 uint64_t consumer_pa;
 uint64_t producer_pa;
 
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index 92d5b40f892..ae8551f2797 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -42,18 +42,18 @@ mptsas_config_sas_phy(void *dev, int address, int port, int 
phy_handle, int dev_
 
 # megasas.c
 megasas_init_firmware(uint64_t pa) "pa 0x%" PRIx64 " "
-megasas_init_queue(uint64_t queue_pa, int queue_len, uint64_t head, uint64_t 
tail, uint32_t flags) "queue at 0x%" PRIx64 " len %d head 0x%" PRIx64 " tail 
0x%" PRIx64 " flags 0x%x"
+megasas_init_queue(uint64_t queue_pa, int queue_len, uint32_t head, uint32_t 
tail, uint32_t flags) "queue at 0x%" PRIx64 " len %d head 0x%" PRIx32 " tail 
0x%" PRIx32 " flags 0x%x"
 megasas_initq_map_failed(int frame) "scmd %d: failed to map queue"
 megasas_initq_mapped(uint64_t pa) "queue already mapped at 0x%" PRIx64
 megasas_initq_mismatch(int queue_len, int fw_cmds) "queue size %d max fw cmds 
%d"
 megasas_qf_mapped(unsigned int index) "skip mapped frame 0x%x"
 megasas_qf_new(unsigned int index, uint64_t frame) "frame 0x%x addr 0x%" PRIx64
 megasas_qf_busy(unsigned long pa) "all frames busy for frame 0x%lx"
-megasas_qf_enqueue(unsigned int index, unsigned int count, uint64_t context, 
unsigned int head, unsigned int tail, int busy) "frame 0x%x count %d context 
0x%" PRIx64 " head 0x%x tail 0x%x busy %d"
-megasas_qf_update(unsigned int head, unsigned int tail, unsigned int busy) 
"head 0x%x tail 0x%x busy %d"
+megasas_qf_enqueue(unsigned int index, unsigned int count, uint64_t context, 
uint32_t head, uint32_t tail, unsigned int busy) "frame 0x%x count %d context 
0x%" PRIx64 " head 0x%" PRIx32 " tail 0x%" PRIx32 " busy %u"
+megasas_qf_update(uint32_t head, uint32_t tail, unsigned int busy) "head 0x%" 
PRIx32 " tail 0x%" PRIx32 " busy %u"
 megasas_qf_map_failed(int cmd, unsigned long frame) "scmd %d: frame %lu"
 megasas_qf_complete_noirq(uint64_t context) "context 0x%" PRIx64 " "
-megasas_qf_complete(uint64_t context, unsigned int head, unsigned int tail, 
int busy) "context 0x%" PRIx64 " head 0x%x tail 0x%x busy %d"
+megasas_qf_complete(uint64_t context, uint32_t head, uint32_t tail, int busy) 
"context 0x%" PRIx64 " head 0x%" PRIx32 " tail 0x%" PRIx32 " busy %u"
 megasas_frame_busy(uint64_t addr) "frame 0x%" PRIx64 " busy"
 megasas_unhandled_frame_cmd(int cmd, uint8_t frame_cmd) "scmd %d: MFI cmd 0x%x"
 megasas_handle_scsi(const char *frame, int bus, int dev, int lun, void *sdev, 
unsigned long size) "%s dev %x/%x/%x sdev %p xfer %lu"
-- 
2.33.1

[PATCH v3 kvm/queue 12/16] KVM: Implement fd-based memory fallocation

2021-12-23 Thread Chao Peng

KVM gets notified through memfd_notifier when userspace allocates space
via fallocate() on the fd which is used for guest memory. KVM can set up
the mapping in the secondary MMU page tables at this time. This patch
adds a function in KVM to map pfn to gfn when the page is allocated in
the memory backend.

While it's possible to postpone the mapping of the secondary MMU to the
KVM page fault handler, we can reduce some VMExits by also mapping the
secondary page tables when a page is mapped in the primary MMU.

It reuses the same code as kvm_memfd_invalidate_range, except that it
uses kvm_map_gfn_range as its handler.
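
The shape of that refactoring, reduced to a userspace toy, looks
roughly like this. All names here are hypothetical; the real handlers
operate on struct kvm and struct kvm_gfn_range rather than on a page
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_range {
    unsigned long start;
    unsigned long end;
};

typedef bool (*toy_handler_t)(long *mapped_pages, const struct toy_range *r);

static bool toy_unmap_range(long *mapped_pages, const struct toy_range *r)
{
    *mapped_pages -= (long)(r->end - r->start);
    return true;
}

static bool toy_map_range(long *mapped_pages, const struct toy_range *r)
{
    *mapped_pages += (long)(r->end - r->start);
    return true;
}

/* Common walker: the only thing that varies is the handler. */
static bool toy_handle_range(long *mapped_pages, unsigned long start,
                             unsigned long end, toy_handler_t handler)
{
    const struct toy_range r = { .start = start, .end = end };

    assert(mapped_pages != NULL && handler != NULL);
    return handler(mapped_pages, &r);
}

/* Thin wrappers, one per notification type. */
static bool toy_invalidate_range(long *mapped_pages, unsigned long start,
                                 unsigned long end)
{
    return toy_handle_range(mapped_pages, start, end, toy_unmap_range);
}

static bool toy_fallocate_range(long *mapped_pages, unsigned long start,
                                unsigned long end)
{
    return toy_handle_range(mapped_pages, start, end, toy_map_range);
}
```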

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c  | 22 +++---
 virt/kvm/memfd.c |  2 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8c2359175509..ad89a0e8bf6b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2017,6 +2017,8 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu 
*vcpu)
 #ifdef CONFIG_MEMFD_OPS
 int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode,
   unsigned long start, unsigned long end);
+int kvm_memfd_fallocate_range(struct kvm *kvm, struct inode *inode,
+ unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMFD_OPS */
 
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 660ce15973ad..36dd2adcd7fc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -891,15 +891,17 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #ifdef CONFIG_MEMFD_OPS
-int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode,
-  unsigned long start, unsigned long end)
+int kvm_memfd_handle_range(struct kvm *kvm, struct inode *inode,
+  unsigned long start, unsigned long end,
+  gfn_handler_t handler)
+
 {
int ret;
const struct kvm_useraddr_range useraddr_range = {
.start  = start,
.end= end,
.pte= __pte(0),
-   .handler= kvm_unmap_gfn_range,
+   .handler= handler,
.on_lock= (void *)kvm_null_fn,
.flush_on_ret   = true,
.may_block  = false,
@@ -914,6 +916,20 @@ int kvm_memfd_invalidate_range(struct kvm *kvm, struct 
inode *inode,
 
return ret;
 }
+
+int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode,
+  unsigned long start, unsigned long end)
+{
+   return kvm_memfd_handle_range(kvm, inode, start, end,
+ kvm_unmap_gfn_range);
+}
+
+int kvm_memfd_fallocate_range(struct kvm *kvm, struct inode *inode,
+ unsigned long start, unsigned long end)
+{
+   return kvm_memfd_handle_range(kvm, inode, start, end,
+ kvm_map_gfn_range);
+}
 #endif /* CONFIG_MEMFD_OPS */
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
diff --git a/virt/kvm/memfd.c b/virt/kvm/memfd.c
index 547f65f5a187..91a17c9fbc49 100644
--- a/virt/kvm/memfd.c
+++ b/virt/kvm/memfd.c
@@ -23,6 +23,8 @@ static void memfd_invalidate_page_range(struct inode *inode, 
void *owner,
 static void memfd_fallocate(struct inode *inode, void *owner,
pgoff_t start, pgoff_t end)
 {
+   kvm_memfd_fallocate_range(owner, inode, start >> PAGE_SHIFT,
+   end >> PAGE_SHIFT);
 }
 
 static bool memfd_get_owner(void *owner)
-- 
2.17.1

[PATCH v3 kvm/queue 15/16] KVM: Use kvm_userspace_memory_region_ext

2021-12-23 Thread Chao Peng

Use the new extended memslot structure kvm_userspace_memory_region_ext,
which includes two additional fields, fd and ofs, compared to the
current kvm_userspace_memory_region. The fd/ofs fields will be copied
from userspace only when KVM_MEM_PRIVATE is set.

Internally, KVM changes all existing uses of kvm_userspace_memory_region
to kvm_userspace_memory_region_ext, since the new extended structure
covers all the existing fields in kvm_userspace_memory_region.
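
The two-step copy can be modelled in userspace with memcpy() standing
in for copy_from_user(). This is a sketch with assumptions: the struct
layouts, the padding, and the TOY_MEM_PRIVATE bit are illustrative
guesses and are not taken from this patch. Unlike the kernel code, the
sketch also zeroes the destination first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM_PRIVATE (1u << 2) /* assumed flag bit, illustration only */

/* Assumed layout of the base structure. */
struct toy_region {
    uint32_t slot;
    uint32_t flags;
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
};

/* Assumed layout of the extended structure: same prefix plus ofs/fd. */
struct toy_region_ext {
    uint32_t slot;
    uint32_t flags;
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t ofs;
    uint32_t fd;
    uint32_t pad[5];
};

/*
 * Copy the base part unconditionally, then the extension only when the
 * private flag is set. "user" points at what userspace passed in.
 */
static void toy_copy_region(struct toy_region_ext *dst, const void *user)
{
    size_t off = offsetof(struct toy_region_ext, ofs);

    assert(dst != NULL && user != NULL);
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, user, sizeof(struct toy_region));

    if (dst->flags & TOY_MEM_PRIVATE)
        memcpy((char *)dst + off, (const char *)user + off,
               sizeof(*dst) - off);
}
```

Copying in two steps keeps the call compatible with callers that only
pass the smaller base structure, since the tail is never read unless
the flag asks for it.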

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 arch/x86/kvm/x86.c   |  2 +-
 include/linux/kvm_host.h |  4 ++--
 virt/kvm/kvm_main.c  | 19 +--
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bde45a1bc2..52942195def3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11551,7 +11551,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, 
int id, gpa_t gpa,
}
 
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-   struct kvm_userspace_memory_region m;
+   struct kvm_userspace_memory_region_ext m;
 
m.slot = id | (i << 16);
m.flags = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad89a0e8bf6b..fabab3b77d57 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -981,9 +981,9 @@ enum kvm_mr_change {
 };
 
 int kvm_set_memory_region(struct kvm *kvm,
- const struct kvm_userspace_memory_region *mem);
+ const struct kvm_userspace_memory_region_ext *mem);
 int __kvm_set_memory_region(struct kvm *kvm,
-   const struct kvm_userspace_memory_region *mem);
+   const struct kvm_userspace_memory_region_ext *mem);
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 36dd2adcd7fc..cf8dcb3b8c7f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1514,7 +1514,7 @@ static void kvm_replace_memslot(struct kvm *kvm,
}
 }
 
-static int check_memory_region_flags(const struct kvm_userspace_memory_region 
*mem)
+static int check_memory_region_flags(const struct 
kvm_userspace_memory_region_ext *mem)
 {
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
@@ -1907,7 +1907,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots 
*slots, int id,
  * Must be called holding kvm->slots_lock for write.
  */
 int __kvm_set_memory_region(struct kvm *kvm,
-   const struct kvm_userspace_memory_region *mem)
+   const struct kvm_userspace_memory_region_ext *mem)
 {
struct kvm_memory_slot *old, *new;
struct kvm_memslots *slots;
@@ -2011,7 +2011,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
 
 int kvm_set_memory_region(struct kvm *kvm,
- const struct kvm_userspace_memory_region *mem)
+ const struct kvm_userspace_memory_region_ext *mem)
 {
int r;
 
@@ -2023,7 +2023,7 @@ int kvm_set_memory_region(struct kvm *kvm,
 EXPORT_SYMBOL_GPL(kvm_set_memory_region);
 
 static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
- struct kvm_userspace_memory_region 
*mem)
+   struct kvm_userspace_memory_region_ext *mem)
 {
if ((u16)mem->slot >= KVM_USER_MEM_SLOTS)
return -EINVAL;
@@ -4569,12 +4569,19 @@ static long kvm_vm_ioctl(struct file *filp,
break;
}
case KVM_SET_USER_MEMORY_REGION: {
-   struct kvm_userspace_memory_region kvm_userspace_mem;
+   struct kvm_userspace_memory_region_ext kvm_userspace_mem;
 
r = -EFAULT;
if (copy_from_user(&kvm_userspace_mem, argp,
-   sizeof(kvm_userspace_mem)))
+   sizeof(struct kvm_userspace_memory_region)))
goto out;
+   if (kvm_userspace_mem.flags & KVM_MEM_PRIVATE) {
+   int offset = offsetof(
+   struct kvm_userspace_memory_region_ext, ofs);
+   if (copy_from_user(&kvm_userspace_mem.ofs, argp + 
offset,
+  sizeof(kvm_userspace_mem) - offset))
+   goto out;
+   }
 
r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem);
break;
-- 
2.17.1

[PATCH v3 kvm/queue 14/16] KVM: Handle page fault for private memory

2021-12-23 Thread Chao Peng

When a page fault from the secondary page table happens, while the
guest is running, in a memslot with KVM_MEM_PRIVATE, we need to take
different paths for private access and shared access.

  - For private access, KVM checks if the page is already allocated in
    the memory backend. If yes, KVM establishes the mapping; otherwise
    it exits to userspace to convert a shared page to a private one.

  - For shared access, KVM also checks if the page is already allocated
    in the memory backend. If yes, it exits to userspace to convert a
    private page to a shared one; otherwise the page is treated as
    traditional hva-based shared memory, and KVM lets the existing code
    obtain a pfn with get_user_pages() and establish the mapping.

The above code assumes private memory is persistent and pre-allocated in
the memory backend, so KVM can use this information as an indicator of
whether a page is private or shared. The above check is then performed
by calling kvm_memfd_get_pfn(), which is currently implemented as a
pagecache search but in theory could be implemented differently
(e.g. when the page is not even mapped into the host pagecache, there
should be some different implementation).
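
The four cases above reduce to a small decision table. The sketch below
restates it as a pure function; the enum and function names are
hypothetical and only model the decision, not the actual fault path.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fault_action {
    TOY_MAP_PRIVATE_PFN,         /* map the pfn from the memory backend */
    TOY_EXIT_CONVERT_TO_PRIVATE, /* exit: shared -> private conversion */
    TOY_EXIT_CONVERT_TO_SHARED,  /* exit: private -> shared conversion */
    TOY_MAP_SHARED_HVA,          /* fall through to the hva-based path */
};

/*
 * private_access: the fault is a private memory access.
 * allocated_in_backend: the page already exists in the private backend.
 */
static enum toy_fault_action toy_fault_action(bool private_access,
                                              bool allocated_in_backend)
{
    if (private_access)
        return allocated_in_backend ? TOY_MAP_PRIVATE_PFN
                                    : TOY_EXIT_CONVERT_TO_PRIVATE;

    return allocated_in_backend ? TOY_EXIT_CONVERT_TO_SHARED
                                : TOY_MAP_SHARED_HVA;
}
```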

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 arch/x86/kvm/mmu/mmu.c | 73 --
 arch/x86/kvm/mmu/paging_tmpl.h | 11 +++--
 2 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2856eb662a21..fbcdf62f8281 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2920,6 +2920,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
 
+   if (kvm_slot_is_private(slot))
+   return max_level;
+
host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
return min(host_level, max_level);
 }
@@ -3950,7 +3953,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu 
*vcpu, gpa_t cr2_or_gpa,
  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
 
-static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault 
*fault, int *r)
+static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   /*
+* At this time private gfn has not been supported yet. Other patch
+* that enables it should change this.
+*/
+   return false;
+}
+
+static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+   struct kvm_page_fault *fault,
+   bool *is_private_pfn, int *r)
+{
+   int order;
+   int mem_convert_type;
+   struct kvm_memory_slot *slot = fault->slot;
+   long pfn = kvm_memfd_get_pfn(slot, fault->gfn, &order);
+
+   if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) {
+   if (pfn < 0)
+   mem_convert_type = KVM_EXIT_MEM_MAP_PRIVATE;
+   else {
+   fault->pfn = pfn;
+   if (slot->flags & KVM_MEM_READONLY)
+   fault->map_writable = false;
+   else
+   fault->map_writable = true;
+
+   if (order == 0)
+   fault->max_level = PG_LEVEL_4K;
+   *is_private_pfn = true;
+   *r = RET_PF_FIXED;
+   return true;
+   }
+   } else {
+   if (pfn < 0)
+   return false;
+
+   kvm_memfd_put_pfn(pfn);
+   mem_convert_type = KVM_EXIT_MEM_MAP_SHARED;
+   }
+
+   vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+   vcpu->run->mem.type = mem_convert_type;
+   vcpu->run->mem.u.map.gpa = fault->gfn << PAGE_SHIFT;
+   vcpu->run->mem.u.map.size = PAGE_SIZE;
+   fault->pfn = -1;
+   *r = -1;
+   return true;
+}
+
+static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault 
*fault,
+   bool *is_private_pfn, int *r)
 {
struct kvm_memory_slot *slot = fault->slot;
bool async;
@@ -3984,6 +4039,10 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, 
struct kvm_page_fault *fault,
}
}
 
+   if (kvm_slot_is_private(slot) &&
+   kvm_faultin_pfn_private(vcpu, fault, is_private_pfn, r))
+   return *r == RET_PF_FIXED ? false : true;
+
async = false;
fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
  fault->write, &fault->map_writable,
@@ -4044,6 +4103,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, 
struct kvm_page_fault *fault
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
 
unsigned long mmu_seq;
+   bool is_private_pfn = false;
int r;
 
fault->gfn = fault->addr >> PAGE_SHIFT;
@@ -4063,7 +4123,7 @@ static int direct_page_fault(struct kvm

Re: [PATCH v4 18/19] iotests.py: implement unsupported_imgopts

2021-12-23 Thread Hanna Reitz


On 03.12.21 14:07, Vladimir Sementsov-Ogievskiy wrote:

We have added support for some additional IMGOPTS in python iotests like
in bash iotests. Similarly to bash iotests, we want a way to skip some
tests which can't work with specific IMGOPTS.

Globally for python iotests we now don't support things like
'data_file=$TEST_IMG.ext_data_file' in IMGOPTS, so forbid this
globally in iotests.py.

Suggested-by: Hanna Reitz 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/iotests.py | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)


Reviewed-by: Hanna Reitz 

Can we move this and the next patch before patch 2, though? Otherwise, 
the tests adjusted in the next patch will be broken after patch 2 (when 
given those unsupported options).  The move seems trivial, just 
wondering whether you know of anything that would prohibit this.

[PATCH v3 kvm/queue 16/16] KVM: Register/unregister private memory slot to memfd

2021-12-23 Thread Chao Peng

Expose the KVM_MEM_PRIVATE flag and register/unregister a private memory
slot to memfd when userspace sets the flag.

KVM_MEM_PRIVATE is disallowed by default, but architecture code can
turn it on by implementing kvm_arch_private_memory_supported().
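
The gating can be sketched as a mask check. In this toy version the
arch hook is replaced by a boolean parameter, and the TOY_MEM_PRIVATE
bit value is an assumption made for illustration; the function name is
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MEM_LOG_DIRTY_PAGES (1u << 0)
#define TOY_MEM_READONLY        (1u << 1)
#define TOY_MEM_PRIVATE         (1u << 2) /* assumed bit, illustration only */

/*
 * Reject any flag outside the valid set. The private flag joins the
 * valid set only when the architecture reports support for it.
 */
static int toy_check_region_flags(uint32_t flags, bool arch_supports_private)
{
    uint32_t valid_flags = TOY_MEM_LOG_DIRTY_PAGES | TOY_MEM_READONLY;

    if (arch_supports_private)
        valid_flags |= TOY_MEM_PRIVATE;

    return (flags & ~valid_flags) ? -EINVAL : 0;
}
```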

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 34 --
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fabab3b77d57..5173c52e70d4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1229,6 +1229,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu 
*vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+bool kvm_arch_private_memory_supported(struct kvm *kvm);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cf8dcb3b8c7f..1caebded52c4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1514,10 +1514,19 @@ static void kvm_replace_memslot(struct kvm *kvm,
}
 }
 
-static int check_memory_region_flags(const struct 
kvm_userspace_memory_region_ext *mem)
+bool __weak kvm_arch_private_memory_supported(struct kvm *kvm)
+{
+   return false;
+}
+
+static int check_memory_region_flags(struct kvm *kvm,
+   const struct kvm_userspace_memory_region_ext *mem)
 {
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
+   if (kvm_arch_private_memory_supported(kvm))
+   valid_flags |= KVM_MEM_PRIVATE;
+
 #ifdef __KVM_HAVE_READONLY_MEM
valid_flags |= KVM_MEM_READONLY;
 #endif
@@ -1756,6 +1765,8 @@ static void kvm_delete_memslot(struct kvm *kvm,
   struct kvm_memory_slot *old,
   struct kvm_memory_slot *invalid_slot)
 {
+   if (old->flags & KVM_MEM_PRIVATE)
+   kvm_memfd_unregister(old);
/*
 * Remove the old memslot (in the inactive memslots) by passing NULL as
 * the "new" slot, and for the invalid version in the active slots.
@@ -1836,6 +1847,14 @@ static int kvm_set_memslot(struct kvm *kvm,
kvm_invalidate_memslot(kvm, old, invalid_slot);
}
 
+   if (new->flags & KVM_MEM_PRIVATE && change == KVM_MR_CREATE) {
+   r = kvm_memfd_register(kvm, new);
+   if (r) {
+   mutex_unlock(&kvm->slots_arch_lock);
+   return r;
+   }
+   }
+
r = kvm_prepare_memory_region(kvm, old, new, change);
if (r) {
/*
@@ -1850,6 +1869,10 @@ static int kvm_set_memslot(struct kvm *kvm,
} else {
mutex_unlock(&kvm->slots_arch_lock);
}
+
+   if (new->flags & KVM_MEM_PRIVATE && change == KVM_MR_CREATE)
+   kvm_memfd_unregister(new);
+
return r;
}
 
@@ -1917,7 +1940,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
int as_id, id;
int r;
 
-   r = check_memory_region_flags(mem);
+   r = check_memory_region_flags(kvm, mem);
if (r)
return r;
 
@@ -1974,6 +1997,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
if ((kvm->nr_memslot_pages + npages) < kvm->nr_memslot_pages)
return -EINVAL;
} else { /* Modify an existing slot. */
+   /* Private memslots are immutable, they can only be deleted. */
+   if (mem->flags & KVM_MEM_PRIVATE)
+   return -EINVAL;
+
if ((mem->userspace_addr != old->userspace_addr) ||
(npages != old->npages) ||
((mem->flags ^ old->flags) & KVM_MEM_READONLY))
@@ -2002,6 +2029,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
new->npages = npages;
new->flags = mem->flags;
new->userspace_addr = mem->userspace_addr;
+   new->fd = mem->fd;
+   new->file = NULL;
+   new->ofs = mem->ofs;
 
r = kvm_set_memslot(kvm, old, new, change);
if (r)
-- 
2.17.1

Re: [PATCH v4 19/19] iotests: specify some unsupported_imgopts for python iotests

2021-12-23 Thread Hanna Reitz


On 03.12.21 14:07, Vladimir Sementsov-Ogievskiy wrote:

We support IMGOPTS for python iotests now. Still, a lot of tests are
unprepared for common IMGOPTS that are used with bash iotests. So we
should define corresponding unsupported_imgopts.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/044 | 3 ++-
  tests/qemu-iotests/065 | 3 ++-
  tests/qemu-iotests/163 | 3 ++-
  tests/qemu-iotests/165 | 3 ++-
  tests/qemu-iotests/196 | 3 ++-
  tests/qemu-iotests/242 | 3 ++-
  tests/qemu-iotests/246 | 3 ++-
  tests/qemu-iotests/254 | 3 ++-
  tests/qemu-iotests/260 | 4 ++--
  tests/qemu-iotests/274 | 3 ++-
  tests/qemu-iotests/281 | 3 ++-
  tests/qemu-iotests/303 | 3 ++-
  tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 3 ++-
  tests/qemu-iotests/tests/migrate-bitmaps-test  | 3 ++-
  tests/qemu-iotests/tests/migrate-during-backup | 3 ++-
  tests/qemu-iotests/tests/remove-bitmap-from-backing| 3 ++-
  16 files changed, 32 insertions(+), 17 deletions(-)


A few of these tests look like they could be made to support
refcount_bits if we filtered qemu-img info output accordingly, but I
don’t mind just marking the option as unsupported, so I’m good with your
approach.



diff --git a/tests/qemu-iotests/044 b/tests/qemu-iotests/044
index 714329eb16..a5ee9a7ded 100755
--- a/tests/qemu-iotests/044
+++ b/tests/qemu-iotests/044
@@ -118,4 +118,5 @@ class TestRefcountTableGrowth(iotests.QMPTestCase):
  if __name__ == '__main__':
  iotests.activate_logging()
  iotests.main(supported_fmts=['qcow2'],
- supported_protocols=['file'])
+ supported_protocols=['file'],
+ unsupported_imgopts=['refcount_bits'])
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 4b3c5c6c8c..f7c1b68dad 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -139,4 +139,5 @@ TestQMP = None
  
  if __name__ == '__main__':

  iotests.main(supported_fmts=['qcow2'],
- supported_protocols=['file'])
+ supported_protocols=['file'],
+ unsupported_imgopts=['refcount_bits'])
diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
index dedce8ef43..0b00df519c 100755
--- a/tests/qemu-iotests/163
+++ b/tests/qemu-iotests/163
@@ -169,4 +169,5 @@ ShrinkBaseClass = None
  
  if __name__ == '__main__':

  iotests.main(supported_fmts=['raw', 'qcow2'],
- supported_protocols=['file'])
+ supported_protocols=['file'],
+ unsupported_imgopts=['compat=0.10'])


Works for my case (I use -o compat=0.10), but compat=v2 is also allowed.

For cases that don’t support anything but refcount_bits=16, you already 
disallow specifying any refcount_bits value, even refcount_bits=16 
(which would work fine in most cases, I believe). Perhaps we should then 
also just disallow any compat option instead of compat=0.10 specifically?


[...]


diff --git a/tests/qemu-iotests/tests/migrate-during-backup 
b/tests/qemu-iotests/tests/migrate-during-backup
index 34103229ee..12cc4dde2e 100755
--- a/tests/qemu-iotests/tests/migrate-during-backup
+++ b/tests/qemu-iotests/tests/migrate-during-backup
@@ -94,4 +94,5 @@ class TestMigrateDuringBackup(iotests.QMPTestCase):
  
  if __name__ == '__main__':

  iotests.main(supported_fmts=['qcow2'],
- supported_protocols=['file'])
+ supported_protocols=['file'],
+ unsupported_imgopts=['compat=0.10'])


It seems to me like this test can handle compat=0.10 just fine, though.

Hanna

[PATCH v3 kvm/queue 00/16] KVM: mm: fd-based approach for supporting KVM guest private memory

2021-12-23 Thread Chao Peng

This is the third version of this series, which tries to implement
fd-based KVM guest private memory. Earlier this week I sent another v3
version at this link:

https://lore.kernel.org/linux-mm/20211222012223.ga22...@chaop.bj.intel.com/T/

That version is based on the latest TDX codebase. In contrast, the one
you are reading is the same code rebased onto the latest kvm/queue
branch at commit:

  c34c87a69727  KVM: x86: Update vPMCs when retiring branch instructions

Some changes were made to fit the kvm/queue branch, but the two
versions are generally the same code in logic.

There is also a difference in testing. In the previous one I tested the
new private memory feature with TDX, but in this rebased version I
cannot test the new feature because I lack TDX. I did run a simple
regression test on this new version.

Introduction
------------
In general, this patch series introduces an fd-based memslot which
provides guest memory through a memfd file descriptor fd[offset,size]
instead of hva/size. The fd can then be created from a supported memory
filesystem like tmpfs/hugetlbfs etc., which we refer to as the memory
backend. KVM and the memory backend exchange some callbacks when such a
memslot gets created. At runtime KVM will call into callbacks provided
by the backend to get the pfn with the fd+offset. The memory backend
will also call into KVM callbacks when userspace fallocates/punches a
hole on the fd, to notify KVM to map/unmap secondary MMU page tables.
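
As a concrete illustration of the fd+offset lookup, the page index
passed to the backend can be derived from the gfn the same way
kvm_memfd_get_pfn() does in patch 06. The standalone version below is
a sketch: the function name is hypothetical and 4 KiB pages are
assumed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/*
 * gfn:      guest frame number being faulted in
 * base_gfn: first gfn covered by the memslot
 * ofs:      byte offset of the memslot within the fd
 * Returns the page index into the fd.
 */
static uint64_t toy_gfn_to_fd_index(uint64_t gfn, uint64_t base_gfn,
                                    uint64_t ofs)
{
    assert(gfn >= base_gfn);
    return gfn - base_gfn + (ofs >> TOY_PAGE_SHIFT);
}
```

For example, gfn 0x105 in a slot starting at gfn 0x100 with a byte
offset of 0x2000 into the fd lands on page index 7.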

Compared to the existing hva-based memslot, this new type of memslot
allows guest memory to be unmapped from host userspace like QEMU and
even the kernel itself, thereby reducing the attack surface and
preventing userspace bugs.

Based on this fd-based memslot, we can build guest private memory that
is going to be used in confidential computing environments such as Intel
TDX and AMD SEV. When supported, the memory backend can provide more
enforcement on the fd, and KVM can use a single memslot to hold both the
private and shared parts of the guest memory.

Memfd/shmem extension
---------------------
Introduces a new MFD_INACCESSIBLE flag for memfd_create(); a file
created with this flag cannot be accessed via read(), write(), mmap(),
etc.

In addition, two sets of callbacks are introduced as new MEMFD_OPS:
  - memfd_falloc_notifier: memfd -> KVM notifier when memory gets
    allocated/invalidated through fallocate().
  - memfd_pfn_ops: kvm -> memfd to get a pfn with the fd+offset.

Memslot extension
-----------------
Add the private fd and the offset into the fd to the existing 'shared'
memslot so that both private and shared guest memory can live in one
single memslot. A page in the memslot is either private or shared. A
page is private only when it's already allocated in the backend fd; in
all other cases it's treated as shared. This includes pages already
mapped as shared as well as those that have not been mapped. This means
the memory backend is the place which tells the truth about which page
is private.

Private memory map/unmap and conversion
---------------------------------------
Userspace's map/unmap operations are done with fallocate() calls on the
backend fd.
  - map: default fallocate() with mode=0.
  - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
The map/unmap will trigger the memfd_falloc_notifier described above to let
KVM map/unmap the secondary MMU page tables.
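
For illustration, the map/unmap steps above can be sketched from the
userspace side as follows. This is only a sketch on a regular memfd: it does
not use MFD_INACCESSIBLE (which this series proposes), and the function name
and the "guest-ram-demo" label are made up for the example. Note that
fallocate(2) requires FALLOC_FL_PUNCH_HOLE to be ORed with
FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Allocate ("map") and then punch out ("unmap") the range [0, len) of a
 * freshly created memfd. Returns 0 on success and -1 on failure. The
 * st_blocks values seen after each step are stored in the out parameters.
 */
static int memfd_map_unmap_demo(off_t len, long *blocks_mapped,
                                long *blocks_unmapped)
{
    struct stat st;
    int ret = -1;
    int fd;

    assert(len > 0);

    fd = memfd_create("guest-ram-demo", MFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, len) < 0) {
        goto out;
    }

    /* map: default fallocate() with mode=0 */
    if (fallocate(fd, 0, 0, len) < 0 || fstat(fd, &st) < 0) {
        goto out;
    }
    *blocks_mapped = (long)st.st_blocks;

    /* unmap: punch a hole; KEEP_SIZE is mandatory with PUNCH_HOLE */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, len) < 0 ||
        fstat(fd, &st) < 0) {
        goto out;
    }
    *blocks_unmapped = (long)st.st_blocks;
    ret = 0;
out:
    close(fd);
    return ret;
}
```

With the series applied, each of the two fallocate() calls would be the point
at which the backend notifies KVM; on a plain memfd they only allocate and
free backing pages.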

Test
----
NOTE: below is the test for the previous TDX-based version. For this version
I only tested regular VM booting.

This code has been tested with the latest TDX code patches hosted at
(https://github.com/intel/tdx/tree/kvm-upstream) with minimal TDX
adaptation and QEMU support.

Example QEMU command line:
-object tdx-guest,id=tdx \
-object memory-backend-memfd-private,id=ram1,size=2G \
-machine q35,kvm-type=tdx,pic=no,kernel_irqchip=split,memory-encryption=tdx,memory-backend=ram1

Changelog
---------
v3:
  - Added locking protection when calling
invalidate_page_range/fallocate callbacks.
  - Changed memslot structure to keep using useraddr for shared memory.
  - Re-organized F_SEAL_INACCESSIBLE and MEMFD_OPS.
  - Added MFD_INACCESSIBLE flag to force F_SEAL_INACCESSIBLE.
  - Commit message improvement.
  - Many small fixes for comments from the last version.

Links of previous discussions
-----------------------------
[1] Original design proposal:
https://lkml.kernel.org/kvm/20210824005248.200037-1-sea...@google.com/
[2] Updated proposal and RFC patch v1:
https://lkml.kernel.org/linux-fsdevel/2021141352.26311-1-chao.p.p...@linux.intel.com/
[3] RFC patch v2:
https://x-lore.kernel.org/qemu-devel/2029134739.20218-1-chao.p.p...@linux.intel.com/

Chao Peng (14):
  mm/memfd: Introduce MFD_INACCESSIBLE flag
  KVM: Extend the memslot to support fd-based private memory
  KVM: Maintain ofs_tree for fast memslot lookup by file offset
  KVM: Implement fd-based memory using MEMFD_OPS interfaces
  KVM: Refactor hva based memory invalidation code
  KVM: Special handling for fd-based memory invalidation
  KVM: Split out common memory invalidation code
  KVM: Implement fd-based memory inva

Re: [PATCH] Supporting AST2600 HACE engine accumulative mode

2021-12-23 Thread Cédric Le Goater


[ Adding Klaus ]

On 12/22/21 03:22, Troy Lee wrote:

Accumulative mode supplies an initial state and appends padding bits at
the end of the hash stream.  However, the crypto library adds those padding
bits automatically, so strip them from the iov array.
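
To make the padding handling concrete, here is a small stand-alone sketch of
the idea (not the code from the patch): the last 8 bytes of a padded
SHA-1/SHA-2 stream hold the message length in bits, big-endian, so the
unpadded length can be recovered from them. The helper names are made up for
the example, and a portable byte-wise load is used instead of __bswap_64().
For SHA-384/512 the length field is 128 bits wide; reading only its low 64
bits is assumed to be enough here.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit big-endian value without depending on host byte order. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * buf holds a complete, already padded hash stream of total_len bytes.
 * Return the number of leading message bytes, i.e. the length with the
 * padding stripped. If the trailer does not describe a shorter message,
 * total_len is returned unchanged.
 */
static size_t unpadded_len(const uint8_t *buf, size_t total_len)
{
    uint64_t msg_bytes;

    if (total_len < 8) {
        return total_len;
    }
    msg_bytes = load_be64(buf + total_len - 8) / 8;
    return msg_bytes < total_len ? (size_t)msg_bytes : total_len;
}
```

For the three-byte message "abc" padded to one 64-byte SHA-256 block, the
trailer encodes 24 bits, so the helper reports 3 message bytes.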

Signed-off-by: Troy Lee 
---
  hw/misc/aspeed_hace.c | 30 --
  include/hw/misc/aspeed_hace.h |  1 +
  2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/misc/aspeed_hace.c b/hw/misc/aspeed_hace.c
index 10f00e65f4..7c1794d6d0 100644
--- a/hw/misc/aspeed_hace.c
+++ b/hw/misc/aspeed_hace.c
@@ -27,6 +27,7 @@
  
  #define R_HASH_SRC  (0x20 / 4)

  #define R_HASH_DEST (0x24 / 4)
+#define R_HASH_KEY_BUFF (0x28 / 4)
  #define R_HASH_SRC_LEN  (0x2c / 4)
  
  #define R_HASH_CMD  (0x30 / 4)

@@ -94,7 +95,10 @@ static int hash_algo_lookup(uint32_t reg)
  return -1;
  }
  
-static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode)

+static void do_hash_operation(AspeedHACEState *s,
+  int algo,
+  bool sg_mode,
+  bool acc_mode)
  {
  struct iovec iov[ASPEED_HACE_MAX_SG];
  g_autofree uint8_t *digest_buf;
@@ -103,6 +107,7 @@ static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode)
  
  if (sg_mode) {

  uint32_t len = 0;
+uint32_t total_len = 0;
  
  for (i = 0; !(len & SG_LIST_LEN_LAST); i++) {

  uint32_t addr, src;
@@ -127,6 +132,21 @@ static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode)
  plen = iov[i].iov_len;
  iov[i].iov_base = address_space_map(&s->dram_as, addr, &plen, false,
  MEMTXATTRS_UNSPECIFIED);
+
+total_len += plen;
+if (acc_mode && len & SG_LIST_LEN_LAST) {
+/*
+ * Read the message length in bit from last 64/128 bits
+ * and tear the padding bits from iov
+ */
+uint64_t stream_len;
+
+memcpy(&stream_len, iov[i].iov_base + iov[i].iov_len - 8, 8);
+stream_len = __bswap_64(stream_len) / 8;
+
+if (total_len > stream_len)
+iov[i].iov_len -= total_len - stream_len;
+}
  }
  } else {
  hwaddr len = s->regs[R_HASH_SRC_LEN];
@@ -210,6 +230,9 @@ static void aspeed_hace_write(void *opaque, hwaddr addr, uint64_t data,
  case R_HASH_DEST:
  data &= ahc->dest_mask;
  break;
+case R_HASH_KEY_BUFF:
+data &= ahc->key_mask;
+break;
  case R_HASH_SRC_LEN:
  data &= 0x0FFF;
  break;
@@ -234,7 +257,7 @@ static void aspeed_hace_write(void *opaque, hwaddr addr, uint64_t data,
  __func__, data & ahc->hash_mask);
  break;
  }
-do_hash_operation(s, algo, data & HASH_SG_EN);
+do_hash_operation(s, algo, data & HASH_SG_EN, data & HASH_DIGEST_ACCUM);
  
  if (data & HASH_IRQ_EN) {

  qemu_irq_raise(s->irq);
@@ -333,6 +356,7 @@ static void aspeed_ast2400_hace_class_init(ObjectClass *klass, void *data)
  
  ahc->src_mask = 0x0FFF;

  ahc->dest_mask = 0x0FF8;
+ahc->key_mask = 0x0FC0;
  ahc->hash_mask = 0x03ff; /* No SG or SHA512 modes */
  }
  
@@ -351,6 +375,7 @@ static void aspeed_ast2500_hace_class_init(ObjectClass *klass, void *data)
  
  ahc->src_mask = 0x3fff;

  ahc->dest_mask = 0x3ff8;
+ahc->key_mask = 0x3FC0;
  ahc->hash_mask = 0x03ff; /* No SG or SHA512 modes */
  }
  
@@ -369,6 +394,7 @@ static void aspeed_ast2600_hace_class_init(ObjectClass *klass, void *data)
  
  ahc->src_mask = 0x7FFF;

  ahc->dest_mask = 0x7FF8;
+ahc->key_mask = 0x7FF8;
  ahc->hash_mask = 0x00147FFF;
  }
  
diff --git a/include/hw/misc/aspeed_hace.h b/include/hw/misc/aspeed_hace.h

index 94d5ada95f..2242945eb4 100644
--- a/include/hw/misc/aspeed_hace.h
+++ b/include/hw/misc/aspeed_hace.h
@@ -37,6 +37,7 @@ struct AspeedHACEClass {
  
  uint32_t src_mask;

  uint32_t dest_mask;
+uint32_t key_mask;
  uint32_t hash_mask;
  };

[PATCH v3 kvm/queue 04/16] KVM: Extend the memslot to support fd-based private memory

2021-12-23 Thread Chao Peng

Extend the memslot definition to provide fd-based private memory support
by adding two new fields (fd/ofs). The memslot can then maintain memory
for both shared and private pages in a single memslot. Shared pages are
provided in the existing way, using the userspace_addr (hva) field and
get_user_pages(), while private pages are provided through the new
fields (fd/ofs). Since there is no 'hva' concept anymore for private
memory, we cannot call get_user_pages() to get a pfn; instead, we rely on
the newly introduced MEMFD_OPS callbacks to do the same job.

This new extension is indicated by a new flag KVM_MEM_PRIVATE.

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h | 10 ++
 include/uapi/linux/kvm.h | 12 
 2 files changed, 22 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f8ed799e8674..2cd35560c44b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -460,8 +460,18 @@ struct kvm_memory_slot {
u32 flags;
short id;
u16 as_id;
+   u32 fd;
+   struct file *file;
+   u64 ofs;
 };
 
+static inline bool kvm_slot_is_private(const struct kvm_memory_slot *slot)
+{
+   if (slot && (slot->flags & KVM_MEM_PRIVATE))
+   return true;
+   return false;
+}
+
 static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *slot)
 {
return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..41434322fa23 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -103,6 +103,17 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+struct kvm_userspace_memory_region_ext {
+   __u32 slot;
+   __u32 flags;
+   __u64 guest_phys_addr;
+   __u64 memory_size; /* bytes */
+   __u64 userspace_addr; /* hva */
+   __u64 ofs; /* offset into fd */
+   __u32 fd;
+   __u32 padding[5];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
  * other bits are reserved for kvm internal use which are defined in
@@ -110,6 +121,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
 #define KVM_MEM_READONLY        (1UL << 1)
+#define KVM_MEM_PRIVATE         (1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
-- 
2.17.1
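
As a reading aid, the layout of the proposed kvm_userspace_memory_region_ext
can be checked with a stand-alone mirror of the struct. The sketch below is
not the kernel header: the struct and macro names carry a "mirror" marker,
the helper is hypothetical, and the field order is copied from the uapi hunk
above. It shows that the struct is 64 bytes and that the first four fields
line up with the existing kvm_userspace_memory_region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-alone mirror of the proposed uapi struct (illustration only). */
struct region_ext_mirror {
    uint32_t slot;
    uint32_t flags;
    uint64_t guest_phys_addr;
    uint64_t memory_size;      /* bytes */
    uint64_t userspace_addr;   /* hva, used for the shared part */
    uint64_t ofs;              /* offset into fd, used for the private part */
    uint32_t fd;
    uint32_t padding[5];
};

#define MIRROR_KVM_MEM_PRIVATE (1UL << 2)

static_assert(sizeof(struct region_ext_mirror) == 64,
              "4 + 4 + 4 * 8 + 4 + 5 * 4 bytes");
static_assert(offsetof(struct region_ext_mirror, ofs) == 32,
              "new fields start right after the legacy layout");

/* Hypothetical helper: describe one slot with both shared and private memory. */
static struct region_ext_mirror
make_private_region(uint32_t slot, uint64_t gpa, uint64_t size,
                    uint64_t hva, uint32_t fd, uint64_t ofs)
{
    struct region_ext_mirror r = {
        .slot = slot,
        .flags = MIRROR_KVM_MEM_PRIVATE,
        .guest_phys_addr = gpa,
        .memory_size = size,
        .userspace_addr = hva,
        .ofs = ofs,
        .fd = fd,
    };

    return r;
}
```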

[PATCH v3 kvm/queue 07/16] KVM: Refactor hva based memory invalidation code

2021-12-23 Thread Chao Peng

The purpose of this patch is to let the fd-based memslot reuse the same
mmu_notifier-based guest memory invalidation code for private pages.

No functional changes except renaming 'hva' to the more neutral 'useraddr'
so that it can also cover the 'offset' in an fd that private pages live in.
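
A worked example of the address arithmetic may help here. The sketch below
mirrors the shape of useraddr_to_gfn_memslot() from this patch with
stand-alone types: the struct, the function name and the fixed 4 KiB page
shift are assumptions made for the example, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* Only the memslot fields that the conversion needs. */
struct demo_slot {
    uint64_t base_gfn;
    uint64_t userspace_addr;   /* base for shared pages (hva) */
    uint64_t ofs;              /* base for private pages (offset into fd) */
};

/*
 * Convert a "useraddr" to a gfn. The same arithmetic serves both cases;
 * only the base that is subtracted differs.
 */
static uint64_t demo_useraddr_to_gfn(uint64_t useraddr,
                                     const struct demo_slot *slot,
                                     bool addr_is_hva)
{
    uint64_t base = addr_is_hva ? slot->userspace_addr : slot->ofs;

    assert(useraddr >= base);
    return slot->base_gfn + ((useraddr - base) >> DEMO_PAGE_SHIFT);
}
```

With base_gfn 0x100, an hva three pages past userspace_addr maps to gfn
0x103, and an fd offset five pages past ofs maps to gfn 0x105.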

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |  8 --
 virt/kvm/kvm_main.c  | 55 ++--
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 21f8b1880723..07863ff855cd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1464,9 +1464,13 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 }
 
 static inline gfn_t
-hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
+useraddr_to_gfn_memslot(unsigned long useraddr, struct kvm_memory_slot *slot,
+   bool addr_is_hva)
 {
-   gfn_t gfn_offset = (hva - slot->userspace_addr) >> PAGE_SHIFT;
+   unsigned long useraddr_base = addr_is_hva ? slot->userspace_addr
+ : slot->ofs;
+
+   gfn_t gfn_offset = (useraddr - useraddr_base) >> PAGE_SHIFT;
 
return slot->base_gfn + gfn_offset;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 47e96d1eb233..b7a1c4d7eaaa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -486,16 +486,16 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
srcu_read_unlock(&kvm->srcu, idx);
 }
 
-typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
+typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
 typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
 unsigned long end);
 
-struct kvm_hva_range {
+struct kvm_useraddr_range {
unsigned long start;
unsigned long end;
pte_t pte;
-   hva_handler_t handler;
+   gfn_handler_t handler;
on_lock_fn_t on_lock;
bool flush_on_ret;
bool may_block;
@@ -515,13 +515,13 @@ static void kvm_null_fn(void)
 #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
 
 /* Iterate over each memslot intersecting [start, last] (inclusive) range */
-#define kvm_for_each_memslot_in_hva_range(node, slots, start, last) \
-   for (node = interval_tree_iter_first(&slots->hva_tree, start, last); \
+#define kvm_for_each_memslot_in_useraddr_range(node, tree, start, last) \
+   for (node = interval_tree_iter_first(tree, start, last); \
 node;   \
 node = interval_tree_iter_next(node, start, last))  \
 
-static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
- const struct kvm_hva_range *range)
+static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
+   const struct kvm_useraddr_range *range)
 {
bool ret = false, locked = false;
struct kvm_gfn_range gfn_range;
@@ -540,17 +540,19 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
idx = srcu_read_lock(&kvm->srcu);
 
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+   struct rb_root_cached *useraddr_tree;
struct interval_tree_node *node;
 
slots = __kvm_memslots(kvm, i);
-   kvm_for_each_memslot_in_hva_range(node, slots,
+   useraddr_tree = &slots->hva_tree;
+   kvm_for_each_memslot_in_useraddr_range(node, useraddr_tree,
  range->start, range->end - 1) {
-   unsigned long hva_start, hva_end;
+   unsigned long useraddr_start, useraddr_end;
 
 slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
-   hva_start = max(range->start, slot->userspace_addr);
-   hva_end = min(range->end, slot->userspace_addr +
- (slot->npages << PAGE_SHIFT));
+   useraddr_start = max(range->start, slot->userspace_addr);
+   useraddr_end = min(range->end, slot->userspace_addr +
+  (slot->npages << PAGE_SHIFT));
 
/*
 * To optimize for the likely case where the address
@@ -562,11 +564,14 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
gfn_range.may_block = range->may_block;
 
/*
-* {gfn(page) | page intersects with [hva_start, hva_end)} =
+* {gfn(page) | page intersects with [useraddr_start, useraddr_end)} =

[PATCH v3 kvm/queue 10/16] KVM: Implement fd-based memory invalidation

2021-12-23 Thread Chao Peng

KVM gets notified when userspace punches a hole in an fd which is used
for guest memory. KVM should invalidate the mapping in the secondary
MMU page tables. This is the same logic as MMU notifier invalidation
except that the fd-related information is carried around to indicate the
memory range. KVM can hence reuse most of the existing MMU notifier
invalidation code, including looping through the memslots and then
calling into kvm_unmap_gfn_range(), which should do whatever is needed for
fd-based memory unmapping (e.g. for private memory managed by TDX it
may need to call into SEAM-MODULE).
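
The diff below also factors the "active invalidation" accounting out of the
MMU notifier start/end paths so that the memfd path can share it. The pairing
rule is simple: every begin increments a counter, every end decrements it,
and only the end that brings the counter back to zero wakes the waiter. A
stand-alone sketch of that rule follows; it swaps the kernel's spinlock and
rcuwait for a C11 atomic counter and a boolean return value, and all names
are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_kvm {
    atomic_int active_invalidate_count;
};

/* Called when an invalidation starts; pairs with demo_invalidate_end(). */
static void demo_invalidate_begin(struct demo_kvm *kvm)
{
    atomic_fetch_add(&kvm->active_invalidate_count, 1);
}

/*
 * Called when an invalidation finishes. Returns true only for the call
 * that drops the counter to zero, i.e. when a waiter should be woken.
 */
static bool demo_invalidate_end(struct demo_kvm *kvm)
{
    int prev = atomic_fetch_sub(&kvm->active_invalidate_count, 1);

    assert(prev > 0);   /* an end without a matching begin is a bug */
    return prev == 1;
}
```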

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |  8 -
 virt/kvm/kvm_main.c  | 69 +++-
 virt/kvm/memfd.c |  2 ++
 3 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 07863ff855cd..be567925831b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -233,7 +233,7 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_MEMFD_OPS)
 struct kvm_gfn_range {
struct kvm_memory_slot *slot;
gfn_t start;
@@ -2012,4 +2012,10 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+#ifdef CONFIG_MEMFD_OPS
+int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode,
+  unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMFD_OPS */
+
+
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7b7530b1ea1e..f495c1a313bd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -468,7 +468,8 @@ void kvm_destroy_vcpus(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
 
-#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+#if defined(CONFIG_MEMFD_OPS) ||\
+   (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER))
 
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
@@ -595,6 +596,30 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
/* The notifiers are averse to booleans. :-( */
return (int)ret;
 }
+
+static void mn_active_invalidate_count_inc(struct kvm *kvm)
+{
+   spin_lock(&kvm->mn_invalidate_lock);
+   kvm->mn_active_invalidate_count++;
+   spin_unlock(&kvm->mn_invalidate_lock);
+
+}
+
+static void mn_active_invalidate_count_dec(struct kvm *kvm)
+{
+   bool wake;
+
+   spin_lock(&kvm->mn_invalidate_lock);
+   wake = (--kvm->mn_active_invalidate_count == 0);
+   spin_unlock(&kvm->mn_invalidate_lock);
+
+   /*
+* There can only be one waiter, since the wait happens under
+* slots_lock.
+*/
+   if (wake)
+   rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait);
+}
 #endif
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
@@ -732,9 +757,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 *
 * Pairs with the decrement in range_end().
 */
-   spin_lock(&kvm->mn_invalidate_lock);
-   kvm->mn_active_invalidate_count++;
-   spin_unlock(&kvm->mn_invalidate_lock);
+   mn_active_invalidate_count_inc(kvm);
 
__kvm_handle_useraddr_range(kvm, &useraddr_range);
 
@@ -773,21 +796,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
.may_block  = mmu_notifier_range_blockable(range),
.inode  = NULL,
};
-   bool wake;
 
__kvm_handle_useraddr_range(kvm, &useraddr_range);
 
/* Pairs with the increment in range_start(). */
-   spin_lock(&kvm->mn_invalidate_lock);
-   wake = (--kvm->mn_active_invalidate_count == 0);
-   spin_unlock(&kvm->mn_invalidate_lock);
-
-   /*
-* There can only be one waiter, since the wait happens under
-* slots_lock.
-*/
-   if (wake)
-   rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait);
+   mn_active_invalidate_count_dec(kvm);
 
BUG_ON(kvm->mmu_notifier_count < 0);
 }
@@ -872,6 +885,32 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
+#ifdef CONFIG_MEMFD_OPS
+int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode,
+  unsigned long start, unsigned long end)
+{
+   int ret;
+   const struct kvm_useraddr_range useraddr_range = {
+   .start  = start,
+   .end= end,
+   .pte= __pte(0),
+   .handler= kvm_unmap_gfn_range,
+   .on_lock= (void *)kvm_

Re: [PATCH v2] audio: Add sndio backend

2021-12-23 Thread Christian Schoenebeck

On Monday, 20 December 2021 16:41:31 CET Christian Schoenebeck wrote:
> On Friday, 17 December 2021 10:38:32 CET Alexandre Ratchov wrote:
> > sndio is the native API used by OpenBSD, although it has been ported to
> > other *BSD's and Linux (packages for Ubuntu, Debian, Void, Arch, etc.).
> > 
> > Signed-off-by: Brad Smith 
> > Signed-off-by: Alexandre Ratchov 
> > ---
> > 
> > Thank you for the reviews and all the comments. Here's a second diff
> > with all the suggested changes:
> > 
> > - Replace ISC license by SPDX-License-Identifier header
> > - Fix units (milli- vs micro-) in comment about SNDIO_LATENCY_US
> > - Drop outdated comment about the "size" argument of
> > sndio_get_buffer_out()
> > - Fix AUDIO_FORMAT_U32 handling (missing "break" statement)
> > - Set {read,write} methods to audio_generic_{read,write} (fixes crashes)
> > - Check if backend is enabled in sndio_poll_event()
> > - Use https://sndio.org in description
> > - Mark options as available after 7.0 release (instead of 6.2)
> > - Describe sndio-specific options (dev, latency) in qemu-options.hx
> > - Add myself as reviewer to MAINTAINERS
> > - Style fixes: no space after function names, use 4-space indent
> > - Don't use "return foo()" if foo() returns void
> > - Include backend to audio_drivers_priority[]
> > 
> > Tested on OpenBSD, works as expected!
> > 
> >  MAINTAINERS|   5 +
> >  audio/audio.c  |   1 +
> >  audio/audio_template.h |   2 +
> >  audio/meson.build  |   1 +
> >  audio/sndioaudio.c | 555 +
> >  meson.build|   9 +-
> >  meson_options.txt  |   4 +-
> >  qapi/audio.json|  25 +-
> >  qemu-options.hx|  16 ++
> >  tests/vm/freebsd   |   3 +
> >  10 files changed, 618 insertions(+), 3 deletions(-)
> >  create mode 100644 audio/sndioaudio.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 7543eb4d59..76bdad064f 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2307,6 +2307,7 @@ X: audio/jackaudio.c
> > 
> >  X: audio/ossaudio.c
> >  X: audio/paaudio.c
> >  X: audio/sdlaudio.c
> > 
> > +X: audio/sndio.c
> > 
> >  X: audio/spiceaudio.c
> >  F: qapi/audio.json
> > 
> > @@ -2349,6 +2350,10 @@ R: Thomas Huth 
> > 
> >  S: Odd Fixes
> >  F: audio/sdlaudio.c
> > 
> > +Sndio Audio backend
> > +R: Alexandre Ratchov 
> > +F: audio/sndio.c
> > +
> 
> Thanks Alexandre for volunteering as reviewer!
> 
> Gerd, would it be OK to set you as maintainer for now until new
> maintainer(s) adopt audio sections? Or should this start with "S: Orphan"
> instead?

Alexandre, if Gerd does not reply in a week or so, then please add "S: Orphan" 
to MAINTAINERS for now to make it clear that there is no maintainer for sndio 
yet to increase the chance for somebody to adopt it.

From Volker's response I assume you will be posting a v3 anyway.

If nobody takes care to queue your patch then let me know. Maybe I can push it 
through my queue this time, provided that there are enough reviews. I also saw 
your patch just by coincidence BTW, so please CC maintainers of affected files 
as suggested by Volker.

Best regards,
Christian Schoenebeck

Re: [PATCH] pci: Skip power-off reset when pending unplug

2021-12-23 Thread Michael S. Tsirkin

On Wed, Dec 22, 2021 at 04:10:07PM -0700, Alex Williamson wrote:
> On Wed, 22 Dec 2021 15:48:24 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Dec 22, 2021 at 12:08:09PM -0700, Alex Williamson wrote:
> > > On Tue, 21 Dec 2021 18:40:09 -0500
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Tue, Dec 21, 2021 at 09:36:56AM -0700, Alex Williamson wrote:  
> > > > > On Mon, 20 Dec 2021 18:03:56 -0500
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Mon, Dec 20, 2021 at 11:26:59AM -0700, Alex Williamson wrote:
> > > > > > > The below referenced commit introduced a change where devices under a
> > > > > > > root port slot are reset in response to removing power to the slot.
> > > > > > > This improves emulation relative to bare metal when the slot is powered
> > > > > > > off, but introduces an unnecessary step when devices under that slot
> > > > > > > are slated for removal.
> > > > > > > 
> > > > > > > In the case of an assigned device, there are mandatory delays
> > > > > > > associated with many device reset mechanisms which can stall the hot
> > > > > > > unplug operation.  Also, in cases where the unplug request is triggered
> > > > > > > via a release operation of the host driver, internal device locking in
> > > > > > > the host kernel may result in a failure of the device reset mechanism,
> > > > > > > which generates unnecessary log warnings.
> > > > > > > 
> > > > > > > Skip the reset for devices that are slated for unplug.
> > > > > > > 
> > > > > > > Cc: qemu-sta...@nongnu.org
> > > > > > > Fixes: d5daff7d3126 ("pcie: implement slot power control for pcie root ports")
> > > > > > > Signed-off-by: Alex Williamson
> > > > > > 
> > > > > > I am not sure this is safe. IIUC pending_deleted_event
> > > > > > is normally set after host admin requested device removal,
> > > > > > while the reset could be triggered by guest for its own reasons
> > > > > > such as suspend or driver reload.
> > > > > 
> > > > > Right, the case where I mention that we get the warning looks exactly
> > > > > like the admin doing a device eject, it calls qdev_unplug().  I'm not
> > > > > trying to prevent arbitrary guest resets of the device, in fact there
> > > > > are cases where the guest really should be able to reset the device,
> > > > > nested assignment in addition to the cases you mention.  Gerd noted
> > > > > > that this was an unintended side effect of the referenced patch to
> > > > > > reset devices that are imminently being removed.
> > > > > 
> > > > > > Looking at this some more, I am not sure I understand the
> > > > > > issue completely.
> > > > > > We have:
> > > > > > 
> > > > > > > if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
> > > > > > > (val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF &&
> > > > > > > (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
> > > > > > > (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
> > > > > > pcie_cap_slot_do_unplug(dev);
> > > > > > }
> > > > > > pcie_cap_update_power(dev);
> > > > > > 
> > > > > > so device unplug triggers first, reset follows and by that time
> > > > > > there should be no devices under the bus, if there are then
> > > > > > it's because guest did not clear the power indicator.
> > > > > 
> > > > > Note that the unplug only triggers here if the Power Indicator Control
> > > > > is OFF, I see writes to SLTCTL in the following order:
> > > > > 
> > > > >  01f1 - > 02f1 -> 06f1 -> 07f1
> > > > > 
> > > > > So PIC changes to BLINK, then PCC changes the slot to OFF (this
> > > > > triggers the reset), then PIC changes to OFF triggering the unplug.
> > > > > 
> > > > > > The unnecessary reset that occurs here is universal.  Should the
> > > > > > unplug be occurring when:
> > > > > 
> > > > >   (val & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_ON
> > > > > 
> > > > > ?
> > > > 
> > > > well blinking generally means "do not remove yet".  
> > > 
> > > Blinking indicates that the slot is in a transition phase,  
> > 
> > Well the spec seems to state that blinking indicates it's waiting
> > to see user does not change his/her mind by pressing the
> > button again.
> 
> We're dealing with the Power Indicator, not the Attention Indicator
> here.
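
For anyone following along, the SLTCTL write sequence quoted above
(01f1 -> 02f1 -> 06f1 -> 07f1) can be decoded with a few lines of C. This is
a stand-alone sketch, not QEMU code: the bit values are the ones from Linux's
pci_regs.h, the first predicate is the condition quoted earlier in the
thread, and the function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control / Slot Status bits, values as in Linux's pci_regs.h. */
#define PCI_EXP_SLTCTL_PIC_ON    0x0100  /* Power Indicator on */
#define PCI_EXP_SLTCTL_PIC_BLINK 0x0200  /* Power Indicator blinking */
#define PCI_EXP_SLTCTL_PIC_OFF   0x0300  /* Power Indicator off */
#define PCI_EXP_SLTCTL_PCC       0x0400  /* Power Controller Control, 1 = off */
#define PCI_EXP_SLTSTA_PDS       0x0040  /* Presence Detect State */

/* Same predicate as the condition guarding pcie_cap_slot_do_unplug(). */
static bool unplug_condition(uint16_t sltsta, uint16_t old_slt_ctl,
                             uint16_t val)
{
    return (sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
           (val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF &&
           (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
            (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF);
}

/* True when a write newly sets PCC, i.e. when slot power is removed. */
static bool power_removed(uint16_t old_slt_ctl, uint16_t val)
{
    return !(old_slt_ctl & PCI_EXP_SLTCTL_PCC) && (val & PCI_EXP_SLTCTL_PCC);
}
```

Stepping through the sequence with a device present: 01f1 -> 02f1 only moves
the Power Indicator from on to blink, 02f1 -> 06f1 sets PCC (power removed,
which is where the reset happens), and only 06f1 -> 07f1 satisfies the unplug
condition.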

Let's make sure we are talking about the same thing here:


The Attention Indicator, which must be yellow or amber in color, indicates that
an operational problem exists or that the hot-plug slot is being identified so
that a human operator can locate it easily.

and

Attention Indicator Blinking
A blinking Attention Indicator indicates that system software is identifying
this slot for a human operator to find. This behavior is controlled by a user
(for example, from a software user interface or management tool).


On the other ha

Re: [PATCH] acpi: validate hotplug selector on access

2021-12-23 Thread Michael S. Tsirkin

On Thu, Dec 23, 2021 at 10:58:14AM +0100, Mauro Matteo Cascella wrote:
> Hi,
> 
> On Wed, Dec 22, 2021 at 9:52 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Dec 22, 2021 at 09:27:51PM +0100, Philippe Mathieu-Daudé wrote:
> > > On Wed, Dec 22, 2021 at 9:20 PM Michael S. Tsirkin wrote:
> > > > On Wed, Dec 22, 2021 at 08:19:41PM +0100, Philippe Mathieu-Daudé wrote:
> > > > > +Mauro & Alex
> > > > >
> > > > > On 12/21/21 15:48, Michael S. Tsirkin wrote:
> > > > > > When bus is looked up on a pci write, we didn't
> > > > > > validate that the lookup succeeded.
> > > > > > Fuzzers thus can trigger QEMU crash by dereferencing the NULL
> > > > > > bus pointer.
> > > > > >
> > > > > > Fixes: b32bd763a1 ("pci: introduce acpi-index property for PCI device")
> > > > > > Cc: "Igor Mammedov" 
> > > > > > Fixes: https://gitlab.com/qemu-project/qemu/-/issues/770
> > > > > > Signed-off-by: Michael S. Tsirkin 
> > > > >
> > > > > It seems this problem is important enough to get a CVE assigned.
> > > >
> > > > Guest root can crash guest.
> > > > I don't see why we would assign a CVE.
> > >
> > > Well thinking about downstream distributions, if there is a CVE assigned,
> > > it helps them to have it written in the commit. Maybe I am mistaken.
> > >
> > > Unrelated but it seems there is a coordination problem with the
> > > qemu-security@ list,
> > > if this isn't a security issue, why was a CVE requested?
> >
> > Right.  I don't think a privileged user crashing VM warrants a CVE,
> > it can just halt a CPU or whatever. Just cancel the CVE request pls.
> 
> While I agree with you that this is kind of borderline and I expressed
> similar concerns in the past, I was told that:
> 
> 1) root guest users are not necessarily trustworthy (from the host perspective).
> 2) NULL pointer deref and similar issues caused by an
> ill-handled/error condition are CVE worthy, even if triggered by root.
> 3) In other cases, DoS triggered by root is not a security issue
> because it's an expected behavior and not an ill-handled/error
> condition (think of assert failures, for example).
> 
> In other words, "ill-handled condition" is the crucial factor that
> makes a bug CVE worthy or not.

I guess the point is that a downstream might have a slightly different
code path where it would be more serious ...
OK then, not a big deal for me. So what's the CVE # then?

> +Prasad, can you shed some light on this? Is my understanding correct?
> 
> Also, please note that we regularly get CVE requests for bugs like
> this and many CVEs have been assigned in the past. Of course that
> doesn't mean we can't change things going forward, but I think we
> should make it clear (probably here:
> https://www.qemu.org/docs/master/system/security.html) that these
> kinds of bugs are not eligible for CVE assignment.


That would be good, yes.

> > > > > Mauro, please update us when you get the CVE number.
> > > > > Michael, please amend the CVE number before committing the fix.
> > > > >
> > > > > FWIW Paolo asked every fuzzed bug reproducer to be committed
> > > > > as qtest, see tests/qtest/fuzz*c. Alex has a way to generate
> > > > > reproducer in plain C.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Phil.
> > > >
> >
> 
> -- 
> Mauro Matteo Cascella
> Red Hat Product Security
> PGP-Key ID: BB3410B0
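
For context, the class of bug discussed in this thread (a lookup result that
is dereferenced without checking that the lookup succeeded) can be shown in a
few lines. The sketch below is not the QEMU code: the struct, the selector
field and both functions are hypothetical and only illustrate validating the
selector before touching the bus.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bus hotplug state, keyed by a guest-written selector. */
struct demo_bus {
    uint32_t bsel;
    uint32_t eject_mask;
};

/* Returns the bus matching sel, or NULL if the guest wrote a bogus selector. */
static struct demo_bus *demo_find_bus(struct demo_bus *buses, size_t n,
                                      uint32_t sel)
{
    for (size_t i = 0; i < n; i++) {
        if (buses[i].bsel == sel) {
            return &buses[i];
        }
    }
    return NULL;
}

/* Guest write handler: returns 0 if applied, -1 if the write was ignored. */
static int demo_hotplug_write(struct demo_bus *buses, size_t n,
                              uint32_t sel, uint32_t data)
{
    struct demo_bus *bus = demo_find_bus(buses, n, sel);

    if (!bus) {
        return -1;   /* invalid selector: ignore instead of dereferencing */
    }
    bus->eject_mask |= data;
    return 0;
}
```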

Re: [PATCH v1 1/2] hw/misc: Implementating dummy AST2600 I3C model

2021-12-23 Thread Cédric Le Goater




Hello,

On 12/22/21 10:23, Troy Lee wrote:

Introduce a dummy AST2600 I3C model.

The Aspeed 2600 SDK enables I3C support by default.  The I3C driver will try
to reset the device controller and set it up through the device address table
register.  This dummy model responds to these registers with the default
values listed in the ast2600v10 datasheet, chapter 54.2.  If the device address
table register isn't set correctly, it will cause a guest kernel panic due to
a reference to an invalid address.


Overall looks good. Some comments,



Signed-off-by: Troy Lee 
---
  hw/misc/aspeed_i3c.c | 258 +++
  hw/misc/meson.build  |   1 +
  include/hw/misc/aspeed_i3c.h |  30 
  3 files changed, 289 insertions(+)
  create mode 100644 hw/misc/aspeed_i3c.c
  create mode 100644 include/hw/misc/aspeed_i3c.h

diff --git a/hw/misc/aspeed_i3c.c b/hw/misc/aspeed_i3c.c
new file mode 100644
index 00..9d2bda203e
--- /dev/null
+++ b/hw/misc/aspeed_i3c.c
@@ -0,0 +1,258 @@
+/*
+ * ASPEED I3C Controller
+ *
+ * Copyright (C) 2021 ASPEED Technology Inc.
+ *
+ * This code is licensed under the GPL version 2 or later.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "hw/misc/aspeed_i3c.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+
+/* I3C Controller Registers */
+#define R_I3CG_REG0(x)  (((x * 0x10) + 0x10) / 4)
+#define  I3CG_REG0_SDA_PULLUP_EN_MASK   GENMASK(29, 28)


GENMASK() is a macro defined in the FSI model which is not upstream.
There are other ways to define bitfield masks in QEMU. Please take a
look at include/hw/registerfields.h.
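
To illustrate the difference, here is a stand-alone sketch of describing a
field by shift and length instead of by high and low bit, which is the style
that include/hw/registerfields.h uses as far as I can tell. The DEMO_* macros
and helpers below are local stand-ins written for this example, not the QEMU
macros; the shift/length pairs are derived from the GENMASK() arguments in
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a field of 'length' bits (1..32) starting at bit 'shift'. */
#define DEMO_FIELD_MASK(shift, length) \
    ((uint32_t)((~0ULL >> (64 - (length))) << (shift)))

/* I3CG_REG1 fields as (shift, length) instead of GENMASK(high, low). */
#define DEMO_ACT_MODE_SHIFT      2   /* GENMASK(3, 2) */
#define DEMO_ACT_MODE_LENGTH     2
#define DEMO_PENDING_INT_SHIFT   4   /* GENMASK(7, 4) */
#define DEMO_PENDING_INT_LENGTH  4
#define DEMO_SA_SHIFT            8   /* GENMASK(14, 8) */
#define DEMO_SA_LENGTH           7
#define DEMO_INST_ID_SHIFT       16  /* GENMASK(19, 16) */
#define DEMO_INST_ID_LENGTH      4

/* Extract a field from a 32-bit register value. */
static inline uint32_t demo_field_ex32(uint32_t reg, unsigned shift,
                                       unsigned length)
{
    assert(length >= 1 && length <= 32 && shift + length <= 32);
    return (reg & DEMO_FIELD_MASK(shift, length)) >> shift;
}

/* Deposit 'val' into a field, leaving the other bits untouched. */
static inline uint32_t demo_field_dp32(uint32_t reg, unsigned shift,
                                       unsigned length, uint32_t val)
{
    uint32_t mask;

    assert(length >= 1 && length <= 32 && shift + length <= 32);
    mask = DEMO_FIELD_MASK(shift, length);
    return (reg & ~mask) | ((val << shift) & mask);
}
```

With this style there is no need for a separate GENMASK() definition: the
mask follows from the shift and length of each field.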



+#define  I3CG_REG0_SDA_PULLUP_EN_2K     BIT(28)
+#define  I3CG_REG0_SDA_PULLUP_EN_750    BIT(29)
+#define  I3CG_REG0_SDA_PULLUP_EN_545    (BIT(29) | BIT(28))
+
+#define R_I3CG_REG1(x)                  (((x * 0x10) + 0x14) / 4)
+#define  I3CG_REG1_I2C_MODE             BIT(0)
+#define  I3CG_REG1_TEST_MODE            BIT(1)
+#define  I3CG_REG1_ACT_MODE_MASK        GENMASK(3, 2)
+#define  I3CG_REG1_ACT_MODE(x)          (((x) << 2) & I3CG_REG1_ACT_MODE_MASK)
+#define  I3CG_REG1_PENDING_INT_MASK     GENMASK(7, 4)
+#define  I3CG_REG1_PENDING_INT(x)       (((x) << 4) & I3CG_REG1_PENDING_INT_MASK)
+#define  I3CG_REG1_SA_MASK              GENMASK(14, 8)
+#define  I3CG_REG1_SA(x)                (((x) << 8) & I3CG_REG1_SA_MASK)
+#define  I3CG_REG1_SA_EN                BIT(15)
+#define  I3CG_REG1_INST_ID_MASK         GENMASK(19, 16)
+#define  I3CG_REG1_INST_ID(x)           (((x) << 16) & I3CG_REG1_INST_ID_MASK)
+
+/* I3C Device Registers */
+#define R_DEVICE_CTRL                   (0x00 / 4)
+#define R_DEVICE_ADDR                   (0x04 / 4)
+#define R_HW_CAPABILITY                 (0x08 / 4)
+#define R_COMMAND_QUEUE_PORT            (0x0c / 4)
+#define R_RESPONSE_QUEUE_PORT           (0x10 / 4)
+#define R_RX_TX_DATA_PORT               (0x14 / 4)
+#define R_IBI_QUEUE_STATUS              (0x18 / 4)
+#define R_IBI_QUEUE_DATA                (0x18 / 4)
+#define R_QUEUE_THLD_CTRL               (0x1c / 4)
+#define R_DATA_BUFFER_THLD_CTRL         (0x20 / 4)
+#define R_IBI_QUEUE_CTRL                (0x24 / 4)
+#define R_IBI_MR_REQ_REJECT             (0x2c / 4)
+#define R_IBI_SIR_REQ_REJECT            (0x30 / 4)
+#define R_RESET_CTRL                    (0x34 / 4)
+#define R_SLV_EVENT_CTRL                (0x38 / 4)
+#define R_INTR_STATUS                   (0x3c / 4)
+#define R_INTR_STATUS_EN                (0x40 / 4)
+#define R_INTR_SIGNAL_EN                (0x44 / 4)
+#define R_INTR_FORCE                    (0x48 / 4)
+#define R_QUEUE_STATUS_LEVEL            (0x4c / 4)
+#define R_DATA_BUFFER_STATUS_LEVEL      (0x50 / 4)
+#define R_PRESENT_STATE                 (0x54 / 4)
+#define R_CCC_DEVICE_STATUS             (0x58 / 4)
+#define R_DEVICE_ADDR_TABLE_POINTER     (0x5c / 4)
+#define  DEVICE_ADDR_TABLE_DEPTH(x)     (((x) & GENMASK(31, 16)) >> 16)
+#define  DEVICE_ADDR_TABLE_ADDR(x)      ((x) & GENMASK(7, 0))
+#define R_DEV_CHAR_TABLE_POINTER        (0x60 / 4)
+#define R_VENDOR_SPECIFIC_REG_POINTER   (0x6c / 4)
+#define R_SLV_MIPI_PID_VALUE            (0x70 / 4)
+#define R_SLV_PID_VALUE                 (0x74 / 4)
+#define R_SLV_CHAR_CTRL                 (0x78 / 4)
+#define R_SLV_MAX_LEN                   (0x7c / 4)
+#define R_MAX_READ_TURNAROUND           (0x80 / 4)
+#define R_MAX_DATA_SPEED                (0x84 / 4)
+#define R_SLV_DEBUG_STATUS              (0x88 / 4)
+#define R_SLV_INTR_REQ                  (0x8c / 4)
+#define R_DEVICE_CTRL_EXTENDED          (0xb0 / 4)
+#define R_SCL_I3C_OD_TIMING             (0xb4 / 4)
+#define R_SCL_I3C_PP_TIMING             (0xb8 / 4)
+#define R_SCL_I2C_FM_TIMING             (0xbc / 4)
+#define R_SCL_I2C_FMP_TIMING            (0xc0 / 4)
+#define R_SCL_EXT_LCNT_TIMING           (0xc8 / 4)
+#define R_SCL_EXT_TERMN_LCNT_TIMING     (0xcc / 4)
+#define R_BUS_FREE_TIMING               (0xd4 / 4)
+#define R_BUS_IDLE_TIMIN

Re: [PATCH v1 2/2] hw/arm/aspeed_ast2600: create i3c instance

2021-12-23 Thread Cédric Le Goater


On 12/22/21 10:23, Troy Lee wrote:

This patch adds an I3C instance to the AST2600 SoC.

Signed-off-by: Troy Lee 


Looks good but it is based on the QEMU aspeed branch for OpenBMC.
You should rebase on upstream.

Thanks,

C.


---
  hw/arm/aspeed_ast2600.c | 12 
  include/hw/arm/aspeed_soc.h |  3 +++
  2 files changed, 15 insertions(+)

diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index f2fef9d706..219b025bc2 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -63,6 +63,7 @@ static const hwaddr aspeed_soc_ast2600_memmap[] = {
  [ASPEED_DEV_VUART] = 0x1E787000,
  [ASPEED_DEV_FSI1]  = 0x1E79B000,
  [ASPEED_DEV_FSI2]  = 0x1E79B100,
+[ASPEED_DEV_I3C]   = 0x1E7A,
  [ASPEED_DEV_SDRAM] = 0x8000,
  };
  
@@ -112,6 +113,7 @@ static const int aspeed_soc_ast2600_irqmap[] = {
  [ASPEED_DEV_FSI1]  = 100,
  [ASPEED_DEV_FSI2]  = 101,
  [ASPEED_DEV_DP]= 62,
+[ASPEED_DEV_I3C]   = 102,   /* 102 -> 107 */
  };
  
  static qemu_irq aspeed_soc_get_irq(AspeedSoCState *s, int ctrl)
@@ -230,6 +232,8 @@ static void aspeed_soc_ast2600_init(Object *obj)
  
  object_initialize_child(obj, "pwm", &s->pwm, TYPE_ASPEED_PWM);
  
+object_initialize_child(obj, "i3c", &s->i3c, TYPE_ASPEED_I3C);
+
  object_initialize_child(obj, "fsi[*]", &s->fsi[0], TYPE_ASPEED_APB2OPB);
  }
  
@@ -542,6 +546,14 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp)
  sysbus_connect_irq(SYS_BUS_DEVICE(&s->pwm), 0,
 aspeed_soc_get_irq(s, ASPEED_DEV_PWM));
  
+/* I3C */
+if (!sysbus_realize(SYS_BUS_DEVICE(&s->i3c), errp)) {
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->i3c), 0, sc->memmap[ASPEED_DEV_I3C]);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->i3c), 0,
+   aspeed_soc_get_irq(s, ASPEED_DEV_I3C));
+
  /* FSI */
  if (!sysbus_realize(SYS_BUS_DEVICE(&s->fsi[0]), errp)) {
  return;
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index 0db200d813..0c950fab3c 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -21,6 +21,7 @@
  #include "hw/timer/aspeed_timer.h"
  #include "hw/rtc/aspeed_rtc.h"
  #include "hw/i2c/aspeed_i2c.h"
+#include "hw/misc/aspeed_i3c.h"
  #include "hw/ssi/aspeed_smc.h"
  #include "hw/misc/aspeed_hace.h"
  #include "hw/watchdog/wdt_aspeed.h"
@@ -53,6 +54,7 @@ struct AspeedSoCState {
  AspeedRtcState rtc;
  AspeedTimerCtrlState timerctrl;
  AspeedI2CState i2c;
+AspeedI3CState i3c;
  AspeedSCUState scu;
  AspeedHACEState hace;
  AspeedXDMAState xdma;
@@ -148,6 +150,7 @@ enum {
  ASPEED_DEV_FSI2,
  ASPEED_DEV_DPMCU,
  ASPEED_DEV_DP,
+ASPEED_DEV_I3C,
  };
  
  #endif /* ASPEED_SOC_H */
